Seonglae Cho


  • AI Engineer specializing in question answering, backed by a robust systems engineering foundation.
  • Proficient across the infrastructure, network, server, and application levels of system architecture.
  • Focuses on results-driven development, meeting requirements while integrating innovative ideas.
  • Strong team player with effective communication skills for both internal and external stakeholders.

Engineering skills

  • ML expertise with Hugging Face transformers and datasets, along with PyTorch implementation.
  • Indexed 21M Wikipedia passages and 6M streamed documents into a Milvus vector database.
  • Accelerated text embedding generation for 21M entries via TEI and asynchronous batch processing.
  • AI application development with prompt engineering using LangChain, ChromaDB, pgvector, etc.
  • Managed a GKE namespace and ensured compatibility with local Kubernetes for a seamless DX.
  • Implemented GitOps CI/CD linking issue management with the development process.
  • Rewrote a C++ camera module containing mathematical projection algorithms in Rust.
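The async batch-embedding pattern above can be sketched roughly as follows. This is a minimal illustration, not the production code: `embed_batch` is a dummy stand-in for an HTTP POST to a TEI `/embed` endpoint, and the batch size and concurrency limit are illustrative values.

```python
import asyncio

# Dummy stand-in for a POST to a TEI /embed endpoint; a real client
# would issue the request with an HTTP library such as aiohttp.
async def embed_batch(texts):
    await asyncio.sleep(0)                       # simulates network latency
    return [[float(len(t))] for t in texts]      # dummy 1-d "embeddings"

def chunks(items, size):
    # Split the corpus into fixed-size batches.
    for i in range(0, len(items), size):
        yield items[i:i + size]

async def embed_all(passages, batch_size=32, concurrency=8):
    sem = asyncio.Semaphore(concurrency)         # cap in-flight requests

    async def worker(batch):
        async with sem:
            return await embed_batch(batch)

    # gather preserves submission order, so results line up with inputs.
    batches = await asyncio.gather(*(worker(b) for b in chunks(passages, batch_size)))
    return [vec for batch in batches for vec in batch]

vectors = asyncio.run(embed_all([f"passage {i}" for i in range(100)]))
```

Bounding concurrency with a semaphore keeps the embedding server saturated without overwhelming it, which is what makes throughput scale with batch size.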

Professional Experience

Kakao Mobility (Software Engineer) Dec 2021 – Sep 2022
  • Led a 3-person team project in developing a web app for drawing 3D maps for autonomous driving.
  • Reduced build time by 70% and simplified dependency management by merging into a mono-repo.
  • Achieved 50% code coverage by introducing unit tests and coverage reporting system within CI.
  • Implemented real-time error notification and monitoring system through logging middleware.
Stryx (Software Engineer) Nov 2019 – Dec 2021
  • Cut bandwidth usage and response time by 80% with multi-level caching via Redis and web caching.
  • Optimized deployment, shrinking the Docker image from 2 GB to 180 MB using multi-stage builds.
  • Improved service reliability by migrating bare-metal servers to OpenStack-based Docker containers.
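The multi-level caching read path can be sketched as below. This is a simplified illustration under stated assumptions: a plain dict stands in for the Redis client, `slow_origin` is a hypothetical backing service, and the TTL value is arbitrary.

```python
import time

# Read path: process-local cache (L1) -> shared cache (L2, Redis stand-in)
# -> origin. Each hit at a higher tier avoids a slower round-trip below it.
class MultiLevelCache:
    def __init__(self, shared, origin, local_ttl=60.0):
        self.local = {}               # L1: per-process, TTL-bounded
        self.shared = shared          # L2: dict standing in for a Redis client
        self.origin = origin          # slow source of truth (DB, upstream API)
        self.local_ttl = local_ttl

    def get(self, key):
        entry = self.local.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]                      # L1 hit: no network at all
        if key in self.shared:
            value = self.shared[key]             # L2 hit: one shared-cache read
        else:
            value = self.origin(key)             # full miss: hit the origin...
            self.shared[key] = value             # ...then populate L2
        self.local[key] = (time.monotonic() + self.local_ttl, value)
        return value

calls = []
def slow_origin(key):                            # hypothetical backing service
    calls.append(key)
    return f"value:{key}"

cache = MultiLevelCache(shared={}, origin=slow_origin)
a = cache.get("user:1")                          # populates both tiers
b = cache.get("user:1")                          # served from L1; origin called once
```

Serving repeat reads from the local tier is what cuts both bandwidth and latency: the origin and even Redis are only touched on misses.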

Academic Paper

RTSUM: Relation Triple-based Interpretable Summarization with Multi-level Salience Visualization
  • Minimized AI hallucination by decomposing sentences into smaller units and recomposing them.
  • Boosted relation-triple extraction speed by 300% by combining a reverse proxy and container scaling.
  • Indexed 21M Wikipedia passages into a Milvus vector database with multiple embedding models.
  • Accelerated text embedding and generation by introducing TEI & TGI multi-GPU inference servers.
  • Improved ODQA performance by 20% by introducing a contextual compression module before the LLM.
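The idea behind contextual compression can be sketched with a deliberately simplified keyword-overlap filter. The actual module likely used a learned relevance scorer, but the principle is the same: shrink retrieved passages to query-relevant sentences before they reach the LLM prompt, so the context window carries less noise.

```python
import re

# Simplified stand-in for a contextual compressor: keep only sentences
# that share at least one term with the query. A production compressor
# would score relevance with an embedding or LLM-based model instead.
def compress(passages, query):
    terms = set(re.findall(r"\w+", query.lower()))
    kept = []
    for passage in passages:
        for sent in re.split(r"(?<=[.!?])\s+", passage):
            words = set(re.findall(r"\w+", sent.lower()))
            if terms & words:            # sentence shares a query term
                kept.append(sent.strip())
    return " ".join(kept)

context = compress(
    ["Milvus is a vector database. It scales horizontally.",
     "Bananas are yellow. Vector search finds nearest neighbours."],
    query="vector database",
)
```

Feeding the LLM `context` instead of the raw passages keeps the relevant evidence while dropping distracting sentences, which is the mechanism behind the reported ODQA gain.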

Projects

  • Reduced compute cost and memory usage by 75% by applying 4-bit GPTQ quantization to Llama 2.
  • Built a personalized AI app with no external API, using a local vector database and chat UI.
  • Integrated a recommender system into the web backend using ONNX & transformers.js for inference.
  • Embedded 30,000 documents within a knowledge system for RAG-enabled vector search.
  • Published npm packages enabling features such as CLI-based Notion page export.
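The storage side of 4-bit quantization can be illustrated with a toy group-wise round-to-nearest scheme. GPTQ itself additionally applies error-compensating weight updates, so this sketch only shows the shared format: one int4 code per weight plus a per-group scale, which is where the ~75% memory saving over 16-bit weights comes from.

```python
# Toy group-wise 4-bit round-to-nearest quantization of a weight row.
# GPTQ adds second-order error compensation on top; the stored artifacts
# (int4 codes in [-8, 7] plus one scale per group) are the same shape.
def quantize_int4(weights, group_size=4):
    codes, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0   # map max weight to 7
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales

def dequantize_int4(codes, scales, group_size=4):
    # Reconstruct approximate weights: code * its group's scale.
    return [c * scales[i // group_size] for i, c in enumerate(codes)]

codes, scales = quantize_int4([0.52, -0.26, 0.12, 0.7])
approx = dequantize_int4(codes, scales)
```

Each weight shrinks from 16 bits to 4 bits plus an amortized share of one scale per group, at the cost of a small, bounded reconstruction error per weight.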

Education

  • Yonsei University (Seoul), Undergraduate in Computer Science (Mar 2017 – Nov 2023).
  • University of California (Riverside), Exchange student in Computer Science (Sep 2022 – Jan 2023).
I hereby certify that the above statements are true and correct to the best of my knowledge.