Seonglae Cho


  • Results-oriented AI Engineer, proficient across the full pipeline from research to production.
  • Skilled at every level of system architecture: infrastructure, network, server, and application.
  • Effective team player, fueled by intellectual curiosity and high motivation.
  • Product-focused developer, meeting requirements while integrating innovative ideas.
  • Strong communication skills with both internal and external stakeholders.

Engineering Skills

  • Experienced in PyTorch multi-node distributed training (DDP and FSDP) with 2x RTX 3090 per node.
  • Accelerated text generation by introducing a multi-GPU inference server (TGI) with batch processing.
  • Indexed million-scale document collections for RAG in vector databases: pgvector, Faiss, and ChromaDB.
  • Refactored a C++ module into Rust as a Node.js N-API binding, achieving platform independence.
  • Competent in infrastructure management, GitOps CI/CD, Kubernetes on GCP, and network configuration.

Professional Experience

Kakao Mobility (software engineer) Dec 2021 – Sep 2022
  • Led a 3-person team developing 3D vector generation from LiDAR point clouds for a map data pipeline.
  • Reduced build time by 70% and simplified dependency management, enhancing team productivity.
  • Achieved 60% TypeScript code coverage in Node.js by introducing Vite unit tests run on Jenkins CI.
  • Expert in PostgreSQL, manipulating million-scale vector data with PostGIS geospatial indexes and functions.
  • Integrated real-time error notifications through a Slack bot, monitored via InfluxDB and Grafana on GKE.
Stryx (software engineer) Nov 2019 – Dec 2021
  • Cut API bandwidth and TTFB by 80% with 3-level caching across Redis, the web server, and cache headers.
  • Downsized a Docker image by 91%, from 2 GB to 180 MB, using multi-stage builds, speeding up CI.


Projects
  • Fine-tuned Gemma on Korean chat and wiki datasets with PEFT QLoRA training on 2x RTX 3090.
  • Gained a line-by-line understanding of GPT, enabling its integration as an HF Transformers model.
  • Delivered MBTI personality analysis to 1,200 people out of 1,600 unique visitors in the first month.
  • Secured RAG with real-time vector indexing, JSON mode, and dynamic data splitting.
  • Cut computing cost and memory by 75% by applying 4-bit GPTQ quantization to Llama 2.
  • Built a personalized AI app with no external API, using a local vector database and chat UI.
  • Integrated a recommender system into a web backend using ONNX and transformers.js for inference.
  • Embedded 30,000 pages into pgvector for RAG vector search, deployed to GPTs with an action API.
  • Published npm packages enabling several features, such as CLI Notion page export.

Academic Paper

  • Minimized AI hallucination by decomposing sentences into smaller units and recomposing them.
  • Boosted OpenIE5’s triple extraction speed by 300% by combining a reverse proxy with container replicas.
  • Reduced OpenAI API costs by 30% through prompt optimization, applying the ‘LLMs as Optimizers’ paper.
  • Indexed 21M Wikipedia passages into a Milvus vector database with multiple embedding models.
  • Improved ODQA performance by 20% by introducing a contextual compressor module before the LLM reader.


Education
  • Yonsei University (Seoul), Undergraduate in Computer Science (Mar 2017 – Jun 2024).
  • University of California (Riverside), Exchange student in Computer Science (Sep 2022 – Jan 2023).
I hereby certify that the above statements are true and correct to the best of my knowledge.