Summary
- Results-oriented AI Engineer proficient from research to production.
- Proficient in system architecture across the infrastructure, network, server, and application levels.
- Effective team player with communication skills fueled by intellectual curiosity and high motivation.
- Product-focused developer who meets requirements while integrating innovative ideas.
- Strong communication skills for both internal and external stakeholders.
Engineering skills
- Experienced in PyTorch multi-node distributed training (DDP and FSDP) with 2 x RTX 3090 per node.
- Accelerated text generation by introducing a multi-GPU inference server (TGI) with batch processing.
- Indexed vector databases for RAG with million-scale documents using pgVector, Faiss, and ChromaDB.
- Refactored a C++ module into Rust as a Node.js NAPI binding, achieving platform independence.
- Competent in infrastructure management, CI/CD and GitOps, Kubernetes on GCP, and network manipulation.
Professional Experience
Kakao Mobility (software engineer) Dec 2021 – Sep 2022
- Led a 3-person team in developing 3D vector generation from LiDAR point clouds for a map data pipeline.
- Reduced build time by 70% and simplified dependency management, enhancing team productivity.
- Achieved 60% code coverage for Node.js TypeScript code by introducing Vite unit tests run in Jenkins CI.
- Expert in PostgreSQL, handling million-scale vector data with PostGIS geospatial indexes and functions.
- Integrated real-time error notifications through a Slack bot, with monitoring via InfluxDB & Grafana on GKE.
Stryx (software engineer) Nov 2019 – Dec 2021
- Cut API bandwidth and TTFB by 80% with 3-level caching across Redis, the web server, and cache headers.
- Downsized a Docker image by 91%, from 2 GB to 180 MB, using multi-stage builds, speeding up CI.
Projects
- Fine-tuned Gemma on Korean chat and wiki datasets with PEFT QLoRA training on 2 x RTX 3090.
- Gained a line-by-line understanding of GPT, enabling integration into an HF Transformers model.
- Delivered MBTI personality analysis to 1,200 people out of 1,600 unique visitors in the first month.
- Ensured the security of RAG through real-time vector indexing with JSON mode and dynamic data splitting.
- Reduced computing cost and memory by 75% by applying 4-bit GPTQ quantization to Llama 2.
- Built a personalized AI app without external APIs, using a local vector database and chat UI.
- Integrated a recommender system into a web backend using ONNX & transformers.js for inference.
- Embedded 30,000 pages in pgVector for RAG vector search, deployed to GPTs with an action API.
- Published npm packages that enable features such as CLI Notion page export (packages).
Academic Paper
- Minimized AI hallucination by decomposing sentences into smaller units and recomposing them.
- Boosted OpenIE5’s triple extraction speed by 300% by combining a reverse proxy with container replicas.
ReSRer
- Reduced OpenAI API costs by 30% through prompt optimization, applying the ‘Large Language Models as Optimizers’ paper.
- Indexed 21M Wikipedia passages into a Milvus vector database with multiple embedding models.
- Improved ODQA performance by 20% by introducing a contextual compressor module before the LLM reader.
Education
- Yonsei University (Seoul) Undergraduate in Computer Science (2017.03 ~ 2024.06).
- University of California (Riverside) Exchange student in Computer Science (2022.09 ~ 2023.01).
I hereby certify that the above statements are true and correct to the best of my knowledge.