MATS Program 2024

Share a technical project you are proud of. This could be a research paper you wrote, a link to a GitHub repository, a technical blog post, etc.

One project I am particularly proud of is RTSum, which I developed during my internship at Yonsei University’s Data & Language Intelligence Lab. The project focused on validating an interpretable AI framework. I was the main developer, and the resulting paper was accepted at NAACL 2024. https://github.com/seonglae/RTSum

Recently, I won first prize (£3,000) at the UCL x Holistic AI 2024 competition. Using a Sparse AutoEncoder (SAE) and a stereotype dataset, I steered GPT-2 with steering vectors to prevent it from generating stereotyped text. I also introduced a novel method that uses point-biserial correlation to extract interpretable features without relying on LLMs as explainers. https://github.com/seonglae/emgsd-hermes
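To make the point-biserial idea concrete, here is a minimal sketch. It is not the code from the repository: the function name, the array shapes, and the synthetic data are all assumptions for illustration. It scores each feature by the correlation between its activations and a binary label (for example, stereotyped vs. non-stereotyped text), so that the highest-scoring features can be inspected without an LLM explainer.

```python
import numpy as np


def point_biserial(activations: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Point-biserial correlation between a binary label and each feature.

    activations: (n_samples, n_features) feature activations (hypothetical shape)
    labels:      (n_samples,) array of 0/1 labels
    """
    mask = labels.astype(bool)
    n = mask.size
    n1 = int(mask.sum())
    n0 = n - n1
    mean1 = activations[mask].mean(axis=0)   # mean activation when label == 1
    mean0 = activations[~mask].mean(axis=0)  # mean activation when label == 0
    std = activations.std(axis=0)            # population standard deviation
    safe_std = np.where(std > 0, std, 1.0)   # avoid dividing by zero
    r = (mean1 - mean0) / safe_std * np.sqrt(n1 * n0 / n**2)
    # Features that never vary carry no signal, so score them as 0.
    return np.where(std > 0, r, 0.0)


# Synthetic stand-in for SAE activations: feature 3 is made to track the label.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
acts = rng.random((200, 5))
acts[:, 3] += 2.0 * labels

scores = point_biserial(acts, labels)
top_feature = int(np.argmax(np.abs(scores)))
```

With a binary label, this value equals the ordinary Pearson correlation, so the ranking can be cross-checked against `np.corrcoef`.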

Another project that shows my engineering ability is MBTI GPT, an AI personality analyzer that gained over 1,000 users. I built it with Redis, the OpenAI API, and Faiss, optimizing prompts and reducing API costs by 30%. This service demonstrates my ability to build real-world AI applications. https://mbti.texonom.com

Additionally, I’ve written about Mechanistic Interpretability and LLM steering. In my article on the Superposition Hypothesis for steering LLM with Sparse AutoEncoder, I introduced and explained Anthropic's research in a more accessible way. https://seongland.medium.com/superposition-hypothesis-for-steering-llm-with-sparse-autoencoder-c07b74d23e96

Another article, Reversing Transformer to understand In-context Learning, examines phase change and feature dimensionality within transformers. https://seongland.medium.com/reversing-transformer-to-understand-in-context-learning-with-phase-change-feature-dimensionality-13cbf8a2f984

These writings reflect my commitment to understanding and advancing the interpretability and control of large language models.

What kinds of projects are you most interested in working on during MATS?

Even though I am an AI MSc student at UCL, it has been challenging to find a mentor proficient in Mechanistic Interpretability. I am particularly interested in researching Steering Vectors, with a focus on improving AI reasoning or question-answering performance through Activation Engineering. I believe that Nina Panickssery would be the perfect mentor to help make my ideas both feasible and meaningful. I believe in the potential of Mechanistic Interpretability and am eager to discuss my ideas with her, such as extracting features from Vision Transformer models.

In addition to this research, I am excited about the opportunity to participate in seminars and to exchange ideas with others who share my interests.
