UCL AI4SD Coursework 1

You are required to submit 1 PDF report.

Your report must be no more than 4 pages of A4 (excluding references), and no larger than 200MB. Your total file size must be no more than 500MB.

guidelines_coursework1_ai4sd.pdf

Information

Do not include your name in the submission; it is graded anonymously.

There is no word limit, but the document must not be longer than 4 pages. We encourage you to be as concise as possible, but do not go below a font size of 11.

Please include the source and justify how it is directly related to sustainable development.

We ask you to speculate on the potential risks of the data/technology and to think of solutions to mitigate them.

We ask you to think about how the application could impact other sustainable development goals indirectly.

Criteria

1 = mildly relevant to the question • 2 = relatively relevant • 3 = moderately relevant, addressing most important points • 4 = very relevant but does not cover all the points • 5 = nailed it!

Chosen dataset and model

Chosen dataset and model

Gemma Scope is a set of sparse autoencoders (SAEs) (cite). The SAE is a common architecture, proposed by Anthropic et al., for efficiently interpreting the features of large language models (LLMs). Gemma Scope is designed and trained to interpret the features of the open-source model Gemma (Gemma) from Google DeepMind. The model is evaluated on a dataset called The Pile (cite), a large-scale multilingual text corpus composed mostly of English.
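
To make the SAE idea concrete, here is a minimal sketch of a generic sparse autoencoder forward pass in NumPy. It assumes the textbook formulation (ReLU encoder, linear decoder, L1 sparsity penalty); the toy sizes, the random weights, and the names `encode`, `decode`, and `sae_loss` are illustrative assumptions and are not taken from Gemma Scope.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_sae = 8, 32  # toy sizes chosen for illustration only

# Randomly initialised parameters; a trained SAE would learn these.
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_dec = np.zeros(d_model)


def encode(x):
    """Map an LLM activation to non-negative feature activations."""
    return np.maximum(x @ W_enc + b_enc, 0.0)


def decode(f):
    """Reconstruct the LLM activation from the feature activations."""
    return f @ W_dec + b_dec


def sae_loss(x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparsity."""
    f = encode(x)
    x_hat = decode(f)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(f), axis=-1))
    return recon + l1_coeff * sparsity


x = rng.normal(size=(4, d_model))  # stand-in for a batch of LLM activations
features = encode(x)
reconstruction = decode(features)
loss = sae_loss(x)
```

The feature dimension is deliberately larger than the activation dimension: the sparsity penalty pushes each input to be explained by only a few of the many features, which is what makes individual features easier to interpret.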

Assess its documentation

The training source code of Gemma Scope is not disclosed, even though the weights of each layer’s SAE are published. The hyperparameters are open, and some relevant code, such as the Colab notebooks and Mishax (mishax, google-deepmind; make it a footnote), is open, but it is not easy to exactly reproduce the model for other LLMs without the codebase. However, reproducing the evaluation experiments is possible, since the weights and The Pile dataset are accessible. Overall, the paper’s analysis of Gemma Scope on The Pile is rigorous and well validated by separating the dataset into small subsets.

Connection to SDGs

Related goals and targets

SDG 5 (Gender Equality) aims to achieve gender equality and empower all women and girls by eliminating discrimination and fostering equal opportunities across all sectors. SDG 9 (Industry, Innovation, and Infrastructure) focuses on building resilient infrastructure, promoting inclusive and sustainable industrialization, and advancing innovation for long-term growth and stability. SDG 17 (Partnerships for the Goals) supports strengthening the means of implementation and revitalizing global partnerships, promoting technology sharing and capacity building to assist all countries, particularly developing nations, in achieving sustainable goals. These goals collectively support a sustainable and equitable global framework (United Nations Statistics Division 2020).

SDG enabler and inhibitor

Alongside recent attempts to identify model bias only from generation results (David Nadeau,), SAEs are a new method for explicitly analyzing the activations of LLMs. For SDG 5, SAEs have revealed not only various biases but also sexual features (Hugo), and controlling the generation of such content is essential for gender equality. For the resilient infrastructure of SDG 9, the most critical problem is building controllable AGI. The importance of SAEs for interpretable AI, together with the contribution of Gemma Scope, demonstrates this possibility through feature analysis and steering vectors (cite). Finally, for SDG 17, releasing the Gemma Scope weights publicly makes the model explicitly interpretable and accessible, which contributes technology to the open software community, including in developing countries.
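
The steering idea mentioned above can be sketched as follows: a direction associated with one SAE feature is scaled and added to an LLM activation. This is a minimal illustration under stated assumptions; the function `steer`, the three-dimensional vectors, and the strengths are hypothetical and do not come from Gemma Scope.

```python
import numpy as np


def steer(activation, feature_direction, strength):
    """Add a scaled, unit-normalised feature direction to an activation."""
    direction = feature_direction / np.linalg.norm(feature_direction)
    return activation + strength * direction


activation = np.array([1.0, 0.0, 2.0])         # hypothetical residual-stream vector
feature_direction = np.array([0.0, 3.0, 4.0])  # hypothetical SAE decoder row

# A negative strength pushes the activation away from the feature,
# a positive strength pushes it towards the feature.
suppressed = steer(activation, feature_direction, strength=-2.0)
amplified = steer(activation, feature_direction, strength=2.0)
```

In this picture, suppressing a feature linked to unwanted content and amplifying it are the same operation with opposite signs, which is why the technique appears here both as an enabler and, later, as a risk.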

Impact on sustainability

Social sustainability

What are the risks if the data/models are compromised?

Even if the model is fine-tuned for AI alignment to refuse illegal or inappropriate requests from users, the LLM does not lose the capability to generate those responses. This means that once the SAE is published, it could be used to steer the model into generating such content, for example sexual or violence-related content (cite).

Has the training data been analysed for potential biases? How could those biases impact social sustainability?

The authors of The Pile (cite) analyzed bias and sentiment co-occurrence across the whole dataset. Specifically, among the top 15 most biased adjectives or adverbs, words like “military”, “criminal”, and “offensive” are strongly biased towards men, while “little”, “married”, “sexual”, and “happy” are biased towards women. From a religious perspective, “radical” co-occurs with “muslim” at a high rate, while “rational” often co-occurs with “atheist”. For race, the 4 most biased words for “black” are “unarmed”, “civil”, “criminal”, and “scary”. Finally, there is a significant language bias: The Pile is 97.4% English, while only 13% of the world’s population speaks English.
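
As a rough illustration of what a co-occurrence analysis measures, the sketch below counts the words that appear near a target word in a toy corpus. It is not the methodology used for The Pile; the function `cooccurrence_counts`, the window size, and the three example sentences are assumptions made for this example.

```python
from collections import Counter


def cooccurrence_counts(sentences, target, window=3):
    """Count words appearing within `window` tokens of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


# Toy corpus, invented for illustration.
toy_corpus = [
    "he joined the military last year",
    "she was happy and married",
    "he was a military officer",
]
he_counts = cooccurrence_counts(toy_corpus, "he")
she_counts = cooccurrence_counts(toy_corpus, "she")
```

Comparing such counts between targets (for example "he" against "she") shows which descriptive words skew towards one group, which is the kind of imbalance reported above.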

Environmental sustainability

Is the application resource intensive? What about the data collection?

The Pile is an 825 GiB English text corpus, so storing and distributing it takes considerable resources. At the same time, its well-refined plain-text format is a compact way to transmit the same information, which keeps energy consumption low. In addition, SAEs require additional computational resources during LLM inference. SAEs also require ~ (pick a suitable word, large~) resources to store weights for every layer of the LLM.
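
The per-layer storage cost can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation only: the model width, SAE width, layer count, and 4-byte parameters are illustrative assumptions rather than figures from the text above, and the helper names are hypothetical.

```python
def sae_param_count(d_model, d_sae):
    """Parameters of one SAE: encoder and decoder matrices plus two bias vectors."""
    return 2 * d_model * d_sae + d_sae + d_model


def storage_gib(n_params, bytes_per_param=4):
    """Storage in GiB, assuming every parameter takes `bytes_per_param` bytes."""
    return n_params * bytes_per_param / 2**30


# Illustrative assumptions, not figures taken from the report.
d_model, d_sae, n_layers = 2304, 16384, 26

per_sae = sae_param_count(d_model, d_sae)
total_gib = storage_gib(per_sae * n_layers)
```

Under these assumptions, one SAE per layer comes to roughly 7.3 GiB in total, and the cost grows linearly with both the SAE width and the number of layers or sites covered.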

Could the application impact consumption and production patterns in some way?

Steering vectors derived from SAE features can be used in several ways, and the ability to manually control AI means they might be used for advertising. Providers will likely introduce ads as soon as this becomes feasible, which would affect consumption styles and production patterns, since advertising is one of the biggest markets of the internet era. Current AI services, such as ChatGPT and those of Anthropic, do not include ads in their output because of technical limitations and concerns about output quality.

Economic sustainability

Do you think the application has innovation potential? Could it lead to other impactful applications in science and sustainability?

Research on SAEs, especially for Gemma, is impactful because Gemma 2 (cite) is one of the leading open-source base models in the open-source community, along with Llama 3 (cite) and Mistral 7B (cite). The approach of using SAEs to extract features, from Bricken et al. (change to inline cite), was innovative. Gemma Scope compares several techniques, such as activation functions, and applies JumpReLU to SAEs to improve their performance, which might lead to controllable and thus sustainable AI.
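
For readers unfamiliar with JumpReLU, the sketch below contrasts it with a plain ReLU: JumpReLU passes a pre-activation through unchanged only when it exceeds a threshold, and outputs zero otherwise. The input values and the threshold of 0.3 are hypothetical; in a trained SAE the thresholds would be learned.

```python
import numpy as np


def relu(z):
    """Standard ReLU: keep positive values, zero out the rest."""
    return np.maximum(z, 0.0)


def jump_relu(z, threshold):
    """Pass z through unchanged where it exceeds the threshold, else output 0."""
    return np.where(z > threshold, z, 0.0)


pre_activations = np.array([-1.0, 0.2, 0.5, 1.5])
threshold = np.array([0.3, 0.3, 0.3, 0.3])  # hypothetical per-feature thresholds

relu_out = relu(pre_activations)                  # keeps the weak 0.2 activation
jump_out = jump_relu(pre_activations, threshold)  # drops it, keeps 0.5 and 1.5 as they are
```

The difference is that weak activations are removed entirely while strong ones are not shrunk, which helps keep features sparse without distorting the reconstruction.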

Could it increase wealth/power concentration?

I must answer this question from two viewpoints. On the one hand, opening the weights and sharing several techniques through the paper ideally contributes to access to science and to the distribution of power. On the other hand, the large-scale, compute-dependent nature of the AI industry means that, in reality, this information is often within reach only of highly educated individuals and well-supported institutions in developed countries. Addressing this gap is essential to alleviating wealth and power concentration.

Other sustainability factors

Other critical questions we should be asking

Has the model been analysed for biases? How could those biases impact social sustainability?

Gemma has been analyzed for biases \citep{nadeau2024benchmarkingllama2mistralgemma}, particularly in how it handles bias and jailbreaking. It often tends to refuse to answer, limiting engagement with sensitive topics, as it does not accept system messages. This conservative approach may reduce harmful content, supporting social sustainability by lowering risks. However, it can also limit utility by occasionally failing to respond accurately to nuanced prompts, potentially impacting broader social applications.

Could the dataset’s focus on technical sources limit accessibility for users from diverse socio-economic backgrounds?

The Pile dataset includes a wide range of sources, representing diverse fields such as politics, law, medicine, and computer science. Notable subsets include FreeLaw, PubMed Central, and PubMed Abstracts. Additionally, resources like Stack Exchange and Wikipedia introduce general knowledge across various subjects, while PhilPapers brings in abstract, conceptual discourse from philosophy.

How might the use of this dataset and model influence educational opportunities or knowledge dissemination in low-resource environments?

According to Karpathy et al., the cognitive core of LLMs could be extremely small, and models could be shrunk by 99% because they hold too much knowledge that is unnecessary for common questions, such as base64. Extracting and analyzing features could help identify and ablate unnecessary parts of a large language model, resulting in smaller or on-device AI. Such smaller AI requires less computation and fewer resources, which might bridge the technology gap between different environments.

Interactions between SDGs

Gender Equality ↔ Reduced Inequalities

Reducing inequalities across countries means distributing capital equally across the globe, while gender inequality can be seen as a local inequality. If inequality across countries decreases, capital also flows to women, so the interaction is reinforcing. The same holds in the opposite direction, when SAEs reduce gender bias.

Industry, Innovation and Infrastructure ↔ Climate Action

Developing stable infrastructure for AGI is important, but its interaction with climate action can be constraining.

Partnerships for the Goals ↔ Reduced Inequalities

Global cooperation and reducing inequality across countries are in an indivisible interaction relationship, and open-sourcing SAEs contributes to this.

Speculative solutions

Changes to dataset

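Since bias exists in many aspects of the dataset, gender bias could be addressed, in support of Gender Equality, by counting pronouns. For the co-occurrence biases mentioned above, such as those for men, Muslims, and Black people, one could try filtering out data that reinforces the bias, or a fine-tuning approach. For the English bias, the proportion of non-English text could be increased, and the data could be shuffled and sliced at appropriate ratios so that this change is reflected.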
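
Two of the dataset changes above, counting pronouns and raising the non-English share, can be sketched as follows. This is a minimal illustration under stated assumptions: the pronoun lists, the function names `pronoun_counts` and `rebalance`, and the toy documents are hypothetical, and a real pipeline would need language identification and far more careful handling of gendered language.

```python
import random
import re
from collections import Counter

MALE = {"he", "him", "his"}
FEMALE = {"she", "her", "hers"}


def pronoun_counts(texts):
    """Count male and female pronouns across a list of texts."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in MALE:
                counts["male"] += 1
            elif token in FEMALE:
                counts["female"] += 1
    return counts


def rebalance(english, non_english, non_english_share, size, seed=0):
    """Sample a shuffled slice with the requested share of non-English documents."""
    rng = random.Random(seed)
    n_non_english = round(size * non_english_share)
    sample = rng.sample(non_english, n_non_english)
    sample += rng.sample(english, size - n_non_english)
    rng.shuffle(sample)
    return sample


counts = pronoun_counts(["He gave her his book.", "She thanked him."])

# Toy document identifiers standing in for real documents.
english_docs = [f"en-{i}" for i in range(10)]
other_docs = [f"xx-{i}" for i in range(10)]
subset = rebalance(english_docs, other_docs, non_english_share=0.3, size=10)
```

The pronoun counts give a quick signal of imbalance before and after filtering, and the sampler makes the target language ratio an explicit, adjustable parameter.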

Changes to ML model

The basic problems of the model are the additional computation it needs and its heavy storage use, since the weights are separate for each layer. There are attempts at cross-layer (dunefsky2024transcodersinterpretablellmfeature) and cross-modal (luo2024taskvectorscrossmodal) feature analysis, based on rationales such as attention sinks (cite) that capture the importance of later layers. These move in the direction of minimizing computing and other resources. I think that developing a cross-model SAE that uses attention could save a lot of resources while maintaining the positive SDG impact.

Technology governance

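As a direct intervention at AI inference time, the SAE itself is a means of AI control and will become a foundational technology for AI governance. However, because SAEs with the ability to control AI could also be misused, institutional procedures are needed to prevent this and to ensure stable governance.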

Speculating on the ideal scenario/dataset/task

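AI models are probabilistic models, so bias is essential and even efficient. However, although prejudice about particular concepts or entities can help, a model that holds stereotypes about particular groups is not sustainable for human society in the long term. The goal could therefore be a model that represents gender without stereotypes, and a model whose LLM features can be steered at inference time, with far less computation and storage, to handle even unforeseen bias.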

Impact of Gemma Scope on Sustainability.pdf

Datasets

SustainBench Dataset Package Website