YSU NLP HW

There are not many experiments, so write the introduction with the later sections in mind, so that it does not start strong and then fizzle out.

Copy-pasting code is not allowed.

Abstract: Describe your work from a high-level perspective.

Introduction (5 paragraphs): Present the context and significance of your work. Outline the problem and motivation. Summarize your contributions.

in context learning
tool learning
dataset
code generation ability

Related Work (2~3 paragraphs): Discuss previous work, such as CoT. You may use up to 2 pages for the abstract, introduction, and related work sections.

pseudo code
zero shot cot
…

Method (1~1.5 pages): Explain the PoT method in detail. Include the motivation and the methodology.
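
For the method section, a minimal sketch of the core PoT idea may help: the model writes a program and an interpreter, not the model, computes the final answer. The program below is a hand-written stand-in for model output, and both the helper name `run_pot_program` and the convention of storing the result in a variable called `ans` are assumptions for illustration.

```python
def run_pot_program(program: str, answer_var: str = "ans"):
    """Execute a generated program and return the value bound to answer_var."""
    namespace = {}
    # NOTE: exec on untrusted model output is unsafe; a real setup would sandbox this.
    exec(program, namespace)
    return namespace[answer_var]


# Hand-written stand-in (not real model output) for the question:
# "Pens cost 3 dollars each. How much do 4 pens and a 5 dollar notebook cost?"
generated_program = """
pen_price = 3
num_pens = 4
notebook_price = 5
ans = pen_price * num_pens + notebook_price
"""

print(run_pot_program(generated_program))  # 17
```

The point to stress in the write-up is the split of work: the model is responsible for the reasoning steps expressed as code, while the arithmetic is delegated to the interpreter.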

Experiment (0.5~1 pages): Detail your experimental settings. Describe the benchmarks used and the baselines compared with your method.

Results & Analyses (1~2 pages): Present your results. Provide a detailed analysis of your findings.

induction head

Strategy

Rewrite the introduction by paraphrasing, centered on what I actually did.

Add footnotes and explain in more detail the Anthropic work on induction heads and the in-context learning side.

Related work: covering CoT, zero-shot CoT, PoT, pseudo code, etc. should be enough.

Write the method section in a similar way.

Write the experiment section based on my own work; the same goes for the remaining sections.

Tip

Do not copy-paste figures; pull the numbers and redraw them.

When borrowing results, cite the source in the figure or table caption.

Include two additional experiments (excluding the experiment done in the previous lab session) conducted by you.

1) reasoning with PoT-generated Python code without executing the generated Python code
2) qualitative analysis on when PoT fails.
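
For the qualitative analysis of PoT failures, one possible starting point is to bucket each generated program by how it fails before reading the cases by hand. The sketch below is an assumption about how that could be organized, not the lab's actual pipeline: the category names, the `ans` variable convention, and the numeric tolerance are all hypothetical.

```python
import ast


def classify_pot_outcome(program: str, gold, answer_var: str = "ans",
                         tol: float = 1e-6) -> str:
    """Assign a generated program to a coarse outcome bucket (hypothetical labels)."""
    try:
        ast.parse(program)  # syntax check only, nothing is executed here
    except SyntaxError:
        return "syntax_error"
    namespace = {}
    try:
        # NOTE: exec on untrusted model output is unsafe; sandbox in practice.
        exec(program, namespace)
    except Exception:
        return "runtime_error"
    if answer_var not in namespace:
        return "missing_answer"
    try:
        correct = abs(float(namespace[answer_var]) - float(gold)) <= tol
    except (TypeError, ValueError):
        return "non_numeric_answer"
    return "correct" if correct else "wrong_answer"


print(classify_pot_outcome("ans = 2 * 3", gold=6))   # correct
print(classify_pot_outcome("ans = 1 / 0", gold=6))   # runtime_error
```

Programs that land in `wrong_answer` run cleanly but encode faulty reasoning, so they are the ones most worth inspecting manually.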

Include three original experiments.

Main points

Why zero-shot is better: when the provided few-shot code is poor relative to the model's coding ability (compare zero-shot and 3-shot by model size); the limit of exemplar coverage.
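
To back this point with numbers, the zero-shot minus 3-shot gap for PoT can be computed per model. The accuracies below are copied from the Results section of these notes; the dictionary layout and variable names are just one convenient arrangement.

```python
# PoT accuracies from the Results section of these notes.
pot_accuracy = {
    "gemma-7b":   {"0shot": 0.48, "3shot": 0.38},
    "llama3-8b":  {"0shot": 0.68, "3shot": 0.64},
    "llama3-70b": {"0shot": 0.84, "3shot": 0.80},
    "gpt-3.5":    {"0shot": 0.72, "3shot": 0.68},
    "gpt-4o":     {"0shot": 1.00, "3shot": 0.94},
}

# Positive gap means zero-shot PoT beats 3-shot PoT for that model.
shot_gap = {
    model: round(acc["0shot"] - acc["3shot"], 2)
    for model, acc in pot_accuracy.items()
}

for model, gap in shot_gap.items():
    print(f"{model:11s} zero-shot minus 3-shot PoT: {gap:+.2f}")
```

With these numbers the gap is positive for all five models, which is the pattern the argument relies on.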

Why PoT is better than CoT, relative to the model's code generation ability (dot plot with model size on the x-axis and the performance gap on the y-axis).
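
The y-values for that dot plot can be derived directly from the Results section. The sketch below computes PoT minus CoT accuracy per model and shot setting; it keeps the models in the order they appear in these notes instead of assigning numeric sizes, since parameter counts for the GPT models are not recorded here.

```python
# CoT and PoT accuracies from the Results section of these notes.
accuracy = {
    "gemma-7b":   {"3shot": {"cot": 0.54, "pot": 0.38}, "0shot": {"cot": 0.46, "pot": 0.48}},
    "llama3-8b":  {"3shot": {"cot": 0.72, "pot": 0.64}, "0shot": {"cot": 0.60, "pot": 0.68}},
    "llama3-70b": {"3shot": {"cot": 0.88, "pot": 0.80}, "0shot": {"cot": 0.78, "pot": 0.84}},
    "gpt-3.5":    {"3shot": {"cot": 0.62, "pot": 0.68}, "0shot": {"cot": 0.62, "pot": 0.72}},
    "gpt-4o":     {"3shot": {"cot": 0.90, "pot": 0.94}, "0shot": {"cot": 0.80, "pot": 1.00}},
}

# Positive gap means PoT beats CoT in that setting.
pot_minus_cot = {
    model: {
        setting: round(scores["pot"] - scores["cot"], 2)
        for setting, scores in by_setting.items()
    }
    for model, by_setting in accuracy.items()
}

for model, gaps in pot_minus_cot.items():
    print(f"{model:11s} 0-shot: {gaps['0shot']:+.2f}   3-shot: {gaps['3shot']:+.2f}")
```

In the zero-shot setting the gap is positive for every model, while in the 3-shot setting it is negative for gemma-7b and both llama3 models, so the two settings should be plotted as separate series.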

Future works, related

PoT pseudo code

tool learning

Additional experiments

Various datasets

Results

The 8b results may need to be revised.


Model: gemma-7b
  result_3shot_direct.json: Accuracy = 0.10
  result_3shot_cot.json: Accuracy = 0.54
  result_3shot_pot.json: Accuracy = 0.38
  result_0shot_direct.json: Accuracy = 0.06
  result_0shot_cot.json: Accuracy = 0.46
  result_0shot_pot.json: Accuracy = 0.48

Model: llama3-8b
  result_3shot_direct.json: Accuracy = 0.16
  result_3shot_cot.json: Accuracy = 0.72
  result_3shot_pot.json: Accuracy = 0.64
  result_0shot_direct.json: Accuracy = 0.18
  result_0shot_cot.json: Accuracy = 0.60
  result_0shot_pot.json: Accuracy = 0.68

Model: llama3-70b
  result_3shot_direct.json: Accuracy = 0.44
  result_3shot_cot.json: Accuracy = 0.88
  result_3shot_pot.json: Accuracy = 0.80
  result_0shot_direct.json: Accuracy = 0.40
  result_0shot_cot.json: Accuracy = 0.78
  result_0shot_pot.json: Accuracy = 0.84

Model: gpt-3.5
  result_3shot_direct.json: Accuracy = 0.32
  result_3shot_cot.json: Accuracy = 0.62
  result_3shot_pot.json: Accuracy = 0.68
  result_0shot_direct.json: Accuracy = 0.26
  result_0shot_cot.json: Accuracy = 0.62
  result_0shot_pot.json: Accuracy = 0.72

Model: gpt-4o
  result_3shot_direct.json: Accuracy = 0.58
  result_3shot_cot.json: Accuracy = 0.90
  result_3shot_pot.json: Accuracy = 0.94
  result_0shot_direct.json: Accuracy = 0.64
  result_0shot_cot.json: Accuracy = 0.80
  result_0shot_pot.json: Accuracy = 1.00
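
Since the figures are to be redrawn from numbers, it may be convenient to parse the log above into a dictionary first. The parser below is a sketch that assumes the log keeps exactly the `Model:` / `result_<k>shot_<method>.json: Accuracy = <value>` layout shown above; `parse_results` is a hypothetical helper name, and only the gemma-7b block is repeated here as sample input.

```python
import re

LINE_RE = re.compile(r"result_(\d+)shot_(\w+)\.json: Accuracy = ([\d.]+)")


def parse_results(log: str) -> dict:
    """Parse the accuracy log into {model: {(shots, method): accuracy}}."""
    results = {}
    model = None
    for raw in log.splitlines():
        line = raw.strip()
        if line.startswith("Model:"):
            model = line.split(":", 1)[1].strip()
            results[model] = {}
            continue
        match = LINE_RE.match(line)
        if match and model is not None:
            shots, method, acc = match.groups()
            results[model][(int(shots), method)] = float(acc)
    return results


sample_log = """
Model: gemma-7b
  result_3shot_direct.json: Accuracy = 0.10
  result_3shot_cot.json: Accuracy = 0.54
  result_3shot_pot.json: Accuracy = 0.38
  result_0shot_direct.json: Accuracy = 0.06
  result_0shot_cot.json: Accuracy = 0.46
  result_0shot_pot.json: Accuracy = 0.48
"""

parsed = parse_results(sample_log)
for model, scores in parsed.items():
    for shots in (0, 3):
        row = "  ".join(f"{m}={scores[(shots, m)]:.2f}" for m in ("direct", "cot", "pot"))
        print(f"{model} {shots}-shot: {row}")
```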