ReSRer Code

Creator: Seonglae Cho
Created: 2023 Dec 14 6:57
Edited: 2024 Jan 31 3:43
The basic structure is four Python files in the root folder that run the various batch jobs.
A brief explanation of how the source is organized should help in understanding the ReSRer project.
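Each script below is listed with named jobs (for example `split` and `count` in use_tiktoken.py). The sketch below shows one way a root script could dispatch such jobs as subcommands. It assumes argparse subcommands. The handler bodies and the `--input` option are hypothetical placeholders, not the project's actual code.

```python
import argparse
from typing import Callable, Dict, Optional, Sequence


def run_split(args: argparse.Namespace) -> str:
    # Hypothetical placeholder for the "split" job (documents -> passages).
    return f"split: {args.input}"


def run_count(args: argparse.Namespace) -> str:
    # Hypothetical placeholder for the "count" debugging job.
    return f"count: {args.input}"


# Each subcommand name maps to the function that runs that batch job.
JOBS: Dict[str, Callable[[argparse.Namespace], str]] = {
    "split": run_split,
    "count": run_count,
}


def main(argv: Optional[Sequence[str]] = None) -> str:
    parser = argparse.ArgumentParser(description="Run one batch job")
    subparsers = parser.add_subparsers(dest="job", required=True)
    for name in JOBS:
        sub = subparsers.add_parser(name)
        # --input is a hypothetical option naming the file to process.
        sub.add_argument("--input", default="wiki.jsonl")
    args = parser.parse_args(argv)
    return JOBS[args.job](args)
```

A real script would end with `if __name__ == "__main__": print(main())`, so that `python use_tiktoken.py split --input docs.jsonl` selects the job from the command line. Keeping the job table in one dict makes adding a new batch job a one-line change.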

index_ctx.py

  • dataset - Step 0. Embedding passages and indexing them into a vector DB (Milvus)
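Step 0 follows an embed, insert, and search pattern. The toy sketch below keeps the shape of that flow but makes two swaps for illustration. Milvus is replaced by an in-memory brute-force index. The neural passage encoder is replaced by an L2-normalized bag-of-words vector. Neither swap reflects the project's actual encoder or Milvus schema.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def embed(text: str) -> Vector:
    # Hypothetical stand-in for a neural passage encoder: an L2-normalized
    # bag-of-words vector, so a dot product equals cosine similarity.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def dot(a: Vector, b: Vector) -> float:
    # Sparse dot product over the tokens both vectors share.
    return sum(w * b[tok] for tok, w in a.items() if tok in b)


class InMemoryIndex:
    """Toy stand-in for a vector DB collection (the project uses Milvus)."""

    def __init__(self) -> None:
        self.vectors: Dict[int, Vector] = {}
        self.passages: Dict[int, str] = {}

    def insert(self, pid: int, passage: str) -> None:
        # Embed once at indexing time and store the vector with its passage.
        self.vectors[pid] = embed(passage)
        self.passages[pid] = passage

    def search(self, query: str, top_k: int = 3) -> List[Tuple[int, float]]:
        # Brute-force cosine search; a vector DB would use an ANN index instead.
        q = embed(query)
        scored = [(pid, dot(q, vec)) for pid, vec in self.vectors.items()]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    index = InMemoryIndex()
    passages = [
        "Seoul is the capital of South Korea",
        "Milvus is an open source vector database",
        "Wikipedia is a free online encyclopedia",
    ]
    for pid, text in enumerate(passages):
        index.insert(pid, text)
    print(index.search("capital of Korea", top_k=1))
```

The query shares three words with the first passage and none with the others, so passage 0 ranks first.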

hf_data.py

  • upload - Step 1. Creating and uploading a QA training dataset using API prompting
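The sketch below shows Step 1 as building a prompt per QA sample and storing the model's answer as the target summary. This produces (passages, summary) records. Several details are assumptions: each sample already carries its retrieved passages, and `call_llm` is a hypothetical wrapper around whichever API the script prompts. The prompt wording and record fields are also hypothetical.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical prompt template; the project's actual prompt is not shown here.
PROMPT_TEMPLATE = (
    "Summarize the passages below so that the summary still contains "
    "the information needed to answer the question.\n\n"
    "Question: {question}\n\nPassages:\n{passages}\n\nSummary:"
)


def build_prompt(question: str, passages: List[str]) -> str:
    # Number the passages so the model can tell them apart.
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return PROMPT_TEMPLATE.format(question=question, passages=numbered)


def make_records(
    samples: List[Dict[str, Any]],
    call_llm: Callable[[str], str],
) -> List[Dict[str, Any]]:
    # One (passages, summary) training record per QA sample.
    records = []
    for sample in samples:
        prompt = build_prompt(sample["question"], sample["passages"])
        records.append({
            "question": sample["question"],
            "passages": sample["passages"],
            "summary": call_llm(prompt).strip(),
        })
    return records


def to_jsonl(records: List[Dict[str, Any]]) -> str:
    # JSON Lines (one record per line), a format Hugging Face datasets can load.
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

Passing `call_llm` in as a function keeps the dataset logic testable with a fake model. The real API client can then be swapped in for the batch run.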

train.py

  • train - Step 3. Training from a base model and uploading it to Hugging Face

qa_pipeline.py

  • dataset - Step 4. Running the QA pipeline, evaluating it, and uploading the summary
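The source does not say which metrics Step 4 reports. This sketch assumes SQuAD-style exact match and token-level F1, which are common for open-domain QA. Under that assumption, the evaluation loop could look like the code below. `answer_fn` is a hypothetical stand-in for the whole pipeline (retrieval, summarization, answer generation).

```python
import re
import string
from collections import Counter
from typing import Any, Callable, Dict, List

PUNCT = set(string.punctuation)


def normalize(text: str) -> str:
    # SQuAD-style normalization: lowercase, drop punctuation and articles,
    # collapse whitespace.
    text = "".join(ch for ch in text.lower() if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, answers: List[str]) -> float:
    return float(any(normalize(prediction) == normalize(a) for a in answers))


def f1(prediction: str, answers: List[str]) -> float:
    # Token-overlap F1 against each gold answer, keeping the best match.
    pred_tokens = normalize(prediction).split()
    best = 0.0
    for answer in answers:
        gold_tokens = normalize(answer).split()
        overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
        if overlap == 0:
            continue
        precision = overlap / len(pred_tokens)
        recall = overlap / len(gold_tokens)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


def evaluate(
    dataset: List[Dict[str, Any]],
    answer_fn: Callable[[str], str],
) -> Dict[str, float]:
    # answer_fn is a hypothetical stand-in for the full QA pipeline.
    em_total = f1_total = 0.0
    for sample in dataset:
        prediction = answer_fn(sample["question"])
        em_total += exact_match(prediction, sample["answers"])
        f1_total += f1(prediction, sample["answers"])
    n = len(dataset) or 1
    return {"exact_match": em_total / n, "f1": f1_total / n}
```

The resulting metric dict is the kind of small summary that could be logged or uploaded after a run.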

use_tiktoken.py

  • split - r1
    • splits raw Wikipedia documents into passages
  • count - r2
    • for debugging
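The `split` job can be sketched as cutting each document into fixed-size token windows. This is a minimal sketch that assumes fixed windows. `max_tokens` and `overlap` are hypothetical parameters, not the project's settings. The tokenizer is passed in as `encode`/`decode` functions, so a toy word tokenizer stands in here and tiktoken can be plugged in instead.

```python
from typing import Callable, Dict, List, Sequence

Encode = Callable[[str], List[int]]
Decode = Callable[[Sequence[int]], str]


def split_into_passages(
    text: str,
    encode: Encode,
    decode: Decode,
    max_tokens: int = 100,
    overlap: int = 0,
) -> List[str]:
    # Split a document into consecutive windows of at most max_tokens tokens;
    # overlap > 0 repeats that many tokens between neighbouring passages.
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    tokens = encode(text)
    step = max_tokens - overlap
    passages = []
    for start in range(0, len(tokens), step):
        passages.append(decode(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last window already reached the end of the document
    return passages


class WordTokenizer:
    # Toy whitespace tokenizer so the sketch runs without tiktoken installed.
    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}
        self.inverse: List[str] = []

    def encode(self, text: str) -> List[int]:
        ids = []
        for word in text.split():
            if word not in self.vocab:
                self.vocab[word] = len(self.inverse)
                self.inverse.append(word)
            ids.append(self.vocab[word])
        return ids

    def decode(self, ids: Sequence[int]) -> str:
        return " ".join(self.inverse[i] for i in ids)
```

With tiktoken installed, you could pass a real BPE encoding instead, for example `enc = tiktoken.get_encoding("cl100k_base")` followed by `split_into_passages(doc, enc.encode, enc.decode)`. In that case windows are cut on token boundaries, so a passage may start or end mid-word.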

Task

  1. Indexing Wikipedia data for retrieval
  2. Creating the (passages, summary) dataset
  3. Training the summarizer model
  4. Running and evaluating the QA pipeline

Recommendations