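- seonglae/faithful-gemma2-2b: just regenerate it from scratch
- seonglae/faithful-llama3.2-1b: regenerate this one too? The two overlapping ones are not going away
- from the end
- seonglae/faithful-pythia-410m: this one is fine, so leave it as is
- The rest can probably be uploaded as they are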
Dataset Statistics for seonglae/faithful-pythia-410m:
Total number of commits processed: 2
Total number of unique commits: 1
Total number of rows: 6641
Total number of unique sequences: 6641
Total number of tokens: 4000035
Total number of tokens in unique sequences: 4000035

Processing seonglae/faithful-llama3.2-3b
Total number of commits processed: 2
Total number of unique commits: 1
Total number of rows: 238000
Total number of unique sequences: 237986
Total number of tokens: 100395739
Total number of tokens in unique sequences: 100394553
Merged dataset pushed to seonglae/faithful-llama3.2-3b

Dataset Statistics for seonglae/faithful-llama3.1-8b:
Total number of commits processed: 20
Total number of unique commits: 19
Total number of rows: 454348
Total number of unique sequences: 453885
Total number of tokens: 180268487
Total number of tokens in unique sequences: 180199703
Merged dataset pushed to seonglae/faithful-llama3.1-8b

Processing seonglae/faithful-gpt2-small
Dataset Statistics for seonglae/faithful-gpt2-small:
Total number of commits processed: 12
Total number of unique commits: 11
Total number of rows: 147000
Total number of unique sequences: 146594
Total number of tokens: 100548721
Total number of tokens in unique sequences: 100543674
Merged dataset pushed to seonglae/faithful-gpt2-small
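For reference, the row, unique-sequence, and token counts in a log like the one above could be computed roughly as follows. This is only a sketch: it assumes each row is a list of token ids, and `dataset_stats` is a hypothetical helper name, not the function the actual merge script uses.

```python
from typing import Dict, Iterable, Sequence


def dataset_stats(rows: Iterable[Sequence[int]]) -> Dict[str, int]:
    """Count rows, unique sequences, and tokens (hypothetical helper).

    Assumption: every row is one sequence of token ids.
    """
    seen = set()
    total_rows = 0
    total_tokens = 0
    unique_tokens = 0
    for row in rows:
        key = tuple(row)  # lists are not hashable, tuples are
        total_rows += 1
        total_tokens += len(key)
        if key not in seen:
            # First time this exact sequence appears
            seen.add(key)
            unique_tokens += len(key)
    return {
        "rows": total_rows,
        "unique_sequences": len(seen),
        "tokens": total_tokens,
        "tokens_in_unique_sequences": unique_tokens,
    }
```

A small gap between "rows" and "unique_sequences" means only a few exact duplicate sequences exist in the merged data.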
They are all over 90% duplicated, so if new data overlaps the existing data by even more than 10%, raise an error and skip the merge.
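The 10% guard could look something like the sketch below. It assumes rows are lists of token ids and that overlap means exact sequence matches; `OverlapError`, `overlap_ratio`, `merge_if_fresh`, and the `max_overlap` parameter are hypothetical names chosen for illustration, not part of the existing script.

```python
from typing import List, Sequence


class OverlapError(ValueError):
    """Raised when new data duplicates too much of the existing data."""


def overlap_ratio(existing: Sequence[Sequence[int]],
                  new: Sequence[Sequence[int]]) -> float:
    """Fraction of new rows that already exist (exact match)."""
    if not new:
        return 0.0
    seen = {tuple(row) for row in existing}
    duplicated = sum(1 for row in new if tuple(row) in seen)
    return duplicated / len(new)


def merge_if_fresh(existing: Sequence[Sequence[int]],
                   new: Sequence[Sequence[int]],
                   max_overlap: float = 0.10) -> List[List[int]]:
    """Append new rows unless more than max_overlap of them are duplicates."""
    ratio = overlap_ratio(existing, new)
    if ratio > max_overlap:
        # Fail loudly instead of silently pushing mostly duplicated data
        raise OverlapError(
            f"{ratio:.1%} of new rows already exist (limit {max_overlap:.0%})"
        )
    seen = {tuple(row) for row in existing}
    merged = [list(row) for row in existing]
    # Below the limit: keep only the rows that are actually new
    merged.extend(list(row) for row in new if tuple(row) not in seen)
    return merged
```

Running the check before the push step means a re-uploaded shard aborts the merge instead of inflating the dataset.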
Seonglae Cho