Research Note: Model Quantization 230711

Date

2023 Jul 11 0:0 → 2023 Jul 12 0:0

Created by

Seonglae Cho

Created time

2023 Jul 10 18:29

Last edited by

Seonglae Cho

Last edited time

2023 Aug 20 9:15

Refs

Few models can actually be quantized, and errors are frequent, especially this one:

CUDA inference: issue with group_size = 1024 + desc_act = False. (Triton unaffected)

Updated 2023 Jul 9 18:41

This error comes up often, for example with openlm-research/open_llama_3b or psmathur/orca_mini_3b.
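For context on the group_size setting named in the issue title above, here is a minimal sketch of group-wise 4-bit quantization. It uses plain round-to-nearest, not GPTQ, and it is not the code of any quantization library; the function names are hypothetical, and requiring the weight count to be a multiple of group_size is a simplification of this sketch. It does not reproduce the reported CUDA issue. It only shows that each group of group_size weights shares one scale and one offset.

```python
# Illustrative only: group-wise 4-bit round-to-nearest quantization.
# NOT GPTQ and not any library's implementation; all names are hypothetical.

def quantize_groupwise(weights, group_size, bits=4):
    """Quantize a flat list of floats; each group gets its own (scale, offset)."""
    if len(weights) % group_size != 0:
        # Simplification of this sketch: no padding of a partial last group.
        raise ValueError("len(weights) must be a multiple of group_size")
    qmax = 2 ** bits - 1  # 15 for 4-bit codes
    groups = []
    for start in range(0, len(weights), group_size):
        chunk = weights[start:start + group_size]
        lo, hi = min(chunk), max(chunk)
        scale = (hi - lo) / qmax or 1.0  # fall back to 1.0 for a constant group
        codes = [round((w - lo) / scale) for w in chunk]
        groups.append((scale, lo, codes))
    return groups


def dequantize_groupwise(groups):
    """Map integer codes back to approximate float weights."""
    out = []
    for scale, lo, codes in groups:
        out.extend(lo + scale * c for c in codes)
    return out


if __name__ == "__main__":
    w = [i / 10 for i in range(-8, 8)]       # 16 toy weights
    g = quantize_groupwise(w, group_size=8)  # 2 groups of 8
    print(len(g), dequantize_groupwise(g)[:3])
```

A larger group_size means fewer scale and offset values to store, but each one has to cover a wider range of weights, so the rounding error per weight tends to grow.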

This one can be quantized, but it does not work because the model is too small.

seonglae/opt-125m-4bit-gptq · Hugging Face

https://huggingface.co/seonglae/opt-125m-4bit-gptq

This one can be quantized, but loading it again produces the error below. Not sure why.

seonglae/tulu-7b-4bit-gptq · Hugging Face

https://huggingface.co/seonglae/tulu-7b-4bit-gptq
