TensorRT-LLM

Creator: Seonglae Cho
Created: 2023 Oct 24 4:10
Edited: 2025 Mar 25 17:43
nvidia
Fast, but it supports few models, and for some reason the output is deterministic even with top_p set to 1. Probably a bug?
import concurrent.futures
import gc

import torch
from tqdm import tqdm
from transformers import AutoTokenizer

# merge_and_push_dataset(data, repo, fields) is a helper that is not shown on this page.


def tensorrt_llm(
    model_name="google/gemma-2-2b",
    max_tokens=1024,
    temperature=1.0,
    top_p=1.0,
    batch_size=1024,
    total_tokens=1e8,
    repo="seonglae/faithful-gemma2-2b",
    upload_interval=1e7,
):
    # Imported inside the function so the module loads without TensorRT-LLM.
    from tensorrt_llm import LLM, SamplingParams

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    bos_token_id = tokenizer.bos_token_id
    llm = LLM(model=model_name)

    fields = ["id", "seed", "temp", "top_p", "text", "tokens"]
    all_data = []
    tokens_generated = 0
    sample_id = 0
    last_upload = 0
    pbar = tqdm(total=total_tokens, unit='tokens')

    while tokens_generated < total_tokens:
        # Unconditional generation: every prompt is just the BOS token.
        prompt_token_ids = [[bos_token_id]] * batch_size
        # One SamplingParams per request so each sample gets its own seed.
        sampling_params = [
            SamplingParams(
                temperature=temperature,
                top_p=top_p,
                max_tokens=max_tokens - 1,
                seed=i + sample_id,
                random_seed=None,
            )
            for i in range(batch_size)
        ]
        outputs = llm.generate(inputs=prompt_token_ids, sampling_params=sampling_params)

        for i, output in enumerate(outputs):
            text = output.outputs[0].text
            # Token count comes from re-encoding the decoded text.
            num_tokens = len(tokenizer.encode(text))
            all_data.append({
                "id": sample_id,
                "seed": sampling_params[i].seed,
                "temp": temperature,
                "top_p": top_p,
                "text": text,
                "tokens": num_tokens,
            })
            tokens_generated += num_tokens
            sample_id += 1
            pbar.update(num_tokens)

        # Push periodically so a crash does not lose everything.
        if tokens_generated - last_upload >= upload_interval:
            merge_and_push_dataset(all_data, repo, fields)
            last_upload = tokens_generated
        if tokens_generated >= total_tokens:
            break

    # Final push of whatever has been collected.
    if all_data:
        merge_and_push_dataset(all_data, repo, fields)

    pbar.close()
    # Release the engine and free GPU memory.
    del llm
    gc.collect()
    torch.cuda.empty_cache()


def wait_on_first_completed(futures):
    """Wait for the first future to complete and return done and not done futures."""
    done, not_done = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED
    )
    return done, list(not_done)
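One way to check whether sampling really is deterministic is to generate once per seed and count how many distinct texts come back. The sketch below does not depend on any backend: `generate` is a hypothetical callable (an assumption, not part of TensorRT-LLM) that takes a seed and returns one generated string, for example a thin wrapper around a single `llm.generate` call.

```python
from collections import Counter
from typing import Callable, Sequence


def distinct_fraction(texts: Sequence[str]) -> float:
    """Fraction of unique strings in `texts` (1.0 means all different)."""
    if not texts:
        raise ValueError("need at least one sample")
    return len(set(texts)) / len(texts)


def check_sampling(generate: Callable[[int], str], seeds: Sequence[int]) -> dict:
    """Call `generate(seed)` once per seed and summarize output diversity.

    `generate` is a hypothetical wrapper around the backend under test.
    """
    texts = [generate(seed) for seed in seeds]
    _, most_common_count = Counter(texts).most_common(1)[0]
    return {
        "samples": len(texts),
        "distinct_fraction": distinct_fraction(texts),
        "most_common_count": most_common_count,
        # Only meaningful with more than one sample.
        "looks_deterministic": len(texts) > 1 and len(set(texts)) == 1,
    }
```

If `looks_deterministic` is true across many different seeds at temperature 1.0, the seeds or sampling settings are probably not reaching the sampler.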
 
 
 
 

Sampling options
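For reference, here is a minimal sketch of what temperature plus top-p (nucleus) sampling is usually taken to mean. It is the textbook technique written in plain Python, not TensorRT-LLM's implementation, and the function names are made up for illustration. With `top_p=1.0` the whole vocabulary stays in the candidate set, so at temperature 1.0 repeated draws are expected to vary.

```python
import math
import random
from typing import Optional, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs: Sequence[float], top_p: float) -> list[int]:
    """Indices of the smallest high-probability set whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept


def sample_top_p(
    logits: Sequence[float],
    temperature: float = 1.0,
    top_p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Draw one token index from the renormalized nucleus."""
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    kept = nucleus(probs, top_p)
    # random.choices renormalizes the weights internally.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

A small `top_p` collapses the nucleus to the single most likely token, which is the case where deterministic output would be expected.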

API Reference — tensorrt_llm documentation
model (str or Path) – The model name or a local model directory. Note that if the value could be both a model name or a local model directory, the local model directory will be prioritized.
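The precedence rule quoted above can be pictured with a small helper. This is only an illustration of the documented behavior (local directory wins when the value could be either), not TensorRT-LLM's actual resolution code; `resolve_model` and its return labels are hypothetical.

```python
from pathlib import Path
from typing import Tuple, Union


def resolve_model(model: Union[str, Path]) -> Tuple[str, str]:
    """Classify `model` as a local directory or a hub model name.

    Mirrors the documented precedence only: if a directory with this
    name exists locally, it is preferred over a hub lookup.
    """
    path = Path(model)
    if path.is_dir():
        return ("local", str(path.resolve()))
    return ("hub", str(model))
```

In practice this means a stray local folder named like a hub model, for example `google/gemma-2-2b` relative to the working directory, would be picked up instead of the hub model.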
 
 
 

Recommendations