Huggingface Transformers Profile

Created

2023 Nov 18 15:35

Editor

Seonglae Cho

Creator

Seonglae Cho

Edited

2026 Jan 7 11:49

Refs

Batch-size 1 inference run on a HuggingFace Transformers model with very poor CPU/GPU overlap

Compile torch

torch.compile()
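
A minimal sketch of wrapping a model with torch.compile() for inference. TinyMLP is a hypothetical stand-in for a Transformers model, and mode="reduce-overhead" is an assumed choice here: it targets per-call Python and kernel-launch overhead, which tends to matter most at small batch sizes such as the batch-size 1 run above.

```python
import torch
import torch.nn as nn


class TinyMLP(nn.Module):
    """Hypothetical stand-in for a Transformers model (not from the original notes)."""

    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def compile_for_inference(model: nn.Module, mode: str = "reduce-overhead") -> nn.Module:
    """Return a lazily compiled wrapper; nothing is compiled until the first call."""
    return torch.compile(model.eval(), mode=mode)


@torch.no_grad()
def infer(module: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Run one forward pass without tracking gradients."""
    return module(x)


model = TinyMLP()
compiled = compile_for_inference(model)

x = torch.randn(1, 16)       # batch size 1, as in the profiled run
eager_out = infer(model, x)  # eager baseline for comparison
```

Calling infer(compiled, x) triggers compilation on the first call, so that call should be excluded from any timing; later calls with the same input shape reuse the compiled graph. Whether this helps a given model is something to confirm with a profiler rather than assume.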

Pipeline Batching

https://huggingface.co/docs/transformers/main_classes/pipelines#pipeline-batching

Pipelines
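
A minimal sketch of pipeline batching as described in the linked Pipelines documentation. The function name classify_batched, the "text-classification" task, and the default batch size of 8 are illustrative assumptions, not values from these notes. The transformers import is deferred into the function so that defining it needs neither the library nor a model download.

```python
from typing import Any, Dict, List


def classify_batched(
    texts: List[str], batch_size: int = 8, device: int = -1
) -> List[Dict[str, Any]]:
    """Run a text-classification pipeline over `texts` in batches.

    device=-1 runs on CPU; device=0 would select the first GPU.
    """
    # Deferred import: the first call downloads the task's default model.
    from transformers import pipeline

    pipe = pipeline("text-classification", device=device)
    # batch_size controls how many inputs share one forward pass.
    return pipe(texts, batch_size=batch_size)
```

A call such as classify_batched(["good", "bad"], batch_size=2) returns one result dict per input. The linked documentation notes that batching is not always faster, so it is worth measuring throughput at a few batch sizes on the actual hardware before settling on one.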

https://huggingface.co/docs/transformers/perf_infer_gpu_one

GPU inference

Speed, Python: Pick Two. How CUDA Graphs Enable Fast Python Code for Deep Learning (Korean translation)

A translation, shared with permission, of a post by Fireworks.ai, a company that develops and operates a generative AI platform. The original post by Fireworks.ai can be reached through the link below. ⚠ This post includes use cases and promotion for the Fireworks Platform. Authors: James K Reed, Dmytro Dzhulgakov. This is the second in a series of technical blog posts about the techniques Fireworks uses to optimize its high-performance Fireworks Gen AI Platform. See also the earlier post on multi-query attention.

https://discuss.pytorch.kr/t/cuda-speed-python-pick-two-how-cuda-graphs-enable-fast-python-code-for-deep-learning/2441
