Accelerates inference, which is currently the bottleneck
Inference with Reference: Lossless Acceleration of Large Language Models
We propose LLMA, an LLM accelerator to losslessly speed up Large Language Model (LLM) inference with references. LLMA is motivated by the observation that there are abundant identical text spans...
https://arxiv.org/abs/2304.04487
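
A minimal sketch of how identical spans between a reference and the output could be used to save decoding steps while keeping the result identical to plain greedy decoding. It is a toy under stated assumptions, not the paper's implementation: `next_token`, `find_copy_span`, `decode_with_reference`, `match_len`, `copy_len`, and the stub model are all hypothetical names for illustration. The inner verification loop stands in for what a real model would check in a single parallel forward pass.

```python
from typing import Callable, List, Sequence, Tuple

Token = str
# Hypothetical stand-in for an LLM: returns the greedy next token for a context.
NextToken = Callable[[Sequence[Token]], Token]


def find_copy_span(output: Sequence[Token], reference: Sequence[Token],
                   match_len: int, copy_len: int) -> List[Token]:
    """Find the last `match_len` output tokens in the reference and
    return up to `copy_len` reference tokens that follow the match."""
    if len(output) < match_len:
        return []
    suffix = list(output[-match_len:])
    for i in range(len(reference) - match_len + 1):
        if list(reference[i:i + match_len]) == suffix:
            start = i + match_len
            return list(reference[start:start + copy_len])
    return []


def decode_with_reference(next_token: NextToken, prompt: Sequence[Token],
                          reference: Sequence[Token], max_new_tokens: int,
                          match_len: int = 2, copy_len: int = 4,
                          eos: Token = "<eos>") -> Tuple[List[Token], int]:
    """Greedy decoding that drafts tokens from the reference and keeps
    only those the model itself would have produced."""
    output = list(prompt)
    generated = 0
    steps = 0  # counts verification steps (one parallel pass each, conceptually)
    while generated < max_new_tokens:
        draft = find_copy_span(output, reference, match_len, copy_len)
        steps += 1
        accepted: List[Token] = []
        # Simulates one parallel pass: check the model's prediction at each
        # draft position and stop at the first disagreement.
        for j in range(len(draft) + 1):
            pred = next_token(output + draft[:j])
            accepted.append(pred)  # the model's own token is always kept
            if j == len(draft) or pred != draft[j]:
                break
        for tok in accepted:
            if generated >= max_new_tokens:
                break
            output.append(tok)
            generated += 1
            if tok == eos:
                return output[len(prompt):], steps
    return output[len(prompt):], steps


def greedy_decode(next_token: NextToken, prompt: Sequence[Token],
                  max_new_tokens: int, eos: Token = "<eos>") -> List[Token]:
    """Baseline: one model call per generated token."""
    output = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(output)
        output.append(tok)
        if tok == eos:
            break
    return output[len(prompt):]


# Toy usage: a stub "model" that deterministically continues a fixed sentence.
TARGET = "the cat sat on the mat and slept".split()


def toy_model(context: Sequence[Token]) -> Token:
    return TARGET[len(context)] if len(context) < len(TARGET) else "<eos>"


prompt = TARGET[:2]
reference = "a cat sat on the mat today".split()

fast_tokens, fast_steps = decode_with_reference(toy_model, prompt, reference, 16)
slow_tokens = greedy_decode(toy_model, prompt, 16)
print(fast_tokens == slow_tokens, fast_steps, len(slow_tokens))
```

Because every accepted token is the model's own greedy prediction, the output matches the baseline token for token; the reference only determines how many positions can be confirmed per step. In this toy run the overlap "sat on the mat" lets one step confirm several tokens at once, so fewer steps are needed than tokens generated.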


Seonglae Cho