Accelerating inference, the current bottleneck

Inference with Reference: Lossless Acceleration of Large Language Models
We propose LLMA, an LLM accelerator to losslessly speed up Large Language Model (LLM) inference with references. LLMA is motivated by the observation that there are abundant identical text spans...
https://arxiv.org/abs/2304.04487