Aggregated Sparsity

Creator
Seonglae Cho
Created
2024 Mar 31 5:57
Edited
2024 Mar 31 8:34
Refs
The pattern of internal skip connections used to aggregate outputs of earlier layers for consumption by deeper layers.
Using the ReLU activation function enables an efficient reduction in computation: the paper proposes a strategy that cuts inference-time computation by up to 3x.
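The saving comes from ReLU zeroing out a large fraction of MLP activations, so the matching columns of the down projection never contribute. A minimal numpy sketch (dimensions and names hypothetical, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 64, 256

x = rng.standard_normal(d_model)
up_proj = rng.standard_normal((d_ff, d_model))
down_proj = rng.standard_normal((d_model, d_ff))

h = np.maximum(up_proj @ x, 0.0)   # ReLU: roughly half the entries become 0
active = np.nonzero(h)[0]          # indices of nonzero activations

# Dense vs. sparse path: only down_proj columns for active neurons matter,
# so the rest never need to be read from memory or multiplied.
y_dense = down_proj @ h
y_sparse = down_proj[:, active] @ h[active]

sparsity = 1.0 - len(active) / d_ff
print(f"activation sparsity: {sparsity:.2f}")
print("outputs match:", np.allclose(y_dense, y_sparse))
```

With a random Gaussian input, about half the activations are zero; trained LLMs with ReLU typically show far higher sparsity, which is where the large savings come from.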
openreview.net

low-rank predictor

A binary classifier that only predicts whether each activation will be zero after multiplying by up_proj and applying ReLU. Weights predicted to be zero are never loaded from memory at all, saving I/O and compute.
LLM in a flash: Efficient Large Language Model Inference with...
Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their substantial computational and memory...