Prompt compression, Context compression
The top priority is to use the ever-growing chat history as a long-term memory proxy, organized as either a vector-based hierarchy or a file-system hierarchy. Concretely, this can be implemented as chat-history RAG or file-system-based retrieval.
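Below is a minimal sketch of the vector-based variant: past turns are embedded, and only the turns most relevant to the current query are recalled into the prompt. The embedding model and the remember/recall helpers are illustrative assumptions, not a prescribed implementation.

```python
# Minimal chat-history RAG sketch: embed past turns, retrieve the most
# relevant ones, and feed only those back into the context window.
# Assumes sentence-transformers is installed; the model choice is illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
history: list[str] = []   # full chat log (the "long-term memory")
history_embeddings = []   # one vector per stored turn

def remember(turn: str) -> None:
    """Store a finished turn and its embedding."""
    history.append(turn)
    history_embeddings.append(model.encode(turn, convert_to_tensor=True))

def recall(query: str, k: int = 3) -> list[str]:
    """Return the k past turns most similar to the current query."""
    if not history:
        return []
    query_emb = model.encode(query, convert_to_tensor=True)
    scores = [float(util.cos_sim(query_emb, emb)) for emb in history_embeddings]
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    return [history[i] for i in top]

# Usage: the history keeps growing, but the prompt stays small because
# only retrieved turns re-enter the context window.
remember("User asked how to configure the vector index.")
remember("Assistant explained HNSW parameters ef and M.")
print(recall("What were the HNSW settings again?"))
```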
A system that manages context the way a CPU cache hierarchy does, with levels and layers of understanding, so that tasks can be mapped to code-edit locations. This routing hierarchy can be implemented as a subagent, a memory folder, or hooks.
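A rough sketch of the cache analogy follows: recent turns stay verbatim in a small L1, evicted turns are summarized into L2, and old summaries spill into an L3 backing store. The capacities and the summarize/vector_search helpers are hypothetical placeholders for an LLM summarization call and a real retriever.

```python
# CPU-cache-style context hierarchy sketch.
# L1 = raw recent turns (small, fast), L2 = running summaries,
# L3 = file-system/vector store (large, slow).
from collections import deque

L1_CAPACITY = 8    # recent turns kept verbatim
L2_CAPACITY = 32   # summarized chunks

l1: deque = deque(maxlen=L1_CAPACITY)
l2: deque = deque(maxlen=L2_CAPACITY)
l3: list[str] = []  # stand-in for a disk / vector store

def summarize(text: str) -> str:
    return text[:80]  # placeholder for an LLM summarization call

def vector_search(query: str, store: list[str], k: int) -> list[str]:
    return store[-k:]  # placeholder for real similarity search

def add_turn(turn: str) -> None:
    """New turns enter L1; evicted entries cascade down the hierarchy."""
    if len(l1) == L1_CAPACITY:
        evicted = l1[0]              # the deque drops this on append
        if len(l2) == L2_CAPACITY:
            l3.append(l2[0])         # spill the oldest summary to L3
        l2.append(summarize(evicted))
    l1.append(turn)

def build_context(query: str) -> str:
    """Assemble a prompt: a few L3 hits, all of L2, all of L1, then the query."""
    recalled = vector_search(query, l3, k=3)
    return "\n".join([*recalled, *l2, *l1, query])
```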
Contextual Compression
Representation Compression
Survey
Prompt Compression for Large Language Models: A Survey
Leveraging large language models (LLMs) for complex natural language tasks typically requires long-form prompts to convey detailed requirements and information, which results in increased memory usage and inference costs. To mitigate these challenges, multiple efficient methods have been proposed, with prompt compression gaining significant research interest. This survey provides an overview of prompt compression techniques, categorized into hard prompt methods and soft prompt methods. First, the technical approaches of these methods are compared, followed by an exploration of various ways to understand their mechanisms, including the perspectives of attention optimization, Parameter-Efficient Fine-Tuning (PEFT), modality integration, and new synthetic language. We also examine the downstream adaptations of various prompt compression techniques. Finally, the limitations of current prompt compression methods are analyzed, and several future directions are outlined, such as optimizing the compression encoder, combining hard and soft prompt methods, and leveraging insights from multimodality.
https://arxiv.org/html/2410.12388v2
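As a concrete example of a hard prompt method, the sketch below prunes low-surprisal tokens using a small causal LM as the scorer, in the spirit of LLMLingua-style compression. GPT-2 and the keep_ratio parameter are illustrative assumptions; this is not the survey's or LLMLingua's actual implementation.

```python
# Hard-prompt-compression sketch: score each token's surprisal under a
# small causal LM and keep only the most informative tokens.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def compress(prompt: str, keep_ratio: float = 0.5) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Surprisal of token t given tokens < t (positions shifted by one).
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    surprisal = -log_probs.gather(1, ids[0, 1:, None]).squeeze(1)
    k = max(1, int(surprisal.numel() * keep_ratio))
    keep = torch.topk(surprisal, k).indices.sort().values + 1  # +1 undoes the shift
    kept_ids = torch.cat([ids[0, :1], ids[0][keep]])  # always keep the first token
    return tokenizer.decode(kept_ids)

print(compress("Please carefully summarize the following long report about quarterly sales."))
```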
Contextual compression | 🦜️🔗 Langchain
One challenge with retrieval is that usually you don't know the specific queries your document storage system will face when you ingest data into the system. This means that the information most relevant to a query may be buried in a document with a lot of irrelevant text. Passing that full document through your application can lead to more expensive LLM calls and poorer responses.
https://python.langchain.com/docs/modules/data_connection/retrievers/contextual_compression
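A sketch following the linked docs: a base retriever is wrapped so an LLM extracts only the query-relevant parts of each retrieved document before it reaches the application. Import paths vary across LangChain versions, and FAISS and OpenAI here are illustrative choices.

```python
# Contextual compression retriever per the LangChain docs above.
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAI, OpenAIEmbeddings

# Any vector store works as the base retriever; FAISS is illustrative.
docs = ["...your ingested documents..."]
retriever = FAISS.from_texts(docs, OpenAIEmbeddings()).as_retriever()

# The extractor asks an LLM to keep only the query-relevant parts of each hit.
compressor = LLMChainExtractor.from_llm(OpenAI(temperature=0))
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever,
)

# Returned documents contain only passages relevant to the query,
# cutting the tokens passed to the downstream LLM call.
compressed_docs = compression_retriever.invoke("What changed in the retriever API?")
```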

Seonglae Cho