REVELA

Revela’s key idea is an in-batch attention mechanism that models inter-document dependencies within a batch. Similarity scores computed by the retriever serve as attention weights, so when predicting the current sequence the model can draw on context from other relevant documents in the same batch. During training, the retriever learns a probability distribution over in-batch similarities and is jointly optimized with the language model under the next-token prediction (NTP) objective.
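The weighting step described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the paper's implementation: it assumes cosine similarity between document embeddings, a softmax with a hypothetical `temperature` parameter, and one pooled state vector per document instead of token-level attention. The names `in_batch_attention_weights` and `mix_in_batch_context` are made up for this example.

```python
import numpy as np

def in_batch_attention_weights(doc_emb, temperature=1.0):
    """Turn retriever similarities into a distribution over the OTHER documents in the batch."""
    # Assumption: cosine similarity between retriever embeddings.
    emb = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sim = emb @ emb.T / temperature
    # A document should not attend to itself, so mask the diagonal.
    np.fill_diagonal(sim, -np.inf)
    # Numerically stable softmax over each row.
    sim = sim - sim.max(axis=1, keepdims=True)
    weights = np.exp(sim)
    return weights / weights.sum(axis=1, keepdims=True)

def mix_in_batch_context(doc_states, weights):
    """Weighted sum of the other documents' states for each document."""
    return weights @ doc_states

rng = np.random.default_rng(0)
doc_emb = rng.normal(size=(4, 8))      # retriever embeddings, one per document
doc_states = rng.normal(size=(4, 16))  # pooled LM states, one per document (simplification)

weights = in_batch_attention_weights(doc_emb)
context = mix_in_batch_context(doc_states, weights)
print(weights.round(3))
print(context.shape)
```

Each row of `weights` sums to 1 and has a zero on the diagonal. In a trainable version the weights would be differentiable in the retriever parameters, so a language-modeling loss could propagate back into the retriever.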

Architecturally, Revela applies V-normalization in cross-document attention to prevent any single token from dominating, which encourages the model to focus on sequence-level semantic information. Prior methods such as REPLUG compute LM perplexity for each document pair and thus incur a per-pair cost. Revela instead processes all documents jointly in a single forward pass, reducing training complexity to linear. This design scales well even with large batch sizes and large models.
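To see why normalizing values keeps a single token from dominating, consider the sketch below. The text above does not give the exact formula, so this assumes V-normalization means L2-normalizing each value vector before the attention-weighted sum. The function names, the `eps` constant, and the scalar `doc_weight` are hypothetical.

```python
import numpy as np

def v_normalize(values, eps=1e-6):
    """Assumed form of V-normalization: scale each value vector to (almost) unit L2 norm."""
    norms = np.linalg.norm(values, axis=-1, keepdims=True)
    return values / (norms + eps)

def cross_doc_output(token_attn, values, doc_weight, normalize=True):
    """Attend from the current sequence's tokens to another document's tokens.

    token_attn: (T_query, T_key) row-stochastic attention weights
    values:     (T_key, d) value vectors of the other document
    doc_weight: scalar in-batch similarity weight from the retriever
    """
    v = v_normalize(values) if normalize else values
    return doc_weight * (token_attn @ v)

rng = np.random.default_rng(0)
values = rng.normal(size=(6, 4))
values[2] *= 1000.0  # one token with an outsized value vector

# Uniform attention from 3 query tokens over the 6 key tokens.
token_attn = np.full((3, 6), 1.0 / 6.0)

raw = cross_doc_output(token_attn, values, doc_weight=0.5, normalize=False)
normed = cross_doc_output(token_attn, values, doc_weight=0.5, normalize=True)

print(np.linalg.norm(raw, axis=1))     # dominated by the outlier token
print(np.linalg.norm(normed, axis=1))  # bounded by doc_weight
```

With normalized values, each output row is a convex combination of near-unit vectors scaled by `doc_weight`, so its norm cannot exceed `doc_weight`. The document-level weight, rather than any one token's magnitude, then controls how much the other document contributes.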

Empirically, Revela achieves an nDCG@10 score 2.8 points higher than the 7B-parameter supervised model E5-Mistral-7B-Instruct on the CoIR code-retrieval benchmark. It also surpasses previous state-of-the-art unsupervised retrievers such as Contriever and LaPraDoR on BEIR, establishing a new SoTA while using roughly 1000× less training data and 10× less compute than prior approaches.

Revela: Dense Retriever Learning via Language Modeling

Dense retrievers play a vital role in accessing external and specialized knowledge to augment language models (LMs). Training dense retrievers typically requires annotated query-document pairs,...

https://arxiv.org/abs/2506.16552

Revela

TRUMANCFY • Updated 2026 Apr 30 23:17

trumancai/Revela-3b · Hugging Face


https://huggingface.co/trumancai/Revela-3b
