BM25

TF-IDF + passage length

Two Poisson model

An algorithm that is still widely used today in search engines, recommender systems, and similar applications
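The "TF-IDF + passage length" summary above can be made concrete with a minimal sketch of the commonly cited Okapi BM25 scoring formula. The function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, and the smoothed IDF variant are assumptions chosen for illustration, not something these notes specify.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against a tokenized `query`.

    Hypothetical helper for illustration; k1 and b are conventional defaults.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized.
        norm = 1.0 - b + b * len(d) / avg_len
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term-frequency component.
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


docs = [
    ["bm25", "ranks", "documents"],
    ["bm25", "bm25", "ranks"],
    ["unrelated", "text", "here"],
]
print(bm25_scores(["bm25"], docs))
```

The IDF factor plays the role of the TF-IDF weighting, while `norm` is where passage length enters: a document with the same term count but more tokens receives a lower score.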

In hybrid search systems, the conventional multiplicative combination of signals (the Product Rule) can suffer from conjunction shrinkage: as more signals are combined, overall confidence can decrease. This work highlights that issue and proposes a log-odds–based combination framework to address it. Rather than a simple arithmetic aggregation, it follows Bayesian inference principles to amplify confidence when multiple pieces of evidence agree.
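A small numeric sketch of the contrast described above. It assumes a naive-Bayes-style sum of log-odds under conditional independence with a hypothetical uniform prior of 0.5; the framework proposed in the work may weight or normalize the signals differently, so the function `log_odds_combine` is only an illustration of the general idea.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def product_rule(probs):
    """Multiplicative combination: shrinks as more signals are added."""
    out = 1.0
    for p in probs:
        out *= p
    return out


def log_odds_combine(probs, prior=0.5):
    """Add each signal's evidence (log-odds relative to the prior).

    Assumption: signals are conditionally independent given relevance.
    """
    prior_lo = logit(prior)
    total = prior_lo + sum(logit(p) - prior_lo for p in probs)
    return sigmoid(total)


signals = [0.8, 0.8, 0.8]
print(product_rule(signals))      # below every individual signal
print(log_odds_combine(signals))  # above every individual signal
```

With three agreeing signals of 0.8, the product falls to about 0.51 even though every signal favors relevance, while the log-odds combination rises to about 0.98. Two conflicting signals such as 0.8 and 0.2 cancel out and return the prior.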

From Bayesian Inference to Neural Computation: The Analytical Emergence of Neural Network Structure from Probabilistic Relevance Estimation

Abstract

This paper demonstrates that the computational structure of a two-layer feedforward neural network with sigmoid activations is not merely an engineering artifact but emerges analytically from first-principles Bayesian inference over multiple relevance signals in information retrieval. We reverse the conventional explanatory direction of neural networks. Starting from a purely probabilistic question ("What is the probability that a document is relevant given multiple evidence signals?"), we apply Bayes' theorem to derive sigmoid calibration, introduce a log-odds conjunction framework to resolve probabilistic shrinkage, and prove that the resulting end-to-end computation is formally isomorphic to a feedforward neural network.

Crucially, this derived structure naturally extends to explain modern deep learning components. We show that Sigmoid and ReLU activations answer complementary probabilistic questions ("How probable?" vs. "How much?"), identify the Attention mechanism as a form of context-dependent Bayesian model averaging, and prove that WAND/Block-Max WAND algorithms constitute exact, safety-guaranteed neural pruning methods.

Key Contributions

Analytical Derivation of Neural Structure: We prove that a two-layer neural network with sigmoid activations is the mathematical necessity for combining multiple calibrated probability signals, establishing an isomorphism between Bayesian inference and neural computation.

Unified Theory of Activations (Sigmoid & ReLU): We demonstrate that the Sigmoid is the unique solution for probability estimation (Bernoulli canonical link), while ReLU is derived as the MAP estimator under sparse non-negative priors. This explains the specific roles of hidden layers (feature quantity detection) and output layers (probability judgment) in deep networks.

Probabilistic Foundation of Attention: We show that the Attention mechanism, specifically the weighted sum in log-space, is the optimal method for aggregating evidence when signal reliability is context-dependent. This frames Attention as Bayesian Model Averaging.

Exact Neural Pruning: We establish that Information Retrieval algorithms like WAND and BMW provide mathematically exact pruning for the derived sigmoid networks, offering a method to skip computation without any loss in accuracy, a guarantee unattainable with unbounded activations alone.

Reverse Interpretability: By viewing architecture design as "probabilistic question sequencing," we propose a new interpretability framework where the activation function of each layer explicitly identifies the type of inference (e.g., quantity estimation vs. belief update) it performs.
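To illustrate how Bayes' theorem can produce a sigmoid over a raw score, here is a minimal numeric check under one textbook assumption that is not taken from the paper: the score is Gaussian with the same variance for relevant and non-relevant documents. All names and parameter values below are hypothetical, and the paper's own derivation may rest on different likelihood assumptions.

```python
import math


def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_by_bayes(s, mu_rel, mu_non, sigma, prior):
    """P(relevant | score) computed directly from Bayes' theorem."""
    num = gaussian_pdf(s, mu_rel, sigma) * prior
    den = num + gaussian_pdf(s, mu_non, sigma) * (1.0 - prior)
    return num / den


def posterior_by_sigmoid(s, mu_rel, mu_non, sigma, prior):
    """The same posterior written as sigmoid(a * s + b)."""
    # Slope comes from the log-likelihood ratio of the two Gaussians.
    a = (mu_rel - mu_non) / sigma ** 2
    # Bias absorbs the constant term plus the prior log-odds.
    b = (mu_non ** 2 - mu_rel ** 2) / (2.0 * sigma ** 2)
    b += math.log(prior / (1.0 - prior))
    return 1.0 / (1.0 + math.exp(-(a * s + b)))


for score in (0.0, 1.0, 2.5, 4.0):
    direct = posterior_by_bayes(score, 3.0, 1.0, 1.5, 0.1)
    via_sigmoid = posterior_by_sigmoid(score, 3.0, 1.0, 1.5, 0.1)
    print(score, direct, via_sigmoid)
```

The two columns agree to floating-point precision, which shows that under this assumption the posterior is a sigmoid of a linear function of the score, with the prior appearing only in the bias term.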

https://zenodo.org/records/18661379

BM25

TF-IDF + passage length

BM25+Relevance Feedback based on Contingency Table

Bayesian BM25

BB25

Recommendations

BM25+Relevance Feedback based on Contingency Table