finding multiple relevant passages and step-by-step reasoning to answer complex questions.

Multi-hop QA models: Beam Retrieval

Multi-hop QA datasets: HotpotQA, 2WikiMultiHopQA, MuSiQue

The paper below finds only moderate evidence of the second hop of latent reasoning, and this evidence does not become stronger with increasing model size.

Do Large Language Models Latently Perform Multi-Hop Reasoning?
We study whether Large Language Models (LLMs) latently perform multi-hop reasoning with complex prompts such as "The mother of the singer of 'Superstition' is". We look for evidence of a latent...
https://arxiv.org/abs/2402.16837
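To make the two hops in the "Superstition" prompt concrete, here is a minimal sketch that resolves the composition explicitly over a toy knowledge base. `KB` and `compose` are hypothetical names for illustration only, not code from the paper. A model answering the prompt latently would have to do the same two steps internally: first recall the bridge entity (the singer of "Superstition", Stevie Wonder), then recall that entity's mother.

```python
# Hypothetical toy knowledge base of (subject, relation) -> object facts.
KB: dict[tuple[str, str], str] = {
    ("Superstition", "singer"): "Stevie Wonder",
    ("Stevie Wonder", "mother"): "Lula Mae Hardaway",
}


def compose(entity: str, relations: list[str]) -> list[str]:
    """Apply relations one hop at a time and return the chain of entities.

    chain[0] is the start entity, chain[1] is the bridge entity produced by
    the first hop, and chain[-1] is the final answer.
    """
    chain = [entity]
    for relation in relations:
        key = (chain[-1], relation)
        if key not in KB:
            # No stored fact for this hop, so the composition cannot finish.
            raise KeyError(f"no fact for {key}")
        chain.append(KB[key])
    return chain


# "The mother of the singer of 'Superstition' is": "singer" is the first hop
# (inner relation) and "mother" is the second hop (outer relation).
chain = compose("Superstition", ["singer", "mother"])
print(" -> ".join(chain))
```

This prints `Superstition -> Stevie Wonder -> Lula Mae Hardaway`. An explicit pipeline like this one writes out the bridge entity as an intermediate step; the paper instead asks whether an LLM recalls it implicitly while processing the prompt.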