Multi-modal approaches far outperform text-only RAG on visual documents such as slide decks
Text embeddings of image or audio summaries are also good enough for retrieval
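A minimal sketch of summary-based retrieval, under stated assumptions: the file names and summaries in `SUMMARIES` are hypothetical, and `embed` is a toy bag-of-words stand-in for a real text embedding model. The point is only that the index is built over summary text while the lookup returns the original image or audio source.

```python
import math
import re
from collections import Counter

# Hypothetical data: each image or audio file is paired with a text summary.
# In practice a multimodal LLM or a speech model would produce these summaries.
SUMMARIES = {
    "slide_03.png": "Bar chart of quarterly revenue growth by region",
    "slide_07.png": "Architecture diagram of the data ingestion pipeline",
    "call_01.mp3": "Customer call about a delayed invoice and refund request",
}


def embed(text):
    """Toy bag-of-words vector; an assumption standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Index the summary embeddings, keyed by the original (non-text) source.
INDEX = {source: embed(summary) for source, summary in SUMMARIES.items()}


def retrieve(query, k=1):
    """Return the k sources whose summaries are most similar to the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda s: cosine(q, INDEX[s]), reverse=True)
    return ranked[:k]
```

Calling `retrieve("revenue growth chart")` ranks the summaries against the query and returns the matching source file, which could then be passed to a multimodal model for answer generation.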
Multimodal Retrievals
Multi-modal RAG on slide decks
Key Links
* LangChain public benchmark evaluation notebooks
* LangChain template for multi-modal RAG on presentations
Motivation
Retrieval augmented generation (RAG) is one of the most important concepts in LLM app development. Documents of many types can be passed into the context window of an LLM, enabling interactive chat or Q+A
https://blog.langchain.dev/multi-modal-rag-template/


Seonglae Cho