Modality determines the type of data contained in a data point. Unlike unimodal AI, multimodal AI integrates or exchanges information across vision, text, speech, touch, and smell gathered from diverse sensors.

Multimodal AI Models
* Vision Language Model
* Spoken Language Model
* Amazon Nova Multimodal AI

Related topics
* Multimodality Fusion (see the sketch after the links)
* Multimodal Dataset
* Multimodal Benchmarks
* McGurk effect
* Cross-Modal Retrieval (see the sketch after the links)
* Pixel Aligned Language Model
* CS224n

Links
* Stanford CS 224N | Natural Language Processing with Deep Learning
  https://web.stanford.edu/class/cs224n/
* Multimodality and Large Multimodal Models (LMMs): "For a long time, each ML model operated in one data mode – text (translation, language modeling), image (object detection, image classification), or audio (speech recognition)."
  https://huyenchip.com/2023/10/10/multimodal.html
* What Is Multimodal AI? Applications, principles, and core research challenges in multimodal AI.
  https://app.twelvelabs.io/blog/what-is-multimodal-ai
* [MiracleLetter] AI is becoming one?! A "reference book" for high performers who practice the miracle morning.
  https://stibee.com/api/v1.0/emails/share/Ue8q7jhsWwUJw3s7XtO19eGpnGJDUJ4=
* thegenerality.com
  https://thegenerality.com/agi/?fbclid=IwAR3aPmwPe6CC7INiPtKlq3SxShfP_-l4LsfvCmS-I6ChQgIAl5qfuQLz_YEUX
* Multi-Modal AI is a UX Problem: "Transformers and other AI breakthroughs have shown state-of-the-art performance across different modalities:
  * Text-to-Text (OpenAI ChatGPT)
  * Text-to-Image (Stable Diffusion)
  * Image-to-Text (OpenAI CLIP)
  * Speech-to-Text (OpenAI Whisper)
  * Text-to-Speech (Meta's Massively Multilingual Speech)
  * Image-to-Image (img2img or pix2pix)
  * Text-to-Audio (Meta MusicGen)
  * Text-to-Code (OpenAI Codex / GitHub Copilot)
  * Code-to-Text (ChatGPT, etc.)
  The next frontier in AI is combining these modalities."
  https://matt-rickard.com/multi-modal-ai-is-a-ux-problem
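
Multimodality Fusion in miniature: a late-fusion sketch where each modality is encoded separately and the embeddings are only combined at the classification head. All dimensions, names, and the classifier itself are illustrative assumptions, not taken from the linked sources.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Hypothetical late-fusion head: concat per-modality embeddings, then classify."""
    def __init__(self, image_dim=512, text_dim=768, hidden_dim=256, num_classes=10):
        super().__init__()
        # Project each modality into a shared space before fusing.
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim * 2, num_classes)

    def forward(self, image_emb, text_emb):
        # Concatenation is the simplest fusion operator; attention-based
        # fusion would mix the modalities earlier and more deeply.
        fused = torch.cat([self.image_proj(image_emb),
                           self.text_proj(text_emb)], dim=-1)
        return self.head(torch.relu(fused))

# Dummy embeddings standing in for real encoder outputs (e.g. a ViT and a BERT).
logits = LateFusionClassifier()(torch.randn(4, 512), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 10])
```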
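
Cross-Modal Retrieval in practice, using OpenAI CLIP (listed above under Image-to-Text) through the Hugging Face transformers API: score an image against candidate captions and pick the best match. A minimal sketch; the image path and captions are placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # any local image; the path is a placeholder
captions = ["a photo of a cat", "a diagram of a transformer", "a photo of a dog"]

# The processor tokenizes the text and preprocesses the image in one call.
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax ranks the captions.
probs = outputs.logits_per_image.softmax(dim=-1)
print(captions[probs.argmax().item()], probs.tolist())
```

The same embeddings support retrieval in either direction (text query over an image index, or image query over a caption index), which is the core idea behind cross-modal search.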