Modality determines the type of data contained in a data point. Unlike unimodal AI, multimodal AI integrates and exchanges information across modalities such as vision, text, speech, touch, and smell, collected from diverse sensors.

Multimodal AI Models
* Vision Language Model
* Spoken Language Model

Multimodal AI Concepts
* Multimodality Fusion (see the fusion sketch after the references)
* Multimodal Dataset
* Multimodal Benchmarks
* McGurk effect
* Cross-Modal Retrieval (see the CLIP retrieval sketch after the references)
* Pixel Aligned Language Model

References
* Stanford CS 224N | Natural Language Processing with Deep Learning
  https://web.stanford.edu/class/cs224n/
* Multimodality and Large Multimodal Models (LMMs), Chip Huyen: "For a long time, each ML model operated in one data mode – text (translation, language modeling), image (object detection, image classification), or audio (speech recognition)."
  https://huyenchip.com/2023/10/10/multimodal.html
* What Is Multimodal AI? Applications, Principles, and Core Research Challenges in Multimodal AI, Twelve Labs
  https://app.twelvelabs.io/blog/what-is-multimodal-ai
* [Miracle Letter] "AI is becoming one?!" (Korean newsletter)
  https://stibee.com/api/v1.0/emails/share/Ue8q7jhsWwUJw3s7XtO19eGpnGJDUJ4=
* thegenerality.com
  https://thegenerality.com/agi/
* Multi-Modal AI is a UX Problem, Matt Rickard: Transformers and other AI breakthroughs have shown state-of-the-art performance across different modalities:
  * Text-to-Text (OpenAI ChatGPT)
  * Text-to-Image (Stable Diffusion)
  * Image-to-Text (OpenAI CLIP)
  * Speech-to-Text (OpenAI Whisper)
  * Text-to-Speech (Meta's Massively Multilingual Speech)
  * Image-to-Image (img2img or pix2pix)
  * Text-to-Audio (Meta MusicGen)
  * Text-to-Code (OpenAI Codex / GitHub Copilot)
  * Code-to-Text (ChatGPT, etc.)
  The next frontier in AI is combining these modalities.
  https://matt-rickard.com/multi-modal-ai-is-a-ux-problem
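To make the Multimodality Fusion idea above concrete, here is a minimal late-fusion sketch in PyTorch: per-modality embeddings (e.g., from separate vision and text encoders) are concatenated into one joint representation and classified together. This is an illustration under assumed inputs, not a reference implementation; the embedding dimensions, class count, and the use of pre-computed embeddings are all hypothetical choices.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Toy late-fusion model: concatenate embeddings from separate
    modality encoders (e.g., vision and text) and classify jointly."""

    def __init__(self, image_dim: int = 512, text_dim: int = 512, num_classes: int = 10):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(image_dim + text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # Fusion step: the joint representation is the concatenation of modalities.
        fused = torch.cat([image_emb, text_emb], dim=-1)
        return self.head(fused)

# Usage with random stand-in embeddings (batch of 4):
model = LateFusionClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 10])
```

Early fusion (mixing raw inputs or tokens before encoding) and intermediate fusion (cross-attention between encoders) are the common alternatives; late fusion is simply the easiest to sketch.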
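For Cross-Modal Retrieval, a minimal sketch using the Hugging Face transformers implementation of OpenAI's CLIP (the Image-to-Text model listed above): embed a text query and a set of images into CLIP's shared space and rank the images by cosine similarity. The checkpoint is the public openai/clip-vit-base-patch32; the image file names are hypothetical placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

image_paths = ["dog.jpg", "cat.jpg", "car.jpg"]  # hypothetical local files
images = [Image.open(p) for p in image_paths]
query = "a photo of a dog playing fetch"

with torch.no_grad():
    # Encode both modalities into CLIP's shared embedding space.
    image_emb = model.get_image_features(**processor(images=images, return_tensors="pt"))
    text_emb = model.get_text_features(**processor(text=[query], return_tensors="pt", padding=True))

# Normalize so the dot product equals cosine similarity.
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

scores = (text_emb @ image_emb.T).squeeze(0)  # one similarity score per image
for path, score in sorted(zip(image_paths, scores.tolist()), key=lambda s: -s[1]):
    print(f"{score:.3f}  {path}")
```

The same embeddings support the reverse direction (image-to-text retrieval) by transposing the comparison, which is what makes the shared space useful for retrieval in either modality.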