Multimodal Stream

Created: 2024 May 18 9:11
Creator: Seonglae Cho
Edited: 2024 May 22 4:26

Native Multimodality

  • Continuously encoding video frames
  • Combining the video and speech input into a timeline of events
  • Caching information for efficient recall
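The three steps above can be pictured with a toy sketch. It assumes each modality arrives as a stream of events already sorted by timestamp, merges them into one timeline, and keeps a bounded cache of recent events for recall. `Event`, `merge_streams`, and `EventCache` are hypothetical names, and the string payloads stand in for real frame or audio encodings. This is not how Project Astra or any specific system is implemented.

```python
import heapq
from collections import deque
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    # Events are ordered by timestamp only; modality and payload are not compared.
    t: float
    modality: str = field(compare=False)
    payload: str = field(compare=False)  # stand-in for an encoded frame or audio chunk


def merge_streams(video, audio):
    """Merge two timestamp-sorted streams into a single timeline of events."""
    return list(heapq.merge(video, audio))


class EventCache:
    """Hypothetical recall cache: keeps only the most recent `capacity` events."""

    def __init__(self, capacity):
        self.events = deque(maxlen=capacity)  # oldest events are dropped automatically

    def add(self, event):
        self.events.append(event)

    def recall(self, modality=None):
        # Return cached events, optionally filtered by modality.
        return [e for e in self.events if modality is None or e.modality == modality]


if __name__ == "__main__":
    video = [Event(0.0, "video", "frame0"), Event(0.5, "video", "frame1")]
    audio = [Event(0.2, "audio", "hello"), Event(0.7, "audio", "world")]
    timeline = merge_streams(video, audio)
    print([e.payload for e in timeline])  # ['frame0', 'hello', 'frame1', 'world']

    cache = EventCache(capacity=3)
    for e in timeline:
        cache.add(e)
    print([e.payload for e in cache.recall("video")])  # ['frame1']
```

A fixed-size deque is only the simplest possible eviction policy; a real assistant would more likely summarize or selectively retain older context rather than drop it outright.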

How

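  • So how should the input be tokenized and fed to the model? (Perhaps vision and audio are combined first and then split into tokens.)
  • How can new tokens interrupt the model in the middle of inference? (The KV cache would probably handle this.)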
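The KV-cache idea for interruption can be illustrated with a toy decoder. This is a minimal sketch under stated assumptions, not a real model: strings stand in for per-layer key/value tensors, "sampling" is deterministic, and `KVCache`, `ToyDecoder`, and `generate` are hypothetical names. The point it shows is that when new user input arrives mid-generation, only the new tokens need to be prefilled; everything already in the cache (prompt and partial reply) is reused instead of recomputed.

```python
from dataclasses import dataclass, field


@dataclass
class KVCache:
    # One (key, value) entry per processed token; real models store tensors per layer/head.
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def __len__(self):
        return len(self.keys)


class ToyDecoder:
    """Hypothetical stand-in for a decoder that counts how many tokens it computes."""

    def __init__(self):
        self.cache = KVCache()
        self.computed = 0  # tokens whose K/V were actually computed

    def feed(self, tokens):
        # Prefill: compute K/V only for the new tokens; earlier entries stay cached.
        for tok in tokens:
            self.cache.keys.append(f"k:{tok}")
            self.cache.values.append(f"v:{tok}")
            self.computed += 1

    def next_token(self):
        # Toy "sampling": derive the output from the current cache length.
        tok = f"out{len(self.cache)}"
        self.feed([tok])  # generated tokens are appended to the cache as well
        return tok


def generate(model, max_new, interrupts):
    """Decode up to max_new tokens; `interrupts` maps a step index to incoming user tokens."""
    out = []
    for step in range(max_new):
        if step in interrupts:
            # Interrupt: append the new input to the existing cache and keep decoding.
            model.feed(interrupts[step])
            out.append("<interrupted>")
        out.append(model.next_token())
    return out


if __name__ == "__main__":
    m = ToyDecoder()
    m.feed(["u1", "u2", "u3"])  # initial prompt
    print(generate(m, max_new=4, interrupts={2: ["u4", "u5"]}))
    # ['out3', 'out4', '<interrupted>', 'out7', 'out8']
    print(m.computed, len(m.cache))  # 9 9: every token was computed exactly once
```

Whether a production system keeps the partial reply in the cache, rolls it back, or discards it on interruption is a design choice this sketch does not settle; here the partial reply is kept.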
 
Project Astra
Building on our Gemini models, Project Astra explores the future of AI assistants that can process multimodal information, understand the context you’re in, and respond naturally in conversation.