Texonom
Texonom
/
Engineering
Engineering
/Data Engineering/Artificial Intelligence/AI Object/Multimodal AI/Vision Language Model/Gemini Google/
Gemini Live
Search

Gemini Live

Creator
Creator
Seonglae Cho
Created
Created
2024 Dec 22 14:11
Editor
Editor
Seonglae Cho
Edited
Edited
2024 Dec 22 14:12
Refs
Refs
OpenAI Realtime API
multimodal-live-api-web-console
google-gemini • Updated 2024 Dec 22 12:3
 
 
 
 
not yet for webrtc
Multimodal Live API  |  Gemini API  |  Google AI for Developers
The Multimodal Live API enables low-latency, two-way interactions that use text, audio, and video input, with audio and text output. This facilitates natural, human-like voice conversations with the ability to interrupt the model at any time. The model's video understanding capability expands communication modalities, enabling you to share camera input or screencasts and ask questions about them.
Multimodal Live API  |  Gemini API  |  Google AI for Developers
https://ai.google.dev/api/multimodal-live
Multimodal Live API  |  Gemini API  |  Google AI for Developers
 
 

Recommendations

Texonom
Texonom
/
Engineering
Engineering
/Data Engineering/Artificial Intelligence/AI Object/Multimodal AI/Vision Language Model/Gemini Google/
Gemini Live
Copyright Seonglae Cho