Conversational AI

Creator

Seonglae Cho

Created

2024 Nov 5 21:10

Editor

Seonglae Cho

Edited

2026 Jun 25 16:45

Refs

Spoken Language Model

Conversational Speech Model

Conversational AIs

ElevenLabs Conversational

OpenAI Realtime API

Grok Voice Agent

Gemini NativeAudio Dialog

Interaction Models

Conversational AI Notion

Generative Spoken Language Model

Synchronous LLM

https://www.pnas.org/doi/10.1073/pnas.0903616106

Hertz-dev

Introducing hertz-dev - Standard Intelligence

For the last few months, we at Standard Intelligence have been researching scalable cross-modality learning. We're excited to announce that we're open-sourcing current checkpoints of our full-duplex, audio-only base model, hertz-dev, with a total of 8.5 billion parameters and three primary parts:

https://si.inc/hertz-dev/

Conversational Interface

The case against conversational interfaces

Conversational interfaces are a bit of a meme. Every couple of years a shiny new AI development emerges and people in tech go "This is it! The next computing paradigm is here! We'll only use natural language going forward!". But then nothing actually changes and we continue using computers the way w

https://julian.digital/2025/03/27/the-case-against-conversational-interfaces/

Full-Duplex-Bench

Paper page - Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities

https://huggingface.co/papers/2503.04721

schema

A Full-duplex Speech Dialogue Scheme Based On Large Language Models

We present a generative dialogue system capable of operating in a full-duplex manner, allowing for seamless interaction. It is based on a large language model (LLM) carefully aligned to be aware...

https://arxiv.org/abs/2405.19487

Beyond one-on-one

Beyond one-on-one: Authoring, simulating, and testing dynamic human-AI group conversations

Erzhen Hu, Student Researcher, and Ruofei Du, Interactive Perception & Graphics Lead, Google XR

https://research.google/blog/beyond-one-on-one-authoring-simulating-and-testing-dynamic-human-ai-group-conversations/

Full Duplex Model

Streaming Requests & Realtime API in vLLM

Large language model inference has traditionally operated on a simple premise: the user submits a complete prompt (request), the model processes it, and returns

https://vllm.ai/blog/streaming-realtime

Beyond STT, to Realtime STT, and on to the coming era of Full Duplex models

Perhaps because my MBTI type is N, I personally enjoy predicting the future, although I don't tend to trust my own predictions enough to base judgments or actions on them.

https://monday9pm.com/stt%EB%A5%BC-%EB%84%98%EA%B3%A0-realtime-stt-%EA%B7%B8%EB%A6%AC%EA%B3%A0-%EA%B3%A7-%EB%8B%A4%EA%B0%80%EC%98%AC-full-duplex-%EB%AA%A8%EB%8D%B8-%EC%8B%9C%EB%8C%80%EB%A1%9C-7f50ed74f175

Interactions API: A unified foundation for models and agents

Google’s Interactions API is a unified interface for interacting with Gemini models and agents.

https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api/
