Realtime Assistant
Integrate OpenAI’s Realtime API into your Unity project to enable fully voice-driven conversations with millisecond-level latency.
This component enables:
Real-time two-way voice interaction using OpenAI's Realtime API
Microphone streaming with live transcription via Whisper
Low-latency voice output using OpenAI’s built-in real-time voices (e.g., Alloy, Echo)
Dynamic playback without generating or decoding audio clips locally
Optional tool calling via FunctionManager
UnityEvent hooks for transcription, assistant responses, WebSocket events, and more
Realtime Model: The assistant model to stream to (e.g., GPT-4o Realtime Preview)
Voice Actor: One of OpenAI’s real-time voices (Alloy, Shimmer, Echo, etc.)
Instructions: System prompt that defines the assistant's behavior
Auto Start: Automatically starts streaming on scene start
Speech-to-Text Model: STT model used for transcription (typically Whisper 1)
Spoken Language: Language spoken by the user
Input Audio Format: Audio encoding format (e.g., PCM16)
Input Audio Sample Rate: Sample rate in Hz (e.g., 16000)
Input Sample Duration: Chunk size in milliseconds to send to OpenAI
Silence Duration: Maximum silence before speech is considered done
Silence Threshold: Volume threshold for silence detection
Output Audio Format: Format for streamed output from OpenAI
Output Audio Volume: Multiplier for playback loudness (0–1)
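As a rough illustration of how the input audio settings relate to each other, the sketch below computes the byte size of one chunk from the sample rate and sample duration, then applies a threshold-plus-duration silence check. It assumes mono PCM16 and an RMS-based loudness measure; the function names and every value except the 16000 Hz example are hypothetical, and the component's actual detection logic may differ. Python is used only to keep the example self-contained, not because the component is written in it.

```python
import math
import struct

BYTES_PER_SAMPLE = 2  # PCM16: one signed 16-bit integer per sample (mono assumed)


def chunk_size_bytes(sample_rate_hz: int, duration_ms: int) -> int:
    """Bytes in one chunk of mono PCM16 audio of the given duration."""
    return sample_rate_hz * duration_ms // 1000 * BYTES_PER_SAMPLE


def rms_level(chunk: bytes) -> float:
    """Root-mean-square loudness of a PCM16 chunk, normalized to 0..1."""
    count = len(chunk) // BYTES_PER_SAMPLE
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", chunk[: count * BYTES_PER_SAMPLE])
    return math.sqrt(sum(s * s for s in samples) / count) / 32768.0


def speech_end_index(chunks, threshold: float, silence_ms: int, chunk_ms: int):
    """Index of the chunk at which the utterance counts as finished, or None.

    Speech is "done" once the level stays below `threshold` for at least
    `silence_ms` after some speech has been heard.
    """
    silent_ms = 0
    heard_speech = False
    for index, chunk in enumerate(chunks):
        if rms_level(chunk) >= threshold:
            heard_speech = True
            silent_ms = 0
        elif heard_speech:
            silent_ms += chunk_ms
            if silent_ms >= silence_ms:
                return index
    return None


# Example: 16000 Hz with hypothetical 100 ms chunks gives 3200 bytes per chunk.
print(chunk_size_bytes(16000, 100))
```

Shorter sample durations reduce the delay before audio reaches the server but increase the number of messages sent; the silence duration trades responsiveness against the risk of cutting the user off mid-sentence.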
Note: Unlike traditional TTS, audio is streamed directly from OpenAI’s server in real time. There is no AudioClip or local decoding step.
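Because the output arrives as a stream of raw audio chunks, playback can be modeled as a buffer that is filled as chunks arrive and drained by the audio device. The sketch below is one hedged way to picture that, assuming PCM16 output delivered as base64-encoded chunks (as in the Realtime API's audio delta events) and applying the Output Audio Volume multiplier as each chunk is buffered. `PlaybackBuffer` and `apply_volume` are hypothetical names, not part of the component.

```python
import base64
import struct


def apply_volume(pcm16: bytes, volume: float) -> bytes:
    """Scale PCM16 samples by a 0..1 multiplier, clamping to the 16-bit range."""
    volume = max(0.0, min(1.0, volume))
    count = len(pcm16) // 2  # a trailing odd byte, if any, is dropped in this sketch
    samples = struct.unpack(f"<{count}h", pcm16[: count * 2])
    scaled = [max(-32768, min(32767, int(s * volume))) for s in samples]
    return struct.pack(f"<{count}h", *scaled)


class PlaybackBuffer:
    """Accumulates streamed audio so a playback callback can drain it."""

    def __init__(self, volume: float = 1.0):
        self.volume = volume
        self._pending = bytearray()

    def on_audio_delta(self, base64_audio: str) -> None:
        """Handle one streamed chunk: unwrap the base64 transport encoding and buffer it."""
        pcm = base64.b64decode(base64_audio)
        self._pending.extend(apply_volume(pcm, self.volume))

    def read(self, n_bytes: int) -> bytes:
        """Remove and return up to n_bytes of buffered audio for playback."""
        out = bytes(self._pending[:n_bytes])
        del self._pending[:n_bytes]
        return out
```

The only per-chunk work is unwrapping the transport encoding and scaling samples; no audio file is assembled or decoded, which is what keeps latency low.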
Function Manager: Executes Unity-side methods triggered by tool calls
Realtime Event Receiver: General connection and session-level events
WebSocket Event Receiver: Low-level socket status updates
Input Transcription Receiver: Full transcript from user's speech input
Text Event Receiver: Assistant’s partial or full text response
Transcript Event Receiver: Final transcript of entire user utterance
Tool Call Receiver: Triggered when assistant requests tool execution
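To make the tool-calling flow more concrete, the sketch below shows one way a function manager could dispatch a tool call to a registered method and build the reply sent back to the model. It assumes the tool call event carries `name`, `call_id`, and a JSON string of `arguments`, and that the result is returned as a `function_call_output` conversation item, as in the Realtime API. `FunctionRegistry` and its methods are hypothetical stand-ins and do not describe the FunctionManager's actual interface.

```python
import json


class FunctionRegistry:
    """Maps tool names to callables and turns tool calls into reply events."""

    def __init__(self):
        self._functions = {}

    def register(self, name, fn) -> None:
        self._functions[name] = fn

    def handle_tool_call(self, event: dict) -> dict:
        """Run the requested function and wrap its result as a reply event."""
        name = event["name"]
        if name not in self._functions:
            output = {"error": f"unknown function: {name}"}
        else:
            try:
                # Arguments arrive as a JSON-encoded string of keyword arguments.
                args = json.loads(event.get("arguments") or "{}")
                output = {"result": self._functions[name](**args)}
            except (TypeError, ValueError) as exc:
                # Malformed JSON or arguments that do not match the function.
                output = {"error": str(exc)}
        return {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": event["call_id"],
                "output": json.dumps(output),
            },
        }


# Usage with a hypothetical "add" tool.
registry = FunctionRegistry()
registry.register("add", lambda a, b: a + b)
reply = registry.handle_tool_call(
    {"name": "add", "call_id": "call_1", "arguments": '{"a": 2, "b": 3}'}
)
print(reply["item"]["output"])
```

Returning errors as structured output, rather than raising, lets the assistant see what went wrong and recover within the conversation.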