AIDevKit - AI Suite for Unity

Realtime Assistant

Integrate OpenAI’s Realtime API into your Unity project to enable fully voice-driven conversations with millisecond-level latency.


What it does

This component enables:

  • Real-time two-way voice interaction using OpenAI's Realtime API

  • Microphone streaming with live transcription via Whisper

  • Low-latency voice output using OpenAI’s built-in real-time voices (e.g., Alloy, Echo)

  • Dynamic playback without generating or decoding audio clips locally

  • Optional tool calling via FunctionManager

  • UnityEvent hooks for transcription, assistant responses, WebSocket events, and more
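
If you prefer to drive the component from a script rather than the Inspector, a minimal sketch is shown below. The type and method names (RealtimeAssistant, StartSession, StopSession) are illustrative assumptions rather than the verified AI Dev Kit API; with Auto Start enabled, no code is required at all.

```csharp
using UnityEngine;

// Illustrative sketch only: the component type and method names are
// assumptions, not the verified AI Dev Kit API. Check the component's
// Inspector and the API Reference for the actual member names.
public class RealtimeAssistantStarter : MonoBehaviour
{
    [SerializeField] private RealtimeAssistant assistant; // hypothetical component type

    private void Start()
    {
        // Begin streaming microphone audio to OpenAI's Realtime API.
        assistant.StartSession(); // hypothetical method
    }

    private void OnDestroy()
    {
        // Close the realtime session and the underlying WebSocket.
        assistant.StopSession(); // hypothetical method
    }
}
```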


1. General Settings

  • Realtime Model: The assistant model to stream to, e.g., GPT-4o Realtime Preview
  • Voice Actor: One of OpenAI's real-time voices (Alloy, Shimmer, Echo, etc.)
  • Instructions: System prompt to define assistant behavior
  • Auto Start: Automatically starts streaming on scene start
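
These settings correspond to the session configuration the component sends when it connects to OpenAI's Realtime API. For readers curious about the transport, the sketch below shows roughly what that handshake looks like with raw .NET WebSockets; the component handles all of this for you. The endpoint, headers, and the session.update event follow OpenAI's Realtime API reference, while the model, voice, and instructions values are placeholders.

```csharp
using System;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Sketch of the connection the component manages for you. Endpoint, headers,
// and the session.update event follow OpenAI's Realtime API documentation;
// model, voice, and instructions are placeholder values.
public static class RealtimeHandshakeSketch
{
    public static async Task ConnectAsync(string apiKey)
    {
        var socket = new ClientWebSocket();
        socket.Options.SetRequestHeader("Authorization", "Bearer " + apiKey);
        socket.Options.SetRequestHeader("OpenAI-Beta", "realtime=v1");

        // Realtime Model is selected via the URL query string.
        var uri = new Uri("wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview");
        await socket.ConnectAsync(uri, CancellationToken.None);

        // Voice Actor and Instructions are sent in a session.update event
        // once the socket is open.
        string sessionUpdate =
            "{\"type\":\"session.update\",\"session\":{" +
            "\"voice\":\"alloy\"," +
            "\"instructions\":\"You are a helpful in-game assistant.\"," +
            "\"input_audio_format\":\"pcm16\",\"output_audio_format\":\"pcm16\"}}";

        byte[] payload = Encoding.UTF8.GetBytes(sessionUpdate);
        await socket.SendAsync(new ArraySegment<byte>(payload),
            WebSocketMessageType.Text, true, CancellationToken.None);

        // From here, microphone chunks go up as input_audio_buffer.append events
        // and assistant audio comes back as response.audio.delta events.
    }
}
```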


2. Input Audio Transcription

  • Speech-to-Text Model: STT model used for transcription (typically Whisper 1)
  • Spoken Language: Language spoken by the user


3. Input Audio Recording

  • Input Audio Format: Audio encoding format (e.g., PCM16)
  • Input Audio Sample Rate: Sample rate in Hz (e.g., 16000)
  • Input Sample Duration: Chunk size in milliseconds to send to OpenAI
  • Silence Duration: Max silence before speech is considered done
  • Silence Threshold: Volume threshold for silence detection
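
The arithmetic behind these settings is straightforward: the sample rate and sample duration determine how large each uploaded chunk is, and the silence threshold is compared against the loudness of each chunk. The sketch below illustrates the idea; the component's actual detection logic may differ in detail.

```csharp
using UnityEngine;

// With PCM16 at 16000 Hz and a 100 ms sample duration, each chunk is
// 16000 * 0.1 = 1600 samples, i.e. 3200 bytes. Silence detection compares a
// chunk's RMS level against the Silence Threshold; once the level stays below
// it for Silence Duration, the utterance is treated as finished.
public static class RecordingMath
{
    public static int SamplesPerChunk(int sampleRateHz, int chunkMs)
    {
        return sampleRateHz * chunkMs / 1000;              // e.g. 16000 * 100 / 1000 = 1600
    }

    public static int BytesPerChunk(int sampleRateHz, int chunkMs)
    {
        return SamplesPerChunk(sampleRateHz, chunkMs) * 2; // PCM16 = 2 bytes per sample
    }

    // Simple RMS check a silence detector might run on a chunk of mic samples.
    public static bool IsSilent(float[] samples, float silenceThreshold)
    {
        float sum = 0f;
        for (int i = 0; i < samples.Length; i++)
            sum += samples[i] * samples[i];
        float rms = Mathf.Sqrt(sum / samples.Length);
        return rms < silenceThreshold;
    }
}
```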


4. Output Audio

  • Output Audio Format: Format for streamed output from OpenAI
  • Output Audio Volume: Multiplier for playback loudness (0–1)

Note: Unlike traditional TTS, audio is streamed directly from OpenAI’s server in real time. There is no AudioClip or local decoding step.
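
A common way to play raw PCM without an AudioClip is to convert incoming PCM16 bytes to floats and feed them to Unity's audio thread through OnAudioFilterRead. The sketch below illustrates that technique; it is not the component's actual implementation, and it omits resampling (the server's sample rate must match, or be converted to, Unity's output sample rate).

```csharp
using System.Collections.Generic;
using UnityEngine;

// Sketch of clip-less playback: streamed PCM16 chunks are converted to floats
// and pulled by the audio thread via OnAudioFilterRead, so no AudioClip is
// ever created or decoded. Illustration only, not the component's code.
[RequireComponent(typeof(AudioSource))]
public class StreamedPcmPlayer : MonoBehaviour
{
    private readonly Queue<float> buffer = new Queue<float>();
    private readonly object gate = new object();

    [Range(0f, 1f)] public float volume = 1f; // mirrors "Output Audio Volume"

    // Call this whenever a new PCM16 chunk arrives from the server.
    public void EnqueuePcm16(byte[] pcm)
    {
        lock (gate)
        {
            for (int i = 0; i + 1 < pcm.Length; i += 2)
            {
                short sample = (short)(pcm[i] | (pcm[i + 1] << 8)); // little-endian PCM16
                buffer.Enqueue(sample / 32768f);
            }
        }
    }

    // Unity's audio thread pulls samples here; missing data plays as silence.
    private void OnAudioFilterRead(float[] data, int channels)
    {
        lock (gate)
        {
            for (int i = 0; i < data.Length; i += channels)
            {
                float s = buffer.Count > 0 ? buffer.Dequeue() * volume : 0f;
                for (int c = 0; c < channels; c++)
                    data[i + c] = s;
            }
        }
    }
}
```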


5. Event Managers & Receivers

  • Function Manager: Executes Unity-side methods triggered by tool calls
  • Realtime Event Receiver: General connection and session-level events
  • WebSocket Event Receiver: Low-level socket status updates
  • Input Transcription Receiver: Full transcript from user's speech input
  • Text Event Receiver: Assistant's partial or full text response
  • Transcript Event Receiver: Final transcript of entire user utterance
  • Tool Call Receiver: Triggered when assistant requests tool execution
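
These receivers are exposed as UnityEvents, so they can be wired up in the Inspector or subscribed to from code. The sketch below shows the code route; the member names are assumptions based on the Inspector fields above, not the verified AI Dev Kit API.

```csharp
using UnityEngine;

// Illustrative sketch: the receiver member names are assumptions based on the
// Inspector fields above, not the verified AI Dev Kit API. The same wiring can
// be done entirely in the Inspector without any code.
public class AssistantUiBridge : MonoBehaviour
{
    [SerializeField] private RealtimeAssistant assistant; // hypothetical component type

    private void OnEnable()
    {
        assistant.InputTranscriptionReceiver.AddListener(OnUserTranscript); // hypothetical member
        assistant.TextEventReceiver.AddListener(OnAssistantText);           // hypothetical member
    }

    private void OnDisable()
    {
        assistant.InputTranscriptionReceiver.RemoveListener(OnUserTranscript);
        assistant.TextEventReceiver.RemoveListener(OnAssistantText);
    }

    private void OnUserTranscript(string transcript) => Debug.Log("You said: " + transcript);
    private void OnAssistantText(string text) => Debug.Log("Assistant: " + text);
}
```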