Audio Setup

Configure voice input and output for your agent.

Overview

AI Dev Kit supports:

Input Audio - Record and transcribe user speech
Output Audio - Generate speech from agent responses
Realtime Audio - Low-latency bidirectional voice (Realtime API)

Enabling Audio

In AgentSettings

// Enable input (speech-to-text)
settings.EnableInputAudio = true;
settings.InputAudioParameters = new TranscriptionParameters
{
    Model = "whisper-1",
    SpokenLanguage = SystemLanguage.English
};

// Enable output (text-to-speech)
settings.EnableOutputAudio = true;
settings.OutputAudioParameters = new SpeechParameters
{
    Model = "tts-1",
    Voice = "alloy",
    Speed = 1.0f
};

In AgentBehaviour Inspector

Select the GameObject that has the AgentBehaviour component
Assign an AgentSettings asset that has audio enabled
Assign the InputAudioRecorder component
Assign the OutputAudioPlayer component

Input Audio (Speech-to-Text)

Transcription Parameters

public class TranscriptionParameters
{
    public string Model { get; set; }              // "whisper-1"
    public SystemLanguage SpokenLanguage { get; set; }  // English, Spanish, etc.
    public string Prompt { get; set; }              // Optional context
    public float Temperature { get; set; }          // 0.0 - 1.0
}

Example:

settings.InputAudioParameters = new TranscriptionParameters
{
    Model = "whisper-1",
    SpokenLanguage = SystemLanguage.English,
    Prompt = "This is a technical support conversation",
    Temperature = 0.2f
};

Input Audio Recorder

Provide a recorder component:

[SerializeField] private InputAudioRecorder recorder;

void Start()
{
    agentBehaviour.InputAudioRecorder = recorder;
}

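Or implement a custom recorder:

using System.Threading;
using Cysharp.Threading.Tasks;
using UnityEngine;

public class CustomRecorder : MonoBehaviour, IInputAudioRecorder
{
    public async UniTask<AudioClip> RecordAsync(CancellationToken ct)
    {
        // Custom recording logic: capture audio and assign it to recordedClip
        AudioClip recordedClip = null;
        return recordedClip;
    }
}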
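
As one possible shape for the custom recording logic, the sketch below records a fixed-length clip from the default microphone with Unity's Microphone API. It assumes that IInputAudioRecorder requires only the RecordAsync method shown above and that it lives in the Glitch9.AIDevKit.Agents namespace. The class name FixedLengthRecorder, the 5-second limit, and the 16 kHz rate are illustrative choices, not kit defaults.

```csharp
using System.Threading;
using Cysharp.Threading.Tasks;
using UnityEngine;
// Assumption: IInputAudioRecorder is declared in this namespace.
using Glitch9.AIDevKit.Agents;

// Hypothetical recorder: captures up to maxSeconds from the default microphone.
public class FixedLengthRecorder : MonoBehaviour, IInputAudioRecorder
{
    [SerializeField] private int maxSeconds = 5;      // illustrative limit
    [SerializeField] private int sampleRate = 16000;  // 16 kHz

    public async UniTask<AudioClip> RecordAsync(CancellationToken ct)
    {
        // null selects the default device; loop = false stops at maxSeconds.
        AudioClip clip = Microphone.Start(null, false, maxSeconds, sampleRate);
        try
        {
            // Wait until Unity reports that recording has ended.
            await UniTask.WaitWhile(() => Microphone.IsRecording(null), cancellationToken: ct);
        }
        finally
        {
            // Releases the device on normal completion and on cancellation.
            Microphone.End(null);
        }
        return clip;
    }
}
```

Cancelling the token makes the wait throw OperationCanceledException, so the caller decides how to handle an aborted recording. The finally block still releases the microphone.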

Output Audio (Text-to-Speech)

Speech Parameters

public class SpeechParameters
{
    public string Model { get; set; }       // "tts-1", "tts-1-hd"
    public string Voice { get; set; }       // "alloy", "echo", "fable", etc.
    public float Speed { get; set; }        // 0.25 - 4.0
    public AudioFormat Format { get; set; } // mp3, opus, aac, flac
}

Example:

settings.OutputAudioParameters = new SpeechParameters
{
    Model = "tts-1-hd",
    Voice = "nova",
    Speed = 1.1f,
    Format = AudioFormat.mp3
};

Available Voices

Voice - Description
alloy - Neutral, balanced
echo - Warm, engaging
fable - British accent
onyx - Deep, authoritative
nova - Friendly, conversational
shimmer - Upbeat, energetic

Output Audio Player

Provide a player component:

[SerializeField] private OutputAudioPlayer player;

void Start()
{
    agentBehaviour.OutputAudioPlayer = player;
}

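Or implement a custom player:

using System.Threading;
using Cysharp.Threading.Tasks;
using UnityEngine;

public class CustomPlayer : MonoBehaviour, IOutputAudioPlayer
{
    public async UniTask PlayAsync(AudioClip clip, CancellationToken ct)
    {
        // Custom playback logic: play the clip and complete when it finishes
    }

    public void Stop()
    {
        // Stop playback
    }
}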
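
To make the playback logic concrete, here is a minimal sketch that plays the clip through a Unity AudioSource and completes when playback ends. It assumes that IOutputAudioPlayer consists of the two members shown above and lives in the Glitch9.AIDevKit.Agents namespace. The class name AudioSourcePlayer is hypothetical.

```csharp
using System.Threading;
using Cysharp.Threading.Tasks;
using UnityEngine;
// Assumption: IOutputAudioPlayer is declared in this namespace.
using Glitch9.AIDevKit.Agents;

// Hypothetical player: routes agent speech through an AudioSource on the same GameObject.
[RequireComponent(typeof(AudioSource))]
public class AudioSourcePlayer : MonoBehaviour, IOutputAudioPlayer
{
    private AudioSource source;

    void Awake() => source = GetComponent<AudioSource>();

    public async UniTask PlayAsync(AudioClip clip, CancellationToken ct)
    {
        source.clip = clip;
        source.Play();
        try
        {
            // Resume once the AudioSource is no longer playing.
            await UniTask.WaitWhile(() => source.isPlaying, cancellationToken: ct);
        }
        finally
        {
            // Has no effect after normal completion; halts playback on cancellation.
            source.Stop();
        }
    }

    public void Stop() => source.Stop();
}
```

Note that AudioSource.isPlaying also returns false while the source is paused, so a player that supports pausing would need a different completion check.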

Realtime Audio (WebSocket)

For low-latency voice conversations:

// In AgentSettings
settings.ChatServiceApi = ChatService.RealtimeApi;
settings.Model = "gpt-4o-realtime-preview";
settings.EnableInputAudio = true;
settings.EnableOutputAudio = true;

// Audio is handled natively by Realtime API
// No separate transcription/synthesis needed

Complete Setup Example

using UnityEngine;
using Glitch9.AIDevKit.Agents;

public class VoiceAssistantSetup : MonoBehaviour
{
    [SerializeField] private AgentBehaviour agent;
    [SerializeField] private InputAudioRecorder recorder;
    [SerializeField] private OutputAudioPlayer player;
    
    void Start()
    {
        // Configure audio
        agent.InputAudioRecorder = recorder;
        agent.OutputAudioPlayer = player;
        agent.OutputAudioVolume = 0.8f;
        
        // Subscribe to audio events
        agent.onInputAudioStarted.AddListener(OnRecordingStarted);
        agent.onInputAudioCompleted.AddListener(OnRecordingCompleted);
        agent.onOutputAudioStarted.AddListener(OnPlaybackStarted);
        agent.onOutputAudioCompleted.AddListener(OnPlaybackCompleted);
    }
    
    public async void OnMicButtonPressed()
    {
        // Record and send audio
        await agent.SendAudioAsync();
    }
    
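    // ShowRecordingIndicator, HideRecordingIndicator, ShowSpeakerIndicator and
    // HideSpeakerIndicator are placeholders for your own UI code.
    void OnRecordingStarted()
    {
        ShowRecordingIndicator();
    }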
    
    void OnRecordingCompleted(AudioClip clip)
    {
        HideRecordingIndicator();
        Debug.Log($"Recorded {clip.length} seconds");
    }
    
    void OnPlaybackStarted()
    {
        ShowSpeakerIndicator();
    }
    
    void OnPlaybackCompleted()
    {
        HideSpeakerIndicator();
    }
}

Audio Events

Input Events

agent.onInputAudioStarted.AddListener(() => {
    Debug.Log("Started recording");
});

agent.onInputAudioCompleted.AddListener((clip) => {
    Debug.Log($"Recorded: {clip.length}s");
});

agent.onInputAudioTranscribed.AddListener((text) => {
    Debug.Log($"Transcribed: {text}");
});

Output Events

agent.onOutputAudioStarted.AddListener(() => {
    Debug.Log("Started playback");
});

agent.onOutputAudioCompleted.AddListener(() => {
    Debug.Log("Playback finished");
});

agent.onOutputAudioProgress.AddListener((progress) => {
    Debug.Log($"Progress: {progress:P}");
});
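
Lambda listeners like the ones above cannot be removed later, which matters when the subscribing object is disabled or destroyed before the agent. The sketch below pairs subscription and removal using a named method. It assumes the agent events are UnityEvent-style (so RemoveListener is available alongside AddListener) and that onInputAudioTranscribed passes the transcript as a string. TranscriptLogger is a hypothetical name.

```csharp
using UnityEngine;
using Glitch9.AIDevKit.Agents;

// Hypothetical component: logs transcripts only while it is enabled.
public class TranscriptLogger : MonoBehaviour
{
    [SerializeField] private AgentBehaviour agent;

    void OnEnable()  => agent.onInputAudioTranscribed.AddListener(OnTranscribed);
    void OnDisable() => agent.onInputAudioTranscribed.RemoveListener(OnTranscribed);

    // Assumption: the event argument is the transcribed text.
    private void OnTranscribed(string text) => Debug.Log($"Transcribed: {text}");
}
```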

Volume Control

// Set output volume
agent.OutputAudioVolume = 0.7f; // 0.0 - 1.0

// Adjust at runtime
public void OnVolumeSliderChanged(float value)
{
    agent.OutputAudioVolume = value;
}
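
The handler above can be wired to a UI slider in code. The following sketch clamps the value to the documented 0.0 - 1.0 range and persists it with PlayerPrefs. The component name, the preference key, and the 0.7 fallback are illustrative assumptions.

```csharp
using UnityEngine;
using UnityEngine.UI;
using Glitch9.AIDevKit.Agents;

// Hypothetical binding between a UI Slider and the agent's output volume.
public class VolumeSliderBinding : MonoBehaviour
{
    [SerializeField] private AgentBehaviour agent;
    [SerializeField] private Slider slider;

    private const string PrefKey = "agent_output_volume"; // hypothetical key

    void Start()
    {
        // Restore the last saved volume without firing onValueChanged.
        float saved = PlayerPrefs.GetFloat(PrefKey, 0.7f);
        slider.SetValueWithoutNotify(saved);
        Apply(saved);
        slider.onValueChanged.AddListener(Apply);
    }

    private void Apply(float value)
    {
        float volume = Mathf.Clamp01(value); // keep within 0.0 - 1.0
        agent.OutputAudioVolume = volume;
        PlayerPrefs.SetFloat(PrefKey, volume);
    }
}
```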

Language Support

Input Language

settings.InputAudioParameters.SpokenLanguage = SystemLanguage.English;
// Or: Spanish, French, German, Chinese, Japanese, etc.

Output Language

Output language is determined by the response text language. The TTS model automatically detects and speaks in the appropriate language.

Audio Quality

Input Quality

Record at 16 kHz, 16-bit, mono where the microphone supports it:

// 16kHz, 16-bit, mono recommended
recorder.SampleRate = 16000;
recorder.Channels = 1;

Output Quality

// Standard quality (faster, cheaper)
settings.OutputAudioParameters.Model = "tts-1";

// HD quality (slower, more expensive)
settings.OutputAudioParameters.Model = "tts-1-hd";

Platform Considerations

Mobile

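#if UNITY_IOS || UNITY_ANDROID
    // Request microphone permission before recording.
    // Awaiting the returned AsyncOperation assumes UniTask's awaiter extensions
    // are in scope (using Cysharp.Threading.Tasks) and that this runs in an async method.
    if (!Application.HasUserAuthorization(UserAuthorization.Microphone))
    {
        await Application.RequestUserAuthorization(UserAuthorization.Microphone);
    }
#endif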
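
If you prefer not to await the request, the same check can run in a coroutine. The sketch below also uses Unity's UnityEngine.Android.Permission API on Android, which Unity provides for runtime permissions on that platform. The component name MicrophonePermission is hypothetical, and the log message is only there for illustration.

```csharp
using System.Collections;
using UnityEngine;
#if UNITY_ANDROID
using UnityEngine.Android;
#endif

// Hypothetical component: asks for microphone access when the scene starts.
public class MicrophonePermission : MonoBehaviour
{
    IEnumerator Start()
    {
#if UNITY_ANDROID
        if (!Permission.HasUserAuthorizedPermission(Permission.Microphone))
        {
            // Shows the system dialog; the user's answer arrives asynchronously.
            Permission.RequestUserPermission(Permission.Microphone);
        }
        yield break;
#else
        if (!Application.HasUserAuthorization(UserAuthorization.Microphone))
        {
            // Suspends the coroutine until the user responds to the prompt.
            yield return Application.RequestUserAuthorization(UserAuthorization.Microphone);
        }
        bool granted = Application.HasUserAuthorization(UserAuthorization.Microphone);
        Debug.Log($"Microphone authorized: {granted}");
#endif
    }
}
```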

WebGL

WebGL builds need the browser's microphone permission. Prompt for it from your UI:

if (Application.platform == RuntimePlatform.WebGLPlayer)
{
    ShowMicrophonePermissionDialog();
}

Troubleshooting

No audio recorded

Check that microphone permission has been granted
Verify that InputAudioRecorder is assigned
Check that a microphone device is available
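
To run these checks quickly, a small diagnostic like the one below logs the microphones Unity can see and the current authorization state. It uses only Unity's own Microphone and Application APIs. MicrophoneDiagnostics is a hypothetical component name.

```csharp
using UnityEngine;

// Hypothetical diagnostic: attach to any GameObject and read the Console on start.
public class MicrophoneDiagnostics : MonoBehaviour
{
    void Start()
    {
        bool authorized = Application.HasUserAuthorization(UserAuthorization.Microphone);
        Debug.Log($"Microphone authorized: {authorized}");

        if (Microphone.devices.Length == 0)
        {
            Debug.LogWarning("No microphone devices found.");
            return;
        }

        foreach (string device in Microphone.devices)
        {
            Debug.Log($"Microphone available: {device}");
        }
    }
}
```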

No audio output

Verify OutputAudioPlayer is assigned
Check audio volume settings
Ensure EnableOutputAudio is true

Poor transcription quality

Reduce background noise
Use a higher-quality microphone
Add context in Prompt parameter

Audio lag

Use Realtime API for lowest latency
Switch from tts-1-hd to tts-1 if response speed matters more than output quality
Check network connection
