Speech to Text

Transcribe audio to text using .GENTranscript() or .SpeechToText().

Basic Usage

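// Record from the default microphone (null) into a 10-second, non-looping buffer at 44.1 kHz.
AudioClip recording = Microphone.Start(null, false, 10, 44100);
// Capture about 5 seconds of speech, then stop recording.
await UniTask.Delay(5000);
Microphone.End(null);

// Send the clip for transcription and await the resulting text.
string transcript = await recording
    .GENTranscript()
    .ExecuteAsync();

Debug.Log($"You said: {transcript}");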
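
The clip created by Microphone.Start keeps its full buffer length (10 seconds here) even when recording stops after 5 seconds, so the tail is silence. The sketch below shows one way to trim the clip to the recorded portion before transcribing. It uses only Unity's Microphone and AudioClip APIs; `MicrophoneClipUtil` and `StopAndTrim` are hypothetical names for illustration and are not part of the library.

```csharp
using UnityEngine;

// Hypothetical helper for illustration; not part of the transcription library.
public static class MicrophoneClipUtil
{
    // Stops the microphone and returns a copy of the clip that contains
    // only the samples recorded so far.
    public static AudioClip StopAndTrim(AudioClip recording, string deviceName = null)
    {
        // Read the write position BEFORE ending the recording.
        int recordedSamples = Microphone.GetPosition(deviceName);
        Microphone.End(deviceName);

        // Position 0 means nothing was captured or the buffer was already full;
        // in both cases fall back to the original clip.
        if (recordedSamples <= 0)
        {
            return recording;
        }

        // GetData expects room for samples * channels interleaved values.
        var data = new float[recordedSamples * recording.channels];
        recording.GetData(data, 0);

        AudioClip trimmed = AudioClip.Create(
            recording.name + "_trimmed",
            recordedSamples,
            recording.channels,
            recording.frequency,
            false);
        trimmed.SetData(data, 0);
        return trimmed;
    }
}
```

With this helper, the call to Microphone.End(null) would be replaced by `recording = MicrophoneClipUtil.StopAndTrim(recording);`, which keeps the upload small and avoids sending silence to the provider.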

Input Types

AudioClip Input

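// Load the clip named "Recording" from a Resources folder; Load returns null if it is missing.
AudioClip audio = Resources.Load<AudioClip>("Recording");
// Transcribe the clip and await the resulting text.
string text = await audio
    .GENTranscript()
    .ExecuteAsync();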

File Input

Alias Method

Configuration

Model Selection

Language Hint

Supported languages:

  • English, Spanish, French, German, Italian, Portuguese

  • Chinese, Japanese, Korean

  • Arabic, Russian, Turkish

  • And many more...

Context Prompt

Provide context to improve accuracy:

Temperature

Control randomness of transcription (0.0-1.0):

Response Format

Available formats:

  • TranscriptFormat.Text - Plain text only

  • TranscriptFormat.Json - JSON with text

  • TranscriptFormat.VerboseJson - JSON with timestamps and metadata

  • TranscriptFormat.Srt - SubRip subtitle format

  • TranscriptFormat.Vtt - WebVTT subtitle format
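
The two subtitle formats differ mainly in their timestamp syntax: SRT separates milliseconds with a comma and numbers each cue, while WebVTT uses a period and starts the file with a WEBVTT header. The sketch below illustrates that difference in plain C#. `SubtitleTimestamps` is a hypothetical helper written for this page; the library is expected to return already-formatted text for the Srt and Vtt options, so this is only useful if you build cues yourself from timestamped output.

```csharp
using System;

// Hypothetical helper for illustration; not part of the transcription library.
public static class SubtitleTimestamps
{
    // SRT timestamps look like 00:00:03,725 (comma before milliseconds).
    public static string ToSrt(TimeSpan t) => t.ToString(@"hh\:mm\:ss\,fff");

    // WebVTT timestamps look like 00:00:03.725 (period before milliseconds).
    public static string ToVtt(TimeSpan t) => t.ToString(@"hh\:mm\:ss\.fff");

    // Builds one SRT cue: index line, time range line, text line, blank line.
    public static string SrtCue(int index, TimeSpan start, TimeSpan end, string text)
    {
        return index + "\n" + ToSrt(start) + " --> " + ToSrt(end) + "\n" + text + "\n\n";
    }
}
```

Note that the `hh` specifier wraps after 24 hours, which is not a concern for clips within the size limits listed on this page.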

Unity Integration Examples

Example 1: Voice Command System

Example 2: Real-time Subtitles

Example 3: Dictation System

Example 4: Audio File Transcriber

Example 5: Meeting Transcriber

Example 6: Multi-Language Support

Provider Support

OpenAI Whisper

Features:

  • ✅ 99+ languages

  • ✅ High accuracy

  • ✅ Timestamp support (in verbose mode; the Whisper API does not label speakers)

Google Chirp

Features:

  • ✅ Multiple languages

  • ✅ Real-time streaming

  • ✅ Punctuation

  • ✅ Word-level timestamps

Audio Requirements

Format Requirements

Supported formats:

  • WAV

  • MP3

  • M4A

  • FLAC

  • OGG

Recommended settings:

  • Sample rate: 16 kHz or higher

  • Channels: Mono or Stereo

  • Bit depth: 16-bit or higher
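
If you need to save recorded audio as a file that matches these settings, the samples can be written as 16-bit PCM WAV. The sketch below is a minimal encoder in plain C#, assuming interleaved float samples in the range -1 to 1 (the layout AudioClip.GetData produces). `WavEncoder` is a hypothetical name, and the library may already perform an equivalent conversion internally when given an AudioClip.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of the transcription library.
public static class WavEncoder
{
    // Encodes interleaved float samples (-1..1) as a 16-bit PCM WAV file in memory.
    public static byte[] EncodePcm16(float[] samples, int channels, int sampleRate)
    {
        const int bytesPerSample = 2;
        int dataSize = samples.Length * bytesPerSample;

        using (var stream = new MemoryStream(44 + dataSize))
        using (var writer = new BinaryWriter(stream))
        {
            // RIFF container header. BinaryWriter always writes little-endian.
            writer.Write(Encoding.ASCII.GetBytes("RIFF"));
            writer.Write(36 + dataSize);                          // bytes after this field
            writer.Write(Encoding.ASCII.GetBytes("WAVE"));

            // Format chunk: uncompressed PCM.
            writer.Write(Encoding.ASCII.GetBytes("fmt "));
            writer.Write(16);                                     // format chunk size
            writer.Write((short)1);                               // 1 = PCM
            writer.Write((short)channels);
            writer.Write(sampleRate);
            writer.Write(sampleRate * channels * bytesPerSample); // byte rate
            writer.Write((short)(channels * bytesPerSample));     // block align
            writer.Write((short)16);                              // bits per sample

            // Data chunk: clamp each sample and scale to the 16-bit range.
            writer.Write(Encoding.ASCII.GetBytes("data"));
            writer.Write(dataSize);
            foreach (float sample in samples)
            {
                float clamped = Math.Max(-1f, Math.Min(1f, sample));
                writer.Write((short)(clamped * short.MaxValue));
            }

            writer.Flush();
            return stream.ToArray();
        }
    }
}
```

The resulting byte array can be written to disk with File.WriteAllBytes. The header is always 44 bytes, so the file size is 44 plus two bytes per sample.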

Size Limits

OpenAI:

  • Max file size: 25 MB

  • Max duration: ~2 hours only for heavily compressed audio (roughly 28 kbps); uncompressed WAV reaches 25 MB much sooner

Google:

  • Max file size: 10 MB (for sync)

  • Max duration: 1 minute (for sync)
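
To see how these limits apply to uncompressed audio, the size of PCM data is sample rate × channels × bytes per sample × seconds. The sketch below does that arithmetic in plain C#. `AudioSizeEstimator` is a hypothetical helper, and it assumes 1 MB means 1,048,576 bytes and ignores the 44-byte WAV header; neither assumption is confirmed by the providers' limits as quoted here.

```csharp
using System;

// Hypothetical helper for illustration; not part of the transcription library.
public static class AudioSizeEstimator
{
    // Assumption: 1 MB = 1024 * 1024 bytes.
    public const long Megabyte = 1024L * 1024L;

    // Bytes per second of uncompressed PCM audio.
    public static long PcmBytesPerSecond(int sampleRate, int channels, int bitsPerSample)
    {
        return (long)sampleRate * channels * (bitsPerSample / 8);
    }

    // Longest uncompressed recording (in seconds) that fits in maxBytes.
    public static double MaxPcmSeconds(long maxBytes, int sampleRate, int channels, int bitsPerSample)
    {
        return (double)maxBytes / PcmBytesPerSecond(sampleRate, channels, bitsPerSample);
    }
}
```

Under those assumptions, 16 kHz mono 16-bit audio takes 32,000 bytes per second, so a 25 MB limit allows about 819 seconds (under 14 minutes) of uncompressed WAV, and 44.1 kHz stereo allows about 149 seconds. For a 10 MB, 1 minute synchronous limit, the duration cap is reached first at 16 kHz mono, since one minute is only about 1.9 MB. Longer recordings need a compressed format or need to be split into chunks.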

Best Practices for Audio

Best Practices

✅ Good Practices

❌ Bad Practices

Error Handling

Performance Tips

Next Steps
