
Speech to Text

Transcribe audio to text using .GENTranscription() or .SpeechToText().

Basic Usage

// Record up to 10 seconds from the default microphone at 44.1 kHz
AudioClip recording = Microphone.Start(null, false, 10, 44100);

// Stop after 5 seconds (the unused tail of the clip stays silent)
await UniTask.Delay(5000);
Microphone.End(null);

string transcript = await recording
    .GENTranscription()
    .ExecuteAsync();

Debug.Log($"You said: {transcript}");

Input Types

AudioClip Input

AudioClip audio = Resources.Load<AudioClip>("Recording");
string text = await audio
    .GENTranscription()
    .ExecuteAsync();

File Input
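The file-input entry point is not shown on this page; the sketch below assumes the package exposes the same GENTranscription() extension on a file path string (a hypothetical overload) — check your installed version for the actual API.

```csharp
// Hypothetical: assumes a string-path overload of GENTranscription().
string path = Application.persistentDataPath + "/recording.wav";

string text = await path
    .GENTranscription()
    .ExecuteAsync();

Debug.Log(text);
```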

Alias Method
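As noted above, .SpeechToText() is an alias of .GENTranscription(); the two are interchangeable:

```csharp
// .SpeechToText() behaves the same as .GENTranscription()
AudioClip audio = Resources.Load<AudioClip>("Recording");

string text = await audio
    .SpeechToText()
    .ExecuteAsync();
```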

Configuration

Model Selection
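The model-selection call is not shown on this page; the SetModel name below is an assumption — substitute whichever builder method your package version exposes. "whisper-1" is the OpenAI Whisper API model identifier.

```csharp
string text = await audio
    .GENTranscription()
    .SetModel("whisper-1")   // hypothetical setter name
    .ExecuteAsync();
```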

Language Hint
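A language hint tells the model which language to expect, improving accuracy on short or noisy clips. The SetLanguage name below is an assumption; hints are typically ISO-639-1 codes ("es" for Spanish).

```csharp
string text = await audio
    .GENTranscription()
    .SetLanguage("es")   // hypothetical setter name
    .ExecuteAsync();
```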

Supported languages:

  • English, Spanish, French, German, Italian, Portuguese

  • Chinese, Japanese, Korean

  • Arabic, Russian, Turkish

  • And many more...

Timestamp Granularities

Populate timestamps in the transcription output:

Available granularities:

  • "word" - Word-level timestamps (adds latency)

  • "segment" - Segment-level timestamps (no added latency)
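The granularity setter and the shape of the timestamped result depend on your package version; the sketch below assumes a SetTimestampGranularities builder method (hypothetical name).

```csharp
// "word" adds latency; "segment" does not.
var result = await audio
    .GENTranscription()
    .SetTimestampGranularities("word")   // hypothetical setter name
    .ExecuteAsync();
```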

Unity Integration Examples

Example 1: Voice Command System
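A minimal hold-to-talk sketch built from the Basic Usage pattern above. The key binding and the naive keyword matching are illustrative choices, not part of the package.

```csharp
using UnityEngine;
using Cysharp.Threading.Tasks;

public class VoiceCommandSystem : MonoBehaviour
{
    AudioClip recording;

    void Update()
    {
        // Hold V to talk, release to transcribe
        if (Input.GetKeyDown(KeyCode.V))
            recording = Microphone.Start(null, false, 10, 44100);

        if (Input.GetKeyUp(KeyCode.V))
        {
            Microphone.End(null);
            RunCommand().Forget();
        }
    }

    async UniTaskVoid RunCommand()
    {
        string text = (await recording.GENTranscription().ExecuteAsync())
            .ToLowerInvariant();

        // Naive keyword matching; swap in your own command parser
        if (text.Contains("jump"))      Debug.Log("Jump command");
        else if (text.Contains("stop")) Debug.Log("Stop command");
        else                            Debug.Log($"Unrecognized: {text}");
    }
}
```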

Example 2: Real-time Subtitles
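A chunked pseudo-streaming sketch: record short clips back to back and transcribe each one into a UI label. If your provider offers true streaming (see Google Chirp below), that API would replace this loop.

```csharp
using UnityEngine;
using UnityEngine.UI;
using Cysharp.Threading.Tasks;

public class SubtitleDisplay : MonoBehaviour
{
    public Text subtitleText;   // assign in the Inspector

    async UniTaskVoid Start()
    {
        while (enabled)
        {
            // Record a 5-second chunk at 16 kHz mono
            AudioClip chunk = Microphone.Start(null, false, 5, 16000);
            await UniTask.Delay(5000);
            Microphone.End(null);

            subtitleText.text = await chunk.GENTranscription().ExecuteAsync();
        }
    }
}
```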

Example 3: Dictation System
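A toggle-based dictation sketch that appends each transcribed utterance to a running document. The toggle wiring (e.g. a UI button) is an illustrative assumption.

```csharp
using System.Text;
using UnityEngine;
using Cysharp.Threading.Tasks;

public class DictationSystem : MonoBehaviour
{
    readonly StringBuilder document = new StringBuilder();
    AudioClip recording;
    bool dictating;

    // Wire this to a UI button or hotkey
    public void ToggleDictation()
    {
        if (!dictating)
        {
            recording = Microphone.Start(null, false, 60, 44100);
        }
        else
        {
            Microphone.End(null);
            AppendUtterance().Forget();
        }
        dictating = !dictating;
    }

    async UniTaskVoid AppendUtterance()
    {
        string sentence = await recording.GENTranscription().ExecuteAsync();
        document.AppendLine(sentence);
        Debug.Log(document.ToString());
    }
}
```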

Example 4: Audio File Transcriber
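A sketch that loads a WAV from StreamingAssets via UnityWebRequestMultimedia and transcribes the resulting AudioClip. The file name is hypothetical; adjust AudioType for other formats.

```csharp
using UnityEngine;
using UnityEngine.Networking;
using Cysharp.Threading.Tasks;

public class AudioFileTranscriber : MonoBehaviour
{
    public async UniTask<string> TranscribeFile(string fileName)
    {
        string url = "file://" + Application.streamingAssetsPath + "/" + fileName;

        using var request = UnityWebRequestMultimedia.GetAudioClip(url, AudioType.WAV);
        await request.SendWebRequest();

        AudioClip clip = DownloadHandlerAudioClip.GetContent(request);
        return await clip.GENTranscription().ExecuteAsync();
    }
}
```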

Example 5: Meeting Transcriber
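A sketch that records a meeting in 60-second segments, transcribing and timestamping each one; segmenting keeps every upload well under the provider size limits listed below.

```csharp
using System;
using System.Collections.Generic;
using UnityEngine;
using Cysharp.Threading.Tasks;

public class MeetingTranscriber : MonoBehaviour
{
    readonly List<string> minutes = new List<string>();

    public async UniTask RecordMeeting(int segments)
    {
        for (int i = 0; i < segments; i++)
        {
            // One 60-second segment at 16 kHz mono
            AudioClip segment = Microphone.Start(null, false, 60, 16000);
            await UniTask.Delay(60_000);
            Microphone.End(null);

            string text = await segment.GENTranscription().ExecuteAsync();
            minutes.Add($"[{DateTime.Now:HH:mm}] {text}");
        }

        Debug.Log(string.Join("\n", minutes));
    }
}
```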

Example 6: Multi-Language Support
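A sketch that derives a language hint from Unity's system language. The SetLanguage setter is a hypothetical name (see Language Hint above); the locale mapping is illustrative.

```csharp
using UnityEngine;
using Cysharp.Threading.Tasks;

public class MultiLanguageTranscriber : MonoBehaviour
{
    // Map Unity's system language to an ISO-639-1 hint
    static string LanguageHint() => Application.systemLanguage switch
    {
        SystemLanguage.Spanish  => "es",
        SystemLanguage.French   => "fr",
        SystemLanguage.Japanese => "ja",
        _                       => "en",
    };

    public async UniTask<string> Transcribe(AudioClip clip)
    {
        return await clip
            .GENTranscription()
            .SetLanguage(LanguageHint())   // hypothetical setter name
            .ExecuteAsync();
    }
}
```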

Provider Support

OpenAI Whisper

Features:

  • 99+ languages

  • High accuracy

  • Speaker diarization (in verbose mode)

  • Timestamp support

Google Chirp

Features:

  • Multiple languages

  • Real-time streaming

  • Punctuation

  • Word-level timestamps

Audio Requirements

Format Requirements

Supported formats:

  • WAV

  • MP3

  • M4A

  • FLAC

  • OGG

Recommended settings:

  • Sample rate: 16kHz or higher

  • Channels: Mono or Stereo

  • Bit depth: 16-bit or higher

Size Limits

OpenAI:

  • Max file size: 25 MB

  • Max duration: ~2 hours (at standard quality)

Google:

  • Max file size: 10 MB (for sync)

  • Max duration: 1 minute (for sync)
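Given those limits, a pre-flight check avoids a failed upload. The sketch below estimates the clip's uncompressed size, assuming it is sent as 16-bit PCM WAV (samples × channels × 2 bytes); compressed formats will be smaller.

```csharp
using UnityEngine;

public static class AudioLimits
{
    // Rough check against OpenAI's 25 MB cap for a 16-bit PCM upload
    public static bool FitsOpenAILimit(AudioClip clip)
    {
        long bytes = (long)clip.samples * clip.channels * 2;
        return bytes <= 25L * 1024 * 1024;
    }
}
```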

Best Practices

Good Practices

Bad Practices

Error Handling
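A minimal wrapper sketch: transcription can fail on network errors, oversized files, or provider limits, so catch and log rather than letting the exception escape. The package's specific exception types may be narrower than Exception.

```csharp
using System;
using UnityEngine;
using Cysharp.Threading.Tasks;

public static class SafeTranscription
{
    public static async UniTask<string> TryTranscribe(AudioClip clip)
    {
        try
        {
            return await clip.GENTranscription().ExecuteAsync();
        }
        catch (Exception e)
        {
            Debug.LogError($"Transcription failed: {e.Message}");
            return string.Empty;
        }
    }
}
```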

Performance Tips

Next Steps
