Speech to Text
Transcribe audio to text using .GENTranscript() or .SpeechToText().
Basic Usage
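// Record up to 10 seconds from the default microphone at 44.1 kHz
AudioClip recording = Microphone.Start(null, false, 10, 44100);
// Capture for 5 seconds, then stop recording
await UniTask.Delay(5000);
Microphone.End(null);
// Send the recorded clip for transcription
string transcript = await recording
    .GENTranscript()
    .ExecuteAsync();
Debug.Log($"You said: {transcript}");
Input Types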
AudioClip Input
AudioClip audio = Resources.Load<AudioClip>("Recording");
string text = await audio
    .GENTranscript()
    .ExecuteAsync();
File Input
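One way to transcribe audio stored on disk is to load it into an AudioClip first and then reuse the AudioClip input shown above. The following is a minimal sketch under that assumption. `AudioFileTranscriber` and `TranscribeWavFileAsync` are hypothetical names, not part of the library, and the sketch uses only Unity's `UnityWebRequestMultimedia` together with UniTask. It does not rely on any file-specific overload the library may offer.

```csharp
using System;
using Cysharp.Threading.Tasks;
using UnityEngine;
using UnityEngine.Networking;
// Also add the using directive for the namespace that provides GENTranscript().

public static class AudioFileTranscriber
{
    // Hypothetical helper (not part of the library): loads a WAV file from disk
    // into an AudioClip, then transcribes it through the AudioClip input.
    public static async UniTask<string> TranscribeWavFileAsync(string absolutePath)
    {
        // UnityWebRequest expects a URI, so convert the local path to file:///...
        string uri = new Uri(absolutePath).AbsoluteUri;

        using (UnityWebRequest request =
               UnityWebRequestMultimedia.GetAudioClip(uri, AudioType.WAV))
        {
            // UniTask's awaiter throws UnityWebRequestException if the request fails.
            await request.SendWebRequest();

            AudioClip clip = DownloadHandlerAudioClip.GetContent(request);
            return await clip.GENTranscript().ExecuteAsync();
        }
    }
}
```

For other containers, pass the matching `AudioType` value (for example `AudioType.MPEG` or `AudioType.OGGVORBIS`). Which of these Unity can decode depends on the target platform.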
Alias Method
Configuration
Model Selection
Language Hint
Supported languages:
English, Spanish, French, German, Italian, Portuguese
Chinese, Japanese, Korean
Arabic, Russian, Turkish
And many more...
Context Prompt
Provide context to improve accuracy:
Temperature
Control randomness of transcription (0.0-1.0):
Response Format
Available formats:
TranscriptFormat.Text - Plain text only
TranscriptFormat.Json - JSON with text
TranscriptFormat.VerboseJson - JSON with timestamps and metadata
TranscriptFormat.Srt - SubRip subtitle format
TranscriptFormat.Vtt - WebVTT subtitle format
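The two subtitle formats above differ mainly in how cue timestamps are written: SubRip uses a comma before the milliseconds, WebVTT uses a period (and a WebVTT file additionally starts with a `WEBVTT` header line). The helper below is an illustrative sketch of that difference only. `SubtitleTime` is a hypothetical name and is not part of the library.

```csharp
using System;

public static class SubtitleTime
{
    // SubRip (.srt) writes HH:MM:SS,mmm with a comma before the milliseconds.
    public static string ToSrt(TimeSpan t) =>
        $"{(int)t.TotalHours:00}:{t.Minutes:00}:{t.Seconds:00},{t.Milliseconds:000}";

    // WebVTT (.vtt) writes HH:MM:SS.mmm with a period instead.
    public static string ToVtt(TimeSpan t) =>
        $"{(int)t.TotalHours:00}:{t.Minutes:00}:{t.Seconds:00}.{t.Milliseconds:000}";
}
```

For a cue starting at 1 h 2 min 3 s 4 ms, `ToSrt` yields `01:02:03,004` and `ToVtt` yields `01:02:03.004`.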
Unity Integration Examples
Example 1: Voice Command System
Example 2: Real-time Subtitles
Example 3: Dictation System
Example 4: Audio File Transcriber
Example 5: Meeting Transcriber
Example 6: Multi-Language Support
Provider Support
OpenAI Whisper
Features:
✅ 99+ languages
✅ High accuracy
✅ Speaker diarization (in verbose mode)
✅ Timestamp support
Google Chirp
Features:
✅ Multiple languages
✅ Real-time streaming
✅ Punctuation
✅ Word-level timestamps
Audio Requirements
Format Requirements
Supported formats:
WAV
MP3
M4A
FLAC
OGG
Recommended settings:
Sample rate: 16kHz or higher
Channels: Mono or Stereo
Bit depth: 16-bit or higher
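The recommended settings above can be checked before sending audio. The following is a small sketch. `AudioSettingsCheck` and its members are hypothetical names, and the thresholds are simply the values listed above.

```csharp
public static class AudioSettingsCheck
{
    public const int MinSampleRateHz = 16000; // "16kHz or higher"
    public const int MinBitDepth = 16;        // "16-bit or higher"

    // Returns true when all three recommended settings are met.
    public static bool MeetsRecommendations(int sampleRateHz, int channels, int bitDepth)
    {
        bool rateOk = sampleRateHz >= MinSampleRateHz;
        bool channelsOk = channels == 1 || channels == 2; // mono or stereo
        bool depthOk = bitDepth >= MinBitDepth;
        return rateOk && channelsOk && depthOk;
    }
}
```

For a Unity AudioClip, the first two arguments would come from `clip.frequency` and `clip.channels`. Bit depth only becomes relevant when the clip's float samples are encoded to a file.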
Size Limits
OpenAI:
Max file size: 25 MB
Max duration: ~2 hours (at standard quality)
Google:
Max file size: 10 MB (for sync)
Max duration: 1 minute (for sync)
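A pre-flight check against the limits above can avoid a failed upload. The following is a sketch only. `SttProvider` and `UploadLimits` are hypothetical names, and treating 1 MB as 1024 × 1024 bytes is an assumption, since the limits above do not say which definition applies.

```csharp
using System.IO;

public enum SttProvider { OpenAI, Google }

public static class UploadLimits
{
    // Assumption: 1 MB = 1024 * 1024 bytes.
    const long MB = 1024 * 1024;

    // Size limits as listed above: 25 MB for OpenAI, 10 MB for Google (sync).
    public static long MaxBytes(SttProvider provider) =>
        provider == SttProvider.OpenAI ? 25 * MB : 10 * MB;

    public static bool FitsSizeLimit(long fileBytes, SttProvider provider) =>
        fileBytes <= MaxBytes(provider);

    // Convenience overload that reads the size of a file on disk.
    public static bool FitsSizeLimit(string path, SttProvider provider) =>
        FitsSizeLimit(new FileInfo(path).Length, provider);

    // Google's synchronous limit above is 1 minute of audio.
    public static bool FitsGoogleSyncDuration(float seconds) => seconds <= 60f;
}
```

For an AudioClip, `clip.length` gives the duration in seconds for the duration check.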
Best Practices for Audio
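A clip created by `Microphone.Start` keeps its full allocated length even when recording is stopped early, so the Basic Usage recording (10 seconds allocated, stopped after 5) carries trailing silence. The sketch below shows one possible way to trim it, using only Unity's own AudioClip API. `MicrophoneClipTrimmer` is a hypothetical helper, not part of the library.

```csharp
using UnityEngine;

public static class MicrophoneClipTrimmer
{
    // Hypothetical helper: pass the value of Microphone.GetPosition(device),
    // read BEFORE calling Microphone.End(device).
    public static AudioClip Trim(AudioClip source, int recordedSamples)
    {
        // Nothing to trim if the position is unknown or the clip is already full.
        if (recordedSamples <= 0 || recordedSamples >= source.samples)
            return source;

        // Samples are interleaved, so the buffer holds samples * channels floats.
        float[] data = new float[recordedSamples * source.channels];
        source.GetData(data, 0);

        AudioClip trimmed = AudioClip.Create(
            source.name + "_trimmed",
            recordedSamples,
            source.channels,
            source.frequency,
            false);
        trimmed.SetData(data, 0);
        return trimmed;
    }
}
```

In the Basic Usage flow this would mean calling `int position = Microphone.GetPosition(null);` just before `Microphone.End(null);` and transcribing `MicrophoneClipTrimmer.Trim(recording, position)` instead of `recording`. A shorter clip also helps stay within the provider size and duration limits.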
Best Practices
✅ Good Practices
❌ Bad Practices
Error Handling
Performance Tips
Next Steps
Speech Translation - Translate speech to English
Text to Speech - Generate speech from text
Voice Change - Modify voice characteristics