Speech Translation
Translate speech from any supported language into English using .GENTranslation() or .SpeechToEnglish().
Basic Usage
AudioClip foreignAudio = Resources.Load<AudioClip>("Spanish");
string english = await foreignAudio
    .GENTranslation()
    .ExecuteAsync();
Debug.Log($"Translation: {english}");
Input Types
AudioClip Input
AudioClip audio = Resources.Load<AudioClip>("ForeignSpeech");
string english = await audio
    .GENTranslation()
    .ExecuteAsync();
File Input
var audioFile = new File<AudioClip>(audioClip, "recording.mp3");
string english = await audioFile
    .GENTranslation()
    .ExecuteAsync();
Alias Method
Configuration
Model Selection
Context Prompt
Provide context for better translation:
Temperature
Supported Languages
OpenAI Whisper supports translation from 99+ languages to English:
Spanish, French, German, Italian, Portuguese
Chinese (Mandarin, Cantonese), Japanese, Korean
Russian, Arabic, Turkish, Polish
Dutch, Swedish, Norwegian, Danish
And many more...
Note: Translation always targets English. For other target languages, transcribe with .GENTranscript() and pass the resulting text to a separate translation API.
Unity Integration Examples
Example 1: Multi-Language Voice Chat
Example 2: Real-time Translation System
Example 3: Multi-Language Game Tutorial
Example 4: International Customer Support
Example 5: Language Learning Assistant
Example 6: Conference Call Translator
Differences from Transcription
| Feature | Translation | Transcription |
| --- | --- | --- |
| Output | Always English | Original language |
| Use Case | Cross-language communication | Same-language text |
| Source Languages | 99+ languages | 99+ languages |
| Target Language | English only | Original language |
When to Use
✅ Use Translation for
Converting foreign speech to English
International communication
Content localization
Customer support across languages
Educational content
❌ Use Transcription for
Same-language speech-to-text
Subtitles in original language
Voice commands
Dictation
Provider Support
OpenAI Whisper
Features:
✅ 99+ source languages
✅ High accuracy
✅ Context support
✅ English output only
Note: OpenAI Whisper is currently the primary provider for translation. Other providers may be added in future updates.
Best Practices
✅ Good Practices
❌ Bad Practices
Audio Requirements
Same as Speech to Text:
Supported formats:
WAV, MP3, M4A, FLAC, OGG
Limits:
Max file size: 25 MB
Max duration: ~2 hours
Quality:
Recommended sample rate: 16kHz+
Channels: Mono or Stereo
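The limits above can be checked locally before a clip is sent for translation. The following is a minimal sketch in plain C#, assuming that 25 MB means 25 × 1,048,576 bytes and that the format can be inferred from the file extension. TranslationAudioCheck and its members are hypothetical helper names and are not part of the SDK.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: mirrors the documented limits, not an SDK type.
public static class TranslationAudioCheck
{
    // Assumption: "25 MB" is interpreted as 25 * 1024 * 1024 bytes.
    public const long MaxFileSizeBytes = 25L * 1024 * 1024;

    // Recommended minimum sample rate from the list above (16 kHz).
    public const int RecommendedSampleRateHz = 16000;

    private static readonly string[] SupportedExtensions =
        { ".wav", ".mp3", ".m4a", ".flac", ".ogg" };

    // True when the file name ends in one of the documented formats.
    public static bool IsSupportedFormat(string fileName)
    {
        string ext = Path.GetExtension(fileName ?? string.Empty).ToLowerInvariant();
        return SupportedExtensions.Contains(ext);
    }

    // True when the encoded file fits under the documented size limit.
    public static bool IsWithinSizeLimit(long sizeInBytes)
    {
        return sizeInBytes >= 0 && sizeInBytes <= MaxFileSizeBytes;
    }

    // True when the clip meets the recommended sample rate.
    public static bool MeetsRecommendedSampleRate(int sampleRateHz)
    {
        return sampleRateHz >= RecommendedSampleRateHz;
    }
}
```

In Unity, the sample rate of a loaded clip is available as AudioClip.frequency, so MeetsRecommendedSampleRate(clip.frequency) is one way to apply the last check.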
Error Handling
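One way to guard a translation call is to wrap it so that failures come back as a result value instead of an unhandled exception. The sketch below assumes that failures surface as exceptions from the awaited call; this page does not state which exception types the SDK throws, so it catches cancellation separately and everything else generically. SafeTranslation and TranslationResult are hypothetical names, not SDK types.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical result type: carries either the English text or an error message.
public sealed class TranslationResult
{
    public bool Success { get; }
    public string Text { get; }
    public string Error { get; }

    public TranslationResult(bool success, string text, string error)
    {
        Success = success;
        Text = text;
        Error = error;
    }
}

// Hypothetical wrapper: not part of the SDK.
public static class SafeTranslation
{
    // 'translate' is any call that yields the English text. In Unity it could be
    //   async () => await clip.GENTranslation().ExecuteAsync()
    public static async Task<TranslationResult> TryTranslateAsync(Func<Task<string>> translate)
    {
        if (translate == null) throw new ArgumentNullException(nameof(translate));

        try
        {
            string english = await translate();

            // Treat an empty result as a failure so callers do not display blank text.
            if (string.IsNullOrWhiteSpace(english))
                return new TranslationResult(false, string.Empty, "Translation came back empty.");

            return new TranslationResult(true, english, null);
        }
        catch (OperationCanceledException)
        {
            return new TranslationResult(false, string.Empty, "Translation was cancelled.");
        }
        catch (Exception ex)
        {
            // Assumption: network and provider errors arrive as ordinary exceptions.
            return new TranslationResult(false, string.Empty, ex.Message);
        }
    }
}
```

Callers can then branch on result.Success and log result.Error with Debug.LogWarning rather than letting the exception escape an async void handler.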
Performance Tips
Workflow: Translate → Speak
Common pattern for creating English audio from foreign speech:
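A minimal sketch of that pattern, written as a generic two-step pipeline so it does not assume any particular text-to-speech call (none is shown on this page). TranslateThenSpeak is a hypothetical helper name; both steps are passed in as delegates.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: chains "translate to English" and "speak English text".
public static class TranslateThenSpeak
{
    // In Unity, TAudio would be AudioClip, and 'translate' could be
    //   async clip => await clip.GENTranslation().ExecuteAsync()
    // 'speak' is whichever text-to-speech call the project uses (an assumption here).
    public static async Task<TAudio> RunAsync<TAudio>(
        TAudio foreignAudio,
        Func<TAudio, Task<string>> translate,
        Func<string, Task<TAudio>> speak) where TAudio : class
    {
        if (foreignAudio == null) throw new ArgumentNullException(nameof(foreignAudio));
        if (translate == null) throw new ArgumentNullException(nameof(translate));
        if (speak == null) throw new ArgumentNullException(nameof(speak));

        // Step 1: foreign speech -> English text.
        string english = await translate(foreignAudio);

        // Skip synthesis when there is nothing to say.
        if (string.IsNullOrWhiteSpace(english)) return null;

        // Step 2: English text -> English audio.
        return await speak(english);
    }
}
```

Keeping the two steps as delegates means the translation call and the speech provider can be swapped or mocked independently, which also makes the pipeline easy to exercise outside the Unity editor.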
Next Steps
Speech to Text - Transcribe in original language
Text to Speech - Generate speech from text
Voice Change - Modify voice characteristics