Speech Translation

Translate speech in any supported source language into English with .GENTranslation() or its alias, .SpeechToEnglish().

Basic Usage

AudioClip foreignAudio = Resources.Load<AudioClip>("Spanish");
string english = await foreignAudio
    .GENTranslation()
    .ExecuteAsync();

Debug.Log($"Translation: {english}");

Input Types

AudioClip Input

AudioClip audio = Resources.Load<AudioClip>("ForeignSpeech");
string english = await audio
    .GENTranslation()
    .ExecuteAsync();

File Input

// audioClip is an AudioClip reference you already hold (for example, a recording).
var audioFile = new File<AudioClip>(audioClip, "recording.mp3");
string english = await audioFile
    .GENTranslation()
    .ExecuteAsync();

Alias Method
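
The introduction names .SpeechToEnglish() alongside .GENTranslation(). A minimal sketch, assuming the alias takes no arguments and returns the same request builder, so the rest of the call chain is unchanged (the clip name is reused from Basic Usage):

```csharp
// Assumption: .SpeechToEnglish() is a drop-in alias for .GENTranslation().
AudioClip audio = Resources.Load<AudioClip>("Spanish");
string english = await audio
    .SpeechToEnglish()
    .ExecuteAsync();

Debug.Log($"Translation: {english}");
```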

Configuration

Model Selection

Context Prompt

Provide context for better translation:

Temperature

Supported Languages

OpenAI Whisper supports translation from 99+ languages to English:

  • Spanish, French, German, Italian, Portuguese

  • Chinese (Mandarin, Cantonese), Japanese, Korean

  • Russian, Arabic, Turkish, Polish

  • Dutch, Swedish, Norwegian, Danish

  • And many more...

Note: The output is always English. For any other target language, transcribe the audio with .GENTranscript() and pass the resulting text to a separate translation API.

Unity Integration Examples

Example 1: Multi-Language Voice Chat

Example 2: Real-time Translation System

Example 3: Multi-Language Game Tutorial

Example 4: International Customer Support

Example 5: Language Learning Assistant

Example 6: Conference Call Translator

Differences from Transcription

Feature            Translation                    Transcription
Output             Always English                 Original language
Use Case           Cross-language communication   Same-language text
Source Languages   99+ languages                  99+ languages
Target Language    English only                   Original language

When to Use

✅ Use Translation for

  • Converting foreign speech to English

  • International communication

  • Content localization

  • Customer support across languages

  • Educational content

❌ Use Transcription for

  • Same-language speech-to-text

  • Subtitles in original language

  • Voice commands

  • Dictation

Provider Support

OpenAI Whisper

Features:

  • ✅ 99+ source languages

  • ✅ High accuracy

  • ✅ Context support

  • ✅ English output only

Note: OpenAI Whisper is currently the primary provider for translation. Other providers may be added in future updates.

Best Practices

✅ Good Practices

❌ Bad Practices

Audio Requirements

The requirements are the same as for Speech to Text.

Supported formats:

  • WAV, MP3, M4A, FLAC, OGG

Limits:

  • Max file size: 25 MB

  • Max duration: ~2 hours

Quality:

  • Recommended sample rate: 16 kHz or higher

  • Channels: Mono or Stereo
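
The format and size limits above can be checked locally before a file is sent. The helper below is a sketch, not part of the library: the class and method names are hypothetical, it uses only standard .NET APIs, and it assumes that "25 MB" means 25 × 1024 × 1024 bytes. It does not check duration or sample rate.

```csharp
using System;
using System.IO;

// Hypothetical helper: pre-flight checks based on the limits listed above.
public static class AudioRequirements
{
    // Assumption: 25 MB is interpreted as binary megabytes.
    public const long MaxFileSizeBytes = 25L * 1024 * 1024;

    private static readonly string[] SupportedExtensions =
        { ".wav", ".mp3", ".m4a", ".flac", ".ogg" };

    // True when the file extension matches one of the supported formats.
    public static bool IsSupportedFormat(string path)
    {
        string ext = Path.GetExtension(path);
        return Array.Exists(SupportedExtensions,
            e => string.Equals(e, ext, StringComparison.OrdinalIgnoreCase));
    }

    // True when the size is non-zero and within the 25 MB limit.
    public static bool IsWithinSizeLimit(long sizeInBytes)
    {
        return sizeInBytes > 0 && sizeInBytes <= MaxFileSizeBytes;
    }

    // Returns null when the file passes both checks, otherwise a short reason.
    public static string Validate(string path)
    {
        var info = new FileInfo(path);
        if (!info.Exists) return "File not found.";
        if (!IsSupportedFormat(path)) return "Unsupported audio format.";
        if (!IsWithinSizeLimit(info.Length)) return "File is empty or larger than 25 MB.";
        return null;
    }
}
```

Calling AudioRequirements.Validate(path) before building the request lets you reject an oversized or unsupported file without a network round trip.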

Error Handling
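
Translation requests go over the network, so a call can fail at runtime. The wrapper below is a sketch under stated assumptions: the source does not list the exception types the library throws, so it catches System.Exception broadly. SafeTranslation and TryTranslateAsync are hypothetical names. The using directive for the package that provides .GENTranslation() is left out because its namespace is not named here.

```csharp
using System;
using System.Threading.Tasks;
using UnityEngine;

// Hypothetical wrapper: logs failures instead of letting them propagate.
public static class SafeTranslation
{
    // Returns the English text, or null if the clip is missing or the request fails.
    public static async Task<string> TryTranslateAsync(AudioClip clip)
    {
        if (clip == null)
        {
            Debug.LogWarning("No audio clip to translate.");
            return null;
        }

        try
        {
            return await clip
                .GENTranslation()
                .ExecuteAsync();
        }
        catch (Exception ex)
        {
            // Specific exception types are not documented here, so catch broadly and log.
            Debug.LogError($"Translation failed: {ex.Message}");
            return null;
        }
    }
}
```

Callers then check the result for null rather than wrapping every call site in its own try/catch.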

Performance Tips

Workflow: Translate → Speak

Common pattern for creating English audio from foreign speech:
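
A sketch of that pattern follows. Only .GENTranslation().ExecuteAsync() comes from this page. The synthesizeSpeech delegate is a hypothetical stand-in for whichever text-to-speech call you use, and the class and method names are also assumptions. Playback uses Unity's standard AudioSource.

```csharp
using System;
using System.Threading.Tasks;
using UnityEngine;

// Hypothetical helper chaining translation and speech synthesis.
public static class TranslateThenSpeak
{
    // synthesizeSpeech is hypothetical: English text in, spoken AudioClip out.
    public static async Task RunAsync(
        AudioClip foreignAudio,
        Func<string, Task<AudioClip>> synthesizeSpeech,
        AudioSource output)
    {
        // Step 1: foreign speech to English text.
        string english = await foreignAudio
            .GENTranslation()
            .ExecuteAsync();

        // Step 2: English text to English speech via the supplied delegate.
        AudioClip spoken = await synthesizeSpeech(english);

        // Step 3: play the result on the given AudioSource.
        output.clip = spoken;
        output.Play();
    }
}
```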

Next Steps
