Text to Speech

Generate natural-sounding speech from text using .GENSpeech() or .TextToSpeech().

Basic Usage

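// Generate a clip from a string; call this from inside an async method.
AudioClip speech = await "Welcome back, Commander!"
    .GENSpeech()
    .ExecuteAsync();

// audioSource is an AudioSource component you have already referenced.
audioSource.clip = speech;
audioSource.Play();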

Input Types

String Input

AudioClip speech = await "Hello, world!"
    .GENSpeech()
    .ExecuteAsync();

Prompt Input

var prompt = new Prompt("The {character} says: {dialogue}");
AudioClip speech = await prompt
    .GENSpeech()
    .ExecuteAsync();
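
This page does not show how the {character} and {dialogue} placeholders get their values. If the library's Prompt type does not substitute them for you (an assumption worth checking against its API), one option is to fill the template yourself and pass the resulting string to .GENSpeech(). The helper below is a hypothetical sketch, not part of the library:

```csharp
using System;
using System.Collections.Generic;

public static class PromptTemplateSketch
{
    // Hypothetical helper, not part of the library: replaces each {key}
    // in the template with the matching value from the dictionary.
    public static string Fill(string template, IReadOnlyDictionary<string, string> values)
    {
        string result = template;
        foreach (KeyValuePair<string, string> pair in values)
        {
            result = result.Replace("{" + pair.Key + "}", pair.Value);
        }
        return result;
    }
}
```

Calling PromptTemplateSketch.Fill with the template above and values for character and dialogue yields a plain string that can be used exactly like the String Input example.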

Alias Method

.TextToSpeech() is an alias for .GENSpeech(); use whichever name reads better in your code.

Configuration

Voice Selection

Available OpenAI Voices:

  • OpenAIVoice.Alloy - Neutral, balanced

  • OpenAIVoice.Echo - Male, clear

  • OpenAIVoice.Fable - British, expressive

  • OpenAIVoice.Onyx - Deep, authoritative

  • OpenAIVoice.Nova - Energetic, young

  • OpenAIVoice.Shimmer - Soft, gentle

Available ElevenLabs Voices:

  • ElevenLabsVoice.Rachel - Calm, natural

  • ElevenLabsVoice.Adam - Deep, confident

  • ElevenLabsVoice.Antoni - Warm, friendly

  • ElevenLabsVoice.Arnold - Mature, strong

  • And many more...

Model Selection

Voice Settings (ElevenLabs)

Parameters:

  • Stability (0.0-1.0): Higher values give more consistent output; lower values give more expressive output

  • Similarity Boost (0.0-1.0): How closely to match the original voice

  • Style (0.0-1.0): Style exaggeration (model-dependent)
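
Since all three parameters share the 0.0-1.0 range, it can help to clamp values before sending them, for example when they come from UI sliders or saved settings. The struct below is a hypothetical container for illustration only; it is not the library's settings type, and how the values are passed to the request is not shown on this page.

```csharp
using System;

// Hypothetical container, not the library's type: holds the three
// ElevenLabs parameters and clamps each one into the 0.0-1.0 range.
public readonly struct VoiceSettingsSketch
{
    public float Stability { get; }
    public float SimilarityBoost { get; }
    public float Style { get; }

    public VoiceSettingsSketch(float stability, float similarityBoost, float style)
    {
        Stability = Clamp01(stability);
        SimilarityBoost = Clamp01(similarityBoost);
        Style = Clamp01(style);
    }

    // Limits a value to the closed interval [0, 1].
    private static float Clamp01(float value) => Math.Max(0f, Math.Min(1f, value));
}
```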

Audio Format

Speed Control

Unity Integration Examples

Example 1: NPC Dialogue

Example 2: Tutorial Narrator

Example 3: Dynamic UI Feedback

Example 4: Accessibility Reader

Example 5: Multi-Language Support

Example 6: Subtitle Generator
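
A possible starting point, assuming you already have the subtitle lines and the generated clip's duration: spread the duration across the lines in proportion to their character counts. This only approximates the real speech timing, and SubtitleCue and SubtitleTiming are hypothetical names, not library types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: one subtitle line with start and end times in seconds.
public readonly struct SubtitleCue
{
    public string Text { get; }
    public float Start { get; }
    public float End { get; }

    public SubtitleCue(string text, float start, float end)
    {
        Text = text;
        Start = start;
        End = end;
    }
}

public static class SubtitleTiming
{
    // Gives each line a share of clipSeconds proportional to its length.
    public static List<SubtitleCue> Build(IReadOnlyList<string> lines, float clipSeconds)
    {
        int totalChars = 0;
        foreach (string line in lines) totalChars += line.Length;

        var cues = new List<SubtitleCue>();
        if (totalChars == 0) return cues;

        float cursor = 0f;
        foreach (string line in lines)
        {
            float duration = clipSeconds * line.Length / totalChars;
            cues.Add(new SubtitleCue(line, cursor, cursor + duration));
            cursor += duration;
        }
        return cues;
    }
}
```

In Unity, clipSeconds can come from the length property of the generated AudioClip.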

Provider Support

OpenAI

Features:

  • ✅ Multiple voices

  • ✅ Speed control

  • ✅ HD quality option

  • ✅ Low latency

ElevenLabs

Features:

  • ✅ Highly natural voices

  • ✅ Voice cloning

  • ✅ Fine-grained control

  • ✅ Emotional range

Best Practices

✅ Good Practices

❌ Bad Practices

Performance Tips
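
One tip that follows from the cost and rate-limit constraints listed under Limitations is to cache clips for lines that repeat, such as UI feedback or common NPC barks. The class below is a hypothetical sketch, not part of the library. It stores the pending or completed task per text so that repeated requests share one API call, and it is not thread-safe, which is usually acceptable when called from Unity's main thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper, not part of the library: remembers the result of
// an async generation call per text so repeated lines reuse one request.
public sealed class SpeechCache<TClip>
{
    private readonly Dictionary<string, Task<TClip>> _entries =
        new Dictionary<string, Task<TClip>>();
    private readonly Func<string, Task<TClip>> _generate;

    public SpeechCache(Func<string, Task<TClip>> generate)
    {
        _generate = generate ?? throw new ArgumentNullException(nameof(generate));
    }

    public async Task<TClip> GetAsync(string text)
    {
        if (!_entries.TryGetValue(text, out Task<TClip> task))
        {
            task = _generate(text);
            _entries[text] = task;
        }

        try
        {
            return await task;
        }
        catch
        {
            // Drop failed entries so a later call can try again.
            _entries.Remove(text);
            throw;
        }
    }
}
```

With the API from Basic Usage, a cache could be created as new SpeechCache<AudioClip>(async text => await text.GENSpeech().ExecuteAsync()).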

Error Handling
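
Speech generation is a network call, so it can fail on timeouts or rate limits. A minimal sketch of retrying with exponential backoff follows. SpeechRetry is a hypothetical helper, and it catches Exception broadly because this page does not list the exception types the library throws; narrow the catch once you know them.

```csharp
using System;
using System.Threading.Tasks;

public static class SpeechRetry
{
    // Hypothetical helper: runs the operation up to maxAttempts times and
    // doubles the wait after each failure. The last failure is rethrown.
    public static async Task<T> RunAsync<T>(
        Func<Task<T>> operation, int maxAttempts, TimeSpan initialDelay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        TimeSpan delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                await Task.Delay(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

A call might look like await SpeechRetry.RunAsync(async () => await "Hello".GENSpeech().ExecuteAsync(), 3, TimeSpan.FromSeconds(1)), wrapped in a try/catch that falls back to text-only output if every attempt fails.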

Limitations

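  1. Text Length: Most providers cap the characters per request, typically somewhere in the 2,000-5,000 range; check your provider's current limit

  2. Rate Limits: Providers may limit the number of API calls per minute

  3. Cost: Providers typically bill by character count, so longer text costs more

  4. Real-time: Not suitable for real-time dialogue; use the Realtime API instead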
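
To work within the text length limit, long passages can be split into several requests. The sketch below breaks text after sentence punctuation where possible and cuts hard only when a single sentence exceeds the limit. SpeechTextSplitter is a hypothetical helper, and maxChars is a value you supply from your provider's documentation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SpeechTextSplitter
{
    // Splits text into chunks of at most maxChars characters, breaking
    // after sentence punctuation where possible.
    public static List<string> Split(string text, int maxChars)
    {
        if (maxChars <= 0) throw new ArgumentOutOfRangeException(nameof(maxChars));

        var chunks = new List<string>();
        var current = new StringBuilder();

        foreach (string sentence in SplitSentences(text))
        {
            string piece = sentence;
            // A single sentence longer than the limit is cut hard.
            while (piece.Length > maxChars)
            {
                Flush(chunks, current);
                chunks.Add(piece.Substring(0, maxChars));
                piece = piece.Substring(maxChars);
            }
            if (current.Length + piece.Length > maxChars) Flush(chunks, current);
            current.Append(piece);
        }
        Flush(chunks, current);
        return chunks;
    }

    // Moves the buffered text into the chunk list and empties the buffer.
    private static void Flush(List<string> chunks, StringBuilder current)
    {
        string chunk = current.ToString().Trim();
        if (chunk.Length > 0) chunks.Add(chunk);
        current.Clear();
    }

    // Yields sentences, each keeping its terminator and trailing whitespace.
    private static IEnumerable<string> SplitSentences(string text)
    {
        int start = 0;
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '.' || c == '!' || c == '?')
            {
                int end = i + 1;
                while (end < text.Length && char.IsWhiteSpace(text[end])) end++;
                yield return text.Substring(start, end - start);
                start = end;
                i = end - 1;
            }
        }
        if (start < text.Length) yield return text.Substring(start);
    }
}
```

Each chunk can then be generated and played in sequence. Because every chunk is a separate request, rate limits and cost still apply per chunk.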

Next Steps
