Speech to Text (STT)

Convert spoken words from an AudioClip into text using AI transcription models such as OpenAI Whisper.

Ideal for voice commands, user feedback, subtitles, or audio-driven gameplay systems.

✅ Basic Usage

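// Run this inside an async method, since the call is awaited.
AudioClip recording = MicrophoneCapture.GetLastClip();

// Transcribe the clip with Whisper; the language hint is optional.
string result = await recording
    .GENTranscript()
    .SetModel(OpenAIModel.Whisper)
    .SetLanguage(SystemLanguage.Korean)
    .ExecuteAsync();

Debug.Log("Transcript: " + result);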

🔊 The AudioClip can come from a microphone, a file, or any other runtime source.

⚙️ Configuration Options

| Method | Description |
| --- | --- |
| SetLanguage(SystemLanguage) | Optional hint to improve transcription accuracy |
| SetModel(model) | Choose which STT model to use (Whisper, Gemini STT, etc.) |
| SetOutputPath(path) | Save transcription to file (optional) |
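
As a rough sketch, the options above might be chained in a single call as shown here. The `TranscriptSaver` class and the `transcript.txt` file name are hypothetical, and two things are assumed rather than confirmed by this page: that `SetOutputPath` accepts a file path string, and that the library's own namespace (not shown here) has been imported.

```csharp
using System.IO;
using UnityEngine;
// Also import the library's own namespace here; this page does not name it.

// Hypothetical component, for illustration only.
public class TranscriptSaver : MonoBehaviour
{
    private async void Start()
    {
        AudioClip recording = MicrophoneCapture.GetLastClip();

        // Assumption: SetOutputPath takes a file path string.
        string path = Path.Combine(Application.persistentDataPath, "transcript.txt");

        try
        {
            // 'var' is used because the result type with SetOutputPath is not documented here.
            var result = await recording
                .GENTranscript()
                .SetModel(OpenAIModel.Whisper)       // choose the STT model
                .SetLanguage(SystemLanguage.Korean)  // optional language hint
                .SetOutputPath(path)                 // optional: request a saved copy
                .ExecuteAsync();

            Debug.Log("Transcript: " + result + " (requested output path: " + path + ")");
        }
        catch (System.Exception e)
        {
            // Network or API failures would surface here; the exact exception type is not documented.
            Debug.LogError("Transcription failed: " + e.Message);
        }
    }
}
```

Writing under `Application.persistentDataPath` keeps the file in a location that is writable on every Unity platform.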

🌍 Translation Mode

You can also translate speech into English using GENTranslation():

// 'recording' is the same AudioClip used in Basic Usage.
string english = await recording
    .GENTranslation()
    .SetModel(OpenAIModel.Whisper)
    .ExecuteAsync();

🗣️ This uses the same audio input but produces translated text (into English).

📦 Example Result

Audio Input: "안녕하세요, 오늘 날씨 어때요?"
Transcript: "안녕하세요, 오늘 날씨 어때요?"
Translation: "Hello, how's the weather today?"

🧠 Tips

Works best with clean, mono audio at 16 kHz or higher.
SetLanguage is optional: the model can auto-detect the language, but a hint improves accuracy.
For multilingual games or voice input, pair this with Text Generation to produce natural responses.
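
To illustrate the first tip, here is a minimal sketch of downmixing an interleaved multi-channel buffer to mono before transcription. `AudioPrep` and both of its methods are hypothetical helpers written for this example; they are not part of the library described on this page.

```csharp
using System;

// Hypothetical helper, not part of the library described above.
public static class AudioPrep
{
    // Averages the channels of each frame in an interleaved buffer
    // (the layout Unity's AudioClip.GetData fills) into one mono sample.
    public static float[] DownmixToMono(float[] interleaved, int channels)
    {
        if (interleaved == null) throw new ArgumentNullException(nameof(interleaved));
        if (channels < 1) throw new ArgumentOutOfRangeException(nameof(channels));
        if (channels == 1) return (float[])interleaved.Clone();

        int frames = interleaved.Length / channels; // a trailing partial frame is ignored
        var mono = new float[frames];
        for (int f = 0; f < frames; f++)
        {
            float sum = 0f;
            for (int c = 0; c < channels; c++)
            {
                sum += interleaved[f * channels + c];
            }
            mono[f] = sum / channels;
        }
        return mono;
    }

    // True when the sample rate meets the 16 kHz guideline from the tips.
    public static bool MeetsSampleRateGuideline(int sampleRateHz)
    {
        return sampleRateHz >= 16000;
    }
}
```

In Unity, the input buffer can be filled with `clip.GetData(samples, 0)`, where `samples` has length `clip.samples * clip.channels`. A mono clip can then be rebuilt with `AudioClip.Create(name, mono.Length, 1, clip.frequency, false)` followed by `SetData(mono, 0)`.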
