Speech to Text (STT)

Convert speech in an AudioClip into text using an AI transcription model such as Whisper.

Ideal for voice commands, user feedback, subtitles, or audio-driven gameplay systems.


✅ Basic Usage

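// Grab the most recent microphone recording as an AudioClip.
AudioClip recording = MicrophoneCapture.GetLastClip();

// Build the transcription request and await the resulting text.
// This must run inside an async method.
string result = await recording
    .GENTranscript()
    .SetModel(OpenAIModel.Whisper)
    .SetLanguage(SystemLanguage.Korean) // optional language hint
    .ExecuteAsync();

Debug.Log("Transcript: " + result);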

🔊 The AudioClip can come from a microphone, a file, or any other runtime source.


βš™οΈ Configuration Options

| Method | Description |
| --- | --- |
| SetLanguage(SystemLanguage) | Optional hint to improve transcription accuracy |
| SetModel(model) | Choose which STT model to use (Whisper, Gemini STT, etc.) |
| SetOutputPath(path) | Save transcription to file (optional) |
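The options above can be chained on a single request. The following is a minimal sketch, not a definitive implementation. It assumes that the toolkit's namespace is already imported (the source does not name it), that SetOutputPath accepts a file path string, and that the order of the Set calls does not matter. The class name, method name, and the file name transcript.txt are hypothetical.

```csharp
using System.IO;
using System.Threading.Tasks;
using UnityEngine;

public static class TranscriptOptionsExample
{
    // Hypothetical helper: transcribes a clip and also saves the text to disk.
    // Assumption: SetOutputPath takes a file path string.
    public static async Task<string> TranscribeAndSaveAsync(AudioClip recording)
    {
        // Application.persistentDataPath is a writable folder on every Unity platform.
        string outputPath = Path.Combine(Application.persistentDataPath, "transcript.txt");

        string result = await recording
            .GENTranscript()
            .SetModel(OpenAIModel.Whisper)
            .SetLanguage(SystemLanguage.Korean) // optional language hint
            .SetOutputPath(outputPath)          // optional: write the transcript to a file
            .ExecuteAsync();

        return result;
    }
}
```

Omitting SetLanguage or SetOutputPath should leave the rest of the chain unchanged, since both are described as optional.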


🌍 Translation Mode

You can also translate speech into English using GENTranslation():

string english = await recording
    .GENTranslation()
    .SetModel(OpenAIModel.Whisper)
    .ExecuteAsync();

πŸ—£οΈ This uses the same audio input but produces translated text (into English).


📦 Example Result

Audio Input: "안녕하세요, 오늘 날씨 어때요?"
Transcript: "안녕하세요, 오늘 날씨 어때요?"
Translation: "Hello, how's the weather today?"


🧠 Tips

  • Works best with clean, mono audio sampled at 16 kHz or higher.

  • SetLanguage is optional; the model can auto-detect the language, but a hint improves accuracy.

  • For multilingual games or voice input, pair this with Text Generation to produce natural responses.
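Because the tips recommend mono audio, a multi-channel clip may benefit from being downmixed before transcription. The sketch below uses only standard Unity AudioClip calls (GetData, Create, SetData). The class and method names are hypothetical and not part of the toolkit. It does not resample, so the clip keeps its original sample rate, and GetData requires a clip whose sample data is readable (for imported assets, typically the "Decompress On Load" load type).

```csharp
using UnityEngine;

public static class AudioClipMonoUtil
{
    // Averages interleaved channel samples into a single mono channel.
    public static float[] DownmixToMono(float[] interleaved, int channels)
    {
        if (channels <= 1) return interleaved;

        int frames = interleaved.Length / channels;
        float[] mono = new float[frames];
        for (int f = 0; f < frames; f++)
        {
            float sum = 0f;
            for (int c = 0; c < channels; c++)
            {
                sum += interleaved[f * channels + c];
            }
            mono[f] = sum / channels;
        }
        return mono;
    }

    // Returns a mono copy of the clip, or the clip itself if it is already mono.
    public static AudioClip ToMono(AudioClip clip)
    {
        if (clip.channels == 1) return clip;

        // clip.samples is the length per channel, so the buffer holds samples * channels values.
        float[] data = new float[clip.samples * clip.channels];
        clip.GetData(data, 0);

        float[] mono = DownmixToMono(data, clip.channels);

        AudioClip result = AudioClip.Create(clip.name + "_mono", clip.samples, 1, clip.frequency, false);
        result.SetData(mono, 0);
        return result;
    }
}
```

The returned clip can then be passed to GENTranscript() in the same way as the recording in Basic Usage.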
