AI DevKit

Speech to text


Last updated 10 months ago

Integrating OpenAI's Speech-to-Text (STT) capabilities into your Unity project enables you to transcribe audio content into written text. This feature is powered by OpenAI's advanced speech recognition models, making it invaluable for applications that involve voice commands, audio content accessibility, or the processing of spoken user inputs.

For detailed information about the Speech-to-Text API, including the available models, parameter options, and best practices for audio files, refer to the official OpenAI Speech-to-Text API documentation.

Speech-to-Text Operations Overview:

  • Audio Transcription: Convert spoken words from audio files into accurate written text. This process facilitates the understanding and utilization of spoken language within your applications.

  • Audio Translation: Translate spoken language from audio files into written English text.

Sample Code for Speech-to-Text Requests:

1. Audio Transcription Request:

Transcribe audio content to text. You'll need to provide the audio file as a FormFile (for API requests) or an AudioClip object (within Unity).

var audioFile = new FormFile("path/to/speech.mp3");
var request = new TranscriptionRequest.Builder()
    .SetModel(WhisperModel.Whisper1)
    .SetFile(audioFile)
    .Build();

var result = await request.ExecuteAsync();

Debug.Log(result.Text);
Debug.Log(result.Language);
Debug.Log(result.Duration);
The equivalent request using the official OpenAI Python SDK:

from openai import OpenAI
client = OpenAI()

audio_file = open("speech.mp3", "rb")
transcript = client.audio.transcriptions.create(
  model="whisper-1",
  file=audio_file
)

print(transcript.text)

2. Audio Translation Request:

Translate audio content into English text. You'll need to provide the audio file as a FormFile (for API requests) or an AudioClip object (within Unity).

var audioFile = new FormFile("path/to/speech.mp3");

var request = new TranslationRequest.Builder()
    .SetModel(WhisperModel.Whisper1)
    .SetFile(audioFile)
    .Build();

var result = await request.ExecuteAsync();

Debug.Log(result.Text);
The equivalent request using the official OpenAI Python SDK:

from openai import OpenAI
client = OpenAI()

audio_file = open("speech.mp3", "rb")
transcript = client.audio.translations.create(
  model="whisper-1",
  file=audio_file
)

print(transcript.text)
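Whichever route you use, OpenAI's audio endpoints reject uploads larger than 25 MB, so checking the file size before building a request can save a failed round trip. A minimal standalone sketch — validate_audio and MAX_UPLOAD_BYTES are illustrative helpers, not part of either SDK:

```python
import os

# OpenAI's documented cap on audio uploads (25 MB) at the time of writing.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024

def validate_audio(path: str) -> int:
    """Return the file size in bytes, raising if it exceeds the upload cap."""
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(
            f"{path} is {size} bytes; the API accepts at most {MAX_UPLOAD_BYTES} bytes."
        )
    return size

# Demo: create a small placeholder file and validate it.
with open("speech.mp3", "wb") as f:
    f.write(b"\x00" * 1024)

print(validate_audio("speech.mp3"))  # prints 1024
```

Files over the cap need to be compressed or split into chunks before upload.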

🎙️ Speech API Reference