githubEdit

diagram-projectHow Agent Works

AIDevKit Agent is an advanced conversational AI system that provides a unified interface for interacting with various AI LLM providers. This document explains the internal architecture and working principles of the Agent in detail.

Agent Architecture

AIDevKit Agent Architecture
Complete architecture structure of AIDevKit Agent

Core Components

The Agent consists of multiple core components, each responsible for specific functionality.

1. Agent Core

The central hub of the Agent that coordinates and manages all components.

Key Responsibilities:

  • Component lifecycle management

  • Event coordination and routing

  • API communication orchestration

  • State management

2. AgentControlHub

The central control system that manages Agent state and events.

Management Areas:

  • AgentStatus: Current Agent state (Idle, Listening, Processing, Speaking)

  • Events: Publishing and subscribing to all Agent events

  • Hooks: Lifecycle hook execution

3. AgentChatApiAdapter

Abstracts and integrates API communication with various LLM providers.

Supported API Types:

  • Chat Completion API: Standard chat completion API

  • Assistants API: OpenAI Assistants API

  • Responses API: Structured response API

  • Realtime API: Real-time voice conversation API

Supported Providers:

  • OpenAI

  • Google Gemini

  • Anthropic Claude

  • Other compatible providers

4. Controllers

Specialized controllers that manage Agent functionality in detail.

ConversationController

Manages conversation history and messages.

Features:

  • Conversation creation and loading

  • Message history management

  • Context maintenance

  • Conversation save and restore

AudioController

Handles voice input and output.

Features:

  • Speech recognition (STT)

  • Text-to-speech (TTS)

  • Audio playback control

  • Real-time voice streaming

ImageController

Manages image generation and vision capabilities.

Features:

  • AI image generation

  • Image analysis (Vision)

  • Image editing

  • Multimodal interaction

ToolController

Manages Agent's tool and function calling capabilities.

Features:

  • Tool registration and management

  • Function call execution

  • MCP (Model Context Protocol) integration

  • Tool result processing

Workflow

The complete flow of how Agent processes user requests.

1. Sending Messages

Processing Order:

  1. ConversationController adds user message to conversation history

  2. AgentControlHub changes status to Processing

  3. OnStatusChanged event is fired

2. API Request

AgentChatApiAdapter Role:

  • Selects appropriate client based on API type

  • Transforms request format per provider

  • Manages streaming settings

  • Includes tool definitions

3. Streaming Response

Streaming Flow:

  1. Receive data in chunks from API

  2. Deliver each chunk via OnStreamingUpdate event

  3. Display text in real-time in UI

  4. Accumulate complete response

4. Tool Execution

When the Agent determines a tool call is needed:

5. Response Completion

Completion Processing:

  1. Save complete response to ConversationController

  2. Change status to Idle

  3. Fire OnResponseCompleted event

  4. Update token usage and metadata

Event System

Agent provides a comprehensive event system.

Status Events

Conversation Events

Streaming Events

Tool Events

Audio Events

State Management

Agent performs state-based operations.

Agent States

State Transitions

State Transition Example:

Memory Management

Agent can optionally use long-term memory.

How Memory Works:

  1. Conversation history is stored as vector embeddings

  2. Retrieve relevant memories for new queries

  3. Automatically include relevant context in prompt

  4. AI references past conversations in responses

For more details, see the Memory section.

API Communication

How AgentChatApiAdapter handles various API types:

Chat Completion API

Assistants API

Responses API

Realtime API

Error Handling

Agent provides comprehensive error handling.

Configuration

Various options for configuring Agent:

Performance Considerations

Streaming vs Non-Streaming

Context Window Management

Token Optimization

Best Practices

1. Proper Initialization

2. Resource Cleanup

3. Event Subscription Management

4. Error Handling

5. State Checking

Next Steps

Now that you understand the core principles of Agent, explore the following:

For practical usage examples, see the Essentialsarrow-up-right section.

Last updated