
Memory (RAG)

The agent's memory system provides intelligent conversation-context management through a hybrid approach: a short-term context window combined with long-term retrieval using Retrieval-Augmented Generation (RAG). This lets agents maintain coherent conversations while efficiently accessing relevant historical information.

Overview

The memory system consists of two complementary mechanisms:

  1. Short-Term Memory: Recent messages within a sliding window (configurable via MaxContextMessages)

  2. Long-Term Memory (RAG): Vector-based retrieval of historically relevant messages using embeddings

When RAG is enabled, the agent automatically:

  • Embeds all conversation messages into a vector store

  • Retrieves semantically relevant past messages when responding

  • Combines short-term context with retrieved long-term memories

  • Re-ranks results based on similarity, recency, and thread relevance

Memory Settings

Configure memory behavior through AgentMemorySettings:

public class AgentMemorySettings
{
    // Basic Settings
    public bool autoSave = true;                    // Auto-save conversations
    public bool generateTitle = true;                // Generate conversation titles
    public int maxContextMessages = 20;              // Max messages in context (10-100)
    
    // RAG Settings
    public bool useVectorStore = false;             // Enable long-term RAG retrieval
    public string summaryModelId;                    // Model for summarization
    public string embeddingModelId;                  // Model for embeddings
    public int retrievalTopK = 8;                    // Top K results (1-32)
    public float retrievalMinSim = 0.5f;            // Min similarity threshold (0.0-1.0)
}

Basic Settings

Auto Save

Automatically persists conversation state to the configured store:

Generate Title

Automatically generates descriptive titles for conversations:
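Both basic flags might be set together, as a minimal sketch using the AgentMemorySettings class shown earlier:

```csharp
// Persist conversations and title them automatically.
var settings = new AgentMemorySettings
{
    autoSave = true,       // save after each exchange
    generateTitle = true   // derive a short title from the conversation
};
```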

Max Context Messages

Controls the size of the short-term context window:

Considerations:

  • Higher values provide more context but increase token costs

  • Lower values save tokens but may lose important context

  • Recommended: 10-50 depending on use case

  • RAG can supplement reduced context windows
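One possible configuration, sketched with the field above (the value is a suggestion, not a default):

```csharp
// A mid-sized window; RAG can backfill older context if enabled.
var settings = new AgentMemorySettings
{
    maxContextMessages = 30
};
```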

RAG Settings

Use Vector Store

Enables semantic retrieval of historical messages:

When to enable:

  • Long conversations spanning multiple sessions

  • Knowledge retention across conversation boundaries

  • Agents requiring recall of specific past information

  • Applications with large conversation histories

When to disable:

  • Short, ephemeral conversations

  • Performance-critical applications

  • Limited embedding API budget
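Enabling retrieval takes two fields; a hedged sketch (the exact model ID string is illustrative):

```csharp
// Turn on long-term retrieval with a cost-effective embedding model.
var settings = new AgentMemorySettings
{
    useVectorStore = true,
    embeddingModelId = "OpenAI_TextEmbedding_3_Small"  // illustrative ID
};
```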

Embedding Model

Specifies the model for generating message embeddings:

Available Models:

  • OpenAI_TextEmbedding_3_Small - Fast, cost-effective (1536 dimensions)

  • OpenAI_TextEmbedding_3_Large - Higher accuracy (3072 dimensions)

  • Custom embedding models from supported providers

Retrieval Top K

Number of relevant messages to retrieve from vector store:

Tuning Guidelines:

  • Lower (1-5): Focused retrieval, lower token cost, may miss context

  • Medium (6-12): Balanced approach, recommended for most cases

  • Higher (13-32): Comprehensive retrieval, higher cost, more context

Retrieval Min Similarity

Minimum cosine similarity threshold for retrieved messages:

Tuning Guidelines:

  • High (0.7-1.0): Strict matching, highly relevant results only

  • Medium (0.5-0.7): Balanced relevance (recommended)

  • Low (0.25-0.5): Broader retrieval, may include less relevant messages

  • Very Low (<0.25): May retrieve noise
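The two retrieval knobs are usually tuned together; a sketch with the balanced values recommended above:

```csharp
// Balanced defaults: up to 8 matches, each at >= 0.5 cosine similarity.
var settings = new AgentMemorySettings
{
    retrievalTopK = 8,
    retrievalMinSim = 0.5f
};
```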

How Memory Works

Short-Term Context (Without RAG)

When RAG is disabled, the agent uses a simple sliding window:

Process:

  1. User sends a message

  2. Agent retrieves last maxContextMessages messages

  3. Messages are sent to the LLM

  4. Response is generated and added to history
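The window selection itself is simple; a sketch of the idea (Message is a placeholder type):

```csharp
// Keep only the newest maxContextMessages entries of the history.
List<Message> BuildWindow(List<Message> history, int maxContextMessages)
{
    int skip = Math.Max(0, history.Count - maxContextMessages);
    return history.GetRange(skip, history.Count - skip);
}
```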

Long-Term Retrieval (With RAG)

When RAG is enabled, the system performs hybrid retrieval:

Detailed Workflow

1. Message Embedding

Every message is automatically embedded and indexed:

2. Query Embedding

When user sends a new message, it's embedded for search:

3. Vector Search

Similar messages are retrieved from the vector store:
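Conceptually, the search scores every stored embedding against the query embedding by cosine similarity and keeps the best matches above the threshold. A brute-force sketch of the scoring function (a real vector store would use an index instead):

```csharp
// Cosine similarity between a query embedding and a stored embedding.
static float Cosine(float[] a, float[] b)
{
    float dot = 0f, na = 0f, nb = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(na) * MathF.Sqrt(nb));
}
```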

4. Re-Ranking

Results are re-ranked using a weighted scoring formula:

Factors:

  • Similarity (75%): Semantic relevance to query

  • Recency (20%): Preference for recent messages

  • Same Thread (5%): Bonus for messages in current conversation
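Assuming the three weights combine linearly, the score can be sketched as follows (the final comment checks the arithmetic against the "Weather" row in the example ranking):

```csharp
// Weighted re-ranking score; weights are the percentages documented above.
// sameThread contributes its 0.05 bonus only for the current conversation.
float Score(float similarity, float recency, bool sameThread) =>
    0.75f * similarity + 0.20f * recency + (sameThread ? 0.05f : 0f);

// Score(0.85f, 0.95f, false) = 0.6375 + 0.19 + 0 = 0.8275, i.e. ~0.83
```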

Example Ranking:

Message                        Similarity   Recency   Same Thread   Final Score
"API integration timeline"     0.92         0.8       Yes           0.91
"Project deadline is Friday"   0.88         0.9       Yes           0.90
"Weather is nice today"        0.85         0.95      No            0.83

5. Deduplication

Duplicate messages are removed based on content hash:
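A sketch of the idea, keying on the raw message content (the actual hash function used is an implementation detail):

```csharp
// Drop repeated message bodies, keeping the first occurrence of each.
static List<string> Deduplicate(IEnumerable<string> messages)
{
    var seen = new HashSet<string>();
    var unique = new List<string>();
    foreach (var m in messages)
        if (seen.Add(m))      // Add returns false if m was already present
            unique.Add(m);
    return unique;
}
```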

6. Context Assembly

Final context is assembled in order:

This structure provides:

  • Summary: High-level conversation overview

  • Short-term: Recent conversation flow

  • Long-term: Relevant historical information

  • Current: Immediate query
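In code, the assembly order above might look like this (all variable names are placeholders):

```csharp
// summary -> short-term window -> long-term memories -> current query
var context = new List<Message>();
context.Add(summaryMessage);          // high-level overview
context.AddRange(recentMessages);     // sliding-window history
context.AddRange(retrievedMemories);  // re-ranked, deduplicated RAG results
context.Add(currentUserMessage);      // the new query
```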

Configuration Examples

Minimal Memory (Token Efficient)

For short conversations or budget constraints:
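One possible configuration (values are suggestions, not defaults):

```csharp
// Token-efficient: small window, no vector store.
var settings = new AgentMemorySettings
{
    maxContextMessages = 10,
    useVectorStore = false
};
```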

Characteristics:

  • ✅ Minimal token usage

  • ✅ Fast response times

  • ❌ Limited context retention

  • ❌ No long-term memory

Balanced Memory (Recommended)

For most production applications:
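One possible configuration (the model ID string is illustrative; values are suggestions):

```csharp
// Balanced: moderate window plus semantic retrieval.
var settings = new AgentMemorySettings
{
    maxContextMessages = 20,
    useVectorStore = true,
    embeddingModelId = "OpenAI_TextEmbedding_3_Small",  // illustrative ID
    retrievalTopK = 8,
    retrievalMinSim = 0.5f
};
```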

Characteristics:

  • ✅ Good context retention

  • ✅ Semantic retrieval enabled

  • ✅ Reasonable token costs

  • ✅ Suitable for most use cases

Maximum Memory (Knowledge Intensive)

For applications requiring extensive context:
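One possible configuration (the model ID string is illustrative; values are suggestions):

```csharp
// Knowledge-intensive: large window, larger embeddings, broad retrieval.
var settings = new AgentMemorySettings
{
    maxContextMessages = 50,
    useVectorStore = true,
    embeddingModelId = "OpenAI_TextEmbedding_3_Large",  // illustrative ID
    retrievalTopK = 24,
    retrievalMinSim = 0.4f
};
```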

Characteristics:

  • ✅ Maximum context retention

  • ✅ Comprehensive retrieval

  • ✅ High-quality embeddings

  • ❌ Higher token costs

  • ❌ Slower response times

Custom Configuration

Adapt settings to specific needs:

Conversation Stores

Memory persistence is handled through ConversationStoreType:

Local File Store

Save conversations to local device storage:
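A hedged sketch of the wiring; the property and enum value names are assumptions based on the store type described here:

```csharp
// Keep conversations on the local device (names are assumptions).
agent.conversationStoreType = ConversationStoreType.LocalFile;
```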

Characteristics:

  • ✅ No external API required

  • ✅ Fast local access

  • ✅ Full data control

  • ❌ Not synchronized across devices

  • ❌ Limited to device storage

Use Cases:

  • Single-player games

  • Offline applications

  • Development/testing

Threads API Store (OpenAI)

Use OpenAI's Threads API for conversation persistence:

Characteristics:

  • ✅ Cloud synchronized

  • ✅ Built-in OpenAI integration

  • ✅ Scalable storage

  • ❌ Requires OpenAI API

  • ❌ Limited to OpenAI ecosystem

Use Cases:

  • Multi-device applications

  • OpenAI-based agents

  • Cloud-backed services

Conversations API Store (OpenAI)

Use OpenAI's newer Conversations API:

Similar to Threads API with enhanced features.

Realtime API Store (OpenAI)

For real-time voice/streaming applications:

Note: Specialized for real-time streaming scenarios.

Custom Store

Implement your own storage backend:
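One plausible shape for such a backend, sketched with an assumed interface name (the library's actual contract may differ): implement load, save, list, and delete against your own storage.

```csharp
// Assumed interface shape for a custom conversation store.
public interface IConversationStore
{
    Task<Conversation> LoadAsync(string conversationId);
    Task SaveAsync(Conversation conversation);
    Task DeleteAsync(string conversationId);
    Task<IReadOnlyList<string>> ListAsync();
}
```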

Use Cases:

  • Custom cloud backends (Firebase, AWS, Azure)

  • Specialized persistence requirements

  • Integration with existing systems

Memory Management APIs

Creating Conversations

Loading Conversations

Listing Conversations

Saving Conversations

Note: With autoSave = true, these are called automatically.

Deleting Conversations

Accessing Conversation Data
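As orientation for the operations above, a lifecycle sketch with hypothetical method names (the real API's signatures may differ):

```csharp
// All names below are illustrative placeholders, not the real API.
var conversation = await agent.CreateConversationAsync();            // create
conversation = await agent.LoadConversationAsync(conversation.Id);   // load
var ids = await agent.ListConversationsAsync();                      // list
await agent.SaveConversationAsync(conversation);  // save (automatic when autoSave = true)
await agent.DeleteConversationAsync(conversation.Id);                // delete
```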

RAG Performance Tuning

Optimizing Retrieval Quality

Problem: Irrelevant results

Problem: Missing relevant context

Problem: Too much noise in results
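Hedged starting points for each problem, using the settings documented earlier (values are suggestions):

```csharp
// Irrelevant results: raise the similarity floor.
settings.retrievalMinSim = 0.7f;

// Missing relevant context: fetch more candidates, or lower the floor.
settings.retrievalTopK = 12;

// Too much noise: tighten both knobs.
settings.retrievalTopK = 5;
settings.retrievalMinSim = 0.75f;
```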

Optimizing Token Usage

Reduce context window:

Smart retrieval:

Selective embedding:
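The first two levers can be sketched as a single fragment (values are suggestions):

```csharp
// Lower token usage: smaller window, fewer but stricter retrievals.
settings.maxContextMessages = 12;  // reduce the short-term window
settings.retrievalTopK = 5;        // retrieve fewer messages
settings.retrievalMinSim = 0.65f;  // keep only strong matches
```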

Optimizing Response Speed

Use faster embedding model:

Reduce retrieval count:

Disable RAG for simple queries:
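The three levers above, as one hedged fragment (the model ID string is illustrative):

```csharp
// Faster responses: cheap embeddings, fewer lookups, RAG off when unneeded.
settings.embeddingModelId = "OpenAI_TextEmbedding_3_Small";  // illustrative ID
settings.retrievalTopK = 4;
settings.useVectorStore = false;  // for simple, single-session queries
```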

Best Practices

1. Choose Appropriate Store Types

2. Enable RAG for Long Conversations

3. Balance Context Window and RAG

4. Monitor Token Usage

5. Implement Smart Caching

6. Handle Vector Store Initialization

7. Clean Up Old Conversations

Troubleshooting

RAG Not Working

Symptom: Retrieved messages don't seem relevant

Solutions:

  1. Check embedding model is configured:

  2. Verify vector store is initialized:

  3. Adjust similarity threshold:

  4. Increase retrieval count:
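The four checks above, as one hedged fragment (the model ID string is illustrative):

```csharp
// 1-2: make sure RAG is wired up at all.
settings.useVectorStore = true;
settings.embeddingModelId = "OpenAI_TextEmbedding_3_Small";  // illustrative ID
// 3-4: loosen retrieval so more candidates qualify.
settings.retrievalMinSim = 0.4f;
settings.retrievalTopK = 12;
```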

High Token Costs

Symptom: Token usage is higher than expected

Solutions:

  1. Reduce context window:

  2. Reduce retrieved messages:

  3. Increase similarity threshold:

  4. Disable RAG for short conversations:

Slow Response Times

Symptom: Responses take too long

Solutions:

  1. Use faster embedding model:

  2. Reduce retrieval operations:

  3. Disable RAG for simple queries:

Conversations Not Persisting

Symptom: Conversations don't save between sessions

Solutions:

  1. Enable auto-save:

  2. Verify store type is configured:

  3. Manually save conversations:

  4. Check for errors during save:
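A quick checklist in code; the store enum value and save method are assumed names, not the library's confirmed API:

```csharp
settings.autoSave = true;                                       // 1. auto-save on
agent.conversationStoreType = ConversationStoreType.LocalFile;  // 2. concrete store (assumed name)
await agent.SaveConversationAsync(conversation);                // 3. hypothetical manual save
```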

Memory Leaks with Large Conversations

Symptom: Memory usage grows over time

Solutions:

  1. Limit conversation history:

  2. Implement conversation rotation:

  3. Clear vector store periodically:
