How to Add Memory to Claude: A Developer's Complete Tutorial

One of the most common questions from developers building with Claude is: "How do I make Claude remember things?"

Out of the box, each Claude API call is stateless — the model has no idea what was said in a previous request. But building a useful AI assistant, chatbot, or agent almost always requires some form of memory. The good news is Claude's architecture makes this surprisingly clean to implement once you understand the layers.

This tutorial covers four memory patterns, from the simplest (conversation history in the messages array) to production-grade (external vector database with semantic retrieval). You'll have working code for each approach.

Why Claude Has No Built-in Memory (and Why That's Fine)

Claude processes whatever you put in its context window — nothing more, nothing less. This is a design choice, not a limitation. Stateless APIs are easier to scale, reason about, and secure. The responsibility of what context to include belongs to your application layer.

Think of it this way: Claude is an extremely capable reasoning engine. Memory is a retrieval problem. Separating the two gives you full control over privacy, relevance, and cost.

There are four distinct memory needs most apps have:

Within-session memory — remember what was said earlier in this conversation

Cross-session memory — remember facts from previous sessions

Knowledge memory — recall from a document corpus (RAG)

Episodic memory — recall specific past interactions by similarity

Let's implement each.

Pattern 1: Conversation History (The Messages Array)

This is the foundation. Claude's API uses a messages array where you pass the full conversation history with every request.

import anthropic

client = anthropic.Anthropic()

# Start with an empty history
conversation_history = []

def chat(user_message: str) -> str:
    # Append the new user message
    conversation_history.append({
        "role": "user",
        "content": user_message
    })
    
    # Pass full history to Claude every time
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=conversation_history
    )
    
    assistant_message = response.content[0].text
    
    # Append Claude's response so it's included in the next turn
    conversation_history.append({
        "role": "assistant",
        "content": assistant_message
    })
    
    return assistant_message

# Test it
print(chat("My name is Sarah and I'm building a SaaS product for dentists."))
print(chat("What are the key features I should prioritize?"))
print(chat("What was my name again?"))  # Claude will remember

This works well for single-session conversations. The catch: context has a cost. As the conversation grows, you send more tokens every turn, and eventually you'll hit the model's context limit. Claude's context window is large (hundreds of thousands of tokens, and up to 1M on models that support it), so for most apps the limit is a distant concern, but the growing per-turn cost is not.
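
To make that cost concrete, here is a rough back-of-the-envelope sketch. It assumes every message is about the same size (the `tokens_per_message` figure is an illustrative guess, not a measurement) and ignores the system prompt and output tokens. Because each turn resends everything before it, total input tokens grow roughly with the square of the number of turns.

```python
def messages_sent_on_turn(turn: int) -> int:
    """Messages in the request on a given 1-indexed turn.

    Turn t carries t user messages plus the t - 1 assistant replies so far.
    """
    return 2 * turn - 1


def cumulative_input_tokens(tokens_per_message: int, turns: int) -> int:
    """Approximate input tokens sent across a whole conversation."""
    return sum(
        tokens_per_message * messages_sent_on_turn(t)
        for t in range(1, turns + 1)
    )


# Assumed 100 tokens per message: 10 turns cost 10,000 input tokens in total,
# while 100 turns cost 1,000,000.
for turns in (10, 100):
    print(turns, cumulative_input_tokens(100, turns))
```

Ten times as many turns costs a hundred times as many input tokens under these assumptions, which is why trimming matters well before the context limit does.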

Trimming Old Messages

For very long conversations, implement a sliding window:

MAX_HISTORY_MESSAGES = 20

def chat_with_trim(user_message: str) -> str:
    conversation_history.append({"role": "user", "content": user_message})
    
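    # Keep only the last N messages
    trimmed_history = conversation_history[-MAX_HISTORY_MESSAGES:]
    # The slice can begin on an assistant reply; drop it so the window opens on a user turn
    while trimmed_history and trimmed_history[0]["role"] != "user":
        trimmed_history = trimmed_history[1:]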
    
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=trimmed_history
    )
    
    assistant_message = response.content[0].text
    conversation_history.append({"role": "assistant", "content": assistant_message})
    return assistant_message
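
Counting messages is a blunt instrument, since one long message can outweigh twenty short ones. A minimal sketch of trimming by an approximate token budget instead follows. The 4-characters-per-token estimate is a rough assumption, not the model's real tokenizer, and `estimate_tokens` and `trim_to_budget` are hypothetical helper names. It also assumes each message's `content` is a plain string.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_to_budget(history: list[dict], max_input_tokens: int) -> list[dict]:
    """Keep the most recent messages that fit within a token budget."""
    kept: list[dict] = []
    used = 0
    # Walk backwards so the newest messages are considered first.
    for message in reversed(history):
        cost = estimate_tokens(message["content"])
        # Always keep the newest message, even if it alone exceeds the budget.
        if kept and used + cost > max_input_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    # Drop leading assistant messages so the window opens on a user turn.
    while len(kept) > 1 and kept[0]["role"] != "user":
        kept.pop(0)
    return kept
```

In `chat_with_trim`, the slice could be swapped for `trim_to_budget(conversation_history, 4000)`, with the budget chosen to suit your cost target.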

Pattern 2: Cross-Session Memory with a System Prompt

For apps where users return across multiple sessions, you need to persist facts between sessions. The cleanest way is to store key facts in a database and inject them into the system prompt at the start of each session.

import json
import anthropic
from datetime import datetime

client = anthropic.Anthropic()

# Simulated user profile store (use PostgreSQL/Redis in production)
user_profiles = {}

def get_user_facts(user_id: str) -> str:
    """Retrieve stored facts about the user."""
    profile = user_profiles.get(user_id, {})
    if not profile:
        return "No prior information about this user."
    
    facts = []
    if profile.get("name"):
        facts.append(f"Name: {profile['name']}")
    if profile.get("role"):
        facts.append(f"Role: {profile['role']}")
    if profile.get("context"):
        facts.append(f"Context: {profile['context']}")
    if profile.get("last_session"):
        facts.append(f"Last spoke: {profile['last_session']}")
    
    return "\n".join(facts) if facts else "No stored facts yet."

def save_user_facts(user_id: str, facts: dict):
    """Update the user's profile."""
    if user_id not in user_profiles:
        user_profiles[user_id] = {}
    user_profiles[user_id].update(facts)
    user_profiles[user_id]["last_session"] = datetime.now().strftime("%Y-%m-%d")

def build_system_prompt(user_id: str) -> str:
    user_facts = get_user_facts(user_id)
    return f"""You are a helpful AI assistant with memory of this user.

What you know about this user:
{user_facts}

When the user shares new personal information (name, role, goals, preferences), 
remember it as part of the conversation context. Be natural — don't announce 
that you're remembering things."""

def chat_with_memory(user_id: str, message: str, history: list) -> str:
    system_prompt = build_system_prompt(user_id)
    
    history.append({"role": "user", "content": message})
    
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=system_prompt,
        messages=history
    )
    
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply

# Usage
user_id = "user_123"
session_history = []

# Session 1
save_user_facts(user_id, {"name": "Sarah", "role": "founder", "context": "building dental SaaS"})

r1 = chat_with_memory(user_id, "What features should I prioritize for my SaaS?", session_history)
print(r1)

# New session — history is reset, but facts persist
session_history = []
r2 = chat_with_memory(user_id, "Hey, I'm back! What were we discussing?", session_history)
print(r2)  # Claude will recall Sarah's context from the system prompt
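
The in-memory `user_profiles` dict disappears when the process exits. As a sketch of what a persistent store might look like, here is a small SQLite-backed version using only the standard library. The `ProfileStore` class, the `profiles` table, and its column names are all hypothetical choices for illustration; a production app would more likely use PostgreSQL or Redis, as noted above.

```python
import json
import sqlite3
from datetime import date


class ProfileStore:
    """Hypothetical SQLite-backed replacement for the user_profiles dict."""

    def __init__(self, path: str = ":memory:"):
        # Pass a file path instead of ":memory:" to persist across restarts.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS profiles ("
            "user_id TEXT PRIMARY KEY, facts TEXT NOT NULL)"
        )

    def load(self, user_id: str) -> dict:
        """Return the stored facts, or {} for an unknown user."""
        row = self.conn.execute(
            "SELECT facts FROM profiles WHERE user_id = ?", (user_id,)
        ).fetchone()
        return json.loads(row[0]) if row else {}

    def save(self, user_id: str, facts: dict) -> None:
        """Merge new facts into the profile and stamp the session date."""
        merged = {**self.load(user_id), **facts}
        merged["last_session"] = date.today().isoformat()
        self.conn.execute(
            "INSERT OR REPLACE INTO profiles (user_id, facts) VALUES (?, ?)",
            (user_id, json.dumps(merged)),
        )
        self.conn.commit()

    def delete(self, user_id: str) -> None:
        """Remove everything stored about a user (e.g., on a deletion request)."""
        self.conn.execute("DELETE FROM profiles WHERE user_id = ?", (user_id,))
        self.conn.commit()
```

With this in place, `get_user_facts` could read from `store.load(user_id)` and `save_user_facts` could call `store.save(user_id, facts)`; the rest of the pattern stays the same.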

Auto-Extracting Facts with Claude

Instead of manually calling save_user_facts, use Claude to extract facts to remember:

def extract_facts_to_remember(conversation: list) -> dict:
    """Use Claude to extract key facts from a conversation."""
    extraction_prompt = """Review this conversation and extract facts worth remembering 
about the user for future sessions. Return JSON only.

Example output:
{
  "name": "Sarah",
  "role": "SaaS founder", 
  "goals": "launch dental software by Q3",
  "preferences": "prefers concise answers"
}

Return {} if nothing worth storing."""
    
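    # Send the transcript as a single user message so the request always ends
    # on a user turn, even when the last message in the conversation is Claude's.
    # Assumes each message's content is a plain string.
    transcript = "\n".join(
        f"{m['role']}: {m['content']}" for m in conversation
    )
    
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # Haiku is fast and cheap for extraction
        max_tokens=512,
        system=extraction_prompt,
        messages=[{"role": "user", "content": transcript}]
    )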
    
    try:
        return json.loads(response.content[0].text)
    except json.JSONDecodeError:
        return {}
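
Even with "Return JSON only" in the prompt, a model may occasionally wrap the object in a sentence or a code fence, in which case a bare `json.loads` falls through to `{}` and the facts are lost. A minimal, more forgiving parser might look like the sketch below. The name `parse_fact_json` is hypothetical, and the approach (take the text between the first `{` and the last `}`) is a simple heuristic, not a guarantee.

```python
import json


def parse_fact_json(raw: str) -> dict:
    """Parse a model reply that should contain one JSON object.

    Returns {} when no JSON object can be recovered.
    """
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end <= start:
        return {}
    try:
        parsed = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return {}
    if not isinstance(parsed, dict):
        return {}
    # Drop empty values so they do not overwrite facts already stored.
    return {k: v for k, v in parsed.items() if v not in ("", None)}
```

The result can be passed straight to `save_user_facts(user_id, facts)` when it is non-empty.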

Pattern 3: Vector Database Memory (Semantic Recall)

For knowledge-intensive apps — where you need Claude to recall from thousands of past interactions or documents — inject retrieved context using a vector database.

This is the classic RAG (Retrieval-Augmented Generation) pattern applied to memory.

from anthropic import Anthropic
import numpy as np

client = Anthropic()

# In production: use Pinecone, Weaviate, pgvector, or Chroma
# Here we simulate with a simple in-memory store
memory_store = []  # List of {"text": str, "embedding": list[float]}

def get_embedding(text: str) -> list[float]:
    """Get an embedding for a piece of text from a dedicated embeddings model."""
    # Claude itself doesn't have an embeddings endpoint, so use a dedicated
    # embedding model (e.g., Voyage AI, Anthropic's recommended partner, or OpenAI embeddings)
    # For this tutorial we leave a placeholder; with the Voyage client it would look like:
    # voyage_client.embed([text], model="voyage-3").embeddings[0]
    raise NotImplementedError("Replace with your embedding provider")

def cosine_similarity(a: list[float], b: list[float]) -> float:
    a_arr, b_arr = np.array(a), np.array(b)
    return float(np.dot(a_arr, b_arr) / (np.linalg.norm(a_arr) * np.linalg.norm(b_arr)))

def remember(text: str):
    """Store a memory with its embedding."""
    embedding = get_embedding(text)
    memory_store.append({"text": text, "embedding": embedding})

def recall(query: str, top_k: int = 3) -> list[str]:
    """Retrieve the most relevant memories for a query."""
    if not memory_store:
        return []
    
    query_embedding = get_embedding(query)
    scored = [
        (cosine_similarity(query_embedding, m["embedding"]), m["text"])
        for m in memory_store
    ]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [text for _, text in scored[:top_k]]

def chat_with_vector_memory(message: str, history: list) -> str:
    # Retrieve relevant past memories
    relevant_memories = recall(message)
    
    memory_context = ""
    if relevant_memories:
        memory_context = "\n\nRelevant memories:\n" + "\n".join(
            f"- {m}" for m in relevant_memories
        )
    
    system = f"""You are a helpful assistant with access to past conversation memories.
{memory_context}

Use these memories naturally when relevant. Don't mention that you're reading from memory."""
    
    history.append({"role": "user", "content": message})
    
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=system,
        messages=history
    )
    
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    
    # Optionally store this exchange as a new memory
    remember(f"User said: {message}. Assistant replied: {reply[:200]}")
    
    return reply

Tip: For production, swap the in-memory store for pgvector (if you're already on PostgreSQL/Neon) or Pinecone. Voyage AI is Anthropic's recommended embedding provider and integrates cleanly with Claude workflows.
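
Because `get_embedding` is a placeholder, the retrieval logic above can't be exercised until you wire in a provider. The sketch below substitutes a toy bag-of-words vector for a real embedding so the ranking can be tried offline. This is an assumption made purely for illustration: word counts capture word overlap, not meaning. It also adds a hypothetical `min_score` cutoff so that unrelated memories are not injected into the prompt just because they happen to be the top three.

```python
import math
import re
from collections import Counter


def toy_embedding(text: str) -> Counter:
    """Stand-in for a real embedding: lowercase word counts (an assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def sparse_cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Empty text has no direction; treat it as unrelated.
    return dot / (norm_a * norm_b)


def recall_with_threshold(query: str, memories: list[str],
                          top_k: int = 3, min_score: float = 0.2) -> list[str]:
    """Return up to top_k memories whose similarity clears min_score."""
    query_vec = toy_embedding(query)
    scored = [(sparse_cosine(query_vec, toy_embedding(m)), m) for m in memories]
    scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

The same threshold idea carries over to real embeddings, though a sensible cutoff depends on the embedding model and should be tuned on your own data.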

Pattern 4: Tool-Based Memory (Most Flexible)

The most powerful pattern: give Claude tools to read and write its own memory. Claude decides what to remember and when to recall it.

import json
import anthropic

client = anthropic.Anthropic()

# Simple file-backed memory (use a database in production)
MEMORY_FILE = "claude_memory.json"

def load_memories() -> dict:
    try:
        with open(MEMORY_FILE, "r") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def save_memories(memories: dict):
    with open(MEMORY_FILE, "w") as f:
        json.dump(memories, f, indent=2)

# Define memory tools for Claude
tools = [
    {
        "name": "remember",
        "description": "Save an important fact or piece of information to long-term memory. Use this when the user shares something that should be recalled in future sessions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "key": {
                    "type": "string",
                    "description": "A short identifier for this memory (e.g., 'user_name', 'user_goal', 'preferred_style')"
                },
                "value": {
                    "type": "string",
                    "description": "The information to remember"
                }
            },
            "required": ["key", "value"]
        }
    },
    {
        "name": "recall",
        "description": "Retrieve a previously saved memory by key.",
        "input_schema": {
            "type": "object",
            "properties": {
                "key": {
                    "type": "string",
                    "description": "The key of the memory to retrieve"
                }
            },
            "required": ["key"]
        }
    },
    {
        "name": "list_memories",
        "description": "List all stored memory keys so you know what information is available.",
        "input_schema": {
            "type": "object",
            "properties": {}
        }
    }
]

def handle_tool_call(tool_name: str, tool_input: dict) -> str:
    memories = load_memories()
    
    if tool_name == "remember":
        memories[tool_input["key"]] = tool_input["value"]
        save_memories(memories)
        return f"Saved: {tool_input['key']} = {tool_input['value']}"
    
    elif tool_name == "recall":
        value = memories.get(tool_input["key"])
        return value if value else f"No memory found for key: {tool_input['key']}"
    
    elif tool_name == "list_memories":
        if not memories:
            return "No memories stored yet."
        return "Stored memory keys: " + ", ".join(memories.keys())
    
    return "Unknown tool"

def chat_with_tool_memory(message: str, history: list) -> str:
    history.append({"role": "user", "content": message})
    
    system = """You are a helpful assistant with the ability to save and retrieve memories.
    
At the start of conversations, use list_memories to see what you know about the user.
When users share important information (name, preferences, goals, context), use remember() to save it.
Use recall() when you need specific information about the user."""
    
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system=system,
            tools=tools,
            messages=history
        )
        
        # If Claude wants to use a tool, handle it
        if response.stop_reason == "tool_use":
            history.append({"role": "assistant", "content": response.content})
            
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = handle_tool_call(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })
            
            history.append({"role": "user", "content": tool_results})
            continue
        
        # Claude gave a final response
        reply = ""
        for block in response.content:
            if hasattr(block, "text"):
                reply += block.text
        
        history.append({"role": "assistant", "content": reply})
        return reply

# Test
history = []
print(chat_with_tool_memory("Hi! I'm Alex, a backend engineer working on a Rust microservices project.", history))
print(chat_with_tool_memory("What do you know about me?", history))

# Start a new session — memories persist because they're saved to disk
history = []
print(chat_with_tool_memory("Hey, do you remember me?", history))
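
One thing to watch in the loop above: `while True` keeps going for as long as the model keeps requesting tools. A minimal sketch of one way to cap it follows, written against a hypothetical `call_model` callable so it can be run without network access. The responses here are plain dictionaries with `stop_reason` and `content` keys, an assumption that mirrors the fields used above; with the SDK you would read the same fields as attributes.

```python
from typing import Callable


def run_tool_loop(call_model: Callable[[list], dict],
                  handle_tool: Callable[[str, dict], str],
                  history: list,
                  max_rounds: int = 5) -> str:
    """Run the request/tool-result cycle, giving up after max_rounds requests."""
    for _ in range(max_rounds):
        response = call_model(history)
        if response["stop_reason"] != "tool_use":
            # Final answer: join the text blocks and record the turn.
            reply = "".join(
                b["text"] for b in response["content"] if b["type"] == "text"
            )
            history.append({"role": "assistant", "content": reply})
            return reply

        # Echo the assistant turn, then answer every tool_use block in one user turn.
        history.append({"role": "assistant", "content": response["content"]})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": b["id"],
                "content": handle_tool(b["name"], b["input"]),
            }
            for b in response["content"]
            if b["type"] == "tool_use"
        ]
        history.append({"role": "user", "content": results})

    raise RuntimeError(f"Model still requesting tools after {max_rounds} rounds")
```

Injecting `call_model` also makes the loop easy to unit test with scripted responses, separately from the live API.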

Choosing the Right Memory Pattern

Pattern	Best For	Complexity	Cost
Messages array	Single-session chatbots	Low	Low
System prompt injection	User profile apps, returning users	Low-Medium	Low
Vector DB (RAG)	Knowledge-heavy apps, large memory stores	High	Medium
Tool-based memory	Agents, complex workflows	Medium	Medium

Start with Pattern 1. Add Pattern 2 (system prompt injection) when users return across sessions. Reach for vector memory when your memory store grows beyond a few hundred items or you need semantic search, and for tool-based memory when you're building an agent that should decide for itself what to store and recall.

Production Checklist

Before shipping a Claude app with memory, verify:

[ ] Privacy: Are you storing only what you need? Can users view and delete their memories?
[ ] Token budget: Log token usage per request. Memory injection increases costs.
[ ] Relevance decay: Old memories may become incorrect. Add a timestamp and prune memories older than 90 days.
[ ] Context ordering: Put memory context in the system prompt, not at the start of the messages array. That keeps standing background separate from the dialogue, so trimming or resetting the history never drops it.
[ ] Prompt caching: If your system prompt is large and stable (e.g., a big knowledge base), use Claude's prompt caching. Cached input tokens are billed at a fraction of the normal rate, which can cut the cost of that portion by up to 90%.
[ ] Haiku for extraction: Use claude-haiku-4-5-20251001 for memory extraction and classification tasks. It's substantially cheaper than Sonnet and fast enough for background jobs.
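
As an illustration of the relevance-decay item, here is a minimal sketch of timestamped memories with age-based pruning. It assumes a storage shape that differs from the plain key/value file used in Pattern 4: each entry is a dict with `value` and `saved_at` (an ISO-8601 UTC string). The helper names `stamp` and `prune_stale` are hypothetical, and the 90-day default simply follows the checklist.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stamp(value: str, now: Optional[datetime] = None) -> dict:
    """Wrap a memory value with the UTC time it was saved."""
    now = now or datetime.now(timezone.utc)
    return {"value": value, "saved_at": now.isoformat()}


def prune_stale(memories: dict, max_age_days: int = 90,
                now: Optional[datetime] = None) -> dict:
    """Return only the memories saved within the last max_age_days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return {
        key: entry
        for key, entry in memories.items()
        if datetime.fromisoformat(entry["saved_at"]) >= cutoff
    }
```

Running `prune_stale` when memories are loaded, or in a scheduled job, keeps stale facts out of the prompt. Whether to delete old entries outright or only stop injecting them is a product decision.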

Key Takeaways

Claude has no built-in memory by design — your application owns the retrieval layer
The messages array handles within-session memory; persist facts to a database for cross-session recall
Vector databases unlock semantic memory retrieval at scale
Tool-based memory lets Claude manage its own recall, which is powerful for autonomous agents
Start simple (messages array + system prompt injection) and graduate to vector memory only when needed

Next Steps

Ready to build more sophisticated Claude agents? Check out our guide on Claude multi-agent orchestration or dive into the Claude API production best practices for patterns you can ship with confidence.

If you're preparing for the Claude Certified Architect (CCA-F) exam, memory architecture is a key topic — context management, tool use, and agentic patterns all appear in the exam. Practice with our CCA exam guide and free practice questions.
