How to Add Memory to Claude: Conversation History & Persistent Memory Tutorial
Learn how to add short-term conversation history and long-term persistent memory to Claude API apps. Covers messages array, context compression, vector DB memory, and tool-based recall.
How to Add Memory to Claude: A Developer's Complete Tutorial
One of the most common questions from developers building with Claude is: "How do I make Claude remember things?"
Out of the box, each Claude API call is stateless — the model has no idea what was said in a previous request. But building a useful AI assistant, chatbot, or agent almost always requires some form of memory. The good news is Claude's architecture makes this surprisingly clean to implement once you understand the layers.
This tutorial covers four memory patterns, from the simplest (conversation history in the messages array) to production-grade (external vector database with semantic retrieval). You'll have working code for each approach.
Why Claude Has No Built-in Memory (and Why That's Fine)
Claude processes whatever you put in its context window — nothing more, nothing less. This is a design choice, not a limitation. Stateless APIs are easier to scale, reason about, and secure. The responsibility of what context to include belongs to your application layer.
Think of it this way: Claude is an extremely capable reasoning engine. Memory is a retrieval problem. Separating the two gives you full control over privacy, relevance, and cost.
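Most apps have four distinct memory needs, each covered by one of the patterns below:
- Within a session: recalling what was said earlier in the same conversation (messages array)
- Across sessions: carrying key facts about a returning user from one session to the next (system prompt injection)
- At scale: finding the relevant pieces among thousands of past interactions or documents (vector database)
- Agent-managed: letting Claude decide what to store and when to look it up (tool-based memory)
Let's implement each.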
Pattern 1: Conversation History (The Messages Array)
This is the foundation. Claude's API uses a messages array where you pass the full conversation history with every request.
```python
import anthropic

client = anthropic.Anthropic()

# Start with an empty history
conversation_history = []


def chat(user_message: str) -> str:
    # Append the new user message
    conversation_history.append({
        "role": "user",
        "content": user_message
    })

    # Pass full history to Claude every time
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=conversation_history
    )
    assistant_message = response.content[0].text

    # Append Claude's response so it's included in the next turn
    conversation_history.append({
        "role": "assistant",
        "content": assistant_message
    })
    return assistant_message


# Test it
print(chat("My name is Sarah and I'm building a SaaS product for dentists."))
print(chat("What are the key features I should prioritize?"))
print(chat("What was my name again?"))  # Claude will remember
```

This works well for single-session conversations. The catch: context has a cost. As the conversation grows, you send more tokens every turn, and eventually you'll hit the model's context limit (though with context windows of 200K tokens or more, and up to 1M on some models, this is a distant concern for most apps).
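To see why cost grows faster than the conversation itself, here is a rough back-of-the-envelope sketch. It assumes about four characters per token (a crude heuristic, not Claude's tokenizer) and fixed-size messages. The function names are illustrative and not part of any SDK. For real numbers, read the usage figures the API returns with each response.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def cumulative_input_tokens(messages: list[dict]) -> int:
    """Total input tokens sent over a whole conversation when the full
    history is resent on every user turn."""
    total = 0
    running = 0  # estimated tokens in the history so far
    for msg in messages:
        running += estimate_tokens(msg["content"])
        if msg["role"] == "user":
            # One request is made per user message, and it carries
            # everything accumulated up to and including that message.
            total += running
    return total


def make_conversation(turns: int, chars_per_message: int = 400) -> list[dict]:
    """Build a synthetic conversation with fixed-size messages."""
    history = []
    for _ in range(turns):
        history.append({"role": "user", "content": "x" * chars_per_message})
        history.append({"role": "assistant", "content": "y" * chars_per_message})
    return history


for turns in (10, 20, 40):
    print(turns, cumulative_input_tokens(make_conversation(turns)))
# prints:
# 10 10000
# 20 40000
# 40 160000
```

Under these assumptions, doubling the number of turns roughly quadruples the total input tokens, which is the reason to cap or compress history in long sessions.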
Trimming Old Messages
For very long conversations, implement a sliding window:
```python
MAX_HISTORY_MESSAGES = 20


def chat_with_trim(user_message: str) -> str:
    conversation_history.append({"role": "user", "content": user_message})

    # Keep only the last N messages
    trimmed_history = conversation_history[-MAX_HISTORY_MESSAGES:]
    # The API expects the first message to come from the user, so drop a
    # leading assistant message left over from the slice
    if trimmed_history[0]["role"] == "assistant":
        trimmed_history = trimmed_history[1:]

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=trimmed_history
    )
    assistant_message = response.content[0].text
    conversation_history.append({"role": "assistant", "content": assistant_message})
    return assistant_message
```

Pattern 2: Cross-Session Memory with a System Prompt
For apps where users return across multiple sessions, you need to persist facts between sessions. The cleanest way is to store key facts in a database and inject them into the system prompt at the start of each session.
```python
import json
import anthropic
from datetime import datetime

client = anthropic.Anthropic()

# Simulated user profile store (use PostgreSQL/Redis in production)
user_profiles = {}


def get_user_facts(user_id: str) -> str:
    """Retrieve stored facts about the user."""
    profile = user_profiles.get(user_id, {})
    if not profile:
        return "No prior information about this user."

    facts = []
    if profile.get("name"):
        facts.append(f"Name: {profile['name']}")
    if profile.get("role"):
        facts.append(f"Role: {profile['role']}")
    if profile.get("context"):
        facts.append(f"Context: {profile['context']}")
    if profile.get("last_session"):
        facts.append(f"Last spoke: {profile['last_session']}")
    return "\n".join(facts) if facts else "No stored facts yet."


def save_user_facts(user_id: str, facts: dict):
    """Update the user's profile."""
    if user_id not in user_profiles:
        user_profiles[user_id] = {}
    user_profiles[user_id].update(facts)
    user_profiles[user_id]["last_session"] = datetime.now().strftime("%Y-%m-%d")


def build_system_prompt(user_id: str) -> str:
    user_facts = get_user_facts(user_id)
    return f"""You are a helpful AI assistant with memory of this user.

What you know about this user:
{user_facts}

When the user shares new personal information (name, role, goals, preferences),
remember it as part of the conversation context. Be natural: don't announce
that you're remembering things."""


def chat_with_memory(user_id: str, message: str, history: list) -> str:
    system_prompt = build_system_prompt(user_id)
    history.append({"role": "user", "content": message})

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=system_prompt,
        messages=history
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply


# Usage
user_id = "user_123"
session_history = []

# Session 1
save_user_facts(user_id, {"name": "Sarah", "role": "founder", "context": "building dental SaaS"})
r1 = chat_with_memory(user_id, "What features should I prioritize for my SaaS?", session_history)
print(r1)

# New session: history is reset, but facts persist
session_history = []
r2 = chat_with_memory(user_id, "Hey, I'm back! What were we discussing?", session_history)
print(r2)  # Claude will recall Sarah's context from the system prompt
```

Auto-Extracting Facts with Claude
Instead of manually calling save_user_facts, use Claude to extract facts to remember:
```python
def extract_facts_to_remember(conversation: list) -> dict:
    """Use Claude to extract key facts from a conversation."""
    extraction_prompt = """Review this conversation and extract facts worth remembering
about the user for future sessions. Return JSON only.

Example output:
{
  "name": "Sarah",
  "role": "SaaS founder",
  "goals": "launch dental software by Q3",
  "preferences": "prefers concise answers"
}

Return {} if nothing worth storing."""

    # Send the transcript as a single user message so the request always ends
    # on a user turn, even when the conversation ends with an assistant reply
    # (assumes each message's content is a plain string)
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in conversation)

    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # Haiku is fast and cheap for extraction
        max_tokens=512,
        system=extraction_prompt,
        messages=[{"role": "user", "content": transcript}]
    )

    try:
        return json.loads(response.content[0].text)
    except json.JSONDecodeError:
        return {}
```

Pattern 3: Vector Database Memory (Semantic Recall)
For knowledge-intensive apps — where you need Claude to recall from thousands of past interactions or documents — inject retrieved context using a vector database.
This is the classic RAG (Retrieval-Augmented Generation) pattern applied to memory.
```python
from anthropic import Anthropic
import numpy as np

client = Anthropic()

# In production: use Pinecone, Weaviate, pgvector, or Chroma
# Here we simulate with a simple in-memory store
memory_store = []  # List of {"text": str, "embedding": list[float]}


def get_embedding(text: str) -> list[float]:
    """Get an embedding for a piece of text from a dedicated embeddings model."""
    # Claude itself doesn't have an embeddings endpoint, so use a dedicated
    # embedding model (e.g., Voyage AI, Anthropic's recommended partner, or OpenAI embeddings)
    # For this tutorial, we leave a placeholder:
    # voyage_client.embed([text], model="voyage-3").embeddings[0]
    raise NotImplementedError("Replace with your embedding provider")


def cosine_similarity(a: list[float], b: list[float]) -> float:
    a_arr, b_arr = np.array(a), np.array(b)
    return float(np.dot(a_arr, b_arr) / (np.linalg.norm(a_arr) * np.linalg.norm(b_arr)))


def remember(text: str):
    """Store a memory with its embedding."""
    embedding = get_embedding(text)
    memory_store.append({"text": text, "embedding": embedding})


def recall(query: str, top_k: int = 3) -> list[str]:
    """Retrieve the most relevant memories for a query."""
    if not memory_store:
        return []
    query_embedding = get_embedding(query)
    scored = [
        (cosine_similarity(query_embedding, m["embedding"]), m["text"])
        for m in memory_store
    ]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [text for _, text in scored[:top_k]]


def chat_with_vector_memory(message: str, history: list) -> str:
    # Retrieve relevant past memories
    relevant_memories = recall(message)
    memory_context = ""
    if relevant_memories:
        memory_context = "\n\nRelevant memories:\n" + "\n".join(
            f"- {m}" for m in relevant_memories
        )

    system = f"""You are a helpful assistant with access to past conversation memories.
{memory_context}

Use these memories naturally when relevant. Don't mention that you're reading from memory."""

    history.append({"role": "user", "content": message})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=system,
        messages=history
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})

    # Optionally store this exchange as a new memory
    remember(f"User said: {message}. Assistant replied: {reply[:200]}")
    return reply
```

Tip: For production, swap the in-memory store for pgvector (if you're already on PostgreSQL/Neon) or Pinecone. Voyage AI is Anthropic's recommended embedding provider and integrates cleanly with Claude workflows.
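To exercise the remember/recall flow before wiring up a real embedding provider, a deterministic stand-in can help. The sketch below hashes words into a fixed-size bag-of-words vector. This is not a semantic embedding (it only matches shared words), and `toy_embedding`, `recall_with_threshold`, and the `min_score` value are hypothetical names and defaults introduced here for illustration. It also adds a similarity threshold, so that unrelated memories are not injected into the prompt just because they happen to rank in the top k.

```python
import hashlib
import math


def toy_embedding(text: str, dims: int = 1024) -> list[float]:
    """Deterministic bag-of-words vector. A testing stand-in only,
    not a replacement for a real embedding model."""
    vec = [0.0] * dims
    for raw in text.lower().split():
        word = raw.strip(".,!?;:'\"")
        if not word:
            continue
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        # Map each word to one bucket and count occurrences
        vec[int.from_bytes(digest[:4], "big") % dims] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # guard against empty vectors


def recall_with_threshold(query: str, store: list[dict],
                          top_k: int = 3, min_score: float = 0.5) -> list[str]:
    """Return up to top_k memories whose similarity is at least min_score."""
    q = toy_embedding(query)
    scored = [(cosine(q, m["embedding"]), m["text"]) for m in store]
    scored = [s for s in scored if s[0] >= min_score]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored[:top_k]]


store = [
    {"text": t, "embedding": toy_embedding(t)}
    for t in (
        "Sarah is building a SaaS product for dentists",
        "Alex works on Rust microservices",
    )
]
print(recall_with_threshold("Who is building a product for dentists?", store))
```

With the two sample memories, only the first shares words with the query, so it should be the only one that clears the threshold. A sensible threshold for a real embedding model depends on the model and needs tuning against your own data.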
Pattern 4: Tool-Based Memory (Most Flexible)
The most powerful pattern: give Claude tools to read and write its own memory. Claude decides what to remember and when to recall it.
```python
import json
import anthropic

client = anthropic.Anthropic()

# Simple file-backed memory (use a database in production)
MEMORY_FILE = "claude_memory.json"


def load_memories() -> dict:
    try:
        with open(MEMORY_FILE, "r") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_memories(memories: dict):
    with open(MEMORY_FILE, "w") as f:
        json.dump(memories, f, indent=2)


# Define memory tools for Claude
tools = [
    {
        "name": "remember",
        "description": "Save an important fact or piece of information to long-term memory. Use this when the user shares something that should be recalled in future sessions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "key": {
                    "type": "string",
                    "description": "A short identifier for this memory (e.g., 'user_name', 'user_goal', 'preferred_style')"
                },
                "value": {
                    "type": "string",
                    "description": "The information to remember"
                }
            },
            "required": ["key", "value"]
        }
    },
    {
        "name": "recall",
        "description": "Retrieve a previously saved memory by key.",
        "input_schema": {
            "type": "object",
            "properties": {
                "key": {
                    "type": "string",
                    "description": "The key of the memory to retrieve"
                }
            },
            "required": ["key"]
        }
    },
    {
        "name": "list_memories",
        "description": "List all stored memory keys so you know what information is available.",
        "input_schema": {
            "type": "object",
            "properties": {}
        }
    }
]


def handle_tool_call(tool_name: str, tool_input: dict) -> str:
    memories = load_memories()
    if tool_name == "remember":
        memories[tool_input["key"]] = tool_input["value"]
        save_memories(memories)
        return f"Saved: {tool_input['key']} = {tool_input['value']}"
    elif tool_name == "recall":
        value = memories.get(tool_input["key"])
        return value if value else f"No memory found for key: {tool_input['key']}"
    elif tool_name == "list_memories":
        if not memories:
            return "No memories stored yet."
        return "Stored memory keys: " + ", ".join(memories.keys())
    return "Unknown tool"


def chat_with_tool_memory(message: str, history: list) -> str:
    history.append({"role": "user", "content": message})
    system = """You are a helpful assistant with the ability to save and retrieve memories.
At the start of conversations, use list_memories to see what you know about the user.
When users share important information (name, preferences, goals, context), use remember() to save it.
Use recall() when you need specific information about the user."""

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system=system,
            tools=tools,
            messages=history
        )

        # If Claude wants to use a tool, handle it
        if response.stop_reason == "tool_use":
            history.append({"role": "assistant", "content": response.content})
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = handle_tool_call(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })
            history.append({"role": "user", "content": tool_results})
            continue

        # Claude gave a final response
        reply = ""
        for block in response.content:
            if hasattr(block, "text"):
                reply += block.text
        history.append({"role": "assistant", "content": reply})
        return reply


# Test
history = []
print(chat_with_tool_memory("Hi! I'm Alex, a backend engineer working on a Rust microservices project.", history))
print(chat_with_tool_memory("What do you know about me?", history))

# Start a new session; memories persist because they're saved to disk
history = []
print(chat_with_tool_memory("Hey, do you remember me?", history))
```

Choosing the Right Memory Pattern
| Pattern | Best For | Complexity | Cost |
|---|---|---|---|
| Messages array | Single-session chatbots | Low | Low |
| System prompt injection | User profile apps, returning users | Low-Medium | Low |
| Vector DB (RAG) | Knowledge-heavy apps, large memory stores | High | Medium |
| Tool-based memory | Agents, complex workflows | Medium | Medium |
Production Checklist
Before shipping a Claude app with memory, verify:
- [ ] Privacy: Are you storing only what you need? Can users view and delete their memories?
- [ ] Token budget: Log token usage per request. Memory injection increases costs.
- [ ] Relevance decay: Old memories may become incorrect. Add a timestamp and prune memories older than 90 days.
- [ ] Context ordering: Put memory context in the system prompt, not at the start of the messages array. This keeps background facts separate from the dialogue and leaves the messages array a clean user/assistant transcript.
- [ ] Prompt caching: If your system prompt is large and stable (e.g., a big knowledge base), use Claude's prompt caching. Cache reads are billed at a fraction of the normal input price, which can cut the cost of the cached portion by up to 90%.
- [ ] Haiku for extraction: Use claude-haiku-4-5-20251001 for memory extraction and classification tasks. It's considerably cheaper than Sonnet and fast enough for background jobs.
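Two of the checklist items translate into small helpers. The sketch below is one possible shape, not a definitive implementation: `prune_stale`, `build_system_blocks`, and the `saved_at` field are hypothetical names introduced here. The `cache_control` marker follows the prompt caching format in Anthropic's documentation as I understand it; caching only takes effect when the marked prefix exceeds a model-specific minimum length, so check the current docs before relying on it.

```python
from datetime import datetime, timedelta, timezone


def prune_stale(memories: list[dict], now: datetime,
                max_age_days: int = 90) -> list[dict]:
    """Relevance decay: keep only memories whose 'saved_at' timestamp
    (a field assumed for this sketch) is within max_age_days of now."""
    cutoff = now - timedelta(days=max_age_days)
    return [m for m in memories if m["saved_at"] >= cutoff]


def build_system_blocks(stable_instructions: str, user_facts: str) -> list[dict]:
    """Context ordering + prompt caching: the large, stable part goes first
    and is marked cacheable; per-user facts follow so they can change
    without altering the cached prefix."""
    return [
        {
            "type": "text",
            "text": stable_instructions,
            "cache_control": {"type": "ephemeral"},
        },
        {
            "type": "text",
            "text": f"What you know about this user:\n{user_facts}",
        },
    ]


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
memories = [
    {"text": "Prefers concise answers", "saved_at": now - timedelta(days=10)},
    {"text": "Evaluating vendors", "saved_at": now - timedelta(days=120)},
]
fresh = prune_stale(memories, now)
blocks = build_system_blocks(
    "You are a helpful assistant.",
    "\n".join(m["text"] for m in fresh),
)
print(len(fresh), blocks[1]["text"])
```

The resulting list can be passed as the `system` argument of `client.messages.create` in place of a plain string.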
Key Takeaways
- Claude has no built-in memory by design — your application owns the retrieval layer
- The messages array handles within-session memory; persist facts to a database for cross-session recall
- Vector databases unlock semantic memory retrieval at scale
- Tool-based memory lets Claude manage its own recall, which is powerful for autonomous agents
- Start simple (messages array + system prompt injection) and graduate to vector memory only when needed
Next Steps
Ready to build more sophisticated Claude agents? Check out our guide on Claude multi-agent orchestration or dive into the Claude API production best practices for patterns you can ship with confidence.
If you're preparing for the Claude Certified Architect (CCA-F) exam, memory architecture is a key topic — context management, tool use, and agentic patterns all appear in the exam. Practice with our CCA exam guide and free practice questions.