How to Build a Chatbot with Claude API: Complete Tutorial (2026)
Step-by-step tutorial to build a production-ready chatbot using the Anthropic Claude API. Covers multi-turn conversations, streaming, system prompts, and tool use.
Most "build a chatbot" tutorials give you a single-question toy that breaks the moment a real user types anything. This guide skips the shortcuts. You'll build a production-ready chatbot using the Anthropic Claude API — one that handles multi-turn conversations, streams responses token-by-token, respects a custom system prompt, and calls external tools when it needs live data.
By the end you'll have a working Python chatbot you can embed in a web app, Slack bot, or CLI tool — and you'll understand why each piece exists, which is what the Claude Certified Architect exam tests.
What You'll Build
- A CLI chatbot with persistent conversation memory
- Streaming output (tokens appear as Claude generates them)
- A configurable system prompt for persona control
- One tool integration (live weather via a mock function)
- Clean error handling for rate limits and API errors
Step 1: Install the Anthropic SDK and Set Up Your Project
```bash
pip install anthropic python-dotenv
```

Create a `.env` file in your project root:

```
ANTHROPIC_API_KEY=sk-ant-...
```

Then create `chatbot.py`:

```python
import os

from dotenv import load_dotenv
import anthropic

load_dotenv()

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
```

The Anthropic client is thread-safe and designed to be instantiated once. Don't create a new client per request in production — it re-reads credentials and opens new connections unnecessarily.
Step 2: Send Your First Message
The Claude API is a Messages API, not a completion API. Every call takes a list of messages and returns a response you append to that list. This is the mental model that makes multi-turn conversations trivial.
```python
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the Anthropic Constitution?"}
    ],
)

print(response.content[0].text)
```

Run it. You should see Claude's answer. Notice what `response` contains:

```python
print(response.stop_reason)  # "end_turn" or "max_tokens"
print(response.usage)        # input_tokens, output_tokens
```

Tracking usage per call is how you monitor costs. At claude-sonnet-4-6 pricing (roughly $3/M input, $15/M output), a 1,024-token response costs about $0.015 — trivial in testing, but it adds up at scale.
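To make that concrete, you can turn the `usage` numbers into a dollar figure. A small sketch using the approximate rates quoted above — treat these as placeholders and check current pricing before relying on them:

```python
# Approximate claude-sonnet-4-6 rates quoted above (USD per token).
# These are assumptions for illustration, not authoritative pricing.
INPUT_RATE = 3.00 / 1_000_000    # ~$3 per million input tokens
OUTPUT_RATE = 15.00 / 1_000_000  # ~$15 per million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Approximate USD cost of one API call."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# In practice: estimate_cost(response.usage.input_tokens, response.usage.output_tokens)
print(f"${estimate_cost(50, 1024):.4f}")  # a short prompt with a 1,024-token reply
```

Logging this per request gives you a running cost total long before the monthly invoice arrives.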
Step 3: Add Multi-Turn Conversation Memory
A chatbot that forgets what you just said isn't a chatbot — it's a fancy search box. Multi-turn memory in the Messages API is explicit: you maintain the conversation list yourself and pass the full history on every call.
```python
def chat(messages: list, user_input: str) -> str:
    """Add user message, call API, append assistant reply, return text."""
    messages.append({"role": "user", "content": user_input})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=messages,
    )
    assistant_message = response.content[0].text
    messages.append({"role": "assistant", "content": assistant_message})
    return assistant_message

def run_chatbot():
    messages = []
    print("Claude Chatbot — type 'quit' to exit\n")
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ("quit", "exit"):
            break
        if not user_input:
            continue
        reply = chat(messages, user_input)
        print(f"\nClaude: {reply}\n")

if __name__ == "__main__":
    run_chatbot()
```

Run this and have a multi-turn conversation. Claude will remember context because the full `messages` list grows with each turn.
That growth is also a liability: the context window is finite, and you pay for every input token you resend. Three common strategies for managing history:

| Strategy | How it works | Best for |
|---|---|---|
| Sliding window | Drop oldest messages when over threshold | Casual chat, support bots |
| Summary compression | Summarize old turns into one system message | Long-running assistants |
| Retrieval | Store turns in vector DB, inject relevant ones | Knowledge-heavy domains |

For this tutorial we'll use a simple sliding window.
```python
MAX_TURNS = 20  # keep last 20 messages (10 user + 10 assistant)

def trim_history(messages: list) -> list:
    if len(messages) > MAX_TURNS:
        return messages[-MAX_TURNS:]
    return messages
```

Call `messages = trim_history(messages)` before each API call.
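The summary-compression strategy from the table can be sketched in the same style. This version takes the summarizer as a callable so the compression logic stands alone; in production you'd pass a small wrapper around a cheap Claude call. The threshold values here are illustrative assumptions, not recommendations:

```python
SUMMARY_THRESHOLD = 30  # compress once history exceeds this many messages
KEEP_RECENT = 10        # always keep the newest messages verbatim

def compress_history(messages: list, summarize) -> list:
    """Fold old turns into a single summary message.

    `summarize` is any callable that maps a list of messages to a short
    text summary -- in production, another (cheap) model call.
    """
    if len(messages) <= SUMMARY_THRESHOLD:
        return messages
    old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    summary = summarize(old)
    # Put the summary in a user turn so the history still starts with "user"
    return [{"role": "user", "content": f"(Summary of earlier conversation: {summary})"}] + recent
```

The design choice worth noting: injecting the summary as the first user turn keeps the message list valid without touching the system prompt on every call.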
Step 4: Add a System Prompt for Persona Control
The system parameter is the single most powerful knob in the Claude API. It's not a first message — it's a persistent instruction layer that Claude weighs throughout the conversation.
```python
SYSTEM_PROMPT = """You are Aria, a friendly customer support assistant for AI for Anything (aiforanything.io).

Your role:
- Help users understand AI certifications (CCA, AWS AI Practitioner, Google AI)
- Answer questions about practice tests and study guides
- Keep answers concise (under 150 words) unless the user asks for detail
- Never make up pricing — direct pricing questions to the website

Tone: Warm, encouraging, technically accurate. Learners need confidence."""

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=SYSTEM_PROMPT,  # <-- system param, not in messages list
    messages=messages,
)
```

Note that the system prompt is resent with every call but never appended to `messages`: it stays fixed while the conversation grows, which is what makes it reliable for persona control.
Step 5: Stream Responses Token-by-Token
Nobody wants to stare at a blank screen for 3 seconds waiting for a 400-word response. Streaming makes your chatbot feel instant.
```python
def chat_stream(messages: list, user_input: str) -> str:
    messages.append({"role": "user", "content": user_input})
    full_response = ""
    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=SYSTEM_PROMPT,
        messages=messages,
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
            full_response += text
    print()  # newline after streaming ends
    messages.append({"role": "assistant", "content": full_response})
    return full_response
```

The `.text_stream` iterator yields string chunks as they arrive. `flush=True` forces Python to print each chunk immediately rather than buffering. In a web app you'd send these chunks via Server-Sent Events (SSE) — the pattern is identical, just replace `print` with your framework's streaming write.
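The SSE framing itself is framework-independent. Here's one possible shape of that generator — the wiring into Flask, FastAPI, or another framework is up to you, and the `client` argument is the Anthropic client created earlier:

```python
def sse_format(chunk: str) -> str:
    """Wrap one text chunk in Server-Sent Events framing.

    SSE frames are "data: <payload>\\n\\n"; multi-line payloads need
    one "data:" prefix per line.
    """
    lines = chunk.split("\n")
    return "".join(f"data: {line}\n" for line in lines) + "\n"

def sse_stream(client, messages: list, system_prompt: str):
    """Yield a Claude reply as SSE frames (sketch, not a full endpoint)."""
    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=system_prompt,
        messages=messages,
    ) as stream:
        for text in stream.text_stream:
            yield sse_format(text)
```

A web framework's streaming response can consume `sse_stream(...)` directly, since it's just a generator of strings.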
Step 6: Add Tool Use (Function Calling)
Tool use lets Claude call functions you define — database lookups, API calls, calculations — and weave the results into its response. This is the feature that separates a chatbot from a real AI assistant.
Here's how it works:
```python
import json

# Define the tool schema
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city. Use when the user asks about weather.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'San Francisco'"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["city"]
        }
    }
]

def get_weather(city: str, unit: str = "celsius") -> dict:
    """Mock weather function — replace with real API call."""
    return {"city": city, "temperature": 22, "unit": unit, "condition": "Partly cloudy"}

def chat_with_tools(messages: list, user_input: str) -> str:
    messages.append({"role": "user", "content": user_input})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=SYSTEM_PROMPT,
        tools=tools,
        messages=messages,
    )
    # Check if Claude wants to use a tool
    while response.stop_reason == "tool_use":
        tool_uses = [b for b in response.content if b.type == "tool_use"]
        # Add Claude's tool-calling message to history
        messages.append({"role": "assistant", "content": response.content})
        # Execute each tool call
        tool_results = []
        for tool_use in tool_uses:
            if tool_use.name == "get_weather":
                result = get_weather(**tool_use.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "content": json.dumps(result),
                })
        # Return results to Claude
        messages.append({"role": "user", "content": tool_results})
        # Get Claude's next response (it may request more tools)
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system=SYSTEM_PROMPT,
            tools=tools,
            messages=messages,
        )
    # Extract final text response
    final_text = next(b.text for b in response.content if hasattr(b, "text"))
    messages.append({"role": "assistant", "content": final_text})
    return final_text
```

The `while response.stop_reason == "tool_use"` loop handles parallel tool calls — Claude can request multiple tools in a single response, and you execute all of them before calling the API again.
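Once you have more than one tool, the `if tool_use.name == ...` chain gets unwieldy. A dispatch dict keeps the loop generic. A sketch — `get_weather` is the mock from above, and `get_time` is a hypothetical second tool added purely for illustration:

```python
import json
from datetime import datetime, timezone

def get_weather(city: str, unit: str = "celsius") -> dict:
    """Mock weather lookup (same shape as the example above)."""
    return {"city": city, "temperature": 22, "unit": unit, "condition": "Partly cloudy"}

def get_time(tz: str = "UTC") -> dict:
    """Hypothetical second tool: current time in UTC."""
    return {"timezone": tz, "time": datetime.now(timezone.utc).isoformat()}

# Map tool names (as declared in the schemas) to Python callables
TOOL_HANDLERS = {
    "get_weather": get_weather,
    "get_time": get_time,
}

def run_tool(name: str, tool_input: dict, tool_use_id: str) -> dict:
    """Execute one tool call and wrap the result as a tool_result block."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        # Surface unknown tools back to the model instead of crashing
        content = json.dumps({"error": f"unknown tool: {name}"})
    else:
        content = json.dumps(handler(**tool_input))
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
```

Inside the tool-use loop, each iteration then collapses to `tool_results.append(run_tool(tool_use.name, tool_use.input, tool_use.id))`, and adding a tool means registering one more entry in `TOOL_HANDLERS`.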
Step 7: Handle Errors Gracefully
Production chatbots fail. Rate limits, network timeouts, invalid API keys — all of them will happen. The Anthropic SDK raises typed exceptions you can catch:
```python
import time

from anthropic import (
    APIConnectionError,
    APIStatusError,
    AuthenticationError,
    RateLimitError,
)

def safe_chat(messages: list, user_input: str, retries: int = 3) -> str:
    for attempt in range(retries):
        try:
            return chat_stream(messages, user_input)
        except RateLimitError:
            # chat_stream already appended the user turn; remove it so the
            # retry doesn't add a duplicate
            if messages and messages[-1]["role"] == "user":
                messages.pop()
            wait = 2 ** attempt  # exponential backoff: 1s, 2s, 4s
            print(f"Rate limited. Retrying in {wait}s...")
            time.sleep(wait)
        except AuthenticationError:
            raise ValueError("Invalid API key. Check your ANTHROPIC_API_KEY.")
        except APIConnectionError:
            if messages and messages[-1]["role"] == "user":
                messages.pop()
            print("Network error. Check your connection.")
            if attempt == retries - 1:
                raise
        except APIStatusError as e:
            print(f"API error {e.status_code}: {e.message}")
            raise
    raise RuntimeError("Max retries exceeded")
```

Exponential backoff on `RateLimitError` is the standard pattern — it's what the Anthropic cookbook recommends and what the CCA exam tests you on.
Complete Chatbot: Putting It All Together
Here's the final chatbot.py with all features integrated:
```python
import json
import os
import time

from dotenv import load_dotenv
import anthropic
from anthropic import RateLimitError

load_dotenv()

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

SYSTEM_PROMPT = """You are Aria, a helpful AI assistant. Be concise, accurate, and friendly."""
MAX_TURNS = 20

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

def get_weather(city: str) -> dict:
    return {"city": city, "temperature": 22, "condition": "Sunny"}  # replace with real API

def trim_history(messages: list) -> None:
    # Trim in place so the caller's list keeps accumulating new turns
    if len(messages) > MAX_TURNS:
        del messages[: len(messages) - MAX_TURNS]

def chat(messages: list, user_input: str) -> str:
    trim_history(messages)
    messages.append({"role": "user", "content": user_input})
    for attempt in range(3):
        try:
            response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                system=SYSTEM_PROMPT,
                tools=tools,
                messages=messages,
            )
            break
        except RateLimitError:
            time.sleep(2 ** attempt)
    else:
        return "Sorry, I'm temporarily unavailable. Please try again."

    # Handle tool use
    while response.stop_reason == "tool_use":
        tool_uses = [b for b in response.content if b.type == "tool_use"]
        messages.append({"role": "assistant", "content": response.content})
        results = []
        for t in tool_uses:
            if t.name == "get_weather":
                results.append({
                    "type": "tool_result",
                    "tool_use_id": t.id,
                    "content": json.dumps(get_weather(**t.input)),
                })
        messages.append({"role": "user", "content": results})
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system=SYSTEM_PROMPT,
            tools=tools,
            messages=messages,
        )

    # Extract and print the final text response
    full_response = next(b.text for b in response.content if hasattr(b, "text"))
    print(f"\nAria: {full_response}\n")
    messages.append({"role": "assistant", "content": full_response})
    return full_response

def main():
    messages = []
    print("Aria Chatbot — type 'quit' to exit\n")
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ("quit", "exit"):
            break
        if user_input:
            chat(messages, user_input)

if __name__ == "__main__":
    main()
```

Choosing the Right Claude Model
| Model | Best for | Approx. cost |
|---|---|---|
| claude-haiku-4-5 | High-volume, simple Q&A, classification | Lowest |
| claude-sonnet-4-6 | Most chatbots, balanced quality/cost | Mid |
| claude-opus-4-6 | Complex reasoning, document analysis | Highest |
For most customer-facing chatbots, start with Sonnet. Downgrade to Haiku for FAQ bots that handle thousands of requests per day. Use Opus only when the task genuinely requires deep reasoning — the cost difference is roughly 5x.
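One way to act on that advice is a tiny model router. A sketch only — the model IDs come from the table above, while the heuristic (short questions to Haiku, an explicit flag for Opus) is an assumption you'd tune for your own traffic:

```python
def pick_model(user_input: str, needs_deep_reasoning: bool = False) -> str:
    """Crude model router: cheap model for short factual questions,
    the balanced default otherwise, the top model only when flagged."""
    if needs_deep_reasoning:
        return "claude-opus-4-6"
    if len(user_input) < 80 and "?" in user_input:
        return "claude-haiku-4-5"
    return "claude-sonnet-4-6"
```

Pass the result as the `model` argument to `client.messages.create()`; even a heuristic this crude can cut costs substantially on FAQ-heavy traffic.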
Key Takeaways
- The Messages API is stateless — you own the conversation history and pass it on every call
- The `system` parameter controls persona and constraints — it's separate from the messages list
- Streaming requires minimal code changes: `client.messages.stream()` instead of `client.messages.create()`
- Tool use follows a request→execute→return loop; the `while stop_reason == "tool_use"` pattern handles parallel calls
- Always implement exponential backoff for `RateLimitError` in production
Go Deeper: Claude Certified Architect
Building chatbots with the Claude API is one of the core competencies tested in the Claude Certified Architect (CCA-F) exam. The exam covers:
- Messages API design patterns and multi-turn architecture
- Prompt engineering and system prompt design
- Tool use schemas and agentic patterns
- Context window management and token optimization
- Safety best practices and constitutional AI
AI for Anything offers the most comprehensive CCA practice test bank available — 200+ questions organized by exam domain, with detailed explanations for every answer. Whether you're studying for the cert or building production AI apps, understanding these patterns deeply is what separates a developer who uses Claude from one who can architect with it.
Start your CCA prep →

Ready to Start Practicing?
300+ scenario-based practice questions covering all 5 CCA domains. Detailed explanations for every answer.
Free CCA Study Kit
Get domain cheat sheets, anti-pattern flashcards, and weekly exam tips. No spam, unsubscribe anytime.