Tutorials · 10 min read

How to Build a Chatbot with Claude API: Complete Tutorial (2026)

Step-by-step tutorial to build a production-ready chatbot using the Anthropic Claude API. Covers multi-turn conversations, streaming, system prompts, and tool use.

Most "build a chatbot" tutorials give you a single-question toy that breaks the moment a real user types anything. This guide skips the shortcuts. You'll build a production-ready chatbot using the Anthropic Claude API — one that handles multi-turn conversations, streams responses token-by-token, respects a custom system prompt, and calls external tools when it needs live data.

By the end you'll have a working Python chatbot you can embed in a web app, Slack bot, or CLI tool — and you'll understand why each piece exists, which is what the Claude Certified Architect exam tests.

What You'll Build

  • A CLI chatbot with persistent conversation memory
  • Streaming output (tokens appear as Claude generates them)
  • A configurable system prompt for persona control
  • One tool integration (live weather via a mock function)
  • Clean error handling for rate limits and API errors

Prerequisites: Python 3.10+, an Anthropic API key, and basic familiarity with pip.

Step 1: Install the Anthropic SDK and Set Up Your Project

pip install anthropic python-dotenv

Create a .env file in your project root:

ANTHROPIC_API_KEY=sk-ant-...

Then create chatbot.py:

import os
from dotenv import load_dotenv
import anthropic

load_dotenv()
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

The Anthropic client is thread-safe and designed to be instantiated once. Don't create a new client per request in production — it re-reads credentials and opens new connections unnecessarily.


Step 2: Send Your First Message

The Claude API is a Messages API, not a completion API. Every call takes a list of messages and returns a response you append to that list. This is the mental model that makes multi-turn conversations trivial.

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the Anthropic Constitution?"}
    ]
)

print(response.content[0].text)

Run it. You should see Claude's answer. Notice what response contains:

print(response.stop_reason)   # "end_turn" or "max_tokens"
print(response.usage)         # input_tokens, output_tokens

Tracking usage per call is how you monitor costs. At claude-sonnet-4-6 pricing (roughly $3/M input, $15/M output), a 1,024-token response costs about $0.015 — trivial in testing, but it adds up at scale.
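
If you want a running total, a small helper turns usage into dollars. The rates below are this article's ballpark figures, hard-coded as an assumption; check current pricing before relying on them:

def estimate_cost(usage, input_rate: float = 3.0, output_rate: float = 15.0) -> float:
    # Rates are assumed $/million tokens; verify against current pricing.
    return (usage.input_tokens * input_rate + usage.output_tokens * output_rate) / 1_000_000

print(f"This call cost about ${estimate_cost(response.usage):.5f}")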


Step 3: Add Multi-Turn Conversation Memory

A chatbot that forgets what you just said isn't a chatbot — it's a fancy search box. Multi-turn memory in the Messages API is explicit: you maintain the conversation list yourself and pass the full history on every call.

def chat(messages: list, user_input: str) -> str:
    """Add user message, call API, append assistant reply, return text."""
    messages.append({"role": "user", "content": user_input})

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=messages,
    )

    assistant_message = response.content[0].text
    messages.append({"role": "assistant", "content": assistant_message})
    return assistant_message


def run_chatbot():
    messages = []
    print("Claude Chatbot — type 'quit' to exit\n")

    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ("quit", "exit"):
            break
        if not user_input:
            continue

        reply = chat(messages, user_input)
        print(f"\nClaude: {reply}\n")


if __name__ == "__main__":
    run_chatbot()

Run this and have a multi-turn conversation. Claude will remember context because the full messages list grows with each turn.

Token budget warning: The conversation list grows indefinitely. Claude Sonnet 4.6 has a 1M-token context window — generous, but a 10-hour customer support session will eventually hit it. Production chatbots use one of three strategies:

| Strategy | How it works | Best for |
| --- | --- | --- |
| Sliding window | Drop oldest messages when over threshold | Casual chat, support bots |
| Summary compression | Summarize old turns into one system message | Long-running assistants |
| Retrieval | Store turns in vector DB, inject relevant ones | Knowledge-heavy domains |

For this tutorial we'll use a simple sliding window.

MAX_TURNS = 20  # keep last 20 messages (10 user + 10 assistant)

def trim_history(messages: list) -> list:
    if len(messages) > MAX_TURNS:
        return messages[-MAX_TURNS:]
    return messages

Call messages = trim_history(messages) before each API call.
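
If you outgrow the sliding window, summary compression is the next step up. The table above describes folding the summary into a system message; the minimal sketch below keeps it in the message list instead, which avoids rebuilding the system prompt. The threshold, the summarization prompt, and the synthetic acknowledgment turn are illustrative assumptions, not fixed API behavior:

SUMMARIZE_AFTER = 40  # assumed threshold: compress once history exceeds this many messages

def compress_history(messages: list) -> list:
    """Summarize older turns into one message. Assumes complete user/assistant pairs."""
    if len(messages) <= SUMMARIZE_AFTER:
        return messages
    old, recent = messages[:-10], messages[-10:]
    summary = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        messages=old + [{
            "role": "user",
            "content": "Summarize our conversation so far in under 200 words, keeping facts, names, and decisions.",
        }],
    )
    summary_text = summary.content[0].text
    # Re-seed the history; the synthetic assistant turn keeps roles alternating
    # before the recent turns resume.
    return [
        {"role": "user", "content": f"Context from earlier in our conversation: {summary_text}"},
        {"role": "assistant", "content": "Got it, I'll keep that context in mind."},
    ] + recent

Swap it in for trim_history once conversations regularly outlive the window.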


Step 4: Add a System Prompt for Persona Control

The system parameter is the single most powerful knob in the Claude API. It's not a first message — it's a persistent instruction layer that Claude weighs throughout the conversation.

SYSTEM_PROMPT = """You are Aria, a friendly customer support assistant for AI for Anything (aiforanything.io).

Your role:
- Help users understand AI certifications (CCA, AWS AI Practitioner, Google AI)
- Answer questions about practice tests and study guides
- Keep answers concise (under 150 words) unless the user asks for detail
- Never make up pricing — direct pricing questions to the website

Tone: Warm, encouraging, technically accurate. Learners need confidence."""


response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=SYSTEM_PROMPT,   # <-- system param, not in messages list
    messages=messages,
)

Key system prompt rules that matter for production:

  • Role before rules — open with who the assistant is, then constrain behavior
  • Negative instructions work — "never make up pricing" is effective
  • Explicit format instructions — "under 150 words" shapes output length better than vague guidance
  • The system prompt is not secret — a determined user can often extract it. Don't put passwords or confidential logic here
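
One practical habit, a suggestion layered on this tutorial rather than an API requirement: keep the system prompt in a version-controlled text file so you can iterate on persona without code changes. A minimal sketch, assuming a hypothetical prompts/aria_support.txt:

from pathlib import Path

# Hypothetical file layout; adjust the path to your project.
SYSTEM_PROMPT = Path("prompts/aria_support.txt").read_text(encoding="utf-8").strip()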

Step 5: Stream Responses Token-by-Token

Nobody wants to stare at a blank screen for 3 seconds waiting for a 400-word response. Streaming makes your chatbot feel instant.

def chat_stream(messages: list, user_input: str) -> str:
    messages.append({"role": "user", "content": user_input})
    full_response = ""

    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=SYSTEM_PROMPT,
        messages=messages,
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
            full_response += text

    print()  # newline after streaming ends
    messages.append({"role": "assistant", "content": full_response})
    return full_response

The .text_stream iterator yields string chunks as they arrive. flush=True forces Python to print each chunk immediately rather than buffering. In a web app you'd send these chunks via Server-Sent Events (SSE) — the pattern is identical, just replace print with response.write.
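
Here's a rough sketch of that wiring using FastAPI; the framework choice and endpoint shape are assumptions, not part of this tutorial's stack:

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/chat")
def chat_endpoint(q: str):
    def event_stream():
        # Single-turn for brevity; wire in your history handling as above.
        with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system=SYSTEM_PROMPT,
            messages=[{"role": "user", "content": q}],
        ) as stream:
            for text in stream.text_stream:
                yield f"data: {text}\n\n"  # SSE frame format
    return StreamingResponse(event_stream(), media_type="text/event-stream")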


Step 6: Add Tool Use (Function Calling)

Tool use lets Claude call functions you define — database lookups, API calls, calculations — and weave the results into its response. This is the feature that separates a chatbot from a real AI assistant.

Here's how it works:

  • You define tools (JSON schema describing function + parameters)
  • Claude decides when to call them
  • Your code executes the function
  • You return results to Claude, which generates the final response

import json

# Define the tool schema
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city. Use when the user asks about weather.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'San Francisco'"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["city"]
        }
    }
]


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Mock weather function — replace with real API call."""
    return {"city": city, "temperature": 22, "unit": unit, "condition": "Partly cloudy"}


def chat_with_tools(messages: list, user_input: str) -> str:
    messages.append({"role": "user", "content": user_input})

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=SYSTEM_PROMPT,
        tools=tools,
        messages=messages,
    )

    # Check if Claude wants to use a tool
    while response.stop_reason == "tool_use":
        tool_uses = [b for b in response.content if b.type == "tool_use"]

        # Add Claude's tool-calling message to history
        messages.append({"role": "assistant", "content": response.content})

        # Execute each tool call
        tool_results = []
        for tool_use in tool_uses:
            if tool_use.name == "get_weather":
                result = get_weather(**tool_use.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "content": json.dumps(result),
                })

        # Return results to Claude
        messages.append({"role": "user", "content": tool_results})

        # Get Claude's final response
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system=SYSTEM_PROMPT,
            tools=tools,
            messages=messages,
        )

    # Extract final text response
    final_text = next(b.text for b in response.content if hasattr(b, "text"))
    messages.append({"role": "assistant", "content": final_text})
    return final_text

The while response.stop_reason == "tool_use" loop handles parallel tool calls — Claude can request multiple tools simultaneously, and you handle all of them before calling the API again. Because it's a loop rather than a single if, it also covers chained calls, where Claude's follow-up response requests yet another tool.
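
As a quick sanity check, a weather question against the mock function above should trigger one tool round-trip before the final answer:

messages = []
print(chat_with_tools(messages, "What's the weather in San Francisco right now?"))
# Expected shape: Claude calls get_weather(city="San Francisco"),
# receives the mocked result, then answers in plain text.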


Step 7: Handle Errors Gracefully

Production chatbots fail. Rate limits, network timeouts, invalid API keys — all of them will happen. The Anthropic SDK raises typed exceptions you can catch:

from anthropic import (
    APIConnectionError,
    RateLimitError,
    APIStatusError,
    AuthenticationError,
)
import time

def safe_chat(messages: list, user_input: str, retries: int = 3) -> str:
    for attempt in range(retries):
        try:
            return chat_stream(messages, user_input)

        except RateLimitError:
            # chat_stream already appended the user message; drop it so the
            # retry doesn't add a duplicate turn to the history
            if messages and messages[-1]["role"] == "user":
                messages.pop()
            wait = 2 ** attempt  # exponential backoff: 1s, 2s, 4s
            print(f"Rate limited. Retrying in {wait}s...")
            time.sleep(wait)

        except AuthenticationError:
            raise ValueError("Invalid API key. Check your ANTHROPIC_API_KEY.")

        except APIConnectionError:
            if messages and messages[-1]["role"] == "user":
                messages.pop()
            print("Network error. Check your connection.")
            if attempt == retries - 1:
                raise

        except APIStatusError as e:
            print(f"API error {e.status_code}: {e.message}")
            raise

    raise RuntimeError("Max retries exceeded")

Exponential backoff on RateLimitError is the standard pattern — it's what the Anthropic cookbook recommends and what the CCA exam tests you on.
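
Two refinements worth knowing. First, adding random jitter to the backoff avoids synchronized retries when many clients hit the limit at once; here's a minimal sketch of the standard "full jitter" variant:

import random

def backoff_with_jitter(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    # Sleep a random amount between 0 and the exponential ceiling.
    return random.uniform(0, min(cap, base * (2 ** attempt)))

Second, the Python SDK can retry some failures on its own via the max_retries option on the client constructor; check the SDK docs for what it covers before stacking it with your own loop.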


Complete Chatbot: Putting It All Together

Here's the final chatbot.py with all features integrated:

import os, json, time
from dotenv import load_dotenv
import anthropic
from anthropic import RateLimitError, AuthenticationError, APIConnectionError

load_dotenv()
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

SYSTEM_PROMPT = """You are Aria, a helpful AI assistant. Be concise, accurate, and friendly."""

MAX_TURNS = 20

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
]

def get_weather(city: str) -> dict:
    return {"city": city, "temperature": 22, "condition": "Sunny"}  # replace with real API

def trim_history(messages: list) -> None:
    # Trim in place: returning a slice here would silently orphan the
    # caller's list, and new turns would be appended to a throwaway copy.
    if len(messages) > MAX_TURNS:
        del messages[:-MAX_TURNS]

def chat(messages: list, user_input: str) -> str:
    trim_history(messages)
    messages.append({"role": "user", "content": user_input})

    for attempt in range(3):
        try:
            response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                system=SYSTEM_PROMPT,
                tools=tools,
                messages=messages,
            )
            break
        except RateLimitError:
            time.sleep(2 ** attempt)
    else:
        messages.pop()  # drop the unanswered user message so history stays consistent
        return "Sorry, I'm temporarily unavailable. Please try again."

    # Handle tool use
    while response.stop_reason == "tool_use":
        tool_uses = [b for b in response.content if b.type == "tool_use"]
        messages.append({"role": "assistant", "content": response.content})
        results = []
        for t in tool_uses:
            if t.name == "get_weather":
                results.append({
                    "type": "tool_result",
                    "tool_use_id": t.id,
                    "content": json.dumps(get_weather(**t.input))
                })
        messages.append({"role": "user", "content": results})
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system=SYSTEM_PROMPT,
            tools=tools,
            messages=messages,
        )

    # Print the final response (swap in chat_stream from Step 5 to stream it)
    full_response = next(b.text for b in response.content if hasattr(b, "text"))
    print(f"\nAria: {full_response}\n")
    messages.append({"role": "assistant", "content": full_response})
    return full_response


def main():
    messages = []
    print("Aria Chatbot — type 'quit' to exit\n")
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ("quit", "exit"):
            break
        if user_input:
            chat(messages, user_input)

if __name__ == "__main__":
    main()


Choosing the Right Claude Model

| Model | Best for | Approx. cost |
| --- | --- | --- |
| claude-haiku-4-5 | High-volume, simple Q&A, classification | Lowest |
| claude-sonnet-4-6 | Most chatbots, balanced quality/cost | Mid |
| claude-opus-4-6 | Complex reasoning, document analysis | Highest |

For most customer-facing chatbots, start with Sonnet. Downgrade to Haiku for FAQ bots that handle thousands of requests per day. Use Opus only when the task genuinely requires deep reasoning — the cost difference is roughly 5x.
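
If your bot handles mixed traffic, you can encode that guidance as a simple router. In this minimal sketch, the tier table and the classification heuristic are placeholders for your own logic:

MODEL_BY_TIER = {
    "faq": "claude-haiku-4-5",
    "chat": "claude-sonnet-4-6",
    "analysis": "claude-opus-4-6",
}

def pick_model(user_input: str) -> str:
    # Toy heuristic: long or document-heavy requests get the bigger model.
    if len(user_input) > 2000 or "analyze" in user_input.lower():
        return MODEL_BY_TIER["analysis"]
    if len(user_input) < 80:
        return MODEL_BY_TIER["faq"]
    return MODEL_BY_TIER["chat"]

Pass the result as the model argument in client.messages.create.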


Key Takeaways

  • The Messages API is stateless — you own the conversation history and pass it on every call
  • The system parameter controls persona and constraints — it's separate from the messages list
  • Streaming requires minimal code changes: client.messages.stream() instead of client.messages.create()
  • Tool use follows a request→execute→return loop; the while stop_reason == "tool_use" pattern handles parallel calls
  • Always implement exponential backoff for RateLimitError in production

Go Deeper: Claude Certified Architect

Building chatbots with the Claude API is one of the core competencies tested in the Claude Certified Architect (CCA-F) exam. The exam covers:

  • Messages API design patterns and multi-turn architecture
  • Prompt engineering and system prompt design
  • Tool use schemas and agentic patterns
  • Context window management and token optimization
  • Safety best practices and constitutional AI

AI for Anything offers the most comprehensive CCA practice test bank available — 200+ questions organized by exam domain, with detailed explanations for every answer. Whether you're studying for the cert or building production AI apps, understanding these patterns deeply is what separates a developer who uses Claude from one who can architect with it.

Start your CCA prep →

Ready to Start Practicing?

300+ scenario-based practice questions covering all 5 CCA domains. Detailed explanations for every answer.

Free CCA Study Kit

Get domain cheat sheets, anti-pattern flashcards, and weekly exam tips. No spam, unsubscribe anytime.