
Claude API with Python: Complete Tutorial with Real-World Examples (2026)

Master the Anthropic Python SDK in one guide. Setup, streaming, tool use, multi-turn conversations, and production patterns — with runnable code examples.

If you've been using Claude through the web interface and want to integrate it into your own applications, Python is the fastest path forward. The Anthropic Python SDK is well-documented, actively maintained, and takes minutes to set up — but most tutorials stop at "hello world."

This guide goes further. You'll learn streaming, multi-turn conversations, tool use (function calling), error handling, and cost-efficient patterns used in production apps. By the end, you'll have everything you need to build real Claude-powered features.

What You'll Need

Before writing a single line of code:

  • An Anthropic API key — get one at console.anthropic.com
  • Python 3.8+ installed on your machine
  • Basic Python familiarity — this is a tutorial, not an intro to Python
  • Set your API key as an environment variable (never hardcode it):

    export ANTHROPIC_API_KEY="sk-ant-your-key-here"

    Or if you're using a .env file with python-dotenv:

    pip install python-dotenv anthropic

    from dotenv import load_dotenv
    load_dotenv()
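A small guard can make a missing key fail fast with a readable message, rather than surfacing later as an authentication error. This is a minimal sketch: `require_api_key` is a hypothetical helper name, not part of the SDK or python-dotenv.

```python
import os


def require_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key from the environment, or raise with a clear message.

    Hypothetical helper for illustration; the SDK does not provide it.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it in your shell or add it to your .env file."
        )
    return key
```

Call it once at startup, after `load_dotenv()` if you use a .env file.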


    Setting Up the Anthropic Python SDK

    Install the official SDK:

    pip install anthropic

    That's it. No heavy dependencies, no complex configuration. The SDK handles authentication automatically by reading ANTHROPIC_API_KEY from your environment.

    Your First API Call

    import anthropic
    
    client = anthropic.Anthropic()
    
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": "Explain what an API is in two sentences."}
        ]
    )
    
    print(message.content[0].text)

    Output:

    An API (Application Programming Interface) is a set of rules and protocols 
    that allows different software applications to communicate with each other. 
    It acts as a contract between two systems, defining how requests should be 
    made and what kind of responses to expect.

    Understanding the Response Object

    The message object contains more than just the text:

    print(message.model)          # "claude-sonnet-4-6"
    print(message.stop_reason)    # "end_turn"
    print(message.usage.input_tokens)   # tokens you sent
    print(message.usage.output_tokens)  # tokens in the response

    Track usage carefully — it's how you calculate your API costs.
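To make the cost link concrete, here is a rough sketch that turns the two token counts into a dollar estimate. The prices are placeholders chosen for illustration, not real rates, and `estimate_cost` is a hypothetical helper; substitute the current per-million-token prices for the model you use.

```python
# Hypothetical prices in USD per million tokens. These are placeholders,
# not real rates: look up current pricing for the model you actually use.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}


def estimate_cost(input_tokens: int, output_tokens: int, prices: dict = PRICE_PER_MTOK) -> float:
    """Estimate the cost in USD of one request from its token counts."""
    input_cost = input_tokens / 1_000_000 * prices["input"]
    output_cost = output_tokens / 1_000_000 * prices["output"]
    return input_cost + output_cost


# With a real response you would pass message.usage.input_tokens and
# message.usage.output_tokens; fixed numbers are used here so the sketch runs offline.
print(f"${estimate_cost(1_200, 350):.6f}")
```

Summing this value per request (or per user) gives a running spend figure you can log or alert on.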


    System Prompts: Shaping Claude's Behavior

    A system prompt is the most powerful tool for controlling how Claude responds. It sets the context, persona, and constraints for the conversation, and it applies to every turn as long as you send it with each request.

    client = anthropic.Anthropic()
    
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        system=(
            "You are a senior Python engineer reviewing code for production readiness. "
            "Be direct, specific, and prioritize security and performance issues first."
        ),
        messages=[
            {"role": "user", "content": "Review this: x = input('Enter password: ')"}
        ]
    )
    
    print(message.content[0].text)

    Good system prompts are:

    • Specific about role and expertise level — not just "you are a helpful assistant"
    • Clear about output format — "respond in bullet points", "use markdown headers"
    • Bounded in scope — tell Claude what to focus on and what to ignore


    Multi-Turn Conversations

    Most real applications need conversational memory, which single-shot queries don't provide. The API is stateless, so you manage this yourself by building the messages array and sending it with each request:

    import anthropic
    
    client = anthropic.Anthropic()
    
    def chat(conversation_history, user_message):
        """Send a message and get a response, maintaining conversation history."""
        conversation_history.append({
            "role": "user",
            "content": user_message
        })
        
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system="You are a Python tutor helping beginners learn programming.",
            messages=conversation_history
        )
        
        assistant_message = response.content[0].text
        conversation_history.append({
            "role": "assistant",
            "content": assistant_message
        })
        
        return assistant_message, conversation_history
    
    # Usage
    history = []
    reply, history = chat(history, "What is a list in Python?")
    print(reply)
    
    reply, history = chat(history, "How is it different from a tuple?")
    print(reply)  # Claude remembers the previous context

    Key pattern: You own the conversation history. Pass it with every request. This gives you full control over context window usage — you can summarize old messages, drop irrelevant turns, or persist history to a database.
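As one illustration of dropping old turns, the sketch below keeps only the most recent messages. `trim_history` is a hypothetical helper, and it assumes plain text turns; a history containing tool calls would need extra care so that a tool result is never separated from the tool call it answers.

```python
from typing import Dict, List


def trim_history(history: List[Dict], max_messages: int = 20) -> List[Dict]:
    """Return a copy of the history limited to the most recent messages."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    if len(history) <= max_messages:
        return list(history)
    trimmed = history[-max_messages:]
    # The conversation should open with a user message, so drop any
    # assistant turns left at the front by the cut.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


# Ten alternating turns, numbered 0..9 (even = user, odd = assistant).
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": str(i)}
    for i in range(10)
]
print([m["content"] for m in trim_history(history, max_messages=3)])
```

Because the cut may land on an assistant turn, the result can be shorter than `max_messages`; here the last three messages shrink to two so that the list starts with a user turn.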

    Streaming Responses

    For any user-facing app, streaming is essential. Text appears as it is generated, so responses feel instant instead of making the user wait for the full reply.

    import anthropic
    
    client = anthropic.Anthropic()
    
    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": "Write a Python function to parse CSV files with error handling."}
        ]
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
        
        # Get final message after streaming completes
        final_message = stream.get_final_message()
        print(f"\n\nTokens used: {final_message.usage.input_tokens} in / {final_message.usage.output_tokens} out")

    Async Streaming (FastAPI / async apps)

    If you're building a web API, use the async client:

    import asyncio
    import anthropic
    
    async def stream_response(user_prompt: str):
        client = anthropic.AsyncAnthropic()
        
        async with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": user_prompt}]
        ) as stream:
            async for text in stream.text_stream:
                yield text  # yield to your FastAPI streaming response
    
    # In a FastAPI endpoint:
    # from fastapi.responses import StreamingResponse
    # return StreamingResponse(stream_response(prompt), media_type="text/plain")


    Tool Use (Function Calling)

    Tool use lets Claude call functions you define — the foundation for building agents, data pipelines, and automated workflows.

    Here's a practical example: a weather assistant that can call a weather API.

    import anthropic
    import json
    
    client = anthropic.Anthropic()
    
    # Define the tools Claude can use
    tools = [
        {
            "name": "get_weather",
            "description": "Get the current weather for a city. Returns temperature, conditions, and humidity.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. 'San Francisco'"
                    },
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature units"
                    }
                },
                "required": ["city"]
            }
        }
    ]
    
    def get_weather(city: str, units: str = "celsius") -> dict:
        """Simulate a weather API call."""
        # In production, call a real weather API here
        return {
            "city": city,
            "temperature": 22,
            "units": units,
            "conditions": "Partly cloudy",
            "humidity": 65
        }
    
    def run_weather_agent(user_message: str):
        messages = [{"role": "user", "content": user_message}]
        
        while True:
            response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                tools=tools,
                messages=messages
            )
            
            # If Claude wants to use a tool
            if response.stop_reason == "tool_use":
                # Add Claude's response to history
                messages.append({"role": "assistant", "content": response.content})
                
                # Process each tool call
                tool_results = []
                for block in response.content:
                    if block.type == "tool_use":
                        # Execute the function
                        result = get_weather(**block.input)
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": json.dumps(result)
                        })
                
                # Add tool results to history
                messages.append({"role": "user", "content": tool_results})
            
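            # Claude has finished: join the text blocks into the final answer
            elif response.stop_reason == "end_turn":
                return "".join(b.text for b in response.content if b.type == "text")
            else:
                # e.g. "max_tokens": fail loudly instead of silently returning None
                raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")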
    
    # Usage
    answer = run_weather_agent("What's the weather like in Tokyo and should I bring an umbrella?")
    print(answer)

    This pattern — loop until end_turn, execute tools when stop_reason == "tool_use" — is the backbone of every Claude agent.
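Once an agent has more than one tool, the "execute tools" step usually needs to look up the function by name and report failures back to Claude instead of crashing the loop. The following is a sketch of that step only, working on plain dictionaries so it runs without an API call. `TOOL_REGISTRY` and `execute_tool` are hypothetical names, and `get_weather` is a trimmed stand-in for the simulated function in the example.

```python
import json


def get_weather(city: str, units: str = "celsius") -> dict:
    """Stand-in for the simulated weather lookup."""
    return {"city": city, "temperature": 22, "units": units}


# Map tool names (as declared in the tools list) to Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def execute_tool(name: str, tool_input: dict, tool_use_id: str) -> dict:
    """Run one tool call and wrap the outcome as a tool_result block."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"Unknown tool: {name}",
            "is_error": True,
        }
    try:
        result = handler(**tool_input)
    except Exception as exc:
        # Report the failure to the model so it can correct its input or explain.
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": json.dumps(result),
    }


print(execute_tool("get_weather", {"city": "Tokyo"}, "toolu_example"))
```

Inside the agent loop, the body of the `for block in response.content` loop would then become `tool_results.append(execute_tool(block.name, block.input, block.id))`.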


    Real-World Project: Document Summarizer

    Let's build something practical: a script that reads a text file and generates a structured summary with key points, action items, and a TL;DR.

    import anthropic
    from pathlib import Path
    
    client = anthropic.Anthropic()
    
    SUMMARIZER_SYSTEM = """You are a document analyst. When given a document, respond with:
    
    ## TL;DR
    [2-3 sentence summary]
    
    ## Key Points
    - [Point 1]
    - [Point 2]
    - [Point 3]
    
    ## Action Items
    - [Actionable item if any, otherwise "None identified"]
    
    ## Sentiment
    [Positive/Neutral/Negative and why in one sentence]
    
    Always use this exact structure. Be concise."""
    
    def summarize_document(file_path: str) -> dict:
        """Summarize a text document using Claude."""
        path = Path(file_path)
        
        if not path.exists():
            raise FileNotFoundError(f"File not found: {file_path}")
        
        content = path.read_text(encoding="utf-8")
        
        # Trim if document is too long (rough token estimate: 1 token ≈ 4 chars)
        max_chars = 180_000  # ~45K tokens, safe for claude-sonnet-4-6
        if len(content) > max_chars:
            content = content[:max_chars] + "\n\n[Document truncated for length]"
        
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system=SUMMARIZER_SYSTEM,
            messages=[
                {
                    "role": "user",
                    "content": f"Please summarize this document:\n\n{content}"
                }
            ]
        )
        
        return {
            "summary": response.content[0].text,
            "input_tokens": response.usage.input_tokens,
            "output_tokens": response.usage.output_tokens,
            "file": path.name
        }
    
    # Usage
    result = summarize_document("meeting_notes.txt")
    print(result["summary"])
    print(f"\nTokens: {result['input_tokens']} in / {result['output_tokens']} out")
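Because the system prompt asks for fixed `##` headings, the returned summary can be split into sections for storage or display. This is a sketch under the assumption that the model followed the requested structure; `parse_sections` is a hypothetical helper, and real replies should be checked for missing sections.

```python
import re


def parse_sections(summary: str) -> dict:
    """Split a summary formatted with '## Heading' lines into {heading: body}."""
    sections = {}
    current = None
    for line in summary.splitlines():
        match = re.match(r"^##\s+(.*\S)\s*$", line)
        if match:
            # A new heading starts a new section.
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


# A made-up reply in the requested format, used so the sketch runs offline.
sample = """## TL;DR
The team agreed on a launch date.

## Key Points
- Launch moves to Q3
- Budget unchanged

## Action Items
- None identified

## Sentiment
Neutral, mostly logistical."""

parsed = parse_sections(sample)
print(parsed["TL;DR"])
print(parsed["Key Points"].splitlines())
```

With the summarizer above, you would call `parse_sections(result["summary"])`.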


    Error Handling in Production

    The API can fail. Your application needs to handle it gracefully.

    import anthropic
    import time
    
    client = anthropic.Anthropic()
    
    def resilient_completion(prompt: str, retries: int = 3) -> str:
        """API call with retry logic for rate limits and transient errors."""
        
        for attempt in range(retries):
            try:
                response = client.messages.create(
                    model="claude-sonnet-4-6",
                    max_tokens=512,
                    messages=[{"role": "user", "content": prompt}]
                )
                return response.content[0].text
            
            except anthropic.RateLimitError:
                if attempt < retries - 1:
                    wait_time = 2 ** attempt  # Exponential backoff: 1s, then 2s (doubles each attempt)
                    print(f"Rate limited. Waiting {wait_time}s before retry {attempt + 1}...")
                    time.sleep(wait_time)
                else:
                    raise
            
            except anthropic.APITimeoutError:
                if attempt < retries - 1:
                    print(f"Timeout on attempt {attempt + 1}. Retrying...")
                    time.sleep(1)
                else:
                    raise
            
            except anthropic.AuthenticationError:
                raise  # Don't retry auth errors — the key is wrong
            
            except anthropic.BadRequestError as e:
                print(f"Bad request: {e}")
                raise  # Don't retry bad requests — fix the input

    Common errors you'll encounter:
    | Error | Cause | Fix |
    | --- | --- | --- |
    | AuthenticationError | Invalid API key | Check ANTHROPIC_API_KEY env var |
    | RateLimitError | Too many requests | Exponential backoff + retry |
    | APITimeoutError | Request took too long | Lower max_tokens, retry |
    | BadRequestError | Invalid message format | Check message structure |
    | OverloadedError | Anthropic servers busy | Retry with backoff |
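The retry logic can also be factored into a reusable wrapper, so every call site shares one backoff policy. The sketch below is one possible shape, not the SDK's own mechanism: `backoff_delay` and `call_with_retries` are hypothetical names, and the random jitter is an addition to the plain doubling used in the example (it spreads out retries from many clients that were rate limited at the same moment).

```python
import random
import time


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Return a random delay between 0 and min(cap, base * 2**attempt) seconds."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retries(fn, retryable, retries: int = 3, sleep=time.sleep):
    """Call fn(); if it raises one of `retryable`, wait and try again."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return fn()
        except retryable:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller see the error
            sleep(backoff_delay(attempt))


# Offline demonstration with a function that fails twice, then succeeds.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("transient failure")
    return "ok"


print(call_with_retries(flaky, (TimeoutError,), sleep=lambda seconds: None))
```

With the SDK, the call would look like `call_with_retries(lambda: client.messages.create(...), (anthropic.RateLimitError, anthropic.APITimeoutError))`. The SDK client also has its own automatic retry behaviour (configurable through its `max_retries` option), so check how the two interact before stacking them.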

    Cost Optimization Patterns

    Claude is priced per token. At scale, these patterns matter:

    1. Cache system prompts with Prompt Caching

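    If you use a long system prompt repeatedly, enable caching. Cached tokens are billed at a steep discount when they are read back, which can cut the cost of that part of the input by up to 90% on repeated calls: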

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": "Your very long system prompt here...",
                "cache_control": {"type": "ephemeral"}  # Cache this block
            }
        ],
        messages=[{"role": "user", "content": user_message}]
    )
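A quick calculation shows where a figure like "up to 90%" comes from. The sketch below assumes that writing to the cache costs 1.25x the normal input price and reading from it costs 0.1x; treat those multipliers, the $3 base price, and both function names as assumptions to verify against current pricing. It also assumes every call arrives while the cache entry is still alive.

```python
def input_cost_without_cache(prefix_tokens: int, calls: int, price_per_mtok: float) -> float:
    """Cost in USD of sending the same prompt prefix on every call, uncached."""
    return prefix_tokens * price_per_mtok / 1_000_000 * calls


def input_cost_with_cache(
    prefix_tokens: int,
    calls: int,
    price_per_mtok: float,
    write_multiplier: float = 1.25,  # assumed surcharge for the first (cache-writing) call
    read_multiplier: float = 0.10,   # assumed discount rate for later (cache-reading) calls
) -> float:
    """Cost in USD when the first call writes the cache and the rest read it."""
    if calls < 1:
        raise ValueError("calls must be at least 1")
    per_token = price_per_mtok / 1_000_000
    first_call = prefix_tokens * per_token * write_multiplier
    later_calls = prefix_tokens * per_token * read_multiplier * (calls - 1)
    return first_call + later_calls


# Example: a 5,000-token system prompt reused across 100 calls at a
# placeholder price of $3 per million input tokens.
plain = input_cost_without_cache(5_000, 100, 3.00)
cached = input_cost_with_cache(5_000, 100, 3.00)
print(f"without cache: ${plain:.4f}, with cache: ${cached:.4f}")
print(f"savings on the cached prefix: {1 - cached / plain:.1%}")
```

Under these assumptions the saving approaches 90% only as the number of calls grows; a prompt that is cached but reused once or twice saves far less, and a single call costs more than not caching at all.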

    2. Use the right model for the task

    | Task | Recommended Model | Why |
    | --- | --- | --- |
    | Simple classification, extraction | claude-haiku-4-5 | 10x cheaper than Sonnet |
    | General development, writing | claude-sonnet-4-6 | Best price/performance |
    | Complex reasoning, architecture | claude-opus-4-6 | Max capability |

    3. Set realistic max_tokens

    max_tokens is the ceiling, not the target. Setting it to 4096 when you need 200 words wastes nothing, because you're only billed for tokens generated. But a well-calibrated ceiling prevents runaway generations.
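A tight ceiling has a flip side: a reply can be cut off mid-sentence. One way to notice this is to check whether `stop_reason` is `"max_tokens"`. The sketch below uses stand-in objects in place of real responses so it runs offline; `hit_token_ceiling` is a hypothetical helper name.

```python
from types import SimpleNamespace


def hit_token_ceiling(response) -> bool:
    """Return True if the reply stopped because it reached max_tokens."""
    return response.stop_reason == "max_tokens"


# Stand-ins exposing the same attribute as a real response, for illustration only.
complete = SimpleNamespace(stop_reason="end_turn")
cut_off = SimpleNamespace(stop_reason="max_tokens")

print(hit_token_ceiling(complete), hit_token_ceiling(cut_off))
```

When the check returns True, you might raise the ceiling and retry, or tell the user the answer was shortened.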

    Key Takeaways

    • Install with pip install anthropic — the SDK handles auth automatically from your environment variable
    • System prompts are your primary control lever — invest time in writing good ones
    • Manage conversation history yourself — pass the full messages array every time
    • Use streaming for any UI — it transforms the user experience
    • Tool use is the gateway to agents — loop until stop_reason == "end_turn", execute tools in between
    • Handle errors with exponential backoff — especially RateLimitError and APITimeoutError
    • Cache repeated system prompts: saves up to 90% on those tokens at scale


