Claude API with Python: Complete Tutorial with Real-World Examples (2026)
Master the Anthropic Python SDK in one guide. Setup, streaming, tool use, multi-turn conversations, and production patterns — with runnable code examples.
If you've been using Claude through the web interface and want to integrate it into your own applications, Python is the fastest path forward. The Anthropic Python SDK is well-documented, actively maintained, and takes minutes to set up — but most tutorials stop at "hello world."
This guide goes further. You'll learn streaming, multi-turn conversations, tool use (function calling), error handling, and cost-efficient patterns used in production apps. By the end, you'll have everything you need to build real Claude-powered features.
What You'll Need
Before writing any code, set your API key as an environment variable (never hardcode it):
```bash
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
```
Or, if you're using a .env file with python-dotenv:
```bash
pip install python-dotenv anthropic
```
```python
from dotenv import load_dotenv

load_dotenv()
```
Setting Up the Anthropic Python SDK
Install the official SDK:
```bash
pip install anthropic
```
That's it. No heavy dependencies, no complex configuration. The SDK handles authentication automatically by reading ANTHROPIC_API_KEY from your environment.
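Because the key comes from the environment, a missing or empty variable is easy to overlook until a request fails. A minimal sketch of a fail-fast check you could run at startup; `require_api_key` is a hypothetical helper name used for illustration and is not part of the SDK:

```python
import os


def require_api_key(var_name: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for illustration; not part of the anthropic SDK.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell or add it to your .env file."
        )
    return key
```

Calling `require_api_key()` once before creating the client turns a confusing runtime failure into an explicit configuration error.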
Your First API Call
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain what an API is in two sentences."}
    ]
)

print(message.content[0].text)
```
```
An API (Application Programming Interface) is a set of rules and protocols
that allows different software applications to communicate with each other.
It acts as a contract between two systems, defining how requests should be
made and what kind of responses to expect.
```
Understanding the Response Object
The message object contains more than just the text:
```python
print(message.model)                # "claude-sonnet-4-6"
print(message.stop_reason)          # "end_turn"
print(message.usage.input_tokens)   # tokens you sent
print(message.usage.output_tokens)  # tokens in the response
```
Track usage carefully: it's how you calculate your API costs.
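To turn those usage counts into a dollar figure, multiply each count by its per-token price. The sketch below assumes prices are quoted per million tokens; `Pricing` and `estimate_cost` are hypothetical names, and the prices in the example are placeholders rather than real rates:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD (values are supplied by you)."""
    input_per_mtok: float
    output_per_mtok: float


def estimate_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Estimate the USD cost of one call from its usage counts."""
    total = (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
    )
    return total / 1_000_000


# Placeholder prices for illustration only; look up the current rates
# for the model you actually use.
example_pricing = Pricing(input_per_mtok=3.0, output_per_mtok=15.0)
cost = estimate_cost(input_tokens=1_200, output_tokens=350, pricing=example_pricing)
print(f"Estimated cost: ${cost:.6f}")
```

In a real call you would pass `message.usage.input_tokens` and `message.usage.output_tokens` instead of the literal counts.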
System Prompts: Shaping Claude's Behavior
A system prompt is the most powerful tool for controlling how Claude responds. It sets context, persona, and constraints that persist across the entire conversation.
```python
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    # Adjacent string literals are joined, which avoids a backslash continuation
    system=(
        "You are a senior Python engineer reviewing code for production readiness. "
        "Be direct, specific, and prioritize security and performance issues first."
    ),
    messages=[
        {"role": "user", "content": "Review this: x = input('Enter password: ')"}
    ]
)

print(message.content[0].text)
```
Good system prompts are:
- Specific about role and expertise level — not just "you are a helpful assistant"
- Clear about output format — "respond in bullet points", "use markdown headers"
- Bounded in scope — tell Claude what to focus on and what to ignore
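One way to keep those three properties visible is to assemble the prompt from named parts. This is only a sketch: `build_system_prompt` and its parameters are hypothetical, and the exact wording it produces is a starting point rather than a recommended template:

```python
from typing import Sequence


def build_system_prompt(
    role: str,
    output_format: str,
    focus: Sequence[str] = (),
    ignore: Sequence[str] = (),
) -> str:
    """Assemble a system prompt from a role, an output format, and a scope."""
    lines = [f"You are {role}.", f"Output format: {output_format}."]
    if focus:
        lines.append("Focus on: " + "; ".join(focus) + ".")
    if ignore:
        lines.append("Ignore: " + "; ".join(ignore) + ".")
    return "\n".join(lines)


prompt = build_system_prompt(
    role="a senior Python engineer reviewing code for production readiness",
    output_format="bullet points, most severe issue first",
    focus=["security", "performance"],
    ignore=["naming style", "formatting"],
)
print(prompt)
```

The returned string can be passed as the `system` argument shown in the example above.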
Multi-Turn Conversations
Unlike single-shot queries, real applications need conversational memory. You manage this yourself by building the messages array:
```python
import anthropic

client = anthropic.Anthropic()


def chat(conversation_history, user_message):
    """Send a message and get a response, maintaining conversation history."""
    conversation_history.append({
        "role": "user",
        "content": user_message
    })

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="You are a Python tutor helping beginners learn programming.",
        messages=conversation_history
    )

    assistant_message = response.content[0].text
    conversation_history.append({
        "role": "assistant",
        "content": assistant_message
    })
    return assistant_message, conversation_history


# Usage
history = []
reply, history = chat(history, "What is a list in Python?")
print(reply)

reply, history = chat(history, "How is it different from a tuple?")
print(reply)  # Claude remembers the previous context
```
Streaming Responses
For any user-facing app, streaming is essential. It makes responses feel instant rather than waiting for the full reply to generate.
```python
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a Python function to parse CSV files with error handling."}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    # Get final message after streaming completes
    final_message = stream.get_final_message()

print(f"\n\nTokens used: {final_message.usage.input_tokens} in / {final_message.usage.output_tokens} out")
```
Async Streaming (FastAPI / async apps)
If you're building a web API, use the async client:
```python
import asyncio
import anthropic


async def stream_response(user_prompt: str):
    client = anthropic.AsyncAnthropic()
    async with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": user_prompt}]
    ) as stream:
        async for text in stream.text_stream:
            yield text  # yield to your FastAPI streaming response


# In a FastAPI endpoint:
# from fastapi.responses import StreamingResponse
# return StreamingResponse(stream_response(prompt), media_type="text/plain")
```
Tool Use (Function Calling)
Tool use lets Claude call functions you define — the foundation for building agents, data pipelines, and automated workflows.
Here's a practical example: a weather assistant that can call a weather API.
```python
import anthropic
import json

client = anthropic.Anthropic()

# Define the tools Claude can use
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city. Returns temperature, conditions, and humidity.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city name, e.g. 'San Francisco'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature units"
                }
            },
            "required": ["city"]
        }
    }
]


def get_weather(city: str, units: str = "celsius") -> dict:
    """Simulate a weather API call."""
    # In production, call a real weather API here
    return {
        "city": city,
        "temperature": 22,
        "units": units,
        "conditions": "Partly cloudy",
        "humidity": 65
    }


def run_weather_agent(user_message: str):
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )

        # If Claude wants to use a tool
        if response.stop_reason == "tool_use":
            # Add Claude's response to history
            messages.append({"role": "assistant", "content": response.content})

            # Process each tool call
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    # Execute the function
                    result = get_weather(**block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": json.dumps(result)
                    })

            # Add tool results to history
            messages.append({"role": "user", "content": tool_results})

        # Claude has finished
        elif response.stop_reason == "end_turn":
            return response.content[0].text

        else:
            # Any other stop reason (for example "max_tokens") ends the loop,
            # and the function returns None
            break


# Usage
answer = run_weather_agent("What's the weather like in Tokyo and should I bring an umbrella?")
print(answer)
```
This pattern (loop until end_turn, execute tools whenever stop_reason == "tool_use") is the core loop of a tool-using Claude agent.
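The weather example has a single tool, so it calls `get_weather` directly. With several tools, one common approach is a lookup table keyed by tool name. The sketch below is an assumption about how you might structure that, not part of the SDK: `TOOL_HANDLERS` and `execute_tool` are hypothetical names, and the stand-in `get_weather` only mirrors the simulated function above. It also reports failures back to the model through the tool result's `is_error` field rather than crashing the loop:

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str, units: str = "celsius") -> dict:
    """Stand-in for the simulated weather lookup used in the tutorial."""
    return {"city": city, "temperature": 22, "units": units}


# Hypothetical registry mapping tool names to Python callables.
TOOL_HANDLERS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def execute_tool(name: str, tool_input: dict, tool_use_id: str) -> dict:
    """Run one tool call and wrap the outcome as a tool_result block."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"Unknown tool: {name}",
            "is_error": True,
        }
    try:
        result = handler(**tool_input)
    except Exception as exc:  # report the failure instead of aborting the loop
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": json.dumps(result),
    }
```

Inside the agent loop, `tool_results.append(execute_tool(block.name, block.input, block.id))` would then replace the direct `get_weather(**block.input)` call.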
Real-World Project: Document Summarizer
Let's build something practical: a script that reads a text file and generates a structured summary with key points, action items, and a TL;DR.
```python
import anthropic
from pathlib import Path

client = anthropic.Anthropic()

SUMMARIZER_SYSTEM = """You are a document analyst. When given a document, respond with:
## TL;DR
[2-3 sentence summary]
## Key Points
- [Point 1]
- [Point 2]
- [Point 3]
## Action Items
- [Actionable item if any, otherwise "None identified"]
## Sentiment
[Positive/Neutral/Negative and why in one sentence]
Always use this exact structure. Be concise."""


def summarize_document(file_path: str) -> dict:
    """Summarize a text document using Claude."""
    path = Path(file_path)
    if not path.exists():
        raise FileNotFoundError(f"File not found: {file_path}")

    content = path.read_text(encoding="utf-8")

    # Trim if document is too long (rough token estimate: 1 token ≈ 4 chars)
    max_chars = 180_000  # ~45K tokens, safe for claude-sonnet-4-6
    if len(content) > max_chars:
        content = content[:max_chars] + "\n\n[Document truncated for length]"

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=SUMMARIZER_SYSTEM,
        messages=[
            {
                "role": "user",
                "content": f"Please summarize this document:\n\n{content}"
            }
        ]
    )

    return {
        "summary": response.content[0].text,
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens,
        "file": path.name
    }


# Usage
result = summarize_document("meeting_notes.txt")
print(result["summary"])
print(f"\nTokens: {result['input_tokens']} in / {result['output_tokens']} out")
```
Error Handling in Production
The API can fail. Your application needs to handle it gracefully.
```python
import anthropic
import time

client = anthropic.Anthropic()


def resilient_completion(prompt: str, retries: int = 3) -> str:
    """API call with retry logic for rate limits and transient errors."""
    for attempt in range(retries):
        try:
            response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=512,
                messages=[{"role": "user", "content": prompt}]
            )
            return response.content[0].text

        except anthropic.RateLimitError:
            if attempt < retries - 1:
                # Exponential backoff: 1s, then 2s, doubling on each further attempt
                wait_time = 2 ** attempt
                print(f"Rate limited. Waiting {wait_time}s before retry {attempt + 1}...")
                time.sleep(wait_time)
            else:
                raise

        except anthropic.APITimeoutError:
            if attempt < retries - 1:
                print(f"Timeout on attempt {attempt + 1}. Retrying...")
                time.sleep(1)
            else:
                raise

        except anthropic.AuthenticationError:
            raise  # Don't retry auth errors: the key is wrong

        except anthropic.BadRequestError as e:
            print(f"Bad request: {e}")
            raise  # Don't retry bad requests: fix the input
```
| Error | Cause | Fix |
|---|---|---|
| AuthenticationError | Invalid API key | Check ANTHROPIC_API_KEY env var |
| RateLimitError | Too many requests | Exponential backoff + retry |
| APITimeoutError | Request took too long | Lower max_tokens, retry |
| BadRequestError | Invalid message format | Check message structure |
| OverloadedError | Anthropic servers busy | Retry with backoff |
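The retry logic above can be factored into a reusable wrapper so that each call site does not repeat it. A minimal sketch, assuming a hypothetical `retry_with_backoff` helper; it adds random jitter so that many clients hitting a rate limit at once do not all retry at the same moment, and it takes the sleep function as a parameter so it can be tested without waiting:

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...],
    retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call() up to `retries` times, backing off between failed attempts."""
    for attempt in range(retries):
        try:
            return call()
        except retryable:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            # Exponential ceiling (base, 2x base, 4x base, ...) capped at
            # max_delay, with the actual wait drawn uniformly below it.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise ValueError("retries must be at least 1")
```

With the SDK, usage might look like `retry_with_backoff(lambda: client.messages.create(...), (anthropic.RateLimitError, anthropic.APITimeoutError))`. The SDK client also performs its own automatic retries, configurable through its `max_retries` option, so it is worth checking that the two layers together do not retry more often than you intend.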
Cost Optimization Patterns
Claude is priced per token. At scale, these patterns matter:
1. Cache system prompts with Prompt Caching
If you reuse a long system prompt across calls, enable caching. Cached tokens are billed at a reduced rate when they are read back, which can cut the cost of that repeated prefix by up to 90%:
```python
# Assumes `client` and `user_message` are defined as in the earlier examples
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "Your very long system prompt here...",
            "cache_control": {"type": "ephemeral"}  # Cache this block
        }
    ],
    messages=[{"role": "user", "content": user_message}]
)
```
2. Match the model to the task
| Task | Recommended Model | Why |
|---|---|---|
| Simple classification, extraction | claude-haiku-4-5 | Lowest per-token price of the three |
| General development, writing | claude-sonnet-4-6 | Best price/performance |
| Complex reasoning, architecture | claude-opus-4-6 | Max capability |
3. Set max_tokens deliberately
max_tokens is the ceiling, not the target. Setting it to 4096 when you need 200 words wastes nothing, because you're only billed for tokens generated. But a well-calibrated ceiling prevents runaway generations.
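Returning to pattern 1, a little arithmetic shows where the caching savings come from. The sketch below is a rough model under stated assumptions: `cached_prompt_cost` is a hypothetical helper, the two multipliers (cache writes billed slightly above the normal input rate, cache reads at a small fraction of it) are parameters you should confirm against current pricing, and it assumes every call after the first arrives before the cache entry expires:

```python
from typing import Tuple


def cached_prompt_cost(
    prompt_tokens: int,
    calls: int,
    input_price_per_mtok: float,
    cache_write_multiplier: float = 1.25,  # assumption: confirm against current pricing
    cache_read_multiplier: float = 0.10,   # assumption: confirm against current pricing
) -> Tuple[float, float]:
    """Return (uncached_cost, cached_cost) in USD for a prompt sent `calls` times.

    Models the first call as a cache write and every later call as a cache hit.
    """
    if calls <= 0:
        return 0.0, 0.0
    per_token = input_price_per_mtok / 1_000_000
    uncached = prompt_tokens * calls * per_token
    cached = prompt_tokens * per_token * (
        cache_write_multiplier + cache_read_multiplier * (calls - 1)
    )
    return uncached, cached


# Placeholder price for illustration only.
uncached, cached = cached_prompt_cost(
    prompt_tokens=10_000, calls=100, input_price_per_mtok=3.0
)
print(f"Without caching: ${uncached:.4f}, with caching: ${cached:.4f}")
```

Under these assumptions the saving approaches the 90% figure only as the number of cache hits grows; a prompt that is reused once or twice saves far less, because the first call pays the write premium.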
Key Takeaways
- Install with pip install anthropic; the SDK handles auth automatically from your environment variable
- System prompts are your primary control lever, so invest time in writing good ones
- Manage conversation history yourself: pass the full messages array every time
- Use streaming for any UI, because responses appear as they are generated instead of after the full reply
- Tool use is the gateway to agents: loop until stop_reason == "end_turn", executing tools in between
- Handle errors with exponential backoff, especially RateLimitError and APITimeoutError
- Cache repeated system prompts to save up to 90% on those tokens at scale
Next Steps
If you want to go deeper on the Claude API, check out these resources on AI for Anything:
- Claude API Prompt Caching Guide — cut costs on repeated context
- Claude Tool Use & Function Calling Tutorial — build production agents
- How to Build a Chatbot with Claude API — full chatbot walkthrough
Ready to validate your Claude knowledge? Take our Claude Certified Architect (CCA) practice exam — 200+ questions covering API patterns, agent architecture, and production best practices. The first 20 questions are free.