Claude Batch API Tutorial: Process Thousands of Requests at 50% Cost
Learn how to use Claude's Message Batches API to process thousands of AI requests asynchronously at half the price. Includes Python and JavaScript code examples.
If you're running Claude at scale — classifying thousands of support tickets, enriching product catalogs, or generating summaries for a data pipeline — you've probably felt the pinch of API costs and rate limits. Sending requests one at a time isn't just expensive; it's slow.
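That's exactly what the Claude Message Batches API was built for. It lets you submit up to 100,000 requests (or 256 MB of request data, whichever limit you hit first) in a single batch, have them processed asynchronously, and pay 50% less than standard API pricing. One submission replaces thousands of individual calls, and batches draw on their own rate limits instead of your standard Messages API limits.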
This tutorial walks you through everything: how batches work, when to use them, and production-ready code in Python and JavaScript.
What Is the Claude Message Batches API?
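The Message Batches API is Anthropic's asynchronous processing endpoint. Instead of making thousands of individual POST /v1/messages calls, you:
- Submit all of your requests in a single batch, each tagged with a unique custom_id
- Let Anthropic process them in the background (within 24 hours)
- Retrieve the results once the batch has ended and match them back to your inputs by custom_id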
The trade-off is latency: you won't get a result in milliseconds. But if you're running nightly ETL pipelines, weekly content enrichment jobs, or offline document processing, that trade-off is irrelevant — and the 50% cost reduction is very relevant.
Batch API vs. Synchronous API — At a Glance
| Feature | Synchronous API | Batch API |
|---|---|---|
| Response time | Milliseconds | Up to 24 hours |
| Cost | Standard pricing | 50% discount |
| Max requests per call | 1 | 100,000 (or 256 MB) |
| Use case | Real-time features | Offline / bulk jobs |
| Rate limits | Standard Messages API limits | Separate Batch API limits |
| Models supported | All Claude models | All Claude models |
When to Use the Batch API
The Batch API is ideal when:
- You don't need results immediately — nightly jobs, weekly reports, async enrichment
- Volume is high — more than ~100 requests per run
- Costs matter — 50% savings adds up fast at scale
- Rate limits are a bottleneck: batches have their own rate limits, separate from the standard Messages API limits
Common use cases:
- Document classification — categorize 50,000 support tickets by topic
- Bulk summarization — generate summaries for every article in a CMS
- Data enrichment — extract structured fields from unstructured text at scale
- Content generation — produce product descriptions for an entire catalog
- Sentiment analysis — score thousands of customer reviews
If you need sub-second responses (chatbots, code completion, real-time Q&A), stick with the synchronous API.
Setting Up: Prerequisites
You'll need:
- An Anthropic API key (get one at console.anthropic.com)
- Python 3.9+ or Node.js 18+
- The Anthropic SDK
```bash
# Python
pip install anthropic

# JavaScript / Node.js
npm install @anthropic-ai/sdk
```

Set your API key as an environment variable:

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
```

Step 1: Create a Batch (Python)
Here's a Python example that submits a batch of three support-ticket classification requests. The following steps poll for completion and retrieve the results.
```python
import anthropic
import time  # used in Step 2
import json  # used in Step 3

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Define your batch of requests.
# Each request needs a unique custom_id for matching results later.
requests = [
    {
        "custom_id": "ticket-001",
        "params": {
            "model": "claude-sonnet-4-6",
            "max_tokens": 256,
            "messages": [
                {
                    "role": "user",
                    "content": "Classify this support ticket into one category (Billing, Technical, Account, Other): 'I can't log into my account after resetting my password.'"
                }
            ]
        }
    },
    {
        "custom_id": "ticket-002",
        "params": {
            "model": "claude-sonnet-4-6",
            "max_tokens": 256,
            "messages": [
                {
                    "role": "user",
                    "content": "Classify this support ticket into one category (Billing, Technical, Account, Other): 'I was charged twice for my subscription this month.'"
                }
            ]
        }
    },
    {
        "custom_id": "ticket-003",
        "params": {
            "model": "claude-sonnet-4-6",
            "max_tokens": 256,
            "messages": [
                {
                    "role": "user",
                    "content": "Classify this support ticket into one category (Billing, Technical, Account, Other): 'The mobile app crashes when I try to upload images.'"
                }
            ]
        }
    },
]

# Submit the batch
print("Submitting batch...")
batch = client.messages.batches.create(requests=requests)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")
```

Running this gives you a batch ID like msgbatch_01AbCdEfGhIjKlMnOpQrStUv. Save it; you'll need it to check status and retrieve results.
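Writing every request dict by hand gets repetitive once you have more than a few items. The sketch below builds the same kind of request list from a plain mapping of IDs to ticket text; using dict keys as custom_ids guarantees they are unique within the batch. `build_classification_requests`, `PROMPT`, and `CUSTOM_ID_PATTERN` are hypothetical helpers of our own, not SDK features, and the ID pattern (1 to 64 letters, digits, hyphens, or underscores) reflects our reading of Anthropic's custom_id rules, so confirm it against the current docs.

```python
import re

# Assumption: custom_ids must be 1-64 chars of letters, digits, hyphens, underscores.
CUSTOM_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{1,64}$")

# Hypothetical prompt template matching the Step 1 examples.
PROMPT = (
    "Classify this support ticket into one category "
    "(Billing, Technical, Account, Other): '{text}'"
)


def build_classification_requests(
    tickets: dict[str, str],
    model: str = "claude-sonnet-4-6",
    max_tokens: int = 256,
) -> list[dict]:
    """Turn {custom_id: ticket_text} into Message Batches request dicts."""
    requests = []
    for custom_id, text in tickets.items():
        # Dict keys are already unique, so only the ID format needs checking.
        if not CUSTOM_ID_PATTERN.fullmatch(custom_id):
            raise ValueError(f"invalid custom_id: {custom_id!r}")
        requests.append({
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [
                    {"role": "user", "content": PROMPT.format(text=text)}
                ],
            },
        })
    return requests


if __name__ == "__main__":
    tickets = {
        "ticket-001": "I can't log into my account after resetting my password.",
        "ticket-002": "I was charged twice for my subscription this month.",
    }
    reqs = build_classification_requests(tickets)
    print(len(reqs), reqs[0]["custom_id"])
```

The returned list can be passed straight to `client.messages.batches.create(requests=...)`. Validating IDs before submission means a malformed ID fails fast locally instead of surfacing later as a rejected batch.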
Step 2: Poll for Completion
Batches typically complete in minutes for small jobs, hours for large ones (up to the 24-hour maximum). Here's a polling loop:
```python
def wait_for_batch(client, batch_id: str, poll_interval: int = 60):
    """Poll until the batch ends. Returns the finished MessageBatch object."""
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        status = batch.processing_status
        counts = batch.request_counts
        print(f"Status: {status} | "
              f"Succeeded: {counts.succeeded} | "
              f"Processing: {counts.processing} | "
              f"Errored: {counts.errored}")
        if status == "ended":
            return batch
        time.sleep(poll_interval)


# Wait for our batch to finish
finished_batch = wait_for_batch(client, batch.id, poll_interval=30)
print(f"\nBatch complete! Total succeeded: {finished_batch.request_counts.succeeded}")
```

The processing_status field takes one of three values:
- "in_progress": requests are still being processed
- "canceling": a cancellation has been requested and is in progress
- "ended": every request has finished processing (check individual results for errors). A canceled batch also ends up as "ended", with its unprocessed requests counted as canceled.
Step 3: Retrieve Results
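Once the batch's processing_status is "ended", stream the results. Anthropic serves them as a JSONL file (one JSON object per line), which the SDK parses into result objects for you. Each result's type is "succeeded", "errored", "canceled", or "expired", and results may come back in a different order from your requests, so always match them by custom_id: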
```python
def get_batch_results(client, batch_id: str) -> list[dict]:
    """Stream batch results and parse them into plain dicts."""
    results = []
    for result in client.messages.batches.results(batch_id):
        outcome = result.result
        record = {
            "custom_id": result.custom_id,
            "type": outcome.type,  # "succeeded", "errored", "canceled", or "expired"
        }
        if outcome.type == "succeeded":
            # Extract the text response and token usage
            record["response"] = outcome.message.content[0].text
            record["input_tokens"] = outcome.message.usage.input_tokens
            record["output_tokens"] = outcome.message.usage.output_tokens
        elif outcome.type == "errored":
            # The error details sit inside an error envelope: outcome.error.error
            record["error"] = outcome.error.error.type
            record["error_message"] = outcome.error.error.message
        # "canceled" and "expired" results carry no message or error payload
        results.append(record)
    return results


results = get_batch_results(client, batch.id)

# Save to file for downstream processing
with open("batch_results.jsonl", "w") as f:
    for record in results:
        f.write(json.dumps(record) + "\n")

# Print summary
for r in results:
    if r["type"] == "succeeded":
        print(f"{r['custom_id']}: {r['response']}")
    else:
        detail = r.get("error_message", "no response")
        print(f"{r['custom_id']}: {r['type'].upper()} ({detail})")
```

Step 4: Full JavaScript Example
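Here's the same workflow in Node.js. It uses ES module imports, so save it as an .mjs file or set "type": "module" in your package.json: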
```javascript
import Anthropic from "@anthropic-ai/sdk";
import fs from "fs";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function runBatchJob() {
  // 1. Create the batch
  const batch = await client.messages.batches.create({
    requests: [
      {
        custom_id: "doc-001",
        params: {
          model: "claude-sonnet-4-6",
          max_tokens: 512,
          messages: [
            {
              role: "user",
              content:
                "Summarize this in 2 sentences: Artificial intelligence is transforming industries by automating repetitive tasks, enabling faster decision-making, and uncovering patterns in large datasets that humans would miss.",
            },
          ],
        },
      },
      {
        custom_id: "doc-002",
        params: {
          model: "claude-sonnet-4-6",
          max_tokens: 512,
          messages: [
            {
              role: "user",
              content:
                "Summarize this in 2 sentences: Machine learning models require large amounts of high-quality training data. Data quality directly impacts model accuracy and generalization ability.",
            },
          ],
        },
      },
    ],
  });
  console.log(`Batch submitted: ${batch.id}`);

  // 2. Poll for completion
  let finished = false;
  while (!finished) {
    await new Promise((r) => setTimeout(r, 30000)); // wait 30 seconds
    const status = await client.messages.batches.retrieve(batch.id);
    console.log(`Status: ${status.processing_status}`);
    if (status.processing_status === "ended") {
      finished = true;
    }
  }

  // 3. Stream results, keeping successes and logging everything else
  const results = [];
  for await (const result of await client.messages.batches.results(batch.id)) {
    if (result.result.type === "succeeded") {
      results.push({
        id: result.custom_id,
        text: result.result.message.content[0].text,
      });
    } else {
      console.warn(`${result.custom_id}: ${result.result.type}`);
    }
  }

  // 4. Save results
  fs.writeFileSync("results.json", JSON.stringify(results, null, 2));
  console.log("Done! Results saved to results.json");
  results.forEach((r) => console.log(`${r.id}: ${r.text}`));
}

runBatchJob().catch(console.error);
```

Processing Large Datasets Across Multiple Batches
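For production jobs with more requests than a single batch can hold (100,000 requests or 256 MB), split your data into chunks and track the resulting batch IDs. The example below uses 10,000-request chunks, which keeps each batch comfortably under both limits: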
```python
import math


def submit_large_dataset(client, items: list, prompt_template: str, model: str = "claude-sonnet-4-6"):
    """Split a dataset into BATCH_SIZE-item chunks and submit one batch per chunk."""
    BATCH_SIZE = 10_000
    batch_ids = []
    total_batches = math.ceil(len(items) / BATCH_SIZE)
    print(f"Submitting {len(items)} items across {total_batches} batches...")
    for batch_num, start in enumerate(range(0, len(items), BATCH_SIZE)):
        chunk = items[start:start + BATCH_SIZE]
        requests = [
            {
                # Use the global index so custom_ids stay unique across all batches
                "custom_id": f"item-{start + j}",
                "params": {
                    "model": model,
                    "max_tokens": 256,
                    "messages": [
                        {
                            "role": "user",
                            "content": prompt_template.format(text=item)
                        }
                    ]
                }
            }
            for j, item in enumerate(chunk)
        ]
        batch = client.messages.batches.create(requests=requests)
        batch_ids.append(batch.id)
        print(f"Batch {batch_num + 1}/{total_batches} submitted: {batch.id}")
    return batch_ids


# Example usage: 25,000 items are split into 3 batches
items = [f"Support ticket text number {i}" for i in range(25_000)]
prompt = "Classify into (Billing/Technical/Account/Other): {text}"
batch_ids = submit_large_dataset(client, items, prompt)
```

Cost Calculation: How Much Can You Save?
The Batch API charges 50% of standard API prices. Here's what that means at scale, assuming Sonnet's standard rates of $3 per million input tokens and $15 per million output tokens, and an average of 500 input / 200 output tokens per request:
| Volume | Standard Cost (Sonnet) | Batch Cost | Savings |
|---|---|---|---|
| 10,000 requests | $45 | $22.50 | $22.50 |
| 100,000 requests | $450 | $225 | $225 |
| 1,000,000 requests | $4,500 | $2,250 | $2,250 |
For a data enrichment pipeline processing 500K records per month with the same token profile, that works out to roughly $1,125/month saved versus the synchronous API.
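The cost arithmetic is easy to reproduce for your own workload. This sketch hard-codes Sonnet's standard list prices ($3 per million input tokens, $15 per million output tokens) as an assumption; check Anthropic's pricing page for the model you actually use, and swap in your own average token counts.

```python
# Assumed standard Sonnet list prices in USD per million tokens; verify current pricing.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00
BATCH_DISCOUNT = 0.50  # the Batch API bills at 50% of standard prices


def estimate_costs(num_requests: int, avg_input_tokens: int, avg_output_tokens: int) -> dict[str, float]:
    """Return standard cost, batch cost, and savings in USD."""
    input_cost = num_requests * avg_input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = num_requests * avg_output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    standard = input_cost + output_cost
    batch = standard * (1 - BATCH_DISCOUNT)
    return {
        "standard": round(standard, 2),
        "batch": round(batch, 2),
        "savings": round(standard - batch, 2),
    }


for volume in (10_000, 100_000, 1_000_000):
    print(volume, estimate_costs(volume, avg_input_tokens=500, avg_output_tokens=200))
# First line printed: 10000 {'standard': 45.0, 'batch': 22.5, 'savings': 22.5}
```

Output tokens dominate the bill at these prices (200 output tokens cost twice as much as 500 input tokens), so trimming verbose responses often saves more than shortening prompts.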
Error Handling and Retries
Individual requests within a batch can fail independently. Always check result.result.type:
```python
def process_results_with_retry(client, batch_id: str) -> tuple[list, list]:
    """Separate succeeded results from those that may need a retry."""
    succeeded = []
    failed = []
    for result in client.messages.batches.results(batch_id):
        outcome = result.result
        if outcome.type == "succeeded":
            succeeded.append({
                "id": result.custom_id,
                "content": outcome.message.content[0].text,
            })
        else:
            # Errored results carry an error envelope; canceled and expired
            # results don't, so record the result type itself for those.
            if outcome.type == "errored":
                error_type = outcome.error.error.type
            else:
                error_type = outcome.type
            failed.append({
                "id": result.custom_id,
                "error_type": error_type,
            })
    return succeeded, failed


succeeded, failed = process_results_with_retry(client, batch.id)
print(f"Success: {len(succeeded)} | Failed: {len(failed)}")

# Resubmit failed items if needed
if failed:
    print(f"Retrying {len(failed)} failed requests...")
    # Re-create requests from the failed custom_ids and submit a new batch
```

Common error types to handle:
- "overloaded_error": retry in a new batch
- "invalid_request_error": check your request schema; don't retry as-is
- "api_error": transient server error; retrying is safe
- "expired" and "canceled" results are not errors: the request never ran, so you can resubmit it unchanged
Best Practices for Production
1. Use meaningful custom IDs
Use IDs that map to your database (e.g., "ticket-{db_id}" or "article-{slug}"). This makes joining results back to your records trivial.
2. Use a system prompt for consistent output
You can include a system field in each request's params. Use it to enforce an output format (JSON, a fixed set of categories) so parsing is consistent across all results.
3. Set max_tokens conservatively
You're billed for the output tokens actually generated, not for the max_tokens ceiling, but a tight cap keeps verbose or runaway responses from inflating costs. If you only need short classifications, set max_tokens: 128; for summaries, max_tokens: 512. Don't leave it at 4096 if you don't need it.
4. Store raw results
Save the raw JSONL output before parsing. If your parsing logic has a bug, you can re-parse without re-running the (expensive) batch. Results are only downloadable for 29 days after the batch is created, so don't count on fetching them again later.
5. Monitor with request counts
The batch object's request_counts field shows processing, succeeded, errored, canceled, and expired counts while the batch runs. Log it on every poll to track job health.
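Because results can arrive in a different order from your requests, joining on custom_id is the reliable way to attach answers to your records. This sketch works on plain-dict result records shaped like those produced by `get_batch_results` in Step 3 (keys `custom_id`, `type`, and `response` for successes). `join_results` and the `"ticket-{id}"` prefix scheme are illustrative assumptions; adapt them to your own ID format.

```python
def join_results(records: list[dict], results: list[dict], id_prefix: str = "ticket-") -> list[dict]:
    """Attach each batch result to its source record via custom_id.

    records: your rows, each with an "id" (e.g. a database key).
    results: dicts with "custom_id", "type", and (on success) "response".
    Rows without a successful result get response=None plus their status,
    or status "missing" if no result came back at all.
    """
    by_custom_id = {r["custom_id"]: r for r in results}
    joined = []
    for row in records:
        result = by_custom_id.get(f"{id_prefix}{row['id']}")
        status = result["type"] if result else "missing"
        response = result.get("response") if result and status == "succeeded" else None
        joined.append({**row, "status": status, "response": response})
    return joined


rows = [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}]
res = [
    {"custom_id": "ticket-2", "type": "errored", "error": "api_error"},
    {"custom_id": "ticket-1", "type": "succeeded", "response": "Billing"},
]
for item in join_results(rows, res):
    print(item)
```

Keeping a "missing" status, rather than silently dropping rows, makes it obvious when a record never made it into any batch.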
Key Takeaways
- The Claude Batch API processes up to 100,000 requests (or 256 MB) per batch asynchronously; any request not processed within 24 hours expires
- You pay 50% less than with the synchronous API, which adds up quickly at scale
- Each request gets a custom_id for result matching; results are delivered as JSONL and may not be in request order
- Best for nightly pipelines, bulk document processing, and offline enrichment, not for real-time features
- Handle non-succeeded results by checking the result type (and, for errored results, the error type) and resubmitting retryable requests as needed
Next Steps
Ready to use Claude at scale? The Batch API is one of the most valuable, and most underused, features in Claude's API. Master it and you can build data pipelines that would cost twice as much on the synchronous endpoint.
If you're preparing for the Claude Certified Architect (CCA-F) exam, batch processing architecture is a tested topic. Our CCA practice test bank includes questions on API optimization, cost management, and agentic system design.
Want to go deeper on Claude API fundamentals? Start with our Claude API beginner's guide or jump into prompt caching — another technique that cuts costs on repeated context.
The Anthropic official Batch API documentation has the full API reference if you need endpoint specs.