Claude Batch API Tutorial: Process Thousands of Requests at 50% Cost

If you're running Claude at scale — classifying thousands of support tickets, enriching product catalogs, or generating summaries for a data pipeline — you've probably felt the pinch of API costs and rate limits. Sending requests one at a time isn't just expensive; it's slow.

That's exactly what the Claude Message Batches API was built for. It lets you submit up to 10,000 requests in a single batch, process them asynchronously, and pay 50% less than standard API pricing. You submit once and check back for the results, instead of juggling rate limits across thousands of individual calls.

This tutorial walks you through everything: how batches work, when to use them, and production-ready code in Python and JavaScript.

What Is the Claude Message Batches API?

The Message Batches API is Anthropic's asynchronous processing endpoint. Instead of making thousands of individual POST /v1/messages calls, you:

Bundle up to 10,000 requests into a single batch payload

Submit the batch — Anthropic queues it and processes it within 24 hours (usually much faster)

Poll the batch until its status shows that processing has ended

Download the results as a JSONL file

The trade-off is latency: you won't get a result in milliseconds. But if you're running nightly ETL pipelines, weekly content enrichment jobs, or offline document processing, that trade-off is irrelevant — and the 50% cost reduction is very relevant.

Batch API vs. Synchronous API — At a Glance

Feature	Synchronous API	Batch API
Response time	Milliseconds	Up to 24 hours
Cost	Standard pricing	50% discount
Max requests per call	1	10,000
Use case	Real-time features	Offline / bulk jobs
Rate limit pressure	High	Lower (separate batch limits apply)
Models supported	All Claude models	All Claude models

When to Use the Batch API

The Batch API is ideal when:

You don't need results immediately — nightly jobs, weekly reports, async enrichment
Volume is high — more than ~100 requests per run
Costs matter — 50% savings adds up fast at scale
Rate limits are a bottleneck — batch requests count against separate batch limits rather than your standard Messages API rate limits

Common use cases:

Document classification — categorize 50,000 support tickets by topic
Bulk summarization — generate summaries for every article in a CMS
Data enrichment — extract structured fields from unstructured text at scale
Content generation — produce product descriptions for an entire catalog
Sentiment analysis — score thousands of customer reviews

If you need sub-second responses (chatbots, code completion, real-time Q&A), stick with the synchronous API.

Setting Up: Prerequisites

You'll need:

An Anthropic API key (get one at console.anthropic.com)
Python 3.9+ (the examples use built-in generic type hints such as list[dict]) or Node.js 18+
The Anthropic SDK

# Python
pip install anthropic

# JavaScript / Node.js
npm install @anthropic-ai/sdk

Set your API key as an environment variable:

export ANTHROPIC_API_KEY="sk-ant-..."

Step 1: Create a Batch (Python)

Here's a Python example that submits a batch of 3 requests. Steps 2 and 3 then poll for completion and save the results.

import anthropic
import time
import json

client = anthropic.Anthropic()

# Define your batch of requests
# Each request needs a unique custom_id for matching results later
requests = [
    {
        "custom_id": "ticket-001",
        "params": {
            "model": "claude-sonnet-4-6",
            "max_tokens": 256,
            "messages": [
                {
                    "role": "user",
                    "content": "Classify this support ticket into one category (Billing, Technical, Account, Other): 'I can't log into my account after resetting my password.'"
                }
            ]
        }
    },
    {
        "custom_id": "ticket-002",
        "params": {
            "model": "claude-sonnet-4-6",
            "max_tokens": 256,
            "messages": [
                {
                    "role": "user",
                    "content": "Classify this support ticket into one category (Billing, Technical, Account, Other): 'I was charged twice for my subscription this month.'"
                }
            ]
        }
    },
    {
        "custom_id": "ticket-003",
        "params": {
            "model": "claude-sonnet-4-6",
            "max_tokens": 256,
            "messages": [
                {
                    "role": "user",
                    "content": "Classify this support ticket into one category (Billing, Technical, Account, Other): 'The mobile app crashes when I try to upload images.'"
                }
            ]
        }
    },
]

# Submit the batch
print("Submitting batch...")
batch = client.messages.batches.create(requests=requests)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")

Running this gives you a batch ID like msgbatch_01AbCdEfGhIjKlMnOpQrStUv. Save it — you'll need it to check status and retrieve results.
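
One way to make sure the ID survives a crashed script or a closed terminal is to write it to disk right after submission. The sketch below is a minimal illustration using only the standard library. The file name batch_manifest.json and the helper names record_batch and load_batches are hypothetical and not part of the Anthropic SDK.

```python
import json
import time
from pathlib import Path


def record_batch(manifest_path, batch_id, note=""):
    """Append a batch ID (plus a timestamp and optional note) to a JSON manifest."""
    path = Path(manifest_path)
    entries = json.loads(path.read_text()) if path.exists() else []
    entries.append({"batch_id": batch_id, "submitted_at": time.time(), "note": note})
    path.write_text(json.dumps(entries, indent=2))
    return entries


def load_batches(manifest_path):
    """Return the recorded batch IDs in submission order (empty if no manifest yet)."""
    path = Path(manifest_path)
    if not path.exists():
        return []
    return [entry["batch_id"] for entry in json.loads(path.read_text())]
```

After submitting, a call such as record_batch("batch_manifest.json", batch.id) lets a later run pick the ID up again with load_batches("batch_manifest.json").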

Step 2: Poll for Completion

Batches typically complete in minutes for small jobs, hours for large ones (up to the 24-hour maximum). Here's a polling loop:

def wait_for_batch(client, batch_id: str, poll_interval: int = 60):
    """Poll until batch completes. Returns the finished batch object."""
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        status = batch.processing_status
        
        print(f"Status: {status} | "
              f"Succeeded: {batch.request_counts.succeeded} | "
              f"Processing: {batch.request_counts.processing} | "
              f"Errored: {batch.request_counts.errored}")
        
        if status == "ended":
            return batch
        
        time.sleep(poll_interval)

# Wait for our batch to finish
finished_batch = wait_for_batch(client, batch.id, poll_interval=30)
print(f"\nBatch complete! Total succeeded: {finished_batch.request_counts.succeeded}")
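
The loop above polls at a fixed interval and never gives up. For long-running jobs it can help to cap the total wait and back off between checks. The following is a rough sketch under stated assumptions: get_status is a hypothetical callable you supply (for example, a lambda that calls client.messages.batches.retrieve(batch_id) and returns its processing_status), and the interval, backoff factor, and timeout are illustrative defaults rather than recommended values.

```python
import time


def poll_until_ended(get_status, initial_interval=30.0, max_interval=600.0,
                     backoff=1.5, timeout=24 * 60 * 60, sleep=time.sleep):
    """Call get_status() until it returns "ended", waiting longer between checks.

    Returns the number of status checks performed.
    Raises TimeoutError once the next wait would push total waiting past `timeout` seconds.
    """
    waited = 0.0
    interval = initial_interval
    checks = 0
    while True:
        checks += 1
        if get_status() == "ended":
            return checks
        if waited + interval > timeout:
            raise TimeoutError(f"batch still not ended after {waited:.0f}s")
        sleep(interval)  # injectable so tests can skip real waiting
        waited += interval
        interval = min(interval * backoff, max_interval)
```

Passing the sleep function as a parameter is a design choice that keeps the helper testable without real delays.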

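The processing_status field takes one of three values:

"in_progress" — requests are being processed
"canceling" — a cancellation has been requested and is still in progress
"ended" — processing has finished for every request (check individual results for errors)

There is no separate "canceled" status. A canceled batch also finishes as "ended", and the requests that never ran come back with a "canceled" result type.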

Step 3: Retrieve Results

Once the batch status is "ended", stream the results. Results are returned in JSONL format (one JSON object per line):

def get_batch_results(client, batch_id: str) -> list[dict]:
    """Download and parse batch results."""
    results = []
    
    for result in client.messages.batches.results(batch_id):
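        record = {
            "custom_id": result.custom_id,
            "type": result.result.type,  # "succeeded", "errored", "canceled", or "expired"
        }
        
        if result.result.type == "succeeded":
            # Extract the text response
            record["response"] = result.result.message.content[0].text
            record["input_tokens"] = result.result.message.usage.input_tokens
            record["output_tokens"] = result.result.message.usage.output_tokens
        elif result.result.type == "errored":
            # The specific error type and message may be nested one level down
            # (error.error), so fall back gracefully if they are not.
            error = getattr(result.result.error, "error", result.result.error)
            record["error"] = error.type
            record["error_message"] = getattr(error, "message", "")
        else:
            # "canceled" and "expired" results carry no message or error payload
            record["error"] = result.result.type
            record["error_message"] = f"request {result.result.type}"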
        
        results.append(record)
    
    return results

results = get_batch_results(client, batch.id)

# Save to file for downstream processing
with open("batch_results.jsonl", "w") as f:
    for result in results:
        f.write(json.dumps(result) + "\n")

# Print summary
for r in results:
    if r["type"] == "succeeded":
        print(f"{r['custom_id']}: {r['response']}")
    else:
        print(f"{r['custom_id']}: ERROR — {r['error_message']}")

Step 4: Full JavaScript Example

Here's the same workflow in Node.js:

import Anthropic from "@anthropic-ai/sdk";
import fs from "fs";

const client = new Anthropic();

async function runBatchJob() {
  // 1. Create the batch
  const batch = await client.messages.batches.create({
    requests: [
      {
        custom_id: "doc-001",
        params: {
          model: "claude-sonnet-4-6",
          max_tokens: 512,
          messages: [
            {
              role: "user",
              content:
                "Summarize this in 2 sentences: Artificial intelligence is transforming industries by automating repetitive tasks, enabling faster decision-making, and uncovering patterns in large datasets that humans would miss.",
            },
          ],
        },
      },
      {
        custom_id: "doc-002",
        params: {
          model: "claude-sonnet-4-6",
          max_tokens: 512,
          messages: [
            {
              role: "user",
              content:
                "Summarize this in 2 sentences: Machine learning models require large amounts of high-quality training data. Data quality directly impacts model accuracy and generalization ability.",
            },
          ],
        },
      },
    ],
  });

  console.log(`Batch submitted: ${batch.id}`);

  // 2. Poll for completion
  let finished = false;
  while (!finished) {
    await new Promise((r) => setTimeout(r, 30000)); // wait 30 seconds
    const status = await client.messages.batches.retrieve(batch.id);
    console.log(`Status: ${status.processing_status}`);
    if (status.processing_status === "ended") {
      finished = true;
    }
  }

  // 3. Stream results
  const results = [];
  for await (const result of await client.messages.batches.results(batch.id)) {
    if (result.result.type === "succeeded") {
      results.push({
        id: result.custom_id,
        text: result.result.message.content[0].text,
      });
    }
  }

  // 4. Save results
  fs.writeFileSync("results.json", JSON.stringify(results, null, 2));
  console.log("Done! Results saved to results.json");
  results.forEach((r) => console.log(`${r.id}: ${r.text}`));
}

runBatchJob().catch(console.error);

Processing Large Batches (10,000 Requests)

For production jobs with thousands of requests, batch your data in chunks of up to 10,000 and track batch IDs:

import math

def submit_large_dataset(client, items: list, prompt_template: str, model: str = "claude-sonnet-4-6"):
    """Submit a dataset that may exceed 10,000 items across multiple batches."""
    BATCH_SIZE = 10_000
    batch_ids = []
    
    total_batches = math.ceil(len(items) / BATCH_SIZE)
    print(f"Submitting {len(items)} items across {total_batches} batches...")
    
    for batch_num, i in enumerate(range(0, len(items), BATCH_SIZE)):
        chunk = items[i:i + BATCH_SIZE]
        
        requests = [
            {
                "custom_id": f"item-{i + j}",
                "params": {
                    "model": model,
                    "max_tokens": 256,
                    "messages": [
                        {
                            "role": "user",
                            "content": prompt_template.format(text=item)
                        }
                    ]
                }
            }
            for j, item in enumerate(chunk)
        ]
        
        batch = client.messages.batches.create(requests=requests)
        batch_ids.append(batch.id)
        print(f"Batch {batch_num + 1}/{total_batches} submitted: {batch.id}")
    
    return batch_ids

# Example usage
items = [f"Support ticket text number {i}" for i in range(25_000)]
prompt = "Classify into (Billing/Technical/Account/Other): {text}"
batch_ids = submit_large_dataset(client, items, prompt)
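
Request count is not the only thing that can make a batch too large, since long prompts also inflate the payload. The sketch below splits a list of request dicts by both a maximum count and an approximate serialized size. The names chunk_requests, max_requests, and max_bytes are hypothetical, and the default byte budget is a placeholder assumption, so check the official documentation for the size limit that actually applies.

```python
import json


def chunk_requests(requests, max_requests=10_000, max_bytes=100 * 1024 * 1024):
    """Yield lists of requests that stay within both a count and a size budget.

    Size is estimated as the UTF-8 length of each request serialized as JSON,
    which approximates (but does not exactly equal) the HTTP payload size.
    """
    chunk, chunk_bytes = [], 0
    for request in requests:
        size = len(json.dumps(request).encode("utf-8"))
        if chunk and (len(chunk) >= max_requests or chunk_bytes + size > max_bytes):
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(request)  # an oversized single request still gets its own chunk
        chunk_bytes += size
    if chunk:
        yield chunk
```

Each yielded chunk could then be passed to client.messages.batches.create(requests=chunk), with the returned batch ID recorded as in submit_large_dataset.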

Cost Calculation: How Much Can You Save?

The Batch API charges 50% of standard API prices. Here's what that means at scale:

Volume	Standard Cost (Sonnet)	Batch Cost	Savings
10,000 requests (avg 500 input / 200 output tokens)	~$45	~$22.50	~$22.50
100,000 requests	~$450	~$225	~$225
1,000,000 requests	~$4,500	~$2,250	~$2,250

Estimates assume claude-sonnet-4-6 is billed at $3 per million input tokens and $15 per million output tokens, with the same token averages in every row. Check anthropic.com/pricing for current rates.

For a typical data enrichment pipeline processing 500K records per month, the Batch API can save over $1,000/month vs. the synchronous API.
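
The arithmetic behind estimates like these is easy to reproduce for your own workload. The sketch below assumes prices of $3 per million input tokens and $15 per million output tokens and a flat 50% batch discount. Those rates are assumptions for illustration, so substitute the current figures from the pricing page. The function name estimate_cost is hypothetical.

```python
def estimate_cost(num_requests, avg_input_tokens, avg_output_tokens,
                  input_price_per_mtok=3.00, output_price_per_mtok=15.00,
                  batch_discount=0.50):
    """Return (standard_cost, batch_cost) in dollars for a workload.

    Prices are per million tokens and are assumed values, not official rates.
    """
    input_cost = num_requests * avg_input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = num_requests * avg_output_tokens / 1_000_000 * output_price_per_mtok
    standard = input_cost + output_cost
    return standard, standard * (1 - batch_discount)


standard, batched = estimate_cost(10_000, avg_input_tokens=500, avg_output_tokens=200)
print(f"standard ${standard:.2f} | batch ${batched:.2f} | saved ${standard - batched:.2f}")
# prints: standard $45.00 | batch $22.50 | saved $22.50
```

Because output tokens are priced higher than input tokens under these assumptions, the average response length tends to dominate the total.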

Error Handling and Retries

Individual requests within a batch can fail independently. Always check result.result.type:

def process_results_with_retry(client, batch_id: str) -> tuple[list, list]:
    """Separate succeeded and failed results for potential retry."""
    succeeded = []
    failed = []
    
    for result in client.messages.batches.results(batch_id):
        if result.result.type == "succeeded":
            succeeded.append({
                "id": result.custom_id,
                "content": result.result.message.content[0].text
            })
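        else:
            # "errored", "canceled", or "expired"; only errored results carry an error payload
            error = getattr(result.result, "error", None)
            detail = getattr(error, "error", error)  # details may be nested one level down
            failed.append({
                "id": result.custom_id,
                "error_type": detail.type if detail is not None else result.result.type,
            })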
    
    return succeeded, failed

succeeded, failed = process_results_with_retry(client, batch.id)
print(f"Success: {len(succeeded)} | Failed: {len(failed)}")

# Resubmit failed items if needed
if failed:
    print(f"Retrying {len(failed)} failed requests...")
    # Re-create requests from failed custom_ids and submit new batch
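
The resubmission step left open in the comment above could be sketched as a small triage function. This is an illustration, not a prescribed policy. The helper name plan_retries is hypothetical, it expects the failed list in the shape produced by process_results_with_retry plus a mapping from custom_id to the original request (which you would need to keep from submission time), and the choice of which error types count as retryable, including "expired", is an assumption.

```python
# Assumed set of failure types that are safe to resubmit unchanged.
RETRYABLE = {"overloaded_error", "api_error", "expired"}


def plan_retries(failed, original_requests):
    """Split failed results into requests to resubmit and IDs needing manual review.

    failed: list of {"id": custom_id, "error_type": str}
    original_requests: dict mapping custom_id -> the request dict first submitted
    """
    retry_requests, needs_review = [], []
    for item in failed:
        request = original_requests.get(item["id"])
        if item["error_type"] in RETRYABLE and request is not None:
            retry_requests.append(request)
        else:
            # e.g. invalid_request_error: resubmitting unchanged would fail again
            needs_review.append(item["id"])
    return retry_requests, needs_review
```

The retry_requests list can then be submitted as a new batch with client.messages.batches.create(requests=retry_requests), while needs_review is better logged for a human to inspect.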

Common error types to handle:

"overloaded_error" — retry in a new batch
"invalid_request_error" — check your request schema; don't retry as-is
"api_error" — transient error; retry is safe

Best Practices for Production

1. Use meaningful custom IDs

Use IDs that map to your database (e.g., "ticket-{db_id}" or "article-{slug}"). This makes joining results back to your records trivial.
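
As a rough illustration of that round trip, the sketch below builds IDs from database keys and joins parsed results back onto the source rows by custom_id rather than by position, since result order should not be relied on to match submission order. The row layout, the helper names make_custom_id and join_results, and the character and length check are assumptions for illustration, so confirm the ID constraints in the API reference.

```python
import re

# Assumed constraint: letters, digits, underscores and hyphens, at most 64 characters.
CUSTOM_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def make_custom_id(prefix, db_id):
    """Build an ID such as "ticket-1042" and fail early if it looks invalid."""
    custom_id = f"{prefix}-{db_id}"
    if not CUSTOM_ID_PATTERN.fullmatch(custom_id):
        raise ValueError(f"unsuitable custom_id: {custom_id!r}")
    return custom_id


def join_results(rows, results, prefix="ticket"):
    """Attach each result's response to its source row, matching on custom_id."""
    by_id = {r["custom_id"]: r for r in results}
    joined = []
    for row in rows:
        result = by_id.get(make_custom_id(prefix, row["id"]))
        joined.append({**row, "response": result.get("response") if result else None})
    return joined
```

Rows without a matching result get a response of None, which makes missing or failed items easy to spot downstream.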

2. Set system prompts in requests

You can include a system field in each request's params. Use it to enforce output format (JSON, specific categories) so parsing is consistent across all results.
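
A sketch of what that can look like: one builder function applies the same system prompt to every request, and a small parser validates the model's reply against the allowed categories. The system prompt wording, the category list, and the helper names build_request and parse_category are illustrative assumptions. The system field itself sits alongside model and messages inside params.

```python
import json

CATEGORIES = ["Billing", "Technical", "Account", "Other"]
SYSTEM_PROMPT = (
    "You classify support tickets. Reply with only a JSON object of the form "
    '{"category": "<one of: ' + ", ".join(CATEGORIES) + '>"} and nothing else.'
)


def build_request(custom_id, ticket_text, model="claude-sonnet-4-6", max_tokens=128):
    """Build one batch request that carries the shared system prompt."""
    return {
        "custom_id": custom_id,
        "params": {
            "model": model,
            "max_tokens": max_tokens,
            "system": SYSTEM_PROMPT,
            "messages": [{"role": "user", "content": ticket_text}],
        },
    }


def parse_category(response_text):
    """Return the category from a reply, or None if it is malformed or unexpected."""
    try:
        category = json.loads(response_text).get("category")
    except (json.JSONDecodeError, AttributeError):
        return None
    return category if category in CATEGORIES else None
```

Returning None for anything unexpected keeps malformed replies out of your data while leaving the decision to retry or review up to the caller.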

3. Use max_tokens conservatively

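Batch pricing is based on tokens. You are billed for the output tokens actually generated, not for the max_tokens value, but a tight limit caps the worst-case cost of an unexpectedly long response. If you only need short classifications, set max_tokens: 128. For summaries, max_tokens: 512. Don't leave it at 4096 if you don't need it.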

4. Store raw results before processing

Save the raw JSONL output before parsing. If your parsing logic has a bug, you can re-parse without re-running the (expensive) batch.
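
A minimal sketch of that separation, using only the standard library: one function writes records to disk untouched, and another re-runs any parsing function over the stored file. The helper names save_raw and reparse are hypothetical, and the sketch assumes each result has already been converted to a plain dict (how you convert SDK result objects is left open here).

```python
import json
from pathlib import Path


def save_raw(records, path):
    """Write one JSON object per line, untouched by any parsing logic."""
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def reparse(path, parse):
    """Re-run a parsing function over the stored lines without calling the API."""
    with Path(path).open(encoding="utf-8") as f:
        return [parse(json.loads(line)) for line in f if line.strip()]
```

If the parser changes later, calling reparse("raw_results.jsonl", new_parser) rebuilds the processed output from the stored file alone (the file name is only an example).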

5. Monitor with request counts

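The batch object's request_counts field reports processing, succeeded, errored, canceled, and expired counts. Log it on every poll to track job health, but treat the non-processing tallies as final counts: they may stay at zero until the whole batch has ended.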

Key Takeaways

The Claude Batch API processes up to 10,000 requests per batch asynchronously, within a 24-hour processing window (requests that don't finish in time come back as expired)
You pay 50% less vs. the synchronous API — significant savings at scale
Each request gets a custom_id for result matching; results are streamed as JSONL
Best for nightly pipelines, bulk document processing, and offline enrichment — not for real-time features
Handle failures by checking result.result.type on every result, then resubmitting the retryable ones in a new batch

Next Steps

Ready to use Claude at scale? The Batch API is one of the most valuable — and underused — features in Claude's API. Master it and you can build data pipelines that would cost 2× as much on the synchronous endpoint.

If you're preparing for the Claude Certified Architect (CCA-F) exam, batch processing architecture is a tested topic. Our CCA practice test bank includes questions on API optimization, cost management, and agentic system design.

Want to go deeper on Claude API fundamentals? Start with our Claude API beginner's guide or jump into prompt caching — another technique that cuts costs on repeated context.

The Anthropic official Batch API documentation has the full API reference if you need endpoint specs.