Claude Batch API Tutorial: Process Thousands of Requests at 50% Cost

Learn how to use Claude's Message Batches API to process thousands of AI requests asynchronously at half the price. Includes Python and JavaScript code examples.

If you're running Claude at scale — classifying thousands of support tickets, enriching product catalogs, or generating summaries for a data pipeline — you've probably felt the pinch of API costs and rate limits. Sending requests one at a time isn't just expensive; it's slow.

That's exactly what the Claude Message Batches API was built for. It lets you submit up to 10,000 requests in a single batch, process them asynchronously, and pay 50% less than standard API pricing, without juggling rate limits across thousands of individual calls.

This tutorial walks you through everything: how batches work, when to use them, and production-ready code in Python and JavaScript.


What Is the Claude Message Batches API?

The Message Batches API is Anthropic's asynchronous processing endpoint. Instead of making thousands of individual POST /v1/messages calls, you:

  • Bundle up to 10,000 requests into a single batch payload
  • Submit the batch — Anthropic queues it and processes it within 24 hours (usually much faster)
  • Poll for completion
  • Download the results as a JSONL file

The trade-off is latency: you won't get a result in milliseconds. But if you're running nightly ETL pipelines, weekly content enrichment jobs, or offline document processing, that trade-off is irrelevant, and the 50% cost reduction is very relevant.

    Batch API vs. Synchronous API — At a Glance

    Feature               | Synchronous API    | Batch API
    Response time         | Milliseconds       | Up to 24 hours
    Cost                  | Standard pricing   | 50% discount
    Max requests per call | 1                  | 10,000
    Use case              | Real-time features | Offline / bulk jobs
    Rate limit pressure   | High               | None during processing
    Models supported      | All Claude models  | All Claude models

    When to Use the Batch API

    The Batch API is ideal when:

    • You don't need results immediately — nightly jobs, weekly reports, async enrichment
    • Volume is high — more than ~100 requests per run
    • Costs matter — 50% savings adds up fast at scale
    • Rate limits are a bottleneck — batch requests don't count against your standard Messages API rate limits (the Batch API has its own, separate limits)

    Common use cases:

    • Document classification — categorize 50,000 support tickets by topic
    • Bulk summarization — generate summaries for every article in a CMS
    • Data enrichment — extract structured fields from unstructured text at scale
    • Content generation — produce product descriptions for an entire catalog
    • Sentiment analysis — score thousands of customer reviews

    If you need sub-second responses (chatbots, code completion, real-time Q&A), stick with the synchronous API.


    Setting Up: Prerequisites

    You'll need:

    # Python
    pip install anthropic
    
    # JavaScript / Node.js
    npm install @anthropic-ai/sdk

    Set your API key as an environment variable:

    export ANTHROPIC_API_KEY="sk-ant-..."


    Step 1: Create a Batch (Python)

    Here's a Python example that submits a batch of three requests. Polling for completion and saving the results follow in Steps 2 and 3.

    import anthropic
    import time
    import json
    
    client = anthropic.Anthropic()
    
    # Define your batch of requests
    # Each request needs a unique custom_id for matching results later
    requests = [
        {
            "custom_id": "ticket-001",
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 256,
                "messages": [
                    {
                        "role": "user",
                        "content": "Classify this support ticket into one category (Billing, Technical, Account, Other): 'I can't log into my account after resetting my password.'"
                    }
                ]
            }
        },
        {
            "custom_id": "ticket-002",
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 256,
                "messages": [
                    {
                        "role": "user",
                        "content": "Classify this support ticket into one category (Billing, Technical, Account, Other): 'I was charged twice for my subscription this month.'"
                    }
                ]
            }
        },
        {
            "custom_id": "ticket-003",
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 256,
                "messages": [
                    {
                        "role": "user",
                        "content": "Classify this support ticket into one category (Billing, Technical, Account, Other): 'The mobile app crashes when I try to upload images.'"
                    }
                ]
            }
        },
    ]
    
    # Submit the batch
    print("Submitting batch...")
    batch = client.messages.batches.create(requests=requests)
    print(f"Batch ID: {batch.id}")
    print(f"Status: {batch.processing_status}")

    Running this gives you a batch ID like msgbatch_01AbCdEfGhIjKlMnOpQrStUv. Save it — you'll need it to check status and retrieve results.
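
The three request dictionaries above differ only in their custom_id and ticket text, so in practice you would likely generate them from your data. The sketch below is one way to do that; the helper name build_ticket_requests and the tickets mapping are hypothetical, introduced only for illustration.

```python
def build_ticket_requests(tickets: dict[str, str],
                          model: str = "claude-sonnet-4-6",
                          max_tokens: int = 256) -> list[dict]:
    """Build one batch request per ticket.

    `tickets` maps a custom_id (e.g. "ticket-001") to the ticket text.
    Dictionary keys are unique, so the custom_ids cannot collide.
    """
    prompt = ("Classify this support ticket into one category "
              "(Billing, Technical, Account, Other): '{text}'")
    return [
        {
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [
                    {"role": "user", "content": prompt.format(text=text)}
                ],
            },
        }
        for custom_id, text in tickets.items()
    ]


# Same three tickets as in the example above
tickets = {
    "ticket-001": "I can't log into my account after resetting my password.",
    "ticket-002": "I was charged twice for my subscription this month.",
    "ticket-003": "The mobile app crashes when I try to upload images.",
}
requests = build_ticket_requests(tickets)
print(len(requests), requests[0]["custom_id"])
```

The resulting list has the same shape as the hand-written one and can be passed to client.messages.batches.create(requests=requests) unchanged.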


    Step 2: Poll for Completion

    Batches typically complete in minutes for small jobs, hours for large ones (up to the 24-hour maximum). Here's a polling loop:

    def wait_for_batch(client, batch_id: str, poll_interval: int = 60):
        """Poll until batch completes. Returns the finished batch object."""
        while True:
            batch = client.messages.batches.retrieve(batch_id)
            status = batch.processing_status
            
            print(f"Status: {status} | "
                  f"Succeeded: {batch.request_counts.succeeded} | "
                  f"Processing: {batch.request_counts.processing} | "
                  f"Errored: {batch.request_counts.errored}")
            
            if status == "ended":
                return batch
            
            time.sleep(poll_interval)
    
    # Wait for our batch to finish
    finished_batch = wait_for_batch(client, batch.id, poll_interval=30)
    print(f"\nBatch complete! Total succeeded: {finished_batch.request_counts.succeeded}")

    The processing_status field takes one of three values:

    • "in_progress" — requests are being processed
    • "canceling" — a cancellation has been requested and is in progress
    • "ended" — processing has finished for every request, whether it succeeded, errored, was canceled, or expired (check individual results)
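
A loop that polls forever can hang a pipeline if something goes wrong. The following is a minimal sketch of a variant with a deadline and a growing wait between polls. It assumes only the retrieve call and processing_status field used above; the function name, the doubling schedule, and the injectable sleep and clock parameters are illustrative choices, not part of the SDK.

```python
import time


def wait_for_batch_with_deadline(client, batch_id: str,
                                 poll_interval: float = 30.0,
                                 max_interval: float = 300.0,
                                 timeout: float = 24 * 60 * 60,
                                 sleep=time.sleep,
                                 clock=time.monotonic):
    """Poll until the batch reports "ended" or `timeout` seconds have passed.

    The wait doubles after each poll, capped at `max_interval`.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    interval = poll_interval
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        if clock() >= deadline:
            raise TimeoutError(
                f"Batch {batch_id} still {batch.processing_status!r} "
                f"after {timeout} seconds"
            )
        sleep(interval)
        interval = min(interval * 2, max_interval)
```

Call it the same way as wait_for_batch, for example wait_for_batch_with_deadline(client, batch.id, timeout=3600), and catch TimeoutError to alert or reschedule.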


    Step 3: Retrieve Results

    Once the batch status is "ended", stream the results. Results are returned in JSONL format (one JSON object per line):

    def get_batch_results(client, batch_id: str) -> list[dict]:
        """Download and parse batch results."""
        results = []
        
        for result in client.messages.batches.results(batch_id):
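            record = {
                "custom_id": result.custom_id,
                # One of "succeeded", "errored", "canceled", or "expired"
                "type": result.result.type,
            }
            
            if result.result.type == "succeeded":
                # Extract the text response
                record["response"] = result.result.message.content[0].text
                record["input_tokens"] = result.result.message.usage.input_tokens
                record["output_tokens"] = result.result.message.usage.output_tokens
            elif result.result.type == "errored":
                # Error details may be nested one level down (error.error),
                # so unwrap when a nested `error` attribute is present
                err = result.result.error
                detail = getattr(err, "error", err)
                record["error"] = detail.type
                record["error_message"] = getattr(detail, "message", "")
            else:
                # "canceled" and "expired" results carry no error payload
                record["error"] = result.result.type
                record["error_message"] = f"request {result.result.type}"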
            
            results.append(record)
        
        return results
    
    results = get_batch_results(client, batch.id)
    
    # Save to file for downstream processing
    with open("batch_results.jsonl", "w") as f:
        for result in results:
            f.write(json.dumps(result) + "\n")
    
    # Print summary
    for r in results:
        if r["type"] == "succeeded":
            print(f"{r['custom_id']}: {r['response']}")
        else:
            print(f"{r['custom_id']}: ERROR — {r['error_message']}")
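
Before handing results to downstream code, it can help to tally how the batch went. The sketch below summarizes the parsed records produced by get_batch_results; the function name summarize_results and the sample records are made up for illustration.

```python
from collections import Counter


def summarize_results(records: list[dict]) -> dict:
    """Tally result types and total token usage across parsed records."""
    by_type = Counter(r["type"] for r in records)
    return {
        "by_type": dict(by_type),
        # Only succeeded records carry token counts, so default to 0
        "input_tokens": sum(r.get("input_tokens", 0) for r in records),
        "output_tokens": sum(r.get("output_tokens", 0) for r in records),
    }


# Made-up sample records in the shape written by get_batch_results
sample = [
    {"custom_id": "ticket-001", "type": "succeeded", "response": "Account",
     "input_tokens": 40, "output_tokens": 3},
    {"custom_id": "ticket-002", "type": "succeeded", "response": "Billing",
     "input_tokens": 38, "output_tokens": 3},
    {"custom_id": "ticket-003", "type": "errored",
     "error": "api_error", "error_message": "Internal server error"},
]
summary = summarize_results(sample)
print(summary)
```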


    Step 4: Full JavaScript Example

    Here's the same workflow in Node.js:

    import Anthropic from "@anthropic-ai/sdk";
    import fs from "fs";
    
    const client = new Anthropic();
    
    async function runBatchJob() {
      // 1. Create the batch
      const batch = await client.messages.batches.create({
        requests: [
          {
            custom_id: "doc-001",
            params: {
              model: "claude-sonnet-4-6",
              max_tokens: 512,
              messages: [
                {
                  role: "user",
                  content:
                    "Summarize this in 2 sentences: Artificial intelligence is transforming industries by automating repetitive tasks, enabling faster decision-making, and uncovering patterns in large datasets that humans would miss.",
                },
              ],
            },
          },
          {
            custom_id: "doc-002",
            params: {
              model: "claude-sonnet-4-6",
              max_tokens: 512,
              messages: [
                {
                  role: "user",
                  content:
                    "Summarize this in 2 sentences: Machine learning models require large amounts of high-quality training data. Data quality directly impacts model accuracy and generalization ability.",
                },
              ],
            },
          },
        ],
      });
    
      console.log(`Batch submitted: ${batch.id}`);
    
      // 2. Poll for completion
      let finished = false;
      while (!finished) {
        await new Promise((r) => setTimeout(r, 30000)); // wait 30 seconds
        const status = await client.messages.batches.retrieve(batch.id);
        console.log(`Status: ${status.processing_status}`);
        if (status.processing_status === "ended") {
          finished = true;
        }
      }
    
      // 3. Stream results
      const results = [];
      for await (const result of await client.messages.batches.results(batch.id)) {
        if (result.result.type === "succeeded") {
          results.push({
            id: result.custom_id,
            text: result.result.message.content[0].text,
          });
        }
      }
    
      // 4. Save results
      fs.writeFileSync("results.json", JSON.stringify(results, null, 2));
      console.log("Done! Results saved to results.json");
      results.forEach((r) => console.log(`${r.id}: ${r.text}`));
    }
    
    runBatchJob().catch(console.error);


    Processing Large Batches (10,000 Requests)

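    For production jobs with thousands of requests, split your data into chunks of up to 10,000 and keep track of every batch ID. Each batch is also capped by total payload size, so use smaller chunks when individual prompts are long: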

    import math
    
    def submit_large_dataset(client, items: list, prompt_template: str, model: str = "claude-sonnet-4-6"):
        """Submit a dataset that may exceed 10,000 items across multiple batches."""
        BATCH_SIZE = 10_000
        batch_ids = []
        
        total_batches = math.ceil(len(items) / BATCH_SIZE)
        print(f"Submitting {len(items)} items across {total_batches} batches...")
        
        for batch_num, i in enumerate(range(0, len(items), BATCH_SIZE)):
            chunk = items[i:i + BATCH_SIZE]
            
            requests = [
                {
                    "custom_id": f"item-{i + j}",
                    "params": {
                        "model": model,
                        "max_tokens": 256,
                        "messages": [
                            {
                                "role": "user",
                                "content": prompt_template.format(text=item)
                            }
                        ]
                    }
                }
                for j, item in enumerate(chunk)
            ]
            
            batch = client.messages.batches.create(requests=requests)
            batch_ids.append(batch.id)
            print(f"Batch {batch_num + 1}/{total_batches} submitted: {batch.id}")
        
        return batch_ids
    
    # Example usage
    items = [f"Support ticket text number {i}" for i in range(25_000)]
    prompt = "Classify into (Billing/Technical/Account/Other): {text}"
    batch_ids = submit_large_dataset(client, items, prompt)
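
The function above returns the batch IDs but keeps them only in memory, so a crash between submission and retrieval would lose track of them. One possible approach, sketched here, is a small JSON manifest on disk. The file layout, function names, and the "pending"/"collected" states are assumptions made for this example, and the batch IDs in the demo are made up.

```python
import json
import tempfile
from pathlib import Path


def save_manifest(path, batch_ids: list[str]) -> None:
    """Record every submitted batch ID, all initially marked as pending."""
    manifest = {batch_id: "pending" for batch_id in batch_ids}
    Path(path).write_text(json.dumps(manifest, indent=2))


def mark_collected(path, batch_id: str) -> None:
    """Mark one batch as collected once its results are safely stored."""
    manifest = json.loads(Path(path).read_text())
    manifest[batch_id] = "collected"
    Path(path).write_text(json.dumps(manifest, indent=2))


def pending_batches(path) -> list[str]:
    """Return the batch IDs whose results have not been collected yet."""
    manifest = json.loads(Path(path).read_text())
    return [batch_id for batch_id, state in manifest.items() if state == "pending"]


# Demo with made-up batch IDs, written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    manifest_path = Path(tmp) / "batches.json"
    save_manifest(manifest_path, ["msgbatch_example_1", "msgbatch_example_2"])
    mark_collected(manifest_path, "msgbatch_example_1")
    remaining = pending_batches(manifest_path)
print(remaining)
```

A retrieval job can then loop over pending_batches(...) on each run and pick up where the previous run stopped.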


    Cost Calculation: How Much Can You Save?

    The Batch API charges 50% of standard API prices. Here's what that means at scale:

    Volume                                                | Standard Cost (Sonnet) | Batch Cost | Savings
    10,000 requests (avg 500 input / 200 output tokens)   | ~$42                   | ~$21       | $21
    100,000 requests                                      | ~$420                  | ~$210      | $210
    1,000,000 requests                                    | ~$4,200                | ~$2,100    | $2,100

    Estimates based on claude-sonnet-4-6 pricing. Check anthropic.com/pricing for current rates.

    For a typical data enrichment pipeline processing 500K records per month, the Batch API can save over $1,000/month vs. the synchronous API.
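
To run the same estimate for your own workload, the arithmetic is request count times average tokens times the per-token price, halved for batch processing. The sketch below makes the prices explicit parameters. The rates in the example call are assumed placeholder values in USD per million tokens, not quoted prices, so the output will not necessarily match the table; substitute the current published rates.

```python
def estimate_batch_savings(num_requests: int,
                           avg_input_tokens: int,
                           avg_output_tokens: int,
                           input_price_per_mtok: float,
                           output_price_per_mtok: float,
                           batch_discount: float = 0.5) -> dict:
    """Estimate standard cost, batch cost, and savings in USD."""
    input_cost = num_requests * avg_input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = num_requests * avg_output_tokens / 1_000_000 * output_price_per_mtok
    standard = input_cost + output_cost
    batch = standard * (1 - batch_discount)
    return {
        "standard": round(standard, 2),
        "batch": round(batch, 2),
        "savings": round(standard - batch, 2),
    }


# Placeholder rates (assumed for illustration): 3.00 input, 15.00 output per million tokens
estimate = estimate_batch_savings(
    num_requests=10_000,
    avg_input_tokens=500,
    avg_output_tokens=200,
    input_price_per_mtok=3.00,
    output_price_per_mtok=15.00,
)
print(estimate)
```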


    Error Handling and Retries

    Individual requests within a batch can fail independently. Always check result.result.type, which is "succeeded", "errored", "canceled", or "expired":

    def process_results_with_retry(client, batch_id: str) -> tuple[list, list]:
        """Separate succeeded and failed results for potential retry."""
        succeeded = []
        failed = []
        
        for result in client.messages.batches.results(batch_id):
            if result.result.type == "succeeded":
                succeeded.append({
                    "id": result.custom_id,
                    "content": result.result.message.content[0].text
                })
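            else:
                # Only "errored" results carry an error payload; "canceled" and
                # "expired" results fall back to the result type itself
                err = getattr(result.result, "error", None)
                detail = getattr(err, "error", err)
                failed.append({
                    "id": result.custom_id,
                    "error_type": detail.type if detail is not None else result.result.type,
                })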
        
        return succeeded, failed
    
    succeeded, failed = process_results_with_retry(client, batch.id)
    print(f"Success: {len(succeeded)} | Failed: {len(failed)}")
    
    # Resubmit failed items if needed
    if failed:
        print(f"Retrying {len(failed)} failed requests...")
        # Re-create requests from failed custom_ids and submit new batch

    Common error types to handle:

    • "overloaded_error" — retry in a new batch
    • "invalid_request_error" — check your request schema; don't retry as-is
    • "api_error" — transient error; retry is safe
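
The retry step left as a comment above could be filled in along these lines. This is a sketch under the assumption that you still have the original request list in memory or can rebuild it; the function name build_retry_requests and the RETRYABLE_ERRORS set are hypothetical, and the set simply mirrors the two retry-safe error types listed above.

```python
# Error types the list above describes as safe to retry
RETRYABLE_ERRORS = {"overloaded_error", "api_error"}


def build_retry_requests(original_requests: list[dict],
                         failed: list[dict]) -> tuple[list[dict], list[str]]:
    """Split failed results into requests to resubmit and IDs needing review.

    `original_requests` is the list submitted in the first batch.
    `failed` is the list of {"id", "error_type"} dicts from
    process_results_with_retry.
    """
    by_id = {req["custom_id"]: req for req in original_requests}
    retry, needs_review = [], []
    for item in failed:
        if item["error_type"] in RETRYABLE_ERRORS and item["id"] in by_id:
            retry.append(by_id[item["id"]])
        else:
            # e.g. invalid_request_error: fix the request before resending
            needs_review.append(item["id"])
    return retry, needs_review


# Made-up example data
original = [
    {"custom_id": "ticket-001", "params": {"max_tokens": 256}},
    {"custom_id": "ticket-002", "params": {"max_tokens": 256}},
]
failures = [
    {"id": "ticket-001", "error_type": "overloaded_error"},
    {"id": "ticket-002", "error_type": "invalid_request_error"},
]
retry, needs_review = build_retry_requests(original, failures)
print(len(retry), needs_review)
```

If retry is non-empty, it can be submitted as a new batch with client.messages.batches.create(requests=retry), reusing the same custom_ids so results still join back to your records.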


    Best Practices for Production

    1. Use meaningful custom IDs

    Use IDs that map to your database (e.g., "ticket-{db_id}" or "article-{slug}"). This makes joining results back to your records trivial.
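
As a rough illustration of that join, the sketch below strips a "ticket-" prefix from each custom_id and looks up the matching record. The record layout, the prefix, and the join_results name are assumptions made for this example; the result dictionaries follow the shape produced in Step 3.

```python
def join_results(records_by_id: dict[int, dict],
                 results: list[dict],
                 prefix: str = "ticket-") -> list[dict]:
    """Attach each successful batch response to its source record."""
    joined = []
    for result in results:
        if result["type"] != "succeeded":
            continue  # failed requests are handled separately
        db_id = int(result["custom_id"].removeprefix(prefix))
        record = records_by_id.get(db_id)
        if record is None:
            continue  # no matching record for this custom_id
        joined.append({**record, "category": result["response"]})
    return joined


# Made-up records and results for illustration
records_by_id = {
    101: {"id": 101, "subject": "Login problem"},
    102: {"id": 102, "subject": "Double charge"},
}
results = [
    {"custom_id": "ticket-101", "type": "succeeded", "response": "Account"},
    {"custom_id": "ticket-102", "type": "errored", "error_message": "Overloaded"},
]
joined = join_results(records_by_id, results)
print(joined)
```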

    2. Set system prompts in requests

    You can include a system field in each request's params. Use it to enforce output format (JSON, specific categories) so parsing is consistent across all results.
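
For instance, a request that asks for a JSON reply, paired with a tolerant parser, might look like the sketch below. The system prompt wording, the helper names, and the allowed-category set are illustrative assumptions, and the model is not guaranteed to follow the format, which is why the parser returns None instead of raising.

```python
import json

SYSTEM_PROMPT = (
    "You are a support-ticket classifier. Reply with only a JSON object "
    'of the form {"category": "<Billing|Technical|Account|Other>"}.'
)
ALLOWED_CATEGORIES = {"Billing", "Technical", "Account", "Other"}


def make_request(custom_id: str, ticket_text: str,
                 model: str = "claude-sonnet-4-6") -> dict:
    """Build a batch request whose params include a shared system prompt."""
    return {
        "custom_id": custom_id,
        "params": {
            "model": model,
            "max_tokens": 128,
            "system": SYSTEM_PROMPT,
            "messages": [{"role": "user", "content": ticket_text}],
        },
    }


def parse_category(response_text: str):
    """Return the category, or None if the reply is not the expected JSON."""
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError:
        return None
    category = data.get("category") if isinstance(data, dict) else None
    return category if category in ALLOWED_CATEGORIES else None


request = make_request("ticket-001", "I was charged twice this month.")
print(request["params"]["system"] == SYSTEM_PROMPT)
print(parse_category('{"category": "Billing"}'))
```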

    3. Use max_tokens conservatively

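    You pay for the tokens a request actually uses, and max_tokens caps how many output tokens it can generate. If you only need short classifications, set max_tokens: 128; for summaries, max_tokens: 512. Don't leave it at 4096 if you don't need it, since a tight cap bounds the cost of any response that runs long.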

    4. Store raw results before processing

    Save the raw JSONL output before parsing. If your parsing logic has a bug, you can re-parse without re-running the (expensive) batch.

    5. Monitor with request counts

    The batch object's request_counts field shows processing, succeeded, errored, canceled, and expired counts as the batch runs. Log it on every poll to track job health.
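
A compact progress line can be derived from those counts. The sketch below is one possible formatter; the function name is hypothetical, and it reads each count with a default of 0 so it still works if a count is absent on the object you pass in.

```python
from types import SimpleNamespace


def progress_line(request_counts) -> str:
    """Format a one-line progress summary from a request_counts object."""
    names = ("processing", "succeeded", "errored", "canceled", "expired")
    counts = {name: getattr(request_counts, name, 0) for name in names}
    done = sum(counts.values()) - counts["processing"]
    total = done + counts["processing"]
    percent = 100 * done / total if total else 0.0
    return (f"{done}/{total} done ({percent:.1f}%) | "
            f"succeeded={counts['succeeded']} errored={counts['errored']} "
            f"canceled={counts['canceled']} expired={counts['expired']}")


# Stand-in for batch.request_counts with made-up numbers
example_counts = SimpleNamespace(processing=25, succeeded=70, errored=5,
                                 canceled=0, expired=0)
line = progress_line(example_counts)
print(line)
```

Inside a polling loop this would be called as progress_line(batch.request_counts).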


    Key Takeaways

    • The Claude Batch API processes up to 10,000 requests per batch asynchronously, within a 24-hour processing window; requests still unprocessed at that point expire
    • You pay 50% less vs. the synchronous API — significant savings at scale
    • Each request gets a custom_id for result matching; results are streamed as JSONL
    • Best for nightly pipelines, bulk document processing, and offline enrichment — not for real-time features
    • Handle errored results by checking result.result.error.type and resubmitting as needed


    Next Steps

    Ready to use Claude at scale? The Batch API is one of the most valuable — and underused — features in Claude's API. Master it and you can build data pipelines that would cost 2× as much on the synchronous endpoint.

    If you're preparing for the Claude Certified Architect (CCA-F) exam, batch processing architecture is a tested topic. Our CCA practice test bank includes questions on API optimization, cost management, and agentic system design.

    Want to go deeper on Claude API fundamentals? Start with our Claude API beginner's guide or jump into prompt caching — another technique that cuts costs on repeated context.

    The Anthropic official Batch API documentation has the full API reference if you need endpoint specs.
