How I Built an llms.txt Generator That Actually Works at Scale

Short summary

A technical deep-dive into building an automated llms.txt generator that clusters a website's pages by semantic meaning rather than by URL structure and turns the clusters into an organized markdown hierarchy. The pipeline uses Gemini embeddings cached in Redis, k-means clustering with cosine similarity, and two-phase LLM generation that relies on context caching to control costs. For production, it adds multi-layer buffering between stages and AIMD (additive-increase/multiplicative-decrease) queue control, which applies TCP congestion-control principles to LLM API calls.
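The summary does not reproduce the article's clustering code, so the following is only a minimal sketch of k-means with cosine similarity (often called spherical k-means), assuming the page embeddings have already been fetched. The function names and the toy three-dimensional vectors and page paths are hypothetical stand-ins for real Gemini embeddings of crawled pages.

```python
import math
import random


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def kmeans_cosine(vectors, k, max_iterations=50, seed=0):
    """Spherical k-means: assign by highest cosine similarity, renormalize centroids.

    Returns one cluster label (0..k-1) per input vector.
    """
    rng = random.Random(seed)
    points = [normalize(v) for v in vectors]
    # Seed the centroids with k distinct pages chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each page joins its most similar centroid.
        new_labels = [
            max(range(k), key=lambda c, p=p: cosine(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no page changed cluster
        labels = new_labels
        # Update step: average each cluster's members, then project back to unit length.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = normalize([sum(dim) / len(members) for dim in zip(*members)])
    return labels


if __name__ == "__main__":
    # Hypothetical toy embeddings standing in for real page embeddings.
    pages = {
        "/docs/install": [0.9, 0.1, 0.0],
        "/docs/setup": [0.8, 0.2, 0.1],
        "/blog/launch": [0.1, 0.9, 0.0],
        "/blog/roadmap": [0.0, 0.8, 0.2],
    }
    labels = kmeans_cosine(list(pages.values()), k=2)
    for url, label in zip(pages, labels):
        print(label, url)
```

Normalizing every vector up front means the plain dot product is the cosine similarity, and renormalizing each mean keeps the centroids on the unit sphere. A production version would more likely use NumPy or scikit-learn and a smarter seeding scheme such as k-means++.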

• Five-stage pipeline (sitemap → crawler → embedder → clusterer → summarizer), with concurrency controlled independently per stage
• Semantic clustering via k-means on embedding vectors, with embeddings cached in Redis to avoid repeated API calls
• Two-phase LLM generation with Gemini Context Caching, plus an AIMD queue that absorbs differences in stage speed and API reliability
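The summary does not include the article's queue code, so this is only a minimal sketch of the AIMD idea using Python's asyncio. Everything named here is hypothetical: `AIMDLimiter`, `call_with_limit`, the parameters (`initial`, `decrease_factor`), the +1/limit increase with halving on failure, and the fake `flaky_llm_call`. Successful calls slowly widen the concurrency window; a failure (for example a rate-limit error) cuts it in half, much as TCP treats packet loss.

```python
import asyncio


class AIMDLimiter:
    """Concurrency cap that grows additively on success and shrinks multiplicatively on failure."""

    def __init__(self, initial=4, minimum=1, maximum=64, decrease_factor=0.5):
        self.limit = float(initial)
        self.minimum = float(minimum)
        self.maximum = float(maximum)
        self.decrease_factor = decrease_factor
        self.in_flight = 0
        self._cond = asyncio.Condition()

    async def acquire(self):
        """Wait until fewer than int(limit) calls are in flight, then take a slot."""
        async with self._cond:
            await self._cond.wait_for(lambda: self.in_flight < int(self.limit))
            self.in_flight += 1

    async def release(self, success):
        """Free a slot and adjust the limit based on how the call went."""
        async with self._cond:
            self.in_flight -= 1
            if success:
                # Additive increase, TCP-style: +1/limit per success, i.e. roughly
                # +1 slot after a full "window" of successful calls.
                self.limit = min(self.maximum, self.limit + 1 / self.limit)
            else:
                # Multiplicative decrease on errors such as rate limits or timeouts.
                self.limit = max(self.minimum, self.limit * self.decrease_factor)
            self._cond.notify_all()


async def call_with_limit(limiter, make_call):
    """Run one API call under the limiter, reporting success or failure back to it."""
    await limiter.acquire()
    ok = False
    try:
        result = await make_call()
        ok = True
        return result
    finally:
        await limiter.release(ok)


async def demo():
    limiter = AIMDLimiter(initial=4)
    attempts = 0

    async def flaky_llm_call():
        # Hypothetical stand-in for an LLM request; every fifth call is "rate-limited".
        nonlocal attempts
        attempts += 1
        await asyncio.sleep(0.01)
        if attempts % 5 == 0:
            raise RuntimeError("429 Too Many Requests")
        return "summary"

    results = await asyncio.gather(
        *(call_with_limit(limiter, flaky_llm_call) for _ in range(20)),
        return_exceptions=True,
    )
    failures = sum(isinstance(r, Exception) for r in results)
    print(f"final limit={limiter.limit:.2f}, failures={failures}")


if __name__ == "__main__":
    asyncio.run(demo())
```

The limiter is created inside the running event loop, and a limit floor of 1 guarantees at least one call can always proceed, so the queue never deadlocks even after a burst of failures.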

Generated with AI, which can make mistakes.

#ai-tools #ai-agents #open-source

Read full article at Dev.to
