Dev.to
5/9/2026
Adaptive Compression in Inverted Indexes: What Actually Happens Inside Lucene, Elasticsearch, and Tantivy

Short summary

Adaptive compression in inverted indexes chooses an encoding strategy for each postings list based on its statistical profile (PFOR-delta for dense lists, bitpacking for uniform ones). A distinction that is often missed: Elasticsearch's BEST_COMPRESSION setting helps stored fields (JSON), not postings (doc IDs). These are separate storage layers that documentation often conflates. At scale, large postings lists thrash the page cache and cause disk I/O spikes that adding RAM alone cannot solve; only custom codecs or architectural changes are effective.
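
To make the per-list choice concrete, here is a minimal Python sketch of how an encoder might pick a strategy from the gap distribution of one postings list. The function names, the 10% exception budget, and the decision rules are illustrative assumptions, not the logic Lucene, Elasticsearch, or Tantivy actually ship.

```python
from typing import List


def to_gaps(doc_ids: List[int]) -> List[int]:
    """Delta-encode a strictly increasing doc ID list: first ID, then differences."""
    if not doc_ids:
        return []
    return [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]


def bits_needed(value: int) -> int:
    """Bits required to store a non-negative integer (at least 1)."""
    return max(1, value.bit_length())


def choose_encoding(doc_ids: List[int], exception_ratio: float = 0.1) -> str:
    """Pick an encoding name for one postings list (hypothetical heuristic).

    - "rle": every doc ID after the first is consecutive (all gaps are 1)
    - "bitpacking": one bit width fits all gaps without wasted exceptions
    - "pfor-delta": a narrow width fits most gaps; outliers become exceptions
    """
    if not doc_ids:
        raise ValueError("postings list must not be empty")

    gaps = to_gaps(doc_ids)
    if len(gaps) > 1 and all(g == 1 for g in gaps[1:]):
        return "rle"

    widths = sorted(bits_needed(g) for g in gaps)
    # Allow up to exception_ratio of the gaps to exceed the frame width.
    allowed_exceptions = int(len(widths) * exception_ratio)
    frame_width = widths[len(widths) - 1 - allowed_exceptions]

    if frame_width == widths[-1]:
        return "bitpacking"  # no outliers: plain fixed-width packing is enough
    return "pfor-delta"      # patch the few wide gaps as exceptions


if __name__ == "__main__":
    print(choose_encoding([5, 6, 7, 8, 9]))
    print(choose_encoding(list(range(10, 101, 10))))
    print(choose_encoding(list(range(2, 40, 2)) + [100000]))
```

The point of the sketch is only that the decision is made per list from its gaps: a run of consecutive IDs, a list with evenly sized gaps, and a list with one large outlier gap each land on a different encoding.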

  • Adaptive compression picks encoding per postings list (PFOR-delta, bitpacking, RLE) based on density and gap distribution
  • BEST_COMPRESSION in Elasticsearch addresses stored fields (JSON), not postings (doc IDs); these are two distinct compression problems that are often conflated
  • At scale, large postings lists thrash page cache; only custom codecs or architectural changes (field splitting, doc-value-only fields) solve disk I/O bottlenecks that RAM can't resolve
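
As a small illustration of where the stored-field knob lives, the sketch below builds the settings body for creating an Elasticsearch index with `index.codec` set to `best_compression`. The helper name `create_index_body` is hypothetical; the body would be sent with a create-index request, and the setting is static, so it is normally chosen at index creation time.

```python
import json


def create_index_body(codec: str = "best_compression") -> dict:
    """Build a create-index request body that sets the index codec.

    This setting changes how stored fields are compressed; it is not a
    switch for choosing a postings-list encoding.
    """
    return {"settings": {"index": {"codec": codec}}}


if __name__ == "__main__":
    print(json.dumps(create_index_body(), indent=2))
```

Because the setting targets stored fields, changing it would not be expected to relieve page-cache pressure that comes from large postings lists.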

Generated with AI, which can make mistakes.
