Dev.to
5/9/2026

Adaptive Compression in Inverted Indexes: What Actually Happens Inside Lucene, Elasticsearch, and Tantivy
Short summary
Adaptive compression in inverted indexes chooses an encoding for each postings list based on its statistical profile, for example PFOR-delta for dense lists and plain bitpacking for lists with uniform gaps. A distinction that is often missed: Elasticsearch's BEST_COMPRESSION setting compresses stored fields (the JSON source), not postings (doc IDs). These are separate storage layers, and documentation often conflates them. At scale, uncompressed postings thrash the page cache and cause disk I/O spikes that adding RAM alone cannot solve; the effective remedies are custom codecs or architectural changes.
- Adaptive compression picks an encoding per postings list (PFOR-delta, bitpacking, RLE) based on density and gap distribution.
- BEST_COMPRESSION in Elasticsearch addresses stored fields (JSON), not postings (doc IDs). These are two distinct compression problems that are often conflated.
- At scale, large postings lists thrash the page cache. Only custom codecs or architectural changes (field splitting, doc-value-only fields) solve disk I/O bottlenecks that more RAM cannot resolve.
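
To make the first point concrete, here is a minimal sketch of how density drives the encoding cost. It delta-encodes sorted doc IDs and bit-packs each block at the smallest width that fits its largest gap. This is frame-of-reference style packing without the exception "patching" that PFOR adds, and it is not the actual code from Lucene, Elasticsearch, or Tantivy. The block size of 128 and all function names are assumptions chosen for illustration.

```python
from typing import List, Tuple

BLOCK_SIZE = 128  # assumed block size, for illustration only

Block = Tuple[int, int, bytes]  # (bit width, number of gaps, packed payload)


def to_gaps(doc_ids: List[int]) -> List[int]:
    """Delta-encode a strictly increasing list of doc IDs (first gap is the first ID)."""
    gaps = []
    prev = 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def pack_block(gaps: List[int]) -> Tuple[int, bytes]:
    """Pack every gap in the block at the width needed by the largest gap."""
    width = max(1, max(gaps).bit_length())
    acc = 0
    for i, gap in enumerate(gaps):
        acc |= gap << (i * width)
    nbytes = (len(gaps) * width + 7) // 8
    return width, acc.to_bytes(nbytes, "little")


def unpack_block(width: int, payload: bytes, count: int) -> List[int]:
    """Reverse pack_block: slice fixed-width gaps back out of the payload."""
    acc = int.from_bytes(payload, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]


def encode(doc_ids: List[int]) -> List[Block]:
    """Split the gap stream into blocks; each block picks its own bit width."""
    gaps = to_gaps(doc_ids)
    blocks = []
    for start in range(0, len(gaps), BLOCK_SIZE):
        chunk = gaps[start:start + BLOCK_SIZE]
        width, payload = pack_block(chunk)
        blocks.append((width, len(chunk), payload))
    return blocks


def decode(blocks: List[Block]) -> List[int]:
    """Unpack each block and prefix-sum the gaps back into doc IDs."""
    doc_ids = []
    prev = 0
    for width, count, payload in blocks:
        for gap in unpack_block(width, payload, count):
            prev += gap
            doc_ids.append(prev)
    return doc_ids


if __name__ == "__main__":
    dense = list(range(1000))                 # every document matches
    sparse = list(range(0, 1_000_000, 1000))  # one match per 1000 documents
    for name, ids in (("dense", dense), ("sparse", sparse)):
        blocks = encode(ids)
        size = sum(len(payload) for _, _, payload in blocks)
        widths = sorted({width for width, _, _ in blocks})
        print(name, "bit widths:", widths, "payload bytes:", size)
```

Both lists hold the same number of doc IDs, but the dense one packs into far fewer bytes because its gaps need fewer bits. Choosing the width per block rather than per list means a single large gap only inflates the block it falls in; PFOR goes further by storing such outliers as exceptions.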