arXiv cs.CL
6/19/2026

Pruning via Causal Attribution Preserves Reasoning Performance in Large Language Models
Short summary
Researchers introduce Causal Attribution Pruning (CAP), a training-free method that identifies the attention heads critical to reasoning and removes the non-critical ones while preserving LLM reasoning performance. At 20% sparsity, CAP achieves relative accuracy gains of up to 61% over Wanda on benchmarks such as ARC-Challenge and GSM8K. Rather than relying on weight magnitude or activation statistics alone, CAP uses causal attribution to measure each head's functional impact on the task, and it outperforms magnitude-only and activation-based pruning criteria.
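The summary does not say how CAP computes its attribution scores. As a hedged illustration only, the sketch below uses zero-ablation, one common way to estimate a head's causal effect. Each head's output is zeroed in turn, the resulting increase in a task loss is recorded as that head's score, and the lowest-scoring heads are pruned. Everything here is an assumption and not the authors' implementation: the toy single-layer attention, the hypothetical names `attention`, `task_loss`, `head_attribution`, and `prune_mask`, the mean-squared-error stand-in for a reasoning metric, and counting sparsity in heads rather than weights.

```python
import numpy as np

# Hypothetical toy model: one multi-head self-attention layer.
# head_mask[h] == 0.0 zero-ablates head h (the causal intervention).
def attention(x, wq, wk, wv, wo, head_mask):
    head_outputs = []
    for h in range(wq.shape[0]):
        q, k, v = x @ wq[h], x @ wk[h], x @ wv[h]      # each (seq_len, d_head)
        logits = q @ k.T / np.sqrt(q.shape[-1])
        logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
        weights = np.exp(logits)
        weights /= weights.sum(axis=-1, keepdims=True)
        head_outputs.append(head_mask[h] * (weights @ v))
    return np.concatenate(head_outputs, axis=-1) @ wo  # (seq_len, d_model)

# Hypothetical stand-in for a reasoning metric: lower is better.
def task_loss(out, target):
    return float(np.mean((out - target) ** 2))

# Hypothetical scoring: increase in task loss when a single head is ablated.
def head_attribution(x, target, params):
    n_heads = params[0].shape[0]
    full = np.ones(n_heads)
    base = task_loss(attention(x, *params, full), target)
    scores = np.empty(n_heads)
    for h in range(n_heads):
        mask = full.copy()
        mask[h] = 0.0
        scores[h] = task_loss(attention(x, *params, mask), target) - base
    return scores

# Zero out the lowest-scoring heads; sparsity is counted in heads here.
def prune_mask(scores, sparsity):
    n_prune = int(round(sparsity * len(scores)))
    mask = np.ones(len(scores))
    mask[np.argsort(scores)[:n_prune]] = 0.0
    return mask

rng = np.random.default_rng(0)
n_heads, d_model, d_head, seq_len = 5, 16, 4, 8
wq, wk, wv = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))
wv[3] *= 1e-3  # make head 3 nearly inert, so it should be pruned first
wo = rng.normal(size=(n_heads * d_head, d_model))
x = rng.normal(size=(seq_len, d_model))
params = (wq, wk, wv, wo)

target = attention(x, *params, np.ones(n_heads))  # unpruned output as the reference
scores = head_attribution(x, target, params)
mask = prune_mask(scores, sparsity=0.2)  # 20% of 5 heads -> 1 head removed
print(scores.round(4), mask)
```

Because the score is the change in task loss under intervention rather than a property of the weights themselves, a head is kept or dropped according to how much the output depends on it, which is the contrast with magnitude-only criteria that the summary draws.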
- CAP measures the causal impact of each attention head on reasoning tasks to guide pruning
- Achieves relative accuracy gains of up to 61% over the Wanda baseline at 20% sparsity
- Outperforms magnitude- and activation-based methods on GSM8K, StrategyQA, and ARC-Challenge
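For contrast, Wanda, the baseline named above, scores each weight by its absolute value multiplied by the L2 norm of the corresponding input feature over a small calibration set, then removes the lowest-scoring weights within each output row. The sketch below is a minimal NumPy rendering of that rule under our own assumptions; the function names `wanda_scores` and `wanda_prune` and the toy data are hypothetical, and this is not the reference implementation. Unlike an ablation-based score, nothing in this criterion looks at task performance.

```python
import numpy as np

# Hypothetical helper: Wanda-style importance |W_ij| * ||X_j||_2.
# weight: (d_out, d_in); calib_inputs: (n_tokens, d_in) from a calibration set.
def wanda_scores(weight, calib_inputs):
    feature_norms = np.linalg.norm(calib_inputs, axis=0)  # L2 norm of each input feature
    return np.abs(weight) * feature_norms                 # broadcasts over output rows

# Hypothetical helper: zero the lowest-scoring weights within each output row.
def wanda_prune(weight, calib_inputs, sparsity):
    scores = wanda_scores(weight, calib_inputs)
    n_prune = int(round(sparsity * weight.shape[1]))
    drop = np.argsort(scores, axis=1)[:, :n_prune]        # per-row comparison group
    pruned = weight.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned

weight = np.array([[2.0, 1.0, 1.0, 1.0, 1.0],
                   [-3.0, 0.5, 1.0, 1.0, 1.0]])
calib = np.ones((4, 5))
calib[:, 0] = 0.01  # input feature 0 is nearly silent on the calibration data
pruned = wanda_prune(weight, calib, sparsity=0.2)  # 1 of 5 weights per row
print(pruned)
```

Here column 0 holds the largest-magnitude weight in each row but is still pruned because its input feature is nearly silent, something pure magnitude pruning would not do.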