arXiv cs.CL
6/19/2026
Pruning via Causal Attribution Preserves Reasoning Performance in Large Language Models

Short summary

Researchers introduce Causal Attribution Pruning (CAP), a training-free method that identifies the attention heads critical to reasoning and removes the non-critical ones while preserving LLM reasoning performance. At 20% sparsity, CAP achieves up to 61% relative accuracy gains over Wanda on benchmarks such as ARC-Challenge and GSM8K. The technique uses causal attribution to measure each head's functional impact, and it outperforms magnitude-only and activation-based pruning criteria.
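
To make the "relative accuracy gain" figure concrete, here is a minimal sketch of the usual definition: the improvement over the baseline divided by the baseline's accuracy. The function name `relative_gain` and both accuracy values are hypothetical, chosen only so the arithmetic lands on 61%; they are not results from the paper, and the summary does not state which formula the authors use.

```python
def relative_gain(method_acc: float, baseline_acc: float) -> float:
    """Relative improvement of a method over a baseline (0.61 means 61%)."""
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (method_acc - baseline_acc) / baseline_acc


# Hypothetical accuracies for illustration only, NOT figures from the paper.
baseline_acc = 0.300   # assumed accuracy of the baseline pruning method
method_acc = 0.483     # assumed accuracy of the compared pruning method

gain = relative_gain(method_acc, baseline_acc)
absolute_gain = method_acc - baseline_acc

print(f"relative gain: {gain:.0%}, absolute gain: {absolute_gain:.3f}")
```

Under this definition, a large relative gain can correspond to a modest absolute gain when the baseline accuracy is low, which is worth keeping in mind when reading "up to 61%".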

  • CAP measures the causal impact of each attention head on reasoning tasks to guide pruning
  • Achieves up to 61% relative accuracy gains over the Wanda baseline at 20% sparsity
  • Outperforms magnitude-only and activation-based methods on GSM8K, StrategyQA, and ARC-Challenge
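
The summary does not say how CAP computes its attribution scores, so the following is only a sketch of one common way to estimate a head's causal impact: ablate (zero out) the head and measure how much the task loss changes. The toy "model" (a sum of per-head scalar contributions), the data, and the function names `mse_loss`, `causal_head_scores`, and `select_heads_to_prune` are all assumptions made for illustration, not the paper's implementation.

```python
from typing import List, Sequence


def mse_loss(head_outputs: Sequence[Sequence[float]],
             targets: Sequence[float],
             mask: Sequence[float]) -> float:
    """Mean squared error of a toy model whose prediction is the masked sum of head outputs."""
    total = 0.0
    for outputs, y in zip(head_outputs, targets):
        pred = sum(o * m for o, m in zip(outputs, mask))
        total += (pred - y) ** 2
    return total / len(targets)


def causal_head_scores(head_outputs: Sequence[Sequence[float]],
                       targets: Sequence[float]) -> List[float]:
    """Score each head by the loss increase observed when that head alone is ablated."""
    n_heads = len(head_outputs[0])
    full_mask = [1.0] * n_heads
    base = mse_loss(head_outputs, targets, full_mask)
    scores = []
    for h in range(n_heads):
        mask = list(full_mask)
        mask[h] = 0.0  # intervention: remove head h's contribution
        scores.append(mse_loss(head_outputs, targets, mask) - base)
    return scores


def select_heads_to_prune(scores: Sequence[float], sparsity: float) -> List[int]:
    """Return indices of the lowest-impact heads, up to the requested sparsity fraction."""
    n_prune = round(len(scores) * sparsity)
    order = sorted(range(len(scores)), key=lambda h: scores[h])
    return sorted(order[:n_prune])


# Toy data (hypothetical): 3 examples x 5 heads. Heads 0-2 carry the signal,
# head 3 contributes nothing, and head 4 contributes small noise.
head_outputs = [
    [1.0, 0.5, -0.2, 0.0, 0.01],
    [0.8, -0.4, 0.3, 0.0, -0.01],
    [-0.6, 0.7, 0.9, 0.0, 0.02],
]
targets = [sum(row[:3]) for row in head_outputs]

scores = causal_head_scores(head_outputs, targets)
pruned = select_heads_to_prune(scores, sparsity=0.2)
print("scores:", scores)
print("heads pruned at 20% sparsity:", pruned)
```

In this sketch the score reflects what happens to the output when a head is removed, rather than how large its weights or activations are, which is the distinction the summary draws between causal attribution and magnitude-only or activation-based criteria. A real LLM would need a forward pass per intervention (or an approximation of it), which this toy example sidesteps.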

Generated with AI, which can make mistakes.
