Pruning via Causal Attribution Preserves Reasoning Performance in Large Language Models

Short summary

Researchers introduce Causal Attribution Pruning (CAP), a training-free method that identifies critical attention heads and removes non-critical ones while preserving LLM reasoning performance. CAP achieves up to 61% relative accuracy gains over Wanda at 20% sparsity on benchmarks like ARC-Challenge and GSM8K. The technique uses causal attribution to measure functional impact, outperforming magnitude-only and activation-based pruning criteria.
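A "relative accuracy gain" is measured against the baseline's own accuracy, so it is not the same as a gain in percentage points. The short sketch below shows the arithmetic. The function name and the accuracy values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_gain(method_acc: float, baseline_acc: float) -> float:
    """Relative accuracy gain of a method over a baseline, as a fraction."""
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (method_acc - baseline_acc) / baseline_acc


# Hypothetical accuracies (NOT figures from the paper): 40% baseline, 50% method.
gain = relative_gain(method_acc=0.50, baseline_acc=0.40)
print(f"relative gain: {gain:.0%}")  # 10 points absolute is a 25% relative gain
```

Read this way, an "up to 61%" relative gain means the pruned model's accuracy was at most 1.61 times the baseline's accuracy on some benchmark, whatever the absolute numbers were.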

• CAP measures the causal impact of each attention head on reasoning tasks to guide pruning
• Achieves up to 61% relative accuracy gains over the Wanda baseline at 20% sparsity
• Outperforms magnitude-based and activation-based methods on GSM8K, StrategyQA, and ARC-Challenge
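The summary does not say how CAP computes its attribution scores. One common way to estimate a head's causal impact is single-head ablation: switch the head off, measure how much the task loss rises, and prune the heads whose removal hurts least. The sketch below illustrates that general idea on a toy model, assuming each "head" is a plain linear map rather than real attention. All names (`head_outputs`, `model_loss`, `ablation_scores`, `prune_mask`) are hypothetical, and this should not be read as the paper's actual procedure.

```python
import numpy as np


def head_outputs(x, w_heads):
    """Toy stand-in for attention heads: each head is a linear map of x."""
    return np.stack([x @ w for w in w_heads])  # shape (heads, samples, dim)


def model_loss(x, y, w_heads, mask):
    """Mean squared error of the mask-weighted sum of head outputs."""
    out = np.tensordot(mask, head_outputs(x, w_heads), axes=1)
    return float(np.mean((out - y) ** 2))


def ablation_scores(x, y, w_heads):
    """Loss increase caused by switching off each head on its own."""
    n_heads = len(w_heads)
    full = np.ones(n_heads)
    base = model_loss(x, y, w_heads, full)
    scores = np.empty(n_heads)
    for h in range(n_heads):
        mask = full.copy()
        mask[h] = 0.0  # ablate head h only
        scores[h] = model_loss(x, y, w_heads, mask) - base
    return scores


def prune_mask(scores, sparsity):
    """Keep-mask that drops the `sparsity` fraction of lowest-impact heads."""
    n_prune = int(round(sparsity * len(scores)))
    mask = np.ones(len(scores))
    mask[np.argsort(scores)[:n_prune]] = 0.0
    return mask


# Synthetic data: 10 heads with increasing weight scale, so later heads matter more.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))
w_heads = [s * rng.normal(size=(8, 4)) for s in np.linspace(0.01, 1.0, 10)]
y = np.tensordot(np.ones(10), head_outputs(x, w_heads), axes=1)  # full-model target

scores = ablation_scores(x, y, w_heads)
mask = prune_mask(scores, sparsity=0.2)  # 20% sparsity: drop 2 of 10 heads
print("kept heads:", np.flatnonzero(mask))
```

Ablation scores a head by its effect on the model's output, which is what separates it from a magnitude-only criterion: a head with large weights can still receive a low score if removing it barely changes the loss.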

Generated with AI, which can make mistakes.

#research-breakthrough #ai-tools #ai-agents

Read full article at arXiv cs.CL
