MarkTechPost
5/11/2026

Sakana AI and NVIDIA Introduce TwELL with CUDA Kernels for 20.5% Inference and 21.9% Training Speedup in LLMs
Short summary
Sakana AI and NVIDIA show that L1 regularization can push sparsity in LLM feedforward layers above 99% with negligible loss in model performance. They introduce TwELL together with optimized CUDA kernels to turn this sparsity into real GPU throughput gains. The reported result: 20.5% faster inference and 21.9% faster training.
- L1 regularization induces >99% sparsity in feedforward layers with negligible model performance impact
- TwELL CUDA kernels translate sparsity into measurable GPU throughput improvements
- Measured speedups: 20.5% faster inference, 21.9% faster training
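The two claims above (L1 regularization produces sparsity, and kernels turn that sparsity into speed) can be illustrated with a minimal pure-Python sketch. It rests on assumptions the summary does not confirm: the L1 penalty is taken to act on the post-ReLU hidden activations of a feedforward layer, and a plain gather over nonzero indices stands in for the sparse kernels. This is not TwELL and not the authors' CUDA code; every function name here (`l1_penalty`, `sparsity`, `dense_down`, `sparse_down`) is hypothetical.

```python
# Hypothetical sketch: L1 penalty on FFN activations and a sparse down-projection.
# NOT TwELL and NOT the authors' kernels; it only shows why skipping zeros saves work.
import random


def l1_penalty(h, lam):
    # Assumed regularizer form: lam * mean(|h|), which would be added to the task loss.
    return lam * sum(abs(a) for a in h) / len(h)


def sparsity(h):
    # Fraction of hidden units that are exactly zero.
    return sum(1 for a in h if a == 0.0) / len(h)


def dense_down(h, w_down):
    # Dense down-projection: y = sum_i h[i] * w_down[i], visiting every row.
    # Returns the output vector and the number of multiply-accumulates performed.
    d_model = len(w_down[0])
    y = [0.0] * d_model
    macs = 0
    for i, a in enumerate(h):
        for j in range(d_model):
            y[j] += a * w_down[i][j]
            macs += 1
    return y, macs


def sparse_down(h, w_down):
    # Sparse variant: gather (index, value) pairs for nonzero activations,
    # then touch only the matching rows of w_down.
    nz = [(i, a) for i, a in enumerate(h) if a != 0.0]
    d_model = len(w_down[0])
    y = [0.0] * d_model
    macs = 0
    for i, a in nz:
        row = w_down[i]
        for j in range(d_model):
            y[j] += a * row[j]
            macs += 1
    return y, macs


random.seed(0)
d_model, d_ff = 8, 64
# w_down has d_ff rows, each of length d_model.
w_down = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(d_ff)]

# Simulated highly sparse activation vector: only 2 of 64 hidden units are active.
h = [0.0] * d_ff
h[3], h[40] = 1.5, 0.25

y_dense, macs_dense = dense_down(h, w_down)
y_sparse, macs_sparse = sparse_down(h, w_down)

print(sparsity(h))               # 0.96875
print(macs_dense, macs_sparse)   # 512 16
```

Both paths produce the same output, but the sparse path performs work proportional to the number of active units rather than to the full hidden width. At the >99% sparsity the summary reports, that gap is large; on a GPU, realizing it in practice depends on how the nonzeros are stored and scheduled, which is what dedicated kernels address.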