MarkTechPost
5/11/2026
Sakana AI and NVIDIA Introduce TwELL with CUDA Kernels for 20.5% Inference and 21.9% Training Speedup in LLMs


Short summary

Sakana AI and NVIDIA report that L1 regularization can induce more than 99% sparsity in LLM feedforward layers with negligible impact on model performance. Optimized CUDA kernels (TwELL) then translate that sparsity into real GPU throughput gains: 20.5% faster inference and 21.9% faster training.

  • L1 regularization induces >99% sparsity in feedforward layers with negligible model performance impact
  • TwELL CUDA kernels translate sparsity into measurable GPU throughput improvements
  • Measured speedups: 20.5% faster inference, 21.9% faster training
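As a rough illustration of the idea in the points above, the sketch below shows how an L1 penalty can be attached to a feedforward layer and why zeros make the following matrix product cheaper. It is a minimal pure-Python toy, not the TwELL kernels or the authors' training recipe. It assumes the penalty is applied to the hidden activations of a ReLU feedforward layer, which the summary does not specify, and every function name here is hypothetical.

```python
def relu(value):
    """Rectified linear unit: negative inputs become exactly zero."""
    return value if value > 0.0 else 0.0


def ffn_hidden(x, w_in, b_in):
    """Hidden activations of one feedforward layer: relu(w_in @ x + b_in)."""
    return [
        relu(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_in, b_in)
    ]


def l1_penalty(hidden, coeff):
    """L1 term added to the training loss (assumed here to act on activations).

    Minimizing it pushes activations toward exactly zero.
    """
    return coeff * sum(abs(h) for h in hidden)


def sparsity(hidden):
    """Fraction of hidden units that are exactly zero."""
    return sum(1 for h in hidden if h == 0.0) / len(hidden)


def dense_down_projection(hidden, w_out):
    """Reference output projection that visits every hidden unit."""
    d_model = len(w_out[0])
    return [
        sum(h * w_out[j][k] for j, h in enumerate(hidden))
        for k in range(d_model)
    ]


def sparse_down_projection(hidden, w_out):
    """Same result as the dense version, but zero units are skipped.

    With most units at zero, most rows of w_out are never read.
    """
    out = [0.0] * len(w_out[0])
    for j, h in enumerate(hidden):
        if h == 0.0:
            continue  # no contribution, so no work
        for k, w in enumerate(w_out[j]):
            out[k] += h * w
    return out


if __name__ == "__main__":
    x = [1.0, -1.0]
    w_in = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    b_in = [0.0, 0.0, 0.0]
    w_out = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

    hidden = ffn_hidden(x, w_in, b_in)
    print("hidden:", hidden)
    print("sparsity:", sparsity(hidden))
    print("l1 penalty:", l1_penalty(hidden, coeff=0.5))
    print("output:", sparse_down_projection(hidden, w_out))
```

Skipping zeros in a Python loop only shows the arithmetic saving. On a GPU, irregular sparsity does not speed anything up by itself, which is presumably why dedicated CUDA kernels are needed to turn it into measured throughput.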

Generated with AI, which can make mistakes.
