arXiv cs.CL
6/18/2026

Continuous Audio Thinking for Large Audio Language Models
Short summary
Continuous Audio Thinking (CoAT) improves large audio language models by adding a learnable workspace that preserves acoustic information (phonetics, prosody, sound events) before the model generates a response. It relies on distillation from audio experts and adds no decoding cost. CoAT shows measurable gains on three models across audio reasoning, music classification, and speech emotion recognition.
- Introduces CoAT framework that adds a continuous latent workspace to preserve acoustic details typically lost during text-aligned training
- Uses audio expert distillation with no additional autoregressive decoding overhead compared to baseline models
- Demonstrates improvements on Qwen2-Audio, Qwen2.5-Omni-7B, and Audio Flamingo across audio reasoning, music classification, and speech tasks
Generated with AI, which can make mistakes.