arXiv cs.CL
6/17/2026

Are you speaking my languages? On spoken language adherence in multimodal LLMs
Short summary
This arXiv paper proposes three methods to improve language adherence in multilingual speech recognition: zero-shot prompting, supervised fine-tuning (SFT), and chain-of-thought (CoT) reasoning. It introduces a novel metric to quantify language violations and evaluates the trade-offs of each method across languages. The work addresses a critical issue in which ASR models misidentify the output language, affecting transcription fidelity.
- Three mitigation strategies for language adherence in multilingual ASR: zero-shot prompting, SFT, and CoT reasoning
- Novel metric introduced to quantify language violations in speech recognition
- Evaluation across multiple languages with analysis of compute constraints and performance trade-offs
Generated with AI, which can make mistakes.