Back to feed
arXiv cs.CL
arXiv cs.CL
6/17/2026
Are you speaking my languages? On spoken language adherence in multimodal LLMs

Are you speaking my languages? On spoken language adherence in multimodal LLMs

Short summary

arXiv paper proposes three methods to improve language adherence in multilingual speech recognition: zero-shot prompting, supervised fine-tuning, and chain-of-thought reasoning. Introduces novel metric to quantify language violations and evaluates trade-offs across languages. Addresses critical issue where ASR models misidentify output language, affecting transcription fidelity.

  • Three mitigation strategies for language adherence in multilingual ASR: zero-shot prompting, SFT, and CoT reasoning
  • Novel metric introduced to quantify language violation in speech recognition
  • Evaluation across multiple languages with analysis of compute constraints and performance trade-offs

Generated with AI, which can make mistakes.

Is this a good recommendation for you?

Explore more