Are you speaking my languages? On spoken language adherence in multimodal LLMs

Short summary

This arXiv paper proposes three methods to improve language adherence in multilingual speech recognition: zero-shot prompting, supervised fine-tuning (SFT), and chain-of-thought (CoT) reasoning. It introduces a novel metric to quantify language violations and evaluates the trade-offs across languages. The work addresses a critical issue in which ASR models misidentify the output language, which degrades transcription fidelity.

• Three mitigation strategies for language adherence in multilingual ASR: zero-shot prompting, SFT, and CoT reasoning
• A novel metric introduced to quantify language violations in speech recognition
• Evaluation across multiple languages, with analysis of compute constraints and performance trade-offs
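
The summary does not say how the paper's metric is defined. As a rough illustration only, one simple way to quantify language violations is the fraction of transcripts whose output language differs from the expected one. The sketch below assumes that definition; the function names and the idea of taking pre-computed language labels as input are hypothetical and are not taken from the paper.

```python
from collections import Counter


def language_violation_rate(expected, detected):
    """Fraction of utterances whose output language differs from the expected one.

    Both arguments are sequences of language codes (e.g. "en", "de").
    The detected labels are assumed to come from a separate language-ID step.
    This is an assumed definition, not the paper's metric.
    """
    if len(expected) != len(detected):
        raise ValueError("expected and detected must have the same length")
    if not expected:
        return 0.0
    violations = sum(1 for e, d in zip(expected, detected) if e != d)
    return violations / len(expected)


def violations_by_language(expected, detected):
    """Per-language violation rate, keyed by the expected language."""
    totals = Counter(expected)
    misses = Counter(e for e, d in zip(expected, detected) if e != d)
    return {lang: misses[lang] / totals[lang] for lang in totals}


# Toy labels for four utterances (made-up data for illustration).
expected = ["en", "de", "de", "fr"]
detected = ["en", "en", "de", "en"]

overall = language_violation_rate(expected, detected)
per_language = violations_by_language(expected, detected)
```

Breaking the rate down by expected language is one way to surface the cross-language trade-offs the summary mentions, since an aggregate number can hide a language that is almost always transcribed in the wrong language.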

Generated with AI, which can make mistakes.

#research-breakthrough #ai-tools

Read full article at arXiv cs.CL

