CoRA: Confidence-Rationale Alignment for Reliable Chain-of-Thought Reasoning

Short summary

CoRA addresses a critical reliability problem in LLMs: a model can express high confidence in an answer even when its reasoning is incomplete or poorly supported. Using reinforcement learning based on GRPO (Group Relative Policy Optimization) with rubric-based evaluation, the method jointly optimizes answer correctness and rationale quality along three dimensions: grounding, coherence, and task alignment. Reported results show a reduction of up to 26.51% in confidence-rationale misalignment while maintaining competitive accuracy.

• Framework aligns an LLM's answer confidence with the quality of its reasoning rationales
• Reduces confidence-rationale alignment error by up to 26.51% using GRPO reinforcement learning
• Tested on medical, math, and general knowledge tasks, with improved calibration
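
To make the training signal described above more concrete, here is a minimal sketch of how a joint reward and GRPO-style group-relative advantages could be computed. It is an illustration only: the summary does not give the paper's actual reward formula, so the function names, the equal rubric weights, the `answer_weight` blend, and the example scores are all assumptions.

```python
from statistics import mean, pstdev

# Hypothetical weights: the summary names the three rubric dimensions
# but does not say how they are combined.
RUBRIC_WEIGHTS = {"grounding": 1 / 3, "coherence": 1 / 3, "task_alignment": 1 / 3}


def combined_reward(correct, rubric, answer_weight=0.5):
    """Blend answer correctness (0 or 1) with a weighted rubric score in [0, 1].

    answer_weight is an assumed hyperparameter, not a value from the paper.
    """
    rationale = sum(RUBRIC_WEIGHTS[k] * rubric[k] for k in RUBRIC_WEIGHTS)
    return answer_weight * float(correct) + (1.0 - answer_weight) * rationale


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, with made-up rubric scores.
group = [
    (True, {"grounding": 0.9, "coherence": 0.8, "task_alignment": 1.0}),
    (True, {"grounding": 0.2, "coherence": 0.3, "task_alignment": 0.4}),
    (False, {"grounding": 0.7, "coherence": 0.6, "task_alignment": 0.5}),
    (False, {"grounding": 0.1, "coherence": 0.2, "task_alignment": 0.1}),
]
rewards = [combined_reward(c, r) for c, r in group]
advantages = group_relative_advantages(rewards)
```

Under these assumptions, a correct answer with a weak rationale receives a lower advantage than a correct answer with a well-supported one, which is the kind of pressure that would push correctness and rationale quality to improve together.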

Generated with AI, which can make mistakes.


Read full article at arXiv cs.CL
