arXiv cs.CL
6/16/2026

CoRA: Confidence-Rationale Alignment for Reliable Chain-of-Thought Reasoning
Short summary
CoRA addresses a critical LLM reliability issue: models can express high confidence in an answer even when the reasoning behind it is incomplete or poorly supported. Using reinforcement learning based on GRPO (Group Relative Policy Optimization) with rubric-based evaluation, the method jointly optimizes answer correctness and rationale quality, scored on grounding, coherence, and task alignment. The reported results show up to a 26.51% reduction in confidence-rationale misalignment while maintaining competitive accuracy.
- Aligns an LLM's answer confidence with the quality of its reasoning rationale
- Reduces confidence-rationale misalignment by up to 26.51% using GRPO-based reinforcement learning
- Evaluated on medical, math, and general-knowledge tasks, with improved calibration
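To make the joint objective more concrete, the following is a minimal sketch of how a rubric-based reward could feed a GRPO-style update. It is not taken from the paper: the function names, the equal weighting of the three rubric dimensions, the `w_answer` blend, and the use of the population standard deviation are all assumptions chosen for illustration. The only part that reflects GRPO as generally described is the group-relative step, where each sampled response's reward is standardized against the other responses drawn for the same prompt.

```python
from statistics import mean, pstdev


def rubric_reward(correct, grounding, coherence, task_alignment, w_answer=0.5):
    """Hypothetical scalar reward for one sampled response.

    Blends answer correctness (bool) with the mean of three rubric
    scores, each assumed to lie in [0, 1]. The weighting is an
    illustrative assumption, not the paper's formulation.
    """
    rationale = (grounding + coherence + task_alignment) / 3.0
    return w_answer * float(correct) + (1.0 - w_answer) * rationale


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for one group of responses to the same prompt.

    Each reward is centered on the group mean and scaled by the group
    standard deviation (population std here, an assumption).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three hypothetical responses to one prompt:
# (correct, grounding, coherence, task_alignment)
group = [
    (True, 0.9, 0.9, 0.9),   # right answer, well-supported rationale
    (True, 0.3, 0.3, 0.3),   # right answer, weak rationale
    (False, 0.6, 0.6, 0.6),  # wrong answer, moderate rationale
]
rewards = [rubric_reward(*g) for g in group]
advantages = group_relative_advantages(rewards)
```

Under these assumed weights, a correct answer with a weak rationale earns a lower advantage than a correct answer with a well-supported one, which is the kind of pressure the summary describes: correctness alone is not enough to be reinforced.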