arXiv cs.CL
6/16/2026
CoRA: Confidence-Rationale Alignment for Reliable Chain-of-Thought Reasoning

Short summary

CoRA addresses a critical LLM reliability issue: models can express high confidence in an answer even when the reasoning behind it is incomplete or poorly supported. Using GRPO-based reinforcement learning with rubric-based evaluation of rationales, the method jointly optimizes answer correctness and rationale quality along three dimensions: grounding, coherence, and task alignment. Results show a reduction of up to 26.51% in confidence-rationale misalignment while maintaining competitive accuracy.

  • Framework aligns LLM answer confidence with quality of reasoning rationales
  • Reduces confidence-rationale alignment error by up to 26.51% using GRPO-based reinforcement learning
  • Evaluated on medical, mathematical, and general-knowledge tasks, with improved calibration
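
This summary does not give CoRA's actual reward formula, so the following is only a minimal sketch of how a GRPO-style setup could combine answer correctness with rubric scores. Everything about the reward is an assumption for illustration: the `Sample` fields, the equal weighting of the three rubric dimensions, the hypothetical `w_answer` mixing weight, and the function names are not taken from the paper. The group-relative advantage (each sampled completion's reward normalized by the mean and standard deviation of its group) follows the standard GRPO formulation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Sample:
    """One sampled completion for a prompt (hypothetical fields)."""
    correct: bool          # does the final answer match the reference?
    grounding: float       # rubric score in [0, 1]
    coherence: float       # rubric score in [0, 1]
    task_alignment: float  # rubric score in [0, 1]


def composite_reward(s: Sample, w_answer: float = 0.5) -> float:
    """Hypothetical reward: weighted mix of correctness and mean rubric score."""
    rationale = (s.grounding + s.coherence + s.task_alignment) / 3
    return w_answer * float(s.correct) + (1 - w_answer) * rationale


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: (r_i - group mean) / (group population std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group of four completions sampled for the same prompt (made-up scores).
group = [
    Sample(True, 0.9, 0.8, 0.9),   # correct, well-supported rationale
    Sample(True, 0.3, 0.4, 0.5),   # correct, weak rationale
    Sample(False, 0.7, 0.7, 0.6),  # wrong, plausible rationale
    Sample(False, 0.2, 0.1, 0.3),  # wrong, weak rationale
]
rewards = [composite_reward(s) for s in group]
advs = group_relative_advantages(rewards)
for s, r, a in zip(group, rewards, advs):
    print(f"correct={s.correct!s:<5} reward={r:.3f} advantage={a:+.3f}")
```

Under these assumed weights, a correct answer with a weak rationale still earns a lower advantage than a correct answer with a well-grounded one, which is the kind of pressure a joint correctness-plus-rationale objective is meant to apply. The real method may weight, gate, or combine the signals differently.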

Generated with AI, which can make mistakes.