arXiv cs.LG
6/18/2026
CODEBLOCK: Learning to Supervise Code at the Right Granularity

Short summary

CodeBlock is a training framework for supervised fine-tuning on code that supervises selected structure-complete code blocks rather than all tokens uniformly. It supervises only 1.9% of response tokens while maintaining full context, and reports competitive or stronger performance on six code-generation benchmarks. The method uses data-flow analysis to prioritize blocks that contain important program dependencies.
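To make "supervising only some response tokens while maintaining full context" concrete, here is a minimal sketch of sparse label construction. It is not taken from the paper: `build_sparse_labels`, `supervised_fraction`, and the `block_spans` input are hypothetical names, and the spans are assumed to come from some block selector. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss, used here only as a conventional "no loss" marker.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def build_sparse_labels(token_ids, block_spans):
    """Keep labels only inside the selected spans; mask everything else.

    token_ids:   full token sequence; it is still fed to the model unchanged,
                 so the context stays complete.
    block_spans: half-open (start, end) token ranges, assumed to be produced
                 by a hypothetical block selector.
    """
    labels = [IGNORE_INDEX] * len(token_ids)
    for start, end in block_spans:
        if not (0 <= start <= end <= len(token_ids)):
            raise ValueError(f"span {(start, end)} is out of range")
        labels[start:end] = token_ids[start:end]
    return labels


def supervised_fraction(labels):
    """Fraction of positions that would contribute to the loss."""
    if not labels:
        return 0.0
    kept = sum(1 for label in labels if label != IGNORE_INDEX)
    return kept / len(labels)
```

For a ten-token sequence with one selected span covering two tokens, `supervised_fraction` returns 0.2; a figure such as 1.9% would correspond to much sparser spans over longer responses.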

  • Structure-aware sparse supervision selects coherent code blocks instead of individual tokens
  • Achieves competitive or stronger performance while supervising only 1.9% of response tokens
  • Uses data-flow analysis to prioritize blocks that propagate important program dependencies
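The idea of ranking blocks by the dependencies they carry can be illustrated with a toy scorer built on Python's standard `ast` module. This is an assumption-laden stand-in, not the paper's analysis: it treats each top-level statement as a "block", counts def-use links between statements by variable name only, and ignores redefinitions, scoping, and control flow. `score_blocks` and `top_blocks` are hypothetical names.

```python
import ast


def _names(node, ctx_type):
    """Variable names read (ast.Load) or written (ast.Store) under node."""
    return {
        n.id
        for n in ast.walk(node)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ctx_type)
    }


def score_blocks(source):
    """Score each top-level statement by its def-use links to other statements.

    Returns (score, first_line, last_line) tuples in source order.
    Simplification: a link is counted whenever names overlap, even if the
    name was reassigned in between.
    """
    stmts = ast.parse(source).body
    defs = [_names(s, ast.Store) for s in stmts]
    uses = [_names(s, ast.Load) for s in stmts]
    scores = []
    for i, stmt in enumerate(stmts):
        # Later statements that read a name this statement writes.
        outgoing = sum(1 for j in range(i + 1, len(stmts)) if defs[i] & uses[j])
        # Earlier statements that write a name this statement reads.
        incoming = sum(1 for j in range(i) if defs[j] & uses[i])
        scores.append((outgoing + incoming, stmt.lineno, stmt.end_lineno))
    return scores


def top_blocks(source, k):
    """Line ranges of the k highest-scoring statements, in source order."""
    ranked = sorted(score_blocks(source), key=lambda t: -t[0])[:k]
    return sorted((first, last) for _, first, last in ranked)
```

On a snippet where one line only prints a constant, that line scores zero and is the first to be dropped, while assignments whose values feed later statements are kept.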

Generated with AI, which can make mistakes.
