arXiv cs.LG
6/18/2026

CodeBlock: Learning to Supervise Code at the Right Granularity
Short summary
CodeBlock is a training framework that, instead of supervising all response tokens uniformly during supervised fine-tuning, selects structure-complete code blocks and applies the loss only to them. By supervising only 1.9% of response tokens while keeping the full context visible to the model, it achieves competitive or stronger performance on six code-generation benchmarks. The method uses data-flow analysis to prioritize blocks that contain important program dependencies.
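The sparse-supervision idea can be sketched as a loss mask: the model still reads every token, but only tokens inside the selected blocks contribute to the loss. This is a minimal sketch, assuming block boundaries are already known as half-open token spans. The helper names (`sparse_labels`, `masked_mean_loss`, `supervised_fraction`) are hypothetical, and the `-100` label follows the ignore-index convention of PyTorch's cross-entropy loss rather than anything stated in the summary.

```python
from typing import List, Tuple

# Hypothetical helpers for illustration; not the CodeBlock implementation.
IGNORE_INDEX = -100  # label value excluded from the loss (PyTorch cross-entropy default)


def sparse_labels(token_ids: List[int], blocks: List[Tuple[int, int]]) -> List[int]:
    """Return loss targets that keep only tokens inside half-open [start, end) blocks.

    The input sequence itself is unchanged, so the model still conditions on the
    full context; only the positions that receive a loss are sparsified.
    """
    labels = [IGNORE_INDEX] * len(token_ids)
    for start, end in blocks:
        for i in range(max(start, 0), min(end, len(token_ids))):
            labels[i] = token_ids[i]
    return labels


def masked_mean_loss(per_token_loss: List[float], labels: List[int]) -> float:
    """Average the per-token loss over supervised positions only."""
    kept = [loss for loss, lab in zip(per_token_loss, labels) if lab != IGNORE_INDEX]
    return sum(kept) / len(kept) if kept else 0.0


def supervised_fraction(labels: List[int]) -> float:
    """Share of positions that contribute to the loss."""
    if not labels:
        return 0.0
    return sum(lab != IGNORE_INDEX for lab in labels) / len(labels)


if __name__ == "__main__":
    tokens = list(range(100, 120))                 # 20 made-up token ids
    labels = sparse_labels(tokens, [(4, 8), (15, 17)])
    print(supervised_fraction(labels))             # 6 of 20 positions supervised
```

In a real trainer the full `input_ids` would still be passed as model input alongside these labels, so the context is unchanged. The 1.9% figure comes from the paper; the token ids and spans here are arbitrary.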
- Structure-aware sparse supervision selects coherent code blocks rather than individual tokens
- Achieves competitive or stronger performance while supervising only 1.9% of response tokens
- Uses data-flow analysis to prioritize blocks that propagate important program dependencies
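How data-flow analysis might rank blocks can be illustrated with Python's standard `ast` module. This toy sketch assumes a "block" is one top-level statement of a function body (so each block is syntactically complete) and scores a block by how many later blocks read a variable it assigns. This def-use counting heuristic and the names `block_scores` and `select_blocks` are assumptions for illustration, not the paper's scoring rule.

```python
import ast
from typing import List, Set, Tuple

# Hypothetical toy heuristic; not the scoring rule used by CodeBlock.

def _def_use(stmt: ast.stmt) -> Tuple[Set[str], Set[str]]:
    """Collect variable names a statement assigns (defs) and reads (uses)."""
    defs: Set[str] = set()
    uses: Set[str] = set()
    for node in ast.walk(stmt):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                uses.add(node.id)
        # `x += 1` both reads and writes x, but its target has Store context only.
        if isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Name):
            uses.add(node.target.id)
    return defs, uses


def block_scores(source: str) -> List[Tuple[int, int, int]]:
    """Score each top-level statement of the first function in `source`.

    A block's score is the number of later blocks that read a name it assigns.
    Returns (start_line, end_line, score) tuples in source order.
    """
    tree = ast.parse(source)
    func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))
    info = [(stmt, *_def_use(stmt)) for stmt in func.body]
    scores = []
    for i, (stmt, defs, _) in enumerate(info):
        downstream = sum(1 for _, _, uses in info[i + 1:] if defs & uses)
        scores.append((stmt.lineno, stmt.end_lineno, downstream))
    return scores


def select_blocks(source: str, k: int) -> List[Tuple[int, int]]:
    """Keep the k highest-scoring blocks (ties broken by position), in source order."""
    ranked = sorted(block_scores(source), key=lambda b: (-b[2], b[0]))
    return sorted((start, end) for start, end, _ in ranked[:k])


EXAMPLE = """
def f(xs):
    total = 0
    for x in xs:
        total += x
    log = "done"
    mean = total / len(xs)
    return mean
"""

if __name__ == "__main__":
    print(block_scores(EXAMPLE))
    print(select_blocks(EXAMPLE, k=2))
```

In the example, the accumulator initialization `total = 0` ranks highest, while the dead assignment `log = "done"` scores zero. Counting only outgoing dependencies also gives sink statements such as `return mean` a score of zero, which a real scoring rule would need to handle.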
Generated with AI, which can make mistakes.