How a $0.02/Call Model Scored 78.2% on SWE-bench Verified — Beating Every Model on the Leaderboard

Short summary

Xanther's Context Engine (MCP-based) enables MiniMax M2.5 ($0.02 per call) to reach 78.2% on SWE-bench Verified, ahead of Claude Opus 4.5 at 76.8% ($0.75). The improvement is attributed entirely to architectural context injection rather than to model capability. Gains grow with codebase complexity (sympy +17%, pytest +8%), and the article puts the cost of reaching 76%+ performance at 3.4x lower.

• MiniMax M2.5 + Xanther Context Engine scores 78.2% on SWE-bench Verified and beats Claude Opus 4.5 (76.8%) at 3.4x lower cost
• The context advantage is architectural: understanding codebase dependencies and inheritance chains enables better bug fixes
• The benefit scales with complexity: deep-dependency codebases (sympy +17%) gain more than flat ones (pytest +8%)
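
The headline numbers can be sanity-checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the summary; the variable names are illustrative, and it assumes the $0.75 quoted for Claude Opus 4.5 is also a per-call price, which the summary does not state. The summary likewise does not say how its 3.4x cost figure was derived, so the per-call ratio computed here is a separate quantity and need not match it.

```python
# Figures quoted in the summary (variable names are illustrative).
minimax_score = 78.2   # % on SWE-bench Verified, MiniMax M2.5 + Context Engine
opus_score = 76.8      # % on SWE-bench Verified, Claude Opus 4.5
minimax_price = 0.02   # USD per call
opus_price = 0.75      # USD; assumed to be per call (not stated in the summary)

# Gap between the two scores, in percentage points.
score_gap = round(minimax_score - opus_score, 1)

# Ratio of the two quoted prices, taken at face value.
per_call_ratio = round(opus_price / minimax_price, 1)

print(f"score gap: {score_gap} percentage points")
print(f"per-call price ratio: {per_call_ratio}x")
```

Comparing cost per resolved task instead of cost per call would also require the number of calls each setup makes per task, which the summary does not give.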

Generated with AI, which can make mistakes.

#ai-tools #ai-agents #research-breakthrough #product-launch #market-trend

Read full article at Dev.to

