arXiv cs.CL
6/16/2026
Are Online Skill and Memory Modules Always Worth Their Tokens? A Budget-Constrained Study of Web Agents

Short summary

This study compares augmented web agents (those with memory, workflow, or skill modules) against budget-matched vanilla baselines. Across three LLMs and multiple domains, vanilla actors often match or exceed the augmented agents while using fewer total tokens. The findings suggest that apparent gains from augmentation frequently vanish under real token constraints.

  • Augmented agents don't consistently outperform vanilla baselines when token budgets are matched
  • Study spans Gemini Flash, GPT-5.4-mini, and Qwen 3.6-27B across WebArena and WorkArena tasks
  • Run-to-run variance is material and should be reported as a core evaluation criterion
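
The points above can be made concrete with a minimal sketch of a budget-matched comparison. This summary does not describe the paper's actual matching protocol, so everything here is an assumption for illustration: the `summarize` helper, the per-task token cap, the rule that a task counts as solved only if it succeeds within the cap, and all of the numbers, which are made up.

```python
from statistics import mean, stdev

def summarize(runs, budget):
    """Return (mean success rate, run-to-run standard deviation).

    runs: list of runs; each run is a list of (succeeded, tokens_used) per task.
    budget: hypothetical per-task token cap. A task counts as solved only if
    it succeeded without exceeding the cap.
    """
    rates = []
    for run in runs:
        solved = [1.0 if ok and tokens <= budget else 0.0 for ok, tokens in run]
        rates.append(mean(solved))
    # Sample standard deviation needs at least two runs.
    spread = stdev(rates) if len(rates) > 1 else 0.0
    return mean(rates), spread

# Made-up results: two runs of three tasks for each agent type.
vanilla = [
    [(True, 800), (False, 900), (True, 700)],
    [(True, 850), (True, 950), (False, 600)],
]
augmented = [
    [(True, 1500), (True, 900), (False, 1200)],
    [(True, 1000), (True, 1600), (True, 800)],
]

for name, runs in [("vanilla", vanilla), ("augmented", augmented)]:
    capped = summarize(runs, budget=1000)      # same token cap for both agents
    uncapped = summarize(runs, budget=10**9)   # effectively no cap
    print(name, "capped:", capped, "uncapped:", uncapped)
```

In this toy data the augmented agent looks better without a cap but falls behind the vanilla agent once both are held to the same cap, and its success rate also varies more from run to run. Reporting the spread next to the mean makes that variance visible.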

Generated with AI, which can make mistakes.
