Are Online Skill and Memory Modules Always Worth Their Tokens? A Budget-Constrained Study of Web Agents

Short summary

The study compares web agents augmented with memory, workflow, or skill modules against vanilla baselines given a matched token budget. Across three LLMs and multiple domains, vanilla actors often match or exceed the augmented agents' performance while using fewer total tokens. The findings suggest that apparent gains from augmentation frequently vanish under real token constraints.
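A budget-matched comparison has one key accounting rule. Tokens spent by the augmentation modules count against the same budget as the actor's own tokens. The summary does not describe the paper's exact accounting, so the sketch below is only an illustration under that assumption. `AgentRun`, `compare_under_budget`, and the split into `actor_tokens` and `module_tokens` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """Hypothetical tally for one agent configuration over one evaluation run."""
    successes: int
    tasks: int
    actor_tokens: int       # tokens consumed by the acting LLM
    module_tokens: int = 0  # tokens consumed by memory/workflow/skill modules (0 for vanilla)

    @property
    def total_tokens(self) -> int:
        # Assumption: module overhead is charged to the same budget as acting.
        return self.actor_tokens + self.module_tokens

    @property
    def success_rate(self) -> float:
        return self.successes / self.tasks


def compare_under_budget(augmented: AgentRun, vanilla: AgentRun) -> dict:
    """Compare two configurations only when the vanilla run fits within the augmented budget."""
    if vanilla.total_tokens > augmented.total_tokens:
        raise ValueError("vanilla run exceeds the augmented token budget; not budget-matched")
    return {
        # Positive gap: augmentation helped; zero or negative: it did not pay for its tokens.
        "success_gap": augmented.success_rate - vanilla.success_rate,
        # Tokens the vanilla actor left unspent relative to the augmented agent.
        "token_savings": augmented.total_tokens - vanilla.total_tokens,
    }
```

Without the module overhead term, an augmented agent can look cheaper than it is. That hidden cost is exactly what a budget-matched baseline is meant to expose.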

•Augmented agents don't consistently outperform vanilla baselines when token budgets are matched
•Study spans Gemini Flash, GPT-5.4-mini, and Qwen 3.6-27B across WebArena and WorkArena tasks
•Run-to-run variance is material and should be reported as a core evaluation criterion
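To report run-to-run variance, you can repeat each configuration several times and report the mean with the sample standard deviation, rather than a single score. The minimal sketch below assumes one success rate per repeated run. `summarize_runs` and `gap_within_noise` are hypothetical helpers. The second is a crude heuristic, not a significance test, and not the paper's method.

```python
import statistics


def summarize_runs(success_rates: list[float]) -> str:
    """Format repeated-run success rates as mean ± sample standard deviation."""
    if len(success_rates) < 2:
        raise ValueError("need at least two runs to estimate run-to-run variance")
    mean = statistics.mean(success_rates)
    std = statistics.stdev(success_rates)  # sample (n - 1) standard deviation
    return f"{mean:.1%} ± {std:.1%} (n={len(success_rates)})"


def gap_within_noise(a: list[float], b: list[float]) -> bool:
    """Crude heuristic: is the gap between mean success rates smaller than the larger stdev?"""
    gap = abs(statistics.mean(a) - statistics.mean(b))
    return gap < max(statistics.stdev(a), statistics.stdev(b))
```

A single-run comparison would hide the case where an apparent gain from augmentation is smaller than the spread between repeated runs of the same agent.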

Generated with AI, which can make mistakes.

Read full article at arXiv cs.CL
