arXiv CS.AI
5/12/2026

MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs
Short summary
MemQ applies TD(λ) eligibility traces from reinforcement learning to episodic memory in LLM agents, propagating credit through provenance DAGs that record which memories enabled later memories. Evaluated on six benchmarks, it achieves the highest success rates among the compared baselines, with the largest gains (+5.7pp) on multi-step problems that require deep memory chains. Code will be released soon.
- Novel method (MemQ) for credit propagation through memory dependencies using TD(λ) traces
- Outperforms baselines across 6 benchmarks including code generation, OS interaction, and QA
- Largest improvements (+5.7pp) on multi-step tasks with deep provenance chains
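
The summary does not give MemQ's actual update rule, so the following is only a rough sketch of how TD(λ)-style credit might flow backward through a provenance DAG. Everything in it is an assumption made for illustration: the function name `propagate_credit`, the `parents` mapping, the per-hop decay of `gamma * lam`, the max-over-paths trace, and the single terminal TD error are not taken from the paper.

```python
from collections import defaultdict


def propagate_credit(parents, q, used, reward, alpha=0.1, gamma=0.9, lam=0.8):
    """Illustrative TD(lambda)-style update over a provenance DAG (not MemQ itself).

    parents: dict, memory id -> ids of the memories it was derived from
    q:       dict, memory id -> value estimate (updated in place)
    used:    memory ids retrieved in the episode that just ended
    reward:  scalar outcome of the episode (e.g. 1.0 for success)
    Returns the eligibility trace assigned to each reached memory.
    """
    used = set(used)
    if not used:
        return {}

    # Eligibility is 1.0 for memories used directly and shrinks by
    # gamma * lam for every provenance hop back toward their ancestors.
    trace = defaultdict(float)
    frontier = {m: 1.0 for m in used}
    while frontier:
        next_frontier = {}
        for m, e in frontier.items():
            if e <= trace[m]:
                continue  # already reached with an equal or stronger trace
            trace[m] = e
            for p in parents.get(m, ()):
                decayed = e * gamma * lam
                if decayed > next_frontier.get(p, 0.0):
                    next_frontier[p] = decayed
        frontier = next_frontier

    # Assumed terminal TD error: reward minus the mean value of the
    # directly used memories (no bootstrapping term at episode end).
    delta = reward - sum(q.get(m, 0.0) for m in used) / len(used)
    for m, e in trace.items():
        q[m] = q.get(m, 0.0) + alpha * delta * e
    return dict(trace)


# Hypothetical chain: memory "a" enabled "b", which enabled "c".
parents = {"c": ["b"], "b": ["a"], "a": []}
q = {"a": 0.0, "b": 0.0, "c": 0.0}
trace = propagate_credit(parents, q, used=["c"], reward=1.0,
                         alpha=0.5, gamma=1.0, lam=0.5)
```

In this sketch a memory several hops upstream of the one actually retrieved still receives some credit, only less of it, which is the intuition behind the reported gains on tasks with deep provenance chains.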
Generated with AI, which can make mistakes.