arXiv cs.CL
6/16/2026
PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions

Short summary

PhoneHarness is a new framework and benchmark for evaluating phone-use agents that blend GUI taps, command-line actions, and structured tool calls. Rather than scoring agents on whether their actions look plausible, the benchmark verifies the actual side effects of each run. PhoneHarness reaches a 75% pass rate, 12.9 points above prior methods. The research demonstrates that reliable mobile automation needs verifiable execution routing, not just visual control.

  • PhoneHarness framework combines GUI, CLI, and tool actions for realistic mobile workflows
  • Benchmark evaluates agents by verifiable side effects, not just screen predictions
  • Achieves 75% pass rate, significantly outperforming non-harness baselines
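
To make "evaluating by verifiable side effects" concrete, here is a minimal Python sketch of the idea, not the PhoneHarness implementation. Every name in it (`Action`, `Task`, `apply_action`, `run_and_verify`, `pass_rate`) and the dictionary-based device state are assumptions made up for illustration. It routes GUI, CLI, and tool actions into one simulated state and marks a task as passed only if the final state satisfies a check.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration only; not the PhoneHarness API.
State = Dict[str, str]


@dataclass
class Action:
    channel: str                      # "gui", "cli", or "tool"
    name: str
    args: Dict[str, str] = field(default_factory=dict)


@dataclass
class Task:
    description: str
    check: Callable[[State], bool]    # inspects the final device state


def apply_action(state: State, action: Action) -> State:
    """Toy executor: each channel mutates the same simulated device state."""
    new_state = dict(state)
    if action.channel == "gui" and action.name == "toggle":
        key = action.args["setting"]
        new_state[key] = "off" if new_state.get(key) == "on" else "on"
    elif action.channel == "cli" and action.name == "set":
        new_state[action.args["key"]] = action.args["value"]
    elif action.channel == "tool" and action.name == "create_note":
        new_state["note:" + action.args["title"]] = action.args["body"]
    return new_state


def run_and_verify(task: Task, actions: List[Action], initial: State) -> bool:
    """Run the whole trajectory, then judge only the resulting side effects."""
    state = dict(initial)
    for action in actions:
        state = apply_action(state, action)
    return task.check(state)


def pass_rate(results: List[bool]) -> float:
    """Percentage of runs whose side-effect check succeeded."""
    return 100.0 * sum(results) / len(results) if results else 0.0


task = Task(
    description="Turn on Wi-Fi and save a note titled 'todo'",
    check=lambda s: s.get("wifi") == "on" and s.get("note:todo") == "buy milk",
)
initial = {"wifi": "off"}

# A trajectory that mixes a GUI action with a tool call and completes the task.
complete = [
    Action("gui", "toggle", {"setting": "wifi"}),
    Action("tool", "create_note", {"title": "todo", "body": "buy milk"}),
]
# A trajectory that looks reasonable on screen but never saves the note.
incomplete = [Action("gui", "toggle", {"setting": "wifi"})]

results = [
    run_and_verify(task, complete, initial),
    run_and_verify(task, incomplete, initial),
]
print(results, pass_rate(results))
```

In this toy setup the second trajectory fails even though its only step is sensible, because the check looks at what changed on the device rather than at how plausible the steps were.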

