arXiv cs.CL
6/16/2026

PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions
Short summary
PhoneHarness is a new framework and benchmark for evaluating phone-use agents that blend GUI taps, command-line actions, and structured tool calls. Rather than scoring agents on how plausible their actions look, it verifies the side effects those actions actually produce. PhoneHarness reaches a 75% pass rate, 12.9 points above prior methods. The research demonstrates that reliable mobile automation needs verifiable execution routing, not just visual control.
- PhoneHarness framework combines GUI, CLI, and tool actions for realistic mobile workflows
- Benchmark evaluates agents by verifiable side effects, not just screen predictions
- Achieves 75% pass rate, significantly outperforming non-harness baselines
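
The idea of scoring by verifiable side effects can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `FakePhone`, `Action`, `ROUTES`, `execute`, `pass_rate`, and the example tasks are hypothetical names, not the PhoneHarness API, and the in-memory state stands in for a real device.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Action:
    channel: str  # hypothetical channel names: "gui", "cli", or "tool"
    payload: str


@dataclass
class FakePhone:
    """Toy in-memory device state standing in for a real phone."""
    alarms: List[str] = field(default_factory=list)
    wifi_on: bool = False
    notes: List[str] = field(default_factory=list)


def run_gui(phone: FakePhone, payload: str) -> None:
    # Pretend a sequence of taps created an alarm at the given time.
    phone.alarms.append(payload)


def run_cli(phone: FakePhone, payload: str) -> None:
    # Pretend a shell command toggled Wi-Fi.
    phone.wifi_on = payload == "wifi enable"


def run_tool(phone: FakePhone, payload: str) -> None:
    # Pretend a structured tool call saved a note.
    phone.notes.append(payload)


# Route each action to the handler for its channel.
ROUTES: Dict[str, Callable[[FakePhone, str], None]] = {
    "gui": run_gui,
    "cli": run_cli,
    "tool": run_tool,
}


def execute(phone: FakePhone, actions: List[Action]) -> None:
    for action in actions:
        ROUTES[action.channel](phone, action.payload)


Task = Tuple[List[Action], Callable[[FakePhone], bool]]


def pass_rate(tasks: List[Task]) -> float:
    """Fraction of tasks whose check holds on the final device state."""
    passed = 0
    for actions, check in tasks:
        phone = FakePhone()  # fresh state per task
        execute(phone, actions)
        passed += check(phone)
    return passed / len(tasks)


# Illustrative tasks only; the third fails because the alarm time is wrong.
TASKS: List[Task] = [
    ([Action("cli", "wifi enable")], lambda p: p.wifi_on),
    ([Action("tool", "buy milk")], lambda p: "buy milk" in p.notes),
    ([Action("gui", "07:30")], lambda p: "07:00" in p.alarms),
]
```

Each check inspects the resulting state, so an action trace that looks reasonable but sets the wrong alarm still counts as a failure.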