Dev.to
5/12/2026
The AI audit rep-curve: why 1 run gives you 67 percent reliability

Short summary

A single-run AI engine audit matches the 5-rep aggregate only 67% of the time, because of engine non-determinism and retrieval volatility. Running 5 reps brings consistency to 95%+. Report confidence intervals and a detailed methodology instead of point estimates.

  • 1-rep audits match the 5-rep aggregate only 67% of the time; reliability reaches 95% by the 4th rep
  • Engine non-determinism, retrieval volatility, and prompt phrasing variance make outputs unstable from run to run
  • Adopt a 5-rep minimum, report confidence intervals, and document the methodology to avoid presenting noise as signal
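
As a rough illustration of the "report confidence intervals" advice, here is a minimal Python sketch. It assumes each rep of an audit is reduced to a single yes/no observation (for example, "was the brand cited in the answer?"), which is a simplification of my own. The Wilson score interval is one common choice for a binomial proportion, not necessarily the method the article uses, and the names `wilson_interval` and `summarize_reps` are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("need at least one rep")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


def summarize_reps(reps: list[bool]) -> dict:
    """Aggregate repeated yes/no audit observations into a rate plus interval.

    `reps` holds one boolean per rep (hypothetical encoding: True means the
    audited outcome was observed in that run).
    """
    n = len(reps)
    hits = sum(reps)
    low, high = wilson_interval(hits, n)
    return {
        "reps": n,
        "rate": hits / n,
        "ci95": (low, high),
        "majority": hits * 2 > n,  # simple majority vote across reps
    }


if __name__ == "__main__":
    # Illustrative data only, not results from the article.
    print(summarize_reps([True]))
    print(summarize_reps([True, True, False, True, True]))
```

With a single rep, the interval spans most of the possible range, which is one way to see why a 1-rep point estimate is hard to trust. The interval narrows as reps are added.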

Generated with AI, which can make mistakes.
