Dev.to
6/18/2026

How I Built an Autonomous Incident Investigation Agent That Reduced MTTR by 65%
Short summary
An engineer built FRIDAY, an autonomous incident investigation agent reducing MTTR by 65% on a 30M-user platform. The agent uses Claude Opus to correlate GitHub changes with Datadog signals and deliver synthesized findings to Teams in under 2 minutes. A dual-Lambda architecture circumvents API Gateway timeouts while a tool-use loop prioritizes code changes over raw observability data.
- •FRIDAY autonomously investigates PagerDuty incidents by checking GitHub changes first, then Datadog signals, delivering analysis in under 2 minutes
- •Production deployment shows 65% MTTR reduction with high root-cause accuracy across 30M users
- •Dual-Lambda pattern (sync + async) and region-locking prevent false positives and API Gateway timeouts
Generated with AI, which can make mistakes.
Is this a good recommendation for you?



