Back to feed
Dev.to
Dev.to
5/8/2026
AI Firewall learns to defend

AI Firewall learns to defend

Original: How I Built a Red/Blue Team Loop That Teaches My AI Firewall to Defend Itself

Short summary

Build an automated adversarial testing loop where Claude Haiku generates novel prompt injection attacks nightly while your firewall defends against them. Escaped attacks trigger new detection signatures proposed by an analysis step, reviewed by humans before deployment. This feedback loop keeps security rules ahead of evolving attack techniques without exposing unvetted signatures.

  • Red team (Claude Haiku) generates 10 novel attack payloads nightly, avoiding techniques already covered by existing signatures
  • Blue team tests attacks against live firewall in strict mode; escaped attacks below threat threshold are analyzed for new signatures
  • Escaped attack patterns trigger signature proposals checked for novelty via pgvector, then queued for human admin review before going live

Generated with AI, which can make mistakes.

Is this a good recommendation for you?

Explore more