Manual regression doesn't scale. Every ticket on staging needed human eyes for API contracts, UI flows, or both — hours of repetitive click-through that crowded out test strategy, edge-case hunting, and the work that actually moves quality forward. Meanwhile, AI agents were getting good enough to matter.
A multi-headed QA agent built to meet the AI-agent era head-on: Jira ticket in, validated result out.
A multi-headed QA agent, triggered by a ticket.
Hydra is a QA agent I proposed and the QA team I lead built on Claude and Codex, our answer to the AI-agent era reshaping our craft. Give it a Jira ticket key and it routes the ticket to the right validation head (backend, frontend, or hybrid), runs the checks against staging, captures evidence, and publishes the bundle back to reviewers. It's live, it's evolving, and it's freeing the team for the work only humans can do.
Nobody assigned this. I pitched and designed the approach, and the team built a Jira-driven agent that takes over the repetitive half of validation so humans can focus on the judgment half. Trigger it with a ticket key; it decides what to test, runs the checks, captures the evidence, and hands back a clean report. It's built on Claude Code's skills model, so its behavior is readable, testable, and improves every sprint.
Three heads, one body.
The agent routes every ticket to the right verification head. Each head has its own playbook, its own tooling, and its own idea of what 'done' looks like — but they all share the same spine.
Backend Head
API, auth, DB, contracts.
Endpoint verification, validation rules, authentication flows, database assertions, and contract testing. Runs against staging with the ticket's acceptance criteria as a spec.
Frontend Head
Components, interaction, permissions, responsive.
UI component verification, interaction flows, permission boundaries, and responsive behavior. Drives the app the way a user would — but with perfect memory of every ticket that came before.
Hybrid Head
Full-stack tickets end-to-end.
For tickets that cross the stack, the orchestrator routes to both heads and stitches the evidence together. One command, two verification paths, one report.
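The routing idea can be sketched in a few lines. This is a minimal illustration, not Hydra's implementation: the keyword heuristics, the `Head` enum, and the `route` function are all hypothetical stand-ins, and the real orchestrator would read the full ticket (labels, components, acceptance criteria) rather than raw text.

```python
from enum import Enum, auto

class Head(Enum):
    BACKEND = auto()
    FRONTEND = auto()
    HYBRID = auto()

# Hypothetical keyword heuristics; a real router would use richer
# signals than a bag of words from the ticket summary.
BACKEND_HINTS = {"api", "endpoint", "auth", "db", "contract"}
FRONTEND_HINTS = {"ui", "component", "screen", "responsive"}

def route(ticket_text: str) -> Head:
    """Pick the verification head for a ticket summary."""
    words = set(ticket_text.lower().split())
    backend = bool(words & BACKEND_HINTS)
    frontend = bool(words & FRONTEND_HINTS)
    if backend and frontend:
        return Head.HYBRID
    if backend:
        return Head.BACKEND
    if frontend:
        return Head.FRONTEND
    return Head.HYBRID  # when in doubt, run both verification paths
```

The useful property is the fallback: an ambiguous ticket gets both verification paths rather than a guess, which is cheaper than a missed defect.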
From a ticket key to a reviewed result, in six steps.
Every run starts with a Jira ticket and ends with a structured report. In between, Hydra handles the orchestration — so humans only show up for the parts that need judgment.
Jira ticket → entry point
Trigger with a ticket key. Hydra fetches the ticket, reads the acceptance criteria, and decides what kind of validation it needs.
Route to the right head
Backend, frontend, or hybrid — each with its own verification playbook and tooling.
Validate against staging
Exercises the ticket: hits APIs with the right payloads, clicks through UI flows, checks DB state, verifies permissions.
Capture evidence
Discovery notes, Postman collections, screenshots, and structured test reports — everything a reviewer needs to trust the result.
Publish to GCS
The evidence bundle ships to Google Cloud Storage under a per-ticket folder. Reviewers get a single link with everything attached.
Comment back on Jira
A structured comment lands on the ticket: verdict, links to evidence, and any follow-ups the team needs to handle.
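The six steps above compose naturally as a pipeline where each stage enriches a shared run context. The sketch below is illustrative only: every step function is a stub, and the bucket name `qa-evidence` is an invented placeholder, since the real steps call Jira, staging, and GCS clients.

```python
# Hypothetical step functions; the real agent calls Jira, staging,
# and Google Cloud Storage here. Each step reads and extends `ctx`.
def fetch_ticket(ctx):
    ctx["criteria"] = f"acceptance criteria for {ctx['ticket']}"
    return ctx

def route_to_head(ctx):
    ctx["head"] = "hybrid"  # stub: routing logic lives here
    return ctx

def validate_on_staging(ctx):
    ctx["verdict"] = "pass"  # stub: API/UI/DB checks live here
    return ctx

def capture_evidence(ctx):
    ctx["evidence"].append(f"{ctx['ticket']}/report.json")
    return ctx

def publish_to_gcs(ctx):
    ctx["evidence_url"] = f"gs://qa-evidence/{ctx['ticket']}/"
    return ctx

def comment_on_jira(ctx):
    ctx["comment"] = f"{ctx['verdict']}: {ctx['evidence_url']}"
    return ctx

PIPELINE = [fetch_ticket, route_to_head, validate_on_staging,
            capture_evidence, publish_to_gcs, comment_on_jira]

def run(ticket_key: str) -> dict:
    """Run every stage in order over a shared context."""
    ctx = {"ticket": ticket_key, "evidence": []}
    for step in PIPELINE:
        ctx = step(ctx)
    return ctx
```

Keeping the pipeline a flat list of stages is what makes the run auditable: each stage's contribution to the context is visible, and a reviewer can see exactly where a verdict came from.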
Why it works the way it does.
The architecture didn't appear by accident. These are the six principles I held the build to — the decisions that made Hydra dependable instead of clever.
Meet the team where they already are
Hydra's entry point is a Jira ticket — the artifact every engineer, PM, and QA already interacts with. Zero new UX for the team to adopt.
Route, then validate
A thin orchestrator decides which head to call; each head specializes in its own verification discipline. Changing how we test the backend never touches frontend logic.
Evidence-first
Every run ships a structured artifact bundle — not just pass/fail. Reviewers trust what they can audit.
Human-in-the-loop
Hydra proposes verdicts and surfaces findings. Humans approve, override, or escalate. The agent never ships to production on its own.
Capabilities as skills, not scripts
Every capability is a composable skill with a contract — so the playbook improves every sprint without a rewrite.
Portable by design
What-to-test is decoupled from how-to-test. The architecture travels to whatever product the team works on next.
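The "capabilities as skills" principle can be sketched as a small contract that every capability satisfies. This is a hypothetical shape, not Claude Code's actual skills API: the `Skill` protocol, the `VerifyEndpoint` example, and the `plan` helper are all invented for illustration.

```python
from typing import Protocol

class Skill(Protocol):
    """Hypothetical contract every capability conforms to."""
    name: str
    def applies_to(self, ticket: dict) -> bool: ...
    def run(self, ticket: dict) -> dict: ...

class VerifyEndpoint:
    """Example skill: verifies an API endpoint named in the ticket."""
    name = "verify-endpoint"

    def applies_to(self, ticket: dict) -> bool:
        return "endpoint" in ticket.get("labels", [])

    def run(self, ticket: dict) -> dict:
        # A real skill would exercise staging here; stubbed for the sketch.
        return {"skill": self.name, "status": "pass"}

def plan(ticket: dict, skills: list) -> list:
    """Compose a run plan from whichever skills declare they apply."""
    return [s for s in skills if s.applies_to(ticket)]
```

Because each skill declares when it applies and what it produces, adding coverage means adding a class that honors the contract; the planner and the rest of the playbook never need a rewrite.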
What Hydra is made of.
Why Hydra is the thing I'm most excited about right now.
Took the initiative — nobody assigned Hydra. I saw the AI-agent moment and rallied the team to build our answer.
Live on staging — not a demo, not a slide deck. Real traffic is running through it.
Growing every sprint — new heads, new skills, new coverage.
Frees the team for higher-value work — the thinking humans are uniquely strong at.