~/samuel
LIVE · 2026 – ongoing

A multi-headed QA agent we built to meet the AI-agent era head-on — Jira in, validated tickets out.

Origin: Self-initiated
Type: QA Validation Agent
Status: Live on staging · Actively growing new heads

What Hydra is

A multi-headed QA agent, triggered by a ticket.

Hydra is a QA agent that I proposed and that the team I lead built on Claude and Codex. It is our response to the AI-agent era that is reshaping our craft. Give it a Jira ticket key: it routes the ticket through the right validation head (backend, frontend, or hybrid), runs the checks against staging, captures evidence, and publishes the bundle back to reviewers. It's live, it's evolving, and it's freeing the team to do the work only humans can.

The problem

Manual regression doesn't scale. Every ticket on staging needed human eyes for API contracts, UI flows, or both — hours of repetitive click-through that crowded out test strategy, edge-case hunting, and the work that actually moves quality forward. Meanwhile, AI agents were getting good enough to matter.

Hydra's answer

I pitched and designed the approach, and the QA team I lead built it; no one assigned it to us. Hydra is a Jira-driven agent that takes over the repetitive half so the team can focus on the human half. Trigger it with a ticket key: it figures out what to test, runs the checks, captures the evidence, and hands back a clean report. It is built on Claude Code's skills model, so the behavior is readable and testable, and it can be improved every sprint.

Anatomy

Three heads, one body.

The agent routes every ticket to the right verification head. Each head has its own playbook, its own tooling, and its own idea of what 'done' looks like — but they all share the same spine.

head 01

Backend Head

/test-be

API, auth, DB, contracts.

Endpoint verification, validation rules, authentication flows, database assertions, and contract testing. Runs against staging with the ticket's acceptance criteria as a spec.

head 02

Frontend Head

/test-fe

Components, interaction, permissions, responsive.

UI component verification, interaction flows, permission boundaries, and responsive behavior. Drives the app the way a user would — but with perfect memory of every ticket that came before.

head 03

Hybrid Head

/test (routes both)

Full-stack tickets end-to-end.

For tickets that cross the stack, the orchestrator routes to both heads and stitches the evidence together. One command, two verification paths, one report.
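
The routing described above can be sketched in a few lines of Python. This is a minimal illustration, not Hydra's actual orchestrator: the `Ticket` shape, the `labels` field, and the label names `backend` and `frontend` are assumptions made for the example. Only the `/test-be` and `/test-fe` commands come from the head descriptions above. A hybrid ticket is simply one that resolves to both heads.

```python
from dataclasses import dataclass
from enum import Enum


class Head(Enum):
    """The two verification heads, keyed by their command."""
    BACKEND = "/test-be"
    FRONTEND = "/test-fe"


@dataclass(frozen=True)
class Ticket:
    key: str
    # Assumption: scope is expressed as labels on the ticket. The real agent
    # reads the acceptance criteria to decide; labels keep the sketch simple.
    labels: frozenset[str]


def route(ticket: Ticket) -> list[Head]:
    """Return every head that should validate this ticket."""
    heads = []
    if "backend" in ticket.labels:
        heads.append(Head.BACKEND)
    if "frontend" in ticket.labels:
        heads.append(Head.FRONTEND)
    if not heads:
        # No recognizable scope: hand the ticket to a human instead of guessing.
        raise ValueError(f"{ticket.key}: no scope found, needs manual triage")
    return heads
```

With these assumptions, `route(Ticket("QA-1", frozenset({"backend", "frontend"})))` returns both heads, which is the hybrid case: one command, two verification paths. Keeping `route` free of any testing logic is what lets a head change without touching the others.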

How it works

From a ticket key to a reviewed result, in six steps.

Every run starts with a Jira ticket and ends with a structured report. In between, Hydra handles the orchestration — so humans only show up for the parts that need judgment.

  1. 01

    Jira ticket → entry point

    Trigger with a ticket key. Hydra fetches the ticket, reads the acceptance criteria, and decides what kind of validation it needs.

  2. 02

    Route to the right head

    Backend, frontend, or hybrid — each with its own verification playbook and tooling.

  3. 03

    Validate against staging

    Exercises the ticket: hits APIs with the right payloads, clicks through UI flows, checks DB state, verifies permissions.

  4. 04

    Capture evidence

    Discovery notes, Postman collections, screenshots, and structured test reports — everything a reviewer needs to trust the result.

  5. 05

    Publish to GCS

    The evidence bundle ships to Google Cloud Storage under a per-ticket folder. Reviewers get a single link with everything attached.

  6. 06

    Comment back on Jira

    A structured comment lands on the ticket: verdict, links to evidence, and any follow-ups the team needs to handle.
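
Steps 05 and 06 can be illustrated with a small sketch of how a run result might be turned into a per-ticket evidence location and a structured comment. Everything here is hypothetical: the `RunResult` fields, the bucket name, the `gs://<bucket>/<ticket-key>/` layout, and the comment wording are assumptions for the example, and no Jira or Google Cloud Storage calls are made.

```python
from dataclasses import dataclass, field


@dataclass
class RunResult:
    ticket_key: str
    verdict: str                  # proposed by the agent, e.g. "pass" or "fail"
    artifacts: list[str]          # file names inside the evidence bundle
    follow_ups: list[str] = field(default_factory=list)


def evidence_prefix(bucket: str, ticket_key: str) -> str:
    """Per-ticket folder for the bundle (the layout is an assumption)."""
    return f"gs://{bucket}/{ticket_key}/"


def jira_comment(result: RunResult, bucket: str) -> str:
    """Render the verdict, evidence links, and follow-ups as plain text."""
    prefix = evidence_prefix(bucket, result.ticket_key)
    lines = [f"Hydra verdict (proposed): {result.verdict.upper()}", "Evidence:"]
    lines += [f"- {prefix}{name}" for name in result.artifacts]
    if result.follow_ups:
        lines.append("Follow-ups:")
        lines += [f"- {item}" for item in result.follow_ups]
    return "\n".join(lines)
```

Calling `jira_comment(RunResult("QA-123", "pass", ["report.md"]), "hydra-evidence")` would yield a three-line comment whose single evidence link points at `gs://hydra-evidence/QA-123/report.md`. The verdict is labeled as proposed because a human reviewer still approves or overrides it.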

Design principles

Why it works the way it does.

The architecture didn't appear by accident. These are the six principles I held the build to — the decisions that made Hydra dependable instead of clever.

  1. principle 01

    Meet the team where they already are

    Hydra's entry point is a Jira ticket — the artifact every engineer, PM, and QA already interacts with. Zero new UX for the team to adopt.

  2. principle 02

    Route, then validate

    A thin orchestrator decides which head to call; each head specializes in its own verification discipline. Changing how we test backend doesn't touch frontend logic.

  3. principle 03

    Evidence-first

    Every run ships a structured artifact bundle — not just pass/fail. Reviewers trust what they can audit.

  4. principle 04

    Human-in-the-loop

    Hydra proposes verdicts and surfaces findings. Humans approve, override, or escalate. The agent never ships to production on its own.

  5. principle 05

    Capabilities as skills, not scripts

    Every capability is a composable skill with a contract — so the playbook improves every sprint without a rewrite.

  6. principle 06

    Portable by design

    What-to-test is decoupled from how-to-test. The architecture travels to whatever product the team works on next.
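
Principles 05 and 06 can be made concrete with a short sketch. It assumes a skill is anything that satisfies a small contract (`Skill`), and that what to test is plain data (`Check`) handed to that skill. All names here (`Check`, `Skill`, `CallableSkill`, `run_checks`) are hypothetical and are not taken from Hydra or from Claude Code's skills model.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class Check:
    """What to test: a product-specific expectation, stated as data."""
    name: str
    expected: object


class Skill(Protocol):
    """How to test: the contract every skill is assumed to satisfy."""
    name: str

    def run(self, check: Check) -> bool: ...


@dataclass
class CallableSkill:
    """A skill that compares an observed value against the expectation."""
    name: str
    probe: Callable[[], object]   # hypothetical: fetches the observed value

    def run(self, check: Check) -> bool:
        return self.probe() == check.expected


def run_checks(skill: Skill, checks: list[Check]) -> dict[str, bool]:
    """Apply one skill to many checks; neither side knows the other's details."""
    return {check.name: skill.run(check) for check in checks}
```

Under these assumptions, moving to another product means supplying a new list of `Check` values, while improving a skill means changing one class behind the same contract.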

Engineering Playbook · Shared Skill Library
Brainstorming · Writing & executing plans · Systematic debugging · Test-driven development · Verification before completion · Requesting code review · Using git worktrees · Parallel agent dispatch

Stack

What Hydra is made of.

Model: Claude, Codex
Runtime: Claude Code, Cursor
Protocol: MCP
Knowledge: Context7
Language: Python
Tooling: uv
Integration: Jira API
Testing: Postman
Infra: Google Cloud Storage

Why this one matters

Why Hydra is the thing I'm most excited about right now.

  1. reason 01

    Took the initiative — nobody assigned Hydra. I saw the AI-agent moment and rallied the team to build our answer.

  2. reason 02

    Live on staging — not a demo, not a slide deck. Real traffic is running through it.

  3. reason 03

    Growing every sprint — new heads, new skills, new coverage.

  4. reason 04

    Frees the team for higher-value work — the thinking humans are uniquely strong at.