8RR8

Methodology

The audit pipeline, end to end

9 stages, typed input/output between each. Every stage is open source; every artefact is reproducible.

INTAKE → FETCH → RECON → SCOPE → MAP → CHECK → VERIFY → GRADE → REPORT
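The typed hand-off between stages can be sketched as a generic composition; a minimal illustration, not the actual types in `src/pipeline`:

```typescript
// A stage is an async function from a typed input to a typed output.
type Stage<I, O> = (input: I) => Promise<O>;

// Composing two stages only type-checks when the first stage's output
// type matches the second stage's input type — the same compile-time
// guarantee that keeps the 9-stage chain well-typed end to end.
function pipe2<A, B, C>(s1: Stage<A, B>, s2: Stage<B, C>): Stage<A, C> {
  return async (input) => s2(await s1(input));
}
```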

01 · INTAKE

Accepts an ERC-8004 agent URL, a raw 8004 agentId, or a GitHub repo URL. Resolves which type of input you gave us.

Output · Audit ID + parsed source

Code · src/pipeline/run.ts
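The input resolution might look roughly like this; `AuditSource` and `resolveSource` are illustrative names, and the real logic in `src/pipeline/run.ts` may differ:

```typescript
// Discriminated union over the three accepted input types.
type AuditSource =
  | { kind: "github"; owner: string; repo: string }
  | { kind: "agentId"; agentId: bigint }
  | { kind: "agentUrl"; url: string };

function resolveSource(input: string): AuditSource {
  // GitHub repo URL → owner/repo pair.
  const gh = input.match(/^https:\/\/github\.com\/([^/]+)\/([^/?#]+)/);
  if (gh) return { kind: "github", owner: gh[1], repo: gh[2].replace(/\.git$/, "") };
  // Bare integer → raw ERC-8004 agentId.
  if (/^\d+$/.test(input)) return { kind: "agentId", agentId: BigInt(input) };
  // Anything else is treated as an agent URL.
  return { kind: "agentUrl", url: input };
}
```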

02 · FETCH

Downloads the repo as a GitHub tarball (no `git` shell-out — works in serverless), pins a concrete commit SHA, extracts to an ephemeral sandbox, builds a file inventory (governance docs present? CI workflows? eval dir?).

Output · worktree dir, commitSha, fileCount, languages, FileInventory

Code · src/pipeline/stages/fetch.ts
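The tarball approach avoids any `git` dependency: GitHub serves a snapshot of any commit over plain HTTPS. A sketch of the URL construction (the `tarballUrl` helper is hypothetical; the real stage also pins the commit SHA via the GitHub API before downloading):

```typescript
// codeload.github.com serves public-repo snapshots without auth —
// one HTTPS GET replaces a `git clone`, which is what makes the
// stage viable in a serverless runtime.
function tarballUrl(owner: string, repo: string, commitSha: string): string {
  return `https://codeload.github.com/${owner}/${repo}/tar.gz/${commitSha}`;
}
```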

03 · RECON

Single-pass scanner. For every text-readable source file (size-capped), applies regex + AST patterns against 21 named signals: agent_framework, logging_hooks, oversight_hooks, eval_artefacts, biometric, employment, content_generation, and so on. Each match contributes evidence to one or more signals.

Output · ReconResult — signals × evidence records

Code · src/pipeline/stages/recon.ts
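The regex half of the scan reduces to a pattern table applied line by line; the two patterns below are invented for illustration (the shipped scanner covers all 21 signals with regex plus AST patterns):

```typescript
interface Evidence { signal: string; file: string; line: number; excerpt: string }

// Illustrative pattern table — signal names match the text, the
// regexes themselves are placeholders.
const PATTERNS: Record<string, RegExp> = {
  logging_hooks: /\blog(ger)?\.(info|warn|error)\b/,
  content_generation: /\bgenerate(Text|Image|Content)\b/,
};

// One pass over one file: every matching line contributes an
// evidence record to the matched signal.
function scanFile(file: string, text: string): Evidence[] {
  const out: Evidence[] = [];
  text.split("\n").forEach((lineText, i) => {
    for (const [signal, re] of Object.entries(PATTERNS)) {
      if (re.test(lineText)) out.push({ signal, file, line: i + 1, excerpt: lineText.trim() });
    }
  });
  return out;
}
```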

04 · SCOPE

Selects which regulation packs to audit against. V0 echoes the user's choice; V1+ will narrow by jurisdiction and broaden by detected signals (e.g. content_generation auto-includes Article 50(2)).

Output · ScopeResult — list of regulation pack IDs

Code · src/pipeline/stages/scope.ts
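The planned V1 signal-driven broadening could be a simple lookup; the mapping values here are illustrative, not the real pack IDs:

```typescript
// Detected signals pull extra clauses into scope — e.g. a
// content_generation signal auto-includes the Article 50(2) pack.
const SIGNAL_SCOPE: Record<string, string[]> = {
  content_generation: ["eu-ai-act/art-50-2"],
  employment: ["eu-ai-act/annex-iii-4"],
};

function broadenScope(userPacks: string[], signals: string[]): string[] {
  const packs = new Set(userPacks);
  for (const s of signals) for (const p of SIGNAL_SCOPE[s] ?? []) packs.add(p);
  return [...packs].sort();
}
```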

05 · MAP

Cross-clause reasoning: derives the risk classification (high / limited / minimal / gpai). For EU AI Act, fires Annex III triggers based on RECON signals (e.g. employment_signals → Annex III §4) and Article 50 triggers from generation/emotion signals.

Output · MapResult — classification, annexIiiCategories, art50Triggers, rationale

Code · src/pipeline/stages/map.ts
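One way to read the classification rule: Annex III triggers imply high risk, Article 50 triggers alone imply limited risk, and nothing firing falls through to minimal. A hedged sketch (the real derivation, including the gpai branch, lives in `src/pipeline/stages/map.ts`):

```typescript
type RiskClass = "high" | "limited" | "minimal" | "gpai";

// Illustrative precedence: any Annex III category dominates, then
// Article 50 transparency triggers, then minimal by default.
function classify(annexIiiCategories: string[], art50Triggers: string[]): RiskClass {
  if (annexIiiCategories.length > 0) return "high";
  if (art50Triggers.length > 0) return "limited";
  return "minimal";
}
```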

06 · CHECK

Runs each in-scope clause's deterministic rules. Each rule returns a [0..1] score and evidence records. Weighted aggregation produces a per-clause raw score; raw → 0..4 ordinal verdict. Prohibition clauses (Art 5) invert: signal found = violation = fail.

Output · CheckResult per clause — verdict, score, rules breakdown, evidence

Code · src/pipeline/stages/check.ts + src/pipeline/checkers/rules.ts
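The scoring math sketched out, with the prohibition inversion from the text; the verdict bucketing here is illustrative and may not match the shipped thresholds:

```typescript
interface RuleResult { score: number; weight: number }

// Weighted mean of per-rule [0..1] scores. Prohibition clauses
// (Art 5) invert: a strong signal match becomes a low score, i.e.
// signal found = violation = fail.
function clauseScore(rules: RuleResult[], prohibition = false): number {
  const totalW = rules.reduce((s, r) => s + r.weight, 0);
  const raw = rules.reduce((s, r) => s + r.score * r.weight, 0) / totalW;
  return prohibition ? 1 - raw : raw;
}

// raw [0..1] → ordinal verdict 0..4 (illustrative even bucketing).
function toVerdict(raw: number): number {
  return Math.min(4, Math.floor(raw * 5));
}
```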

07 · VERIFY

Pass-through stub in V0. V1 will invoke an LLM judge on clauses whose raw score lands in the ambiguous band [0.3, 0.7]. Disagreements downgrade verdict.

Output · VerifyResult per clause

Code · src/pipeline/stages/verify.ts
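The planned V1 gate reduces to a band check plus a reconciliation rule; the band edges come from the text, while the one-step downgrade on disagreement is an assumption of this sketch:

```typescript
// Only clauses whose raw score lands in the ambiguous band are worth
// spending an LLM-judge call on.
function needsLlmReview(rawScore: number): boolean {
  return rawScore >= 0.3 && rawScore <= 0.7;
}

// Illustrative reconciliation: a judge disagreement downgrades the
// deterministic verdict by one step, never below 0.
function reconcile(verdict: number, judgeAgrees: boolean): number {
  return judgeAgrees ? verdict : Math.max(0, verdict - 1);
}
```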

08 · GRADE

Aggregates verdicts to per-regulation averages and an overall score. Excludes n/a + external from the mean.

Output · GradeResult — overallScore + perRegulation breakdown

Code · src/pipeline/stages/grade.ts
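The per-regulation mean with the n/a + external exclusion can be sketched as a filter-then-average; the `ClauseVerdict` shape is illustrative:

```typescript
type ClauseVerdict = { clause: string; verdict: number | "n/a" | "external" };

// Mean of numeric verdicts only — "n/a" and "external" clauses are
// excluded so they neither reward nor punish the score.
function regulationScore(clauses: ClauseVerdict[]): number {
  const scored = clauses.filter(
    (c): c is { clause: string; verdict: number } => typeof c.verdict === "number",
  );
  return scored.reduce((s, c) => s + c.verdict, 0) / scored.length;
}
```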

09 · REPORT

Assembles the final AuditReport, computes the canonical bundleHash = sha256 of the stable parts (commit, regulations versions, checker version, sorted clause verdicts). The hash is what gets posted on chain in V1.

Output · AuditReport — durable, shareable, reproducible

Code · src/pipeline/stages/report.ts
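A minimal sketch of the hash computation, assuming a key-sorted canonical JSON; field names follow the text, but the exact canonicalisation in `src/pipeline/stages/report.ts` may differ:

```typescript
import { createHash } from "node:crypto";

// sha256 over the stable parts: commit, regulations version, checker
// version, and clause verdicts sorted by clause ID so the hash is
// independent of iteration order.
function bundleHash(bundle: {
  commitSha: string;
  regulationsVersion: string;
  checkerVersion: string;
  clauseVerdicts: Record<string, number>;
}): string {
  const canonical = JSON.stringify({
    commitSha: bundle.commitSha,
    regulationsVersion: bundle.regulationsVersion,
    checkerVersion: bundle.checkerVersion,
    clauseVerdicts: Object.fromEntries(
      Object.entries(bundle.clauseVerdicts).sort(([a], [b]) => a.localeCompare(b)),
    ),
  });
  return createHash("sha256").update(canonical).digest("hex");
}
```

Sorting the verdicts before hashing is what makes the hash reproducible: two runs that produce the same verdicts in a different order still yield the same `bundleHash`.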

Reproducibility contract

For an audit to be third-party verifiable, four inputs must be pinned and re-derivable from the on-chain attestation: commitSha, regulationsVersion, checkerVersion, and the resulting bundleHash.

Anyone can clone our checker at checkerVersion, clone the audited repo at commitSha, load the regulation pack at regulationsVersion, run the pipeline, and compute their own bundleHash. If it matches the on-chain hash, the audit is honest. If not, one of the inputs drifted.

Acknowledgement

The multi-stage security-audit pattern that inspired this regulation-audit pipeline comes from prior work in agentic security tooling — notably the orchestrated RECON → HUNT → VERIFY → GRADE → REPORT flow popularised by hacker-bob.