Methodology
The audit pipeline, end to end
9 stages, typed input/output between each. Every stage is open source; every artefact is reproducible.
01 · INTAKE
Accepts an ERC-8004 agent URL, a raw 8004 agentId, or a GitHub repo URL. Resolves which type of input you gave us.
Output · Audit ID + parsed source
Code · src/pipeline/run.ts
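A minimal sketch of the resolution step, assuming the three input shapes named above; the function name, type names, and patterns are illustrative, not a transcription of src/pipeline/run.ts:

```typescript
// Hypothetical intake resolver: classify the raw user input.
type IntakeSource =
  | { kind: "github_repo"; owner: string; repo: string }
  | { kind: "erc8004_agent_id"; agentId: bigint }
  | { kind: "erc8004_agent_url"; url: string };

function resolveIntake(input: string): IntakeSource {
  const gh = input.match(/^https:\/\/github\.com\/([^/]+)\/([^/]+?)(?:\.git)?\/?$/);
  if (gh) return { kind: "github_repo", owner: gh[1], repo: gh[2] };
  // A bare integer is treated as a raw 8004 agentId.
  if (/^\d+$/.test(input)) return { kind: "erc8004_agent_id", agentId: BigInt(input) };
  // Anything else is assumed to be an ERC-8004 agent URL.
  return { kind: "erc8004_agent_url", url: input };
}
```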
02 · FETCH
Downloads the repo as a GitHub tarball (no `git` shell-out — works in serverless), pins a concrete commit SHA, extracts to an ephemeral sandbox, builds a file inventory (governance docs present? CI workflows? eval dir?).
Output · worktree dir, commitSha, fileCount, languages, FileInventory
Code · src/pipeline/stages/fetch.ts
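The inventory step can be sketched as a pure function over the extracted file paths; the interface fields mirror the output listed above, but the path heuristics here are assumptions, not the real rules in src/pipeline/stages/fetch.ts:

```typescript
// Hypothetical FileInventory builder: classify extracted paths.
interface FileInventory {
  governanceDocs: string[]; // e.g. SECURITY.md, GOVERNANCE.md
  ciWorkflows: string[];    // .github/workflows/*
  evalDirs: string[];       // top-level eval directories
  fileCount: number;
}

function buildInventory(paths: string[]): FileInventory {
  const governanceDocs = paths.filter(p =>
    /^(GOVERNANCE|SECURITY|CODE_OF_CONDUCT)\.md$/i.test(p));
  const ciWorkflows = paths.filter(p => p.startsWith(".github/workflows/"));
  const evalDirs = [...new Set(
    paths.filter(p => /^evals?\//.test(p)).map(p => p.split("/")[0]))];
  return { governanceDocs, ciWorkflows, evalDirs, fileCount: paths.length };
}
```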
03 · RECON
Single-pass scanner. For every text-readable source file (size-capped), applies regex + AST patterns against 21 named signals: agent_framework, logging_hooks, oversight_hooks, eval_artefacts, biometric, employment, content_generation, and so on. Each match contributes evidence to one or more signals.
Output · ReconResult — signals × evidence records
Code · src/pipeline/stages/recon.ts
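The regex half of the scanner can be sketched as a pattern table mapped over each line of a file; the three patterns below are illustrative stand-ins for the 21 named signals, not the real table in src/pipeline/stages/recon.ts:

```typescript
// Hypothetical single-pass signal scanner (regex side only; no AST).
type Signal = "agent_framework" | "logging_hooks" | "employment" | "content_generation";

interface EvidenceRecord { signal: Signal; file: string; line: number; match: string }

const PATTERNS: Array<{ signal: Signal; re: RegExp }> = [
  { signal: "agent_framework", re: /\b(langchain|crewai|autogen)\b/i },
  { signal: "logging_hooks",   re: /\blogger\.(info|warn|error)\b/ },
  { signal: "employment",      re: /\b(resume|cv)\s*(screen|rank)/i },
];

function scanFile(path: string, text: string): EvidenceRecord[] {
  const records: EvidenceRecord[] = [];
  text.split("\n").forEach((lineText, i) => {
    for (const { signal, re } of PATTERNS) {
      const m = lineText.match(re);
      // Each match contributes one evidence record to its signal.
      if (m) records.push({ signal, file: path, line: i + 1, match: m[0] });
    }
  });
  return records;
}
```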
04 · SCOPE
Selects which regulation packs to audit against. V0 echoes the user's choice; V1+ will narrow by jurisdiction and broaden by detected signals (e.g. content_generation auto-includes Article 50(2)).
Output · ScopeResult — list of regulation pack IDs
Code · src/pipeline/stages/scope.ts
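A sketch of the V0 echo plus the planned V1+ signal-driven broadening; the pack ID strings are hypothetical, not the actual identifiers in src/pipeline/stages/scope.ts:

```typescript
// Hypothetical scope selection: echo user choice, broaden by detected signals.
function selectScope(userPacks: string[], signals: Set<string>): string[] {
  const packs = new Set(userPacks); // V0: echo the user's choice
  // V1+ sketch: detected signals pull in extra packs
  // (e.g. content generation auto-includes an Article 50(2) pack).
  if (signals.has("content_generation")) packs.add("eu-ai-act/art-50-2");
  return [...packs].sort();
}
```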
05 · MAP
Cross-clause reasoning: derives the risk classification (high / limited / minimal / gpai). For EU AI Act, fires Annex III triggers based on RECON signals (e.g. employment_signals → Annex III §4) and Article 50 triggers from generation/emotion signals.
Output · MapResult — classification, annexIiiCategories, art50Triggers, rationale
Code · src/pipeline/stages/map.ts
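The trigger logic above can be sketched as a signal-to-category mapping; the category IDs and precedence order (Annex III beats Article 50) are assumptions for illustration, not the real logic in src/pipeline/stages/map.ts:

```typescript
// Hypothetical classification from RECON signals.
type Classification = "high" | "limited" | "minimal" | "gpai";

interface MapResultSketch {
  classification: Classification;
  annexIiiCategories: string[];
  art50Triggers: string[];
}

function classify(signals: Set<string>): MapResultSketch {
  const annexIiiCategories: string[] = [];
  if (signals.has("employment")) annexIiiCategories.push("annex-iii-4"); // Annex III §4
  if (signals.has("biometric"))  annexIiiCategories.push("annex-iii-1");
  const art50Triggers: string[] = [];
  if (signals.has("content_generation")) art50Triggers.push("art-50-2");
  // Assumed precedence: any Annex III hit => high risk; else Art 50 => limited.
  const classification: Classification =
    annexIiiCategories.length > 0 ? "high" :
    art50Triggers.length > 0 ? "limited" : "minimal";
  return { classification, annexIiiCategories, art50Triggers };
}
```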
06 · CHECK
Runs each in-scope clause's deterministic rules. Each rule returns a [0..1] score and evidence records. Weighted aggregation produces a per-clause raw score; raw → 0..4 ordinal verdict. Prohibition clauses (Art 5) invert: signal found = violation = fail.
Output · CheckResult per clause — verdict, score, rules breakdown, evidence
Code · src/pipeline/stages/check.ts + src/pipeline/checkers/rules.ts
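The aggregation math can be sketched as a weighted mean plus an ordinal rounding; the rounding rule and the prohibition inversion (invert the raw score before rounding) are assumptions of this sketch, not necessarily how src/pipeline/checkers/rules.ts does it:

```typescript
// Hypothetical per-clause scoring: weighted mean of rule scores -> 0..4 verdict.
interface RuleResult { score: number; weight: number } // score in [0, 1]

function clauseScore(rules: RuleResult[], prohibition = false): { raw: number; verdict: number } {
  const totalWeight = rules.reduce((s, r) => s + r.weight, 0);
  let raw = totalWeight === 0
    ? 0
    : rules.reduce((s, r) => s + r.score * r.weight, 0) / totalWeight;
  // Art 5 prohibition clauses invert: signal found = violation = fail.
  if (prohibition) raw = 1 - raw;
  const verdict = Math.round(raw * 4); // map [0..1] onto the 0..4 ordinal scale
  return { raw, verdict };
}
```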
07 · VERIFY
Pass-through stub in V0. V1 will invoke an LLM judge on clauses whose raw score lands in the ambiguous band [0.3, 0.7]. Disagreements downgrade the verdict.
Output · VerifyResult per clause
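The planned V1 behaviour can be sketched in two small functions; taking "downgrade" to mean the lower of the two verdicts is an assumption of this sketch:

```typescript
// Hypothetical V1 verify logic.
const AMBIGUOUS_BAND: [number, number] = [0.3, 0.7];

// Only clauses in the ambiguous band get an LLM judge invocation.
function needsLlmJudge(rawScore: number): boolean {
  return rawScore >= AMBIGUOUS_BAND[0] && rawScore <= AMBIGUOUS_BAND[1];
}

// On disagreement, assume the final verdict is downgraded to the lower one.
function reconcile(deterministicVerdict: number, judgeVerdict: number): number {
  return Math.min(deterministicVerdict, judgeVerdict);
}
```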
08 · GRADE
Aggregates verdicts to per-regulation averages and an overall score. Excludes n/a + external from the mean.
Output · GradeResult — overallScore + perRegulation breakdown
Code · src/pipeline/stages/grade.ts
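The per-regulation mean can be sketched directly from the rule stated above; modelling n/a and external as string sentinels is an assumption of this sketch:

```typescript
// Hypothetical per-regulation average: numeric verdicts only.
type Verdict = number | "n/a" | "external";

function regulationScore(verdicts: Verdict[]): number | null {
  // Excludes n/a + external from the mean, per the GRADE stage contract.
  const scored = verdicts.filter((v): v is number => typeof v === "number");
  if (scored.length === 0) return null; // nothing gradeable
  return scored.reduce((s, v) => s + v, 0) / scored.length;
}
```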
09 · REPORT
Assembles the final AuditReport, computes the canonical bundleHash = sha256 of the stable parts (commit, regulations versions, checker version, sorted clause verdicts). The hash is what gets posted on chain in V1.
Output · AuditReport — durable, shareable, reproducible
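The hash construction can be sketched with Node's crypto module; the exact field set matches the stable parts listed above, but the canonical-JSON convention here (fixed key order, clause IDs sorted) is an assumption, not a spec of the real serializer:

```typescript
import { createHash } from "node:crypto";

// Hypothetical bundleHash: sha256 over canonical JSON of the stable parts.
interface BundleInputs {
  commitSha: string;
  regulationsVersion: string;
  checkerVersion: string;
  clauseVerdicts: Record<string, number>;
}

function bundleHash(b: BundleInputs): string {
  const canonical = JSON.stringify({
    commitSha: b.commitSha,
    regulationsVersion: b.regulationsVersion,
    checkerVersion: b.checkerVersion,
    // Sort clause IDs so insertion order can never change the hash.
    clauseVerdicts: Object.fromEntries(
      Object.entries(b.clauseVerdicts).sort(([a], [c]) => (a < c ? -1 : a > c ? 1 : 0))
    ),
  });
  return createHash("sha256").update(canonical).digest("hex");
}
```

Because the serialization is canonical, two parties who pin the same four inputs compute the same hash, which is what makes the on-chain comparison meaningful.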
Reproducibility contract
For an audit to be third-party verifiable, four inputs must be pinned and re-derivable from the on-chain attestation:
commitSha · exactly which version of the audited repo we read.
regulationsVersion · sha256 of the regulation YAML pack used.
checkerVersion · pinned git SHA of the checker code at audit time.
bundleHash · sha256 of canonical-JSON over all the above plus the sorted clause verdicts.
Anyone can clone our checker at checkerVersion, clone the audited repo at commitSha, load the regulation pack at regulationsVersion, run the pipeline, and compute their own bundleHash. If it matches the on-chain hash, the audit is honest. If not, one of the inputs drifted.
Acknowledgement
The multi-stage security-audit pattern that inspired this regulation-audit pipeline comes from prior work in agentic security tooling — notably the orchestrated RECON → HUNT → VERIFY → GRADE → REPORT flow popularised by hacker-bob.