
Methodology

Capture

Every agency in the archive is crawled on a fixed cadence: weekly for priority tier 1, biweekly for tier 2, monthly for tier 3. Each captured URL is rendered in a headless Chromium browser, hashed (SHA-256), stored to MinIO, submitted to the Wayback Machine and archive.today, and pinned to IPFS. Sub-resources are serialized to WARC. Capture is append-only: a re-fetch produces a new row rather than mutating an existing one.
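The cadence and the append-only rule can be illustrated with a minimal sketch. It is not the archive's actual code: `CaptureRow`, `CaptureLog`, and `next_due` are hypothetical names, the in-memory list stands in for real storage, and "monthly" is assumed to mean 30 days.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Crawl interval per priority tier. Treating "monthly" as 30 days is an assumption.
CADENCE = {1: timedelta(weeks=1), 2: timedelta(weeks=2), 3: timedelta(days=30)}


@dataclass(frozen=True)
class CaptureRow:
    """One immutable capture record (hypothetical schema)."""
    url: str
    sha256: str
    fetched_at: datetime


class CaptureLog:
    """Append-only log: rows are added, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: list[CaptureRow] = []

    def record(self, url: str, body: bytes, fetched_at: datetime) -> CaptureRow:
        # A re-fetch of the same URL appends a new row with its own hash.
        row = CaptureRow(url, hashlib.sha256(body).hexdigest(), fetched_at)
        self._rows.append(row)
        return row

    def history(self, url: str) -> list[CaptureRow]:
        return [r for r in self._rows if r.url == url]


def next_due(last_fetch: datetime, tier: int) -> datetime:
    """Return when a source in the given priority tier is next due for crawling."""
    return last_fetch + CADENCE[tier]


log = CaptureLog()
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
first = log.record("https://example.org/report", b"version 1", t0)
second = log.record("https://example.org/report", b"version 2", next_due(t0, 1))
```

Because earlier rows are never overwritten, comparing the hashes in `history(url)` shows whether a page changed between captures.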

Extraction

Captures pass through three tiers of extractors. Tier A adapters are hand-tuned per source (SIU Ontario director's reports, BEI Québec investigation detail pages, IIO BC Chief Civilian Director decisions, SIRT-NL director's reports, CanLII OCPC decisions). Tier B templates cover document families across municipalities. Tier C is an LLM fallback (Anthropic Claude) that handles every captured document for which no hand-tuned adapter exists.
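As a rough illustration of the tiering, the sketch below tries extractors in order and returns the first result. The A, then B, then C ordering is an assumption. The function names and matching rules are hypothetical, and the Tier C function is a stub that makes no model call.

```python
from typing import Callable, Optional

# An extractor returns a dict of fields, or None if it does not recognise the document.
Extractor = Callable[[str], Optional[dict]]


def tier_a_siu(doc: str) -> Optional[dict]:
    # Hypothetical hand-tuned adapter: recognises only one source's layout.
    if doc.startswith("SIU Director's Report"):
        return {"tier": "A", "source": "SIU Ontario"}
    return None


def tier_b_template(doc: str) -> Optional[dict]:
    # Hypothetical template covering one document family across municipalities.
    if "Use of Force Annual Report" in doc:
        return {"tier": "B", "family": "use-of-force-annual"}
    return None


def tier_c_llm(doc: str) -> Optional[dict]:
    # Stand-in for the LLM fallback; the real model call is omitted here.
    return {"tier": "C"}


def extract(doc: str, tiers: tuple[Extractor, ...] = (tier_a_siu, tier_b_template, tier_c_llm)) -> dict:
    """Try each tier in order and return the first non-None result."""
    for extractor in tiers:
        result = extractor(doc)
        if result is not None:
            return result
    raise ValueError("no extractor accepted the document")


a = extract("SIU Director's Report, case 24-OFD-001")
b = extract("City of Example: Use of Force Annual Report 2023")
c = extract("Unstructured press release text")
```

Putting the cheapest and most precise extractors first means the fallback only sees documents the earlier tiers decline.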

Purge detection

Four detector families watch for record removal. The Ledger runs all four:

Transparency grading

Each agency's grade is computed daily from extracted incidents and detected purges. The rubric is explicit:

Anonymization

The public product never publishes officer names. Every incident carries an HMAC-SHA256 token derived from a salt held outside the application database, so a compromised app DB does not expose the name→token linkage; rotating the salt re-tokenizes the corpus. Dates are published to quarter-level precision only. Records wait at least 60 days from disposition before publication. Cells with fewer than five similar records are suppressed or aggregated upward. Agencies with fewer than 50 sworn members publish only yearly aggregates. Publication bans are honored unconditionally.