As governments increasingly look to third-party auditors to verify AI safety claims, a critical question emerges: who ensures these auditors remain independent? The lessons from financial oversight are instructive. In 2008, credit rating agencies, paid by the very institutions they were evaluating, failed to flag systemic risks that nearly collapsed the global financial system. In AI governance, where well-funded developers have even stronger incentives to shape assessments in their favor, we need stronger safeguards from the start.

A core challenge for accreditation is verifying an assessor's "independence from undue pressures", especially from influential AI developers being assessed. I propose a two-tiered system for ensuring and auditing this independence.

Tier 1: Prevention Mechanisms

Pooled funding: Developers would pay into shared funds scaled to the risk level of their AI models, reducing direct developer influence over assessments. Accreditors would manage these funding pools and assign and compensate assessment firms via algorithmic randomization, preventing assessors from selecting their own clients.
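
To make the matching concrete, here is a minimal sketch of how an accreditor might implement risk-scaled contributions and randomized assignment; the fee schedule, firm names, and conflict registry are illustrative assumptions, not part of the proposal.

```python
import secrets

# Illustrative fee schedule: contributions scale with the model's risk tier.
POOL_FEE_BY_RISK_TIER = {"low": 50_000, "medium": 250_000, "high": 1_000_000}

def contribute_to_pool(pool_balance: int, risk_tier: str) -> int:
    """Add a developer's contribution to the shared fund, scaled by risk tier."""
    return pool_balance + POOL_FEE_BY_RISK_TIER[risk_tier]

def assign_assessor(developer: str, accredited_firms: list[str],
                    conflicts: dict[str, set[str]]) -> str:
    """Randomly pick an accredited firm with no declared conflict with the
    developer, so neither party chooses the other."""
    eligible = [f for f in accredited_firms if developer not in conflicts.get(f, set())]
    if not eligible:
        raise ValueError("No conflict-free accredited firm available")
    # Cryptographically strong randomness keeps the draw unpredictable.
    return secrets.choice(eligible)

# Example: the accreditor funds the pool and draws an assessor for a
# hypothetical high-risk model from a hypothetical developer "DevCo".
balance = contribute_to_pool(0, "high")
firm = assign_assessor("DevCo", ["FirmA", "FirmB", "FirmC"], {"FirmB": {"DevCo"}})
```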

Revenue diversification: Assessment firms would face strict revenue caps of no more than 8% from any single developer ecosystem, including subsidiaries and major suppliers. This 8% threshold is stricter than typical financial-audit standards because frontier AI development is so highly concentrated. Senior assessment firm staff would face 36-month cooling-off periods before joining assessed companies, with reciprocal hiring restrictions while an evaluation is ongoing.
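
A rough sketch of how these two rules could be checked mechanically; the 8% cap and 36-month window come from the proposal, while the data structures (and the ecosystem roll-up being done by the caller) are assumptions for illustration.

```python
from datetime import date

REVENUE_CAP = 0.08        # max share of firm revenue from one developer ecosystem
COOLING_OFF_MONTHS = 36   # before senior staff may join an assessed company

def ecosystems_over_cap(total_revenue: float,
                        revenue_by_ecosystem: dict[str, float]) -> list[str]:
    """Return developer ecosystems (subsidiaries and major suppliers already
    rolled up by the caller) whose share of total revenue exceeds the 8% cap."""
    return [eco for eco, rev in revenue_by_ecosystem.items()
            if rev / total_revenue > REVENUE_CAP]

def cooling_off_satisfied(departure: date, hire_date: date) -> bool:
    """Check that at least 36 months separate a senior staff member's departure
    from the assessment firm and their hire by an assessed company
    (day of month is ignored in this sketch)."""
    months = (hire_date.year - departure.year) * 12 + (hire_date.month - departure.month)
    return months >= COOLING_OFF_MONTHS

# Example: 12% of revenue from one ecosystem breaches the cap; a 37-month gap passes.
flagged = ecosystems_over_cap(100.0, {"EcosystemA": 12.0, "EcosystemB": 5.0})
allowed = cooling_off_satisfied(date(2025, 1, 15), date(2028, 2, 1))
```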

Rotating networks: Accredited firms would periodically reshuffle assessors to maintain objectivity and autonomy, preventing the same person or team from repeatedly assessing the same AI developer.
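
One way a rotation rule like this could be enforced in an accreditor's scheduling system, assuming a hypothetical cap of two consecutive engagements with the same developer (the exact limit is not specified above):

```python
MAX_CONSECUTIVE_ENGAGEMENTS = 2   # hypothetical rotation limit for illustration

def rotation_required(team_history: list[str], developer: str) -> bool:
    """Return True if this team's most recent engagements with the developer
    already hit the rotation limit, so a different team must be assigned."""
    consecutive = 0
    for past_developer in reversed(team_history):
        if past_developer != developer:
            break
        consecutive += 1
    return consecutive >= MAX_CONSECUTIVE_ENGAGEMENTS

# Example: a team that just assessed "DevCo" twice in a row must rotate off.
must_rotate = rotation_required(["OtherCo", "DevCo", "DevCo"], "DevCo")  # True
```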

Tamper-evident documentation: Accreditors would maintain comprehensive audit trails by tracking developer-assessor interactions, methodology choices, and key decisions. This transparency deters behind-the-scenes pressure tactics, as both parties know that all communications are subject to later review.
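
A hash chain is one standard way to make such a trail tamper-evident; the sketch below assumes illustrative record fields rather than any prescribed format.

```python
import hashlib
import json

def append_record(log: list[dict], record: dict) -> list[dict]:
    """Append an interaction record whose hash covers both its content and the
    previous entry's hash, so any later edit breaks the chain."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    entry_hash = hashlib.sha256(payload.encode()).hexdigest()
    return log + [{"record": record, "prev": prev_hash, "hash": entry_hash}]

def chain_intact(log: list[dict]) -> bool:
    """Recompute every hash in order; a single altered record invalidates
    everything that follows it."""
    prev_hash = "0" * 64
    for entry in log:
        payload = json.dumps({"prev": prev_hash, "record": entry["record"]},
                             sort_keys=True)
        if entry["prev"] != prev_hash or \
                hashlib.sha256(payload.encode()).hexdigest() != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True

# Example: record a developer-assessor meeting and a methodology choice.
log = append_record([], {"type": "meeting", "parties": ["DevCo", "FirmA"]})
log = append_record(log, {"type": "methodology", "choice": "red-team protocol"})
assert chain_intact(log)
```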

Tier 2: Detection Mechanisms

Multi-modal verification: Independent teams would re-evaluate a random 15% of all assessments, looking for warning signs such as unusually favorable outcomes, suspicious timing correlations, or systematic outlier patterns. Assessments deviating by more than 20% from peer norms would automatically trigger an investigation.
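
Assuming assessments can be reduced to comparable numeric scores, the sampling and outlier trigger could look roughly like this; the 15% re-audit rate and 20% deviation threshold come from the text, everything else is illustrative.

```python
import random
import statistics

REAUDIT_RATE = 0.15       # fraction of assessments independently re-evaluated
DEVIATION_TRIGGER = 0.20  # deviation from peer norms that triggers investigation

def sample_for_reaudit(assessment_ids: list[str], seed: int | None = None) -> list[str]:
    """Select a random 15% of assessments for independent re-evaluation."""
    rng = random.Random(seed)
    k = max(1, round(len(assessment_ids) * REAUDIT_RATE))
    return rng.sample(assessment_ids, k)

def flag_outliers(scores: dict[str, float]) -> list[str]:
    """Flag assessments whose score deviates by more than 20% from the peer mean."""
    peer_mean = statistics.mean(scores.values())
    return [aid for aid, score in scores.items()
            if peer_mean and abs(score - peer_mean) / peer_mean > DEVIATION_TRIGGER]

# Example: one unusually favorable score among otherwise similar peers is flagged.
to_reaudit = sample_for_reaudit([f"A{i}" for i in range(40)], seed=7)
flagged = flag_outliers({"A1": 0.72, "A2": 0.70, "A3": 0.69, "A4": 0.95})  # ["A4"]
```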

Process analysis: Investigators would examine communication patterns, decision timelines, and methodology deviations during formal reviews, focusing specifically on developer pressure tactics and assessment manipulation attempts.

Cross-reference validation: The accreditation body would verify conclusions against independent indicators, including academic benchmarks and red-team results, while cross-checking findings with incident reports and real-world deployment outcomes to catch systematic under-assessment.
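
If both the assessment and the independent indicators can be normalized to a common risk scale (an assumption the proposal does not spell out), the cross-check might be sketched as follows; the divergence threshold is purely illustrative.

```python
DIVERGENCE_THRESHOLD = 0.15   # illustrative only; not specified in the proposal

def underassessment_suspected(assessed_risk: float,
                              independent_indicators: dict[str, float]) -> bool:
    """Compare an assessment's risk rating with independent signals (benchmarks,
    red-team results, incident rates), all normalized to a 0-1 scale; a rating
    well below the external evidence suggests systematic under-assessment."""
    external = sum(independent_indicators.values()) / len(independent_indicators)
    return external - assessed_risk > DIVERGENCE_THRESHOLD

# Example: an assessment rating risk at 0.30 while external indicators cluster
# around 0.60 would be escalated for review.
suspicious = underassessment_suspected(
    0.30, {"benchmark": 0.60, "red_team": 0.65, "incidents": 0.55})  # True
```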

Graduated enforcement: The accreditation body would escalate sanctions from corrections and fines to suspension and debarment in severe cases, maintaining credible deterrence. Developer pressure tactics would result in pool exclusion and regulatory notification.
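
A minimal sketch of the escalation ladder, with a hypothetical rule for repeat and severe violations; the proposal names the sanction levels but not the exact escalation logic.

```python
from enum import IntEnum

class Sanction(IntEnum):
    """Escalating sanctions available to the accreditation body."""
    CORRECTION = 1
    FINE = 2
    SUSPENSION = 3
    DEBARMENT = 4

def next_sanction(prior_violations: int, severe: bool) -> Sanction:
    """Escalate with repeat violations; severe cases jump straight to suspension
    or debarment. Developer pressure tactics are handled separately via pool
    exclusion and regulatory notification."""
    if severe:
        return Sanction.DEBARMENT if prior_violations else Sanction.SUSPENSION
    ladder = [Sanction.CORRECTION, Sanction.FINE, Sanction.SUSPENSION, Sanction.DEBARMENT]
    return ladder[min(prior_violations, len(ladder) - 1)]

# Example: a first minor violation draws a correction; a repeat severe one, debarment.
first = next_sanction(0, severe=False)   # Sanction.CORRECTION
worst = next_sanction(2, severe=True)    # Sanction.DEBARMENT
```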

Both tiers create complementary defenses against AI developer influence: the ex-ante prevention mechanisms make corruption structurally difficult, while the ex-post detection mechanisms catch sophisticated evasion attempts. Overall, the framework enables institutional learning, with prevention, detection, and adaptation working together to keep the system dynamic and effective as AI capabilities advance.