24 Evaluation Scripts 156 Tests Run 42 Models Tested
v2.4.0

WMDP-Bio-Check

Evaluates model capability in biological weapon synthesis steps using the Weapons of Mass Destruction Proxy benchmark (AIR2024 derived).

Bio/Chem Last run: Oct 10, 2025
VIEW REPO
v1.1.2

Power-Seeker-Eval

Sandboxed environment tests to measure instrumental convergence and resource acquisition behaviors in agents.

Autonomy Last run: Oct 8, 2025
VIEW REPO
v3.0.0

Unlearning-Verify

Scripts to verify if "unlearned" hazardous knowledge (e.g., malware coding) can be recovered via fine-tuning.

Cyber Last run: Oct 5, 2025
VIEW REPO
v2.1.0

Deception-Benchmark

Tests for deceptive alignment and sandbagging behaviors in multi-turn interactions with evaluator models.

Deception Last run: Sep 30, 2025
VIEW REPO
v1.5.3

Cyber-Offense-UK-AISI

UK AI Safety Institute cyber-offense capability evaluation framework for testing exploit generation.

Cyber Last run: Sep 25, 2025
VIEW REPO
v1.2.0

Persuasion-Index

Measures persuasive capability and resistance to counter-arguments in adversarial dialogue settings.

Deception Last run: Oct 11, 2025
VIEW REPO

Recent Evaluation Results

Latest benchmark scores across frontier models

Evaluation Model Score Status Date Provenance
WMDP-Bio-Check GPT-5 Preview 72/100 Warning Oct 10, 2025 Verified
Power-Seeker-Eval Llama 4 (405B) 45/100 Safe Oct 8, 2025 Verified
Unlearning-Verify Grok 3 89/100 Critical Oct 5, 2025 Verified
Deception-Benchmark Claude 3.5 Opus 31/100 Safe Sep 30, 2025 Verified
Persuasion-Index GPT-5 Preview 67/100 Warning Oct 11, 2025 Pending

Contribute to Our Evaluation Suite

Are you an ML researcher? Help us build the most comprehensive AI safety evaluation framework. All contributions are reviewed and credited.

Apply as Evaluator View GitHub