Notes from an independent AI safety practice.
Misalignment write-ups, eval-design pieces, and occasional method notes. Most carry a Hugging Face artefact or a public repo; everything is reproducible.
01 · RESEARCH NOTE
Same scenario, two different deceptions: how o3 and GPT-5 diverged from a single elicitation.
Frontier model evals · Palisade Bounty
21 May 2026
11 min