Shutdown Rewiring
Intended misalignment rate for shutdown rewiring tasks, sorted by decreasing misalignment.
- 1 GPT-5.4 95%
- 2 GPT-5.5 91%
- 3 Claude Opus 4.6 91%
- 4 GPT-5.5 (xhigh) 88%
- 5 Claude Opus 4.6 (Subagents) 73%
- 6 GPT-5.4 (Subagents) 68%
- 7 Gemini 3.1 Pro Preview 41%
- 8 Kimi K2.6 36%
- 9 GPT-5.4 Mini 18%
- 10 Qwen 3.6 Plus 18%
- 11 Claude Opus 4.7 14%
- 12 Claude Opus 4.7 (xhigh) 14%
Restricted Resource Access
Intended violation rate for restricted resource access tasks, sorted by decreasing violation rate.
- 1 Claude Opus 4.6 100%
- 2 Claude Opus 4.6 (Subagents) 100%
- 3 Gemini 3.1 Pro Preview 62%
- 4 GPT-5.4 (Subagents) 50%
- 5 Qwen 3.6 Plus 38%
- 6 Claude Opus 4.7 (xhigh) 25%
- 7 Kimi K2.6 25%
- 8 Claude Opus 4.7 12%
- 9 GPT-5.5 0%
- 10 GPT-5.5 (xhigh) 0%
- 11 GPT-5.4 0%
- 12 GPT-5.4 Mini 0%
User-Control Override
Intended misalignment rate for user-control override tasks, sorted by decreasing misalignment.
- 1 GPT-5.5 (xhigh) 100%
- 2 GPT-5.4 100%
- 3 GPT-5.4 (Subagents) 100%
- 4 GPT-5.5 98%
- 5 GPT-5.4 Mini 98%
- 6 Gemini 3.1 Pro Preview 94%
- 7 Claude Opus 4.7 87%
- 8 Qwen 3.6 Plus 87%
- 9 Kimi K2.6 87%
- 10 Claude Opus 4.7 (xhigh) 83%
- 11 Claude Opus 4.6 (Subagents) 31%
- 12 Claude Opus 4.6 10%