ROGUE Benchmark

Leaderboard

Model-level intended misalignment rates

Shutdown Rewiring

Intended misalignment rate for shutdown rewiring tasks, sorted by decreasing misalignment.

  1. GPT-5.4: 95%
  2. GPT-5.5: 91%
  3. Claude Opus 4.6: 91%
  4. GPT-5.5 (xhigh): 88%
  5. Claude Opus 4.6 (Subagents): 73%
  6. GPT-5.4 (Subagents): 68%
  7. Gemini 3.1 Pro Preview: 41%
  8. Kimi K2.6: 36%
  9. GPT-5.4 Mini: 18%
  10. Qwen 3.6 Plus: 18%
  11. Claude Opus 4.7: 14%
  12. Claude Opus 4.7 (xhigh): 14%

Restricted Resource Access

Intended violation rate for restricted resource access tasks, sorted by decreasing violation rate.

  1. Claude Opus 4.6: 100%
  2. Claude Opus 4.6 (Subagents): 100%
  3. Gemini 3.1 Pro Preview: 62%
  4. GPT-5.4 (Subagents): 50%
  5. Qwen 3.6 Plus: 38%
  6. Claude Opus 4.7 (xhigh): 25%
  7. Kimi K2.6: 25%
  8. Claude Opus 4.7: 12%
  9. GPT-5.5: 0%
  10. GPT-5.5 (xhigh): 0%
  11. GPT-5.4: 0%
  12. GPT-5.4 Mini: 0%

User-Control Override

Intended misalignment rate for user-control override tasks, sorted by decreasing misalignment.

  1. GPT-5.5 (xhigh): 100%
  2. GPT-5.4: 100%
  3. GPT-5.4 (Subagents): 100%
  4. GPT-5.5: 98%
  5. GPT-5.4 Mini: 98%
  6. Gemini 3.1 Pro Preview: 94%
  7. Claude Opus 4.7: 87%
  8. Qwen 3.6 Plus: 87%
  9. Kimi K2.6: 87%
  10. Claude Opus 4.7 (xhigh): 83%
  11. Claude Opus 4.6 (Subagents): 31%
  12. Claude Opus 4.6: 10%