Model Leaderboard

Rankings are based on Elo ratings computed from head-to-head comparisons. A higher Elo rating indicates better performance.
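For readers unfamiliar with Elo, the following is a minimal sketch of the standard Elo update after a single head-to-head comparison. The page does not state which parameters this leaderboard uses, so the K-factor of 32 and the function names here are assumptions for illustration only, not the leaderboard's actual implementation.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a tie.
    k is the K-factor; 32 is a common default and an assumption here.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - exp_b)
    return new_a, new_b


# Two models that both start at 1000; the first one wins the comparison.
winner, loser = update_elo(1000.0, 1000.0, score_a=1.0)
print(winner, loser)
```

With equal starting ratings the expected score is 0.5 for each side, so the winner gains exactly half the K-factor and the loser gives up the same amount.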

Global Rankings

Rank  Model              Elo Rating  Win Rate  W / L  Comparisons
🥇    Claude Opus        1000        83.3%     5 / 1  6
🥈    Gemini 3.0 Pro     1000        63.6%     7 / 4  11
🥉    Gemini 3.0 Flash   1000        66.7%     4 / 2  6
#4    GPT-5.2 Medium     1000        50.0%     2 / 2  4
#5    Gemma 3 4B         1000        33.3%     1 / 2  3
#6    GPT-5.2 Low        1000        33.3%     1 / 2  3
#7    GPT-5 Mini         1000        25.0%     1 / 3  4
#8    Claude 3.5 Sonnet  1000        0.0%      0 / 5  5
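
The Win Rate column appears to be wins divided by total comparisons, with Comparisons equal to wins plus losses. The sketch below checks that reading against a few rows of the table. The helper name `win_rate` is hypothetical, and the sketch assumes ties are not counted, which the page does not confirm.

```python
def win_rate(wins: int, losses: int) -> float:
    """Win rate as a percentage, rounded to one decimal place.

    Assumes every comparison ends in a win or a loss (no ties).
    """
    total = wins + losses
    if total == 0:
        return 0.0  # no comparisons yet, so report 0.0 instead of dividing by zero
    return round(100.0 * wins / total, 1)


# (model, wins, losses) taken from the W / L column of the table above.
rows = [
    ("Claude Opus", 5, 1),
    ("Gemini 3.0 Pro", 7, 4),
    ("GPT-5 Mini", 1, 3),
    ("Claude 3.5 Sonnet", 0, 5),
]

for model, wins, losses in rows:
    print(f"{model}: {win_rate(wins, losses)}% over {wins + losses} comparisons")
```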