Coding

How well models write, edit, and debug real code.

Last updated · May 08, 2026, 12:40 UTC

Rank  Model                   Provider     Composite  Note
1     Claude Opus 4.6         Anthropic    75.6       limited
2     GLM-5                   Z.ai         72.8       limited
3     GPT-5                   OpenAI       66.2
4     Qwen3-235B-A22B         Alibaba      65.9       limited
5     Claude Opus 4.5         Anthropic    64.9
6     Gemini 3 Pro            Google       63.4
7     o4-mini                 OpenAI       63.2
8     o3                      OpenAI       62.3
9     GPT-5.2                 OpenAI       61.8
10    Gemini 3 Flash          Google       60.7
11    Claude Sonnet 4.5       Anthropic    58.4
12    GPT-4.1                 OpenAI       58.0
13    Grok 4                  xAI          57.9       limited
14    Doubao-Seed-Code        ByteDance    57.2
15    Claude Sonnet 4         Anthropic    56.9
16    Claude Opus 4           Anthropic    56.6       limited
17    GPT-5.1                 OpenAI       56.3
18    Gemini 2.5 Pro          Google       55.5
19    DeepSeek V3.2           DeepSeek     54.9
20    Kimi K2                 Moonshot AI  53.5
21    Claude 3.7 Sonnet       Anthropic    47.0
22    Claude 3.5 Sonnet       Anthropic    44.7
23    DeepSeek R1             DeepSeek     40.6       limited
24    GPT-5 Codex             OpenAI       38.9       limited
25    DeepSeek V3.2 Speciale  DeepSeek     37.9       limited
26    o3-mini                 OpenAI       36.8       limited
27    DeepSeek V3             DeepSeek     31.3
28    Gemini 2.5 Flash        Google       25.8
29    GPT-4o                  OpenAI       22.8
30    Grok 3                  xAI          19.8       limited
31    Llama 4 Maverick        Meta         15.6       limited