OpenSOTA — Agentic & Coding LLM Leaderboards

Coding

How well models write, edit, and debug real code.

Last updated · May 31, 2026, 13:27 UTC

Rank	Model	Provider	Composite	SWE-bench Verified	LiveCodeBench	Aider Polyglot	Artificial Analysis
1	Claude Opus 4.6 (limited)	Anthropic	75.6	75.6	—	—	—
2	GLM-5 (limited)	Z.ai	72.8	72.8	—	—	—
3	GPT-5	OpenAI	66.2	74.4	—	88.0	39.0
4	Qwen3-235B-A22B (limited)	Alibaba	65.9	—	65.9	—	—
5	Claude Opus 4.5	Anthropic	64.9	79.2	—	—	47.8
6	Gemini 3 Pro	Google	63.4	77.4	—	—	46.5
7	o4-mini	OpenAI	63.2	74.4	80.2	72.0	25.6
8	o3	OpenAI	62.3	58.4	75.8	81.3	38.4
9	GPT-5.2	OpenAI	61.8	72.8	—	—	48.7
10	Gemini 3 Flash	Google	60.7	75.8	—	—	42.6
11	Claude Sonnet 4.5	Anthropic	58.4	74.8	—	—	38.6
12	GPT-4.1	OpenAI	58.0	74.6	—	78.2	21.8
13	Grok 4 (limited)	xAI	57.9	—	—	79.6	40.5
14	Doubao-Seed-Code	ByteDance	57.2	78.8	—	—	31.3
15	Claude Sonnet 4	Anthropic	56.9	76.8	56.0	—	34.1
16	Claude Opus 4 (limited)	Anthropic	56.6	—	56.6	—	—
17	GPT-5.1	OpenAI	56.3	66.0	—	—	44.7
18	Gemini 2.5 Pro	Google	55.5	75.2	—	—	31.9
19	DeepSeek V3.2	DeepSeek	54.9	70.0	—	—	36.7
20	Kimi K2	Moonshot AI	53.5	65.4	—	59.1	34.8
21	Claude 3.7 Sonnet	Anthropic	47.0	63.2	—	—	27.6
22	Claude 3.5 Sonnet	Anthropic	44.7	50.8	36.4	64.0	30.2
23	DeepSeek R1 (limited)	DeepSeek	40.6	—	—	71.4	15.9
24	GPT-5 Codex (limited)	OpenAI	38.9	—	—	—	38.9
25	DeepSeek V3.2 Speciale (limited)	DeepSeek	37.9	—	—	—	37.9
26	o3-mini (limited)	OpenAI	36.8	—	—	60.4	17.9
27	DeepSeek V3	DeepSeek	31.3	—	27.2	55.1	16.4
28	Gemini 2.5 Flash	Google	25.8	28.7	—	—	22.2
29	GPT-4o	OpenAI	22.8	21.6	—	—	24.2
30	Grok 3 (limited)	xAI	19.8	—	—	—	19.8
31	Llama 4 Maverick (limited)	Meta	15.6	—	—	15.6	15.6
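
The page does not say how the Composite column is computed. The listed values are consistent, to within 0.1, with a weighted mean over whichever benchmarks a model has a score for. The sketch below illustrates that reading only; the weights (1.2, 1.0, 0.8, 1.0) and the names ASSUMED_WEIGHTS and composite are assumptions inferred from the rows above, not a published formula.

```python
# Hypothetical weights inferred from the table rows; the leaderboard
# itself does not document them.
ASSUMED_WEIGHTS = {
    "SWE-bench Verified": 1.2,
    "LiveCodeBench": 1.0,
    "Aider Polyglot": 0.8,
    "Artificial Analysis": 1.0,
}


def composite(scores):
    """Weighted mean over the benchmarks that have a score.

    `scores` maps benchmark name to a float, or to None where the
    table shows a missing value. Returns None if nothing is available.
    """
    available = {name: value for name, value in scores.items() if value is not None}
    if not available:
        return None
    total_weight = sum(ASSUMED_WEIGHTS[name] for name in available)
    weighted_sum = sum(ASSUMED_WEIGHTS[name] * value for name, value in available.items())
    return round(weighted_sum / total_weight, 1)


# Scores copied from the GPT-5 row of the table.
gpt5 = {
    "SWE-bench Verified": 74.4,
    "LiveCodeBench": None,
    "Aider Polyglot": 88.0,
    "Artificial Analysis": 39.0,
}
print(composite(gpt5))
```

Under this reading a model with a single score simply inherits it as its Composite, which would explain why rows with one benchmark can rank above models with broader coverage.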