
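(Concept) A high-difficulty benchmark that extends and refines the standard NYT Connections puzzle to measure the performance of artificial intelligence (AI) models.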

 

On the Extended NYT Connections benchmark, the high-reasoning version of GPT‑5.2 scores 77.9, up from 69.9.

 

Leaderboard: Extended Version

Rank  Model                      Score %  # Puzzles

1     Gemini 3 Pro Preview       96.8     759
2     Grok 4.1 Fast Reasoning    93.5     759
3     Sherlock Think Alpha       92.4     759
4     Grok 4 Fast Reasoning      92.1     759
5     Grok 4                     91.7     759
6     Sonoma Sky Alpha           90.7     759
7     o3-pro (medium reasoning)  87.3     759
8     GPT-5 Pro                  83.9     759
9     o1-pro (medium reasoning)  82.5     651
10    o3 (high reasoning)        78.6     759
11    GPT-5.2 (high reasoning)   77.9     759
12    GPT-5 (high reasoning)     77.0     759

 

Correlation of puzzle-level results: heatmap
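
For readers who want to see what a puzzle-level correlation involves, here is a minimal sketch in Python. It assumes each model has one numeric score per puzzle and that the heatmap shows pairwise Pearson correlations. The post does not say how the repository computes its heatmap, so the metric, the function names (`pearson`, `correlation_matrix`), the model labels and the toy scores are all hypothetical illustrations, not benchmark data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A model with identical scores on every puzzle has no defined correlation.
        return float("nan")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(results):
    """Build a nested dict of pairwise correlations, one cell per heatmap square."""
    names = list(results)
    return {a: {b: pearson(results[a], results[b]) for b in names} for a in names}


# Hypothetical per-puzzle scores (NOT real benchmark data); one entry per puzzle.
toy_results = {
    "model_a": [1.0, 0.5, 0.0, 1.0],
    "model_b": [1.0, 0.5, 0.0, 1.0],
    "model_c": [0.0, 0.5, 1.0, 0.0],
}

matrix = correlation_matrix(toy_results)

if __name__ == "__main__":
    for row_name, row in matrix.items():
        cells = "  ".join(f"{value:+.2f}" for value in row.values())
        print(f"{row_name}: {cells}")
```

A high value between two models would suggest they tend to succeed and fail on the same puzzles, while a low value suggests their errors fall on different puzzles.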

 

https://github.com/lechmazur/nyt-connections/

 

 

Posted by Mr. Slumber