Char :: LLM - Performance - Model Optimization - The Paradox of the 'Test-Time Scaling' Technique

07.AI 2025. 7. 30. 16:54

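A study of the phenomenon in which model performance drops as the reasoning length (number of reasoning steps) increases found that each model family fails in its own way.

Claude models become increasingly distracted the longer they reason,

while OpenAI o-series models overfit to the way the task is framed.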
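To make the idea of inverse scaling concrete: if you evaluate a model at several reasoning budgets and record its accuracy at each one, a negative trend of accuracy against budget is the inverse-scaling signature. The sketch below is a minimal illustration under stated assumptions, not the paper's evaluation code. The function names `scaling_slope` and `is_inverse_scaling` are hypothetical, and regressing on log2(budget) is an assumption based on budgets commonly being swept by doubling.

```python
import math
from statistics import mean


def scaling_slope(budgets, accuracies):
    """Least-squares slope of accuracy vs. log2(reasoning budget).

    Hypothetical helper: a negative slope means accuracy falls as the
    model is given more test-time compute (inverse scaling).
    """
    if len(budgets) != len(accuracies) or len(budgets) < 2:
        raise ValueError("need at least two paired (budget, accuracy) points")
    if any(b <= 0 for b in budgets):
        raise ValueError("budgets must be positive to take log2")
    # Assumption: budgets are swept geometrically, so use a log2 x-axis.
    xs = [math.log2(b) for b in budgets]
    mx, my = mean(xs), mean(accuracies)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("budgets must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, accuracies))
    return sxy / sxx


def is_inverse_scaling(budgets, accuracies, tol=0.0):
    """True if accuracy decreases with budget by more than `tol` per doubling."""
    return scaling_slope(budgets, accuracies) < -tol
```

Usage: call `is_inverse_scaling([1024, 2048, 4096, 8192], accs)`, where `accs` are accuracies you measured yourself at those budgets. A single slope can hide non-monotonic curves, so it is still worth plotting per-budget accuracy before drawing conclusions.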

https://arxiv.org/abs/2507.14417

Inverse Scaling in Test-Time Compute

We construct evaluation tasks where extending the reasoning length of Large Reasoning Models (LRMs) deteriorates performance, exhibiting an inverse scaling relationship between test-time compute and accuracy. Our evaluation tasks span four categories: simp…

arxiv.org

https://github.com/safety-research/inverse-scaling-ttc

GitHub - safety-research/inverse-scaling-ttc: Inverse Scaling in Test-Time Compute

Inverse Scaling in Test-Time Compute. Contribute to safety-research/inverse-scaling-ttc development by creating an account on GitHub.

github.com
