[Paper Review] OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models
A detailed review of the paper 'OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models', posted on arXiv.
#Review #AI Agents #Language World Models #Professional Tasks #Environmental Robustness #Fault Injection #Benchmark
April 15, 2026
[Paper Review] $OneMillion-Bench: How Far are Language Agents from Human Experts?
A detailed review of the paper '$OneMillion-Bench: How Far are Language Agents from Human Experts?', posted on arXiv.
#Review #Language Agents #Benchmarking #Expert Evaluation #Economic Value #Professional Tasks #Rubric-based Evaluation #Multi-step Reasoning #Reliability #Domain Adaptation
March 9, 2026