[Paper Review] CodeClash: Benchmarking Goal-Oriented Software Engineering
A detailed review of the paper 'CodeClash: Benchmarking Goal-Oriented Software Engineering', published on arXiv.
#Review #Software Engineering Benchmarking #Language Models #AI Agents #Goal-Oriented Development #Competitive Programming #Code Evolution #Strategic Reasoning #Autonomous Systems
November 9, 2025
[Paper Review] Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
A detailed review of the paper 'Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play', posted to arXiv by Jing Shi.
#Review #Vision-Language Models (VLMs) #Self-Play #Reinforcement Learning #Gamification #Data Efficiency #Strategic Reasoning #Multimodal AI #Self-Improvement
October 1, 2025
[Paper Review] Democratizing Diplomacy: A Harness for Evaluating Any Large Language Model on Full-Press Diplomacy
A detailed review of the paper 'Democratizing Diplomacy: A Harness for Evaluating Any Large Language Model on Full-Press Diplomacy', posted to arXiv by Elizabeth Karpinski.
#Review #Large Language Models #Diplomacy Game #Multi-agent Systems #Strategic Reasoning #LLM Evaluation #Prompt Engineering #Behavioral Analysis #Game AI
August 13, 2025