#Iterative Development

2 posts

[Paper Review] SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks

Existing coding agent benchmarks overwhelmingly evaluate single-shot solutions against a complete specification, which measures only whether an agent can produce correct code for the current specification.

#Review #SlopCodeBench #Coding Agents #Iterative Development #Code Quality #Structural Erosion #Verbosity #Benchmarking #Long-Horizon Tasks

March 26, 2026

[Paper Review] CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production

This paper presents CharacterFlywheel, an iterative process for repeatedly improving the user engagement and steerability of LLMs in production social chat applications such as Instagram, WhatsApp, and Messenger.

#Review #LLM #Social Chat #Engagement Optimization #Steerability #Reinforcement Learning #Reward Modeling #A/B Testing #Iterative Development

March 2, 2026