[Paper Review] GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers
A detailed review of this arXiv paper posted by Zhiyang Chen.
Tags: Review, Autonomous Bug Discovery, Large Language Models, Game Benchmark, Quality Assurance, Multi-agent System, Software Engineering
April 7, 2026

[Paper Review] Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents
A detailed review of this arXiv paper.
Tags: Review, LLM Agents, Software Engineering, Underspecification, Uncertainty-Aware, Multi-Agent, Collaborative AI
April 2, 2026

[Paper Review] SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
A detailed review of this arXiv paper posted by Bing Zhao.
Tags: Review, LLM Agents, Software Engineering, Code Maintenance, Continuous Integration, Benchmark, Code Generation, Long-term Evaluation, Technical Debt
March 4, 2026

[Paper Review] Qwen3-Coder-Next Technical Report
A detailed review of this arXiv paper.
Tags: Review, Coding Agents, Large Language Models (LLMs), Mixture-of-Experts (MoE), Agentic Training, Software Engineering, Reinforcement Learning, Code Generation, Tool Usage
March 3, 2026

[Paper Review] LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces
A detailed review of this arXiv paper posted by Chuanhao Li.
Tags: Review, Agentic Programming, CLI, Benchmark, Long-horizon Tasks, Code Generation, LLM Evaluation, Human-Agent Collaboration, Software Engineering
February 24, 2026

[Paper Review] GLM-5: from Vibe Coding to Agentic Engineering
A detailed review of this arXiv paper posted by the GLM-5 Team.
Tags: Review, Foundation Model, Agentic AI, Reinforcement Learning, Sparse Attention, Software Engineering, Long-Context Models, GPU Optimization
February 17, 2026

[Paper Review] AIDev: Studying AI Coding Agents on GitHub
A detailed review of this arXiv paper posted by Ahmed E. Hassan.
Tags: Review, AI Coding Agents, GitHub Data, Software Engineering, Pull Request Analysis, Human-AI Collaboration, Developer Productivity, Large Language Models
February 16, 2026

[Paper Review] FeatureBench: Benchmarking Agentic Coding for Complex Feature Development
A detailed review of this arXiv paper posted by Jiahe Wang.
Tags: Review, Agentic Coding, Benchmarking, LLMs, Feature Development, Software Engineering, Test-Driven Development, Scalability
February 11, 2026

[Paper Review] CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion
A detailed review of this arXiv paper posted by Feiyang Pan.
Tags: Review, Agentic Coding, CLI Automation, Environment Inversion, Task Generation, Large Language Models (LLMs), Software Engineering, Dockerfile, Terminal-Bench
February 11, 2026

[Paper Review] CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding
A detailed review of this arXiv paper.
Tags: Review, Vision Language Models, Code Understanding, Visual Code Representation, Code Compression, Computational Efficiency, Multimodal LLMs, Software Engineering
February 3, 2026

[Paper Review] Kimi K2.5: Visual Agentic Intelligence
A detailed review of this arXiv paper.
Tags: Review, Multimodal AI, Agentic Intelligence, Vision-Language Models, Parallel Agent Orchestration, Reinforcement Learning, Joint Optimization, Visual Reasoning, Software Engineering
February 2, 2026

[Paper Review] TAM-Eval: Evaluating LLMs for Automated Unit Test Maintenance
A detailed review of this arXiv paper posted by Daniil Grebenkin.
Tags: Review, LLM, Unit Test Maintenance, Software Engineering, Code Generation, Test Repair, Test Updating, Benchmark, Mutation Testing, Code Coverage
February 1, 2026

[Paper Review] Guidelines to Prompt Large Language Models for Code Generation: An Empirical Characterization
A detailed review of this arXiv paper posted by Gabriele Bavota.
Tags: Review, Large Language Models, Code Generation, Prompt Engineering, Prompt Optimization, Empirical Study, Software Engineering, Guidelines
January 25, 2026

[Paper Review] Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
A detailed review of this arXiv paper posted by Harsh Raj.
Tags: Review, AI Agents, LLM Evaluation, Benchmarking, Command Line Interface, Software Engineering, Realistic Tasks, Error Analysis
January 22, 2026

[Paper Review] Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey
A detailed review of this arXiv paper.
Tags: Review, LLM-based Issue Resolution, Software Engineering, Autonomous Agents, Code Generation, Benchmarking, Reinforcement Learning, Supervised Fine-tuning, Multimodal LLMs
January 20, 2026

[Paper Review] MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences
A detailed review of this arXiv paper posted by Rui Xu.
Tags: Review, Code Agents, Software Engineering, Experiential Memory, GitHub Data, Experience Governance, Agentic Search, LLM Applications, Bug Fixing
January 13, 2026

[Paper Review] AgentDevel: Reframing Self-Evolving LLM Agents as Release Engineering
A detailed review of this arXiv paper posted by Di Zhang.
Tags: Review, LLM Agents, Release Engineering, Self-Improvement, Regression Testing, Continuous Integration, Flip-Centered Gating, Auditable Development, Software Engineering
January 8, 2026

[Paper Review] SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving
A detailed review of this arXiv paper.
Tags: Review, Software Engineering, Issue Resolution, Supervised Fine-tuning (SFT), Large Language Models (LLMs), Hybrid Dataset, Error Masking, Curriculum Learning, Test-Time Scaling (TTS), Generative Verifiers
January 5, 2026

[Paper Review] GraphLocator: Graph-guided Causal Reasoning for Issue Localization
A detailed review of this arXiv paper posted by Wei Zhang.
Tags: Review, Issue Localization, Causal Reasoning, Graph-guided, Large Language Models, Software Engineering, Defect Analysis, Repository Mining
December 30, 2025

[Paper Review] SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios
A detailed review of this arXiv paper posted by Nghi D. Q. Bui.
Tags: Review, Coding Agents, Software Evolution, Benchmarking, Long-Horizon Tasks, Large Language Models (LLMs), Software Engineering, Code Generation
December 24, 2025

[Paper Review] NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents
A detailed review of this arXiv paper posted by chongyang09.
Tags: Review, Coding Agents, LLMs, Software Engineering, Repository Generation, Long-Horizon Reasoning, Benchmark, Python Development, Autonomous Systems
December 15, 2025

[Paper Review] Confucius Code Agent: An Open-sourced AI Software Engineer at Industrial Scale
A detailed review of this arXiv paper.
Tags: Review, AI Agent, Software Engineering, Open-Source, LLM, Orchestrator, Context Management, Long-term Memory, Meta-agent
December 11, 2025

[Paper Review] Agent READMEs: An Empirical Study of Context Files for Agentic Coding
A detailed review of this arXiv paper posted by Kundjanasith Thonglek.
Tags: Review, Agentic Coding, Context Files, READMEs for Agents, Empirical Study, Software Engineering, Documentation Maintenance, Non-functional Requirements, LLMs
November 18, 2025

[Paper Review] LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering
A detailed review of this arXiv paper.
Tags: Review, LLM Agents, Software Engineering, Long-Context, Interactive Benchmark, Tool Usage, Memory Management, Bias-Free Evaluation, Multi-Turn
November 17, 2025

[Paper Review] Agentic Refactoring: An Empirical Study of AI Coding Agents
A detailed review of this arXiv paper posted by Hajimu Iida.
Tags: Review, AI Agents, Code Refactoring, Software Engineering, Empirical Study, Large Language Models, Code Quality, Agentic Software Development, Maintainability
November 12, 2025

[Paper Review] Walking the Tightrope of LLMs for Software Development: A Practitioners' Perspective
A detailed review of this arXiv paper posted by Christoph Treude.
Tags: Review, Large Language Models, Software Engineering, Developer Productivity, Socio-Technical Grounded Theory, Practitioner Insights, AI Adoption, Benefits and Risks, Balanced Use
November 11, 2025

[Paper Review] Diff-XYZ: A Benchmark for Evaluating Diff Understanding
A detailed review of this arXiv paper.
Tags: Review, Diff Understanding, Code Diff, Benchmark, LLMs, Code Editing, Software Engineering, Unified Diff Format, Search-Replace
October 24, 2025

[Paper Review] A Survey of Vibe Coding with Large Language Models
A detailed review of this arXiv paper.
Tags: Review, Vibe Coding, Large Language Models, Coding Agents, Human-AI Collaboration, Software Engineering, Development Models, Context Engineering
October 15, 2025

[Paper Review] BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
A detailed review of this arXiv paper posted by Hange Liu.
Tags: Review, Code Generation, Human Preference, LLM Evaluation, Execution Feedback, Benchmarking, Crowdsourcing, Software Engineering, Large Language Models
October 13, 2025

[Paper Review] Code4MeV2: a Research-oriented Code-completion Platform
A detailed review of this arXiv paper.
Tags: Review, Code Completion, Research Platform, Human-AI Interaction, Software Engineering, Open Science, JetBrains IDE Plugin, Telemetry, AI4SE
October 7, 2025

[Paper Review] PIPer: On-Device Environment Setup via Online Reinforcement Learning
A detailed review of this arXiv paper.
Tags: Review, Environment Setup, LLMs, Reinforcement Learning, Supervised Fine-tuning, On-device AI, Software Engineering, Verifiable Rewards
October 2, 2025

[Paper Review] BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software
A detailed review of this arXiv paper.
Tags: Review, LLM Agents, Open-Source Software, Compilation, Benchmarking, Software Engineering, Error Resolution, Retrieval-Augmented Generation
October 1, 2025

[Paper Review] V-GameGym: Visual Game Generation for Code Large Language Models
A detailed review of this arXiv paper posted by Shawn Guo.
Tags: Review, Code Large Language Models, Visual Game Generation, Benchmark, Pygame, Multimodal Evaluation, Software Engineering, AI-assisted Game Development
September 26, 2025

[Paper Review] On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub
A detailed review of this arXiv paper posted by Hajimu Iida.
Tags: Review, Agentic Coding, AI Agents, Large Language Models, GitHub Pull Requests, Software Engineering, Empirical Study, Code Generation, Software Development
September 25, 2025

[Paper Review] SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
A detailed review of this arXiv paper posted by Yannis Yiming He.
Tags: Review, AI Agents, Software Engineering, LLMs, Code Generation, Benchmark, Contamination Resistance, Long-Horizon Tasks, Enterprise Software
September 23, 2025

[Paper Review] CodeFuse-CR-Bench: A Comprehensiveness-aware Benchmark for End-to-End Code Review Evaluation in Python Projects
A detailed review of this arXiv paper posted by Hang Yu.
Tags: Review, Code Review, LLMs, Benchmark, Python Projects, End-to-End Evaluation, Context-Awareness, Software Engineering, LLM-as-a-Judge
September 23, 2025

[Paper Review] RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
A detailed review of this arXiv paper posted by Steven Liu.
Tags: Review, Code Generation, LLMs, Repository Planning, Graph-based Representation, Software Engineering, Agent Frameworks, Scalable Codebase
September 22, 2025

[Paper Review] LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
A detailed review of this arXiv paper posted by Jianguo Zhang.
Tags: Review, Long-Context LLMs, Software Engineering, Code Evaluation, Benchmark, Multi-file Reasoning, Architectural Understanding, Context Length, Software Development Lifecycle, Metrics
September 12, 2025

[Paper Review] Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning
A detailed review of this arXiv paper posted by Maksim Nekrashevich.
Tags: Review, Reinforcement Learning, Large Language Models, Software Engineering, Multi-Turn Interaction, Long Context, DAPO, Autonomous Agents, SWE-BENCH
August 7, 2025

[Paper Review] Tool-integrated Reinforcement Learning for Repo Deep Search
A detailed review of this arXiv paper posted by Yanzhen Zou.
Tags: Review, Issue Localization, Large Language Models (LLMs), Reinforcement Learning (RL), Supervised Fine-tuning (SFT), Tool-integrated Agents, Software Engineering, Code Search
August 6, 2025

[Paper Review] SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
A detailed review of this arXiv paper posted by Heng Lian.
Tags: Review, Multi-Agent System, Software Engineering, Fault Localization, Issue Resolution, Large Language Models, Competitive Debate, Graph Traversal
August 4, 2025