#Software Engineering

42 posts

[Paper Review] SWE-chat: Coding Agent Interactions From Real Users in the Wild

[Paper Review] GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers

[Paper Review] Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents

[Paper Review] SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration

[Paper Review] Qwen3-Coder-Next Technical Report

[Paper Review] LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces

[Paper Review] AIDev: Studying AI Coding Agents on GitHub

[Paper Review] FeatureBench: Benchmarking Agentic Coding for Complex Feature Development

[Paper Review] CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion

[Paper Review] TAM-Eval: Evaluating LLMs for Automated Unit Test Maintenance

[Paper Review] Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey

[Paper Review] SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

[Paper Review] NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

[Paper Review] Confucius Code Agent: An Open-sourced AI Software Engineer at Industrial Scale

[Paper Review] LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering

[Paper Review] Agentic Refactoring: An Empirical Study of AI Coding Agents

[Paper Review] Walking the Tightrope of LLMs for Software Development: A Practitioners' Perspective

[Paper Review] SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

[Paper Review] RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation

[Paper Review] LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering

[Paper Review] Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning

[Paper Review] Tool-integrated Reinforcement Learning for Repo Deep Search

[Paper Review] BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

[Paper Review] Code4MeV2: a Research-oriented Code-completion Platform

[Paper Review] PIPer: On-Device Environment Setup via Online Reinforcement Learning

[Paper Review] BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software
