[Paper Review] GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers
A detailed review of the paper 'GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers', posted to arXiv by Zhiyang Chen.
#Review #Autonomous Bug Discovery #Large Language Models #Game Benchmark #Quality Assurance #Multi-agent System #Software Engineering
April 7, 2026
[Paper Review] SciCoQA: Quality Assurance for Scientific Paper--Code Alignment
A detailed review of the paper 'SciCoQA: Quality Assurance for Scientific Paper--Code Alignment', posted to arXiv.
#Review #Reproducibility #Paper-Code Discrepancy #Code Alignment #LLM Evaluation #Synthetic Data Generation #Quality Assurance #Scientific Automation
January 20, 2026