[Paper Review] Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

A detailed review of the paper 'Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces', posted on arXiv by Harsh Raj.

#Review #AI Agents #LLM Evaluation #Benchmarking #Command Line Interface #Software Engineering #Realistic Tasks #Error Analysis

January 22, 2026