#General Capabilities

1 post

[Paper Review] A Survey on Large Language Model Benchmarks

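This paper systematically reviews the current state and evolution of large language model (LLM) evaluation benchmarks, analyzes the limitations of existing benchmarks, and proposes design paradigms for future benchmark innovation. It emphasizes the importance of benchmarks, which play a central role in measuring LLM capabilities and driving technical innovation.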

#Review #LLM Benchmarks #Evaluation #Systematic Review #General Capabilities #Domain-Specific Benchmarks #Target-Specific Benchmarks #Data Contamination #AI Ethics

August 22, 2025