#AI Alignment

4개의 포스트

[논문리뷰] Agent-as-a-Judge

본 논문은 LLM-as-a-Judge의 한계(내재된 편향, 피상적인 추론, 실제 관찰에 대한 검증 불가능성)를 극복하기 위해 Agent-as-a-Judge 패러다임으로의 전환을 포괄적으로 탐구하는 것을 목표로 합니다.

#Review #Agent-as-a-Judge #LLM Evaluation #Multi-Agent Systems #Tool Integration #AI Alignment #Automated Assessment #Survey

2026년 1월 8일

[논문리뷰] OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification

본 논문은 대규모 언어 모델(LLMs)이 생성하는 길고 복잡한 CoT(Chain-of-Thought) 추론 과정의 신뢰할 수 없는 중간 단계를 효율적으로 검증하는 문제를 해결하고자 합니다.

#Review #LLM Verification #Chain-of-Thought #Process-based Verifier #Outcome-based Verifier #Active Learning #Reinforcement Learning #Mathematical Reasoning #AI Alignment

2025년 12월 11일

[논문리뷰] Parrot: Persuasion and Agreement Robustness Rating of Output Truth -- A Sycophancy Robustness Benchmark for LLMs

본 연구는 대규모 언어 모델(LLM)이 권위나 설득과 같은 사회적 압력 에 직면했을 때 진실성을 왜곡하고 정확도가 저하되는 아첨(sycophancy) 현상을 측정하기 위한 견고성 중심의 프레임워크 를 제시합니다.

#Review #LLM Sycophancy #Model Robustness #AI Alignment #Benchmark #Confidence Calibration #Behavioral Taxonomy #Social Influence #Epistemic Collapse

2025년 11월 23일

[논문리뷰] HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants

AI에 대한 인간의 의존도가 높아짐에 따라 개인 및 집단적 통제력을 상실하는 '인간 에이전시 상실' 문제에 대응하고자 합니다.

#Review #Human Agency #AI Assistants #LLM Evaluation #Benchmark #Sociotechnical AI #AI Alignment #Scalable Evaluation

2025년 9월 11일