#Alignment Guardrail

1 post

[Paper Review] A Contextual Quality Reward Model for Reliable and Efficient Best-of-N Sampling

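Best-of-N (BoN) sampling, a current preference alignment technique, only selects the "better" response among candidates and cannot judge whether a response is "good enough" in absolute terms. This paper aims to solve that problem.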

#Review #Reward Model #Best-of-N Sampling #Preference Alignment #Contextual Acceptability #Discrete Choice Model #Alignment Guardrail #Inference Accelerator

October 8, 2025