#Adaptive Computation Allocation

1 post

[Paper Review] Process Rewards with Learned Reliability

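This paper addresses a limitation of existing process reward models (PRMs): they give each intermediate step only a single scalar reward, so the reliability of that score cannot be assessed.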

#Review #Process Reward Model #Beta-Binomial #Adaptive Computation Allocation #Test-Time Scaling #Uncertainty Estimation

May 19, 2026