#Process Reward Model

4 posts

[Paper Review] Off-the-Shelf LLMs as Process Scorers: Training-Free Alternative to PRMs for Mathematical Reasoning

This study aims to address the limitations of existing inference techniques for efficiently reproducing the reasoning performance of large models in small models.

#Review #Mathematical Reasoning #Large Language Models #Process Reward Model #Inference-time Guidance #Chunk-Level Generation #Likelihood Scoring #Training-Free

June 1, 2026

[Paper Review] Process Rewards with Learned Reliability

This paper aims to address a limitation of existing PRMs: they provide only a single scalar reward for each intermediate step, so the reliability of that score cannot be assessed.

#Review #Process Reward Model #Beta-Binomial #Adaptive Computation Allocation #Test-Time Scaling #Uncertainty Estimation

May 19, 2026

[Paper Review] PRISM: Pushing the Frontier of Deep Think via Process Reward Model-Guided Inference

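The paper aims to address a key bottleneck of DEEPTHINK systems: the lack of reliable correctness signals during inference. This shortage amplifies errors during deep reasoning, suppresses the minority of correct solutions, and reduces the efficiency of additional compute.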

#Review #DeepThink #Process Reward Model #Inference Algorithm #Population Refinement #Stochastic Mutation #Reasoning Benchmarks #Compute-Accuracy Tradeoff

March 3, 2026

[Paper Review] MASPRM: Multi-Agent System Process Reward Model

The paper aims to address the unreliability that arises during inference-time search in Multi-Agent Systems (MAS).

#Review #Multi-Agent Systems #Process Reward Model #MCTS #Inference-time Search #LLM Agents #Zero-shot Transfer #Reinforcement Learning #Compute-Aware Reasoning

October 30, 2025