#Indistinguishability

1 post

[Paper Review] Learning User Simulators with Turing Rewards

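This paper aims to address a fundamental limitation of existing approaches to training user simulators: they do not sufficiently mimic real human behavior. Prior work has relied mainly on maximizing log-probability or on measuring simple similarity to ground-truth responses.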

#Review #User Simulation #Turing Reward #Reinforcement Learning #Large Language Models #Indistinguishability #GRPO #Human-likeness

June 17, 2026