#Utterance-level Rewards

1개의 포스트

[논문리뷰] Sotopia-RL: Reward Design for Social Intelligence

본 논문은 대규모 언어 모델(LLM)을 사회적으로 지능적인 에이전트로 훈련할 때 직면하는 부분적 관측성(Partial Observability) 과 다차원성(Multi-dimensionality) 이라는 핵심 과제를 해결하고자 합니다.

#Review #Social Intelligence #Reinforcement Learning #Reward Design #Large Language Models #Utterance-level Rewards #Multi-dimensional Rewards #Partial Observability #SOTOPIA

2025년 8월 7일