#Self-Alignment

1 post

[Paper Review] Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning

This paper aims to resolve the severe misalignment caused by large language models' (LLMs') lack of meta-awareness, and to improve reasoning-model performance by aligning meta-predictions with actual rollouts.

#Review #Meta-Awareness #Reinforcement Learning #Self-Alignment #LLM Reasoning #Training Efficiency #Generalization #Predictive Gating

October 10, 2025