#Adversarial Training

4 posts

[Paper Review] TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment

A detailed review of the paper 'TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment', posted on arXiv.

#Review #LLM Safety Alignment #Reinforcement Learning #Self-Play #Red Teaming #Adversarial Training #Multi-Role Framework #Reward Hacking Mitigation

January 27, 2026

[Paper Review] The Unanticipated Asymmetry Between Perceptual Optimization and Assessment

A detailed review of the paper 'The Unanticipated Asymmetry Between Perceptual Optimization and Assessment', posted on arXiv by Du Chen.

#Review #Perceptual Optimization #Image Quality Assessment (IQA) #Adversarial Training #Discriminators #Super-Resolution #Fidelity Metrics #Deep Learning

September 26, 2025

[Paper Review] Language Self-Play For Data-Free Training

A detailed review of the paper 'Language Self-Play For Data-Free Training', posted on arXiv by Vijai Mohan.

#Review #Large Language Models #Reinforcement Learning #Self-Play #Data-Free Training #Instruction Following #Adversarial Training #Reward Modeling

September 10, 2025

[Paper Review] R²AI: Towards Resistant and Resilient AI in an Evolving World

A detailed review of the paper 'R²AI: Towards Resistant and Resilient AI in an Evolving World', posted on arXiv by Bowen Zhou.

#Review #AI Safety #Resistant AI #Resilient AI #Coevolution #Fast-Slow Models #Adversarial Training #Continual Learning #AGI Alignment

September 9, 2025