#Reinforcement Learning from Human Feedback

6 posts

[Paper Review] Ministral 3

A detailed review of the paper 'Ministral 3', published on arXiv.

#Review #Large Language Models #Model Distillation #Pruning #Parameter-Efficient AI #Multimodal LLMs #Instruction Tuning #Reinforcement Learning from Human Feedback #Open-Source AI

January 13, 2026

[Paper Review] Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

A detailed review of the paper 'Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model', published on arXiv.

#Review #Audio-Visual Generation #Diffusion Transformer #Multimodal AI #Speech Synchronization #Video Generation #Reinforcement Learning from Human Feedback #Inference Acceleration

December 18, 2025

[Paper Review] Test-Time Policy Adaptation for Enhanced Multi-Turn Interactions with LLMs

A detailed review of the paper 'Test-Time Policy Adaptation for Enhanced Multi-Turn Interactions with LLMs', posted to arXiv by Yao Shu.

#Review #Large Language Models #Multi-turn Interaction #Test-Time Adaptation #Reinforcement Learning from Human Feedback #Policy Optimization #Online Learning #Self-Correction

October 1, 2025

[Paper Review] A Survey on Diffusion Language Models

A detailed review of the paper 'A Survey on Diffusion Language Models', posted to arXiv by Zhiqiang Shen.

#Review #Diffusion Language Models #Generative AI #Parallel Decoding #Text Generation #Multimodal AI #Model Compression #Reinforcement Learning from Human Feedback #Inference Optimization

August 15, 2025

[Paper Review] Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment

A detailed review of the paper 'Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment', posted to arXiv by Lei Fan.

#Review #LLM Alignment #Reinforcement Learning from Human Feedback #Preference Learning #Group Relative Alignment Optimization #Self-Optimization #Mixture-of-Experts #Imitation Learning

August 14, 2025

[Paper Review] TRACEALIGN -- Tracing the Drift: Attributing Alignment Failures to Training-Time Belief Sources in LLMs

A detailed review of the paper 'TRACEALIGN -- Tracing the Drift: Attributing Alignment Failures to Training-Time Belief Sources in LLMs', posted to arXiv by Aman Chadha.

#Review #LLM Alignment #Alignment Drift #Training Data Provenance #Belief Conflict Index (BCI) #Suffix Array #Safety Interventions #Reinforcement Learning from Human Feedback #Explainable AI

August 6, 2025