#Direct Preference Optimization (DPO)

11 posts

[Paper Review] Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD

[Paper Review] InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities
