[논문리뷰] References Improve LLM Alignment in Non-Verifiable DomainsarXiv에 게시된 'References Improve LLM Alignment in Non-Verifiable Domains' 논문에 대한 자세한 리뷰입니다.#Review#LLM Alignment#Reference-Guided Evaluation#Self-Improvement#Non-Verifiable Domains#Reinforcement Learning from Human Feedback (RLHF)#Direct Preference Optimization (DPO)2026년 2월 19일댓글 수 로딩 중
[논문리뷰] Self-Improving World Modelling with Latent ActionsAnna Korhonen이 arXiv에 게시한 'Self-Improving World Modelling with Latent Actions' 논문에 대한 자세한 리뷰입니다.#Review#World Modeling#Latent Actions#Self-Improvement#Reinforcement Learning#LLMs#VLMs#Inverse Dynamics Model#Forward World Modelling2026년 2월 8일댓글 수 로딩 중
[논문리뷰] Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated TrainingLiqian Huang이 arXiv에 게시한 'Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training' 논문에 대한 자세한 리뷰입니다.#Review#Multilingual Reasoning#Reinforcement Learning#Machine Translation#Question Understanding#Self-Improvement#Language Models#Cross-Lingual Alignment2026년 2월 8일댓글 수 로딩 중
[논문리뷰] AgentDevel: Reframing Self-Evolving LLM Agents as Release EngineeringDi Zhang이 arXiv에 게시한 'AgentDevel: Reframing Self-Evolving LLM Agents as Release Engineering' 논문에 대한 자세한 리뷰입니다.#Review#LLM Agents#Release Engineering#Self-Improvement#Regression Testing#Continuous Integration#Flip-Centered Gating#Auditable Development#Software Engineering2026년 1월 8일댓글 수 로딩 중
[논문리뷰] Reinforcement Learning for Self-Improving Agent with Skill LibrarySoumya Smruti Mishra이 arXiv에 게시한 'Reinforcement Learning for Self-Improving Agent with Skill Library' 논문에 대한 자세한 리뷰입니다.#Review#Reinforcement Learning (RL)#LLM Agents#Skill Library#Self-Improvement#Sequential Rollout#AppWorld dataset#GRPO2025년 12월 23일댓글 수 로딩 중
[논문리뷰] Self-Improving VLM Judges Without Human AnnotationsarXiv에 게시된 'Self-Improving VLM Judges Without Human Annotations' 논문에 대한 자세한 리뷰입니다.#Review#Vision-Language Models#Self-Improvement#Judge Models#Synthetic Data Generation#Iterative Refinement#Reward Modeling#Human-free Alignment2025년 12월 7일댓글 수 로딩 중
[논문리뷰] SIMA 2: A Generalist Embodied Agent for Virtual WorldsarXiv에 게시된 'SIMA 2: A Generalist Embodied Agent for Virtual Worlds' 논문에 대한 자세한 리뷰입니다.#Review#Embodied AI#Generalist Agent#Virtual Worlds#Foundation Models#Gemini#Self-Improvement#Dialogue#Reasoning#Reinforcement Learning2025년 12월 4일댓글 수 로딩 중
[논문리뷰] Counteracting Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancingXiaowei Shi이 arXiv에 게시한 'Counteracting Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing' 논문에 대한 자세한 리뷰입니다.#Review#LVLMs#Self-Improvement#Matthew Effect#Data Bias Mitigation#Distribution Reshaping#Trajectory Resampling#Visual Reasoning2025년 10월 31일댓글 수 로딩 중
[논문리뷰] Self-Improvement in Multimodal Large Language Models: A SurveyYapeng Tian이 arXiv에 게시한 'Self-Improvement in Multimodal Large Language Models: A Survey' 논문에 대한 자세한 리뷰입니다.#Review#Multimodal Large Language Models (MLLMs)#Self-Improvement#Data Collection#Data Organization#Model Optimization#Survey#Reinforcement Learning#Direct Preference Optimization2025년 10월 6일댓글 수 로딩 중
[논문리뷰] Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-PlayJing Shi이 arXiv에 게시한 'Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play' 논문에 대한 자세한 리뷰입니다.#Review#Vision-Language Models (VLMs)#Self-Play#Reinforcement Learning#Gamification#Data Efficiency#Strategic Reasoning#Multimodal AI#Self-Improvement2025년 10월 1일댓글 수 로딩 중
[논문리뷰] Bootstrapping Task Spaces for Self-ImprovementYoram Bachrach이 arXiv에 게시한 'Bootstrapping Task Spaces for Self-Improvement' 논문에 대한 자세한 리뷰입니다.#Review#Reinforcement Learning (RL)#Large Language Models (LLMs)#Self-Improvement#Autocurriculum#Task-Space Exploration#Inference-Time Iteration#Policy Optimization2025년 9월 8일댓글 수 로딩 중
[논문리뷰] Improving Large Vision and Language Models by Learning from a Panel of PeersSimon Jenni이 arXiv에 게시한 'Improving Large Vision and Language Models by Learning from a Panel of Peers' 논문에 대한 자세한 리뷰입니다.#Review#Large Vision and Language Models (LVLMs)#Self-Improvement#Peer Learning#Preference Alignment#Reward Modeling#Multimodal Learning#Knowledge Transfer2025년 9월 3일댓글 수 로딩 중