#Supervised Fine-Tuning (SFT)

23 posts

[Paper Review] Video-CoE: Reinforcing Video Event Prediction via Chain of Events

A detailed review of the paper 'Video-CoE: Reinforcing Video Event Prediction via Chain of Events', posted on arXiv.

#Review #Video Event Prediction (VEP) #Multimodal Large Language Models (MLLMs) #Chain of Events (CoE) #Logical Reasoning #Visual Grounding #Reinforcement Learning (RL) #Supervised Fine-Tuning (SFT)

March 18, 2026

[Paper Review] Unlocking Data Value in Finance: A Study on Distillation and Difficulty-Aware Training

A detailed review of the paper 'Unlocking Data Value in Finance: A Study on Distillation and Difficulty-Aware Training', posted on arXiv.

#Review #Financial LLMs #Data-Centric AI #Distillation #Chain-of-Thought (CoT) #Reinforcement Learning (RL) #Supervised Fine-Tuning (SFT) #Difficulty-Aware Training #Data Quality

March 9, 2026

[Paper Review] Controllable Memory Usage: Balancing Anchoring and Innovation in Long-Term Human-Agent Interaction

A detailed review of the paper 'Controllable Memory Usage: Balancing Anchoring and Innovation in Long-Term Human-Agent Interaction', posted on arXiv by Zhengkang Guo.

#Review #Long-Term Human-Agent Interaction #Controllable Memory #Memory Anchoring #Large Language Models (LLMs) #Personalization #Reinforcement Learning (RL) #Supervised Fine-Tuning (SFT) #Memory Dependence

January 12, 2026

[Paper Review] Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting

A detailed review of the paper 'Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting', posted on arXiv.

#Review #Supervised Fine-Tuning (SFT) #Catastrophic Forgetting #Entropy-Adaptive Fine-Tuning (EAFT) #Large Language Models (LLMs) #Domain Adaptation #Reinforcement Learning (RL) #Confident Conflicts

January 7, 2026

[Paper Review] Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling

A detailed review of the paper 'Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling', posted on arXiv.

#Review #Reasoning #Small Language Models (SLMs) #Hybrid Architecture #Test-Time Scaling (TTS) #Supervised Fine-Tuning (SFT) #Reinforcement Learning (RL) #DeepConf #Computational Efficiency

January 5, 2026

[Paper Review] DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation

A detailed review of the paper 'DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation', posted on arXiv.

#Review #Video Generation #One-Shot Video #Diffusion Transformer (DiT) #Frame-Guided Generation #Auto-Regressive Generation #Supervised Fine-Tuning (SFT) #Direct Preference Optimization (DPO)

December 24, 2025

[Paper Review] Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding

A detailed review of the paper 'Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding', posted on arXiv by Runtao Liu.

#Review #Multimodal Large Language Models (MLLMs) #Visual Degradation #Robustness #Reasoning Chains #Supervised Fine-Tuning (SFT) #Reinforcement Learning (RL) #Degradation-Aware Reasoning #Interpretability

December 21, 2025

[Paper Review] Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning

A detailed review of the paper 'Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning', posted on arXiv.

#Review #AI-Generated Video Detection #Multimodal Large Language Model (MLLM) #Artifact Reasoning #Explainable AI #Supervised Fine-Tuning (SFT) #Reinforcement Learning (RL) #Video Forensics

December 17, 2025

[Paper Review] Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection

A detailed review of the paper 'Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection', posted on arXiv.

#Review #Active Perception #Vision-Language Models (VLMs) #Embodied AI #View Selection #Reinforcement Learning (RL) #Supervised Fine-Tuning (SFT) #Visual Question Answering (VQA) #3D Environments

December 15, 2025

[Paper Review] Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization

A detailed review of the paper 'Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization', posted on arXiv.

#Review #Chain-of-Thought (CoT) #Vision-Language Models (VLMs) #Visual Reasoning #Generalization #Supervised Fine-Tuning (SFT) #Reinforcement Learning (RL) #Grounding CoT #Maze Solving

December 2, 2025

[Paper Review] Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

A detailed review of the paper 'Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B', posted on arXiv.

#Review #Small Language Models #Reasoning #Diversity Optimization #Supervised Fine-Tuning (SFT) #Reinforcement Learning (RL) #Spectrum-to-Signal Principle (SSP) #Mathematical Reasoning #Code Generation

November 11, 2025

[Paper Review] Value Drifts: Tracing Value Alignment During LLM Post-Training

A detailed review of the paper 'Value Drifts: Tracing Value Alignment During LLM Post-Training', posted on arXiv.

#Review #LLM Alignment #Value Drift #Supervised Fine-Tuning (SFT) #Preference Optimization #RLHF #Llama-3 #Qwen-3 #Human Values

November 9, 2025

[Paper Review] UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning

A detailed review of the paper 'UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning', posted on arXiv.

#Review #GUI Grounding #Natural Language Instructions #Multi-Perspective Reasoning #Supervised Fine-Tuning (SFT) #Reinforcement Learning (RL) #Policy Collapse Mitigation #GUI Agents

October 27, 2025

[Paper Review] Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense

A detailed review of the paper 'Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense', posted on arXiv.

#Review #Large Reasoning Models (LRMs) #Prompt Injection #Adversarial Attack #Reasoning Distraction #Chain-of-Thought #Robustness #Supervised Fine-Tuning (SFT) #Reinforcement Learning (RL)

October 21, 2025

[Paper Review] Apriel-1.5-15b-Thinker

A detailed review of the paper 'Apriel-1.5-15b-Thinker', posted on arXiv.

#Review #Multimodal Reasoning Model #Open-Weights Model #Continual Pretraining (CPT) #Supervised Fine-Tuning (SFT) #Training Design #Efficiency #Frontier Performance

October 6, 2025

[Paper Review] Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training

A detailed review of the paper 'Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training', posted on arXiv.

#Review #Mechanistic Interpretability #Attention Heads #Post-Training #Supervised Fine-Tuning (SFT) #Reinforcement Learning (RL) #Circuit Analysis #Reasoning Models #Transformer Architecture

October 1, 2025

[Paper Review] ScaleDiff: Scaling Difficult Problems for Advanced Mathematical Reasoning

A detailed review of the paper 'ScaleDiff: Scaling Difficult Problems for Advanced Mathematical Reasoning', posted on arXiv by Yu Li.

#Review #Mathematical Reasoning #Large Reasoning Models (LRMs) #Difficulty Scaling #Data Augmentation #Supervised Fine-Tuning (SFT) #Problem Generation #Solution Distillation

September 26, 2025

[Paper Review] Logics-Parsing Technical Report

A detailed review of the paper 'Logics-Parsing Technical Report', posted on arXiv by Fan Yang.

#Review #Document Parsing #Large Vision-Language Models (LVLM) #Reinforcement Learning (RL) #Layout Analysis #Reading Order #Supervised Fine-Tuning (SFT) #HTML Annotation #Benchmarking

September 25, 2025

[Paper Review] Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels

A detailed review of the paper 'Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels', posted on arXiv by Qi Zhang.

#Review #Supervised Fine-Tuning (SFT) #Large Language Models (LLMs) #Model Knowledge #Closed-Book Question Answering (CBQA) #Parameter Restoration #Kullback-Leibler Divergence #Knowledge Forgetting

September 23, 2025

[Paper Review] Improving Context Fidelity via Native Retrieval-Augmented Reasoning

A detailed review of the paper 'Improving Context Fidelity via Native Retrieval-Augmented Reasoning', posted on arXiv by Xiangru Tang.

#Review #Context Fidelity #Retrieval-Augmented Generation (RAG) #Large Language Models (LLMs) #Reinforcement Learning (RL) #Supervised Fine-Tuning (SFT) #Hallucination #Question Answering #In-context Retrieval #Curriculum Learning

September 18, 2025

[Paper Review] Towards a Unified View of Large Language Model Post-Training

A detailed review of the paper 'Towards a Unified View of Large Language Model Post-Training', posted on arXiv by Hongyi Liu.

#Review #Large Language Models (LLMs) #Post-Training #Reinforcement Learning (RL) #Supervised Fine-Tuning (SFT) #Policy Gradient #Unified Framework #Hybrid Algorithms #Bias-Variance Tradeoff

September 5, 2025

[Paper Review] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

A detailed review of the paper 'On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification', posted on arXiv by Xinyu Ye.

#Review #Supervised Fine-Tuning (SFT) #Reinforcement Learning (RL) #Generalization #Reward Rectification #Dynamic Fine-Tuning (DFT) #LLM #Policy Gradient #Mathematical Reasoning

August 8, 2025

[Paper Review] Are Today's LLMs Ready to Explain Well-Being Concepts?

A detailed review of the paper 'Are Today's LLMs Ready to Explain Well-Being Concepts?', posted on arXiv by Huan Liu.

#Review #Large Language Models #Well-being Concepts #LLM Evaluation #Principle-Guided Evaluation #LLM-as-a-Judge #Supervised Fine-Tuning (SFT) #Direct Preference Optimization (DPO) #Explanation Generation

August 8, 2025