Review

[Paper Review] Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

This paper seeks to identify the underlying parameter-update mechanism that makes on-policy distillation (OPD) more efficient than RL in the post-training of large language models (LLMs).

#Review #On-Policy Distillation #Large Language Models #Parameter Dynamics #Training Efficiency #EffOPD #Subspace Evolution

May 17, 2026

[Paper Review] Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards

This paper aims to maximize a model's reasoning ability by addressing the sparse binary rewards and weak credit assignment of the existing RLVR paradigm.

#Review #Reinforcement Learning #Large Language Models #Verifiable Rewards #Policy Optimization #Error Correction #Reasoning Capability

May 17, 2026

[Paper Review] Learning POMDP World Models from Observations with Language-Model Priors

This study explores how an agent can effectively learn a world model in a strictly partially observable environment (the strict POMDP setting), where no ground-truth information about the latent state is available.

#Review #POMDP #World Model #Large Language Models #Program Induction #Sample Efficiency #Partial Observability #Belief-based Filtering

May 17, 2026

[Paper Review] InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation

This paper addresses the degraded quality of text and faces in image generation with autoregressive models.

#Review #Discrete Tokenization #Autoregressive Image Generation #Perceptual Loss #Text Fidelity #Face Fidelity #Content-Aware Supervision

May 17, 2026

[Paper Review] Hölder Policy Optimisation

This paper points out the limitations of the fixed aggregation mechanism used by existing group-based RL algorithms such as GRPO on long-horizon reasoning tasks for LLMs.

#Review #Reinforcement Learning #Large Language Models #Hölder Mean #Gradient Concentration #Policy Optimisation #Group Relative Policy Optimisation (GRPO)

May 17, 2026

[Paper Review] HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts

This paper addresses the structural limitations of the pairwise scores that existing MoE compression methods use to assess whether experts can be merged.

#Review #Sparse Mixture-of-Experts #Simplicial Complex #Hodge Decomposition #Harmonic Kernel #Model Compression #Topological Deep Learning

May 17, 2026

[Paper Review] GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding

This study addresses the problem that MLA is overly tied to the compute-to-bandwidth ratio of specific hardware (e.g., NVIDIA H100).

#Review #Large Language Model #KV-cache #Multi-head Latent Attention #GQLA #Hardware-Adaptive #Roofline Model #Tensor Parallelism

May 17, 2026

[Paper Review] From Plans to Pixels: Learning to Plan and Orchestrate for Open-Ended Image Editing

Existing diffusion-based image editing models perform well on clear, concrete tasks such as "add a hat", but struggle with abstract, multi-step, long-horizon instructions such as "make the advertisement vegetarian-friendly".

#Review #Long-horizon #Image Editing #Planner-Orchestrator #Experiential Learning #Reward-driven #Multimodal LLM #Diffusion Models

May 17, 2026

[Paper Review] Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization

This paper proposes Flash-GRPO, a single-step training framework for efficient alignment of video diffusion models.

#Review #Video Diffusion Models #Group Relative Policy Optimization #Reinforcement Learning #Single-step Training #Iso-temporal Grouping #Temporal Gradient Rectification #Alignment

May 17, 2026

[Paper Review] FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization

This paper addresses the difficulty existing methods have in achieving real-time interactive garment switching and video generation at the same time. Existing subject-to-video (S2V) approaches focus mainly on identity preservation, so they lack the real-time, flexible garment control required in the fashion industry and in content creation.

#Review #Video Customization #Garment Switching #Autoregressive Generation #In-Context Learning #Streaming Distillation #KV Cache Rescheduling #Real-Time Inference

May 17, 2026

[Paper Review] FFAvatar: Few-Shot, Feed-Forward, and Generalizable Avatar Reconstruction

This paper proposes FFAvatar to overcome the excessive computation time and complex preprocessing required by existing 3D avatar reconstruction methods.

#Review #3D Gaussian Splatting #Feed-Forward #Few-Shot #Avatar Reconstruction #FLAME #Multi-View #Generalization

May 17, 2026

[Paper Review] Efficient Image Synthesis with Sphere Latent Encoder

This study addresses the inefficiency and training instability of existing few-step generative models.

#Review #Few-step Image Generation #Spherical Latent Space #Representation Autoencoder #Denoising Model #Latent Space Sampling

May 17, 2026

[Paper Review] Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding

This paper proposes an efficient reasoning distillation framework to address the high inference cost of large Long-CoT models.

#Review #Reasoning Distillation #Collaborative Decoding #Long-CoT #Predictive Perplexity #Multi-Teacher #Beam Search #Step-wise Synthesis

May 17, 2026

[Paper Review] DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules

This study systematically assesses how well LLMs can recommend the specific maintenance steps an engineer should perform after a fault is detected in industrial equipment.

#Review #DiagnosticIQ #Industrial Maintenance #LLM Benchmark #Symbolic Rules #MCQA #Fault Detection #Action Recommendation

May 17, 2026

[Paper Review] DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo

This paper starts from the observation that existing robot manipulation benchmarks concentrate on simple gripper-centric environments and therefore fall short in evaluating genuinely human-level manipulation ability.

#Review #Dexterous Manipulation #Robotics Benchmark #Teleoperation #Imitation Learning #Vision-Language-Action Models #MuJoCo #Domain Randomization

May 17, 2026

[Paper Review] CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence

This paper points out that current Doc-VQA evaluation of MLLMs relies too heavily on whether the final answer is correct, and so fails to verify the accuracy of the visual evidence that actually grounds the reasoning.

#Review #Multimodal Large Language Models #Document Visual Question Answering #Evidence Attribution #Trustworthy AI #Strict Attributed Accuracy #Attribution Hallucination

May 17, 2026

[Paper Review] ChangeFlow -- Latent Rectified Flow for Change Detection in Remote Sensing

This paper points out that existing RSCD research relies mainly on pixel-wise deterministic classification (discriminative classification), and as a result lacks regional coherence and struggles to handle ambiguity.

#Review #Remote Sensing Change Detection #Rectified Flow #Generative Models #Latent Space #Diffusion Transformer #Coherence #Confidence Estimation

May 17, 2026

[Paper Review] CM-EVS: Sparse Panoramic RGB-D-Pose Data for Complete Scene Coverage

This paper points out that existing 3D asset datasets fail to define an effective "observation layer" for training panoramic models.

#Review #Panoramic #RGB-D-Pose #Viewpoint Curation #Submodular Maximization #Scene Coverage #Dataset

May 17, 2026

[Paper Review] Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design

This study explores whether, as part of Recursive Self-Improvement, LLM agents can autonomously design next-generation foundation models that go beyond the existing Transformer paradigm.

#Review #Neural Architecture Search #Foundation Models #LLM Agents #Recursive Self-Improvement #Hybrid Architectures #AIRS-Bench

May 17, 2026

[Paper Review] WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation

This study was conducted to address the failure of existing agent benchmarks to adequately reflect realistic deployment environments.

#Review #Agent Evaluation #Long-Horizon #Native-Runtime #Multimodal #Reproducible #Hybrid Verification

May 14, 2026