Latest Posts

[Paper Review] Surflo: Consistent 3D Surface Flow Model with Global State

This work aims to resolve the frame-to-frame geometric inconsistency of existing 3D Scene Flow estimation methods. Existing models focus mainly on finding correspondences between independent frame pairs, so errors accumulate over a continuous sequence or the scene's surface structure becomes distorted.

#Review #3D Scene Flow #Surface Flow #Global State #Point Cloud #Temporal Consistency

June 11, 2026

[Paper Review] SG-OPD: Sign-Gated On-Policy Distillation via Sign-Consistency Gating and Phased Teacher Sampling

This work focuses on resolving the data isolation problem of existing Off-policy Distillation and the Distribution Mismatch between Teacher and Student.

#Review #Knowledge Distillation #On-Policy Learning #Sign-Consistency #Phased Teacher Sampling #Large Language Models #Model Alignment

June 11, 2026

[Paper Review] Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models

This paper addresses the severe distortion of information that arises because safety evaluation of large language models (LLMs) relies on a fixed query budget.

#Review #Adversarial Robustness #Compute-Aware Evaluation #FLOPs #Jailbreak Attacks #Risk-Compute Curves #Safety Alignment

June 11, 2026

[Paper Review] Revisiting Articulated Parts Perception in Robot Manipulation

This work addresses the fact that existing robot manipulation research is skewed toward static object recognition and does not sufficiently capture the complex kinematic properties of articulated objects.

#Review #Articulated Parts #Robot Manipulation #Part Segmentation #Motion Estimation #Geometric Reasoning

June 11, 2026

[Paper Review] PianoKontext: Expressive Performance Rendering from Deadpan Context

This paper proposes PianoKontext to address the failure of existing music generation models to properly model expressive timing and the complexity of polyphonic music.

#Review #Expressive Performance Rendering #Flow Matching #Latent Diffusion #Dynamic Time Warping #Music2Latent #DiT #RoPE

June 11, 2026

[Paper Review] N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization

This work addresses the lack of effective exploration during the Rollout stage of LLM reinforcement learning, along with the limitations of existing methods.

#Review #Reinforcement Learning #Large Language Models #GRPO #Semantic Neighbor Mixing #Policy Optimization #Embedding Space #Latent Reasoning

June 11, 2026

[Paper Review] MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning

This paper addresses the trade-off in existing quadcopter simulators among physical accuracy, multi-agent support, and the throughput that modern Deep RL pipelines require.

#Review #Multi-drone Simulator #MuJoCo #Reinforcement Learning #GPU Acceleration #MJX #Aerial Robotics #Gymnasium

June 11, 2026

[Paper Review] MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold

This paper aims to generate, from a single NFOV image, a spatially persistent 3D environment that the user can freely move through and explore.

#Review #World Model #3D Gaussian Splatting #Panoramic Generation #Video Rendering #Real-Time Interaction

June 11, 2026

[Paper Review] MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling

The core goal of this paper is to resolve the Hallucination and Logical Inconsistency that large language models exhibit on mathematical proof problems.

#Review #Mathematical Reasoning #Reinforcement Learning #Test-Time Scaling #Generative-Verifier #Formal Verification #Scalable Alignment

June 11, 2026

[Paper Review] MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training

Although existing Representation Alignment techniques improve the training efficiency of diffusion models, this paper addresses the fundamental mismatch between the noisy model input and the reference features derived from clean images.

#Review #Diffusion Models #Representation Alignment #Token Masking #Efficient Training #Stochastic Interpolants #Transformer

June 11, 2026

[Paper Review] Leveraging Morphology for Historical Script Metrological Analysis

This work addresses the lack of automated tools for objectively quantifying a scribe's handwriting characteristics in the study of ancient manuscripts. Conventional manual Paleography analysis depends on the researcher's subjective judgment and has limits in processing large-scale data.

#Review #Historical Script #Metrological Analysis #Morphology #Paleography #Feature Extraction #Geometric Analysis

June 11, 2026

[Paper Review] LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories

This work addresses the problem that existing general-purpose VLA models fall short in handling the particularities of precise scientific laboratory environments and in performing highly domain-specific tasks.

#Review #Vision-Language-Action #Robotics #Scientific Laboratory #Multimodal Learning #Embodied AI #Automation

June 11, 2026

[Paper Review] InterleaveThinker: Reinforcing Agentic Interleaved Generation

This paper was designed to address the Visual Over-reliance and Step-wise Error Accumulation that existing Unified Multimodal Models (UMMs) suffer from during long-sequence generation.

#Review #Interleaved Generation #Multi-Agent Framework #Reinforcement Learning #GRPO #Visual Over-reliance #Error Accumulation

June 11, 2026

[Paper Review] IDEAL: In-DEpth ALignment Makes A Discrete Representation AutoEncoder

This paper addresses the fundamental bottleneck that VFM-based RAEs face between reconstruction quality and semantic preservation. Prior work relies mainly on semantic information from deep layers alone, which results in the loss of fine visual attributes (color, text, local structure, etc.).

#Review #Representation Autoencoder #Vision Foundation Models #Vector Quantization #Autoregressive Generation #Semantic Preservation #Reconstruction Fidelity

June 11, 2026

[Paper Review] High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation

This work aims to resolve the Inference Latency of high-quality image generation models and the information loss in the multi-step generation process.

#Review #Image Generation #Knowledge Distillation #Diffusion Models #Model Compression #Latent Diffusion #Efficiency

June 11, 2026

[Paper Review] HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness

This paper addresses the problem that existing manually engineered Harnesses lead to inefficient interaction on complex, long-horizon tasks.

#Review #LLM Agent #Harness Engineering #Bidirectional Projection #Observation Projection #Action Projection #Unified Instruction Tuning #Long-Horizon Task

June 11, 2026

[Paper Review] HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers

This paper addresses the performance degradation in existing Multimodal Large Language Models (MLLMs) caused by the imbalance between the Visual Encoder and the LLM and by insufficient alignment of information.

#Review #Multimodal Learning #Visual Tokenizer #Unified Architecture #Large Language Models #Representation Learning #Vision-Language Integration

June 11, 2026

[Paper Review] From 2D Grids to 1D Tokens: Reforming Shared Representations for Multimodal Image Fusion

This paper addresses the structural limitations that arise because existing Multimodal Image Fusion (MMIF) methods use a dense 2D feature grid as the shared representation.

#Review #Multimodal Image Fusion #1D Tokenizer #Shared Representation #Selective Token Editing #Global Appearance #Local Fidelity

June 11, 2026

[Paper Review] Flash-GMM: A Memory-Efficient Kernel for Scalable Soft Clustering

This paper addresses the out-of-memory (OOM) failures and excessive HBM bandwidth requirements that arise when training GMMs on large-scale datasets.

#Review #Gaussian Mixture Models #GMM #Triton #IVF #Approximate Nearest Neighbor #Memory-Efficient #Soft Clustering

June 11, 2026

[Paper Review] FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

This work addresses the phenomenon in which Deep Search Agents over-rely during training on Shortcuts, unintended patterns in the dataset, which degrades their performance in real search environments.

#Review #Deep Search Agents #Shortcut-Resistant #Task Synthesis #Representation Learning #Reinforcement Learning #Information Retrieval #Robustness

June 11, 2026