Latest Posts

[Paper Review] TREK: Distill to Explore, Reinforce to Refine

This paper aims to address inadequate exploration, a core limitation that arises during GRPO training.

#Review #GRPO #Reinforcement Learning #Distillation #Exploration #Reasoning #Language Models #Policy Optimization

July 7, 2026

[Paper Review] SkillOpt-Lite: Better and Faster Agent Self-evolution via One Line of Vibe

This paper argues that existing Agent Skill optimization frameworks have become overly complex, and makes the case for a Minimal Viable Pipeline that can be justified both theoretically and empirically.

#Review #Agent Self-evolution #Skill Optimization #Zeroth-Order Optimization #PAC-Learning #Harness Optimization #Minimal Viable Pipeline

July 7, 2026

[Paper Review] SiamJEPA: On the Role of Siamese Student Encoders in JEPA

This paper aims to systematically characterize the role of Siamese Student Encoders within the JEPA framework and their significant effect on representation learning.

#Review #Self-supervised Learning #JEPA #Siamese Student Encoders #Representation Learning #Latent Prediction #Inductive Bias

July 7, 2026

[Paper Review] SIEVE: Structure-Aware Data Selection for Imitation Learning with VLA Models

This paper proposes SIEVE, a structure-aware data selection framework, to address the redundancy, noise, and uneven task coverage found in large-scale robot demonstration datasets.

#Review #Vision-Language-Action Models #Imitation Learning #Data Selection #Primitive Discovery #Structural Exposure #Behavior Cloning

July 7, 2026

[Paper Review] RynnWorld-Teleop: An Action-Conditioned World Model for Digital Teleoperation

This paper addresses the bottleneck in large-scale data collection for robot learning, which is held back by the physical constraints and resource limits of real-world teleoperation.

#Review #Digital Teleoperation #World Model #Robotic Learning #Video Diffusion Transformer #Action-Conditioned Generation #Sim2Real Transfer #Imitation Learning

July 7, 2026

[Paper Review] RynnWorld-4D: 4D Embodied World Models for Robotic Manipulation

Existing world models for robotic manipulation rely mostly on 2D pixel-based video generation, which limits their ability to capture the precise 3D spatial relationships and physical consistency that real robotic systems require.

#Review #4D Embodied World Models #Robotic Manipulation #Generative Video Models #RGB-DF Representation #Flow Matching #Joint Cross-Modal Attention #Embodied AI

July 7, 2026

[Paper Review] Rank-Then-Act: Reward-Free Control from Frame-Order Progress

Designing the extrinsic rewards needed to train typical reinforcement learning agents is often very complex, and in some environments it is not feasible to specify a reward at all.

#Review #Reward-Free Control #Vision-Language Models #Ordinal Progress #Spearman Correlation #GRPO #Reinforcement Learning

July 7, 2026

[Paper Review] Quantifying and Expanding the Theoretical Capacity of Late-Interaction Retrieval Models

This paper seeks to establish the theoretical basis for why MaxSim, the core operation of Late-Interaction models, outperforms conventional single-vector dense or sparse retrieval models.

#Review #Late-Interaction #MaxSim #Information Retrieval #Neural Retrieval #Representation Learning #Inner Product

July 7, 2026
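For readers unfamiliar with MaxSim, here is a minimal sketch of the ColBERT-style late-interaction score (for each query token embedding, take the maximum inner product over all document token embeddings, then sum over query tokens), placed next to a single-vector baseline that mean-pools each side before taking one inner product. The function names and toy vectors are hypothetical and chosen only for illustration; this is not code from the paper and does not reproduce its capacity analysis.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query: List[Vector], doc: List[Vector]) -> float:
    """Late-interaction score: each query token picks its best-matching
    document token (max inner product); the per-token maxima are summed."""
    return sum(max(dot(q, d) for d in doc) for q in query)


def mean_pool(vectors: List[Vector]) -> Vector:
    """Collapse a list of token embeddings into one vector by averaging."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def single_vector_score(query: List[Vector], doc: List[Vector]) -> float:
    """Single-vector baseline: inner product of mean-pooled embeddings."""
    return dot(mean_pool(query), mean_pool(doc))


# Hypothetical toy data: the query has two orthogonal "aspects".
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # one token matches each aspect exactly
doc_b = [[0.5, 0.5], [0.5, 0.5]]  # every token is a blend of both aspects

print(maxsim(query, doc_a), maxsim(query, doc_b))  # prints 2.0 1.0
print(single_vector_score(query, doc_a), single_vector_score(query, doc_b))  # prints 0.5 0.5
```

In this toy case both documents mean-pool to the same vector, so the single-vector score cannot tell them apart, while MaxSim rewards the document whose individual tokens match individual query tokens. It is only an intuition for why token-level matching can be more expressive, not a substitute for the paper's argument.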

[Paper Review] PointDiT: Pixel-Space Diffusion for Monocular Geometry Estimation

The paper at the provided URL (https://arxiv.org/html/2607.02515) could not be accessed directly due to technical constraints, so its content could not be analyzed accurately.

July 7, 2026

[Paper Review] PluraMath: Extending Mathematical Reasoning Evaluation Beyond High-Resource Languages

This study aims to address the fact that evaluation and training data for large language models (LLMs) are heavily skewed toward high-resource languages such as English and Chinese.

#Review #Multilingual Benchmark #Mathematical Reasoning #Large Language Models #Low-resource Languages #Human-in-the-loop

July 7, 2026

[Paper Review] Parallelized Autoregressive Decoding for Omni-Modal Dense Video Captioning

This paper aims to address the high inference latency and limited scalability of existing Dense Video Captioning models built on autoregressive Video-LLMs.

#Review #Dense video captioning #Parallel decoding #Latent planning #Omni-modal #Video-LLM #Dependency restructuring

July 7, 2026

[Paper Review] Nemotron-Labs-Diffusion: A Tri-Mode Language Model Unifying Autoregressive, Diffusion, and Self-Speculation Decoding

This work is designed to address the low inference parallelism and poor resource utilization of conventional, strictly sequential Autoregressive (AR) decoding.

#Review #Language Model #Autoregressive #Diffusion #Self-Speculation #Parallel Decoding #Inference Efficiency #Tri-Mode Decoding

July 7, 2026

[Paper Review] MuseBench: Benchmarking Intent-Level Audiovisual Arts Understanding in MLLMs

This paper starts from the observation that recent MLLMs perform well on general perception and reasoning tasks but still show significant limitations in the specialized domain of interpreting artistic creative intent.

#Review #Multimodal Large Language Models #Audiovisual Arts #Benchmark #Intent-Level Understanding #Video Essay #Interpretation Plurality

July 7, 2026

[Paper Review] MentalThink: Shaping Thoughts in Mental SVG World

This paper aims to address the weak visual grounding and hallucination problems of existing language-centric Multimodal CoT.

#Review #Multimodal LLMs #Spatial Reasoning #Scalable Vector Graphics #Chain-of-Thought #Reinforcement Learning #Mental Imagery

July 7, 2026

[Paper Review] Light-Omni: Reflex over Reasoning in Agentic Video Understanding with Long-Term Memory

This paper aims to address the excessive cost and latency caused by the "detective-style" iterative reasoning that existing video agent models rely on when processing long-form videos.

#Review #Multimodal Long-Term Memory #Agentic Video Understanding #Dual-State Design #Reflexive Response #Retrieval-Augmented Generation #Video-LLM

July 7, 2026

[Paper Review] Layer-wise Cross-Lingual Depression Detection from Speech: Analysis with Contrastive Alignment

This paper aims to address the failure of speech-based depression detection models to generalize across languages.

#Review #Cross-lingual Depression Detection #Supervised Contrastive Alignment #WavLM #Speaker-identity Leakage #Layer-wise Analysis #CLeaD

July 7, 2026

[Paper Review] Image2Sim: Scaling Embodied Navigation via Generative Neural Simulator

This paper aims to address the shortage of large-scale, high-quality, physics-based interactive simulation environments for training Embodied Navigation. Prior work faces a trade-off between real scanned data and synthetic data, that is, between visual fidelity and scalability.

#Review #Embodied Navigation #Neural Simulator #3D Gaussian Splatting #Pixel Flow #Vision-Language Navigation #Sim-to-Real

July 7, 2026

[Paper Review] HunyuanOCR-1.5: Making Lightweight OCR VLMs Faster and Better

This paper is motivated by the need for OCR-specialized VLMs to go beyond simple document-parsing tools, cover a broader range of tasks, and run faster in real deployment settings.

#Review #OCR #Vision-Language Model #DFlash #Agentic Data Flow #Speculative Decoding #Document Parsing #Inference Acceleration

July 7, 2026

[Paper Review] Hierarchical Sparse Attention Done Right: Toward Infinite Context Modeling

To tackle the quadratic computation cost and degraded length extrapolation that hold back long-context scaling of LLMs, this paper aims to improve the incomplete chunk selection mechanism of existing chunk-wise sparse attention methods.

#Review #Large Language Models #Long Context Modeling #Sparse Attention #Hierarchical Attention #Chunk-wise Attention #End-to-end Learning

July 7, 2026

[Paper Review] Gemma 4 Technical Report

This report presents the Gemma 4 model family, designed to deliver at once the strong multimodal understanding, complex reasoning ability, and compute efficiency that today's LLM ecosystem demands.

#Review #Multimodal #Mixture-of-Experts #Reasoning Trace #Speculative Decoding #Quantization-Aware Training #Long-context #Encoder-free

July 7, 2026