Review

[논문리뷰] StaMo: Unsupervised Learning of Generalizable Robot Motion from Compact State Representation

로봇 시스템에서 효율적인 세계 모델링과 의사 결정을 위해 표현적이고 압축적인 상태 표현 을 개발하는 것이 핵심 목표입니다. 기존 방법론들이 과도한 중복성이나 핵심 정보 부족으로 겪던 한계를 극복하고, 로봇의 시각적 정보를 효과적으로 요약하여 행동에 직접 연결될 수 있는 표현을 학습하고자 합니다.

#Review #Robot Learning #State Representation #Motion Representation #Diffusion Models #Unsupervised Learning #World Modeling #Vision-Language Models #Latent Action

2025년 10월 9일

[논문리뷰] SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models

현재 대규모 언어 모델(LLMs) 및 음성 언어 모델(SLMs)이 사용자의 발화가 끝난 후에야 추론 및 행동을 시작하여 발생하는 높은 응답 지연 시간 문제를 해결하는 것이 목표입니다.

#Review #Spoken Language Models #Real-time Interaction #Thinking While Listening #Chain-of-Thought #Interruption #Tool Calling #Streaming ASR

2025년 10월 9일

[논문리뷰] Revisiting the Uniform Information Density Hypothesis in LLM Reasoning Traces

본 논문은 대규모 언어 모델(LLM)의 CoT(Chain-of-Thought) 추론 과정에서 효과적인 추론이 단순히 피상적인 일관성을 넘어섰는지 판단하는 방법을 모색합니다.

#Review #LLM Reasoning #Chain-of-Thought #Uniform Information Density #Information Theory #Reasoning Trace Analysis #Entropy #Mathematical Reasoning #Model Evaluation

2025년 10월 9일

[논문리뷰] Revisiting Long-context Modeling from Context Denoising Perspective

본 연구는 Long-context Models (LCMs)가 컨텍스트 내의 불필요한 토큰(contextual noise)에 취약하여 모델의 어텐션을 잘못 유도하고 성능을 저해하는 문제를 해결하는 것을 목표로 합니다.

#Review #Long-context Models #Context Denoising #Integrated Gradient #LLM Training #Context Window Scaling #Information Flow #Attention Mechanism

2025년 10월 9일

[논문리뷰] RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training

본 논문은 Vision-Language-Action (VLA) 모델 에 강화 학습(RL)을 적용할 때 발생하는 소규모 및 파편화된 실험의 문제점을 해결하고자 합니다. 대규모 실험을 지원하고 다양한 모델, 알고리즘, 평가 설정 간의 공정한 비교를 가능하게 하는 통합적이고 효율적인 프레임워크 를 제공하는 것을 목표로 합니다.

#Review #Reinforcement Learning #VLA Models #Robotics #GPU Management #PPO #GRPO #Sim-to-Real

2025년 10월 9일

[논문리뷰] Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought

본 논문은 중간 자원 언어(mid-resource language)에서 언어별 추론의 격차를 해소하고, 번역으로 인한 품질 저하 및 일상 표현에 대한 취약성을 극복하는 것을 목표로 합니다. 특히 한국어를 사례 연구로 하여, 다국어 추론 모델의 성능을 향상시키기 위한 효과적인 방법론을 제시하고자 합니다.

#Review #Multilingual Reasoning #Chain-of-Thought (CoT)#Language-Mixed CoT #Instruction Tuning #Korean LLMs #Data Curation #Supervised Fine-tuning (SFT)

2025년 10월 9일

[논문리뷰] Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs

기존 MLLM이 시각 작업을 위해 텍스트로 좌표를 생성하는 등 간접적인 표현 방식 에 의존하여 성능이 제한되고 분할(Segmentation)과 같은 밀집 예측(Dense Prediction) 작업 이 어려웠던 문제를 해결하는 것입니다.

#Review #Multimodal Large Language Models (MLLMs)#Visual Reference Tokens (VRTs)#Dense Prediction #Referring Expression Comprehension (REC)#Open-Vocabulary Detection (OVD)#Image Captioning #Unified Architecture #Autoregressive Generation

2025년 10월 9일

[논문리뷰] Online Generic Event Boundary Detection

본 논문은 기존 오프라인(offline) GEBD(Generic Event Boundary Detection)의 한계를 극복하고, 인간의 인지 과정에 더 가까운 온라인 GEBD(On-GEBD) 라는 새로운 태스크를 제안합니다.

#Review #Online Video Analysis #Event Boundary Detection #Event Segmentation Theory #Real-time AI #Anomaly Detection #Transformer Architecture

2025년 10월 9일

[논문리뷰] OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot

대규모 텍스트-이미지 확산 모델의 과도한 연산 비용 문제를 해결하고, 기존 원샷 네트워크 가지치기(pruning) 방법론이 확산 모델의 반복적인 노이즈 제거 특성 과 복잡한 아키텍처 에 직접 적용하기 어려운 한계를 극복하는 것을 목표로 합니다.

#Review #Diffusion Models #Network Pruning #One-Shot Pruning #Optimal Brain Surgeon (OBS)#Model Compression #Timestep-Aware Hessian #Structured Pruning

2025년 10월 9일

[논문리뷰] NorMuon: Making Muon more efficient and scalable

대규모 언어 모델(LLM) 훈련 효율성 향상을 위해 기존 Muon 옵티마이저의 한계를 극복하는 것이 목표입니다. Muon이 업데이트의 컨디셔닝을 개선하지만 뉴런별 업데이트 노름의 분산이 크다는 문제를 해결하고, 이를 통해 훈련 동역학을 더욱 균형 있게 만들어 전반적인 수렴 속도와 확장성을 높이고자 합니다.

#Review #LLM Training #Optimizer #Muon #Orthogonalization #Adaptive Learning Rates #Distributed Training #FSDP2 #NorMuon

2025년 10월 9일

[논문리뷰] Native Hybrid Attention for Efficient Sequence Modeling

본 논문은 Transformer의 O(n²) 연산 복잡도와 선형 어텐션 모델의 낮은 정확도 문제를 해결하기 위해, 효율적이면서도 긴 컨텍스트에서 높은 정확도를 유지할 수 있는 새로운 하이브리드 어텐션 아키텍처를 개발하는 것을 목표로 합니다.

#Review #Sequence Modeling #Hybrid Attention #Transformer Architecture #Linear Attention #Sliding Window Attention #Long Context #Large Language Models (LLMs)#Efficiency

2025년 10월 9일

[논문리뷰] Multi-Agent Tool-Integrated Policy Optimization

본 논문은 단일 에이전트 LLM의 도구 통합 계획(Tool-Integrated Planning, TIP) 방식이 갖는 제한된 컨텍스트 길이 와 노이즈가 많은 도구 응답 문제를 해결하고자 합니다.

#Review #Multi-Agent RL #Tool-Integrated Planning #Large Language Models (LLMs)#Policy Optimization #Credit Assignment #Reinforcement Learning #MATPO

2025년 10월 9일

[논문리뷰] Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer

기존 autoregressive 시각 모델에서 이산 잠재 공간 토크나이저 의 양자화 오류가 의미 표현력과 시각-언어 이해 능력을 저해하는 문제를 해결하고자 합니다.

#Review #Unified Vision-Language Model #Continuous Tokenizer #Autoregressive Generation #Image Understanding #Image Generation #Multimodal AI #In-context Editing

2025년 10월 9일

[논문리뷰] MLE-Smith: Scaling MLE Tasks with Automated Multi-Agent Pipeline

현재 기계 학습 엔지니어링(MLE) 벤치마크 는 수동 큐레이션에 의존하여 확장성이 낮고 적용 가능성이 제한적입니다. 본 연구는 이러한 문제를 해결하기 위해 LLM(Large Language Model) 에이전트 를 위한 고품질의 확장 가능한 MLE 태스크를 자동으로 생성하는 프레임워크를 개발하는 것을 목표로 합니다.

#Review #MLE (Machine Learning Engineering)#Automated Task Generation #Multi-Agent System #LLM Agents #Benchmark #Data Curation #Hybrid Verification #Kaggle

2025년 10월 9일

[논문리뷰] MATRIX: Mask Track Alignment for Interaction-aware Video Generation

본 논문은 비디오 Diffusion Transformers (DiTs)가 다중 인스턴스 또는 주체-객체 상호작용을 어떻게 내부적으로 표현하는지 분석하고, 상호작용 인지 비디오 생성 능력을 향상시키는 것을 목표로 합니다.

#Review #Video Generation #Diffusion Transformers #Human-Object Interaction #Attention Alignment #Mask Tracking #Semantic Grounding #Semantic Propagation #Text-to-Video

2025년 10월 9일

[논문리뷰] Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding

본 논문은 다양한 양상의 데이터(텍스트, 이미지)를 처리할 수 있는 옴니(Omni) 형태의 멀티모달 생성 및 이해 모델 인 Lumina-DiMOO를 제안합니다.

#Review #Multi-modal LLM #Discrete Diffusion #Image Generation #Image Understanding #Omni-modal #Interactive Retouching #Generative AI #Reinforcement Learning

2025년 10월 9일

[논문리뷰] Heptapod: Language Modeling on Visual Signals

이 논문은 시각 생성 모델에서 외부 의미론적 정보 주입 및 CFG(Classifier-Free Guidance)에 대한 의존성을 비판하며, 재구성 중심의 토크나이저 와 Transformer의 내재적 의미 학습 이라는 언어 모델링의 기본 원칙으로 회귀하는 것을 목표로 합니다.

#Review #Autoregressive Models #Image Generation #Language Modeling #Causal Transformer #2D Distribution Prediction #Visual Tokenization #Self-Supervised Learning #Generative Models

2025년 10월 9일

[논문리뷰] G^2RPO: Granular GRPO for Precise Reward in Flow Models

본 논문은 확산 및 플로우 모델에서 인간 선호도에 맞춰 생성 모델을 정렬하는 기존 GRPO(Group Relative Policy Optimization) 방법론의 한계, 즉 희소하고 부정확한 보상 신호 및 불완전한 평가 문제를 해결하고자 합니다.

#Review #Reinforcement Learning #Flow Models #Generative Models #Human Preference Alignment #Stochastic Differential Equations (SDE)#Reward Signal #Multi-Granularity

2025년 10월 9일

[논문리뷰] DeepTravel: An End-to-End Agentic Reinforcement Learning Framework for Autonomous Travel Planning Agents

기존 수동 프롬프트 엔지니어링 및 고정된 워크플로우에 의존하는 여행 계획(TP) 에이전트의 한계를 극복하고, 자율적으로 계획, 도구 실행, 응답 반영을 통해 다단계 추론을 수행할 수 있는 종단 간 에이전트 강화 학습 프레임워크인 DeepTravel 을 구축하는 것이 목표입니다.

#Review #Agentic Reinforcement Learning #Travel Planning #Large Language Models #Sandbox Environment #Hierarchical Reward Modeling #Experience Replay #Autonomous Agents

2025년 10월 9일

[논문리뷰] D^3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection

본 논문은 시각적 자기회귀(AR) 모델 이 생성한 이미지의 탐지라는 새로운 도전 과제를 해결하는 것을 목표로 합니다. 기존 GAN이나 Diffusion 모델 탐지 방법론과 달리, AR 모델의 이산 토큰 예측 및 코드북 의 독특한 패턴과 빈도 분포 편향을 활용하여 실제 이미지와 생성된 이미지 간의 차이를 식별하고자 합니다.

#Review #Autoregressive Models #Image Detection #Discrete Distribution Discrepancy #Quantization Error #Transformer #Generative AI #Deepfake Detection

2025년 10월 9일