Latest Posts

[Paper Review] SciOrch: Learning to Orchestrate Expert LLMs for Solving Frontier Multimodal Scientific Reasoning Tasks

This paper aims to overcome the limitation that single commercial LLM systems fail to reach expert-level performance on frontier multimodal scientific reasoning.

#Review #Multimodal Scientific Reasoning #LLM Orchestration #MCTS #Reinforcement Learning #Expert Model Delegation #Agentic Workflow

June 17, 2026

[Paper Review] STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

This paper addresses the Policy Entropy Collapse problem that frequently occurs when training LLMs with RLVR. With existing GRPO, output diversity disappears as training continues and the model converges prematurely, which acts as a bottleneck for long-running post-training.

#Review #Reinforcement Learning #Policy Entropy #GRPO #Advantage Reweighting #Surprisal #LLM Post-training #Credit Assignment

June 17, 2026

[Paper Review] SAE Interventions are Unreliable: Post-Intervention Recovery of Suppressed Behavior

This paper points out a limitation of SAE-based latent-space defenses: they may fail to fully control the targeted behavior.

#Review #Sparse Autoencoders #Intervention #Post-Intervention Recovery #Constrained Optimization #Interpretability #Safety #Residual Stream

June 17, 2026

[Paper Review] Reinforcing Dual-Path Reasoning in Spatial Vision Language Models

This paper aims to overcome the limits of existing Spatial VLMs in compound spatial reasoning. Current models are strong at simple geometric perception but remain weak at compound multi-step reasoning in which depth information, distance comparison, and scene relations are intertwined.

#Review #Spatial Vision-Language Models #Reinforcement Learning #Dual-Path Reasoning #Chain-of-Thought #3D Grounding #Geometric Reasoning

June 17, 2026

[Paper Review] RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents

This paper addresses the data scarcity and declining information density that arise when training Multi-turn Tool-Use agents.

#Review #Multi-turn Tool-Use #Reinforcement Learning #Data Synthesis #Gradient Variance #Capability Boundary #Agentic RL #Replay Buffer

June 17, 2026

[Paper Review] Physics-IQ Verified

This paper addresses the measurement error in how the existing Physics-IQ benchmark gauges physical understanding, along with the limitations of its evaluation protocol.

#Review #Video Generative Models #Physical Reasoning #Benchmark #Evaluation #Ground Truth #Artifacts #Physics-IQ

June 17, 2026

[Paper Review] PAIWorld: A 3D-Consistent World Foundation Model for Robotic Manipulation

Most existing World Foundation Models either operate on a single view or simply concatenate multiple views along the sequence dimension, and so fail to solve the 3D consistency problem that is essential for robotic manipulation.

#Review #World Foundation Model #Robotic Manipulation #3D Consistency #Diffusion Transformer #Flow Matching #Multi-view Generation

June 17, 2026

[Paper Review] Native Active Perception as Reasoning for Omni-Modal Understanding

This paper was proposed to address the compute and performance limits of existing passive Long Video Understanding models. Prior work processes the entire video uniformly or relies on a global pre-scan, and therefore suffers from a chronic bottleneck in which computational cost grows linearly with video length.

#Review #Omni-modal Understanding #Active Perception #POMDP #Agentic Reasoning #Test-time Scaling #TAURA #Reinforcement Learning

June 17, 2026

[Paper Review] MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents

This paper addresses the impersonality of existing Computer-Use agent benchmarks: they lack personal context and are therefore detached from real usage environments.

#Review #Computer-Use Agents #Personalization #Benchmark #Linux Desktop #Agent Harness #Cross-App Consistency

June 17, 2026

[Paper Review] Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish

This paper addresses the morphological unawareness and information loss of existing subword tokenizers (BPE, WordPiece, etc.) in agglutinative languages such as Turkish.

#Review #Turkish NLP #Morphological Tokenizer #Differentiable Segmentation #Word Embedding #Poisson-Binomial #Reversible Tokenization

June 17, 2026

[Paper Review] Learning User Simulators with Turing Rewards

This paper addresses a fundamental limitation of existing user-simulator training methods: they do not sufficiently mimic real human behavior. Prior work has relied mainly on maximizing log-probability or on measuring simple similarity to ground-truth responses.

#Review #User Simulation #Turing Reward #Reinforcement Learning #Large Language Models #Indistinguishability #GRPO #Human-likeness

June 17, 2026

[Paper Review] LLM-Enabled NWDAF: A Step Toward AI-Native 6G Network Intelligence

This study addresses the functional limitations of NWDAF in existing open-source 5G network environments and the lack of intuitive management tools for non-expert users.

#Review #NWDAF #5G Core Network #6G #LLM Interface #Intent-Based Networking #Open-Source Testbed #RAG

June 17, 2026

[Paper Review] Kairos: A Native World Model Stack for Physical AI

This paper starts from the premise that today's World Models must evolve beyond simple video generators into foundational infrastructure for Physical AI.

#Review #Physical AI #World Model #Diffusion Transformer #Gated Linear Attention #Cross-Embodiment #Deployment-Aware #Embodied Control

June 17, 2026

[Paper Review] IndustryBench-MIPU: Benchmarking Multi-Image Attribute Value Extraction for Industrial Products

This study starts from the limitation that existing general-purpose visual reasoning benchmarks do not cover the complex, knowledge-intensive task of understanding industrial product specifications.

#Review #IndustryBench-MIPU #Attribute Value Extraction #Multimodal Large Language Models #Industrial Products #Completeness Gap #Multi-Image Integration

June 17, 2026

[Paper Review] Guava: An Effective and Universal Harness for Embodied Manipulation

This paper proposes the Guava framework to address the data inefficiency and poor recovery ability of existing End-to-End VLA (Vision-Language-Action) models, which learn complex low-level control directly in Embodied Manipulation environments.

#Review #Embodied Manipulation #Harness Framework #Vision-Language Models #ReAct #Tool Use #Policy Distillation #Sim2Real

June 17, 2026

[Paper Review] From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

This study addresses the inefficiency and limited scalability of existing pipelines in which RL training environments are designed by hand. In conventional RL training, the environment configuration is fixed or experts must adjust the training curriculum manually using heuristics, which degrades generalization and optimization in complex scenarios.

#Review #Reinforcement Learning #LLM-as-Environment-Engineer #Multi-Agent Path Finding #MAPF-FrozenLake #Self-Improvement #Policy Conditioning

June 17, 2026

[Paper Review] Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness

This paper raises the problem that modern AI scientist systems are adept at carrying out automated scientific research, yet the logical grounds and reasoning behind that research remain locked inside the model's internal state, making them hard to verify or audit from outside.

#Review #AI Scientist #Research Harness #Research Synthesis #Experimental Validation #Claim Drift #Auditability #Paper Graph Infrastructure

June 17, 2026

[Paper Review] EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts

This paper was designed to address the chronic latency of rollout generation during RL training of LLMs.

#Review #Reinforcement Learning #Speculative Decoding #Self-Speculative Decoding #LLM Rollout #System-Aware #Quantization

June 17, 2026

[Paper Review] CEO-Bench: Can Agents Play the Long Game?

This paper starts from the concern that existing agent evaluations are skewed toward short-horizon tasks and therefore fail to test the complex decision-making processes of the real world.

#Review #Long-Horizon #Agent Evaluation #Business Simulation #Decision Making #Partial Observability #Strategic Planning #Autonomous Agents

June 17, 2026

[Paper Review] Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games

This paper addresses the limitations of MLLMs in Non-Markov settings, where the model must remember past observations and use them in addition to what is currently visible. Existing benchmarks either conflate the ability to reconstruct hidden state with other agent skills or test memory only after the episode ends, so they fail to isolate memory properly.

#Review #Multimodal Large Language Models #Non-Markov Games #In-context State Tracking #Belief State #Closed-loop Evaluation #Memory Gap

June 17, 2026