Latest Posts

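[onnxruntime] [ONNX Runtime] Escaping the SGEMM Trap: Optimizing Decoding with a GQA-Specific GEMV Kernel

This post analyzes an optimization that resolves the SGEMM overhead ONNX Runtime incurs during decoding, where M=1, and improves GQA performance by up to 1.5x with a dedicated GEMV kernel.

#ONNX Runtime #GQA #Performance Optimization #GEMV #LLM Inference

June 26, 2026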

[Paper Review] Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It

This paper addresses the training instability and performance plateaus that RL-based optimization encounters in multi-step tool-use tasks.

#Review #Tool Learning #Reinforcement Learning #Structural Collapse #Supervisory Signals #Interleaved Training #Process Reflection Supervision

June 25, 2026

[Paper Review] When Does Combining Language Models Help? A Co-Failure Ceiling on Routing, Voting, and Mixture-of-Agents Across 67 Frontier Models

This paper argues that the potential accuracy gains of various LLM systems (Routing, Voting, Mixture-of-Agents) are far lower than commonly assumed. In practice, the error correlation between models, $\rho$, has served as the guiding metric: when $\rho$ is low, combining diverse models has been judged effective.

#Review #LLM Orchestration #Model Routing #Co-failure Ceiling #Error Correlation #Mixture-of-Agents #Inference Economics

June 25, 2026

[Paper Review] ViQ: Text-Aligned Visual Quantized Representations at Any Resolution

This paper addresses the representation mismatch in existing MLLMs between the continuous representations used by visual encoders and the discrete tokens of language models.

#Review #Multimodal Large Language Models #Visual Quantization #Representation Learning #Any-Resolution #Discrete Visual Representations #Text-Aligned #Efficiency

June 25, 2026

[Paper Review] The Verification Horizon: No Silver Bullet for Coding Agent Rewards

This paper points out that, as recent Coding Agents grow more capable, reliably verifying the correctness of generated code has become much harder than generating it.

#Review #Coding Agents #Reward Design #Reward Hacking #Alignment #Verification #Systematic Evaluation

June 25, 2026

[Paper Review] Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments

This study starts from the concern that existing agent benchmarks focus too heavily on overly simple tasks or familiar web environments, and so fail to adequately detect the potential limitations of modern agents. Because existing benchmarks mainly target consumer-oriented tasks such as online shopping or simple information retrieval, agent performance saturates early.

#Review #Agentic Systems #GauntletBench #Temporal Perception #Graphical Understanding #3D Reasoning #Generalization #Multimodal Large Language Models

June 25, 2026

[Paper Review] Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation

This paper addresses the structural limitations that existing Text-to-Image (T2I) models face when handling complex, ambiguous real-world requests. T2I models are generally optimized for fully specified prompts, but real-world user requests are often incomplete or require contextual information.

#Review #Agentic Image Generation #Context Gap #Context-Aware Planning #Context Grounding #IA-Bench #Multimodal Large Language Model (MLLM)

June 25, 2026

[Paper Review] PhysiFormer: Learning to Simulate Mechanics in World Space

This study proposes PhysiFormer, which addresses the view dependence and physical-law violations of video-based physical modeling by performing physics simulation directly at the 3D Mesh level.

#Review #PhysiFormer #Diffusion Transformer #3D Mesh #World Space Simulation #Physically-plausible #Trajectory Prediction

June 25, 2026

[Paper Review] OpenBioRQ: Unsolved Biomedical Research Questions for Agents

This paper argues that existing LLM evaluation benchmarks, by covering only questions with a fixed ground-truth answer, overlook critical error types that arise in real-world settings.

#Review #Biomedical Research #Agentic Evaluation #Retrieval-Grounded #Faithfulness #Citation Factuality #Open Questions

June 25, 2026

[Paper Review] OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning

This paper addresses the problem that, in agentic reinforcement learning, the sparse and delayed rewards of outcome-based RL fail to provide fine-grained credit assignment for intermediate decisions.

#Review #Agentic Reinforcement Learning #On-Policy Distillation #Skill Extraction #Hindsight Supervision #Hierarchical Skills #Self-Distillation #Token-level Advantage

June 25, 2026

[Paper Review] JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

This paper addresses the scalability limit facing existing Speculative Decoding methods, namely the 'Causality-Efficiency Dilemma'.

#Review #Speculative Decoding #Parallel Tree Drafting #Causal Attention #LLM Inference #Latency Reduction

June 25, 2026

[Paper Review] In-Context World Modeling for Robotic Control

This study addresses the failure of existing VLA models to generalize to environments where the camera viewpoint or robot embodiment changes, a failure that stems from their heavy reliance on the fixed environment context seen during training.

#Review #In-Context World Modeling #VLA models #System Identification #Robotic Control #Generalization #Zero-shot Adaptation

June 25, 2026

[Paper Review] How Post-Training Shapes Biological Reasoning Models

This paper systematically characterizes how the Post-Training process affects a model's generalization ability and over-specialization in the development of biological reasoning models.

#Review #Biological Reasoning #Post-Training #Supervised Fine-Tuning #Reinforcement Learning #Generalization #Foundation Models

June 25, 2026

[Paper Review] Hallucination in World Models is Predictable and Preventable

This paper addresses the Hallucination problem in which modern generative world models, despite generating highly realistic futures, deviate from the true dynamics.

#Review #World Models #Hallucination #Data Coverage #Visual Generative Modeling #Representation Learning #Curiosity-driven Data Collection

June 25, 2026

[Paper Review] GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents

This paper points out that existing evaluations of Computer-Use agents confound the difference in interaction modality between GUI and CLI with model performance, the task environment, and the agent's control capability.

#Review #GUI Agents #CLI Agents #Computer-Use #Skill-Mediated #Execution Bottlenecks #Benchmark #Action Space #Visual Grounding

June 25, 2026

[Paper Review] Fast LeWorldModel

This paper aims to improve the inefficient planning process of existing JEPA-based World Models such as LeWM. The conventional Autoregressive Rollout approach must call the model sequentially, one step at a time, to predict future states, which makes it computationally very expensive.

#Review #Latent World Models #Visual Planning #Joint-Embedding Predictive Architectures (JEPA) #Action-Prefix Prediction #Parallel Rollout #CEM (Cross-Entropy Method)

June 25, 2026

[Paper Review] Discretizing Reward Models

This paper raises the problem that modern Reward Models, while appearing strong on performance metrics, induce low-quality policies during actual Reinforcement Learning because of Oversensitivity: they over-discriminate between responses in terms of usefulness.

#Review #Reward Model #Reinforcement Learning #Oversensitivity #Discretization #Reward Clustering #Monte Carlo Dropout #Discriminative Ability #Specificity

June 25, 2026

[Paper Review] DanceOPD: On-Policy Generative Field Distillation

This study addresses the problem of a single model having to integrate diverse generative capabilities that may conflict with one another, such as T2I and local/global editing, while preserving the performance of each. Existing data mixing or model merging approaches cause gradient conflicts between capabilities or dilute performance.

#Review #Generative Field Distillation #Flow Matching #On-Policy Distillation #Capability Composition #Hard-Routed Field Matching #Multi-Capability Alignment

June 25, 2026

[Paper Review] Confidence-Aware Tool Orchestration for Robust Video Understanding

This paper aims to solve the Blind Trust Problem, which arises when modern Video-LLMs ignore per-frame reliability under the diverse visual degradations found in the real world.

#Review #Video Understanding #Robustness #Tool Orchestration #GRPO #Frame Selection #Blind Trust Problem #Confidence-Aware

June 25, 2026

[Paper Review] CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies

This paper addresses the limitation that existing LLM agent benchmarks are confined to single-agent or homogeneous settings, and so fail to reflect the complexity of realistic economic systems.

#Review #LLM Agents #Long-Horizon #Multi-Agent Economy #Benchmark #Supply Chain #Decision-making

June 25, 2026