Latest Posts

[Paper Review] WorldBench: A Challenging and Visually Diverse Multimodal Reasoning Benchmark

This paper proposes WorldBench to overcome a limitation of existing multimodal benchmarks: they do not adequately measure a model's actual reasoning ability. Many existing benchmarks are biased toward specific domains or lack visual diversity, which tends to overestimate the real problem-solving ability of VLMs.

#Review #Multimodal Reasoning #Benchmark #Vision-Language Model #Visual Diversity #Inference #Evaluation #LLM

June 7, 2026

[Paper Review] When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents

This paper points out that existing LLM agent benchmarks assume only an idealized 'Happy Path' environment and therefore fail to properly evaluate the unstable tool execution and error conditions found in practice.

#Review #LLM Agents #Tool-Integrated Reasoning #Fault-Tolerance #Dynamic Replanning #Anomaly Recovery #Benchmark #DAG-based Task Generation

June 7, 2026

[Paper Review] When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

This paper identifies fundamental problems that arise when optimizing prompts for a Multi-Objective LLM Judge, which must consider several evaluation criteria at once.

#Review #LLM-as-a-Judge #Prompt Optimization #Textual Gradient #Multi-Objective Optimization #Gradient Dilution #Instruction Interference

June 7, 2026

[Paper Review] Watch, Remember, Reason: Human-View Video Understanding with MLLMs

This study addresses the shift away from short-clip video understanding toward long, complex video settings that span minutes or more and interleave multiple modalities.

#Review #Multimodal Large Language Models #Video Understanding #Temporal Grounding #Memory Modeling #Long-video Reasoning #Efficient Perception

June 7, 2026

[Paper Review] UniSHARP: Universal Sharp Monocular View Synthesis

Existing monocular novel view synthesis work (e.g., SHARP, Flash3D) is optimized mainly for perspective images from pinhole cameras, which makes it hard to generalize to wide-FoV, fisheye, and panoramic cameras with large fields of view or strong distortion.

#Review #Novel View Synthesis #3D Gaussian Splatting #Monocular Rendering #Omnidirectional Latent Space #Ray-Based Representation #Universal Camera Model

June 7, 2026

[Paper Review] Towards Retrieving Interaction Spaces for Agentic Search

This paper aims to address the scalability and efficiency problems of existing Agentic Search approaches.

#Review #Agentic Search #Retrieval-Augmented Generation #Direct Corpus Interaction #Interaction Space #Information Retrieval #LLM

June 7, 2026

[Paper Review] Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation

This paper addresses the problem that modern ASR systems are locked into a single-pass approach and therefore cannot effectively resolve meaning-critical errors in situations that, as in human communication, require repeated confirmation and correction.

#Review #Interactive ASR #Agentic Correction #Semantic Evaluation #S2ER #Human-AI Alignment #LLM-as-a-Judge

June 7, 2026

[Paper Review] Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators

Existing VLMs are limited to the images they observe, so they struggle to infer unseen layouts or to maintain spatial consistency across viewpoint changes. In restricted first-person observation settings in particular, there is a strong need to understand the scene from an alternative viewpoint, but current models do not actively address this.

#Review #Vision-Language Models #Spatial Reasoning #World Simulator #Reinforcement Learning #View Consistency #Agentic Reasoning

June 7, 2026

[Paper Review] SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents

This study seeks to address a fundamental problem: AI agents with long-term memory fail to accurately understand and use the complex relationships among their accumulated memories.

#Review #Long-Horizon AI Agents #Long-term Memory #Relational Memory #Benchmarking #LLM Agents #Knowledge Discrimination

June 7, 2026

[Paper Review] Streaming Video Generation with Streaming Force Control

This paper proposes StreamForce to address the lack of interactivity and the limits of physical control in existing video generation models.

#Review #Streaming Video Generation #Force Control #Causal Autoregressive Model #Force-aware Distillation #Unified Force Representation

June 7, 2026

[Paper Review] Stream3D-VLM: Online 3D Spatial Understanding with Incremental Geometry Priors

This paper seeks to address the problem that existing 3D LMMs operate offline, requiring full-scene observations or predefined video clips, which limits their use in real-time settings. Such approaches are difficult to use in embedded applications where real-time interaction is essential, such as autonomous robots and AR/VR devices.

#Review #3D Large Multimodal Models #Online Spatial Understanding #Incremental Geometry Priors #Visual-Spatial Feature Integration #Geometry-Adaptive Voxel Compression #Streaming Video

June 7, 2026

[Paper Review] Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills

This study seeks to address the problem that LLM-based software engineering agents are limited in training and generalization performance by a shortage of high-quality task data. Existing synthetic data generation methods rely on fixed rules or random bug injection, and so fail to reflect the agent's actual weaknesses or its learning progress.

#Review #Software Engineering #Large Language Models #Reinforcement Learning #Self-Evolution #Agent Skills #Trace-Driven Learning #Code Repair

June 7, 2026

[Paper Review] SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations

This paper starts from the problem that there is no systematic methodology for reliably evaluating LLM-based mediators in complex conflict situations that change in real time. Existing studies either rely on a few limited domains or evaluate the mediator's performance over the entire conversation context, which introduces noise from irrelevant conversation content.

#Review #LLM Mediation #Automated Evaluation #Socio-cognitive Adaptation #Agentic Pipeline #Topic-localized Evaluation

June 7, 2026

[Paper Review] SPACENUM: Revisiting Spatial Numerical Understanding in VLMs

This paper questions whether the numerical outputs that VLMs generate in embodied environments (e.g., action magnitude, spatial coordinate) are actually grounded in real spatial information.

#Review #Vision-Language Models #Spatial Numerical Understanding #Spatial Exploration #Spatial Reasoning #Metric Grounding #Num2Space #Space2Num

June 7, 2026

[Paper Review] SIA: Self Improving AI with Harness & Weight Updates

This paper seeks to address the limitation that existing research on AI self-improvement is split into two isolated silos: Harness (scaffold) improvement and Test-time training (weight updates).

#Review #Self-Improving Agents #Test-Time Training #Reinforcement Learning #Harness Engineering #Scaffold Generation #LoRA

June 7, 2026

[Paper Review] Robots Need More than VLA and World Models

This paper argues that the robot learning field currently relies too heavily on scaling VLA models, and that scaling alone cannot achieve generalist robot intelligence.

#Review #Robotics #Vision-Language-Action Models #Physical Intelligence #Embodied AI #Grounding #Robot Learning #Data Engines

June 7, 2026

[Paper Review] Reinforcement Learning from Rich Feedback with Distributional DAgger

This study seeks to address the extreme sparse-reward problem of the existing RLVR paradigm and the poor credit assignment that results from it.

#Review #Reinforcement Learning #Rich Feedback #Self-Distillation #DAgger #Policy Optimization #Credit Assignment

June 7, 2026

[Paper Review] Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them

This paper seeks to answer a core question: why do modern Image-to-Video (I2V) generation models frequently violate basic physical laws despite their excellent visual quality?

#Review #Video Generation #Diffusion Models #Physical Consistency #Phase Erosion #Latent Delta Guidance #Spectral Analysis #Training-Free

June 7, 2026

[Paper Review] Parametric Social Identity Injection and Diversification in Public Opinion Simulation

This paper seeks to address the severe lack of diversity in existing LLM-based public opinion simulation. The authors find that existing prompt-based persona methods fail to reproduce the distribution of real human responses, and they identify a Diversity Collapse phenomenon in which identity information is lost during hierarchical information propagation.

#Review #Agent-based Modeling #Public Opinion Simulation #Social Diversity #Large Language Models #Hidden State Manipulation

June 7, 2026

[Paper Review] PaperFlow: Profiling, Recommending, and Adapting Across Daily Paper Streams

This paper points out a limitation of existing paper recommendation systems: most frame the task as a Static Ranking problem over a fixed candidate set.

#Review #Scientific Paper Recommendation #User Profiling #Interest Drift #Longitudinal Benchmark #Multi-signal Aggregation #LLM-based Recommendation

June 7, 2026