Latest Posts

[Paper Review] Visual-ERM: Reward Modeling for Visual Equivalence

Vision-to-Code is a core capability essential to a range of downstream systems, including AI-assisted front-end development, scientific paper parsing, knowledge management, and system integration.

#Review #Reward Modeling #Vision-to-Code #Reinforcement Learning #Multimodal Generative Model #Visual Equivalence #Fine-grained Feedback #Test-Time Scaling

March 15, 2026

[Paper Review] Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

Online Video Large Language Models (VideoLLMs) are essential for interpreting streaming visual inputs and responding in real time, and they matter especially for embodied intelligence and interactive AI assistants.

#Review #Streaming Video Understanding #VideoLLMs #Chain-of-Thought (CoT) #Real-time AI #Reinforcement Learning #Knowledge Graphs #Streaming Thinking #Low Latency

March 15, 2026

[Paper Review] VQQA: An Agentic Approach for Video Evaluation and Quality Improvement

Despite rapid progress in video generation models, aligning model output with complex user intent remains a major challenge.

March 15, 2026

[Paper Review] V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration

Existing image restoration methods have focused mainly on task-specific modeling, requiring substantial supervision (more than a million samples) for each degradation type. This is (a) Traditional Image Restoration ...

March 15, 2026

[Paper Review] Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) perform well on offline video understanding tasks but show weaknesses in online multi-turn interaction over continuously arriving video streams, as in live broadcasting, monitoring, and robotic assistants.

#Review #Streaming Video Reasoning #Multi-Turn Interaction #Segment-Level Memory #Causal Mask #Positional Encoding #Dual KV Cache #Multimodal Large Language Models

March 15, 2026

[Paper Review] Steve-Evolving: Open-World Embodied Self-Evolution via Fine-Grained Diagnosis and Dual-Track Knowledge Distillation

This work argues that, for an embodied agent autonomously carrying out long-horizon compound tasks in open-world environments, the key bottleneck is not single-step planning quality but how interaction experience is organized and evolved.

March 15, 2026

[Paper Review] Spend Less, Reason Better: Budget-Aware Value Tree Search for LLM Agents

Test-time scaling has recently become a common way to improve the reliability of Large Language Models (LLMs), but existing approaches assume unlimited compute, so agents exhaust their token and tool budgets on redundant or dead-end trajectories.

March 15, 2026

[Paper Review] SimRecon: SimReady Compositional Scene Reconstruction from Real Videos

Existing 3D scene reconstruction methods usually represent a scene holistically, which yields high visual fidelity but lacks complete object geometry and clear object boundaries, a fundamental limitation that makes them unsuitable for simulation and interaction.

#Review #Compositional 3D Scene Reconstruction #Simulation-Ready Scenes #Active Viewpoint Optimization (AVO) #Scene Graph Synthesizer (SGS) #Real-world Videos #Physical Plausibility

March 15, 2026

[Paper Review] OmniForcing: Unleashing Real-time Joint Audio-Visual Generation

Recent joint audio-visual diffusion models such as LTX-2 and Veo 3 deliver excellent generation quality, but their bidirectional attention dependency causes high latency, making them hard to use in real-time applications.

#Review #Streaming Audio-Visual Generation #Diffusion Distillation #Autoregressive Video Synthesis #Multi-modal AI

March 15, 2026

[Paper Review] Multimodal OCR: Parse Anything from Documents

In the era of large language models and multimodal models, document parsing has become a core data engine for pretraining and retrieval.

#Review #Multimodal OCR #MOCR #Document Parsing #Structured Graphics #Image-to-SVG #Vision-Language Models #OCR Arena

March 15, 2026

[Paper Review] MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning

Multimodal Large Language Models (MLLMs) are increasingly used to handle complex visual workflows such as GUI navigation, yet evaluation of the deep compositional reasoning this requires remains lacking.

#Review #MLLM #Deep Compositional Reasoning #Programmatically Verified Benchmark #Hard Negatives #Control Flow #VPIR #Path F1

March 15, 2026

[Paper Review] LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation

As LLM context lengths have surged, the KV cache grows linearly with input sequence length, creating a memory bottleneck in long-context tasks that severely limits inference scalability.

#Review #KV Cache Eviction #Long Context LLM #Attention Score Prediction #LoRA #Parameter-Efficient #Time-to-First-Token

March 15, 2026

[Paper Review] LMEB: Long-horizon Memory Embedding Benchmark

Memory embeddings are essential in memory-augmented systems (e.g., OpenClaw), but current text embedding benchmarks do not evaluate them adequately.

#Review #Memory Embeddings #Long-horizon Memory Retrieval #Text Embedding Benchmarks #Episodic Memory #Dialogue Memory #Semantic Memory #Procedural Memory #Zero-Shot Evaluation

March 15, 2026

[Paper Review] HybridStitch: Pixel and Timestep Level Model Stitching for Diffusion Acceleration

Text-to-Image (T2I) diffusion models show impressive image generation capabilities, but large models with billions of parameters incur heavy computational overhead and high latency, which makes them hard to use in latency-sensitive applications.

#Review #Diffusion model #Mixture of models #Acceleration #Text-to-Image #Model stitching #Latency reduction #Pixel-level #Timestep-level

March 15, 2026

[Paper Review] HomeSafe-Bench: Evaluating Vision-Language Models on Unsafe Action Detection for Embodied Agents in Household Scenarios

Embodied agents are rapidly being deployed in home environments, and unpredictable safety risks are rising with them. Existing safety evaluations are mostly limited to static images, text, or generic hazards, and fail to properly benchmark dynamic unsafe action detection in household scenarios.

#Review #Embodied Agents #Unsafe Action Detection #Vision-Language Models (VLMs) #Household Scenarios #HomeSafe-Bench #HD-Guard #Real-time Safety Monitoring

March 15, 2026

[Paper Review] From Sparse to Dense: Multi-View GRPO for Flow Models via Augmented Condition Space

Diffusion and flow models have recently shown groundbreaking capabilities in visual content generation, but ensuring that generated outputs align with human preferences and task-specific constraints remains an important challenge.

#Review #Reinforcement Learning #GRPO #Diffusion Models #Flow Models #Preference Alignment #Condition Enhancement #Multi-View Learning

March 15, 2026

[Paper Review] ECoLAD: Deployment-Oriented Evaluation for Automotive Time-Series Anomaly Detection

Prior Time-Series Anomaly Detection (TSAD) research has mostly compared and optimized only detection quality (mainly accuracy) under unconstrained execution on workstation-class hardware.

#Review #Time-series anomaly detection #Deployment-oriented evaluation #Compute reduction #CPU parallelism #Throughput #Latency #Automotive telemetry #AUC-PR

March 15, 2026

[Paper Review] Detecting Intrinsic and Instrumental Self-Preservation in Autonomous Agents: The Unified Continuation-Interest Protocol

Autonomous agents, especially delegated systems with memory, persistent context, and multi-step planning, pose a distinctive measurement problem.

#Review #AI safety #self-preservation #instrumental convergence #Quantum Boltzmann Machine #entanglement entropy #alignment

March 15, 2026

[Paper Review] CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges

The success of Large Language Models (LLMs) has been driven by scaling to internet-scale data, but with high-quality data now saturated, further scaling of model intelligence has hit a limit.

March 15, 2026

[Paper Review] Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation

In multimodal modeling, recent work on unifying visual comprehension and generation within a single model is regarded as an important step toward human-like multimodal intelligence. However, this unification faces two fundamental problems.

#Review #Unified multimodal model #Visual generation and comprehension #Unified vision encoder #Cascaded flow matching #Token compression

March 15, 2026