Latest Posts

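[Paper Review] TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

This paper aims to develop TUNA, a native unified multimodal model (UMM) that handles multimodal understanding and generation tasks seamlessly within a single framework. It seeks to overcome the limitations that decoupled or biased visual representations impose on existing UMMs by building a unified continuous visual representation space that works well for both understanding and generation.

#Review #Unified Multimodal Models #Visual Representation #VAE #Flow Matching #Multimodal Understanding #Multimodal Generation #Image Editing #State-of-the-Art

December 1, 2025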

[Paper Review] Structured Extraction from Business Process Diagrams Using Vision-Language Models

This paper aims to extract a structured JSON representation directly from images of Business Process Model and Notation (BPMN) diagrams, without the original XML files or text annotations. The goal is to overcome the constraints on downstream system integration and analysis caused by existing methods' dependence on XML.

#Review #Vision-Language Models #BPMN Extraction #Structured Information Extraction #OCR Enrichment #Prompt Engineering #Diagram Understanding #Business Process Management

December 1, 2025

[Paper Review] StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos

This study aims to evaluate how effectively multimodal large language models (MLLMs) use human gaze signals for temporal reasoning and proactive understanding in streaming video settings.

#Review #Streaming Video Understanding #Gaze-Guided AI #Temporal Reasoning #Proactive AI #MLLMs #Eye Tracking #Benchmark #Human-Computer Interaction

December 1, 2025

[Paper Review] Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

This paper aims to address the instability of RL with LLMs and to identify the conditions under which a sequence-level reward can be effectively optimized through a token-level surrogate objective. In particular, it analyzes how dynamic expert routing in MoE models affects training stability and presents practical methods to mitigate that effect.

#Review #Reinforcement Learning (RL) #Large Language Models (LLMs) #Policy Gradient #REINFORCE #Mixture-of-Experts (MoE) #Training Stability #Importance Sampling #Routing Replay #Off-policy Learning

December 1, 2025

[Paper Review] SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs

This paper aims to address the Key-Value (KV) cache problems that arise during long-context reasoning in large language models (LLMs).

#Review #LLMs #Long-context Reasoning #KV Cache Optimization #Speculative Sparsity #Knowledge Distillation #Adaptive Memory Management #Throughput

December 1, 2025

[Paper Review] Seeing the Wind from a Falling Leaf

This study aims to estimate invisible physical forces from video data, such as the wind acting on a falling leaf. By mimicking the human ability to perceive unseen physical effects from visual cues alone, it seeks to narrow the gap between vision and physics and to contribute to understanding the physical processes behind the pixels.

#Review #Inverse Graphics #Differentiable Physics #Force Estimation #Video Generation #Material Point Method #3D Gaussians #Spatio-temporal Modeling #Vision-Language Models

December 1, 2025

[Paper Review] Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models

This paper addresses the excessive memory consumption and inference latency that arise when multimodal large language models (MLLMs) process high-resolution images and videos.

#Review #Multimodal Large Language Models (MLLMs) #Token Pruning #Graph-Structured Pruning (GSP) #Query-Conditioned Semantic Pruning (QCSP) #Determinantal Point Processes (DPP) #Model Efficiency #Visual Redundancy

December 1, 2025

[Paper Review] SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling

This paper aims to resolve the performance bottlenecks that arise in the mathematical reasoning of large language models (LLMs).

#Review #LLM Reasoning #Test-time Scaling #Resource Allocation #Dual-process Theory #Mathematical Reasoning #Adaptive Computation #Performance Optimization

December 1, 2025

[Paper Review] Rectifying LLM Thought from Lens of Optimization

This paper aims to correct the suboptimal reasoning behaviors that long chain-of-thought (CoT) LLMs commonly exhibit, such as overthinking and unnecessarily long reasoning chains, and thereby reduce the resulting performance degradation and high computational cost. It reframes CoT as an optimization process and seeks to rectify it effectively.

#Review #LLM Reasoning #Chain-of-Thought #RLVR #Optimization Framework #Process-level Reward #Gradient Descent #Reasoning Efficiency #Suboptimal Reasoning

December 1, 2025

[Paper Review] PromptBridge: Cross-Model Prompt Transfer for Large Language Models

This paper addresses Model Drifting: when the model in an LLM system is swapped or updated, a prompt optimized for the previous model performs markedly worse on the new one.

#Review #Large Language Models #Prompt Engineering #Model Drifting #Prompt Transfer #Cross-Model Adaptation #Training-Free #Prompt Optimization #MAP-RPE

December 1, 2025

[Paper Review] OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic

The goal is to improve the limited reasoning generalization and open-ended task handling of existing SFT (supervised fine-tuning)-based vision-language models (VLMs) in autonomous driving systems.

#Review #Autonomous Driving #Reinforcement Fine-tuning #LLM-as-Critic #Vision-Language Model #End-to-End Learning #Chain-of-Thought #Trajectory Planning

December 1, 2025

[Paper Review] OmniFusion: Simultaneous Multilingual Multimodal Translations via Modular Fusion

This paper addresses the latency and the inability to use multimodal context that affect text-only translation LLMs, as well as the limited multilingual translation performance and coverage of multimodal foundation models (MMFMs).

#Review #Multimodal Translation #Speech Translation #Simultaneous Translation #Large Language Models #Multimodal Foundation Models #Modular Fusion #End-to-End #Gated Fusion #OCR

December 1, 2025

[Paper Review] Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model

This paper aims to tackle the long-standing ill-posed problem of recovering pixel-wise geometric properties from a single image.

#Review #Geometric Dense Prediction #Depth Estimation #Surface Normal Prediction #Diffusion Models #Rectified Flow #Generative Priors #Deterministic Inference #Two-Stage Framework

December 1, 2025

[Paper Review] LongVT: Incentivizing 'Thinking with Long Videos' via Native Tool Calling

This paper addresses the hallucination and inaccurate reasoning that occur when large multimodal models (LMMs) process hours-long videos in which evidence is sparse and temporally dispersed.

#Review #Long Video Understanding #Multimodal LLMs #Tool Calling #Reinforcement Learning #Chain-of-Thought #Temporal Grounding #Video Question Answering

December 1, 2025

[Paper Review] Learning Eigenstructures of Unstructured Data Manifolds

This paper proposes a new framework that learns a spectral basis directly from unstructured data, without operator selection, discretization, or an eigensolver.

#Review #Spectral Basis Learning #Unstructured Data #Manifold Learning #Laplacian Operator #Optimal Approximation Theory #Neural Networks #Eigenstructure #Point Cloud Processing

December 1, 2025

[Paper Review] LFM2 Technical Report

This paper introduces LFM2, a family of Liquid Foundation Models that aims to achieve efficient on-device deployment and strong task performance at the same time.

#Review #Edge AI #Foundation Models #Hybrid Architecture #Knowledge Distillation #Multimodal AI #On-device Deployment #Efficient Inference #LLM Optimization

December 1, 2025

[Paper Review] InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision

This paper addresses the limitations of noisy, limited video-text supervision, as well as the problems of existing Masked Video Modeling (MVM), which either stays at low-level pixel reconstruction or induces shortcut learning.

#Review #Video Foundation Models #Self-Supervised Learning #Masked Video Modeling #Video-Text Supervision-Free #Encoder-Predictor-Decoder #Diffusion Decoder #Semantic Alignment #Latent World Model

December 1, 2025

[Paper Review] Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout

This paper aims to resolve three key limitations of existing autoregressive video diffusion models.

#Review #Autoregressive Video Generation #Rotary Positional Embedding #Infinite Video Generation #Action Control #Cinematic Transitions #Video Diffusion Models #KV Cache

December 1, 2025

[Paper Review] IndicParam: Benchmark to evaluate LLMs on low-resource Indic Languages

Large language models (LLMs) perform well on high-resource multilingual tasks, but evaluation on low-resource and extremely low-resource Indic languages is severely lacking. This study aims to systematically assess the limits of LLM performance in these languages and to shed light on the effectiveness of cross-lingual transfer.

#Review #Low-resource Languages #Indic Languages #LLM Evaluation #Benchmark #Multilingual LLMs #Question Answering #Cross-lingual Transfer

December 1, 2025

[Paper Review] How Far Are We from Genuinely Useful Deep Research Agents?

This paper points out that existing Deep Research Agent (DRA) benchmarks focus on question answering (QA) or closed-ended tasks and therefore fail to properly evaluate comprehensive report generation. It also argues that current open-ended benchmarks are out of step with real user needs because they rely on LLM-based sampling or subjective evaluation.

#Review #Deep Research Agents #Evaluation Benchmark #Failure Taxonomy #Report Generation #Information Retrieval #Reasoning Resilience #Content Fabrication #AI Agents

December 1, 2025