최신 포스트

[논문리뷰] UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models

기존 Large Vision Models (LVMs)이 태스크 및 모달리티별 사전 훈련 데이터에 대한 높은 의존성으로 인해 확장성이 제한되는 문제를 해결하고자 합니다.

#Review #Unified Vision Modeling #Video Generation #Diffusion Transformer #Supervised Fine-tuning #Cross-modal #Cross-source Tasks #Visual Sentences #LoRA

2025년 9월 29일

[논문리뷰] UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios

기존 LLM 에이전트 벤치마크가 짧은 호라이즌과 완전 관측 가능한 태스크에 집중하여 실제 복합 태스크에 필수적인 지속적인 추론, 계획, 메모리 관리, 툴 사용 능력 을 충분히 평가하지 못하는 문제를 해결하는 것을 목표로 합니다.

#Review #LLM Agents #Long-Horizon Reasoning #Benchmarking #Partially Observable #Tool Use #Memory Management #Exploration

2025년 9월 29일

[논문리뷰] Think-on-Graph 3.0: Efficient and Adaptive LLM Reasoning on Heterogeneous Graphs via Multi-Agent Dual-Evolving Context Retrieval

본 논문은 기존 그래프 기반 RAG 시스템이 직면한 정적 그래프 인덱스 구축의 한계 와 LLM 추출기의 성능 의존성 문제를 해결하는 것을 목표로 합니다.

#Review #RAG #LLM Reasoning #Knowledge Graphs #Multi-Agent Systems #Context Retrieval #Heterogeneous Graphs #Adaptive Learning #Dual-Evolution

2025년 9월 29일

[논문리뷰] TUN3D: Towards Real-World Scene Understanding from Unposed Images

본 논문은 실세계 스캔에서 정확한 카메라 포즈나 깊이 정보 없이 다중 뷰 이미지 입력만으로 조인트 레이아웃 추정(layout estimation) 과 3D 객체 감지(3D object detection) 를 수행하는 최초의 방법론인 TUN3D 를 제시합니다.

#Review #3D Scene Understanding #Layout Estimation #3D Object Detection #Unposed Images #Sparse Convolutional Networks #Multi-view Stereo #Real-time AI

2025년 9월 29일

[논문리뷰] StateX: Enhancing RNN Recall via Post-training State Expansion

본 논문은 Transformer 대비 긴 컨텍스트 처리 효율이 높은 RNN 계열 모델들이 고정된 크기의 recurrent state 로 인해 장문 컨텍스트에서의 정보 회상 능력(recall ability) 이 떨어지는 문제를 해결하고자 합니다.

#Review #RNN #State Expansion #Post-training #Long-context Recall #Linear Attention #State Space Models #GLA #Mamba2

2025년 9월 29일

[논문리뷰] See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation

본 논문은 기존 Vision-Language Models (VLMs) 기반의 드론 내비게이션 접근 방식이 액션 예측을 텍스트 생성으로 간주하여 발생하는 한계를 해결하고자 합니다.

#Review #Vision-Language Models #UAV Navigation #Zero-shot #Spatial Grounding #Waypoint Prompting #Autonomous Navigation #Adaptive Control

2025년 9월 29일

[논문리뷰] SPARK: Synergistic Policy And Reward Co-Evolving Framework

본 논문은 대규모 언어/시각-언어 모델(LLM/LVLM)의 강화 학습(RL) 파이프라인이 겪는 한계를 해결하고자 합니다.

#Review #Reinforcement Learning #LLMs #LVLMs #Reward Modeling #Policy Optimization #Self-Reflection #Verifiable Rewards #Co-evolution

2025년 9월 29일

[논문리뷰] ReviewScore: Misinformed Peer Review Detection with Large Language Models

AI 학회에서 급증하는 제출 수로 인해 저하되는 동료 검토의 품질 문제를 해결하고자 합니다.

#Review #Peer Review #Review Quality #Large Language Models (LLMs)#Misinformed Review #Argument Reconstruction #Factuality Evaluation #Natural Language Processing #Automated Evaluation

2025년 9월 29일

[논문리뷰] RefAM: Attention Magnets for Zero-Shot Referral Segmentation

컴퓨터 비전 태스크에서 CNN의 의존성을 완전히 제거 하고, 순수한 Transformer 아키텍처 만으로 이미지 분류 성능을 달성하는 것을 목표로 합니다. 기존 CNN 기반 접근법의 한계를 극복하고 self-attention 메커니즘 이 이미지 패치 간의 관계를 효과적으로 학습할 수 있음을 증명하고자 합니다.

#Review #Zero-Shot Segmentation #Referring Segmentation #Diffusion Transformers (DiTs)#Attention Mechanisms #Attention Sinks #Stop Words #Vision-Language Models #Training-Free Methods

2025년 9월 29일

[논문리뷰] Real-Time Object Detection Meets DINOv3

본 논문은 실시간 객체 탐지 분야에서 성능과 연산 효율성 사이의 균형을 개선하고, 특히 경량 모델을 위한 엣지 및 모바일 환경에서의 배포 효율성을 높이는 것을 목표로 합니다.

#Review #Real-time Object Detection #DINOv3 #DEIMv2 #Vision Transformer #Multi-scale Features #Spatial Tuning Adapter #Lightweight Models #Object Detection Framework

2025년 9월 29일

[논문리뷰] Quantile Advantage Estimation for Entropy-Safe Reasoning

대규모 언어 모델(LLMs)의 추론 능력을 강화하는 Reinforcement Learning with Verifiable Rewards (RLVR) 훈련 과정에서 발생하는 엔트로피 붕괴(entropy collapse) 및 엔트로피 폭발(entropy explosion) 문제를 해결하고, 안정적인 학습을 통해 성능을 지속적으로 향상시키는 것을 목표로 합니다.

#Review #Reinforcement Learning #LLM Reasoning #Entropy Control #Advantage Estimation #Quantile Baseline #Exploration-Exploitation #RLVR

2025년 9월 29일

[논문리뷰] PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning

LLM 추론을 위한 고품질 훈련 문제의 부족이라는 핵심 병목 현상을 해결하고자 합니다.

#Review #Prompt Synthesis #Large Language Models #Reasoning #Expectation-Maximization #Self-Play #Supervised Fine-Tuning #Task Generation #Rationale Generation

2025년 9월 29일

[논문리뷰] No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping

본 논문은 기존의 Verifiable Rewards를 활용한 강화 학습(RLVR) 방법론, 특히 GRPO 가 모든 롤아웃 응답이 동일한 보상을 받는 ' Zero-Variance Prompts '를 무시하여 귀중한 학습 신호를 손실하고 롤아웃 비용을 낭비하는 문제를 해결하고자 합니다.

#Review #LLM Reinforcement Learning #Zero-Variance Prompts #Advantage Shaping #Entropy-Guided #Math Reasoning #RLVR #Group Relative Policy Optimization

2025년 9월 29일

[논문리뷰] MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

본 연구는 기존 비전-언어 모델(VLM)이 고해상도 문서 처리 시 겪는 토큰 중복, 높은 계산 비용, 환각 문제 등의 한계를 극복하는 것을 목표로 합니다. 특히, 전반적인 계산 효율성을 유지하면서도 복잡하고 밀도 높은 문서의 구조 및 내용을 정확하게 파싱하기 위한 효율적인 디커플링 비전-언어 모델 을 제안합니다.

#Review #Document Parsing #Vision-Language Model #High-Resolution #Two-Stage Inference #Layout Analysis #Content Recognition #Data Engine #Computational Efficiency

2025년 9월 29일

[논문리뷰] Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation

본 논문은 Subject-Driven 이미지 생성 모델에서 발생하는 시각적 불일치(visual inconsistencies)를 정확하게 감지하고 정량화하며, 더 나아가 해당 불일치 영역을 공간적으로 지역화하는 것을 목표로 합니다.

#Review #Subject-Driven Generation #Visual Inconsistency Detection #Feature Disentanglement #Diffusion Models #Semantic Correspondence #Evaluation Metric #Spatial Localization #Contrastive Learning

2025년 9월 29일

[논문리뷰] MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning

로봇 조작 태스크를 위한 현실적이고 태스크 관련성이 높은 3D 탁상 장면(tabletop scene)을 자동으로 생성하는 것을 목표로 합니다. 기존 수동 또는 무작위 장면 생성 방식의 비효율성과 낮은 현실성을 극복하고, 고수준의 태스크 지시와 3D 장면 레이아웃 간의 큰 격차를 해소하고자 합니다.

#Review #3D Scene Generation #Robotic Manipulation #Large Language Models #Spatial Reasoning #Dataset #Direct Preference Optimization #Tabletop Scene

2025년 9월 29일

[논문리뷰] LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer

본 논문은 알 수 없는 혼합된 열화가 적용된 실제 저품질(LQ) 이미지에 대해 의미론적 일관성과 지각적 충실도를 유지하면서 범용 이미지 복원(UIR)을 수행하는 것을 목표로 합니다.

#Review #Universal Image Restoration #Diffusion Transformer #Caption-Free #Semantic Alignment #Image Quality Assessment #Data Curation #Real-World Degradations #Deep Learning

2025년 9월 29일

[논문리뷰] LongLive: Real-time Interactive Long Video Generation

실시간 및 대화형으로 고품질의 긴 비디오를 생성하는 데 따르는 효율성, 일관성, 그리고 시맨틱 일관성 문제를 해결하는 것을 목표로 합니다. 특히, 프롬프트 전환 시 시각적 일관성과 동적 콘텐츠 생성을 위한 상호작용성 부족이라는 기존 AR 및 Diffusion 모델의 한계를 극복하고자 합니다.

#Review #Long Video Generation #Real-time #Interactive AI #Autoregressive Models #KV Cache #Streaming Tuning #Attention Sink #Diffusion Models

2025년 9월 29일

[논문리뷰] Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning

본 논문의 핵심 목표는 장기적인(long-horizon), 희소한 보상(sparsely-rewarded)을 가진 LLM 에이전트 태스크에서 강화 학습(RL)의 근본적인 문제인 탐색-활용 트레이드오프(exploration-exploitation trade-off) 를 효과적으로 관리하는 것입니다.

#Review #Reinforcement Learning #LLM Agents #Exploration-Exploitation #Self-Imitation Learning #Intrinsic Rewards #Curriculum Learning #Policy Entropy #Tool Use

2025년 9월 29일

[논문리뷰] Language Models Can Learn from Verbal Feedback Without Scalar Rewards

기존 RLHF(Reinforcement Learning from Human Feedback) 방식이 구두 피드백을 스칼라 보상으로 압축하여 발생하는 정보 손실, 모호성, 보상 스케일 불균형 문제를 해결하는 것을 목표로 합니다.

#Review #Verbal Feedback #Conditional Generation #Large Language Models #Feedback-Conditional Policy #Offline-Online Learning #Reward Hypothesis Bypass

2025년 9월 29일