#Video Understanding

42 posts

[Paper Review] ViMU: Benchmarking Video Metaphorical Understanding

[Paper Review] Watch Before You Answer: Learning from Visually Grounded Post-Training

[Paper Review] Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

[Paper Review] Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing

[Paper Review] Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models

[Paper Review] Learning Situated Awareness in the Real World

[Paper Review] Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions

[Paper Review] Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

[Paper Review] Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

[Paper Review] VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

[Paper Review] Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation

[Paper Review] OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding

[Paper Review] UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

[Paper Review] Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment

[Paper Review] OneThinker: All-in-one Reasoning Model for Image and Video

[Paper Review] Click2Graph: Interactive Panoptic Video Scene Graphs from a Single Click

[Paper Review] SciEducator: Scientific Video Understanding and Educating via Deming-Cycle Multi-Agent System

[Paper Review] VIDEOP2R: Video Understanding from Perception to Reasoning

[Paper Review] EmoVid: A Multimodal Emotion Video Dataset for Emotion-Centric Video Understanding and Generation

[Paper Review] VideoSSR: Video Self-Supervised Reinforcement Learning

[Paper Review] VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction

[Paper Review] Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents

[Paper Review] Kwai Keye-VL 1.5 Technical Report

[Paper Review] PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity