#Vision-Language Models (VLMs)

44 posts

[Paper Review] TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation

[Paper Review] HomeSafe-Bench: Evaluating Vision-Language Models on Unsafe Action Detection for Embodied Agents in Household Scenarios

[Paper Review] Can Vision-Language Models Solve the Shell Game?

[Paper Review] Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence

[Paper Review] AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games

[Paper Review] NarraScore: Bridging Visual Narrative and Musical Dynamics via Hierarchical Affective Control

[Paper Review] AdaptMMBench: Benchmarking Adaptive Multimodal Reasoning for Mode Selection and Reasoning Process

[Paper Review] SketchDynamics: Exploring Free-Form Sketches for Dynamic Intent Expression in Animation Generation

[Paper Review] VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents

[Paper Review] Urban Socio-Semantic Segmentation with Vision-Language Reasoning

[Paper Review] FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection

[Paper Review] See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning

[Paper Review] GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training

[Paper Review] Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs

[Paper Review] VTCBench: Can Vision-Language Models Understand Long Context with Vision-Text Compression?

[Paper Review] V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions

[Paper Review] Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection

[Paper Review] ReViSE: Towards Reason-Informed Video Editing in Unified Models with Self-Reflective Learning

[Paper Review] Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization

[Paper Review] Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs

[Paper Review] Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens

[Paper Review] Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models

[Paper Review] First Frame Is the Place to Go for Video Content Customization

[Paper Review] Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries

[Paper Review] Vote-in-Context: Turning VLMs into Zero-Shot Rank Fusers

[Paper Review] OpenGVL: Benchmarking Visual Temporal Progress for Data Curation

[Paper Review] Visual Programmability: A Guide for Code-as-Thought in Chart Understanding

[Paper Review] D-HUMOR: Dark Humor Understanding via Multimodal Open-ended Reasoning

[Paper Review] Robix: A Unified Model for Robot Interaction, Reasoning and Planning

[Paper Review] LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model

[Paper Review] IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding

[Paper Review] Adapting Vision-Language Models Without Labels: A Comprehensive Survey

[Paper Review] HPSv3: Towards Wide-Spectrum Human Preference Score

[Paper Review] RobotArena ∞: Scalable Robot Benchmarking via Real-to-Sim Translation

[Paper Review] Learning an Image Editing Model without Image Editing Pairs

[Paper Review] TTRV: Test-Time Reinforcement Learning for Vision Language Models

[Paper Review] Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned
