#MLLMs

30 posts

[Paper Review] CurveStream: Boosting Streaming Video Understanding in MLLMs via Curvature-Aware Hierarchical Visual Memory Management

[Paper Review] TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering

[Paper Review] BiManiBench: A Hierarchical Benchmark for Evaluating Bimanual Coordination of Multimodal Large Language Models

[Paper Review] Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception

[Paper Review] MetaphorStar: Image Metaphor Understanding and Reasoning with End-to-End Visual Reinforcement Learning

[Paper Review] Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation

[Paper Review] GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation

[Paper Review] Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task

[Paper Review] StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos

[Paper Review] VideoSSR: Video Self-Supervised Reinforcement Learning

[Paper Review] UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings

[Paper Review] VisR-Bench: An Empirical Study on Visual Retrieval-Augmented Generation for Multilingual Long Document Understanding

[Paper Review] UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding

[Paper Review] InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

[Paper Review] TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs

[Paper Review] VisJudge-Bench: Aesthetics and Quality Assessment of Visualizations

[Paper Review] Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs

[Paper Review] Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods
