#Large Multimodal Models (LMMs)

12 posts

[Paper Review] From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning

[Paper Review] DLEBench: Evaluating Small-scale Object Editing Ability for Instruction-based Image Editing Model

[Paper Review] RISE-Video: Can Video Generators Decode Implicit World Rules?

[Paper Review] Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation

[Paper Review] Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents

[Paper Review] MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs

[Paper Review] Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

[Paper Review] PRISMM-Bench: A Benchmark of Peer-Review Grounded Multimodal Inconsistencies
