#Benchmark Dataset

15 posts

[Paper Review] Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks

[Paper Review] X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework

[Paper Review] IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting

[Paper Review] miniF2F-Lean Revisited: Reviewing Limitations and Charting a Path Forward

[Paper Review] Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding

[Paper Review] Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth

[Paper Review] T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables

[Paper Review] FakeParts: a New Family of AI-Generated DeepFakes

[Paper Review] A Coarse-to-Fine Approach to Multi-Modality 3D Occupancy Grounding

[Paper Review] DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation