[Paper Review] MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents
A detailed review of the paper 'MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents', posted on arXiv.
#Review #Egocentric Vision #Multi-Agent Systems #Video Question Answering #Long-Horizon Reasoning #Embodied AI #Benchmark Dataset #Shared Memory #Dynamic Retrieval
March 11, 2026
[Paper Review] Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks
A detailed review of the paper 'Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks', posted on arXiv.
#Review #Referring Expression Comprehension #MLLM #Visual Reasoning #Benchmark Dataset #Hard Distractors #Grounding Shortcuts #Chain-of-Thought #Negation
March 1, 2026
[Paper Review] X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework
A detailed review of the paper 'X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework', posted on arXiv by Shwetank Shekhar Singh.
#Review #Hate Speech Detection #Explainable AI (XAI) #Multilingual NLP #Large Language Models (LLMs) #Attention Mechanism #N-gram Explanations #Human Rationales #Benchmark Dataset
January 6, 2026
[Paper Review] Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models
A detailed review of the paper 'Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models', posted on arXiv by Yu-Lun Liu.
#Review #Vision-Language Models (VLMs) #Popularity Bias #Ordinal Regression #Building Age Estimation #Multi-modal Learning #Benchmark Dataset #Explainable AI
December 24, 2025
[Paper Review] IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting
A detailed review of the paper 'IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting', posted on arXiv.
#Review #Multimodal Large Language Models (MLLMs) #Infrared Image Understanding #Benchmark Dataset #Visual Question Answering (VQA) #Generative Visual Prompting (GenViP) #Domain Adaptation #Image-to-Image Translation
December 10, 2025
[Paper Review] FreeAskWorld: An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI
A detailed review of the paper 'FreeAskWorld: An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI', posted on arXiv by Xinyu Yin.
#Review #Embodied AI #Vision-and-Language Navigation (VLN) #LLM-driven Simulation #Human-Agent Interaction #Closed-Loop #Benchmark Dataset #Social Cognition
November 19, 2025
[Paper Review] miniF2F-Lean Revisited: Reviewing Limitations and Charting a Path Forward
A detailed review of the paper 'miniF2F-Lean Revisited: Reviewing Limitations and Charting a Path Forward', posted on arXiv by Farzan Farnia.
#Review #Automated Theorem Proving #Autoformalization #Benchmark Dataset #miniF2F #Lean Language #Large Language Models #Mathematical Reasoning #Formal Verification
November 16, 2025
[Paper Review] The Massive Legal Embedding Benchmark (MLEB)
A detailed review of the paper 'The Massive Legal Embedding Benchmark (MLEB)', posted on arXiv.
#Review #Legal Information Retrieval #Embedding Models #Benchmark Dataset #Natural Language Processing #Retrieval-Augmented Generation #Jurisdictional Diversity #Legal Tech
October 24, 2025
[Paper Review] DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation
A detailed review of the paper 'DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation', posted on arXiv.
#Review #Machine Translation Evaluation #Large Language Models (LLMs) #Web Novel Translation #Multi-Agent Systems #Cultural Nuance #Benchmark Dataset #Natural Language Generation
October 15, 2025
[Paper Review] EthicsMH: A Pilot Benchmark for Ethical Reasoning in Mental Health AI
A detailed review of the paper 'EthicsMH: A Pilot Benchmark for Ethical Reasoning in Mental Health AI', posted on arXiv by UVSKKR.
#Review #Ethical Reasoning #Mental Health AI #Benchmark Dataset #Large Language Models #AI Ethics #Clinical Decision Support #Human-in-the-loop
September 16, 2025
[Paper Review] Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding
A detailed review of the paper 'Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding', posted on arXiv by Li Zheng.
#Review #Video Hallucination #Large Video Models (LVMs) #Hierarchical Reasoning #Spatial-Temporal Grounding #Diagnostic Framework #Benchmark Dataset #Multimodal AI
September 16, 2025
[Paper Review] Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
A detailed review of the paper 'Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth', posted on arXiv by Chi-Li Chen.
#Review #Large Language Models #Pragmatic Understanding #Drivelology #Benchmark Dataset #Multilingual NLP #Semantic Reasoning #Contextual Inference
September 5, 2025
[Paper Review] T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables
A detailed review of the paper 'T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables', posted on arXiv by Yu Zhao.
#Review #Table-to-Report Generation #Large Language Models (LLMs) #Benchmark Dataset #Industrial Applications #Table Reasoning #Evaluation Metrics #Real-world Data
September 2, 2025
[Paper Review] FakeParts: a New Family of AI-Generated DeepFakes
A detailed review of the paper 'FakeParts: a New Family of AI-Generated DeepFakes', posted on arXiv by Xi Wang.
#Review #Deepfake Detection #Partial Deepfakes #AI-Generated Video #Benchmark Dataset #Video Forensics #Generative Models #Manipulation Detection #Human Perception
August 29, 2025
[Paper Review] A Coarse-to-Fine Approach to Multi-Modality 3D Occupancy Grounding
A detailed review of the paper 'A Coarse-to-Fine Approach to Multi-Modality 3D Occupancy Grounding', posted on arXiv by Jianke Zhu.
#Review #3D Occupancy Grounding #Multi-modal Learning #Natural Language Understanding #Autonomous Driving #Voxel-based Prediction #Benchmark Dataset #Coarse-to-Fine
August 7, 2025