#AI Benchmarking

4 posts

[Paper Review] The Trinity of Consistency as a Defining Principle for General World Models

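This paper addresses the problem that state-of-the-art generative AI models produce visually plausible outputs but remain limited in their understanding of physical laws and causal relationships.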

#Review #World Models #Multimodal Generative AI #Consistency Theory #Spatial-Temporal Reasoning #Causal Simulation #AI Benchmarking #Artificial General Intelligence

February 26, 2026

[Paper Review] Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark

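This paper aims to address the lack of a benchmark that systematically evaluates the Chain-of-Frames (CoF) reasoning ability of state-of-the-art video generation models, that is, their ability to go beyond visual quality and reason with an understanding of real-world physical laws and continuity.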

#Review #Generative Visual Reasoning #Chain-of-Frames (CoF) #Video Generation Models #World Simulators #AI Benchmarking #Cognitive Reasoning #VLM Evaluation

November 18, 2025

[Paper Review] Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

This paper seeks to overcome the limitations of static images and the separation of modalities in the existing 'Thinking with Text' and 'Thinking with Images' paradigms.

#Review #Video Generation #Multimodal Reasoning #Temporal Understanding #Spatial Reasoning #Foundation Models #AI Benchmarking #In-Context Learning #Self-Consistency

November 9, 2025

[Paper Review] Benchmark It Yourself (BIY): Preparing a Dataset and Benchmarking AI Models for Scatterplot-Related Tasks

This study addresses the problem that existing benchmarks do not adequately cover scatterplot-related tasks, which limits how well AI model performance on them can be evaluated.

#Review #Scatterplot Analysis #AI Benchmarking #Multimodal LLMs #Synthetic Data Generation #Cluster Detection #Outlier Detection #Data Visualization #Prompt Engineering

October 8, 2025