#Programmatically Verified Benchmark

1 post

[Paper Review] MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning

Multimodal Large Language Models (MLLMs) are increasingly used to handle complex visual workflows such as GUI navigation, yet evaluation of this kind of Deep Compositional Reasoning ability remains lacking.

#Review #MLLM #Deep Compositional Reasoning #Programmatically Verified Benchmark #Hard Negatives #Control Flow #VPIR #Path F1

March 15, 2026