#Region Understanding

1 post

[Paper Review] Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

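This paper points out that while existing MLLMs excel at holistic understanding, they struggle to capture fine-grained details in complex scenes and the intricate relationships between objects.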

#Review #Multimodal LLMs #Region Understanding #Contextual Pixel Understanding #RoI-aligned Feature Replay #Compositional Reasoning #GAR-Bench #Zero-shot Video Understanding

October 22, 2025