#Tool-Augmented MLLMs

1 post

[Paper Review] REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding

This paper addresses the problem that existing text-based self-reflection mechanisms are limited in processing rich, dynamic visual information and therefore suffer degraded performance on long-form video understanding tasks.

#Review #Multimodal Reasoning #Long-Form Video Understanding #Self-Reflection #Reinforcement Learning #Tool-Augmented MLLMs #Visual Rethinking #Video Question Answering #Causal Attribution

November 18, 2025