#Multi-modal Reasoning

5 posts

[Paper Review] ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought

This study aims to address the high computational cost and inference inefficiency of Chain-of-Thought (CoT) reasoning in LLMs.

#Review #Latent Reasoning #Chain-of-Thought #Variational Autoencoder #Visual-Text Compression #LLMs #Multi-modal Reasoning #Computational Efficiency

February 1, 2026

[Paper Review] Urban Socio-Semantic Segmentation with Vision-Language Reasoning

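This paper tackles Urban Socio-Semantic Segmentation, a new challenge task: accurately segmenting socially defined semantic entities in cities, such as schools and parks, from satellite imagery, rather than physical attributes such as buildings and water bodies.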

#Review #Urban Segmentation #Socio-Semantic #Vision-Language Models (VLMs) #Reinforcement Learning #Geospatial Data #Multi-modal Reasoning #SAM

January 15, 2026

[Paper Review] Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

This paper addresses the lack of a systematic framework and definition for evaluating the scientific general intelligence (SGI) of large language models (LLMs).

#Review #Scientific General Intelligence (SGI) #LLMs #Benchmarking #Scientist-Aligned Workflows #Practical Inquiry Model #Multi-modal Reasoning #Code Generation #Test-Time Reinforcement Learning (TTRL)

December 21, 2025

[Paper Review] DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action

This paper aims to solve the "action degeneration" problem that arises in Vision-Language-Action (VLA) models.

#Review #Vision-Language-Action (VLA) #Embodied AI #Action Degeneration #Data Pruning #Knowledge Distillation #Multi-modal Reasoning #Robot Learning #VLA Score

November 30, 2025

[Paper Review] Draft and Refine with Visual Experts

Recent Large Vision-Language Models (LVLMs) rely too heavily on linguistic priors rather than visual evidence, and as a result frequently produce ungrounded hallucinations.

#Review #Large Vision-Language Models (LVLMs) #Visual Grounding #Hallucination Mitigation #Agent Framework #Visual Question Answering (VQA) #Expert Coordination #Relevance Map #Multi-modal Reasoning

November 20, 2025