#3D Understanding

4 posts

[Paper Review] VLM3: Vision Language Models Are Native 3D Learners

This paper sets out to demonstrate that a standard VLM can perform 3D understanding without complex, purpose-built designs.

#Review #Vision Language Models #3D Understanding #Metric Depth Estimation #Pixel Correspondence #Camera Pose Estimation #Focal Length Unification #Scalable Training

May 31, 2026

[Paper Review] UniMesh: Unifying 3D Mesh Understanding and Generation

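This paper proposes UniMesh, which unifies 3D generation and understanding within a single architecture so that the two can reinforce each other. It introduces a Mesh Head that directly maps BAGEL's latents to Hunyuan3D's conditioning latents, minimizing information loss and preserving geometric precision.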

#Review #3D Generation #3D Understanding #Mesh Head #Chain-of-Mesh #Self-Reflection #Multimodal Learning

April 21, 2026

[Paper Review] How Much 3D Do Video Foundation Models Encode?

This paper aims to quantitatively investigate whether global 3D understanding emerges naturally within Video Foundation Models (VidFMs) pretrained on large-scale video data.

#Review #Video Foundation Models #3D Understanding #3D Reconstruction #Model Agnostic #Feature Probing #Diffusion Models #Temporal Reasoning

December 25, 2025

[Paper Review] Part-X-MLLM: Part-aware 3D Multimodal Large Language Model

This paper aims to address a limitation of existing 3D MLLMs (Multimodal Large Language Models): they struggle to recognize and manipulate 3D objects as individual parts.

#Review #3D Multimodal LLM #Part-aware #3D Generation #3D Editing #3D Understanding #Bounding Box #Structured Program #Dual-encoder

November 17, 2025