#Open-Vocabulary Segmentation

4개의 포스트

[논문리뷰] From Pixels to Concepts: Do Segmentation Models Understand What They Segment?

본 논문은 최신 promptable segmentation 모델들이 시각적 살점(salient cues)에 과도하게 의존하여 semantically invalid한 프롬프트에도 정확한 마스크를 생성하는 '개념적 기반(concept-faithful grounding)'의 결여 문제를 해결하고자 합니다 .

#Review #Promptable Segmentation #Counterfactual Evaluation #Semantic Grounding #Visual Hallucination #Multimodal Reasoning #Open-Vocabulary Segmentation

2026년 5월 13일

[논문리뷰] Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence

본 논문의 핵심 목표는 수동 개입 없이 원시 비디오 스트림을 대규모의 홀리스틱 3D 공간 지능 데이터로 자동 변환하는 파이프라인인 Holi-Spatial 을 제시하는 것입니다.

#Review #3D Spatial Intelligence #Video Stream Processing #Automated Data Curation #3D Gaussian Splatting (3DGS)#Vision-Language Models (VLMs)#Open-Vocabulary Segmentation #Spatial Reasoning #Multimodal Datasets

2026년 3월 9일

[논문리뷰] OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding

기존 3D 장면 이해 방법론들이 훈련된 임베딩과 대규모 수동 주석, 긴 훈련 시간에 의존하는 한계를 극복하고자 합니다. OpenVoxel은 훈련 없이 희소 복셀을 그룹화하고 캡셔닝하여 오픈-vocabulary 3D 장면 이해 태스크를 수행하며, 특히 복잡한 자연어 질의에 효과적으로 대응하는 것을 목표로 합니다.

#Review #3D Scene Understanding #Open-Vocabulary Segmentation #Referring Expression Segmentation #Training-Free #Voxel Grouping #Vision-Language Models #Multi-modal Large Language Models #Sparse Voxel Rasterization

2026년 1월 14일

[논문리뷰] SAM 3: Segment Anything with Concepts

이 논문은 기존 SAM(Segment Anything Model) 의 한계, 즉 단일 객체 분할(PVS)을 넘어 이미지와 비디오에서 개념(Concept) 을 기반으로 모든 객체 인스턴스를 탐지, 분할 및 추적하는 것을 목표로 합니다.

#Review #Segment Anything Model #Open-Vocabulary Segmentation #Multimodal Foundation Model #Instance Segmentation #Video Object Tracking #Prompt Engineering #Data Engine #Human-in-the-loop

2025년 11월 23일