#Zero-Shot

2개의 포스트

[논문리뷰] Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding

본 논문은 3D-VG 작업을 'Think(추론)', 'Act(도구 호출)', 'Build(재구성)' 단계로 세분화한 TAB 프레임워크를 제안합니다 . TAB은 고정된 파이프라인 대신, 전문적인 3D-VG Skill blueprint에 따라 VLM 에이전트가 능동적으로 visual tool을 호출하여 타겟을 추적하고 마스크를 생성합니다.

#Review #3D Visual Grounding #Vision-Language Models #Agentic Framework #RGB-D #Zero-Shot #Geometric Reconstruction

2026년 4월 1일

[논문리뷰] SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models

CLIP과 같은 Vision-Language Models (VLMs)는 multimodal AI의 핵심 구성 요소이지만, 대규모의 uncurated training data로 인해 심각한 social 및 spurious bias가 내재되어 있다.

#Review #Vision-Language Models #CLIP #Debiasing #Sparse Autoencoder #Post-Hoc #Zero-Shot #Feature Disentanglement #Bias Mitigation

2026년 3월 23일