#Depth Estimation

14 posts

[Paper Review] Towards Consistent Video Geometry Estimation

This paper addresses the problem that existing video geometry estimation models are confined to either the offline (full-sequence) or the online (streaming) setting, depending on their architecture or training protocol.

#Review #Foundation Model #Video Geometry Estimation #Dynamic Chunking Attention #Depth Estimation #Surface Normal Estimation #Point Map Estimation

May 28, 2026

[Paper Review] M2Retinexformer: Multi-Modal Retinexformer for Low-Light Image Enhancement

This paper aims to address the limitation that existing Retinex-based deep learning methods rely solely on RGB information and therefore cannot effectively interpret a scene's geometric structure or illumination distribution.

#Review #Low-light Image Enhancement #Retinex Theory #Multi-modal Learning #Transformer #Cross-attention #Depth Estimation #Semantic Features

May 13, 2026

[Paper Review] InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields

This paper aims to overcome the limitations of existing depth estimation methods based on discrete image grids, namely their poor resolution scalability and their difficulty recovering fine geometric detail.

#Review #Depth Estimation #Neural Implicit Fields #Arbitrary Resolution #Fine-Grained #Novel View Synthesis #Vision Transformer #Synth4K Benchmark

January 6, 2026

[Paper Review] Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation

This paper addresses the long-standing problem of depth and normal estimation for transparent or reflective objects.

#Review #Video Diffusion Model #Depth Estimation #Normal Estimation #Transparent Objects #Robotics #Data Generation #LoRA Fine-tuning

December 29, 2025

[Paper Review] N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models

This work aims to address the limitation that existing multimodal models rely on 2D images and therefore lack 3D spatial understanding.

#Review #3D Grounding #Spatial Reasoning #Vision-Language Models #Depth Estimation #3D Object Detection #Chain-of-Thought #Data Generation #Multimodal AI

December 18, 2025

[Paper Review] DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation

This paper aims to address the lack of scene understanding and geometric awareness in existing camera-controlled video generation models, so that generated videos follow the specified camera trajectory more faithfully and remain geometrically consistent. In particular, it focuses on effectively integrating depth information to improve the accuracy of camera-controlled video generation.

#Review #Diffusion Models #Video Generation #Camera Control #Depth Estimation #Dual-Branch Architecture #Geometric Awareness #Semantic Alignment #Multi-modal Fusion

December 2, 2025

[Paper Review] Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model

This paper aims to tackle the long-standing ill-posed problem of recovering pixel-wise geometric properties from a single image.

#Review #Geometric Dense Prediction #Depth Estimation #Surface Normal Prediction #Diffusion Models #Rectified Flow #Generative Priors #Deterministic Inference #Two-Stage Framework

December 1, 2025

[Paper Review] Depth Anything 3: Recovering the Visual Space from Any Views

This paper aims to recover spatially consistent 3D geometry from arbitrary visual inputs, such as a single image, multiple views, or a video stream.

#Review #Depth Estimation #Multi-view Geometry #Transformer Architecture #Teacher-Student Learning #Pose Estimation #3D Reconstruction #Novel View Synthesis #Visual Space Recovery

November 13, 2025

[Paper Review] Discriminately Treating Motion Components Evolves Joint Depth and Ego-Motion Learning

This paper addresses the problem that existing unsupervised learning frameworks for depth estimation and ego-motion learning handle the motion components (rotation and translation) without clearly distinguishing them, which degrades reliability and robustness.

#Review #Self-supervised Learning #Depth Estimation #Ego-Motion Estimation #Motion Component Discrimination #Geometric Constraints #Optical Flow #PoseNet #DepthNet

November 9, 2025

[Paper Review] 3D Aware Region Prompted Vision Language Model

This paper proposes SR-3D, a 3D-aware Vision-Language Model (VLM) that connects single-view 2D images and multi-view 3D data through a shared visual token space, with the goal of providing flexible and accurate spatial reasoning in complex 3D scenes.

#Review #3D Vision #Vision-Language Models #Spatial Reasoning #Region Prompting #Multi-view Learning #Depth Estimation #Unified Representation #Generative AI

September 17, 2025

[Paper Review] From Editor to Dense Geometry Estimator

This paper aims to demonstrate that image editing models based on the Diffusion Transformer (DiT) are a better-suited foundation than conventional text-to-image (T2I) generative models for monocular dense geometry estimation (depth and normal). Building on this, it develops a new framework, FE2E, that achieves strong zero-shot performance even with limited training data.

#Review #Dense Geometry Estimation #Diffusion Transformer #Image Editing #Zero-shot Learning #Depth Estimation #Normal Estimation #Flow Matching #Logarithmic Quantization

September 5, 2025

[Paper Review] Multi-View 3D Point Tracking

This paper seeks to overcome the depth ambiguity and occlusion problems of existing monocular trackers, as well as the limitations of existing multi-camera methods that require 20 or more cameras and complex optimization.

#Review #3D Point Tracking #Multi-View #Transformer #kNN Correlation #Depth Estimation #Dynamic Scenes #Occlusion Handling #Feature Fusion

August 29, 2025

[Paper Review] G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration

This paper addresses the limitation that existing feed-forward 3D reconstruction models rely solely on RGB images and cannot exploit auxiliary data (depth maps, camera intrinsic and extrinsic parameters).

#Review #3D Reconstruction #Deep Learning #Multi-Modal Fusion #Camera Pose Estimation #Depth Estimation #Transformer Networks #Prior Information

August 19, 2025

[Paper Review] EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark

This paper addresses the problem that most existing egocentric vision benchmarks focus on daytime scenarios and overlook low-light nighttime conditions.

#Review #Egocentric Vision #Nighttime Conditions #Visual Question Answering (VQA) #Day-Night Alignment #Multimodal Large Language Models (MLLMs) #Depth Estimation #Correspondence Retrieval #Benchmark

October 8, 2025