#Dense Captioning

3 posts

[Paper Review] Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

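This paper addresses the problem that today's strongest video-language models (VLMs) are mostly proprietary, are built by distilling data from proprietary models, or do not disclose their training data and methodology.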

#Review #Vision-Language Models #Video Understanding #Grounding #Open Weights #Open Data #Multimodal AI #Object Tracking #Dense Captioning

January 15, 2026

[Paper Review] Factorized Learning for Temporally Grounded Video-Language Models

This paper aims to overcome the limitations existing video-language models (VLLMs) face in precise event-level temporal grounding and in generating text responses.

#Review #Video-Language Models #Temporal Grounding #Factorized Learning #Preference Optimization #Evidence Referencing #Video Understanding #Dense Captioning

December 31, 2025

[Paper Review] Dense Motion Captioning

This paper proposes Dense Motion Captioning (DMC), a new task that precisely localizes meaningful actions in time within 3D human motion sequences and generates a detailed caption for each of those actions.

#Review #3D Human Motion #Dense Captioning #Large Language Models #Motion Understanding #Temporal Localization #Human-Language Datasets #Motion Generation

November 9, 2025