#Long-form Video Understanding

3개의 포스트

[논문리뷰] LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs

본 연구는 장편 비디오 이해를 위해 Video LLMs를 확장할 때 발생하는 고질적인 계산 복잡도와 효율성 병목 문제를 해결하는 데 집중합니다.

#Review #Video LLMs #Vision Encoder #Token Compression #Compressed Token Distillation #Long-form Video Understanding #Spatio-temporal Modeling

2026년 5월 18일

[논문리뷰] ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries

본 논문은 기존 비디오 챕터링 방법론이 짧고 거친 주석에 의해 제한되어 장시간 비디오의 미묘한 전환에 대한 일반화가 어렵다는 문제를 해결하고자 합니다.

#Review #Video Chaptering #Long-form Video Understanding #Large Language Models #Multimodal Learning #Hierarchical Summarization #Video Segmentation #Reinforcement Learning #Dataset Creation

2025년 11월 19일

[논문리뷰] TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning

본 논문은 수만 개의 프레임에서 관련 정보를 식별해야 하는 긴 형식 비디오 이해 태스크에서, 기존의 수동으로 고안된 검색 전략이 최적의 검색 전략 학습을 위한 end-to-end 최적화가 부족하다는 문제를 해결합니다.

#Review #Long-form Video Understanding #Temporal Search #Reinforcement Learning #Self-Verification #Video-Language Models #Adaptive Search #Interleaved Reasoning

2025년 11월 11일