#Frame Selection

5 posts

[Paper Review] FrameSkip: Learning from Fewer but More Informative Frames in VLA Training

This paper addresses the "temporal supervision imbalance" problem in existing VLA model training, where every frame is used indiscriminately with equal weight.

#Review #Vision-Language-Action (VLA) #Robot Manipulation #Frame Selection #Temporal Supervision #Data Curation #Policy Learning #Embodied AI

May 13, 2026

[Paper Review] HiMu: Hierarchical Multimodal Frame Selection for Long Video Question Answering

Long-form video question answering (VideoQA) requires reasoning over extended temporal context, but the finite context windows of current <strong>Large Vision-Language Models (LVLMs)</strong> make it impossible to process an entire video at its native frame rate.

#Review #Video Question Answering #Frame Selection #Neuro-Symbolic Reasoning #Multimodal Understanding #Long Video

March 22, 2026

[Paper Review] HERBench: A Benchmark for Multi-Evidence Integration in Video Question Answering

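This paper addresses the problem that existing VideoQA benchmarks tend to rely on single cues or language priors, and therefore fail to properly evaluate the ability to integrate multiple pieces of evidence.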

#Review #Video Question Answering #Multi-evidence Integration #Video-LLMs #Benchmark #Temporal Reasoning #Frame Selection #Evidential Requirement #MRFS

December 21, 2025

[Paper Review] OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

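This paper aims to resolve the visual inconsistency and loss of coherence that arise because existing multi-shot video generation (MSV) models fail to effectively model the long-range cross-shot context that complex narratives require.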

#Review #Multi-Shot Video Generation #Adaptive Memory #Long-Range Context #Frame Selection #Diffusion Models #Image-to-Video #Autoregressive Generation #Narrative Coherence

December 9, 2025

[Paper Review] Find the Leak, Fix the Split: Cluster-Based Method to Prevent Leakage in Video-Derived Datasets

This paper aims to solve the information leakage problem that arises in video-derived datasets.

#Review #Data Leakage #Video Datasets #Clustering #Frame Selection #Deep Learning #Object Detection #Dataset Partitioning #Dimensionality Reduction

November 30, 2025