#Vision-to-Text Aggregation

1 post

[Paper Review] TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding

This paper addresses the difficulty existing MLLMs have in balancing efficiency and effectiveness when processing long video contexts.

#Review #Long Video Understanding #Hybrid Mamba-Transformer #Vision-Language Model #Token Compression #Vision-to-Text Aggregation #Efficient LLM #Multimodal AI

November 20, 2025