#Audio-Guided Perception

1 post

[Paper Review] OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding

This paper aims to address the limitations that existing omnimodal large language models (OmniLLMs) face in fine-grained cross-modal understanding and multimodal alignment.

#Review #Omnimodal Understanding #Audio-Guided Perception #Active Learning Agents #Cross-Modal Alignment #Tool-Use #Video Understanding #Multimodal LLMs

December 29, 2025