#Multimodal Speech Recognition

1개의 포스트

[논문리뷰] Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models

본 논문은 ASR, VSR, AVSR 태스크를 단일 프레임워크 내에서 지원하고 유연한 추론(elastic inference)이 가능한 통합된 오디오-비주얼 대규모 언어 모델(LLM) 을 개발하는 것을 목표로 합니다.

#Review #Multimodal Speech Recognition #Large Language Models #Audio-Visual Speech Recognition #LoRA #Matryoshka Representation Learning #Elastic Inference #Parameter-Efficient Adaptation

2025년 11월 10일