#Public Datasets

1 post

[Paper Review] MiDashengLM: Efficient Audio Understanding with General Audio Captions

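This paper aims to address three problems facing existing large audio-language models (LALMs): reliance on closed data, limited generalization and accessibility, and the inefficiency of pretraining based on automatic speech recognition (ASR).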

#Review #Audio-Language Model #General Audio Captions #Audio Understanding #Speech Recognition #Efficient Inference #Public Datasets #Multimodality #Data Curation

August 7, 2025