#Multimodal Pretraining

3개의 포스트

[논문리뷰] Kwai Keye-VL-2.0 Technical Report

본 연구는 대규모 다중 모달 데이터셋 환경에서 높은 추론 성능과 효율적인 정렬을 동시에 달성하기 위한 고성능 VLM 아키텍처 개발을 목표로 합니다.

#Review #Vision-Language Model #Multimodal Pretraining #Alignment #Instruction Tuning #Visual Encoder #LLM

2026년 6월 9일

[논문리뷰] Beyond Language Modeling: An Exploration of Multimodal Pretraining

본 논문은 기존 언어 모델링의 한계를 넘어, 비전 신호를 퍼스트 클래스 시민 으로 통합한 통합 멀티모달 사전 훈련(unified multimodal pretraining) 의 설계 공간을 탐색하고 경험적 명확성을 제공하는 것을 목표로 합니다.

#Review #Multimodal Pretraining #Vision-Language Models #Mixture-of-Experts (MoE)#Representation Autoencoders (RAE)#World Modeling #Scaling Laws #Diffusion Models #Unified Architectures

2026년 3월 3일

[논문리뷰] EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

본 연구는 기존 VLA 모델들이 가진 제한된 도메인 및 유연성 문제를 해결하고, 개방형 환경에서 인간 수준의 유연한 다중 모달 추론 및 물리적 상호작용 을 가능하게 하는 일반ist 로봇 제어를 목표로 합니다.

#Review #Embodied AI #Robot Control #Vision-Language-Action Models #Multimodal Pretraining #Flow Matching #Foundation Models #Generalization #Real-world Robotics

2025년 9월 1일