#Multimodal Foundation Model

6 posts

[Paper Review] Context Unrolling in Omni Models

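This paper proposes Context Unrolling, a framework that trains natively on diverse modalities and encourages the model to structure its own reasoning path. The model selectively activates task-relevant context and places it in a shared workspace, which operates in close coordination before and after the final prediction.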

#Review #Multimodal Foundation Model #Context Unrolling #Unified Architecture #Cross-modal Reasoning #Spatial Intelligence #Mixture-of-Experts

April 23, 2026

[Paper Review] LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model

This work aims to build a single dLLM-based framework, rather than separate architectures, for unified multimodal understanding and generation.

#Review #Multimodal Foundation Model #Diffusion Large Language Model #SigLIP-VQ #Unified Architecture #Block-wise Masked Diffusion

April 22, 2026

[Paper Review] Seedance 2.0: Advancing Video Generation for World Complexity

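This work aims to overcome the limitations of earlier video generation models, namely their focus on short clips and their limited control, and to deliver powerful, controllable video synthesis that can handle complex real-world scenarios.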

#Review #Video Generation #Multimodal Foundation Model #Audio-Video Joint Generation #Controllability #Generative AI #Real-world Complexity

April 15, 2026

[Paper Review] ERNIE 5.0 Technical Report

ERNIE 5.0 aims to develop a natively autoregressive foundation model for unified multimodal understanding and generation across text, image, video, and audio.

#Review #Multimodal Foundation Model #Autoregressive #Mixture-of-Experts #Elastic Training #Reinforcement Learning #Unified Architecture #Sparse MoE #Efficient Deployment

February 4, 2026

[Paper Review] SAM 3: Segment Anything with Concepts

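This paper aims to move beyond the limitation of the earlier SAM (Segment Anything Model), namely single-object segmentation (PVS), by detecting, segmenting, and tracking every object instance in images and videos based on concepts.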

#Review #Segment Anything Model #Open-Vocabulary Segmentation #Multimodal Foundation Model #Instance Segmentation #Video Object Tracking #Prompt Engineering #Data Engine #Human-in-the-loop

November 23, 2025

[Paper Review] Intern-S1: A Scientific Multimodal Foundation Model

This paper seeks to narrow the performance gap between open-source foundation models and closed-source models in scientific domains.

#Review #Multimodal Foundation Model #Scientific AI #Reinforcement Learning #Mixture-of-Experts (MoE) #Dynamic Tokenizer #Data Curation #Low-Resource Learning

August 22, 2025