#Generative Pretraining

5 posts

[Paper Review] Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

A detailed review of the paper 'Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders', published on arXiv.

#Review #Vision Language Model (VLM) #LLM-based Vision Encoder #Efficient AI #Multimodal Understanding #Generative Pretraining #Resource-constrained Deployment #Temporal Reasoning

March 8, 2026

[Paper Review] Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations

A detailed review of the paper 'Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations', published on arXiv.

#Review #Autoregressive Model #Video Modeling #Generative Pretraining #Representation Learning #Flow-Matching Decoder #Context Isolation #Masked Next-Frame Prediction

December 24, 2025

[Paper Review] Next-Embedding Prediction Makes Strong Vision Learners

A detailed review of the paper 'Next-Embedding Prediction Makes Strong Vision Learners', published on arXiv.

#Review #Self-supervised Learning #Generative Pretraining #Vision Transformer #Next-Embedding Prediction #Autoregressive Model #Image Classification #Semantic Segmentation #Causal Masking

December 18, 2025

[Paper Review] Scaling Language-Centric Omnimodal Representation Learning

A detailed review of the paper 'Scaling Language-Centric Omnimodal Representation Learning', published on arXiv.

#Review #Multimodal Embeddings #MLLMs #Contrastive Learning #Cross-modal Alignment #Generative Pretraining #Representation Learning #Scaling Laws

October 15, 2025

[Paper Review] OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning

A detailed review of the paper 'OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning', posted on arXiv by Zirui Wang.

#Review #Multimodal Learning #Vision Encoder #Generative Pretraining #Captioning Loss #Training Efficiency #Image-Text Models #Large Language Models

September 3, 2025