#Reconstruction

2개의 포스트

[논문리뷰] OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

본 논문은 이미지 이해(understanding)와 생성(generation) 모두에 활용될 수 있는 단일하고 통합된 시각적 표현을 학습하는 고급 비전 인코더인 OpenVision 3 를 제안합니다.

#Review #Unified Visual Encoder #Image Understanding #Image Generation #VAE #Vision Transformer #Multimodal Learning #Reconstruction #Contrastive Learning

2026년 1월 22일

[논문리뷰] AToken: A Unified Tokenizer for Vision

ATOKEN은 기존 시각 토크나이저들의 모달리티 및 태스크별 분절 문제를 해결하고, 이미지, 비디오, 3D 에셋 전반에서 고품질 재구성 및 심층적인 의미론적 이해를 동시에 달성하는 범용 시각 토크나이저를 개발하는 것을 목표로 합니다.

#Review #Unified Visual Tokenizer #Multimodal AI #Transformer Architecture #4D Representation #Adversarial-free Training #Reconstruction #Semantic Understanding #Generative Models

2025년 9월 19일