#Semantic Alignment

12 posts

[Paper Review] End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer

This paper addresses the problem that conventional two-stage training causes a misalignment between the tokenizer and the generative model, which limits final generation quality.

#Review #Autoregressive Image Generation #1D Vision Tokenizer #End-to-End Training #Semantic Alignment #Vision Foundation Models

May 3, 2026

[Paper Review] Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment

This paper points out that existing CLIR evaluation methods do not adequately measure model performance or potential language bias in real multilingual settings.

#Review #Cross-Lingual Information Retrieval #Semantic Alignment #Jensen-Shannon Divergence #InfoNCE #Multilingual Embedding Models #Language Bias

April 8, 2026

[Paper Review] HDINO: A Concise and Efficient Open-Vocabulary Detector

This paper addresses the excessive reliance of existing open-vocabulary object detection (OVD) models on manually curated training datasets and resource-intensive cross-modal feature extraction. It aims to remove these dependencies and develop a concise yet efficient open-vocabulary object detector.

#Review #Open-Vocabulary Object Detection #Transformer #DINO #CLIP #Semantic Alignment #Hard Example Mining #Feature Fusion #Two-stage Training

March 4, 2026

[Paper Review] Communication-Inspired Tokenization for Structured Image Representations

This paper addresses a limitation of existing image tokenizers: because they focus only on reconstruction and compression, they capture local texture rather than object-level semantic structure.

#Review #Image Tokenization #Structured Representation #Attentive Encoding #Flow Matching #Semantic Alignment #Compositional Generalization #Transformer Architecture

February 24, 2026

[Paper Review] Focal Guidance: Unlocking Controllability from Semantic-Weak Layers in Video Diffusion Models

This paper addresses the lack of text-prompt controllability in Diffusion Transformer (DiT)-based Image-to-Video (I2V) models.

#Review #Video Diffusion Models #Image-to-Video Generation #Diffusion Transformers (DiT) #Controllability #Semantic Alignment #Focal Guidance #Prompt Adherence

January 14, 2026

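[Paper Review] DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation

This paper aims to address the insufficient scene understanding and geometric awareness of existing camera-controlled video generation models, so that generated videos follow the specified camera trajectory more faithfully and remain geometrically consistent. In particular, it focuses on effectively integrating depth information to improve the accuracy of camera-controlled video generation.

#Review #Diffusion Models #Video Generation #Camera Control #Depth Estimation #Dual-Branch Architecture #Geometric Awareness #Semantic Alignment #Multi-modal Fusion

December 2, 2025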

[Paper Review] InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision

This paper addresses the limitations of noisy and limited video-text supervision, as well as the shortcomings of existing Masked Video Modeling (MVM), which either stays at low-level pixel reconstruction or induces shortcut learning.

#Review #Video Foundation Models #Self-Supervised Learning #Masked Video Modeling #Video-Text Supervision-Free #Encoder-Predictor-Decoder #Diffusion Decoder #Semantic Alignment #Latent World Model

December 1, 2025

[Paper Review] SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens

The paper aims to solve two key problems facing current implicit CoT methods: (1) performance degradation caused by insufficient semantic alignment between implicit reasoning and actual reasoning, and (2) the high computational cost of generating each implicit reasoning token.

#Review #Chain-of-Thought (CoT) #Implicit Reasoning #LLMs #Semantic Alignment #Efficiency Optimization #Knowledge Distillation

November 9, 2025

[Paper Review] LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer

This paper aims to perform universal image restoration (UIR) on real-world low-quality (LQ) images with unknown, mixed degradations while preserving semantic consistency and perceptual fidelity.

#Review #Universal Image Restoration #Diffusion Transformer #Caption-Free #Semantic Alignment #Image Quality Assessment #Data Curation #Real-World Degradations #Deep Learning

September 29, 2025

[Paper Review] 2D Gaussian Splatting with Semantic Alignment for Image Inpainting

This paper aims to overcome the limitations of the discrete pixel processing used by existing image inpainting methods. By exploiting the continuous nature of 2D Gaussian Splatting (2DGS), it seeks to develop a high-quality image inpainting framework with pixel-level coherence and global semantic consistency.

#Review #Image Inpainting #2D Gaussian Splatting #Semantic Alignment #DINO Features #Patch-level Rasterization #Continuous Representation #Generative Models

September 12, 2025

[Paper Review] InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

This paper aims to solve the inefficient exploration problem of semantic alignment in natural-language-instruction GUI grounding, a core task for GUI agents based on Multimodal Large Language Models (MLLMs).

#Review #GUI Grounding #MLLMs #Reinforcement Learning #Policy Optimization #Exploration Strategy #Semantic Alignment #Adaptive Exploration Reward #Human-Computer Interaction

August 11, 2025

[Paper Review] UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning

The paper aims to develop a universal multimodal embedding model with improved discriminative ability by addressing two limitations of existing multimodal embedding models: the limited diversity of hard negative samples and the insufficient ability to capture subtle semantic differences.

#Review #Multimodal Embeddings #MLLM-as-a-Judge #Hard Negative Mining #Semantic Alignment #Representation Learning #Reranking #Contrastive Learning

October 16, 2025