#Representation Learning

28 posts

[Paper Review] Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models

This paper analyzes which pre-training paradigm, VLM or VGM, provides the better representation substrate for building Spatial Intelligence.

#Review #Spatial Intelligence #Vision-Language Models #Video Generation Models #Frozen-Feature Probing #Representation Learning #Semantic Tagging #3D Geometry Prediction

June 1, 2026

[Paper Review] LatentUMM: Dual Latent Alignment for Unified Multimodal Models

This paper proposes LatentUMM to resolve the representation mismatch between modalities that existing multimodal models suffer from. Existing approaches learn the features of different modalities in independent latent spaces, which leads to degraded performance and insufficient alignment on cross-modal tasks.

#Review #Multimodal Learning #Latent Alignment #Unified Models #Representation Learning #Cross-modal Representation

May 24, 2026

[Paper Review] The Unlearnability Phenomenon in RLVR for Language Models

This paper investigates the paradoxical phenomenon of unlearnability: why certain problems persistently fail to be learned during LLM training even though they receive rewards for correct answers.

#Review #Large Language Models #Reinforcement Learning #RLVR #Unlearnability #Gradient Outliers #Representation Learning

May 20, 2026

[Paper Review] Semantic Generative Tuning for Unified Multimodal Models

This paper aims to resolve the representational misalignment that arises because modern UMMs learn their two core tasks, understanding and generation, through separate optimization paths.

#Review #Unified Multimodal Models #Generative Tuning #Image Segmentation #Multimodal Alignment #Semantic Proxy #Representation Learning

May 19, 2026

[Paper Review] What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion

This paper raises the problem that tokenizers in existing Latent Diffusion Models (LDMs) are designed mainly around reconstruction fidelity and therefore fail to form a latent space well suited to training diffusion generative models.

#Review #Latent Diffusion Models #Tokenizer #Latent Manifold #Prior Alignment #Autoencoder #Generative Modeling #Representation Learning

May 10, 2026

[Paper Review] Anisotropic Modality Align

MLLM training faces a chronic shortage of high-quality paired multimodal data, and aligning unimodal data in a shared embedding space has drawn attention as a way to address it.

#Review #Multimodal Large Language Models #Modality Gap #Unpaired Alignment #Anisotropic Geometric Correction #Representation Learning

May 10, 2026

[Paper Review] TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding

This paper addresses the absence of a unified representation paradigm for tabular data, in contrast to the success of LLMs in natural language processing.

#Review #Tabular Embedding #Contrastive Learning #Tabular Understanding #Foundation Models #Representation Learning #Tabular Retrieval

May 7, 2026

[Paper Review] Scaling Test-Time Compute for Agentic Coding

This paper identifies how data is represented (Representation) and selected (Selection) as the key bottleneck for inference-time scaling of long-horizon coding agents.

#Review #Test-Time Compute #Agentic Coding #Representation Learning #Recursive Tournament Voting (RTV) #Parallel-Distill-Refine (PDR) #Long-Horizon Agents #Inference-Time Scaling

April 22, 2026

[Paper Review] Convergent Evolution: How Different Language Models Learn Similar Number Representations

This paper starts from prior observations that language models learn periodic representations of numbers from training on general text alone.

#Review #Language Models #Mechanistic Interpretability #Fourier Features #Convergent Evolution #Modular Arithmetic #Representation Learning

April 22, 2026

[Paper Review] SciLT: Long-Tailed Classification in Scientific Image Domains

This paper addresses the problem that fine-tuning foundation models is ineffective in scientific image domains, which, unlike natural image domains, exhibit domain shift and severe imbalance in data distribution.

#Review #Long-Tailed Recognition #Scientific Image Domain #Foundation Models #Parameter-Efficient Fine-Tuning #Feature Fusion #Domain Shift #Representation Learning

April 6, 2026

[Paper Review] Ghost-FWL: A Large-Scale Full-Waveform LiDAR Dataset for Ghost Detection and Removal

The recent increase in the sensitivity of mobile LiDAR systems has extended their ranging distance, but it has also increased the ghost points caused by glass and reflective surfaces, seriously degrading the accuracy of SLAM and 3D object recognition.

#Review #Full-Waveform LiDAR #Ghost Removal #Masked Autoencoder #Mobile LiDAR #Dataset #Representation Learning

March 31, 2026

[Paper Review] Utonia: Toward One Encoder for All Point Clouds

The core goal of this paper is to use a single encoder to process point clouds from diverse domains in a unified way, including remote sensing, outdoor LiDAR, indoor RGB-D sequences, object-centric CAD models, and video-lifted point clouds.

#Review #Point Clouds #Self-supervised Learning #Multi-domain Learning #Foundation Model #Point Transformer #Representation Learning #Robotics #Spatial Reasoning

March 3, 2026

[Paper Review] InfoNCE Induces Gaussian Distribution

This paper aims to answer the fundamental question of what distribution representations learned with the InfoNCE loss actually follow, and to provide a theoretical explanation of why these representations exhibit a Gaussian distribution.

#Review #Contrastive Learning #InfoNCE Loss #Gaussian Distribution #Representation Learning #Self-Supervised Learning #Hyperspherical Uniformity #Thin-Shell Concentration

March 1, 2026

[Paper Review] MAEB: Massive Audio Embedding Benchmark

This work addresses the fragmentation of evaluation protocols for audio embedding models, which makes it hard to compare models and track meaningful progress. To this end, it builds MAEB (Massive Audio Embedding Benchmark), a broad and unified evaluation framework, with the goal of fostering the development of general-purpose audio embedding models.

#Review #Audio Embedding #Benchmark #Multimodal #Zero-shot Classification #Clustering #Representation Learning #MTEB Ecosystem #Cross-modal Audio-Text #Multilingual Audio

February 18, 2026

[Paper Review] Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models

This paper addresses the Modality Gap, a geometric anomaly that arises in multimodal contrastive learning despite the alignment of visual and language representations.

#Review #Multimodal Large Language Models (MLLMs) #Modality Gap #Subspace Alignment #Unpaired Data #Representation Learning #Pretraining #Geometric Alignment

February 9, 2026

[Paper Review] KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs

The goal is to extract high-quality text embeddings efficiently by resolving two structural problems that arise when a decoder-only LLM is used as a training-free text embedding backbone: information asymmetry caused by causal attention, and a semantic compression bias caused by the next-token prediction objective.

#Review #Text Embedding #Decoder-only LLMs #Training-free #KV Re-routing #Causal Attention #Representation Learning #Intrinsic Dimensionality

January 5, 2026

[Paper Review] Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations

This work addresses two problems: existing visual generative pre-training methods overlook the essential temporal information in video, and autoregressive approaches suffer from semantic inaccuracy and low generation quality.

#Review #Autoregressive Model #Video Modeling #Generative Pretraining #Representation Learning #Flow-Matching Decoder #Context Isolation #Masked Next-Frame Prediction

December 24, 2025

[Paper Review] In Pursuit of Pixel Supervision for Visual Pre-training

This paper points out a limitation of existing self-supervised learning paradigms: they either rely on latent-space objectives or introduce bias through excessive human curation.

#Review #Pixel Supervision #Self-Supervised Learning #Masked Autoencoders (MAE) #Visual Pre-training #Foundation Models #Representation Learning #Web-Scale Data #Computer Vision

December 17, 2025

[Paper Review] DiG-Flow: Discrepancy-Guided Flow Matching for Robust VLA Models

This work addresses the performance degradation of Vision-Language-Action (VLA) models under distribution shift and on complex multi-step robot manipulation tasks. The degradation occurs because the learned representations do not robustly capture task-relevant semantics, and the paper aims to improve the robustness of VLA models through geometric regularization.

#Review #VLA Models #Flow Matching #Robotics #Robustness #Distribution Shift #Wasserstein Distance #Geometric Regularization #Representation Learning

December 2, 2025

[Paper Review] FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning

The paper points out that the homogeneous model architectures assumed by existing FL methods are unrealistic, and aims to achieve an effective balance among performance, privacy, and communication overhead in model-heterogeneous FL settings.

#Review #Federated Learning #Model Heterogeneity #Representation Learning #Privacy Preservation #Communication Efficiency #Entangled Representation #Knowledge Transfer

November 30, 2025

[Paper Review] Virtual Width Networks

This paper aims to obtain the benefits of wider representations while avoiding the quadratic compute cost incurred when increasing the hidden dimension of Transformer models.

#Review #Virtual Width Networks #Transformer #Mixture-of-Experts (MoE) #Scaling Laws #Representation Learning #Model Efficiency #Multi-Token Prediction #Hyper-Connections

November 16, 2025

[Paper Review] Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification

This paper aims to solve three core ML tasks, generative modeling, representation learning, and classification, under a single unified principle.

#Review #Generative Modeling #Representation Learning #Classification #Unified Framework #Latent Space #Flow Matching #Deep Learning #Image Generation

September 22, 2025

[Paper Review] UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning

The goal is to develop a universal multimodal embedding model with improved discriminative ability by addressing two limitations of existing multimodal embedding models: the limited diversity of hard negative samples and the limited ability to capture subtle semantic differences.

#Review #Multimodal Embeddings #MLLM-as-a-Judge #Hard Negative Mining #Semantic Alignment #Representation Learning #Reranking #Contrastive Learning

October 16, 2025

[Paper Review] Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model

This paper addresses the problem that VLA models pre-trained on 2D data lack the spatial awareness needed to act accurately in the 3D physical world.

#Review #Vision-Language-Action Models #Spatial Perception #Implicit Representation Alignment #3D Foundation Models #Robotics #Data Efficiency #Representation Learning

October 15, 2025

[Paper Review] Scaling Language-Centric Omnimodal Representation Learning

This paper explores the fundamental reasons why MLLM (Multimodal Large Language Model)-based embedding models outperform traditional CLIP-style models.

#Review #Multimodal Embeddings #MLLMs #Contrastive Learning #Cross-modal Alignment #Generative Pretraining #Representation Learning #Scaling Laws

October 15, 2025

[Paper Review] Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training

This study aims to close the performance and efficiency gap by addressing the problem that pixel-space generative models are harder to train and perform worse than latent-space models.

#Review #Pixel-space Generative Models #Diffusion Models #Consistency Models #Self-supervised Pre-training #End-to-end Training #Image Generation #FID #Representation Learning

October 15, 2025

[Paper Review] RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging

The goal is to address the Catastrophic Forgetting (CF) that large language models (LLMs) experience in continual learning and multi-domain settings.

#Review #Catastrophic Forgetting #Continual Learning #Model Merging #LLMs #Representation Learning #Data-free Learning #Hierarchical Parameter Fusion

October 27, 2025

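[Paper Review] Language Models are Injective and Hence Invertible

The paper challenges the conventional view that Transformer language models lose information because of nonlinear activation functions, normalization, and similar components, making it difficult to recover the input text exactly from the hidden representations.

#Review #Language Models #Injectivity #Invertibility #Transformer #Representation Learning #Exact Recovery #SIPIT Algorithm #Real Analysis

October 23, 2025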