#Contrastive Learning

49 posts

[Paper Review] Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

This paper addresses a problem in Agentic Search settings: the rigidity of existing retrievers creates a bottleneck for overall system performance. Prior work mostly optimizes only the reasoning agent or treats the retriever as a fixed black box.

#Review #Agentic Search #Retrieval-Augmented Generation #Instruction-tuned Retriever #Inference-time Scaling #Contrastive Learning #Introspective Feedback

June 7, 2026

[Paper Review] MERIT: Learning Disentangled Music Representations for Audio Similarity

Existing music similarity models fuse several musical factors into a single monolithic score. This paper addresses the resulting limits on interpretability and fine-grained query control.

#Review #Music Representation Learning #Disentanglement #Audio Similarity #Representation Learning #Contrastive Learning #Self-Supervised Learning

June 2, 2026

[Paper Review] Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer

This paper tackles latency and accuracy problems in high-quality spatial audio generation models. Such models are meant to deliver immersive experiences in real-time interactive environments.

#Review #Spatial Audio Generation #Autoregressive Diffusion Transformer #Multimodal Learning #Streaming Generation #First-Order Ambisonics #Contrastive Learning #Direct Preference Optimization

May 31, 2026

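[Paper Review] Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring

This paper looks for an efficient way to detect, in real time, the execution failures that generalist VLA models encounter when deployed in the real world. Existing methods are hard to run in real time for two reasons:
- some require costly step-level failure annotations;
- others incur high computational overhead from action resampling or from calling external VLM models.

#Review #Vision-Language-Action (VLA) #Failure Detection #Coarsely Supervised Learning #Contrastive Learning #Conformal Prediction #Embodied AI

May 31, 2026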

[Paper Review] PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft

This paper addresses the fundamental limitations of the non-parametric memory that existing LLM-based embodied agents rely on.

#Review #Embodied Agent #Parametric Memory #Contrastive Learning #Mixture-of-Experts #Continual Learning #Minecraft

May 27, 2026

[Paper Review] Your Embedding Model is SMARTer Than You Think

A single-vector multimodal retriever compresses a rich, sequential token sequence into one global representation. This paper addresses the fundamental information bottleneck that this compression creates.

#Review #Multimodal Retrieval #Single-Vector Embeddings #Multi-Vector Embeddings #Late Interaction #Information Bottleneck #Hidden States #Contrastive Learning #Plug-and-Play

May 25, 2026

[Paper Review] CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization

This paper proposes CEPO to address the uneven credit assignment that existing policy optimization methods suffer from in RLVR settings. Methods such as GRPO assign the same reward to every token of a sequence. As a result, they cannot distinguish decisive reasoning steps from merely descriptive tokens.

#Review #RLVR #Credit Assignment #Self-Distillation #Contrastive Learning #Policy Optimization #Information Leakage

May 19, 2026
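The CEPO entry above says GRPO gives every token of a sequence the same reward. Below is a minimal Python sketch of the outcome-level advantage commonly described for GRPO:
- each sampled response's reward is normalized against its group;
- the resulting scalar is copied to every token of that response.

This illustrates the uniform credit assignment that CEPO is said to target; it is not CEPO itself. The function name `grpo_token_advantages`, the use of the population standard deviation, and the `eps` term are illustrative assumptions.

```python
import statistics


def grpo_token_advantages(rewards, lengths, eps=1e-6):
    """Sketch of GRPO-style outcome advantages (assumed formulation).

    rewards[i] is the scalar verifiable reward of the i-th response in a group;
    lengths[i] is that response's token count. Each response's reward is
    normalized by the group mean and (population) standard deviation, and the
    single resulting value is broadcast to all of its tokens.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    advantages = []
    for r, n in zip(rewards, lengths):
        a = (r - mean) / (std + eps)
        # Uniform credit: pivotal reasoning tokens and filler tokens get the same value.
        advantages.append([a] * n)
    return advantages
```

For example, take rewards `[1.0, 0.0, 0.0, 1.0]`. Both correct responses get an advantage of about `+1` on every token, and both incorrect ones about `-1`, regardless of which tokens actually mattered.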

[Paper Review] TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding

LLMs have succeeded in natural language processing, yet tabular data still lacks a unified representation paradigm. This paper addresses that gap.

#Review #Tabular Embedding #Contrastive Learning #Tabular Understanding #Foundation Models #Representation Learning #Tabular Retrieval

May 7, 2026

[Paper Review] Dual-View Training for Instruction-Following Information Retrieval

This paper addresses a weakness of existing instruction-aware retrievers. They do not respond appropriately when the instruction changes. Instead, they rely on surface-level query-document similarity and ignore specific constraints. Weller et al.

#Review #Instruction-Following #Information Retrieval #Dual-View Training #Polarity Reversal #Contrastive Learning

April 21, 2026

[Paper Review] MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping

This paper addresses two problems in existing style transfer methods:
- self-supervised training keeps them from separating style from content effectively;
- insufficient dataset quality and diversity limit their style transfer performance.

#Review #MegaStyle #Style Transfer #Data Curation #Diffusion Transformer #Contrastive Learning

April 9, 2026

[Paper Review] π-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs

This paper targets the problems that flow-based Vision-Language-Action (VLA) models face in online reinforcement learning (RL). Two are singled out:
- the likelihood is intractable under multi-step sampling;
- after fine-tuning, action diversity collapses, leaving the model vulnerable to even minor deviations.

#Review #Reinforcement Learning (RL) #Flow-based Models #Vision-Language-Action (VLA) Models #Online Learning #Stochastic Differential Equation (SDE) #Contrastive Learning #Embodied AI #Robotics

March 8, 2026

[Paper Review] SLER-IR: Spherical Layer-wise Expert Routing for All-in-One Image Restoration

All-in-One Image Restoration frameworks handle diverse image degradations with a single model. This paper addresses two of their limitations: feature interference and insufficient expert specialization.

#Review #Image Restoration #Mixture of Experts #Degradation Representation #Spherical Embedding #Contrastive Learning #Adaptive Routing #All-in-One Model #Global-Local Fusion

March 8, 2026

[Paper Review] DREAM: Where Visual Understanding Meets Text-to-Image Generation

This paper addresses a fundamental problem in multimodal learning: unifying visual understanding (discriminative) and text-to-image generation (generative) within a single model.

#Review #Multimodal Learning #Visual Representation Learning #Text-to-Image Generation #Masked Autoregressive Models #Contrastive Learning #Masking Warmup #Semantically Aligned Decoding

March 3, 2026

[Paper Review] InfoNCE Induces Gaussian Distribution

This paper asks a fundamental question: what distribution do representations learned with the InfoNCE loss actually follow? It aims to give a theoretical explanation for why these representations turn out to be Gaussian.

#Review #Contrastive Learning #InfoNCE Loss #Gaussian Distribution #Representation Learning #Self-Supervised Learning #Hyperspherical Uniformity #Thin-Shell Concentration

March 1, 2026
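For reference alongside the InfoNCE entry above, here is a minimal pure-Python sketch of the standard in-batch InfoNCE loss. It works in three steps:
- embeddings are L2-normalized onto the unit hypersphere;
- temperature-scaled cosine similarities serve as logits;
- the loss is the cross-entropy of picking the matching key.

The function names and the temperature default of 0.1 are illustrative assumptions, not details from the reviewed paper.

```python
import math


def l2_normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE loss with in-batch negatives.

    queries[i] and keys[i] form the positive pair; every other key in the
    batch acts as a negative for queries[i]. The temperature default is an
    assumption chosen for illustration.
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity to every key, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-softmax probability of the positive key.
        total += log_denom - logits[i]
    return total / len(q)
```

When every key is identical, the loss equals log N for a batch of N pairs. That is the value for an encoder that cannot tell positives from negatives.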

[Paper Review] MoBind: Motion Binding for Fine-Grained IMU-Video Pose Alignment

The goal is joint representation learning for fine-grained alignment between IMU signals and 2D pose sequences extracted from video.

#Review #Multi-modal Alignment #Contrastive Learning #IMU-Video Fusion #Pose Estimation #Temporal Synchronization #Human Motion Analysis #Hierarchical Learning

February 25, 2026

[Paper Review] CGPT: Cluster-Guided Partial Tables with LLM-Generated Supervision for Table Retrieval

This paper aims to improve table retrieval in two ways:
- It addresses the semantic compression and query-table mismatch that general-purpose embedding models suffer from in table retrieval.
- It overcomes the limitations of QGpT, an existing LLM-based retrieval augmentation method: heuristic partial-table selection and underuse of synthetic queries.

#Review #Table Retrieval #LLM Supervision #K-means Clustering #Partial Table #Contrastive Learning #Embedding Fine-tuning #Synthetic Query Generation

January 26, 2026

[Paper Review] OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

This paper proposes OpenVision 3, an advanced vision encoder that learns a single, unified visual representation for both image understanding and image generation.

#Review #Unified Visual Encoder #Image Understanding #Image Generation #VAE #Vision Transformer #Multimodal Learning #Reconstruction #Contrastive Learning

January 22, 2026

[Paper Review] Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking

This paper introduces the Qwen3-VL-Embedding and Qwen3-VL-Reranker model series. These models unify data across text, images, document images, and video to perform high-precision multimodal retrieval.

#Review #Multimodal Retrieval #Multimodal Ranking #Foundation Models #Embedding Models #Reranking Models #Contrastive Learning #Knowledge Distillation #Matryoshka Representation Learning #Quantization-Aware Training

January 11, 2026

[Paper Review] Parallel Latent Reasoning for Sequential Recommendation

The goal is to capture complex user preferences from sparse user behavior sequences in sequential recommendation systems.

#Review #Sequential Recommendation #Latent Reasoning #Parallel Processing #Computational Scaling #Mixture of Experts #Contrastive Learning #Transformer Architecture

January 6, 2026

[Paper Review] Towards Scalable Pre-training of Visual Tokenizers for Generation

This paper addresses the "pre-training scaling problem": the latent space of visual tokenizers (e.g., VAEs) is biased toward low-level information, so scaling pre-training does not translate into higher-quality generation.

#Review #Visual Tokenizers #Pre-training #Latent Diffusion Models #Generative Models #Vision Transformer #Contrastive Learning #Self-Supervised Learning #Scaling Laws

December 15, 2025

[Paper Review] Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment

In video understanding tasks, multimodal LLMs (MLLMs) produce descriptions with two kinds of hallucination: visual object hallucinations and temporal action hallucinations. This paper aims to mitigate both jointly.

#Review #Multimodal LLMs #Video Understanding #Hallucination Mitigation #Object Hallucination #Action Hallucination #Contrastive Learning #Self-Augmentation #Tracklet-Phrase Alignment

December 4, 2025

[Paper Review] Pillar-0: A New Frontier for Radiology Foundation Models

Surging imaging volumes and workforce shortages are straining healthcare systems. To ease this burden, the paper proposes Pillar-0, a new radiology foundation model designed to overcome the limitations of existing medical AI models.

#Review #Radiology Foundation Model #Volumetric Imaging #Multi-window Tokenization #Multi-scale Attention #Contrastive Learning #Clinical Evaluation #Data Efficiency #Medical Imaging

November 24, 2025

[Paper Review] Φeat: Physically-Grounded Feature Representation

Existing self-supervised visual backbones entangle high-level semantic features with low-level physical factors, which hinders physical reasoning. This work addresses that problem.

#Review #Self-supervised Learning #Physically-Grounded Features #Material Representation #Intrinsic Scene Understanding #Vision Transformer #Synthetic Data #Contrastive Learning

November 18, 2025

[Paper Review] Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework

This work aims to improve Extreme Multi-label Classification (XMC) performance in two ways: by making effective use of Large Language Models (LLMs) and by efficiently integrating visual information.

#Review #Extreme Multi-label Classification (XMC) #Large Language Models (LLMs) #Multi-modal Learning #Dual-decoder Learning #Vision Transformers #Contrastive Learning #Prompt Engineering

November 18, 2025

[Paper Review] Llama-Embed-Nemotron-8B: A Universal Text Embedding Model for Multilingual and Cross-Lingual Tasks

Existing embedding models are opaque about their training data and methodology. To address this, the paper develops llama-embed-nemotron-8b, a fully open-source universal text embedding model that achieves state-of-the-art performance on multilingual and cross-lingual tasks.

#Review #Text Embedding #Multilingual #Cross-Lingual #Contrastive Learning #Model Merging #Synthetic Data Generation #Instruction-Tuning #LLM

November 10, 2025

[Paper Review] Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation

This paper targets visual inconsistencies in subject-driven image generation models. It aims to detect and quantify them accurately and, beyond that, to localize the inconsistent regions spatially.

#Review #Subject-Driven Generation #Visual Inconsistency Detection #Feature Disentanglement #Diffusion Models #Semantic Correspondence #Evaluation Metric #Spatial Localization #Contrastive Learning

September 29, 2025

[Paper Review] Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation

Autoregressive (AR) models have succeeded in natural language processing but struggle to learn high-level visual semantics when used for image generation. This paper addresses that difficulty.

#Review #Autoregressive Models #Image Generation #Self-Supervised Learning #Visual Understanding #Masked Image Modeling #Contrastive Learning #Next-Token Prediction #LlamaGen

September 19, 2025

[Paper Review] Modality Alignment with Multi-scale Bilateral Attention for Multimodal Recommendation

This paper addresses two key limitations of existing multimodal recommendation systems:
- suboptimal fusion quality, because they cannot model fine-grained cross-modal associations;
- representation bias, caused by a lack of consistency at the global distribution level.

#Review #Multimodal Recommendation #Modality Alignment #Attention Mechanism #Dilated Convolution #Maximum Mean Discrepancy #Contrastive Learning #Dimensionality Reduction

September 12, 2025

[Paper Review] Focusing by Contrastive Attention: Enhancing VLMs' Visual Reasoning

This paper addresses the drop in reasoning performance of Vision-Language Models (VLMs) in visually complex environments.

#Review #Vision-Language Models (VLMs) #Visual Reasoning #Attention Mechanisms #Contrastive Learning #Noise Suppression #Visual Complexity #Training-Free

September 9, 2025

[Paper Review] NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware Embeddings

The paper aims to overcome two limitations of existing NER (Named Entity Recognition) systems: fixed type schemas and dependence on large amounts of labeled data.

#Review #Named Entity Retrieval #Zero-Shot Learning #Type-Aware Embeddings #Large Language Models (LLMs) #Contrastive Learning #Internal Representations #Information Retrieval

September 5, 2025

[Paper Review] Efficient Code Embeddings from Code Generation Models

This paper addresses two problems with existing code embedding models: supervised training data is scarce, and large-scale unaligned code and natural-language data go largely unused.

#Review #Code Embeddings #Code Generation Models #Autoregressive Backbones #Last-Token Pooling #Instruction Tuning #Contrastive Learning #Retrieval-Augmented Generation #MTEB Benchmark

September 1, 2025

[Paper Review] Selective Contrastive Learning for Weakly Supervised Affordance Grounding

In Weakly Supervised Affordance Grounding (WSAG), models tend to focus on generic class patterns rather than affordance-relevant parts. This paper aims to overcome that limitation.

#Review #Weakly Supervised Learning #Affordance Grounding #Contrastive Learning #CLIP #Part Discovery #Object Localization #DINO #Generative Models

August 25, 2025

[Paper Review] CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning

To improve LLM reasoning, this paper addresses two key limitations:
- standard SFT (Supervised Fine-Tuning) generalizes poorly;
- RL (Reinforcement Learning)-based methods sample reasoning paths unstably and make little use of annotated CoT (Chain-of-Thought).

#Review #LLM Reasoning #Contrastive Learning #Reinforcement Learning #Fine-tuning #Chain-of-Thought (CoT) #Annotated Data #Model Stability

August 25, 2025

[Paper Review] Refining Contrastive Learning and Homography Relations for Multi-Modal Recommendation

This paper aims to address data sparsity, a key problem in multimodal recommendation systems, and to overcome two limitations of existing contrastive learning methods.

#Review #Multi-modal Recommendation #Contrastive Learning #Graph Neural Network #Homography Relations #Meta-network #Orthogonal Constraint #Data Sparsity

August 21, 2025

[Paper Review] UNCAGE: Contrastive Attention Guidance for Masked Generative Transformers in Text-to-Image Generation

This paper targets the compositional fidelity problem in text-to-image (T2I) generation with Masked Generative Transformers (MGTs). In particular, it aims to reduce attribute binding errors.

#Review #Text-to-Image Generation #Masked Generative Transformers #Compositional Generation #Attention Guidance #Unmasking Strategy #Contrastive Learning #Training-Free #Attribute Binding

August 13, 2025

[Paper Review] Marco-Voice Technical Report

This paper aims to develop Marco-Voice, a multifunctional speech synthesis system that combines voice cloning and emotion control.

#Review #Speech Synthesis #Voice Cloning #Emotion Control #Text-to-Speech #Disentanglement #Contrastive Learning #Flow Matching #Emotional Speech Dataset

August 8, 2025

[Paper Review] CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search

Optimizing ANNS (Approximate Nearest Neighbor Search) algorithms is manual work that depends on expert knowledge, and the paper aims to change that. It proposes CRINN, a new paradigm that optimizes ANNS implementations automatically. CRINN augments an LLM with reinforcement learning and uses execution speed as the reward signal.

#Review #Approximate Nearest Neighbor Search #Reinforcement Learning #Large Language Models #Code Optimization #HNSW #Retrieval-Augmented Generation #Contrastive Learning

August 6, 2025

[Paper Review] Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance

Applying Mixture-of-Experts (MoE) to Diffusion Transformers (DiTs) has so far yielded only limited performance gains. This paper aims to address that problem.

#Review #Mixture-of-Experts (MoE) #Diffusion Transformers (DiTs) #Routing Guidance #Semantic Specialization #Contrastive Learning #Image Generation #Flow Matching

October 29, 2025

[Paper Review] E^2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker

To close the gap between efficient retrieval and effective listwise reranking, this paper proposes E²RANK. It is a unified framework that extends a single text embedding model to perform both tasks.

#Review #Text Embedding #Listwise Reranking #Information Retrieval #Pseudo Relevance Feedback #Contrastive Learning #Multi-task Learning #Efficiency #LLM-based Ranking

October 28, 2025

[Paper Review] WithAnyone: Towards Controllable and ID Consistent Image Generation

This paper has three goals for text-to-image generation models:
- keep the reference person's identity (ID) consistent;
- reduce "copy-paste" artifacts, where the output essentially copies the reference image;
- increase the diversity and controllability of expression, pose, lighting, and similar attributes in the generated images.

#Review #Identity-Consistent Generation #Text-to-Image Diffusion #Copy-Paste Artifacts #Contrastive Learning #Multi-Identity Dataset #Controllable Generation #ID-Preservation

October 17, 2025

[Paper Review] UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning

The goal is a universal multimodal embedding model with stronger discriminative ability. It addresses two limitations of existing multimodal embedding models: hard negative samples lack diversity, and the models struggle to capture subtle semantic differences.

#Review #Multimodal Embeddings #MLLM-as-a-Judge #Hard Negative Mining #Semantic Alignment #Representation Learning #Reranking #Contrastive Learning

October 16, 2025

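[Paper Review] FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model

Existing vision-language models (VLMs) are good at large-scale global alignment but have two gaps. They miss fine-grained details such as object attributes, spatial relations, and subtle linguistic expressions. They also lack multilingual support in non-English settings, especially Chinese. This paper aims to address both.

#Review #Vision-Language Alignment #Fine-grained Understanding #Bilingual Model #Contrastive Learning #Multimodal Retrieval #Open-Vocabulary Detection #Region-Text Matching

October 16, 2025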

[Paper Review] Scaling Language-Centric Omnimodal Representation Learning

This paper investigates the fundamental reasons why embedding models built on MLLMs (Multimodal Large Language Models) outperform traditional CLIP-style models.

#Review #Multimodal Embeddings #MLLMs #Contrastive Learning #Cross-modal Alignment #Generative Pretraining #Representation Learning #Scaling Laws

October 15, 2025

[Paper Review] SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model

Existing multimodal embedding models have three limitations: they support few modalities, their training mechanisms are unstable, and they fit industrial domains poorly. To address these, the paper proposes SAIL-Embedding, a foundation model that provides effective omni-modal embeddings across diverse real-world scenarios.

#Review #Omni-modal Embedding #Multimodal Learning #Recommendation Systems #Hard Negative Mining #Contrastive Learning #Large Language Models (LLMs) #Data Balancing #Multitask Learning

October 15, 2025

[Paper Review] GCPO: When Contrast Fails, Go Gold

Existing reinforcement learning methods, Group Relative Policy Optimization (GRPO) in particular, stay confined to the model's own reasoning limits and use samples inefficiently. This paper addresses that problem.

#Review #Reinforcement Learning #LLMs Reasoning #Policy Optimization #Contrastive Learning #Chain of Thought #Reference Answers #Math Reasoning #Gold-Standard Answer

October 10, 2025

[Paper Review] No Tokens Wasted: Leveraging Long Context in Biomedical Vision-Language Models

Existing VLMs (Vision-Language Models) have a short text context, typically 77 tokens, so long biomedical image captions lose tokens. This paper aims to fix that loss and to study how long-context captions affect model performance.

#Review #Biomedical Vision-Language Models #Long-context Modeling #Contrastive Learning #Token Efficiency #Zero-shot Classification #Medical Image Retrieval

October 8, 2025

[Paper Review] ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder

The goal is to overcome three limitations of the original CLIP text encoder: its 77-token length limit, English-only support, and weak fine-grained semantic understanding.

#Review #Vision-Language Models #CLIP #LLM-based Embedder #Knowledge Distillation #Contrastive Learning #Curriculum Learning #Multimodal Alignment #Progressive Alignment

October 22, 2025

[Paper Review] Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation

Existing RAG systems focus only on unimodal text or on limited multimodal settings. As a result, they struggle with the mixed-modal queries and documents found in real-world environments, and this work addresses that limitation.

#Review #Universal RAG #Multimodal Retrieval #Mixed-Modal Data Generation #Vision-Language Models #Contrastive Learning #Matryoshka Representation Learning

October 21, 2025

[Paper Review] OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

This work aims to build OmniVinci, a strong open-source omni-modal LLM that can perceive and reason about the world across multiple modalities, as humans do.

#Review #Omni-Modal LLM #Multimodal Understanding #Vision-Audio Alignment #Temporal Reasoning #Data Curation #Foundation Models #Contrastive Learning #Rotary Time Embedding

October 20, 2025