#Matryoshka Representation Learning

9 posts

[Paper Review] PARCEL: Pool-Anchored Resampling with Conditioned Elastic Queries for Efficient Vision-Language Understanding

This paper aims to overcome the fundamental limits on representational power of existing elastic vision-token compression methods.

#Review #Vision-Language Models #Token Compression #Elastic Inference #Matryoshka Representation Learning #Pool-Conditioned Query Resampling #Efficient Multimodal Learning

June 1, 2026

[Paper Review] MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning

This paper addresses the fact that conventional LoRA depends on a fixed rank $R$, so finding the best-performing rank requires repeated grid search.

#Review #LoRA #Parameter-Efficient Fine-Tuning #Rank-Adaptive #Matryoshka Representation Learning #LLM #Hierarchical Low-Rank

May 10, 2026

[Paper Review] Matryoshka Gaussian Splatting

For practical deployment of 3D Gaussian Splatting (3DGS), a level-of-detail (LoD) capability that renders a scene at adjustable fidelity from a single model is essential.

#Review #3D Gaussian Splatting #Level of Detail (LoD) #Continuous LoD #Matryoshka Representation Learning #Stochastic Budget Training #Neural Rendering

March 19, 2026

[Paper Review] F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World

The recent shift from encoder-based architectures to decoder-based LLM embeddings has improved performance, but current research has two major limitations.

#Review #Multilingual Embedding #LLM #Matryoshka Representation Learning #Knowledge Distillation #Model Pruning #MTEB Benchmark #Low-resource Languages #Open-source

March 19, 2026

[Paper Review] Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking

This paper introduces the Qwen3-VL-Embedding and Qwen3-VL-Reranker model series, which unify data of various modalities, including text, images, document images, and video, to perform high-precision multimodal retrieval.

#Review #Multimodal Retrieval #Multimodal Ranking #Foundation Models #Embedding Models #Reranking Models #Contrastive Learning #Knowledge Distillation #Matryoshka Representation Learning #Quantization-Aware Training

January 11, 2026

[Paper Review] Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models

This paper aims to develop a unified audio-visual large language model (LLM) that supports the ASR, VSR, and AVSR tasks within a single framework and enables elastic inference.

#Review #Multimodal Speech Recognition #Large Language Models #Audio-Visual Speech Recognition #LoRA #Matryoshka Representation Learning #Elastic Inference #Parameter-Efficient Adaptation

November 10, 2025

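[Paper Review] MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction

This paper addresses two problems in existing multimodal retrieval methods: single-vector embeddings hit a limit on representational power, while multi-vector approaches are constrained in scalability by the computational cost of their many tokens. Its main goal is to develop a multimodal retrieval paradigm that scales while maintaining high accuracy, through flexible test-time control over embedding granularity.

#Review #Multimodal Retrieval #Late Interaction #Meta Tokens #Matryoshka Representation Learning #Test-Time Scaling #Vision-Language Models #Dense Retrieval #Efficiency

September 23, 2025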

[Paper Review] MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition

This paper addresses the high computational demands and fixed token compression rates of audio-visual speech recognition (AVSR) systems built on large language models (LLMs).

#Review #Audio-Visual Speech Recognition #Mixture of Experts #Matryoshka Representation Learning #Large Language Models #Elastic Inference #Token Compression #Multimodal AI

October 7, 2025

[Paper Review] Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation

This study addresses the problem that existing RAG systems focus only on unimodal text or limited multimodal settings, and therefore struggle with the mixed-modal queries and documents found in real-world settings.

#Review #Universal RAG #Multimodal Retrieval #Mixed-Modal Data Generation #Vision-Language Models #Contrastive Learning #Matryoshka Representation Learning

October 21, 2025