#Dense Retrieval

9 posts

[Paper Review] DREAM: Dense Retrieval Embeddings via Autoregressive Modeling

This paper aims to overcome a limitation of conventional contrastive-learning-based dense retriever training: it depends on costly labeled data and elaborate hard negative mining.

#Review #Dense Retrieval #Autoregressive Modeling #Next-Token Prediction #Attention Heads #Frozen LLM #Information Retrieval

June 23, 2026

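[Paper Review] Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval

This paper proposes Stratified Sampling, which keeps the distribution of the teacher model's scores uniform. Across the full score range, the method selects the documents closest to a set of predefined quantile anchors and uses them as training data, yielding a comprehensive sample that is not biased toward any particular score band.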
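A minimal sketch of the anchor-based selection described above, assuming the anchors are spaced evenly between the lowest and highest teacher score of one query's candidate list (the paper's exact anchor definition may differ). The function name `stratified_sample` and its parameters are hypothetical, chosen for illustration only.

```python
from typing import List, Sequence


def stratified_sample(scores: Sequence[float], k: int) -> List[int]:
    """Return indices of up to k documents whose teacher scores lie closest
    to k anchors spread evenly over [min(scores), max(scores)].

    Assumption: anchors are uniform over the observed score range of one
    query's candidates; this is an illustrative choice, not the paper's code.
    """
    n = len(scores)
    if k <= 0 or n == 0:
        return []
    k = min(k, n)  # cannot select more documents than exist
    lo, hi = min(scores), max(scores)
    if k == 1:
        anchors = [hi]  # with a single slot, keep the top-scored document
    else:
        anchors = [lo + (hi - lo) * i / (k - 1) for i in range(k)]

    chosen: List[int] = []
    taken = set()
    for anchor in anchors:
        # Nearest not-yet-selected document to this anchor, so every
        # anchor contributes a distinct document.
        best = min(
            (i for i in range(n) if i not in taken),
            key=lambda i: abs(scores[i] - anchor),
        )
        taken.add(best)
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    teacher_scores = [0.95, 0.9, 0.7, 0.5, 0.3, 0.1, 0.05]
    print(stratified_sample(teacher_scores, 3))
```

For the example scores, the anchors are roughly 0.05, 0.5 and 0.95, so the sketch picks one low-, one mid- and one high-scoring document (indices 6, 3, 0) instead of only the top-ranked ones. In a distillation setup, those indices would select the candidates whose teacher scores the student is trained to match.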

#Review #Knowledge Distillation #Dense Retrieval #Stratified Sampling #Score Distribution #Information Retrieval #Generalization

April 8, 2026

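[Paper Review] LaSER: Internalizing Explicit Reasoning into Latent Space for Dense Retrieval

This paper addresses the problem that LLM-based dense retrievers, despite their strong reasoning ability, cannot bring that reasoning to bear on complex queries without incurring high latency.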

#Review #Dense Retrieval #LLMs #Reasoning #Knowledge Distillation #Latent Space #Self-Distillation #Chain-of-Thought

March 2, 2026

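[Paper Review] Semantic Search over 9 Million Mathematical Theorems

This paper addresses the difficulty of finding specific mathematical theorems, lemmas, and propositions, given that existing search tools operate only at the level of whole papers. It aims to build a semantic search system over a large-scale corpus of mathematical theorems so that researchers and AI agents can locate specific mathematical knowledge efficiently.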

#Review #Semantic Search #Theorem Retrieval #LLMs #Dense Retrieval #Mathematical Information Retrieval #Vector Embeddings #Mathematical Dataset #RAG

February 5, 2026

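[Paper Review] TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval

This work starts from the observation that, although neural embedding-based information retrieval (IR) systems perform strongly with English-centric architectures, comparable progress is lacking for morphologically complex, low-resource languages such as Turkish.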

#Review #Information Retrieval #Turkish Language #Late-Interaction Models #ColBERT #Dense Retrieval #MUVERA #Benchmarking #Low-Resource NLP #Fine-tuning

November 20, 2025

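[Paper Review] BiCA: Effective Biomedical Dense Retrieval with Citation-Aware Hard Negatives

This work aims to improve information retrieval (IR) systems in both the biomedical and general domains. Its core task is to identify and exploit "hard negative" documents, which existing methods struggle with, so that dense retrieval models become more precise and learn subtle semantic distinctions.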

#Review #Dense Retrieval #Biomedical IR #Hard Negative Mining #Citation Networks #PubMed #Zero-shot Retrieval #Transformer Models

November 11, 2025

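[Paper Review] Trove: A Flexible Toolkit for Dense Retrieval

Trove aims to simplify dense retrieval research by providing a flexible, easy-to-use open-source toolkit for experiments, without sacrificing flexibility or speed. In particular, it targets limitations of existing toolkits in efficiently managing large datasets, supporting flexible modeling, and making distributed evaluation easy.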

#Review #Dense Retrieval #Retrieval Toolkit #Data Management #Distributed Training #Model Customization #Hard Negative Mining #Hugging Face Integration #Performance Optimization

November 9, 2025

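[Paper Review] MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction

The paper addresses a trade-off in existing multimodal retrieval methods: single-vector embeddings are limited in expressiveness, while multi-vector approaches scale poorly because of the computational cost of handling many tokens. Its main goal is a multimodal retrieval paradigm that scales while maintaining high accuracy, by allowing flexible control over embedding granularity at test time.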

#Review #Multimodal Retrieval #Late Interaction #Meta Tokens #Matryoshka Representation Learning #Test-Time Scaling #Vision-Language Models #Dense Retrieval #Efficiency

September 23, 2025

[Paper Review] SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension

This paper aims to overcome the limitations of existing embedding models in RAG (Retrieval-Augmented Generation) systems that operate over long documents.

#Review #Dense Retrieval #Context-Aware Embedding #RAG #Long Document Comprehension #Residual Learning #Semantic Association #Text Embedding

August 5, 2025