#Uncertainty Quantification

11 posts

[Paper Review] Are LLM Decisions Faithful to Verbal Confidence?

The paper evaluates how faithful "verbal confidence", the way large language models (LLMs) express their own uncertainty, is to the model's actual reasoning, knowledge, or decision-making. Specifically, it tests whether LLMs strategically adjust their answer-or-abstain policy in response to varying error penalties.

#Review #Large Language Model #Uncertainty Quantification #Verbal Confidence #Abstention #Decision-Making #Risk-Sensitive AI #Utility Maximization

January 12, 2026

[Paper Review] QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation

The paper addresses the problem that the internal signals of large language models (LLMs), such as logits and entropy, are unreliable, often showing high confidence in incorrect predictions.

#Review #Dynamic RAG #Hallucination Detection #Corpus Statistics #Uncertainty Quantification #Pre-training Data #LLM Calibration #Infini-gram #Multi-hop QA

December 22, 2025

[Paper Review] CheXmask-U: Quantifying uncertainty in landmark-based anatomical segmentation for X-ray images

This paper studies uncertainty estimation in landmark-based anatomical segmentation, with the goal of safe clinical deployment of medical image segmentation systems. Unlike prior pixel-based uncertainty work, it addresses the gap in uncertainty estimation for landmark-based models, which provide inherent topological guarantees, and aims to identify unreliable predictions.

#Review #Uncertainty Quantification #Landmark Segmentation #Chest X-ray #VAE #Graph Neural Networks #Out-of-Distribution Detection #Medical Imaging

December 14, 2025

[Paper Review] World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty

This paper addresses two problems common to state-of-the-art controllable video models: hallucination and a limited ability to express uncertainty.

#Review #Controllable Video Generation #Uncertainty Quantification #Video Models #Calibration #Out-of-Distribution Detection #Proper Scoring Rules #Latent Space

December 7, 2025

[Paper Review] MIST: Mutual Information Via Supervised Training

This paper addresses the performance degradation that existing mutual information (MI) estimators suffer in settings with high dimensionality, limited samples, complex distributions, and high MI.

#Review #Mutual Information Estimation #Supervised Learning #Meta-Learning #Neural Networks #Uncertainty Quantification #SetTransformer #Quantile Regression

November 24, 2025

[Paper Review] Why Language Models Hallucinate

This paper statistically analyzes why large language models (LLMs) "hallucinate", that is, confidently generate plausible but incorrect information, and aims to identify the root causes that make the problem persist even in the latest models.

#Review #Language Models #Hallucination #Pretraining #Post-training #Evaluation Metrics #Binary Classification #Uncertainty Quantification #Calibration

September 8, 2025

[Paper Review] Beyond Human Judgment: A Bayesian Evaluation of LLMs' Moral Values Understanding

This study evaluates how large language models (LLMs) understand moral dimensions compared with humans. In particular, it departs from the conventional assumption of a deterministic ground truth and models annotator disagreement in a Bayesian manner, aiming to capture both the inherent uncertainty of human judgments and the domain sensitivity of the models.

#Review #Large Language Models #Moral Reasoning #Bayesian Evaluation #Uncertainty Quantification #Natural Language Processing #Soft Labels

August 20, 2025

[Paper Review] When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA

The paper targets hallucination detection, a key challenge for the safe and reliable deployment of large language models (LLMs).

#Review #Hallucination Detection #Multilingual LLMs #Span-Level Annotation #Synthetic Data Generation #Question Answering (QA) #Encoder Models #Uncertainty Quantification #GPT-4o

October 17, 2025

[Paper Review] What If: Understanding Motion Through Sparse Interactions

The paper aims to understand the dynamics of physical scenes, in particular by predicting the multimodal distribution of potential changes that can result from local interactions ("pokes").

#Review #Motion Understanding #Sparse Interactions #Multimodal Prediction #Flow Poke Transformer #Physical Scene Dynamics #Uncertainty Quantification #Generative Models #Computer Vision

October 15, 2025

[Paper Review] How Confident are Video Models? Empowering Video Models to Express their Uncertainty

The paper addresses the problem that video generation models cannot express uncertainty about their predictions when they generate inaccurate or non-factual (hallucinated) videos from text prompts.

#Review #Video Generation #Uncertainty Quantification #Aleatoric Uncertainty #Epistemic Uncertainty #Model Calibration #Text-to-Video #Generative AI #VMF Distribution

October 6, 2025

[Paper Review] ERGO: Entropy-guided Resetting for Generation Optimization in Multi-turn Language Models

The paper addresses the performance degradation of large language models (LLMs) in multi-turn conversations. In particular, it seeks to mitigate the drop in accuracy and reliability that occurs when information is provided incrementally and the LLM "loses" the conversational context.

#Review #Multi-turn Conversation #Large Language Models (LLMs) #Context Management #Entropy-guided Resetting #Uncertainty Quantification #Performance Degradation #Prompt Engineering #Conversational AI

October 20, 2025