#Trustworthy AI

6 posts

[Paper Review] CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence

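This paper points out that current Doc-VQA evaluation of MLLMs relies too heavily on whether the final answer is correct, and therefore fails to verify the accuracy of the visual evidence that the reasoning actually rests on.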

#Review #Multimodal Large Language Models #Document Visual Question Answering #Evidence Attribution #Trustworthy AI #Strict Attributed Accuracy #Attribution Hallucination

May 17, 2026

[Paper Review] Towards Comprehensive Stage-wise Benchmarking of Large Language Models in Fact-Checking

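This paper addresses the problem that existing benchmarks focus only on claim verification and overlook the rest of the LLM fact-checking workflow, including claim extraction and evidence retrieval.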

#Review #Fact-Checking #Large Language Models (LLMs) #Benchmarking #Multi-agent System #Stage-wise Evaluation #Claim Evolution #Trustworthy AI

January 13, 2026

[Paper Review] The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents

To improve the reliability of autonomous agents built on large language models (LLMs), this paper analyzes and aims to mitigate problems with verbalized calibration (confidence stated in words) in tool-use settings.

#Review #LLM Agents #Calibration #Tool Use #Reinforcement Learning #Miscalibration #Overconfidence #Trustworthy AI

January 13, 2026

[Paper Review] A Decentralized Retrieval Augmented Generation System with Source Reliabilities Secured on Blockchain

This paper aims to address the high data-management costs and privacy concerns of existing centralized RAG (Retrieval Augmented Generation) systems.

#Review #Decentralized RAG #Blockchain #Smart Contracts #Source Reliability #Large Language Models #Retrieval Augmented Generation #Trustworthy AI

November 17, 2025

[Paper Review] ReFIne: A Framework for Trustworthy Large Reasoning Models with Reliability, Faithfulness, and Interpretability

This paper addresses the problem that existing Long Chain-of-Thought (CoT) reasoning models focus only on answer accuracy and token efficiency while overlooking trustworthiness.

#Review #Trustworthy AI #Large Reasoning Models (LRMs) #Interpretability #Faithfulness #Reliability #Chain-of-Thought (CoT) #Supervised Fine-tuning (SFT) #GRPO

October 15, 2025

[Paper Review] Annotation-Efficient Universal Honesty Alignment

This paper aims to achieve Honesty Alignment, in which a large language model (LLM) recognizes the boundaries of its knowledge and expresses calibrated confidence.

#Review #LLM Honesty Alignment #Confidence Calibration #Annotation Efficiency #Self-Consistency #Elicitation-Then-Calibration (EliCal) #HonestyBench #LoRA #Trustworthy AI

October 21, 2025