[Paper Review] AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security. A detailed review of the paper 'AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security', published on arXiv. #Review #AI Agents #Safety Guardrails #Explainable AI (XAI) #Risk Taxonomy #Benchmarking #LLM Safety #Tool Use #Agent Alignment · January 27, 2026
[Paper Review] X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework. A detailed review of the paper 'X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework' by Shwetank Shekhar Singh, published on arXiv. #Review #Hate Speech Detection #Explainable AI (XAI) #Multilingual NLP #Large Language Models (LLMs) #Attention Mechanism #N-gram Explanations #Human Rationales #Benchmark Dataset · January 6, 2026
[Paper Review] REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance. A detailed review of the paper 'REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance' by Yaxin Fan, published on arXiv. #Review #Fact-Checking #Explainable AI (XAI) #Large Language Models (LLMs) #Self-Refinement #Latent Space #Disentanglement #Steering Vectors #Misinformation · December 4, 2025
[Paper Review] Fidelity-Aware Recommendation Explanations via Stochastic Path Integration. A detailed review of the paper 'Fidelity-Aware Recommendation Explanations via Stochastic Path Integration' by Oren Barkan, published on arXiv. #Review #Recommender Systems #Explainable AI (XAI) #Explanation Fidelity #Path Integration #Stochastic Sampling #Counterfactual Explanations #Model-Agnostic #Sparse Data · November 24, 2025
[Paper Review] Rethinking Saliency Maps: A Cognitive Human Aligned Taxonomy and Evaluation Framework for Explanations. A detailed review of the paper 'Rethinking Saliency Maps: A Cognitive Human Aligned Taxonomy and Evaluation Framework for Explanations' by Noam Koenigstein, published on arXiv. #Review #Saliency Maps #Explainable AI (XAI) #Taxonomy #Evaluation Framework #Faithfulness Metrics #Contrastive Explanations #Granularity · November 23, 2025
[Paper Review] Cross-Attention is Half Explanation in Speech-to-Text Models. A detailed review of the paper 'Cross-Attention is Half Explanation in Speech-to-Text Models' by Luisa Bentivogli, published on arXiv. #Review #Cross-attention #Speech-to-Text (S2T) #Explainable AI (XAI) #Saliency Maps #Feature Attribution #Transformer #Context Mixing #Correlation · September 23, 2025
[Paper Review] When Explainability Meets Privacy: An Investigation at the Intersection of Post-hoc Explainability and Differential Privacy in the Context of Natural Language Processing. A detailed review of the paper 'When Explainability Meets Privacy: An Investigation at the Intersection of Post-hoc Explainability and Differential Privacy in the Context of Natural Language Processing' by Gjergji Kasneci, published on arXiv. #Review #Natural Language Processing (NLP) #Explainable AI (XAI) #Post-hoc Explainability #Differential Privacy (DP) #Privacy-Utility Trade-off #Model Faithfulness #Text Privatization · August 15, 2025