#LLM Self-Awareness

1 post

[Paper Review] Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits

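This paper addresses the problem that large language models (LLMs) cannot recognize on their own whether the text they generate is correct or erroneous. It aims to develop a lightweight mechanism that lets an LLM predict its own failures from its internal workings, without an external evaluator. This matters for improving the reliability, safety, and efficiency of LLMs.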

#Review #LLM Self-Awareness #Failure Prediction #Internal States #Attention Mechanisms #Neural Network Probes #Computational Efficiency #Zero-Shot Transfer

January 5, 2026