#In-car Assistant

1 post

[Paper Review] CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty

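This paper aims to address the problem that existing LLM agent benchmarks focus only on task completion in idealized settings while overlooking reliability, consistency, and limit-awareness in real-world environments.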

#Review #LLM Agents #Benchmarks #Tool-use #Consistency #Uncertainty Handling #Hallucination #In-car Assistant #Policy Adherence

February 5, 2026