#Guardrail

2 posts

[Paper Review] SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety

This paper aims to address the new security threats created by the powerful tool-use capabilities of LLM agents, as well as the limitations of existing defense techniques.

#Review #LLM Agent Safety #Memory Mechanism #Guardrail #Adversarial Generation #Information Entropy #Over-refusal Mitigation

May 13, 2026

[Paper Review] ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback

This paper aims to address the security risks that arise from the tool-invocation capabilities of LLM-based agents.

#Review #LLM Agents #Tool Use Safety #Guardrail #Step-level Safety Detection #Prompt Injection #Reinforcement Learning #Feedback Framework

January 15, 2026