#Prompt Injection

11 posts

[Paper Review] Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

A detailed review of the paper 'Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw', published on arXiv.

#Review #Personal AI Agents #Persistent State #Security Analysis #CIK Taxonomy #Prompt Injection #Agent Safety #Evolution-Safety Tradeoff

April 6, 2026

[Paper Review] Agents of Chaos

A detailed review of the paper 'Agents of Chaos', published on arXiv by Koyena Pal.

#Review #AI Agents #Red-teaming #Agentic Systems #Multi-Agent Communication #Security Vulnerabilities #Prompt Injection #Social Engineering #Resource Management

February 23, 2026

[Paper Review] FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

A detailed review of the paper 'FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments', published on arXiv.

#Review #Financial AI Agents #Security Benchmark #Execution-Grounded #LLM Safety #Prompt Injection #Jailbreaking #Compliance #Vulnerability Assessment

January 21, 2026

[Paper Review] ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback

A detailed review of the paper 'ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback', published on arXiv by Shikun Zhang.

#Review #LLM Agents #Tool Use Safety #Guardrail #Step-level Safety Detection #Prompt Injection #Reinforcement Learning #Feedback Framework

January 15, 2026

[Paper Review] Soft Instruction De-escalation Defense

A detailed review of the paper 'Soft Instruction De-escalation Defense', published on arXiv.

#Review #Prompt Injection #LLM Security #Agentic Systems #Iterative Sanitization #Instruction Control #Adversarial Robustness #Large Language Models

October 27, 2025

[Paper Review] Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense

A detailed review of the paper 'Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense', published on arXiv.

#Review #Large Reasoning Models (LRMs) #Prompt Injection #Adversarial Attack #Reasoning Distraction #Chain-of-Thought #Robustness #Supervised Fine-Tuning (SFT) #Reinforcement Learning (RL)

October 21, 2025

[Paper Review] Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols

A detailed review of the paper 'Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols', published on arXiv by Maksym Andriushchenko.

#Review #AI Control Protocols #LLM Monitors #Adaptive Attacks #Prompt Injection #Jailbreaking #Red Teaming #Scalable Oversight

October 13, 2025

[Paper Review] Imperceptible Jailbreaking against Large Language Models

A detailed review of the paper 'Imperceptible Jailbreaking against Large Language Models', published on arXiv.

#Review #Large Language Models #Jailbreaking #Imperceptible Attacks #Unicode Variation Selectors #Adversarial Suffixes #Safety Alignment #Prompt Injection

October 7, 2025

[Paper Review] WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents

A detailed review of the paper 'WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents', published on arXiv by Neil Zhenqiang Gong.

#Review #Prompt Injection #Web Agents #Multimodal AI #Adversarial Attacks #Detection Benchmarking #Large Language Models #Image-based Detection #Text-based Detection

October 6, 2025

[Paper Review] FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents

A detailed review of the paper 'FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents', published on arXiv by Léo Boisvert.

#Review #Web Agents #LLM Context Pruning #Accessibility Tree #Prompt Injection #Retrieval Augmented Generation #Web Navigation #Agent Security #Efficient LLM

October 6, 2025

[Paper Review] aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists

A detailed review of the paper 'aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists', published on arXiv by Heng Zhang.

#Review #AI Agents #Open Access #Scientific Discovery #Peer Review #LLMs #Multi-agent Systems #Prompt Injection #Iterative Refinement

August 22, 2025