#Adversarial Attack

12 posts

[Paper Review] A Systematic Study of Cross-Modal Typographic Attacks on Audio-Visual Reasoning

This paper systematically analyzes the vulnerabilities that arise when audio-visual MLLMs are exposed to mismatched information across modalities.

#Review #Multi-modal Large Language Models #Audio Typography #Adversarial Attack #Cross-modal Robustness #Semantic Steering #Safety Application #Content Moderation

April 8, 2026

[Paper Review] When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models

This paper aims to expose and address the previously undetected safety risks that arise as large image editing models adopt a new paradigm in which visual prompts convey user intent.

#Review #Vision-Centric Jailbreak Attack #Image Editing Models #Safety Benchmark #IESBench #Multimodal Reasoning #Adversarial Attack #Defense Mechanism

February 11, 2026

[Paper Review] Less Is More -- Until It Breaks: Security Pitfalls of Vision Token Compression in Large Vision-Language Models

This paper presents the first systematic study of how visual token compression affects the robustness of large vision-language models (LVLMs) from a security perspective.

#Review #LVLM Security #Token Compression #Adversarial Attack #Robustness Degradation #Compression-Aware Attack #Efficiency-Security Trade-off #Black-box Attack

January 26, 2026

[Paper Review] GateBreaker: Gate-Guided Attacks on Mixture-of-Expert LLMs

This paper raises the concern that the distinctive safety properties and vulnerabilities of Mixture-of-Experts (MoE) LLMs remain understudied compared with those of conventional dense LLMs.

#Review #MoE LLM #Safety Alignment #Adversarial Attack #Neuron Pruning #Gate-level Profiling #Transfer Attack #Vision Language Model

December 30, 2025

[Paper Review] DeContext as Defense: Safe Image Editing in Diffusion Transformers

This paper seeks to address the serious privacy problems posed by image editing models built on large-scale Diffusion Transformers (DiT).

#Review #Diffusion Transformers #Image Editing #Privacy Protection #Adversarial Attack #Attention Mechanism #Identity Preservation #Deepfake Defense #In-context Learning

December 18, 2025

[Paper Review] In-Context Representation Hijacking

This paper introduces 'Doublespeak', a new form of jailbreak attack that bypasses safety guardrails by manipulating an LLM's internal representations.

#Review #LLM Jailbreak #In-Context Learning #Representation Hijacking #Mechanistic Interpretability #LLM Safety #Adversarial Attack #Semantic Shift

December 3, 2025

[Paper Review] Adversarial Confusion Attack: Disrupting Multimodal Large Language Models

This paper proposes the Adversarial Confusion Attack, a new type of adversarial attack that, unlike conventional misclassification or jailbreak attacks, causes systemic confusion by inducing multimodal large language models (MLLMs) to produce incoherent or confidently incorrect outputs.

#Review #Adversarial Attack #Multimodal Large Language Models (MLLMs) #Entropy Maximization #Confusion Attack #Black-box Transfer #PGD #AI Agent Safety

December 3, 2025

[Paper Review] Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models

This paper aims to systematically expose the safety vulnerabilities of Vision-Language Models (VLMs) equipped with a range of defense mechanisms, including RLHF (Reinforcement Learning from Human Feedback), system prompts, and input/output content filters.

#Review #Vision-Language Models (VLMs) #Adversarial Attack #Jailbreaking #Reward Hacking #Content Moderation Bypass #Cross-Model Transferability #Safety Vulnerabilities

November 23, 2025

[Paper Review] IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding

This study presents IAG, a new input-aware backdoor attack scenario and methodology targeting Vision-Language Models (VLMs) that perform visual grounding tasks.

#Review #Backdoor Attack #Vision-Language Models (VLMs) #Visual Grounding #Input-aware Trigger #Adversarial Attack #Security #U-Net #Open-vocabulary

August 14, 2025

[Paper Review] Adversarial Video Promotion Against Text-to-Video Retrieval

This paper explores adversarial video promotion attacks, an overlooked vulnerability of text-to-video retrieval (T2VR) models.

#Review #Adversarial Attack #Video Promotion #Text-to-Video Retrieval #Modality Refinement #Black-box Attack #Video Manipulation #Transferability

August 13, 2025

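[Paper Review] Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System

This study points out that state-of-the-art LLM-based agentic fact-checking systems are vulnerable to poisoning attacks that can spread misinformation or undermine the truth. Existing attack methods are ineffective against the claim decomposition and cross-verification mechanisms of these sophisticated systems.

#Review #Adversarial Attack #Poisoning Attack #Fact-checking #LLM Agent #Retrieval Augmented Generation #Misinformation #System Security

August 12, 2025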

[Paper Review] Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense

This paper aims to identify and systematically analyze 'Reasoning Distraction', a new vulnerability in large reasoning models (LRMs).

#Review #Large Reasoning Models (LRMs) #Prompt Injection #Adversarial Attack #Reasoning Distraction #Chain-of-Thought #Robustness #Supervised Fine-Tuning (SFT) #Reinforcement Learning (RL)

October 21, 2025