Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems
Tags AI · Security
An arXiv paper by Aaditya Pai demonstrates that domain-camouflaged injection attacks can evade standard LLM safety detectors by mimicking the vocabulary and authority structures of the target document. With camouflage, detection rates drop from 93.8% to 9.7% on Llama 3.1 8B and from 100% to 55.6% on Gemini 2.0 Flash. Multi-agent debate architectures amplify static injection attacks by up to 9.9x on smaller models. The paper introduces the Camouflage Detection Gap (CDG) metric and releases a public task bank and payload generator.
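To make the size of the drop concrete, the short sketch below works out the arithmetic on the detection rates quoted above. This summary does not give the paper's formal definition of CDG, so the function here is an assumption: it treats the gap as the plain difference, in percentage points, between the detection rate before and after camouflage. The name `detection_gap` is hypothetical and is not taken from the paper's released code.

```python
def detection_gap(static_rate: float, camouflaged_rate: float) -> float:
    """Absolute drop in detection rate, in percentage points.

    Assumed definition for illustration only; the paper's CDG
    metric may be defined differently.
    """
    return static_rate - camouflaged_rate


# Detection rates in percent, as quoted in the summary:
# (before camouflage, after camouflage).
reported = {
    "Llama 3.1 8B": (93.8, 9.7),
    "Gemini 2.0 Flash": (100.0, 55.6),
}

# Round to one decimal place to match the precision of the quoted figures.
gaps = {
    model: round(detection_gap(static, camouflaged), 1)
    for model, (static, camouflaged) in reported.items()
}

for model, gap in gaps.items():
    print(f"{model}: {gap} percentage points")
```

Under this assumed definition the drop is 84.1 percentage points for Llama 3.1 8B and 44.4 for Gemini 2.0 Flash, so the smaller model loses roughly twice as much detection coverage.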