Beyond Prompt Injection: How AI Agents Are Reshaping the Cybersecurity Landscape

Marco Totolo
Data Scientist

The rise of AI agents represents a major shift in artificial intelligence, moving beyond simple chatbots to autonomous systems capable of independent reasoning, planning, and execution. As these systems become more advanced, they create new privacy and security challenges that require immediate attention from business leaders and technical professionals.

At ZDF Sparks, we're actively exploring secure agentic architectures to support editorial workflows while maintaining compliance and data integrity.

AI Agents and Their Security Implications

AI agents are fundamentally different from traditional AI systems because they can make autonomous decisions and maintain ongoing context across interactions. Unlike conventional AI tools that respond to individual prompts, agents can break down complex goals into smaller tasks, use external tools and APIs, and adapt their strategies based on real-time feedback.

The most prevalent architectural framework centers around five core components: a planning engine that breaks down goals, a reasoning module powered by large language models, a tool integration layer connecting to external systems, a memory system for persistent context, and an execution environment that monitors and adapts behavior. This architecture enables agents to operate independently with minimal human oversight, making them particularly valuable for complex workflows.

However, the same features that make agents valuable -- persistent access to sensitive data, autonomous decision-making, and deep integration with organizational systems -- also make them attractive targets for sophisticated attacks. These autonomous capabilities create unprecedented security vulnerabilities that traditional security measures struggle to address.

Application-Level Vulnerabilities: When Trusted Systems Turn Against Us

The EchoLeak Discovery: A turning point in AI security

The discovery of the EchoLeak vulnerability represents a significant development in AI security. This critical vulnerability marked the first known 'zero-click' attack on an AI agent, discovered by researchers at Aim Security and patched by Microsoft in their Copilot application in May 2025. The attack demonstrates how threat actors can manipulate AI agents without requiring any user interaction.

EchoLeak exploits what researchers call 'LLM Scope Violation' -- a fundamental flaw where AI agents mix trusted and untrusted data in their reasoning processes. An attacker can send a specially crafted email containing hidden prompt injection instructions that appear to be normal business correspondence. When a user later asks Copilot about topics referenced in the malicious email, the AI executes the attacker's hidden commands.

The attack works through several steps: An attacker sends an innocent-looking email referencing common business topics like employee onboarding or HR guides. The email contains hidden instructions designed to manipulate Copilot's behavior. When the victim asks Copilot about these topics, the AI retrieves the malicious email and processes the embedded commands. Copilot then sends out sensitive data including chat histories, OneDrive documents, SharePoint content, and Teams conversations through Microsoft's own trusted domains.

What makes EchoLeak particularly dangerous is its ability to bypass multiple security layers. The attack gets around cross-prompt injection attack (XPIA) classifiers by phrasing malicious instructions as if they're directed at humans rather than AI systems. It also bypasses image and link protection mechanisms and Content Security Policy protections.

Document-Based Attacks: The Same Vulnerability, Different Vector

Building on the EchoLeak vulnerability discovered in email agents, security researchers have identified similar attack patterns targeting AI agents with document access capabilities. These agents, which can connect to cloud storage services like Google Drive, OneDrive, or corporate document repositories, face the same fundamental LLM scope violation problems but through a different attack vector.

The document-based attack adapts the technique for document sharing environments. An attacker shares a document containing malicious prompt injection instructions disguised as legitimate business content. The document appears normal in preview mode, containing typical business information like meeting notes or project updates. When the victim asks the AI agent to summarize recent documents or search for information, the agent retrieves and processes the poisoned document. The agent then executes the embedded commands, searching for and exfiltrating sensitive data through covert channels like transparent images or hidden formatting.

Understanding the Common Application-Level Threat Pattern

These kinds of attacks reveal a critical pattern that extends beyond specific applications. Email agents create unique security risks through LLM scope violations where agents blend trusted and untrusted data in their reasoning processes, making it difficult to distinguish legitimate commands from malicious instructions. Cross-application data leakage becomes possible when agents with access to multiple systems accidentally share sensitive information across platforms. The autonomous nature of these systems means they can take actions like forwarding emails, scheduling meetings, or accessing documents without explicit user approval for each action.

This represents a fundamental shift from traditional application security. Compared to conventional email security that relies on perimeter-based filtering and rule-based detection, AI agents operate with context-aware analysis but are vulnerable to semantic attacks that exploit AI understanding rather than technical vulnerabilities. The integrated nature of these systems creates single points of failure that can impact entire organizational infrastructures.

AI agents that interact with everyday business applications face a new category of security threats that exploit their autonomous nature and trusted access to organizational data. These vulnerabilities share common patterns but manifest differently across various application types.

Infrastructure-Level Vulnerabilities

MCP Servers: The Critical Infrastructure Layer

The Model Context Protocol (MCP), launched by Anthropic in November 2024, has emerged as a standardized interface enabling AI agents to interact with external tools, databases, and services. MCP operates through a client-server architecture where AI applications communicate with external services via standardized MCP servers. When a user submits a prompt, the MCP client analyzes tool descriptions and routes calls to appropriate servers -- whether querying databases, calling APIs, or accessing local functions.

This architecture enables powerful workflows where a single AI conversation can seamlessly integrate multiple services, but it also introduces significant security risks that threaten the entire AI infrastructure ecosystem. The connection between application-level vulnerabilities and infrastructure vulnerabilities in MCP servers creates a compounding effect where compromised agents can leverage infrastructure weaknesses to expand their access and impact.

While application-level vulnerabilities demonstrate how individual AI agents can be compromised, the infrastructure that enables AI agent functionality faces its own set of critical security challenges. These infrastructure vulnerabilities amplify application-level risks and create system-wide exposure that can affect entire AI ecosystems.

Systematic Infrastructure Vulnerabilities

Comprehensive analysis of MCP servers has revealed systematic vulnerabilities across multiple critical attack vectors. OAuth discovery vulnerabilities represent one of the most severe attack classes, affecting a significant portion of analyzed servers. The widespread adoption of packages like mcp-remote transforms OAuth vulnerabilities into potential supply chain attacks affecting countless developer environments. Malicious servers can inject arbitrary commands through OAuth authorization endpoints, turning legitimate authentication flows into remote code execution vectors.

Command injection and code execution flaws are prevalent throughout the MCP ecosystem. These vulnerabilities allow complete system compromise through seemingly innocent AI tool interactions, enabling attackers to execute arbitrary system commands on host machines through inadequate input validation and unsafe command construction.

Unrestricted network access creates pathways for data theft and external communication, with many MCP servers allowing unrestricted URL fetches that create direct channels for stealing sensitive data and communicating with command-and-control infrastructure. File system exposure vulnerabilities enable unauthorized access to sensitive documents and system configurations, with widespread file leakage vulnerabilities that allow access to files outside intended directories.

New Attack Classes Unique to AI Infrastructure

Beyond traditional vulnerabilities, MCP introduces entirely new attack classes not seen in conventional software security. Tool poisoning attacks represent a fundamentally new threat where malicious MCP servers manipulate AI agents by providing false tool descriptions or poisoned responses. These attacks trick AI systems into performing unauthorized actions by exploiting the trust relationship between agents and their tools.

This infrastructure-level manipulation creates a direct pathway for the kinds of scope violation attacks seen in EchoLeak and document-based vulnerabilities. When infrastructure components themselves become untrustworthy, the boundary between trusted and untrusted data that AI agents rely on for security completely breaks down.

Understanding AI's Interconnected Risks

Across all examined AI agent categories, several common vulnerability patterns emerge that transcend specific implementations or platforms. Scope violation attacks represent the most significant threat, where agents fail to maintain proper boundaries between trusted internal operations and untrusted external data. This fundamental architectural flaw appears in email agents processing malicious messages, document agents accessing poisoned files, and MCP servers executing untrusted commands.

Persistent access exploitation represents another universal threat vector. Unlike traditional applications that operate within session-based security models, AI agents maintain continuous access to user credentials, system resources, and sensitive data. This persistent access becomes a liability when agents are compromised, as attackers gain long-term access to organizational systems without triggering traditional authentication alerts or session timeouts.

Context manipulation attacks exploit the AI agent's reliance on contextual understanding to make decisions. Attackers craft inputs that appear legitimate within normal business contexts but contain hidden instructions that change agent behavior. These attacks succeed because they exploit the semantic understanding capabilities that make AI agents valuable, turning their greatest strength into their most significant vulnerability.

The poisoning technique works by serving misleading tool descriptions that cause AI agents to misinterpret their capabilities or intended functions. For example, a malicious server might describe a 'file backup' tool that actually sends out data or a 'database query' tool that modifies records instead of reading them. Since AI agents rely on these descriptions to determine tool usage, poisoned descriptions can completely change agent behavior.

The Escalating Sophistication of Multi-Vector Attacks

The evolution from simple prompt injection to sophisticated multi-vector attacks demonstrates how quickly threat actors adapt to new technologies. Modern attacks combine multiple techniques to achieve persistent system access and large-scale data theft, leveraging both application vulnerabilities and infrastructure weaknesses in coordinated campaigns.

A critical emerging pattern is the convergence of traditional vulnerability exploitation with social engineering techniques specifically designed for AI systems. Attackers no longer rely solely on technical exploits or human deception in isolation. Instead, they create hybrid attacks that exploit both system vulnerabilities and the cognitive patterns of AI models, using persuasive language patterns and context manipulation to convince AI agents to bypass their safety guidelines.

Multi-stage attack chains now combine initial compromise vectors with persistence mechanisms and data theft techniques specifically designed for AI environments. An attacker might use document-based malicious prompts to trigger deeper system access, then leverage MCP vulnerabilities to establish persistent backdoors, and finally exploit agent trust relationships to move laterally across integrated systems.

Traditional security approaches prove inadequate for protecting against these converging threats. Network perimeter defenses cannot address vulnerabilities within trusted AI tools, while application security measures often fail to account for the unique attack vectors that exploit AI agent behavior rather than just technical flaws. The protocol design of systems like MCP prioritizes convenience over security, meaning that comprehensive protection requires architectural changes, not just patching individual vulnerabilities.

The rapid evolution of AI capabilities presents a fundamental challenge: as these models become more capable with each iteration, their potential attack surface expands in tandem. The very improvements that make AI agents more valuable -- enhanced reasoning, better contextual understanding, and more sophisticated decision-making -- also make them more attractive targets and potentially more dangerous when compromised.

The convergence of application-level and infrastructure-level threats creates a complex security landscape where compromise at any level can cascade across entire organizational systems. Organizations face mounting pressure to grant increasingly capable AI agents broader access to personal documents, financial systems, and autonomous decision-making authority to realize productivity gains. Yet this expanding scope of permissions fundamentally amplifies the consequences of security breaches.

Conclusion: Dealing With a New Security Paradigm

Success in this new paradigm requires organizations to move beyond traditional security thinking and embrace approaches that address the unique characteristics of autonomous AI systems. This means implementing security frameworks that account for the persistent, context-aware, and interconnected nature of AI agents while maintaining the operational benefits that make these systems valuable.

For organizations worldwide, the convergence of these new AI-specific threats creates a manageable but complex security landscape. Those that recognize the interconnected nature of application and infrastructure vulnerabilities, invest early in comprehensive security foundations, and maintain continuous adaptation to evolving threats will be best positioned to exploit the transformative potential of AI agents while protecting sensitive data and maintaining operational security. The organizations that thrive will be those that successfully balance the adoption of these powerful technologies with the implementation of robust security measures that address the unique risks of the autonomous AI frontier.

We're building secure, compliant AI systems at ZDF Sparks. Let's connect.

"Note: Some of the visuals in this blog post were created using AI technology."

AI with Purpose. Innovation with Integrity.
ZDF Sparks GmbH
Büro: Hausvogteiplatz 3-4, 10117 Berlin
Kontaktiere Uns: