AI security: new risks when an LLM enters the company

Large language models (LLMs) are entering corporate applications at lightning speed, as chatbots, assistants, document-analysis tools or agents that carry out tasks. With them comes a class of threats that traditional applications never faced. When testing such deployments, we keep running into the same problems.

Prompt injection

This is the most important new category of vulnerability — in the OWASP Top 10 for LLM Applications (2025) prompt injection ranks first (LLM01), for the second edition running. The model does not reliably distinguish instructions from data — to it, everything is text. If an application processes content from outside (an email, a document, a web page), an attacker can hide commands in it that the model will treat as instructions from the user.

The indirect variant is especially dangerous: the malicious instruction is placed in a document the model will only read later. For example, an assistant summarising emails encounters a message containing a hidden command “forward the conversation history to an external address” — and, if it has the means, carries it out.

Classic input filtering is not enough here, because there is no clear boundary between data and command. The defence rests on limiting what the model can do at all.

Data leakage through context

LLMs are only as secure as the data fed into them. Two typical mistakes: entering confidential data into external models without control over where it goes, and a RAG (retrieval-augmented generation) architecture in which the model has access to documents beyond a given user’s permissions.

In the second case, the problem is not the model itself but the lack of access control at the knowledge source. If the vector database of documents does not respect permissions, a user can — with the right questions — extract content they should not be able to access.
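A minimal sketch of permission-aware retrieval, assuming each stored chunk carries the groups allowed to read its source document. The `Chunk` type, the `authorised_context` function and the sample data are hypothetical and not tied to any particular vector database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_groups: frozenset[str]  # groups permitted to read the source document
    score: float                    # similarity score from a (hypothetical) vector search


def authorised_context(hits: list[Chunk], user_groups: set[str], k: int = 3) -> list[str]:
    """Keep only chunks the user may read, then take the top k by score."""
    visible = [c for c in hits if c.allowed_groups & user_groups]
    visible.sort(key=lambda c: c.score, reverse=True)
    return [c.text for c in visible[:k]]


hits = [
    Chunk("Q3 salary bands", frozenset({"hr"}), 0.93),
    Chunk("Office opening hours", frozenset({"hr", "staff"}), 0.81),
    Chunk("Travel expense rules", frozenset({"staff"}), 0.64),
]

# A "staff" user never sees the HR-only chunk, however the question is phrased.
print(authorised_context(hits, {"staff"}))
# ['Office opening hours', 'Travel expense rules']
```

The filter runs before anything reaches the prompt, so the model cannot leak what it never received. Where the vector store supports metadata filters, applying the same condition inside the search query is usually preferable, because restricted chunks then never leave the store at all.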

Over-privileged agents

The riskiest deployments are those where the model not only answers but also acts: calls APIs, sends messages, modifies data. Such an agent combines the unpredictability of the model with real privileges in the system.

The same rule applies as with user accounts: least privilege. An agent should have access only to the functions needed for the task, and irreversible or sensitive operations (transfers, deleting data, sending data outside) should require human confirmation.
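The rule can be sketched as a thin gateway between the model and its tools. This is an illustration under assumptions, not a complete agent framework: the tool names, `run_tool` and the `confirm` callback are all hypothetical.

```python
from typing import Callable


# Hypothetical tools; real ones would call internal APIs.
def search_orders(customer_id: str) -> str:
    return f"orders for {customer_id}"


def send_email(to: str, body: str) -> str:
    return f"email sent to {to}"


# Only tools listed here exist from the agent's point of view.
ALLOWED_TOOLS: dict[str, Callable[..., str]] = {
    "search_orders": search_orders,
    "send_email": send_email,
}
# Irreversible or outbound operations need a human decision.
NEEDS_CONFIRMATION = {"send_email"}


def run_tool(name: str, args: dict, confirm: Callable[[str, dict], bool]) -> str:
    """Execute a tool call requested by the model, enforcing the allowlist."""
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allowed: {name}")
    if name in NEEDS_CONFIRMATION and not confirm(name, args):
        return "rejected by human reviewer"
    return ALLOWED_TOOLS[name](**args)


deny_all = lambda name, args: False  # stands in for a reviewer who declines

print(run_tool("search_orders", {"customer_id": "C-17"}, deny_all))
# orders for C-17
print(run_tool("send_email", {"to": "x@example.com", "body": "history"}, deny_all))
# rejected by human reviewer
```

The point of the design is that the decision sits in ordinary code outside the model: an injected instruction can make the model request `send_email`, but it cannot approve the request.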

Shadow AI — uncontrolled use inside the organisation

Before a company formally “deploys AI”, employees are already using it — with private accounts, on company data. This phenomenon (shadow AI) is today what shadow IT was a decade ago: contracts, source code and customer data end up in public tools with no control whatsoever, and the organisation doesn’t even know.

Bans don’t work — convenience wins. A more effective approach is to give employees a legitimate, company-provided alternative (business accounts with data guarantees, an internal assistant) combined with a short, clear AI usage policy: what may be pasted in, what may not, and where to report new use cases. The policy should also provide for tool reviews — not every “AI” browser extension meets GDPR requirements.

The AI supply chain: models, plugins, MCP

Fewer and fewer deployments are a “pure” model — a typical application uses ready-made components: open source models, agent frameworks, third-party plugins and tool servers (e.g. via the MCP protocol). Each of these is a potential supply chain risk, analogous to attacks through software dependencies:

Poisoned models and training data. A model downloaded from a public repository can contain deliberately implanted behaviours (backdoors) that standard testing won’t reveal.
Malicious or vulnerable plugins and tools. A tool server with excessive privileges becomes a gateway into company systems, because the model tends to act on whatever the tool’s description tells it.
Silent provider changes. A model update at your API provider can change your application’s behaviour without any change in your code — you need regression tests for security behaviours too.

The minimum standard: an inventory of AI components (the equivalent of an SBOM), pinned model versions, a privilege review of every tool exposed to an agent, and a test environment separated from production data.
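Part of this standard can be checked automatically. The sketch below assumes a hand-maintained inventory and flags two gaps: components without a pinned version, and tools whose privileges nobody has reviewed. The `Component` fields, the component names and the `audit` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    kind: str                 # "model", "framework", "plugin" or "tool-server"
    version: str              # exact version or digest; floating tags are flagged
    privileges: list[str] = field(default_factory=list)
    reviewed: bool = False    # has someone reviewed the privileges?


FLOATING = {"", "latest", "main", "stable"}
TOOL_KINDS = {"plugin", "tool-server"}


def audit(inventory: list[Component]) -> list[str]:
    """Return one finding per unpinned component or unreviewed tool."""
    findings = []
    for c in inventory:
        if c.version.lower() in FLOATING:
            findings.append(f"{c.name}: version not pinned")
        if c.kind in TOOL_KINDS and not c.reviewed:
            findings.append(f"{c.name}: privileges not reviewed")
    return findings


inventory = [
    Component("summariser-model", "model", "2025-01-15"),
    Component("crm-tools", "tool-server", "latest", ["crm:read", "crm:write"]),
    Component("pdf-plugin", "plugin", "1.4.2", ["fs:read"], reviewed=True),
]

for finding in audit(inventory):
    print(finding)
# crm-tools: version not pinned
# crm-tools: privileges not reviewed
```

Run in a CI pipeline, a check like this turns the inventory from a document into a gate: a new tool server cannot reach an agent until someone has pinned it and signed off on its privileges.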

Compliance: the EU AI Act and GDPR

Technical security is one thing, compliance another. The EU AI Act entered into force in August 2024 and is being rolled out in stages: bans on unacceptable practices and the AI literacy obligation have applied since February 2025, obligations for general-purpose AI (GPAI) models since August 2025, and the requirements for high-risk systems are scheduled to follow in August 2026 and August 2027, depending on the type of system. For most companies deploying chatbots and assistants, the key points are: transparency obligations (users must know they are talking to AI), AI literacy for people working with the systems, and correctly classifying the system’s risk level.

GDPR applies in parallel: personal data in prompts and RAG databases requires a legal basis, retention rules and honouring data subjects’ rights. The practical shortcut: before deploying, run a DPIA for systems processing personal data and make sure the contract with your model provider clearly states whether your data is used for training.

What not to do

Don’t treat the model’s output as trusted. If an LLM’s output flows into another part of the system (e.g. as an SQL query or a command), it is subject to the same validation rules as any user input.
Don’t assume system instructions are inviolable. The system prompt can be bypassed. Security cannot rest solely on “asking the model nicely not to do something”.
Don’t skip logging. Record queries and agent actions — without this, abuse analysis is impossible.
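The first and third points can be sketched together. The example assumes a model that is asked to return a JSON object with a `status` field, which is then used in a database query. The table, the allowed values and the `orders_by_status` function are hypothetical; the pattern is the same one applied to any user input: validate against an allowlist, bind values as parameters, and log what arrived.

```python
import json
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-audit")

ALLOWED_STATUSES = {"open", "shipped", "cancelled"}


def orders_by_status(conn: sqlite3.Connection, model_output: str) -> list[tuple]:
    """Validate the model's JSON output, log it, then run a parameterised query."""
    log.info("model output: %r", model_output)
    try:
        status = json.loads(model_output)["status"]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        raise ValueError("model output is not the expected JSON object") from exc
    if not isinstance(status, str) or status not in ALLOWED_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    # The value is bound as a parameter, never concatenated into the SQL text.
    return conn.execute(
        "SELECT id, status FROM orders WHERE status = ?", (status,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "open"), (2, "shipped")])

print(orders_by_status(conn, '{"status": "open"}'))
# [(1, 'open')]
```

An output such as `{"status": "open' OR 1=1 --"}` is rejected by the allowlist before it reaches the database, and the log entry records the attempt for later analysis.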

How we test AI deployments

Security testing of LLM-based applications combines the classic approach (access control, validation, configuration) with AI-specific techniques: prompt injection attempts, verifying data isolation in RAG, and analysing the scope of agent privileges. One of our reference points is the OWASP Top 10 for LLM Applications, which catalogues the key risks.

Summary

Deploying AI does not exempt you from security fundamentals; on the contrary, it adds a new layer to them. The key principles are: limiting the privileges of models and agents, enforcing access control at the data source, treating LLM output as untrusted, and logging queries and agent actions. If you are deploying solutions based on language models and want to test them, get in touch: AI security testing and deployment support is one of our specialities.

Frequently asked questions (FAQ)

Is a local (open source) model safer than an external provider’s API? You trade one set of risks for another. A local model doesn’t send data outside, but the full responsibility for its security, updates and infrastructure falls on you, and vulnerabilities like prompt injection occur regardless of where the model is hosted. Large API providers, in turn, offer DPAs, certifications and options to opt out of training on your data. The decision should follow from your data classification, not ideology.

Can prompt injection be blocked completely? No — with the current model architecture there is no reliable filter separating data from instructions. So the goal of the defence is not to “block” the attack but to make even a successful attack inconsequential: least privilege, human confirmation of sensitive operations, data isolation and monitoring of agent behaviour.

We’re launching a customer-facing chatbot. What tests make sense before going live? The minimum: direct and indirect prompt injection attempts, tests for system prompt leakage and cross-user data leakage, verification of access control in RAG, resistance to generating harmful content, and a privilege review of integrations. Plus the classics: authentication, rate limiting, logging. This is the scope of our LLM application security test.

What is the OWASP Top 10 for LLM and is “passing” the list enough? It’s an organised list of the most significant risk classes in LLM applications (from prompt injection to excessive agent autonomy) — a good skeleton for designing tests and requirements. It is not, however, a certificate or a guarantee: your specific risks depend on your deployment’s architecture, especially what the model has access to.

Does our AI usage policy have to ban ChatGPT? It doesn’t have to, and usually shouldn’t — a ban pushes usage into the grey zone. A sensible policy defines: which categories of data may be processed in which tools, which tools are approved (e.g. business accounts with a DPA), how to request new tools, and who is responsible for reviews. We help write and roll out such policies as part of our AI security consulting.