The Threat
Prompt injection occurs when attackers manipulate model inputs to override system instructions, extract sensitive information, or force unintended behavior. It can happen directly through user input or indirectly through external data sources such as web pages, documents, or API responses. The attack works because LLMs treat instructions and data as the same type of input: an attacker can craft input that looks like data but acts as instructions, causing the model to ignore its original purpose.

How Oximy Detects Prompt Injection
Instruction Boundary Detection
Identifies attempts to close system prompts or inject new instructions using delimiter manipulation, role switching, or instruction keywords. Catches patterns like:

- “Ignore previous instructions”
- “New task:”
- “System: You are now…”
- Delimiter manipulation (---END USER INPUT---)
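The keyword and delimiter patterns above can be sketched as a simple regex-based detector. This is a minimal illustration, not Oximy’s actual rule set; the pattern list and function name are assumptions for the example:

```python
import re

# Illustrative patterns only -- a real detector uses a much larger, maintained rule set.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"^new task:", re.IGNORECASE | re.MULTILINE),
    re.compile(r"^system:\s*you are now", re.IGNORECASE | re.MULTILINE),
    re.compile(r"-{3,}\s*END USER INPUT\s*-{3,}", re.IGNORECASE),  # delimiter manipulation
]

def detect_instruction_boundary(text: str) -> list[str]:
    """Return the patterns that matched, or an empty list for clean input."""
    return [p.pattern for p in INJECTION_PATTERNS if p.search(text)]
```

Pattern matching alone is easy to evade with paraphrasing, which is why it is layered with the syntax and role checks described below.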
Role Confusion Detection
Detects attempts to impersonate system, developer, or admin roles to gain elevated privileges or access restricted functionality. Flags phrases that claim authority:

- “As the system administrator…”
- “Developer mode activated”
- “Override safety protocols”
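A complementary structural defense is to validate the role field of chat messages at the application boundary, so untrusted input can never introduce a system or developer turn no matter how it is worded. The helper below is a hypothetical sketch, not Oximy’s implementation:

```python
# Assumption for this sketch: end users may only ever contribute "user" turns.
ALLOWED_USER_ROLES = {"user"}

def validate_roles(messages: list[dict], source: str) -> None:
    """Reject any message from an untrusted source that claims a privileged role."""
    for msg in messages:
        if source == "user" and msg.get("role") not in ALLOWED_USER_ROLES:
            raise PermissionError(f"untrusted source claimed role {msg.get('role')!r}")
```

Enforcing roles structurally means a phrase like “Developer mode activated” remains plain data in a user turn rather than becoming a privileged instruction.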
Indirect Injection Scanning
Analyzes external content (web pages, documents, uploaded files) for hidden instructions before it reaches the model context. Scans for:

- Hidden text in HTML/CSS
- Encoded instructions in images
- Malicious content in PDFs or documents
- Instructions embedded in data fields
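As one concrete case, hidden text in HTML can be surfaced by walking the markup and collecting text inside elements styled to be invisible. This sketch uses Python’s standard-library parser and an assumed list of style hints; production scanners also render CSS and check contrast, element size, and off-screen positioning:

```python
from html.parser import HTMLParser

# Style fragments that commonly hide text from human readers but not from an LLM.
HIDDEN_HINTS = ("display:none", "visibility:hidden", "font-size:0", "opacity:0")

class HiddenTextScanner(HTMLParser):
    """Collect text inside elements styled to be invisible (assumes well-formed HTML)."""
    def __init__(self):
        super().__init__()
        self.hidden_depth = 0   # how deep we are inside a hidden subtree
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        style = attrs.get("style", "").replace(" ", "").lower()
        hidden = "hidden" in attrs or any(h in style for h in HIDDEN_HINTS)
        if self.hidden_depth or hidden:
            self.hidden_depth += 1  # entering (or nested inside) a hidden element

    def handle_endtag(self, tag):
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if self.hidden_depth and data.strip():
            self.findings.append(data.strip())

def scan_html(html: str) -> list[str]:
    scanner = HiddenTextScanner()
    scanner.feed(html)
    return scanner.findings
```

Any text this returns is a candidate hidden instruction and can be stripped or flagged before the page content is placed in the model context.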
Real-World Example
A customer service chatbot receives a message that tries to break out of its instructions. Oximy’s guardrails respond:

- Delimiter manipulation detected (---END USER INPUT---)
- Unauthorized role escalation flagged (admin mode)
- Suspicious action identified (List all customer emails)
- Request blocked before reaching the model
- Security team alerted to the attempt
Protection Techniques
Oximy defends against several injection styles:

- Direct Injection
- Indirect Injection
- Multi-Step Injection

Detection Methods:

- Pattern matching for injection keywords
- Instruction syntax analysis
- Role and permission validation
- Context boundary enforcement

Guardrail Response: Detects the instruction override attempt, blocks the request, and logs the violation.
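The detect-block-log flow can be sketched as a small pipeline that runs each named detector in turn and stops at the first hit. The types and names here are assumptions for illustration, not Oximy’s API:

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Verdict:
    blocked: bool
    reason: str = ""

def guard(prompt: str, detectors: Iterable[tuple[str, Callable[[str], bool]]]) -> Verdict:
    """Run each named detector; block on the first one that fires."""
    for name, fires in detectors:
        if fires(prompt):
            # A production system would also log the violation and alert the security team.
            return Verdict(blocked=True, reason=name)
    return Verdict(blocked=False)
```

Usage: register each detection method as a `(name, predicate)` pair, e.g. `guard(text, [("keyword_match", has_injection_keywords)])`, so blocked requests carry the reason they were stopped.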
Best Practices
- Use strict mode in production: Better to block edge cases than allow injections
- Sanitize external content: Always scan web pages, documents, and uploads
- Monitor false positives: Track blocked legitimate requests to tune sensitivity
- Layer defenses: Combine with output validation and least-privilege access
- Regular updates: Keep injection patterns current with emerging techniques
Related Vulnerabilities
Prompt injection defense also protects against:

- LLM01: Prompt Injection (OWASP Top 10)
- LLM06: Sensitive Information Disclosure (via injection)
- LLM08: Excessive Agency (via instruction override)