Thanks for your interest in contributing! This project is open to anyone who wants to help improve prompt injection detection.
Open an issue using the Bug Report template. Include:
- Steps to reproduce
- Expected vs actual behavior
- Python version and OS
If you've found a prompt injection technique that Prompt Guard doesn't catch:
- Open an issue with the Feature Request template
- Include example text that should be detected
- Suggest the severity level (critical/high/medium/low)
- Fork the repo
- Create a branch from
main:git checkout -b my-feature - Make your changes
- Test locally:
python prompt_guard.py ./test-directory - Open a PR with a clear description of what you changed and why
Patterns live in prompt_guard.py inside the PATTERNS dict. Each entry has:
(r"regex_pattern", "unique_name", "Human-readable description")Guidelines:
- Use
re.IGNORECASE(applied automatically) - Keep regexes readable; add comments if complex
- Avoid overly broad patterns that cause false positives
- Include patterns in both English and Spanish when applicable
- Place the pattern in the correct severity level
- Follow PEP 8
- Use type hints where possible
- Keep functions focused and well-documented
- Test your changes against repos with known benign content to check for false positives
| Level | Score | Use for |
|---|---|---|
| critical | 10 | Credential exfiltration, system prompt override, prompt reveal |
| high | 8 | Jailbreaks, impersonation, code execution |
| medium | 5 | Subtle manipulation, hidden HTML/scripts |
| low | 2 | Secrecy indicators, ambiguous patterns |
Open an issue with your question. All contributions and feedback are welcome.