
FEAT: Sneaky Bits - Advanced Data Smuggling Techniques (ASCII Smuggler Updates) #822

Open
KutalVolkan opened this issue Mar 24, 2025 · 1 comment
KutalVolkan commented Mar 24, 2025

Is your feature request related to a problem? Please describe.

We currently support ASCII smuggling via Unicode Tags but do not support more flexible, byte-level invisible encoding. We want to increase our ability to simulate modern LLM smuggling and data exfiltration scenarios.

Describe the solution you'd like

Add a new sneaky_bits encoding mode to AsciiSmugglerConverter:

  • Uses only two invisible Unicode characters (U+2062 for 0, U+2064 for 1)
  • Encodes any UTF-8 input at the bit level
  • Supports decoding
  • Keeps unicode_tags as the default
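
A minimal sketch of the bit-level scheme described in the list above, as standalone functions rather than the actual `AsciiSmugglerConverter` code. The function names, the most-significant-bit-first ordering, and the handling of incomplete trailing bytes are assumptions made for illustration. Only the two code points and their bit values come from this issue.

```python
# Illustrative only: hypothetical helpers, not PyRIT's API.
ZERO = "\u2062"  # INVISIBLE TIMES, stands for bit 0
ONE = "\u2064"   # INVISIBLE PLUS, stands for bit 1


def sneaky_bits_encode(text: str) -> str:
    """Map each bit of the UTF-8 bytes of `text` to one invisible character."""
    out = []
    for byte in text.encode("utf-8"):
        # Assumption: bits are emitted most significant first.
        for shift in range(7, -1, -1):
            out.append(ONE if (byte >> shift) & 1 else ZERO)
    return "".join(out)


def sneaky_bits_decode(carrier: str) -> str:
    """Recover hidden text from `carrier`, skipping every other character."""
    bits = [1 if ch == ONE else 0 for ch in carrier if ch in (ZERO, ONE)]
    usable = len(bits) - len(bits) % 8  # drop an incomplete trailing byte
    data = bytearray()
    for i in range(0, usable, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        data.append(value)
    # Invalid UTF-8 is replaced rather than raised, so decoding never fails.
    return data.decode("utf-8", errors="replace")


hidden = sneaky_bits_encode("hi")
message = "Hello" + hidden + " world"  # the payload sits invisibly inside visible text
print(len(hidden))
print(sneaky_bits_decode(message))
```

Each input byte becomes eight invisible characters, so the payload is eight times as long as its UTF-8 byte length. Decoding filters on the two code points only, which lets the hidden payload be recovered from a string that also contains ordinary visible text.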

Describe alternatives you've considered, if relevant

We could explore Variant Selectors next, but let us start with Sneaky Bits.

Additional context

Based on Sneaky Bits. Enables advanced red teaming use cases like prompt injection, data leakage, and hidden triggers using only two invisible characters.

References

https://embracethered.com/blog/posts/2025/sneaky-bits-and-ascii-smuggler/

Note: Unless anyone thinks it's unnecessary or redundant, I'd like to go ahead and start implementing the converter. I actually already have the Sneaky Bits converter ready and just need to write the tests for it. :)

romanlutz commented Mar 24, 2025

Please go ahead!

The only thought I've had since @paulinek13's open draft PR #818 is that we should probably consider common utilities for certain functionality. For example: for a word- or token-level converter, a utility to randomly select which ones to replace; a utility to replace only within certain predefined boundaries (using delineating tags, which we already have in one converter); and for a character-level converter, again something to randomly select the characters. In a case like this one, you can still select subsections or convert all of it. This feels like functionality that shouldn't be reinvented every single time. Probably out of scope for this item (and also for @paulinek13's), though. I just finally managed to get my thoughts in order while reading this 😀
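
As a rough illustration of the shared-utility idea in this comment, one possible shape for a word-level random selection helper is sketched below. The function name, its parameters, and the whitespace-based word splitting are all hypothetical; nothing here reflects an existing PyRIT utility.

```python
import random
from typing import Callable, Optional


def convert_random_words(
    text: str,
    convert: Callable[[str], str],
    proportion: float = 0.5,
    rng: Optional[random.Random] = None,
) -> str:
    """Apply `convert` to a random subset of the space-separated words in `text`.

    Hypothetical helper: `proportion` is assumed to lie between 0 and 1.
    """
    rng = rng or random.Random()
    words = text.split(" ")
    # Skip empty strings produced by repeated spaces.
    candidates = [i for i, word in enumerate(words) if word]
    count = round(len(candidates) * proportion)
    chosen = set(rng.sample(candidates, count))
    return " ".join(
        convert(word) if i in chosen else word for i, word in enumerate(words)
    )


# Any per-word converter can be plugged in; str.upper keeps the effect visible.
print(convert_random_words("one two three four", str.upper, proportion=0.5))
```

Passing the converter as a callable keeps the selection logic independent of any single converter, which is the reuse the comment is asking for. A seeded `random.Random` can be supplied to make the selection reproducible in tests.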
