feat: taint checking and security #197

ambrishrawat · 2025-10-14T19:21:17Z

This PR introduces a minimal proof-of-concept for taint and security propagation across CBlock, ModelOutputThunk, and session flows, as discussed in generative-computing/mellea#189.

mergify · 2025-10-14T19:21:52Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

ambrishrawat · 2025-10-17T14:15:05Z

@nrfulton quick clarifications -

What’s the best way to expose taint configuration to devs? E.g., when a description includes a user variable like summarise the following {{email_body}}, should taint be inferred automatically, or should it be something they can configure?
Would it make sense to have a global strictness setting to toggle between warnings and exceptions for taint violations? Is blocify the best place for this?
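For illustration only, one possible shape for such a strictness toggle is sketched below using the standard `warnings` module. Every name here (`Strictness`, `TaintViolation`, `TaintWarning`, `set_strictness`, `report_violation`) is hypothetical and not part of mellea or this PR; where the setting should actually live is exactly the open question above.

```python
import warnings
from enum import Enum


class Strictness(Enum):
    """Hypothetical global modes for handling taint violations."""
    WARN = "warn"
    RAISE = "raise"


class TaintViolation(Exception):
    """Raised for a taint violation when strictness is RAISE."""


class TaintWarning(UserWarning):
    """Emitted for a taint violation when strictness is WARN."""


# Assumed module-level default; a real design might attach this to a session.
_strictness = Strictness.WARN


def set_strictness(level: Strictness) -> None:
    """Switch between warning and raising for all later violations."""
    global _strictness
    _strictness = level


def report_violation(message: str) -> None:
    """Warn or raise, depending on the current global setting."""
    if _strictness is Strictness.RAISE:
        raise TaintViolation(message)
    warnings.warn(message, TaintWarning, stacklevel=2)
```

With this shape, callers that detect a violation only call `report_violation(...)`; whether that interrupts the program is decided in one place.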

nrfulton · 2025-10-22T19:55:22Z

What’s the best way to expose taint configuration to devs? E.g., when a description includes a user variable like summarise the following {{email_body}}, should taint be inferred automatically, or should it be something they can configure?

We should infer automatically wherever possible. In this case, I'm not sure how you would infer taint. I guess your assumption here is that email_body -- or any user_variable input -- should entail taint?

ambrishrawat · 2025-10-23T12:47:18Z

Yes, that was the thinking. Making it configurable may make more sense for taint. Any thoughts on the best way to expose that? Would love your take on the code too.

davidcox · 2025-10-23T13:24:53Z

If there is a tainted variable in the context, everything downstream should get tainted. As for how variables get tainted in the first place, a common way people do this is to define sources, sinks, and (optionally) washers. These are wrappers around interfaces that produce sensitive data (e.g. HR database api), or where it enters an unsafe place (e.g. sending to a UI).
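A minimal sketch of the source/sink/washer pattern described above, in plain Python. All names (`Value`, `source`, `washer`, `sink`, `combine`, `TaintError`, and the three example functions) are hypothetical and not part of mellea; the sketch only illustrates how taint could enter at a source, propagate downstream, and be blocked at a sink unless washed.

```python
from dataclasses import dataclass
from functools import wraps


class TaintError(Exception):
    """Raised when tainted data reaches a sink."""


@dataclass(frozen=True)
class Value:
    """A string payload plus a taint flag (hypothetical, not a mellea type)."""
    text: str
    tainted: bool = False


def source(fn):
    """Wrap a producer of sensitive data: its result is marked tainted."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        return Value(fn(*args, **kwargs), tainted=True)
    return wrapper


def washer(fn):
    """Wrap a sanitiser: its result is treated as untainted."""
    @wraps(fn)
    def wrapper(value: Value) -> Value:
        return Value(fn(value.text), tainted=False)
    return wrapper


def sink(fn):
    """Wrap an unsafe destination: refuse tainted input."""
    @wraps(fn)
    def wrapper(value: Value):
        if value.tainted:
            raise TaintError(f"tainted value passed to sink {fn.__name__!r}")
        return fn(value.text)
    return wrapper


def combine(*values: Value) -> Value:
    """Anything derived from a tainted value is itself tainted."""
    return Value(" ".join(v.text for v in values), any(v.tainted for v in values))


@source
def read_hr_record() -> str:
    return "salary: 100"


@washer
def redact(text: str) -> str:
    return text.replace("100", "[REDACTED]")


@sink
def send_to_ui(text: str) -> str:
    return f"UI <- {text}"


record = read_hr_record()
prompt = combine(Value("Summarise:"), record)  # tainted via propagation
shown = send_to_ui(redact(prompt))             # allowed only after washing
```

Calling `send_to_ui(prompt)` directly would raise `TaintError`, because the taint from `read_hr_record()` propagates through `combine`.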


nrfulton · 2025-11-21T01:18:11Z

docs/dev/taint_analysis.md

+component = CBlock("user input")
+component.mark_tainted()  # Sets SecLevel.tainted_by(component)


CBlocks should be immutable.

Naming a variable component and assigning it to a CBlock is confusing.

The cyclic reference here is a bit confusing and invites buggy code. Use tainted_by(None) instead of tainted_by(self) for the root node.

c = CBlock("user input", sec_level=SecLevel.tainted_by(None))

Updated this; tainted_by(None) for root now
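A sketch of how the three points in this thread could fit together: an immutable block, a security level supplied at construction time, and `tainted_by(None)` for a root source. The `SecLevel` and `CBlock` classes below are simplified, hypothetical stand-ins, not the PR's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SecLevel:
    """Hypothetical stand-in for the PR's SecLevel."""
    tainted: bool
    source: Optional[object] = None  # None marks a root taint source

    @classmethod
    def none(cls) -> "SecLevel":
        return cls(tainted=False)

    @classmethod
    def tainted_by(cls, source: Optional[object]) -> "SecLevel":
        return cls(tainted=True, source=source)

    def is_tainted(self) -> bool:
        return self.tainted


@dataclass(frozen=True)
class CBlock:
    """Hypothetical stand-in for mellea's CBlock; frozen, so no mark_tainted()."""
    value: str
    sec_level: SecLevel = SecLevel.none()


# Root node: tainted, but with no upstream source and no self-reference.
c = CBlock("user input", sec_level=SecLevel.tainted_by(None))

# Derived node: points back at the block that tainted it.
derived = CBlock("summary", sec_level=SecLevel.tainted_by(c))
```

Because both dataclasses are frozen, taint has to be decided when a block is created, which removes the need for a mutating `mark_tainted()` call.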

nrfulton · 2025-11-21T01:19:11Z

docs/dev/taint_analysis.md

+component = CBlock("user input")
+component.mark_tainted()  # Sets SecLevel.tainted_by(component)
+
+if component._meta["_security"].is_tainted():


Why not c.sec_level?

Defined it as a property and now this works as c.sec_level.is_tainted()
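A minimal sketch of exposing the security metadata through a read-only property rather than having callers index into `_meta`. The classes are simplified, hypothetical stand-ins; only the `_meta["_security"]` key and the `sec_level` / `is_tainted()` names come from the thread above.

```python
from typing import Optional


class SecLevel:
    """Hypothetical minimal security level."""

    def __init__(self, tainted: bool):
        self._tainted = tainted

    def is_tainted(self) -> bool:
        return self._tainted


class CBlock:
    """Hypothetical stand-in; the real CBlock has more fields."""

    def __init__(self, value: str, sec_level: Optional[SecLevel] = None):
        self.value = value
        self._meta = {"_security": sec_level or SecLevel(False)}

    @property
    def sec_level(self) -> SecLevel:
        # Read-only accessor so callers never touch _meta directly.
        return self._meta["_security"]


c = CBlock("user input", sec_level=SecLevel(True))
if c.sec_level.is_tainted():
    print("input is tainted")
```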

nrfulton · 2025-11-21T01:20:25Z

docs/examples/security/taint_example.py

+print(f"Original CBlock is tainted: {not tainted_desc.is_safe()}")
+
+# Create session
+session = MelleaSession(OllamaModelBackend("llama3.2"))


Unless the example critically depends on using a particular model, always use session = start_session() instead. This makes the examples easier to maintain.

nrfulton · 2025-11-21T01:21:07Z

docs/examples/security/taint_example.py

+
+# The result should be tainted
+print(f"Result is tainted: {not result.is_safe()}")
+if not result.is_safe():


We should use is_tainted instead of is_safe. The meaning of safe is very ambiguous.

Removed all instances of is_safe

nrfulton · 2025-11-21T01:22:19Z

mellea/security/core.py

+        Returns:
+            The CBlock or Component that tainted this content, or None
+        """
+        if self.level_type == "tainted_by":


Especially in a module called security.core, we should avoid use of magic strings.

Created SecLevelType enum
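As an illustration of replacing the magic string with an enum, the sketch below assumes two members, `NONE` and `TAINTED_BY`; the actual members of the PR's `SecLevelType` are not shown in this thread, and the method name `get_taint_source` is likewise hypothetical.

```python
from enum import Enum
from typing import Optional


class SecLevelType(Enum):
    """Assumed members; the PR's real enum may differ."""
    NONE = "none"
    TAINTED_BY = "tainted_by"


class SecLevel:
    """Hypothetical stand-in for the PR's SecLevel."""

    def __init__(self, level_type: SecLevelType, source: Optional[object] = None):
        self.level_type = level_type
        self.source = source

    def get_taint_source(self) -> Optional[object]:
        """Return whatever tainted this content, or None."""
        # Identity comparison against an enum member instead of a string.
        if self.level_type is SecLevelType.TAINTED_BY:
            return self.source
        return None
```

A misspelling such as `SecLevelType.TAINED_BY` now fails with an `AttributeError` at the point of use, whereas a misspelled string would compare unequal and go unnoticed.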

nrfulton · 2025-11-21T01:24:24Z

mellea/security/core.py

+            sources.append(action)
+
+    # For Components, check their constituent parts for taint
+    if hasattr(action, 'parts'):


Instead use something like:

match action: case Component... case CBlock...

(If type(action) <: Component, then the hasattr check is not necessary, because the Component protocol has a parts() method.)

Updated it to use match/case


ambrishrawat · 2025-11-25T13:27:04Z

Thanks for the review, @nrfulton!
I have incorporated your suggestions. I'd appreciate another pass when you get the chance.


guicho271828 · 2025-11-26T15:50:04Z

hi ambrish!

ambrishrawat marked this pull request as draft October 14, 2025 19:21

nrfulton self-requested a review October 15, 2025 16:54

ambrishrawat force-pushed the security_poc branch from 4798f64 to d1b09e1 Compare November 11, 2025 19:24

ambrishrawat marked this pull request as ready for review November 12, 2025 11:11

ambrishrawat added 9 commits November 14, 2025 15:34

version 2 of taint tracking

7d4e73e

Signed-off-by: Ambrish Rawat <[email protected]>

version 2 of taint tracking

4c9ab68

Signed-off-by: Ambrish Rawat <[email protected]>

taint tracking updates for ollama and litellm

daaf5ee

Signed-off-by: Ambrish Rawat <[email protected]>

docs taint analysis

2fd7bc0

Signed-off-by: Ambrish Rawat <[email protected]>

restored formatter to original

0efbee6

Signed-off-by: Ambrish Rawat <[email protected]>

removed redundant sanitise helper

5e1a266

Signed-off-by: Ambrish Rawat <[email protected]>

updated taint analysis dev docs

4ea932a

Signed-off-by: Ambrish Rawat <[email protected]>

updated taint analysis dev docs

b0a23a5

Signed-off-by: Ambrish Rawat <[email protected]>

updated taint analysis dev docs

5466c31

Signed-off-by: Ambrish Rawat <[email protected]>

ambrishrawat force-pushed the security_poc branch from d5f8033 to 5466c31 Compare November 14, 2025 15:35

ambrishrawat and others added 3 commits November 18, 2025 16:01

Merge branch 'generative-computing:main' into security_poc

5f85458

Merge branch 'generative-computing:main' into security_poc

0075de9

Merge branch 'main' into security_poc

7876e62

nrfulton requested changes Nov 21, 2025


ambrishrawat added 2 commits November 25, 2025 12:17

updates based on PR feedback

c303b87

Signed-off-by: Ambrish Rawat <[email protected]>

added tests for taint_sources of a Component

f85f953

Signed-off-by: Ambrish Rawat <[email protected]>

minor doc updates to remove the use of word safe

c1062e5

Signed-off-by: Ambrish Rawat <[email protected]>
