feat: add pii masking capability to private ai integration #901

letmerecall · 2024-12-11T17:16:13Z

Description

This PR extends the Private AI integration with PII masking. Previously, the integration supported only PII detection, which blocks texts containing PII from being sent to the LLM or the end user. With masking, such texts can instead be sanitized and passed to the LLM or user with the PII replaced by placeholders.

For example, given the user input:

My email id is [email protected]

The PII masking configuration can transform the text into:

My email id is [EMAIL]

This enables safe data handling while preserving the context of the input.
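To make the placeholder idea concrete: Private AI performs the detection and replacement server-side, so the following is only a local illustration, not how the API or this PR implements masking. It assumes, hypothetically, that detected entities arrive as non-overlapping character spans with a label; the `Entity` type, `mask_text` function, and sample address are invented for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    """Hypothetical detected PII span: [start, end) character offsets plus a label."""
    start: int
    end: int
    label: str


def mask_text(text: str, entities: List[Entity]) -> str:
    """Replace each (non-overlapping) detected span with a [LABEL] placeholder.

    Spans are applied right-to-left so the offsets of earlier spans stay
    valid after each replacement changes the string length.
    """
    masked = text
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        masked = masked[: ent.start] + f"[{ent.label}]" + masked[ent.end :]
    return masked


text = "My email id is jane.doe@example.com"  # hypothetical sample input
print(mask_text(text, [Entity(start=15, end=len(text), label="EMAIL")]))
# My email id is [EMAIL]
```

Overlapping spans would need to be merged first; the sketch leaves that out.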

@Pouyanpi

Checklist

I've read the CONTRIBUTING guidelines.
I've updated the documentation if applicable.
I've added tests if applicable.
@mentions of the person or team responsible for reviewing proposed changes.

letmerecall · 2025-01-08T08:58:23Z

@Pouyanpi, I hope you had wonderful holidays. PTAL whenever you find some time.

Pouyanpi · 2025-01-09T10:10:43Z

docs/user-guides/guardrails-library.md

@@ -694,9 +694,9 @@ For more details, check out the [GCP Text Moderation](https://github.com/NVIDIA/

 ### Private AI PII Detection

-NeMo Guardrails supports using [Private AI API](https://docs.private-ai.com/?utm_medium=github&utm_campaign=nemo-guardrails) for PII detection in input, output and retrieval flows.
+NeMo Guardrails supports using [Private AI API](https://docs.private-ai.com/?utm_medium=github&utm_campaign=nemo-guardrails) for PII detection and masking the in input, output and retrieval flows.


Suggested change

NeMo Guardrails supports using [Private AI API](https://docs.private-ai.com/?utm_medium=github&utm_campaign=nemo-guardrails) for PII detection and masking the in input, output and retrieval flows.

NeMo Guardrails supports using [Private AI API](https://docs.private-ai.com/?utm_medium=github&utm_campaign=nemo-guardrails) for PII detection and masking input, output and retrieval flows.

Pouyanpi · 2025-01-09T10:18:56Z

nemoguardrails/library/privateai/actions.py

+    """Masks any detected PII in the provided text.
+
+    Args
+        source: The source for the text, i.e. "input", "output", "retrieval".
+        text: The text to check.
+        config: The rails configuration object.
+
+    Returns
+        The altered text with PII masked.
+    """


Suggested change

"""Masks any detected PII in the provided text.

Args
    source: The source for the text, i.e. "input", "output", "retrieval".
    text: The text to check.
    config: The rails configuration object.

Returns
    The altered text with PII masked.
"""

"""Masks any detected PII in the provided text.

Args:
    source (str): The source for the text, i.e. "input", "output", "retrieval".
    text (str): The text to check.
    config (RailsConfig): The rails configuration object.

Returns:
    str: The altered text with PII masked.
"""

Pouyanpi · 2025-01-09T10:24:41Z

nemoguardrails/library/privateai/request.py

@@ -25,14 +24,14 @@
 log = logging.getLogger(__name__)


-async def private_ai_detection_request(
+async def private_ai_request(
    text: str,
    enabled_entities: List[str],
    server_endpoint: str,
    api_key: Optional[str] = None,
 ):
    """


Suggested change

"""

"""Send a PII detection request to the Private AI API.
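Since the renamed `private_ai_request` now returns the raw JSON, the detection check that used to live in the request helper (`any(res["entities_present"] for res in result)`) moves to the caller, and masking reads `processed_text` from the same response. A minimal sketch of the two derivations, assuming the response is a list with one result dict per input text carrying those two keys (the function names here are hypothetical):

```python
from typing import Any, Dict, List

# Assumed response shape: one result dict per input text, carrying the two
# keys this PR reads ("entities_present" and "processed_text").
PrivateAIResponse = List[Dict[str, Any]]


def contains_pii(response: PrivateAIResponse) -> bool:
    """Detection: True if any result reports detected entities."""
    return any(res["entities_present"] for res in response)


def masked_text(response: PrivateAIResponse) -> str:
    """Masking: the processed text of the first (and only) input text."""
    return response[0]["processed_text"]


sample = [{"entities_present": True, "processed_text": "My email id is [EMAIL]"}]
print(contains_pii(sample))  # True
print(masked_text(sample))   # My email id is [EMAIL]
```

Keeping the request helper response-agnostic lets both actions share one HTTP call path.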

Pouyanpi · 2025-01-09T10:24:56Z

nemoguardrails/library/privateai/request.py

    text: str,
    enabled_entities: List[str],
    server_endpoint: str,
    api_key: Optional[str] = None,
 ):
    """
-    Send a detection request to the Private AI API.
+    Send a PII detection request to the Private AI API.


Suggested change

Send a PII detection request to the Private AI API.

Pouyanpi · 2025-01-09T10:28:03Z

nemoguardrails/library/privateai/request.py

-            result = await resp.json()
-
-            return any(res["entities_present"] for res in result)
+            return await resp.json()


Improve error handling for response parsing

This is just a suggestion; please decide for yourself, as I have not given it much thought:

try:
    response_json = await resp.json()
except aiohttp.ContentTypeError:
    raise ValueError(
        f"Failed to parse response as JSON. Status: {resp.status}, "
        f"Content: {await resp.text()}"
    )
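One detail worth weighing: aiohttp's `resp.json()` raises `aiohttp.ContentTypeError` only when the response's Content-Type is not `application/json`; a body that claims to be JSON but is malformed surfaces as `json.JSONDecodeError` instead. A sketch that reads the body as text and parses it with `json.loads` covers both cases. The helper name `parse_json_body` is hypothetical; `status` and `body` would come from `resp.status` and `await resp.text()`.

```python
import json
from typing import Any


def parse_json_body(status: int, body: str) -> Any:
    """Parse an HTTP response body as JSON.

    Hypothetical helper: in the aiohttp request code, `status` and `body`
    would come from `resp.status` and `await resp.text()`.

    Raises:
        ValueError: If the body is not valid JSON, with status and content attached.
    """
    try:
        return json.loads(body)
    except json.JSONDecodeError as e:
        raise ValueError(
            f"Failed to parse response as JSON. Status: {status}, Content: {body!r}"
        ) from e
```

Chaining with `from e` keeps the original decode error (line and column) in the traceback.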

Pouyanpi · 2025-01-09T10:29:52Z

nemoguardrails/library/privateai/actions.py

+        server_endpoint,
+        pai_api_key,
+    )
+    return private_ai_response[0]["processed_text"]


Ensure that private_ai_response[0]["processed_text"] is always valid, to avoid potential IndexError or KeyError exceptions.

Improve error handling

Update docstring with Raises (see ref)
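One way to address these points: validate the response shape before indexing and document the failure mode in a `Raises` section. A sketch, assuming `ValueError` is an acceptable error type for malformed responses; `extract_processed_text` is a hypothetical helper name, not part of the PR.

```python
from typing import Any


def extract_processed_text(private_ai_response: Any) -> str:
    """Return the masked text from a Private AI response.

    Args:
        private_ai_response (Any): Parsed JSON response, expected to be a
            non-empty list whose first element has a "processed_text" string.

    Returns:
        str: The text with PII masked.

    Raises:
        ValueError: If the response does not have the expected shape.
    """
    if not isinstance(private_ai_response, list) or not private_ai_response:
        raise ValueError("Private AI response is empty or not a list.")
    first = private_ai_response[0]
    if not isinstance(first, dict) or not isinstance(first.get("processed_text"), str):
        raise ValueError("Private AI response is missing a 'processed_text' string.")
    return first["processed_text"]
```

The action could then end with `return extract_processed_text(private_ai_response)`.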

Pouyanpi

Thank you @letmerecall for the PR, please review my comments.

Current tests for PII detection and masking use mocked responses. While this is great for unit testing, it doesn't really test the actual integration with the Private AI API. I think we should consider adding some integration tests using temporary credentials to make sure everything works end-to-end. What do you think, @letmerecall? I am not quite sure whether the mocked requests and responses accurately reflect the Private AI API. You can have a look at these sensitive content detection tests or their refinement in #845.

Also, let's make sure our tests cover error scenarios, such as a missing or invalid API key. This will help us catch potential issues early.
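A sketch of what such error-scenario tests could look like, using only the standard library's `unittest.mock.AsyncMock` to stand in for the HTTP call. The `mask_pii` function below is a hypothetical stand-in, not the PR's action, and the rule that the hosted endpoint (`api.private-ai.com`) requires a key while a self-hosted endpoint does not is an assumption for illustration, as are the endpoint paths.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Optional
from unittest.mock import AsyncMock
from urllib.parse import urlparse

# Hypothetical request signature, mirroring private_ai_request(text,
# enabled_entities, server_endpoint, api_key) from this PR.
RequestFn = Callable[[str, List[str], str, Optional[str]], Awaitable[Any]]

CLOUD_HOST = "api.private-ai.com"  # assumption: the hosted API needs a key


async def mask_pii(
    text: str, request_fn: RequestFn, server_endpoint: str, api_key: Optional[str]
) -> str:
    """Hypothetical masking action used only to illustrate the tests below."""
    if urlparse(server_endpoint).hostname == CLOUD_HOST and not api_key:
        raise ValueError("An API key is required for the hosted Private AI API.")
    response = await request_fn(text, [], server_endpoint, api_key)
    return response[0]["processed_text"]


def test_missing_api_key_fails_before_any_request() -> None:
    request_fn = AsyncMock()
    try:
        asyncio.run(mask_pii("hi", request_fn, f"https://{CLOUD_HOST}/process", None))
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a missing API key")
    request_fn.assert_not_called()


def test_invalid_api_key_error_propagates() -> None:
    # Simulate the request helper rejecting a bad key.
    request_fn = AsyncMock(side_effect=ValueError("401 Unauthorized"))
    try:
        asyncio.run(mask_pii("hi", request_fn, f"https://{CLOUD_HOST}/process", "bad"))
    except ValueError as e:
        assert "401" in str(e)
    else:
        raise AssertionError("expected the request error to propagate")
    request_fn.assert_called_once()


def test_self_hosted_endpoint_needs_no_key() -> None:
    request_fn = AsyncMock(return_value=[{"processed_text": "My email id is [EMAIL]"}])
    result = asyncio.run(mask_pii("x", request_fn, "http://localhost:8080/process", None))
    assert result == "My email id is [EMAIL]"
```

Because they are plain `test_*` functions with bare asserts, pytest would collect them as-is; integration tests against a real endpoint would still be needed to confirm the mocks match the API.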

letmerecall · 2025-01-09T20:01:28Z

Testing with a temporary Private AI API key is a good idea. Let me get back to you on that.

Thanks for the review and the references. I'll address them soon :)

letmerecall added 5 commits December 11, 2024 20:18

Add tests for private ai pii masking (da2f547)

Add pii masking action and flow for private ai (9b4a4f6)

Update private ai docs to add pii masking (eb4577f)

Add private ai pii masking example config (85409c5)

Add private ai pii masking example in notebook (eebe8c4)

Pouyanpi self-assigned this Jan 8, 2025

Pouyanpi self-requested a review January 8, 2025 11:46

Pouyanpi added the "enhancement" and "status: in review" labels Jan 8, 2025

Pouyanpi reviewed Jan 9, 2025

Pouyanpi requested changes Jan 9, 2025

Pouyanpi assigned letmerecall and unassigned Pouyanpi Jan 9, 2025

Pouyanpi mentioned this pull request Jan 9, 2025 in Prompt_security #920 (Open)
