Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add pii masking capability to private ai integration #901

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

letmerecall
Copy link
Contributor

Description

This PR enhances Private AI integration by adding support for PII masking capabilities. Previously, Private AI integration allowed for PII detection, preventing texts containing PII from being sent to the LLM or end user. With the new masking feature, texts can now be sanitized and shared with the LLM or user in a masked format.

For example, given the user input:

My email id is [email protected]

The PII masking configuration can transform the text into:

My email id is [EMAIL]

This enables safe data handling while maintaining the contextual integrity of the input.

@Pouyanpi

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.

@letmerecall
Copy link
Contributor Author

@Pouyanpi hope you had a wonderful holidays. PTAL whenever you find some time.

@Pouyanpi Pouyanpi self-assigned this Jan 8, 2025
@Pouyanpi Pouyanpi self-requested a review January 8, 2025 11:46
@Pouyanpi Pouyanpi added enhancement New feature or request status: in review labels Jan 8, 2025
@@ -694,9 +694,9 @@ For more details, check out the [GCP Text Moderation](https://github.com/NVIDIA/

### Private AI PII Detection

NeMo Guardrails supports using [Private AI API](https://docs.private-ai.com/?utm_medium=github&utm_campaign=nemo-guardrails) for PII detection in input, output and retrieval flows.
NeMo Guardrails supports using [Private AI API](https://docs.private-ai.com/?utm_medium=github&utm_campaign=nemo-guardrails) for PII detection and masking the in input, output and retrieval flows.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NeMo Guardrails supports using [Private AI API](https://docs.private-ai.com/?utm_medium=github&utm_campaign=nemo-guardrails) for PII detection and masking the in input, output and retrieval flows.
NeMo Guardrails supports using [Private AI API](https://docs.private-ai.com/?utm_medium=github&utm_campaign=nemo-guardrails) for PII detection and masking input, output and retrieval flows.

Comment on lines +74 to +83
"""Masks any detected PII in the provided text.

Args
source: The source for the text, i.e. "input", "output", "retrieval".
text: The text to check.
config: The rails configuration object.

Returns
The altered text with PII masked.
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Masks any detected PII in the provided text.
Args
source: The source for the text, i.e. "input", "output", "retrieval".
text: The text to check.
config: The rails configuration object.
Returns
The altered text with PII masked.
"""
"""Masks any detected PII in the provided text.
Args:
source (str): The source for the text, i.e. "input", "output", "retrieval".
text (str): The text to check.
config (RailsConfig): The rails configuration object.
Returns:
str: The altered text with PII masked.
"""

@@ -25,14 +24,14 @@
log = logging.getLogger(__name__)


async def private_ai_detection_request(
async def private_ai_request(
text: str,
enabled_entities: List[str],
server_endpoint: str,
api_key: Optional[str] = None,
):
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""
"""Send a PII detection request to the Private AI API.

text: str,
enabled_entities: List[str],
server_endpoint: str,
api_key: Optional[str] = None,
):
"""
Send a detection request to the Private AI API.
Send a PII detection request to the Private AI API.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Send a PII detection request to the Private AI API.

result = await resp.json()

return any(res["entities_present"] for res in result)
return await resp.json()
Copy link
Collaborator

@Pouyanpi Pouyanpi Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

improve error handling for response parsing

As a suggestion, but please decide for yourself I have not given it much thought

            try:
                response_json = await resp.json()
            except aiohttp.ContentTypeError:
                raise ValueError(
                    f"Failed to parse response as JSON. Status: {resp.status}, "
                    f"Content: {await resp.text()}"
                )

server_endpoint,
pai_api_key,
)
return private_ai_response[0]["processed_text"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Ensure that private_ai_response[0]["processed_text"] is always valid to avoid potential index errors.
  • Improve error handling
  • Update docstring with Raises (see ref)

Copy link
Collaborator

@Pouyanpi Pouyanpi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @letmerecall for the PR, please review my comments.

current tests for PII detection and masking are using mocked responses. While this is great for unit testing, it doesn't really test the actual integration with the Private AI API. I think we should consider adding some integration tests using temporary credentials to make sure everything works end-to-end. What do you think about it @letmerecall? I am not quite sure wether the mock requests and response accurately mocks Private AI API. You can have a look at this sensitive content detection tests or its refinement in #845

Also, let's make sure our tests cover error scenarios, like what happens if the API key is missing or invalid. This will help us catch any potential issues early

@Pouyanpi Pouyanpi assigned letmerecall and unassigned Pouyanpi Jan 9, 2025
@Pouyanpi Pouyanpi mentioned this pull request Jan 9, 2025
@letmerecall
Copy link
Contributor Author

Thank you @letmerecall for the PR, please review my comments.

current tests for PII detection and masking are using mocked responses. While this is great for unit testing, it doesn't really test the actual integration with the Private AI API. I think we should consider adding some integration tests using temporary credentials to make sure everything works end-to-end. What do you think about it @letmerecall? I am not quite sure wether the mock requests and response accurately mocks Private AI API. You can have a look at this sensitive content detection tests or its refinement in #845

Also, let's make sure our tests cover error scenarios, like what happens if the API key is missing or invalid. This will help us catch any potential issues early

Having a temp Private AI API key to test is a good idea. Let me get back to you on that.

Thanks for the review and the references. I'll address them soon :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request status: in review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants