feat: add pii masking capability to private ai integration #901
base: develop
Conversation
@Pouyanpi hope you had a wonderful holiday. PTAL whenever you find some time.
@@ -694,9 +694,9 @@ For more details, check out the [GCP Text Moderation](https://github.com/NVIDIA/

### Private AI PII Detection

- NeMo Guardrails supports using [Private AI API](https://docs.private-ai.com/?utm_medium=github&utm_campaign=nemo-guardrails) for PII detection in input, output and retrieval flows.
+ NeMo Guardrails supports using [Private AI API](https://docs.private-ai.com/?utm_medium=github&utm_campaign=nemo-guardrails) for PII detection and masking the in input, output and retrieval flows.
Suggested change:
- NeMo Guardrails supports using [Private AI API](https://docs.private-ai.com/?utm_medium=github&utm_campaign=nemo-guardrails) for PII detection and masking the in input, output and retrieval flows.
+ NeMo Guardrails supports using [Private AI API](https://docs.private-ai.com/?utm_medium=github&utm_campaign=nemo-guardrails) for PII detection and masking input, output and retrieval flows.

"""Masks any detected PII in the provided text.

Args
    source: The source for the text, i.e. "input", "output", "retrieval".
    text: The text to check.
    config: The rails configuration object.

Returns
    The altered text with PII masked.
"""
Suggested change:
"""Masks any detected PII in the provided text.

Args:
    source (str): The source for the text, i.e. "input", "output", "retrieval".
    text (str): The text to check.
    config (RailsConfig): The rails configuration object.

Returns:
    str: The altered text with PII masked.
"""
@@ -25,14 +24,14 @@

log = logging.getLogger(__name__)


-async def private_ai_detection_request(
+async def private_ai_request(
    text: str,
    enabled_entities: List[str],
    server_endpoint: str,
    api_key: Optional[str] = None,
):
    """
Suggested change:
-    """
+    """Send a PII detection request to the Private AI API.

    text: str,
    enabled_entities: List[str],
    server_endpoint: str,
    api_key: Optional[str] = None,
):
    """
-    Send a detection request to the Private AI API.
+    Send a PII detection request to the Private AI API.
-    result = await resp.json()
-
-    return any(res["entities_present"] for res in result)
+    return await resp.json()
Improve error handling for response parsing. As a suggestion, but please decide for yourself; I have not given it much thought:

try:
    response_json = await resp.json()
except aiohttp.ContentTypeError:
    raise ValueError(
        f"Failed to parse response as JSON. Status: {resp.status}, "
        f"Content: {await resp.text()}"
    )
        server_endpoint,
        pai_api_key,
    )
    return private_ai_response[0]["processed_text"]
- Ensure that `private_ai_response[0]["processed_text"]` is always valid to avoid potential index errors (see the sketch below).
- Improve error handling.
- Update the docstring with a `Raises` section (see ref).
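For illustration, a minimal sketch of one way to guard that access. The helper name and the assumption that the API returns a JSON list whose first element carries a `processed_text` field are mine, not the PR's:

```python
from typing import Any, Dict, List


def extract_processed_text(private_ai_response: List[Dict[str, Any]]) -> str:
    """Return the masked text from a Private AI response, failing loudly if it is absent."""
    # Guard against an empty or malformed response before indexing into it.
    if not private_ai_response or "processed_text" not in private_ai_response[0]:
        raise ValueError(
            f"Unexpected response from the Private AI API: {private_ai_response!r}"
        )
    return private_ai_response[0]["processed_text"]
```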
Thank you @letmerecall for the PR; please review my comments.

The current tests for PII detection and masking use mocked responses. While this is great for unit testing, it doesn't really exercise the actual integration with the Private AI API. I think we should consider adding some integration tests using temporary credentials to make sure everything works end-to-end. What do you think about it, @letmerecall? I am not quite sure whether the mocked requests and responses accurately reflect the Private AI API. You can have a look at the sensitive content detection tests or their refinement in #845.

Also, let's make sure our tests cover error scenarios, like what happens if the API key is missing or invalid. This will help us catch any potential issues early.
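A rough sketch of what such an error-scenario test might look like, using the `private_ai_request` signature shown in this diff; the module path, endpoint URL, entity name, and the exact exception raised on a missing key are assumptions here, not confirmed by the PR:

```python
import pytest  # requires pytest-asyncio for the asyncio marker

# Module path assumed for illustration; adjust to wherever private_ai_request lives.
from nemoguardrails.library.privateai.actions import private_ai_request


@pytest.mark.asyncio
async def test_private_ai_request_fails_without_api_key():
    # With no API key, the hosted endpoint should reject the request; the exact
    # exception type depends on how the helper surfaces HTTP/auth failures.
    with pytest.raises(Exception):
        await private_ai_request(
            text="My name is John Smith.",
            enabled_entities=["NAME"],
            server_endpoint="https://api.private-ai.com/community/v3/process/text",
            api_key=None,
        )
```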
Having a temp Private AI API key to test with is a good idea. Let me get back to you on that. Thanks for the review and the references. I'll address them soon :)
Description
This PR enhances the Private AI integration by adding support for PII masking. Previously, the Private AI integration only supported PII detection, preventing texts containing PII from being sent to the LLM or the end user. With the new masking feature, texts can now be sanitized and shared with the LLM or the user in a masked format.
For example, given the user input:
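(The original example text is not shown in this excerpt; the input below is a hypothetical illustration containing a name and an email address.)

```
Hi, I'm John Smith and you can reach me at john.smith@example.com.
```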
The PII masking configuration can transform the text into:
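(Again hypothetical, assuming Private AI's default numbered entity markers.)

```
Hi, I'm [NAME_1] and you can reach me at [EMAIL_ADDRESS_1].
```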
This enables safe data handling while maintaining the contextual integrity of the input.
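For context, a sketch of how masking might be switched on in a guardrails `config.yml`, modeled on the existing Private AI detection configuration; the `mask pii on input` / `mask pii on output` flow names and the entity list are assumptions, since this excerpt does not show the final configuration added by the PR:

```yaml
rails:
  config:
    privateai:
      # Endpoint of a running Private AI container or the hosted API (assumed value).
      server_endpoint: http://localhost:8080/process/text
      input:
        entities:
          - NAME
          - EMAIL_ADDRESS
  input:
    flows:
      - mask pii on input    # flow name assumed
  output:
    flows:
      - mask pii on output   # flow name assumed
```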
@Pouyanpi
Checklist