-
Notifications
You must be signed in to change notification settings - Fork 86
[ENG-1404] Duplicate DSR - runner integration #6860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENG-1404] Duplicate DSR - runner integration #6860
Conversation
…-duplicate-dsr-records-model
…-duplicate-dsr-records-model
…-duplicate-dsr-records-model
…-duplicate-dsr-records-model
…-duplicate-dsr-records-model
…dsr-runner-integration' of github.com:ethyca/fides into ENG-1404-be-implement-ability-to-sort-filter-duplicate-dsr-runner-integration
…-duplicate-dsr-runner-integration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
Integrates duplicate detection into the privacy request runner, allowing the system to identify and terminate duplicate requests before full execution.
Key changes:
- Added duplicate detection check at the start of
run_privacy_requestthat terminates withduplicatestatus if a duplicate is found - Enhanced
DuplicateDetectionServiceto create execution logs for both success and error cases during duplicate detection - Updated method signatures to accept
DuplicateDetectionSettingsProxyfor runtime configuration access - Added
update_duplicate_group_idsmethod to batch update group IDs for all requests in a duplicate group - Comprehensive test coverage for runner integration with various verification scenarios
Impact:
This change prevents duplicate privacy requests from being fully executed, improving efficiency and data consistency. The execution logs provide visibility into duplicate detection decisions.
Confidence Score: 4/5
- Safe to merge with minor style improvement suggested
- The implementation is well-tested and follows established patterns. The duplicate detection logic is sound and properly integrated into the runner. One minor style issue with redundant Union type hint found. The execution logging addition improves observability and debugging.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| src/fides/api/service/privacy_request/request_runner_service.py | 5/5 | Adds duplicate detection integration at the start of the privacy request runner with proper logging and early termination |
| src/fides/api/service/privacy_request/duplication_detection.py | 4/5 | Adds execution logging throughout duplicate detection process and updates method signatures to accept DuplicateDetectionSettingsProxy instead of DuplicateDetectionSettings |
5 files reviewed, 1 comment
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
…-duplicate-dsr-runner-integration
…-duplicate-dsr-runner-integration
…-duplicate-dsr-runner-integration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - just two small things around the visibility / usage of config here but they aren't blocking from my point of view
| class DuplicateDetectionService: | ||
| def __init__(self, db: Session): | ||
| self.db = db | ||
| self.config = ConfigProxy(db).privacy_request_duplicate_detection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might makes this _config because people probably shouldn't be accessing it directly, and just hide whatever we are using in a public property (like enabled)
|
|
||
| duplicate_detection_service = DuplicateDetectionService(session) | ||
| # Service initializes with ConfigProxy, so we can check if duplicate detection is enabled | ||
| if duplicate_detection_service.config.enabled: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to just have a property duplicate_detection_service.enabled? I think the config should be a hidden implementation detail (i.e. if you wanted to test the service you can just stub out enabled=True which is simpler). Also, you could move this check into is_duplicate_request and if it's not enabled, just immediately return False
…-duplicate-dsr-runner-integration
…-duplicate-dsr-runner-integration
Ticket ENG-1404
Description Of Changes
🎯 Fides must provide a way for privacy admins to quickly identify and surface in the request manager UI privacy requests which are likely duplicates submitted by the same user over a period of time.
This PR integrates the duplicate detection into the request runner.
Code Changes
Steps to Confirm
PATCH /api/v1/configThe duplicate DSR should then be marked as such in the DB and it won't be re-queued by the runner.

Pre-Merge Checklist
CHANGELOG.mdupdatedmaindowngrade()migration is correct and works