Contributor

@JadeCara JadeCara commented Oct 28, 2025

Ticket ENG-1404

Description Of Changes

🎯 Fides must provide a way for privacy admins to quickly identify and surface in the request manager UI privacy requests which are likely duplicates submitted by the same user over a period of time.

This PR integrates the duplicate detection into the request runner.

Code Changes

  • Updated the request runner to detect duplicates before the full run.
  • Added runner tests.
  • Updated the duplicate service to create execution logs.
  • Added functionality to update duplicate groups with the current group id.

Steps to Confirm

  1. Can run on fides (and on fidesplus once [ENG-1404] Added PrivacyRequest duplicate indication cols #6811 and ENG-1404 Create Duplicate Group table #6881 are merged).
  2. Update the config proxy to enable privacy request duplicate detection with PATCH /api/v1/config
{
  "privacy_request_duplicate_detection": {
    "enabled": true
  }
}
  3. Create a privacy request that duplicates one you already have. If you don't have any up and running, make two :) You will see logs indicating that it has been identified as a duplicate and that the run is being terminated.
2025-10-28 21:05:51.243 | INFO     | fides.api.service.privacy_request.request_runner_service:run_privacy_request:476 - Terminating privacy request: request is a duplicate. | {'privacy_request_id': 'pri_f4892dc9-1d13-48a4-bc3b-db1966b6cc69', 'privacy_request_source': None}

The duplicate DSR should then be marked as such in the DB and it won't be re-queued by the runner.
[Screenshot, 2025-10-28 at 3:37:43 PM]
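
Step 2 above could also be scripted. The following is a minimal sketch using only the Python standard library. Only the path and the JSON payload come from the steps above; the base URL, the token, and the bearer-token header are placeholders and assumptions about a local setup, not something this PR specifies. The request is built but not sent.

```python
import json
import urllib.request

# Payload copied from step 2 above.
PAYLOAD = {"privacy_request_duplicate_detection": {"enabled": True}}


def build_enable_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the PATCH /api/v1/config request.

    base_url and token are hypothetical values; the Authorization header
    is an assumption about how the local instance is authenticated.
    """
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/config",
        data=json.dumps(PAYLOAD).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    req = build_enable_request("http://localhost:8080", "<access-token>")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the method, URL, and body before pointing it at a real instance.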

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration label to the entry if your change includes a DB migration
    • Add a high-risk label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

Jade Wibbels and others added 30 commits October 21, 2025 15:53
Jade Wibbels added 2 commits October 31, 2025 09:58
…dsr-runner-integration' of github.com:ethyca/fides into ENG-1404-be-implement-ability-to-sort-filter-duplicate-dsr-runner-integration
@JadeCara JadeCara marked this pull request as ready for review October 31, 2025 16:05
@JadeCara JadeCara requested a review from a team as a code owner October 31, 2025 16:05
@JadeCara JadeCara requested review from johnewart and removed request for a team October 31, 2025 16:05
Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

Integrates duplicate detection into the privacy request runner, allowing the system to identify and terminate duplicate requests before full execution.

Key changes:

  • Added duplicate detection check at the start of run_privacy_request that terminates with duplicate status if a duplicate is found
  • Enhanced DuplicateDetectionService to create execution logs for both success and error cases during duplicate detection
  • Updated method signatures to accept DuplicateDetectionSettingsProxy for runtime configuration access
  • Added update_duplicate_group_ids method to batch update group IDs for all requests in a duplicate group
  • Comprehensive test coverage for runner integration with various verification scenarios

Impact:
This change prevents duplicate privacy requests from being fully executed, improving efficiency and data consistency. The execution logs provide visibility into duplicate detection decisions.
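
The early-termination flow described above can be sketched roughly as follows. This is an illustration only: `Status`, `Request`, and `run_request` are hypothetical stand-ins, not the actual Fides models or the real `run_privacy_request` signature, and the duplicate check is passed in as a plain callable.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Status(Enum):
    """Hypothetical subset of request statuses."""
    in_processing = "in_processing"
    duplicate = "duplicate"
    complete = "complete"


@dataclass
class Request:
    """Hypothetical stand-in for a privacy request."""
    id: str
    status: Status = Status.in_processing
    logs: List[str] = field(default_factory=list)


def run_request(request: Request, is_duplicate: Callable[[Request], bool]) -> None:
    """Check for duplicates first; run the full request only if none is found."""
    if is_duplicate(request):
        request.status = Status.duplicate
        request.logs.append("Terminating privacy request: request is a duplicate.")
        return  # early exit: none of the execution steps below run
    request.logs.append("executing request")
    request.status = Status.complete


if __name__ == "__main__":
    dup = Request("pri_1")
    run_request(dup, lambda r: True)
    print(dup.status, dup.logs)
```

Putting the check at the very top means a duplicate leaves the runner with a terminal status and a log entry, and never reaches the execution path.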

Confidence Score: 4/5

  • Safe to merge with minor style improvement suggested
  • The implementation is well-tested and follows established patterns. The duplicate detection logic is sound and properly integrated into the runner. One minor style issue with redundant Union type hint found. The execution logging addition improves observability and debugging.
  • No files require special attention

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| src/fides/api/service/privacy_request/request_runner_service.py | 5/5 | Adds duplicate detection integration at the start of the privacy request runner with proper logging and early termination |
| src/fides/api/service/privacy_request/duplication_detection.py | 4/5 | Adds execution logging throughout duplicate detection process and updates method signatures to accept DuplicateDetectionSettingsProxy instead of DuplicateDetectionSettings |

5 files reviewed, 1 comment


Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Collaborator

@johnewart johnewart left a comment

Looks good - just two small things around the visibility / usage of config here but they aren't blocking from my point of view

class DuplicateDetectionService:
    def __init__(self, db: Session):
        self.db = db
        self.config = ConfigProxy(db).privacy_request_duplicate_detection

I might make this _config, because people probably shouldn't be accessing it directly, and just wrap whatever we are using in a public property (like enabled)


duplicate_detection_service = DuplicateDetectionService(session)
# Service initializes with ConfigProxy, so we can check if duplicate detection is enabled
if duplicate_detection_service.config.enabled:

Would it be better to just have a property duplicate_detection_service.enabled? I think the config should be a hidden implementation detail (i.e. if you wanted to test the service you can just stub out enabled=True which is simpler). Also, you could move this check into is_duplicate_request and if it's not enabled, just immediately return False
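
A rough sketch of what the two suggestions could look like together. This is not the merged code: `FakeSettings` is a hypothetical stand-in for the settings proxy, the constructor takes it directly instead of a `Session`, and the `identity`/`seen` matching is a placeholder for the real duplicate logic.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class FakeSettings:
    """Hypothetical stand-in for the settings proxy; only `enabled` is modelled."""
    enabled: bool = False


class DuplicateDetectionService:
    def __init__(self, settings: FakeSettings):
        # Leading underscore: the config is an implementation detail.
        self._config = settings

    @property
    def enabled(self) -> bool:
        """Public read-only view of the one setting callers need."""
        return self._config.enabled

    def is_duplicate_request(self, identity: str, seen: Set[str]) -> bool:
        # When detection is disabled, nothing is ever a duplicate.
        if not self.enabled:
            return False
        # Placeholder matching rule, for illustration only.
        return identity in seen


if __name__ == "__main__":
    service = DuplicateDetectionService(FakeSettings(enabled=True))
    print(service.is_duplicate_request("user@example.com", {"user@example.com"}))
```

With the check inside `is_duplicate_request`, the runner would no longer need to read the config at all, and a test can construct the service with detection switched on or off without touching the config proxy.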

@JadeCara JadeCara added this pull request to the merge queue Nov 5, 2025
Merged via the queue into main with commit 9299d65 Nov 5, 2025
68 of 69 checks passed
@JadeCara JadeCara deleted the ENG-1404-be-implement-ability-to-sort-filter-duplicate-dsr-runner-integration branch November 5, 2025 16:35