Merged
Changes from 146 commits
Commits
150 commits
49d77ce
added duplicate indication cols
Oct 21, 2025
ea4fecd
clean up tests
Oct 21, 2025
1114b17
Apply suggestion from @JadeCara
JadeCara Oct 21, 2025
1af8571
Apply suggestion from @JadeCara
JadeCara Oct 21, 2025
c5ee7c2
Apply suggestion from @JadeCara
JadeCara Oct 21, 2025
4623459
Apply suggestion from @JadeCara
JadeCara Oct 21, 2025
7288af3
Apply suggestion from @JadeCara
JadeCara Oct 21, 2025
3beec51
Apply suggestion from @JadeCara
JadeCara Oct 21, 2025
aa008ea
Apply suggestion from @JadeCara
JadeCara Oct 21, 2025
25b64e6
Apply suggestion from @JadeCara
JadeCara Oct 21, 2025
266a716
Apply suggestion from @JadeCara
JadeCara Oct 21, 2025
0006a61
Apply suggestion from @JadeCara
JadeCara Oct 21, 2025
473f04e
Apply suggestion from @JadeCara
JadeCara Oct 21, 2025
7ef5703
Apply suggestion from @JadeCara
JadeCara Oct 21, 2025
a4ead46
added to db_dataset
Oct 21, 2025
94eeaf6
Merge branch 'main' into ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 21, 2025
ab760a5
add dupicate detection config
Oct 22, 2025
71ca1c0
improved naming, removed some redundancy
Oct 22, 2025
6e4e57c
Merge branch 'main' into ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 22, 2025
a44d017
add to config key allow list
Oct 22, 2025
38d739d
Merge branch 'main' into ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 22, 2025
0feb90c
update config tests
Oct 22, 2025
180460f
Merge branch 'main' into ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 22, 2025
68b4cfa
simplified model
Oct 24, 2025
e404cfb
simplified model
Oct 24, 2025
b0d00ed
simplified model
Oct 24, 2025
50d0d16
Merge branch 'main' into ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 24, 2025
2d7ab2b
update migration
Oct 24, 2025
fa3052e
updates for default config
Oct 24, 2025
082c99a
updates for default config
Oct 24, 2025
c328acc
updates for default config
Oct 24, 2025
cc61426
Merge branch 'main' into ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 24, 2025
d340cb8
initial duplication detection
Oct 24, 2025
66dc29a
corrected policy_condition
Oct 24, 2025
ded7171
Merge branch 'ENG-1404-be-implement-ability-to-sort-filter-duplicate-…
JadeCara Oct 24, 2025
b2d9d48
Fix Vendor list logic (#6830)
gilluminate Oct 24, 2025
38eef26
Restore isConsent to LI (#6840)
gilluminate Oct 24, 2025
973ea32
Eng 1695 skip access polling on erasure tasks (#6827)
Vagoasdf Oct 24, 2025
495a454
initial duplication detection
Oct 24, 2025
180bac0
corrected policy_condition
Oct 24, 2025
37bf572
Merge branch 'ENG-1404-be-implement-ability-to-sort-filter-duplicate-…
Oct 24, 2025
f951b23
.
Oct 24, 2025
ae41918
test clean up
Oct 24, 2025
ada551b
convert to class
Oct 27, 2025
638fd3f
initial
Oct 27, 2025
c0b3470
updates
Oct 27, 2025
5454ecd
Merge branch 'ENG-1404-be-implement-ability-to-sort-filter-duplicate-…
JadeCara Oct 27, 2025
ff4a31f
.
Oct 27, 2025
9c9f1fd
some more logic around assigning groups and tests
Oct 27, 2025
17c77b7
Merge branch 'main' into ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 27, 2025
d1b781b
Merge branch 'ENG-1404-be-implement-ability-to-sort-filter-duplicate-…
JadeCara Oct 27, 2025
e972ee7
updated some comments for clarity
Oct 27, 2025
1284885
Apply suggestion from @JadeCara
JadeCara Oct 28, 2025
6781aed
Merge branch 'ENG-1404-be-implement-ability-to-sort-filter-duplicate-…
JadeCara Oct 28, 2025
03cf158
initial
Oct 28, 2025
e99a87a
Merge branch 'main' into ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 28, 2025
ea5b7a5
Merge branch 'ENG-1404-be-implement-ability-to-sort-filter-duplicate-…
JadeCara Oct 28, 2025
b3ff7b9
updated comments
Oct 28, 2025
4862320
Merge branch 'ENG-1404-be-implement-ability-to-sort-filter-duplicate-…
JadeCara Oct 28, 2025
23e434f
Apply suggestion from @JadeCara
JadeCara Oct 28, 2025
eae970b
updated function naming
Oct 28, 2025
1663797
Merge branch 'ENG-1404-be-implement-ability-to-sort-filter-duplicate-…
JadeCara Oct 28, 2025
ad0cf2d
Merge branch 'ENG-1404-be-implement-ability-to-sort-filter-duplicate-…
JadeCara Oct 28, 2025
c3a23c0
Merge branch 'ENG-1404-be-implement-ability-to-sort-filter-duplicate-…
JadeCara Oct 28, 2025
bab5293
updated to use proxy
Oct 28, 2025
cb56f70
Apply suggestion from @JadeCara
JadeCara Oct 28, 2025
9f083ae
added some request runner tests
Oct 28, 2025
dfc2ef1
typo
Oct 28, 2025
300c9b9
removed some logging I was using to debug
Oct 28, 2025
a86715d
Update src/fides/api/service/privacy_request/duplication_detection.py
JadeCara Oct 28, 2025
b7440cc
Merge branch 'ENG-1404-be-implement-ability-to-sort-filter-duplicate-…
JadeCara Oct 28, 2025
8969999
Apply suggestion from @JadeCara
JadeCara Oct 28, 2025
d8bb57e
Merge branch 'ENG-1404-be-implement-ability-to-sort-filter-duplicate-…
JadeCara Oct 28, 2025
720d5d1
Update tests/ops/service/privacy_request/test_duplication_detection.py
JadeCara Oct 29, 2025
057176f
deterministic duplicate group ids
Oct 29, 2025
e4ac4ba
better naming
Oct 29, 2025
167ba3f
Merge branch 'duplicate_group-ENG-1404-be-implement-ability-to-sort-f…
JadeCara Oct 29, 2025
4feb500
Merge branch 'conditions-ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 29, 2025
3c8d77f
Update src/fides/api/models/privacy_request/privacy_request.py
JadeCara Oct 29, 2025
d551601
Update src/fides/api/alembic/migrations/versions/xx_2025_10_29_1659_8…
JadeCara Oct 29, 2025
846cd64
Apply suggestion from @JadeCara
JadeCara Oct 29, 2025
42fdb1d
Merge branch 'main' into ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 29, 2025
d4ccff3
Merge branch 'ENG-1404-be-implement-ability-to-sort-filter-duplicate-…
JadeCara Oct 29, 2025
c67da62
Merge branch 'duplicate_group-ENG-1404-be-implement-ability-to-sort-f…
JadeCara Oct 29, 2025
72578d8
Merge branch 'conditions-ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 29, 2025
c1b1269
use duplicate group to get ids
Oct 29, 2025
a9400fb
Merge branch 'mark-duplicates-ENG-1404-be-implement-ability-to-sort-f…
Oct 29, 2025
78ae4e5
Apply suggestion from @JadeCara
JadeCara Oct 29, 2025
95bf81c
using new duplication group
Oct 29, 2025
63c6048
updated changelog
Oct 29, 2025
e96b91c
Merge branch 'main' into ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 29, 2025
000b0d3
Merge branch 'ENG-1404-be-implement-ability-to-sort-filter-duplicate-…
JadeCara Oct 29, 2025
0e98d06
update db yml
Oct 29, 2025
58e8b1b
Merge branch 'duplicate_group-ENG-1404-be-implement-ability-to-sort-f…
JadeCara Oct 29, 2025
369ddbe
Merge branch 'conditions-ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 29, 2025
cd66519
Merge branch 'mark-duplicates-ENG-1404-be-implement-ability-to-sort-f…
Oct 29, 2025
9400fe5
updated tests for the deterministic group id
Oct 29, 2025
6a35f07
Merge branch 'main' into duplicate_group-ENG-1404-be-implement-abilit…
JadeCara Oct 29, 2025
4b25171
Merge branch 'duplicate_group-ENG-1404-be-implement-ability-to-sort-f…
JadeCara Oct 29, 2025
5e000be
linting
Oct 29, 2025
4e1a40f
Merge branch 'conditions-ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 29, 2025
3ca1228
add execution log
Oct 29, 2025
ff3eaa5
Merge branch 'mark-duplicates-ENG-1404-be-implement-ability-to-sort-f…
Oct 29, 2025
07a4e15
added additional test for missing coverage lines
Oct 29, 2025
7408e09
Merge branch 'mark-duplicates-ENG-1404-be-implement-ability-to-sort-f…
JadeCara Oct 29, 2025
af63f91
Apply suggestion from @JadeCara
JadeCara Oct 29, 2025
cb5db8b
Apply suggestion from @greptile-apps[bot]
JadeCara Oct 29, 2025
f05b503
Apply suggestion from @greptile-apps[bot]
JadeCara Oct 29, 2025
8d289b6
Apply suggestion from @greptile-apps[bot]
JadeCara Oct 29, 2025
583cad5
updated tests
Oct 29, 2025
b50ab9c
linting
Oct 29, 2025
40bc68d
Merge branch 'mark-duplicates-ENG-1404-be-implement-ability-to-sort-f…
JadeCara Oct 29, 2025
921aa6d
Merge branch 'duplicate_group-ENG-1404-be-implement-ability-to-sort-f…
JadeCara Oct 29, 2025
5cc32d6
added tests for missing coverage
Oct 30, 2025
4e49d97
Merge branch 'main' into duplicate_group-ENG-1404-be-implement-abilit…
JadeCara Oct 30, 2025
ff58137
Merge branch 'duplicate_group-ENG-1404-be-implement-ability-to-sort-f…
JadeCara Oct 30, 2025
6e02bea
Merge branch 'conditions-ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 30, 2025
dfa41cb
Merge branch 'mark-duplicates-ENG-1404-be-implement-ability-to-sort-f…
JadeCara Oct 30, 2025
f7884d2
linting some test improvements
Oct 30, 2025
0f2dde4
fixed type error
Oct 30, 2025
7ae2e6a
Merge branch 'main' into duplicate_group-ENG-1404-be-implement-abilit…
JadeCara Oct 30, 2025
bc487fa
updated response schema
Oct 30, 2025
4ff8144
Merge branch 'main' into duplicate_group-ENG-1404-be-implement-abilit…
JadeCara Oct 30, 2025
7a59899
updated changelog
Oct 30, 2025
86f8ed7
Merge branch 'duplicate_group-ENG-1404-be-implement-ability-to-sort-f…
JadeCara Oct 30, 2025
742a60c
Merge branch 'conditions-ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 30, 2025
f579a9f
Merge branch 'mark-duplicates-ENG-1404-be-implement-ability-to-sort-f…
JadeCara Oct 30, 2025
1c0fbd7
Merge branch 'main' into duplicate_group-ENG-1404-be-implement-abilit…
JadeCara Oct 30, 2025
9373b8d
Merge branch 'duplicate_group-ENG-1404-be-implement-ability-to-sort-f…
JadeCara Oct 30, 2025
904dc3e
Merge branch 'main' into conditions-ENG-1404-be-implement-ability-to-…
JadeCara Oct 30, 2025
2a59c67
Merge branch 'conditions-ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 30, 2025
ea2ead1
Merge branch 'mark-duplicates-ENG-1404-be-implement-ability-to-sort-f…
JadeCara Oct 30, 2025
20c7eb0
Merge branch 'main' into mark-duplicates-ENG-1404-be-implement-abilit…
JadeCara Oct 31, 2025
9db13b3
Merge branch 'mark-duplicates-ENG-1404-be-implement-ability-to-sort-f…
JadeCara Oct 31, 2025
fb9e3a3
Merge branch 'main' into ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 31, 2025
9710c1c
Merge branch 'main' into ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 31, 2025
a4de7ea
updated changelog
Oct 31, 2025
08f8014
Merge branch 'ENG-1404-be-implement-ability-to-sort-filter-duplicate-…
Oct 31, 2025
f57f4a5
Merge branch 'main' into ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 31, 2025
575ca2e
Apply suggestion from @greptile-apps[bot]
JadeCara Oct 31, 2025
8bca866
Apply suggestion from @JadeCara
JadeCara Oct 31, 2025
cb3b621
Merge branch 'main' into ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 31, 2025
761f918
Merge branch 'main' into ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Oct 31, 2025
1b72113
updated so config is an attribute of the service class
Nov 3, 2025
b3534d8
Apply suggestion from @JadeCara
JadeCara Nov 3, 2025
c762467
Merge branch 'main' into ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Nov 3, 2025
1d3ca0a
is_enabled to own function
Nov 3, 2025
3188fa4
update test
Nov 3, 2025
6e6af19
Merge branch 'main' into ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Nov 4, 2025
7405cf8
Merge branch 'main' into ENG-1404-be-implement-ability-to-sort-filter…
JadeCara Nov 5, 2025
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -24,6 +24,7 @@ Changes can also be flagged with a GitHub label for tracking purposes. The URL o
### Added
- Added a duplicate group table with deterministic ids [#6881](https://github.com/ethyca/fides/pull/6881) https://github.com/ethyca/fides/labels/db-migration
- Added replace mode for decoding context loggers to avoid decode errors with zip files [#6899](https://github.com/ethyca/fides/pull/6899/files)
- Added duplicate DSR checking to request runner [#6860](https://github.com/ethyca/fides/pull/6860/)

### Changed
- Updated filter modal in new privacy request screen to store filters as query params in url [#6818](https://github.com/ethyca/fides/pull/6818)
214 changes: 123 additions & 91 deletions src/fides/api/service/privacy_request/duplication_detection.py
@@ -1,5 +1,6 @@
from datetime import datetime, timedelta, timezone
from typing import Optional
from uuid import UUID

from loguru import logger
from sqlalchemy.orm import Session
@@ -21,6 +22,7 @@
from fides.api.task.conditional_dependencies.sql_translator import (
SQLConditionTranslator,
)
from fides.config.config_proxy import ConfigProxy
from fides.config.duplicate_detection_settings import DuplicateDetectionSettings

ACTIONED_REQUEST_STATUSES = [
@@ -38,9 +40,10 @@
class DuplicateDetectionService:
def __init__(self, db: Session):
self.db = db
self.config = ConfigProxy(db).privacy_request_duplicate_detection
Collaborator comment: I might make this _config because people probably shouldn't be accessing it directly, and just hide whatever we are using in a public property (like enabled).


def _create_identity_conditions(
-        self, current_request: PrivacyRequest, config: DuplicateDetectionSettings
+        self, current_request: PrivacyRequest
) -> list[Condition]:
"""Creates conditions for matching identity fields.

@@ -52,12 +55,12 @@ def _create_identity_conditions(
current_identities: dict[str, str] = {
pi.field_name: pi.hashed_value
for pi in current_request.provided_identities # type: ignore [attr-defined]
-            if pi.field_name in config.match_identity_fields
+            if pi.field_name in self.config.match_identity_fields
}
-        if len(current_identities) != len(config.match_identity_fields):
+        if len(current_identities) != len(self.config.match_identity_fields):
missing_fields = [
field
-                for field in config.match_identity_fields
+                for field in self.config.match_identity_fields
if field not in current_identities.keys()
]
logger.debug(
@@ -103,7 +106,6 @@ def _create_time_window_condition(self, time_window_days: int) -> Condition:
def create_duplicate_detection_conditions(
self,
current_request: PrivacyRequest,
-        config: DuplicateDetectionSettings,
) -> Optional[ConditionGroup]:
"""
Create conditions for duplicate detection based on configuration.
@@ -115,15 +117,15 @@
Returns:
A ConditionGroup with AND operator, or None if no conditions can be created
"""
-        if len(config.match_identity_fields) == 0:
+        if len(self.config.match_identity_fields) == 0:
return None

-        identity_conditions = self._create_identity_conditions(current_request, config)
+        identity_conditions = self._create_identity_conditions(current_request)
if not identity_conditions:
return None # Only proceed if we have identity conditions

time_window_condition = self._create_time_window_condition(
-            config.time_window_days
+            self.config.time_window_days
)

# Combine all conditions with AND operator
@@ -135,7 +137,6 @@ def find_duplicate_privacy_requests(
def find_duplicate_privacy_requests(
self,
current_request: PrivacyRequest,
-        config: DuplicateDetectionSettings,
) -> list[PrivacyRequest]:
"""
Find potential duplicate privacy requests based on duplicate detection configuration.
@@ -151,7 +152,7 @@
List of PrivacyRequest objects that match the duplicate criteria,
does not include the current request
"""
-        condition = self.create_duplicate_detection_conditions(current_request, config)
+        condition = self.create_duplicate_detection_conditions(current_request)

if condition is None:
return []
@@ -162,28 +163,75 @@
query = query.filter(PrivacyRequest.id != current_request.id)
return query.all()

-    def generate_dedup_key(
-        self, request: PrivacyRequest, config: DuplicateDetectionSettings
-    ) -> str:
+    def generate_dedup_key(self, request: PrivacyRequest) -> str:
"""
Generate a dedup key for a request based on the duplicate detection settings.
"""
current_identities: dict[str, str] = {
pi.field_name: pi.hashed_value
for pi in request.provided_identities # type: ignore [attr-defined]
-            if pi.field_name in config.match_identity_fields
+            if pi.field_name in self.config.match_identity_fields
}
-        if len(current_identities) != len(config.match_identity_fields):
+        if len(current_identities) != len(self.config.match_identity_fields):
raise ValueError(
"This request does not contain the required identity fields for duplicate detection."
)
return "|".join(
[
current_identities[field]
-                for field in sorted(config.match_identity_fields)
+                for field in sorted(self.config.match_identity_fields)
]
)
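`generate_dedup_key` joins hashed identity values in sorted field order, so two requests with the same identities always produce the same key regardless of the order the identities arrive in. A minimal standalone sketch (the dict-based signature is an illustration, not the actual method):

```python
def dedup_key(hashed_identities: dict[str, str], match_fields: list[str]) -> str:
    # Keep only the configured identity fields; every configured field must be present.
    present = {f: v for f, v in hashed_identities.items() if f in match_fields}
    if len(present) != len(match_fields):
        raise ValueError("missing required identity fields for duplicate detection")
    # Sorting field names makes the key order-independent, hence deterministic.
    return "|".join(present[field] for field in sorted(match_fields))

# Identical identities yield identical keys regardless of field order:
key_a = dedup_key({"email": "h1", "phone_number": "h2"}, ["phone_number", "email"])
key_b = dedup_key({"phone_number": "h2", "email": "h1"}, ["email", "phone_number"])
assert key_a == key_b == "h1|h2"
```

Because the key is deterministic, `DuplicateGroup.get_or_create` on `(rule_version, dedup_key)` lands matching requests in the same group without any coordination.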

def update_duplicate_group_ids(
self,
request: PrivacyRequest,
duplicates: list[PrivacyRequest],
duplicate_group_id: UUID,
) -> None:
"""
Update the duplicate request group ids for a request and its duplicates.
Args:
request: The privacy request to update
duplicates: The list of duplicate requests to update
duplicate_group_id: The duplicate request group id to update
"""
update_all = [request] + duplicates
try:
for privacy_request in update_all:
privacy_request.duplicate_request_group_id = duplicate_group_id # type: ignore [assignment]
except Exception as e:
logger.error(f"Failed to update duplicate request group ids: {e}")
raise e

def add_error_execution_log(self, request: PrivacyRequest, message: str) -> None:
request.add_error_execution_log(
db=self.db,
connection_key=None,
dataset_name="Duplicate Request Detection",
collection_name=None,
message=message,
action_type=(
request.policy.get_action_type() # type: ignore [arg-type]
if request.policy
else ActionType.access
),
)

def add_success_execution_log(self, request: PrivacyRequest, message: str) -> None:
request.add_success_execution_log(
db=self.db,
connection_key=None,
dataset_name="Duplicate Request Detection",
collection_name=None,
message=message,
action_type=(
request.policy.get_action_type() # type: ignore [arg-type]
if request.policy
else ActionType.access
),
)

def verified_identity_cases(
self, request: PrivacyRequest, duplicates: list[PrivacyRequest]
) -> bool:
@@ -206,60 +254,52 @@ def verified_identity_cases(
# The request identity is not verified.
if not request.identity_verified_at:
if len(verified_in_group) > 0:
-                logger.debug(
-                    f"Request {request.id} is a duplicate: it is not verified and duplicating verified request(s) {verified_in_group}."
-                )
+                message = f"Request {request.id} is a duplicate: it is duplicating request(s) {[duplicate.id for duplicate in verified_in_group]}."
+                logger.debug(message)
+                self.add_error_execution_log(request, message)
return True

-        min_created_at = min(
-            (d.created_at for d in duplicates if d.created_at), default=None
-        ) or datetime.now(timezone.utc)
-        request_created_at = (
-            request.created_at
-            if request.created_at is not None
-            else datetime.now(timezone.utc)
-        )
+        canonical_request = min(duplicates, key=lambda x: x.created_at)  # type: ignore [arg-type, return-value]
+        canonical_request_created_at = canonical_request.created_at or datetime.now(
+            timezone.utc
+        )
+        request_created_at = request.created_at or datetime.now(timezone.utc)
-        if request_created_at < min_created_at:
-            logger.debug(
-                f"Request {request.id} is not a duplicate: it is the first request to be created in the group."
-            )
+        if request_created_at < canonical_request_created_at:
+            message = f"Request {request.id} is not a duplicate: it is the first request to be created in the group."
+            logger.debug(message)
+            self.add_success_execution_log(request, message)
            return False
-        logger.debug(
-            f"Request {request.id} is a duplicate: it is not verified and is not the first request to be created in the group."
-        )
+
+        message = f"Request {request.id} is a duplicate: it is duplicating request(s) ['{canonical_request.id}']."
+        logger.debug(message)
+        self.add_error_execution_log(request, message)
return True

# The request identity is verified.
if not verified_in_group:
-            logger.debug(
-                f"Request {request.id} is not a duplicate: it is verified and no other requests in the group are verified."
-            )
+            message = f"Request {request.id} is not a duplicate: it is the first request to be verified in the group."
+            logger.debug(message)
+            self.add_success_execution_log(request, message)
return False

# If this request is the first with a verified identity, it is not a duplicate.
-        min_verified_at = min(
-            (d.identity_verified_at for d in duplicates if d.identity_verified_at),
-            default=None,
-        ) or datetime.now(timezone.utc)
-        request_verified_at = (
-            request.identity_verified_at
-            if request.identity_verified_at is not None
-            else datetime.now(timezone.utc)
-        )
+        canonical_request = min(duplicates, key=lambda x: x.identity_verified_at)  # type: ignore [arg-type, return-value]
+        canonical_request_verified_at = (
+            canonical_request.identity_verified_at or datetime.now(timezone.utc)
+        )
+        request_verified_at = request.identity_verified_at or datetime.now(timezone.utc)
-        if request_verified_at < min_verified_at:
-            logger.debug(
-                f"Request {request.id} is not a duplicate: it is the first request to be verified in the group."
-            )
+        if request_verified_at < canonical_request_verified_at:
+            message = f"Request {request.id} is not a duplicate: it is the first request to be verified in the group."
+            logger.debug(message)
+            self.add_success_execution_log(request, message)
            return False
-        logger.debug(
-            f"Request {request.id} is a duplicate: it is verified but not the first request to be verified in the group."
-        )
+        message = f"Request {request.id} is a duplicate: it is duplicating request(s) ['{canonical_request.id}']."
+        logger.debug(message)
+        self.add_error_execution_log(request, message)
return True
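Both branches of `verified_identity_cases` reduce to the same rule: pick a canonical request — the earliest `identity_verified_at` among verified duplicates, or the earliest `created_at` when nothing in the group is verified — and treat everything else as a duplicate. A simplified sketch of that rule, with a stand-in `Req` type and naive timestamps (not the Fides models):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Req:
    id: str
    created_at: datetime
    identity_verified_at: Optional[datetime] = None

def is_duplicate(request: Req, others: list[Req]) -> bool:
    # Mirrors verified_identity_cases: the canonical request is never a duplicate.
    verified = [r for r in others if r.identity_verified_at]
    if not request.identity_verified_at:
        if verified:
            return True  # unverified request duplicating a verified one
        canonical = min(others, key=lambda r: r.created_at)
        return request.created_at >= canonical.created_at
    if not verified:
        return False  # first verified request in the group
    canonical = min(verified, key=lambda r: r.identity_verified_at)
    return request.identity_verified_at >= canonical.identity_verified_at
```

So an unverified request always loses to a verified one, and timestamp ties resolve in favor of the existing group members.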

# pylint: disable=too-many-return-statements
-    def is_duplicate_request(
-        self, request: PrivacyRequest, config: DuplicateDetectionSettings
-    ) -> bool:
+    def is_duplicate_request(self, request: PrivacyRequest) -> bool:
"""
Determine if a request is a duplicate request and assigns a duplicate request group id.

@@ -281,52 +321,44 @@ def is_duplicate_request(
Returns:
True if the request is a duplicate request, False otherwise
"""
-        duplicates = self.find_duplicate_privacy_requests(request, config)
-        rule_version = generate_rule_version(config)
+        duplicates = self.find_duplicate_privacy_requests(request)
+        rule_version = generate_rule_version(
+            DuplicateDetectionSettings(
+                enabled=self.config.enabled,
+                time_window_days=self.config.time_window_days,
+                match_identity_fields=self.config.match_identity_fields,
+            )
+        )
        try:
-            dedup_key = self.generate_dedup_key(request, config)
+            dedup_key = self.generate_dedup_key(request)
        except ValueError as e:
-            logger.debug(f"Request {request.id} is not a duplicate: {e}")
+            message = f"Request {request.id} is not a duplicate: {e}"
+            logger.debug(message)
+            self.add_success_execution_log(request, message)
return False

_, duplicate_group = DuplicateGroup.get_or_create(
db=self.db, data={"rule_version": rule_version, "dedup_key": dedup_key}
)
if duplicate_group is None:
-            logger.error(
-                f"Failed to create duplicate group for request {request.id} with dedup key {dedup_key}"
-            )
+            message = f"Failed to create duplicate group for request {request.id} with dedup key {dedup_key}"
+            logger.error(message)
+            self.add_error_execution_log(request, message)
            return False
-        logger.info(
-            f"Duplicate group {duplicate_group.id} created for request {request.id} with dedup key {dedup_key}"
-        )
-        request.update(
-            db=self.db, data={"duplicate_request_group_id": duplicate_group.id}
-        )
+
+        self.update_duplicate_group_ids(request, duplicates, duplicate_group.id)  # type: ignore [arg-type]

# if this is the only request in the group, it is not a duplicate
if len(duplicates) == 0:
-            logger.debug(
-                f"Request {request.id} is not a duplicate: no matching requests were found."
-            )
+            message = f"Request {request.id} is not a duplicate."
+            logger.debug(message)
+            self.add_success_execution_log(request, message)
return False

if request.status == PrivacyRequestStatus.duplicate:
-            logger.warning(
-                f"Request {request.id} is a duplicate request that was requeued. This should not happen."
-            )
-            request.add_error_execution_log(
-                db=self.db,
-                connection_key=None,
-                dataset_name="Duplicate Request Detection",
-                collection_name=None,
-                message=f"Request {request.id} is a duplicate request that was requeued. This should not happen.",
-                action_type=(
-                    request.policy.get_action_type()  # type: ignore [arg-type]
-                    if request.policy
-                    else ActionType.access
-                ),
-            )
+            message = f"Request {request.id} is a duplicate request that was requeued. This should not happen."
+            logger.warning(message)
+            self.add_error_execution_log(request, message)
return True

# only compare to non-duplicate requests for the following cases
Expand All @@ -337,9 +369,9 @@ def is_duplicate_request(
]
# If no non-duplicate requests are found, this request is not a duplicate.
if len(canonical_requests) == 0:
-            logger.debug(
-                f"Request {request.id} is not a duplicate: all matching requests have been marked as duplicate requests."
-            )
+            message = f"Request {request.id} is not a duplicate."
+            logger.debug(message)
+            self.add_success_execution_log(request, message)
return False

# If any requests in group are actioned, this request is a duplicate.
Expand All @@ -349,9 +381,9 @@ def is_duplicate_request(
if duplicate.status in ACTIONED_REQUEST_STATUSES
]
if len(actioned_in_group) > 0:
-            logger.debug(
-                f"Request {request.id} is a duplicate: it is duplicating actioned request(s) {actioned_in_group}."
-            )
+            message = f"Request {request.id} is a duplicate: it is duplicating actioned request(s) {[duplicate.id for duplicate in actioned_in_group]}."
+            logger.debug(message)
+            self.add_error_execution_log(request, message)
return True
# Check against verified identity rules.
return self.verified_identity_cases(request, canonical_requests)
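`is_duplicate_request` applies its checks in a fixed, short-circuiting order: bail out if the dedup key cannot be built, treat a lone group member as non-duplicate, flag requeued duplicates, check for actioned group members, then fall through to the verified-identity rules. That ordering can be summarized with boolean stand-ins (the flag names are illustrative, not the real parameters):

```python
def classify(has_all_identity_fields: bool, group_size: int,
             already_marked_duplicate: bool, group_has_actioned: bool,
             fails_verified_identity_rules: bool) -> bool:
    # Same short-circuit order as is_duplicate_request, reduced to flags.
    if not has_all_identity_fields:
        return False  # no dedup key can be built
    if group_size <= 1:
        return False  # only request in its duplicate group
    if already_marked_duplicate:
        return True   # requeued duplicate (logged as unexpected)
    if group_has_actioned:
        return True   # duplicating an already-actioned request
    return fails_verified_identity_rules

assert classify(False, 5, False, True, True) is False  # missing identity fields wins
assert classify(True, 2, False, True, False) is True   # actioned duplicate
```

The real method also assigns the duplicate group id and writes execution logs as side effects along the way; this sketch only captures the decision order.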
16 changes: 16 additions & 0 deletions src/fides/api/service/privacy_request/request_runner_service.py
@@ -71,6 +71,9 @@
get_attachments_content,
process_attachments_for_upload,
)
from fides.api.service.privacy_request.duplication_detection import (
DuplicateDetectionService,
)
from fides.api.service.storage.storage_uploader_service import upload
from fides.api.task.filter_results import filter_data_categories
from fides.api.task.graph_runners import access_runner, consent_runner, erasure_runner
@@ -446,6 +449,19 @@ def run_privacy_request(
logger.info("Terminating privacy request: request deleted.")
return

duplicate_detection_service = DuplicateDetectionService(session)
# Service initializes with ConfigProxy, so we can check if duplicate detection is enabled
if duplicate_detection_service.config.enabled:
Collaborator comment: Would it be better to just have a property duplicate_detection_service.enabled? I think the config should be a hidden implementation detail (i.e. if you wanted to test the service you can just stub out enabled=True, which is simpler). Also, you could move this check into is_duplicate_request and, if it's not enabled, just immediately return False.

logger.info(
"Duplicate detection is enabled. Checking if privacy request is a duplicate."
)
if duplicate_detection_service.is_duplicate_request(privacy_request):
logger.info("Terminating privacy request: request is a duplicate.")
privacy_request.update(
session, data={"status": PrivacyRequestStatus.duplicate}
)
return
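The runner-side gate added above is deliberately thin: construct the service, check the config flag, and short-circuit before any processing when the request is a duplicate. A stub sketch of that control flow (the class and names are stand-ins, not the Fides runner):

```python
class StubDetectionService:
    def __init__(self, enabled: bool, duplicate_ids: set[str]):
        self.enabled = enabled          # in Fides this comes from ConfigProxy
        self._duplicate_ids = duplicate_ids

    def is_duplicate_request(self, request_id: str) -> bool:
        return request_id in self._duplicate_ids

def run_request(request_id: str, service: StubDetectionService) -> str:
    # The duplicate check runs before dispatch; duplicates never start processing.
    if service.enabled and service.is_duplicate_request(request_id):
        return "duplicate"
    return "processed"

svc = StubDetectionService(enabled=True, duplicate_ids={"pr_1"})
assert run_request("pr_1", svc) == "duplicate"
assert run_request("pr_2", svc) == "processed"
```

This is also where the reviewer's suggestion would land: exposing `enabled` as a property (or folding the check into `is_duplicate_request`) keeps the config an implementation detail of the service.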

logger.info("Dispatching privacy request")
privacy_request.start_processing(session)
