WIP - Reject duplicate submissions #876

Closed
wants to merge 4 commits into from
Closed

Conversation

noliveleger
Contributor

@noliveleger noliveleger commented May 8, 2023

Description

TBC

Additional info

would supersede #859

@noliveleger noliveleger requested a review from jnm May 8, 2023 19:46
@noliveleger noliveleger force-pushed the duplicate-submissions branch 3 times, most recently from 9b39237 to 53c6320 Compare May 9, 2023 18:45
@noliveleger noliveleger force-pushed the duplicate-submissions branch from 53c6320 to 019e7bf Compare May 9, 2023 18:47
@@ -52,7 +52,7 @@ jobs:
       - name: Install Python dependencies
         run: pip-sync dependencies/pip/dev_requirements.txt
       - name: Run pytest
-        run: pytest -vv -rf
+        run: pytest -vv -rf --disable-warnings
Member

🤔

@@ -32,6 +32,7 @@ test:
     POSTGRES_PASSWORD: kobo
     POSTGRES_DB: kobocat_test
     SERVICE_ACCOUNT_BACKEND_URL: redis://redis_cache:6379/4
+    GIT_LAB: "True"
Member

maybe something more descriptive like SKIP_TESTS_WITH_CONCURRENCY

@@ -40,7 +41,7 @@ test:
   script:
     - apt-get update && apt-get install -y ghostscript gdal-bin libproj-dev gettext openjdk-11-jre
     - pip install -r dependencies/pip/dev_requirements.txt
-    - pytest -vv -rf
+    - pytest -vv -rf --disable-warnings
Member

🤔

Comment on lines +90 to +94
fakeredis.FakeStrictRedis(),
):
with patch(
'onadata.apps.django_digest_backends.cache.RedisCacheNonceStorage._get_cache',
fakeredis.FakeStrictRedis,
Member

why is one instantiated and the other isn't?
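
One possible explanation, sketched with stand-in classes rather than the real `fakeredis` and `RedisCacheNonceStorage` (so `FakeClient`, `Storage`, and `demo` are all hypothetical names): `patch` swaps in whatever object it is given. A target that is read as a plain attribute needs a ready-made instance, whereas `_get_cache` looks like a method that gets called, so patching it with the class itself makes each call return a fresh instance. Whether that matches the first patch target in this diff is an assumption, since the excerpt does not show it.

```python
from unittest.mock import patch


class FakeClient:
    """Hypothetical stand-in for fakeredis.FakeStrictRedis."""


class Storage:
    """Hypothetical stand-in for a class that talks to Redis."""

    client = None  # read directly as an attribute, never called

    def _get_cache(self):  # called to obtain a client
        raise RuntimeError("would connect to a real server")


def demo():
    # Attribute target: the replacement must already be an instance.
    with patch.object(Storage, "client", FakeClient()):
        attr_is_instance = isinstance(Storage.client, FakeClient)

    # Callable target: the replacement is the class, so calling the
    # patched name constructs and returns a new instance.
    with patch.object(Storage, "_get_cache", FakeClient):
        call_returns_instance = isinstance(Storage()._get_cache(), FakeClient)

    return attr_is_instance, call_returns_instance
```

If both targets in the diff are in fact called the same way, the asymmetry would be a genuine inconsistency rather than an intentional choice.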

results[result] += 1

assert results[status.HTTP_201_CREATED] == 1
assert results[status.HTTP_409_CONFLICT] == DUPLICATE_SUBMISSIONS_COUNT - 1
Member

does the OpenRosa spec allow returning a 409? and do Enketo and Collect handle a 409 properly? i can't find the code, but i remember wanting to return a 40x that wasn't 400 but being forced to use 400 only because without it Collect wouldn't display the error message i was sending. could've been Enketo, though, or i might be misremembering entirely

Comment on lines +171 to +176
# The start-time requirement protected submissions with identical responses
# from being rejected as duplicates *before* KoBoCAT had the concept of
# submission UUIDs. Nowadays, OpenRosa requires clients to send a UUID (in
# `<instanceID>`) within every submission; if the incoming XML has a UUID
# and still exactly matches an existing submission, it's certainly a
# duplicate (https://docs.opendatakit.org/openrosa-metadata/#fields).
Member

We should reject outright any submission without a UUID
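
A minimal sketch of such a check, assuming the submission arrives as an XML string and that matching on the element's local name `instanceID` is good enough (real submissions may use namespaces, which is why the tag is compared without its namespace part). `MissingInstanceID` and `get_instance_id` are hypothetical names, not KoBoCAT code.

```python
import xml.etree.ElementTree as ET


class MissingInstanceID(ValueError):
    """Hypothetical error for a submission that carries no <instanceID>."""


def get_instance_id(xml_text):
    """Return the text of the first non-empty <instanceID>, or raise."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Drop any "{namespace}" prefix before comparing the tag name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "instanceID" and element.text and element.text.strip():
            return element.text.strip()
    raise MissingInstanceID("submission has no <instanceID>")
```

A caller could catch `MissingInstanceID` before any hashing or locking happens and return an error response right away.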

Comment on lines +189 to +194
if existing_instance:
existing_instance.check_active(force=False)
# ensure we have saved the extra attachments
new_attachments = save_attachments(existing_instance, media_files)
if not new_attachments:
raise DuplicateInstanceError()
Member

@jnm jnm Jun 14, 2023

Wouldn't ConflictingXMLHashInstanceError be raised already and block us from getting here? Nevermind; I misunderstood, and that exception only has to do with concurrent processing of submissions with identical XML

Comment on lines +229 to +238
if is_postgresql:
cur = connection.cursor()
cur.execute('SELECT pg_try_advisory_lock(%s::bigint);', (int_lock,))
acquired = cur.fetchone()[0]
else:
prefix = os.getenv('KOBOCAT_REDIS_LOCK_PREFIX', 'kc-lock')
key_ = f'{prefix}:{int_lock}'
redis_lock = settings.REDIS_LOCK_CLIENT.lock(key_, timeout=60)
acquired = redis_lock.acquire(blocking=False)
yield acquired
Member

I don't think support for something other than PostgreSQL is needed. We already depend on Postgres for many things.

Moot point, then, but just to say it: I think all os.getenv() for configuration (and default values) should be in the settings files
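
For illustration, a Postgres-only version of the lock could be as small as the sketch below. It assumes a DB-API style cursor (for example the one returned by Django's `connection.cursor()`), and the names `lock_key` and `try_advisory_lock` are hypothetical. The excerpt above does not show how the lock is released, so the `pg_advisory_unlock` call here is an addition of this sketch, as is deriving the key from a SHA-256 digest.

```python
import hashlib
from contextlib import contextmanager


def lock_key(xml_hash):
    """Map a string to a signed 64-bit integer, the range of a Postgres bigint."""
    digest = hashlib.sha256(xml_hash.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


@contextmanager
def try_advisory_lock(cursor, key):
    """Yield True if the session-level advisory lock was obtained, else False."""
    cursor.execute("SELECT pg_try_advisory_lock(%s::bigint);", (key,))
    acquired = cursor.fetchone()[0]
    try:
        yield acquired
    finally:
        # Only release what this call actually acquired.
        if acquired:
            cursor.execute("SELECT pg_advisory_unlock(%s::bigint);", (key,))
```

With no Redis branch there is also no `os.getenv()` call left in this code path, which sidesteps the settings question.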

Comment on lines -544 to +581
except DuplicateInstance:
response = OpenRosaResponse(t("Duplicate submission"))
except ConflictingXMLHashInstanceError:
response = OpenRosaResponse(t('Conflict with already existing instance'))
response.status_code = 409
response['Location'] = request.build_absolute_uri(request.path)
error = response
except DuplicateInstanceError:
response = OpenRosaResponse(t('Duplicate instance'))
Member

@jnm jnm Jun 14, 2023

A conflicting XML hash with no additional attachments is the same thing as a duplicate instance. If the XML is the same but there are additional attachments, no exception should be raised; this is normal operation. I don't think a new exception class is needed. I also don't think that DuplicateInstanceError can be reached anymore, but that's addressed by a different comment [was based on a misunderstanding]

Member

@jnm jnm Jul 6, 2023

The thing that confused me about this is that the verbiage is wrong. It's not really a conflict with an existing instance, it's unwanted concurrent processing of two (or more) instances with identical XML.

I think ideally we should wait (a short amount of time) to try to acquire the lock before returning an error. I don't know what happens in Enketo now, but it's easy to imagine that a client would asynchronously send the following requests concurrently:

  1. Submission XML (hash abc123) + dog.jpg
  2. Submission XML (hash abc123) + cat.jpg
  3. Submission XML (hash abc123) + gecko.jpg

Ideally, all three requests would succeed. Let's assume request 1 arrives first and is still processing while requests 2 and 3 are received by the server: what I understand this PR would do is reject requests 2 and 3 immediately with an error code. I think we should wait (again, briefly) for request 1 to finish [1] before returning an immediate rejection.

If the waiting takes too long, then we do have to reject in order to avoid sapping the worker pool with useless spinning. The message we return should be effectively "try again later", not something about a conflict with an existing instance. The HTTP code has to be dependent on the OpenRosa specification and compatible with Enketo and ODK Collect. Hopefully, those requirements are one and the same, but we'll have to test.

Footnotes

  1. Addendum: Whoops, we don't really need to spin until request 1 finishes, we just need to wait until the row has been created in logger_instance. Imagine one POST has 50 files: locking while all 50 are written to storage is not what we want. We should still avoid storing the same attachment multiple times for a submission, so within a single submission, we should have some kind of attachment uniqueness constraint (if we don't already).
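
A rough sketch of the "wait briefly, then give up" idea: poll a non-blocking acquire function until it succeeds or a deadline passes. `acquire_with_wait` is a hypothetical helper, and `try_acquire` stands for any callable that returns True or False without blocking (a wrapper around `pg_try_advisory_lock`, for instance). The default timeout and polling interval are placeholders, not tuned values.

```python
import time


def acquire_with_wait(try_acquire, timeout=2.0, interval=0.05,
                      sleep=time.sleep, clock=time.monotonic):
    """Retry try_acquire() until it returns True or `timeout` seconds pass.

    Returns True if the lock was obtained, False if the caller should
    answer with a "try again later" style response instead.
    """
    deadline = clock() + timeout
    while True:
        if try_acquire():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Bounding the wait keeps a worker from spinning indefinitely. Per the addendum, the lock would only need to cover creation of the `logger_instance` row, not the writing of every attachment.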

@tiritea tiritea Sep 4, 2024

> Let's assume request 1 arrives first and is still processing while requests 2 and 3 are received by the server: what I understand this PR would do is reject requests 2 and 3 immediately with an error code.

So my interpretation of the spec would be that we would only need to (immediately!) reject if the same submission XML was subsequently received [i.e., duplicate hash] and it includes no attachment (!). Because a client should never be attempting to resend just the submission XML on its own more than once.

Whereas if the same submission XML hash was received - but it included an attachment (or multiple!) - then we can decide whether or not to reject based on whether any of those attachments have already been received (or are currently being processed). [and just throw away the XML since we know it's already been processed, or is currently being processed...]

So any locking would minimally only need to occur for the duration of calculating (then checking for existing, then storing if not) the submission XML hash, right?
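
That interpretation can be written down as a small decision function. This is only a sketch of the reasoning in this comment, not of what the PR does: `classify` and its return labels are hypothetical, and attachments are identified by bare file name for simplicity.

```python
def classify(xml_hash_known, incoming, stored):
    """Decide what to do with a submission.

    xml_hash_known: True if this XML hash has already been received.
    incoming: set of attachment names sent with this request.
    stored: set of attachment names already saved (or being processed).
    Returns (action, attachments_to_save).
    """
    if not xml_hash_known:
        # First time this XML is seen: store it with all its attachments.
        return "create", set(incoming)
    new_attachments = set(incoming) - set(stored)
    if not new_attachments:
        # Same XML again with nothing new to add: treat as a duplicate.
        return "reject", set()
    # Same XML, but new files: discard the XML and keep only the new files.
    return "attach", new_attachments
```

Under this reading, the three concurrent requests with `dog.jpg`, `cat.jpg`, and `gecko.jpg` would yield one "create" and two "attach" results regardless of arrival order, as long as the hash check itself is done under the lock.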

@@ -1,6 +1,7 @@
# coding: utf-8
import os

from fakeredis import FakeConnection, FakeStrictRedis, FakeServer
Member

Ah, was the support for locking without PostgreSQL exclusively for unit testing?

Member

Unit testing, at least on GitLab, is already using PostgreSQL, so I don't think we need to support any non-Postgres locking mechanisms for this

@noliveleger
Contributor Author

Closed in favor of kobotoolbox/kpi#5047

rajpatel24 added a commit to kobotoolbox/kpi that referenced this pull request Jan 15, 2025
## Summary
Implemented logic to detect and reject duplicate submissions.

## Description

We have identified a race condition in the submission processing that
causes duplicate submissions with identical UUIDs and XML hashes. This
issue is particularly problematic under conditions with multiple remote
devices submitting forms simultaneously over unreliable networks.

To address this issue, a PR has been raised with the following proposed
changes:

- Race Condition Resolution: A locking mechanism has been added to
prevent the race condition when checking for existing instances and
creating new ones. This aims to eliminate duplicate submissions.

- UUID Enforcement: Submissions without a UUID are now explicitly
disallowed. This ensures that every submission is uniquely identifiable
and further mitigates the risk of duplicate entries.

- Introduction of `root_uuid`:

  - To ensure a consistent submission UUID throughout its lifecycle and
    prevent duplicate submissions with the same UUID, a new `root_uuid`
    column has been added to the `Instance` model with a unique constraint
    (`root_uuid` per `xform`).

  - If the `<meta><rootUuid>` is present in the submission XML, it is
    stored in the `root_uuid` column.

  - If `<meta><rootUuid>` is not present, the value from
    `<meta><instanceID>` is used instead.

  - This approach guarantees that the `root_uuid` remains constant across
    the lifecycle of a submission, providing a reliable identifier for all
    instances.

- UUID Handling Improvement: Updated the logic to strip only the `uuid:`
  prefix while preserving custom, non-UUID ID schemes (e.g.,
  domain.com:1234). This ensures compliance with the OpenRosa spec and
  prevents potential ID collisions with custom prefixes.

- Error Handling:
  - 202 Accepted: Returns when content is identical to an existing
    submission and successfully processed.
  - 409 Conflict: Returns when a duplicate UUID is detected but with
    differing content.
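
As an illustration of the `root_uuid` fallback and prefix handling described in the list above, here is a minimal sketch. It is not the code from kobotoolbox/kpi#5047: `resolve_root_uuid` and its helpers are hypothetical names, and matching elements by local name anywhere in the document (rather than strictly under `<meta>`) is a simplifying assumption.

```python
import xml.etree.ElementTree as ET


def _first_text(root, name):
    """Return the stripped text of the first element with this local name."""
    for element in root.iter():
        if element.tag.rsplit("}", 1)[-1] == name:
            text = (element.text or "").strip()
            if text:
                return text
    return None


def strip_uuid_prefix(value):
    """Remove only a leading 'uuid:'; other ID schemes are left untouched."""
    prefix = "uuid:"
    return value[len(prefix):] if value.startswith(prefix) else value


def resolve_root_uuid(xml_text):
    """Prefer <rootUuid>, fall back to <instanceID>, reject if both are absent."""
    root = ET.fromstring(xml_text)
    value = _first_text(root, "rootUuid") or _first_text(root, "instanceID")
    if value is None:
        raise ValueError("submission has neither <rootUuid> nor <instanceID>")
    return strip_uuid_prefix(value)
```

For example, an `<instanceID>` of `domain.com:1234` is returned unchanged, while `uuid:aaa` becomes `aaa`.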

These changes should improve the robustness of the submission process
and prevent both race conditions and invalid submissions.

## Notes

- Implemented a fix to address the race condition that leads to
duplicate submissions with the same UUID and XML hash.
- Incorporated improvements from existing work, ensuring consistency and
robustness in handling concurrent submissions.
- The fix aims to prevent duplicate submissions, even under high load
and unreliable network conditions.

## Related issues

Supersedes
[kobotoolbox/kobocat#876](kobotoolbox/kobocat#876)
and kobotoolbox/kobocat#859

---------

Co-authored-by: Olivier Leger <[email protected]>
magicznyleszek added a commit to kobotoolbox/kpi that referenced this pull request Jan 17, 2025
commit 11ee676
Author: Rebecca Graber <[email protected]>
Date:   Thu Jan 16 08:47:09 2025 -0500

    feat(projectHistoryLogs): log new submissions (#5416)

    ### 📣 Summary
    Create logs when new submissions are added to projects.

    ### 👷 Description for instance maintainers
    Allow null user_uids in AuditLogs so we can log anonymous submissions.

    ### 💭 Notes
    We previously had no need for null users in audit logs because the
    actions we logged were all restricted to authenticated users, but since
    we allow anonymous submissions, we needed a way to log those.

    ### 👀 Preview steps

    Feature/no-change template:
    1. ℹ️ have an account and a project. Make sure the account username is
    not `admin` (see [this notion
    task](https://www.notion.so/kobotoolbox/Anonymous-submissions-dont-work-if-user-named-admin-owns-asset-1767e515f65480608dfcee76ba9b3710?pvs=4))
    2. Deploy the project
    3. Add a submission to the project
    4. Go to `api/v2/asset/<asset-uid>/history`
    5. 🟢 There should be a new project history log with
    `action='add-submission'` and all the usual metadata, plus
    ```
    "submission": {
        "submitted_by": "user1"
    }
    ```
    6. Enable submissions without username/email to the project
    7. To make sure you're submitting anonymously, copy and paste the enketo
    link into a new private tab and add a new submission
    8. 🟢 Reload the endpoint. There should be a new audit log with
    `action='add-submission'`
      a. The user should be
      `http://kf.kobo.local:8080/api/v2/users/AnonymousUser/`
      b. The user_uid will be the uid of the anonymous user in the database
      c. The username should be `AnonymousUser`
      d. The metadata should contain `{"submission": {"submitted_by":
      "AnonymousUser"}}` in addition to the usual metadata

commit bbfdaf1
Author: olive-KTB <[email protected]>
Date:   Thu Jan 16 02:49:14 2025 +0100

    update gitlab-ci.yml

commit bc56f8f
Author: Akuukis <[email protected]>
Date:   Wed Jan 15 10:50:39 2025 +0200

    refactor(frontend): Mantine Component Library PoC (#5344)

    ### 💭 Notes

    Please read the PR commit-by-commit, here's a guide.

    1.
    [3105f47](3105f47)
    Setup Mantine Component Library. Please install the new dependencies,
    and add recommended VSCode extensions.
    2.
    [453f2c1](453f2c1)
    Here's a preview that Mantine in general works, and API for the example
    Button is different but generally similar.

    | our button | Mantine default button |
    |--------|--------|
    | ![image](https://github.com/user-attachments/assets/1f3e5736-c5eb-4cbf-a56e-a393dab561d3) | ![image](https://github.com/user-attachments/assets/c8f2a9dc-cf1a-44c3-881a-ff05da433060) |

    ![image](https://github.com/user-attachments/assets/e2abe407-3113-4900-9027-43186137ce13)

    3.
    [d782e38](d782e38)
    Example of custom styled component, Button. IMHO achieves pixel-perfect
    match in storybook and example above, except for line breaks, spinner
    and hover animations. Click animation matches out of box. Icon-only
    buttons are omitted because Mantine uses a different component
    `IconAction` for those.

    | Original on left / Mantine implementation on right |
    | --- |
    | ![image](https://github.com/user-attachments/assets/9a58b7af-3ff4-4b18-995c-6c57ff4264cb) |

    5.
    [eaf45e3](eaf45e3)
    Wrapped Button to add support for inbuilt Tooltip. No idea if we want to
    move forward with these two coupled, but I found it useful for
    comparison by re-implementing part of old Button behavior that's
    represented in storybook.

    | Original | Mantine implementation |
    | --- | --- |
    | ![image](https://github.com/user-attachments/assets/7bf2a504-902b-4801-a2dd-3e7648166316) | ![image](https://github.com/user-attachments/assets/41bc4dff-acf3-4328-ab6f-f7fcccbde20b) |

    ### 👀 Preview steps

    1. ℹ️ open Kobo home
    4. 🟢 [on main] notice the original "new" button
    6. 🟢 [on PR] notice the new "new" button with default mantine style,
    only slightly different

    ---------

    Co-authored-by: Leszek Pietrzak <[email protected]>
    Co-authored-by: Leszek <[email protected]>
    Co-authored-by: James Kiger <[email protected]>
    Co-authored-by: Paulo Amorim <[email protected]>
    Co-authored-by: James Kiger <[email protected]>

commit 9598180
Author: Raj Patel <[email protected]>
Date:   Wed Jan 15 13:29:03 2025 +0530

    fix!: reject duplicate submissions (#5047)

commit d22b8b5
Merge: 89bd9b7 b4aa1b7
Author: John N. Milner <[email protected]>
Date:   Tue Jan 14 15:20:49 2025 -0500

    Merge remote-tracking branch 'origin/release/2.024.36'

commit 89bd9b7
Author: Rebecca Graber <[email protected]>
Date:   Tue Jan 14 15:11:37 2025 -0500

    fix(auditLogs): correctly serialize audit logs from deleted users (#5418)

    ### 📣 Summary
    Fixes a 500 error from the various audit log endpoints when there are
    actions by deleted users.

    ### 📖 Description
    Return empty user and username fields in the response if the user was
    deleted after the log was created. This applies to `/api/v2/audit-logs`,
    `api/v2/assets/<uid>/history`, and `api/v2/project-history-logs`.

    ### 💭 Notes
    Small fix in the serializer. Also updates the ProjectHistoryLog
    serializer to inherit from the AuditLogSerializer so we don't have to
    duplicate the method fields.

    ### 👀 Preview steps

    Bug template:
    1. ℹ️ have a super user account and a project
    2. Create a new user (user1) and give them the `Edit Form` permission on
    the project.
    3. Log in as user1 and make an edit to the project.
    4. Log out user1 and log back in as the super user
    5. Delete user1. You can do this from the admin page if you delete the
    user from the User list, then from the Trash Bin.
    6. Go to:
      a. `api/v2/audit-logs`
      b. `api/v2/project-history-logs`
      c. `api/v2/assets/<uid>/history`
    7. 🔴 [on main] All will return a 500 error (`AttributeError: 'NoneType'
    object has no attribute 'username'`)
    8. 🟢 [on PR] The endpoint will return the expected logs. For all user1's
    actions, the user and username fields will be empty, but the user_uid
    should still refer to the old user.

commit b4aa1b7
Author: jnm <[email protected]>
Date:   Tue Jan 14 15:01:18 2025 -0500

    feat: export background-geopoint as GPS field (#5420)

    See kobotoolbox/formpack#327. This change just updates the formpack
    commit hash used by KPI

commit dddd619
Author: Olivier Léger <[email protected]>
Date:   Tue Jan 14 14:10:31 2025 -0500

    fix: catch additional XLSForm validation errors during deployment (#5419)

    ### 📣 Summary
    Enhanced error handling to catch more validation errors in XLSForm
    during deployment.

    ### 📖 Description
    Validation error handling for XLSForm deployment has been enhanced to
    catch a wider range of issues. This prevents the display of a generic
    500 error in the deployment modal and instead returns the explicit error
    message.

    ### Notes
    Supersedes #5417, #5411 and #5403

commit 9189ac9
Author: Olivier Léger <[email protected]>
Date:   Tue Jan 14 09:40:34 2025 -0500

    fix: handle case sensitivity for "Settings" sheet name with explicit error TASK-1353 (#5417)

    ### 📣 Summary
    Improved error handling for case-sensitive "Settings" sheet names.

    ### 📖 Description
    This update addresses an issue where a sheet named "Settings" with
    uppercase or mixed case letters causes unexpected behavior. An explicit
    error message is now raised to alert users of the case sensitivity,
    ensuring they can resolve the issue easily.

commit 8e8d6bb
Author: Rebecca Graber <[email protected]>
Date:   Mon Jan 13 15:02:04 2025 -0500

    test: rename admin user in fixture (#5415)

    ### 💭 Notes
    Developer-facing changes only. Changes the username of the admin user to
    `adminuser` in preparation for disallowing the name `admin` as part of
    https://www.notion.so/kobotoolbox/Anonymous-submissions-dont-work-if-user-named-admin-owns-asset-1767e515f65480608dfcee76ba9b3710