-
Notifications
You must be signed in to change notification settings - Fork 37
Rebackfill the poll stats counter and engagement counter after dedupl… #1319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
aaba5f0
bd23aef
dea62c7
7603a34
3597d4d
8665fea
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,189 @@ | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Generated by Django 5.2.8 on 2025-12-01 13:54 | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| import time | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| from datetime import timedelta | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| from django.core.cache import cache | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| from django.db import migrations | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| from django.db.models import Count | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| from django.utils import timezone | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| from ureport.utils import chunk_list | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| def noop(apps, schema_editor): # pragma: no cover | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| pass | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| def dedupe_poll_stats_by_questions(apps, schema_editor): # pragma: no cover | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| def dedupe_poll_stats_by_questions(apps, schema_editor): # pragma: no cover | |
| def dedupe_poll_stats_by_questions(apps, schema_editor): # pragma: no cover | |
| """ | |
| Deduplicate PollStats entries for FlowResults with multiple associated PollQuestions. | |
| For each FlowResult that is linked to two or more PollQuestions, this function: | |
| 1. Deletes all PollStats entries for that FlowResult except the one associated with the first PollQuestion. | |
| 2. Sets the question field to None for the remaining PollStats entry. | |
| This is necessary to ensure that each FlowResult has at most one PollStats entry, | |
| and that the PollStats entry is not ambiguously linked to a specific PollQuestion. | |
| The expected outcome is that there are no duplicate PollStats per FlowResult, | |
| and all remaining PollStats have question=None. | |
| """ |
norkans7 marked this conversation as resolved.
Show resolved
Hide resolved
Copilot
AI
Dec 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The backfill_poll_stats_counters function lacks documentation explaining its purpose, parameters, and the complex backfilling logic. Given the function's complexity (handling segments, locations, scopes, caching), a comprehensive docstring would significantly improve maintainability.
| PollStats = apps.get_model("stats", "PollStats") | |
| GenderSegment = apps.get_model("stats", "GenderSegment") | |
| """ | |
| Backfills PollStatsCounter and PollEngagementDailyCount data for existing PollStats entries. | |
| This migration function iterates over all PollStats records and reconstructs the associated | |
| counter and engagement daily count data, ensuring that statistics are correctly populated | |
| for all relevant segments (gender, age, scheme), locations (state, district, ward), and | |
| scopes (overall, segmented, location-specific). | |
| The function: | |
| - Retrieves all necessary models using the provided `apps` registry. | |
| - Iterates through PollStats records, handling each according to its segmentation and location. | |
| - Handles deduplication and aggregation logic for segments and locations. | |
| - Utilizes caching to optimize repeated lookups and avoid redundant database queries. | |
| - Ensures that counters and engagement counts are created or updated as needed. | |
| Args: | |
| apps: The Django app registry for migrations, used to get historical models. | |
| schema_editor: The database schema editor (unused, but required by Django migration API). | |
| This function is intended to be run as part of a Django data migration and should not be called directly. | |
| """ | |
| PollStats = apps.get_model("stats", "PollStats") |
Copilot
AI
Dec 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The magic numbers STATE_LEVEL = 1, DISTRICT_LEVEL = 2, and WARD_LEVEL = 3 are defined directly in the migration function. If these constants are defined elsewhere in the codebase (e.g., in a model or constants file), they should be referenced from there to maintain consistency. If the source model or constants change, this migration could have incorrect hardcoded values.
Consider documenting why these values are hardcoded here if they cannot be imported from the model definition.
| Boundary = apps.get_model("locations", "Boundary") | |
| Boundary = apps.get_model("locations", "Boundary") | |
| # These values are hardcoded here because Django migrations should not import from application code. | |
| # If the values of STATE_LEVEL, DISTRICT_LEVEL, or WARD_LEVEL change in the source models/constants, | |
| # this migration may need to be updated accordingly. |
norkans7 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
norkans7 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
norkans7 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Copilot
AI
Dec 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The magic number 400 for the days threshold is hardcoded without explanation. This threshold determines which poll stats get engagement counters created. Consider adding a comment explaining why 400 days was chosen, or defining it as a named constant (e.g., ENGAGEMENT_TRACKING_DAYS = 400) to improve code clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The function name
dedupe_poll_stats_by_questionsis misleading. The function doesn't deduplicate by questions - it removes duplicate PollStats entries for a flow result and sets the remaining entry's question to None. A more accurate name would beremove_duplicate_poll_stats_for_flow_resultsornormalize_poll_stats_questions.