Skip to content

Conversation

@norkans7
Copy link
Contributor

@norkans7 norkans7 commented Dec 1, 2025

…icating the poll stats

Copilot AI review requested due to automatic review settings December 1, 2025 14:51
Copilot finished reviewing on behalf of norkans7 December 1, 2025 14:55
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new Django migration to deduplicate PollStats entries by questions and then re-backfill the PollStatsCounter and PollEngagementDailyCount tables with the deduplicated data. This migration is a follow-up to migration 0032, which initially backfilled these counter tables.

Key Changes:

  • Introduces a deduplication function that identifies FlowResults with multiple associated PollQuestions and consolidates their PollStats
  • Re-backfills PollStatsCounter and PollEngagementDailyCount entries after deduplication, using a separate cache key for resumability
  • Deletes and re-creates counters for stopped polls to ensure data consistency after deduplication
Comments suppressed due to low confidence (1)

ureport/stats/migrations/0033_backfill_poll_stats_counters_dedupes.py:41

  • This assignment to 'stats' is unnecessary as it is redefined before this value is used.
        stats = (

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov-commenter
Copy link

codecov-commenter commented Dec 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.20%. Comparing base (483da2b) to head (8665fea).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1319   +/-   ##
=======================================
  Coverage   83.20%   83.20%           
=======================================
  Files          49       49           
  Lines        6102     6102           
=======================================
  Hits         5077     5077           
  Misses       1025     1025           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@norkans7 norkans7 requested a review from Copilot December 1, 2025 15:29
Copilot finished reviewing on behalf of norkans7 December 1, 2025 15:30
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@norkans7 norkans7 requested a review from Copilot December 2, 2025 08:28
Copilot finished reviewing on behalf of norkans7 December 2, 2025 08:31
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 7 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +49 to +51
PollStats = apps.get_model("stats", "PollStats")

GenderSegment = apps.get_model("stats", "GenderSegment")
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The backfill_poll_stats_counters function lacks documentation explaining its purpose, parameters, and the complex backfilling logic. Given the function's complexity (handling segments, locations, scopes, caching), a comprehensive docstring would significantly improve maintainability.

Suggested change
PollStats = apps.get_model("stats", "PollStats")
GenderSegment = apps.get_model("stats", "GenderSegment")
"""
Backfills PollStatsCounter and PollEngagementDailyCount data for existing PollStats entries.
This migration function iterates over all PollStats records and reconstructs the associated
counter and engagement daily count data, ensuring that statistics are correctly populated
for all relevant segments (gender, age, scheme), locations (state, district, ward), and
scopes (overall, segmented, location-specific).
The function:
- Retrieves all necessary models using the provided `apps` registry.
- Iterates through PollStats records, handling each according to its segmentation and location.
- Handles deduplication and aggregation logic for segments and locations.
- Utilizes caching to optimize repeated lookups and avoid redundant database queries.
- Ensures that counters and engagement counts are created or updated as needed.
Args:
apps: The Django app registry for migrations, used to get historical models.
schema_editor: The database schema editor (unused, but required by Django migration API).
This function is intended to be run as part of a Django data migration and should not be called directly.
"""
PollStats = apps.get_model("stats", "PollStats")

Copilot uses AI. Check for mistakes.
count=stat.count,
)
engagement_counter_kwargs = dict()
if stat.date is not None and stat.date >= (timezone.now() - timedelta(days=400)):
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The magic number 400 for the days threshold is hardcoded without explanation. This threshold determines which poll stats get engagement counters created. Consider adding a comment explaining why 400 days was chosen, or defining it as a named constant (e.g., ENGAGEMENT_TRACKING_DAYS = 400) to improve code clarity.

Copilot uses AI. Check for mistakes.
poll_engagement_daily_count_ids_count = len(poll_engagement_daily_count_ids)

for batch in chunk_list(poll_engagement_daily_count_ids, 1000):
PollEngagementDailyCount.objects.filter(pk__in=list(batch)).delete()
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as with poll_stats_counters_ids: the batch iterator is consumed when converting to list on line 88, but chunk_list returns iterators that can only be consumed once. The batch should be converted to a list before the loop, not inside the filter() call.

The previous migration (0032) uses batch directly in the filter without wrapping in list(), which is the correct approach.

Suggested change
PollEngagementDailyCount.objects.filter(pk__in=list(batch)).delete()
PollEngagementDailyCount.objects.filter(pk__in=batch).delete()

Copilot uses AI. Check for mistakes.
pass


def dedupe_poll_stats_by_questions(apps, schema_editor): # pragma: no cover
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The function name dedupe_poll_stats_by_questions is misleading. The function doesn't deduplicate by questions - it removes duplicate PollStats entries for a flow result and sets the remaining entry's question to None. A more accurate name would be remove_duplicate_poll_stats_for_flow_results or normalize_poll_stats_questions.

Copilot uses AI. Check for mistakes.
pass


def dedupe_poll_stats_by_questions(apps, schema_editor): # pragma: no cover
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The dedupe_poll_stats_by_questions function lacks documentation explaining its purpose and the deduplication logic. Given the complexity of the operation (removing duplicate PollStats entries and nullifying question references), a docstring explaining why this is necessary and what the expected outcome is would improve maintainability.

Suggested change
def dedupe_poll_stats_by_questions(apps, schema_editor): # pragma: no cover
def dedupe_poll_stats_by_questions(apps, schema_editor): # pragma: no cover
"""
Deduplicate PollStats entries for FlowResults with multiple associated PollQuestions.
For each FlowResult that is linked to two or more PollQuestions, this function:
1. Deletes all PollStats entries for that FlowResult except the one associated with the first PollQuestion.
2. Sets the question field to None for the remaining PollStats entry.
This is necessary to ensure that each FlowResult has at most one PollStats entry,
and that the PollStats entry is not ambiguously linked to a specific PollQuestion.
The expected outcome is that there are no duplicate PollStats per FlowResult,
and all remaining PollStats have question=None.
"""

Copilot uses AI. Check for mistakes.
Comment on lines 79 to 80
for batch in chunk_list(poll_stats_counters_ids, 1000):
PollStatsCounter.objects.filter(pk__in=list(batch)).delete()
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chunk_list function returns an iterator of iterators, not lists. On line 80, calling list(batch) only converts the outer iterator to a list, but each item in that list is still an iterator that can only be consumed once. This is problematic because the iterator is consumed when creating the list on line 112 (batch_ids = list(batch)), leaving nothing for subsequent iterations.

The same issue exists on line 88. The batch should be converted to a list before being used in the delete query filter.

This differs from the previous migration (0032) where batch is used directly without converting to a list twice, which works correctly.

Copilot uses AI. Check for mistakes.

Poll = apps.get_model("polls", "Poll")
PollQuestion = apps.get_model("polls", "PollQuestion")
Boundary = apps.get_model("locations", "Boundary")
Copy link

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The magic numbers STATE_LEVEL = 1, DISTRICT_LEVEL = 2, and WARD_LEVEL = 3 are defined directly in the migration function. If these constants are defined elsewhere in the codebase (e.g., in a model or constants file), they should be referenced from there to maintain consistency. If the source model or constants change, this migration could have incorrect hardcoded values.

Consider documenting why these values are hardcoded here if they cannot be imported from the model definition.

Suggested change
Boundary = apps.get_model("locations", "Boundary")
Boundary = apps.get_model("locations", "Boundary")
# These values are hardcoded here because Django migrations should not import from application code.
# If the values of STATE_LEVEL, DISTRICT_LEVEL, or WARD_LEVEL change in the source models/constants,
# this migration may need to be updated accordingly.

Copilot uses AI. Check for mistakes.
@norkans7 norkans7 merged commit 8a24dd7 into main Dec 2, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants