fix: run cohortpeople count query without joins #39925

pl · 2025-10-19T00:33:30Z

Problem

We join cohortpeople count queries with persons table to validate the team ID, which is super slow for large cohorts.

Changes

Changes the count query to run a separate query to check the team id.

How did you test this code?

Added a test to verify team isolation.

This reverts commit d4d41c3.

greptile-apps

_{2 files reviewed, 2 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2025-10-19T00:42:47Z

posthog/models/cohort/util.py

+        return 0
+
+    # Then count cohortpeople (in persons DB, no cross-DB join)
+    count = CohortPeople.objects.filter(cohort_id=cohort_id).count()


logic: Missing team_id filter in count query causes incorrect counts in multi-team projects.

When recalculate_cohortpeople runs for a project with multiple teams (environments), it calls this function for each team. The same cohort_id can have CohortPeople entries with different team_id values. Without filtering by team_id, all teams get the same total count instead of team-specific counts.

The ClickHouse table has team_id but the Django model doesn't expose it. Use raw SQL to filter properly:

Suggested change

count = CohortPeople.objects.filter(cohort_id=cohort_id).count()

# Then count cohortpeople filtering by team_id (in persons DB, no cross-DB join)

result = sync_execute(

f"SELECT count() FROM {PERSON_STATIC_COHORT_TABLE} WHERE cohort_id = %(cohort_id)s AND team_id = %(team_id)s",

{"cohort_id": cohort_id, "team_id": team_id}

)

count = result[0][0] if result else 0

Prompt To Fix With AI

This is a comment left during a code review. Path: posthog/models/cohort/util.py Line: 428:428 Comment: **logic:** Missing `team_id` filter in count query causes incorrect counts in multi-team projects. When `recalculate_cohortpeople` runs for a project with multiple teams (environments), it calls this function for each team. The same `cohort_id` can have CohortPeople entries with different `team_id` values. Without filtering by `team_id`, all teams get the same total count instead of team-specific counts. The ClickHouse table has `team_id` but the Django model doesn't expose it. Use raw SQL to filter properly: ```suggestion # Then count cohortpeople filtering by team_id (in persons DB, no cross-DB join) result = sync_execute( f"SELECT count() FROM {PERSON_STATIC_COHORT_TABLE} WHERE cohort_id = %(cohort_id)s AND team_id = %(team_id)s", {"cohort_id": cohort_id, "team_id": team_id} ) count = result[0][0] if result else 0 ``` How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2025-10-19T00:42:48Z

posthog/test/test_cohort_model.py

+    def test_static_cohort_size_validates_team(self):
+        from posthog.models.cohort.util import get_static_cohort_size
+
+        # Create another team in the same organization (different project)
+        team2 = Team.objects.create(organization=self.organization)
+
+        # Create people in both teams
+        Person.objects.create(team=self.team, distinct_ids=["person1_team1"])
+        Person.objects.create(team=self.team, distinct_ids=["person2_team1"])
+        Person.objects.create(team=team2, distinct_ids=["person1_team2"])
+
+        # Create a static cohort in team1
+        cohort = Cohort.objects.create(team=self.team, is_static=True, name="test cohort")
+        cohort.insert_users_by_list(["person1_team1", "person2_team1"])
+
+        # Count should work for the correct team
+        count_correct = get_static_cohort_size(cohort_id=cohort.pk, team_id=self.team.pk)
+        assert count_correct == 2
+
+        # Count should be 0 for a different team (validates team ownership)
+        count_wrong_team = get_static_cohort_size(cohort_id=cohort.pk, team_id=team2.pk)
+        assert count_wrong_team == 0


logic: Test doesn't cover the multi-team-in-same-project scenario where the bug occurs.

The current test validates team isolation across different organizations, but doesn't test the case where multiple teams in the same project share a cohort. In recalculate_cohortpeople, all teams in a project get CohortPeople entries for the same cohort_id but with different team_id values.

Add a test case like:

# Create two teams in the same project team2 = Team.objects.create(organization=self.organization, project_id=self.team.project_id) # Add people to the cohort from both teams cohort.insert_users_by_list(["person1_team1"], team_id=self.team.pk) cohort.insert_users_by_list(["person1_team2"], team_id=team2.pk) # Each team should see only their count assert get_static_cohort_size(cohort_id=cohort.pk, team_id=self.team.pk) == 1 assert get_static_cohort_size(cohort_id=cohort.pk, team_id=team2.pk) == 1

Prompt To Fix With AI

This is a comment left during a code review. Path: posthog/test/test_cohort_model.py Line: 342:363 Comment: **logic:** Test doesn't cover the multi-team-in-same-project scenario where the bug occurs. The current test validates team isolation across different organizations, but doesn't test the case where multiple teams in the same project share a cohort. In `recalculate_cohortpeople`, all teams in a project get CohortPeople entries for the same `cohort_id` but with different `team_id` values. Add a test case like: ```python # Create two teams in the same project team2 = Team.objects.create(organization=self.organization, project_id=self.team.project_id) # Add people to the cohort from both teams cohort.insert_users_by_list(["person1_team1"], team_id=self.team.pk) cohort.insert_users_by_list(["person1_team2"], team_id=team2.pk) # Each team should see only their count assert get_static_cohort_size(cohort_id=cohort.pk, team_id=self.team.pk) == 1 assert get_static_cohort_size(cohort_id=cohort.pk, team_id=team2.pk) == 1 ``` How can I resolve this? If you propose a fix, please make it concise.

pl added 2 commits October 18, 2025 18:25

revert: join cohort people counts with cohorts, not people (#39922)

d3fc67b

This reverts commit d4d41c3.

fix: run cohortpeople count query without joins

27ed247

pl requested review from a team and jose-sequeira October 19, 2025 00:33

posthog-project-board-bot bot moved this to In Review in Feature Flags Oct 19, 2025

posthog-project-board-bot bot added this to Feature Flags Oct 19, 2025

jose-sequeira approved these changes Oct 19, 2025

View reviewed changes

github-project-automation bot moved this from In Review to Approved in Feature Flags Oct 19, 2025

Base automatically changed from pl/revert_cohort_count to master October 19, 2025 00:41

greptile-apps bot reviewed Oct 19, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: run cohortpeople count query without joins #39925

fix: run cohortpeople count query without joins #39925

pl commented Oct 19, 2025

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Oct 19, 2025

Uh oh!

greptile-apps bot Oct 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-    count = CohortPeople.objects.filter(cohort_id=cohort_id).count()
+    # Then count cohortpeople filtering by team_id (in persons DB, no cross-DB join)
+    result = sync_execute(
+        f"SELECT count() FROM {PERSON_STATIC_COHORT_TABLE} WHERE cohort_id = %(cohort_id)s AND team_id = %(team_id)s",
+        {"cohort_id": cohort_id, "team_id": team_id}
+    )
+    count = result[0][0] if result else 0

fix: run cohortpeople count query without joins #39925

Are you sure you want to change the base?

fix: run cohortpeople count query without joins #39925

Conversation

pl commented Oct 19, 2025

Problem

Changes

How did you test this code?

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Oct 19, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Oct 19, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants