perf(RHINENG-25330): rewrite get_host_counts_batch to avoid full hosts table scan#3847
Open
rodrigonull wants to merge 1 commit intomasterfrom
Open
perf(RHINENG-25330): rewrite get_host_counts_batch to avoid full hosts table scan#3847rodrigonull wants to merge 1 commit intomasterfrom
rodrigonull wants to merge 1 commit intomasterfrom
Conversation
Contributor
SC Environment Impact AssessmentOverall Impact: ⚪ NONE No SC Environment-specific impacts detected in this PR. What was checkedThis PR was automatically scanned for:
|
Contributor
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The new "total minus excluded" logic partially reimplements the non-culled filter; consider reusing
find_non_culled_hostsor extracting a shared helper so the business rules for culled/non-culled hosts cannot drift over time. - When computing
count_map[gid_str] = total_map.get(gid_str, 0) - excluded_map.get(gid_str, 0), consider clamping at zero (e.g.max(..., 0)) or asserting non-negativity to guard against any edge cases where the two queries become inconsistent.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new "total minus excluded" logic partially reimplements the non-culled filter; consider reusing `find_non_culled_hosts` or extracting a shared helper so the business rules for culled/non-culled hosts cannot drift over time.
- When computing `count_map[gid_str] = total_map.get(gid_str, 0) - excluded_map.get(gid_str, 0)`, consider clamping at zero (e.g. `max(..., 0)`) or asserting non-negativity to guard against any edge cases where the two queries become inconsistent.
## Individual Comments
### Comment 1
<location path="lib/host_repository.py" line_range="427" />
<code_context>
- db.session.query(HostGroupAssoc.group_id, func.count(Host.id).label("host_count"))
- .join(Host)
+ # Count ALL members per group from hosts_groups only (index-only scan, no join).
+ total_query = (
+ db.session.query(HostGroupAssoc.group_id, func.count().label("total"))
.filter(HostGroupAssoc.org_id == org_id, HostGroupAssoc.group_id.in_(group_ids))
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying the map construction and extracting the staleness/exclusion and two-query strategy into helpers so the main function clearly expresses high-level intent.
You can keep the optimization but local complexity can be reduced and staleness semantics encapsulated a bit better.
### 1. Inline map-merge can be simplified
You don’t need the prefilled `count_map` + loop; you can compute it in one expression and avoid mutation:
```python
total_map = {str(gid): total for gid, total in total_query.all()}
excluded_map = {str(gid): excluded for gid, excluded in excluded_query.all()}
count_map = {
str(gid): total_map.get(str(gid), 0) - excluded_map.get(str(gid), 0)
for gid in group_ids
}
return count_map
```
This keeps the “total minus excluded” behavior but makes it easier to see the core logic.
### 2. Encapsulate the staleness / exclusion filter
The custom staleness logic is currently inlined and partially duplicates existing abstractions. You can wrap it in a small helper so that the query reads as “exclude culled hosts” instead of embedding the details:
```python
def _excluded_hosts_filter() -> ColumnElement[bool]:
staleness_filter = HostStalenessStatesDbFilters()
return or_(
staleness_filter.culled(),
Host.deletion_timestamp.is_(None),
)
```
Then in `get_host_counts_batch`:
```python
excluded_query = (
db.session.query(
HostGroupAssoc.group_id,
func.count().label("excluded"),
)
.join(
Host,
and_(
HostGroupAssoc.host_id == Host.id,
HostGroupAssoc.org_id == Host.org_id,
),
)
.filter(
HostGroupAssoc.org_id == org_id,
HostGroupAssoc.group_id.in_(group_ids),
_excluded_hosts_filter(),
)
.group_by(HostGroupAssoc.group_id)
)
```
That keeps the optimization (only culled/“excluded” hosts hit the join) but moves the staleness semantics into one place so it’s easier to keep consistent with `find_non_culled_hosts` and to adjust later if the definition changes.
### 3. Optional: encapsulate the two-query strategy in a helper
If you want to further separate domain intent from the query strategy, you can push the “total minus excluded” details into a private helper. `get_host_counts_batch` then just expresses high-level intent:
```python
def _get_non_culled_host_counts_for_groups(
org_id: str, group_ids: list[str]
) -> dict[str, int]:
if not group_ids:
return {}
total_query = (
db.session.query(
HostGroupAssoc.group_id,
func.count().label("total"),
)
.filter(
HostGroupAssoc.org_id == org_id,
HostGroupAssoc.group_id.in_(group_ids),
)
.group_by(HostGroupAssoc.group_id)
)
total_map = {str(gid): total for gid, total in total_query.all()}
excluded_query = (
db.session.query(
HostGroupAssoc.group_id,
func.count().label("excluded"),
)
.join(
Host,
and_(
HostGroupAssoc.host_id == Host.id,
HostGroupAssoc.org_id == Host.org_id,
),
)
.filter(
HostGroupAssoc.org_id == org_id,
HostGroupAssoc.group_id.in_(group_ids),
_excluded_hosts_filter(),
)
.group_by(HostGroupAssoc.group_id)
)
excluded_map = {str(gid): excluded for gid, excluded in excluded_query.all()}
return {
str(gid): total_map.get(str(gid), 0) - excluded_map.get(str(gid), 0)
for gid in group_ids
}
def get_host_counts_batch(org_id: str, group_ids: list[str]) -> dict[str, int]:
return _get_non_culled_host_counts_for_groups(org_id, group_ids)
```
Functionality and performance characteristics stay the same, but `get_host_counts_batch` is again “about counts per group,” and the performance/SQL strategy is isolated and easier to reason about or test.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
ce3a97a to
94236c7
Compare
…s table scan Replace the single expensive JOIN query with a "total minus excluded" strategy: count all members from hosts_groups (index-only scan, no join) then subtract the small number of culled/NULL deletion_timestamp hosts. For large orgs with 500K+ hosts in a group, this eliminates the sequential scan on the wide hosts table.
94236c7 to
9f10fa5
Compare
kruai
reviewed
Mar 31, 2026
Comment on lines
+272
to
+282
| def _excluded_hosts_filter(): | ||
| """Filter matching hosts that should be excluded from active counts. | ||
|
|
||
| Matches culled hosts (deletion_timestamp <= now) and hosts with NULL | ||
| deletion_timestamp. This is the inverse of the non-culled filter used | ||
| by find_non_culled_hosts, kept in sync via HostStalenessStatesDbFilters. | ||
| """ | ||
| staleness_filter = HostStalenessStatesDbFilters() | ||
| return or_(staleness_filter.culled(), Host.deletion_timestamp.is_(None)) | ||
|
|
||
|
|
Collaborator
There was a problem hiding this comment.
There is now overlap with find_non_culled_hosts. You could change that function to just
return query.filter(not_(_excluded_hosts_filter()))
Comment on lines
+445
to
+457
| # Count EXCLUDED members: culled (deletion_timestamp <= now) OR NULL deletion_timestamp. | ||
| # This query joins the hosts table but matches very few rows (typically <1%). | ||
| excluded_query = ( | ||
| db.session.query(HostGroupAssoc.group_id, func.count().label("excluded")) | ||
| .join(Host, and_(HostGroupAssoc.host_id == Host.id, HostGroupAssoc.org_id == Host.org_id)) | ||
| .filter( | ||
| HostGroupAssoc.org_id == org_id, | ||
| HostGroupAssoc.group_id.in_(group_ids), | ||
| _excluded_hosts_filter(), | ||
| ) | ||
| .group_by(HostGroupAssoc.group_id) | ||
| ) | ||
| excluded_map = {str(gid): excluded for gid, excluded in excluded_query.all()} |
Collaborator
There was a problem hiding this comment.
This new version of this function now does 2 queries instead of 1, and I'd expect this second query to do similar .join work to the original. The implementation's a bit more complicated, but it could be worth it if the performance gain is large enough. Have you compared the old and new queries using EXPLAIN ANALYZE or something like that?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Overview
This PR is being created to address RHINENG-25330.
Replace the single expensive JOIN query with a "total minus excluded" strategy: count all members from hosts_groups (index-only scan, no join) then subtract the small number of culled/NULL deletion_timestamp hosts. For large orgs, this eliminates the sequential scan on the wide hosts table.
PR Checklist
Secure Coding Practices Documentation Reference
You can find documentation on this checklist here.
Secure Coding Checklist
Summary by Sourcery
Optimize batched host count retrieval for groups by replacing a join-based query with a two-phase "total minus excluded" counting strategy that avoids full scans of the hosts table for large organizations.
Enhancements:
Summary by Sourcery
Optimize batched host count retrieval for groups by replacing a single join-based query with a two-phase counting strategy that avoids full scans of the hosts table for large organizations.
Enhancements: