Skip to content

Conversation

@wrn14897
Copy link
Member

@wrn14897 wrn14897 commented Oct 25, 2025

Plus fixed 'group-by' alert state issues (alert histories)

image

Ref: HDX-2661
Ref: HDX-2660

@changeset-bot
Copy link

changeset-bot bot commented Oct 25, 2025

🦋 Changeset detected

Latest commit: 3f3231c

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@hyperdx/common-utils Minor
@hyperdx/api Minor
@hyperdx/app Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link

vercel bot commented Oct 25, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
hyperdx-v2-oss-app Ready Ready Preview Comment Nov 4, 2025 10:53pm

@claude
Copy link

claude bot commented Oct 25, 2025

Code Review

Critical Issues

Missing index for group-by queries → Add compound index { alert: 1, group: 1, createdAt: -1 } to AlertHistory model

  • The getPreviousAlertHistories aggregation groups by both alert AND group (line 627-631 in index.ts)
  • Current index only covers { alert: 1, createdAt: -1 } (alertHistory.ts:58)
  • For group-by alerts with many groups, MongoDB will need to scan all alert histories and do an in-memory group, degrading performance

Important Issues

⚠️ Potential race condition in skip logic → Consider adding alert-level locking or idempotency keys

  • Lines 209-222 check if any history exists in current window to skip processing
  • If multiple workers process the same alert concurrently, both could pass the skip check before either creates history
  • This could result in duplicate notifications being sent

⚠️ AlertInput.id field lacks validation → Add Zod schema validation or document usage

  • Added optional id?: string field to AlertInput type (controllers/alerts.ts:21)
  • No validation ensuring it is a valid ObjectId format
  • Usage in template.ts:267 assumes it exists without null check

Minor Observations

✅ Test coverage is extensive (1845 lines added to checkAlerts.test.ts)
✅ Auto-resolve logic for missing groups is well-implemented
✅ Good separation of concerns with composite keys for grouped alerts
✅ IncidentIO integration follows existing webhook pattern

Recommendation

Fix the missing compound index before merge - this is a performance issue that will impact production workloads with group-by alerts.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 25, 2025

E2E Test Results

All tests passed • 39 passed • 3 skipped • 304s

Status Count
✅ Passed 39
❌ Failed 0
⚠️ Flaky 0
⏭️ Skipped 3

View full report →

@wrn14897 wrn14897 changed the title feat: add support for alert auto-resolve + Incident.io integration (WIP) feat: add support for alert auto-resolve + Incident.io integration Oct 26, 2025
@teeohhem teeohhem marked this pull request as draft October 27, 2025 15:11
@wrn14897 wrn14897 changed the title (WIP) feat: add support for alert auto-resolve + Incident.io integration feat: add support for alert auto-resolve + Incident.io integration Oct 27, 2025
@wrn14897 wrn14897 marked this pull request as ready for review October 27, 2025 17:31
@wrn14897 wrn14897 force-pushed the warren/introduce-alert-state-var-and-auto-resolve branch from 3e00bb0 to fe1b1f6 Compare October 27, 2025 21:54
"title": "{{title}}",
"description": "{{body}}",
"deduplication_key": "{{eventId}}",
"status": "{{#if (eq state "ALERT")}}firing{{else}}resolved{{/if}}",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@pulpdrew pulpdrew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the other comments, I also notice that since we create an alert history for each group, the Alerts page has an alert status per group, which is probably not correct:

For example, if there are three group keys, you'll get three red bars here for the same time and with different counts.

Screenshot 2025-10-28 at 10 17 16 AM

Just a thought - would it make more sense to save one alert history with all the groups, and maintain the state of each group within the one history entry? Sort of like how we have lastValues already? That could cut down on requests to mongo, help avoid making changes to the UI to fix the above ^^ issue, and maybe simplify some of the other code (particularly the stuff that handles partial mongo save failures)?

@wrn14897 wrn14897 force-pushed the warren/introduce-alert-state-var-and-auto-resolve branch from 025dd4e to 2f828ae Compare November 4, 2025 09:01
@wrn14897 wrn14897 force-pushed the warren/introduce-alert-state-var-and-auto-resolve branch from 2f828ae to d3e29ad Compare November 4, 2025 09:09
@wrn14897 wrn14897 force-pushed the warren/introduce-alert-state-var-and-auto-resolve branch from d3e29ad to 2a3adce Compare November 4, 2025 17:55
Copy link
Contributor

@pulpdrew pulpdrew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with the understanding that there are a few followup items listed in various comments

@wrn14897
Copy link
Member Author

wrn14897 commented Nov 4, 2025

LGTM, with the understanding that there are a few followup items listed in various comments

Thanks. I will file tickets for those followup items

@kodiakhq kodiakhq bot merged commit f612bf3 into main Nov 4, 2025
8 of 9 checks passed
@kodiakhq kodiakhq bot deleted the warren/introduce-alert-state-var-and-auto-resolve branch November 4, 2025 22:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants