Skip to content

[DI] Fix race condition when applying probe at app boot #5698

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 19, 2025

Conversation

watson
Copy link
Collaborator

@watson watson commented May 12, 2025

What does this PR do?

A race condition exists where the tracer receives a probe via RC, before Node.js has had a chance to load all the JS files from disk. If this race condition is triggered, it results in the tracer either not being able to find any script to attach the probe to, or, if the probe filename is a bit generic, it finds an incorrect match.

Therefore, once new scripts has been loaded, all probes are re-evaluated. If the matched scriptId has changed, we simply remove the old probe (if it was added to the wrong script) and apply it again.

This is only really an issue if Node.js is using the ESM loader, as this is really slow. If the application is purely a CommonJS application, this race condtion would probably never be triggered.

Motivation

Plugin Checklist

Additional Notes

Copy link
Collaborator Author

watson commented May 12, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

Copy link

github-actions bot commented May 12, 2025

Overall package size

Self size: 9.38 MB
Deduped: 103.41 MB
No deduping: 103.93 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | @datadog/libdatadog | 0.5.1 | 29.73 MB | 29.73 MB | | @datadog/native-appsec | 8.5.2 | 19.33 MB | 19.34 MB | | @datadog/pprof | 5.8.0 | 12.55 MB | 12.92 MB | | @datadog/native-iast-taint-tracking | 4.0.0 | 11.72 MB | 11.73 MB | | @opentelemetry/core | 1.30.1 | 908.66 kB | 7.16 MB | | protobufjs | 7.4.0 | 2.77 MB | 5.42 MB | | @datadog/wasm-js-rewriter | 4.0.1 | 2.85 MB | 3.58 MB | | @datadog/native-metrics | 3.1.1 | 1.02 MB | 1.43 MB | | @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB | | import-in-the-middle | 1.13.1 | 117.64 kB | 839.26 kB | | source-map | 0.7.4 | 226 kB | 226 kB | | opentracing | 0.14.7 | 194.81 kB | 194.81 kB | | lru-cache | 7.18.3 | 133.92 kB | 133.92 kB | | pprof-format | 2.1.0 | 111.69 kB | 111.69 kB | | @datadog/sketches-js | 2.1.1 | 109.9 kB | 109.9 kB | | lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB | | ignore | 5.3.2 | 53.63 kB | 53.63 kB | | istanbul-lib-coverage | 3.2.0 | 29.34 kB | 29.34 kB | | rfdc | 1.4.1 | 27.15 kB | 27.15 kB | | @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB | | dc-polyfill | 0.1.8 | 25.08 kB | 25.08 kB | | tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB | | shell-quote | 1.8.2 | 23.54 kB | 23.54 kB | | limiter | 1.1.5 | 23.17 kB | 23.17 kB | | retry | 0.13.1 | 18.85 kB | 18.85 kB | | semifies | 1.0.0 | 15.84 kB | 15.84 kB | | jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB | | crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB | | ttl-set | 1.0.0 | 4.61 kB | 9.69 kB | | mutexify | 1.4.0 | 5.71 kB | 8.74 kB | | path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB | | koalas | 1.0.2 | 6.47 kB | 6.47 kB | | module-details-from-path | 1.0.3 | 4.47 kB | 4.47 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

Copy link

codecov bot commented May 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.04%. Comparing base (3f13dee) to head (8676916).
Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5698   +/-   ##
=======================================
  Coverage   79.04%   79.04%           
=======================================
  Files         515      515           
  Lines       23589    23589           
=======================================
  Hits        18646    18646           
  Misses       4943     4943           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@watson watson force-pushed the watson/DEBUG-3843/fix-race-condition branch from 3eedce8 to ba9305e Compare May 12, 2025 10:01
@pr-commenter
Copy link

pr-commenter bot commented May 12, 2025

Benchmarks

Benchmark execution time: 2025-05-17 10:42:32

Comparing candidate commit 8676916 in PR branch watson/DEBUG-3843/fix-race-condition with baseline commit 3f13dee in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1269 metrics, 54 unstable metrics.

@watson watson force-pushed the watson/DEBUG-3843/fix-race-condition branch 2 times, most recently from ca4eaa7 to c48de67 Compare May 12, 2025 10:39
@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented May 12, 2025

Datadog Report

Branch report: watson/DEBUG-3843/fix-race-condition
Commit report: 09c81b7
Test service: dd-trace-js-integration-tests

✅ 0 Failed, 988 Passed, 0 Skipped, 15m 13.81s Total Time

Base automatically changed from watson/DEBUG-3843/add-extra-debug-logs to master May 12, 2025 11:10
@watson watson force-pushed the watson/DEBUG-3843/fix-race-condition branch from c48de67 to e4b4716 Compare May 12, 2025 12:42
@datadog-datadog-prod-us1
Copy link

Datadog Summary

✅ Code Quality    ✅ Code Security    ✅ Dependencies


Was this helpful? Give us feedback!

@watson watson marked this pull request as ready for review May 12, 2025 12:49
@watson watson requested review from a team as code owners May 12, 2025 12:49
@watson watson added the bug Something isn't working label May 13, 2025
Copy link
Collaborator

@BridgeAR BridgeAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to mock the time in the test instead of adding a new internal only configuration.
Otherwise LGTM!

@watson watson force-pushed the watson/DEBUG-3843/fix-race-condition branch from e4b4716 to 63afadd Compare May 14, 2025 06:26
A race condition exists where the tracer receives a probe via RC, before
Node.js has had a chance to load all the JS files from disk. If this
race condition is triggered, it results in the tracer either not being
able to find any script to attach the probe to, or, if the probe
filename is a bit generic, it finds an incorrect match.

Therefore, once new scripts has been loaded, all probes are
re-evaluated. If the matched `scriptId` has changed, we simply remove
the old probe (if it was added to the wrong script) and apply it again.

This is only really an issue if Node.js is using the ESM loader, as this
is really slow. If the application is purely a CommonJS application,
this race condtion would probably never be triggered.
@watson watson force-pushed the watson/DEBUG-3843/fix-race-condition branch from 63afadd to 8676916 Compare May 17, 2025 10:30
@watson watson requested a review from BridgeAR May 17, 2025 10:34
@BridgeAR BridgeAR merged commit f1f9a06 into master May 19, 2025
586 of 592 checks passed
@BridgeAR BridgeAR deleted the watson/DEBUG-3843/fix-race-condition branch May 19, 2025 08:34
dd-trace-js bot pushed a commit that referenced this pull request May 19, 2025
A race condition exists where the tracer receives a probe via RC, before
Node.js has had a chance to load all the JS files from disk. If this
race condition is triggered, it results in the tracer either not being
able to find any script to attach the probe to, or, if the probe
filename is a bit generic, it finds an incorrect match.

Therefore, once new scripts has been loaded, all probes are
re-evaluated. If the matched `scriptId` has changed, we simply remove
the old probe (if it was added to the wrong script) and apply it again.

This is only really an issue if Node.js is using the ESM loader, as this
is really slow. If the application is purely a CommonJS application,
this race condtion would probably never be triggered.
@dd-trace-js dd-trace-js bot mentioned this pull request May 19, 2025
dd-trace-js bot pushed a commit that referenced this pull request May 20, 2025
A race condition exists where the tracer receives a probe via RC, before
Node.js has had a chance to load all the JS files from disk. If this
race condition is triggered, it results in the tracer either not being
able to find any script to attach the probe to, or, if the probe
filename is a bit generic, it finds an incorrect match.

Therefore, once new scripts has been loaded, all probes are
re-evaluated. If the matched `scriptId` has changed, we simply remove
the old probe (if it was added to the wrong script) and apply it again.

This is only really an issue if Node.js is using the ESM loader, as this
is really slow. If the application is purely a CommonJS application,
this race condtion would probably never be triggered.
@dd-trace-js dd-trace-js bot mentioned this pull request May 20, 2025
BridgeAR pushed a commit that referenced this pull request May 23, 2025
A race condition exists where the tracer receives a probe via RC, before
Node.js has had a chance to load all the JS files from disk. If this
race condition is triggered, it results in the tracer either not being
able to find any script to attach the probe to, or, if the probe
filename is a bit generic, it finds an incorrect match.

Therefore, once new scripts has been loaded, all probes are
re-evaluated. If the matched `scriptId` has changed, we simply remove
the old probe (if it was added to the wrong script) and apply it again.

This is only really an issue if Node.js is using the ESM loader, as this
is really slow. If the application is purely a CommonJS application,
this race condtion would probably never be triggered.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants