feat(crashtracking): capture non signal based crashes by gyuheon0h · Pull Request #5321 · DataDog/dd-trace-rb

gyuheon0h · 2026-02-05T20:51:57Z

What does this PR do?
This PR adds support for crash report collection and emission for non-signal based crashes. We do this by hooking into at_exit and accessing the exception stack. We send the exception stack over from the Ruby side to the native code side, and use it to build a crash report. We also send a crash ping, mainly for parity.

Native stack collection is planned but is out of scope for this stage.
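
A minimal sketch of the at_exit approach described above, not the PR's actual code. The helper names `exception_frames` and `unhandled_exception_payload` are hypothetical, the choice to skip `SystemExit` and `SignalException` is an assumption, and the sketch only prints a line where the real change hands the frames to native code. The frame keys mirror the `file` / `function` / `line` fields of the report shown under "How to test the change?".

```ruby
# Hypothetical helper (not from the PR): convert an exception's backtrace
# into hashes shaped like the "frames" entries of the crash report.
def exception_frames(exception)
  # backtrace_locations is nil for an exception that was never raised.
  locations = exception.backtrace_locations || []
  locations.map do |loc|
    { file: loc.path, function: loc.label, line: loc.lineno }
  end
end

# Hypothetical helper: build a minimal payload for an unhandled exception.
def unhandled_exception_payload(exception)
  {
    kind: 'UnhandledException',
    message: "Unhandled #{exception.class}: #{exception.message}",
    frames: exception_frames(exception)
  }
end

at_exit do
  exception = $! # the exception terminating the process, or nil on a clean exit

  # Assumption: exit/abort (SystemExit) and signals are treated as a normal
  # shutdown path rather than as an unhandled exception.
  next if exception.nil? || exception.is_a?(SystemExit) || exception.is_a?(SignalException)

  begin
    payload = unhandled_exception_payload(exception)
    warn "would report: #{payload[:message]} (#{payload[:frames].size} frames)"
  rescue => e
    # Reporting must not interfere with process exit.
    warn "failed to build report: #{e.class}: #{e.message}"
  end
end
```

Inside an at_exit block, `$!` still refers to the exception that is ending the process, which is what makes the exception stack available without any signal handler.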

Motivation:
It is useful to see non-signal-based crashes (which regular error tracking does not capture), and this was a feature request from the SSI team.

Ticket: PROF-13673
Change log entry
Non-signal based crashes are caught and reported

Additional Notes:

How to test the change?
Unit tests

Run a test ruby program instrumented with the crashtracker and look at the report being sent.

{
  "data_schema_version": "1.4",
  "error": {
    "is_crash": true,
    "kind": "UnhandledException",
    "message": "Unhandled ArgumentError: Test argument crash",
    "source_type": "Crashtracking",
    "stack": {
      "format": "Datadog Crashtracker 1.0",
      "frames": [
        {
          "file": "/home/bits/go/src/github.com/DataDog/dd-trace-rb/spec/datadog/core/crashtracking/component_spec.rb",
          "function": "block (4 levels) in <top (required)>",
          "line": 161
        },
        {
          "file": "/home/bits/go/src/github.com/DataDog/dd-trace-rb/spec/datadog/core/crashtracking/component_spec.rb",
          "function": "block (6 levels) in <top (required)>",
          "line": 168
        },
        ...
        {
          "file": "/var/lib/gems/3.0.0/gems/rspec-core-3.13.6/lib/rspec/core/runner.rb",
          "function": "invoke",
          "line": 45
        },
        {
          "file": "/var/lib/gems/3.0.0/gems/rspec-core-3.13.6/exe/rspec",
          "function": "<top (required)>",
          "line": 4
        },
        {
          "file": "/usr/local/bin/rspec",
          "function": "load",
          "line": 25
        },
        {
          "file": "/usr/local/bin/rspec",
          "function": "<main>",
          "line": 25
        }
      ],
      "incomplete": false
    }
  },
  "incomplete": false,
  "metadata": {
    "library_name": "dd-trace-rb",
    "library_version": "2.29.0",
    "family": "ruby",
    "tags": [
      "tag1:value1",
      "tag2:value2",
      "language:ruby-testing-123",
      "service:ruby-testing-123"
    ]
  },
  "os_info": {
    "architecture": "x86_64",
    "bitness": "64-bit",
    "os_type": "Ubuntu",
    "version": "22.4.0"
  },
  "proc_info": {
    "pid": 220117
  },
  "timestamp": "2026-02-06 00:25:31.590807434 UTC",
  "uuid": "9082567b-686a-4897-95cb-e596c929ba78"
}

github-actions · 2026-02-05T20:52:08Z

Thank you for updating Change log entry section 👏

Visited at: 2026-02-06 01:14:59 UTC

datadog-official · 2026-02-06T00:12:23Z

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
• Patch Coverage: 86.59%
• Overall Coverage: 95.17%

This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 808b3f6

pr-commenter · 2026-02-06T00:27:47Z

Benchmarks

Benchmark execution time: 2026-02-06 22:46:19

Comparing candidate commit 808b3f6 in PR branch gyuheon0h/capture-non-signal-crash with baseline commit 7631952 in branch master.

Found 2 performance improvements and 0 performance regressions! Performance is the same for 42 metrics, 2 unstable metrics.

scenario:tracing - Propagation - Datadog

🟩 throughput [+3204.958op/s; +3279.084op/s] or [+11.190%; +11.449%]

scenario:tracing - Tracing.log_correlation

🟩 throughput [+6094.284op/s; +6354.678op/s] or [+6.068%; +6.328%]

ivoanjo

I've given it a pass!

ext/libdatadog_api/ruby_crash_reporting.c

spec/datadog/core/crashtracking/component_spec.rb

ext/libdatadog_api/crashtracker.c

ext/libdatadog_api/crashtracker.h

ext/libdatadog_api/crashtracker_report_exception.c

lib/datadog/core/crashtracking/component.rb

Revert "Gitignore weird files that keep popping up (will pop this commit later)" This reverts commit aeb3017. Revert "Remove VS Code config files from tracking" This reverts commit 2b30b86. Use locations array Clean Lazy logging Fix memory leak

Fmt fmt

Remove noisy log Update symbol name Check result, build message in ruby unit test and test cleanup Inline + no order dependency + cleanup Number of frames logic on ruby side frame processing in helper Restore accidentally deleted comment Update tags on fork Fmt Fix potential mem leak move to core clean Extract into helper Fix more potential leaks Fmt

Removed comment about Ruby exception crash reporting tests.

p-datadog

I read the C code, and while nothing jumped out at me, I also can't say whether everything there is correct.

I left comments for the Ruby code.

In general, since we do have a crash tracker for crashes, I would like to see "unhandled exceptions" (and more precisely, "unhandled exceptions on main thread") NOT be referred to as "crashes" in Ruby code or documentation. I understand that eventually the libdatadog data structures will be created that have "crash" in their name, but I would prefer to see everything upstream of that use correct terminology and refer to "unhandled exceptions".

lib/datadog/core.rb

p-datadog · 2026-02-06T21:15:57Z

lib/datadog/core.rb

+        rescue => e
+          # Don't let crash reporting itself crash the exit process


This only rescues StandardError-derived exceptions. If you want to be fully thorough about preventing crash tracking issues from affecting the application, you should rescue everything (with provisions for NoMemoryError, Interrupt and SystemExit again), as is done, for example, in https://github.com/DataDog/dd-trace-rb/blob/master/lib/datadog/di/instrumenter.rb#L191-L196.

Yeah, I understand, but wouldn't we want to re-raise if we get NoMemoryError, Interrupt, or SystemExit?

At that point the program is actually just done?

We don't want to swallow SystemExit, ignore SIGTERM, or block process shutdown, right? Please correct me if I am wrong.

You are correct on those three exception classes.

The question is whether you intend to rescue StandardError-derived exceptions only or all (except for those 3).
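
To make the "all except for those 3" option concrete, here is a rough sketch. It is not the code from this PR or from the linked instrumenter; the method name `report_safely` and its `logger:` parameter (any callable) are hypothetical.

```ruby
# Hypothetical wrapper: run a reporting block without letting its own
# failures affect the application. Returns true on success, false on failure.
def report_safely(logger: nil)
  yield
  true
rescue NoMemoryError, Interrupt, SystemExit
  # Never swallow these three: the process is out of memory or shutting down.
  raise
rescue Exception => e # rubocop:disable Lint/RescueException
  # Everything else, including LoadError and NotImplementedError, is logged
  # and suppressed.
  logger&.call("failed to report unhandled exception: #{e.class}: #{e.message}")
  false
end

messages = []
report_safely(logger: ->(m) { messages << m }) { raise LoadError, 'nope' }
puts messages
```

Note that Interrupt is a subclass of SignalException, and SIGTERM surfaces as a plain SignalException. Listing SignalException instead of Interrupt in the first rescue clause would therefore also let SIGTERM propagate.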

I asked Claude which exceptions do not derive from StandardError and here is what it said:

Exceptions that don't inherit from StandardError:

NoMemoryError - Out of memory condition

ScriptError and its subclasses:
- LoadError - Failed to load a file
- NotImplementedError - Method not implemented on this platform
- SyntaxError - Syntax error in code

SecurityError - Security violation

SignalException and its subclass:
- Interrupt - Raised when Ctrl-C is pressed (SIGINT)

SystemExit - Raised by exit or abort

SystemStackError - Stack overflow

fatal - Unrecoverable error (cannot be rescued)

Of these, LoadError (and NotImplementedError) are quite common.

Our existing rescues are mostly for StandardError (this is what gets rescued if no class is explicitly specified). Therefore, if you just want to do what the library is already doing elsewhere, the current code in the diff is fine. I do think this should maybe be revisited library-wide, but making that change library-wide would be out of scope for this PR.
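
The list above can be checked directly against the Ruby class hierarchy. The sketch below is illustrative only; the variable and method names (`candidates`, `missed_by_bare_rescue`, `bare_rescue_catches?`) are made up for this example.

```ruby
# Rescuable classes named in the list above (all Ruby core classes).
candidates = [
  NoMemoryError, ScriptError, LoadError, NotImplementedError, SyntaxError,
  SecurityError, SignalException, Interrupt, SystemExit, SystemStackError
]

# `klass <= StandardError` is true only when klass is StandardError or one
# of its descendants, so everything kept here is missed by a bare `rescue`.
missed_by_bare_rescue = candidates.reject { |klass| klass <= StandardError }
puts missed_by_bare_rescue.map(&:name).join(', ')

# Behavioral check: does a bare `rescue` catch an exception of this class?
def bare_rescue_catches?(exception_class)
  raise exception_class, 'boom'
rescue => _e
  true   # bare rescue only matches StandardError and its descendants
rescue Exception # rubocop:disable Lint/RescueException
  false
end

puts bare_rescue_catches?(ArgumentError)
puts bare_rescue_catches?(LoadError)
```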

lib/datadog/core.rb

lib/datadog/core/crashtracking/component.rb

Flip negation

ext/libdatadog_api/crashtracker_report_exception.c

p-datadog · 2026-02-07T07:57:17Z

spec/datadog/core/crashtracking/component_spec.rb

+            end
+            sleep 0.1
+
+            raise StandardError, 'Test Ruby crash'


Suggested change

-            raise StandardError, 'Test Ruby crash'
+            raise StandardError, 'Test Ruby unhandled exception on main thread'

p-datadog · 2026-02-07T07:57:48Z

spec/datadog/core/crashtracking/component_spec.rb

+          end
+        end
+
+        it 'reports Ruby exceptions via http when app crashes' do


Suggested change

-        it 'reports Ruby exceptions via http when app crashes' do
+        it 'reports Ruby unhandled exceptions via http' do

p-datadog · 2026-02-07T07:57:58Z

spec/datadog/core/crashtracking/component_spec.rb

        end
      end

+      context 'Ruby exception crash reporting' do


Suggested change

-      context 'Ruby exception crash reporting' do
+      context 'Ruby unhandled exception reporting' do

p-datadog · 2026-02-07T07:59:11Z

lib/datadog/core/crashtracking/component.rb

+          logger.debug('Crashtracker failed to report unhandled exception to crash tracker') unless success
+        rescue => e
+          # don't let crash reporting itself raise an error
+          logger.debug("Crashtracker failed to report Ruby exception crash: #{e.message}")


Suggested change

-          logger.debug("Crashtracker failed to report Ruby exception crash: #{e.message}")
+          logger.debug("Crashtracker failed to report Ruby unhandled exception: #{e.class}: #{e.message}")

gyuheon0h added 2 commits February 5, 2026 18:33

Remove VS Code config files from tracking

2b30b86

Gitignore weird files that keep popping up (will pop this commit later)

aeb3017

gyuheon0h marked this pull request as ready for review February 5, 2026 20:52

gyuheon0h requested review from a team as code owners February 5, 2026 20:52

gyuheon0h marked this pull request as draft February 5, 2026 20:52

github-actions bot added the core Involves Datadog core libraries label Feb 5, 2026

gyuheon0h force-pushed the gyuheon0h/capture-non-signal-crash branch 2 times, most recently from c5d3fce to e4b1623 Compare February 5, 2026 21:35

gyuheon0h marked this pull request as ready for review February 6, 2026 01:05

ivoanjo reviewed Feb 6, 2026

gyuheon0h requested a review from ivoanjo February 6, 2026 19:16

gyuheon0h added 3 commits February 6, 2026 19:49

Crash ping

451aaec

Fmt fmt

gyuheon0h force-pushed the gyuheon0h/capture-non-signal-crash branch from 6f5fc9b to 25077d0 Compare February 6, 2026 19:50

Remove comment from Ruby exception crash reporting context

579b8bb

Removed comment about Ruby exception crash reporting tests.

p-datadog reviewed Feb 6, 2026

Respond to oleg -(rescuing all exceptions)

808b3f6

Flip negation

gyuheon0h force-pushed the gyuheon0h/capture-non-signal-crash branch from 87fad47 to 808b3f6 Compare February 6, 2026 22:16

gleocadie reviewed Feb 6, 2026

p-datadog reviewed Feb 7, 2026