Skip to content

[PROF-11311] Remove datadog_profiling_loader #4356

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 7, 2025

Conversation

ivoanjo
Copy link
Member

@ivoanjo ivoanjo commented Feb 6, 2025

What does this PR do?

This PR removes the datadog_profiling_loader native extension, as it's no longer needed.

The profiling native loader was added in
#2003 .

Quoting from that PR's description:

When Ruby loads a native extension (using require), it uses
dlopen(..., RTLD_LAZY | RTLD_GLOBAL)
(https://github.com/ruby/ruby/blob/67950a4c0a884bdb78d9beb4405ebf7459229b21/dln.c#L362).

This means that every symbol exposed directly or indirectly by that
native extension becomes visible to every other extension in the
Ruby process. This can cause issues, see
rubyjs/mini_racer#179 .

Ruby's extension loading mechanism is not configurable -- there's
no way to tell it to use different flags when calling dlopen.
To get around this, this commit introduces introduces another
extension (profiling loader) which has only a single responsibility:
mimic Ruby's extension loading mechanism, but when calling
dlopen use a different set of flags.

This idea was shamelessly stolen from @lloeki's work in
rubyjs/mini_racer#179, big thanks!

...and importantly ends with:

Note that, that we know of, the profiling native extension only
exposes one potentially problematic symbol:
rust_eh_personality (coming from libddprof/libdatadog).

Future versions of Rust have been patched not to expose this
(see rust-lang/rust#95604 (comment))
so we may want to revisit the need for this loader in the future,
and perhaps delete it if we no longer require its services :)

And we have reached the situation predicted in that description:

  1. Nowadays libdatadog no longer exposes rust_eh_personality

  2. We have a test that validates that only expected symbols are exported by the libdatadog library (see [NO-TICKET] Validate that all public symbols are prefixed in Ruby releases libdatadog#573 ).

    Any new symbols that show up would break shipping new libdatadog versions to rubygems.org until we review them.

  3. The libdatadog_api extension, which we've been shipping for customers since release 2.3.0 back in July 2024 has always been loaded directly without a loader without issues.

Thus, I think it's the right time to get rid of the loader.

Motivation:

Having the loader around is not zero cost; we've run into/caused a few issues ( #2250 and #3582 come to mind).

It also adds overhead to development: every time we need to rebuild the extensions, it's an extra extension that needs to be prepared, rebuilt, copied, etc.

I've been meaning to get rid of the loader for some time now, and this came up again this week so I decided it was time to do it.

Change log entry

None.

Additional Notes:

Note that none if this is visible to customers -- the loader was just an additional layer we used when loading the profiler.

In the future, we can always resurrect this approach if we figure out we need it again.

We've also discussed internally about proposing upstream to the Ruby VM to maybe add an API to do what the loader was doing, so that we wouldn't need the weird workaround.

We haven't yet decided if we're going to do that (help welcome!).

How to test the change?

The loader was responsible for loading the rest of profiling.

Thus, any test that uses profiling was also validating the loader and now will validate that we're doing fine without it.

TL;DR green CI is good.

**What does this PR do?**

This PR removes the `datadog_profiling_loader` native extension, as
it's no longer needed.

The profiling native loader was added in
#2003 .

Quoting from that PR's description:

> When Ruby loads a native extension (using `require`), it uses
> `dlopen(..., RTLD_LAZY | RTLD_GLOBAL)`
> (https://github.com/ruby/ruby/blob/67950a4c0a884bdb78d9beb4405ebf7459229b21/dln.c#L362).
>
> This means that every symbol exposed directly or indirectly by that
> native extension becomes visible to every other extension in the
> Ruby process. This can cause issues, see
> rubyjs/mini_racer#179 .
>
> Ruby's extension loading mechanism is not configurable -- there's
> no way to tell it to use different flags when calling `dlopen`.
> To get around this, this commit introduces introduces another
> extension (profiling loader) which has only a single responsibility:
> mimic Ruby's extension loading mechanism, but when calling
> `dlopen` use a different set of flags.
>
> This idea was shamelessly stolen from @lloeki's work in
> rubyjs/mini_racer#179, big thanks!

...and importantly ends with:

> Note that, that we know of, the profiling native extension only
> exposes one potentially problematic symbol:
> `rust_eh_personality` (coming from libddprof/libdatadog).
>
> Future versions of Rust have been patched not to expose this
> (see rust-lang/rust#95604 (comment))
> so we may want to revisit the need for this loader in the future,
> and perhaps delete it if we no longer require its services :)

And we have reached the situation predicted in that description:

1. Nowadays libdatadog no longer exposes `rust_eh_personality`

2. We have a test that validates that only expected symbols are
   exported by the libdatadog library
   (see DataDog/libdatadog#573 ).

   Any new symbols that show up would break shipping new
   libdatadog versions to rubygems.org until we review them.

3. The `libdatadog_api` extension, which we've been shipping for
   customers since release 2.3.0 back in July 2024 has always been
   loaded directly without a loader without issues.

Thus, I think it's the right time to get rid of the loader.

**Motivation:**

Having the loader around is not zero cost; we've run into/caused
a few issues ( #2250
and #3582 come to mind).

It also adds overhead to development: every time we need to rebuild
the extensions, it's an extra extension that needs to be prepared,
rebuilt, copied, etc.

I've been meaning to get rid of the loader for some time now,
and this came up again this week so I decided it was time to
do it.

**Additional Notes:**

In the future, we can always ressurrect this approach if we
figure out we need it again.

We've also discussed internally about proposing upstream
to the Ruby VM to maybe add an API to do what the loader was
doing, so that we wouldn't need the weird workaround.

We haven't yet decided if we're going to do that (help welcome!).

**How to test the change?**

The loader was responsible for loading the rest of profiling.

Thus, any test that uses profiling was also validating the loader
and now will validate that we're doing fine without it.

TL;DR green CI is good.
@ivoanjo ivoanjo requested review from a team as code owners February 6, 2025 17:24
@github-actions github-actions bot added the profiling Involves Datadog profiling label Feb 6, 2025
@datadog-datadog-prod-us1
Copy link
Contributor

datadog-datadog-prod-us1 bot commented Feb 6, 2025

Datadog Report

Branch report: ivoanjo/prof-11311-remove-profiling-loader
Commit report: db67572
Test service: dd-trace-rb

✅ 0 Failed, 22075 Passed, 1477 Skipped, 6m 23.79s Total Time

@pr-commenter
Copy link

pr-commenter bot commented Feb 6, 2025

Benchmarks

Benchmark execution time: 2025-02-07 11:14:50

Comparing candidate commit db67572 in PR branch ivoanjo/prof-11311-remove-profiling-loader with baseline commit a444023 in branch master.

Found 1 performance improvements and 0 performance regressions! Performance is the same for 30 metrics, 2 unstable metrics.

scenario:profiler - sample timeline=false

  • 🟩 throughput [+0.578op/s; +0.589op/s] or [+9.608%; +9.784%]

This was an interesting one... Once we removed the profiling loader
extension, our tests using Ractors started failing with:

> Ractor::UnsafeError: ractor unsafe method called from not main ractor

When I started investigating with gdb, I discovered that because
we were initializing our extension without going through Ruby,
we were skipping this part:

https://github.com/ruby/ruby/blob/7178593558080ca529abb61ef27038236ab2687d/load.c#L1301-L1302 :

```c
ext_config_push(th, &prev_ext_config);
handle = rb_vm_call_cfunc(rb_vm_top_self(), load_ext,
                          path, VM_BLOCK_HANDLER_NONE, path);
```

that is, the `ext_config_push` is what's used by Ruby to validate
if a gem is safe to use from Ractors or not. (If you're curious,
that value then affects function definition, which controls
wether Ruby will check or not for being called from a Ractor).

Because we were skipping this entire mechanism, we implicitly
were getting the same result as `rb_ext_ractor_safe(true)`.

Once we removed the loader, this started failing. And I missed it
before opening my PR since for reasons documented in the profiler
ractor tests in detail, we don't run ractor-related tests by
default.

So this issue is one more reason why having the loader may create
its own set of issues and why I'm happy to get rid of it.
Copy link
Contributor

@AlexJF AlexJF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.72%. Comparing base (a444023) to head (db67572).
Report is 10 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4356      +/-   ##
==========================================
- Coverage   97.73%   97.72%   -0.01%     
==========================================
  Files        1368     1367       -1     
  Lines       83014    82963      -51     
  Branches     4221     4213       -8     
==========================================
- Hits        81134    81077      -57     
- Misses       1880     1886       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ivoanjo ivoanjo enabled auto-merge February 7, 2025 11:09
@ivoanjo ivoanjo merged commit 6b6ec18 into master Feb 7, 2025
502 checks passed
@ivoanjo ivoanjo deleted the ivoanjo/prof-11311-remove-profiling-loader branch February 7, 2025 11:20
@github-actions github-actions bot added this to the 2.11.0 milestone Feb 7, 2025
@ivoanjo ivoanjo added the dev/internal Other internal work that does not need to be included in the changelog label Feb 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dev/internal Other internal work that does not need to be included in the changelog profiling Involves Datadog profiling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants