Skip to content

DEBUG-3573 send AppClientConfigurationChange telemetry event when subsequent telemetry instances start #4678

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 57 commits into from
May 31, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
57 commits
Select commit Hold shift + click to select a range
93a7bfc
state management
p May 21, 2025
a1f1d50
state management
p May 21, 2025
21fe6b7
hmm
p May 21, 2025
f596b22
update documentation to match reality
p May 21, 2025
cc85634
do not lose events
p May 22, 2025
ce38a92
remove redundant checks
p May 22, 2025
ca980f0
changing to initial event
p May 22, 2025
d396fec
fixing
p May 22, 2025
dee2aa6
note
p May 23, 2025
92bf75b
initial events
p May 23, 2025
7067e65
fake event
p May 19, 2025
7446a11
type
p May 19, 2025
381eec1
rename to synth
p May 21, 2025
71301b7
use synth event
p May 23, 2025
22d4453
fix
p May 23, 2025
b6df901
tests
p May 23, 2025
ae70584
synth event test
p May 23, 2025
cc45401
class -> module
p May 23, 2025
a9eccfb
no longer applicable
p May 23, 2025
6f9b391
rubocop -> standard migration
p May 23, 2025
ae6bd7e
new tests
p May 23, 2025
7ce6381
fix dep test
p May 23, 2025
db1c9f9
not needed
p May 23, 2025
7bb70fe
return not needed
p May 23, 2025
6fa986c
remove
p May 23, 2025
d40409d
fix test
p May 26, 2025
0f90a11
fix test
p May 26, 2025
8890454
send once
p May 27, 2025
21344c8
avoid warning
p May 27, 2025
b367965
fix
p May 27, 2025
8094955
cleanup
p May 27, 2025
a5ba0bd
cleanup
p May 27, 2025
02351fc
rbs
p May 27, 2025
1ca3983
types
p May 27, 2025
ed5effb
integration test for the synth event
p May 27, 2025
8dc11b5
repair diagnostics
p May 27, 2025
4efb289
fix condition
p May 27, 2025
4a9bab0
fix diagnostics
p May 27, 2025
77cf010
fix condition for real
p May 27, 2025
5b3e780
test dependency collection event
p May 27, 2025
b4ce0c1
fix
p May 27, 2025
1df630d
standard
p May 27, 2025
936007c
explain
p May 27, 2025
ae6a2a5
mark private
p Apr 28, 2025
b32a647
test early event submission
p May 27, 2025
8196384
test retries of initial event
p May 27, 2025
0023b94
remove debug
p May 27, 2025
e125340
Update lib/datadog/core/configuration/components_state.rb
p-datadog May 28, 2025
9698dcb
fix
p May 30, 2025
e473898
make Components return their state
p May 30, 2025
d979725
di + telemetry integration test
p May 30, 2025
e158a34
add assertions to diagnose flakiness
p May 30, 2025
efe549b
use equal?
p May 30, 2025
a6c0d13
adjust logger for debugging
p May 30, 2025
691971f
rubocop
p May 30, 2025
85a9c78
retest
p May 30, 2025
932e02c
Merge branch 'master' into telemetry-lifecycle
p-datadog May 30, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions lib/datadog/core/buffer/random.rb
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,22 @@ def concat(items)
overflow&.each { |item| replace!(item) }
end

def unshift(*items)
# TODO The existing concat implementation does not always append
# to the end of the buffer - if the buffer is full, a random
# item is deleted and the new item is added in the position of
# removed item.
# Therefore, if we want to preserve the item order, concat
# would also need to be changed to maintain order.
# With the existing implementation, the idea is to not move
# existing items around, which is what sets unshift apart from
# concat to begin with.
#
# Since this method currently delegates to +concat+, it does not
# have a matching definition in the thread-safe worker.
concat(items)
end

# Stored items are returned and the local buffer is reset.
def pop
drain!
Expand Down
14 changes: 8 additions & 6 deletions lib/datadog/core/configuration.rb
Original file line number Diff line number Diff line change
Expand Up @@ -265,12 +265,14 @@ def replace_components!(settings, old)
components = Components.new(settings)

# Carry over state from existing components to the new ones.
# Currently, if we already started the remote component (which
# happens after a request goes through installed Rack middleware),
# we will start the new remote component as well.
old_state = {
remote_started: old.remote&.started?,
}
# Currently:
# 1. If we already started the remote component (which
# happens after a request goes through installed Rack middleware),
# we will start the new remote component as well.
# 2. If telemetry has been enabled and is enabled in the new
# component tree, we send AppConfigurationChange event instead
# of AppStarted event.
old_state = old.state

old.shutdown!(components)
components.startup!(settings, old_state: old_state)
Expand Down
23 changes: 16 additions & 7 deletions lib/datadog/core/configuration/components.rb
Original file line number Diff line number Diff line change
@@ -1,22 +1,21 @@
# frozen_string_literal: true

require_relative 'agent_settings_resolver'
require_relative 'components_state'
require_relative 'ext'
require_relative '../diagnostics/environment_logger'
require_relative '../diagnostics/health'
require_relative '../logger'
require_relative '../runtime/metrics'
require_relative '../telemetry/component'
require_relative '../workers/runtime_metrics'

require_relative '../remote/component'
require_relative '../../tracing/component'
require_relative '../../profiling/component'
require_relative '../../appsec/component'
require_relative '../../di/component'
require_relative '../../error_tracking/component'
require_relative '../crashtracking/component'

require_relative '../environment/agent_info'
require_relative '../process_discovery'

Expand Down Expand Up @@ -135,6 +134,8 @@ def initialize(settings)

# Starts up components
def startup!(settings, old_state: nil)
telemetry.start(old_state&.telemetry_enabled?)

if settings.profiling.enabled
if profiler
profiler.start
Expand All @@ -145,7 +146,7 @@ def startup!(settings, old_state: nil)
end
end

if settings.remote.enabled && old_state&.[](:remote_started)
if settings.remote.enabled && old_state&.remote_started?
# The library was reconfigured and previously it already started
# the remote component (i.e., it received at least one request
# through the installed Rack middleware which started the remote).
Expand Down Expand Up @@ -173,7 +174,7 @@ def shutdown!(replacement = nil)

# Shutdown the old tracer, unless it's still being used.
# (e.g. a custom tracer instance passed in.)
tracer.shutdown! unless replacement && tracer == replacement.tracer
tracer.shutdown! unless replacement && tracer.equal?(replacement.tracer)

# Shutdown old profiler
profiler&.shutdown!
Expand Down Expand Up @@ -206,13 +207,21 @@ def shutdown!(replacement = nil)
unused_statsd = (old_statsd - (old_statsd & new_statsd))
unused_statsd.each(&:close)

# enqueue closing event before stopping telemetry so it will be send out on shutdown
telemetry.emit_closing! unless replacement
telemetry.stop!
# enqueue closing event before stopping telemetry so it will be sent out on shutdown
telemetry.emit_closing! unless replacement&.telemetry&.enabled
telemetry.shutdown!

# TODO: Re-enable this once we have updated libdatadog to 17.1
# Core::ProcessDiscovery._native_close_tracer_memfd(@process_discovery_fd, @logger) if @process_discovery_fd
end

# Returns the current state of various components.
def state
ComponentsState.new(
telemetry_enabled: telemetry.enabled,
remote_started: remote&.started?,
)
end
end
end
end
Expand Down
23 changes: 23 additions & 0 deletions lib/datadog/core/configuration/components_state.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
# frozen_string_literal: true

module Datadog
module Core
module Configuration
# Stores the state of component tree when replacing the tree.
class ComponentsState
def initialize(telemetry_enabled:, remote_started:)
@telemetry_enabled = !!telemetry_enabled
@remote_started = !!remote_started
end

def telemetry_enabled?
@telemetry_enabled
end

def remote_started?
@remote_started
end
end
end
end
end
25 changes: 20 additions & 5 deletions lib/datadog/core/telemetry/component.rb
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,8 @@ module Core
module Telemetry
# Telemetry entrypoint, coordinates sending telemetry events at various points in app lifecycle.
# Note: Telemetry does not spawn its worker thread in fork processes, thus no telemetry is sent in forked processes.
#
# @api private
class Component
attr_reader :enabled, :logger, :transport, :worker

Expand All @@ -40,7 +42,7 @@ def self.build(settings, agent_settings, logger)
end

# @param enabled [Boolean] Determines whether telemetry events should be sent to the API
def initialize( # rubocop: disable Metrics/MethodLength
def initialize( # standard:disable Metrics/MethodLength
settings:,
agent_settings:,
logger:,
Expand Down Expand Up @@ -94,19 +96,32 @@ def initialize( # rubocop: disable Metrics/MethodLength
logger: logger,
shutdown_timeout: settings.telemetry.shutdown_timeout_seconds,
)

@worker.start
end

def disable!
@enabled = false
@worker&.enabled = false
end

def stop!
def start(initial_event_is_change = false)
return if !@enabled

initial_event = if initial_event_is_change
Event::SynthAppClientConfigurationChange.new
else
Event::AppStarted.new
end

@worker.start(initial_event)
end

def shutdown!
return if @stopped

@worker&.stop(true)
if defined?(@worker)
@worker&.stop(true)
end

@stopped = true
end

Expand Down
5 changes: 4 additions & 1 deletion lib/datadog/core/telemetry/event.rb
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,9 @@
module Datadog
module Core
module Telemetry
# Collection of telemetry events
# Collection of telemetry events.
#
# @api private
module Event
extend Core::Utils::Forking

Expand All @@ -27,6 +29,7 @@ def self.configuration_sequence
require_relative 'event/app_heartbeat'
require_relative 'event/app_integrations_change'
require_relative 'event/app_started'
require_relative 'event/synth_app_client_configuration_change'
require_relative 'event/generate_metrics'
require_relative 'event/distributions'
require_relative 'event/log'
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
# frozen_string_literal: true

require_relative 'base'

module Datadog
module Core
module Telemetry
module Event
# app-client-configuration-change event emitted instead of
# app-started event for telemetry startups other than the initial
# one in a process.
#
# dd-trace-rb generally creates a new component tree whenever
# the tracer is reconfigured via Datadog.configure (with some
# components potentially reused, if their configuration has not
# changed). Telemetry system tests on the other hand expect there
# to be one "tracer" per process, and do not permit multiple
# app-started events to be emitted.
#
# To resolve this conflict, we replace second and onward app-started
# events with app-client-configuration-change events.
# To avoid diffing configuration, we send all parameters that are
# sent in app-started event.
#
# It's a "quick fix" on top of a not-so-quick fix that omitted
# second and subsequent app-started (and app-closing) events in the
# first place, and only works with the existing hackery of app-started
# and app-closing events.
class SynthAppClientConfigurationChange < AppStarted
def type
'app-client-configuration-change'
end

def payload
{
configuration: configuration,
}
end
end
end
end
end
end
Loading
Loading