Skip to content

Conversation

@jmacd
Copy link
Contributor

@jmacd jmacd commented Jan 17, 2026

The ObservedEvent has associated flume channels and a connection with the existing metrics and admin component which make it an appealing way to transport log events in the engine.

Move PipelineKey, DeployedPipelineKey, CoreId types into crates/config.

Therefore, moving ObservedEvent into crates/telemetry lets us (optionally) use the same channel already use for lifecycle events for tokio log records. The existing event structure is extended with a EventMessage enum which supports None, String, or LogRecord messages. This way we can use a log record as the event message for all existing event types. The event.rs file moves, only ObservedEventRingBuffer from that file remains in crates/state.

The LogRecord has been storing a timestamp. Now, we leave that to the ObservedEvent struct. LogRecord passes through SystemTime everywhere it has been used. Callers generally compute this and pass it in. Minor cleanup in self_tracing/formatter.rs, do not pass SavedCallsite it can be calculated from record metadata as needed.

In internal_events, the raw_error! macro has been replaced with a helper to generate LogRecord values first, by level. This lets us pass info_event!("string", key=value) to any of the event constructors and construct an OTLP bytes message instead of a String message.

@jmacd jmacd requested a review from a team as a code owner January 17, 2026 03:10
@github-actions github-actions bot added the rust Pull requests that update Rust code label Jan 17, 2026
@codecov
Copy link

codecov bot commented Jan 17, 2026

Codecov Report

❌ Patch coverage is 52.88462% with 98 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.72%. Comparing base (ede3e17) to head (7f9b088).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1818   +/-   ##
=======================================
  Coverage   84.72%   84.72%           
=======================================
  Files         498      499    +1     
  Lines      146419   146467   +48     
=======================================
+ Hits       124048   124090   +42     
- Misses      21837    21843    +6     
  Partials      534      534           
Components Coverage Δ
otap-dataflow 86.16% <52.88%> (+<0.01%) ⬆️
query_abstraction 80.61% <ø> (ø)
query_engine 90.52% <ø> (ø)
syslog_cef_receivers ∅ <ø> (∅)
otel-arrow-go 53.50% <ø> (ø)
quiver 90.66% <ø> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@lquerel lquerel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few questions that I don't think are necessarily blocking. I believe I will need the other PRs before having a complete, end to end view of the approach.

The main point that seems important to me to guarantee in the long term is that we must have a way to compile the engine in a mode where only vital events are compiled in, and all others are completely eliminated. I don't have the impression that this is in place yet, so I'm waiting to see how this will be implemented in the following PRs.

Comment on lines 143 to +150
macro_rules! raw_error {
($name:expr $(, $($fields:tt)*)?) => {{
use $crate::self_tracing::{ConsoleWriter, RawLoggingLayer};
use $crate::self_tracing::ConsoleWriter;
let now = std::time::SystemTime::now();
let record = $crate::error_event!($name $(, $($fields)*)?);
ConsoleWriter::no_color().print_log_record(now, &record);
}};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, we should use raw_error! with great care, since there is no way to eliminate its cost, neither at compile time nor at runtime. Is that correct?
Also, for println!, we rely on Clippy to catch it in CI (and prevent them to end up in regular code). So what should we do for raw_error!?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. There will be very few of these, for cases where all/else fails. Is it difficult to configure clippy with a similar rule for raw_error? This statement is the same as println/eprintln but with the structured syntax of Tokio tracing and use of our code path.
I can see an argument that having raw_error become an eprintln!() statement makes sense; my preference is to use the same instrumentation interface use everywhere, so my goal is to replace log::error! (formatting) with tracing::error! or otel_error! (structured) and eprintln! (formatting) with raw_error! (structured).

Comment on lines +215 to +217
let callsite = record.callsite();
assert_eq!(*callsite.level(), Level::INFO);
assert_eq!(callsite.name(), "test.event");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the line number returned by the callsite correct?
I'm asking because I'm not sure I understand how the call site behaves with the nesting of the macros defined earlier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test. These macros are designed for the kind of nesting here, IIUC.

@jmacd
Copy link
Contributor Author

jmacd commented Jan 19, 2026

long term is that we must have a way to compile the engine in a mode where only vital events are compiled in, and ...

I can see your concern. I showed how to use special macros to encode LogRecord (i.e., structured body and key/values) with static callsite, but the current lifecycle events are dispatched unconditionally. I think you would prefer to see structured errors, but I also see formatting happening in the existing events.

My goals here are:

  1. the admin thread can be assigned the role of collecting LogRecords from subscribers, this for the ConsoleAsync mode of internal logging (this is the fallback of not using ITS i.e., not using an internal pipeline). This means we can send logs on the same channel as the current lifecycle events.
  2. To suggest that anywhere you see an Option with format!(...) body you might instead use an info_event!(...) macro to encode OTLP bytes and use that instead of formatting, defer formatting to the consumer if at all

Answering your concern at the top, this leaves a slight misalignment. We're using Tokio macros to construct LogRecords which are the encoded parts and the Metadata i.e., callsite data. Your question is about whether can can compile-time disable these events.

Yes, we can but not as written. Tokio macros have support for compile-time disablement base on Level, part of every callsite metadata. The fact that we have error_event! and info_event! is to suggest that we could choose to compile with only error events, but we'd have to modify ObservedEvent to treat those as the EventMessage::None value vs the EventMessage::Log event.

@jmacd jmacd enabled auto-merge January 19, 2026 19:35
@jmacd jmacd added this pull request to the merge queue Jan 19, 2026
Merged via the queue into open-telemetry:main with commit 1b503f3 Jan 19, 2026
43 of 44 checks passed
@jmacd jmacd deleted the jmacd/move_event branch January 19, 2026 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rust Pull requests that update Rust code

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants