@SoumyaRaikwar
Contributor

@SoumyaRaikwar SoumyaRaikwar commented Dec 25, 2025

[storage] Add Elasticsearch data stream support

Ref: #4708
Part of #4708

Which problem is this PR solving?

Adds comprehensive support for Elasticsearch data streams in Jaeger storage. Data streams provide automatic rollover, simplified scaling, and optimized lifecycle management for time-series data, replacing traditional date-suffixed indices.

Description of the changes

  • Full Storage Layer Integration: Implemented data stream support across all four index types: span and service (span store), sampling (throughput and probabilities), and dependencies.
  • Improved Isolation Strategy: Adopted the jaeger-ds-* naming convention (e.g., jaeger-ds-span) for all data streams. This ensures strict isolation, preventing legacy wildcard searches (jaeger-span-*) from accidentally picking up new Data Stream data.
  • Read & Write Paths:
    • Updated writers to use OpType("create") and remove date suffixes when Data Streams are enabled.
    • Updated SpanReader and other stores with dual-reading logic to query both new Data Streams and legacy indices simultaneously, ensuring a zero-downtime migration.
  • Timestamp Normalization: Templates are configured to use ingest pipelines for mapping Jaeger's microsecond startTime to the mandatory @timestamp field required by Data Streams.
  • Design & Evidence Documentation: Added docs/design/elasticsearch-data-streams.md. This document serves as the Report of Evidence requested by maintainers, detailing ingest pipeline definitions, step-by-step verification procedures, and ingestion results.
  • Code Quality: Resolved all lint issues in the affected storage packages and ensured all unit tests are 100% passing.

How was this change tested?

  • Automated Tests: Updated and expanded unit tests for MappingBuilder, SpanStore, SamplingStore, and DependencyStore.
  • Manual Verification: Verified on an Elasticsearch 8.11 cluster:
    • Correct template and pipeline application.
    • Successful trace ingestion into backing indices via tracegen.
    • Verified query consistency across both legacy indices and Data Streams via the Jaeger UI.
  • Evidence Report: Detailed verification steps and JSON evidence are included in the repository at docs/design/elasticsearch-data-streams.md.

Checklist

@codecov

codecov bot commented Dec 25, 2025

Codecov Report

❌ Patch coverage is 87.82051% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.44%. Comparing base (cb60fb4) to head (edfa85a).

File with missing lines | Patch % | Lines
.../storage/v1/elasticsearch/samplingstore/storage.go | 72.34% | 9 Missing and 4 partials ⚠️
internal/storage/elasticsearch/config/config.go | 0.00% | 2 Missing and 2 partials ⚠️
internal/storage/elasticsearch/config/utils.go | 50.00% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7768      +/-   ##
==========================================
- Coverage   95.53%   95.44%   -0.10%     
==========================================
  Files         307      308       +1     
  Lines       15911    16015     +104     
==========================================
+ Hits        15201    15285      +84     
- Misses        558      570      +12     
- Partials      152      160       +8     
Flag Coverage Δ
badger_v1 9.14% <3.84%> (-0.04%) ⬇️
badger_v2 1.97% <3.84%> (+0.04%) ⬆️
cassandra-4.x-v1-manual 13.50% <3.84%> (-0.09%) ⬇️
cassandra-4.x-v2-auto 1.96% <3.84%> (+0.04%) ⬆️
cassandra-4.x-v2-manual 1.96% <3.84%> (+0.04%) ⬆️
cassandra-5.x-v1-manual 13.50% <3.84%> (-0.09%) ⬇️
cassandra-5.x-v2-auto 1.96% <3.84%> (+0.04%) ⬆️
cassandra-5.x-v2-manual 1.96% <3.84%> (+0.04%) ⬆️
clickhouse 2.01% <3.84%> (+0.03%) ⬆️
elasticsearch-6.x-v1 17.84% <60.25%> (+0.29%) ⬆️
elasticsearch-7.x-v1 17.87% <60.25%> (+0.29%) ⬆️
elasticsearch-8.x-v1 18.22% <67.94%> (+0.48%) ⬆️
elasticsearch-8.x-v2 1.97% <3.84%> (+0.04%) ⬆️
elasticsearch-9.x-v2 1.97% <3.84%> (+0.04%) ⬆️
grpc_v1 8.81% <3.84%> (-0.04%) ⬇️
grpc_v2 1.97% <3.84%> (+0.04%) ⬆️
kafka-3.x-v2 1.97% <3.84%> (+0.04%) ⬆️
memory_v2 1.97% <3.84%> (+0.04%) ⬆️
opensearch-1.x-v1 17.92% <60.25%> (+0.29%) ⬆️
opensearch-2.x-v1 17.92% <60.25%> (+0.29%) ⬆️
opensearch-2.x-v2 1.97% <3.84%> (+0.04%) ⬆️
opensearch-3.x-v2 1.97% <3.84%> (+0.04%) ⬆️
query 1.97% <3.84%> (+0.04%) ⬆️
tailsampling-processor 0.61% <3.84%> (+0.05%) ⬆️
unittests 93.96% <78.20%> (-0.20%) ⬇️

Flags with carried forward coverage won't be shown.

Part of jaegertracing#4708

Adds opt-in support for Elasticsearch data streams to replace
traditional date-based indices with automatic lifecycle management.

Signed-off-by: SoumyaRaikwar <[email protected]>
@SoumyaRaikwar SoumyaRaikwar force-pushed the feature/es-datastream-support branch from c5f1ae9 to 1b79e3a on December 25, 2025 11:06
@SoumyaRaikwar
Contributor Author

@yurishkuro @Manik2708

PR implements data stream templates aligned with @Manik2708's design doc:

  • Index pattern: jaeger-span-ds*
  • Ingestion pipeline reference for @timestamp
  • ILM policy integration in settings

Templates are ready for review.

@Manik2708
Contributor

Thanks @SoumyaRaikwar! But I'd suggest pausing for a bit. The design doc is not ready for implementation yet: it currently lacks evidence, and we might have to manually update the templates and then see how they behave. Moreover, we have to check whether we are missing anything regarding backward compatibility. Right now it contains my research from the docs, but I still have to test and provide the steps for reviewers. If you can help with this, I would be very grateful.

@SoumyaRaikwar
Copy link
Contributor Author

SoumyaRaikwar commented Dec 25, 2025

@Manik2708, I've reviewed your proposal and the current template implementation. I'm keeping my PR as draft for now.

I can help with the manual testing you mentioned. I'll set up a local ES cluster, create the ingest pipelines, apply the data stream templates, and verify the behavior. I'll document the steps and results here as evidence for the design doc. Let me know if there's a specific scenario you'd like me to prioritize.

@SoumyaRaikwar SoumyaRaikwar marked this pull request as draft December 25, 2025 18:04
@SoumyaRaikwar
Contributor Author

@Manik2708

I've completed initial testing and evidence gathering. Here's what I verified:

Design Alignment Confirmed

  • Templates updated with jaeger-span-ds pattern
  • data_stream: {} mapping added
  • index.default_pipeline linked to ingest pipelines
  • Following TsengSR's approach: no ILM customization in Jaeger code - users can override via ES jaeger-span-custom component templates

Ingest Pipeline Tested

Created and tested jaeger-span-ds-timestamp pipeline on ES 8.11:

  • Successfully copies startTime to @timestamp
  • Handles epoch_millis format correctly
  • Falls back to _ingest.timestamp for docs without startTime

Manual Verification Evidence

Posted test document to data stream:

  • Input: {"traceID": "test-1", "startTime": 1672531200000}
  • Result: Correctly stored in .ds-jaeger-span-ds-2025.12.26-000001 with @timestamp: "2023-01-01T00:00:00.000Z"

Full evidence report: https://gist.github.com/SoumyaRaikwar/519a98bcc81dc2df04308ae4a66b702b

Next Steps

Ready to work on:

  1. Backward compatibility testing (dual read from old + new indices)
  2. Documentation showing ES custom template override approach (per TsengSR's suggestion)
  3. All 4 index types (span, service, dependencies, sampling)
  4. Integration test updates

Which should I prioritize?

@SoumyaRaikwar SoumyaRaikwar marked this pull request as ready for review December 28, 2025 12:13
UseILM bool `mapstructure:"use_ilm"`
// UseDataStream, if set to true, will use Elasticsearch data streams for storing traces.
// This requires Elasticsearch 7.9+.
UseDataStream bool `mapstructure:"use_data_stream"`
Contributor

Data streams are going to be the default strategy, so we don't need this.

Contributor Author

Got it - data streams will be the default strategy for ES 8+. I'll remove the config flag and make data streams the standard path for span storage.

Index(index string) IndexService
Type(typ string) IndexService
Id(id string) IndexService
OpType(opType string) IndexService
Contributor

What's this? What do we want to do?

Contributor Author

The OpType(opType string) method is required for data stream writes. ES data streams only accept documents with op_type=create (not index). Without this, we get 400 errors from ES. This is why I added it to the IndexService interface and mocks.
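At the wire level, the op type is simply the action key on each `_bulk` line. This sketch uses plain `encoding/json` with no Elasticsearch client, and the helper `bulkActionLine` is hypothetical. It shows the two shapes: an `index` action, which a data stream rejects, and a `create` action, which it accepts.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bulkActionLine builds the action/metadata line that precedes each document in
// an NDJSON _bulk body. The op type becomes the action key: "index" or "create".
func bulkActionLine(opType, target string) (string, error) {
	b, err := json.Marshal(map[string]map[string]string{
		opType: {"_index": target},
	})
	return string(b), err
}

func main() {
	doc := `{"traceID":"test-1","startTime":1672531200000}`
	for _, c := range []struct{ op, target string }{
		{"index", "jaeger-span-2025-12-25"}, // fine for a regular index
		{"create", "jaeger-ds-span"},        // the only action a data stream accepts
	} {
		action, err := bulkActionLine(c.op, c.target)
		if err != nil {
			panic(err)
		}
		fmt.Println(action)
		fmt.Println(doc)
	}
}
```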

Contributor

@Manik2708 Manik2708 Dec 28, 2025

Then why add it to the interface? It's part of *elastic.BulkRequest; can't we set the op type in the Add method, or perhaps directly in WrapESIndexService?

@@ -0,0 +1,167 @@
{
Contributor

We don't need this as a separate template; I think only the v8 templates need to be changed.

Contributor Author

Understood - I'll remove all non-span data stream implementations:

  • Remove jaeger-ds-service-8.json
  • Remove jaeger-ds-dependencies-8.json
  • Remove jaeger-ds-sampling-8.json
  • Keep only jaeger-ds-span-8.json

Contributor

My main point was to change the existing v8 template, so I don't think we even need jaeger-ds-span-8.json.

@@ -0,0 +1,23 @@
{
Contributor

We only want spans in the data stream for now; no need for services, etc.

Contributor Author

Understood! I'll remove all non-span data stream files:

  • jaeger-ds-service-8.json
  • jaeger-ds-dependencies-8.json
  • jaeger-ds-sampling-8.json
  • Keep only jaeger-ds-span-8.json

Will also remove the corresponding code from samplingstore, depstore, and service_operation stores. Working on it now.

UseILM bool `mapstructure:"use_ilm"`
// UseDataStream, if set to true, will use Elasticsearch data streams for storing traces.
// This requires Elasticsearch 7.9+.
UseDataStream bool `mapstructure:"use_data_stream"`
Contributor

@Manik2708 Manik2708 Dec 28, 2025

I would suggest using feature gates for the dual lookup in the reader.

Contributor Author

Great idea! I'll implement it with feature gates for the dual lookup logic.

To clarify the approach:

  • Keep UseDataStream as an internal implementation detail (not exposed in config)
  • Use feature gates to control reader behavior (search data streams vs regular indices)
  • For ES 8+, data streams will be the default write path

This way we can safely migrate without breaking existing deployments. Does this align with your vision?
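Here is a sketch of the dual lookup the reader would perform. `readLegacy` stands in for a gate such as jaeger.es.readLegacyWithDataStream. In this sketch it is a plain bool, not a real featuregate registration, and `readIndices` is a hypothetical helper. When the gate is on, the reader searches the data stream and the legacy daily indices in one comma-separated target list.

```go
package main

import (
	"fmt"
	"strings"
)

// readIndices returns the target expression a span reader might search.
// readLegacy is a stand-in for a feature gate; it is not Jaeger's actual gate API.
func readIndices(readLegacy bool) string {
	targets := []string{"jaeger-ds-span"}
	if readLegacy {
		// Keep searching the old daily indices until they age out.
		targets = append(targets, "jaeger-span-*")
	}
	// Elasticsearch search APIs accept a comma-separated list of targets.
	return strings.Join(targets, ",")
}

func main() {
	fmt.Println(readIndices(false))
	fmt.Println(readIndices(true))
}
```

A concrete name like jaeger-ds-span that does not exist yet would make a search fail. A real reader would therefore likely need ignore_unavailable (or an equivalent client option) during migration.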

@@ -0,0 +1,116 @@
# Elasticsearch Data Streams in Jaeger
Contributor

I'm not sure whether this belongs here. I'd suggest adding it to a Google Doc first and getting it approved there by the maintainers. For documentation, I'm more concerned about the modification of ILM/templates, as that information will be of utmost importance to users!

Contributor Author

Done. I have updated it here so maintainers can review it: https://gist.github.com/SoumyaRaikwar/33b696ae9ce488c99c165f6301adf7e1

@SoumyaRaikwar
Contributor Author

@yurishkuro

I've updated the design document and the Gist with the proposed Index Lifecycle Management (ILM) policy for data streams.

Updates include:

  • ILM Policy Proposal (jaeger-ilm-policy):
    • Hot Phase: Rollover at 50GB or 1 day. Priority 100.
    • Warm Phase: Transition immediately after rollover. Priority 50.
    • Delete Phase: Delete indices after 7 days.
  • Technical Verification:
    • Updated jaeger-span-8.json template to reference the ILM policy.
    • Verified that all mapping tests pass with the new configuration.
    • Created a sample jaeger-ilm-policy.json definition.

Please review the updated proposal. Once approved, I can proceed with applying the ILM policy to the templates.

Implement jaeger.es.readLegacyWithDataStream feature gate and enhance ILM policy with forcemerge.

Signed-off-by: SoumyaRaikwar <[email protected]>
@SoumyaRaikwar SoumyaRaikwar force-pushed the feature/es-datastream-support branch from ca6e88d to 3cdd9aa on December 29, 2025 20:27
SoumyaRaikwar and others added 2 commits December 30, 2025 02:32
…Fielddata error

- Updated ES 8 index patterns to include both legacy and data stream names
- Resolved Fielddata error for serviceName by ensuring mappings apply to data streams
- Fixed writer_test.go mock expectations and addressed linting issues (revive, testifylint)

Signed-off-by: SoumyaRaikwar <[email protected]>
@github-actions

github-actions bot commented Jan 8, 2026

Metrics Comparison Summary

Total changes across all snapshots: 0

Detailed changes per snapshot

summary_metrics_snapshot_cassandra

📊 Metrics Diff Summary

Total Changes: 0

  • 🆕 Added: 0 metrics
  • ❌ Removed: 0 metrics
  • 🔄 Modified: 0 metrics
  • 🚫 Excluded: 53 metrics

SoumyaRaikwar and others added 2 commits January 8, 2026 14:50
Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Signed-off-by: Soumya Raikwar <[email protected]>
@SoumyaRaikwar
Contributor Author

@yurishkuro I have updated the PR to reflect the changes in the doc. Could you review?

Signed-off-by: SoumyaRaikwar <[email protected]>