
Conversation

@Manik2708
Contributor

Which problem is this PR solving?

Description of the changes

  • Refactor the internal methods of integration tests to read, write and compare ptrace.Traces directly.

How was this change tested?

  • Integration Tests

Checklist

@Manik2708 Manik2708 requested a review from a team as a code owner January 1, 2026 10:19
@Manik2708 Manik2708 requested a review from albertteoh January 1, 2026 10:19
@Manik2708 Manik2708 marked this pull request as draft January 1, 2026 10:19
@dosubot dosubot bot added the area/storage label Jan 1, 2026
@Manik2708
Contributor Author

Currently fixing the ES tests

Signed-off-by: Manik Mehta <[email protected]>
Signed-off-by: Manik Mehta <[email protected]>
Signed-off-by: Manik Mehta <[email protected]>
@codecov

codecov bot commented Jan 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.52%. Comparing base (cb60fb4) to head (9d581bc).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7812      +/-   ##
==========================================
- Coverage   95.53%   95.52%   -0.02%     
==========================================
  Files         307      307              
  Lines       15911    15911              
==========================================
- Hits        15201    15199       -2     
- Misses        558      559       +1     
- Partials      152      153       +1     
Flag Coverage Δ
badger_v1 8.91% <ø> (-0.28%) ⬇️
badger_v2 1.26% <ø> (-0.67%) ⬇️
cassandra-4.x-v1-manual 13.31% <ø> (-0.28%) ⬇️
cassandra-4.x-v2-auto 1.26% <ø> (-0.66%) ⬇️
cassandra-4.x-v2-manual 1.26% <ø> (-0.66%) ⬇️
cassandra-5.x-v1-manual 13.31% <ø> (-0.28%) ⬇️
cassandra-5.x-v2-auto 1.26% <ø> (-0.66%) ⬇️
cassandra-5.x-v2-manual 1.26% <ø> (-0.66%) ⬇️
clickhouse 1.22% <ø> (-0.75%) ⬇️
elasticsearch-6.x-v1 16.99% <ø> (-0.55%) ⬇️
elasticsearch-7.x-v1 17.03% <ø> (-0.55%) ⬇️
elasticsearch-8.x-v1 17.18% <ø> (-0.55%) ⬇️
elasticsearch-8.x-v2 1.26% <ø> (-0.67%) ⬇️
elasticsearch-9.x-v2 1.26% <ø> (-0.67%) ⬇️
grpc_v1 8.31% <ø> (-0.54%) ⬇️
grpc_v2 1.26% <ø> (-0.67%) ⬇️
kafka-3.x-v2 1.26% <ø> (-0.67%) ⬇️
memory_v2 1.26% <ø> (-0.67%) ⬇️
opensearch-1.x-v1 17.07% <ø> (-0.55%) ⬇️
opensearch-2.x-v1 17.07% <ø> (-0.55%) ⬇️
opensearch-2.x-v2 1.26% <ø> (-0.67%) ⬇️
opensearch-3.x-v2 1.26% <ø> (-0.67%) ⬇️
query 1.26% <ø> (-0.67%) ⬇️
tailsampling-processor 0.55% <ø> (-0.01%) ⬇️
unittests 94.15% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@github-actions

github-actions bot commented Jan 2, 2026

Metrics Comparison Summary

Total changes across all snapshots: 32

Detailed changes per snapshot

summary_metrics_snapshot_cassandra

📊 Metrics Diff Summary

Total Changes: 0

  • 🆕 Added: 0 metrics
  • ❌ Removed: 0 metrics
  • 🔄 Modified: 0 metrics
  • 🚫 Excluded: 53 metrics

summary_metrics_snapshot_badger

📊 Metrics Diff Summary

Total Changes: 32

  • 🆕 Added: 0 metrics
  • ❌ Removed: 32 metrics
  • 🔄 Modified: 0 metrics
  • 🚫 Excluded: 0 metrics

❌ Removed Metrics

  • jaeger_storage_badger_compaction_current_num_lsm (2 variants)
View diff sample
-jaeger_storage_badger_compaction_current_num_lsm{name="another_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
-jaeger_storage_badger_compaction_current_num_lsm{name="some_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
  • jaeger_storage_badger_get_num_memtable (2 variants)
View diff sample
-jaeger_storage_badger_get_num_memtable{name="another_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
-jaeger_storage_badger_get_num_memtable{name="some_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
  • jaeger_storage_badger_get_num_user (2 variants)
View diff sample
-jaeger_storage_badger_get_num_user{name="another_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
-jaeger_storage_badger_get_num_user{name="some_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
  • jaeger_storage_badger_get_with_result_num_user (2 variants)
View diff sample
-jaeger_storage_badger_get_with_result_num_user{name="another_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
-jaeger_storage_badger_get_with_result_num_user{name="some_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
  • jaeger_storage_badger_iterator_num_user (2 variants)
View diff sample
-jaeger_storage_badger_iterator_num_user{name="another_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
-jaeger_storage_badger_iterator_num_user{name="some_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
  • jaeger_storage_badger_put_num_user (2 variants)
View diff sample
-jaeger_storage_badger_put_num_user{name="another_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
-jaeger_storage_badger_put_num_user{name="some_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
  • jaeger_storage_badger_read_bytes_lsm (2 variants)
View diff sample
-jaeger_storage_badger_read_bytes_lsm{name="another_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
-jaeger_storage_badger_read_bytes_lsm{name="some_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
  • jaeger_storage_badger_read_bytes_vlog (2 variants)
View diff sample
-jaeger_storage_badger_read_bytes_vlog{name="another_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
-jaeger_storage_badger_read_bytes_vlog{name="some_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
  • jaeger_storage_badger_read_num_vlog (2 variants)
View diff sample
-jaeger_storage_badger_read_num_vlog{name="another_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
-jaeger_storage_badger_read_num_vlog{name="some_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
  • jaeger_storage_badger_size_bytes_lsm (2 variants)
View diff sample
-jaeger_storage_badger_size_bytes_lsm{name="another_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
-jaeger_storage_badger_size_bytes_lsm{name="some_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
  • jaeger_storage_badger_size_bytes_vlog (2 variants)
View diff sample
-jaeger_storage_badger_size_bytes_vlog{name="another_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
-jaeger_storage_badger_size_bytes_vlog{name="some_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
  • jaeger_storage_badger_write_bytes_l0 (2 variants)
View diff sample
-jaeger_storage_badger_write_bytes_l0{name="another_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
-jaeger_storage_badger_write_bytes_l0{name="some_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
  • jaeger_storage_badger_write_bytes_user (2 variants)
View diff sample
-jaeger_storage_badger_write_bytes_user{name="another_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
-jaeger_storage_badger_write_bytes_user{name="some_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
  • jaeger_storage_badger_write_bytes_vlog (2 variants)
View diff sample
-jaeger_storage_badger_write_bytes_vlog{name="another_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
-jaeger_storage_badger_write_bytes_vlog{name="some_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
  • jaeger_storage_badger_write_num_vlog (2 variants)
View diff sample
-jaeger_storage_badger_write_num_vlog{name="another_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
-jaeger_storage_badger_write_num_vlog{name="some_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
  • jaeger_storage_badger_write_pending_num_memtable (2 variants)
View diff sample
-jaeger_storage_badger_write_pending_num_memtable{name="another_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
-jaeger_storage_badger_write_pending_num_memtable{name="some_store",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",role="tracestore"}
summary_metrics_snapshot_cassandra

📊 Metrics Diff Summary

Total Changes: 0

  • 🆕 Added: 0 metrics
  • ❌ Removed: 0 metrics
  • 🔄 Modified: 0 metrics
  • 🚫 Excluded: 106 metrics

➡️ View full metrics file

@Manik2708 Manik2708 marked this pull request as ready for review January 2, 2026 15:20
@Manik2708
Contributor Author

Manik2708 commented Jan 2, 2026

The majority of tests are passing, except Kafka. Working on it! @yurishkuro, could you please review?

Signed-off-by: Manik Mehta <[email protected]>
@Manik2708
Contributor Author

Manik2708 commented Jan 2, 2026

A question about Kafka: does Kafka also assign one span per resource span, like ES/OS? Also, I can't understand the encoding part. How is Kafka tested? I mean, how is it different from the other storage tests?

return 0
}

func checkSize(t *testing.T, expected *model.Trace, actual *model.Trace) {
Contributor Author


No need to check the size; ptracetest.CompareTraces checks it for us.

ptracetest.IgnoreSpansOrder(),
}
if err := ptracetest.CompareTraces(expected, actual, options...); err != nil {
t.Logf("Actual trace and expected traces are not equal: %v", err)
Contributor Author


No need for a pretty diff; CompareTraces returns the first point of difference as its error.

Member


have you tried comparing JSON strings instead? I think it's nice to get a full dump of diffs, not just the first breaking point.

Example:

import (
    "fmt"

    "github.com/hexops/gotextdiff"
    "github.com/hexops/gotextdiff/myers"
    "github.com/hexops/gotextdiff/span"
)

func DiffStrings(want, got string) string {
    edits := myers.ComputeEdits(span.URIFromPath("want.json"), want, got)
    diff := gotextdiff.ToUnified("want.json", "got.json", want, edits)
    return fmt.Sprint(diff)
}
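A sketch of how that suggestion could be wired up for ptrace.Traces in the test helper, assuming the pdata JSON marshaler; the helper name logTraceDiff is illustrative, not code from this PR:

import (
    "testing"

    "go.opentelemetry.io/collector/pdata/ptrace"
)

// logTraceDiff marshals both traces to JSON and logs a unified diff of the two
// documents, reusing the DiffStrings helper suggested above.
func logTraceDiff(t *testing.T, expected, actual ptrace.Traces) {
    marshaler := ptrace.JSONMarshaler{}
    want, err := marshaler.MarshalTraces(expected)
    if err != nil {
        t.Fatalf("marshaling expected trace: %v", err)
    }
    got, err := marshaler.MarshalTraces(actual)
    if err != nil {
        t.Fatalf("marshaling actual trace: %v", err)
    }
    t.Logf("trace diff:\n%s", DiffStrings(string(want), string(got)))
}

Note that the pdata marshaler emits compact JSON, so re-indenting both strings (for example with encoding/json's json.Indent) before diffing makes the output far more readable.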

return bytes.Compare(aAttrs[:], bAttrs[:])
}

func compareTimestamps(a, b pcommon.Timestamp) int {
Contributor Author


We could subtract the timestamps directly, but that would require an unnecessary and risky conversion from uint64 to int.
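A branch-based comparison avoids the signed conversion entirely. A minimal sketch of that approach (not necessarily the exact body used in this PR):

import "go.opentelemetry.io/collector/pdata/pcommon"

// compareTimestamps orders two pcommon.Timestamp values (uint64 nanoseconds)
// without converting them to a signed type, so overflow cannot occur.
func compareTimestamps(a, b pcommon.Timestamp) int {
    switch {
    case a < b:
        return -1
    case a > b:
        return 1
    default:
        return 0
    }
}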

t.Log(err)
return false
}
if len(expected) != len(traces) {
Contributor Author


We don't know how many traces the reader will return in a single slice, so this check is not useful.
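If some sanity check is still desired, comparing total span counts is independent of how the reader chunks its results. A minimal sketch, with an illustrative helper name:

import "go.opentelemetry.io/collector/pdata/ptrace"

// totalSpans sums the span counts across however many chunks the reader
// returned, so the assertion does not depend on how the slice is chunked.
func totalSpans(chunks []ptrace.Traces) int {
    total := 0
    for _, td := range chunks {
        total += td.SpanCount()
    }
    return total
}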

@Manik2708
Contributor Author

Manik2708 commented Jan 2, 2026

Another problem: for TestGetLargeTrace, in some runs the loaded and expected traces are the same, e.g. https://github.com/jaegertracing/jaeger/actions/runs/20660935489/job/59323122666?pr=7812#step:7:547, whereas in other runs the difference is very high, e.g. https://github.com/jaegertracing/jaeger/actions/runs/20660935489/job/59323122693?pr=7812#step:7:784. I can't figure out the exact reason or how it is linked to the conversion. Initially I thought it might be related to normalization, but the same issue shows up with the memory storage, where there is no normalization.

Signed-off-by: Manik Mehta <[email protected]>
}
}
}
return assert.ObjectsAreEqualValues(expected, actual)
Member


I thought ObjectsAreEqualValues does not work for ptrace objects

Contributor Author


Yes, but the tests never compare ptrace objects with it, only operations, services, dependency links, etc.

Member

@yurishkuro yurishkuro left a comment


Similar comments apply in several functions; please address them comprehensively.

@Manik2708
Contributor Author

Another problem: for TestGetLargeTrace, in some runs the loaded and expected traces are the same, e.g. https://github.com/jaegertracing/jaeger/actions/runs/20660935489/job/59323122666?pr=7812#step:7:547, whereas in other runs the difference is very high, e.g. https://github.com/jaegertracing/jaeger/actions/runs/20660935489/job/59323122693?pr=7812#step:7:784. I can't figure out the exact reason or how it is linked to the conversion. Initially I thought it might be related to normalization, but the same issue shows up with the memory storage, where there is no normalization.

This issue seems to be fixed now! I think jiter.CollectWithError fixed it; maybe I was doing something wrong before. The only problem left is one failing test in Kafka, and that too with the otlp_json encoding.

@Manik2708
Contributor Author

Manik2708 commented Jan 3, 2026

@yurishkuro The OTLP JSON encoding test for the large trace is failing with the log below. (I increased the number of traces from 200 to 8000 in TestGetLargeTrace; it passed up to 5000 but failed at 8000 with these logs.) What I have concluded is that the collector throws this line: "The request included a message larger than the max message size the server will accept." Do you know of any solution to this? Can we change the batch size? (The whole trace is being sent as a single record; you can see records = 1 in the log.) Also, I can't see how this is linked to this refactoring!

2026-01-03T12:34:04.127+0530	error	[email protected]/kafka_exporter.go:185	kafka records export failed	{"resource": {"service.instance.id": "e1651255-1e73-425a-82a7-2131ae3764fd", "service.name": "jaeger_collector", "service.version": ""}, "otelcol.component.id": "kafka", "otelcol.component.kind": "exporter", "otelcol.signal": "traces", "records": 1, "topic": "jaeger-spans-1767423837680097000", "error": "error exporting to topic \"jaeger-spans-1767423837680097000\": MESSAGE_TOO_LARGE: The request included a message larger than the max message size the server will accept."}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/kafkaexporter.(*kafkaExporter[...]).exportData
	/Users/manikmehta/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/kafka_exporter.go:185
go.opentelemetry.io/collector/consumer.ConsumeTracesFunc.ConsumeTraces
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/traces.go:27
go.opentelemetry.io/collector/exporter/exporterhelper.NewTraces.RequestConsumeFromTraces.func2
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queuebatch/traces.go:124
go.opentelemetry.io/collector/exporter/exporterhelper/internal/sender.(*sender[...]).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/sender/sender.go:31
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*retrySender).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/retry_sender.go:91
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*obsReportSender[...]).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/obs_report_sender.go:92
go.opentelemetry.io/collector/exporter/exporterhelper/internal.NewQueueSender.func1
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queue_sender.go:49
go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch.(*disabledBatcher[...]).Consume
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queuebatch/disabled_batcher.go:23
go.opentelemetry.io/collector/exporter/exporterhelper/internal/queue.(*asyncQueue[...]).Start.func1
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queue/async_queue.go:49
2026-01-03T12:34:04.128+0530	info	internal/retry_sender.go:133	Exporting failed. Will retry the request after interval.	{"resource": {"service.instance.id": "e1651255-1e73-425a-82a7-2131ae3764fd", "service.name": "jaeger_collector", "service.version": ""}, "otelcol.component.id": "kafka", "otelcol.component.kind": "exporter", "otelcol.signal": "traces", "error": "error exporting to topic \"jaeger-spans-1767423837680097000\": MESSAGE_TOO_LARGE: The request included a message larger than the max message size the server will accept.", "interval": "5.501876495s"}
2026-01-03T12:34:04.138+0530	info	franz	[email protected]/kzap.go:114	skipping producer id initialization because the client was configured to disable idempotent writes	{"resource": {"service.instance.id": "e1651255-1e73-425a-82a7-2131ae3764fd", "service.name": "jaeger_collector", "service.version": ""}, "otelcol.component.id": "kafka", "otelcol.component.kind": "exporter", "otelcol.signal": "traces"}
2026-01-03T12:34:09.642+0530	error	[email protected]/kafka_exporter.go:185	kafka records export failed	{"resource": {"service.instance.id": "e1651255-1e73-425a-82a7-2131ae3764fd", "service.name": "jaeger_collector", "service.version": ""}, "otelcol.component.id": "kafka", "otelcol.component.kind": "exporter", "otelcol.signal": "traces", "records": 1, "topic": "jaeger-spans-1767423837680097000", "error": "error exporting to topic \"jaeger-spans-1767423837680097000\": MESSAGE_TOO_LARGE: The request included a message larger than the max message size the server will accept."}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/kafkaexporter.(*kafkaExporter[...]).exportData
	/Users/manikmehta/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/kafka_exporter.go:185
go.opentelemetry.io/collector/consumer.ConsumeTracesFunc.ConsumeTraces
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/traces.go:27
go.opentelemetry.io/collector/exporter/exporterhelper.NewTraces.RequestConsumeFromTraces.func2
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queuebatch/traces.go:124
go.opentelemetry.io/collector/exporter/exporterhelper/internal/sender.(*sender[...]).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/sender/sender.go:31
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*retrySender).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/retry_sender.go:91
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*obsReportSender[...]).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/obs_report_sender.go:92
go.opentelemetry.io/collector/exporter/exporterhelper/internal.NewQueueSender.func1
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queue_sender.go:49
go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch.(*disabledBatcher[...]).Consume
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queuebatch/disabled_batcher.go:23
go.opentelemetry.io/collector/exporter/exporterhelper/internal/queue.(*asyncQueue[...]).Start.func1
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queue/async_queue.go:49
2026-01-03T12:34:09.642+0530	info	internal/retry_sender.go:133	Exporting failed. Will retry the request after interval.	{"resource": {"service.instance.id": "e1651255-1e73-425a-82a7-2131ae3764fd", "service.name": "jaeger_collector", "service.version": ""}, "otelcol.component.id": "kafka", "otelcol.component.kind": "exporter", "otelcol.signal": "traces", "error": "error exporting to topic \"jaeger-spans-1767423837680097000\": MESSAGE_TOO_LARGE: The request included a message larger than the max message size the server will accept.", "interval": "6.692620082s"}
2026-01-03T12:34:16.345+0530	error	[email protected]/kafka_exporter.go:185	kafka records export failed	{"resource": {"service.instance.id": "e1651255-1e73-425a-82a7-2131ae3764fd", "service.name": "jaeger_collector", "service.version": ""}, "otelcol.component.id": "kafka", "otelcol.component.kind": "exporter", "otelcol.signal": "traces", "records": 1, "topic": "jaeger-spans-1767423837680097000", "error": "error exporting to topic \"jaeger-spans-1767423837680097000\": MESSAGE_TOO_LARGE: The request included a message larger than the max message size the server will accept."}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/kafkaexporter.(*kafkaExporter[...]).exportData
	/Users/manikmehta/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/kafka_exporter.go:185
go.opentelemetry.io/collector/consumer.ConsumeTracesFunc.ConsumeTraces
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/traces.go:27
go.opentelemetry.io/collector/exporter/exporterhelper.NewTraces.RequestConsumeFromTraces.func2
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queuebatch/traces.go:124
go.opentelemetry.io/collector/exporter/exporterhelper/internal/sender.(*sender[...]).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/sender/sender.go:31
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*retrySender).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/retry_sender.go:91
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*obsReportSender[...]).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/obs_report_sender.go:92
go.opentelemetry.io/collector/exporter/exporterhelper/internal.NewQueueSender.func1
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queue_sender.go:49
go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch.(*disabledBatcher[...]).Consume
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queuebatch/disabled_batcher.go:23
go.opentelemetry.io/collector/exporter/exporterhelper/internal/queue.(*asyncQueue[...]).Start.func1
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queue/async_queue.go:49
2026-01-03T12:34:16.346+0530	info	internal/retry_sender.go:133	Exporting failed. Will retry the request after interval.	{"resource": {"service.instance.id": "e1651255-1e73-425a-82a7-2131ae3764fd", "service.name": "jaeger_collector", "service.version": ""}, "otelcol.component.id": "kafka", "otelcol.component.kind": "exporter", "otelcol.signal": "traces", "error": "error exporting to topic \"jaeger-spans-1767423837680097000\": MESSAGE_TOO_LARGE: The request included a message larger than the max message size the server will accept.", "interval": "5.722965263s"}
2026-01-03T12:34:22.082+0530	error	[email protected]/kafka_exporter.go:185	kafka records export failed	{"resource": {"service.instance.id": "e1651255-1e73-425a-82a7-2131ae3764fd", "service.name": "jaeger_collector", "service.version": ""}, "otelcol.component.id": "kafka", "otelcol.component.kind": "exporter", "otelcol.signal": "traces", "records": 1, "topic": "jaeger-spans-1767423837680097000", "error": "error exporting to topic \"jaeger-spans-1767423837680097000\": MESSAGE_TOO_LARGE: The request included a message larger than the max message size the server will accept."}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/kafkaexporter.(*kafkaExporter[...]).exportData
	/Users/manikmehta/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/kafka_exporter.go:185
go.opentelemetry.io/collector/consumer.ConsumeTracesFunc.ConsumeTraces
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/traces.go:27
go.opentelemetry.io/collector/exporter/exporterhelper.NewTraces.RequestConsumeFromTraces.func2
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queuebatch/traces.go:124
go.opentelemetry.io/collector/exporter/exporterhelper/internal/sender.(*sender[...]).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/sender/sender.go:31
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*retrySender).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/retry_sender.go:91
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*obsReportSender[...]).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/obs_report_sender.go:92
go.opentelemetry.io/collector/exporter/exporterhelper/internal.NewQueueSender.func1
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queue_sender.go:49
go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch.(*disabledBatcher[...]).Consume
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queuebatch/disabled_batcher.go:23
go.opentelemetry.io/collector/exporter/exporterhelper/internal/queue.(*asyncQueue[...]).Start.func1
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queue/async_queue.go:49
2026-01-03T12:34:22.083+0530	info	internal/retry_sender.go:133	Exporting failed. Will retry the request after interval.	{"resource": {"service.instance.id": "e1651255-1e73-425a-82a7-2131ae3764fd", "service.name": "jaeger_collector", "service.version": ""}, "otelcol.component.id": "kafka", "otelcol.component.kind": "exporter", "otelcol.signal": "traces", "error": "error exporting to topic \"jaeger-spans-1767423837680097000\": MESSAGE_TOO_LARGE: The request included a message larger than the max message size the server will accept.", "interval": "12.785583759s"}
2026-01-03T12:34:34.881+0530	error	[email protected]/kafka_exporter.go:185	kafka records export failed	{"resource": {"service.instance.id": "e1651255-1e73-425a-82a7-2131ae3764fd", "service.name": "jaeger_collector", "service.version": ""}, "otelcol.component.id": "kafka", "otelcol.component.kind": "exporter", "otelcol.signal": "traces", "records": 1, "topic": "jaeger-spans-1767423837680097000", "error": "error exporting to topic \"jaeger-spans-1767423837680097000\": MESSAGE_TOO_LARGE: The request included a message larger than the max message size the server will accept."}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/kafkaexporter.(*kafkaExporter[...]).exportData
	/Users/manikmehta/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/kafka_exporter.go:185
go.opentelemetry.io/collector/consumer.ConsumeTracesFunc.ConsumeTraces
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/traces.go:27
go.opentelemetry.io/collector/exporter/exporterhelper.NewTraces.RequestConsumeFromTraces.func2
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queuebatch/traces.go:124
go.opentelemetry.io/collector/exporter/exporterhelper/internal/sender.(*sender[...]).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/sender/sender.go:31
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*retrySender).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/retry_sender.go:91
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*obsReportSender[...]).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/obs_report_sender.go:92
go.opentelemetry.io/collector/exporter/exporterhelper/internal.NewQueueSender.func1
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queue_sender.go:49
go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch.(*disabledBatcher[...]).Consume
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queuebatch/disabled_batcher.go:23
go.opentelemetry.io/collector/exporter/exporterhelper/internal/queue.(*asyncQueue[...]).Start.func1
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queue/async_queue.go:49
2026-01-03T12:34:34.882+0530	info	internal/retry_sender.go:133	Exporting failed. Will retry the request after interval.	{"resource": {"service.instance.id": "e1651255-1e73-425a-82a7-2131ae3764fd", "service.name": "jaeger_collector", "service.version": ""}, "otelcol.component.id": "kafka", "otelcol.component.kind": "exporter", "otelcol.signal": "traces", "error": "error exporting to topic \"jaeger-spans-1767423837680097000\": MESSAGE_TOO_LARGE: The request included a message larger than the max message size the server will accept.", "interval": "28.738999952s"}
2026-01-03T12:35:03.630+0530	error	[email protected]/kafka_exporter.go:185	kafka records export failed	{"resource": {"service.instance.id": "e1651255-1e73-425a-82a7-2131ae3764fd", "service.name": "jaeger_collector", "service.version": ""}, "otelcol.component.id": "kafka", "otelcol.component.kind": "exporter", "otelcol.signal": "traces", "records": 1, "topic": "jaeger-spans-1767423837680097000", "error": "error exporting to topic \"jaeger-spans-1767423837680097000\": MESSAGE_TOO_LARGE: The request included a message larger than the max message size the server will accept."}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/kafkaexporter.(*kafkaExporter[...]).exportData
	/Users/manikmehta/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/kafka_exporter.go:185
go.opentelemetry.io/collector/consumer.ConsumeTracesFunc.ConsumeTraces
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/traces.go:27
go.opentelemetry.io/collector/exporter/exporterhelper.NewTraces.RequestConsumeFromTraces.func2
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queuebatch/traces.go:124
go.opentelemetry.io/collector/exporter/exporterhelper/internal/sender.(*sender[...]).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/sender/sender.go:31
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*retrySender).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/retry_sender.go:91
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*obsReportSender[...]).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/obs_report_sender.go:92
go.opentelemetry.io/collector/exporter/exporterhelper/internal.NewQueueSender.func1
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queue_sender.go:49
go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch.(*disabledBatcher[...]).Consume
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queuebatch/disabled_batcher.go:23
go.opentelemetry.io/collector/exporter/exporterhelper/internal/queue.(*asyncQueue[...]).Start.func1
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queue/async_queue.go:49
2026-01-03T12:35:03.631+0530	info	internal/retry_sender.go:133	Exporting failed. Will retry the request after interval.	{"resource": {"service.instance.id": "e1651255-1e73-425a-82a7-2131ae3764fd", "service.name": "jaeger_collector", "service.version": ""}, "otelcol.component.id": "kafka", "otelcol.component.kind": "exporter", "otelcol.signal": "traces", "error": "error exporting to topic \"jaeger-spans-1767423837680097000\": MESSAGE_TOO_LARGE: The request included a message larger than the max message size the server will accept.", "interval": "35.306777749s"}
2026-01-03T12:35:38.948+0530	error	[email protected]/kafka_exporter.go:185	kafka records export failed	{"resource": {"service.instance.id": "e1651255-1e73-425a-82a7-2131ae3764fd", "service.name": "jaeger_collector", "service.version": ""}, "otelcol.component.id": "kafka", "otelcol.component.kind": "exporter", "otelcol.signal": "traces", "records": 1, "topic": "jaeger-spans-1767423837680097000", "error": "error exporting to topic \"jaeger-spans-1767423837680097000\": MESSAGE_TOO_LARGE: The request included a message larger than the max message size the server will accept."}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/kafkaexporter.(*kafkaExporter[...]).exportData
	/Users/manikmehta/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/kafka_exporter.go:185
go.opentelemetry.io/collector/consumer.ConsumeTracesFunc.ConsumeTraces
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/traces.go:27
go.opentelemetry.io/collector/exporter/exporterhelper.NewTraces.RequestConsumeFromTraces.func2
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queuebatch/traces.go:124
go.opentelemetry.io/collector/exporter/exporterhelper/internal/sender.(*sender[...]).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/sender/sender.go:31
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*retrySender).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/retry_sender.go:91
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*obsReportSender[...]).Send
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/obs_report_sender.go:92
go.opentelemetry.io/collector/exporter/exporterhelper/internal.NewQueueSender.func1
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queue_sender.go:49
go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch.(*disabledBatcher[...]).Consume
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queuebatch/disabled_batcher.go:23
go.opentelemetry.io/collector/exporter/exporterhelper/internal/queue.(*asyncQueue[...]).Start.func1
	/Users/manikmehta/go/pkg/mod/go.opentelemetry.io/collector/exporter/[email protected]/internal/queue/async_queue.go:49
2026-01-03T12:35:38.949+0530	info	internal/retry_sender.go:133	Exporting failed. Will retry the request after interval.	{"resource": {"service.instance.id": "e1651255-1e73-425a-82a7-2131ae3764fd", "service.name": "jaeger_collector", "service.version": ""}, "otelcol.component.id": "kafka", "otelcol.component.kind": "exporter", "otelcol.signal": "traces", "error": "error exporting to topic \"jaeger-spans-1767423837680097000\": MESSAGE_TOO_LARGE: The request included a message larger than the max message size the server will accept.", "interval": "31.326463227s"}


@Manik2708 Manik2708 requested a review from yurishkuro January 3, 2026 07:31
@yurishkuro
Member

Also, I can't see how this is linked to this refactoring!

Are you generating more verbose traces which exceed the message limit?

There should be settings in both the Kafka broker and the exporter for the max message size. Alternatively, we can use a smaller batch in the collector config; there's no reason to send all 1000 spans as a single message.

@Manik2708
Contributor Author

Manik2708 commented Jan 7, 2026

Also, I can't see how this is linked to this refactoring!

Are you generating more verbose traces which exceed the message limit?

There should be settings in both the Kafka broker and the exporter for the max message size. Alternatively, we can use a smaller batch in the collector config; there's no reason to send all 1000 spans as a single message.

  1. The difference in sending traces is that traces are normalized to one resource span per span, because Kafka has 4 tests; the OTLP tests don't require normalization, but the Jaeger proto tests do. I tried turning off normalization, and the otlp_json test is still failing.
  2. The default value of send_batch_size is 8192; should we reduce it? If yes, what would be a good number? AI suggests 1024 for send_batch_size and 2048 for send_batch_max_size; should I try these numbers? (See the config sketch below.)
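For reference, a sketch of what lowering the batch processor limits could look like in the collector config; the values are just the ones floated above, not settings taken from this repository:

processors:
  batch:
    # Hypothetical values from the discussion above; keep each batch
    # comfortably below the broker's maximum message size.
    send_batch_size: 1024
    send_batch_max_size: 2048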

@yurishkuro
Member

I remember there was an outstanding ticket in upstream OTEL contrib about the Kafka exporter not respecting the max batch size / message size, i.e. the batch processor can make a message bigger by aggregating multiple ptrace.Traces but cannot make one huge ptrace.Traces payload smaller. There was a PR trying to solve that, but I don't recall seeing it merged.

@Manik2708
Contributor Author

I remember there was an outstanding ticket in upstream OTEL contrib about the Kafka exporter not respecting the max batch size / message size, i.e. the batch processor can make a message bigger by aggregating multiple ptrace.Traces but cannot make one huge ptrace.Traces payload smaller. There was a PR trying to solve that, but I don't recall seeing it merged.

yes, the issue is still there!

_ io.Closer = (*traceWriter)(nil)

MaxChunkSize = 35 // max chunk size otel kafka export can handle safely.
MaxChunkSize = 5 // max chunk size otel kafka export can handle safely.
Contributor Author

@Manik2708 Manik2708 Jan 8, 2026


Reducing it to this value made the tests pass.
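For context, the kind of splitting this constant gates looks roughly like the sketch below, which caps the number of spans per ptrace.Traces payload handed to the exporter. This is a hypothetical sketch: the helper name and layout are illustrative, not the writer's actual code.

import "go.opentelemetry.io/collector/pdata/ptrace"

// chunkTraces splits td into payloads with at most maxSpans spans each,
// copying the resource and scope for every span it moves.
// Illustrative only; not the traceWriter's actual implementation.
func chunkTraces(td ptrace.Traces, maxSpans int) []ptrace.Traces {
    chunks := []ptrace.Traces{}
    current := ptrace.NewTraces()
    count := 0
    for i := 0; i < td.ResourceSpans().Len(); i++ {
        rs := td.ResourceSpans().At(i)
        for j := 0; j < rs.ScopeSpans().Len(); j++ {
            ss := rs.ScopeSpans().At(j)
            for k := 0; k < ss.Spans().Len(); k++ {
                if count == maxSpans {
                    chunks = append(chunks, current)
                    current = ptrace.NewTraces()
                    count = 0
                }
                newRS := current.ResourceSpans().AppendEmpty()
                rs.Resource().CopyTo(newRS.Resource())
                newSS := newRS.ScopeSpans().AppendEmpty()
                ss.Scope().CopyTo(newSS.Scope())
                ss.Spans().At(k).CopyTo(newSS.Spans().AppendEmpty())
                count++
            }
        }
    }
    if count > 0 {
        chunks = append(chunks, current)
    }
    return chunks
}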

@Manik2708
Contributor Author

@yurishkuro Please review

loadAndParseJSONPB(t, fileName, &trace)
return &trace
// getNormalisedTraces normalise traces and assign one resource span to one span
func getNormalisedTraces(td ptrace.Traces) ptrace.Traces {
Member


we usually use American spelling

Suggested change
func getNormalisedTraces(td ptrace.Traces) ptrace.Traces {
func getNormalizedTraces(td ptrace.Traces) ptrace.Traces {

I am not clear why this function is needed.

require.NoError(t, err, "Not expecting error when unmarshaling fixture %s", path)
}

func correctTimeForTraces(trace ptrace.Traces) {
Member


add comments

Member


and move to dates.go with dates_test.go

ptracetest.IgnoreSpansOrder(),
}
if err := ptracetest.CompareTraces(expected, actual, options...); err != nil {
t.Logf("Actual trace and expected traces are not equal: %v", err)
Member


have you tried comparing JSON strings instead? I think it's nice to get a full dump of diffs, not just the first breaking point.

Example:

import (
    "fmt"

    "github.com/hexops/gotextdiff"
    "github.com/hexops/gotextdiff/myers"
    "github.com/hexops/gotextdiff/span"
)

func DiffStrings(want, got string) string {
    edits := myers.ComputeEdits(span.URIFromPath("want.json"), want, got)
    diff := gotextdiff.ToUnified("want.json", "got.json", want, edits)
    return fmt.Sprint(diff)
}
