Prototype(symbolization): Add symbolization in Pyroscope read path #3799
base: main
Conversation
would be nice to have some benchmarks of symbolizing different amounts of locations and different file sizes; I think it can help us pick the right place and architecture for using this
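A minimal benchmark sketch along these lines; note that `Location` and `symbolize` are hypothetical stand-ins for the PR's symbolizer API, and the 16,000 count matches the default truncation limit mentioned later in this thread:

```go
package main

import (
	"fmt"
	"testing"
)

// Location is a hypothetical stand-in for the profile location type.
type Location struct{ Address uint64 }

// symbolize is a stub; a real benchmark would call the symbolizer here,
// ideally against debug files of different sizes.
func symbolize(locs []Location) {
	var sum uint64
	for i := range locs {
		sum += locs[i].Address // stand-in for a real per-location lookup
	}
	_ = sum
}

// benchmarkSymbolize measures symbolization of n locations using
// testing.Benchmark, which works outside of `go test`.
func benchmarkSymbolize(n int) testing.BenchmarkResult {
	locs := make([]Location, n)
	for i := range locs {
		locs[i].Address = 0x400000 + uint64(i)*16
	}
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			symbolize(locs)
		}
	})
}

func main() {
	for _, n := range []int{100, 1_000, 16_000} {
		fmt.Printf("locations=%d: %s\n", n, benchmarkSymbolize(n))
	}
}
```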
Force-pushed from efdde88 to 6b009d3
Good work, Marc! I'm excited to see some experimental results 🚀
I think we can implement a slightly more optimized version for production use:
sequenceDiagram
autonumber
participant QF as Query Frontend
participant M as Metastore
participant QB as Query Backend
participant SYM as Symbolizer
QF ->>+M: Query Metadata
Note left of M: Build identifiers are returned<br> along with the metadata records
M ->>-QF:
par
QF ->>+SYM: Request for symbolication
Note left of SYM: Prepare symbols for<br>the objects requested
and
QF ->>+QB: Data retrieval and aggregation
Note left of QB: The main data path<br>Might be serverless
end
QB ->>-QF: Data in pprof format
Note over QF: Because of the truncation,<br> only a limited set of locations<br>make it here (16K by default)
QF --)SYM: Location addresses
SYM ->>-QF: Symbols
QF ->>QF: Flame graph rendering
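Sketched in Go (querySymbols and queryData are illustrative stand-ins, not the actual Pyroscope API), the parallel par/and step of the diagram could look like:

```go
package main

import (
	"fmt"
	"sync"
)

// querySymbols stands in for the symbolizer request: prepare symbols
// for the objects identified by the metadata build IDs.
func querySymbols(buildIDs []string) map[string]string {
	out := make(map[string]string, len(buildIDs))
	for _, id := range buildIDs {
		out[id] = "symbols-for-" + id
	}
	return out
}

// queryData stands in for the main data path through the query backend.
func queryData() string { return "pprof-data" }

func main() {
	var (
		wg      sync.WaitGroup
		symbols map[string]string
		data    string
	)
	// Kick off symbol preparation and data retrieval concurrently,
	// as in steps 3-4 of the sequence diagram.
	wg.Add(2)
	go func() { defer wg.Done(); symbols = querySymbols([]string{"abc123"}) }()
	go func() { defer wg.Done(); data = queryData() }()
	wg.Wait()
	fmt.Println(data, symbols["abc123"]) // pprof-data symbols-for-abc123
}
```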
Even without a parallel pipeline and dedicated symbolication service, we could implement something like this:
sequenceDiagram
autonumber
participant QF as Query Frontend
participant M as Metastore
participant QB as Query Backend
participant SYM as Symbols
QF ->>+M: Query Metadata
Note left of M: No build identifiers are returned
M ->>-QF:
QF ->>+QB: Data retrieval and aggregation
Note left of QB: The main data path<br>Might be serverless
QB ->>-QF: Data in pprof format
Note over QF: Because of the truncation,<br> only a limited set of locations<br>make it here (16K by default)
QF ->>+SYM: Fetch symbols
SYM ->>-QF: Symbols
Note over QF: In terms of the added latency,<br>this approach is not worse than<br>block level symbolication
QF ->>QF: Flame graph rendering
I think we should avoid symbolization at the block level if the symbols are not already present in the block itself. Otherwise, this approach leads to excessive processing, increased latency, and higher resource usage. Please keep in mind that a query may span many thousands of blocks.
I won't delve too deeply into how we fetch and process ELF/DWARF files, but I strongly doubt we can bypass the need for an intermediate representation optimized for our access patterns. Additionally, we need a solution to prevent concurrent access to the debuginfod service.
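On preventing concurrent access to the debuginfod service: one common approach is request deduplication per build ID, which is what golang.org/x/sync/singleflight provides. A stdlib-only sketch of the idea (all names are illustrative):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// fetchGroup deduplicates concurrent fetches for the same key: only the
// first caller performs the fetch, the rest wait for its result.
type fetchGroup struct {
	mu       sync.Mutex
	inflight map[string]*call
}

type call struct {
	done chan struct{}
	val  []byte
}

func (g *fetchGroup) Do(key string, fetch func() []byte) []byte {
	g.mu.Lock()
	if g.inflight == nil {
		g.inflight = make(map[string]*call)
	}
	if c, ok := g.inflight[key]; ok {
		g.mu.Unlock()
		<-c.done // someone is already fetching this build ID
		return c.val
	}
	c := &call{done: make(chan struct{})}
	g.inflight[key] = c
	g.mu.Unlock()

	c.val = fetch()
	g.mu.Lock()
	delete(g.inflight, key)
	g.mu.Unlock()
	close(c.done)
	return c.val
}

func main() {
	var (
		g       fetchGroup
		fetches int32
		wg      sync.WaitGroup
	)
	release := make(chan struct{})
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.Do("build-id-1", func() []byte {
				atomic.AddInt32(&fetches, 1)
				<-release // simulate a slow debuginfod download
				return []byte("elf")
			})
		}()
	}
	time.Sleep(50 * time.Millisecond) // let every goroutine reach Do
	close(release)
	wg.Wait()
	fmt.Println("fetches:", atomic.LoadInt32(&fetches))
}
```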
I have not looked into the code yet, but I've tried to run it locally and it looks like it's trying to load a lot of unnecessary debug files. I ran the ebpf profiler with no on-target symbolization, and also ran a simple … I then query only one executable and see 268 GET requests, with 13 requests to … Other than that it works \M/ Can't wait to run it in dev.
Hi @marcsanmi, when can this PR be merged?
Force-pushed from 2937519 to 0b8a289
Hi @liaol,
Force-pushed from 7c2ab09 to 87a481c
I've created this diagram to outline the current Symbolization arch:

flowchart TD
A[SymbolizePprof] --> B{Group by Mapping}
B --> C[Symbolize Request]
C --> D{Check Symbol Cache}
subgraph "Symbol Cache Layer (LRU, in-memory)"
D -->|Cache Hit| E[Return Cached Symbols]
D -->|Cache Miss| F
end
F{Check Debug Info Cache}
subgraph "Debug Info Cache Layer (Ristretto, in-memory)"
F -->|Cache Hit| G[Read from Debug Info Cache]
F -->|Cache Miss| H
end
subgraph "Persistent Storage Layer"
H{Check Object Store}
H -->|Cache Hit| I[Read from Object Store]
H -->|Cache Miss| J[Fetch from Debuginfod]
end
I --> K[Store in Debug Info Cache]
J --> L[Store in Debug Info Cache]
J --> M[Store in Object Store]
G --> N[Parse ELF/DWARF]
K --> N
L --> N
subgraph "DWARF Resolution Layer"
N --> O[Resolve Addresses]
O --> P{Check Address Map}
P -->|Map Hit| Q[Return from Map]
P -->|Map Miss| R[Parse DWARF Data]
R --> S[Build Lookup Tables]
S --> T[Store in Address Map]
T --> U[Return Symbols]
Q --> U
end
U --> V[Update Symbol Cache]
V --> W[Return to Caller]
E --> W
pkg/test/integration/testdata/otel-ebpf-profiler-offcpu-cpu.json
I might be missing some details, but I have doubts about the cache hierarchy. Now it looks like we have: symbols_cache -> object_store -> in_memory_object_store (ristretto) -> debuginfod. As far as I understand, we're going to read from object_store even if there's just a single unresolved address. I expected to see: symbols_cache -> in_memory_object_store (ristretto) -> object_store -> debuginfod. Could you please elaborate on the decision?
You're right @kolesnikovae. I've just realized the problem is that the ristretto cache is coupled inside the debuginfod client. I'll decouple it and place it at the symbolizer level. Thus, we'll be able to have the following path: symbols_cache -> in_memory_object_store (ristretto) -> object_store -> debuginfod
Force-pushed from fd39d9e to 65fb599
Force-pushed from 65fb599 to 10f52ea
Hm. I was sure that the .bin extension would do the trick. Well, the proper solution is to use .gitattributes; I think it should be located in the integration folder and contain:

testdata/* binary
I've tried with

testdata/* binary
testdata/* -text

and running git add --renormalize ., but nothing changed... I'll further investigate.
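For reference, a way to verify which attributes git actually resolves for a file (the path below assumes the integration folder layout discussed in this thread; `binary` is a macro attribute equivalent to `-diff -merge -text`):

```shell
# From the repo root: show all attributes git applies to the test file.
git check-attr -a pkg/test/integration/testdata/otel-ebpf-profiler-offcpu-cpu.json

# After editing .gitattributes, re-apply the rules to already-tracked files:
git add --renormalize .
git status   # lists files whose normalization changed
```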
* Fix generation of function IDs
* Store debuginfo request 404s in symbol cache
* Add fallback symbols for unsymbolizable profiles
…ce over ConvertProfileToTree
…lizion by hasNativeProfiles
…l review comments
Force-pushed from 671af5c to 61d2b41
@@ -200,6 +202,11 @@ func (c *Config) RegisterFlagsWithContext(f *flag.FlagSet) {
	c.API.RegisterFlags(f)
	c.EmbeddedGrafana.RegisterFlags(f)
	c.TenantSettings.RegisterFlags(f)

	c.v2Experiment = os.Getenv("PYROSCOPE_V2_EXPERIMENT") != ""
don't we already have c.v2Experiment = os.Getenv("PYROSCOPE_V2_EXPERIMENT") != "" in registerServerFlagsWithChangedDefaultValues? Do we need this duplication?
"mapping": "0x400000-0x40c000@0x0 cat(2efc66e3f4cae30a989532b37b1d5dc81026be68)"
},
{
"address": "0x402462",
"lines": [
"cat 0x402462[]@:0"
I still don't understand why these are removed. I'm 100% sure I've asked why we remove these, but cannot find where my question was.
@@ -337,6 +337,10 @@ func concatSegmentHead(f *headFlush, w *writerOffset, s *metadata.StringTable) (
	lb.WithLabelSet(model.LabelNameServiceName, f.head.key.service, model.LabelNameProfileType, profileType)
}

if f.flushed.HasNativeProfiles {
	lb.WithLabelSet(model.LabelNameServiceName, f.head.key.service, metadata.LabelNameHasNativeProfiles, "true")
why do we set the service name here a second time?
It is okay – this is the way our metadata labels work (I think I want to rename them to attributes, to stop abusing the word labels – too vague, too widely spread). We want to avoid locking ourselves into a 1:1 dataset:service mapping, and we want multi-value attributes (e.g., profile types).
Therefore, we represent dataset labels in first normal form (1NF): each label set is independent – we can think of them as rows in a table. The overhead of repetitive labels is affordable: string interning and proto encoding make it practically negligible.
So, if you query metadata entries for a service and want to retrieve some extra attributes (like whether it has native profiles), you need to maintain the relation between the service and the attribute when you add it.
if len(s) < 3 || !strings.HasPrefix(s, "0x") {
	return false
}
are these functions used or needed? getFunctionById, isHexAddress
not anymore, good catch 👍
Please clean up the code after the LLM
// Size of the file header in bytes
headerSize = 0x80
headerSize = 0x98

// Number of fields in a line table entry
lineTableFieldsCount = 2

// Size of each line table entry (2 uint16s or 2 uint32s)
lineTableFieldsSize = 4
Just in case: does such change invalidate existing objects?
I'll remove the constant to properly calculate this dynamically as:
entrySize := 4
if hdr.lineTablesHeader.fieldSize == 4 {
entrySize = 8
}
hdr.binaryLayoutHeader.offset = hdr.lineTablesHeader.offset + uint64(len(rc.lb.entries)*entrySize)
But to answer your question: no, this should only affect the offset calculation when writing a new file, ensuring that the binary layout section starts at the correct position. It has no impact on how we read existing files.
N/A anymore since we removed MapRuntimeAddress, so no binaryLayout is needed.
if layout == nil {
	return fmt.Errorf("nil binary layout")
}
Is this possible?
Not as of now.
cmd/symbolization/main.go
Let's extend profilecli in a separate PR and remove this file for now
// memoryReader implements io.ReadCloser and io.ReaderAt for reading from an in-memory byte slice
type memoryReader struct {
	bs  []byte
	off int64
}

func (b *memoryReader) Read(p []byte) (n int, err error) {
	res, err := b.ReadAt(p, b.off)
	b.off += int64(res)
	return res, err
}

func (b *memoryReader) ReadAt(p []byte, off int64) (n int, err error) {
	if off >= int64(len(b.bs)) {
		return 0, io.EOF
	}
	n = copy(p, b.bs[off:])
	return n, nil
}

func (b *memoryReader) Close() error {
	return nil
}
Please use bytes.Reader with io.NopCloser instead.
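A sketch of that suggestion, assuming callers need both io.ReadCloser and io.ReaderAt (which the hand-rolled memoryReader provided); newMemoryReader and readCloserAt are illustrative names. If only io.ReadCloser is needed, a plain io.NopCloser(bytes.NewReader(bs)) suffices.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// readCloserAt combines io.ReadCloser and io.ReaderAt without any
// hand-written Read/ReadAt logic.
type readCloserAt struct {
	*bytes.Reader // provides Read, ReadAt, Seek
	io.Closer     // no-op Close from io.NopCloser
}

func newMemoryReader(bs []byte) readCloserAt {
	r := bytes.NewReader(bs)
	return readCloserAt{Reader: r, Closer: io.NopCloser(r)}
}

func main() {
	r := newMemoryReader([]byte("hello world"))
	buf := make([]byte, 5)
	n, _ := r.ReadAt(buf, 6)
	fmt.Println(n, string(buf[:n])) // 5 world
	_ = r.Close()
}
```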
for _, loc := range req.Locations {
	resolveStart := time.Now()
	addr, err := MapRuntimeAddress(loc.Address, ei, Mapping{
		Start:  loc.Mapping.Start,
		Limit:  loc.Mapping.Limit,
		Offset: loc.Mapping.Offset,
	})
	if err != nil {
		return fmt.Errorf("normalize address: %w", err)
	}

	// Look up the address directly in the lidia table
	frames, err := table.Lookup(framesBuf, addr)
	if err != nil {
		level.Error(s.logger).Log(
			"msg", "failed to resolve address on Lidia table lookup",
			"addr", fmt.Sprintf("0x%x", addr),
			"binary", req.BinaryName,
			"build_id", req.BuildID,
			"error", err,
		)
		loc.Lines = s.createNotFoundSymbols(req.BinaryName, loc, addr)
		s.metrics.debugSymbolResolution.WithLabelValues(StatusErrorServerError).Observe(time.Since(resolveStart).Seconds())
		continue
	}
- We don't want to update metrics on each resolved location: it's expensive and not useful; we are measuring memory latency here.
- We don't want to write a log message on each failed Lookup: it will kill the system.
// - The program might be loaded at a different address than it was linked for
// - Different segments might need different adjustments
// - Various ELF types (EXEC, DYN, REL) handle addressing differently
func MapRuntimeAddress(runtimeAddr uint64, ei *BinaryLayout, m Mapping) (uint64, error) {
@korniltsev could you please take a careful look? I assume we should have something very similar in the profiler-side symbolizer.
- REL is not something that can be loaded, so we should not handle it
- for EXEC, the base address is always 0
- for DYN, the otel ebpf profiler always outputs addresses relative to the ELF base, so if an ELF has a function at 0xcafe and the library is loaded at 0x7f0000000000, the profiler outputs just 0xcafe. We may need this addrmapper if in the future some other profiler outputs non-relative addresses (0x7f000000cafe). I remember we found a bug somewhere here in addrmapper when the debug ELF and source ELF had different PT_LOAD addresses, and this code was outputting incorrect addresses for a debug-only file. This was an alloy binary. In the case of the full binary it was correct, and the base offset was calculated as zero because it was an otel profile. I think we either need to solve this issue, make it work for both debug-only ELFs and regular ELFs including debug sections, and have tests for these cases, or just remove this addrmapper altogether and assume the base is always zero (until we find some other non-otel profiler which outputs non-relative addresses)
This is my understanding as well: we do not want to normalize addresses at lookup time. For DYN, it should be done either in profiler, or during the translation to lidia.
the otel ebpf profiler always outputs addresses relative to the elf base
I wasn't aware of it. Considering this, I'd say we can remove it. As Tolyan mentioned, this will also simplify implementation since lidia binary layout won't be needed anymore (at least for now).
for _, loc := range req.Locations {
	resolveStart := time.Now()
	addr, err := MapRuntimeAddress(loc.Address, ei, Mapping{
		Start:  loc.Mapping.Start,
		Limit:  loc.Mapping.Limit,
		Offset: loc.Mapping.Offset,
	})
	if err != nil {
		return fmt.Errorf("normalize address: %w", err)
	}
I'd try to normalize all addresses before trying to resolve them.
// Create symbolization request for this mapping group
req := s.createSymbolizationRequest(binaryName, buildID, mapping, locs)

if err := s.Symbolize(ctx, &req); err != nil {
	return err
}

// Store symbolization results back into profile
s.updateProfileWithSymbols(profile, mapping, locs, req.Locations)
The code generated by the LLM is extremely inefficient and might be invalid. Please consider an alternative:
- We split the profile by mappings. Each such profile only contains Locations, Mappings (just one), and StringTable.
- Each such sub-profile is symbolized independently, in parallel (bounded).
- Symbolized profiles are merged.
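A sketch of that shape; the subProfile type and the symbolizeOne step are placeholders for the real profile split and symbolizer, and the bound is enforced with a semaphore channel:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// subProfile is a placeholder for a profile slice that contains the
// locations of exactly one mapping plus its string table.
type subProfile struct {
	mappingID string
	locations []uint64
}

// symbolizeOne is a stub for symbolizing a single-mapping sub-profile.
func symbolizeOne(p subProfile) subProfile { return p }

// symbolizeAll symbolizes each sub-profile independently, at most
// `bound` at a time, then returns them in order for merging.
func symbolizeAll(parts []subProfile, bound int) []subProfile {
	sem := make(chan struct{}, bound)
	var wg sync.WaitGroup
	out := make([]subProfile, len(parts))
	for i, p := range parts {
		wg.Add(1)
		go func(i int, p subProfile) {
			defer wg.Done()
			sem <- struct{}{}        // acquire
			defer func() { <-sem }() // release
			out[i] = symbolizeOne(p)
		}(i, p)
	}
	wg.Wait()
	return out
}

func main() {
	parts := []subProfile{
		{mappingID: "libc", locations: []uint64{0x1}},
		{mappingID: "app", locations: []uint64{0x2}},
		{mappingID: "libssl", locations: []uint64{0x3}},
	}
	done := symbolizeAll(parts, 2)
	ids := make([]string, 0, len(done))
	for _, p := range done {
		ids = append(ids, p.mappingID)
	}
	sort.Strings(ids)
	fmt.Println(ids) // [app libc libssl]
}
```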
func (s *ProfileSymbolizer) Symbolize(ctx context.Context, req *Request) error {
	start := time.Now()
	status := StatusSuccess
	defer func() {
		s.metrics.profileSymbolization.WithLabelValues(status).Observe(time.Since(start).Seconds())
	}()

	if s.checkSymbolCache(req) {
		return nil
	}

	if s.checkLidiaTableCache(ctx, req) {
		return nil
	}

	if s.checkObjectStoreCache(ctx, req) {
		return nil
	}

	return s.fetchAndCacheFromDebuginfod(ctx, req, &status)
}
A comment explaining the sequence and cache layers would be helpful.
func (m *Metrics) Unregister() {
	if m.registerer == nil {
		return
	}
Do we use it?
// TODO: change me back to false!
f.BoolVar(&cfg.Enabled, "symbolizer.enabled", true, "Enable symbolization for unsymbolized profiles")
Should probably be disabled by default
if s.store != nil {
	cacheStart := time.Now()
	if cacheErr := s.store.Put(ctx, req.BuildID, bytes.NewReader(data)); cacheErr != nil {
		s.metrics.cacheOperations.WithLabelValues("objstore_cache", "put", StatusErrorUpload).Observe(time.Since(cacheStart).Seconds())
We already measure it inside the bucket implementation.
// checkSymbolCache checks if all addresses are in the symbol cache
func (s *ProfileSymbolizer) checkSymbolCache(req *Request) bool {
	if s.symbolCache == nil {
when is s.symbolCache nil?
// updateAllCaches updates all caches with the fetched data
func (s *ProfileSymbolizer) updateAllCaches(ctx context.Context, req *Request, data []byte) error {
	// Store in Ristretto cache
	if s.lidiaTableCache != nil {
when is it nil?
}

func (s *ProfileSymbolizer) processELFData(data []byte) (lidiaData []byte, ei *BinaryLayout, err error) {
	// Create a reader from the data
Let's remove all the "obvious" comments all over the new package. They are not very useful:
- // Create a reader from data, right before bytes.NewReader
- // Create lidia file, right before lidia.CreateLidiaFromElf
- // Convert to our internal binary layout format, right before ei = convertBinaryLayout(lidiaLayout)
- // Open the Lidia table to extract binary layout, right before lidia.Open
}

// Store in ObjstoreCache
if s.store != nil {
when is s.store nil?
* feat: add symbolizer per tenant overrides
Context
This PR introduces a comprehensive implementation for DWARF symbolization of unsymbolized profiles in the Pyroscope read path. It enables automatic symbolization of profiles for non-customer code (primarily open-source libraries and binaries) where symbol information isn't available at collection time.
Symbolization
Multi-level Caching
Integration Points
Configuration Example
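The configuration example itself is not included above; based on the `symbolizer.enabled` flag registered in this PR, enabling the feature would look roughly like this (the exact invocation and any other flags are assumptions):

```
pyroscope -symbolizer.enabled=true
```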