Prototype(symbolization): Add symbolization in Pyroscope read path #3799
base: main
Conversation
would be nice to have some benchmarks of symbolizing different amounts of locations and different file sizes; I think it can help us pick the right place and architecture for using this
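A minimal benchmark sketch along these lines; note that `Location` and `symbolize` are hypothetical stand-ins for the PR's symbolizer API, and the 16,000 count matches the default truncation limit mentioned later in this thread:

```go
package main

import (
	"fmt"
	"testing"
)

// Location is a hypothetical stand-in for the profile location type.
type Location struct{ Address uint64 }

// symbolize is a stub; a real benchmark would call the symbolizer here,
// ideally against debug files of different sizes.
func symbolize(locs []Location) {
	var sum uint64
	for i := range locs {
		sum += locs[i].Address // stand-in for a real per-location lookup
	}
	_ = sum
}

// benchmarkSymbolize measures symbolization of n locations using
// testing.Benchmark, which works outside of `go test`.
func benchmarkSymbolize(n int) testing.BenchmarkResult {
	locs := make([]Location, n)
	for i := range locs {
		locs[i].Address = 0x400000 + uint64(i)*16
	}
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			symbolize(locs)
		}
	})
}

func main() {
	for _, n := range []int{100, 1_000, 16_000} {
		fmt.Printf("locations=%d: %s\n", n, benchmarkSymbolize(n))
	}
}
```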
Force-pushed from efdde88 to 6b009d3
Good work, Marc! I'm excited to see some experimental results 🚀
I think we can implement a slightly more optimized version for production use:
sequenceDiagram
autonumber
participant QF as Query Frontend
participant M as Metastore
participant QB as Query Backend
participant SYM as Symbolizer
QF ->>+M: Query Metadata
Note left of M: Build identifiers are returned<br> along with the metadata records
M ->>-QF:
par
QF ->>+SYM: Request for symbolication
Note left of SYM: Prepare symbols for<br>the objects requested
and
QF ->>+QB: Data retrieval and aggregation
Note left of QB: The main data path<br>Might be serverless
end
QB ->>-QF: Data in pprof format
Note over QF: Because of the truncation,<br> only a limited set of locations<br>make it here (16K by default)
QF --)SYM: Location addresses
SYM ->>-QF: Symbols
QF ->>QF: Flame graph rendering
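Sketched in Go (querySymbols and queryData are illustrative stand-ins, not the actual Pyroscope API), the parallel par/and step of the diagram could look like:

```go
package main

import (
	"fmt"
	"sync"
)

// querySymbols stands in for the symbolizer request: prepare symbols
// for the objects identified by the metadata build IDs.
func querySymbols(buildIDs []string) map[string]string {
	out := make(map[string]string, len(buildIDs))
	for _, id := range buildIDs {
		out[id] = "symbols-for-" + id
	}
	return out
}

// queryData stands in for the main data path through the query backend.
func queryData() string { return "pprof-data" }

func main() {
	var (
		wg      sync.WaitGroup
		symbols map[string]string
		data    string
	)
	// Kick off symbol preparation and data retrieval concurrently,
	// as in steps 3-4 of the sequence diagram.
	wg.Add(2)
	go func() { defer wg.Done(); symbols = querySymbols([]string{"abc123"}) }()
	go func() { defer wg.Done(); data = queryData() }()
	wg.Wait()
	fmt.Println(data, symbols["abc123"]) // pprof-data symbols-for-abc123
}
```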
Even without a parallel pipeline and dedicated symbolication service, we could implement something like this:
sequenceDiagram
autonumber
participant QF as Query Frontend
participant M as Metastore
participant QB as Query Backend
participant SYM as Symbols
QF ->>+M: Query Metadata
Note left of M: No build identifiers are returned
M ->>-QF:
QF ->>+QB: Data retrieval and aggregation
Note left of QB: The main data path<br>Might be serverless
QB ->>-QF: Data in pprof format
Note over QF: Because of the truncation,<br> only a limited set of locations<br>make it here (16K by default)
QF ->>+SYM: Fetch symbols
SYM ->>-QF: Symbols
Note over QF: In terms of the added latency,<br>this approach is not worse than<br>block level symbolication
QF ->>QF: Flame graph rendering
I think we should avoid symbolization at the block level if the symbols are not already present in the block itself. Otherwise, this approach leads to excessive processing, increased latency, and higher resource usage. Please keep in mind that a query may span many thousands of blocks.
I won't delve too deeply into how we fetch and process ELF/DWARF files, but I strongly doubt we can bypass the need for an intermediate representation optimized for our access patterns. Additionally, we need a solution to prevent concurrent access to the debuginfod service.
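On preventing concurrent access to the debuginfod service: one common approach is request deduplication per build ID, which is what golang.org/x/sync/singleflight provides. A stdlib-only sketch of the idea (all names are illustrative):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// fetchGroup deduplicates concurrent fetches for the same key: only the
// first caller performs the fetch, the rest wait for its result.
type fetchGroup struct {
	mu       sync.Mutex
	inflight map[string]*call
}

type call struct {
	done chan struct{}
	val  []byte
}

func (g *fetchGroup) Do(key string, fetch func() []byte) []byte {
	g.mu.Lock()
	if g.inflight == nil {
		g.inflight = make(map[string]*call)
	}
	if c, ok := g.inflight[key]; ok {
		g.mu.Unlock()
		<-c.done // someone is already fetching this build ID
		return c.val
	}
	c := &call{done: make(chan struct{})}
	g.inflight[key] = c
	g.mu.Unlock()

	c.val = fetch()
	g.mu.Lock()
	delete(g.inflight, key)
	g.mu.Unlock()
	close(c.done)
	return c.val
}

func main() {
	var (
		g       fetchGroup
		fetches int32
		wg      sync.WaitGroup
	)
	release := make(chan struct{})
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.Do("build-id-1", func() []byte {
				atomic.AddInt32(&fetches, 1)
				<-release // simulate a slow debuginfod download
				return []byte("elf")
			})
		}()
	}
	time.Sleep(50 * time.Millisecond) // let every goroutine reach Do
	close(release)
	wg.Wait()
	fmt.Println("fetches:", atomic.LoadInt32(&fetches))
}
```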
I have not looked into the code yet, but I've tried to run it locally and it looks like it's trying to load a lot of unnecessary debug files. I ran the ebpf profiler with no on-target symbolization, and also ran a simple … I then query only one executable and see 268 GET requests, with 13 requests to … Other than that it works \M/ Can't wait to run it in dev.
Hi @marcsanmi, when can this PR be merged?
Force-pushed from 2937519 to 0b8a289
Hi @liaol,
Force-pushed from 7c2ab09 to 87a481c
I've created this diagram to outline the current Symbolization arch:

flowchart TD
A[SymbolizePprof] --> B{Group by Mapping}
B --> C[Symbolize Request]
C --> D{Check Symbol Cache}
subgraph "Symbol Cache Layer (LRU, in-memory)"
D -->|Cache Hit| E[Return Cached Symbols]
D -->|Cache Miss| F
end
F{Check Debug Info Cache}
subgraph "Debug Info Cache Layer (Ristretto, in-memory)"
F -->|Cache Hit| G[Read from Debug Info Cache]
F -->|Cache Miss| H
end
subgraph "Persistent Storage Layer"
H{Check Object Store}
H -->|Cache Hit| I[Read from Object Store]
H -->|Cache Miss| J[Fetch from Debuginfod]
end
I --> K[Store in Debug Info Cache]
J --> L[Store in Debug Info Cache]
J --> M[Store in Object Store]
G --> N[Parse ELF/DWARF]
K --> N
L --> N
subgraph "DWARF Resolution Layer"
N --> O[Resolve Addresses]
O --> P{Check Address Map}
P -->|Map Hit| Q[Return from Map]
P -->|Map Miss| R[Parse DWARF Data]
R --> S[Build Lookup Tables]
S --> T[Store in Address Map]
T --> U[Return Symbols]
Q --> U
end
U --> V[Update Symbol Cache]
V --> W[Return to Caller]
E --> W
pkg/test/integration/testdata/otel-ebpf-profiler-offcpu-cpu.json
I might be missing some details, but I have doubts about the cache hierarchy. Now it looks like we have: symbols_cache -> object_store -> in_memory_object_store (ristretto) -> debuginfod. As far as I understand, we're going to read from object_store even if there's just a single unresolved address. I expected to see: symbols_cache -> in_memory_object_store (ristretto) -> object_store -> debuginfod. Could you please elaborate on the decision?
You're right @kolesnikovae. I've just realized the problem is that the ristretto cache is coupled inside the debuginfod client. I'll decouple it and place it at the symbolizer level. Thus, we'll be able to have the following path: symbols_cache -> in_memory_object_store (ristretto) -> object_store -> debuginfod
Force-pushed from fd39d9e to 65fb599
Force-pushed from 65fb599 to 10f52ea
Hm. I was sure that the .bin extension would do the trick. Well, the proper solution is to use .gitattributes; I think it should be located in the integration folder and contain:

testdata/* binary
I've tried with

testdata/* binary
testdata/* -text

and running git add --renormalize ., but nothing changed... I'll further investigate.
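For reference, a way to verify which attributes git actually resolves for a file (the path below assumes the integration folder layout discussed in this thread; `binary` is a macro attribute equivalent to `-diff -merge -text`):

```shell
# From the repo root: show all attributes git applies to the test file.
git check-attr -a pkg/test/integration/testdata/otel-ebpf-profiler-offcpu-cpu.json

# After editing .gitattributes, re-apply the rules to already-tracked files:
git add --renormalize .
git status   # lists files whose normalization changed
```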
* Fix generation of function IDs
* Store debuginfo request 404s in symbol cache
* Add fallback symbols for unsymbolizable profiles
…ce over ConvertProfileToTree
…lizion by hasNativeProfiles
…l review comments
Force-pushed from 671af5c to 61d2b41
@@ -200,6 +202,11 @@ func (c *Config) RegisterFlagsWithContext(f *flag.FlagSet) {
	c.API.RegisterFlags(f)
	c.EmbeddedGrafana.RegisterFlags(f)
	c.TenantSettings.RegisterFlags(f)

	c.v2Experiment = os.Getenv("PYROSCOPE_V2_EXPERIMENT") != ""
don't we already have c.v2Experiment = os.Getenv("PYROSCOPE_V2_EXPERIMENT") != "" in registerServerFlagsWithChangedDefaultValues? Do we need this duplication?
"mapping": "0x400000-0x40c000@0x0 cat(2efc66e3f4cae30a989532b37b1d5dc81026be68)"
},
{
"address": "0x402462",
"lines": [
"cat 0x402462[]@:0"
I still don't understand why these are removed. I'm 100% sure I've asked why we remove these, but cannot find where my question was.
@@ -337,6 +337,10 @@ func concatSegmentHead(f *headFlush, w *writerOffset, s *metadata.StringTable) (
	lb.WithLabelSet(model.LabelNameServiceName, f.head.key.service, model.LabelNameProfileType, profileType)
}

if f.flushed.HasNativeProfiles {
	lb.WithLabelSet(model.LabelNameServiceName, f.head.key.service, metadata.LabelNameHasNativeProfiles, "true")
why do we set the service name here a second time?
It is okay – this is the way our metadata labels work (I think I want to rename them to attributes, to stop abusing the word labels – too vague, too widely spread). We want to avoid locking ourselves into a 1:1 dataset:service mapping, and we want multi-value attributes (e.g., profile types).
Therefore, we represent dataset labels in first normal form (1NF): each label set is independent – we can think of them as rows in a table. The overhead of repetitive labels is affordable: string interning and proto encoding make it practically negligible.
So, if you query metadata entries for a service and want to retrieve some extra attributes (like whether it has native profiles), you need to maintain the relation between the service and the attribute when you add it.
if len(s) < 3 || !strings.HasPrefix(s, "0x") {
	return false
}
are these functions used or needed? getFunctionById, isHexAddress
not anymore, good catch 👍
Please clean up the code after the LLM
// Size of the file header in bytes
headerSize = 0x80
headerSize = 0x98

// Number of fields in a line table entry
lineTableFieldsCount = 2

// Size of each line table entry (2 uint16s or 2 uint32s)
lineTableFieldsSize = 4
Just in case: does such change invalidate existing objects?
I'll remove the constant to properly calculate this dynamically as:
entrySize := 4
if hdr.lineTablesHeader.fieldSize == 4 {
entrySize = 8
}
hdr.binaryLayoutHeader.offset = hdr.lineTablesHeader.offset + uint64(len(rc.lb.entries)*entrySize)
But to answer your question: no, this should only affect the offset calculation when writing a new file, ensuring that the binary layout section starts at the correct position. It has no impact on how we read existing files.
N/A anymore since we removed MapRuntimeAddress, so no binaryLayout is needed.
if layout == nil {
	return fmt.Errorf("nil binary layout")
}
Is this possible?
Not as of now.
cmd/symbolization/main.go
Let's extend profilecli in a separate PR and remove this file for now
// memoryReader implements io.ReadCloser and io.ReaderAt for reading from an in-memory byte slice
type memoryReader struct {
	bs  []byte
	off int64
}

func (b *memoryReader) Read(p []byte) (n int, err error) {
	res, err := b.ReadAt(p, b.off)
	b.off += int64(res)
	return res, err
}

func (b *memoryReader) ReadAt(p []byte, off int64) (n int, err error) {
	if off >= int64(len(b.bs)) {
		return 0, io.EOF
	}
	n = copy(p, b.bs[off:])
	return n, nil
}

func (b *memoryReader) Close() error {
	return nil
}
Please use bytes.Reader with io.NopCloser instead.
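A sketch of that suggestion, assuming callers need both io.ReadCloser and io.ReaderAt (which the hand-rolled memoryReader provided); newMemoryReader and readCloserAt are illustrative names. If only io.ReadCloser is needed, a plain io.NopCloser(bytes.NewReader(bs)) suffices.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// readCloserAt combines io.ReadCloser and io.ReaderAt without any
// hand-written Read/ReadAt logic.
type readCloserAt struct {
	*bytes.Reader // provides Read, ReadAt, Seek
	io.Closer     // no-op Close from io.NopCloser
}

func newMemoryReader(bs []byte) readCloserAt {
	r := bytes.NewReader(bs)
	return readCloserAt{Reader: r, Closer: io.NopCloser(r)}
}

func main() {
	r := newMemoryReader([]byte("hello world"))
	buf := make([]byte, 5)
	n, _ := r.ReadAt(buf, 6)
	fmt.Println(n, string(buf[:n])) // 5 world
	_ = r.Close()
}
```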
for _, loc := range req.Locations {
	resolveStart := time.Now()
	addr, err := MapRuntimeAddress(loc.Address, ei, Mapping{
		Start:  loc.Mapping.Start,
		Limit:  loc.Mapping.Limit,
		Offset: loc.Mapping.Offset,
	})
	if err != nil {
		return fmt.Errorf("normalize address: %w", err)
	}

	// Look up the address directly in the lidia table
	frames, err := table.Lookup(framesBuf, addr)
	if err != nil {
		level.Error(s.logger).Log(
			"msg", "failed to resolve address on Lidia table lookup",
			"addr", fmt.Sprintf("0x%x", addr),
			"binary", req.BinaryName,
			"build_id", req.BuildID,
			"error", err,
		)
		loc.Lines = s.createNotFoundSymbols(req.BinaryName, loc, addr)
		s.metrics.debugSymbolResolution.WithLabelValues(StatusErrorServerError).Observe(time.Since(resolveStart).Seconds())
		continue
	}
- We don't want to update metrics on each resolved location: it's expensive and not useful; we are measuring memory latency here.
- We don't want to write a log message on each failed Lookup: it will kill the system.
// - The program might be loaded at a different address than it was linked for
// - Different segments might need different adjustments
// - Various ELF types (EXEC, DYN, REL) handle addressing differently
func MapRuntimeAddress(runtimeAddr uint64, ei *BinaryLayout, m Mapping) (uint64, error) {
@korniltsev could you please take a careful look? I assume we should have something very similar in the profiler-side symbolizer.
- REL is not something that can be loaded, so we should not handle it
- for EXEC, the base address is always 0
- for DYN, the otel ebpf profiler always outputs addresses relative to the ELF base, so if an ELF has a function at 0xcafe and the library is loaded at 0x7f0000000000, the profiler outputs just 0xcafe. We may need this addrmapper if in the future some other profiler outputs non-relative addresses (0x7f000000cafe). I remember we found a bug somewhere here in addrmapper when the debug ELF and source ELF had different PT_LOAD addresses, and this code was outputting incorrect addresses for a debug-only file. This was an alloy binary. In the case of the full binary it was correct, and the base offset was calculated as zero because it was an otel profile. I think we either need to solve this issue, make it work for both debug-only ELFs and regular ELFs including debug sections, and have tests for these cases, or just remove this addrmapper altogether and assume the base is always zero (until we find some other non-otel profiler which outputs non-relative addresses)
This is my understanding as well: we do not want to normalize addresses at lookup time. For DYN, it should be done either in profiler, or during the translation to lidia.
the otel ebpf profiler always outputs addresses relative to the elf base
I wasn't aware of it. Considering this, I'd say we can remove it. As Tolyan mentioned, this will also simplify implementation since lidia binary layout won't be needed anymore (at least for now).
for _, loc := range req.Locations {
	resolveStart := time.Now()
	addr, err := MapRuntimeAddress(loc.Address, ei, Mapping{
		Start:  loc.Mapping.Start,
		Limit:  loc.Mapping.Limit,
		Offset: loc.Mapping.Offset,
	})
	if err != nil {
		return fmt.Errorf("normalize address: %w", err)
	}
I'd try to normalize all addresses before trying to resolve them.
// Create symbolization request for this mapping group
req := s.createSymbolizationRequest(binaryName, buildID, mapping, locs)

if err := s.Symbolize(ctx, &req); err != nil {
	return err
}

// Store symbolization results back into profile
s.updateProfileWithSymbols(profile, mapping, locs, req.Locations)
The code generated by the LLM is extremely inefficient and might be invalid. Please consider an alternative:
- We split the profile by mappings. Each such profile only contains Locations, Mappings (just one), and StringTable.
- Each such sub-profile is symbolized independently, in parallel (bounded).
- Symbolized profiles are merged.
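A sketch of that shape; the subProfile type and the symbolizeOne step are placeholders for the real profile split and symbolizer, and the bound is enforced with a semaphore channel:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// subProfile is a placeholder for a profile slice that contains the
// locations of exactly one mapping plus its string table.
type subProfile struct {
	mappingID string
	locations []uint64
}

// symbolizeOne is a stub for symbolizing a single-mapping sub-profile.
func symbolizeOne(p subProfile) subProfile { return p }

// symbolizeAll symbolizes each sub-profile independently, at most
// `bound` at a time, then returns them in order for merging.
func symbolizeAll(parts []subProfile, bound int) []subProfile {
	sem := make(chan struct{}, bound)
	var wg sync.WaitGroup
	out := make([]subProfile, len(parts))
	for i, p := range parts {
		wg.Add(1)
		go func(i int, p subProfile) {
			defer wg.Done()
			sem <- struct{}{}        // acquire
			defer func() { <-sem }() // release
			out[i] = symbolizeOne(p)
		}(i, p)
	}
	wg.Wait()
	return out
}

func main() {
	parts := []subProfile{
		{mappingID: "libc", locations: []uint64{0x1}},
		{mappingID: "app", locations: []uint64{0x2}},
		{mappingID: "libssl", locations: []uint64{0x3}},
	}
	done := symbolizeAll(parts, 2)
	ids := make([]string, 0, len(done))
	for _, p := range done {
		ids = append(ids, p.mappingID)
	}
	sort.Strings(ids)
	fmt.Println(ids) // [app libc libssl]
}
```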
func (s *ProfileSymbolizer) Symbolize(ctx context.Context, req *Request) error {
	start := time.Now()
	status := StatusSuccess
	defer func() {
		s.metrics.profileSymbolization.WithLabelValues(status).Observe(time.Since(start).Seconds())
	}()

	if s.checkSymbolCache(req) {
		return nil
	}

	if s.checkLidiaTableCache(ctx, req) {
		return nil
	}

	if s.checkObjectStoreCache(ctx, req) {
		return nil
	}

	return s.fetchAndCacheFromDebuginfod(ctx, req, &status)
}
A comment explaining the sequence and cache layers would be helpful.
func (m *Metrics) Unregister() {
	if m.registerer == nil {
		return
	}
Do we use it?
// TODO: change me back to false!
f.BoolVar(&cfg.Enabled, "symbolizer.enabled", true, "Enable symbolization for unsymbolized profiles")
Should probably be disabled by default
if s.store != nil {
	cacheStart := time.Now()
	if cacheErr := s.store.Put(ctx, req.BuildID, bytes.NewReader(data)); cacheErr != nil {
		s.metrics.cacheOperations.WithLabelValues("objstore_cache", "put", StatusErrorUpload).Observe(time.Since(cacheStart).Seconds())
We already measure it inside the bucket implementation.
// checkSymbolCache checks if all addresses are in the symbol cache
func (s *ProfileSymbolizer) checkSymbolCache(req *Request) bool {
	if s.symbolCache == nil {
when is s.symbolCache nil?
// updateAllCaches updates all caches with the fetched data
func (s *ProfileSymbolizer) updateAllCaches(ctx context.Context, req *Request, data []byte) error {
	// Store in Ristretto cache
	if s.lidiaTableCache != nil {
when is it nil?
}

func (s *ProfileSymbolizer) processELFData(data []byte) (lidiaData []byte, ei *BinaryLayout, err error) {
	// Create a reader from the data
Let's remove all the "obvious" comments all over the new package. They are not very useful:
- // Create a reader from data, right before bytes.NewReader
- // Create lidia file, right before lidia.CreateLidiaFromElf
- // Convert to our internal binary layout format, right before ei = convertBinaryLayout(lidiaLayout)
- // Open the Lidia table to extract binary layout, right before lidia.Open
}

// Store in ObjstoreCache
if s.store != nil {
when is s.store nil?
* feat: add symbolizer per tenant overrides
Context
This PR introduces a comprehensive implementation for DWARF symbolization of unsymbolized profiles in the Pyroscope read path. It enables automatic symbolization of profiles for non-customer code (primarily open-source libraries and binaries) where symbol information isn't available at collection time.
Symbolization
Multi-level Caching
Integration Points
Configuration Example
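The configuration example itself is not included above; based on the `symbolizer.enabled` flag registered in this PR, enabling the feature would look roughly like this (the exact invocation and any other flags are assumptions):

```
pyroscope -symbolizer.enabled=true
```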