feat(ksymbols): reimplement ksymbols #4464

oshaked1 · 2024-12-25T16:41:54Z

1. Explain what the PR does

The previous ksymbols implementation used a lazy lookup method, where only symbols marked as required ahead of time were stored. Trying to look up a symbol that was not stored resulted in /proc/kallsyms being read and parsed in its entirety.
While most symbols being looked up were registered as required ahead of time, some weren't (in particular, symbols needed for kprobe attachment), which incurred significant overhead while tracee was initializing.

This new implementation stores all symbols, or, if the requiredDataSymbolsOnly flag is set when creating the symbol table (the default), only non-data symbols are stored (and required data symbols must be registered before updating). Some additional memory usage optimizations are included: symbol owners are encoded as an index into a list of owner names, and symbol name lookups are lazy, so the map of symbol name to symbol is populated only for symbols that have been looked up at least once.
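
A minimal sketch of the two optimizations mentioned above (owner names interned as indices, and a name map filled only on first lookup). All type and function names here (ownerTable, symbol, table, lookupByName) are hypothetical and chosen for illustration; this is not the code in the PR.

```go
package main

import "fmt"

// ownerTable interns owner names so that each symbol can store a small
// index instead of a full string. Hypothetical type, for illustration only.
type ownerTable struct {
	names []string
	index map[string]uint16
}

func newOwnerTable() *ownerTable {
	return &ownerTable{index: make(map[string]uint16)}
}

// intern returns the index of name, adding it on first use.
func (o *ownerTable) intern(name string) uint16 {
	if idx, ok := o.index[name]; ok {
		return idx
	}
	idx := uint16(len(o.names))
	o.names = append(o.names, name)
	o.index[name] = idx
	return idx
}

type symbol struct {
	name  string
	addr  uint64
	owner uint16 // index into ownerTable.names
}

// table keeps every symbol in a slice; byName is populated only on demand.
type table struct {
	symbols []symbol
	byName  map[string][]*symbol
}

// lookupByName scans linearly the first time a name is requested and caches
// the result, so repeated lookups of the same name become one map access.
func (t *table) lookupByName(name string) []*symbol {
	if cached, ok := t.byName[name]; ok {
		return cached
	}
	var found []*symbol
	for i := range t.symbols {
		if t.symbols[i].name == name {
			found = append(found, &t.symbols[i])
		}
	}
	if len(found) > 0 {
		t.byName[name] = found
	}
	return found
}

func main() {
	owners := newOwnerTable()
	tbl := &table{byName: make(map[string][]*symbol)}
	tbl.symbols = []symbol{
		{name: "vfs_read", addr: 0xffffffff81300000, owner: owners.intern("system")},
		{name: "nf_hook", addr: 0xffffffffc0001000, owner: owners.intern("netfilter")},
	}
	syms := tbl.lookupByName("nf_hook")
	fmt.Println(len(syms), owners.names[syms[0].owner], len(tbl.byName))
}
```

This sketch is not safe for concurrent use; the locking needed around the name map is what the review discussion further down is about.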

From measurements I performed, the extra memory consumption is around 21MB (from ~159MB to ~180MB when running tracee with no arguments on my machine).

Under the hood, this ksymbols implementation uses a generic symbol table implementation that can be used by future code for managing executable file symbols.

A significant advantage gained by storing all non-data symbols is the ability to look up the function symbol that contains a given code address, a feature that I plan to use in the future.
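
As a rough illustration of such a containing-address lookup, here is a sketch that assumes symbols are kept in a slice sorted by address in descending order and that a fixed size limit stands in for the missing symbol sizes (the maxSymbolSize value is the one shown in the diff later in this thread). The sym type and lookupContaining function are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// maxSymbolSize bounds how far past a symbol's start an address may be and
// still be attributed to it, since kallsyms carries no symbol sizes.
const maxSymbolSize = 0x100000

type sym struct {
	name string
	addr uint64
}

// lookupContaining returns the symbol with the greatest start address that is
// <= address, provided address is within maxSymbolSize of that start.
// syms must be sorted by address in descending order.
func lookupContaining(syms []sym, address uint64) (sym, bool) {
	// In descending order, the first element with addr <= address is the
	// closest symbol that starts at or below the requested address.
	idx := sort.Search(len(syms), func(i int) bool {
		return syms[i].addr <= address
	})
	if idx == len(syms) || address-syms[idx].addr >= maxSymbolSize {
		return sym{}, false
	}
	return syms[idx], true
}

func main() {
	syms := []sym{
		{name: "do_sys_openat2", addr: 0xffffffff81300400},
		{name: "vfs_read", addr: 0xffffffff81300000},
	}
	s, ok := lookupContaining(syms, 0xffffffff81300010)
	fmt.Println(s.name, ok)
}
```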

This PR closes #4463 and renders #4325 irrelevant (because /proc/kallsyms reads no longer happen "spontaneously").

NDStrahilevitz

Nice work overall, though I do have some comments in mind.

pkg/utils/symbol_table.go

pkg/utils/environment/kernel_symbols.go

pkg/ebpf/tracee.go

oshaked1 · 2025-01-01T15:50:37Z

I added an additional memory optimization - kernel symbols now only store the lower 48 bits of the address with the assumption that all addresses begin with 0xffff. We ignore any symbols whose address doesn't start with 0xffff, which is only percpu symbols. This allows us to encode the address and owner index together which eliminates 8 bytes per symbol for a total memory saving of around 3-4MB.
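
A small runnable sketch of the packing described above: the owner index goes in the upper 16 bits and the low 48 bits of the address go below it. The constant names follow the ones suggested later in this review; the function names (pack, unpackAddress, unpackOwner) are hypothetical.

```go
package main

import "fmt"

const (
	ownerShift          = 48                           // owner index lives in the upper 16 bits
	addressMask         = (uint64(1) << ownerShift) - 1 // selects the low 48 address bits
	kernelAddressPrefix = uint64(0xffff) << ownerShift // upper 16 bits assumed for every stored address
)

// pack combines an address and an owner index into one uint64.
// It reports false for addresses that do not start with 0xffff
// (such as percpu symbols), which cannot be represented this way.
func pack(address uint64, owner uint16) (uint64, bool) {
	if address>>ownerShift != 0xffff {
		return 0, false
	}
	return (uint64(owner) << ownerShift) | (address & addressMask), true
}

// unpackAddress restores the full kernel address by re-adding the prefix.
func unpackAddress(packed uint64) uint64 {
	return kernelAddressPrefix | (packed & addressMask)
}

// unpackOwner extracts the owner index from the upper 16 bits.
func unpackOwner(packed uint64) uint16 {
	return uint16(packed >> ownerShift)
}

func main() {
	packed, ok := pack(0xffffffff81000000, 7)
	fmt.Printf("%v %#x %d\n", ok, unpackAddress(packed), unpackOwner(packed))

	_, ok = pack(0x1a000, 0) // a low, percpu-style address is rejected
	fmt.Println(ok)
}
```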

This implementation stores all symbols, or if a `requiredDataSymbolsOnly` flag is used when creating the symbol table, only non-data symbols are saved (and required data symbols must be registered before updating). This new implementation uses a generic symbol table implementation that is responsible for managing symbol lookups, and can be used by future code for managing executable file symbols.

After running the init function of a kernel module, the kernel frees the memory that was allocated for it but doesn't remove its symbol from kallsyms. This results in a scenario where a subsequently loaded module can be allocated to the same area as the freed init function of the previous module. This could result in 2 symbols at the same address: one is the freed init function and the other is from the newly loaded module. This caused nondeterminism in which symbol is used by the hooked_syscall event, which only used the first symbol that was found, resulting in random test failures. This commit changes the hooked_syscall event to emit one event for each found symbol.
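
To illustrate the "one result per symbol at the same address" behavior, here is a sketch that returns every symbol sharing an exact address instead of only the first match. It assumes a slice sorted by address in descending order; the sym type, lookupAllExact function, and the symbol names in main are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

type sym struct {
	name string
	addr uint64
}

// lookupAllExact returns every symbol whose address equals address.
// syms must be sorted by address in descending order, so equal addresses
// are adjacent and can be collected with a short forward scan.
func lookupAllExact(syms []sym, address uint64) []sym {
	start := sort.Search(len(syms), func(i int) bool {
		return syms[i].addr <= address
	})
	end := start
	for end < len(syms) && syms[end].addr == address {
		end++
	}
	return syms[start:end]
}

func main() {
	syms := []sym{
		{name: "new_mod_hook", addr: 0xffffffffc0a00000}, // from the newly loaded module
		{name: "old_mod_init", addr: 0xffffffffc0a00000}, // stale entry of a freed init function
		{name: "vfs_read", addr: 0xffffffff81300000},
	}
	for _, s := range lookupAllExact(syms, 0xffffffffc0a00000) {
		fmt.Println("candidate:", s.name)
	}
}
```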

NDStrahilevitz

I have one critical request to make: please avoid the mixed mutex transactions in the kernel symbols table. From experience this tends to cause transaction mixes where one write operation will happen in the middle of a mixed operation for example. Imagine the following methods m1 and m2 where m1 is w and m2 has rw. The following could occur:

m2 - r
m1 - wait for w
m2 - release r
m1 - back for w
m1 - release
m2 - back to w, with r assumptions changed due to m1.

Please either pick for each method that it is R or W and make the lock last for the whole operation. You could even opt for a regular mutex instead of a RWMutex, I don't think we have very frequent reads or writes to this struct anyway.
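
For reference, one common way to keep a read-then-write sequence correct when the read lock is released before the write lock is taken is to re-check the state under the write lock. The sketch below shows that pattern on a hypothetical cache type; it is an illustration of the interleaving described above, not the approach the PR takes.

```go
package main

import (
	"fmt"
	"sync"
)

type cache struct {
	mu    sync.RWMutex
	data  map[string]int
	loads int // number of times compute actually ran
}

// getOrCompute reads under the read lock, and on a miss takes the write lock.
// Another goroutine may have written between RUnlock and Lock, so the miss
// observed under the read lock is re-validated before writing.
func (c *cache) getOrCompute(key string, compute func() int) int {
	c.mu.RLock()
	v, ok := c.data[key]
	c.mu.RUnlock()
	if ok {
		return v
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.data[key]; ok {
		return v // someone else filled it while we waited for the write lock
	}
	v = compute()
	c.data[key] = v
	c.loads++
	return v
}

func main() {
	c := &cache{data: make(map[string]int)}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.getOrCompute("k", func() int { return 42 })
		}()
	}
	wg.Wait()
	fmt.Println(c.data["k"], c.loads)
}
```

Without the second check, every goroutine that observed the miss would recompute and overwrite the entry, which is harmless only when the written value is always identical.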

oshaked1 · 2025-01-02T13:20:31Z

Symbol lookups could be very frequent in the future (stack trace processing). Using a write lock for the entire duration of KernelSymbolTable.UpdateFromReader will prevent symbol lookups for a significant duration. The same applies to SymbolTable.LookupByName.

In both functions, the write operations only add new data, they don't change or remove existing data. In the case of SymbolTable.LookupByName, the worst case scenario for an outdated assumption means we add the same name to symbol mapping twice (the added data will always be the same). For KernelSymbolTable.UpdateFromReader, the worst case scenario is the same owner gets added twice to idxToSymbolOwner, but all data remains valid.

I could solve the issue with UpdateFromReader by adding a third lock that makes sure only a single update operation can happen at a time (which prevents 2 concurrent goroutines from wanting to add a new symbol owner).
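
A sketch of what such an additional update lock could look like, under the assumption that the slow parsing can run without holding the data lock. The symbolTable type, its fields, and the update and ownerCount methods are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"sync"
)

type symbolTable struct {
	updateMu sync.Mutex   // serializes whole update operations
	mu       sync.RWMutex // guards owners; held only briefly by writers
	owners   []string
}

// update lets only one updater run at a time. The slow parse step runs
// without holding mu, so concurrent readers are blocked only for the swap.
func (t *symbolTable) update(parse func() []string) {
	t.updateMu.Lock()
	defer t.updateMu.Unlock()

	fresh := parse() // slow work, readers keep going

	t.mu.Lock()
	t.owners = fresh
	t.mu.Unlock()
}

func (t *symbolTable) ownerCount() int {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return len(t.owners)
}

func main() {
	tbl := &symbolTable{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			tbl.update(func() []string { return []string{"system", "netfilter"} })
		}()
	}
	wg.Wait()
	fmt.Println(tbl.ownerCount())
}
```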

oshaked1 · 2025-01-02T13:23:55Z

I could also change the API of the kernel symbol table so that reading /proc/kallsyms happens once when creating the symbol table, and if the user wants to update it, he must create a new one. This prevents the need for locks at all.

It also solves a race condition where if a lookup happens between kst.symbols.Clear() and kst.symbols.AddSymbols(symbols) the lookup will fail.

NDStrahilevitz · 2025-01-02T13:31:32Z

> I could also change the API of the kernel symbol table so that reading /proc/kallsyms happens once when creating the symbol table, and if the user wants to update it, he must create a new one. This prevents the need for locks at all.
>
> It also solves a race condition where if a lookup happens between kst.symbols.Clear() and kst.symbols.AddSymbols(symbols) the lookup will fail.

Wouldn't this be an issue for your stack trace feature if you miss a symbol in the initial request?

> In both functions, the write operations only add new data, they don't change or remove existing data. In the case of SymbolTable.LookupByName, the worst case scenario for an outdated assumption means we add the same name to symbol mapping twice (the added data will always be the same). For KernelSymbolTable.UpdateFromReader, the worst case scenario is the same owner gets added twice to idxToSymbolOwner, but all data remains valid.
> I could solve the issue with UpdateFromReader by adding a third lock that makes sure only a single update operation can happen at a time (which prevents 2 concurrent goroutines from wanting to add a new symbol owner).

So this is a valid concern... I can't think of a better solution, and it sounds reasonable that these internal structures should have separate access control. So i'm ok with that. Make the change and i'll +1.

@yanivagman @geyslan tagging you in case you want to drop a review as well.

oshaked1 · 2025-01-02T13:36:32Z

> Wouldn't this be an issue for your stack trace feature if you miss a symbol in the initial request?

Do you mean with the current implementation? If so then yes, I only noticed it now.

> So this is a valid concern... I can't think of a better solution, and it sounds reasonable that these internal structures should have separate access control. So i'm ok with that. Make the change and i'll +1.

Which solution do you prefer? IMO the second one (RO symbol table, create a new one instead of updating) is favorable.

Additionally, I could change SymbolTable.symbolsByName to an LRU which would solve the concurrency issue. It would also allow us to control the memory usage of name mappings.

NDStrahilevitz · 2025-01-02T13:42:59Z

> Additionally, I could change SymbolTable.symbolsByName to an LRU which would solve the concurrency issue. It would also allow us to control the memory usage of name mappings.

That is just equivalent to giving it a separate mutex, and I don't see why we would want to limit the owner number. It might be slightly more readable. BTW, you may want to use the set.Set data structure for this instead, as the owner symbols just are a set, and I believe i've included a mutex with it. Though it may not fit with the int<->string translation scheme you've made.

> > Wouldn't this be an issue for your stack trace feature if you miss a symbol in the initial request?
>
> Do you mean with the current implementation? If so then yes, I only noticed it now.

I mean that if you make it RO and require full regeneration with a new symbol, I think you may easily encounter multiple regenerations per stack trace extraction.

> > So this is a valid concern... I can't think of a better solution, and it sounds reasonable that these internal structures should have separate access control. So i'm ok with that. Make the change and i'll +1.
>
> Which solution do you prefer? IMO the second one (RO symbol table, create a new one instead of updating) is favorable.

Which is why the RO symbol table seems like a bad choice for your future needs, unless I am missing something in what you've suggested. Therefore I suggest you use a set, an LRU, or a mutex on the owners, such that the overall concurrency mutex is limited to its relevant data.

oshaked1 · 2025-01-02T14:04:16Z

> > Additionally, I could change SymbolTable.symbolsByName to an LRU which would solve the concurrency issue. It would also allow us to control the memory usage of name mappings.
>
> That is just equivalent to giving it a separate mutex, and I don't see why we would want to limit the owner number. It might be slightly more readable. BTW, you may want to use the set.Set data structure for this instead, as the owner symbols just are a set, and I believe i've included a mutex with it. Though it may not fit with the int<->string translation scheme you've made.

symbolsByName is not for the owners, it's for the cached symbol name to symbol structure mapping (which probably should be size limited). The owners cannot be a set for the reason you mentioned, there is no way to index into it.
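
To make the size-limited name cache idea concrete, here is a minimal LRU sketch built only on the standard library's container/list. The nameCache type and its methods are hypothetical; an actual change would more likely reuse an existing LRU package, and this sketch does no locking of its own.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is the value stored in each list element.
type entry struct {
	name  string
	addrs []uint64
}

// nameCache maps symbol names to addresses and evicts the least recently
// used name once capacity is exceeded. Not safe for concurrent use.
type nameCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // name -> element in order
}

func newNameCache(capacity int) *nameCache {
	return &nameCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

func (c *nameCache) get(name string) ([]uint64, bool) {
	el, ok := c.items[name]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).addrs, true
}

func (c *nameCache) put(name string, addrs []uint64) {
	if el, ok := c.items[name]; ok {
		el.Value.(*entry).addrs = addrs
		c.order.MoveToFront(el)
		return
	}
	c.items[name] = c.order.PushFront(&entry{name: name, addrs: addrs})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).name)
	}
}

func main() {
	c := newNameCache(2)
	c.put("vfs_read", []uint64{0xffffffff81300000})
	c.put("vfs_write", []uint64{0xffffffff81300400})
	c.get("vfs_read")                              // vfs_read becomes most recent
	c.put("do_exit", []uint64{0xffffffff81100000}) // evicts vfs_write
	_, ok := c.get("vfs_write")
	fmt.Println(ok)
}
```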

> I mean that if you make it RO and require full regeneration with a new symbol, I think you may easily encounter multiple regenerations per stack trace extraction.

There is no regeneration on failed request, only manually (used in processDoInitModule). Creating a new kernel symbol table instead of updating it will ensure that for the users, the update is atomic (a new structure gets assigned to t.kernelSymbols).
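
A sketch of the "build a new table and swap it in" idea, using sync/atomic's generic Pointer type (Go 1.19+) so that readers always see either the old or the new table, never a partially updated one. The symbolSnapshot and tracer types and their methods are hypothetical stand-ins; how t.kernelSymbols is actually assigned in tracee is not shown in this thread.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// symbolSnapshot is immutable after construction, so it needs no locks.
type symbolSnapshot struct {
	byAddr map[uint64]string
}

func newSymbolSnapshot(entries map[uint64]string) *symbolSnapshot {
	s := &symbolSnapshot{byAddr: make(map[uint64]string, len(entries))}
	for addr, name := range entries {
		s.byAddr[addr] = name
	}
	return s
}

// tracer holds the current snapshot behind an atomic pointer.
type tracer struct {
	kernelSymbols atomic.Pointer[symbolSnapshot]
}

// reload builds a complete new snapshot and publishes it in one step.
func (t *tracer) reload(entries map[uint64]string) {
	t.kernelSymbols.Store(newSymbolSnapshot(entries))
}

// lookup reads whichever snapshot is current at the time of the call.
func (t *tracer) lookup(addr uint64) (string, bool) {
	snap := t.kernelSymbols.Load()
	if snap == nil {
		return "", false
	}
	name, ok := snap.byAddr[addr]
	return name, ok
}

func main() {
	t := &tracer{}
	t.reload(map[uint64]string{0xffffffff81300000: "vfs_read"})
	// Simulates a module load: a full new table replaces the old one.
	t.reload(map[uint64]string{
		0xffffffff81300000: "vfs_read",
		0xffffffffc0a00000: "new_mod_hook",
	})
	fmt.Println(t.lookup(0xffffffffc0a00000))
}
```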

geyslan

First pass so far. I've only reviewed kernelSymbolInternal with some suggestions that could make the code easier to read and change - later I'm going to reach the rest.

geyslan · 2025-01-03T15:13:12Z

pkg/utils/environment/kernel_symbols.go

 )

+// Kernel symbols do not have an associated size, so we define a sensible size
+// limit to prevent unrelated symbols from being returned for an address lookup
+const maxSymbolSize = 0x100000


Consider using more consts like:

const (
	ownerShift          = 48                           // Number of bits to shift the owner into the upper 16 bits
	addressMask         = (1 << ownerShift) - 1        // Mask to extract the address from the addressAndOwner field
	kernelAddressPrefix = uint64(0xffff) << ownerShift // Precomputed prefix for kernel addresses
)

geyslan · 2025-01-03T15:14:27Z

pkg/utils/environment/kernel_symbols.go

+func newKernelSymbolInternal(name string, address uint64, owner uint16) *kernelSymbolInternal {
+	return &kernelSymbolInternal{
+		name:            name,
+		addressAndOwner: (uint64(owner) << 48) | (address & ((1 << 48) - 1)),


Based on the consideration above, this could be:

Suggested change

addressAndOwner: (uint64(owner) << 48) | (address & ((1 << 48) - 1)),

addressAndOwner: (uint64(owner) << ownerShift) | (address & addressMask),

geyslan · 2025-01-03T15:25:29Z

pkg/utils/environment/kernel_symbols.go

-type KSymbTableOption func(k *KernelSymbolTable) error
+func (ks kernelSymbolInternal) Address() uint64 {
+	// Convert truncated address to the real kernel address
+	return (0xffff << 48) | (ks.addressAndOwner & ((1 << 48) - 1))


As the prefix (all 1s) overwrites the owner bits, I believe this can be simplified to:

Suggested change

return (0xffff << 48) | (ks.addressAndOwner & ((1 << 48) - 1))

return kernelAddressPrefix | ks.addressAndOwner

geyslan · 2025-01-03T15:26:33Z

pkg/utils/environment/kernel_symbols.go

-		return nil
-	}
+func (ks kernelSymbolInternal) owner() uint16 {
+	return uint16(ks.addressAndOwner >> 48)


ditto:

Suggested change

return uint16(ks.addressAndOwner >> 48)

return uint16(ks.addressAndOwner >> ownerShift)

geyslan · 2025-01-03T15:27:42Z

pkg/utils/environment/kernel_symbols.go

-		return nil
-	}
+func (ks kernelSymbolInternal) Contains(address uint64) bool {
+	return ks.Address() <= address && ks.Address()+maxSymbolSize > address


consider calling it once:

Suggested change

return ks.Address() <= address && ks.Address()+maxSymbolSize > address

addr := ks.Address()

return addr <= address && addr+maxSymbolSize > address

geyslan

I did one more pass.

In addition to my comments, I would recommend adding a concurrency test file like this one: https://github.com/aquasecurity/tracee/blob/main/pkg/capabilities/capabilities_test.go

I haven't reviewed KernelSymbolTable yet. It will wait for the next pass. 👍🏼

geyslan · 2025-01-03T19:13:56Z

pkg/utils/symbol_table.go

+		}
+	}
+
+	// Sort the symbols slice by address in descending order


Perhaps sorting it in ascending order would bring a small performance gain and extra API features (such as a boolean found result, or the ordered index where a value should be inserted when not found) by using slices.BinarySearch() or slices.BinarySearchFunc(). WDYT?
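
A short sketch of what the ascending-order variant could look like with slices.BinarySearchFunc and cmp.Compare (both in the standard library since Go 1.21). The sym type and lookupExact function are hypothetical names for illustration.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

type sym struct {
	name string
	addr uint64
}

// lookupExact searches syms, which must be sorted by address in ascending
// order. It returns the index of a symbol at address and true, or the index
// where such a symbol would be inserted and false.
func lookupExact(syms []sym, address uint64) (int, bool) {
	return slices.BinarySearchFunc(syms, address, func(s sym, target uint64) int {
		return cmp.Compare(s.addr, target)
	})
}

func main() {
	syms := []sym{
		{name: "vfs_read", addr: 0xffffffff81300000},
		{name: "vfs_write", addr: 0xffffffff81300400},
	}

	idx, found := lookupExact(syms, 0xffffffff81300400)
	fmt.Println(idx, found)

	// On a miss, idx is the insertion point, so idx-1 (when idx > 0) is the
	// closest symbol starting below the address.
	idx, found = lookupExact(syms, 0xffffffff81300010)
	fmt.Println(idx, found, syms[idx-1].name)
}
```

The found result replaces the manual "index out of range or address mismatch" check that the descending-order search needs for exact lookups.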

geyslan · 2025-01-03T19:28:44Z

pkg/utils/symbol_table.go

+	st.mu.RLock()
+	// We call RUnlock manually and not using defer because we may need to upgrade to a write lock later
+
+	// Lookup the name in the name to symbol mapping
+	if symbols, found := st.symbolsByName[name]; found {
+		st.mu.RUnlock()
+		return symbols, nil
+	}
+
+	// Lazy name lookup is disabled, the lookup failed
+	if !st.lazyNameLookup {
+		st.mu.RUnlock()
+		return nil, ErrSymbolNotFound
+	}
+
+	// Lazy name lookup is enabled, perform a linear search to find the requested name
+	symbols := []*T{}
+	for _, symbol := range st.sortedSymbols {
+		if (*symbol).Name() == name {
+			symbols = append(symbols, symbol)
+		}
+	}


WDYT about moving this reading logic into a private method, so that all read mutex control is managed by it, i.e.: symbols := st.lookupByName()

Then we should take the write lock only if len(symbols) > 0.

If you agree, add comments about the locking in the inner call to warn about unexpected deadlocks if it's used out of context.

geyslan · 2025-01-03T19:33:35Z

pkg/utils/symbol_table.go

+		return nil, ErrSymbolNotFound
+	}
+
+	// Lazy name lookup is enabled, perform a linear search to find the requested name


Depending on the size of sortedSymbols this won't be optimal, but I believe that keeping metadata keyed by name would be worse (memory-wise), is that right?

geyslan · 2025-01-03T19:35:21Z

pkg/utils/symbol_table.go

+	// Not found or not exact match
+	if idx == len(st.sortedSymbols) || (*st.sortedSymbols[idx]).Address() != address {
+		return nil, ErrSymbolNotFound
+	}


As commented above, I suppose that by using BinarySearch this check might not be necessary.

geyslan · 2025-01-03T19:47:42Z

pkg/utils/symbol_table.go

+	return st.sortedSymbols[idx], nil
+}
+
+func (st *SymbolTable[T]) ForEachSymbol(callback func(symbol *T)) {


This method name makes me believe that it will do something to each symbol (write to it), but we're only locking for read. Is that right? Couldn't the caller pass a callback that changes the symbol inadvertently?
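
One possible way to address this concern, sketched under the assumption that copying each symbol is acceptable: pass the callback a value instead of a pointer, so the stored entries cannot be modified through it. The symbolTable and sym types below are simplified, hypothetical stand-ins for the PR's generic table.

```go
package main

import (
	"fmt"
	"sync"
)

type sym struct {
	name string
	addr uint64
}

type symbolTable[T any] struct {
	mu      sync.RWMutex
	symbols []*T
}

// forEachSymbol holds the read lock for the whole iteration and hands the
// callback a copy of each symbol, so assignments inside the callback do not
// reach the stored value. The copy is shallow: pointers or slices inside T
// would still be shared.
func (st *symbolTable[T]) forEachSymbol(callback func(symbol T)) {
	st.mu.RLock()
	defer st.mu.RUnlock()
	for _, s := range st.symbols {
		callback(*s)
	}
}

func main() {
	st := &symbolTable[sym]{symbols: []*sym{{name: "vfs_read", addr: 0x1000}}}
	st.forEachSymbol(func(s sym) {
		s.name = "clobbered" // changes only the local copy
	})
	fmt.Println(st.symbols[0].name)
}
```

The trade-off is one struct copy per symbol per iteration, which may matter for a table holding every kernel symbol.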

geyslan · 2025-01-03T19:48:25Z

pkg/utils/symbol_table_test.go

+}
+
+// TestLookupByAddressExact tests the LookupByAddressExact function
+func TestLookupByAddressExaxt(t *testing.T) {


Suggested change

func TestLookupByAddressExaxt(t *testing.T) {

func TestLookupByAddressExact(t *testing.T) {

github-actions bot assigned oshaked1 Dec 25, 2024

github-actions bot added area/ebpf area/testing area/events labels Dec 25, 2024

oshaked1 force-pushed the kallsyms branch 10 times, most recently from 165e3d5 to 8eabeb9 Compare December 26, 2024 14:36

NDStrahilevitz reviewed Dec 26, 2024

oshaked1 force-pushed the kallsyms branch 7 times, most recently from a37c7cd to f3e5990 Compare December 29, 2024 10:58

yanivagman linked an issue Dec 29, 2024 that may be closed by this pull request

Kernel symbols are updated without required capabilities #4325

Open

oshaked1 force-pushed the kallsyms branch 2 times, most recently from e5c5324 to eaa12ab Compare January 1, 2025 15:41

oshaked1 added 2 commits January 1, 2025 18:44

oshaked1 force-pushed the kallsyms branch from eaa12ab to 3ba819b Compare January 1, 2025 16:47

NDStrahilevitz requested changes Jan 2, 2025

geyslan added the milestone/v0.23.0 label Jan 2, 2025

geyslan reviewed Jan 3, 2025
