Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

argument parsers improvements #4279

Merged
merged 14 commits into from
Jan 16, 2025

Conversation

geyslan
Copy link
Member

@geyslan geyslan commented Sep 5, 2024

Close: #4476
Close: #4477
Close: #4478

1. Explain what the PR does

This is a sequence of the effort already started by #4200. As it was getting huge, the continuation will come after.


Improvements include reducing the size of the parser logic by using slices instead of maps whenever possible. This allows for direct access to values via index, and when direct access isn't feasible, iterating through slices is generally more efficient than fetching values from a map.

Return string only instead of the Argument type since it's the only value used.

This PR reduces the size of the final binary by ~56KB.


cb55f8f fix(parsers): fix openFlagsValues O_LARGEFILE
9676369 feat: parse dirfd for special case AT_FDCWD
af50e8b perf: chore: add flag to ParseSocketDomainArgument
4fc5f7c perf: chore: improve ParseSocketcallCall
35613a5 perf: chore: add flags ParsePtraceRequestArgument
083cfd4 perf: chore: add missing flags to ParseBPFCmd
6c2ca98 perf: chore: add missing flags to ParsePrctlOption
88f4f92 perf: chore: add missing flags to ParseCapability
9b8fe58 feat: perf: fix: AT flags parsing
c99ebed perf: fix: improve ParseAccessMode
86b68aa perf: fix: improve ParseOpenFlagArgument
9a328a9 perf: improve ParseCloneFlags
2baff04 chore: add comment about raw values
2d5e43a chore: deprecate internal interface

cb55f8f fix(parsers): fix openFlagsValues O_LARGEFILE

O_LARGEFILE is different on arm64 and amd64, 0400000 and 00100000
respectively.

9676369 feat: parse dirfd for special case AT_FDCWD

syscalls with dirfd arg now parse for special case AT_FDCWD when
ParseArgumentsFDs is true.

af50e8b perf: chore: add flag to ParseSocketDomainArgument

Add missing flag to ParseSocketDomainArgument:
    AF_MCTP

Use slice instead of maps. This allows for direct access to values via
index.

Return string only instead of the Argument type since it's the only
value used.

4fc5f7c perf: chore: improve ParseSocketcallCall

Use slice instead of maps. This allows for direct access to values via
index.

Return string only instead of the Argument type since it's the only
value used.

35613a5 perf: chore: add flags ParsePtraceRequestArgument

Add missing flags to ParsePtraceRequestArgument:
    PTRACE_GET_THREAD_AREA, PTRACE_SET_THREAD_AREA, PTRACE_ARCH_PRCTL,
    PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP, PTRACE_SINGLEBLOCK,
    PTRACE_GET_RSEQ_CONFIGURATION,
    PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG and
    PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG.

Iterate slice instead of fetching a map.

Return string only instead of the Argument type since it's the only
value used.

083cfd4 perf: chore: add missing flags to ParseBPFCmd

Add missing flags to ParseBPFCmd:
    BPF_PROG_BIND_MAP, BPF_TOKEN_CREATE

Use slice instead of maps. This allows for direct access to values via
index.

Return string only instead of the Argument type since it's the only
value used.

6c2ca98 perf: chore: add missing flags to ParsePrctlOption

Add missing flags to ParsePrctlOption:
    PR_SET_IO_FLUSHER, PR_GET_IO_FLUSHER, PR_SET_SYSCALL_USER_DISPATCH,
    PR_PAC_SET_ENABLED_KEYS, PR_PAC_GET_ENABLED_KEYS, PR_SCHED_CORE,
    PR_SME_SET_VL, PR_SME_GET_VL, PR_SET_MDWE, PR_GET_MDWE,
    PR_SET_MEMORY_MERGE and PR_GET_MEMORY_MERGE.

Iterate slice instead of fetching a map.

Return string only instead of the Argument type since it's the only
value used.

88f4f92 perf: chore: add missing flags to ParseCapability

Add missing flags to ParseCapability:
CAP_PERFMON, CAP_BPF and CAP_CHECKPOINT_RESTORE

Use slice instead of maps. This allows for direct access to values via
index.

Return string only instead of the Argument type since it's the only
value used.

9b8fe58 feat: perf: fix: AT flags parsing

Introduce ParseFaccessatFlag to parse faccessat flags.
Introduce ParseFchmodatFlag to parse fchmodat flags.

Iterate slice instead of conditional branch.

Return string only instead of the Argument type since it's the only
value used.

Fix ParseExecFlag (now ParseExecveatFlag) to check only related flags.

c99ebed perf: fix: improve ParseAccessMode

Iterate slice instead of conditional branch.

Return string only instead of the Argument type since it's the only
value used.

Fix logic, since it wasn't printing F_OK flag alone.

86b68aa perf: fix: improve ParseOpenFlagArgument

When possible iterates slice instead of conditional branch.
Return string only instead of the Argument type since it's the only
value used.

Fix logic, since it wasn't printing O_RDONLY flag alone.

9a328a9 perf: improve ParseCloneFlags

Iterate slice instead of conditional branch.

Return string only instead of the Argument type since it's the only
value used.

| BenchmarkParseCloneFlags-32 | Before   | After  |
|-----------------------------|----------|--------|
| Execution Time (ns/op)      | 1117     | 696.7  |
| Memory Allocated (B/op)     | 1192     | 736    |
| Allocations per Operation   | 27       | 25     |

2d5e43a chore: deprecate internal interface

Dealing with a concrete type instead of an interface is more efficient.

systemFunctionArgument interface will be replaced by
SystemFunctionArgument struct in future efforts.

2. Explain how to test it

3. Other comments

@geyslan

This comment was marked as outdated.

@geyslan geyslan force-pushed the parse-args-optm-cont branch 10 times, most recently from a349ec7 to 58a8fb4 Compare September 16, 2024 22:03
@geyslan

This comment was marked as duplicate.

@geyslan geyslan marked this pull request as ready for review September 16, 2024 22:10
@geyslan geyslan requested a review from rscampos September 16, 2024 22:14
@geyslan
Copy link
Member Author

geyslan commented Sep 16, 2024

After all refactor done, I pretend to split it in different files. I also have an idea to try to reduce ParseArgs() logic, but gonna play with it only after finishing the whole data parsers.

@geyslan geyslan force-pushed the parse-args-optm-cont branch from 58a8fb4 to 7eb0e8f Compare September 17, 2024 14:59
@geyslan geyslan force-pushed the parse-args-optm-cont branch from 7eb0e8f to 32999c7 Compare September 27, 2024 13:57
@geyslan geyslan force-pushed the parse-args-optm-cont branch 2 times, most recently from 6e04d2f to 62541c2 Compare October 17, 2024 13:39
@geyslan geyslan force-pushed the parse-args-optm-cont branch from 62541c2 to 5c9c6b9 Compare December 4, 2024 11:37
Copy link
Collaborator

@yanivagman yanivagman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the first 2 commits and added some doubts.
I assume the comments there will affect the rest of the code so I will continue the review after clarifying these points

@@ -13,11 +13,27 @@ import (
"github.com/aquasecurity/tracee/pkg/utils/environment"
)

type SystemFunctionArgument interface {

type systemFunctionArgument interface {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why making the interface private?
Isn't the idea behind interfaces to be exposed between different packages?

Copy link
Member Author

@geyslan geyslan Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not used (imported) so I just made it private via c071141 by creating a public concrete type with same name - since dealing with the concrete or its pointer is cheaper than dealing with an interface.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed I'm going to remove the private interface (enforcer). 👍🏼

Copy link
Member Author

@geyslan geyslan Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we'll have to wait a bit to remove that interface since other still not refactored parsers consumes it.

pkg/events/parsers/data_parsers.go Outdated Show resolved Hide resolved
@geyslan geyslan force-pushed the parse-args-optm-cont branch 2 times, most recently from 9bc2df9 to 06328f8 Compare December 17, 2024 17:10
@@ -23,7 +23,8 @@ func ParseArgs(event *trace.Event) error {
}
}

switch ID(event.EventID) {
evtID := ID(event.EventID)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After all refactoring (optimisation) is complete, I plan to make ParseArgs() smaller (cache wise), fetching parsers from a slice or map based on its eventId. It will certainly help to deduplicate code and expand the parsing to other events.

Copy link
Collaborator

@ShohamBit ShohamBit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this PR are relevant to my PR #4442 please review and approve @yanivagman @oshaked1

pkg/events/parse_args.go Show resolved Hide resolved
pkg/events/parsers/data_parsers_test.go Outdated Show resolved Hide resolved
pkg/events/parsers/data_parsers.go Show resolved Hide resolved
pkg/events/parsers/data_parsers.go Show resolved Hide resolved
pkg/events/parsers/data_parsers.go Show resolved Hide resolved
pkg/events/parsers/data_parsers.go Show resolved Hide resolved
Comment on lines 66 to 81
// Use always raw values for the constants, since unix/syscall constants are not
// always set to the same values.
// For example, `O_LARGEFILE` is defined as 0x8000 (00100000) in C include,
// but as 0x0 in unix package.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not so clear to me. Use raw values where exactly?

Copy link
Member Author

@geyslan geyslan Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all rawValue's set in every XXXFlagArgument. It's just a hint to not trust in every const under Go libraries:

https://cs.opensource.google/go/x/sys/+/refs/tags/v0.29.0:unix/zerrors_linux_amd64.go;l=182
https://cs.opensource.google/go/x/sys/+/refs/tags/v0.29.0:unix/zerrors_linux_arm64.go;l=183

Under linux code:

image

The main issue is that those consts are fetched from the uapi, but it can be different based on the #defines set like _LARGEFILE64_SOURCE and _FILE_OFFSET_BITS, in this context. If used, those will set O_LARGEFILE as 0 if the binary is 64 bits. But if the binary is 32 bits, those macros will set O_LARGEFILE as 00100000 (x86_64) or 00400000 (arm64), it makes difference only for the glibc that wrappers calls like fcntl64.

#define _LARGEFILE64_SOURCE
#define _FILE_OFFSET_BITS 64

#include <stdio.h>
#include <fcntl.h>

int main(void)
{
    printf("%o\n", O_LARGEFILE);
    fcntl(0, F_SETFD, O_LARGEFILE);
    return 0;
}

When built as 64 bits it calls:

fcntl(0, F_SETFD, 0) - O_LARGEFILE = 0

When buit as 32 bits it calls:

fcntl64(0, F_SETFD, 0x8000 /* FD_??? */) - O_LARGEFILE = 0x8000

The same for others wrapped calls like open("file", O_LARGEFILE);:

When 64 bits:

openat(AT_FDCWD, "file", O_RDONLY) - O_LARGEFILE = 0

When 32 bits:

openat(AT_FDCWD, "file", O_RDONLY|O_LARGEFILE) - O_LARGEFILE = 0x8000

In summary, what counts for us is the real value set in the kernel include files.


This reminds me that the flags from asm-generic/fcntl.h need to be split into two arch files as well due to the different value of O_LARGEFILE for arm64.

}

type SocketDomainArgument uint64
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about socket domain type? No need for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm abolishing the different types since all of them have the same behaviour of String and Value. I'm focusing on declaring flags (based on origin - include files) and declaring slices like socketDomainValues = []SystemFunctionArgument{ which hold a instance copy of the related flags already made public. For the particular case of SocketDomain, socketDomainValues store the values and they are parsed by ParseSocketDomainArgument().

You can better understand the idea looking the AT_* declarations and the "faccessatFlagsValues, fchmodatFlagsValues and execveAtFlagsValues" holders which make use of only a fragment of those flags based on the event to be parsed. Before we were iterating the whole bunch for events that don't use those flags.

Dealing with a concrete type instead of an interface is more efficient.

systemFunctionArgument interface will be replaced by
SystemFunctionArgument struct in future efforts.
Iterate slice instead of conditional branch.

Return string only instead of the Argument type since it's the only
value used.

| BenchmarkParseCloneFlags-32 | Before   | After  |
|-----------------------------|----------|--------|
| Execution Time (ns/op)      | 1117     | 696.7  |
| Memory Allocated (B/op)     | 1192     | 736    |
| Allocations per Operation   | 27       | 25     |
When possible iterates slice instead of conditional branch.
Return string only instead of the Argument type since it's the only
value used.

Fix logic, since it wasn't printing O_RDONLY flag alone.
Iterate slice instead of conditional branch.

Return string only instead of the Argument type since it's the only
value used.

Fix logic, since it wasn't printing F_OK flag alone.
Introduce ParseFaccessatFlag to parse faccessat flags.
Introduce ParseFchmodatFlag to parse fchmodat flags.

Iterate slice instead of conditional branch.

Return string only instead of the Argument type since it's the only
value used.

Fix ParseExecFlag (now ParseExecveatFlag) to check only related flags.
Add missing flags to ParseCapability:
CAP_PERFMON, CAP_BPF and CAP_CHECKPOINT_RESTORE

Use slice instead of maps. This allows for direct access to values via
index.

Return string only instead of the Argument type since it's the only
value used.
Add missing flags to ParsePrctlOption:
    PR_SET_IO_FLUSHER, PR_GET_IO_FLUSHER, PR_SET_SYSCALL_USER_DISPATCH,
    PR_PAC_SET_ENABLED_KEYS, PR_PAC_GET_ENABLED_KEYS, PR_SCHED_CORE,
    PR_SME_SET_VL, PR_SME_GET_VL, PR_SET_MDWE, PR_GET_MDWE,
    PR_SET_MEMORY_MERGE and PR_GET_MEMORY_MERGE.

Iterate slice instead of fetching a map.

Return string only instead of the Argument type since it's the only
value used.
Add missing flags to ParseBPFCmd:
    BPF_PROG_BIND_MAP, BPF_TOKEN_CREATE

Use slice instead of maps. This allows for direct access to values via
index.

Return string only instead of the Argument type since it's the only
value used.
Add missing flags to ParsePtraceRequestArgument:
    PTRACE_GET_THREAD_AREA, PTRACE_SET_THREAD_AREA, PTRACE_ARCH_PRCTL,
    PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP, PTRACE_SINGLEBLOCK,
    PTRACE_GET_RSEQ_CONFIGURATION,
    PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG and
    PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG.

Iterate slice instead of fetching a map.

Return string only instead of the Argument type since it's the only
value used.
Use slice instead of maps. This allows for direct access to values via
index.

Return string only instead of the Argument type since it's the only
value used.
Add missing flag to ParseSocketDomainArgument:
    AF_MCTP

Use slice instead of maps. This allows for direct access to values via
index.

Return string only instead of the Argument type since it's the only
value used.
syscalls with dirfd arg now parse for special case AT_FDCWD when
ParseArgumentsFDs is true.
O_LARGEFILE is different on arm64 and amd64, 0400000 and 00100000
respectively.
@geyslan geyslan force-pushed the parse-args-optm-cont branch from 66a49e7 to cb55f8f Compare January 15, 2025 20:35
@geyslan
Copy link
Member Author

geyslan commented Jan 15, 2025

@yanivagman rebased with a new fix.

Copy link
Collaborator

@yanivagman yanivagman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@geyslan
Copy link
Member Author

geyslan commented Jan 16, 2025

/fast-forward

@github-actions github-actions bot merged commit cb55f8f into aquasecurity:main Jan 16, 2025
40 of 41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants