-
Notifications
You must be signed in to change notification settings - Fork 431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
argument parsers improvements #4279
argument parsers improvements #4279
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
a349ec7
to
58a8fb4
Compare
This comment was marked as duplicate.
This comment was marked as duplicate.
After all refactor done, I pretend to split it in different files. I also have an idea to try to reduce ParseArgs() logic, but gonna play with it only after finishing the whole data parsers. |
58a8fb4
to
7eb0e8f
Compare
7eb0e8f
to
32999c7
Compare
6e04d2f
to
62541c2
Compare
62541c2
to
5c9c6b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the first 2 commits and added some doubts.
I assume the comments there will affect the rest of the code so I will continue the review after clarifying these points
@@ -13,11 +13,27 @@ import ( | |||
"github.com/aquasecurity/tracee/pkg/utils/environment" | |||
) | |||
|
|||
type SystemFunctionArgument interface { | |||
|
|||
type systemFunctionArgument interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why making the interface private?
Isn't the idea behind interfaces to be exposed between different packages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not used (imported) so I just made it private via c071141 by creating a public concrete type with same name - since dealing with the concrete or its pointer is cheaper than dealing with an interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed I'm going to remove the private interface (enforcer). 👍🏼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we'll have to wait a bit to remove that interface since other still not refactored parsers consumes it.
9bc2df9
to
06328f8
Compare
@@ -23,7 +23,8 @@ func ParseArgs(event *trace.Event) error { | |||
} | |||
} | |||
|
|||
switch ID(event.EventID) { | |||
evtID := ID(event.EventID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After all refactoring (optimisation) is complete, I plan to make ParseArgs() smaller (cache wise), fetching parsers from a slice or map based on its eventId. It will certainly help to deduplicate code and expand the parsing to other events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in this PR are relevant to my PR #4442 please review and approve @yanivagman @oshaked1
pkg/events/parsers/data_parsers.go
Outdated
// Use always raw values for the constants, since unix/syscall constants are not | ||
// always set to the same values. | ||
// For example, `O_LARGEFILE` is defined as 0x8000 (00100000) in C include, | ||
// but as 0x0 in unix package. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not so clear to me. Use raw values where exactly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all rawValue
's set in every XXXFlagArgument. It's just a hint to not trust in every const under Go libraries:
https://cs.opensource.google/go/x/sys/+/refs/tags/v0.29.0:unix/zerrors_linux_amd64.go;l=182
https://cs.opensource.google/go/x/sys/+/refs/tags/v0.29.0:unix/zerrors_linux_arm64.go;l=183
Under linux code:
The main issue is that those consts are fetched from the uapi, but it can be different based on the #define
s set like _LARGEFILE64_SOURCE
and _FILE_OFFSET_BITS
, in this context. If used, those will set O_LARGEFILE
as 0
if the binary is 64 bits. But if the binary is 32 bits, those macros will set O_LARGEFILE
as 00100000
(x86_64) or 00400000
(arm64), it makes difference only for the glibc that wrappers calls like fcntl64
.
#define _LARGEFILE64_SOURCE
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <fcntl.h>
int main(void)
{
printf("%o\n", O_LARGEFILE);
fcntl(0, F_SETFD, O_LARGEFILE);
return 0;
}
When built as 64 bits it calls:
fcntl(0, F_SETFD, 0)
- O_LARGEFILE = 0
When buit as 32 bits it calls:
fcntl64(0, F_SETFD, 0x8000 /* FD_??? */)
- O_LARGEFILE = 0x8000
The same for others wrapped calls like open("file", O_LARGEFILE);
:
When 64 bits:
openat(AT_FDCWD, "file", O_RDONLY)
- O_LARGEFILE = 0
When 32 bits:
openat(AT_FDCWD, "file", O_RDONLY|O_LARGEFILE)
- O_LARGEFILE = 0x8000
In summary, what counts for us is the real value set in the kernel include files.
This reminds me that the flags from asm-generic/fcntl.h
need to be split into two arch files as well due to the different value of O_LARGEFILE
for arm64.
} | ||
|
||
type SocketDomainArgument uint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about socket domain type? No need for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm abolishing the different types since all of them have the same behaviour of String and Value. I'm focusing on declaring flags (based on origin - include files) and declaring slices like socketDomainValues = []SystemFunctionArgument{
which hold a instance copy of the related flags already made public. For the particular case of SocketDomain
, socketDomainValues
store the values and they are parsed by ParseSocketDomainArgument()
.
You can better understand the idea looking the AT_*
declarations and the "faccessatFlagsValues
, fchmodatFlagsValues
and execveAtFlagsValues
" holders which make use of only a fragment of those flags based on the event to be parsed. Before we were iterating the whole bunch for events that don't use those flags.
Dealing with a concrete type instead of an interface is more efficient. systemFunctionArgument interface will be replaced by SystemFunctionArgument struct in future efforts.
Iterate slice instead of conditional branch. Return string only instead of the Argument type since it's the only value used. | BenchmarkParseCloneFlags-32 | Before | After | |-----------------------------|----------|--------| | Execution Time (ns/op) | 1117 | 696.7 | | Memory Allocated (B/op) | 1192 | 736 | | Allocations per Operation | 27 | 25 |
When possible iterates slice instead of conditional branch. Return string only instead of the Argument type since it's the only value used. Fix logic, since it wasn't printing O_RDONLY flag alone.
Iterate slice instead of conditional branch. Return string only instead of the Argument type since it's the only value used. Fix logic, since it wasn't printing F_OK flag alone.
Introduce ParseFaccessatFlag to parse faccessat flags. Introduce ParseFchmodatFlag to parse fchmodat flags. Iterate slice instead of conditional branch. Return string only instead of the Argument type since it's the only value used. Fix ParseExecFlag (now ParseExecveatFlag) to check only related flags.
Add missing flags to ParseCapability: CAP_PERFMON, CAP_BPF and CAP_CHECKPOINT_RESTORE Use slice instead of maps. This allows for direct access to values via index. Return string only instead of the Argument type since it's the only value used.
Add missing flags to ParsePrctlOption: PR_SET_IO_FLUSHER, PR_GET_IO_FLUSHER, PR_SET_SYSCALL_USER_DISPATCH, PR_PAC_SET_ENABLED_KEYS, PR_PAC_GET_ENABLED_KEYS, PR_SCHED_CORE, PR_SME_SET_VL, PR_SME_GET_VL, PR_SET_MDWE, PR_GET_MDWE, PR_SET_MEMORY_MERGE and PR_GET_MEMORY_MERGE. Iterate slice instead of fetching a map. Return string only instead of the Argument type since it's the only value used.
Add missing flags to ParseBPFCmd: BPF_PROG_BIND_MAP, BPF_TOKEN_CREATE Use slice instead of maps. This allows for direct access to values via index. Return string only instead of the Argument type since it's the only value used.
Add missing flags to ParsePtraceRequestArgument: PTRACE_GET_THREAD_AREA, PTRACE_SET_THREAD_AREA, PTRACE_ARCH_PRCTL, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP, PTRACE_SINGLEBLOCK, PTRACE_GET_RSEQ_CONFIGURATION, PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG and PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG. Iterate slice instead of fetching a map. Return string only instead of the Argument type since it's the only value used.
Use slice instead of maps. This allows for direct access to values via index. Return string only instead of the Argument type since it's the only value used.
Add missing flag to ParseSocketDomainArgument: AF_MCTP Use slice instead of maps. This allows for direct access to values via index. Return string only instead of the Argument type since it's the only value used.
syscalls with dirfd arg now parse for special case AT_FDCWD when ParseArgumentsFDs is true.
06328f8
to
7068a7d
Compare
O_LARGEFILE is different on arm64 and amd64, 0400000 and 00100000 respectively.
66a49e7
to
cb55f8f
Compare
@yanivagman rebased with a new fix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/fast-forward |
Close: #4476
Close: #4477
Close: #4478
1. Explain what the PR does
This is a sequence of the effort already started by #4200. As it was getting huge, the continuation will come after.
Improvements include reducing the size of the parser logic by using slices instead of maps whenever possible. This allows for direct access to values via index, and when direct access isn't feasible, iterating through slices is generally more efficient than fetching values from a map.
Return string only instead of the Argument type since it's the only value used.
This PR reduces the size of the final binary by ~56KB.
cb55f8f fix(parsers): fix openFlagsValues O_LARGEFILE
9676369 feat: parse dirfd for special case AT_FDCWD
af50e8b perf: chore: add flag to ParseSocketDomainArgument
4fc5f7c perf: chore: improve ParseSocketcallCall
35613a5 perf: chore: add flags ParsePtraceRequestArgument
083cfd4 perf: chore: add missing flags to ParseBPFCmd
6c2ca98 perf: chore: add missing flags to ParsePrctlOption
88f4f92 perf: chore: add missing flags to ParseCapability
9b8fe58 feat: perf: fix: AT flags parsing
c99ebed perf: fix: improve ParseAccessMode
86b68aa perf: fix: improve ParseOpenFlagArgument
9a328a9 perf: improve ParseCloneFlags
2baff04 chore: add comment about raw values
2d5e43a chore: deprecate internal interface
cb55f8f fix(parsers): fix openFlagsValues O_LARGEFILE
9676369 feat: parse dirfd for special case AT_FDCWD
af50e8b perf: chore: add flag to ParseSocketDomainArgument
4fc5f7c perf: chore: improve ParseSocketcallCall
35613a5 perf: chore: add flags ParsePtraceRequestArgument
083cfd4 perf: chore: add missing flags to ParseBPFCmd
6c2ca98 perf: chore: add missing flags to ParsePrctlOption
88f4f92 perf: chore: add missing flags to ParseCapability
9b8fe58 feat: perf: fix: AT flags parsing
c99ebed perf: fix: improve ParseAccessMode
86b68aa perf: fix: improve ParseOpenFlagArgument
9a328a9 perf: improve ParseCloneFlags
2d5e43a chore: deprecate internal interface
2. Explain how to test it
3. Other comments