infra: perf flame graph #5995
base: master
Conversation
This works locally as I have ...
Interesting: the fuzz target spends quite a bit of time resetting/clearing hash tables... which is not what they are optimized for, and is needed by the fuzz target to be stateless...
Nice. Thanks!
cd linux-stable/tools/perf/
apt-get install -y flex bison make
# clang finds errors such as tautological-bitwise-compare
CC=gcc DESTDIR=/usr/ make install
Please uninstall flex and bison. (There's a bit of an issue with my assumption that we can just uninstall these packages: if they were installed by a previous step because a later step needs them, they will no longer be available.)
Please delete the source checkout and the non-installed build.
Done...
I am not sure about flex and bison, I guess a later step should reinstall them if needed
They aren't already installed so they should be safe to remove here.
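For reference, a rough sketch of what the requested cleanup in install_perf.sh might look like, assuming an apt-based image and the linux-stable checkout path used in the snippet above (these are not necessarily the exact commands in the PR):

cd ../../..                    # leave linux-stable/tools/perf/
apt-get remove -y flex bison   # drop the build-only packages again
apt-get autoremove -y
rm -rf linux-stable            # delete the source checkout and the non-installed build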
I might have something different in mind for profiling than just profiling on the corpus (which I'm not sure illustrates very useful info).
For example, using this for timeouts on ClusterFuzz.
@@ -82,7 +82,7 @@ function run_fuzz_target {
   local args="-merge=1 -timeout=100 -close_fd_mask=3 $corpus_dummy $corpus_real"

   export LLVM_PROFILE_FILE=$profraw_file
-  timeout $TIMEOUT $OUT/$target $args &> $LOGS_DIR/$target.log
+  timeout $TIMEOUT perf record -g -o $DUMPS_DIR/$target.perf $OUT/$target $args &> $LOGS_DIR/$target.log
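For context, the recorded profile presumably gets turned into a flame graph with the FlameGraph scripts cloned in the Dockerfile change; a minimal sketch, assuming the $PERF_DIR and $DUMPS_DIR variables used elsewhere in this PR (the output path is illustrative):

perf script -i $DUMPS_DIR/$target.perf | \
  $PERF_DIR/stackcollapse-perf.pl | \
  $PERF_DIR/flamegraph.pl > $DUMPS_DIR/$target.svg  # fold the stacks, then render the SVG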
I'm not sure we should be doing this in the coverage script. It seems different than coverage IMO.
This would surely be useful for timeouts :-)
I also think profiling on the corpus still shows relevant information, as it shows profiling across all kinds of inputs...
I did it in the coverage script, as it was easy: get both reports in one shot...
Let me know what you prefer
Profiling on the corpus seems like it will produce a bit of a skewed result because the corpus is a subset of the inputs that the target runs on during fuzzing. Let's say there is an edge that is hit very frequently during fuzzing and takes a really long time, but only one file in the corpus covers it. The profiling won't show that fuzzing is spending a ton of time on this edge.
@oliverchang What do you think about this?
Another reason this is problematic: in this run the target is instrumented with source-based coverage instrumentation and dumps results. So not only will profiling include the time spent on that, it also won't be able to accurately show how much time is spent on sancov instrumentation, which can be useful to know as well.
Indeed, so what is the best way to run it?
We could run it for some fixed time with classic ASAN instrumentation.
But then, should it be a new subcommand of infra/helper.py? Or an option to the run_fuzzer subcommand?
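For illustration, a rough sketch of what a fixed-time profiling run against a regular ASAN build might look like; the paths and the 60-second budget are assumptions, not commands from this PR (-max_total_time is a standard libFuzzer flag):

# assumes infra/helper.py build_fuzzers already produced build/out/$PROJECT
perf record -g -o $FUZZ_TARGET.perf \
  ./build/out/$PROJECT/$FUZZ_TARGET -max_total_time=60 ./corpus/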
How useful is sancov info? It seems like pretty niche information in most cases?
That said, it does seem like something that could be built more into ClusterFuzz instead, as we may want this data for timeouts, and there's the potential corpus-vs-actual-fuzzing skew issue. Not requiring a specialised build also makes it easier.
I think this is fine to include with coverage as is for now though, so we can evaluate this before devoting more time to properly integrate this into ClusterFuzz. How long do these runs typically take?
@@ -105,6 +105,11 @@ RUN wget https://repo1.maven.org/maven2/org/jacoco/org.jacoco.cli/0.8.7/org.jaco
     echo "37df187b76888101ecd745282e9cd1ad4ea508d6 /opt/jacoco-agent.jar" | shasum --check && \
     echo "c1814e7bba5fd8786224b09b43c84fd6156db690 /opt/jacoco-cli.jar" | shasum --check

+ENV PERF_DIR=/root/perf
+RUN git clone --depth 1 https://github.com/brendangregg/FlameGraph $PERF_DIR
Can we pin this by checking out a git hash?
Done. Why do we need this?
It's good to pin dependencies like these going forward for more reproducibility and to avoid potential supply chain compromises.
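A minimal sketch of how the checkout could be pinned in the Dockerfile; the hash below is a placeholder, not the commit actually used in this PR (note that --depth 1 has to be dropped, or the specific commit fetched explicitly, for the checkout to succeed):

ENV PERF_DIR=/root/perf
RUN git clone https://github.com/brendangregg/FlameGraph $PERF_DIR && \
    git -C $PERF_DIR checkout <commit-hash>  # pin to a specific commit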
In my main test case, suricata fuzz_sigpcap_aware, with a corpus of over 10k elements, it takes about 10 seconds.
Which hash tables are you talking about?
Not sure to be honest. And I'm wondering if this is going to be a problem on Google Cloud (i.e. production) since I would bet the kernels there are also non-standard. Googling "install perf on azure" doesn't come up with much that is helpful: https://www.google.com/search?q=install+perf+on+azure+linux+linux-tools+-perfinsights
The ones in the software being fuzzed (suricata)
At least, I solved the CI problem with install_perf.sh and its multiple alternative ways to install perf.
You mean the ones reset by ...
I do not think so, I mean the hash tables from util-hashlist.c in Suricata.
Ah, got it.
Other UIs are available to get flame graphs, cf. https://www.markhansen.co.nz/profiler-uis/
FYI a few months ago we used Prodfiler to debug some of the Envoy fuzzers (check page 10 in the report https://github.com/envoyproxy/envoy/blob/main/docs/security/audit_fuzzer_adalogics_2021.pdf) and the flame graphs were very useful for debugging performance issues in the fuzzers.
Thanks @DavidKorczynski for this interesting read I had missed. Does this mean I should not ...
Side note for me later: we can try catenacyber@0ec259f to see if we have a difference in the perf report, as libfuzzer may take time as well.
Friendly ping on this @jonathanmetzman, would you want to use perf for other use cases?
I think it would be really nice to get this added and to add the flame graphs to Fuzz Introspector.
Thanks @DavidKorczynski. I still want to get this somehow ;-) What would it take to get it into Fuzz Introspector?
I think the easiest from Fuzz Introspector's perspective would be to accept the ...
@jonathanmetzman how would you like to proceed on this?
cc @jonathanmetzman @inferno-chromium
This makes it possible to get a perf flame graph along with the coverage reports for C/C++/Rust.
Tested with ... after having downloaded the corpus directory.
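The exact command is not quoted above; presumably it was the standard OSS-Fuzz coverage workflow, something along these lines (project and target names are taken from the discussion, the corpus path is illustrative):

python infra/helper.py build_fuzzers --sanitizer coverage suricata
python infra/helper.py coverage suricata --fuzz-target fuzz_sigpcap_aware \
    --corpus-dir ./corpus-fuzz_sigpcap_aware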
The result is available here: https://catenacyber.fr/perf-fuzz_sigpcap_aware.svg
In this case, the flame graph allows us to see, for instance, that we spend 17% of the time in JsonAnomalyLogger.
Even if this function may have bugs, that is clearly way too much time that is not spent on something else...