infra: perf flame graph #5995

Open · wants to merge 14 commits into master
Conversation

@catenacyber (Contributor)

cc @jonathanmetzman @inferno-chromium

This allows generating a perf flame graph along with the coverage reports for C/C++/Rust.

Tested with

python infra/helper.py build_fuzzers --sanitizer coverage suricata
python3 infra/helper.py build_image --no-pull base-runner
python3 infra/helper.py coverage --fuzz-target fuzz_sigpcap_aware --corpus-dir /path/to/corpus-libFuzzer-suricata_fuzz_sigpcap_aware-latest/ suricata

after having downloaded the corpus directory

Result is available here https://catenacyber.fr/perf-fuzz_sigpcap_aware.svg

In this case, the flame graph shows, for instance, that we spend 17% of the time in JsonAnomalyLogger.
Even if this function may have bugs of its own, that is clearly far too much time not spent exercising anything else...
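
For context, the SVG is produced from the recorded perf data with the FlameGraph scripts this PR clones into $PERF_DIR. A minimal sketch of that pipeline, assuming the $DUMPS_DIR/$target.perf file name used in the coverage script below (the output file name is an assumption based on the URL above):

# collapse perf samples into folded stacks, then render the interactive SVG
perf script -i $DUMPS_DIR/$target.perf \
  | $PERF_DIR/stackcollapse-perf.pl \
  | $PERF_DIR/flamegraph.pl > $OUT/perf-$target.svg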

@catenacyber (Contributor, Author)

This works locally, where uname -r gives 4.15.0-142-generic, but fails in CI, where uname -r gives 5.8.0-1036-azure.
Any idea how to solve this?

@catenacyber (Contributor, Author)

Interesting: the fuzz target spends quite a bit of time resetting/clearing hash tables, which is not what they are optimized for, but which is needed for the fuzz target to be stateless...

@jonathanmetzman (Contributor) left a comment

Nice. Thanks!

cd linux-stable/tools/perf/
apt-get install -y flex bison make
# clang finds errors such as tautological-bitwise-compare
CC=gcc DESTDIR=/usr/ make install
Contributor:

Please uninstall flex and bison. (There's a bit of an issue with my assumption that we can just uninstall these packages: what if they were installed by a previous step because a later step needs them? They would no longer be available.)

Contributor:

Please delete the source checkout and the non-installed build.

Contributor Author:

Done...
I am not sure about flex and bison; I guess a later step should reinstall them if needed.

Contributor:

They aren't already installed so they should be safe to remove here.
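
A minimal sketch of the cleanup requested in this thread, appended after the make install above (assuming the build runs from the parent directory of the linux-stable checkout, and that flex and bison are not needed by any later step):

# remove packages that were only installed to build perf
apt-get remove -y flex bison
# delete the source checkout, including the non-installed build artifacts
rm -rf linux-stable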

@jonathanmetzman (Contributor) left a comment

I might have something different in mind for profiling than just profiling on the corpus (which I'm not sure illustrates very useful info), like using this for timeouts on ClusterFuzz, for example.

@@ -82,7 +82,7 @@ function run_fuzz_target {
   local args="-merge=1 -timeout=100 -close_fd_mask=3 $corpus_dummy $corpus_real"
 
   export LLVM_PROFILE_FILE=$profraw_file
-  timeout $TIMEOUT $OUT/$target $args &> $LOGS_DIR/$target.log
+  timeout $TIMEOUT perf record -g -o $DUMPS_DIR/$target.perf $OUT/$target $args &> $LOGS_DIR/$target.log
Contributor:

I'm not sure we should be doing this in the coverage script. It seems different from coverage, IMO.

Contributor Author:

This would surely be useful for timeouts :-)

I also think profiling on the corpus still shows relevant information, since it covers all kinds of inputs...

I did it in the coverage script, as it was easy : get both reports in one shot...

Let me know what you prefer

Contributor:

Profiling on the corpus seems like it will produce a bit of a skewed result because the corpus is a subset of the inputs that the target runs on during fuzzing. Let's say there is an edge that is hit very frequently during fuzzing and takes a really long time, but only one file in the corpus covers it. The profiling won't show that fuzzing is spending a ton of time on this edge.
@oliverchang What do you think about this?

Contributor:

Another reason this is problematic: in this run the target is built with source-based coverage instrumentation and dumps coverage results. So not only will profiling include time wasted on that, it also will not be able to accurately show the amount of time spent on sancov instrumentation, which can itself be useful to know.

Contributor Author:

Indeed, so what is the best way to run it?
We could run it for some fixed amount of time with classic ASAN instrumentation.
But then, should it be a new subcommand of infra/helper.py, or an option to the run_fuzzer subcommand?

Collaborator:

How useful is sancov info? It seems like pretty niche information in most cases?

That said, it does seem like something that could be built more into ClusterFuzz instead, as we may want this data for timeouts, and there's the potential corpus-vs-actual-fuzzing skew issue. Not requiring a specialised build would also make it easier.

I think this is fine to include with coverage as is for now though, so we can evaluate this before devoting more time to properly integrate this into ClusterFuzz. How long do these runs typically take?


@@ -105,6 +105,11 @@ RUN wget https://repo1.maven.org/maven2/org/jacoco/org.jacoco.cli/0.8.7/org.jaco
     echo "37df187b76888101ecd745282e9cd1ad4ea508d6 /opt/jacoco-agent.jar" | shasum --check && \
     echo "c1814e7bba5fd8786224b09b43c84fd6156db690 /opt/jacoco-cli.jar" | shasum --check
 
+ENV PERF_DIR=/root/perf
+RUN git clone --depth 1 https://github.com/brendangregg/FlameGraph $PERF_DIR
Collaborator:

Can we pin this by checking out a git hash?

Contributor Author:

Done. Why do we need this?

@oliverchang (Collaborator) · Jul 12, 2021

It's good to pin dependencies like these going forward for more reproducibility and to avoid potential supply chain compromises.
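
A pinned checkout might look like the following sketch (the commit hash is a placeholder, not the one actually used in this PR; --depth 1 is dropped so that an arbitrary commit can be checked out):

ENV PERF_DIR=/root/perf
# Pin FlameGraph to a known commit for reproducibility and supply-chain safety.
RUN git clone https://github.com/brendangregg/FlameGraph $PERF_DIR && \
    git -C $PERF_DIR checkout <pinned-commit-hash>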


@catenacyber (Contributor, Author)

How useful is sancov info? It seems like pretty niche information in most cases?

In my main test case, suricata's fuzz_sigpcap_aware, __sanitizer_cov_trace_cmp8 accounts for a significant share of the time spent resetting hash tables between inputs: around 3% of the total runtime.

How long do these runs typically take?

In my main test case, suricata's fuzz_sigpcap_aware, with a corpus of over 10k elements, it takes about 10 seconds.

@jonathanmetzman (Contributor)

Interesting: the fuzz target spends quite a bit of time resetting/clearing hash tables, which is not what they are optimized for, but which is needed for the fuzz target to be stateless...

Which hash tables are you talking about?

@jonathanmetzman (Contributor)

This works locally, where uname -r gives 4.15.0-142-generic, but fails in CI, where uname -r gives 5.8.0-1036-azure.
Any idea how to solve this?

Not sure, to be honest. And I'm wondering if this is going to be a problem on Google Cloud (i.e. production), since I would bet the kernels there are also non-standard. Googling "install perf on azure" doesn't come up with much that is helpful: https://www.google.com/search?q=install+perf+on+azure+linux+linux-tools+-perfinsights

@catenacyber (Contributor, Author)

Which hash tables are you talking about?

The ones in the software being fuzzed (suricata)

And I'm wondering if this is going to be a problem on google cloud (i.e. production) since I would bet the kernels there are also non-standard.

At least I solved the CI problem with install_perf.sh, which tries multiple alternative ways to install perf.
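
install_perf.sh itself is not shown here; a rough sketch of what such a fallback chain can look like on Ubuntu-based images (build_perf_from_source is a hypothetical helper wrapping the source build shown earlier, not a real script):

#!/bin/bash
# Try the perf build matching the running kernel, then a generic package,
# then fall back to building perf from the kernel sources.
apt-get install -y linux-tools-"$(uname -r)" || \
  apt-get install -y linux-tools-generic || \
  build_perf_from_source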

@jonathanmetzman (Contributor)

Which hash tables are you talking about?

The ones in the software being fuzzed (suricata)

You mean the ones reset by __sanitizer_cov_trace_cmp8 right?

@catenacyber (Contributor, Author)

The ones in the software being fuzzed (suricata)
You mean the ones reset by __sanitizer_cov_trace_cmp8, right?

I do not think so; I mean the hash tables from util-hashlist.c in Suricata.
So that part is only relevant to the Suricata project; it is not generic.
The generic point is that profiling helps find spots where we can improve the fuzzer's speed.

@jonathanmetzman (Contributor)

The ones in the software being fuzzed (suricata)
You mean the ones reset by __sanitizer_cov_trace_cmp8, right?

I do not think so; I mean the hash tables from util-hashlist.c in Suricata.
So that part is only relevant to the Suricata project; it is not generic.
The generic point is that profiling helps find spots where we can improve the fuzzer's speed.

Ah got it.

@catenacyber (Contributor, Author)

Other UIs are also available for getting flame graphs; cf. https://www.markhansen.co.nz/profiler-uis/

@DavidKorczynski (Collaborator)

Other UIs are also available for getting flame graphs; cf. https://www.markhansen.co.nz/profiler-uis/

FYI, a few months ago we used Prodfiler to debug some of the Envoy fuzzers (see page 10 of the report https://github.com/envoyproxy/envoy/blob/main/docs/security/audit_fuzzer_adalogics_2021.pdf), and the flame graphs were very useful for debugging performance issues in the fuzzers.

@catenacyber (Contributor, Author)

Thanks @DavidKorczynski for this interesting read, which I had missed.

Does this mean I should drop the | grep LLVMFuzzerTestOneInput, so that the perf report also covers code outside the fuzz target?

Side note for me later: we can try catenacyber@0ec259f to see if we get a difference in the perf report.
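
For context, the grep in question filters the collapsed stacks so that only samples passing through the fuzz target remain; dropping it would also keep stacks that never reach LLVMFuzzerTestOneInput (e.g. libFuzzer's own work). A sketch of the filtered pipeline, using the same assumed file names as above:

perf script -i $DUMPS_DIR/$target.perf \
  | $PERF_DIR/stackcollapse-perf.pl \
  | grep LLVMFuzzerTestOneInput \
  | $PERF_DIR/flamegraph.pl > $OUT/perf-$target.svg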

@catenacyber (Contributor, Author)

Friendly ping on this.
I rebased it and made some needed changes to keep it working.

@jonathanmetzman would you want to use perf for other use cases?

@DavidKorczynski (Collaborator)

I think it would be really nice to get this added and add the flame graphs to Fuzz Introspector

@catenacyber (Contributor, Author)

I think it would be really nice to get this added and add the flame graphs to Fuzz Introspector

Thanks @DavidKorczynski. I still want to get this in somehow ;-)

What would it take to get it into Fuzz Introspector?
Does Fuzz Introspector support Rust? (perf works for it)

@DavidKorczynski (Collaborator)

I think the easiest approach from Fuzz Introspector's perspective would be to accept the .svg file, and perhaps a textual representation of what the flame graph represents, rather than having Fuzz Introspector run the flame graph generation itself.

@catenacyber (Contributor, Author)

@jonathanmetzman how would you like to proceed on this?
