dkogan/clockfunction

Runs an executable, measuring execution time statistics for given functions

Profiling tools generally answer the question "where's all my time going?", and they answer it by periodically sampling a running executable to see what it's doing at that point in time. When many samples are collected, they can be analyzed and tallied. Execution hotspots then show up as a large sample count: the application spent more of its time executing the hotspots. As expected, a higher sampling rate yields more precise results at the cost of higher instrumentation overhead.

Recently I needed to answer a slightly different question: how quickly does THIS SPECIFIC function run? A sampling profiler is both too much and too little here: too much because I don't need to sample when the function of interest is not running, and too little because a very high sampling resolution may be necessary to report the timings with sufficient precision.

I looked around and didn't see any available tools that solved this problem. The =perf= tool from the Linux kernel does 99% of what is needed to construct such a tool, so I wrote a simple tool that uses perf to give me the facilities I need: clockfunction.

perf is able to probe executables at the entry and exit points of a function. Probing the entry is trivial, since the address can be looked up in the symbol table or in the debug symbols. Probing the exit is not trivial, since multiple ret instructions could be present. One could either do some minor static analysis to find the ret instructions, or one could look at the return address upon entry and dynamically place a probe there; both approaches would break on non-local exits. I'm not 100% sure, but I suspect that perf does neither, and instead uses some special hardware to do this. In any case, I don't care: perf allows me to probe function returns somehow, and that's all I care about.
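For illustration, placing these probes by hand looks something like this (using the perl example below; the exact names perf gives the resulting events vary a bit between versions):

$ sudo perf probe -x /usr/bin/perl 'Perl_pp_rand'
$ sudo perf probe -x /usr/bin/perl 'Perl_pp_rand%return'

The first command creates an entry probe (an event named something like probe_perl:Perl_pp_rand); the %return suffix in the second asks perf to probe the function's return instead.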

After placing the probes, I run the executable being evaluated while perf is recording all probe crossings. When the executable exits, the probe log can be analyzed to extract the timing information.
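Roughly (clockfunction automates all of this), the recording and dumping steps look like:

$ sudo perf record -e 'probe_perl:*' perl -e 'for $i (0..100000) { $s = rand(); }'
$ sudo perf script

perf script prints one timestamped line per probe crossing; subtracting each entry timestamp from the corresponding return timestamp yields the per-call timings.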

The clockfunction tool automates this. Multiple functions can be sampled, with each one specified as a func@lib string (ltrace-style). func is the name of the function we care about; it can be a shell pattern to pick out multiple functions. lib is the ELF library or executable that contains the function; it must be an absolute path. An example:

$ ./clockfunction.py '*rand*'@/usr/bin/perl perl_run@/usr/bin/perl perl -e 'for $i (0..100000) { $s = rand(); }'

# function mean min max stdev Ncalls
## All timings in seconds
Perl_drand48_init_r 7.55896326154e-06 7.55896326154e-06 7.55896326154e-06 0.0               1
Perl_drand48_r      1.95271501819e-06 1.76404137164e-06 3.67719912902e-05 4.0105865074e-07  100001
Perl_pp_rand        5.23026800056e-06 4.78199217469e-06 0.000326015986502 1.71576428687e-06 100001
perl_run            0.662568764063    0.662568764063    0.662568764063    0.0               1

The table was re-spaced for readability. We see that the main perl application took 0.66 seconds, and that Perl_pp_rand was called 100001 times, taking 5.23us each time on average, for a total of 0.523 seconds. The lower-level Perl_drand48_r function took about 1/3 of the time of Perl_pp_rand. If one cared about this detail of perl, this would be very interesting to know. And we found it out without any compile-time instrumentation of our binary, and without even bothering to find out what the *rand* functions are called.

Recursive or parallel invocations are supported insofar as the mean and Ncalls will be reported correctly; the min, max and stdev of the timings will not be available, however.
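Why that is: with overlapping invocations, a given return probe crossing cannot be paired unambiguously with a particular entry, so per-call durations (and hence min/max/stdev) are lost. The timestamp sums still determine the total inclusive time, though, so the mean survives. A tiny Python sketch, purely illustrative and not clockfunction's actual code:

    # Hypothetical entry/return probe timestamps, in seconds, for one
    # function. The first two crossings are a nested (recursive) pair:
    # entered at 0.0 and 0.1, returned at 0.3 (inner) and 0.5 (outer).
    entries = [0.0, 0.1, 0.6]
    returns = [0.3, 0.5, 0.9]

    # Which return goes with which entry is ambiguous under recursion,
    # but the durations always sum to sum(returns) - sum(entries), so
    # the mean is still well-defined:
    ncalls = len(entries)
    mean = (sum(returns) - sum(entries)) / ncalls
    print(f"mean: {mean:.3f} s over {ncalls} calls")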

Caveats

This tool is a quick hack; all the actual work is done by ‘perf’. It also calls ‘sudo’ all over the place, which is ugly.

A relatively recent ‘perf’ is required. The devs have been tinkering with the semantics of ‘perf probe -F’. The following should produce reasonable output:

perf probe -x `which python` -F 'Py*'

(I.e. it should print out a long list of instrumentable functions in the python executable that start with ‘Py’.) Older versions of the ‘perf’ tool will barf instead. Note that ‘perf’ is a userspace tool that lives in the Linux kernel source tree, but it doesn't directly depend on specific kernel versions: grabbing a very recent kernel tree and rebuilding JUST ‘perf’ usually works, without rebuilding the kernel or rebooting. Usually.

When instrumenting C++ functions you generally need to use the mangled symbol names. At this time ‘perf’ has partial support for demangled names, but it's not complete enough to work fully: ‘perf probe -F’ can report demangled names, but you can't insert probes with ‘:’ in their names, since ‘:’ already has a meaning in ‘perf probe’ syntax. So I use ‘perf probe --no-demangle’, which again requires a relatively recent ‘perf’. If you aren't looking at C++ but your perf is too old to have --no-demangle, you'll get needless barfing; take out the ‘--no-demangle’ in that case.
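For example, probing a C++ method by its mangled name might look like this (the binary and symbol here are hypothetical; look the real mangled name up with nm or objdump):

$ nm ./myprog | grep compute
00000000004011a6 T _ZN6Solver7computeEv
$ sudo perf probe --no-demangle -x ./myprog '_ZN6Solver7computeEv'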

License

Released into the public domain; I'm giving up all copyright.
