If you look at the core region of the innermost method of a benchmark in the libpfc case, you find a rep stos instruction inside the timed region, as follows:
The code before and after it issues rdpmc to read the performance counters, and the actual timed call is dep_add_rax_rax, but the presence of the rep stos is unfortunate, since it is slow, invokes microcode and so on. It's there because of:
which zero-initializes the counter array. The existing macros either add to (PFC_END, as shown above) or sub from the array location, so zero-initialization is required, since otherwise whatever garbage is in the array would be picked up. In principle, though, the array contents are simply replaced with the current counter value, so this isn't necessary: we could have a new PFC_ macro which just movs in the absolute value.
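To make the difference concrete, here is a minimal sketch in plain C rather than libpfc's actual inline-assembly macros. Every name in it (sketch_read, sketch_start_sub, sketch_start_mov and so on) is hypothetical, and a simple array stands in for the hardware counters so that no rdpmc access is needed. It only illustrates why the sub/add scheme depends on a zeroed array while a store-based start does not; the end step that pairs with the store-based start is one possible choice, not something the existing macros do.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_N 4 /* hypothetical number of counters */

/* Stand-in for the hardware counters; real code would use rdpmc. */
static uint64_t sketch_hw[SKETCH_N];

static uint64_t sketch_read(size_t i) { return sketch_hw[i]; }

/* Pretend the timed region caused `events` events on every counter. */
static void sketch_advance(uint64_t events) {
    for (size_t i = 0; i < SKETCH_N; i++) sketch_hw[i] += events;
}

/* Existing scheme: start subtracts, end adds. The result is
 * (initial contents + delta), so cnt[] must be zeroed beforehand. */
static void sketch_start_sub(uint64_t *cnt) {
    for (size_t i = 0; i < SKETCH_N; i++) cnt[i] -= sketch_read(i);
}

static void sketch_end_add(uint64_t *cnt) {
    for (size_t i = 0; i < SKETCH_N; i++) cnt[i] += sketch_read(i);
}

/* Proposed scheme (assumption): start overwrites the slot with the
 * absolute counter value, so the previous contents never matter. */
static void sketch_start_mov(uint64_t *cnt) {
    for (size_t i = 0; i < SKETCH_N; i++) cnt[i] = sketch_read(i);
}

/* Matching end step: replace the stored start value with the delta. */
static void sketch_end_delta(uint64_t *cnt) {
    for (size_t i = 0; i < SKETCH_N; i++) cnt[i] = sketch_read(i) - cnt[i];
}
```

With an uninitialized (garbage) array, the sub/add pair reports garbage plus the delta, while the mov-based pair reports just the delta, so the zero-initialization and its rep stos would no longer be needed under this scheme.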
In principle, the effect is cancelled out by the use of dummy_bench (or any other bench), but it would still be nice to eliminate all unnecessary code in the benchmarked region, especially rep instructions and those which modify memory.