flow.log: perf: Possibly tweak Async_file_logger to dealloc msg/mdt objects in the same thread that alloc-ed them. #77
Labels: enhancement (New feature or request); from-akamai-pre-open (Issue origin is Akamai, before opening source)
Filed by @ygoldfeld pre-open-source:
(colleague) recently told me that the (unrelated service) log module has an optimization. Consider a given log call site in thread T that logs message M, having passed the verbosity filter check. A buffer B is allocated, and the message M is composed via printf-type formatting and placed in B. A pointer to B is passed to the logger, which asynchronously logs it to file. Once that occurs (in its logging thread), B can be deleted. (colleague) said that in order to leverage tcmalloc (and, I guess, other thread-caching allocators like jemalloc) perf gains, it is thread T that eventually deallocs B, not the logging thread. I am not clear on how the logging thread informs thread T when it's safe to get rid of B.
In my Async_file_logger code, and for other async Loggers (which don't really exist), the logging thread simply does the deleting itself. Actually in my case there is the message buffer B plus the relatively small metadata struct MDT (which has file, line, severity...). These are both deleted as noted. I should also note that in my case there is a thread-local buffer B', which maintains stream state (like formatting; for example it remembers setprecision and boost.chrono format specifiers like "ms" instead of "milliseconds"); its string output is copied into a B after each log call site. Detail perhaps but thought I'd mention it.
Anyway, the above info is hearsay from (colleague), and I didn't get some of the details, but (1) I've got ideas besides and (2) certainly the actual code should be perused to see what they do/get ideas/get educated/get inspired....
P.S. "I've got ideas besides" = If the goal is indeed to dealloc in the original thread that alloc-ed B/MDT, then one way is:

- Thread-local vector<> (pre-reserved to size... something... 10?). Starts empty. Each element = atomic bool Fdone, plus the pointers to allocated B and MDT.
- Log call site => check existing elements in vector, starting with the first one if any. Is flag Fdone=true? Then just reuse B and MDT, no need to realloc (though maybe B isn't big enough? details... something like that). If Fdone=false, check the next element in vector.
- If no Fdone=true one is found, add a new element to vector, allocating new B and MDT, setting Fdone=false.
- Then invoke Async_file_logger::do_log() as usual, giving it the B/MDT/Fdone slot. This guy async-logs B and MDT; and once it is in the file, sets atomic bool Fdone=true to indicate that MDT/B can be freed or reused.
- Oh, and after calling do_log() from thread T, perhaps scan the rest of the vector and delete any excess B/MDT entries, removing them from vector; perhaps there should be at most 1 in the steady state. Or something.

ANYWAY! This is just on-the-fly musing. Just read their code first before doing stuff!
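To make the musing above concrete, here is a minimal sketch of the per-thread slot idea. It is not Flow's actual code: `Msg_metadata`, `Log_slot`, `Slot_pool`, `acquire()`, `release()`, and `trim()` are all hypothetical names invented for illustration. It assumes that the hand-off of a slot to the logging thread (e.g., via whatever queue `Async_file_logger::do_log()` uses) is itself properly synchronized. Slots are held by `unique_ptr` so their addresses stay stable while the vector grows.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the metadata struct (MDT).
struct Msg_metadata {
  std::string_view file;  // E.g., __FILE__, which has static storage duration.
  int line = 0;
  int severity = 0;
};

// One reusable slot: message buffer B, metadata MDT, and the Fdone flag.
struct Log_slot {
  std::string msg;               // B
  Msg_metadata mdt;              // MDT
  std::atomic<bool> done{true};  // Fdone: true => logging thread no longer touches this slot.
};

// Pool of slots owned by one producer thread T (intended to be thread_local).
class Slot_pool {
public:
  Slot_pool() { m_slots.reserve(10); }  // "10" is the arbitrary guess from the text.

  // Thread T: reuse the first finished slot, or allocate a new one.
  Log_slot& acquire() {
    for (auto& slot : m_slots) {
      // Acquire pairs with the release-store in release(), so the logging
      // thread's reads of the slot happen-before T overwrites it.
      if (slot->done.load(std::memory_order_acquire)) {
        slot->done.store(false, std::memory_order_relaxed);
        return *slot;
      }
    }
    m_slots.push_back(std::make_unique<Log_slot>());
    m_slots.back()->done.store(false, std::memory_order_relaxed);
    return *m_slots.back();
  }

  // Thread T: free finished slots beyond the first `keep`, so that
  // deallocation happens in the thread that allocated them.
  void trim(std::size_t keep = 1) {
    std::size_t idle = 0;
    for (auto it = m_slots.begin(); it != m_slots.end();) {
      if ((*it)->done.load(std::memory_order_acquire) && ++idle > keep) {
        it = m_slots.erase(it);
      } else {
        ++it;
      }
    }
  }

  std::size_t size() const { return m_slots.size(); }

private:
  std::vector<std::unique_ptr<Log_slot>> m_slots;
};

// Logging thread: call once the message has been written to the file.
inline void release(Log_slot& slot) {
  assert(!slot.done.load(std::memory_order_relaxed));  // Must be in flight.
  slot.done.store(true, std::memory_order_release);
}
```

A call site in thread T would do roughly: `thread_local Slot_pool pool; Log_slot& s = pool.acquire();`, fill `s.msg` and `s.mdt`, pass `&s` to the logger, then `pool.trim()`. One thing this sketch does not handle: if thread T exits while a slot is still in flight, the thread-local pool is destroyed under the logging thread. A real implementation would need an answer for that.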