fix(trace): use scope in logger (Syndica#345)

* Add scope to logger in Gossip Service

* Add scope to logger in Shred Inserter

* Improve adding scope to logger in Gossip Service

* Add scope to logger in transaction sender service

* Add scope to logger in AccountsDB

* Add scope to logger in GossipDumpService

* Add scope to logger in RepairService

* Add scope to logger in DownloadProgress

* Add scope to logger in BlockstoreReader

* Add scope to logger in RepairRequester

* Add scope to logger in BlockstoreWriter

* Add scope to logger in Client

* Add scope to logger in ShredCollectorDependencies

* Add scope to logger in BasicShredTracker

* Removed unused import

* Add scope to logger in ShredReceiver

* Add scope to logger in LeaderInfo

* Fix leader info

* Add scope to logger in ServiceManager

* Add scope to logger in AppBase

* Scope Gossip metrics to gossip service

* Fix after merge

* Removed unused imports

* More scoping
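
The pattern these commits apply is visible in the diff at the bottom of this page: a component narrows the general `Logger` with `withScope`, stores a `ScopedLogger`, and hands `unscoped()` back to helpers that scope themselves. A minimal sketch of that shape, assuming sig's `sig.trace` API as it appears in this commit (`ExampleService` and the `"example.helper"` scope are hypothetical names):

```zig
const sig = @import("sig");

pub const ExampleService = struct {
    // Every log line from this field is tagged with the component's type name.
    logger: sig.trace.ScopedLogger(@typeName(Self)),

    const Self = @This();

    pub fn init(logger_: sig.trace.Logger) Self {
        return .{ .logger = logger_.withScope(@typeName(Self)) };
    }

    pub fn run(self: *Self) void {
        self.logger.info().logf("service running", .{});
        // Helpers take the plain Logger and apply their own scope.
        helper(self.logger.unscoped());
    }
};

fn helper(logger_: sig.trace.Logger) void {
    const logger = logger_.withScope("example.helper");
    logger.info().logf("helper running", .{});
}
```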

 fix(snapshot-download): url (Syndica#355)

test(ledger): cleanup database testing (Syndica#353)

* fix database testing

* move `std.testing.allocator` usages into test blocks

---------

Co-authored-by: Drew Nutter <[email protected]>

fix(shred-collector): shred receiver metrics not being counted (Syndica#356)

The following metrics were not being incremented:
- shred_receiver_received_count
- shred_receiver_satisfactor_shred_count

This PR fixes that by incrementing them in the appropriate places.
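
A minimal self-contained sketch of the shape of the fix; the counter names come from the list above, while the `Counter` type is only a stand-in for sig's prometheus wrapper:

```zig
const std = @import("std");

// Stand-in for sig's prometheus Counter (assumption).
const Counter = struct {
    value: std.atomic.Value(u64) = std.atomic.Value(u64).init(0),

    fn inc(self: *Counter) void {
        _ = self.value.fetchAdd(1, .monotonic);
    }
};

const ShredReceiverMetrics = struct {
    received_count: Counter = .{},
    satisfactory_shred_count: Counter = .{},
};

// The fix: bump the counters at the points where packets actually flow.
fn handlePacket(metrics: *ShredReceiverMetrics, shred_ok: bool) void {
    metrics.received_count.inc(); // every received packet
    if (shred_ok) metrics.satisfactory_shred_count.inc(); // passed validation
}
```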

Fix after merge

Scope MerkleRootValidator

Remove unused import

Scoping functions not attached to structs

Update runService to take logger as anytype

Update printTimeEstimate to take logger as anytype

Remove unused imports

Scope init function of AppBase to AppBase

Scope RocksDB

refactor(ledger): rename roots and orphans to rooted_slots and orphan_slots (Syndica#357)

improve(benchmarks): support multiple values + visualizing runtimes (Syndica#316)

* build: add a `no-run` option
  Useful for when you need to build a specific executable, but not install it.

* speed up the snapshot downloader

* bench: add verifySnapshot benchmark

* download: return an error instead of panicking in `writeCallback`

* free `loading_threads` if `AccountsDB.init` fails

* benchmark progress

* feat: csv benchmark outputs

* fix: lint and remove unused

* more fixes

* add variance to single output

* fix: path allocations

* re-enable read/write benchmarks

* fix: formatting

* fix: column headers

* fix: leaks in read/write test

* fix benchmarks

* ci fix: dont run all the benchmarks

* attempt fix: OOM ci

* CI check

* fix leak in CI

* fix and identify leaks

* fix lint

* feat: start agg print results

* feat: add option to output all runtimes

* remove extra bench

* add script to view benchmark runtimes

* more improvements on scripts

* fix: update script for different lengths

* fixes

* fix formatting

* add docs

* fix: lint

* reduce logger memory

* fix script

* remove commented out code

* fix tests / build after merge

* output data to files

* support output for humans

* improve formatting

* support ledger benchs

* merge cache changes

* remove trailing commas

* fix test leak

* fix another test leak

* address comments

* support custom time units per benchmark

* fix field output

* fix post merge

* fix lint

* add back bincode benchs

* change to switch and use tmpdir

* remove assert type alias

* address more comments

* remove account_ref defaults

* lower case enums

* more comments

* fix tests

---------

Co-authored-by: David Rubin <[email protected]>

Remove unused import

feat(ci) add line size checking to the CI (Syndica#360)

* add line size checking to the CI

* add exclude list + format the python scripts

* lexicographically order the exclude list

* add missing exclude files

feat(ledger): bench getting code shreds (Syndica#354)

* fix(ledger): backward merkle root chaining bugs

It was ignoring the result of the chaining check: it always marked the shred as a duplicate (as if the check had failed), yet always returned true (as if the check had succeeded). Complete nonsense.

It was comparing a chained root to a chained root, which is wrong; the chained root should be compared to the actual root.

This change also refactors common logic into a single place.

Also, this now returns an error if the merkle root is missing, because that means there is a severe bug in the validator and it needs to be shut down and patched immediately.
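
A sketch of the corrected control flow described above; all names here are hypothetical, not the actual sig API:

```zig
const std = @import("std");

const Hash = struct {
    data: [32]u8,

    fn eql(self: Hash, other: Hash) bool {
        return std.mem.eql(u8, &self.data, &other.data);
    }
};

/// The corrected flow: the comparison result is actually propagated, and a
/// missing merkle root is a hard error instead of being silently swallowed.
fn checkBackwardChaining(old_merkle_root: ?Hash, new_chained_root: ?Hash) !bool {
    // A missing root means a severe validator bug; fail loudly.
    const old_root = old_merkle_root orelse return error.MissingMerkleRoot;
    const chained = new_chained_root orelse return error.MissingMerkleRoot;
    // The newer shred must chain to the *actual* root of the older erasure
    // set, not to the older shred's own chained root.
    return old_root.eql(chained); // false: caller marks the slot as duplicate
}
```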

* Update src/ledger/shred_inserter/merkle_root_checks.zig

Co-authored-by: InKryption <[email protected]>

* fix(ledger): zig fmt

* refactor(ledger): merkle root validator struct

the four methods all have basically the same dependencies so this bundles them together to clarify their purpose and simplify their expression

* refactor(ledger): use working stores to clarify and minimize dependencies, and to organize working state code

* refactor(ledger): in merkle root checks, rename early and late shreds to old and new shreds

* Added code shreds

* Add BlockstoreReader.getCodeShred

* Bench reader.getCodeShredsForSlot

---------

Co-authored-by: Drew Nutter <[email protected]>
Co-authored-by: InKryption <[email protected]>

feat(accountsdb,gossip): Initial snapshot propagation work (Syndica#333)

* Refactor latest snapshot info state
  1. Guard the full and incremental snapshot info under the same rwmux,
    both to deduplicate the full slot value, and to synchronize the
    existence of incremental snapshots relative to full snapshots.
  2. Also do the same with the first snapshot info.
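
A self-contained sketch of the locking scheme in point 1; sig uses its own `RwMux` (imported in the diff below), so `std.Thread.RwLock` and these field names stand in for it:

```zig
const std = @import("std");

const SlotAndHash = struct { slot: u64, hash: [32]u8 };

/// Full and incremental snapshot info live behind a single reader-writer
/// lock, so no reader can ever observe an incremental snapshot whose base
/// full snapshot is missing or stale.
const LatestSnapshotInfo = struct {
    rwlock: std.Thread.RwLock = .{},
    full: ?SlotAndHash = null,
    incremental: ?SlotAndHash = null, // only meaningful relative to `full`

    fn setFull(self: *LatestSnapshotInfo, info: SlotAndHash) void {
        self.rwlock.lock();
        defer self.rwlock.unlock();
        self.full = info;
        self.incremental = null; // stale incrementals can't leak out
    }
};
```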

* Improve, simplify, and fix some bincode code

* Minor significant gossip/data rework & misc

* Pass const slice instead of arraylist

* Remove unused import

* Big refactor on accountsdb fields and init & misc
* Grouped fields into a few categories.
* Change init to take only an `InitParams` struct.
* Lowercase `AllocatorConfig` fields.
* Share the `AllocatorConfig` enum tag across a few different places.
* Inline `InitConfig`'s fields and remove it.
* Fix code that potentially leaked a disk allocator pointer.

* Notify gossip of new full snapshot

* Re-group and alphabetize type aliases

* Remove pointless `file_size` field

* Fix `writeSnapshotTarWithFields`
It was incorrectly reporting `file_info.length` as the file size,
when it should have been using `account_file.memory.len`.
This also adds a couple more runtime checks to assert that the total written
bytes are aligned to full 512-byte blocks.
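
A quick self-contained check of the block math involved (the tar format itself mandates the 512-byte blocking; the sample length here is arbitrary):

```zig
const std = @import("std");

test "tar entries are padded to full 512-byte blocks" {
    const BLOCK: u64 = 512;
    // The size written to the tar header must be the real data length
    // (account_file.memory.len, not file_info.length), and the data that
    // follows is padded out to a whole number of 512-byte blocks.
    const data_len: u64 = 1234;
    const padded_len = std.mem.alignForward(u64, data_len, BLOCK);
    try std.testing.expectEqual(@as(u64, 1536), padded_len);
    try std.testing.expect(padded_len % BLOCK == 0);
}
```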

* Simplify allocation scheme in untar procedure

* Set up test for snapshot gossip advertisement
It's currently disabled because it overwrites the snapshot archives
in the test data dir; you can comment out the skip statement to
test it locally for now. As the TODO says, we need a mechanism for
loading from a snapshot which isn't in the accountsdb's snapshot
working directory.
This also adds a simple way to test the equality of the `SnapshotHashes`
structure in tests.

* Re-simplify SnapshotHashes incremental list
The potential memory savings were minimal at best, and came at a
significant cost in complexity - a simple tagged union is sufficient
to avoid the overhead of an allocation, whilst retaining simplicity
across the rest of the code (especially wrt generic comparisons).
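
A minimal sketch of the shape this describes; the type and field names are illustrative, not sig's actual declarations:

```zig
const SlotAndHash = struct { slot: u64, hash: [32]u8 };

/// The common cases (no incremental snapshot, or exactly one) need no heap
/// allocation; only a genuine list pays for one, and generic comparisons
/// stay simple because everything is a single tagged union.
const IncrementalSnapshotsList = union(enum) {
    none,
    single: SlotAndHash,
    list: []const SlotAndHash,
};
```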

* Defer gossip service shutdown
Sequence it before deinit

* Use workaround in gossip snapshot update unit test
In the future there should be a better API for loading from a snapshot
that's outside the accountsdb snapshot working directory.

* Fixes possible invalid deinit

* Remove unused imports

* Remove unused default field values

* Improve loading thread code & safety.

* Various naming improvements

* Gossip push queue & signing refactor
* Refactor `initSigned`, and related init methods, and call sites.
* Move methods into more scoped types.
* Flatten the `Gossip` struct into `AccountsDB`, and only
  store `my_pubkey`, instead of the whole key pair.
* Change `push_msg_queue` to queue unsigned gossip data, and then
  sign it in `drainPushQueueToGossipTable`.
* Set the wallclock time in `drainPushQueueToGossipTable` as well.
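
A self-contained sketch of the drain-time flow from the last two points, with toy types and a toy `sign` standing in for the node keypair's ed25519 signing:

```zig
const std = @import("std");

const GossipData = struct { wallclock_ms: u64 = 0, payload: u32 };
const SignedGossipData = struct { signature: u64, data: GossipData };

// Stand-in for signing with the node's keypair.
fn sign(key: u64, data: GossipData) SignedGossipData {
    return .{ .signature = key ^ data.payload ^ data.wallclock_ms, .data = data };
}

/// The push queue holds *unsigned* data; the wallclock is stamped and the
/// signature produced exactly once, at drain time.
fn drainPushQueue(
    queue: *std.ArrayList(GossipData),
    table: *std.ArrayList(SignedGossipData),
    key: u64,
    now_ms: u64,
) !void {
    while (queue.popOrNull()) |item| {
        var data = item;
        data.wallclock_ms = now_ms;
        try table.append(sign(key, data));
    }
}
```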

* Remove useless debug field

* Affirm and improve warning logs

* Put pubkey and queue back into combined optional

* Remove old commented code, adjust comparison

fix(scripts): unused import regex and misc refactoring (Syndica#362)

* fix(scripts): check_style.py regex issues, avoid early exit, and refactoring

* refactor(scripts) rename check_style.py to style.py since it's not just for checking, it does autofix as well. --check is a cli option and confusingly redundant with the filename

* refactor: remove unused imports

I noticed the regex for detecting unused imports was changed in a recent PR. This broke the detection of unused imports. The change made it so you would only detect unused lines that explicitly say `@import("...")`. But those are not the only lines we want to check. We also want to check for things like `const Thing = sig.utils.Thing;`. The recent change also broke detection of identifiers with numbers and underscores. So I fixed these issues.

I also made the regex slightly better at detecting file paths within `@import` lines by including `/` as a valid character. This identified a few more unused imports, which I also removed in this PR.

I consolidated the code to a main function so it's clearer what's actually running when you run the script.

I used return values to indicate failed checks, instead of directly calling `exit(1)` within the check function. That way it always runs all the checks, instead of exiting early.

I renamed the file to `style.py`. `check_style.py` is misleading because it's not just a check. By default this script will automatically edit the code to adjust its style. Only if you provide the `--check` flag does it limit itself to checks. At that point, the name is redundant with the flag. I see "style" as analogous to zig's "fmt" subcommand. By default `zig fmt` will autoformat the code, and then it limits itself to checks if you use `--check`.

fix(gossip): send pull requests with ContactInfo instead of LegacyContactInfo (Syndica#374)

perf(shred-collector): cache verified merkle roots (Syndica#376)

This improves the performance of signature verification in the shred collector by a factor of up to 64.

The shred signature signs the merkle root. Since all the shreds from an erasure set share the same merkle root, you actually only need to verify the signature once for that merkle root; every shred from the erasure set can then be considered "verified" as long as you can reproduce the merkle root from its merkle proof.

Before this PR, the code was redundantly verifying the signature of the merkle root for every single shred. This led to a performance bottleneck in the shred verifier because signature verification is computationally expensive.

This change removes the redundancy. It uses an LRU cache to keep track of the last 1024 merkle roots to be verified.
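
A sketch of the check-before-verify logic, assuming the details above; a plain hash map stands in for the 1024-entry LRU cache so the example stays self-contained:

```zig
const std = @import("std");

const Hash = [32]u8;

// Stand-in for the expensive ed25519 check of the leader's signature.
fn verifySignature(merkle_root: Hash) bool {
    _ = merkle_root;
    return true;
}

/// One signature verification per erasure set: once a merkle root has been
/// verified, any shred that reproduces that root from its merkle proof is
/// accepted without re-running the signature check.
fn isShredVerified(cache: *std.AutoHashMap(Hash, void), merkle_root: Hash) !bool {
    if (cache.contains(merkle_root)) return true; // cache hit: skip the crypto
    if (!verifySignature(merkle_root)) return false;
    try cache.put(merkle_root, {});
    return true;
}
```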

add fix (Syndica#377)

feat(ledger): user-selectable db, hashmapdb fixes, add tests, safer memory management (Syndica#372)

* feat(ledger): user-selectable db, hashmapdb fixes, add tests, safer memory management

* fix(ledger): hashmapdb thread safety problem when copied

* fix(ledger): potential deadlock in hashmap db

* refactor(allocators): remove dead code noAlloc

feat(bincode): config default values (Syndica#380)

refactor(merkle_tree): general purpose nested list (Syndica#378)

feat(shred-collector): add metrics and grafana dashboard (Syndica#375)

* feat(shred-collector): distinguish turbine vs repair packets in metrics

* feat(grafana): shred collector dashboard

refactor(bincode): file_id config (Syndica#379)

* refactor(bincode): file_id config

* move config into file_id.zig

feat(ci) start correcting files to fit the new style guide (Syndica#361)

* start correcting files to fit the new style

* continue formatting files

* remove non-style change

* some more refactors

* fix some try precedence issues

fix(accountsdb): inf loop (Syndica#383)

Remove unused import

scope runService logger to ServiceManager

Un-scope logger on ShredCollectorDependencies

docs(research): add lmdb/accountsdb writing to docs/research (Syndica#390)

* add lmdb details to docs/research

feat(dashboards): add hardware metrics (Syndica#392)

fix(rocksdb): zig build fails on linux intel systems (Syndica#394)

This upgrades rocksdb to a new commit from our fork of rocksdb. That commit includes some fixes, and upgrades rocksdb from 9.5 to 9.7.

----
Normally, I build sig on my work machine, which is a MacBook, and it builds fine. I tried to build sig on my personal laptop and my desktop, and it did not build on either system. Both are running Linux with Intel CPUs (Tiger Lake and Arrow Lake).

There are two errors. Both are failures to build rocksdb.

----

The first error is due to incorrect syntax from our commit to our fork of rocksdb.
```
/home/drew/.cache/zig/p/1220dc54736c65c61ee181cd2716db055cb78131ebed2b88ccf4733f43d35ac39eeb/port/lang.h:79:16: error: extra tokens at end of #ifdef directive
               ^
/home/drew/.cache/zig/p/1220dc54736c65c61ee181cd2716db055cb78131ebed2b88ccf4733f43d35ac39eeb/util/crc32c.cc:18:10: note: in file included from /home/drew/.cache/zig/p/1220dc54736c65c61ee181cd2716db055cb78131ebed2b88ccf4733f43d35ac39eeb/util/crc32c.cc:18:
         ^
```
Commit introducing the bug: facebook/rocksdb@a9c9eee
Fix I committed today: facebook/rocksdb@7c32e83

----

The next error is due to a bug in zig. Zig should be able to detect that crc32 is supported by any architecture supporting sse4, but it does not. So zig fails to activate the crc32 feature even though it is available.

Meanwhile, RocksDB simply checks to see if sse4 is supported, then it proceeds to use crc32 functions. The error occurs because zig has failed to activate the crc32 feature, despite it being supported by the cpu.
```
/home/drew/.cache/zig/p/1220dc54736c65c61ee181cd2716db055cb78131ebed2b88ccf4733f43d35ac39eeb/util/crc32c.cc:611:13: error: always_inline function '_mm_crc32_u64' requires target feature 'crc32', but would be inlined into function 'crc32c_3way' that is compiled without support for 'crc32'
            CRCtriplet(crc, next, -128);
            ^
/home/drew/.cache/zig/p/1220dc54736c65c61ee181cd2716db055cb78131ebed2b88ccf4733f43d35ac39eeb/util/crc32c.cc:432:12: note: expanded from macro 'CRCtriplet'
  crc##0 = _mm_crc32_u64(crc##0, *(buf##0 + offset)); \
```
To be more precise, RocksDB should check for crc32 itself, instead of checking for sse4, since it's actually only interested in whether crc32 is supported. So I committed that change here: facebook/rocksdb@d879aa8

This avoids the compilation error: since the compiler thinks crc32 is not supported, RocksDB no longer attempts to use that code in the first place. Ideally the zig compiler should also be fixed so we can actually use crc32 on platforms that support it.
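
For reference, a small Zig program (assuming an x86-64 build) that surfaces the mismatch being described: the resolved target can report sse4.2 while the separate crc32 feature stays off:

```zig
const std = @import("std");
const builtin = @import("builtin");

pub fn main() void {
    const has = std.Target.x86.featureSetHas;
    // On an affected machine this can print "sse4.2: true, crc32: false",
    // even though every sse4.2-capable CPU also supports crc32.
    std.debug.print("sse4.2: {}, crc32: {}\n", .{
        has(builtin.cpu.features, .sse4_2),
        has(builtin.cpu.features, .crc32),
    });
}
```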

feat(allocator): add recycle buffer (Syndica#381)

feat(accountsdb): integrate improved reference allocator (Syndica#382)

* private recycle method

* add unsafe suffix to unsafe methods

* feat(accountsdb): improved reference allocator

improve(benchmarks): better cli parsing (Syndica#391)

* improve(benchmarks): better cli parsing

* update usage to be dynamic with enum

Use same scope identifier in logger in cleanup_service.zig

Scope to rocksdb in rocksdb.zig

remove usage of @src().fn_name in log scope

* Fixes after rebasing

* Style

* More fixes
dadepo authored Nov 20, 2024
1 parent f58096a commit 41cf6aa
Showing 29 changed files with 206 additions and 144 deletions.
14 changes: 7 additions & 7 deletions src/accountsdb/db.zig
```diff
@@ -43,6 +43,7 @@ const Pubkey = sig.core.Pubkey;
 const Slot = sig.core.Slot;
 
 const NestedHashTree = sig.utils.merkle_tree.NestedHashTree;
+const Logger = sig.trace.log.Logger;
 
 const GeyserWriter = sig.geyser.GeyserWriter;
 
@@ -55,8 +56,6 @@ const WeightedAliasSampler = sig.rand.WeightedAliasSampler;
 
 const RwMux = sig.sync.RwMux;
 
-const Logger = sig.trace.log.Logger;
-
 const parallelUnpackZstdTarBall = sig.accounts_db.snapshots.parallelUnpackZstdTarBall;
 const spawnThreadTasks = sig.utils.thread.spawnThreadTasks;
 const printTimeEstimate = sig.time.estimate.printTimeEstimate;
@@ -634,7 +633,7 @@ pub const AccountsDB = struct {
 
             if (print_progress and progress_timer.read().asNanos() > DB_LOG_RATE.asNanos()) {
                 printTimeEstimate(
-                    self.logger,
+                    self.logger.withScope(@typeName(Self)),
                     &timer,
                     n_account_files,
                     file_count,
@@ -669,7 +668,7 @@
 
             if (print_progress and progress_timer.read().asNanos() > DB_LOG_RATE.asNanos()) {
                 printTimeEstimate(
-                    self.logger,
+                    self.logger.withScope(@typeName(Self)),
                     &timer,
                     n_accounts_total,
                     ref_count,
@@ -695,7 +694,7 @@
             .data_len = self.account_index.pubkey_ref_map.numberOfShards(),
             .max_threads = n_threads,
             .params = .{
-                self.logger,
+                self.logger.unscoped(),
                 &self.account_index,
                 thread_dbs,
             },
@@ -758,11 +757,12 @@
     /// combines multiple thread indexes into the given index.
     /// each bin is also sorted by pubkey.
     pub fn combineThreadIndexesMultiThread(
-        logger: Logger,
+        logger_: Logger,
         index: *AccountIndex,
         thread_dbs: []const AccountsDB,
         task: sig.utils.thread.TaskParams,
     ) !void {
+        const logger = logger_.withScope(@typeName(Self));
         const shard_start_index = task.start_index;
         const shard_end_index = task.end_index;
 
@@ -1142,7 +1142,7 @@
 
            if (print_progress and progress_timer.read() > DB_LOG_RATE.asNanos()) {
                printTimeEstimate(
-                    self.logger,
+                    self.logger.withScope(@typeName(Self)),
                    &timer,
                    shards.len,
                    count,
```
20 changes: 13 additions & 7 deletions src/accountsdb/download.zig
```diff
@@ -10,9 +10,13 @@ const GossipTable = sig.gossip.GossipTable;
 const ThreadSafeContactInfo = sig.gossip.data.ThreadSafeContactInfo;
 const GossipService = sig.gossip.GossipService;
 const Logger = sig.trace.Logger;
+const ScopedLogger = sig.trace.ScopedLogger;
 
 const DOWNLOAD_PROGRESS_UPDATES_NS = 6 * std.time.ns_per_s;
 
+// The identifier for the scoped logger used in this file.
+const LOG_SCOPE = "accountsdb.download";
+
 /// Analogous to [PeerSnapshotHash](https://github.com/anza-xyz/agave/blob/f868aa38097094e4fb78a885b6fb27ce0e43f5c7/validator/src/bootstrap.rs#L342)
 const PeerSnapshotHash = struct {
     contact_info: ThreadSafeContactInfo,
@@ -157,13 +161,14 @@ pub fn findPeersToDownloadFromAssumeCapacity(
 /// note: gossip_service must be running.
 pub fn downloadSnapshotsFromGossip(
     allocator: std.mem.Allocator,
-    logger: Logger,
+    logger_: Logger,
     // if null, then we trust any peer for snapshot download
     maybe_trusted_validators: ?[]const Pubkey,
     gossip_service: *GossipService,
     output_dir: std.fs.Dir,
     min_mb_per_sec: usize,
 ) !void {
+    const logger = logger_.withScope(LOG_SCOPE);
     logger
         .info()
         .logf("starting snapshot download with min download speed: {d} MB/s", .{min_mb_per_sec});
@@ -240,7 +245,7 @@
 
         downloadFile(
             allocator,
-            logger,
+            logger.unscoped(),
             snapshot_url,
             output_dir,
             snapshot_filename,
@@ -281,7 +286,7 @@
         logger.info().logf("downloading inc_snapshot from: {s}", .{inc_snapshot_url});
         _ = downloadFile(
             allocator,
-            logger,
+            logger.unscoped(),
             inc_snapshot_url,
             output_dir,
             inc_snapshot_filename,
@@ -304,7 +309,7 @@
 const DownloadProgress = struct {
     file: std.fs.File,
     min_mb_per_second: ?usize,
-    logger: Logger,
+    logger: ScopedLogger(@typeName(Self)),
 
     progress_timer: sig.time.Timer,
     bytes_read: u64 = 0,
@@ -325,7 +330,7 @@
         try file.setEndPos(download_size);
 
         return .{
-            .logger = logger,
+            .logger = logger.withScope(@typeName(Self)),
             .file = file,
             .min_mb_per_second = min_mb_per_second,
             .progress_timer = try sig.time.Timer.start(),
@@ -476,12 +481,13 @@ fn enableProgress(
 /// the main errors include {HeaderRequestFailed, NoContentLength, TooSlow} or a curl-related error
 pub fn downloadFile(
     allocator: std.mem.Allocator,
-    logger: Logger,
+    logger_: Logger,
     url: [:0]const u8,
     output_dir: std.fs.Dir,
     filename: []const u8,
     min_mb_per_second: ?usize,
 ) !void {
+    const logger = logger_.withScope(LOG_SCOPE);
     var easy = try curl.Easy.init(allocator, .{});
     defer easy.deinit();
 
@@ -503,7 +509,7 @@
     // timeout will need to be larger
     easy.timeout_ms = std.time.ms_per_hour * 5; // 5 hours is probs too long but its ok
     var download_progress = try DownloadProgress.init(
-        logger,
+        logger.unscoped(),
         output_dir,
         filename,
         download_size,
```
5 changes: 3 additions & 2 deletions src/accountsdb/index.zig
```diff
@@ -66,11 +66,12 @@ pub const AccountIndex = struct {
 
     pub fn init(
         allocator: std.mem.Allocator,
-        logger: sig.trace.Logger,
+        logger_: sig.trace.Logger,
         allocator_config: AllocatorConfig,
         /// number of shards for the pubkey_ref_map
         number_of_shards: usize,
     ) !Self {
+        const logger = logger_.withScope(@typeName((Self)));
         const reference_allocator: ReferenceAllocator = switch (allocator_config) {
             .ram => |ram| blk: {
                 logger.info().logf("using ram memory for account index", .{});
@@ -80,7 +81,7 @@
                 var index_dir = try disk.accountsdb_dir.makeOpenPath("index", .{});
                 errdefer index_dir.close();
                 const disk_allocator = try allocator.create(DiskMemoryAllocator);
-                disk_allocator.* = .{ .dir = index_dir, .logger = logger };
+                disk_allocator.* = .{ .dir = index_dir, .logger = logger.withScope(@typeName(DiskMemoryAllocator)) };
                 logger.info().logf("using disk memory (@{s}) for account index", .{sig.utils.fmt.tryRealPath(index_dir, ".")});
                 break :blk .{ .disk = .{ .dma = disk_allocator, .ptr_allocator = allocator } };
             },
```
3 changes: 2 additions & 1 deletion src/accountsdb/snapshots.zig
```diff
@@ -1725,10 +1725,11 @@ pub const AllSnapshotFields = struct {
 
     pub fn fromFiles(
         allocator: std.mem.Allocator,
-        logger: Logger,
+        logger_: Logger,
         snapshot_dir: std.fs.Dir,
         files: SnapshotFiles,
     ) !Self {
+        const logger = logger_.withScope(@typeName((Self)));
         // unpack
         const full_fields = blk: {
             const rel_path_bounded = sig.utils.fmt.boundedFmt("snapshots/{0}/{0}", .{files.full_snapshot.slot});
```