fix(trace): use scope in logger (Syndica#345)

* Add scope to logger in Gossip Service

* Add scope to logger in Shred Inserter

* Improve adding scope to logger in Gossip Service

* Add scope to logger in transaction sender service

* Add scope to logger in AccountsDB

* Add scope to logger in GossipDumpService

* Add scope to logger in RepairService

* Add scope to logger in DownloadProgress

* Add scope to logger in BlockstoreReader

* Add scope to logger in RepairRequester

* Add scope to logger in BlockstoreWriter

* Add scope to logger in Client

* Add scope to logger in ShredCollectorDependencies

* Add scope to logger in BasicShredTracker

* Removed unused import

* Add scope to logger in ShredReceiver

* Add scope to logger in LeaderInfo

* Fix leader info

* Add scope to logger in ServiceManager

* Add scope to logger in AppBase

* Scope Gossip metrics to gossip service

* Fix after merge

* Removed unused imports

* More scoping
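
The pattern these commits apply is visible in the diff at the bottom of this page: a component narrows the general `Logger` with `withScope`, stores a `ScopedLogger`, and hands `unscoped()` back to helpers that scope themselves. A minimal sketch of that shape, assuming sig's `sig.trace` API as it appears in this commit (`ExampleService` and the `"example.helper"` scope are hypothetical names):

```zig
const sig = @import("sig");

pub const ExampleService = struct {
    // Every log line from this field is tagged with the component's type name.
    logger: sig.trace.ScopedLogger(@typeName(Self)),

    const Self = @This();

    pub fn init(logger_: sig.trace.Logger) Self {
        return .{ .logger = logger_.withScope(@typeName(Self)) };
    }

    pub fn run(self: *Self) void {
        self.logger.info().logf("service running", .{});
        // Helpers take the plain Logger and apply their own scope.
        helper(self.logger.unscoped());
    }
};

fn helper(logger_: sig.trace.Logger) void {
    const logger = logger_.withScope("example.helper");
    logger.info().logf("helper running", .{});
}
```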

 fix(snapshot-download): url (Syndica#355)

test(ledger): cleanup database testing (Syndica#353)

* fix database testing

* move `std.testing.allocator` usages into test blocks

---------

Co-authored-by: Drew Nutter <[email protected]>

fix(shred-collector): shred receiver metrics not being counted (Syndica#356)

The following metrics were not being incremented:
- shred_receiver_received_count
- shred_receiver_satisfactor_shred_count

This PR fixes that by incrementing them in the appropriate places.
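
A minimal self-contained sketch of the shape of the fix; the counter names come from the list above, while the `Counter` type is only a stand-in for sig's prometheus wrapper:

```zig
const std = @import("std");

// Stand-in for sig's prometheus Counter (assumption).
const Counter = struct {
    value: std.atomic.Value(u64) = std.atomic.Value(u64).init(0),

    fn inc(self: *Counter) void {
        _ = self.value.fetchAdd(1, .monotonic);
    }
};

const ShredReceiverMetrics = struct {
    received_count: Counter = .{},
    satisfactory_shred_count: Counter = .{},
};

// The fix: bump the counters at the points where packets actually flow.
fn handlePacket(metrics: *ShredReceiverMetrics, shred_ok: bool) void {
    metrics.received_count.inc(); // every received packet
    if (shred_ok) metrics.satisfactory_shred_count.inc(); // passed validation
}
```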

Fix after merge

Scope MerkleRootValidator

Remove unused import

Scoping functions not attached to structs

Update runService to take logger as anytype

Update printTimeEstimate to take logger as anytype

Remove unused imports

Scope init function of AppBase to AppBase

Scope RocksDB

refactor(ledger): rename roots and orphans to rooted_slots and orphan_slots (Syndica#357)

improve(benchmarks): support multiple values + visualizing runtimes (Syndica#316)

* build: add a `no-run` option
  Useful for when you need to build a specific executable, but not install it.

* speed up the snapshot downloader

* bench: add verifySnapshot benchmark

* download: return an error instead of panicking in `writeCallback`

* free `loading_threads` if `AccountsDB.init` fails

* benchmark progress

* feat: csv benchmark outputs

* fix: lint and remove unused

* more fixes

* add variance to single output

* fix: path allocations

* re-enable read/write benchmarks

* fix: formatting

* fix: column headers

* fix: leaks in read/write test

* fix benchmarks

* ci fix: dont run all the benchmarks

* attempt fix: OOM ci

* CI check

* fix leak in CI

* fix and identify leaks

* fix lint

* feat: start agg print results

* feat: add option to output all runtimes

* remove extra bench

* add script to view benchmark runtimes

* more improvements on scripts

* fix: update script for different lengths

* fixes

* fix formatting

* add docs

* fix: lint

* reduce logger memory

* fix script

* remove commented out code

* fix tests / build after merge

* output data to files

* support output for humans

* improve formatting

* support ledger benchs

* merge cache changes

* remove trailing commas

* fix test leak

* fix another test leak

* address comments

* support custom time units per benchmark

* fix field output

* fix post merge

* fix lint

* add back bincode benchs

* change to switch and use tmpdir

* remove assert type alias

* address more comments

* remove account_ref defaults

* lower case enums

* more comments

* fix tests

---------

Co-authored-by: David Rubin <[email protected]>

Remove unused import

feat(ci) add line size checking to the CI (Syndica#360)

* add line size checking to the CI

* add exclude list + format the python scripts

* lexicographically order the exclude list

* add missing exclude files

feat(ledger): bench getting code shreds (Syndica#354)

* fix(ledger): backward merkle root chaining bugs

It was ignoring the result of the chaining check: it always marked the shred as a duplicate (as if the check had failed), yet always returned true (as if the check had succeeded). Complete nonsense.

It was comparing a chained root to a chained root, which is wrong; the chained root should be compared to the actual root.

This change also refactors common logic into a single place.

Also, this now returns an error if the merkle root is missing, because that means there is a severe bug in the validator and it needs to be shut down and patched immediately.
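
A sketch of the corrected control flow described above; all names here are hypothetical, not the actual sig API:

```zig
const std = @import("std");

const Hash = struct {
    data: [32]u8,

    fn eql(self: Hash, other: Hash) bool {
        return std.mem.eql(u8, &self.data, &other.data);
    }
};

/// The corrected flow: the comparison result is actually propagated, and a
/// missing merkle root is a hard error instead of being silently swallowed.
fn checkBackwardChaining(old_merkle_root: ?Hash, new_chained_root: ?Hash) !bool {
    // A missing root means a severe validator bug; fail loudly.
    const old_root = old_merkle_root orelse return error.MissingMerkleRoot;
    const chained = new_chained_root orelse return error.MissingMerkleRoot;
    // The newer shred must chain to the *actual* root of the older erasure
    // set, not to the older shred's own chained root.
    return old_root.eql(chained); // false: caller marks the slot as duplicate
}
```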

* Update src/ledger/shred_inserter/merkle_root_checks.zig

Co-authored-by: InKryption <[email protected]>

* fix(ledger): zig fmt

* refactor(ledger): merkle root validator struct

the four methods all have basically the same dependencies so this bundles them together to clarify their purpose and simplify their expression

* refactor(ledger): use working stores to clarify and minimize dependencies, and to organize working state code

* refactor(ledger): in merkle root checks, rename early and late shreds to old and new shreds

* Added code shreds

* Add BlockstoreReader.getCodeShred

* Bench reader.getCodeShredsForSlot

---------

Co-authored-by: Drew Nutter <[email protected]>
Co-authored-by: InKryption <[email protected]>

feat(accountsdb,gossip): Initial snapshot propagation work (Syndica#333)

* Refactor latest snapshot info state
  1. Guard the full and incremental snapshot info under the same rwmux,
    both to deduplicate the full slot value, and to synchronize the
    existence of incremental snapshots relative to full snapshots.
  2. Also do the same with the first snapshot info.
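
A self-contained sketch of the locking scheme in point 1; sig uses its own `RwMux` (imported in the diff below), so `std.Thread.RwLock` and these field names stand in for it:

```zig
const std = @import("std");

const SlotAndHash = struct { slot: u64, hash: [32]u8 };

/// Full and incremental snapshot info live behind a single reader-writer
/// lock, so no reader can ever observe an incremental snapshot whose base
/// full snapshot is missing or stale.
const LatestSnapshotInfo = struct {
    rwlock: std.Thread.RwLock = .{},
    full: ?SlotAndHash = null,
    incremental: ?SlotAndHash = null, // only meaningful relative to `full`

    fn setFull(self: *LatestSnapshotInfo, info: SlotAndHash) void {
        self.rwlock.lock();
        defer self.rwlock.unlock();
        self.full = info;
        self.incremental = null; // stale incrementals can't leak out
    }
};
```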

* Improve, simplify, and fix some bincode code

* Minor significant gossip/data rework & misc

* Pass const slice instead of arraylist

* Remove unused import

* Big refactor on accountsdb fields and init & misc
* Grouped fields into a few categories.
* Change init to take only an `InitParams` struct.
* Lowercase `AllocatorConfig` fields.
* Share the `AllocatorConfig` enum tag across a few different places.
* Inline `InitConfig`'s fields and remove it.
* Fix code that potentially leaked a disk allocator pointer.

* Notify gossip of new full snapshot

* Re-group and alphabetize type aliases

* Remove pointless `file_size` field

* Fix `writeSnapshotTarWithFields`
It was incorrectly reporting `file_info.length` as the file size,
when it should have been using `account_file.memory.len`.
This also adds a couple more runtime checks to assert that the total written
bytes are aligned to full 512-byte blocks.
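
A quick self-contained check of the block math involved (the tar format itself mandates the 512-byte blocking; the sample length here is arbitrary):

```zig
const std = @import("std");

test "tar entries are padded to full 512-byte blocks" {
    const BLOCK: u64 = 512;
    // The size written to the tar header must be the real data length
    // (account_file.memory.len, not file_info.length), and the data that
    // follows is padded out to a whole number of 512-byte blocks.
    const data_len: u64 = 1234;
    const padded_len = std.mem.alignForward(u64, data_len, BLOCK);
    try std.testing.expectEqual(@as(u64, 1536), padded_len);
    try std.testing.expect(padded_len % BLOCK == 0);
}
```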

* Simplify allocation scheme in untar procedure

* Set up test for snapshot gossip advertisement
It's currently disabled because it overwrites the snapshot archives
in the test data dir; you can comment out the skip statement to
test it locally for now. As the TODO says, we need a mechanism for
loading from a snapshot which isn't in the accountsdb's snapshot
working directory.
This also adds a simple way to test the equality of the `SnapshotHashes`
structure in tests.

* Re-simplify SnapshotHashes incremental list
The potential memory savings were minimal at best, and came at a
significant cost in complexity - a simple tagged union is sufficient
to avoid the overhead of an allocation, whilst retaining simplicity
across the rest of the code (especially wrt generic comparisons).
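
A minimal sketch of the shape this describes; the type and field names are illustrative, not sig's actual declarations:

```zig
const SlotAndHash = struct { slot: u64, hash: [32]u8 };

/// The common cases (no incremental snapshot, or exactly one) need no heap
/// allocation; only a genuine list pays for one, and generic comparisons
/// stay simple because everything is a single tagged union.
const IncrementalSnapshotsList = union(enum) {
    none,
    single: SlotAndHash,
    list: []const SlotAndHash,
};
```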

* Defer gossip service shutdown
Sequence it before deinit

* Use workaround in gossip snapshot update unit test
In the future there should be a better API for loading from a snapshot
that's outside the accountsdb snapshot working directory.

* Fixes possible invalid deinit

* Remove unused imports

* Remove unused default field values

* Improve loading thread code & safety.

* Various naming improvements

* Gossip push queue & signing refactor
* Refactor `initSigned`, and related init methods, and call sites.
* Move methods into more scoped types.
* Flatten the `Gossip` struct into `AccountsDB`, and only
  store `my_pubkey`, instead of the whole key pair.
* Change `push_msg_queue` to queue unsigned gossip data, and then
  sign it in `drainPushQueueToGossipTable`.
* Set the wallclock time in `drainPushQueueToGossipTable` as well.
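
A self-contained sketch of the drain-time flow from the last two points, with toy types and a toy `sign` standing in for the node keypair's ed25519 signing:

```zig
const std = @import("std");

const GossipData = struct { wallclock_ms: u64 = 0, payload: u32 };
const SignedGossipData = struct { signature: u64, data: GossipData };

// Stand-in for signing with the node's keypair.
fn sign(key: u64, data: GossipData) SignedGossipData {
    return .{ .signature = key ^ data.payload ^ data.wallclock_ms, .data = data };
}

/// The push queue holds *unsigned* data; the wallclock is stamped and the
/// signature produced exactly once, at drain time.
fn drainPushQueue(
    queue: *std.ArrayList(GossipData),
    table: *std.ArrayList(SignedGossipData),
    key: u64,
    now_ms: u64,
) !void {
    while (queue.popOrNull()) |item| {
        var data = item;
        data.wallclock_ms = now_ms;
        try table.append(sign(key, data));
    }
}
```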

* Remove useless debug field

* Affirm and improve warning logs

* Put pubkey and queue back into combined optional

* Remove old commented code, adjust comparison

fix(scripts): unused import regex and misc refactoring (Syndica#362)

* fix(scripts): check_style.py regex issues, avoid early exit, and refactoring

* refactor(scripts) rename check_style.py to style.py since it's not just for checking, it does autofix as well. --check is a cli option and confusingly redundant with the filename

* refactor: remove unused imports

I noticed the regex for detecting unused imports was changed in a recent PR. This broke the detection of unused imports. The change made it so you would only detect unused lines that explicitly say `@import("...")`. But those are not the only lines we want to check. We also want to check for things like `const Thing = sig.utils.Thing;`. The recent change also broke detection of identifiers with numbers and underscores. So I fixed these issues.

I also made the regex slightly better at detecting file paths within `@import` lines by including `/` as a valid character. This identified a few more unused imports, which I also removed in this PR.

I consolidated the code to a main function so it's clearer what's actually running when you run the script.

I used return values to indicate failed checks, instead of directly calling `exit(1)` within the check function. That way it always runs all the checks, instead of exiting early.

I renamed the file to `style.py`. `check_style.py` is misleading because it's not just a check. By default this script will automatically edit the code to adjust its style. Only if you provide the `--check` flag does it limit itself to checks. At that point, the name is redundant with the flag. I see "style" as analogous to zig's "fmt" subcommand. By default `zig fmt` will autoformat the code, and then it limits itself to checks if you use `--check`.

fix(gossip): send pull requests with ContactInfo instead of LegacyContactInfo (Syndica#374)

perf(shred-collector): cache verified merkle roots (Syndica#376)

This improves the performance of signature verification in the shred collector by a factor of up to 64.

The shred signature signs the merkle root. Since all the shreds from an erasure set share the same merkle root, you actually only need to verify the signature once for that merkle root; every shred from the erasure set can then be considered "verified" as long as you can reproduce the merkle root from its merkle proof.

Before this PR, the code was redundantly verifying the signature of the merkle root for every single shred. This led to a performance bottleneck in the shred verifier because signature verification is computationally expensive.

This change removes the redundancy. It uses an LRU cache to keep track of the last 1024 merkle roots to be verified.
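
A sketch of the check-before-verify logic, assuming the details above; a plain hash map stands in for the 1024-entry LRU cache so the example stays self-contained:

```zig
const std = @import("std");

const Hash = [32]u8;

// Stand-in for the expensive ed25519 check of the leader's signature.
fn verifySignature(merkle_root: Hash) bool {
    _ = merkle_root;
    return true;
}

/// One signature verification per erasure set: once a merkle root has been
/// verified, any shred that reproduces that root from its merkle proof is
/// accepted without re-running the signature check.
fn isShredVerified(cache: *std.AutoHashMap(Hash, void), merkle_root: Hash) !bool {
    if (cache.contains(merkle_root)) return true; // cache hit: skip the crypto
    if (!verifySignature(merkle_root)) return false;
    try cache.put(merkle_root, {});
    return true;
}
```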

add fix (Syndica#377)

feat(ledger): user-selectable db, hashmapdb fixes, add tests, safer memory management (Syndica#372)

* feat(ledger): user-selectable db, hashmapdb fixes, add tests, safer memory management

* fix(ledger): hashmapdb thread safety problem when copied

* fix(ledger): potential deadlock in hashmap db

* refactor(allocators): remove dead code noAlloc

feat(bincode): config default values (Syndica#380)

refactor(merkle_tree): general purpose nested list (Syndica#378)

feat(shred-collector): add metrics and grafana dashboard (Syndica#375)

* feat(shred-collector): distinguish turbine vs repair packets in metrics

* feat(grafana): shred collector dashboard

refactor(bincode): file_id config (Syndica#379)

* refactor(bincode): file_id config

* move config into file_id.zig

feat(ci) start correcting files to fit the new style guide (Syndica#361)

* start correcting files to fit the new style

* continue formatting files

* remove non-style change

* some more refactors

* fix some try precedence issues

fix(accountsdb): inf loop (Syndica#383)

Remove unused import

scope runService logger to ServiceManager

Un-scope logger on ShredCollectorDependencies

docs(research): add lmdb/accountsdb writing to docs/research (Syndica#390)

* add lmdb details to docs/research

feat(dashboards): add hardware metrics (Syndica#392)

fix(rocksdb): zig build fails on linux intel systems (Syndica#394)

This upgrades rocksdb to a new commit from our fork of rocksdb. That commit includes some fixes, and upgrades rocksdb from 9.5 to 9.7.

----
Normally, I build sig on my work machine, which is a MacBook, and it builds fine. I tried to build sig on my personal laptop and my desktop, and it did not build on either system. Both are running Linux with Intel CPUs (Tiger Lake and Arrow Lake).

There are two errors. Both are failures to build rocksdb.

----

The first error is due to incorrect syntax from our commit to our fork of rocksdb.
```
/home/drew/.cache/zig/p/1220dc54736c65c61ee181cd2716db055cb78131ebed2b88ccf4733f43d35ac39eeb/port/lang.h:79:16: error: extra tokens at end of #ifdef directive
               ^
/home/drew/.cache/zig/p/1220dc54736c65c61ee181cd2716db055cb78131ebed2b88ccf4733f43d35ac39eeb/util/crc32c.cc:18:10: note: in file included from /home/drew/.cache/zig/p/1220dc54736c65c61ee181cd2716db055cb78131ebed2b88ccf4733f43d35ac39eeb/util/crc32c.cc:18:
         ^
```
Commit introducing the bug: facebook/rocksdb@a9c9eee
Fix I committed today: facebook/rocksdb@7c32e83

----

The next error is due to a bug in zig. Zig should be able to detect that crc32 is supported by any architecture supporting sse4, but it does not. So zig fails to activate the crc32 feature even though it is available.

Meanwhile, RocksDB simply checks to see if sse4 is supported, then it proceeds to use crc32 functions. The error occurs because zig has failed to activate the crc32 feature, despite it being supported by the cpu.
```
/home/drew/.cache/zig/p/1220dc54736c65c61ee181cd2716db055cb78131ebed2b88ccf4733f43d35ac39eeb/util/crc32c.cc:611:13: error: always_inline function '_mm_crc32_u64' requires target feature 'crc32', but would be inlined into function 'crc32c_3way' that is compiled without support for 'crc32'
            CRCtriplet(crc, next, -128);
            ^
/home/drew/.cache/zig/p/1220dc54736c65c61ee181cd2716db055cb78131ebed2b88ccf4733f43d35ac39eeb/util/crc32c.cc:432:12: note: expanded from macro 'CRCtriplet'
  crc##0 = _mm_crc32_u64(crc##0, *(buf##0 + offset)); \
```
To be more precise, RocksDB should check for crc32 itself, instead of checking for sse4, since it's actually only interested in whether crc32 is supported. So I committed that change here: facebook/rocksdb@d879aa8

This avoids the compilation error: since the compiler thinks crc32 is not supported, RocksDB no longer attempts to use that code in the first place. Ideally the zig compiler should also be fixed so we can actually use crc32 on platforms that support it.
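
For reference, a small Zig program (assuming an x86-64 build) that surfaces the mismatch being described: the resolved target can report sse4.2 while the separate crc32 feature stays off:

```zig
const std = @import("std");
const builtin = @import("builtin");

pub fn main() void {
    const has = std.Target.x86.featureSetHas;
    // On an affected machine this can print "sse4.2: true, crc32: false",
    // even though every sse4.2-capable CPU also supports crc32.
    std.debug.print("sse4.2: {}, crc32: {}\n", .{
        has(builtin.cpu.features, .sse4_2),
        has(builtin.cpu.features, .crc32),
    });
}
```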

feat(allocator): add recycle buffer (Syndica#381)

feat(accountsdb): integrate improved reference allocator (Syndica#382)

* private recycle method

* add unsafe suffix to unsafe methods

* feat(accountsdb): improved reference allocator

improve(benchmarks): better cli parsing (Syndica#391)

* improve(benchmarks): better cli parsing

* update usage to be dynamic with enum

Use same scope identifier in logger in cleanup_service.zig

Scope to rocksdb in rocksdb.zig

remove usage of @src().fn_name in log scope

* Fixes after rebasing

* Style

* More fixes
dadepo authored Nov 20, 2024
1 parent f58096a commit 41cf6aa
Showing 29 changed files with 206 additions and 144 deletions.
14 changes: 7 additions & 7 deletions src/accountsdb/db.zig
```diff
@@ -43,6 +43,7 @@ const Pubkey = sig.core.Pubkey;
 const Slot = sig.core.Slot;
 
 const NestedHashTree = sig.utils.merkle_tree.NestedHashTree;
+const Logger = sig.trace.log.Logger;
 
 const GeyserWriter = sig.geyser.GeyserWriter;
 
@@ -55,8 +56,6 @@ const WeightedAliasSampler = sig.rand.WeightedAliasSampler;
 
 const RwMux = sig.sync.RwMux;
 
-const Logger = sig.trace.log.Logger;
-
 const parallelUnpackZstdTarBall = sig.accounts_db.snapshots.parallelUnpackZstdTarBall;
 const spawnThreadTasks = sig.utils.thread.spawnThreadTasks;
 const printTimeEstimate = sig.time.estimate.printTimeEstimate;
@@ -634,7 +633,7 @@ pub const AccountsDB = struct {
 
             if (print_progress and progress_timer.read().asNanos() > DB_LOG_RATE.asNanos()) {
                 printTimeEstimate(
-                    self.logger,
+                    self.logger.withScope(@typeName(Self)),
                     &timer,
                     n_account_files,
                     file_count,
@@ -669,7 +668,7 @@
 
             if (print_progress and progress_timer.read().asNanos() > DB_LOG_RATE.asNanos()) {
                 printTimeEstimate(
-                    self.logger,
+                    self.logger.withScope(@typeName(Self)),
                     &timer,
                     n_accounts_total,
                     ref_count,
@@ -695,7 +694,7 @@
             .data_len = self.account_index.pubkey_ref_map.numberOfShards(),
             .max_threads = n_threads,
             .params = .{
-                self.logger,
+                self.logger.unscoped(),
                 &self.account_index,
                 thread_dbs,
             },
@@ -758,11 +757,12 @@
     /// combines multiple thread indexes into the given index.
     /// each bin is also sorted by pubkey.
     pub fn combineThreadIndexesMultiThread(
-        logger: Logger,
+        logger_: Logger,
         index: *AccountIndex,
         thread_dbs: []const AccountsDB,
         task: sig.utils.thread.TaskParams,
     ) !void {
+        const logger = logger_.withScope(@typeName(Self));
         const shard_start_index = task.start_index;
         const shard_end_index = task.end_index;
 
@@ -1142,7 +1142,7 @@
 
            if (print_progress and progress_timer.read() > DB_LOG_RATE.asNanos()) {
                printTimeEstimate(
-                    self.logger,
+                    self.logger.withScope(@typeName(Self)),
                    &timer,
                    shards.len,
                    count,
```
20 changes: 13 additions & 7 deletions src/accountsdb/download.zig
```diff
@@ -10,9 +10,13 @@ const GossipTable = sig.gossip.GossipTable;
 const ThreadSafeContactInfo = sig.gossip.data.ThreadSafeContactInfo;
 const GossipService = sig.gossip.GossipService;
 const Logger = sig.trace.Logger;
+const ScopedLogger = sig.trace.ScopedLogger;
 
 const DOWNLOAD_PROGRESS_UPDATES_NS = 6 * std.time.ns_per_s;
 
+// The identifier for the scoped logger used in this file.
+const LOG_SCOPE = "accountsdb.download";
+
 /// Analogous to [PeerSnapshotHash](https://github.com/anza-xyz/agave/blob/f868aa38097094e4fb78a885b6fb27ce0e43f5c7/validator/src/bootstrap.rs#L342)
 const PeerSnapshotHash = struct {
     contact_info: ThreadSafeContactInfo,
@@ -157,13 +161,14 @@ pub fn findPeersToDownloadFromAssumeCapacity(
 /// note: gossip_service must be running.
 pub fn downloadSnapshotsFromGossip(
     allocator: std.mem.Allocator,
-    logger: Logger,
+    logger_: Logger,
     // if null, then we trust any peer for snapshot download
     maybe_trusted_validators: ?[]const Pubkey,
     gossip_service: *GossipService,
     output_dir: std.fs.Dir,
     min_mb_per_sec: usize,
 ) !void {
+    const logger = logger_.withScope(LOG_SCOPE);
     logger
         .info()
         .logf("starting snapshot download with min download speed: {d} MB/s", .{min_mb_per_sec});
@@ -240,7 +245,7 @@
 
         downloadFile(
             allocator,
-            logger,
+            logger.unscoped(),
             snapshot_url,
             output_dir,
             snapshot_filename,
@@ -281,7 +286,7 @@
         logger.info().logf("downloading inc_snapshot from: {s}", .{inc_snapshot_url});
         _ = downloadFile(
             allocator,
-            logger,
+            logger.unscoped(),
             inc_snapshot_url,
             output_dir,
             inc_snapshot_filename,
@@ -304,7 +309,7 @@
 const DownloadProgress = struct {
     file: std.fs.File,
     min_mb_per_second: ?usize,
-    logger: Logger,
+    logger: ScopedLogger(@typeName(Self)),
 
     progress_timer: sig.time.Timer,
     bytes_read: u64 = 0,
@@ -325,7 +330,7 @@
         try file.setEndPos(download_size);
 
         return .{
-            .logger = logger,
+            .logger = logger.withScope(@typeName(Self)),
             .file = file,
             .min_mb_per_second = min_mb_per_second,
             .progress_timer = try sig.time.Timer.start(),
@@ -476,12 +481,13 @@ fn enableProgress(
 /// the main errors include {HeaderRequestFailed, NoContentLength, TooSlow} or a curl-related error
 pub fn downloadFile(
     allocator: std.mem.Allocator,
-    logger: Logger,
+    logger_: Logger,
     url: [:0]const u8,
     output_dir: std.fs.Dir,
     filename: []const u8,
     min_mb_per_second: ?usize,
 ) !void {
+    const logger = logger_.withScope(LOG_SCOPE);
     var easy = try curl.Easy.init(allocator, .{});
     defer easy.deinit();
 
@@ -503,7 +509,7 @@
     // timeout will need to be larger
     easy.timeout_ms = std.time.ms_per_hour * 5; // 5 hours is probs too long but its ok
     var download_progress = try DownloadProgress.init(
-        logger,
+        logger.unscoped(),
         output_dir,
         filename,
         download_size,
```
5 changes: 3 additions & 2 deletions src/accountsdb/index.zig
```diff
@@ -66,11 +66,12 @@ pub const AccountIndex = struct {
 
     pub fn init(
         allocator: std.mem.Allocator,
-        logger: sig.trace.Logger,
+        logger_: sig.trace.Logger,
         allocator_config: AllocatorConfig,
         /// number of shards for the pubkey_ref_map
         number_of_shards: usize,
     ) !Self {
+        const logger = logger_.withScope(@typeName((Self)));
         const reference_allocator: ReferenceAllocator = switch (allocator_config) {
             .ram => |ram| blk: {
                 logger.info().logf("using ram memory for account index", .{});
@@ -80,7 +81,7 @@
                 var index_dir = try disk.accountsdb_dir.makeOpenPath("index", .{});
                 errdefer index_dir.close();
                 const disk_allocator = try allocator.create(DiskMemoryAllocator);
-                disk_allocator.* = .{ .dir = index_dir, .logger = logger };
+                disk_allocator.* = .{ .dir = index_dir, .logger = logger.withScope(@typeName(DiskMemoryAllocator)) };
                 logger.info().logf("using disk memory (@{s}) for account index", .{sig.utils.fmt.tryRealPath(index_dir, ".")});
                 break :blk .{ .disk = .{ .dma = disk_allocator, .ptr_allocator = allocator } };
             },
```
3 changes: 2 additions & 1 deletion src/accountsdb/snapshots.zig
```diff
@@ -1725,10 +1725,11 @@ pub const AllSnapshotFields = struct {
 
     pub fn fromFiles(
         allocator: std.mem.Allocator,
-        logger: Logger,
+        logger_: Logger,
         snapshot_dir: std.fs.Dir,
         files: SnapshotFiles,
     ) !Self {
+        const logger = logger_.withScope(@typeName((Self)));
         // unpack
         const full_fields = blk: {
             const rel_path_bounded = sig.utils.fmt.boundedFmt("snapshots/{0}/{0}", .{files.full_snapshot.slot});
```