Skip to content

actor-based structure of downstream server #149

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Apr 21, 2025
Merged

Conversation

rkuhn
Copy link
Contributor

@rkuhn rkuhn commented Mar 17, 2025

Posting this mainly for discussion of the approach.

Summary by CodeRabbit

  • New Features

    • Added support for configurable network listening address, maximum downstream peers, and network magic number for incoming connections.
    • Enhanced block forwarding stage to accept and manage network peers, enabling client protocol management and event forwarding.
    • Introduced client protocol handling for chain synchronization, including rollback and forward operations.
    • Added a command-line option to specify the listening address for the daemon.
  • Documentation

    • Added detailed documentation and diagrams explaining the architecture and concurrency model of the chain forwarding stage.
    • Expanded README with instructions for running a Cardano node using Docker.
  • Chores

    • Updated dependencies and development dependencies to support new features and protocol handling.
    • Improved .gitignore rules for Cardano node support files and snapshots.
  • Tests

    • Introduced comprehensive tests for chain synchronization logic, header traversal, and protocol integration.
  • Refactor

    • Refactored Makefile for flexible configuration of ledger and chain directories, and improved database handling commands.

Copy link
Contributor

coderabbitai bot commented Mar 17, 2025

Walkthrough

Alright, mate, here’s the lowdown: This update is a ripper—introducing a full-featured network peer and client management system to the consensus forwarding stage. It brings in async actors for handling chain sync, block fetch, and keepalive mini-protocols, all managed with the new acto dependency. There’s a heap of new docs and diagrams explaining the concurrency model, plus tests to make sure everything’s shipshape. The config, Makefile, and README got a tidy-up to support flexible ledger and chain directories, and there’s a new listen address option for the daemon. All up, it’s a massive step up in modularity and network smarts.

Changes

File(s) Change Summary
.gitignore Updated ignore rules for ledger.db and chain.db to root-only; added ignores for /cardano-node-config/ and /snapshots/.
Cargo.toml, crates/amaru/Cargo.toml Added acto (with tokio feature), minicbor, serde_json, and slot-arithmetic as workspace dependencies.
crates/amaru-consensus/Cargo.toml Added minicbor, serde_json, and tracing-subscriber as dev-dependencies with workspace linkage.
Makefile Introduced variables for DB prefix and listen address; defined ledger and chain directories; added clear-db target; updated bootstrap to depend on clear-db; import targets now specify ledger and chain dirs; dev target passes ledger, chain dirs, and listen address to daemon command; coverage target updated to run offline; demo target exports ledger and chain dirs.
README.md Added section on spinning up a Cardano node with Docker, detailing volumes, ports, and environment variables; clarified preprod instructions with a pointer to more info.
crates/amaru/src/bin/amaru/cmd/daemon.rs, crates/amaru/src/bin/amaru/cmd/mod.rs Added listen_address and max_downstream_peers CLI/config options; introduced constant for default listen address; included explicit network_magic in config struct.
crates/amaru/src/stages/mod.rs Extended Config struct with network_magic, listen_address, and max_downstream_peers; loaded chain tip as Header and converted to Tip; updated make_chain_selector to accept amaru_kernel::Point; changed ForwardStage::new to accept multiple new params including downstream actor ref (optional), network magic, listen address, max peers, and current tip; added traits PallasPoint and AsTip for conversions between internal and pallas network types.
crates/amaru/src/stages/consensus/chain_forward.rs Major overhaul adding network peer acceptance, async actor-based client management, and event forwarding; introduced enums ForwardEvent, Unit, ClientOp, and ClientMsg; ForwardStage and Worker extended with fields for network config, runtime handle, downstream actor ref, max peers, and current tip; bootstrap binds TCP listener and spawns acceptor task; worker schedules on block or peer events; executes forwarding of block events and peer handling; added client supervisor actor managing per-client protocol actors; replaced stub with full network integration.
crates/amaru/src/stages/consensus/chain_forward/client_protocol.rs New module implementing async actors for client protocols: chain sync (with intersection and request handling), block fetch, transaction submission, and keepalive; defines error types and message enums; manages protocol lifecycles and message passing; chain sync separates intersection logic and request handling into supervised actors; stub implementations for block fetch, tx submission, keepalive included.
crates/amaru/src/stages/consensus/chain_forward/client_state.rs New module managing client state for chain sync: tracks queue of client operations (forward and backward), handles rollback truncation, provides async methods to get next operation and tip; includes utility to find headers between points in chain store; helper functions for point hashing and conversions; uses async mutex for store access.
crates/amaru/src/stages/consensus/chain_forward/tests.rs New test module with unit tests for header traversal and intersection logic; integration test exercising chain sync client interaction with forward stage, verifying forward and backward message handling and tip updates.
crates/amaru/src/stages/consensus/chain_forward/README.md New documentation explaining the concurrency model, actor responsibilities, protocol handling, and design rationale of the chain_forward stage including client protocol handling and connection acceptance.
crates/amaru/src/stages/consensus/chain_forward/chain_forward.d2 New diagram definition file depicting the internal architecture and message flow of the chain_forward stage: TCP listener, forward stage worker, client supervisor, client protocols (chain sync, block fetch, tx submission, keepalive), and their relationships.
crates/amaru/src/stages/consensus/chain_forward/test_infra.rs New test infrastructure providing an in-memory chain store, test setup with runtime and forward stage, client wrapper for chain sync interaction, utilities for hash and point creation, and helpers for sending validated and rollback events; includes methods for asserting expected forward/backward events and client message equality.
scripts/demo.sh Added environment variables LEDGER_DIR and CHAIN_DIR with defaults; passed them as command-line arguments to the daemon; reformatted command invocation for clarity.

Sequence Diagram(s)

sequenceDiagram
    participant Peer as Network Peer
    participant Listener as TCP Listener
    participant Stage as ForwardStage Worker
    participant Supervisor as Client Supervisor
    participant Protocol as Client Protocol Actors

    Peer->>Listener: Connect to listen_address
    Listener->>Stage: Send PeerServer via channel
    Stage->>Supervisor: Forward PeerServer and Tip
    Supervisor->>Protocol: Spawn protocol actors (chainsync, blockfetch, etc.)
    Stage->>Supervisor: Broadcast block events (Forward/Backward)
    Protocol->>Peer: Respond to protocol requests (e.g., chainsync)
Loading

Poem

G'day to the code, now networked and bold,
With actors and peers, it's a sight to behold!
Chain syncs are humming, the tests never rest,
The docs and diagrams put design to the test.
From Docker to Makefile, it's all looking ace—
Like a Mario power-up, this code’s picked up the pace!
🚀🍀


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 9b8b3cd and bb387c1.

📒 Files selected for processing (1)
  • Makefile (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • Makefile

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@abailly abailly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it makes sense to me but there's a lot of ground to tread and I think what would help me understand better the design is a diagram. Adding some design notes in the README of the crate as part of this PR would be great.
Also, I'd assume you will add tests after having gathered consensus (ah! ah!) on the design?

info!(
target: EVENT_TARGET,
slot = point.slot_or_default(),
hash = %Hash::<32>::from(point),
"rolled_back_to"
);

// FIXME: block height should be part of BlockValidated message
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point


enum TxSubmissionMsg {}

async fn tx_submission(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need TxSubmission at this stage so probably better to not clutter the code with it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I just wanted to sketch the whole range, we don’t need details here

store: &Arc<Mutex<dyn ChainStore<Header>>>,
start_point: &Point,
points: &[Point],
) -> Option<(Vec<ClientOp>, Tip)> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would extract the intersection logic from this module and put it closer to chain_selection but that can happen later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking some more about it: this is not related to chain selection, it is only related to computing the intersection in chain sync. So I’d like to keep it here.

Copy link
Contributor

@abailly abailly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments

Tip(to_pallas_point(&header.point()), header.block_height()),
));
}
headers.push(ClientOp::Forward(header.clone()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where we'll need to revise the design later to only use indices/pointers to track each followers chain, because otherwise the intersection process could accumulate the whole chain as Ops to send to the follower.

Copy link
Contributor Author

@rkuhn rkuhn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abailly I added a README with a diagram, laying out the basic reasoning behind the approach.


enum TxSubmissionMsg {}

async fn tx_submission(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I just wanted to sketch the whole range, we don’t need details here

Comment on lines 167 to 190
// FIXME: block height should be part of BlockValidated message
let store = stage.store.lock().await;
if let Some(header) = store.load_header(&Hash::from(point)) {
self.tip = Tip(to_pallas_point(point), header.block_height());
self.clients.send(ClientMsg::Op(ClientOp::Forward(header)));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed!

Ok(())
}
Unit::Peer(peer) => {
// FIXME: gasket design bug that we only get &Unit and thus cannot take values from it without internal mutability
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Hence the RefCell, I see.

@rkuhn rkuhn force-pushed the rk/downstream-server branch 2 times, most recently from 9017fa4 to 6c6243d Compare March 29, 2025 11:21
@rkuhn rkuhn marked this pull request as ready for review March 29, 2025 11:21
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (26)
README.md (1)

131-148: Comprehensive Docker setup instructions - brilliant addition!

The new section on setting up a Cardano node is clear and detailed. However, it might be worth mentioning the approximate disk space required for the blockchain and expected sync time - these can be rather hefty!

 The final and crucial ingredient is to supply the environment variable `NETWORK=preprod`.
 With this, the node with start up, fetch the required config files, and then start syncing blocks from the network; this will take some time.
+
+Note: The preprod blockchain requires several GB of disk space and initial sync may take hours to complete. You can monitor the sync progress using the node logs.
crates/amaru/src/sync.rs (1)

89-94: Nice implementation of the ForwardStage instantiation!

The updated instantiation now properly incorporates the new configuration parameters. I notice you're passing None as the first parameter (downstream), which suggests this is a terminal actor in your actor system - a bit like the last bloke in a pub crawl who doesn't have anyone to pass the pint to!

One small thing to watch: you're casting network_magic from u32 to u64 on line 92. That's perfectly fine for now, but worth documenting why the types differ between the config and the function parameter if there's a specific reason.

Consider adding a brief comment explaining the type conversion if there's a specific architectural reason for the different integer sizes.

Makefile (1)

75-80: Comprehensive dev target update

You've done a cracking job updating the dev target with all the new parameters. This makes it much easier for developers to run the system with the default configuration.

Since you're building an actor-based system, you might consider adding a comment or additional target to demonstrate different configurations, such as running multiple nodes with different listen addresses. This could help developers understand how to set up a local test network.

crates/amaru-consensus/chain_forward.d2 (3)

9-38: Consistent styling classes, but watch out for overuse of shadows.

It might be simpler to exclude shadows for containers that don’t need a strong emphasis. Too many shadows can clutter the diagram visually—sometimes, less is more.


45-48: Container label is a beaut, but consider descriptive naming.

"chain_forward" is short and sweet, but might you label it "Chain Forward Stage" for clarity?

-  label: "chain_forward"
+  label: "Chain Forward Stage"

113-119: Arrows from client_protocols are clear, but consider extended annotation.

Currently, these lines succinctly indicate how updates flow, but a brief mention in the arrow label about "miniprotocol" might make the transitions even more explicit for new readers.

crates/amaru-consensus/README.md (3)

23-24: Watch the wording to keep it lively yet concise.

You’ve used “needs to be deconstructed in order to allow concurrent usage” – a cheeky rephrase can make this tidier and match our colourful style. For instance:

- PeerServer needs to be deconstructed in order to allow concurrent usage
+ PeerServer must be broken apart to allow concurrent usage
🧰 Tools
🪛 LanguageTool

[style] ~23-~23: Consider a shorter alternative to avoid wordiness.
Context: ... PeerServer needs to be deconstructed in order to allow concurrent usage of the protocols...

(IN_ORDER_TO_PREMIUM)


[uncategorized] ~24-~24: Possible missing comma found.
Context: ...he protocol instances is managed by one actor whose sole responsibility is to react t...

(AI_HYDRA_LEO_MISSING_COMMA)


26-27: Variety is the spice of life—tweak repetitive phrasing.

You’ve used “needs to” again, which can feel a tad repetitive. Perhaps “must be separated” or “should be split off” would ring more melodious.

🧰 Tools
🪛 LanguageTool

[style] ~26-~26: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: .... In the case of chainsync this actor needs to obtain information regarding the curren...

(REP_NEED_TO_VB)


[style] ~26-~26: Consider a shorter alternative to avoid wordiness.
Context: ...r operations (roll forward or backward) in order to respond to requests for the next header...

(IN_ORDER_TO_PREMIUM)


[style] ~27-~27: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ion updates, these two responsibilities need to be separated into two actors: - `chain...

(REP_NEED_TO_VB)


37-38: Slap in a comma for clarity, mate.

A small punctuation cameo ensures readers don’t run out of breath mid-sentence.

- ...supervisor there is an additional level...
+ ...supervisor, there is an additional level...
🧰 Tools
🪛 LanguageTool

[uncategorized] ~37-~37: A comma might be missing here.
Context: ...s for one client and the overall client supervisor there is an additional level of per-cli...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)

crates/amaru-consensus/src/chain_forward/tests.rs (2)

97-114: Consider adding negative tests for edge cases.

It might be beneficial to test scenarios like “file not found” or “invalid JSON” to confirm robust error handling. Currently, mk_store unwraps, which can cause panics if the chain file is missing.


270-278: No graceful teardown leads to logs at ERROR level.

It’s not exactly a big worry, but you might add a nice little shutdown signal for the gasket stage, so you don’t get spammed with error logs. This can come in handy for diagnosing real errors.

crates/amaru-consensus/src/chain_forward/client_state.rs (1)

62-66: todo!() usage for Backward operations is a bit of a cliff-hanger.

We’re all for a bit of suspense, but leaving it unimplemented too long might cause confusion. Recommend either adding a comment with the plan or an issue reference so the team knows how to proceed.

Do you fancy me drafting a minimal partial rollback approach?

crates/amaru-consensus/src/chain_forward/client_protocol.rs (8)

38-40: Single-variant enum might be limiting.
Consider expanding ClientProtocolMsg if you foresee new message types in future. Otherwise, it’s fine for now.


42-69: Well-arranged spawns for parallel tasks.
The approach to block fetching, transaction submission, keep-alive, and chain-sync is quite spiffy. Yet, you might want logging statements for each supervised task's lifecycle. Helps debug if tasks misbehave.


78-136: Asynchronous chain sync logic is sound.
The flow for waiting on intersections and bridging ops to the handler is well-designed. Consider a fallback or timeout approach if a chain intersection is delayed indefinitely—just in case the client dozes off.


138-187: chain_sync_handler covers the request flow effectively.
Grander concurrency concerns seem addressed by the cell-based architecture. If the client’s requests become more complex, a break-down of sub-handlers might keep the code tidy.


189-195: Simple transformation to HeaderContent.
Hardcoding variant=1 is okay if you’re certain this never changes. Otherwise, you might consider parameterising.


197-211: block_fetch is a placeholder.
Ensure you handle large block ranges properly when you do implement this. A little buffering or streaming approach is typically wise.


213-224: tx_submission skeleton.
Nothing to add for now—perhaps consider bounding your queue for transactions if you suspect a spam scenario.


226-235: keep_alive loop is minimal but serviceable.
Make sure a misbehaving or unresponsive client doesn’t cause indefinite resource usage. A cheeky little idle-timeout could be nice.

crates/amaru-consensus/src/chain_forward.rs (6)

59-63: ForwardEvent is a handy enumeration of events.
It clearly delineates listening from forwarding. Might consider an additional event if we’re expecting more connection states.


84-87: Unit enum is a neat approach.
Bundling either a peer or a block helps unify the schedule logic. Be cautious of potential expansions—trying to shoehorn in too many variants can get messy.


104-154: bootstrap sets up the server and spawns the client actor.
It’s good that we blink the listening event to downstream. Keep an eye on security—limiting or filtering accepted peers may be wise.


156-170: schedule method concurrently awaits new blocks or new peers.
A parallel fetch strategy is pinned well—just ensure you’re handling exhaustion of the mpsc channel gracefully.


171-247: Solid execute method handles block events and peer arrivals.
Implementation covers validated blocks, rollbacks, and new peer setup. The scattered FIXME notes about block heights and partial storage coverage are worth addressing soon.


250-285: client_supervisor elegantly manages multiple clients.
Broadcasting a ClientOp to all connected clients is neat, though it could become chatty if you have loads of peers. Possibly consider a more selective dispatch approach if scaling up.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e3fcec8 and 6c6243d.

⛔ Files ignored due to path filters (3)
  • Cargo.lock is excluded by !**/*.lock
  • crates/amaru-consensus/Cargo.lock is excluded by !**/*.lock
  • crates/amaru-consensus/chain_forward.png is excluded by !**/*.png
📒 Files selected for processing (14)
  • .gitignore (2 hunks)
  • Cargo.toml (2 hunks)
  • Makefile (2 hunks)
  • README.md (2 hunks)
  • crates/amaru-consensus/Cargo.toml (2 hunks)
  • crates/amaru-consensus/README.md (1 hunks)
  • crates/amaru-consensus/chain_forward.d2 (1 hunks)
  • crates/amaru-consensus/src/chain_forward.rs (5 hunks)
  • crates/amaru-consensus/src/chain_forward/client_protocol.rs (1 hunks)
  • crates/amaru-consensus/src/chain_forward/client_state.rs (1 hunks)
  • crates/amaru-consensus/src/chain_forward/tests.rs (1 hunks)
  • crates/amaru/src/bin/amaru/cmd/daemon.rs (2 hunks)
  • crates/amaru/src/bin/amaru/cmd/mod.rs (1 hunks)
  • crates/amaru/src/sync.rs (2 hunks)
🧰 Additional context used
🧬 Code Definitions (3)
crates/amaru/src/sync.rs (1)
crates/amaru-consensus/src/chain_forward.rs (1)
  • new (66-81)
crates/amaru-consensus/src/chain_forward.rs (6)
crates/amaru-consensus/src/chain_forward/client_protocol.rs (1)
  • client_protocols (42-69)
crates/amaru-consensus/src/chain_forward/client_state.rs (3)
  • to_pallas_point (121-126)
  • new (27-37)
  • tip (45-60)
crates/amaru-consensus/src/consensus/header_validation.rs (1)
  • new (74-84)
crates/amaru-consensus/src/consensus/store/rocksdb.rs (1)
  • new (16-28)
crates/ouroboros-traits/src/is_header/fake.rs (2)
  • point (44-46)
  • slot (40-42)
crates/ouroboros-traits/src/is_header/mod.rs (5)
  • point (33-35)
  • slot (45-45)
  • slot (69-71)
  • slot (87-89)
  • hash (28-30)
crates/amaru-consensus/src/chain_forward/client_state.rs (2)
crates/amaru-consensus/src/chain_forward.rs (1)
  • new (66-81)
crates/amaru-consensus/src/chain_forward/tests.rs (2)
  • point (89-91)
  • hash (62-64)
🪛 LanguageTool
crates/amaru-consensus/README.md

[style] ~23-~23: Consider a shorter alternative to avoid wordiness.
Context: ... PeerServer needs to be deconstructed in order to allow concurrent usage of the protocols...

(IN_ORDER_TO_PREMIUM)


[uncategorized] ~24-~24: Possible missing comma found.
Context: ...he protocol instances is managed by one actor whose sole responsibility is to react t...

(AI_HYDRA_LEO_MISSING_COMMA)


[style] ~26-~26: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: .... In the case of chainsync this actor needs to obtain information regarding the curren...

(REP_NEED_TO_VB)


[style] ~26-~26: Consider a shorter alternative to avoid wordiness.
Context: ...r operations (roll forward or backward) in order to respond to requests for the next header...

(IN_ORDER_TO_PREMIUM)


[style] ~27-~27: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ion updates, these two responsibilities need to be separated into two actors: - `chain...

(REP_NEED_TO_VB)


[uncategorized] ~37-~37: A comma might be missing here.
Context: ...s for one client and the overall client supervisor there is an additional level of per-cli...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Build on windows-latest with target x86_64-pc-windows-msvc
  • GitHub Check: Build on windows-latest with target x86_64-pc-windows-msvc
🔇 Additional comments (28)
.gitignore (3)

17-17: Database directory structure changes - brilliant!

The change from ledger.db to /*ledger.db/ restricts ignore patterns to the root directory and indicates these are now directories rather than files. This is a clever way to allow for nested database structures while maintaining clean git tracking. Good stuff!


20-20: Same pattern applied consistently - top job!

Applying the same pattern modification to chain.db ensures consistency across database structures. Like a well-paired set of Wellington boots, these changes match perfectly!


33-35: Adding Cardano node support files to gitignore - spot on!

These new ignore patterns align beautifully with the Cardano node setup instructions added in the README. Keeping these potentially large directories out of version control is as sensible as bringing an umbrella to Manchester!

README.md (1)

23-23: Improved cross-reference in documentation - handy addition!

Adding the reference to the bottom of the page provides a lovely little breadcrumb trail for users to follow. It's like leaving a trail of biscuit crumbs to the tea pot - quite thoughtful!

crates/amaru/src/bin/amaru/cmd/mod.rs (1)

28-30:

❓ Verification inconclusive

New default listen address constant - good addition but mind the security!

Setting "0.0.0.0:3001" as the default listen address will bind to all network interfaces. While this makes sense for flexibility, it's like leaving your garden gate wide open - anyone could wander in!

Ensure that firewall rules are properly set up when deploying in production environments, or consider adding a note in the documentation about securing this port.


🏁 Script executed:

#!/bin/bash
# Check if there's any authentication or TLS configuration for this listening port
# which would mitigate the security concerns

rg -A 5 -B 5 "DEFAULT_LISTEN_ADDRESS" --glob "*.rs"
rg "bind|listen|tls|auth|secure" --glob "*.rs" | grep -i "3001"

Length of output: 1781


Security Check Required – Please Verify Network Safeguards

The introduction of the new default listen address is a neat way to centralise configuration, but binding to all interfaces at "0.0.0.0:3001" could resemble leaving your front door wide open. My recent search didn’t uncover any evidence of TLS, authentication, or similar security measures being applied to this port. This might well be a benign omission in the code, but it does raise an eyebrow.

  • In crates/amaru/src/bin/amaru/cmd/mod.rs: The constant DEFAULT_LISTEN_ADDRESS is defined as "0.0.0.0:3001".
  • In crates/amaru/src/bin/amaru/cmd/daemon.rs: This constant is used as a default value for the listen address.
  • No immediate security-related configurations (e.g. TLS or auth) associated with port 3001 appear to be present.

Could you kindly perform a manual verification to confirm that your deployment environment (or additional modules) handles network security appropriately through firewall rules or other means? A note in the documentation regarding securing this port might also be beneficial.

Cargo.toml (2)

55-55: Enhanced tracing with env-filter - smashing improvement!

Adding the env-filter feature to tracing-subscriber allows for more flexible logging configuration via environment variables. This is as handy as a pocket on a shirt - perfect for toggling different log levels during development and debugging!

This change will make it easier to diagnose issues in different environments without recompiling.


18-18:

✅ Verification successful

Adding acto dependency - embracing the actor model!

This addition aligns perfectly with the PR objective of implementing an actor-based structure. The acto crate with tokio features is a cracking choice for building asynchronous actor systems.

It's worth ensuring this matches the overall architectural direction and that team members are familiar with the actor model pattern - it's a bit like switching from driving on the left to the right if you're not used to it!


🌐 Web query:

What features and benefits does the acto 0.7.2 Rust crate provide for actor-based systems?

💡 Result:

The Acto crate (version 0.7.2) is not directly covered in the provided search results. However, based on prominent Rust actor frameworks like Actix and Axiom discussed in the sources, here are key features and benefits typical to modern Rust actor systems:


Core Features of Actor Frameworks (General)

  1. Concurrent Message Handling
    Actors process messages asynchronously with thread-safe communication, using address systems (Addr/Recipient) for message routing[3][9].

  2. Lifecycle Management
    Built-in state management for actor initialization (started), graceful shutdown (stopping), and cleanup (stopped)[3][4].

  3. Supervision
    Parent-child hierarchies for error recovery and system resilience[4][6].

  4. Typed Messaging
    Compile-time type safety for messages without Any type usage[4][9].

  5. Async/Sync Support
    Hybrid execution models (async for I/O-bound tasks, sync for CPU-bound operations)[4][9].


Actix Framework Highlights (v0.13.5)

// Example actor definition
use actix::{Actor, Context, Handler, Message};

struct MyActor;
impl Actor for MyActor {
    type Context = Context<Self>;
}

#[derive(Message)]
#[rtype(result = "String")]
struct MyMessage;

impl Handler<MyMessage> for MyActor {
    type Result = String;
    
    fn handle(&mut self, msg: MyMessage, _ctx: &mut Context<Self>) -> Self::Result {
        "Received message".into()
    }
}
Feature Benefit
Tokio Runtime Built on Tokio for efficient async I/O[4]
Futures Integration Native async/await support via futures[3][9]
Local/Thread Context Flexible execution in thread-bound or shared contexts[4]
Actor Supervision Automatic restarts for failed actors[4]

Axiom Framework Features

// Error handling example
async fn processor(state: State, _ctx: Context, msg: Message) -> AxiomResult<State> {
    if msg.should_fail() {
        Err(AxiomError::new("Critical failure"))?;
    }
    Ok(Status::done(state))
}
Feature Benefit
Panic Tolerance Converts panics to errors for fault isolation[6]
Async Processors First-class async/await support[6]
Runtime Agnostic Works with multiple async runtimes via traits[6][16]
Distributed Design Inspired by Akka/Erlang for cluster-ready systems[6]

Comparison Table

Aspect Actix Axiom
Runtime Tokio-bound[4] Runtime-agnostic[6]
Error Handling Supervision[4] Panic conversion[6]
Async Model Futures-based[9] Async processors[6]
Learning Curve Moderate Higher (Akka-like)

While the specific acto 0.7.2 crate isn't detailed here, modern Rust actor systems generally provide type-safe concurrency, lifecycle management, and runtime flexibility. For production use, Actix offers battle-tested Tokio integration[3][4], while Axiom provides Erlang-style resilience for complex systems[6].

Citations:


Dependency Update: Embracing the Actor Model

The addition of the acto dependency (version 0.7.2 with Tokio features) fits perfectly with the PR's goal of introducing an actor-based architecture. Although direct documentation for this specific version wasn't readily available, similar modern Rust actor frameworks typically offer robust asynchronous message handling, lifecycle management, and supervision, all of which align nicely with our design intentions.

Please ensure that this direction is in sync with our overall architectural strategy and that everyone on the team is comfortable with the actor model approach – a bit like making sure you know whether to drive on the left or the right!

File: Cargo.toml (Line 18)

acto = { version = "0.7.2", features = ["tokio"] }
crates/amaru/src/sync.rs (1)

52-53: Lovely additions to the Config struct!

These new fields are crucial for the actor-based structure. The network_magic will identify the specific network, while the listen_address allows the node to accept incoming connections. Good job keeping the types consistent with their intended use.

crates/amaru-consensus/Cargo.toml (3)

16-16: Adding the acto dependency - brilliant!

Adding the acto crate aligns perfectly with the actor-based structure mentioned in the PR objectives. This will provide the foundation for your actor system.


18-18: Clean formatting on the gasket features

Nice tidy-up on the formatting here. Those extra spaces can be as annoying as getting a pebble in your shoe!


40-45: Good additions to dev dependencies

Adding minicbor, serde_json, and tracing-subscriber makes perfect sense for an actor-based system. These will help with serialization, debugging, and observability - absolutely essential when you're building a distributed system like this.

crates/amaru/src/bin/amaru/cmd/daemon.rs (2)

54-56: Great addition of the listen_address parameter!

The clear documentation and sensible default value make this a very user-friendly addition. This parameter is crucial for allowing incoming connections in your actor-based system.


113-114: Proper configuration handling - spot on!

You've correctly added the new parameters to the Config struct. The network_magic is derived from the network, and the listen_address is passed through from the args. This ensures your actor system has all the information it needs to operate correctly.

Makefile (3)

6-10: Clever variable organization!

Creating these variables makes the Makefile more maintainable and consistent. I particularly like how you've set up LEDGER_DIR and CHAIN_DIR to use the DB_PREFIX, which allows users to easily customize the database locations. The default value for LISTEN_ADDRESS as 0.0.0.0:0 is a good choice for development as it'll let the OS assign an available port.


69-71: Handy clear-db target - lifesaver!

This new target will be incredibly useful during development and testing. Being able to quickly clear the database with a single command is as refreshing as a cold pint on a hot day!


72-72: Smart update to the bootstrap target

Making bootstrap depend on clear-db ensures a clean slate before importing data - that's a proper way to avoid stale data issues. This kind of attention to detail makes for a much smoother developer experience.

crates/amaru-consensus/chain_forward.d2 (1)

1-7: Graceful introduction, well done!

These lines effectively set the stage for the diagram with a clear title and styling instructions. Lovely job making it stand out with the bold text and larger font size.

crates/amaru-consensus/src/chain_forward/tests.rs (2)

1-16: Overall test structure looks magic.

The mixture of ChainStore, ForwardEvent, and runtime creation is well aligned. Great job setting up your test environment.


180-193: Potential concurrency gotchas.

You spawn an actor to forward downstream traffic, which is brilliant. However, consider logging or handling the scenario where the channel might fail to send if the receiver is dropped unexpectedly. That’ll help diagnose issues if a rogue drop happens.

Would you like to add a small check or test to handle the Result of port_tx.send in case the channel is closed?

crates/amaru-consensus/src/chain_forward/client_state.rs (2)

8-12: Choice of pub(super) for ClientOp is sound.

This restricts usage to within the module, preserving your architecture’s secrecy. Good show, mate!


96-98: Check for a potential short-circuit.

When you find a matching point, you return, which is perfect. Just ensure we don’t skip interesting headers that might appear between multiple matching points. If the design only expects one match, you’re grand.

crates/amaru-consensus/src/chain_forward/client_protocol.rs (3)

1-15: Imports appear sensible and well-structured.
They collectively handle concurrency, messaging, and the chain store interactions. No immediate concerns here, mate.


16-36: Good use of thiserror for structured errors.
This enum neatly categorises client errors. Keep an eye on the expansions to ensure they remain comprehensible.


71-77: ChainSyncMsg: straightforward, no fuss.
Defining an enum for ledger operations is spot on—makes the chain-sync code more readable.

crates/amaru-consensus/src/chain_forward.rs (4)

16-27: Imports are logically grouped and relevant.
Locks, concurrency utilities, actor frameworks—all properly set up. No hairy bits spotted.


49-57: ForwardStage gains new fields nicely.
Introducing network_magic, runtime, listen_address, and downstream is wise to unify your chain-forward logic.


66-81: ForwardStage::new is quite straightforward.
All needed fields are properly initialised. Cheers for using AcTokio::new(...), but do be mindful of error handling if the runtime fails to initialise.


89-101: Worker struct and its Drop are tidy.
Aborting the server on drop is a practical approach, preventing stray tasks. Just watch for partial shutdown states.

@rkuhn rkuhn changed the title preview: possible actor-based structure of downstream server actor-based structure of downstream server Apr 11, 2025
@KtorZ
Copy link
Contributor

KtorZ commented Apr 14, 2025

What's up with this PR? Is it abandoned / superseded?

@rkuhn
Copy link
Contributor Author

rkuhn commented Apr 16, 2025

So far I didn’t get any thumbs-up — how does this work here?

@abailly
Copy link
Contributor

abailly commented Apr 16, 2025

So far I didn’t get any thumbs-up — how does this work here?

I think the rule is that if 1/ the CI is green and 2/ there's been reviews without any blocking comments, you should be good to go. If this works, I would like to merge it so that at least we have some ability to propagate headers downstream.

@rkuhn rkuhn force-pushed the rk/downstream-server branch from 1900493 to 661ba26 Compare April 17, 2025 08:44
@rkuhn
Copy link
Contributor Author

rkuhn commented Apr 17, 2025

aye aye; rebased across the move, will now add some basic client limits in place and then make CI green

Copy link

codecov bot commented Apr 17, 2025

Codecov Report

Attention: Patch coverage is 80.25000% with 158 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...u/src/stages/consensus/chain_forward/test_infra.rs 78.29% 51 Missing ⚠️
crates/amaru/src/stages/consensus/chain_forward.rs 73.91% 48 Missing ⚠️
.../stages/consensus/chain_forward/client_protocol.rs 76.35% 35 Missing ⚠️
crates/amaru/src/stages/mod.rs 45.16% 17 Missing ⚠️
crates/amaru/src/bin/amaru/cmd/daemon.rs 0.00% 6 Missing ⚠️
...src/stages/consensus/chain_forward/client_state.rs 98.14% 1 Missing ⚠️
Files with missing lines Coverage Δ
crates/amaru/src/bin/amaru/cmd/mod.rs 96.42% <ø> (ø)
.../amaru/src/stages/consensus/chain_forward/tests.rs 100.00% <100.00%> (ø)
...src/stages/consensus/chain_forward/client_state.rs 98.14% <98.14%> (ø)
crates/amaru/src/bin/amaru/cmd/daemon.rs 0.00% <0.00%> (ø)
crates/amaru/src/stages/mod.rs 11.96% <45.16%> (+11.96%) ⬆️
.../stages/consensus/chain_forward/client_protocol.rs 76.35% <76.35%> (ø)
crates/amaru/src/stages/consensus/chain_forward.rs 72.19% <73.91%> (+72.19%) ⬆️
...u/src/stages/consensus/chain_forward/test_infra.rs 78.29% <78.29%> (ø)

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@rkuhn rkuhn force-pushed the rk/downstream-server branch from 45d7bd2 to 74a52ab Compare April 19, 2025 08:47
@rkuhn
Copy link
Contributor Author

rkuhn commented Apr 19, 2025

Alright, I pushed this as far as I can without diving into rabbit holes. Any ideas about the remaining CI failure?

@abailly
Copy link
Contributor

abailly commented Apr 19, 2025

I agree the message is not very explicit

Error: Open(IO(Os { code: 2, kind: NotFound, message: "No such file or directory" }))

@abailly
Copy link
Contributor

abailly commented Apr 19, 2025

Alright, I pushed this as far as I can without diving into rabbit holes. Any ideas about the remaining CI failure?

I think the problem is that you've set the directories where ledger and chain are restored explicitly in the Makefile, but the demo.sh script still uses the default paths defined in the code. Either revert the Makefile to use default directories (I think that's better if only because it reduces the surface of the change), or pass the directories to demo.sh

rkuhn added 5 commits April 20, 2025 19:15
Currently only keepalive and chainsync protocols are supported. There is a smoke test for the latter and the assumption that the former is tested in pallas.
Some work has also been done in the Makefile to make it easier to have multiple Amaru nodes running in the project root directory.
An architecture diagram has been added to the amaru-consensus README.

Signed-Off-By: Roland Kuhn <[email protected]>
For now this is a simple cmdline argument that just ignores peers beyond a certain count.
Proper handling will need to include idle timers etc. to kick out downstream peers that are just squatting.

Signed-Off-By: Roland Kuhn <[email protected]>
also fix some bugs along the way, of course

Signed-off-by: Roland Kuhn <[email protected]>
Signed-off-by: Roland Kuhn <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (17)
crates/amaru/src/bin/amaru/cmd/mod.rs (1)

28-29: Validate the default listen address with strong types

Nice one centralizing the default listen address. Consider switching from a raw string to SocketAddr with Clap’s value_parser or std::net::SocketAddr parsing to catch bad addresses early and improve clarity.

.gitignore (1)

39-42: Remove duplicate snapshot ignore

You’ve already got snapshots/ ignored earlier. The new /snapshots/ is a near-twin—consider pruning one to keep the list DRY.

crates/amaru/src/stages/consensus/chain_forward/tests.rs (1)

132-132: Consider improving the shutdown process for tests

That comment about ERROR logs when shutting down the gasket stage is a bit concerning. Like having a game that always crashes when you try to exit properly!

It might be worth investigating a cleaner shutdown approach to avoid polluting test logs with errors, even if they're expected.

crates/amaru/src/stages/consensus/chain_forward/client_state.rs (1)

57-93: Solid header finding algorithm, but could benefit from a small optimization

This find_headers_between function is doing a fair dinkum job of traversing the chain backwards to find matching points. The early return for exact matches is a beauty.

However, for long chains, we might be creating a lot of temporary objects with all those clone() calls. When we're building up a potentially large vector of headers, this could get a bit expensive.

- headers.push(ClientOp::Forward(header.clone(), header.as_tip()));
+ // Create ClientOp just once at the end after we've collected all the headers
+ let header_ref = header;
+ headers.push(header_ref);

And then transform the headers into ClientOps right before returning. Just like how you'd batch craft items in "Breath of the Wild" instead of crafting them one by one!

Makefile (1)

76-82: Cracking command-line flexibility!

You've done a smashing job updating the dev target with all these configurable options. Now users can properly configure directories and listen address. It's more customizable than a Fallout character creation screen!

One thing though - it might be helpful to add a comment explaining what a valid LISTEN_ADDRESS looks like for newcomers.

8 DB_PREFIX ?= amaru
9 LISTEN_ADDRESS ?= 0.0.0.0:0  # Format: <ip>:<port>. Use 0.0.0.0:0 for random port on all interfaces.
crates/amaru/src/stages/consensus/chain_forward/README.md (1)

22-36: Ace actor decomposition rationale!

Deadly explanation of why you've split the responsibilities between chain_sync and chain_sync_handler. I particularly appreciate you noting this is "the minimum required complexity" - shows you've really thought about the trade-offs.

Just a tiny language improvement for readability:

- The `PeerServer` API bundles miniprotocol instances for each of the supported protocols, but unfortunately, using these instances via the `PeerServer` requires an exclusive reference, so the `PeerServer` needs to be deconstructed in order to allow concurrent usage of the protocols (like concurrently waiting for requests).
+ The `PeerServer` API bundles miniprotocol instances for each of the supported protocols, but unfortunately, using these instances via the `PeerServer` requires an exclusive reference, so the `PeerServer` needs to be deconstructed to allow concurrent protocol usage (like concurrently waiting for requests).

Makes it read a bit smoother, like a well-edited cutscene!

🧰 Tools
🪛 LanguageTool

[style] ~23-~23: Consider a shorter alternative to avoid wordiness.
Context: ... PeerServer needs to be deconstructed in order to allow concurrent usage of the protocols...

(IN_ORDER_TO_PREMIUM)


[uncategorized] ~24-~24: Possible missing comma found.
Context: ...he protocol instances is managed by one actor whose sole responsibility is to react t...

(AI_HYDRA_LEO_MISSING_COMMA)


[style] ~26-~26: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: .... In the case of chainsync this actor needs to obtain information regarding the curren...

(REP_NEED_TO_VB)


[style] ~26-~26: Consider a shorter alternative to avoid wordiness.
Context: ...r operations (roll forward or backward) in order to respond to requests for the next header...

(IN_ORDER_TO_PREMIUM)


[style] ~27-~27: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ion updates, these two responsibilities need to be separated into two actors: - `chain...

(REP_NEED_TO_VB)

crates/amaru/src/stages/consensus/chain_forward/chain_forward.d2 (1)

40-126: Comprehensive component architecture visualization

This is a cracking good diagram definition that perfectly complements the README. The way you've structured the components and their relationships gives a crystal clear picture of the system's architecture.

The markdown text blocks within components add just the right level of detail without cluttering the diagram. It's like one of those beautiful Assassin's Creed maps that shows you exactly what you need to know!

One suggestion - consider adding a brief comment at the top explaining how to generate the actual PNG from this d2 file for those who aren't familiar with the tool.

+# Generate diagram with: d2 chain_forward.d2 chain_forward.png
+# (Install d2 from https://d2lang.com if needed)
+
 title: {
   label: "internal design of the chain_forward stage"
crates/amaru/src/stages/mod.rs (2)

108-115: ForwardStage gets u64, config gives u32 – use Into instead of a cast

The explicit as u64 hides a silent truncation bug if we ever exceed u32::MAX.
Use Into::<u64> (or make the field itself u64) so the compiler yells when it should.

-        config.network_magic as u64,
+        u64::from(config.network_magic),

173-196: These helper traits are brilliant – consider moving them to a shared util

PallasPoint, AsTip, and to_pallas_point are generic goodies that other modules could reuse (I already spotted a couple of conversions elsewhere).
Sticking them into a crate::utils::point module will keep duplication away and make tests happier.

crates/amaru/src/stages/consensus/chain_forward/client_protocol.rs (3)

81-85: Early‑bird clients get the worm… or a polite error?

Right now we bail with EarlyRequestNext when the client forgets to ask for intersection first. That’s spec‑correct, but maybe over‑strict for a public relay. Consider logging & closing silently instead of bubbling an error upstream if you don’t want noisy telemetry.


86-93: Heavy mutex lock inside hot path

find_headers_between grabs store.lock().await for the whole computation. On a busy box with dozens of downstreams this will serialise every intersection search.
Clone the needed headers outside the lock or expose a lock‑free read API in ChainStore to keep things zippy like Sonic on a sugar rush.


204-210: Hard‑coding variant = 1 smells like trouble

Shelley headers are variant = 1 today, but once Voltaire rocks up we’ll have more flavours than a gelato bar. Maybe derive the variant from the header itself (or a const enum) instead of freezing it.

crates/amaru/src/stages/consensus/chain_forward.rs (2)

195-223: Name shadowing hides the listener – rename for clarity

We create let server = TcpListener… and then immediately let server = tokio::spawn….
Totally legal, but future Ryan will scratch his head figuring out which “server” is which. A tiny rename keeps the code as crisp as a freshly‑poured Guinness.

-let server = TcpListener::bind(&stage.listen_address).await.or_panic()?;
-
-let server = tokio::spawn(async move {
+let listener = TcpListener::bind(&stage.listen_address).await.or_panic()?;
+
+let listener_task = tokio::spawn(async move {
     loop {
-        let peer = match PeerServer::accept(&server, network_magic).await {
+        let peer = match PeerServer::accept(&listener, network_magic).await {

337-379: Broadcast loop could clog if a single client stalls

for client in clients.values() { client.send(…) } awaits each send in series.
A slow peer will block everyone like that one mate who hogs the controller. Fire‑and‑forget with try_send or spawn per‑client tasks to keep the queue flowing.

crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (3)

88-93: Tiny nit: constant naming vs. file path is a tad confusing

CHAIN_47 points at tests/data/chain41.json – that mismatch can trip the next dev reading this (I almost did a double‑take like Marty in “Back to the Future”).
Either rename the constant or bump the file so the numbers line up.


343-351: One‑second timeout is pretty tight for CI runners

block_on wraps every future in timeout(Duration::from_secs(1), f).
On a busy GitHub runner or a local box under load, a single context switch can exceed that, leading to flaky red builds faster than Sonic on caffeine.

Consider making this configurable or bumping it to 5‑10 s:

-rt.block_on(timeout(Duration::from_secs(1), f))
+const TEST_TIMEOUT_SECS: u64 = 5;
+rt.block_on(timeout(Duration::from_secs(TEST_TIMEOUT_SECS, f))

258-269: Graceful exit on peer disconnect would improve resilience

recv_until_await breaks only on Await, but if the peer closes the socket you get an Err and silently exit the loop, returning whatever ops were collected.
Might be worth logging that error (or bubbling it up) so future debugging isn’t a wild Chocobo chase.

-while let Ok(response) = block_on(&self.runtime, self.client.chainsync().request_next()) {
+while let Ok(response) =
+    block_on(&self.runtime, self.client.chainsync().request_next())
+{
     match response {
         ...
     }
 }
 // Else: log/propagate the disconnection error
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1c2150 and 57af6ff.

⛔ Files ignored due to path filters (4)
  • Cargo.lock is excluded by !**/*.lock
  • crates/amaru-consensus/Cargo.lock is excluded by !**/*.lock
  • crates/amaru/src/stages/consensus/chain_forward/chain_forward.png is excluded by !**/*.png
  • examples/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • .gitignore (2 hunks)
  • Cargo.toml (1 hunks)
  • Makefile (3 hunks)
  • README.md (2 hunks)
  • crates/amaru-consensus/Cargo.toml (1 hunks)
  • crates/amaru/Cargo.toml (2 hunks)
  • crates/amaru/src/bin/amaru/cmd/daemon.rs (2 hunks)
  • crates/amaru/src/bin/amaru/cmd/mod.rs (1 hunks)
  • crates/amaru/src/stages/consensus/chain_forward.rs (2 hunks)
  • crates/amaru/src/stages/consensus/chain_forward/README.md (1 hunks)
  • crates/amaru/src/stages/consensus/chain_forward/chain_forward.d2 (1 hunks)
  • crates/amaru/src/stages/consensus/chain_forward/client_protocol.rs (1 hunks)
  • crates/amaru/src/stages/consensus/chain_forward/client_state.rs (1 hunks)
  • crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (1 hunks)
  • crates/amaru/src/stages/consensus/chain_forward/tests.rs (1 hunks)
  • crates/amaru/src/stages/mod.rs (7 hunks)
  • scripts/demo.sh (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
crates/amaru/src/stages/consensus/chain_forward/tests.rs (4)
crates/amaru/src/stages/consensus/chain_forward/client_state.rs (2)
  • find_headers_between (57-93)
  • new (17-19)
crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (5)
  • hash (115-117)
  • mk_store (94-113)
  • get_height (59-62)
  • new (141-192)
  • get_point (54-57)
crates/amaru/src/stages/consensus/chain_forward.rs (2)
  • tip (170-175)
  • new (65-86)
crates/amaru/src/stages/mod.rs (5)
  • pallas_point (174-174)
  • pallas_point (178-180)
  • pallas_point (184-186)
  • as_tip (199-199)
  • as_tip (203-205)
crates/amaru/src/stages/mod.rs (7)
crates/amaru-kernel/src/lib.rs (1)
  • from (161-167)
crates/amaru/src/stages/consensus/chain_forward.rs (2)
  • tip (170-175)
  • new (65-86)
crates/amaru-ledger/src/state.rs (2)
  • tip (165-177)
  • new (97-150)
crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (3)
  • new (141-192)
  • point (123-125)
  • hash (115-117)
crates/amaru/src/stages/pull.rs (1)
  • new (57-64)
crates/amaru/src/stages/consensus/fetch.rs (1)
  • new (42-52)
crates/amaru/src/stages/consensus/header.rs (1)
  • new (43-49)
🪛 LanguageTool
crates/amaru/src/stages/consensus/chain_forward/README.md

[style] ~23-~23: Consider a shorter alternative to avoid wordiness.
Context: ... PeerServer needs to be deconstructed in order to allow concurrent usage of the protocols...

(IN_ORDER_TO_PREMIUM)


[uncategorized] ~24-~24: Possible missing comma found.
Context: ...he protocol instances is managed by one actor whose sole responsibility is to react t...

(AI_HYDRA_LEO_MISSING_COMMA)


[style] ~26-~26: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: .... In the case of chainsync this actor needs to obtain information regarding the curren...

(REP_NEED_TO_VB)


[style] ~26-~26: Consider a shorter alternative to avoid wordiness.
Context: ...r operations (roll forward or backward) in order to respond to requests for the next header...

(IN_ORDER_TO_PREMIUM)


[style] ~27-~27: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ion updates, these two responsibilities need to be separated into two actors: - `chain...

(REP_NEED_TO_VB)


[uncategorized] ~37-~37: A comma might be missing here.
Context: ...s for one client and the overall client supervisor there is an additional level of per-cli...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)

🔇 Additional comments (27)
Cargo.toml (1)

18-18: Approve acto workspace dependency

Right on—adding acto v0.7.2 with the Tokio feature is spot on for the actor-based model. Ensure it aligns with versions in downstream crates for consistency.

crates/amaru/Cargo.toml (2)

17-18: Approve acto usage in amaru crate

Looks sweet having acto as a workspace dependency here—just double-check that the required features propagate correctly to the chain_forward modules.


24-25: Verify additional serialization and utility deps

You’ve pulled in minicbor, serde_json, and slot-arithmetic. Make sure these crates are indeed referenced in crates/amaru (e.g., in consensus serialization logic). If they aren’t used yet, we might trim the Cargo manifest for faster compiles.

Also applies to: 33-34

crates/amaru-consensus/Cargo.toml (1)

33-38: Approve new dev-dependencies

Nice additions of minicbor, serde_json, and tracing-subscriber in the amaru-consensus crate. These will beef up your testing and tracing for the chain forward stage. Just eyeball that they land in test code as expected.

scripts/demo.sh (2)

22-24: Nice addition of configurable database directories!

G'day mate! These environment variables are a cracker addition - makes the demo script more flexible by letting users specify custom paths for ledger and chain databases. Like being able to customize your character's loadout in Fallout before starting a new game!


28-32: Ripper job with the command line formatting!

Breaking down the command over multiple lines is a thing of beauty, eh? Makes it way more readable than a Wall of Text boss fight. The command now properly passes the database directories as arguments, ensuring consistent configuration across the system. Top notch!

README.md (2)

23-23: Good on ya for that helpful cross-reference!

Adding a reference to the bottom section is fair dinkum helpful - gives readers a heads up that there's more detailed instructions coming up later, like a proper game tutorial that tells you there's an advanced guide in the appendix.


131-149: Top-notch Docker instructions for the preprod network!

Absolutely brilliant addition here! These Docker instructions are as clear as the waters of Bondi Beach. Explaining the volumes, port exposure, and environment variables gives users everything they need to get a Cardano node running on preprod - it's like getting the perfect walkthrough for a tricky boss fight.

The detail about syncing taking time is a nice touch too - sets proper expectations like a good game loading screen tip.

crates/amaru/src/bin/amaru/cmd/daemon.rs (2)

54-60: Deadly networking config options - like leveling up your multiplayer capabilities!

These two new CLI arguments are proper game-changers:

  • listen_address: Gives control over where to listen for incoming connections
  • max_downstream_peers: Limits the number of downstream peers to prevent resource overload

This is like upgrading from a single-player game to a proper MMO - now we can actually control our network presence! Good default values too.


117-120: Proper config structure updates to match the new CLI args!

You've tied everything together nicely here by updating the Config with all the new fields. The explicit setting of network_magic from the network name is a smart move - like making sure your character's stats match the class you've chosen.

This ensures that all the new networking options flow through properly from CLI args to the runtime configuration.

crates/amaru/src/stages/consensus/chain_forward/tests.rs (6)

12-21: Solid test for store creation - establishing your base camp!

This test validates that your in-memory store creation works properly. Like checking your inventory before starting a quest - making sure you've got the right number of items (48 headers) and that the chain structure is what you expect (47 length). Good foundation for the more complex tests to come!


23-32: Clean test for the simplest scenario - tip to tip intersection!

This test handles the base case - when our tip and the peer's tip are the same point. It's like checking that your fast-travel system works to your current location. Simple but necessary validation!


34-51: Proper test for branching scenarios - where the real challenge begins!

This test checks finding headers between a tip and a branch point - crucial for chain sync protocols. It's like finding the right path back to a checkpoint after exploring a side quest. Good validation of both the count and the expected intersection point and height!


78-89: Edge case testing for chains with no common ancestor - nice one!

This test handles the scenario where we try to find headers between our tip and a completely separate chain. Like trying to fast-travel between two different game worlds! Correctly falls back to genesis (Point::Origin) when there's no intersection. Deadly important test for robustness!


91-133: Comprehensive chain sync integration test - the boss battle!

This integration test is the real deal - setting up a client, finding intersection points, receiving headers, and handling backward/validated events. It's like a full quest line that exercises all the game mechanics together.

The comment about chain[6] on lines 117-120 is helpful in explaining the slightly unintuitive test behavior - good awareness that the timing of operations affects the observed results.


135-172: Nice test for the rollback optimization - handling detours elegantly!

This test checks an important optimization case: how the system handles chain reorganizations and rollbacks. Like testing how a game recovers when you reload a save from an earlier checkpoint! Good verification that the correct sequence of headers is sent after a rollback.

crates/amaru/src/stages/consensus/chain_forward/client_state.rs (2)

8-14: Good data structure design, mate!

Nice one with this ClientState struct - it's a ripper of a design for tracking client operations. The comment about having only one rollback at the front is crucial for maintaining a clean state model. This reminds me of the inventory system in "The Last of Us" - simple but effective!


26-49: Clever queue management logic!

This add_op method is spot on! The way you're handling the backward operations by either truncating to a matching point or clearing entirely is deadly efficient. It's like when Neo learns to optimize his movements in The Matrix - no wasted effort here. The debug traces will be bonza for troubleshooting too.

Makefile (3)

8-12: Nice configurability addition!

Top work setting up these configurable database directories! Using the DB_PREFIX as a base for both LEDGER_DIR and CHAIN_DIR is clever as - reminds me of how "Portal" lets you control both portals with related inputs. This will make testing with different database setups a breeze.


71-74: Grand cleanup target!

Adding this clear-db target and making bootstrap depend on it is absolutely brilliant! It's like that satisfying moment in "Katamari Damacy" when you clear the board and start fresh. Clean state for clean testing, love it!


110-110: Consistent environment variable passing!

Good on ya for updating the demo script invocation to use the environment variables. This keeps everything consistent across all targets - reminds me of how "Mass Effect" maintains universe consistency across all its quests. Deadly important when dealing with filesystem paths!

crates/amaru/src/stages/consensus/chain_forward/README.md (3)

1-13: Brilliant architectural introduction!

This README starts off strong by clearly explaining the concurrency challenges and your approach to solving them. Breaking complex state machines into smaller actor-based components is ace - reminds me of how "The Avengers" works better with each hero handling their specialty rather than one superhero trying to do it all!

The diagram inclusion is a ripper touch too - visual learners will appreciate that!


15-20: Clear explanation of TCP listener constraints

You've done a champion job explaining the Tokio TcpListener API constraints and your solution using a bounded queue. This is the kind of clarity that saves dev hours! It's like those tutorial levels in "Portal" that explain the mechanics before you need to use them.


37-39: Clear supervision hierarchy explanation

Great job explaining the purpose of the client_protocols actor level. This additional layer of supervision is like how the Normandy operates in Mass Effect - command structure matters! The failure propagation policy is clearly articulated too.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~37-~37: A comma might be missing here.
Context: ...s for one client and the overall client supervisor there is an additional level of per-cli...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)

crates/amaru/src/stages/consensus/chain_forward/chain_forward.d2 (1)

1-38: Beautiful diagram styling!

You've set up a proper professional-looking diagram here with these styling classes. The container/component/subcomponent hierarchy with different visual attributes is just like a well-designed UI in "Cyberpunk 2077" - clear visual hierarchy that helps understand the system at a glance.

crates/amaru/src/stages/consensus/chain_forward.rs (1)

296-304: Double‑check rollback header height

We assume the header exists in the store after rollback, but on deep rollbacks the block might be pruned. An Option::None here silently drops the event.
Consider logging a warn and pushing a ClientOp::Backward with the best info you have so downstreams don’t stall.

crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (1)

70-86:

❓ Verification inconclusive

Heads‑up: unimplemented!() will blow up if the test ever needs nonces or era history

Right now the trait stubs for get_nonces, put_nonces, and era_history punt with unimplemented!().
If any future test (or the stage itself) decides to look up nonces or era data, the whole suite will crash harder than the Death Star in “Return of the Jedi”.

 fn get_nonces(&self, _header: &Hash<32>) -> Option<Nonces> {
-    unimplemented!()
+    // TODO: return a default or mocked value so tests won’t panic
+    None
 }

Even a dummy value (or a todo!() guarded by a clear comment) is safer for exploratory work.
Otherwise, remember to upgrade these before the infra gets used outside the happy‑path tests.


Let’s check if any tests or consensus logic actually invoke these stubs:


🏁 Script executed:

#!/bin/bash
# Search for get_nonces, put_nonces, and era_history usages within the amaru crate
rg -n "get_nonces" -C2 crates/amaru
rg -n "put_nonces" -C2 crates/amaru
rg -n "era_history" -C2 crates/amaru

Length of output: 8912


Heads‑up: stubbed methods in test_infra.rs aren’t currently exercised but will panic if ever invoked

  • File: crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (lines 75–86)
    get_nonces, put_nonces, and era_history all use unimplemented!()
    • A ripgrep scan shows no current calls from the chain_forward stage or its tests

Even though your existing chain‑forward tests don’t hit these stubs, any future lookup will explode like a Sith Star Destroyer. You might swap them for harmless defaults:

fn get_nonces(&self, _header: &Hash<32>) -> Option<Nonces> {
-   unimplemented!()
+   None
}

fn put_nonces(&mut self, _header: &Hash<32>, _nonces: Nonces) -> Result<(), StoreError> {
-   unimplemented!()
+   Ok(())
}

fn era_history(&self) -> &slot_arithmetic::EraHistory {
-   unimplemented!()
+   // e.g. return a stored default or inject a dummy EraHistory in your TestStore
+   &self.default_era_history
}

Keep this in mind before you add any tests or logic that rely on nonce or era data—otherwise you’ll get a nasty panic.

Comment on lines +53 to +76
#[test]
fn find_headers_between_tip_and_branches() {
let store = mk_store(CHAIN_47);

let tip = store.get_point(TIP_47);
// Note that the below scheme does not match the documented behaviour, which shall pick the first from
// the list that is on the same chain. But that doesn't make sense to me at all.
let points = [
store.get_point(BRANCH_47), // this will lose to the (taller) winner
store.get_point(LOST_47), // this is not on the same chain
store.get_point(WINNER_47), // this is the winner after the branch
];
let peer = store.get_point(WINNER_47);

let (ops, Tip(p, h)) = find_headers_between(&store, &tip, &points).unwrap();
assert_eq!(
(ops.len() as u64, p, h),
(
store.get_height(TIP_47) - store.get_height(WINNER_47),
peer,
store.get_height(WINNER_47)
)
);
}
Copy link
Contributor

@coderabbitai coderabbitai bot Apr 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Interesting behavior note about intersection selection logic!

Your comment on lines 58-60 raises a fair question about the documented behavior. Seems like there might be a mismatch between how the code works and what's documented.

Consider clarifying whether the current implementation (which seems to prefer the tallest intersection point) is the intended behavior or if it should be updated to match documentation (which apparently prefers the first matching point in the list).


🏁 Script executed:

#!/bin/bash
# Let's see if there's documentation that explains the intersection finding logic
grep -r "find_headers_between" --include="*.md" --include="*.rs" .

Length of output: 1514


🏁 Script executed:

#!/bin/bash
# Extract the doc comment and signature of find_headers_between to verify intended behavior
grep -R -n -A5 -B10 "pub(super) fn find_headers_between" crates/amaru/src/stages/consensus/chain_forward/client_state.rs

Length of output: 760


🏁 Script executed:

#!/bin/bash
# Dump the implementation of find_headers_between to inspect intersection selection logic
sed -n '50,200p' crates/amaru/src/stages/consensus/chain_forward/client_state.rs

Length of output: 1700


Clarify intersection‐selection semantics in find_headers_between

Oi, mate—after digging into the impl, I see we walk backwards from the tip and pick the first matching ancestor (i.e. the tallest common point), not the first entry in the points array as the doc suggests. Let’s decide whether to:

  • Update the doc comment in client_state.rs (lines 52–56) to say “pick the nearest ancestor on the chain”
  • Or change the loop logic to scan your points slice in order and pick the first match

Also, the inline note in tests.rs (around lines 58–60) needs adjusting once we settle on the intended behavior.

• client_state.rs: amend the “first found point” description to reflect chain‐walk semantics
• tests.rs: update or remove the “does not match the documented behaviour” comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, will fix the docs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rkuhn Ripper! Thanks for sorting that out, mate. Documentation and code should be best mates, not strangers passing in the night. Let me know if ya need any more eyes on those doc changes when you've got 'em ready. Like they say in The Matrix, "The issue is not the spoon, it's the documentation about the spoon." Or something like that! 👍

┌─────────┐
│ ^ω^ │
│ CHEERS! │
└─────────┘

Comment on lines +181 to +184
let ActoInput::Message(Some((op, tip))) = cell.recv().await else {
return Ok(());
};
match op {
Copy link
Contributor

@coderabbitai coderabbitai bot Apr 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Potential dead‑air after await_reply()

After sending await_reply we block on cell.recv(). If the parent dies meanwhile we happily exit Ok(()), leaving the network side hanging. Maybe detect ActoInput::NoMoreSenders and close the chainsync stream with send_done so the client isn’t ghosted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abailly This remark from the code rabbit is uncanny: it does recognise how acto works (which is not the most popular library) while implying that dropping the chainsync server won’t be noticed by the client on the other side. Is closing the stream not signalled by the muxer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines +266 to +271
match header.parent() {
Some(parent) => assert_eq!(
Point::new(self.our_tip.0.slot_or_default(), parent.as_ref().to_vec()),
self.our_tip.0
),
None => assert_eq!(self.our_tip.0, Point::Origin),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Compilation blocker – Point::new doesn’t exist

pallas_network::miniprotocols::Point only has the Specific variant.
The assert won’t even compile, meaning your CI Guinness is flat before the first sip.

-Point::new(self.our_tip.0.slot_or_default(), parent.as_ref().to_vec()),
+Point::Specific(self.our_tip.0.slot_or_default(), parent.as_ref().to_vec()),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
match header.parent() {
Some(parent) => assert_eq!(
Point::new(self.our_tip.0.slot_or_default(), parent.as_ref().to_vec()),
self.our_tip.0
),
None => assert_eq!(self.our_tip.0, Point::Origin),
match header.parent() {
Some(parent) => assert_eq!(
Point::Specific(self.our_tip.0.slot_or_default(), parent.as_ref().to_vec()),
self.our_tip.0
),
None => assert_eq!(self.our_tip.0, Point::Origin),
}

@abailly abailly force-pushed the rk/downstream-server branch from 57af6ff to 0a5b8fd Compare April 20, 2025 17:18
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
crates/amaru/src/stages/consensus/chain_forward/tests.rs (1)

58-60: G'day! Let's chat about this intersection logic comment.

This note about the intersection selection behavior not matching docs is spot on. We're choosing the tallest common point (walking backwards from tip) rather than the first match in the provided points array.

This is the same issue flagged in the previous review - we need to decide which behavior is correct and either:

  1. Update the docs in client_state.rs to describe the current behavior, or
  2. Change the implementation to scan through points in order

Either way, this comment and the docs need to be aligned with the actual behavior, mate!

crates/amaru/src/stages/consensus/chain_forward.rs (1)

268-269: ⚠️ Potential issue

Critical compiler error - Point::new doesn't exist!

The CI's gonna chuck a wobbly over this one - Point::new isn't a valid constructor.

According to the pallas_network API, Point only has Point::Origin and Point::Specific variants, not a "new" method:

-Point::new(self.our_tip.0.slot_or_default(), parent.as_ref().to_vec()),
+Point::Specific(self.our_tip.0.slot_or_default(), parent.as_ref().to_vec()),
🧹 Nitpick comments (2)
crates/amaru/src/stages/consensus/chain_forward/tests.rs (1)

132-132: Fair warning about gasket shutdown behavior.

This note about not being able to cleanly shut down the gasket stage is a bit like that scene in Jurassic Park - "We've got a T-Rex!" - it's a known issue but still deserving of attention.

Consider investigating if there's a way to implement a cleaner shutdown mechanism that doesn't log errors, or alternatively, add a test-specific log filter to suppress these expected errors during testing. Might save future devs from scratching their heads wondering if something's actually gone wrong.

crates/amaru/src/stages/consensus/chain_forward.rs (1)

196-222: TCP listener background task looks right, but could use error handling improvements.

The TCP listener task is spawned as a background task, which is appropriate for this continuous operation. However, if the task ends due to a bind error or other issue, we won't know.

Consider adding a termination detection mechanism:

 let server = tokio::spawn(async move {
+    let result: Result<(), anyhow::Error> = async {
         loop {
             // due to the signature of TcpListener::accept, this is the only way to use this API
             // in particular, it isn't possible to poll for new peers within the `schedule` method
             let peer = match PeerServer::accept(&server, network_magic).await {
                 Ok(peer) => peer,
                 Err(e) => {
                     tracing::warn!(
                         target: EVENT_TARGET,
                         "error accepting peer: {e}",
                     );
+                    if e.is_io() && e.kind() == std::io::ErrorKind::BrokenPipe {
+                        return Err(anyhow::anyhow!("Listener connection broken: {}", e));
+                    }
                     continue;
                 }
             };

             match tx.send(peer).await {
                 Ok(_) => {}
                 Err(e) => {
                     tracing::info!(
                         target: EVENT_TARGET,
                         "dropping incoming connection: {e}"
                     );
+                    if tx.is_closed() {
+                        return Err(anyhow::anyhow!("Channel to worker closed"));
+                    }
                 }
             }
         }
+    }.await;
+    if let Err(e) = result {
+        tracing::error!(target: EVENT_TARGET, "Listener task terminated: {}", e);
+    }
 });

This would help identify if the listener task unexpectedly terminates during runtime.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 57af6ff and 0a5b8fd.

⛔ Files ignored due to path filters (3)
  • Cargo.lock is excluded by !**/*.lock
  • crates/amaru-consensus/Cargo.lock is excluded by !**/*.lock
  • crates/amaru/src/stages/consensus/chain_forward/chain_forward.png is excluded by !**/*.png
📒 Files selected for processing (17)
  • .gitignore (2 hunks)
  • Cargo.toml (1 hunks)
  • Makefile (3 hunks)
  • README.md (2 hunks)
  • crates/amaru-consensus/Cargo.toml (1 hunks)
  • crates/amaru/Cargo.toml (2 hunks)
  • crates/amaru/src/bin/amaru/cmd/daemon.rs (2 hunks)
  • crates/amaru/src/bin/amaru/cmd/mod.rs (1 hunks)
  • crates/amaru/src/stages/consensus/chain_forward.rs (2 hunks)
  • crates/amaru/src/stages/consensus/chain_forward/README.md (1 hunks)
  • crates/amaru/src/stages/consensus/chain_forward/chain_forward.d2 (1 hunks)
  • crates/amaru/src/stages/consensus/chain_forward/client_protocol.rs (1 hunks)
  • crates/amaru/src/stages/consensus/chain_forward/client_state.rs (1 hunks)
  • crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (1 hunks)
  • crates/amaru/src/stages/consensus/chain_forward/tests.rs (1 hunks)
  • crates/amaru/src/stages/mod.rs (7 hunks)
  • scripts/demo.sh (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • README.md
🚧 Files skipped from review as they are similar to previous changes (14)
  • crates/amaru/Cargo.toml
  • Cargo.toml
  • crates/amaru/src/bin/amaru/cmd/mod.rs
  • .gitignore
  • crates/amaru-consensus/Cargo.toml
  • scripts/demo.sh
  • crates/amaru/src/bin/amaru/cmd/daemon.rs
  • crates/amaru/src/stages/consensus/chain_forward/README.md
  • Makefile
  • crates/amaru/src/stages/consensus/chain_forward/client_state.rs
  • crates/amaru/src/stages/consensus/chain_forward/chain_forward.d2
  • crates/amaru/src/stages/consensus/chain_forward/test_infra.rs
  • crates/amaru/src/stages/consensus/chain_forward/client_protocol.rs
  • crates/amaru/src/stages/mod.rs
🧰 Additional context used
🧬 Code Graph Analysis (2)
crates/amaru/src/stages/consensus/chain_forward/tests.rs (4)
crates/amaru/src/stages/consensus/chain_forward/client_state.rs (2)
  • find_headers_between (57-93)
  • new (17-19)
crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (5)
  • hash (115-117)
  • mk_store (94-113)
  • get_height (59-62)
  • new (141-192)
  • get_point (54-57)
crates/amaru/src/stages/consensus/chain_forward.rs (2)
  • tip (170-175)
  • new (65-86)
crates/amaru/src/stages/mod.rs (5)
  • pallas_point (174-174)
  • pallas_point (178-180)
  • pallas_point (184-186)
  • as_tip (199-199)
  • as_tip (203-205)
crates/amaru/src/stages/consensus/chain_forward.rs (12)
crates/amaru/src/stages/consensus/chain_forward/client_protocol.rs (1)
  • client_protocols (37-64)
crates/amaru/src/stages/ledger.rs (4)
  • new (41-54)
  • bootstrap (141-143)
  • schedule (145-151)
  • execute (158-182)
crates/amaru/src/stages/consensus/chain_forward/client_state.rs (1)
  • new (17-19)
crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (4)
  • new (141-192)
  • fmt (311-327)
  • point (123-125)
  • eq (331-341)
crates/amaru/src/stages/consensus/fetch.rs (4)
  • new (42-52)
  • bootstrap (103-105)
  • schedule (107-114)
  • execute (116-122)
crates/amaru/src/stages/consensus/header.rs (4)
  • new (43-49)
  • bootstrap (90-92)
  • schedule (94-101)
  • execute (108-114)
crates/amaru-ledger/src/store.rs (2)
  • default (262-271)
  • tip (107-107)
crates/amaru-kernel/src/lib.rs (1)
  • slot_or_default (156-161)
crates/amaru-ledger/src/context.rs (4)
  • from (57-61)
  • from (71-73)
  • from (86-88)
  • from (98-100)
crates/amaru/src/stages/mod.rs (1)
  • bootstrap (68-151)
crates/amaru-ledger/src/rules/transaction.rs (1)
  • execute (79-190)
crates/amaru-ledger/src/rules/block.rs (1)
  • execute (61-128)
🔇 Additional comments (3)
crates/amaru/src/stages/consensus/chain_forward/tests.rs (1)

23-32: Ripper test coverage, mate! Comprehensive scenarios and edge cases.

The tests thoroughly cover the chain synchronization protocol with various scenarios:

  • Tip-to-tip (no headers needed)
  • Tip-to-branch (headers up to branch point)
  • Tip-to-multiple-branches (tallest branch selected)
  • Tip-to-lost-fork (all the way back to genesis)
  • Integration tests with client connection and chain events

This test suite gives us solid confidence in the chain forward implementation. Good on ya for covering both the happy paths and edge cases!

Also applies to: 34-51, 53-76, 78-89, 91-133, 135-172

crates/amaru/src/stages/consensus/chain_forward.rs (2)

47-55: ForwardStage configuration looks solid.

The expanded ForwardStage struct has all the necessary bits to manage downstream network connections, like a well-equipped Road Warrior ready for the outback.

Good job incorporating network configuration (magic, listen address, max peers) and the Tokio runtime into the stage. This provides all the necessary components for the TCP listener and client management.


347-382: Crikey! This client supervisor is a beauty.

The client supervisor uses actor supervision effectively to manage client lifecycle and enforce connection limits.

I like how this pattern:

  1. Enforces the max peers limit
  2. Supervises client handlers and gets notified when they terminate
  3. Broadcasts operations to all connected clients efficiently

It's as elegant as a synchronized swim routine in Mad Max: Fury Road!

Comment on lines +261 to +262
// FIXME: block height should be part of BlockValidated message
let store = stage.store.lock().await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

FIXME comments need addressing.

These TODOs about block height should be part of the BlockValidated message are like that hanging plot thread in The Last of Us 2 - they need to be resolved!

Consider updating the BlockValidationResult enum to include block height data, eliminating the need to look it up from the store. This would make the code more robust and efficient:

pub enum BlockValidationResult {
    BlockValidated(Point, u64, Span), // Added block height parameter
    RolledBackTo(Point, u64, Span),   // Added block height parameter
    BlockValidationFailed(Point, Span),
}

Also applies to: 296-297

Comment on lines +73 to +75
#[allow(clippy::expect_used)]
let runtime =
AcTokio::new("consensus.forward", 1).expect("failed to create AcTokio runtime");
Copy link
Contributor

@coderabbitai coderabbitai bot Apr 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Runtime creation with expect() is risky.

Using expect() for runtime creation could cause a panic if the runtime can't be created, which is a bit like driving without a seatbelt - probably fine, but not the safest approach.

Consider propagating the error instead:

-#[allow(clippy::expect_used)]
-let runtime =
-    AcTokio::new("consensus.forward", 1).expect("failed to create AcTokio runtime");
+let runtime = AcTokio::new("consensus.forward", 1)
+    .map_err(|e| format!("failed to create AcTokio runtime: {}", e))?;

Then update the new function signature to return a Result:

-pub fn new(
+pub fn new(
     downstream: Option<ActoRef<ForwardEvent>>,
     store: Arc<Mutex<dyn ChainStore<Header>>>,
     network_magic: u64,
     listen_address: &str,
     max_peers: usize,
     our_tip: Tip,
-) -> Self {
+) -> Result<Self, String> {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#[allow(clippy::expect_used)]
let runtime =
AcTokio::new("consensus.forward", 1).expect("failed to create AcTokio runtime");
pub fn new(
downstream: Option<ActoRef<ForwardEvent>>,
store: Arc<Mutex<dyn ChainStore<Header>>>,
network_magic: u64,
listen_address: &str,
max_peers: usize,
our_tip: Tip,
) -> Result<Self, String> {
let runtime = AcTokio::new("consensus.forward", 1)
.map_err(|e| format!("failed to create AcTokio runtime: {}", e))?;
// ... the rest of your initialization logic ...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We’re currently working in the mode that fatal errors tear down the node with a clear message — which is exactly what a Rust panic!() is made for. Ideally, this would be handled using Result passing and shutting down gracefully or in central places, but the codebase is just not there yet. So for now the .expect() should stay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rkuhn Fair dinkum, mate! I hear ya loud and clear on keeping the .expect() call. That makes perfect sense for your current development approach - using panics for critical failures lets ya get a clear message when things go pear-shaped, without having to rewrite half the codebase. Like in Mad Max when resources are scarce, ya gotta work with what ya have! I'll keep this in mind for future reviews.


✏️ Learnings added
Learnt from: rkuhn
PR: pragma-org/amaru#149
File: crates/amaru/src/stages/consensus/chain_forward.rs:73-75
Timestamp: 2025-04-20T18:02:25.049Z
Learning: In the current development stage, rkuhn prefers using explicit panics (via `.expect()` or `.unwrap()`) for fatal errors in the application code that would tear down the node, rather than propagating errors with `Result`. The intention is to eventually transition to proper error handling with `Result` as the codebase matures.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: rkuhn
PR: pragma-org/amaru#149
File: crates/amaru/src/stages/consensus/chain_forward/test_infra.rs:272-285
Timestamp: 2025-04-20T17:57:23.214Z
Learning: In test infrastructure code, rkuhn prefers explicit panics (using .unwrap() or similar) over returning Result types, as test failures should be immediate and obvious.

Comment on lines +322 to +330
// FIXME: gasket design bug that we only get &Unit and thus cannot take values from it without internal mutability
let peer = peer.borrow_mut().take();
if let Some(peer) = peer {
self.clients
.send(ClientMsg::Peer(peer, self.our_tip.clone()));
} else {
tracing::error!(target: EVENT_TARGET, "Unit::Peer was empty in execute");
}
Ok(())
Copy link
Contributor

@coderabbitai coderabbitai bot Apr 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Gasket design limitation with mutability.

This workaround for the gasket framework limitation is like when you're playing Skyrim with a bucket on your head to avoid NPC detection - creative but a bit dodgy.

The RefCell usage here is a workaround for a design limitation in the gasket framework. Can we raise this as an issue in the gasket repository to improve the API? It would be worth checking if there's already an open issue for this:


🌐 Web query:

Does gasket framework have issues with internal mutability or moving values out of references?

💡 Result:

There is no direct evidence in the search results indicating that the Gasket framework specifically has issues with internal mutability or with moving values out of references. None of the provided search results mention the Gasket framework or discuss its internal mutability or value movement semantics.

However, based on general knowledge of frameworks written in languages with strong mutability constraints (like Rust), issues related to internal mutability and moving values out of references are common topics. In Rust, for instance:

  • Internal Mutability: This is typically handled using types like Cell, RefCell, or smart pointers combined with interior mutability (e.g., Rc<RefCell<T>>). These allow mutation behind shared references, but at the cost of enforcing borrow rules at runtime rather than compile time. Misuse can lead to runtime panics or logic bugs if mutability is not correctly encapsulated[8][9].

  • Moving Values Out of References: Rust’s ownership model prohibits moving a value out of a reference (&T or &mut T) unless the value implements the Copy trait. Attempting to do so is a common source of compiler errors. Frameworks that manage data structures or state across boundaries must be careful to avoid attempting such moves, often opting for clones or using smart pointers to share data safely.

General Issues with Internal Mutability and Value Moving in Frameworks (not Gasket-specific):

  • Misuse of internal mutability (e.g., excessive or unsafe use of RefCell or unsafe code) can result in hidden bugs or runtime panics, especially when multiple aliases try to mutate data concurrently or reentrantly[8][9].
  • Moving values out of references when not allowed by the type system leads to compilation errors or, in some languages, undefined behavior[7].
  • Frameworks must carefully design their APIs to either avoid these pitfalls or provide ergonomic, well-documented patterns for mutating state or transferring ownership.

If you are worried about these issues in the context of the Gasket framework, checking its official documentation or codebase for specific handling of these Rust ownership and mutability semantics is advisable.

Summary Table: General Rust Mutability & Move Issues

Issue Typical Framework Handling Risks if Misused
Internal Mutability Cell, RefCell, smart pointers Borrowing panics, unpredictable bugs[8][9]
Moving values from references Cloning, using smart pointers Compilation failure, logic errors[7]

Conclusion:
No search result directly attributes mutability or reference movement issues to the Gasket framework. The concerns mentioned are general to systems programming and require careful design in any framework, including Gasket, if it is implemented in a language like Rust. For concrete details, reference the Gasket framework's documentation or codebase.

Citations:


Open a Gasket API improvement issue for this interior‑mutability workaround

I’ve scoured the issue tracker and there’s no quest log about internal mutability or moving values out of references in Gasket. Since our RefCell hack in chain_forward.rs feels like using the “arrow‑to‑the‑knee” workaround—clever but hacky—let’s file a feature request:

• File: crates/amaru/src/stages/consensus/chain_forward.rs
Lines: 322–330 (the RefCelltake() workaround for Unit::Peer)
• Describe the current limitation: you can’t move a value out of the unit without interior mutability
• Ask for a more ergonomic API to safely extract values from a Unit component

That way we can drop this bucket-on-the-head hack and keep our code as smooth as an Elden Ring rolling dodge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gasket has made this API choice because it allows scheduled work units to be retries. This is not used here, but changing the gasket API seems inadequate in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rkuhn Ah, fair dinkum mate! That's a ripper explanation about Gasket's design philosophy. Like a good gameplay mechanic, sometimes what looks like a quirk is actually a feature - the ability to retry work units is bloody useful even if you're not using it here.

Maybe we could chuck a quick comment above that RefCell to explain the pattern for future devs who might scratch their heads? Something like:

// RefCell used because Gasket's execute method receives &Unit references
// to support potential retries, requiring interior mutability to take ownership

No worries about changing the API then. It's like in Fallout when you discover that weird bug is actually an intentional feature! 👌


✏️ Learnings added
Learnt from: rkuhn
PR: pragma-org/amaru#149
File: crates/amaru/src/stages/consensus/chain_forward.rs:322-330
Timestamp: 2025-04-20T17:58:20.505Z
Learning: Gasket framework's Worker execute method intentionally takes references to work units (&Unit) rather than consuming them to support potential retries of scheduled work. This requires using interior mutability patterns like RefCell when values need to be moved out of the reference.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@abailly abailly force-pushed the rk/downstream-server branch from 0a5b8fd to 9bef6b2 Compare April 20, 2025 17:24
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (1)

147-150: Nice one on fixing the channel capacity issue, mate!

You've bumped those channel capacities from 1 to 8, solving that potential deadlock issue. Like Neo dodging bullets in the Matrix, your tests won't freeze up when messages pile up now.

🧹 Nitpick comments (5)
crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (5)

343-351: Consider increasing the timeout for CI environments.

The one-second timeout in block_on is shorter than a TikTok video. While fine for your local machine, it might cause flaky tests in CI environments where resources are shared and slower.

- rt.block_on(timeout(Duration::from_secs(1), f))
+ rt.block_on(timeout(Duration::from_secs(5), f))

35-47: Check for cycles in your blockchain traversal, mate.

Your get_chain function could go into an infinite loop faster than Bill Murray in Groundhog Day if there's a cycle in the header parent links. Add a visited set to break the cycle.

pub fn get_chain(&self, h: &str) -> Vec<Header> {
    let mut chain = Vec::new();
+    let mut visited = std::collections::HashSet::new();
    let mut current = hash(h);
    while let Some(header) = self.get(&current) {
+        if !visited.insert(current) {
+            tracing::warn!("Cycle detected in blockchain at {}", current);
+            break;
+        }
        chain.push(header.clone());
        let Some(parent) = header.parent() else {
            break;
        };
        current = parent;
    }
    chain.reverse();
    chain
}

196-196: That span name is a bit casual, isn't it?

The "whatever" span name is about as descriptive as calling your pet "Animal". Consider using something more meaningful for debugging and tracing, like "block_validation_test".

- let span = tracing::debug_span!("whatever");
+ let span = tracing::debug_span!("block_validation_test");

174-176: Panic messages could use more detail.

When things go pear-shaped, your panic message is as vague as a weather forecast. Consider adding more context about the expected and actual values.

- panic!("expected listening event, got {:?}", port);
+ panic!("Expected ForwardEvent::Listening, but received {:?}", port);

88-92: Great constants, but what's with the naming?

The constants are ripper for test data, but the naming scheme is more mysterious than the plot of Inception. Consider adding some comments to explain what CHAIN_47, TIP_47, etc. represent in your test scenarios.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 9bef6b2 and aa3c385.

📒 Files selected for processing (1)
  • crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (1 hunks)

Comment on lines +272 to +285
pub fn recv_n<const N: usize>(&mut self) -> [ClientMsg; N] {
let mut ops = Vec::new();
for _ in 0..N {
let msg = block_on(&self.runtime, self.client.chainsync().request_next()).unwrap();
match msg {
NextResponse::RollForward(header, tip) => {
ops.push(ClientMsg::Forward(from_cbor(&header.cbor).unwrap(), tip))
}
NextResponse::RollBackward(point, tip) => ops.push(ClientMsg::Backward(point, tip)),
NextResponse::Await => break,
}
}
ops.try_into().unwrap()
}
Copy link
Contributor

@coderabbitai coderabbitai bot Apr 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

This recv_n method is a bit ripper-risky, yeah?

The method will panic if it doesn't receive exactly N messages, which could be a right pain to debug when tests go sideways. Consider returning a Result instead or making it more resilient.

- pub fn recv_n<const N: usize>(&mut self) -> [ClientMsg; N] {
+ pub fn recv_n<const N: usize>(&mut self) -> Result<[ClientMsg; N], &'static str> {
    let mut ops = Vec::new();
    for _ in 0..N {
        let msg = block_on(&self.runtime, self.client.chainsync().request_next()).unwrap();
        match msg {
            NextResponse::RollForward(header, tip) => {
                ops.push(ClientMsg::Forward(from_cbor(&header.cbor).unwrap(), tip))
            }
            NextResponse::RollBackward(point, tip) => ops.push(ClientMsg::Backward(point, tip)),
            NextResponse::Await => break,
        }
    }
-    ops.try_into().unwrap()
+    ops.try_into().map_err(|_| "Not enough messages received")
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is test infrastructure, panic is exactly the right thing to do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rkuhn G'day mate! Fair dinkum - you're spot on there. Test infrastructure should absolutely panic when things go pear-shaped. It's like that scene in Alien where the alarm goes off - you want to know immediately when something's off!

Panics in test code make failures loud and obvious rather than silently returning errors that might get missed. Makes total sense to keep it as-is.

(•ᴗ•)⁻⁻⁻⁻


✏️ Learnings added
Learnt from: rkuhn
PR: pragma-org/amaru#149
File: crates/amaru/src/stages/consensus/chain_forward/test_infra.rs:272-285
Timestamp: 2025-04-20T17:57:23.214Z
Learning: In test infrastructure code, rkuhn prefers explicit panics (using .unwrap() or similar) over returning Result types, as test failures should be immediate and obvious.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

rkuhn and others added 3 commits April 20, 2025 20:05
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Roland Kuhn <[email protected]>
Signed-off-by: Roland Kuhn <[email protected]>
@rkuhn rkuhn force-pushed the rk/downstream-server branch from fd7a4e5 to 9b8b3cd Compare April 20, 2025 18:05
Turns out there's a hidden dependency in CI between the ledger.db and
the snapshot tests. While we find a way to make tests aware of the
LEDGER_DIR env variable, it seems simpler to just stick to default
dirs.

Signed-off-by: Arnaud Bailly <[email protected]>
@abailly abailly merged commit 911ecea into main Apr 21, 2025
13 checks passed
@abailly abailly deleted the rk/downstream-server branch April 21, 2025 03:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants