-
Notifications
You must be signed in to change notification settings - Fork 8
actor-based structure of downstream server #149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAlright, mate, here’s the lowdown: This update is a ripper—introducing a full-featured network peer and client management system to the consensus forwarding stage. It brings in async actors for handling chain sync, block fetch, and keepalive mini-protocols, all managed with the new Changes
Sequence Diagram(s)sequenceDiagram
participant Peer as Network Peer
participant Listener as TCP Listener
participant Stage as ForwardStage Worker
participant Supervisor as Client Supervisor
participant Protocol as Client Protocol Actors
Peer->>Listener: Connect to listen_address
Listener->>Stage: Send PeerServer via channel
Stage->>Supervisor: Forward PeerServer and Tip
Supervisor->>Protocol: Spawn protocol actors (chainsync, blockfetch, etc.)
Stage->>Supervisor: Broadcast block events (Forward/Backward)
Protocol->>Peer: Respond to protocol requests (e.g., chainsync)
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it makes sense to me but there's a lot of ground to tread and I think what would help me understand better the design is a diagram. Adding some design notes in the README
of the crate as part of this PR would be great.
Also, I'd assume you will add tests after having gathered consensus (ah! ah!) on the design?
info!( | ||
target: EVENT_TARGET, | ||
slot = point.slot_or_default(), | ||
hash = %Hash::<32>::from(point), | ||
"rolled_back_to" | ||
); | ||
|
||
// FIXME: block height should be part of BlockValidated message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point
|
||
enum TxSubmissionMsg {} | ||
|
||
async fn tx_submission( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need TxSubmission
at this stage so probably better to not clutter the code with it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I just wanted to sketch the whole range, we don’t need details here
store: &Arc<Mutex<dyn ChainStore<Header>>>, | ||
start_point: &Point, | ||
points: &[Point], | ||
) -> Option<(Vec<ClientOp>, Tip)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would extract the intersection logic from this module and put it closer to chain_selection
but that can happen later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking some more about it: this is not related to chain selection, it is only related to computing the intersection in chain sync. So I’d like to keep it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more comments
Tip(to_pallas_point(&header.point()), header.block_height()), | ||
)); | ||
} | ||
headers.push(ClientOp::Forward(header.clone())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where we'll need to revise the design later to only use indices/pointers to track each followers chain, because otherwise the intersection process could accumulate the whole chain as Op
s to send to the follower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abailly I added a README with a diagram, laying out the basic reasoning behind the approach.
|
||
enum TxSubmissionMsg {} | ||
|
||
async fn tx_submission( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I just wanted to sketch the whole range, we don’t need details here
// FIXME: block height should be part of BlockValidated message | ||
let store = stage.store.lock().await; | ||
if let Some(header) = store.load_header(&Hash::from(point)) { | ||
self.tip = Tip(to_pallas_point(point), header.block_height()); | ||
self.clients.send(ClientMsg::Op(ClientOp::Forward(header))); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed!
Ok(()) | ||
} | ||
Unit::Peer(peer) => { | ||
// FIXME: gasket design bug that we only get &Unit and thus cannot take values from it without internal mutability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. Hence the RefCell, I see.
9017fa4
to
6c6243d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (26)
README.md (1)
131-148
: Comprehensive Docker setup instructions - brilliant addition!The new section on setting up a Cardano node is clear and detailed. However, it might be worth mentioning the approximate disk space required for the blockchain and expected sync time - these can be rather hefty!
The final and crucial ingredient is to supply the environment variable `NETWORK=preprod`. With this, the node with start up, fetch the required config files, and then start syncing blocks from the network; this will take some time. + +Note: The preprod blockchain requires several GB of disk space and initial sync may take hours to complete. You can monitor the sync progress using the node logs.crates/amaru/src/sync.rs (1)
89-94
: Nice implementation of the ForwardStage instantiation!The updated instantiation now properly incorporates the new configuration parameters. I notice you're passing
None
as the first parameter (downstream), which suggests this is a terminal actor in your actor system - a bit like the last bloke in a pub crawl who doesn't have anyone to pass the pint to!One small thing to watch: you're casting
network_magic
fromu32
tou64
on line 92. That's perfectly fine for now, but worth documenting why the types differ between the config and the function parameter if there's a specific reason.Consider adding a brief comment explaining the type conversion if there's a specific architectural reason for the different integer sizes.
Makefile (1)
75-80
: Comprehensive dev target updateYou've done a cracking job updating the dev target with all the new parameters. This makes it much easier for developers to run the system with the default configuration.
Since you're building an actor-based system, you might consider adding a comment or additional target to demonstrate different configurations, such as running multiple nodes with different listen addresses. This could help developers understand how to set up a local test network.
crates/amaru-consensus/chain_forward.d2 (3)
9-38
: Consistent styling classes, but watch out for overuse of shadows.It might be simpler to exclude shadows for containers that don’t need a strong emphasis. Too many shadows can clutter the diagram visually—sometimes, less is more.
45-48
: Container label is a beaut, but consider descriptive naming."chain_forward" is short and sweet, but might you label it "Chain Forward Stage" for clarity?
- label: "chain_forward" + label: "Chain Forward Stage"
113-119
: Arrows fromclient_protocols
are clear, but consider extended annotation.Currently, these lines succinctly indicate how updates flow, but a brief mention in the arrow label about "miniprotocol" might make the transitions even more explicit for new readers.
crates/amaru-consensus/README.md (3)
23-24
: Watch the wording to keep it lively yet concise.You’ve used “needs to be deconstructed in order to allow concurrent usage” – a cheeky rephrase can make this tidier and match our colourful style. For instance:
- PeerServer needs to be deconstructed in order to allow concurrent usage + PeerServer must be broken apart to allow concurrent usage🧰 Tools
🪛 LanguageTool
[style] ~23-~23: Consider a shorter alternative to avoid wordiness.
Context: ...PeerServer
needs to be deconstructed in order to allow concurrent usage of the protocols...(IN_ORDER_TO_PREMIUM)
[uncategorized] ~24-~24: Possible missing comma found.
Context: ...he protocol instances is managed by one actor whose sole responsibility is to react t...(AI_HYDRA_LEO_MISSING_COMMA)
26-27
: Variety is the spice of life—tweak repetitive phrasing.You’ve used “needs to” again, which can feel a tad repetitive. Perhaps “must be separated” or “should be split off” would ring more melodious.
🧰 Tools
🪛 LanguageTool
[style] ~26-~26: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: .... In the case ofchainsync
this actor needs to obtain information regarding the curren...(REP_NEED_TO_VB)
[style] ~26-~26: Consider a shorter alternative to avoid wordiness.
Context: ...r operations (roll forward or backward) in order to respond to requests for the next header...(IN_ORDER_TO_PREMIUM)
[style] ~27-~27: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ion updates, these two responsibilities need to be separated into two actors: - `chain...(REP_NEED_TO_VB)
37-38
: Slap in a comma for clarity, mate.A small punctuation cameo ensures readers don’t run out of breath mid-sentence.
- ...supervisor there is an additional level... + ...supervisor, there is an additional level...🧰 Tools
🪛 LanguageTool
[uncategorized] ~37-~37: A comma might be missing here.
Context: ...s for one client and the overall client supervisor there is an additional level of per-cli...(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
crates/amaru-consensus/src/chain_forward/tests.rs (2)
97-114
: Consider adding negative tests for edge cases.It might be beneficial to test scenarios like “file not found” or “invalid JSON” to confirm robust error handling. Currently,
mk_store
unwraps, which can cause panics if the chain file is missing.
270-278
: No graceful teardown leads to logs at ERROR level.It’s not exactly a big worry, but you might add a nice little shutdown signal for the gasket stage, so you don’t get spammed with error logs. This can come in handy for diagnosing real errors.
crates/amaru-consensus/src/chain_forward/client_state.rs (1)
62-66
:todo!()
usage for Backward operations is a bit of a cliff-hanger.We’re all for a bit of suspense, but leaving it unimplemented too long might cause confusion. Recommend either adding a comment with the plan or an issue reference so the team knows how to proceed.
Do you fancy me drafting a minimal partial rollback approach?
crates/amaru-consensus/src/chain_forward/client_protocol.rs (8)
38-40
: Single-variant enum might be limiting.
Consider expandingClientProtocolMsg
if you foresee new message types in future. Otherwise, it’s fine for now.
42-69
: Well-arranged spawns for parallel tasks.
The approach to block fetching, transaction submission, keep-alive, and chain-sync is quite spiffy. Yet, you might want logging statements for each supervised task's lifecycle. Helps debug if tasks misbehave.
78-136
: Asynchronous chain sync logic is sound.
The flow for waiting on intersections and bridging ops to the handler is well-designed. Consider a fallback or timeout approach if a chain intersection is delayed indefinitely—just in case the client dozes off.
138-187
:chain_sync_handler
covers the request flow effectively.
Grander concurrency concerns seem addressed by the cell-based architecture. If the client’s requests become more complex, a break-down of sub-handlers might keep the code tidy.
189-195
: Simple transformation toHeaderContent
.
Hardcodingvariant=1
is okay if you’re certain this never changes. Otherwise, you might consider parameterising.
197-211
:block_fetch
is a placeholder.
Ensure you handle large block ranges properly when you do implement this. A little buffering or streaming approach is typically wise.
213-224
:tx_submission
skeleton.
Nothing to add for now—perhaps consider bounding your queue for transactions if you suspect a spam scenario.
226-235
:keep_alive
loop is minimal but serviceable.
Make sure a misbehaving or unresponsive client doesn’t cause indefinite resource usage. A cheeky little idle-timeout could be nice.crates/amaru-consensus/src/chain_forward.rs (6)
59-63
:ForwardEvent
is a handy enumeration of events.
It clearly delineates listening from forwarding. Might consider an additional event if we’re expecting more connection states.
84-87
:Unit
enum is a neat approach.
Bundling either a peer or a block helps unify the schedule logic. Be cautious of potential expansions—trying to shoehorn in too many variants can get messy.
104-154
:bootstrap
sets up the server and spawns the client actor.
It’s good that we blink the listening event todownstream
. Keep an eye on security—limiting or filtering accepted peers may be wise.
156-170
:schedule
method concurrently awaits new blocks or new peers.
A parallel fetch strategy is pinned well—just ensure you’re handling exhaustion of the mpsc channel gracefully.
171-247
: Solidexecute
method handles block events and peer arrivals.
Implementation covers validated blocks, rollbacks, and new peer setup. The scatteredFIXME
notes about block heights and partial storage coverage are worth addressing soon.
250-285
:client_supervisor
elegantly manages multiple clients.
Broadcasting aClientOp
to all connected clients is neat, though it could become chatty if you have loads of peers. Possibly consider a more selective dispatch approach if scaling up.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
Cargo.lock
is excluded by!**/*.lock
crates/amaru-consensus/Cargo.lock
is excluded by!**/*.lock
crates/amaru-consensus/chain_forward.png
is excluded by!**/*.png
📒 Files selected for processing (14)
.gitignore
(2 hunks)Cargo.toml
(2 hunks)Makefile
(2 hunks)README.md
(2 hunks)crates/amaru-consensus/Cargo.toml
(2 hunks)crates/amaru-consensus/README.md
(1 hunks)crates/amaru-consensus/chain_forward.d2
(1 hunks)crates/amaru-consensus/src/chain_forward.rs
(5 hunks)crates/amaru-consensus/src/chain_forward/client_protocol.rs
(1 hunks)crates/amaru-consensus/src/chain_forward/client_state.rs
(1 hunks)crates/amaru-consensus/src/chain_forward/tests.rs
(1 hunks)crates/amaru/src/bin/amaru/cmd/daemon.rs
(2 hunks)crates/amaru/src/bin/amaru/cmd/mod.rs
(1 hunks)crates/amaru/src/sync.rs
(2 hunks)
🧰 Additional context used
🧬 Code Definitions (3)
crates/amaru/src/sync.rs (1)
crates/amaru-consensus/src/chain_forward.rs (1)
new
(66-81)
crates/amaru-consensus/src/chain_forward.rs (6)
crates/amaru-consensus/src/chain_forward/client_protocol.rs (1)
client_protocols
(42-69)crates/amaru-consensus/src/chain_forward/client_state.rs (3)
to_pallas_point
(121-126)new
(27-37)tip
(45-60)crates/amaru-consensus/src/consensus/header_validation.rs (1)
new
(74-84)crates/amaru-consensus/src/consensus/store/rocksdb.rs (1)
new
(16-28)crates/ouroboros-traits/src/is_header/fake.rs (2)
point
(44-46)slot
(40-42)crates/ouroboros-traits/src/is_header/mod.rs (5)
point
(33-35)slot
(45-45)slot
(69-71)slot
(87-89)hash
(28-30)
crates/amaru-consensus/src/chain_forward/client_state.rs (2)
crates/amaru-consensus/src/chain_forward.rs (1)
new
(66-81)crates/amaru-consensus/src/chain_forward/tests.rs (2)
point
(89-91)hash
(62-64)
🪛 LanguageTool
crates/amaru-consensus/README.md
[style] ~23-~23: Consider a shorter alternative to avoid wordiness.
Context: ... PeerServer
needs to be deconstructed in order to allow concurrent usage of the protocols...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~24-~24: Possible missing comma found.
Context: ...he protocol instances is managed by one actor whose sole responsibility is to react t...
(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~26-~26: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: .... In the case of chainsync
this actor needs to obtain information regarding the curren...
(REP_NEED_TO_VB)
[style] ~26-~26: Consider a shorter alternative to avoid wordiness.
Context: ...r operations (roll forward or backward) in order to respond to requests for the next header...
(IN_ORDER_TO_PREMIUM)
[style] ~27-~27: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ion updates, these two responsibilities need to be separated into two actors: - `chain...
(REP_NEED_TO_VB)
[uncategorized] ~37-~37: A comma might be missing here.
Context: ...s for one client and the overall client supervisor there is an additional level of per-cli...
(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Build on windows-latest with target x86_64-pc-windows-msvc
- GitHub Check: Build on windows-latest with target x86_64-pc-windows-msvc
🔇 Additional comments (28)
.gitignore (3)
17-17
: Database directory structure changes - brilliant!The change from
ledger.db
to/*ledger.db/
restricts ignore patterns to the root directory and indicates these are now directories rather than files. This is a clever way to allow for nested database structures while maintaining clean git tracking. Good stuff!
20-20
: Same pattern applied consistently - top job!Applying the same pattern modification to
chain.db
ensures consistency across database structures. Like a well-paired set of Wellington boots, these changes match perfectly!
33-35
: Adding Cardano node support files to gitignore - spot on!These new ignore patterns align beautifully with the Cardano node setup instructions added in the README. Keeping these potentially large directories out of version control is as sensible as bringing an umbrella to Manchester!
README.md (1)
23-23
: Improved cross-reference in documentation - handy addition!Adding the reference to the bottom of the page provides a lovely little breadcrumb trail for users to follow. It's like leaving a trail of biscuit crumbs to the tea pot - quite thoughtful!
crates/amaru/src/bin/amaru/cmd/mod.rs (1)
28-30
:❓ Verification inconclusive
New default listen address constant - good addition but mind the security!
Setting "0.0.0.0:3001" as the default listen address will bind to all network interfaces. While this makes sense for flexibility, it's like leaving your garden gate wide open - anyone could wander in!
Ensure that firewall rules are properly set up when deploying in production environments, or consider adding a note in the documentation about securing this port.
🏁 Script executed:
#!/bin/bash # Check if there's any authentication or TLS configuration for this listening port # which would mitigate the security concerns rg -A 5 -B 5 "DEFAULT_LISTEN_ADDRESS" --glob "*.rs" rg "bind|listen|tls|auth|secure" --glob "*.rs" | grep -i "3001"Length of output: 1781
Security Check Required – Please Verify Network Safeguards
The introduction of the new default listen address is a neat way to centralise configuration, but binding to all interfaces at
"0.0.0.0:3001"
could resemble leaving your front door wide open. My recent search didn’t uncover any evidence of TLS, authentication, or similar security measures being applied to this port. This might well be a benign omission in the code, but it does raise an eyebrow.
- In
crates/amaru/src/bin/amaru/cmd/mod.rs
: The constantDEFAULT_LISTEN_ADDRESS
is defined as"0.0.0.0:3001"
.- In
crates/amaru/src/bin/amaru/cmd/daemon.rs
: This constant is used as a default value for the listen address.- No immediate security-related configurations (e.g. TLS or auth) associated with port 3001 appear to be present.
Could you kindly perform a manual verification to confirm that your deployment environment (or additional modules) handles network security appropriately through firewall rules or other means? A note in the documentation regarding securing this port might also be beneficial.
Cargo.toml (2)
55-55
: Enhanced tracing with env-filter - smashing improvement!Adding the env-filter feature to tracing-subscriber allows for more flexible logging configuration via environment variables. This is as handy as a pocket on a shirt - perfect for toggling different log levels during development and debugging!
This change will make it easier to diagnose issues in different environments without recompiling.
18-18
:✅ Verification successful
Adding acto dependency - embracing the actor model!
This addition aligns perfectly with the PR objective of implementing an actor-based structure. The acto crate with tokio features is a cracking choice for building asynchronous actor systems.
It's worth ensuring this matches the overall architectural direction and that team members are familiar with the actor model pattern - it's a bit like switching from driving on the left to the right if you're not used to it!
🌐 Web query:
What features and benefits does the acto 0.7.2 Rust crate provide for actor-based systems?
💡 Result:
The Acto crate (version 0.7.2) is not directly covered in the provided search results. However, based on prominent Rust actor frameworks like Actix and Axiom discussed in the sources, here are key features and benefits typical to modern Rust actor systems:
Core Features of Actor Frameworks (General)
Concurrent Message Handling
Actors process messages asynchronously with thread-safe communication, using address systems (Addr
/Recipient
) for message routing[3][9].Lifecycle Management
Built-in state management for actor initialization (started
), graceful shutdown (stopping
), and cleanup (stopped
)[3][4].Supervision
Parent-child hierarchies for error recovery and system resilience[4][6].Typed Messaging
Compile-time type safety for messages withoutAny
type usage[4][9].Async/Sync Support
Hybrid execution models (async for I/O-bound tasks, sync for CPU-bound operations)[4][9].
Actix Framework Highlights (v0.13.5)
// Example actor definition use actix::{Actor, Context, Handler, Message}; struct MyActor; impl Actor for MyActor { type Context = Context<Self>; } #[derive(Message)] #[rtype(result = "String")] struct MyMessage; impl Handler<MyMessage> for MyActor { type Result = String; fn handle(&mut self, msg: MyMessage, _ctx: &mut Context<Self>) -> Self::Result { "Received message".into() } }
Feature Benefit Tokio Runtime Built on Tokio for efficient async I/O[4] Futures Integration Native async/await support via futures[3][9] Local/Thread Context Flexible execution in thread-bound or shared contexts[4] Actor Supervision Automatic restarts for failed actors[4]
Axiom Framework Features
// Error handling example async fn processor(state: State, _ctx: Context, msg: Message) -> AxiomResult<State> { if msg.should_fail() { Err(AxiomError::new("Critical failure"))?; } Ok(Status::done(state)) }
Feature Benefit Panic Tolerance Converts panics to errors for fault isolation[6] Async Processors First-class async/await support[6] Runtime Agnostic Works with multiple async runtimes via traits[6][16] Distributed Design Inspired by Akka/Erlang for cluster-ready systems[6]
Comparison Table
Aspect Actix Axiom Runtime Tokio-bound[4] Runtime-agnostic[6] Error Handling Supervision[4] Panic conversion[6] Async Model Futures-based[9] Async processors[6] Learning Curve Moderate Higher (Akka-like) While the specific
acto 0.7.2
crate isn't detailed here, modern Rust actor systems generally provide type-safe concurrency, lifecycle management, and runtime flexibility. For production use, Actix offers battle-tested Tokio integration[3][4], while Axiom provides Erlang-style resilience for complex systems[6].Citations:
- 1: https://doc.rust-lang.org/cargo/reference/features.html
- 2: https://gencmurat.com/en/posts/crates-in-rust/
- 3: https://actix.rs/docs/actix/actor
- 4: https://docs.rs/crate/actix/latest
- 5: https://dev.to/alexmercedcoder/introduction-to-cargo-and-cargotoml-2l86
- 6: https://docs.rs/axiom/latest/axiom/
- 7: https://ohadravid.github.io/posts/2024-12-state-of-the-crates/
- 8: https://blog.turbo.fish/cargo-features/
- 9: https://github.com/actix/actix
- 10: https://doc.rust-lang.org/cargo/reference/resolver.html
- 11: https://users.rust-lang.org/t/how-to-publish-a-crate-with-docs-that-depends-on-features/29711
- 12: https://lib.rs/concurrency
- 13: https://github.com/rust-lang/cargo/blob/master/Cargo.toml
- 14: https://docs.rs/acto
- 15: https://www.redpacketsecurity.com/cisa-vulnerability-summary-for-the-week-of-november-4-2024/
- 16: https://docs.rs/agnostic
- 17: Features of dependencies are enabled if they're enabled in build-dependencies; breaks no_std libs rust-lang/cargo#5730
- 18: https://www.cisa.gov/news-events/bulletins/sb25-026
- 19: https://crates.io/crates/acto
- 20: https://crates.io/crates/utoipa
Dependency Update: Embracing the Actor Model
The addition of the acto dependency (version 0.7.2 with Tokio features) fits perfectly with the PR's goal of introducing an actor-based architecture. Although direct documentation for this specific version wasn't readily available, similar modern Rust actor frameworks typically offer robust asynchronous message handling, lifecycle management, and supervision, all of which align nicely with our design intentions.
Please ensure that this direction is in sync with our overall architectural strategy and that everyone on the team is comfortable with the actor model approach – a bit like making sure you know whether to drive on the left or the right!
File: Cargo.toml (Line 18)
acto = { version = "0.7.2", features = ["tokio"] }crates/amaru/src/sync.rs (1)
52-53
: Lovely additions to the Config struct!These new fields are crucial for the actor-based structure. The
network_magic
will identify the specific network, while thelisten_address
allows the node to accept incoming connections. Good job keeping the types consistent with their intended use.crates/amaru-consensus/Cargo.toml (3)
16-16
: Adding the acto dependency - brilliant!Adding the
acto
crate aligns perfectly with the actor-based structure mentioned in the PR objectives. This will provide the foundation for your actor system.
18-18
: Clean formatting on the gasket featuresNice tidy-up on the formatting here. Those extra spaces can be as annoying as getting a pebble in your shoe!
40-45
: Good additions to dev dependenciesAdding
minicbor
,serde_json
, andtracing-subscriber
makes perfect sense for an actor-based system. These will help with serialization, debugging, and observability - absolutely essential when you're building a distributed system like this.crates/amaru/src/bin/amaru/cmd/daemon.rs (2)
54-56
: Great addition of the listen_address parameter!The clear documentation and sensible default value make this a very user-friendly addition. This parameter is crucial for allowing incoming connections in your actor-based system.
113-114
: Proper configuration handling - spot on!You've correctly added the new parameters to the Config struct. The
network_magic
is derived from the network, and thelisten_address
is passed through from the args. This ensures your actor system has all the information it needs to operate correctly.Makefile (3)
6-10
: Clever variable organization!Creating these variables makes the Makefile more maintainable and consistent. I particularly like how you've set up
LEDGER_DIR
andCHAIN_DIR
to use theDB_PREFIX
, which allows users to easily customize the database locations. The default value forLISTEN_ADDRESS
as0.0.0.0:0
is a good choice for development as it'll let the OS assign an available port.
69-71
: Handy clear-db target - lifesaver!This new target will be incredibly useful during development and testing. Being able to quickly clear the database with a single command is as refreshing as a cold pint on a hot day!
72-72
: Smart update to the bootstrap targetMaking
bootstrap
depend onclear-db
ensures a clean slate before importing data - that's a proper way to avoid stale data issues. This kind of attention to detail makes for a much smoother developer experience.crates/amaru-consensus/chain_forward.d2 (1)
1-7
: Graceful introduction, well done!These lines effectively set the stage for the diagram with a clear title and styling instructions. Lovely job making it stand out with the bold text and larger font size.
crates/amaru-consensus/src/chain_forward/tests.rs (2)
1-16
: Overall test structure looks magic.The mixture of
ChainStore
,ForwardEvent
, and runtime creation is well aligned. Great job setting up your test environment.
180-193
: Potential concurrency gotchas.You spawn an actor to forward downstream traffic, which is brilliant. However, consider logging or handling the scenario where the channel might fail to send if the receiver is dropped unexpectedly. That’ll help diagnose issues if a rogue drop happens.
Would you like to add a small check or test to handle the
Result
ofport_tx.send
in case the channel is closed?crates/amaru-consensus/src/chain_forward/client_state.rs (2)
8-12
: Choice ofpub(super)
forClientOp
is sound.This restricts usage to within the module, preserving your architecture’s secrecy. Good show, mate!
96-98
: Check for a potential short-circuit.When you find a matching point, you return, which is perfect. Just ensure we don’t skip interesting headers that might appear between multiple matching points. If the design only expects one match, you’re grand.
crates/amaru-consensus/src/chain_forward/client_protocol.rs (3)
1-15
: Imports appear sensible and well-structured.
They collectively handle concurrency, messaging, and the chain store interactions. No immediate concerns here, mate.
16-36
: Good use ofthiserror
for structured errors.
This enum neatly categorises client errors. Keep an eye on the expansions to ensure they remain comprehensible.
71-77
:ChainSyncMsg
: straightforward, no fuss.
Defining an enum for ledger operations is spot on—makes the chain-sync code more readable.crates/amaru-consensus/src/chain_forward.rs (4)
16-27
: Imports are logically grouped and relevant.
Locks, concurrency utilities, actor frameworks—all properly set up. No hairy bits spotted.
49-57
:ForwardStage
gains new fields nicely.
Introducingnetwork_magic
,runtime
,listen_address
, anddownstream
is wise to unify your chain-forward logic.
66-81
:ForwardStage::new
is quite straightforward.
All needed fields are properly initialised. Cheers for usingAcTokio::new(...)
, but do be mindful of error handling if the runtime fails to initialise.
89-101
:Worker
struct and itsDrop
are tidy.
Aborting the server on drop is a practical approach, preventing stray tasks. Just watch for partial shutdown states.
What's up with this PR? Is it abandoned / superseded? |
So far I didn’t get any thumbs-up — how does this work here? |
I think the rule is that if 1/ the CI is green and 2/ there's been reviews without any blocking comments, you should be good to go. If this works, I would like to merge it so that at least we have some ability to propagate headers downstream. |
1900493
to
661ba26
Compare
aye aye; rebased across the move, will now add some basic client limits in place and then make CI green |
45d7bd2
to
74a52ab
Compare
Alright, I pushed this as far as I can without diving into rabbit holes. Any ideas about the remaining CI failure? |
I agree the message is not very explicit
|
I think the problem is that you've set the directories where ledger and chain are restored explicitly in the |
Currently only keepalive and chainsync protocols are supported. There is a smoke test for the latter and the assumption that the former is tested in pallas. Some work has also been done in the Makefile to make it easier to have multiple Amaru nodes running in the project root directory. An architecture diagram has been added to the amaru-consensus README. Signed-Off-By: Roland Kuhn <[email protected]>
For now this is a simple cmdline argument that just ignores peers beyond a certain count. Proper handling will need to include idle timers etc. to kick out downstream peers that are just squatting. Signed-Off-By: Roland Kuhn <[email protected]>
also fix some bugs along the way, of course Signed-off-by: Roland Kuhn <[email protected]>
Signed-off-by: Roland Kuhn <[email protected]>
Signed-off-by: Roland Kuhn <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (17)
crates/amaru/src/bin/amaru/cmd/mod.rs (1)
28-29
: Validate the default listen address with strong typesNice one centralizing the default listen address. Consider switching from a raw string to
SocketAddr
with Clap’svalue_parser
orstd::net::SocketAddr
parsing to catch bad addresses early and improve clarity..gitignore (1)
39-42
: Remove duplicate snapshot ignoreYou’ve already got
snapshots/
ignored earlier. The new/snapshots/
is a near-twin—consider pruning one to keep the list DRY.crates/amaru/src/stages/consensus/chain_forward/tests.rs (1)
132-132
: Consider improving the shutdown process for testsThat comment about ERROR logs when shutting down the gasket stage is a bit concerning. Like having a game that always crashes when you try to exit properly!
It might be worth investigating a cleaner shutdown approach to avoid polluting test logs with errors, even if they're expected.
crates/amaru/src/stages/consensus/chain_forward/client_state.rs (1)
57-93
: Solid header finding algorithm, but could benefit from a small optimizationThis
find_headers_between
function is doing a fair dinkum job of traversing the chain backwards to find matching points. The early return for exact matches is a beauty.However, for long chains, we might be creating a lot of temporary objects with all those
clone()
calls. When we're building up a potentially large vector of headers, this could get a bit expensive.- headers.push(ClientOp::Forward(header.clone(), header.as_tip())); + // Create ClientOp just once at the end after we've collected all the headers + let header_ref = header; + headers.push(header_ref);And then transform the headers into ClientOps right before returning. Just like how you'd batch craft items in "Breath of the Wild" instead of crafting them one by one!
Makefile (1)
76-82
: Cracking command-line flexibility!You've done a smashing job updating the dev target with all these configurable options. Now users can properly configure directories and listen address. It's more customizable than a Fallout character creation screen!
One thing though - it might be helpful to add a comment explaining what a valid
LISTEN_ADDRESS
looks like for newcomers.8 DB_PREFIX ?= amaru 9 LISTEN_ADDRESS ?= 0.0.0.0:0 # Format: <ip>:<port>. Use 0.0.0.0:0 for random port on all interfaces.crates/amaru/src/stages/consensus/chain_forward/README.md (1)
22-36
: Ace actor decomposition rationale!Deadly explanation of why you've split the responsibilities between
chain_sync
andchain_sync_handler
. I particularly appreciate you noting this is "the minimum required complexity" - shows you've really thought about the trade-offs.Just a tiny language improvement for readability:
- The `PeerServer` API bundles miniprotocol instances for each of the supported protocols, but unfortunately, using these instances via the `PeerServer` requires an exclusive reference, so the `PeerServer` needs to be deconstructed in order to allow concurrent usage of the protocols (like concurrently waiting for requests). + The `PeerServer` API bundles miniprotocol instances for each of the supported protocols, but unfortunately, using these instances via the `PeerServer` requires an exclusive reference, so the `PeerServer` needs to be deconstructed to allow concurrent protocol usage (like concurrently waiting for requests).Makes it read a bit smoother, like a well-edited cutscene!
🧰 Tools
🪛 LanguageTool
[style] ~23-~23: Consider a shorter alternative to avoid wordiness.
Context: ...PeerServer
needs to be deconstructed in order to allow concurrent usage of the protocols...(IN_ORDER_TO_PREMIUM)
[uncategorized] ~24-~24: Possible missing comma found.
Context: ...he protocol instances is managed by one actor whose sole responsibility is to react t...(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~26-~26: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: .... In the case ofchainsync
this actor needs to obtain information regarding the curren...(REP_NEED_TO_VB)
[style] ~26-~26: Consider a shorter alternative to avoid wordiness.
Context: ...r operations (roll forward or backward) in order to respond to requests for the next header...(IN_ORDER_TO_PREMIUM)
[style] ~27-~27: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ion updates, these two responsibilities need to be separated into two actors: - `chain...(REP_NEED_TO_VB)
crates/amaru/src/stages/consensus/chain_forward/chain_forward.d2 (1)
40-126
: Comprehensive component architecture visualizationThis is a cracking good diagram definition that perfectly complements the README. The way you've structured the components and their relationships gives a crystal clear picture of the system's architecture.
The markdown text blocks within components add just the right level of detail without cluttering the diagram. It's like one of those beautiful Assassin's Creed maps that shows you exactly what you need to know!
One suggestion - consider adding a brief comment at the top explaining how to generate the actual PNG from this d2 file for those who aren't familiar with the tool.
+# Generate diagram with: d2 chain_forward.d2 chain_forward.png +# (Install d2 from https://d2lang.com if needed) + title: { label: "internal design of the chain_forward stage"crates/amaru/src/stages/mod.rs (2)
108-115
: ForwardStage getsu64
, config givesu32
– useInto
instead of a castThe explicit
as u64
hides a silent truncation bug if we ever exceedu32::MAX
.
UseInto::<u64>
(or make the field itselfu64
) so the compiler yells when it should.- config.network_magic as u64, + u64::from(config.network_magic),
173-196
: These helper traits are brilliant – consider moving them to a shared util
PallasPoint
,AsTip
, andto_pallas_point
are generic goodies that other modules could reuse (I already spotted a couple of conversions elsewhere).
Sticking them into acrate::utils::point
module will keep duplication away and make tests happier.crates/amaru/src/stages/consensus/chain_forward/client_protocol.rs (3)
81-85
: Early‑bird clients get the worm… or a polite error?Right now we bail with
EarlyRequestNext
when the client forgets to ask for intersection first. That’s spec‑correct, but maybe over‑strict for a public relay. Consider logging & closing silently instead of bubbling an error upstream if you don’t want noisy telemetry.
86-93
: Heavy mutex lock inside hot path
find_headers_between
grabsstore.lock().await
for the whole computation. On a busy box with dozens of downstreams this will serialise every intersection search.
Clone the needed headers outside the lock or expose a lock‑free read API inChainStore
to keep things zippy like Sonic on a sugar rush.
204-210
: Hard‑codingvariant = 1
smells like troubleShelley headers are variant = 1 today, but once Voltaire rocks up we’ll have more flavours than a gelato bar. Maybe derive the variant from the header itself (or a const enum) instead of freezing it.
crates/amaru/src/stages/consensus/chain_forward.rs (2)
195-223
: Name shadowing hides the listener – rename for clarityWe create
let server = TcpListener…
and then immediatelylet server = tokio::spawn…
.
Totally legal, but future Ryan will scratch his head figuring out which “server” is which. A tiny rename keeps the code as crisp as a freshly‑poured Guinness.-let server = TcpListener::bind(&stage.listen_address).await.or_panic()?; -… -let server = tokio::spawn(async move { +let listener = TcpListener::bind(&stage.listen_address).await.or_panic()?; +… +let listener_task = tokio::spawn(async move { loop { - let peer = match PeerServer::accept(&server, network_magic).await { + let peer = match PeerServer::accept(&listener, network_magic).await {
337-379
: Broadcast loop could clog if a single client stalls
for client in clients.values() { client.send(…) }
awaits each send in series.
A slow peer will block everyone like that one mate who hogs the controller. Fire‑and‑forget withtry_send
or spawn per‑client tasks to keep the queue flowing.crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (3)
88-93
: Tiny nit: constant naming vs. file path is a tad confusing
CHAIN_47
points attests/data/chain41.json
– that mismatch can trip the next dev reading this (I almost did a double‑take like Marty in “Back to the Future”).
Either rename the constant or bump the file so the numbers line up.
343-351
: One‑second timeout is pretty tight for CI runners
block_on
wraps every future intimeout(Duration::from_secs(1), f)
.
On a busy GitHub runner or a local box under load, a single context switch can exceed that, leading to flaky red builds faster than Sonic on caffeine.Consider making this configurable or bumping it to 5‑10 s:
-rt.block_on(timeout(Duration::from_secs(1), f)) +const TEST_TIMEOUT_SECS: u64 = 5; +rt.block_on(timeout(Duration::from_secs(TEST_TIMEOUT_SECS, f))
258-269
: Graceful exit on peer disconnect would improve resilience
recv_until_await
breaks only onAwait
, but if the peer closes the socket you get anErr
and silently exit the loop, returning whatever ops were collected.
Might be worth logging that error (or bubbling it up) so future debugging isn’t a wild Chocobo chase.-while let Ok(response) = block_on(&self.runtime, self.client.chainsync().request_next()) { +while let Ok(response) = + block_on(&self.runtime, self.client.chainsync().request_next()) +{ match response { ... } } // Else: log/propagate the disconnection error
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
Cargo.lock
is excluded by!**/*.lock
crates/amaru-consensus/Cargo.lock
is excluded by!**/*.lock
crates/amaru/src/stages/consensus/chain_forward/chain_forward.png
is excluded by!**/*.png
examples/Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (17)
.gitignore
(2 hunks)Cargo.toml
(1 hunks)Makefile
(3 hunks)README.md
(2 hunks)crates/amaru-consensus/Cargo.toml
(1 hunks)crates/amaru/Cargo.toml
(2 hunks)crates/amaru/src/bin/amaru/cmd/daemon.rs
(2 hunks)crates/amaru/src/bin/amaru/cmd/mod.rs
(1 hunks)crates/amaru/src/stages/consensus/chain_forward.rs
(2 hunks)crates/amaru/src/stages/consensus/chain_forward/README.md
(1 hunks)crates/amaru/src/stages/consensus/chain_forward/chain_forward.d2
(1 hunks)crates/amaru/src/stages/consensus/chain_forward/client_protocol.rs
(1 hunks)crates/amaru/src/stages/consensus/chain_forward/client_state.rs
(1 hunks)crates/amaru/src/stages/consensus/chain_forward/test_infra.rs
(1 hunks)crates/amaru/src/stages/consensus/chain_forward/tests.rs
(1 hunks)crates/amaru/src/stages/mod.rs
(7 hunks)scripts/demo.sh
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
crates/amaru/src/stages/consensus/chain_forward/tests.rs (4)
crates/amaru/src/stages/consensus/chain_forward/client_state.rs (2)
find_headers_between
(57-93)new
(17-19)crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (5)
hash
(115-117)mk_store
(94-113)get_height
(59-62)new
(141-192)get_point
(54-57)crates/amaru/src/stages/consensus/chain_forward.rs (2)
tip
(170-175)new
(65-86)crates/amaru/src/stages/mod.rs (5)
pallas_point
(174-174)pallas_point
(178-180)pallas_point
(184-186)as_tip
(199-199)as_tip
(203-205)
crates/amaru/src/stages/mod.rs (7)
crates/amaru-kernel/src/lib.rs (1)
from
(161-167)crates/amaru/src/stages/consensus/chain_forward.rs (2)
tip
(170-175)new
(65-86)crates/amaru-ledger/src/state.rs (2)
tip
(165-177)new
(97-150)crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (3)
new
(141-192)point
(123-125)hash
(115-117)crates/amaru/src/stages/pull.rs (1)
new
(57-64)crates/amaru/src/stages/consensus/fetch.rs (1)
new
(42-52)crates/amaru/src/stages/consensus/header.rs (1)
new
(43-49)
🪛 LanguageTool
crates/amaru/src/stages/consensus/chain_forward/README.md
[style] ~23-~23: Consider a shorter alternative to avoid wordiness.
Context: ... PeerServer
needs to be deconstructed in order to allow concurrent usage of the protocols...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~24-~24: Possible missing comma found.
Context: ...he protocol instances is managed by one actor whose sole responsibility is to react t...
(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~26-~26: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: .... In the case of chainsync
this actor needs to obtain information regarding the curren...
(REP_NEED_TO_VB)
[style] ~26-~26: Consider a shorter alternative to avoid wordiness.
Context: ...r operations (roll forward or backward) in order to respond to requests for the next header...
(IN_ORDER_TO_PREMIUM)
[style] ~27-~27: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ion updates, these two responsibilities need to be separated into two actors: - `chain...
(REP_NEED_TO_VB)
[uncategorized] ~37-~37: A comma might be missing here.
Context: ...s for one client and the overall client supervisor there is an additional level of per-cli...
(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
🔇 Additional comments (27)
Cargo.toml (1)
18-18
: Approveacto
workspace dependencyRight on—adding
acto
v0.7.2 with the Tokio feature is spot on for the actor-based model. Ensure it aligns with versions in downstream crates for consistency.crates/amaru/Cargo.toml (2)
17-18
: Approveacto
usage inamaru
crateLooks sweet having
acto
as a workspace dependency here—just double-check that the required features propagate correctly to thechain_forward
modules.
24-25
: Verify additional serialization and utility depsYou’ve pulled in
minicbor
,serde_json
, andslot-arithmetic
. Make sure these crates are indeed referenced incrates/amaru
(e.g., in consensus serialization logic). If they aren’t used yet, we might trim the Cargo manifest for faster compiles.Also applies to: 33-34
crates/amaru-consensus/Cargo.toml (1)
33-38
: Approve new dev-dependenciesNice additions of
minicbor
,serde_json
, andtracing-subscriber
in theamaru-consensus
crate. These will beef up your testing and tracing for the chain forward stage. Just eyeball that they land in test code as expected.scripts/demo.sh (2)
22-24
: Nice addition of configurable database directories!G'day mate! These environment variables are a cracker addition - makes the demo script more flexible by letting users specify custom paths for ledger and chain databases. Like being able to customize your character's loadout in Fallout before starting a new game!
28-32
: Ripper job with the command line formatting!Breaking down the command over multiple lines is a thing of beauty, eh? Makes it way more readable than a Wall of Text boss fight. The command now properly passes the database directories as arguments, ensuring consistent configuration across the system. Top notch!
README.md (2)
23-23
: Good on ya for that helpful cross-reference!Adding a reference to the bottom section is fair dinkum helpful - gives readers a heads up that there's more detailed instructions coming up later, like a proper game tutorial that tells you there's an advanced guide in the appendix.
131-149
: Top-notch Docker instructions for the preprod network!Absolutely brilliant addition here! These Docker instructions are as clear as the waters of Bondi Beach. Explaining the volumes, port exposure, and environment variables gives users everything they need to get a Cardano node running on preprod - it's like getting the perfect walkthrough for a tricky boss fight.
The detail about syncing taking time is a nice touch too - sets proper expectations like a good game loading screen tip.
crates/amaru/src/bin/amaru/cmd/daemon.rs (2)
54-60
: Deadly networking config options - like leveling up your multiplayer capabilities!These two new CLI arguments are proper game-changers:
listen_address
: Gives control over where to listen for incoming connectionsmax_downstream_peers
: Limits the number of downstream peers to prevent resource overloadThis is like upgrading from a single-player game to a proper MMO - now we can actually control our network presence! Good default values too.
117-120
: Proper config structure updates to match the new CLI args!You've tied everything together nicely here by updating the
Config
with all the new fields. The explicit setting ofnetwork_magic
from the network name is a smart move - like making sure your character's stats match the class you've chosen.This ensures that all the new networking options flow through properly from CLI args to the runtime configuration.
crates/amaru/src/stages/consensus/chain_forward/tests.rs (6)
12-21
: Solid test for store creation - establishing your base camp!This test validates that your in-memory store creation works properly. Like checking your inventory before starting a quest - making sure you've got the right number of items (48 headers) and that the chain structure is what you expect (47 length). Good foundation for the more complex tests to come!
23-32
: Clean test for the simplest scenario - tip to tip intersection!This test handles the base case - when our tip and the peer's tip are the same point. It's like checking that your fast-travel system works to your current location. Simple but necessary validation!
34-51
: Proper test for branching scenarios - where the real challenge begins!This test checks finding headers between a tip and a branch point - crucial for chain sync protocols. It's like finding the right path back to a checkpoint after exploring a side quest. Good validation of both the count and the expected intersection point and height!
78-89
: Edge case testing for chains with no common ancestor - nice one!This test handles the scenario where we try to find headers between our tip and a completely separate chain. Like trying to fast-travel between two different game worlds! Correctly falls back to genesis (Point::Origin) when there's no intersection. Deadly important test for robustness!
91-133
: Comprehensive chain sync integration test - the boss battle!This integration test is the real deal - setting up a client, finding intersection points, receiving headers, and handling backward/validated events. It's like a full quest line that exercises all the game mechanics together.
The comment about chain[6] on lines 117-120 is helpful in explaining the slightly unintuitive test behavior - good awareness that the timing of operations affects the observed results.
135-172
: Nice test for the rollback optimization - handling detours elegantly!This test checks an important optimization case: how the system handles chain reorganizations and rollbacks. Like testing how a game recovers when you reload a save from an earlier checkpoint! Good verification that the correct sequence of headers is sent after a rollback.
crates/amaru/src/stages/consensus/chain_forward/client_state.rs (2)
8-14
: Good data structure design, mate!Nice one with this
ClientState
struct - it's a ripper of a design for tracking client operations. The comment about having only one rollback at the front is crucial for maintaining a clean state model. This reminds me of the inventory system in "The Last of Us" - simple but effective!
26-49
: Clever queue management logic!This
add_op
method is spot on! The way you're handling the backward operations by either truncating to a matching point or clearing entirely is deadly efficient. It's like when Neo learns to optimize his movements in The Matrix - no wasted effort here. The debug traces will be bonza for troubleshooting too.Makefile (3)
8-12
: Nice configurability addition!Top work setting up these configurable database directories! Using the
DB_PREFIX
as a base for bothLEDGER_DIR
andCHAIN_DIR
is clever as - reminds me of how "Portal" lets you control both portals with related inputs. This will make testing with different database setups a breeze.
71-74
: Grand cleanup target!Adding this
clear-db
target and makingbootstrap
depend on it is absolutely brilliant! It's like that satisfying moment in "Katamari Damacy" when you clear the board and start fresh. Clean state for clean testing, love it!
110-110
: Consistent environment variable passing!Good on ya for updating the demo script invocation to use the environment variables. This keeps everything consistent across all targets - reminds me of how "Mass Effect" maintains universe consistency across all its quests. Deadly important when dealing with filesystem paths!
crates/amaru/src/stages/consensus/chain_forward/README.md (3)
1-13
: Brilliant architectural introduction!This README starts off strong by clearly explaining the concurrency challenges and your approach to solving them. Breaking complex state machines into smaller actor-based components is ace - reminds me of how "The Avengers" works better with each hero handling their specialty rather than one superhero trying to do it all!
The diagram inclusion is a ripper touch too - visual learners will appreciate that!
15-20
: Clear explanation of TCP listener constraintsYou've done a champion job explaining the Tokio
TcpListener
API constraints and your solution using a bounded queue. This is the kind of clarity that saves dev hours! It's like those tutorial levels in "Portal" that explain the mechanics before you need to use them.
37-39
: Clear supervision hierarchy explanationGreat job explaining the purpose of the
client_protocols
actor level. This additional layer of supervision is like how the Normandy operates in Mass Effect - command structure matters! The failure propagation policy is clearly articulated too.🧰 Tools
🪛 LanguageTool
[uncategorized] ~37-~37: A comma might be missing here.
Context: ...s for one client and the overall client supervisor there is an additional level of per-cli...(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
crates/amaru/src/stages/consensus/chain_forward/chain_forward.d2 (1)
1-38
: Beautiful diagram styling!You've set up a proper professional-looking diagram here with these styling classes. The container/component/subcomponent hierarchy with different visual attributes is just like a well-designed UI in "Cyberpunk 2077" - clear visual hierarchy that helps understand the system at a glance.
crates/amaru/src/stages/consensus/chain_forward.rs (1)
296-304
: Double‑check rollback header heightWe assume the header exists in the store after rollback, but on deep rollbacks the block might be pruned. An
Option::None
here silently drops the event.
Consider logging a warn and pushing aClientOp::Backward
with the best info you have so downstreams don’t stall.crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (1)
70-86
:❓ Verification inconclusive
Heads‑up:
unimplemented!()
will blow up if the test ever needs nonces or era historyRight now the trait stubs for
get_nonces
,put_nonces
, andera_history
punt withunimplemented!()
.
If any future test (or the stage itself) decides to look up nonces or era data, the whole suite will crash harder than the Death Star in “Return of the Jedi”.fn get_nonces(&self, _header: &Hash<32>) -> Option<Nonces> { - unimplemented!() + // TODO: return a default or mocked value so tests won’t panic + None }Even a dummy value (or a
todo!()
guarded by a clear comment) is safer for exploratory work.
Otherwise, remember to upgrade these before the infra gets used outside the happy‑path tests.
Let’s check if any tests or consensus logic actually invoke these stubs:
🏁 Script executed:
#!/bin/bash # Search for get_nonces, put_nonces, and era_history usages within the amaru crate rg -n "get_nonces" -C2 crates/amaru rg -n "put_nonces" -C2 crates/amaru rg -n "era_history" -C2 crates/amaruLength of output: 8912
Heads‑up: stubbed methods in test_infra.rs aren’t currently exercised but will panic if ever invoked
- File:
crates/amaru/src/stages/consensus/chain_forward/test_infra.rs
(lines 75–86)
•get_nonces
,put_nonces
, andera_history
all useunimplemented!()
• A ripgrep scan shows no current calls from the chain_forward stage or its testsEven though your existing chain‑forward tests don’t hit these stubs, any future lookup will explode like a Sith Star Destroyer. You might swap them for harmless defaults:
fn get_nonces(&self, _header: &Hash<32>) -> Option<Nonces> { - unimplemented!() + None } fn put_nonces(&mut self, _header: &Hash<32>, _nonces: Nonces) -> Result<(), StoreError> { - unimplemented!() + Ok(()) } fn era_history(&self) -> &slot_arithmetic::EraHistory { - unimplemented!() + // e.g. return a stored default or inject a dummy EraHistory in your TestStore + &self.default_era_history }Keep this in mind before you add any tests or logic that rely on nonce or era data—otherwise you’ll get a nasty panic.
#[test] | ||
fn find_headers_between_tip_and_branches() { | ||
let store = mk_store(CHAIN_47); | ||
|
||
let tip = store.get_point(TIP_47); | ||
// Note that the below scheme does not match the documented behaviour, which shall pick the first from | ||
// the list that is on the same chain. But that doesn't make sense to me at all. | ||
let points = [ | ||
store.get_point(BRANCH_47), // this will lose to the (taller) winner | ||
store.get_point(LOST_47), // this is not on the same chain | ||
store.get_point(WINNER_47), // this is the winner after the branch | ||
]; | ||
let peer = store.get_point(WINNER_47); | ||
|
||
let (ops, Tip(p, h)) = find_headers_between(&store, &tip, &points).unwrap(); | ||
assert_eq!( | ||
(ops.len() as u64, p, h), | ||
( | ||
store.get_height(TIP_47) - store.get_height(WINNER_47), | ||
peer, | ||
store.get_height(WINNER_47) | ||
) | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Interesting behavior note about intersection selection logic!
Your comment on lines 58-60 raises a fair question about the documented behavior. Seems like there might be a mismatch between how the code works and what's documented.
Consider clarifying whether the current implementation (which seems to prefer the tallest intersection point) is the intended behavior or if it should be updated to match documentation (which apparently prefers the first matching point in the list).
🏁 Script executed:
#!/bin/bash
# Let's see if there's documentation that explains the intersection finding logic
grep -r "find_headers_between" --include="*.md" --include="*.rs" .
Length of output: 1514
🏁 Script executed:
#!/bin/bash
# Extract the doc comment and signature of find_headers_between to verify intended behavior
grep -R -n -A5 -B10 "pub(super) fn find_headers_between" crates/amaru/src/stages/consensus/chain_forward/client_state.rs
Length of output: 760
🏁 Script executed:
#!/bin/bash
# Dump the implementation of find_headers_between to inspect intersection selection logic
sed -n '50,200p' crates/amaru/src/stages/consensus/chain_forward/client_state.rs
Length of output: 1700
Clarify intersection‐selection semantics in find_headers_between
Oi, mate—after digging into the impl, I see we walk backwards from the tip and pick the first matching ancestor (i.e. the tallest common point), not the first entry in the points
array as the doc suggests. Let’s decide whether to:
- Update the doc comment in
client_state.rs
(lines 52–56) to say “pick the nearest ancestor on the chain” - Or change the loop logic to scan your
points
slice in order and pick the first match
Also, the inline note in tests.rs
(around lines 58–60) needs adjusting once we settle on the intended behavior.
• client_state.rs: amend the “first found point” description to reflect chain‐walk semantics
• tests.rs: update or remove the “does not match the documented behaviour” comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, will fix the docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rkuhn Ripper! Thanks for sorting that out, mate. Documentation and code should be best mates, not strangers passing in the night. Let me know if ya need any more eyes on those doc changes when you've got 'em ready. Like they say in The Matrix, "The issue is not the spoon, it's the documentation about the spoon." Or something like that! 👍
┌─────────┐
│ ^ω^ │
│ CHEERS! │
└─────────┘
let ActoInput::Message(Some((op, tip))) = cell.recv().await else { | ||
return Ok(()); | ||
}; | ||
match op { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Potential dead‑air after await_reply()
After sending await_reply
we block on cell.recv()
. If the parent dies meanwhile we happily exit Ok(())
, leaving the network side hanging. Maybe detect ActoInput::NoMoreSenders
and close the chainsync stream with send_done
so the client isn’t ghosted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abailly This remark from the code rabbit is uncanny: it does recognise how acto
works (which is not the most popular library) while implying that dropping the chainsync
server won’t be noticed by the client on the other side. Is closing the stream not signalled by the muxer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
match header.parent() { | ||
Some(parent) => assert_eq!( | ||
Point::new(self.our_tip.0.slot_or_default(), parent.as_ref().to_vec()), | ||
self.our_tip.0 | ||
), | ||
None => assert_eq!(self.our_tip.0, Point::Origin), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compilation blocker – Point::new
doesn’t exist
pallas_network::miniprotocols::Point
only has the Specific
variant.
The assert won’t even compile, meaning your CI Guinness is flat before the first sip.
-Point::new(self.our_tip.0.slot_or_default(), parent.as_ref().to_vec()),
+Point::Specific(self.our_tip.0.slot_or_default(), parent.as_ref().to_vec()),
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
match header.parent() { | |
Some(parent) => assert_eq!( | |
Point::new(self.our_tip.0.slot_or_default(), parent.as_ref().to_vec()), | |
self.our_tip.0 | |
), | |
None => assert_eq!(self.our_tip.0, Point::Origin), | |
match header.parent() { | |
Some(parent) => assert_eq!( | |
Point::Specific(self.our_tip.0.slot_or_default(), parent.as_ref().to_vec()), | |
self.our_tip.0 | |
), | |
None => assert_eq!(self.our_tip.0, Point::Origin), | |
} |
57af6ff
to
0a5b8fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (2)
crates/amaru/src/stages/consensus/chain_forward/tests.rs (1)
58-60
: G'day! Let's chat about this intersection logic comment.This note about the intersection selection behavior not matching docs is spot on. We're choosing the tallest common point (walking backwards from tip) rather than the first match in the provided points array.
This is the same issue flagged in the previous review - we need to decide which behavior is correct and either:
- Update the docs in
client_state.rs
to describe the current behavior, or- Change the implementation to scan through points in order
Either way, this comment and the docs need to be aligned with the actual behavior, mate!
crates/amaru/src/stages/consensus/chain_forward.rs (1)
268-269
:⚠️ Potential issueCritical compiler error - Point::new doesn't exist!
The CI's gonna chuck a wobbly over this one -
Point::new
isn't a valid constructor.According to the pallas_network API, Point only has Point::Origin and Point::Specific variants, not a "new" method:
-Point::new(self.our_tip.0.slot_or_default(), parent.as_ref().to_vec()), +Point::Specific(self.our_tip.0.slot_or_default(), parent.as_ref().to_vec()),
🧹 Nitpick comments (2)
crates/amaru/src/stages/consensus/chain_forward/tests.rs (1)
132-132
: Fair warning about gasket shutdown behavior.This note about not being able to cleanly shut down the gasket stage is a bit like that scene in Jurassic Park - "We've got a T-Rex!" - it's a known issue but still deserving of attention.
Consider investigating if there's a way to implement a cleaner shutdown mechanism that doesn't log errors, or alternatively, add a test-specific log filter to suppress these expected errors during testing. Might save future devs from scratching their heads wondering if something's actually gone wrong.
crates/amaru/src/stages/consensus/chain_forward.rs (1)
196-222
: TCP listener background task looks right, but could use error handling improvements.The TCP listener task is spawned as a background task, which is appropriate for this continuous operation. However, if the task ends due to a bind error or other issue, we won't know.
Consider adding a termination detection mechanism:
let server = tokio::spawn(async move { + let result: Result<(), anyhow::Error> = async { loop { // due to the signature of TcpListener::accept, this is the only way to use this API // in particular, it isn't possible to poll for new peers within the `schedule` method let peer = match PeerServer::accept(&server, network_magic).await { Ok(peer) => peer, Err(e) => { tracing::warn!( target: EVENT_TARGET, "error accepting peer: {e}", ); + if e.is_io() && e.kind() == std::io::ErrorKind::BrokenPipe { + return Err(anyhow::anyhow!("Listener connection broken: {}", e)); + } continue; } }; match tx.send(peer).await { Ok(_) => {} Err(e) => { tracing::info!( target: EVENT_TARGET, "dropping incoming connection: {e}" ); + if tx.is_closed() { + return Err(anyhow::anyhow!("Channel to worker closed")); + } } } } + }.await; + if let Err(e) = result { + tracing::error!(target: EVENT_TARGET, "Listener task terminated: {}", e); + } });This would help identify if the listener task unexpectedly terminates during runtime.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite
⛔ Files ignored due to path filters (3)
Cargo.lock
is excluded by!**/*.lock
crates/amaru-consensus/Cargo.lock
is excluded by!**/*.lock
crates/amaru/src/stages/consensus/chain_forward/chain_forward.png
is excluded by!**/*.png
📒 Files selected for processing (17)
.gitignore
(2 hunks)Cargo.toml
(1 hunks)Makefile
(3 hunks)README.md
(2 hunks)crates/amaru-consensus/Cargo.toml
(1 hunks)crates/amaru/Cargo.toml
(2 hunks)crates/amaru/src/bin/amaru/cmd/daemon.rs
(2 hunks)crates/amaru/src/bin/amaru/cmd/mod.rs
(1 hunks)crates/amaru/src/stages/consensus/chain_forward.rs
(2 hunks)crates/amaru/src/stages/consensus/chain_forward/README.md
(1 hunks)crates/amaru/src/stages/consensus/chain_forward/chain_forward.d2
(1 hunks)crates/amaru/src/stages/consensus/chain_forward/client_protocol.rs
(1 hunks)crates/amaru/src/stages/consensus/chain_forward/client_state.rs
(1 hunks)crates/amaru/src/stages/consensus/chain_forward/test_infra.rs
(1 hunks)crates/amaru/src/stages/consensus/chain_forward/tests.rs
(1 hunks)crates/amaru/src/stages/mod.rs
(7 hunks)scripts/demo.sh
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- README.md
🚧 Files skipped from review as they are similar to previous changes (14)
- crates/amaru/Cargo.toml
- Cargo.toml
- crates/amaru/src/bin/amaru/cmd/mod.rs
- .gitignore
- crates/amaru-consensus/Cargo.toml
- scripts/demo.sh
- crates/amaru/src/bin/amaru/cmd/daemon.rs
- crates/amaru/src/stages/consensus/chain_forward/README.md
- Makefile
- crates/amaru/src/stages/consensus/chain_forward/client_state.rs
- crates/amaru/src/stages/consensus/chain_forward/chain_forward.d2
- crates/amaru/src/stages/consensus/chain_forward/test_infra.rs
- crates/amaru/src/stages/consensus/chain_forward/client_protocol.rs
- crates/amaru/src/stages/mod.rs
🧰 Additional context used
🧬 Code Graph Analysis (2)
crates/amaru/src/stages/consensus/chain_forward/tests.rs (4)
crates/amaru/src/stages/consensus/chain_forward/client_state.rs (2)
find_headers_between
(57-93)new
(17-19)crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (5)
hash
(115-117)mk_store
(94-113)get_height
(59-62)new
(141-192)get_point
(54-57)crates/amaru/src/stages/consensus/chain_forward.rs (2)
tip
(170-175)new
(65-86)crates/amaru/src/stages/mod.rs (5)
pallas_point
(174-174)pallas_point
(178-180)pallas_point
(184-186)as_tip
(199-199)as_tip
(203-205)
crates/amaru/src/stages/consensus/chain_forward.rs (12)
crates/amaru/src/stages/consensus/chain_forward/client_protocol.rs (1)
client_protocols
(37-64)crates/amaru/src/stages/ledger.rs (4)
new
(41-54)bootstrap
(141-143)schedule
(145-151)execute
(158-182)crates/amaru/src/stages/consensus/chain_forward/client_state.rs (1)
new
(17-19)crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (4)
new
(141-192)fmt
(311-327)point
(123-125)eq
(331-341)crates/amaru/src/stages/consensus/fetch.rs (4)
new
(42-52)bootstrap
(103-105)schedule
(107-114)execute
(116-122)crates/amaru/src/stages/consensus/header.rs (4)
new
(43-49)bootstrap
(90-92)schedule
(94-101)execute
(108-114)crates/amaru-ledger/src/store.rs (2)
default
(262-271)tip
(107-107)crates/amaru-kernel/src/lib.rs (1)
slot_or_default
(156-161)crates/amaru-ledger/src/context.rs (4)
from
(57-61)from
(71-73)from
(86-88)from
(98-100)crates/amaru/src/stages/mod.rs (1)
bootstrap
(68-151)crates/amaru-ledger/src/rules/transaction.rs (1)
execute
(79-190)crates/amaru-ledger/src/rules/block.rs (1)
execute
(61-128)
🔇 Additional comments (3)
crates/amaru/src/stages/consensus/chain_forward/tests.rs (1)
23-32
: Ripper test coverage, mate! Comprehensive scenarios and edge cases.The tests thoroughly cover the chain synchronization protocol with various scenarios:
- Tip-to-tip (no headers needed)
- Tip-to-branch (headers up to branch point)
- Tip-to-multiple-branches (tallest branch selected)
- Tip-to-lost-fork (all the way back to genesis)
- Integration tests with client connection and chain events
This test suite gives us solid confidence in the chain forward implementation. Good on ya for covering both the happy paths and edge cases!
Also applies to: 34-51, 53-76, 78-89, 91-133, 135-172
crates/amaru/src/stages/consensus/chain_forward.rs (2)
47-55
: ForwardStage configuration looks solid.The expanded
ForwardStage
struct has all the necessary bits to manage downstream network connections, like a well-equipped Road Warrior ready for the outback.Good job incorporating network configuration (magic, listen address, max peers) and the Tokio runtime into the stage. This provides all the necessary components for the TCP listener and client management.
347-382
: Crikey! This client supervisor is a beauty.The client supervisor uses actor supervision effectively to manage client lifecycle and enforce connection limits.
I like how this pattern:
- Enforces the max peers limit
- Supervises client handlers and gets notified when they terminate
- Broadcasts operations to all connected clients efficiently
It's as elegant as a synchronized swim routine in Mad Max: Fury Road!
// FIXME: block height should be part of BlockValidated message | ||
let store = stage.store.lock().await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
FIXME comments need addressing.
These TODOs about block height should be part of the BlockValidated message are like that hanging plot thread in The Last of Us 2 - they need to be resolved!
Consider updating the BlockValidationResult
enum to include block height data, eliminating the need to look it up from the store. This would make the code more robust and efficient:
pub enum BlockValidationResult {
BlockValidated(Point, u64, Span), // Added block height parameter
RolledBackTo(Point, u64, Span), // Added block height parameter
BlockValidationFailed(Point, Span),
}
Also applies to: 296-297
#[allow(clippy::expect_used)] | ||
let runtime = | ||
AcTokio::new("consensus.forward", 1).expect("failed to create AcTokio runtime"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Runtime creation with expect() is risky.
Using expect()
for runtime creation could cause a panic if the runtime can't be created, which is a bit like driving without a seatbelt - probably fine, but not the safest approach.
Consider propagating the error instead:
-#[allow(clippy::expect_used)]
-let runtime =
- AcTokio::new("consensus.forward", 1).expect("failed to create AcTokio runtime");
+let runtime = AcTokio::new("consensus.forward", 1)
+ .map_err(|e| format!("failed to create AcTokio runtime: {}", e))?;
Then update the new
function signature to return a Result
:
-pub fn new(
+pub fn new(
downstream: Option<ActoRef<ForwardEvent>>,
store: Arc<Mutex<dyn ChainStore<Header>>>,
network_magic: u64,
listen_address: &str,
max_peers: usize,
our_tip: Tip,
-) -> Self {
+) -> Result<Self, String> {
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
#[allow(clippy::expect_used)] | |
let runtime = | |
AcTokio::new("consensus.forward", 1).expect("failed to create AcTokio runtime"); | |
pub fn new( | |
downstream: Option<ActoRef<ForwardEvent>>, | |
store: Arc<Mutex<dyn ChainStore<Header>>>, | |
network_magic: u64, | |
listen_address: &str, | |
max_peers: usize, | |
our_tip: Tip, | |
) -> Result<Self, String> { | |
let runtime = AcTokio::new("consensus.forward", 1) | |
.map_err(|e| format!("failed to create AcTokio runtime: {}", e))?; | |
// ... the rest of your initialization logic ... | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We’re currently working in the mode that fatal errors tear down the node with a clear message — which is exactly what a Rust panic!()
is made for. Ideally, this would be handled using Result
passing and shutting down gracefully or in central places, but the codebase is just not there yet. So for now the .expect()
should stay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rkuhn Fair dinkum, mate! I hear ya loud and clear on keeping the .expect()
call. That makes perfect sense for your current development approach - using panics for critical failures lets ya get a clear message when things go pear-shaped, without having to rewrite half the codebase. Like in Mad Max when resources are scarce, ya gotta work with what ya have! I'll keep this in mind for future reviews.
✏️ Learnings added
Learnt from: rkuhn
PR: pragma-org/amaru#149
File: crates/amaru/src/stages/consensus/chain_forward.rs:73-75
Timestamp: 2025-04-20T18:02:25.049Z
Learning: In the current development stage, rkuhn prefers using explicit panics (via `.expect()` or `.unwrap()`) for fatal errors in the application code that would tear down the node, rather than propagating errors with `Result`. The intention is to eventually transition to proper error handling with `Result` as the codebase matures.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: rkuhn
PR: pragma-org/amaru#149
File: crates/amaru/src/stages/consensus/chain_forward/test_infra.rs:272-285
Timestamp: 2025-04-20T17:57:23.214Z
Learning: In test infrastructure code, rkuhn prefers explicit panics (using .unwrap() or similar) over returning Result types, as test failures should be immediate and obvious.
// FIXME: gasket design bug that we only get &Unit and thus cannot take values from it without internal mutability | ||
let peer = peer.borrow_mut().take(); | ||
if let Some(peer) = peer { | ||
self.clients | ||
.send(ClientMsg::Peer(peer, self.our_tip.clone())); | ||
} else { | ||
tracing::error!(target: EVENT_TARGET, "Unit::Peer was empty in execute"); | ||
} | ||
Ok(()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Gasket design limitation with mutability.
This workaround for the gasket framework limitation is like when you're playing Skyrim with a bucket on your head to avoid NPC detection - creative but a bit dodgy.
The RefCell
usage here is a workaround for a design limitation in the gasket framework. Can we raise this as an issue in the gasket repository to improve the API? It would be worth checking if there's already an open issue for this:
🌐 Web query:
Does gasket framework have issues with internal mutability or moving values out of references?
💡 Result:
There is no direct evidence in the search results indicating that the Gasket framework specifically has issues with internal mutability or with moving values out of references. None of the provided search results mention the Gasket framework or discuss its internal mutability or value movement semantics.
However, based on general knowledge of frameworks written in languages with strong mutability constraints (like Rust), issues related to internal mutability and moving values out of references are common topics. In Rust, for instance:
-
Internal Mutability: This is typically handled using types like
Cell
,RefCell
, or smart pointers combined with interior mutability (e.g.,Rc<RefCell<T>>
). These allow mutation behind shared references, but at the cost of enforcing borrow rules at runtime rather than compile time. Misuse can lead to runtime panics or logic bugs if mutability is not correctly encapsulated[8][9]. -
Moving Values Out of References: Rust’s ownership model prohibits moving a value out of a reference (
&T
or&mut T
) unless the value implements theCopy
trait. Attempting to do so is a common source of compiler errors. Frameworks that manage data structures or state across boundaries must be careful to avoid attempting such moves, often opting for clones or using smart pointers to share data safely.
General Issues with Internal Mutability and Value Moving in Frameworks (not Gasket-specific):
- Misuse of internal mutability (e.g., excessive or unsafe use of
RefCell
or unsafe code) can result in hidden bugs or runtime panics, especially when multiple aliases try to mutate data concurrently or reentrantly[8][9]. - Moving values out of references when not allowed by the type system leads to compilation errors or, in some languages, undefined behavior[7].
- Frameworks must carefully design their APIs to either avoid these pitfalls or provide ergonomic, well-documented patterns for mutating state or transferring ownership.
If you are worried about these issues in the context of the Gasket framework, checking its official documentation or codebase for specific handling of these Rust ownership and mutability semantics is advisable.
Summary Table: General Rust Mutability & Move Issues
Issue | Typical Framework Handling | Risks if Misused |
---|---|---|
Internal Mutability | Cell , RefCell , smart pointers |
Borrowing panics, unpredictable bugs[8][9] |
Moving values from references | Cloning, using smart pointers | Compilation failure, logic errors[7] |
Conclusion:
No search result directly attributes mutability or reference movement issues to the Gasket framework. The concerns mentioned are general to systems programming and require careful design in any framework, including Gasket, if it is implemented in a language like Rust. For concrete details, reference the Gasket framework's documentation or codebase.
Citations:
- 1: https://discourse.haskell.org/t/mutable-data-structures-why-so-hard-to-find/4558
- 2: Mutability bugs Turfjs/turf#116
- 3: https://sapegin.me/blog/avoid-mutation/
- 4: https://hackmd.io/@solisoma/rJEfPRuq6
- 5: https://dev.to/nas5w/mutation-isnt-always-bad-in-javascript-5e94
- 6: https://users.rust-lang.org/t/when-does-interior-mutability-make-sense/89092
- 7: https://pling.jondgoodwin.com/post/interior-references-and-shared-mutable/
- 8: https://bakalian.cs.umd.edu/assets/slides/09-interior-mutability.pdf
- 9: https://songlh.github.io/paper/rust-study.pdf
- 10: https://www.trelleborg.com/seals/-/media/tss-media-repository/tss_website/pdf-and-other-literature/brochures/flatsealguides/flatseal-guide-04-optimized-gasket-geometry_en.pdf?rev=-1
Open a Gasket API improvement issue for this interior‑mutability workaround
I’ve scoured the issue tracker and there’s no quest log about internal mutability or moving values out of references in Gasket. Since our RefCell hack in chain_forward.rs feels like using the “arrow‑to‑the‑knee” workaround—clever but hacky—let’s file a feature request:
• File: crates/amaru/src/stages/consensus/chain_forward.rs
Lines: 322–330 (the RefCell
→ take()
workaround for Unit::Peer
)
• Describe the current limitation: you can’t move a value out of the unit without interior mutability
• Ask for a more ergonomic API to safely extract values from a Unit
component
That way we can drop this bucket-on-the-head hack and keep our code as smooth as an Elden Ring rolling dodge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gasket has made this API choice because it allows scheduled work units to be retries. This is not used here, but changing the gasket API seems inadequate in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rkuhn Ah, fair dinkum mate! That's a ripper explanation about Gasket's design philosophy. Like a good gameplay mechanic, sometimes what looks like a quirk is actually a feature - the ability to retry work units is bloody useful even if you're not using it here.
Maybe we could chuck a quick comment above that RefCell to explain the pattern for future devs who might scratch their heads? Something like:
// RefCell used because Gasket's execute method receives &Unit references
// to support potential retries, requiring interior mutability to take ownership
No worries about changing the API then. It's like in Fallout when you discover that weird bug is actually an intentional feature! 👌
✏️ Learnings added
Learnt from: rkuhn
PR: pragma-org/amaru#149
File: crates/amaru/src/stages/consensus/chain_forward.rs:322-330
Timestamp: 2025-04-20T17:58:20.505Z
Learning: Gasket framework's Worker execute method intentionally takes references to work units (&Unit) rather than consuming them to support potential retries of scheduled work. This requires using interior mutability patterns like RefCell when values need to be moved out of the reference.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
Signed-off-by: Arnaud Bailly <[email protected]>
0a5b8fd
to
9bef6b2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (1)
147-150
: Nice one on fixing the channel capacity issue, mate!You've bumped those channel capacities from 1 to 8, solving that potential deadlock issue. Like Neo dodging bullets in the Matrix, your tests won't freeze up when messages pile up now.
🧹 Nitpick comments (5)
crates/amaru/src/stages/consensus/chain_forward/test_infra.rs (5)
343-351
: Consider increasing the timeout for CI environments.The one-second timeout in
block_on
is shorter than a TikTok video. While fine for your local machine, it might cause flaky tests in CI environments where resources are shared and slower.- rt.block_on(timeout(Duration::from_secs(1), f)) + rt.block_on(timeout(Duration::from_secs(5), f))
35-47
: Check for cycles in your blockchain traversal, mate.Your
get_chain
function could go into an infinite loop faster than Bill Murray in Groundhog Day if there's a cycle in the header parent links. Add a visited set to break the cycle.pub fn get_chain(&self, h: &str) -> Vec<Header> { let mut chain = Vec::new(); + let mut visited = std::collections::HashSet::new(); let mut current = hash(h); while let Some(header) = self.get(¤t) { + if !visited.insert(current) { + tracing::warn!("Cycle detected in blockchain at {}", current); + break; + } chain.push(header.clone()); let Some(parent) = header.parent() else { break; }; current = parent; } chain.reverse(); chain }
196-196
: That span name is a bit casual, isn't it?The "whatever" span name is about as descriptive as calling your pet "Animal". Consider using something more meaningful for debugging and tracing, like "block_validation_test".
- let span = tracing::debug_span!("whatever"); + let span = tracing::debug_span!("block_validation_test");
174-176
: Panic messages could use more detail.When things go pear-shaped, your panic message is as vague as a weather forecast. Consider adding more context about the expected and actual values.
- panic!("expected listening event, got {:?}", port); + panic!("Expected ForwardEvent::Listening, but received {:?}", port);
88-92
: Great constants, but what's with the naming?The constants are ripper for test data, but the naming scheme is more mysterious than the plot of Inception. Consider adding some comments to explain what CHAIN_47, TIP_47, etc. represent in your test scenarios.
pub fn recv_n<const N: usize>(&mut self) -> [ClientMsg; N] { | ||
let mut ops = Vec::new(); | ||
for _ in 0..N { | ||
let msg = block_on(&self.runtime, self.client.chainsync().request_next()).unwrap(); | ||
match msg { | ||
NextResponse::RollForward(header, tip) => { | ||
ops.push(ClientMsg::Forward(from_cbor(&header.cbor).unwrap(), tip)) | ||
} | ||
NextResponse::RollBackward(point, tip) => ops.push(ClientMsg::Backward(point, tip)), | ||
NextResponse::Await => break, | ||
} | ||
} | ||
ops.try_into().unwrap() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
This recv_n
method is a bit ripper-risky, yeah?
The method will panic if it doesn't receive exactly N messages, which could be a right pain to debug when tests go sideways. Consider returning a Result
instead or making it more resilient.
- pub fn recv_n<const N: usize>(&mut self) -> [ClientMsg; N] {
+ pub fn recv_n<const N: usize>(&mut self) -> Result<[ClientMsg; N], &'static str> {
let mut ops = Vec::new();
for _ in 0..N {
let msg = block_on(&self.runtime, self.client.chainsync().request_next()).unwrap();
match msg {
NextResponse::RollForward(header, tip) => {
ops.push(ClientMsg::Forward(from_cbor(&header.cbor).unwrap(), tip))
}
NextResponse::RollBackward(point, tip) => ops.push(ClientMsg::Backward(point, tip)),
NextResponse::Await => break,
}
}
- ops.try_into().unwrap()
+ ops.try_into().map_err(|_| "Not enough messages received")
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is test infrastructure, panic is exactly the right thing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rkuhn G'day mate! Fair dinkum - you're spot on there. Test infrastructure should absolutely panic when things go pear-shaped. It's like that scene in Alien where the alarm goes off - you want to know immediately when something's off!
Panics in test code make failures loud and obvious rather than silently returning errors that might get missed. Makes total sense to keep it as-is.
(•ᴗ•)⁻⁻⁻⁻
✏️ Learnings added
Learnt from: rkuhn
PR: pragma-org/amaru#149
File: crates/amaru/src/stages/consensus/chain_forward/test_infra.rs:272-285
Timestamp: 2025-04-20T17:57:23.214Z
Learning: In test infrastructure code, rkuhn prefers explicit panics (using .unwrap() or similar) over returning Result types, as test failures should be immediate and obvious.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Roland Kuhn <[email protected]>
Signed-off-by: Arnaud Bailly <[email protected]>
Signed-off-by: Roland Kuhn <[email protected]>
fd7a4e5
to
9b8b3cd
Compare
Turns out there's a hidden dependency in CI between the ledger.db and the snapshot tests. While we find a way to make tests aware of the LEDGER_DIR env variable, it seems simpler to just stick to default dirs. Signed-off-by: Arnaud Bailly <[email protected]>
Posting this mainly for discussion of the approach.
Summary by CodeRabbit
New Features
Documentation
Chores
Tests
Refactor