refactor: store ShardChunk and EncodedShardChunk in the same type#13359
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #13359 +/- ##
==========================================
+ Coverage 69.61% 69.63% +0.01%
==========================================
Files 858 858
Lines 170742 170767 +25
Branches 170742 170767 +25
==========================================
+ Hits 118865 118910 +45
+ Misses 47064 47048 -16
+ Partials 4813 4809 -4
| chunk_extra.validator_proposals().collect(), | ||
| prepared_transactions.transactions, | ||
| outgoing_receipts, | ||
| outgoing_receipts.clone(), |
We wouldn't need to add this clone() if outgoing_receipts were returned from the function as before. Why did you remove it from the returned tuple?
It seems the only usage afterwards is for logging the len of outgoing receipts so you can consider just storing that or moving the logs elsewhere.
Before: the function did not create a ShardChunk. It only read from outgoing_receipts, which still had to be passed by value and returned because of the way encoding and decoding worked.
Now: we create the ShardChunk inside the function and store outgoing_receipts in it. If we still returned the receipts, we would have to return a clone, so it is cleaner for the caller to pass a clone to the function, which consumes it.
If you follow how ProduceChunkResult is used, outgoing_receipts is used in all sorts of places.
Note that this does not introduce a new clone: the old code already cloned when it created a ShardChunk.
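The ownership trade-off discussed in this thread can be sketched as follows. This is a minimal illustration, not nearcore code: `Receipt`, `Chunk`, `build_chunk`, `produce_and_keep`, and `produce_and_log_len` are all hypothetical stand-ins. It contrasts cloning at the call site (what the PR does) with recording only the length before the move (what the reviewer suggested for the logging case).

```rust
/// Hypothetical stand-in for a receipt; not the nearcore type.
#[derive(Clone, Debug, PartialEq)]
pub struct Receipt(pub u64);

/// Hypothetical stand-in for `ShardChunk`: it owns its receipts.
pub struct Chunk {
    pub receipts: Vec<Receipt>,
}

/// Consumes `receipts` and stores them in the chunk,
/// so there is nothing left to hand back to the caller.
pub fn build_chunk(receipts: Vec<Receipt>) -> Chunk {
    Chunk { receipts }
}

/// Option A: the caller still needs the receipts afterwards,
/// so it passes a clone and keeps the original.
pub fn produce_and_keep(receipts: Vec<Receipt>) -> (Chunk, Vec<Receipt>) {
    let chunk = build_chunk(receipts.clone());
    (chunk, receipts)
}

/// Option B: the caller needs only the count (for example, for logging),
/// so it records the length before the move and avoids the clone.
pub fn produce_and_log_len(receipts: Vec<Receipt>) -> (Chunk, usize) {
    let num_receipts = receipts.len();
    let chunk = build_chunk(receipts);
    (chunk, num_receipts)
}
```

Option B only applies when every later use needs just the count. By the author's account, ProduceChunkResult uses the receipts in several places, which is the case Option A covers.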
| ) -> Result<Option<(ShardChunk, PartialEncodedChunk)>, Error> { | ||
| match self.check_chunk_complete(&mut encoded_chunk) { | ||
| ChunkStatus::Complete(merkle_paths) => { | ||
| let shard_chunk = encoded_chunk.decode_chunk()?; |
We're decoding the chunk here and checking the validity of the decoding in decode_encoded_chunk(), which is called afterwards. That looks fragile. Shall we pull the validity check out of that function to this level as well?
Yeah, I think we should have a strong guarantee that the chunk is derived from the encoded chunk. A newtype, perhaps?
I tried to verify stuff in EncodedAndShardChunk::new(), but due to how the crates are set up, I could not. I have moved the check closer to EncodedAndShardChunk::new(), so the new situation is as good as it was in the past. We could maybe still figure out a way to improve things, though.
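One way to get the guarantee the reviewers ask for is a type whose only constructor performs the decoding, so a mismatched pair cannot be built from outside the module. The sketch below is an assumption-laden illustration: the two chunk types are toy stand-ins with a made-up length-prefix encoding, and `ChunkWithEncoding` and `from_encoded` are hypothetical names, not the API introduced by this PR.

```rust
/// Toy stand-in for the structured chunk; not the nearcore type.
#[derive(Clone, Debug, PartialEq)]
pub struct ShardChunk {
    pub payload: Vec<u8>,
}

/// Toy stand-in for the encoded chunk; not the nearcore type.
#[derive(Clone, Debug, PartialEq)]
pub struct EncodedShardChunk {
    pub bytes: Vec<u8>,
}

impl EncodedShardChunk {
    /// Made-up decoding: the first byte is a length prefix for the payload.
    pub fn decode_chunk(&self) -> Result<ShardChunk, String> {
        let (&len, rest) = self.bytes.split_first().ok_or("empty encoding")?;
        if rest.len() != len as usize {
            return Err("length prefix does not match payload".to_string());
        }
        Ok(ShardChunk { payload: rest.to_vec() })
    }
}

/// Holds both representations. The fields are private, so code outside this
/// module can only obtain a value through `from_encoded`, which derives the
/// chunk from the encoding and fails if decoding fails.
pub struct ChunkWithEncoding {
    chunk: ShardChunk,
    encoded: EncodedShardChunk,
}

impl ChunkWithEncoding {
    pub fn from_encoded(encoded: EncodedShardChunk) -> Result<Self, String> {
        let chunk = encoded.decode_chunk()?;
        Ok(Self { chunk, encoded })
    }

    pub fn chunk(&self) -> &ShardChunk {
        &self.chunk
    }

    pub fn encoded(&self) -> &EncodedShardChunk {
        &self.encoded
    }
}
```

The guarantee relies on module privacy, so it only holds if the type and its constructor live where the decoding logic is available. That is the constraint the author reports hitting with the crate layout.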
wacban
left a comment
I think the code should be restructured first - see comments. Otherwise looks good.
For my info, what's the relation between EncodedChunk and PartialEncodedChunk?
| encoded_chunk: EncodedShardChunk, | ||
| shard_chunk: ShardChunk, |
The change makes a lot of sense; we also see a need for this from the tracing side (#core/performance-optimization > traces analysis of overloaded network).
However, it feels like there is some duplication here: EncodedShardChunk and ShardChunk represent the same data, just in bytes versus structured format.
Maybe there is some way to represent all use cases in a single struct? For example,
struct AnyShardChunk {
    /// Structured data. `Some` if we produced or reconstructed this chunk.
    chunk: Option<ShardChunk>,
    /// Raw bytes. `Some` if we encoded or received this chunk.
    bytes: Option<EncodedShardChunk>,
}
impl AnyShardChunk {
    /// Constructor. May have a debug assertion that `chunk` matches `bytes` if both are `Some`.
    pub fn new(chunk: Option<ShardChunk>, bytes: Option<EncodedShardChunk>) -> Self { ... }
    /// Sets `chunk` from `bytes` if `bytes` is set.
    fn decode(&mut self) { ... }
    /// Sets `bytes` from `chunk` if `chunk` is set.
    fn encode(&mut self) { ... }
    /// Gets `chunk`; may call `decode`.
    pub fn chunk(&mut self) -> &ShardChunk { ... }
    /// Gets `bytes`; may call `encode`.
    pub fn bytes(&mut self) -> &EncodedShardChunk { ... }
}
It would allow us to simplify test setups, where we don't care that much about performance.
Thanks for the suggestion. I ended up introducing a new type similar to what you proposed.
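For reference, the lazily populated design proposed in this thread could look roughly like the sketch below. Everything here is an assumption made for illustration: the chunk types are toy stand-ins, the length-prefix encoding is made up (and only meaningful for payloads shorter than 256 bytes), and the two constructors `from_chunk` and `from_bytes` replace the proposed `new` so that at least one representation is always present.

```rust
/// Toy stand-in for the structured chunk; not the nearcore type.
#[derive(Clone, Debug, PartialEq)]
pub struct ShardChunk {
    pub payload: Vec<u8>,
}

/// Toy stand-in for the encoded chunk; not the nearcore type.
#[derive(Clone, Debug, PartialEq)]
pub struct EncodedShardChunk {
    pub bytes: Vec<u8>,
}

/// Made-up encoding: one length byte followed by the payload.
fn encode(chunk: &ShardChunk) -> EncodedShardChunk {
    let mut bytes = vec![chunk.payload.len() as u8];
    bytes.extend_from_slice(&chunk.payload);
    EncodedShardChunk { bytes }
}

/// Inverse of `encode`: drops the length byte (no validation in this toy).
fn decode(encoded: &EncodedShardChunk) -> ShardChunk {
    ShardChunk { payload: encoded.bytes.get(1..).unwrap_or_default().to_vec() }
}

pub struct AnyShardChunk {
    chunk: Option<ShardChunk>,
    bytes: Option<EncodedShardChunk>,
}

impl AnyShardChunk {
    pub fn from_chunk(chunk: ShardChunk) -> Self {
        Self { chunk: Some(chunk), bytes: None }
    }

    pub fn from_bytes(bytes: EncodedShardChunk) -> Self {
        Self { chunk: None, bytes: Some(bytes) }
    }

    /// Returns the structured chunk, decoding it on first access if needed.
    pub fn chunk(&mut self) -> &ShardChunk {
        if self.chunk.is_none() {
            let bytes = self.bytes.as_ref().expect("constructors set one field");
            self.chunk = Some(decode(bytes));
        }
        self.chunk.as_ref().expect("populated above")
    }

    /// Returns the encoded bytes, encoding them on first access if needed.
    pub fn bytes(&mut self) -> &EncodedShardChunk {
        if self.bytes.is_none() {
            let chunk = self.chunk.as_ref().expect("constructors set one field");
            self.bytes = Some(encode(chunk));
        }
        self.bytes.as_ref().expect("populated above")
    }
}
```

The lazy variant pays the conversion cost only when the missing representation is requested, at the price of `&mut self` accessors. According to the PR description, the type that was merged stores both representations together and creates them at the same time instead.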
I am going to make significant changes to this PR based on the feedback above. Marking it as draft until it is ready for another round of reviews.
| } | ||
|
|
||
| #[derive(Clone)] | ||
| pub struct EncodedAndShardChunk { |
I would love to get a suggestion for a better name here.
gpt suggested ShardChunkWithEncoding. Not perfect but works for me.
Also, I think it is worth a small comment, like "Used to pass chunk around and skip costly encoding/decoding if needed".
Longarithm
left a comment
Thank you! Looks clean, and it is great that it reduces LOC and optimises performance at the same time.
…#13359) The goal of this PR is to create `ShardChunk` and `EncodedShardChunk` at the same time which helps with skipping an expensive decoding of `TransactionReciept`. Based on the feedback received, this PR introduces a new type that stores both `ShardChunk` and `EncodedShardChunk` together so that we can be confident that they are related and create them at the same time.