feat(event cache): introduce an absolute local event ordering #5225
Codecov Report
Attention: Patch coverage is
✅ All tests successful. No failed tests found. Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #5225      +/-   ##
==========================================
+ Coverage   88.69%   88.73%   +0.03%
==========================================
  Files         329      330       +1
  Lines       88747    89513     +766
  Branches    88747    89513     +766
==========================================
+ Hits        78717    79427     +710
- Misses       6241     6278      +37
- Partials    3789      3808      +19
==========================================
```

View full report in Codecov by Sentry.
Force-pushed from 2acde34 to fc99597.
Force-pushed from 7673e19 to b218b10.
Thanks for the code, it's clear and well explained.

I'm wondering how the event order will be used. I'm intrigued by the complexity induced by a lazy-loaded vs. a fully-loaded linked chunk; I have a feeling that using negative and positive positions would work and could simplify things. I'm not super comfortable with the idea of loading all the chunks (even without the events): it defeats the purpose of having such a light structure.

Are you loading all the chunks so you can always return the position of an event? That would mean you want the position of an event from the store, not from the in-memory room event cache. If that's the case, I think the strategy should move entirely onto the `EventCacheStore` trait. If it's not, I don't understand why we need to load all the chunks. Can you explain, please?
Thanks for the review! These are very valid questions.
The requirement is that we want to be able to compare the relative positions of two events, whether or not they're loaded in the in-memory linked chunk. So this must work independently of where the event actually lives. Some alternative ways to build this:
It sounds like the first and last solutions might be preferable in the short term, and would both require loading only the metadata of a chunk. So I could start with this. Then, I find the concern around performance totally legit. The best way to resolve it would be benchmarking or using some real-world measures, so I may look into this and get back to you here.
As said above, loading all the chunks at start seemed like a reasonable way to build the initial state, and then maintain it cleanly over time. But yeah, at the very least we'd need to load only the minimal amount of data for the order tracker to work correctly.
We've discussed this, and lazy-loading the order tracker brings many other complications, so we're going to roll with the first suggestion only (load a limited set of metadata about each chunk of a linked chunk, and use that instead of the fully loaded linked chunk), and see what comes out of that in terms of performance. We can load the entire metadata in a single SQL query, so that ought to be rather efficient.
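To make the "metadata only" approach concrete, here is a hedged sketch. The `ChunkMetadata` fields mirror the ones visible later in this PR (`identifier`, `num_items`, `previous`, `next`); the function name and everything else are illustrative, not the actual implementation:

```rust
// Hypothetical sketch: seed an order tracker from per-chunk metadata alone,
// without loading any of the events themselves. Field names mirror this PR;
// `initial_chunk_lengths` is an illustrative helper, not the SDK's API.

#[derive(Clone, Debug, PartialEq)]
struct ChunkMetadata {
    identifier: u64,
    num_items: usize,
    previous: Option<u64>,
    next: Option<u64>,
}

/// Collect each chunk's (identifier, length) pair, in linked-chunk order.
fn initial_chunk_lengths(metas: Vec<ChunkMetadata>) -> Vec<(u64, usize)> {
    metas.into_iter().map(|meta| (meta.identifier, meta.num_items)).collect()
}

fn main() {
    let metas = vec![
        ChunkMetadata { identifier: 0, num_items: 3, previous: None, next: Some(1) },
        ChunkMetadata { identifier: 1, num_items: 2, previous: Some(0), next: None },
    ];
    // Two chunks' worth of metadata is enough to order 3 + 2 = 5 events,
    // even though no event payload was loaded.
    assert_eq!(initial_chunk_lengths(metas), vec![(0, 3), (1, 2)]);
}
```

Since each row is just a handful of integers, fetching all of them in one SQL query stays cheap even for long timelines.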
Force-pushed from 68d7da4 to 5a55d8b.
Rebased on top of main, but only the last three commits are really new. Time for another round of review \o/
Force-pushed from 5c1be4a to e8f12a1.
```rust
pub fn from_metadata(metas: Vec<ChunkMetadata>) -> Self {
    let initial_chunk_lengths =
        metas.into_iter().map(|meta| (meta.identifier, meta.num_items)).collect();
```
Not a blocker!

We are doing a double allocation here: one `collect` in `from_metadata`, and one `collect` in `LinkedChunk::order_tracker`. I propose the following:
```diff
-pub fn from_metadata(metas: Vec<ChunkMetadata>) -> Self {
-    let initial_chunk_lengths =
-        metas.into_iter().map(|meta| (meta.identifier, meta.num_items)).collect();
+pub fn from_metadata<M>(metas: M) -> Self
+where
+    M: Iterator<Item = ChunkMetadata>,
+{
+    let initial_chunk_lengths =
+        metas.map(|meta| (meta.identifier, meta.num_items)).collect();
```
and in `order_tracker`, we pass the iterator directly, without calling `collect`. I don't know how it will work with the `unwrap_or_else` though.
Hmm, it's a good idea and I've tried it, but the one caller looks like this:
```rust
Some(OrderTracker::new(
    updates,
    token,
    all_chunks.unwrap_or_else(|| {
        // Consider the linked chunk as fully loaded.
        self.chunks()
            .map(|chunk| ChunkMetadata {
                identifier: chunk.identifier(),
                num_items: chunk.num_items(),
                previous: chunk.previous().map(|prev| prev.identifier()),
                next: chunk.next().map(|next| next.identifier()),
            })
            .collect()
    }),
))
```
where `all_chunks` is an `Option<Vec<ChunkMetadata>>`. If I wanted `OrderTracker::new()` to accept an `Iterator<Item = ChunkMetadata>` as the last parameter, then I'd have some code like this:
```rust
all_chunks.map(Vec::into_iter).unwrap_or_else(|| {
    // Consider the linked chunk as fully loaded.
    self.chunks()
        .map(|chunk| ChunkMetadata {
            identifier: chunk.identifier(),
            num_items: chunk.num_items(),
            previous: chunk.previous().map(|prev| prev.identifier()),
            next: chunk.next().map(|next| next.identifier()),
        })
})
```
But now, you can see what the problem is:
- when `all_chunks` is `Some`, the iterator has the concrete type `vec::IntoIter`,
- and when it's `None`, the iterator has the concrete type `Map<...>`,
so the compiler complains (correctly) they're not the same type. I think I recall some Rust proposals to basically allow this, but it's not a reality right now, unfortunately. If we found a better solution later, happy to rewrite this code in another way that would make this possible, of course.
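For what it's worth, one known workaround (not what this PR does) is an `Either`-style enum that implements `Iterator` when both sides do, so both branches unify into a single concrete type; the `either` crate provides this out of the box. A self-contained sketch, with a stand-in for the `all_chunks` situation:

```rust
// Sketch of the Either workaround: both match arms return the same concrete
// type, `EitherIter<_, _>`, even though the wrapped iterators differ.
// `chunk_ids` is a simplified stand-in, not the SDK's actual code.

enum EitherIter<L, R> {
    Left(L),
    Right(R),
}

impl<L, R, T> Iterator for EitherIter<L, R>
where
    L: Iterator<Item = T>,
    R: Iterator<Item = T>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        match self {
            EitherIter::Left(l) => l.next(),
            EitherIter::Right(r) => r.next(),
        }
    }
}

// `Some` yields a `vec::IntoIter`, `None` yields a `Map<...>`; wrapping each
// in a variant of `EitherIter` satisfies the compiler.
fn chunk_ids(all_chunks: Option<Vec<u64>>) -> impl Iterator<Item = u64> {
    match all_chunks {
        Some(v) => EitherIter::Left(v.into_iter()),
        None => EitherIter::Right((0..3).map(|i| i * 10)),
    }
}

fn main() {
    assert_eq!(chunk_ids(Some(vec![1, 2])).collect::<Vec<_>>(), vec![1, 2]);
    assert_eq!(chunk_ids(None).collect::<Vec<_>>(), vec![0, 10, 20]);
}
```

Whether that indirection is worth it over a single extra `Vec` allocation is debatable, which is presumably why the simpler `collect` was kept here.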
…ferent accumulators In the next patch, we're going to introduce another user of `UpdatesToVectorDiff` which doesn't require accumulating the `VectorDiff` updates; to make that optional, let's generalize the algorithm with a trait that carries the same semantics. No changes in functionality.
…ordering of the current items This is a new data structure that will help figure out a local, absolute ordering for events in the current linked chunk. It's designed to work even if the linked chunk is being lazily loaded, and it provides a few high-level primitives that make it possible to work nicely with the event cache.
…by the event cache The one hardship is that lazy-loading updates must NOT affect the order tracker, otherwise its internal state becomes incorrect (desynchronized from the store) and thus returns incorrect values upon shrink/lazy-load. In this specific case, some updates must be ignored, the same way we do it for the store using `let _ = store_updates().take()` in a few places. A good place to flush the pending updates seemed to be at the same time we flush the updates-as-vector-diffs, since they're observable at the same time.
Force-pushed from e0ca18f to e04f87b.
This PR introduces a local absolute ordering for items of a linked chunk or, equivalently, for events within a room's timeline. The idea is to reuse the same underlying mechanism we had for `AsVector`, but restricting it to only counting the number of items in a chunk; given an item's `Position`, we can then compute its absolute order as the total number of items before its containing chunk plus its index within the chunk.

This will help us order edits that would apply to a thread event, for instance; this is deferred to a future PR, to not make this one too heavyweight.
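The ordering rule above can be sketched in a few lines. This is a minimal illustration of the arithmetic, not the `OrderTracker` API (the function name and the flat `chunk_lengths` slice are assumptions for the example):

```rust
// Minimal sketch: an item's absolute order is the total number of items in
// the chunks before its containing chunk, plus its index within that chunk.
// `chunk_pos` is the containing chunk's position in linked-chunk order.
fn absolute_order(chunk_lengths: &[usize], chunk_pos: usize, index_in_chunk: usize) -> usize {
    chunk_lengths[..chunk_pos].iter().sum::<usize>() + index_in_chunk
}

fn main() {
    // Three chunks of lengths 3, 2 and 4: the second item of the third
    // chunk has absolute order 3 + 2 + 1 = 6.
    assert_eq!(absolute_order(&[3, 2, 4], 2, 1), 6);
    // Comparing two events' positions reduces to comparing two integers,
    // regardless of whether their chunks are loaded in memory.
    assert!(absolute_order(&[3, 2, 4], 0, 2) < absolute_order(&[3, 2, 4], 1, 0));
}
```

Note that only the chunk lengths are needed, which is what makes the metadata-only loading strategy discussed earlier in this thread viable.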
Attention to reviewers: sorry, this is a bulky PR (mostly because of tests), but I think it's important to see how the `OrderTracker` methods are used in 280bd32, to make sense of their raison d'être.

Part of #4869 / #5122.