
Struggling to understand load balancing with thread-per-core shards #56

Open
numinnex opened this issue Jun 5, 2024 · 1 comment

numinnex commented Jun 5, 2024

Hey,

Really neat project; reading through it felt like a breeze, and it's some of the most beautiful Rust I've ever seen.

From my understanding of the thread-per-core architecture, one is supposed to partition the load so that each request can be routed to the specific core (shard) that will perform the operation (for example, by using the partition id to identify the shard that should serve the request), and shards only communicate via channels (no shared state). This provides benefits such as not needing locks.
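To illustrate what I mean by a router, here's a minimal sketch of that pattern using plain std threads and channels (not glommio or monoio, and all the types and names are made up): each request is sent only to the shard that owns its partition, and each shard keeps its state on its own thread.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

// Illustrative request type; a real system would have its own protocol.
enum Request {
    Put { key: String, value: String },
}

fn main() {
    let num_shards: u64 = 4;

    // One channel and one thread per shard. Each shard exclusively owns
    // the partitions routed to it, so its state needs no locks.
    let (senders, handles): (Vec<Sender<Request>>, Vec<JoinHandle<()>>) = (0..num_shards)
        .map(|shard_id| {
            let (tx, rx) = channel::<Request>();
            let handle = thread::spawn(move || {
                // Per-shard state lives only on this thread.
                let mut store: HashMap<String, String> = HashMap::new();
                for req in rx {
                    match req {
                        Request::Put { key, value } => {
                            store.insert(key, value);
                            println!("shard {shard_id} now holds {} keys", store.len());
                        }
                    }
                }
            });
            (tx, handle)
        })
        .unzip();

    // The "router": map a partition id to the shard that owns it.
    let route = |partition_id: u64| (partition_id % num_shards) as usize;

    for partition_id in 0..8u64 {
        let req = Request::Put {
            key: format!("key-{partition_id}"),
            value: "v".to_string(),
        };
        senders[route(partition_id)].send(req).unwrap();
    }

    // Closing the channels lets the shard threads drain their queues and exit.
    drop(senders);
    for handle in handles {
        handle.join().unwrap();
    }
}
```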

Reading through your code, I can't seem to find such a router. Instead there is send_request_to_local_shards, which, on receiving a request such as CreateCollection, sends a message to all of the shards except the current one, so it looks like the same operation is performed N times locally? I wonder if there is something I am missing, maybe some detail about glommio. I am using monoio for my project and am currently looking around for inspiration; as of right now, my main source of knowledge is the Sphinx project by Pekka Enberg.


tontinton (Owner) commented Aug 18, 2024

Oh, I missed this issue and only just noticed it. Thanks for the kind words :)

When using gossip to relay messages, I thought: why should I rely on gossip alone to eventually reach all shards, when I can instead save bandwidth by sending to only one of the shards and then relaying locally using send_request_to_local_shards?

I basically did it as an optimization.

It's not perfect: what if that shard is unresponsive while the others are fine? But for this project, that's how I decided to roll with it. Of course, this hypothesis needs to be tested in a highly scaled cluster running a bunch of load tests, which I haven't done.
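To make the idea concrete, here's a rough sketch of the local relay (plain std threads and channels rather than glommio, and the event type and function names are made up rather than copied from the actual code): the shard that received a cluster event from the network applies it and then forwards it to every other shard on the same node, instead of each shard waiting for gossip to reach it individually.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Illustrative cluster event; the real project has its own message types.
#[derive(Clone)]
enum ShardEvent {
    CreateCollection { name: String },
}

// Conceptually what the local relay does: forward the event to every
// shard on this node except the one that already received it.
fn send_to_local_shards(
    event: &ShardEvent,
    receiving_shard: usize,
    local_shards: &[Sender<ShardEvent>],
) {
    for (shard_id, tx) in local_shards.iter().enumerate() {
        if shard_id != receiving_shard {
            tx.send(event.clone()).unwrap();
        }
    }
}

fn main() {
    let num_shards = 4;

    // Spawn the local shards; each applies metadata changes to its own state.
    let (senders, handles): (Vec<_>, Vec<_>) = (0..num_shards)
        .map(|shard_id| {
            let (tx, rx) = channel::<ShardEvent>();
            let handle = thread::spawn(move || {
                let mut collections: Vec<String> = Vec::new();
                for event in rx {
                    match event {
                        ShardEvent::CreateCollection { name } => {
                            collections.push(name);
                            println!("shard {shard_id}: collections = {collections:?}");
                        }
                    }
                }
            });
            (tx, handle)
        })
        .unzip();

    // Pretend shard 0 received the event from gossip / a remote node:
    // it applies the event itself, then relays it to the rest of the node.
    let event = ShardEvent::CreateCollection { name: "users".to_string() };
    let receiving_shard = 0;
    senders[receiving_shard].send(event.clone()).unwrap();
    send_to_local_shards(&event, receiving_shard, &senders);

    drop(senders);
    for handle in handles {
        handle.join().unwrap();
    }
}
```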

Hope that answers your questions.

PS: anything Pekka Enberg makes is most likely a good source, so you're on the safe side following his methods.
