Trying out the gossip example (browser-chat) left me with several questions that I could not resolve by looking at the various explanation pieces:
This iroh repo declares that you can use "iroh-gossip for establishing publish-subscribe overlay networks that scale".
What exactly is the service that the gossip protocol implements? I wonder whether pub/sub is the right name for the intended service: wouldn't "multicast group" be the more appropriate name?
Now, assuming it really is pub/sub (the following problem also applies if one calls it "multicast"): this would require that the "topic"/"channel" (or multicast group) live on regardless of peers going offline. However, what I observed with the browser-chat example and the CLI is that when the creator node leaves the channel, new peers can only seemingly join it: they do not become visible to the old peers and do not hear them, while peers that are still connected can continue to use the channel as before.
That is, the creator becomes a kind of server, which is exactly what p2p programming wants to overcome, and subscribers find themselves in strange "channel" partitions. It is as if the same gossip channel had several instances (which I would call partitions), despite using the same ticket: if new peers join an orphaned channel (i.e. the creator is currently offline), they cannot talk to any of the old peers in that channel.
Where is the life-cycle of an "iroh gossip channel" (or, as it seems, of "channel instances") described? Note that gossip tickets are not explained, either.
Simple sequence of actions to demonstrate the partitioning:
- Alice creates a channel with ticket T
- Bob joins T
- Carla joins T (from a machine far away)
- all three can chat
- Alice stops the CLI program
- Alice joins T
- Alice cannot talk to Bob and Carla, while Bob and Carla continue to have their channel
- If Carla stops her CLI program and joins T again, then (a) Alice and Carla are in their own partition, while (b) Bob is now left fully isolated, despite still being online and never having left the channel.
Second observation: it seems that local peers and remote peers are treated differently, or it is a timing problem. The following case uses the same ticket T that Alice created above. I made sure that all three peers first went offline and are now reconnecting, in the following sequence:
- all peers shut down their CLI browser-chat program
- Bob joins T (on machine X)
- Carla joins T (on machine Y, far away) - as said above, Bob and Carla cannot talk to each other
- Alice joins T (on machine X) --> Alice and Bob will be in the same channel, but Carla cannot hear them nor speak to them.
If I disconnect all three peers and immediately and rapidly re-execute the three join steps above (Bob, Carla, Alice), then all three can talk to each other. If I wait 5 minutes after disconnecting, the same steps even leave Alice and Bob in their own partitions, and nobody can talk to any other peer. So some garbage collection seems to be going on in the background? Such semantics are quite difficult to explain and to work with.
As pointed out, "server"-style behavior, or "a not-yet-garbage-collected channel instance can be rejoined, otherwise it's a partition", is suboptimal in a p2p and distributed-systems context. I cannot tell whether this is a limitation of the gossip protocol or a bug in the implementation. From what I observed, this is neither pub/sub nor multicast.
Am I missing something?