aws ec2 bandwidth between different regions #16
It seems that a peering connection needs to be established between any two regions to achieve high bandwidth on AWS EC2, but I didn't find one in the benchmark scripts.

Comments
Indeed we are not using it. The idea was to simulate (as much as possible) a distributed setting where each node is run by a different authority (although using only AWS machines undermines this argument). Also, I still have no evidence that bandwidth is the bottleneck (do you think otherwise?). |
I used iperf to test the bandwidth between two instances in two different regions, and the result shows only about 60 Mbps without a peering connection. Is that normal? |
Which instance type are you using? Fig 6 of the paper (https://arxiv.org/pdf/2105.11827.pdf) seems to show that we get higher than that using `m5.8xlarge` instances. |
I use `m5.8xlarge` actually, one in Ohio and one in Sydney. I am also confused; what bandwidth do you get in your environment? |
What's the exact command you use? I will try to run the same. |
`iperf -s` on one instance A, and `iperf -c <ip-of-A>` on another instance B. |
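A follow-up worth trying with the same tool is iperf's `-P` flag, which opens several parallel TCP streams and shows whether the ~60 Mbps is a per-connection cap or a link cap. A small wrapper sketch (the helper name and placeholder IP are illustrative; it assumes iperf v2 is installed and `iperf -s` is already running on the server):

```python
import subprocess

def iperf_parallel(server_ip: str, streams: int = 8, seconds: int = 30) -> str:
    """Run an iperf client with several parallel TCP connections."""
    cmd = ["iperf", "-c", server_ip, "-P", str(streams), "-t", str(seconds)]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    # 203.0.113.10 is a placeholder; replace with instance A's public IP.
    print(iperf_parallel("203.0.113.10"))
```

If the aggregate over 8 streams is much higher than 60 Mbps, the limit is per-connection (e.g. TCP window vs. the long inter-region RTT) rather than the instance's link.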
I get similar results to yours (around 60 Mbps), and I looked back at the benchmark results.

Benchmark with 4 nodes: the result seems to report a total throughput of about 23 MB/s. Since there are 4 nodes (and the codebase uses one TCP connection per node), each connection ships about 23/4 ≈ 5.75 MB/s (roughly 46 Mbps).

Benchmark with 50 nodes: the result seems to report a total throughput of about 88 MB/s. Since there are 50 nodes (and the codebase uses one TCP connection per node), each connection ships about 88/50 ≈ 1.76 MB/s (roughly 14 Mbps).

Now it is not clear how useful the above calculations are. Any thoughts? |
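For reference, the arithmetic above is just the total throughput split across one TCP connection per node, converted to Mbps; a minimal sketch of the calculation:

```python
def per_connection_mbps(total_mb_per_s: float, nodes: int) -> float:
    """Total throughput split across one TCP connection per node, in Mbps."""
    return total_mb_per_s / nodes * 8  # 1 MB/s = 8 Mbps

print(per_connection_mbps(23, 4))   # ~46 Mbps with 4 nodes
print(per_connection_mbps(88, 50))  # ~14 Mbps with 50 nodes
```

Both figures sit below the ~60 Mbps measured for a single inter-region TCP connection with iperf, which is consistent with bandwidth not being the bottleneck in those runs.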
BTW, did you try to reproduce some of the results? |
I have reproduced the results in some other environment where the BW per instance is limited, but not on AWS. In my previous experiments, |
It is a good point; I only selected |
Ok, I will discuss it with you when I have results. Thank you! |
I deployed 4 nodes in Sydney, Ohio, Frankfurt, and Stockholm, all using EC2 `m5.8xlarge` instances. Does the consensus itself consume much of the BW? |
No, the consensus layer uses 0 BW. The Tusk consensus protocol achieves total ordering by merely interpreting the dag (without sending any message). Most of the BW should be used by the workers shipping batches of transactions.

Another source of BW consumption could be the synchronization protocol. If a node falls behind (that is, the rest of the nodes made progress without it), it will try to sync any missing data. The current sync protocol is quite naive, to be fair (and could be a bottleneck). The sync mechanism however uses separate TCP connections.
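To make "interpreting the dag" concrete, here is a heavily simplified toy sketch of the commit rule (this is not the repo's code; the names `Certificate`, `leader_of`, and `order_dag` are illustrative): a leader certificate every other round is committed once at least f+1 certificates of the next round reference it, and its causal history is then flattened deterministically.

```python
from dataclasses import dataclass

F = 1  # tolerated faults for a 4-node committee (n = 3f + 1)

@dataclass(frozen=True)
class Certificate:
    author: str
    round_num: int
    parents: frozenset  # certificates from the previous round it references

def leader_of(round_num: int, committee: list) -> str:
    # Stand-in for the shared-coin leader election of the real protocol.
    return committee[round_num % len(committee)]

def commit_history(cert, seen, ordered):
    # Flatten the leader's causal history deterministically, ancestors first.
    if cert in seen:
        return
    seen.add(cert)
    for parent in sorted(cert.parents, key=lambda c: (c.round_num, c.author)):
        commit_history(parent, seen, ordered)
    ordered.append(cert)

def order_dag(dag: dict, committee: list, last_round: int) -> list:
    """Commit the leader of every other round once at least f+1 certificates
    of the next round reference it, then order its causal history."""
    ordered, seen = [], set()
    for r in range(0, last_round, 2):
        leader = next((c for c in dag.get(r, [])
                       if c.author == leader_of(r, committee)), None)
        if leader and sum(leader in c.parents for c in dag.get(r + 1, [])) >= F + 1:
            commit_history(leader, seen, ordered)
    return ordered
```

Because every node eventually sees the same dag and the flattening is deterministic, all nodes derive the same total order without exchanging any extra message.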
Which parameters are you using? For reference:

```python
node_params = {
    'header_size': 1_000,
    'max_header_delay': 100,
    'gc_depth': 50,
    'sync_retry_delay': 10_000,
    'sync_retry_nodes': 3,
    'batch_size': 500_000,
    'max_batch_delay': 100
}
```

Also it could be that the bottleneck is not BW (could it be storage/IO?). Another experiment could be to use many workers per node, all on the same machine (setting `workers` to 2 and `collocate` to `True`):

```python
bench_params = {
    'nodes': [10, 20, 30],
    'workers': 2,
    'collocate': True,
    'rate': [20_000, 30_000, 40_000],
    'tx_size': 512,
    'faults': 0,
    'duration': 300,
    'runs': 2,
}
```

Btw, I just added a few benchmark results to the repo (in a folder called `data`). |
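As a rough sanity check on the input load these `bench_params` imply: with the rate split across the workers, each worker's incoming client stream is about rate/workers transactions per second of tx_size bytes. A back-of-envelope estimate (not repo code):

```python
def per_worker_input_mbps(rate: int, tx_size: int, workers: int) -> float:
    """Client load per worker in Mbps: (rate/workers) tx/s of tx_size bytes."""
    return rate / workers * tx_size * 8 / 1e6

for rate in (20_000, 30_000, 40_000):
    print(rate, round(per_worker_input_mbps(rate, 512, 2)))  # ~41, ~61, ~82 Mbps
```

At the higher rates a single worker's stream already brushes against the ~60 Mbps per-connection figure measured with iperf, which is one reason multiple workers (each with its own connections) might help if BW is indeed the limit.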
I used the default parameters. What I meant regarding the consensus is that I'm still confused by the result shown above. |
That would be great actually. Let me know how it goes.
It is possible, I guess. Each worker is collocated with a client (i.e., the input load is load-balanced amongst the workers), so there are 4 clients in your testbed, each submitting transactions at a rate of 60_000/4 = 15_000 tx/s. Now I suspect that the workers do not use all their available BW. |
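Incidentally, if the transactions in that run were 512 bytes (an assumption; the tx_size of that run isn't stated above, though 512 matches the suggested `bench_params`), each client's share of the load lands almost exactly on the ~60 Mbps single-connection figure measured with iperf:

```python
# Per-client input load, assuming 512-byte transactions (tx_size for the
# run above is not stated; 512 matches the suggested bench_params).
rate_per_client = 60_000 / 4             # 15_000 tx/s per client
mbps = rate_per_client * 512 * 8 / 1e6   # bytes/s converted to Mbps
print(mbps)                              # ~61 Mbps
```

That coincidence would support the per-connection-bandwidth theory raised at the top of the thread.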