small and straightforward representation of how a kademlia-based dht could be integrated into ethereum, particularly at das-research.
the current work on this simulation of a kademlia-based dht network offers:
-
DHTNewtorkobject that can: spawndhtclients, serve as main source to initialize therouting tableof thedhtclients, resolveconnectionsbetweendhtclients, handle and keep track of the interactions betweendhtclients. the parameters to configure the adhtnetworkare:networkid: in case we want to simulate different network at the same timeerrorrate: to define the number connections that will fall into an error (we can understand it as the likelines o f an error in %)delayrage: range between the slowest possible delay and the biggest one. a random delay will be selected every time a connection is stablished between 2 nodes (if no error is raised)
the network offers the following functions:
parallel_clilist_initializerinit_with_random_peersinitializes a network using a "blazingly fast" method, which can be optimized even more if a number of threads/processes is definedadd_new_nodeadds a new node to the localNetworkconnect_to_nodereturns theConnectionobj between nodeAandBbootstrap_nodereturn the "best" nodes/dhtclis to compose the routing table for the given nodesummaryreturn the summary of the current status of the network (number of nodes, successful connections, failed ones, etc), will evolve over time
-
Connectioninterface that limits how twodhtclientsinteract with each other (like if it was a closed api/protocol). it offers the possibility to clienta(client starting the connection) to ask clientb(remote client), applying if specified the delay at the moment of stablishing the connection and per each interaction:-
get_closest_nodes_to(hash)will return the k closest nodes to the givenhashthat clientbhas in it's routing table. note: it will also return the value if it's stored locally :) -
store_segment(segment)will add thehashand the segment as a key-value pair inb's local storage -
retrieve_segment(hash)will askbto return the segment of the givenhashif it has it
-
-
DHTClientas representation of a node in the simulated dht network. thedhtclientcan be created using the following parameters:nodeid: id of the node that hosts thedhtclientnetwork: referece to the network obj that where thedhtclientparticipates inkbucketsize: k value, number of nodes per kbucketa: number of concurrent node connections the client does while looking for a given keyb: target of nodes (number of nodes) returned when asking for ahashsteptostop: number of iterations without finding anyone closer to stop thelookupoperation
the client serves a list of endpoints such as:
bootstrapuses the network reference to find the right peers for the routing tablelookup_for_hashwill try to look the value of thehashin the network, and the closest nodes to itget_closest_nodes_towill return the closest nodes to ahashfrom the local routing tableprovide_block_segmentwill lookup for the closest nodes in the network, and store the segment on themstore_segmentwill store locally a segment value using itshashas keyretrieve_segmentwill return the value of ahashif its locally, exception raised otherwise
-
RoutingTableandKBucketclasses to store locally the local representation of the network for a given node -
HashandBitArrayclasses to represent anodeid/blocksegment/generalobject
The source code runs mostly on plain Python libraries. However, to speed up the performance, the plain arrays and dicts
were updated to classes from collections. Thus, I recomend to have an specific virtual environment to use the module.
To install the dependencies, do:
# python -m venv venv
# or
# python -m virtualenv venv
# source venv/bin/activate
(venv)$ pip install -r requirements.txtThe repo has a list of tests to ensure that no functionality is broken whenever a new feature is added. All the tests are
triggered whenever GitHub records a Push or a PullRequest. However, there are locally runable using the ./launch_test.sh
script
# the script will try to activate any `venv` located at the root of the directory
py-dht$ bash launch_tests.shRunning benchmarks is a bit trickier. First install the py-dht module as editable in the venv
py-dht$ pip install -e ./after that, feel free to change the parameters in the benchmarks/launch_benchmarks.sh and run it like if it was a test:
# the script will try to activate any `venv` located at the root of the directory
py-dht$ cd benchmarks
py-dht/benchmarks$ bash launch_benchmarks.shFrom the experience of running tests and benchmarks on the repo, I can say that the optimizations on #8 and #9 were more than necesary.
Recomendations:
- to simulate a network -> use the
network.init_with_random_peers()function setting theprocessesparameters the initialization of the network is by far the process that takes the longer, as it has to compute the best routing tables for each of the spawned nodes. So, please benefit from the concurrency to reduce the duration (check this test as example) - At the moment using 20 cores of a
ryzen 5900xI'm able to initialize a network of10k dhtclientswithk=20in28 secs
Latest numbers of the network becnhmark
# 03.08.2023
py-dht$ python benchmarks/network.py -t opt-conc -o ./csvs -i 1 -k 20 -n 10000 --threads 20
-- benchmark: opt-conc_network_n_initialization --
rounds : 1
failed rounds : 0
prep time (s) : 1.5974044799804688e-05
avg (s) : 0.09420228004455566
median (s) : 0.09420228004455566
p90_duration (s): 0.09420228004455566
-- benchmark: opt-conc_network_bootstrap_node --
rounds : 1
failed rounds : 0
prep time (s) : 0.11602234840393066
avg (s) : 0.6563236713409424
median (s) : 0.6563236713409424
p90_duration (s): 0.6563236713409424
-- benchmark: opt-conc_network_fast_bootstrap_network --
rounds : 1
failed rounds : 0
prep time (s) : 8.344650268554688e-06
avg (s) : 174.6814157962799
median (s) : 174.6814157962799
p90_duration (s): 174.6814157962799
-- benchmark: opt-conc_network_fast_threaded_bootstrap_network --
rounds : 1
failed rounds : 0
prep time (s) : 0.0008330345153808594
avg (s) : 28.239949226379395
median (s) : 28.239949226379395
p90_duration (s): 28.239949226379395At the moment (after applying the code optimizations in #8 and the conncurrency #9) the numbers look like this:
| Number of nodes | Default implementation | 1st Optimization (monothread) | 2nd Optimization (8 processes) |
|---|---|---|---|
| 500 | 10 | 2.096 | 0.489 |
| 1000 | 39 | 5.39 | 1.02 |
| 1200 | 56 | 7.22 | 1.48 |
| 2000 | 157.36 | 14.49 | 2.83 |
| 5000 | 995.73 | 60.01 | 11.15 |
| 10000 | (beyond 2 hours) | 199.97 | 39.15 |
| NOTE: all the measurements displayed in the table are expresed in seconds |
Mikel Cortes-Goicoechea - @cortze
feel free to dive in! change proposals, issues, and prs will be more than welcome.
- the work has been supported by codex
- feel free to support this project through buy me a coffee; help me getting the ship of caffeine that I need 😊.
mit, see license file
