
CometBFT E2E Latency #13

Open
namn-grg opened this issue Apr 4, 2024 · 5 comments
Comments

@namn-grg
Collaborator

namn-grg commented Apr 4, 2024

Goal

Calculate the E2E latency for the CometBFT consensus mechanism under a load test scenario.

How is E2E Latency calculated?

It is calculated by taking the mean of the time difference between when the tx is sent from the client side and when it is committed inside a block.
E2E Latency = mean( time[tx_committed_inside_a_block] - time[tx_sent_from_client] )
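A minimal sketch of this computation (variable names are illustrative; latency is commit time minus send time, per tx, averaged over all committed txs):

```python
# Hypothetical sketch: mean E2E latency from per-tx send/commit timestamps (in seconds).
def e2e_latency(sent_at: dict, committed_at: dict) -> float:
    """Mean of (commit time - send time) over all txs seen on both sides."""
    deltas = [committed_at[tx] - sent_at[tx] for tx in committed_at if tx in sent_at]
    return sum(deltas) / len(deltas)

sent = {"tx1": 10.0, "tx2": 10.1}
committed = {"tx1": 10.9, "tx2": 11.1}
print(round(e2e_latency(sent, committed), 2))  # 0.95
```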

Problem 1

The CometBFT client (load-test) sends txs in batches. The node logs don't show the time at which a particular tx was received.

Problem 2

The random (key = value) data generated at all the nodes by the client sometimes leads to two different nodes generating the same value, which results in that tx not being included in the mempool (log: "tx already exists in the mempool").
This leads to incorrect results, because the expected tx rate is not actually achieved: the tx is supposed to get committed, but because of the duplicate it isn't.

Possible Solution

Output an additional debug log on the client side after every specific set of txs.
Debug log: f"Tx {tx_num} in batch {batch_num} successfully sent"
This gives us the timestamp of that particular tx number.

The tx data is generated randomly. We can add the identifier specified below at the start, which allows us to catch it in the node logs. (This won't work for small data sizes, since it depends on the identifier length.)

identifier = f"{node_id:02x}{batch_num:04x}{tx_num:04x}"  # fixed-width hex so the prefix is unambiguous
data_string = identifier + "f" * (length - len(identifier))

Result

We will know the specific tx to look for in the node logs using the identifier, and we already have its send time from the newly added client debug log. We match the identifier and read off the timestamp.
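The matching step could look like this (a hypothetical sketch; the real CometBFT log line format will differ, so the regex is an assumption):

```python
import re

# Hypothetical log line shape: "<timestamp> <node> received tx <hex-data>".
LOG_RE = re.compile(r"^(?P<ts>\S+) .*received tx (?P<data>[0-9a-f]+)")

def find_commit_time(log_lines, identifier):
    """Return the timestamp of the first log line whose tx data starts with identifier."""
    for line in log_lines:
        m = LOG_RE.match(line)
        if m and m.group("data").startswith(identifier):
            return m.group("ts")
    return None

lines = ["2024-04-04T10:00:01Z node0 received tx 1a2bffff"]
print(find_commit_time(lines, "1a2b"))  # 2024-04-04T10:00:01Z
```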

@namn-grg namn-grg changed the title CometBFT Solution for E2E Latency CometBFT E2E Latency Apr 4, 2024
@hvanz

hvanz commented Apr 17, 2024

Hey @namn-grg, what we usually do to obtain tx latencies on CometBFT experiments is to create each tx with an id and a timestamp in its content, instead of just putting random data. Then we can query the timestamp of each committed block, take each tx in the block, and compare the block timestamp to the timestamp encoded in the tx.
This is the original implementation: tendermint/tendermint#9351
And this is the latest QA report with some latency plots as example: https://github.com/cometbft/cometbft/blob/main/docs/references/qa/CometBFT-QA-v1.md#latencies

If I understand correctly, you want to analyse the logs to calculate latencies. We haven't tried that approach though I imagine the log will be huge, considering that our experiments have millions of txs.
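The payload scheme described above (an id and a send timestamp encoded in the tx itself, compared against the block timestamp) can be sketched roughly like this; the actual format lives in the linked loadtime implementation, so this layout is only illustrative:

```python
import struct
import time

def make_tx(tx_id: int, size: int) -> bytes:
    """Encode an 8-byte id and an 8-byte send timestamp (ns), then pad to `size` bytes."""
    header = struct.pack(">QQ", tx_id, time.time_ns())
    return header + b"\x00" * (size - len(header))

def latency_seconds(tx: bytes, block_time_ns: int) -> float:
    """Block timestamp minus the send timestamp encoded in the tx."""
    _tx_id, sent_ns = struct.unpack(">QQ", tx[:16])
    return (block_time_ns - sent_ns) / 1e9
```

With this, no log scraping is needed: every committed tx carries its own send time, and the block it landed in carries the commit time.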

@namn-grg
Collaborator Author

namn-grg commented Apr 18, 2024

Thanks for the help @hvanz , I think I figured out what to do.

QQ - is the latency output from test/loadtime/cmd/report the E2E latency? i.e. mean( time[tx_committed_inside_a_block] - time[tx_sent_from_client] )

@hvanz

hvanz commented Apr 18, 2024

That's right. That's the tool we use for computing latencies.

@namn-grg
Collaborator Author

@hvanz This is a test we ran on a CometBFT network. Do you think the results are justified, i.e. did we calculate them appropriately? Any other views you have would also be helpful.

SUMMARY:

  • CONFIG:
    Faults: 0 nodes
    Committee size: 4 nodes
    Input rate: 1,000 tx/s
    Transaction size: 256 B
    Execution time: 96 s

  • RESULTS:
    Consensus TPS: 933 tx/s
    Consensus BPS: 238,917 B/s
    Consensus latency: 289 ms
    End-to-end TPS: 958 tx/s
    End-to-end BPS: 245,280 B/s
    End-to-end latency: 951 ms

@hvanz

hvanz commented Apr 19, 2024

Yes, those numbers seem ok for a simple local network.
