Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setup Cassandra benchmark env and add CQLMultiQueryBenchmark #3101

Merged
merged 1 commit into from
Jul 25, 2022

Conversation

li-boxuan
Copy link
Member

Currently, all our benchmarks are using the in-memory backend. This PR sets up the infrastructure for Cassandra-based benchmarks and adds a multi-query benchmark using the CQL backend.


Thank you for contributing to JanusGraph!

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there an issue associated with this PR? Is it referenced in the commit message?
  • Does your PR body contain #xyz where xyz is the issue number you are trying to resolve?
  • Has your PR been rebased against the latest commit within the target branch (typically master)?
  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you written and/or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE.txt file, including the main LICENSE.txt file in the root of this repository?
  • If applicable, have you updated the NOTICE.txt file, including the main NOTICE.txt file found in the root of this repository?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

@janusgraph-bot janusgraph-bot added the cla: external Externally-managed CLA label Jun 24, 2022
@li-boxuan li-boxuan force-pushed the li-boxuan/add-cql-benchmark branch from 0afa135 to fb2f62a Compare June 24, 2022 15:33
@li-boxuan li-boxuan added this to the Release v1.0.0 milestone Jun 24, 2022
- uses: actions/setup-python@v2
with:
python-version: '3.x'
- run: pip install git+https://github.com/li-boxuan/ccm.git
Copy link
Member Author

@li-boxuan li-boxuan Jun 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried many many times and the CI only worked with my custom changes to ccm. The upstream PR is here.

Copy link
Contributor

@farodin91 farodin91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just interest. Why didn't you use testcontainers for the cql tests?

@li-boxuan
Copy link
Member Author

Why didn't you use testcontainers for the cql tests?

That's a good question. I guess we could use test containers, but IIUC it does not support multiple Cassandra instances. ccm gives us more flexibility in the Cassandra cluster setup. It is also the tool used by Cassandra's official distributed test suite. Also, I am not sure but I suspect using Cassandra in docker would bring additional overhead that might affect the stability of benchmark results. But yeah, there is no strong reason against testcontainers.

@li-boxuan
Copy link
Member Author

This is a blocker for #2942

@li-boxuan li-boxuan requested a review from a team July 6, 2022 15:49
Copy link
Contributor

@rngcntr rngcntr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @li-boxuan, thanks for your PR!

ModifiableConfiguration config = GraphDatabaseConfiguration.buildGraphConfiguration();
config.set(GraphDatabaseConfiguration.STORAGE_BACKEND,"cql");
config.set(CQLConfigOptions.LOCAL_DATACENTER, "dc1");
config.set(GraphDatabaseConfiguration.USE_MULTIQUERY, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding the USE_MULTIQUERY flag as a @Param to provide some insights to the performance boost multi query brings?

Copy link
Member Author

@li-boxuan li-boxuan Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea, but unfortunately, it will take much much longer time (a few hours) for the benchmark to finish. I have tried it manually and (as we would expect) USE_MULTIQUERY = true makes the queries much much faster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, I didn't expect the time to be on the scale of hours. In this case, I think it's fine as it is.

@li-boxuan li-boxuan force-pushed the li-boxuan/add-cql-benchmark branch from fb2f62a to 1ee27a1 Compare July 8, 2022 01:31
@li-boxuan
Copy link
Member Author

@farodin91 Do you want to review again?

@li-boxuan li-boxuan merged commit 15a00b7 into master Jul 25, 2022
@li-boxuan li-boxuan deleted the li-boxuan/add-cql-benchmark branch July 30, 2022 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: external Externally-managed CLA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants