CONTRIB/IBMOCK: Initial add with EFA support #10069
buildlib/pr/efa.yml
AZP_AGENT_ID: $(AZP_AGENT_ID)
BUILD_NUMBER: "$(Build.BuildId)-$(Build.BuildNumber)"
JOB_URL: "$(System.TeamFoundationCollectionUri)$(System.TeamProject)/_build/results?buildId=$(Build.BuildId)"
Why do you need these? They do not seem to be used by the EFA test script.
fixed
contrib/test_efa.sh
LD_LIBRARY_PATH=$(pwd)/contrib/ibmock/build:$LD_LIBRARY_PATH
CLOUD_TYPE=aws
export LD_LIBRARY_PATH
export CLOUD_TYPE
Minor: these can be initialized in place (export VAR=value).
fixed
contrib/ibmock/verbs.h
@@ -0,0 +1,21 @@
/**
 * Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
Suggested change:
- * Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
fixed all
contrib/ibmock/verbs.c
@@ -0,0 +1,1064 @@
/**
 * Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
Suggested change:
- * Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
fixed
contrib/ibmock/config.h
#include <infiniband/efadv.h>
#include <infiniband/verbs.h>

#define NUM_DEVS 2
Maybe define it in verbs.c? In any case, the number of devices needs to be queried through ibv_get_device_list.
fixed
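A minimal sketch of the reviewer's suggestion, assuming the device table moves into verbs.c and its size is reported through the ibv_get_device_list() convention (a NULL-terminated array plus an optional num_devices out-parameter) instead of a shared NUM_DEVS macro. The mock_device type and the mock_* functions are hypothetical stand-ins, not the actual ibmock code; the rdmap0/rdmap1 names are borrowed from the gtest log further down.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a mocked device; the real mock hands out
 * struct ibv_device pointers. */
struct mock_device {
    const char *name;
};

/* Device table private to verbs.c; its size replaces a global NUM_DEVS. */
static struct mock_device mock_devices[] = {
    { "rdmap0" },
    { "rdmap1" },
};

#define MOCK_NUM_DEVS (sizeof(mock_devices) / sizeof(mock_devices[0]))

/* Follows the ibv_get_device_list() convention: returns a NULL-terminated
 * array and, if num_devices is not NULL, stores the number of devices. */
struct mock_device **mock_get_device_list(int *num_devices)
{
    struct mock_device **list;
    size_t i;

    list = calloc(MOCK_NUM_DEVS + 1, sizeof(*list));
    if (list == NULL) {
        return NULL;
    }

    for (i = 0; i < MOCK_NUM_DEVS; i++) {
        list[i] = &mock_devices[i];
    }
    /* list[MOCK_NUM_DEVS] stays NULL thanks to calloc() */

    if (num_devices != NULL) {
        *num_devices = (int)MOCK_NUM_DEVS;
    }

    return list;
}

/* Counterpart of ibv_free_device_list(): releases only the array. */
void mock_free_device_list(struct mock_device **list)
{
    free(list);
}
```

Callers then iterate up to num_devices (or until the NULL entry), so adding a device only means adding a table row.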
contrib/ibmock/config.c
@@ -0,0 +1,109 @@
/**
 * Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
Suggested change:
- * Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
fixed
contrib/ibmock/config.h
@@ -0,0 +1,22 @@
/**
 * Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
Suggested change:
- * Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
fixed
contrib/ibmock/efa.c
@@ -0,0 +1,102 @@
/**
 * Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
Suggested change:
- * Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
fixed
contrib/ibmock/verbs.c
    }
}

printf("ibmock: failed to destroy QP#%d\n", qp->qp_num);
Maybe print to stderr instead?
fixed for all
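One way to make "print to stderr" hold for every message is to route all diagnostics through a single helper. This is a sketch under that assumption; ibmock_log() and IBMOCK_ERR() are hypothetical names, not functions from the PR.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: every ibmock diagnostic goes through this function,
 * which prefixes "ibmock: " and writes to the given stream. Returns the
 * number of characters written, or a negative value on error (like fprintf). */
int ibmock_log(FILE *stream, const char *fmt, ...)
{
    va_list ap;
    int prefix, body;

    prefix = fprintf(stream, "ibmock: ");
    if (prefix < 0) {
        return prefix;
    }

    va_start(ap, fmt);
    body = vfprintf(stream, fmt, ap);
    va_end(ap);

    return (body < 0) ? body : (prefix + body);
}

/* Errors go to stderr, keeping stdout free for the test output itself. */
#define IBMOCK_ERR(...) ibmock_log(stderr, __VA_ARGS__)
```

The failing call site would then become `IBMOCK_ERR("failed to destroy QP#%u\n", qp->qp_num);`, using `%u` because qp_num is a uint32_t.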
contrib/ibmock/verbs.c
if (node_type < IBV_NODE_CA || node_type > IBV_NODE_UNSPECIFIED)
    return "unknown";
In some places you use a different style: parentheses around each condition, and braces around every block even when it is a single line. It is better to use the same style throughout the whole library.
fixed
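A sketch of the snippet above rewritten in one consistent style, assuming the UCX convention of parentheses around each condition and braces on every block. The enum is a local stand-in assumed to mirror the values of enum ibv_node_type, so the example builds without rdma-core headers; mock_node_type_str() and its strings are illustrative only.

```c
#include <assert.h>
#include <string.h>

/* Local stand-in assumed to mirror enum ibv_node_type from
 * <infiniband/verbs.h>, so the sketch builds without rdma-core. */
enum mock_node_type {
    MOCK_NODE_UNKNOWN     = -1,
    MOCK_NODE_CA          = 1,
    MOCK_NODE_SWITCH,
    MOCK_NODE_ROUTER,
    MOCK_NODE_RNIC,
    MOCK_NODE_USNIC,
    MOCK_NODE_USNIC_UDP,
    MOCK_NODE_UNSPECIFIED
};

const char *mock_node_type_str(enum mock_node_type node_type)
{
    /* Illustrative strings indexed by enum value; index 0 is unused. */
    static const char *const names[] = {
        [MOCK_NODE_CA]          = "channel adapter",
        [MOCK_NODE_SWITCH]      = "switch",
        [MOCK_NODE_ROUTER]      = "router",
        [MOCK_NODE_RNIC]        = "iWARP NIC",
        [MOCK_NODE_USNIC]       = "usNIC",
        [MOCK_NODE_USNIC_UDP]   = "usNIC UDP",
        [MOCK_NODE_UNSPECIFIED] = "unspecified",
    };

    /* Parentheses around each condition and braces even around a
     * single-statement body, matching the rest of the library. */
    if ((node_type < MOCK_NODE_CA) || (node_type > MOCK_NODE_UNSPECIFIED)) {
        return "unknown";
    }

    return names[node_type];
}
```

The range check keeps the lookup safe for out-of-range values, including the unused index 0 and IBV_NODE_UNKNOWN (-1).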
contrib/ibmock/README.md
```
cd ucx
./autogen.sh
./contrib/configure-devel --with-verbs=$(pwd)/../rdma-core/build/include --with-efa
```
Actually, remove the /include suffix.
Need to apply the comment?
fixed
@tvegas1, can you pls squash?
CI failure fixed by #10457
/azp run UCX PR
Azure Pipelines successfully started running 1 pipeline(s).
cb7e3da to 610ec40
src/uct/ib/efa/srd/srd_iface.c
@@ -357,8 +364,7 @@ static uct_iface_ops_t uct_srd_iface_tl_ops = {
         ucs_empty_function_return_unsupported,
     .ep_pending_purge = (uct_ep_pending_purge_func_t)
         ucs_empty_function_return_unsupported,
-    .iface_flush = (uct_iface_flush_func_t)
-        ucs_empty_function_return_unsupported,
+    .iface_flush = uct_srd_iface_flush,
Can we use uct_base_iface_flush?
fixed
test/gtest/ucp/test_ucp_perf.cc
@@ -327,6 +327,10 @@ UCS_TEST_SKIP_COND_P(test_ucp_perf, envelope, has_transport("self"))
size_t max_iter = std::numeric_limits<size_t>::max();
test_spec test = tests[get_variant_value(VARIANT_TEST_TYPE)];

if (ucs::is_aws() && (test.wait_mode == UCX_PERF_WAIT_MODE_SLEEP)) {
What happens if we don't skip AWS here?
I would expect the UCP layer or libperf to be able to handle the missing support.
There is the error below; it actually shows up when only ud_v is available, so I updated the condition in the meantime. My understanding is that it is related to missing event support. Do you see an easy way to detect this beforehand inside libperf?
[ RUN ] ud/test_ucp_perf.envelope/10 <ud_v/tag_bw_b>
[1738752485.488782] [rock28:3736565:p-0] select.c:645 UCX ERROR no active messages transport to rock28:3736565: ud_verbs/rdmap0:1 - no send completion event, ud_verbs/rdmap1:1 - no send completion event
[1738752485.488791] [rock28:3736565:p-1] select.c:645 UCX ERROR no active messages transport to rock28:3736565: ud_verbs/rdmap0:1 - no send completion event, ud_verbs/rdmap1:1 - no send completion event
contrib/../test/gtest/common/test.cc:366: Failure
Failed
Got 2 errors and 0 warnings during the test
[ INFO ] < contrib/../src/ucp/wireup/select.c:645 no active messages transport to rock28:3736565: ud_verbs/rdmap0:1 - no send completion event, ud_verbs/rdmap1:1 - no send completion event >
[ INFO ] < contrib/../src/ucp/wireup/select.c:645 no active messages transport to rock28:3736565: ud_verbs/rdmap0:1 - no send completion event, ud_verbs/rdmap1:1 - no send completion event >
[ FAILED ] ud/test_ucp_perf.envelope/10, where GetParam() = ud_v/tag_bw_b (6 ms)
AFAIR, the issue was solicited event support. Maybe we can support wakeup/sleep in the ud_v transport by arming the CQ for any event (not just solicited ones); it will not be optimal, but at least it will work.
We can add a comment here: "// TODO support wakeup in UD transport without requiring IBV_SEND_SOLICITED"
added comment
test/gtest/ucp/test_ucp_perf.cc
ss << GetParam().transports;

if (ucs::is_aws() && (test.wait_mode == UCX_PERF_WAIT_MODE_SLEEP) &&
    ss.str() == "ud_v") {
- Add parentheses ( ) around the ss.str() == "ud_v" comparison.
- Maybe use has_transport() instead of std::stringstream?
fixed
What
Add IB mocking (EFA support) capability.
Why?
To run gtest without requiring an EFA interface.
How?
Implement a loopback libibverbs/libefa. Keep the build and code independent from the UCX libraries.
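To illustrate the loopback idea, here is a minimal sketch assuming both QPs live in the same process: a "send" copies the payload straight into the oldest receive buffer posted on the destination QP and queues a completion on each side, with no device or wire involved. All types and functions (mock_qp, mock_post_send, and so on) are hypothetical and are not the ibmock API; the real mock exports the libibverbs/libefa entry points instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_QUEUE_DEPTH 16 /* power of two, so index wraparound stays consistent */

/* Hypothetical completion entry, loosely modeled on struct ibv_wc. */
struct mock_wc {
    unsigned long wr_id;
    size_t        byte_len;
    int           is_recv;
};

struct mock_cq {
    struct mock_wc entries[MOCK_QUEUE_DEPTH];
    unsigned       head, tail;
};

struct mock_recv_wr {
    unsigned long wr_id;
    void          *buf;
    size_t        len;
};

struct mock_qp {
    struct mock_cq      *send_cq;
    struct mock_cq      *recv_cq;
    struct mock_recv_wr rq[MOCK_QUEUE_DEPTH];
    unsigned            rq_head, rq_tail;
};

static unsigned mock_cq_space(const struct mock_cq *cq)
{
    return MOCK_QUEUE_DEPTH - (cq->tail - cq->head);
}

static void mock_cq_push(struct mock_cq *cq, struct mock_wc wc)
{
    cq->entries[cq->tail++ % MOCK_QUEUE_DEPTH] = wc;
}

/* Returns 1 and fills *wc if a completion is available, 0 otherwise. */
int mock_poll_cq(struct mock_cq *cq, struct mock_wc *wc)
{
    if (cq->head == cq->tail) {
        return 0;
    }
    *wc = cq->entries[cq->head++ % MOCK_QUEUE_DEPTH];
    return 1;
}

int mock_post_recv(struct mock_qp *qp, unsigned long wr_id, void *buf,
                   size_t len)
{
    if ((qp->rq_tail - qp->rq_head) == MOCK_QUEUE_DEPTH) {
        return -1; /* receive queue full */
    }
    qp->rq[qp->rq_tail++ % MOCK_QUEUE_DEPTH] =
            (struct mock_recv_wr){ wr_id, buf, len };
    return 0;
}

/* Loopback send: deliver into dst's oldest posted receive, then complete
 * the receive on dst and the send on src. */
int mock_post_send(struct mock_qp *src, struct mock_qp *dst,
                   unsigned long wr_id, const void *data, size_t len)
{
    struct mock_recv_wr *rwr;
    /* Both completions may target the same CQ, which then needs two slots */
    unsigned needed = (src->send_cq == dst->recv_cq) ? 2 : 1;

    if (dst->rq_head == dst->rq_tail) {
        return -1; /* no receive posted; this sketch simply fails the send */
    }

    rwr = &dst->rq[dst->rq_head % MOCK_QUEUE_DEPTH];
    if ((len > rwr->len) || (mock_cq_space(dst->recv_cq) < needed) ||
        (mock_cq_space(src->send_cq) < needed)) {
        return -1;
    }

    dst->rq_head++;
    memcpy(rwr->buf, data, len);
    mock_cq_push(dst->recv_cq, (struct mock_wc){ rwr->wr_id, len, 1 });
    mock_cq_push(src->send_cq, (struct mock_wc){ wr_id, len, 0 });
    return 0;
}
```

Because delivery happens synchronously inside the send call, a test polling the CQs right afterwards sees both completions, which is what makes a device-free gtest run deterministic in this model.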
Perftest/Info
Gtest: