-
Notifications
You must be signed in to change notification settings - Fork 435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCP/PROTO: Disable protocols that need memory type copy #10490
base: master
Are you sure you want to change the base?
Conversation
@@ -192,7 +192,8 @@ ucp_tag_rndv_offload_sw_proto_probe(const ucp_proto_init_params_t *init_params) | |||
ucp_proto_rndv_ctrl_priv_t rpriv; | |||
|
|||
if (!ucp_tag_rndv_check_op_id(init_params) || | |||
!ucp_ep_config_key_has_tag_lane(init_params->ep_config_key)) { | |||
!ucp_ep_config_key_has_tag_lane(init_params->ep_config_key) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a simple mock test to verify the protocol selection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to discuss because there are many protocols to tests in that case.
src/ucp/am/eager_multi.c
Outdated
@@ -49,7 +49,8 @@ ucp_am_eager_multi_bcopy_proto_probe(const ucp_proto_init_params_t *init_params) | |||
}; | |||
|
|||
if (!ucp_am_check_init_params(init_params, UCP_PROTO_AM_OP_ID_MASK, | |||
UCP_PROTO_SELECT_OP_FLAG_AM_RNDV)) { | |||
UCP_PROTO_SELECT_OP_FLAG_AM_RNDV) || | |||
context->config.ext.avoid_copy_mem_types) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it feasible to move this check in ucp_proto_init_check_op
and then apply it when send_op != ZCOPY?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eager zcopy should also be disabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, and egr single too
src/ucp/am/eager_multi.c
Outdated
@@ -49,7 +49,8 @@ ucp_am_eager_multi_bcopy_proto_probe(const ucp_proto_init_params_t *init_params) | |||
}; | |||
|
|||
if (!ucp_am_check_init_params(init_params, UCP_PROTO_AM_OP_ID_MASK, | |||
UCP_PROTO_SELECT_OP_FLAG_AM_RNDV)) { | |||
UCP_PROTO_SELECT_OP_FLAG_AM_RNDV) || | |||
context->config.ext.avoid_copy_mem_types) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eager zcopy should also be disabled
src/ucp/am/eager_single.c
Outdated
@@ -118,7 +118,8 @@ ucp_am_eager_short_probe_common(const ucp_proto_init_params_t *init_params, | |||
|
|||
if (!ucp_am_check_init_params(init_params, UCS_BIT(op_id), | |||
UCP_PROTO_SELECT_OP_FLAG_AM_RNDV) || | |||
!ucp_proto_is_short_supported(select_param)) { | |||
!ucp_proto_is_short_supported(select_param) || | |||
init_params->worker->context->config.ext.avoid_copy_mem_types) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you could define a wrapper on top of ucp_am_check_init_params
to use for all AM eager protocols and check for avoid_copy_mem_types
inside
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/ucp/rndv/rndv_rkey_ptr.c
Outdated
@@ -396,7 +396,7 @@ ucp_proto_t ucp_rndv_rkey_ptr_mtype_proto = { | |||
.name = "rndv/rkey_ptr/mtype", | |||
.desc = "copy to mapped remote memory", | |||
.flags = 0, | |||
.probe = ucp_proto_rndv_rkey_ptr_mtype_probe, | |||
.probe = ucp_proto_rndv_rkey_ptr_mtype_probe, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated
also we need to disable rkey_ptr_mtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is disabled as it uses ucp_proto_rndv_mtype_init()
which contains the added check. restored space
src/ucp/rndv/rndv_am.c
Outdated
@@ -120,6 +120,10 @@ static void ucp_rndv_am_bcopy_probe(const ucp_proto_init_params_t *init_params) | |||
.middle.tl_cap_flags = UCT_IFACE_FLAG_AM_BCOPY | |||
}; | |||
|
|||
if (init_params->worker->context->config.ext.avoid_copy_mem_types) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls move it to ucp_rndv_am_probe_common
, because we need to also disable am_zcopy (it also implies unpacking on the receiver)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -177,7 +177,8 @@ static void ucp_proto_amo_probe(const ucp_proto_init_params_t *init_params, | |||
}; | |||
|
|||
if ((init_params->select_param->dt_class != UCP_DATATYPE_CONTIG) || | |||
!ucp_proto_init_check_op(init_params, UCS_BIT(op_id))) { | |||
!ucp_proto_init_check_op(init_params, UCS_BIT(op_id)) || | |||
init_params->worker->context->config.ext.avoid_copy_mem_types) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you can disable it only for non-host memtypes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be fixed now
{"MEMTYPE_AVOID_COPY", "n", | ||
"Avoid memory type copies when activated.\n", | ||
ucs_offsetof(ucp_context_config_t, avoid_copy_mem_types), UCS_CONFIG_TYPE_BOOL}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{"MEMTYPE_AVOID_COPY", "n", | |
"Avoid memory type copies when activated.\n", | |
ucs_offsetof(ucp_context_config_t, avoid_copy_mem_types), UCS_CONFIG_TYPE_BOOL}, | |
{"MEMTYPE_COPY_ENABLE", "y", | |
"Enable memory type copies", | |
ucs_offsetof(ucp_context_config_t, mem_type_copy_enable), UCS_CONFIG_TYPE_BOOL}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to explain my comment. We have dozens UCX variables ended by "ENABLE", but with "AVOID". And I think it is inconvenient to set a negative value to a variable with a negative in the name to enable a feature.
What?
Make sure to disable any protocol using locally or remotely, a memory type based copy.
Why?
Need to avoid all cuda calls from under
ucp_worker_progress()
. Cuda memory queries are fine since they are performed under send/recv operations.cuda_ipc
transport can be disabled altogether, andgdr_copy
is anyways not of interest when specifyingUCX_MEMTYPE_AVOID_COPY=y
.How?
Disable applicable protocols altogether, for any memory type (cuda/host..).
Test
Might need to make inline sizes to 0, see below otherwise.
Used by perftest to exchange
psn_t sn
: