Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCT/MLX5: Fix dp_ordering logic - needed to disable DDP when no AR #10489

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

yosefe
Copy link
Contributor

@yosefe yosefe commented Feb 13, 2025

Why

Fix a failure to initialize RC/DC iface with DDP_ENABLE=n, when the NIC does not support OOO/AR.
Happens because of an issue in the current logic: DDP_ENABLE=n implies we want MD to support UCT_IB_MLX5_MD_FLAG_DP_ORDERING_FORCE (presumable to force-disable), while in fact it is not needed.

How

Examine DDP and AR configurations independently; fail if they conflict; pick the maxinal level that is both supported and needed.

Failure example:

$ UCX_RC_MLX5_DDP_ENABLE=n UCX_DC_MLX5_DDP_ENABLE=n <path>/gtest --gtest_filter='*reorder_cyclic*' 
[ RUN      ] dc_mlx5/test_rc_srq.reorder_cyclic/0 <dc_mlx5/mlx5_0:1>
[     INFO ] < mlx5_0: cannot set ar_enable=3, ddp_enable=0, capability=0 on dc >
/labhome/yosefe/work/ucx/contrib/../test/gtest/uct/uct_test.cc:862: Failure
Error: Invalid parameter
[  FAILED  ] dc_mlx5/test_rc_srq.reorder_cyclic/0, where GetParam() = dc_mlx5/mlx5_0:1 (33 ms)
[ RUN      ] dc_mlx5/test_rc_srq.reorder_cyclic/1 <dc_mlx5/mlx5_1:1>
[     INFO ] < mlx5_1: cannot set ar_enable=3, ddp_enable=0, capability=0 on dc >
/labhome/yosefe/work/ucx/contrib/../test/gtest/uct/uct_test.cc:862: Failure
Error: Invalid parameter
[  FAILED  ] dc_mlx5/test_rc_srq.reorder_cyclic/1, where GetParam() = dc_mlx5/mlx5_1:1 (22 ms)
[----------] 2 tests from dc_mlx5/test_rc_srq (55 ms total)

ucs_config_sprintf_ternary_auto(ddp_enable_str, sizeof(ddp_enable_str),
&config->ddp_enable, NULL);
ucs_error("%s/%s: cannot set ar_enable=%s ddp_enable=%s cap=%s "
"supp_force=%d (min=%s max=%s force=%d)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

force_support=

Comment on lines +633 to +634
min_dp_ordering = ucs_max(min_dp_ordering,
UCT_IB_MLX5_DP_ORDERING_OOO_ALL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need ucs_max here? can just assign to UCT_IB_MLX5_DP_ORDERING_OOO_ALL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's true that since we calculate min/max here right after they are initialized to the initial values, no need to really use min.max, but IMO this improves code readability and shows that applying DDP and AR configurations is interchangeable.

Comment on lines +649 to +650
max_dp_ordering = ucs_min(max_dp_ordering,
UCT_IB_MLX5_DP_ORDERING_IBTA);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need ucs_min here?

} else if (config->ddp_enable == UCS_NO) {
/* No more than OOO_RW would be used */
max_dp_ordering = ucs_min(max_dp_ordering,
UCT_IB_MLX5_DP_ORDERING_OOO_RW);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe initialize force=1 and set it to 0 just here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed to logic to calculate "force", so the test would also not fail when AR_ENABLE=n and AR is not supported(and cannot be forced)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants