Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

librdmacm: prevent NULL pointer access during device initialization #1547

Merged
merged 1 commit into from
Feb 4, 2025
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 17 additions & 4 deletions librdmacm/cma.c
Original file line number Diff line number Diff line change
Expand Up @@ -2252,17 +2252,30 @@ int rdma_ack_cm_event(struct rdma_cm_event *event)

static void ucma_process_addr_resolved(struct cma_event *evt)
{
struct rdma_cm_id *id = &evt->id_priv->id;

if (af_ib_support) {
evt->event.status = ucma_query_addr(&evt->id_priv->id);
evt->event.status = ucma_query_addr(id);
if (!evt->event.status && !id->verbs)
goto err_dev;

if (!evt->event.status &&
evt->id_priv->id.verbs->device->transport_type == IBV_TRANSPORT_IB)
evt->event.status = ucma_query_gid(&evt->id_priv->id);
id->verbs->device->transport_type == IBV_TRANSPORT_IB) {
evt->event.status = ucma_query_gid(id);
}
} else {
evt->event.status = ucma_query_route(&evt->id_priv->id);
evt->event.status = ucma_query_route(id);
if (!evt->event.status && !id->verbs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that flow, the 'id->verbs' is not used compared to the above 'af_ib_support' case.

Do we really need to fail the call once id->verbs wasn't set ? no NULL pointer issue is expected here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need to fail the call in af_ib_support == 0 case, as the man page says

If successful, the specified rdma_cm_id will be bound to a local device.

However, in the current implementation, no device binding occurs in this case.

This becomes problematic in the typical RDMA CM connection setup flow where rdma_resolve_route is called after receiving RDMA_CM_EVENT_ADDR_RESOLVED. The following code in rdma_resolve_route would trigger a NULL pointer dereference:

int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms)
{
        ... snip ...
        // id->verbs is NULL here
        if (id->verbs->device->transport_type == IBV_TRANSPORT_IB) {
                ret = ucma_set_ib_route(id);
                if (!ret)
                        goto out;
        }
      
        ... snip ...
}

So we'd better set the event to RDMA_CM_EVENT_ADDR_ERROR and let user handle the error event

goto err_dev;
}

if (evt->event.status)
evt->event.event = RDMA_CM_EVENT_ADDR_ERROR;
return;

err_dev:
evt->event.status = ERR(ENODEV);
evt->event.event = RDMA_CM_EVENT_ADDR_ERROR;
}

static void ucma_process_route_resolved(struct cma_event *evt)
Expand Down