Add out-of-bounds memory checking for GPUMem buffers and sub-buffers #3666
This commit implements out-of-bounds (OOB) memory detection for MIOpen GPUMem convolution buffers. The detection relies on the fact that the GPU raises a memory fault if an access crosses a 2 MiB alignment boundary *and* the memory adjacent to that boundary is not allocated, i.e., if the GPU memory is aligned to a 2 MiB page boundary and the access falls either before the start of the buffer or after its end. To make such faults more likely to occur, the memory returned to the client is placed either at the start of the page-aligned allocation (to catch accesses before the start of the buffer) or at the end of it (to catch accesses past the end of the buffer).

This detection is off by default, but can be enabled via the MIOpenDriver --gpubuffer_check flag, which takes the following values:

0 - no out-of-bounds detection (default)
1 - align the user memory to the left (detect OOB accesses before the start of the buffer)
2 - align the user memory to the right (detect OOB accesses after the end of the buffer)

Note that this initial commit only enables this detection for convolutions.
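The left/right placement described above comes down to a small piece of offset arithmetic. The following is a minimal host-side sketch of that arithmetic only, not the code from this PR: the names `kPageSize`, `GpuBufferCheck`, `RoundUpToPage`, and `UserOffset` are hypothetical, and it assumes the underlying allocation starts on a 2 MiB boundary and is padded up to a whole number of 2 MiB pages.

```cpp
#include <cassert>
#include <cstddef>

// Assumed page granularity at which the GPU faults (2 MiB, per the commit message).
constexpr std::size_t kPageSize = 2 * 1024 * 1024;

// Hypothetical mirror of the --gpubuffer_check values 0, 1 and 2.
enum class GpuBufferCheck { None = 0, Left = 1, Right = 2 };

// Round a byte count up to the next multiple of the page size.
constexpr std::size_t RoundUpToPage(std::size_t bytes)
{
    return (bytes + kPageSize - 1) / kPageSize * kPageSize;
}

// Offset of the user-visible buffer inside a page-aligned, page-padded allocation.
// Left:  the buffer starts on the page boundary, so reads before it can fault.
// Right: the buffer ends on the page boundary, so reads past its end can fault.
// None:  no padding is implied; the offset is simply zero.
// Note: this sketch ignores any element-alignment requirement on the returned pointer.
constexpr std::size_t UserOffset(std::size_t user_bytes, GpuBufferCheck mode)
{
    return mode == GpuBufferCheck::Right ? RoundUpToPage(user_bytes) - user_bytes : 0;
}
```

With right alignment, `UserOffset(n, GpuBufferCheck::Right) + n` is always a multiple of the page size, which is what puts the end of the buffer directly against the page boundary.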
This commit implements out-of-bounds memory detection for MIOpen sub-buffers, which are buffers created from the workspace; that is, the workspace is chopped into logical sub-buffers that can then be used by the kernels. Note that the previous detection method might not detect invalid uses of sub-buffers: the sub-buffers are still part of the workspace, so an access past the edge of one sub-buffer can be hidden by the fact that it lands *in* a different sub-buffer. Such an access is still a bug, but it would not cause a GPU memory fault.

This commit introduces a new environment variable, MIOPEN_DEBUG_CHECK_SUB_BUFFERS, which can be used to enable out-of-bounds memory access detection for sub-buffers. This is done by hipMalloc'ing each sub-buffer (in CreateSubBuffer()) and then aligning the sub-buffer to either the left or the right of the 2 MiB aligned pages. The variable takes the following values:

undefined or 0 - no sub-buffer out-of-bounds detection
1 - align the sub-buffer to the left (detect OOB accesses before the start of the sub-buffer)
2 - align the sub-buffer to the right (detect OOB accesses after the end of the sub-buffer)

Note that this is intended for internal debugging only.
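As an illustration of how the three settings of MIOPEN_DEBUG_CHECK_SUB_BUFFERS could be interpreted, here is a minimal sketch that reads the variable with `std::getenv`. It is not the parsing code from this PR: `SubBufferCheck`, `ParseSubBufferCheck`, and `SubBufferCheckFromEnv` are hypothetical names, and treating any value other than 1 or 2 as "disabled" is an assumption, since the commit message only specifies the behavior for undefined, 0, 1, and 2.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical mirror of the documented settings.
enum class SubBufferCheck { None, Left, Right };

// Map the raw environment string to a mode.
// Assumption: anything other than "1" or "2" (including unset) disables the check.
inline SubBufferCheck ParseSubBufferCheck(const char* value)
{
    if(value == nullptr)
        return SubBufferCheck::None;
    if(std::strcmp(value, "1") == 0)
        return SubBufferCheck::Left;
    if(std::strcmp(value, "2") == 0)
        return SubBufferCheck::Right;
    return SubBufferCheck::None;
}

// Read the mode from the process environment.
inline SubBufferCheck SubBufferCheckFromEnv()
{
    return ParseSubBufferCheck(std::getenv("MIOPEN_DEBUG_CHECK_SUB_BUFFERS"));
}
```

Keeping the string parsing separate from the `std::getenv` call makes the mapping easy to exercise without touching the process environment.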
LGTM
Also rename some variables and lambdas to help clarify the intention of the code.
…_OOB_MEMORY_ACCESS
Just a couple of suggestions for the docs.
LGTM
This PR adds two mechanisms for detecting out-of-bounds GPU buffer
memory accesses: