Skip to content

OpenCL: Pinned memory fallbacks don't work #5845

@magnumripper

Description

@magnumripper

Some of our OpenCL formats try to allocate pinned memory for buffers with a fallback to normal buffers if the former fails. I always thought the code looked fishy so I looked a bit closer now and it's worse than I expected:

Current code (this from opencl_rawmd4_fmt_plug.c):

	pinned_saved_keys = clCreateBuffer(context[gpu_id], CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, BUFSIZE * kpc, NULL, &ret_code);
	if (ret_code != CL_SUCCESS) {
		saved_plain = (cl_uint *) mem_alloc(BUFSIZE * kpc);
		if (saved_plain == NULL)
			HANDLE_CLERROR(ret_code, "Error creating page-locked memory pinned_saved_keys.");
	}
	else {
		saved_plain = (cl_uint *) clEnqueueMapBuffer(queue[gpu_id], pinned_saved_keys, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, BUFSIZE * kpc, 0, NULL, NULL, &ret_code);
		HANDLE_CLERROR(ret_code, "Error mapping page-locked memory saved_plain.");
	}

(...)

	buffer_keys = clCreateBuffer(context[gpu_id], CL_MEM_READ_ONLY, BUFSIZE * kpc, NULL, &ret_code);

(...)

	BENCH_CLERROR(clEnqueueWriteBuffer(queue[gpu_id], buffer_keys, CL_TRUE, 0, 4 * key_idx, saved_plain, 0, NULL, NULL), "failed in clEnqueueWriteBuffer buffer_keys.");
  1. I emulated a failure for the pinned alloc and sure enough the fallback just makes the format crash soon after.
  2. The current code weirdly allocates twice the size needed: First a pinned buffer, then a normal one.
  3. Even worse, the current code doesn't even use the pinned buffer ever: It uses that extra, non-pinned, one. Duh!

It's not hard to make it work as intended, the question is if it's even needed at all. I think not: When an allocation using CL_MEM_ALLOC_HOST_PTR fails to get pinned memory it will simply return non-pinned instead (which is why we never saw the mentioned crash). IMHO we should just drop that fallback code and start actually using the pinned buffer.

Something like this:

	device_buffer = clCreateBuffer(context[gpu_id], CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, BUFSIZE * kpc, NULL, &ret_code);
	HANDLE_CLERROR(ret_code, "Error creating device buffer");
	host_buffer = clEnqueueMapBuffer(queue[gpu_id], device_buffer, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, BUFSIZE * kpc, 0, NULL, NULL, &ret_code);
	HANDLE_CLERROR(ret_code, "Error mapping host buffer");

(...)

	BENCH_CLERROR(clEnqueueWriteBuffer(queue[gpu_id], device_buffer, CL_TRUE, 0, 4 * key_idx, host_buffer, 0, NULL, NULL), "failed in clEnqueueWriteBuffer buffer_keys.");

I authored some of the cases with b0rken code myself, blindly copied from existing formats.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions