- 
                Notifications
    
You must be signed in to change notification settings  - Fork 2.4k
 
Open
Labels
Description
Some of our OpenCL formats try to allocate pinned memory for buffers with a fallback to normal buffers if the former fails. I always thought the code looked fishy so I looked a bit closer now and it's worse than I expected:
Current code (this from opencl_rawmd4_fmt_plug.c):
	pinned_saved_keys = clCreateBuffer(context[gpu_id], CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, BUFSIZE * kpc, NULL, &ret_code);
	if (ret_code != CL_SUCCESS) {
		saved_plain = (cl_uint *) mem_alloc(BUFSIZE * kpc);
		if (saved_plain == NULL)
			HANDLE_CLERROR(ret_code, "Error creating page-locked memory pinned_saved_keys.");
	}
	else {
		saved_plain = (cl_uint *) clEnqueueMapBuffer(queue[gpu_id], pinned_saved_keys, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, BUFSIZE * kpc, 0, NULL, NULL, &ret_code);
		HANDLE_CLERROR(ret_code, "Error mapping page-locked memory saved_plain.");
	}
(...)
	buffer_keys = clCreateBuffer(context[gpu_id], CL_MEM_READ_ONLY, BUFSIZE * kpc, NULL, &ret_code);
(...)
	BENCH_CLERROR(clEnqueueWriteBuffer(queue[gpu_id], buffer_keys, CL_TRUE, 0, 4 * key_idx, saved_plain, 0, NULL, NULL), "failed in clEnqueueWriteBuffer buffer_keys.");- I emulated a failure for the pinned alloc and sure enough the fallback just makes the format crash soon after.
 - The current code weirdly allocates twice the size needed: First a pinned buffer, then a normal one.
 - Even worse, the current code doesn't even use the pinned buffer ever: It uses that extra, non-pinned, one. Duh!
 
It's not hard to make it work as intended, the question is if it's even needed at all. I think not: When an allocation using CL_MEM_ALLOC_HOST_PTR fails to get pinned memory it will simply return non-pinned instead (which is why we never saw the mentioned crash). IMHO we should just drop that fallback code and start actually using the pinned buffer.
Something like this:
	device_buffer = clCreateBuffer(context[gpu_id], CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, BUFSIZE * kpc, NULL, &ret_code);
	HANDLE_CLERROR(ret_code, "Error creating device buffer");
	host_buffer = clEnqueueMapBuffer(queue[gpu_id], device_buffer, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, BUFSIZE * kpc, 0, NULL, NULL, &ret_code);
	HANDLE_CLERROR(ret_code, "Error mapping host buffer");
(...)
	BENCH_CLERROR(clEnqueueWriteBuffer(queue[gpu_id], device_buffer, CL_TRUE, 0, 4 * key_idx, host_buffer, 0, NULL, NULL), "failed in clEnqueueWriteBuffer buffer_keys.");I authored some of the cases with b0rken code myself, blindly copied from existing formats.