Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instanced rendering not working with Metal. #891

Open
aleph1 opened this issue Feb 21, 2025 · 4 comments
Open

Instanced rendering not working with Metal. #891

aleph1 opened this issue Feb 21, 2025 · 4 comments

Comments

@aleph1
Copy link
Contributor

aleph1 commented Feb 21, 2025

Describe the bug
As per , instanced rendering is not implemented for Metal on macOS. I took a quick look at pipeline.m.h and it seems like the issue is that the code that iterates the pipeline->inputLayout array does not account for the correct number of input layouts, as other backends account for this. I have a patched version that is based on the Vulkan back-end that no longer triggers a run-time error, however if I compile and run the instanced drawing sample, only one triangle is drawn versus the expected number.

To Reproduce
Steps to reproduce the behavior:

  1. As per download and build the sample for instanced rendering using -g metal
  2. Build the Xcode project.
  3. Run the Xcode project and it will throw a runtime error.

Expected behavior
Instanced rendering should work as expected.

Additional context
I am fairly new to metal, but this is the partially modified code that no longer throws a runtime error, but it still only draws a single triangle, so I am missing something. I have included a note as to where the code modifications in this function begin and end.

void kinc_g5_pipeline_compile(kinc_g5_pipeline_t *pipeline) {
	kinc_log(KINC_LOG_LEVEL_INFO, "kinc_g5_pipeline_compile()");
	MTLRenderPipelineDescriptor *renderPipelineDesc = [[MTLRenderPipelineDescriptor alloc] init];
	renderPipelineDesc.vertexFunction = (__bridge id<MTLFunction>)pipeline->vertexShader->impl.mtlFunction;
	renderPipelineDesc.fragmentFunction = (__bridge id<MTLFunction>)pipeline->fragmentShader->impl.mtlFunction;
	for (int i = 0; i < pipeline->colorAttachmentCount; ++i) {
		renderPipelineDesc.colorAttachments[i].pixelFormat = convert_render_target_format(pipeline->colorAttachment[i]);
		renderPipelineDesc.colorAttachments[i].blendingEnabled =
		    pipeline->blend_source != KINC_G5_BLEND_ONE || pipeline->blend_destination != KINC_G5_BLEND_ZERO ||
		    pipeline->alpha_blend_source != KINC_G5_BLEND_ONE || pipeline->alpha_blend_destination != KINC_G5_BLEND_ZERO;
		renderPipelineDesc.colorAttachments[i].sourceRGBBlendFactor = convert_blending_factor(pipeline->blend_source);
		renderPipelineDesc.colorAttachments[i].destinationRGBBlendFactor = convert_blending_factor(pipeline->blend_destination);
		renderPipelineDesc.colorAttachments[i].rgbBlendOperation = convert_blending_operation(pipeline->blend_operation);
		renderPipelineDesc.colorAttachments[i].sourceAlphaBlendFactor = convert_blending_factor(pipeline->alpha_blend_source);
		renderPipelineDesc.colorAttachments[i].destinationAlphaBlendFactor = convert_blending_factor(pipeline->alpha_blend_destination);
		renderPipelineDesc.colorAttachments[i].alphaBlendOperation = convert_blending_operation(pipeline->alpha_blend_operation);
		renderPipelineDesc.colorAttachments[i].writeMask =
		    (pipeline->colorWriteMaskRed[i] ? MTLColorWriteMaskRed : 0) | (pipeline->colorWriteMaskGreen[i] ? MTLColorWriteMaskGreen : 0) |
		    (pipeline->colorWriteMaskBlue[i] ? MTLColorWriteMaskBlue : 0) | (pipeline->colorWriteMaskAlpha[i] ? MTLColorWriteMaskAlpha : 0);
	}
	renderPipelineDesc.depthAttachmentPixelFormat = MTLPixelFormatInvalid;
	renderPipelineDesc.stencilAttachmentPixelFormat = MTLPixelFormatInvalid;

	MTLVertexDescriptor *vertexDescriptor = [[MTLVertexDescriptor alloc] init];

        // Begin code modifications

	int vertexBindingCount = 0;
	for (int i = 0; i < 16; ++i) {
		if (pipeline->inputLayout[i] == NULL) {
			break;
		}
		//vertexAttributeCount += pipeline->inputLayout[i]->size;
		vertexBindingCount++;
	}

	int attr = 0;
	
	for (int binding = 0; binding < vertexBindingCount; ++binding) {
		int offset = 0;
		int stride = 0;
		for (int i = 0; i < pipeline->inputLayout[binding]->size; ++i) {
			kinc_g5_vertex_element_t element = pipeline->inputLayout[binding]->elements[i];
			int index = findAttributeIndex(renderPipelineDesc.vertexFunction.vertexAttributes, element.name);

			if (index < 0) {
				kinc_log(KINC_LOG_LEVEL_WARNING, "Could not find vertex attribute %s\n", element.name);
			}

			if (index >= 0) {
				vertexDescriptor.attributes[attr].bufferIndex = attr;
				vertexDescriptor.attributes[attr].offset = offset;
				offset += kinc_g4_vertex_data_size(element.data);
				stride += kinc_g4_vertex_data_size(element.data);

				switch (element.data) {
				case KINC_G4_VERTEX_DATA_NONE:
					assert(false);
					break;
				case KINC_G4_VERTEX_DATA_F32_1X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatFloat;
					break;
				case KINC_G4_VERTEX_DATA_F32_2X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatFloat2;
					break;
				case KINC_G4_VERTEX_DATA_F32_3X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatFloat3;
					break;
				case KINC_G4_VERTEX_DATA_F32_4X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatFloat4;
					break;
				case KINC_G4_VERTEX_DATA_F32_4X4:
					assert(false);
					break;
				case KINC_G4_VERTEX_DATA_I8_1X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatChar;
					break;
				case KINC_G4_VERTEX_DATA_U8_1X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUChar;
					break;
				case KINC_G4_VERTEX_DATA_I8_1X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatCharNormalized;
					break;
				case KINC_G4_VERTEX_DATA_U8_1X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUCharNormalized;
					break;
				case KINC_G4_VERTEX_DATA_I8_2X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatChar2;
					break;
				case KINC_G4_VERTEX_DATA_U8_2X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUChar2;
					break;
				case KINC_G4_VERTEX_DATA_I8_2X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatChar2Normalized;
					break;
				case KINC_G4_VERTEX_DATA_U8_2X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUChar2Normalized;
					break;
				case KINC_G4_VERTEX_DATA_I8_4X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatChar4;
					break;
				case KINC_G4_VERTEX_DATA_U8_4X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUChar4;
					break;
				case KINC_G4_VERTEX_DATA_I8_4X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatChar4Normalized;
					break;
				case KINC_G4_VERTEX_DATA_U8_4X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUChar4Normalized;
					break;
				case KINC_G4_VERTEX_DATA_I16_1X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatShort;
					break;
				case KINC_G4_VERTEX_DATA_U16_1X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUShort;
					break;
				case KINC_G4_VERTEX_DATA_I16_1X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatShortNormalized;
					break;
				case KINC_G4_VERTEX_DATA_U16_1X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUShortNormalized;
					break;
				case KINC_G4_VERTEX_DATA_I16_2X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatShort2;
					break;
				case KINC_G4_VERTEX_DATA_U16_2X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUShort2;
					break;
				case KINC_G4_VERTEX_DATA_I16_2X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatShort2Normalized;
					break;
				case KINC_G4_VERTEX_DATA_U16_2X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUShort2Normalized;
					break;
				case KINC_G4_VERTEX_DATA_I16_4X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatShort4;
					break;
				case KINC_G4_VERTEX_DATA_U16_4X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUShort4;
					break;
				case KINC_G4_VERTEX_DATA_I16_4X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatShort4Normalized;
					break;
				case KINC_G4_VERTEX_DATA_U16_4X_NORMALIZED:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUShort4Normalized;
					break;
				case KINC_G4_VERTEX_DATA_I32_1X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatInt;
					break;
				case KINC_G4_VERTEX_DATA_U32_1X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUInt;
					break;
				case KINC_G4_VERTEX_DATA_I32_2X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatInt2;
					break;
				case KINC_G4_VERTEX_DATA_U32_2X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUInt2;
					break;
				case KINC_G4_VERTEX_DATA_I32_3X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatInt3;
					break;
				case KINC_G4_VERTEX_DATA_U32_3X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUInt3;
					break;
				case KINC_G4_VERTEX_DATA_I32_4X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatInt4;
					break;
				case KINC_G4_VERTEX_DATA_U32_4X:
					vertexDescriptor.attributes[attr].format = MTLVertexFormatUInt4;
					break;
				default:
					assert(false);
					break;
				}
				attr++;
			}
		}

		vertexDescriptor.layouts[binding].stride = offset;
		vertexDescriptor.layouts[binding].stepFunction = MTLVertexStepFunctionPerVertex;
	}

        // End code modifications

	renderPipelineDesc.vertexDescriptor = vertexDescriptor;

	NSError *errors = nil;
	MTLRenderPipelineReflection *reflection = nil;
	id<MTLDevice> device = getMetalDevice();

	pipeline->impl._pipeline = (__bridge_retained void *)[device newRenderPipelineStateWithDescriptor:renderPipelineDesc
	                                                                                          options:MTLPipelineOptionBufferTypeInfo
	                                                                                       reflection:&reflection
	                                                                                            error:&errors];
	if (errors != nil)
		NSLog(@"%@", [errors localizedDescription]);
	assert(pipeline->impl._pipeline && !errors);

	renderPipelineDesc.depthAttachmentPixelFormat = MTLPixelFormatDepth32Float_Stencil8;
	renderPipelineDesc.stencilAttachmentPixelFormat = MTLPixelFormatDepth32Float_Stencil8;
	pipeline->impl._pipelineDepth = (__bridge_retained void *)[device newRenderPipelineStateWithDescriptor:renderPipelineDesc
	                                                                                               options:MTLPipelineOptionBufferTypeInfo
	                                                                                            reflection:&reflection
	                                                                                                 error:&errors];
	if (errors != nil)
		NSLog(@"%@", [errors localizedDescription]);
	assert(pipeline->impl._pipelineDepth && !errors);

	pipeline->impl._reflection = (__bridge_retained void *)reflection;

	MTLDepthStencilDescriptor *depthStencilDescriptor = [MTLDepthStencilDescriptor new];
	depthStencilDescriptor.depthCompareFunction = convert_compare_mode(pipeline->depthMode);
	depthStencilDescriptor.depthWriteEnabled = pipeline->depthWrite;
	pipeline->impl._depthStencil = (__bridge_retained void *)[device newDepthStencilStateWithDescriptor:depthStencilDescriptor];

	depthStencilDescriptor.depthCompareFunction = MTLCompareFunctionAlways;
	depthStencilDescriptor.depthWriteEnabled = false;
	pipeline->impl._depthStencilNone = (__bridge_retained void *)[device newDepthStencilStateWithDescriptor:depthStencilDescriptor];
}
@aleph1
Copy link
Contributor Author

aleph1 commented Feb 22, 2025

Looking at the Vulkan backend as a point of comparison it seems like Metal is not using the instanced value of each inputLayout to alter the step function for the MTLVertexDescriptor’s layouts. There might be other issues, but I think that is the glaring one.

It seems like Kope has the beginnings of support for this with, typedef enum kope_metal_vertex_step_mode { KOPE_METAL_VERTEX_STEP_MODE_VERTEX, KOPE_METAL_VERTEX_STEP_MODE_INSTANCE } kope_metal_vertex_step_mode; but the pipeline seems to assume the same thing with no branching in the pipeline to adjust the step function.

@RobDangerous
Copy link
Member

Kope (I now call it Kore 3) only started drawing its first triangle recently, it will be a little while.

@aleph1
Copy link
Contributor Author

aleph1 commented Feb 22, 2025

I think I am a bit closer to a fix for the Metal backend in Kore 2, but I’m struggling with one thing. I don’t think the stride value for the memory layouts are being set correctly. Currently the stride value is set based on the MTLVertexFormat values KINC_G4_VERTEX_DATA_, but it looks like stride has to be set using MemoryLayout, but this appears to be Swift only. Any of the working code samples I can find utilize MemoryLayout for calculating the stride, but I have never done any calling of Swift from C.

Swift code from a working example:

let vertexDescriptor = MTLVertexDescriptor()
vertexDescriptor.attributes[0].format = MTLVertexFormat.float3 // the metal pipeline in Kore seems to set this correctly
vertexDescriptor.attributes[0].bufferIndex = 0
vertexDescriptor.attributes[0].offset = 0
vertexDescriptor.layouts[0].stride = MemoryLayout<SIMD3<Float>>.stride // the metal pipeline in Kore sets this using MTLVertexFormat, which can be different than the value provided by MemoryLayout

I suppose I could just create a swift app to output all of the stride values and create a switch based on the KINC_G4_VERTEX_DATA_ types, but it would be nice to be able to utilize MemoryLayout directly.

If you would prefer me to not update this issue with research, please let me know.

@RobDangerous
Copy link
Member

We really don't need "MemoryLayout" to figure out the size of a float3. It's 12. kinc_g4_vertex_data_size does the right thing (and if you doubt it feel free to compare the values you need to whatever you get from "MemoryLayout".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants