Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add generic slice implementations to csharp backend #139

Merged
merged 13 commits into from
Feb 18, 2025

Conversation

arsher
Copy link
Contributor

@arsher arsher commented Feb 16, 2025

This PR contains the following changes:

  • Generic Slice/SliceMut implementations with corresponding marshallers as discussed in PR#135
    • I believe it should not have a performance impact on existing scenarios, but obviously requires copying when having to do custom marshalling, so that will add some slow down
  • Callback wrappers where custom marshalling is required
    • Unfortunately the P/Invoke source generator cannot handle custom marshallers in callbacks, so I decided to add auto generated marshalling around these
    • NOTE: The callback return values are not marshalled
    • NOTE: Nested callbacks are not supported

I'm looking forward to your thoughts about this. Admittedly I did not spend a huge amount of time cleaning this up, so some things might need a bit of moving around, but wanted to get a second set of eyes on this as quickly as possible.

@arsher arsher force-pushed the csharp_generic_slice branch from 0046660 to a73e427 Compare February 16, 2025 15:31
@arsher
Copy link
Contributor Author

arsher commented Feb 16, 2025

Quick note about the null checks in the custom marshallers for arrays and strings.

I think enforcing a non-null field value on the managed side in the generated ConvertToUnmanaged implementation for array/string fields makes it less ergonomic to use the generated code.

Quick example to demonstrate below:

Rust FFI code:

#[repr(transparent)]
pub struct FixedString<const N: usize> {
    data: [u8; N],
}

#[ffi_type]
pub struct Data {
    pub unique_id: FixedString<256>,
}

#[ffi_function]
pub fn get_data(mut list: FFISliceMut<Data>) -> FFIError {
...
}

Current usage from C# (with the PR)

var data = new Data[1] { new Data { unique_id = "" } };
Interop.get_data(new SliceMut<Data>(data));

Ideal usage from C#

var data = new Data[1];
Interop.get_data(new SliceMut<Data>(data));

Clearly this is just a simple example, but if you have a large number of fields that need to be initialized it becomes cumbersome. Given that the actual underlying native structure has a fixed size anyway, I think it would be perfectly fine to handle nulls the same as empty values. WDYT?

@ralfbiedert
Copy link
Owner

Wow, thanks a lot! I had a quick look at that the other week but couldn't get the generic C# part to compile, so I thought this wouldn't work after all.

Some feedback from the top of my head:

  • I had some trouble following the logic, esp. around if (CustomMarshallerHelper<T>.HasCustomMarshaller) parts. Could you add some comments (I think it's fine to emit // ... comments into the .cs file) to make it easier for people to review how the bindings work

  • To follow up on that, I assume that check is to ensure that if you marshal slices that themselves are marshallable you need to handle that explicitly? If so I find it a bit weird C# doesn't handle that automatically, but that's probably because the representation + of each element when converted to the slice isn't obvious, right?

  • That invocation of your ad-hoc assembled function CustomMarshallerHelper<T>.ToUnmanagedFunc!( managed[i], IntPtr.Add(native.Data, i * size));, in what cases will that actually be invoked? Like on what T for Slice<T> would if (CustomMarshallerHelper<T>.HasCustomMarshaller) hit, and for which ones wouldn't it? It appears float or Vec3f32 wouldn't hit that branch, but something that in turn holds on to an array or slice would, like Weird2u8, is that correct?

  • When you write

    The callback return values are not marshalled

    you mean they don't support custom marshalling, but most existing types like Vec3f32 still work?

Clearly this is just a simple example, but if you have a large number of fields that need to be initialized it becomes cumbersome. Given that the actual underlying native structure has a fixed size anyway, I think it would be perfectly fine to handle nulls the same as empty values. WDYT?

In your example of

#[repr(transparent)]
pub struct FixedString<const N: usize> {
    data: [u8; N],
}

isn't the 'issue' that on the Rust side you specify data is a fixed array of a certain size? I'm slightly torn here. I agree that in your case the length of the array is more an "implementation detail". I'd say there you'd want this to "just work" up until the 256, and fail for anything beyond that.

On the other side if I had

pub struct Vec3 {
    data: [u8; 3],
}

I'd argue it should be an error if people initialized that with only [1.0, 2.0].

Thinking out loud, I wonder if we should have one or two extra types here CappedArray<T, N> and FixedArray<T, N> that act / have a fallback impl like a normal array, but FixedArray having stricter checks in backends. (Not saying this should be part of this PR, just tossing ideas).

@arsher
Copy link
Contributor Author

arsher commented Feb 17, 2025

I had some trouble following the logic, esp. around if (CustomMarshallerHelper.HasCustomMarshaller) parts. Could you add some comments (I think it's fine to emit // ... comments into the .cs file) to make it easier for people to review how the bindings work

Yes of course, in general I was going to do some clean up, but wanted to see if this looks like what you were thinking originally.

To follow up on that, I assume that check is to ensure that if you marshal slices that themselves are marshallable you need to handle that explicitly? If so I find it a bit weird C# doesn't handle that automatically, but that's probably because the representation + of each element when converted to the slice isn't obvious, right?

Basically the source generated P/Invoke can only do generic collections if it's marshalled as an array/contiguous collection, but we have a custom type here that also contains the length. Even if we did not, we would still need to call the custom marshallers ourselves, it appears that the current source generation cannot handle recursively calling custom marshallers. This is also why we needed to generate them ourselves for each struct containing a custom marshalled struct. I found the design document useful to see the options: https://github.com/dotnet/runtime/blob/main/docs/design/libraries/LibraryImportGenerator/UserTypeMarshallingV2.md .
Btw. if you look at this doc, the custom marshallers can have multiple different shapes, for the non-slice types we generate stateless ones, which is what's discovered by this helper through the NativeMarshallingAttribute. However this also means that a slice-in-slice will not work sadly, because of the requirement to clean up after themselves these are stateful marshallers. (I forgot to mention this as a limitation)

That invocation of your ad-hoc assembled function CustomMarshallerHelper.ToUnmanagedFunc!( managed[i], IntPtr.Add(native.Data, i * size));, in what cases will that actually be invoked? Like on what T for Slice would if (CustomMarshallerHelper.HasCustomMarshaller) hit, and for which ones wouldn't it? It appears float or Vec3f32 wouldn't hit that branch, but something that in turn holds on to an array or slice would, like Weird2u8, is that correct?

The HasCustomMarshaller would be true if T has the NativeMarshallingAttribute set to a custom stateless marshaller in which case the Slice/SliceMut marshaller would have to make a new collection to marshal the elements into using the generated methods. I chose this approach to minimize the effects of the unavoidable reflection usage, as the expressions get compiled when the helper class is first used with a given T. So yes, for structs where we do not emit custom marshallers it will do exactly what the current implementation does and just expose a pointer to a pinned GCHandle to native code.

When you write

The callback return values are not marshalled

you mean they don't support custom marshalling, but most existing types like Vec3f32 still work?

Exactly, if the return value of a callback would need custom marshalling it won't work, but otherwise it's fine, so in case of the Vec2f32 for example it does work. I think I can add this btw. just have to think a bit about freeing things correctly.

isn't the 'issue' that on the Rust side you specify data is a fixed array of a certain size? I'm slightly torn here. I agree that in your case the length of the array is more an "implementation detail". I'd say there you'd want this to "just work" up until the 256, and fail for anything beyond that.

I think I missed a crucial part of my example, I specifically have the CType set to an array of cchar, so it's clearly indicating it is a string, but easier to create a byte array from a string in rust then a [c_char] 😅 :

unsafe impl<const N: usize> CTypeInfo for FixedString<N> {
    fn type_info() -> CType {
        CType::Array(ArrayType::new(CType::Pattern(TypePattern::CChar), N))
    }
}

The main issue I think is how people might use this. For example in my application I specifically pass an FFISliceMut to my Rust function to be used as an output collection, which means that I do not care about the contents as an input just that there is sufficient place allocated for my structs.
I'm quite used to having to interop with similar C APIs where you have a length and a pointer to an array of structs passed in to be filled by the C library. In these cases when the C structs have owned fixed sized arrays I tend to use the ByValArray and ByValTStr unmanaged types for marsalling to c#. Especially with strings this is very convenient, I do not have to set an actual string value as the MarshalAsAttribute contains the fixed size of the field and the underlying marshaller just ignores the null string when going to the native direction and send a zero length 0 terminated string, but coming back to managed land it does create the strings for me.
I do understand your point about normal arrays though, and I like the idea of having separate patterns for allowing changing this behaviour. However maybe we can have more allowance for strings specifically which are basically variable length anyway 🤔

@ralfbiedert
Copy link
Owner

Thanks for the explanation and PR! I'll merge this now, let's do improvements / changes to the FixedString situation in a separate PR. I'll also have to use this for a bit to get a better feeling.

@ralfbiedert ralfbiedert merged commit 1db6b72 into ralfbiedert:master Feb 18, 2025
2 checks passed
@arsher
Copy link
Contributor Author

arsher commented Feb 18, 2025

Thanks for the explanation and PR! I'll merge this now, let's do improvements / changes to the FixedString situation in a separate PR. I'll also have to use this for a bit to get a better feeling.

My pleasure. I'll clean up a few things in a separate PR then. I'll also try some things around owned arrays/strings.

@arsher arsher deleted the csharp_generic_slice branch February 18, 2025 21:10
@ralfbiedert
Copy link
Owner

ralfbiedert commented Feb 21, 2025

Alright, I have a bit of bad news / FYI.

I'm in the middle of modernizing some constructs, esp. callbacks, which have two design flaws in that they don't support a payload param (#144, not really needed for C#, but C), and esp. in C# that its hard to know when / if the callback can be freed (#143). This is basically prep for #145, having transparent C# -> Rust async calls working, and generally "long-term callbacks".

I tried getting things to work incrementally with some more delegate-related generics, but I keep running into edge cases where the C# marshaller refuses to compile (pushed them as two graveyard branches I abandoned that didn't work out).

In order to make progress I will now probably re-write the callback generation using custom classes again. I'm not exactly sure what that means for the nested array marshalling yet. I definitely want to have that functionality, but I might remove it for a while until the other things work again.

It would be nice once things settle down (in a few days or so) if you could have another look and help re-assess what the best way moving forward w.r.t array marshalling is. With a bit of luck the majority of the code keeps working, but there's a chance that some patterns don't quite compose.


Edit, to document what I'm running into (in various flavors):

public delegate void CallbackSliceMutDelegate(SliceMut<byte> slice);


// ctor for instance holding a delegate, to allow user to explicitly manage "lifetime" of raw pointer
public CallbackSliceMut(CallbackSliceMutDelegate callbackUser)
{
    _callbackNative = Marshal.GetFunctionPointerForDelegate(Call); // <--- this fails
}

error:

System.ArgumentException: The specified Type must not be a generic type. (Parameter 'delegate')

System.ArgumentException
The specified Type must not be a generic type. (Parameter 'delegate')
   at System.Runtime.InteropServices.Marshal.GetFunctionPointerForDelegateInternal(Delegate d)

that error apparently comes from Call having this signature

public void Call(SliceMut<byte> slice, IntPtr _) { ... }

@arsher
Copy link
Contributor Author

arsher commented Feb 21, 2025

Alright, I have a bit of bad news / FYI.

I'm in the middle of modernizing some constructs, esp. callbacks, which have two design flaws in that they don't support a payload param (#144, not really needed for C#, but C), and esp. in C# that its hard to know when / if the callback can be freed (#143). This is basically prep for #145, having transparent C# -> Rust async calls working, and generally "long-term callbacks".

I tried getting things to work incrementally with some more delegate-related generics, but I keep running into edge cases where the C# marshaller refuses to compile (pushed them as two graveyard branches I abandoned that didn't work out).

In order to make progress I will now probably re-write the callback generation using custom classes again. I'm not exactly sure what that means for the nested array marshalling yet. I definitely want to have that functionality, but I might remove it for a while until the other things work again.

It would be nice once things settle down (in a few days or so) if you could have another look and help re-assess what the best way moving forward w.r.t array marshalling is. With a bit of luck the majority of the code keeps working, but there's a chance that some patterns don't quite compose.

I'll keep an eye out and do a second round around this when the callback refactor is finished 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants