Design choices for a proc macro implementation

While working on a proof-of-concept implementation of proc macro equivalents for `declare_class!`, `extern_methods!`, etc., I ran into several points in the design space where there were interesting choices to make, and I thought it would be a good idea to discuss some of them.

For a point of reference, take a current `macro_rules!` based definition like this:

```rust
declare_class!(
    struct Delegate {
        text_field: IvarDrop<Id<NSTextField>, "_text_field">,
        web_view: IvarDrop<Id<WKWebView>, "_web_view">,
    }
    mod ivars;

    unsafe impl ClassType for Delegate {
        type Super = NSObject;

        const NAME: &'static str = "Delegate";
    }

    unsafe impl Delegate {
        #[method(initWithTextField:andWebView:)]
        unsafe fn __initWithTextField_andWebView(
            self: &mut Self,
            text_field: *mut NSTextField,
            web_view: *mut WKWebView,
        ) -> Option<&mut Self> {
            let this: Option<&mut Self> = msg_send![super(self), init];
            let this = this?;
            Ivar::write(&mut this.text_field, unsafe { Id::retain(text_field) }?);
            Ivar::write(&mut this.web_view, unsafe { Id::retain(web_view) }?);
            Some(this)
        }
    }
);

extern_methods!(
    unsafe impl Delegate {
        #[method_id(initWithTextField:andWebView:)]
        #[allow(non_snake_case)]
        pub fn initWithTextField_andWebView(
            this: Option<Allocated<Self>>,
            text_field: &NSTextField,
            web_view: &WKWebView,
        ) -> Id<Self>;
    }
);
```

The equivalent in terms of the proof-of-concept proc macros currently looks like this:

```rust
#[objc(super = NSObject)]
mod Delegate {
    struct Delegate {
        text_field: IvarDrop<Id<NSTextField>>,
        web_view: IvarDrop<Id<WKWebView>>,
    }

    unsafe impl Delegate {
        #[objc(sel = "initWithTextField:andWebView:")]
        unsafe fn __initWithTextField_andWebView(
            self: &mut Self,
            text_field: *mut NSTextField,
            web_view: *mut WKWebView,
        ) -> Option<&mut Self> {
            let this: Option<&mut Self> = msg_send![super(self), init];
            let this = this?;
            Ivar::write(&mut this.text_field, unsafe { Id::retain(text_field) }?);
            Ivar::write(&mut this.web_view, unsafe { Id::retain(web_view) }?);
            Some(this)
        }

        // NOTE: we only need this until `#173: Support super in msg_send_id!` is merged
        pub fn initWithTextField_andWebView(
            this: Option<Allocated<Self>>,
            text_field: &NSTextField,
            web_view: &WKWebView,
        ) -> Id<Self>;
    }
}
```

A couple of observations about this:

Originally, I was thinking it would make sense to have more separate macros like `#[class]`, `#[protocol]`, etc., when I proposed something looking closer to this:

```rust
#[class(extern, super = NSActionCell, inherits = NSCell, NSObject)]
struct NSPathCell;

#[class]
unsafe impl NSPathCell {
    ...
}

#[protocol]
pub unsafe trait NSPathCellDelegate {
    ...
}
```

But at that time I didn't realize yet that we need to be able to parse the class `struct` and the class `impl` together in order to correctly define the `::class()` method (because it registers the methods when first called).

Unfortunately, there is also no practical way (that I know of) to manage state across proc-macro invocations. So the only obvious alternative is to place the respective `struct` and `impl` items within an enclosing item, so that the proc macro can work similarly to `declare_class!`. That leads to the choice of using `mod`.

Given an invocation like this:

```rust
#[objc(super = <superclass>, inherits? = [<superclass>*])]
mod <ClassName> {
    ...
}
```

The macro expects to find, within the `mod <ClassName>`, either a `struct <ClassName> { ... }` or a `type <ClassName>;` (note the lack of `=`). The `mod` itself is just a dummy item and is not emitted; only the items it encloses are emitted. Furthermore, the name of the `struct` or `type` must exactly match the name of the `mod`, and only a single `struct` xor `type` is allowed.
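The naming rule can be sketched roughly as follows. This is only an illustration under simplifying assumptions: `Item` and `find_class_decl` are hypothetical names standing in for the parsed module contents, not the proof-of-concept's actual code.

```rust
/// Hypothetical, simplified view of the items found inside `mod <ClassName>`.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    /// `struct Name { ... }`
    Struct(String),
    /// `type Name;` (note: no `=`)
    ExternType(String),
    /// Anything else (`impl` blocks, etc.), ignored by this check.
    Other,
}

/// Returns the single class declaration, or an error message describing
/// which rule was violated.
fn find_class_decl(mod_name: &str, items: &[Item]) -> Result<Item, String> {
    let mut decls = items
        .iter()
        .filter(|item| matches!(item, Item::Struct(_) | Item::ExternType(_)));

    // There must be at least one `struct` or `type` ...
    let decl = decls
        .next()
        .ok_or_else(|| format!("expected `struct {0}` or `type {0};`", mod_name))?;

    // ... and no second one.
    if decls.next().is_some() {
        return Err("only a single `struct` xor `type` is allowed".to_string());
    }

    // Its name must match the name of the enclosing `mod`.
    match decl {
        Item::Struct(name) | Item::ExternType(name) if name.as_str() == mod_name => {
            Ok(decl.clone())
        }
        Item::Struct(name) | Item::ExternType(name) => Err(format!(
            "class name `{}` must match the enclosing `mod {}`",
            name, mod_name
        )),
        Item::Other => unreachable!("filtered out above"),
    }
}
```

In a real proc macro these errors would presumably be reported as compile errors attached to the offending item's span rather than returned as strings.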

Within a class `#[objc(super = <superclass>)] mod C { ... }`, an `impl` is translated in the following way.

Specifying the selector is not mandatory (if omitted, it is computed from the current camel-case/snake-case hybrid scheme we use, correctly handling trailing `:`).
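As a rough illustration of that derivation, here is a minimal sketch. It assumes that each `_` in the Rust name separates two selector parts and that any method taking arguments gets a trailing `:`. The function name `selector_from_name` is hypothetical and the real rules may handle more cases.

```rust
/// Derives a selector from a Rust method name.
///
/// `arg_count` is the number of arguments, not counting any receiver.
/// Assumption: `_` separates selector parts, and methods with at least
/// one argument end in `:`.
fn selector_from_name(name: &str, arg_count: usize) -> String {
    if arg_count == 0 {
        // No arguments: the name is the selector as-is.
        return name.to_string();
    }
    // With arguments, `a_b` becomes `a:b:`.
    let mut sel = name.replace('_', ":");
    sel.push(':');
    sel
}
```

Under these assumptions, `initWithTextField_andWebView` with two arguments maps to `initWithTextField:andWebView:`, matching the selector written out explicitly in the example above.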

Also, `#[method]` / `#[method_id]` are not necessary since we determine this from the method return type (looking for `-> Id<...>` or `Result<Id<...>, ...>`), although as with selectors it is possible to manually control this behavior. In that case you can specify `#[objc(managed)] fn f(&self, args*) -> ...` (without explicit retain semantics) or `#[objc(managed = "init")] fn f(args*) -> ...` (with explicit retain semantics).
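To make the return-type check concrete, here is a string-level sketch. A real proc macro would inspect the parsed return type rather than text, and `SendKind` and `send_kind` are hypothetical names. It only recognizes the two shapes mentioned above.

```rust
/// Kind of message send to generate for a method (hypothetical names).
#[derive(Debug, PartialEq)]
enum SendKind {
    /// Plain message send, as with `#[method]`.
    Plain,
    /// Retain-managed message send, as with `#[method_id]`.
    Managed,
}

/// Classifies a method by its return type, given as source text.
/// `None` means the method has no `-> ...` clause.
fn send_kind(return_type: Option<&str>) -> SendKind {
    // Strip whitespace so that `Result< Id<T>, E >` is handled too.
    let ty: String = return_type
        .unwrap_or("")
        .chars()
        .filter(|c| !c.is_whitespace())
        .collect();

    if ty.starts_with("Id<") || ty.starts_with("Result<Id<") {
        SendKind::Managed
    } else {
        SendKind::Plain
    }
}
```

An explicit `#[objc(managed)]` or `#[objc(managed = "init")]` attribute would simply override whatever this inference returns.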

For `impl C`, we translate methods `fn f(args*) -> ... { ... }` as class methods and `fn f(&self, args*) -> ... { ... }` as instance methods, similarly to `declare_class!`. Methods `fn f(&self?, args*) -> ...;` (which are not valid Rust syntax, but which we can parse) are handled as with `extern_methods!`.
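The two decisions made per method in `impl C` (receiver and body) are independent, which a small sketch can make explicit. The types `MethodSig`, `Receiver`, and `Origin` are hypothetical and only model the rule as described here.

```rust
/// Whether the method is sent to the class or to an instance.
#[derive(Debug, PartialEq)]
enum Receiver {
    Class,
    Instance,
}

/// Where the method's implementation lives.
#[derive(Debug, PartialEq)]
enum Origin {
    /// Has a body: implemented in Rust, as with `declare_class!`.
    Declared,
    /// No body (`fn f(...) -> ...;`): implemented in Objective-C,
    /// as with `extern_methods!`.
    Extern,
}

/// Hypothetical summary of what the parser saw for one method.
struct MethodSig {
    /// True if the first parameter is a `self` receiver.
    has_self: bool,
    /// True if the method has a `{ ... }` body.
    has_body: bool,
}

fn classify(sig: &MethodSig) -> (Receiver, Origin) {
    let receiver = if sig.has_self {
        Receiver::Instance
    } else {
        Receiver::Class
    };
    let origin = if sig.has_body {
        Origin::Declared
    } else {
        Origin::Extern
    };
    (receiver, origin)
}
```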

For `impl T for C`, we translate the enclosed methods as protocol methods.

One choice I've been considering is splitting this behavior up a little more and using `extern` blocks along with `mod`, in the following sense.

For `#[objc] mod C { ... }` we would only allow a class `struct` and not a class `type`. Furthermore, we would no longer parse methods without bodies like `fn f(...) -> ...;` within `impl` items in the class `mod`.

Instead, to handle those cases, you would now write this:

```rust
#[objc(super = NSObject)]
unsafe extern "ObjC" {
    type Delegate;

    fn initWithTextField_andWebView(
        this: Option<Allocated<Self>>,
        text_field: &NSTextField,
        web_view: &WKWebView,
    ) -> Id<Self>;

    fn control_textView_doCommandBySelector(
        &self,
        _control: &NSControl,
        text_view: &NSTextView,
        command_selector: Sel,
    ) -> bool;
}
```

The obvious disadvantages to this approach are that it's maybe a little uglier (since we don't have `impl C`) and we'd probably still need an outer enclosing `mod` to handle protocol translations, since we also can't write `impl P for C` within `extern`.

Advantages are that it's arguably clearer what is happening semantically, specifically because we are using `extern type` here. It's also arguably easier to parse, since within `extern`, having `type T;` and `fn f() -> ...;` is valid syntax.

The latter point is not a huge issue: `syn` hands those invalid-syntax cases back as a raw `TokenStream`, so it just requires re-implementing some of the parsing for those items by hand. But to be honest, I am already doing some of that in order to parse items within a class `mod` without backtracking (e.g., several items are ambiguous until after you parse attributes and visibility qualifiers).

This is also the approach that [cxx](https://crates.io/crates/cxx) and [wasm-bindgen](https://crates.io/crates/wasm-bindgen) use with their proc-macros.

Actually, with `cxx` you have this:

```rust
#[cxx::bridge]
mod ffi {
    // Any shared structs, whose fields will be visible to both languages.
    struct BlobMetadata {
        size: usize,
        tags: Vec<String>,
    }

    extern "Rust" {
        // Zero or more opaque types which both languages can pass around but
        // only Rust can see the fields.
        type MultiBuf;

        // Functions implemented in Rust.
        fn next_chunk(buf: &mut MultiBuf) -> &[u8];
    }

    unsafe extern "C++" {
        // One or more headers with the matching C++ declarations. Our code
        // generators don't read it but it gets #include'd and used in static
        // assertions to ensure our picture of the FFI boundary is accurate.
        include!("demo/include/blobstore.h");

        // Zero or more opaque types which both languages can pass around but
        // only C++ can see the fields.
        type BlobstoreClient;

        // Functions implemented in C++.
        fn new_blobstore_client() -> UniquePtr<BlobstoreClient>;
        fn put(&self, parts: &mut MultiBuf) -> u64;
        fn tag(&self, blobid: u64, tag: &str);
        fn metadata(&self, blobid: u64) -> BlobMetadata;
    }
}
```

where the stuff in `extern "Rust" { ... }` is used for generating header files for using Rust definitions from C++. AFAIK, we don't have an equivalent for that (and maybe it's out of scope), but it might be worth considering as a future option.

And there's also the part where `cxx` uses the `include!` directive in the `extern "C++"` block for generating bindings. Something that might be interesting for us to consider, if proc macros seem like the way to go, is making the `header-translator` functionality available in terms of macro invocations instead of requiring it to be run externally.

I think that's all I have to say about this for now. I didn't mention macros for `static`, `fn`, and `enum`, but I was planning on just re-using the `#[objc]` macro for those. It is trivial to determine which kind of item it is applied to, so it seemed to make sense to minimize the number of names we use for the macros. But maybe something other than `#[objc]` would be appropriate too.

Any thoughts or feedback on this? Does it make sense to split the functionality into `extern` even if it's more verbose?

Design choices for a proc macro implementation #423
