ctest: Add type alias extraction logic #4477

Open
wants to merge 4 commits into
base: main

Conversation

mbyx
Contributor

@mbyx mbyx commented Jun 4, 2025

Description

This PR adds a helper function for calling cargo expand, as well as the start of constructing a SymbolTable that extracts all required items from the codebase (for now only type aliases).

It also has two tests, although more are to be added.

It also adds new dependencies:

  • quote for easily testing small snippets of code, and for converting the parsed AST back into a token stream (and therefore the original code). This is useful when we don't want to parse any further and just want to extract the rest as is. For example, in a type alias we don't care whether the type is a pointer or a path; we just want to know what the aliased type is.
  • proc_macro2 for using TokenStream in struct fields.
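
A minimal sketch of what a helper for calling cargo expand could look like, using only the standard library. The names `expand_command` and `expand` are hypothetical and are not taken from this PR, and the sketch assumes the `cargo-expand` subcommand is installed and that the crate is addressed through its manifest path.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Builds the `cargo expand` invocation for the crate at `manifest`.
/// (`expand_command` is a hypothetical name, not the PR's helper.)
pub fn expand_command(manifest: &Path) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("expand").arg("--manifest-path").arg(manifest);
    cmd
}

/// Runs the command and returns the macro-expanded source as a string.
/// Fails if `cargo expand` exits unsuccessfully or prints invalid UTF-8.
pub fn expand(manifest: &Path) -> io::Result<String> {
    let output = expand_command(manifest).output()?;
    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr).into_owned();
        return Err(io::Error::new(io::ErrorKind::Other, stderr));
    }
    String::from_utf8(output.stdout)
        .map_err(|err| io::Error::new(io::ErrorKind::InvalidData, err))
}
```

Keeping the command construction separate from its execution makes the argument list checkable without spawning a process.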

Sources

N/A

Checklist

  • Relevant tests in libc-test/semver have been updated
  • No placeholder or unstable values like *LAST or *MAX are
    included (see #3131)
  • Tested locally (cd libc-test && cargo test --target mytarget);
    especially relevant for platforms that may not be checked in CI

@rustbot added the ctest and S-waiting-on-review labels Jun 4, 2025
@tgross35
Contributor

tgross35 commented Jun 4, 2025

Right now the tests aren't very useful; they're just testing syn's ability to roundtrip tokens. If you can add functionality to print TypeAlias as a C typedef and test that instead, I'd be happy to merge this with #[expect(dead_code)] anywhere relevant to pass CI.
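
As a rough illustration of printing a type alias as a C typedef, here is a dependency-free sketch. The `TypeAlias` below is a simplified stand-in that stores plain strings rather than the PR's token streams, and `c_type` and `to_typedef` are hypothetical names covering only a handful of primitives.

```rust
/// Simplified stand-in for the PR's `TypeAlias`, holding plain strings.
pub struct TypeAlias {
    pub ident: String,
    pub ty: String,
}

/// Maps a few Rust primitive names to C names. Anything else is assumed to be
/// a user-defined type and is passed through unchanged.
fn c_type(rust_ty: &str) -> &str {
    match rust_ty {
        "u8" => "uint8_t",
        "i32" => "int32_t",
        "u64" => "uint64_t",
        "c_int" => "int",
        other => other,
    }
}

/// Renders `type <ident> = <ty>;` as `typedef <c type> <ident>;`.
pub fn to_typedef(alias: &TypeAlias) -> String {
    format!("typedef {} {};", c_type(&alias.ty), alias.ident)
}
```

A test against the rendered C text exercises the translation itself rather than token roundtripping.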

@mbyx
Contributor Author

mbyx commented Jun 4, 2025

Now that I think about it, you're right, the tests aren't very useful. I want to keep the conversion to C separate from the extraction logic, so it'll belong to a different struct. Tomorrow I'll add support for extracting all the other types that we need, remove the extraction tests, add a simple stub for a Rust-to-C converter, and try to test that.

My initial reasoning for the tests was that locally I wasn't sure whether the visitor was visiting every item regardless of whether it was inside a function, at the top level, etc. If it wasn't, some symbols could accidentally be missed during extraction and it wouldn't be easy to figure out which ones.

Contributor

@tgross35 tgross35 left a comment

Left some review for ir, I still need to take a look at translation.

Comment on lines +22 to +36
pub fn public(&self) -> bool {
self.public
}

pub fn ident(&self) -> &Ident {
&self.ident
}

pub fn ty(&self) -> &TokenStream {
&self.ty
}

pub fn value(&self) -> &TokenStream {
&self.value
}
Contributor

For our public API, we probably don't want to expose syn/pm2 types, which gives us more flexibility in the future. So for now you can change any -> TokenStream methods to pub(crate) and mark them #[expect(unused)] if needed. Then ident should return a String, or we could also save an ident_string field so we can return a &str.

(applies to all the types in ir)
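
One way this suggestion could look is sketched below. `Ident` and `TokenStream` are local stand-ins for the proc_macro2 types so the example needs no dependencies, and the constructor and `ident_token` accessor are hypothetical additions for illustration.

```rust
/// Stand-in for `proc_macro2::Ident`, so the sketch needs no dependencies.
#[derive(Debug, Clone)]
pub(crate) struct Ident(String);

/// Stand-in for `proc_macro2::TokenStream`.
#[derive(Debug, Clone)]
pub(crate) struct TokenStream(String);

/// A constant item extracted from Rust source.
#[derive(Debug)]
pub struct Constant {
    public: bool,
    ident: Ident,
    /// Cached string form of `ident`, so `ident()` can hand out a `&str`.
    ident_string: String,
    ty: TokenStream,
}

impl Constant {
    /// Hypothetical constructor used only by this sketch.
    pub(crate) fn new(public: bool, ident: &str, ty: &str) -> Self {
        Self {
            public,
            ident: Ident(ident.to_string()),
            ident_string: ident.to_string(),
            ty: TokenStream(ty.to_string()),
        }
    }

    /// Whether the item is `pub`.
    pub fn public(&self) -> bool {
        self.public
    }

    /// The item's name as plain text; no parser types leak out.
    pub fn ident(&self) -> &str {
        &self.ident_string
    }

    /// Crate-internal access to the parsed identifier.
    pub(crate) fn ident_token(&self) -> &Ident {
        &self.ident
    }

    /// Crate-internal access to the raw type tokens.
    pub(crate) fn ty(&self) -> &TokenStream {
        &self.ty
    }
}
```

With this shape, swapping the parser later would only touch the `pub(crate)` surface.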

Contributor

As a nit, this is more of an ast than ir - ast is just structure, ir typically has more information associated with it.

Comment on lines +31 to +39
pub enum SkipItem {
Constant(Predicate<Constant>),
Function(Predicate<Function>),
Static(Predicate<Static>),
TypeAlias(Predicate<TypeAlias>),
Struct(Predicate<Struct>),
Field(Predicate<Field>),
Union(Predicate<Union>),
}
Contributor

Maybe this should be reversed; in AST add a pub enum Item { Constant(Constant), ... } and then Predicate can wrap that.
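
A sketch of the reversed layout, under the assumption that each item type carries at least a name. The item structs here are reduced to a single field, and `Predicate::new`, `matches`, and `should_skip` are hypothetical names chosen for illustration.

```rust
pub struct Constant { pub ident: String }
pub struct Function { pub ident: String }
pub struct TypeAlias { pub ident: String }

/// One enum over every kind of extracted item.
pub enum Item {
    Constant(Constant),
    Function(Function),
    TypeAlias(TypeAlias),
}

impl Item {
    /// The name of the item, whatever its kind.
    pub fn ident(&self) -> &str {
        match self {
            Item::Constant(c) => &c.ident,
            Item::Function(f) => &f.ident,
            Item::TypeAlias(t) => &t.ident,
        }
    }
}

/// A skip predicate that wraps `Item` instead of being wrapped per kind.
pub struct Predicate(Box<dyn Fn(&Item) -> bool>);

impl Predicate {
    pub fn new(f: impl Fn(&Item) -> bool + 'static) -> Self {
        Self(Box::new(f))
    }

    pub fn matches(&self, item: &Item) -> bool {
        (self.0)(item)
    }
}

/// Returns true if any registered predicate asks to skip `item`.
pub fn should_skip(skips: &[Predicate], item: &Item) -> bool {
    skips.iter().any(|p| p.matches(item))
}
```

A single predicate type can then filter on the item kind, the name, or both.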

pub fn add(left: usize, right: usize) -> usize {
left + right
}
#![allow(dead_code)]
Contributor

This will need to be removed before merge. Add #[expect(dead_code)] to specific types/functions instead.
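
For illustration, the item-level form might look like the sketch below. It assumes Rust 1.81 or later, where `#[expect]` is stable, and both function names are hypothetical.

```rust
/// Already reachable from the public API, so it needs no lint attribute.
pub fn is_supported(ty: &str) -> bool {
    matches!(ty, "u8" | "i32")
}

/// Not called anywhere yet. `#[expect]` silences the warning for this one item
/// and itself warns once the function gains a caller, so it cannot go stale.
#[expect(dead_code)]
fn planned_helper() -> &'static str {
    "todo"
}
```

Unlike a crate-wide `#![allow(dead_code)]`, this leaves the lint active for everything else.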

Contributor

@tgross35 tgross35 Jun 9, 2025

Could you add #![warn(unreachable_pub)] and #![warn(missing_docs)]? This will make it a bit more obvious what API is user public vs. crate public. Even a small docstring is helpful for public items so we don't forget, that can be expanded later.

Comment on lines +10 to +12
/// A `SymbolTable` represents a collected set of top-level Rust items
/// relevant to FFI generation or analysis, including foreign functions/statics,
/// type aliases, structs, unions, and constants.
Contributor

Two small nits:

  1. Docs don't need to restate the type, rustdoc handles the association for you
  2. The first paragraph needs to be 1-2 lines max since it gets used as a summary

So something like this:

/// Represents a collected set of top-level Rust items relevant to FFI generation or analysis.
///
/// Includes foreign functions/statics, type aliases, structs, unions, and constants.

Comment on lines +35 to +45
pub fn contains_struct(&self, ident: &str) -> bool {
self.structs()
.iter()
.any(|structure| structure.ident().to_string() == ident)
}

pub fn contains_union(&self, ident: &str) -> bool {
self.unions()
.iter()
.any(|structure| structure.ident().to_string() == ident)
}
Contributor

.ident() without .to_string() should be fine here, since Ident implements PartialEq<T> for every T that implements AsRef<str>: https://docs.rs/proc-macro2/latest/proc_macro2/struct.Ident.html#impl-PartialEq%3CT%3E-for-Ident
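
To show why the comparison works without allocating, here is a dependency-free sketch. The `Ident` below is a stand-in that mirrors the shape of the linked impl rather than the real proc_macro2 type, and `Struct::new` is a hypothetical constructor.

```rust
/// Stand-in for `proc_macro2::Ident`, mirroring its generic `PartialEq` impl.
pub struct Ident(String);

impl<T: ?Sized + AsRef<str>> PartialEq<T> for Ident {
    fn eq(&self, other: &T) -> bool {
        self.0 == other.as_ref()
    }
}

/// Simplified struct item that only stores its name.
pub struct Struct {
    ident: Ident,
}

impl Struct {
    /// Hypothetical constructor used only by this sketch.
    pub fn new(name: &str) -> Self {
        Self { ident: Ident(name.to_string()) }
    }

    pub fn ident(&self) -> &Ident {
        &self.ident
    }
}

/// Compares `&Ident` against `&str` directly; no `to_string()` is needed.
pub fn contains_struct(structs: &[Struct], ident: &str) -> bool {
    structs.iter().any(|s| s.ident() == ident)
}
```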

Comment on lines +10 to +17
pub use constant::Constant;
pub use field::Field;
pub use function::Function;
pub use parameter::Parameter;
pub use static_variable::Static;
pub use structure::Struct;
pub use type_alias::TypeAlias;
pub use union::Union;
Contributor

Could you keep these names consistent with syn's https://docs.rs/syn/latest/syn/enum.Item.html which are consistent with rustc? So Constant->Const, Function -> Fn, TypeAlias -> Type.

I actually prefer the names you have now, but consistency with the ecosystem is useful.

/// relevant to FFI generation or analysis, including foreign functions/statics,
/// type aliases, structs, unions, and constants.
#[derive(Debug)]
pub struct SymbolTable {
Contributor

Do you have any ideas for a different name than SymbolTable? A "symbol table" is typically a list of the exported symbols from a library which doesn't really apply here.

Maybe something like ItemTable or MappableItems (mappable from C->Rust)?

Contributor Author

@mbyx mbyx Jun 12, 2025

How about FfiItems?

}
}

fn collect_fields(fields: &syn::punctuated::Punctuated<syn::Field, syn::Token![,]>) -> Vec<Field> {
Contributor

syn::punctuated::Punctuated is a long path, you can just import it

Comment on lines +80 to +82
if ty == "&str" {
return "char*".to_string();
}
Contributor

This isn't correct, &str isn't a C-safe type
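
A sketch of how the translation could refuse such types instead of mapping them, given that `&str` is a pointer plus a length and is not NUL-terminated. The error type `NotFfiSafe` is hypothetical, the function is shown as a free function rather than the PR's method, and the mapping table is a small illustrative subset.

```rust
/// Hypothetical error for Rust types that have no C-compatible equivalent.
#[derive(Debug, PartialEq, Eq)]
pub struct NotFfiSafe(pub String);

/// Translates a primitive type name to C, rejecting types that are not FFI-safe.
pub fn translate_primitive_type(ty: &str) -> Result<String, NotFfiSafe> {
    let c_ty = match ty {
        "c_char" => "char",
        "c_int" => "int",
        "c_void" => "void",
        "u8" => "uint8_t",
        "i32" => "int32_t",
        "u64" => "uint64_t",
        "f32" => "float",
        "f64" => "double",
        // String slices and owned strings have no C layout.
        "&str" | "str" | "String" => return Err(NotFfiSafe(ty.to_string())),
        // Anything else is assumed to be a user-defined type name.
        other => other,
    };
    Ok(c_ty.to_string())
}
```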

Ok(s)
}

pub fn translate_primitive_type(&self, ty: &str) -> String {
Contributor

These functions should be taking syn types, syn's &str roundtrip isn't reliable. I can help with that in a follow up.

Contributor Author

translate_primitive_type would have to remain &str since syn only supports Path representation once all other parts of a type have been stripped from it. The other functions and those in the future will take in syn types.

@tgross35
Contributor

tgross35 commented Jun 9, 2025

To scope this a bit, I think it would be good for you to split this into 3-4 separate PRs:

  1. Logic to call cargo expand and a very simple user-facing entrypoint. This can also be part of 1 (preferably as a separate commit if so, but don't worry about that if you're not yet a git wizard).
  2. What is currently ir, parsing Rust files into the data structures here
  3. Basic translation from Rust to C
  4. Skips

This would build up a test structure like this:

ctest-next/
    tests/
        sources/
            hierarchy/
                lib.rs
                foo.rs
                bar/
                    mod.rs
                hierarchy.c
                hierarchy.skips.c
            simple.rs
            simple.c
            simple.skips.c
            ...
        basic.rs

Which gets added with the steps:

  1. basic.rs calls the entrypoint on simple.rs and hierarchy/lib.rs, which should expand things
  2. No change, the entrypoint will just be parsing files now
  3. *.c files get added. Update the test to assert that the expansion for <name>.rs matches <name>.c, or update it if LIBC_BLESS=1 (use pretty_assertions to make this easier on you).
  4. Add the *.skip.c versions and do some skips in the tests

At some point the actual test functions should move to tests/common/mod.rs and then build.rs can autogenerate a #[test] function for each file in tests/sources/*.rs, but that doesn't need to happen until there is >1 test file.

@mbyx
Contributor Author

mbyx commented Jun 10, 2025

I'm a little confused on the last part, do you mean that simple.rs and hierarchy/ are two different fake files used for testing expansion of a single file and expansion of a crate respectively? And basic.rs is then the actual test file?

Then in the translation PR basic.rs would be updated to also translate the expansion to c and assert that they are the same?

And then the same thing for the skip PR?

@tgross35
Contributor

tgross35 commented Jun 10, 2025

> I'm a little confused on the last part, do you mean that simple.rs and hierarchy/ are two different fake files used for testing expansion of a single file and expansion of a crate respectively? And basic.rs is then the actual test file?

This isn't anything concrete, but that is what I had in mind. basic.rs and anything else top-level in tests/ contains the actual #[test]s. sources/ is used for the test cases (feel free to call it support or test-input or something else). Then sources/foo.rs is a single-file library to be tested, and sources/somedir/lib.rs is a multi-file library. So in the example, there are two test cases (simple.rs and hierarchy/lib.rs).

It would be possible to always use somedir/lib.rs, but most tests only need a single file so eliminating the directory seems fine.
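
Under that layout, finding the test cases could look like the sketch below. `discover_cases` is a hypothetical name, and the rule it applies (top-level `*.rs` files plus `<dir>/lib.rs`) is read off the example tree rather than taken from the PR.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns the crate root of every test case under `sources`:
/// each top-level `*.rs` file, and `<dir>/lib.rs` for each subdirectory.
pub fn discover_cases(sources: &Path) -> io::Result<Vec<PathBuf>> {
    let mut roots = Vec::new();
    for entry in fs::read_dir(sources)? {
        let path = entry?.path();
        if path.is_dir() {
            // Multi-file library: only `lib.rs` is the entrypoint.
            let lib = path.join("lib.rs");
            if lib.is_file() {
                roots.push(lib);
            }
        } else if path.extension().is_some_and(|ext| ext == "rs") {
            // Single-file library; `.c` expectation files are ignored here.
            roots.push(path);
        }
    }
    // `read_dir` order is unspecified, so sort for stable output.
    roots.sort();
    Ok(roots)
}
```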

> Then in the translation PR basic.rs would be updated to also translate the expansion to c and assert that they are the same?
>
> And then the same thing for the skip PR?

basic.rs probably wouldn't need to change, it would just get a basic.c and then basic.skip.c. But yeah, the output would be checked. Sorry I misread, you're correct!

You can feel free to play with the ideas here a bit, I'm just modeling something after how we do our bless tests in Rust CI.

@mbyx
Contributor Author

mbyx commented Jun 12, 2025

> 3. *.c files get added. Update the test to assert that the expansion for <name>.rs matches <name>.c, or update it if LIBC_BLESS=1 (use pretty_assertions to make this easier on you).

Could you elaborate on the LIBC_BLESS=1 part? When the tests run, they would assert that the expansion matches; if the assertion fails, the test checks whether that environment variable is set and, if it is, overwrites the .c file with what the expansion actually gives?

@tgross35
Contributor

> 3. *.c files get added. Update the test to assert that the expansion for <name>.rs matches <name>.c, or update it if LIBC_BLESS=1 (use pretty_assertions to make this easier on you).
>
> Could you elaborate on the LIBC_BLESS=1 part? When the tests run, they would assert that the expansion matches; if the assertion fails, the test checks whether that environment variable is set and, if it is, overwrites the .c file with what the expansion actually gives?

That's correct! Similar to how we use RUSTC_BLESS or --bless in rust-lang/rust.
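
The bless flow described here could be sketched as below, using only the standard library. `bless_enabled` and `check_or_bless` are hypothetical names, and a real test would likely report the diff with pretty_assertions instead of returning a bool.

```rust
use std::path::Path;
use std::{env, fs, io};

/// True when the test run was started with `LIBC_BLESS=1`.
pub fn bless_enabled() -> bool {
    env::var("LIBC_BLESS").map_or(false, |value| value == "1")
}

/// Compares `actual` with the contents of `expected`.
/// Returns `Ok(true)` if they match or the file was blessed, `Ok(false)` on a
/// mismatch without blessing. A missing file is treated as empty.
pub fn check_or_bless(expected: &Path, actual: &str, bless: bool) -> io::Result<bool> {
    let on_disk = fs::read_to_string(expected).unwrap_or_default();
    if on_disk == actual {
        return Ok(true);
    }
    if bless {
        // Accept the new output as the expectation.
        fs::write(expected, actual)?;
        return Ok(true);
    }
    Ok(false)
}
```

A test would call `check_or_bless(path, &generated, bless_enabled())` and fail when it returns `Ok(false)`.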

@tgross35 changed the title from "Add type alias extraction logic" to "ctest: Add type alias extraction logic" Jun 15, 2025

3 participants