Skip to content

Linking SMIR JSON data from different crates #86

@jberthold

Description

@jberthold

Linking several SMIR JSON files together

SMIR JSON is emitted per crate.
For a multi-crate program, this means we get several *.smir.json files.
Each of these files has its own independent set of internalised Ty IDs
to refer to functions and types (same for adt_def IDs).

In order to call a function in another crate, the functions table contains
entries which do not have a corresponding item in the items. Instead, the
called code resides in the other crate and its item is another SMIR JSON file.

Within a single crate, all functions and types are referred to by their
crate-local interned ID Ty. These references occur in various SMIR statements
and terminators. Also, the type metadata contains Tys for structured types such
as tuples, arrays, and structs.

We can link together a number of SMIR JSON files (each one for a named crate) by
applying a positive per-crate offset to all Ty IDs in the SMIR JSON.

The chosen offset value must be large enough to avoid any collisions between Tys from different crates.
A suitable value may be obtained by computing the max of all Ty IDs for each crate.
Traversing items is not necessary for this because all Tys are present in the functions or types tables.

TODO is this true for the types, or are we skipping some?
All Tys should be present in types (except the function ones, of course), and if an item contains a Ty not present in types or functions, we aren't using it!

This approach will lead to duplicates in the types, several (offset-equipped)
Tys will point to the same actual type (primitive or structured). This is harmless
in most cases, but the Tys cannot be used for type checking at runtime.

Algorithm to link several SMIR JSON crate files together:

  • Determine the maximum Ty used in each SMIR JSON (functions, types);
  • Assign a suitable offset to each crate (in alphabetical order of crate names?);
  • traverse all items of each crate, adding the offset of the current crate to each Ty contained within;
  • traverse all types in each crate, adding the offset of the current crate to all Ty keys and each Ty contained within;
  • add the offset of the crate to each Ty key of the functions in each crate
  • concatenate types, functions, items

In the concatenated SMIR JSON, there should be no TerminatorKindCall with a
Ty that is absent from the functions table. Again, we may have duplicate references to
(multiple offset-equipped Tys referring to) the same function (when the calls come from
different crates) but all call targets are known in the functions table.
The fully-qualified name is then looked up in the items and should be present because
all items have been concatenated.

TODO Can there be multiple items which are in fact the same?
(Same qualified name, items coming from different crates)
AFAICT this is possible for monomorphic instances of polymorphic functions (stdlib, specifically).
However, the items would in fact be identical , it would not matter which one to use.

MIR's visitors cannot be used for this, since they are not able to mutate the data they traverse.

TODO we have to read / deserialise SMIR JSON, which is a problem for types that come directly from rustc.

  • Workaround: write the impls by hand (the code that would be derived)
  • Future work: submit a PR to rustc to add #[derive(Deserialize)] where required
    This will hopefully only be necessary for stable_mir types (not rustc_middle ones).

Eliminating dead code from SMIR JSON

Given a particular start symbol, we want to eliminate all items from items that
aren't executed (transitively) by running the item referred to by the start_symbol.
The functions and types tables can remain unmodified, they only cause minor
overhead in lookups in K during execution.

To compute the required items:

  • Traverse each item's terminators to determine which functions are called.
    • Resolve the Ty IDs of called functions using the functions table.
    • These are the edges of a graph of items, keyed by symbol_name.
  • Compute the transitive closure of the start_symbol's item in this graph.
    • A simple graph walk starting at start_symbol can be used.
    • This is the set of required items for execution of start_symbol.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions