-
Notifications
You must be signed in to change notification settings - Fork 1
Description
Linking several SMIR JSON files together
SMIR JSON is emitted per crate.
For a multi-crate program, this means we get several *.smir.json files.
Each of these files has its own independent set of internalised Ty IDs
to refer to functions and types (same for adt_def IDs).
In order to call a function in another crate, the functions table contains
entries which do not have a corresponding item in the items. Instead, the
called code resides in the other crate and its item is another SMIR JSON file.
Within a single crate, all functions and types are referred to by their
crate-local interned ID Ty. These references occur in various SMIR statements
and terminators. Also, the type metadata contains Tys for structured types such
as tuples, arrays, and structs.
We can link together a number of SMIR JSON files (each one for a named crate) by
applying a positive per-crate offset to all Ty IDs in the SMIR JSON.
The chosen offset value must be large enough to avoid any collisions between Tys from different crates.
A suitable value may be obtained by computing the max of all Ty IDs for each crate.
Traversing items is not necessary for this because all Tys are present in the functions or types tables.
TODO is this true for the types, or are we skipping some?
All Tys should be present in types (except the function ones, of course), and if an item contains a Ty not present in types or functions, we aren't using it!
This approach will lead to duplicates in the types, several (offset-equipped)
Tys will point to the same actual type (primitive or structured). This is harmless
in most cases, but the Tys cannot be used for type checking at runtime.
Algorithm to link several SMIR JSON crate files together:
- Determine the maximum
Tyused in each SMIR JSON (functions,types); - Assign a suitable offset to each crate (in alphabetical order of crate names?);
- traverse all items of each crate, adding the
offsetof the current crate to eachTycontained within; - traverse all
typesin each crate, adding theoffsetof the current crate to allTykeys and eachTycontained within; - add the
offsetof the crate to eachTykey of thefunctionsin each crate - concatenate
types,functions,items
In the concatenated SMIR JSON, there should be no TerminatorKindCall with a
Ty that is absent from the functions table. Again, we may have duplicate references to
(multiple offset-equipped Tys referring to) the same function (when the calls come from
different crates) but all call targets are known in the functions table.
The fully-qualified name is then looked up in the items and should be present because
all items have been concatenated.
TODO Can there be multiple items which are in fact the same?
(Same qualified name, items coming from different crates)
AFAICT this is possible for monomorphic instances of polymorphic functions (stdlib, specifically).
However, the items would in fact be identical , it would not matter which one to use.
MIR's visitors cannot be used for this, since they are not able to mutate the data they traverse.
TODO we have to read / deserialise SMIR JSON, which is a problem for types that come directly from rustc.
- Workaround: write the
impls by hand (the code that would be derived) - Future work: submit a PR to
rustcto add#[derive(Deserialize)]where required
This will hopefully only be necessary forstable_mirtypes (notrustc_middleones).
Eliminating dead code from SMIR JSON
Given a particular start symbol, we want to eliminate all items from items that
aren't executed (transitively) by running the item referred to by the start_symbol.
The functions and types tables can remain unmodified, they only cause minor
overhead in lookups in K during execution.
To compute the required items:
- Traverse each item's terminators to determine which functions are called.
- Resolve the
TyIDs of called functions using thefunctionstable. - These are the edges of a graph of items, keyed by
symbol_name.
- Resolve the
- Compute the transitive closure of the
start_symbol's item in this graph.- A simple graph walk starting at
start_symbolcan be used. - This is the set of required
itemsfor execution ofstart_symbol.
- A simple graph walk starting at