Emit retags in codegen #958
Description
Proposal
This is part of project goal #392.
Both Stacked Borrows and Tree Borrows rely on retags to create and update the permissions associated with pointers. However, the information that Miri uses to determine where a retag should occur is lost during codegen. We need a way to recover it so that aliasing violations can be detected in lower-level representations of Rust programs. This is necessary to support third-party tools like BorrowSanitizer, which aim to detect Rust-specific undefined behavior in multi-language programs.
Design
We propose adding a new unstable flag -Zcodegen-emit-retag. When this flag is set, the following function call will be emitted whenever a retag needs to happen (shown in LLVM IR):
    ptr @__retag(ptr, i64, i64, ptr)

This call is only a vehicle for type information; third-party tools will replace it with their own implementations. Its parameters are:
- Target (ptr): The pointer being retagged.
- Size (i64): The length in bytes, measured from the start of the target pointer, of the range within the target's allocation that the new permission covers.
- Permission Type (i64): The permission created by the retag. Third-party tools will be able to configure this by overriding a compiler query. You can expect this to be serialized from something equivalent to Miri's NewPermission.
- Interior Mutable Fields (ptr): A pointer to a constant array of pairs of i64. Each pair is an offset from the target pointer and a size, indicating an interior mutable field within the pointee type of the target.
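To make the Size and Interior Mutable Fields parameters more concrete, here is a minimal Rust sketch of how a tool's runtime might split the retagged range [0, size) into segments that receive the requested permission and segments covered by interior-mutable fields. It assumes the (offset, size) pairs are sorted, non-overlapping, and in bounds, and that the tool already knows how many pairs there are; the proposal does not say how the array length is conveyed, so the slice parameter, the `Segment` type, and `split_range` are all hypothetical.

```rust
/// A byte range relative to the retagged pointer (hypothetical type),
/// tagged with whether it lies inside an interior-mutable field.
#[derive(Debug, PartialEq, Eq)]
pub struct Segment {
    pub start: i64,
    pub end: i64, // exclusive
    pub interior_mutable: bool,
}

/// Splits `[0, size)` into consecutive segments. `fields` holds the
/// `(offset, size)` pairs from the interior-mutable field table; this sketch
/// assumes they are sorted by offset, non-overlapping, and within `size`.
pub fn split_range(size: i64, fields: &[(i64, i64)]) -> Vec<Segment> {
    let mut segments = Vec::new();
    let mut cursor = 0;
    for &(offset, len) in fields {
        if offset > cursor {
            // Bytes before this field get the ordinary permission.
            segments.push(Segment { start: cursor, end: offset, interior_mutable: false });
        }
        if len > 0 {
            // Bytes inside the field are treated as interior mutable.
            segments.push(Segment { start: offset, end: offset + len, interior_mutable: true });
        }
        cursor = offset + len;
    }
    if cursor < size {
        // Trailing bytes after the last interior-mutable field.
        segments.push(Segment { start: cursor, end: size, interior_mutable: false });
    }
    segments
}
```

For example, a hypothetical 16-byte pointee with a 4-byte `Cell<u32>` at offset 4 would yield three segments: [0, 4) ordinary, [4, 8) interior mutable, and [8, 16) ordinary.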
The return value is an alias for the target pointer. Additional information can be encoded for convenience using LLVM metadata nodes attached to this function call. For example, function-entry retags will have a fn_entry metadata node.
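Because the call returns an alias of its target, one way to picture its semantics is a trivial stand-in that performs no checks and returns the pointer unchanged. The Rust sketch below is such an identity function, with parameter types mirroring the LLVM signature. It is an illustrative assumption rather than part of the proposal: a real stand-in would also need to be exported under the unmangled symbol name `__retag`, and metadata such as fn_entry is not visible at this level.

```rust
use std::ffi::c_void;

/// Hypothetical identity stand-in for the `__retag` call.
/// Parameters mirror `ptr @__retag(ptr, i64, i64, ptr)`: target pointer,
/// size in bytes, serialized permission, and interior-mutable field table.
/// It checks nothing and returns `target` unchanged, i.e. the result is an
/// alias of the target pointer.
pub extern "C" fn __retag(
    target: *mut c_void,
    _size: i64,
    _permission: i64,
    _interior_mutable_fields: *const i64,
) -> *mut c_void {
    target
}
```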
Implementation Notes
We will determine where to emit retag function calls during codegen without using MIR Retag statements (which are likely going away). Broadly, parameters containing references are retagged on entry to a function, and values with references are retagged when they are copied by MIR assignment statements and function call terminators. When an aggregate contains references, we will recurse into its fields and branch on its variants as necessary. For more implementation details, take a look at this pre-RFC and its discussion.
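The recursion over aggregates can be sketched in Rust as a walk over a simplified layout model. The `Ty` enum, its byte offsets, and the `variant_at` callback are hypothetical stand-ins (not rustc's real types or APIs): the sketch only collects the offsets of references that a copy of a value would need to retag, and `variant_at` plays the role of the run-time branch on an enum's discriminant.

```rust
/// Hypothetical, simplified model of a type's layout (not rustc's `Ty`).
#[derive(Debug)]
pub enum Ty {
    /// Integers, floats, and other data that carry no pointer permissions.
    Scalar,
    /// A reference; `mutable` distinguishes `&mut T` from `&T`.
    Ref { mutable: bool },
    /// A struct or tuple: (byte offset, field type) for each field.
    Struct(Vec<(u64, Ty)>),
    /// A tagged enum: each variant's fields as (byte offset, field type).
    Enum(Vec<Vec<(u64, Ty)>>),
}

/// Collects `(offset, is_mutable)` for every reference reachable inside a
/// value of type `ty` located at `base`. For enums, `variant_at` reports the
/// active variant at a given offset, standing in for the discriminant branch
/// that generated code would perform at run time.
pub fn retag_sites(
    ty: &Ty,
    base: u64,
    variant_at: &dyn Fn(u64) -> usize,
    out: &mut Vec<(u64, bool)>,
) {
    match ty {
        Ty::Scalar => {}
        Ty::Ref { mutable } => out.push((base, *mutable)),
        Ty::Struct(fields) => {
            for (offset, field) in fields {
                retag_sites(field, base + offset, variant_at, out);
            }
        }
        Ty::Enum(variants) => {
            // Only the fields of the active variant are visited.
            let active = variant_at(base);
            for (offset, field) in &variants[active] {
                retag_sites(field, base + offset, variant_at, out);
            }
        }
    }
}
```

For a struct holding a shared reference at offset 0, an integer at offset 8, and a two-variant enum at offset 16 whose second variant holds a mutable reference at offset 8, the walk reports (0, shared) and, only when the second variant is active, (24, mutable).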
One key difference between Stacked and Tree Borrows is that under Stacked Borrows, raw pointers are retagged after being cast from references. Until a decision is made on this, we will attempt to retag these pointers. However, any retag can be skipped by having the compiler query that creates the "Permission Type" parameter return an Option. If the permission is None, then we will not emit a retag.
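The opt-out through an Option-returning query can be illustrated with a small Rust model. Everything here is hypothetical: `PtrKind`, `Policy`, and `permission_for` stand in for the real compiler query, and the integer encodings 0, 1, and 2 are arbitrary placeholders for a serialized NewPermission. The first policy retags raw pointers cast from references (the Stacked Borrows behavior described above); the second returns None for them, so no call is emitted.

```rust
/// Hypothetical kinds of retag candidates the query is asked about.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PtrKind {
    SharedRef,
    MutRef,
    /// A raw pointer produced by casting from a reference.
    RawFromRef,
}

/// Hypothetical policies a tool might install by overriding the query.
#[derive(Debug, Clone, Copy)]
pub enum Policy {
    RetagRawPointers,
    SkipRawPointers,
}

/// Stand-in for the overridable query: `Some(p)` is the serialized
/// permission (arbitrary placeholder values), `None` means "emit no retag".
pub fn permission_for(policy: Policy, kind: PtrKind) -> Option<i64> {
    match (policy, kind) {
        (_, PtrKind::SharedRef) => Some(0),
        (_, PtrKind::MutRef) => Some(1),
        (Policy::RetagRawPointers, PtrKind::RawFromRef) => Some(2),
        (Policy::SkipRawPointers, PtrKind::RawFromRef) => None,
    }
}

/// Counts how many retag calls would be emitted for a list of candidates.
pub fn emitted_retags(policy: Policy, candidates: &[PtrKind]) -> usize {
    candidates
        .iter()
        .filter(|&&kind| permission_for(policy, kind).is_some())
        .count()
}
```

Keeping the decision inside a single query means codegen does not need to know which aliasing model a tool implements; it only checks whether a permission was returned.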
We have a prototype implementation that we are currently testing in our ongoing development of BorrowSanitizer. This still needs to be modified to remove our dependency on MIR retag statements, though.
Extensions
Even if we move away from retagging raw pointers, we still expect that tool designers will want to be able to identify conversions between references and raw pointers at the LLVM level. This would make it possible to use a conservative static analysis to identify when we can skip run-time checks for allocations that are never accessed via raw pointers (see LiteRSan for a proof-of-concept using AddressSanitizer). As an optional extension to this MCP, we could identify these casts by emitting a second intrinsic:
    ptr @__expose_tag(ptr)

This creates an alias for its argument. Like retag intrinsics, it will need to be eliminated by third-party tools. We would also add a from_raw metadata annotation to retag intrinsics, indicating when references are created from raw pointers.
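A conservative use of these annotations can be sketched in Rust. The `Event` enum and `needs_checks` function are hypothetical, and the sketch assumes every pointer has already been resolved to an allocation id, which in practice requires a points-to analysis and is the hard part. Any allocation whose tag is exposed, or that is reached by a reference created from a raw pointer, keeps its run-time checks; the rest become candidates for skipping them. This is only an illustration of the idea, not how LiteRSan or BorrowSanitizer implement it.

```rust
use std::collections::BTreeSet;

/// Hypothetical events a tool might extract from instrumented IR.
#[derive(Debug, Clone, Copy)]
pub enum Event {
    /// A new allocation with the given id.
    Alloc(u32),
    /// An expose-tag call on a pointer into the given allocation.
    ExposeTag(u32),
    /// A retag of a pointer into the given allocation; `from_raw` mirrors
    /// the proposed from_raw metadata annotation.
    Retag { alloc: u32, from_raw: bool },
}

/// Returns the allocations that conservatively still need run-time checks.
pub fn needs_checks(events: &[Event]) -> BTreeSet<u32> {
    let mut checked = BTreeSet::new();
    for event in events {
        match *event {
            Event::Alloc(_) => {}
            // The tag escaped to raw pointers: keep checking this allocation.
            Event::ExposeTag(alloc) => {
                checked.insert(alloc);
            }
            // A reference was created from a raw pointer into this allocation.
            Event::Retag { alloc, from_raw: true } => {
                checked.insert(alloc);
            }
            Event::Retag { from_raw: false, .. } => {}
        }
    }
    checked
}
```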
Mentors or Reviewers
Process
The main points of the Major Change Process are as follows:
- File an issue describing the proposal.
- A compiler team member who is knowledgeable in the area can second by writing @rustbot second or kick off a team FCP with @rfcbot fcp $RESOLUTION.
  - Refer to Proposals, Approvals and Stabilization docs for when a second is sufficient, or when a full team FCP is required.
- Once an MCP is seconded, the Final Comment Period begins.
- Final Comment Period lasts for 10 days after all outstanding concerns are solved.
- Outstanding concerns will block the Final Comment Period from finishing. Once all concerns are resolved, the 10-day countdown is restarted.
- If no concerns are raised after 10 days since the resolution of the last outstanding concern, the MCP is considered approved.
You can read more about Major Change Proposals on forge.