Emit retags in codegen #958

@icmccorm

Proposal

This is part of project goal #392.

Both Stacked Borrows and Tree Borrows rely on retags to create and update the permissions associated with pointers. However, the information that Miri uses to determine where a retag should occur is lost during codegen. We need a way to recover this information so that aliasing violations can be detected in lower-level representations of Rust programs. This is necessary to support third-party tools like BorrowSanitizer, which aim to detect Rust-specific undefined behavior in multi-language programs.

Design

We propose adding a new unstable flag -Zcodegen-emit-retag. When this flag is set, the following function call will be emitted whenever a retag needs to happen (shown in LLVM IR):

ptr @__retag(ptr, i64, i64, ptr)

This call is just a vehicle for type information; third-party tools will replace it with their own implementations. Its parameters are:

  1. Target (ptr) - The pointer being retagged.

  2. Size (i64) - A size in bytes, measured from the address held by the pointer being retagged, indicating the range that the new permission covers within the allocation pointed to by the target.

  3. Permission Type (i64) - The permission created by the retag. Third-party tools will be able to configure this by overriding a compiler query. You can expect this to be serialized from something equivalent to Miri's NewPermission.

  4. Interior Mutable Fields (ptr) - A pointer to a constant array of pairs of i64. Each pair is an offset from the target pointer and a size, indicating an interior mutable field within the pointee type of the target.

The return value is an alias for the target pointer. Additional information can be encoded for convenience using LLVM metadata nodes attached to this function call. For example, function-entry retags will have a fn_entry metadata node.
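To make the four parameters concrete, here is a minimal sketch of how a third-party tool might model one decoded `__retag` call. The `RetagCall` type, its field names, and the use of a Rust slice for the interior mutable ranges are all assumptions made for illustration; the proposal does not specify how the length of the array is conveyed or how permissions are numbered.

```rust
/// Hypothetical decoded view of one `__retag` call. The type, its field
/// names, and the slice-based form of the fourth argument are assumptions
/// made for this sketch; they are not part of the proposal.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RetagCall<'a> {
    /// Address held by the pointer being retagged (parameter 1).
    pub target: u64,
    /// Number of bytes covered by the new permission (parameter 2).
    pub size: u64,
    /// Opaque, tool-defined permission encoding (parameter 3).
    pub permission: u64,
    /// (offset, size) pairs for interior mutable fields (parameter 4).
    pub interior_mutable: &'a [(u64, u64)],
}

impl RetagCall<'_> {
    /// True if `offset` (relative to `target`) lies inside the retagged range.
    pub fn covers(&self, offset: u64) -> bool {
        offset < self.size
    }

    /// True if the byte at `offset` belongs to an interior mutable field.
    pub fn is_interior_mutable(&self, offset: u64) -> bool {
        self.interior_mutable
            .iter()
            .any(|&(start, len)| offset >= start && offset - start < len)
    }
}
```

For example, a retag of a reference to an 8-byte pointee whose last four bytes hold a `Cell<u32>` (an assumed layout) would carry a size of 8 and a single interior mutable pair `(4, 4)`.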

Implementation Notes

We will determine where to emit retag function calls during codegen without using MIR Retag statements (which are likely going away). Broadly, parameters containing references are retagged on entry to a function, and values with references are retagged when they are copied by MIR assignment statements and function call terminators. When an aggregate contains references, we will recurse into its fields and branch on its variants as necessary. For more implementation details, take a look at this pre-RFC and its discussion.
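The recursion into aggregates described above can be sketched on a toy type model. This is not rustc's type representation: `ToyTy`, `RetagSite`, and `retag_sites` are hypothetical names, and field offsets are supplied by hand rather than computed from a real layout.

```rust
/// Toy type model (NOT rustc's `Ty`); every name here is hypothetical.
#[derive(Debug, Clone)]
pub enum ToyTy {
    /// A value that contains no references.
    Scalar,
    /// A reference, which needs a retag.
    Ref,
    /// Fields as (byte offset, type) pairs.
    Struct(Vec<(u64, ToyTy)>),
    /// Each variant is a list of (byte offset, type) fields.
    Enum(Vec<Vec<(u64, ToyTy)>>),
}

/// One place where a retag would be emitted.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RetagSite {
    /// Byte offset of the reference from the start of the outer value.
    pub offset: u64,
    /// Innermost enum variant that must be active, if any.
    pub variant: Option<usize>,
}

/// Collect every reference reachable through the fields of `ty`.
pub fn retag_sites(ty: &ToyTy) -> Vec<RetagSite> {
    let mut out = Vec::new();
    collect(ty, 0, None, &mut out);
    out
}

fn collect(ty: &ToyTy, base: u64, variant: Option<usize>, out: &mut Vec<RetagSite>) {
    match ty {
        ToyTy::Scalar => {}
        ToyTy::Ref => out.push(RetagSite { offset: base, variant }),
        ToyTy::Struct(fields) => {
            // Recurse into each field, accumulating its offset.
            for (off, field) in fields {
                collect(field, base + *off, variant, out);
            }
        }
        ToyTy::Enum(variants) => {
            // Branch on the variant: sites inside variant `idx` are only
            // valid when that variant is the active one.
            for (idx, fields) in variants.iter().enumerate() {
                for (off, field) in fields {
                    collect(field, base + *off, Some(idx), out);
                }
            }
        }
    }
}
```

In a real backend the variant guard would become a branch on the discriminant in the emitted code; the sketch only records which variant each site belongs to.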

One key difference between Stacked and Tree Borrows is that under Stacked Borrows, raw pointers are retagged after being cast from references. Until a decision is made on this, we will attempt to retag these pointers. However, any retag can be skipped by having the compiler query that creates the "Permission Type" parameter return an Option. If the permission is None, then we will not emit a retag.
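The opt-out mechanism can be illustrated with a small stand-in for the query. Everything below is hypothetical: `PointerKind`, `permission_for`, the `retag_raw_pointers` switch, and the integer encodings are placeholders for whatever a tool's override of the compiler query would return.

```rust
/// Hypothetical classification of the pointer produced by an assignment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PointerKind {
    SharedRef,
    MutRef,
    /// A raw pointer that was just cast from a reference.
    RawFromRef,
}

/// Hypothetical stand-in for the overridable query that produces the
/// "Permission Type" parameter. The integer encodings are arbitrary.
pub fn permission_for(kind: PointerKind, retag_raw_pointers: bool) -> Option<i64> {
    match kind {
        PointerKind::SharedRef => Some(0),
        PointerKind::MutRef => Some(1),
        // A Stacked Borrows style tool retags raw pointers after the cast.
        PointerKind::RawFromRef if retag_raw_pointers => Some(2),
        // A Tree Borrows style tool opts out by returning `None`.
        PointerKind::RawFromRef => None,
    }
}

/// A retag call is emitted only when the query yields a permission.
pub fn emits_retag(kind: PointerKind, retag_raw_pointers: bool) -> bool {
    permission_for(kind, retag_raw_pointers).is_some()
}
```

With this shape, switching between the two models for raw pointers is a change to the tool's query override rather than to the codegen logic.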

We have a prototype implementation that we are currently testing in our ongoing development of BorrowSanitizer. This still needs to be modified to remove our dependency on MIR retag statements, though.

Extensions

Even if we move away from retagging raw pointers, we still expect that tool designers will want to be able to identify conversions between references and raw pointers at the LLVM level. This would make it possible to use a conservative static analysis to identify when we can skip run-time checks for allocations that are never accessed via raw pointers (see LiteRSan for a proof-of-concept using AddressSanitizer). As an optional extension to this MCP, we could identify these casts by emitting a second intrinsic:

ptr @__expose_tag(ptr)

This creates an alias for its argument. Like retag intrinsics, it will need to be eliminated by third-party tools. We would also add a from_raw metadata annotation to retag intrinsics, indicating when references are created from raw pointers.
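As a rough illustration of the conservative analysis this extension would enable, the sketch below scans a list of markers and reports the allocations that were never exposed to a raw pointer. The `Marker` type and the integer allocation identifiers are assumptions; a real tool would work on LLVM IR and would need its own way of relating pointers to allocations.

```rust
use std::collections::BTreeSet;

/// Hypothetical summary of the intrinsic calls found for each allocation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Marker {
    /// A `__retag` call on a pointer into allocation `alloc`.
    Retag { alloc: u32 },
    /// An `__expose_tag` call on a pointer into allocation `alloc`.
    ExposeTag { alloc: u32 },
}

/// Allocations that appear in `markers` but are never exposed. Under the
/// assumptions of this sketch, these are the candidates for skipping
/// run-time checks.
pub fn never_exposed(markers: &[Marker]) -> BTreeSet<u32> {
    let mut seen = BTreeSet::new();
    let mut exposed = BTreeSet::new();
    for marker in markers {
        match *marker {
            Marker::Retag { alloc } => {
                seen.insert(alloc);
            }
            Marker::ExposeTag { alloc } => {
                seen.insert(alloc);
                exposed.insert(alloc);
            }
        }
    }
    seen.difference(&exposed).copied().collect()
}
```

The analysis is conservative in the sense that a single `ExposeTag` anywhere disqualifies the allocation, regardless of whether the raw pointer is ever dereferenced.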

Mentors or Reviewers

Process

The main points of the Major Change Process are as follows:

  • File an issue describing the proposal.
  • A compiler team member who is knowledgeable in the area can second by writing @rustbot second or kick off a team FCP with @rfcbot fcp $RESOLUTION.
  • Once an MCP is seconded, the Final Comment Period begins.
    • The Final Comment Period lasts for 10 days after all outstanding concerns are resolved.
    • Outstanding concerns will block the Final Comment Period from finishing. Once all concerns are resolved, the 10-day countdown is restarted.
    • If no concerns are raised within 10 days of the resolution of the last outstanding concern, the MCP is considered approved.

You can read more about Major Change Proposals on forge.

Metadata

Labels

  • T-compiler: Add this label so rfcbot knows to poll the compiler team
  • major-change: A proposal to make a major change to rustc
  • major-change-accepted: A major change proposal that was accepted
  • to-announce: Announce this issue on triage meeting
