Skip to content

Rewrite local labels in inline assembly #81570

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
61 changes: 61 additions & 0 deletions compiler/rustc_codegen_llvm/src/asm.rs
Original file line number Diff line number Diff line change
Expand Up @@ -200,6 +200,7 @@ impl AsmBuilderMethods<'tcx> for Builder<'a, 'll, 'tcx> {

// Build the template string
let mut template_str = String::new();
let mut labels = vec![];
for piece in template {
match *piece {
InlineAsmTemplatePiece::String(ref s) => {
Expand All @@ -212,6 +213,10 @@ impl AsmBuilderMethods<'tcx> for Builder<'a, 'll, 'tcx> {
}
}
} else {
if let Some(label) = get_llvm_label_from_str(s) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A template string piece may consist of multiple lines (e.g. raw string literals). These strings could define more than one label.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks, I didn't think of this case

labels.push(label.to_owned());
}

template_str.push_str(s)
}
}
Expand Down Expand Up @@ -244,6 +249,30 @@ impl AsmBuilderMethods<'tcx> for Builder<'a, 'll, 'tcx> {
}
}

// Labels should be made local to prevent issues with inlined `asm!` and duplicate labels
// Fixes https://github.com/rust-lang/rust/issues/74262
for label in labels.iter() {
let ranges: Vec<_> = template_str
.match_indices(label)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this checks if the character before the match is a non-identifier character. This means that if you have a label bar then foobar will be replaced with foo${:private}bar${:uid} which is really not what we want.

.filter_map(|(idx, _)| {
let label_end_idx = idx + label.len();
let next_char = template_str[label_end_idx..].chars().nth(0);
if next_char.is_none() || !llvm_ident_continuation(next_char.unwrap()) {
Some(idx..label_end_idx)
} else {
None
}
})
.collect();

// We add a constant length of 18 to the string every time
// from ${:private} and ${:uid}
for (i, range) in ranges.iter().enumerate() {
let real_range = range.start + (i * 18)..range.end + (i * 18);
template_str.replace_range(real_range, &format!("${{:private}}{}${{:uid}}", label))
}
}

if !options.contains(InlineAsmOptions::PRESERVES_FLAGS) {
match asm_arch {
InlineAsmArch::AArch64 | InlineAsmArch::Arm => {
Expand Down Expand Up @@ -874,3 +903,35 @@ fn llvm_fixup_output_type(
_ => layout.llvm_type(cx),
}
}

// Valid llvm symbols: https://llvm.org/docs/AMDGPUOperandSyntax.html#symbols
// First char is [a-zA-Z_.]
fn llvm_ident_start(c: char) -> bool {
('a'..='z').contains(&c) || ('A'..='Z').contains(&c) || '_' == c || '.' == c
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
('a'..='z').contains(&c) || ('A'..='Z').contains(&c) || '_' == c || '.' == c
matches!(c, 'a'..='z' | 'A'..='Z' | '_' | '.')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matches! is more concise, thanks

}

// All subsequent chars are [a-zA-Z0-9_$.@]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if we disallowed @ and $ in symbol names. We don't need to support all symbol names here, the user is free to shoot themselves in the foot if they choose to use a symbol name with weird characters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if we disallowed @ and $ in symbol names.

That should happen elsewhere, possibly in the same location parsing for formatting specifiers happens. Should also be a concern for a separate PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern here is that these may not be valid characters in symbol names for all targets and may have other special meanings on those targets.

Copy link
Member

@nagisa nagisa Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I agree. I still think this is the the wrong place in the codebase to enforce what characters we allow in asm labels. Probably should happen as a separate PR too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't actually prevent the user from using such labels. We just don't provide automatic rewriting into local labels for them.

Also we're not handling all possible label definitions. For example labels can be defined after a ; on targets where that is a statement separator.

We're not trying to handle all possible cases here since that would require us to effectively parse the asm, which we really want to leave to LLVM. We're just trying to make the most common types of labels work correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't actually prevent the user from using such labels. We just don't provide automatic rewriting into local labels for them.

Aha, I see the distinction you're trying to make now.

We're just trying to make the most common types of labels work correctly.

That seems like a recipe for a pretty poor experience with a very non-obvious cliff at which things may start failing in non-obvious ways. For instance, user with passing familiarity with Rust's asm! labels being local could write something like this:

#[inline] // not always
pub fn peach() {
    unsafe {
        asm!("per$$imon: jmp per$$imon")
    }
}

and it might very well work out in basic tests, but start failing once the function is released to the field. If we do go the route of making things automagic, its better they're magic all the time, not only sometimes.

that would require us to effectively parse the asm, which we really want to leave to LLVM

I believe just lexing would be sufficient, but I do agree that we'd rather not do that at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very least I would like @ to be removed from the set of allowed characters for symbols. @ is used as a symbol modifier in x86 intel syntax: sym@gotpcrel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to confirm what we want to accept as a valid identifier for the purposes of replacing labels. Denying symbols outright should be properly implemented somewhere else, preferably with a compiler warning or error. This code can reflect what symbols are accepted, but it will ignore other things. For example, @foobar: will be replaced with @${:private}foobar${:uid} if we choose to not acknowledge @ in labels.

Is the set of symbols we will accept in a label [a-zA-Z0-9_$.]?

Additionally, do we want to accept labels that are strings like nagisa commented here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that for label renaming we should only support [a-zA-Z0-9_.], and only plain labels not quoted ones.

fn llvm_ident_continuation(c: char) -> bool {
('a'..='z').contains(&c)
|| ('A'..='Z').contains(&c)
|| ('0'..='9').contains(&c)
|| '_' == c
|| '$' == c
|| '.' == c
|| '@' == c
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A ? mark is a valid continuation character according to clang: https://rust.godbolt.org/z/EsP4sW, though it does appear like a bug and is only intended to be allowed when emulating the microsoft cl. Nevertheless it is something that LLVM currently will let through.

}

/// Gets a LLVM label from a &str if it exists
fn get_llvm_label_from_str(s: &str) -> Option<&str> {
if let Some(colon_idx) = s.find(':') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use trim_start to strip leading whitespace from the line.

let substr = &s[..colon_idx];
let mut chars = substr.chars();
if let Some(c) = chars.next() {
if llvm_ident_start(c) && chars.all(llvm_ident_continuation) {
return Some(substr);
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to get more arbitrary stuff into a asm label is to quote it as a string, like this:

void test(void) {
    __asm__(".intel_syntax noprefix\n"
    "\"01☺☺@banana.?@!\": jmp \"01☺☺@banana.?@!\"");
}

or in Rust

#![feature(asm)]
pub fn test() {
    unsafe {
    asm!(r#"
        "☺helloworld?@!$%^&": jmp "☺helloworld?@!$%^&"
    "#);
    }
}

So that's something we should handle or forbid too.


None
}
16 changes: 16 additions & 0 deletions src/test/ui/asm/duplicate-labels-inline.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
// compile-flags: -g
// build-pass

#![feature(asm)]

#[inline(always)]
fn asm() {
unsafe {
asm!("duplabel: nop",);
}
}

fn main() {
asm();
asm();
}
21 changes: 21 additions & 0 deletions src/test/ui/asm/duplicate-labels.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
// compile-flags: -g
// build-pass

#![feature(asm)]

fn asm1() {
unsafe {
asm!("duplabel: nop",);
}
}

fn asm2() {
unsafe {
asm!("nop", "duplabel: nop",);
}
}

fn main() {
asm1();
asm2();
}
10 changes: 10 additions & 0 deletions src/test/ui/asm/label-substr-instruction.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
// compile-flags: -g
// build-pass

#![feature(asm)]

fn main() {
unsafe {
asm!("jmp j", "nop", "j: nop");
}
}
10 changes: 10 additions & 0 deletions src/test/ui/asm/label-substr-label.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
// compile-flags: -g
// build-pass

#![feature(asm)]

fn main() {
unsafe {
asm!("jmp l", "l: nop", "jmp l2", "l2: nop");
}
}
10 changes: 10 additions & 0 deletions src/test/ui/asm/label-use-before-declaration.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
// compile-flags: -g
// build-pass

#![feature(asm)]

fn main() {
unsafe {
asm!("jmp l", "nop", "l: nop");
}
}