Skip to content

Conversation

dianqk
Copy link
Member

@dianqk dianqk commented Dec 20, 2023

When there is an IR with inline asm modules and another with a weak function, LTO can lead to multiple symbol definition issue. The weak function can be correctly discarded without using LTO.
I found that the LLVM IR can't understand the asm content and can only handle it in codegen.
Currently I've only solved the weak function under ELF.

@llvmbot llvmbot added the llvm:mc Machine (object) code label Dec 20, 2023
@llvmbot
Copy link
Member

llvmbot commented Dec 20, 2023

@llvm/pr-subscribers-mc

Author: DianQK (DianQK)

Changes

When there is an IR with inline asm modules and another with a weak function, LTO can lead to multiple symbol definition issue. The weak function can be correctly discarded without using LTO.
I found that the LLVM IR can't understand the asm content and can only handle it in codegen.
Currently I've only solved the weak function under ELF.


Full diff: https://github.com/llvm/llvm-project/pull/76040.diff

7 Files Affected:

  • (modified) llvm/include/llvm/MC/MCSymbol.h (+7-1)
  • (modified) llvm/lib/CodeGen/MachineFunctionPass.cpp (+7)
  • (modified) llvm/lib/MC/MCParser/ELFAsmParser.cpp (+2)
  • (added) llvm/test/CodeGen/Thumb/asm-fn-weak.ll (+20)
  • (added) llvm/test/CodeGen/Thumb/asm-fn.ll (+16)
  • (added) llvm/test/CodeGen/Thumb/asm-weak-fn-weak.ll (+16)
  • (added) llvm/test/CodeGen/Thumb/asm-weak-fn.ll (+17)
diff --git a/llvm/include/llvm/MC/MCSymbol.h b/llvm/include/llvm/MC/MCSymbol.h
index 7be31223280273..aa30cab7bc09d2 100644
--- a/llvm/include/llvm/MC/MCSymbol.h
+++ b/llvm/include/llvm/MC/MCSymbol.h
@@ -104,6 +104,9 @@ class MCSymbol {
   /// This symbol is weak external.
   mutable unsigned IsWeakExternal : 1;
 
+  /// This symbol is weak.
+  mutable unsigned IsWeak : 1;
+
   /// LLVM RTTI discriminator. This is actually a SymbolKind enumerator, but is
   /// unsigned to avoid sign extension and achieve better bitpacking with MSVC.
   unsigned Kind : 3;
@@ -163,7 +166,7 @@ class MCSymbol {
   MCSymbol(SymbolKind Kind, const StringMapEntry<bool> *Name, bool isTemporary)
       : IsTemporary(isTemporary), IsRedefinable(false), IsUsed(false),
         IsRegistered(false), IsExternal(false), IsPrivateExtern(false),
-        IsWeakExternal(false), Kind(Kind), IsUsedInReloc(false),
+        IsWeakExternal(false), IsWeak(false), Kind(Kind), IsUsedInReloc(false),
         SymbolContents(SymContentsUnset), CommonAlignLog2(0), Flags(0) {
     Offset = 0;
     HasName = !!Name;
@@ -411,6 +414,9 @@ class MCSymbol {
 
   bool isWeakExternal() const { return IsWeakExternal; }
 
+  bool isWeak() const { return IsWeak; }
+  void setWeak(bool Value) const { IsWeak = Value; }
+
   /// print - Print the value to the stream \p OS.
   void print(raw_ostream &OS, const MCAsmInfo *MAI) const;
 
diff --git a/llvm/lib/CodeGen/MachineFunctionPass.cpp b/llvm/lib/CodeGen/MachineFunctionPass.cpp
index d57a912f418b72..668fb289a117d5 100644
--- a/llvm/lib/CodeGen/MachineFunctionPass.cpp
+++ b/llvm/lib/CodeGen/MachineFunctionPass.cpp
@@ -27,6 +27,7 @@
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/PrintPasses.h"
+#include "llvm/Target/TargetMachine.h"
 
 using namespace llvm;
 using namespace ore;
@@ -43,6 +44,12 @@ bool MachineFunctionPass::runOnFunction(Function &F) {
     return false;
 
   MachineModuleInfo &MMI = getAnalysis<MachineModuleInfoWrapperPass>().getMMI();
+  const TargetMachine &TM = MMI.getTarget();
+  MCSymbol *Symbol = TM.getSymbol(&F);
+  // Don't codegen the weak function when there is a defined non-weak symbol.
+  if (Symbol->isDefined() && !Symbol->isWeak() && F.hasWeakLinkage()) {
+    return false;
+  }
   MachineFunction &MF = MMI.getOrCreateMachineFunction(F);
 
   MachineFunctionProperties &MFProps = MF.getProperties();
diff --git a/llvm/lib/MC/MCParser/ELFAsmParser.cpp b/llvm/lib/MC/MCParser/ELFAsmParser.cpp
index 93e1d2f44b8c56..a27bfe879f2075 100644
--- a/llvm/lib/MC/MCParser/ELFAsmParser.cpp
+++ b/llvm/lib/MC/MCParser/ELFAsmParser.cpp
@@ -189,6 +189,8 @@ bool ELFAsmParser::ParseDirectiveSymbolAttribute(StringRef Directive, SMLoc) {
 
       MCSymbol *Sym = getContext().getOrCreateSymbol(Name);
 
+      if (Attr == MCSA_Weak)
+        Sym->setWeak(true);
       getStreamer().emitSymbolAttribute(Sym, Attr);
 
       if (getLexer().is(AsmToken::EndOfStatement))
diff --git a/llvm/test/CodeGen/Thumb/asm-fn-weak.ll b/llvm/test/CodeGen/Thumb/asm-fn-weak.ll
new file mode 100644
index 00000000000000..421db2e8a7f421
--- /dev/null
+++ b/llvm/test/CodeGen/Thumb/asm-fn-weak.ll
@@ -0,0 +1,20 @@
+; RUN: llc -mtriple=thumbv6m-none-unknown-eabi < %s | FileCheck %s
+
+; CHECK: .globl	__aeabi_uidivmod
+; CHECK-NEXT: .type	__aeabi_uidivmod,%function
+; CHECK-NEXT: __aeabi_uidivmod:
+; CHECK-NEXT: str	r0, [r2, #96]
+; CHECK-NEXT: str	r1, [r2, #100]
+module asm ".global __aeabi_uidivmod"
+module asm ".type __aeabi_uidivmod, %function"
+module asm "__aeabi_uidivmod:"
+module asm "str    r0, [r2, #0x060]"
+module asm "str    r1, [r2, #0x064]"
+
+; CHECK-NOT: __aeabi_uidivmod
+define weak void @__aeabi_uidivmod() #0 {
+  tail call void asm sideeffect alignstack "push {lr}\0Asub sp, sp, #4\0Amov r2, sp\0Abl __udivmodsi4\0Aldr r1, [sp]\0Aadd sp, sp, #4\0Apop {pc}", "~{cc},~{memory}"()
+  unreachable
+}
+
+attributes #0 = { naked }
diff --git a/llvm/test/CodeGen/Thumb/asm-fn.ll b/llvm/test/CodeGen/Thumb/asm-fn.ll
new file mode 100644
index 00000000000000..c6918b6e3ccb6d
--- /dev/null
+++ b/llvm/test/CodeGen/Thumb/asm-fn.ll
@@ -0,0 +1,16 @@
+; RUN: not llc -mtriple=thumbv6m-none-unknown-eabi < %s 2>&1 | FileCheck %s
+
+; CHECK: error: symbol '__aeabi_uidivmod' is already defined
+
+module asm ".global __aeabi_uidivmod"
+module asm ".type __aeabi_uidivmod, %function"
+module asm "__aeabi_uidivmod:"
+module asm "str    r0, [r2, #0x060]"
+module asm "str    r1, [r2, #0x064]"
+
+define void @__aeabi_uidivmod() #0 {
+  tail call void asm sideeffect alignstack "push {lr}\0Asub sp, sp, #4\0Amov r2, sp\0Abl __udivmodsi4\0Aldr r1, [sp]\0Aadd sp, sp, #4\0Apop {pc}", "~{cc},~{memory}"()
+  unreachable
+}
+
+attributes #0 = { naked }
diff --git a/llvm/test/CodeGen/Thumb/asm-weak-fn-weak.ll b/llvm/test/CodeGen/Thumb/asm-weak-fn-weak.ll
new file mode 100644
index 00000000000000..1a45ed55110a8a
--- /dev/null
+++ b/llvm/test/CodeGen/Thumb/asm-weak-fn-weak.ll
@@ -0,0 +1,16 @@
+; RUN: not llc -mtriple=thumbv6m-none-unknown-eabi < %s 2>&1 | FileCheck %s
+
+; CHECK: error: symbol '__aeabi_uidivmod' is already defined
+
+module asm ".weak __aeabi_uidivmod"
+module asm ".type __aeabi_uidivmod, %function"
+module asm "__aeabi_uidivmod:"
+module asm "str    r0, [r2, #0x060]"
+module asm "str    r1, [r2, #0x064]"
+
+define weak void @__aeabi_uidivmod() #0 {
+  tail call void asm sideeffect alignstack "push {lr}\0Asub sp, sp, #4\0Amov r2, sp\0Abl __udivmodsi4\0Aldr r1, [sp]\0Aadd sp, sp, #4\0Apop {pc}", "~{cc},~{memory}"()
+  unreachable
+}
+
+attributes #0 = { naked }
diff --git a/llvm/test/CodeGen/Thumb/asm-weak-fn.ll b/llvm/test/CodeGen/Thumb/asm-weak-fn.ll
new file mode 100644
index 00000000000000..24b915d0d4f00d
--- /dev/null
+++ b/llvm/test/CodeGen/Thumb/asm-weak-fn.ll
@@ -0,0 +1,17 @@
+; RUN: not llc -mtriple=thumbv6m-none-unknown-eabi < %s 2>&1 | FileCheck %s
+
+; CHECK: error: symbol '__aeabi_uidivmod' is already defined
+; FIXME: We want to discard the weak asm function.
+
+module asm ".weak __aeabi_uidivmod"
+module asm ".type __aeabi_uidivmod, %function"
+module asm "__aeabi_uidivmod:"
+module asm "str    r0, [r2, #0x060]"
+module asm "str    r1, [r2, #0x064]"
+
+define void @__aeabi_uidivmod() #0 {
+  tail call void asm sideeffect alignstack "push {lr}\0Asub sp, sp, #4\0Amov r2, sp\0Abl __udivmodsi4\0Aldr r1, [sp]\0Aadd sp, sp, #4\0Apop {pc}", "~{cc},~{memory}"()
+  unreachable
+}
+
+attributes #0 = { naked }

Copy link
Member

@MaskRay MaskRay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'll be out for 2 weeks and my response time will be very slow.)

Can you describe your case using concrete commands?
We already have .lto_discard to handle non-prevailing symbols.

I am also concerned of adding IsWeak to MCSymbol, which makes the representation less orthogonal.

@dianqk
Copy link
Member Author

dianqk commented Dec 21, 2023

Can you describe your case using concrete commands? We already have .lto_discard to handle non-prevailing symbols.

The concrete usage case is to override the default builtin functions in rust. Currently rust adds builtin functions to LTO.

An example might be as follows:

#![no_std]
#![no_main]

use core::panic::PanicInfo;

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

core::arch::global_asm!(
    ".global __aeabi_uidivmod",
    ".type __aeabi_uidivmod, %function",
    "__aeabi_uidivmod:",
    "str    r0, [r2, #0x060]",
    "str    r1, [r2, #0x064]",
);

Then run rustc -Clto --target=thumbv6m-none-eabi. This requires many other settings, such as setting the weak-intrinsics. (I should be able to make an example for x86_64 if needed.)

This will probably get user.ll and builtin.ll.

target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "thumbv6m-none-unknown-eabi"

module asm ".global __aeabi_uidivmod"
module asm ".type __aeabi_uidivmod, %function"
module asm "__aeabi_uidivmod:"
module asm "str    r0, [r2, #0x060]"
module asm "str    r1, [r2, #0x064]"
target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "thumbv6m-none-unknown-eabi"

; Function Attrs: naked nocf_check noinline nounwind
define weak hidden void @__aeabi_uidivmod() unnamed_addr {
  tail call void asm sideeffect alignstack "push {lr}\0Asub sp, sp, #4\0Amov r2, sp\0Abl __udivmodsi4\0Aldr r1, [sp]\0Aadd sp, sp, #4\0Apop {pc}", "~{cc},~{memory}"()
  unreachable
}

Then run with llvm-link user.ll builtin.ll -S -o out.ll and llc out.ll.

Looking at .lto_discard related code, this could be useful or something similar. I will look into it. Thanks.

@dianqk
Copy link
Member Author

dianqk commented Dec 22, 2023

From

/// Parse inline ASM and collect the symbols that are defined or referenced in
/// the current module.
///
/// For each found symbol, call \p AsmSymbol with the name of the symbol found
/// and the associated flags.
static void CollectAsmSymbols(
const Module &M,
function_ref<void(StringRef, object::BasicSymbolRef::Flags)> AsmSymbol);
, I can handle it at LTO.
This is the better approach for me. Marked as a draft as I am preparing a new PR.

It seems that .lto_discard can be used to handle the presence of a weak and a global symbol in the inline asm. But I want to drop the weak function defined in IR.

@dianqk
Copy link
Member Author

dianqk commented Dec 24, 2023

Alternative #76287.

I also found a way I could solve this issue by writing symbol resolution. This is similar to the llvm-lto2 command.

@dianqk
Copy link
Member Author

dianqk commented Dec 28, 2023

I prefer #76287 or rust-lang/rust#119348.

@dianqk dianqk closed this Dec 28, 2023
@dianqk dianqk deleted the asm-weak-fn branch December 28, 2023 00:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants