
Base v0 mangling grammar #747


Merged on Oct 18, 2021 (3 commits). Changes shown are from 2 commits.
gcc/rust/backend/rust-mangle.cc: 86 additions, 0 deletions
@@ -1,5 +1,6 @@
#include "rust-mangle.h"
#include "fnv-hash.h"
#include <algorithm>

// FIXME: Rename those to legacy_*
static const std::string kMangledSymbolPrefix = "_ZN";
@@ -154,6 +155,85 @@ v0_simple_type_prefix (const TyTy::BaseType *ty)
  gcc_unreachable ();
}

Review comment (Member):
I think generic helpers like this base62 implementation should really live in their own file under https://github.com/Rust-GCC/gccrs/tree/master/gcc/rust/util

Reply (Member Author):
I added the file in f425f31. I think this implementation is used throughout rustc, and with other bases too, so I'm assuming the function might grow. I also don't really know whether this implementation is rustc-specific, so for now I'm assuming it is and calling it rust-base62.h :D

// FIXME: Is this present somewhere in libiberty already?
static std::string
v0_base62_integer (uint64_t x)
{
  const static std::string base_64
    = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@$";
  std::string buffer (128, '\0');
  size_t idx = 0;
  size_t base = 62;

  do
    {
      buffer[idx] = base_64[(x % base)];
      idx++;
      x = x / base;
    }
  while (x != 0);

  std::reverse (buffer.begin (), buffer.begin () + idx);
  return buffer.substr (0, idx);
}
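The Member Author expects this helper to grow to support other bases. rustc's own `base_n` helper appears to use the same 64-character alphabet, which would explain the trailing `@$` that base 62 never reaches. A minimal sketch of a generalised version, assuming bases from 2 to 64 are wanted; `base_n_encode` is a hypothetical name, not the contents of the rust-base62.h file mentioned in the review thread:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical generalisation of v0_base62_integer: encode X in any base
// from 2 to 64, using the same digit alphabet as the original helper.
// Digits are produced least-significant first, then reversed once.
static std::string
base_n_encode (uint64_t x, unsigned base)
{
  static const std::string alphabet
    = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@$";
  assert (base >= 2 && base <= alphabet.size ());

  std::string digits;
  do
    {
      digits.push_back (alphabet[x % base]);
      x /= base;
    }
  while (x != 0);

  std::reverse (digits.begin (), digits.end ());
  return digits;
}
```

Appending digits with `push_back` and reversing once avoids the fixed-size buffer. The original 128-byte buffer is large enough regardless, since a `uint64_t` needs at most 64 digits even in base 2.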

// Add an underscore-terminated base62 integer to the mangling string.
// This corresponds to the `<base-62-number>` grammar in the v0 mangling RFC:
// - 0 is encoded as "_"
// - any other value is encoded as itself minus one in base 62, followed by "_"
static void
v0_add_integer_62 (std::string &mangled, uint64_t x)
{
  if (x > 0)
    mangled.append (v0_base62_integer (x - 1));

  mangled.append ("_");
}
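To make the shift-by-one rule concrete, here is a standalone sketch that mirrors `v0_add_integer_62` but returns the encoding as a string; `base62_digits` and `encode_integer_62` are illustrative names, not gccrs functions:

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Plain base-62 digits of X, most significant digit first.
static std::string
base62_digits (uint64_t x)
{
  static const char alphabet[]
    = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
  std::string out;
  do
    {
      out.push_back (alphabet[x % 62]);
      x /= 62;
    }
  while (x != 0);
  std::reverse (out.begin (), out.end ());
  return out;
}

// Illustrative stand-in for v0_add_integer_62: 0 becomes "_", and any
// other value X becomes base62 (X - 1) followed by "_".
static std::string
encode_integer_62 (uint64_t x)
{
  std::string out;
  if (x > 0)
    out = base62_digits (x - 1);
  out.push_back ('_');
  return out;
}
```

With this sketch, 0 encodes as `_`, 1 as `0_`, 11 as `a_`, 62 as `Z_` and 63 as `10_`.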

// Add a tag-prefixed base62 integer to the mangling string when the
// integer is greater than 0. This corresponds to `<opt-integer-62>`:
// - 0 is encoded as "" (nothing)
// - any other value is encoded as <tag> + v0_add_integer_62(itself - 1),
//   that is <tag> + "_" for 1, and <tag> + base62(itself - 2) + "_" above 1
static void
v0_add_opt_integer_62 (std::string &mangled, const std::string &tag,
                       uint64_t x)
{
  if (x > 0)
    {
      mangled.append (tag);
      v0_add_integer_62 (mangled, x - 1);
    }
}

static void
v0_add_disambiguator (std::string &mangled, uint64_t dis)
{
  v0_add_opt_integer_62 (mangled, "s", dis);
}
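For `<opt-integer-62>`, rustc's v0 mangler (`push_opt_integer_62`) subtracts one before emitting the `<base-62-number>`, so a present disambiguator encodes its value minus one. A sketch of that behavior, assuming the rustc semantics are the intended target; all function names here are hypothetical:

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Base-62 digits of X, most significant digit first.
static std::string
base62_digits (uint64_t x)
{
  static const char alphabet[]
    = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
  std::string out;
  do
    {
      out.push_back (alphabet[x % 62]);
      x /= 62;
    }
  while (x != 0);
  std::reverse (out.begin (), out.end ());
  return out;
}

// <base-62-number>: 0 -> "_", otherwise base62 (X - 1) + "_".
static std::string
encode_integer_62 (uint64_t x)
{
  return (x > 0 ? base62_digits (x - 1) : std::string ()) + "_";
}

// <opt-integer-62>, following rustc's push_opt_integer_62: 0 is omitted,
// any other value is TAG followed by the <base-62-number> of X - 1.
static std::string
encode_opt_integer_62 (const std::string &tag, uint64_t x)
{
  if (x == 0)
    return std::string ();
  return tag + encode_integer_62 (x - 1);
}

// <disambiguator> is an <opt-integer-62> with the tag "s".
static std::string
encode_disambiguator (uint64_t dis)
{
  return encode_opt_integer_62 ("s", dis);
}
```

Under these rules a disambiguator of 0 is omitted, 1 becomes `s_`, 2 becomes `s0_` and 64 becomes `s10_`.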

// Add an identifier to the mangled string. This corresponds to the
// `<identifier>` grammar in the v0 mangling RFC.
static void
v0_add_identifier (std::string &mangled, const std::string &identifier)
{
  // FIXME: gccrs cannot handle Unicode identifiers yet, so for now we never
  // have to mangle Unicode values. The v0 mangling scheme does handle them,
  // though: the grammar for Unicode identifiers is in
  // <undisambiguated-identifier>, right under the <identifier> one. If the
  // identifier contains Unicode values, an extra "u" must be added to the
  // mangling string and the characters must be encoded with `punycode`.

  mangled += std::to_string (identifier.size ());

  // If the first character of the identifier is a digit or an underscore, we
  // add an extra underscore so a demangler can tell where the length ends
  if (!identifier.empty ()
      && ((identifier[0] >= '0' && identifier[0] <= '9')
          || identifier[0] == '_'))
    mangled.append ("_");

  mangled.append (identifier);
}
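A standalone, ASCII-only sketch of the `<undisambiguated-identifier>` rule (decimal byte length, then an `_` separator when the name starts with a digit or `_`, then the bytes); `encode_identifier` is a hypothetical name:

```cpp
#include <cassert>
#include <string>

// ASCII-only sketch of <undisambiguated-identifier>: decimal length,
// then "_" if the identifier starts with a digit or "_", then the bytes.
static std::string
encode_identifier (const std::string &identifier)
{
  assert (!identifier.empty ());
  std::string out = std::to_string (identifier.size ());
  char first = identifier[0];
  if ((first >= '0' && first <= '9') || first == '_')
    out.push_back ('_');
  out += identifier;
  return out;
}
```

For example, `foo` becomes `3foo` and `_foo` becomes `4__foo`.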

static std::string
v0_type_prefix (const TyTy::BaseType *ty)
{
@@ -194,7 +274,13 @@ static std::string
v0_mangle_item (const TyTy::BaseType *ty, const Resolver::CanonicalPath &path,
                const std::string &crate_name)
{
  std::string mangled;

  // FIXME: Add real algorithm once all pieces are implemented
  auto ty_prefix = v0_type_prefix (ty);
  v0_add_identifier (mangled, crate_name);
  v0_add_disambiguator (mangled, 62);

  gcc_unreachable ();
}
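The placeholder body above appends the crate name before the disambiguator. In the RFC grammar, a crate root is `"C" <identifier>`, where `<identifier> = [<disambiguator>] <undisambiguated-identifier>`, so the disambiguator comes first. A sketch of just that path component, assuming ASCII crate names and rustc's shift-by-one disambiguator encoding; `encode_crate_root` is a hypothetical helper:

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Base-62 digits of X, most significant digit first.
static std::string
base62_digits (uint64_t x)
{
  static const char alphabet[]
    = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
  std::string out;
  do
    {
      out.push_back (alphabet[x % 62]);
      x /= 62;
    }
  while (x != 0);
  std::reverse (out.begin (), out.end ());
  return out;
}

// <base-62-number>: 0 -> "_", otherwise base62 (X - 1) + "_".
static std::string
integer_62 (uint64_t x)
{
  return (x > 0 ? base62_digits (x - 1) : std::string ()) + "_";
}

// Hypothetical helper producing the crate-root path component:
// "C" [<disambiguator>] <undisambiguated-identifier>.
static std::string
encode_crate_root (const std::string &crate_name, uint64_t dis)
{
  std::string out = "C";

  // <disambiguator> = "s" <base-62-number>, omitted when DIS is 0; a
  // present disambiguator encodes DIS - 1.
  if (dis > 0)
    out += "s" + integer_62 (dis - 1);

  // ASCII-only <undisambiguated-identifier>: length, optional "_", bytes.
  out += std::to_string (crate_name.size ());
  if (!crate_name.empty ()
      && ((crate_name[0] >= '0' && crate_name[0] <= '9')
          || crate_name[0] == '_'))
    out += "_";
  out += crate_name;
  return out;
}
```

With this sketch, crate `mycrate` with disambiguator 0 becomes `C7mycrate`, and with disambiguator 5 becomes `Cs3_7mycrate`; a complete v0 symbol would additionally start with `_R`.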
