Canonicalize away bit width and embed small integers into IntIds (carbon-language#4487)

The first change here is to canonicalize away bit width when tracking
integers in our shared value store. This lets us have a more definitive
model of "what is the mathematical value". It also frees us to use more
efficient bit widths when available, such as bits inside the ID itself.

For canonicalizing, we try to minimize the width adjustments and
maximize the use of the SSO in APInt, and so we never shrink below
64 bits and grow in multiples of the word bit width in the
implementation. We also canonicalize to the signed two's complement
representation so we can represent negative numbers in an intuitive way.
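
As a rough illustration of the width rule, here is a standalone sketch that
does not depend on LLVM. The names `WordWidthSketch`, `MinWidthSketch`, and
`CanonicalBitWidthSketch` are made up for this example, and the two constants
are assumed to be 64 to match APInt's 64-bit words and the 64-bit minimum
described above:

```cpp
#include <algorithm>
#include <cassert>

// Assumed values for illustration: 64-bit words and a 64-bit minimum width.
constexpr int WordWidthSketch = 64;
constexpr int MinWidthSketch = 64;

// Rounds `significant_bits` up to the next multiple of the word width, and
// never returns less than the minimum width.
constexpr auto CanonicalBitWidthSketch(int significant_bits) -> int {
  return std::max(MinWidthSketch,
                  ((significant_bits + WordWidthSketch - 1) / WordWidthSketch) *
                      WordWidthSketch);
}

// Anything that fits in 64 bits stays at 64 bits.
static_assert(CanonicalBitWidthSketch(7) == 64);
static_assert(CanonicalBitWidthSketch(64) == 64);
// One more significant bit grows the width by a whole word.
static_assert(CanonicalBitWidthSketch(65) == 128);
```

Note that an unsigned value using all 64 bits needs 65 significant bits once a
zero sign bit is included, so under this rule it would land at 128 bits.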

The canonicalizing requires getting the bit width out of the type and
adjusting to it within the toolchain when doing any kind of math, and
this PR updates various places to do that, as well as adding some
convenience APIs to assist.
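
To make the "adjust to the type's width" step concrete, here is a simplified
sketch that uses `int64_t` in place of APInt. `WrapToWidthSketch` is a
hypothetical helper for this example and is not one of the convenience APIs
added by this PR. It assumes widths from 1 to 63 bits:

```cpp
#include <cassert>
#include <cstdint>

// Reinterprets a canonical (width-free) value at a specific type width.
// Assumes 1 <= bit_width <= 63. Hypothetical helper for illustration only.
inline auto WrapToWidthSketch(int64_t value, int bit_width, bool is_signed)
    -> int64_t {
  assert(bit_width >= 1 && bit_width <= 63);
  const uint64_t mask = (uint64_t{1} << bit_width) - 1;
  // Truncate to the low `bit_width` bits.
  uint64_t bits = static_cast<uint64_t>(value) & mask;
  // For signed types, sign-extend from the top bit of the truncated value.
  if (is_signed && ((bits >> (bit_width - 1)) & 1) != 0) {
    bits |= ~mask;
  }
  return static_cast<int64_t>(bits);
}
```

For example, the mathematical value 128 wraps to -128 when treated as an
8-bit signed integer, but stays 128 when treated as an 8-bit unsigned integer.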

Then we take advantage of the canonical form and embed small integers
into the ID itself rather than allocating storage for them and
referencing them with an index. This is especially helpful for the
pervasive small integers such as the sizes of types, arrays, etc. Those
no longer require indirection at all. Various short-cut APIs to take
advantage of this have also been added.
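
A minimal sketch of the embedding idea follows. The encoding here is an
assumption chosen for illustration and is not the actual `IntId` layout:
a 32-bit ID holds values in `[-2^30, 2^30)` directly, and IDs below that range
encode an index into side storage. The class name `SmallIntStoreSketch` and
its methods are hypothetical, and deduplication of large values is omitted:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical store: small values live in the ID, large values in a vector.
class SmallIntStoreSketch {
 public:
  static constexpr int64_t MinEmbedded = -(int64_t{1} << 30);
  static constexpr int64_t MaxEmbedded = (int64_t{1} << 30) - 1;

  // Returns an ID for `value`, allocating storage only for large values.
  auto Add(int64_t value) -> int32_t {
    if (value >= MinEmbedded && value <= MaxEmbedded) {
      return static_cast<int32_t>(value);
    }
    large_.push_back(value);
    // Index IDs occupy [INT32_MIN, -2^30), which embedded values never use.
    return static_cast<int32_t>(INT32_MIN +
                                static_cast<int64_t>(large_.size() - 1));
  }

  // True when the ID carries its value directly, with no indirection.
  static auto IsEmbedded(int32_t id) -> bool { return id >= MinEmbedded; }

  auto Get(int32_t id) const -> int64_t {
    if (IsEmbedded(id)) {
      return id;
    }
    return large_[static_cast<std::size_t>(static_cast<int64_t>(id) -
                                           INT32_MIN)];
  }

 private:
  std::vector<int64_t> large_;
};
```

With a scheme like this, common values such as 32 or 64 never touch the side
storage, which is the effect described above for type and array sizes.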

This PR improves lexing by about 5% when there are lots of `i32` types.

---------

Co-authored-by: Dana Jansens <[email protected]>
Co-authored-by: Carbon Infra Bot <[email protected]>
Co-authored-by: Jon Ross-Perkins <[email protected]>
4 people authored Nov 13, 2024
1 parent 39ed62d commit 3ba4997
Showing 23 changed files with 848 additions and 98 deletions.
32 changes: 32 additions & 0 deletions toolchain/base/BUILD
@@ -56,6 +56,7 @@ cc_library(
    hdrs = ["value_ids.h"],
    deps = [
        ":index_base",
        "//common:check",
        "//common:ostream",
        "@llvm-project//llvm:Support",
    ],
@@ -89,10 +90,41 @@ cc_test(
    ],
)

cc_library(
    name = "int_store",
    srcs = ["int_store.cpp"],
    hdrs = ["int_store.h"],
    deps = [
        ":index_base",
        ":mem_usage",
        ":value_store",
        ":yaml",
        "//common:check",
        "//common:hashtable_key_context",
        "//common:ostream",
        "//common:set",
        "@llvm-project//llvm:Support",
    ],
)

cc_test(
    name = "int_store_test",
    size = "small",
    srcs = ["int_store_test.cpp"],
    deps = [
        ":int_store",
        "//testing/base:gtest_main",
        "//testing/base:test_raw_ostream",
        "//toolchain/testing:yaml_test_helpers",
        "@googletest//:gtest",
    ],
)

cc_library(
    name = "shared_value_stores",
    hdrs = ["shared_value_stores.h"],
    deps = [
        ":int_store",
        ":mem_usage",
        ":value_ids",
        ":value_store",
66 changes: 66 additions & 0 deletions toolchain/base/int_store.cpp
@@ -0,0 +1,66 @@
// Part of the Carbon Language project, under the Apache License v2.0 with LLVM
// Exceptions. See /LICENSE for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

#include "toolchain/base/int_store.h"

namespace Carbon {

auto IntStore::CanonicalBitWidth(int significant_bits) -> int {
  // For larger integers, we store them as a signed APInt with a canonical
  // width that is the smallest multiple of the word type's bits, but no
  // smaller than a minimum of 64 bits to avoid spurious resizing of the most
  // common cases (<= 64 bits).
  static constexpr int WordWidth = llvm::APInt::APINT_BITS_PER_WORD;

  return std::max<int>(
      MinAPWidth, ((significant_bits + WordWidth - 1) / WordWidth) * WordWidth);
}

auto IntStore::CanonicalizeSigned(llvm::APInt value) -> llvm::APInt {
  return value.sextOrTrunc(CanonicalBitWidth(value.getSignificantBits()));
}

auto IntStore::CanonicalizeUnsigned(llvm::APInt value) -> llvm::APInt {
  // We need the width to include a zero sign bit as we canonicalize to a
  // signed representation.
  return value.zextOrTrunc(CanonicalBitWidth(value.getActiveBits() + 1));
}

auto IntStore::AddLarge(int64_t value) -> IntId {
  auto ap_id =
      values_.Add(llvm::APInt(CanonicalBitWidth(64), value, /*isSigned=*/true));
  return MakeIndexOrInvalid(ap_id.index);
}

auto IntStore::AddSignedLarge(llvm::APInt value) -> IntId {
  auto ap_id = values_.Add(CanonicalizeSigned(value));
  return MakeIndexOrInvalid(ap_id.index);
}

auto IntStore::AddUnsignedLarge(llvm::APInt value) -> IntId {
  auto ap_id = values_.Add(CanonicalizeUnsigned(value));
  return MakeIndexOrInvalid(ap_id.index);
}

auto IntStore::LookupLarge(int64_t value) const -> IntId {
  auto ap_id = values_.Lookup(
      llvm::APInt(CanonicalBitWidth(64), value, /*isSigned=*/true));
  return MakeIndexOrInvalid(ap_id.index);
}

auto IntStore::LookupSignedLarge(llvm::APInt value) const -> IntId {
  auto ap_id = values_.Lookup(CanonicalizeSigned(value));
  return MakeIndexOrInvalid(ap_id.index);
}

auto IntStore::OutputYaml() const -> Yaml::OutputMapping {
  return values_.OutputYaml();
}

auto IntStore::CollectMemUsage(MemUsage& mem_usage, llvm::StringRef label) const
    -> void {
  mem_usage.Collect(std::string(label), values_);
}

} // namespace Carbon