[WIP][C++20][Modules] Lazily, but fully load 'HeaderFileInfo' table into memory. #140867

mpark · 2025-05-21T08:47:07Z

This is a WIP prototype in trying to solve #140860.

The approach explored here is to fully load the on-disk hash tables in each of the PCMs into memory, so that we perform N (# of includes) hash table look-ups, rather than N (# of includes) * M (# of loaded modules) look-ups.

The behavior is still somewhat lazy, where the loading doesn't happen until the first GetHeaderFileInfo request. We keep track of the modules that have already been processed, so that only newly added modules get added to the in-memory hash table.

The following test case takes 14s before this change, and 0.18s with the change.

#/usr/bin/env bash

CLANG=${CLANG:-clang++}
NUM_MODULES=${NUM_MODULES:-1000}
NUM_HEADERS=${NUM_HEADERS:-10000}

mkdir build && cd build

for i in $(seq 1 $NUM_MODULES); do > m${i}.h; done
seq 1 $NUM_MODULES | xargs -I {} -P $(nproc) bash -c "$CLANG -std=c++20 -fmodule-header m{}.h"

for i in $(seq 1 $NUM_HEADERS); do > h${i}.h; done

> a.cpp
for i in $(seq 1 $NUM_MODULES); do echo "import \"m${i}.h\";" >> a.cpp; done
for i in $(seq 1 $NUM_HEADERS); do echo "#include \"h${i}.h\"" >> a.cpp; done
echo "int main() {}" >> a.cpp

time $CLANG -std=c++20 -Wno-experimental-header-units -E a.cpp -o /dev/null \
    $(for i in $(seq 1 $NUM_MODULES); do echo -n "-fmodule-file=m${i}.pcm "; done)

Theoretically, this would do more work in scenarios where there are only a few modules, each with large header search tables. I tested this out with another script. This time, it produces 10 header units, each with 10,000 header includes in each of them. The main file imports the 10 header units. This loads 100,010 entries into the in-memory hash table, but still only takes 0.15s. However, I'm not sure whether this would be acceptable, and if not, what might be a better strategy here.

#/usr/bin/env bash

CLANG=${CLANG:-clang++}
NUM_MODULES=${NUM_MODULES:-10}
NUM_HEADERS=${NUM_HEADERS:-10000}

mkdir build && cd build

for i in $(seq 1 $NUM_MODULES)
do
    > m${i}.h
    for j in $(seq 1 $NUM_HEADERS)
    do
        > h${i}-${j}.h
        echo "#include \"h${i}-${j}.h\"" >> m${i}.h
    done
done
seq 1 $NUM_MODULES | xargs -I {} -P $(nproc) bash -c "$CLANG -std=c++20 -fmodule-header m{}.h"

> a.cpp
for i in $(seq 1 $NUM_MODULES); do echo "import \"m${i}.h\";" >> a.cpp; done

time $CLANG -std=c++20 -Wno-experimental-header-units -E a.cpp -o /dev/null \
    $(for i in $(seq 1 $NUM_MODULES); do echo -n "-fmodule-file=m${i}.pcm "; done)

Other Considerations

I've also tried doing the preloading at load time, the way we do for source locations and interesting identifiers in ASTReader::ReadAST. This works just as well, it's slightly more eager than the approach taken here, but the effect is similar.
I tried using MultiOnDiskHashTable to pull together all of the hash tables, since it would handle merging and loading into memory for performance reasons. It turns out to be kind of a pain to use with HeaderFileInfoTrait. More importantly, we can't really afford to key by the internal key type since that only hashes on the size of the file and the hash collision gets pretty bad. Moreover, the merging and condensing strategy seems rather simple.
Another thing we could do is to introduce a command-line flag to opt into this behavior. e.g. -fmodules-eager-load-header-search-table

llvmbot · 2025-05-21T08:47:41Z

@llvm/pr-subscribers-clang-modules

@llvm/pr-subscribers-clang

Author: Michael Park (mpark)

Changes

This is a WIP prototype in trying to solve #140860.

The approach explored here is to fully load the on-disk hash tables in each of the PCMs into memory, so that we perform N (# of includes) hash table look-ups, rather than N (# of includes) * M (# of loaded modules) look-ups.

The behavior is still somewhat lazy, where the loading doesn't happen until the first GetHeaderFileInfo request. We keep track of the modules that have already been processed, so that only newly added modules get added to the in-memory hash table.

The following test case takes 14s before this change, and 0.18s with the change.

#/usr/bin/env bash

CLANG=${CLANG:-clang++}
NUM_MODULES=${NUM_MODULES:-1000}
NUM_HEADERS=${NUM_HEADERS:-10000}

mkdir build &amp;&amp; cd build

for i in $(seq 1 $NUM_MODULES); do &gt; m${i}.h; done
seq 1 $NUM_MODULES | xargs -I {} -P $(nproc) bash -c "$CLANG -std=c++20 -fmodule-header m{}.h"

for i in $(seq 1 $NUM_HEADERS); do &gt; h${i}.h; done

&gt; a.cpp
for i in $(seq 1 $NUM_MODULES); do echo "import \"m${i}.h\";" &gt;&gt; a.cpp; done
for i in $(seq 1 $NUM_HEADERS); do echo "#include \"h${i}.h\"" &gt;&gt; a.cpp; done
echo "int main() {}" &gt;&gt; a.cpp

time $CLANG -std=c++20 -Wno-experimental-header-units -E a.cpp -o /dev/null \
    $(for i in $(seq 1 $NUM_MODULES); do echo -n "-fmodule-file=m${i}.pcm "; done)

Theoretically, this would do more work in scenarios where there are only a few modules, each with large header search tables. I tested this out with another script. This time, it produces 10 header units, each with 10,000 header includes in each of them. The main file imports the 10 header units. This loads 100,010 entries into the in-memory hash table, but still only takes 0.15s. However, I'm not sure whether this would be acceptable, and if not, what might be a better strategy here.

#/usr/bin/env bash

CLANG=${CLANG:-clang++}
NUM_MODULES=${NUM_MODULES:-10}
NUM_HEADERS=${NUM_HEADERS:-10000}

mkdir build &amp;&amp; cd build

for i in $(seq 1 $NUM_MODULES)
do
    &gt; m${i}.h
    for j in $(seq 1 $NUM_HEADERS)
    do
        &gt; h${i}-${j}.h
        echo "#include \"h${i}-${j}.h\"" &gt;&gt; m${i}.h
    done
done
seq 1 $NUM_MODULES | xargs -I {} -P $(nproc) bash -c "$CLANG -std=c++20 -fmodule-header m{}.h"

&gt; a.cpp
for i in $(seq 1 $NUM_MODULES); do echo "import \"m${i}.h\";" &gt;&gt; a.cpp; done

time $CLANG -std=c++20 -Wno-experimental-header-units -E a.cpp -o /dev/null \
    $(for i in $(seq 1 $NUM_MODULES); do echo -n "-fmodule-file=m${i}.pcm "; done)

Other Considerations

I've also tried doing the preloading at load time, the way we do for source locations and interesting identifiers in ASTReader::ReadAST. This works just as well, it's slightly more eager than the approach taken here, but the effect is similar.
I tried using MultiOnDiskHashTable to pull together all of the hash tables, since it would handle merging and loading into memory for performance reasons. It turns out to be kind of a pain to use with HeaderFileInfoTrait. More importantly, we can't really afford to key by the internal key type since that only hashes on the size of the file and the hash collision gets pretty bad. Moreover, the merging and condensing strategy seems rather simple.

Full diff: https://github.com/llvm/llvm-project/pull/140867.diff

3 Files Affected:

(modified) clang/include/clang/Serialization/ASTReader.h (+4)
(modified) clang/lib/Serialization/ASTReader.cpp (+48-31)
(modified) clang/lib/Serialization/ASTReaderInternals.h (+1-2)

diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h
index 57b0266af26bb..9b63ad1a7cde7 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -654,6 +654,10 @@ class ASTReader
   /// Map from the TU to its lexical contents from each module file.
   std::vector<std::pair<ModuleFile*, LexicalContents>> TULexicalDecls;
 
+  unsigned HeaderFileInfoIdx = 0;
+
+  llvm::DenseMap<FileEntryRef, HeaderFileInfo> HeaderFileInfoLookup;
+
   /// Map from a DeclContext to its lookup tables.
   llvm::DenseMap<const DeclContext *,
                  serialization::reader::DeclContextLookupTable> Lookups;
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index d068f5e163176..15ea6b25d1d92 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -4155,6 +4155,7 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
       if (Record[0]) {
         F.HeaderFileInfoTable = HeaderFileInfoLookupTable::Create(
             (const unsigned char *)F.HeaderFileInfoTableData + Record[0],
+            (const unsigned char *)F.HeaderFileInfoTableData + sizeof(uint32_t),
             (const unsigned char *)F.HeaderFileInfoTableData,
             HeaderFileInfoTrait(*this, F));
 
@@ -6831,43 +6832,59 @@ std::optional<bool> ASTReader::isPreprocessedEntityInFileID(unsigned Index,
     return false;
 }
 
-namespace {
-
-  /// Visitor used to search for information about a header file.
-  class HeaderFileInfoVisitor {
-  FileEntryRef FE;
-    std::optional<HeaderFileInfo> HFI;
-
-  public:
-    explicit HeaderFileInfoVisitor(FileEntryRef FE) : FE(FE) {}
-
-    bool operator()(ModuleFile &M) {
-      HeaderFileInfoLookupTable *Table
-        = static_cast<HeaderFileInfoLookupTable *>(M.HeaderFileInfoTable);
-      if (!Table)
-        return false;
+static void mergeHeaderFileInfoModuleBits(HeaderFileInfo &HFI,
+                                          bool isModuleHeader,
+                                          bool isTextualModuleHeader) {
+  HFI.isModuleHeader |= isModuleHeader;
+  if (HFI.isModuleHeader)
+    HFI.isTextualModuleHeader = false;
+  else
+    HFI.isTextualModuleHeader |= isTextualModuleHeader;
+}
 
-      // Look in the on-disk hash table for an entry for this file name.
-      HeaderFileInfoLookupTable::iterator Pos = Table->find(FE);
-      if (Pos == Table->end())
-        return false;
+/// Merge the header file info provided by \p OtherHFI into the current
+/// header file info (\p HFI)
+static void mergeHeaderFileInfo(HeaderFileInfo &HFI,
+                                const HeaderFileInfo &OtherHFI) {
+  assert(OtherHFI.External && "expected to merge external HFI");
 
-      HFI = *Pos;
-      return true;
-    }
+  HFI.isImport |= OtherHFI.isImport;
+  HFI.isPragmaOnce |= OtherHFI.isPragmaOnce;
+  mergeHeaderFileInfoModuleBits(HFI, OtherHFI.isModuleHeader,
+                                OtherHFI.isTextualModuleHeader);
 
-    std::optional<HeaderFileInfo> getHeaderFileInfo() const { return HFI; }
-  };
+  if (!HFI.LazyControllingMacro.isValid())
+    HFI.LazyControllingMacro = OtherHFI.LazyControllingMacro;
 
-} // namespace
+  HFI.DirInfo = OtherHFI.DirInfo;
+  HFI.External = (!HFI.IsValid || HFI.External);
+  HFI.IsValid = true;
+}
 
 HeaderFileInfo ASTReader::GetHeaderFileInfo(FileEntryRef FE) {
-  HeaderFileInfoVisitor Visitor(FE);
-  ModuleMgr.visit(Visitor);
-  if (std::optional<HeaderFileInfo> HFI = Visitor.getHeaderFileInfo())
-      return *HFI;
-
-  return HeaderFileInfo();
+  for (auto Iter = ModuleMgr.begin() + HeaderFileInfoIdx, End = ModuleMgr.end();
+       Iter != End; ++Iter) {
+    if (auto *Table =
+            static_cast<HeaderFileInfoLookupTable *>(Iter->HeaderFileInfoTable)) {
+      auto& Info = Table->getInfoObj();
+      for (auto Iter = Table->data_begin(), End = Table->data_end();
+           Iter != End; ++Iter) {
+        const auto *Item = Iter.getItem();
+        // Determine the length of the key and the data.
+        const auto& [KeyLen, DataLen] = HeaderFileInfoTrait::ReadKeyDataLength(Item);
+        // Read the key.
+        const auto &Key = Info.ReadKey(Item, KeyLen);
+        if (auto EKey = Info.getFile(Key)) {
+          auto Data = Info.ReadData(Key, Item + KeyLen, DataLen);
+          auto [Iter, Inserted] = HeaderFileInfoLookup.try_emplace(*EKey, Data);
+          if (!Inserted)
+            mergeHeaderFileInfo(Iter->second, Data);
+        }
+      }
+    }
+  }
+  HeaderFileInfoIdx = ModuleMgr.size();
+  return HeaderFileInfoLookup.lookup(FE);
 }
 
 void ASTReader::ReadPragmaDiagnosticMappings(DiagnosticsEngine &Diag) {
diff --git a/clang/lib/Serialization/ASTReaderInternals.h b/clang/lib/Serialization/ASTReaderInternals.h
index 4a7794889b039..d78d9fee6042d 100644
--- a/clang/lib/Serialization/ASTReaderInternals.h
+++ b/clang/lib/Serialization/ASTReaderInternals.h
@@ -402,13 +402,12 @@ class HeaderFileInfoTrait {
 
   data_type ReadData(internal_key_ref,const unsigned char *d, unsigned DataLen);
 
-private:
   OptionalFileEntryRef getFile(const internal_key_type &Key);
 };
 
 /// The on-disk hash table used for known header files.
 using HeaderFileInfoLookupTable =
-    llvm::OnDiskChainedHashTable<HeaderFileInfoTrait>;
+    llvm::OnDiskIterableChainedHashTable<HeaderFileInfoTrait>;
 
 } // namespace reader

github-actions · 2025-05-21T08:49:22Z

✅ With the latest revision this PR passed the C/C++ code formatter.

jansvoboda11 · 2025-05-21T19:50:37Z

FWIW I tried running this patch through my clang-scan-deps benchmark on an internal Apple project and saw 16% increase in instruction count.

mpark · 2025-05-21T20:40:37Z

FWIW I tried running this patch through my clang-scan-deps benchmark on an internal Apple project and saw 16% increase in instruction count.

Ah, I see. Thank you for trying it out!

jansvoboda11 · 2025-05-21T22:32:04Z

I didn't take a profile, but clang-scan-deps uses a caching VFS, so I don't think the eager creation of FileEntryRef objects is causing the slowdown. I'm assuming it's actually the deserialization itself that's the issue. Conceptually, MultiOnDiskHashTable makes the most sense, but I can see how the hash collisions degrade the performance.

mpark · 2025-05-22T00:44:08Z

Gotcha. I'm planning to try an approach where we use MultiOnDiskHashTable but with a modification such that it uses Info::InMemoryKey as the key for the MergedTable. That way we can keep the existing traits keying by internal key type but HeaderInfo Trait can key by the external key type (FileEntryRef). Maybe we set the MaxTables to... I don't know. Some number that seems like a reasonable threshold to be considered "coarse-ish modules" category. 64? 128?

llvmbot added clang Clang issues not falling into any other category clang:modules C++20 modules and Clang Header Modules labels May 21, 2025

mpark marked this pull request as draft May 21, 2025 08:48

mpark requested review from jansvoboda11, ChuanqiXu9 and Bigcheese May 21, 2025 08:49

mpark force-pushed the full-hfi-load branch from eed4b12 to 1013ab2 Compare May 21, 2025 08:55

Lazily, but fully load 'HeaderFileInfo' table into memory.

a82059f

mpark force-pushed the full-hfi-load branch from 1013ab2 to a82059f Compare May 21, 2025 09:02

mpark mentioned this pull request May 21, 2025

[C++20][Modules] The search complexity for header file info is too expensive with a lot of modules. #140860

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP][C++20][Modules] Lazily, but fully load 'HeaderFileInfo' table into memory. #140867

[WIP][C++20][Modules] Lazily, but fully load 'HeaderFileInfo' table into memory. #140867

mpark commented May 21, 2025 •

edited

Loading

llvmbot commented May 21, 2025 •

edited

Loading

Other Considerations

github-actions bot commented May 21, 2025 •

edited

Loading

jansvoboda11 commented May 21, 2025

mpark commented May 21, 2025

jansvoboda11 commented May 21, 2025

mpark commented May 22, 2025

[WIP][C++20][Modules] Lazily, but fully load 'HeaderFileInfo' table into memory. #140867

Are you sure you want to change the base?

[WIP][C++20][Modules] Lazily, but fully load 'HeaderFileInfo' table into memory. #140867

Conversation

mpark commented May 21, 2025 • edited Loading

Other Considerations

llvmbot commented May 21, 2025 • edited Loading

Other Considerations

github-actions bot commented May 21, 2025 • edited Loading

jansvoboda11 commented May 21, 2025

mpark commented May 21, 2025

jansvoboda11 commented May 21, 2025

mpark commented May 22, 2025

mpark commented May 21, 2025 •

edited

Loading

llvmbot commented May 21, 2025 •

edited

Loading

github-actions bot commented May 21, 2025 •

edited

Loading