Skip to content

[C++20][Modules] The search complexity for header file info is too expensive with a lot of modules. #140860

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
mpark opened this issue May 21, 2025 · 4 comments
Labels
clang:modules C++20 modules and Clang Header Modules slow-compile

Comments

@mpark
Copy link
Member

mpark commented May 21, 2025

Currently, for each header file #include or import, there is a look-up of this header file on each of the loaded PCMs. This results in a search complexity of O(N*M) where N is the # of includes, and M is the # of loaded PCMs. For large numbers, e.g. N = 10,000 and M = 1,000, this becomes very costly.

The synthetic test case I put together takes 14s user time, compared to 0.06s without modules.
In a real world scenario, the difference for us was 18s with modules and 3s without modules.

This problem was encountered previously inside ASTWriter (see #87848). It was solved there by only considering the local information rather than making the query out to ASTReader. I'm not sure that such a solution can be used in general. (CC @jansvoboda11)

Test Case

I wrote a bash script to generate the test case. It produces 1,000 empty header units, and 10,000 empty header files. The main file imports the 1,000 header units, then #includes 10,000 header files. We can measure the preprocessing time on this file. On my machine, this takes 14s, compared to 0.06s without modules.

The script for the synthetic repro:

#/usr/bin/env bash

CLANG=${CLANG:-clang++}
NUM_MODULES=${NUM_MODULES:-1000}
NUM_HEADERS=${NUM_HEADERS:-10000}

mkdir build && cd build

for i in $(seq 1 $NUM_MODULES); do > m${i}.h; done
seq 1 $NUM_MODULES | xargs -I {} -P $(nproc) bash -c "$CLANG -std=c++20 -fmodule-header m{}.h"

for i in $(seq 1 $NUM_HEADERS); do > h${i}.h; done

> a.cpp
for i in $(seq 1 $NUM_MODULES); do echo "import \"m${i}.h\";" >> a.cpp; done
for i in $(seq 1 $NUM_HEADERS); do echo "#include \"h${i}.h\"" >> a.cpp; done
echo "int main() {}" >> a.cpp

time $CLANG -std=c++20 -Wno-experimental-header-units -E a.cpp -o /dev/null \
    $(for i in $(seq 1 $NUM_MODULES); do echo -n "-fmodule-file=m${i}.pcm "; done)

Note: I am aware that -fmodule-file=<pcm> eagerly loads modules and that it's not encouraged. In reality we actually use the -fmodule-file=<name>=<pcm> form with header units.

Code Path

We more or less start here:

  // Do a standard file entry lookup.
  OptionalFileEntryRef FE = HeaderInfo.LookupFile(...);

and proceed through the following:

Here we encounter this:

ModuleMap::KnownHeader
HeaderSearch::findModuleForHeader(FileEntryRef File, bool AllowTextual,
                                  bool AllowExcluded) const {
  if (ExternalSource) {
    // Make sure the external source has handled header info about this file,
    // which includes whether the file is part of a module.
    (void)getExistingFileInfo(File);
  }
  return ModMap.findModuleForHeader(File, AllowTextual, AllowExcluded);
}

where there is an invocation to (void)getExistingFileInfo(File);, which calls ExternalSource->GetHeaderFileInfo(FE);.

We finally arrive at:

HeaderFileInfo ASTReader::GetHeaderFileInfo(FileEntryRef FE) {
  HeaderFileInfoVisitor Visitor(FE);
  ModuleMgr.visit(Visitor);
  if (std::optional<HeaderFileInfo> HFI = Visitor.getHeaderFileInfo())
      return *HFI;

  return HeaderFileInfo();
}

where ModuleMgr.visit(Visitor); will visit every currently loaded PCM.


Every PCM file contains a HEADER_SEARCH_TABLE record, which is an on-disk hash table. For header units, it has at least one entry in that table which is the header file itself. For named modules, we shouldn't need this, but it's still present today (see #78495). This means that every header unit has a HEADER_SEARCH_TABLE with at least one entry in it.

@frederick-vs-ja frederick-vs-ja added clang:modules C++20 modules and Clang Header Modules slow-compile and removed new issue labels May 21, 2025
@llvmbot
Copy link
Member

llvmbot commented May 21, 2025

@llvm/issue-subscribers-clang-modules

Author: Michael Park (mpark)

Currently, for each header file `#include` or `import`, there is a look-up of this header file on each of the loaded PCMs. This results in a search complexity of `O(N*M)` where `N` is the # of includes, and `M` is the # of loaded PCMs. For large numbers, e.g. `N` = 10,000 and `M` = 1,000, this becomes very costly.

This problem was encountered previously inside ASTWriter (see #87848). It was solved there by only considering the local information rather than making the query out to ASTReader. I'm not sure that such a solution can be used in general. (CC @jansvoboda11)

Test Case

I wrote a bash script to generate the test case. It produces 1,000 empty header units, and 10,000 empty header files. The main file imports the 1,000 header units, then #includes 10,000 header files. We can measure the preprocessing time on this file. On my machine, it takes about 14s user time, compared to 0.06s without modules. In a real world scenario, the difference for us was 18s with modules and 3s without modules.

The script for the synthetic repro:

#/usr/bin/env bash

CLANG=${CLANG:-clang++}
NUM_MODULES=${NUM_MODULES:-1000}
NUM_HEADERS=${NUM_HEADERS:-10000}

mkdir build &amp;&amp; cd build

for i in $(seq 1 $NUM_MODULES); do &gt; m${i}.h; done
seq 1 $NUM_MODULES | xargs -I {} -P $(nproc) bash -c "$CLANG -std=c++20 -fmodule-header m{}.h"

for i in $(seq 1 $NUM_HEADERS); do &gt; h${i}.h; done

&gt; a.cpp
for i in $(seq 1 $NUM_MODULES); do echo "import \"m${i}.h\";" &gt;&gt; a.cpp; done
for i in $(seq 1 $NUM_HEADERS); do echo "#include \"h${i}.h\"" &gt;&gt; a.cpp; done
echo "int main() {}" &gt;&gt; a.cpp

time $CLANG -std=c++20 -Wno-experimental-header-units -E a.cpp -o /dev/null \
    $(for i in $(seq 1 $NUM_MODULES); do echo -n "-fmodule-file=m${i}.pcm "; done)

> Note: I am aware that -fmodule-file=&lt;pcm&gt; eagerly loads modules and that it's not encouraged. In reality we actually use the -fmodule-file=&lt;name&gt;=&lt;pcm&gt; form with header units.

Code Path

We more or less start here:

  // Do a standard file entry lookup.
  OptionalFileEntryRef FE = HeaderInfo.LookupFile(...);

and proceed through the following:

Here we encounter this:

ModuleMap::KnownHeader
HeaderSearch::findModuleForHeader(FileEntryRef File, bool AllowTextual,
                                  bool AllowExcluded) const {
  if (ExternalSource) {
    // Make sure the external source has handled header info about this file,
    // which includes whether the file is part of a module.
    (void)getExistingFileInfo(File);
  }
  return ModMap.findModuleForHeader(File, AllowTextual, AllowExcluded);
}

where there is an invocation to (void)getExistingFileInfo(File);, which calls ExternalSource-&gt;GetHeaderFileInfo(FE);.

We finally arrive at:

HeaderFileInfo ASTReader::GetHeaderFileInfo(FileEntryRef FE) {
  HeaderFileInfoVisitor Visitor(FE);
  ModuleMgr.visit(Visitor);
  if (std::optional&lt;HeaderFileInfo&gt; HFI = Visitor.getHeaderFileInfo())
      return *HFI;

  return HeaderFileInfo();
}

where ModuleMgr.visit(Visitor); will visit every currently loaded PCM.


Every PCM file contains a HEADER_SEARCH_TABLE record, which is an on-disk hash table. For header units, it has at least one entry in that table which is the header file itself. For named modules, we shouldn't need this, but it's still present today (see #78495). This means that every header unit has a HEADER_SEARCH_TABLE with at least one entry in it.

@ChuanqiXu9
Copy link
Member

BTW, for this,

For named modules, we shouldn't need this, but it's still present today (see #78495).

I remember I tried to do some of such optimizations in the downstream. But I didn't find any significant impacts, so I stopped. It is welcomed to optimize this if any one want to give it a try.

@Bigcheese
Copy link
Contributor

Every PCM file contains a HEADER_SEARCH_TABLE record, which is an on-disk hash table. For header units, it has at least one entry in that table which is the header file itself. For named modules, we shouldn't need this, but it's still present today (see #78495).

I'm pretty sure we do still need it. I would expect:

//--- A.h
#pragma once
#define FOO

//--- B.h
#include "A.h"

//--- C.cpp
import "B.h";
#include "A.h" // this #include should get skipped.

where A.h is textual and B.h is a header unit.

The actual issue with #78495 is that Clang does not do visibility tracking of HeaderFileInfo which causes other problems with submodules too. I think the right fix for this part of it is to fix this part of Clang. Named modules should always block HeaderFileInfo propagation because they don't export macros, but header units should always propagate it. Basically they should follow the exact same rules as the other preprocessor state does.

Of course this doesn't fix the performance issue. There I have a preference for making chained hash tables work.

@mpark
Copy link
Member Author

mpark commented May 21, 2025

There I have a preference for making chained hash tables work.

This means using MultiOnDiskHashTable, is that right? I did try this (mentioned in #140867) and I'm willing to make that work, but it'll be more involved. Specifically because HeaderFileInfoTrait isn't readily usable with it, and because I think we want to store FileEntryRef instead of the "internal key". I think all of this isn't too hard, like, MultiOnDiskHashTable could use Info::InMemoryKey as the MergedTable's key, and the traits can decide which key they want to be used as the in-memory key or something. If that's the direction we want to head, I'm happy to update #140867 with that implementation. The current approach I think is no-go anyway based on @jansvoboda11's report.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:modules C++20 modules and Clang Header Modules slow-compile
Projects
None yet
Development

No branches or pull requests

5 participants