Description
3C doesn't yet have a way to uniquely and persistently identify a location in the program after macro expansion. Certain parts of 3C's analysis (such as cast insertion) need this in order to identify the same AST node of the macro-expanded program across translation units. Currently, we use a plain file location (represented by our existing PersistentSourceLoc class) in the common case where no macros are involved, and we have various workarounds for macros that break from time to time, e.g., in #439. It would be great to fix this once and for all.
Clang's SourceLocation is an intricate data structure that includes all the information we want about a location within a macro expansion. (It also includes the #include stack, which we want to ignore in at least some scenarios in order to treat a program element in a header file as the same element when that header file is included in several translation units.) But a SourceLocation is specific to a translation unit and is valid only while Clang has that translation unit loaded in memory, and 3C makes multiple passes over the translation units. That's what motivated us to introduce PersistentSourceLoc in the first place: it dumps the file/line/column out of the SourceLocation ourselves, at least in the common case of no macros.
One potential approach to a solution is to dump all the information we need out of a SourceLocation. In simple cases, this might just be one file/line/column for each frame of the macro call stack. With macros that take parameters, it could get mind-bending. Clang has a "scratch space" that seems to be involved here. John has more of the history.
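To make the "one file/line/column per frame" idea concrete, here is a minimal sketch of what such a dumped location might look like. Everything in it (FilePos, MacroAwareLoc, toKey) is a hypothetical name made up for illustration, not part of 3C or Clang. It only covers the simple case: it deliberately omits the #include stack, and it does not attempt to handle macro arguments or the scratch space. In a real implementation the frames would presumably be filled in from Clang's SourceManager (for example via its spelling and expansion location queries), which is not shown here.

```cpp
#include <string>
#include <tuple>
#include <vector>

// Hypothetical: a single file/line/column triple, i.e. roughly what
// PersistentSourceLoc records today in the no-macro case.
struct FilePos {
  std::string File;
  unsigned Line;
  unsigned Col;

  bool operator==(const FilePos &O) const {
    return std::tie(File, Line, Col) == std::tie(O.File, O.Line, O.Col);
  }
  bool operator<(const FilePos &O) const {
    return std::tie(File, Line, Col) < std::tie(O.File, O.Line, O.Col);
  }
};

// Hypothetical macro-aware location. Frames[0] is where the token is
// spelled (for example, inside a macro definition); each later frame is the
// site where the enclosing macro was expanded, ending with the outermost
// expansion site. A location with no macros involved has exactly one frame.
// The #include stack is intentionally not recorded, so that an element in a
// shared header gets the same value in every translation unit.
struct MacroAwareLoc {
  std::vector<FilePos> Frames;

  bool operator==(const MacroAwareLoc &O) const { return Frames == O.Frames; }
  // Lexicographic order over frames, so the type can be used as a map key.
  bool operator<(const MacroAwareLoc &O) const { return Frames < O.Frames; }
};

// Flatten a location into a string that could be stored between passes.
// The " <- " separator is an arbitrary choice for this sketch.
std::string toKey(const MacroAwareLoc &L) {
  std::string Key;
  for (const FilePos &F : L.Frames) {
    if (!Key.empty())
      Key += " <- ";
    Key += F.File + ":" + std::to_string(F.Line) + ":" + std::to_string(F.Col);
  }
  return Key;
}
```

With this shape, two translation units that expand the same macro at the same place in a shared header would produce equal MacroAwareLoc values, while two expansions of the same macro at different sites would differ in their last frame. Whether a flat list of frames is enough once macro parameters are involved is exactly the open question above.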
We could also see if another piece of Clang-based software has an approach to matching macro-expanded AST nodes across translation units based on more sophisticated use of the Clang API. Currently, the only Clang-based tool I'm aware of that does any cross-translation-unit analysis is clangd. It may have at least a partial solution to the problem, though that solution may rely on holding all the translation units in memory for the duration of the analysis that needs to match AST nodes, which we might not want to commit 3C to doing.
3C may still need the concept of a persistent source location before macro expansion for other purposes, such as identifying the location of a rewrite. We can discuss which concept should have which name in 3C.