
Uniquely identify persistent source locations inside macro expansions #444

Open
@mattmccutchen-cci

Description


3C doesn't yet have a way to uniquely, persistently identify a location in the program after macro expansion. Certain parts of 3C's analysis (such as cast insertion) need this in order to identify the same AST node of the macro-expanded program across translation units. Currently, we use a plain file location (represented by our existing PersistentSourceLoc class) in the common case where no macros are involved, and we have various workarounds for macros that break from time to time, e.g., in #439. It would be great to fix this once and for all.

Clang's SourceLocation is an intricate data structure that includes all the information we want about a location within a macro expansion. (It also includes the #include stack, which we want to ignore in at least some scenarios so that a program element in a header file is treated as the same element when that header file is included in several translation units.) But a SourceLocation is specific to one translation unit and is only meaningful while Clang holds that translation unit in memory, and 3C makes multiple passes over the translation units. That's what originally motivated PersistentSourceLoc: we dump the file/line/column out of the SourceLocation ourselves, at least in the common case of no macros.
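To illustrate why a raw SourceLocation can't simply be carried from one pass to the next, here is a toy model in C++ (3C's own language). It is not Clang's API: `ToySourceManager`, `FileLineCol`, and `compareHeaderLoc` are hypothetical names. Each translation unit's toy source manager hands out integer offsets to files in the order it loads them, loosely like Clang's SourceManager does for SourceLocation. The same header position gets different raw values in two translation units, but dumping it to file/line/column gives equal values, which is what PersistentSourceLoc relies on in the no-macro case.

```cpp
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Persistent location: plain values that mean the same thing in every TU
// (hypothetical; modeled loosely on what PersistentSourceLoc stores).
struct FileLineCol {
  std::string File;
  unsigned Line = 0;
  unsigned Col = 0;
  bool operator==(const FileLineCol &O) const {
    return std::tie(File, Line, Col) == std::tie(O.File, O.Line, O.Col);
  }
};

// Hypothetical per-TU source manager: every loaded file gets a contiguous
// range of integer offsets, and a "raw location" is just an offset. Like a
// SourceLocation, a raw offset means nothing outside the TU that made it.
class ToySourceManager {
  struct File {
    std::string Name;
    std::vector<std::string> Lines;
    uint32_t Start; // first offset assigned to this file
  };
  std::vector<File> Files;
  uint32_t Next = 1; // offset 0 is reserved as "invalid"

public:
  void load(const std::string &Name, const std::vector<std::string> &Lines) {
    Files.push_back({Name, Lines, Next});
    for (const auto &L : Lines)
      Next += static_cast<uint32_t>(L.size()) + 1; // +1 for the newline
  }

  // Raw, TU-specific location of 1-based (Line, Col); 0 if not loaded.
  uint32_t rawLoc(const std::string &Name, unsigned Line, unsigned Col) const {
    for (const auto &F : Files) {
      if (F.Name != Name)
        continue;
      uint32_t Off = F.Start;
      for (unsigned I = 0; I + 1 < Line; ++I)
        Off += static_cast<uint32_t>(F.Lines[I].size()) + 1;
      return Off + (Col - 1);
    }
    return 0;
  }

  // "Dump" a raw location into a persistent file/line/column.
  FileLineCol persist(uint32_t Raw) const {
    for (const auto &F : Files) {
      uint32_t Off = F.Start;
      for (unsigned I = 0; I < static_cast<unsigned>(F.Lines.size()); ++I) {
        uint32_t End = Off + static_cast<uint32_t>(F.Lines[I].size()) + 1;
        if (Raw >= Off && Raw < End)
          return {F.Name, I + 1, Raw - Off + 1};
        Off = End;
      }
    }
    return {"<invalid>", 0, 0};
  }
};

struct CrossTuResult {
  uint32_t RawA, RawB;  // raw locations of the same header position
  bool PersistentEqual; // do the dumped file/line/col values match?
};

// The same header is included by two TUs whose main files differ in length.
CrossTuResult compareHeaderLoc() {
  const std::vector<std::string> Header = {"int *p;", "int f(int *q);"};
  ToySourceManager A, B;
  A.load("a.c", {"#include \"h.h\""});
  A.load("h.h", Header);
  B.load("b.c", {"// comment", "#include \"h.h\""});
  B.load("h.h", Header);
  // h.h line 2, column 7 is the "int" in "int *q".
  uint32_t RawA = A.rawLoc("h.h", 2, 7);
  uint32_t RawB = B.rawLoc("h.h", 2, 7);
  return {RawA, RawB, A.persist(RawA) == B.persist(RawB)};
}
```

Here the raw values come out as 30 and 41, while both dump to h.h:2:7. The dump only works this cleanly because no macro is involved: inside an expansion, one file/line/column no longer pins down a single token.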

One potential approach to a solution is to dump all the information we need out of a SourceLocation. In simple cases, this might just be one file/line/column for each frame of the macro call stack. With function-like macros, whose arguments are substituted into the macro body, it could get mind-bending. Clang also has a "scratch space" buffer (where the preprocessor places tokens it creates, e.g., by `##` pasting or `#` stringizing) that seems to be involved here. John has more of the history.
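A minimal sketch of the "one file/line/column per frame" idea for the simple case, in C++. Everything here is hypothetical: `FileLineCol`, `MacroStackLoc`, the macro `PAIR`, and all file names and positions are made up for illustration and are not part of 3C or Clang. The sketch ignores macro arguments, token pasting, and the scratch space, and it leaves out how the frames would actually be extracted from a SourceLocation. It contrasts a flat key that records only the invocation site with a key holding the whole frame stack.

```cpp
#include <cstddef>
#include <initializer_list>
#include <map>
#include <set>
#include <string>
#include <tuple>
#include <vector>

struct FileLineCol {
  std::string File;
  unsigned Line = 0;
  unsigned Col = 0;
  bool operator<(const FileLineCol &O) const {
    return std::tie(File, Line, Col) < std::tie(O.File, O.Line, O.Col);
  }
  bool operator==(const FileLineCol &O) const {
    return std::tie(File, Line, Col) == std::tie(O.File, O.Line, O.Col);
  }
};

// Hypothetical macro-aware persistent location: one frame per level of the
// macro call stack, outermost (the invocation in ordinary source) first,
// innermost (where the token is spelled in a macro body) last. No #include
// stack is recorded, so a header element gets the same key in every TU.
struct MacroStackLoc {
  std::vector<FileLineCol> Frames;
  bool operator<(const MacroStackLoc &O) const { return Frames < O.Frames; }
  bool operator==(const MacroStackLoc &O) const { return Frames == O.Frames; }
};

struct DemoResult {
  std::size_t FlatKeys;      // distinct keys using only the invocation site
  std::size_t StackKeys;     // distinct keys using the whole frame stack
  unsigned HeaderIntCastHits; // times one particular header key was produced
};

DemoResult demo() {
  // Assume m.h line 1 is: #define PAIR (int *)0, (char *)0
  // so the two casts in PAIR's body start at columns 14 and 24.
  const FileLineCol IntCast{"m.h", 1, 14};
  const FileLineCol CharCast{"m.h", 1, 24};
  // Hypothetical invocation sites of PAIR.
  const FileLineCol ASite{"a.c", 3, 5}; // seen only by TU a.c
  const FileLineCol HSite{"h.h", 7, 3}; // h.h is included by a.c and b.c

  std::set<FileLineCol> Flat;
  std::map<MacroStackLoc, unsigned> Hits; // key -> times produced
  // TU a.c sees both invocations; TU b.c sees only the one in h.h.
  const std::vector<std::vector<FileLineCol>> TUs = {{ASite, HSite}, {HSite}};
  for (const auto &Sites : TUs)
    for (const FileLineCol &Site : Sites)
      for (const FileLineCol &Cast : {IntCast, CharCast}) {
        Flat.insert(Site);
        ++Hits[MacroStackLoc{{Site, Cast}}];
      }
  return {Flat.size(), Hits.size(), Hits[MacroStackLoc{{HSite, IntCast}}]};
}
```

With this data the flat key yields only 2 distinct values for the 4 distinct casts (both casts in one expansion collapse onto the invocation site), while the frame-stack key yields 4. The `(int *)` cast expanded from h.h is produced once by each translation unit under an identical key, which is the cross-translation-unit matching cast insertion needs. Because the key is built from plain values with an ordering, it can be used directly as a `std::map` key across passes.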

We could also see if another piece of Clang-based software has an approach to matching macro-expanded AST nodes across translation units based on more sophisticated use of the Clang API. Currently, the only Clang-based tool I'm aware of that does any cross-translation-unit analysis is clangd. It may have at least a partial solution to the problem, though that solution may rely on holding all the translation units in memory for the duration of the analysis that needs to match AST nodes, which we might not want to commit 3C to doing.

3C may still need the concept of a persistent source location before macro expansion for other purposes, such as identifying the location of a rewrite. We can discuss which concept should have which name in 3C.

Metadata


Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
