We're starting to use clang-tidy
to automatically enforce some of the LLVM
style guidelines in our code. The main configuration is in
clang/lib/3C/.clang-tidy
. There are a few existing warnings that we haven't
yet decided what to do about; please try to avoid adding more though.
We're currently using version 11.0.0 due to bugs in older versions, so if you
want to verify that our code is compliant with clang-tidy
, you'll need to get
a copy of that version. Now that this repository is based on a pre-release of
LLVM 11, clang-tidy
built from it is probably OK. But you probably want a
release build of clang-tidy
rather than a debug build (a release build is
significantly faster), so if your main build from this repository is debug, it
might be easier to get clang-tidy
another way.
This file maintains a list of issues we've run into that you should be aware of
when using clang-tidy
. Many of these could in principle be researched and (if
appropriate) reported upstream if it becomes a priority for us. Some issues lead
to bogus warnings we need to suppress, and those suppressions refer to the
relevant sections in this file.
We use 3C
in the names of some program elements. Since an identifier cannot
begin with a digit, for names that would begin with 3C
, we decided that the
least evil is to prepend an underscore so the name begins with _3C
. (A leading
underscore sometimes suggests that a program element is "private" or "special",
which is not the case here; this may be misleading.) One alternative we
considered was to use ThreeC
at the beginning of a name, but that would be
inconsistent and would make it more work to search for all occurrences of either
3C
or ThreeC
. And we thought the alternative of using ThreeC
everywhere
was just too clunky.
While the LLVM naming
guideline
does not mention leading underscores, its current implementation in the
readability-identifier-naming
check disallows them, and this is clearly
intentional. There's no indication that the guideline has contemplated our
scenario, and we don't know whether the community would allow our scenario if
they knew about it. We raised the question in a related bug
report but would probably need to
raise it again in a more appropriate forum to get an answer. For now, we're
keeping the _3C
prefix and suppressing the warnings, aware that we might have
to change our naming if our code is ever upstreamed to LLVM.
The readability-identifier-naming
check may flag a name N
that we can't
change because it is referenced by a template we don't control. Clearly, there's
nothing we can do but suppress the warning.
There are various ways the check could in principle be enhanced to avoid this
problem. For example, the check could expand all templates to find any
references, though then it's unclear where the name should be flagged if both
the definition and the template are under the user's control. Ideally, the
template would require a C++20
"concept" and the
definition of N
would state that it implements the concept (there's not yet a
standard way to do this, though there are some popular
hacks), so all
Clang-based tools (not just the readability-identifier-naming
check) would
treat the N
in both the definition and the template as references to the N
in the concept. However, all of these potential solutions seem far off from
implementation.
The readability-identifier-naming
check does have an exemption when N
is a
method and a superclass also has a method named N
, even if our N
is not a
virtual override. This exemption is designed for scenarios where the superclass
provides a default implementation of the method, as RecursiveASTVisitor
does.
However, it does not apply to other templates such as GraphWriter
.
Some of our names have "runs" (our term for an undocumented concept in the
readability-identifier-naming
check) of two or more uppercase letters, which
typically represent initialisms; an example is TRV
in DeclWriter.cpp
. (An
uppercase letter followed by a lowercase letter or a digit is considered part of
the next word, not the run; for example, in getRCVars
in
3CInteractiveData.cpp
, the run is RC
.) The readability-identifier-naming
check allows such runs, but if a name fails the check due to any other problem,
then the proposed fix "normalizes" the run by lowercasing all letters after the
first (in addition to fixing the other problem, whatever it was). For example,
getRCVars
passes the check for a function name, but GetRCVars
fails because
the first letter is uppercase and gets changed to getRcVars
.
Normalizing well-known initialisms can improve readability (e.g., ID
-> Id
),
while for certain other runs, there are good arguments against normalization
(for example, getRCVars
should remain consistent with getSrcCVars
).
Unfortunately, between some names with runs never getting flagged and other runs
getting normalized without us paying too much attention, our code is currently a
mishmash. We're making a mild effort not to make it any worse. If a run gets
normalized and you think it shouldn't be, you're welcome to change it back. It
won't by itself cause the name to be flagged, so you won't need to add a
suppression.
-
Since many of the edits made by
clang-tidy --fix
can affect formatting in one way or another (e.g., changing the length of anything can affect line wrapping or the column at which other syntax should be aligned), it's generally a good idea to runclang-format
afterclang-tidy --fix
. -
If two warnings have textually overlapping fixes,
clang-tidy --fix
will skip applying one of them with the message, "note: this fix will not be applied because it overlaps with another fix".llvm-else-after-return
is one of the worst offenders because it is implemented naively by deleting the entireelse
block and reinserting the content of the block, rather than just deleting theelse
,{
, and}
tokens (as applicable). Thus, for example, if an element flagged byreadability-identifier-naming
has even a single reference inside anelse
block being unwrapped byllvm-else-after-return
, the element may not get renamed at all. Therefore, it may make sense to runclang-tidy --fix --checks='-*,llvm-else-after-return'
as many times as needed (see the next point) before running the full set of checks. -
llvm-else-after-return
often needs multiple passes to fix everything. We've seen several ways this can happen:-
Since
llvm-else-after-return
turns an if-else statement into two statements (theif
branch followed by the unwrappedelse
branch), it doesn't trigger in a context that accepts only one statement, such as theelse
branch of an outer if-else statement without braces. If the outerelse
branch gets unwrapped byllvm-else-after-return
, it may then become possible to unwrap the inner one. In a long if-else chain in which each block returns, it will take one pass to remove eachelse
. -
In a scenario like the previous one, if the outer
else
branch has braces, the innerelse
branch will be flagged as ready for unwrapping but still may not be unwrapped in the same pass due to overlap, as previously described. -
Unwrapping an
else
branch may leave an unconditional return in an outerif
branch that then makes it possible to unwrap the outerelse
branch.
-
-
We've seen many cases in which
readability-identifier-naming
fails to update some references to an element that it renamed. There may be several causes for this; we haven't found a comprehensive list on the web, so it's hard to know what the problem is in any given case.The biggest cause seems to be elements that are defined in shared header files and referenced in multiple translation units. Our understanding is that
clang-tidy
processes each translation unit independently, so thereadability-identifer-naming
check generates a warning at the definition and a fix to rename all the references in that translation unit. Whenclang-tidy
aggregates its warnings across translation units, it deduplicates them by location and retains the fix from an arbitrary one of the duplicate warnings rather than merging the fixes. The result is that only the references to the element in an arbitrary one of the translation units that includes its definition get renamed: a fundamental design flaw.The upshot is that you're probably better off excluding
readability-identifier-naming
from yourclang-tidy --fix
pass, running a second pass without--fix
, and fixing thereadability-identifier-naming
warnings with a tool that handles multiple translation units properly (such as aclangd
-based IDE) along with the warnings that don't come with fixes. -
readability-identifier-naming
may introduce a name collision. This may lead to an error on one of the colliding definitions, an error on a reference that now resolves to an element that doesn't make sense in context, or possibly just incorrect runtime behavior. -
llvm-else-after-return
edits the text in a naive way that may leave undesired patterns of whitespace.clang-format
fixes many of these cases but sometimes leaves formatting that is valid but not what we want. We've seen blank lines added and a comment from anelse
block moved to the same line as the closing brace of theif
.