Skip to content

Latest commit

 

History

History
167 lines (142 loc) · 9.83 KB

clang-tidy.md

File metadata and controls

167 lines (142 loc) · 9.83 KB

Use of clang-tidy on 3C

We're starting to use clang-tidy to automatically enforce some of the LLVM style guidelines in our code. The main configuration is in clang/lib/3C/.clang-tidy. There are a few existing warnings that we haven't yet decided what to do about; please try to avoid adding more though.

We're currently using version 11.0.0 due to bugs in older versions, so if you want to verify that our code is compliant with clang-tidy, you'll need to get a copy of that version. Now that this repository is based on a pre-release of LLVM 11, clang-tidy built from it is probably OK. But you probably want a release build of clang-tidy rather than a debug build (a release build is significantly faster), so if your main build from this repository is debug, it might be easier to get clang-tidy another way.

This file maintains a list of issues we've run into that you should be aware of when using clang-tidy. Many of these could in principle be researched and (if appropriate) reported upstream if it becomes a priority for us. Some issues lead to bogus warnings we need to suppress, and those suppressions refer to the relevant sections in this file.

_3C name prefix

We use 3C in the names of some program elements. Since an identifier cannot begin with a digit, for names that would begin with 3C, we decided that the least evil is to prepend an underscore so the name begins with _3C. (A leading underscore sometimes suggests that a program element is "private" or "special", which is not the case here; this may be misleading.) One alternative we considered was to use ThreeC at the beginning of a name, but that would be inconsistent and would make it more work to search for all occurrences of either 3C or ThreeC. And we thought the alternative of using ThreeC everywhere was just too clunky.

While the LLVM naming guideline does not mention leading underscores, its current implementation in the readability-identifier-naming check disallows them, and this is clearly intentional. There's no indication that the guideline has contemplated our scenario, and we don't know whether the community would allow our scenario if they knew about it. We raised the question in a related bug report but would probably need to raise it again in a more appropriate forum to get an answer. For now, we're keeping the _3C prefix and suppressing the warnings, aware that we might have to change our naming if our code is ever upstreamed to LLVM.

Names referenced by templates

The readability-identifier-naming check may flag a name N that we can't change because it is referenced by a template we don't control. Clearly, there's nothing we can do but suppress the warning.

There are various ways the check could in principle be enhanced to avoid this problem. For example, the check could expand all templates to find any references, though then it's unclear where the name should be flagged if both the definition and the template are under the user's control. Ideally, the template would require a C++20 "concept" and the definition of N would state that it implements the concept (there's not yet a standard way to do this, though there are some popular hacks), so all Clang-based tools (not just the readability-identifier-naming check) would treat the N in both the definition and the template as references to the N in the concept. However, all of these potential solutions seem far off from implementation.

The readability-identifier-naming check does have an exemption when N is a method and a superclass also has a method named N, even if our N is not a virtual override. This exemption is designed for scenarios where the superclass provides a default implementation of the method, as RecursiveASTVisitor does. However, it does not apply to other templates such as GraphWriter.

Runs of uppercase letters in names

Some of our names have "runs" (our term for an undocumented concept in the readability-identifier-naming check) of two or more uppercase letters, which typically represent initialisms; an example is TRV in DeclWriter.cpp. (An uppercase letter followed by a lowercase letter or a digit is considered part of the next word, not the run; for example, in getRCVars in 3CInteractiveData.cpp, the run is RC.) The readability-identifier-naming check allows such runs, but if a name fails the check due to any other problem, then the proposed fix "normalizes" the run by lowercasing all letters after the first (in addition to fixing the other problem, whatever it was). For example, getRCVars passes the check for a function name, but GetRCVars fails because the first letter is uppercase and gets changed to getRcVars.

Normalizing well-known initialisms can improve readability (e.g., ID -> Id), while for certain other runs, there are good arguments against normalization (for example, getRCVars should remain consistent with getSrcCVars). Unfortunately, between some names with runs never getting flagged and other runs getting normalized without us paying too much attention, our code is currently a mishmash. We're making a mild effort not to make it any worse. If a run gets normalized and you think it shouldn't be, you're welcome to change it back. It won't by itself cause the name to be flagged, so you won't need to add a suppression.

Things to know about clang-tidy --fix

  • Since many of the edits made by clang-tidy --fix can affect formatting in one way or another (e.g., changing the length of anything can affect line wrapping or the column at which other syntax should be aligned), it's generally a good idea to run clang-format after clang-tidy --fix.

  • If two warnings have textually overlapping fixes, clang-tidy --fix will skip applying one of them with the message, "note: this fix will not be applied because it overlaps with another fix". llvm-else-after-return is one of the worst offenders because it is implemented naively by deleting the entire else block and reinserting the content of the block, rather than just deleting the else, {, and } tokens (as applicable). Thus, for example, if an element flagged by readability-identifier-naming has even a single reference inside an else block being unwrapped by llvm-else-after-return, the element may not get renamed at all. Therefore, it may make sense to run clang-tidy --fix --checks='-*,llvm-else-after-return' as many times as needed (see the next point) before running the full set of checks.

  • llvm-else-after-return often needs multiple passes to fix everything. We've seen several ways this can happen:

    • Since llvm-else-after-return turns an if-else statement into two statements (the if branch followed by the unwrapped else branch), it doesn't trigger in a context that accepts only one statement, such as the else branch of an outer if-else statement without braces. If the outer else branch gets unwrapped by llvm-else-after-return, it may then become possible to unwrap the inner one. In a long if-else chain in which each block returns, it will take one pass to remove each else.

    • In a scenario like the previous one, if the outer else branch has braces, the inner else branch will be flagged as ready for unwrapping but still may not be unwrapped in the same pass due to overlap, as previously described.

    • Unwrapping an else branch may leave an unconditional return in an outer if branch that then makes it possible to unwrap the outer else branch.

  • We've seen many cases in which readability-identifier-naming fails to update some references to an element that it renamed. There may be several causes for this; we haven't found a comprehensive list on the web, so it's hard to know what the problem is in any given case.

    The biggest cause seems to be elements that are defined in shared header files and referenced in multiple translation units. Our understanding is that clang-tidy processes each translation unit independently, so the readability-identifer-naming check generates a warning at the definition and a fix to rename all the references in that translation unit. When clang-tidy aggregates its warnings across translation units, it deduplicates them by location and retains the fix from an arbitrary one of the duplicate warnings rather than merging the fixes. The result is that only the references to the element in an arbitrary one of the translation units that includes its definition get renamed: a fundamental design flaw.

    The upshot is that you're probably better off excluding readability-identifier-naming from your clang-tidy --fix pass, running a second pass without --fix, and fixing the readability-identifier-naming warnings with a tool that handles multiple translation units properly (such as a clangd-based IDE) along with the warnings that don't come with fixes.

  • readability-identifier-naming may introduce a name collision. This may lead to an error on one of the colliding definitions, an error on a reference that now resolves to an element that doesn't make sense in context, or possibly just incorrect runtime behavior.

  • llvm-else-after-return edits the text in a naive way that may leave undesired patterns of whitespace. clang-format fixes many of these cases but sometimes leaves formatting that is valid but not what we want. We've seen blank lines added and a comment from an else block moved to the same line as the closing brace of the if.