Since D129389 (and downstream PR https://github.com/apple/llvm-project/pull/4965), the dependency scanner is responsible for generating full command-lines, including the modules paths. This patch removes the flag that was making this an opt-in behavior in clang-scan-deps.
Reviewed By: benlangmuir
Differential Revision: https://reviews.llvm.org/D131420
Sharing the FileManager across implicit module builds currently leaks
paths looked up in an importer into the built module itself. This can
cause non-deterministic results across scans. It is especially bad for
modules since the path can be saved into the pcm file itself, leading to
stateful behaviour if the cache is shared.
This should not impact the number of real filesystem accesses in the
scanner, since it is already caching in the
DependencyScanningWorkerFilesystem.
Note: this change does not affect whether or not the FileManager is
shared across TUs in the scanner, which is a separate issue.
Differential Revision: https://reviews.llvm.org/D131412
Sharing the FileManager between the importer and the module build should
only be an optimization. Add a cc1 option -fno-modules-share-filemanager
to allow us to test this. Fix the path to modulemap files, which
previously depended on the shared FileManager when using path mapped to
an external file in a VFS.
Differential Revision: https://reviews.llvm.org/D131076
The "strict context hash" is insufficient to identify module
dependencies during scanning, leading to different module build commands
being produced for a single module, and non-deterministically choosing
between them. This commit switches to hashing the canonicalized
`CompilerInvocation` of the module. By hashing the invocation we are
converting these from correctness issues to performance issues, and we
can then incrementally improve our ability to canonicalize
command-lines.
This change can cause a regression in the number of modules needed. Of
the 4 projects I tested, 3 had no regression, but 1, which was
clang+llvm itself, had a 66% regression in number of modules (4%
regression in total invocations). This is almost entirely due to
differences between -W options across targets. Of this, 25% of the
additional modules are system modules, which we could avoid if we
canonicalized -W options when -Wsystem-headers is not present --
unfortunately this is non-trivial due to some warnings being enabled in
system headers by default. The rest of the additional modules are mostly
real differences in potential warnings, reflecting incorrect behaviour
in the current scanner.
There were also a couple of differences due to `-DFOO`
`-fmodule-ignore-macro=FOO`, which I fixed here.
Since the output paths for the module depend on its context hash, we
hash the invocation before filling in outputs, and rely on the build
system to always return the same output paths for a given module.
Note: since the scanner itself uses an implicit modules build, there can
still be non-determinism, but it will now present as different
module+hashes rather than different command-lines for the same
module+hash.
Differential Revision: https://reviews.llvm.org/D129884
Follow-up to 6626f6fec3, this fixes the handling of -MT
* If no targets are provided, we need to invent one since cc1 expects
the driver to have handled it. The default is to use -o, quoting as
necessary for a make target.
* Fix the splitting for empty string, which was incorrectly treated as
{""} instead of {}.
* Add a way to test this behaviour in clang-scan-deps.
Differential Revision: https://reviews.llvm.org/D129607
When building modules, override secondary outputs (dependency file,
dependency targets, serialized diagnostic file) in addition to the pcm
file path. This avoids inheriting per-TU command-line options that
cause non-determinism in the results (non-deterministic command-line for
the module build, non-determinism in which TU's .diag and .d files will
contain the module outputs). In clang-scan-deps we infer whether to
generate dependency or serialized diagnostic files based on an original
command-line. In a real build system this should be modeled explicitly.
Differential Revision: https://reviews.llvm.org/D129389
Otherwise a header may be erroneously marked as having a header macro guard and won't get re-included.
Differential Revision: https://reviews.llvm.org/D128772
Dependency scanning does not care about the order of submodules for
correctness, so sort the submodules so that we get the same
command-lines to build the module across different TUs. The order of
inferred submodules can vary depending on the order of #includes in the
including TU.
Differential Revision: https://reviews.llvm.org/D128008
Disable or canonicalize compiler options that are not relevant in
explicit module builds, similar to what we already did for the modules
cache path. This reduces uninteresting differences between
command-lines, which is particularly useful if there is a tool that can
cache the compilations.
Differential Revision: https://reviews.llvm.org/D127883
The command-line arguments for module builds are cc1 commands, so they
do not implicitly set -disable-free like a driver invocation, and
Tooling will disable it for the scanning instance itself. Set
-disable-free explicitly so that separate invocations for building
modules will not pay for freeing memory unnecessarily.
Differential Revision: https://reviews.llvm.org/D127229
This is first of a series of patches for making the special lexing for dependency scanning a first-class feature of the `Preprocessor` and `Lexer`.
This patch only includes NFC renaming changes to make reviewing of the functionality changing parts easier.
Differential Revision: https://reviews.llvm.org/D125484
This is to improve maintenance a bit and remove need to maintain the additional option and related code-paths.
Differential Revision: https://reviews.llvm.org/D124558
The dependency scanner can reuse single FileManager instance across multiple translation units. This may lead to non-deterministic output depending on which TU gets processed first.
One of the problems is that Clang uses DirectoryEntry::getName in the header search algorithm. This function returns the path that was first used to construct the (shared) entry in FileManager. Using DirectoryEntryRef::getName instead preserves the case as it was spelled out for the current "get directory entry" request.
rdar://90647508
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D123229
In D92191, a bunch of test cases were added to check `clang-scan-deps` works in `clang-cl` mode as well.
We don't need to duplicate all test cases, though. Testing the few special cases we have in `clang-scan-deps` for `clang-cl` should be good enough:
1. Deducing output path (and therefore target name in our make output).
2. Ignoring `-Xclang` arguments in step 1.
3. Deducing resource directory by invoking the compiler executuable.
This test de-duplicates the extra clang-cl test cases.
Reviewed By: dexonsmith, saudi
Differential Revision: https://reviews.llvm.org/D121812
This patch gets rid of the ridiculous relative path we use to invoke the `module-deps-to-rsp.py` script and creates proper lit substitution, cleaning up the tests.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D121525
The code for traversing precompiled dependencies is somewhat complicated and contains a dangling iterator bug.
This patch simplifies the code and fixes the bug.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D121533
When pruning header search paths (to reduce the number of modules we need to build explicitly), we can't prune the search paths used in (transitive) dependencies of a module. Otherwise, we could end up with either of the following dependency graphs:
```
X:<hash1> -> Y:<hash2>
X:<hash1> -> Y:<hash3>
```
depending on the search paths of the translation unit we discovered `X` and `Y` from.
This patch fixes that.
Depends on D121295.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D121303
This patch follows the same reasoning as D114481. The PCH reader looks for `__clangast` section in the precompiled module file, which is not present in the file on AIX and not supported in XCOFF yet.
Reviewed By: daltenty
Differential Revision: https://reviews.llvm.org/D121709
Instead of outputting the test directory into the JSON result file, parsing it with `FileCheck` and then potentially stripping it, simply use `FileCheck`'s `-D` option.
Note that we use `%/t` instead of `%t` in order to normalize to forward slashes on Windows, which matches what we do with `sed 's:\\\\\?:/:g'`.
Differential Revision: https://reviews.llvm.org/D121516
With explicit modules build, the '-fmodules-cache-path=' argument is unused.
This patch removes the argument to avoid warnings or errors (with '-Werror') stemming from that.
Depends on D118915.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D120474
The `clang-scan-deps` tool currently generates `-fmodule-file=` command-line arguments for the whole transitive closure of modular dependencies. This is not necessary, we only need to provide the direct dependencies on the command line. Information about transitive dependencies is stored within the `.pcm` files of direct dependencies. This makes the command lines shorter, but should be a NFC otherwise (unless there are bugs in the loading mechanism for explicit modules).
Depends on D120465.
Reviewed By: Bigcheese
Differential Revision: https://reviews.llvm.org/D118915
Since D113473, we don't report any module map files via `-fmodule-map-file=` in explicit builds. The ultimate goal here is to make sure Clang doesn't open/read/parse/evaluate unnecessary module maps.
However, implicit module maps still end up reading all reachable module maps. This patch disables implicit module maps in explicit builds.
Unfortunately, we still need to report some module map files that aren't encoded in PCM files of dependencies: module maps that are necessary to correctly evaluate includes in modules marked as `[no_undeclared_includes]`.
Depends on D120464.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D120465
The dependency scanner already generates canonical -cc1 command lines that can be used to compile discovered modular dependencies.
For translation unit command lines, the scanner only generates additional driver arguments the build system is expected to append to the original command line.
While this works most of the time, there are situations where that's not the case. For example with `-Wunused-command-line-argument`, Clang will complain about the `-fmodules-cache-path=` argument that's not being used in explicit modular builds. Combine that with `-Werror` and the build outright fails.
To prevent such failures, this patch changes the dependency scanner to return the full driver command line to compile the original translation unit. This gives us more opportunities to massage the arguments into something reasonable.
Reviewed By: Bigcheese
Differential Revision: https://reviews.llvm.org/D118986
Same as D114481, the PCH reader looks for a `__clangast` section in the precompiled module file, which isn't present on AIX, and we don't support writing this custom section in XCOFF yet.
Reviewed By: Jake-Egan, daltenty, sfertile
Differential Revision: https://reviews.llvm.org/D118477
When `-fno-integrated-as` is in effect (the default on AIX) the cc1 job produces a `.s` file instead. This patch adapts the test to accept `.s` or `.o` files.
Reviewed By: jansvoboda11
Differential Revision: https://reviews.llvm.org/D118152
The minimizing and caching filesystem used by the dependency scanner can be configured to **not** minimize some files. That's necessary when scanning a TU with prebuilt inputs (i.e. PCH) that refer to the original (non-minimized) files. Minimizing such files in the dependency scanner would cause discrepancy between the current perceived state of the filesystem and the file sizes stored in the AST file. By not minimizing such files, we avoid creating the discrepancy.
The problem with the current approach is that files that should not be minimized are identified by their path. This breaks down when the prebuilt input (PCH) and the current TU refer to the same file via different paths (i.e. symlinks). This patch switches from paths to `llvm::sys::fs::UniqueID` when identifying ignored files. This is consistent with how the rest of Clang treats files.
Depends on D114966.
Reviewed By: dexonsmith, arphaman
Differential Revision: https://reviews.llvm.org/D114971
The modified tests fail because 64-bit XCOFF object files are not currently supported on AIX. This patch disables these tests on 64-bit AIX for now.
This patch is similar to D111887 except the failures on this patch are on a 64-bit build.
Reviewed By: shchenz, #powerpc
Differential Revision: https://reviews.llvm.org/D113049
This reverts commits:
- 04192422c4.
- 015e08c6ba
D114206 was landed before it was approved - and was landed knowing that
the test crashed on windows, without an xfail. The promised follow-up
commit with fixes has not appeared since it was promised on December 14th.
This fixes the test introduced in D114206 so it no longer writes to the current working directory.
Reviewed By: simon_tatham
Differential Revision: https://reviews.llvm.org/D116611
Make clang-scan-deps use the virtual path for module maps instead of the on disk
path. This is needed so that modulemap relative lookups are done correctly in
the actual module builds. The file dependencies still use the on disk path as
that's what matters for build invalidation.
Differential Revision: https://reviews.llvm.org/D114206
Dependency scanner test for resource directory deduction doesn't account for LLVM builds with custom `CLANG_RESOURCE_DIR`.
This patch ensures we don't hardcode the default behavior into the test and take into account the actual value. This is done by running `%clang -print-resource-dir` and using that as the expected value in test assertions.
New comment also clarifies this is different from running that command as part of the dependency scan.
Reviewed By: mgorny
Differential Revision: https://reviews.llvm.org/D115628
Some command-line codegen arguments are likely to differ between identical modules discovered from different translation units. This patch removes them to make builds deterministic and/or reduce the number of built modules.
Reviewed By: Bigcheese
Differential Revision: https://reviews.llvm.org/D112923
The PCH reader looks for `__clangast` section in the precompiled module file, which is not present in the file on AIX, and we don't support writing this custom section in XCOFF yet.
Reviewed By: daltenty
Differential Revision: https://reviews.llvm.org/D114481
During explicit modules build, when all modules are provided via `-fmodule-file=<path>` and implicit modules and implicit module maps are disabled (`-fno-implicit-modules`, `-fno-implicit-module-maps`), we don't need to load the original module map files at all. This patch stops emitting the `-fmodule-map-file=` arguments we don't need, saving some compilation time due to avoiding parsing such module maps and making the command line shorter.
Reviewed By: bnbarham
Differential Revision: https://reviews.llvm.org/D113473
The #pragma directives push_macro/pop_macro and include_alias may influence the #include / import directives encountered by dependency scanning tools like clang-scan-deps.
This patch ensures that those directives are not removed during source code minimization.
Differential Revision: https://reviews.llvm.org/D112088
The `clang-scan-deps` CLI tool invokes the compiler with `-print-resource-dir` in case the `-resource-dir` argument is missing from the compilation command line. This is to enable running the tool on compilation databases that use compiler from a different toolchain than `clang-scan-deps` itself. While this doesn't make sense when scanning modular builds (due to the `-cc1` arguments the tool generates), the tool can can be used to efficiently scan for file dependencies of non-modular builds too.
This patch stops deducing the resource directory by invoking the compiler by default. This mode can still be enabled by invoking `clang-scan-deps` with `--resource-dir-recipe invoke-compiler`. The new default is `--resource-dir-recipe modify-compiler-path` which relies on the resource directory deduction taking place in `Driver::Driver` which is based on the compiler path. This makes the default more aligned with the intended usage of the tool while still allowing it to serve other use-cases.
Note that this functionality was also influenced by D108979, where the dependency scanner stopped going through `ClangTool::run`. The function tried to deduce the resource directory based on the current executable path, which might not be what the users expect when invoked from within a shared library.
Depends on D108979.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D108366
One of main goals of the dependency scanner is to be strict about module compatibility. This is achieved through strict context hash. This patch ensures that strict context hash is enabled not only during the scan itself (and its minimized implicit build), but also when actually reporting the dependency.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D111720
During explicit modular build, PCM files are typically specified via the `-fmodule-file=<path>` command-line option. Early during the compilation, Clang uses the `ASTReader` to read their contents and caches the result so that the module isn't loaded implicitly later on. A listener is attached to the `ASTReader` to collect names of the modules read from the PCM files. However, if the PCM has already been loaded previously via PCH:
1. the `ASTReader` doesn't do anything for the second time,
2. the listener is not invoked at all,
3. the module load result is not cached,
4. the compilation fails when attempting to load the module implicitly later on.
This patch solves this problem by attaching the listener to the `ASTReader` for PCH reading as well.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D111560
To reduce the number of explicit builds of a single module, we can try to squash multiple occurrences of the module with different command-lines (and context hashes) by removing benign command-line options. The greatest contributors to benign differences between command-lines are the header search paths.
In this patch, the lookup cache in `HeaderSearch` is used to identify paths that were actually used when implicitly building the module during scanning. This information is serialized into the unhashed control block of the implicitly-built PCM. The dependency scanner then loads this and may use it to prune the header search paths before computing the context hash of the module and generating the command-line.
We could also prune the header search paths when serializing `HeaderSearchOptions` into the PCM. That way, we could do it only once instead of every load of the PCM file by dependency scanner. However, that would result in a PCM file whose contents don't produce the same context hash as the original build, which is probably highly surprising.
There is an alternative approach to storing extra information into the PCM: wire up preprocessor callbacks to capture the used header search paths on-the-fly during preprocessing of modularized headers (similar to what we currently do for the main source file and textual headers). Right now, that's not compatible with the fact that we do an actual implicit build producing PCM files during dependency scanning. The second run of dependency scanner loads the PCM from the first run, skipping the preprocessing altogether, which would result in different results between runs. We can revisit this approach when we stop building implicitly during dependency scanning.
Depends on D102923.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D102488