2610 Commits

Author SHA1 Message Date
Iain Sandoe a2dd6130d4 [clang][Modules] Fix a regression in handling missing framework headers.
The commit of af2d11b1d5 missed a case where
the value of a suggested module needed to be reset to nullptr.  Fixed thus
and added a testcase to cover the circumstance.
2022-08-19 09:13:22 +01:00
Nico Weber aacf1a9742 Revert "[clang] adds unary type transformations as compiler built-ins"
This reverts commit bc60cf2368.
Doesn't build on Windows and breaks gcc 9 build, see
https://reviews.llvm.org/D116203#3722094 and
https://reviews.llvm.org/D116203#3722128

Also revert two follow-ups. One fixed a warning added in
bc60cf2368, the other
makes use of the feature added in bc60cf2368
in libc++:

Revert "[libcxx][NFC] utilises compiler builtins for unary transform type-traits"
This reverts commit 06a1d917ef.

Revert "[Sema] Fix a warning"
This reverts commit c85abbe879.
2022-08-14 15:58:21 -04:00
Christopher Di Bella bc60cf2368 [clang] adds unary type transformations as compiler built-ins
Adds

* `__add_lvalue_reference`
* `__add_pointer`
* `__add_rvalue_reference`
* `__decay`
* `__make_signed`
* `__make_unsigned`
* `__remove_all_extents`
* `__remove_extent`
* `__remove_const`
* `__remove_volatile`
* `__remove_cv`
* `__remove_pointer`
* `__remove_reference`
* `__remove_cvref`

These are all compiler built-in equivalents of the unary type traits
found in [[meta.trans]][1]. The compiler already has all of the
information it needs to answer these transformations, so we can skip
needing to make partial specialisations in standard library
implementations (we already do this for a lot of the query traits). This
will hopefully improve compile times, as we won't need to use as much
memory in such a base part of the standard library.

[1]: http://wg21.link/meta.trans
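
As a rough illustration of what these transformations compute, the sketch below uses the standard `<type_traits>` aliases rather than the built-ins themselves, since availability of the built-in spellings depends on the compiler version. The alias names (`NoCv`, `Decayed`, and so on) are made up for this example.

```cpp
#include <type_traits>

// Each alias uses a standard [meta.trans] trait; the built-ins listed above
// (e.g. `__remove_cv`, `__decay`) are intended to yield the same types.
using NoCv     = std::remove_cv_t<const volatile int>;   // int
using NoRef    = std::remove_reference_t<int&&>;         // int
using Pointer  = std::add_pointer_t<int&>;               // int*
using Decayed  = std::decay_t<const char (&)[4]>;        // const char*
using Unsigned = std::make_unsigned_t<long>;             // unsigned long
using Element  = std::remove_all_extents_t<int[2][3]>;   // int

static_assert(std::is_same<NoCv, int>::value, "cv-qualifiers removed");
static_assert(std::is_same<NoRef, int>::value, "reference removed");
static_assert(std::is_same<Pointer, int*>::value, "pointer to referred type");
static_assert(std::is_same<Decayed, const char*>::value, "array decays");
static_assert(std::is_same<Unsigned, unsigned long>::value, "unsigned twin");
static_assert(std::is_same<Element, int>::value, "all extents removed");
```

A library would typically implement each of these with partial specialisations; the point of the built-ins is to let the compiler answer directly.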

Co-authored-by: zoecarver

Reviewed By: aaron.ballman, rsmith

Differential Revision: https://reviews.llvm.org/D116203
2022-08-14 17:12:15 +00:00
Fangrui Song 32197830ef [clang][clang-tools-extra] LLVM_NODISCARD => [[nodiscard]]. NFC 2022-08-09 07:11:18 +00:00
Fangrui Song 3f18f7c007 [clang] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D131346
2022-08-08 09:12:46 -07:00
Jun Zhang 786b503f66 [Clang][Lex] Extend HeaderSearch::LookupFile to control OpenFile behavior.
In the case of static compilation the file system is pretty much read-only
and taking a snapshot of it usually is sufficient. In the interactive C++
case the compilation is longer and people can create and include files, etc.
In that case we often do not want to open files or cache failures unless it is
absolutely necessary.

This patch extends the original API call by forwarding some optional flags,
so we can continue to use it in the previous way with no breakage.
Signed-off-by: Jun Zhang <jun@junz.org>

Differential Revision: https://reviews.llvm.org/D131241
2022-08-06 11:36:02 +08:00
Gabriel Ravier 5674a3c880 Fixed a number of typos
I went over the output of the following mess of a command:

(ulimit -m 2000000; ulimit -v 2000000; git ls-files -z |
 parallel --xargs -0 cat | aspell list --mode=none --ignore-case |
 grep -E '^[A-Za-z][a-z]*$' | sort | uniq -c | sort -n |
 grep -vE '.{25}' | aspell pipe -W3 | grep : | cut -d' ' -f2 | less)

and proceeded to spend a few days looking at it to find probable typos
and fixed a few hundred of them in all of the llvm project (note, the
ones I found are not anywhere near all of them, but it seems like a
good start).

Differential Revision: https://reviews.llvm.org/D130827
2022-08-01 13:13:18 -04:00
Corentin Jabot ad16268f13 [Clang] Do not check for underscores in isAllowedInitiallyIDChar
isAllowedInitiallyIDChar is only used with non-ASCII codepoints,
which are handled by isAsciiIdentifierStart.
To make that clearer, remove the check for _ from
isAllowedInitiallyIDChar, and assert on ASCII - to ensure neither
_ nor $ is passed to this function.

Reviewed By: tahonermann, aaron.ballman

Differential Revision: https://reviews.llvm.org/D130750
2022-07-29 17:46:38 +02:00
Corentin Jabot 559f07b872 [Clang] Adjust extension warnings for #warning
The #warning directive is standard in C++2b and C2x;
this adjusts the pedantic and extension warnings accordingly.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D130415
2022-07-23 14:10:11 +02:00
Corentin Jabot aee76cb59c [Clang] Add support for Unicode identifiers (UAX31) in C2x mode.
This implements
N2836 Identifier Syntax using Unicode Standard Annex 31.

The feature was already implemented for C++,
and the semantics are the same.

Unlike C++ there was, afaict, no decision to
backport the feature to older language modes,
so C17 and earlier are not modified and the
code point tables for these language modes are preserved.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D130416
2022-07-23 14:08:08 +02:00
Kazu Hirata 70257fab68 Use any_of (NFC) 2022-07-22 01:05:17 -07:00
Volodymyr Sapsai 381fcaa136 [modules] Replace `-Wauto-import` with `-Rmodule-include-translation`.
Diagnostic for `-Wauto-import` shouldn't be a warning because it doesn't
represent a potential problem in code that should be fixed. And the
emitted fix-it is likely to trigger `-Watimport-in-framework-header`
which makes it challenging to have a warning-free codebase. But it is
still useful to see how include directives are translated into modular
imports and which module a header belongs to, which is why we keep it as a remark.

Keep `-Wauto-import` for now to allow a gradual migration for codebases
using `-Wno-auto-import`, e.g., `-Weverything -Wno-auto-import`.

rdar://79594287

Differential Revision: https://reviews.llvm.org/D130138
2022-07-21 17:42:04 -07:00
Kazu Hirata cb2c8f694d [clang] Use value instead of getValue (NFC) 2022-07-13 23:39:33 -07:00
Corentin Jabot 6882ca9aff [Clang] Adjust extension warnings for delimited sequences
WG21 approved delimited escape sequences and named escape
sequences.
Adjust the extension warnings accordingly, and update
the release notes.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D129664
2022-07-14 07:50:58 +02:00
Corentin Jabot d4892a168f [Clang] Add a warning on invalid UTF-8 in comments.
Introduce an off-by-default `-Winvalid-utf8` warning
that detects invalid UTF-8 code unit sequences in comments.

Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.

The warning is off by default as it's likely to be somewhat disruptive
otherwise.

This warning allows clang to conform to the yet-to-be-approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.

Reviewed By: aaron.ballman, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D128059
2022-07-13 10:19:26 +02:00
Jonas Devlieghere a262f4dbd7 Revert "[Clang] Add a warning on invalid UTF-8 in comments."
This reverts commit cc309721d2 because it
breaks the following tests on GreenDragon:

  TestDataFormatterObjCCF.py
  TestDataFormatterObjCExpr.py
  TestDataFormatterObjCKVO.py
  TestDataFormatterObjCNSBundle.py
  TestDataFormatterObjCNSData.py
  TestDataFormatterObjCNSError.py
  TestDataFormatterObjCNSNumber.py
  TestDataFormatterObjCNSURL.py
  TestDataFormatterObjCPlain.py
  TestDataFormatterObjNSException.py

https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/45288/
2022-07-12 15:22:29 -07:00
Corentin Jabot cc309721d2 [Clang] Add a warning on invalid UTF-8 in comments.
Introduce an off-by-default `-Winvalid-utf8` warning
that detects invalid UTF-8 code unit sequences in comments.

Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.

The warning is off by default as it's likely to be somewhat disruptive
otherwise.

This warning allows clang to conform to the yet-to-be-approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.

Reviewed By: aaron.ballman, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D128059
2022-07-12 14:34:30 +02:00
Iain Sandoe af2d11b1d5 [C++20][Modules] Implement include translation.
This addresses [cpp.include]/7

(when encountering #include header-name)

If the header identified by the header-name denotes an importable header, it
is implementation-defined whether the #include preprocessing directive is
instead replaced by an import directive.

In this implementation, include translation is performed _only_ for headers
in the Global Module fragment, so:
```
module;
 #include "will-be-translated.h" // IFF the header unit is available.

export module M;
 #include "will-not-be-translated.h" // even if the header unit is available
```
The reasoning is that, in general, includes in the module purview would not
be validly translatable (they would have to immediately follow the module
decl and without any other intervening decls).  Otherwise that would violate
the rules on contiguous import directives.

This would be quite complex to track in the preprocessor, and for relatively
little gain (the user can 'import "will-not-be-translated.h";' instead.)

TODO: This is one area where it becomes increasingly difficult to disambiguate
clang modules in C++ from C++ standard modules.  That needs to be addressed in
both the driver and the FE.

Differential Revision: https://reviews.llvm.org/D128981
2022-07-10 11:06:51 +01:00
Corentin Jabot 50416e5454 Revert "[Clang] Add a warning on invalid UTF-8 in comments."
It is probable that this change crashes on the powerpc bots.

This reverts commit 355532a149.
2022-07-09 17:18:35 +02:00
Corentin Jabot 355532a149 [Clang] Add a warning on invalid UTF-8 in comments.
Introduce an off-by-default `-Winvalid-utf8` warning
that detects invalid UTF-8 code unit sequences in comments.

Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.

The warning is off by default as it's likely to be somewhat disruptive
otherwise.

This warning allows clang to conform to the yet-to-be-approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.

Reviewed By: aaron.ballman, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D128059
2022-07-09 11:26:45 +02:00
Nico Weber e9fe20dab3 Revert "[Clang] Add a warning on invalid UTF-8 in comments."
This reverts commit 4174f0ca61.

Also revert follow-up "[Clang] Fix invalid utf-8 detection"
This reverts commit bf45e27a67.

The second commit broke tests, see comments on
https://reviews.llvm.org/D129223, and it sounds like the first
commit isn't valid without the second one. So reverting both for now.
2022-07-06 22:51:52 +02:00
Corentin Jabot 4174f0ca61 [Clang] Add a warning on invalid UTF-8 in comments.
Introduce an off-by-default `-Winvalid-utf8` warning
that detects invalid UTF-8 code unit sequences in comments.

Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.

The warning is off by default as it's likely to be somewhat disruptive
otherwise.

This warning allows clang to conform to the yet-to-be-approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.

Reviewed By: aaron.ballman, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D128059
2022-07-06 21:18:29 +02:00
Corentin Jabot fb06dd3e8c Revert "[Clang] Add a warning on invalid UTF-8 in comments."
Reverting while I investigate build failures

This reverts commit e3dc56805f.
2022-07-06 19:45:12 +02:00
Corentin Jabot e3dc56805f [Clang] Add a warning on invalid UTF-8 in comments.
Introduce an off-by-default `-Winvalid-utf8` warning
that detects invalid UTF-8 code unit sequences in comments.

Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.

The warning is off by default as it's likely to be somewhat disruptive
otherwise.

This warning allows clang to conform to the yet-to-be-approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.

Reviewed By: aaron.ballman, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D128059
2022-07-06 17:59:44 +02:00
Argyrios Kyrtzidis 0d3a2b4c66 [Lex] Introduce `PPCallbacks::LexedFileChanged()` preprocessor callback
This is a preprocessor callback focused on the lexed file changing, without conflating effects of line number directives and other pragmas.
A client that only cares about what files the lexer processes, like dependency generation, can use this more straightforward
callback instead of `PPCallbacks::FileChanged()`. Clients that want the pragma directive effects as well can keep using `FileChanged()`.

A use case where `PPCallbacks::LexedFileChanged()` is particularly simpler to use than `FileChanged()` is in a situation
where a client wants to keep track of lexed file changes that include changes from/to the predefines buffer, where it becomes
unnecessarily complicated trying to use `FileChanged()` while filtering out the callbacks for pragma directive effects.

Also take the opportunity to provide information about the prior `FileID` the `Lexer` moved from, even when entering a new file.

Differential Revision: https://reviews.llvm.org/D128947
2022-07-01 14:22:31 -07:00
Argyrios Kyrtzidis c68b8c84eb [Lex] Make sure to notify `MultipleIncludeOpt` for "read tokens" during fast dependency directive lexing
Otherwise a header may be erroneously marked as having a header macro guard and won't get re-included.

Differential Revision: https://reviews.llvm.org/D128772
2022-06-29 15:50:16 -07:00
Egor Zhdan 5f2cf3a21f [Clang][Preprocessor] Fix inconsistent `FLT_EVAL_METHOD` when compiling vs preprocessing
When running `clang -E -Ofast` on macOS, the `__FLT_EVAL_METHOD__` macro is `0`, which causes the following typedef to be emitted into the preprocessed source: `typedef float float_t`.

However, when running `clang -c -Ofast`, `__FLT_EVAL_METHOD__` is `-1`, and `typedef long double float_t` is emitted.

This causes build errors for certain projects, which are not reproducible when compiling from preprocessed source.

The issue is that `__FLT_EVAL_METHOD__` is configured in `Sema::Sema` which is not executed when running in `-E` mode.

This change moves that logic into the preprocessor initialization method, which is invoked correctly in `-E` mode.

rdar://96134605
rdar://92748429

Differential Revision: https://reviews.llvm.org/D128814
2022-06-29 19:36:22 +01:00
Corentin Jabot a9a60f20e6 [Clang] Rename StringLiteral::isAscii() => isOrdinary() [NFC]
"Ascii" StringLiteral instances are actually narrow strings
that are UTF-8 encoded and do not have an encoding prefix.
(UTF8 StringLiteral are also UTF-8 encoded strings, but with
the u8 prefix.)

To avoid possible confusion both with actual ASCII strings,
and with future work extending the set of literal encodings
supported by clang, this renames StringLiteral::isAscii() to
isOrdinary(), matching C++ standard terminology.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D128762
2022-06-29 18:28:51 +02:00
Kazu Hirata ca05cc2064 [clang] Don't use Optional::hasValue (NFC)
This patch replaces x.hasValue() with x where x is contextually
convertible to bool.
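
A minimal sketch of the pattern, using `std::optional` as a stand-in for `llvm::Optional` (an assumption made so the example is self-contained; the function name `describe` is hypothetical):

```cpp
#include <optional>
#include <string>

// std::optional stands in for llvm::Optional in this illustration.
std::string describe(const std::optional<int>& x) {
  // Before the cleanup this would have read: if (x.has_value())
  if (x)  // the optional is contextually converted to bool
    return "value " + std::to_string(*x);
  return "empty";
}
```

Calling `describe(7)` takes the first branch, while `describe(std::nullopt)` takes the second.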
2022-06-26 18:51:54 -07:00
Kazu Hirata 97afce08cb [clang] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
2022-06-25 22:26:24 -07:00
Kazu Hirata 3b7c3a654c Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3.
2022-06-25 11:56:50 -07:00
Kazu Hirata aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
Corentin Jabot c92056d038 [Clang][C++23] P2071 Named universal character escapes
Implements [[ https://wg21.link/p2071r1 | P2071 Named Universal Character Escapes ]] as an extension in all language modes. A patch to stop warning in C++23 mode will follow once this paper is plenary approved (in July).

We add

 * A code generator that transforms `UnicodeData.txt` and `NameAliases.txt` to a space efficient data structure that can be queried in `O(NameLength)`
 * A set of functions in `Unicode.h` to query that data, including

   * A function to find an exact match of a given Unicode character name
   * A function to perform a loose (ignoring case, space, underscore, medial hyphen) matching
   * A function returning the best matching codepoint for a given string per edit distance

 * Support of `\N{}` escape sequences in String and character Literals, with loose and typos diagnostics/fixits
 * Support of `\N{}` as UCN with loose matching diagnostics/fixits.

Loose matching is considered an error to match closely the semantics of P2071.

The generated data contributes 280kB of data to the binaries.

`UnicodeData.txt` and `NameAliases.txt`  are not committed to the repository in this patch, and regenerating the data is a manual process.

Reviewed By: tahonermann

Differential Revision: https://reviews.llvm.org/D123064
2022-06-25 19:03:33 +02:00
Kazu Hirata ca4af13e48 [clang] Don't use Optional::getValue (NFC) 2022-06-20 22:59:26 -07:00
Kazu Hirata d66cbc565a Don't use Optional::hasValue (NFC) 2022-06-20 20:26:05 -07:00
Kazu Hirata 0916d96d12 Don't use Optional::hasValue (NFC) 2022-06-20 20:17:57 -07:00
Kazu Hirata 064a08cd95 Don't use Optional::hasValue (NFC) 2022-06-20 20:05:16 -07:00
Kazu Hirata 5413bf1bac Don't use Optional::hasValue (NFC) 2022-06-20 11:33:56 -07:00
Kazu Hirata 452db157c9 [clang] Don't use Optional::hasValue (NFC) 2022-06-20 10:51:34 -07:00
Argyrios Kyrtzidis f7e19a5928 [Lex] Keep track of skipped preprocessor blocks and advance the lexer directly if they are revisited
This speeds up preprocessing; specifically, the time to preprocess the clang sources is reduced by about 36%,
using measurements on M1Pro with a release+thinLTO build.

Differential Revision: https://reviews.llvm.org/D127379
2022-06-13 21:46:46 -07:00
Jan Svoboda d9390b6ac3 Reapply "[clang][lex] NFCI: Use DirectoryEntryRef in HeaderSearch::load*()"
This reverts commit 340654e0f2, essentially reapplying 1d3ba05e4a.

The test VFS/real-path-found-first.m that was failing on Windows is now passing with a workaround.
2022-06-13 17:03:32 +02:00
Nico Weber 5f57ca208b fix comment typo to cycle bots 2022-06-11 18:55:40 -04:00
Argyrios Kyrtzidis fbaa8b9ae5 [Lex] Fix `fixits` for typo-corrections of preprocessing directives within skipped blocks
The `EndLoc` parameter was always unset so no fixit was emitted. But it is also unnecessary for determining the range so we can remove it.

Differential Revision: https://reviews.llvm.org/D127251
2022-06-10 13:32:19 -07:00
Leonard Grey dd6bcdbf21 [Attributes] Remove AttrSyntax and migrate uses to AttributeCommonInfo::Syntax (NFC)
This is setup for allowing hasAttribute to work for plugin-provided attributes

Differential Revision: https://reviews.llvm.org/D126902
2022-06-03 12:11:48 -04:00
Paul Pluzhnikov 4ad17d2e96 Clean "./" from __FILE__ expansion.
This is an alternative to https://reviews.llvm.org/D121733
and helps with Clang header modules in which __FILE__
may expand to "./foo.h" or "foo.h" depending on whether the file was
included directly or not.

Only do this when UseTargetPathSeparator is true, as we are already
changing the path in that case.
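
The effect on a single path can be sketched as follows. This is only an illustration: `stripLeadingDotSlash` is a hypothetical helper that handles leading components, not the routine the patch actually calls.

```cpp
#include <string>

// Hypothetical helper: drops any leading "./" components from a path so
// that "./foo.h" and "foo.h" yield the same spelling.
std::string stripLeadingDotSlash(std::string path) {
  while (path.size() >= 2 && path[0] == '.' && path[1] == '/')
    path.erase(0, 2);
  return path;
}
```

With this, both ways of including the header produce `"foo.h"`, while a path such as `"../foo.h"` is left untouched.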

Reviewed By: ayzhao

Differential Revision: https://reviews.llvm.org/D126396
2022-06-02 18:00:19 -04:00
Argyrios Kyrtzidis fad6e37995 [Lex] Fix crash during dependency scanning while skipping an unmatched `#if` 2022-05-27 23:59:30 -07:00
Argyrios Kyrtzidis b4c83a13f6 [Tooling/DependencyScanning & Preprocessor] Refactor dependency scanning to produce pre-lexed preprocessor directive tokens, instead of minimized sources
This is a commit with the following changes:

* Remove `ExcludedPreprocessorDirectiveSkipMapping` and related functionality

Removes `ExcludedPreprocessorDirectiveSkipMapping`; its intended benefit for fast skipping of excluded directive blocks
will be superseded by a follow-up patch in the series that will use dependency scanning lexing for the same purpose.

* Refactor dependency scanning to produce pre-lexed preprocessor directive tokens, instead of minimized sources

Replaces the "source minimization" mechanism with a mechanism that produces lexed dependency directives tokens.

* Make the special lexing for dependency scanning a first-class feature of the `Preprocessor` and `Lexer`

This is bringing the following benefits:

    * Full access to the preprocessor state during dependency scanning. E.g. a component can see what includes were taken and where they were located in the actual sources.
    * Improved performance for dependency scanning. Measurements with a release+thin-LTO build show a ~11% reduction in wall time.
    * Opportunity to use dependency scanning lexing to speed-up skipping of excluded conditional blocks during normal preprocessing (as follow-up, not part of this patch).

For normal preprocessing measurements show differences are below the noise level.

Since, after this change, we don't minimize sources and pass them in place of the real sources, `DependencyScanningFilesystem` is not technically necessary, but it has valuable performance benefits for caching file `stat`s along with the results of scanning the sources. So the setup of using the `DependencyScanningFilesystem` during a dependency scan remains.

Differential Revision: https://reviews.llvm.org/D125486
Differential Revision: https://reviews.llvm.org/D125487
Differential Revision: https://reviews.llvm.org/D125488
2022-05-26 12:50:06 -07:00
Argyrios Kyrtzidis b58a420ff4 [Tooling/DependencyScanning] Rename refactorings towards transitioning dependency scanning to use pre-lexed preprocessor directive tokens
This is the first of a series of patches for making the special lexing for dependency scanning a first-class feature of the `Preprocessor` and `Lexer`.
This patch only includes NFC renaming changes to make reviewing of the functionality changing parts easier.

Differential Revision: https://reviews.llvm.org/D125484
2022-05-26 12:49:51 -07:00
Yaxun (Sam) Liu cefe472c51 [clang] Fix __has_builtin
Fix __has_builtin to return 1 only if the requested target features
of a builtin are enabled by refactoring the code for checking
required target features of a builtin and use it in evaluation
of __has_builtin.
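
For context, this is the kind of guard that relies on `__has_builtin` giving an accurate answer. The sketch is generic: `__builtin_popcount` was picked because it is widely available, not because it is affected by this fix, and `popcount32` is a hypothetical name.

```cpp
#include <cstdint>

// Compilers without __has_builtin treat every query as "not available".
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Counts set bits, using the builtin only when the compiler reports it.
int popcount32(std::uint32_t v) {
#if __has_builtin(__builtin_popcount)
  return __builtin_popcount(v);
#else
  int n = 0;
  for (; v != 0; v &= v - 1)  // clears the lowest set bit each iteration
    ++n;
  return n;
#endif
}
```

If `__has_builtin` returned 1 for a builtin whose required target features are disabled, code guarded this way would select a path that then fails to compile, which is the situation the fix addresses.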

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D125829
2022-05-19 11:34:42 -04:00
Ken Matsui 45e01ce5fe [clang] Avoid suggesting typoed directives in `.S` files
This patch is intended to avoid suggesting typoed directives in `.S`
files to support the cases of `#` directives treated as comments or
various pseudo-ops. The feature is implemented in
https://reviews.llvm.org/D124726.

Fixes: https://reviews.llvm.org/D124726#3516346.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D125727
2022-05-16 15:46:59 -07:00
Ken Matsui a247ba9d15 Suggest typo corrections for preprocessor directives
When a preprocessor directive is unknown outside of a skipped
conditional block, we give an error diagnostic because we don't know
how to proceed with preprocessing. But when the directive is in a
skipped conditional block, we would not diagnose it on the theory that
the directive may be known to an implementation other than Clang.

Now, for unknown directives inside a skipped conditional block, we
diagnose the unknown directive as a warning if it is sufficiently
similar to a directive specific to preprocessor conditional blocks. For
example, we'll warn about `#esle` and suggest `#else` but we won't warn
about `#progma` because it's not a directive specific to preprocessor
conditional blocks.
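
The idea can be sketched with a plain edit-distance search over the conditional directives. This is not Clang's implementation: the function names and the cutoff (half the spelling's length, at least 1) are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Classic Levenshtein distance (insert, delete, substitute), two-row DP.
std::size_t editDistance(const std::string& a, const std::string& b) {
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Returns the closest conditional directive, or "" if none is close enough.
// The cutoff is a hypothetical choice, not the one Clang uses.
std::string suggestConditionalDirective(const std::string& spelling) {
  static const char* const kConditional[] = {
      "if", "ifdef", "ifndef", "elif", "elifdef", "elifndef", "else", "endif"};
  const std::size_t maxDist = std::max<std::size_t>(1, spelling.size() / 2);
  std::string best;
  std::size_t bestDist = maxDist + 1;
  for (const char* candidate : kConditional) {
    std::size_t d = editDistance(spelling, candidate);
    if (d < bestDist) {
      bestDist = d;
      best = candidate;
    }
  }
  return best;
}
```

Because only conditional directives are candidates, a misspelling of an unrelated directive such as `progma` finds no match and stays silent.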

Fixes #51598

Differential Revision: https://reviews.llvm.org/D124726
2022-05-13 09:16:46 -04:00
Timm Bäder b91073db6a [clang][preprocessor] Fix unsigned-ness of utf8 char literals
UTF8 char literals are always unsigned.

Fixes https://github.com/llvm/llvm-project/issues/54886

Differential Revision: https://reviews.llvm.org/D124996
2022-05-13 07:57:10 +02:00
Ken Matsui a1545f51a9 Warn if using `elifdef` & `elifndef` in not C2x & C++2b mode
This adds an extension warning when using the preprocessor conditionals
in a language mode they're not officially supported in, and an opt-in
warning for compatibility with previous standards.

Fixes #55306
Differential Revision: https://reviews.llvm.org/D125178
2022-05-12 09:26:44 -04:00
Alan Zhao 6398f3f2e9 [clang] Add the flag -ffile-reproducible
When Clang generates the path prefix (i.e. the path of the directory
where the file is) when generating __FILE__, __builtin_FILE(), and
std::source_location, Clang uses the platform-specific path separator
character of the build environment where Clang _itself_ is built. This
leads to inconsistencies in Chrome builds where Clang running on
non-Windows environments uses the forward slash (/) path separator
while Clang running on Windows builds uses the backslash (\) path
separator. To fix this, we add a flag -ffile-reproducible (and its
inverse, -fno-file-reproducible) to have Clang use the target's
platform-specific file separator character.

Additionally, the existing flags -fmacro-prefix-map and
-ffile-prefix-map now both imply -ffile-reproducible. This can be
overridden by setting -fno-file-reproducible.

[0]: https://crbug.com/1310767

Differential revision: https://reviews.llvm.org/D122766
2022-05-11 23:04:36 +02:00
Ken Matsui 786c721c2b Add extension diagnostic for linemarker directives
This adds the -Wgnu-line-marker diagnostic flag, grouped under -Wgnu,
to warn about use of the GNU linemarker preprocessor extension.

Fixes #55067

Differential Revision: https://reviews.llvm.org/D124534
2022-05-11 06:42:00 -04:00
Sam McCall 817550919e [Lex] Don't assert when decoding invalid UCNs.
Currently if a lexically-valid UCN encodes an invalid codepoint, then we
diagnose that, and then hit an assertion while trying to decode it.

Since there isn't anything preventing us reaching this state, remove the
assertion. expandUCNs("X\UAAAAAAAAY") will produce "XY".
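
The behaviour described above can be sketched with a simplified stand-in. `expandUcnsSketch` and its helpers are hypothetical and only handle `\uXXXX` and `\UXXXXXXXX`; they are not Clang's `expandUCNs`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Returns the value of a hex digit, or -1 if `c` is not one.
static int hexValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// True for code points that can be encoded: in range and not a surrogate.
static bool isScalarValue(std::uint32_t cp) {
  return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Appends the UTF-8 encoding of a valid scalar value to `out`.
static void appendUtf8(std::uint32_t cp, std::string& out) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Expands UCNs; a lexically valid UCN naming an invalid code point is
// dropped instead of tripping an assertion.
std::string expandUcnsSketch(const std::string& in) {
  std::string out;
  std::size_t i = 0;
  while (i < in.size()) {
    if (in[i] == '\\' && i + 1 < in.size() &&
        (in[i + 1] == 'u' || in[i + 1] == 'U')) {
      const std::size_t digits = in[i + 1] == 'u' ? 4 : 8;
      if (i + 2 + digits <= in.size()) {
        std::uint32_t cp = 0;
        bool ok = true;
        for (std::size_t k = 0; k < digits && ok; ++k) {
          int v = hexValue(in[i + 2 + k]);
          if (v < 0) ok = false;
          else cp = (cp << 4) | static_cast<std::uint32_t>(v);
        }
        if (ok) {
          if (isScalarValue(cp)) appendUtf8(cp, out);
          i += 2 + digits;
          continue;
        }
      }
    }
    out += in[i++];  // ordinary character, copied through unchanged
  }
  return out;
}
```

Note that the input must be spelled with an escaped backslash in C++ source, for example `expandUcnsSketch("X\\UAAAAAAAAY")`, so that the compiler does not interpret the UCN itself.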

Differential Revision: https://reviews.llvm.org/D125059
2022-05-06 08:51:42 +02:00
Cyndy Ishida b6c67c3c67 [clang] Track how headers get included generally during lookup time
tapi & clang-extractapi both attempt to construct then check against
how a header was included to determine api information when working
against multiple search paths, headermap, and vfsoverlay mechanisms.
Validating this against what the preprocessor sees during lookup time
makes this check more reliable.

Reviewed By: zixuw, jansvoboda11

Differential Revision: https://reviews.llvm.org/D124638
2022-05-04 09:52:31 -07:00
Senran Zhang ae76eb32a5 [NFC][Clang][Pragma] Remove unused variables
Reviewed By: beanz

Differential Revision: https://reviews.llvm.org/D124339
2022-04-24 14:50:59 +08:00
Christopher Di Bella e9a902c7f7 Revert "Revert "Revert "[clang][pp] adds '#pragma include_instead'"""
> Includes regression test for problem noted by @hans.
> This reverts commit 973de71.
>
> Differential Revision: https://reviews.llvm.org/D106898

Feature implemented as-is is fairly expensive and hasn't been used by
libc++. A potential reimplementation is possible if libc++ becomes
interested in this feature again.

Differential Revision: https://reviews.llvm.org/D123885
2022-04-22 16:37:20 +00:00
Jan Svoboda 340654e0f2 Revert "[clang][lex] NFCI: Use DirectoryEntryRef in HeaderSearch::load*()"
This reverts commit 1d3ba05e4a which caused failures of the VFS/real-path-found-first.m test on Windows build bots.
2022-04-20 20:27:14 +02:00
Jan Svoboda 99cfccdcb3 [clang][lex] NFCI: Use FileEntryRef in ModuleMap::diagnoseHeaderInclusion()
This patch removes uses of the deprecated `DirectoryEntry::getName()` from the `ModuleMap::diagnoseHeaderInclusion()` function by using `{File,Directory}EntryRef` instead.

Reviewed By: bnbarham

Differential Revision: https://reviews.llvm.org/D123856
2022-04-20 20:27:13 +02:00
Jan Svoboda f43ce5199d [clang][lex] NFCI: Use DirectoryEntryRef in FrameworkCacheEntry
This patch changes the member of `FrameworkCacheEntry` from `const DirectoryEntry *` to `Optional<DirectoryEntryRef>` in order to remove uses of the deprecated `DirectoryEntry::getName()`.

Reviewed By: bnbarham

Differential Revision: https://reviews.llvm.org/D123854
2022-04-20 19:01:02 +02:00
Jan Svoboda 1d3ba05e4a [clang][lex] NFCI: Use DirectoryEntryRef in HeaderSearch::load*()
This patch removes uses of the deprecated `DirectoryEntry::getName()` from `HeaderSearch::load*()` functions by using `DirectoryEntryRef` instead.

Note that we bail out in one case and use the also deprecated `FileEntry::getLastRef()`. That's to prevent this patch from growing, and is addressed in a follow-up.

Reviewed By: bnbarham

Differential Revision: https://reviews.llvm.org/D123771
2022-04-20 18:52:27 +02:00
Timm Bäder 33ec653055 [clang][lexer] Allow u8 character literal prefixes in C2x
Implement N2418 for C2x.

Differential Revision: https://reviews.llvm.org/D119221
2022-04-19 09:57:51 +02:00
Jan Svoboda 0b09b5d448 [clang][lex] NFC: Use FileEntryRef in PreprocessorLexer::getFileEntry()
This patch changes the return type of `PreprocessorLexer::getFileEntry()` so that its clients may stop using the deprecated APIs of `FileEntry`.

Reviewed By: bnbarham

Differential Revision: https://reviews.llvm.org/D123772
2022-04-15 15:16:17 +02:00
Paul Robinson 7726ad04e2 [PS5] Add basic PS5 driver behavior
This adds a PS5-specific ToolChain subclass, which defines some basic
PS5 driver behavior. Future patches will add more target-specific
driver behavior.
2022-04-14 12:45:33 -07:00
Jan Svoboda d79ad2f1db [clang][lex] NFCI: Use FileEntryRef in PPCallbacks::InclusionDirective()
This patch changes type of the `File` parameter in `PPCallbacks::InclusionDirective()` from `const FileEntry *` to `Optional<FileEntryRef>`.

With the API change in place, this patch then removes some uses of the deprecated `FileEntry::getName()` (e.g. in `DependencyGraph.cpp` and `ModuleDependencyCollector.cpp`).

Reviewed By: dexonsmith, bnbarham

Differential Revision: https://reviews.llvm.org/D123574
2022-04-14 10:46:12 +02:00
Timm Bäder 0eb5891adc [clang][preprocessor] Allow calling DumpToken() on annotation tokens
Differential Revision: https://reviews.llvm.org/D122659
2022-04-13 07:06:00 +02:00
Jan Svoboda b672638dbc [clang][deps] Ensure deterministic filename case
The dependency scanner can reuse single FileManager instance across multiple translation units. This may lead to non-deterministic output depending on which TU gets processed first.

One of the problems is that Clang uses DirectoryEntry::getName in the header search algorithm. This function returns the path that was first used to construct the (shared) entry in FileManager. Using DirectoryEntryRef::getName instead preserves the case as it was spelled out for the current "get directory entry" request.

rdar://90647508

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D123229
2022-04-08 09:18:00 +02:00
Paul Robinson 1e085448b3 [PS4] Fix header search list
A missing "break" in the initial implementation had us adding a
spurious "/usr/include" to the header search list. Later someone
introduced LLVM_FALLTHROUGH to prevent a warning.  Replace this with
the correct "break" and make sure the extra directory isn't added to
the PS4 header search list.
2022-04-05 14:14:13 -07:00
David Goldman d9739f29cd Serialize PragmaAssumeNonNullLoc to support preambles
Previously, if a `#pragma clang assume_nonnull begin` was at the
end of a preamble with a `#pragma clang assume_nonnull end` at the
end of the main file, clang would diagnose an unterminated begin in
the preamble and an unbalanced end in the main file.

With this change, those errors no longer occur and the case above is
now properly handled. I've added a corresponding test to clangd,
which makes use of preambles, in order to verify this works as
expected.

Differential Revision: https://reviews.llvm.org/D122179
2022-03-31 11:08:01 -04:00
Chuanqi Xu ee572129ae [C++20] [Modules] Use '-' as the separator of partitions when searching
in filesystems

It is simpler to search for a module unit via the -fprebuilt-module-path
option. However, the partition separator ':' is not friendly to file systems.
According to the discussion in https://reviews.llvm.org/D118586, I think
we reached consensus to use '-' as the separator instead. '-' is also
GCC's choice.
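
As a rough illustration of the mapping, assuming a hypothetical helper `prebuiltModuleFileName` and a `.pcm` extension (neither is taken from the patch itself):

```cpp
#include <cassert>
#include <string>

// Maps a C++20 module name such as "M:part" to the file name searched for
// in a prebuilt module directory. Helper name and extension are assumptions.
std::string prebuiltModuleFileName(std::string moduleName) {
  // A module name has at most one ':' separating primary name and partition.
  for (char &c : moduleName)
    if (c == ':')
      c = '-';
  return moduleName + ".pcm";
}
```

Under these assumptions, the partition `M:part` would be looked up as `M-part.pcm`, and a primary module `M` as `M.pcm`.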

Previously I thought it would be better to add an option. But I feel it
is over-engineering now. Another reason here is that there are too many
options for modules (for clang module mainly) now. Given it is not bad
to use '-' when searching, I think it is acceptable to not add an
option.

Reviewed By: iains

Differential Revision: https://reviews.llvm.org/D120874
2022-03-31 11:21:58 +08:00
Iain Sandoe 6c0e60e884 [C++20][Modules][HU 1/5] Introduce header units as a module type.
This is the first in a series of patches that introduce C++20 importable
header units.

These differ from clang header modules in that:
 (a) they are identifiable by an internal name
 (b) they represent the top level source for a single header - although
     that might include or import other headers.

We name importable header units with the path by which they are specified
(although that need not be the absolute path for the file).

So "foo/bar.h" would have a name "foo/bar.h".  Header units are made a
separate module type so that we can deal with diagnosing places where they
are permitted but a named module is not.

Differential Revision: https://reviews.llvm.org/D121095
2022-03-25 09:17:14 +00:00
Jan Svoboda 59dadd178b [clang][lex] Fix failures with Microsoft header search rules
`HeaderSearch` currently assumes `LookupFileCache` is eventually populated in `LookupFile`. However, that's not always the case with `-fms-compatibility` and its early returns.

This patch adds a defensive check that the iterator pulled out of the cache is actually valid before using it.

(This bug was introduced in D119721. Before that, the cache was initialized to `0` - essentially the `search_dir_begin()` iterator.)

Reviewed By: dexonsmith, erichkeane

Differential Revision: https://reviews.llvm.org/D122237
2022-03-23 14:49:17 +01:00
Zahira Ammarguellat bbf0d1932a Currently the control of the eval-method is mixed with fast-math.
FLT_EVAL_METHOD tells the user the precision at which temporary results
are evaluated but when fast-math is enabled, the numeric values are not
guaranteed to match the source semantics, so the eval-method is
meaningless.
For example, the expression `x + y + z` has as source semantics `(x + y)
+ z`. FLT_EVAL_METHOD is telling the user at which precision `(x + y)`
is evaluated. With fast-math enabled, the compiler can choose to
evaluate the expression as `(y + z) + x`.
The correct behavior is to set the FLT_EVAL_METHOD to `-1` to tell the
user that the precision of the intermediate values is unknown. This
patch does that.
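
To illustrate why reassociation makes the reported precision meaningless, here is a small sketch. The function names are hypothetical, and the two orderings are written out by hand rather than produced by fast-math.

```cpp
#include <cassert>
#include <cfloat>

// Source semantics of `x + y + z`: the left pair is added first.
float sumAsWritten(float x, float y, float z) { return (x + y) + z; }

// One reassociation that fast-math permits the compiler to choose.
float sumReassociated(float x, float y, float z) { return (y + z) + x; }

// Value the implementation reports for intermediate precision:
// 0 = operand type, 1 = double, 2 = long double, -1 = indeterminable.
int reportedEvalMethod() { return FLT_EVAL_METHOD; }
```

With `x = 1e20f`, `y = -1e20f` and `z = 1.0f`, the first ordering gives `1.0f` while the second loses `z` entirely and gives `0.0f`, so knowing the precision of `(x + y)` says nothing about the result once the order is no longer fixed.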

Differential Revision: https://reviews.llvm.org/D121122
2022-03-17 11:48:03 -07:00
Jan Svoboda 6007b0b67b [clang][deps] NFC: Use range-based for loop instead of iterators
The iterator is not needed after the loop body anymore, meaning we can use a more terse range-based for loop.

Depends on D121295.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D121685
2022-03-16 12:17:53 +01:00
Jan Svoboda 77924d60ef [clang][deps] Modules don't contribute to search path usage
To reduce the number of modules we build in explicit builds (which use strict context hash), we prune unused header search paths. This essentially merges parts of the dependency graph.

Determining whether a search path was used to discover a module (through implicit module maps) proved to be somewhat complicated. Initial support landed in D102923, while D113676 attempts to fix some bugs.

However, now that we don't use implicit module maps in explicit builds (since D120465), we don't need to consider such search paths as used anymore. Modules are no longer discovered through the header search mechanism, so we can drop such search paths (provided they are not needed for other reasons).

This patch removes whatever support for detecting such usage we had, since it's buggy and not required anymore.

Depends on D120465.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D121295
2022-03-16 12:17:52 +01:00
Aaron Ballman 9e3e85ac6e Silence -Wlogical-op-parentheses and fix a logic bug while doing so 2022-03-14 10:13:39 -04:00
Aaron Ballman 8cba72177d Implement literal suffixes for _BitInt
WG14 adopted N2775 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2775.pdf)
at our Feb 2022 meeting. This paper adds a literal suffix for
bit-precise types that automatically sizes the bit-precise type to be
the smallest possible legal _BitInt type that can represent the literal
value. The suffix chosen is wb (for a signed bit-precise type) which
can be combined with the u suffix (for an unsigned bit-precise type).

The preprocessor continues to operate as-if all integer types were
intmax_t/uintmax_t, including bit-precise integer types. It is a
constraint violation if the bit-precise literal is too large to fit
within that type in the context of the preprocessor (when still using
a pp-number preprocessing token), but it is not a constraint violation
in other circumstances. This allows you to make bit-precise integer
literals that are wider than what the preprocessor currently supports
in order to initialize variables, etc.
2022-03-14 09:24:19 -04:00
Paul Robinson 7b85f0f32f [PS4] isPS4 and isPS4CPU are not meaningfully different 2022-03-03 11:36:59 -05:00
Dawid Jurczak d813116c9d [NFC][Lexer] Remove getLangOpts function from Lexer
Given that there is only one external user of Lexer::getLangOpts
we can remove the getter entirely without much pain.

Differential Revision: https://reviews.llvm.org/D120404
2022-03-02 11:17:05 +01:00
Adam Czachorowski 8f4ea36bfe [clang] Improve laziness of resolving module map headers.
clang has support for lazy headers in module maps - if size and/or
modtime are provided in the cppmap file, headers are only resolved when
an include directive for a file with that size/modtime is encountered.

Before this change, the lazy resolution was all-or-nothing per module.
That means as soon as even one file in that module potentially matched
an include, all lazy files in that module were resolved. With this
change, only files with matching size/modtime will be resolved.

The goal is to avoid unnecessary stat() calls on non-included files,
which is especially valuable on networked file systems, with higher
latency.

Differential Revision: https://reviews.llvm.org/D120569
2022-03-01 15:56:23 +01:00
Dawid Jurczak a64d3c602f [NFC][Lexer] Make Lexer::LangOpts const reference
This change can be seen as code cleanup, but the motivation is mostly performance.
Perf reports captured during a Linux build show an unusually large portion of instructions executed in the std::vector<std::string> copy constructor, for example:

0.59%     0.58%  clang-14    clang-14      [.] std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
                                                                std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::vector

or even:

1.42%     0.26%  clang    clang-14             [.] clang::LangOptions::LangOptions
       |
        --1.16%--clang::LangOptions::LangOptions
                  |
                   --0.74%--std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
                            std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::vector

After more digging we can see that the relevant LangOptions std::vector members (*Files, ModuleFeatures and NoBuiltinFuncs)
are copy-constructed when the Lexer::LangOpts field is initialized in the member initializer list:

Lexer::Lexer(..., const LangOptions &langOpts, ...)
            : ..., LangOpts(langOpts),

Since the LangOptions copy constructor is called by Lexer(..., const LangOptions &LangOpts,...) and local Lexer objects are created thousands of times
(in Lexer::getRawToken, Preprocessor::EnterSourceFile and more) while the frontend processes a single module, the std::vector copy constructors become surprisingly hot.

Unfortunately, even though the mentioned std::vector members are unused by the current Lexer implementation and are empty most of the time,
no compiler is smart enough to optimize their std::vector copy constructors out, even with LTO enabled (take a look at the test assembly): https://godbolt.org/z/hdoxPfMYY
However, there is a simple fix. Since Lexer doesn't access *Files, ModuleFeatures, NoBuiltinFuncs or any other LangOptions fields (only those of LangOptionsBase),
we can get rid of the redundant copy constructor code by changing the LangOpts type to a more appropriate const LangOptions reference: https://godbolt.org/z/fP7de9176

Additionally, we need to store LineComment outside LangOpts because it is written in the SkipLineComment function.
FormatTokenLexer also needs to be adjusted a bit to avoid lifetime issues related to passing a local LangOpts reference to Lexer.
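
The idea can be sketched with stand-in types. `Options`, `CopyingLexer`, `ReferencingLexer` and the copy counter are all hypothetical and only mimic the shape of `LangOptions` and `Lexer`.

```cpp
#include <cassert>
#include <string>
#include <vector>

int optionCopies = 0; // counts copy constructions, for illustration only

// Stand-in for LangOptions: cheap flags plus vector members that are
// comparatively expensive to copy even when empty.
struct Options {
  bool lineComment = true;
  std::vector<std::string> moduleFeatures;
  Options() = default;
  Options(const Options &o)
      : lineComment(o.lineComment), moduleFeatures(o.moduleFeatures) {
    ++optionCopies;
  }
};

// Before: every lexer owns a full copy of the options.
struct CopyingLexer {
  Options opts;
  explicit CopyingLexer(const Options &o) : opts(o) {}
};

// After: the lexer refers to shared options and keeps the one flag it
// writes to (here lineComment) as its own member.
struct ReferencingLexer {
  const Options &opts;
  bool lineComment;
  explicit ReferencingLexer(const Options &o)
      : opts(o), lineComment(o.lineComment) {}
};

// Creates `n` short-lived lexers and reports how many option copies occurred.
template <typename LexerT> int copiesForLexers(const Options &o, int n) {
  optionCopies = 0;
  for (int i = 0; i < n; ++i) {
    LexerT lexer(o);
    (void)lexer;
  }
  return optionCopies;
}
```

In this sketch, creating 1000 `CopyingLexer` objects copies the options 1000 times, while 1000 `ReferencingLexer` objects copy them zero times. The price is that the referenced options must outlive the lexer, which is the lifetime concern mentioned for FormatTokenLexer.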

After this change I see a speedup of more than 1% in some of my microbenchmarks when using a Clang release binary built with LTO.
For the Linux build the gains are less significant but still nice: the instruction count drops by 0.4% to 0.5%.

Differential Revision: https://reviews.llvm.org/D120334
2022-02-28 15:42:19 +01:00
Zahira Ammarguellat 1592d88aa7 Add support for floating-point option `ffp-eval-method` and for
`pragma clang fp eval_method`.

Differential Revision: https://reviews.llvm.org/D109239
2022-02-23 15:00:18 -08:00
Dawid Jurczak fbe38a784e [NFC][Lexer] Make access to LangOpts more consistent
Before this change, Lexer::LangOpts was, without any good reason, sometimes accessed through the getter and sometimes read directly in Lexer functions.
Since getLangOpts is a bit more verbose, prefer direct access to the LangOpts member when possible.

Differential Revision: https://reviews.llvm.org/D120333
2022-02-23 12:46:13 +01:00
Florian Hahn 09193f20a1 Revert "Add support for floating-point option `ffp-eval-method` and for"
This reverts commit 32b73bc6ab.

This breaks builds on macOS in some configurations, because
__FLT_EVAL_METHOD__ is set to an unexpected value.

E.g.
https://green.lab.llvm.org/green/job/clang-stage1-RA/28282/consoleFull#129538464349ba4694-19c4-4d7e-bec5-911270d8a58c

More details available in the review thread
https://reviews.llvm.org/D109239
2022-02-18 11:04:00 +00:00
Zahira Ammarguellat 32b73bc6ab Add support for floating-point option `ffp-eval-method` and for
`pragma clang fp eval_method`.

https://reviews.llvm.org/D109239
2022-02-17 08:59:21 -08:00
Nico Weber 125abb61f7 Revert "Add support for floating-point option `ffp-eval-method` and for"
This reverts commit 4bafe65c2b.
Breaks at least Misc/warning-flags.c, see comments on
https://reviews.llvm.org/D109239
2022-02-15 22:02:25 -05:00
Zahira Ammarguellat 4bafe65c2b Add support for floating-point option `ffp-eval-method` and for
`pragma clang fp eval_method`.
2022-02-15 13:59:27 -08:00
Jan Svoboda e7dcf09fc3 [clang][lex] Use `SearchDirIterator` types in for loops
This patch replaces a lot of index-based loops with iterators and ranges.

Depends on D117566.

Reviewed By: ahoppen

Differential Revision: https://reviews.llvm.org/D119722
2022-02-15 11:02:26 +01:00
Jan Svoboda 17c9fcd6f6 [clang][lex] Use `ConstSearchDirIterator` in lookup cache
This patch starts using the new iterator type in `LookupFileCacheInfo`.

Depends on D117566.

Reviewed By: ahoppen

Differential Revision: https://reviews.llvm.org/D119721
2022-02-15 10:39:05 +01:00
Jan Svoboda 7631c366c8 [clang][lex] Introduce `ConstSearchDirIterator`
The `const DirectoryLookup *` out-parameter of `{HeaderSearch,Preprocessor}::LookupFile()` is assigned the most recently used search directory, which callers use to implement `#include_next`.

From the function signature it's not obvious the `const DirectoryLookup *` is being used as an iterator. This patch introduces `ConstSearchDirIterator` to make that affordance obvious. This would've prevented a bug that occurred after initially landing D116750.

Reviewed By: ahoppen

Differential Revision: https://reviews.llvm.org/D117566
2022-02-15 10:36:54 +01:00
Jan Svoboda a081a0654f [clang][lex] NFC: De-duplicate some #include_next logic
This patch addresses a FIXME and de-duplicates some `#include_next` logic

Depends on D119714.

Reviewed By: ahoppen

Differential Revision: https://reviews.llvm.org/D119716
2022-02-15 09:52:39 +01:00
Jan Svoboda d8298f04a9 [clang][lex][minimizer] Avoid treating path separators as comments
The minimizer strips out single-line comments (introduced by `//`). This sequence of characters can also appear in `#include` or `#import` directives where they play the role of path separators. We already avoid stripping this character sequence for `#include` but not for `#import` (which has the same semantics). This patch makes it so `#import <A//A.h>` is not affected by minimization. Previously, we would incorrectly reduce it into `#import <A`.
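
A simplified sketch of the rule, using hypothetical helpers rather than the minimizer's real code. It ignores string literals, `# include` with inner spacing, and trailing comments after a directive.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if `line` starts (after optional whitespace) with "#include" or
// "#import". "#include_next" also matches because it shares the prefix.
bool isIncludeLikeDirective(const std::string &line) {
  std::size_t i = line.find_first_not_of(" \t");
  if (i == std::string::npos || line[i] != '#')
    return false;
  return line.compare(i, 8, "#include") == 0 ||
         line.compare(i, 7, "#import") == 0;
}

// Drops a trailing `//` comment, except on include-like directives where
// "//" may be part of the header path.
std::string stripLineComment(const std::string &line) {
  if (isIncludeLikeDirective(line))
    return line;
  std::size_t pos = line.find("//");
  return pos == std::string::npos ? line : line.substr(0, pos);
}
```

With this check in place, `#import <A//A.h>` passes through unchanged, whereas an ordinary line such as `int x; // note` still loses its comment.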

Reviewed By: arphaman

Differential Revision: https://reviews.llvm.org/D119226
2022-02-15 09:49:19 +01:00
Jan Svoboda fd2dff17c5 [clang][lex][minimizer] Ensure whitespace between squashed lines
The minimizer tries to squash multi-line macro definitions into a single line. For that to work, the contents of each line need to be separated by a space. Since we always strip leading whitespace on lines of a macro definition, the code currently tries to preserve exactly one space that appeared before the backslash.

This means the following code:

```
#define FOO(BAR) \
  #BAR           \
  baz
```

gets minimized into:

```
#define FOO(BAR) #BAR baz
```

However, if there are no spaces before the backslash on line 2:

```
#define FOO(BAR) \
  #BAR\
  baz
```

no space can be preserved, leading to a (most likely) malformed macro definition:

```
#define FOO(BAR) #BARbaz
```

This patch makes sure we always put exactly one space at the end of a line ending with a backslash.
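
The intended behaviour can be approximated as follows. This is a sketch with hypothetical helpers (`trim`, `squashMacroLines`) that joins already-split physical lines; it is not the minimizer's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Removes leading and trailing spaces and tabs.
std::string trim(const std::string &s) {
  std::size_t b = s.find_first_not_of(" \t");
  if (b == std::string::npos)
    return "";
  std::size_t e = s.find_last_not_of(" \t");
  return s.substr(b, e - b + 1);
}

// Joins the physical lines of a macro definition into one logical line,
// always separating their contents with exactly one space.
std::string squashMacroLines(const std::vector<std::string> &lines) {
  std::string out;
  for (const std::string &raw : lines) {
    std::string line = trim(raw);
    if (!line.empty() && line.back() == '\\') {
      line.pop_back();   // drop the line continuation
      line = trim(line); // and any whitespace that preceded it
    }
    if (!out.empty() && !line.empty())
      out += ' ';
    out += line;
  }
  return out;
}
```

Both variants of the example above, with or without spaces before the backslash, then come out as `#define FOO(BAR) #BAR baz`.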

Reviewed By: arphaman

Differential Revision: https://reviews.llvm.org/D119231
2022-02-15 09:49:03 +01:00
Jan Svoboda edd09bb5a4 [clang][lex] Remove `Preprocessor::GetCurDirLookup()`
`Preprocessor` exposes the search directory iterator via the `GetCurDirLookup()` getter, which is only used in two static functions.

To simplify reasoning about search directory iterators/references and to simplify the `Preprocessor` API, this patch makes the two static functions private member functions and removes the getter entirely.

Depends on D119708.

Reviewed By: ahoppen, dexonsmith

Differential Revision: https://reviews.llvm.org/D119714
2022-02-15 09:48:25 +01:00
Jan Svoboda 7a124f4859 [clang][lex] Remove `PPCallbacks::FileNotFound()`
The purpose of the `FileNotFound` preprocessor callback was to add the ability to recover from failed header lookups. This was to support a downstream project.

However, injecting an additional search path while performing header search can invalidate currently used iterators/references to `DirectoryLookup` in `Preprocessor` and `HeaderSearch`.

The downstream project ended up maintaining a separate patch to further tweak the functionality. Since we don't have any upstream users nor open source downstream users, I'd like to remove this callback for good to prevent future misuse. I doubt there are any actual downstream users, since the functionality is definitely broken at the moment.

Reviewed By: ahoppen

Differential Revision: https://reviews.llvm.org/D119708
2022-02-15 09:48:25 +01:00
Alex Lorenz 00cd6c0420 [Preprocessor] Reduce the memory overhead of `#define` directives (Recommit)
Recently we observed high memory pressure caused by clang during some parallel builds.
We discovered that we have several projects that have a large number of #define directives
in their TUs (on the order of millions), which caused huge memory consumption in clang due
to a lot of allocations for MacroInfo. We would like to reduce the memory overhead of
clang for a single #define to reduce the memory overhead for these files, to allow us to
reduce the memory pressure on the system during highly parallel builds. This change achieves
that by removing the SmallVector in MacroInfo and instead storing the tokens in an array
allocated using the bump pointer allocator, after all tokens are lexed.

The added unit test with 1000000 #define directives illustrates the problem. Prior to this
change, on arm64 macOS, clang's PP bump pointer allocator allocated 272007616 bytes, and
used roughly 272 bytes per #define. After this change, clang's PP bump pointer allocator
allocates 120002016 bytes, and uses only roughly 120 bytes per #define.
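
The storage strategy can be sketched roughly as below. `Token`, `BumpArena`, `MacroTokens` and `freezeTokens` are hypothetical stand-ins, not LLVM's `BumpPtrAllocator` or Clang's `MacroInfo`. The point is that tokens are collected first and then copied once into an exactly sized array carved out of a shared slab, so no per-macro container overhead remains.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Token { int kind; int length; }; // hypothetical token record

// Minimal bump allocator: hands out memory from fixed-size slabs and never
// frees individual allocations.
class BumpArena {
  static constexpr std::size_t SlabSize = 4096;
  std::vector<std::unique_ptr<unsigned char[]>> slabs;
  std::size_t used = SlabSize; // "full", so the first request opens a slab

public:
  void *allocate(std::size_t bytes, std::size_t align) {
    assert(bytes <= SlabSize && "sketch does not handle oversized requests");
    std::size_t start = (used + align - 1) / align * align;
    if (start + bytes > SlabSize) {
      slabs.push_back(std::make_unique<unsigned char[]>(SlabSize));
      start = 0;
    }
    used = start + bytes;
    return slabs.back().get() + start;
  }
  std::size_t slabCount() const { return slabs.size(); }
};

// What a macro keeps after lexing: a pointer and a length, nothing more.
struct MacroTokens {
  const Token *data = nullptr;
  std::size_t size = 0;
};

// Copies the fully lexed tokens into an exactly sized arena array.
MacroTokens freezeTokens(BumpArena &arena, const std::vector<Token> &lexed) {
  MacroTokens result;
  if (lexed.empty())
    return result;
  void *mem = arena.allocate(lexed.size() * sizeof(Token), alignof(Token));
  Token *dst = static_cast<Token *>(mem);
  std::uninitialized_copy(lexed.begin(), lexed.end(), dst);
  result.data = dst;
  result.size = lexed.size();
  return result;
}
```

In this sketch, consecutive macros end up packed back to back in the same slab, which is also why the number of separate system allocations can drop sharply.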

For an example test file that we have internally with 7.8 million #define directives, this
change produces the following improvement on arm64 macOS: Persistent allocation footprint for
this test case file as it's being compiled to LLVM IR went down 22% from 5.28 GB to 4.07 GB
and the total allocations went down 14% from 8.26 GB to 7.05 GB. Furthermore, this change
reduced the total number of allocations made by the system for this clang invocation from
1454853 to 133663, an order of magnitude improvement.

The recommit fixes the LLDB build failure.

Differential Revision: https://reviews.llvm.org/D117348
2022-02-14 09:27:44 -08:00
Alex Lorenz 3f05192c4c Revert "[Preprocessor] Reduce the memory overhead of `#define` directives"
This reverts commit 0d9b91524e.

This change broke LLDB's build. I will need to recommit after fixing LLDB.
2022-02-11 15:53:16 -08:00
Alex Lorenz 0d9b91524e [Preprocessor] Reduce the memory overhead of `#define` directives
Recently we observed high memory pressure caused by clang during some parallel builds.
We discovered that we have several projects that have a large number of #define directives
in their TUs (on the order of millions), which caused huge memory consumption in clang due
to a lot of allocations for MacroInfo. We would like to reduce the memory overhead of
clang for a single #define to reduce the memory overhead for these files, to allow us to
reduce the memory pressure on the system during highly parallel builds. This change achieves
that by removing the SmallVector in MacroInfo and instead storing the tokens in an array
allocated using the bump pointer allocator, after all tokens are lexed.

The added unit test with 1000000 #define directives illustrates the problem. Prior to this
change, on arm64 macOS, clang's PP bump pointer allocator allocated 272007616 bytes, and
used roughly 272 bytes per #define. After this change, clang's PP bump pointer allocator
allocates 120002016 bytes, and uses only roughly 120 bytes per #define.

For an example test file that we have internally with 7.8 million #define directives, this
change produces the following improvement on arm64 macOS: Persistent allocation footprint for
this test case file as it's being compiled to LLVM IR went down 22% from 5.28 GB to 4.07 GB
and the total allocations went down 14% from 8.26 GB to 7.05 GB. Furthermore, this change
reduced the total number of allocations made by the system for this clang invocation from
1454853 to 133663, an order of magnitude improvement.

Differential Revision: https://reviews.llvm.org/D117348
2022-02-11 15:01:10 -08:00