Commit Graph

347 Commits

Author SHA1 Message Date
Corentin Jabot 2715c4da50 Do not emit diagnostics for invalid unicode characters in preprocessing mode
This amends 4e80636db7 with a fix for
https://lab.llvm.org/buildbot/#/builders/139/builds/8943
2021-08-18 09:12:36 -04:00
Corentin Jabot 4e80636db7 Implement P1949
This adds the Unicode 13 data for XID_Start and XID_Continue.
The definition of valid identifier is changed in all C++ modes
as P1949 (https://wg21.link/p1949) was accepted by WG21 as a defect
report.
2021-08-18 07:33:14 -04:00
Christopher Di Bella 0871954197 Revert "Revert "[clang][pp] adds '#pragma include_instead'""
Includes regression test for problem noted by @hans.
This reverts commit 973de71856.

Differential Revision: https://reviews.llvm.org/D106898
2021-07-29 19:21:43 +00:00
Hans Wennborg 973de71856 Revert "[clang][pp] adds '#pragma include_instead'"
> `#pragma clang include_instead(<header>)` is a pragma that can be used
> by system headers (and only system headers) to indicate to a tool that
> the file containing said pragma is an implementation-detail header and
> should not be directly included by user code.
>
> The library alternative is very messy code that can be seen in the first
> diff of D106124, and we'd rather avoid that with something more
> universal.
>
> This patch takes the first step by warning a user when they include a
> detail header in their code, and suggests alternative headers that the
> user should include instead. Future work will involve adding a fixit to
> automate the process, as well as cleaning up modules diagnostics to not
> suggest said detail headers. Other tools, such as clangd can also take
> advantage of this pragma to add the correct user headers.
>
> Differential Revision: https://reviews.llvm.org/D106394

This caused compiler crashes in Chromium builds involving PCH and an include
directive with macro expansion, when Token::getLiteralData() returned null. See
the code review for details.

This reverts commit e8a64e5491.
2021-07-27 17:29:48 +02:00
Christopher Di Bella e8a64e5491 [clang][pp] adds '#pragma include_instead'
`#pragma clang include_instead(<header>)` is a pragma that can be used
by system headers (and only system headers) to indicate to a tool that
the file containing said pragma is an implementation-detail header and
should not be directly included by user code.

The library alternative is very messy code that can be seen in the first
diff of D106124, and we'd rather avoid that with something more
universal.

This patch takes the first step by warning a user when they include a
detail header in their code, and suggests alternative headers that the
user should include instead. Future work will involve adding a fixit to
automate the process, as well as cleaning up modules diagnostics to not
suggest said detail headers. Other tools, such as clangd can also take
advantage of this pragma to add the correct user headers.

Differential Revision: https://reviews.llvm.org/D106394
2021-07-26 16:07:45 +00:00
Simon Tatham 21401a7262 [clang] Introduce SourceLocation::[U]IntTy typedefs.
This is part of a patch series working towards the ability to make
SourceLocation into a 64-bit type to handle larger translation units.

NFC: this patch introduces typedefs for the integer type used by
SourceLocation and makes all the boring changes to use the typedefs
everywhere, but for the moment, they are unconditionally defined to
uint32_t.

Patch originally by Mikhail Maltsev.

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D105492
2021-07-21 10:45:46 +01:00
Sam McCall fd22785054 [Lex] Consider a PCH header-guarded even with #endif truncated
This seems to be a more useful behavior for tools that use preambles.
I believe it doesn't affect real compiles: the PCH is only included once
when used, and recursive inclusion of the main-file *within* the PCH
isn't supported in any case.

Differential Revision: https://reviews.llvm.org/D106204
2021-07-20 14:25:36 +02:00
Yitzhak Mandelbaum 93dc73b1e0 [Lexer] Fix bug in `makeFileCharRange` called on split tokens.
When the end loc of the specified range is a split token, `makeFileCharRange`
does not process it correctly.  This patch adds proper support for split tokens.

Differential Revision: https://reviews.llvm.org/D105365
2021-07-14 14:36:31 +00:00
Aaron Ballman 8edd3464af Add support for #elifdef and #elifndef
WG14 adopted N2645 and WG21 EWG has accepted P2334 in principle (still
subject to full EWG vote + CWG review + plenary vote), which add
support for #elifdef as shorthand for #elif defined and #elifndef as
shorthand for #elif !defined. This patch adds support for the new
preprocessor directives.
2021-05-27 08:57:47 -04:00
Richard Smith de6164ec4d PR50456: Properly handle multiple escaped newlines in a '*/'. 2021-05-24 16:21:03 -07:00
serge-sans-paille 9628cb1fee [NFC] Use higher level constructs to check for whitespace/newlines in the lexer
It turns out that according to valgrind and perf, it's also slightly faster.

Differential Revision: https://reviews.llvm.org/D98637
2021-03-15 18:27:19 +01:00
Aaron Ballman e448310059 Add support for digit separators in C2x.
WG14 adopted N2626 at the meetings this week. This commit adds support
for using ' as a digit separator in a numeric literal which is
compatible with the C++ feature.
2021-03-12 07:21:03 -05:00
Duncan P. N. Exon Smith b3eff6b7bb Lexer: Update the Lexer to use MemoryBufferRef, NFC
Update `Lexer` / `Lexer::Lexer` to use `MemoryBufferRef` instead of
`MemoryBuffer*`. Callers that were acquiring a `MemoryBuffer*` via
`SourceManager::getBuffer` were updated, such that if they checked
`Invalid` they use `getBufferOrNone` and otherwise `getBufferOrFake`.

Differential Revision: https://reviews.llvm.org/D89398
2020-10-19 19:10:21 -04:00
Zequan Wu 9caa3fbe03 [Coverage] Add empty line regions to SkippedRegions
Differential Revision: https://reviews.llvm.org/D84988
2020-09-21 12:42:53 -07:00
Eric Christopher 1f593f46f3 [AST/Lex/Parse/Sema] As part of using inclusive language within
the llvm project, migrate away from the use of blacklist and whitelist.
2020-06-20 01:15:32 -07:00
Aaron Ballman 1b3f1f4436 Rename warning identifiers from cxx2a to cxx20; NFC. 2020-04-22 14:31:13 -04:00
Aaron Ballman 6a30894391 C++2a -> C++20 in some identifiers; NFC. 2020-04-21 15:37:19 -04:00
Haojian Wu 297a9dac43 [CodeComplete] Don't replace the rest of line in #include completion.
Summary:
The previous behavior was aggressive,
  #include "abc/f^/abc.h"
                foo/  -> candidate
"f/abc.h" is replaced with "foo/", this patch will preserve the "abc.h".

Reviewers: sammccall

Subscribers: jkorous, arphaman, kadircet, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76770
2020-03-26 11:03:04 +01:00
serge-sans-paille 3185c30c54 Prefer __vector over vector keyword for altivec
`vector' uses the keyword-and-predefine mode from gcc, while __vector is
reliably supported.

As a side effect, it also makes the code consistent in its usage of __vector.

Differential Revision: https://reviews.llvm.org/D74129
2020-02-10 20:23:26 +01:00
Benjamin Kramer adcd026838 Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.

This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.

This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2020-01-28 23:25:25 +01:00
Scott Egerton a90ea38698 [Lexer] Allow UCN for dollar symbol '\u0024' in identifiers when using -fdollars-in-identifiers flag.
Summary:
Previously, the -fdollars-in-identifiers flag allows the '$' symbol to be used
in an identifier but the universal character name equivalent '\u0024' is not
allowed.
This patch changes this, so that \u0024 is valid in identifiers.

Reviewers: rsmith, jordan_rose

Reviewed By: rsmith

Subscribers: dexonsmith, simoncook, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D71758
2020-01-15 11:28:57 +00:00
Simon Pilgrim f2132070d9 Lexer::ReadToEndOfLine - fix Token uninitialised value warnings. NFCI.
Use Token::startToken to initialize Token.
2019-10-28 18:28:18 +00:00
Alex Lorenz ca6e60971e [clang-scan-deps] add skip excluded conditional preprocessor block preprocessing optimization
This commit adds an optimization to clang-scan-deps and clang's preprocessor that skips excluded preprocessor
blocks by bumping the lexer pointer, and not lexing the tokens until reaching appropriate #else/#endif directive.
The skip positions and lexer offsets are computed when the file is minimized, directly from the minimized tokens.

On an 18-core iMacPro with macOS Catalina Beta I got 10-15% speed-up from this optimization when running clang-scan-deps on
the compilation database for a recent LLVM and Clang (3511 files).

Differential Revision: https://reviews.llvm.org/D67127

llvm-svn: 371656
2019-09-11 20:40:31 +00:00
Fangrui Song 90e95bb289 Delete dead stores
llvm-svn: 365901
2019-07-12 14:04:34 +00:00
Richard Smith 91e150d54c Replace tok::angle_string_literal with new tok::header_name.
Use the new kind for both angled header-name tokens and for
double-quoted header-name tokens.

This is in preparation for C++20's context-sensitive header-name token
formation rules.

llvm-svn: 356530
2019-03-19 22:09:55 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Raphael Isemann b23ccecbb0 Misc typos fixes in ./lib folder
Summary: Found via `codespell -q 3 -I ../clang-whitelist.txt -L uint,importd,crasher,gonna,cant,ue,ons,orign,ned`

Reviewers: teemperor

Reviewed By: teemperor

Subscribers: teemperor, jholewinski, jvesely, nhaehnle, whisperity, jfb, cfe-commits

Differential Revision: https://reviews.llvm.org/D55475

llvm-svn: 348755
2018-12-10 12:37:46 +00:00
Erik Pilkington fa98390b3c NFC: Remove the ObjC1/ObjC2 distinction from clang (and related projects)
We haven't supported compiling ObjC1 for a long time (and never will again), so
there isn't any reason to keep these separate. This patch replaces
LangOpts::ObjC1 and LangOpts::ObjC2 with LangOpts::ObjC.

Differential revision: https://reviews.llvm.org/D53547

llvm-svn: 345637
2018-10-30 20:31:30 +00:00
Richard Smith 4e966e8135 Don't emit "will be treated as an identifier character" warning for
UTF-8 characters that aren't identifier characters in the current
language mode.

llvm-svn: 343040
2018-09-25 22:34:45 +00:00
Sam McCall 3d8051abb8 [CodeComplete] Add completions for filenames in #include directives.
Summary:
The dir component ("somedir" in #include <somedir/fo...>) is considered fixed.
We append "foo" to each directory on the include path, and then list its files.

Completions are of the forms:
 #include <somedir/fo^
                   foo.h>
                   fox/

The filter is set to the filename part ("fo"), so fuzzy matching can be
applied to the filename only.

No fancy scoring/priorities are set, and no information is added to
CodeCompleteResult to make smart scoring possible. Could be in future.

Reviewers: ilya-biryukov

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D52076

llvm-svn: 342449
2018-09-18 08:40:41 +00:00
Richard Smith 8ed7776bc4 PR38870: Add warning for zero-width unicode characters appearing in
identifiers.

llvm-svn: 341700
2018-09-07 19:25:39 +00:00
Adrian Prantl 9fc8faf9e6 Remove \brief commands from doxygen comments.
This is similar to the LLVM change https://reviews.llvm.org/D46290.

We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46320

llvm-svn: 331834
2018-05-09 01:00:01 +00:00
Richard Smith b5f8171a1b PR37189 Fix incorrect end source location and spelling for a split '>>' token.
When a '>>' token is split into two '>' tokens (in C++11 onwards), or (as an
extension) when we do the same for other tokens starting with a '>', we can't
just use a location pointing to the first '>' as the location of the split
token, because that would result in our miscomputing the length and spelling
for the token. As a consequence, for example, a refactoring replacing 'A<X>'
with something else would sometimes replace one character too many, and
similarly diagnostics highlighting a template-id source range would highlight
one character too many.

Fix this by creating an expansion range covering the first character of the
'>>' token, whose spelling is '>'. For this to work, we generalize the
expansion range of a macro FileID to be either a token range (the common case)
or a character range (used in this new case).

llvm-svn: 331155
2018-04-30 05:25:48 +00:00
Ilya Biryukov ef4ece75fd [CodeComplete] Fix completion in the middle of ident in ctor lists.
Summary:
The example that was broken before (^ designates completion points):

    class Foo {
      Foo() : fie^ld^() {} // no completions were provided here.
      int field;
    };

To fix it we don't cut off lexing after an identifier followed by code
completion token is lexed. Instead we skip the rest of identifier and
continue lexing.
This is consistent with behavior of completion when completion token is
right before the identifier.

Reviewers: sammccall, aaron.ballman, bkramer, sepavloff, arphaman, rsmith

Reviewed By: aaron.ballman

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D44932

llvm-svn: 330833
2018-04-25 15:13:34 +00:00
Ilya Biryukov b3510c4254 [CodeComplete] Fix completion at the end of keywords
Summary:
Make completion behave consistently no matter if it is run at the
start, in the middle or at the end of an identifier that happens to
be a keyword or a macro name. Since completion is often ran on
incomplete identifiers, they may turn into keywords by accident.

For example, we should produce same results for all of these
completion points:

    // ^ is completion point.
    ^class
    cla^ss
    class^

Previously clang produced different results for the last case (as if
the completion point was after a space: `class ^`).

This change also updates some offsets in tests that (unintentionally?)
relied on the old behavior.

Reviewers: sammccall, bkramer, arphaman, aaron.ballman

Reviewed By: sammccall

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D45887

llvm-svn: 330717
2018-04-24 13:48:53 +00:00
Alexander Kornienko 2a8c18d991 Fix typos in clang
Found via codespell -q 3 -I ../clang-whitelist.txt
Where whitelist consists of:

  archtype
  cas
  classs
  checkk
  compres
  definit
  frome
  iff
  inteval
  ith
  lod
  methode
  nd
  optin
  ot
  pres
  statics
  te
  thru

Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
files that have dubious fixes reverted.)

Differential revision: https://reviews.llvm.org/D44188

llvm-svn: 329399
2018-04-06 15:14:32 +00:00
Volodymyr Sapsai abb8dfc114 [Lex] Avoid out-of-bounds dereference in LexAngledStringLiteral.
Fix makes the loop in LexAngledStringLiteral more like the loops in
LexStringLiteral, LexCharConstant. When we skip a character after
backslash, we need to check if we reached the end of the file instead of
reading the next character unconditionally.

Discovered by OSS-Fuzz:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3832

rdar://problem/35572754

Reviewers: arphaman, kcc, rsmith, dexonsmith

Reviewed By: rsmith, dexonsmith

Subscribers: cfe-commits, rsmith, dexonsmith

Differential Revision: https://reviews.llvm.org/D41423

llvm-svn: 322390
2018-01-12 18:54:35 +00:00
Richard Smith 77091b167f Warn if we find a Unicode homoglyph for a symbol in an identifier.
Specifically, warn if:
 * we find a character that the language standard says we must treat as an
   identifier, and
 * that character is not reasonably an identifier character (it's a punctuation
   character or similar), and 
 * it renders identically to a valid non-identifier character in common
   fixed-width fonts.

Some tools "helpfully" substitute the surprising characters for the expected
characters, and replacing semicolons with Greek question marks is a common
"prank".

llvm-svn: 320697
2017-12-14 13:15:08 +00:00
Eugene Zelenko cb96ac64b0 [Lex] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 320207
2017-12-08 22:39:26 +00:00
Taewook Oh cebac48bf7 Stringizing raw string literals containing newline
Summary: This patch implements 4.3 of http://open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4220.pdf. If a raw string contains a newline character, replace each newline character with the \n escape code. Without this patch, included test case (macro_raw_string.cpp) results compilation failure.

Reviewers: rsmith, doug.gregor, jkorous-apple

Reviewed By: jkorous-apple

Subscribers: jkorous-apple, vsapsai, cfe-commits

Differential Revision: https://reviews.llvm.org/D39279

llvm-svn: 319904
2017-12-06 17:00:53 +00:00
Aaron Ballman c351fba69e Now that C++17 is official (https://www.iso.org/standard/68564.html), start changing the C++1z terminology over to C++17. NFC intended, these are all mechanical changes.
llvm-svn: 319688
2017-12-04 20:27:34 +00:00
Richard Smith edbf5972a4 [c++2a] P0515R3: lexer support for new <=> token.
llvm-svn: 319509
2017-12-01 01:07:10 +00:00
Alex Lorenz ebbbb81266 [refactor][extract] insert semicolons into extracted/inserted code
when needed

This commit implements the semicolon insertion logic into the extract
refactoring. The following rules are used:

- extracting expression: add terminating ';' to the extracted function.
- extracting statements that don't require terminating ';' (e.g. switch): add
  terminating ';' to the callee.
- extracting statements with ';':  move (if possible) the original ';' from the
  callee and add terminating ';'.
- otherwise, add ';' to both places.

Differential Revision: https://reviews.llvm.org/D39441

llvm-svn: 317343
2017-11-03 18:11:22 +00:00
Aaron Ballman 606093a53b Add -f[no-]double-square-bracket-attributes as new driver options to control use of [[]] attributes in all language modes. This is the initial implementation of WG14 N2165, which is a proposal to add [[]] attributes to C2x, but also allows you to enable these attributes in C++98, or disable them in C++11 or later.
llvm-svn: 315856
2017-10-15 15:01:42 +00:00
Alex Lorenz d5bf436d3a [Lex] Avoid out-of-bounds dereference in SkipLineComment
Credit to OSS-Fuzz for discovery:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3145

rdar://34526482

llvm-svn: 315785
2017-10-14 01:18:30 +00:00
Alex Lorenz c1e32fca96 A '<' with a trigraph '#' is not a valid editor placeholder
Credit to OSS-Fuzz for discovery:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3137#c5

rdar://34923985

llvm-svn: 315398
2017-10-11 00:41:20 +00:00
Cameron Desrochers 531aec2e90 Fixed unused variable warning introduced in r313796 causing build failure
llvm-svn: 313802
2017-09-20 19:37:37 +00:00
Cameron Desrochers 84fd064ef9 [PCH] Fixed preamble breaking with BOM presence (and particularly, fluctuating BOM presence)
This patch fixes broken preamble-skipping when the preamble region includes a byte order mark (BOM). Previously, parsing would fail if preamble PCH generation was enabled and a BOM was present.

This also fixes preamble invalidation when a BOM appears or disappears. This may seem to be an obscure edge case, but it happens regularly with IDEs that pass buffer overrides that never (or always) have a BOM, yet the underlying file from the initial parse that generated a PCH might (or might not) have a BOM.

I've included a test case for these scenarios.

Differential Revision: https://reviews.llvm.org/D37491

llvm-svn: 313796
2017-09-20 19:03:37 +00:00
Erich Keane e916d54614 [Preprocessor] Correct internal token parsing of newline characters in CRLF
Correct implementation:  Apparently I managed in r311683 to submit the wrong
version of the patch for this, so I'm correcting it now.

Differential Revision: https://reviews.llvm.org/D37079

llvm-svn: 312542
2017-09-05 17:32:36 +00:00
Erich Keane 5a2b322e0d [Preprocessor] Correct internal token parsing of newline characters in CRLF
Discovered due to a goofy git setup, the test system-headerline-directive.c 
(and a few others) failed because the token-consumption will consume only the 
'\r' in CRLF, making the preprocessor's printed value give the wrong line number 
when returning from an include. For example:

(line 1):#include <noline.h>\r\n

The "file exit" code causes the printer to try to print the 'returned to the 
main file' line. It looks up what the current line number is. However, since the 
current 'token' is the '\n' (since only the \r was consumed), it will give the 
line number as '1", not '2'. This results in a few failed tests, but more 
importantly, results in error messages being incorrect when compiling a 
previously preprocessed file.

Differential Revision: https://reviews.llvm.org/D37079

llvm-svn: 311683
2017-08-24 18:36:07 +00:00