literal operators. Also, for now, allow the proposed C++1y "il", "i", and "if"
suffixes too. (Will revert the latter if LWG decides not to go ahead with that
change after all.)
llvm-svn: 191274
Before this patch, Lex() would recurse whenever the current lexer changed (e.g.
upon entry into a macro). This patch turns the recursion into a loop: the
various lex routines now don't return a token when the current lexer changes,
and at the top level Preprocessor::Lex() now loops until it finds a token.
Normally, the recursion wouldn't end up being very deep, but the recursion depth
can explode in edge cases like a bunch of consecutive macros which expand to
nothing (like in the testcase test/Preprocessor/macro_expand_empty.c in this
patch).
<rdar://problem/14569770>
llvm-svn: 190980
Apparently, gcc's -traditional-cpp behaves slightly differently in C++ mode;
specifically, it discards "//" comments. Match gcc's behavior.
<rdar://problem/14808126>
llvm-svn: 189515
If the user has requested this warning, we should emit it, even if it's not
an extension in the current language mode. However, being an extension is
more important, so prefer the pedantic warning or the pedantic-compatibility
warning if those are enabled.
<rdar://problem/12922063>
llvm-svn: 189110
It's beneficial when compiling to treat // as the start of a line
comment even in -std=c89 mode, since it's not valid C code (with a few
rare exceptions) and is usually intended as such. We emit a pedantic
warning and then continue on as if line comments were enabled.
This has been our behavior for quite some time.
However, people use the preprocessor for things besides C source files.
In today's prompting example, the input contains (unquoted) URLs, which
contain // but should still be preserved.
This change instructs the lexer to treat // as a plain token if Clang is
in C90 mode and generating preprocessed output rather than actually compiling.
<rdar://problem/13338743>
llvm-svn: 176526
Add warnings under -Wc++11-compat, -Wc++98-compat, and -Wc99-compat when a
particular UCN is incompatible with a different standard, and -Wunicode when
a UCN refers to a surrogate character in C++03.
llvm-svn: 174788
Rewriting the same predicates over and over again is bad for code size and
code maintainence. Using the functions in <ctype.h> is generally unsafe
unless they are specified to be locale-independent (i.e. only isdigit and
isxdigit).
The next commit will try to clean up uses of <ctype.h> functions within Clang.
llvm-svn: 174765
This allows people to use Unicode in their #pragma mark and in macros
that exist only to be string-ized.
<rdar://problem/13107323&13121362>
llvm-svn: 174081
People use the C preprocessor for things other than C files. Some of them
have Unicode characters. We shouldn't warn about Unicode characters
appearing outside of identifiers in this case.
There's not currently a way for the preprocessor to tell if it's in -E mode,
so I added a new flag, derived from the PreprocessorOutputOptions. This is
only used by the Unicode warnings for now, but could conceivably be used by
other warnings or even behavioral differences later.
<rdar://problem/13107323>
llvm-svn: 173881
This is a missing piece for C99 conformance.
This patch handles UCNs by adding a '\\' case to LexTokenInternal and
LexIdentifier -- if we see a backslash, we tentatively try to read in a UCN.
If the UCN is not syntactically well-formed, we fall back to the old
treatment: a backslash followed by an identifier beginning with 'u' (or 'U').
Because the spelling of an identifier with UCNs still has the UCN in it, we
need to convert that to UTF-8 in Preprocessor::LookUpIdentifierInfo.
Of course, valid code that does *not* use UCNs will see only a very minimal
performance hit (checks after each identifier for non-ASCII characters,
checks when converting raw_identifiers to identifiers that they do not
contain UCNs, and checks when getting the spelling of an identifier that it
does not contain a UCN).
This patch also adds basic support for actual UTF-8 in the source. This is
treated almost exactly the same as UCNs except that we consider stray
Unicode characters to be mistakes and offer a fixit to remove them.
llvm-svn: 173369
uncovered.
This required manually correcting all of the incorrect main-module
headers I could find, and running the new llvm/utils/sort_includes.py
script over the files.
I also manually added quite a few missing headers that were uncovered by
shuffling the order or moving headers up to be main-module-headers.
llvm-svn: 169237
string literal needs cleaning (because it contains line-splicing in the
encoding prefix or in the ud-suffix), do not clean the section between the
double-quotes -- that's the "raw" bit!
llvm-svn: 168776
This makes LexCharConstant() look more like LexStringLiteral(), which doesn't
have this bug. Add tests for eof after \ for several other cases.
llvm-svn: 168269
undefined behaviour, and move the diagnostic for '' from an Error into
an ExtWarn in this group. This is important for some users of the preprocessor,
and is necessary for gcc compatibility.
llvm-svn: 159335
* Removed docs for Lexer::makeFileCharRange from Lexer.cpp, as they're in
the header file;
* Reworked the documentation for SkipBlockComment so that it doesn't confuse
Doxygen's comment parsing;
* Added another summary with \brief markup.
llvm-svn: 158618
1. Teach Lexer that pragma lexers are like macro expansions at EOF.
2. Treat pragmas like #define/#undef when printing.
3. If we just printed a directive, add a newline before any more tokens.
(4. Miscellaneous cleanup in PrintPreprocessedOutput.cpp)
PR10594 and <rdar://problem/11562490> (two separate related problems)
llvm-svn: 158571
modes. For languages other than C99/C11, this isn't quite a conforming
extension, and for C++11, it breaks some reasonable code containing
user-defined literals.
In languages which don't officially have hexfloats, pare back this extension
to only apply in cases where the token starts 0x and does not contain an
underscore. The extension is still not quite conforming, but it's a lot closer
now.
llvm-svn: 158487
This was a problem for people who write 'return(result);'
Also fix ARCMT's corresponding code, though there's no test case for this
because implicit casts like this are rejected by the migrator for being
ambiguous, and explicit casts have no problem.
<rdar://problem/11577346>
llvm-svn: 158130
starting with an underscore is ill-formed.
Since this rule rejects programs that were using <inttypes.h>'s macros, recover
from this error by treating the ud-suffix as a separate preprocessing-token,
with a DefaultError ExtWarn. The approach of treating such cases as two tokens
is under discussion for standardization, but is in any case a conforming
extension and allows existing codebases to keep building while the committee
makes up its mind.
Reword the warning on the definition of literal operators not starting with
underscores (which are, strangely, legal) to more explicitly state that such
operators can't be called by literals. Remove the special-case diagnostic for
hexfloats, since it was both triggering in the wrong cases and incorrect.
llvm-svn: 152287
grammar requires a string-literal and not a user-defined-string-literal. The
two constructs are still represented by the same TokenKind, in order to prevent
a combinatorial explosion of different kinds of token. A flag on Token tracks
whether a ud-suffix is present, in order to prevent clients from needing to look
at the token's spelling.
llvm-svn: 152098
re-computed rather than the variables be re-used just after the assert.
Just use the variables since we have them already. Fixes an unused
variable warning.
Also fix an 80-column violation.
llvm-svn: 148212
This also adds a -Wc++98-compat-pedantic for warning on constructs which would
be diagnosed by -std=c++98 -pedantic (that is, it warns even on C++11 features
which we enable by default, with no warning, in C++98 mode).
llvm-svn: 142034
Previously we would cut off the source file buffer at the code-completion
point; this impeded code-completion inside C++ inline methods and,
recently, with buffering ObjC methods.
Have the code-completion inserted into the source buffer so that it can
be buffered along with a method body. When we actually hit the code-completion
point the cut-off lexing or parsing.
Fixes rdar://10056932&8319466
llvm-svn: 139086
The function was only counting lines that included tokens and not empty lines,
but MaxLines (mainly initiated to the line where the code-completion point resides)
is a count of overall lines (even empty ones).
llvm-svn: 139085
loads the named module. The syntax itself is intentionally hideous and
will be replaced at some later point with something more
palatable. For now, we're focusing on the semantics:
- Module imports are handled first by the preprocessor (to get macro
definitions) and then the same tokens are also handled by the parser
(to get declarations). If both happen (as in normal compilation),
the second one is redundant, because we currently have no way to
hide macros or declarations when loading a module. Chris gets credit
for this mad-but-workable scheme.
- The Preprocessor now holds on to a reference to a module loader,
which is responsible for loading named modules. CompilerInstance is
the only important module loader: it now knows how to create and
wire up an AST reader on demand to actually perform the module load.
- We search for modules in the include path, using the module name
with the suffix ".pcm" (precompiled module) for the file name. This
is a temporary hack; we hope to improve the situation in the
future.
llvm-svn: 138679
etc. With this I think essentially all of the SourceManager APIs are
converted. Comments and random other bits of cleanup should be all thats
left.
llvm-svn: 136057
and various other 'expansion' based terms. I've tried to reformat where
appropriate and catch as many references in comments but I'm going to do
several more passes. Also I've tried to expand parameter names to be
more clear where appropriate.
llvm-svn: 136056