spelled (#pragma, _Pragma, __pragma). In -E mode, use that information
to add appropriate newlines when translating _Pragma and __pragma into
#pragma, like GCC does. Fixes <rdar://problem/8412013>.
llvm-svn: 113553
The extra data stored on user-defined literal Tokens is stored in extra
allocated memory, which is managed by the PreprocessorLexer because there isn't
a better place to put it that makes sure it gets deallocated, but only after
it's used up. My testing has shown no significant slowdown as a result, but
independent testing would be appreciated.
llvm-svn: 112458
__debug overflow_stack'.
- For testing crash reporting stuff... you'd think I could just use some C++
code but Doug keeps fixing stuff!
llvm-svn: 109587
reparsing an ASTUnit. When saving a preamble, create a buffer larger
than the actual file we're working with but fill everything from the
end of the preamble to the end of the file with spaces (so the lexer
will quickly skip them). When we load the file, create a buffer of the
same size, filling it with the file and then spaces. Then, instruct
the lexer to start lexing after the preamble, therefore continuing the
parse from the spot where the preamble left off.
It's now possible to perform a simple preamble build + parse (+
reparse) with ASTUnit. However, one has to disable a bunch of checking
in the PCH reader to do so. That part isn't committed; it will likely
be handled with some other kind of flag (e.g., -fno-validate-pch).
As part of this, fix some issues with null termination of the memory
buffers created for the preamble; we were trying to explicitly
NULL-terminate them, even though they were also getting implicitly
NULL terminated, leading to excess warnings about NULL characters in
source files.
llvm-svn: 109445
is present.
Rather than using clang_getCursorExtent(), which requires
us to lex the token at the ending position to determine its
length. Then, we'd be comparing [a, b) source ranges that cover the
characters in the range rather than the normal behavior for Clang's
source ranges, which covers the tokens in the range. However, relexing
causes us to read the source file (which may come from a precompiled
header), which is rather unfortunate and affects performance.
In the new scheme, we only use Clang-style source ranges that cover
the tokens in the range. At the entry points where this matters
(clang_annotateTokens, clang_getCursor), we make sure to move source
locations to the start of the token.
Addresses most of <rdar://problem/8049381>.
llvm-svn: 109134
which is the part of the file that contains all of the initial
comments, includes, and preprocessor directives that occur before any
of the actual code. Added a new -print-preamble cc1 action that is
only used for testing.
llvm-svn: 108913
When loading the PCH, IdentifierInfos that are associated with pragmas cause declarations that use these identifiers to be deserialized (e.g. the "clang" pragma causes the "clang" namespace to be loaded).
We can avoid this if we just use StringRefs for the pragmas.
As a bonus, since we don't have to create and pass IdentifierInfos, the pragma interfaces get a bit more simplified.
llvm-svn: 108237
In the case of backtracking, the cached token lexer will be the only
lexer on the stack, without this the token stack will be empty and EOF
won't be returned.
This fixes PR7072.
llvm-svn: 108124
Fix string concatenation to treat escapes in concatenated strings that
are wide because of other string chunks to process the escapes as wide
themselves. Before we would warn about and miscompile the attached testcase.
This fixes rdar://8040728 - miscompile + warning: hex escape sequence out of range
llvm-svn: 106012
diagnostics. That would be while we're parsing string literals for the
sole purpose of producing a diagnostic about them. Fixes
<rdar://problem/8026030>.
llvm-svn: 104684
1) Suppress diagnostics as soon as we form the code-completion
token, so we don't get any error/warning spew from the early
end-of-file.
2) If we consume a code-completion token when we weren't expecting
one, go into a code-completion recovery path that produces the best
results it can based on the context that the parser is in.
llvm-svn: 104585
make it miss (invalid) things like:
<<<<<<<
>>>>>>>
and crash if
<<<<<<<
was at the end of the line. When we find a >>>>>>> that is not at the
end of the line, make sure to reset Pos so we don't crash on something
like:
<<<<<<< >>>>>>>
This isn't worth making testcases for, since each would require a new file.
rdar://7987078 - signal 11 compiling "<<<<<<<<<<"
llvm-svn: 103968
in an input file like this:
# 42
int x;
we were emitting:
# <something>
int x;
(with a space before the int) because we weren't clearing the leading
whitespace flag properly after the \n from the directive was handled.
llvm-svn: 101084
instantiations when we have the corresponding macro definition and by
removing macro definition information from our table when the macro is
undefined.
llvm-svn: 99004
record (which includes all macro instantiations and definitions). As
with all lay deserialization, this introduces a new external source
(here, an external preprocessing record source) that loads all of the
preprocessed entities prior to iterating over the entities.
The preprocessing record is an optional part of the precompiled header
that is disabled by default (enabled with
-detailed-preprocessing-record). When the preprocessor given to the
PCH writer has a preprocessing record, that record is written into the
PCH file. When the PCH reader is given a PCH file that contains a
preprocessing record, it will be lazily loaded (which, effectively,
implicitly adds -detailed-preprocessing-record). This is the first
case where we have sections of the precompiled header that are
added/removed based on a compilation flag, which is
unfortunate. However, this data consumes ~550k in the PCH file for
Cocoa.h (out of ~9.9MB), and there is a non-trivial cost to gathering
this detailed preprocessing information, so it's too expensive to turn
on by default. In the future, we should investigate a better encoding
of this information.
llvm-svn: 99002
eliminating the extra PopulatePreprocessingRecord object. This will
become useful once we start writing the preprocessing record to
precompiled headers.
llvm-svn: 98966
preprocessing record. Use that link with clang_getCursorReferenced()
and clang_getCursorDefinition() to match instantiations of a macro to
the definition of the macro.
llvm-svn: 98842
the macro definitions and macro instantiations that are found
during preprocessing. Preprocessing records are *not* generated by
default; rather, we provide a PPCallbacks subclass that hooks into the
existing callback mechanism to record this activity.
The only client of preprocessing records is CIndex, which keeps track
of macro definitions and instantations so that they can be exposed via
cursors. At present, only token annotation uses these facilities, and
only for macro instantiations; both will change in the near
future. However, with this change, token annotation properly annotates
macro instantiations that do not produce any tokens and instantiations
of macros that are later undef'd, improving our consistency.
Preprocessing directives that are not macro definitions are still
handled by clang_annotateTokens() via re-lexing, so that we don't have
to track every preprocessing directive in the preprocessing record.
Performance impact of preprocessing records is still TBD, although it
is limited to CIndex and therefore out of the path of the main compiler.
llvm-svn: 98836
SourceManager's getBuffer() and, therefore, could fail, along with
Preprocessor::getSpelling(). Use the Invalid parameters in the literal
parsers (string, floating point, integral, character) to make them
robust against errors that stem from, e.g., PCH files that are not
consistent with the underlying file system.
I still need to audit every use caller to all of these routines, to
determine which ones need specific handling of error conditions.
llvm-svn: 98608
SourceManager's getBuffer() (and similar) operations. This abstract
can be used to force callers to cope with errors in getBuffer(), such
as missing files and changed files. Fix a bunch of callers to use the
new interface.
Add some very basic checks for file consistency (file size,
modification time) into ContentCache::getBuffer(), although these
checks don't help much until we've updated the main callers (e.g.,
SourceManager::getSpelling()).
llvm-svn: 98585
guard macro is already defined for the first occurrence of the
header. If it is, the body will be skipped and not be properly
analyzed for the include guard optimization.
llvm-svn: 95972
doing so invalidates the file guard optimization and is not
in the spirit of "#if 0" because it is supposed to completely
skip everything, even if it isn't lexically valid. Patch by
Abramo Bagnara!
llvm-svn: 95253
region of interest (if provided). Implement clang_getCursor() in terms
of this traversal rather than using the Index library; the unified
cursor visitor is more complete, and will be The Way Forward.
Minor other tweaks needed to make this work:
- Extend Preprocessor::getLocForEndOfToken() to accept an offset
from the end, making it easy to move to the last character in the
token (rather than just past the end of the token).
- In Lexer::MeasureTokenLength(), the length of whitespace is zero.
llvm-svn: 94200
disabled with the intent that users can start with them now and not have to change
a thing to have them work when we implement the features.
llvm-svn: 93312
incompatible with user-defined literals, specifically with the following form:
0x1p+1
The preprocessing-number token extends only as far as the 'p'; the '+' is not
included. Previously we could get away with this extension as p was an invalid
suffix, but now with user-defined literals, 'p' might well be a valid suffix
and we are forced to consider it as such.
This patch also adds a warning in non-0x C++ modes telling the user that
this extension is incompatible with C++0x that is enabled by default
(previously and with other languages, we warn only with a compliance
option such as -pedantic).
llvm-svn: 93135
definitions from a precompiled header. This ensures that
code-completion with macro names behaves the same with or without
precompiled headers.
llvm-svn: 92497
1. Don't make a copy of LangOptions every time a lexer is created.
2. Don't make CharInfo global mutable state.
3. Fix the implementation to properly treat ^Z as EOF instead of as
horizontal whitespace, which matches the semantic implemented by VC++.
llvm-svn: 91586
on PR5610 (2.185 -> 2.130s). The big issue is that this is making
insanely huge macro argument lists with over a million tokens in it.
The reason that mallco and free are so expensive is that we are
actually going to the kernel to get it, and switching to a bump
pointer allocator won't change this in an interesting way.
llvm-svn: 91449
We creating and free thousands of MacroArgs objects (and the related
std::vectors hanging off them) for the testcase in PR5610 even though
there are only ~20 live at a time. This doesn't actually use the
cache yet.
llvm-svn: 91391
expanding directives withing macro expansions. This is undefined behavior
according to 6.10.3p11, so we might as well be undefined in ways similar to
GCC.
llvm-svn: 91266
files with the contents of an arbitrary memory buffer. Use this new
functionality to drastically clean up the way in which we handle file
truncation for code-completion: all of the truncation/completion logic
is now encapsulated in the preprocessor where it belongs
(<rdar://problem/7434737>).
llvm-svn: 90300
in diagnostics when we fail to open a file. This allows us to
report things like:
$ clang test.c -I.
test.c:2:10: fatal error: error opening file './foo.h': Permission denied
#include "foo.h"
^
llvm-svn: 90276
stat a file but where mmaping it fails. In this case, we emit an
error like:
t.c:1:10: fatal error: error opening file '../../foo.h'
instead of "cannot find file".
llvm-svn: 90110
annotation token, because some of the tokens we're annotating might
not be in the set of cached tokens (we could have consumed them
unconditionally).
Also, move the tentative parsing from ParseTemplateTemplateArgument
into the one caller that needs it, improving recovery.
llvm-svn: 86904
a little fuzzy, but conceptually it's just uniquing the identifier.
Chris, please review. I debated splitting into const/non-const versions where
the const one propogated constness to the resulting IdentifierInfo*.
llvm-svn: 86106
only supporting a single stat cache. The immediate benefit of this
change is that we can now generate a PCH/AST file when including
another PCH file; in the future, the chain of stat caches will likely
be useful with multiple levels of PCH files.
llvm-svn: 84263
-code-completion-at=filename:line:column
which performs code completion at the specified location by truncating
the file at that position and enabling code completion. This approach
makes it possible to run multiple tests from a single test file, and
gives a more natural command-line interface.
llvm-svn: 82571
essence, code completion is triggered by a magic "code completion"
token produced by the lexer [*], which the parser recognizes at
certain points in the grammar. The parser then calls into the Action
object with the appropriate CodeCompletionXXX action.
Sema implements the CodeCompletionXXX callbacks by performing minimal
translation, then forwarding them to a CodeCompletionConsumer
subclass, which uses the results of semantic analysis to provide
code-completion results. At present, only a single, "printing" code
completion consumer is available, for regression testing and
debugging. However, the design is meant to permit other
code-completion consumers.
This initial commit contains two code-completion actions: one for
member access, e.g., "x." or "p->", and one for
nested-name-specifiers, e.g., "std::". More code-completion actions
will follow, along with improved gathering of code-completion results
for the various contexts.
[*] In the current -code-completion-dump testing/debugging mode, the
file is truncated at the completion point and EOF is translated into
"code completion".
llvm-svn: 82166
declaration in the AST.
The new ASTContext::getCommentForDecl function searches for a comment
that is attached to the given declaration, and returns that comment,
which may be composed of several comment blocks.
Comments are always available in an AST. However, to avoid harming
performance, we don't actually parse the comments. Rather, we keep the
source ranges of all of the comments within a large, sorted vector,
then lazily extract comments via a binary search in that vector only
when needed (which never occurs in a "normal" compile).
Comments are written to a precompiled header/AST file as a blob of
source ranges. That blob is only lazily loaded when one requests a
comment for a declaration (this never occurs in a "normal" compile).
The indexer testbed now supports comment extraction. When the
-point-at location points to a declaration with a Doxygen-style
comment, the indexer testbed prints the associated comment
block(s). See test/Index/comments.c for an example.
Some notes:
- We don't actually attempt to parse the comment blocks themselves,
beyond identifying them as Doxygen comment blocks to associate them
with a declaration.
- We won't find comment blocks that aren't adjacent to the
declaration, because we start our search based on the location of
the declaration.
- We don't go through the necessary hops to find, for example,
whether some redeclaration of a declaration has comments when our
current declaration does not. Similarly, we don't attempt to
associate a \param Foo marker in a function body comment with the
parameter named Foo (although that is certainly possible).
- Verification of my "no performance impact" claims is still "to be
done".
llvm-svn: 74704
with dos style newlines. I have a trivial test for this:
// RUN: clang-cc %s -verify
#define test(x, y) \
x ## y
but I don't know how to get svn to not change newlines and testrunner
doesn't work with dos style newlines either, so "not worth it". :)
rdar://6994000
llvm-svn: 73945
line, and when the pragma is at the end of a file. In this case, the last
token consumed could pop the lexer, invalidating CurPPLexer. Thanks to
Peter Thoman for pointing it out.
llvm-svn: 73689
registered when PCH wasn't being used. We should always install (in BuiltinInfo)
information about target-specific builtins, but we shouldn't register any builtin
identifier infos. This fixes the build of apps that use PCH and target specific
builtins together.
llvm-svn: 73492
builtin preprocessor macro. This appears to work with two caveats:
1) builtins are registered in -E mode, and 2) target-specific builtins
are unconditionally registered even if they aren't supported by the
target (e.g. SSE4 builtin when only SSE1 is enabled).
llvm-svn: 73289
diagnostic to include the full instantiation location for the
invalid paste. For:
#define foo(a, b) a ## b
#define bar(x) foo(x, ])
bar(a)
bar(zdy)
Instead of:
t.c:3:22: error: pasting formed 'a]', an invalid preprocessing token
#define foo(a, b) a ## b
^
t.c:3:22: error: pasting formed 'zdy]', an invalid preprocessing token
we now produce:
t.c:7:1: error: pasting formed 'a]', an invalid preprocessing token
bar(a)
^
t.c:4:16: note: instantiated from:
#define bar(x) foo(x, ])
^
t.c:3:22: note: instantiated from:
#define foo(a, b) a ## b
^
t.c:8:1: error: pasting formed 'zdy]', an invalid preprocessing token
bar(zdy)
^
t.c:4:16: note: instantiated from:
#define bar(x) foo(x, ])
^
t.c:3:22: note: instantiated from:
#define foo(a, b) a ## b
^
llvm-svn: 72519
1. When we accept "#garbage" in asm-with-cpp mode, change the token kind
of the # to unknown so that the preprocessor won't try to process it as
a real #. This fixes a crash on the attached example
2. Fix macro definition extents processing to handle #foo at the end of a
macro to say the definition ends with the foo, not the #.
This is a follow-on fix to r72283, and rdar://6916026
llvm-svn: 72388
two empty arguments. Also, add an assert so that this bug
manifests as an assertion failure, not a valgrind problem.
This fixes rdar://6880648 - [cpp] crash in ArgNeedsPreexpansion
llvm-svn: 71616
that if we're going to print an extension warning anyway,
there's no point to changing behavior based on NoExtensions: it will
only make error recovery worse.
Note that this doesn't cause any behavior change because NoExtensions
isn't used by the current front-end. I'm still considering what to do about
the remaining use of NoExtensions in IdentifierTable.cpp.
llvm-svn: 70273
PCH file. In the Cocoa-prefixed "Hello, World" benchmark, this takes
us from reading 503 identifiers down to 37 and from 470 macros down to
4. It also results in an 8% performance improvement.
llvm-svn: 70094
will let us test for multiple different warning modes in the same
file in regression tests.
This implements rdar://2362963, a 10-year old feature request :)
llvm-svn: 69560
support it. I don't know what evaluation method we use for complex
arithmetic, so I don't know whether/if we should warn about use of
CX_LIMITED_RANGE.
This concludes my planned hacking on STDC pragmas, flame away :)
llvm-svn: 69556
in a function-like macro body. This has the added bonus of moving some
function-like macro specific code out of the object-like macro codepath.
llvm-svn: 69530
as decimal, even if it starts with 0. Also, since things like 0x1 are
completely illegal, don't even bother using numericliteralparser for them.
llvm-svn: 69454
Highlights: PP::isNextPPTokenLParen() no longer eats the (
when present. We now simplify slightly the logic parsing
macro arguments. We now handle PR3937 and other related cases
correctly.
llvm-svn: 69411
This allows it to accurately measure tokens, so that we get:
t.cpp:8:13: error: unknown type name 'X'
static foo::X P;
~~~~~^
instead of the woefully inferior:
t.cpp:8:13: error: unknown type name 'X'
static foo::X P;
~~~~ ^
Most of this is just plumbing to push the reference around.
llvm-svn: 69099
t.c:3:8: warning: extra tokens at end of #endif directive
#endif foo
^
//
Don't do this in strict-C89 mode because bcpl comments aren't
valid there, and it is too much trouble to analyze whether
C block comments are safe.
llvm-svn: 69024
buffer generated for the current translation unit. If they are
different, complain and then ignore the PCH file. This effectively
checks for all compilation options that somehow would affect
preprocessor state (-D, -U, -include, the dreaded -imacros, etc.).
When we do accept the PCH file, throw away the contents of the
predefines buffer rather than parsing them, since all of the results
of that parsing are already stored in the PCH file. This eliminates
the ugliness with the redefinition of __builtin_va_list, among other
things.
llvm-svn: 68838
PCH. This works now, except for limitations not being able to do things
with identifiers. The basic example in the testcase works though.
llvm-svn: 68832
into clang-cc.cpp. This makes it so clang-cc constructs the *entire* predefines
buffer, not just half of it. A bonus of this is that we get to kill a copy
of DefineBuiltinMacro.
llvm-svn: 68830
improvement, source locations read from the PCH file will properly
resolve to the source files that were used to build the PCH file
itself.
Once we have the preprocessor state stored in the PCH file, source
locations that refer to macro instantiations that occur in the PCH
file should have the appropriate instantiation information.
llvm-svn: 68758
- Add -static-define option driver can use when __STATIC__ should be
defined (instead of __DYNAMIC__).
- Don't set __OPTIMIZE_SIZE__ on Os, __OPTIMIZE_SIZE__ is tied to Oz.
- Set __NO_INLINE__ following GCC 4.2.
- Set __GNU_GNU_INLINE__ or __GNU_STDC_INLINE__ following GCC 4.2.
- Set __EXCEPTIONS for Objective-C NonFragile ABI.
- Set __STRICT_ANSI__ for standard conforming modes.
- I added a clang style test case in utils for this, but its not
particularly portable and I don't think it belongs in the test
suite.
llvm-svn: 68621
- Add -pic-level clang-cc option to specify the value for the define,
updated driver to pass this.
- Added __pic__
- Added OBJC_ZEROCOST_EXCEPTIONS define while I was here (to match gcc).
llvm-svn: 68584
and are even set in C mode. As such, move them to Targets.cpp.
__OBJC_GC__ is also darwin specific, but seems reasonable to always
define it when in objc-gc mode.
This fixes rdar://6761450
llvm-svn: 68494
Eventually, would be nice to be able to run these modifications even
when we don't want the warning or errors for the actual diagnostic.
llvm-svn: 68272
From a front-end perspective, I believe this code should work for ObjC @-strings. At the moment, I believe we need to tweak the code generation for @-strings (which doesn't appear to handle them). Will be investigating.
llvm-svn: 68076
- Temporarily undef'ed __OBJC2__ in nonfragile objc abi mode
as it was forcing ivar synthesis in a certain project which clang
does not yet support.
llvm-svn: 67766
terminated with an EOF token. The condition it is trying to check for is
handled by this code above.
// Empty arguments are standard in C99 and supported as an extension in
// other modes.
if (ArgTokens.empty() && !Features.C99)
Diag(Tok, diag::ext_empty_fnmacro_arg);
llvm-svn: 67705
- Make the Diagnostic::Level for PTH errors to be specified by the caller
clang (driver):
- Set the PTHManager diagnostic level to "Diagnostic::Error" for -include-pth
(a hard error) and Diagnostic::Warning for -token-cache (we can still
proceed).
llvm-svn: 67462
Add a #include directive around the command line buffer so that
diagnostics generated from -include directives get diagnostics
like:
In file included from <built-in>:98:
In file included from <command line>:3:
./t.h:2:1: warning: type specifier missing, defaults to 'int'
b;
^
llvm-svn: 67396
and the token after the # should be expanded if it is not a valid directive.
This allows us to transform things like:
#define FOO BAR
# FOO
into # BAR, even though FOO is not normally expanded for directives.
This should fix PR3833
llvm-svn: 67236
diagnostics. This builds on the patch that Sebastian committed and
then revert. Major differences are:
- We don't remove or use the current ".def" files. Instead, for now,
we just make sure that we're building the ".inc" files.
- Fixed CMake makefiles to run TableGen and build the ".inc" files
when needed. Tested with both the Xcode and Makefile generators
provided by CMake, so it should be solid.
- Fixed normal makefiles to handle out-of-source builds that involve
the ".inc" files.
I'll send a separate patch to the list with Sebastian's changes that
eliminate the use of the .def files.
llvm-svn: 67058
to being allocated from the same bumpptr that the MacroInfo objects
themselves are.
This speeds up -Eonly cocoa.h pth by ~4%, fsyntax-only is barely measurable.
llvm-svn: 65195
escapes in the string for subtoken positioning. This gives
us working examples like:
t.m:5:16: warning: field width should have type 'int', but argument has type 'unsigned int'
printf("\n\n%*d", (unsigned) 1, 1);
^ ~~~~~~~~~~~~
where before the caret pointed two spaces to the left.
llvm-svn: 64940
We now emit:
t.m:6:15: warning: field width should have type 'int', but argument has type 'unsigned int'
printf(STR, (unsigned) 1, 1);
^ ~~~~~~~~~~~~
t.m:3:18: note: instantiated from:
#define STR "abc%*ddef"
^
which has the correct location in the string literal in the note line.
llvm-svn: 64936
*end* of a macro instantiation, not the start of it. This is
really all about bug-for-bug compatibility with GCC, but not
doing this breaks the FreeBSD kernel.
llvm-svn: 64604
Now instead of just tracking the expansion history, also track the full
range of the macro that got replaced. For object-like macros, this doesn't
change anything. For _Pragma and function-like macros, this means we track
the locations of the ')'.
This is required for PR3579 because apparently GCC uses the line of the ')'
of a function-like macro as the location to expand __LINE__ to.
llvm-svn: 64601
a target.
Make Preprocessor.cpp define a new __INTPTR_TYPE__ macro based on this.
On linux/32, set intptr_t to int, instead of long. This fixes PR3563.
llvm-svn: 64495
wine sources. This was happening because HighlightMacros was
calling EnterMainFile multiple times on the same preprocessor
object and getting an assert due to the new #line stuff (the
file in question was bison output with #line directives).
The fix for this is to not reenter the file. Instead,
relex the tokens in raw mode, swizzle them a bit and repreprocess
the token stream. An added bonus of this is that rewrite macros
will now hilight the macro definition as well as its uses. Woo.
llvm-svn: 64480
to use this stat information in the PTH file using a 'StatSysCallCache' object.
Performance impact (Cocoa.h, PTH):
- number of stat calls reduces from 1230 to 425
- fsyntax-only: time improves by 4.2%
We can reduce the number of stat calls to almost zero by caching negative stat
calls and directory stat calls in the PTH file as well.
llvm-svn: 64353
actually *slightly* slower than the binary search. Since this is algorithmically
better, further performance tuning should be able to make this faster.
llvm-svn: 64326
line markers, including maintenance of the virtual include stack.
For something like this:
# 42 "bar.c" 1
# 142 "bar2.c" 1
#warning zappa
# 92 "bar.c" 2
#warning gonzo
# 102 "foo.c" 2
#warning bonkta
we now produce these three warnings:
#1:
In file included from foo.c:3:
In file included from bar.c:42:
bar2.c:143:2: warning: #warning zappa
#warning zappa
^
#2:
In file included from foo.c:3:
bar.c:92:2: warning: #warning gonzo
#warning gonzo
^
#3:
foo.c:102:2: warning: #warning bonkta
#warning bonkta
^
llvm-svn: 63722
.def file for each library. This means that adding a diagnostic
to sema doesn't require all the other libraries to be rebuilt.
Patch by Anders Johnsen!
llvm-svn: 63111
as reported to the user and as manipulated by #line. This is what __FILE__,
__INCLUDE_LEVEL__, diagnostics and other things should follow (but not
dependency generation!).
This patch also includes several cleanups along the way:
- SourceLocation now has a dump method, and several other places
that did similar things now use it.
- I cleaned up some code in AnalysisConsumer, but it should probably be
simplified further now that NamedDecl is better.
- TextDiagnosticPrinter is now simplified and cleaned up a bit.
This patch is a prerequisite for #line, but does not actually provide
any #line functionality.
llvm-svn: 63098
Performance impact for -fsyntax-only on Cocoa.h (with Cocoa.h in the PTH file):
- PTH generation time improves by 5%
- PTH reading improves by 0.3%.
llvm-svn: 63072
instantiation history in an effort to speed up c99-intconst-1.c.
Now that multiple nested instantiations are allowed, we just
make them and don't pay the cost of lookups. With the other
changes that went in before this, reverting this is actually
a speedup for c99-intconst-1.c, speeding it up from 1.96s to 1.80s,
and preserves much better loc info.
llvm-svn: 63036
Token now has a class of kinds for "literals", which include
numeric constants, strings, etc. These tokens can optionally have
a pointer to the start of the token in the lexer buffer. This
makes it faster to get spelling and do other gymnastics, because we
don't have to go through source locations.
This change is performance neutral, but will make other changes
more feasible down the road.
llvm-svn: 63028
This reduces fsyntax-only time on c99-intconst-1.c from 2.43s down to
2.01s (20%), reducing the number of fileid lookups from 2529040 linear
and 64771121 binary to 5625902 linear and 4151182 binary.
This knocks getFileID down to only 4.6% of compile time on this testcase.
At this point, malloc/free is over 35% of compile time, primarily allocating
MacroArgs objects and their argument preexpansion vectors.
I don't feel like malloc avoiding right now, so I'm just going to call
this good.
llvm-svn: 62994
of a macro. Since these tokens may themselves be from macro
expansions, we need to resolve down to the spelling loc when the
macro ends up being instantiated. Instead of resolving this for
each token expanded from the macro definition, just do it once when
the macro is defined. This speeds up clang on c99-intconst-1.c from
2.66s to 2.43s (9.5%), reducing the FileID lookups from 407244 linear and
114175649 binary to 2529040 linear and 64771121 binary.
llvm-svn: 62993
per token lexed from it. This speeds up clang on c99-intconst-1.c from
the GCC testsuite from 3.64s to 2.66s (36%). This reduces the number of
binary search FileID lookups from 251570522 to 114175649 on this testcase.
llvm-svn: 62992
ground work for implementing #line, and fixes the "out of macro ID's"
problem.
There is nothing particularly tricky about the code, other than the
very performance sensitive SourceManager::getFileID() method.
llvm-svn: 62978
Refactor how the preprocessor changes a token from being an tok::identifier to a
keyword (e.g. tok::kw_for). Instead of doing this in HandleIdentifier, hoist this
common case out into the caller, so that every keyword doesn't have to go through
HandleIdentifier. This drops time in HandleIdentifier from 1.25ms to .62ms, and
speeds up clang -Eonly with PTH by about 1%.
llvm-svn: 62855
tells us whether Preprocessor::HandleIdentifier needs to be called.
Because this method is only rarely needed, this saves a call and a
bunch of random checks. This drops the time in HandleIdentifier
from 3.52ms to .98ms on cocoa.h on my machine.
llvm-svn: 62675
Changes to IdentifierTable:
- High-level summary: StringMap never owns IdentifierInfos. It just
references them.
- The string map now has StringMapEntry<IdentifierInfo*> instead of
StringMapEntry<IdentifierInfo>. The IdentifierInfo object is
allocated using the same bump pointer allocator as used by the
StringMap.
Changes to IdentifierInfo:
- Added an extra pointer to point to the
StringMapEntry<IdentifierInfo*> in the string map. This pointer
will be null if the IdentifierInfo* is *only* used by the PTHLexer
(that is it isn't in the StringMap).
Algorithmic changes:
- Non-PTH case:
IdentifierInfo::get() will always consult the StringMap first to
see if we have an IdentifierInfo object. If that StringMapEntry
references a null pointer, we allocate a new one from the BumpPtrAllocator
and update the reference in the StringMapEntry.
- PTH case:
We do the same lookup as with the non-PTH case, but if we don't get
a hit in the StringMap we do a secondary lookup in the PTHManager for
the IdentifierInfo. If we don't find an IdentifierInfo we create a
new one as in the non-PTH case. If we do find and IdentifierInfo
in the PTHManager, we update the StringMapEntry to refer to it so
that the IdentifierInfo will be found on the next StringMap lookup.
This way we only do a binary search in the PTH file at most once
for a given IdentifierInfo. This greatly speeds things up for source
files containing a non-trivial amount of code.
Performance impact:
While these changes do add some extra indirection in
IdentifierTable to access an IdentifierInfo*, I saw speedups even
in the non-PTH case as well.
Non-PTH: For -fsyntax-only on Cocoa.h, we see a 6% speedup.
PTH (with Cocoa.h in token cache): 11% speedup.
I also did an experiment where we did -fsyntax-only on a source file
including a large header and Cocoa.h, but the token cache did not
contain the larger header. For this file, we were seeing a performance
*regression* when using PTH of 3% over non-PTH. Now we are seeing
a performance improvement of 9%!
Tests:
The serialization tests are now failing. I looked at this extensively,
and I my belief is that this change is unmasking a bug rather than
introducing a new one. I have disabled the serialization tests for now.
llvm-svn: 62636
the chunk ID not the file ID. This exposes problems in
TextDiagnosticPrinter where it should have been using the canonical
file ID but wasn't. Fix these along the way.
llvm-svn: 62427
method. This lets us clean up the interface and make it more obvious that
this method is *really really* _Pragma specific.
Note that _Pragma handling uglifies the Lexer in the critical path. It would
be very interesting to consider making _Pragma remapping be a new special
lexer class of its own.
llvm-svn: 62425
"FileID" a concept that is now enforced by the compiler's type checker
instead of yet-another-random-unsigned floating around.
This is an important distinction from the "FileID" currently tracked by
SourceLocation. *That* FileID may refer to the start of a file or to a
chunk within it. The new FileID *only* refers to the file (and its
#include stack and eventually #line data), it cannot refer to a chunk.
FileID is a completely opaque datatype to all clients, only SourceManager
is allowed to poke and prod it.
llvm-svn: 62407
if warnings in system headers are disabled. isIdenticalTo can end up
calling the expensive getSpelling method, and other bad stuff and is
completely unneeded if the warning will be discarded anyway. rdar://6502956
llvm-svn: 62347
the "physical" location of tokens, refer to the "spelling" location.
This is more concrete and useful, tokens aren't really physical objects!
llvm-svn: 62309
- IdentifierInfo can now (optionally) have its string data not be
co-located with itself. This is for use with PTH. This aspect is a
little gross, as getName() and getLength() now make assumptions
about a possible alternate representation of IdentifierInfo.
Perhaps we should make IdentifierInfo have virtual methods?
IdentifierTable:
- Added class "IdentifierInfoLookup" that can be used by
IdentifierTable to perform "string -> IdentifierInfo" lookups using
an auxilliary data structure. This is used by PTH.
- Perform tests show that IdentifierTable::get() does not slow down
because of the extra check for the IdentiferInfoLookup object (the
regular StringMap lookup does enough work to mitigate the impact of
an extra null pointer check).
- The upshot is that now that some IdentifierInfo objects might be
owned by the IdentiferInfoLookup object. This should be reviewed.
PTH:
- Modified PTHManager::GetIdentifierInfo to *not* insert entries in
IdentifierTable's string map, and instead create IdentifierInfo
objects on the fly when mapping from persistent IDs to
IdentifierInfos. This saves a ton of work with string copies,
hashing, and StringMap lookup and resizing. This change was
motivated because when processing source files in the PTH cache we
don't need to do any string -> IdentifierInfo lookups.
- PTHManager now subclasses IdentifierInfoLookup, allowing clients of
IdentifierTable to transparently use IdentifierInfo objects managed
by the PTH file. PTHManager resolves "string -> IdentifierInfo"
queries by doing a binary search over a sorted table of identifier
strings in the PTH file (the exact algorithm we use can be changed
as needed).
These changes lead to the following performance changes when using PTH on Cocoa.h:
- fsyntax-only: 10% performance improvement
- Eonly: 30% performance improvement
llvm-svn: 62273
lexical order of the corresponding identifier strings. This will be used for a
forthcoming optimization. This slows down PTH generation time by 7%. We can
revert this change if the optimization proves to not be valuable.
llvm-svn: 62248
- Use canonical FileID when using getSpelling() caching. This
addresses some cache misses we were seeing with -fsyntax-only on
Cocoa.h
- Added Preprocessor::getPhysicalCharacterAt() utility method for
clients to grab the first character at a specified sourcelocation.
This uses the PTH spelling cache.
- Modified Sema::ActOnNumericConstant() to use
Preprocessor::getPhysicalCharacterAt() instead of
SourceManager::getCharacterData() (to get PTH hits).
These changes cause -fsyntax-only to not page in any sources from
Cocoa.h. We see a speedup of 27%.
llvm-svn: 62193
- Refactor caching logic into a helper class PTHSpellingSearch
- Allow "random accesses" in the spelling cache, thus catching the remaining
cases where 'getSpelling' wasn't hitting the PTH cache
For -Eonly, PTH, Cocoa.h:
- This reduces wall time by 3% (user time unchanged, sys time reduced)
- This reduces the amount of paged source by 1112K.
The remaining 1112K still being paged in is from somewhere else
(investigating).
llvm-svn: 62009
performance gain. Here's what we see for -Eonly on Cocoa.h (using PTH):
- wall time decreases by 21% (26% speedup overall)
- system time decreases by 35%
- user time decreases by 6%
These reductions are due to not paging source files just to get spellings for
literals. The solution in place doesn't appear to be 100% yet, as we still see
some of the pages for source files getting mapped in. Using -print-stats, we see
that SourceManager maps in 7179K less bytes of source text (reduction of 75%).
Will investigate why the remaining 25% are getting paged in.
With these changes, here's how PTH compares to non-PTH on Cocoa.h:
-Eonly: PTH takes 64% of the time as non-PTH (54% speedup)
-fsyntax-only: PTH takes 89% of the time as non-PTH (11% speedup)
llvm-svn: 61913
- Added stub PTHLexer::getSpelling() that will be used for fetching cached
spellings from the PTH file. This doesn't do anything yet.
- Added a hook in Preprocessor::getSpelling() to call PTHLexer::getSpelling()
when using a PTHLexer.
- Updated PTHLexer to read the offsets of spelling tables in the PTH file.
llvm-svn: 61911
- Encode the token length with 2 bytes instead of 4.
- This reduces the size of the .pth file for Cocoa.h by 12%.
- This speeds up PTH time (-Eonly) on Cocoa.h by 1.6%.
llvm-svn: 61364
- In PTHLexer::Lex read all of the token data from PTH file before
constructing the token. The idea is to enhance locality.
- Do not use Read8/Read32 in PTHLexer::Lex. Inline these operations manually.
- Change PTHManager::ReadIdentifierInfo() to PTHManager::GetIdentifierInfo().
They are functionally the same except that PTHLexer::Lex() reads the
persistent id.
These changes result in a 3.3% speedup for PTH on Cocoa.h (-Eonly).
llvm-svn: 61363
- Embed 'eom' tokens in PTH file.
- Use embedded 'eom' tokens to not lazily generate them in the PTHLexer.
This means that PTHLexer can always advance to the next token after
reading a token (instead of buffering tokens using a copy).
- Moved logic of 'ReadToken' into Lex. GetToken & ReadToken no longer exist.
- These changes result in a 3.3% speedup (-Eonly) on Cocoa.h.
- The code is a little gross. Many cleanups are possible and should be done.
llvm-svn: 61360
- Added a side-table per each token-cached file with the preprocessor conditional stack. This tracks what #if's are matched with what #endifs and where their respective tokens are in the PTH file. This will allow for quick skipping of excluded conditional branches in the Preprocessor.
- Performance testing shows the addition of this information (without actually utilizing it) leads to no performance regressions.
llvm-svn: 60911
- Added virtual method 'getSourceLocation()' (no arguments) that gets the location of the next "observable" location (e.g., next character, next token).
PPLexerChange.cpp:
- Implemented FIXME by using PreprocessorLexer::getSourceLocation() to get the location in the file we are returning to after lexing a #included file. This appears to be slightly faster than having the branch (i.e., 'if(CurLexer)'). It's also not a really hot part of the Preprocessor.
llvm-svn: 60860
- Added method "setPTHManager" that will be called by the driver to install
a PTHManager for the Preprocessor.
- Fixed some comments.
- Added EnterSourceFileWithPTH to mirror EnterSourceFileWithLexer.
llvm-svn: 60437
with implicit quotes around them. This has a bunch of follow-on
effects and requires tweaking to a whole lot of code. This causes
a regression in two tests (xfailed) by causing it to emit things like:
Line 10: duplicate interface declaration for category 'MyClass1' ('Category1')
instead of:
Line 10: duplicate interface declaration for category 'MyClass1(Category1)'
I will fix this in a follow-up commit.
As part of this, I had to start switching stuff to use ->getDeclName() instead
of Decl::getName() for consistency. This is good, but I was planning to do this
as an independent patch. There will be several follow-on patches
to clean up some of the mess, but this patch is already too big.
llvm-svn: 59917
would not eat the "-1" in "0x0p-1", but LiteralSupport would accept
it when extensions are on. This caused strangeness and failures
when hexfloats were properly treated as an extension (not error)
in LiteralSupport.
llvm-svn: 59865
its call sites. This makes it more explicit when the hasError flag is
getting set and removes a confusing difference in behavior between
PP.Diag and Diag in this code.
llvm-svn: 59863
(and carefully calculated) effect of allowing the compiler to reason
about the aliasing properties of DiagnosticBuilder object better,
allowing the whole thing to be promoted to registers instead of
resulting in a ton of stack traffic.
While I'm not very concerned about the performance of the Diag() method
invocations, I *am* more concerned about their code size and impact on the
non-diagnostic code. This patch shrinks the clang executable (in
release-asserts mode with gcc-4.2) from 14523980 to 14519816 bytes. This
isn't much, but it shrinks the lexer from 38192 to 37776, PPDirectives.o
from 31116 to 28868 bytes, etc.
llvm-svn: 59862
one for building up the diagnostic that is in flight (DiagnosticBuilder)
and one for pulling structured information out of the diagnostic when
formatting and presenting it.
There is no functionality change with this patch.
llvm-svn: 59849
- Move out logic for handling the end-of-file to LexEndOfFile (to match the Lexer) class. The logic now mirrors the Lexer class more, which allows us to pass most of the Preprocessor test cases.
llvm-svn: 59768
- Rename 'CurToken' and 'LastToken' to 'CurTokenIdx' and 'LastTokenIdx'
respectively.
- Add helper methods GetToken(), AdvanceToken(), AtLastToken() to abstract away
details of the token stream. This also allows us to easily replace their
implementation later.
llvm-svn: 59733
(temporary hack) to test the PTHLexer is that whenever we would create a Lexer
object we instead raw lex a memory buffer first and then use the PTHLexer. This
logic exists only to driver the PTHLexer and will be removed/changed in the
future. Note that the regular path using normal Lexer objects is what is used by
default.
llvm-svn: 59723
- Add variants of IsNonPragmaNonMacroLexer to accept an IncludeMacroStack entry
(simplifies some uses).
- Use IsNonPragmaNonMacroLexer in Preprocessor::LookupFile.
- Add 'FileID' to PreprocessorLexer, and have Preprocessor query this fileid
when looking up the FileEntry for a file
Performance testing of -Eonly on Cocoa.h shows no performance regression because
of this patch.
llvm-svn: 59666
- Add variants of IsNonPragmaNonMacroLexer to accept an IncludeMacroStack entry
(simplifies some uses).
- Use IsNonPragmaNonMacroLexer in Preprocessor::LookupFile.
Performance testing of -Eonly on Cocoa.h shows no performance regression because
of this patch.
llvm-svn: 59574
are formed. In particular, a diagnostic with all its strings and ranges is now
packaged up and sent to DiagnosticClients as a DiagnosticInfo instead of as a
ton of random stuff. This has the benefit of simplifying the interface, making
it more extensible, and allowing us to do more checking for things like access
past the end of the various arrays passed in.
In addition to introducing DiagnosticInfo, this also substantially changes how
Diagnostic::Report works. Instead of being passed in all of the info required
to issue a diagnostic, Report now takes only the required info (a location and
ID) and returns a fresh DiagnosticInfo *by value*. The caller is then free to
stuff strings and ranges into the DiagnosticInfo with the << operator. When
the dtor runs on the DiagnosticInfo object (which should happen at the end of
the statement), the diagnostic is actually emitted with all of the accumulated
information. This is a somewhat tricky dance, but it means that the
accumulated DiagnosticInfo is allowed to keep pointers to other expression
temporaries without those pointers getting invalidated.
This is just the minimal change to get this stuff working, but this will allow
us to eliminate the zillions of variant "Diag" methods scattered throughout
(e.g.) sema. For example, instead of calling:
Diag(BuiltinLoc, diag::err_overload_no_match, typeNames,
SourceRange(BuiltinLoc, RParenLoc));
We will soon be able to just do:
Diag(BuiltinLoc, diag::err_overload_no_match)
<< typeNames << SourceRange(BuiltinLoc, RParenLoc));
This scales better to support arbitrary types being passed in (not just
strings) in a type-safe way. Go operator overloading?!
llvm-svn: 59502