* Clarify what MacroInfo::isBuiltinMacro means, as it really means something
more like "isMagicalMacro" or "requiresProcessingBeforeExpansion" -- the
macros defined in "<built-in>" are not considered built-in by this function;
* Escape __LINE__ as \__LINE__ in Doxygen comments so that the underscores
don't get replaced by *bold* output;
* Turn comments in MacroInfo.cpp into non-Doxygen comments, so that they
don't result in duplicated/badly formatted Doxygen output;
* Clean up a bunch of \brief formatting, and add a \file comment for
MacroInfo.h.
llvm-svn: 177581
In a module-enabled Cocoa PCH file, we spend a lot of time stat'ing the headers
in order to associate the FileEntries with their modules and support implicit
module import.
Use a more lazy scheme by enhancing HeaderInfoTable to store extra info about
the module that a header belongs to, and associate it with its module only when
there is a request for loading the header info for a particular file.
Part of rdar://13391765
llvm-svn: 176976
its index in the preprocessed entities vector.
This is because the order of the entities in the vector can change in some (uncommon) cases.
llvm-svn: 175907
for the data specific to a macro definition (e.g. what the tokens are), and
MacroDirective class which encapsulates the changes to the "macro namespace"
(e.g. the location where the macro name became active, the location where it was undefined, etc.)
(A MacroDirective always points to a MacroInfo object.)
Usually a macro definition (MacroInfo) is where a macro name becomes active (MacroDirective) but
splitting the concepts allows us to better model the effect of modules to the macro namespace
(also as a bonus it allows better modeling of push_macro/pop_macro #pragmas).
Modules can have their own macro history, separate from the local (current translation unit)
macro history; MacroDirectives will be used to model the macro history (changes to macro namespace).
For example, if "@import A;" imports macro FOO, there will be a new local MacroDirective created
to indicate that "FOO" became active at the import location. Module "A" itself will contain another
MacroDirective in its macro history (at the point of the definition of FOO) and both MacroDirectives
will point to the same MacroInfo object.
Introducing the separation of macro concepts is the first part towards better modeling of module macros.
llvm-svn: 175585
The use of this flag enables a modules optimization where a given set
of macros can be labeled as "ignored" by the modules
system. Definitions of those macros will be completely ignored when
building the module hash and will be stripped when actually building
modules. The overall effect is that this flag can be used to
drastically reduce the number of
Eventually, we'll want modules to tell us what set of macros they
respond to (the "configuration macros"), and anything not in that set
will be excluded. However, that requires a lot of per-module
information that must be accurate, whereas this option can be used
more readily.
Fixes the rest of <rdar://problem/13165109>.
llvm-svn: 174560
People use the C preprocessor for things other than C files. Some of them
have Unicode characters. We shouldn't warn about Unicode characters
appearing outside of identifiers in this case.
There's not currently a way for the preprocessor to tell if it's in -E mode,
so I added a new flag, derived from the PreprocessorOutputOptions. This is
only used by the Unicode warnings for now, but could conceivably be used by
other warnings or even behavioral differences later.
<rdar://problem/13107323>
llvm-svn: 173881
- The only group where it makes sense for the "ExternC" bit is System, so this
simplifies having to have the extra isCXXAware (or ImplicitExternC, depending
on what code you talk to) bit caried around.
llvm-svn: 173859
This is a missing piece for C99 conformance.
This patch handles UCNs by adding a '\\' case to LexTokenInternal and
LexIdentifier -- if we see a backslash, we tentatively try to read in a UCN.
If the UCN is not syntactically well-formed, we fall back to the old
treatment: a backslash followed by an identifier beginning with 'u' (or 'U').
Because the spelling of an identifier with UCNs still has the UCN in it, we
need to convert that to UTF-8 in Preprocessor::LookUpIdentifierInfo.
Of course, valid code that does *not* use UCNs will see only a very minimal
performance hit (checks after each identifier for non-ASCII characters,
checks when converting raw_identifiers to identifiers that they do not
contain UCNs, and checks when getting the spelling of an identifier that it
does not contain a UCN).
This patch also adds basic support for actual UTF-8 in the source. This is
treated almost exactly the same as UCNs except that we consider stray
Unicode characters to be mistakes and offer a fixit to remove them.
llvm-svn: 173369
Makes sure that a deserialized macro is only added to the preprocessor macro definitions only once.
Unfortunately I couldn't get a reduced test case.
rdar://13016031
llvm-svn: 172843
Previously we would serialize the macro redefinitions as a list, part of
the identifier, and try to chain them together across modules individually
without having the info that they were already chained at definition time.
Change this by serializing the macro redefinition chain and then try
to synthesize the chain parts across modules. This allows us to correctly
pinpoint when 2 different definitions are ambiguous because they came from
unrelated modules.
Fixes bogus "ambiguous expansion of macro" warning when a macro in a PCH
is redefined without undef'ing it first.
rdar://13016031
llvm-svn: 172620
which a particular declaration resides. Use this information to
customize the "definition of 'blah' must be imported from another
module" diagnostic with the module the user actually has to
import. Additionally, recover by importing that module, so we don't
complain about other names in that module.
Still TODO: coming up with decent Fix-Its for these cases, and expand
this recovery approach for other name lookup failures.
llvm-svn: 172290
directive as a macro expansion.
This is more of a "macro reference" than a macro expansion but it's close enough
for libclang's purposes. If it causes issues we can revisit and introduce a new
kind of cursor.
llvm-svn: 169666
PreprocessingRecord and into its own class, PPConditionalDirectiveRecord.
Decoupling allows a client to use the functionality of PPConditionalDirectiveRecord
without needing a PreprocessingRecord.
llvm-svn: 169229
building module 'Foo' imported from..." notes (the same we we provide
"In file included from..." notes) in the diagnostic, so that we know
how this module got included in the first place. This is part of
<rdar://problem/12696425>.
llvm-svn: 169021
import of that module elsewhere, don't try to build the module again:
it won't work, and the experience is quite dreadful. We track this
information somewhat globally, shared among all of the related
CompilerInvocations used to build modules on-the-fly, so that a
particular Clang instance will only try to build a given module once.
Fixes <rdar://problem/12552849>.
llvm-svn: 168961
common LexStringLiteral function. In doing so, some consistency problems have
been ironed out (e.g. where the first token in the string literal was lexed
with macro expansion, but subsequent ones were not) and also an erroneous
diagnostic has been corrected.
LexStringLiteral is complemented by a FinishLexStringLiteral function which
can be used in the situation where the first token of the string literal has
already been lexed.
llvm-svn: 168266
the related comma pasting extension.
In certain cases, we used to get two diagnostics for what is essentially one
extension. This change suppresses the first diagnostic in certain cases
where we know we're going to print the second diagnostic. The
diagnostic is redundant, and it can't be suppressed in the definition
of the macro because it points at the use of the macro, so we want to
avoid printing it if possible.
The implementation works by detecting constructs which look like comma
pasting at the time of the definition of the macro; this information
is then used when the macro is used. (We can't actually detect
whether we're using the comma pasting extension until the macro is
actually used, but we can detecting constructs which will be comma
pasting if the varargs argument is elided.)
<rdar://problem/12292192>
llvm-svn: 167907
allowing a module map to be placed one level above the '.framework'
directories to specify that all .frameworks within that directory can
be inferred as framework modules. One can also specifically exclude
frameworks known not to work.
This makes explicit (and more restricted) behavior modules have had
"forever", where *any* .framework was assumed to be able to be built
as a module. That's not necessarily true, so we white-list directories
(with exclusions) when those directories have been audited.
llvm-svn: 167482
The stat cache became essentially useless ever since we started
validating all file entries in the PCH.
But the motivating reason for removing it now is that it also affected
correctness in this situation:
-You have a header without include guards (using "#pragma once" or #import)
-When creating the PCH:
-The same header is referenced in an #include with different filename cases.
-In the PCH, of course, we record only one file entry for the header file
-But we cache in the PCH file the stat info for both filename cases
-Then the source files are updated and the header file is updated in a way that
its size and modification time are the same but its inode changes
-When using the PCH:
-We validate the headers, we check that header file and we create a file entry with its current inode
-There's another #include with a filename with different case than the previously created file entry
-In order to get its stat info we go through the cached stat info of the PCH and we receive the old inode
-because of the different inodes, we think they are different files so we go ahead and include its contents.
Removing the stat cache will potentially break clients that are attempting to use the stat cache
as a way of avoiding having the actual input files available. If that use case is important, patches are welcome
to bring it back in a way that will actually work correctly (i.e., emit a PCH that is self-contained, coping with
literal strings, line/column computations, etc.).
This fixes rdar://5502805
llvm-svn: 167172
description. Previously, one could emulate this behavior by placing
the header in an always-unavailable submodule, but Argyrios guilted me
into expressing this idea properly.
llvm-svn: 165921
macro history.
When deserializing macro history, we arrange history such that the
macros that have definitions (that haven't been #undef'd) and are
visible come at the beginning of the list, which is what the
preprocessor and other clients of Preprocessor::getMacroInfo()
expect. If additional macro definitions become visible later, they'll
be moved toward the front of the list. Note that it's possible to have
ambiguities, but we don't diagnose them yet.
There is a partially-implemented design decision here that, if a
particular identifier has been defined or #undef'd within the
translation unit, that definition (or #undef) hides any macro
definitions that come from imported modules. There's still a little
work to do to ensure that the right #undef'ing happens.
Additionally, we'll need to scope the update records for #undefs, so
they only kick in when the submodule containing that update record
becomes visible.
llvm-svn: 165682
MacroInfo*. Instead of simply dumping an offset into the current file,
give each macro definition a proper ID with all of the standard
modules-remapping facilities. Additionally, when a macro is modified
in a subsequent AST file (e.g., #undef'ing a macro loaded from another
module or from a precompiled header), provide a macro update record
rather than rewriting the entire macro definition. This gives us
greater consistency with the way we handle declarations, and ties
together macro definitions much more cleanly.
Note that we're still not actually deserializing macro history (we
never were), but it's far easy to do properly now.
llvm-svn: 165560
Summary:
When issuing a diagnostic message for the -Wimplicit-fallthrough diagnostics, always try to find the latest macro, defined at the point of fallthrough, which is immediately expanded to "[[clang::fallthrough]]", and use it's name instead of the actual sequence.
Known issues:
* uses PP.getSpelling() to compare macro definition with a string (anyone can suggest a convenient way to fill a token array, or maybe lex it in runtime?);
* this can be generalized and used in other similar cases, any ideas where it should reside then?
Reviewers: doug.gregor, rsmith
Reviewed By: rsmith
CC: cfe-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D50
llvm-svn: 164858
have PPCallbacks::InclusionDirective pass the character range for the filename quotes or brackets.
rdar://11113134 & http://llvm.org/PR13880
llvm-svn: 164743
Summary: Passes all tests (+ the new one with code completion), but needs a thorough review in part related to modules.
Reviewers: doug.gregor
Reviewed By: alexfh
CC: cfe-commits, rsmith
Differential Revision: http://llvm-reviews.chandlerc.com/D41
llvm-svn: 164610
specific module (__building_module(modulename)) and to get the name of
the current module as an identifier (__MODULE__).
Used to help headers behave differently when they're being included as
part of building a module. Oh, the irony.
llvm-svn: 164605
within its own argument list. The original definition is used for the immediate
expansion, but the new definition is used for any subsequent occurences within
the argument list or after the expansion.
llvm-svn: 162906
Summary:
The problem was with the following sequence:
#pragma push_macro("long")
#undef long
#pragma pop_macro("long")
in case when "long" didn't represent a macro.
Fixed crash and removed code duplication for #undef/pop_macro case. Added regression tests.
Reviewers: doug.gregor, klimek
Reviewed By: doug.gregor
CC: cfe-commits, chapuni
Differential Revision: http://llvm-reviews.chandlerc.com/D31
llvm-svn: 162845
Summary:
Summary: Keep history of macro definitions and #undefs with corresponding source locations, so that we can later find out all macros active in a specified source location. We don't save the history in PCH (no need currently). Memory overhead is about sizeof(void*)*3*<number of macro definitions and #undefs>+<in-memory size of all #undef'd macros>
I've run a test on a file composed of 109 .h files from boost 1.49 on x86-64 linux.
Stats before this patch:
*** Preprocessor Stats:
73222 directives found:
19171 #define.
4345 #undef.
#include/#include_next/#import:
5233 source files entered.
27 max include stack depth
19210 #if/#ifndef/#ifdef.
2384 #else/#elif.
6891 #endif.
408 #pragma.
14466 #if/#ifndef#ifdef regions skipped
80023/451669/1270 obj/fn/builtin macros expanded, 85724 on the fast path.
127145 token paste (##) operations performed, 11008 on the fast path.
Preprocessor Memory: 5874615B total
BumpPtr: 4399104
Macro Expanded Tokens: 417768
Predefines Buffer: 8135
Macros: 1048576
#pragma push_macro Info: 0
Poison Reasons: 1024
Comment Handlers: 8
Stats with this patch:
...
Preprocessor Memory: 7541687B total
BumpPtr: 6066176
Macro Expanded Tokens: 417768
Predefines Buffer: 8135
Macros: 1048576
#pragma push_macro Info: 0
Poison Reasons: 1024
Comment Handlers: 8
In my test increase in memory usage is about 1.7Mb, which is ~28% of initial preprocessor's memory usage and about 0.8% of clang's total VMM allocation.
As for CPU overhead, it should only be noticeable when iterating over all macros, and should mostly consist of couple extra dereferences and one comparison per macro + skipping of #undef'd macros. It's less trivial to measure, though, as the preprocessor consumes a very small fraction of compilation time.
Reviewers: doug.gregor, klimek, rsmith, djasper
Reviewed By: doug.gregor
CC: cfe-commits, chandlerc
Differential Revision: http://llvm-reviews.chandlerc.com/D28
llvm-svn: 162810
nested names as id-expressions, using the annot_primary_expr annotation, where
possible. This removes some redundant lookups, and also allows us to
typo-correct within tentative parsing, and to carry on disambiguating past an
identifier which we can determine will fail lookup as both a type and as a
non-type, allowing us to disambiguate more declarations (and thus offer
improved error recovery for such cases).
This also introduces to the parser the notion of a tentatively-declared name,
which is an identifier which we *might* have seen a declaration for in a
tentative parse (but only if we end up disambiguating the tokens as a
declaration). This is necessary to correctly disambiguate cases where a
variable is used within its own initializer.
llvm-svn: 162159
* Primarily fixed \param commands with names not matching any actual
parameters of the documented functions. In many cases this consists
just of fixing up the parameter name in the \param to match the code,
in some it means deleting obsolete documentation and occasionally it
means documenting the parameter that has replaced the older one that
was documented, which sometimes means some simple reverse-engineering
of the docs from the implementation;
* Fixed \param ParamName [out] to the correct format with [out] before
the parameter name;
* Fixed some \brief summaries.
llvm-svn: 158980
* Added \file documentation for PPCallbacks.h;
* Added/formated \brief summaries;
* Deleted documentation for parameters that no longer exist;
* Used \param more systematically for documentation of parameters;
* Escaped # characters in Doxygen comments.
llvm-svn: 158978
* Escaped # and < characters in Doxygen comments as needed;
* Fixed up some \brief summaries;
* Marked up some parameter references with \p;
* Added \code...\endcode around code examples;
* Used \returns a little more.
llvm-svn: 158966
* Retain comments in the AST
* Serialize/deserialize comments
* Find comments attached to a certain Decl
* Expose raw comment text and SourceRange via libclang
llvm-svn: 158771
* Escaped # characters in Doxygen comments as needed;
* Added/reformatted \brief docs;
* Used a \file comment to document the file (MultipleIncludeOpt.h).
llvm-svn: 158635
* Escaped the # of #define in Doxygen comments;
* Formatting: Annotated __VA_ARGS__ with \c;
* Converted docs to use \brief to provide summaries;
* Fixed a typo: disbles -> disables.
llvm-svn: 158553
This reduces the number of warnings generated by Doxygen by about 100
(roughly 10%). Issues addressed:
(1) Primarily, backslash-escaped "@foo" and "#bah" in Doxygen comments
when they're not supposed to be Doxygen commands or links, and
similarly for "<baz>" when it's not intended as as HTML tag;
(2) Changed some \t commands (which don't exist) to \c ("to refer to a
word of code", as the Doxygen manual says);
(3) \precondition becomes \pre;
(4) When touching comments, deleted a couple of spurious spaces in them;
(5) Changed some \n and \r to \\n and \\r;
(6) Fixed one tiny typo: #pragms -> #pragma.
This patch touches documentation/comments only.
llvm-svn: 158422
override whether headers are system headers by checking for prefixes of the
header name specified in the #include directive.
This allows warnings to be disabled for third-party code which is found in
specific subdirectories of include paths.
llvm-svn: 158418
The preprocessor's handling of diagnostic push/pops is stateful, so
encountering pragmas during a re-parse causes problems. HTMLRewrite
already filters out normal # directives including #pragma, so it's
clear it's not expected to be interpreting pragmas in this mode.
This fix adds a flag to Preprocessor to explicitly disable pragmas.
The "right" fix might be to separate pragma lexing from pragma
parsing so that we can throw away pragmas like we do preprocessor
directives, but right now it's important to get the fix in.
Note that this has nothing to do with the "hack" of re-using the
input preprocessor in HTMLRewrite. Even if we someday copy the
preprocessor instead of re-using it, the copy would (and should) include
the diagnostic level tables and have the same problems.
llvm-svn: 158214
This was a problem for people who write 'return(result);'
Also fix ARCMT's corresponding code, though there's no test case for this
because implicit casts like this are rejected by the migrator for being
ambiguous, and explicit casts have no problem.
<rdar://problem/11577346>
llvm-svn: 158130
- Developers of system frameworks need a way for their framework to be treated as a "system framework" during development. Otherwise, they are unable to properly test how their framework behaves when installed because of the semantic changes (in warning behavior) applied to system frameworks.
llvm-svn: 154105
If we are pre-expanding a macro argument don't actually "activate"
the pragma at that point, activate the pragma whenever we encounter
it again in the token stream.
This ensures that we will activate it in the correct location
or that we will ignore it if it never enters the token stream, e.g:
\#define EMPTY(x)
\#define INACTIVE(x) EMPTY(x)
INACTIVE(_Pragma("clang diagnostic ignored \"-Wconversion\""))
This also fixes the crash in rdar://11168596.
llvm-svn: 153959
"#include MACRO(STUFF)".
-As an inclusion position for the included file, use the file location of the file where it
was included but *after* the macro expansions. We want the macro expansions to be considered
as before-in-translation-unit for everything in the included file.
-In the preprocessing record take into account that only inclusion directives can be encountered
as "out-of-order" (by comparing the start of the range which for inclusions is the hash location)
and use binary search if there is an extreme number of macro expansions in the include directive.
Fixes rdar://11111779
llvm-svn: 153527
Enable incremental parsing by the Preprocessor,
where more code can be provided after an EOF.
It mainly prevents the tearing down of the topmost lexer.
To be used like this:
PP.enableIncrementalProcessing();
while (getMoreSource()) {
while (Parser.ParseTopLevelDecl(ADecl)) {...}
}
PP.enableIncrementalProcessing(false);
llvm-svn: 152914
- The theory here is that we have these functions sprinkled in all over the
place. This should allow the optimizer to at least realize it can still do
load CSE across these calls.
- I blindly marked all instances as such, even though the optimizer can infer
this attribute in some instances (some of the inline ones) as that was easier
and also, when given the choice between thinking and not thinking, I prefer
the latter.
You might think this is mere frivolity, but actually this is good for a .7 -
1.1% speedup on 403.gcc/combine.c, JSC/Interpreter.cpp,
OGF/NSBezierPath-OAExtensions.m.
llvm-svn: 152426
grammar requires a string-literal and not a user-defined-string-literal. The
two constructs are still represented by the same TokenKind, in order to prevent
a combinatorial explosion of different kinds of token. A flag on Token tracks
whether a ud-suffix is present, in order to prevent clients from needing to look
at the token's spelling.
llvm-svn: 152098
Introduce PreprocessingRecord::rangeIntersectsConditionalDirective() which returns
true if a given range intersects with a conditional directive block.
llvm-svn: 152018
-Add location parameter for the directives callbacks
-Skip callbacks if the directive is inside a skipped range.
-Make sure the directive callbacks are invoked in source order.
llvm-svn: 152017
Fixes PR10606.
I'm not sure if this is the best way to go about it, but
I locally enabled this code path without the msext conditional,
and all tests pass, except for test/Preprocessor/cxx_oper_keyword.cpp
which explicitly checks that operator keywords can't be redefined.
I also parsed chromium/win with a clang with and without this patch.
It introduced no new errors, but removes 43 existing errors.
llvm-svn: 151768
This seems to negatively affect compile time onsome ObjC tests
(which use a lot of partial diagnostics I assume). I have to come
up with a way to keep them inline without including Diagnostic.h
everywhere. Now adding a new diagnostic requires a full rebuild
of e.g. the static analyzer which doesn't even use those diagnostics.
This reverts commit 6496bd10dc3a6d5e3266348f08b6e35f8184bc99.
This reverts commit 7af19b817ba964ac560b50c1ed6183235f699789.
This reverts commit fdd15602a42bbe26185978ef1e17019f6d969aa7.
This reverts commit 00bd44d5677783527d7517c1ffe45e4d75a0f56f.
This reverts commit ef9b60ffed980864a8db26ad30344be429e58ff5.
llvm-svn: 150006
- Move the offending methods out of line and fix transitive includers.
- This required changing an enum in the PPCallback API into an unsigned.
llvm-svn: 149782
into using non-absolute system includes (<foo>)...
... and introduce another hack that is simultaneously more heineous
and more effective. We whitelist Clang-supplied headers that augment
or override system headers (such as float.h, stdarg.h, and
tgmath.h). For these headers, Clang does not provide a module
mapping. Instead, a system-supplied module map can refer to these
headers in a system module, and Clang will look both in its own
include directory and wherever the system-supplied module map
suggests, then adds either or both headers. The end result is that
Clang-supplied headers get merged into the system-supplied module for
the C standard library.
As a drive-by, fix up a few dependencies in the _Builtin_instrinsics
module.
llvm-svn: 149611
for getting the name of the module file, unifying the code for
searching for a module with a given name (into lookupModule()) and
separating out the mapping to a module file (into
getModuleFileName()). No functionality change.
llvm-svn: 149197
single attribute ("system") that allows us to mark a module as being a
"system" module. Each of the headers that makes up a system module is
considered to be a system header, so that we (for example) suppress
warnings there.
If a module is being inferred for a framework, and that framework
directory is within a system frameworks directory, infer it as a
system framework.
llvm-svn: 149143
when it actually has changed (and not, e.g., when we've simply attached a
deserialized macro definition). Good for ~1.5% reduction in module
file size, mostly in the identifier table.
llvm-svn: 148808
modules. This leaves us without an explicit syntax for importing
modules in C/C++, because such a syntax needs to be discussed
first. In Objective-C/Objective-C++, the @import syntax is used to
import modules.
Note that, under -fmodules, C/C++ programs can import modules via the
#include mechanism when a module map is in place for that header. This
allows us to work with modules in C/C++ without committing to a syntax.
llvm-svn: 147467
features needed for a particular module to be available. This allows
mixed-language modules, where certain headers only work under some
language variants (e.g., in C++, std.tuple might only be available in
C++11 mode).
llvm-svn: 147387
part of HeaderSearch. This function just normalizes filenames for use
inside of a synthetic include directive, but it is used in both the
Frontend and Serialization libraries so it needs a common home.
llvm-svn: 146227
umbrella headers in the sense that all of the headers within that
directory (and eventually its subdirectories) are considered to be
part of the module with that umbrella directory. However, unlike
umbrella headers, which are expected to include all of the headers
within their subdirectories, Clang will automatically include all of
the headers it finds in the named subdirectory.
The intent here is to allow a module map to trivially turn a
subdirectory into a module, where the module's structure can mimic the
directory structure.
llvm-svn: 146165
within module maps, which will (eventually) be used to re-export a
module from another module. There are still some pieces missing,
however.
llvm-svn: 145665
(sub)module, all of the names may be hidden, just the macro names may
be exposed (for example, after the preprocessor has seen the import of
the module but the parser has not), or all of the names may be
exposed. Importing a module makes its names, and the names in any of
its non-explicit submodules, visible to name lookup (transitively).
This commit only introduces the notion of name visible and marks
modules and submodules as visible when they are imported. The actual
name-hiding logic in the AST reader will follow (along with test cases).
llvm-svn: 145586
library, since modules cut across all of the libraries. Rename
serialization::Module to serialization::ModuleFile to side-step the
annoying naming conflict. Prune a bunch of ModuleMap.h includes that
are no longer needed (most files only needed the Module type).
llvm-svn: 145538
submodules. This information will eventually be used for name hiding
when dealing with submodules. For now, we only use it to ensure that
the module "key" returned when loading a module will always be a
module (rather than occasionally being a FileEntry).
llvm-svn: 145497
return the module itself (in the module map) rather than returning the
umbrella header used to build the module. While doing this, make sure
that we're inferring modules for frameworks to build that module.
llvm-svn: 145310
into a module. This module can either be loaded from a module map in
the framework directory (which isn't quite working yet) or inferred
from an umbrella header (which does work, and replaces the existing
hack).
llvm-svn: 144877
the umbrella header's directory and its subdirectories are part of the
module (that's why it's an umbrella). Make sure that these headers are
considered to be part of the module for lookup purposes.
llvm-svn: 144859
the module is described in one of the module maps in a search path or
in a subdirectory off the search path that has the same name as the
module we're looking for.
llvm-svn: 144433
map, so long as they have an umbrella header. This makes it possible
to introduce a module map + umbrella header for a given set of
headers, to turn it into a module.
There are two major deficiencies here: first, we don't go hunting for
module map files when we just see a module import (so we won't know
about the modules described therein). Second, we don't yet have a way
to build modules that don't have umbrella headers, or have incomplete
umbrella headers.
llvm-svn: 144424
the corresponding (top-level) modules. This isn't actually useful yet,
because we don't yet have a way to build modules out of module maps.
llvm-svn: 144410
Module map files provide a way to map between headers and modules, so
that we can layer a module system on top of existing headers without
changing those headers at all.
This commit introduces the module map file parser and the module map
that it generates, and wires up the module map file parser so that
we'll automatically find module map files as part of header
search. Note that we don't yet use the information stored in the
module map.
llvm-svn: 144402
AST file more lazy, so that we don't eagerly load that information for
all known identifiers each time a new AST file is loaded. The eager
reloading made some sense in the context of precompiled headers, since
very few identifiers were defined before PCH load time. With modules,
however, a huge amount of code can get parsed before we see an
@import, so laziness becomes important here.
The approach taken to make this information lazy is fairly simple:
when we load a new AST file, we mark all of the existing identifiers
as being out-of-date. Whenever we want to access information that may
come from an AST (e.g., whether the identifier has a macro definition,
or what top-level declarations have that name), we check the
out-of-date bit and, if it's set, ask the AST reader to update the
IdentifierInfo from the AST files. The update is a merge, and we now
take care to merge declarations before/after imports with declarations
from multiple imports.
The results of this optimization are fairly dramatic. On a small
application that brings in 14 non-trivial modules, this takes modules
from being > 3x slower than a "perfect" PCH file down to 30% slower
for a full rebuild. A partial rebuild (where the PCH file or modules
can be re-used) is down to 7% slower. Making the PCH file just a
little imperfect (e.g., adding two smallish modules used by a bunch of
.m files that aren't in the PCH file) tips the scales in favor of the
modules approach, with 24% faster partial rebuilds.
This is just a first step; the lazy scheme could possibly be improved
by adding versioning, so we don't search into modules we already
searched. Moreover, we'll need similar lazy schemes for all of the
other lookup data structures, such as DeclContexts.
llvm-svn: 143100
preprocessed entities that are #included in the range that we are interested.
This is useful when we are interested in preprocessed entities of a specific file, e.g
when we are annotating tokens. There is also an optimization where we cache the last
result of PreprocessingRecord::getPreprocessedEntitiesInRange and we re-use it if
there is a call with the same range as before.
rdar://10313365
llvm-svn: 142887
CoreFoundation object-transfer properties audited, and add a #pragma
to cause them to be automatically applied to functions in a particular
span of code. This has to be implemented largely in the preprocessor
because of the requirement that the region be entirely contained in
a single file; that's hard to impose from the parser without registering
for a ton of callbacks.
llvm-svn: 140846
PreprocessingRecord's getPreprocessedEntitiesInRange.
Also remove all the stuff that were added in ASTUnit that are unnecessary now
that we do a binary search for preprocessed entities and deserialize only
what is necessary.
llvm-svn: 140063
which will do a binary search and return a pair of iterators
for preprocessed entities in the given source range.
Source ranges of preprocessed entities are stored twice currently in
the PCH/Module file but this will be fixed in a subsequent commit.
llvm-svn: 140058
-Use an array of offsets for all preprocessed entities
-Get rid of the separate array of offsets for just macro definitions;
for references to macro definitions use an index inside the preprocessed
entities array.
-Deserialize each preprocessed entity lazily, at first request; not in bulk.
Paves the way for binary searching of preprocessed entities that will offer
efficiency and will simplify things on the libclang side a lot.
llvm-svn: 139809
target triple to separate modules built under different
conditions. The hash is used to create a subdirectory in the module
cache path where other invocations of the compiler (with the same
version, language options, etc.) can find the precompiled modules.
llvm-svn: 139662
but there is a corresponding umbrella header in a framework, build the
module on-the-fly so it can be immediately loaded at the import
statement. This is very much proof-of-concept code, with details to be
fleshed out over time.
llvm-svn: 139558
where the compiler will look for module files. Eliminates the
egregious hack where we looked into the header search paths for
modules.
llvm-svn: 139538
keyword. We now handle this keyword in HandleIdentifier, making a note
for ourselves when we've seen the __import_module__ keyword so that
the next lexed token can trigger a module import (if needed). This
greatly simplifies Preprocessor::Lex(), and completely erases the 5.5%
-Eonly slowdown Argiris noted when I originally implemented
__import_module__. Big thanks to Argiris for noting that horrible
regression!
llvm-svn: 139265
Previously we would cut off the source file buffer at the code-completion
point; this impeded code-completion inside C++ inline methods and,
recently, with buffering ObjC methods.
Have the code-completion inserted into the source buffer so that it can
be buffered along with a method body. When we actually hit the code-completion
point the cut-off lexing or parsing.
Fixes rdar://10056932&8319466
llvm-svn: 139086
and language-specific initialization. Use this to allow ASTUnit to
create a preprocessor object *before* loading the AST file. No actual
functionality change.
llvm-svn: 138983
LangOptions, rather than making distinct copies of
LangOptions. Granted, LangOptions doesn't actually get modified, but
this will eventually make it easier to construct ASTContext and
Preprocessor before we know all of the LangOptions.
llvm-svn: 138959
include guards don't show up as macro definitions in every translation
unit that imports a module. Macro definitions can, however, be
exported with the intentionally-ugly #__export_macro__
directive. Implement this feature by not even bothering to serialize
non-exported macros to a module, because clients of that module need
not (should not) know that these macros even exist.
llvm-svn: 138943
existing practice with Python extension modules. Not that Python
extension modules should be using a double-underscored identifier
anyway, but...
llvm-svn: 138870
__import__ within the preprocessor, since the prior one foolishly
assumed that Preprocessor::Lex() was re-entrant. We now handle
__import__ at the top level (only), after macro expansion. This should
fix the buildbot failures.
llvm-svn: 138704
loads the named module. The syntax itself is intentionally hideous and
will be replaced at some later point with something more
palatable. For now, we're focusing on the semantics:
- Module imports are handled first by the preprocessor (to get macro
definitions) and then the same tokens are also handled by the parser
(to get declarations). If both happen (as in normal compilation),
the second one is redundant, because we currently have no way to
hide macros or declarations when loading a module. Chris gets credit
for this mad-but-workable scheme.
- The Preprocessor now holds on to a reference to a module loader,
which is responsible for loading named modules. CompilerInstance is
the only important module loader: it now knows how to create and
wire up an AST reader on demand to actually perform the module load.
- We search for modules in the include path, using the module name
with the suffix ".pcm" (precompiled module) for the file name. This
is a temporary hack; we hope to improve the situation in the
future.
llvm-svn: 138679
Currently getMacroArgExpandedLocation is very inefficient and for the case
of a location pointing at the main file it will end up checking almost all of
the SLocEntries. Make it faster:
-Use a map of macro argument chunks to their expanded source location. The map
is for a single source file, it's stored in the file's ContentCache and lazily
computed, like the source lines cache.
-In SLocEntry's FileInfo add an 'unsigned NumCreatedFIDs' field that keeps track
of the number of FileIDs (files and macros) that were created during preprocessing
of that particular file SLocEntry. This is useful when computing the macro argument
map in skipping included files while scanning for macro arg FileIDs that lexed from
a specific source file. Due to padding, the new field does not increase the size
of SLocEntry.
llvm-svn: 138225
for tokens that are lexed consecutively from the same FileID, instead of creating
a SLocEntry for each token. e.g for
assert(foo == bar);
there will be a single SLocEntry for the "foo == bar" chunk and locations
for the 'foo', '==', 'bar' tokens will point inside that chunk.
For parsing SemaExpr.cpp, this reduced the number of SLocEntries by 25%.
llvm-svn: 138129
1. Be more tolerant of comments in -CC (comment-preserving) mode. We were missing a few cases.
2. Make sure to expand the second FOO in "#if defined FOO FOO". (See also
r97253, which addressed the case of "#if defined(FOO FOO".)
Fixes PR10286.
llvm-svn: 136748
entities generated directly by the preprocessor from those loaded from
the external source (e.g., the ASTReader). By separating these two
sets of entities into different vectors, we allow both to grow
independently, and eliminate the need for preallocating all of the
loaded preprocessing entities. This is similar to the way the recent
SourceManager refactoring treats FileIDs and the source location
address space.
As part of this, switch over to building a continuous range map to
track preprocessing entities.
llvm-svn: 135646
variants to 'expand'. This changed a couple of public APIs, including
one public type "MacroInstantiation" which is now "MacroExpansion". The
rest of the codebase was updated to reflect this, especially the
libclang code. Two of the C++ (and thus easily changed) libclang APIs
were updated as well because they pertained directly to the old
MacroInstantiation class.
No functionality changed.
llvm-svn: 135139
'expand'. Also update the public API it provides to the new term, and
propagate that update to the various clients.
No functionality changed.
llvm-svn: 135138
When a macro instantiation occurs, reserve a SLocEntry chunk with length the
full length of the macro definition source. Set the spelling location of this chunk
to point to the start of the macro definition and any tokens that are lexed directly
from the macro definition will get a location from this chunk with the appropriate offset.
For any tokens that come from argument expansion, '##' paste operator, etc. have their
instantiation location point at the appropriate place in the instantiated macro definition
(the argument identifier and the '##' token respectively).
This improves macro instantiation diagnostics:
Before:
t.c:5:9: error: invalid operands to binary expression ('struct S' and 'int')
int y = M(/);
^~~~
t.c:5:11: note: instantiated from:
int y = M(/);
^
After:
t.c:5:9: error: invalid operands to binary expression ('struct S' and 'int')
int y = M(/);
^~~~
t.c:3:20: note: instantiated from:
\#define M(op) (foo op 3);
~~~ ^ ~
t.c:5:11: note: instantiated from:
int y = M(/);
^
The memory savings for a candidate boost library that abuses the preprocessor are:
- 32% less SLocEntries (37M -> 25M)
- 30% reduction in PCH file size (900M -> 635M)
- 50% reduction in memory usage for the SLocEntry table (1.6G -> 800M)
llvm-svn: 134587
Previously macro expanded tokens were added to Preprocessor's bump allocator and never released,
even after the TokenLexer that were lexing them was finished, thus they were wasting memory.
A very "useful" boost library was causing clang to eat 1 GB just for the expanded macro tokens.
Introduce a special cache that works like a stack; a TokenLexer can add the macro expanded tokens
in the cache, and when it finishes, the tokens are removed from the end of the cache.
Now consumed memory by expanded tokens for that library is ~ 1.5 MB.
Part of rdar://9327049.
llvm-svn: 134105
1. We would assume that the length of the string literal token was at least 2
2. We would allocate a buffer with size length-2
And when the stars aligned (one of which would be an invalid source location due to stale PCH)
The length would be 0 and we would try to allocate a 4GB buffer.
Add checks for this corner case and a bunch of asserts.
(We really really should have had an assert for 1.).
Note that there's no test case since I couldn't get one (it was major PITA to reproduce),
maybe later.
llvm-svn: 131492
__has_extension is a function-like macro which takes the same set
of feature identifiers as __has_feature. It evaluates to 1 if the
feature is supported by Clang in the current language (either as a
language extension or a standard language feature) or 0 if not.
At the same time, add support for the C1X feature identifiers
c_generic_selections (renamed from generic_selections) and
c_static_assert, and document them.
Patch by myself and Jean-Daniel Dupas.
llvm-svn: 131308
CXTranslationUnit_NestedMacroInstantiations, which indicates whether
we want to see "nested" macro instantiations (e.g., those that occur
inside other macro instantiations) within the detailed preprocessing
record. Many clients (e.g., those that only care about visible tokens)
don't care about this information, and in code that uses preprocessor
metaprogramming, this information can have a very high cost.
Addresses <rdar://problem/9389320>.
llvm-svn: 130990
which determines whether a particular file is actually a header that
is intended to be guarded from multiple inclusions within the same
translation unit.
llvm-svn: 130808
includes get resolved, especially when they are found relatively to
another include file. We also try to get it working for framework
includes, but that part of the code is untested, as I don't have a code
base that uses it.
llvm-svn: 130246
This change requires making a bunch of fundamental Clang structures (optionally) reference counted to allow correct
ownership semantics of these objects (e.g., ASTContext) to play out between an active ASTUnit and CompilerInstance
object.
llvm-svn: 128011
clients to observe the exact path through which an #included file was
located. This is very useful when trying to record and replay inclusion
operations without it beind influenced by the aggressive caching done
inside the FileManager to avoid redundant system calls and filesystem
operations.
The work to compute and return this is only done in the presence of
callbacks, so it should have no effect on normal compilation.
Patch by Manuel Klimek.
llvm-svn: 127742
The previous name was inaccurate as this token in fact appears at
the end of every preprocessing directive, not just macro definitions.
No functionality change, except for a diagnostic tweak.
llvm-svn: 126631
AST/PCH files more lazy:
- Don't preload all of the file source-location entries when reading
the AST file. Instead, load them lazily, when needed.
- Only look up header-search information (whether a header was already
#import'd, how many times it's been included, etc.) when it's needed
by the preprocessor, rather than pre-populating it.
Previously, we would pre-load all of the file source-location entries,
which also populated the header-search information structure. This was
a relatively minor performance issue, since we would end up stat()'ing
all of the headers stored within a AST/PCH file when the AST/PCH file
was loaded. In the normal PCH use case, the stat()s were cached, so
the cost--of preloading ~860 source-location entries in the Cocoa.h
case---was relatively low.
However, the recent optimization that replaced stat+open with
open+fstat turned this into a major problem, since the preloading of
source-location entries would now end up opening those files. Worse,
those files wouldn't be closed until the file manager was destroyed,
so just opening a Cocoa.h PCH file would hold on to ~860 file
descriptors, and it was easy to blow through the process's limit on
the number of open file descriptors.
By eliminating the preloading of these files, we neither open nor stat
the headers stored in the PCH/AST file until they're actually needed
for something. Concretely, we went from
*** HeaderSearch Stats:
835 files tracked.
364 #import/#pragma once files.
823 included exactly once.
6 max times a file is included.
3 #include/#include_next/#import.
0 #includes skipped due to the multi-include optimization.
1 framework lookups.
0 subframework lookups.
*** Source Manager Stats:
835 files mapped, 3 mem buffers mapped.
37460 SLocEntry's allocated, 11215575B of Sloc address space used.
62 bytes of files mapped, 0 files with line #'s computed.
with a trivial program that uses a chained PCH including a Cocoa PCH
to
*** HeaderSearch Stats:
4 files tracked.
1 #import/#pragma once files.
3 included exactly once.
2 max times a file is included.
3 #include/#include_next/#import.
0 #includes skipped due to the multi-include optimization.
1 framework lookups.
0 subframework lookups.
*** Source Manager Stats:
3 files mapped, 3 mem buffers mapped.
37460 SLocEntry's allocated, 11215575B of Sloc address space used.
62 bytes of files mapped, 0 files with line #'s computed.
for the same program.
llvm-svn: 125286
When we are in code-completion mode, skip parsing of all function bodies except the one where the
code-completion point resides.
For big .cpp files like 'SemaExpr.cpp' the improvement makes a huge difference, in some cases cutting down
code-completion time -62% !
We don't get diagnostics for the bodies though, so modify the code-completion tests that check for errors.
See rdar://8814203.
llvm-svn: 122765
Diagnostic pragmas are broken because we don't keep track of the diagnostic state changes and we only check the current/latest state.
Problems manifest if a diagnostic is emitted for a source line that has different diagnostic state than the current state; this can affect
a lot of places, like C++ inline methods, template instantiations, the lexer, etc.
Fix the issue by having the Diagnostic object keep track of the source location of the pragmas so that it is able to know what is the diagnostic state at any given source location.
Fixes rdar://8365684.
llvm-svn: 121873
trap the serialized preprocessing records (macro definitions, macro
instantiations, macro definitions) from the generation of the
precompiled preamble, then replay those when walking the list of
preprocessed entities. This eliminates a bug where clang_getCursor()
wasn't able to find preprocessed-entity cursors in the preamble.
llvm-svn: 120396
FileSystemOpts through a ton of apis, simplifying a lot of code.
This also fixes a latent bug in ASTUnit where it would invoke
methods on FileManager without creating one in some code paths
in cindextext.
llvm-svn: 120010
and use a better and more general approach, where NullStmt has a flag to indicate whether it was preceded by an empty macro.
Thanks to Abramo Bagnara for the hint!
llvm-svn: 119887
than a Token that holds the same information all in one easy-to-use
package. There's no technical reason to prefer the former -- the
information comes from a Token originally -- and it's clumsier to use,
so I've changed the code to use tokens everywhere.
Approved by clattner
llvm-svn: 119845
-Move the stuff of Diagnostic related to creating/querying diagnostic IDs into a new DiagnosticIDs class.
-DiagnosticIDs can be shared among multiple Diagnostics for multiple translation units.
-The rest of the state in Diagnostic object is considered related and tied to one translation unit.
-Have Diagnostic point to the SourceManager that is related with. Diagnostic can now accept just a
SourceLocation instead of a FullSourceLoc.
-Reflect the changes to various interfaces.
llvm-svn: 119730
The callback info for #if/#elif is not great -- ideally it would give
us a list of tokens in the #if, or even better, a little parse tree.
But that's a lot more work. Instead, clients can retokenize using
Lexer::LexFromRawLexer().
Reviewed by nlewycky.
llvm-svn: 118318
When -working-directory is passed in command line, file paths are resolved relative to the specified directory.
This helps both when using libclang (where we can't require the user to actually change the working directory)
and to help reproduce test cases when the reproduction work comes along.
--FileSystemOptions is introduced which controls how file system operations are performed (currently it just contains
the working directory value if set).
--FileSystemOptions are passed around to various interfaces that perform file operations.
--Opening & reading the content of files should be done only through FileManager. This is useful in general since
file operations will be abstracted in the future for the reproduction mechanism.
FileSystemOptions is independent of FileManager so that we can have multiple translation units sharing the same
FileManager but with different FileSystemOptions.
Addresses rdar://8583824.
llvm-svn: 118203
load identifiers without loading their corresponding macro
definitions. This is likely to improve PCH performance slightly, and
reduces deserialization stack depth considerably when using
preprocessor metaprogramming.
llvm-svn: 117750
inclusion directives, keeping track of every #include, #import,
etc. in the translation unit. We keep track of the source location and
kind of the inclusion, how the file name was spelled, and the
underlying file to which the inclusion resolved.
llvm-svn: 116952
Now MICache is a linked list (per the FIXME), where we tradeoff between MacroInfo objects being in MICache
and MIChainHead. MacroInfo objects in the MICache chain are already "Destroy()'ed", so they can be reused. When
inserting into MICache, we need to remove them from the regular linked list so that they aren't destroyed more than
once.
llvm-svn: 116869
list of allocated MacroInfos. This requires only 1 extra pointer per MacroInfo object, and allows us to blow them
away in one place. This fixes an elusive memory leak with MacroInfos (whose exact location I couldn't still figure
out despite substantial digging).
Fixes <rdar://problem/8361834>.
llvm-svn: 116842
spelled (#pragma, _Pragma, __pragma). In -E mode, use that information
to add appropriate newlines when translating _Pragma and __pragma into
#pragma, like GCC does. Fixes <rdar://problem/8412013>.
llvm-svn: 113553
The extra data stored on user-defined literal Tokens is stored in extra
allocated memory, which is managed by the PreprocessorLexer because there isn't
a better place to put it that makes sure it gets deallocated, but only after
it's used up. My testing has shown no significant slowdown as a result, but
independent testing would be appreciated.
llvm-svn: 112458