See r331124 for how I made a list of files missing the include.
I then ran this Python script:
for f in open('filelist.txt'):
f = f.strip()
fl = open(f).readlines()
found = False
for i in xrange(len(fl)):
p = '#include "llvm/'
if not fl[i].startswith(p):
continue
if fl[i][len(p):] > 'Config':
fl.insert(i, '#include "llvm/Config/llvm-config.h"\n')
found = True
break
if not found:
print 'not found', f
else:
open(f, 'w').write(''.join(fl))
and then looked through everything with `svn diff | diffstat -l | xargs -n 1000 gvim -p`
and tried to fix include ordering and whatnot.
No intended behavior change.
llvm-svn: 331184
As demonstrated by the regression tests added in this patch, the
following cases are valid cases:
1. A Function with no DISubprogram attached, but various debug info
related to its instructions, coming, for instance, from an inlined
function, also defined somewhere else in the same module;
2. ... or coming exclusively from the functions inlined and eliminated
from the module entirely.
The ValueMap shared between CloneFunctionInto calls within CloneModule
needs to contain identity mappings for all of the DISubprogram's to
prevent them from being duplicated by MapMetadata / RemapInstruction
calls, this is achieved via DebugInfoFinder collecting all the
DISubprogram's. However, CloneFunctionInto was missing calls into
DebugInfoFinder for functions w/o DISubprogram's attached, but still
referring DISubprogram's from within (case 1). This patch fixes that.
The fix above, however, exposes another issue: if a module contains a
DISubprogram referenced only indirectly from other debug info
metadata, but not attached to any Function defined within the module
(case 2), cloning such a module causes a DICompileUnit duplication: it
will be moved in indirecty via a DISubprogram by DebugInfoFinder first
(because of the first bug fix described above), without being
self-mapped within the shared ValueMap, and then will be copied during
named metadata cloning. So this patch makes sure DebugInfoFinder
visits DICompileUnit's referenced from DISubprogram's as it goes w/o
re-processing llvm.dbg.cu list over and over again for every function
cloned, and makes sure that CloneFunctionInto self-maps
DICompileUnit's referenced from the entire function, not just its own
DISubprogram attached that may also be missing.
The most convenient way of tesing CloneModule I found is to rely on
CloneModule call from `opt -run-twice`, instead of writing tedious
unit tests. That feature has a couple of properties that makes it hard
to use for this purpose though:
1. CloneModule doesn't copy source filename, making `opt -run-twice`
report it as a difference.
2. `opt -run-twice` does the second run on the original module, not
its clone, making the result of cloning completely invisible in opt's
actual output with and without `-run-twice` both, which directly
contradicts `opt -run-twice`s own error message.
This patch fixes this as well.
Reviewed By: aprantl
Reviewers: loladiro, GorNishanov, espindola, echristo, dexonsmith
Subscribers: vsk, debug-info, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D45593
llvm-svn: 330069
We have a few functions that virtually all command wants to run on
process startup/shutdown. This patch adds InitLLVM class to do that
all at once, so that we don't need to copy-n-paste boilerplate code
to each llvm command's main() function.
Differential Revision: https://reviews.llvm.org/D45602
llvm-svn: 330046
These aren't the .def style files used in LLVM that require a macro
defined before their inclusion - they're just basic non-modular includes
to stamp out command line flag variables.
llvm-svn: 329840
Sometimes users do not specify data layout in LLVM assembly and let llc set the
data layout by target triple after loading the LLVM assembly.
Currently the parser checks alloca address space no matter whether the LLVM
assembly contains data layout definition, which causes false alarm since the
default data layout does not contain the correct alloca address space.
The parser also calls verifier to check debug info and updating invalid debug
info. Currently there is no way to let the verifier to check debug info only.
If the verifier finds non-debug-info issues the parser will fail.
For llc, the fix is to remove the check of alloca addr space in the parser and
disable updating debug info, and defer the updating of debug info and
verification to be after setting data layout of the IR by target.
For other llvm tools, since they do not override data layout by target but
instead can override data layout by a command line option, an argument for
overriding data layout is added to the parser. In cases where data layout
overriding is necessary for the parser, the data layout can be provided by
command line.
Differential Revision: https://reviews.llvm.org/D41832
llvm-svn: 323826
Combine expression patterns to form expressions with fewer, simple instructions.
This pass does not modify the CFG.
For example, this pass reduce width of expressions post-dominated by TruncInst
into smaller width when applicable.
It differs from instcombine pass in that it contains pattern optimization that
requires higher complexity than the O(1), thus, it should run fewer times than
instcombine pass.
Differential Revision: https://reviews.llvm.org/D38313
llvm-svn: 323321
Opt's "-enable-debugify" mode adds an instance of Debugify at the
beginning of the pass pipeline, and an instance of CheckDebugify at the
end.
You can enable this mode with lit using: -Dopt="opt -enable-debugify".
Note that running test suites in this mode will result in many failures
due to strict FileCheck commands, etc.
It can be more useful to look for assertion failures which arise only
when Debugify is enabled, e.g to prove that we have (or do not have)
test coverage for some code path with debug info present.
Differential Revision: https://reviews.llvm.org/D41793
llvm-svn: 323256
Summary:
First, we need to explain the core of the vulnerability. Note that this
is a very incomplete description, please see the Project Zero blog post
for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
The basis for branch target injection is to direct speculative execution
of the processor to some "gadget" of executable code by poisoning the
prediction of indirect branches with the address of that gadget. The
gadget in turn contains an operation that provides a side channel for
reading data. Most commonly, this will look like a load of secret data
followed by a branch on the loaded value and then a load of some
predictable cache line. The attacker then uses timing of the processors
cache to determine which direction the branch took *in the speculative
execution*, and in turn what one bit of the loaded value was. Due to the
nature of these timing side channels and the branch predictor on Intel
processors, this allows an attacker to leak data only accessible to
a privileged domain (like the kernel) back into an unprivileged domain.
The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In many
cases, the compiler can simply use directed conditional branches and
a small search tree. LLVM already has support for lowering switches in
this way and the first step of this patch is to disable jump-table
lowering of switches and introduce a pass to rewrite explicit indirectbr
sequences into a switch over integers.
However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as
a trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures the
processor predicts the return to go to a controlled, known location. The
retpoline then "smashes" the return address pushed onto the stack by the
call with the desired target of the original indirect call. The result
is a predicted return to the next instruction after a call (which can be
used to trap speculative execution within an infinite loop) and an
actual indirect branch to an arbitrary address.
On 64-bit x86 ABIs, this is especially easily done in the compiler by
using a guaranteed scratch register to pass the target into this device.
For 32-bit ABIs there isn't a guaranteed scratch register and so several
different retpoline variants are introduced to use a scratch register if
one is available in the calling convention and to otherwise use direct
stack push/pop sequences to pass the target address.
This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886
We also support a target feature that disables emission of the retpoline
thunk by the compiler to allow for custom thunks if users want them.
These are particularly useful in environments like kernels that
routinely do hot-patching on boot and want to hot-patch their thunk to
different code sequences. They can write this custom thunk and use
`-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
case, on x86-64 thu thunk names must be:
```
__llvm_external_retpoline_r11
```
or on 32-bit:
```
__llvm_external_retpoline_eax
__llvm_external_retpoline_ecx
__llvm_external_retpoline_edx
__llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in
the case of the `push` suffix on the top of the stack via a `pushl`
instruction.
There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.
The only other indirect branches remaining that we are aware of are from
precompiled runtimes (such as crt0.o and similar). The ones we have
found are not really attackable, and so we have not focused on them
here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.
For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z retpolineplt`
(or use similar functionality from some other linker). We strongly
recommend also using `-z now` as non-lazy binding allows the
retpoline-mitigated PLT to be substantially smaller.
When manually apply similar transformations to `-mretpoline` to the
Linux kernel we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately 2%)
even for extremely syscall-heavy applications. This is largely due to
the small number of indirect branches that occur in performance
sensitive paths of the kernel.
When using these patches on statically linked applications, especially
C++ applications, you should expect to see a much more dramatic
performance hit. For microbenchmarks that are switch, indirect-, or
virtual-call heavy we have seen overheads ranging from 10% to 50%.
However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
the impact of hot indirect calls (by speculatively promoting them to
direct calls) and allow optimized search trees to be used to lower
switches. If you need to deploy these techniques in C++ applications, we
*strongly* recommend that you ensure all hot call targets are statically
linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well
tuned servers using all of these techniques saw 5% - 10% overhead from
the use of retpoline.
We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality available
as soon as possible. Happy for more code review, but we'd really like to
get these patches landed and backported ASAP for obvious reasons. We're
planning to backport this to both 6.0 and 5.0 release streams and get
a 5.0 release with just this cherry picked ASAP for distros and vendors.
This patch is the work of a number of people over the past month: Eric, Reid,
Rui, and myself. I'm mailing it out as a single commit due to the time
sensitive nature of landing this and the need to backport it. Huge thanks to
everyone who helped out here, and everyone at Intel who helped out in
discussions about how to craft this. Also, credit goes to Paul Turner (at
Google, but not an LLVM contributor) for much of the underlying retpoline
design.
Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer
Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D41723
llvm-svn: 323155
Since this isn't a real header - it includes static functions and had
external linkage variables (though this change makes them static, since
that's what they should be) so can't be included more than once in a
program.
llvm-svn: 319082
Clang implements the -finstrument-functions flag inherited from GCC, which
inserts calls to __cyg_profile_func_{enter,exit} on function entry and exit.
This is useful for getting a trace of how the functions in a program are
executed. Normally, the calls remain even if a function is inlined into another
function, but it is useful to be able to turn this off for users who are
interested in a lower-level trace, i.e. one that reflects what functions are
called post-inlining. (We use this to generate link order files for Chromium.)
LLVM already has a pass for inserting similar instrumentation calls to
mcount(), which it does after inlining. This patch renames and extends that
pass to handle calls both to mcount and the cygprofile functions, before and/or
after inlining as controlled by function attributes.
Differential Revision: https://reviews.llvm.org/D39287
llvm-svn: 318195
Probably due to a change of how some pass initializes its dependencies,
the -write-bitcode pass (Bitcode/Writer/BitcodeWriterPass.cpp) is not
initialized in opt anymore and therefore not usable with
opt -write-bitcode
Explicitly call initializeWriteBitcodePassPass() to make it available
in opt again.
Differential Revision: https://reviews.llvm.org/D39223
llvm-svn: 316464
Reverting to investigate layering effects of MCJIT not linking
libCodeGen but using TargetMachine::getNameWithPrefix() breaking the
lldb bots.
This reverts commit r315633.
llvm-svn: 315637
Merge LLVMTargetMachine into TargetMachine.
- There is no in-tree target anymore that just implements TargetMachine
but not LLVMTargetMachine.
- It should still be possible to stub out all the various functions in
case a target does not want to use lib/CodeGen
- This simplifies the code and avoids methods ending up in the wrong
interface.
Differential Revision: https://reviews.llvm.org/D38489
llvm-svn: 315633
This came out of a recent discussion on llvm-dev
(https://reviews.llvm.org/D38042). Currently the Verifier will strip
the debug info metadata from a module if it finds the dbeug info to be
malformed. This feature is very valuable since it allows us to improve
the Verifier by making it stricter without breaking bcompatibility,
but arguable the Verifier pass should not be modifying the IR. This
patch moves the stripping of broken debug info into AutoUpgrade
(UpgradeDebugInfo to be precise), which is a much better location for
this since the stripping of malformed (i.e., produced by older, buggy
versions of Clang) is a (harsh) form of AutoUpgrade.
This change is mostly NFC in nature, the one big difference is the
behavior when LLVM module passes are introducing malformed debug
info. Prior to this patch, a NoAsserts build would have printed a
warning and stripped the debug info, after this patch the Verifier
will report a fatal error. I believe this behavior is actually more
desirable anyway.
Differential Revision: https://reviews.llvm.org/D38184
llvm-svn: 314699
Summary:
The New Pass Manager infrastructure was forgetting to keep around the optimization remark yaml file that the compiler might have been producing. This meant setting the option to '-' for stdout worked, but setting it to a filename didn't give file output (presumably it was deleted because compilation didn't explicitly keep it). This change just ensures that the file is kept if compilation succeeds.
So far I have updated one of the optimization remark output tests to add a version with the new pass manager. It is my intention for this patch to also include changes to all tests that use `-opt-remark-output=` but I wanted to get the code patch ready for review while I was making all those changes.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33951
Reviewers: anemet, chandlerc
Reviewed By: anemet, chandlerc
Subscribers: javed.absar, chandlerc, fhahn, llvm-commits
Differential Revision: https://reviews.llvm.org/D36906
llvm-svn: 311271
IMHO it is an antipattern to have a enum value that is Default.
At any given piece of code it is not clear if we have to handle
Default or if has already been mapped to a concrete value. In this
case in particular, only the target can do the mapping and it is nice
to make sure it is always done.
This deletes the two default enum values of CodeModel and uses an
explicit Optional<CodeModel> when it is possible that it is
unspecified.
llvm-svn: 309911
Summary:
Add an option to prevent diagnostics that do not meet a minimum hotness
threshold from being output. When generating optimization remarks for
large codebases with a ton of cold code paths, this option can be used
to limit the optimization remark output at a reasonable size. Discussion of
this change can be read here:
http://lists.llvm.org/pipermail/llvm-dev/2017-June/114377.html
Reviewers: anemet, davidxl, hfinkel
Reviewed By: anemet
Subscribers: qcolombet, javed.absar, fhahn, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D34867
llvm-svn: 306912
Summary:
To enable profile hotness information in diagnostics output, Clang takes
the option `-fdiagnostics-show-hotness` -- that's "diagnostics", with an
"s" at the end. Clang also defines `CodeGenOptions::DiagnosticsWithHotness`.
LLVM, on the other hand, defines
`LLVMContext::getDiagnosticHotnessRequested` -- that's "diagnostic", not
"diagnostics". It's a small difference, but it's confusing, typo-inducing, and
frustrating.
Add a new method with the spelling "diagnostics", and "deprecate" the
old spelling.
Reviewers: anemet, davidxl
Reviewed By: anemet
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D34864
llvm-svn: 306848
Summary: Also see D33429 for other ThinLTO + New PM related changes.
Reviewers: davide, chandlerc, tejohnson
Subscribers: mehdi_amini, Prazek, cfe-commits, inglorion, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D33525
llvm-svn: 304378
This provides a new way to access the TargetMachine through
TargetPassConfig, as a dependency.
The patterns replaced here are:
* Passes handling a null TargetMachine call
`getAnalysisIfAvailable<TargetPassConfig>`.
* Passes not handling a null TargetMachine
`addRequired<TargetPassConfig>` and call
`getAnalysis<TargetPassConfig>`.
* MachineFunctionPasses now use MF.getTarget().
* Remove all the TargetMachine constructors.
* Remove INITIALIZE_TM_PASS.
This fixes a crash when running `llc -start-before prologepilog`.
PEI needs StackProtector, which gets constructed without a TargetMachine
by the pass manager. The StackProtector pass doesn't handle the case
where there is no TargetMachine, so it segfaults.
Related to PR30324.
Differential Revision: https://reviews.llvm.org/D33222
llvm-svn: 303360
Currently, when masked load, store, gather or scatter intrinsics are used, we check in CodeGenPrepare pass if the subtarget support these intrinsics, if not we replace them with scalar code - this is a functional transformation not an optimization (not optional).
CodeGenPrepare pass does not run when the optimization level is set to CodeGenOpt::None (-O0).
Functional transformation should run with all optimization levels, so here I created a new pass which runs on all optimization levels and does no more than this transformation.
Differential Revision: https://reviews.llvm.org/D32487
llvm-svn: 303050
This pass uses a new target hook to decide whether or not to expand a particular
intrinsic to the shuffevector sequence.
Differential Revision: https://reviews.llvm.org/D32245
llvm-svn: 302631
This lets the pass focus on gathering the required analyzes, and the
utility class focus on the transformation.
Differential Revision: https://reviews.llvm.org/D31303
llvm-svn: 302609
Summary:
The cumulative size of the bitcode files for a very large application
can be huge, particularly with -g. In a distributed build environment,
all of these files must be sent to the remote build node that performs
the thin link step, and this can exceed size limits.
The thin link actually only needs the summary along with a bitcode
symbol table. Until we have a proper bitcode symbol table, simply
stripping the debug metadata results in significant size reduction.
Add support for an option to additionally emit minimized bitcode
modules, just for use in the thin link step, which for now just strips
all debug metadata. I plan to add a cc1 option so this can be invoked
easily during the compile step.
However, care must be taken to ensure that these minimized thin link
bitcode files produce the same index as with the original bitcode files,
as these original bitcode files will be used in the backends.
Specifically:
1) The module hash used for caching is typically produced by hashing the
written bitcode, and we want to include the hash that would correspond
to the original bitcode file. This is because we want to ensure that
changes in the stripped portions affect caching. Added plumbing to emit
the same module hash in the minimized thin link bitcode file.
2) The module paths in the index are constructed from the module ID of
each thin linked bitcode, and typically is automatically generated from
the input file path. This is the path used for finding the modules to
import from, and obviously we need this to point to the original bitcode
files. Added gold-plugin support to take a suffix replacement during the
thin link that is used to override the identifier on the MemoryBufferRef
constructed from the loaded thin link bitcode file. The assumption is
that the build system can specify that the minimized bitcode file has a
name that is similar but uses a different suffix (e.g. out.thinlink.bc
instead of out.o).
Added various tests to ensure that we get identical index files out of
the thin link step.
Reviewers: mehdi_amini, pcc
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31027
llvm-svn: 298638
Summary: Because SamplePGO passes will be invoked twice in ThinLTO build: once at compile phase, the other at backend. We want to make sure the IR at the 2nd phase matches the hot part in profile, thus we do not want to inline hot callsites in the first phase.
Reviewers: tejohnson, eraman
Reviewed By: tejohnson
Subscribers: mehdi_amini, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D31201
llvm-svn: 298428
This change introduces adjustPassManager target callback giving a
target an opportunity to tweak PassManagerBuilder before pass
managers are populated.
This generalizes and replaces addEarlyAsPossiblePasses target
callback. In particular that can be used to add custom passes to
extension points other than EP_EarlyAsPossible.
Differential Revision: https://reviews.llvm.org/D28336
llvm-svn: 293189
This pass prepares a module containing type metadata for ThinLTO by splitting
it into regular and thin LTO parts if possible, and writing both parts to
a multi-module bitcode file. Modules that do not contain type metadata are
written unmodified as a single module.
All globals with type metadata are added to the regular LTO module, and
the rest are added to the thin LTO module.
Differential Revision: https://reviews.llvm.org/D27324
llvm-svn: 289899
Summary:
This makes it explicit that ownership is taken. Also replace all `new`
with make_unique<> at call sites.
Reviewers: anemet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26884
llvm-svn: 287449
This restores the rest of r286297 (part was restored in r286475).
Specifically, it restores the part requiring adding a dependency from
the Analysis to Object library (downstream use changed to correctly
model split BitReader vs BitWriter libraries).
Original description of this part of patch follows:
Module level asm may also contain defs of values. We need to prevent
export of any refs to local values defined in module level asm (e.g. a
ref in normal IR), since that also requires renaming/promotion of the
local. To do that, the summary index builder looks at all values in the
module level asm string that are not marked Weak or Global, which is
exactly the set of locals that are defined. A summary is created for
each of these local defs and flagged as NoRename.
This required adding handling to the BitcodeWriter to look at GV
declarations to see if they have a summary (rather than skipping them
all).
Finally, added an assert to IRObjectFile::CollectAsmUndefinedRefs to
ensure that an MCAsmParser is available, otherwise the module asm parse
would silently fail. Initialized the asm parser in the opt tool for use
in testing this fix.
Fixes PR30610.
llvm-svn: 286844
Summary:
This patch uses the same approach added for inline asm in r285513 to
similarly prevent promotion/renaming of locals used or defined in module
level asm.
All static global values defined in normal IR and used in module level asm
should be included on either the llvm.used or llvm.compiler.used global.
The former were already being flagged as NoRename in the summary, and
I've simply added llvm.compiler.used values to this handling.
Module level asm may also contain defs of values. We need to prevent
export of any refs to local values defined in module level asm (e.g. a
ref in normal IR), since that also requires renaming/promotion of the
local. To do that, the summary index builder looks at all values in the
module level asm string that are not marked Weak or Global, which is
exactly the set of locals that are defined. A summary is created for
each of these local defs and flagged as NoRename.
This required adding handling to the BitcodeWriter to look at GV
declarations to see if they have a summary (rather than skipping them
all).
Finally, added an assert to IRObjectFile::CollectAsmUndefinedRefs to
ensure that an MCAsmParser is available, otherwise the module asm parse
would silently fail. Initialized the asm parser in the opt tool for use
in testing this fix.
Fixes PR30610.
Reviewers: mehdi_amini
Subscribers: johanengelen, krasin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26146
llvm-svn: 286297
(Re-committed after moving the template specialization under the yaml
namespace. GCC was complaining about this.)
This allows various presentation of this data using an external tool.
This was first recommended here[1].
As an example, consider this module:
1 int foo();
2 int bar();
3
4 int baz() {
5 return foo() + bar();
6 }
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
- Callee: foo
- String: will not be inlined into
- Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
- Callee: bar
- String: will not be inlined into
- Caller: baz
...
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:
ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
DEBUG_TYPE, "NotInlined", &I)
<< NV("Callee", Callee) << " will not be inlined into "
<< NV("Caller", CS.getCaller()) << setIsVerbose());
NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.
Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types. Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.
On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".
* As before hotness is computed in the analysis pass instead of
DiganosticInfo. This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
Differential Revision: https://reviews.llvm.org/D24587
llvm-svn: 282539
This allows various presentation of this data using an external tool.
This was first recommended here[1].
As an example, consider this module:
1 int foo();
2 int bar();
3
4 int baz() {
5 return foo() + bar();
6 }
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
- Callee: foo
- String: will not be inlined into
- Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
- Callee: bar
- String: will not be inlined into
- Caller: baz
...
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:
ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
DEBUG_TYPE, "NotInlined", &I)
<< NV("Callee", Callee) << " will not be inlined into "
<< NV("Caller", CS.getCaller()) << setIsVerbose());
NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.
Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types. Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.
On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".
* As before hotness is computed in the analysis pass instead of
DiganosticInfo. This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
Differential Revision: https://reviews.llvm.org/D24587
llvm-svn: 282499
As discussed in https://reviews.llvm.org/D22666, our current mechanism to
support -pg profiling, where we insert calls to mcount(), or some similar
function, is fundamentally broken. We insert these calls in the frontend, which
means they get duplicated when inlining, and so the accumulated execution
counts for the inlined-into functions are wrong.
Because we don't want the presence of these functions to affect optimizaton,
they should be inserted in the backend. Here's a pass which would do just that.
The knowledge of the name of the counting function lives in the frontend, so
we're passing it here as a function attribute. Clang will be updated to use
this mechanism.
Differential Revision: https://reviews.llvm.org/D22825
llvm-svn: 280347
minimal and boring form than the old pass manager's version.
This pass does the very minimal amount of work necessary to inline
functions declared as always-inline. It doesn't support a wide array of
things that the legacy pass manager did support, but is alse ... about
20 lines of code. So it has that going for it. Notably things this
doesn't support:
- Array alloca merging
- To support the above, bottom-up inlining with careful history
tracking and call graph updates
- DCE of the functions that become dead after this inlining.
- Inlining through call instructions with the always_inline attribute.
Instead, it focuses on inlining functions with that attribute.
The first I've omitted because I'm hoping to just turn it off for the
primary pass manager. If that doesn't pan out, I can add it here but it
will be reasonably expensive to do so.
The second should really be handled by running global-dce after the
inliner. I don't want to re-implement the non-trivial logic necessary to
do comdat-correct DCE of functions. This means the -O0 pipeline will
have to be at least 'always-inline,global-dce', but that seems
reasonable to me. If others are seriously worried about this I'd like to
hear about it and understand why. Again, this is all solveable by
factoring that logic into a utility and calling it here, but I'd like to
wait to do that until there is a clear reason why the existing
pass-based factoring won't work.
The final point is a serious one. I can fairly easily add support for
this, but it seems both costly and a confusing construct for the use
case of the always inliner running at -O0. This attribute can of course
still impact the normal inliner easily (although I find that
a questionable re-use of the same attribute). I've started a discussion
to sort out what semantics we want here and based on that can figure out
if it makes sense ta have this complexity at O0 or not.
One other advantage of this design is that it should be quite a bit
faster due to checking for whether the function is a viable candidate
for inlining exactly once per function instead of doing it for each call
site.
Anyways, hopefully a reasonable starting point for this pass.
Differential Revision: https://reviews.llvm.org/D23299
llvm-svn: 278896
Summary:
Port the ModuleSummaryAnalysisWrapperPass to the new pass manager.
Use it in the ported BitcodeWriterPass (similar to how we use the
legacy ModuleSummaryAnalysisWrapperPass in the legacy WriteBitcodePass).
Also, pass the -module-summary opt flag through to the new pass
manager pipeline and through to the bitcode writer pass, and add
a test that uses it.
Reviewers: mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23439
llvm-svn: 278508
Summary:
Having -O0 in opt allows testing that -O0 optimization
pipeline is built correctly.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23208
llvm-svn: 277829
This adds boilerplate code for all coroutine passes,
the passes are no-ops for now.
Also, a small test has been added to verify that passes execute in
the expected order or not at all if coroutine support is disabled.
Patch by Gor Nishanov!
Differential Revision: https://reviews.llvm.org/D22847
llvm-svn: 277033
Summary:
This is the first set of changes implementing the RFC from
http://thread.gmane.org/gmane.comp.compilers.llvm.devel/98334
This is a cross-sectional patch; rather than implementing the hotness
attribute for all optimization remarks and all passes in a patch set, it
implements it for the 'missed-optimization' remark for Loop
Distribution. My goal is to shake out the design issues before scaling
it up to other types and passes.
Hotness is computed as an integer as the multiplication of the block
frequency with the function entry count. It's only printed in opt
currently since clang prints the diagnostic fields directly. E.g.:
remark: /tmp/t.c:3:3: loop not distributed: use -Rpass-analysis=loop-distribute for more info (hotness: 300)
A new API added is similar to emitOptimizationRemarkMissed. The
difference is that it additionally takes a code region that the
diagnostic corresponds to. From this, hotness is computed using BFI.
The new API is exposed via an analysis pass so that it can be made
dependent on LazyBFI. (Thanks to Hal for the analysis pass idea.)
This feature can all be enabled by setDiagnosticHotnessRequested in the
LLVM context. If this is off, LazyBFI is not calculated (D22141) so
there should be no overhead.
A new command-line option is added to turn this on in opt.
My plan is to switch all user of emitOptimizationRemark* to use this
module instead.
Reviewers: hfinkel
Subscribers: rcox2, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D21771
llvm-svn: 275583
Previously, there was a discrepancy between the population of function
passes in FPasses, and their invocation. Function passes specified on
the command line, after an optimizaton level was simply discared. This
fix PR27509.
Patch by Jesper Antonsson.
Differential Review: http://reviews.llvm.org/D20725
llvm-svn: 272770
looking for it along $PATH. This allows installs of LLVM tools outside of
$PATH to find the symbolizer and produce pretty backtraces if they crash.
llvm-svn: 272232
Having an enum member named Default is quite confusing: Is it distinct
from the others?
This patch removes that member and instead uses Optional<Reloc> in
places where we have a user input that still hasn't been maped to the
default value, which is now clear has no be one of the remaining 3
options.
llvm-svn: 269988
Summary:
This is a hook to allow TargetMachine to install passes at the
EP_EarlyAsPossible PassManagerBuilder extension point.
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18614
llvm-svn: 267763
This intrinsic takes two arguments, ``%ptr`` and ``%offset``. It loads
a 32-bit value from the address ``%ptr + %offset``, adds ``%ptr`` to that
value and returns it. The constant folder specifically recognizes the form of
this intrinsic and the constant initializers it may load from; if a loaded
constant initializer is known to have the form ``i32 trunc(x - %ptr)``,
the intrinsic call is folded to ``x``.
LLVM provides that the calculation of such a constant initializer will
not overflow at link time under the medium code model if ``x`` is an
``unnamed_addr`` function. However, it does not provide this guarantee for
a constant initializer folded into a function body. This intrinsic can be
used to avoid the possibility of overflows when loading from such a constant.
Differential Revision: http://reviews.llvm.org/D18367
llvm-svn: 267223
Summary:
This is a follow-on to apply Duncan's new DIType ODR uniquing from
r266549 and r266713 in more places.
Enable enableDebugTypeODRUniquing() for ThinLTO backends invoked via
libLTO, similar to the way r266549 enabled this for ThinLTO backend
threads launched from gold-plugin.
Also enable enableDebugTypeODRUniquing in opt, similar to the way
r266549 enabled this for llvm-link (on by default, can be disabled with
new -disable-debug-info-type-map option), since we may perform ThinLTO
importing from opt.
Reviewers: dexonsmith, joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19263
llvm-svn: 266746
The fast register-allocator cannot cope with inter-block dependencies without
spilling. This is fine for ldrex/strex loops coming from atomicrmw instructions
where any value produced within a block is dead by the end, but not for
cmpxchg. So we lower a cmpxchg at -O0 via a pseudo-inst that gets expanded
after regalloc.
Fortunately this is at -O0 so we don't have to care about performance. This
simplifies the various axes of expansion considerably: we assume a strong
seq_cst operation and ensure ordering via the always-present DMB instructions
rather than v8 acquire/release instructions.
Should fix the 32-bit part of PR25526.
llvm-svn: 266679
At the same time, fixes InstructionsTest::CastInst unittest: yes
you can leave the IR in an invalid state and exit when you don't
destroy the context (like the global one), no longer now.
This is the first part of http://reviews.llvm.org/D19094
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266379
Summary:
Let keep llvm-as "dumb": it converts textual IR to bitcode. This
commit removes the dependency from llvm-as to libLLVMAnalysis.
We'll add back summary in llvm-as if we get to a textual
representation for it at some point. In the meantime, opt seems
like a better place for that.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19032
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266131
opt adds Verifier passes in AddOptimizationPasses even if
-disable-verify is on. Fix it so that the extra verification occurs
either when (1) -disable-verifier is off, or (2) -verify-each is on.
Thanks to David Jones for pointing out this behavior!
llvm-svn: 263090
Summary:
This is intended to be a performance flag, on the same level as clang
cc1 option "--disable-free". LLVM will never initialize it by default,
it will be up to the client creating the LLVMContext to request this
behavior. Clang will do it by default in Release build (just like
--disable-free).
"opt" and "llc" can opt-in using -disable-named-value command line
option.
When performing LTO on llvm-tblgen, the initial merging of IR peaks
at 92MB without this patch, and 86MB after this patch,setNameImpl()
drops from 6.5MB to 0.5MB.
The total link time goes from ~29.5s to ~27.8s.
Compared to a compile-time flag (like the IRBuilder one), it performs
very close. I profiled on SROA and obtain these results:
420ms with IRBuilder that preserve name
372ms with IRBuilder that strip name
375ms with IRBuilder that preserve name, and a runtime flag to strip
Reviewers: chandlerc, dexonsmith, bogner
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D17946
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263086
Cloning the module was supposed to guard against the possibility
that the passes may be non-idempotent. However, for some reason
I decided to put that AFTER the passes had already run on the
module, defeating the point entirely. Fix that by moving up the
CloneModule as is done in llc.
llvm-svn: 254819
`Out` can be null if no output is requested, so move any access
to it inside the conditional. Thanks to Justin Bogner for finding
this.
llvm-svn: 254804
Summary: Lately, I have submitted a number of patches to fix bugs that
only occurred when using the same pass manager to compile multiple
modules (generally these bugs are failure to reset some persistent
state). Unfortunately I don't think there is currently a way to test
that from the command line. This adds a very simple flag to both llc
and opt, under which the tools will simply re-run their respective
pass pipelines using the same pass manager on (a clone of the same
module). Additionally, we verify that both outputs are bitwise the
same.
Reviewers: yaron.keren
Subscribers: loladiro, yaron.keren, kcc, llvm-commits
Differential Revision: http://reviews.llvm.org/D14965
llvm-svn: 254774
folding the code into the main Analysis library.
There already wasn't much of a distinction between Analysis and IPA.
A number of the passes in Analysis are actually IPA passes, and there
doesn't seem to be any advantage to separating them.
Moreover, it makes it hard to have interactions between analyses that
are both local and interprocedural. In trying to make the Alias Analysis
infrastructure work with the new pass manager, it becomes particularly
awkward to navigate this split.
I've tried to find all the places where we referenced this, but I may
have missed some. I have also adjusted the C API to continue to be
equivalently functional after this change.
Differential Revision: http://reviews.llvm.org/D12075
llvm-svn: 245318
remove ExecutionEngine's dependence on CodeGen. NFC.
This is a follow-up to r238080.
Differential Revision: http://reviews.llvm.org/D9830
llvm-svn: 238244
This is part of the work to remove TargetMachine::resetTargetOptions.
In this patch, instead of updating global variable NoFramePointerElim in
resetTargetOptions, its use in DisableFramePointerElim is replaced with a call
to TargetFrameLowering::noFramePointerElim. This function determines on a
per-function basis if frame pointer elimination should be disabled.
There is no change in functionality except that cl:opt option "disable-fp-elim"
can now override function attribute "no-frame-pointer-elim".
llvm-svn: 238080
options.
This commit fixes a bug in llc and opt where "-mcpu" and "-mattr" wouldn't
override function attributes "-target-cpu" and "-target-features" in the IR.
Differential Revision: http://reviews.llvm.org/D9537
llvm-svn: 236677
Remove all the global bits to do with preserving use-list order by
moving the `cl::opt`s to the individual tools that want them. There's a
minor functionality change to `libLTO`, in that you can't send in
`-preserve-bc-uselistorder=false`, but making that bit settable (if it's
worth doing) should be through explicit LTO API.
As a drive-by fix, I removed some includes of `UseListOrder.h` that were
made unnecessary by recent commits.
llvm-svn: 234973
Unify the error messages for the various tools when `verifyModule()`
fails on an input module. The "brave new way" is:
lltool: path/to/input.ll: error: input module is broken!
llvm-svn: 233667
Change `llc` and `opt` to run `verifyModule()`. This ensures that we
check the full module before `FunctionPass::doInitialization()` ever
gets called (I was getting crashes in `DwarfDebug` instead of verifier
failures when testing a WIP patch that checks operands of compile
units). In `opt`, also move up debug-info-stripping so that it still
runs before verification.
There was a fair bit of broken code that was sitting in tree.
Interestingly, some were cases of a `select` that referred to itself in
`-instcombine` tests (apparently an intermediate result). I split them
off to `*-noverify.ll` tests with RUN lines like this:
opt < %s -S -disable-verify -instcombine | opt -S | FileCheck %s
This avoids verifying the input file (so we can get the broken code into
`-instcombine), but still verifies the output with a second call to
`opt` (to verify that `-instcombine` will clean it up like it should).
llvm-svn: 233432
Remove `DebugInfoVerifierLegacyPass` and the `-verify-di` pass.
Instead, call into the `DebugInfoVerifier` from inside
`VerifierLegacyPass::finalizeModule()`. This better matches the logic
in `verifyModule()` (used by the new PassManager), avoids requiring two
separate passes to verify the IR, and makes the API for "add a pass to
verify the IR" simple.
Note: the `-verify-debug-info` flag still works (for now, at least;
eventually it might make sense to just remove it).
llvm-svn: 232772
`StripDebug` was only used by tools/opt/opt.cpp in
`AddStandardLinkPasses()`, but opt.cpp adds the same pass based on its
command-line flag before it calls `AddStandardLinkPasses()`. Stripping
debug info twice isn't very useful.
llvm-svn: 232765
Summary:
DataLayout keeps the string used for its creation.
As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().
Get rid of DataLayoutPass: the DataLayout is in the Module
The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.
Make DataLayout Non-Optional in the Module
Module->getDataLayout() will never returns nullptr anymore.
Reviewers: echristo
Subscribers: resistor, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D7992
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231270
LLVM's include tree and the use of using declarations to hide the
'legacy' namespace for the old pass manager.
This undoes the primary modules-hostile change I made to keep
out-of-tree targets building. I sent an email inquiring about whether
this would be reasonable to do at this phase and people seemed fine with
it, so making it a reality. This should allow us to start bootstrapping
with modules to a certain extent along with making it easier to mix and
match headers in general.
The updates to any code for users of LLVM are very mechanical. Switch
from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
Qualify the types which now produce compile errors with "legacy::". The
most common ones are "PassManager", "PassManagerBase", and
"FunctionPassManager".
llvm-svn: 229094
terms of the new pass manager's TargetIRAnalysis.
Yep, this is one of the nicer bits of the new pass manager's design.
Passes can in many cases operate in a vacuum and so we can just nest
things when convenient. This is particularly convenient here as I can
now consolidate all of the TargetMachine logic on this analysis.
The most important change here is that this pushes the function we need
TTI for all the way into the TargetMachine, and re-creates the TTI
object for each function rather than re-using it for each function.
We're now prepared to teach the targets to produce function-specific TTI
objects with specific subtargets cached, etc.
One piece of feedback I'd love here is whether its worth renaming any of
this stuff. None of the names really seem that awesome to me at this
point, but TargetTransformInfoWrapperPass is particularly ... odd.
TargetIRAnalysisWrapper might make more sense. I would want to do that
rename separately anyways, but let me know what you think.
llvm-svn: 227731
produce it.
This adds a function to the TargetMachine that produces this analysis
via a callback for each function. This in turn faves the way to produce
a *different* TTI per-function with the correct subtarget cached.
I've also done the necessary wiring in the opt tool to thread the target
machine down and make it available to the pass registry so that we can
construct this analysis from a target machine when available.
llvm-svn: 227721
base which it adds a single analysis pass to, to instead return the type
erased TargetTransformInfo object constructed for that TargetMachine.
This removes all of the pass variants for TTI. There is now a single TTI
*pass* in the Analysis layer. All of the Analysis <-> Target
communication is through the TTI's type erased interface itself. While
the diff is large here, it is nothing more that code motion to make
types available in a header file for use in a different source file
within each target.
I've tried to keep all the doxygen comments and file boilerplate in line
with this move, but let me know if I missed anything.
With this in place, the next step to making TTI work with the new pass
manager is to introduce a really simple new-style analysis that produces
a TTI object via a callback into this routine on the target machine.
Once we have that, we'll have the building blocks necessary to accept
a function argument as well.
llvm-svn: 227685
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
If the personality is not a recognized MSVC personality function, this
pass delegates to the dwarf EH preparation pass. This chaining supports
people on *-windows-itanium or *-windows-gnu targets.
Currently this recognizes some personalities used by MSVC and turns
resume instructions into traps to avoid link errors. Even if cleanups
are not used in the source program, LLVM requires the frontend to emit a
code path that resumes unwinding after an exception. Clang does this,
and we get unreachable resume instructions. PR20300 covers cleaning up
these unreachable calls to resume.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D7216
llvm-svn: 227405
manager to support the actual uses of it. =]
When I ported instcombine to the new pass manager I discover that it
didn't work because TLI wasn't available in the right places. This is
a somewhat surprising and/or subtle aspect of the new pass manager
design that came up before but I think is useful to be reminded of:
While the new pass manager *allows* a function pass to query a module
analysis, it requires that the module analysis is already run and cached
prior to the function pass manager starting up, possibly with
a 'require<foo>' style utility in the pass pipeline. This is an
intentional hurdle because using a module analysis from a function pass
*requires* that the module analysis is run prior to entering the
function pass manager. Otherwise the other functions in the module could
be in who-knows-what state, etc.
A somewhat surprising consequence of this design decision (at least to
me) is that you have to design a function pass that leverages
a module analysis to do so as an optional feature. Even if that means
your function pass does no work in the absence of the module analysis,
you have to handle that possibility and remain conservatively correct.
This is a natural consequence of things being able to invalidate the
module analysis and us being unable to re-run it. And it's a generally
good thing because it lets us reorder passes arbitrarily without
breaking correctness, etc.
This ends up causing problems in one case. What if we have a module
analysis that is *definitionally* impossible to invalidate. In the
places this might come up, the analysis is usually also definitionally
trivial to run even while other transformation passes run on the module,
regardless of the state of anything. And so, it follows that it is
natural to have a hard requirement on such analyses from a function
pass.
It turns out, that TargetLibraryInfo is just such an analysis, and
InstCombine has a hard requirement on it.
The approach I've taken here is to produce an analysis that models this
flexibility by making it both a module and a function analysis. This
exposes the fact that it is in fact safe to compute at any point. We can
even make it a valid CGSCC analysis at some point if that is useful.
However, we don't want to have a copy of the actual target library info
state for each function! This state is specific to the triple. The
somewhat direct and blunt approach here is to turn TLI into a pimpl,
with the state and mutators in the implementation class and the query
routines primarily in the wrapper. Then the analysis can lazily
construct and cache the implementations, keyed on the triple, and
on-demand produce wrappers of them for each function.
One minor annoyance is that we will end up with a wrapper for each
function in the module. While this is a bit wasteful (one pointer per
function) it seems tolerable. And it has the advantage of ensuring that
we pay the absolute minimum synchronization cost to access this
information should we end up with a nice parallel function pass manager
in the future. We could look into trying to mark when analysis results
are especially cheap to recompute and more eagerly GC-ing the cached
results, or we could look at supporting a variant of analyses whose
results are specifically *not* cached and expected to just be used and
discarded by the consumer. Either way, these seem like incremental
enhancements that should happen when we start profiling the memory and
CPU usage of the new pass manager and not before.
The other minor annoyance is that if we end up using the TLI in both
a module pass and a function pass, those will be produced by two
separate analyses, and thus will point to separate copies of the
implementation state. While a minor issue, I dislike this and would like
to find a way to cleanly allow a single analysis instance to be used
across multiple IR unit managers. But I don't have a good solution to
this today, and I don't want to hold up all of the work waiting to come
up with one. This too seems like a reasonable thing to incrementally
improve later.
llvm-svn: 226981
The pass is really just a means of accessing a cached instance of the
TargetLibraryInfo object, and this way we can re-use that object for the
new pass manager as its result.
Lots of delta, but nothing interesting happening here. This is the
common pattern that is developing to allow analyses to live in both the
old and new pass manager -- a wrapper pass in the old pass manager
emulates the separation intrinsic to the new pass manager between the
result and pass for analyses.
llvm-svn: 226157
While the term "Target" is in the name, it doesn't really have to do
with the LLVM Target library -- this isn't an abstraction which LLVM
targets generally need to implement or extend. It has much more to do
with modeling the various runtime libraries on different OSes and with
different runtime environments. The "target" in this sense is the more
general sense of a target of cross compilation.
This is in preparation for porting this analysis to the new pass
manager.
No functionality changed, and updates inbound for Clang and Polly.
llvm-svn: 226078
This introduces the symbol rewriter. This is an IR->IR transformation that is
implemented as a CodeGenPrepare pass. This allows for the transparent
adjustment of the symbols during compilation.
It provides a clean, simple, elegant solution for symbol inter-positioning. This
technique is often used, such as in the various sanitizers and performance
analysis.
The control of this is via a custom YAML syntax map file that indicates source
to destination mapping, so as to avoid having the compiler to know the exact
details of the source to destination transformations.
llvm-svn: 221548
With this a DataLayoutPass can be reused for multiple modules.
Once we have doInitialization/doFinalization, it doesn't seem necessary to pass
a Module to the constructor.
Overall this change seems in line with the idea of making DataLayout a required
part of Module. With it the only way of having a DataLayout used is to add it
to the Module.
llvm-svn: 217548
Take a StringRef instead of a "const char *".
Take a "std::error_code &" instead of a "std::string &" for error.
A create static method would be even better, but this patch is already a bit too
big.
llvm-svn: 216393
AtomicExpandLoadLinked is currently rather ARM-specific. This patch is the first of
a group that aim at making it more target-independent. See
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075873.html
for details
The command line option is "atomic-expand"
llvm-svn: 216231
This is mostly a cleanup, but it changes a fairly old behavior.
Every "real" LTO user was already disabling the silly internalize pass
and creating the internalize pass itself. The difference with this
patch is for "opt -std-link-opts" and the C api.
Now to get a usable behavior out of opt one doesn't need the funny
looking command line:
opt -internalize -disable-internalize -internalize-public-api-list=foo,bar -std-link-opts
llvm-svn: 214919
Without initializing the assembly printers a shared library build of opt is
linked with these libraries whereas for a static build these libraries are dead
code eliminated. This is unfortunate for plugins in case they want to use them,
as they neither can rely on opt to provide this functionality nor can they link
the printers in themselves as this breaks with a shared object build of opt.
This patch calls InitializeAllAsmPrinters() from opt, which increases the static
binary size from 50MB -> 52MB on my system (all backends compiled) and causes no
measurable increase in the time needed to run 'make check'.
llvm-svn: 210914
Still only 32-bit ARM using it at this stage, but the promotion allows
direct testing via opt and is a reasonably self-contained patch on the
way to switching ARM64.
At this point, other targets should be able to make use of it without
too much difficulty if they want. (See ARM64 commit coming soon for an
example).
llvm-svn: 206485
Implement DebugInfoVerifier, which steals verification relying on
DebugInfoFinder from Verifier.
- Adds LegacyDebugInfoVerifierPassPass, a ModulePass which wraps
DebugInfoVerifier. Uses -verify-di command-line flag.
- Change verifyModule() to invoke DebugInfoVerifier as well as
Verifier.
- Add a call to createDebugInfoVerifierPass() wherever there was a
call to createVerifierPass().
This implementation as a module pass should sidestep efficiency issues,
allowing us to turn debug info verification back on.
<rdar://problem/15500563>
llvm-svn: 206300
There's a bit of duplicated "magic" code in opt.cpp and Clang's CodeGen that
computes the inliner threshold from opt level and size opt level.
This patch moves the code to a function that lives alongside the inliner itself,
providing a convenient overload to the inliner creation.
A separate patch can be committed to Clang to use this once it's committed to
LLVM. Standalone tools that use the inlining pass can also avoid duplicating
this code and fearing it will go out of sync.
Note: this patch also restructures the conditinal logic of the computation to
be cleaner.
llvm-svn: 203669
This compiles with no changes to clang/lld/lldb with MSVC and includes
overloads to various functions which are used by those projects and llvm
which have OwningPtr's as parameters. This should allow out of tree
projects some time to move. There are also no changes to libs/Target,
which should help out of tree targets have time to move, if necessary.
llvm-svn: 203083
Eventually DataLayoutPass should go away, but for now that is the only easy
way to get a DataLayout in some APIs. This patch only changes the ones that
have easy access to a Module.
One interesting issue with sometimes using DataLayoutPass and sometimes
fetching it from the Module is that we have to make sure they are equivalent.
We can get most of the way there by always constructing the pass with a Module.
In fact, the pass could be changed to point to an external DataLayout instead
of owning one to make this stricter.
Unfortunately, the C api passes a DataLayout, so it has to be up to the caller
to make sure the pass and the module are in sync.
llvm-svn: 202204
Now that DataLayout is not a pass, store one in Module.
Since the C API expects to be able to get a char* to the datalayout description,
we have to keep a std::string somewhere. This patch keeps it in Module and also
uses it to represent modules without a DataLayout.
Once DataLayout is mandatory, we should probably move the string to DataLayout
itself since it won't be necessary anymore to represent the special case of a
module without a DataLayout.
llvm-svn: 202190
After this I will set the default back to F_None. The advantage is that
before this patch forgetting to set F_Binary would corrupt a file on windows.
Forgetting to set F_Text produces one that cannot be read in notepad, which
is a better failure mode :-)
llvm-svn: 202052
CodeGenPrepare uses extensively TargetLowering which is part of libLLVMCodeGen.
This is a layer violation which would introduce eventually a dependence on
CodeGen in ScalarOpts.
Move CodeGenPrepare into libLLVMCodeGen to avoid that.
Follow-up of <rdar://problem/15519855>
llvm-svn: 201912
The same code (~20 lines) for initializing a TargetOptions object from CodeGen
cmdline flags is duplicated 4 times in 4 different tools. This patch moves it
into a utility function.
Since the CodeGen/CommandFlags.h file defines cl::opt flags in a header, it's
a bit of a touchy situation because we should only link them into tools. So this
patch puts the init function in the header.
llvm-svn: 201699
These are self-contained in functionality so it makes sense to separate them,
as opt.cpp has grown quite big already.
Following Eric's suggestions, if this code is ever deemed useful outside of
tools/opt, it will make sense to move it to one of the LLVM libraries like IR.
llvm-svn: 201116
various opt verifier commandline options.
Mostly mechanical wiring of the verifier to the new pass manager.
Exercises one of the more unusual aspects of it -- a pass can be either
a module or function pass interchangably. If this is ever problematic,
we can make things more constrained, but for things like the verifier
where there is an "obvious" applicability at both levels, it seems
convenient.
This is the next-to-last piece of basic functionality left to make the
opt commandline driving of the new pass manager minimally functional for
testing and further development. There is still a lot to be done there
(notably the factoring into .def files to kill the current boilerplate
code) but it is relatively uninteresting. The only interesting bit left
for minimal functionality is supporting the registration of analyses.
I'm planning on doing that on top of the .def file switch mostly because
the boilerplate for the analyses would be significantly worse.
llvm-svn: 199646
When registering a pass, a pass can now specify a second construct that takes as
argument a pointer to TargetMachine.
The PassInfo class has been updated to reflect that possibility.
If such a constructor exists opt will use it instead of the default constructor
when instantiating the pass.
Since such IR passes are supposed to be rare, no specific support has been
added to this commit to allow an easy registration of such a pass.
In other words, for such pass, the initialization function has to be
hand-written (see CodeGenPrepare for instance).
Now, codegenprepare can be tested using opt:
opt -codegenprepare -mtriple=mytriple input.ll
llvm-svn: 199430
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.
Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that evn Clang's CFGBlock's can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which both
caches, reconstructs, and supports a nice update API.
But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.
llvm-svn: 199082
This moves the old pass creation functionality to its own header and
updates the callers of that routine. Then it adds a new PM supporting
bitcode writer to the header file, and wires that up in the opt tool.
A test is added that round-trips code into bitcode and back out using
the new pass manager.
llvm-svn: 199078
that through the interface rather than a simple bool. This should allow
starting to wire up real output to round-trip IR through opt with the
new pass manager.
llvm-svn: 199071
Nothing was using the ability of the pass to delete the raw_ostream it
printed to, and nothing was trying to pass it a pointer to the
raw_ostream. Also, the function variant had a different order of
arguments from all of the others which was just really confusing. Now
the interface accepts a reference, doesn't offer to delete it, and uses
a consistent order. The implementation of the printing passes haven't
been updated with this simplification, this is just the API switch.
llvm-svn: 199044
name to match the source file which I got earlier. Update the include
sites. Also modernize the comments in the header to use the more
recommended doxygen style.
llvm-svn: 199041
manager. I cannot emphasize enough that this is a WIP. =] I expect it
to change a great deal as things stabilize, but I think its really
important to get *some* functionality here so that the infrastructure
can be tested more traditionally from the commandline.
The current design is looking something like this:
./bin/opt -passes='module(pass_a,pass_b,function(pass_c,pass_d))'
So rather than custom-parsed flags, there is a single flag with a string
argument that is parsed into the pass pipeline structure. This makes it
really easy to have nice structural properties that are very explicit.
There is one obvious and important shortcut. You can start off the
pipeline with a pass, and the minimal context of pass managers will be
built around the entire specified pipeline. This makes the common case
for tests super easy:
./bin/opt -passes=instcombine,sroa,gvn
But this won't introduce any of the complexity of the fully inferred old
system -- we only ever do this for the *entire* argument, and we only
look at the first pass. If the other passes don't fit in the pass
manager selected it is a hard error.
The other interesting aspect here is that I'm not relying on any
registration facilities. Such facilities may be unavoidable for
supporting plugins, but I have alternative ideas for plugins that I'd
like to try first. My plan is essentially to build everything without
registration until we hit an absolute requirement.
Instead of registration of pass names, there will be a library dedicated
to parsing pass names and the pass pipeline strings described above.
Currently, this is directly embedded into opt for simplicity as it is
very early, but I plan to eventually pull this into a library that opt,
bugpoint, and even Clang can depend on. It should end up as a good home
for things like the existing PassManagerBuilder as well.
There are a bunch of FIXMEs in the code for the parts of this that are
just stubbed out to make the patch more incremental. A quick list of
what's coming up directly after this:
- Support for function passes and building the structured nesting.
- Support for printing the pass structure, and FileCheck tests of all of
this code.
- The .def-file based pass name parsing.
- IR priting passes and the corresponding tests.
Some obvious things that I'm not going to do right now, but am
definitely planning on as the pass manager work gets a bit further:
- Pull the parsing into library, including the builders.
- Thread the rest of the target stuff into the new pass manager.
- Wire support for the new pass manager up to llc.
- Plugin support.
Some things that I'd like to have, but are significantly lower on my
priority list. I'll get to these eventually, but they may also be places
where others want to contribute:
- Adding nice error reporting for broken pass pipeline descriptions.
- Typo-correction for pass names.
llvm-svn: 198998
are part of the core IR library in order to support dumping and other
basic functionality.
Rename the 'Assembly' include directory to 'AsmParser' to match the
library name and the only functionality left their -- printing has been
in the core IR library for quite some time.
Update all of the #includes to match.
All of this started because I wanted to have the layering in good shape
before I started adding support for printing LLVM IR using the new pass
infrastructure, and commandline support for the new pass infrastructure.
llvm-svn: 198688
The intended behaviour is to force vectorization on the presence
of the flag (either turn on or off), and to continue the behaviour
as expected in its absence. Tests were added to make sure the all
cases are covered in opt. No tests were added in other tools with
the assumption that they should use the PassManagerBuilder in the
same way.
This patch also removes the outdated -late-vectorize flag, which was
on by default and not helping much.
The pragma metadata is being attached to the same place as other loop
metadata, but nothing forbids one from attaching it to a function
(to enable #pragma optimize) or basic blocks (to hint the basic-block
vectorizers), etc. The logic should be the same all around.
Patches to Clang to produce the metadata will be produced after the
initial implementation is agreed upon and committed. Patches to other
vectorizers (such as SLP and BB) will be added once we're happy with
the pass manager changes.
llvm-svn: 196537
clang enables vectorization at optimization levels > 1 and size level < 2. opt
should behave similarily.
Loop vectorization and SLP vectorization can be disabled with the flags
-disable-(loop/slp)-vectorization.
llvm-svn: 196294
In DIBuilder, the context field of a TAG_member is updated to use the
scope reference. Verifier is updated accordingly.
DebugInfoFinder now needs to generate a type identifier map to have
access to the actual scope. Same applies for BreakpointPrinter.
processModule of DebugInfoFinder is called during initialization phase
of the verifier to make sure the type identifier map is constructed early
enough.
We are now able to unique a simple class as demonstrated by the added
testing case.
llvm-svn: 190334
When unrolling is disabled in the pass manager, the loop vectorizer should also
not unroll loops. This will allow the -fno-unroll-loops option in Clang to
behave as expected (even for vectorizable loops). The loop vectorizer's
-force-vector-unroll option will (continue to) override the pass-manager
setting (including -force-vector-unroll=0 to force use of the internal
auto-selection logic).
In order to test this, I added a flag to opt (-disable-loop-unrolling) to force
disable unrolling through opt (the analog of -fno-unroll-loops in Clang). Also,
this fixes a small bug in opt where the loop vectorizer was enabled only after
the pass manager populated the queue of passes (the global_alias.ll test needed
a slight update to the RUN line as a result of this fix).
llvm-svn: 189499
Function attributes are the future! So just query whether we want to realign the
stack directly from the function instead of through a random target options
structure.
llvm-svn: 187618
Merge consecutive if-regions if they contain identical statements.
Both transformations reduce number of branches. The transformation
is guarded by a target-hook, and is currently enabled only for +R600,
but the correctness has been tested on X86 target using a variety of
CPU benchmarks.
Patch by: Mei Ye
llvm-svn: 187278
There's no need to specify a flag to omit frame pointer elimination on non-leaf
nodes...(Honestly, I can't parse that option out.) Use the function attribute
stuff instead.
llvm-svn: 187093
Use the function attributes to pass along the stack protector buffer size.
Now that we have robust function attributes, don't use a command line option to
specify the stack protecto buffer size.
llvm-svn: 186863
No functionality change.
It should suffice to check the type of a debug info metadata, instead of
calling Verify. For cases where we know the type of a DI metadata, use
assert.
Also update testing cases to make them conform to the format of DI classes.
llvm-svn: 185135
This commit completely removes what is left of the simplify-libcalls
pass. All of the functionality has now been migrated to the instcombine
and functionattrs passes. The following C API functions are now NOPs:
1. LLVMAddSimplifyLibCallsPass
2. LLVMPassManagerBuilderSetDisableSimplifyLibCalls
llvm-svn: 184459
- requires existing debug information to be present
- fixes up file name and line number information in metadata
- emits a "<orig_filename>-debug.ll" succinct IR file (without !dbg metadata
or debug intrinsics) that can be read by a debugger
- initialize pass in opt tool to enable the "-debug-ir" flag
- lit tests to follow
llvm-svn: 181467
its own library. These functions are bridging between the bitcode reader
and the ll parser which are in different libraries. Previously we didn't
have any good library to do this, and instead played fast and loose with
a "header only" set of interfaces in the Support library. This really
doesn't work well as evidenced by the recent attempt to add timing logic
to the these routines.
As part of this, make them normal functions rather than weird inline
functions, and sink the implementation into the library. Also clean up
the header to be nice and minimal.
This requires updating lots of build system dependencies to specify that
the IRReader library is needed, and several source files to not
implicitly rely upon the header file to transitively include all manner
of other headers.
If you are using IRReader.h, this commit will break you (the header
moved) and you'll need to also update your library usage to include
'irreader'. I will commit the corresponding change to Clang momentarily.
llvm-svn: 177971
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.
The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTranformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least on
implementation.
The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but tho
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.
The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.
The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.
The fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
a helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.
The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.
Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis which is a more
natural layer for it to live. Second, clients of this interface can
depend on it *always* being available which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits, this one is clearly big enough.
Finally, I'm very aware that much of the comments and documentation
needs to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.
Commits to update DragonEgg and Clang will be made presently.
llvm-svn: 171681
interfaces which could be extracted from it, and must be provided on
construction, to a chained analysis group.
The end goal here is that TTI works much like AA -- there is a baseline
"no-op" and target independent pass which is in the group, and each
target can expose a target-specific pass in the group. These passes will
naturally chain allowing each target-specific pass to delegate to the
generic pass as needed.
In particular, this will allow a much simpler interface for passes that
would like to use TTI -- they can have a hard dependency on TTI and it
will just be satisfied by the stub implementation when that is all that
is available.
This patch is a WIP however. In particular, the "stub" pass is actually
the one and only pass, and everything there is implemented by delegating
to the target-provided interfaces. As a consequence the tools still have
to explicitly construct the pass. Switching targets to provide custom
passes and sinking the stub behavior into the NoTTI pass is the next
step.
llvm-svn: 171621
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.
There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.
The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.
I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).
I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.
llvm-svn: 171366
Again, tools are trickier to pick the main module header for than
library source files. I've started to follow the pattern of using
LLVMContext.h when it is included as a stub for program source files.
llvm-svn: 169252
The TargetTransform changes are breaking LTO bootstraps of clang. I am
working with Nadav to figure out the problem, but I am reverting it for now
to get our buildbots working.
This reverts svn commits: 165665 165669 165670 165786 165787 165997
and I have also reverted clang svn 165741
llvm-svn: 166168
include/llvm/Analysis/DebugInfo.h to include/llvm/DebugInfo.h.
The reasoning is because the DebugInfo module is simply an interface to the
debug info MDNodes and has nothing to do with analysis.
llvm-svn: 159312
options, to enable easier testing of the innards of LLVM that are
enabled by such optimization strategies.
Note that this doesn't provide the (much needed) function attribute
support for -Oz (as opposed to -Os), but still seems like a positive
step to better test the logic that Clang currently relies on.
Patch by Patrik Hägglund.
llvm-svn: 156913
This is the initial checkin of the basic-block autovectorization pass along with some supporting vectorization infrastructure.
Special thanks to everyone who helped review this code over the last several months (especially Tobias Grosser).
llvm-svn: 149468
the X86 asmparser to produce ranges in the one case that was annoying me, for example:
test.s:10:15: error: invalid operand for instruction
movl 0(%rax), 0(%edx)
^~~~~~~
It should be straight-forward to enhance filecheck, tblgen, and/or the .ll parser to use
ranges where appropriate if someone is interested.
llvm-svn: 142106