Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
This allows later passes (in particular InstCombine) to optimize more
cases.
One that's important to us is `memcmp(p, q, constant) < 0` and memcmp(p, q, constant) > 0.
llvm-svn: 364412
Use -fsave-optimization-record=<format> to specify a different format
than the default, which is YAML.
For now, only YAML is supported.
llvm-svn: 363573
* Add a common function to setup opt-remarks
* Rename common options to the same names
* Add error types to distinguish between file errors and regex errors
llvm-svn: 363415
* Add a common function to setup opt-remarks
* Rename common options to the same names
* Add error types to distinguish between file errors and regex errors
llvm-svn: 363328
This was using its own, outdated list of possible captures. This was
at minimum not catching cmpxchg and addrspacecast captures.
One change is now any volatile access is treated as capturing. The
test coverage for this pass is quite inadequate, but this required
removing volatile in the lifetime capture test.
Also fixes some infrastructure issues to allow running just the IR
pass.
Fixes bug 42238.
llvm-svn: 363169
Patch which introduces a target-independent framework for generating
hardware loops at the IR level. Most of the code has been taken from
PowerPC CTRLoops and PowerPC has been ported over to use this generic
pass. The target dependent parts have been moved into
TargetTransformInfo, via isHardwareLoopProfitable, with
HardwareLoopInfo introduced to transfer information from the backend.
Three generic intrinsics have been introduced:
- void @llvm.set_loop_iterations
Takes as a single operand, the number of iterations to be executed.
- i1 @llvm.loop_decrement(anyint)
Takes the maximum number of elements processed in an iteration of
the loop body and subtracts this from the total count. Returns
false when the loop should exit.
- anyint @llvm.loop_decrement_reg(anyint, anyint)
Takes the number of elements remaining to be processed as well as
the maximum numbe of elements processed in an iteration of the loop
body. Returns the updated number of elements remaining.
llvm-svn: 362774
Separate the remark serialization to YAML from the LLVM Diagnostics.
This adds a new serialization abstraction: remarks::Serializer. It's
completely independent from lib/IR and it provides an easy way to
replace YAML by providing a new remarks::Serializer.
Differential Revision: https://reviews.llvm.org/D62632
llvm-svn: 362160
Summary:
When refactoring vectorization flags, vectorization was disabled by default in the new pass manager.
This patch re-enables is for both managers, and changes the assumptions opt makes, based on the new defaults.
Comments in opt.cpp should clarify the intended use of all flags to enable/disable vectorization.
Reviewers: chandlerc, jgorbe
Subscribers: jlebar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61091
llvm-svn: 359167
Removes the code from opt and the pass manager builder.
The code was unused - even by the C library code that was supposed to set
it and had been removed previously.
llvm-svn: 358024
Currently we have -Rpass for filtering the remarks that are displayed as
diagnostics, but when using -fsave-optimization-record, there is no way
to filter the remarks while generating them.
This adds support for filtering remarks by passes using a regex.
Ex: `clang -fsave-optimization-record -foptimization-record-passes=inline`
will only emit the remarks coming from the pass `inline`.
This adds:
* `-fsave-optimization-record` to the driver
* `-opt-record-passes` to cc1
* `-lto-pass-remarks-filter` to the LTOCodeGenerator
* `--opt-remarks-passes` to lld
* `-pass-remarks-filter` to llc, opt, llvm-lto, llvm-lto2
* `-opt-remarks-passes` to gold-plugin
Differential Revision: https://reviews.llvm.org/D59268
Original llvm-svn: 355964
llvm-svn: 355984
Currently we have -Rpass for filtering the remarks that are displayed as
diagnostics, but when using -fsave-optimization-record, there is no way
to filter the remarks while generating them.
This adds support for filtering remarks by passes using a regex.
Ex: `clang -fsave-optimization-record -foptimization-record-passes=inline`
will only emit the remarks coming from the pass `inline`.
This adds:
* `-fsave-optimization-record` to the driver
* `-opt-record-passes` to cc1
* `-lto-pass-remarks-filter` to the LTOCodeGenerator
* `--opt-remarks-passes` to lld
* `-pass-remarks-filter` to llc, opt, llvm-lto, llvm-lto2
* `-opt-remarks-passes` to gold-plugin
Differential Revision: https://reviews.llvm.org/D59268
llvm-svn: 355964
This allows us to store more info about where we're emitting the remarks
without cluttering LLVMContext. This is needed for future support for
the remark section.
Differential Revision: https://reviews.llvm.org/D58996
Original llvm-svn: 355507
llvm-svn: 355514
This allows us to store more info about where we're emitting the remarks
without cluttering LLVMContext. This is needed for future support for
the remark section.
Differential Revision: https://reviews.llvm.org/D58996
llvm-svn: 355507
Partly addresses PR15026.
There are a few tests that passed in invalid architectures, which are fixed in: rL355349 and D58931
Reviewers: echristo, efriedma, rengolin, atrick
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D58933
llvm-svn: 355455
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Currently we have pgo options defined in PassManagerBuilder.cpp only for
instrument pgo, but not for sample pgo. We also have pgo options defined
in NewPMDriver.cpp in opt only for new pass manager and for all kinds of
pgo. They have some inconsistency.
To make the options more consistent and make tests writing easier, the
patch let old pass manager to share the same pgo options with new pass
manager in opt, and removes the options in PassManagerBuilder.cpp.
Differential Revision: https://reviews.llvm.org/D56749
llvm-svn: 351392
Summary:
Records in the module summary index whether the bitcode was compiled
with the option necessary to enable splitting the LTO unit
(e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit).
The information is passed down to the ModuleSummaryIndex builder via a
new module flag "EnableSplitLTOUnit", which is propagated onto a flag
on the summary index.
This is then used during the LTO link to check whether all linked
summaries were built with the same value of this flag. If not, an error
is issued when we detect a situation requiring whole program visibility
of the class hierarchy. This is the case when both of the following
conditions are met:
1) We are performing LowerTypeTests or Whole Program Devirtualization.
2) There are type tests or type checked loads in the code.
Note I have also changed the ThinLTOBitcodeWriter to also gate the
module splitting on the value of this flag.
Reviewers: pcc
Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D53890
llvm-svn: 350948
This patch defines an interleaved-load-combine pass. The pass searches
for ShuffleVector instructions that represent interleaved loads. Matches are
converted such that they will be captured by the InterleavedAccessPass.
The pass extends LLVMs capabilities to use target specific instruction
selection of interleaved load patterns (e.g.: ld4 on Aarch64
architectures).
Differential Revision: https://reviews.llvm.org/D52653
llvm-svn: 347208
Add a -debugify-export option to opt. This exports per-pass `debugify`
loss statistics to a file in CSV format.
For some interesting numbers on debug value loss during an -O2 build
of the sqlite3 amalgamation, see the review thread.
Differential Revision: https://reviews.llvm.org/D49003
llvm-svn: 337787
This is a minor cleanup in preparation for a change to export DI
statistics from -check-debugify. To do that, it would be cleaner to have
a dedicated header for the debugify interface.
llvm-svn: 337786
Applying synthetic debug info before the bitcode writer pass has no
testing-related purpose. This commit prevents that from happening.
It also adds tests which check that IR produced with/without
-debugify-each enabled is identical after stripping. This makes it
possible to check that individual passes (or full pipelines) are
invariant to debug info.
llvm-svn: 333861
The -strip-module-flags option strips llvm.module.flags metadata from a
module at the beginning of the opt pipeline.
This will be used to test whether the output of a pass is debug info
(DI) invariant.
E.g, after applying synthetic debug info to a test case, we'd like to
strip out all DI-related metadata and check that the final IR is
identical to a baseline file without any DI applied, to check that
optimizations aren't inhibited by debug info.
llvm-svn: 333860
This adds a -debugify-each mode to opt which, when enabled, wraps each
{Module,Function}Pass in a pipeline with logic to add, check, and strip
synthetic debug info for testing purposes.
This mode can be used to test complex pipelines for debug info bugs, or
to collect statistics about the number of debug values & locations lost
throughout various stages of a pipeline.
Patch by Son Tuan Vu!
Differential Revision: https://reviews.llvm.org/D46525
llvm-svn: 332312
See r331124 for how I made a list of files missing the include.
I then ran this Python script:
for f in open('filelist.txt'):
f = f.strip()
fl = open(f).readlines()
found = False
for i in xrange(len(fl)):
p = '#include "llvm/'
if not fl[i].startswith(p):
continue
if fl[i][len(p):] > 'Config':
fl.insert(i, '#include "llvm/Config/llvm-config.h"\n')
found = True
break
if not found:
print 'not found', f
else:
open(f, 'w').write(''.join(fl))
and then looked through everything with `svn diff | diffstat -l | xargs -n 1000 gvim -p`
and tried to fix include ordering and whatnot.
No intended behavior change.
llvm-svn: 331184
As demonstrated by the regression tests added in this patch, the
following cases are valid cases:
1. A Function with no DISubprogram attached, but various debug info
related to its instructions, coming, for instance, from an inlined
function, also defined somewhere else in the same module;
2. ... or coming exclusively from the functions inlined and eliminated
from the module entirely.
The ValueMap shared between CloneFunctionInto calls within CloneModule
needs to contain identity mappings for all of the DISubprogram's to
prevent them from being duplicated by MapMetadata / RemapInstruction
calls, this is achieved via DebugInfoFinder collecting all the
DISubprogram's. However, CloneFunctionInto was missing calls into
DebugInfoFinder for functions w/o DISubprogram's attached, but still
referring DISubprogram's from within (case 1). This patch fixes that.
The fix above, however, exposes another issue: if a module contains a
DISubprogram referenced only indirectly from other debug info
metadata, but not attached to any Function defined within the module
(case 2), cloning such a module causes a DICompileUnit duplication: it
will be moved in indirecty via a DISubprogram by DebugInfoFinder first
(because of the first bug fix described above), without being
self-mapped within the shared ValueMap, and then will be copied during
named metadata cloning. So this patch makes sure DebugInfoFinder
visits DICompileUnit's referenced from DISubprogram's as it goes w/o
re-processing llvm.dbg.cu list over and over again for every function
cloned, and makes sure that CloneFunctionInto self-maps
DICompileUnit's referenced from the entire function, not just its own
DISubprogram attached that may also be missing.
The most convenient way of tesing CloneModule I found is to rely on
CloneModule call from `opt -run-twice`, instead of writing tedious
unit tests. That feature has a couple of properties that makes it hard
to use for this purpose though:
1. CloneModule doesn't copy source filename, making `opt -run-twice`
report it as a difference.
2. `opt -run-twice` does the second run on the original module, not
its clone, making the result of cloning completely invisible in opt's
actual output with and without `-run-twice` both, which directly
contradicts `opt -run-twice`s own error message.
This patch fixes this as well.
Reviewed By: aprantl
Reviewers: loladiro, GorNishanov, espindola, echristo, dexonsmith
Subscribers: vsk, debug-info, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D45593
llvm-svn: 330069
We have a few functions that virtually all command wants to run on
process startup/shutdown. This patch adds InitLLVM class to do that
all at once, so that we don't need to copy-n-paste boilerplate code
to each llvm command's main() function.
Differential Revision: https://reviews.llvm.org/D45602
llvm-svn: 330046
These aren't the .def style files used in LLVM that require a macro
defined before their inclusion - they're just basic non-modular includes
to stamp out command line flag variables.
llvm-svn: 329840
Sometimes users do not specify data layout in LLVM assembly and let llc set the
data layout by target triple after loading the LLVM assembly.
Currently the parser checks alloca address space no matter whether the LLVM
assembly contains data layout definition, which causes false alarm since the
default data layout does not contain the correct alloca address space.
The parser also calls verifier to check debug info and updating invalid debug
info. Currently there is no way to let the verifier to check debug info only.
If the verifier finds non-debug-info issues the parser will fail.
For llc, the fix is to remove the check of alloca addr space in the parser and
disable updating debug info, and defer the updating of debug info and
verification to be after setting data layout of the IR by target.
For other llvm tools, since they do not override data layout by target but
instead can override data layout by a command line option, an argument for
overriding data layout is added to the parser. In cases where data layout
overriding is necessary for the parser, the data layout can be provided by
command line.
Differential Revision: https://reviews.llvm.org/D41832
llvm-svn: 323826
Combine expression patterns to form expressions with fewer, simple instructions.
This pass does not modify the CFG.
For example, this pass reduce width of expressions post-dominated by TruncInst
into smaller width when applicable.
It differs from instcombine pass in that it contains pattern optimization that
requires higher complexity than the O(1), thus, it should run fewer times than
instcombine pass.
Differential Revision: https://reviews.llvm.org/D38313
llvm-svn: 323321
Opt's "-enable-debugify" mode adds an instance of Debugify at the
beginning of the pass pipeline, and an instance of CheckDebugify at the
end.
You can enable this mode with lit using: -Dopt="opt -enable-debugify".
Note that running test suites in this mode will result in many failures
due to strict FileCheck commands, etc.
It can be more useful to look for assertion failures which arise only
when Debugify is enabled, e.g to prove that we have (or do not have)
test coverage for some code path with debug info present.
Differential Revision: https://reviews.llvm.org/D41793
llvm-svn: 323256
Summary:
First, we need to explain the core of the vulnerability. Note that this
is a very incomplete description, please see the Project Zero blog post
for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
The basis for branch target injection is to direct speculative execution
of the processor to some "gadget" of executable code by poisoning the
prediction of indirect branches with the address of that gadget. The
gadget in turn contains an operation that provides a side channel for
reading data. Most commonly, this will look like a load of secret data
followed by a branch on the loaded value and then a load of some
predictable cache line. The attacker then uses timing of the processors
cache to determine which direction the branch took *in the speculative
execution*, and in turn what one bit of the loaded value was. Due to the
nature of these timing side channels and the branch predictor on Intel
processors, this allows an attacker to leak data only accessible to
a privileged domain (like the kernel) back into an unprivileged domain.
The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In many
cases, the compiler can simply use directed conditional branches and
a small search tree. LLVM already has support for lowering switches in
this way and the first step of this patch is to disable jump-table
lowering of switches and introduce a pass to rewrite explicit indirectbr
sequences into a switch over integers.
However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as
a trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures the
processor predicts the return to go to a controlled, known location. The
retpoline then "smashes" the return address pushed onto the stack by the
call with the desired target of the original indirect call. The result
is a predicted return to the next instruction after a call (which can be
used to trap speculative execution within an infinite loop) and an
actual indirect branch to an arbitrary address.
On 64-bit x86 ABIs, this is especially easily done in the compiler by
using a guaranteed scratch register to pass the target into this device.
For 32-bit ABIs there isn't a guaranteed scratch register and so several
different retpoline variants are introduced to use a scratch register if
one is available in the calling convention and to otherwise use direct
stack push/pop sequences to pass the target address.
This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886
We also support a target feature that disables emission of the retpoline
thunk by the compiler to allow for custom thunks if users want them.
These are particularly useful in environments like kernels that
routinely do hot-patching on boot and want to hot-patch their thunk to
different code sequences. They can write this custom thunk and use
`-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
case, on x86-64 thu thunk names must be:
```
__llvm_external_retpoline_r11
```
or on 32-bit:
```
__llvm_external_retpoline_eax
__llvm_external_retpoline_ecx
__llvm_external_retpoline_edx
__llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in
the case of the `push` suffix on the top of the stack via a `pushl`
instruction.
There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.
The only other indirect branches remaining that we are aware of are from
precompiled runtimes (such as crt0.o and similar). The ones we have
found are not really attackable, and so we have not focused on them
here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.
For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z retpolineplt`
(or use similar functionality from some other linker). We strongly
recommend also using `-z now` as non-lazy binding allows the
retpoline-mitigated PLT to be substantially smaller.
When manually apply similar transformations to `-mretpoline` to the
Linux kernel we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately 2%)
even for extremely syscall-heavy applications. This is largely due to
the small number of indirect branches that occur in performance
sensitive paths of the kernel.
When using these patches on statically linked applications, especially
C++ applications, you should expect to see a much more dramatic
performance hit. For microbenchmarks that are switch, indirect-, or
virtual-call heavy we have seen overheads ranging from 10% to 50%.
However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
the impact of hot indirect calls (by speculatively promoting them to
direct calls) and allow optimized search trees to be used to lower
switches. If you need to deploy these techniques in C++ applications, we
*strongly* recommend that you ensure all hot call targets are statically
linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well
tuned servers using all of these techniques saw 5% - 10% overhead from
the use of retpoline.
We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality available
as soon as possible. Happy for more code review, but we'd really like to
get these patches landed and backported ASAP for obvious reasons. We're
planning to backport this to both 6.0 and 5.0 release streams and get
a 5.0 release with just this cherry picked ASAP for distros and vendors.
This patch is the work of a number of people over the past month: Eric, Reid,
Rui, and myself. I'm mailing it out as a single commit due to the time
sensitive nature of landing this and the need to backport it. Huge thanks to
everyone who helped out here, and everyone at Intel who helped out in
discussions about how to craft this. Also, credit goes to Paul Turner (at
Google, but not an LLVM contributor) for much of the underlying retpoline
design.
Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer
Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D41723
llvm-svn: 323155