Previously, we materialized secondary vector IVs from the primary scalar IV,
by offseting the primary to match the correct start value, and then broadcasting
it - inside the loop body. Instead, we can use a real vector IV, like we do for
the primary.
This enables using vector IVs for secondary integer IVs whose type matches the
type of the primary.
Differential Revision: http://reviews.llvm.org/D20932
llvm-svn: 272283
Summary:
This fixes PR26682. Also add LCSSA as a preserved pass to LoopSimplify,
that looks correct to me and allows to write a test for the issue.
Reviewers: chandlerc, bogner, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21112
llvm-svn: 272224
Summary:
This fixes PR27617.
Bug description: The SLPVectorizer asserts on encountering GEPs with different index types, such as i8 and i64.
The patch includes a simple relaxation of the assert to allow constants being of different types, along with a regression test that will provoke the unrelaxed assert.
Reviewers: nadav, mzolotukhin
Subscribers: JesperAntonsson, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D20685
Patch by Jesper Antonsson!
llvm-svn: 272206
with user specified count has been applied.
Summary:
Previously SetLoopAlreadyUnrolled() set the disable pragma only if
there was some loop metadata.
Now it set the pragma in all cases. This helps to prevent multiple
unroll when -unroll-count=N is given.
Reviewers: mzolotukhin
Differential Revision: http://reviews.llvm.org/D20765
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 272195
As suggested by clang-tidy's performance-unnecessary-copy-initialization.
This can easily hit lifetime issues, so I audited every change and ran the
tests under asan, which came back clean.
llvm-svn: 272126
Unlike native shifts, the AVX2 per-element shift instructions VPSRAV/VPSRLV/VPSLLV handle out of range shift values (logical shifts set the result to zero, arithmetic shifts splat the sign bit).
If the shift amount is constant we can sometimes convert these instructions to native shifts:
1 - if all shift amounts are in range then the conversion is trivial.
2 - out of range arithmetic shifts can be clamped to the (bitwidth - 1) (a legal shift amount) before conversion.
3 - logical shifts just return zero if all elements have out of range shift amounts.
In addition, UNDEF shift amounts are handled - either as an UNDEF shift amount in a native shift or as an UNDEF in the logical 'all out of range' zero constant special case for logical shifts.
Differential Revision: http://reviews.llvm.org/D19675
llvm-svn: 271996
This patch adds support for folding undef/zero/constant inputs to MOVMSK instructions.
The SSE/AVX versions can be fully folded, but the MMX version can only handle undef inputs.
Differential Revision: http://reviews.llvm.org/D20998
llvm-svn: 271990
scalarizePHI only looked for phis that have exactly two uses - the "latch"
use, and an extract. Unfortunately, we can not assume all equivalent extracts
are CSE'd, since InstCombine itself may create an extract which is a duplicate
of an existing one. This extends it to handle several distinct extracts from
the same index.
This should fix at least some of the performance regressions from PR27988.
Differential Revision: http://reviews.llvm.org/D20983
llvm-svn: 271961
Summary:
Fix LSRInstance::HoistInsertPosition() to check the original insert
position block first for a canonical insertion point that is dominated
by all inputs. This leads to SCEV being able to reuse more instructions
since it currently tracks the instructions it creates for reuse by
keeping a table of <Value, insert point> pairs.
Originally reviewed in http://reviews.llvm.org/D18001
Reviewers: atrick
Subscribers: llvm-commits, mzolotukhin, mcrosier
Differential Revision: http://reviews.llvm.org/D18480
llvm-svn: 271929
In r271810 ( http://reviews.llvm.org/rL271810 ), I loosened the check
above this to work for any Constant rather than ConstantInt. AFAICT,
that part makes sense if we can determine that the shrunken/extended
constant remained equal. But it doesn't make sense for this later
transform where we assume that the constant DID change.
This could assert for a ConstantExpr:
https://llvm.org/bugs/show_bug.cgi?id=28011
And it could be wrong for a vector as shown in the added regression test.
llvm-svn: 271908
Summary:
This hasn't been caught before because it requires noalias or similarly
strong alias analysis to actually reproduce.
Fixes http://llvm.org/PR27952 .
Reviewers: hfinkel, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20944
llvm-svn: 271858
Since FoldOpIntoPhi speculates the binary operation to potentially each
of the predecessors of the PHI node (pulling it out of arbitrary control
dependence in the process), we can FoldOpIntoPhi only if we know the
operation doesn't have UB.
This also brings up an interesting profitability question -- the way it
is written today, commonIRemTransforms will hoist out work from
dynamically dead code into code that will execute at runtime. Perhaps
that isn't the best canonicalization?
Fixes PR27968.
llvm-svn: 271857
Summary:
There are some rough corners, since the new pass manager doesn't have
(as far as I can tell) LoopSimplify and LCSSA, so I've updated the
tests to run them separately in the old pass manager in the lit tests.
We also don't have an equivalent for AU.setPreservesCFG() in the new
pass manager, so I've left a FIXME.
Reviewers: bogner, chandlerc, davide
Subscribers: sanjoy, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20783
llvm-svn: 271846
Change the name of the ICmpInst to 'ICmp' and the Constant (was a ConstantInt) to 'C',
so that it's hopefully clearer that 'CI' refers to CastInst in this context.
While we're scrubbing, fix the documentation comment and use 'auto' with 'dyn_cast'.
llvm-svn: 271817
A basic block could contain:
%cp = cleanuppad []
cleanupret from %cp unwind to caller
This basic block is empty and is thus a candidate for removal. However,
there can be other uses of %cp outside of this basic block. This is
only possible in unreachable blocks.
Make our transform more correct by checking that the pad has a single
user before removing the BB.
This fixes PR28005.
llvm-svn: 271816
Add the MMX implementation to the SimplifyDemandedUseBits SSE/AVX MOVMSK support added in D19614
Requires a minor tweak as llvm.x86.mmx.pmovmskb takes a x86_mmx argument - so we have to be explicit about the implied v8i8 vector type.
llvm-svn: 271789
Summary:
Adds an option -esan-assume-intra-cache-line which causes esan to assume
that a single memory access touches just one cache line, even if it is not
aligned, for better performance at a potential accuracy cost. Experiments
show that the performance difference can be 2x or more, and accuracy loss
is typically negligible, so we turn this on by default. This currently
applies just to the working set tool.
Reviewers: aizatsky
Subscribers: vitalybuka, zhaoqin, kcc, eugenis, llvm-commits
Differential Revision: http://reviews.llvm.org/D20978
llvm-svn: 271743
Summary:
Adds a global variable to specify the tool, to support handling early
interceptors that invoke instrumented code and require shadow memory to be
initialized prior to __esan_init() being invoked.
Reviewers: aizatsky
Subscribers: vitalybuka, zhaoqin, kcc, eugenis, llvm-commits
Differential Revision: http://reviews.llvm.org/D20973
llvm-svn: 271715
There was concern that creating bitcasts for the simpler potential select pattern:
define <2 x i64> @vecBitcastOp1(<4 x i1> %cmp, <2 x i64> %a) {
%a2 = add <2 x i64> %a, %a
%sext = sext <4 x i1> %cmp to <4 x i32>
%bc = bitcast <4 x i32> %sext to <2 x i64>
%and = and <2 x i64> %a2, %bc
ret <2 x i64> %and
}
might lead to worse code for some targets, so this patch is matching the larger
patterns seen in the test cases.
The motivating example for this patch is this IR produced via SSE intrinsics in C:
define <2 x i64> @gibson(<2 x i64> %a, <2 x i64> %b) {
%t0 = bitcast <2 x i64> %a to <4 x i32>
%t1 = bitcast <2 x i64> %b to <4 x i32>
%cmp = icmp sgt <4 x i32> %t0, %t1
%sext = sext <4 x i1> %cmp to <4 x i32>
%t2 = bitcast <4 x i32> %sext to <2 x i64>
%and = and <2 x i64> %t2, %a
%neg = xor <4 x i32> %sext, <i32 -1, i32 -1, i32 -1, i32 -1>
%neg2 = bitcast <4 x i32> %neg to <2 x i64>
%and2 = and <2 x i64> %neg2, %b
%or = or <2 x i64> %and, %and2
ret <2 x i64> %or
}
For an AVX target, this is currently:
vpcmpgtd %xmm1, %xmm0, %xmm2
vpand %xmm0, %xmm2, %xmm0
vpandn %xmm1, %xmm2, %xmm1
vpor %xmm1, %xmm0, %xmm0
retq
With this patch, it becomes:
vpmaxsd %xmm1, %xmm0, %xmm0
Differential Revision: http://reviews.llvm.org/D20774
llvm-svn: 271676
Summary:
Instrument GEP instruction for counting the number of struct field
address calculation to approximate the number of struct field accesses.
Adds test struct_field_count_basic.ll to test the struct field
instrumentation.
Reviewers: bruening, aizatsky
Subscribers: junbuml, zhaoqin, llvm-commits, eugenis, vitalybuka, kcc, bruening
Differential Revision: http://reviews.llvm.org/D20892
llvm-svn: 271619
In r270478, where I enabled the new heuristic I posted testing results,
which I got when explicitly passed the thresholds values via CL options.
However, setting the CL options init-values is not enough to change the
default values of thresholds, so I'm changing them in another place now.
llvm-svn: 271615
In preparation for porting to the new PM.
Patch by Jake VanAdrighem! (review mainly by me/Justin)
Differential Revision: http://reviews.llvm.org/D20610
llvm-svn: 271607
Summary:
The module pass pipeline includes a late LICM run after loop
unrolling. LCSSA is implicitly run as a pass dependency of LICM. However no
cleanup pass was run after this, so the LCSSA nodes ended in the optimized output.
Reviewers: hfinkel, mehdi_amini
Subscribers: majnemer, bruno, mzolotukhin, mehdi_amini, llvm-commits
Differential Revision: http://reviews.llvm.org/D20606
llvm-svn: 271602
This is effectively NFC because we already do this transform after r175380:
http://reviews.llvm.org/rL175380
and also via foldBoolSextMaskToSelect().
This change should just make it a bit more efficient to match the pattern.
The original guard was added in r95058:
http://reviews.llvm.org/rL95058
A sampling of codegen for current in-tree targets shows no problems. This
makes sense given that we're already producing the vector selects via the
other transforms.
llvm-svn: 271554
Inline virtual functions has linkeonceodr linkage (emitted in comdat on
supporting targets). If the vtable for the class is not emitted in the
defining module, function won't be address taken thus its address is not
recorded. At the mercy of the linker, if the per-func prf_data from this
module (in comdat) is picked at link time, we will lose mapping from
function address to its hash val. This leads to missing icall promotion.
The second test case (currently disabled) in compiler_rt (r271528):
instrprof-icall-prom.test demostrates the bug. The first profile-use
subtest is fine due to linker order difference.
With this change, no missing icall targets is found in instrumented clang's
raw profile.
llvm-svn: 271532
Add support for the new pass manager to MemorySSA pass.
Change MemorySSA to be computed eagerly upon construction.
Change MemorySSAWalker to be owned by the MemorySSA object that creates
it.
Reviewers: dberlin, george.burgess.iv
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19664
llvm-svn: 271432
Previously, whenever we needed a vector IV, we would create it on the fly,
by splatting the scalar IV and adding a step vector. Instead, we can create a
real vector IV. This tends to save a couple of instructions per iteration.
This only changes the behavior for the most basic case - integer primary
IVs with a constant step.
Differential Revision: http://reviews.llvm.org/D20315
llvm-svn: 271410
This will be necessary to allow the global merge pass to attach
multiple debug info metadata nodes to global variables once we reverse
the edge from DIGlobalVariable to GlobalVariable.
Differential Revision: http://reviews.llvm.org/D20414
llvm-svn: 271358
This patch fixes bug https://llvm.org/bugs/show_bug.cgi?id=27897.
When query memory access cost, current SLP always passes in alignment value of 1 (unaligned), so it gets a very high cost of scalar memory access, and wrongly vectorize memory loads in the test case.
It can be fixed by simply giving correct alignment.
llvm-svn: 271333
The assumption, made in insert() that weak functions are always inserted after strong functions,
is only true in the first round of adding functions.
In subsequent rounds this is no longer guaranteed , because we might remove a strong function from the tree (because it's modified) and add it later,
where an equivalent weak function already exists in the tree.
This change removes the assert in insert() and explicitly enforces a weak->strong order.
This also removes the need of two separate loops in runOnModule().
llvm-svn: 271299
Summary:
Creates a global variable containing preliminary information
for the cache-fragmentation tool runtime.
Passes a pointer to the variable (null if no variable is created) to the
compilation unit init and exit routines in the runtime.
Reviewers: aizatsky, bruening
Subscribers: filcab, kubabrecka, bruening, kcc, vitalybuka, eugenis, llvm-commits, zhaoqin
Differential Revision: http://reviews.llvm.org/D20541
llvm-svn: 271298
This adds support to the backed to actually support SjLj EH as an exception
model. This is *NOT* the default model, and requires explicitly opting into it
from the frontend. GCC supports this model and for MinGW can still be enabled
via the `--using-sjlj-exceptions` options.
Addresses PR27749!
llvm-svn: 271244
Since we already assert that the outgoing IR is in LCSSA, it is easy to
get misled into thinking that -indvars broke LCSSA if the incoming IR is
non-LCSSA. Checking this pre-condition will make such cases break in
more obvious ways.
Inspired by (but does _not_ fix) PR26682.
llvm-svn: 271196
Summary:
If we can prove that an op.with.overflow intrinsic does not overflow, we
can get rid of the intrinsic, and replace it with non-wrapping
arithmetic.
This was first checked in at r265913 but reverted in r265950 because it
exposed some issues around how SCEV handled post-inc add recurrences.
Those issues have now been fixed.
Reviewers: atrick, regehr
Subscribers: sanjoy, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18685
llvm-svn: 271153
This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX sometime ago so much of that implementation can be reused.
Reapplied now that the the companion patch (D20684) removes/auto-upgrade the clang intrinsics has been committed.
Differential Revision: http://reviews.llvm.org/D20686
llvm-svn: 271131
Summary:
When RF_NullMapMissingGlobalValues is set, mapValue can return null
for GlobalValue. When mapping the operands of a constant that is
referenced from metadata, we need to handle this case and actually
return null instead of mapping this constant.
Reviewers: dexonsmith, rafael
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20713
llvm-svn: 271129
It was triggering an msan bot.
Revert "[IRPGO] Set the function entry count metadata."
This reverts commit r271090.
Revert "[IRPGO] Centralize the function attribute inliner hint logic. NFC."
This reverts commit r271089.
llvm-svn: 271091
Summary:
Unroll factor (Count) calculations moved to a new function.
Early exits on pragma and "-unroll-count" defined factor added.
New type of unrolling "Force" introduced (previously used implicitly).
New unroll preference "AllowRemainder" introduced and set "true" by default.
(should be set to false for architectures that suffers from it).
Reviewers: hfinkel, mzolotukhin, zzheng
Differential Revision: http://reviews.llvm.org/D19553
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 271071
When we traced through a phi node looking for floating-point reductions, we
forgot whether we'd ever seen an instruction without fast-math flags (that
would block vectorization). This propagates it through to the end.
llvm-svn: 271015
Currently we consider that each constant has itself as a base value. I.e "base(const) = const".
This introduces couple of problems when we are trying to avoid reporting constants in statepoint live sets:
1. When querying "base( phi(const1, const2) )" we will get "phi(const1, const2)" as a base pointer. Since
it's not a constant we will record it in a stack map. However on practice we don't want this to happen
(constant are never relocated).
2. base( phi(const, gc ptr) ) = phi( const, base(gc ptr) ). This particular case imposes challenge on our
runtime - we don't expect to see constant base pointers other than null. This problems can be avoided
by treating all constant as if they were derived from null pointer base. I.e in a first case we will
not include constant pointer in a stack map at all. In a second case we will get "phi(null, base(gc ptr))"
as a base pointer which is a lot more convenient.
Differential Revision: http://reviews.llvm.org/D20584
llvm-svn: 270993
This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX sometime ago so much of that implementation can be reused.
A companion patch (D20684) removes/auto-upgrade the clang intrinsics.
Differential Revision: http://reviews.llvm.org/D20686
llvm-svn: 270973
objc_storeStrong can be formed from a sequence such as
%0 = tail call i8* @objc_retain(i8* %p) nounwind
%tmp = load i8*, i8** @x, align 8
store i8* %0, i8** @x, align 8
tail call void @objc_release(i8* %tmp) nounwind
The code was already looking through bitcasts for most of the values
involved, but had missed one case where the pointer operand for the
store was a bitcast. Ultimately the pointer for the load and store
have to be the same value, after stripping casts.
llvm-svn: 270955
If every operands of a constant are mapping to themselves, and the
type does not change, we have an early exit as acknowledged in the
comment:
// Otherwise, we have some other constant to remap. Start by checking to see
// if all operands have an identity remapping.
However instead of checking for identity the code was checking if the
operands were mapped to the constant itself, which is rarely true.
As a consequence, the coverage report showed that the early exit was
never taken.
llvm-svn: 270944
Condition might be simplified to a Constant, but it doesn't have to be
ConstantInt, so we should dyn_cast, instead of cast.
This fixes PR27886.
llvm-svn: 270924
An exception could prevent a store from occurring but MemCpyOpt's
callslot optimization would fire anyway, causing the store to occur.
This fixes PR27849.
llvm-svn: 270892
The problem with plugins on Windows is that when building a plugin DLL it needs
to explicitly link against something (an exe or DLL) if it uses symbols from
that thing, and that thing must explicitly export those symbols. Also there's a
limit of 65535 symbols that can be exported. This means that currently plugins
only work on Windows when using BUILD_SHARED_LIBS, and that doesn't work with
MSVC.
This patch adds an LLVM_EXPORT_SYMBOLS_FOR_PLUGINS option, which when enabled
automatically exports from all LLVM tools the symbols that a plugin could want
to use so that a plugin can link against a tool directly. Plugins can specify
what tool they link against by using PLUGIN_TOOL argument to llvm_add_library.
The option can also be enabled on Linux, though there all it should do is
restrict the set of symbols that are exported as by default all symbols are
exported.
This option is currently OFF by default, as while I've verified that it works
with MSVC, linux gcc, and cygwin gcc, I haven't tried mingw gcc and I have no
idea what will happen on OSX. Also unfortunately we can't turn on
LLVM_ENABLE_PLUGINS when the option is ON as bugpoint-passes needs to be
loaded by both bugpoint.exe and opt.exe which is incompatible with this
approach. Also currently clang plugins don't work with this approach, which
will be fixed in future patches.
Differential Revision: http://reviews.llvm.org/D18826
llvm-svn: 270839
It is unsafe to hoist a load before a function call which may throw, the
throw might prevent a pointer dereference.
Likewise, it is unsafe to sink a store after a call which may throw.
The caller might be able to observe the difference.
This fixes PR27858.
llvm-svn: 270828
It turns out that too many passes are relying on alias analysis results
for control dependencies. Until we fix that by introducing a more accurate
modelling of control dependencies, special case assume in MemorySSA instead.
Also introduce tests to ensure we don't regress the FunctionAttrs or LICM
passes.
Differential Revision: http://reviews.llvm.org/D20658
llvm-svn: 270823
There is only one caller of MemorySSA::createNewAccess, and it passes true
as the IgnoreNonMemory argument. Remove that argument and fold its behavior
into createNewAccess.
llvm-svn: 270812
After this change, we do the expected thing for cases like
```
Check0Passed = /* range check IRCE can optimize */
Check1Passed = /* range check IRCE can optimize */
if (!(Check0Passed && Check1Passed))
throw_Exception();
```
llvm-svn: 270804
As a result of D18634 we no longer infer certain attributes on linkonce_odr
functions at compile time, and may only infer them at LTO time. The readnone
attribute in particular is required for virtual constant propagation (part
of whole-program virtual call optimization) to work correctly.
This change moves the whole-program virtual call optimization pass after
the function attribute inference passes, and enables the attribute inference
passes at opt level 1, so that virtual constant propagation has a chance to
work correctly for linkonce_odr functions.
Differential Revision: http://reviews.llvm.org/D20643
llvm-svn: 270765
It may materialize a declaration, or a definition. The name could
be misleading. This is following a merge of materializeInitFor()
into materializeDeclFor().
Differential Revision: http://reviews.llvm.org/D20593
llvm-svn: 270759
They were originally separated to handle the co-recursion between
the ValueMapper and the ValueMaterializer. This recursion does not
exist anymore: the ValueMapper now uses a Worklist and the
ValueMaterializer is scheduling job on the Worklist.
Differential Revision: http://reviews.llvm.org/D20593
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 270758
Also, rename recognizeBitReverseOrBSwapIdiom to recognizeBSwapOrBitReverseIdiom,
so the ordering of the MatchBSwaps and MatchBitReversals arguments are
consistent with the function name.
llvm-svn: 270715
Move the now index-based ODR resolution and internalization routines out
of ThinLTOCodeGenerator.cpp and into either LTO.cpp (index-based
analysis) or FunctionImport.cpp (index-driven optimizations).
This is to enable usage by other linkers.
llvm-svn: 270698
Followup to D20528 clang patch, this removes the (V)CVTDQ2PD(Y) and (V)CVTPS2PD(Y) llvm intrinsics and auto-upgrades to sitofp/fpext instead.
Differential Revision: http://reviews.llvm.org/D20568
llvm-svn: 270678
A volatile load has side effects beyond what callers expect readonly to
signify. For example, it is not safe to reorder two function calls
which each perform a volatile load to the same memory location.
llvm-svn: 270671
Summary:
Adds fastpath instrumentation for esan's working set tool. The
instrumentation for an intra-cache-line load or store consists of an
inlined write to shadow memory bits for the corresponding cache line.
Adds a basic test for this instrumentation.
Reviewers: aizatsky
Subscribers: vitalybuka, zhaoqin, kcc, eugenis, llvm-commits
Differential Revision: http://reviews.llvm.org/D20483
llvm-svn: 270640
Summary:
Adds createEsanInitToolGV for creating a tool-specific variable passed
to the runtime library.
Adds dtor "esan.module_dtor" and inserts calls from the dtor to
"__esan_exit" in the runtime library.
Updates the EfficiencySanitizer test.
Patch by Qin Zhao.
Reviewers: aizatsky
Subscribers: bruening, kcc, vitalybuka, eugenis, llvm-commits
Differential Revision: http://reviews.llvm.org/D20488
llvm-svn: 270627
This changes IRCE to optimize uses, and not branches. This change is
NFCI since the uses we do inspect are in practice only ever going to be
the condition use in conditional branches; but this flexibility will
later allow us to analyze more complex expressions than just a direct
branch on a range check.
llvm-svn: 270500
When an aggregate contains an opaque type its size cannot be
determined. This triggers an "Invalid GetElementPtrInst indices for type" assert
in function checkGEPType. The fix suppresses the conversion in this case.
http://reviews.llvm.org/D20319
llvm-svn: 270479
Summary:
This patch turns on LoopUnrollAnalyzer by default. To mitigate compile
time regressions, I chose very conservative thresholds for now. Later we
can make them more aggressive, but it might require being smarter in
which loops we're optimizing. E.g. currently the biggest issue is that
with more agressive thresholds we unroll many cold loops, which
increases compile time for no performance benefit (performance of those
loops is improved, but it doesn't matter since they are cold).
Test results for compile time(using 4 samples to reduce noise):
```
MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes 5.19%
SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect 4.19%
MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow 3.39%
MultiSource/Applications/JM/lencod/lencod 1.47%
MultiSource/Benchmarks/Fhourstones-3_1/fhourstones3_1 -6.06%
```
I didn't see any performance changes in the testsuite, but it improves
some internal tests.
Reviewers: hfinkel, chandlerc
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D20482
llvm-svn: 270478
A cleanuppad is not cheap, they turn into many instructions and result
in additional spills and fills. It is not worth keeping a cleanuppad
around if all it does is hold a lifetime.end instruction.
N.B. We first try to merge the cleanuppad with another cleanuppad to
avoid dropping the lifetime and debug info markers.
llvm-svn: 270314
The InductiveRangeCheck struct is only five words long; so passing these
around value is fine. The allocator makes the code look more complex
than it is.
llvm-svn: 270309
I had used `std::remove_if` under the assumption that it moves the
predicate matching elements to the end, but actaully the elements
remaining towards the end (after the iterator returned by
`std::remove_if`) are indeterminate. Fix the bug (and make the code
more straightforward) by using a temporary SmallVector, and add a test
case demonstrating the issue.
llvm-svn: 270306
Summary:
Uses ModulePass instead of FunctionPass for EfficiencySanitizerPass to
better support global variable creation for a forthcoming struct field
counter tool.
Patch by Qin Zhao.
Reviewers: aizatsky
Subscribers: llvm-commits, eugenis, vitalybuka, bruening, kcc
Differential Revision: http://reviews.llvm.org/D20458
llvm-svn: 270263
Check that the incoming blocks of phi nodes are identical, and block
function merging if they are not.
rdar://problem/26255167
Differential Revision: http://reviews.llvm.org/D20462
llvm-svn: 270250
Sequences of range checks expressed using guards, like
guard((I - 2) u< L)
guard((I - 1) u< L)
guard((I + 0) u< L)
guard((I + 1) u< L)
guard((I + 2) u< L)
can sometimes be combined into a smaller sequence:
guard((I - 2) u< L AND (I + 2) u< L)
if we can prove that (I - 2) u< L AND (I + 2) u< L implies all of checks
expressed in the previous sequence.
This change teaches GuardWidening to do this kind of merging when
feasible.
llvm-svn: 270151
This patch fixes https://llvm.org/bugs/show_bug.cgi?id=27703.
If there is a sequence of one or more load instructions, each loaded value is used as address of later load instruction, bitcast is necessary to change the value type, don't optimize it.
llvm-svn: 270135
Transition InstrProf and Coverage over to the stricter Error/Expected
interface.
Changes since the initial commit:
- Fix error message printing in llvm-profdata.
- Check errors in loadTestingFormat() + annotateAllFunctions().
- Defer error handling in InstrProfIterator to InstrProfReader.
- Remove the base ProfError class to work around an MSVC ICE.
Differential Revision: http://reviews.llvm.org/D19901
llvm-svn: 270020
Summary:
Implement guard widening in LLVM. Description from GuardWidening.cpp:
The semantics of the `@llvm.experimental.guard` intrinsic lets LLVM
transform it so that it fails more often that it did before the
transform. This optimization is called "widening" and can be used hoist
and common runtime checks in situations like these:
```
%cmp0 = 7 u< Length
call @llvm.experimental.guard(i1 %cmp0) [ "deopt"(...) ]
call @unknown_side_effects()
%cmp1 = 9 u< Length
call @llvm.experimental.guard(i1 %cmp1) [ "deopt"(...) ]
...
```
to
```
%cmp0 = 9 u< Length
call @llvm.experimental.guard(i1 %cmp0) [ "deopt"(...) ]
call @unknown_side_effects()
...
```
If `%cmp0` is false, `@llvm.experimental.guard` will "deoptimize" back
to a generic implementation of the same function, which will have the
correct semantics from that point onward. It is always _legal_ to
deoptimize (so replacing `%cmp0` with false is "correct"), though it may
not always be profitable to do so.
NB! This pass is a work in progress. It hasn't been tuned to be
"production ready" yet. It is known to have quadriatic running time and
will not scale to large numbers of guards
Reviewers: reames, atrick, bogner, apilipenko, nlewycky
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20143
llvm-svn: 269997
Summary: Set default branch weight to 1:1 if one of the branch has profile missing when simplifying CFG.
Reviewers: spatel, davidxl
Subscribers: danielcdh, llvm-commits
Differential Revision: http://reviews.llvm.org/D20307
llvm-svn: 269995
In truncateToMinimalBitwidths() we were RAUW'ing an instruction then erasing it. However, that intruction could be cached in the map we're iterating over. The first check is "I->use_empty()" which in most cases would return true, as the (deleted) object was RAUW'd first so would have zero use count. However in some cases the object could have been polluted or written over and this wouldn't be the case. Also it makes valgrind, asan and traditionalists who don't like their compiler to crash sad.
No testcase as there are no externally visible symptoms apart from a crash if the stars align.
Fixes PR26509.
llvm-svn: 269908
This bug was introduced in r269728 and is the likely cause of many stage 2 ubsan bot failures.
I'll add a test in a follow-up commit assuming this fixes things properly.
llvm-svn: 269797
This is assertion is no longer necessary since we never record
constants in the live set anyway. (They are never recorded in
the initial live set, and constant bases are removed near line 2119)
Differential Revision: http://reviews.llvm.org/D20293
llvm-svn: 269764
Fix a bug introduced with rL269426 :
[InstCombine] canonicalize* LE/GE vector integer comparisons to LT/GT (PR26701, PR26819)
We were assuming that a ConstantDataVector / ConstantVector / ConstantAggregateZero operand of
an ICMP was composed of ConstantInt elements, but it might have ConstantExpr or UndefValue
elements. Handle those appropriately.
Also, refactor this function to join the scalar and vector paths and eliminate the switches.
Differential Revision: http://reviews.llvm.org/D20289
llvm-svn: 269728
Transition InstrProf and Coverage over to the stricter Error/Expected
interface.
Changes since the initial commit:
- Address undefined-var-template warning.
- Fix error message printing in llvm-profdata.
- Check errors in loadTestingFormat() + annotateAllFunctions().
- Defer error handling in InstrProfIterator to InstrProfReader.
Differential Revision: http://reviews.llvm.org/D19901
llvm-svn: 269694
The selection of the vectorization factor currently doesn't consider
interleaved accesses. The vectorization factor is based on the maximum safe
dependence distance computed by LAA. However, for loops with interleaved
groups, we should instead base the vectorization factor on the maximum safe
dependence distance divided by the maximum interleave factor of all the
interleaved groups. Interleaved accesses not in a group will be scalarized.
Differential Revision: http://reviews.llvm.org/D20241
llvm-svn: 269659
TargetLibraryInfoWrapperPass is a dependency of
SCCP but it's not listed as such. Chandler pointed
out this is an easy mistake to make which only
surfaces in weird crashes with some flag combinations.
This code will go away anyway at some point in the
future, but as long as it's (still) exercised, try
to make it correct.
llvm-svn: 269589
Transition InstrProf and Coverage over to the stricter Error/Expected
interface.
Changes since the initial commit:
- Fix error message printing in llvm-profdata.
- Check errors in loadTestingFormat() + annotateAllFunctions().
- Defer error handling in InstrProfIterator to InstrProfReader.
Differential Revision: http://reviews.llvm.org/D19901
llvm-svn: 269491
Transition InstrProf and Coverage over to the stricter Error/Expected
interface.
Differential Revision: http://reviews.llvm.org/D19901
llvm-svn: 269462
Currently there is no reasonable way to control the warnings in the 'use' phase
of the IRPGO pass. This is problematic because the output can be somewhat
spammy. This patch adds some flags which allow us to optionally disable these
warnings. The current upstream behavior will remain the default.
Patch by Jake VanAdrighem (jvanadrighem@gmail.com)
Differential Revision: http://reviews.llvm.org/D20195
llvm-svn: 269437
Summary: This change fix the bug in isProfitableToUseMemset() where MaxIntSize shoule be in byte, not bit.
Reviewers: arsenm, joker.eph, mcrosier
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20176
llvm-svn: 269433
*We don't currently handle the edge case constants (min/max values), so it's not a complete
canonicalization.
To fully solve the motivating bugs, we need to enhance this to recognize a zero vector
too because that's a ConstantAggregateZero which is a ConstantData, not a ConstantVector
or a ConstantDataVector.
Differential Revision: http://reviews.llvm.org/D17859
llvm-svn: 269426
Summary:
...loop after the last iteration.
This is really hard to do correctly. The core problem is that we need to
model liveness through the induction PHIs from iteration to iteration in
order to get the correct results, and we need to correctly de-duplicate
the common subgraphs of instructions feeding some subset of the
induction PHIs. All of this can be driven either from a side effect at
some iteration or from the loop values used after the loop finishes.
This patch implements this by storing the forward-propagating analysis
of each instruction in a cache to recall whether it was free and whether
it has become live and thus counted toward the total unroll cost. Then,
at each sink for a value in the loop, we recursively walk back through
every value that feeds the sink, including looping back through the
iterations as needed, until we have marked the entire input graph as
live. Because we cache this, we never visit instructions more than twice
-- once when we analyze them and put them into the cache, and once when
we count their cost towards the unrolled loop. Also, because the cache
is only two bits and because we are dealing with relatively small
iteration counts, we can store all of this very densely in memory to
avoid this from becoming an excessively slow analysis.
The code here is still pretty gross. I would appreciate suggestions
about better ways to factor or split this up, I've stared too long at
the algorithmic side to really have a good sense of what the design
should probably look at.
Also, it might seem like we should do all of this bottom-up, but I think
that is a red herring. Specifically, the simplification power is *much*
greater working top-down. We can forward propagate very effectively,
even across strange and interesting recurrances around the backedge.
Because we use data to propagate, this doesn't cause a state space
explosion. Doing this level of constant folding, etc, would be very
expensive to do bottom-up because it wouldn't be until the last moment
that you could collapse everything. The current solution is essentially
a top-down simplification with a bottom-up cost accounting which seems
to get the best of both worlds. It makes the simplification incremental
and powerful while leaving everything dead until we *know* it is needed.
Finally, a core property of this approach is its *monotonicity*. At all
times, the current UnrolledCost is a conservatively low estimate. This
ensures that we will never early-exit from the analysis due to exceeding
a threshold when if we had continued, the cost would have gone back
below the threshold. These kinds of bugs can cause incredibly hard to
track down random changes to behavior.
We could use a techinque similar (but much simpler) within the inliner
as well to avoid considering speculated code in the inline cost.
Reviewers: chandlerc
Subscribers: sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D11758
llvm-svn: 269388
Ported DA to the new PM by splitting the former DependenceAnalysis Pass
into a DependenceInfo result type and DependenceAnalysisWrapperPass type
and adding a new PM-style DependenceAnalysis analysis pass returning the
DependenceInfo.
Patch by Philip Pfaffe, most of the review by Justin.
Differential Revision: http://reviews.llvm.org/D18834
llvm-svn: 269370
Split FCMP//ICMP/SEL from the basic arithmetic cost functions. They were not sharing any notable code path (just the return) and were repeatedly testing the opcode.
llvm-svn: 269348
LoopVectorBody was changed from a single pointer to a SmallVector when
store predication was introduced in r200270. Since r247139, store predication
no longer splits the vector loop body in-place, so we can go back to having
a single LoopVectorBody block.
This reverts the no-longer-needed changes from r200270.
llvm-svn: 269321
Shifts beyond the bitwidth are undef but SCCP resolved them to zero.
Instead, DTRT and resolve them to undef.
This reimplements the transform which caused PR27712.
llvm-svn: 269269
This new verifier rule lets us unambigously pick a calling convention
when creating a new declaration for
`@llvm.experimental.deoptimize.<ty>`. It is also congruent with our
lowering strategy -- since all calls to `@llvm.experimental.deoptimize`
are lowered to calls to `__llvm_deoptimize`, it is reasonable to enforce
a unique calling convention.
Some of the tests that were breaking this verifier rule have had to be
split up into different .ll files.
The inliner was violating this rule as well, and has been fixed to avoid
producing invalid IR.
llvm-svn: 269261
This should just be a compile-time change. Correct the check for whether
we have already analyzed the callee when making summary based decisions.
There is no need to reprocess one at the same threshold as when it was
last processed.
llvm-svn: 269251
Summary: In sample profile, some branches may have profile missing due to profile inaccuracy. We want existing branch probability still valid after propagation.
Reviewers: hfinkel, davidxl, spatel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19948
llvm-svn: 269137
Sort of the BB-local equivalent to idiom-recognizer: if we have a basic-block
that really implements a memcpy operation, backends can benefit from seeing
this.
llvm-svn: 269125
Before r268509, Clang would disable the loop unroll pass when optimizing
for size. That commit enabled it to be able to support unroll pragmas
in -Os builds. However, this regressed binary size in one of Chromium's
DLLs with ~100 KB.
This restores the original behaviour of no unrolling at -Os, but doing it
in LLVM instead of Clang makes more sense, and also allows the pragmas to
keep working.
Differential revision: http://reviews.llvm.org/D20115
llvm-svn: 269124
This patch extend loopreroll to allow the instruction chain
of loop control only IV has sext.
Differential Revision: http://reviews.llvm.org/D19820
llvm-svn: 269121
Remove the ModuleLevelChanges argument, and the ability to create new
subprograms for cloned functions. The latter was added without review in
r203662, but it has no in-tree clients (all non-test callers pass false
for ModuleLevelChanges [1], so it isn't reachable outside of tests). It
also isn't clear that adding a duplicate subprogram to the compile unit is
always the right thing to do when cloning a function within a module. If
this functionality comes back it should be accompanied with a more concrete
use case.
Furthermore, all in-tree clients add the returned function to the module.
Since that's pretty much the only sensible thing you can do with the function,
just do that in CloneFunction.
[1] http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/CloneFunction.cpp/rCloneFunction
Differential Revision: http://reviews.llvm.org/D18628
llvm-svn: 269110
The plan is to eventually make this logic simpler, however I expect it to
be a little tricky for the foreseeable future (at least until we're rid of
pointee types), so move it here so that it can be reused to build a summary
index for devirtualization.
Differential Revision: http://reviews.llvm.org/D20005
llvm-svn: 269081
Summary:
Add support for emission of plaintext lists of the imported files for
each distributed backend compilation. Used for distributed build file
staging.
Invoked with new gold-plugin thinlto-emit-imports-files option, which is
only valid with thinlto-index-only (i.e. for distributed builds), or
from llvm-lto with new -thinlto-action=emitimports value.
Depends on D19556.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D19636
llvm-svn: 269067
This restores commit r268627:
Summary:
When launching ThinLTO backends in a distributed build (currently
supported in gold via the thinlto-index-only plugin option), emit
an individual index file for each backend process as described here:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098272.html
...
Differential Revision: http://reviews.llvm.org/D19556
Address msan failures by avoiding std::prev on map.end(), the
theory is that this is causing issues due to some known UB problems
in __tree.
llvm-svn: 269059
Loop rotation clones instruction from the old header into the preheader. If
there were uses of values produced by these instructions that were outside
the loop, we have to insert PHI nodes to merge the two values. If the values
are used by DbgIntrinsics they will be used as a MetadataAsValue of a
ValueAsMetadata of the original values, and iterating all of the uses of the
original value will not update the DbgIntrinsics. The new code checks if the
values are used by DbgIntrinsics and if so, updates them using essentially
the same logic as the original code.
The attached testcase demonstrates the issue. Without the fix, the
DbgIntrinic outside the loop uses values computed inside the loop, even
though these values do not dominate the DbgIntrinsic.
Author: Thomas Jablin (tjablin)
Reviewers: dblaikie aprantl kbarton hfinkel cycheng
http://reviews.llvm.org/D19564
llvm-svn: 269034
When a va_start or va_copy is immediately followed by a va_end (ignoring
debug information or other start/end in between), then it is safe to
remove the pair. As this code shares some commonalities with the lifetime
markers, this has been factored to helper functions.
This InstCombine pattern kicks-in 3 times when running the LLVM test
suite.
llvm-svn: 269033
This reverts commits r268969, r268979 and r268984. They had target specific test
in generic directories without the correct specifiers and made it hard for us to
come up with a good solution by rapidly committing untested changes.
This test needs to be in a target specific directory or have the correct REQUIRED
identifier.
llvm-svn: 269027
Allow vectorization when the step is a loop-invariant variable.
This is the loop example that is getting vectorized after the patch:
int int_inc;
int bar(int init, int *restrict A, int N) {
int x = init;
for (int i=0;i<N;i++){
A[i] = x;
x += int_inc;
}
return x;
}
"x" is an induction variable with *loop-invariant* step.
But it is not a primary induction. Primary induction variable with non-constant step is not handled yet.
Differential Revision: http://reviews.llvm.org/D19258
llvm-svn: 269023
IR instrumentation generates a COMDAT symbol __llvm_profile_raw_version to
overwrite the same symbol in profile run-time to distinguish IR profiles from
Clang generated profiles. In MACHO, LinkOnceODR linkage is used due to the
lack of COMDAT support.
But LinkOnceODR linkage might have .weak_def_can_be_hidden assembly directive,
while the weak variable in run-time has a .weak_definition directive. Linker
will not merge these two symbols even they have the same name. The end result
is IR profiles are not properly flagged in MACHO.
This patch changes the linkage for __llvm_profile_raw_version in each module to
LinkOnceAny so that it has same .weak_definition directive as in the run-time.
Differential Revision: http://reviews.llvm.org/D20078
llvm-svn: 268969
This fixes http://llvm.org/PR27646 on AArch64.
There are three issues here:
- The GR save area is 7 words in size, instead of 8. This is not enough
if none of the fixed arguments is passed in GRs (they're all floats or
aggregates).
- The first argument is ignored (which counteracts the above if it's passed
in GR).
- Like x86_64, fixed arguments landing in the overflow area are wrongly
counted towards the overflow offset.
Differential Revision: http://reviews.llvm.org/D20023
llvm-svn: 268967
Original Commit Message
Extend load/store type canonicalization to handle unordered operations
Extend the type canonicalization logic to work for unordered atomic loads and stores. Note that while this change itself is fairly simple and low risk, there's a reasonable chance this will expose problems in the backends by suddenly generating IR they wouldn't have seen before. Anything of this nature will be an existing bug in the backend (you could write an atomic float load), but this will definitely change the frequency with which such cases are encountered. If you see problems, feel free to revert this change, but please make sure you collect a test case.
Note that the concern about lowering is now much less likely. PR27490 proved that we already *were* mucking with the types of ordered atomics and volatiles. As a result, this change doesn't introduce as much new behavior as originally thought.
llvm-svn: 268809
Again, fairly simple. Only change is ensuring that we actually copy the property of the load correctly. The aliasing legality constraints were already handled by the FRE patches. There's nothing special about unorder atomics from the perspective of the PRE algorithm itself.
llvm-svn: 268804
You'll note there are essentially no code changes here. Cross block FRE heavily reuses code from the block local FRE. All of the tricky parts were done as part of the previous patch and the refactoring that removed the original code duplication.
llvm-svn: 268775
This patch is the first in a small series teaching GVN to optimize unordered loads aggressively. This change just handles block local FRE because that's the simplest thing which lets me test MDA, and the AvailableValue pieces. Somewhat suprisingly, MDA appears fine and only a couple of small changes are needed in GVN.
Once this is in, I'll tackle non-local FRE and PRE. The former looks like a natural extension of this, the later will require a couple of minor changes.
Differential Revision: http://reviews.llvm.org/D19440
llvm-svn: 268770
Summary:
The original ThinLTO pipeline was derived from some
work I did tuning FullLTO on the test suite and SPEC. This
patch reduces the amount of work done in the "linker phase" of
the build, and extend the function simplifications passes
performed during the "compile phase". This helps the build time
by reducing the IR as much as possible during the compile phase
and limiting the work to be performed during the "link phase",
while keeping the performance "on par" with the existing pipeline.
Reviewers: tejohnson
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D19773
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268769
Retrying r268550/r268751 which were reverted at r268577/r268765 due a memory sanitizer failure.
I have not been able to reproduce that failure, but I've taken another guess at fixing
the problem in this version of the patch and will watch for another failure.
Original commit message:
Unlike earlier similar fixes, we need to recalculate the branch weights
in this case.
Differential Revision: http://reviews.llvm.org/D19674
llvm-svn: 268767
Retrying r268550 which was reverted at r268577 due a memory sanitizer failure.
I have not been able to reproduce that failure, but I've taken a guess at fixing
the problem in this version of the patch and will watch for another failure.
Original commit message:
Unlike earlier similar fixes, we need to recalculate the branch weights
in this case.
Differential Revision: http://reviews.llvm.org/D19674
llvm-svn: 268751
Allowing overriding the default ASAN shadow mapping offset with the
-asan-shadow-offset option, and allow zero to be specified for both offset and
scale.
Patch by Aaron Carroll <aaronc@apple.com>.
llvm-svn: 268724
This test was crashing, and currently it breaks bootstrapping clang with debuginfo
Differential Revision: http://reviews.llvm.org/D20008
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268715
Summary: We need to clean up CFG before assigning discriminator to minimize the impact of optimization on debug info.
Reviewers: davidxl, dblaikie, dnovillo
Subscribers: dnovillo, danielcdh, llvm-commits
Differential Revision: http://reviews.llvm.org/D19926
llvm-svn: 268675
Summary:
Some PHIs can have expressions that are not AddRecExprs due to the presence
of sext/zext instructions. In order to prevent the Loop Vectorizer from
bailing out when encountering these PHIs, we now coerce the SCEV
expressions to AddRecExprs using SCEV predicates (when possible).
We only do this when the alternative would be to not vectorize.
Reviewers: mzolotukhin, anemet
Subscribers: mssimpso, sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D17153
llvm-svn: 268633
Summary:
When launching ThinLTO backends in a distributed build (currently
supported in gold via the thinlto-index-only plugin option), emit
an individual index file for each backend process as described here:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098272.html
The individual index file encodes the summary and module information
required for implementing the importing/exporting decisions made
for a given module in the thin link step.
This is in place of the current mechanism that uses the combined index
to make importing decisions in each back end independently. It is an
enabler for doing global summary based optimizations in the thin link
step (which will be recorded in the individual index files), and reduces
the size of the index that must be sent to each backend process, and
the amount of work to scan it in the backends.
Rather than create entirely new ModuleSummaryIndex structures (and all
the included unique_ptrs) for each backend index file, a map is created
to record all of the GUID and summary pointers needed for a particular
index file. The IndexBitcodeWriter walks this map instead of the full
index (hiding the details of managing the appropriate summary iteration
in a new iterator subclass). This is more efficient than walking the
entire combined index and filtering out just the needed summaries during
each backend bitcode index write.
Depends on D19481.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D19556
llvm-svn: 268627
Allowing overriding the default ASAN shadow mapping offset with the
-asan-shadow-offset option, and allow zero to be specified for both offset and
scale.
llvm-svn: 268586
MemorySanitizer: use-of-uninitialized-value
0x4910e47 in count /mnt/b/sanitizer-buildbot2/sanitizer-x86_64-linux-bootstrap/build/llvm/include/llvm/Support/MathExtras.h:159:12
0x4910e47 in countLeadingZeros<unsigned long> /mnt/b/sanitizer-buildbot2/sanitizer-x86_64-linux-bootstrap/build/llvm/include/llvm/Support/MathExtras.h:183
0x4910e47 in FitWeights /mnt/b/sanitizer-buildbot2/sanitizer-x86_64-linux-bootstrap/build/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:855
0x4910e47 in SimplifyCondBranchToCondBranch /mnt/b/sanitizer-buildbot2/sanitizer-x86_64-linux-bootstrap/build/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:2895
This reverts commit 609f4dd4bf3bc735c8c047a4d4b0a8e9e4d202e2.
llvm-svn: 268577
This reapplies commit r268521, that was reverted in r268530 due to a test failure in select-implied.ll
Modified the test case to reflect the new change.
llvm-svn: 268557
Unlike earlier similar fixes, we need to recalculate the branch weights
in this case.
Differential Revision: http://reviews.llvm.org/D19674
llvm-svn: 268550
This patch fixes PR27615.
@llvm.dbg.value instructions no longer count towards the maximum number of
instructions to look back at in the instruction list when searching for a
store instruction. This should make the output consistent between debug and
non-debug build.
Patch by Henric Karlsson <henric.karlsson@ericsson.com>!
Differential Revision: http://reviews.llvm.org/D19912
llvm-svn: 268512
Goal of this change is to guarantee stable ordering of the statepoint arguments and other
newly inserted values such as gc.relocates. Previously we had explicit sorting in a couple
of places. However for unnamed values ordering was partial and overall we didn't have any
strong invariant regarding it. This change switches all data structures to use SetVector's
and MapVector's which provide possibility for deterministic iteration over them.
Explicit sorting is now redundant and was removed.
Differential Revision: http://reviews.llvm.org/D19669
llvm-svn: 268502
We forgot to consider the target of ifuncs when considering if a
function was alive or dead.
N.B. Also update a few auxiliary tools like bugpoint and
verify-uselistorder.
This fixes PR27593.
llvm-svn: 268468
pointing to the same addr space. This can prevent SROA from creating a bitcast
between pointers with different addr spaces.
Differential Revision: http://reviews.llvm.org/D19697
llvm-svn: 268424
SCEV caches whether SCEV expressions are loop invariant, variant or
computable. LICM breaks this cache, almost by definition; so clear the
SCEV disposition cache if LICM changed anything.
llvm-svn: 268408
`Loop::makeLoopInvariant` can hoist instructions out of loops, so loop
dispositions for the loop it operated on may need to be cleared. We can
be smarter here (especially around how `forgetLoopDispositions` is
implemented), but let's be correct first.
Fixes PR27570.
llvm-svn: 268406
Be more specific in describing compression failures. Also, check for
this kind of error in emitNameData().
This is part of a series of patches to transition ProfileData over to
the stricter Error/Expected interface.
llvm-svn: 268400
This pass is supposed to reduce the size of the IR for compile time
purpose. We should run it ASAP, except when we prepare for LTO or
ThinLTO, and we want to keep them available for link-time inline.
Differential Revision: http://reviews.llvm.org/D19813
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268394
A few benchmarks with lots of accesses to global variables in the hot
loops regressed a lot since r266399, which added the
SpeculativeExecution pass to the default pipeline. The problem is that
this pass doesn't mark Globals Alias Analysis as preserved. Globals
Alias Analysis is computed in a module pass, whereas
SpeculativeExecution is a function pass, and a lot of passes dependent
on the Globals Alias Analysis to optimize these benchmarks are also
function passes. As such, the Globals Alias Analysis information cannot
be recomputed between SpeculativeExecution and the following function
passes needing that information.
SpeculativeExecution doesn't invalidate Globals Alias Analysis, so mark
it as such to fix those performance regressions.
Differential Revision: http://reviews.llvm.org/D19806
llvm-svn: 268370
We were overly cautious in our analysis of loops which have invokes
which unwind to EH pads. The loop unroll transform is safe because it
only clones blocks in the loop body, it does not try to split critical
edges involving EH pads. Instead, move the necessary safety check to
LoopUnswitch.
N.B. The safety check for loop unswitch is covered by an existing test
which fails without it.
llvm-svn: 268357
There is not point in importing a "weak" or a "linkonce" function
since we won't be able to inline it anyway.
We already had a targeted check for WeakAny, this is using the
same check on GlobalValue as the inline, i.e.
isMayBeOverriddenLinkage()
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268341
There is not point in importing a "weak" or a "linkonce" function
since we won't be able to inline it anyway.
We already had a targeted check for WeakAny, this is using the
same check on GlobalValue as the inline, i.e.
isMayBeOverriddenLinkage()
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268315
When running cc1 with -flto=thin, it is followed by GlobalOpt, which
requires the callgraph. This saves rebuilding one.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268266
Make it possible that TryToSimplifyUncondBranchFromEmptyBlock merges empty
basic block including lifetime intrinsics as well as phi nodes and
unconditional branch into its successor or predecessor(s).
If successor of empty block has single predecessor, all contents including
lifetime intrinsics are sinked into the successor. Otherwise, they are
hoisted into its predecessor(s) and then merged into the predecessor(s).
Patch by Josh Yoon <josh.yoon@samsung.com>!
Differential Revision: http://reviews.llvm.org/D19257
llvm-svn: 268254
This is where it was originally, until LoopVersioningLICM was
inserted before in r259986, I don't believe it was on purpose.
Differential Revision: http://reviews.llvm.org/D19809
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268252
SystemZ on Linux currently has 53-bit address space. In theory, the hardware
could support a full 64-bit address space, but that's not supported due to
kernel limitations (it'd require 5-level page tables), and there are no plans
for that. The default process layout stays within first 4TB of address space
(to avoid creating 4-level page tables), so any offset >= (1 << 42) is fine.
Let's use 1 << 52 here, ie. exactly half the address space.
I've originally used 7 << 50 (uses top 1/8th of the address space), but ASan
runtime assumes there's some space after the shadow area. While this is
fixable, it's simpler to avoid the issue entirely.
Also, I've originally wanted to have the shadow aligned to 1/8th the address
space, so that we can use OR like X86 to assemble the offset. I no longer
think it's a good idea, since using ADD enables us to load the constant just
once and use it with register + register indexed addressing.
Differential Revision: http://reviews.llvm.org/D19650
llvm-svn: 268161
Make use of Constant::getAggregateElement instead of checking constant types - first step towards adding support for UNDEF mask elements.
llvm-svn: 268158
If a guard call being lowered by LowerGuardIntrinsics has the
`!make.implicit` metadata attached, then reattach the metadata to the
branch in the resulting expanded form of the intrinsic. This allows us
to implement null checks as guards and still get the benefit of implicit
null checks.
llvm-svn: 268148
support multiple induction variables
This patch enable loop reroll for the following case:
for(int i=0; i<N; i += 2) {
S += *a++;
S += *a++;
};
Differential Revision: http://reviews.llvm.org/D16550
llvm-svn: 268147
This moves some logic added to EarlyCSE in rL268120 into
`llvm::isInstructionTriviallyDead`. Adds a test case for DCE to
demonstrate that passes other than EarlyCSE can now pick up on the new
information.
llvm-svn: 268126
Summary:
This change teaches EarlyCSE some basic properties of guard intrinsics:
- Guard intrinsics read all memory, but don't write to any memory
- After a guard has executed, the condition it was guarding on can be
assumed to be true
- Guard intrinsics on a constant `true` are no-ops
Reviewers: reames, hfinkel
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19578
llvm-svn: 268120
Make use of Constant::getAggregateElement instead of checking constant types - first step towards adding support for UNDEF mask elements.
llvm-svn: 268115
The implemented heuristic has a large body of code which better sits
in its own function for better readability. It also allows adding more
heuristics easier in the future.
llvm-svn: 268107
This patch fixes two somewhat related bugs in MemorySSA's caching
walker. These bugs were found because D19695 brought up the problem
that we'd have defs cached to themselves, which is incorrect.
The bugs this fixes are:
- We would sometimes skip the nearest clobber of a MemoryAccess, because
we would query our cache for a given potential clobber before
checking if the potential clobber is the clobber we're looking for.
The cache entry for the potential clobber would point to the nearest
clobber *of the potential clobber*, so if that was a cache hit, we'd
ignore the potential clobber entirely.
- There are times (sometimes in DFS, sometimes in the getClobbering...
functions) where we would insert cache entries that say a def
clobbers itself.
There's a bit of common code between the fixes for the bugs, so they
aren't split out into multiple commits.
This patch also adds a few unit tests, and refactors existing tests a
bit to reduce the duplication of setup code.
llvm-svn: 268087
Summary: Callee name is not used to identify a callsite now, so do not read it during annotation.
Reviewers: davidxl, dnovillo
Subscribers: dnovillo, danielcdh, llvm-commits
Differential Revision: http://reviews.llvm.org/D19704
llvm-svn: 268069
Summary:
Historically, we had a switch in the Makefiles for turning on "expensive
checks". This has never been ported to the cmake build, but the
(dead-ish) code is still around.
This will also make it easier to turn it on in buildbots.
Reviewers: chandlerc
Subscribers: jyknight, mzolotukhin, RKSimon, gberry, llvm-commits
Differential Revision: http://reviews.llvm.org/D19723
llvm-svn: 268050
We neglected to transfer operand bundles for some transforms. These
were found via inspection, I'll try to come up with some test cases.
llvm-svn: 268011
We neglected to transfer operand bundles for some transforms. These
were found via inspection, I'll try to come up with some test cases.
llvm-svn: 268010
We need to keep loop hints from the original loop on the new vector loop.
Failure to do this meant that, for example:
void foo(int *b) {
#pragma clang loop unroll(disable)
for (int i = 0; i < 16; ++i)
b[i] = 1;
}
this loop would be unrolled. Why? Because we'd vectorize it, thus dropping the
hints that unrolling should be disabled, and then we'd unroll it.
llvm-svn: 267970
I closely followed the precedents set by the vectorizer:
* With -Rpass-missed, the loop is reported with further details pointing
to -Rpass--analysis.
* -Rpass-analysis reports the details why distribution has failed.
* Regardless of -Rpass*, when distribution fails for a loop where
distribution was forced with the pragma, a warning is produced according
to -Wpass-failed. In this case the analysis info is also printed even
without -Rpass-analysis.
llvm-svn: 267952
The next patch will start using these for -Rpass-analysis so they won't
be internal-only anymore.
Move the 'Skipping; ' prefix that some of the message are using into the
'fail' function. We don't want to include this prefix in
the -Rpass-analysis report.
llvm-svn: 267951
When inlining a call site with llvm.mem.parallel_loop_access metadata, this
metadata needs to be propagated to all cloned memory-accessing instructions.
Otherwise, inlining parts of the loop body will invalidate the annotation.
With this functionality, we now vectorize the following as expected:
void Body(int *res, int *c, int *d, int *p, int i) {
res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
}
void Test(int *res, int *c, int *d, int *p, int n) {
int i;
#pragma clang loop vectorize(assume_safety)
for (i = 0; i < 1600; i++) {
Body(res, c, d, p, i);
}
}
llvm-svn: 267949
The MOVMSK instructions copies a vector elements' sign bits to the low bits of a scalar register and zeros the high bits.
This patch adds MOVMSK support to SimplifyDemandedUseBits so that its aware that the upper bits are known to be zero. It also removes the call to MOVMSK if none of the lower bits are actually required and just returns zero.
Differential Revision: http://reviews.llvm.org/D19614
llvm-svn: 267873
This patch implements the transformation that promotes indirect calls to
conditional direct calls when the indirect-call value profile meta-data is
available.
Differential Revision: http://reviews.llvm.org/D17864
llvm-svn: 267815
There's no existing test for this path, and I don't know how to expose
it in a regression test, but I'm assuming there's some reason this
path exists.
llvm-svn: 267813
"inferattrs" will deduce the attribute, but it will be too late for
many optimizations. Set it ourselves when creating the call.
Differential Revision: http://reviews.llvm.org/D17598
llvm-svn: 267762
Now the pass is just a tiny wrapper around the util. This lets us reuse
the logic elsewhere (done here for BuildLibCalls) instead of duplicating
it.
The next step is to have something like getOrInsertLibFunc that also
sets the attributes.
Differential Revision: http://reviews.llvm.org/D19470
llvm-svn: 267759
I tried to be as close as possible to the strongest check that
existed before; cleaning these up properly is left for future work.
Differential Revision: http://reviews.llvm.org/D19469
llvm-svn: 267758
We previously disallowed interleaved load groups that may cause us to
speculatively access memory out-of-bounds (r261331). We did this by ensuring
each load group had an access corresponding to the first and last member.
Instead of bailing out for these interleaved groups, this patch enables us to
peel off the last vector iteration, ensuring that we execute at least one
iteration of the scalar remainder loop. This solution was proposed in the
review of the previous patch.
Differential Revision: http://reviews.llvm.org/D19487
llvm-svn: 267751
This is the first of two commits for extending SLP Vectorizer to deal with aggregates.
This commit merely refactors existing logic.
http://reviews.llvm.org/D14185
llvm-svn: 267748
This change adds a new hook for estimating the cost of vector extracts followed
by zero- and sign-extensions. The motivating example for this change is the
SMOV and UMOV instructions on AArch64. These instructions move data from vector
to general purpose registers while performing the corresponding extension
(sign-extend for SMOV and zero-extend for UMOV) at the same time. For these
operations, TargetTransformInfo can assume the extensions are free and only
report the cost of the vector extract. The SLP vectorizer has been updated to
make use of the new hook.
Differential Revision: http://reviews.llvm.org/D18523
llvm-svn: 267725
Summary:
Refine the workaround from r266877 that attempts to prevent
renaming of locals in inline assembly, so that in addition to looking
for a llvm.used local value, that there is at least one inline assembly
call in the module. Otherwise, debug functions added to the llvm.used
can block importing/exporting unnecessarily.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D19573
llvm-svn: 267717
This is required to use this function from isSafeToSpeculativelyExecute
Reviewed By: hfinkel
Differential Revision: http://reviews.llvm.org/D16231
llvm-svn: 267692
Summary:
D19403 adds a new pragma for loop distribution. This change adds
support for the corresponding metadata that the pragma is translated to
by the FE.
As part of this I had to rethink the flag -enable-loop-distribute. My
goal was to be backward compatible with the existing behavior:
A1. pass is off by default from the optimization pipeline
unless -enable-loop-distribute is specified
A2. pass is on when invoked directly from opt (e.g. for unit-testing)
The new pragma/metadata overrides these defaults so the new behavior is:
B1. A1 + enable distribution for individual loop with the pragma/metadata
B2. A2 + disable distribution for individual loop with the pragma/metadata
The default value whether the pass is on or off comes from the initiator
of the pass. From the PassManagerBuilder the default is off, from opt
it's on.
I moved -enable-loop-distribute under the pass. If the flag is
specified it overrides the default from above.
Then the pragma/metadata can further modifies this per loop.
As a side-effect, we can now also use -enable-loop-distribute=0 from opt
to emulate the default from the optimization pipeline. So to be precise
this is the new behavior:
C1. pass is off by default from the optimization pipeline
unless -enable-loop-distribute or the pragma/metadata enables it
C2. pass is on when invoked directly from opt
unless -enable-loop-distribute=0 or the pragma/metadata disables it
Reviewers: hfinkel
Subscribers: joker.eph, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D19431
llvm-svn: 267672
Summary:
cloneLoopWithPreheader() does not update LoopInfo for sub-loop of
the original loop being cloned. Add assert to ensure no sub-loops for loop being cloned.
Reviewers: anemet, ashutosh.nema, hfinkel
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D15922
llvm-svn: 267671
Summary:
It is incorrect to compare TripCount (which is BECount + 1)
with extraiters (or Count) to check if we should enter unrolled
loop or not, because TripCount can potentially overflow
(when BECount is max unsigned integer).
While comparing BECount with (Count - 1) is overflow safe and
therefore correct.
Reviewer: hfinkel
Differential Revision: http://reviews.llvm.org/D19256
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 267662
In the case where isLegalAddressingMode is used for cases
not related to addressing modes, such as pure adds and muls,
it should not be using address space 0. LSR already passes -1
as the address space in these cases.
llvm-svn: 267645
This splits out the per-loop functionality from the Pass class.
With this the fact whether the loop is forced-distribute with the new
metadata/pragma can be cached in the per-loop class rather than passed
around.
llvm-svn: 267643
We need the default ratio to be sufficiently large that it triggers transforms
based on block frequency info (BFI) and plays well with the recently introduced
BranchProbability used by CGP.
Differential Revision: http://reviews.llvm.org/D19435
llvm-svn: 267615
The destination buffer that sprintf uses is restrict qualified, we do
not need to worry about derived pointers referenced via format
specifiers.
This reverts commit r267580.
llvm-svn: 267605
sprintf doesn't read or copy the terminating null byte from it's string
operands. sprintf will append it's own after processing all of the
format specifiers.
This fixes PR27526.
llvm-svn: 267580
Summary:
Instead of using maximum IR weight as the basic block weight, this patch uses the voting algorithm to find the most likely weight for the basic block. This can effectively avoid the cases when some IRs are annotated incorrectly due to code motion of the profiled binary.
This patch also updates propagate.ll unittest to include discriminator in the input file so that it is testing something meaningful.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19301
llvm-svn: 267519
When SimplifyCFG merges identical instructions from both sides of a diamond, it
can preserve !llvm.mem.parallel_loop_access (as it does with most of the other
metadata). There's no real data or control dependency change in this case.
llvm-svn: 267515
I really thought we were doing this already, but we were not. Given this input:
void Test(int *res, int *c, int *d, int *p) {
for (int i = 0; i < 16; i++)
res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
}
we did not vectorize the loop. Even with "assume_safety" the check that we
don't if-convert conditionally-executed loads (to protect against
data-dependent deferenceability) was not elided.
One subtlety: As implemented, it will still prefer to use a masked-load
instrinsic (given target support) over the speculated load. The choice here
seems architecture specific; the best option depends on how expensive the
masked load is compared to a regular load. Ideally, using the masked load still
reduces unnecessary memory traffic, and so should be preferred. If we'd rather
do it the other way, flipping the order of the checks is easy.
The LangRef is updated to make explicit that llvm.mem.parallel_loop_access also
implies that if conversion is okay.
Differential Revision: http://reviews.llvm.org/D19512
llvm-svn: 267514
Pass all of the state we need around as arguments, so that these
functions are easier to reuse. There is one part of this that is
unusual: we pass around a functor to look up a DomTree for a function.
This will be a necessary abstraction when we try to use this code in
both the legacy and the new pass manager.
llvm-svn: 267498
This patch is what was the "instcombine" portion of D14185, with an additional
test added (see julia_pseudovec in test/Transforms/InstCombine/insert-val-extract-elem.ll).
The patch causes instcombine to replace sequences of extractelement-insertvalue-store
that act essentially like a bitcast followed by a store.
Differential review: http://reviews.llvm.org/D14260
llvm-svn: 267482
Add a typedef for the std::map<GlobalValue::GUID, GlobalValueSummary *>
map that is passed around to identify summaries for values defined in a
particular module. This shortens up declarations in a variety of places.
llvm-svn: 267471
The current logic assumes that any constant global will never be SRA'd. I presume this is because normally constant globals can be pushed into their uses and deleted. However, that sometimes can't happen (which is where you really want SRA, so the elements that can be eliminated, are!).
There seems to be no reason why we can't SRA constants too, so let's do it.
llvm-svn: 267393
As discussed on D19318, if we only demand the first element of a DIVSS/DIVSD intrinsic, then reduce to a FDIV call. This matches the existing FADD/FSUB/FMUL patterns.
llvm-svn: 267359
Split from D17490. This patch improves support for determining the demanded vector elements through SSE scalar intrinsics:
1 - demanded vector element support for unary and some extra binary scalar intrinsics (RCP/RSQRT/SQRT/FRCZ and ADD/CMP/DIV/ROUND).
2 - addss/addsd get simplified to a fadd call if we aren't interested in the pass through elements
3 - if we don't need the lowest element of a scalar operation then just use the first argument (the pass through elements) directly
We can add support for propagating demanded elements through any equivalent packed SSE intrinsics in a future patch (these wouldn't use the pass through patterns).
Differential Revision: http://reviews.llvm.org/D19318
llvm-svn: 267357
This patch improves support for determining the demanded vector elements through SSE scalar intrinsics:
1 - recognise that we only need the lowest element of the second input for binary scalar operations (and all the elements of the first input)
2 - recognise that the roundss/roundsd intrinsics use the lowest element of the second input and the remaining elements from the first input
Differential Revision: http://reviews.llvm.org/D17490
llvm-svn: 267356
Summary:
Remove the GlobalValueInfo and change the ModuleSummaryIndex to directly
reference summary objects. The info structure was there to support lazy
parsing of the combined index summary objects, which is no longer
needed and not supported.
Reviewers: joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19462
llvm-svn: 267344
Summary:
We are always importing the initializer for a GlobalVariable.
So if a GlobalVariable is in the export-list, we pull in any
refs as well.
Reviewers: tejohnson
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19102
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267303
The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling).
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267231
This patch changes the interface for createPGOFuncNameMetadata() where we add
another PGOFuncName argument.
Differential Revision: http://reviews.llvm.org/D19433
llvm-svn: 267216
Summary:
We can fold compares to false when two distinct allocations within a
function are compared for equality.
Patch by Anna Thomas!
Reviewers: majnemer, reames, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19390
llvm-svn: 267214
Extend the type canonicalization logic to work for unordered atomic loads and stores. Note that while this change itself is fairly simple and low risk, there's a reasonable chance this will expose problems in the backends by suddenly generating IR they wouldn't have seen before. Anything of this nature will be an existing bug in the backend (you could write an atomic float load), but this will definitely change the frequency with which such cases are encountered. If you see problems, feel free to revert this change, but please make sure you collect a test case.
llvm-svn: 267210
Summary: This change will shorten memset if the beginning of memset is overwritten by later stores.
Reviewers: hfinkel, eeckstein, dberlin, mcrosier
Subscribers: mgrang, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18906
llvm-svn: 267197
E.g. for:
!1 = {"llvm.distribute", i32 1}
it now returns the MDOperand for 1.
I will use this in LoopDistribution to check the value of the metadata.
Note that the change is backward-compatible with its current use in
LoopVersioningLICM. An Optional implicitly converts to a bool depending
whether it contains a value or not.
llvm-svn: 267190
Summary:
CachingMemorySSAWalker::invalidateInfo was using IsCall to determine
which cache map needed to be cleared of entries referring to the invalidated
MemoryAccess, but there could also be entries referring to it in the
other cache map (value entries, not key entries). This change just
clears both tables to be conservatively correct.
Also add a verifyRemoved() function, called when expensive
checks (i.e. XDEBUG) are enabled to verify that the invalidated
MemoryAccess object is not referenced in any of the caches.
Reviewers: dberlin, george.burgess.iv
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19388
llvm-svn: 267157
We take the intersection of overflow flags while CSE'ing.
This permits us to consider two instructions with different overflow
behavior to be replaceable.
llvm-svn: 267153
Summary:
When optimizing PHIs which have inputs floating point binary
operators, we preserve all IR flags except the fast math
flags.
This change removes the logic which tracked some of the IR flags
(no wrap, exact) and replaces it by doing an and on the IR flags of
all inputs to the PHI - which will also handle the fast math
flags.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19370
llvm-svn: 267139
EarlyCSE had inconsistent behavior with regards to flag'd instructions:
- In some cases, it would pessimize if the available instruction had
different flags by not performing CSE.
- In other cases, it would miscompile if it replaced an instruction
which had no flags with an instruction which has flags.
Fix this by being more consistent with our flag handling by utilizing
andIRFlags.
llvm-svn: 267111
Re-layer the functions in the new (i.e., newly correct) post-order
traversals in ValueEnumerator (r266947) and ValueMapper (r266949).
Instead of adding a node to the worklist in a helper function and
returning a flag to say what happened, return the node itself. This
makes the code way cleaner: the worklist is local to the main function,
there is no flag for an early loop exit (since we can cleanly bury the
loop), and it's perfectly clear when pointers into the worklist might be
invalidated.
I'm fixing both algorithms in the same commit to avoid repeating the
commit message; if you take the time to understand one the other should
be easy. The diff itself isn't entirely obvious since the traversals
have some noise (i.e., things to do), but here's the high-level change:
auto helper = [&WL](T *Op) { auto helper = [](T **&I, T **E) {
=> while (I != E) {
if (shouldVisit(Op)) { T *Op = *I++;
WL.push(Op, Op->begin()); if (shouldVisit(Op)) {
return true; return Op;
} }
return false; return nullptr;
}; };
=>
WL.push(S, S->begin()); WL.push(S, S->begin());
while (!empty()) { while (!empty()) {
auto *N = WL.top().N; auto *N = WL.top().N;
auto *&I = WL.top().I; auto *&I = WL.top().I;
bool DidChange = false;
while (I != N->end())
if (helper(*I++)) { => if (T *Op = helper(I, N->end()) {
DidChange = true; WL.push(Op, Op->begin());
break; continue;
} }
if (DidChange)
continue;
POT.push(WL.pop()); => POT.push(WL.pop());
} }
Thanks to Mehdi for helping me find a better way to layer this.
llvm-svn: 267099
Summary:
Adds an instrumentation pass for the new EfficiencySanitizer ("esan")
performance tuning family of tools. Multiple tools will be supported
within the same framework. Preliminary support for a cache fragmentation
tool is included here.
The shared instrumentation includes:
+ Turn mem{set,cpy,move} instrinsics into library calls.
+ Slowpath instrumentation of loads and stores via callouts to
the runtime library.
+ Fastpath instrumentation will be per-tool.
+ Which memory accesses to ignore will be per-tool.
Reviewers: eugenis, vitalybuka, aizatsky, filcab
Subscribers: filcab, vkalintiris, pcc, silvas, llvm-commits, zhaoqin, kcc
Differential Revision: http://reviews.llvm.org/D19167
llvm-svn: 267058
Summary:
If we know that the pointer allocated within a function does not escape,
we can fold away comparisons that are done with global pointers
Patch by Anna Thomas!
Reviewers: reames, majnemer, sanjoy
Subscribers: mgrang, mcrosier, majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D19276
llvm-svn: 267035