This is a follow-up to rL337171. This patch fixes the regression
introduced by rL337171 and enables the MipsTruncIntFP pattern.
Differential revision: https://reviews.llvm.org/D49469
llvm-svn: 337392
Pulled out from D49225, we have a lot of repeated scalar cost calculations, often with arguments that don't look the same but turn out to be.
llvm-svn: 337390
When rL336971 removed the scalar-fp isel patterns, we lost the need for this canonicalization - commutation/folding can handle everything else.
llvm-svn: 337387
ARMSubtarget had a copy/pasted block to determine whether the target was
hard-float, but it just delegated to triple features anyway, so this is
better handled at the TargetMachine level.
llvm-svn: 337384
This patch adds support for the following unpredicated
floating-point instructions:
FADD Floating point add
FSUB Floating point subtract
FMUL Floating point multiplication
FTSMUL Floating point trigonometric starting value
FRECPS Floating point reciprocal step
FRSQRTS Floating point reciprocal square root step
The instructions have the following assembly format:
fadd z0.h, z1.h, z2.h
and have variants for 16, 32 and 64-bit FP elements.
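As a rough per-element model of the two "step" instructions, the sketch
below shows the usual AArch64 refinement-step math; it is an
illustration, not code from the patch, and it omits the architected
special-case handling of infinities and NaNs:
```c
/* FRECPS computes 2.0 - a*b and FRSQRTS computes (3.0 - a*b) / 2.0,
   the Newton-Raphson refinement steps used when iterating towards
   1/x and 1/sqrt(x). */
float frecps_step(float a, float b)  { return 2.0f - a * b; }
float frsqrts_step(float a, float b) { return (3.0f - a * b) / 2.0f; }
```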
llvm-svn: 337383
This reverts commit 55222c9183c6e07f53a54c4061677734f54feac1.
I missed that this patch has a dependency on https://reviews.llvm.org/D49219
that has not been approved yet.
llvm-svn: 337373
The signed/unsigned DOT instructions perform a dot-product on
quadtuplets from two source vectors and accumulate the result in
the destination register. The instructions come in two forms:
Vector form, e.g.
sdot z0.s, z1.b, z2.b - signed dot product on four 8-bit quadtuplets,
accumulating results in 32-bit elements.
udot z0.d, z1.h, z2.h - unsigned dot product on four 16-bit quadtuplets,
accumulating results in 64-bit elements.
Indexed form, e.g.
sdot z0.s, z1.b, z2.b[3] - signed dot product on four 8-bit quadtuplets
with the specified quadtuplet from the second
source vector, accumulating results in 32-bit
elements.
udot z0.d, z1.h, z2.h[1] - unsigned dot product on four 16-bit quadtuplets
with the specified quadtuplet from the second
source vector, accumulating results in 64-bit
elements.
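A scalar sketch of the vector-form semantics, assuming for illustration
a 128-bit vector (four 32-bit accumulator lanes); SVE vectors are
scalable, so the real lane count varies:
```c
#include <stdint.h>

/* Model of "sdot z0.s, z1.b, z2.b": each 32-bit accumulator lane gains
   the sum of four products of corresponding signed 8-bit elements. */
void sdot_s(int32_t acc[4], const int8_t a[16], const int8_t b[16]) {
    for (int lane = 0; lane < 4; ++lane)
        for (int i = 0; i < 4; ++i)
            acc[lane] += (int32_t)a[4 * lane + i] * (int32_t)b[4 * lane + i];
}
```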
llvm-svn: 337372
Summary: This is how it appears to be handled in GCC, and it prevents an
"Unknown mismatch" error in the SelectionDAGBuilder.
Reviewers: venkatra, jyknight, jrtc27
Reviewed By: jyknight, jrtc27
Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D49218
llvm-svn: 337370
This patch adds the following predicated instructions:
UDIV Unsigned divide active elements
UDIVR Unsigned divide active elements, reverse form.
SDIV Signed divide active elements
SDIVR Signed divide active elements, reverse form.
e.g.
udiv z0.s, p0/m, z0.s, z1.s
(unsigned divide active elements in z0 by z1, store result in z0)
sdivr z0.s, p0/m, z0.s, z1.s
(signed divide active elements in z1 by z0, store result in z0)
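As a scalar model of the merging predicated form (lane count and types
fixed for illustration; inactive lanes keep the destination's previous
contents):
```c
#include <stdint.h>
#include <stdbool.h>

/* Model of "udiv z0.s, p0/m, z0.s, z1.s": divide only the lanes whose
   predicate bit is set; inactive lanes of z0 are left unchanged.
   UDIVR/SDIVR swap the dividend and divisor. */
void udiv_m(uint32_t z0[], const uint32_t z1[], const bool p0[], int n) {
    for (int i = 0; i < n; ++i)
        if (p0[i])
            z0[i] = z0[i] / z1[i];
}
```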
llvm-svn: 337369
This patch adds the following instructions:
MUL - multiply vectors, e.g.
mul z0.h, p0/m, z0.h, z1.h
- multiply with immediate, e.g.
mul z0.h, z0.h, #127
SMULH - signed multiply returning high half, e.g.
smulh z0.h, p0/m, z0.h, z1.h
UMULH - unsigned multiply returning high half, e.g.
umulh z0.h, p0/m, z0.h, z1.h
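For a single active 16-bit lane, the high-half multiplies amount to
widening, multiplying, and keeping the top half of the product (an
illustrative sketch):
```c
#include <stdint.h>

/* Per-element model of SMULH/UMULH on 16-bit elements. */
int16_t  smulh16(int16_t a, int16_t b)   { return (int16_t)(((int32_t)a * b) >> 16); }
uint16_t umulh16(uint16_t a, uint16_t b) { return (uint16_t)(((uint32_t)a * b) >> 16); }
```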
llvm-svn: 337358
* Delete a no-longer-used override, and mark the other
getRegisterTypeForCallingConv() as override.
* SPE only supports i32, not i64, as the internal type, so simply remove
the type check, so that DestReg and Opc are provably always set.
GCC 6.4 did not warn about either of the above.
llvm-svn: 337350
The X86ISD::MOVLHPS/MOVHLPS nodes should now be emitted only for SSE1. This means the v2i64/v2f64 types would be illegal, so we don't need these patterns.
llvm-svn: 337349
I'm trying to restrict the MOVLHPS/MOVHLPS ISD nodes to SSE1 only. With SSE2 we can use unpcks. I believe this will allow some patterns to be cleaned up to require fewer bitcasts.
I've put in an odd isel hack to still select the MOVHLPS instruction from the unpckh node to avoid changing tests, and because movhlps is a shorter encoding. Ideally we'd do execution domain switching on this, but the operands are in the wrong order and are tied. We might be able to try a commute in the domain switching using custom code.
We already support domain switching for UNPCKLPD and MOVLHPS.
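The operand-order issue is visible with the corresponding C intrinsics;
a minimal sketch showing that movhlps produces the same value as
unpckhpd with the operands commuted (plus domain-crossing bitcasts):
```c
#include <immintrin.h>

/* _mm_movehl_ps(a, b) = [b2, b3, a2, a3]; unpckhpd picks the high
   64-bit lane of each source, so commuting its operands matches. */
__m128 movehl_via_unpckhpd(__m128 a, __m128 b) {
    return _mm_castpd_ps(
        _mm_unpackhi_pd(_mm_castps_pd(b), _mm_castps_pd(a)));
}
```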
llvm-svn: 337348
Summary:
The Signal Processing Engine (SPE) is found on NXP/Freescale e500v1,
e500v2, and several e200 cores. This adds support targeting the e500v2,
as this is more common than the e500v1, and is in SoCs still on the
market.
This patch is very intrusive because the SPE is binary incompatible with
the traditional FPU. After discussing with others, the cleanest
solution was to make both SPE and FPU features on top of a base PowerPC
subset, so all FPU instructions are now wrapped with HasFPU predicates.
Supported by this are:
* Code generation following the SPE ABI at the LLVM IR level (calling
conventions)
* Single- and double-precision math at the level supported by the APU.
Still to do:
* Vector operations
* SPE intrinsics
As this changes the callee-saved register list order, one test, which
tests the precise generated code, was updated to account for the new
register order.
Reviewed by: nemanjai
Differential Revision: https://reviews.llvm.org/D44830
llvm-svn: 337347
This is the lead-up to having SPE codegen. Add the rest of the
instructions, along with MC tests.
Differential Revision: https://reviews.llvm.org/D44829
llvm-svn: 337346
The presence of these symbols in the symbol table can cause symbol type
mismatch errors (or undefined symbol errors on emulated TLS targets)
and they can't be ICF'd anyway.
llvm-svn: 337338
These are all methods that, while not currently used in the
Itanium demangler, are generally useful enough that it's
likely the Itanium demangler could find a use for them. More
importantly, they are all necessary for the Microsoft demangler
which is up and coming in a subsequent patch. Rather than
combine these into a single monolithic patch, I think it makes
sense to commit this utility code first since it is very simple,
this way it won't detract from the substance of the MS demangler
patch.
llvm-svn: 337316
InstCombine has a cast transform that matches a cast-of-select:
Orig = cast (Src = select Cond TV FV)
And tries to replace it with a select which has the cast folded in:
NewSel = select Cond (cast TV) (cast FV)
The combiner does RAUW(Orig, NewSel), so any debug values for Orig would
survive the transform. But debug values for Src would be lost.
This patch teaches InstCombine to replace all debug uses of Src with
NewSel (taking care of doing any necessary DIExpression rewriting).
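The shape of the transform, illustrated with C casts standing in for
the IR cast instruction (an illustrative sketch; the function names are
made up):
```c
/* Before: Orig = cast(Src), where Src is the select. */
long before(int cond, int tv, int fv) { return (long)(cond ? tv : fv); }

/* After: NewSel selects between the casts; debug uses that described
   the inner select (Src) are rewritten to describe NewSel. */
long after(int cond, int tv, int fv)  { return cond ? (long)tv : (long)fv; }
```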
Differential Revision: https://reviews.llvm.org/D49270
llvm-svn: 337310
Summary:
The only reviewer suggestion I've skipped here is the double-wide
multiply instructions. Multiply is an area where I'm nervous about
hidden data-dependent behavior, and it doesn't seem important for
any benchmarks I have, so I'm skipping it and sticking with the minimal
multiply support that matches what I know is widely used in existing
crypto libraries. We can always add double-wide multiply when we have
clarity from vendors about its behavior and guarantees.
I've tried to at least cover the fundamentals here with tests, although
I've not tried to cover every width or permutation. I can add more tests
where folks think it would be helpful.
Reviewers: craig.topper
Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D49413
llvm-svn: 337308
Previously we were assuming whole program compilation. Now that
separate compilation is a thing we need to update this pass.
Firstly, it can no longer assert on the existence of malloc and free.
These functions might not be in the current translation unit. If we
need them then we will generate imports for them.
Secondly, the global helper function we create should be marked as
weak since we will be generating a separate copy in each translation
unit.
Finally, the names of the symbols used must be unique and fixed since
they need to agree across translation units.
Differential Revision: https://reviews.llvm.org/D49263
llvm-svn: 337301
Previously we passed 'null_frag' into the instruction definition. The multiclass is shared with MOVHPD which doesn't use null_frag. It turns out that passing X86Movsd produces patterns equivalent to some standalone patterns.
llvm-svn: 337299
The Mips FastISel back-end does not extend i1 values while lowering icmp.
Ensure that we bail into DAG ISel when handling this case.
Patch by Dragan Mladjenovic.
Differential Revision: https://reviews.llvm.org/D49290
llvm-svn: 337288
Once we resolved an undef in a function we can run Solve, which could
lead to finding a constant return value for the function, which in turn
could turn undefs into constants in other functions that call it, before
resolving undefs there.
Computationally the amount of work we are doing stays the same; only
the order in which we process things is slightly different, and
potentially there are fewer undefs to resolve.
We are still relying on the order of functions in the IR, which means
depending on the order, we are able to resolve the optimal undef first
or not. For example, if @test1 comes before @testf, we find the constant
return value of @testf too late and we cannot use it while solving
@test1.
This on its own does not lead to more constants removed in the
test-suite, probably because currently we have to be very lucky to visit
applicable functions in the right order.
Maybe we can come up with a better way of resolving undefs in more
'profitable' functions first.
Reviewers: efriedma, mssimpso, davide
Reviewed By: efriedma, davide
Differential Revision: https://reviews.llvm.org/D49385
llvm-svn: 337283
TTI::getMinMaxReductionCost typically can't handle pointer types - until this is changed it's better to limit horizontal reduction to integer/float vector types only.
llvm-svn: 337280
Summary:
Part of the adjustCopiesBackFrom method wasn't correctly dealing with SubRange
intervals when updating.
Two changes. The first ensures that bogus SubRange Segments aren't propagated
when encountering Segments of the form [1234r, 1234d:0) when preparing to merge
value numbers. These can be removed in this case.
The second forces a shrinkToUses call if SubRanges end on the copy index
(instead of just the parent register).
V2: Addressed review comments, plus MIR test instead of ll test
Subscribers: MatzeB, qcolombet, nhaehnle
Differential Revision: https://reviews.llvm.org/D40308
Change-Id: I1d2b2b4beea802fce11da01edf71feb2064aab05
llvm-svn: 337273
This patch completes support for the following floating point
instructions that take FP immediates:
FADD* (addition)
FSUB (subtract)
FSUBR (subtract reverse form)
FMUL* (multiplication)
FMAX* (maximum)
FMAXNM (maximum number)
FMIN (minimum)
FMINNM (minimum number)
All operations are predicated and take an FP immediate operand,
e.g.
fadd z0.h, p0/m, z0.h, #0.5
fmin z0.s, p0/m, z0.s, #1.0
^___________^ (tied)
* Instructions added in a previous patch.
llvm-svn: 337272
Similarly to rL336736, at least one more C API function does not
properly get declared as extern "C" due to a missing header, causing
name mangling and linking errors.
This patch fixes calls to LLVMAddAggressiveInstCombinerPass().
Differential Revision: https://reviews.llvm.org/D49416
Reviewed By: whitequark
llvm-svn: 337264
rL333307 was introduced to remove automatic target triple
normalization when calling sys::getDefaultTargetTriple(), arguing
that users of the latter already called Triple::normalize()
if necessary. However, users of the C API currently have no way of
doing target triple normalization.
This patch introduces an LLVMNormalizeTargetTriple function to
the C API which wraps Triple::normalize() and can be used on
the result of LLVMGetDefaultTargetTriple to achieve the same effect.
Differential Revision: https://reviews.llvm.org/D49414
Reviewed By: whitequark
llvm-svn: 337263
If we are only extracting vector elements via EXTRACT_VECTOR_ELT(s) we may be able to use SimplifyDemandedVectorElts to avoid unnecessary vector ops.
Differential Revision: https://reviews.llvm.org/D49262
llvm-svn: 337258
The SPLICE instruction splices two vectors into one vector using a
predicate. It copies the active elements from the first vector, and
then fills the remaining elements with the low-numbered elements from
the second vector.
The instruction has the following form, e.g.
splice z0.b, p0, z0.b, z1.b
for 8-bit elements. It also supports 16, 32 and
64-bit elements.
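A scalar sketch that follows the description above (fixed sizes and a
separate destination buffer are illustrative simplifications; the real
instruction is destructive in its first source):
```c
#include <stdint.h>
#include <stdbool.h>

/* Model of "splice z0.b, p0, z0.b, z1.b": the selected elements of the
   first source come first, then low-numbered elements of the second
   source fill the remainder of the destination. */
void splice(uint8_t zd[], const uint8_t z1[], const uint8_t z2[],
            const bool pg[], int n) {
    int k = 0;
    for (int i = 0; i < n; ++i)
        if (pg[i])
            zd[k++] = z1[i];
    for (int i = 0; k < n; ++i)
        zd[k++] = z2[i];
}
```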
llvm-svn: 337253
This patch adds an instruction that allows extracting
a vector from a pair of vectors, given an immediate index
that describes the element position to extract from.
The instruction has the following assembly:
ext z0.b, z0.b, z1.b, #imm
where #imm is an immediate between 0 and 255.
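Treating the two sources as one double-length vector, a byte-level
sketch (fixed vector length for illustration, assuming #imm stays
within the concatenated vector):
```c
#include <stdint.h>

/* Model of "ext zd.b, zn.b, zm.b, #imm": copy n bytes starting at byte
   offset imm of the concatenation {zn, zm}. */
void ext(uint8_t zd[], const uint8_t zn[], const uint8_t zm[],
         int imm, int n) {
    for (int i = 0; i < n; ++i) {
        int j = imm + i;
        zd[i] = (j < n) ? zn[j] : zm[j - n];
    }
}
```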
llvm-svn: 337251
The ta instruction will always trap, regardless of the value
of the integer condition codes. TRAPri is marked as using icc,
so we cannot use a pattern for TRAPri to implement ta 1, as
verify-machineinstrs can complain that icc is not defined.
Instead we implement ta 1 the same way as ta 5.
llvm-svn: 337236
This amounts to a pretty ridiculous number of patterns. Ideally we'd canonicalize the X86ISD::VRNDSCALE earlier to reuse those patterns. I briefly looked into doing that, but some strict FP operations could still get converted to rint and nearbyint during isel. It's probably still worthwhile to look into. This patch is meant as a starting point to work from.
llvm-svn: 337234
This allows us to use 231 form to fold an insertelement on the add input to the fma. There is technically no software intrinsic that can use this until AVX512F, but it can be manually built up from other intrinsics.
llvm-svn: 337223
This support was partial and temporary. Now that we have
wasm object file support it's no longer needed.
Differential Revision: https://reviews.llvm.org/D48744
llvm-svn: 337222
As discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.html
http://lists.llvm.org/pipermail/llvm-dev/2018-July/124400.html
We want to add rotate intrinsics because the IR expansion of that pattern is 4+ instructions,
and we can lose pieces of the pattern before it gets to the backend. Generalizing the operation
by allowing 2 different input values (plus the 3rd shift/rotate amount) gives us a "funnel shift"
operation which may also be a single hardware instruction.
Initially, I thought we needed to define new DAG nodes for these ops, and I spent time working
on that (much larger patch), but then I concluded that we don't need it. At least as a first
step, we have all of the backend support necessary to match these ops...because it was required.
And shepherding these through the IR optimizer is the primary concern, so the IR intrinsics are
likely all that we'll ever need.
There was also a question about converting the intrinsics to the existing ROTL/ROTR DAG nodes
(along with improving the oversized shift documentation). Again, I don't think that's strictly
necessary (as the test results here prove). That can be an efficiency improvement as a small
follow-up patch.
So all we're left with is documentation, definition of the IR intrinsics, and DAG builder support.
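As a concrete model of the semantics being added, here is a C sketch of
the 32-bit left funnel shift (fshl(x, x, r) degenerates to a
rotate-left):
```c
#include <stdint.h>

/* fshl(hi, lo, s): conceptually concatenate hi:lo into a 64-bit value,
   shift left by s modulo 32, and keep the top 32 bits. */
uint32_t fshl32(uint32_t hi, uint32_t lo, uint32_t s) {
    s &= 31;                   /* shift amount is taken modulo the width */
    if (s == 0)
        return hi;             /* avoid the out-of-range 32-bit shift */
    return (hi << s) | (lo >> (32 - s));
}
```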
Differential Revision: https://reviews.llvm.org/D49242
llvm-svn: 337221
In a followup I'm looking to add a Microsoft demangler. Doing
so needs a lot of the same utility classes and feature test
macros which are already implemented in ItaniumDemangle.cpp.
So move all of these things into header files so that they
can be re-used by a new demangler.
Differential Revision: https://reviews.llvm.org/D49399
llvm-svn: 337217
Summary:
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]
As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958
https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.
Proofs for this transform: https://rise4fun.com/Alive/mgu
This transform is surprisingly frustrating.
This does not deal with non-splat shift amounts, or with undef shift amounts.
I've outlined what I think the solution should be:
```
// Potential handling of non-splats: for each element:
// * if both are undef, replace with constant 0.
// Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0.
// * if both are not undef, and are different, bailout.
// * else, only one is undef, then pick the non-undef one.
```
The DAGCombine will reverse this transform, see
https://reviews.llvm.org/D49266
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: JDevlieghere, rkruppe, llvm-commits
Differential Revision: https://reviews.llvm.org/D49320
llvm-svn: 337190
trivially rematerializable.
We run into a case where machineLICM hoists a large number of live ranges
outside of a big loop because it thinks those live ranges are trivially
rematerializable. In regalloc, global splitting is tried out first for those
live ranges before they are spilled and rematerialized. Because the global
splitting algorithm is quadratic, increasing a lot of global splitting
candidates causes huge compile time increase (50s to 1400s on my local
machine when compiling a module).
However, we think that for live ranges which are very large and trivially
rematerializable, it is better to just skip global splitting so as to save
compile time with little chance of sacrificing performance. We use the
segment size of the live range to indirectly evaluate whether global
splitting of the live range can introduce high cost, and use an option
as a knob to adjust the size limit threshold.
Differential Revision: https://reviews.llvm.org/D49353
llvm-svn: 337186
This reverts commit r337081, therefore restoring r337050 (and fix in
r337059), with test fix for bot failure described after the original
description below.
In order to always import the same copy of a linkonce function,
even when encountering it with different thresholds (a higher one then a
lower one), keep track of the summary we decided to import.
This ensures that the backend only gets a single definition to import
for each GUID, so that it doesn't need to choose one.
Move the largest threshold the GUID was considered for import into the
current module out of the ImportMap (which is part of a larger map
maintained across the whole index), and into a new map just maintained
for the current module we are computing imports for. This saves some
memory since we no longer have the thresholds maintained across the
whole index (and throughout the in-process backends when doing a normal
non-distributed ThinLTO build), at the cost of some additional
information being maintained for each invocation of ComputeImportForModule
(the selected summary pointer for each import).
There is an additional map lookup for each callee being considered for
importing, however, this was able to subsume a map lookup in the
Worklist iteration that invokes computeImportForFunction. We also are
able to avoid calling selectCallee if we already failed to import at the
same or higher threshold.
I compared the run time and peak memory for the SPEC2006 471.omnetpp
benchmark (running in-process ThinLTO backends), as well as for a large
internal benchmark with a distributed ThinLTO build (so just looking at
the thin link time/memory). Across a number of runs with and without
this change there was no significant change in the time and memory.
(I tried a few other variations of the change but they also didn't
improve time or peak memory).
The new commit removes a test that no longer makes sense
(Transforms/FunctionImport/hotness_based_import2.ll), as exposed by the
reverse-iteration bot. The test depends on the order of processing the
summary call edges, and actually depended on the old problematic
behavior of selecting more than one summary for a given GUID when
encountered with different thresholds. There was no guarantee even
before that we would eventually pick the linkonce copy with the hottest
call edges, it just happened to work with the test and the old code, and
there was no guarantee that we would end up importing the selected
version of the copy that had the hottest call edges (since the backend
would effectively import only one of the selected copies).
Reviewers: davidxl
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D48670
llvm-svn: 337184
invariant instructions to be both more correct and much more powerful.
While testing, I continued to find issues with sinking post-load
hardening. Unfortunately, it was amazingly hard to create any useful
tests of this because we were mostly sinking across copies and other
loading instructions. The fact that we couldn't sink past normal
arithmetic was really a big oversight.
So first, I've ported roughly the same set of instructions from the data
invariant loads to also have their non-loading varieties understood to
be data invariant. I've also added a few instructions that came up so
often it again made testing complicated: inc, dec, and lea.
With this, I was able to shake out a few nasty bugs in the validity
checking. We need to restrict to hardening single-def instructions with
defined registers that match a particular form: GPRs that don't have
a NOREX constraint directly attached to their register class.
The (tiny!) test case included catches all of the issues I was seeing
(once we can sink the hardening at all) except for the NOREX issue. The
only test I have there is horrible. It is large, inexplicable, and
doesn't even produce an error unless you try to emit encodings. I can
keep looking for a way to test it, but I'm out of ideas really.
Thanks to Ben for giving me at least a sanity-check review. I'll follow
up with Craig to go over this more thoroughly post-commit, but without
it SLH crashes everywhere so landing it for now.
Differential Revision: https://reviews.llvm.org/D49378
llvm-svn: 337177
Instead, the pattern is tagged with the correct predicate when
it is declared. Some patterns have been duplicated as necessary.
Patch by Simon Dardis.
Differential revision: https://reviews.llvm.org/D48365
llvm-svn: 337171
Add code for selection of G_LOAD, G_STORE, G_GEP, G_FRAMEINDEX and
G_CONSTANT. Support loads and stores of i32 values.
Patch by Petar Avramovic.
Differential Revision: https://reviews.llvm.org/D48957
llvm-svn: 337168
Summary:
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]
As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958
https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.
But the IR-optimal pattern does not lower efficiently, so we want to undo it.
This handles the simple pattern.
There is a second pattern with predicate and constants inverted.
NOTE: we do not check uses here; we always do the transform.
Reviewers: spatel, craig.topper, RKSimon, javed.absar
Reviewed By: spatel
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D49266
llvm-svn: 337166
Summary: These are the names used in libgcc.
Reviewers: venkatra, jyknight, ekedaigle
Reviewed By: jyknight
Subscribers: joerg, fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D48915
llvm-svn: 337164
Summary: Software trap number one is the trap used for breakpoints
in the Sparc ABI.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D48637
llvm-svn: 337163
Summary:
If the high part of the load is not used, the offset to the next element
will not be set correctly.
For example, on Sparc V8, the following code will read val2 from offset 4
instead of 8.
```
int val = __builtin_va_arg(va, long long);
int val2 = __builtin_va_arg(va, int);
```
Reviewers: jyknight
Reviewed By: jyknight
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D48595
llvm-svn: 337161
Found cases that hit the assert I added. This patch factors the validity
checking into a nice helper routine and calls it when deciding to harden
post-load, and asserts it when doing so later.
I've added tests for the various ways of loading a floating point type,
as well as loading all vector permutations. Even though many of these go
to identical instructions, it seems good to somewhat comprehensively
test them.
I'm confident there will be more fixes needed here, I'll try to add
tests each time as I get this predicate adjusted.
llvm-svn: 337160
For dsymutil we want to store offsets in the accelerator table entries
rather than DIE pointers. In addition, we need a way to communicate
which CU a DIE belongs to. This patch provides support for both of these
issues.
Differential revision: https://reviews.llvm.org/D49102
llvm-svn: 337158
Re-apply "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error""
( fe0a456510131f268e388c4a18a92f575c0db183 ), which was inadvertently reverted via
2b2ee080f0164485562593b1b87291a48cea4a9a .
llvm-svn: 337156
This patch introduces createUserspaceApi() that creates function/global
declarations for symbols used by MSan in userspace.
This is a step towards the upcoming KMSAN implementation patch.
Reviewed at https://reviews.llvm.org/D49292
llvm-svn: 337155
Memory legalizer, waitcnt, and shrink passes can perturb the instructions,
which means that the post-RA hazard recognizer pass should run after them.
Otherwise, one of those passes may invalidate the work done by the hazard
recognizer. Note that this has the adverse side effect that any consecutive
S_NOP 0 instructions emitted by the hazard recognizer will not be shrunk into a
single S_NOP <N>. This should be addressed in a follow-on patch.
Differential Revision: https://reviews.llvm.org/D49288
llvm-svn: 337154
Bug fix for PR37808. The regression test is a reduced version of the
original reproducer attached to the bug report. As stated in the report,
the problem was that InsertedPHIs was keeping dangling pointers to
deleted MemoryPhis. MemoryPhis are created eagerly and sometimes get
zapped shortly afterwards. I've used WeakVH instead of an expensive
removal operation from the active workset.
Differential Revision: https://reviews.llvm.org/D48372
llvm-svn: 337149
This unfortunately requires a bunch of bitcasts to be added to SUBREG_TO_REG, COPY_TO_REGCLASS, and instructions in output patterns. Otherwise tablegen seems to default to picking f128 and then we fail when something tries to get the register class for f128, which isn't always valid.
The test changes are because we were previously mixing fr128 and vr128 due to constrainRegClass finding FR128 first, and passes like live range shrinking weren't handling that well.
llvm-svn: 337147
indices used by AVX2 and AVX-512 gather instructions.
The index vector is hardened by broadcasting the predicate state
into a vector register and then or-ing. We don't even have to worry
about EFLAGS here.
I've added a test for all of the gather intrinsics to make sure that we
don't miss one. A particularly interesting creation is the gather
prefetch, which needs to be marked as potentially "loading" to get the
correct behavior. It's a memory access in many ways, and is actually
relevant for SLH. Based on discussion with Craig in review, I've moved
it to be `mayLoad` and `mayStore` rather than generic side effects. This
matches how we model other prefetch instructions.
Many thanks to Craig for the review here.
Differential Revision: https://reviews.llvm.org/D49336
llvm-svn: 337144
AVX512F only has integer domain logic instructions. AVX512DQ added FP domain logic instructions.
Execution domain fixing runs before EVEX->VEX. So if we have AVX512F and not AVX512DQ we fail to do execution domain switching of the logic operations. This leads to mismatches in execution domain and more test differences.
This patch adds custom domain fixing that switches EVEX integer logic operations to VEX fp logic operations if XMM16-31 are not used.
llvm-svn: 337137
128-bit ops implicitly zero the upper bits. This should address the comment about domain crossing for the integer version without AVX2, since we can use a 128-bit VBLENDW without AVX2.
The only bad thing I see here is that we failed to reuse a vxorps in some of the tests, but I think that's an already-known issue.
llvm-svn: 337134
The actual code seems to be correct, but the comments were misleading.
Patch by Aaron Puchert!
Differential Revision: https://reviews.llvm.org/D49276
llvm-svn: 337131
This is almost the same as an existing IR canonicalization in instcombine,
so I'm assuming this is a good early generic DAG combine too.
The motivation comes from reduced bit-hacking for select-of-constants in IR
after rL331486. We want to restore that functionality in the DAG as noted in
the commit comments for that change and the llvm-dev discussion here:
http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html
The PPC and AArch64 tests show that those targets are already doing something
similar. x86 will be neutral in the minimal case and generally better when
this pattern is extended with other ops as shown in the signbit-shift.ll tests.
Note the asymmetry: we don't include the (extend (ifneg X)) transform because
it already exists in SimplifySelectCC(), and that is verified in the later
unchanged tests in the signbit-shift.ll files. Without the 'not' op, the
general transform to use a shift is always a win because that's a single
instruction.
Alive proofs:
https://rise4fun.com/Alive/ysli
Name: if pos, get -1
%c = icmp sgt i16 %x, -1
%r = sext i1 %c to i16
=>
%n = xor i16 %x, -1
%r = ashr i16 %n, 15
Name: if pos, get 1
%c = icmp sgt i16 %x, -1
%r = zext i1 %c to i16
=>
%n = xor i16 %x, -1
%r = lshr i16 %n, 15
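The same equivalences in C terms (16-bit width as in the proofs; an
illustrative sketch):
```c
#include <stdint.h>

/* "if pos, get -1": (x > -1) ? -1 : 0  becomes  ashr(not(x), 15). */
int16_t pos_to_all_ones(int16_t x) {
    int16_t n = (int16_t)~x;       /* xor i16 %x, -1 */
    return (int16_t)(n >> 15);     /* ashr: -1 if x was non-negative */
}

/* "if pos, get 1": (x > -1) ? 1 : 0  becomes  lshr(not(x), 15). */
int16_t pos_to_one(int16_t x) {
    uint16_t n = (uint16_t)~x;     /* xor i16 %x, -1 */
    return (int16_t)(n >> 15);     /* lshr: 1 if x was non-negative */
}
```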
Differential Revision: https://reviews.llvm.org/D48970
llvm-svn: 337130
This fold is repeated/misplaced in instcombine, but I'm
not sure if it's safe to remove that yet because some
other folds appear to be asserting that the transform
has occurred within instcombine itself.
This isn't the best fix for PR37776, but it probably
hides the bug with the given code example:
https://bugs.llvm.org/show_bug.cgi?id=37776
We have another test to demonstrate the more general bug.
llvm-svn: 337127
registers.
The goal of this patch is to improve the throughput analysis in llvm-mca for the
case where instructions perform partial register writes.
On x86, partial register writes are quite difficult to model, mainly because
different processors tend to implement different register merging schemes in
hardware.
When the code contains partial register writes, the IPC (instructions per
cycle) estimated by llvm-mca tends to diverge quite significantly from the
observed IPC (using perf).
Modern AMD processors (at least, from Bulldozer onwards) don't rename partial
registers. Quoting Agner Fog's microarchitecture.pdf:
" The processor always keeps the different parts of an integer register together.
For example, AL and AH are not treated as independent by the out-of-order
execution mechanism. An instruction that writes to part of a register will
therefore have a false dependence on any previous write to the same register or
any part of it."
This patch is a first important step towards improving the analysis of partial
register updates. It changes the semantic of RegisterFile descriptors in
tablegen, and teaches llvm-mca how to identify false dependences in the presence
of partial register writes (for more details: see the new code comments in
include/Target/TargetSchedule.h - class RegisterFile).
This patch doesn't address the case where a write to a part of a register is
followed by a read from the whole register. On Intel chips, high8 registers
(AH/BH/CH/DH) can be stored in separate physical registers. However, a later
(dirty) read of the full register (example: AX/EAX) triggers a merge uOp, which
adds extra latency (and potentially affects the pipe usage).
This is a very interesting article on the subject with a very informative answer
from Peter Cordes:
https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to
In the future, the definition of RegisterFile can be extended with extra information
that may be used to identify delays caused by merge opcodes triggered by a dirty
read of a partial write.
Differential Revision: https://reviews.llvm.org/D49196
llvm-svn: 337123
All predicates are handled.
There do not seem to be any other possible folds here.
There are some more folds possible with inverted mask though.
llvm-svn: 337112
The MachineOutliner was doing an std::for_each from the call (inserted
before the outlined sequence) to the iterator at the end of the
sequence.
std::for_each needs the iterator past the end, so the last instruction
was not taken into account when propagating the liveness information.
This fixes the machine verifier issue in machine-outliner-disubprogram.ll.
Differential Revision: https://reviews.llvm.org/D49295
llvm-svn: 337090
no conditions.
This is only valid to do if we're hardening calls and rets with LFENCE
which results in an LFENCE guarding the entire entry block for us.
llvm-svn: 337089
The code tried to find the immediate by using getNumOperands() on the MachineInstr, but there might be implicit-defs after the immediate that get counted.
Instead, use getNumOperands() from the instruction description, which will only count the operands that are defined in the td file.
llvm-svn: 337088
AVX512 doesn't have an immediate controlled blend instruction. But blend throughput is still better than movss/sd on SKX.
This commit changes AVX512 to use the AVX blend instructions instead of MOVSS/MOVSD. This constrains the register allocation since it won't be able to use XMM16-31, but hopefully the increased throughput and reduced port 5 pressure makes up for that.
llvm-svn: 337083
This reverts commit r337021.
WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7
#1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3
#2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3
#3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18
#4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23
#5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5
#6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const*, llvm::raw_ostream&, char const*) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5
#7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3
#8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26
[...]
Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv'
#0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192
llvm-svn: 337079
If an HVX vector register is to be coalesced into a vector pair, make
sure that the vector pair will not have a function call in its live range,
unless it already had one. All HVX vector registers are volatile, so
any vector register live across a function call will have to be spilled.
If a vector needs to be spilled, and it's coalesced into a vector pair
then the whole pair will need to be spilled (even if only a part of it is
live), taking extra stack space.
llvm-svn: 337073
Summary:
By looking at the callers of getUse(), we can see that even though
IVUsers may offer uses, they may not be interesting to
LSR. It's possible that none of them is interesting.
Reviewers: sanjoy
Subscribers: jlebar, hiraditya, bixia, llvm-commits
Differential Revision: https://reviews.llvm.org/D49049
llvm-svn: 337072
Ryzen has something like an 18-cycle latency on these, based on Agner's data. AMD's own xls is blank. So it seems like there might be something tricky here.
Agner's data for Intel CPUs indicates these are a single uop there.
Probably safest to remove them. We never generate them without an intrinsic so this should be ok.
Differential Revision: https://reviews.llvm.org/D49315
llvm-svn: 337067
- Drop the intrinsic versions of conversion instructions. These should be handled when we do vectors. They shouldn't show up in scalar code.
- Add the float<->double conversions which were missing.
- Add the AVX512 and AVX versions of the conversion instructions, including the unsigned integer conversions unique to AVX512.
Differential Revision: https://reviews.llvm.org/D49313
llvm-svn: 337066
- Move BSF/BSR to the same group as TZCNT/LZCNT/POPCNT.
- Split some of the bit manipulation instructions away from TZCNT/LZCNT/POPCNT. These are things like 'x & (x - 1)' which are composed of a few simple arithmetic operations. These aren't nearly as complicated/surprising as counting bits.
- Move BEXTR/BZHI into their own group. They aren't like a simple arithmetic op or the bit manipulation instructions. They're more like a shift+and.
Differential Revision: https://reviews.llvm.org/D49312
llvm-svn: 337065
These were supposed to be integer types since we are selecting integer instructions.
Found while preparing to remove these patterns for another patch.
llvm-svn: 337057
When we're linking an alias which will be defined later, we need to
build a GlobalAlias, or else we'll crash later in
IRLinker::linkGlobalValueBody.
clang sometimes constructs aliases like this for C++ destructors.
Differential Revision: https://reviews.llvm.org/D49316
llvm-svn: 337053
In order to always import the same copy of a linkonce function,
even when encountering it with different thresholds (a higher one then a
lower one), keep track of the summary we decided to import.
This ensures that the backend only gets a single definition to import
for each GUID, so that it doesn't need to choose one.
Move the largest threshold the GUID was considered for import into the
current module out of the ImportMap (which is part of a larger map
maintained across the whole index), and into a new map just maintained
for the current module we are computing imports for. This saves some
memory since we no longer have the thresholds maintained across the
whole index (and throughout the in-process backends when doing a normal
non-distributed ThinLTO build), at the cost of some additional
information being maintained for each invocation of ComputeImportForModule
(the selected summary pointer for each import).
There is an additional map lookup for each callee being considered for
importing, however, this was able to subsume a map lookup in the
Worklist iteration that invokes computeImportForFunction. We also are
able to avoid calling selectCallee if we already failed to import at the
same or higher threshold.
I compared the run time and peak memory for the SPEC2006 471.omnetpp
benchmark (running in-process ThinLTO backends), as well as for a large
internal benchmark with a distributed ThinLTO build (so just looking at
the thin link time/memory). Across a number of runs with and without
this change there was no significant change in the time and memory.
(I tried a few other variations of the change but they also didn't
improve time or peak memory).
Reviewers: davidxl
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D48670
llvm-svn: 337050
Summary:
Currently LowerTypeTests emits jumptable entries for all live external
and address-taken functions; however, we could limit the number of
functions that we emit entries for significantly.
For Cross-DSO CFI, we continue to emit jumptable entries for all
exported definitions. In the non-Cross-DSO CFI case, we only need to
emit jumptable entries for live functions that are address-taken in live
functions. This ignores exported functions and functions that are only
address taken in dead functions. This change uses ThinLTO summary data
(now emitted for all modules during ThinLTO builds) to determine
address-taken and liveness info.
The logic for emitting jumptable entries is more conservative in the
regular LTO case because we don't have summary data in the case of
monolithic LTO builds; however, once summaries are emitted for all LTO
builds we can unify the Thin/monolithic LTO logic to only use summaries
to determine the liveness of address taking functions.
This change is a partial fix for PR37474. It reduces the build size for
nacl_helper by ~2-3%; the reduction comes from nacl_helper compiling in
lots of unused code, and from unused functions that are address taken in
dead functions no longer being considered live due to emitted jumptable
references. The reduction for chromium is ~0.1-0.2%.
Reviewers: pcc, eugenis, javed.absar
Reviewed By: pcc
Subscribers: aheejin, dexonsmith, dschuff, mehdi_amini, eraman, steven_wu, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D47652
llvm-svn: 337038
For instance, when dumping .apple_types, the second atom represents the
DW_TAG. In addition to printing the raw value, we now also pretty-print
the value if the ATOM tells us how.
llvm-svn: 337026
This needs to refer to arguments by their original argument
index, not the argument split index, which depends on what
the type splitting decides to do.
Also avoid incrementing PSInputNum for each split piece.
llvm-svn: 337022
This was completely broken if there was ever a struct argument, as
this information is thrown away during the argument analysis.
The offsets as passed in to LowerFormalArguments are not useful,
as they partially depend on the legalized result register type,
and they don't consider the alignment in the first place.
Ignore the Ins array, and instead figure out from the raw IR type
what we need to do. This seems to fix the padding computation
if the DAG lowering is forced (and stops breaking arguments
following padded arguments if the arguments were only partially
lowered in the IR).
llvm-svn: 337021
This reverts commit r336419: use-after-free on CallGraph::FunctionMap elements
due to the use of a stale iterator in CGPassManager::runOnModule.
The iterator may be invalidated if a pass removes a function, ex.:
llvm::LegacyInlinerBase::inlineCalls
inlineCallsImpl
llvm::CallGraph::removeFunctionFromModule
llvm-svn: 337018
Follow up of rL336913: fix base class description. Thanks to Ahmed Bougacha
for pointing this out.
Differential Revision: https://reviews.llvm.org/D49284
llvm-svn: 337009
Revision r322373 fixed a bug in how we materialize constants when the CR-field
needs to be set.
However the fix is overly conservative. It will only do the transform if
AND-ing the input with the new constant produces the same new constant.
This is of course correct, but not necessarily required.
If there are no further uses of the constant, the constant can be changed.
If there are no uses of the GPR result, the final result of the materialization
isn't important other than it needs to compare to zero correctly (lt, gt, eq).
Differential revision: https://reviews.llvm.org/D42109
llvm-svn: 337008
This patch adds support for AArch64 to cfi-verify.
This required three changes to cfi-verify. First, it generalizes checking if an instruction is a trap by adding a new isTrap flag to TableGen (and defining it for x86 and AArch64). Second, the code that ensures that the operand register is not clobbered between the CFI check and the indirect call needs to allow a single dereference (in x86 this happens as part of the jump instruction). Third, we needed to ensure that return instructions are not counted as indirect branches. Technically, returns are indirect branches and can be covered by CFI, but LLVM's forward-edge CFI does not protect them, and x86 does not consider them, so we keep that behavior.
In addition, we had to improve AArch64's code to evaluate the branch target of an MCInst to handle calls where the destination is not the first operand (which it often is not).
Differential Revision: https://reviews.llvm.org/D48836
llvm-svn: 337007
A TableGen instruction record usually contains a DAG pattern that will
describe the SelectionDAG operation that can be implemented by this
instruction. However, there will be cases where several different DAG
patterns can all be implemented by the same instruction. The way to
represent this today is to write additional patterns in the Pattern
(or usually Pat) class that map those extra DAG patterns to the
instruction. This usually also works fine.
However, I've noticed cases where the current setup seems to require
quite a bit of extra (and duplicated) text in the target .td files.
For example, in the SystemZ back-end, there are quite a number of
instructions that can implement an "add-with-overflow" operation.
The same instructions also need to be used to implement just plain
addition (simply ignoring the extra overflow output). The current
solution requires creating an extra Pat pattern for every instruction,
duplicating the information about which particular add operands
map best to which particular instruction.
This patch enhances TableGen to support a new PatFrags class, which
can be used to encapsulate multiple alternative patterns that may
all match to the same instruction. It operates the same way as the
existing PatFrag class, except that it accepts a list of DAG patterns
to match instead of just a single one. As an example, we can now define
a PatFrags to match either an "add-with-overflow" or a regular add
operation:
def z_sadd : PatFrags<(ops node:$src1, node:$src2),
[(z_saddo node:$src1, node:$src2),
(add node:$src1, node:$src2)]>;
and then use this in the add instruction pattern:
defm AR : BinaryRRAndK<"ar", 0x1A, 0xB9F8, z_sadd, GR32, GR32>;
These SystemZ target changes are implemented here as well.
Note that PatFrag is now defined as a subclass of PatFrags, which
means that some users of internals of PatFrag need to be updated.
(E.g. instead of using PatFrag.Fragment you now need to use
!head(PatFrag.Fragments).)
The implementation is based on the following main ideas:
- InlinePatternFragments may now replace each original pattern
with several result patterns, not just one.
- parseInstructionPattern delays calling InlinePatternFragments
and InferAllTypes. Instead, it extracts a single DAG match
pattern from the main instruction pattern.
- Processing of the DAG match pattern part of the main instruction
pattern now shares most code with processing match patterns from
the Pattern class.
- Direct use of main instruction patterns in InferFromPattern and
EmitResultInstructionAsOperand is removed; everything now operates
solely on DAG match patterns.
Reviewed by: hfinkel
Differential Revision: https://reviews.llvm.org/D48545
llvm-svn: 336999
Summary:
This commit does two things:
1. modified the existing DivergenceAnalysis::dump() so it dumps the
whole function with added DIVERGENT: annotations;
2. added code to do that dump if the appropriate -debug-only option is
on.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47700
Change-Id: Id97b605aab1fc6f5a11a20c58a99bbe8c565bf83
llvm-svn: 336998
Spectre variant #1 for x86.
There is a lengthy, detailed RFC thread on llvm-dev which discusses the
high level issues. High level discussion is probably best there.
I've split the design document out of this patch and will land it
separately once I update it to reflect the latest edits and updates to
the Google doc used in the RFC thread.
This patch is really just an initial step. It isn't quite ready for
prime time and is only exposed via debugging flags. It has three major
limitations currently:
1) It only supports x86-64, and only certain ABIs. Many assumptions are
currently hard-coded and need to be factored out of the code here.
2) It doesn't include any options for more fine-grained control, either
of which control flow edges are significant or which loads are
important to be hardened.
3) The code is still quite rough and the testing lighter than I'd like.
However, this is enough for people to begin using. I have had numerous
requests from people to be able to experiment with this patch to
understand the trade-offs it presents and how to use it. We would also
like to encourage work to similar effect in other toolchains.
The ARM folks are actively developing a system based on this for
AArch64. We hope to merge this with their efforts when both are far
enough along. But we also don't want to block making this available on
that effort.
Many thanks to the *numerous* people who helped along the way here. For
this patch in particular, both Eric and Craig did a ton of review to
even have confidence in it as an early, rough cut at this functionality.
Differential Revision: https://reviews.llvm.org/D44824
llvm-svn: 336990
We currently only support binary instructions in the alternate opcode shuffles.
This patch is an initial attempt at adding cast instructions as well; this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:
1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.
Reapplied with fix to only accept 2 different casts if they come from the same source type (PR38154).
Differential Revision: https://reviews.llvm.org/D49135
llvm-svn: 336989
flow patterns including forks, merges, and even cycles.
This tries to cover a reasonably comprehensive set of patterns that
still don't require PHIs or PHI placement. The coverage was inspired by
the amazing variety of patterns produced when copy EFLAGS and restoring
it to implement Speculative Load Hardening. Without this patch, we
simply cannot make such complex and invasive changes to x86 instruction
sequences due to EFLAGS.
I've added "just" one test, but this test covers many different
complexities and corner cases of this approach. It is actually more
comprehensive, as far as I can tell, than anything that I have
encountered in the wild on SLH.
Because the test is so complex, I've tried to give somewhat thorough
comments and an ASCII-art diagram of the control flows to make it a bit
easier to read and maintain long-term.
Differential Revision: https://reviews.llvm.org/D49220
llvm-svn: 336985
This patch adds support for the following unpack instructions:
- PUNPKLO, PUNPKHI Unpack elements from low/high half and
place into elements of twice their size.
e.g. punpklo p0.h, p0.b
- UUNPKLO, UUNPKHI Unpack elements from low/high half and
SUNPKLO, SUNPKHI place into elements of twice their size
after zero- or sign-extending the values.
e.g. uunpklo z0.h, z0.b
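A scalar sketch of the zero-extending form (fixed element counts for
illustration; the SUNPK* forms sign-extend instead):
```c
#include <stdint.h>

/* Model of "uunpklo z0.h, z0.b" for a 16-byte vector: zero-extend the
   low 8 byte elements into 8 halfword elements. */
void uunpklo(uint16_t zd[8], const uint8_t zn[16]) {
    for (int i = 0; i < 8; ++i)
        zd[i] = (uint16_t)zn[i];   /* uunpkhi would read zn[8 + i] */
}
```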
llvm-svn: 336982
During the execution of long functions or functions that have a lot of
inlined code, it can happen that a tracked value is transferred from one
register to another. The transfer is recognized only if the destination
register is a callee-saved register and the source register is killed.
We do not salvage transfers into caller-saved registers, since there is
a good chance the tracked value would not survive there (caller-saved
registers are clobbered by calls).
Patch by Nikola Prica.
Differential Revision: https://reviews.llvm.org/D44016
llvm-svn: 336978
Previously we iseled to blend, commuted to another blend, and then commuted back to movss/movsd or blend depending on optsize. Now we do it directly.
llvm-svn: 336976
Summary:
llvm-xray changes:
- account-mode - process-id {...} shows after thread-id
- convert-mode - process {...} shows after thread
- parses FDR and basic mode pid entries
- Checks version number for FDR log parsing.
Basic logging changes:
- Update header version from 2 -> 3
FDR logging changes:
- Update header version from 2 -> 3
- in writeBufferPreamble, there is an additional PID Metadata record (after thread id record and tsc record)
Test cases changes:
- fdr-mode.cc, fdr-single-thread.cc, fdr-thread-order.cc modified to catch process id output in the log.
Reviewers: dberris
Reviewed By: dberris
Subscribers: hiraditya, llvm-commits, #sanitizers
Differential Revision: https://reviews.llvm.org/D49153
llvm-svn: 336974
This is not an optimization we should be doing in isel. This is more suitable for a DAG combine.
My main concern is a future time when we support more FPENV. Changing a packed op to a scalar op could cause us to miss some exceptions that should have occurred if we had done a packed op. A DAG combine would be better able to manage this.
llvm-svn: 336971
Summary:
Previously, when both DT and PDT are nullptrs and the UpdateStrategy is Lazy, DomTreeUpdater still pends updates inside.
After this patch, DomTreeUpdater will ignore all updates from (`applyUpdates()/insertEdge*()/deleteEdge*()`) in this case. (Calling `delBB()` still pends BasicBlock deletion until a flush event, according to the doc.)
The previously documented behavior of DomTreeUpdater won't change after the patch.
Reviewers: dmgreen, davide, kuhar, brzycki, grosser
Reviewed By: kuhar
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48974
llvm-svn: 336968
This bug was created by rL335258 because we used to always call instsimplify
after trying the associative folds. After that change it became possible
for subsequent folds to encounter unsimplified code (and potentially assert
because of it).
Instead of carrying changed state through instcombine, we can just return
immediately. This allows instsimplify to run, so we can continue assuming
that easy folds have already occurred.
llvm-svn: 336965
This re-applies r336929 with a fix to accommodate the Mips target
scheduling multiple SelectionDAG instances into the pass pipeline.
PrologEpilogInserter and StackColoring depend on the StackProtector analysis
being alive from the point it is run until PEI, which requires that they are all
scheduled in the same FunctionPassManager. Inserting a (machine) ModulePass
between StackProtector and PEI results in these passes being in separate
FunctionPassManagers and the StackProtector is not available for PEI.
PEI and StackColoring don't use much information from the StackProtector pass,
so transferring the required information to MachineFrameInfo is cleaner than
keeping the StackProtector pass around. This commit moves the SSP layout
information to MFI instead of keeping it in the pass.
This patch set (D37580, D37581, D37582, D37583, D37584, D37585, D37586, D37587)
is a first draft of the pagerando implementation described in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html.
Patch by Stephen Crane <sjc@immunant.com>
Differential Revision: https://reviews.llvm.org/D49256
llvm-svn: 336964
Summary:
This patch is crucial for proving equality of laundered/stripped
pointers, e.g.:
bool foo(A *a) {
return a == std::launder(a);
}
Clang with -fstrict-vtable-pointers will emit something like:
define dso_local zeroext i1 @_Z3fooP1A(%struct.A* %a) {
entry:
%c = bitcast %struct.A* %a to i8*
%call = tail call i8* @llvm.launder.invariant.group.p0i8(i8* %c)
%0 = bitcast %struct.A* %a to i8*
%1 = tail call i8* @llvm.strip.invariant.group.p0i8(i8* %0)
%2 = tail call i8* @llvm.strip.invariant.group.p0i8(i8* %call)
%cmp = icmp eq i8* %1, %2
ret i1 %cmp
}
and because %2 can be replaced with @llvm.strip.invariant.group(%0),
and %2 and %1 will produce the same value (because strip is readnone),
we can replace the compare with true.
Reviewers: rsmith, hfinkel, majnemer, amharc, kuhar
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D47423
llvm-svn: 336963
Summary:
This allows counters associated with unused functions to be
dead-stripped along with their functions. This approach is the same one
we used for PC tables.
Fixes an issue where LLD removes an unused PC table but leaves the 8-bit
counter.
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: llvm-commits, hiraditya, kcc
Differential Revision: https://reviews.llvm.org/D49264
llvm-svn: 336941
PrologEpilogInserter and StackColoring depend on the StackProtector analysis
being alive from the point it is run until PEI, which requires that they are all
scheduled in the same FunctionPassManager. Inserting a (machine) ModulePass
between StackProtector and PEI results in these passes being in separate
FunctionPassManagers and the StackProtector is not available for PEI.
PEI and StackColoring don't use much information from the StackProtector pass,
so transferring the required information to MachineFrameInfo is cleaner than
keeping the StackProtector pass around. This commit moves the SSP layout
information to MFI instead of keeping it in the pass.
This patch set (D37580, D37581, D37582, D37583, D37584, D37585, D37586, D37587)
is a first draft of the pagerando implementation described in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html.
Patch by Stephen Crane <sjc@immunant.com>
Differential Revision: https://reviews.llvm.org/D49256
llvm-svn: 336929
and no use of DW_FORM_rnglistx with the DW_AT_ranges attribute.
Reviewer: aprantl
Differential Revision: https://reviews.llvm.org/D49214
llvm-svn: 336927
We were accidentally connecting it to result 0 instead of result 1. This was caught by the machine verifier that noticed the flags were dead, but we were using them somehow. I'm still not clear what actually happened downstream.
llvm-svn: 336925
We have located a bug in AssemblyWriter::printModuleSummaryIndex(). This
function outputs path strings incorrectly. Backslashes in the strings
are not correctly escaped.
Consequently, if a path name contains a backslash followed by two
hexadecimal characters, the sequence is incorrectly interpreted when the
output is read by another component. This mangles the path and results
in an error.
This patch fixes this issue by calling printEscapedString() to output
the module paths.
Patch by Chris Jackson.
Differential Revision: https://reviews.llvm.org/D49090
llvm-svn: 336908
canWidenShuffleElements can do a better job if given a mask with ZeroableElements info. Apparently, ZeroableElements was only being used to identify AllZero candidates, but possibly we could plug it into more shuffle matchers.
Original Patch by Zvi Rackover @zvi
Differential Revision: https://reviews.llvm.org/D42044
llvm-svn: 336903
Noticed while updating D42044, lowerV2X128VectorShuffle can improve the shuffle mask with the zeroable data to create a target shuffle mask to recognise more 'zero upper 128' patterns.
NOTE: lowerV4X128VectorShuffle could benefit as well but the code needs refactoring first to discriminate between SM_SentinelUndef and SM_SentinelZero for negative shuffle indices.
Differential Revision: https://reviews.llvm.org/D49092
llvm-svn: 336900
We no longer care about the order of blocks in these collections,
so we can change to SmallPtrSets, making contains checks quicker.
Differential revision: https://reviews.llvm.org/D49060
llvm-svn: 336897
Mark standard encoded instructions and pseudo "standard encoded"
as not being in MIPS16e by default.
Patch by Simon Dardis.
Differential revision: https://reviews.llvm.org/D48379
llvm-svn: 336893
We now use llvm.fma.f32/f64 or llvm.x86.fmadd.f32/f64 intrinsics that use scalar types rather than vector types. So we don't need these special ISD nodes that operate on the lowest element of a vector.
llvm-svn: 336883
there for a long time.
The boolean tracking whether we saw a kill of the flags was supposed to
be per block scanned, but instead it was outside that loop and never
cleared. It requires quite a contrived test case to hit this, as you have
to have multiple levels of successors and interleave them with kills.
I've included such a test case here.
This is another bug found testing SLH and extracted to its own focused
patch.
llvm-svn: 336876
multiple successors where some of the uses end up killing the EFLAGS
register.
There was a bug where rather than skipping to the next basic block
queued up with uses once we saw a kill, we stopped processing the blocks
entirely. =/
Test case produces completely nonsensical code w/o this tiny fix.
This was found testing Speculative Load Hardening and split out of that
work.
Differential Revision: https://reviews.llvm.org/D49211
llvm-svn: 336874
This converts them to what clang is now using for codegen. Unfortunately, there seem to be a few kinks to work out still. I'll try to address with follow up patches.
llvm-svn: 336871
This changes `-print-*` from transformation passes to analysis passes so
that `-print-after-all` and `-print-before-all` don't trigger. This
avoids some redundant output.
Patch by Son Tuan Vu!
llvm-svn: 336869
This is marginally helpful for removing redundant extensions, and the
code is easier to read, so it seems like an all-around win. In the new
test i8-phi-ext.ll, we used to emit an AssertSext i8; now we emit an
AssertZext i2, which allows the extension of the return value to be
eliminated.
Differential Revision: https://reviews.llvm.org/D49004
llvm-svn: 336868
This commit suppresses turning loops like this into "(bitwidth - ctlz(input))".
unsigned foo(unsigned input) {
unsigned num = 0;
do {
++num;
input >>= 1;
} while (input != 0);
return num;
}
The loop version returns a value of 1 for both an input of 0 and an input of 1. Converting to a naive ctlz does not preserve that.
Theoretically we could do better if we checked isKnownNonZero or we could insert a select to handle the divergence. But until we have motivating cases for that, this is the easiest solution.
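For reference, the select-based version mentioned above would look roughly like this (a sketch assuming a 32-bit unsigned and the GCC/Clang __builtin_clz):
unsigned foo_ctlz(unsigned input) {
  // bitwidth - ctlz(input) for nonzero input; the select fixes the
  // input == 0 divergence, where the loop returns 1 but the naive
  // formula would not.
  return input != 0 ? 32 - __builtin_clz(input) : 1;
}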
llvm-svn: 336864
SITargetLowering queries SIInstrInfo in its constructor, so SIInstrInfo
must be initialized first. This fixes msan buildbot failures and was
introduced by r336851.
llvm-svn: 336861
Summary:
The move APIs added in this patch will be used to update MemorySSA when CFG changes merge or split blocks, by moving memory accesses accordingly in MemorySSA's internal data structures.
[Split from D45299 for easier review]
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D48897
llvm-svn: 336860
Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
AMDGPUSubtarget::Generation.
Reviewers: arsenm, jvesely
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D49037
llvm-svn: 336851
Summary:
https://bugs.llvm.org/show_bug.cgi?id=38123
This pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958
https://bugs.llvm.org/show_bug.cgi?id=21530
in the unsigned case, therefore it is probably a good idea to improve it.
https://rise4fun.com/Alive/Rny
^ there are more opportunities for folds, i will follow up with them afterwards.
Caveat: this somehow exposes missing opportunities
in `test/Transforms/InstCombine/icmp-logical.ll`.
It seems the problem is in `foldLogOpOfMaskedICmps()` in `InstCombineAndOrXor.cpp`.
But I'm not quite sure what is wrong, because it calls `getMaskedTypeForICmpPair()`,
which calls `decomposeBitTestICmp()`, which should already work for these cases...
As @spatel notes in https://reviews.llvm.org/D49179#1158760,
that code is a rather complex mess, so we'll let it slide.
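For intuition, the truncation check and its folded form are equivalent; a simplified C++ rendering (not the exact IR), using a hypothetical 8-bit target width:
// (x & 0xFF) == x holds exactly when x fits in 8 bits, i.e. x <= 0xFF.
bool fitsMasked(unsigned x) { return (x & 0xFF) == x; } // sanitizer-style check
bool fitsCmp(unsigned x) { return x <= 0xFF; }          // folded form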
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: yamauchi, majnemer, t.p.northover, llvm-commits
Differential Revision: https://reviews.llvm.org/D49179
llvm-svn: 336834
We can instead block the load folding in isProfitableToFold. Then isel will emit a register->register move for the zeroing part and a separate load. The PostProcessISelDAG should be able to remove the register->register move.
This saves us patterns and fixes the fact that we only had unaligned load patterns. The test changes show places where we should have been using an aligned load.
llvm-svn: 336828
Make the DIE iterator bidirectional so we can move to the previous
sibling of a DIE.
Differential revision: https://reviews.llvm.org/D49173
llvm-svn: 336823
Before revision 336728, the "mayLoad" flag for instruction (V)MOVLPSrm was
inferred directly from the "default" pattern associated with the instruction
definition.
r336728 removed special node X86Movlps, and all the patterns associated to it.
Now instruction (V)MOVLPSrm doesn't have a pattern associated to it, and the
'mayLoad/hasSideEffects' flags are left unset.
When the instruction info is emitted by tablegen, method
CodeGenDAGPatterns::InferInstructionFlags() sees that (V)MOVLPSrm doesn't have a
pattern, and flags are undefined. So, it conservatively sets the
"hasSideEffects" flag for it.
As a consequence, we were losing the 'mayLoad' flag, and we were gaining a
'hasSideEffect' flag in its place.
This patch fixes the issue (originally reported by Michael Holmen).
The mca tests show the differences in the instruction info flags. Instructions
that were affected by this problem were: MOVLPSrm/VMOVLPSrm/VMOVLPSZ128rm.
Differential Revision: https://reviews.llvm.org/D49182
llvm-svn: 336818
We currently only support binary instructions in the alternate opcode shuffles.
This patch is an initial attempt at adding cast instructions as well; this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:
1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.
Reapplied with fix to only accept 2 different casts if they come from the same source type.
Differential Revision: https://reviews.llvm.org/D49135
llvm-svn: 336812
We currently only support binary instructions in the alternate opcode shuffles.
This patch is an initial attempt at adding cast instructions as well; this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:
1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.
Differential Revision: https://reviews.llvm.org/D49135
llvm-svn: 336804
This brings the P5600 scheduler model to a mostly complete
status. There are a number of instructions which trigger the
`error:'MipsP5600Model' lacks information for` error. These are certain
codegen only instructions relating to MIPS64 which can be addressed by
using the correct predicates for them. That will be done in a full-up
patch.
Patch by Simon Dardis.
Differential revision: https://reviews.llvm.org/D45245
llvm-svn: 336802
Reuse this function to test the correctness and profitability of
reducing the width of either load or store operations.
Reviewers: samparker
Differential Revision: https://reviews.llvm.org/D48624
llvm-svn: 336800
This fixes an issue where we were not properly supporting multiple reduction
statements in a loop and not generating SMLADs for these cases. The alias analysis
checks were done too early, making them too conservative.
Differential revision: https://reviews.llvm.org/D49125
llvm-svn: 336795
AT_NAME was being emitted before the directory paths were remapped. This
ensures that all paths are remapped before anything is emitted.
An additional test case has been added.
Note that this only works if the replacement string is an absolute path.
If not, then AT_decl_file believes the new path is a relative path, and
joins that path with the compilation directory. I do not know of a good
way to resolve this.
Patch by: Siddhartha Bagaria (starsid)
Differential revision: https://reviews.llvm.org/D49169
llvm-svn: 336793
The compact instruction shuffles the active elements of a vector
into the lowest numbered elements and sets the remaining elements
to zero.
e.g.
compact z0.s, p0, z1.s
llvm-svn: 336789
The LASTB and LASTA instructions extract the last active element,
or element after the last active, from the source vector.
The added variants are:
Scalar:
last(a|b) w0, p0, z0.b
last(a|b) w0, p0, z0.h
last(a|b) w0, p0, z0.s
last(a|b) x0, p0, z0.d
SIMD & FP Scalar:
last(a|b) b0, p0, z0.b
last(a|b) h0, p0, z0.h
last(a|b) s0, p0, z0.s
last(a|b) d0, p0, z0.d
The CLASTB and CLASTA instructions conditionally extract the last active
element, or the element after the last active one, from the source vector.
The added variants are:
Scalar:
clast(a|b) w0, p0, w0, z0.b
clast(a|b) w0, p0, w0, z0.h
clast(a|b) w0, p0, w0, z0.s
clast(a|b) x0, p0, x0, z0.d
SIMD & FP Scalar:
clast(a|b) b0, p0, b0, z0.b
clast(a|b) h0, p0, h0, z0.h
clast(a|b) s0, p0, s0, z0.s
clast(a|b) d0, p0, d0, z0.d
Vector:
clast(a|b) z0.b, p0, z0.b, z1.b
clast(a|b) z0.h, p0, z0.h, z1.h
clast(a|b) z0.s, p0, z0.s, z1.s
clast(a|b) z0.d, p0, z0.d, z1.d
Please refer to the architecture specification for more details on
the semantics of the added instructions.
llvm-svn: 336783
This allows us to use SelectionDAG::isKnownNeverZero in DAGCombiner::visitREM (visitSDIVLike/visitUDIVLike handle the checking for constants).
llvm-svn: 336779
First stage in PR38057 - support non-uniform constant vectors in the combine to reuse the division-by-constant logic.
We can definitely do better for srem pow2 remainders (and avoid that extra multiply....) but this at least helps keep everything on the vector unit.
Differential Revision: https://reviews.llvm.org/D48975
llvm-svn: 336774
gcc 4.7 seems to disagree with gcc 5.3 about whether you need to say
'return std::move(thing)' instead of just 'return thing'. All the
json::Arrays and json::Objects that I was implicitly turning into
json::Values by returning them from functions now have explicit
std::move wrappers, so hopefully 4.7 will be happy now.
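Reduced to a self-contained sketch of the pattern (std::vector standing in for json::Array/json::Object converting to json::Value):
#include <utility>
#include <vector>
struct Value {
  Value(std::vector<int> &&V) {} // converting constructor takes an rvalue
};
Value make() {
  std::vector<int> Arr{1, 2, 3};
  return std::move(Arr); // gcc 4.7 needs the explicit move; gcc 5.3 does not
}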
llvm-svn: 336772
The aim of this backend is to output everything TableGen knows about
the record set, similarly to the default -print-records backend. But
where -print-records produces output in TableGen's input syntax
(convenient for humans to read), this backend produces it as
structured JSON data, which is convenient for loading into standard
scripting languages such as Python, in order to extract information
from the data set in an automated way.
The output data contains a JSON representation of the variable
definitions in output 'def' records, and a few pieces of metadata such
as which of those definitions are tagged with the 'field' prefix and
which defs are derived from which classes. It doesn't dump out
absolutely every piece of knowledge it _could_ produce, such as type
information and complicated arithmetic operator nodes in abstract
superclasses; the main aim is to allow consumers of this JSON dump to
essentially act as new backends, and backends don't generally need to
depend on that kind of data.
The new backend is implemented as an EmitJSON() function similar to
all of llvm-tblgen's other EmitFoo functions, except that it lives in
lib/TableGen instead of utils/TableGen on the basis that I'm expecting
to add it to clang-tblgen too in a future patch.
To test it, I've written a Python script that loads the JSON output
and tests properties of it based on comments in the .td source - more
or less like FileCheck, except that the CHECK: lines have Python
expressions after them instead of textual pattern matches.
Reviewers: nhaehnle
Reviewed By: nhaehnle
Subscribers: arichardson, labath, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D46054
llvm-svn: 336771
Summary:
These changes cover the PR#31399.
Now the ffs(x) function is lowered to (x != 0) ? llvm.cttz(x) + 1 : 0
and it corresponds to the following llvm code:
%cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true)
%tobool = icmp eq i32 %v, 0
%.op = add nuw nsw i32 %cnt, 1
%add = select i1 %tobool, i32 0, i32 %.op
and x86 asm code:
bsfl %edi, %ecx
addl $1, %ecx
testl %edi, %edi
movl $0, %eax
cmovnel %ecx, %eax
In this case the 'test' instruction can't be eliminated because
the 'add' instruction modifies EFLAGS, namely the ZF flag, which
is set by the 'bsf' instruction when 'x' is zero.
We now produce the following code:
bsfl %edi, %ecx
movl $-1, %eax
cmovnel %ecx, %eax
addl $1, %eax
Patch by Ivan Kulagin
Reviewers: davide, craig.topper, spatel, RKSimon
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48765
llvm-svn: 336768
These patterns looked for a MOVSS/SD followed by a scalar_to_vector. Or a scalar_to_vector followed by a load.
In both cases we emitted a MOVSS/SD for the MOVSS/SD part, a REG_CLASS for the scalar_to_vector, and a MOVSS/SD for the load.
But we have patterns that do each of those 3 things individually so there's no reason to build large patterns.
Most of the test changes are just reorderings. The one test that had a meaningful change is pr30430.ll and it appears to be a regression. But it's doing -O0 so I think it missed a lot of opportunities and was just getting lucky before.
llvm-svn: 336762
The original code attempted to do this, but the std::abs() call didn't
actually do anything due to implicit type conversions. Fix the type
conversions, and perform the correct check for negative immediates.
This probably has very little practical impact, but it's worth fixing
just to avoid confusion in the future, I think.
Differential Revision: https://reviews.llvm.org/D48907
llvm-svn: 336742
If we don't include Initialization.h,
`LLVMInitializeAggressiveInstCombiner` won't see its `extern "C"` decl.
This causes sadness, name mangling, and linker errors.
Reported on the mailing lists by Vladimir Vissoultchev. Thanks!
llvm-svn: 336736
Some added 20 and some added 15. It's unclear when to use which value and whether they are required at all.
This patch removes them all. If we start finding real world issues we may need to add them back with proper tests.
llvm-svn: 336735
Isel currently emits movss/movsd a lot of the time and an accidental double commute turns it into a blend.
Ideally we'd select blend directly in isel under optspeed and not rely on the double commute to create blend.
llvm-svn: 336731
These ISD nodes try to select the MOVLPS and MOVLPD instructions which are special load only instructions. They load data and merge it into the lower 64-bits of an XMM register. They are logically equivalent to our MOVSD node plus a load.
There was only one place in X86ISelLowering that used MOVLPD and no places that selected MOVLPS. The one place that selected MOVLPD had to choose between it and MOVSD based on whether there was a load. But lowering is too early to tell if the load can really be folded. So in isel we have patterns that use MOVSD for MOVLPD if we can't find a load.
We also had patterns that select the MOVLPD instruction for a MOVSD if we can find a load, but didn't choose the MOVLPD ISD opcode for some reason.
So it seems better to just standardize on MOVSD ISD opcode and manage MOVSD vs MOVLPD instruction with isel patterns.
llvm-svn: 336728
Summary:
I noticed that the .imports files emitted for distributed ThinLTO
backends do not have consistent ordering. This is because StringMap
iteration order is not guaranteed to be deterministic. Since we already
have a std::map with this information, used when emitting the individual
index files (ModuleToSummariesForIndex), use it for the imports files as
well.
This issue is likely causing some unnecessary rebuilds of the ThinLTO
backends in our distributed build system as the imports files are inputs
to those backends.
Reviewers: pcc, steven_wu, mehdi_amini
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D48783
llvm-svn: 336721
I believe isProfitableToFold will stop the load folding that this was intended to overcome.
Given an (xor load, -1), isProfitableToFold will see that the immediate can be folded with the xor using a one byte immediate since it can be sign extended. It doesn't know about NOT, but the one byte immediate check is enough to stop the fold.
llvm-svn: 336712
The llvm_gcov_... routines in compiler-rt are regular C functions that
need to be called using the proper C ABI for the target. The current
code simply calls them using plain LLVM IR types. Since the type are
mostly simple, this happens to just work on certain targets. But other
targets still need special handling; in particular, it may be necessary
to sign- or zero-extended sub-word values to comply with the ABI. This
caused gcov failures on SystemZ in particular.
Now the very same problem was already fixed for the llvm_profile_ calls
here: https://reviews.llvm.org/D21736
This patch uses the same method to fix the llvm_gcov_ calls, in
particular calls to llvm_gcda_start_file, llvm_gcda_emit_function, and
llvm_gcda_emit_arcs.
Reviewed By: marco-c
Differential Revision: https://reviews.llvm.org/D49134
llvm-svn: 336692
When manually finishing the object writer in dsymutil, it's possible
that there are pending labels that haven't been resolved. This results
in an assertion when the assembler tries to fixup a label that doesn't
have an address yet.
Differential revision: https://reviews.llvm.org/D49131
llvm-svn: 336688
This was originally intended with D48893, but as discussed there, we
have to make the folds safe from producing extra poison. This should
give the single binop folds the same capabilities as the existing
folds for 2-binops+shuffle.
LLVM binary opcode review: there are a total of 18 binops. There are 7
commutative binops (add, mul, and, or, xor, fadd, fmul) which we already
fold. We're able to fold 6 more opcodes with this patch (shl, lshr, ashr,
fdiv, udiv, sdiv). There are no folds for srem/urem/frem AFAIK. We don't
bother with sub/fsub with constant operand 1 because those are
canonicalized to add/fadd. 7 + 6 + 3 + 2 = 18.
llvm-svn: 336684
debug compilation dir when compiling assembly files with -g.
Part of PR38050.
Patch by Siddhartha Bagaria!
Differential Revision: https://reviews.llvm.org/D48988
llvm-svn: 336680
This patch adds support for the following instructions:
CLS (Count Leading Sign bits)
CLZ (Count Leading Zeros)
CNT (Count non-zero bits)
CNOT (Logically invert boolean condition in vector)
NOT (Bitwise invert vector)
FABS (Floating-point absolute value)
FNEG (Floating-point negate)
All operations are predicated and unary, e.g.
clz z0.s, p0/m, z1.s
- CLS, CLZ, CNT, CNOT and NOT have variants for 8, 16, 32
and 64 bit elements.
- FABS and FNEG have variants for 16, 32 and 64 bit elements.
llvm-svn: 336677
The case with 2 variables is more complicated than the case where
we eliminate the shuffle entirely because a shuffle with an undef
mask element creates an undef result.
I'm not aware of any current analysis/transform that recognizes
undef propagating to a div/rem/shift, but we have to guard against
the possibility.
llvm-svn: 336668
An explicit untied use is not sufficient to maintain liveness of a
register redefined in a predicated instruction. For example
%1 = COPY %0
...
%1 = A2_paddif %2, %1, 1
could become
$r1 = COPY $r0
...
$r1 = A2_paddif $p0, $r1, 1
and later
$r1 = COPY $r0 ;; this is not really dead!
...
$r1 = A2_paddif $p0, $r0, 1
llvm-svn: 336662
Summary:
Fixed two cases of where PHI nodes need to be updated by lowerswitch.
When lowerswitch finds out that the switch default branch is not
reachable, it removes the old default and replaces it with the most
popular block from the cases, but it forgets to update the PHI
nodes in the default block.
The PHI nodes also need to be updated when the switch is replaced
with a single branch.
Reviewers: hans, reames, arsenm
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D47203
llvm-svn: 336659
Parsing invalid UTF-8 input is now a parse error.
Creating JSON values from invalid UTF-8 now triggers an assertion, and
(in no-assert builds) substitutes the unicode replacement character.
Strings retrieved from json::Value are always valid UTF-8.
llvm-svn: 336657
As suggested by @efriedma on D48975, this patch separates the BuildDiv/Pow2 style optimizations from the rest of the visitSDIV/visitUDIV to make it easier to reuse the combines and will allow us to avoid some rather nasty node recursive combining in visitREM.
llvm-svn: 336656
switch unswitching.
The core problem was the way we handled unswitching trivial exit
edges through the default successor of a switch. For some reason
I thought the right way to do this was to add a block containing
unreachable and point the default successor at this block. In
retrospect, this has an amazing number of problems.
The first issue is the one that this pass has always worked around -- we
have to *detect* such edges and avoid unswitching them again. This
seemed pretty easy really. You just look for an edge to a block
containing unreachable. However, this pattern is woefully unsound. So
many things can break it. The amazing thing is that I found a test case
where *simple-loop-unswitch itself* breaks this! When we do
a *non-trivial* unswitch of a switch we will end up splitting this exit
edge. The result will be a default successor that is an exit and
terminates in ... a perfectly normal branch. So the first test case that
I started trying to fix is added to the nontrivial test cases. This is
a ridiculous example that did just amazing things previously. With just
unswitch, it would create 10+ copies of this stuff stamped out. But if
you combine it *just right* with a bunch of other passes (like
simplify-cfg, loop rotate, and some LICM) you can get it to do this
infinitely. Or at least, I never got it to finish. =[
This, in turn, uncovered another related issue. When we are manipulating
these switches after doing a trivial unswitch we never correctly updated
PHI nodes to reflect our edits. As soon as I started changing how these
edges were managed, it became obvious there were more issues that
I couldn't realistically leave unaddressed, so I wrote more test cases
around PHI updates here and ensured all of that works now.
And this, in turn, required some adjustment to how we collect and manage
the exit successor when it is the default successor. That showed a clear
bug where we failed to include it in our search for the outer-most loop
reached by an unswitched exit edge. This was actually already tested and
the test case didn't work. I (wrongly) thought that was due to SCEV
failing to analyze the switch. In fact, it was just a simple bug in the
code that skipped the default successor. While changing this, I handled
it correctly and have updated the test to reflect that we now get
precise SCEV analysis of trip counts for the outer loop in one of these
cases.
llvm-svn: 336646
Now that rL336250 has landed, we should prefer 2 immediate shifts + a shuffle blend over performing a multiply. Despite the increase in instructions, this is quicker (especially for slow v4i32 multiplies), avoids loads, and avoids constant pool usage. It does mean, however, that we increase register pressure. The code size will go up a little, but by less than what we save on the constant pool data.
This patch also adds support for v16i16 to the BLEND(SHIFT(v,c1),SHIFT(v,c2)) combine, and also prevents blending on pre-SSE41 shifts if it would introduce extra blend masks/constant pool usage.
Differential Revision: https://reviews.llvm.org/D48936
llvm-svn: 336642
We're missing the EVEX equivalents of these patterns and seem to get along fine.
I think we end up with X86vzload for the obvious IR cases that would produce this DAG.
llvm-svn: 336638
This is prep for DWARF v5 range list emission. Emission of a single range list is moved
to a static helper function.
Reviewer: jdevlieghere
Differential Revision: https://reviews.llvm.org/D49098
llvm-svn: 336621
getSafeVectorConstantForBinop() was calling getBinOpIdentity() assuming
that the constant we wanted was operand 1 (RHS). That's wrong, but I
don't think we could expose a bug or even a suboptimal fold from that
because the callers have other guards for any binop that would have
been affected.
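For intuition on why the operand position matters (plain arithmetic, not the LLVM helper itself): subtraction has a right identity but no left identity.
#include <cassert>
int main() {
  int x = 7;
  assert(x - 0 == x);  // 0 is an identity for sub only as operand 1 (RHS)
  assert(0 - x == -x); // on the LHS it negates instead
}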
llvm-svn: 336617
Summary:
This adds support for binary atomic read-modify-write instructions:
add, sub, and, or, xor, and xchg.
This does not yet support translations of some of LLVM IR atomicrmw
instructions (nand, max, min, umax, and umin) that do not have a direct
counterpart in wasm instructions.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D49088
llvm-svn: 336615
Summary:
Support for this option is needed for building the Linux kernel.
This is a very frequently requested feature by kernel developers.
More details : https://lkml.org/lkml/2018/4/4/601
GCC option description for -fdelete-null-pointer-checks:
Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.
-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.
This feature is implemented in LLVM IR in this CL as the function attribute
"null-pointer-is-valid"="true" in IR (Under review at D47894).
The CL updates several passes that assumed null pointer dereferencing is
undefined to not optimize when the "null-pointer-is-valid"="true"
attribute is present.
Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv
Reviewed By: efriedma, george.burgess.iv
Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47895
llvm-svn: 336613
I don't think there's a need to use `const char *`. In most (probably all?)
cases, we need the length of a name later, so discarding the length
leads to wasted effort.
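The point, sketched with std::string_view (llvm::StringRef plays the same role): carrying the length avoids rescanning the string at every use site.
#include <cstring>
#include <string_view>
size_t nameLength(const char *Name) {
  return std::strlen(Name); // length recomputed on every call
}
size_t nameLength(std::string_view Name) {
  return Name.size(); // length carried along with the pointer, O(1)
}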
Differential Revision: https://reviews.llvm.org/D49046
llvm-svn: 336612
In non-zero address spaces, we were reporting that an object at `null`
always occupies zero bytes. This is incorrect in many cases, so just
return `unknown` in those cases for now.
Differential Revision: https://reviews.llvm.org/D48860
llvm-svn: 336611
delegate method (and unit test).
The name 'replace' better captures what the old delegate method did: it
returned materialization responsibility for a set of symbols to the VSO.
The new delegate method delegates responsibility for a set of symbols to a new
MaterializationResponsibility instance. This can be used to split responsibility
between multiple threads, or multiple materialization methods.
llvm-svn: 336603
Added __float128 support for a number of rounding operations:
trunc
rint
nearbyint
round
floor
ceil
Differential Revision: https://reviews.llvm.org/D48415
llvm-svn: 336601
Summary:
- Changed variable/function names to be more consistent
- Improved comments in test files
- Added more tests
- Fixed a few typos
- Misc. cosmetic changes
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D49087
llvm-svn: 336598
Summary:
This patch adds support for the atomicrmw instructions and the strong
cmpxchg instruction to the IRTranslator.
I've left out weak cmpxchg because LangRef.rst isn't entirely clear on what
difference it makes to the backend. As far as I can tell from the code, it
only matters to AtomicExpandPass which is run at the LLVM-IR level.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, javed.absar
Reviewed By: qcolombet
Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D40092
llvm-svn: 336589
Build error on Android; reported by, and fix provided by (thanks), Mauro Rossi <issor.oruam@gmail.com>
Fixes the following building error:
external/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1903:61:
error: comparison of integers of different signs:
'typename iterator_traits<__wrap_iter<MachineBasicBlock **> >::difference_type'
(aka 'int') and 'unsigned int' [-Werror,-Wsign-compare]
BlockWaitcntProcessedSet.end(), &MBB) < Count)) {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~
1 error generated.
Differential Revision: https://reviews.llvm.org/D49089
llvm-svn: 336588
These won't work for the foreseeable future. These aren't allowed
from OpenCL, but IPO optimizations can make them appear.
Also directly set the attributes on functions, regardless
of the linkage rather than cloning functions like before.
llvm-svn: 336587
Summary:
This adds a reverse transform for the instcombine canonicalizations
that were added in D47980, D47981.
As discussed later, that was worse at least for the code size,
and potentially for the performance, too.
https://rise4fun.com/Alive/Zmpl
Reviewers: craig.topper, RKSimon, spatel
Reviewed By: spatel
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D48768
llvm-svn: 336585
Summary:
If the encoding is not specified in CIE augmentation string, then it
should be DW_EH_PE_absptr instead of DW_EH_PE_omit.
Reviewers: ruiu, MaskRay, plotfi, rafauler
Reviewed By: MaskRay
Subscribers: rafauler, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D49000
llvm-svn: 336577
This is similar to what is done for binops. I don't know if this would have helped us catch the bug fixed in r336566 earlier or not, but I figured it couldn't hurt.
llvm-svn: 336576
This patch ports hasDedicatedExits, getUniqueExitBlocks and
getUniqueExitBlock in Loop to LoopBase so that they can be used
from other LoopBase sub-classes.
Reviewers: chandlerc, sanjoy, hfinkel, fhahn
Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D48817
llvm-svn: 336572
As discussed in D49047 / D48987, shift-by-undef produces poison,
so we can't use undef vector elements in that case.
Note that we need to extend this for poison-generating flags,
and there's a proposal to create poison from FMF in D47963,
llvm-svn: 336562
Summary:
To allow bitcode built by an old compiler to pass the current verifier,
BitcodeReader needs to auto infer the correct runtime preemption from
linkage and visibility for GlobalValues.
Since llvm-6.0 bitcode already contains the new field but can be
incorrect in some cases, the attribute needs to be recomputed all the
time in BitcodeReader. This will make all the GVs have dso_local marked
correctly if read from bitcode, and it should still allow the verifier
to catch mistakes in optimization passes.
This should fix PR38009.
Reviewers: sfertile, vsk
Reviewed By: vsk
Subscribers: dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D49039
llvm-svn: 336560
This is almost NFC, but there could be some case where the original
code had undefs in the constants (rather than just the shuffle mask),
and we'll use safe constants rather than undefs now.
The FIXME noted in foldShuffledBinop() is already visible in existing
tests, so correcting that is the next step.
llvm-svn: 336558
These patterns mapped (v2f64 (X86vzmovl (v2f64 (scalar_to_vector FR64:$src)))) to a MOVSD and a zeroing XOR. But the complexity of a pattern for (v2f64 (X86vzmovl (v2f64))) that selects MOVQ is artificially high and hides this MOVSD pattern.
Weirder still, the SSE version of the pattern was explicitly blocked on SSE41, and yet we had copied it to AVX and AVX512.
llvm-svn: 336556
This patch introduces a VPValue in VPBlockBase to represent the condition
bit that is used as a successor selector when a block has multiple successors.
This information wasn't necessary until now, when we are about to introduce
outer loop vectorization support in VPlan code gen.
Reviewers: fhahn, rengolin, mkuper, hfinkel, mssimpso
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D48814
llvm-svn: 336554
This patch adds support for the following instructions:
CNTB CNTH - Determine the number of active elements implied by
CNTW CNTD the named predicate constant, multiplied by an
immediate, e.g.
cnth x0, vl8, #16
CNTP - Count active predicate elements, e.g.
cntp x0, p0, p1.b
counts the number of active elements in p1, predicated
by p0, and stores the result in x0.
llvm-svn: 336552
This patch completes support for shifts, which include:
- LSL - Logical Shift Left
- LSLR - Logical Shift Left, Reversed form
- LSR - Logical Shift Right
- LSRR - Logical Shift Right, Reversed form
- ASR - Arithmetic Shift Right
- ASRR - Arithmetic Shift Right, Reversed form
- ASRD - Arithmetic Shift Right for Divide
In the following variants:
- Predicated shift by immediate - ASR, LSL, LSR, ASRD
e.g.
asr z0.h, p0/m, z0.h, #1
(active lanes of z0 shifted by #1)
- Unpredicated shift by immediate - ASR, LSL*, LSR*
e.g.
asr z0.h, z1.h, #1
(all lanes of z1 shifted by #1, stored in z0)
- Predicated shift by vector - ASR, LSL*, LSR*
e.g.
asr z0.h, p0/m, z0.h, z1.h
(active lanes of z0 shifted by z1, stored in z0)
- Predicated shift by vector, reversed form - ASRR, LSLR, LSRR
e.g.
lslr z0.h, p0/m, z0.h, z1.h
(active lanes of z1 shifted by z0, stored in z0)
- Predicated shift left/right by wide vector - ASR, LSL, LSR
e.g.
lsl z0.h, p0/m, z0.h, z1.d
(active lanes of z0 shifted by wide elements of vector z1)
- Unpredicated shift left/right by wide vector - ASR, LSL, LSR
e.g.
lsl z0.h, z1.h, z2.d
(all lanes of z1 shifted by wide elements of z2, stored in z0)
*Variants added in previous patches.
llvm-svn: 336547
As noted in D48987, there are many different ways for this transform to go wrong.
In particular, the poison potential for shifts means we have to be more careful with those ops.
I added tests to make that behavior visible for all of the different cases that I could find.
This is a partial fix. To make this review easier, I did not make changes for the single binop
pattern (handled in foldSelectShuffleWith1Binop()). I also left out some potential optimizations
noted with TODO comments. I'll follow-up once we're confident that things are correct here.
The goal is to correct all marked FIXME tests to either avoid the shuffle transform or do it safely.
Note that distinguishing when the shuffle mask contains undefs and using getBinOpIdentity() allows
for some improvements to div/rem patterns, so there are wins along with the missed opportunities
and fixes.
Differential Revision: https://reviews.llvm.org/D49047
llvm-svn: 336546
Support for SVE's TBL instruction for programmable table
lookup/permute using vector of element indices, e.g.
tbl z0.d, { z1.d }, z2.d
stores elements from z1, indexed by elements from z2, into z0.
llvm-svn: 336544
Summary:
This patch adds a new "integer" ValueType, and renames Number -> Double.
This allows us to preserve the full precision of int64_t when parsing integers
from the wire, or constructing from an integer.
The API is unchanged, other than giving asInteger() a clearer contract.
In addition, always output doubles with enough precision that parsing will
reconstruct the same double.
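A worked example of the precision issue (the arithmetic stands on its own): a double has a 53-bit significand, so not every int64_t above 2^53 is representable.
#include <cstdint>
#include <cstdio>
int main() {
  int64_t Big = (int64_t(1) << 53) + 1;     // 9007199254740993
  double D = static_cast<double>(Big);      // rounds to 9007199254740992
  std::printf("%lld vs %.0f\n", (long long)Big, D);
}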
Reviewers: simon_tatham
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46209
llvm-svn: 336541
r335553 with the non-trivial unswitching of switches.
The code correctly updated most aspects of the CFG and analyses, but
missed some crucial aspects:
1) When multiple cases have the same successor, we unswitch that
a single time and replace the switch with a direct branch. The CFG
here is correct, but the target of this direct branch may have had
a PHI node with multiple entries in it.
2) When we still have to clone a successor of the switch into an
unswitched copy of the loop, we'll delete potentially multiple edges
entering this successor, not just one.
3) We also have to delete multiple edges entering the successors in the
original loop when they have to be retained.
4) When the "retained successor" *also* occurs as a case successor, we
just assert failed everywhere. This doesn't happen very easily
because it's always valid to simply drop the case -- the retained
successor for switches is always the default successor. However, it
is likely possible through some contrivance of different loop passes,
unrolling, and simplifying for this to occur in practice and
certainly there is nothing "invalid" about the IR so this pass needs
to handle it.
5) In the case of #4, we also will replace these multiple edges with
a direct branch much like in #1 and need to collapse the entries in
any PHI nodes to a single entry.
All of this stems from the delightful fact that the same successor can
show up in multiple parts of the switch terminator, and each of these
are considered a distinct edge for the purpose of PHI nodes (and
iterating the successors and predecessors) but not for unswitching
itself, the dominator tree, or many other things. For the record,
I intensely dislike this "feature" of the IR in large part because of
the complexity it causes in passes like this. We already have a ton of
logic building sets and handling duplicates, and we just had to add
a bunch more.
I've added a complex test case that covers all five of the above failure
modes. I've also added a variation on it where #4 and #5 occur in loop
exit, adding fun where we have an LCSSA PHI node with "multiple entries"
despite having dedicated exits. There were no additional issues found by
this, but it seems a useful corner case to cover with testing.
One thing that working on all of this code has made painfully clear for
me as well is how amazingly inefficient our PHI node representation is
(in terms of the in-memory data structures and the APIs used to update
them). This code has truly marvelous complexity bounds because every
time we remove an entry from a PHI node we do a linear scan to find it
and then a linear update to the data structure to remove it. We could in
theory batch all of the PHI node updates into a single linear walk of
the operands making this much more efficient, but the APIs fight hard
against this and the fact that we have to handle duplicates in the
peculiar manner we do (removing all but one in some cases) makes even
implementing that very tedious and annoying. Anyways, none of this is
new here or specific to loop unswitching. All code in LLVM that updates
PHI node operands suffers from these problems.
llvm-svn: 336536
Summary:
This consists of four main parts:
- a type json::Expr representing JSON values of dynamic kind, which can be
composed, inspected, and modified
- a JSON parser from string -> json::Expr
- a JSON printer from json::Expr -> string, with optional pretty-printing
- a convention for mapping json::Expr <=> native types (fromJSON/toJSON)
Mapping functions are provided for primitives (e.g. int, vector) and the
ObjectMapper helper helps implement fromJSON for struct/object types.
Based on clangd's usage, a couple of places I'd appreciate review attention:
- fromJSON returns only bool. A richer error-signaling mechanism may be useful
to provide useful messages, or let recursive fromJSONs (containers/structs)
do careful error recovery.
- should json::obj always be explicitly written (like json::ary)?
- there's no streaming parse API. I suspect there are some simple wins like
a callback API where the document is a long array, and each element is small.
But this can probably be bolted on easily when we see the need.
Reviewers: bkramer, labath
Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, llvm-commits
Differential Revision: https://reviews.llvm.org/D45753
llvm-svn: 336534
This patch adds support for:
UZP1 Concatenate even elements from two vectors
UZP2 Concatenate odd elements from two vectors
TRN1 Interleave even elements from two vectors
TRN2 Interleave odd elements from two vectors
With variants for both data and predicate vectors, e.g.
uzp1 z0.b, z1.b, z2.b
trn2 p0.s, p1.s, p2.s
llvm-svn: 336531
When emitting the DWARF accelerator tables from dsymutil, we don't have
a DwarfDebug instance and we use a custom class to represent Dwarf
compile units. This patch adds an interface AccelTableWriterInfo to
abstract these from the Dwarf5AccelTableWriter, so we can have a custom
implementation for this in dsymutil.
Differential revision: https://reviews.llvm.org/D49031
llvm-svn: 336529
Summary:
PGOMemOPSize only modifies CFG in a couple of places; thus we can preserve the DominatorTree with little effort.
When optimizing SQLite with -O3, this patch can decrease the number of nodes traversed by DFS by 3.8% and the number of DominatorTreeBase::recalculation calls by 5.7%.
Reviewers: kuhar, davide, dmgreen
Reviewed By: dmgreen
Subscribers: mzolotukhin, vsk, llvm-commits
Differential Revision: https://reviews.llvm.org/D48914
llvm-svn: 336522
This replaces some asserts in lowerV2F64VectorShuffle with the similar asserts from lowerV2I64VectorShuffle, which are more readable. The original asserts mentioned a blend, but there's no guarantee that it is a blend.
Also remove an if that the asserts prove is always true. Mask[0] is always less than 2 and Mask[1] is always at least 2. Therefore (Mask[0] >= 2) + (Mask[1] >= 2) == 1 must always be true.
llvm-svn: 336517
It only existed on the SSE and AVX versions. The AVX512 version didn't have it.
I checked the generated table and this didn't seem necessary to create a match preference.
llvm-svn: 336516
Summary:
{F6603964}
While there are still some discrepancies within that new group,
it is clearly separate from the other shifts.
And Agner's tables agree, these double shifts are clearly
different from the normal shifts/rotates.
I'm guessing `FeatureSlowSHLD` is related.
Indeed, a basic sched pair is *not* the /best/ match.
But keeping it in the WriteShift is /clearly/ not ideal either.
This can and likely will be fine-tuned later.
This is purely mechanical change, it does not change any numbers,
as the [lack of the change of] mca tests show.
Reviewers: craig.topper, RKSimon, andreadb
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49015
llvm-svn: 336515
Pre-AVX512 (which can perform a quick extend/shift/truncate), extending to 2 v8i16 for the PMULLW and then truncating is more performant than relying on the generic PBLENDVB vXi8 shift path and uses a similar amount of mask constant pool data.
Differential Revision: https://reviews.llvm.org/D48963
llvm-svn: 336513
Summary:
Motivation: {F6597954}
This only does the mechanical splitting, does not actually change
any numbers, as the tests added in previous revision show.
Reviewers: craig.topper, RKSimon, courbet
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48998
llvm-svn: 336511
In the 'detectCTLZIdiom' function, support has been added for loops that use the LSHR instruction instead of ASHR.
This supports creating ctlz from the following code.
int lzcnt(int x) {
int count = 0;
while (x > 0) {
count++;
x = x >> 1;
}
return count;
}
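For positive x, the loop above is equivalent to the closed form below (assuming a 32-bit int and the GCC/Clang builtin):
int lzcnt_closed(int x) {
  // For x > 0 the loop runs floor(log2(x)) + 1 times, i.e. 32 - clz(x);
  // for x <= 0 the loop body never executes and the count is 0.
  return x > 0 ? 32 - __builtin_clz((unsigned)x) : 0;
}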
Patch by Olga Moldovanova
Differential Revision: https://reviews.llvm.org/D48354
llvm-svn: 336509
This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma.
I had to add new rounding mode CodeGenOnly instructions to support isel when we can't find a movss to grab the upper bits from to use the b_Int instruction.
Fast-isel tests have been updated to match new clang codegen.
We are currently having trouble folding fneg into the new intrinsic. I'm going to correct that in a follow up patch to keep the size of this one down.
A future patch will also remove the old intrinsics.
llvm-svn: 336506
Splits off isKnownNeverZeroFloat to handle +/- 0 float cases.
This will make it easier to be more aggressive with the integer isKnownNeverZero tests (similar to ValueTracking), use computeKnownBits etc.
Differential Revision: https://reviews.llvm.org/D48969
llvm-svn: 336492
As discussed on PR37989, this patch adds EXTRACT_SUBVECTOR handling to TargetLowering::SimplifyDemandedVectorElts and calls it from DAGCombiner::visitEXTRACT_SUBVECTOR.
Differential Revision: https://reviews.llvm.org/D48825
llvm-svn: 336490
We penalize general SDIV/UDIV costs but don't do the same for SREM/UREM.
This patch makes general vector SREM/UREM x20 as costly as scalar, the same approach as we do for SDIV/UDIV. The patch also extends the existing SDIV/UDIV constant costs for SREM/UREM - at the moment this means the additional cost of a MUL+SUB (see D48975).
Differential Revision: https://reviews.llvm.org/D48980
llvm-svn: 336486
The checking is done deeper inside MachineBasicBlock, but this will
hopefully help to find issues when porting the machine outliner to a
target where Liveness tracking is broken (like ARM).
Differential Revision: https://reviews.llvm.org/D49023
llvm-svn: 336481
after trivial unswitching.
This PR illustrates that a fundamental analysis update was not performed
with the new loop unswitch. This update is also somewhat fundamental to
the core idea of the new loop unswitch -- we actually *update* the CFG
based on the unswitching. In order to do that, we need to update the
loop nest in addition to the domtree.
For some reason, when writing trivial unswitching, I thought that the
loop nest structure cannot be changed by the transformation. But the PR
helps illustrate that it clearly can. I've expanded this to a number of
different test cases that try to cover the different cases of this. When
we unswitch, we move an exit edge of a loop out of the loop. If this
exit edge changes which loop reached by an exit is the innermost loop,
it changes the parent of the loop. Essentially, this transformation may
hoist the inner loop up the nest. I've added the simple logic to handle
this reliably in the trivial unswitching case. This just requires
updating LoopInfo and rebuilding LCSSA on the impacted loops. In the
trivial case, we don't even need to handle dedicated exits because we're
only hoisting the one loop and we just split its preheader.
I've also ported all of these tests to non-trivial unswitching and
verified that the logic already there correctly handles the loop nest
updates necessary.
Differential Revision: https://reviews.llvm.org/D48851
llvm-svn: 336477
appendToVector used the wrong overload of SmallVector::append, resulting
in it appending the same element to a vector `getSize()` times. This did
not cause a problem when initially committed because appendToVector was
only used to append 1-element operands.
This changes appendToVector to use the correct overload of append().
Testing: ./unittests/IR/IRTests --gtest_filter='*DIExpressionTest*'
llvm-svn: 336466
The reference implementation uses a case-insensitive string
comparison for strings of equal length. This will cause the
string "tEo" to compare less than "VUo". However we were using
a case sensitive comparison, which would generate the opposite
outcome. Switch to a case-insensitive comparison. Also, when
one of the strings contains non-ASCII characters, fall back to
a straight memcmp.
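A hedged sketch of the described ordering (names and structure are illustrative, not the actual implementation), assuming the two entries have equal length as described above:
#include <cctype>
#include <cstring>
#include <string_view>
bool gsiNameLess(std::string_view A, std::string_view B) {
  for (size_t I = 0; I < A.size(); ++I) {
    unsigned char CA = A[I], CB = B[I];
    if (CA > 0x7F || CB > 0x7F) // non-ASCII: fall back to a straight memcmp
      return std::memcmp(A.data(), B.data(), A.size()) < 0;
    int LA = std::tolower(CA), LB = std::tolower(CB);
    if (LA != LB)
      return LA < LB; // e.g. makes "tEo" compare less than "VUo"
  }
  return false;
}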
The only way to really test this is with a DIA test. Before this
patch, the test will fail (but succeed if link.exe is used instead
of lld-link). After the patch, it succeeds even with lld-link.
llvm-svn: 336464
It's a bit neater to write T.isIntOrPtrTy() over `T.isIntegerTy() ||
T.isPointerTy()`.
I used Python's re.sub with this regex to update users:
r'([\w.\->()]+)isIntegerTy\(\)\s*\|\|\s*\1isPointerTy\(\)'
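The mechanical rewrite the regex performs, in code form (both predicates are equivalent):
#include "llvm/IR/Type.h"
bool isIntOrPtrOld(const llvm::Type &T) {
  return T.isIntegerTy() || T.isPointerTy(); // the old spelling
}
bool isIntOrPtrNew(const llvm::Type &T) {
  return T.isIntOrPtrTy(); // the new, single-call spelling
}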
llvm-svn: 336462
We've removed the legacy FMA3 intrinsics and are now using llvm.fma and extractelement/insertelement. So we don't need patterns for the nodes that could only be created by the old intrinsics. Those ISD opcodes still exist because we haven't dropped the AVX512 intrinsics yet, but those should go to EVEX instructions.
llvm-svn: 336457
The replaceAllDbgUsesWith utility helps passes preserve debug info when
replacing one value with another.
This improves upon the existing insertReplacementDbgValues API by:
- Updating debug intrinsics in-place, while preventing use-before-def of
the replacement value.
- Falling back to salvageDebugInfo when a replacement can't be made.
- Moving the responsibility for rewriting llvm.dbg.* DIExpressions into
common utility code.
Along with the API change, this teaches replaceAllDbgUsesWith how to
create DIExpressions for three basic integer and pointer conversions:
- The no-op conversion. Applies when the values have the same width, or
have bit-for-bit compatible pointer representations.
- Truncation. Applies when the new value is wider than the old one.
- Zero/sign extension. Applies when the new value is narrower than the
old one.
Testing:
- check-llvm, check-clang, a stage2 `-g -O3` build of clang,
regression/unit testing.
- This resolves a number of mis-sized dbg.value diagnostics from
Debugify.
Differential Revision: https://reviews.llvm.org/D48676
llvm-svn: 336451
The enhanced version will be used in D48893 and related patches
and an almost identical (fadd is different) version is proposed
in D28907, so adding this as a preliminary step.
llvm-svn: 336444
Added statistics for the number of SMLAD instructions created, and
also renamed the pass to -arm-parallel-dsp.
Differential Revision: https://reviews.llvm.org/D48971
llvm-svn: 336441
LoopBlockNumber is a DenseMap<BasicBlock*, int>, so comparing the result of
find() will compare a pair<BasicBlock*, int>. That of course depends
on pointer ordering, which varies from run to run. Reverse iteration
doesn't find this because we're copying to a vector first.
This bug has been there since 2016 but only recently showed up on clang
selfhost with FDO and ThinLTO, which is also why I didn't manage to get
a reasonable test case for this. Add an assert that would've caught
this.
llvm-svn: 336439
D48278
Allow reducing redundant shift masks.
For example:
x1 = x & 0xAB00
x2 = (x >> 8) & 0xAB
can be reduced to:
x1 = x & 0xAB00
x2 = x1 >> 8
It only allows folding when the masks and shift values are constants.
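The equivalence behind the fold, written out in plain C++ (easy to check exhaustively):
unsigned before(unsigned x) { return (x >> 8) & 0xAB; }
unsigned after(unsigned x) {
  unsigned x1 = x & 0xAB00; // already computed for the first mask
  return x1 >> 8;           // the second mask is now redundant
}
// before(x) == after(x) for all x: >> 8 moves bits 8..15 into bits 0..7,
// and 0xAB selects exactly the bits that 0xAB00 selected before the shift.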
llvm-svn: 336426
Previously we only iterated over functions reachable from the set of
external functions in the module. But since some of the passes under
this (notably the always-inliner and coroutine lowerer) are required for
correctness, they need to run over everything.
This just adds an extra layer of iteration over the CallGraph to keep
track of which functions we've already visited and get the next batch of
SCCs.
Should fix PR38029.
llvm-svn: 336419
The intrinsics can be implemented with a f32/f64 llvm.fma intrinsic and an insert into a zero vector.
There are a couple regressions here due to SelectionDAG not being able to pull an fneg through an extract_vector_elt. I'm not super worried about this though as InstCombine should be able to do it before we get to SelectionDAG.
llvm-svn: 336416
This upgrades all of the intrinsics to use fneg instructions to convert fma into fmsub/fnmsub/fnmadd/fmsubadd. And uses a select instruction for masking.
This matches how clang uses the intrinsics these days.
llvm-svn: 336409
Power 9 does not have a hardware instruction for frem but we can call fmodf128.
Differential Revision: https://reviews.llvm.org/D48552
llvm-svn: 336406
It seems like the debugger first computes a symbol's bucket,
and then does a binary search of entries in the bucket using the
symbol's name in order to find it. If the bucket entries are not
in sorted order, this obviously won't work. After this patch a
couple of simple test cases show that we generate an exactly
identical GSI hash stream, which is very nice.
llvm-svn: 336405
Summary: The lib paths are not correctly picked up for OpenEmbedded sysroots
(like arm-oe-linux-gnueabi). I fix this in a follow-up clang patch. But in
order to add the correct libs I need to detect if the vendor is oe. For this
reason, it is first necessary to teach llvm to detect the oe vendor, which is what
this patch does.
Reviewers: chandlerc, compnerd, rengolin, javed.absar
Reviewed By: compnerd
Subscribers: kristof.beyls, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D48861
llvm-svn: 336401
Summary:
If the LOCK prefix is not the first prefix in an instruction, the LLVM
disassembler silently drops it.
The fix is to select a proper instruction with a builtin LOCK prefix if
one exists.
Reviewers: craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49001
llvm-svn: 336400
Summary:
CompileOnDemandLayer.cpp uses functions in these libraries, and builds
with `-DSHARED_LIB=ON` fail without this.
Reviewers: lhames
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D48995
llvm-svn: 336389
a deficiency in TableGen that has been addressed in r336334.
[AArch64][SVE] Asm: Support for predicated FP rounding instructions.
This patch also adds instructions for predicated FP square-root and
reciprocal exponent.
The added instructions are:
- FRINTI Round to integral value (current FPCR rounding mode)
- FRINTX Round to integral value (current FPCR rounding mode, signalling inexact)
- FRINTA Round to integral value (to nearest, with ties away from zero)
- FRINTN Round to integral value (to nearest, with ties to even)
- FRINTZ Round to integral value (toward zero)
- FRINTM Round to integral value (toward minus Infinity)
- FRINTP Round to integral value (toward plus Infinity)
- FSQRT Floating-point square root
- FRECPX Floating-point reciprocal exponent
llvm-svn: 336387
writing them to a buffer and re-loading them.
Also introduces a multithreaded variant of SimpleCompiler
(MultiThreadedSimpleCompiler) for compiling IR concurrently on multiple
threads.
These changes are required to JIT IR on multiple threads correctly.
No test case yet. I will be looking at how to modify LLI / LLJIT to test
multithreaded JIT support soon.
llvm-svn: 336385
Better NaN handling for AMDGCN fmed3.
All operands are checked for NaN now. The checks
were moved before the canonicalization to provide
a better mapping from fclamp. Changed the behaviour
of fmed3(x,y,NaN) to return max(x,y) instead of
min(x,y) in light of this. Updated tests as a result
and added some new cases to cover the fix.
Patch by Alan Baker
llvm-svn: 336375
Avoid using allocateKernArg / AssignFn. We do not want any
of the type splitting properties of normal calling convention
lowering.
For now at least this exists alongside the IR argument lowering
pass. This is necessary to handle struct padding correctly while
some arguments are still skipped by the IR argument lowering
pass.
llvm-svn: 336373
Map the following instructions to the proper float128 lib calls:
pow[i], exp[2], log[2|10], sin, cos, fmin, fmax
Differential Revision: https://reviews.llvm.org/D48544
llvm-svn: 336361
This is an early step towards matching Instructions by attributes other than the opcode. This will be necessary for cast/call alternates which share the same opcode but have different types/intrinsicIDs etc. - which we could vectorize as long as we split them using the alternate mechanism.
Differential Revision: https://reviews.llvm.org/D48945
llvm-svn: 336344
Wait states are not properly being inserted after buffer_store for v_interp instructions.
Add VALU to V_INTERP instructions so that the GCNHazardRecognizer can
check and insert the appropriate wait states when needed.
Differential Revision: https://reviews.llvm.org/D48772
Change-Id: Id540c9b074fc69b5c1de6b182276aa089c74aa64
llvm-svn: 336339
Similar to PR/25526, fast-regalloc introduces spills at the end of basic
blocks. When this occurs in between an ll and sc, the stores can cause the
atomic sequence to fail.
This patch fixes the issue by introducing more pseudos to represent atomic
operations and moving their lowering to after the expansion of postRA
pseudos.
This version addresses issues with the initial implementation and covers
all atomic operations.
This resolves PR/32020.
Thanks to James Cowgill for reporting the issue!
Patch By: Simon Dardis
Differential Revision: https://reviews.llvm.org/D31287
llvm-svn: 336328
This patch also adds instructions for predicated FP square-root and
reciprocal exponent.
The added instructions are:
- FRINTI Round to integral value (current FPCR rounding mode)
- FRINTX Round to integral value (current FPCR rounding mode, signalling inexact)
- FRINTA Round to integral value (to nearest, with ties away from zero)
- FRINTN Round to integral value (to nearest, with ties to even)
- FRINTZ Round to integral value (toward zero)
- FRINTM Round to integral value (toward minus Infinity)
- FRINTP Round to integral value (toward plus Infinity)
- FSQRT Floating-point square root
- FRECPX Floating-point reciprocal exponent
llvm-svn: 336322
We were miscompiling i8 loads, so reject them as unsupported narrow operations
for now.
Differential Revision: https://reviews.llvm.org/D48944
llvm-svn: 336319
Optimize code sequences for integer conversion to fp128 when the integer is a result of:
* float->int
* float->long
* double->int
* double->long
Differential Revision: https://reviews.llvm.org/D48429
llvm-svn: 336316
The alignment specified by a constant for the field
`BumpPointerAllocator::InitialBuffer` exceeded the alignment
guaranteed by `malloc` and `new` on Windows. This change sets
the alignment value to that of `long double`, which is defined
by the platform in use.
It fixes https://bugs.llvm.org/show_bug.cgi?id=37944.
Differential Revision: https://reviews.llvm.org/D48889
llvm-svn: 336311
Non-homogenous aggregates are passed in consecutive GPRs, in GPRs and in memory,
or in memory. This patch ensures that float128 members of non-homogenous
aggregates are passed via VSX registers.
This is done by custom-lowering a bitcast of a build_pair(i64, i64) to float128
into a new PPCISD node, BUILD_FP128.
Differential Revision: https://reviews.llvm.org/D48308
llvm-svn: 336310
Legalize and emit code for the floating-point conversion of a
single-precision value to quad-precision.
Differential Revision: https://reviews.llvm.org/D47569
llvm-svn: 336307
This patch enables parameter passing and return by value for float128 types.
Passing aggregate/union which contain float128 members will be submitted in
subsequent patches.
Differential Revision: https://reviews.llvm.org/D47552
llvm-svn: 336306
We have patterns for SELECTS that stop at v1i1 and we have a pattern for (v1i1 (scalar_to_vector GR8)). The patterns being removed here do the same thing as the two other patterns combined, so there is no need for them.
llvm-svn: 336305
Previously we could only negate the FMADD opcodes. This was mostly OK when we lowered FMA intrinsics directly during lowering. But with the move to llvm.fma from target-specific intrinsics, we can combine (fneg (fma)) to (fmsub) earlier. So if we start with (fneg (fma (fneg))) we would get stuck at (fmsub (fneg)).
This patch fixes that so we can also combine things like (fmsub (fneg)).
llvm-svn: 336304
There's a regression in here due to inability to combine fneg inputs of X86ISD::FMSUB/FNMSUB/FNMADD nodes.
More removals to come, but I wanted to stop and fix the regression that showed up in this first.
llvm-svn: 336303
Legalize and emit code for round & convert float128 to double precision and
single precision.
Differential Revision: https://reviews.llvm.org/D46997
llvm-svn: 336299
The CRC and GINV ASEs require revision 6; Virtualization requires revision 5.
Print a warning when the revision is older than required.
Differential Revision: https://reviews.llvm.org/D48843
llvm-svn: 336296
We want to run the Machine Scheduler instead of the List Scheduler after RA.
Checked with a performance run on a Power 9 machine with SPEC 2006: while
some benchmarks improved and others degraded, the geomean was slightly
improved with the Machine Scheduler.
Differential Revision: https://reviews.llvm.org/D45265
llvm-svn: 336295
We have bailout hacks based on min/max in various places in instcombine
that shouldn't be necessary. The affected test was added for:
D48930
...which is a consequence of the improvement in:
D48584 (https://reviews.llvm.org/rL336172)
I'm assuming the visitTrunc bailout in this patch was added specifically
to avoid a change from SimplifyDemandedBits, so I'm just moving that
below the EvaluateInDifferentType optimization. A narrow min/max is still
a min/max.
llvm-svn: 336293
Support for negative immediates was implemented in
https://reviews.llvm.org/rL298380; however, a few instruction variants were missing.
This change adds negative immediates support and respective tests
for the following:
ADD
ADDS
ADDS.W
AND.W
ANDS
BIC.W
BICS
BICS.W
SUB
SUBS
SUBS.W
Differential Revision: https://reviews.llvm.org/D48649
llvm-svn: 336286
This patch adds both a vector and an immediate form, e.g.
- Vector form:
subr z0.h, p0/m, z0.h, z1.h
subtract active elements of z0 from z1, and store the result in z0.
- Immediate form:
subr z0.h, z0.h, #255
subtract elements of z0 from the immediate (255), and store the result in z0.
llvm-svn: 336274
When creating `phi` instructions to resume at the scalar part of the loop,
copy the DebugLoc from the original phi over to the new one.
Differential Revision: https://reviews.llvm.org/D48769
llvm-svn: 336256
When zext is EvaluatedInDifferentType, InstCombine
drops the dbg.value intrinsic. This patch tries to
preserve the debug info by attaching the zext's old
debug info to the resulting instruction. (Only for integer types for now.)
Differential Revision: https://reviews.llvm.org/D48331
llvm-svn: 336254
We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially, as they would be in the general shift case.
Reapplied with a fixed (extra null tests) version of rL336113 after reversion in rL336189 - extra test case added at rL336247.
llvm-svn: 336250
SVE overloads the AArch64 PSTATE condition flags and introduces
a set of condition code aliases for the assembler. The
details are described in section 2.2 of the architecture
reference manual supplement for SVE.
In short:
SVE alias => AArch64 name
--------------------------
NONE => EQ
ANY => NE
NLAST => HS
LAST => LO
FIRST => MI
NFRST => PL
PMORE => HI
PLAST => LS
TCONT => GE
TSTOP => LT
Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D48869
llvm-svn: 336245
The following code pattern:
mov %rax, %rcx
test %rax, %rax
%rax = ....
je throw_npe
mov (%rcx), %r9
mov (%rax), %r10
gets transformed into the following incorrect code after the implicit null check pass:
mov %rax, %rcx
%rax = ....
faulting_load_op("movl (%rax), %r10", throw_npe)
mov (%rcx), %r9
For the implicit null check pass, if the register that is checked for a null value (i.e., the register used in the 'test' instruction) is written to before the conditional jump, we should avoid doing the optimization.
Patch by Surya Kumari Jangala!
Differential Revision: https://reviews.llvm.org/D48627
Reviewed By: skatkov
llvm-svn: 336241
This patch adds a new token type specifically for (%dx). We will now always create this token when we parse (%dx). After all operands have been parsed, if the mnemonic is in/out we'll morph this token to a regular register token. Otherwise we keep it as the special DX token which won't match any instructions.
This removes the need for passing Mnemonic through the parsing functions. It also seems closer to gas, where, when it's used on the wrong instruction, it just gets diagnosed as an invalid operand rather than a bad memory address.
llvm-svn: 336218
This might make the error message added in r335668 unneeded, but I'm not sure yet.
The check for RIP is technically unnecessary since RIP is in GR64, but that fact is kind of surprising so be explicit.
llvm-svn: 336217
As the test diffs show, the current users of getBinOpIdentity()
are InstCombine and Reassociate. SLP vectorizer is a candidate
for using this functionality too (D28907).
The InstCombine shuffle improvements are part of the planned
enhancements noted in D48830.
InstCombine actually has several other uses of getBinOpIdentity()
via SimplifyUsingDistributiveLaws(), but we don't call that for
any FP ops. Fixing that might be another part of removing the
custom reassociation in InstCombine that is only done for fadd+fmul.
llvm-svn: 336215
r336120 resulted in falling back to SelectionDAG more often due to the G_STORE
MMOs not matching the vreg size. This fixes that by explicitly any-extending the
value.
llvm-svn: 336209
Unpredicated FP-multiply of SVE vector with a vector-element given by
vector[index], for example:
fmul z0.s, z1.s, z2.s[0]
which performs an unpredicated FP-multiply of all 32-bit elements in
'z1' with the first element from 'z2'.
This patch adds restricted register classes for SVE vectors:
ZPR_3b (only z0..z7 are allowed) - for indexed vector of 16/32-bit elements.
ZPR_4b (only z0..z15 are allowed) - for indexed vector of 64-bit elements.
Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D48823
llvm-svn: 336205
This is the last significant change suggested in PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806#c5
...though there are several follow-ups noted in the code comments
in this patch to complete this transform.
It's possible that a binop feeding a select-shuffle has been eliminated
by earlier transforms (or the code was just written like this in the 1st
place), so we'll fail to match the patterns that have 2 binops from:
D48401,
D48678,
D48662,
D48485.
In that case, we can try to materialize identity constants for the remaining
binop to fill in the "ghost" lanes of the vector (where we just want to pass
through the original values of the source operand).
I added comments to ConstantExpr::getBinOpIdentity() to show planned follow-ups.
For now, we only handle the 5 commutative integer binops (add/mul/and/or/xor).
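As a scalar illustration of the identity constants involved (a sketch of the idea, not ConstantExpr::getBinOpIdentity itself), each of the five handled binops has a value that passes the other operand through unchanged:
```
#include <cassert>
#include <cstdint>

int main() {
  uint32_t x = 0xDEADBEEF;
  assert((x + 0u) == x);  // add: identity is 0
  assert((x * 1u) == x);  // mul: identity is 1
  assert((x & ~0u) == x); // and: identity is all-ones
  assert((x | 0u) == x);  // or:  identity is 0
  assert((x ^ 0u) == x);  // xor: identity is 0
}
```
Filling the "ghost" lanes with these constants lets the whole vector go through one binop without changing the passed-through values.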
Differential Revision: https://reviews.llvm.org/D48830
llvm-svn: 336196
With a view to supporting parallel operations that have their results
stored to memory, refactor the consecutive access helper out so it
can support store instructions.
Differential Revision: https://reviews.llvm.org/D48872
llvm-svn: 336195
This adds the following system registers:
- RAS registers,
- MPAM registers,
- Activity monitor registers,
- Trace Extension registers,
- Timing insensitivity of data processing instructions,
- Enhanced Support for Nested Virtualization.
Differential Revision: https://reviews.llvm.org/D48871
llvm-svn: 336193
Summary:
When salvaging a dbg.declare/dbg.addr we should not add
DW_OP_stack_value to the DIExpression
(see test/Transforms/InstCombine/salvage-dbg-declare.ll).
Consider this example
%vla = alloca i32, i64 2
call void @llvm.dbg.declare(metadata i32* %vla, metadata !1, metadata !DIExpression())
Instcombine will turn it into
%vla1 = alloca [2 x i32]
%vla1.sub = getelementptr inbounds [2 x i32], [2 x i32]* %vla1, i64 0, i64 0
call void @llvm.dbg.declare(metadata [2 x i32]* %vla1.sub, metadata !19, metadata !DIExpression())
If the GEP can be eliminated, then the dbg.declare will be salvaged
and we should get
%vla1 = alloca [2 x i32]
call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression())
The problem was that salvageDebugInfo did not recognize dbg.declare
as being indirect (%vla1 points to the value, it does not hold the
value), so we incorrectly got
call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression(DW_OP_stack_value))
I also made sure that llvm::salvageDebugInfo and
DIExpression::prependOpcodes do not add DW_OP_stack_value to
the DIExpression in case no new operands are added to the
DIExpression. That way we avoid unnecessarily turning a
register location expression into an implicit location expression
in some situations (see test11 in test/Transforms/LICM/sinking.ll).
Reviewers: aprantl, vsk
Reviewed By: aprantl, vsk
Subscribers: JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D48837
llvm-svn: 336191
Lower more than 4 arguments using the stack. This patch targets MIPS32.
It supports only functions with arguments of type i32.
Patch by Petar Avramovic.
Differential Revision: https://reviews.llvm.org/D47934
llvm-svn: 336185
unswitching loops.
Original patch trying to address this was sent in D47624, but that
didn't quite handle things correctly. There are two key principles used
to select whether and how to invalidate SCEV-cached information about
loops:
1) We must invalidate any info SCEV has cached before unswitching as we
may change (or destroy) the loop structure by the act of unswitching,
and make it hard to recover everything we want to invalidate within
SCEV.
2) We need to invalidate all of the loops whose CFGs are mutated by the
unswitching. Notably, this isn't the *entire* loop nest, this is
every loop contained by the outermost loop reached by an exit block
relevant to the unswitch.
And we need to do this even when doing trivial unswitching.
I've added more focused tests that directly check that SCEV starts off
with imprecise information and after unswitching (and simplifying
instructions) re-querying SCEV will produce precise information. These
tests also specifically work to check that an *outer* loop's information
becomes precise.
However, the testing here is still a bit imperfect. Crafting test cases
that reliably fail to be analyzed by SCEV before unswitching and succeed
afterward proved ... very, very hard. It took me several hours and
careful work to build these, and I'm not optimistic about necessarily
coming up with more to cover more elaborate possibilities. Fortunately,
the code pattern we are testing here in the pass is really
straightforward and reliable.
Thanks to Max Kazantsev for the initial work on this as well as the
review, and to Hal Finkel for helping me talk through approaches to test
this stuff even if it didn't come to much.
Differential Revision: https://reviews.llvm.org/D47624
llvm-svn: 336183
This patch changes the order of transforms in InstCombineCompares to avoid
performing range-based transforms, which produce complex bit arithmetic,
before simpler things (like folding with constants) are done. See PR37636
for the motivating example.
Differential Revision: https://reviews.llvm.org/D48584
Reviewed By: spatel, lebedev.ri
llvm-svn: 336172
Summary:
This patch is the first in a series of patches related to the [[ http://lists.llvm.org/pipermail/llvm-dev/2018-June/123883.html | RFC - A new dominator tree updater for LLVM ]].
This patch introduces the DomTreeUpdater class, which provides a cleaner API to perform updates on available dominator trees (none, only DomTree, only PostDomTree, both) using different update strategies (eagerly or lazily) to simplify the updating process.
—Prior to the patch—
- Directly calling update functions of DominatorTree updates the data structure eagerly while DeferredDominance does updates lazily.
- DeferredDominance class cannot be used when a PostDominatorTree also needs to be updated.
- Functions receiving DT/DDT need to branch a lot, which is currently necessary.
- Functions using both DomTree and PostDomTree need to call the update function separately on both trees.
- People need to construct an additional DeferredDominance class to use functions only receiving DDT.
—After the patch—
Patch by Chijun Sima <simachijun@gmail.com>.
Reviewers: kuhar, brzycki, dmgreen, grosser, davide
Reviewed By: kuhar, brzycki
Author: NutshellySima
Subscribers: vsk, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D48383
llvm-svn: 336163
Summary:
When we import an alias (which will import a copy of the aliasee), but
aren't going to import the aliasee directly, the distributed backend
index will not contain the aliasee summary. Handle this in the summary
assembly printer by printing "null" as the aliasee.
Reviewers: davidxl, dexonsmith
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits
Differential Revision: https://reviews.llvm.org/D48699
llvm-svn: 336160
The verifier identified several modules that were broken due to incorrect
linkage on declarations. To fix this, CompileOnDemandLayer2::extractFunction
has been updated to change decls to external linkage.
llvm-svn: 336150
Summary:
In the individual index files emitted for distributed ThinLTO backends,
the module path ids are not contiguous. Assign slots to module paths in
order to handle this better and also to get contiguous numbering in the
summary assembly.
Reviewers: davidxl, dexonsmith
Subscribers: mehdi_amini, inglorion, eraman, llvm-commits, steven_wu
Differential Revision: https://reviews.llvm.org/D48698
llvm-svn: 336148
Summary:
Comment on Transforms/LoopVersioning/incorrect-phi.ll: with the change,
SCEV is able to prove that the loop doesn't self-wrap (due to the zext
from i16 to i64), which disables the entire loop versioning pass.
Removed the zext and just used i64.
Reviewers: sanjoy
Subscribers: jlebar, hiraditya, javed.absar, bixia, llvm-commits
Differential Revision: https://reviews.llvm.org/D48409
llvm-svn: 336140
LLVM doesn't guarantee anything about the high bits of a register holding
an i1 value at the IR level, so don't translate LLVM IR i1 values directly
into WebAssembly conditional branch operands. WebAssembly's conditional
branches do demand all 32 bits be valid.
Fixes PR38019.
llvm-svn: 336138
Add registers still missing after r328016 (D43353):
- for bits 15-8 of SI, DI, BP, SP (*H), and R8-R15 (*BH),
- for bits 31-16 of R8-R15 (*WH).
Thanks to Craig Topper for pointing it out.
llvm-svn: 336134
Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization.
%1 = extractelement <2 x i32> %a, i32 0
%2 = extractelement <2 x i32> %a, i32 1
%cond = icmp sgt i32 %1, %2
%3 = extractelement <2 x i32> %a, i32 0
%4 = extractelement <2 x i32> %a, i32 1
%select = select i1 %cond, i32 %3, i32 %4
Author: FarhanaAleen
Reviewed By: ABataev, RKSimon, spatel
Differential Revision: https://reviews.llvm.org/D47608
llvm-svn: 336130
This extends D48485 to allow another pair of binops (add/or) to be combined either
with or without a leading shuffle:
or X, C --> add X, C (when X and C have no common bits set)
Here, we need value tracking to determine that the 'or' can be reversed into an 'add',
and we've added general infrastructure to allow extending to other opcodes or moving
to where other passes could use that functionality.
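As a scalar sketch of the soundness argument (with illustrative values): when X and C share no set bits, the addition produces no carries, so or and add agree.
```
#include <cassert>
#include <cstdint>

int main() {
  uint32_t X = 0xF0F0, C = 0x0101;
  assert((X & C) == 0);       // precondition: no common bits set
  assert((X | C) == (X + C)); // hence the or can become an add
}
```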
Differential Revision: https://reviews.llvm.org/D48662
llvm-svn: 336128
On darwin, all virtual sections have zerofill type, and having a
.zerofill directive in a non-virtual section is not allowed. Instead of
asserting, show a nicer error.
In order to use the equivalent of .zerofill in a non-virtual section,
the usage of .zero or .space is required.
This patch replaces the assert with an error.
Differential Revision: https://reviews.llvm.org/D48517
llvm-svn: 336127
Similarly, don't fold fp128 loads into SSE instructions if the load isn't aligned, unless we're targeting an AMD CPU that doesn't check alignment on arithmetic instructions.
Should fix PR38001
llvm-svn: 336121
We currently don't any-extend vararg parameters before storing them to their stack
locations on Darwin. However, SelectionDAG does this, and so user code exists
in the wild which inadvertently relies on this extension. This can manifest
in cases where the value stored is (int)0, but the actual parameter is interpreted
by va_arg as a pointer, and so not extending to 64 bits causes the callee to
load additional undefined bits.
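A sketch of the user-code pattern in question (the variadic call itself is not strictly conforming C, which is the point): the caller passes (int)0 and the callee reads a 64-bit pointer, so without any-extension the high 32 bits of the slot are undefined.
```
#include <cstdarg>
#include <cstdio>

static void callee(int n, ...) {
  va_list ap;
  va_start(ap, n);
  void *p = va_arg(ap, void *); // caller actually passed (int)0
  va_end(ap);
  std::printf("%p\n", p); // garbage high bits without any-extension
}

int main() { callee(1, 0); }
```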
llvm-svn: 336120
Summary:
This patch is the first in a series of patches related to the [[ http://lists.llvm.org/pipermail/llvm-dev/2018-June/123883.html | RFC - A new dominator tree updater for LLVM ]].
This patch introduces the DomTreeUpdater class, which provides a cleaner API to perform updates on available dominator trees (none, only DomTree, only PostDomTree, both) using different update strategies (eagerly or lazily) to simplify the updating process.
—Prior to the patch—
- Directly calling update functions of DominatorTree updates the data structure eagerly while DeferredDominance does updates lazily.
- DeferredDominance class cannot be used when a PostDominatorTree also needs to be updated.
- Functions receiving DT/DDT need to branch a lot, which is currently necessary.
- Functions using both DomTree and PostDomTree need to call the update function separately on both trees.
- People need to construct an additional DeferredDominance class to use functions only receiving DDT.
—After the patch—
Patch by Chijun Sima <simachijun@gmail.com>.
Reviewers: kuhar, brzycki, dmgreen, grosser, davide
Reviewed By: kuhar, brzycki
Subscribers: vsk, mgorny, llvm-commits
Author: NutshellySima
Differential Revision: https://reviews.llvm.org/D48383
llvm-svn: 336114
We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially, as they would be in the general shift case.
llvm-svn: 336113
Summary:
Replace use of a SmallPtrSet with a SmallSetVector to make the worklist
iteration order deterministic. This is done as the order the blocks are
removed may affect whether or not PHI nodes in successor blocks are
removed.
For example, consider the following case where %bb1 and %bb2 are
removed:
bb1:
br i1 undef, label %bb3, label %bb4
bb2:
br i1 undef, label %bb4, label %bb3
bb3:
pv1 = phi type [ undef, %bb1 ], [ undef, %bb2], [ v0, %other ]
br label %bb4
bb4:
pv2 = phi type [ undef, %bb1 ], [ undef, %bb2 ],
[ pv1, %bb3 ], [ v0, %other ]
If %bb2 is removed before %bb1, the incoming values from %bb1 and %bb2
to pv1 will be removed before %bb1 is removed as a predecessor to %bb4.
The pv1 node will thus be optimized out (to v0) at the time %bb1 is
removed as a predecessor to %bb4, leaving the blocks as following when
the incoming value from %bb1 has been removed:
bb3: ; pv1 optimized out, incoming value to pv2 is v0
br label %bb4
bb4:
pv2 = phi type [ v0, %bb3 ], [ v0, %other ]
The pv2 PHI node will be optimized away by removePredecessor() as all
incoming values are identical.
In case %bb2 is removed after %bb1, pv1 will not be optimized out at the
time %bb2 is removed as a predecessor to %bb4, leaving the blocks as
following when the incoming value from %bb2 to pv2 has been removed:
bb3:
pv1 = phi type [ undef, %bb2 ], [ v0, %other ]
br label %bb4
bb4:
pv2 = phi type [ pv1, %bb3 ], [ v0, %other ]
The pv2 PHI node will thus not be removed in this case, ultimately
leading to the following output
bb3: ; pv1 optimized out, incoming value to pv2 is v0
br label %bb4
bb4:
pv2 = phi type [ v0, %bb3 ], [ v0, %other ]
I have not looked into changing DeleteDeadBlock() so that the redundant
PHI nodes are removed.
I have not added a test case, as I was not able to create a particularly
small and (not messy) reproducer. This is likely due to SmallPtrSet
behaving deterministically when in small mode.
Reviewers: void, dexonsmith, spatel, skatkov, fhahn, bkramer, nhaehnle
Reviewed By: fhahn
Subscribers: mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D48369
llvm-svn: 336109
The X86 asm parser currently has custom parsing logic for .word. Rather than
use this custom logic, we can just use addAliasForDirective to enable the
reuse of AsmParser::parseDirectiveValue.
See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon
(rL332607) backends.
Differential Revision: https://reviews.llvm.org/D47004
This is a fixed reland of rL336100. This should have been caught in
pre-commit testing so apologies for the noise.
llvm-svn: 336104
This code is only used by alternate opcodes so the InstructionsState has already confirmed that every Value is an Instruction, plus we use cast<Instruction> which will assert on failure.
llvm-svn: 336102
The X86 asm parser currently has custom parsing logic for .word. Rather than
use this custom logic, we can just use addAliasForDirective to enable the
reuse of AsmParser::parseDirectiveValue.
See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon
(rL332607) backends.
Differential Revision: https://reviews.llvm.org/D47004
llvm-svn: 336100
This version contains a fix to add values whose state in ParamState changes
to the worklist even if their state in ValueState did not change. To avoid
adding the same value multiple times, mergeInValue returns true if it added
the value to the worklist. The value is added to the worklist depending on
its state in ValueState.
Original message:
For comparisons with parameters, we can use the ParamState lattice
elements which also provide constant range information. This improves
the code for PR33253 further and gets us closer to use
ValueLatticeElement for all values.
Also, as we are using the range information in the solver directly, we
do not need tryToReplaceWithConstantRange afterwards anymore.
Reviewers: dberlin, mssimpso, davide, efriedma
Reviewed By: mssimpso
Differential Revision: https://reviews.llvm.org/D43762
llvm-svn: 336098
We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case.
This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now...
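For reference, the InstCombine rewrite the cost model misses is exact whenever the reciprocal of the constant is exactly representable, as in this scalar sketch:
```
#include <cassert>

int main() {
  for (float x : {1.0f, 3.14159f, -7.5f})
    assert(x / 2.0f == x * 0.5f); // 0.5f is exact, so the fmul is exact
}
```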
llvm-svn: 336095
Increment/decrement vector by multiple of predicate constraint
element count.
The variants added by this patch are:
- INCH, INCW, INCD
and (saturating):
- SQINCH, SQINCW, SQINCD
- UQINCH, UQINCW, UQINCD
- SQDECH, SQDECW, SQDECD
- UQDECH, UQDECW, UQDECD
For example:
incw z0.s, all, mul #4
llvm-svn: 336090
We don't have PMCs to cover many of the Jaguar resources but we can at least monitor the FPU issue pipes which give an indication of the fpu uop count, just not the execution resources.
llvm-svn: 336089
This change fixes the issue that arises when we duplicate the condition from
the predecessor block. If the condition's arguments are not considered alive
across the blocks, fast regalloc gets confused and starts generating reloads
from the slots that have never been spilled to. This change also leads to
smaller code given that, unlike on architectures with condition codes, on
Mips we can branch directly on register value, thus we gain nothing by
duplication.
Patch by Dragan Mladjenovic.
Differential Revision: https://reviews.llvm.org/D48642
llvm-svn: 336084
These patches were previously reverted as they led to
buildbot time-outs caused by large switch statement in
printAliasInstr when using UBSan and O3. The issue has
been addressed with a workaround (r335525).
llvm-svn: 336079
It looks like someone ran clang-format over this entire file which reformatted these switches into a multiline form. But I think the single line form is more useful here.
llvm-svn: 336077
I separated out the rounding and broadcast groups into their own tables because it made the ordering in the main table easier.
Further splitting of the tables might make it possible to directly index using bits from the TSFlags, but it's probably not worth it right now.
llvm-svn: 336075
For the case below, pre-inc prep thinks it's a good candidate to use pre-inc for the bucket, but the 64-bit integer load/store update (pre-inc) instructions on Power require the displacement field to be DS-form (a multiple of 4). Since it can't satisfy the constraint, we have to do some fixups later. As the original load/stores below could already be well-formed, this makes things worse.
unsigned long long result = 0;
unsigned long long foo(char *p, unsigned long long n) {
for (unsigned long long i = 0; i < n; i++) {
unsigned long long x1 = *(unsigned long long *)(p - 50000 + i);
unsigned long long x2 = *(unsigned long long *)(p - 61024 + i);
unsigned long long x3 = *(unsigned long long *)(p - 62048 + i);
unsigned long long x4 = *(unsigned long long *)(p - 64096 + i);
result *= x1 * x2 * x3 * x4;
}
return result;
}
Patch by jedilyn(Kewen Lin).
Differential Revision: https://reviews.llvm.org/D48813
llvm-svn: 336074
Summary:
This patch introduces a new intrinsic,
strip.invariant.group, which was described in the
RFC: Devirtualization v2
Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar
Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47103
Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 336073
section flags in the ELF assembler. This matches the defaults
given in the rest of MC.
Fixes PR37997 where we couldn't assemble our own assembly output
without warnings.
llvm-svn: 336072
findCommutedOpIndices does the pre-checking for whether commuting is possible. There should be no reason left to fail in commuteInstructionImpl. There was a missing pre-check that I've added there and changed the check to an assert in commuteInstructionImpl.
llvm-svn: 336070
I've checked the disassembler tables and this shouldn't be reachable. Which is good, since if it were reachable there should have been a 'return' after the addOperand line.
llvm-svn: 336066
This is a simple implementation of the unroll-and-jam classical loop
optimisation.
The basic idea is that we take an outer loop of the form:
for i..
  ForeBlocks(i)
  for j..
    SubLoopBlocks(i, j)
  AftBlocks(i)
Instead of doing normal inner or outer unrolling, we unroll as follows:
for i... i+=2
  ForeBlocks(i)
  ForeBlocks(i+1)
  for j..
    SubLoopBlocks(i, j)
    SubLoopBlocks(i+1, j)
  AftBlocks(i)
  AftBlocks(i+1)
Remainder Loop
So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now jammed loops.
To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).
Differential Revision: https://reviews.llvm.org/D41953
llvm-svn: 336062
Also move the static folding tables, their search functions and the new class into new cpp/h files.
The unfolding table is effectively static data. It's just a different ordering and a subset of the static folding tables.
By putting it in a separate ManagedStatic we ensure we only have one copy instead of one per X86InstrInfo object. This way also makes it only get initialized when really needed.
llvm-svn: 336056
The class only exists to hold a DenseMap and is only created as a ManagedStatic. It used to expose a single static method that outside code was expected to use.
This patch moves that static function out of the class and moves it implementation into the cpp file. It can now access the ManagedStatic directly by name without the need for the other static method that accessed the ManagedStatic.
llvm-svn: 336055
There are no instructions that use them so they weren't causing any bad matches. But they weren't being diagnosed as "invalid register name" if they were used and would instead trigger some form of invalid operand.
llvm-svn: 336054
I believe all of these are constants so legalizing them should be pretty trivial, but this saves a step.
In one case it looks like we may have been creating a shift amount larger than the shift input itself.
llvm-svn: 336052
This combine runs pretty late and causes us to introduce a shift after the op legalization phase has run. We need to be sure we create the shift with the proper type for the shift amount. If we don't do this, we will still re-legalize the operation properly, but we won't get a chance to fully optimize the truncate that gets inserted.
So this patch adds the necessary truncate when the shift is created. I've also narrowed the subtract that gets created to always be an i32 type. The truncate would have triggered SimplifyDemandedBits to optimize it anyway. But using a more appropriate VT here is free and saves an optimization step.
llvm-svn: 336051
The combine added in commit 329525 overlooked the case where one, but not all, of the divisor elements is -1; -1 is the only power-of-two value for which the sdiv expansion recipe breaks.
Thanks to @zvi for the original patch.
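The scalar identity behind the special case, as a sketch (INT_MIN / -1 is set aside since it is UB anyway): no shift divides by -1, so such lanes must be lowered as a negation instead.
```
#include <cassert>

int main() {
  int x = 42;
  assert(x / -1 == -x); // dividing by -1 is negation, not a shift
}
```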
Differential Revision: https://reviews.llvm.org/D45806
llvm-svn: 336048
Summary:
We could split sizes that are not power of two into smaller sized
G_IMPLICIT_DEF instructions, but this ends up generating
G_MERGE_VALUES instructions which we then have to handle in the instruction
selector. Since G_IMPLICIT_DEF is really a no-op it's easier just to
keep everything that can fit into a register legal.
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48777
llvm-svn: 336041
This adds functionality to the outliner that allows targets to
specify certain functions that should be outlined from by default.
If a target supports default outlining, then it specifies that in
its TargetOptions. In the case that it does, and the user hasn't
specified that they *never* want to outline, the outliner will
be added to the pass pipeline and will run on those default functions.
This is a preliminary patch for turning the outliner on by default
under -Oz for AArch64.
https://reviews.llvm.org/D48776
llvm-svn: 336040
and directory.
Also cleans up all the associated naming to be consistent and removes
the public access to the pass ID which was unused in LLVM.
Also runs clang-format over parts that changed, which generally cleans
up a bunch of formatting.
This is in preparation for doing some internal cleanups to the pass.
Differential Revision: https://reviews.llvm.org/D47352
llvm-svn: 336028
We have a function which switches on the type of a symbol record
to return a hardcoded offset into the record that contains the
symbol name. Not all symbols have names to begin with, and for
those records we return -1 for the offset.
Names are used for various things. Importantly for this particular
bug, a hash of the record name is used as a key for certain hash
tables which are serialized into the PDB file. One of these hash
tables is for the global symbol stream, which is basically a
collection of S_PROCREF symbols which contain the name of the
symbol, a module, and an address offset.
However, for S_PROCREF symbols, the function to return the offset
of the name was returning -1: basically it wasn't implemented.
As a result of this, all global symbols were hashing to the same
value, essentially it was as if every single global symbol's name
was the empty string.
This manifests in the VS debugger when you try to call a function
(global or member, doesn't matter) through the immediate window
and the debugger simply reports an error because it can't find the
function. This makes perfect sense, because it is hashing the name
for real, looking in the global symbol hash table, and there is only
1 entry there which corresponds to a symbol whose name is the empty
string.
Fixing this fixes the MSVC debugger in this case.
llvm-svn: 336024
Summary:
MemoryPhis now have APIs analogous to BB Phis to remove an incoming value/block.
The MemorySSAUpdater uses the above APIs when updating MemorySSA given a set of dead blocks about to be deleted.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D48396
llvm-svn: 336015
Summary:
Retagging allocas before returning from the function might help
detect use-after-return bugs, but it does not work at all in real
life, where instrumented and non-instrumented code is intermixed.
Consider the following code:
F_non_instrumented() {
T x;
F1_instrumented(&x);
...
}
{
F_instrumented();
F_non_instrumented();
}
- F_instrumented call leaves the stack below the current sp tagged
randomly for UAR detection
- F_non_instrumented allocates its own vars on that tagged stack,
not generating any tags, that is the address of x has tag 0, but the
shadow memory still contains tags left behind by F_instrumented on the
previous step
- F1_instrumented verifies &x before using it and traps on tag mismatch,
0 vs whatever tag was set by F_instrumented
Reviewers: eugenis
Subscribers: srhines, llvm-commits
Differential Revision: https://reviews.llvm.org/D48664
llvm-svn: 336011
When instructions with metadata are accidentally leaked, the result is a
difficult-to-find memory corruption in ~LLVMContextImpl that leads to
random crashes.
Patch by Arvīds Kokins!
llvm-svn: 336010
This was introducing unnecessary padding after the explicit
arguments, depending on the alignment of the total struct type.
Also has the side effect of avoiding creating an extra GEP for
the offset from the base kernel argument to the explicit kernel
argument offset.
llvm-svn: 335999
The important part is the creation of the SHLD/SHRD nodes. The compare and the conditional move can use target-independent nodes that can be legalized on their own. This gives some opportunities to trigger the optimizations present in the lowering for those things. And it's just better to limit the number of places we emit target-specific nodes.
The changed test cases still aren't optimal.
Differential Revision: https://reviews.llvm.org/D48619
llvm-svn: 335998
Extends the CFGPrinter and CallPrinter with heat colors based on heuristics or
profiling information. The colors are enabled by default and can be toggled
on/off for CFGPrinter by using the option -cfg-heat-colors for both
-dot-cfg[-only] and -view-cfg[-only]. Similarly, the colors can be toggled
on/off for CallPrinter by using the option -callgraph-heat-colors for both
-dot-callgraph and -view-callgraph.
Patch by Rodrigo Caetano Rocha!
Differential Revision: https://reviews.llvm.org/D40425
llvm-svn: 335996
Previously we used a DenseMap, which is costly to set up: as the size increases it performs multiple full table rehashes that cause the table to be reallocated.
This patch changes the table to a vector of structs. We now walk the reg->mem tables and push new entries in the mem->reg table for each row not marked TB_NO_REVERSE. Once all the table entries have been created, we sort the vector. Then we can use a binary search for lookups.
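A minimal sketch of the scheme (with a hypothetical entry layout, not the actual X86 folding-table types): populate a vector, sort it once, then use std::lower_bound for lookups.
```
#include <algorithm>
#include <cstdint>
#include <vector>

struct UnfoldEntry { uint16_t MemOpcode, RegOpcode; };

static const UnfoldEntry *lookup(const std::vector<UnfoldEntry> &Table,
                                 uint16_t MemOpcode) {
  // Binary search over the sorted table; no hashing or rehashing needed.
  auto I = std::lower_bound(Table.begin(), Table.end(), MemOpcode,
                            [](const UnfoldEntry &E, uint16_t Op) {
                              return E.MemOpcode < Op;
                            });
  if (I == Table.end() || I->MemOpcode != MemOpcode)
    return nullptr;
  return &*I;
}

int main() {
  std::vector<UnfoldEntry> Table = {{30, 3}, {10, 1}, {20, 2}};
  // Sort once after all entries have been pushed.
  std::sort(Table.begin(), Table.end(),
            [](const UnfoldEntry &A, const UnfoldEntry &B) {
              return A.MemOpcode < B.MemOpcode;
            });
  return lookup(Table, 20) ? 0 : 1;
}
```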
Differential Revision: https://reviews.llvm.org/D48585
llvm-svn: 335994
This allows hoisting the portion of code that computes the reciprocal
of a loop-invariant denominator in integer division after codegen
prepare expansion.
Differential Revision: https://reviews.llvm.org/D48604
llvm-svn: 335988
This is a recommit of r335887, which was erroneously committed earlier.
To enable the MachineOutliner by default on AArch64, we need to be able to
disable the MachineOutliner and also provide an option to "always" enable the
outliner.
This adds that capability. It allows the user to still use the old
-enable-machine-outliner option, which defaults to "always". This is building
up to allowing the user to specify "always" versus the target default
outlining behaviour.
https://reviews.llvm.org/D48682
llvm-svn: 335986
Summary:
The .debug_loc section is not supported for the NVPTX target. If there is an
object whose location can change during its lifetime, we do not generate
debug location info for this variable.
Reviewers: echristo
Subscribers: jholewinski, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D48730
llvm-svn: 335976
This was discussed in D48401 as another improvement for:
https://bugs.llvm.org/show_bug.cgi?id=37806
If we have 2 different variable values, then we shuffle (select) those lanes,
shuffle (select) the constants, and then perform the binop. This eliminates a binop.
The new shuffle uses the same shuffle mask as the existing shuffle, so there's no
danger of creating a difficult shuffle.
All of the earlier constraints still apply, but we also check for extra uses to
avoid creating more instructions than we'll remove.
Additionally, we're disallowing the fold for div/rem because that could expose a
UB hole.
Differential Revision: https://reviews.llvm.org/D48678
llvm-svn: 335974
We can have AddRec with loops having many predecessors.
This changes an assert to an early return.
Differential Revision: https://reviews.llvm.org/D48766
llvm-svn: 335965
This uses the same technique as for shifts: split the rotation into 4/2/1-bit partial rotations and select those partials based on the corresponding bits of the rotate amount, making use of PBLENDVB if available. This halves the use of PBLENDVB compared to expanding to shifts, which can be a slow op.
Unfortunately I haven't found a decent way to share much of this code with the shift equivalent.
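A scalar model of the decomposition described above (the patch does this per vector lane, selecting with PBLENDVB): conditionally applying fixed 4-, 2-, and 1-bit rotates, keyed on the bits of the amount, composes to the full rotate.
```
#include <cassert>
#include <cstdint>

static uint8_t rotl8(uint8_t v, unsigned r) {
  r &= 7;
  return (uint8_t)((v << r) | (v >> ((8 - r) & 7)));
}

int main() {
  for (unsigned amt = 0; amt < 8; ++amt) {
    uint8_t v = 0xB2, r = v;
    if (amt & 4) r = rotl8(r, 4); // 4-bit partial rotate
    if (amt & 2) r = rotl8(r, 2); // 2-bit partial rotate
    if (amt & 1) r = rotl8(r, 1); // 1-bit partial rotate
    assert(r == rotl8(v, amt));   // partials compose to the full rotate
  }
}
```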
Differential Revision: https://reviews.llvm.org/D48655
llvm-svn: 335957
Initial patch adding assembly support for Armv8.4-A.
Besides adding v8.4 as a supported architecture to the usual places, this also
adds target features for the different crypto algorithms. Armv8.4-A introduced
new crypto algorithms, made them optional, and allows different combinations:
- none of the v8.4 crypto functions are supported, which is independent of the
implementation of the Armv8.0 SHA1 and SHA2 instructions.
- the v8.4 SHA512 and SHA3 support is implemented, in this case the Armv8.0
SHA1 and SHA2 instructions must also be implemented.
- the v8.4 SM3 and SM4 support is implemented, which is independent of the
implementation of the Armv8.0 SHA1 and SHA2 instructions.
- all of the v8.4 crypto functions are supported, in this case the Armv8.0 SHA1
and SHA2 instructions must also be implemented.
The v8.4 crypto instructions are added to AArch64 only, and not AArch32,
and are made optional extensions to Armv8.2-A.
The user-facing Clang options will map on these new target features, their
naming will be compatible with GCC and added in follow-up patches.
The Armv8.4-A instruction sets can be downloaded here:
https://developer.arm.com/products/architecture/a-profile/exploration-tools
Differential Revision: https://reviews.llvm.org/D48625
llvm-svn: 335953
Summary:
An alternative to D48597.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=37936 | PR37936 ]].
The problem is as follows:
1. `indvars` marks `%dec` as `NUW`.
2. `loop-instsimplify` runs `instsimplify`, which constant-folds `%dec` to -1 (D47908)
3. `loop-reduce` tries to do some further modification, but crashes
with a type assertion in cast, because `%dec` is no longer an `Instruction`.
If the runline is split into two, i.e. you first run `-indvars -loop-instsimplify`,
store that into a file, and then run `-loop-reduce`, there is no crash.
So it looks like the problem is due to `-loop-instsimplify` not discarding SCEV.
But in this case we can just not crash if it's not an `Instruction`.
This is just a local fix, unlike D48597, so there may very well be other problems.
Reviewers: mkazantsev, uabelho, sanjoy, silviu.baranga, wmi
Reviewed By: mkazantsev
Subscribers: evstupac, javed.absar, spatel, llvm-commits
Differential Revision: https://reviews.llvm.org/D48599
llvm-svn: 335950
Summary:
We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
registers, ISel patterns, etc. This should help reduce compile time
since each sub-target now only has to consider information that
is specific to itself. This will also help prevent the R600
sub-target from slowing down new features for GCN, like disassembler
support, GlobalISel, etc.
Reviewers: arsenm, nhaehnle, jvesely
Reviewed By: arsenm
Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46365
llvm-svn: 335942
We don't ever check these again (unless you're using
-fno-integrated-as), so make sure the extracted bits are well-defined.
I don't think it's possible to trigger any of the assertions on trunk,
but it's difficult to prove. (The first one depends on DAGCombine to
minimize the number of set bits in AND masks; I think the others are
mathematically impossible to hit.)
llvm-svn: 335931
This is a recommit of r335879.
We shouldn't add the outliner when compiling at -O0 even if
-enable-machine-outliner is passed in. This makes sure that we
don't add it in this case.
This also removes -O0 from the outliner DWARF test.
llvm-svn: 335930
This change adds experimental support for SHT_RELR sections, proposed
here: https://groups.google.com/forum/#!topic/generic-abi/bX460iggiKg
Definitions for the new ELF section type and dynamic array tags, as well
as the encoding used in the new section are all under discussion and are
subject to change. Use with caution!
Author: rahulchaudhry
Differential Revision: https://reviews.llvm.org/D47919
llvm-svn: 335922
There's no way to expose this difference currently,
but we should use the updated variable because the
original opcodes can go stale if we transform into
something new.
llvm-svn: 335920
This fixes a regression since SVN r334523, where the object files
built targeting MinGW were rejected by GNU binutils tools. Prior to
that commit, we only put constants in comdat for MSVC configurations.
Differential Revision: https://reviews.llvm.org/D48567
llvm-svn: 335918
Summary:
The InlinerFunctionImportStats will collect and dump stats regarding how
many functions inlined into the module were imported by ThinLTO.
Reviewers: wmi, dexonsmith
Subscribers: mehdi_amini, inglorion, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D48729
llvm-svn: 335914
Mostly just adding checks for Thumb2 instructions which correspond to
ARM instructions which already had diagnostics. While I'm here, also fix
ARM-mode strd to check the input registers correctly.
Differential Revision: https://reviews.llvm.org/D48610
llvm-svn: 335909
When rewriting an alloca partition, copy the DebugLoc from the
old alloca over to the new one.
Differential Revision: https://reviews.llvm.org/D48640
llvm-svn: 335904
FileOutputBuffer creates a temp file and on commit atomically
renames the temp file to the destination file. Sometimes we
want to modify an existing file in place, but still have the
atomicity guarantee. To do this we can initialize the contents
of the temp file from the destination file (if it exists), that
way the resulting FileOutputBuffer can have only selective
bytes modified. Committing will then atomically replace the
destination file as desired.
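A generic sketch of the write-temp-then-rename idea (plain C++ standing in for the FileOutputBuffer API, with hypothetical file names): the destination is never observed half-written.
```
#include <cstdio>
#include <fstream>

int main() {
  {
    std::ofstream Tmp("out.txt.tmp", std::ios::binary);
    Tmp << "new contents"; // in a real use, seed from the old file
  }                        // and modify only selected bytes
  // rename() atomically replaces the destination on POSIX filesystems.
  return std::rename("out.txt.tmp", "out.txt");
}
```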
llvm-svn: 335902
This is a follow-up to r335753. At the time I forgot about isProfitableToFold, which makes this pretty easy.
Differential Revision: https://reviews.llvm.org/D48706
llvm-svn: 335895
Reverting because this is causing failures in the LLDB test suite on
GreenDragon.
LLVM ERROR: unsupported relocation with subtraction expression, symbol
'__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction
expression
llvm-svn: 335894
This is an enhancement to D48401 that was discussed in:
https://bugs.llvm.org/show_bug.cgi?id=37806
We can convert a shift-left-by-constant into a multiply (we canonicalize IR in the other
direction because that's generally better of course). This allows us to remove the shuffle
as we do in the regular opcodes-are-the-same cases.
This requires a small hack to make sure we don't introduce any extra poison:
https://rise4fun.com/Alive/ZGv
Other examples of opcodes where this would work are add+sub and fadd+fsub, but we already
canonicalize those subs into adds, so there's nothing to do for those cases AFAICT. There
are planned enhancements for opcode transforms such as or -> add.
Note that there's a different fold needed if we've already managed to simplify away a binop
as seen in the test based on PR37806, but we manage to get that one case here because this
fold is positioned above the demanded elements fold currently.
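The underlying identity in scalar form (a sketch of the conversion, not the InstCombine code itself): a left shift by a constant is a multiply by the corresponding power of two.
```
#include <cassert>
#include <cstdint>

int main() {
  for (uint32_t x : {1u, 7u, 0xABCDu})
    assert((x << 3) == x * 8u); // shl X, 3 <=> mul X, 8
}
```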
Differential Revision: https://reviews.llvm.org/D48485
llvm-svn: 335888
Targets should be able to define whether or not they support the outliner
without the outliner being added to the pass pipeline. Before this, the
outliner pass would be added, and ask the target whether or not it supports the
outliner.
After this, it's possible to query the target in TargetPassConfig, before the
outliner pass is created. This ensures that passing -enable-machine-outliner
will not modify the pass pipeline of any target that does not support it.
https://reviews.llvm.org/D48683
llvm-svn: 335887
We could get away with it for constant folded cases, but not for rL335719.
Thanks to Krzysztof Parzyszek for noticing.
Reapply original commit rL335821 which was reverted at rL335871 due to a WebAssembly bug that was fixed at rL335884.
llvm-svn: 335886
This reverts commit 9c7c10e4073a0bc6a759ce5cd33afbac74930091.
It relies on r335872 since that introduces the machine outliner
flags test. I meant to commit D48683 in that commit, but got mixed
up and committed D48682 instead. So, I'm reverting this and
r335872, since D48682 hasn't made it through review yet.
llvm-svn: 335882
We shouldn't add the outliner when compiling at -O0 even if
-enable-machine-outliner is passed in. This makes sure that we
don't add it in this case.
This also updates machine-outliner-flags to reflect the change
and improves the comment describing what that test does.
llvm-svn: 335879
Add NoTrapAfterNoreturn target option which skips emission of traps
behind noreturn calls even if TrapUnreachable is enabled.
Enable the feature on Mach-O to save code size; Comments suggest it is
not possible to enable it for the other users of TrapUnreachable.
rdar://41530228
Differential Revision: https://reviews.llvm.org/D48674
llvm-svn: 335877
To enable the MachineOutliner by default on AArch64, we need to be able to
disable the MachineOutliner and also provide an option to "always" enable the
outliner.
This adds that capability. It allows the user to still use the old
-enable-machine-outliner option, which defaults to "always". This is building
up to allowing the user to specify "always" versus the target-default
outlining behaviour.
llvm-svn: 335872
This allows hoisting of common code, for instance if the denominator
is loop invariant. The current change is expansion only; adding licm to
the target pass list is going to be a separate patch. With this patch,
changes to codegen are minor, as the expansion is similar to that on the
DAG. The DAG expansion must still remain for R600.
Differential Revision: https://reviews.llvm.org/D48586
llvm-svn: 335868
This pass is being added in order to make the information available to BasicAA,
which can't cache this information itself, but the information may possibly
be useful for other passes as well.
Incorporates code based on Daniel Berlin's implementation of Tarjan's algorithm.
Differential Revision: https://reviews.llvm.org/D47893
llvm-svn: 335857
Armv6 introduced instructions to perform 32-bit SIMD operations. The purpose of
this pass is to do some straightforward IR pattern matching to create ACLE DSP
intrinsics, which map on these 32-bit SIMD operations.
Currently, only the SMLAD instruction gets recognised. This instruction
performs two multiplications with 16-bit operands, and stores the result in an
accumulator. We will follow this up with patches to recognise SMLAD in more
cases, and also to generate other DSP instructions (like e.g. SADD16).
Patch by: Sam Parker and Sjoerd Meijer
Differential Revision: https://reviews.llvm.org/D48128
llvm-svn: 335850
We have too many mechanisms for tracking the various offsets
used for kernel arguments, so remove one. There's still a lot of
confusion with these because there are two different "implicit"
argument areas located at the beginning and end of the kernarg
segment.
Additionally, the offset was determined based on the memory
size of the split element types. This would break in a future
commit where v3i32 is decomposed into separate i32 pieces.
llvm-svn: 335830
In principle nothing should stop these from working, but
work is necessary to create an ABI for dealing with the stack
related registers.
llvm-svn: 335829
Just fix the crash for now by not doing the optimization since
figuring out how to properly convert the bits for an arbitrary
struct is a pain.
Also fix a crash when there is only an empty struct argument.
llvm-svn: 335827
These are all benign races and only visible in !NDEBUG. tsan complains
about it, but a simple atomic bool is sufficient to make it happy.
llvm-svn: 335823
SCCP does not change the CFG, so we can mark it as preserved.
Reviewers: dberlin, efriedma, davide
Reviewed By: davide
Differential Revision: https://reviews.llvm.org/D47149
llvm-svn: 335820
If a trunc has a user in a block which is not reachable from entry,
we can safely perform trunc elimination as if this user didn't exist.
llvm-svn: 335816
Remove unused ByteStreamer argument from function emitDebugLocValue.
Patch by Nikola Prica.
Differential Revision: https://reviews.llvm.org/D48590
llvm-svn: 335811
This is more efficient for the isel table generator since we can use CheckChildInteger instead of MoveChild, CheckPredicate, MoveParent. This reduced the table size by 1-2K.
I wish there was a way to share the values with X86BaseInfo.h and still use a PatFrag like this. These numbers are fixed by the X86 intrinsic spec going back many years and we should never need to change them. So we shouldn't waste table bytes to support sharing.
llvm-svn: 335806
BMI2 added new shift by register instructions that have the ability to fold a load.
Normally without doing anything special isel would prefer folding a load over folding an immediate because the load folding pattern has higher "complexity". This would require an instruction to move the immediate into a register. We would rather fold the immediate instead and have a separate instruction for the load.
We used to enforce this priority by artificially lowering the complexity of the load pattern.
This patch changes this to instead reject the load fold in isProfitableToFoldLoad if there is an immediate. This is more consistent with other binops and feels less hacky.
llvm-svn: 335804
=== Generating the CG Profile ===
The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions. For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight.
After scanning all the functions, it generates an appending module flag containing the data. The format looks like:
```
!llvm.module.flags = !{!0}
!0 = !{i32 5, !"CG Profile", !1}
!1 = !{!2, !3, !4} ; List of edges
!2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32
!3 = !{void (i1)* @freq, void ()* @a, i64 11}
!4 = !{void (i1)* @freq, void ()* @b, i64 20}
```
Differential Revision: https://reviews.llvm.org/D48105
llvm-svn: 335794
The code to emit the pieces of the MSF file were actually in
PDBFileBuilder. Move this to MSFBuilder so that we can
theoretically emit an MSF without having a PDB file.
llvm-svn: 335789
If we turn X86ISD::AND into ISD::AND, we delete N. But we were continuing onto the next block of code even though N no longer existed.
Just happened to notice it. I assume asan didn't notice it because we explicitly unpoison deleted nodes and give them a DELETE_NODE opcode.
llvm-svn: 335787
Summary:
In r333455 we added a peephole to fix the corner cases that result
from separating base + offset lowering of global address. The
peephole didn't handle some of the cases because it only has a basic
block view instead of a function level view.
This patch replaces that logic with a machine function pass. In
addition to handling the original cases, it handles uses of the global
address across blocks in the function and folds an offset from LW/SW
instructions. This pass won't run for OptNone compilation, so there
will be a negative impact overall vs the old approach at O0.
Reviewers: asb, apazos, mgrang
Reviewed By: asb
Subscribers: MartinMosbeck, brucehoult, the_o, rogfer01, mgorny, rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, llvm-commits, edward-jones
Differential Revision: https://reviews.llvm.org/D47857
llvm-svn: 335786
The %eiz/%riz are dummy registers that force the encoder to emit a SIB byte when it normally wouldn't. By emitting them in the disassembly output we ensure that assembling the disassembler output would also produce a SIB byte.
This should match the behavior of objdump from binutils.
llvm-svn: 335768
Now that we have the ability to legalize based on MMOs, add support for
legalizing based on AtomicOrdering, and use it to correct the legalization
of the atomic instructions.
Also extend all() to be a variadic template as this ruleset now requires
3 and 4 argument versions.
llvm-svn: 335767
As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc
can produce -0.0 where the original code does not:
#include <stdio.h>
int main(int argc) {
  float x;
  x = -0.8 * argc;
  printf("%f\n", (float)((int)x));
  return 0;
}
$ clang -O0 -mavx fp.c ; ./a.out
0.000000
$ clang -O1 -mavx fp.c ; ./a.out
-0.000000
Ideally, we'd use IR/node flags to predicate the transform, but the IR parser
doesn't currently allow fast-math-flags on the cast instructions. So for now,
just use the function attribute that corresponds to clang's "-fno-signed-zeros"
option.
Differential Revision: https://reviews.llvm.org/D48085
llvm-svn: 335761
Summary:
Rather than just print the GUID, when it is available in the index,
print the global name as well in the function import thin link debug
messages. Names will be available when the combined index is being
built by the same process, e.g. a linker or "llvm-lto2 run".
Reviewers: davidxl
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits
Differential Revision: https://reviews.llvm.org/D48612
llvm-svn: 335760
It isn't safe to outline sequences of instructions where x16/x17/nzcv live
across the sequence.
This teaches the outliner to check whether or not a specific candidate has
x16/x17/nzcv live across it and discard the candidate in the case that this
is true.
https://bugs.llvm.org/show_bug.cgi?id=37573
https://reviews.llvm.org/D47655
llvm-svn: 335758
If we are just modifying a single bit at a variable bit position we can use the BT* instructions to make the change instead of shifting a 1 (or rotating a -1) and doing a binop. These instructions also ignore the upper bits of their index input so we can also remove an and if one is present on the index.
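For illustration, these are the kinds of source patterns that can now select to bts/btr/btc (a hedged C++ sketch, not code from the patch):
```cpp
#include <cstdint>

// Variable-position single-bit updates. The 'n & 31' mask becomes
// redundant with BT*, which already ignores the upper bits of the index.
uint32_t set_bit(uint32_t x, uint32_t n)    { return x | (1u << (n & 31)); }  // bts
uint32_t clear_bit(uint32_t x, uint32_t n)  { return x & ~(1u << (n & 31)); } // btr
uint32_t toggle_bit(uint32_t x, uint32_t n) { return x ^ (1u << (n & 31)); }  // btc
```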
Fixes PR37938.
llvm-svn: 335754
Summary:
AliasSet::print uses `I->printAsOperand` to print UnknownInstructions. The problem is that not all UnknownInstructions have names (e.g. call instructions). When such instructions are printed, they appear as `<badref>` in AliasSets, which is very confusing, as the values are perfectly valid.
This patch fixes that by printing UnknownInstructions without a name using `print` instead of `printAsOperand`.
Reviewers: asbirlea, chandlerc, sanjoy, grosser
Reviewed By: asbirlea
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48609
llvm-svn: 335751
I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking built in. When we reach that goal, we should have no intrinsics named "avx512.mask".
llvm-svn: 335744
If the source of an rcp instruction is the result of any conversion from
an integer, convert it into an rcp_iflag instruction. No FP exception
can ever happen, except division by zero, if a single precision rcp
argument is a representation of an integral number.
Differential Revision: https://reviews.llvm.org/D48569
llvm-svn: 335742
This patch adds a custom trunc store lowering for v4i8 vector types.
Since there is no v.4b register, the v4i8 is promoted to v4i16 (v.4h)
and the default action for v4i8 is to extract each element and issue 4
byte stores.
A better strategy would be to extend the promoted v4i16 to v8i16
(with undef elements) and extract and store the word lane which
represents the v4i8 subvectors. The construction:
define void @foo(<4 x i16> %x, i8* nocapture %p) {
  %0 = trunc <4 x i16> %x to <4 x i8>
  %1 = bitcast i8* %p to <4 x i8>*
  store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2
  ret void
}
Can be optimized from:
umov w8, v0.h[3]
umov w9, v0.h[2]
umov w10, v0.h[1]
umov w11, v0.h[0]
strb w8, [x0, #3]
strb w9, [x0, #2]
strb w10, [x0, #1]
strb w11, [x0]
ret
To:
xtn v0.8b, v0.8h
str s0, [x0]
ret
The patch also adjusts the memory cost for autovectorization, so the C
code:
void foo (const int *src, int width, unsigned char *dst)
{
  for (int i = 0; i < width; i++)
    *dst++ = *src++;
}
can be vectorized to:
.LBB0_4: // %vector.body
// =>This Inner Loop Header: Depth=1
ldr q0, [x0], #16
subs x12, x12, #4 // =4
xtn v0.4h, v0.4s
xtn v0.8b, v0.8h
st1 { v0.s }[0], [x2], #4
b.ne .LBB0_4
Instead of byte operations.
llvm-svn: 335735
This patch adds support for the q versions of the dup
(load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for
example.
Currently, non-q versions of the dup intrinsics are implemented
in clang by generating IR that first loads the elements of the
structure into the first lane with the to-single-lane
intrinsics, and then propagates them to the other lanes. There are at
least two problems with this approach. First, there are no
double-spaced to-single-lane byte-element instructions. For
example, there is no such instruction as 'vld2.8 { d0[0], d2[0]
}, [r0]'. That means we cannot rely on the to-single-lane
intrinsics and instructions to implement the q versions of the
dup intrinsics. Note that to-all-lanes instructions do support
all sizes of data items, including bytes.
The second problem with the current approach is that we need a
separate vdup instruction to propagate the structure to each
lane. So for vld4q_dup_f16() we would need four vdup instructions
in addition to the initial vld instruction.
This patch introduces dup LLVM intrinsics and reworks handling of
the currently supported (non-q) NEON dup intrinsics to expand
them into those LLVM intrinsics, thus eliminating the need for
using to-single-lane intrinsics and instructions.
Additionally, this patch adds support for u64 and s64 dup NEON
intrinsics. These are marked as AArch64-only in the ARM NEON
Reference, but there seems to be no reason not to support them
in AArch32 mode. Please correct me if that is wrong.
That's what we generate with this patch applied:
vld2q_dup_f16:
vld2.16 {d0[], d2[]}, [r0]
vld2.16 {d1[], d3[]}, [r0]
vld3q_dup_f16:
vld3.16 {d0[], d2[], d4[]}, [r0]
vld3.16 {d1[], d3[], d5[]}, [r0]
vld4q_dup_f16:
vld4.16 {d0[], d2[], d4[], d6[]}, [r0]
vld4.16 {d1[], d3[], d5[], d7[]}, [r0]
Differential Revision: https://reviews.llvm.org/D48439
llvm-svn: 335733
This prevents InstCombine from creating mis-sized dbg.values when
replacing a sequence of casts with a simpler cast. For example, in:
(fptrunc (floor (fpext X))) -> (floorf X)
We no longer emit dbg.value(X) (with a 32-bit float operand) to describe
(fpext X) (which is a 64-bit float).
This was diagnosed by the debugify check added in r335682.
llvm-svn: 335696
Nothing was using this relationship. By splitting them we no longer need to worry about register or memory entries being empty in a group.
The memory folding tables in X86InstrInfo.cpp can be used to access this relationship if needed.
llvm-svn: 335694
Summary:
When recording the uses to rewrite after cloning a loop, we need to
check if the use is not dominated by the original def. The initial
assumption was that the cloned basic block will introduce a new path and
thus the original def will only dominate the use if they are in the same
BB, but as the reproducer from PR37745 shows it's not always the case.
This fixes PR37745.
Reviewers: haicheng, Ka-Ka
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D48111
llvm-svn: 335675
LLJIT is a prefabricated ORC based JIT class that is meant to be the go-to
replacement for MCJIT. Unlike OrcMCJITReplacement (which will continue to be
supported) it is not API or bug-for-bug compatible, but targets the same
use cases: Simple, non-lazy compilation and execution of LLVM IR.
LLLazyJIT extends LLJIT with support for function-at-a-time lazy compilation,
similar to what was provided by LLVM's original (now long deprecated) JIT APIs.
This commit also contains some simple utility classes (CtorDtorRunner2,
LocalCXXRuntimeOverrides2, JITTargetMachineBuilder) to support LLJIT and
LLLazyJIT.
Both of these classes are works in progress. Feedback from JIT clients is very
welcome!
llvm-svn: 335670
This addresses post-commit feedback about the name 'skipDebugInfo' being
misleading. This name could be interpreted as meaning 'a function that
skips instructions with debug locations'.
The new name, 'skipDebugIntrinsics', makes it clear that this function
only skips debug info intrinsics.
Thanks to Adrian Prantl for pointing this out!
llvm-svn: 335667
AsynchronousSymbolQuery::canStillFail checks the value of the callback to
prevent sending it redundant error notifications, so we need to reset it after
running it.
llvm-svn: 335664
Right now, when we use RIP-relative instructions in 32-bit mode, we'll just
assert and crash.
This adds an error message which tells the user that they can't do that in
32-bit mode, so that we don't crash (and also can see the issue outside of
assert builds).
llvm-svn: 335658
This replaces most argument uses with loads, but for
now not all.
The code in SelectionDAG for calling convention lowering
is actively harmful for amdgpu_kernel. It attempts to
split the argument types into register legal types, which
results in low quality code for arbitrary types. Since
all kernel arguments are passed in memory, we just want the
raw types.
I've tried a couple of methods of mitigating this in SelectionDAG,
but it's easier to just bypass this problem altogether. It's
possible to hack around the problem in the initial lowering,
but the real problem is the DAG then expects to be able to use
CopyToReg/CopyFromReg for uses of the arguments outside the block.
Exposing the argument loads in the IR also has the advantage
that the LoadStoreVectorizer can merge them.
I'm not sure what the best approach to dealing with the IR
argument list is. The patch as-is just leaves the IR arguments
in place, so all the existing code will still compute the same
kernarg size and pointlessly lowers the arguments.
Arguably the frontend should emit kernels with an empty argument
list in the first place. Alternatively a dummy array could be
inserted as a single argument just to reserve space.
This does have some disadvantages. Local pointer kernel arguments can
no longer have AssertZext placed on them as the equivalent !range
metadata is not valid on pointer typed loads. This is mostly bad
for SI which needs to know about the known bits in order to use the
DS instruction offset, so in this case this is not done.
More importantly, this skips noalias arguments since this pass
does not yet convert this to the equivalent !alias.scope and !noalias
metadata. Producing this metadata correctly seems to be tricky,
although this logically is the same as inlining into a function which
doesn't exist. Additionally, exposing these loads to the vectorizer
may result in degraded aliasing information if a pointer load is
merged with another argument load.
I'm also not entirely sure this is preserving the current clover
ABI, although I would greatly prefer if it would stop widening
arguments and match the HSA ABI. As-is I think it is extending
< 4-byte arguments to 4-bytes but doesn't align them to 4-bytes.
llvm-svn: 335650
Not sure why this logic seems to be repeated in 2 different places,
one called by the other.
On AMDGPU addrspace(3) globals start allocating at 0, so these
checks will be incorrect (not that real code actually tries
to compare these addresses).
llvm-svn: 335649
Summary: This is trying to add support for r334428.
Reviewers: sanjoy
Subscribers: jlebar, hiraditya, bixia, llvm-commits
Differential Revision: https://reviews.llvm.org/D48399
llvm-svn: 335646
I'm not sure why the code here is skipping calls since
TTI does try to do something for general calls, but it
at least should allow intrinsics.
Skip intrinsics that should not be omitted as calls, which
is by far the most common case on AMDGPU.
llvm-svn: 335645
salvageDebugInfo() performs a check that allows it to exit early without
doing a DenseMap lookup. It's a bit neater and marginally more useful to
sink this early exit into the findDbg{Addr,Users,Values} helpers.
llvm-svn: 335642
Add the generic processor for Hexagon so that it can be used
with 3rd party programs that create a back-end with the
"generic" CPU. This patch also enables the JIT for Hexagon.
Differential Revision: https://reviews.llvm.org/D48571
llvm-svn: 335641
Similar to other patches in this series:
https://reviews.llvm.org/rL335512
https://reviews.llvm.org/rL335527
https://reviews.llvm.org/rL335597
https://reviews.llvm.org/rL335616
...this is filling a gap in analysis that is exposed by an unrelated select-of-constants transform.
I didn't see a way to unify the sext cases because each div/rem opcode results in a different fold.
Note that in this case, the backend might want to convert the select into math:
Name: sext urem
%e = sext i1 %x to i32
%r = urem i32 %y, %e
=>
%c = icmp eq i32 %y, -1
%z = zext i1 %c to i32
%r = add i32 %z, %y
llvm-svn: 335622
Since D46637 we are better at handling uniform/non-uniform constant Pow2 detection; this patch tweaks the SLP argument handling to support them.
As SLP works with arrays of values I don't think we can easily use the pattern match helpers here.
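As an illustration, a non-uniform power-of-2 case now handled might look like this C++ sketch (illustrative only):
```cpp
// Each lane divides by a different power of 2, so the divisor vector is
// non-uniform but still all-pow2; every lane can be lowered as a shift,
// and the SLP cost model can now price it accordingly.
void div_pow2(unsigned *r, const unsigned *a) {
  r[0] = a[0] / 2;
  r[1] = a[1] / 4;
  r[2] = a[2] / 8;
  r[3] = a[3] / 16;
}
```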
Differential Revision: https://reviews.llvm.org/D48214
llvm-svn: 335621
It is legal for a PHI node not to have a live value in a predecessor
as long as the end of the predecessor is jointly dominated by an undef
value.
llvm-svn: 335607
Summary:
If a routine with no stack frame makes a sibling call, we need to
preserve the stack space check even if the local stack frame is empty,
since the call target could be a "no-split" function (in which case
the linker needs to be able to fix up the prolog sequence in order to
switch to a larger stack).
This fixes PR37807.
Reviewers: cherry, javed.absar
Subscribers: srhines, llvm-commits
Differential Revision: https://reviews.llvm.org/D48444
llvm-svn: 335604
Summary:
Adds assembly parsing support for the module summary index (follow on
to r333335 which added the assembly writing support).
I added support to llvm-as to invoke the index parsing, so that it can
create either a bitcode file with a Module and a per-module index, or
a combined index without a Module.
I will send follow on patches soon to do the following:
- add support to tools such as llvm-lto2 to parse the per-module indexes
from assembly instead of bitcode when testing the thin link.
- verification support.
Depends on D47844 and D47842.
Reviewers: pcc, dexonsmith, mehdi_amini
Subscribers: inglorion, eraman, steven_wu, llvm-commits
Differential Revision: https://reviews.llvm.org/D47905
llvm-svn: 335602
Note: I didn't add a hasOneUse() check because the existing,
related fold doesn't have that check. I suspect that the
improved analysis and codegen make these some of the rare
canonicalization cases where we allow an increase in
instructions.
llvm-svn: 335597
When the condition code for an IT instruction is "AL" we get strange "15"
predicates on subsequent instructions. These are dealt with for most
instructions by treating them as "ARMCC::AL", but VFP takes a different path
which didn't have this code.
llvm-svn: 335594
IT instructions are allowed to have the 'AL' predicate, but it must never
result in an 'NV' predicated instruction. Essentially this means that all
branches must be 't' rather than 'e' if the predicate is 'AL'.
This patch adds a diagnostic for this during assembly (error because parsing
hits an assertion if allowed to continue) and an annotation during disassembly.
llvm-svn: 335593
changeToUnreachable may remove PHI nodes from executable blocks we found values
for, in which case we would fail to replace them. By changing dead blocks to
unreachable only after we replaced constants in all executable blocks, we ensure
such PHI nodes are replaced by their known value first.
Fixes PR37780.
Reviewers: efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D48421
llvm-svn: 335588
Summary:
This is a follow-up to r334830 and r335031.
In the valueCoversEntireFragment check we now also handle
the situation when there is a variable length array (VLA)
involved, and the length of the array has been reduced to
a constant.
The ConvertDebugDeclareToDebugValue functions that are related
to PHI nodes and load instructions now avoid inserting dbg.value
intrinsics when the value does not, for certain, cover the
variable/fragment that should be described.
In r334830 we assumed that the value always covered the entire
var/fragment and we had assertions in the code to show that
assumption. However, those asserts failed when compiling code
with VLAs, so we removed the asserts in r335031. Now that we
know the valueCoversEntireFragment check can fail also for
PHI/Load instructions, we avoid inserting the faulty dbg.value
intrinsic in such situations. Compared to the Store instruction
scenario we simply drop the dbg.value here (as the variable does
not change its value due to PHI/Load, so an earlier dbg.value
describing the variable should still be valid).
Reviewers: aprantl, vsk, efriedma
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48547
llvm-svn: 335580
Turn canonicalized subtraction back into (-1 - B) and combine it with (A + 1) into (A - B).
This is similar to the folding already done for (B ^ -1) + Const into (-1 + Const) - B.
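The equivalence is easy to check exhaustively; a small C++ sketch (illustrative, relying on two's-complement ~B == -1 - B):
```cpp
#include <cassert>

// (A + 1) + (~B) == (A + 1) + (-1 - B) == A - B, with unsigned wraparound.
unsigned before(unsigned a, unsigned b) { return (a + 1) + ~b; }
unsigned after(unsigned a, unsigned b)  { return a - b; }

int main() {
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      assert(before(a, b) == after(a, b));
}
```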
Differential Revision: https://reviews.llvm.org/D48535
llvm-svn: 335579
These opcodes have a fixed type of i8 for their immediate and shouldn't have anything to do with the scalar shift amount used by target independent shift nodes.
llvm-svn: 335578
CallLoweringInfo's NumFixedArgs field gives the number of fixed arguments
before legalization. The ISD::OutputArg "Outs" array holds legalized
arguments, so when indexing into it to find the non-fixed argument, we need
to use the number of arguments after legalization.
Fixes PR37934.
llvm-svn: 335576
Summary:
Same idea as D48529, but restricted to X86 and done very late to avoid any surprises where subtract might be better for DAG combining.
This seems like the safest way to do this trick. We can consider doing it as a DAG combine later.
Reviewers: spatel, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48557
llvm-svn: 335575
Summary:
Adds a string saver to the ModuleSummaryIndex so it can store value
names in the case of adding a ValueInfo for a GUID when we don't
have the name stored in a Module string table. This is motivated
by the upcoming summary parser patch, where we will read value names
from the summary entry and want to store them, even when a Module
is not available.
Currently this allows us to store the name in the legacy bitcode case,
and I have added a test to show that.
Reviewers: pcc, dexonsmith
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits
Differential Revision: https://reviews.llvm.org/D47842
llvm-svn: 335570
This recommits r335562 and 335563 as a single commit.
The frontend will surround the intrinsic with the appropriate marshalling to/from a scalar type to match the signature of the builtin that software expects.
By exposing the vXi1 type directly in the llvm intrinsic we make it available to optimizers much earlier. This can enable the scalar marshalling code to be optimized away.
llvm-svn: 335568
Summary:
Without this change we only add module paths to the combined index when
there is a module hash or at least one global value. Make this more
consistent by adding the module to the index whenever there is a summary
section, and it is a per-module summary (had a MODULE_CODE_SOURCE_FILENAME
record).
Since we will no longer add module paths lazily, add a new interface to get
the module info from the index that asserts it is already added.
Fixes PR37899.
Reviewers: Vlad, pcc
Subscribers: mehdi_amini, inglorion, steven_wu, llvm-commits
Differential Revision: https://reviews.llvm.org/D48511
llvm-svn: 335567
Summary:
I discovered when writing the summary parsing support that the
per-module index builder and writer are computing the GUID from the
value name alone (ignoring the linkage type). This was ok since those
GUID were not emitted in the bitcode, and there are never multiple
conflicting names in a single module.
However, I don't see a reason for making the GUID computation different
for the per-module case. It also makes things simpler on the parsing
side to have the GUID computation consistent. So this patch changes the
summary analysis phase and the per-module summary writer to compute the
GUID using the facility on the GlobalValue.
Reviewers: pcc, dexonsmith
Subscribers: llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D47844
llvm-svn: 335560
This adds support for nontrivial unswitching of switches.
This works much like trivial unswitching of switches in that it reliably
moves the switch out of the loop. Here we potentially clone the entire
loop into each successor of the switch and re-point the cases at these
clones.
Due to the complexity of actually doing nontrivial unswitching, this
patch doesn't create a dedicated routine for handling switches -- it
would duplicate far too much code. Instead, it generalizes the existing
routine to handle both branches and switches as it largely reduces to
looping in a few places instead of doing something once. This actually
improves the results in some cases with branches due to being much more
careful about how dead regions of code are managed. With branches,
because exactly one clone is created and there are exactly two edges
considered, somewhat sloppy handling of the dead regions of code was
sufficient in most cases. But with switches, there are much more
complicated patterns of dead code and so I've had to move to a more
robust model generally. We still do as much pruning of the dead code
early as possible because that allows us to avoid even cloning the code.
This also surfaced another problem with nontrivial unswitching before
which is that we weren't as precise in reconstructing loops as we could
have been. This seems to have been mostly harmless, but resulted in
pointless LCSSA PHI nodes and other unnecessary cruft. With switches, we
have to get this *right*, and everything benefits from it.
While the testing may seem a bit light here because we only have two
real cases with actual switches, they do a surprisingly good job of
exercising numerous edge cases. Also, because we share the logic with
branches, most of the changes in this patch are reasonably well covered
by existing tests.
The new unswitch now has all of the same fundamental power as the old
one with the exception of the single unsound case of *partial* switch
unswitching -- that really is just loop specialization and not
unswitching at all. It doesn't fit into the canonicalization model in
any way. We can add a loop specialization pass that runs late based on
profile data if important test cases ever come up here.
Differential Revision: https://reviews.llvm.org/D47683
llvm-svn: 335553
This removes a "UDivFoldAction" in favor of a simple constant
matcher. In theory, the existing code could do more matching,
but I don't see any evidence or need for it. I've left a TODO
about using ValueTracking in case we see any regressions.
llvm-svn: 335545
std::lower_bound doesn't require the thing to search for to be the same type as the table entries. We just need to define an appropriate comparison function that can take a table entry and an intrinsic number.
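For example, a heterogeneous comparator can search a sorted entry table directly by intrinsic number (a self-contained C++ sketch with made-up entry and field names):
```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>

struct IntrinsicEntry {   // hypothetical table entry
  uint16_t IntrinsicID;   // sort key
  uint16_t Data;
};

// Sorted by IntrinsicID.
static const IntrinsicEntry Table[] = {{1, 10}, {4, 11}, {9, 12}, {23, 13}};

const IntrinsicEntry *lookup(uint16_t ID) {
  // The comparator mixes types: a table entry on the left, the raw
  // intrinsic number on the right, so no temporary entry is needed.
  const IntrinsicEntry *I = std::lower_bound(
      std::begin(Table), std::end(Table), ID,
      [](const IntrinsicEntry &E, uint16_t N) { return E.IntrinsicID < N; });
  return (I != std::end(Table) && I->IntrinsicID == ID) ? I : nullptr;
}
```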
llvm-svn: 335518
This avoids creating unnecessary casts if the IP used to be a dbg info
intrinsic. Fixes PR37727.
Reviewers: vsk, aprantl, sanjoy, efriedma
Reviewed By: vsk, efriedma
Differential Revision: https://reviews.llvm.org/D47874
llvm-svn: 335513
Summary:
The NetBSD Operating System installs debuginfo
files into /usr/libdata/debug, rather than another path
as in some other popular distributions.
This change makes llvm-symbolizer functional with
the basesystem executables.
Reviewers: joerg, vitalybuka
Reviewed By: vitalybuka
Subscribers: JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D48525
llvm-svn: 335511
The large code model allows code and data segments to exceed 2GB, which
means that some symbol references may require a displacement that cannot
be encoded as a displacement from RIP. The large PIC model even relaxes
the assumption that the GOT itself is within 2GB of all code. Therefore,
we need a special code sequence to materialize it:
.LtmpN:
leaq .LtmpN(%rip), %rbx
movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch
addq %rax, %rbx # GOT base reg
From that, non-local references go through the GOT base register instead
of being PC-relative loads. Local references typically use GOTOFF
symbols, like this:
movq extern_gv@GOT(%rbx), %rax
movq local_gv@GOTOFF(%rbx), %rax
All calls end up being indirect:
movabsq $local_fn@GOTOFF, %rax
addq %rbx, %rax
callq *%rax
The medium code model retains the assumption that the code segment is
less than 2GB, so calls are once again direct, and the RIP-relative
loads can be used to access the GOT. Materializing the GOT is easy:
leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg
DSO local data accesses will use it:
movq local_gv@GOTOFF(%rbx), %rax
Non-local data accesses will use RIP-relative addressing, which means we
may not always need to materialize the GOT base:
movq extern_gv@GOTPCREL(%rip), %rax
Direct calls are basically the same as they are in the small code model:
They use direct, PC-relative addressing, and the PLT is used for calls
to non-local functions.
This patch adds reasonably comprehensive testing of LEA, but there are
lots of interesting folding opportunities that are unimplemented.
I restricted the MCJIT/eh-lg-pic.ll test to Linux, since the large PIC
code model is not implemented for MachO yet.
Differential Revision: https://reviews.llvm.org/D47211
llvm-svn: 335508
With the static tables sorted we can binary search them directly for reg->mem lookups. This removes 6 DenseMaps that had to be created when X86InstrInfo is constructed.
We still have one Mem->Reg DenseMap for the reverse direction. This is created just as before by walking the reg->mem arrays to populate it.
Differential Revision: https://reviews.llvm.org/D48527
llvm-svn: 335501
This removes debug locations from ConstantSDNode and ConstantSDFPNode.
When this kind of node is materialized we no longer create a line table
entry which jumps back to the constant's first point of use. This makes
single-stepping behavior smoother, and it matches the model used by IR,
where Constants have no locations. See this thread for more context:
http://lists.llvm.org/pipermail/llvm-dev/2018-June/124164.html
I'd like to handle constant BuildVectorSDNodes and to try to eliminate
passing SDLocs to SelectionDAG::getConstant*() in follow-up commits.
Differential Revision: https://reviews.llvm.org/D48468
llvm-svn: 335497
There are quite a few if statements that enumerate all these cases. It gets
even worse in our fork of LLVM where we also have a Triple::cheri (which
is mips64 + CHERI instructions) and we had to update all if statements that
check for Triple::mips64 to also handle Triple::cheri. This patch helps to
reduce our diff to upstream and should also make some checks more readable.
Reviewed By: atanasyan
Differential Revision: https://reviews.llvm.org/D48548
llvm-svn: 335493
Note a normal select test is not currently possible because this
relies on input registers tracked in SIMachineFunctionInfo which
are not currently serializable in MIR, but this does work end-to-end
from the IR.
llvm-svn: 335490
I thought I fixed this in r308673, but that fix was
very broken. The assumption that any frame index can be used
in place of another was more widespread than I realized.
Even when stack slot sharing was disabled, this was still
replacing frame index uses with a different ID with a different
stack slot.
Really fix this by doing the coloring per-stack ID, so all of
the coloring is logically done in a separate namespace. This is a lot
simpler than trying to figure out how to change the color if
the stack ID is different.
llvm-svn: 335488
If a function has sample to use, but cannot use them because of no debug
information, currently a warning will be issued to inform the missing
opportunity.
This warning assumes the binary generating the profile and the binary using
the profile are similar enough. It is not always the case. Sometimes even
if the binaries are not quite similar, we may still get some benefit by
using sampleFDO. In those cases, we may still want to apply sampleFDO but
not want to see a lot of such warnings pop up.
The patch adds an option for the warning.
Differential Revision: https://reviews.llvm.org/D48510
llvm-svn: 335484
We can prove that some delinearized subscripts do not wrap around to become
negative by the fact that they are from inbounds GEPs of load/store locations.
This helps improve the delinearisation in cases where we can't prove that they
are non-negative from SCEV alone.
Differential Revision: https://reviews.llvm.org/D48481
llvm-svn: 335481
This should avoid relying on the pointee type
to get the alignment, particularly since pointee
types are supposed to be removed at some point.
Also fixes not getting the alignment for unsized types.
llvm-svn: 335478
Not only should SafepointIRVerifier ignore unreachable blocks (as suggested in https://reviews.llvm.org/D47011) but it also has to ignore dead blocks.
In @test2 (see the new tests):
br i1 true, label %right, label %left
left:
  ...
right:
  ...
merge:
  %val = phi i8 addrspace(1)* [ ..., %left ], [ ..., %right ]
  use %val
both left and right branches are reachable.
If they collide then SafepointIRVerifier reports an error.
Because of the foldable branch condition GVN finds the left branch dead and removes the phi node entry that merges values from right and left. Then the use comes from the right branch. This results in no collision.
So, SafepointIRVerifier ends up with different results depending on whether GVN is run or not.
To solve this issue this patch adds Dead Block detection to SafepointIRVerifier which can ignore dead blocks while validating IR. The Dead Block detection algorithm is taken from GVN but modified to not split critical edges. That is needed to keep CFG unchanged by SafepointIRVerifier.
Patch by Yevgeny Rouban.
Reviewed By: anna, apilipenko, DaniilSuchkov
Differential Revision: https://reviews.llvm.org/D47441
llvm-svn: 335473
We should be blocking the operand while we are in the routine that tries to find commutable operand indices. Doing it later means we might have missed out on another valid set of operands we could have commuted.
The intrinsic case was the only case that could really prevent commuting in getFMA3OpcodeToCommuteOperands. All the other cases in getThreeSrcCommuteCase were not reachable conditions as they were protected by findThreeSrcCommutedOpIndices.
With that abort case pushed earlier, we can remove all the abort checks and replace with asserts.
llvm-svn: 335446
It's easy for domination numbers to get out-of-date, and this is no more
costly than any of the other verifiers we already have, so it seems nice
to have.
A stage3 build with this Works On My Machine, so this hasn't caught any
bugs... yet. :)
llvm-svn: 335444
Summary:
A WebAssemblyException object contains BBs that belong to a 'catch' part
of the try-catch-end structure. Because CFGSort requires all the BBs
within a catch part to be sorted together as it does for loops, this
pass calculates the nesting structure of catch part of exceptions in a
function. Now this assumes the use of Windows EH instructions.
Reviewers: dschuff, majnemer
Subscribers: jfb, mgorny, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D44134
llvm-svn: 335439
Summary:
Add WebAssemblyLateEHPrepare pass that does several small jobs for
exception handling. This runs before CFGSort, and is different from
WasmEHPrepare pass that runs before ISel, even though the names are
similar.
Reviewers: dschuff, majnemer
Subscribers: sbc100, jgravelle-google, sunfish, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D46803
llvm-svn: 335438
They appear to be untested other than the test case for pr37879.ll, and I believe we should be using SimplifyDemandedElts here to handle these cases.
llvm-svn: 335436
This patch has the same motivating example as D48466:
define void @foo(i64 %x, i32 %c.0282.in, i32 %d.0280, i32* %ptr0, i32* %ptr1) {
  %c.0282 = and i32 %c.0282.in, 268435455
  %a16 = lshr i64 32508, %x
  %a17 = and i64 %a16, 1
  %tobool = icmp eq i64 %a17, 0
  %. = select i1 %tobool, i32 1, i32 2
  %.286 = select i1 %tobool, i32 27, i32 26
  %shr97 = lshr i32 %c.0282, %.
  %shl98 = shl i32 %c.0282.in, %.286
  %or99 = or i32 %shr97, %shl98
  %shr100 = lshr i32 %d.0280, %.
  %shl101 = shl i32 %d.0280, %.286
  %or102 = or i32 %shr100, %shl101
  store i32 %or99, i32* %ptr0
  store i32 %or102, i32* %ptr1
  ret void
}
...but I'm trying to kill the setcc bool math sooner rather than later.
By matching a larger pattern that includes both the low-bit mask and the trailing add/sub,
we can create a universally good fold because we always eliminate the condition code
intermediate value.
Here are Alive proofs for these (currently instcombine folds the 'add' variants, but
misses the 'sub' patterns):
https://rise4fun.com/Alive/Gsyp
Name: sub of zext cmp mask
%a = and i8 %x, 1
%c = icmp eq i8 %a, 0
%z = zext i1 %c to i32
%r = sub i32 C1, %z
=>
%optional_cast = zext i8 %a to i32
%r = add i32 %optional_cast, C1-1
Name: add of zext cmp mask
%a = and i32 %x, 1
%c = icmp eq i32 %a, 0
%z = zext i1 %c to i8
%r = add i8 %z, C1
=>
%optional_cast = trunc i32 %a to i8
%r = sub i8 C1+1, %optional_cast
All of the tests look like improvements or neutral to me. But it is possible that x86
test+set+bitop is better than what we now show here. I suspect we could do better by
adding another fold for the 'sub' variants.
We start with select-of-constant in IR in the larger motivating test, so that's why I
included tests with selects. Proofs for those variants:
https://rise4fun.com/Alive/Bx1
Name: true const is bigger
Pre: C2 == (C1 + 1)
%a = and i8 %x, 1
%c = icmp eq i8 %a, 0
%r = select i1 %c, i64 C2, i64 C1
=>
%z = zext i8 %a to i64
%r = sub i64 C2, %z
Name: false const is bigger
Pre: C2 == (C1 + 1)
%a = and i8 %x, 1
%c = icmp eq i8 %a, 0
%r = select i1 %c, i64 C1, i64 C2
=>
%z = zext i8 %a to i64
%r = add i64 C1, %z
Differential Revision: https://reviews.llvm.org/D48466
llvm-svn: 335433
For some reason the 64-bit patterns were separated from their 8/16/32-bit friends, but only for add/sub/mul. For and/or/xor they were together.
llvm-svn: 335429
- Ensure EIP isn't used with an index register.
- Ensure EIP isn't used as an index register.
- Ensure the base register isn't a vector register.
- Ensure eiz/riz usage matches the size of their base register.
llvm-svn: 335412
FDiv is replaced with multiplication by reciprocal and invariant
reciprocal is hoisted out of the loop, while multiplication remains
even if invariant.
Switch checks for all invariant operands and only invariant
denominator to fix the issue.
Differential Revision: https://reviews.llvm.org/D48447
llvm-svn: 335411
Implements PR34259
Intrinsics.h is a very popular header. Most LLVM TUs care about things
like dbg_value, but they don't care how they are implemented. After I
split these out, IntrinsicImpl.inc is 1.7 MB, so this saves each LLVM TU
from scanning 1.7 MB of source that gets pre-processed away.
It also means we can modify intrinsic properties without triggering a
full rebuild, but that's probably less of a win.
I think the next best thing to do would be to split out the target
intrinsics into their own header. Very, very few TUs care about
target-specific intrinsics. It's very hard to split up the target
independent intrinsics like llvm.expect, assume, and dbg.value, though.
llvm-svn: 335407
Previously, to support (%dx) we left a wide open hole in our 16-bit memory address checking. This let this address value be used with any instruction without error in the parser. It would later fail in the encoder with an assertion failure on debug builds and who knows what on release builds.
This patch passes the mnemonic down to the memory operand parsing function so we can allow the (%dx) form only on specific instructions.
llvm-svn: 335403
This gets rid of a bunch of weird special cases; instead, just use SCEV
rewriting for everything. In addition to being simpler, this fixes a
bug where we would use the wrong stride in certain edge cases.
The one bit I'm not quite sure about is the trip count handling,
specifically the FIXME about overflow. In general, I think we need to
widen the exit condition, but that's probably not profitable if the new
type isn't legal, so we probably need a check somewhere. That said, I
don't think I'm making the existing problem any worse.
As a followup to this, a bunch of IV-related code in root-finding could
be cleaned up; with SCEV-based rewriting, there isn't any reason to
assume a loop will have exactly one or two PHI nodes.
Differential Revision: https://reviews.llvm.org/D45191
llvm-svn: 335400
This allows us to check these:
- 16-bit addressing doesn't support scale, so we should error if we find one there.
- Multiplying ESP/RSP by a scale, even if the scale is 1, should be an error because ESP/RSP can't be an index.
llvm-svn: 335398
By default, the second register gets assigned to the index register slot. But ESP can't be an index register so we need to swap it with the other register.
There's still a slight bug in that we allow [EAX+ESP*1]. The existence of the multiply, even though it's with 1, should force ESP to the index register and trigger an error, but it doesn't currently.
llvm-svn: 335394
Since we are now producing a summary also for regular LTO builds, we
need to run the NameAnonGlobals pass in those cases as well (the
summary cannot handle anonymous globals).
See https://reviews.llvm.org/D34156 for details on the original change.
This reverts commit 6c9ee4a4a438a8059aacc809b2dd57128fccd6b3.
llvm-svn: 335385
The second register is the index register and should only be %si or %di if used with a base register. And in that case the base register should be %bp or %bx.
This makes us compatible with gas.
We do still need to support both orders with Intel syntax, which uses [bp+si] and [si+bp].
llvm-svn: 335384
(%bp) can't be encoded without a displacement. The encoding is instead used for displacement alone. So a 1 byte displacement of 0 must be used. But if there is an index register we can encode without a displacement.
llvm-svn: 335379
Summary:
In LoopUnswitch when replacing a branch Parent -> Succ with a conditional
branch Parent -> True & Parent->False, the DomTree updates should insert an edge for
each of True/False if True/False are different than Succ, and delete Parent->Succ edge
if both are different. The comparison with Succ appears to be incorrect;
it's comparing with Parent instead.
There is no test failing either before or after this change, but it seems to me this is
the right way to do the update.
Reviewers: chandlerc, kuhar
Subscribers: sanjoy, jlebar, llvm-commits
Differential Revision: https://reviews.llvm.org/D48457
llvm-svn: 335369
Enable tryToVectorizeList to support InstructionsState alternate opcode patterns at a root (build vector etc.) as well as further down the vectorization tree.
NOTE: This patch reduces some of the debug reporting if there are opcode mismatches - I can try to add it back if it proves a problem. But it could get rather messy trying to provide equivalent verbose debug strings via getSameOpcode etc.
Differential Revision: https://reviews.llvm.org/D48488
llvm-svn: 335364
DWARF v5 explicitly represents file #0 in the line table. Prior
versions did not, so ".loc 0" is still an error in those cases.
Differential Revision: https://reviews.llvm.org/D48452
llvm-svn: 335350
SLP currently only accepts (F)Add/(F)Sub alternate counterpart ops to be merged into an alternate shuffle.
This patch relaxes this to accept any pair of BinaryOperator opcodes instead, assuming the target's cost model accepts the vectorization+shuffle.
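For instance, an alternating pattern no longer limited to add/sub might look like this C++ sketch (hypothetical example of code that can now form an alternate shuffle, if the cost model agrees):
```cpp
// Lanes alternate between two different binary operators (mul and add);
// previously only (F)Add/(F)Sub pairs were considered.
void alt(unsigned *r, const unsigned *a, const unsigned *b) {
  r[0] = a[0] * b[0];
  r[1] = a[1] + b[1];
  r[2] = a[2] * b[2];
  r[3] = a[3] + b[3];
}
```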
Differential Revision: https://reviews.llvm.org/D48477
llvm-svn: 335349
With compilation fix.
Original commit message:
D39788 added a '.stack-size' section containing metadata on function stack sizes
to output ELF files behind the new -stack-size-section flag.
This change does following two things on top:
1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs.
The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to
eliminate them fast during resolving the COMDATs.
2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text.
With that linker will be able to do -gc-sections on dead stack sizes sections.
Differential revision: https://reviews.llvm.org/D46874
llvm-svn: 335336
D39788 added a '.stack-size' section containing metadata on function stack sizes
to output ELF files behind the new -stack-size-section flag.
This change does following two things on top:
1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs.
The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to
eliminate them fast during resolving the COMDATs.
2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text.
With that linker will be able to do -gc-sections on dead stack sizes sections.
Differential revision: https://reviews.llvm.org/D46874
llvm-svn: 335332
AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) were being severely overestimated by the default shuffle expansion.
This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174.
I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more.
Differential Revision: https://reviews.llvm.org/D48172
llvm-svn: 335329
This sets target feature FeatureStrictAlign for Armv6-m and Armv8-m.baseline,
because it has no support for unaligned accesses.
It looks like we always pass target feature "+strict-align" from
Clang, so this is not a user facing problem, but querying the subtarget
(in e.g. llc) for unaligned access support is incorrect.
Differential Revision: https://reviews.llvm.org/D48437
llvm-svn: 335326
Not sure why the 32/64 split is needed in the atomic_load/store
hierarchies. The regular PatFrags do this, but we don't
do it for the existing handling for global.
llvm-svn: 335325
Changing the logic of scalar mask folding to check for valid input types rather
than against invalid ones, making it more robust and fixing PR37879.
Differential Revision: https://reviews.llvm.org/D48366
llvm-svn: 335323
This is the first pass in the main pipeline to use the legacy PM's
ability to run function analyses "on demand". Unfortunately, it turns
out there are bugs in that somewhat-hacky approach. At the very least,
it leaks memory and doesn't support -debug-pass=Structure. Unclear if
there are larger issues or not, but this should get the sanitizer bots
back to green by fixing the memory leaks.
llvm-svn: 335320
Summary:
We can select all instructions that are marked as legal in a full piglit run,
so now is a good time to make the TableGen'd instruction selector default
for all opcodes. This is NFC for a full piglit run, which is why there are
no tests.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48198
llvm-svn: 335319
Clear out deleted loops from the current queue beyond just the current
loop.
This is important because SimpleLoopUnswitch will now enqueue the same
loop to be re-processed. When it does this with the legacy PM, we don't
have a way of canceling the rest of the pipeline and so we can end up
deleting the loop before we reprocess it. =/
This change also makes it easy to support deleting other loops in the
queue to process, although I don't have any use cases for that.
Differential Revision: https://reviews.llvm.org/D48470
llvm-svn: 335317
With non-commutative binops, we could be using the same
variable value as operand 0 in 1 binop and operand 1 in
the other, so we have to check for that possibility and
bail out.
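A small C++ illustration of the hazard (hypothetical, not from the patch):
```cpp
// 'sub' is not commutative: the shared value v is operand 0 in lane 0
// and operand 1 in lane 1, so the two lanes cannot be merged into one
// vector binop with v occupying a single operand slot.
void g(int *r, int v, int a, int b) {
  r[0] = v - a; // v on the left
  r[1] = b - v; // v on the right
}
```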
llvm-svn: 335312
This patch adds support for generating a call graph profile from Branch Frequency Info.
The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions. For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight.
After scanning all the functions, it generates an appending module flag containing the data. The format looks like:
!llvm.module.flags = !{!0}
!0 = !{i32 5, !"CG Profile", !1}
!1 = !{!2, !3, !4} ; List of edges
!2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32
!3 = !{void (i1)* @freq, void ()* @a, i64 11}
!4 = !{void (i1)* @freq, void ()* @b, i64 20}
Differential Revision: https://reviews.llvm.org/D48105
llvm-svn: 335306
Summary:
The large code model allows code and data segments to exceed 2GB, which
means that some symbol references may require a displacement that cannot
be encoded as a displacement from RIP. The large PIC model even relaxes
the assumption that the GOT itself is within 2GB of all code. Therefore,
we need a special code sequence to materialize it:
.LtmpN:
leaq .LtmpN(%rip), %rbx
movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch
addq %rax, %rbx # GOT base reg
From that, non-local references go through the GOT base register instead
of being PC-relative loads. Local references typically use GOTOFF
symbols, like this:
movq extern_gv@GOT(%rbx), %rax
movq local_gv@GOTOFF(%rbx), %rax
All calls end up being indirect:
movabsq $local_fn@GOTOFF, %rax
addq %rbx, %rax
callq *%rax
The medium code model retains the assumption that the code segment is
less than 2GB, so calls are once again direct, and the RIP-relative
loads can be used to access the GOT. Materializing the GOT is easy:
leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg
DSO local data accesses will use it:
movq local_gv@GOTOFF(%rbx), %rax
Non-local data accesses will use RIP-relative addressing, which means we
may not always need to materialize the GOT base:
movq extern_gv@GOTPCREL(%rip), %rax
Direct calls are basically the same as they are in the small code model:
They use direct, PC-relative addressing, and the PLT is used for calls
to non-local functions.
This patch adds reasonably comprehensive testing of LEA, but there are
lots of interesting folding opportunities that are unimplemented.
Reviewers: chandlerc, echristo
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D47211
llvm-svn: 335297
Summary:
A reprise of D25849.
This crash was found through fuzzing some time ago and was documented in PR28879.
No check for load size has been added due to the following tests:
- Transforms/GVN/invariant.group.ll
- Transforms/GVN/pr10820.ll
These tests expect load sizes that are not a multiple of eight.
Thanks to @davide for the original patch.
Reviewers: nlopes, davide, RKSimon, reames, efriedma
Reviewed By: efriedma
Subscribers: davide, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D48330
llvm-svn: 335294
Summary:
This initiates a discussion on changing Polly accordingly while re-applying r335197 (D48338).
I have never worked on Polly. The proposed change to param_div_div_div_2.ll is not an educated one; it just matches the expected output patterns.
All LLVM files are already reviewed in D48338.
Reviewers: jdoerfert, bollu, efriedma
Subscribers: jlebar, sanjoy, hiraditya, llvm-commits, bixia
Differential Revision: https://reviews.llvm.org/D48453
llvm-svn: 335292
Summary:
GCC and the binutils COFF linker do comdats differently from MSVC.
If we want to be ABI compatible, we have to do what they do, which is to
emit unique section names like ".text$_Z3foov" instead of short section
names like ".text". Otherwise, the binutils linker gets confused and
reports multiple definition errors when two object files from GCC and
Clang containing the same inline function are linked together.
The best description of the issue is probably at
https://github.com/Alexpux/MINGW-packages/issues/1677, we don't seem to
have a good one in our tracker.
I fixed up the .pdata and .xdata sections needed everywhere other than
32-bit x86. GCC doesn't use associative comdats for those, it appears to
rely on the section name.
Reviewers: smeenai, compnerd, mstorsjo, martell, mati865
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D48402
llvm-svn: 335286
This is the simplest case from PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806
If we have a common variable operand used in a pair of binops with vector constants
that are vector selected together, then we can constant shuffle the constant vectors
to eliminate the shuffle instruction.
This has some tricky parts that are hopefully addressed in the tests and their
respective comments:
1. If the shuffle mask contains an undef element, then that lane of the result is
undef:
http://llvm.org/docs/LangRef.html#shufflevector-instruction
Therefore, we can replace the constant in that lane with an undef value except
for div/rem. With div/rem, an undef in the divisor would cause the whole op to
be undef. So I'm using the same hack as in D47686 - replace the undefs with '1'.
2. Intersect the wrapping and FMF of the original binops for the new binop. There
should be no extra poison or fast-math potential in the new binop that wasn't
possible in the original code.
3. Disregard other uses. Given that we're eliminating uses (shortening the
dependency chain), I think that's always the right IR canonicalization. But
I purposely chose the udiv test to demonstrate the scenario where both
intermediate values have other uses because that seems likely worse for
codegen with an expensive math op. This seems like a very rare possibility to
me, so I don't think it requires a backend patch first.
Differential Revision: https://reviews.llvm.org/D48401
llvm-svn: 335283
Update AMDGPU assembler syntax behind the code-object-v3 feature:
* Replace/rename most AMDGPU assembler directives/symbols and document them.
* Provide more diagnostics (e.g. values out of range, missing values, repeated
values).
* Provide path for backwards compatibility, even with underlying descriptor
changes.
Differential Revision: https://reviews.llvm.org/D47736
llvm-svn: 335281
This reverts commit r335206.
As discussed here: https://reviews.llvm.org/rL333740, a fix will come
tomorrow. In the meanwhile, revert this to fix some bots.
llvm-svn: 335272
BlockWaitcntProcessedSet was not being cleared between calls, so it was
producing incorrect counts in cases where MBB addresses happened to coincide
across multiple calls.
Differential Revision: https://reviews.llvm.org/D48391
llvm-svn: 335268
and everything that comes with it from implementation
and v3 header files.
Leave definition in v2 header files for backwards
compatibility.
Differential Revision: https://reviews.llvm.org/D48191
llvm-svn: 335267
Summary:
The logic for handling the sinking of COPY instructions was generating
different code when building with debug flags.
The original code did not take into consideration debug instructions. This
resulted in the registers in the DBG_VALUE instructions being treated as used,
and prevented the COPY from being sunk. This patch avoids analyzing debug
instructions when trying to sink COPY instructions.
This patch also creates a routine from the code in MachineSinking::SinkInstruction to
perform the logic of sinking an instruction along with its debug instructions.
This functionality is used in multiple places, including the code for sinking COPY instrs.
Reviewers: junbuml, javed.absar, MatzeB, bjope
Reviewed By: bjope
Subscribers: aprantl, probinson, thegameg, jonpa, bjope, vsk, kristof.beyls, JDevlieghere, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D45637
llvm-svn: 335264
The previous code worked with vectors, but it failed when the
vector constants contained undef elements.
The matchers handle those cases.
llvm-svn: 335262
This is outwardly NFC from what I can tell, but it should be more efficient
to simplify first (despite the name, SimplifyAssociativeOrCommutative does
not actually simplify as InstSimplify does - it creates/morphs instructions).
This should make it easier to refactor duplicated code that runs for all binops.
llvm-svn: 335258
This reverts commit d8f57105010cc7e78026e511d5def873fc91e0e7.
Original Commit:
Author: Haicheng Wu <haicheng@codeaurora.org>
Date: Sun Feb 18 13:51:33 2018 +0000
[AArch64] Coalesce Copy Zero during instruction selection
Add special case for copy of zero to avoid a double copy.
Differential Revision: https://reviews.llvm.org/D36104
The author's intention is to remove a BB that has one mov instruction. In
order to do that, d8f571050 pessimizes MachineSinking by introducing a
copy, such that the mov instruction is NOT moved to the BB. An optimization
downstream gets rid of the BB containing only the mov instruction. This works
well if we have only one fall-through branch, as there is only one "extra"
mov instruction.
If we have multiple fall-throughs, we will have a lot of redundant movs.
In such a case, it's better to have this BB which has one mov instruction.
This is causing degradation in jpeg, fft and other codebases. I believe
if we want to remove a BB with only one branch instruction, we should not
pessimize Machine Sinking at all, and find some other solution.
llvm-svn: 335251
Allow folding for "and/or" binops with a non-constant operand if the
arguments of the select are 0/-1 values.
Normally this code with the "and" opcode does not reach the DAG combiner,
as it is already simplified in InstCombine. However, AMDGPU produces it
during lowering, so InstCombine has no chance to optimize it out.
In turn, the same pattern with the "or" opcode can reach the DAG.
Differential Revision: https://reviews.llvm.org/D48301
llvm-svn: 335250
This option allows codegen (such as DAGCombine or MI scheduling) to use alias
analysis information, which can help with codegen on in-order CPUs,
especially machine scheduling. Here I have done things the same way as AArch64,
adding a subtarget feature to enable this for specific cores, and enabled it for
the R52 where we have a schedule to make use of it.
Differential Revision: https://reviews.llvm.org/D48074
llvm-svn: 335249
Summary:
When expanding the PseudoTail in expandFunctionCall() we were using X6
to save the return address. Since this is a tail call, the return
address is not needed; this patch replaces it with X0 so it is ignored.
This matches the behavior listed on page 110 of the ISA V2.2 document.
tail offset -----> jalr x0, x6, offset
GCC exhibits the same behavior.
Reviewers: apazos, asb, mgrang
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01
Differential Revision: https://reviews.llvm.org/D48343
llvm-svn: 335239
This should help in lowering the following four intrinsics:
_mm256_cvtepi32_epi8
_mm256_cvtepi64_epi16
_mm256_cvtepi64_epi8
_mm512_cvtepi64_epi8
Differential Revision: https://reviews.llvm.org/D46957
llvm-svn: 335238
Summary:
For sample and gather ops, we can accurately determine the set of
vaddr-size instruction variants that are required. This reduces
the size of instruction tables by ~5%.
The number of machine instruction opcodes is reduced from 10002
to 9476.
Change-Id: Ie7fc65d3657b762c7816017fe70b2e9bec644a8a
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D48168
llvm-svn: 335232
Summary:
This also removes the need for atomic pseudo instructions, since
we select the correct encoding directly in SITargetLowering::lowerImage
for dimension-aware image intrinsics.
Mesa uses dimension-aware image intrinsics since
commit a9a7993441.
Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a
Reviewers: arsenm, rampitec, mareko, tpr, b-sumner
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48167
llvm-svn: 335231
Summary:
Use the expanded features of the TableGen generic tables to avoid manually
adding the combinatorially exploded set of intrinsics. The
getAMDGPUImageDimIntrinsic lookup function is early-out,
i.e. non-AMDGPU intrinsics will never look at the underlying table.
Use a generic approach for getting the new intrinsic overload to keep the
code simple, and make the image dmask handling more generic:
- handle non-sampler image loads
- handle the case where the set of demanded elements is not a prefix
There is some overlap between this code and an optimization that happens
in the backend during code generation. They currently complement each other:
- only the codegen optimization can generate vec3 loads
- only the InstCombine optimization can handle D16
The InstCombine optimization also likely covers more cases since the
codegen optimization is fairly ad-hoc. Ideally, we'll remove the optimization
in codegen once the infrastructure for vec3 is in place (which will probably
take a long time).
Modify the test cases to use dimension-aware intrinsics. This makes it
easier to see that the test coverage for the new intrinsics is equivalent,
and the old style intrinsics will be removed in a follow-up commit anyway.
Change-Id: I4b91ea661413d13004956fe4ef7d13d41b8ce3ad
Reviewers: arsenm, rampitec, majnemer
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48165
llvm-svn: 335230
Summary:
Having TableGen patterns for image intrinsics is hitting limitations:
for D16 we already have to manually pre-lower the packing of data
values, and we will have to do the same for A16 eventually.
Since there is already some custom C++ code anyway, it is arguably easier
to just do everything in C++, now that we can use the beefed-up generic
tables backend of TableGen to provide all the required metadata and map
intrinsics to corresponding opcodes. With this approach, all image
intrinsic lowering happens in SITargetLowering::lowerImage. That code is
dense due to all the cases that it handles, but it should still be easier
to follow than what we had before, by virtue of it all being done in a
single location, and by virtue of not relying on the TableGen pattern
magic that very few people really understand.
This means that we will have MachineSDNodes with MIMG instructions
during DAG combining, but that seems alright: previously we had
intrinsic nodes instead, but those are similarly opaque to the generic
CodeGen infrastructure, and the final pattern matching just did a 1:1
translation to machine instructions anyway. If anything, the fact that
we now merge the address words into a vector before DAG combine should
be an advantage.
Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6
Reviewers: arsenm, rampitec, rtaylor, tstellar
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48017
llvm-svn: 335228
Summary:
This allows us to access rich information about MIMG opcodes from C++ code.
Simplifying the mapping between equivalent opcodes of different data size
becomes quite natural.
This also flattens the MIMG-related class and multiclass hierarchy a little,
and collapses together some of the scaffolding for sample and gather4 opcodes.
Change-Id: I1a2549fdc1e881ff100e5393d2d87e73729a0ccd
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48016
llvm-svn: 335227
Summary:
This will allows us to provide rich metadata about the instructions
in tables that are accessible by custom C++ code.
Change-Id: Id9305a26304ab6a6cceb6c65c8cd49141cc0101d
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48011
llvm-svn: 335224
Summary:
Kill instructions sometimes do use SCC in unusual circumstances, when
v_cmpx cannot be used due to the operands that are involved.
Additionally, even if SCC was never defined by the expansion, kill pseudos
could previously occur between an s_cmp and an s_cbranch_scc, which breaks
the SCC liveness tracking when the pseudo is expanded to split the basic
block. While it would be possible to explicitly mark the SCC as live-in for
the successor basic block, it's simpler to just mark the pseudo as using SCC,
so that such a sequence is never emitted by instruction selection in the
first place.
A similar issue affects indirect source/dest pseudos in principle, although
I haven't been able to come up with a test case where it actually matters
(this affects instruction selection, so a MIR test can't be used).
Fixes: dEQP-GLES3.functional.shaders.discard.dynamic_loop_always
Change-Id: Ica8d82ecff1a763b892a1112cf1b06c948863a4f
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D47761
llvm-svn: 335223
Summary:
This allows us to reduce the number of different machine instruction
opcodes, which reduces the table sizes and helps flatten the TableGen
multiclass hierarchies.
We can do this because for each hardware MIMG opcode, we have a full set
of IMAGE_xxx_Vn_Vm machine instructions for all required sizes of vdata
and vaddr registers. Instead of having separate D16 machine instructions,
a packed D16 instruction loading e.g. 4 components can simply use the
same V2 opcode variant that non-D16 instructions use.
We still require a TSFlag for D16 buffer instructions, because the
D16-ness of buffer instructions is part of the opcode. Renaming the flag
should help avoid future confusion.
The one non-obvious code change is that for gather4 instructions, the
disassembler can no longer automatically decide whether to use a V2 or
a V4 variant. The existing logic which chooses the correct variant for
other MIMG instructions is extended to cover gather4 as well.
As a bonus, some of the assembler error messages are now more helpful
(e.g., complaining about a wrong data size instead of a non-existing
instruction).
While we're at it, delete a whole bunch of dead legacy TableGen code.
Change-Id: I89b02c2841c06f95e662541433e597f5d4553978
Reviewers: arsenm, rampitec, kzhuravl, artem.tamazov, dp, rtaylor
Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D47434
llvm-svn: 335222
Summary:
This also allows inner foreach loops to have a list that depends on
the iteration variable of an outer foreach loop. The test cases show
some very simple examples of how this can be used.
This was perhaps the last remaining major non-orthogonality in the
TableGen frontend.
Change-Id: I79b92d41a5c0e7c03cc8af4000c5e1bda5ef464d
Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D47431
llvm-svn: 335221
This enables da-delinearize in Dependence Analysis for delinearizing array
accesses into multiple dimensions. This can help to increase the power of
Dependence analysis on multi-dimensional arrays and prevent having to fall
back to the slower and less accurate MIV tests. It adds static checks on the
bounds of the arrays to ensure that one dimension doesn't overflow into
another, and brings our code in line with our tests.
Differential Revision: https://reviews.llvm.org/D45872
llvm-svn: 335217
These were being overly cautious about the costs of one/two-op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does.
llvm-svn: 335216
Summary:
In some cases, these operands lacked the IsDebug property, which is meant to signal that
they should not affect codegen. This patch adds a check for this property in the
MachineVerifier and adds it where it was missing.
This includes refactorings to use MachineInstrBuilder construction functions instead of
manually setting up the intrinsic everywhere.
Patch by: JesperAntonsson
Reviewers: aprantl, rnk, echristo, javed.absar
Reviewed By: aprantl
Subscribers: qcolombet, sdardis, nemanjai, JDevlieghere, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D48319
llvm-svn: 335214
The alignment parameter to getExtLoad is treated as a base alignment,
not the alignment of the load (base + offset). When we infer a better
alignment for a Ptr we need to ensure that it applies to the base to
prevent the alignment on the load from being wrong.
This fixes a bug where the alignment could then be used to incorrectly
prove noalias between a load and a store, leading to a miscompile.
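A hedged illustration in plain C++ (not the DAG code; the helper name is invented) of why the distinction matters: the provable alignment of Base + Off is capped by the largest power of two dividing Off, so an improved alignment inferred for the full address must not be written back as the alignment of the base.
```
#include <cassert>
#include <cstdint>

uint64_t accessAlign(uint64_t BaseAlign, uint64_t Off) {
  if (Off == 0)
    return BaseAlign;
  uint64_t OffAlign = Off & -Off; // lowest set bit: alignment of the offset
  return BaseAlign < OffAlign ? BaseAlign : OffAlign;
}

int main() {
  assert(accessAlign(16, 4) == 4); // 16-aligned base + 4 is only 4-aligned
  assert(accessAlign(4, 8) == 4);  // an offset cannot raise base alignment
  assert(accessAlign(8, 0) == 8);  // offset 0: the base alignment survives
  return 0;
}
```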
Differential Revision: https://reviews.llvm.org/D48029
llvm-svn: 335210
r335150 should resolve the issues with the clang-with-thin-lto-ubuntu
and clang-with-lto-ubuntu builders.
Original message:
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.
As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.
Reviewers: davide, mssimpso, dberlin, efriedma
Reviewed By: davide, dberlin
llvm-svn: 335206
Summary:
Fixes PR36579.
For cases where we had e.g.
DBG_VALUE 42
[...]
DBG_VALUE undef
LiveDebugVariables would discard all undef DBG_VALUEs and then it would
look like the variable had the value 42 throughout the rest of the
function, which is incorrect.
With this patch we don't remove all undef DBG_VALUEs in LiveDebugVariables
so they will be kept after register allocation just like other DBG_VALUEs
which will yield more correct debug information.
Reviewers: aprantl
Reviewed By: aprantl
Subscribers: bjope, Ka-Ka, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D48277
llvm-svn: 335205
conditions feeding a chain of `and`s or `or`s for a branch.
Much like with full non-trivial unswitching, we rely on the pass manager
to handle iterating until all of the profitable unswitches have been
done. This is to allow other more profitable unswitches to fire on any
of the cloned, simpler versions of the loop if viable.
Threading the partial unswitching through the non-trivial unswitching
logic motivated some minor refactorings. If those are too disruptive to
make it reasonable to review this patch, I can separate them out, but
it'll be somewhat time-consuming so I wanted to send it for initial
review as-is. Feel free to tell me whether it warrants pulling apart.
I've tried to re-use (and factor out) logic from the partial trivial
unswitching, but not as much could be shared as I had hoped. Still, this
wasn't as bad as I naively expected.
Some basic testing is added, but I probably need more. Suggestions for
things you'd like to see tested more than welcome. One thing I'd like to
do is add some testing that when we schedule this with loop-instsimplify
it effectively cleans up the cruft created.
Last but not least, this uncovered a bug that has been in loop cloning
the entire time for non-trivial unswitching. Specifically, we didn't
correctly add the outer-most cloned loop to the list of cloned loops.
This meant that LCSSA wouldn't be updated for it hypothetically, and
more significantly that we would never visit it in the loop pass
manager. I noticed this while checking loop-instsimplify by hand. I'll
try to separate this bugfix out into its own patch with a more focused
test. But it is just one line, so shouldn't significantly confuse the
review here.
After this patch, the only missing "feature" in this unswitch I'm aware
of is non-trivial unswitching of switches. I'll try implementing *full*
non-trivial unswitching of switches (which is at least a sound thing to
implement), but *partial* non-trivial unswitching of switches is
something I don't see any sound and principled way to implement. I also
have no interesting test cases for the latter, so I'm not really
worried. The rest of the things that need to be ported are bug-fixes and
more narrow / targeted support for specific issues.
Differential Revision: https://reviews.llvm.org/D47522
llvm-svn: 335203
Summary:
Since the value stored in the cache might be deleted or replaced with
something else, we need to use tracking value handles instead of plain
Value pointers. It was discovered in one of our internal builds, and
unfortunately there is no small reproducer for the issue.
The cache was introduced in rL327328.
Reviewers: ahatanak, pete
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D48407
llvm-svn: 335201
Summary:
Try to match udiv and urem patterns, and sink zext down to the leaves.
I'm not entirely sure why some unrelated tests change, but the added <nsw>s seem right.
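A quick exhaustive check (plain C++, independent of SCEV) of the property that makes the sinking sound: zext distributes over unsigned division and remainder, shown here for i8 widened to i16.
```
#include <cassert>
#include <cstdint>

int main() {
  for (unsigned A = 0; A < 256; ++A) {
    for (unsigned B = 1; B < 256; ++B) { // B = 0 would be division by zero
      uint8_t a = uint8_t(A), b = uint8_t(B);
      uint16_t Wa = a, Wb = b; // the zexts
      // zext(a /u b) == zext(a) /u zext(b), and likewise for urem.
      assert(uint16_t(uint8_t(a / b)) == Wa / Wb);
      assert(uint16_t(uint8_t(a % b)) == Wa % Wb);
    }
  }
  return 0;
}
```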
Reviewers: sanjoy
Subscribers: jlebar, hiraditya, bixia, llvm-commits
Differential Revision: https://reviews.llvm.org/D48338
llvm-svn: 335197
Errors found while processing the DW_AT_ranges attribute are propagated by lower-level
routines and reported by their callers.
Reviewer: JDevlieghere
Differential Revision: https://reviews.llvm.org/D48344
llvm-svn: 335188
These are identical but use microMIPS instructions instead of MIPS instructions.
Also, flatten the 'let AdditionalPredicates = [InMicroMips]' by using the
ISA_MICROMIPS adjective. Add tests for constant materialization.
Reviewers: atanasyan, abeserminji, smaksimovic
Differential Revision: https://reviews.llvm.org/D48275
llvm-svn: 335185
Summary:
Two utils methods have essentially the same functionality. This is an attempt to merge them into one.
1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor
Prior to the patch:
1. MergeBasicBlockIntoOnlyPred
Updates either DomTree or DeferredDominance
Moves all instructions from Pred to BB, deletes Pred
Asserts BB has single predecessor
If the address was taken, replace the block address with constant 1 (?)
2. MergeBlockIntoPredecessor
Updates DomTree, LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns early if BB doesn't have a single predecessor
Returns early if BB's address was taken
After the patch:
Method 2. MergeBlockIntoPredecessor is attempting to become the new default:
Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns early if BB doesn't have a single predecessor
Returns early if BB's address was taken
Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:
1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
Updated in this patch. No challenges.
2. lib/CodeGen/CodeGenPrepare.cpp
Updated in this patch.
i. eliminateFallThrough is straightforward, but I added the use of a temporary array to avoid iterator invalidation.
ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks
Some interesting aspects:
- Since Pred is not deleted (BB is), the entry block does not need updating.
- The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added an assert to make it obvious that BB == SinglePred.
- isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
- eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.
- adding some test owners as subscribers for the interesting tests modified:
test/CodeGen/X86/avx-cmp.ll
test/CodeGen/AMDGPU/nested-loop-conditions.ll
test/CodeGen/AMDGPU/si-annotate-cf.ll
test/CodeGen/X86/hoist-spill.ll
test/CodeGen/X86/2006-11-17-IllegalMove.ll
3. lib/Transforms/Scalar/JumpThreading.cpp
Not covered in this patch. It is the only use case using the DeferredDominance.
I would defer to Brian Rzycki to make this replacement.
Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar
Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits
Differential Revision: https://reviews.llvm.org/D48202
llvm-svn: 335183
Summary: Make the MemorySSA verify also check that all Phi incoming blocks are block predecessors.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D48333
llvm-svn: 335174
I don't believe there is any real reason to have separate X86-specific opcodes for vector compares. Setcc has the same behavior but uses a different encoding for the condition code.
I had to change the CondCodeAction for SETLT and SETLE to prevent some transforms from changing SETGT lowering.
Differential Revision: https://reviews.llvm.org/D43608
llvm-svn: 335173
As described in D48359, this patch pushes InstructionsState down the BoUpSLP call hierarchy instead of the corresponding raw OpValue. This makes it easier to track the alternate opcode etc. and avoids us having to call getAltOpcode which makes it difficult to support more than one alternate opcode.
Differential Revision: https://reviews.llvm.org/D48382
llvm-svn: 335170
Previously this folding was done only if the select was the first operand.
However, for non-commutative operations the constant may come before the
select.
Differential Revision: https://reviews.llvm.org/D48223
llvm-svn: 335167
The idea of partial unswitching is to take a *part* of a branch's
condition that is loop invariant and unswitch just that part. This
primarily makes sense with i1 conditions of branches as opposed to
switches. When dealing with i1 conditions, we can easily extract loop
invariant inputs to a branch and unswitch them to test them entirely
outside the loop.
As part of this, we now create much more significant cruft in the loop
body, so this relies on adding cleanup passes to the loop pipeline and
revisiting unswitched loops to do that cleanup before continuing to
process them.
This already appears to be more powerful at unswitching than the old
loop unswitch pass, and so I'd appreciate pretty careful review in case
I'm just missing some correctness checks. The `LIV-loop-condition` test
case is not unswitched by the old unswitch pass, but is with this pass.
Thanks to Sanjoy and Fedor for the review!
Differential Revision: https://reviews.llvm.org/D46706
llvm-svn: 335156
These instructions were renamed in version 2.2 of the user-level ISA spec, but
the old name should also be accepted by standard tools.
llvm-svn: 335154
This utility should operate on Values, not Instructions. While I'm here,
I've also made it possible to skip emitting replacement dbg.values for
certain debug users (by having RewriteExpr return nullptr).
llvm-svn: 335152
Using OrderedInstructions::dominates as comparator for instructions in
BBs without dominance relation can cause a non-deterministic order
between such instructions. That in turn can cause us to materialize
copies in a non-deterministic order. While this does not affect
correctness, it causes some minor non-determinism in the final generated
code, because values have slightly different labels.
Without this patch, running -print-predicateinfo on a reasonably large
module produces slightly different output on each run.
This patch uses the dominator tree's DFS-in numbers to order instructions
from different BBs, which should enforce a deterministic ordering and
guarantee that dominated instructions come after the instructions that
dominate them.
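A minimal sketch with invented names (this is not the PredicateInfo code): key the comparator on the DFS-in number of the parent block, breaking ties by position in the block, giving a deterministic strict weak ordering even when two blocks have no dominance relation.
```
#include <algorithm>
#include <cassert>
#include <vector>

struct Inst {
  int BlockDFSIn;   // DFS-in number of the parent block in the dom tree
  int IndexInBlock; // position within the block
};

int main() {
  std::vector<Inst> Insts = {{4, 0}, {2, 1}, {2, 0}, {7, 3}};
  std::sort(Insts.begin(), Insts.end(), [](const Inst &A, const Inst &B) {
    if (A.BlockDFSIn != B.BlockDFSIn)
      return A.BlockDFSIn < B.BlockDFSIn;
    return A.IndexInBlock < B.IndexInBlock;
  });
  // Dominators have smaller DFS-in numbers, so dominated instructions
  // reliably come after the instructions that dominate them.
  assert(Insts.front().BlockDFSIn == 2 && Insts.front().IndexInBlock == 0);
  return 0;
}
```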
Reviewers: dberlin, efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D48230
llvm-svn: 335150
Summary:
Due to the uniquing of DICompositeTypes, it's possible for a type from one
module to be loaded into another earlier module without being renamed.
Then when the defining module is being IRMoved, the type can be used as
a Mapping destination before being loaded, such that when it's requested
using TypeMapTy::get() it will fail with an assertion that the type is a
source type when it's actually a type in both the source and
destination modules. Correctly handle that case by allowing a non-opaque
non-literal struct type to be present in both modules.
Fix for PR37684.
Reviewers: pcc, tejohnson
Reviewed By: pcc, tejohnson
Subscribers: tobiasvk, mehdi_amini, steven_wu, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D47898
llvm-svn: 335145
The purpose of this utility is to make it easier for optimizations to
insert replacement dbg.values for instructions they are deleting. This
is useful in situations where salvageDebugInfo is inapplicable, say,
because the new dbg.value cannot refer to an operand of the dying value.
The utility is called insertReplacementDbgValues.
It assumes that the instruction 'From' is going to be deleted, and
inserts replacement dbg.values for each debug user of 'From'. The
newly-inserted dbg.values refer to 'To' instead of 'From'. Each
replacement dbg.value has the same location and variable as the debug
user it replaces, has a DIExpression determined by the result of
'RewriteExpr' applied to an old debug user of 'From', and is placed
before 'InsertBefore'.
This should simplify future patches, like D48331.
llvm-svn: 335144
Summary:
Found some regressions (infinite loop in DAGTypeLegalizer::RemapId)
after r334880. This patch makes sure that we do map a TableId to
itself.
Reviewers: niravd
Reviewed By: niravd
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48364
llvm-svn: 335141
Summary:
The waterfall no longer builds .s files and no longer uses
the wasm-o when it builds object files.
Subscribers: dschuff, jgravelle-google, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D48371
llvm-svn: 335135
This is part of a move towards generalizing the alternate opcode mechanism and not just supporting (F)Add/(F)Sub counterparts.
The patch embeds the AltOpcode in the InstructionsState instead of calling getAltOpcode so often.
I'm hoping to eventually remove all uses of getAltOpcode and handle alternate opcode selection entirely within getSameOpcode, that will require us to use InstructionsState throughout the BoUpSLP call hierarchy (similar to some of the changes in D28907), which I will begin in future patches.
Differential Revision: https://reviews.llvm.org/D48359
llvm-svn: 335134
D47985 saw the old SK_Alternate 'alternating' shuffle mask replaced with the SK_Select mask which accepts either input operand for each lane, equivalent to a vector select with a constant condition operand.
This patch updates SLPVectorizer to make full use of this SK_Select shuffle pattern by removing the 'isOdd()' limitation.
The AArch64 regression will be fixed by D48172.
Differential Revision: https://reviews.llvm.org/D48174
llvm-svn: 335130
When both operands are unsigned, the following optimizations are valid but missing:
1. X > Y && X != 0 --> X > Y
2. X > Y || X != 0 --> X != 0
3. X <= Y || X != 0 --> true
4. X <= Y || X == 0 --> X <= Y
5. X > Y && X == 0 --> false
unsigned foo(unsigned x, unsigned y) { return x > y && x != 0; }
should fold to x > y, but I found that we don't do that right now.
Besides, unsigned foo(unsigned x, unsigned y) { return x < y && y != 0; }
has already been folded to x < y, so there may be a bug.
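The five folds above can be checked exhaustively for 8-bit unsigned operands with a few lines of plain C++ (a verification sketch, not the InstCombine change itself):
```
#include <cassert>
#include <cstdint>

int main() {
  for (unsigned X = 0; X < 256; ++X) {
    for (unsigned Y = 0; Y < 256; ++Y) {
      uint8_t x = uint8_t(X), y = uint8_t(Y);
      assert(((x > y) && (x != 0)) == (x > y));   // 1: x > y implies x != 0
      assert(((x > y) || (x != 0)) == (x != 0));  // 2
      assert(((x <= y) || (x != 0)) == true);     // 3: x == 0 implies x <= y
      assert(((x <= y) || (x == 0)) == (x <= y)); // 4
      assert(((x > y) && (x == 0)) == false);     // 5
    }
  }
  return 0;
}
```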
Patch by: Li Jia He!
Differential Revision: https://reviews.llvm.org/D47922
llvm-svn: 335129
These are produced by GCC and supported by GAS, but not currently contained in
the pseudoinstruction listing in the RISC-V ISA manual.
llvm-svn: 335127
These are produced by GCC and supported by GAS, but not currently contained in
the pseudoinstruction listing in the RISC-V ISA manual.
llvm-svn: 335120
Thumb has more 16-bit encoding space dedicated to ADD than ORR, allowing both a
3-address encoding and a wider range of immediates. So, particularly when
optimizing for code size (but it doesn't make things worse elsewhere) it's
beneficial to select an OR operation as an ADD if we know overflow won't occur.
This is made even better by LLVM's penchant for putting operations in canonical
form by converting the other way.
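The property the selection relies on can be brute-forced in plain C++ (a sketch, not the ISel code): when the operands share no set bits, no carries can be generated, so OR and ADD produce identical results.
```
#include <cassert>

int main() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Y = 0; Y < 256; ++Y)
      if ((X & Y) == 0) // disjoint bits: addition cannot carry
        assert((X | Y) == (X + Y));
  return 0;
}
```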
llvm-svn: 335119
This patch teaches llvm-mca how to identify register writes that implicitly zero
the upper portion of a super-register.
On X86-64, a general purpose register is implemented in hardware as a 64-bit
register. Quoting the Intel 64 Software Developer's Manual: "an update to the
lower 32 bits of a 64 bit integer register is architecturally defined to zero
extend the upper 32 bits". Also, a write to an XMM register performed by an AVX
instruction implicitly zeroes the upper 128 bits of the aliasing YMM register.
This patch adds a new method named clearsSuperRegisters to the MCInstrAnalysis
interface to help identify instructions that implicitly clear the upper portion
of a super-register. The rest of the patch teaches llvm-mca how to use that new
method to obtain the information, and update the register dependencies
accordingly.
I compared the kernels from tests clear-super-register-1.s and
clear-super-register-2.s against the output from perf on btver2. Previously
there was a large discrepancy between the estimated IPC and the measured IPC.
Now the differences are mostly in the noise.
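A plain C++ model of the dependency-breaking rule the patch teaches llvm-mca about (illustrative only, not the llvm-mca implementation; the function name is invented): a write to the low 32 bits of a 64-bit GPR zero-extends, so the full register's new value does not depend on its previous contents.
```
#include <cassert>
#include <cstdint>

uint64_t write32(uint64_t OldReg, uint32_t NewLow) {
  (void)OldReg;            // upper bits are architecturally discarded
  return uint64_t(NewLow); // implicit zero-extension to 64 bits
}

int main() {
  // Identical result regardless of the old register value, which is why
  // such writes break the dependency on the super-register.
  assert(write32(0xFFFFFFFFFFFFFFFFULL, 42) == 42);
  assert(write32(0, 42) == 42);
  return 0;
}
```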
Differential Revision: https://reviews.llvm.org/D48225
llvm-svn: 335113
Summary:
Related to https://bugs.llvm.org/show_bug.cgi?id=37793, https://reviews.llvm.org/D46760#1127287
We'd like to do this canonicalization https://rise4fun.com/Alive/Gmc
But it is currently restricted by rL155136 / rL155362, which says:
```
// This is a constant shift of a constant shift. Be careful about hiding
// shl instructions behind bit masks. They are used to represent multiplies
// by a constant, and it is important that simple arithmetic expressions
// are still recognizable by scalar evolution.
//
// The transforms applied to shl are very similar to the transforms applied
// to mul by constant. We can be more aggressive about optimizing right
// shifts.
//
// Combinations of right and left shifts will still be optimized in
// DAGCombine where scalar evolution no longer applies.
```
I think these tests show that for *constants*, SCEV has no issues with that canonicalization.
Reviewers: mkazantsev, spatel, efriedma, sanjoy
Reviewed By: mkazantsev
Subscribers: sanjoy, javed.absar, llvm-commits, stoklund, bixia
Differential Revision: https://reviews.llvm.org/D48229
llvm-svn: 335101
Summary:
First off: I do not have any access to that processor,
so this is purely theoretical, no benchmarks.
I have been looking into the b**d**ver2 scheduling profile, and while cross-referencing
the existing b**t**ver2 and znver1 profiles, and the reference docs
(`Software Optimization Guide for AMD Family {15,16,17}h Processors`),
I have noticed that only the b**t**ver2 scheduling profile specifies these.
Also, there is no mca test coverage.
Reviewers: RKSimon, craig.topper, courbet, GGanesh, andreadb
Reviewed By: GGanesh
Subscribers: gbedwell, vprasad, ddibyend, shivaram, Ashutosh, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D47676
llvm-svn: 335099
Summary:
I ran llvm-exegesis on SKX, SKL, BDW, HSW, SNB.
Atom is from Agner and SLM is a guess.
I've left AMD processors alone.
Reviewers: RKSimon, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48079
llvm-svn: 335097
Summary:
If we get an error building the SelectionDAG for inline assembly we try to continue and still build the DAG.
But if the return type for the inline assembly is a struct we end up crashing because we try to create an UNDEF node with a struct type which isn't valid.
Instead we need to create an UNDEF for each element of the struct and join them with merge_values.
This patch relies on single operand merge_values being handled gracefully by getMergeValues. If the return type is void there will be no VTs returned by ComputeValueVTs and now we just return instead of calling setValue. Hopefully that's OK; I assumed nothing would need to look up the mapped value for a void node.
Fixes PR37359
Reviewers: rengolin, rovka, echristo, efriedma, bogner
Reviewed By: efriedma
Subscribers: craig.topper, llvm-commits
Differential Revision: https://reviews.llvm.org/D46560
llvm-svn: 335093
Summary:
After r335018, the static tables are guaranteed sorted by the EVEX opcode to convert. We can use this to do a binary search and remove the need for any secondary data structures.
Right now one table is 736 entries and the other is 482 entries. It might make sense to merge the two tables as a follow up. The effort it takes to select the table is probably similar to the extra binary search step it would require for a larger table.
I haven't done any measurements to see if this has any effect on compile time, but I don't imagine that EVEX->VEX conversion is a place we spend a lot of time.
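A minimal sketch of the proposed lookup (the entry type and table contents here are invented stand-ins for the generated tables): std::lower_bound over a table sorted by EVEX opcode replaces the secondary DenseMap.
```
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

struct EvexToVexEntry {
  uint16_t EvexOpc; // key: the tables are sorted by this field
  uint16_t VexOpc;
};

// Hypothetical pre-sorted table.
static const EvexToVexEntry Table[] = {{10, 1}, {25, 2}, {40, 3}, {77, 4}};

const EvexToVexEntry *lookup(uint16_t EvexOpc) {
  const EvexToVexEntry *I = std::lower_bound(
      std::begin(Table), std::end(Table), EvexOpc,
      [](const EvexToVexEntry &E, uint16_t Opc) { return E.EvexOpc < Opc; });
  if (I != std::end(Table) && I->EvexOpc == EvexOpc)
    return I;
  return nullptr; // no VEX equivalent for this opcode
}

int main() {
  assert(lookup(40) && lookup(40)->VexOpc == 3);
  assert(lookup(41) == nullptr);
  return 0;
}
```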
Reviewers: RKSimon, spatel, chandlerc
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48312
llvm-svn: 335092
This patch introduces two helpers to make it easier to ignore debug
intrinsics:
- Instruction::getNextNonDebugInstruction()
This is just like Instruction::getNextNode(), except that it skips debug
info.
- skipDebugInfo(BasicBlock::iterator)
A free function which advances a BasicBlock iterator past any debug
info. This is a no-op when the iterator already points to a non-debug
instruction.
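A simplified stand-in (not the real LLVM iterator classes) showing the shape of the helper: advance past debug entries, and act as a no-op when already positioned at a non-debug instruction.
```
#include <cassert>
#include <vector>

struct Inst {
  bool IsDebug;
};

using Iter = std::vector<Inst>::const_iterator;

Iter skipDebug(Iter It, Iter End) {
  while (It != End && It->IsDebug)
    ++It;
  return It;
}

int main() {
  std::vector<Inst> BB = {{false}, {true}, {true}, {false}};
  assert(skipDebug(BB.begin() + 1, BB.end()) == BB.begin() + 3); // skips two
  assert(skipDebug(BB.begin(), BB.end()) == BB.begin());         // no-op
  return 0;
}
```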
Part of: llvm.org/PR37728
Related to: https://reviews.llvm.org/D47874
Differential Revision: https://reviews.llvm.org/D48305
llvm-svn: 335083
This patch covers up a fairly fundamental issue around remat and register allocation which shows up with pseudo instructions that have more vreg uses than there are physical registers. This patch essentially just disables remat for STATEPOINTs, which are the only case we've seen so far, but long term we need a better fix.
For STATEPOINTs specifically, this is a strict improvement. It unblocks progress towards enabling a currently off-by-default mode which integrates deopt bundle operand lowering with register allocator spilling so that we end up with smaller stack sizes and more optimally placed spills. Assuming no other issues turn up during my next round of integration testing - which, based on experience so far, is admittedly unlikely - we might finally be able to enable something I've been working towards in small bits and pieces for years now. :)
For pseudo ops in general, there are a couple of ideas for a "proper fix" discussed on the bug, but I'm far enough outside my knowledge area to not be able to see any of them through to a successful conclusion. If anyone wants to help out here, please do.
Differential Revision: https://reviews.llvm.org/D41098
llvm-svn: 335077
insertOutlinerPrologue was not used by any target, and prologue-esque code was
beginning to appear in insertOutlinerEpilogue. Refactor that into one function,
buildOutlinedFrame.
This just removes insertOutlinerPrologue and renames insertOutlinerEpilogue.
llvm-svn: 335076
Summary:
This fixes liveness tracking information after `drop` instruction
insertion in ExplicitLocals pass.
When a drop instruction is inserted to drop a dead register operand, the
original operand should be marked not dead anymore because it is now
used by the new drop instruction. And the operand to the new drop
instruction should be marked killed instead. This bug caused some
programs to fail when `llc` is run with the `-verify-machineinstrs` option.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D48253
llvm-svn: 335074
The optimizer is getting smarter (eg, D47986) about differentiating shuffles
based on their mask values, so we should make queries on the mask constant
operand generally available to avoid code duplication.
We'll probably use this soon in the vectorizers and instcombine (D48023 and
https://bugs.llvm.org/show_bug.cgi?id=37806).
We might clean up TTI a bit more once all of its current 'SK_*' options are
covered.
Differential Revision: https://reviews.llvm.org/D48236
llvm-svn: 335067
Summary:
Patch r323922 changed the sigil for physical registers to '$', instead of '%'.
An error message was missed during this change, and reports the wrong sigil.
This patch corrects that diagnostic and the tests that check that error string.
Reviewers: zer0, bjope
Reviewed By: bjope
Subscribers: bjope, thegameg, plotfi, llvm-commits
Differential Revision: https://reviews.llvm.org/D48086
llvm-svn: 335066
This value is the first vector instruction type in numerical order. The
previous value was incorrect, leaving TypeCVI_GATHER outside of the range
for vector instructions. This caused vector .new instructions to be
incorrectly encoded in the presence of gather.
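A toy illustration with invented enum values (the real Hexagon type values differ): classification is a numeric range check, so every vector type, including the gather type, must fall inside [FIRST, LAST].
```
#include <cassert>

enum InstrType {
  TypeALU = 0,
  TypeCVI_FIRST = 10, // must be <= every vector instruction type value
  TypeCVI_VA = 10,
  TypeCVI_GATHER = 11,
  TypeCVI_LAST = 11,
};

bool isVectorType(InstrType T) {
  return T >= TypeCVI_FIRST && T <= TypeCVI_LAST;
}

int main() {
  // Had TypeCVI_FIRST been placed above the gather value, gathers would
  // be misclassified and .new encodings chosen incorrectly.
  assert(isVectorType(TypeCVI_GATHER));
  assert(!isVectorType(TypeALU));
  return 0;
}
```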
llvm-svn: 335065
FMA3Info only exists as a managed static. As far as I know the ManagedStatic construction process is thread safe. It doesn't look like we ever access the ManagedStatic object without immediately doing a query on it that would require the map to be populated. So I don't think we're ever deferring the calculation of the tables from the construction of the object.
So I think we should be able to just populate the FMA3Info map directly in the constructor and get rid of all of the initGroupsOnce stuff.
Differential Revision: https://reviews.llvm.org/D48194
llvm-svn: 335064
There are no provided instruction definitions for this architecture.
Reviewers: smaksimovic, atanasyan, abeserminji
Differential Revision: https://reviews.llvm.org/D48320
llvm-svn: 335057
Previously, some aliases were marked as not being available for microMIPS32R6,
but this was overridden at the top level.
Reviewers: atanasyan, abeserminji, smaksimovic
Differential Revision: https://reviews.llvm.org/D48321
llvm-svn: 335053
The getArithmeticInstrCost calls for shuffle vector entry costs specify TargetTransformInfo::OperandValueKind arguments, but just use the method's default values. This seems to be a copy+paste issue and doesn't affect the costs in any way. The TargetTransformInfo::OperandValueProperties default arguments are already not being used.
Noticed while working on D47985.
Differential Revision: https://reviews.llvm.org/D48008
llvm-svn: 335045
This patch replaces calls to X86-specific intrinsics with floor-ceil semantics
with calls to target-independent @llvm.floor.* and @llvm.ceil.* intrinsics. This
doesn't affect the resulting machine code, as those intrinsics are lowered to
the same instructions, but exposes these specific rounding cases to generic
optimizations.
Differential Revision: https://reviews.llvm.org/D48067
llvm-svn: 335039
This patch handles back-end folding of generic patterns created by lowering the
X86 rounding intrinsics to native IR in cases where the instruction isn't a
straightforward packed values rounding operation, but a masked operation or a
scalar operation.
Differential Revision: https://reviews.llvm.org/D45203
llvm-svn: 335037
LoopSimplifyCFG, being a loop pass, needs to preserve scalar
evolution. This invalidates SE for the loops altered during
block merging.
Differential Revision: https://reviews.llvm.org/D48258
llvm-svn: 335036
This is a fixup for r334830 causing problems in polly-aosp buildbot.
Focus in r334830 was to fix a problem seen with
ConvertDebugDeclareToDebugValue involving store instructions.
It also added some asserts to find out if similar problems
existed for the ConvertDebugDeclareToDebugValue functions
involving load and phi instructions. One of those asserts seems
to blow up in the polly-aosp buildbot, so I'll revert the asserts
while debugging.
llvm-svn: 335031
This patch moves the logic to handle reduction PHI nodes to the end of
adjustLoopBranches. Reduction PHI nodes in the outer loop header can be
moved to the inner loop header and reduction PHI nodes from the inner loop
header can be moved to the outer loop header. In the latter situation,
we have to deal with one kind of PHI node:
PHI nodes that are part of inner loop-only reductions.
We can replace the PHI node with the value coming from outside
the inner loop.
Reviewers: mcrosier, efriedma, karthikthecool
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D46198
llvm-svn: 335027
and expand it post-RA based on the register pressure. However, we fail to apply the add-imm peephole to these pseudo instructions.
Differential Revision: https://reviews.llvm.org/D47568
Reviewed By: Nemanjai
llvm-svn: 335024
This patch adds logic to deal with the following constructions:
%iv = phi i64 ...
%trunc = trunc i64 %iv to i32
%cmp = icmp <pred> i32 %trunc, %invariant
Replacing it with
%iv = phi i64 ...
%cmp = icmp <pred> i64 %iv, sext/zext(%invariant)
in cases where doing so is legal. Specifically, if `%iv` has signed comparison users, it is
required that `sext(trunc(%iv)) == %iv`, and if it has unsigned comparison
uses then we require `zext(trunc(%iv)) == %iv`. The current implementation
bails if `%trunc` has uses other than `icmp`, but in theory we can handle more
cases here (e.g. if the user of the trunc is a bitcast).
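The legality condition can be checked exhaustively at a smaller width (plain C++ verification, not the IndVars code; i16 stands in for i64 and i8 for i32): whenever sext(trunc(iv)) == iv, the narrow signed compare agrees with the wide compare against the sign-extended invariant.
```
#include <cassert>
#include <cstdint>

int main() {
  for (int IV = -32768; IV <= 32767; ++IV) {
    int8_t Trunc = int8_t(IV);
    if (int16_t(Trunc) != int16_t(IV))
      continue; // sext(trunc(iv)) != iv: the transform does not apply
    for (int Inv = -128; Inv <= 127; ++Inv) {
      bool Narrow = Trunc < int8_t(Inv);      // icmp slt on the narrow type
      bool Wide = int16_t(IV) < int16_t(Inv); // icmp slt i16 iv, sext(inv)
      assert(Narrow == Wide);
    }
  }
  return 0;
}
```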
Differential Revision: https://reviews.llvm.org/D47928
Reviewed By: reames
llvm-svn: 335020
This adds an EVEX2VEXOverride string to the X86 instruction class in X86InstrFormats.td. If this field is set it will add a manual entry to the EVEX->VEX tables that doesn't check the encoding information.
Then use this mechanism to map VMOVDU/A8/16, 128-bit VALIGN, and VPSHUFF/I instructions to VEX instructions.
Finally, remove the manual table from the emitter.
This has the bonus of fully sorting the autogenerated EVEX->VEX tables by their EVEX instruction enum value. We may be able to use this to do a binary search for the conversion and get rid of the need to create a DenseMap.
llvm-svn: 335018
EVEX makes heavy use of the VEX.W bit to indicate 64-bit element vs 32-bit elements. Many of the VEX instructions were split into 2 versions with different masking granularity.
The EVEX->VEX table generator can collapse the two versions if the VEX version is tagged as VEX_WIG. But if the VEX version is instead marked VEX.W==0 we can't combine them because we don't know if there is also a VEX version with VEX.W==1.
This patch adds a new VEX_W1X tag that indicates the EVEX instruction encodes with VEX.W==1, but is safe to convert to a VEX instruction with VEX.W==0.
This allows us to remove a bunch of manual EVEX->VEX table entries. We may want to look into splitting up the VEX_WPrefix field which would simplify the disassembler.
llvm-svn: 335017
This reverts r334428. It incorrectly marks some multiplications as nuw. Tim
Shen is working on a proper fix.
Original commit message:
[SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags where safe.
Summary:
Previously we would add them for adds, but not multiplies.
llvm-svn: 335016
The code was previously checking the L2 and L flags on 3 separate lines, treating the combination as an encoding. Instead it's better to think of the L2 bit as being something that can't be done with VEX and early returning. Then we just need to check the L bit.
llvm-svn: 335015
Summary:
Added more utility functions that will be used in EH-related passes. Also
changed the `LoopBottom` function to `getBottom` and used templates to be
able to handle other classes as well, which will be used in CFGSort
later.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D48262
llvm-svn: 335006
Summary:
Add WasmEHFuncInfo and routines to calculate and fill in this struct to
keep track of unwind destination information. This will be used in
other EH related passes.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, chrib, llvm-commits
Differential Revision: https://reviews.llvm.org/D48263
llvm-svn: 335005
Summary:
This patch changes the rethrow instruction to take a BB argument in the LLVM
backend, like `br` and `br_if`. This BB is the target catch BB the
rethrow instruction unwinds to. This BB argument will be converted to a
relative depth immediate at the end of the CFGStackify pass, in the same
way as branches.
RETHROW_TO_CALLER is a codegen-only instruction that should be used when
a rethrow instruction does not have an unwind destination BB, i.e., it
should rethrow to its caller function.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D48260
llvm-svn: 334998
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed.
Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar
Reviewed By: spatel
Subscribers: wdng, nhaehnle
Differential Revision: https://reviews.llvm.org/D47909
llvm-svn: 334996
The instructions that use this class don't have another source register. So I think this was just marking one of the address operands as ReadAfterLd?
llvm-svn: 334994
This reverts commit f976cf4cca0794267f28b54e468007fd476d37d9.
I am reverting this because it causes breakage in a few bots and it's going
to take me some time to look at this.
llvm-svn: 334993
Summary:
Simplify blockaddress usage before giving up in MergeBlockIntoPredecessor.
This is a small missing optimization in MergeBlockIntoPredecessor.
This helps with one simplifycfg test which expects this case to be handled.
Reviewers: davide, spatel, brzycki, asbirlea
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48284
llvm-svn: 334992
Summary:
One for register-based, much like the existing definitions,
and one for stack-based (suffix _S).
This allows us to use registers in most of LLVM (which works better),
and stack-based in MC (which results in a simpler and more readable
assembler / disassembler).
Tried to keep this change as small as possible while passing tests; a
follow-up commit will:
- Add reg->stack conversion in MI.
- Fix asm/disasm in MC to be stack based.
- Fix emitter to be stack based.
tests passing:
llvm-lit -v `find test -name WebAssembly`
test/CodeGen/WebAssembly
test/MC/WebAssembly
test/MC/Disassembler/WebAssembly
test/DebugInfo/WebAssembly
test/CodeGen/MIR/WebAssembly
test/tools/llvm-objdump/WebAssembly
Reviewers: dschuff, sbc100, jgravelle-google, sunfish
Subscribers: aheejin, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D48183
llvm-svn: 334985
Summary: Refactoring for all constant cases which require AllowNewConst and some staging for future fmf usage.
Reviewers: spatel, hfinkel, wristow
Reviewed By: spatel
Subscribers: nhaehnle
Differential Revision: https://reviews.llvm.org/D48289
llvm-svn: 334984
This patch uses the DiagnosticPredicate for SVE predicate patterns
to improve their diagnostics, now giving an 'invalid operand' diagnostic
if the type is not an immediate or one of the expected pattern
labels.
Reviewers: samparker, SjoerdMeijer, javed.absar, fhahn
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D48220
llvm-svn: 334983
The variants added by this patch are:
- SQINC signed increment, e.g. sqinc x0, w0, all, mul #4
- SQDEC signed decrement, e.g. sqdec x0, w0, all, mul #4
- UQINC unsigned increment, e.g. uqinc w0, all, mul #4
- UQDEC unsigned decrement, e.g. uqdec w0, all, mul #4
This patch includes asmparser changes to parse a GPR64 as a GPR32 in
order to satisfy the constraint check:
x0 == GPR64(w0)
in:
sqinc x0, w0, all, mul #4
^___^ (must match)
Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D47716
llvm-svn: 334980
Rather than having an exclusion list in tablegen sources, add a flag to the X86 instruction records that can be used to suppress checking for convertibility.
llvm-svn: 334971
This patch introduces a VPInstructionToVPRecipe transformation, which
allows us to generate code for a VPInstruction-based VPlan, reusing the
existing infrastructure.
Reviewers: dcaballe, hsaito, mssimpso, hfinkel, rengolin, mkuper, javed.absar, sguggill
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D46827
llvm-svn: 334969
CompileOnDemandLayer2 is a replacement for CompileOnDemandLayer built on the ORC
Core APIs. Functions in added modules are extracted and compiled lazily.
CompileOnDemandLayer2 supports multithreaded JIT'd code, and compilation on
multiple threads.
llvm-svn: 334967
materializing weak symbols as strong.
This removes some elaborate flag tweaking and plays nicer with RuntimeDyld,
which relies on weak/common flags to determine whether it should emit a given
weak definition. (Switching to strong up-front makes it appear as if there is
already an overriding definition, which would require an extra back-channel to
override).
llvm-svn: 334966
Ensure we keep track of the input vectors in all cases instead of just for SK_Select.
Ideally we'd reuse the shuffle mask pattern matching in TargetTransformInfo::getInstructionThroughput here to easily add support for all TargetTransformInfo::ShuffleKind without mass code duplication; I've added a TODO for now, but D48236 should help us here.
Differential Revision: https://reviews.llvm.org/D48023
llvm-svn: 334958
This patch adds instructions for comparing elements from two vectors, e.g.
cmpgt p0.s, p0/z, z0.s, z1.s
and also adds support for comparing to a 64-bit wide element vector, e.g.
cmpgt p0.s, p0/z, z0.s, z1.d
The patch also contains aliases for certain comparisons, e.g.:
cmple p0.s, p0/z, z0.s, z1.s => cmpge p0.s, p0/z, z1.s, z0.s
cmplo p0.s, p0/z, z0.s, z1.s => cmphi p0.s, p0/z, z1.s, z0.s
cmpls p0.s, p0/z, z0.s, z1.s => cmphs p0.s, p0/z, z1.s, z0.s
cmplt p0.s, p0/z, z0.s, z1.s => cmpgt p0.s, p0/z, z1.s, z0.s
llvm-svn: 334931
Previously we heap-allocated the X86InstrFMA3Group objects, which were created by passing them small register/memory opcode arrays that existed as individual static tables.
Rather than a bunch of small static arrays we now have one large static table of X86InstrFMA3Group objects. Rather than storing a pointer to the opcode arrays in the X86InstrFMA3Group object, we now store a register and a memory array as part of the object. If a group doesn't have memory or register opcodes, the array entries will be 0.
This greatly simplifies the destruction of the X86InstrFMA3Info object. We no longer need to delete the X86InstrFMA3Group objects as we destruct the DenseMap. And we don't need to keep track of which ones we already deleted.
This reduces the llc binary size on my local machine by ~50k. I can only assume that's really due to the fact that we had something like 512 small static arrays that we passed to the init functions either one at a time or in pairs. So there were between 256 and 512 distinct calls to the init functions in the initOnceImpl method.
llvm-svn: 334925
We already have these aliases for EVEX encoded instructions, but not for the GPR, MMX, SSE, and VEX versions.
Also remove the vpextrw.s EVEX alias. That's not something gas implements.
llvm-svn: 334922
The .s assembly strings allow the reversed forms to be targeted from assembly which matches gas behavior. But when printing the instructions we should print them without the .s to match other tooling like objdump. By using InstAliases we can use the normal string in the instruction and just hide it from the assembly parser.
Ideally we'd add the .s versions to the legacy SSE and VEX versions as well for full compatibility with gas. Not sure how we got to a state where only EVEX was supported.
llvm-svn: 334920
These increase the size of the static tables, but are closer to what we would get if we used the autogenerated table directly. This reduces the remaining large deltas between what's in the manual table and what's in the autogenerated table.
llvm-svn: 334915
symbols in debug mode.
The MaterializationResponsibility class hijacks the Materializing flag to track
symbols that have not yet been resolved in order to guard against redundant
resolution. Since this is an API contract check and is only enforced in debug
mode, there is no reason to maintain the flag state in release mode.
llvm-svn: 334909
Some of the calls to hasSingleUseFromRoot were passing the load itself. If the load's chain result has a user this would count against that. By getting the true parent of the match and ensuring any intermediate nodes between the match and the load have a single use we can avoid this case. isLegalToFold will take care of checking users of the load's data output.
This fixed at least fma-scalar-memfold.ll so that it succeeds without the peephole pass.
llvm-svn: 334908
Support for SVE's predicated select instructions to select elements
from either vector, both in a data-vector and a predicate-vector
variant.
llvm-svn: 334905
We don't want to prevent inlining because of target-cpu and -features
attributes that were added in newer versions of LLVM/Clang: there are
no incompatible functions in PTX, and ptxas will throw errors in such cases.
Differential Revision: https://reviews.llvm.org/D47691
llvm-svn: 334904
These all have a short form encoding that the assembler already prefers. Though that preference seems to only be based on order in the .td file. Hiding the long form saves space in the table and prevents us from breaking the implicit order-based priority.
llvm-svn: 334897
VMOVPQIto64Zmr is not a 64-bit mode only instruction. But I don't know how to test this because VMOVPQIto64mr should always have priority over it in 32-bit mode since its only advantage is XMM16-XMM31 which aren't usable in 32-bit mode.
VMOVPQIto64Zrr is a 64-bit mode only instruction, but we don't need to explicitly mark it as such because it uses a GR64 register which won't parse in 32-bit mode.
llvm-svn: 334896
Summary:
We only modify the CFG in a couple of places, and we can preserve the DT there
with a little effort.
Reviewers: davide, vsk
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D48059
llvm-svn: 334895
This is the common case in the backend when we serialize a condition and then
rematerialize it. Use either the original or the inverted condition.
Differential Revision: https://reviews.llvm.org/D48246
llvm-svn: 334882
Relanding after fixing an expensive-checks failure caused by modifying the tables.
To avoid redundant work, during DAG legalization we keep tables
mapping pre-legalized SDValues to post-legalized SDValues and a
SDValue-to-SDValue map to enable fast node replacements. However, as
the keys are nodes which may be reused it is possible that an entry in
a table refers to a now deleted node N (that should have been renamed
by the value replacement map) while a new node N' exists. If N' is
then replaced, that entry would be wrong. Previously we avoided this,
when potentially violating this property, by walking every table and
updating all node pointers. This is very expensive but hopefully a rare
occurrence.
This patch assigns each instance of an SDValue used in legalization a
unique id and uses these ids in the legalization tables. This avoids
any such aliasing issue, avoiding the full table search and allowing
more aggressive incremental table pruning.
In some cases this is a 1000x speedup to compilation.
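A minimal sketch of the id-keying idea with illustrative types (not the legalizer's data structures): tables keyed by a per-value unique id cannot be confused by a new node that reuses a deleted node's address, provided the deleted node's pointer-to-id mapping is dropped.
```
#include <cassert>
#include <cstdint>
#include <unordered_map>

struct Node {
  int Dummy = 0;
};

struct IdTable {
  uint64_t NextId = 1;
  std::unordered_map<const Node *, uint64_t> Ids; // live node -> unique id

  uint64_t getId(const Node *N) {
    auto It = Ids.find(N);
    return It != Ids.end() ? It->second : (Ids[N] = NextId++);
  }
  void forget(const Node *N) { Ids.erase(N); } // called on node deletion
};

int main() {
  IdTable T;
  Node A;
  uint64_t IdA = T.getId(&A);
  T.forget(&A); // "delete" A; its address may now be handed out again
  Node &B = A;  // model the allocator reusing A's storage for a new node
  uint64_t IdB = T.getId(&B);
  // Stale table entries keyed by IdA can no longer be confused with B.
  assert(IdA != IdB);
  return 0;
}
```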
Reviewers: jyknight, echristo, bogner, tra
Reviewed By: bogner
Subscribers: dberris, grandinj, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D47959
llvm-svn: 334880
Summary: This patch originated from D47388 and is a proper subset of the originating changes, containing only the fmf optimization guard extensions.
Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar, rampitec, nhaehnle, nemanjai
Reviewed By: rampitec, nhaehnle
Subscribers: tpr, nemanjai, wdng
Differential Revision: https://reviews.llvm.org/D47918
llvm-svn: 334876
Summary:
Sending for presubmit review out of an abundance of caution; it would be
bad to mess this up.
Reviewers: sanjoy
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D48238
llvm-svn: 334875
Summary:
Obviates the need for mask/clear/setFlags helpers.
There are some expressions here which can be simplified, but to keep
this easy to review, I have not simplified them in this patch.
No functional change.
Reviewers: sanjoy
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D48237
llvm-svn: 334874
So far, we've only handled special cases of PatFrag like ImmLeaf. This patch
adds support for the remaining cases using similar mechanisms.
Like most C++ code from SelectionDAG, GISel and DAGISel expect to operate on
different types and representations, and as such the code is not compatible
between the two. It's therefore necessary to add an alternative implementation
in the GISelPredicateCode field.
The target test for this feature could easily be done with IntImmLeaf and this
would save on a little boilerplate. The reason I've chosen to implement this
using PatFrag.GISelPredicateCode and not IntImmLeaf is because I was unable to
find a rule that was blocked solely by lack of support for PatFrag predicates. I
found that the ones I investigated as being likely candidates for the test
were further blocked by other things.
llvm-svn: 334871
Not sure any of these matter today because I don't think we ever produce them with IMPLICIT_DEF as an input. But by listing them we won't be surprised in the future.
llvm-svn: 334867
Modify ExpandStrictFPOp(...) to handle nodes that have scalar
operands.
Also, add a Strict FMA test and do some other light cleanup in the
Strict FP code.
Differential Revision: https://reviews.llvm.org/D48149
llvm-svn: 334863
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed.
Reviewers: spatel, hfinkel, wristow, arsenm
Reviewed By: spatel
Subscribers: wdng, nhaehnle
Differential Revision: https://reviews.llvm.org/D47954
llvm-svn: 334862
Summary:
Using associated metadata rather than llvm.used allows linkers to
perform dead stripping with -fsanitize-coverage=pc-table. Unfortunately
in my local tests, LLD was the only linker that made use of this metadata.
Partially addresses https://bugs.llvm.org/show_bug.cgi?id=34636 and fixes
https://github.com/google/sanitizers/issues/971.
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: Dor1s, hiraditya, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D48203
llvm-svn: 334858
Enables using the high and high-adjusted symbol modifiers on thread local
storage modifiers in powerpc assembly. Needed to be able to support 64 bit
thread-pointer and dynamic-thread-pointer access sequences.
Differential Revision: https://reviews.llvm.org/D47754
llvm-svn: 334856
Add support for the "@high" and "@higha" symbol modifiers in powerpc64 assembly.
The modifiers represent accessing the segment consisting of bits 16-31 of a
64-bit address/offset.
Differential Revision: https://reviews.llvm.org/D47729
llvm-svn: 334855
An earlier commit prevented folds from the peephole pass by checking for IMPLICIT_DEF. But later in the pipeline IMPLICIT_DEF just becomes an Undef flag on the input register, so we need to check for that case too.
llvm-svn: 334848
When coalescing a small register into a subregister of a larger register,
if the larger register is rematerialized, the function updateRegDefUses
can add an <undef> flag to the rematerialized definition (since it's
treating it as only defining the coalesced subregister). While doing so is
not incorrect under that assumption, make sure to remove the flag later on
after the call to updateRegDefUses.
llvm-svn: 334845
Summary:
When iterating users of a multiply in processUMulZExtIdiom, the
call to setOperand in the truncation case may replace the use
being visited; make sure the iterator has been advanced before
doing that replacement.
Reviewers: majnemer, davide
Reviewed By: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48192
llvm-svn: 334844
We were unnecessarily going from SmallString to std::string just to
get a null-terminated C string. So just...don't do that. Crash
slightly faster!
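For illustration, a minimal sketch of the pattern (the names here are made up):
#include "llvm/ADT/SmallString.h"
const char *message() {
  static llvm::SmallString<128> Msg;
  Msg = "file not found";
  // c_str() null-terminates the inline buffer in place: no std::string copy.
  return Msg.c_str();
}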
llvm-svn: 334841
This is a minor fix for the LV cost model, where the cost for VF=2 was
computed twice when the vectorization of the loop was forced without
specifying a VF.
Reviewers: xusx595, hsaito, fhahn, mkuper
Reviewed By: hsaito, xusx595
Differential Revision: https://reviews.llvm.org/D48048
llvm-svn: 334840
Increment/decrement scalar register by (scaled) element count given by
predicate pattern, e.g. 'incw x0, all, mul #4'.
Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D47713
llvm-svn: 334838
Try to access pieces 4 bytes at a time. This helps
various hasOneUse extract_vector_elt combines, such
as load width reductions.
Avoids test regressions in a future commit.
llvm-svn: 334836
This is r334704 (which was reverted in r334732) with a fix for
types like x86_fp80. We need to use getTypeAllocSizeInBits and
not getTypeStoreSizeInBits to avoid dropping debug info for
such types.
Original commit msg:
> Summary:
> Do not convert a DbgDeclare to DbgValue if the store
> instruction only refers to a fragment of the variable
> described by the DbgDeclare.
>
> The problem was seen when, for example, there was an alloca for an
> array or struct, and there were stores to individual elements.
> In the past we inserted a DbgValue intrinsic for each store,
> just as if the store wrote the whole variable.
>
> When handling store instructions we insert a DbgValue that
> indicates that the variable is "undefined", as we do not know
> which part of the variable is updated by the store.
>
> When ConvertDebugDeclareToDebugValue is used with a load/phi
> instruction we assert that the referenced value is large enough
> to cover the whole variable. Afaict this should be true for all
> scenarios where those methods are used on trunk. If the assert
> blows up in the future I guess we could simply skip inserting a
> dbg.value instruction.
>
> In the future I think we should examine which part of the variable
> is accessed, and add a DbgValue intrinsic with an appropriate
> DW_OP_LLVM_fragment expression.
>
> Reviewers: dblaikie, aprantl, rnk
>
> Reviewed By: aprantl
>
> Subscribers: JDevlieghere, llvm-commits
>
> Tags: #debug-info
>
> Differential Revision: https://reviews.llvm.org/D48024
llvm-svn: 334830
Some instructions require a limited set of FP immediates as operands,
for example '#0.5 or #1.0' for SVE's FADD instruction.
This patch adds support for parsing and printing such FP immediates as
exact values (e.g. #0.499999 is not accepted for #0.5).
Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D47711
llvm-svn: 334826
Summary:
We already do it for splat constants, but not just values.
Also, undef cases are mostly non-functional.
The original commit was reverted because
it broke tests for the amdgpu backend, which I didn't check.
Now the backend has been updated to recognize these new
patterns, so we are good.
https://bugs.llvm.org/show_bug.cgi?id=37603
https://rise4fun.com/Alive/cplX
Reviewers: spatel, craig.topper, mareko, bogner, rampitec, nhaehnle, arsenm
Reviewed By: spatel, rampitec, nhaehnle
Subscribers: wdng, nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D47980
llvm-svn: 334818
Summary: The same pattern as D48010, but this one is IR-canonical as of D47428.
Reviewers: nhaehnle, bogner, tstellar, arsenm
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #amdgpu
Differential Revision: https://reviews.llvm.org/D48012
llvm-svn: 334817
Summary:
As a followup for D48007.
Since we already handle the `x << (bitwidth - y) >> (bitwidth - y)` pattern,
which does not have UB for either of the edge cases (`y == 0`, `y == bitwidth`),
I think also handling a pattern that is UB for `y == bitwidth` should be fine.
Reviewers: nhaehnle, bogner, tstellar, arsenm
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #amdgpu
Differential Revision: https://reviews.llvm.org/D48010
llvm-svn: 334816
Summary:
D47980 will canonicalize `x << (32 - y) >> (32 - y)`,
which is the pattern AMDGPU expects, to `x & (-1 >> (32 - y))`,
which AMDGPU does not recognize.
Thus, the new pattern needs to be recognized, too.
Reviewers: nhaehnle, bogner, tstellar, arsenm
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #amdgpu
Differential Revision: https://reviews.llvm.org/D48007
llvm-svn: 334815
Instruction bundling is only supported on descendants of the
MCEncodedFragment type. Moving the bundling functionality and
MCSubtargetInfo to this class makes it easier to set and extract the
MCSubtargetInfo when it is necessary.
This is a refactoring change that will make it easier to pass the
MCSubtargetInfo through to writeNops when nop padding is required.
Differential Revision: https://reviews.llvm.org/D45959
llvm-svn: 334814
I think this covers most of the unmasked vector instructions. We're still missing a lot of the masked instructions.
There are some test changes here because of the new folding support. I don't think these particular cases should be folded because it creates an undef register dependency. I think the changes introduced in r334175 are not handling stack folding. They're only blocking the peephole pass.
llvm-svn: 334800
IEEE 754 defines the expected result on overflow. As far as I know,
hardware implementations (of f16), and compiler-rt (__floatuntisf)
correctly return +-Inf on overflow. And I can't think of any useful
transform that would take advantage of overflow being undefined here.
Differential Revision: https://reviews.llvm.org/D47807
llvm-svn: 334777
Once a symbol has been selected for materialization it can no longer be
overridden. Stripping the weak flag guarantees this (override attempts will
then be treated as duplicate definitions and result in a DuplicateDefinition
error).
llvm-svn: 334771
Summary:
Here we relax the old constraint, which combined unsafe with the TargetOption flag HonorSignDependentRoundingFPMathOption, on the assertion that unsafe is no longer needed (and never was required) for correctness on FDIV/FMUL.
Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar
Reviewed By: spatel
Subscribers: efriedma, wdng, tpr
Differential Revision: https://reviews.llvm.org/D48057
llvm-svn: 334769
In particular, when asked to print a MemoryAccess, we'll now print where
defs are optimized to, and we'll print optimized access types.
This patch also introduces an operator<< to make printing AliasResults
easier.
Patch by Juneyoung Lee!
Differential Revision: https://reviews.llvm.org/D47860
llvm-svn: 334760
isVectorClearMaskLegal() is the TLI hook used by the generic
DAGCombiner::XformToShuffleWithZero().
We've grown to accommodate/expect this transform to shuffle
(disabling it more generally results in many regressions).
So I'm narrowly excluding the 256-bit types that clearly
are not worthwhile for AVX1.
I think in most cases we are able to recover by converting
the shuffle back into 'and' ops, but the cases in:
https://bugs.llvm.org/show_bug.cgi?id=37749
...show that there are cracks.
llvm-svn: 334759
This is r334750 (which was reverted in r334754) with a fix for an
uninitialized variable that was caught by msan.
Original commit message:
> If a copy bundle happens to involve overlapping registers, we can end
> up with emitting the copies in an order that ends up clobbering some
> of the subregisters. Since instructions in the copy bundle
> semantically happen at the same time, this is incorrect and we need to
> make sure we order the copies such that this doesn't happen.
llvm-svn: 334756
Summary: An FMF constraint is added to FADD, with unsafe still available as the fallback.
Reviewers: spatel, wristow, arsenm, hfinkel
Reviewed By: spatel
Subscribers: wdng
Differential Revision: https://reviews.llvm.org/D48180
llvm-svn: 334753
WebAssembly doesn't support more than one function per section
and we rely on function sections being unique. This change ignores
the section provided by the function to avoid two functions being
in the same section.
Without this change the object writer produces the following
error for this test:
LLVM ERROR: section already has a defining function: baz
Differential Revision: https://reviews.llvm.org/D48178
llvm-svn: 334752
If a copy bundle happens to involve overlapping registers, we can end
up with emitting the copies in an order that ends up clobbering some
of the subregisters. Since instructions in the copy bundle
semantically happen at the same time, this is incorrect and we need to
make sure we order the copies such that this doesn't happen.
Differential Revision: https://reviews.llvm.org/D48154
llvm-svn: 334750
Summary:
Specifically, we transform
zext(2^K * (trunc X to iN)) to iM ->
2^K * (zext(trunc X to i{N-K}) to iM)<nuw>
This is helpful because pulling the 2^K out of the zext allows further
optimizations.
Reviewers: sanjoy
Subscribers: hiraditya, llvm-commits, timshen
Differential Revision: https://reviews.llvm.org/D48158
llvm-svn: 334737
Summary:
Previously we would do this simplification only if it did not introduce
any new truncs (excepting new truncs which replace other cast ops).
This change weakens this condition: If the number of truncs stays the
same, but we're able to transform trunc(X + Y) to X + trunc(Y), that's
still simpler, and it may open up additional transformations.
While we're here, also clean up some duplicated code.
Reviewers: sanjoy
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D48160
llvm-svn: 334736
This reverts rL331412. We didn't end up using fragment atoms
in the wasm object writer after all.
Differential Revision: https://reviews.llvm.org/D48173
llvm-svn: 334734
To avoid redundant work, during DAG legalization we keep tables
mapping pre-legalized SDValues to post-legalized SDValues and a
SDValue-to-SDValue map to enable fast node replacements. However, as
the keys are nodes which may be reused it is possible that an entry in
a table refers to a now deleted node N (that should have been renamed
by the value replacement map) while a new node N' exists. If N' is
then replaced, that entry would be wrong. Previously we avoided this by
walking every table and updating all node pointers whenever this property
was potentially violated. This is very expensive but hopefully a rare
occurrence.
This patch assigns each instance of a SDValue used in legalization a
unique id and uses these ids in the legalization tables. This avoids
any such aliasing issue, avoiding the full table search and allowing
more aggressive incremental table pruning.
In some cases this is a 1000x speedup to compilation.
Reviewers: jyknight, echristo, bogner, tra
Reviewed By: bogner
Subscribers: dberris, grandinj, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D47959
llvm-svn: 334729
If WaitUntilReady is set to true then blockingLookup will return once all
requested symbols are ready. If WaitUntilReady is set to false then
blockingLookup will return as soon as all requested symbols have been
resolved. In the latter case, if any error occurs in finalizing the symbols it
will be reported to the ExecutionSession, rather than returned by
blockingLookup.
llvm-svn: 334722
In some cases, for example when compiling a preprocessed file, the
front-end is not able to provide an MD5 checksum for all files. When
that happens, omit the MD5 checksums from the final DWARF, because
DWARF doesn't have a way to indicate that some but not all files have
a checksum.
When assembling a .s file, and some but not all .file directives
provide an MD5 checksum, issue a warning and don't emit MD5 into the
DWARF.
Fixes PR37623.
Differential Revision: https://reviews.llvm.org/D48135
llvm-svn: 334710
This patch teaches EarlyCSE to figure out that if `and i1 %x, %y` is true then both
`%x` and `%y` are true in the taken branch, and if `or i1 %x, %y` is false then both
`%x` and `%y` are false in the non-taken branch. Fix for PR37635.
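As a C-level analogue of the new facts (illustrative only, not the in-tree code):
bool pick(bool X, bool Y) {
  if (X & Y) {
    // Taken branch: both X and Y are known true here, so uses of
    // either can be replaced by the constant true.
    return X;
  }
  if (!(X | Y)) {
    // Here (X | Y) is false, so both X and Y are known false.
    return Y;
  }
  return X;
}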
Differential Revision: https://reviews.llvm.org/D47574
Reviewed By: reames
llvm-svn: 334707
Summary:
Do not convert a DbgDeclare to DbgValue if the store
instruction only refers to a fragment of the variable
described by the DbgDeclare.
The problem was seen when, for example, there was an alloca for an
array or struct, and there were stores to individual elements.
In the past we inserted a DbgValue intrinsic for each store,
just as if the store wrote the whole variable.
When handling store instructions we insert a DbgValue that
indicates that the variable is "undefined", as we do not know
which part of the variable is updated by the store.
When ConvertDebugDeclareToDebugValue is used with a load/phi
instruction we assert that the referenced value is large enough
to cover the whole variable. Afaict this should be true for all
scenarios where those methods are used on trunk. If the assert
blows up in the future I guess we could simply skip inserting a
dbg.value instruction.
In the future I think we should examine which part of the variable
is accessed, and add a DbgValue intrinsic with an appropriate
DW_OP_LLVM_fragment expression.
Reviewers: dblaikie, aprantl, rnk
Reviewed By: aprantl
Subscribers: JDevlieghere, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D48024
llvm-svn: 334704
This is part of the work to cleanup use of 'alternate' ops so we can use the more general SK_Select shuffle type.
Only getSameOpcode calls getMainOpcode and much of the logic is repeated in both functions. This will require some reworking of D28907 but that patch has hit trouble and is unlikely to be completed anytime soon.
Differential Revision: https://reviews.llvm.org/D48120
llvm-svn: 334701
Summary:
The tests in:
https://bugs.llvm.org/show_bug.cgi?id=37751
...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes.
This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll, sse-cvttp2si.ll, and avx512-cvttp2i.ll.
Reviewers: RKSimon, gbedwell, spatel
Reviewed By: spatel
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D47993
llvm-svn: 334685
This is failing to compile when LLVM_ENABLE_THREADS is false,
and the fix is not immediately obvious, so reverting while I look
into it.
llvm-svn: 334658
When using clang --save-stats -mllvm -time-passes, both timers and stats
end up in the same json file.
We could end up with things like:
{
  "asm-printer.EmittedInsts": 1,
  "time.pass.Virtual Register Map.wall": 2.9015541076660156e-04,
  "time.pass.Virtual Register Map.user": 2.0500000000000379e-04,
  "time.pass.Virtual Register Map.sys": 8.5000000000001741e-05,
}
This patch makes use of the pass argument name (if available) in the
JSON key to end up with things like:
{
  "asm-printer.EmittedInsts": 1,
  "time.pass.virtregmap.wall": 2.9015541076660156e-04,
  "time.pass.virtregmap.user": 2.0500000000000379e-04,
  "time.pass.virtregmap.sys": 8.5000000000001741e-05,
}
This also helps avoid writing another JSON printer to handle all the
cases that we could have in our pass names.
Fixed test instead of adding a new one originally from r334649.
Differential Revision: https://reviews.llvm.org/D48109
llvm-svn: 334657
Such globals are very likely to be part of a sorted section array, such
as the .CRT sections used for dynamic initialization. The ATL uses its own
sorted sections called ATL$__a, ATL$__m, and ATL$__z. Instead of special
casing them, just look for the dollar sign, which is what invokes linker
section sorting for COFF.
Avoids issues with ASan and the ATL uncovered after we started
instrumenting comdat globals on COFF.
llvm-svn: 334653
When using clang --save-stats -mllvm -time-passes, both timers and stats
end up in the same json file.
We could end up with things like:
{
  "asm-printer.EmittedInsts": 1,
  "time.pass.Virtual Register Map.wall": 2.9015541076660156e-04,
  "time.pass.Virtual Register Map.user": 2.0500000000000379e-04,
  "time.pass.Virtual Register Map.sys": 8.5000000000001741e-05,
}
This patch makes use of the pass argument name (if available) in the
JSON key to end up with things like:
{
  "asm-printer.EmittedInsts": 1,
  "time.pass.virtregmap.wall": 2.9015541076660156e-04,
  "time.pass.virtregmap.user": 2.0500000000000379e-04,
  "time.pass.virtregmap.sys": 8.5000000000001741e-05,
}
This also helps avoid writing another JSON printer to handle all the
cases that we could have in our pass names.
Differential Revision: https://reviews.llvm.org/D48109
llvm-svn: 334649
Previously ThreadPool could only queue async "jobs", i.e. work
that was done for its side effects and not for its result. It's
useful occasionally to queue async work that returns a value.
From an API perspective, this is very intuitive. The previous
API just returned a shared_future<void>, so all we need to do is
make it return a shared_future<T>, where T is the type of value
that the operation returns.
Making this work required a little magic, but ultimately it's not
too bad. Instead of keeping a shared queue<packaged_task<void()>>
we just keep a shared queue<unique_ptr<TaskBase>>, where TaskBase
is a class with a pure virtual execute() method, then have a
templated derived class that stores a packaged_task<T()>. Everything
else works out pretty cleanly.
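A minimal usage sketch of the extended API:
#include "llvm/Support/ThreadPool.h"
int answer() {
  llvm::ThreadPool Pool;
  std::shared_future<int> F = Pool.async([] { return 6 * 7; });
  return F.get(); // blocks until the queued task finishes, then yields 42
}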
Differential Revision: https://reviews.llvm.org/D48115
llvm-svn: 334643
On Windows we've observed that if you open a file, write to it, map it into
memory and close the file handle, the contents of the memory mapping can
sometimes be incorrect. That was what we did when adding an entry to the
ThinLTO cache using the TempFile and MemoryBuffer classes, and it was causing
intermittent build failures on Chromium's ThinLTO bots on Windows. More
details are in the associated Chromium bug (crbug.com/786127).
We can prevent this from happening by keeping a handle to the file open while
the mapping is active. So this patch changes the mapped_file_region class to
duplicate the file handle when mapping the file and close it upon unmapping it.
One gotcha is that the file handle that we keep open must not have been
created with FILE_FLAG_DELETE_ON_CLOSE, as otherwise the operating system
will prevent other processes from opening the file. We can achieve this
by avoiding the use of FILE_FLAG_DELETE_ON_CLOSE altogether. Instead,
we use SetFileInformationByHandle with FileDispositionInfo to manage the
delete-on-close bit. This lets us remove the hack that we used to use to
clear the delete-on-close bit on a file opened with FILE_FLAG_DELETE_ON_CLOSE.
A downside of using SetFileInformationByHandle/FileDispositionInfo as
opposed to FILE_FLAG_DELETE_ON_CLOSE is that it prevents us from using
CreateFile to open the file while the flag is set, even within the same
process. This doesn't seem to matter for any client of TempFile except
LockFileManager, which calls sys::fs::create_link to create a
hard link from the lock file, and in the process of doing so tries to open
the file. To prevent this change from breaking LockFileManager I changed it
to stop using TempFile by effectively reverting r318550.
Differential Revision: https://reviews.llvm.org/D48051
llvm-svn: 334630
Currently the handle type is a global pointer which holds 8 bytes.
We need a larger type which holds 16 bytes, therefore change it
to [i64 x 2].
Differential Revision: https://reviews.llvm.org/D48094
llvm-svn: 334625
We're constant folding here, so we shouldn't check uses. This matches
the IR optimizer behavior.
The x86 test shows the expected win. The AArch64 test shows something
else. This only seems to happen if the "generic" AArch64 CPU model is
used by MachineCombiner, so I'll file a bug report to follow-up.
llvm-svn: 334608
Summary:
The code that handles ISD::Register and ISD::CopyFromReg assumes
the target is amdgcn, so this is broken on r600. We don't
need this analysis on r600 anyway so we can safely move
it to SITargetLowering.
Reviewers: alex-t, arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: msearles, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D46298
llvm-svn: 334607
Add a helper function to expand constrained FP operations as needed.
Note that the Strict POWI operation is not handled in this patch since
the format is slightly different from the others.
Differential Revision: https://reviews.llvm.org/D47491
llvm-svn: 334603
Even if we support no-canonical-prefix on
clang-cl (https://reviews.llvm.org/D47480), argv0 becomes an absolute path
in clang-cl, and that embeds an absolute path in /showIncludes.
This patch removes such full path normalization from InitLLVM on
Windows, which removes the absolute path from clang-cl output
(obj/stdout/stderr) when the debug flag is disabled.
Patch by Takuto Ikuta!
Differential Revision: https://reviews.llvm.org/D47578
llvm-svn: 334602
Author: milena.vujosevic.janicic
Reviewers: sdardis
The patch extends the size reduction pass for MicroMIPS.
It introduces reduction of two instructions into one instruction:
Two SW instructions are transformed into one SWP instruction.
Two LW instructions are transformed into one LWP instruction.
Differential Revision: https://reviews.llvm.org/D39115
llvm-svn: 334595
This shortcoming was noted in D47330, and the test diffs show we already
had other examples where we failed to fold to a SHRUNKBLEND:
/// Dynamic (non-constant condition) vector blend where only the sign bits
/// of the condition elements are used. This is used to enforce that the
/// condition mask is not valid for generic VSELECT optimizations.
This patch implements an idea from D48043 and would obsolete that patch
because it catches more cases (notably the AVX1 case that was missed there).
All we're doing is allowing the existing transform to fire more often by
removing the post-legalize constraint. All of the relevant feature checks
and other predicates are left as-is.
Differential Revision: https://reviews.llvm.org/D48078
llvm-svn: 334592
Fences are inserted according to table A.6 in the current draft of version 2.3
of the RISC-V Instruction Set Manual, which incorporates the memory model
changes and definitions contributed by the RISC-V Memory Consistency Model
task group.
Instruction selection failures will now occur for 8/16/32-bit atomicrmw and
cmpxchg operations when targeting RV32IA until lowering for these operations
is added in a follow-on patch.
Differential Revision: https://reviews.llvm.org/D47589
llvm-svn: 334591
This patch adds lowering for atomic fences and relies on AtomicExpandPass to
lower atomic loads/stores, atomic rmw, and cmpxchg to __atomic_* libcalls.
test/CodeGen/RISCV/atomic-* are modelled on the exhaustive
test/CodeGen/PPC/atomics-regression.ll, and will prove more useful once RV32A
codegen support is introduced.
Fence mappings are taken from table A.6 in the current draft of version 2.3 of
the RISC-V Instruction Set Manual, which incorporates the memory model changes
and definitions contributed by the RISC-V Memory Consistency Model task group.
Differential Revision: https://reviews.llvm.org/D47587
llvm-svn: 334590
Summary:
For targets I'm not familiar with, I've automatically made the "default to 1 for each resource" behaviour explicit in the td files.
For more obvious cases, I've ventured a fix.
Some notes:
- Exynos is especially fishy.
- AArch64SchedThunderX2T99.td had some truncated entries. If I understand correctly, the person who wrote that interpreted the ResourceCycle as a range. I made the decision to use the upper/lower bound for consistency with the 'Latency' value. I'm sure there is a better choice.
- The change to X86ScheduleBtVer2.td is an NFC, it just makes values more explicit.
Also see PR37310.
Reviewers: RKSimon, craig.topper, javed.absar
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D46356
llvm-svn: 334586
This patch fixes a failure in lnt tests with the -verify-machineinstrs option.
When the VSX Swap Removal pass swapped two register operands, it did not maintain the kill flags associated with them. This patch swaps the flags as well as the register numbers to avoid inconsistent kill flag information.
llvm-svn: 334579
Summary:
This method was not correct for entries in DWO files as it assumed it
could just add up the CU and DIE offsets to get the absolute DIE offset.
This is not correct for the DWO files, as here the CU offset will
reference the skeleton unit, whereas the DIE offset will be the offset
in the full unit in the DWO file.
Unfortunately, this means that we are not able to determine the absolute
DIE offset using the information in the .debug_names section alone,
which means we have to offload some of this work to the users of this
class.
To demonstrate how this can be done, I've added/fixed the ability to
lookup entries using accelerator tables in DWO files in llvm-dwarfdump.
To make this happen, I've needed to make two extra changes in other
classes:
- made the DWARFContext method to lookup a CU based on the section
offset public. I've needed this functionality to lookup a CU, and this
seems like a useful thing in general.
- made DWARFUnit::getDWOId call extractDIEsIfNeeded. Before this, the
DWOId was filled in only if the root DIE happened to be parsed
before we called the accessor. Since the lazy parsing is supposed to
happen under the hood, calling extractDIEsIfNeeded seems appropriate.
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D48009
llvm-svn: 334578
IndVarSimplify sometimes makes transforms based on users that are trivially dead. In particular,
if DCE wasn't run before it, there may be a dead `sext/zext` in the loop that will trigger widening
transforms; however, it makes no sense to do so.
This patch teaches IndVarSimplify to ignore the most trivial cases of that.
Differential Revision: https://reviews.llvm.org/D47974
Reviewed By: sanjoy
llvm-svn: 334567
Previously we were whitelisting instructions based on their SchedRW value. With the masked store instructions explicitly removed via NotMemoryFoldable, we don't seem to need this check anymore.
llvm-svn: 334563
If a VSO has a fallback definition generator attached it will be called during
lookup (and lookupFlags) for any unresolved symbols. The definition generator
can add new definitions to the VSO for any unresolved symbol. This allows VSOs
to generate new definitions on demand.
The immediate use case for this code is supporting VSOs that can import
definitions found via dlsym on demand.
llvm-svn: 334538
Resolvers are required to find results for all requested symbols or return an
error, but if a resolver fails to adhere to this contract (by returning results
for only a subset of the requested symbols) then this code will loop forever.
This assertion catches resolvers that fail to adhere to the contract.
llvm-svn: 334536
This only affects modules with lazy GVMaterializers attached (usually modules
read off disk using the lazy bitcode reader). For such modules, materializing
before compiling prevents crashes due to missing function bodies /
initializers.
llvm-svn: 334535
Register x20 is a callee-saved register which may be used for other
purposes in certain contexts, for example to hold special variables
within the kernel. This change adds support for reserving this register
both to frontend and backend to make this register usable for these
purposes.
Differential Revision: https://reviews.llvm.org/D46552
llvm-svn: 334531
All COFF targets should use @IMGREL32 relocations for symbol differences
against __ImageBase. Do the same for getSectionForConstant, so that
immediates lowered to globals get merged across TUs.
Patch by Chris January
Differential Revision: https://reviews.llvm.org/D47783
llvm-svn: 334523
Apparently, the MachineInstr class definition, as well as pretty much all of
the machine passes, assumes that the only kind of MachineInstr operands
that is variadic for variadic opcodes is explicit non-definitions.
In particular, this assumption is made by MachineInstr::defs(), uses(),
and explicit_uses() methods, as well as by MachineCSE pass.
The assumption is incorrect judging from at least the TableGen backend
implementation, which recognizes variable_ops in OutOperandList, and the
very existence of G_UNMERGE_VALUES generic opcode, or ARM load multiple
instructions, all of which have variadic defs.
In particular, MachineCSE pass breaks MIR with CSE'able G_UNMERGE_VALUES
instructions in it.
This commit implements MachineInstr::getNumExplicitDefs() similar to
pre-existing MachineInstr::getNumExplicitOperands(), fixes
MachineInstr::defs(), uses(), and explicit_uses(), and fixes MachineCSE
pass.
As the issue addressed seems to affect only machine passes that could be
run mid-GlobalISel pipeline at the moment, the other passes aren't fixed
by this commit, like MachineLICM: that could be done on a per-pass basis
when (if ever) they get adopted for GlobalISel.
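As a sketch of the accessor shape this implies (illustrative, not the exact in-tree code):
#include "llvm/CodeGen/MachineInstr.h"
llvm::iterator_range<llvm::MachineInstr::mop_iterator>
explicitDefs(llvm::MachineInstr &MI) {
  // Count the explicit defs actually present on this instruction instead
  // of assuming a fixed count from the opcode description.
  return llvm::make_range(MI.operands_begin(),
                          MI.operands_begin() + MI.getNumExplicitDefs());
}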
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D45640
llvm-svn: 334520
- Do not emit following assembler directives:
- .hsa_code_object_version
- .hsa_code_object_isa
- .amd_amdgpu_isa
- .amd_amdgpu_hsa_metadata
- .amd_amdgpu_pal_metadata
- Do not emit .note entries
- Clean up the kernel descriptor header file and bring it in sync
- Emit kernel descriptor into .rodata with appropriate relocations and
alignments
llvm-svn: 334519
This simplifies some code which had StringRefs to begin with, and
makes other code more complicated which had const char* to begin
with.
In the end, I think this makes for a more idiomatic and platform
agnostic API. Not all platforms launch process with null terminated
c-string arrays for the environment pointer and argv, but the api
was designed that way because it allowed easy pass-through for
posix-based platforms. There's a little additional overhead now
since on posix based platforms we'll be taking StringRefs which
were constructed from null terminated strings and then copying
them to null terminate them again, but from a readability and
usability standpoint of the API user, I think this API signature
is strictly better.
llvm-svn: 334518
Summary:
This is similar to D46319 (ARM). x86-64 psABI p40 gives an example:
leaq _GLOBAL_OFFSET_TABLE_(%rip), %r15 # GOTPC32 reloc
GNU as creates R_X86_64_GOTPC32. However, MC currently emits R_X86_64_PC32.
Reviewers: javed.absar, echristo
Subscribers: kristof.beyls, llvm-commits, peter.smith, grimar
Differential Revision: https://reviews.llvm.org/D47507
llvm-svn: 334515
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fmul.
Reviewers: spatel, hfinkel, wristow, arsenm
Reviewed By: spatel
Subscribers: nhaehnle, wdng
Differential Revision: https://reviews.llvm.org/D47911
llvm-svn: 334514
As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources:
e.g. v4f32: <0,5,2,7> or <4,1,6,3>
This seems far too restrictive, as most SIMD hardware will implement it using a general blend/bit-select instruction, so this patch replaces it with SK_Select, permitting elements from either source as long as they are inline:
e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.
This initial patch just updates the name and the cost model's shuffle mask analysis; later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns.
Differential Revision: https://reviews.llvm.org/D47985
llvm-svn: 334513
Don't provide the assembler source as the "root file" unless the user
asked to have debug info for the assembler source (with -g).
If the source doesn't provide an explicit ".file 0" then (a) use the
compilation directory as directory #0, and (b) use the file #1 info
for file #0 also.
Differential Revision: https://reviews.llvm.org/D48055
llvm-svn: 334512
As discussed on D47985, identity shuffle masks should probably be free.
I've limited this to the case where the input and output types all match - but we could probably accept all cases.
Differential Revision: https://reviews.llvm.org/D47986
llvm-svn: 334506
Implement default legalization of rotates: either in terms of the rotation
in the opposite direction (if legal), or in terms of shifts and ors.
Implement generating of rotate instructions for Hexagon. Hexagon only
supports rotates by an immediate value, so implement custom lowering of
ROTL/ROTR on Hexagon. If a rotate is not legal, use the default expansion.
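The shift/or expansion amounts to the classic idiom, sketched here for a 32-bit scalar:
unsigned rotl32(unsigned X, unsigned N) { // N in [0, 31]
  // Masking the right-shift amount keeps the N == 0 case well-defined.
  return (X << N) | (X >> ((32 - N) & 31));
}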
Differential Revision: https://reviews.llvm.org/D47725
llvm-svn: 334497
Currently SmallSet<PointerTy> inherits from SmallPtrSet<PointerTy>. This
patch replaces such types with SmallPtrSet, because IMO it is slightly
clearer and avoids unnecessarily including SmallSet.h.
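e.g. (a sketch):
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallSet.h"
void track(int *P) {
  llvm::SmallSet<int *, 8> Before; // merely forwards to SmallPtrSet
  Before.insert(P);
  llvm::SmallPtrSet<int *, 8> After; // names the real type directly
  After.insert(P);
}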
Reviewers: dblaikie, craig.topper
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D47836
llvm-svn: 334492
Extend LONG_BRANCH_LUi and LONG_BRANCH_ADDiu pseudo instructions with
additional flag, so instead of always lowering to lui %hi(...),
addiu %lo(...) or addiu %hi(...), now they can lower to either %lo, %hi,
%higher or %highest depending on the added flag.
Differential Revision: https://reviews.llvm.org/D47941
llvm-svn: 334490
These include PUSH/POP instructions that don't match the manual table. This also includes CMPXCHG which we never emit in non-locked form.
llvm-svn: 334479
Most of these are system instructions or other instructions we don't use in CodeGen. No point wasting space for them in the table. Removing them from the autogenerated table makes it easier to review the manual table.
A few are real opcode collisions where the memory and register forms are completely different instructions.
llvm-svn: 334474
We were missing packed isel folding patterns for all of sse41, avx, and avx512.
For some reason avx512 had scalar load folding patterns under optsize (due to partial/undef reg update), but we didn't have the equivalent sse41 and avx patterns.
Sometimes we would get load folding due to peephole pass anyway, but we're also missing avx512 instructions from the load folding table. I'll try to fix that in another patch.
Some of this was spotted in the review for D47993.
This patch adds all the folds to isel, adds a few spot tests, and disables the peephole pass on a few tests to ensure we're testing some of these patterns.
llvm-svn: 334460
The use iterator, used within findMaskOperands(), can return anything which is
not a def. isUse() requires a register, so check isReg() before calling isUse().
Differential Revision: https://reviews.llvm.org/D48047
llvm-svn: 334459
The name table occupies a big chunk of the size of the current binary format sample profile.
In order to reduce its size, the patch changes the sample writer/reader to
save/restore MD5Hash of names in the name table. Sample annotation phase will
also use MD5Hash of name to query samples accordingly.
Experiments show the compact binary format can reduce the size of a sample
profile by 2/3 compared with the binary format generally.
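The key idea, sketched with the MD5Hash helper from llvm/Support/MD5.h:
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/MD5.h"
#include <cstdint>
uint64_t nameTableKey(llvm::StringRef FName) {
  // Store and query samples by the 64-bit MD5 of the name rather than
  // by the name string itself.
  return llvm::MD5Hash(FName);
}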
Differential Revision: https://reviews.llvm.org/D47955
llvm-svn: 334447
This would fail before because 1x vectors aren't legal,
so instead just use the scalar type.
Avoids regressions in a future AMDGPU commit to add
v4i16/v4f16 as legal types.
Test update is just the one test that this triggers
on in tree now. It wasn't checking anything before.
The result is completely changed since the selects
are eliminated. Not sure if it's considered better
or not.
llvm-svn: 334440
All of the cases are already wrapped in curly braces so declaring a variable there isn't an issue. And the variables aren't assigned or used in the larger scope.
llvm-svn: 334436
Summary:
Previously we would add them for adds, but not multiplies.
Reviewers: sanjoy
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D48038
llvm-svn: 334428
Necessary for D46276: even though btver2 doesn't use these instructions, it's now flagged as complete, so it complains if ANY instruction isn't tagged.
UnsupportedFeatures wouldn't help here, as these instructions don't appear to have a feature predicate (like a lot of AVX512).
llvm-svn: 334423
Rationale: if there is an indirect access, that is usually an issue
because the load is not ready by the use. However, if the use is inside a
loop and the load is outside, that is potentially an issue for the first
iteration only.
Differential Revision: https://reviews.llvm.org/D47740
llvm-svn: 334420
When a program is compiled for mips3 with the n64 abi, the wrong register
class is used for creating an emergency spill slot. This patch makes sure
the correct register class is chosen.
This patch resolves PR35859.
Thanks to John Baldwin for reporting the issue!
Differential Revision: https://reviews.llvm.org/D47938
llvm-svn: 334419
This sets trackLivenessAfterRegAlloc on AVRRegisterInfo.
Most existing targets set this flag. Without it, specific IR inputs
cause LLVM to fail with:
Assertion failed: (getParent()->getProperties().hasProperty( MachineFunctionProperties::Property::TracksLiveness) &&
"Liveness information is accurate"), function livein_begin
file MachineBasicBlock.cpp, line 1354.
With this commit, this no longer happens.
Patch by Peter Nimmervoll.
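The change itself amounts to overriding the usual TargetRegisterInfo hook (a sketch):
bool AVRRegisterInfo::trackLivenessAfterRegAlloc(
    const MachineFunction &MF) const {
  return true; // keep liveness info accurate through post-RA passes
}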
llvm-svn: 334409
Summary:
This fixes most of the scheduling info for SKX vector operations.
I had to split a lot of the YMM/ZMM classes into separate classes for YMM and ZMM.
The before/after llvm-exegesis analysis are in the phabricator diff.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47721
llvm-svn: 334407
It's been reported
<http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180611/559616.html>
that template argument deduction for RetryAfterSignal fails if open is
not prefixed with "::".
This should help us build correctly on those platforms and explicitly
specifying the namespace is more correct anyway.
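e.g. (a sketch of the qualified-call pattern):
#include "llvm/Support/Errno.h"
#include <fcntl.h>
int openReadOnly(const char *Path) {
  // Fully qualifying ::open makes template argument deduction see the
  // libc function rather than some unrelated name in scope.
  return llvm::sys::RetryAfterSignal(-1, ::open, Path, O_RDONLY);
}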
llvm-svn: 334403
Summary:
This kind of functionality is useful to other projects apart from clang.
LLDB works with version numbers a lot, but it does not have a convenient
abstraction for this. Moving this class to a lower level library allows
it to be freely used within LLDB.
Since this class is used in a lot of places in clang, and it used to be
in the clang namespace, it seemed appropriate to add it to the list of
adopted classes in LLVM.h to avoid prefixing all uses with "llvm::".
Also, I didn't find any tests specific for this class, so I wrote a
couple of quick ones for the more interesting bits of functionality.
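A quick usage sketch:
#include "llvm/Support/VersionTuple.h"
bool isAtLeast35(const llvm::VersionTuple &V) {
  return V >= llvm::VersionTuple(3, 5); // e.g. require version 3.5 or newer
}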
Reviewers: zturner, erik.pilkington
Subscribers: mgorny, cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D47887
llvm-svn: 334399
Summary: In preparation for D47721. HSW and SNB still define unsupported
classes as they are used by KNL and generic models respectively.
Reviewers: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47763
llvm-svn: 334389
Summary: When compiling with -fpic, in contrast to -fPIC, use only the
immediate field to index into the GOT. This saves space if the GOT is
known to be small. The linker will warn if the GOT is too large for
this method.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: brad, fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D47136
llvm-svn: 334383
Codeview references to unnamed structs and unions are expected to refer to the
complete type definition instead of a forward reference so Visual Studio can
resolve the type properly.
Differential Revision: https://reviews.llvm.org/D32498
llvm-svn: 334382
This patch started off much more general and ambitious, but it's been a nightmare
seeing all the ways x86 vector codegen can go wrong.
So the code is still structured to allow extending easily, but it's currently
limited in several ways:
1. Only handle cases with an extending load.
2. Only handle cases with a zero constant compare.
3. Ignore setcc with vector bitmask (SetCCWidth != 1) - so AVX512 should be unaffected.
The motivating case from PR37427:
https://bugs.llvm.org/show_bug.cgi?id=37427
...is the 1st test, and that shows the expected win - we eliminated the unnecessary
intermediate cast.
There's a clear regression in the last test (sgt_zero_fp_select) because we no longer
recognize a 'SHRUNKBLEND' opportunity. I think that general problem is also present
in sgt_zero, so I'll try to fix that in a follow-up. We need to match a sign-bit
setcc from a sign-extended operand and remove it.
Differential Revision: https://reviews.llvm.org/D47330
llvm-svn: 334378
I took some liberties and quoted fewer characters than before,
based on an article from MSDN which says that only certain characters
cause an arg to require quoting. This seems to be incorrect, though,
and worse, it seems to depend on the Windows version. The bot
that fails is Windows 7, and I can't reproduce the failure on Win
10. But it's definitely related to quoting and special characters,
because both tests that fail have a * in the argument, which is one
of the special characters that would have caused an argument to be quoted
before the new patch but no longer does after it.
Since I don't have Win 7, all I can do is just guess that I need to
restore the old quoting rules. So this patch does that in hopes that
it fixes the problem on Windows 7.
llvm-svn: 334375
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47447
llvm-svn: 334361
This reverts commit 65243b6d19143cb7a03f68df0169dcb63e8b4632.
Seems like it's not a flake. It might have something to do with
the '*' character being in a command line.
llvm-svn: 334356
There were a few linux compilation failures, but other than that
I think this was just a flake that caused the tests to fail. I'm
going to resubmit and see if the failures go away; if not, I'll
revert again.
llvm-svn: 334355
This reverts commit 10d2e88e87150a35dc367ba30716189d2af26774.
This is causing some test failures for some reason, reverting
while I investigate.
llvm-svn: 334354
This function was internal to Program.inc, but I've needed this
on several occasions when I've had to use CreateProcess without
llvm's sys::Execute functions. In doing so, I noticed that the
function was written using unsafe C-string access and was pretty
hard to make sense of, so I've also re-written the
function to use more modern LLVM constructs.
llvm-svn: 334353
This is a recommit of r333506, which was reverted in r333518.
The original commit message is below.
In r325551 many calls of malloc/calloc/realloc were replaced with calls to
their safe counterparts defined in the namespace llvm. These functions
generate a crash if memory cannot be allocated; such behavior facilitates
handling of out-of-memory errors on Windows.
If the result of an *alloc function was checked for success, the function was
not replaced with the safe variant. In these cases the calling function did
the error handling, like:
T *NewElts = static_cast<T*>(malloc(NewCapacity*sizeof(T)));
if (NewElts == nullptr)
  report_bad_alloc_error("Allocation of SmallVector element failed.");
Actually, knowledge about the function where OOM occurred is useless. Moreover,
having a single entry point for OOM handling is convenient for investigation
of memory problems. This change removes custom OOM errors handling and
replaces them with calls to functions `llvm::safe_*alloc`.
Declarations of `safe_*alloc` are moved to a separate include file, to avoid
cyclic dependency in SmallVector.h
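With that, the error-handling example above reduces to a single call (a sketch; safe_malloc lives in the new include file):
#include "llvm/Support/MemAlloc.h"
#include <cstddef>
void *growStorage(size_t NewCapacity, size_t EltSize) {
  // safe_malloc reports a fatal bad-alloc error itself on failure,
  // so no null check is needed at the call site.
  return llvm::safe_malloc(NewCapacity * EltSize);
}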
Differential Revision: https://reviews.llvm.org/D47440
llvm-svn: 334344
SmallSet forwards to SmallPtrSet for pointer types. SmallPtrSet supports iteration, but a normal SmallSet doesn't. So if it wasn't for the forwarding, this wouldn't work.
These places were found by hiding the begin/end methods in the SmallSet forwarding.
llvm-svn: 334343
It looks like this got left in by accident in r289794; I can't think of
any reason this check would be necessary. (Maybe it was meant to be a
check that the AND has one use? But we check that a few lines earlier.)
Differential Revision: https://reviews.llvm.org/D47921
llvm-svn: 334322
An expression like
(zext i2 {(trunc i32 (1 + %B) to i2),+,1}<%while.body> to i32)
will become zero exactly when the nested value becomes zero in its type.
Strip injective operations from the input value in howFarToZero to make
the value simpler.
Differential Revision: https://reviews.llvm.org/D47951
llvm-svn: 334318
Summary:
If we can use comdats, then we can make it so that the global metadata
is thrown away if the prevailing definition of the global was
uninstrumented. I have only tested this on COFF targets, but in theory,
there is no reason that we cannot also do this for ELF.
This will allow us to re-enable string merging with ASan on Windows,
reducing the binary size cost of ASan on Windows.
Reviewers: eugenis, vitalybuka
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D47841
llvm-svn: 334313
Extension to D46954 (PR37426), this patch adds support for v8i16/v16i16 rotations in a similar manner - the conversion of the shift/rotate amount to a multiplication factor and the use of PMULLW to shift left and PMULHUW (ISD::MULHU) to shift the wrapped bits back around to be ORd together.
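A scalar model of the trick for a single 16-bit lane (sketch):
unsigned short rotl16(unsigned short X, unsigned N) { // N in [0, 15]
  unsigned P = (unsigned)X << N; // PMULLW keeps the low half of X * 2^N,
                                 // PMULHUW yields the high half (P >> 16)
  return (unsigned short)(P | (P >> 16)); // OR the wrapped bits back in
}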
Differential Revision: https://reviews.llvm.org/D47822
llvm-svn: 334309
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fsub.
Reviewers: spatel, hfinkel, wristow, arsenm
Reviewed By: spatel
Subscribers: wdng
Differential Revision: https://reviews.llvm.org/D47910
llvm-svn: 334306
This patch moves the recipe-creation functions out of
LoopVectorizationPlanner, which should do the high-level
orchestration of the transformations.
Reviewers: dcaballe, rengolin, hsaito, Ayal
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D47595
llvm-svn: 334305
As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), these instructions are dependency breaking and fast-path zero the destination register (and appropriate EFLAGS bits).
llvm-svn: 334303
The AMDGPU inline assembler supports i16, half and i128 typed variables in constraints, but they were reported as errors.
Needed to fix https://github.com/RadeonOpenCompute/ROCm/issues/341,
e.g. to be able to load with global_load_dwordx4 to a 128-bit integer variable
Differential Revision: https://reviews.llvm.org/D44920
llvm-svn: 334301
Summary:
`%ret = add nuw i8 %x, C`
From [[ https://llvm.org/docs/LangRef.html#add-instruction | langref ]]:
nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”,
respectively. If the nuw and/or nsw keywords are present,
the result value of the add is a poison value if unsigned
and/or signed overflow, respectively, occurs.
So if `C` is `-1`, `%x` can only be `0`, and the result is always `-1`.
I'm not sure we want to use `KnownBits`/`LVI` here, because there is
exactly one possible value (all bits set, `-1`), so some other pass
should take care of replacing the known-all-ones with constant `-1`.
The `test/Transforms/InstCombine/set-lowbits-mask-canonicalize.ll` change *is* confusing.
What is happening is, before this (omitting `nuw` for simplicity):
1. First, InstCombine D47428/rL334127 folds `shl i32 1, %NBits` to `shl nuw i32 -1, %NBits`
2. Then, InstSimplify D47883/rL334222 folds `shl nuw i32 -1, %NBits` to `-1`,
3. `-1` is inverted to `0`.
But now:
1. *This* InstSimplify fold `%ret = add nuw i32 %setbit, -1` -> `-1` happens first,
before the InstCombine D47428/rL334127 fold could happen.
Thus we now end up with the opposite constant, and it is all good:
https://rise4fun.com/Alive/OA9
https://rise4fun.com/Alive/sldC
Was mentioned in D47428 review.
Follow-up for D47883.
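A minimal sketch of the fold using the usual PatternMatch helpers (illustrative, not necessarily the exact in-tree code):
#include "llvm/IR/PatternMatch.h"
llvm::Value *simplifyAddNUWAllOnes(llvm::BinaryOperator *Add) {
  using namespace llvm::PatternMatch;
  llvm::Value *X;
  // `add nuw %x, -1` can only avoid unsigned wrap when %x == 0,
  // so the result is always the all-ones constant.
  if (Add->hasNoUnsignedWrap() && match(Add, m_Add(m_Value(X), m_AllOnes())))
    return Add->getOperand(1);
  return nullptr;
}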
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47908
llvm-svn: 334298
Summary:
The function `llvm::sys::commandLineFitsWithinSystemLimits` appears to be overestimating the system limits. This issue was discovered while attempting to enable response files in the Swift compiler. When the compiler submits its frontend jobs, those jobs are subjected to the system limits on command line length. `commandLineFitsWithinSystemLimits` is used to determine if the job's arguments need to be wrapped in a response file. There are some cases where the argument size for the job passes `commandLineFitsWithinSystemLimits`, but actually exceeds the real system limit, and the job fails.
`clang` also uses this function to decide whether or not to wrap its job arguments in response files. See: https://github.com/llvm-mirror/clang/blob/master/lib/Driver/Driver.cpp#L1341. Clang will also fail for response files whose size falls within a certain range. I wrote a script that should find a failure point for `clang++`. All that is needed to run it is Python 2.7, and a simple "hello world" program for `test.cc`. It should run on Linux and on macOS. The script is available here: https://gist.github.com/dabelknap/71bd083cd06b91c5b3cef6a7f4d3d427. When it hits a failure point, you should see a `clang: error: unable to execute command: posix_spawn failed: Argument list too long`.
The proposed solution is to mirror the behavior of `xargs` in `commandLineFitsWithinSystemLimits`. `xargs` defaults to 128k for the command line length (See: https://fossies.org/dox/findutils-4.6.0/buildcmd_8c_source.html#l00551). It adjusts this depending on the value of `ARG_MAX`.
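A usage sketch (assuming the ArrayRef<const char *> overload in llvm/Support/Program.h):
#include "llvm/Support/Program.h"
bool needsResponseFile(llvm::StringRef Prog,
                       llvm::ArrayRef<const char *> Args) {
  // Wrap the arguments in a response file whenever the (conservatively
  // estimated) limit would be exceeded.
  return !llvm::sys::commandLineFitsWithinSystemLimits(Prog, Args);
}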
Reviewers: alexfh
Reviewed By: alexfh
Subscribers: llvm-commits
Tags: #clang
Patch by Austin Belknap!
Differential Revision: https://reviews.llvm.org/D47795
llvm-svn: 334295
NFC here; this just raises some platform-specific ifdef hackery
out of a class and creates proper platform-independent typedefs
for the relevant things. This allows these typedefs to be
reused in other places without having to reinvent this preprocessor
logic.
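e.g. (hypothetical names, just to show the shape of the typedefs):
#ifdef _WIN32
using process_handle_t = void *; // HANDLE on Windows
#else
using process_handle_t = int; // pid_t-style id on posix
#endif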
llvm-svn: 334294
O_CLOEXEC is the right default, but occasionally you don't
want this. This is especially true for tools like debuggers
where you might need to spawn the child process with specific
files already open, but it's occasionally useful in other
scenarios as well, like when you want to do some IPC between
parent and child.
llvm-svn: 334293
Simplify combineVectorTruncationWithPACKUS to mask the upper bits and then call truncateVectorWithPACK, instead of duplicating it with similar code.
This results in the codegen using (V)PACKUSDW on SSE41+ targets for vXi64/vXi32 inputs where before it always used PACKUSWB (along with a lot more bitcasting).
I've raised PR37749 as until we avoid unnecessary concats back to 256-bit for bitwise ops, we can't avoid splitting the input value into 128-bit subvectors for masking.
llvm-svn: 334289
Currently the loop branch heuristic is applied before the invoke heuristic which makes us overestimate the probability of the unwind destination of invokes inside loops. This in turn makes us grossly underestimate the frequencies of loops with invokes.
Reviewed By: skatkov, vsk
Differential Revision: https://reviews.llvm.org/D47371
llvm-svn: 334285
This first step separates VPInstruction-based and VPRecipe-based
VPlan creation, which should make it easier to migrate to VPInstruction
based code-gen step by step.
Reviewers: Ayal, rengolin, dcaballe, hsaito, mkuper, mzolotukhin
Reviewed By: dcaballe
Subscribers: bollu, tschuett, rkruppe, llvm-commits
Differential Revision: https://reviews.llvm.org/D47477
llvm-svn: 334284
The instruction makes use of a previously ignored field in the fence
instruction. It is introduced in the version 2.3 draft of the RISC-V
specification after much work by the Memory Model Task Group.
As clarified here <https://github.com/riscv/riscv-isa-manual/issues/186>,
the fence.tso assembler mnemonic does not have operands.
llvm-svn: 334278
We have some combines/lowerings that attempt to use PACKSS-then-PACKUS and others that use PACKUS-then-PACKSS.
PACKUS is much easier to combine with if we know the upper bits are zero as ComputeKnownBits can easily see through BITCASTs etc. especially now that rL333995 and rL334007 have landed. It also effectively works at byte level which further simplifies shuffle combines.
The only (minor) annoyances are that ComputeKnownBits can sometimes take longer as it doesn't fail as quickly as ComputeNumSignBits (but I'm not seeing any actual regressions in tests) and PACKUSDW only became available after SSE41 so we have more codegen diffs between targets.
llvm-svn: 334276
There could be more than one PHI in the exit block using the same loop recurrence.
Don't assume there is only one, and fix each user.
Differential Revision: https://reviews.llvm.org/D47788
llvm-svn: 334271
While trying to propagate AND masks back to loads, we currently allow
one non-load node to be included as a leaf in the chain. This fix now
limits that node to produce only a single data value.
Differential Revision: https://reviews.llvm.org/D47878
llvm-svn: 334268
The NumControlBits variable was definitely sketchy. I think that only worked because the expected value was 1 or 2 and the number of lanes was 2 or 4. Had there been 8 lanes, the number of bits should have been 3, not the 4 the previous code would have given.
llvm-svn: 334258
This one allows much more flexibility than the standard
openFileForRead / openFileForWrite functions. Since there is now
just one "real" function that does the work, all other implementations
simply delegate to this one.
llvm-svn: 334246
Summary: This change uses fmf subflags to guard fma optimizations as well as unsafe. These changes originated from D46483 and have been simplified via getNode.
Reviewers: spatel, arsenm, hfinkel, javed.absar
Reviewed By: spatel
Subscribers: nemanjai, wdng
Differential Revision: https://reviews.llvm.org/D47388
llvm-svn: 334242
- Make code easier to maintain.
- Avoid generating waitcnts for VMEM if the address space does not involve VMEM.
- Add support to generate waitcnts for LDS and GDS memory.
Differential Revision: https://reviews.llvm.org/D47504
llvm-svn: 334241
Summary:
`%r = shl nuw i8 C, %x`
As per langref:
```
If the nuw keyword is present, then the shift produces
a poison value if it shifts out any non-zero bits.
```
Thus, if the sign bit is set on `C`, then `%x` can only be `0`,
which means that `%r` can only be `C`.
Or in other words, set sign bit means that the signed value
is negative, so the constant is `<= 0`.
https://rise4fun.com/Alive/WMk
https://rise4fun.com/Alive/udv
Was mentioned in D47428 review.
We already handle the `0` constant, https://godbolt.org/g/UZq1sJ, so this only handles negative constants.
Could use computeKnownBits() / LazyValueInfo,
but the cost-benefit analysis (https://reviews.llvm.org/D47891)
suggests it isn't worth it.
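A small exhaustive self-check of the argument (an illustrative C++ sketch, not the actual fold): for an 8-bit constant with the sign bit set, every non-zero shift amount shifts out a set bit and is therefore poison under `nuw`.
```
#include <cassert>
#include <cstdint>

int main() {
  // For every 8-bit constant C with the sign bit set and every shift
  // amount x > 0, the shift discards a non-zero bit, so under `nuw`
  // the only non-poison shift amount is x == 0 and the result is C.
  for (unsigned c = 0x80; c <= 0xff; ++c) {
    for (unsigned x = 1; x < 8; ++x) {
      uint8_t shifted = (uint8_t)(c << x);
      bool lostBits = ((unsigned)(shifted >> x) != c);
      assert(lostBits && "shl nuw would be poison here");
    }
  }
  return 0;
}
```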
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47883
llvm-svn: 334222
This breaks the OpenFlags enumeration into two separate
enumerations: OpenFlags and CreationDisposition. CreationDisposition
controls the behavior of the API depending on whether or not
the target file already exists, and is not a flags-based
enum. OpenFlags controls more flags-like values.
This yields an easier-to-understand API, while also allowing
flags to be passed to the openForRead API, where most of the
values didn't make sense before. This also makes the APIs more
testable, as it becomes easy to enumerate all the configurations
which make sense, so I've added many new tests to exercise all
the different values.
llvm-svn: 334221
This avoids regressions in a future AMDGPU change
to make v4i16/v4f16 legal. For these types, build_vector
is implemented as bitcasted operations on v2i32. This
combine was creating v4i16s out of what would already
have been a v2i32 build_vector, creating a mess
of nodes that never get cleaned up.
I'm not sure this is the right condition to check.
I initially tried just checking for the legality of the
new build_vector. This works for my case, but breaks dozens
of x86 tests. A Mips test seems to show some improvement
or at least a neutral change. I don't want to think
about how long it would take to analyze the set of
different x86 vector operations impacted.
Test included in future commit.
llvm-svn: 334218
These weren't included in D19544 - probably just an oversight.
D40044 made it more likely that we'll have LLVM math intrinsics rather
than libcalls, so this bug was more easily exposed.
As the tests/code show, we already have the complete mappings for pow/exp/log.
I don't have any experience with SVML, so I don't know if anything else is
missing. It's also not clear to me that we should be doing this transform in
IR rather than DAG/isel, but that's a separate issue.
Differential Revision: https://reviews.llvm.org/D47610
llvm-svn: 334211
The implementation follows the MIPS backend and expands the pseudo instruction
directly during asm parsing. As a result, only real MC instructions are
emitted to the MCStreamer. The actual expansion to real instructions is
similar to the expansion performed by the GNU Assembler.
This patch supersedes D41949.
Differential Revision: https://reviews.llvm.org/D46118
Patch by Mario Werner.
llvm-svn: 334203
BitPermutationSelector sets the Repl32 flag for bit groups which can (potentially) benefit from 32-bit rotate-and-mask instructions with bit replication, i.e. rlwinm/rlwimi copies the lower 32 bits into the upper 32 bits on 64-bit PowerPC before rotation.
However, enforcing a 32-bit instruction sometimes results in redundant generated code.
For example, the following simple code is compiled into rotldi + rlwimi while it can be compiled into a single rldimi instruction if the Repl32 flag is not set on the bit group for (a & 0xFFFFFFFF).
uint64_t func(uint64_t a, uint64_t b) {
return (a & 0xFFFFFFFF) | (b << 32) ;
}
To avoid such problems, this patch checks the potential benefit of the Repl32 flag before setting it. If a bit group does not require rotation (i.e. RLAmt == 0) and won't be merged into another group, we do not benefit from the Repl32 flag on this group.
Differential Revision: https://reviews.llvm.org/D47867
llvm-svn: 334195
The isORCopyInst and isReadOrWriteToDSPReg functions were producing warnings
that some statements may fall through.
Patch by Nikola Prica.
Differential Revision: https://reviews.llvm.org/D47876
llvm-svn: 334194
Simplify combineVectorTruncationWithPACKSS to just a SIGN_EXTEND_INREG followed by using the existing truncateVectorWithPACK instead of duplicating code.
llvm-svn: 334193
We do the same thing in rewriteSingleStoreAlloca.
Fixes PR37632.
Reviewers: chandlerc, davide, efriedma
Reviewed By: davide
Differential Revision: https://reviews.llvm.org/D47825
llvm-svn: 334187
This has two main components. First, widen
short constant loads in the DAG when they have
the correct alignment. This is already done a bit in
AMDGPUCodeGenPrepare, since that has access to
DivergenceAnalysis. This can't help kernarg loads
created in the DAG. Start to use DAG divergence analysis
to help this case.
The second part is to avoid kernel argument lowering
breaking the alignment of short vector elements because
calling convention lowering wants to split everything
into legal register types.
When loading a split type, load the nearest 4-byte aligned
segment and shift to get the desired bits. This extra
load of the earlier argument piece ends up merging,
and the bit extract hopefully folds out.
There are a number of improvements and regressions with
this, but I think as-is this is a better compromise between
several of the worst parts of SelectionDAG.
Particularly when i16 is legal, this produces worse code
for i8 and i16 element vector kernel arguments. This is
partially due to the very weak load merging the DAG does.
It only looks for fairly specific combines between pairs
of loads which no longer appear. In particular this
causes v4i16 loads to be split into 2 components when
previously the two halves were merged.
Worse, because of the newly introduced shifts, there
is a lot more unnecessary vector packing and unpacking code
emitted. At least some of this is due to reporting
false for isTypeDesirableForOp for i16 as a workaround for
the lack of divergence information in the DAG. The cases
where this happens it doesn't actually matter, but the
relevant code in SimplifyDemandedBits doesn't have the context
to know to ignore this.
The use of the scalar cache is probably more important
than the mess of mostly scalar instructions doing this packing
and unpacking. Future work can fix this, possibly by making better
use of the new DAG divergence information for controlling promotion
decisions, or adding another version of shift + trunc + shift
combines that doesn't only know about the used types.
llvm-svn: 334180
Summary: Prevent folding of operations with memory loads when one of the sources has an undefined register update.
Reviewers: craig.topper
Subscribers: llvm-commits, mike.dvoretsky, ashlykov
Differential Revision: https://reviews.llvm.org/D47621
llvm-svn: 334175
Summary:
When the branch folder hoists code into a predecessor, it adjusts the live-ins
of the blocks it hoists code from. However, it fails to handle hoisted code
that contains a def of a register that was originally live-in to the block
through a super-register.
This is fixed by replacing the live-in handling code with calls to
utility functions in LivePhysRegs.
Reviewers: kparzysz, gberry, MatzeB, uweigand, aprantl
Reviewed By: kparzysz
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47529
llvm-svn: 334163
This is needed to get the CC operand in the right place, as expected by the
SchedModel.
Review: Ulrich Weigand
https://reviews.llvm.org/D47820
llvm-svn: 334161
When denormals are supported we are producing a full division for
1.0f / x. That still can be replaced by the faster version:
bool c = fabs(x) > 0x1.0p+96f;
float s = c ? 0x1.0p-32f : 1.0f;
x *= s;
return s * v_rcp_f32(x)
when the requested accuracy is 2.5 ULP or less. When denormals are not
supported, the same sequence is used for numerators other than 1.0, while
just v_rcp_f32 is used for a 1.0 numerator.
The optimization of 1/x is extended to the case -1/x, which is the
same except for the resulting sign bit.
OpenCL conformance passed with both enabled and disabled denorms.
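A C++ model of the sequence above (a sketch; rcp_f32 is a stand-in for the hardware v_rcp_f32 approximation and the helper names are illustrative):
```
#include <cmath>

// Stand-in for the hardware v_rcp_f32 approximation; modeled here with an
// exact division, while the real instruction is only an approximation.
static float rcp_f32(float x) { return 1.0f / x; }

float fastRecip(float x) {
  // For |x| > 2^96 the plain reciprocal would land in the denormal range,
  // so pre-scale x down by 2^-32 and fold the scale back into the result:
  // s * rcp(s * x) == 1 / x.
  bool c = std::fabs(x) > 0x1.0p+96f;
  float s = c ? 0x1.0p-32f : 1.0f;
  x *= s;
  return s * rcp_f32(x);
}
```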
Differential Revision: https://reviews.llvm.org/D47805
llvm-svn: 334142
With the upcoming patch to add summary parsing support, IsAnalysis would
be true in contexts where we are not performing module summary analysis.
Rename to the more specific and appropriate HaveGVs, which is essentially
what this flag is indicating.
llvm-svn: 334140
The bug report:
https://bugs.llvm.org/show_bug.cgi?id=36036
...requests a DAG change for this, but an IR canonicalization
probably handles most cases. If we still want to match this
pattern in the backend, there's a proposal for that too:
D47831
Alive proofs including nsw/nuw cases that were first noted in:
D46988
https://rise4fun.com/Alive/Kmp
This patch is largely copied from the existing code that was
initially added with:
D40984
...but I didn't see much gain from trying to share code.
llvm-svn: 334137
Fixes terrible code on targets without f16 support. The
legalization creates a mess that is difficult to recover
from. Also should avoid randomly breaking these tests
multiple times in sequence in future commits.
Some regressions in cases where it happens to be better
to pull the source modifier after the conversion.
llvm-svn: 334132
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37603 | PR37603 ]].
https://godbolt.org/g/VCMNpS
https://rise4fun.com/Alive/idM
When doing bit manipulations, it is quite common to calculate some bit mask,
and apply it to some value via `and`.
The typical C code looks like:
```
int mask_signed_add(int nbits) {
return (1 << nbits) - 1;
}
```
which is translated into (with `-O3`)
```
define dso_local i32 @mask_signed_add(int)(i32) local_unnamed_addr #0 {
%2 = shl i32 1, %0
%3 = add nsw i32 %2, -1
ret i32 %3
}
```
But there is a second, less readable variant:
```
int mask_signed_xor(int nbits) {
return ~(-(1 << nbits));
}
```
which is translated into (with `-O3`)
```
define dso_local i32 @mask_signed_xor(int)(i32) local_unnamed_addr #0 {
%2 = shl i32 -1, %0
%3 = xor i32 %2, -1
ret i32 %3
}
```
Since we created such a mask, it is quite likely that we will use it in `and` next.
And then we may get rid of `not` op by folding into `andn`.
But now that I have actually looked:
https://godbolt.org/g/VTUDmU
_some_ backend changes will be needed too.
We clearly lose `bzhi` recognition.
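For reference, a quick exhaustive check (an illustrative sketch) that the two mask forms agree for every 32-bit shift amount:
```
#include <cassert>
#include <cstdint>

int main() {
  // The two mask formations above agree for every shift amount:
  // ~(-(1 << n)) == (1 << n) - 1, since -y == ~y + 1.
  for (unsigned n = 0; n < 32; ++n) {
    uint32_t sub = (1u << n) - 1u;  // (1 << nbits) - 1
    uint32_t neg = ~(-(1u << n));   // ~(-(1 << nbits))
    assert(sub == neg);
  }
  return 0;
}
```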
Reviewers: spatel, craig.topper, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47428
llvm-svn: 334127
Summary:
In D47428, I propose to choose `~(-(1 << nbits))` as the canonical form of low-bit-mask formation.
As it is seen from these tests, there is a reason for that.
AArch64 currently better handles `~(-(1 << nbits))`, but not the more traditional `(1 << nbits) - 1` (sic!).
The other way around for X86.
It would be much better to canonicalize.
This patch is completely monkey-typing.
I don't really understand how this works :)
I have based it on `// x & (-1 >> (32 - y))` pattern.
Also, when we only have `BMI`, I wonder if we could use `BEXTR` with `start=0`?
Related links:
https://bugs.llvm.org/show_bug.cgi?id=36419
https://bugs.llvm.org/show_bug.cgi?id=37603
https://bugs.llvm.org/show_bug.cgi?id=37610
https://rise4fun.com/Alive/idM
Reviewers: craig.topper, spatel, RKSimon, javed.absar
Reviewed By: craig.topper
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D47453
llvm-svn: 334125
These encodings correspond to the cases in the normal encoding scheme where there is no index and our modrm reading code initially decodes it as such. The VSIB handling code tried to compensate for this, but failed to add the base needed to make later code do the right thing.
Fixes PR37712.
llvm-svn: 334121
The index size is represented by the letter after the 'v'. The number represents the memory size. If an 'x' appears after the number, it means the index register can be from VR128X/VR256X instead of VR128/VR256.
As vy512mem uses a VR256X index, it should have an x.
And vz256mem uses a VR512 index, so it shouldn't have an x.
I admit these names kind of suck and are confusing.
llvm-svn: 334120
As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), all these instructions are dependency breaking and zero the destination register.
llvm-svn: 334119
Summary:
This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483.
It contains only context for fsqrt.
Reviewers: spatel, hfinkel, arsenm
Reviewed By: spatel
Subscribers: hfinkel, wdng, andrew.w.kaylor, wristow, efriedma, nemanjai
Differential Revision: https://reviews.llvm.org/D47749
llvm-svn: 334113
Make TII isCopyInstr() return MachineOperands through pointer to pointer
instead of via reference.
Patch by Nikola Prica.
Differential Revision: https://reviews.llvm.org/D47364
llvm-svn: 334105
If no alignment is set, the abi/preferred alignment of structs will be
used which may be higher than required. This can lead to extra padding
and in the end an increase in data size.
Differential Revision: https://reviews.llvm.org/D47633
llvm-svn: 334099
We should never get different CodeGen based on whether the code is being
compiled in debug mode so we must skip over @llvm.dbg.value (and similar)
calls.
Should fix at least the worst part of PR37690.
llvm-svn: 334090
Only the bottom 16 bits of BEXTR's control op are required (bits 7:0 INDEX, bits 15:8 LENGTH).
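An illustrative C++ model of 32-bit BEXTR under that observation (a sketch, not the actual intrinsic or backend code):
```
#include <cstdint>

// Model of 32-bit BEXTR; only the low 16 bits of the control operand
// participate, as noted above.
uint32_t bextr32(uint32_t src, uint32_t ctrl) {
  uint32_t start = ctrl & 0xff;      // bits 7:0
  uint32_t len = (ctrl >> 8) & 0xff; // bits 15:8; bits 16+ never matter
  if (start >= 32 || len == 0)
    return 0;
  uint64_t bits = (uint64_t)src >> start;
  uint64_t mask = (len >= 32) ? 0xffffffffull : ((1ull << len) - 1);
  return (uint32_t)(bits & mask);
}
```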
Differential Revision: https://reviews.llvm.org/D47690
llvm-svn: 334083
On targets like Arm some relaxations may only be performed when certain
architectural features are available. As functions can be compiled with
differing levels of architectural support we must make a judgement on
whether we can relax based on the MCSubtargetInfo for the function. This
change passes through the MCSubtargetInfo for the function to
fixupNeedsRelaxation so that the decision on whether to relax can be made
per function. In this patch, only the ARM backend makes use of this
information. We must also pass the MCSubtargetInfo to applyFixup because
some fixups skip error checking on the assumption that relaxation has
occurred; to prevent code-generation errors, applyFixup must see the same
MCSubtargetInfo as fixupNeedsRelaxation.
Differential Revision: https://reviews.llvm.org/D44928
llvm-svn: 334078
Add minimal support to lower function calls.
Support only functions with arguments/return that go through registers
and have type i32.
Patch by Petar Avramovic.
Differential Revision: https://reviews.llvm.org/D45627
llvm-svn: 334071
Fuchsia doesn't use __clear_cache; instead it provides the zx_cache_flush
system call. Use it to flush the instruction cache.
Differential Revision: https://reviews.llvm.org/D47753
llvm-svn: 334068
This is a fix for the problem arising in D47374 (PR37678):
https://bugs.llvm.org/show_bug.cgi?id=37678
We may not have throughput info because it's not specified in the model
or it's not available with variant scheduling, so assume that those
instructions can execute/complete at max-issue-width.
Differential Revision: https://reviews.llvm.org/D47723
llvm-svn: 334055
Summary: As it turns out, the lowering for the Mips16* family of targets is the exact same thing as what the ops expand to, so the code handling them can be removed and the ops enabled only for the MipsSE* family of targets.
Reviewers: smaksimovic, atanasyan, abeserminji
Subscribers: sdardis, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D47703
llvm-svn: 334052
The CodeGenPrepare pass moves extension instructions close to load instructions in a different BB so they can be combined later. But in the current implementation, the extension instructions can't move through logical and shift instructions. This patch enables that enhancement, so we can eliminate more extension instructions.
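A scalar sketch of why moving a zero-extension through an `and` is sound (the 6-bit mask is illustrative; a similar identity holds for the shift cases):
```
#include <cassert>
#include <cstdint>

int main() {
  // Zero-extending after an `and` gives the same result as masking the
  // zero-extended value, so the ext can move past the mask toward the load.
  for (uint32_t v = 0; v < 256; ++v) {
    uint8_t loaded = (uint8_t)v;                              // value from the load
    uint32_t extThenAnd = ((uint32_t)loaded) & 0x3fu;         // zext, then and
    uint32_t andThenExt = (uint32_t)(uint8_t)(loaded & 0x3f); // and, then zext
    assert(extThenAnd == andThenExt);
  }
  return 0;
}
```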
Differential Revision: https://reviews.llvm.org/D45537
This is re-commit of r331783, which was reverted by r333305. The performance regression was caused by some unlucky alignment, not a code generation problem.
llvm-svn: 334049
There was only one place in the entire codebase where a
non-default value was being passed, and that place was already hidden
in an implementation file. So we can delete the extra parameter
and all existing clients continue to work as they always have,
while making the interface a bit simpler.
Differential Revision: https://reviews.llvm.org/D47789
llvm-svn: 334046
Preserves the low bound of the !range. I don't think
it's legal to do anything with the top half since it's
theoretically reading garbage.
llvm-svn: 334045
Summary: This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483.
Reviewers: spatel, hfinkel
Reviewed By: spatel
Subscribers: nemanjai
Differential Revision: https://reviews.llvm.org/D47389
llvm-svn: 334037
Similar to v4i32 SHL, convert v8i16 shift amounts to scale factors instead to improve performance and reduce instruction count. We were already doing this for constant shifts, this adds variable shift support.
Reduces the serial nature of the codegen, which relies on chains of pblendvb/pand+pandn+por shifts.
This is a step towards adding support for vXi16 vector rotates.
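A scalar model of the idea (a sketch, assuming per-lane shift amounts below 16, as holds for valid v8i16 shifts):
```
#include <cstdint>

// A left shift by s equals a multiply by the scale factor (1 << s), so a
// variable vector shift can be lowered as a single vector multiply.
void shl_v8i16(uint16_t x[8], const uint16_t amt[8]) {
  for (int i = 0; i < 8; ++i) {
    uint16_t scale = (uint16_t)(1u << amt[i]); // per-lane scale factor
    x[i] = (uint16_t)(x[i] * scale);           // x << s == x * (1 << s)
  }
}
```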
Differential Revision: https://reviews.llvm.org/D47546
llvm-svn: 334023
Summary:
Allow extended parsing of variable assembler assignment syntax and modify X86 to permit
VAR = register assignment. As we emit these as .set directives when possible, we inline
such expressions in output assembly.
Fixes PR37425.
Reviewers: rnk, void, echristo
Reviewed By: rnk
Subscribers: nickdesaulniers, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D47545
llvm-svn: 334022
When legalizing illegal FP load results, this was
for some reason dropping the invariant and dereferenceable
memory flags. There doesn't seem to be any reason for this,
and the equivalent isn't done for integer loads.
Fixes an issue in a future AMDGPU commit where some identical
loads fail to merge because one of the loads ends up
dropping the flags.
llvm-svn: 334020
When adjusting a cmp in order to canonicalize an abs/nabs select pattern, we need
to use the type of the existing operand when creating a new operand, not the
type of a select operand, as the two may be different.
This fixes PR37686.
llvm-svn: 334019
BitPermutationSelector builds the output value by repeating rotate-and-mask instructions with input registers.
Here, we may avoid one rotate instruction if we start building from an input register that does not require rotation.
For example, in the test case bitfieldinsert.ll, it first rotates r4 left by 8 bits and then inserts some bits from r5 without rotation.
This can be executed by one rlwimi instruction, which rotates r4 by 8 bits and inserts its bits into r5.
This patch adds a check for rotation amounts in the comparator used in sorting to process the input without rotation first.
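A hedged sketch of the comparator tweak; the type and fields here are illustrative stand-ins, not the backend's actual data structures:
```
#include <algorithm>
#include <vector>

// Illustrative stand-in for the per-input bookkeeping.
struct InputInfo {
  unsigned RLAmt;     // rotate-left amount this input requires
  unsigned NumGroups; // number of bit groups using this input
};

void orderInputs(std::vector<InputInfo> &Inputs) {
  std::sort(Inputs.begin(), Inputs.end(),
            [](const InputInfo &A, const InputInfo &B) {
              // Process inputs that need no rotation (RLAmt == 0) first,
              // so building can start without an initial rotate.
              if ((A.RLAmt == 0) != (B.RLAmt == 0))
                return A.RLAmt == 0;
              return A.NumGroups > B.NumGroups;
            });
}
```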
Differential Revision: https://reviews.llvm.org/D47765
llvm-svn: 334011
Ideally we'd use resolveTargetShuffleInputs to handle faux shuffles as well but:
(a) that code path doesn't handle general/pre-legalized ops/types very well.
(b) I'm concerned about the compute time as they recurse to calls to computeKnownBits/ComputeNumSignBits which would need depth limiting somehow.
llvm-svn: 334007
Summary:
Bringing some code duplicated in the AT&T and the Intel printers
into a common parent class.
Reviewers: craig.topper
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D47682
llvm-svn: 334005
When the branch target of a Thumb2 unconditional or conditional branch is
resolved at assembly time, no range checking is performed on the result
leading to incorrect immediates. This change adds a range check:
+- 16 Megabytes for unconditional branches, +- 1 Megabyte for the
conditional branch.
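A sketch of the added check using the ranges stated above (the helper name is illustrative, and encoding alignment and exact off-by-one bounds are glossed over):
```
#include <cstdint>

// Thumb2 branch displacement check: +-16 MiB for unconditional branches,
// +-1 MiB for conditional ones.
bool thumb2BranchInRange(int64_t disp, bool isConditional) {
  const int64_t limit = isConditional ? (int64_t)1 << 20   // +- 1 MiB
                                      : (int64_t)1 << 24;  // +- 16 MiB
  return disp >= -limit && disp < limit;
}
```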
Differential Revision: https://reviews.llvm.org/D46306
llvm-svn: 333997
The Thumb BL range is + or - either 16 Megabytes or 4 Megabytes, depending
on whether the CPU supports Thumb2 or the v8-m baseline ops. The existing
check for the BL range is incorrectly set at +- 32 Megabytes. This change
corrects the higher range and uses the lower range if the feature bits
don't have the necessary support for it.
Differential Revision: https://reviews.llvm.org/D46305
llvm-svn: 333991
This is the new version of D46181, allowing setjmp/longjmp
to work correctly with the Intel CET shadow stack by storing
SSP on setjmp and fixing it on longjmp. The patch has been
updated to use the cf-protection-return module flag instead
of HasSHSTK, and the bug that caused D46181 to be reverted
has been fixed with the test expanded to track that fix.
patch by mike.dvoretsky
Differential Revision: https://reviews.llvm.org/D47311
llvm-svn: 333990
Passing -mattr=-mmx needs to disable these instructions since the MMX register class won't have been set up. But we don't want -mattr=-mmx to disable SSE so we have to do it separately.
llvm-svn: 333984
RegAlloc keeps an insertion-time-ordered map of evictee information,
but we only use membership. Replace MapVector with the contextually
equivalent DenseMap which is smaller and faster.
llvm-svn: 333981
Start by emitting remarks for very basic unsupported cases such as
irreducible CFGs and EHFunclets. The end goal is to be able to cover all
the cases where we give up with an explanation.
llvm-svn: 333972
We already output true and false in the printer, but the parser isn't able to
read it.
Differential Revision: https://reviews.llvm.org/D47424
llvm-svn: 333970
Code review feedback from r328123 prefers copying the few feature test
macros used by Demangle into there, rather than sinking the header into
an odd corner like Demangle.
llvm-svn: 333965
The ELF version was broken (does not deal with wasm specific fixups),
and now is slightly less broken. It will be removed in its entirety
in the future which this change makes slightly easier (just remove
the IsELF bool).
Differential Revision: https://reviews.llvm.org/D47745
Patch by Wouter van Oortmerssen
llvm-svn: 333964
As noted in rL333782, we can be both better for optimization and
safer with this transform:
BinOp (shuffle V1, Mask), C --> shuffle (BinOp V1, NewC), Mask
The only potentially unsafe-to-speculate binops are integer div/rem.
All other binops are always safe (although I don't see a way to
assert that in code here).
For opcodes like shifts that can produce poison, it can't matter
here because we know the lanes with undef are dropped by the
subsequent shuffle.
Differential Revision: https://reviews.llvm.org/D47686
llvm-svn: 333962
Review feedback from r328165. Split out just the one function from the
file that's used by Analysis. (As chandlerc pointed out, the original
change only moved the header and not the implementation anyway - which
was fine for the one function that was used (since it's a
template/inlined in the header) but not in general)
llvm-svn: 333954
This is setting up to fix bug 37573 cleanly.
This moves data structures that are technically both used in some way by the
target and the general-purpose outlining algorithm into MachineOutliner.h. In
particular, the `Candidate` class is of importance.
Before, the outliner passed the locations of `Candidates` to the target, which
would then make some decisions about the prospective outlined function. This
change allows us to just pass `Candidates` along to the target. This will allow
the target to discard `Candidates` that would be considered unsafe before cost
calculation. Thus, we will be able to remove the unsafe candidates described in
the bug without resorting to torching the entire prospective function.
Also, as a side-effect, it makes the outliner a bit cleaner.
https://bugs.llvm.org/show_bug.cgi?id=37573
llvm-svn: 333952
Windows' CRT has a limit of 512 open file descriptors, and fds which are
generated by converting a HANDLE via _get_osfhandle count towards this
limit as well.
Regardless, often you find yourself marshalling back and forth between
native HANDLE objects and fds anyway. If we know from the getgo that
we're going to need to work directly with the handle, we can cut out the
marshalling layer while also not contributing to filling up the CRT's
very limited handle table.
On Unix these functions just delegate directly to the existing set of
functions since an fd *is* the native file type. It would be nice, very
long term, if we could convert most uses of fds to file_t.
Differential Revision: https://reviews.llvm.org/D47688
llvm-svn: 333945
Summary: This includes variants for add, uaddo and addcarry. usubo and subcarry require the carry to be flipped to preserve semantics, but we chose to do the transform anyway in that case so as to push the transform down the carry chain.
Reviewers: efriedma, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46505
llvm-svn: 333943
Summary: It has been deprecated in favor of SETCCCARRY for a year now and isn't used by any in tree backend.
Reviewers: efriedma, craig.topper, dblaikie, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47685
llvm-svn: 333939
entries to reach the target. Since these calls don't require type checks,
we can short-circuit them to their real targets, except in cases when they
can be pre-empted.
Differential Revision: https://reviews.llvm.org/D46326
llvm-svn: 333937
When checking a select to see if it matches an abs, allow the true/false values
to be a sign-extension of the comparison value instead of requiring that they're
directly the comparison value, as all the comparison cares about is the sign of
the value.
This fixes a regression due to r333702, where we were no longer generating ctlz
due to isKnownNonNegative failing to match such a pattern.
Differential Revision: https://reviews.llvm.org/D47631
llvm-svn: 333927
On GFX9 and earlier, flat memory ops may decrement VMCNT out-of-order as well as LGKMCNT out-of-order.
Differential Revision: https://reviews.llvm.org/D46616
llvm-svn: 333926
This patch is the last of a sequence of three patches related to LLVM-dev RFC
"MC support for variant scheduling classes".
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123181.html
This fixes PR36672.
The main goal of this patch is to teach llvm-mca how to solve variant scheduling
classes. This patch does that, plus it adds new variant scheduling classes to
the BtVer2 scheduling model to identify so-called zero-idioms (i.e.
dependency-breaking instructions that are known to generate zero and that
are optimized out in hardware at the register renaming stage).
Without the BtVer2 change, this patch would not have had any meaningful tests.
This patch is effectively the union of two changes:
1) a change that teaches llvm-mca how to resolve variant scheduling classes.
2) a change to the BtVer2 scheduling model that allows us to special-case
packed XOR zero-idioms (this partially fixes PR36671).
Differential Revision: https://reviews.llvm.org/D47374
llvm-svn: 333909
Summary:
The new rules are straightforward. The main rules to keep in mind
are:
1. NAME is an implicit template argument of class and multiclass,
and will be substituted by the name of the instantiating def/defm.
2. The name of a def/defm in a multiclass must contain a reference
to NAME. If such a reference is not present, it is automatically
prepended.
And for some additional subtleties, consider these:
3. defm with no name generates a unique name but has no special
behavior otherwise.
4. def with no name generates an anonymous record, whose name is
unique but undefined. In particular, the name won't contain a
reference to NAME.
Keeping rules 1&2 in mind should allow a predictable behavior of
name resolution that is simple to follow.
The old "rules" were rather surprising: sometimes (but not always),
NAME would correspond to the name of the toplevel defm. They were
also plain bonkers when you pushed them to their limits, as the old
version of the TableGen test case shows.
Having NAME correspond to the name of the toplevel defm introduces
"spooky action at a distance" and breaks composability:
refactoring the upper layers of a hierarchy of nested multiclass
instantiations can cause unexpected breakage by changing the value
of NAME at a lower level of the hierarchy. The new rules don't
suffer from this problem.
Some existing .td files have to be adjusted because they ended up
depending on the details of the old implementation.
Change-Id: I694095231565b30f563e6fd0417b41ee01a12589
Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D47430
llvm-svn: 333900
For immediates used in DUP instructions that have the range
-128 to 127, or a multiple of 256 in the range -32768 to 32512,
one could argue that when the result element size is 16 bits (.h),
the value can be considered both signed and unsigned.
Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D47619
llvm-svn: 333873
Print the first indexed element as a FP register, for example:
mov z0.d, z1.d[0]
Is now printed as:
mov z0.d, d1
Next to printing, this patch also adds aliases to parse 'mov z0.d, d1'.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D47571
llvm-svn: 333872
Unpredicated copy of indexed SVE element to SVE vector,
along with MOV-aliases.
For example:
dup z0.h, z1.h[0]
duplicates the first 16-bit element from z1 to all elements in
the result vector z0.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D47570
llvm-svn: 333871
Predicated copy of floating-point immediate value to SVE vector,
along with MOV-aliases.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar
Reviewed By: javed.absar
Differential Revision: https://reviews.llvm.org/D47518
llvm-svn: 333869
Predicated copy of possibly shifted immediate value into SVE
vector, along with MOV-aliases.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D47517
llvm-svn: 333868
When we optimize a select based on the fact that division by 0 is undef,
we should not traverse instructions which are not guaranteed to
transfer execution to the next instruction. The guard intrinsic is an example.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47576
llvm-svn: 333864
Applying synthetic debug info before the bitcode writer pass has no
testing-related purpose. This commit prevents that from happening.
It also adds tests which check that IR produced with/without
-debugify-each enabled is identical after stripping. This makes it
possible to check that individual passes (or full pipelines) are
invariant to debug info.
llvm-svn: 333861
pre-existing SymbolFlags and SymbolToDefinition maps.
This constructor is useful when delegating work from an existing
IRMaterializationUnit to a new one, as it avoids the cost of re-computing these
maps.
llvm-svn: 333852
There's a patchwork of existing transforms trying to handle
these cases, but as seen in the changed test, we weren't
catching them all.
llvm-svn: 333845
Summary: This creates a small perf regression, but after talking with Jacques Pienaar, he was good with it to get things moving toward removing SETCCE.
Reviewers: jpienaar, bryant
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47626
llvm-svn: 333838
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47120
llvm-svn: 333825
Object File Representation
At codegen time this is emitted into the ELF file as a pair of symbol indices and a weight. In assembly it looks like:
.cg_profile a, b, 32
.cg_profile freq, a, 11
.cg_profile freq, b, 20
When writing an ELF file these are put into a SHT_LLVM_CALL_GRAPH_PROFILE (0x6fff4c02) section as (uint32_t, uint32_t, uint64_t) tuples as (from symbol index, to symbol index, weight).
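As a sketch, the shape of one such section entry (the struct name is illustrative; the fields follow the order stated above):
```
#include <cstdint>

// One SHT_LLVM_CALL_GRAPH_PROFILE entry:
// (from symbol index, to symbol index, weight).
struct CGProfileEntry {
  uint32_t FromSym; // symbol table index of the "from" symbol
  uint32_t ToSym;   // symbol table index of the "to" symbol
  uint64_t Weight;  // edge weight
};
```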
Differential Revision: https://reviews.llvm.org/D44965
llvm-svn: 333823