Summary:
When an LCSSA phi survives through instruction selection, the pass
ends up removing that phi entirely because it is dominated by the
logic that does the lanemask merging.
This then used to trigger an assertion when processing a dependent
phi instruction.
Change-Id: Id4949719f8298062fe476a25718acccc109113b6
Reviewers: llvm-commits
Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, tpr, dstuttard, rtaylor, arsenm
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60999
llvm-svn: 358983
The check for creating CBZ in constant island pass recently obtained the
ability to search backwards to find a Cmp instruction. The code in IfCvt should
mirror this to allow more conversions to the smaller form. The common code has
been pulled out into a separate function to be shared between the two places.
Differential Revision: https://reviews.llvm.org/D60090
llvm-svn: 358977
Ifcvt can replicate instructions as it converts them to be predicated. This
stops that from happening on thumb2 targets at minsize where an extra IT
instruction is likely needed.
Differential Revision: https://reviews.llvm.org/D60089
llvm-svn: 358974
Summary:
The DAGCombiner is rewriting (canonicalizing) an ISD::ADD
with no common bits set in the operands as an ISD::OR node.
This could sometimes result in "missing out" on some
combines that normally are performed for ADD. To be more
specific this could happen if we already have rewritten an
ADD into OR, and later (after legalizations or combines)
we expose patterns that could have been optimized if we
had seen the OR as an ADD (e.g. reassociations based on ADD).
To make the DAG combiner less sensitive to if ADD or OR is
used for these "no common bits set" ADD/OR operations we
now apply most of the ADD combines also to an OR operation,
when value tracking indicates that the operands have no
common bits set.
Reviewers: spatel, RKSimon, craig.topper, kparzysz
Reviewed By: spatel
Subscribers: arsenm, rampitec, lebedev.ri, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59758
llvm-svn: 358965
This patch provides intrinsics support for Memory Tagging Extension (MTE),
which was introduced with the Armv8.5-a architecture.
The intrinsics are described in detail in the latest
ACLE Q1 2019 documentation: https://developer.arm.com/docs/101028/latest
Reviewed by: David Spickett
Differential Revision: https://reviews.llvm.org/D60486
llvm-svn: 358963
In some circumstances we can end up with setup costs that are very complex to
compute, even though the scevs are not very complex to create. This can also
lead to setupcosts that are calculated to be exactly -1, which LSR treats as an
invalid cost. This patch puts a limit on the recursion depth for setup cost to
prevent them taking too long.
Thanks to @reames for the report and test case.
Differential Revision: https://reviews.llvm.org/D60944
llvm-svn: 358958
This was supposed to be NFC, but the change in SDLoc
definitions causes instruction scheduling changes.
There's nothing x86-specific in this code, and it can
likely be used from DAGCombiner's simplifyVBinOp().
llvm-svn: 358930
Summary:
- Only apply packed literal `op_sel_hi` skipping on operands requiring
packed literals. Even an instruction is `packed`, it may have operand
requiring non-packed literal, such as `v_dot2_f32_f16`.
Reviewers: rampitec, arsenm, kzhuravl
Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60978
llvm-svn: 358922
These are inserted after branch relaxation, and for some reason it's
decided to put them in the long branch expansion block. It's probably
not great to rely on the source block address, so this should probably
be switched to being PC relative instead of relying on the block
address
llvm-svn: 358909
This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.
The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions.
The X86 changes are all definite wins.
Differential Revision: https://reviews.llvm.org/D60462
llvm-svn: 358887
Summary:
If you pass two 1024 bit vectors in IR with AVX2 on Windows 64. Both vectors will be split in four 256 bit pieces. The four pieces of the first argument will be passed indirectly using 4 gprs. The second argument will get passed via pointers in memory.
The PartOffsets stored for the second argument are all in terms of its original 1024 bit size. So the PartOffsets for each piece are 32 bytes apart. So if we consider it for copy elision we'll only load an 8 byte pointer, but we'll move the address 32 bytes. The stack object size we create for the first part is probably wrong too.
This issue was encountered by ISPC. I'm working on getting a reduce test case, but wanted to go ahead and get feedback on the fix.
Reviewers: rnk
Reviewed By: rnk
Subscribers: dbabokin, llvm-commits, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60801
llvm-svn: 358817
Fix for https://bugs.llvm.org/show_bug.cgi?id=41477. On the x32 ABI
with stack probing a dynamic alloca will result in a WIN_ALLOCA_32
with a 32-bit size. The current implementation tries to copy it into
RAX, resulting in a physreg copy error. Fix this by copying to EAX
instead. Also fix incorrect opcodes or registers used in subs.
llvm-svn: 358807
The MOVZX doesn't require an immediate to be encoded at all. Though it does use
a 2 byte opcode so its the same size as a 1 byte immediate. But it has a
separate source and dest register so can help avoid copies.
llvm-svn: 358805
There's one slight regression in here because we don't check that the immediate
already allowed movzx before the shift. I'll fix that next.
llvm-svn: 358804
Exactly the same as G_FCEIL, G_FABS, etc.
Add tests for the fp16/nofp16 behaviour, update arm64-vfloatintrinsics, etc.
Differential Revision: https://reviews.llvm.org/D60895
llvm-svn: 358799
My understanding is that once BuildMI has been called we can't fallback
to SelectionDAG.
This change moves the fallback for when getRegForValue() fails for
that target of an indirect call. This was failing in -fPIC mode when
the callee is GlobalValue.
Add a test case that tickles this.
Differential Revision: https://reviews.llvm.org/D60908
llvm-svn: 358793
Summary:
This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug
info in codeview. Currently only changes FastISel, so emitting labels still
needs to be implemented in SelectionDAG.
Reviewers: hans, rnk
Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D60800
llvm-svn: 358783
This instruction is legalized in the same way as G_FSIN, G_FCOS, G_FLOG10, etc.
Update legalize-pow.mir and arm64-vfloatintrinsics.ll to reflect the change.
Differential Revision: https://reviews.llvm.org/D60218
llvm-svn: 358764
These are general queries, so they should not die when given
a degenerate input like an all undef mask. Callers should be
able to deal with an op that will eventually be simplified away.
llvm-svn: 358761
This adds legalization for G_SEXT, G_ZEXT, and G_ANYEXT for v8s8s.
We were falling back on G_ZEXT in arm64-vabs.ll before, preventing us from
selecting the @llvm.aarch64.neon.sabd.v8i8 intrinsic.
This adds legalizer support for those 3, which gives us selection via the
importer. Update the relevant tests (legalize-ext.mir, select-int-ext.mir) and
add a GISel line to arm64-vabs.ll.
Differential Revision: https://reviews.llvm.org/D60881
llvm-svn: 358715
combineVectorTruncationWithPACKUS is currently splitting the upper bit bit masking into 128-bit subregs and then concatenating them back together.
This was originally done to avoid regressions that caused existing subregs to be concatenated to the larger type just for the AND masking before being extracted again. This was fixed by @spatel (notably rL303997 and rL347356).
This also lets SimplifyDemandedBits do some further improvements before it hits the recursive depth limit.
My only annoyance with this is that we were broadcasting some xmm masks but we seem to have lost them by moving to ymm - but that's a known issue as the logic in lowerBuildVectorAsBroadcast isn't great.
Differential Revision: https://reviews.llvm.org/D60375#inline-539623
llvm-svn: 358692
This replaces the MOVMSK combine introduced at D52121/rL342326
(movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C))
with the more general icmp lowering so it can pick up more cases through bitcasts - notably vXi8 cases which use vXi16 shifts+masks, this patch can remove the mask and use pcmpgtb(0,x) for the sra.
Differential Revision: https://reviews.llvm.org/D60625
llvm-svn: 358651
Summary:
This issue from the bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41177
When the two operands for BUILD_VECTOR are same, we will get assert error.
llvm::SDValue combineBVOfConsecutiveLoads(llvm::SDNode*, llvm::SelectionDAG&):
Assertion `!(InputsAreConsecutiveLoads && InputsAreReverseConsecutive) &&
"The loads cannot be both consecutive and reverse consecutive."' failed.
This error caused by the wrong ElemSIze when calling isConsecutiveLS(). We
should use `getScalarType().getStoreSize();` to get the ElemSize instread of
`getScalarSizeInBits() / 8`.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D60811
llvm-svn: 358644
fneg combining attempts to turn it into fadd(fneg(A), fneg(0)), but
creating the new fadd folds to just fneg(A). When A has multiple uses,
this confuses it and you get an assert. Fixed.
Differential Revision: https://reviews.llvm.org/D60633
Change-Id: I0ddc9b7286abe78edc0cd8d734fdeb05ff09821c
llvm-svn: 358640
The test file has pairs of tests that are logically equivalent:
https://rise4fun.com/Alive/2zQ
%t4 = and i8 %t1, 8
%t5 = zext i8 %t4 to i16
%sh = shl i16 %t5, 2
%t6 = add i16 %sh, %t0
=>
%t4 = and i8 %t1, 8
%sh2 = shl i8 %t4, 2
%z5 = zext i8 %sh2 to i16
%t6 = add i16 %z5, %t0
...so if we can fold the shift op into LEA in the 1st pattern, then we
should be able to do the same in the 2nd pattern (unnecessary 'movzbl'
is a separate bug I think).
We don't want to do this any sooner though because that would conflict
with generic transforms that try to narrow the width of the shift.
Differential Revision: https://reviews.llvm.org/D60789
llvm-svn: 358622
Legalize things like i24 load/store by splitting them into smaller power of 2 operations.
This matches how SelectionDAG handles these operations.
Differential Revision: https://reviews.llvm.org/D59971
llvm-svn: 358613
Summary:
None of these derived classes do anything that the base class cannot.
If we remove these case statements, then the base class can handle them
just fine.
Reviewers: peter.smith, echristo
Reviewed By: echristo
Subscribers: nemanjai, javed.absar, eraman, kristof.beyls, hiraditya, kbarton, jsji, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60803
llvm-svn: 358603
Summary: This fixes a large Dawn of War 3 performance regression with RADV from Mesa 19.0 to master which was caused by creating less code in some branches.
Reviewers: arsen, nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60824
llvm-svn: 358592
This will change with the proposal in D60214.
Unfortunately, the triple is not supported for auto-generation
via script, and the multiple RUN lines have diffs on this test,
but I can't tell exactly what is required by this test.
PR7162 was an assert/crash, so hopefully, this is good enough.
llvm-svn: 358587
Improves codegen demonstrated by D60512 - instructions represented by X86ISD::PERMV/PERMV3 can never memory fold the operand used for their index register.
This patch updates the 'isUseOfShuffle' helper into the more capable 'isFoldableUseOfShuffle' that recognises that the op is used for a X86ISD::PERMV/PERMV3 index mask and can't be folded - allowing us to use broadcast/subvector-broadcast ops to reduce the size of the mask constant pool data.
Differential Revision: https://reviews.llvm.org/D60562
llvm-svn: 358516
When not optimizing for minimum size (-Oz) we custom lower wide shifts
(SHL_PARTS, SRA_PARTS, SRL_PARTS) instead of expanding to a libcall.
Differential Revision: https://reviews.llvm.org/D59477
llvm-svn: 358498
The original commit caused false positives from AddressSanitizer's
use-after-scope checks, which have now been fixed in r358478.
> The code was previously checking that candidates for sinking had exactly
> one use or were a store instruction (which can't have uses). This meant
> we could sink call instructions only if they had a use.
>
> That limitation seemed a bit arbitrary, so this patch changes it to
> "instruction has zero or one use" which seems more natural and removes
> the need to special-case stores.
>
> Differential revision: https://reviews.llvm.org/D59936
llvm-svn: 358483
Since non-pow-2 types are going to get split up into multiple loads anyway,
don't do the [SZ]EXTLOAD combine for those and save us trouble later in
legalization.
llvm-svn: 358458
The pattern we replaced these with may be too hard to match as demonstrated by
PR41496 and PR41316.
This patch restores the intrinsics and then we can start focusing
on the optimizing the intrinsics.
I've mostly reverted the original patch that removed them. Though I modified
the avx512 intrinsics to not have masking built in.
Differential Revision: https://reviews.llvm.org/D60674
llvm-svn: 358427
Summary:
Use KnownBits::computeForAddSub/computeForAddCarry
in SelectionDAG::computeKnownBits when doing value
tracking for addition/subtraction.
This should improve the precision of the known bits,
as we only used to make a simple estimate of known
zeroes. The KnownBits support functions are also
able to deduce bits that are known to be one in the
result.
Reviewers: spatel, RKSimon, nikic, lebedev.ri
Reviewed By: nikic
Subscribers: nikic, javed.absar, lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60460
llvm-svn: 358372
Apparently there are some stray IMPLICIT_DEF operations that weren't in the
checks. Not sure if they've always been there or something changed at some
point.
llvm-svn: 358371
Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't
be degraded.
This change also improves the IRTranslator so that in most places, but not all,
it creates constants using the MIRBuilder directly instead of first creating a
new destination vreg and then creating a constant. By doing this, the
buildConstant() method can just return the vreg of an existing G_CONSTANT
instead of having to create a COPY from it.
I measured a 0.2% improvement in compile time and a 0.9% improvement in code
size at -O0 ARM64.
Compile time:
Program base cse diff
test-suite...ark/tramp3d-v4/tramp3d-v4.test 9.04 9.12 0.8%
test-suite...Mark/mafft/pairlocalalign.test 2.68 2.66 -0.7%
test-suite...-typeset/consumer-typeset.test 5.53 5.51 -0.4%
test-suite :: CTMark/lencod/lencod.test 5.30 5.28 -0.3%
test-suite :: CTMark/Bullet/bullet.test 25.82 25.76 -0.2%
test-suite...:: CTMark/ClamAV/clamscan.test 6.92 6.90 -0.2%
test-suite...TMark/7zip/7zip-benchmark.test 34.24 34.17 -0.2%
test-suite :: CTMark/SPASS/SPASS.test 6.25 6.24 -0.1%
test-suite...:: CTMark/sqlite3/sqlite3.test 1.66 1.66 -0.1%
test-suite :: CTMark/kimwitu++/kc.test 13.61 13.60 -0.0%
Geomean difference -0.2%
Code size:
Program base cse diff
test-suite...-typeset/consumer-typeset.test 1315632 1266480 -3.7%
test-suite...:: CTMark/ClamAV/clamscan.test 1313892 1297508 -1.2%
test-suite :: CTMark/lencod/lencod.test 1439504 1423112 -1.1%
test-suite...TMark/7zip/7zip-benchmark.test 2936980 2904172 -1.1%
test-suite :: CTMark/Bullet/bullet.test 3478276 3445460 -0.9%
test-suite...ark/tramp3d-v4/tramp3d-v4.test 8082868 8033492 -0.6%
test-suite :: CTMark/kimwitu++/kc.test 3870380 3853972 -0.4%
test-suite :: CTMark/SPASS/SPASS.test 1434904 1434896 -0.0%
test-suite...Mark/mafft/pairlocalalign.test 764528 764528 0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test 782092 782092 0.0%
Geomean difference -0.9%
Differential Revision: https://reviews.llvm.org/D60580
llvm-svn: 358369
We had many tablegen patterns for these instructions. And due to the
commutability of the patterns, tablegen expands them to even more patterns. All
together VPTESTMD patterns accounted for more the 50K of the 610K isel table.
This had gotten bad when we stopped canonicalizing AND to vXi64. This required
a pattern for every combination of bitcast input type.
This change moves the matching to custom code where it is easier to look through
the bitcasts without being concerned with the specific types.
The test changes are because we are now stricter with one use checks as its
required to make load folding legal. We now require the AND and any BITCAST to
only have a single use. This prevents forming VPTESTM and a VPAND with the same
inputs.
We now support broadcast loads for 128/256 patterns without VLX. We'll widen to
512-bit like and still fold the broadcast since the amount of memory read
doesn't change.
There are a few tests that got slightly longer because are now prefering
load + VPTESTM over XOR+VPCMPEQ for (seteq (load), allzeros). Previously we were
able to share the XOR with multiple VPTESTM instructions.
llvm-svn: 358359
We're better of emitting a single compare + kand rather than a compare for the
other use and a masked compare.
I'm looking into using custom instruction selection for VPTESTM to reduce the
ridiculous number of permutations of patterns in the isel table. Putting a one
use check on all masked compare folding makes load fold matching in the custom
code easier.
llvm-svn: 358358
Summary:
The Linux kernel uses PC-relative mode, so allow that when the code model is
"kernel".
Reviewers: craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits, kees, nickdesaulniers
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60643
llvm-svn: 358343
This enables the simple copy combine that already exists in the CombinerHelper.
However, it exposed a bug in the GISelChangeObserver where it wouldn't clear a
set of MIs to process, and so would end up causing a crash when deleted MIs were
being added to the combiner worklist again.
Differential Revision: https://reviews.llvm.org/D60579
llvm-svn: 358318
This crash was introduced in r358032 as we try to construct an EVT from an MVT
in order to find the register type for the calling conv. Fall back instead of
trying to do this with an invalid MVT coming from i256.
llvm-svn: 358314
If a shufflevector's mask vector has an element with "undef" then the generic
instruction defining that element register is a G_IMPLICT_DEF instead of G_CONSTANT.
This fixes the selector to handle this case, and for now assumes that undef just means
zero. In future we'll optimize this case properly.
llvm-svn: 358312
Summary: This brings the backend in line with Clang.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60594
llvm-svn: 358310
The Hexagon Vector Loop Carried Reuse pass was allowing reuse between
two shufflevectors with different masks. The reason is that the masks
are not instruction objects, so the code that checks each operand
just skipped over the operands.
This patch fixes the bug by checking if the operands are the same
when they are not instruction objects. If the objects are not the
same, then the code assumes that reuse cannot occur.
Differential Revision: https://reviews.llvm.org/D60019
llvm-svn: 358292
// shuffle (concat X, undef), (concat Y, undef), Mask -->
// concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1)
The ARM changes with 'vtrn' and narrowed 'vuzp' are improvements.
The x86 changes look neutral or better. There's one test with an
extra instruction, but that could be reversed for a subtarget with
the right attributes. But by default, we want to avoid the 256-bit
op when possible (in my motivating benchmark, a handful of ymm ops
sprinkled into a sequence of xmm ops are triggering frequency
throttling on Haswell resulting in significantly worse perf).
Differential Revision: https://reviews.llvm.org/D60545
llvm-svn: 358291
Currently combineHorizontalPredicateResult only handles anyof/allof reduction patterns of legal types, which can be tricky to match as type legalization of bools can introduce bitcasts/truncs/extensions.
This patch extends combineHorizontalPredicateResult to recognise vXi1 bool reductions as well and uses the existing combineBitcastvxi1 helper to create the MOVMSK necessary to then compare the signmask result.
This ensures the accuracy of the reduction costs added in D60403 which assume the MOVMSK generation.
Differential Revision: https://reviews.llvm.org/D60610
llvm-svn: 358286
It causes clang to crash while building Chromium. See https://crbug.com/952230
for reproducer.
> The PrologEpilogInserter need to insert a DW_OP_deref_size before
> prepending a memory location expression to an already implicit
> expression to avoid having the existing expression act on the memory
> address instead of the value behind it.
>
> The reason for using DW_OP_deref_size and not plain DW_OP_deref is that
> big-endian targets need to read the right size as simply truncating a
> larger read would yield the wrong result (LSB bytes are not at the lower
> address).
>
> Differential Revision: https://reviews.llvm.org/D59687
llvm-svn: 358281
Summary:
Some llc debug options need pass-name as the parameters.
But if we use the pass-name ppc-early-ret, we will get below error:
llc test.ll -stop-after ppc-early-ret
LLVM ERROR: "ppc-early-ret" pass is not registered.
Below pass-names have the pass is not registered error:
ppc-ctr-loops
ppc-ctr-loops-verify
ppc-loop-preinc-prep
ppc-toc-reg-deps
ppc-vsx-copy
ppc-early-ret
ppc-vsx-fma-mutate
ppc-vsx-swaps
ppc-reduce-cr-ops
ppc-qpx-load-splat
ppc-branch-coalescing
ppc-branch-select
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D60248
llvm-svn: 358271
The PrologEpilogInserter need to insert a DW_OP_deref_size before
prepending a memory location expression to an already implicit
expression to avoid having the existing expression act on the memory
address instead of the value behind it.
The reason for using DW_OP_deref_size and not plain DW_OP_deref is that
big-endian targets need to read the right size as simply truncating a
larger read would yield the wrong result (LSB bytes are not at the lower
address).
Differential Revision: https://reviews.llvm.org/D59687
llvm-svn: 358268
If the upper bits of the SHL result aren't used, we might be able to use a narrower shift. For example, on X86 this can turn a 64-bit into 32-bit enabling a smaller encoding.
Differential Revision: https://reviews.llvm.org/D60358
llvm-svn: 358257
Summary:
Some llc debug options need pass-name as the parameters.
But if we use the pass-name ppc-early-ret, we will get below error:
llc test.ll -stop-after ppc-early-ret
LLVM ERROR: "ppc-early-ret" pass is not registered.
Below pass-names have the pass is not registered error:
ppc-ctr-loops
ppc-ctr-loops-verify
ppc-loop-preinc-prep
ppc-toc-reg-deps
ppc-vsx-copy
ppc-early-ret
ppc-vsx-fma-mutate
ppc-vsx-swaps
ppc-reduce-cr-ops
ppc-qpx-load-splat
ppc-branch-coalescing
ppc-branch-select
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D60248
llvm-svn: 358256
There are 3 operands of maddld, (add (mul %1, %2), %3) and sometimes
they are constant. If there is constant operand, it takes extra li to
materialize the operand, and one more extra register too. So it's not
profitable to use maddld to optimize mul-add pattern.
Differential Revision: https://reviews.llvm.org/D60181
llvm-svn: 358253
The isLoopCarriedDep function does not correctly compute loop
carried dependences when the array index offset is negative
or the stride is smallar than the access size.
Patch by Denis Antrushin.
Differential Revision: https://reviews.llvm.org/D60135
llvm-svn: 358233
Loads and store of values with type like <2 x p0> currently don't get imported
because SelectionDAG has no knowledge of pointer types. To leverage the existing
support for vector load/stores, we can bitcast the value to have s64 element
types instead. We do this as a custom legalization.
This patch also adds support for general loads of <2 x s64>, and relaxes some
type conditions on selecting G_BITCAST.
Differential Revision: https://reviews.llvm.org/D60534
llvm-svn: 358221
If the vector setcc has been legalized then we will need to convert a vector boolean of 0 or -1 to a scalar boolean of 0 or 1.
The added test case previously crashed in 32-bit mode by creating a setcc with an i64 condition that type legalization couldn't expand.
llvm-svn: 358218
This patch adds patterns for turning bitcasted atomic load/store into movss/sd.
It also removes the pseudo instructions for atomic RMW fadd. Instead just adding isel patterns for folding an atomic load into addss/sd. And relying on the new movss/sd store pattern to handle the write part.
This also makes the fadd patterns use VEX and EVEX instructions when AVX or AVX512F are enabled.
Differential Revision: https://reviews.llvm.org/D60394
llvm-svn: 358215
With correct test checks this time.
If we have X87, but not SSE2 we can atomicaly load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integ
This matches what gcc and icc do for this case and removes an existing FIXME.
llvm-svn: 358214
If we have X87, but not SSE2 we can atomicaly load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integer and store it to a stack temporary. From there we can do two 32-bit loads to get the value into integer registers without worrying about atomicness.
This matches what gcc and icc do for this case and removes an existing FIXME.
Differential Revision: https://reviews.llvm.org/D60156
llvm-svn: 358211
// bo (build_vec ...undef, x, undef...), (build_vec ...undef, y, undef...) -->
// build_vec ...undef, (bo x, y), undef...
The lifetime of the nodes in these examples is different for variables versus constants,
but they are all build vectors briefly, so I'm proposing to catch them in this form to
handle all of the leading examples in the motivating test file.
Before we have build vectors, we might have insert_vector_element. After that, we might
have scalar_to_vector and constant pool loads.
It's going to take more work to ensure that FP vector operands are getting simplified
with undef elements, so this transform can apply more widely. In a non-loose FP environment,
we are likely simplifying FP elements to NaN values rather than undefs.
We also need to allow more opcodes down this path. Eg, we don't handle FP min/max flavors
yet.
Differential Revision: https://reviews.llvm.org/D60514
llvm-svn: 358172
Because of gp = sdata_start_address + 0x800, gp with signed twelve-bit offset
could covert most of the small data section. Linker relaxation could transfer
the multiple data accessing instructions to a gp base with signed twelve-bit
offset instruction.
Differential Revision: https://reviews.llvm.org/D57493
llvm-svn: 358150
foldMaskedShiftToScaledMask tries to reorder and & shl to enable the shl to fold into an LEA. But if there is an any_extend between them it doesn't work.
This patch modifies the code to look through any_extend from i32 to i64 when the and mask only uses bits that weren't from the extended part.
This will prevent a regression from D60358 caused by 64-bit SHL being narrowed to 32-bits when their upper bits aren't demanded.
Differential Revision: https://reviews.llvm.org/D60532
llvm-svn: 358139
If we have an (add X, (and (aext (shl Y, C1)), C2)), we can pull the shift through and+aext to fold into an LEA with the.
Assuming C1 is small enough and C2 masks off all of the extend bits.
This pattern showed up in D60358. And we need to handle it to prevent a regression.
llvm-svn: 358124
This adds a simple extra test for constant hoisting to show it's
usefulness with constant addresses like those seen in memory
mapped registers in embedded systems.
llvm-svn: 358114
Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not
seeing through to the constant in other blocks. Revert this patch while we come
up with a better way to handle that.
I will try to follow this up with some better tests.
llvm-svn: 358113
This patch teach getTestBitOperand to look through ANY_EXTENDs when the extended bits aren't used. The test case changed here is based what D60358 did to test16 in tbz-tbnz.ll. So this patch will avoid that regression.
Differential Revision: https://reviews.llvm.org/D60482
llvm-svn: 358108
Summary: The fp16 scalar version of facge and facgt requires a custom patter matching, as the result type is not the same width of the operands.
Reviewers: olista01, javed.absar, pbarrio
Reviewed By: javed.absar
Subscribers: kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60212
llvm-svn: 358083
Make it possible to TableGen code for FCONSTS and FCONSTD.
We need to make two changes to the TableGen descriptions of vfp_f32imm
and vfp_f64imm respectively:
* add GISelPredicateCode to check that the immediate fits in 8 bits;
* extract the SDNodeXForms into separate definitions and create a
GISDNodeXFormEquiv and a custom renderer function for each of them.
There's a lot of boilerplate to get the actual value of the immediate,
but it basically just boils down to calling ARM_AM::getFP32Imm or
ARM_AM::getFP64Imm.
llvm-svn: 358063
Summary:
Obviously, new built MI (sethi+add or sethi+xor+add) for constructing large offset
should be inserted before new created MI for storing even register into memory.
So the insertion position should be *StMI instead of II.
before fixed:
std %f0, [%g1+80]
sethi 4, %g1 <<<
add %g1, %sp, %g1 <<< this two instructions should be put before "std %f0, [%g1+80]".
sethi 4, %g1
add %g1, %sp, %g1
std %f2, [%g1+88]
after fixed:
sethi 4, %g1
add %g1, %sp, %g1
std %f0, [%g1+80]
sethi 4, %g1
add %g1, %sp, %g1
std %f2, [%g1+88]
Reviewers: venkatra, jyknight
Reviewed By: jyknight
Subscribers: jyknight, fedor.sergeev, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60397
llvm-svn: 358042
The selection for G_ICMP is unfortunately not currently importable from SDAG
due to the use of custom SDNodes. To support this, this selection method has an
opcode table which has been generated by a script, indexed by various
instruction properties. Ideally in future we will have a GISel native selection
patterns that we can write in tablegen to improve on this.
For selection of some types we also need support for G_ASHR and G_SHL which are
generated as a result of legalization. This patch also adds support for them,
generating the same code as SelectionDAG currently does.
Differential Revision: https://reviews.llvm.org/D60436
llvm-svn: 358035
required to be passed as different register types. E.g. <2 x i16> may need to
be passed as a larger <2 x i32> type, so formal arg lowering needs to be able
truncate it back. Likewise, when dealing with returns of these types, they need
to be widened in the appropriate way back.
Differential Revision: https://reviews.llvm.org/D60425
llvm-svn: 358032
This lines up with what we do for regular subtract and it matches up better with X86 assumptions in isel patterns that add with immediate is more canonical than sub with immediate.
Differential Revision: https://reviews.llvm.org/D60020
llvm-svn: 358027
One of out of tree targets has regressed with this patch. Reverting
it for now and let liveness to be fully reconstructed in case pass
was used after the LIS is created to resolve the regression.
Differential Revision: https://reviews.llvm.org/D60466
llvm-svn: 358015
When bitcasting from a source op to a larger bitwidth op, split the demanded bits and OR them on top of one another and demand those merged bits in the SimplifyDemandedBits call on the source op.
llvm-svn: 357992
I was looking at a potential DAGCombiner fix for 1 of the regressions in D60278, and it caused severe regression test pain because x86 TLI lies about the desirability of 8-bit shift ops.
We've hinted at making all 8-bit ops undesirable for the reason in the code comment:
// TODO: Almost no 8-bit ops are desirable because they have no actual
// size/speed advantages vs. 32-bit ops, but they do have a major
// potential disadvantage by causing partial register stalls.
...but that leads to massive diffs and exposes all kinds of optimization holes itself.
Differential Revision: https://reviews.llvm.org/D60286
llvm-svn: 357912
Summary:
Previously we would use MOVZXrm8/MOVZXrm16, but those are longer encodings.
This is similar to what we do in the loadi32 predicate.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60341
llvm-svn: 357875
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This fixes an infinite InstCombine loop, with the
test case taken from D59378.
Relative to the previous iteration, this contains some adjustments for
AMDGPU med3 tests: The AMDGPU target runs InstSimplify prior to codegen,
which ends up constant folding some existing med3 tests after this
change. To preserve these tests a hidden -amdgpu-scalar-ir-passes option
is added, which allows disabling scalar IR passes (that use InstSimplify)
for testing purposes.
Differential Revision: https://reviews.llvm.org/D59506
llvm-svn: 357870
In the case where we only want the sign bit (e.g. when using PACKSS truncation of comparison results for MOVMSK) then we can just demand the sign bit of the source operands.
This makes use of the fact that PACKSS saturates out of range values to the min/max int values - so the sign bit is always preserved.
Differential Revision: https://reviews.llvm.org/D60333
llvm-svn: 357859
This function reorders AND and SHL to enable the SHL to fold into an LEA. The
upper bits of the AND will be shifted out by the SHL so it doesn't matter what
mask value we use for these bits. By using sign bits from the original mask in
these upper bits we might enable a shorter immediate encoding to be used.
llvm-svn: 357846
When we shift the AND mask over we should shift in sign bits instead of zero bits. The scale in the LEA will shift these bits out so it doesn't matter whether we mask the bits off or not. Using sign bits will potentially allow a sign extended immediate to be used.
Also add some other test cases for cases that are currently optimal.
llvm-svn: 357845
The expansion of TCRETURNri(64) would not keep operand flags like
undef/renamable/etc. which can result in machine verifier issues.
Also add plumbing to be able to use `-run-pass=x86-pseudo`.
llvm-svn: 357808
Detect dead lanes can create some dead defs. Then RenameIndependentSubregs
will break a REG_SEQUENCE which may use these dead defs. At this point
a dead instruction can be removed but we do not run a DCE anymore.
MachineDCE was only running before live variable analysis. The patch
adds a mean to preserve LiveIntervals and SlotIndexes in case it works
past this.
Differential Revision: https://reviews.llvm.org/D59626
llvm-svn: 357805
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between Jcc instructions and condition codes.
Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.
Reviewers: spatel, lebedev.ri, courbet, gchatelet, RKSimon
Reviewed By: RKSimon
Subscribers: MatzeB, qcolombet, eraman, hiraditya, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60228
llvm-svn: 357802
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between SETcc instructions and condition codes.
Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.
Reviewers: andreadb, courbet, RKSimon, spatel, lebedev.ri
Reviewed By: andreadb
Subscribers: hiraditya, lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60138
llvm-svn: 357801
Summary:
Reorder the condition code enum to match their encodings. Move it to MC layer so it can be used by the scheduler models.
This avoids needing an isel pattern for each condition code. And it removes
translation switches for converting between CMOV instructions and condition
codes.
Now the printer, encoder and disassembler take care of converting the immediate.
We use InstAliases to handle the assembly matching. But we print using the
asm string in the instruction definition. The instruction itself is marked
IsCodeGenOnly=1 to hide it from the assembly parser.
This does complicate the scheduler models a little since we can't assign the
A and BE instructions to a separate class now.
I plan to make similar changes for SETcc and Jcc.
Reviewers: RKSimon, spatel, lebedev.ri, andreadb, courbet
Reviewed By: RKSimon
Subscribers: gchatelet, hiraditya, kristina, lebedev.ri, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60041
llvm-svn: 357800
Second half of PR40800, this patch adds DAG undef handling to fcmp instructions to match the behavior in llvm::ConstantFoldCompareInstruction, this permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.).
This involves a lot of tweaking to reduced tests as bugpoint loves to reduce fcmp arguments to undef........
Differential Revision: https://reviews.llvm.org/D60006
llvm-svn: 357765
There are a variety of vector patterns that may be profitably reduced to a
scalar op when scalar ops are performed using a subset (typically, the
first lane) of the vector register file.
For x86, this is true for float/double ops and element 0 because
insert/extract is just a sub-register rename.
Other targets should likely enable the hook in a similar way.
Differential Revision: https://reviews.llvm.org/D60150
llvm-svn: 357760
Recent change rL357393 uses MachineInstrBuilder::addDisp to add a based on a
BlockAddress but this case was not implemented.
This patch adds the missing case and a test for RISC-V that exercises the new
case.
Differential Revision: https://reviews.llvm.org/D60136
llvm-svn: 357752
Summary:
Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if
the virtual reg used has one def only.
This can be particularly useful when calling isBaseWithConstantOffset()
with the ISD::CopyFromReg argument, as more optimizations may get enabled
in the result.
Also add a missing truncation on X86, found by testing of this patch.
Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa
Reviewers: bogner, craig.topper, RKSimon
Reviewed By: RKSimon
Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59535
llvm-svn: 357745
We already promote SRL and SHL to i32.
This will introduce sign extends sometimes which might be harder to deal with than the zero we use for promoting SRL. I ran this through some of our internal benchmark lists and didn't see any major regressions.
I think there might be some DAG combine improvement opportunities in the test changes here.
Differential Revision: https://reviews.llvm.org/D60278
llvm-svn: 357743
Lowering safepoint checks that all gc.relocaes observed in safepoint
must be lowered. However Fast-Isel is able to skip dead gc.relocate.
To resolve this issue we just ignore dead gc.relocate in the check.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D60184
llvm-svn: 357742
It unnecessarily breaks previously-working code which used varargs,
but didn't pass any float/double arguments (such as EDK2).
Also revert the fixup on top of that:
Revert [X86] Fix a test from r357317
This reverts r357317 (git commit d413f41de6)
This reverts r357380 (git commit 7af32444b9)
llvm-svn: 357718
This function is responsible for checking the legality of fusing an instance
of load -> op -> store into a single operation. In the SystemZ backend the
check was incomplete and a test case emerged with a cycle in the instruction
selection DAG as a result.
Instead of using the NodeIds to determine node relationships,
hasPredecessorHelper() now is used just like in the X86 backend. This handled
the failing tests and as well gave a few additional transformations on
benchmarks.
The SystemZ isFusableLoadOpStorePattern() is now a very near copy of the X86
function, and it seems this could be made a utility function in common code
instead.
Review: Ulrich Weigand
https://reviews.llvm.org/D60255
llvm-svn: 357688
SUBREG_TO_REG is supposed to be used to assert that we know the upper bits are
zero. But that isn't the case here. We've done no analysis of the inputs.
llvm-svn: 357673
The Fast ISel has a fallback to SelectionDAGISel in case it cannot handle the instruction.
This works as follows:
Using reverse order, try to select instruction using Fast ISel, if it cannot handle instruction it fallbacks to SelectionDAGISel
for these instructions if it is a call and continue fast instruction selections.
However if unhandled instruction is not a call or statepoint related instruction it fallbacks to SelectionDAGISel for all remaining
instructions in basic block.
However gc.result instruction is missed and as a result it is possible that gc.result is processed earlier than statepoint
causing breakage invariant the gc.results should be handled after statepoint.
Test is updated because in the current form fast-isel cannot handle ret instruction (due to i1 ret type without explicit ext)
and as a result test does not check fast-isel at all.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D60182
llvm-svn: 357672
This revision causes tests to fail under ASAN. Since the cause of the failures
is not clear (could be ASAN, could be a Clang bug, could be a bug in this
revision), the safest course of action seems to be to revert while investigating.
llvm-svn: 357667
These inserters inserted some instructions to zero some registers and copied from virtual registers to physical registers.
This change instead inserts the zeros directly into the DAG at lowering time using new ISD opcodes
that take the extra zeroes as inputs. The zeros will then go through isel on their own to select
the MOV32r0 pseudo. Then we just need to mention the physical registers directly
in the isel patterns and the isel table and InstrEmitter will take care of inserting the necessary
copies to/from physical registers.
llvm-svn: 357659
This custom inserter existed so we could do a weird thing where we pretended that the instructions support
a full address mode instead of taking a pointer in EAX/RAX. I think was largely so we could be pointer
size agnostic in the isel pattern.
To make this work we would then put the address into an LEA into EAX/RAX in front of the instruction after
isel. But the LEA is overkill when we just have a base pointer. So we end up using the LEA as a slower MOV
instruction.
With this change we now just do custom selection during isel instead and just assign the incoming address
of the intrinsic into EAX/RAX based on its size. After the intrinsic is selected, we can let isel take
care of selecting an LEA or other operation to do any address computation needed in this basic block.
I've also split the instruction into a 32-bit mode version and a 64-bit mode version so the implicit
use is properly sized based on the pointer. Without this we get comments in the assembly output about
killing eax and defing rax or vice versa depending on whether we define the instruction to use EAX/RAX.
llvm-svn: 357652
This pattern would show up as a regression if we more
aggressively convert vector FP ops to scalar ops.
There's still a missed optimization for the v4f64 legal
case (AVX) because we create that h-op with an undef operand.
We should probably just duplicate the operands for that
pattern to avoid trouble.
llvm-svn: 357642
Relying on no spill or other code being inserted before this was
precarious. It relied on code diligently checking isBasicBlockPrologue
which is likely to be forgotten.
Ideally this could be done earlier, but this doesn't work because of
phis. Any other instruction can't be placed before them, so we have to
accept the position being incorrect during SSA.
This avoids regressions in the fast register allocator rewrite from
inverting the direction.
llvm-svn: 357634
Same as G_EXP. Add a test, and update legalizer-info-validation.mir and
f16-instructions.ll.
Differential Revision: https://reviews.llvm.org/D60165
llvm-svn: 357605
When performing an add-with-overflow with an immediate in the
range -2G ... -4G, code currently loads the immediate into a
register, which generally takes two instructions.
In this particular case, it is preferable to load the negated
immediate into a register instead, which always only requires
one instruction, and then perform a subtract.
llvm-svn: 357597
There are 3 changes to make this correspond to the same transform in instcombine:
1. Remove the legality check - we can't create anything less legal than we started with.
2. Ease the use restriction, so we only bail out if both operands have >1 use.
3. Ease the use restriction for binops with a repeated operand (eg, mul x, x).
As discussed in D60150, there's a scalarization opportunity that will be made
easier by allowing this transform more generally.
llvm-svn: 357580