Add an option to control whether or not to enable store merging in the DAG combiner
so we can work around some bugs more easily.
Differential Revision: https://reviews.llvm.org/D65482
llvm-svn: 367365
This allows us to peek through BITCASTs, attempt to simplify the source operand, and then bitcast back.
This reapplies rL367091 which was reverted at rL367118 - we were inconsistently peeking through the bitcasts to the source value.
Fixes PR42777
llvm-svn: 367174
If anything called the recursive isKnownNeverNaN/computeKnownBits/ComputeNumSignBits/SimplifyDemandedBits/SimplifyMultipleUseDemandedBits with an incorrect depth then we could continue to recurse if we'd already exceeded the depth limit.
This replaces the limit check (Depth == 6) with (Depth >= 6) to make sure that we don't circumvent it.
This causes a couple of regressions as a mixture of calls (SimplifyMultipleUseDemandedBits + combineX86ShufflesRecursively) were calling with depths that were already over the limit. I've fixed SimplifyMultipleUseDemandedBits to not do this. combineX86ShufflesRecursively is trickier as we get a lot of regressions if we reduce its own limit from 8 to 6 (it also starts at Depth == 1 instead of Depth == 0 like the others...) - I'll see what I can do in future patches.
llvm-svn: 367171
We're getting reports of massive compile time increases because SimplifyMultipleUseDemandedBits was losing track of the depth and failing to early-out. No repro yet, but consider this a pre-emptive commit.
llvm-svn: 367169
Eventually all of these will be moved over, but at the moment we still create nodes in the GetDemandedBits recursion, which causes regressions when we try to remove them all.
llvm-svn: 367092
Summary:
This was originally reported in D62818.
https://rise4fun.com/Alive/oPH
InstCombine does the opposite fold, in the hope that the `C l>>/<< Y` expression
will be hoisted out of a loop if `Y` is invariant and `X` is not.
But as can be seen from the diffs here, if it doesn't get hoisted,
the produced assembly is almost universally worse.
Much like with my recent "hoist add/sub by/from const" patches,
we should get an almost universal win if we hoist the constant:
there is almost always an "and/test by imm" instruction,
but "shift by imm" not so much, so we may avoid having to
materialize the immediate, and thus need one less register.
And since we now shift not by a constant but by something else,
the live-range of that something else may shrink.
Special care needs to be applied not to disturb the x86 `BT` / hexagon `tstbit`
instruction patterns, and to not get into an endless combine loop.
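For illustration, a quick C++ check of the equivalence between the two forms being swapped here; the mask constant and test values below are arbitrary, not taken from the patch:

#include <cassert>
#include <cstdint>

int main() {
  const uint32_t C = 0x00F0u;                        // arbitrary mask
  for (uint32_t Y = 0; Y < 32; ++Y)
    for (uint32_t X : {0u, 1u, 0x00F0u, 0xDEADBEEFu, 0xFFFFFFFFu})
      // InstCombine's form on the left, the form produced by this fold on the right.
      assert(((X & (C >> Y)) != 0) == (((X << Y) & C) != 0));
  return 0;
}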
Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm
Reviewed By: spatel
Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62871
llvm-svn: 366955
This patch adds support for recognizing cases where a larger vector type is being used to reduce just the elements in the lower subvector:
e.g. <8 x i32> reduction pattern in a <16 x i32> vector:
<4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u>
<2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u>
<1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u>
matchBinOpReduction returns the lower extracted subvector in such cases, assuming isExtractSubvectorCheap accepts the extraction.
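In scalar terms the recognised pattern is just a horizontal sum over the lower half of the wider vector; a rough C++ model (array name and widths are only illustrative):

#include <cstdint>

// Sum only lanes 0..7 of a 16-lane vector; the upper lanes are the 'u' (don't
// care) entries in the shuffle masks above, so this is really an <8 x i32>
// reduction performed inside a <16 x i32> value.
uint32_t lowerHalfSum(const uint32_t v[16]) {
  uint32_t sum = 0;
  for (int i = 0; i < 8; ++i)
    sum += v[i];
  return sum;
}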
I've only enabled it for X86 reduction sums so far. I intend to enable it for the bitop/minmax cases in future patches, and eventually I think it's worth turning it on all the time. This is mainly just a case of ensuring that calls to matchBinOpReduction don't make assumptions about the vector width based on the original vector extraction.
Fixes the x86 partial reduction sum cases in PR33758 and PR42023.
Differential Revision: https://reviews.llvm.org/D65047
llvm-svn: 366933
If we are already using the same chain for the old/new memory ops then just return.
Fixes PR42727 which had getLoad() reusing an existing node.
llvm-svn: 366922
If all the demanded elts are from one operand and are in line, then we can use the operand directly.
The changes are mainly from SSE41 targets, which have blendvpd but not pcmpgtq, allowing the v2i64 comparison to be simplified as we only need the sign bit from alternate v4i32 elements.
llvm-svn: 366817
This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demand all bits/elts. The intention is to remove a similar function - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured.
The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold, which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use.
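As a scalar illustration of the peek-through idea (the masks and values are made up for this sketch): if a use only demands bits that an AND cannot change, that AND can be looked through for this one use even though it has other users:

#include <cassert>
#include <cstdint>

int main() {
  for (uint32_t x : {0u, 0xABu, 0x1234ABCDu, 0xFFFFFFFFu}) {
    uint32_t masked = x & 0xFFFFu;   // pretend this AND also has other users
    // The truncation below only demands bits 0..7, and the AND mask is all-ones
    // on those bits, so for this particular use the source 'x' works directly.
    assert(static_cast<uint8_t>(masked) == static_cast<uint8_t>(x));
  }
  return 0;
}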
We do see a couple of regressions that need to be addressed:
AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better).
X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG.
The code owners have confirmed that it's OK for these cases to be fixed up in future patches.
Differential Revision: https://reviews.llvm.org/D63281
llvm-svn: 366799
The function was calling getNode() on an SDValue to return, and the
caller turned the result back into an SDValue. So just return the
original SDValue to avoid this.
llvm-svn: 366779
We were silently using the ABI alignment for all of the stores generated for deopt and gc values. We'd gotten the alignment of the stack slot itself properly reduced (via MachineFrameInfo's clamping), but having the MMO on the store incorrect was enough for us to generate an aligned store to an unaligned location.
The simplest fix would have been to just pass the alignment to the helper function, but once we do that, the helper function doesn't really help. So, inline it and directly call the MMO version of DAG.getStore with a properly constructed MMO.
Note that there's a separate performance possibility here. Even if we *can* realign stacks, we probably don't *want to* if all of the stores are in slowpaths. But that's a later patch, if at all. :)
llvm-svn: 366765
ARM has code to recognise uses of the "returned" function parameter
attribute which guarantee that the value passed to the function in r0
will be returned in r0 unmodified. IPRA replaces the regmask on call
instructions, so needs to be told about this to avoid reverting the
optimisation.
Differential revision: https://reviews.llvm.org/D64986
llvm-svn: 366669
Summary:
Four things here:
1. Generalize the fold to handle non-splat divisors. Reasonably trivial.
2. Unban power-of-two divisors. I don't see any reason why they should
be illegal.
* There is no ban in Hacker's Delight
* I think the ban came from the same bug that caused the miscompile
in the base patch - in `floor((2^W - 1) / D)` we were dividing by
`D0` instead of `D`, and we **were** ensuring that `D0` is not `1`,
which made sense.
3. Unban `1` divisors. I no longer believe Hacker's Delight actually says
that the fold is invalid for `D = 1`. Further considerations:
* We know that
* `(X u% 1) == 0` can be constant-folded to `1`,
* `(X u% 1) != 0` can be constant-folded to `0`,
* Also, we know that
* `X u<= -1` can be constant-folded to `1`,
* `X u> -1` can be constant-folded to `0`,
* https://godbolt.org/z/7jnZJX
* https://rise4fun.com/Alive/oF6p
* We know we will end up with the following:
`(setule/setugt (rotr (mul N, P), K), Q)` (see the numeric sketch after this list)
* Therefore, for given new DAG nodes and comparison predicates
(`ule`/`ugt`), we will still produce the correct answer if:
`Q` is an all-ones constant; and both `P` and `K` are *anything*
other than `undef`.
* The fold will indeed produce `Q = all-ones`.
4. Try to re-splat the `P` and `K` vectors - we don't care about
their values for the lanes where the divisor was `1`.
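Tying points 2 and 3 to the `(setule (rotr (mul N, P), K), Q)` form above, a small numeric sketch for the even divisor 6 = 3 * 2^1; the constants are worked out by hand here, not copied from the code:

#include <cassert>
#include <cstdint>

static uint32_t rotr32(uint32_t v, unsigned k) {
  return k ? (v >> k) | (v << (32 - k)) : v;
}

int main() {
  const uint32_t P = 0xAAAAAAABu; // multiplicative inverse of D0 = 3 (mod 2^32)
  const unsigned K = 1;           // divisor 6 = 3 * 2^K
  const uint32_t Q = 0x2AAAAAAAu; // floor((2^32 - 1) / 6)
  for (uint32_t x = 0; x < 10000; ++x)
    assert(((x % 6u) == 0) == (rotr32(x * P, K) <= Q));
  return 0;
}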
Reviewers: RKSimon, hermord, craig.topper, spatel, xbolva00
Reviewed By: RKSimon
Subscribers: hiraditya, javed.absar, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63963
llvm-svn: 366637
This was handled previously for arguments split due to not fitting in
an MVT. This was dropping the register for argument registers split
due to TLI::getRegisterTypeForCallingConv.
llvm-svn: 366574
Implement IR intrinsics for stack tagging. Generated code is very
unoptimized for now.
Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are
used to implement a tagged stack frame pointer in a virtual register.
Differential Revision: https://reviews.llvm.org/D64172
llvm-svn: 366360
Summary:
As per title. DAGCombiner only matches the special case where b = 0; this patch extends the pattern to match any value of b.
Depends on D57302
Reviewers: hfinkel, RKSimon, craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59208
llvm-svn: 366214
We already split extract_subvector(binop(insert_subvector(v,x),insert_subvector(w,y))) -> binop(x,y).
This patch adds support for extract_subvector(binop(concat_vectors(),concat_vectors())) cases as well.
In particular this means we don't have to wait for X86 lowering to convert concat_vectors to insert_subvector chains, which helps avoid some cases where demandedelts/combine calls occur too late to split large vector ops.
The fast-isel-store.ll load folding regression is annoying, but I don't think it's that critical.
Differential Revision: https://reviews.llvm.org/D63653
llvm-svn: 365785
If we have:
R = sub X, Y
P = cmp Y, X
...then flipping the operands in the compare instruction can allow using a subtract that sets compare flags.
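A scalar sketch of why the swap is sound (variable values are arbitrary): swapping both the operands and the predicate of the compare leaves its result unchanged, but makes its operand order match the subtract so the flags can be reused:

#include <cassert>

int main() {
  for (int x : {-5, 0, 3, 100})
    for (int y : {-7, 0, 3, 42}) {
      int r = x - y;        // R = sub X, Y
      bool p = (y < x);     // P = cmp Y, X  (original operand order)
      bool q = (x > y);     // operands and predicate both swapped
      assert(p == q);       // same answer; 'q' now matches the sub's operands
      (void)r;
    }
  return 0;
}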
Motivated by diffs in D58875 - not sure if this changes anything there,
but this seems like a good thing independent of that.
There's a more involved version of this transform already in IR (in instcombine
although that seems misplaced to me) - see "swapMayExposeCSEOpportunities()".
Differential Revision: https://reviews.llvm.org/D63958
llvm-svn: 365711
Summary: Unsafe does not map well alone for each of these three cases, as it is missing the NoNan context when accessed directly with clang. I have migrated the fold guards to reflect the expectations of handling NaN and zero contexts directly (NoNan, NSZ), and updated some tests with it. Unsafe does include NSZ; however, there is already precedent for using the target option directly to reflect that context.
Reviewers: spatel, wristow, hfinkel, craig.topper, arsenm
Reviewed By: arsenm
Subscribers: michele.scandale, wdng, javed.absar
Differential Revision: https://reviews.llvm.org/D64450
llvm-svn: 365679
Basically the problem is that X86 doesn't set the Fast flag from
allowsMemoryAccess on certain CPUs due to slow unaligned memory
subtarget features. This prevents bitcasts from being folded into
loads and stores. But all vector loads and stores of the same width
are the same cost on X86.
This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it.
Differential Revision: https://reviews.llvm.org/D64295
llvm-svn: 365549
This makes the functions in Loads.h require a type to be specified
independently of the pointer Value so that when pointers have no structure
other than address-space, it can still do its job.
Most callers had an obvious memory operation handy to provide this type, but
SROA and ArgumentPromotion were doing more complicated analysis. They get
updated to merge the properties of the various instructions they were
considering.
llvm-svn: 365468
DAGTypeLegalizer and SelectionDAGLegalize have helper
functions wrapping the call to TLI.getSetCCResultType(...).
Use those helpers in more places.
llvm-svn: 365456
Summary:
Make sure we use SETGE instead of SETGT when checking
if the sign bit is zero at SMULFIXSAT expansion.
The faulty expansion occurred when doing "expand" of
SMULFIXSAT and the scale exactly matched the
size of the smaller type. For example, doing
i64 Z = SMULFIXSAT X, Y, 32
and expanding X/Y/Z into pairs of i32 values.
The problem was that we sometimes did not saturate
to min when overflowing.
Here is an example using Q3.4 numbers:
Consider that we are multiplying X and Y.
X = 0x80 (-8.0 as Q3.4)
Y = 0x20 (2.0 as Q3.4)
To avoid loss of precision we do a widening
multiplication, getting a 16-bit result
Z = 0xF000 (-16.0 as Q7.8)
To detect negative overflow we should check if
the five most significant bits in Z are less than -1.
Assume that we name the 4 most significant bits
as HH and the next 4 bits as HL. Then we can do the
check by examining if
(HH < -1) or (HH == -1 && "sign bit in HL is zero").
The fault was that we have been doing the check as
(HH < -1) or (HH == -1 && HL > 0)
instead of
(HH < -1) or (HH == -1 && HL >= 0).
In our example HH is -1 and HL is 0, so the old
code did not trigger saturation and simply truncated
the result to 0x00 (0.0). With the bugfix we instead
detect that we should saturate to min, and the result
will be set to 0x80 (-8.0).
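A small stand-alone reference model of the Q3.4 example in C++ (it checks saturation on the widened product directly, rather than via the HH/HL split used in the actual expansion):

#include <cstdint>
#include <cstdio>

static int8_t smulfixsat_q3_4(int8_t x, int8_t y) {
  int16_t prod = static_cast<int16_t>(x) * static_cast<int16_t>(y); // Q7.8 intermediate
  int16_t shifted = prod >> 4;  // back to Q3.4 scale (assumes arithmetic shift)
  if (shifted > INT8_MAX) return INT8_MAX;  // saturate to max ( 7.9375)
  if (shifted < INT8_MIN) return INT8_MIN;  // saturate to min (-8.0)
  return static_cast<int8_t>(shifted);
}

int main() {
  // -8.0 * 2.0 = -16.0 overflows Q3.4: must saturate to 0x80 (-8.0), not wrap to 0x00.
  std::printf("0x%02x\n", static_cast<uint8_t>(smulfixsat_q3_4(INT8_MIN, 0x20)));
  return 0;
}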
Reviewers: leonardchan, bevinh
Reviewed By: leonardchan
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64331
llvm-svn: 365455
Summary:
This makes it so that IR files using triples without an environment work
out of the box, without normalizing them.
Typically, the MSVC behavior is more desirable. For example, it tends to
enable things like constant merging, use of associative comdats, etc.
Addresses PR42491
Reviewers: compnerd
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64109
llvm-svn: 365387
Don't do this locally, computeKnownBits does this better (and can handle non-constant cases as well).
A next step would be to actually simplify non-constant elements - building on what we already do in SimplifyDemandedVectorElts.
llvm-svn: 365309
Summary:
The uaddo won't be removed and the addcarry will still be
dependent on the uaddo. So we'll just increase the use count
of X and Y and potentially require a COPY.
Reviewers: spatel, RKSimon, deadalnix
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64190
llvm-svn: 365149
We previously marked all the tests with branch funnels as
`-verify-machineinstrs=0`.
This is an attempt to fix it.
1) `ICALL_BRANCH_FUNNEL` has no defs. Mark it as `let OutOperandList = (outs)`.
2) After that we hit an assert:
```
Assertion failed: (Op.getValueType() != MVT::Other && Op.getValueType() != MVT::Glue &&
"Chain and glue operands should occur at end of operand list!"), function AddOperand,
file /Users/francisvm/llvm/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp, line 461.
```
The chain operand was added at the beginning of the operand list. Move
that to the end.
3) After that we hit another verifier issue in the pseudo expansion
where the registers used in the cmps and jmps are not added to the
live-in lists. Add `EFLAGS` to all the new MBBs that we create.
PR39436
Differential Revision: https://reviews.llvm.org/D54155
llvm-svn: 365058
Summary:
This diff improves the capability of DAGCombine to generate linear carry propagation in the presence of a diamond pattern. It is now able to match a large variety of different patterns rather than a few hardcoded ones.
Arguably, the codegen in test cases is not better, but this is to be expected. The goal of this transformation is more about canonicalisation than actual optimisation.
Reviewers: hfinkel, RKSimon, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D57302
llvm-svn: 365051
When a target intrinsic has been determined to touch memory, we construct a MachineMemOperand during SDAG construction. In this case, we should propagate AAMDNodes metadata to the MachineMemOperand where available.
Differential revision: https://reviews.llvm.org/D64131
llvm-svn: 365043
Summary:
This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]].
In middle-end, we'd want to prefer the form with two adds - D63992,
but as this diff shows, not every target will prefer that pattern.
Out of the 4 targets for which I added tests, all seem to be OK with inc-of-add for scalars,
but only X86 prefers that same pattern for vectors.
Here I'm adding a new TLI hook, always defaulting to the inc-of-add,
but adding AArch64, ARM and PowerPC overrides to prefer inc-of-add only for scalars.
Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel
Reviewed By: efriedma
Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64090
llvm-svn: 365010
For a given floating-point load / store pair, if the load value isn't used by any other operations,
then consider transforming the pair to integer load / store operations if the target deems the transformation profitable.
And we can exploit much more when there are other operation nodes with a chain operand between the load/store pair,
so long as we keep the original chain ordering. We only replace the register used to load/store from float to integer.
I only added a test case for ARM because the TLI.isDesirableToTransformToIntegerOp hook is only enabled for the ARM target.
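Roughly, the source-level shape this targets (names here are illustrative only): a float value that is only loaded and then stored can be moved through an integer register of the same width instead:

#include <cstdint>
#include <cstring>

// The float is never used arithmetically, so the pair can become an i32 load
// followed by an i32 store; memcpy of the raw bits models that here.
void copyFloat(float *dst, const float *src) {
  uint32_t bits;
  std::memcpy(&bits, src, sizeof bits);  // integer load of the same 4 bytes
  std::memcpy(dst, &bits, sizeof bits);  // integer store
}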
Differential Revision: https://reviews.llvm.org/D60601
llvm-svn: 364883
The SDAGBuilder behavior stems from the days when we didn't have fast
math flags available in SDAG. We do now and doing the transformation in
the legalizer has the advantage that it also works for vector types.
llvm-svn: 364743
Summary:
I'm submitting a new revision since I don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in the "Add Action" menu...
This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.
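For a concrete flavour of the transform, with constants worked out by hand for the divisor 3 (not copied from the code): 0xAAAAAAAB is the multiplicative inverse of 3 mod 2^32 and 0x55555555 is floor((2^32 - 1) / 3), so:

#include <cassert>
#include <cstdint>

int main() {
  for (uint32_t x = 0; x < 10000; ++x)
    // remainder-free divisibility check from Hacker's Delight 10-17
    assert(((x % 3u) == 0) == (x * 0xAAAAAAABu <= 0x55555555u));
  return 0;
}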
This is a recommit, the original commit rL364563 was reverted in rL364568
because test-suite detected miscompile - the new comparison constant 'Q'
was being computed incorrectly (we divided by `D0` instead of `D`).
Original patch D50222 by @hermord (Dmytro Shynkevych)
Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
the `X % C == 0` optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
I haven't managed to find a better way to avoid generating worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` test regresses, and is being fixed by a follow-up patch, D63390.
Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00
Reviewed By: RKSimon, xbolva00
Subscribers: dexonsmith, kristina, xbolva00, javed.absar, llvm-commits, hermord
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63391
llvm-svn: 364600
Summary:
I'm submitting a new revision since I don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in the "Add Action" menu...
Original patch D50222 by @hermord (Dmytro Shynkevych)
This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.
Original patch author: @hermord (Dmytro Shynkevych)!
Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
the `X % C == 0` optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
I haven't managed to find a better way to avoid generating worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` test regresses, and is being fixed by a follow-up patch, D63390.
Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00
Reviewed By: RKSimon, xbolva00
Subscribers: xbolva00, javed.absar, llvm-commits, hermord
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63391
llvm-svn: 364563
While lowering calls, collect info about registers that forward arguments
into following function frame. We store such info into the MachineFunction
of the call. This is used very late when dumping DWARF info about
call site parameters.
([9/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D60715
llvm-svn: 364516
Summary:
(Not so) boringly identical to pattern a (D62786)
Not yet sure how to deal with the last pattern c.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62793
llvm-svn: 364418
We support 'big to little' (e.g. extract_subvector(v16i8 bitcast(v2i64))) but not 'little to big' cases (e.g. extract_subvector(v2i64 bitcast(v16i8)))
llvm-svn: 364405
Change the generic ctpop expansion to more efficiently handle a
check for not-a-power-of-two value:
(ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0)
This is the inverted predicate sibling pattern that was added with:
D63004
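A quick C++ sanity check of that expanded form (the popcount helper is written out just to keep the snippet self-contained):

#include <cassert>
#include <cstdint>

static unsigned popcount32(uint32_t x) {
  unsigned n = 0;
  for (; x; x &= x - 1)  // clear the lowest set bit each iteration
    ++n;
  return n;
}

int main() {
  for (uint32_t x : {0u, 1u, 2u, 3u, 6u, 0x80000000u, 0xFFFFFFFFu})
    // (ctpop x) != 1  <=>  (x == 0) || ((x & (x - 1)) != 0)
    assert((popcount32(x) != 1) == ((x == 0) || ((x & (x - 1)) != 0)));
  return 0;
}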
This should have been done before I changed IR canonicalization to
favor this form with:
rL364246
...so if this requires revert/changing, the earlier commit may also
need to be modified.
llvm-svn: 364319
Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required.
Matches what we already do for ZERO_EXTEND.
Reapplies rL363850 but now with legality checks added at rL364290
llvm-svn: 364303
This should not cause any visible change in output, but it's
more efficient because we were producing non-canonical 'sub x, 1'
and 'setcc ugt x, 0'. As mentioned in the TODO, we should also
be handling the inverse predicate.
llvm-svn: 364302
Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero.
Matches what we already do for SIGN_EXTEND.
Reapplies rL363802 but now with legality checks added at rL364290
llvm-svn: 364299
The *_EXTEND_VECTOR_INREG opcodes were relaxed back around rL346784 to support source vector widths that are smaller than the output - it looks like the legalizers were never updated to account for this.
This patch inserts the smaller source vector into an undef vector of the same width of the result before performing the shuffle+bitcast to correctly handle this.
Part of the yak shaving to solve the crashes from rL364264 and rL364272
llvm-svn: 364295
As part of the fix for rL364264 + rL364272 - limit the *_EXTEND conversion to !TLO.LegalOperations || isOperationLegal cases.
We'll improve X86 legality in future commits.
llvm-svn: 364290
Summary:
This addresses the regression that is being exposed by D50222 in `test/CodeGen/X86/jump_sign.ll`
The missing fold, at least partially, looks trivial:
https://rise4fun.com/Alive/Zsln
i.e. if we are comparing with zero, and comparing the `urem`-by-non-power-of-two,
and the `urem` is of something that may at most have a single bit set (or no bits set at all),
the `urem` is not needed.
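In scalar terms (the mask and divisor are arbitrary choices): once the left-hand side can have at most one bit set, a power of two can never be a nonzero multiple of a non-power-of-two divisor, so the urem contributes nothing:

#include <cassert>
#include <cstdint>

int main() {
  for (uint32_t x = 0; x < 10000; ++x) {
    uint32_t t = x & 0x10u;               // at most a single bit set
    assert(((t % 6u) == 0) == (t == 0));  // the urem by 6 is redundant
  }
  return 0;
}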
Reviewers: RKSimon, craig.topper, xbolva00, spatel
Reviewed By: xbolva00, spatel
Subscribers: xbolva00, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63390
llvm-svn: 364286
This reverts the following patches.
"[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG"
"[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG"
"[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support"
We can end up with an any_extend_vector_inreg with a 256-bit result type
and a 128-bit source type. This is allowed by the ISD opcode, but the
generic operation legalizer is only able to expand cases where the
total vector width is the same.
The X86 backend creates these mismatched cases for zext_vec_inreg/sext_vec_inreg.
The SimplifyDemandedBits changes are allowing those nodes to become
aext_vec_inreg. For the zext/sext cases, the X86 backend has Custom
handling and never lets them get to the generic legalizer. We need to do the same
for aext_vec_inreg.
llvm-svn: 364264
Widen vector result type for ctlz_zero_undef and cttz_zero_undef the same as
ctlz and cttz.
Differential Revision: https://reviews.llvm.org/D63463
llvm-svn: 364221
Avoids using a plain unsigned for registers throughout codegen.
Doesn't attempt to change every register use, just something a little
more than the set needed to build after changing the return type of
MachineOperand::getReg().
llvm-svn: 364191
This can occur under certain circumstances when undefs are created later on in the constant multipliers (e.g. in this case due to SimplifyDemandedVectorElts). It's better to let the shift by zero occur and perform any cleanup afterward.
Fixes OSS Fuzz #15429
llvm-svn: 364179
The code divides the alignment by 2 if the original alignment is
equal to the original VT size. But this wouldn't be correct
if the alignment was larger than the VT size.
The memory operand object already takes care of calling MinAlign
on the base alignment and the memory pointer offset. So we don't
need any special code at all.
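For reference, the combined alignment boils down to the largest power of two dividing both the base alignment and the pointer offset; a sketch of those semantics (mirroring what llvm::MinAlign computes, not the in-tree code itself):

#include <cassert>
#include <cstdint>

// Greatest power of two dividing both A and B, i.e. the lowest set bit of A|B.
static uint64_t minAlign(uint64_t A, uint64_t B) {
  uint64_t x = A | B;
  return x & (~x + 1);
}

int main() {
  assert(minAlign(16, 8) == 8);   // 16-byte aligned base, offset 8 -> 8-byte aligned
  assert(minAlign(16, 4) == 4);
  assert(minAlign(32, 0) == 32);  // offset 0 keeps the base alignment
  return 0;
}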
llvm-svn: 364151
We tend to only test for scalar/scalar consts when really we could support non-uniform vectors using ISD::matchUnaryPredicate/matchBinaryPredicate etc.
llvm-svn: 363924
Use getAPIntValue() in a few more places. Most of the time getZExtValue() is fine, but occasionally there's fuzzed code or someone decides to create i65536 or something...
llvm-svn: 363887
Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases.
This requires us to tweak matchBinaryPredicate to allow it to (optionally) handle constants with different type widths.
llvm-svn: 363792
This allows targets to make more decisions about reserved registers
after isel. For example, it should now be certain whether there are calls or
stack objects in the frame, which could have been introduced by
legalization.
Patch by Matthias Braun
llvm-svn: 363757
Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches.
llvm-svn: 363713
Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches.
llvm-svn: 363710
This adds vector splitting for vaarg instructions during type legalization
Committed on behalf of @luke (Luke Lau)
Differential Revision: https://reviews.llvm.org/D60762
llvm-svn: 363671
Some GEPs were not being split, presumably because that split would just be
undone by the DAGCombiner. Not performing those splits can prevent important
optimizations, such as (partially) folding the element indices / member offsets
into load/store instruction immediates. This patch:
- Makes the splits also occur in the cases where the base address and the GEP
are in the same BB.
- Ensures that the DAGCombiner doesn't reassociate them back again.
Differential Revision: https://reviews.llvm.org/D60294
llvm-svn: 363544
This is already done in DAGCombiner::visitINSERT_SUBVECTOR, but this helps a number of shuffles across different vector widths recognise when they come from the same source.
llvm-svn: 363542
This reverts rL363474. -debug-only=isel was added to some tests that
don't specify `REQUIRES: asserts`. This causes failures on
-DLLVM_ENABLE_ASSERTIONS=off builds.
I chose to revert instead of fixing the tests because I'm not sure
whether we should add `REQUIRES: asserts` to more tests.
llvm-svn: 363482
As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.
This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.
If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.
Differential Revision: https://reviews.llvm.org/D63075
llvm-svn: 363179
As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075.
llvm-svn: 363048
This patch changes how LLVM handles the accumulator/start value
in the reduction, by never ignoring it regardless of the presence of
fast-math flags on callsites. This change introduces the following
new intrinsics to replace the existing ones:
llvm.experimental.vector.reduce.fadd -> llvm.experimental.vector.reduce.v2.fadd
llvm.experimental.vector.reduce.fmul -> llvm.experimental.vector.reduce.v2.fmul
and adds functionality to auto-upgrade existing LLVM IR and bitcode.
Reviewers: RKSimon, greened, dmgreen, nikic, simoll, aemerson
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D60261
llvm-svn: 363035
This behavior was added in r130928 for both FastISel and SD, and then
disabled in r131156 for FastISel.
This re-enables it for FastISel with the corresponding fix.
This is triggered only when FastISel can't lower the arguments and falls
back to SelectionDAG for it.
FastISel contains a map of "register fixups" where at the end of the
selection phase it replaces all uses of a register with another
register that FastISel sometimes pre-assigned. Code at the end of
SelectionDAGISel::runOnMachineFunction is doing the replacement at the
very end of the function, while other pieces that come in before that
look through the MachineFunction and assume everything is done. In this
case, the real issue is that the code emitting COPY instructions for the
liveins (physreg to vreg) (EmitLiveInCopies) is checking if the vreg
assigned to the physreg is used, and if it's not, it will skip the COPY.
If a register wasn't replaced with its assigned fixup yet, the copy will
be skipped and we'll end up with uses of undefined registers.
This fix moves the replacement of registers before the emission of
copies for the live-ins.
The initial motivation for this fix is to enable tail calls for
swiftself functions, which were blocked because we couldn't prove that
the swiftself argument (which is callee-save) comes from a function
argument (live-in), because there was an extra copy (vreg to vreg).
A few tests are affected by this:
* llvm/test/CodeGen/AArch64/swifterror.ll: we used to spill x21
(callee-save) but never reload it because it's attached to the return.
We now don't even spill it anymore.
* llvm/test/CodeGen/*/swiftself.ll: we tail-call now.
* llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll: I believe this
test was not really testing the right thing, but it worked because the
same registers were re-used.
* llvm/test/CodeGen/ARM/cmpxchg-O0.ll: regalloc changes
* llvm/test/CodeGen/ARM/swifterror.ll: get rid of a copy
* llvm/test/CodeGen/Mips/*: get rid of spills and copies
* llvm/test/CodeGen/SystemZ/swift-return.ll: smaller stack
* llvm/test/CodeGen/X86/atomic-unordered.ll: smaller stack
* llvm/test/CodeGen/X86/swifterror.ll: same as AArch64
* llvm/test/DebugInfo/X86/dbg-declare-arg.ll: stack size changed
Differential Revision: https://reviews.llvm.org/D62361
llvm-svn: 362963
This opportunity was found in SPEC 2017 557.xz_r, where it is used by the SHA encrypt/decrypt code; see sha-2/sha512.c:
static void store64(u64 x, unsigned char* y)
{
    for(int i = 0; i != 8; ++i)
        y[i] = (x >> ((7-i) * 8)) & 255;
}
static u64 load64(const unsigned char* y)
{
    u64 res = 0;
    for(int i = 0; i != 8; ++i)
        res |= (u64)(y[i]) << ((7-i) * 8);
    return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.
Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store, or a BSWAP and a store, if the target
supports it.
Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;
=>
*((i32)p) = val;
i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;
=>
*((i32)p) = BSWAP(val);
Differential Revision: https://reviews.llvm.org/D62897
llvm-svn: 362921
In order for GlobalISel to re-use the significant amount of analysis and
optimization code in SDAG's switch lowering, we first have to extract it and
create an interface to be used by both frameworks.
No test changes as it's NFC.
Differential Revision: https://reviews.llvm.org/D62745
llvm-svn: 362857
Summary:
(1) Function descriptor on AIX
On AIX, a called routine may have 2 distinct symbols associated with it:
* A function descriptor (Name)
* A function entry point (.Name)
The descriptor structure on AIX is the same as those in the ELF V1 ABI:
* The address of the entry point of the function.
* The TOC base address for the function.
* The environment pointer.
The descriptor symbol uses the same name as the source level function in C.
The function entry point is analogous to the symbol we would generate for a
function in a non-descriptor-based ABI, except that it is renamed by
prepending a ".".
Which symbol gets referenced depends on the context:
* Taking the address of the function references the descriptor symbol.
* Calling the function references the entry point symbol.
(2) Speaking of the implementation on AIX, for a direct function call target, we
create a proper MCSymbol SDNode (e.g. ".foo") while constructing the SDAG to
replace the original TargetGlobalAddress SDNode. Then, down the path, we can
take advantage of this MCSymbol.
Patch by: Xiangling_L
Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara
Differential Revision: https://reviews.llvm.org/D62532
llvm-svn: 362735
This patch is the first step towards ensuring MergeConsecutiveStores correctly handles non-temporal loads/stores:
1 - When merging loads/stores we must ensure that they all have the same non-temporal flag. This is unlikely to occur, but can in strange cases where we're storing at the end of one page and the beginning of another.
2 - The merged load/store node must retain the non-temporal flag.
Differential Revision: https://reviews.llvm.org/D62910
llvm-svn: 362723
The ISD::STRICT_ nodes used to implement the constrained floating-point
intrinsics are currently never passed to the target back-end, which makes
it impossible to handle them correctly (e.g. mark instructions as depending
on a floating-point status and control register, or mark instructions as
possibly trapping).
This patch allows the target to use setOperationAction to switch the action
on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code
will stop converting the STRICT nodes to regular floating-point nodes, but
instead pass the STRICT nodes to the target using normal SelectionDAG
matching rules.
To avoid having the back-end duplicate all the floating-point instruction
patterns to handle both strict and non-strict variants, we make the MI
codegen explicitly aware of the floating-point exceptions by introducing
two new concepts:
- A new MCID flag "mayRaiseFPException" that the target should set on any
instruction that possibly can raise FP exception according to the
architecture definition.
- A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI
instruction resulting from expansion of any constrained FP intrinsic.
Any MI instruction that is *both* marked as mayRaiseFPException *and*
FPExcept then needs to be considered as raising exceptions by MI-level
codegen (e.g. scheduling).
Setting those two new flags is straightforward. The mayRaiseFPException
flag is simply set via TableGen by marking all relevant instruction
patterns in the .td files.
The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes
in the SelectionDAG, and gets inherited in the MachineSDNode nodes created
from it during instruction selection. The flag is then transferred to an
MIFlag when creating the MI from the MachineSDNode. This is handled just
like fast-math flags like no-nans are handled today.
This patch includes both common code changes required to implement the
new features, and the SystemZ implementation.
Reviewed By: andrew.w.kaylor
Differential Revision: https://reviews.llvm.org/D55506
llvm-svn: 362663
Most parts of LLVM don't care whether the byval type is derived from an
explicit Attribute or from the parameter's pointee type, so it makes
sense for the main access function to just return the right value.
The very few users who do care (only BitcodeReader so far) can find out
how it's specified by accessing the Attribute directly.
llvm-svn: 362642
Summary:
An argument that is returned by a function but bitcast beforehand can still
be annotated as "returned". Make sure we do not crash for this case.
Reviewers: sunfish, stephenwlin, niravd, arsenm
Subscribers: wdng, hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59917
llvm-svn: 362546
This is a special case of a more general transform (not (sub Y, X)) -> (add X, ~Y). InstCombine knows the general form. I've restricted this to the special case to fix the motivating case, PR42118. I tried handling any case where Y was constant, but got some changes on some Mips tests that I couldn't quickly prove were beneficial.
Fixes PR42118
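The underlying two's-complement identity, checked in plain C++ for a handful of values (this is just the algebra, not the DAG code):

#include <cassert>
#include <cstdint>

int main() {
  for (uint32_t x : {0u, 1u, 7u, 0x80000000u, 0xFFFFFFFFu})
    for (uint32_t y : {0u, 3u, 0x7FFFFFFFu, 0xFFFFFFFEu})
      // (not (sub Y, X)) == (add X, (not Y))
      assert(~(y - x) == x + ~y);
  return 0;
}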
Differential Revision: https://reviews.llvm.org/D62828
llvm-svn: 362533
The proposal in D62498 showed that x86 would benefit from vector
store splitting, but that may conflict with the generic DAG
combiner's store merging transforms.
Add memory type to the existing TLI hook that enables the merging
transforms, so we can limit those changes to scalars only for x86.
llvm-svn: 362507
Summary:
This *might* be the last fold for `sink-addsub-of-const.ll`, but I'm not sure yet.
As far as I can tell, there are no regressions here (ignoring x86-32);
all changes are either good or neutral.
This, almost surprisingly to me, fixes the motivational tests (in `shift-amount-mod.ll`)
`@reg32_lshr_by_sub_from_negated` from [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/vMd3
Reviewers: RKSimon, t.p.northover, craig.topper, spatel, efriedma
Reviewed By: RKSimon
Subscribers: sdardis, javed.absar, arichardson, kristof.beyls, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62774
llvm-svn: 362488
As I mentioned on D61887, we don't get as many hits on ComputeNumSignBits as we did on computeKnownBits.
The case we do get is interesting though - it allows us to use the 'ConditionalNegate' combine in combineLogicBlendIntoPBLENDV to remove a select.
It comes too late for SSE41 (BLENDV) cases, but SSE2 tests can hit it now. We should probably try to make use of this for SSE41+ targets as well - avoiding variable blends is usually a good idea. I'll investigate as a followup.
Differential Revision: https://reviews.llvm.org/D62777
llvm-svn: 362486
This opportunity was found in SPEC 2017 557.xz_r, where it is used by the SHA encrypt/decrypt code; see sha-2/sha512.c:
static void store64(u64 x, unsigned char* y)
{
    for(int i = 0; i != 8; ++i)
        y[i] = (x >> ((7-i) * 8)) & 255;
}
static u64 load64(const unsigned char* y)
{
    u64 res = 0;
    for(int i = 0; i != 8; ++i)
        res |= (u64)(y[i]) << ((7-i) * 8);
    return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.
Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store, or a BSWAP and a store, if the target
supports it.
Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;
=>
*((i32)p) = val;
i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;
=>
*((i32)p) = BSWAP(val);
Differential Revision: https://reviews.llvm.org/D61843
llvm-svn: 362472
Summary: This change facilitates propagating the fast-math flags that were placed on a setcc (from an fcmp) through folds with selects, so that back ends can model this path for arithmetic folds on selects in SDAG.
Reviewers: qcolombet, spatel
Reviewed By: qcolombet
Subscribers: nemanjai, jsji
Differential Revision: https://reviews.llvm.org/D62552
llvm-svn: 362439
We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction
Differential Revision: https://reviews.llvm.org/D62807
llvm-svn: 362397
If we hit the limit, we do expand the outstanding tokenfactors.
Otherwise, we might drop nodes with users in the unexpanded
tokenfactors. This fixes the crashes reported by Jordan Rupprecht.
Reviewers: niravd, spatel, craig.topper, rupprecht
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D62633
llvm-svn: 362350
Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize.
Differential Revision: https://reviews.llvm.org/D59188
llvm-svn: 362324
Add (opt-in) support for implicit truncation to isConstOrConstSplat, which allows us to match truncated 'all ones' cases in isBitwiseNot.
PR41020 compares against using ISD::isBuildVectorAllOnes() instead, but that predicate silently accepts any UNDEF elements in the build vector which might not be what we want in isBitwiseNot - so I've added an opt-in 'AllowUndefs' flag that is set to false by default but will allow us to enable it on individual cases where its safe.
Differential Revision: https://reviews.llvm.org/D62783
llvm-svn: 362323
The results of the dyn_casts were immediately dereferenced on the next line
so they had better not be null.
I don't think there's any way for these dyn_casts to fail, so use a cast
instead of adding a null check.
llvm-svn: 362315
Just copy all of the operands except the chain and call MorphNode on that.
This removes the IsUnary and IsTernary flags.
Also always get the result type from the result type of the original
nodes. Previously we got it from the operand except for two nodes
where that didn't work.
llvm-svn: 362269
[FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes
This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widening, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.
Submitted by: Drew Wock <drew.wock@sas.com>
Reviewed by: Cameron McInally, Kevin P. Neal
Approved by: Cameron McInally
Differential Revision: https://reviews.llvm.org/D62546
llvm-svn: 362241
I don't have a test case for these, but there is a test case for D62266
where, even after all the constant-folding patches, we still end up
with endless combine loop. Which makes sense, since we don't constant
fold for opaque constants.
llvm-svn: 362156
Summary:
Only vector tests are being affected here,
since subtraction by scalar constant is rewritten
as addition by negated constant.
No surprising test changes.
https://rise4fun.com/Alive/pbT
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62257
llvm-svn: 362146
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..
https://rise4fun.com/Alive/AAq
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, JDevlieghere, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62294
llvm-svn: 362145
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.
It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..
https://rise4fun.com/Alive/ZRl
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.
Reviewers: RKSimon, craig.topper, spatel, arsenm
Reviewed By: RKSimon, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62263
llvm-svn: 362144
Summary:
Direct sibling of D62223 patch.
While I don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?
The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`
https://rise4fun.com/Alive/ffh
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.
Reviewers: RKSimon, craig.topper, spatel, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62252
llvm-svn: 362143
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.
AArch64 test changes all look good (`neg` created), or neutral.
X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).
I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but I'm not sure how that affects that loop.
I'm unable to interpret AMDGPU change, looks neutral-ish?
This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.
Reviewers: craig.topper, RKSimon, spatel, arsenm
Reviewed By: RKSimon
Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62223
llvm-svn: 362142
Summary:
Direct sibling of D62662, the root cause of the endless combine loop in D62257
https://rise4fun.com/Alive/d3W
Reviewers: RKSimon, craig.topper, spatel, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62664
llvm-svn: 362133
Summary:
No tests change, and I'm not sure how to test this, but it's better safe than sorry.
Reviewers: spatel, RKSimon, craig.topper, t.p.northover
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62663
llvm-svn: 362132
Summary:
This was the root cause of the endless combine loop in D62257
https://rise4fun.com/Alive/d3W
Reviewers: RKSimon, spatel, craig.topper, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62662
llvm-svn: 362131
Summary: No tests change, and I'm not sure how to test this, but it's better safe than sorry.
Reviewers: spatel, RKSimon, craig.topper, t.p.northover
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62661
llvm-svn: 362130
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.
If present, the type must match the pointee type of the argument.
The original commit did not remap byval types when linking modules, which broke
LTO. This version fixes that.
Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.
llvm-svn: 362128
This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widening, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.
Submitted by: Drew Wock <drew.wock@sas.com>
Reviewed by: Cameron McInally, Kevin P. Neal
Approved by: Cameron McInally
Differential Revision: http://reviews.llvm.org/D62546
llvm-svn: 362112
I was looking into an endless combine loop the uncommitted follow-up patch
was causing, and it appears even these patches can exhibit such an
endless loop. The root cause is that we try to hoist one binop (add/sub) with
constant operand, and if we get two such binops both of which are
eligible for this hoisting, we get stuck.
Some cases may highlight missing constant-folds.
Reverts r361871,r361872,r361873,r361874.
llvm-svn: 362109
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.
If present, the type must match the pointee type of the argument.
Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.
llvm-svn: 362012
This patch adds the ISD::LRINT and ISD::LLRINT nodes along with new
intrinsics. The changes are straightforward, as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an integer.
The idea is to optimize lrint/llrint generation for AArch64
in a subsequent patch. The current semantics just route it to the libm
symbol.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D62017
llvm-svn: 361875
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..
https://rise4fun.com/Alive/AAq
This is a recommit, originally committed in rL361856, but reverted
to investigate test-suite compile-time hangs.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, JDevlieghere, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62294
llvm-svn: 361874
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.
It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..
https://rise4fun.com/Alive/ZRl
This is a recommit, originally committed in rL361855, but reverted
to investigate test-suite compile-time hangs.
Reviewers: RKSimon, craig.topper, spatel, arsenm
Reviewed By: RKSimon, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62263
llvm-svn: 361873
Summary:
Direct sibling of D62223 patch.
While I don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?
The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`
https://rise4fun.com/Alive/ffh
This is a recommit, originally committed in rL361853, but reverted
to investigate test-suite compile-time hangs.
Reviewers: RKSimon, craig.topper, spatel, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62252
llvm-svn: 361872
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.
AArch64 test changes all look good (`neg` created), or neutral.
X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).
I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but I'm not sure how that affects that loop.
I'm unable to interpret AMDGPU change, looks neutral-ish?
This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.
Reviewers: craig.topper, RKSimon, spatel, arsenm
Reviewed By: RKSimon
Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62223
llvm-svn: 361871
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..
https://rise4fun.com/Alive/AAq
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, JDevlieghere, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62294
llvm-svn: 361856
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.
It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..
https://rise4fun.com/Alive/ZRl
Reviewers: RKSimon, craig.topper, spatel, arsenm
Reviewed By: RKSimon, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62263
llvm-svn: 361855
Summary:
Only vector tests are being affected here,
since subtraction by scalar constant is rewritten
as addition by negated constant.
No surprising test changes.
https://rise4fun.com/Alive/pbT
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62257
llvm-svn: 361854
Summary:
Direct sibling of D62223 patch.
While I don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?
The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`
https://rise4fun.com/Alive/ffh
Reviewers: RKSimon, craig.topper, spatel, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62252
llvm-svn: 361853
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.
AArch64 test changes all look good (`neg` created), or neutral.
X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).
I'm not sure about `X86/ragreedy-hoist-spill.ll`; it looks like the spill
is now hoisted into the preheader (which should still be good?),
two 4-byte reloads become one 8-byte reload and end up elsewhere,
but I'm not sure how that affects that loop.
I'm unable to interpret the AMDGPU change; looks neutral-ish?
This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/pkdq (we are missing more patterns, I'll submit them later)
Reviewers: craig.topper, RKSimon, spatel, arsenm
Reviewed By: RKSimon
Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62223
llvm-svn: 361852
Move the element index/count variables into the block where they are actually used - appeases cppcheck and helps avoid shadow variable warnings.
llvm-svn: 361821
This is derived from the related fold for build vectors.
We also have a version of this in DAGCombiner. The benefit of
having this fold at node creation time is (1) efficiency and
(2) preventing infinite looping from creating patterns that
should not exist in the first place.
Currently, the inf-loop could happen with MergeConsecutiveStores()
because it naively creates concat of extracts when forming a wider
vector store. That could fight with target-specific store narrowing.
llvm-svn: 361780
There's a possible missing fold here for extracting from the
same source vector. It's similar to a check that we use to
squash a build vector with all extracted elements from the
same source vector.
llvm-svn: 361778
Summary:
- The current implementation simplifies the case where the source of
`copyto` is `implicit-def`ed. However, it only works when that
`implicit-def` has a single use, since the detection starts from the
`implicit-def` and cannot determine which destination vreg should be
used if there are multiple uses.
- This patch changes the detection to happen when the `copyto` is being
emitted: if that `copyto`'s source is defined by an `implicit-def`, it
is simplified. Hence, it works even when that `implicit-def` has
multiple uses.
- Apart from simplifying the internal IR, this won't improve the quality
of code generation. However, it helps to detect `implicit-def` in a
straightforward manner in some passes, such as `si-i1-copies`. A test
case is added.
Reviewers: sunfish, nhaehnle
Subscribers: jvesely, hiraditya, asbirlea, llvm-commits, yaxunl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62342
llvm-svn: 361777
The DemandedElts variable is pretty much inert at the moment - the original GetDemandedBits implementation calls it with an 'all ones' DemandedElts value so the function is active and behaves exactly as it used to.
llvm-svn: 361773
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross-block values beforehand. For divergent targets,
the same value type requires different register classes depending on the value divergence.
Reviewers: rampitec, nhaehnle
Differential Revision: https://reviews.llvm.org/D59990
This commit was previously reverted because of a build failure.
The reason was a malformed patch.
The build failure is now fixed.
llvm-svn: 361741
The test based on PR42010:
https://bugs.llvm.org/show_bug.cgi?id=42010
...may show an inaccuracy for PPC's target defs, but we should not
be so aggressive with an assert here. There's no telling what out-of-tree
targets look like.
llvm-svn: 361696
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross-block values beforehand. For divergent targets,
the same value type requires different register classes depending on the value divergence.
Reviewers: rampitec, nhaehnle
Differential Revision: https://reviews.llvm.org/D59990
llvm-svn: 361644
This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets.
computeKnownBits then uses this function to improve codegen, notably vector code after legalization.
A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit.
This required a couple of fixes:
* SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR).
* Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets.
Differential Revision: https://reviews.llvm.org/D61887
llvm-svn: 361620
This is no-functional-change-intended currently because the definition
of isBinOp() only includes opcodes that produce 1 value. But if we
share that implementation with isCommutativeBinOp() as proposed in
D62191, then we need to make sure that the callers bail out for
opcodes that they are not prepared to handle correctly.
llvm-svn: 361547
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them. The
result is saturated and clamped between the largest and smallest representable
values of the first 2 operands.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
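A rough standalone sketch of the saturating fixed-point multiply semantics for 32-bit operands (the helper name is made up and the intrinsic's exact rounding behaviour is glossed over here):
#include <algorithm>
#include <cassert>
#include <cstdint>

int32_t smul_fix_sat32(int32_t A, int32_t B, unsigned Scale) {
  int64_t Prod = (int64_t)A * (int64_t)B;            // full-width product
  int64_t Shifted = Prod >> Scale;                   // drop the Scale fraction bits
  Shifted = std::min<int64_t>(Shifted, INT32_MAX);   // clamp to the
  Shifted = std::max<int64_t>(Shifted, INT32_MIN);   // 32-bit signed range
  return (int32_t)Shifted;
}

int main() {
  // 1.5 * 1.5 in Q16 fixed point is 2.25.
  assert(smul_fix_sat32(3 << 15, 3 << 15, 16) == (9 << 14));
}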
Differential Revision: https://reviews.llvm.org/D55720
llvm-svn: 361289
DAGCombiner simplifies this more liberally as:
// If inserting an UNDEF, just return the original vector.
if (N1.isUndef())
return N0;
So there's no way to make this visible in output AFAIK, but
doing this at node creation time should be slightly more efficient.
llvm-svn: 361287
getNode() squashes concatenation of undefs via FoldCONCAT_VECTORS():
// Concat of UNDEFs is UNDEF.
if (llvm::all_of(Ops, [](SDValue Op) { return Op.isUndef(); }))
return DAG.getUNDEF(VT);
llvm-svn: 361284
There are no FP callers of DAGCombiner::reassociateOps() currently,
but we can add a fast-math check to make sure this API is not being
misused.
This was noted as a potential risk (and that risk might increase) with:
D62191
llvm-svn: 361268
Summary:
The endianness used in the calling convention does not always match the
endianness of the target on all architectures, namely AVR.
When an argument is too large to be legalised by the architecture and is
split for the ABI, a new hook TargetLoweringInfo::shouldSplitFunctionArgumentsAsLittleEndian
is queried to find the endianness that function arguments must be laid
out in.
This approach was recommended by Eli Friedman.
Originally reported in https://github.com/avr-rust/rust/issues/129.
Patch by Carl Peto.
Reviewers: bogner, t.p.northover, RKSimon, niravd, efriedma
Reviewed By: efriedma
Subscribers: JDevlieghere, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62003
llvm-svn: 361222
Since INLINEASM_BR is a terminator we need to flush the pending exports before
emitting it. If we don't do this, a TokenFactor can be inserted between it and
the BR instruction emitted to finish the callbr lowering.
It looks like nodes are glued to the INLINEASM_BR so I had to make sure we emit
the TokenFactor before that.
Differential Revision: https://reviews.llvm.org/D59981
llvm-svn: 361177
We shouldn't really make assumptions about possible sizes for long and long long. And longer term we should probably support vectorizing these intrinsics. By making the result types not fixed we can support vectors as well.
Differential Revision: https://reviews.llvm.org/D62026
llvm-svn: 361169
This changes the isShift variable to include the constant operand
check that was previously in the if statement.
While there, fix an 80-column violation and an unnecessary use of
getNode. Also fix variable name capitalization.
llvm-svn: 361168
Fixes issue reported by aemerson on D57348. Vector op legalization
support is added for uaddo, usubo, saddo and ssubo (umulo and smulo
were already supported). As usual, by extracting TargetLowering methods
and calling them from vector op legalization.
Vector op legalization doesn't really deal with multiple result nodes,
so I'm explicitly performing a recursive legalization call on the
result value that is not being legalized.
There are some existing test changes because expansion happens
earlier, so we don't get a DAG combiner run in between anymore.
Differential Revision: https://reviews.llvm.org/D61692
llvm-svn: 361166
Refactor DIExpression::With* into a flag enum in order to be less
error-prone to use (as discussed on D60866).
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D61943
llvm-svn: 361137
Summary:
That check claims that the transform is illegal otherwise.
That isn't true:
1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter
https://rise4fun.com/Alive/K4A
2. For `ISD::AND`, there is no restriction on constants:
https://rise4fun.com/Alive/Wy3
3. For `ISD::OR`, there is no restriction on constants:
https://rise4fun.com/Alive/GOH
4. For `ISD::XOR`, there is no restriction on constants:
https://rise4fun.com/Alive/ml6
So, why is it there then?
This changes the testcase that was touched by @spatel in rL347478,
but I'm not sure that test tests anything in particular.
Reviewers: RKSimon, spatel, craig.topper, jojo, rengolin
Reviewed By: spatel
Subscribers: javed.absar, llvm-commits, spatel
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61918
llvm-svn: 361044
The recent introduction of v3i32 etc as an MVT, and its use in AMDGPU
3-dword memory instructions, caused a de-optimization problem for code
with such a load that then bitcasts via vector of i8, because v12i8 is
not an MVT so it legalizes the bitcast by widening it.
This commit adds the ability to widen a bitcast using extract_subvector
on the result, so the value does not need to go via memory.
Differential Revision: https://reviews.llvm.org/D60457
Change-Id: Ie4abb7760547e54a2445961992eafc78e80d4b64
llvm-svn: 360942
This patch adds ISD::LROUND and ISD::LLROUND along with new
intrinsics. The changes are straightforward, as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an integer.
The idea is to optimize lround/llround generation for AArch64
in a subsequent patch. The current semantics simply route it to the
libm symbol.
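For reference, the libm behaviour these nodes currently lower to (round to nearest, ties away from zero, integer result) can be seen with:
#include <cassert>
#include <cmath>

int main() {
  assert(std::lround(2.5) == 3);     // ties round away from zero
  assert(std::lround(-2.5) == -3);
  assert(std::llround(2.4f) == 2);   // result is an integer type, not a float
}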
llvm-svn: 360889
Before this change, they were erroneously constructed with the EH_LABEL
SDNode opcode, which caused other passes to interact with them in
incorrect ways. See the FIXME about fastisel that this addresses in the
existing test case.
Fixes PR41890
llvm-svn: 360818
Summary:
X86TargetLowering::LowerAsmOperandForConstraint had better support than
TargetLowering::LowerAsmOperandForConstraint for arbitrary depth
getelementpointers for "i", "n", and "s" extended inline assembly
constraints. Hoist its support from the derived class into the base
class.
Link: https://github.com/ClangBuiltLinux/linux/issues/469
Reviewers: echristo, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61560
llvm-svn: 360604
We catch most of these patterns (on x86 at least) by matching
a concat vectors opcode early in combining, but the pattern may
emerge later using insert subvector instead.
The AVX1 diffs for add/sub overflow show another missed narrowing
pattern. That one may be falling though the cracks because of
combine ordering and multiple uses.
llvm-svn: 360585
The new fptrunc and fpext intrinsics are constrained versions of the
regular fptrunc and fpext instructions.
Reviewed by: Andrew Kaylor, Craig Topper, Cameron McInally, Conner Abbot
Approved by: Craig Topper
Differential Revision: https://reviews.llvm.org/D55897
llvm-svn: 360581
Summary:
When we know for sure whether two addresses do or do not alias, we
should immediately return from DAGCombiner::isAlias().
I think this comes from a bad copy/paste; sorry for not catching that during the
code review.
Fixes PR41855.
Reviewers: niravd, gchatelet, EricWF
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61846
llvm-svn: 360566
I've included a new fix in X86RegisterInfo to prevent PR41619 without
reintroducing r359392. We might be able to improve that in the base class
implementation of shouldRewriteCopySrc somehow. But this hopefully enables
forward progress on SimplifyDemandedBits improvements for now.
Original commit message:
This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.
The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1)) combine to encourage BFE creation. I investigated putting this in DAGCombine,
but it caused a lot of noise on other targets - some improvements, some regressions.
The X86 changes are all definite wins.
llvm-svn: 360552
I noticed that we were failing to narrow an x86 ymm math op in a case similar
to the 'madd' test diff. That is because a bitcast is sitting between the math
and the extract subvector and thwarting our pattern matching for narrowing:
t56: v8i32 = add t59, t58
t68: v4i64 = bitcast t56
t73: v2i64 = extract_subvector t68, Constant:i64<2>
t96: v4i32 = bitcast t73
There are a few wins and neutral diffs in the other tests.
Differential Revision: https://reviews.llvm.org/D61806
llvm-svn: 360541
We already updated the LegalizedNodes map at the end of the Expand call. This
would have marked the new node as being mapped to itself. So the LegalizeOp
call will find that and immediately return.
llvm-svn: 360472
Split out from D61692 per RKSimon's suggestion. Vector op
legalization will automatically recursively legalize the returned
SDValue, but we need to take care of the other results ourselves.
Otherwise it will end up getting legalized only during op
legalization, by which point it might be too late (though I'm not
aware of any specific cases right now).
There are codegen differences because expansion occurs earlier now
and we don't get a DAGCombiner run in between.
Differential Revision: https://reviews.llvm.org/D61744
llvm-svn: 360470
To find the candidates to merge stores we iterate over all nodes in a chain
for each store, which leads to quadratic compile times for large basic blocks
with a large number of stores.
Reviewers: niravd, spatel, craig.topper
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D61511
llvm-svn: 360357
This patch allows for expansion of ADDCARRY and SUBCARRY when the target does not support it.
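As an illustration only, one common way such an expansion can be written for a 32-bit add with carry-in and carry-out, using plain adds and unsigned compares (the helper name is made up):
#include <cassert>
#include <cstdint>

uint32_t addcarry32(uint32_t A, uint32_t B, uint32_t CarryIn, uint32_t &CarryOut) {
  uint32_t Sum1 = A + B;
  uint32_t Carry1 = Sum1 < A;      // overflow of the first add
  uint32_t Sum2 = Sum1 + CarryIn;
  uint32_t Carry2 = Sum2 < Sum1;   // overflow of adding the carry
  CarryOut = Carry1 | Carry2;      // at most one of the two can overflow
  return Sum2;
}

int main() {
  uint32_t C;
  assert(addcarry32(0xffffffffu, 1u, 0u, C) == 0u && C == 1u);
  assert(addcarry32(0xffffffffu, 0u, 1u, C) == 0u && C == 1u);
}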
Differential Revision: https://reviews.llvm.org/D61411
llvm-svn: 360303
This is extracted from the original draft of D61419 with some additional tests.
We don't currently get this in IR (it's conservatively turned into a NaN),
but presumably that'll get updated as we add real IR support for 'fneg'
rather than 'fsub -0.0, x'.
The x86-32 run shows the following, and I haven't looked further to see why,
but that seems to be independent:
Legalizing: t1: f32 = undef
Trying to expand node
Creating fp constant: t4: f32 = ConstantFP<0.000000e+00>
Differential Revision: https://reviews.llvm.org/D61516
llvm-svn: 360296
This patch adds support for calling selectFNeg for FNeg instructions in addition to the fsub idiom
Differential Revision: https://reviews.llvm.org/D61624
llvm-svn: 360273
Add a new function to do the endian check, as I will commit another patch later, which will also need the endian check.
Differential Revision: https://reviews.llvm.org/D61236
llvm-svn: 360226
When simplifying TokenFactors, we potentially iterate over all
operands of a large number of TokenFactors. This causes quadratic
compile times in some cases and the large token factors cause additional
scalability problems elsewhere.
This patch adds some limits to the number of nodes explored for the
cases mentioned above.
Reviewers: niravd, spatel, craig.topper
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D61397
llvm-svn: 360171
Summary:
If fneg lowering for fsub -0.0, x fails we currently fall back to treating it as an fsub. This has different behavior for nans than the xor with sign bit trick we normally try to do. On X86, the xor trick for double fails fast-isel in 32-bit mode with sse2 due to 64 bit integer types not being available. With -O2 we would always use an xorpd for this case. If we use subsd, this creates an observable behavior difference between -O0 and -O2. So fall back to SelectionDAG if we can't fast-isel it, that way SelectionDAG will use the xorpd.
I believe this patch is restoring the behavior prior to r345295 from last October. This was missed then because our fast isel case in 32-bit mode aborted fast-isel earlier for another reason. But I've added new tests to cover that.
Reviewers: andrew.w.kaylor, cameron.mcinally, spatel, efriedma
Reviewed By: cameron.mcinally
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61622
llvm-svn: 360111
The problem was that we were creating a CMOV64rr <TargetFrameIndex>, <TargetFrameIndex>. The entire point of a TFI is that address code is not generated, so there's no way to legalize/lower this. Instead, simply prevent its creation.
Arguably, we shouldn't be using *Target*FrameIndices in StatepointLowering at all, but that's a much deeper change.
llvm-svn: 360090
It's possible to use the 'y' mmx constraint with a type narrower than 64-bits.
This patch supports this by bitcasting the mmx type to 64-bits and then
truncating to the desired type.
There are probably other missing type combinations we need to support, but this
is the case we have a bug report for.
Fixes PR41748.
Differential Revision: https://reviews.llvm.org/D61582
llvm-svn: 360069
Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead"
Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"
Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list.
Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead of
moving to the integer domain (in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast-isel table no longer
contains an entry for (f32/f64 bitcast (i32/i64)), so the target-independent fneg code fails. The use of fsub changes the behavior for NaN with
respect to -O2 codegen, which will always use a pxor. NOTE: We still have a difference for double with -m32 since the move to GPR doesn't work
there. I'll file a separate PR for that and add test cases.
Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised
if that bug can still be hit independent of that.
This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again.
llvm-svn: 360066
This addresses one half of https://bugs.llvm.org/show_bug.cgi?id=41635
by combining a VECREDUCE_AND/OR into VECREDUCE_UMIN/UMAX (if latter is
legal but former is not) for zero-or-all-ones boolean reductions (which
are detected based on sign bits).
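A small standalone illustration of why this is safe for zero-or-all-ones lanes (not the actual DAG code):
#include <cassert>
#include <cstdint>
#include <vector>

int main() {
  // For lanes known to be either 0 or all-ones, an AND reduction and an
  // unsigned-min reduction give the same answer (likewise OR and unsigned-max),
  // which is what the combine relies on.
  std::vector<uint8_t> Lanes = {0xff, 0xff, 0x00, 0xff};
  uint8_t And = 0xff, Min = 0xff, Or = 0x00, Max = 0x00;
  for (uint8_t L : Lanes) {
    And &= L;
    Min = L < Min ? L : Min;
    Or |= L;
    Max = L > Max ? L : Max;
  }
  assert(And == Min && Or == Max);
}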
Differential Revision: https://reviews.llvm.org/D61398
llvm-svn: 360054
Based on PR41748, not all cases are handled in this function.
llvm_unreachable is treated as an optimization hint that can prune code paths
in a release build. This causes weird behavior when PR41748 is encountered on a
release build. It appears to generate an fp_round instruction from the floating
point code.
Making this a report_fatal_error prevents incorrect optimization of the code
and will instead generate a message to file a bug report.
llvm-svn: 360008
As a result of the underlying cause of PR41678 we created an ANY_EXTEND node with a scalar result type and a v1i1 input type. Ideally we would have asserted for this instead of letting it go through to instruction selection and generating bad machine IR.
Differential Revision: https://reviews.llvm.org/D61463
llvm-svn: 359836
The original patch was committed at rL359398 and reverted at rL359695 because of
infinite looping.
This includes a fix to check for a vector splat of "1.0" to avoid the infinite loop.
Original commit message:
This was originally part of D61028, but it's an independent diff.
If we try the repeated divisor reciprocal transform before producing an estimate sequence,
then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5
vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the
full-precision division is only 3 cycle throughput, so that's probably the better perf
default option and avoids problems from x86's inaccurate estimates.
The last 2 tests show that users still have the option to override the defaults by using
the function attributes for reciprocal estimates, but those patterns are potentially made
faster by converting the vector ops (including ymm ops) to scalar math.
Differential Revision: https://reviews.llvm.org/D61149
llvm-svn: 359793
We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops),
so it does not make sense to have them here in the DAG either. Nothing else in the backend tries
to preserve exceptions (again outside of strict ops), so I don't see how this could have ever
worked for real code that cares about FP exceptions.
There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least
partially) to preserve exceptions without even asking if the target supports FP exceptions. Those
should be corrected in subsequent patches.
Real support for FP exceptions requires several changes to handle the constrained/strict FP ops.
Differential Revision: https://reviews.llvm.org/D61331
llvm-svn: 359791
In preparation for supporting ILP32 on AArch64, this modifies the SelectionDAG
builder code so that pointers are allowed to have a larger type when "live" in
the DAG compared to memory.
Pointers get zero-extended whenever they are loaded, and truncated prior to
stores. In addition, a few not quite so obvious locations need updating:
* A GEP that has not been marked inbounds needs to enforce the IR-documented
2s-complement wrapping at the memory pointer size. Inbounds GEPs are
undefined if they overflow the address space, so no additional operations
are needed.
* Signed comparisons would give incorrect results if performed on the
zero-extended values.
This shouldn't affect CodeGen for now, but will become active when the AArch64
ILP32 support is committed.
llvm-svn: 359676
We don't have this restriction in IR, so it should not be here
either simply out of consistency. Code that wants to handle FP
exceptions is expected to use the 'strict' variants of these
nodes.
We don't get the frem case because frem by 0.0 produces NaN (invalid),
and that's the remaining check here (so the removed check for frem
was dead code AFAIK).
This is the only place in SDAG that uses "HasFPExceptions", so I
think we should remove that entirely as a follow-up patch.
llvm-svn: 359566
This was a local static function in SelectionDAG, which I've promoted to
TargetLowering so that I can reuse it to estimate the cost of a memory
operation in D59787.
Differential Revision: https://reviews.llvm.org/D59766
llvm-svn: 359543
The MachineFunction wasn't used in getOptimalMemOpType, but more importantly,
this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType.
This is the groundwork for the changes in D59766 and D59787, that allows
implementation of TTI::getMemcpyCost.
Differential Revision: https://reviews.llvm.org/D59785
llvm-svn: 359537
Do not combine (trunc adde(X, Y, Carry)) into (adde trunc(X), trunc(Y), Carry)
if adde is not legal for the target, even at the type-legalization phase,
because adde is special and will not be legalized at the operation-legalization phase later.
This fixes: PR40922
https://bugs.llvm.org/show_bug.cgi?id=40922
Differential Revision: https://reviews.llvm.org//D60854
llvm-svn: 359532
Summary:
Extract the logic for doing reassociations
from DAGCombiner::reassociateOps into a helper
function DAGCombiner::reassociateOpsCommutative,
and use that helper to trigger reassociation
on the original operand order, or the commuted
operand order.
Codegen is not identical since the operand order will
be different when doing the reassociations for the
commuted case. That causes some unfortunate churn in
some test cases. Apart from that this should be NFC.
Reviewers: spatel, craig.topper, tstellar
Reviewed By: spatel
Subscribers: dmgreen, dschuff, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61199
llvm-svn: 359476
This was originally part of D61028, but it's an independent diff.
If we try the repeated divisor reciprocal transform before producing an estimate sequence,
then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5
vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the
full-precision division is only 3 cycle throughput, so that's probably the better perf
default option and avoids problems from x86's inaccurate estimates.
The last 2 tests show that users still have the option to override the defaults by using
the function attributes for reciprocal estimates, but those patterns are potentially made
faster by converting the vector ops (including ymm ops) to scalar math.
Differential Revision: https://reviews.llvm.org/D61149
llvm-svn: 359398
As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs).
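For illustration, the kind of equivalence the heuristic is choosing between, shown in scalar form for equal shift amounts:
#include <cassert>
#include <cstdint>

int main() {
  const unsigned C = 5;
  for (uint32_t X : {0u, 0x12345678u, 0xffffffffu}) {
    uint32_t ShiftPair = (X << C) >> C;           // clear the high C bits via shl + lshr
    uint32_t Masked    = X & (0xffffffffu >> C);  // or via an AND with a mask constant
    assert(ShiftPair == Masked);
  }
}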
Differential Revision: https://reviews.llvm.org/D61068
llvm-svn: 359293
We had special-case handling here, but it used a scalar any_extend for the
promotion and then bitcast to the final type. This won't split up the input data
into multiple promoted elements like we need.
This patch falls back to doing the conversion through memory.
Fixes PR41594 which I believe was reflected in the bitcast-vector-bool.ll
changes. The changes to vector-half-conversions.ll are fixing a previously
unknown miscompile from this issue.
Differential Revision: https://reviews.llvm.org/D61114
llvm-svn: 359219
Summary:
This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug
info in codeview. Currently only changes FastISel, so emitting labels still
needs to be implemented in SelectionDAG.
Reviewers: rnk
Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D61083
llvm-svn: 359149
If we have a vector FP division with a splatted divisor, use the existing transform
that converts 'x/y' into 'x * (1.0/y)' to allow more conversions. This can then
potentially be converted into a scalar FP division by existing combines (rL358984)
as seen in the tests here.
That can be a potentially big perf difference if scalar fdiv has better timing
(including avoiding possible frequency throttling for vector ops).
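An illustrative scalar sketch of the transform (only valid under the reciprocal fast-math flag, since it is not bit-exact):
#include <array>
#include <cstdio>

int main() {
  std::array<float, 4> X = {1.0f, 2.0f, 3.0f, 4.0f};
  float Y = 7.0f;               // splatted divisor
  float Recip = 1.0f / Y;       // one scalar fdiv instead of a vector fdiv
  for (float &V : X)
    V *= Recip;                 // the per-element divides become multiplies
  std::printf("%f %f %f %f\n", X[0], X[1], X[2], X[3]);
}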
Differential Revision: https://reviews.llvm.org/D61028
llvm-svn: 359147
Summary:
Both the input Value pointer and the returned Value
pointers in GetUnderlyingObjects are now declared as
const.
It turned out that all current (in-tree) uses of
GetUnderlyingObjects were trivial to update, being
satisfied with having those Value pointers declared
as const. Actually, in the past several of the users
had to use const_cast, just because of ValueTracking
not providing a version of GetUnderlyingObjects with
"const" Value pointers. With this patch we get rid
of those const casts.
Reviewers: hfinkel, materi, jkorous
Reviewed By: jkorous
Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61038
llvm-svn: 359072
If we only match build vectors, we can miss some patterns
that use shuffles as seen in the affected tests.
Note that the underlying calls within getSplatSourceVector()
have the potential for compile-time explosion because of
exponential recursion looking through binop opcodes, but
currently the list of supported opcodes is very limited.
Both of those problems should be addressed in follow-up
patches.
llvm-svn: 358984
Summary:
The DAGCombiner is rewriting (canonicalizing) an ISD::ADD
with no common bits set in the operands as an ISD::OR node.
This could sometimes result in "missing out" on some
combines that normally are performed for ADD. To be more
specific this could happen if we already have rewritten an
ADD into OR, and later (after legalizations or combines)
we expose patterns that could have been optimized if we
had seen the OR as an ADD (e.g. reassociations based on ADD).
To make the DAG combiner less sensitive to if ADD or OR is
used for these "no common bits set" ADD/OR operations we
now apply most of the ADD combines also to an OR operation,
when value tracking indicates that the operands have no
common bits set.
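A tiny standalone demonstration of the underlying fact (no carries means ADD and OR agree):
#include <cassert>
#include <cstdint>

int main() {
  // When the operands share no set bits the add never produces a carry,
  // so A + B and A | B are interchangeable.
  uint32_t A = 0xf0f0f0f0u, B = 0x01010101u;
  assert((A & B) == 0);
  assert(A + B == (A | B));
}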
Reviewers: spatel, RKSimon, craig.topper, kparzysz
Reviewed By: spatel
Subscribers: arsenm, rampitec, lebedev.ri, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59758
llvm-svn: 358965
This was supposed to be NFC, but the change in SDLoc
definitions causes instruction scheduling changes.
There's nothing x86-specific in this code, and it can
likely be used from DAGCombiner's simplifyVBinOp().
llvm-svn: 358930
This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.
The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1)) combine to encourage BFE creation. I investigated putting this in DAGCombine, but it caused a lot of noise on other targets - some improvements, some regressions.
The X86 changes are all definite wins.
Differential Revision: https://reviews.llvm.org/D60462
llvm-svn: 358887
Summary:
This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug
info in codeview. Currently only changes FastISel, so emitting labels still
needs to be implemented in SelectionDAG.
Reviewers: hans, rnk
Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D60800
llvm-svn: 358783
These are general queries, so they should not die when given
a degenerate input like an all undef mask. Callers should be
able to deal with an op that will eventually be simplified away.
llvm-svn: 358761
Currently there is a single point in ScheduleDAGRRList, where we
actually query the topological order (besides init code). Currently we
are recomputing the order after adding a node (which does not have
predecessors) and then we add predecessors edge-by-edge.
We can avoid adding edges one-by-one after we added a new node. In that case, we can
just rebuild the order from scratch after adding the edges to the DAG
and avoid all the updates to the ordering.
Also, we can delay updating the DAG until we query the DAG, if we keep a
list of added edges. Depending on the number of updates, we can either
apply them when needed or recompute the order from scratch.
This brings the geomean compile time of CTMark with -O1 down 0.3% on X86,
with no regressions.
Reviewers: MatzeB, atrick, efriedma, niravd, paquette
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D60125
llvm-svn: 358583
As discussed on PR41359, this patch renames the pair of shift-mask target feature functions to make their purposes more obvious.
shouldFoldShiftPairToMask -> shouldFoldConstantShiftPairToMask
preferShiftsToClearExtremeBits -> shouldFoldMaskToVariableShiftPair
llvm-svn: 358526
The checks in `canFoldInAddressingMode` tested for addressing modes that have a
base register but didn't set the `HasBaseReg` flag to true (it's false by
default). This patch fixes that. Although the omission of the flag was
technically incorrect it had no known observable impact, so no tests were
changed by this patch.
Differential Revision: https://reviews.llvm.org/D60314
llvm-svn: 358502
Arguments already have a flag to inform backends when they have been split up.
The AArch64 arm64_32 ABI makes use of these on return types too, so that code
emitted for armv7k can be ABI-compliant.
There should be no CodeGen changes yet, just making more information available.
llvm-svn: 358399
The arm64_32 ABI specifies that pointers (despite being 32-bits) should be
zero-extended to 64-bits when passed in registers for efficiency reasons. This
means that the SelectionDAG needs to be able to tell the backend that an
argument was originally a pointer, which is implemented here.
Additionally, some memory intrinsics need to be declared as taking an i8*
instead of an iPTR.
There should be no CodeGen change yet, but it will be triggered when AArch64
backend support for ILP32 is added.
llvm-svn: 358398
Summary:
Use KnownBits::computeForAddSub/computeForAddCarry
in SelectionDAG::computeKnownBits when doing value
tracking for addition/subtraction.
This should improve the precision of the known bits,
as we only used to make a simple estimate of known
zeroes. The KnownBits support functions are also
able to deduce bits that are known to be one in the
result.
Reviewers: spatel, RKSimon, nikic, lebedev.ri
Reviewed By: nikic
Subscribers: nikic, javed.absar, lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60460
llvm-svn: 358372
// shuffle (concat X, undef), (concat Y, undef), Mask -->
// concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1)
The ARM changes with 'vtrn' and narrowed 'vuzp' are improvements.
The x86 changes look neutral or better. There's one test with an
extra instruction, but that could be reversed for a subtarget with
the right attributes. But by default, we want to avoid the 256-bit
op when possible (in my motivating benchmark, a handful of ymm ops
sprinkled into a sequence of xmm ops are triggering frequency
throttling on Haswell resulting in significantly worse perf).
Differential Revision: https://reviews.llvm.org/D60545
llvm-svn: 358291
If the upper bits of the SHL result aren't used, we might be able to use a narrower shift. For example, on X86 this can turn a 64-bit into 32-bit enabling a smaller encoding.
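A small standalone check of the fact this relies on, for shift amounts below the narrow width:
#include <cassert>
#include <cstdint>

int main() {
  uint64_t X = 0x123456789abcdef0ull;
  for (unsigned S : {1u, 7u, 31u})
    // The low 32 bits of a 64-bit shift match a 32-bit shift of the low half.
    assert((uint32_t)(X << S) == ((uint32_t)X << S));
}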
Differential Revision: https://reviews.llvm.org/D60358
llvm-svn: 358257
// bo (build_vec ...undef, x, undef...), (build_vec ...undef, y, undef...) -->
// build_vec ...undef, (bo x, y), undef...
The lifetime of the nodes in these examples is different for variables versus constants,
but they are all build vectors briefly, so I'm proposing to catch them in this form to
handle all of the leading examples in the motivating test file.
Before we have build vectors, we might have insert_vector_element. After that, we might
have scalar_to_vector and constant pool loads.
It's going to take more work to ensure that FP vector operands are getting simplified
with undef elements, so this transform can apply more widely. In a non-loose FP environment,
we are likely simplifying FP elements to NaN values rather than undefs.
We also need to allow more opcodes down this path. Eg, we don't handle FP min/max flavors
yet.
Differential Revision: https://reviews.llvm.org/D60514
llvm-svn: 358172
Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not
seeing through to the constant in other blocks. Revert this patch while we come
up with a better way to handle that.
I will try to follow this up with some better tests.
llvm-svn: 358113
This lines up with what we do for regular subtract and it matches up better with X86 assumptions in isel patterns that add with immediate is more canonical than sub with immediate.
Differential Revision: https://reviews.llvm.org/D60020
llvm-svn: 358027
When bitcasting from a source op to a larger bitwidth op, split the demanded bits and OR them on top of one another and demand those merged bits in the SimplifyDemandedBits call on the source op.
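A small illustration of the bit bookkeeping for a wider value built by bitcasting from two 32-bit elements (made-up values, not the actual helper):
#include <cassert>
#include <cstdint>

int main() {
  // Demanded bits of an i64 produced from a v2i32 source: split into
  // per-element halves and OR them; the merged mask is what gets demanded
  // from every element of the source.
  uint64_t DemandedI64 = 0x00ff000000000f0full;
  uint32_t Lo = (uint32_t)DemandedI64;
  uint32_t Hi = (uint32_t)(DemandedI64 >> 32);
  uint32_t MergedDemandedPerElt = Lo | Hi;
  assert(MergedDemandedPerElt == 0x00ff0f0fu);
}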
llvm-svn: 357992
Second half of PR40800, this patch adds DAG undef handling to fcmp instructions to match the behavior in llvm::ConstantFoldCompareInstruction, this permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.).
This involves a lot of tweaking to reduced tests as bugpoint loves to reduce fcmp arguments to undef........
Differential Revision: https://reviews.llvm.org/D60006
llvm-svn: 357765
There are a variety of vector patterns that may be profitably reduced to a
scalar op when scalar ops are performed using a subset (typically, the
first lane) of the vector register file.
For x86, this is true for float/double ops and element 0 because
insert/extract is just a sub-register rename.
Other targets should likely enable the hook in a similar way.
Differential Revision: https://reviews.llvm.org/D60150
llvm-svn: 357760
Summary:
Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if
the virtual reg used has one def only.
This can be particularly useful when calling isBaseWithConstantOffset()
with the ISD::CopyFromReg argument, as more optimizations may get enabled
in the result.
Also add a missing truncation on X86, found by testing of this patch.
Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa
Reviewers: bogner, craig.topper, RKSimon
Reviewed By: RKSimon
Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59535
llvm-svn: 357745
Safepoint lowering checks that all gc.relocates observed in the safepoint
have been lowered. However, Fast-ISel is able to skip dead gc.relocates.
To resolve this issue we just ignore dead gc.relocates in the check.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D60184
llvm-svn: 357742
Fast ISel has a fallback to SelectionDAGISel in case it cannot handle an instruction.
This works as follows: going in reverse order, try to select each instruction using Fast ISel;
if Fast ISel cannot handle an instruction and it is a call, fall back to SelectionDAGISel
for that instruction and then continue fast instruction selection.
However, if the unhandled instruction is not a call or a statepoint-related instruction,
fall back to SelectionDAGISel for all remaining instructions in the basic block.
However, the gc.result instruction was missed, and as a result it is possible for a gc.result
to be processed earlier than its statepoint, breaking the invariant that gc.results should be
handled after the statepoint.
The test is updated because in its current form fast-isel cannot handle the ret instruction
(due to an i1 return type without an explicit ext), and as a result the test does not check
fast-isel at all.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D60182
llvm-svn: 357672
There are 3 changes to make this correspond to the same transform in instcombine:
1. Remove the legality check - we can't create anything less legal than we started with.
2. Ease the use restriction, so we only bail out if both operands have >1 use.
3. Ease the use restriction for binops with a repeated operand (eg, mul x, x).
As discussed in D60150, there's a scalarization opportunity that will be made
easier by allowing this transform more generally.
llvm-svn: 357580
Summary:
Nodes that have no uses are eventually pruned when they are selected
from the worklist. Record nodes newly added to the worklist or DAG and
perform pruning after every combine attempt.
Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight
Reviewed By: jyknight
Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58070
llvm-svn: 357283
Summary:
Various SelectionDAG non-combine operations (e.g. the getNode smart
constructor and legalization) may leave dangling nodes by applying
optimizations without fully pruning unused result values. This results
in nodes that are never added to the worklist and therefore can not be
pruned.
Add a node inserter for the combiner to make sure such nodes have the
chance of being pruned. This allows a number of additional peephole
optimizations.
Reviewers: efriedma, RKSimon, craig.topper, jyknight
Reviewed By: jyknight
Subscribers: msearles, jyknight, sdardis, nemanjai, javed.absar, hiraditya, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58068
llvm-svn: 357279
After investigating the examples from D59777 targeting an SSE4.1 machine,
it looks like a very different problem due to how we map illegal types (256-bit in these cases).
We're missing a shuffle simplification that maps elements of a vector back to a shuffled operand.
We have a more general version of this transform in DAGCombiner::visitVECTOR_SHUFFLE(), but that
generality means it is limited to patterns with a one-use constraint, and the examples here have
2 uses. We don't need any uses or legality limitations for a simplification (no new value is
created).
It looks like we miss this pattern in IR too.
In one of the zext examples here, we have shuffle masks like this:
Shuf0 = vector_shuffle<0,u,3,7,0,u,3,7>
Shuf = vector_shuffle<4,u,6,7,u,u,u,u>
...so that's moving the high half of the 1st vector into the low half. But the high half of the
1st vector is already identical to the low half.
Differential Revision: https://reviews.llvm.org/D59961
llvm-svn: 357258
This is a sibling to rL357178 that I noticed we'd hit if we chose
an alternate transform in D59818.
%z = zext i8 %x to i32
%dec = add i32 %z, -1
%r = sext i32 %dec to i64
=>
%z2 = zext i8 %x to i64
%r = add i64 %z2, -1
https://rise4fun.com/Alive/kPP
The x86 vector diffs show a slight regression, so there's a chance
that we should limit this and the previous transform to scalars.
But given that we allowed vectors before, I'm matching that behavior
here. We should change both transforms together if that's the right
thing to do.
llvm-svn: 357254
In the example below, we would previously emit two range checks, one for cases
1--3 and one for 4--6. This patch makes us exploit the fact that the
fall-through is unreachable and only one range check is necessary.
switch i32 %i, label %default [
i32 1, label %bb1
i32 2, label %bb1
i32 3, label %bb1
i32 4, label %bb2
i32 5, label %bb2
i32 6, label %bb2
]
default: unreachable
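Roughly the shape of the lowering this enables for the switch above - because the default is unreachable, a single comparison separates the two case clusters (an illustrative sketch, not the emitted code):
int pickBlock(int I) {
  // I can be assumed to be in [1, 6]; one "range check" splits 1-3 from 4-6.
  return I <= 3 ? 1 /* bb1 */ : 2 /* bb2 */;
}

int main() { return pickBlock(5) == 2 ? 0 : 1; }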
llvm-svn: 357252
If scalar truncates are free, attempt to pre-truncate build_vectors source operands.
Only attempt to do this before legalization as we often end up with truncations/extensions during build_vector lowering.
Differential Revision: https://reviews.llvm.org/D59654
llvm-svn: 357161
When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal
or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable.
(See https://reviews.llvm.org/rL236528 for reference.)
This applies that behaviour to vector types. If the vector type is
TypePromoteInteger, the element type is going to be TypePromoteInteger
as well, which will lead to have a single promoting load rather than N
individual promoting loads. For instance, if we have a v3i1, we would
now have a load of v4i1 instead of 3 loads of i1.
Patch by Guillaume Marques. Thanks!
Differential Revision: https://reviews.llvm.org/D56201
llvm-svn: 357120
Split out from D59749. The current implementation of isWrappedSet()
doesn't do what it says on the tin, and treats ranges like
[X, Max] as wrapping, because they are represented as [X, 0) when
using half-inclusive ranges. This also makes it inconsistent with
the semantics of isSignWrappedSet().
This patch renames isWrappedSet() to isUpperWrapped(), in preparation
for the introduction of a new isWrappedSet() method with corrected
behavior.
llvm-svn: 357107
Rework BaseIndexOffset and isAlias to fully work with lifetime nodes
and fold in lifetime alias analysis.
This is mostly NFC.
Reviewers: courbet
Reviewed By: courbet
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59794
llvm-svn: 357070
Original commit by Ayonam Ray.
This commit adds a regression test for the issue discovered in the
previous commit: that the range check for the jump table can only be
omitted if the fall-through destination of the jump table is
unreachable, which isn't necessarily true just because the default of
the switch is unreachable.
This addresses the missing optimization in PR41242.
> During the lowering of a switch that would result in the generation of a
> jump table, a range check is performed before indexing into the jump
> table, for the switch value being outside the jump table range and a
> conditional branch is inserted to jump to the default block. In case the
> default block is unreachable, this conditional jump can be omitted. This
> patch implements omitting this conditional branch for unreachable
> defaults.
>
> Differential Revision: https://reviews.llvm.org/D52002
> Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev
llvm-svn: 357067
getAsCarry() checks that the input argument is a carry-producing node before
allowing a transformation to addcarry. This patch adds a check to make sure
that the carry-producing node is legal. If it is not, it may not remain in a
form that is manageable by the target backend. The test case caused a
compilation failure during instruction selection for this reason on SystemZ.
Patch by Ulrich Weigand.
Review: Sanjay Patel
https://reviews.llvm.org/D59822
llvm-svn: 357052
We have the folds for fadd/fsub/fmul already in DAGCombiner,
so it may be possible to remove that code if we can guarantee that
these ops are zapped before they can exist.
llvm-svn: 357029
Various SelectionDAG non-combine operations (e.g. the getNode smart
constructor and legalization) may leave dangling nodes by applying
optimizations or not fully pruning unused result values. This can
result in nodes that are never added to the worklist and therefore can
not be pruned.
Add a node inserter as the current node deleter to make sure such
nodes have the chance of being pruned.
Many minor changes, mostly positive.
llvm-svn: 356996
This helps us relax the extension of a lot of scalar elements before they are inserted into a vector.
It exposes an issue in DAGCombiner::convertBuildVecZextToZext as some/all of the zero-extensions may be relaxed to ANY_EXTEND, so we need to handle that case to avoid a couple of AVX2 VPMOVZX test regressions.
Once this is in it should be easier to fix a number of remaining failures to fold loads into VBROADCAST nodes.
Differential Revision: https://reviews.llvm.org/D59484
llvm-svn: 356989
DenseMap iteration order is not guaranteed, use MapVector instead.
Fix provided by srhines.
Differential Revision: https://reviews.llvm.org/D59807
llvm-svn: 356988
First half of PR40800, this patch adds DAG undef handling to icmp instructions to match the behaviour in llvm::ConstantFoldCompareInstruction and SimplifyICmpInst, this permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.).
This involved a lot of tweaking to reduced tests as bugpoint loves to reduce icmp arguments to undef........
Differential Revision: https://reviews.llvm.org/D59363
llvm-svn: 356938
An i16 bswap can be implemented with an i16 rotate by 8. We previously emitted
a shift and OR sequence that DAG combine should be able to turn back into
rotate. But we might as well go there directly. If rotate isn't legal,
LegalizeDAG should further legalize it to either the opposite rotate, or the
shift and OR pattern.
I don't know of any way to get the existing DAG combine reliance to fail. So
I don't know any way to add new tests for this that wouldn't have worked
previously.
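For reference, the identity being used - an i16 byte swap is exactly a 16-bit rotate by 8 (standalone illustration, not the legalizer code):
#include <cassert>
#include <cstdint>

uint16_t bswap16(uint16_t X) {
  // A rotate by 8 swaps the two bytes of a 16-bit value.
  return (uint16_t)((X << 8) | (X >> 8));
}

int main() { assert(bswap16(0x1234) == 0x3412); }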
llvm-svn: 356860
SDNodes can only have 64k operands and for some inputs (e.g. large
number of stores), we can reach this limit when creating TokenFactor
nodes. This patch is a follow up to D56740 and updates a few more places
that potentially can create TokenFactors with too many operands.
Reviewers: efriedma, craig.topper, aemerson, RKSimon
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D59156
llvm-svn: 356668
The actual code change is fairly straightforward, but exercising it isn't. First, it turned out we weren't adding the appropriate flags in SelectionDAG. Second, it turned out that we've got some optimization gaps, so obvious test cases don't work.
My first attempt (in atomic-unordered.ll) points out a deficiency in our peephole-opt folding logic which I plan to fix separately. Instead, I'm exercising this through MachineLICM.
Differential Revision: https://reviews.llvm.org/D59375
llvm-svn: 356494
In r311255 we added a case where we split vectors whose elements are
all derived from the same input vector so that we could shuffle it
more efficiently. In doing so, createBuildVecShuffle was taught to
adjust for the fact that all indices would be based off of the first
vector when this happens, but it's possible for the code that checked
that to fire incorrectly if we happen to have a BUILD_VECTOR of
extracts from subvectors and don't hit this new optimization.
Instead of trying to detect if we've split the vector by checking if
we have extracts from the same base vector, we can just pass that
information into createBuildVecShuffle, avoiding the miscompile.
Differential Revision: https://reviews.llvm.org/D59507
llvm-svn: 356476
These changes are related to PR37743 and include:
SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node.
Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner.
Add promoting the integer ABS node in the LegalizeIntegerType.
Expand-based legalization of integer result for the ABS nodes.
Expand-based legalization of ABS vector operations.
Add some integer abs testcases for different typesizes for Thumb arch
Add the custom ABS expanding and change the SAD pattern recognizer for X86 arch: The i64 result of the ABS is expanded to:
tmp = (SRA Hi, 31)
Lo = (UADDO tmp, Lo)
Hi = (XOR tmp, (ADDCARRY tmp, hi, Lo:1))
Lo = (XOR tmp, Lo)
The "detectZextAbsDiff" function is changed for the recognition of pattern with the ABS node. Given a ABS node, detect the following pattern:
(ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))).
Change integer abs testcases for codegen with the ABS node support for AArch64.
Indicate that the ABS is legal for the i64 type when the NEON is supported.
Change the integer abs testcases to show changing of codegen.
Add combine and legalization of ABS nodes for Thumb arch.
Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition.
For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743
Patch by: @ikulagin (Ivan Kulagin)
Differential Revision: https://reviews.llvm.org/D49837
llvm-svn: 356468
This allows better code size for aarch64 floating point materialization
in a future patch.
Reviewers: evandro
Differential Revision: https://reviews.llvm.org/D58690
llvm-svn: 356389
Delete temporarily constructed nodes used for analysis after their use,
holding onto the original input nodes. Ideally this would be rewritten
without making nodes, but this appears relatively complex.
Reviewers: spatel, RKSimon, craig.topper
Subscribers: jdoerfert, hiraditya, deadalnix, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57921
llvm-svn: 356382
Summary:
Look past bitcasts when looking for parameter debug values that are
described by frame-index loads in `EmitFuncArgumentDbgValue()`.
In the attached test case we would be left with an undef `DBG_VALUE`
for the parameter without this patch.
A similar fix was done for parameters passed in registers in D13005.
This fixes PR40777.
Reviewers: aprantl, vsk, jmorse
Reviewed By: aprantl
Subscribers: bjope, javed.absar, jdoerfert, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D58831
llvm-svn: 356363
AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This
commit does not add them, but makes preparatory changes:
* Exclude non-legal non-power-of-2 vector types from ComputeRegisterProp
mechanism in TargetLoweringBase::getTypeConversion.
* Cope with SETCC and VSELECT for odd-width i1 vector when the other
vectors are legal type.
Some of this patch is from Matt Arsenault, also of AMD.
Differential Revision: https://reviews.llvm.org/D58899
Change-Id: Ib5f23377dbef511be3a936211a0b9f94e46331f8
llvm-svn: 356350
Fold (x & ~y) | y and its four commuted variants to x | y. This pattern
can in particular appear when a vselect c, x, -1 is expanded to
(x & ~c) | (-1 & c) and combined to (x & ~c) | c.
This change has some overlap with D59066, which avoids creating a
vselect of this form in the first place during uaddsat expansion.
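A quick standalone check of the folded identity:
#include <cassert>
#include <cstdint>

int main() {
  // (x & ~y) | y keeps every bit of y plus every bit of x that y does not
  // already cover, which is just x | y.
  for (uint32_t X : {0u, 0xdeadbeefu, 0xffffffffu})
    for (uint32_t Y : {0u, 0x0000ffffu, 0xffffffffu})
      assert(((X & ~Y) | Y) == (X | Y));
}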
Differential Revision: https://reviews.llvm.org/D59174
llvm-svn: 356333
This is a subset of what was proposed in:
D59006
...and may overlap with test changes from:
D59174
...but it seems like a good general optimization to turn selects
into bitwise-logic when possible because we never know exactly
what can happen at this stage of DAG combining depending on how
the target has defined things.
Differential Revision: https://reviews.llvm.org/D59066
llvm-svn: 356332
rL356292 reduces the size of scalar_to_vector if we know the upper bits are undef - which means that shuffles may find they are suddenly referencing scalar_to_vector elements other than zero - so make sure we handle this as undef.
llvm-svn: 356327
Summary:
In the new wasm EH proposal, `rethrow` takes an `except_ref` argument.
This change was missing in r352598.
This patch adds `llvm.wasm.rethrow.in.catch` intrinsic. This is an
intrinsic that's gonna eventually be lowered to wasm `rethrow`
instruction, but this intrinsic can appear only within a catchpad or a
cleanuppad scope. Also this intrinsic needs to be invokable - otherwise
EH pad successor for it will not be correctly generated in clang.
This also adds lowering logic for this intrinsic in
`SelectionDAGBuilder::visitInvoke`. This routine is basically a
specialized and simplified version of
`SelectionDAGBuilder::visitTargetIntrinsic`, but we can't use it
because it is only for `CallInst`s.
This deletes the previous `llvm.wasm.rethrow` intrinsic and related
tests, which was meant to be used within a `__cxa_rethrow` library
function. Turned out this needs some more logic, so the intrinsic for
this purpose will be added later.
LateEHPrepare takes a result value of `catch` and inserts it into
matching `rethrow` as an argument.
`RETHROW_IN_CATCH` is a pseudo instruction that serves as a link between
`llvm.wasm.rethrow.in.catch` and the real wasm `rethrow` instruction. To
generate a `rethrow` instruction, we need an `except_ref` argument,
which is generated from the `catch` instruction. But `catch` instructions are
added in LateEHPrepare pass, so we use `RETHROW_IN_CATCH`, which takes
no argument, until we are able to correctly lower it to `rethrow` in
LateEHPrepare.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59352
llvm-svn: 356316
Summary:
A number of optimizations are inhibited by single-use TokenFactors not
being merged into the TokenFactor using it. This makes us consider whether
we can do the merge immediately.
Most test changes here are due to the change in visitation causing
minor reorderings and associated reassociation of paired memory
operations.
CodeGen tests with non-reordering changes:
X86/aligned-variadic.ll -- memory-based add folded into stored leaq
value.
X86/constant-combiners.ll -- Optimizes out overlap between stores.
X86/pr40631_deadstore_elision -- folds constant byte store into
preceding quad word constant store.
Reviewers: RKSimon, craig.topper, spatel, efriedma, courbet
Reviewed By: courbet
Subscribers: dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, eraman, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59260
llvm-svn: 356068
First step towards PR40800 - I intend to move the float case in a separate future patch.
I had to tweak the (overly reduced) thumb2 test and the x86 widening test change is annoying (no longer rematerializable) but we should address this separately.
Differential Revision: https://reviews.llvm.org/D59244
llvm-svn: 356040
The existing statepoint lowering code does something odd; it adds machine memory operands post instruction selection. This was copied from the stackmap/patchpoint implementation, but appears to be non-idiomatic.
This change is largely NFC. It moves the MMO creation logic into SelectionDAG building. It ends up not quite being NFC because the size of the stack slot is reflected in the MMO. The old code blindly used pointer size for the MMO size, which appears to have always been incorrect for larger values. It just happened nothing actually relied on the MMOs, so it worked out okay.
For context, I'm planning on removing the MOVolatile flag from these in a future commit, and then removing the MOStore flag from deopt spill slots in a separate one. Doing so is motivated by a small test case where we should be able to better schedule spill slots, but don't do so due to a memory use/def implied by the statepoint.
Differential Revision: https://reviews.llvm.org/D59106
llvm-svn: 355953
Expand MULO with constant power of two operand into a shift. The
overflow is checked with (x << shift) >> shift == x, where the right
shift will be logical for umulo and arithmetic for smulo (with
an exception for multiplication by signed_min).
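For reference, a scalar model of the unsigned case (illustrative names, not the actual expansion code):
```
#include <cstdint>
#include <cstdio>

// Sketch of the unsigned case: x * (1 << Shift) overflows iff logically
// shifting the product back down does not recover x.
static bool umulo_pow2(uint32_t X, unsigned Shift, uint32_t &Product) {
  Product = X << Shift;            // the MULO result becomes a SHL
  return (Product >> Shift) != X;  // logical shift-back check
}

int main() {
  uint32_t P;
  bool Ovf1 = umulo_pow2(0x00FFFFFFu, 8, P);  // fits: no overflow
  bool Ovf2 = umulo_pow2(0x01FFFFFFu, 8, P);  // top bits lost: overflow
  std::printf("%d %d\n", Ovf1, Ovf2);
  return 0;
}
```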
Differential Revision: https://reviews.llvm.org/D59041
llvm-svn: 355937
Fixes https://bugs.llvm.org/show_bug.cgi?id=36796.
Implement basic legalizations (PromoteIntRes, PromoteIntOp,
ExpandIntRes, ScalarizeVecOp, WidenVecOp) for VECREDUCE opcodes.
There are more legalizations missing (esp float legalizations),
but there's no way to test them right now, so I'm not adding them.
This also includes a few more changes to make this work somewhat
reasonably:
* Add support for expanding VECREDUCE in SDAG. Usually
experimental.vector.reduce is expanded prior to codegen, but if the
target does have native vector reduce, it may of course still be
necessary to expand due to legalization issues. This uses a shuffle
reduction if possible, followed by a naive scalar reduction (a scalar
sketch of the shuffle-reduction step follows after this list).
* Allow the result type of integer VECREDUCE to be larger than the
vector element type. For example we need to be able to reduce a v8i8
into a (nominally) i32 result type on AArch64.
* Use the vector operand type rather than the scalar result type to
determine the action, so we can control exactly which vector types are
supported. Also change the legalize vector op code to handle
operations that only have vector operands, but no vector results, as
is the case for VECREDUCE.
* Default VECREDUCE to Expand. On AArch64 (only target using VECREDUCE),
explicitly specify for which vector types the reductions are supported.
This does not handle anything related to VECREDUCE_STRICT_*.
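A scalar model of the shuffle-reduction idea referenced above (a hypothetical helper, not the SDAG expansion itself):
```
#include <array>
#include <cstdio>

// Reduce an N-lane integer "vector" by repeatedly adding the high half into
// the low half (the shuffle step), then reading lane 0 (the extract).
template <typename T, size_t N>
T reduce_add(std::array<T, N> V) {
  static_assert((N & (N - 1)) == 0, "power-of-two lane count assumed");
  for (size_t Half = N / 2; Half >= 1; Half /= 2)
    for (size_t I = 0; I < Half; ++I)
      V[I] += V[I + Half];  // vector add of V with its swapped-halves shuffle
  return V[0];              // final scalar result lives in lane 0
}

int main() {
  std::array<int, 8> V{1, 2, 3, 4, 5, 6, 7, 8};
  std::printf("%d\n", reduce_add(V));  // prints 36
  return 0;
}
```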
Differential Revision: https://reviews.llvm.org/D58015
llvm-svn: 355860
Includes a fix to emit a CheckOpcode for build_vector when immAllZerosV/immAllOnesV is used as a pattern root. This means it can't be used to look through bitcasts when used as a root, but that's probably ok. This extra CheckOpcode will ensure that the first match in the isel table will be a SwitchOpcode which is needed by the caching optimization in the ISel Matcher.
Original commit message:
Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally, ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead we have to canonicalize the types of the all zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts.
By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up.
This removes something like 40,000 bytes from the X86 isel table.
Differential Revision: https://reviews.llvm.org/D58595
llvm-svn: 355784
This avoids breaking possible value dependencies when sorting loads by
offset.
AMDGPU has some load instructions that write into the high or low bits
of the destination register, and have a tied input for the other input
bits. These can easily have the same base pointer, but be a swizzle so
the high address load needs to come first. This was inserting glue
forcing the opposite ordering, producing a cycle the InstrEmitter
would assert on. It may be potentially expensive to look for the
dependency between the other loads, so just skip any where this could
happen.
Fixes bug 40936 by reverting r351379, which added a hacky attempt to
fix this by adding chains in this case, which I think was just working
around broken glue before the InstrEmitter. The core of the patch is
re-implementing the fix for that problem.
llvm-svn: 355728
Move the x86 combine from D58974 into the DAGCombine VSELECT code and update the SELECT version to use the isBooleanFlip helper as well.
Requested by @spatel on D59006
llvm-svn: 355533
During the lowering of a switch that would result in the generation of a
jump table, a range check is performed before indexing into the jump
table, for the switch value being outside the jump table range and a
conditional branch is inserted to jump to the default block. In case the
default block is unreachable, this conditional jump can be omitted. This
patch implements omitting this conditional branch for unreachable
defaults.
Differential Revision: https://reviews.llvm.org/D52002
Reviewers: Hans Wennborg, Eli Friedman, Roman Lebedev
llvm-svn: 355490
During the lowering of a switch that would result in the generation of a
jump table, a range check is performed before indexing into the jump
table, for the switch value being outside the jump table range and a
conditional branch is inserted to jump to the default block. In case the
default block is unreachable, this conditional jump can be omitted. This
patch implements omitting this conditional branch for unreachable
defaults.
Differential Revision: https://reviews.llvm.org/D52002
Reviewers: Hans Wennborg, Eli Friedman, Roman Lebedev
llvm-svn: 355483
This caused the first matcher in the isel table for many targets to be Opc_Scope instead of Opc_SwitchOpcode. This leads to a significant increase in isel match failures.
llvm-svn: 355433
This patch enables combining integer bitcasts of integer build vectors when the new scalar type is legal. I've avoided floating point because the implementation bitcasts float to int along the way and we would need to check the intermediate types for legality
Differential Revision: https://reviews.llvm.org/D58884
llvm-svn: 355324
Summary:
Before, when we implemented the first EH proposal, a 'catch <tag>'
instruction might not catch an exception, so there were multiple EH pads an
exception could unwind to. That means a BB could have multiple EH pad
successors.
Now after we switched to the new proposal, every 'catch' instruction
catches an exception, and there is only one catchpad per catchswitch, so
we at most have one EH pad successor, making `ThrowUnwindDest` map in
`WasmEHInfo` unnecessary.
Keeping `ThrowUnwindDest` map in `WasmEHInfo` has its own problems,
because other optimization passes can split a BB that contains possibly
throwing calls (previously invokes), and we have to update the map every
time that happens, which is not easy for common CodeGen passes.
This also correctly updates successor info in LateEHPrepare when we add
a rethrow instruction.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58486
llvm-svn: 355296
Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally, ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead we have to canonicalize the types of the all zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts.
By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up.
This removes something like 40,000 bytes from the X86 isel table.
Differential Revision: https://reviews.llvm.org/D58595
llvm-svn: 355224
Summary:
The description of KnownBits::zext() and
KnownBits::zextOrTrunc() has confusingly stated
that the operation is equivalent to zero extending the
value we're tracking. That has not been true; instead,
the user has been forced to explicitly set the extended
bits as known zero afterwards.
This patch adds a second argument to KnownBits::zext()
and KnownBits::zextOrTrunc() to control if the extended
bits should be considered as known zero or as unknown.
Reviewers: craig.topper, RKSimon
Reviewed By: RKSimon
Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58650
llvm-svn: 355099
At the moment, we mark every atomic memory access as being also volatile. This is unnecessarily conservative and prohibits many legal transforms (DCE, folding, etc.).
This patch removes MOVolatile from the MachineMemOperands of atomic, but not volatile, instructions. This should be strictly NFC after a series of previous patches which have gone in to ensure backend code is conservative about handling of isAtomic MMOs. Once it's in and baked for a bit, we'll start working through removing unnecessary bailouts one by one. We applied this same strategy to the middle end a few years ago, with good success.
To make sure this patch itself is NFC, it is built on top of a series of other patches which adjust code to (for the moment) be as conservative for an atomic access as for a volatile access and build up a test corpus (mostly in test/CodeGen/X86/atomics-unordered.ll).
Previously landed
D57593 Fix a bug in the definition of isUnordered on MachineMemOperand
D57596 [CodeGen] Be conservative about atomic accesses as for volatile
D57802 Be conservative about unordered accesses for the moment
rL353959: [Tests] First batch of cornercase tests for unordered atomics.
rL353966: [Tests] RMW folding tests w/unordered atomic operations.
rL353972: [Tests] More unordered atomic lowering tests.
rL353989: [SelectionDAG] Inline a single use helper function, and remove last non-MMO interface
rL354740: [Hexagon, SystemZ] Be super conservative about atomics
rL354800: [Lanai] Be super conservative about atomics
rL354845: [ARM] Be super conservative about atomics
Attention Out of Tree Backend Owners: This patch may break you. If it does, you can use the TLI getMMOFlags hook to restore the MOVolatile to any instruction you need to. (See llvm-dev thread titled "PSA: Changes to how atomics are handled in backends" started Feb 27, 2019.)
Differential Revision: https://reviews.llvm.org/D57601
llvm-svn: 355025
If SADDSAT/SSUBSAT are legal, then we can expand SADDO/SSUBO by performing an ADD/SUB and a SADDSAT/SSUBSAT and then comparing the results.
I looked at doing this for UADDO/USUBO as well but as we don't have to do as many range comparisons I didn't see any/much benefit.
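A scalar model of the signed-add case (hypothetical helpers standing in for the ADD and SADDSAT nodes):
```
#include <cassert>
#include <cstdint>

// Signed add overflows exactly when the saturating and wrapping sums differ.
static int8_t sadd_wrap(int8_t X, int8_t Y) {
  return (int8_t)(uint8_t)((uint8_t)X + (uint8_t)Y);  // two's-complement wrap
}
static int8_t sadd_sat(int8_t X, int8_t Y) {
  int S = X + Y;
  if (S > INT8_MAX) return INT8_MAX;
  if (S < INT8_MIN) return INT8_MIN;
  return (int8_t)S;
}

int main() {
  for (int X = INT8_MIN; X <= INT8_MAX; ++X)
    for (int Y = INT8_MIN; Y <= INT8_MAX; ++Y) {
      bool RealOvf = (X + Y) > INT8_MAX || (X + Y) < INT8_MIN;
      bool ViaSat = sadd_sat((int8_t)X, (int8_t)Y) != sadd_wrap((int8_t)X, (int8_t)Y);
      assert(RealOvf == ViaSat);
    }
  return 0;
}
```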
Differential Revision: https://reviews.llvm.org/D58637
llvm-svn: 354866
These helpers extend the existing isConstOrConstSplat helper checks to support DemandedElts masks as well.
We already had a local version of this in SelectionDAG that computeKnownBits/ComputeNumSignBits made use of, but this adds the functionality directly to the BuildVectorSDNode node and extends isConstOrConstSplat etc. to use that.
This will allow us to reuse the functionality in SimplifyDemandedVectorElts/SimplifyDemandedBits.
Differential Revision: https://reviews.llvm.org/D58503
llvm-svn: 354797
Support undef shuffle mask indices in the shuffle(concat_vectors, concat_vectors) -> concat_vectors fold
Differential Revision: https://reviews.llvm.org/D58585
llvm-svn: 354793
OPC_CheckCondCode is always used as operand 2 of a setcc. And it's always surrounded by a MoveChild2 and a MoveParent. By having a dedicated opcode for this case we can reduce the number of bytes needed for this pattern from 4 bytes to 2.
This saves ~3000 bytes in the X86 table.
llvm-svn: 354763
Summary:
When promoting the overflow vector for these ops we should use the target's desired setcc result type. This way a v8i32 result type will use a v8i32 overflow vector instead of a v8i16 overflow vector. A v8i16 overflow vector will cause LegalizeDAG/LegalizeVectorOps to have to use v8i32 and truncate to v8i16 in its expansion. By doing this in type legalization instead, we get the truncate into the DAG earlier and give DAG combine more of a chance to optimize it.
We also have to fix unrolling to use the scalar setcc result type for the scalarized operation, and convert it to the required vector element type after the scalar operation. We have to observe the vector boolean contents when doing this conversion. The previous code was just taking the scalar result and putting it in the vector. But for X86 and AArch64 that would have only put the boolean value in bit 0 of the element and left all other bits in the element as 0. We need to ensure all bits in the element are the same. I'm using a select with constants here because that's what setcc unrolling in LegalizeVectorOps used.
Reviewers: spatel, RKSimon, nikic
Reviewed By: nikic
Subscribers: javed.absar, kristof.beyls, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58567
llvm-svn: 354753
r354648 was a follow-up to fix a regression: "[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit."
These were reverted in r354713 as their context depended on other patches that were reverted for a bug.
llvm-svn: 354734
r354363 caused https://crbug.com/934963#c1, which has a plain C reduced
test case.
I also had to revert some dependent changes:
- r354648
- r354647
- r354640
- r354511
llvm-svn: 354713
When we need to merge two adjacent loads the AND mask for the low piece was still sized for the full src element size. But we didn't have that many bits. The upper bits are already zero due to the SRL. So we can skip the AND if we're going to combine with the high bits.
We do need an AND to clear out any bits from the high part. We were ANDing the high part before combining with the low part, but it looks like ANDing after the OR gets better results.
So we can just emit the final AND after the optional concatenation is done. That handles skipping the AND before the OR and gets rid of extra high bits after the OR.
llvm-svn: 354655
Otherwise we end up creating extract_vector_elts that then each need to have their input promoted. This can lead to truncates needing to be emitted for each of those.
But we already emitted any_extends when we legalized the extract_subvector. So now we have pairs of any_extend+trunc that partially cancel. But depending on how DAGCombiner visits them we can get weird results.
By promoting the input at the same time we can create only a single any_extend or truncate.
There's one regression in the vector-narrow-binop.ll case, but that looks easy to fix with a follow up patch.
llvm-svn: 354647
This fold can occur during legalization, so it can fight with promotion
to the larger type. It apparently takes a special sequence and subtarget
to avoid more basic simplifications that would hide the problem.
But there's a bigger question raised here: why does distributeTruncateThroughAnd()
even exist? It duplicates functionality from a more minimal pattern that we
already have. But getting rid of this function requires some preliminary steps.
https://bugs.llvm.org/show_bug.cgi?id=40793
llvm-svn: 354594
If the LHS has known zeros, then the RHS immediate mask might have been simplified to remove those bits.
This patch adds a call to computeKnownBits to get the known zeroes to handle that possibility. I left an early out to skip the call if all of the demanded bits are set in the mask.
Differential Revision: https://reviews.llvm.org/D58464
llvm-svn: 354514
Second part of https://bugs.llvm.org/show_bug.cgi?id=40442.
This adds an extra UnrollVectorOverflowOp() method to SDAG, because
the general UnrollOverflowOp() method can't deal with multiple results.
Additionally we need to expand UMULO/SMULO during vector op
legalization, as it may result in unrolling, which may need additional
type legalization.
Differential Revision: https://reviews.llvm.org/D57997
llvm-svn: 354513
If the bit position has known zeros in it, then the AND immediate will likely be optimized to remove bits.
This can prevent GetDemandedBits from recognizing that the AND is unnecessary.
llvm-svn: 354498
Directly use the correct shift amount type if it is possible, and
future-proof the code against vectors. The added test makes sure that
bitwidths that do not fit into the shift amount type do not assert.
Split out from D57997.
llvm-svn: 354359
Summary:
A store to an object whose lifetime is about to end can be removed.
See PR40550 for motivation.
Reviewers: niravd
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D57541
llvm-svn: 354244
In preparation for supporting vector expansion.
Add an isPostTypeLegalization flag to makeLibCall(), because this
expansion relies on the legalized form using MERGE_VALUES. Drop
the corresponding variant of ExpandLibCall, which is no longer used.
Differential Revision: https://reviews.llvm.org/D58006
llvm-svn: 354226
While rebasing a refactor in r353950 I accidentally swapped two function
arguments; one is SelectionDAGBuilder's "current" DebugLoc, the other is the one
from the "current" debug intrinsic. They're probably always identical, but I
haven't proved that yet.
llvm-svn: 354019
For D57601, we need to know whether the instruction is volatile. We'd either have to pass yet another parameter, or just standardize on the MMO interface. I chose the second.
llvm-svn: 353989
The helper function was used by only two callers, and largely ended up providing distinct functionality based on optional arguments and opcode. Inline and simplify to make the functionality much clearer.
llvm-svn: 353977
In this patch SelectionDAG tries to salvage any dbg.values that are going to be
dropped, in case they can be recovered from Values in the current BB. It also
strengthens SelectionDAG's handling of dangling debug data, so that dbg.values
are *always* emitted (as Undef or otherwise) instead of dangling forever.
The motivation behind this patch exists in the new test case: a memory address
(here a bitcast and GEP) exists in one basic block, and a dbg.value referring to
the address is left in the 'next' block. The base pointer is live across all
basic blocks. In current llvm trunk the dbg.value cannot be encoded, and it
isn't even emitted as an Undef DBG_VALUE.
The change is simply: if we're definitely going to drop a dbg.value, repeatedly
apply salvageDebugInfo to its operand until either we find something that can
be encoded, or we can't salvage any further in which case we produce an Undef
DBG_VALUE. To know when we're "definitely going to drop a dbg.value",
SelectionDAG signals SelectionDAGBuilder when all IR instructions have been
encoded to force salvaging. This ensures that any dbg.value that's dangling
after DAG creation will have a corresponding DBG_VALUE encoded.
Differential Revision: https://reviews.llvm.org/D57694
llvm-svn: 353954
This is a pure copy-and-paste job, moving the logic for lowering dbg.value
intrinsics to SDDbgValues into its own function. This is ahead of adding some
more users of this logic.
Differential Revision: https://reviews.llvm.org/D57697
llvm-svn: 353950
SelectionDAGBuilder has special handling for dbg.value intrinsics that are
understood to define the location of function parameters on entry to the
function. To enable this, we avoid recording a dbg.value as a virtual register
reference if it might be such a parameter, so that it later hits
EmitFuncArgumentDbgValue.
This patch reduces the set of circumstances where we avoid recording a
dbg.value as a virtual register reference, to allow more "normal" variables
to be recorded that way. We now only bypass for potential parameters if:
* The dbg.value operand is an Argument,
* The Variable is a parameter, and
* The Variable is not inlined.
meaning it's very likely that the dbg.value is a function-entry parameter
location.
Differential Revision: https://reviews.llvm.org/D57584
llvm-svn: 353948
If we're comparing some value for equality against 2 constants
and those constants have an absolute difference of just 1 bit,
then we can offset and mask off that 1 bit and reduce to a single
compare against zero:
and/or (setcc X, C0, ne), (setcc X, C1, ne/eq) -->
setcc ((add X, -C1), ~(C0 - C1)), 0, ne/eq
https://rise4fun.com/Alive/XslKj
This transform is disabled by default using a TLI hook
("convertSetCCLogicToBitwiseLogic()").
That should be overridden for AArch64, MIPS, Sparc and possibly
others based on the asm shown in:
https://bugs.llvm.org/show_bug.cgi?id=40611
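The identity can be checked exhaustively on 8-bit values, assuming C0 > C1 and that the constants differ by a single bit value; this is a standalone sketch of the equality form (the ne form is its negation), not the combine itself:
```
#include <cassert>
#include <cstdint>

int main() {
  for (unsigned C1 = 0; C1 < 256; ++C1)
    for (unsigned C0 = C1 + 1; C0 < 256; ++C0) {
      unsigned Diff = C0 - C1;
      if ((Diff & (Diff - 1)) != 0)
        continue;  // precondition: the absolute difference is a single bit
      for (unsigned X = 0; X < 256; ++X) {
        bool EitherEq = (X == C0) || (X == C1);
        // offset by -C1, mask off the differing bit, compare against zero
        bool Merged = ((uint8_t)(X - C1) & (uint8_t)~Diff) == 0;
        assert(EitherEq == Merged);
      }
    }
  return 0;
}
```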
llvm-svn: 353859
Summary:
The SMULO/UMULO DAG nodes, when not directly supported by the target,
expand to a multiplication twice as wide. In case that the resulting
type is not legal, the legalizer cannot directly call the intrinsic
with the wide arguments; instead, it "pre-lowers" them by splitting
them in halves.
rL283203 made sure that on big endian targets, the legalizer passes
the argument halves in the correct order. It did not do the same
for the return value halves because the existing code used a hack;
it put an illegal type into the DAG and hoped that nothing would break
and it would be correctly lowered elsewhere.
rL307207 fixed this, handling return value halves similar to how
argument halves are handled, but did not take big-endian targets
into account.
This commit fixes the expansion on big-endian targets, such as
the out-of-tree OR1K target.
Reviewers: eli.friedman, vadimcn
Subscribers: george-hopkins, efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D45355
llvm-svn: 353854
Summary:
This patch fixes PR40587.
When a dbg.value intrinsic is emitted to the DAG
by using EmitFuncArgumentDbgValue the resulting
DBG_VALUE is hoisted to the beginning of the entry
block. I think the idea is to be able to locate
a formal argument already from the start of the
function.
However, EmitFuncArgumentDbgValue only checked that
the value that was used to describe a variable was
originating from a function parameter, not that the
variable itself actually was an argument to the
function. So when for example assigning a local
variable "local" the value from an argument "a",
the associated DBG_VALUE instruction would be hoisted
to the beginning of the function, even if the scope
for "local" started somewhere else (or if "local"
was mapped to other values earlier in the function).
This patch adds some logic to EmitFuncArgumentDbgValue
to check that the variable being described actually
is an argument to the function. And that the dbg.value
being lowered already is in the entry block. Otherwise
we bail out, and the dbg.value will be handled as an
ordinary dbg.value (not as a "FuncArgumentDbgValue").
A tricky situation is when both the variable and
the value is related to function arguments, but not
necessarily the same argument. We make sure that we
do not describe the same argument more than once as
a "FuncArgumentDbgValue". This solution works as long
as opt has injected a "first" dbg.value that corresponds
to the formal argument at the function entry.
Reviewers: jmorse, aprantl
Subscribers: jyknight, hiraditya, fedor.sergeev, dstenb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57702
llvm-svn: 353735
`CallBase` class rather than `CallSite` wrappers.
I pushed this change down through most of the statepoint infrastructure,
completely removing the use of CallSite where I could reasonably do so.
I ended up making a couple of cut-points: generic call handling
(instcombine, TLI, SDAG). As soon as it hit truly generic handling with
users outside the immediate code, I simply transitioned into or out of
a `CallSite` to make this a reasonable sized chunk.
Differential Revision: https://reviews.llvm.org/D56122
llvm-svn: 353660
Now that we have vector support for [US](ADD|SUB)O we no longer
need to scalarize when expanding [US](ADD|SUB)SAT.
This matches what the cost model already does.
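A scalar model of the non-scalarized expansion path, with plain C++ standing in for the UADDO/USUBO nodes and the select on the overflow flag:
```
#include <cassert>
#include <cstdint>

static uint8_t uaddsat(uint8_t A, uint8_t B) {
  uint8_t R = uint8_t(A + B);  // UADDO result
  bool Ovf = R < A;            // UADDO overflow flag
  return Ovf ? UINT8_MAX : R;  // select the saturation value on overflow
}
static uint8_t usubsat(uint8_t A, uint8_t B) {
  uint8_t R = uint8_t(A - B);  // USUBO result
  bool Ovf = A < B;            // USUBO overflow (borrow) flag
  return Ovf ? 0 : R;
}

int main() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B) {
      unsigned Add = A + B, Sub = (A >= B) ? A - B : 0;
      assert(uaddsat(A, B) == (Add > 255 ? 255 : Add));
      assert(usubsat(A, B) == Sub);
    }
  return 0;
}
```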
Differential Revision: https://reviews.llvm.org/D57348
llvm-svn: 353651
Now that we have SimplifyDemandedBits support for funnel shifts (rL353539), we need to simplify funnel shifts back to bitshifts in cases where either argument has been folded to undef/zero.
Differential Revision: https://reviews.llvm.org/D58009
llvm-svn: 353645
SimplifySetCC still has much room for improvement, but this should
fix the remaining problem examples from:
https://bugs.llvm.org/show_bug.cgi?id=40657
The initial fix for this problem was rL353615.
llvm-svn: 353639
There's effectively no difference for the cases with variables.
We just trade a sub for an add on those. But the case with a
subtract from constant would require an extra move instruction
on x86, so this looks like a reasonable generic combine.
llvm-svn: 353619
In preparation for supporting vector expansion.
Also drop a variant of ExpandLibCall, of which the MULO expansions
were the only user.
llvm-svn: 353611
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html
This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.
This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.
There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.
Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii
Differential Revision: https://reviews.llvm.org/D53765
llvm-svn: 353563
The sqrt case is faster and we already do this for the case where
the exponent is 0.25. This adds the 0.75 case which is also not
sensitive to signed zeros.
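A quick numerical illustration of the rewrite, with std::pow/std::sqrt standing in for the DAG nodes:
```
#include <cmath>
#include <cstdio>

int main() {
  // x^0.75 == sqrt(x) * sqrt(sqrt(x)), and x^0.25 == sqrt(sqrt(x)),
  // ignoring signed zeros as noted above.
  for (double X : {0.5, 2.0, 10.0, 12345.678}) {
    double ViaPow = std::pow(X, 0.75);
    double ViaSqrt = std::sqrt(X) * std::sqrt(std::sqrt(X));
    std::printf("%g: %.17g vs %.17g\n", X, ViaPow, ViaSqrt);
  }
  return 0;
}
```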
Patch by Whitney Tsang (Whitney)
Differential revision: https://reviews.llvm.org/D57434
llvm-svn: 353557
Replace OR(SHL,SRL) pattern with ISD::FSHR (legalization expands this later if necessary) - this helps with the scale == 0 'undefined' drop-through case that was discussed on D55720.
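A scalar model of the FSHR semantics, and of why the shift-amount == 0 case is well defined there, unlike in the raw OR(SHL,SRL) pattern:
```
#include <cstdint>
#include <cstdio>

// Model of a 32-bit funnel shift right: concatenate Hi:Lo and shift the
// 64-bit value right by Amt (mod 32), keeping the low 32 bits. Amt == 0
// simply returns Lo, whereas the OR(SHL, SRL) form would shift by 32.
static uint32_t fshr32(uint32_t Hi, uint32_t Lo, unsigned Amt) {
  uint64_t Concat = ((uint64_t)Hi << 32) | Lo;
  return (uint32_t)(Concat >> (Amt % 32));
}

int main() {
  std::printf("0x%08x\n", fshr32(0xAAAAAAAAu, 0x55555555u, 4));  // 0xA5555555
  std::printf("0x%08x\n", fshr32(0xAAAAAAAAu, 0x55555555u, 0));  // 0x55555555
  return 0;
}
```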
llvm-svn: 353546
This is part of https://bugs.llvm.org/show_bug.cgi?id=40442.
Vector legalization is implemented for the add/sub overflow opcodes.
UMULO/SMULO are also handled as far as legalization is concerned, but
they don't support vector expansion yet (so no tests for them).
The vector result widening implementation is suboptimal, because it
could result in a legalization loop.
Differential Revision: https://reviews.llvm.org/D57639
llvm-svn: 353464
Move the (add (umax X, C), -C) --> (usubsat X, C) X86 combine into generic DAGCombiner
First of a number of saturated arithmetic folds that can be moved out of X86-specific code for PR40111.
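The fold can be checked exhaustively on 8-bit values (standalone sketch, not the DAGCombiner code):
```
#include <algorithm>
#include <cassert>
#include <cstdint>

int main() {
  // max(X, C) - C is the same as the unsigned saturating subtraction.
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned C = 0; C < 256; ++C) {
      uint8_t ViaMax = uint8_t(std::max<uint8_t>(X, C) - C);  // (add (umax X, C), -C)
      uint8_t Sat = (X >= C) ? uint8_t(X - C) : 0;            // usubsat X, C
      assert(ViaMax == Sat);
    }
  return 0;
}
```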
Differential Revision: https://reviews.llvm.org/D57754
llvm-svn: 353457
I noticed that we are missing this canonicalization in IR:
rL352515
...and then realized that we don't get this right in SDAG either,
so this has to be fixed first regardless of what we choose to do in IR.
The existing fold was limited to scalars and using the wrong predicate
to guard the transform. We have a boolean contents TLI query that can
be used to decide which direction to fold.
This may eventually lead back to the problems/question in:
https://bugs.llvm.org/show_bug.cgi?id=40486
...but it makes no difference to that yet.
Differential Revision: https://reviews.llvm.org/D57401
llvm-svn: 353433
Summary:
If the index isn't constant, this transform inserts a multiply and an add on the index to calculate the base pointer for a scalar load. But we still create a memory operand with an offset of 0 and the size of the scalar access, while the access is really to an unknown offset within the original access size.
This can cause the machine scheduler to incorrectly calculate dependencies between this load and other accesses. In the case we saw, there was a 32 byte vector store that was split into two 16 byte stores, one with offset 0 and one with offset 16. The size of the memory operand for both was 16. The scheduler correctly detected the alias with the offset 0 store, but not the offset 16 store.
This patch discards the pointer info so we don't incorrectly detect aliasing. I wasn't sure if we could keep using the original offset and size without risking some other transform on the load changing the size.
I tried to reduce a test case, but there are still a lot of memory operations needed to get the scheduler to do the bad reordering. So it looked pretty fragile to maintain.
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57616
llvm-svn: 353124
Add an intrinsic that takes 2 unsigned integers with the scale of them
provided as the third argument and performs fixed point multiplication on
them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
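An illustrative scalar model of the described semantics (widen, multiply, shift right by the scale); this is not the intrinsic's actual lowering:
```
#include <cstdint>
#include <cstdio>

// Multiply two unsigned 32-bit fixed-point numbers that carry 'Scale'
// fractional bits: form the full 64-bit product and drop Scale bits.
static uint32_t umul_fix(uint32_t A, uint32_t B, unsigned Scale) {
  uint64_t Wide = (uint64_t)A * (uint64_t)B;
  return (uint32_t)(Wide >> Scale);
}

int main() {
  // With Scale = 16, 0x00018000 represents 1.5; 1.5 * 1.5 = 2.25 = 0x00024000.
  std::printf("0x%08x\n", umul_fix(0x00018000u, 0x00018000u, 16));
  return 0;
}
```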
Differential Revision: https://reviews.llvm.org/D55625
llvm-svn: 353059
Noticed while investigating PR40483, and fixes the basic test case from the bug - but not a more general case.
We're pretty weak at dealing with ADD/SUB combines compared to the SimplifyAssociativeOrCommutative/SimplifyUsingDistributiveLaws abilities that InstCombine can manage.
llvm-svn: 353044
We already have the getConstantOperandVal helper which returns a uint64_t, but along comes the fuzzer and inserts an i128 -1 constant or something and the whole thing asserts.
I've updated a few obvious cases, and tried to make use of the const reference where possible, but there's more to do. A number of existing oss-fuzz tickets should be fixed if we start using APInt and perform value clamping where necessary.
llvm-svn: 352961
Summary: This fixes using the correct stack registers for SEH when stack realignment is needed or when variable size objects are present.
Reviewers: rnk, efriedma, ssijaric, TomTan
Reviewed By: rnk, efriedma
Subscribers: javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D57183
llvm-svn: 352923
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57170
llvm-svn: 352909
The version of FoldConstantArithmetic() that takes arbitrary nodes
was confusingly naming those nodes as constants when they might
not be; also "Cst" reads like "Cast".
llvm-svn: 352884
This might be the start of tracking all vector element constants generally if we take it to its
logical conclusion, but let's stop here and make sure this is correct/beneficial so far.
The affected tests require a convoluted path before they get simplified currently because we
don't call SimplifyDemandedVectorElts() from binops directly and don't modify the binop operands
directly in SimplifyDemandedVectorElts().
That's why the tests all have a trailing shuffle to induce a chain reaction of transforms. So
something like this is happening:
1. Improve the knowledge of undefs in the binop via a SimplifyDemandedVectorElts() call that
originates from a shuffle.
2. Transfer that undef knowledge back to the shuffle mask user as more undef lanes.
3. Combine the modified shuffle by calling SimplifyDemandedVectorElts() again.
4. Translate the improved shuffle mask as undemanded lanes of build vector constants causing
those to become full undef constants.
5. Simplify the binop now that it has a full undef operand.
As we can see from the unchanged 'and' and 'or' tests, tracking undefs alone isn't a full solution.
We would need to track zero and all-ones constants to improve those opcodes. We'd probably need to
track NaN for FP ops too (assuming we don't have fast-math-flags set).
Differential Revision: https://reviews.llvm.org/D57066
llvm-svn: 352880
For targets where i32 is not a legal type (e.g. 64-bit RISC-V),
LegalizeIntegerTypes must promote the integer operand of ISD::FPOWI. As this
is a signed value, this should be sign-extended.
This patch enables all tests in test/CodeGen/RISCV/float-intrinsics.ll for
RV64, as prior to this patch that file couldn't be compiled for RV64 due to an
assertion when performing codegen for fpowi.
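A small illustration of why sign extension matters here, using std::pow as a stand-in for llvm.powi:
```
#include <cmath>
#include <cstdint>
#include <cstdio>

int main() {
  // A negative i32 exponent zero-extended to i64 becomes a huge positive
  // exponent and changes the result; sign extension preserves it.
  int32_t Exp = -2;
  int64_t SExt = (int64_t)Exp;            // sign extension: still -2
  int64_t ZExt = (int64_t)(uint32_t)Exp;  // zero extension: 4294967294
  std::printf("powi-like: %g vs %g (zext exponent %lld)\n",
              std::pow(2.0, (double)SExt), std::pow(2.0, (double)ZExt),
              (long long)ZExt);
  return 0;
}
```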
Differential Revision: https://reviews.llvm.org/D54574
llvm-svn: 352832
This patch fixes pr39098.
For the attached test case, CombineZExtLogicopShiftLoad can optimize it to
t25: i64 = Constant<1099511627775>
t35: i64 = Constant<0>
t0: ch = EntryToken
t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64
t58: i64 = srl t57, Constant:i8<1>
t60: i64 = and t58, Constant:i64<524287>
t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64
But later visitANDLike transforms it to
t25: i64 = Constant<1099511627775>
t35: i64 = Constant<0>
t0: ch = EntryToken
t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64
t61: i32 = truncate t57
t63: i32 = srl t61, Constant:i8<1>
t64: i32 = and t63, Constant:i32<524287>
t65: i64 = zero_extend t64
t58: i64 = srl t57, Constant:i8<1>
t60: i64 = and t58, Constant:i64<524287>
t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64
And it triggers CombineZExtLogicopShiftLoad again, causes a dead loop.
Both forms should generate the same instructions, and the IR generated by CombineZExtLogicopShiftLoad looks cleaner. But it looks more difficult to prevent visitANDLike from doing the transform, so I prevent CombineZExtLogicopShiftLoad from doing the transform if the ZExt is free.
Differential Revision: https://reviews.llvm.org/D57491
llvm-svn: 352792
While dangling nodes will eventually be pruned when they are
considered, leaving them disables combines requiring single-use.
Reviewers: Carrot, spatel, craig.topper, RKSimon, efriedma
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D57520
llvm-svn: 352784
For zero scale SMULFIX, expand into MUL which produces better code for X86.
For vector arguments, expand into MUL if SMULFIX is provided with a zero scale.
Otherwise, expand into MULH[US] or [US]MUL_LOHI.
Differential Revision: https://reviews.llvm.org/D56987
llvm-svn: 352783
And instead just generate a libcall. My motivating example on ARM was a simple:
shl i64 %A, %B
for which the code bloat is quite significant. For other targets that also
accept __int128/i128 such as AArch64 and X86, it is also beneficial for these
cases to generate a libcall when optimising for minsize. On these 64-bit targets,
the 64-bits shifts are of course unaffected because the SHIFT/SHIFT_PARTS
lowering operation action is not set to custom/expand.
Differential Revision: https://reviews.llvm.org/D57386
llvm-svn: 352736
Summary:
Fixes PR40267, in which the removed assertion was triggering on
perfectly valid IR. As far as I can tell, constant out of bounds
indices should be allowed when splitting extract_vector_elt, since
they will simply be propagated as out of bounds indices in the
resulting split vector and handled appropriately elsewhere.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya
Differential Revision: https://reviews.llvm.org/D57471
llvm-svn: 352702
These can be triggered by mistakenly using a 64-bit-mode-only intrinsic with -mtriple=i686. Using report_fatal_error gives a better experience for this mistake in release builds instead of probably crashing.
We already do this for some of the vector type legalization handlers.
llvm-svn: 352699
This extends the existing transform for:
add X, 0/1 --> sub X, 0/-1
...to allow the sibling subtraction fold.
This pattern could regress with the proposed change in D57401.
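Both directions of the fold can be checked exhaustively for a boolean-like 0/1 operand (standalone sketch):
```
#include <cassert>
#include <cstdint>

int main() {
  // For a value known to be 0 or 1 (e.g. a zero-extended setcc result),
  // X + (0/1) is the same as X - (0/-1), and vice versa.
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Bit = 0; Bit <= 1; ++Bit) {
      uint8_t ZExtB = uint8_t(Bit);        // 0 or 1
      uint8_t SExtB = uint8_t(-(int)Bit);  // 0 or 0xFF (-1)
      assert(uint8_t(X + ZExtB) == uint8_t(X - SExtB));
      // ...and the sibling subtraction fold added here.
      assert(uint8_t(X - ZExtB) == uint8_t(X + SExtB));
    }
  return 0;
}
```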
llvm-svn: 352680
Summary:
This switches the EH implementation to the new proposal:
https://github.com/WebAssembly/exception-handling/blob/master/proposals/Exceptions.md
(The previous proposal was
https://github.com/WebAssembly/exception-handling/blob/master/proposals/old/Exceptions.md)
- Instruction changes
- Now we have one single `catch` instruction that returns a except_ref
value
- `throw` now can take variable number of operations
- `rethrow` does not have 'depth' argument anymore
- `br_on_exn` queries an except_ref to see if it matches the tag and
branches to the given label if true.
- `extract_exception` is a pseudo instruction that simulates popping
values from wasm stack. This is to make `br_on_exn`, a very special
instruction, work: `br_on_exn` puts values onto the stack only if it
is taken, and the # of values can vary depending on the tag.
- Now there's only one `catch` per `try`, this patch removes all special
handling for terminate pad with a call to `__clang_call_terminate`.
Before it was the only case there are two catch clauses (a normal
`catch` and `catch_all` per `try`).
- Make `rethrow` act as a terminator like `throw`. This splits BB after
`rethrow` in WasmEHPrepare, and deletes an unnecessary `unreachable`
after `rethrow` in LateEHPrepare.
- Now we stop at all catchpads (because we add wasm `catch` instruction
that catches all exceptions), this creates new
`findWasmUnwindDestinations` function in SelectionDAGBuilder.
- Now we use the `br_on_exn` instruction to figure out if an except_ref
matches the current tag or not, LateEHPrepare generates this sequence
for catch pads:
```
catch
block i32
br_on_exn $__cpp_exception
end_block
extract_exception
```
- Branch analysis for `br_on_exn` in WebAssemblyInstrInfo
- Other various misc. changes to switch to the new proposal.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D57134
llvm-svn: 352598
This is the sibling fold for insert-of-insert that was added with D56604.
Now that we have x86 shuffle narrowing (D57156), this change shows improvements for
lots of AVX512 reduction code (not sure that we would ever expect extract-of-extract otherwise).
There's a small regression in some of the partial-permute tests (extracting followed by splat).
That is tracked by PR40500:
https://bugs.llvm.org/show_bug.cgi?id=40500
Differential Revision: https://reviews.llvm.org/D57336
llvm-svn: 352528
This fixes most references to the paths:
llvm.org/svn/
llvm.org/git/
llvm.org/viewvc/
github.com/llvm-mirror/
github.com/llvm-project/
reviews.llvm.org/diffusion/
to instead point to https://github.com/llvm/llvm-project.
This is *not* a trivial substitution, because additionally, all the
checkout instructions had to be migrated to instruct users on how to
use the monorepo layout, setting LLVM_ENABLE_PROJECTS instead of
checking out various projects into various subdirectories.
I've attempted to not change any scripts here, only documentation. The
scripts will have to be addressed separately.
Additionally, I've deleted one document which appeared to be outdated
and unneeded:
lldb/docs/building-with-debug-llvm.txt
Differential Revision: https://reviews.llvm.org/D57330
llvm-svn: 352514
During the lowering of a switch that would result in the generation of a
jump table, a range check is performed before indexing into the jump
table, for the switch value being outside the jump table range and a
conditional branch is inserted to jump to the default block. In case the
default block is unreachable, this conditional jump can be omitted. This
patch implements omitting this conditional branch for unreachable
defaults.
Review ID: D52002
Reviewers: Hans Wennborg, Eli Friedman, Roman Lebedev
llvm-svn: 352484
A FrameIndex should be valid throughout a block regardless of what instructions
get selected in that block -- therefore we shouldn't harness dbg.values that
refer to FrameIndexes to an SDNode. There are numerous codegen reasons why
an SDNode never appears or doesn't become a location that a DBG_VALUE can
refer to. None of them actually affect the variable location.
Therefore, before any other tests to encode dbg_values in a SelectionDAG,
identify FrameIndex operands and encode them unattached to any SDNode.
Differential Revision: https://reviews.llvm.org/D57328
llvm-svn: 352467
This did not cause the buildbot failure it was previously reverted for.
Original commit message:
I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits.
This patch changes this and then modifies the X86 post legalization combine to emit an extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inreg.
On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all.
llvm-svn: 352433
Followup to D56636, this time handling the UADDSAT case by expanding
uadd.sat(a, b) to umin(a, ~b) + b.
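The expansion can be checked exhaustively on 8-bit values (standalone sketch of the identity, not the legalizer code):
```
#include <algorithm>
#include <cassert>
#include <cstdint>

int main() {
  // uadd.sat(a, b) == umin(a, ~b) + b for 8-bit values.
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B) {
      uint8_t Sat = (A + B > 255) ? 255 : uint8_t(A + B);
      uint8_t Min = std::min<uint8_t>(uint8_t(A), uint8_t(~B));
      uint8_t Expanded = uint8_t(Min + B);
      assert(Sat == Expanded);
    }
  return 0;
}
```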
Differential Revision: https://reviews.llvm.org/D56869
llvm-svn: 352409
This patch improves the placement of DBG_VALUEs emitted by SelectionDAG, which
as documented in PR40427 can go very wrong. At the core of this is
ProcessSourceNode, which assumes the last instruction in a BB is the start
of the last processed IR instruction, which isn't always true.
Instead, use a helper function to call InstrEmitter::EmitNode, that records
before-and-after iterators and determines the first of any new instruction
created during emission. This is passed to ProcessSourceNode, which can
then make more enlightened decisions about ordering for DBG_VALUE placement.
Differential revision: https://reviews.llvm.org/D57163
llvm-svn: 352350
Summary:
I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits.
This patch changes this and then modifies the X86 post legalization combine to emit an extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inreg, but I just did what we already do in LowerLoad. I think we can actually get rid of this code entirely if we switch to -x86-experimental-vector-widening-legalization.
On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D57186
llvm-svn: 352255
It should be emitted when any floating-point operations (including
calls) are present in the object, not just when calls to printf/scanf
with floating point args are made.
The difference caused by this is very subtle: in static (/MT) builds,
on x86-32, in a program that uses floating point but doesn't print it,
the default x87 rounding mode may not be set properly upon
initialization.
This commit also removes the walk of the types pointed to by pointer
arguments in calls. (To assist in opaque pointer types migration --
eventually the pointee type won't be available.)
That latter implies that it will no longer consider a call like
`scanf("%f", &floatvar)` as sufficient to emit _fltused on its
own. And without _fltused, `scanf("%f")` will abort with error R6002. This
new behavior is unlikely to bite anyone in practice (you'd have to
read a float, and do nothing with it!), and also, is consistent with
MSVC.
Differential Revision: https://reviews.llvm.org/D56548
llvm-svn: 352076
The current check in CombineToPreIndexedLoadStore is too
conservative, preventing a pre-indexed store when the base pointer
is a predecessor of the value being stored. Instead, we should check
the pointer operand of the store.
Differential Revision: https://reviews.llvm.org/D56719
llvm-svn: 351933
Also add debug prints in the default case of the switches in these routines.
Most if not all of the type legalization handlers already do this, so this makes promoting floats consistent.
llvm-svn: 351890
vecbo (insertsubv undef, X, Z), (insertsubv undef, Y, Z) --> insertsubv VecC, (vecbo X, Y), Z
This is another step in generic vector narrowing. It's also a step towards more horizontal op
formation specifically for x86 (although we still failed to match those in the affected tests).
The scalarization cases are also not optimal (we should be scalarizing those), but it's still
an improvement to use a narrower vector op when we know part of the result must be constant
because both inputs are undef in some vector lanes.
I think a similar match but checking for a constant operand might help some of the cases in
D51553.
Differential Revision: https://reviews.llvm.org/D56875
llvm-svn: 351825
The regression test is reduced from the example shown in D56281.
This does raise a question as noted in the test file: do we want
to handle this pattern? I don't have a motivating example for
that on x86 yet, but it seems like we could have that pattern
there too, so we could avoid the back-and-forth using a shuffle.
llvm-svn: 351753
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
This patch makes some changes related to -dag-dump-verbose.
Main use case has been when debugging how SelectionDAG is
dealing with debug info (SDDbgValue nodes).
1) We now print the number of DbgValues that are mapped to each
SDNode.
2) Removed duplicated printing of DebugLoc (nowadays DebugLoc is
printed also when not using -dag-dump-verbose).
3) Renamed SDDbgValue::dump to SDDbgValue::print, and added a
new SDDbgValue::dump that will start a new line after calling
print.
4) SDDbgValue::print now prints "Order", and it also prints
some additional information when kind is CONST/FRAMEIX/VREG.
5) SelectionDAG::dump() now dumps all SDDbgValue nodes after
the list of SDNodes (both "regular" and "ByVal" SDDbgValues).
Invalidated nodes are not printed.
6) Prohibit inline printing of SDNode operands that has SDDbgValue
nodes associated to them.
Reviewers: jmorse, aprantl
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56793
llvm-svn: 351581
Similar to D55073. Without this change, the DAG combiner crashes on code
with more than 64k of stores in a single basic block that form parallelizable
chains.
No test case, as it would require a very large IR file.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D56740
llvm-svn: 351571
Defer inline asm's output fixup work until after we've generated the
inline asm node itself. Remove StoresToEmit, IndirectStoresToEmit, and
RetValRegs in favor of using ConstraintOperands.
llvm-svn: 351558
This functionality is required at multiple places which potentially
create large operand lists, like SelectionDAGBuilder or DAGCombiner.
Differential Revision: https://reviews.llvm.org/D56739
llvm-svn: 351552
Summary:
Use this helper to make sure we use the same value at various places.
This will likely be needed at more places where we currently crash
because we use more operands than possible.
Also makes it easier to change in the future.
Reviewers: RKSimon, craig.topper, efriedma, aemerson
Reviewed By: RKSimon
Subscribers: hiraditya, arsenm, llvm-commits
Differential Revision: https://reviews.llvm.org/D56859
llvm-svn: 351537
We should not pre-schedule a node that has an ADJCALLSTACKDOWN parent;
otherwise, when scheduling bottom-up, ADJCALLSTACKDOWN and
ADJCALLSTACKUP may hold the CallResource too long and prevent other
calls from being scheduled. If there's no other available node
to schedule, the scheduler will try to rename the register by
creating copy to avoid the conflict which will fail because
CallResource is not a real physical register.
llvm-svn: 351527
Summary:
This patch supports MS SEH extensions __try/__except/__finally. The intrinsics localescape and localrecover are responsible for communicating escaped static allocas from the try block to the handler.
We need to preserve frame pointers for SEH. So we create a new function/property HasLocalEscape.
Reviewers: rnk, compnerd, mstorsjo, TomTan, efriedma, ssijaric
Reviewed By: rnk, efriedma
Subscribers: smeenai, jrmuizel, alex, majnemer, ssijaric, ehsan, dmajor, kristina, javed.absar, kristof.beyls, chrib, llvm-commits
Differential Revision: https://reviews.llvm.org/D53540
llvm-svn: 351370
dbg.value intrinsics can appear in blocks where their operand is not used,
meaning the operand never receives an SDNode, and thus no DBG_VALUE will
be created. Get around this by looking to see whether the operand has already
been allocated a virtual register. This allows dbg.values of Phi node and
Values that are used across basic blocks to successfully be translated into
DBG_VALUEs.
Differential Revision: https://reviews.llvm.org/D56678
llvm-svn: 351358
The value returned by max() is the last valid value, adjust the
comparison accordingly.
The code added in D55073 creates TokenFactors with max() operands.
Reviewers: aemerson, efriedma, RKSimon, craig.topper
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D56738
llvm-svn: 351318
ReduceLoadWidth can trigger when a shifted mask is used, and this
requires that the function return a shl node to correct for the
offset. However, the way that this was implemented meant that the
returned result could be an existing node, which would be incorrect.
This fixes the method of inserting the new node and replacing uses.
Differential Revision: https://reviews.llvm.org/D50432
llvm-svn: 351310
The motivating case for this is shown in the first regression test. We are
transferring to scalar and back rather than just zero-extending with 'vpmovzxdq'.
That's a special-case for a more general pattern as shown here. In all tests,
we're avoiding the vector-scalar-vector moves in favor of vector ops.
We aren't producing optimal shuffle code in some cases though, so the patch is
limited to reduce regressions.
Differential Revision: https://reviews.llvm.org/D56281
llvm-svn: 351198
I accidentally triggered this code while doing some experiments and it doesn't look like it could possibly work.
It calls 'getNOT' on a node that should be a CondCode.
I think to do this right we would need to swap the branch target and the fallthrough target. But that's not easy to do. Or we could create an explicit SetCC and feed that into a new BR_CC?
llvm-svn: 351022
This pattern:
t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0>
t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0>
...shows up in PR33758:
https://bugs.llvm.org/show_bug.cgi?id=33758
...although this patch doesn't make any difference to the final result on that yet.
In the affected tests here, it looks like it just makes RA wiggle. But we might
as well squash this to prevent it interfering with other pattern-matching.
Differential Revision:
https://reviews.llvm.org/D56604
llvm-svn: 351008
This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types.
Differential Revision: https://reviews.llvm.org/D56544
llvm-svn: 350998
Summary:
When legalizing the result of a SELECT_CC node by promoting the
floating-point type, use the promoted-to type rather than the original
type.
Fix PR40273.
Reviewers: efriedma, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56566
llvm-svn: 350951
That is, remove many of the calls to Type::getNumContainedTypes(),
Type::subtypes(), and Type::getContainedType(N).
I'm not intending to remove these accessors -- they are
useful/necessary in some cases. However, removing the pointee type
from pointers would potentially break some uses, and reducing the
number of calls makes it easier to audit.
llvm-svn: 350835
This moves the single-use check from the general ShrinkDemandedConstant
to the BE because of the AArch64 regression after D56289/rL350475.
After several hours of experiments I did not come up with a testcase
failing on any other targets if the check is not performed.
Moreover, the direct call to ShrinkDemandedConstant is not really needed
and is superseded by SimplifyDemandedBits.
Differential Revision: https://reviews.llvm.org/D56406
llvm-svn: 350684
As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when it's not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions.
Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node.
This patch changes AMDGPU's simplifyI24, to use a combination of GetDemandedBits to handle the multiple use simplifications. And then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input.
Differential Revision: https://reviews.llvm.org/D56087
llvm-svn: 350560
The FSHL/FSHR nodes are handled in the expand function, but they need to also be listed in the code that queries for the operation action too.
llvm-svn: 350490
Fixes the cvt_f32_ubyte combine. performCvtF32UByteNCombine() could shrink
the source node to demanded bits only even if there are other uses.
Differential Revision: https://reviews.llvm.org/D56289
llvm-svn: 350475
This adds support for calculating sign bits of insert_subvector. I based it on the computeKnownBits implementation.
My motivating case is propagating sign bits information across basic blocks on AVX targets where concatenating using insert_subvector is common.
Differential Revision: https://reviews.llvm.org/D56283
llvm-svn: 350432
As noted in PR39973 and D55558:
https://bugs.llvm.org/show_bug.cgi?id=39973
...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine:
// extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index)
We want to have this in the DAG too because as we can see in some of the test diffs (reductions),
the pattern may not be visible in IR.
Given that this is already an IR canonicalization, any backend that would prefer a vector op over
a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's
a realistic expectation though). The transform is limited with a TLI hook because there's an
existing transform in CodeGenPrepare that tries to do the opposite transform.
Differential Revision: https://reviews.llvm.org/D55722
llvm-svn: 350354
Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced.
Improves the test case from PR38217. There may be additional opportunities after this.
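A small C++ sketch of why expanding both nodes together pays off: once one division sequence exists, the remainder can reuse it, whereas two independently expanded and separately combined sequences may no longer CSE (the helper name here is illustrative, not the actual expansion):
  #include <cassert>
  #include <cstdint>
  // Stand-in for an expanded division-by-constant sequence.
  static uint32_t expandedDivBy7(uint32_t X) { return X / 7; }
  int main() {
    uint32_t X = 1234;
    uint32_t Quot = expandedDivBy7(X); // one shared expansion
    uint32_t Rem = X - Quot * 7;       // remainder derived from it
    assert(Quot == X / 7 && Rem == X % 7);
  }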
Differential Revision: https://reviews.llvm.org/D56145
llvm-svn: 350239
By also promoting the input type we get a better idea of what scalar type to use. This can provide better results if the result of the extract is sign extended. What was previously happening is that the extract result would be legalized; some time later the input of the sign extend would be legalized using the result of the extract. Then later the extract input would be legalized, forcing a truncate into the input of the sign extend via a replace-all-uses. This requires DAG combine to combine out the sext/truncate pair. But sometimes we visited the truncate first and messed things up before the sext could be combined.
By creating the extract with the correct scalar type when we legalize the result type, the truncate will be added right away. Then when the sign_extend input is legalized it will create an any_extend of the truncate, which can be optimized by getNode to maybe remove the truncate, followed by a sign_extend_inreg. Now DAG combine doesn't have to worry about getting rid of the extend.
This fixes the regression on X86 in D56156.
Differential Revision: https://reviews.llvm.org/D56176
llvm-svn: 350236
If x has multiple sign bits then it doesn't matter which one we extend from, so we can sext from x's msb instead.
The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this.
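A rough C++ model of the underlying fact (sign_extend_inreg modeled with shifts; an arithmetic right shift on signed types is assumed): if the value already has more than one sign bit, extending from any position inside the sign-bit run gives the same result.
  #include <cassert>
  #include <cstdint>
  // Sign-extend the low 'Bits' bits of X back to 32 bits.
  static int32_t sextInReg(int32_t X, unsigned Bits) {
    unsigned Shift = 32 - Bits;
    return (int32_t)((uint32_t)X << Shift) >> Shift; // arithmetic shift assumed
  }
  int main() {
    int32_t X = -3;                     // has 30 sign bits
    assert(sextInReg(X, 8) == X);       // extend from bit 7: no-op
    assert(sextInReg(X, 3) == X);       // extend from bit 2: still a no-op
  }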
Differential Revision: https://reviews.llvm.org/D56156
llvm-svn: 350235
During the lowering of a switch that results in the generation of a jump
table, a range check is performed before indexing into the jump table, in
case the switch value is outside the jump table range, and a conditional
branch to the default block is inserted. If the default block is
unreachable, this conditional jump can be omitted. This patch implements
omitting this conditional branch for unreachable defaults.
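For illustration, a hypothetical source-level shape (using the GCC/Clang __builtin_unreachable builtin) where the default is provably unreachable, so the bounds check before the jump table adds nothing:
  int dispatch(int X) {
    switch (X) {
    case 0: return 10;
    case 1: return 11;
    case 2: return 12;
    case 3: return 13;
    default: __builtin_unreachable();   // default block unreachable: the
                                        // range check before indexing the
                                        // jump table can be omitted
    }
  }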
Review Reference: D52002
llvm-svn: 350186
The patch adds the ability to make library calls on NVPTX.
An important thing about library functions is that they must be defined
within the current module. This should basically guarantee that we
produce valid PTX assembly (without calls to undefined functions).
Whoever wants to use the libcalls will probably have to link against
compiler-rt or some other implementation.
Currently, it's completely impossible to make library calls because of
the error: LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we
can lower ExternalSymbol to TargetExternalSymbol and verify that the
function definition is available.
Also, there was an issue with the DAG during legalisation. When we
expand an instruction into a libcall, the inner call chain isn't being
"integrated" into the outer chain. Since the last "data-flow" (call
retval load) node is located earlier in the call chain than the
CALLSEQ_END node, the latter becomes a leaf and therefore a dead node
(and is removed quite quickly).
The solution proposed here relies on additional data-flow pseudo nodes
(ProxyReg) whose only purpose is to keep CALLSEQ_END alive through the
legalisation and instruction selection phases; we remove the pseudo
instructions before the register scheduling phase.
Patch by Denys Zariaiev!
Differential Revision: https://reviews.llvm.org/D34708
llvm-svn: 350069
This is an alternative to what I attempted in D56057.
GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users.
I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits.
Based on a patch that Simon Pilgrim sent me in email.
Fixes PR40142.
llvm-svn: 350059
More migration so we can disable the implicit int -> LocationSize
conversion.
All of these are either scatter/gather'ed vector instructions, or direct
loads. Hence, they're all precise.
Perhaps if we see way more getTypeStoreSize calls, we can make a
getTypeStoreLocationSize (or similar) as a wrapper that applies this
::precise. Doesn't appear that it's a good idea to make getTypeStoreSize
return a LocationSize itself, however.
llvm-svn: 350042
It's dangerous to knowingly create an illegal vector type
no matter what stage of combining we're in.
This prevents the missed folding/scalarization seen in:
https://bugs.llvm.org/show_bug.cgi?id=40146
llvm-svn: 350034
trunc (add X, C) --> add (trunc X), C'
If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type.
This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine).
This change used to show regressions for x86, but those are gone after D55494.
This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic)
that does almost the same thing.
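A quick C++ check of the identity being exploited, with truncation modeled as a cast to a narrower unsigned type (C' is just the truncated constant):
  #include <cassert>
  #include <cstdint>
  int main() {
    uint32_t X = 0x12345678, C = 0xABCDEF01;
    // trunc (add X, C) == add (trunc X), C'
    assert((uint8_t)(X + C) == (uint8_t)((uint8_t)X + (uint8_t)C));
  }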
Differential Revision: https://reviews.llvm.org/D55866
llvm-svn: 350006
This saves materializing the immediate. The additional forms are less
common (they don't usually show up for bitfield insert/extract), but
they're still relevant.
I had to add a new target hook to prevent DAGCombine from reversing the
transform. That isn't the only possible way to solve the conflict, but
it seems straightforward enough.
Differential Revision: https://reviews.llvm.org/D55630
llvm-svn: 349857
This patch enables funnel shift -> rotate building for all ROTL/ROTR custom/legal operations.
AFAICT X86 was the last target that was missing modulo support (PR38243), but I've tried to CC stakeholders for every target that has ROTL/ROTR custom handling for their final OK.
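The relationship being matched, sketched in C++ for a 32-bit value (a funnel shift whose two inputs are the same value is a rotate; the helpers below only model the modulo semantics and are not the DAG code):
  #include <cassert>
  #include <cstdint>
  static uint32_t fshl32(uint32_t Hi, uint32_t Lo, unsigned Amt) {
    Amt &= 31;                                  // amount is taken modulo 32
    return Amt ? (Hi << Amt) | (Lo >> (32 - Amt)) : Hi;
  }
  static uint32_t rotl32(uint32_t X, unsigned Amt) {
    Amt &= 31;
    return Amt ? (X << Amt) | (X >> (32 - Amt)) : X;
  }
  int main() {
    assert(fshl32(0xDEADBEEFu, 0xDEADBEEFu, 13) == rotl32(0xDEADBEEFu, 13));
  }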
Differential Revision: https://reviews.llvm.org/D55747
llvm-svn: 349765
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.
This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.
I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) fold to demonstrate its use, which I believe is safe for undef cases.
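A quick C++ sanity check of the bitwise identity behind the updated fold (scalar form; the patch's point is that per-element UNDEFs in the constant vector no longer block the match):
  #include <cassert>
  #include <cstdint>
  int main() {
    uint32_t X = 0x1234ABCD, C1 = 0x00FF00FF, C2 = 0x0F0F0000;
    // (or (and X, c1), c2) == (and (or X, c2), c1|c2)
    assert(((X & C1) | C2) == ((X | C2) & (C1 | C2)));
  }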
Differential Revision: https://reviews.llvm.org/D55822
llvm-svn: 349629
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.
This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.
Differential Revision: https://reviews.llvm.org/D55822
llvm-svn: 349628
As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect, as the output should guarantee that at least the new upper bits are zero.
SimplifyDemandedVectorElts is the worst offender (and it's the most likely to cause new undefs to appear), but DAGCombiner's tryToFoldExtendOfConstant has a similar issue.
Thanks to @dmgreen for catching this.
Differential Revision: https://reviews.llvm.org/D55883
llvm-svn: 349625
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts are simplifying vector elements, we're seeing more constant BUILD_VECTOR containing UNDEFs.
This patch provides opt-in handling of UNDEF elements in matchUnaryPredicate, passing NULL instead of the ConstantSDNode* argument.
I've updated SelectionDAG::simplifyShift to demonstrate its use.
Differential Revision: https://reviews.llvm.org/D55819
llvm-svn: 349616
SelectionDAG currently changes these intrinsics to function calls, but that won't work
for other ISels. Also, we want to eventually support nonlazybind and weak linkage coming
from the front-end, which we can't do in SelectionDAG.
llvm-svn: 349552
For opcodes not covered by SimplifyDemandedVectorElts, SimplifyDemandedBits might be able to help now that it supports demanded elts as well.
llvm-svn: 349466
The assertion type is always supposed to be a scalar type. So if the result VT of the assertion is a vector, we need to get the scalar VT before we can compare them.
Similarly for the assert above it.
I don't have a test case because I don't know of any place we violate this today. A coworker found this while trying to use r347287 on the 6.0 branch without also having r336868.
llvm-svn: 349390
This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen.
I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done, but I'd like to add these methodically with proper test coverage. At the same time, the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well.
Differential Revision: https://reviews.llvm.org/D55768
llvm-svn: 349374
We keep a few iterators into the basic block we're selecting while
performing FastISel. Usually this is fine, but occasionally code wants
to remove already-emitted instructions. When this happens we have to be
careful to update those iterators so they're not pointing at dangling
memory.
llvm-svn: 349365
The transform performs a bitwise logic op in a wider type followed by
truncate when both inputs are truncated from the same source type:
logic_op (truncate x), (truncate y) --> truncate (logic_op x, y)
There are a bunch of other checks that should prevent doing this when
it might be harmful.
We already do this transform for scalars in this spot. The vector
limitation was shared with a check for the case when the operands are
extended. I'm not sure if that limit is needed either, but that would
be a separate patch.
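The scalar identity the transform relies on, checked in C++ with truncation modeled as a narrowing cast:
  #include <cassert>
  #include <cstdint>
  int main() {
    uint64_t X = 0x1122334455667788ULL, Y = 0x99AABBCCDDEEFF00ULL;
    // logic_op (truncate x), (truncate y) == truncate (logic_op x, y)
    assert(((uint32_t)X & (uint32_t)Y) == (uint32_t)(X & Y));
    assert(((uint32_t)X | (uint32_t)Y) == (uint32_t)(X | Y));
    assert(((uint32_t)X ^ (uint32_t)Y) == (uint32_t)(X ^ Y));
  }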
Differential Revision: https://reviews.llvm.org/D55448
llvm-svn: 349303
Also exposes an issue in DAGCombiner::visitFunnelShift where we were assuming the shift amount had the result type (after legalization it'll have the target's shift amount type).
llvm-svn: 349298
Summary:
If the setcc already has the target-desired type, we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes VsetCC to be CSEd to N0, and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node.
To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term.
Reviewers: RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55459
llvm-svn: 349137
This isn't quite NFC, but I don't know how to expose
any outward diffs from these changes. Mostly, this
was confusing because it used 'VT' to refer to the
operand type rather than the usual type of the input node.
There's also a large block at the end that is dedicated
solely to matching loads, but that wasn't obvious. This
could probably be split up into separate functions to
make it easier to see.
It's still not clear to me when we make certain transforms
because the legality and constant conditions are
intertwined in a way that might be improved.
llvm-svn: 349095
This is a retry of rL349051 (reverted at rL349056). I changed the check for dead-ness from
number of uses to an opcode test for DELETED_NODE based on existing similar code.
Differential Revision: https://reviews.llvm.org/D55655
llvm-svn: 349058
Move existing rotation expansion code into TargetLowering and set it up for vectors as well.
Ideally this would share more of the funnel shift expansion, but we handle the shift amount modulo quite differently at the moment.
Begun removing x86 vector rotate custom lowering to use the expansion.
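Roughly the shape of the expansion for a power-of-two width, in C++ (a sketch, not the TargetLowering code; masking the amount gives the modulo behavior and keeps both shifts in range):
  #include <cassert>
  #include <cstdint>
  static uint32_t expandRotl32(uint32_t X, unsigned Amt) {
    Amt &= 31;                                       // rotate amount mod 32
    return (X << Amt) | (X >> ((32 - Amt) & 31));    // second shift stays < 32
  }
  int main() {
    assert(expandRotl32(0x80000001u, 1) == 0x00000003u);
    assert(expandRotl32(0x80000001u, 0) == 0x80000001u);
  }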
llvm-svn: 349025
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.
Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D55365
llvm-svn: 349016
This patch introduces a generic function to determine whether a given vector type is known to be a splat value for the specified demanded elements, recursing up the DAG looking for BUILD_VECTOR or VECTOR_SHUFFLE splat patterns.
It also keeps track of the elements that are known to be UNDEF - it returns true if all the demanded elements are UNDEF (as this may be useful under some circumstances), so this needs to be handled by the caller.
A wrapper variant is also provided that doesn't take the DemandedElts or UndefElts arguments for cases where we just want to know if the SDValue is a splat or not (with/without UNDEFS).
I had hoped to completely remove the X86 local version of this function, but I'm seeing some regressions in shift/rotate codegen that will take a little longer to fix, and I hope to get this in sooner so I can continue work on PR38243, which needs more capable splat detection.
Differential Revision: https://reviews.llvm.org/D55426
llvm-svn: 348953
If either of the operand elements are zero then we know the result element is going to be zero (even if the other element is undef).
Differential Revision: https://reviews.llvm.org/D55558
llvm-svn: 348926
Add an intrinsic that takes two signed integers, with their scale provided
as the third argument, and performs fixed-point multiplication on them.
This is part of implementing fixed-point arithmetic in clang, where some of
the more complex operations will be implemented as intrinsics.
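A rough C++ model of the operation, assuming the scale means the full product is shifted right by that many bits (widths and rounding here are illustrative, not the exact intrinsic definition):
  #include <cassert>
  #include <cstdint>
  // Fixed-point multiply of two values with 'Scale' fractional bits.
  static int32_t smulFix(int32_t A, int32_t B, unsigned Scale) {
    int64_t Prod = (int64_t)A * (int64_t)B;  // widen so no product bits are lost
    return (int32_t)(Prod >> Scale);         // drop the extra fractional bits
  }
  int main() {
    // 2.5 * 3.0 in Q16.16 is 7.5.
    assert(smulFix(0x28000, 0x30000, 16) == 0x78000);
  }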
Differential Revision: https://reviews.llvm.org/D54719
llvm-svn: 348912
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.
Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D55365
llvm-svn: 348843
If all the demanded elements of the SimplifyDemandedVectorElts are known to be UNDEF, we can simplify to an ISD::UNDEF node.
Zero constant folding will be handled in a future patch - it's a little trickier as we often have bitcasted zero values.
Differential Revision: https://reviews.llvm.org/D55511
llvm-svn: 348784
As discussed on D55511, this caused an issue if the inner node deletes a node that the outer node depends upon. As it doesn't affect any lit-tests and I've only been able to expose this with the D55511 change I'm committing this now.
llvm-svn: 348781
This triggers an assert when combining concat_vectors of a bitcast of
merge_values.
With asserts disabled, it fails to select:
fatal error: error in backend: Cannot select: 0x7ff19d000e90: i32 = any_extend 0x7ff19d000ae8
0x7ff19d000ae8: f64,ch = CopyFromReg 0x7ff19d000c20:1, Register:f64 %1
0x7ff19d000b50: f64 = Register %1
In function: d
Differential Revision: https://reviews.llvm.org/D55507
llvm-svn: 348759
Currently, dbg.value's of "nullptr" are dropped when entering a SelectionDAG --
apparently just because of an oversight when recognising Values that are
constant (see PR39787). This patch adds ConstantPointerNull to the list of
constants that can be turned into DBG_VALUEs.
The matter of what bit-value a null pointer constant in LLVM has was raised
in this mailing list thread:
http://lists.llvm.org/pipermail/llvm-dev/2018-December/128234.html
Where it transpires LLVM relies on (IR) null pointers being zero valued,
thus I've baked this assumption into the patch.
Differential Revision: https://reviews.llvm.org/D55227
llvm-svn: 348753
This is a fix for PR39896, where dbg.value's of SDNodes that have been
optimised out do not lead to "DBG_VALUE undef" instructions being created.
Such undef instructions are necessary to terminate earlier variable
ranges, otherwise variable values leak past the point where they're valid.
The "invalidated" flag of SDDbgValue is currently being abused to mean two
things:
* The corresponding SDNode is now invalid
* This SDDbgValue should not be emitted
Of which there are several legitimate combinations of meaning:
* The SDNode has been invalidated and we should emit "DBG_VALUE undef"
* The SDNode has been invalidated but the debug data was salvaged, don't
emit anything for this SDDbgValue
* This SDDbgValue has been emitted
This patch introduces distinct "Emitted" and "Invalidated" fields to the
SDDbgValue class, updates users accordingly, and generates "undef"
DBG_VALUEs for invalidated records. Awkwardly, there are circumstances
where we emit SDDbgValue's twice, specifically DebugInfo/X86/dbg-addr-dse.ll
which I've preserved.
Differential Revision: https://reviews.llvm.org/D55372
llvm-svn: 348751
This is effectively re-committing the changes from:
rL347917 (D54640)
rL348195 (D55126)
...which were effectively reverted here:
rL348604
...because the code had a bug that could induce infinite looping
or eventual out-of-memory compilation.
The bug was that this code did not guard against transforming
opaque constants. More details are in the post-commit mailing
list thread for r347917. A reduced test for that is included
in the x86 bool-math.ll file. (I wasn't able to reduce a PPC
backend test for this, but it was almost the same pattern.)
Original commit message for r347917:
The motivating case for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=32023
and the corresponding rot16.ll regression tests.
Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc
sequences that don't get folded in IR.
As the TODO comments suggest, there will be regressions if we extend this (for x86,
we mostly seem to be missing LEA opportunities, but there are likely vector folds
missing too). I think those should be considered existing bugs because this is the
same transform that we do as an IR canonicalization in instcombine. We just need
more tests to make those visible independent of this patch.
llvm-svn: 348706
These nodes should have two results: a real VT and a Glue. But this code would have returned Undef, which would only be a single result. But we're in the single-result version of getNode, so these opcodes should never be seen by this function anyway.
llvm-svn: 348670
As discussed in the post-commit thread of r347917, this
transform is fighting with an existing transform causing
an infinite loop or out-of-memory, so this is effectively
reverting r347917 and its follow-up r348195 while we
investigate the bug.
llvm-svn: 348604
If this is not a valid way to assign an SDLoc, then we get this
wrong all over SDAG.
I don't know enough about the SDAG to explain this. IIUC, theoretically,
debug info is not supposed to affect codegen. But here it has clearly
affected 3 different targets, and the x86 change is an actual improvement.
llvm-svn: 348552
We shouldn't care about the debug location for a node that
we're creating, but attaching the root of the pattern should
be the best effort. (If this is not true, then we are doing
it wrong all over the SDAG).
This is no-functional-change-intended, and there are no
regression test diffs...and that's what I expected. But
there's a similar line above this diff, where those
assumptions apparently do not hold.
llvm-svn: 348550
This was probably organized as it was because bswap is a unary op.
But that's where the similarity to the other opcodes ends. We should
not limit this transform to scalars, and we should not try it if
either input has other uses. This is another step towards trying to
clean this whole function up to prevent it from causing infinite loops
and memory explosions.
Earlier commits in this series:
rL348501
rL348508
rL348518
llvm-svn: 348534
Unlike some of the folds in hoistLogicOpWithSameOpcodeHands()
above this shuffle transform, this has the expected hasOneUse()
checks in place.
llvm-svn: 348523
This patch introduces a new DAGCombiner rule to simplify concat_vectors nodes:
concat_vectors( bitcast (scalar_to_vector %A), UNDEF)
--> bitcast (scalar_to_vector %A)
This patch only partially addresses PR39257. In particular, it is enough to fix
one of the two problematic cases mentioned in PR39257. However, it is not enough
to fix the original test case posted by Craig; that particular case would
probably require a more complicated approach (and knowledge about used bits).
Before this patch, we used to generate the following code for function PR39257
(-mtriple=x86_64 , -mattr=+avx):
vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero
vxorps %xmm1, %xmm1, %xmm1
vblendps $3, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3]
vmovaps %ymm0, (%rsi)
vzeroupper
retq
Now we generate this:
vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero
vmovaps %ymm0, (%rsi)
vzeroupper
retq
As a side note: that VZEROUPPER is completely redundant...
I guess the vzeroupper insertion pass doesn't realize that the definition of
%xmm0 from vmovsd is already zeroing the upper half of %ymm0. Note that on
-mcpu=btver2, we don't get that vzeroupper because the vzeroupper
insertion pass is disabled.
Differential Revision: https://reviews.llvm.org/D55274
llvm-svn: 348522
The PPC test with 2 extra uses seems clearly better by avoiding this transform.
With 1 extra use, we also prevent an extra register move (although that might
be an RA problem). The general rule should be to only make a change here if
it is always profitable. The x86 diffs are all neutral.
llvm-svn: 348518
The AVX512 diffs are neutral, but the bswap test shows a clear overreach in
hoistLogicOpWithSameOpcodeHands(). If we don't check for other uses, we can
increase the instruction count.
This could also fight with transforms trying to go in the opposite direction
and possibly blow up/infinite loop. This might be enough to solve the bug
noted here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html
I did not add the hasOneUse() checks to all opcodes because I see a perf
regression for at least one opcode. We may decide that's irrelevant in the
face of potential compiler crashing, but I'll see if I can salvage that first.
llvm-svn: 348508
Because we're potentially peeking through a bitcast in this transform,
we need to use overall bitwidths rather than number of elements to
determine when it's safe to proceed.
Should fix:
https://bugs.llvm.org/show_bug.cgi?id=39893
llvm-svn: 348383
This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions.
Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code.
Differential Revision: https://reviews.llvm.org/D54698
llvm-svn: 348353
Fix potential issue with the ISD::INSERT_VECTOR_ELT case tweaking the DemandedElts mask instead of using a local copy - so later uses of the mask use the tweaked version.....
Noticed while investigating adding zero/undef folding to SimplifyDemandedVectorElts and the altered DemandedElts mask was causing mismatches.
llvm-svn: 348348
There's a 64k limit on the number of SDNode operands, and some very large
functions with 64k or more loads can cause crashes due to this limit being hit
when a TokenFactor with this many operands is created. To fix this, create
sub-tokenfactors if we've exceeded the limit.
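Conceptually (a hedged sketch, not the SelectionDAG code), the fix amounts to splitting an over-long operand list into chunks that each stay under the limit:
  #include <algorithm>
  #include <cassert>
  #include <cstddef>
  #include <vector>
  // Split Ops into groups of at most MaxOps, modeling how one oversized
  // TokenFactor can be replaced by several smaller sub-tokenfactors.
  static std::vector<std::vector<int>> splitOperands(const std::vector<int> &Ops,
                                                     std::size_t MaxOps) {
    std::vector<std::vector<int>> Chunks;
    for (std::size_t I = 0; I < Ops.size(); I += MaxOps)
      Chunks.emplace_back(Ops.begin() + I,
                          Ops.begin() + std::min(I + MaxOps, Ops.size()));
    return Chunks;
  }
  int main() {
    std::vector<int> Ops(150000, 0);
    assert(splitOperands(Ops, 65535).size() == 3); // each group stays under 64k
  }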
No test case as it requires a very large function.
rdar://45196621
Differential Revision: https://reviews.llvm.org/D55073
llvm-svn: 348324
PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction.
The existing code converts both the inrange and outofrange cases using FP_TO_SINT and then selects the result, this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment.
The X87 cases don't need the strict flag but generate much nicer code with it.
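In C++ terms, a branchy model of the strict-safe shape being selected for (choose the FP_TO_SINT input and the offset first, then do a single in-range conversion; an illustrative sketch for f64 -> u64, not the legalizer code):
  #include <cassert>
  #include <cstdint>
  static uint64_t fptoui64(double X) {
    const double Cut = 9223372036854775808.0;            // 2^63
    if (X < Cut)
      return (uint64_t)(int64_t)X;                        // fits in signed range
    return (uint64_t)(int64_t)(X - Cut) + (1ULL << 63);   // add the offset back
  }
  int main() {
    assert(fptoui64(3.0) == 3);
    assert(fptoui64(9223372036854775808.0) == (1ULL << 63));
  }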
Differential Revision: https://reviews.llvm.org/D53794
llvm-svn: 348251
Add support for ISD::*_EXTEND and ISD::*_EXTEND_VECTOR_INREG opcodes.
The extra broadcast in trunc-subvector.ll will be fixed in an upcoming patch.
llvm-svn: 348246
This is the smallest vector enhancement I could find to D54640.
Here, we're allowing narrowing to only legal vector ops because we'll see
regressions without that. All of the test diffs are wins from what I can tell.
With AVX/AVX512, we can shrink ymm/zmm ops to xmm.
x86 vector multiplies are the problem case that we're avoiding due to the
patchwork ISA, and it's not clear to me if we can dance around those
regressions using TLI hooks or if we need preliminary patches to plug those
holes.
Differential Revision: https://reviews.llvm.org/D55126
llvm-svn: 348195
Summary:
Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a result vector type smaller than 128 bits are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to the original type. The truncate will be further legalized to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction.
The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54836
llvm-svn: 348158
This makes the SDAG behavior consistent with the way we do this in IR.
It's possible that we were getting the wrong answer before. For example,
'xor undef, undef --> 0' but 'xor undef, C' --> undef.
But the most practical improvement is likely as shown in the tests here -
for FP, we were overconstraining undef lanes to NaN, and that can prevent
vector simplifications/narrowing (see D51553).
llvm-svn: 348090
This change prevents the crash noted in the post-commit comments
for rL347478 :
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181119/605166.html
We can't guarantee that an oversized shift amount is folded away,
so we have to check for it.
Note that I committed an incomplete fix for that crash with:
rL347502
But as discussed here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181126/605679.html
...we have to try harder.
So I'm not sure how to expose the bug now (and apparently no fuzzers have found
a way yet either).
On the plus side, we have discovered that we're missing real optimizations by
not simplifying nodes sooner, so the earlier fix still has value, and there's
likely more value in extending that so we can simplify more opcodes and simplify
when doing RAUW and/or putting nodes on the combiner worklist.
Differential Revision: https://reviews.llvm.org/D54954
llvm-svn: 348089
D52935 introduced the ability for SimplifyDemandedBits to call SimplifyDemandedVectorElts through BITCASTs if the demanded bit mask entirely covered the sub element.
This patch relaxes this to demanding an element if we need any bit from it.
Differential Revision: https://reviews.llvm.org/D54761
llvm-svn: 348073
Summary:
The VirtReg2Value mapping is crucial for getting consistently
reliable divergence information into the SelectionDAG. This
patch fixes a bunch of issues that lead to incorrect divergence
info and introduces tight assertions to ensure we don't regress:
1. VirtReg2Value is generated lazily; there were some cases where
a lookup was performed before all relevant virtual registers were
created, leading to an out-of-sync mapping. Those cases were:
- Complex code to lower formal arguments that generated CopyFromReg
nodes from live-in registers (fixed by never querying the mapping
for live-in registers).
- Code that generates CopyToReg for formal arguments that are used
outside the entry basic block (fixed by never querying the
mapping for Register nodes, which don't need the divergence info
anyway).
2. For complex values that are lowered to a sequence of registers,
all registers must be reflected in the VirtReg2Value mapping.
I am not adding any new tests, since I'm not actually aware of any
bugs that these problems are causing with trunk as-is. However,
I recently added a test case (in r346423) which fails when D53283 is
applied without this change. Also, the new assertions should provide
most of the effective test coverage.
There is one test change in sdwa-peephole.ll. The underlying issue
is that since the divergence info is now correct, the DAGISel will
select V_OR_B32 directly instead of S_OR_B32. This leads to an extra
COPY which affects the behavior of MachineLICM in a way that ends up
with the S_MOV_B32 with the constant in a different basic block than
the V_OR_B32, which is presumably what defeats the peephole.
Reviewers: alex-t, arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D54340
llvm-svn: 348049
Summary:
If a given liveness arg of STATEPOINT is at a fixed frame index
(e.g. a function argument passed on stack), prefer to use this
fixed location even if the address is also in a register. If we use
the register it will generate a spill, which is not necessary
since the fixed frame index can be directly recorded in the stack
map.
Patch by Cherry Zhang <cherryyz@google.com>.
Reviewers: thanm, niravd, reames
Reviewed By: reames
Subscribers: cherryyz, reames, anna, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D53889
llvm-svn: 347998
Summary:
This simplifies writing predicates for pattern fragments that are
automatically re-associated or commuted.
For example, a followup patch adds patterns for fragments of the form
(add (shl $x, $y), $z) to the AMDGPU backend. Such patterns are
automatically commuted to (add $z, (shl $x, $y)), which makes it basically
impossible to refer to $x, $y, and $z generically in the PredicateCode.
With this change, the PredicateCode can refer to $x, $y, and $z simply
as `Operands[i]`.
Test confirmed that there are no changes to any of the generated files
when building all (non-experimental) targets.
Change-Id: I61c00ace7eed42c1d4edc4c5351174b56b77a79c
Reviewers: arsenm, rampitec, RKSimon, craig.topper, hfinkel, uweigand
Subscribers: wdng, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D51994
llvm-svn: 347992
For targets where i32 is not a legal type (e.g. 64-bit RISC-V),
LegalizeIntegerTypes must promote the result of ISD::FLT_ROUNDS_.
Differential Revision: https://reviews.llvm.org/D53820
llvm-svn: 347986
For targets where i32 is not a legal type (e.g. 64-bit RISC-V),
LegalizeIntegerTypes must promote the operands of ISD::PREFETCH.
Differential Revision: https://reviews.llvm.org/D53281
llvm-svn: 347980
For targets where i32 is not a legal type (e.g. 64-bit RISC-V),
LegalizeIntegerTypes must promote the operand.
Differential Revision: https://reviews.llvm.org/D53279
llvm-svn: 347978
DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend
operands when it is able to do so. For some targets this is more expensive
than a sign-extension, which is also a valid choice. Introduce the
isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger
helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy ==
MVT::i64, as it can be performed using a single instruction.
Differential Revision: https://reviews.llvm.org/D52978
llvm-svn: 347977
The motivating case for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=32023
and the corresponding rot16.ll regression tests.
Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc
sequences that don't get folded in IR.
As the TODO comments suggest, there will be regressions if we extend this (for x86,
we mostly seem to be missing LEA opportunities, but there are likely vector folds
missing too). I think those should be considered existing bugs because this is the
same transform that we do as an IR canonicalization in instcombine. We just need
more tests to make those visible independent of this patch.
Differential Revision: https://reviews.llvm.org/D54640
llvm-svn: 347917
I believe we should be legalizing these with the rest of the vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG a chance to run on the custom code before constant build_vectors are lowered in LegalizeDAG.
I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse.
Differential Revision: https://reviews.llvm.org/D54276
llvm-svn: 347902
SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86, which doesn't support FP_TO_UINT until AVX512.
This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral.
Differential Revision: https://reviews.llvm.org/D54906
llvm-svn: 347593
We might find a target specific node that needs to be unwrapped after we look through an add/or. Otherwise we get inconsistent results if one pointer is just X86WrapperRIP and the other is (add X86WrapperRIP, C)
Differential Revision: https://reviews.llvm.org/D54818
llvm-svn: 347591
This should likely be adjusted to limit this transform
further, but these diffs should be clear wins.
If we have blendv/conditional move, then we should assume
those are cheap ops. The loads become independent of the
compare, so those can be speculated before we need to use
the values in the blend/mov.
llvm-svn: 347526
rL347502 moved the null sibling, so we should group all of these
together. I'm not sure why these aren't methods of the SDValue
class itself, but that's another patch if that's possible.
llvm-svn: 347523
...and use them to avoid creating obviously undef values as
discussed in the post-commit thread for r347478.
The diffs in vector div/rem show that we were missing real
optimizations by creating bogus shift nodes.
llvm-svn: 347502
This code takes a truncate, fp_to_int, or int_to_fp with a legal result type and an input type that needs to be split and enlarges the elements in the result type before doing the split. Then inserts a follow up truncate or fp_round after concatenating the two halves back together.
But if the input type of the original op is being split on its way to ultimately being scalarized we're just going to end up building a vector from scalars and then truncating or rounding it in the vector register. Seems kind of silly to enlarge the result element type of the operation only to end up with scalar code and then building a vector with large elements only to make the elements smaller again in the vector register. Seems better to just try to get away with producing smaller result types in the scalarized code.
The X86 test case that changes is a pretty contrived test case that exists because of a bug we used to have in our AVG matching code. I think the code is better now, but it's not realistic anyway.
llvm-svn: 347482
SplitVecOp_TruncateHelper tries to introduce a multilevel truncate to avoid scalarization. But if splitting the result type would still give a legal type, we don't need to do that.
The comment block at the top of the function implied that this was already implemented. I looked back through the history and it doesn't look to have ever been checked.
llvm-svn: 347479
We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'),
but that would not matter without this patch because DAGCombiner was
reversing that transform. I think we need this transform in the backend
regardless of what happens in IR to catch cases where the shift-xor
is formed late from GEP or other ops.
https://rise4fun.com/Alive/NC1
Name: shl
Pre: (-1 << C2) == C1
%shl = shl i8 %x, C2
%r = xor i8 %shl, C1
=>
%not = xor i8 %x, -1
%r = shl i8 %not, C2
Name: shr
Pre: (-1 u>> C2) == C1
%sh = lshr i8 %x, C2
%r = xor i8 %sh, C1
=>
%not = xor i8 %x, -1
%r = lshr i8 %not, C2
https://bugs.llvm.org/show_bug.cgi?id=39657
llvm-svn: 347478
This transform needs to be limited.
We are converting to a constant pool load very early, and we
are turning loads that are independent of the select condition
(and therefore speculatable) into a dependent non-speculatable
load.
We may also be transferring a condition code from an FP register
to integer to create that dependent load.
llvm-svn: 347424
This is another step in vector narrowing - a follow-up to D53784
(and hoping to eventually squash potential regressions seen in
D51553).
The x86 test diffs are wins, but the AArch64 diff is probably not.
That problem already exists independent of this patch (see PR39722), but it
went unnoticed in the previous patch because there were no regression tests
that showed the possibility.
The x86 diff in i64-mem-copy.ll is close. Given the frequency throttling
concerns with using wider vector ops, an extra extract to reduce vector
width is the right trade-off at this level of codegen.
Differential Revision: https://reviews.llvm.org/D54392
llvm-svn: 347356
This uncovered an off-by-one typo in SimplifyDemandedVectorElts's INSERT_SUBVECTOR handling as its bounds check was bailing on safe indices.
llvm-svn: 347313
For bitcast nodes from larger element types, add the ability for SimplifyDemandedVectorElts to call SimplifyDemandedBits by merging the elts mask to a bits mask.
I've raised https://bugs.llvm.org/show_bug.cgi?id=39689 to deal with the few places where SimplifyDemandedBits's lack of vector handling is a problem.
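A small C++ illustration of the mask translation (assuming a layout where element i of the vector covers bits [i*EltBits, (i+1)*EltBits) of the bitcast value, and EltBits < 64):
  #include <cassert>
  #include <cstdint>
  // Expand a demanded-elements mask (one bit per element) into a
  // demanded-bits mask over the bitcast scalar.
  static uint64_t eltsToBits(uint64_t DemandedElts, unsigned NumElts,
                             unsigned EltBits) {
    uint64_t DemandedBits = 0;
    for (unsigned I = 0; I != NumElts; ++I)
      if (DemandedElts & (1ULL << I))
        DemandedBits |= ((1ULL << EltBits) - 1) << (I * EltBits);
    return DemandedBits;
  }
  int main() {
    // Demanding elements 0 and 2 of a v4i16 demands bits 0-15 and 32-47.
    assert(eltsToBits(0b0101, 4, 16) == 0x0000FFFF0000FFFFULL);
  }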
Differential Revision: https://reviews.llvm.org/D54679
llvm-svn: 347301
Summary:
We already support this for scalars, but it was explicitly disabled for vectors. In the updated test cases this allows us to see that the upper bits are zero, letting us use fewer multiply instructions to emulate a 64-bit multiply.
This should help with this ispc issue that a coworker pointed me to: https://github.com/ispc/ispc/issues/1362
Reviewers: spatel, efriedma, RKSimon, arsenm
Reviewed By: spatel
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D54725
llvm-svn: 347287
Consistently use (!LegalOperations || isOperationLegalOrCustom) for all node pairs.
Differential Revision: https://reviews.llvm.org/D53478
llvm-svn: 347255
As discussed on D53794, for float types with ranges smaller than the destination integer type, we should be able to just use a regular FP_TO_SINT opcode.
I thought we'd need to provide MSA test cases for very small integer types as well (fp16 -> i8 etc.), but it turns out that promotion will kick in so they're unnecessary.
Differential Revision: https://reviews.llvm.org/D54703
llvm-svn: 347251
Sadly, this duplicates (twice) the logic from InstSimplify. There
might be some way to at least share the DAG versions of the code,
but copying the folds seems to be the standard method to ensure
that we don't miss these folds.
Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no
way to ensure that we do these kinds of simplifications unless the
code is repeated at node creation time and during combines.
There were other tests that would become worthless with this
improvement that I changed as pre-commits:
rL347161
rL347164
rL347165
rL347166
rL347167
I'm not sure how to salvage the remaining tests (diffs in this patch).
So the x86 tests verify that the new code is working as intended.
The AMDGPU test is actually similar to my motivating case: we have
some undef value that has survived to machine IR in an x86 test, and
then it gets folded in some weird way, or we crash if we don't transfer
the undef flag. But we would have been better off never getting to that
point by doing these simplifications.
This will lead back to PR32023 someday...
https://bugs.llvm.org/show_bug.cgi?id=32023
llvm-svn: 347170
For example, on X86 we emit a sign_extend_vector_inreg from LowerLoad and without sse4.1 this node will need further legalization. Previously this sign_extend_vector_inreg was being custom lowered during DAG legalization instead of vector op legalization.
Unfortunately, this doesn't seem to matter for the output of any existing lit tests.
llvm-svn: 347094
Legalizer used to request an ext load from i8 to i1 when promoting
vector element type to i8. Fixed.
Differential Revision: https://reviews.llvm.org/D54440
llvm-svn: 346795
Previously, the extend_vector_inreg opcodes required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output.
This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes.
X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code.
I believe DAG combine works correctly with the relaxed restriction because it doesn't check the number of input elements.
The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits.
Differential Revision: https://reviews.llvm.org/D54346
llvm-svn: 346784
The IEEE-754 Standard makes it clear that fneg(x) and
fsub(-0.0, x) are two different operations. The former is a bitwise
operation, while the latter is an arithmetic operation. This patch
creates a dedicated FNeg IR Instruction to model that behavior.
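A rough C++ illustration of the distinction (assumes IEEE-754 doubles): a bitwise negation only flips the sign bit, which is well defined even for NaNs, while the subtraction form is a genuine arithmetic operation.
  #include <cassert>
  #include <cmath>
  #include <cstdint>
  #include <cstring>
  static double fnegBitwise(double X) {
    uint64_t Bits;
    std::memcpy(&Bits, &X, sizeof(Bits));
    Bits ^= 1ULL << 63;                 // flip only the sign bit
    std::memcpy(&X, &Bits, sizeof(X));
    return X;
  }
  int main() {
    assert(fnegBitwise(1.5) == -1.5);
    assert(!std::signbit(fnegBitwise(-0.0)));  // -0.0 becomes +0.0
  }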
Differential Revision: https://reviews.llvm.org/D53877
llvm-svn: 346774
It should be ok to create a new build_vector after legal operations so long as it doesn't cause an infinite loop in DAG combiner.
Unfortunately, X86's custom constant folding in combineVSZext is hiding any test changes from this. But I'm trying to get to a point where that X86 specific code isn't necessary at all.
Differential Revision: https://reviews.llvm.org/D54285
llvm-svn: 346728
Summary:
Handle the extra output from indexed loads in cases where we wish to
forward a load value directly from a preceding store.
Fixes PR39571.
Reviewers: peter.smith, rengolin
Subscribers: javed.absar, hiraditya, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54265
llvm-svn: 346654
This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more
opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs.
Apart from 2-3 strange cases, these are all wins.
I've structured this to be no-functional-change-intended for any target except for x86
because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those
targets have existing regression tests (4, 4, 10 files respectively) that would be
affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show
any regression test diffs. The trade-off is deciding if an extra vector load is better
than a single wide load + extract_subvector.
For x86, this is almost always better (on paper at least) because we often can fold
loads into subsequent ops and not increase the official instruction count. There's also
some unknown -- but potentially large -- benefit from using narrower vector ops if wide
ops are implemented with multiple uops and/or frequency throttling is avoided.
Differential Revision: https://reviews.llvm.org/D54073
llvm-svn: 346595
It's possible for vector op legalization to generate a shuffle. If that happens we should give a chance for DAG combine to combine that with a build_vector input.
I also fixed a bug in combineShuffleOfScalars that was considering the number of uses on an undef input to a shuffle. We don't care how many times undef is used.
Differential Revision: https://reviews.llvm.org/D54283
llvm-svn: 346530
The previous version used type erasure through a `void* (*)()` pointer,
which triggered a gcc warning and implied a lot of reinterpret_casts.
This version should make it harder to shoot ourselves in the foot.
Differential revision: https://reviews.llvm.org/D54203
llvm-svn: 346522
The DAGCombiner tries to SimplifySelectCC as follows:
select_cc(x, y, 16, 0, cc) -> shl(zext(set_cc(x, y, cc)), 4)
It can't cope with the situation of reordered operands:
select_cc(x, y, 0, 16, cc)
In that case we just need to swap the operands and invert the Condition Code:
select_cc(x, y, 16, 0, ~cc)
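In scalar C++ terms, the existing fold and the swapped form handled by this patch (a sketch; inverting the condition code is modeled by flipping the comparison):
  #include <cassert>
  int main() {
    int X = 1, Y = 2;
    // select_cc(x, y, 16, 0, lt) -> shl(zext(setcc(x, y, lt)), 4)
    assert((X < Y ? 16 : 0) == ((X < Y ? 1 : 0) << 4));
    // select_cc(x, y, 0, 16, lt) == select_cc(x, y, 16, 0, ge)
    assert((X < Y ? 0 : 16) == ((X >= Y ? 1 : 0) << 4));
  }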
Differential Revision: https://reviews.llvm.org/D53236
llvm-svn: 346484
FindBetterNeighborChains simultaneously improves the chain
dependencies of a chain of related stores, avoiding the generation of
extra token factors. For chains longer than the GatherAllAliasDepths,
stores further down in the chain will necessarily fail, which is a
potentially significant waste and prevents otherwise trivial
parallelization.
This patch directly parallelizes the chains of stores before improving
each store. This generally improves DAG-level parallelism.
Reviewers: courbet, spatel, RKSimon, bogner, efriedma, craig.topper, rnk
Subscribers: sdardis, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D53552
llvm-svn: 346432
This adds the llvm-side support for post-inlining evaluation of the
__builtin_constant_p GCC intrinsic.
Also fixed SCCPSolver::visitCallSite to not blow up when seeing a call
to a function where canConstantFoldTo returns true, and one of the
arguments is a struct.
Updated from patch initially by Janusz Sobczak.
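For reference, the kind of source pattern this enables (a hypothetical example using the GCC/Clang builtins; after inlining with a literal argument, the builtin can evaluate to true and the constant-foldable branch is kept):
  static inline unsigned popcnt(unsigned X) {
    if (__builtin_constant_p(X))
      return __builtin_popcount(X);   // folds to a constant after inlining
    unsigned Count = 0;
    for (; X; X &= X - 1)             // runtime fallback
      ++Count;
    return Count;
  }
  int main() { return popcnt(0xF0u) == 4 ? 0 : 1; }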
Differential Revision: https://reviews.llvm.org/D4276
llvm-svn: 346322