Commit Graph

11950 Commits

Author SHA1 Message Date
Simon Pilgrim 22dbc7a48b [DAG] foldConstantFPMath - fold vector splats as well as scalar constants 2021-12-17 14:24:36 +00:00
Simon Pilgrim d91b5b0f57 [DAG] foldConstantFPMath - use APFloat& for read-only constant fold arg. NFC.
We just need to copy the 1st arg (which we use for the constant fold result) - use a cheaper const reference for the 2nd arg.
2021-12-17 12:34:03 +00:00
Simon Pilgrim 42f00106b7 [DAG] Constant fold + canonicalize integer binops before SimplifyVBinOp call
SimplifyVBinOp still has a FoldConstantArithmetic call, which now it isn't vector specific we should be able to remove (once fp binops are tidied up); but we can at least clean up the integer opcodes to perform the basic constant/undef handling in common code first.
2021-12-17 12:02:27 +00:00
Nathan Sidwell dd073e08ae Avoid by-value copies of referenced objects
These were detected by the new -Wauto-by-value-copy (D114989) warning, these by-value
constant copies need only be references.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D114990
2021-12-16 07:22:46 -08:00
Simon Pilgrim b88f4f271b [DAG] SelectionDAG::isSplatValue - add *_EXTEND_VECTOR_INREG handling
Fixes #52719
2021-12-15 12:26:39 +00:00
Chen Zheng 062d9b7d43 [LegalizeVectorOps] code refactor for LegalizeOp; NFC
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D115636
2021-12-14 03:45:53 +00:00
Chen Zheng 8c107bee70 [LegalizeVectorOps] fix a typo
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D115637
2021-12-14 00:22:58 +00:00
Roman Lebedev c1a36ba002
[DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask')
In most test changes this allows us to drop some broadcasts/shuffles.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D104156
2021-12-13 20:03:44 +03:00
Peter Waller 921e89c59a [SVE] Only combine (fneg (fma)) => FNMLA with nsz
-(Za + Zm * Zn) != (-Za + Zm * (-Zn))
when the FMA produces a zero output (e.g. all zero inputs can produce -0
output)

Add a PatFrag to check presence of nsz on the fneg, add tests which
ensure the combine does not fire in the absense of nsz.

See https://reviews.llvm.org/D90901 for a similar discussion on X86.

Differential Revision: https://reviews.llvm.org/D109525
2021-12-13 11:33:07 +00:00
Fraser Cormack b0319ab79b [PR52475] Ensure a correct chain in copies to/from hidden sret parameter
This patch fixes an issue during SelectionDAG construction. When the
target is unable to lower the function's return value, a hidden sret
parameter is created. It is initialized and copied to a stored variable
(DemoteRegister) with CopyToReg and is later fetched with
CopyFromReg. The bug is that the chains used for each copy are
inconsistent, and thus in rare cases the scheduler may issue them out of
order.

The fix is to ensure that the CopyFromReg uses the DAG root which is set
as the chain corresponding to the initial CopyToReg.

Fixes https://llvm.org/PR52475

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D114795
2021-12-13 10:46:32 +00:00
David Sherwood 652faed353 [CodeGen] Improve SelectionDAGBuilder lowering code for get.active.lane.mask intrinsic
Previously we were using UADDO to generate a two-result value with
the unsigned addition and the overflow mask. We then combined the
overflow mask with the trip count comparison to get a result.
However, we don't need to do this - we can simply use a UADDSAT
saturating add node to add the vector index splat and the stepvector
together. Then we can just compare this to a splat of the trip count.
This results in overall better code quality for both Thumb2 and AArch64.

Differential Revision: https://reviews.llvm.org/D115354
2021-12-10 13:39:38 +00:00
Kazu Hirata f829630d2e [llvm] Use llvm::count (NFC) 2021-12-09 20:50:38 -08:00
David Sherwood 3257f63bbd [NFC][CodeGen] Remove rarely used DL variable from SelectionDAGBuilder
There is a pointer to the DataLayout in SelectionDAGBuilder called
'DL' that is hardly ever used. In most cases the code seems to just
use `DAG.getDataLayout()` instead. Given that DL is also often used
as a shadowed variable for the debug location it seems sensible to
just kill off the few remaining uses and be consistent with the rest
of the code.

Differential Revision: https://reviews.llvm.org/D114451
2021-12-08 17:05:46 +00:00
David Green 5d7efd4758 [SDAG] Refine MMO size when converting masked load/store to normal load/store
After D113888 / 32b6c17b29 the MMO size of a masked loads/store is
unknown. When we are converting back to a standard load/store because
the mask is known all ones, we can refine that to the correct size from
the size of the vector being loaded/stored.

Differential Revision: https://reviews.llvm.org/D114582
2021-12-08 10:13:25 +00:00
Simon Pilgrim d298c32407 Remove unused variable. NFC. 2021-12-07 18:37:07 +00:00
Simon Pilgrim 52d2f35323 [DAG] Update expandFunnelShift/expandROT to return the expansion directly. NFCI.
Don't return a bool to indicate if the expansion was successful, just return the SDValue result directly, like we do for most other basic expansions.
2021-12-07 18:09:43 +00:00
Kazu Hirata 630c847b1b [llvm] Use range-based for loops (NFC) 2021-12-07 09:17:03 -08:00
Fraser Cormack 40d51de5cb [SelectionDAG] Use UnknownSize for VP memory ops
In the style of D113888, this patch updates the various VP memory
operations (load, store, gather, scatter) to use UnknownSize. This is
for the same reason as for masked loads and stores: the number of
elements accessed is not generally known at compile time.

This is somewhat pessimistic in the sense that we may still find
un-canonicalized intrinsics featuring both an all-true mask and an EVL
equal to the vector size. Arguably those should be canonicalized before
the SelectionDAG, so those have been left for future work.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D115036
2021-12-07 10:51:02 +00:00
Fraser Cormack 3460cc2585 [VP] Propagate align parameter attr on VP load/store to ISel
This patch fixes a case where the 'align' parameter attribute on the
pointer operands to llvm.vp.load and llvm.vp.store was being dropped
during the conversion to the SelectionDAG. The default alignment
equal to the ABI type alignment of the vector type was kept. It also
updates the documentation to reflect the fact that the parameter
attribute is now properly supported.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D114422
2021-12-07 10:16:16 +00:00
Kazu Hirata c4a8928b51 [CodeGen] Use range-based for loops (NFC) 2021-12-06 08:49:10 -08:00
David Green 57ff805a6d [DAG] Create fptoui.sat from clamped fptosi
As an extension to D111976, this converts clamp fptosi, clamped between
0 and (2^n)-1 to a fptoui.sat. This can greatly help on targets with
conversions that naturally saturate, such as Arm.

X86 disables the transform as some of the test cases increases in size.
A fptoui.sat necessitates a fp clamp without native support, so there is
little use in converting if the instruction is just going to be
expanded.

Differential Revision: https://reviews.llvm.org/D112428
2021-12-05 09:25:52 +00:00
Simon Pilgrim ebf5271918 [DAG] PromoteIntRes_FunnelShift - rename shift Amount variable to Amt to prevent line overflow. NFC. 2021-12-03 17:24:45 +00:00
David Green 255ad73424 [ARM] Make MVE v2i1 predicates legal
MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the
same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the
two halves. This was never treated as a legal type in llvm in the past
as there are not many 64bit instructions and no 64bit compares. There
are a few instructions that could use it though, notably a VSELECT (as
it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for
similar reasons, some gathers/scatter and long multiplies and VCTP64
instructions.

This patch goes through and makes v2i1 a legal type, handling all the
cases that fall out of that. It also makes VSELECT legal for v2i64 as a
side benefit. A lot of the codegen changes as a result - usually in way
that is a little better or a little worse, but still expensive. Costs
can change a little too in the process, again in a way that expensive
things remain expensive. A lot of the tests that changed are mainly to
ensure correctness - the code can hopefully be improved in the future
where it comes up in practice.

The intrinsics currently remain using the v4i1 they previously did to
emulate a v2i1. This will be changed in a followup patch but this one
was already large enough.

Differential Revision: https://reviews.llvm.org/D114449
2021-12-03 14:05:41 +00:00
Jay Foad d133a21b71 [SelectionDAG] Add newline to a debug message 2021-12-03 13:33:32 +00:00
Simon Pilgrim 6803d08c38 [DAG][PowerPC] Enable initial ISD::BITCAST SimplifyDemandedBits/SimplifyMultipleUseDemandedBits big-endian handling
This patch begins extending handling for peeking through bitcast nodes to big-endian targets as well as the existing little-endian case.

Differential Revision: https://reviews.llvm.org/D114676
2021-12-02 11:47:53 +00:00
Omer Aviram 617ad14060 [SelectionDAG] Add pattern to haveNoCommonBitsSet
Correctly identify the following pattern, which has no common bits: (X & ~M) op (Y & M).

Differential Revision: https://reviews.llvm.org/D113970
2021-12-01 12:04:04 -05:00
Simon Pilgrim 19d34f6e95 [X86] combinePMULH - recognise 'cheap' trunctions via PACKS/PACKUS as well as SEXT/ZEXT
combinePMULH currently only truncates vXi32/vXi64 multiplies to PMULHW/PMULUW if the source operands are SEXT/ZEXT instructions for a 'free' truncation.

But we can generalize this to any source operand with sufficient leading sign/zero bits that would allow PACKS/PACKUS to be used as a 'cheap' truncation.

This helps us avoid the wider multiplies, in exchange for truncation on both source operands instead of the result.

Differential Revision: https://reviews.llvm.org/D113371
2021-12-01 16:37:49 +00:00
Bradley Smith 0eb1efb92c [DAGCombiner] When combining REM ensure optimized div nodes are unique
The REM DAG combine uses the visitDivLike functions to try and get an
optimized DIV node to provide better codegen, however in some cases this
visitDivLike call ends up in the BuildSDIVPow2 target hook, which in
turn sometimes will return the same node passed in to indicate not to
change it. The REM DAG combine does not anticipate this and creates a
cycle in the DAG because of it.

Fix this by ensuring any such optimized div node returned is distinct
from the node being combined.

Differential Revision: https://reviews.llvm.org/D114716
2021-12-01 11:24:26 +00:00
Simon Pilgrim 9981dd142f [DAG] Apply clang-format to visitMSTORE + visitMLOAD. NFC.
Reduce diff in D114582
2021-12-01 11:23:47 +00:00
Qiu Chaofan 15826eb437 [Legalizer] Avoid expansion to BR_CC if illegal
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110616
2021-12-01 12:22:21 +08:00
David Green 9e8a71caf0 [DAG] Create fptosi.sat from clamped fptosi
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.

A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi will have a less strict semantics
than the fptosisat, with less values that need to produce defined
behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D111976
2021-11-30 15:29:14 +00:00
Hans Wennborg a87782c34d Revert "[DAG] Create fptosi.sat from clamped fptosi"
It causes builds to fail with this assert:

llvm/include/llvm/ADT/APInt.h:990:
bool llvm::APInt::operator==(const llvm::APInt &) const:
Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed.

See comment on the code review.

> This adds a fold in DAGCombine to create fptosi_sat from sequences for
> smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
> the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
> it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
> ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
> to be handled similarly.
>
> A shouldConvertFpToSat method was added to control when converting may
> be profitable. The original fptosi will have a less strict semantics
> than the fptosisat, with less values that need to produce defined
> behaviour.
>
> This especially helps on ARM/AArch64 where the vcvt instructions
> naturally saturate the result.
>
> Differential Revision: https://reviews.llvm.org/D111976

This reverts commit 52ff3b0093.
2021-11-30 15:36:56 +01:00
David Green 52ff3b0093 [DAG] Create fptosi.sat from clamped fptosi
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.

A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi will have a less strict semantics
than the fptosisat, with less values that need to produce defined
behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D111976
2021-11-30 11:05:32 +00:00
Jeremy Morse a20987adf4 [DebugInfo][InstrRef] Add indirection from dbg.declare in SelectionDAG
Usually dbg.declares get translated into either entries in an MF
side-table, or a DBG_VALUE on entry to the function with IsIndirect set
(including in instruction referencing mode). Much rarer is a dbg.declare
attached to a non-argument value, such as in the test added in this patch
where there's a variable-length-array. Such dbg.declares become SDDbgValue
nodes with InIndirect=true.

As it happens, we weren't correctly emitting DBG_INSTR_REFs with the
additional indirection. This patch adds the extra indirection, encoded as
adding an additional DW_OP_deref to the expression.

Differential Revision: https://reviews.llvm.org/D114440
2021-11-29 22:24:19 +00:00
Kazu Hirata f240e528ce [llvm] Use range-based for loops (NFC) 2021-11-29 09:04:44 -08:00
Bradley Smith 6180806632 [AArch64][SVE] Mark fixed-type FP extending/truncating loads/stores as custom
This allows the generic DAG combine to fold fp_extend/fp_trunc into
loads/stores which we can then lower into a integer extending
load/truncating store plus an FP_EXTEND/FP_ROUND.

The nuance here is that fixed-type FP_EXTEND/FP_ROUND require unpacked
types hence lowering them introduces an unpack/zip. By allowing these
nodes to be combined with loads/store we make it much easier to have
this unpack/zip combined into the load/store by our custom lowering.

Differential Revision: https://reviews.llvm.org/D114580
2021-11-29 11:56:07 +00:00
David Sherwood a31f4bdfe8 [CodeGen][SVE] Use whilelo instruction when lowering @llvm.get.active.lane.mask
In most common cases the @llvm.get.active.lane.mask intrinsic maps directly
to the SVE whilelo instruction, which already takes overflow into account.
However, currently in SelectionDAGBuilder::visitIntrinsicCall we always lower
this immediately to a generic sequence of instructions that explicitly
take overflow into account. This makes it very difficult to then later
transform back into a single whilelo instruction. Therefore, this patch
introduces a new TLI function called shouldExpandGetActiveLaneMask that asks if
we should lower/expand this to a sequence of generic ISD nodes, or instead
just leave it as an intrinsic for the target to lower.

You can see the significant improvement in code quality for some of the
tests in this file:

  CodeGen/AArch64/active_lane_mask.ll

Differential Revision: https://reviews.llvm.org/D114542
2021-11-29 08:08:17 +00:00
Kazu Hirata fd7d40640d [llvm] Use range-based for loops (NFC) 2021-11-28 18:14:49 -08:00
Nikita Popov bfa91f38a9 [DAG] Restore dropped condition
This was dropped in fcee33bd5a,
presumably accidentally.
2021-11-26 21:18:54 +01:00
Simon Pilgrim fcee33bd5a [DAG] Pull out repeated isLittleEndian() calls. NFC. 2021-11-26 18:41:56 +00:00
Simon Pilgrim 2778f9a9f6 [DAG] SimplifyDemandedVectorElts - attempt to handle ADD(x,x) as single use
If the ADD node is the only user of the repeated operand, then treat this as single use - allows us to peek through shl(x,1) patterns.
2021-11-26 10:32:10 +00:00
David Sherwood 86137fb722 [CodeGen] Add scalable vector support for lowering of llvm.get.active.lane.mask
Currently the generic lowering of llvm.get.active.lane.mask is done
in SelectionDAGBuilder::visitIntrinsicCall and currently assumes
only fixed-width vectors are used. This patch changes the code to be
more generic and support scalable vectors too. I have added tests
for SVE here:

  CodeGen/AArch64/active_lane_mask.ll

although the code quality leaves a lot to be desired. The code will
be improved significantly in a later patch that makes use of the
SVE whilelo instruction.

Differential Revision: https://reviews.llvm.org/D114541
2021-11-26 08:17:55 +00:00
Simon Pilgrim 63b1e58f07 [DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (REAPPLIED)
If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.

For the ARM/AArch64 rev16/32 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.

Reapplied with fix for AArch64 rev patterns to matching the ARM fix.

https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)

Differential Revision: https://reviews.llvm.org/D114354
2021-11-25 11:14:15 +00:00
Zarko Todorovski 95875d246a [LLVM][NFC]Inclusive language: remove occurances of sanity check/test from llvm
Part of work to use more inclusive language in clang/llvm. Rewording
some comments and change function and variable names.
2021-11-24 17:29:55 -05:00
Benjamin Kramer d32787230d Revert "[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl"
This reverts commit 3cf4a2c620.

It makes llc hang on the following test case.
```
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-gnu"

define dso_local void @_PyUnicode_EncodeUTF16() local_unnamed_addr #0 {
entry:
  br label %while.body117.i

while.body117.i:                                  ; preds = %cleanup149.i, %entry
  %out.6269.i = phi i16* [ undef, %cleanup149.i ], [ undef, %entry ]
  %0 = load i16, i16* undef, align 2
  %1 = icmp eq i16 undef, -10240
  br i1 %1, label %fail.i, label %cleanup149.i

cleanup149.i:                                     ; preds = %while.body117.i
  %or130.i = call i16 @llvm.bswap.i16(i16 %0) #2
  store i16 %or130.i, i16* %out.6269.i, align 2
  br label %while.body117.i

fail.i:                                           ; preds = %while.body117.i
  ret void
}

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare i16 @llvm.bswap.i16(i16) #1

attributes #0 = { "target-features"="+neon,+v8a" }
attributes #1 = { nofree nosync nounwind readnone speculatable willreturn }
attributes #2 = { mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon,+v8a" }
```
2021-11-24 14:42:54 +01:00
Simon Pilgrim 3cf4a2c620 [DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl
If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.

For the ARM rev16 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.

https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)

Differential Revision: https://reviews.llvm.org/D114354
2021-11-24 11:28:35 +00:00
David Sherwood cf40ca026f [NFC] Tidy up SelectionDAGBuilder::visitIntrinsicCall to use existing sdl debug loc
In quite a few places we were calling getCurSDLoc() to get the debug
location, but this is already a local variable `sdl`.

Differential Revision: https://reviews.llvm.org/D114447
2021-11-24 10:35:49 +00:00
Simon Moll 1e65b93f3a [VP] Canonicalize macros of VPIntrinsics.def
Usage and naming of macros in VPIntrinsics.def has been inconsistent. Rename all property macros to VP_PROPERTY_<name>.  Use BEGIN/END scope macros to attach properties to vp intrinsics and SDNodes (instead of specifying either directly with the property macro).
A follow-up patch has documentation on how the macros are (intended) to be used.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D114144
2021-11-23 16:51:11 +01:00
David Green 32b6c17b29 [SDAG] Use UnknownSize for masked load/store MMO size
A masked load or store will load a potentially unknown number of bytes
from a memory location - that is not generally known at compile time.
They do not necessarily load/store the entire vector width, and treating
them as such can lead to incorrect aliasing information (for example, if
the underlying object is smaller than the size of the vector).

This makes sure that the MMO is given an unknown size to represent this.
which is less accurate that "may load/store from up to 16 bytes", but
less incorrect that "will load/store from 16 bytes".

Differential Revision: https://reviews.llvm.org/D113888
2021-11-23 09:47:56 +00:00
Simon Pilgrim 812e64ef0c [DAG] MatchRotate - support rotate-by-constant of illegal types
Patch to fix some of the regressions in D77804.

By folding to rotate/funnel-shift by constant amounts for illegal types, we prevent SimplifyDemandedBits from destroying the patterns prematurely, allowing us to use the rotate/funnel-shift legalization that was added in D112443.

Differential Revision: https://reviews.llvm.org/D113192
2021-11-19 11:12:04 +00:00
Eric Tang 9fe6b9e802 [TargetLowering][RISCV] Fixed a scalable vector issue when lowering [s|u]mul.overflow intrinsics
Fixed the vector type issue that where we used getVectorNumElements()
    should be replaced by getVectorElementCount() when lowering these
    intrinsics.

    This is similar to D94149

Signed-off-by: Eric Tang <tangxingxin1008@gmail.com>

Reviewed By: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D109809
2021-11-18 10:16:08 +00:00
Craig Topper d78fdf111d [LegalizeTypes] Further limit expansion of CTTZ during type promotion.
Don't expand CTTZ if CTPOP or CTLZ is supported on the promoted type.
We have special handling for CTTZ expansion to use those ops with a
small conversion. The setup for that doesn't generate extra code or
large constants so we don't gain anything from expanding early and we
make CTTZ_ZERO_UNDEF codegen worse.

Follow up from post commit feedback on D112268. We don't seem to have
any in tree tests that care about this.
2021-11-17 15:27:29 -08:00
Craig Topper 0274be28d7 [RISCV] Lower vector CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF by converting to FP and extracting the exponent.
If we have a large enough floating point type that can exactly
represent the integer value, we can convert the value to FP and
use the exponent to calculate the leading/trailing zeros.

The exponent will contain log2 of the value plus the exponent bias.
We can then remove the bias and convert from log2 to leading/trailing
zeros.

This doesn't work for zero since the exponent of zero is zero so we
can only do this for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF. If we need
a value for zero we can use a vmseq and a vmerge to handle it.

We need to be careful to make sure the floating point type is legal.
If it isn't we'll continue using the integer expansion. We could split the vector
and concatenate the results but that needs some additional work and evaluation.

Differential Revision: https://reviews.llvm.org/D111904
2021-11-17 10:29:41 -08:00
Simon Pilgrim 5fedbd5b18 [DAG] SimplifyDemandedVectorElts - zero_extend_vector_inreg(and(x,c)) -> and(x,c')
If we've only demanded the 0'th element, and it comes from a (one-use) AND, try to convert the zero_extend_vector_inreg into a mask and constant fold it with the AND.
2021-11-17 12:41:48 +00:00
Eric Tang f7eb061a5f [SelectionDAG] Make WidenVecRes_SELECT work for scalable vectors
This change make WidenVecRes_SELECT work for scalable vectors.

    This patch is split from [D110319](https://reviews.llvm.org/D110319)

Signed-off-by: Eric Tang <tangxingxin1008@gmail.com>

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D110388
2021-11-17 08:55:11 +00:00
Fabian Wolff b484fa8289 [X86] Fix crash with inline asm using wrong register name
Fixes PR#48678. `X86TargetLowering::getRegForInlineAsmConstraint()` can adjust the register class to match the type, e.g. change `VR128X` to `VR256X` if the type needs 256 bits. However, the function currently returns the unadjusted register and the adjusted register class, e.g. `xmm15` and `VR256X`, which then causes an assertion failure later because the register class does not contain that register. This patch fixes this behavior.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D113834
2021-11-16 10:38:12 +08:00
Craig Topper 233def40f7 [DAGCombiner] Prevent unfoldMaskedMerge from creating an AND with two inverted inputs.
It's possible that the mask is already a NOT. At least if InstCombine
hasn't canonicalized the input. In that case we will form an ANDN with
X instead of with Y. So we don't need to worry about Y being a constant.

We might need to check that X isn't a constant instead, but we don't
have a test case for that yet.

This fixes a size regression found when trying to enable this combine
for RISCV in D113937.

Differential Revision: https://reviews.llvm.org/D113948
2021-11-15 17:15:51 -08:00
Simon Pilgrim 7bac1985f4 [DAG] SimplifyVBinOp - add SDLoc() argument
Pass in SDLoc instead of (repeated) local creations in SimplifyVBinOp and scalarizeBinOpOfSplats
2021-11-15 10:43:56 +00:00
Simon Pilgrim 8658d20724 [DAG] SimplifyVBinOp - pull out repeated getValueType() call. NFC. 2021-11-15 10:43:55 +00:00
Sanjay Patel 254c5246e9 [DAGCombiner] match inverted/swapped patterns for vselect of mask of signbit
This was noted as a follow-up to D113212 / D113426:
4fc1fc4005
7e30404c3b
11522cfcad

https://alive2.llvm.org/ce/z/e4o96b

The canonicalization rules for these IR patterns are complicated,
and we were not matching the expected forms in 2 out of the 3
cases. We can make codegen more robust by matching the swapped
forms (and that will also work if these patterns are created late).
2021-11-14 09:35:26 -05:00
Craig Topper 82bc6a094e [X86] Promote f16 STRICT_FROUND to f32 and call libc.
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D113817
2021-11-12 21:37:03 -08:00
Kazu Hirata 99d5cbbd7e [CodeGen] Use SDNode::uses (NFC) 2021-11-12 07:33:29 -08:00
Simon Pilgrim 010b09b0c5 [DAG] reassociateOpsCommutative - test getNode result directly. NFC
Matches the clean code style we use directly above
2021-11-11 18:45:50 +00:00
Sanjay Patel 11522cfcad [DAGCombiner] add fold for vselect based on mask of signbit, part 3
(Cond0 s> -1) ? N1 : 0 --> ~(Cond0 s>> BW-1) & N1

https://alive2.llvm.org/ce/z/mGCBrd

This was suggested as a potential enhancement in D113212 (also 7e30404c3b ).
There's an improvement for AArch that could be generalized ( X > -1 --> X >= 0 ).
For x86, we have a counter-acting fold for most cases that turns the shift+not
back into a setcc, so that needs a work-around to get more cases to use "pandn":
D113603

Note that this pattern (and a previous one) are not currently canonical forms
in IR:
https://alive2.llvm.org/ce/z/e4o96b

Adding swapped variants is left as a TODO item here, but is planned as
a near-term follow-up patch.

Differential Revision: https://reviews.llvm.org/D113426
2021-11-11 10:27:37 -05:00
Simon Pilgrim 82b74363a9 [DAG] reassociateOpsCommutative - peek through bitcasts to find constants
Now that FoldConstantArithmetic can fold bitcasted constants, we should peek through bitcasts of binop operands to try and find foldable constants
2021-11-11 12:00:22 +00:00
Simon Pilgrim 098ea29641 [DAG] FoldConstantArithmetic - fold intop(bitcast(buildvector(c1)),bitcast(buildvector(c1))) -> bitcast(intop(buildvector(c1'),buildvector(c2')))
Enable FoldConstantArithmetic to constant fold bitcasted constant build vectors. These have typically been bitcasted for type legalization purposes.

By extracting the raw constant bit data, performing the constant fold, and then casting the constant bit data back to the (legalized) type, we can perform constant folding on integer types after legalization.

This in particular helps 32-bit targets which need to handle vXi64 build vectors - during legalization the (unsupported) i64 elements are split to create a bitcasted v2Xi32 build vector.

Addresses some regressions in D113192.

Differential Revision: https://reviews.llvm.org/D113564
2021-11-11 11:35:18 +00:00
Fraser Cormack b1d8d70b9d [SelectionDAG] Replace the Chain in LOAD->VP_LOAD widening
The introduction of this legalization, D111248, forgot to replace the
old chain with the new. This could manifest itself in the old
(illegally-typed) value remaining in the DAG, though the simple test
cases didn't catch this.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D113561
2021-11-10 17:49:12 +00:00
Simon Pilgrim 381d14775e [DAG] reassociateOpsCommutative - pull out repeated getOperand() calls. NFC. 2021-11-10 15:19:13 +00:00
Simon Pilgrim ed80761b50 [DAG] Split BuildVectorSDNode::getConstantRawBits into BuildVectorSDNode::recastRawBits helper. NFC.
NFC refactor of D113351, pulling out the APInt split/merge code from the BuildVectorSDNode bits extraction into a BuildVectorSDNode::recastRawBits helper. This is to allow us to reuse the code when we're packing constant folded APInt data back together.
2021-11-10 13:06:19 +00:00
Fraser Cormack 332318ffb6 [SelectionDAG] Widen scalable-vector loads/stores via VP_LOAD/VP_STORE
This patch fixes a compiler crash when widening scalable-vector loads
and stores which end up breaking down to element-wise store operations.
It does so by providing a way for targets with support for
vector-predicated loads and stores to use those instead. By widening the
operation but maintaining the original effective operation length via
the EVL, only the intended vector elements are loaded or stored.

This method should in theory be possible and even preferred for
fixed-length vector types, but all fixed-length types can be broken down
into their elements, and regardless I have observed regressions in the
generated code when doing so. I believe this is simply due to
VP_LOAD/VP_STORE not being up to par with LOAD/STORE in terms of
optimization. It does improve performance on smaller self-contained
examples, however, so the potential is there.

While the only target that benefits from this is RISCV, the legalization
is generic and so was placed centrally.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111248
2021-11-10 09:55:03 +00:00
Simon Pilgrim 58c01ef270 [SelectionDAG] Merge FoldConstantVectorArithmetic into FoldConstantArithmetic (PR36544)
This patch merges FoldConstantVectorArithmetic back into FoldConstantArithmetic.

Like FoldConstantVectorArithmetic we now handle vector ops with any operand count, but we currently still only handle binops for scalar types - this can be improved in future patches - in particular some common unary/trinary ops still have poor constant folding.

There's one change in functionality causing test changes - FoldConstantVectorArithmetic bails early if the build/splat vector isn't all constant (with some undefs) elements, but FoldConstantArithmetic doesn't - it instead attempts to fold the scalar nodes and bails if they fail to regenerate a constant/undef result, allowing some additional identity/undef patterns to be handled.

Differential Revision: https://reviews.llvm.org/D113300
2021-11-09 11:31:01 +00:00
Simon Pilgrim f059b04f7b [DAG] Add SelectionDAG::ComputeMinSignedBits helper
As suggested on D113371, this adds a wrapper to SelectionDAG::ComputeNumSignBits, similar to the llvm::ComputeMinSignedBits wrapper.

I've included some usage, its not exhaustive, just the more obvious cases where the intention is obvious.

Differential Revision: https://reviews.llvm.org/D113396
2021-11-08 14:12:45 +00:00
Simon Pilgrim f60d3ec0c7 [DAG] Add BuildVectorSDNode::getConstantRawBits helper
We have several places where we need to extract the raw bits data from a BUILD_VECTOR node, so consolidate this to a single helper function that handles Undefs and Integer/FP constants, including implicit truncation.

This should make it easier to extend D113202 to handle more constant folding of bitcasted constant data.

Differential Revision: https://reviews.llvm.org/D113351
2021-11-08 12:07:38 +00:00
Simon Pilgrim 0ff1edeeec [DAG] SimplifyVBinOp - replace FoldConstantVectorArithmetic with FoldConstantArithmetic
Currently FoldConstantArithmetic only handles binops, so replacing other uses of FoldConstantVectorArithmetic (in particular for SETCC nodes), still require more work.
2021-11-07 12:11:46 +00:00
Sanjay Patel 39c4c7d391 [DAGCombiner] remove vselect fold that was accidentally added
This diff snuck into the unrelated:
025a2f73a3

It's a suggested follow-up for D113212, but I need to add test
coverage first.
2021-11-06 09:34:30 -04:00
Sanjay Patel 025a2f73a3 [InstCombine] add tests for umax with sub; NFC 2021-11-06 08:32:52 -04:00
Kazu Hirata 87e53a0ad8 [llvm] Use make_early_inc_range (NFC) 2021-11-05 19:39:07 -07:00
Sanjay Patel 7e30404c3b [DAGCombiner] add fold for vselect based on mask of signbit, part 2
This is the 'or' sibling for the fold added with:
D113212

https://alive2.llvm.org/ce/z/tgnp7K

Note that neither of these transforms is poison-safe,
but it does not seem to matter at this level. We have
had the scalar version of D113212 for a long time, so
this is just making optimizer behavior consistent.

We do not have the scalar version of *this* fold,
however, so that is another follow-up.
2021-11-05 15:02:12 -04:00
Simon Pilgrim 9e6506299a [DAG] FoldConstantVectorArithmetic - remove SDNodeFlags argument
Another minor step towards merging FoldConstantVectorArithmetic into FoldConstantArithmetic.

We don't use SDNodeFlags in any constant folding inside DAG, so passing the Flags argument is a waste of time - an alternative would be to wire up FoldConstantArithmetic to take SDNodeFlags just-in-case we someday start using it, but we don't have any way to test it and I'd prefer to avoid dead code.

Differential Revision: https://reviews.llvm.org/D113276
2021-11-05 14:36:17 +00:00
Sanjay Patel 4fc1fc4005 [DAGCombiner] add fold for vselect based on mask of signbit
(X s< 0) ? Y : 0 --> (X s>> BW-1) & Y

We canonicalize to the icmp+select form in IR, and we already have this fold
for scalar select in SDAG, so I think it's an oversight that we don't have
the fold for vectors. It seems neutral for AArch64 and saves some instructions
on x86.

Whether we should also have the sibling folds for the inverse condition or
all-ones true value may depend on target-specific factors such as whether
there's an "and-not" instruction.

Differential Revision: https://reviews.llvm.org/D113212
2021-11-05 10:06:16 -04:00
Simon Pilgrim f2703c3c33 [DAG] FoldConstantArithmetic - rename NumOps -> NumElts. NFC.
NumOps represents the number of elements for vector constant folding, rename this NumElts so in future we can the consistently use NumOps to represent the number of operands of the opcode.

Minor cleanup before trying to begin generalizing FoldConstantArithmetic to support opcodes other than binops.
2021-11-05 13:32:34 +00:00
Simon Pilgrim c1e7911c3b [DAG] FoldConstantArithmetic - fold bitlogic(bitcast(x),bitcast(y)) -> bitcast(bitlogic(x,y))
To constant fold bitwise logic ops where we've legalized constant build vectors to a different type (e.g. v2i64 -> v4i32), this patch adds a basic ability to peek through the bitcasts and perform the constant fold on the inner operands.

The MVE predicate v2i64 regressions will be addressed by future support for basic v2i64 type support.

One of the yak shaving fixes for D113192....

Differential Revision: https://reviews.llvm.org/D113202
2021-11-05 12:00:59 +00:00
jacquesguan a39eadcf16 [DAGCombiner] Teach combineShiftToMULH to handle constant and const splat vector.
Fold (srl (mul (zext i32:$a to i64), i64:c), 32) -> (mulhu $a, $b),
if c can truncate to i32 without loss.

Reviewed By: frasercrmck, craig.topper, RKSimon

Differential Revision: https://reviews.llvm.org/D108129
2021-11-02 12:04:23 +00:00
Simon Pilgrim 325031786e [SelectionDAG] Optimize expansion for rotates/funnel shifts
If the type of a funnel shift needs to be expanded, expand it to two funnel shifts instead of regular shifts. For constant shifts, this doesn't make much difference, but for variable shifts it allows a more optimal lowering.

Also use the optimized funnel shift lowering for rotates.

Alive2: https://alive2.llvm.org/ce/z/TvHDB- / https://alive2.llvm.org/ce/z/yzPept

(Branched from D108058 as getting this completed should help unlock some other WIP patches).

Original Patch: @efriedma (Eli Friedman)

Differential Revision: https://reviews.llvm.org/D112443
2021-11-02 11:38:25 +00:00
Simon Pilgrim 37e17f278f [DAG] MatchRotate - remove (redundant) legal type check.
Rely on the hasOperation() instead - as commented on D77804, the mid-term intention is to recognise rotate/funnel-by-constant pre-legalization to help avoid SimplifyDemandedBits regressions.
2021-11-02 11:24:50 +00:00
Craig Topper ada5458521 [RISCV] Expand scalable vector bswap. Fix crash for bitreverse.
Fix LegalizeVectorOps to not try shuffle or unrolling expansions for
scalable vectors.

Differential Revision: https://reviews.llvm.org/D112236
2021-10-31 10:01:27 -07:00
Roman Lebedev 25043c8276
[NFCI] Introduce `ICmpInst::compare()` and use it where appropriate
As noted in https://reviews.llvm.org/D90924#inline-1076197
apparently this is a pretty common pattern,
let's not repeat it yet again, but have it in a common place.

There may be some more places where it could be used,
but these are the most obvious ones.
2021-10-30 17:50:06 +03:00
Fraser Cormack 8314a04ede [SelectionDAG] Allow FindMemType to fail when widening loads & stores
This patch removes an internal failure found in FindMemType and "bubbles
it up" to the users of that method: GenWidenVectorLoads and
GenWidenVectorStores. FindMemType -- renamed findMemType -- now returns
an optional value, returning None if no such type is found.

Each of the aforementioned users now pre-calculates the list of types it
will use to widen the memory access. If the type breakdown is not
possible they will signal a failure, at which point the compiler will
crash as it does currently.

This patch is preparing the ground for alternative legalization
strategies for vector loads and stores, such as using vector-predication
versions of loads or stores.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112000
2021-10-29 18:27:31 +01:00
Abinav Puthan Purayil db8d7b6e2d [DAGCombine][NFC] s/it's/its in the comment of hasNoInfs(). 2021-10-29 07:36:38 +05:30
Kerry McLaughlin f01fafdcd4 [SVE][CodeGen] Fix incorrect legalisation of zero-extended masked loads
PromoteIntRes_MLOAD always sets the extension type to `EXTLOAD`, which
results in a sign-extended load. If the type returned by getExtensionType()
for the load being promoted is something other than `NON_EXTLOAD`, we
should instead pass this to getMaskedLoad() as the extension type.

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D112320
2021-10-27 14:15:41 +01:00
Caroline Concatto 1137b7207d [SelectionDAG] Widening the result of INSERT_SUBVECTOR.
Widens the result and first input vector because they have the same size.
The subvector to be inserted is widened in the operand widen function.

Differential Revision: https://reviews.llvm.org/D112187
2021-10-27 13:52:25 +01:00
Craig Topper d51e3a2139 [LegalizeTypes][TargetLowering] Merge getShiftAmountTyForConstant into TargetLowering::getShiftAmountTy.
getShiftAmountTyForConstant is a special helper that changes the
shift amount to i32 if the type chosen by
TargetLowering::getShiftAmountTy can't represent all possible values.
This is needed to satisfy an assert in SelectionDAG::getNode.

It requires additional consideration to know when this helper should be used.
I'm not sure that we are always using it when we should.

This patch merges the getShiftAmountTyForConstant handling into
TargetLowering::getShiftAmountTy so we don't need to think about it
anymore.

Technically this may slightly increase compile times since the majority
of callers of getShiftAmountTy won't need this. Hopefully, this isn't
an issue in practice.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112469
2021-10-25 14:06:53 -07:00
Sanjay Patel 6e46b66e2a [DAGCombiner] make matching bit-hack form of usubsat more flexible
(i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128

As suggested in D112085, we can substitute 'xor' with 'add'
in this pattern, and it is logically equivalent:
https://alive2.llvm.org/ce/z/eJtWWC

We canonicalize to 'xor' in IR, but SDAG does not do that
(and it probably should not - https://llvm.org/PR52267 ), so
it is possible to see either pattern in codegen. Note that
'sub' is a another potential pattern, but that is
canonicalized to 'add' in DAGCombiner, so we don't need to
worry about that variation.

Differential Revision: https://reviews.llvm.org/D112377
2021-10-25 09:01:52 -04:00
Craig Topper 93139a3c32 [LegalizeTypes] Only expand CTLZ/CTTZ/CTPOP during type promotion if the new type is legal.
We might be promoting a large non-power of 2 type and the new type
may need to be split. Once we split it we may have a ctlz/cttz/ctpop
instruction for the split type.

I'm also concerned that we may create large shifts with shift amounts
that are too small.
2021-10-22 11:02:35 -07:00
Simon Pilgrim a5f56342b0 [DAG] narrowExtractedVectorLoad - EXTRACT_SUBVECTOR indices are always constant
EXTRACT_SUBVECTOR indices are always constant, we don't need to check for ConstantSDNode, we should just use getConstantOperandVal which will assert for the constant.
2021-10-22 18:32:14 +01:00
Craig Topper 04c184bba7 [TargetLowering] Simplify the interface of expandABS. NFC
Instead of returning a bool to indicate success and a separate
SDValue, return the SDValue and have the callers check if it is
null.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112331
2021-10-22 10:22:23 -07:00
Craig Topper 0766aef3f3 [LegalizeTypes][RISCV][PowerPC] Expand CTLZ/CTTZ/CTPOP instead of promoting if they'll be expanded later.
Expanding these requires multiple constants. If we promote during type
legalization when they'll end up getting expanded in LegalizeDAG, we'll
use larger constants. These constants may be harder to materialize.
For example, 64-bit constants on 64-bit RISCV are very expensive.

This is similar to what has already been done to BSWAP and BITREVERSE.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112268
2021-10-22 09:10:01 -07:00
Craig Topper 996123e5e8 [TargetLowering] Simplify the interface for expandCTPOP/expandCTLZ/expandCTTZ.
There is no need to return a bool and have an SDValue output
parameter. Just return the SDValue and let the caller check if it
is null.

I have another patch to add more callers of these so I thought
I'd clean up the interface first.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112267
2021-10-21 15:35:28 -07:00
Craig Topper ff37b1105d [LegalizeVectorOps][X86] Don't defer BITREVERSE expansion to LegalizeDAG.
By expanding early it allows the shifts to be custom lowered in
LegalizeVectorOps. Then a DAG combine is able to run on them before
LegalizeDAG handles the BUILD_VECTORS for the masks used.

v16Xi8 shift lowering on X86 requires a mask to be applied to a v8i16
shift. The BITREVERSE expansion applied an AND mask before SHL ops and
after SRL ops. This was done to share the same mask constant for both shifts.
It looks like this patch allows DAG combine to remove the AND mask added
after v16i8 SHL by X86 lowering. This maintains the mask sharing that
BITREVERSE was trying to achieve. Prior to this patch it looks like
we kept the mask after the SHL instead which required an extra constant
pool or a PANDN to invert it.

This is dependent on D112248 because RISCV will end up scalarizing the BSWAP
portion of the BITREVERSE expansion if we don't disable BSWAP scalarization in
LegalizeVectorOps first.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112254
2021-10-21 15:23:23 -07:00
Craig Topper 458ed5fcc3 [TargetLowering][RISCV] Prevent scalarization of fixed vector bswap.
It's better to do the ands, shifts, ors in the vector domain than
to scalarize it and do those operations on each element.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112248
2021-10-21 14:34:01 -07:00
Sanjay Patel d2198771e9 [DAGCombiner] fold bit-hack form of usubsat
(i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128

I haven't found a generalization of this identity:
https://alive2.llvm.org/ce/z/_sriEQ

Note: I was actually looking at the first form of the pattern in that link,
but that's part of a long chain of potential missed transforms in codegen
and IR....that I hope ends here!

The predicates for when this is profitable are a bit tricky. This version of
the patch excludes multi-use but includes custom lowering (as opposed to
legal only).

On x86 for example, we have custom lowering for some vector types, and that
uses umax and sub. So to enable that fold, we need add use checks to avoid
regressions. Even with legal-only lowering, we could see code with extra
reg move instructions for extra uses, so that constraint would have to be
eased very carefully to avoid penalties.

Differential Revision: https://reviews.llvm.org/D112085
2021-10-21 09:47:19 -04:00
Kerry McLaughlin 0d153df69e [SVE] Fix selection failure when splitting extended masked loads
When splitting a masked load, `GetDependentSplitDestVTs` is used to get the
MemVTs of the high and low parts. If the masked load is extended, this
may return VTs with different element types which are used to create the
high & low masked load instructions.
This patch changes `GetDependentSplitDestVTs` to ensure we return VTs with
the same element type.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D111996
2021-10-21 13:04:38 +01:00
Arthur Eubanks 6ea7437ca5 [SelectionDAG] Bail out of mergeTruncStores when not optimizing
With unoptimized code, we may see lots of stores and spend too much time in mergeTruncStores.

Fixes PR51827.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D111596
2021-10-20 16:58:22 -07:00
Craig Topper fe1f0de003 [RISCV][WebAssembly][TargetLowering] Allow expandCTLZ/expandCTTZ to rely on CTPOP expansion for vectors.
Our fallback expansion for CTLZ/CTTZ relies on CTPOP. If CTPOP
isn't legal or custom for a vector type we would scalarize the
CTLZ/CTTZ. This is different than CTPOP itself which would use a
vector expansion.

This patch teaches expandCTLZ/CTTZ to rely on the vector CTPOP
expansion instead of scalarizing. To do this I had to add additional
checks to make sure the operations used by CTPOP expansions are all
supported. Some of the operations were already needed for the CTLZ/CTTZ
expansion.

This is a huge improvement to the RISCV which doesn't have a scalar
ctlz or cttz in the base ISA.

For WebAssembly, I've added Custom lowering to keep the scalarizing
behavior. I've also extended the scalarizing to CTPOP.

Differential Revision: https://reviews.llvm.org/D111919
2021-10-20 07:46:41 -07:00
Sander de Smalen be6c8dc765 [SelectionDAG] Fix getVectorSubVecPointer for scalable subvectors.
When inserting a scalable subvector into a scalable vector through
the stack, the index to store to needs to be scaled by vscale.
Before this patch, that didn't yet happen, so it would generate the
wrong offset, thus storing a subvector to the incorrect address
and overwriting the wrong lanes.

For some insert:
  nxv8f16 insert_subvector(nxv8f16 %vec, nxv2f16 %subvec, i64 2)

The offset was not scaled by vscale:
  orr     x8, x8, #0x4
  st1h    { z0.h }, p0, [sp]
  st1h    { z1.d }, p1, [x8]
  ld1h    { z0.h }, p0/z, [sp]

And is changed to:
  mov x8, sp
  st1h { z0.h }, p0, [sp]
  st1h { z1.d }, p1, [x8, #1, mul vl]
  ld1h { z0.h }, p0/z, [sp]

Differential Revision: https://reviews.llvm.org/D111633
2021-10-20 13:55:24 +01:00
Simon Pilgrim 71e39e3f18 [ADT] Add APInt::isNegatedPowerOf2() helper
Inspired by D111968, provide a isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos.....

Differential Revision: https://reviews.llvm.org/D111998
2021-10-19 14:38:21 +01:00
Sanjay Patel 2a3cc4d461 [Analysis] add utility function for unary shuffle mask creation
This is NFC-intended for the callers. Posting in case there are
other potential users that I missed.
I would also use this from VectorCombine in a patch for:
https://llvm.org/PR52178 ( D111901 )

Differential Revision: https://reviews.llvm.org/D111891
2021-10-18 09:00:39 -04:00
Fraser Cormack 3d850d03ae [SelectionDAG] Fix illegal widening of scalable-vector loads
The process of widening simple vector loads attempts to use a load of a
wider vector type if the original load is sufficiently aligned to avoid
memory faults.

However this optimization is only legal when performed on fixed-length
vector types. For scalable vector types this is invalid (unless vscale
happens to be 1).

This patch does increase the likelihood of compiler crashes (from
`FindMemType` failing to find a suitable type) but this now better
matches how widening non-simple loads, insufficiently-aligned loads, and
scalable-vector stores are handled.

Patches will be introduced later by which loads and stores can be
widened on targets with support for masked or predicated operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111885
2021-10-18 10:00:00 +01:00
Mingming Liu cfd155c41b [SelectionDAG] Fix typo in option help
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D111867
2021-10-15 11:27:40 -07:00
Dávid Bolvanský 6678db00e6 [X86] Enable promotion of i16 popcnt (PR52056)
Solves https://bugs.llvm.org/show_bug.cgi?id=52056

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111507
2021-10-15 15:41:37 +02:00
Kazu Hirata 81e9c90686 [llvm] Use llvm::is_contained (NFC) 2021-10-14 22:44:09 -07:00
Simon Pilgrim 88487662f7 [Codegen] TargetLowering::getCanonicalIndexType - early out scaled MVT::i8 indices. NFCI.
Avoids unused assignment scan-build warning.
2021-10-14 13:08:40 +01:00
Kerry McLaughlin 1a2e90199f [SVE][CodeGen] Add patterns for ADD/SUB + element count
This patch adds patterns to match the following with INC/DEC:
 - @llvm.aarch64.sve.cnt[b|h|w|d] intrinsics + ADD/SUB
 - vscale + ADD/SUB

For some implementations of SVE, INC/DEC VL is not as cheap as ADD/SUB and
so this behaviour is guarded by the "use-scalar-inc-vl" feature flag, which for SVE
is off by default. There are no known issues with SVE2, so this feature is
enabled by default when targeting SVE2.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D111441
2021-10-13 11:36:15 +01:00
Roman Lebedev 684cbae89a
[KnownBits] Introduce `countMaxActiveBits()` and use it in a few places 2021-10-11 23:36:06 +03:00
Bradley Smith 7c68d4b8ff Revert "[SelectionDAG] Remove PromoteIntOp_EXTRACT_SUBVECTOR."
This reverts commit 3e8d2008f7.

The code removed in this commit is actually required for extracting
fixed types from illegal scalable types, hence this commit causes
assertion failures in such extracts.
2021-10-08 14:53:26 +00:00
Itay Bookstein faa0e2ae76 [SelectionDAG] Fix shift libcall ABI mismatch in shift-amount argument
The shift libcalls have a shift amount parameter of MVT::i32, but
sometimes ExpandIntRes_Shift may be called with a node whose
second operand is a type that is larger than that. This leads to
an ABI mismatch, and for example causes a spurious zeroing of
a register in RV32 for 64-bit shifts. Note that at present regular
shift intstructions already have their shift amount operand adapted
at SelectionDAGBuilder::visitShift time, and funnelled shifts bypass that.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110508
2021-10-08 09:57:57 +08:00
Wang, Pengfei c236883b6b [X86] Optimize fdiv with reciprocal instructions for half type
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110557
2021-10-08 09:41:13 +08:00
Bradley Smith 5be266db7a [AArch64][SVE] Improve VECTOR_SPLICE codegen for VL > 128-bit
Differential Revision: https://reviews.llvm.org/D111135
2021-10-07 15:28:55 +00:00
Jack Andersen bd4dad87f4 [MachineInstr] Move MIParser's DBG_VALUE RegState::Debug invariant into MachineInstr::addOperand
Based on the reasoning of D53903, register operands of DBG_VALUE are
invariably treated as RegState::Debug operands. This change enforces
this invariant as part of MachineInstr::addOperand so that all passes
emit this flag consistently.

RegState::Debug is inconsistently set on DBG_VALUE registers throughout
LLVM. This runs the risk of a filtering iterator like
MachineRegisterInfo::reg_nodbg_iterator to process these operands
erroneously when not parsed from MIR sources.

This issue was observed in the development of the llvm-mos fork which
adds a backend that relies on physical register operands much more than
existing targets. Physical RegUnit 0 has the same numeric encoding as
$noreg (indicating an undef for DBG_VALUE). Allowing debug operands into
the machine scheduler correlates $noreg with RegUnit 0 (i.e. a collision
of register numbers with different zero semantics). Eventually, this
causes an assert where DBG_VALUE instructions are prohibited from
participating in live register ranges.

Reviewed By: MatzeB, StephenTozer

Differential Revision: https://reviews.llvm.org/D110105
2021-10-07 16:08:52 +01:00
Simon Pilgrim 21661607ca [llvm] Replace report_fatal_error(std::string) uses with report_fatal_error(Twine)
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
2021-10-06 12:04:30 +01:00
David Sherwood 37edb7d3e2 [SVE] Fix incorrect DAG combines when extracting fixed-width from scalable vectors
We were previously silently generating incorrect code when extracting a
fixed-width vector from a scalable vector. This is worse than crashing,
since the user will have no indication that this is currently unsupported
behaviour. I have fixed the code to only perform DAG combines when safe
to do so, i.e. the input and output vectors are both fixed-width or
both scalable.

Test added here:

  CodeGen/AArch64/sve-extract-scalable-vector.ll

Differential revision: https://reviews.llvm.org/D110624
2021-10-06 09:27:44 +01:00
Simon Pilgrim 2e5daac217 [llvm] Update report_fatal_error calls from raw_string_ostream to use Twine(OS.str())
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.

We can use the raw_string_ostream::str() method to perform the implicit flush() and return a reference to the std::string container that we can then wrap inside Twine().
2021-10-05 18:42:12 +01:00
Bjorn Pettersson 8ed0e6b2cf [SelectionDAG] Replace error prone index check in BaseIndexOffset::computeAliasing
Deriving NoAlias based on having the same index in two BaseIndexOffset
expressions seemed weird (and as shown in the added unittest the
correctness of doing so depended on undocumented pre-conditions that
the user of BaseIndexOffset::computeAliasing would need to take care
of.

This patch removes the code that dereived NoAlias based on indices
being the same. As a compensation, to avoid regressions/diffs in
various lit test, we also add a new check. The new check derives
NoAlias in case the two base pointers are based on two different
GlobalValue:s (neither of them being a GlobalAlias).

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D110256
2021-10-05 12:15:55 +02:00
Bjorn Pettersson 1896fb2cff [SelectionDAG] Assume that a GlobalAlias may alias other global values
This fixes a bug detected in DAGCombiner when using global alias
variables. Here is an example:
  @foo = global i16 0, align 1
  @aliasFoo = alias i16, i16 * @foo
  define i16 @bar() {
    ...
    store i16 7, i16 * @foo, align 1
    store i16 8, i16 * @aliasFoo, align 1
    ...
  }

BaseIndexOffset::computeAliasing would incorrectly derive NoAlias
for the two accesses in the example above, resulting in DAGCombiner
miscompiles.

This patch fixes the problem by a defensive approach letting
BaseIndexOffset::computeAliasing return false, i.e. that the aliasing
couldn't be determined, when comparing two global values and at least
one is a GlobalAlias. In the future we might improve this with a
deeper analysis to look at the aliasee for the GlobalAlias etc. But
that is a bit more complicated considering that we could have
'local_unnamed_addr' and situations with several 'alias' variables.

Fixes PR51878.

Differential Revision: https://reviews.llvm.org/D110064
2021-10-05 12:15:55 +02:00
Amara Emerson cfef1803dd [GlobalISel] Port over the SelectionDAG stack protector codegen feature.
This is a port of the feature that allows the StackProtector pass to omit
checking code for stack canary checks, and rely on SelectionDAG to do it at a
later stage. The reasoning behind this seems to be to prevent the IR checking
instructions from hindering tail-call optimizations during codegen.

Here we allow GlobalISel to also use that scheme. Doing so requires that we
do some analysis using some factored-out code to determine where to generate
code for the epilogs.

Not every case is handled in this patch since we don't have support for all
targets that exercise different stack protector schemes.

Differential Revision: https://reviews.llvm.org/D98200
2021-10-04 21:33:44 -07:00
Jay Foad a9bceb2b05 [APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.

Differential Revision: https://reviews.llvm.org/D110807
2021-10-04 08:57:44 +01:00
Kazu Hirata d34cd75d89 [Analysis, CodeGen] Migrate from arg_operands to args (NFC)
Note that arg_operands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-10-03 08:22:20 -07:00
Simon Pilgrim df672f66b6 [DAG] scalarizeExtractedVectorLoad - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit extracted loads to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.

I've also cleaned up the alignment calculation code - if we have a constant extraction index then the alignment can be based on an offset from the original vector load alignment, but for non-constant indices we should assume the worst (single element alignment only).

Differential Revision: https://reviews.llvm.org/D110486
2021-10-01 21:07:34 +01:00
Sander de Smalen b62e6f19d7 [SelectionDAG] Handle promotion + widening in getCopyToPartsVector
Some vectors require both widening and promotion for their legalization.
This case is not yet handled in getCopyToPartsVector and falls back
on scalarizing by default. BBecause scalable vectors can't easily be
scalarised, we need to implement this in two separate stages:
1. Widen the vector.
2. Promote the vector.

As part of this patch, PromoteIntRes_CONCAT_VECTORS also needed to be
made scalable aware. Instead of falling back on scalarizing the vector
(fixed-width only), each sub-part of the CONCAT vector is promoted,
and the operation is performed on the type with the widest element type,
finally truncating the result to the promoted result type.

Differential Revision: https://reviews.llvm.org/D110646
2021-10-01 08:19:47 +01:00
Christopher Tetreault 3077bc90de [NFC] Restore magic and magicu to a globally visible location
While these functions are only used in one location in upstream,
it has been reused in multiple downstreams. Restore this file to
a globally visibile location (outside of APInt.h) to eliminate
donwstream breakage and enable potential future reuse.

Additionally, this patch renames types and cleans up
clang-tidy issues.
2021-09-30 17:43:12 -07:00
Sander de Smalen 6709b193ea [SelectionDAG] Make WidenVecRes_EXTRACT_SUBVECTOR work for scalable vectors.
The legalizer handles this by breaking up an EXTRACT_SUBVECTOR into
smaller parts, and combines those together, padding the result with
UNDEF vectors, e.g.

  nxv6i64 extract_subvector(nxv12i64, 6)
  <->
  nxv8i64 concat(
    nxv2i64 extract_subvector(nxv16i64, 6)
    nxv2i64 extract_subvector(nxv16i64, 8)
    nxv2i64 extract_subvector(nxv16i64, 10)
    nxv2i64 undef)

Reviewed By: frasercrmck, david-arm

Differential Revision: https://reviews.llvm.org/D110253
2021-09-29 11:33:45 +01:00
Itay Bookstein 7255ce30e4 [SelectionDAG] Fix incorrect condition for shift amount truncation
Comment says:
  // If the operand is larger than the shift count type but the shift
  // count type has enough bits to represent any shift value ...

It clearly talks about the shifted operand, not the shift-amount operand,
but the comparison is performed against Log2_32_Ceil(Op2.getValueSizeInBits())
where Op2 is the shift amount operand. This comparison also doesn't make
sense in the context of the previous one (ShiftsSize > Op2Size) because
Op2Size == Op2.getValueSizeInBits(). Fix to use Op1.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110509
2021-09-28 17:52:30 -07:00
Arthur Eubanks aa53785f23 Reland [clang] Rework dontcall attributes
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).

One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.

Previous revisions didn't properly declare the new dependencies.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D110364
2021-09-28 15:31:30 -07:00
Arthur Eubanks 7833d20f1f Revert "[clang] Rework dontcall attributes"
This reverts commit 2943071e2e.

Breaks bots
2021-09-28 14:49:27 -07:00
Arthur Eubanks 2943071e2e [clang] Rework dontcall attributes
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).

One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D110364
2021-09-28 14:21:10 -07:00
Xiang1 Zhang ebe9944a34 [ISel] Legalized arithmetic.fence.f128 for 32-bits target
Reviewed By: Craig Topper, Wang Pengfei

Differential Revision: https://reviews.llvm.org/D110467
2021-09-28 10:27:25 +08:00
Fraser Cormack e2b46e336b [DAGCombiner][VP] Fold zero-length or false-masked VP ops
This patch adds a generic DAGCombine for vector-predicated (VP) nodes.
Those for which we can determine that no vector element is active can be
replaced by either undef or, for reductions, the start value.

This is tested rather trivially at the IR level, where it's possible
that we want to teach instcombine to perform this optimization.

However, we can also see the zero-evl case arise during SelectionDAG
legalization, when wide VP operations can be split into two and the
upper operation emerges as trivially false.

It's possible that we could perform this optimization "proactively"
(both on legal vectors and before splitting) and reduce the width of an
operation and insert it into a larger undef vector:

```
v8i32 vp_add x, y, mask, 4
->
v8i32 insert_subvector (v8i32 undef), (v4i32 vp_add xsub, ysub, mask, 4), i32 0
```

This is somewhat analogous to similar vector narrow/widening
optimizations, but it's unclear at this point whether that's beneficial
to do this for VP ops for any/all targets.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D109148
2021-09-27 11:30:09 +01:00
Simon Pilgrim 18c8ed5416 [DAG] ReduceLoadOpStoreWidth - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit store narrowing to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory access by checking allowsMisalignedMemoryAccesses as a fallback.
2021-09-25 18:35:57 +01:00
Simon Pilgrim 6bd5b1b1ce [DAG] combineShiftToMULH - move getValueType() inside assert. NFCI.
Avoids an unnecessary (void).
2021-09-25 11:56:35 +01:00
Simon Pilgrim 2a5936faf0 [CodeGen] ProcessSDDbgValues - use const-ref value in for-range loop. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-23 12:23:46 +01:00
Fraser Cormack e7c879a69d [RISCV][VP] Add support for VP_REDUCE_* operations
This patch adds codegen support for lowering the vector-predicated
reduction intrinsics to RVV instructions. The process is similar to that
of the other reduction intrinsics, save for the fact that every VP
reduction has a start value. We reuse the existing custom "VL" nodes,
adding extra patterns where required to handle non-true masks.

To support these nodes, the `RISCVISD::VECREDUCE_*_VL` nodes have been
given an explicit "merge" operand. This is to faciliate the VP
reductions, where we must be careful to ensure that even if no operation
is performed (when VL=0) we still produce the start value. The RVV
reductions don't update the destination register under these conditions,
so we tie the splatted start value to the output register.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107657
2021-09-23 11:11:05 +01:00
Bjorn Pettersson c3ae8ecb52 [DAGCombiner] Rename isAlias as mayAlias. NFC
Differential Revision: https://reviews.llvm.org/D110062
2021-09-23 09:54:42 +02:00
Sander de Smalen 3e8d2008f7 [SelectionDAG] Remove PromoteIntOp_EXTRACT_SUBVECTOR.
This code seems untested and is likely obsolete, because this case
should already be handled by the code that legalizes the result type
of EXTRACT_SUBVECTOR.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110061
2021-09-22 14:23:35 +01:00
Sander de Smalen d5681f1d68 [SelectionDAG] Add PromoteIntOp_INSERT_SUBVECTOR.
This is required to codegen something like:
  <vscale x 8 x i16> @llvm.experimental.vector.insert(<vscale x 8 x i16> %vec,
                                                      <vscale x 2 x i16> %subvec,
                                                      i64 %idx)
where the output vector is legal, but the input vector needs promoting.

It implements this by performing the whole operation on the promoted type,
and then truncating the result.

Reviewed By: david-arm, craig.topper

Differential Revision: https://reviews.llvm.org/D110059
2021-09-22 13:32:36 +01:00
Sander de Smalen 4ca1fbe361 [SelectionDAG] Make WidenVecRes_Convert work for scalable vectors.
Most of the code wasn't yet scalable safe, although most of the
code conceptually just works for scalable vectors. This change
makes the algorithm work on ElementCount, where appropriate,
and leaves the fixed-width only code to use `getFixedNumElements`.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D110058
2021-09-22 10:58:38 +01:00
Michael Liao 5fb3ae525f [SelectionDAG] Re-calculate scoped AA metadata when merging stores.
Reviewed By: jeroen.dobbelaere

Differential Revision: https://reviews.llvm.org/D102821
2021-09-21 11:41:17 -04:00
Simon Pilgrim 20b58855e0 [CodeGen] SelectionDAGBuilder - Use const-ref iterator in for-range loops. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-21 13:01:08 +01:00
Simon Pilgrim 0f83456cf5 [CodeGen] SDDbgValue::getSDNodes() - use const-ref to avoid unnecessary copies. NFCI.
Reported by MSVC static analyzer.
2021-09-21 13:01:08 +01:00
Kazu Hirata 84b07c9b3a [llvm] Use pop_back_val (NFC) 2021-09-19 13:44:23 -07:00
Nikita Popov 0fc624f029 [IR] Return AAMDNodes from Instruction::getMetadata() (NFC)
getMetadata() currently uses a weird API where it populates a
structure passed to it, and optionally merges into it. Instead,
we can return the AAMDNodes and provide a separate merge() API.
This makes usages more compact.

Differential Revision: https://reviews.llvm.org/D109852
2021-09-16 21:06:57 +02:00
Matt Arsenault 54d755a034 DAG: Fix incorrect folding of fmul -1 to fneg
The fmul is a canonicalizing operation, and fneg is not so this would
break denormals that need flushing and also would not quiet signaling
nans. Fold to fsub instead, which is also canonicalizing.
2021-09-14 21:25:02 -04:00
Simon Pilgrim 9db20822f7 [APInt] Add APIntOps::ScaleBitMask helper
APInt is used to describe a bit mask in a variety of value tracking and demanded bits/elts functions.

When traversing through dst/src operands, we have a number of places where these masks need to widened/narrowed to translate through bitcasts, reductions etc. to a different type.

This patch add a APIntOps::ScaleBitMask common helper, adds unit test coverage, and updates a number of cases to use the the helper instead of their own implementation.

This came up on D109065 where we currently have to add yet another implementation of the same code.

Differential Revision: https://reviews.llvm.org/D109683
2021-09-13 16:27:12 +01:00
vnalamot 0fc3ebb70a [SelectionDAG][NFC] Fix typo in VerifyDAGDiverence() function name
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D109674
2021-09-13 20:48:04 +05:30
David Truby 915e9e76bf [llvm][sve] Lowering for VLS masked extending loads
This extends the custom lowering for extending loads on
fixed length vectors in SVE to support masked extending loads.

The existing tests for correct behaviour of masked extending loads
exhibit bad code generation due to the legalistaion of i1 vectors.
They have been left as-is and new tests have been added that do not
exhibit this behaviour.

Differential Revision: https://reviews.llvm.org/D108200
2021-09-13 11:13:25 +01:00
Sander de Smalen ec7d8d5069 [SelectionDAG] PromoteIntRes_EXTRACT_SUBVECTOR for scalable vectors (widening).
This patch implements legalization of EXTRACT_SUBVECTOR for the case
where the result needs promoting, and the input type requires widening.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109509
2021-09-10 13:29:26 +01:00
Sander de Smalen 801a745dd2 [SelectionDAG] PromoteIntRes_EXTRACT_SUBVECTOR for scalable vectors.
This patch implements legalization of EXTRACT_SUBVECTOR for the case
where the result needs promoting, and the input type is either legal
or requires splitting.

The idea is that the operation is broken down into simpler steps,
by first extracting a smaller subvector until the input vector
becomes legal or requires promotion.

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D109313
2021-09-10 13:29:26 +01:00
Craig Topper 9af8f1b18e [SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode.
Soft deprecrate isNullValue/isAllOnesValue and update in tree
callers. This matches the changes to the APInt interface from
D109483.

Reviewed By: lattner

Differential Revision: https://reviews.llvm.org/D109535
2021-09-09 13:28:30 -07:00
Craig Topper 517728fe1e [SelectionDAG] Use DAG.getNOT to further simplify some code. NFC
Followup to D109483
2021-09-09 10:53:39 -07:00
Nick Desaulniers e69d402088 [NFC] rename member of BitTestBlock and JumpTableHeader
Follow up to suggestions in D109103 via hans:
  I think UnreachableDefault (or UnreachableFallthrough) would be a
  better name now, since it doesn't just omit the range check, it also
  omits the last bit test.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D109455
2021-09-09 10:43:00 -07:00
Chris Lattner d51da74889 [CodeGen] Use DAG.getAllOnesConstant where possible to simplify code. NFC. 2021-09-09 10:22:51 -07:00
Chris Lattner 735f46715d [APInt] Normalize naming on keep constructors / predicate methods.
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`.  This achieves two things:

1) This starts standardizing predicates across the LLVM codebase,
   following (in this case) ConstantInt.  The word "Value" doesn't
   convey anything of merit, and is missing in some of the other things.

2) Calling an integer "null" doesn't make any sense.  The original sin
   here is mine and I've regretted it for years.  This moves us to calling
   it "zero" instead, which is correct!

APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go.  As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.

Included in this patch are changes to a bunch of the codebase, but there are
more.  We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.

Differential Revision: https://reviews.llvm.org/D109483
2021-09-09 09:50:24 -07:00
Chris Lattner 9e46dd965a [APInt.h] Reduce the APInt header file interface a bit. NFC
This moves one mid-size function out of line, inlines the
trivial tcAnd/tcOr/tcXor/tcComplement methods into their only
caller, and moves the magic/umagic functions into SelectionDAG
since they are implementation details of its algorithm. This
also removes the unit tests for magic, but these are already
tested in the divide lowering logic for various targets.

This also upgrades some C style comments to C++.

Differential Revision: https://reviews.llvm.org/D109476
2021-09-08 18:17:07 -07:00
Nick Desaulniers 4331f19d8b [ISEL][BitTestBlock] omit additional bit test when default destination is unreachable
Otherwise we end up with an extra conditional jump, following by an
unconditional jump off the end of a function. ie.

  bb.0:
    BT32rr ..
    JCC_1 %bb.4 ...
  bb.1:
    BT32rr ..
    JCC_1 %bb.2 ...
    JMP_1 %bb.3
  bb.2:
    ...
  bb.3.unreachable:
  bb.4:
    ...

  Should be equivalent to:
  bb.0:
    BT32rr ..
    JCC_1 %bb.4 ...
    JMP_1 %bb.2
  bb.1:
  bb.2:
    ...
  bb.3.unreachable:
  bb.4:
    ...

This can occur since at the higher level IR (Instruction) SwitchInsts
are required to have BBs for default destinations, even when it can be
deduced that such BBs are unreachable.

For most programs, this isn't an issue, just wasted instructions since the
unreachable has been statically proven.

The x86_64 Linux kernel when built with CONFIG_LTO_CLANG_THIN=y fails to
boot though once D106056 is re-applied.  D106056 makes it more likely
that correlation-propagation (CVP) can deduce that the default case of
SwitchInsts are unreachable. The x86_64 kernel uses a binary post
processor called objtool, which emits this warning:

vmlinux.o: warning: objtool: cfg80211_edmg_chandef_valid()+0x169: can't
find jump dest instruction at .text.cfg80211_edmg_chandef_valid+0x17b

I haven't debugged precisely why this causes a failure at boot time, but
fixing this very obvious jump off the end of the function fixes the
warning and boot problem.

Link: https://bugs.llvm.org/show_bug.cgi?id=50080
Fixes: https://github.com/ClangBuiltLinux/linux/issues/679
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1440

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D109103
2021-09-08 11:03:47 -07:00
David Green d8d24c64fe [DAG] Fix GT -> GE condition when creating SetCC
79845ed6df folded some setcc(ashr) conditions to setcc, but got
the condition for NE incorrect, using GT where it should be using GE.
2021-09-08 12:41:51 +01:00
Fraser Cormack 2c5568a6a9 [LegalizeTypes][VP] Add promotion support for binary VP ops
This patch extends the preliminary support for vector-predicated (VP)
operation legalization to include promotion of illegal integer vector
types.

Integer promotion of binary VP operations is relatively simple and
piggy-backs on the non-VP logic, but passing the two extra mask and VP
operands through to the promoted operation.

Tests have been added to the RISC-V target to cover the basic scenarios
for integer promotion for both fixed- and scalable-vector types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108288
2021-09-08 10:22:57 +01:00
Fraser Cormack a823bdf3ab [RISCV][VP] Custom lower VP_STORE and VP_LOAD
This patch adds support for the vector-predicated `VP_STORE` and
`VP_LOAD` nodes. We do this in the same way we lower `MSTORE` and
`MLOAD`: to regular load/store instructions via intrinsics.

One necessary change was made to `SelectionDAGLegalize` so that
`VP_STORE` nodes' operation actions are taken from the stored "value"
operands, in the same vein as `STORE` or `MSTORE`.

Reviewed By: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D108999
2021-09-07 10:53:25 +01:00
Fraser Cormack f4dee8cb82 [RISCV][VP] Custom lower VP_SCATTER and VP_GATHER
This patch adds support for the `VP_SCATTER` and `VP_GATHER` nodes by
lowering them to RVV's `vsox`/`vlux` instructions, respectively. This
process is almost identical to the existing `MSCATTER`/`MGATHER` support.

One extra change was made to `SelectionDAGLegalize` so that
`VP_SCATTER`'s operation action is derived from its stored "value"
operand rather than its return type (which is always the chain).

Reviewed By: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D108987
2021-09-07 10:43:07 +01:00
Sanjay Patel e1e4bf174b [DAGCombine] Prevent the transform of combine for multi-use operand
The test is based on a miscompile example in:
https://llvm.org/PR51321

Differential Revision: https://reviews.llvm.org/D107692
2021-09-06 15:30:32 -04:00
Jonas Paulsson 118997d8e9 [SelectionDAGBuilder] Bugfix in visitInlineAsm()
In case of a virtual register tied to a phys-def, the register class needs to
be computed. Make sure that this works generally also with fast regalloc by
using TLI.getRegClassFor() whenever possible, and make only the case of
'Untyped' use getMinimalPhysRegClass().

Fixes https://bugs.llvm.org/show_bug.cgi?id=51699.

Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D109291
2021-09-06 17:46:31 +02:00
David Green 1b83aaaefa [DAG] Remove oneuse check in select_cc setgt X, -1, C, ~C fold
This appears to produce better code, even if the condition may need to
be replicated.
2021-09-05 16:18:31 +01:00
David Green 8523fb96a6 [DAG] Fold select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C
Given a select_cc producing a constant and a invertion of the constant
for a comparison more than zero, we can produce an xor with ashr
instead, which produces smaller code. The ashr either sets all bits or
clear all bits depending on if the value is negative. This is then xor'd
with the constant to optionally negate the value.
https://alive2.llvm.org/ce/z/DTFaBZ

This includes a OneUseCheck on the Cmp, which seems to make thinks a
little worse and will be removed in a followup.

Differential Revision: https://reviews.llvm.org/D109149
2021-09-05 16:04:01 +01:00
David Green 79845ed6df [DAG] Fold setcc eq with ashr to compare to zero.
Pulled out of D109149, this folds set_cc seteq (ashr X, BW-1), -1 ->
set_cc setlt X, 0 to prevent some regressions later on when folding
select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C

Differential Revision: https://reviews.llvm.org/D109214
2021-09-05 14:06:47 +01:00
Heejin Ahn 28780e59f6 [WebAssembly] Add Wasm SjLj support
This add support for SjLj using Wasm exception handling instructions:
https://github.com/WebAssembly/exception-handling/blob/master/proposals/exception-handling/Exceptions.md

This does not yet support the mixed use of EH and SjLj within a
function. It will be added in a follow-up CL.

This currently passes all SjLj Emscripten tests for wasm0/1/2/3/s,
except for the below:
- `test_longjmp_standalone`: Uses Node
- `test_dlfcn_longjmp`: Uses NodeRAWFS
- `test_longjmp_throw`: Mixes EH and SjLj
- `test_exceptions_longjmp1`: Mixes EH and SjLj
- `test_exceptions_longjmp2`: Mixes EH and SjLj
- `test_exceptions_longjmp3`: Mixes EH and SjLj

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D108960
2021-09-02 10:51:02 -07:00
Roman Lebedev 3f1f08f0ed
Revert @llvm.isnan intrinsic patchset.
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)

TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.

While the end result of discussion may lead back to the current design,
it may also not lead to the current design.

Therefore i take it upon myself
to revert the tree back to last known good state.

This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
2021-09-02 13:53:56 +03:00
Fraser Cormack ef78f2106c [LegalizeTypes][VP] Add splitting support for binary VP ops
This patch extends D107904's introduction of vector-predicated (VP)
operation legalization to include vector splitting.

When the result of a binary VP operation needs splitting, all of its
operands are split in kind. The two operands and the mask are split as
usual, and the vector-length parameter EVL is "split" such that the low
and high halves each execute the correct number of elements.

Tests have been added to the RISC-V target to show splitting several
scenarios for fixed- and scalable-vector types. Without support for
`umax` (e.g. in the `B` extension) the generated code starts to branch.
Ideally a cost model would prevent their insertion in the first place.

Through these tests many opportunities for better codegen can be seen:
combining known-undef VP operations and for constant-folding operations
on `ISD::VSCALE`, to name but a few.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D107957
2021-09-02 10:15:53 +01:00
Abinav Puthan Purayil 0baace5379 [DAGCombine] Add node level checks for fp-contract and fp-ninf in visitFMULForFMADistributiveCombine().
Differential Revision: https://reviews.llvm.org/D107551
2021-09-02 11:33:14 +05:30
Roman Lebedev f5753125f0
[Codegen][TLI][X86] SimplifyMultipleUseDemandedBits(): 0'th vec subreg widening is free, try to perform it earlier
I believe, the profitability reasoning here is correct
"sub"reg is already located within the 0'th subreg of wider reg,
so if we have suvector insertion at index 0 into undef,
then it's always free do to.

After this, D109065 finally avoids the regression in D108382.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D109074
2021-09-02 00:54:05 +03:00
Fraser Cormack 85fd44d7fe [SelectionDAG][NFC] Fix typo in assertion message
s/Uexpected/Unexpected.
2021-09-01 08:55:06 +01:00
Hussain Kadhem 524ded7d01 [VP] implementation of sdag support for VP memory intrinsics
Followup to D99355: SDAG support for vector-predicated load/store/gather/scatter.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D105871
2021-08-31 17:01:50 +02:00
Craig Topper 201f6446da [LegalizeTypes][X86] Improve ExpandIntRes_FP_TO_SINT/ExpandIntRes_FP_TO_UINT when input is SoftPromoteHalf.
Instead of splitting off the fp16 to float conversion and generating
a libcall, we should split the operation into fp16 to float and float
to integer operations. This will allow the float to integer conversion
to go through any custom handling the target has. If the target doesn't
have custom handling then we should come back to ExpandIntRes_FP_TO_SINT/
ExpandIntRes_FP_TO_UINT automatically to create the libcall.

This avoids generating libcalls on 32-bit X86. These library functions may
not exist in 32-bit libgcc. At least for LLVM, we never generate them when
hardware floating point instructions are available.

Differential Revision: https://reviews.llvm.org/D108933
2021-08-30 13:12:59 -07:00
Bjorn Pettersson 789f01283d [SelectionDAG] Fix miscompile bugs related to smul.fix.sat with scale zero
When expanding a SMULFIXSAT ISD node (usually originating from
a smul.fix.sat intrinsic) we've applied some optimizations for
the special case when the scale is zero. The idea has been that
it would be cheaper to use an SMULO instruction (if legal) to
perform the multiplication and at the same time detect any overflow.
And in case of overflow we could use some SELECT:s to replace the
result with the saturated min/max value. The only tricky part
is to know if we overflowed on the min or max value, i.e. if the
product is positive or negative. Unfortunately the implementation
has been incorrect as it has looked at the product returned by the
SMULO to determine the sign of the product. In case of overflow that
product is truncated and won't give us the correct sign bit.

This patch is adding an extra XOR of the multiplication operands,
which is used to determine the sign of the non truncated product.

This patch fixes PR51677.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D108938
2021-08-30 22:08:26 +02:00
Craig Topper 705d005781 [DAGCombiner][RISCV] Don't use vector types in DAGCombiner::tryStoreMergeOfLoads if we need a rotate.
The check for whether a rotate is possible occurs before the
memory legality checks for the integer type. So it's possible we
decide we can use a rotate, but then fail the legality checks. If
that happens we should not fall back to a vector type. This triggers
an assertion in the rotate handling when it finds a vector type
instead of an integer type.

In theory we could use a shufflevector in place of the rotate, but
right now I'd just like to fix the crash.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D108839
2021-08-30 08:47:15 -07:00
Carl Ritson 5d9de3ea18 [DAGCombine] Allow FMA combine with both FMA and FMAD
Without this change only the preferred fusion opcode is tested
when attempting to combine FMA operations.
If both FMA and FMAD are available then FMA ops formed prior to
legalization will not be merged post legalization as FMAD becomes
the preferred fusion opcode.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108619
2021-08-27 19:49:35 +09:00
Craig Topper 8bb24289f3 [SelectionDAG] Optimize bitreverse expansion to minimize the number of mask constants.
We can halve the number of mask constants by masking before shl
and after srl.

This can reduce the number of mov immediate or constant
materializations. Or reduce the number of constant pool loads
for X86 vectors.

I think we might be able to do something similar for bswap. I'll
look at it next.

Differential Revision: https://reviews.llvm.org/D108738
2021-08-26 09:33:24 -07:00
Sanjay Patel e728d1a3e8 [DAGCombiner] create binop nodes with all of expected values
This is another bug exposed by https://llvm.org/PR51612
(and the one that triggered the initial assertion) in the report.

That example was suppressed with:
985b48f183

...but these would still crash because we created nodes
like UADDO without the expected 2 output values.
2021-08-25 16:14:22 -04:00
Sanjay Patel 985b48f183 [DAGCombiner] check uses more strictly on select-of-binop fold
There are 2 bugs here:
1. We were not checking uses of operand 2 (the false value of the select).
2. We were not checking for multiple uses of nodes that produce >1 result.

Correcting those is enough to avoid the crash in the reduced test based on:
https://llvm.org/PR51612

The additional use check on operand 0 (the condition value of the select)
should not strictly be necessary because we are only replacing one use
with another (whether it makes performance sense to do the transform with
that pattern is not clear). But as noted in the TODO, changing that
uncovers another bug.

Note: there's at least one more bug here - we aren't propagating EVTs
correctly, but I plan to fix that in another patch.
2021-08-25 14:14:41 -04:00
Nick Desaulniers 846e562dcc [Clang] add support for error+warning fn attrs
Add support for the GNU C style __attribute__((error(""))) and
__attribute__((warning(""))). These attributes are meant to be put on
declarations of functions whom should not be called.

They are frequently used to provide compile time diagnostics similar to
_Static_assert, but which may rely on non-ICE conditions (ie. relying on
compiler optimizations). This is also similar to diagnose_if function
attribute, but can diagnose after optimizations have been run.

While users may instead simply call undefined functions in such cases to
get a linkage failure from the linker, these provide a much more
ergonomic and actionable diagnostic to users and do so at compile time
rather than at link time. Users instead may be able use inline asm .err
directives.

These are used throughout the Linux kernel in its implementation of
BUILD_BUG and BUILD_BUG_ON macros. These macros generally cannot be
converted to use _Static_assert because many of the parameters are not
ICEs. The Linux kernel still needs to be modified to make use of these
when building with Clang; I have a patch that does so I will send once
this feature is landed.

To do so, we create a new IR level Function attribute, "dontcall" (both
error and warning boil down to one IR Fn Attr).  Then, similar to calls
to inline asm, we attach a !srcloc Metadata node to call sites of such
attributed callees.

The backend diagnoses these during instruction selection, while we still
know that a call is a call (vs say a JMP that's a tail call) in an arch
agnostic manner.

The frontend then reconstructs the SourceLocation from that Metadata,
and determines whether to emit an error or warning based on the callee's
attribute.

Link: https://bugs.llvm.org/show_bug.cgi?id=16428
Link: https://github.com/ClangBuiltLinux/linux/issues/1173

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D106030
2021-08-25 10:34:18 -07:00
Jeremy Morse 0116ed0069 [DebugInfo][InstrRef] Don't use instr-ref for unoptimised functions
InstrRefBasedLDV is marginally slower than VarlocBasedLDV when analysing
optimised code -- however, it's much slower when analysing code compiled
-O0.

To avoid this: don't use instruction referencing for -O0 functions. In the
"pure" case of unoptimised code, this won't really harm the debugging
experience because most variables won't have been promoted off the stack,
so can't go missing. It becomes more complicated when optimised code is
inlined into functions marked optnone; however these are rare, and as -O0
doesn't run many optimisations there should be little damage to the debug
experience as a result.

I've taken the opportunity to refactor testing for instruction-referencing
into a MachineFunction method, which seems the most appropriate place to
put it.

Differential Revision: https://reviews.llvm.org/D108585
2021-08-25 15:10:36 +01:00
Peilin Guo 4c4dbeeeea [DAGCombine] Check the legality of the index of EXTRACT_SUBVECTOR
For ISD::EXTRACT_SUBVECTOR, its second operand must be a constant
multiple of the known-minimum vector length of the result type.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107795
2021-08-25 19:33:39 +08:00
Simon Pilgrim 194b08000c [DAG] LoadedSlice::canMergeExpensiveCrossRegisterBankCopy - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit load combines to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.
2021-08-24 15:28:30 +01:00
Simon Pilgrim 6de0b55188 [DAG] TransformFPLoadStorePair - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit load combines (in this case for fp->int load/store copies) to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.

Differential Revision: https://reviews.llvm.org/D108318
2021-08-24 13:11:27 +01:00
Simon Pilgrim e431b280c9 [DAG] CombineConsecutiveLoads - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit load combines (in this case for ISD::BUILD_PAIR) to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.

This helps in particular for 32-bit X86 cases loading 64-bit size data, reducing codegen diffs vs x86_64.

Differential Revision: https://reviews.llvm.org/D108307
2021-08-24 12:31:22 +01:00
Ben Shi f69fb7ac72 [DAGCombiner] Add target hook function to decide folding (mul (add x, c1), c2)
Reviewed by: lebedev.ri, spatel, craig.topper, luismarques, jrtc27

Differential Revision: https://reviews.llvm.org/D107711
2021-08-22 16:53:32 +08:00
Fangrui Song 1dfb30e54c [TargetCallingConv] Change OutputArg ctor to match its members
This avoids unneeded MVT->EVT conversion.
2021-08-21 16:41:48 -07:00
Craig Topper 84cea602f9 Revert "[SelectionDAGBuilder] Compute and cache PreferredExtendType on demand."
This reverts commit add08c8741.

There was a compile time jump on tramp3d-v4 on https://llvm-compile-time-tracker.com/
Want to see if it goes away with this reverted.
2021-08-19 08:42:05 -07:00
David Green d10f23a25d [ISel] Expand saddsat and ssubsat via asr and xor
This changes the lowering of saddsat and ssubsat so that instead of
using:
  r,o = saddo x, y
  c = setcc r < 0
  s = c ? INTMAX : INTMIN
  ret o ? s : r
into using asr and xor to materialize the INTMAX/INTMIN constants:
  r,o = saddo x, y
  s = ashr r, BW-1
  x = xor s, INTMIN
  ret o ? x : r
https://alive2.llvm.org/ce/z/TYufgD

This seems to reduce the instruction count in most testcases across most
architectures. X86 has some custom lowering added to compensate for
cases where it can increase instruction count.

Differential Revision: https://reviews.llvm.org/D105853
2021-08-19 16:08:07 +01:00
Craig Topper add08c8741 [SelectionDAGBuilder] Compute and cache PreferredExtendType on demand.
Previously we pre-calculated this and cached it for every
instruction in the function. Most of the calculated results will
never be used. So instead calculate it only on the first use, and
then cache it.

The cache was originally added to fix a compile time issue which
caused r216066 to be reverted.

This change exposed that we weren't pre-computing the Value for
Arguments. I've explicitly disabled that for now as it seemed to
regress some tests on AArch64 which has sext built into its compare
instructions.

Spotted while investigating how to improve heuristics to work better
with RISCV preferring sign extend for unsigned compares for i32 on RV64.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D107976
2021-08-19 07:18:33 -07:00
Fraser Cormack e6b1ac8546 [LegalizeTypes][VP] Add widening support for binary VP ops
This patch adds the beginnings of more thorough support in the
legalizers for vector-predicated (VP) operations.

The first step is the ability to widen illegal vectors. The more
complicated scenario in which the result/operands need widening but the
mask doesn't has not been handled here. That would require a lot of code
without an in-tree target on which to test it.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D107904
2021-08-19 13:08:47 +01:00
jacquesguan a7ebc4d145 [DAGCombiner] Teach isKnownToBeAPowerOfTwo handle SPLAT_VECTOR
Make DAGCombine turn mul by power of 2 into shl for scalable vector.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107883
2021-08-18 10:10:40 +08:00
Simon Pilgrim d7f288502f SelectionDAGBuilder::visitInlineAsm - don't dereference dyn_cast<> results.
dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the cast is correct.

Fixes static analyser warning.
2021-08-17 18:40:59 +01:00
Fraser Cormack f3e9047249 [VP] Add vector-predicated reduction intrinsics
This patch adds vector-predicated ("VP") reduction intrinsics corresponding to
each of the existing unpredicated `llvm.vector.reduce.*` versions. Unlike the
unpredicated reductions, all VP reductions have a start value. This start value
is returned when the no vector element is active.

Support for expansion on targets without native vector-predication support is
included.

This patch is based on the ["reduction
slice"](https://reviews.llvm.org/D57504#1732277) of the LLVM-VP reference patch
(https://reviews.llvm.org/D57504).

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104308
2021-08-17 17:56:35 +01:00
Arthur Eubanks 0d822da2bd [NFC] Remove/replace some confusing attribute getters on Function 2021-08-16 16:12:37 -07:00
Simon Pilgrim d6fe8d37c6 [DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) -> concat_vectors(x,y,a,b)
Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal.

This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands.

Differential Revision: https://reviews.llvm.org/D107597
2021-08-16 16:06:54 +01:00
Paul Walker cd0e196413 [DAGCombiner] Stop visitEXTRACT_SUBVECTOR creating illegal BITCASTs post legalisation.
visitEXTRACT_SUBVECTOR can sometimes create illegal BITCASTs when
removing "redundant" INSERT_SUBVECTOR operations.  This patch adds
an extra check to ensure such combines only occur after operation
legalisation if any resulting BITBAST is itself legal.

Differential Revision: https://reviews.llvm.org/D108086
2021-08-15 18:25:49 +01:00
Qiu Chaofan a240b29f21 [NFC] Simply update a FIXME comment
X86 overrided LowerOperationWrapper was moved to common implementation
in a7eae62.
2021-08-15 22:43:46 +08:00
Arthur Eubanks f80ae58068 [NFC] Cleanup calls to AttributeList::getAttribute(FunctionIndex)
getAttribute() is confusing, use a clearer method.
2021-08-13 16:27:11 -07:00
Arthur Eubanks d7593ebaee [NFC] Clean up users of AttributeList::hasAttribute()
AttributeList::hasAttribute() is confusing, use clearer methods like
hasParamAttr()/hasRetAttr().

Add hasRetAttr() since it was missing from AttributeList.
2021-08-13 11:59:18 -07:00
Arthur Eubanks 92ce6db9ee [NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr()
This is more consistent with similar methods.
2021-08-13 11:09:18 -07:00
Fraser Cormack 885be620f9 [LegalizeTypes][NFC] Remove else-after-return
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107890
2021-08-11 16:48:28 +01:00
madhur13490 61526b1262 [DAG] Reword comment for EnforceNodeIdInvariant and InvalidateNodeId. NFC.
Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D107845
2021-08-11 12:14:28 +05:30
Craig Topper a8ae41fb51 [SelectionDAGBuilder] Save iterator to avoid second DenseMap lookup. NFC
We were calling find and then using operator[]. Instead keep the
iterator from find and use it to get the value.

Just happened to notice while investigating how we decide what extends
to use between basic blocks.
2021-08-10 22:37:48 -07:00
Adrian Prantl d6b6880172 Streamline the API of salvageDebugInfoImpl (NFC)
This patch refactors / simplifies salvageDebugInfoImpl(). The goal
here is to simplify the implementation of coro::salvageDebugInfo() in
a followup patch.

  1. Change the return value to I.getOperand(0). Currently users of
     salvageDebugInfoImpl() assume that the first operand is
     I.getOperand(0). This patch makes this information explicit. A
     nice side-effect of this change is that it allows us to salvage
     expressions such as add i8 1, %a in the future.

  2. Factor out the creation of a DIExpression and return an array of
     DIExpression operations instead. This change allows users that
     call salvageDebugInfoImpl() in a loop to avoid the costly
     creation of temporary DIExpressions and to defer the creation of
     a DIExpression until the end.

This patch does not change any functionality.

rdar://80227769

Differential Revision: https://reviews.llvm.org/D107383
2021-08-10 15:21:18 -07:00
Wang, Pengfei 6f7f5b54c8 [X86] AVX512FP16 instructions enabling 1/6
1. Enable FP16 type support and basic declarations used by following patches.
2. Enable new instructions VMOVW and VMOVSH.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105263
2021-08-10 12:46:01 +08:00
Luo, Yuanke 53642d5b80 [NFC] Fix the formula for reciprocal calculation.
Differential Revision: https://reviews.llvm.org/D107713
2021-08-09 16:03:56 +08:00
Nemanja Ivanovic 62fe3dcf98 Fix PPC buildbot break caused by 4c4093e6e3
This commit adds the isnan intrinsic and provides a default expansion
for it in the SDAG. However, it makes the assumption that types
it operates on are IEEE-compliant types. This is not always the case.
An example of that is PPC "double double" which has a representation
that
- Does not need to conform to IEEE requirements for isnan as it is
  not an IEEE-compliant type
- Does not have a representation that allows for straightforward
  reinterpreting as an integer and use of integer operations

The result was that this commit broke __builtin_isnan for ppc_fp128
making many valid numeric values report a NaN.

This patch simply changes the expansion to always expand to unordered
comparison (regardless of whether FP exceptions are tracked). This
is inline with previous semantics.
2021-08-06 22:10:20 -05:00
Amara Emerson 2b067e3335 Change TargetLowering::canMergeStoresTo() to take a MF instead of DAG.
DAG is unnecessary and we need this hook to implement store merging on GlobalISel too.
2021-08-06 12:57:53 -07:00
Craig Topper b2ca4dc935 [LegalizeTypes] Add a simple expansion for SMULO when a libcall isn't available.
This isn't optimal, but prevents crashing when the libcall isn't
available. It just calculates the full product and makes sure the high bits
match the sign of the low half. Each of the pieces should go through their own
type legalization.

This can make D107420 unnecessary.

Needs tests, but I wanted to start discussion about D107420.

Reviewed By: FreddyYe

Differential Revision: https://reviews.llvm.org/D107581
2021-08-06 09:43:01 -07:00
Serge Pavlov 4c4093e6e3 Introduce intrinsic llvm.isnan
This is recommit of the patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in call of getFastMathFlags (base type should be FPMathOperator
but not Instruction). The original commit message is duplicated below:

    Clang has builtin function '__builtin_isnan', which implements C
    library function 'isnan'. This function now is implemented entirely in
    clang codegen, which expands the function into set of IR operations.
    There are three mechanisms by which the expansion can be made.

    * The most common mechanism is using an unordered comparison made by
      instruction 'fcmp uno'. This simple solution is target-independent
      and works well in most cases. It however is not suitable if floating
      point exceptions are tracked. Corresponding IEEE 754 operation and C
      function must never raise FP exception, even if the argument is a
      signaling NaN. Compare instructions usually does not have such
      property, they raise 'invalid' exception in such case. So this
      mechanism is unsuitable when exception behavior is strict. In
      particular it could result in unexpected trapping if argument is SNaN.

    * Another solution was implemented in https://reviews.llvm.org/D95948.
      It is used in the cases when raising FP exceptions by 'isnan' is not
      allowed. This solution implements 'isnan' using integer operations.
      It solves the problem of exceptions, but offers one solution for all
      targets, however some can do the check in more efficient way.

    * Solution implemented by https://reviews.llvm.org/D96568 introduced a
      hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
      specific code into IR. Now only SystemZ implements this hook and it
      generates a call to target specific intrinsic function.

    Although these mechanisms allow to implement 'isnan' with enough
    efficiency, expanding 'isnan' in clang has drawbacks:

    * The operation 'isnan' is hidden behind generic integer operations or
      target-specific intrinsics. It complicates analysis and can prevent
      some optimizations.

    * IR can be created by tools other than clang, in this case treatment
      of 'isnan' has to be duplicated in that tool.

    Another issue with the current implementation of 'isnan' comes from the
    use of options '-ffast-math' or '-fno-honor-nans'. If such option is
    specified, 'fcmp uno' may be optimized to 'false'. It is valid
    optimization in general, but it results in 'isnan' always returning
    'false'. For example, in some libc++ implementations the following code
    returns 'false':

        std::isnan(std::numeric_limits<float>::quiet_NaN())

    The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
    operands are never NaNs. This assumption however should not be applied
    to the functions that check FP number properties, including 'isnan'. If
    such function returns expected result instead of actually making
    checks, it becomes useless in many cases. The option '-ffast-math' is
    often used for performance critical code, as it can speed up execution
    by the expense of manual treatment of corner cases. If 'isnan' returns
    assumed result, a user cannot use it in the manual treatment of NaNs
    and has to invent replacements, like making the check using integer
    operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
    which also expresses the opinion, that limitations imposed by
    '-ffast-math' should be applied only to 'math' functions but not to
    'tests'.

    To overcome these drawbacks, this change introduces a new IR intrinsic
    function 'llvm.isnan', which realizes the check as specified by IEEE-754
    and C standards in target-agnostic way. During IR transformations it
    does not undergo undesirable optimizations. It reaches instruction
    selection, where is lowered in target-dependent way. The lowering can
    vary depending on options like '-ffast-math' or '-ffp-model' so the
    resulting code satisfies requested semantics.

    Differential Revision: https://reviews.llvm.org/D104854
2021-08-06 14:32:27 +07:00
Craig Topper f7076cfd3a [DAGCombiner][RISCV][AMDGPU] Call SimplifyDemandedBits at the end of visitMULHU to enable known bits contant folding.
We don't have real demanded bits support for MULHU, but we can
still use the known bits based constant folding support at the end
of SimplifyDemandedBits to simplify a MULHU. This helps with cases
where we know the LHS and RHS have enough leading zeros so that
the high multiply result is always 0.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D106471
2021-08-05 08:31:26 -07:00
Simon Pilgrim 2cbf9fd402 [DAG] DAGCombiner::visitVECTOR_SHUFFLE - recognise INSERT_SUBVECTOR patterns
IR typically creates INSERT_SUBVECTOR patterns as a widening of the subvector with undefs to pad to the destination size, followed by a shuffle for the actual insertion - SelectionDAGBuilder has to do something similar for shuffles when source/destination vectors are different sizes.

This combine attempts to recognize these patterns by looking for a shuffle of a subvector (from a CONCAT_VECTORS) that starts at a modulo of its size into an otherwise identity shuffle of the base vector.

This uncovered a couple of target-specific issues as we haven't often created INSERT_SUBVECTOR nodes in generic code - aarch64 could only handle insertions into the bottom of undefs (i.e. a vector widening), and x86-avx512 vXi1 insertion wasn't keeping track of undef elements in the base vector.

Fixes PR50053

Differential Revision: https://reviews.llvm.org/D107068
2021-08-05 15:40:48 +01:00
Paul Robinson 75aa3d520d Add a DIExpression const-folder to prevent silly expressions.
It's entirely possible (because it actually happened) for a bool
variable to end up with a 256-bit DW_AT_const_value.  This came about
when a local bool variable was initialized from a bitfield in a
32-byte struct of bitfields, and after inlining and constant
propagation, the variable did have a constant value. The sequence of
optimizations had it carrying "i256" values around, but once the
constant made it into the llvm.dbg.value, no further IR changes could
affect it.

Technically the llvm.dbg.value did have a DIExpression to reduce it
back down to 8 bits, but the compiler is in no way ready to emit an
oversized constant *and* a DWARF expression to manipulate it.
Depending on the circumstances, we had either just the very fat bool
value, or an expression with no starting value.

The sequence of optimizations that led to this state did seem pretty
reasonable, so the solution I came up with was to invent a DWARF
constant expression folder.  Currently it only does convert ops, but
there's no reason it couldn't do other ops if that became useful.

This broke three tests that depended on having convert ops survive
into the DWARF, so I added an operator that would abort the folder to
each of those tests.

Differential Revision: https://reviews.llvm.org/D106915
2021-08-05 06:14:40 -07:00
Fraser Cormack 0b8471e91b [SelectionDAG] Correctly determine the VECREDUCE_SEQ_FMUL action
The LegalizeAction for this node should follow the logic for
`VECREDUCE_SEQ_FADD` and be determined using the vector operand's type.

here isn't an in-tree target that makes use of this, but I think it's safe to
say this is how it should behave, should a target want to customize the action
for this node.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107478
2021-08-05 09:42:33 +01:00
Craig Topper c23405174a [DAGCombiner][AMDGPU] Canonicalize constants to the RHS of MULHU/MULHS.
This allows special constants like to 0 to be recognized. It's also
expected by isel patterns if a target had a mulh with immediate instructions.
The commuting done by tablegen won't commute patterns with immediates since it
expects DAGCombine to have done it.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D107486
2021-08-04 11:39:23 -07:00
Serge Pavlov 0c28a7c990 Revert "Introduce intrinsic llvm.isnan"
This reverts commit 16ff91ebcc.
Several errors were reported mainly test-suite execution time. Reverted
for investigation.
2021-08-04 17:18:15 +07:00
Serge Pavlov 16ff91ebcc Introduce intrinsic llvm.isnan
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.

* The most common mechanism is using an unordered comparison made by
  instruction 'fcmp uno'. This simple solution is target-independent
  and works well in most cases. It however is not suitable if floating
  point exceptions are tracked. Corresponding IEEE 754 operation and C
  function must never raise FP exception, even if the argument is a
  signaling NaN. Compare instructions usually does not have such
  property, they raise 'invalid' exception in such case. So this
  mechanism is unsuitable when exception behavior is strict. In
  particular it could result in unexpected trapping if argument is SNaN.

* Another solution was implemented in https://reviews.llvm.org/D95948.
  It is used in the cases when raising FP exceptions by 'isnan' is not
  allowed. This solution implements 'isnan' using integer operations.
  It solves the problem of exceptions, but offers one solution for all
  targets, however some can do the check in more efficient way.

* Solution implemented by https://reviews.llvm.org/D96568 introduced a
  hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
  specific code into IR. Now only SystemZ implements this hook and it
  generates a call to target specific intrinsic function.

Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:

* The operation 'isnan' is hidden behind generic integer operations or
  target-specific intrinsics. It complicates analysis and can prevent
  some optimizations.

* IR can be created by tools other than clang, in this case treatment
  of 'isnan' has to be duplicated in that tool.

Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':

    std::isnan(std::numeric_limits<float>::quiet_NaN())

The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.

To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.

Differential Revision: https://reviews.llvm.org/D104854
2021-08-04 15:27:49 +07:00
Simon Pilgrim 11396641e4 [DAG] Cleanup DAGCombiner::CombineConsecutiveLoads early-outs. NFCI.
We had some similar hasOneUse/isNON_EXTLoad early-outs spread out over different parts of the method - we should pull them all together.

Noticed while triaging PR45116
2021-08-03 13:47:55 +01:00
Eli Friedman 1f62af6346 [AArch64][SelectionDAG] Support passing/returning scalable vectors with unusual types.
This adds handling for two cases:

1. A scalable vector where the element type is promoted.
2. A scalable vector where the element count is odd (or more generally,
   not divisble by the element count of the part type).

(Some element types still don't work; for example, <vscale x 2 x i128>,
or <vscale x 2 x fp128>.)

Differential Revision: https://reviews.llvm.org/D105591
2021-08-02 15:53:16 -07:00
Simon Pilgrim 3a7c82efb8 [DAG] isGuaranteedNotToBeUndefOrPoison - handle ISD::BUILD_VECTOR nodes
If all demanded elements of the BUILD_VECTOR pass a isGuaranteedNotToBeUndefOrPoison check, then we can treat this specific demanded use of the BUILD_VECTOR as guaranteed not to be undef or poison either.

Differential Revision: https://reviews.llvm.org/D107174
2021-07-31 15:08:25 +01:00
Alexandros Lamprineas 7d940432c4 [AArch64] Legalize MVT::i64x8 in DAG isel lowering
This patch legalizes the Machine Value Type introduced in D94096 for loads
and stores. A new target hook named getAsmOperandValueType() is added which
maps i512 to MVT::i64x8. GlobalISel falls back to DAG for legalization.

Differential Revision: https://reviews.llvm.org/D94097
2021-07-31 09:51:28 +01:00
Simon Pilgrim 3c0b596ecc SelectionDAGDumper.cpp - remove nested if-else return chain. NFCI.
Match style and don't use an else after a return.
2021-07-30 19:23:05 +01:00
Simon Pilgrim 986841cca2 SelectionDAGDumper.cpp - printrWithDepthHelper - remove dead code. NFCI.
Fixes coverity warning - we have an early-out for unsigned depth == 0, so the depth < 1 early-out later on is dead code.
2021-07-30 19:23:04 +01:00
Sanjay Patel fa6b2c9915 [DAGCombiner] don't try to partially reduce add-with-overflow ops
This transform was added with D58874, but there were no tests for overflow ops.
We need to change this one way or another because it can crash as shown in:
https://llvm.org/PR51238

Note that if there are no uses of an overflow op's bool overflow result, we
reduce it to a regular math op, so we continue to fold that case either way.
If we have uses of both the math and the overflow bool, then we are likely
not saving anything by creating an independent sub instruction as seen in
the test diffs here.

This patch makes the behavior in SDAG consistent with what we do in
instcombine AFAICT.

Differential Revision: https://reviews.llvm.org/D106983
2021-07-29 08:51:54 -04:00
Jeremy Morse 8612417e5a [DebugInfo][InstrRef] Don't break up ret-sequences on debug-info instrs
When we have a terminator sequence (i.e. a tailcall or return),
MIIsInTerminatorSequence is used to work out where the preceding ABI-setup
instructions end, i.e. the parts that were glued to the terminator
instruction. This allows LLVM to split blocks safely without having to
worry about ABI stuff.

The function only ignores DBG_VALUE instructions, meaning that the two
debug instructions I recently added can end terminator sequences early,
causing various MachineVerifier errors. This patch promotes the test for
debug instructions from "isDebugValue" to "isDebugInstr", thus avoiding any
debug-info interfering with this function.

Differential Revision: https://reviews.llvm.org/D106660
2021-07-28 15:56:00 +01:00
Juneyoung Lee 4f71f59bf3 [DAGCombiner] Fold SETCC(FREEZE(x),const) to FREEZE(SETCC(x,const)) if SETCC is used by BRCOND
This patch adds a peephole optimization `SETCC(FREEZE(x),const)` => `FREEZE(SETCC(x,const))`
if the SETCC is only used by BRCOND.

Combined with `BRCOND(FREEZE(X)) => BRCOND(X)`, this leads to a nice improvement in the generated assembly when x is a masked loaded value.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D105344
2021-07-28 09:22:15 +09:00
Fraser Cormack 7b33b849bd [SelectionDAG] Support scalable splats in U(ADD|SUB)SAT combines
This patch builds on top of D106575 in which scalable-vector splats were
supported in `ISD::matchBinaryPredicate`. It teaches the DAGCombiner how
to perform a variety of the pre-existing saturating add/sub combines on
scalable-vector types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106652
2021-07-27 10:52:34 +01:00
Fraser Cormack f924a3d474 [SelectionDAG] Support scalable-vector splats in yet more cases
This patch extends support for (scalable-vector) splats in the
DAGCombiner via the `ISD::matchBinaryPredicate` function, which enable a
variety of simple combines of constants.

Users of this function may now have to distinguish between
`BUILD_VECTOR` and `SPLAT_VECTOR` vector operands. The way of dealing
with this in-tree follows the approach added for
`ISD::matchUnaryPredicate` implemented in D94501.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106575
2021-07-26 10:15:08 +01:00
Simon Pilgrim c261a06b7a [DAG] Add initial SelectionDAG::isGuaranteedNotToBeUndefOrPoison framework (PR51129)
I've setup the basic framework for the isGuaranteedNotToBeUndefOrPoison call and updated DAGCombiner::visitFREEZE to use it, further Opcodes can be handled when we have test coverage.

I'm not aware of any vector test freeze coverage so the DemandedElts (and the Depth) args are not being used yet - but they are in place.

SelectionDAG::isGuaranteedNotToBePoison wrappers have also been added.

Differential Revision: https://reviews.llvm.org/D106668
2021-07-24 11:36:35 +01:00
David Truby 1528a4d400 [llvm][sve] Lowering for VLS truncating stores
This adds custom lowering for truncating stores when operating on
fixed length vectors in SVE. It also includes a DAG combine to
fold extends followed by truncating stores into non-truncating
stores in order to prevent this pattern appearing once truncating
stores are supported.

Currently truncating stores are not used in certain cases where
the size of the vector is larger than the target vector width.

Differential Revision: https://reviews.llvm.org/D104471
2021-07-23 14:04:55 +01:00
Paulo Matos 46667a1003 [WebAssembly] Implementation of global.get/set for reftypes in LLVM IR
Reland of 31859f896.

This change implements new DAG notes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D104797
2021-07-22 22:07:24 +02:00
Roman Lebedev af8fa36bf0
[NFCI][TLI] prepare[US]REMEqFold(): don't add nonsensical 'exact' flag to rotates created
As pointed out by Craig Topper.
2021-07-22 23:02:58 +03:00
Eli Friedman 0ca46a1757 [SelectionDAG] Fix the representation of ISD::STEP_VECTOR.
The existing rule about the operand type is strange.  Instead, just say
the operand is a TargetConstant with the right width.  (Legalization
ignores TargetConstants, so it doesn't matter if that width is legal.)

Highlights:

1. I had to substantially rewrite the AArch64 isel patterns to expect a
TargetConstant.  Nothing too exotic, but maybe a little hairy. Maybe
worth considering a target-specific node with some dagcombines instead
of this complicated nest of isel patterns.
2. Our behavior on RV32 for vectors of i64 has changed slightly. In
particular, we correctly preserve the width of the arithmetic through
legalization.  This changes the DAG a bit. Maybe room for
improvement here.
3. I explicitly defined the behavior around overflow. This is necessary
to make the DAGCombine transforms legal, and I don't think it causes any
practical issues.

Differential Revision: https://reviews.llvm.org/D105673
2021-07-21 10:58:40 -07:00
Guillaume Chatelet d6da02d952 [llvm] Add enum iteration to Sequence
This patch allows iterating typed enum via the ADT/Sequence utility.

It also changes the original design to better separate concerns:
 - `StrongInt` only deals with safe `intmax_t` operations,
 - `SafeIntIterator` presents the iterator and reverse iterator
 interface but only deals with safe `StrongInt` internally.
 - `iota_range` only deals with `SafeIntIterator` internally.

 This design ensures that operations are always valid. In particular,
 "Out of bounds" assertions fire when:
  - the `value_type` is not representable as an `intmax_t`
  - iterator operations make internal computation underflow/overflow
  - the internal representation cannot be converted back to `value_type`

Differential Revision: https://reviews.llvm.org/D106279
2021-07-21 12:48:53 +00:00
Amy Huang fd972bb9fd Revert "[llvm][sve] Lowering for VLS truncating stores" because it
causes a seg fault (see https://reviews.llvm.org/D104471).

This reverts commit c305557acd.
2021-07-19 11:03:33 -07:00
Craig Topper 50302feb1d [SelectionDAG][RISCV] Use isSExtCheaperThanZExt to control whether sext or zext is used for constant folding any_extend.
RISCV would prefer a sign extended constant since that works better
with our constant materialization. We have an existing TLI hook we
use to control sign extension of setcc operands in type legalization.
That hook happens to do the right check we need here, but might be
straying from its original purpose. With only RISCV defining this
hook in tree, I wasn't sure if it was worth adding another hook
with identical behavior.

This is an alternative to D105785 where I tried to handle this in
the RISCV backend by not creating ANY_EXTENDs in some places.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D105918
2021-07-19 09:25:28 -07:00
Roman Lebedev 5b51bd1878
[TLI] prepareSREMEqFold(): use correct VT for the final VSELECT (PR51133)
We were using the wrong VT for this final VSELECT,
it should be in the final comparison VT,
not the source value's VT.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51133
2021-07-19 16:44:00 +03:00
Eli Friedman 6601be4419 [X86] Remove incorrect use of known bits in shuffle simplification.
This reverts commit 2a419a0b99.

The result of a shufflevector must not propagate poison from any element
other than the one noted in the shuffle mask.

The regressions outside of fptoui-may-overflow.ll can probably be
recovered some other way; for example, using isGuaranteedNotToBePoison.

See discussion on https://reviews.llvm.org/D106053 for more background.

Differential Revision: https://reviews.llvm.org/D106222
2021-07-18 18:13:11 -07:00
Simon Pilgrim fd7a54c709 [DAG] DAGCombiner::foldSelectOfBinops - propagate the common flags to the merged binop
As discussed on D106058 - we were failing to keep the common flags. This matches the behaviour in InstCombinerImpl::foldSelectOpOp.
2021-07-18 18:38:59 +01:00
Simon Pilgrim 5643be96bc [DAG] Enable foldSelectOfBinops on select(setcc(),binop(),binop()) calls 2021-07-18 18:38:59 +01:00
Simon Pilgrim 1a6a8443c2 [DAG] Move select(cc, binop(), binop()) folds into DAGCombiner::foldSelectOfBinops. NFCI.
I'm going to extend the functionality started in D106058 so move the folds into their own method to reduce the amount of code in DAGCombiner::visitSELECT
2021-07-18 14:54:41 +01:00
Simon Pilgrim 95995673d1 [DAG] SelectionDAG::MaskedElementsAreZero - assert we're calling with a vector. NFCI.
Add an assertion that we've calling MaskedElementsAreZero with a vector op and that the DemandedElts arg is a matching width.

Makes the error a lot easier to grok when something else accidentally gets used.
2021-07-16 17:43:35 +01:00