Commit Graph

63852 Commits

Author SHA1 Message Date
Amara Emerson 95ac3d15e9 [AArch64][GlobalISel] Add G_VECREDUCE fewerElements support for full scalarization.
For some reductions like G_VECREDUCE_OR on AArch64, we need to scalarize
completely if the source is <= 64b. This change adds support for that in
the legalizer. If the source has a pow-2 num elements, then we can do
a tree reduction using the scalar operation in the individual elements.
Otherwise, we just create a sequential chain of operations.

For AArch64, we only need to scalarize if the input is <64b. If it's great than
64b then we can first do a fewElements step to 64b, taking advantage of vector
instructions until we reach the point of scalarization.

I also had to relax the verifier checks for reductions because the intrinsics
support <1 x EltTy> types, which we lower to scalars for GlobalISel.

Differential Revision: https://reviews.llvm.org/D108276
2021-08-19 16:38:52 -07:00
Amara Emerson a0051f7149 [AArch64][GlobalISel] Fix miscompile of <16 x s8> G_EXTRACT_VECTOR_ELT.
When support for copying vector s8 lanes was added recently, this also
had the side effect of fixing a fallback for <16 x s8> extracts since
both used the same helper. However, there was a bug in another helper
to get the regclass for a specific FPR-native type, which was assigning
FPR16 to s8 instead of FPR8.
2021-08-19 16:22:32 -07:00
Thomas Lively be6c49e743 [WebAssembly] Add explicit casts to silence -Wc++11-narrowing 2021-08-19 16:00:07 -07:00
Thomas Lively fd0557dbf1 [WebAssembly] More convert_low and promote_low codegen
The convert_low and promote_low instructions can widen the lower two lanes of a
four-lane vector, but we were previously scalarizing patterns that widened lanes
besides the low two lanes. The commit adds a shuffle to move the widened lanes
into the low lane positions so the convert_low and promote_low instructions can
be used instead of scalarizing.

Depends on D108266.

Differential Revision: https://reviews.llvm.org/D108341
2021-08-19 15:37:12 -07:00
Thomas Lively b311a040ef [WebAssembly] Pattern match SIMD convert_low and promote_low during ISel
Since the simplest DAG patterns for convert_low and promote_low instructions
involved v2i32, v2f32, v4i64, and v4f64 types, which are not legal in the
WebAssembly backend and would be eliminated by type legalization, we were
previously matching those patterns in a DAG combine before the type legalization
stage. However in cases where the vectors were wider than 128 bits, the patterns
we matched were not created until the type legalization stage when the wide
vectors were split up. Type legalization would continue to eliminate the illegal
types we were matching as well, so the code ended up scalarized.

To make the ISel for these instructions more robust, match the scalarized
patterns rather than the patterns containing illegal types. Add tests with
double-wide vectors to show that this works as intended.

Fixes PR51098.
Depends on D107502.

Differential Revision: https://reviews.llvm.org/D108266
2021-08-19 15:24:28 -07:00
Arthur Eubanks 44a3241f10 [NFC] Replace some attribute methods that use confusing indexes 2021-08-19 14:10:26 -07:00
Thomas Lively b69374ca58 [WebAssembly] Legalize vector types by widening
The default legalization of unsupported vector types is to promote the integers
in each lane, which leads to extra sign or zero extending and masking when
moving data into and out of vectors. Switch our preferred type legalization from
the default to vector widening, which keeps the data in the low lanes of the
vector rather than in the low bits of each lane. The unused high lanes can be
ignored.

Half-wide vectors are now loaded from memory into the low 64 bits of the v128
rather than spread out among the lanes. As a result, v128.load64_splat is a much
more common operation, so add new patterns to support it.

Differential Revision: https://reviews.llvm.org/D107502
2021-08-19 12:07:33 -07:00
Stanislav Mekhanoshin 8d7d89b081 [AMDGPU] Add alias.scope metadata to lowered LDS struct
Alias analysis is unable to disambiguate accesses to the structure
fields without it unlike distinct variables. As a result we cannot
combine ds_read and ds_write operations in a case of any store in
between which always considered clobbering.

Differential Revision: https://reviews.llvm.org/D108315
2021-08-19 11:40:30 -07:00
Tim Northover edab411ee6 AArch64: copy all parts of the mem operand across when combining a store
In particular we were dropping volatility, which can lead to unwanted
transformations.
2021-08-19 18:26:39 +01:00
Owen Anderson 06a4c85890 Use v16i8 rather than v2i64 as the VT for memset expansion on AArch64.
This allows the instruction selector to realize that it can directly
broadcast the low byte of the memset value, rather than replicating
it to a 64-bit GPR before broadcasting.

This fixes PR50985.

Differential Revision: https://reviews.llvm.org/D108354
2021-08-19 16:54:07 +00:00
David Green d10f23a25d [ISel] Expand saddsat and ssubsat via asr and xor
This changes the lowering of saddsat and ssubsat so that instead of
using:
  r,o = saddo x, y
  c = setcc r < 0
  s = c ? INTMAX : INTMIN
  ret o ? s : r
into using asr and xor to materialize the INTMAX/INTMIN constants:
  r,o = saddo x, y
  s = ashr r, BW-1
  x = xor s, INTMIN
  ret o ? x : r
https://alive2.llvm.org/ce/z/TYufgD

This seems to reduce the instruction count in most testcases across most
architectures. X86 has some custom lowering added to compensate for
cases where it can increase instruction count.

Differential Revision: https://reviews.llvm.org/D105853
2021-08-19 16:08:07 +01:00
Simon Pilgrim 9419729b6a [CostModel][X86] Add VPOPCNTDQ/BITALG ctpop costs
VPOPCNTDQ + BITALG add ctpop instructions for vXi64/vXi32 + vXi16/vXi8 vector types respectively
2021-08-19 15:40:09 +01:00
Craig Topper 36d8316cc8 [RISCV] Reduce duplicate code for calling SimplifyDemandedBits.
This encapsulates the APInt creation and worklist management into
a helper function.

To keep one common interface I've use Log2_32 in places that
previously created a mask by subtracting 1 from a power of 2.

Differential Revision: https://reviews.llvm.org/D108324
2021-08-19 07:09:38 -07:00
Matthew Devereau 734708e04f [AArch64][SVE] Teach cost model that masked loads/stores are cheap
Reduce the cost of VLS masked loads/stores to make the vectorizor emit them more frequently.
2021-08-19 13:01:33 +01:00
David Sherwood f4122398e7 [LoopVectorize][AArch64] Enable ordered reductions by default for AArch64
I have added a new TTI interface called enableOrderedReductions() that
controls whether or not ordered reductions should be enabled for a
given target. By default this returns false, whereas for AArch64 it
returns true and we rely upon the cost model to make sensible
vectorisation choices. It is still possible to override the new TTI
interface by setting the command line flag:

  -force-ordered-reductions=true|false

I have added a new RUN line to show that we use ordered reductions by
default for SVE and Neon:

  Transforms/LoopVectorize/AArch64/strict-fadd.ll
  Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll

Differential Revision: https://reviews.llvm.org/D106653
2021-08-19 09:29:40 +01:00
Jessica Paquette c22b64ef66 [AArch64][GlobalISel] Don't allow s128 for G_ISNAN
getAPFloatFromSize doesn't support s128, so we can't lower this without
asserting right now.

To fix the buildbots, don't allow any scalars other than s16, s32, and s64.
2021-08-18 13:59:00 -07:00
Jessica Paquette 3d91d5b757 [AArch64][GlobalISel] Mark G_FMINNUM/G_FMAXNUM as floating point opcodes
We need to ensure that these end up on FPR to allow imported patterns to
select them.

This will also ensure that we get good regbank selection when dealing with
instructions like G_PHI/G_LOAD/G_STORE which deduce their banks from their
uses/users.

Differential Revision: https://reviews.llvm.org/D108260
2021-08-18 13:32:19 -07:00
Jessica Paquette 45e1a6bd25 [AArch64][GlobalISel] Legalize scalar G_FMINNUM + G_FMAXNUM
For subtargets with full FP16, this is legal for s16, s32, and s64. Without
full FP16, it's legal for s32 and s64.

For s128, this is a libcall.

We also support some vector types, but for now, let's just support scalars.

Differential Revision: https://reviews.llvm.org/D108259
2021-08-18 13:30:03 -07:00
Joe Nash 9dbc968ed9 [AMDGPU] Fix atomic float max/min intrinsics
Hooked up raw.buffer.atomic.fmin/max.f64
This instruction should be available on GFX6, GFX7, and GFX10.
It was implemented for GFX90a with a different name.

Added intrinsic def for image_atomic_fmin/fmax; the instruction
defs were already there.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D108208

Change-Id: I473f98d28b2afbeeb2c27822d9686b5e86634e2f
2021-08-18 14:12:42 -04:00
Craig Topper 3f9b37ccb1 [RISCV] Remove sext_inreg+add/sub/mul/shl isel patterns.
Let the sext_inreg be selected to sext.w. Remove unneeded sext.w
during PostProcessISelDAG.

This gives opportunities for some other isel patterns to match
like the ADDIPair or matching mul with immediate to shXadd.

This becomes possible after D107658 started selecting W instructions
based on users. The sext.w will be considered a W user so isel
will often select a W instruction for the sext.w input and we can
just remove the sext.w. Otherwise we can combine the sext.w with
a ADD/SUB/MUL/SLLI to create a new W instruction in parallel
to the the original instruction.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107708
2021-08-18 11:07:11 -07:00
Jessica Paquette 791006fb8c [GlobalISel] Implement lowering for G_ISNAN + use it in AArch64
GlobalISel equivalent to `TargetLowering::expandISNAN`.

Use it in AArch64 and add a testcase.

Differential Revision: https://reviews.llvm.org/D108227
2021-08-18 10:54:25 -07:00
Craig Topper 6d7ea597ef [RISCV] Insert sext_inreg when type legalizing add/sub/mul with constant LHS.
We already do this for non-constants RHS. This just removes the
special case. I believe the special case may have been needed
because the ANY_EXTEND of a constant used to create zero extended
constants, but we recently changed that to produce sign extended
constants.

D107658 is needed to prevent some regressions.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107697
2021-08-18 10:44:25 -07:00
Craig Topper 20e6265873 [RISCV] Improve constant materialization for stores of i16 or i32 negative constants.
DAGCombiner::visitStore can clear the upper bits of constants
used by stores. This leads prevents them from being recognized as
sign extended negative values making them more expensive to
materialize.

This patch uses the hasAllNBitUsers method from D107658 to make
a negative constant if none of the users care about the upper bits.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D108052
2021-08-18 10:25:12 -07:00
Craig Topper d9ba1a9c5c [RISCV] Teach isel to select ADDW/SUBW/MULW/SLLIW when only the lower 32-bits are used.
We normally select these when the root node is a sext_inreg, but
SimplifyDemandedBits can sometimes bypass the sext_inreg for some
users. This can create situation where sext_inreg+add/sub/mul/shl
is selected to a W instruction, and then the add/sub/mul/shl is
separately selected to a non-W instruction with the same inputs.

This patch tries to detect when it would still be ok to use a W
instruction without the sext_inreg by checking the direct users.
This can allow the W instruction to CSE with one created for a
sext_inreg+add/sub/mul/shl. To minimize complexity and cost of
checking, we make no attempt to determine if the CSE will happen
and just always use a W instruction when we can.

Differential Revision: https://reviews.llvm.org/D107658
2021-08-18 10:22:00 -07:00
Craig Topper f70238914a [RISCV] Add zext.h/zext.w to RISCVTTIImpl::getIntImmCostInst.
If we have these instructions, we don't need to hoist the immediate
for an AND that would match them.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107783
2021-08-18 09:40:40 -07:00
David Sherwood 219d4518fc [Analysis][AArch64] Make fixed-width ordered reductions slightly more expensive
For tight loops like this:

  float r = 0;
  for (int i = 0; i < n; i++) {
    r += a[i];
  }

it's better not to vectorise at -O3 using fixed-width ordered reductions
on AArch64 targets. Although the resulting number of instructions in the
generated code ends up being comparable to not vectorising at all, there
may be additional costs on some CPUs, for example perhaps the scheduling
is worse. It makes sense to deter vectorisation in tight loops.

Differential Revision: https://reviews.llvm.org/D108292
2021-08-18 17:01:56 +01:00
Bing1 Yu ffe58de393 [X86] [AMX] Fix the test case failure caused by D107544.
The issue can be duplicated when EXPENSIVE_CHECKS is specified for llvm
build. Thank Simon report this issue at
https://bugs.llvm.org/show_bug.cgi?id=51513. We need return correct
value for the changed IR.

Reviewed By: RKSimon, LuoYuanke

Differential Revision: https://reviews.llvm.org/D108269
2021-08-18 22:27:22 +08:00
Tim Northover 8eb054a87d AArch64: compare correct type for multi-valued SDNode.
If Orig produces more than one value (rare) with different types (rarer) then
we need to make sure we check against the one that Orig actually represents,
not just the first type.

Unfortunately because of the combination of things that need to happen I wasn't
able to produce a test.
2021-08-18 09:35:31 +01:00
Amara Emerson 284006079e [AArch64][GlobalISel] Add support for selection of s8:fpr = G_UNMERGE <8 x s8> 2021-08-18 00:34:06 -07:00
Christudasan Devadasan 4f5ba46e16 [AMDGPU] Set wait state for meta instructions to zero
It looked more reasonable to set the wait state to
zero for all non-instructions. With that we can avoid
the special handling for them in `getWaitStatesSince`
and `AdvanceCycle`. This NFC patch makes the handling
more generic.
2021-08-18 01:46:59 -04:00
Arthur Eubanks 3f4d00bc3b [NFC] More get/removeAttribute() cleanup 2021-08-17 21:05:41 -07:00
Arthur Eubanks de0ae9e89e [NFC] Cleanup more AttributeList::addAttribute() 2021-08-17 21:05:41 -07:00
Arthur Eubanks ad727ab7d9 [NFC] Migrate some callers away from Function/AttributeLists methods that take an index
These methods can be confusing.
2021-08-17 21:05:40 -07:00
Arthur Eubanks 46cf82532c [NFC] Replace Function handling of attributes with less confusing calls
To avoid magic constants and confusing indexes.
2021-08-17 21:05:40 -07:00
Qiu Chaofan 5ca250a03d [RegAlloc] Remove addAllocPriorityToGlobalRanges hook
It was introduced in 1a6dc92 and only enabled on PowerPC/AMDGPU. That
should be enabled for all targets.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D108010
2021-08-18 10:21:27 +08:00
Wang, Pengfei 2379949aad [X86] AVX512FP16 instructions enabling 3/6
Enable FP16 conversion instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105265
2021-08-18 09:03:41 +08:00
Simon Pilgrim 359cfa2af7 [X86] EmitInstrWithCustomInserter - silence uninitialized variable warnings. NFCI.
Consistently use llvm_unreachable for the default case in the inner switch statements.
2021-08-17 22:01:07 +01:00
Craig Topper 8f6cea43e7 [RISCV] Use RISCV::RVVBitsPerBlock for RGK_ScalableVector in getRegisterBitWidth.
I might be wrong, but I think this is should be width of the known
min size we use for scalable vectors. It shouldn't scale with
minimum vlen.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107945
2021-08-17 11:13:15 -07:00
Simon Pilgrim caff2acae1 [AArch64] AArch64DAGToDAGISel::tryReadRegister/tryWriteRegister - don't dereference dyn_cast<> results.
dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the cast is correct.

Fixes static analyser warnings.
2021-08-17 18:40:59 +01:00
Simon Pilgrim 1e770f0388 [ARM] ARMDAGToDAGISel::tryReadRegister/tryWriteRegister - don't dereference dyn_cast<> results.
dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the cast is correct.

Fixes static analyser warnings.
2021-08-17 18:40:59 +01:00
Simon Pilgrim fb81271e8b [AMDGPU] Fix lowering of AMDGPU::G_CTTZ_ZERO_UNDEF to AMDGPU::G_AMDGPU_FFBL_B32
As mentioned on D107474, there was a copy+paste typo repeating G_CTLZ_ZERO_UNDEF that coverity reported as dead code.

Differential Revision: https://reviews.llvm.org/D108210
2021-08-17 18:09:57 +01:00
Roman Lebedev 2078c4ecfd
[X86] Lower insertions into upper half of an 256-bit vector as broadcast+blend (PR50971)
Broadcast is not worse than extract+insert of subvector.
https://godbolt.org/z/aPq98G6Yh

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D105390
2021-08-17 18:45:10 +03:00
Dylan Fleming ef198cd99e [SVE] Remove usage of getMaxVScale for AArch64, in favour of IR Attribute
Removed AArch64 usage of the getMaxVScale interface, replacing it with
the vscale_range(min, max) IR Attribute.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D106277
2021-08-17 14:42:47 +01:00
David Green 52e0cf9d61 [ARM] Enable subreg liveness
This enables subreg liveness in the arm backend when MVE is present,
which allows the register allocator to detect when subregister are
alive/dead, compared to only acting on full registers. This can helps
produce better code on MVE with the way MQPR registers are made up of
SPR registers, but is especially helpful for MQQPR and MQQQQPR
registers, where there are very few "registers" available and being able
to split them up into subregs can help produce much better code.

Differential Revision: https://reviews.llvm.org/D107642
2021-08-17 14:10:33 +01:00
David Green 62e892fa2d [ARM] Add MQQPR and MQQQQPR spill and reload pseudo instructions
As a part of D107642, this adds pseudo instructions for MQQPR and
MQQQQPR register classes, that can spill and reloads entire registers
whilst keeping them combined, not splitting them into multiple D subregs
that a VLDMIA/VSTMIA would use. This can help certain analyses, and
helps to prevent verifier issues with subreg liveness.
2021-08-17 13:51:34 +01:00
Sebastian Neubauer fbae34635d [GlobalISel] Add combine for PTR_ADD with regbanks
Combine two G_PTR_ADDs, but keep the register bank of the constant.
That way, the combine can be used in post-regbank-select combines.

Introduce two helper methods in CombinerHelper, getRegBank and
setRegBank that get and set an optional register bank to a register.
That way, they can be used before and after register bank selection.

Differential Revision: https://reviews.llvm.org/D103326
2021-08-17 13:58:16 +02:00
Simon Pilgrim 895ed64009 [AArch64] LowerCONCAT_VECTORS - merge getNumOperands() calls. NFCI.
Improves on the unused variable fix from rG4357562067003e25ab343a2d67a60bd89cd66dbf
2021-08-17 11:23:03 +01:00
Bing1 Yu bcec4ccd04 [X86] [AMX] Replace bitcast with specific AMX intrinsics with X86 specific cast.
There is some discussion on the bitcast for vector and x86_amx at https://reviews.llvm.org/D99152. This patch is to introduce a x86 specific cast for vector and x86_amx, so that it can avoid some unnecessary optimization by middle-end. On the other way, we have to optimize the x86 specific cast by ourselves. This patch also optimize the cast operation to eliminate redundant code.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D107544
2021-08-17 17:04:26 +08:00
David Stuttard ebdb0d09a4 AMDGPU: During img instruction ret value construction cater for non int values
Make sure return type is int type.

Differential Revision: https://reviews.llvm.org/D108131

Change-Id: Ic02f07d1234cd51b6ed78c3fecd2cb1d6acd5644
2021-08-17 09:08:24 +01:00
Christudasan Devadasan 686607676f [AMDGPU] Skip pseudo MIs in hazard recognizer
Instructions like WAVE_BARRIER and SI_MASKED_UNREACHABLE
are only placeholders to prevent certain unwanted
transformations and will get discarded during assembly
emission. They should not be counted during nop insertion.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108022
2021-08-16 23:11:14 -04:00