Commit Graph

63852 Commits

Author SHA1 Message Date
Carl Ritson 99c790dc21 [AMDGPU] Make BVH isel consistent with other MIMG opcodes
Suffix opcodes with _gfx10.
Remove direct references to architecture specific opcodes.
Add a BVH flag and apply this to disassembly.
Fix a number of disassembly errors on the gfx90a target caused by
the previous incorrect BVH detection code.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108117
2021-08-17 10:42:22 +09:00
Min-Yih Hsu eec3495a9d [M68k] Do not pass llvm::Function& to M68kCCState
Previously we were passing `llvm::Function&` into `M68kCCState` to lower
arguments in fastcc. However, that reference might not be available if
it's a library call, where we only need the argument types. Therefore,
we now simply pass a list of the argument `llvm::Type`s.

This fixes PR-50752.

Differential Revision: https://reviews.llvm.org/D108101
2021-08-16 15:33:08 -07:00
David Green 9236dea255 [ARM] Create MQQPR and MQQQQPR register classes
Similar to the MQPR register class as the MVE equivalent to QPR, this
adds MQQPR and MQQQQPR register classes for the MVE equivalents of QQPR
and QQQQPR registers. The MVE MQPR class seems to have worked out quite
well, and adding MQQPR and MQQQQPR allows us to specify the number of
registers a little more accurately, calculating register pressure limits
a little better.

Differential Revision: https://reviews.llvm.org/D107463
2021-08-16 22:58:12 +01:00
Jordan Rupprecht 4357562067 [NFC][AArch64] Fix unused var in release build 2021-08-16 10:04:32 -07:00
Simon Pilgrim d6fe8d37c6 [DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) -> concat_vectors(x,y,a,b)
Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal.

This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands.

Differential Revision: https://reviews.llvm.org/D107597
2021-08-16 16:06:54 +01:00
Cullen Rhodes 09507b5325 [AArch64][SME] Disable NEON in streaming mode
In streaming mode most of the NEON instruction set is illegal, so
disable NEON when compiling with `+streaming-sve`, unless NEON is
explicitly requested.

Subsequent patches will add support for the small subset of NEON
instructions that are legal in streaming mode.

Reviewed By: paulwalker-arm, david-arm

Differential Revision: https://reviews.llvm.org/D107902
2021-08-16 07:56:48 +00:00
Craig Topper b82ce77b2b [X86] Support avx512fp16 compare instructions in the IntelInstPrinter.
This enables printing of the mnemonics that contain the predicate
in the Intel printer. This requires accounting for the memory size
that is explicitly printed in Intel syntax. Those changes have been
synced to the ATT printer as well.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108093
2021-08-16 12:31:36 +08:00
Craig Topper ff95d2524a [X86] Prevent accidentally accepting cmpeqsh as a valid mnemonic.
We should only accept it as vcmpeqsh.

Same for all the other 31 comparison values.
2021-08-15 12:00:56 -07:00
Craig Topper 819818f7d5 [X86] Modify the commuted load isel pattern for VCMPSHZrm to match VCMPSSZrm/VCMPSDZrm.
This allows commuting any immediate value. The previous code only
commuted equality immediates. This was inherited from an earlier
version of VCMPSSZrm/VCMPSDZrm.
2021-08-15 11:43:56 -07:00
Craig Topper 786b8fcc9b [X86] Add vcmpsh/vcmpph to X86InstrInfo::commuteInstructionImpl.
They were already added to findCommuteOpIndices, but they also
need to be in X86InstrInfo::commuteInstructionImpl in order
to adjust the immediate control.
2021-08-15 11:36:13 -07:00
Nikita Popov 81b106584f [AArch64] Fix comparison peephole opt with non-0/1 immediate (PR51476)
This is a non-intrusive fix for
https://bugs.llvm.org/show_bug.cgi?id=51476 intended for backport
to the 13.x release branch. It expands on the current hack by
distinguishing between CmpValue of 0, 1 and 2, where 0 and 1 have
the obvious meaning and 2 means "anything else". The new optimization
from D98564 should only be performed for CmpValue of 0 or 1.

For main, I think we should switch the analyzeCompare() and
optimizeCompare() APIs to use int64_t instead of int, which is in
line with MachineOperand's notion of an immediate, and avoids this
problem altogether.

Differential Revision: https://reviews.llvm.org/D108076
2021-08-15 12:35:52 +02:00
Wang, Pengfei f1de9d6dae [X86] AVX512FP16 instructions enabling 2/6
Enable FP16 binary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105264
2021-08-15 08:56:33 +08:00
Kazu Hirata 915cc69259 [Aarch64] Remove redundant c_str (NFC)
Identified with readability-redundant-string-cstr.
2021-08-14 08:49:40 -07:00
Craig Topper d63f117210 [RISCV] Support RISCVISD::SELECT_CC in ComputeNumSignBitsForTargetNode. 2021-08-13 18:00:09 -07:00
Matt Arsenault a77ae4aa6a AMDGPU: Stop attributor adding attributes to intrinsic declarations 2021-08-13 20:51:48 -04:00
Matt Arsenault 5beb9a0e6a AMDGPU: Respect compute ABI attributes with unknown OS
Unfortunately Mesa is still using amdgcn-- as the triple for OpenGL,
so we still have the awkward unknown OS case to deal with. Previously
if the HSA ABI intrinsics appeared, we would not add the ABI
registers to the function. We would emit an error later, but we still
need to produce some compile result. Start adding the registers to any
compute function, regardless of the OS. This keeps the internal state
more consistent, and will help avoid numerous test crashes in a future
patch which starts assuming the ABI inputs are present on functions by
default.
2021-08-13 20:44:46 -04:00
Arthur Eubanks 16e8134e7c [NFC] One more AttributeList::getAttribute(FunctionIndex) -> getFnAttr() 2021-08-13 16:56:42 -07:00
Arthur Eubanks d7593ebaee [NFC] Clean up users of AttributeList::hasAttribute()
AttributeList::hasAttribute() is confusing; use clearer methods like
hasParamAttr()/hasRetAttr().

Add hasRetAttr() since it was missing from AttributeList.
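
For illustration, a minimal sketch of the clearer call sites (the wrapper
names here are hypothetical; the hasRetAttr()/hasParamAttr() calls are the
API named above):

```
#include "llvm/IR/Attributes.h"
using namespace llvm;

// Query the return value directly instead of indexing with ReturnIndex.
static bool returnsNonNull(const AttributeList &AL) {
  return AL.hasRetAttr(Attribute::NonNull);
}

// Query a parameter by its argument number.
static bool paramIsByVal(const AttributeList &AL, unsigned ArgNo) {
  return AL.hasParamAttr(ArgNo, Attribute::ByVal);
}
```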
2021-08-13 11:59:18 -07:00
Arthur Eubanks 80ea2bb574 [NFC] Rename AttributeList::getParam/Ret/FnAttributes() -> get*Attributes()
This is more consistent with similar methods.
2021-08-13 11:16:52 -07:00
Arthur Eubanks 92ce6db9ee [NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr()
This is more consistent with similar methods.
2021-08-13 11:09:18 -07:00
Arthur Eubanks a0c42ca56c [NFC] Remove AttributeList::hasParamAttribute()
It's the same as AttributeList::hasParamAttr().
2021-08-13 10:58:21 -07:00
Amy Kwan 581a80304c [PowerPC] Disable CTR loop generation for fma with the PPC double double type.
It is possible to generate the llvm.fmuladd.ppcf128 intrinsic, but there is no actual
FMA instruction that corresponds to this intrinsic call for ppcf128. Thus, this
intrinsic needs to remain a call, as it cannot be lowered to any instruction, which
also means we need to disable CTR loop generation for fma involving the ppcf128 type.
This patch accomplishes this behaviour.

Differential Revision: https://reviews.llvm.org/D107914
2021-08-13 12:27:24 -05:00
Jessica Paquette ccfc079047 [AArch64][GlobalISel] Legalize scalar G_SSUBSAT + G_SADDSAT
These are lowered, matching SDAG behaviour. (See
llvm/test/CodeGen/AArch64/ssub_sat.ll and llvm/test/CodeGen/AArch64/sadd_sat.ll)

These fall back ~159 times on a build of clang with GISel enabled.

Differential Revision: https://reviews.llvm.org/D107777
2021-08-13 09:02:25 -07:00
Shivam Gupta 835ea22b37 [AVR] Enable machine verifier
Reviewed By: mhjacobson, benshi001

Differential Revision: https://reviews.llvm.org/D107853
2021-08-13 12:11:22 +08:00
Heejin Ahn adb96d2e76 [WebAssembly] Fix leak in Emscripten SjLj
For SjLj, we allocate a table to record setjmp buffer info in the entry
of each setjmp-calling function by inserting a `malloc` call, and insert
a `free` call to free the buffer before each `ret` instruction.

But this is not sufficient; we have to free the buffer before we throw.
In SjLj handling, normal functions that can possibly throw or longjmp
are wrapped with an invoke and caught within the function so they don't
end up escaping the function. But three functions throw and escape the
function:
- `__resumeException` (Emscripten library function used for Emscripten
  EH)
- `emscripten_longjmp` (Emscripten library function used for Emscripten
  SjLj)
- `__cxa_throw` (libc++abi function called for the C++ `throw` keyword)

The first two functions are used to rethrow the current
exception/longjmp when the caught exception/longjmp is not for the
current function. `__cxa_throw` is used for exceptions, and because we
consider it a function that cannot longjmp, it escapes the function
right away, before which we should free the buffer.
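
As a source-level illustration (a hypothetical sketch, not code from this
patch):

```
#include <setjmp.h>

static jmp_buf Buf;

// At function entry the transformation malloc()s the setjmp buffer table
// and free()s it before each `ret`. A throw that escapes f() never
// reaches a `ret`, so the table must also be freed before the throw.
void f(bool Fail) {
  if (setjmp(Buf))
    return;   // table freed before this return
  if (Fail)
    throw 42; // lowered via __cxa_throw: table must be freed here too
}
```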

Currently `lsan.test_longjmp3` and `lsan.test_exceptions_longjmp3` fail
in Emscripten; this CL fixes these.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D107852
2021-08-12 16:32:46 -07:00
Heejin Ahn aca198cf74 [WebAssembly] Error out when Emscripten SjLj setjmp is used with Wasm EH
Currently, when Wasm EH is used with Emscripten SjLj, Emscripten SjLj
cannot handle `invoke` instructions - it assumes all `invoke`s have been
lowered away with Emscripten EH. But in Wasm EH they are lowered in
instruction selection, so they are still present in the IR stage. This
happens when
1. Wasm EH and Emscripten SjLj are used together
2. A function that calls `setjmp` uses exceptions, i.e., has `invoke`s

We were already erroring out with an assertion failure in this case, but
this CL makes it error out more properly with a valid error message.

Wasm EH + Wasm SjLj will not have this restriction. (It will have
another restriction though, e.g., `setjmp` cannot be called within
`catch`. But why would anyone do that..)

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D107687
2021-08-12 16:19:04 -07:00
Lei Huang 8930af45c3 [PowerPC] Implement XL compatibility builtin __addex
Add builtin and intrinsic for `__addex`.

This patch is part of a series of patches to provide builtins for
compatibility with the XL compiler.

Reviewed By: stefanp, nemanjai, NeHuang

Differential Revision: https://reviews.llvm.org/D107002
2021-08-12 16:38:21 -05:00
Heejin Ahn 78e87970af [WebAssembly] Disable offset folding for function addresses
Wasm does not support function addresses with offsets, but isel can
generate folded SDValues in the form of (@func + offset) without this
patch.

Fixes https://bugs.llvm.org/show_bug.cgi?id=43133.

Reviewed By: dschuff, sbc100

Differential Revision: https://reviews.llvm.org/D107940
2021-08-12 13:40:41 -07:00
Craig Topper 79fbddbea0 [RISCV] Teach vsetvli insertion pass that it doesn't need to insert vsetvli for unit-stride or strided loads/stores in some cases.
For unit-stride and strided loads/stores we set the SEW operand of
the pseudo instruction equal to the EEW encoded in the opcode. The LMUL
of the pseudo instruction is the LMUL we want.

These instructions calculate EMUL=(EEW/SEW) * LMUL. We can use
this to avoid changing vtype if the SEW/LMUL ratio of the previous
vtype matches the EEW/EMUL ratio we need for the instruction. For
example, a previous vtype of SEW=32, LMUL=2 has ratio 16, and a
unit-stride load with EEW=16 that wants EMUL=1 also has ratio 16,
so the old vtype can be kept.

Due to how the global analysis works, we can only do this
optimization when the previous vsetvli was produced in the block
containing the store. We need to know in the first phase if the
vsetvli will be inserted so we can propagate information to
the successors in the second phase correctly. This means we can't
depend on predecessors.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D106601
2021-08-12 10:05:27 -07:00
David Green ae9a346ef8 [ARM] Fix DAG combine loop in reduction distribution
Given a constant operand, the MVE and DAGCombine combines could fight,
each redistributing in the opposite order. Add a guard to the MVE
vecreduce distribution to prevent that.
2021-08-12 16:37:39 +01:00
Victor Huang 99e00663d4 [PowerPC] Fix return address computation for "__builtin_return_address"
When depth > 0, the callee frame address was used to compute the return address
of the callee, producing an improper return address. This patch adds the fix to
use the caller frame address to compute the return address of the callee.
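
For reference, a depth > 0 query looks like this (illustrative sketch):

```
// __builtin_return_address(0) yields the current function's return
// address; depth 1 asks for the caller's return address, which must be
// computed from the caller's frame, not the callee's.
void *callers_return_address() {
  return __builtin_return_address(1);
}
```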

Reviewed By: nemanjai, #powerpc

Differential revision: https://reviews.llvm.org/D107646
2021-08-12 09:44:49 -05:00
David Truby 9c47d6b48d [llvm][sve] Lowering for VLS extending loads
This patch enables extending loads for fixed length SVE code generation.

There is a slight regression here in the mulh tests; since these tests
load the parameter and then extend it, these are treated as extending
loads, which are merged, preventing the mulh instruction from being
generated. As this affects scalable SVE codegen as well, this should be
addressed in a separate patch.

Reviewed By: bsmith

Differential Revision: https://reviews.llvm.org/D107057
2021-08-12 09:43:39 +00:00
Cullen Rhodes 419deccfd1 [AArch64] NFC: Remove register decoder tables in disassembler
The register classes are generated by TableGen; use them instead of
handwritten tables.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D107763
2021-08-12 07:28:56 +00:00
Amara Emerson 73056f239e [AArch64][GlobalISel] Simplify/nuke the merge/unmerge legalizer rules.
These rules were originally written when the new predicate based legalizer
was introduced, in an attempt to preserve existing behaviour. They weren't
properly kept up to date as things like vector support were split out into
G_CONCAT_VECTORS, and frankly, even if they were, they were too complex.

It's much easier to start from scratch with what we can actually support,
which is just a few type combinations. Anything illegal should either be
legalized, or be eliminated as a side effect of artifact combination.

Differential Revision: https://reviews.llvm.org/D107937
2021-08-11 16:45:23 -07:00
Usman Nadeem 9396c3ec7b [AArch64][SVE] Remove assertion/range check for i16 values during immediate selection
The assertion can fail in some cases when an i16 constant is promoted
to i32.

e.g. in the added test case the value `i16 -32768` is within the range
of i16 but the assert fails when the constant is promoted to positive
`i32 32768` by an earlier call to DAG.getConstant().

Differential Revision: https://reviews.llvm.org/D107880

Change-Id: I2f6179783cbc9630e6acab149a762b43c65664de
2021-08-11 14:50:20 -07:00
Amara Emerson 2c1789bc8c [AArch64][GlobalISel] Add ptradd_immed_chain combine to post-legalizer combiner. 2021-08-11 13:59:23 -07:00
David Green 8c50b5fbfe [ARM] Add extra debug messages for validating live outs. NFC
We are running into more and more cases where the liveouts of low
overhead loops do not validate. Add some extra debug messages to make it
clearer why.
2021-08-11 10:35:53 +01:00
Cullen Rhodes 1fe0e6a380 [AArch64][SME] Support ptrue(s) in streaming mode
The ptrue and ptrues instructions are legal in streaming mode; this was
missed in D106272.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06/SVE-Instructions

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D107807
2021-08-11 07:49:36 +00:00
Christopher Di Bella c874dd5362 [llvm][clang][NFC] updates inline licence info
Some files still contained the old University of Illinois Open Source
Licence header. This patch replaces that with the Apache 2 with LLVM
Exception licence.

Differential Revision: https://reviews.llvm.org/D107528
2021-08-11 02:48:53 +00:00
Sushma Unnibhavi 7bdce6bcbd [M68k][GlobalISel] RegBankSelect implementation
Implementation of RegBankSelect for the M68k backend.

Differential Revision: https://reviews.llvm.org/D107542
2021-08-10 15:24:43 -07:00
Thomas Johnson b821086876 [ARC] Add codegen for count trailing zeros intrinsic for the ARC backend
Differential Revision: https://reviews.llvm.org/D107828
2021-08-10 12:07:35 -07:00
Matt Arsenault d719f1c3cc AMDGPU: Add alloc priority to global ranges
The requested register class priorities weren't respected
globally. Not sure why this is a target option, and not just the
expected behavior (recently added in
1a6dc92be7). This avoids an allocation
failure when many wide tuple spills are introduced. I think this is a
workaround, since I would expect the allocation priority to be only a
performance hint, not a requirement. The allocator should be smarter
about when only a subregister needs to be spilled and restored.

This does regress a couple of degenerate store stress lit tests which
shouldn't be too important.
2021-08-10 13:12:34 -04:00
Craig Topper 6f5edc3487 [RISCV] Fold (add (select lhs, rhs, cc, 0, y), x) -> (select lhs, rhs, cc, x, (add x, y))
Similar for sub except sub isn't commutative.

Modify the existing and/or/xor folds to also work on ISD::SELECT
and not just RISCVISD::SELECT_CC. This is needed to make sure
we do this transform before type legalization turns i32 add/sub
into add/sub+sign_extend_inreg on RV64. If we don't do this before
that, the sign_extend_inreg will still be after the select.
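
A source-level shape that benefits (an illustrative sketch, not a test
from the patch):

```
// (add (select cond, 0, y), x) becomes (select cond, x, (add x, y)),
// folding the add into the arm of the select that needs it.
int fold_example(int lhs, int rhs, int x, int y) {
  return (lhs < rhs ? 0 : y) + x;
}
```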

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107603
2021-08-10 09:02:56 -07:00
David Green 013030a0b2 [AArch64] Correct sinking of shuffles to adds/subs
This was checking extends as shuffles, whereas we should be checking
the operands. This helps sink the shuffles, creating more addl/subl
instructions.

Differential Revision: https://reviews.llvm.org/D107623
2021-08-10 13:25:42 +01:00
Tim Northover 5ad0860899 AArch64: support @llvm.va_copy in GISel 2021-08-10 13:11:03 +01:00
David Green c140ff493e [ARM] Change a couple of instances of LiveRegs.contains to !LiveRegs.available
This changes a couple of calls to LiveRegs.contains to
!LiveRegs.available: one in Thumb1FrameLoweringInfo (which modifies a
test to look more correct to me, given r7 should be the frame pointer so
is not available), and another in the ARMLoadStoreOptimizer, for which I
don't have a test; it was just found by inspection.

Differential Revision: https://reviews.llvm.org/D107454
2021-08-10 09:53:26 +01:00
Tony Tye 53eb469195 [AMDGPU] Support non-strictly stronger memory orderings in SIMemoryLegalizer
C++20 no longer requires the failure memory ordering to be no stronger than the
success memory ordering. Adjust the assert in the AMDGPU SIMemoryLegalizer, and
merge instruction memory orderings.

Add a common operation to merge memory orders that allows non-strict memory
orderings to be combined. Use it in SIMemoryLegalizer and
MachineMemOperand::getMergedOrdering.

Reviewed By: efriedma, rampitec

Differential Revision: https://reviews.llvm.org/D106729
2021-08-10 08:43:03 +00:00
Cullen Rhodes 81f057c253 [AArch64][SVE] NFC: Remove unused p0-p7 with element size predicates
Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D107752
2021-08-10 07:56:22 +00:00
Wang, Pengfei 6f7f5b54c8 [X86] AVX512FP16 instructions enabling 1/6
1. Enable FP16 type support and basic declarations used by following patches.
2. Enable new instructions VMOVW and VMOVSH.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105263
2021-08-10 12:46:01 +08:00
Usman Nadeem 5420fc4a27 [AArch64][SVE][InstCombine] Unpack of a splat vector -> Scalar extend
Replace vector unpack operation with a scalar extend operation.
  unpack(splat(X)) --> splat(extend(X))

If we have both unpkhi and unpklo for the same vector, then we may
save a register in some cases, e.g:
  Hi = unpkhi (splat(X))
  Lo = unpklo(splat(X))
   --> Hi = Lo = splat(extend(X))

Differential Revision: https://reviews.llvm.org/D106929

Change-Id: I77c5c201131e3a50de1cdccbdcf84420f5b2244b
2021-08-09 14:58:54 -07:00
Usman Nadeem 85bbc05154 [AArch64][SVE][InstCombine] Move last{a,b} before binop if one operand is a splat value
Move the last{a,b} operation to the vector operand of the binary instruction if
the binop's operand is a splat value. This essentially converts the binop
to a scalar operation.

Example:
    // If x and/or y is a splat value:
    lastX (binop (x, y)) --> binop(lastX(x), lastX(y))

Differential Revision: https://reviews.llvm.org/D106932

Change-Id: I93ff5302f9a7972405ee0d3854cf115f072e99c0
2021-08-09 14:48:41 -07:00
Eli Friedman ac20e56911 [AArch64] Implement FCOPYSIGN for SVE.
I was originally going to try to implement this in target-independent
code, but it's actually sort of tricky to generate the correct sequence
for vectors like nxv2f32.  So just stick this in target-specific code,
at least for now.

Differential Revision: https://reviews.llvm.org/D107608
2021-08-09 12:06:48 -07:00
Bradley Smith 73ecb9987b [AArch64][SVE] Fix assertion failure when lowering fixed length gather/scatter
The patterns for fixed length gather/scatter with 32-bit offsets and
64-bit memory type are slightly different from the rest of the patterns;
as such, the lowering needs to be slightly different to ensure the
correct types are used.

Differential Revision: https://reviews.llvm.org/D107576
2021-08-09 14:05:22 +00:00
Fraser Cormack 2b4a1d4b86 [RISCV] Improve codegen for shuffles with LHS/RHS splats
Shuffles which are broken into separate halves reveal splats in which
a half is accessed via one index; such operations can be optimized to
use "vrgather.vi".

This optimization could be achieved by adding extra patterns to match
`vrgather_vv_vl` which uses a splat as an index operand, but this patch
instead identifies the splat earlier. This way, future optimizations can
build on top of the data gathered here, e.g., to splat-gather dominant
indices and insert any leftovers.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107449
2021-08-09 10:31:40 +01:00
Min-Yih Hsu 7cbcde4aa3 [M68k] Use separate asm operand class for different widths of address
This helps the asm parser pick the correct variant of an instruction.
This patch also migrates all the control instruction MC tests.
2021-08-09 00:07:19 -07:00
Min-Yih Hsu cf277f0b31 [M68k][NFC] Coalesce render methods in different asm register op class
And assign RegClass (i.e. operand class for all GPR) as the super class
of ARegClass and DRegClass. Note that this is an NFC change because
we already had XRDReg to model either address or data register
operands (as well as test coverage for it). The new super class syntax
added here is just making the relations between three RegClass-es more
explicit.
2021-08-09 00:07:19 -07:00
Cullen Rhodes 1a18bb9270 [AArch64] NFC: Remove DecodeVectorRegisterClass from disassembler
The decoder function and table are the same as FPR128's; use those instead.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D107644
2021-08-09 06:52:47 +00:00
Craig Topper 2f3b738960 [RISCV] Add optimizations for FMV_X_ANYEXTH similar to FMV_X_ANYEXTW_RV64.
This enables the fneg and fabs combines we have for FMV_X_ANYEXTW_RV64.
2021-08-08 18:30:48 -07:00
Craig Topper 88bc29f5f2 [RISCV] Introduce a RISCV CondCode enum instead of using ISD:SET* in MIR. NFC
Previously we converted ISD condition codes to integers and stored
them directly in our MIR instructions. The ISD enum kind of belongs
to SelectionDAG so that seems like incorrect layering.

This patch instead uses a CondCode node on RISCV::SELECT_CC until
isel and then converts it from ISD encoding to a RISCV specific value.
This value can be converted to/from the RISCV branch opcodes in the
RISCV namespace.

My larger motivation is to possibly support a microarchitectural
feature of some CPUs where a short forward branch over a single
instruction can be predicated internally. This will require a new
pseudo instruction for select that needs to carry a branch condition
and live probably until RISCVExpandPseudos. At that point it can be
expanded to control flow without other instructions ending up in the
predicated basic block. Using an ISD encoding in RISCVExpandPseudos
doesn't seem like correct layering.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107400
2021-08-08 17:25:37 -07:00
Craig Topper 20dfe051ab [RISCV] Move the $rs operand of PseudoStore from outs to ins. NFC
This is the data to be stored so it should be an input.

To keep operand order similar between loads and stores, move the temp
register to the first dest operand of floating point loads. Rework
the assembler code accordingly.

This doesn't have any functional effect because this Pseudo is only
used by the assembler which doesn't use ins/outs.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107309
2021-08-08 15:58:24 -07:00
Min-Yih Hsu 657bb7262d [M68k] Separate ADDA from ADD and migrate rest of the arithmetic MC tests
Previously the ADD & ADDA (as well as SUB & SUBA) instructions were mixed
together, which not only violated Motorola assembly's syntax but also
made asm parsing more difficult. This patch separates these two kinds of
instructions and migrates the rest of the tests from
test/CodeGen/M68k/Encoding/Arithmetic to test/MC/M68k/Arithmetic.

Note that we observed minor regressions in codegen quality: sometimes
isel uses ADD instead of ADDA even though the latter can lead to a
shorter sequence of code. This implies that some isel patterns might need
to be updated.
2021-08-07 17:19:12 -07:00
Craig Topper d4ee84ceee [RISCV] Support FP_TO_S/UINT_SAT for i32 and i64.
The fcvt fp-to-integer instructions saturate if their input is
infinity or out of range, but the instructions produce the maximum
integer for nan instead of the 0 required by the ISD opcodes.

This means we can use the instructions to do the saturating
conversion, but we'll need to fix up the nan case at the end.
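
For reference, the saturating semantics for f32 -> i32 look roughly like
this (an illustrative sketch of the ISD node's semantics, not the emitted
code):

```
#include <cstdint>
#include <limits>

int32_t fptosi_sat(float X) {
  if (X != X)
    return 0; // nan must yield 0; fcvt alone would yield the max integer
  if (X <= (float)std::numeric_limits<int32_t>::min())
    return std::numeric_limits<int32_t>::min();
  if (X >= (float)std::numeric_limits<int32_t>::max())
    return std::numeric_limits<int32_t>::max();
  return (int32_t)X; // in range: ordinary conversion
}
```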

We can probably improve the i8 and i16 default codegen as well,
but I'll leave that for a follow up.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107230
2021-08-07 16:06:00 -07:00
Craig Topper 24dfba8d50 [X86] Teach shouldSinkOperands to recognize pmuldq/pmuludq patterns.
The IR for pmuldq/pmuludq intrinsics uses a sext_inreg/zext_inreg
pattern on the inputs. Ideally we pattern match these away during
isel. It is possible for LICM or other middle end optimizations
to separate the extend from the mul. This prevents SelectionDAG
from removing it, or, depending on how the extend is lowered, we
may not be able to generate an AssertSExt/AssertZExt in the
mul basic block. This will prevent pmuldq/pmuludq from being
formed at all.

This patch teaches shouldSinkOperands to recognize this so
that CodeGenPrepare will clone the extend into the same basic
block as the mul.
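
A source-level shape that relies on this (illustrative sketch):

```
#include <cstdint>

// Each element is zero-extended to 64 bits and multiplied. Isel can only
// form pmuludq when the extends sit in the same block as the mul, which
// is what sinking them back restores after LICM separates them.
void mul_wide(const uint32_t *A, const uint32_t *B, uint64_t *Out, int N) {
  for (int I = 0; I < N; ++I)
    Out[I] = (uint64_t)A[I] * B[I];
}
```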

Fixes PR51371.

Differential Revision: https://reviews.llvm.org/D107689
2021-08-07 08:45:56 -07:00
Steffen Larsen 1b4c85fc02 [NVPTX] Add NVPTX intrinsics for CUDA PTX 6.5 ldmatrix instructions
Adds NVPTX intrinsics for the CUDA PTX `ldmatrix.sync.aligned` instructions added in PTX 6.5.

PTX ISA description of `ldmatrix.sync.aligned`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-ldmatrix

Authored-by: Steffen Larsen <steffen.larsen@codeplay.com>

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D107046
2021-08-06 16:13:35 -07:00
Michael Liao 05783e1cfe [amdgpu] Revise the conversion from i64 to f32.
- Replace 'cmp+sel' with 'umin' if possible.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D107507
2021-08-06 17:01:47 -04:00
Amara Emerson 2b067e3335 Change TargetLowering::canMergeStoresTo() to take a MF instead of DAG.
DAG is unnecessary and we need this hook to implement store merging on GlobalISel too.
2021-08-06 12:57:53 -07:00
Thomas Johnson f8a4495149 [ARC] Add codegen for llvm.ctlz intrinsic for the ARC backend
Differential Revision: https://reviews.llvm.org/D107611
2021-08-06 12:18:06 -07:00
Craig Topper b2ca4dc935 [LegalizeTypes] Add a simple expansion for SMULO when a libcall isn't available.
This isn't optimal, but prevents crashing when the libcall isn't
available. It just calculates the full product and makes sure the high bits
match the sign of the low half. Each of the pieces should go through its own
type legalization.
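
A scalar sketch of the idea for i32 (illustrative only, not the legalizer
code):

```
#include <cstdint>

// Compute the double-width product, then flag overflow when the high half
// is not the sign-extension of the low half. The >> on a signed value
// assumes an arithmetic shift, as on mainstream targets.
bool smulo32(int32_t A, int32_t B, int32_t *Res) {
  int64_t P = (int64_t)A * (int64_t)B;
  *Res = (int32_t)P;
  return (int32_t)(P >> 32) != (*Res >> 31);
}
```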

This can make D107420 unnecessary.

Needs tests, but I wanted to start discussion about D107420.

Reviewed By: FreddyYe

Differential Revision: https://reviews.llvm.org/D107581
2021-08-06 09:43:01 -07:00
David Green 77e8f4eeee [ARM] Define ComplexPatternFuncMutatesDAG
Some of the Arm complex pattern functions call canExtractShiftFromMul,
which can modify the DAG in-place. For this to be valid and handled
successfully we need to define ComplexPatternFuncMutatesDAG.

Differential Revision: https://reviews.llvm.org/D107476
2021-08-06 17:35:11 +01:00
Reshabh Sharma 5173854f19 [AMDGPU] Handle functions in llvm's global ctors and dtors list
This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.

HSAStreamer will use function attributes, "device-init" and
"device-fini" to distinguish between init and fini kernels from
the regular kernels and will emit metadata with ".kind" set to
"init" and "fini" respectively.

To reduce the number of init and fini kernels, the ctors and
dtors present in the llvm's global.ctors and global.dtors lists
are called from a single init and fini kernel respectively.

Reviewed by: yaxunl

Differential Revision: https://reviews.llvm.org/D105682
2021-08-06 15:53:33 +05:30
Simon Pilgrim dbce6a8d9d [ARM] Fold insert_subvector to concat_vectors
D107068 fixed the same problem on aarch64 but the arm variant wasn't exposed in existing test coverage.

I've copied the arm64-neon-copy tests (and stripped the intrinsic test from it) for testing on arm neon builds as well.
2021-08-06 11:21:31 +01:00
Simon Pilgrim 18e6a03b1a [X86][AVX] Extract SUBV_BROADCAST constant bits from just the lower subvector range (PR51281)
As reported on PR51281, an internal fuzz test encountered an issue when extracting constant bits from a SUBV_BROADCAST node from a constant pool source larger than the broadcasted subvector width.

getTargetConstantBitsFromNode was assuming that the Constant would be the same size as the subvector, resulting in incorrect packing of the per-element bits data.

This patch attempts to solve this by using the SUBV_BROADCAST node to determine the subvector width, and then ensuring we extract only the lowest bits of that subvector bitsize from the Constant.

Differential Revision: https://reviews.llvm.org/D107158
2021-08-06 11:21:31 +01:00
Cullen Rhodes 08bc441174 [AArch64] NFC: drop unnecessary llvm:: namespace prefix on MCInst 2021-08-06 09:23:18 +00:00
Jay Foad 83610d4eb0 [AMDGPU][GlobalISel] Better legalization of 32-bit ctlz/cttz
Differential Revision: https://reviews.llvm.org/D107474
2021-08-06 09:40:48 +01:00
Jay Foad 24b67a9024 [AMDGPU][GlobalISel] Improve regbankselect for 64-bit VGPR ctlz_zero_undef/cttz_zero_undef
We can improve on the generic splitting by using ffbh/ffbl, which have a
defined result when the input is zero.

Differential Revision: https://reviews.llvm.org/D107442
2021-08-06 09:40:48 +01:00
Jay Foad d77b43c385 [AMDGPU][GlobalISel] Add G_AMDGPU_FFBL_B32
This is the counterpart to G_AMDGPU_FFBH_U32 which already exists. These
instructions have a defined result of -1 when the input is zero.

Differential Revision: https://reviews.llvm.org/D107441
2021-08-06 09:40:48 +01:00
Serge Pavlov 4c4093e6e3 Introduce intrinsic llvm.isnan
This is a recommit of patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in a call to getFastMathFlags (the base type should be FPMathOperator,
not Instruction). The original commit message is duplicated below:

    Clang has builtin function '__builtin_isnan', which implements C
    library function 'isnan'. This function now is implemented entirely in
    clang codegen, which expands the function into set of IR operations.
    There are three mechanisms by which the expansion can be made.

    * The most common mechanism is using an unordered comparison made by
      instruction 'fcmp uno'. This simple solution is target-independent
      and works well in most cases. It however is not suitable if floating
      point exceptions are tracked. Corresponding IEEE 754 operation and C
      function must never raise FP exception, even if the argument is a
      signaling NaN. Compare instructions usually do not have such a
      property; they raise the 'invalid' exception in such cases. So this
      mechanism is unsuitable when exception behavior is strict. In
      particular it could result in unexpected trapping if the argument is SNaN.

    * Another solution was implemented in https://reviews.llvm.org/D95948.
      It is used in the cases when raising FP exceptions by 'isnan' is not
      allowed. This solution implements 'isnan' using integer operations.
      It solves the problem of exceptions, but offers one solution for all
      targets, however some can do the check in more efficient way.

    * Solution implemented by https://reviews.llvm.org/D96568 introduced a
      hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
      specific code into IR. Now only SystemZ implements this hook and it
      generates a call to target specific intrinsic function.
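
    As a sketch of the integer-operation approach from the second item
    (illustrative only): an IEEE double is NaN exactly when its absolute
    bit pattern is greater than that of +infinity.

```
#include <cstdint>
#include <cstring>

// Uses only integer operations, so no FP exception can be raised.
bool isnan_bits(double X) {
  uint64_t Bits;
  std::memcpy(&Bits, &X, sizeof Bits);
  return (Bits & 0x7fffffffffffffffULL) > 0x7ff0000000000000ULL;
}
```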

    Although these mechanisms allow implementing 'isnan' with enough
    efficiency, expanding 'isnan' in clang has drawbacks:

    * The operation 'isnan' is hidden behind generic integer operations or
      target-specific intrinsics. It complicates analysis and can prevent
      some optimizations.

    * IR can be created by tools other than clang, in this case treatment
      of 'isnan' has to be duplicated in that tool.

    Another issue with the current implementation of 'isnan' comes from the
    use of the options '-ffast-math' or '-fno-honor-nans'. If such an option is
    specified, 'fcmp uno' may be optimized to 'false'. It is a valid
    optimization in general, but it results in 'isnan' always returning
    'false'. For example, in some libc++ implementations the following code
    returns 'false':

        std::isnan(std::numeric_limits<float>::quiet_NaN())

    The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
    operands are never NaNs. This assumption however should not be applied
    to the functions that check FP number properties, including 'isnan'. If
    such a function returns the expected result instead of actually making
    checks, it becomes useless in many cases. The option '-ffast-math' is
    often used for performance critical code, as it can speed up execution
    at the expense of manual treatment of corner cases. If 'isnan' returns
    the assumed result, a user cannot use it in the manual treatment of NaNs
    and has to invent replacements, like making the check using integer
    operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
    which also expresses the opinion, that limitations imposed by
    '-ffast-math' should be applied only to 'math' functions but not to
    'tests'.

    To overcome these drawbacks, this change introduces a new IR intrinsic
    function 'llvm.isnan', which realizes the check as specified by IEEE-754
    and C standards in a target-agnostic way. During IR transformations it
    does not undergo undesirable optimizations. It reaches instruction
    selection, where it is lowered in a target-dependent way. The lowering can
    vary depending on options like '-ffast-math' or '-ffp-model' so that the
    resulting code satisfies the requested semantics.

    Differential Revision: https://reviews.llvm.org/D104854
2021-08-06 14:32:27 +07:00
Amara Emerson 4fee756c75 Delete copy-ctor of MachineFrameInfo.
I just hit a nasty bug when writing a unit test after calling MF->getFrameInfo()
without declaring the variable as a reference.

Deleting the copy-constructor also showed a place in the ARM backend which was
doing the same thing, although it didn't impact correctness there from the looks of it.
2021-08-05 23:24:37 -07:00
Kai Luo 666ee849f0 [PowerPC] Fix shift amount of xxsldwi when performing vector int_to_double
POC
```
// main.c
#include <stdio.h>
#include <altivec.h>
extern vector double foo(vector int s);
int main() {
  vector int s = {0, 1, 0, 4};
  vector double vd;
  vd = foo(s);
  printf("%lf %lf\n", vd[0], vd[1]);
  return 0;
}
// poc.c
vector double foo(vector int s) {
  int x1 = s[1];
  int x3 = s[3];
  double d1 = x1;
  double d3 = x3;
  vector double x = { d1, d3 };
  return x;
}
```
Compiled with `poc.c main.c -mcpu=pwr8 -O3` on a BE machine.
Current clang gives
```
4.000000 1.000000
```
while xlc gives
```
1.000000 4.000000
```
Xlc's output is the correct one.

Reviewed By: shchenz, #powerpc

Differential Revision: https://reviews.llvm.org/D107428
2021-08-06 06:01:29 +00:00
Serge Bazanski 7ece20505f [Lanai] fix lowering wide returns
This implements LanaiTargetLowering::CanLowerReturn, thereby ensuring
all return values conform to the RetCC and get sret-demoted as
necessary.

A regression test is also added that exercises this functionality.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D107086
2021-08-05 21:08:09 -07:00
Jinsong Ji 6f84d94b9c [PowerPC] Fix copy/paste error in scalar_to_vector patterns
The refactoring in https://reviews.llvm.org/D100478 added a copy/paste error
in the v8i16 patterns.

Reviewed By: #powerpc, shchenz

Differential Revision: https://reviews.llvm.org/D107609
2021-08-06 02:59:01 +00:00
Jessica Paquette e6a3944ea9 [AArch64][GlobalISel] Overhaul G_INSERT legalization
Similar cleanup to G_EXTRACT (51bd4e874f).

Also swap the order of clamp/widen to avoid unnecessary complex merges.

Add a bunch of missing testcases to legalize-inserts while we're at it.

Differential Revision: https://reviews.llvm.org/D107601
2021-08-05 18:28:22 -07:00
Jessica Paquette 562c8e14d9 [AArch64][GlobalISel] Widen G_IMPLICIT_DEF and G_FREEZE before clamping
Similar to other cleanup commits which widen instructions before clamping
during legalization. The purpose of this is to avoid weird type breakdowns.

In terms of G_IMPLICIT_DEF, this simplifies legalization for other instructions.
The legalizer has to emit G_IMPLICIT_DEF to legalize certain instructions, so
this can help with emitting merges elsewhere.

Differential Revision: https://reviews.llvm.org/D107604
2021-08-05 18:21:14 -07:00
Jessica Paquette 8a557d8311 [AArch64][GlobalISel] Widen extloads before clamping during legalization
Allows us to avoid awkward type breakdowns on types like s88, like the other
commits.

Differential Revision: https://reviews.llvm.org/D107587
2021-08-05 16:14:06 -07:00
Stanislav Mekhanoshin d71924fbfe [AMDGPU] Improve v2i32/v2f32 insertelt patterns
Using REG_SEQUENCE produces better code than INSERT_SUBREG;
we can omit one move instruction in many cases.

Fixes: SWDEV-298028

Differential Revision: https://reviews.llvm.org/D107602
2021-08-05 16:13:39 -07:00
Heejin Ahn 41ba39dfcd [WebAssembly] Don't do SjLj transformation when there's only setjmp
When there is a `setjmp` call in a function, we transform every callsite
of `setjmp` to record its information by calling `saveSetjmp` function,
and we also transform every callsite of a function that can longjmp
to check if a longjmp occurred and, if so, jump to the corresponding
post-setjmp BB. Currently we are doing this for every function that
contains a call to `setjmp`, but if there is no other function call
within that function that can longjmp, this transformation of `setjmp`
callsite and all the preparation of `setjmpTable` in the entry of the
function are not necessary.

This checks if a setjmp-calling function has any other calls that can
longjmp, and if not, skips the function for the SjLj transformation.
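
An illustrative case that is now skipped (hypothetical example):

```
#include <setjmp.h>

// f() calls setjmp but makes no other call that could longjmp, so neither
// the setjmp callsite transformation nor the setjmpTable setup is needed.
int f() {
  jmp_buf Buf;
  return setjmp(Buf) ? 1 : 0;
}
```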

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D107530
2021-08-05 15:28:02 -07:00
David Green 649cf4514d [AArch64] Expand the SVE min/max reduction costs to NEON
This takes the existing SVE costing for the various min/max reduction
intrinsics and expands it to NEON, where I believe it applies equally
well.

In the process it changes the lowering to use min/max cost, as opposed
to summing up the cost of ICmp+Select.

Differential Revision: https://reviews.llvm.org/D106239
2021-08-05 23:23:24 +01:00
Jessica Paquette 36498374d4 [AArch64][GlobalISel] Widen G_BSWAP before clamping
This allows us to avoid odd type breakdowns + allows us to legalize types like
s88 in the first place.

Add some testcases for known legal types + testcases for s4 and s88.

Differential Revision: https://reviews.llvm.org/D107607
2021-08-05 15:16:00 -07:00
Jessica Paquette 51bd4e874f [AArch64][GlobalISel] Overhaul G_EXTRACT legalization
This simplifies our existing G_EXTRACT rules and adds some test coverage. Mostly
changing this because it should make it easier to improve legalization for
instructions which use G_EXTRACT as part of the legalization process.

This also adds support for legalizing some weird types. Similar to other recent
legalizer changes, this changes the order of widening/clamping.

There was some dead code in our existing rules (e.g. the p0 case would never get
hit), so this knocks those out and makes the types we want to handle explicit.

This also removes some checks which, nowadays, are handled by the
MachineVerifier.

Differential Revision: https://reviews.llvm.org/D107505
2021-08-05 13:55:15 -07:00
Jon Roelofs 98f38c151b [AArch64][GlobalISel] Legalize ctpop s128
This is re-landing the same patch again, but without the changes to
LegalizerHelper that regressed the Mips test:

test/CodeGen/Mips/GlobalISel/llvm-ir/ctpop.ll

Differential revision: https://reviews.llvm.org/D106494
2021-08-05 11:54:53 -07:00
Roman Lebedev c0586ff05d [NFC][X86] combineX86ShuffleChain(): hoist Mask variable higher up
Having `NewMask` outside of an if and rebinding `BaseMask` `ArrayRef`
to it is confusing. Instead, just move the `Mask` vector higher up,
and change the code that earlier had no access to it but now does
to use `Mask` instead of `BaseMask`.

This has no other intentional changes.

This is a recommit of 35c0848b57,
which was reverted to simplify reversion of an earlier change.
2021-08-05 20:37:51 +03:00
Benjamin Kramer bd17ced1db Revert "[X86] combineX86ShuffleChain(): canonicalize mask elts picking from splats"
This reverts commits f819e4c7d0 and
35c0848b57. It triggers an infinite loop during
compilation.

$ cat t.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define void @MaxPoolGradGrad_1.65() local_unnamed_addr #0 {
entry:
  %wide.vec78 = load <64 x i32>, <64 x i32>* null, align 16
  %strided.vec83 = shufflevector <64 x i32> %wide.vec78, <64 x i32> poison, <8 x i32> <i32 4, i32 12, i32 20, i32 28, i32 36, i32 44, i32 52, i32 60>
  %0 = lshr <8 x i32> %strided.vec83, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
  %1 = add <8 x i32> zeroinitializer, %0
  %2 = shufflevector <8 x i32> %1, <8 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
  %3 = shufflevector <16 x i32> %2, <16 x i32> undef, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  %interleaved.vec = shufflevector <32 x i32> undef, <32 x i32> %3, <64 x i32> <i32 0, i32 8, i32 16, i32 24, i32 32, i32 40, i32 48, i32 56, i32 1, i32 9, i32 17, i32 25, i32 33, i32 41, i32 49, i32 57, i32 2, i32 10, i32 18, i32 26, i32 34, i32 42, i32 50, i32 58, i32 3, i32 11, i32 19, i32 27, i32 35, i32 43, i32 51, i32 59, i32 4, i32 12, i32 20, i32 28, i32 36, i32 44, i32 52, i32 60, i32 5, i32 13, i32 21, i32 29, i32 37, i32 45, i32 53, i32 61, i32 6, i32 14, i32 22, i32 30, i32 38, i32 46, i32 54, i32 62, i32 7, i32 15, i32 23, i32 31, i32 39, i32 47, i32 55, i32 63>
  store <64 x i32> %interleaved.vec, <64 x i32>* undef, align 16
  unreachable
}

$ llc < t.ll -mcpu=skylake
<hang>
2021-08-05 18:58:08 +02:00
Jessica Paquette f3f3098afe [AArch64][GlobalISel] Mark v16s8 <- v8s8, v8s8 G_CONCAT_VECTOR as legal
G_CONCAT_VECTORS shows up from time to time when legalizing other instructions.

We actually import patterns for the v16s8 <- v8s8, v8s8 case so marking it
as legal gives us selection for free.

Differential Revision: https://reviews.llvm.org/D107512
2021-08-05 09:40:46 -07:00
Jay Foad 2b63933115 [AMDGPU][SDag] Better lowering for 32-bit ctlz/cttz
Differential Revision: https://reviews.llvm.org/D107566
2021-08-05 15:57:40 +01:00
Jay Foad e6c364a624 [AMDGPU][SDag] Better lowering for 64-bit ctlz/cttz
Differential Revision: https://reviews.llvm.org/D107546
2021-08-05 15:57:40 +01:00
Simon Pilgrim 2cbf9fd402 [DAG] DAGCombiner::visitVECTOR_SHUFFLE - recognise INSERT_SUBVECTOR patterns
IR typically creates INSERT_SUBVECTOR patterns as a widening of the subvector with undefs to pad to the destination size, followed by a shuffle for the actual insertion - SelectionDAGBuilder has to do something similar for shuffles when source/destination vectors are different sizes.

This combine attempts to recognize these patterns by looking for a shuffle of a subvector (from a CONCAT_VECTORS) that starts at a modulo of its size into an otherwise identity shuffle of the base vector.

This uncovered a couple of target-specific issues as we haven't often created INSERT_SUBVECTOR nodes in generic code - aarch64 could only handle insertions into the bottom of undefs (i.e. a vector widening), and x86-avx512 vXi1 insertion wasn't keeping track of undef elements in the base vector.

Fixes PR50053

Differential Revision: https://reviews.llvm.org/D107068
2021-08-05 15:40:48 +01:00
Simon Pilgrim e78bf49a58 [X86] Rename Subtarget Tuning Feature Flag Prefix. NFC.
As suggested on D107370, this patch renames the tuning feature flags to start with 'Tuning' instead of 'Feature'.

Differential Revision: https://reviews.llvm.org/D107459
2021-08-05 13:09:23 +01:00
Jay Foad e790b2b744 [AMDGPU] Make more use of getHiHalf64 and split64BitValue. NFCI. 2021-08-05 09:36:13 +01:00
Igor Kudrin 2c14798ead [ARM][llvm-objdump] Annotate PC-relative memory operands of VLDR instructions
This extends D105979 and adds support for VLDR instructions.

Differential Revision: https://reviews.llvm.org/D105980
2021-08-05 14:11:11 +07:00
Igor Kudrin ddbe812bcc [ARM][llvm-objdump] Annotate PC-relative memory operands
This implements `MCInstrAnalysis::evaluateMemoryOperandAddress()` for
Arm so that the disassembler can print the target address of memory
operands that use PC+immediate addressing.

Differential Revision: https://reviews.llvm.org/D105979
2021-08-05 14:11:11 +07:00