60269 Commits

Author SHA1 Message Date
Jessica Paquette d0ba6c4002 [AArch64][GlobalISel] Select CSINC and CSINV for G_SELECT with constants
Select the following:

- G_SELECT cc, 0, 1 -> CSINC zreg, zreg, cc
- G_SELECT cc, 0, -1 -> CSINV zreg, zreg, cc
- G_SELECT cc, 1, f -> CSINC f, zreg, inv_cc
- G_SELECT cc, -1, f -> CSINV f, zreg, inv_cc
- G_SELECT cc, t, 1 -> CSINC t, zreg, cc
- G_SELECT cc, t, -1 -> CSINV t, zreg, cc

(IR example: https://godbolt.org/z/YfPna9)
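
The folds listed above can be sanity-checked with a small model. This is only a sketch: it assumes the usual A64 semantics (CSINC yields Rn when the condition holds and Rm + 1 otherwise, CSINV yields Rn or NOT Rm), models 32-bit registers, and uses helper names (`csinc`, `csinv`, `g_select`, `folds_agree`) that are illustrative rather than taken from the patch. Note that selecting -1 on the false side needs CSINV, since CSINC with the zero register can only produce 1.

```python
MASK = 0xFFFFFFFF  # model 32-bit W registers
ZREG = 0           # the zero register always reads as zero


def csinc(rn, rm, cond):
    """CSINC Rd, Rn, Rm, cond: Rd = Rn if cond holds, else Rm + 1."""
    return (rn if cond else rm + 1) & MASK


def csinv(rn, rm, cond):
    """CSINV Rd, Rn, Rm, cond: Rd = Rn if cond holds, else NOT Rm."""
    return (rn if cond else ~rm) & MASK


def g_select(cc, t, f):
    """G_SELECT cc, t, f: t if cc holds, else f."""
    return (t if cc else f) & MASK


def folds_agree(value):
    """Compare each fold with plain G_SELECT for both condition values."""
    for cc in (False, True):
        inv_cc = not cc
        checks = [
            (g_select(cc, 0, 1), csinc(ZREG, ZREG, cc)),
            (g_select(cc, 0, -1), csinv(ZREG, ZREG, cc)),
            (g_select(cc, 1, value), csinc(value, ZREG, inv_cc)),
            (g_select(cc, -1, value), csinv(value, ZREG, inv_cc)),
            (g_select(cc, value, 1), csinc(value, ZREG, cc)),
            (g_select(cc, value, -1), csinv(value, ZREG, cc)),
        ]
        if any(expected != actual for expected, actual in checks):
            return False
    return True
```

Calling `folds_agree(0x1234)` exercises all six rewrites with an arbitrary register value standing in for `t` or `f`.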

These correspond to a bunch of the AArch64csel patterns in AArch64InstrInfo.td.

Unfortunately, it doesn't seem like we can import patterns that use NZCV like
those ones do. E.g.

```
def : Pat<(AArch64csel GPR32:$tval, (i32 1), (i32 imm:$cc), NZCV),
          (CSINCWr GPR32:$tval, WZR, (i32 imm:$cc))>;
```

So we have to manually select these for now.

This replaces `selectSelectOpc` with an `emitSelect` function, which performs
these optimizations.

Differential Revision: https://reviews.llvm.org/D90701
2020-11-12 14:44:01 -08:00
Kazushi (Jam) Marukawa 410626c9b5 [VE] Support vld intrinsics
Add intrinsics for vector load instructions.  Add a regression test also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91332
2020-11-13 07:34:42 +09:00
Stanislav Mekhanoshin cf6565f6d0 [AMDGPU] Enable multi-dword flat scratch load/stores
Differential Revision: https://reviews.llvm.org/D91384
2020-11-12 13:38:56 -08:00
Jay Foad 6881a82e8c [AMDGPU] Fix scheduling of exp pos4
Also fix a similar issue in SIInsertWaitcnts, but I don't think that fix
has any effect in practice.

Differential Revision: https://reviews.llvm.org/D91290
2020-11-12 19:57:14 +00:00
Jay Foad d7d6ac5624 [AMDGPU] Define and use names for export targets. NFC.
Differential Revision: https://reviews.llvm.org/D91289
2020-11-12 19:57:14 +00:00
Craig Topper 4cdf1d2110 [MSP430] Remove unused MVT::Glue output from MSP430ISD::SELECT_CC nodes.
Follow up from a similar patch on RISCV 637f19c36b

Nothing reads this Glue value that I could see. The SDNode def in
the td file does not have the SDNPOutGlue flag so I don't think
this glue would get properly propagated to MachineSDNodes if it
was used.
2020-11-12 10:34:01 -08:00
Craig Topper 0add5f9122 [RISCV] Don't include CodeGen layer files in MC layer
-Use MCRegister instead of Register in MC layer.
-Move some enums from RISCVInstrInfo.h to RISCVBaseInfo.h to be with other TSFlags bits.

Differential Revision: https://reviews.llvm.org/D91114
2020-11-12 07:45:38 -08:00
Craig Topper 9ca02d6fe1 [RISCV] Add an ANDI to shift amount of FSL/FSR instructions
The fshl and fshr intrinsics are defined to modulo their shift amount by the bitwidth of one of their inputs. The FSR/FSL instructions read one extra bit from the shift amount. If that bit is set the inputs are swapped. In order to preserve the semantics of the llvm intrinsics we need to make sure that the extra bit isn't set. DAG combine or instcombine may have removed any mask that was originally present.

We could be smarter here and try to use computeKnownBits to check if the bit is known zero, but wanted to start with correctness.
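
To illustrate why the mask matters, here is a rough model of the two behaviours. It assumes XLEN = 32 and the draft bitmanip behaviour described above (the instruction reads one extra shift-amount bit and swaps its inputs when that bit is set); the function names and the argument order are illustrative, not the real operand encoding.

```python
XLEN = 32
MASK = (1 << XLEN) - 1


def fshl(a, b, s):
    """llvm.fshl: shift the concatenation a:b left by s mod XLEN, keep the high word."""
    s %= XLEN
    return a if s == 0 else ((a << s) | (b >> (XLEN - s))) & MASK


def fsl(a, b, shamt):
    """Sketch of FSL: reads log2(XLEN) + 1 bits of shamt; the extra bit swaps inputs."""
    shamt &= 2 * XLEN - 1
    if shamt >= XLEN:
        shamt -= XLEN
        a, b = b, a
    return a if shamt == 0 else ((a << shamt) | (b >> (XLEN - shamt))) & MASK


def fshl_lowered(a, b, s):
    """Clear the extra bit (the added ANDI) before handing the amount to FSL."""
    return fsl(a, b, s & (XLEN - 1))
```

With a shift amount of 36, `fsl` alone swaps its inputs and disagrees with `fshl`, while `fshl_lowered` matches the intrinsic for every amount.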

Differential Revision: https://reviews.llvm.org/D90905
2020-11-12 07:33:40 -08:00
David Green 11dee2eae2 [ARM] Ensure CountReg definition dominates InsertPt when creating t2DoLoopStartTP
Of course there was something missing, in this case a check that the def
of the count register we are adding to a t2DoLoopStartTP would dominate
the insertion point.

In the future, when we remove some of these COPY's in between, the
t2DoLoopStartTP will always become the last instruction in the block,
preventing this from happening. In the meantime we need to check they
are created in a sensible order.

Differential Revision: https://reviews.llvm.org/D91287
2020-11-12 13:47:46 +00:00
Kazushi (Jam) Marukawa a72d384249 [VE] Change the default type of v64 register class
Change the default type of v64 register class from v512i32 to v256f64.
Add a regression test also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91301
2020-11-12 19:07:07 +09:00
David Sherwood 3225fcf11e [SVE] Deal with SVE tuple call arguments correctly when running out of registers
When passing SVE types as arguments to function calls we can run
out of hardware SVE registers. This is normally fine, since we
switch to an indirect mode where we pass a pointer to a SVE stack
object in a GPR. However, if we switch over part-way through
processing a SVE tuple then part of it will be in registers and
the other part will be on the stack.

I've fixed this by ensuring that:

1. When we don't have enough registers to allocate the whole block
   we mark any remaining SVE registers temporarily as allocated.
2. We temporarily remove the InConsecutiveRegs flags from the last
   tuple part argument and reinvoke the autogenerated calling
   convention handler. Doing this prevents the code from entering
   an infinite recursion and, in combination with 1), ensures we
   switch over to the Indirect mode.
3. After allocating a GPR register for the pointer to the tuple we
   then deallocate any SVE registers we marked as allocated in 1).
   We also set the InConsecutiveRegs flags back how they were before.
4. I've changed the AArch64ISelLowering LowerCALL and
   LowerFormalArguments functions to detect the start of a tuple,
   which involves allocating a single stack object and doing the
   correct numbers of legal loads and stores.

Differential Revision: https://reviews.llvm.org/D90219
2020-11-12 08:41:50 +00:00
Amara Emerson ad376657c1 [AArch64][GlobalISel] Optimize G_PTR_ADD with a negated offset to be a G_SUB. 2020-11-11 22:46:53 -08:00
Baptiste Saleil 37c4ac8545 [PowerPC] Accumulator/Unprimed Accumulator register copy, spill and restore
This patch adds support for accumulator/unprimed accumulator
register copy, spill and restore for MMA.

Authored By: Baptiste Saleil

Reviewed By: #powerpc, bsaleil, amyk

Differential Revision: https://reviews.llvm.org/D90616
2020-11-11 16:23:45 -06:00
Jessica Paquette 7a70a2f04d [AArch64][GlobalISel] Mark G_FCONSTANT as legal when there is full fp16 support
When there is full fp16 support, there is no reason to widen 16-bit
G_FCONSTANTs to 32 bits. Mark them as legal in this case.

Also, we currently import a pattern for materializing a 16-bit 0.0.
Add a testcase showing we select it.

(All other 16-bit G_FCONSTANTS are not yet selected.)

Differential Revision: https://reviews.llvm.org/D89164
2020-11-11 13:25:11 -08:00
Craig Topper 637f19c36b [RISCV] Remove traces of Glue from RISCVISD::SELECT_CC
We were creating RISCVISD::SELECT_CC nodes with Glue output that was never being used, and the tablegen SDNode had the SDNPInGlue flag instead of the SDNPOutGlue flag.

Since we don't seem to need the Glue just get rid of it from both places.

Differential Revision: https://reviews.llvm.org/D91199
2020-11-11 09:30:48 -08:00
Jessica Paquette c42053f79b [AArch64][GlobalISel] Select arith extended add/sub in manual selection code
The manual selection code for add/sub was not checking if it was possible to
fold in shifts + extends (the *rx opcode variants).

As a result, we could never select things like

```
cmp x1, w0, uxtw #2
```

Because we don't import any patterns for compares.

This adds support for the arithmetic shifted register forms and updates tests
for instructions selected using `emitADD`, `emitADDS`, and `emitSUBS`.

This is a 0.1% geomean code size improvement on SPECINT2000 at -Os.

Differential Revision: https://reviews.llvm.org/D91207
2020-11-11 09:26:03 -08:00
Jessica Paquette f0580c73bb [AArch64][GlobalISel] Select negative arithmetic immediates in manual selector
Previously, we only handled negative arithmetic immediates in the imported
selector code.

Since we don't import code for, say, compares, we were missing opportunities
for things like

```
%cst:gpr(s64) = G_CONSTANT i64 -10
%cmp:gpr(s32) = G_ICMP intpred(eq), %reg0(s64), %cst
->
%adds = ADDSXri %reg0, 10, 0, implicit-def $nzcv
%cmp = CSINCWr $wzr, $wzr, 1, implicit $nzcv
```

Instead, we would have to materialize the constant and emit a SUBS.

This adds support for selection like above for SUB, SUBS, ADD, and ADDS.
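
As a rough sketch of the idea (not the selector's actual code), the choice can be modelled as below. It assumes the A64 arithmetic immediate form (12 unsigned bits, optionally shifted left by 12) and only models the Z flag of an equality compare; `fits_arith_imm`, `select_compare` and `z_flag` are hypothetical helper names.

```python
BITS = 64
MASK = (1 << BITS) - 1


def fits_arith_imm(value):
    """A64 arithmetic immediate: unsigned 12 bits, optionally shifted left by 12."""
    if 0 <= value < (1 << 12):
        return True
    return 0 <= value < (1 << 24) and (value & 0xFFF) == 0


def select_compare(constant):
    """Pick an opcode for an equality compare of a register with a constant."""
    if fits_arith_imm(constant):
        return ("SUBS", constant)
    if fits_arith_imm(-constant):
        return ("ADDS", -constant)  # x - c == x + (-c) modulo 2**64
    return None  # constant has to be materialized into a register


def z_flag(opcode, x, imm):
    """Z flag produced by SUBS/ADDS of x with an immediate."""
    result = (x - imm if opcode == "SUBS" else x + imm) & MASK
    return result == 0
```

For the constant -10 this picks `("ADDS", 10)`, and the Z flag is set exactly when the register holds -10.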

This is a 0.1% geomean code size improvement on SPECINT2000 at -Os.

Differential Revision: https://reviews.llvm.org/D91108
2020-11-11 09:20:05 -08:00
Jay Foad f23c4c6f8a [AMDGPU] Separate out real exp instructions by subtarget. NFC.
Differential Revision: https://reviews.llvm.org/D91247
2020-11-11 17:13:40 +00:00
Jay Foad 2b33ea6935 [AMDGPU] Split exp instructions out into their own tablegen file. NFC.
Differential Revision: https://reviews.llvm.org/D91246
2020-11-11 17:13:40 +00:00
Jay Foad f94fd1c8ca [AMDGPU] Make use of SIInstrInfo::isEXP. NFC. 2020-11-11 17:01:20 +00:00
Jay Foad 830ed64ccd Revert "Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access""
This reverts commit 8b08fa0103.

The underlying problems were fixed by D90607.
2020-11-11 14:40:14 +00:00
Caroline Concatto 37f4ccb275 [AArch64]Add memory op cost model for SVE
This patch adds/fixes memory op cost model for SVE with fixed-width
vector.

Differential Revision: https://reviews.llvm.org/D90950
2020-11-11 12:49:19 +00:00
Simon Pilgrim 1a62ca65c1 [KnownBits] Add KnownBits::commonBits helper. NFCI.
We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them.
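
A toy model of the gotcha, assuming KnownBits is a pair of masks (bits known zero, bits known one): the & operator describes the result of an AND operation on the two values, which is not the same as the facts shared by both inputs. The Python class and the `common_bits` function are illustrative stand-ins for the C++ API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownBits:
    zero: int  # bits known to be 0
    one: int   # bits known to be 1

    def __and__(self, other):
        """Known bits of the value (lhs & rhs), i.e. of an AND operation."""
        return KnownBits(self.zero | other.zero, self.one & other.one)


def common_bits(lhs, rhs):
    """Bits known to have the same value in both inputs."""
    return KnownBits(lhs.zero & rhs.zero, lhs.one & rhs.one)


a = KnownBits(zero=0b1100, one=0b0001)
b = KnownBits(zero=0b0100, one=0b0011)
merged_wrong = a & b             # claims bit 3 is known zero, which b never said
merged_right = common_bits(a, b)
```

Here `a & b` reports bit 3 as known zero, although only `a` knows that; `common_bits` keeps just the facts both sides agree on.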
2020-11-11 12:15:54 +00:00
Kerry McLaughlin 170947a5de [SVE][CodeGen] Lower scalable masked scatters
Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only)

Changes included in this patch:
 - Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use.
    Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts.
 - Added the getCanonicalIndexType function to convert redundant addressing
   modes (e.g. scaling is redundant when accessing bytes)
 - Tests with 32 & 64-bit scaled & unscaled offsets

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90941
2020-11-11 11:50:22 +00:00
Kerry McLaughlin ffbbfc76ca [SVE][CodeGen] Add the isTruncatingStore flag to MSCATTER
This patch adds the IsTruncatingStore flag to MaskedScatterSDNode, set by getMaskedScatter().
Updated SelectionDAGDumper::print_details for MaskedScatterSDNode to print
the details of masked scatters (is truncating, signed or scaled).

This is the first in a series of patches which adds support for scalable masked scatters

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90939
2020-11-11 10:58:24 +00:00
Sam Parker 898a81dfc5 [NFC][ARM] Replace lambda with any_of 2020-11-11 10:02:55 +00:00
Amara Emerson 2262393090 [AArch64][GlobalISel] Port some AArch64 target specific MUL combines from SDAG.
These do things like turn a multiply of a pow-2+1 into a shift and an add,
which is a common pattern that pops up, and is universally better than expensive
madd instructions with a constant.
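
As a small illustration of the pow-2+1 case only (the ported combines cover more than this), the rewrite relies on x * (2^n + 1) = (x << n) + x. The helper names below are made up for the sketch.

```python
def split_pow2_plus_one(c):
    """Return n if c == 2**n + 1 for some n >= 1, else None."""
    m = c - 1
    if m >= 2 and (m & (m - 1)) == 0:
        return m.bit_length() - 1
    return None


def mul_by_const(x, c, bits=64):
    """Multiply modulo 2**bits, using shift + add when c is a power of two plus one."""
    mask = (1 << bits) - 1
    n = split_pow2_plus_one(c)
    if n is None:
        return (x * c) & mask      # fall back to a real multiply
    return ((x << n) + x) & mask   # one shift and one add
```

For example, a multiply by 9 becomes `(x << 3) + x`.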

I've added check lines to an existing codegen test since the code being ported
is almost identical, however the mul by negative pow2 constant tests don't generate
the same code because we're missing some generic G_MUL combines still.

Differential Revision: https://reviews.llvm.org/D91125
2020-11-10 22:21:13 -08:00
Gaurav Jain 3726b14428 [NFC] Use [MC]Register for x86 target
Differential Revision: https://reviews.llvm.org/D91161
2020-11-10 15:49:39 -08:00
Kazushi (Jam) Marukawa dd6f607ea8 [VE] Implement FoldImmediate
Implement FoldImmediate for only integer arithmetic operations.
Add regression tests also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91150
2020-11-11 08:08:32 +09:00
Pirama Arumuga Nainar 8262e94a6d [ARM] Fix PR 47980: Use constrainRegClass during foldImmediate opt.
Previously we used setRegClass to rgpr, which may expand the register
domain if the result was already in a constrained class (tcgpr in the
above PR).

Differential Revision: https://reviews.llvm.org/D91192
2020-11-10 13:38:11 -08:00
Stanislav Mekhanoshin 544ef42e40 [AMDGPU] Set default op_sel_hi on accvgpr read/write
These are opsel opcodes with op_sel actually being ignored.
As such, op_sel_hi needs to be set to its default of 1 even though
these bits are ignored. This is a compatibility change.

Differential Revision: https://reviews.llvm.org/D91202
2020-11-10 13:07:29 -08:00
Benjamin Kramer 92c61a045f [ARM] Silence unused variable warning in Release builds. NFC. 2020-11-10 20:35:28 +01:00
Craig Topper 70b481e8db [RISCV] Add missing copyright header to RISCVBaseInfo.cpp. NFC 2020-11-10 11:33:08 -08:00
David Green 08d1c2d470 [ARM] Introduce t2DoLoopStartTP
This introduces a new pseudo instruction, almost identical to a
t2DoLoopStart but taking 2 parameters - the original loop iteration
count needed for a low overhead loop, plus the VCTP element count needed
for a DLSTP instruction setting up a tail predicated loop. The idea is
that the instruction holds both values and the backend
ARMLowOverheadLoops pass can pick between the two, depending on whether
it creates a tail predicated loop or falls back to a low overhead loop.

To do that there needs to be something that converts a t2DoLoopStart to
a t2DoLoopStartTP, for which this patch repurposes the
MVEVPTOptimisationsPass as a "tail predication and vpt optimisation"
pass. The extra operand for the t2DoLoopStartTP is chosen based on the
operands of VCTP's in the loop, and the instruction is moved as late in
the block as possible to attempt to increase the likelihood of making
tail predicated loops.

Differential Revision: https://reviews.llvm.org/D90591
2020-11-10 18:08:12 +00:00
Jay Foad bb8d1437a6 [AMDGPU] Simplify multiclass EXP_m. NFC. 2020-11-10 17:28:36 +00:00
David Green dbe1bf63aa [ARM] Cleanup for ARMLowOverheadLoops. NFC 2020-11-10 17:28:07 +00:00
David Green c7e275388e [ARM] Don't aggressively unroll vector remainder loops
We already do not unroll loops with vector instructions under MVE, but
that does not include the remainder loops that the vectorizer produces.
These remainder loops will be rarely executed and are not worth
unrolling, as the trip count is likely to be low if they get executed at
all. Luckily they get llvm.loop.isvectorized to make recognizing them
simpler.

We have wanted to do this for a while but hit issues with low overhead
loops being reverted due to difficult register allocation. With recent
changes that seems to be less of an issue now.

Differential Revision: https://reviews.llvm.org/D90055
2020-11-10 17:01:31 +00:00
David Green 73a6cd4b6b [ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR
This hints the operand of a t2DoLoopStart towards using LR, which can
help make it more likely to become t2DLS lr, lr. This makes it easier to
move if needed (as the input is the same as the output), or potentially
remove entirely.

The hint is added after others (from COPY's etc) which still take
precedence. It needed to find a place to add the hint, which currently
uses the post isel custom inserter.

Differential Revision: https://reviews.llvm.org/D89883
2020-11-10 16:28:57 +00:00
David Green b2ac9681a7 [ARM] Alter t2DoLoopStart to define lr
This changes the definition of t2DoLoopStart from
t2DoLoopStart rGPR
to
GPRlr = t2DoLoopStart rGPR

This will hopefully mean that low overhead loops are more tied together,
and we can more reliably generate loops without reverting or being at
the whims of the register allocator.

This is a fairly simple change in itself, but leads to a number of other
required alterations.

 - The hardware loop pass, if UsePhi is set, now generates loops of the
   form:
       %start = llvm.start.loop.iterations(%N)
     loop:
       %p = phi [%start], [%dec]
       %dec = llvm.loop.decrement.reg(%p, 1)
       %c = icmp ne %dec, 0
       br %c, loop, exit
 - For this a new llvm.start.loop.iterations intrinsic was added, identical
   to llvm.set.loop.iterations but produces a value as seen above, gluing
   the loop together more through def-use chains.
 - This new intrinsic conceptually produces the same output as input,
   which is taught to SCEV so that the checks in MVETailPredication are not
   affected.
 - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has
   been left mostly as before. We should now more reliably be able to tell
   that the t2DoLoopStart is correct without having to prove it, but
   t2WhileLoopStart and tail-predicated loops will remain the same.
 - And all the tests have been updated. There are a lot of them!

This patch on its own might cause more trouble than it helps, with more
tail-predicated loops being reverted, but some additional patches can
hopefully improve upon that to get to something that is better overall.

Differential Revision: https://reviews.llvm.org/D89881
2020-11-10 15:57:58 +00:00
Kazushi (Jam) Marukawa 543b30db06 [VE][NFC] Change cast to dyn_cast
We used cast where we should have used dyn_cast, so change it now.
The old code causes problems if I implement the brind instruction and
compile openmp with the new compiler.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91151
2020-11-10 21:49:16 +09:00
Pablo Barrio 642b21beba [AArch64] Enable RAS 1.1 system registers in all AArch64
Some use cases (e.g. kernel devs) have strict requirements to only enable
features available with -march=armv8-a, e.g. no armv8.1-a. Enabling RAS 1.1 in
all AArch64 means they can consider supporting it.

Bear in mind that the first versions of the Armv8 architecture still do not
support RAS 1.1. This patch only lets devs write code with the user-friendly
register mnemonic instead of the ugly generic S<op0>_<op1>_<Cn>_<Cm>_<op2>.
They still need to place runtime checks to make sure that the CPU to run on
supports RAS 1.1.

Differential Revision: https://reviews.llvm.org/D90594
2020-11-10 12:13:33 +00:00
Kazushi (Jam) Marukawa c84b2c49be [VE] Support inline assembly with vector registers
Support inline assembly with vector registers.  Add a regression test also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91146
2020-11-10 20:55:38 +09:00
Mirko Brkusanin a75d6178b8 [GlobalISel] Add combine for (x | mask) -> x when (x | mask) == x
If we have a mask, and a value x, where (x | mask) == x, we can drop the OR
and just use x.
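
A minimal sketch of the condition, assuming known-bits information is available as plain masks (the function name is hypothetical): the OR is redundant when every bit the mask might set is already known to be one in x.

```python
def or_mask_is_redundant(x_known_one, mask_maybe_one):
    """True if every bit the mask might set is already known to be 1 in x."""
    return (mask_maybe_one & ~x_known_one) == 0


# Example: x = y | 0xF0, so bits 4..7 of x are known to be one.
X_KNOWN_ONE = 0xF0
redundant = or_mask_is_redundant(X_KNOWN_ONE, 0x30)      # mask only touches known ones
not_redundant = or_mask_is_redundant(X_KNOWN_ONE, 0x31)  # bit 0 of x is unknown
```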

Differential Revision: https://reviews.llvm.org/D90952
2020-11-10 11:32:13 +01:00
Mirko Brkusanin fb36ab0a42 [GlobalISel] Expand combine for (x & mask) -> x when (x & mask) == x
We can use KnownBitsAnalysis to cover cases when mask is not trivial. It can
also help with cases when mask is not constant but can still be folded into
one. Since 'and' is commutative we should treat both operands as possible
replacements.
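
One way to picture the check, as a sketch with known bits modelled as `(zero, one)` mask pairs and a hypothetical `and_replacement` helper: the AND can be replaced by one operand when each bit is either known zero in that operand or known one in the other, and the test is tried in both directions.

```python
def and_replacement(lhs_known, rhs_known, width=8):
    """Return 'lhs', 'rhs' or None for (lhs & rhs); *_known are (zero, one) masks."""
    full = (1 << width) - 1
    lhs_zero, lhs_one = lhs_known
    rhs_zero, rhs_one = rhs_known
    # (lhs & rhs) == lhs when every bit is known 0 in lhs or known 1 in rhs.
    if ((lhs_zero | rhs_one) & full) == full:
        return "lhs"
    # Same test with the operands exchanged, since 'and' is commutative.
    if ((rhs_zero | lhs_one) & full) == full:
        return "rhs"
    return None


zext_nibble = (0xF0, 0x00)  # high four bits known zero, low bits unknown
const_0f = (0xF0, 0x0F)     # the constant 0x0F: every bit known
```

With these inputs, `and_replacement(zext_nibble, const_0f)` picks the left operand and the swapped call picks the right one.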

Differential Revision: https://reviews.llvm.org/D90674
2020-11-10 11:32:13 +01:00
Kazushi (Jam) Marukawa b65ef65b22 [VE] Support inline assembly
Support inline assembly with scalar registers.  Add a regression test also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91119
2020-11-10 18:56:22 +09:00
Jay Foad 0ad4d04002 [AMDGPU] Remove an unused return value. NFC.
Differential Revision: https://reviews.llvm.org/D91063
2020-11-10 09:15:14 +00:00
Esme-Yi 6e0ad5bc8c [PowerPC] Add an ISEL pattern for Mul with Imm.
Summary: This patch tries to do the following transformation if the multiplier doesn't fit int16:
			(mul X, c1 << c2) -> (rldicr (mulli X, c1) c2)
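
A sketch of the arithmetic behind the pattern, under the assumption that rldicr with a shift of c2 and a mask end of 63 - c2 acts as a plain left shift; `split_multiplier` and `mul_lowered` are illustrative names, not code from the patch.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def split_multiplier(c):
    """Split c into (c1, c2) with c == c1 << c2 and c1 fitting int16, else None."""
    if c == 0 or INT16_MIN <= c <= INT16_MAX:
        return None  # a plain mulli already handles this constant
    c1, c2 = c, 0
    while c1 % 2 == 0:
        c1 //= 2
        c2 += 1
    if INT16_MIN <= c1 <= INT16_MAX:
        return c1, c2
    return None


def mul_lowered(x, c, bits=64):
    """Compute x * c modulo 2**bits as a multiply by c1 followed by a left shift."""
    mask = (1 << bits) - 1
    split = split_multiplier(c)
    if split is None:
        return (x * c) & mask
    c1, c2 = split
    mulli = (x * c1) & mask      # mulli X, c1
    return (mulli << c2) & mask  # rldicr ..., c2 (left shift by c2)
```

For instance, 0x30000 splits into `(3, 16)`, so the multiply becomes a mulli by 3 followed by a shift of 16.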

Reviewed By: jsji, steven.zhang

Differential Revision: https://reviews.llvm.org/D87384
2020-11-10 06:52:39 +00:00
Carl Ritson fde8351743 [AMDGPU] Fix lowering of S_MOV_{B32,B64}_term
If the source of S_MOV_{B32,B64}_term is an immediate then it
cannot be lowered to a COPY.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D90451
2020-11-10 12:16:31 +09:00
Eric Astor d657f7cd30 [ms] [llvm-ml] Support MASM's relational operators (EQ, LT, etc.)
Support the named relational operators (EQ, LT, etc.).

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89733
2020-11-09 14:01:36 -05:00
Francesco Petrogalli 9f61931e07 [llvm][AArch64] Allow TB(N)Z to drop signext for sign bit tests.
For example, if the sign extension is only used for TBZ, and the value is used elsewhere with a zero extension, this can eliminate a sign extension.
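
The underlying observation can be checked with a few lines: the sign bit of a sign-extended value equals the top bit of the narrow value, so the test can read the narrow bit directly. This is a sketch with made-up helper names, using an 8-bit to 32-bit extension as the example.

```python
def sext(value, from_bits, to_bits):
    """Sign-extend the low from_bits of value to to_bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def bit_is_set(value, bit):
    """What a TBZ/TBNZ-style single-bit test looks at."""
    return (value >> bit) & 1


# Testing bit 31 of the extended value is the same as testing bit 7 of the original.
same_answer = all(
    bit_is_set(sext(v, 8, 32), 31) == bit_is_set(v, 7) for v in range(256)
)
```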

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D90606
2020-11-09 18:27:48 +00:00
David Green c8cd7e2bbf [ARM] Remove MI variable aliasing. NFC
This was accidentally using the same name for two different variables in
the same line. Whilst it seems to work for some compilers, others have
trouble and it is probably not a fantastic idea.
2020-11-09 18:18:43 +00:00
Craig Topper 5d3fd3df94 [RISCV] Make ctlz/cttz cheap to speculatively execute so CodeGenPrepare won't insert a zero check.
Add additional isel patterns for ctzw/clzw instructions.

Differential Revision: https://reviews.llvm.org/D91040
2020-11-09 10:13:45 -08:00
Craig Topper a59076006b [RISCV] Add isel patterns for using PACK for zext.h and zext.w.
Differential Revision: https://reviews.llvm.org/D91024
2020-11-09 10:13:45 -08:00
Craig Topper 4265cbaa34 [RISCV] Make SIGN_EXTEND_INREG from i8/i16 legal when Zbb extension is enabled.
This produces better code for sign extend to i64 on RV32 target.

Differential Revision: https://reviews.llvm.org/D91023
2020-11-09 10:13:45 -08:00
Craig Topper c0dd22e44a [RISCV] Add isel patterns to match sbset/sbclr/sbinv/sbext even if the shift amount isn't masked.
This uses the shiftop PatFrags to handle the masked shift amount
and unmasked shift amount cases. That also checks XLen as part
of the masked amount check so we don't need separate RV32 and RV64
patterns.

Differential Revision: https://reviews.llvm.org/D91016
2020-11-09 09:55:26 -08:00
Mircea Trofin 2ac3a7d0c4 [NFC] Use [MC]Register
Differential Revision: https://reviews.llvm.org/D90795
2020-11-09 08:37:14 -08:00
jasonliu 42d2109380 [XCOFF] Enable explicit sections on AIX
Implement mechanism to allow explicit sections to be generated on AIX.

Reviewed By: DiggerLin

Differential Revision: https://reviews.llvm.org/D88615
2020-11-09 16:27:38 +00:00
Stanislav Mekhanoshin d5a465866e [AMDGPU] Omit buffer resource with flat scratch.
Differential Revision: https://reviews.llvm.org/D90979
2020-11-09 08:05:20 -08:00
Paul C. Anagnostopoulos 91d2e5c81a [TableGen] Add the !filter bang operator.
Add a test. Update the Programmer's Reference.

Use it in some TableGen files.

Differential Revision: https://reviews.llvm.org/D91008
2020-11-09 10:56:55 -05:00
Sebastian Neubauer a022b1ccd8 [AMDGPU] Add amdgpu_gfx calling convention
Add a calling convention called amdgpu_gfx for real function calls
within graphics shaders. For the moment, this uses the same calling
convention as other calls in amdgpu, with registers excluded for return
address, stack pointer and stack buffer descriptor.

Differential Revision: https://reviews.llvm.org/D88540
2020-11-09 16:51:44 +01:00
Momchil Velikov 937ab6a785 [ARM][MachineOutliner] Emit more CFI instructions
This patch makes the outliner emit CFI instructions in a few more
places:

  * after LR is restored, but before the return in an outlined
  function

  * around save/restore of LR to/from a register at calls to outlined
  functions

  * around save/restore of LR to/from the stack at calls to outlined
  functions

The latter two only when the function does NOT spill LR. If the
function spills LR, then outliner generated saves/restores around
calls are not considered interesting for unwinding the frame.

Differential Revision: https://reviews.llvm.org/D89483
2020-11-09 15:26:18 +00:00
Sam Tebbs 40a3f7e48d [ARM][LowOverheadLoops] Merge a VCMP and the new VPST into a VPT
There were cases where a VCMP and a VPST were merged even if the VCMP
didn't have the same defs of its operands as the VPST. This is fixed by
adding RDA checks for the defs. This however gave rise to cases where
the new VPST created would precede the un-merged VCMP and so would fail
a predicate mask assertion since the VCMP wasn't predicated. This was
solved by converting the VCMP to a VPT instead of inserting the new
VPST.

Differential Revision: https://reviews.llvm.org/D90461
2020-11-09 15:03:48 +00:00
Jay Foad 55ea017759 [AMDGPU] Remove unused DisableDecoder machinery. NFC.
This has been unused since D24738.
2020-11-09 13:53:27 +00:00
David Green a0a9e1c798 [ARM] Remove kill flags between VCMP and insertion point
When we fold a VCMP into a VPST instruction any kill flags between the
old VCMP position and the new insertion point need to be removed, in
order to keep the verifier happy.

Differential Revision: https://reviews.llvm.org/D90964
2020-11-09 13:17:53 +00:00
Lucas Prates c2c2cc1360 [ARM][AArch64] Adding Neoverse V1 CPU support
Add support for the Neoverse V1 CPU to the ARM and AArch64 backends.

This is based on patches from Mark Murray and Victor Campos.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D90765
2020-11-09 13:15:40 +00:00
Craig Topper f40925aa8b [X86] Improve lowering of fptoui
Invert the select condition when masking in the sign bit of a fptoui operation. Also, rather than lowering the sign mask to select/xor and expecting the select to get cleaned up later, directly lower to shift/xor.

Patch by Layton Kifer!

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D90658
2020-11-07 23:50:03 -08:00
Craig Topper 19313ed580 [RISCV] Remove assertsexti32 from a couple B extension isel patterns that don't demand the sign extended bits. 2020-11-07 22:43:16 -08:00
Carl Ritson 8e8a54c7e9 [AMDGPU] SIWholeQuadMode fix mode insertion when SCC always defined
Fix a crash when SCC is defined until end of block and mode change
must be inserted in SCC live region.

Reviewed By: mceier

Differential Revision: https://reviews.llvm.org/D90997
2020-11-08 11:14:57 +09:00
Craig Topper c72358b77f [RISCV] Use (not X) instead of (xor X, -1) in isel patterns to improve readability. NFC 2020-11-07 11:50:52 -08:00
Elvina Yakubova 93b99728b1 [AArch64] Add pipeline model for HiSilicon's TSV110
This patch adds the scheduling and cost model for TSV110.

Reviewed by: SjoerdMeijer, bryanpkc

Differential Revision: https://reviews.llvm.org/D89972
2020-11-07 01:23:00 +03:00
Eric Astor 5afb360808 [ms] [llvm-ml] Allow arbitrary strings as integer constants
MASM interprets strings in expression contexts as integers expressed in big-endian base-256, treating each character as its ASCII representation.

This completely eliminates the need to special-case single-character strings.
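
The rule described above amounts to a big-endian base-256 conversion. A one-function sketch (the name `masm_string_value` is made up for illustration):

```python
def masm_string_value(text):
    """Interpret an ASCII string as a big-endian base-256 integer."""
    return int.from_bytes(text.encode("ascii"), "big")


single = masm_string_value("a")   # 0x61
double = masm_string_value("ab")  # 0x61 * 256 + 0x62
```

A single-character string falls out as the ordinary ASCII code, which is why no special case is needed.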

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D90788
2020-11-06 17:15:49 -05:00
Jay Foad d61f2cfb9f [AMDGPU] Simplify exp target parsing
Treat any identifier as a potential exp target and diagnose them all the
same way as "invalid exp target"s.

Differential Revision: https://reviews.llvm.org/D90947
2020-11-06 16:09:34 +00:00
Paul C. Anagnostopoulos eed768b700 [NVPTX] [TableGen] Use new features of TableGen to simplify and clarify.
Differential Revision: https://reviews.llvm.org/D90861
2020-11-06 09:20:19 -05:00
Simon Moll 7914e4f0fa [VE] Add v(m)regs to preserve_all reg mask
V(m)regs were defined before CSR_preserve_all was, add them now.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D90912
2020-11-06 15:16:11 +01:00
Simon Moll adc69743d2 [VE][NFC] Refactor to support more than one calling conv
Prepare for supporting  different calling conventions by factoring out
things into CC-dependent selection functions (getParamCC, getReturnCC).

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D90911
2020-11-06 14:25:25 +01:00
Kazushi (Jam) Marukawa 43df29e206 [VE] Optimize address calculation
Optimize address calculations using LEA/LEASL instructions.
Update comments in VEISelLowering.cpp also.  Update an
existing regression test optimized by this modification.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90878
2020-11-06 19:46:59 +09:00
Simon Moll d3b33a7810 [VE][TTI] don't advertise vregs/vops
Claim to not have any vector support to dissuade SLP, LV and friends
from generating SIMD IR for the VE target.  We will take this back once
vector isel is stable.

Reviewed By: kaz7, fhahn

Differential Revision: https://reviews.llvm.org/D90462
2020-11-06 11:12:10 +01:00
Craig Topper 741b04b0b7 [RISCV] Only enable GPR<->FPR32 bitconvert isel patterns on RV32. NFCI
Bitconvert requires the bitwidth to match on both sides. On RV64
the GPR size is i64 so bitconvert between f32 isn't possible. The
node should never be generated so the pattern won't ever match, but
moving the patterns under IsRV32 makes it more obviously impossible.
It also moves it to a similar location to the patterns for the
custom nodes we use for RV64.
2020-11-05 16:15:25 -08:00
Konstantin Pyzhov 41e74e400d [AMDGPU] Corrected declaration of VOPC instructions with SDWA addressing mode.
Removed "implicit def VCC" from declarations of AMDGPU VOPC instructions since they do not implicitly write to VCC in SDWA mode.

Differential Revision: https://reviews.llvm.org/D89168
2020-11-05 11:15:50 -05:00
Michael Liao 23c6d1501d [amdgpu] Add `llvm.amdgcn.endpgm` support.
- `llvm.amdgcn.endpgm` is added to enable "abort" support.

Differential Revision: https://reviews.llvm.org/D90809
2020-11-05 19:06:50 -05:00
Yuriy Chernyshov 99e64623ec Do not construct std::string from nullptr
While I am trying to forbid such usages systematically in https://reviews.llvm.org/D79427 / P2166R0 to C++ standard, this PR fixes this (definitely incorrect) usage in llvm.

This code is unreachable, so it could not cause any harm.

Reviewed By: nikic, dblaikie

Differential Revision: https://reviews.llvm.org/D87697
2020-11-05 15:23:26 -08:00
Craig Topper defe11866a [RISCV] Add isel patterns for fnmadd/fnmsub with an fneg on the second operand instead of the first.
The multiply part of FMA is commutable, but TargetSelectionDAG.td
doesn't have it marked as commutable so tablegen won't automatically
create the additional patterns.

So manually add commuted patterns.
2020-11-05 14:00:25 -08:00
Kazushi (Jam) Marukawa f0e585d585 [VE] Add isReMaterializable and isAsCheapAsAMove flags
Add isReMaterializable and isAsCheapAsAMove flags to integer instructions
that are cheap.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90833
2020-11-06 06:09:10 +09:00
Sanjay Patel 264a6df353 [ARM] remove cost-kind predicate for cmp/sel costs
This is the cmp/sel sibling to D90692.
Again, the reasoning is: the throughput cost is number of instructions/uops,
so size/blended costs are identical except in special cases (for example,
fdiv or other known-expensive machine instructions or things like MVE that
may require cracking into >1 uops).

We need to check for a valid (non-null) condition type parameter because
SimplifyCFG may pass nullptr for that (and so we will crash multiple
regression tests without that check). I'm not sure if passing nullptr makes
sense, but other code in the cost model does appear to check if that param
is set or not.

Differential Revision: https://reviews.llvm.org/D90781
2020-11-05 14:52:25 -05:00
Amara Emerson f347d78cca [AArch64][GlobalISel] Add AArch64::G_DUPLANE[X] opcodes for lane duplicates.
These were previously handled by pattern matching shuffles in the selector, but
adding a new opcode and making it equivalent to the AArch64duplane SDAG node
allows us to select more patterns, like lane indexed FMLAs (patch adding a test
for that will be committed later).

The pattern matching code has been simply moved to postlegalize lowering.

Differential Revision: https://reviews.llvm.org/D90820
2020-11-05 11:18:11 -08:00
Craig Topper ce5f4f22e9 [RISCV] Use the 'si' lib call for (double (fp_to_sint/uint i32 X)) when F extension is enabled.
D80526 added custom lowering to pick the si lib call on RV64, but this custom handling is only enabled when the F and D extensions are both disabled. This prevents the si library call from being used for double when F is enabled but D is not.

This patch changes the behavior so we always enable the Custom hook on RV64 and decide in ReplaceNodeResults if we should emit a libcall based on whether the FP type should be softened or not.

Differential Revision: https://reviews.llvm.org/D90817
2020-11-05 10:46:45 -08:00
Stanislav Mekhanoshin f738aee0bb [AMDGPU] Add default 1 glc operand to rtn atomics
This change adds a real glc operand to the return atomic
instead of just string " glc" in the middle of the asm
string.

Improves asm parser diagnostics.

Differential Revision: https://reviews.llvm.org/D90730
2020-11-05 10:41:59 -08:00
Craig Topper ce1270fc7e [RISCV] Remove shadow register list passed to AllocateReg when allocating FP registers for calling convention
The _F and _D registers are already sub/super registers. When one gets allocated all its aliases are already marked as allocated. We don't need to explicitly shadow it too.

I believe shadow is for calling conventions like 64-bit Windows on X86, where we have rules like this:

CCIfType<[i32], CCAssignToRegWithShadow<[ECX , EDX , R8D , R9D ],
                                         [XMM0, XMM1, XMM2, XMM3]>>

For that calling convention the argument number determines which register is used regardless of how many scalars or vectors came before it.
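
A minimal sketch of that positional rule, not taken from the patch: the register tables mirror the CCAssignToRegWithShadow snippet above, while `ArgKind` and `registerForArg` are hypothetical names invented for illustration.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical model of a positional ("shadowed") assignment: slot N is
// consumed by argument N regardless of which register class it uses.
enum class ArgKind { Integer, FloatingPoint };

constexpr std::array<std::string_view, 4> IntRegs = {"ECX", "EDX", "R8D", "R9D"};
constexpr std::array<std::string_view, 4> VecRegs = {"XMM0", "XMM1", "XMM2", "XMM3"};

// Returns the register used by the argument at position Index (0-based).
// Arguments beyond the fourth are reported as "stack" in this sketch.
inline std::string_view registerForArg(std::size_t Index, ArgKind Kind) {
  if (Index >= IntRegs.size())
    return "stack";
  return Kind == ArgKind::Integer ? IntRegs[Index] : VecRegs[Index];
}
```

For a signature like `f(int, double, int)`, this model picks ECX, XMM1 and R8D: the double in position 1 uses XMM1 and, at the same time, makes EDX unavailable.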

Removing this removes a question I had in D90738.

Differential Revision: https://reviews.llvm.org/D90801
2020-11-05 09:49:42 -08:00
Craig Topper c623584b6f [RISCV] Add isel patterns for fshl with immediate to select FSRI/FSRIW
There is no FSLI instruction, but we can emulate it using FSRI by swapping operands and subtracting the immediate from the bitwidth.
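
The immediate adjustment can be checked with a small sketch. It assumes 32-bit values and the generic funnel-shift definition (both shifts act on the concatenation Hi:Lo); `fshlImm` and `fshrImm` are hypothetical helpers, and the instruction-level operand swap mentioned above is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Funnel shift left by an immediate in [1, 31]: high word of (Hi:Lo) << Amt.
inline uint32_t fshlImm(uint32_t Hi, uint32_t Lo, unsigned Amt) {
  assert(Amt >= 1 && Amt <= 31 && "immediate out of range for this sketch");
  return (Hi << Amt) | (Lo >> (32 - Amt));
}

// Funnel shift right by an immediate in [1, 31]: low word of (Hi:Lo) >> Amt.
inline uint32_t fshrImm(uint32_t Hi, uint32_t Lo, unsigned Amt) {
  assert(Amt >= 1 && Amt <= 31 && "immediate out of range for this sketch");
  return (Hi << (32 - Amt)) | (Lo >> Amt);
}

// A left funnel shift by Amt expressed as a right funnel shift by 32 - Amt.
inline uint32_t fshlViaFshr(uint32_t Hi, uint32_t Lo, unsigned Amt) {
  return fshrImm(Hi, Lo, 32 - Amt);
}
```

For any immediate from 1 to 31, `fshlImm(Hi, Lo, Amt)` and `fshlViaFshr(Hi, Lo, Amt)` produce the same value, which is the "subtract the immediate from the bitwidth" part of the emulation.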

Differential Revision: https://reviews.llvm.org/D90826
2020-11-05 09:37:43 -08:00
Sander de Smalen d57bba7cf8 [SVE] Return StackOffset for TargetFrameLowering::getFrameIndexReference.
To accommodate frame layouts that have both fixed and scalable objects
on the stack, describing a stack location or offset using a pointer + uint64_t
is not sufficient. For this reason, we've introduced the StackOffset class,
which models both the fixed- and scalable sized offsets.
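
As a rough illustration of why a single integer is not enough, the sketch below models an offset with two components. `SketchStackOffset` is a hypothetical type written for this note and is not LLVM's StackOffset class; it assumes the scalable part is multiplied by a runtime vector-length multiple (`VScale`) when the address is finally computed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-component offset, for illustration only.
struct SketchStackOffset {
  int64_t Fixed = 0;    // bytes known at compile time
  int64_t Scalable = 0; // bytes per unit of the runtime vector-length multiple

  // Offsets add component-wise; the two parts never mix at compile time.
  SketchStackOffset operator+(const SketchStackOffset &RHS) const {
    return {Fixed + RHS.Fixed, Scalable + RHS.Scalable};
  }

  // Only once VScale is known can the offset collapse to a byte count.
  int64_t resolve(int64_t VScale) const {
    assert(VScale >= 1 && "vector-length multiple must be positive");
    return Fixed + Scalable * VScale;
  }
};
```

An offset of 16 fixed bytes plus 32 scalable bytes resolves to 48 bytes when the multiple is 1 and to 144 bytes when it is 4, so no single compile-time integer can describe it.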

The TargetFrameLowering::getFrameIndexReference is made to return a StackOffset,
so that this can be used in other interfaces, such as to eliminate frame indices
in PEI or to emit Debug locations for variables on the stack.

This patch is purely mechanical and doesn't change the behaviour of how
the result of this function is used for fixed-sized offsets. The patch adds
various checks to assert that the offset has no scalable component, as frame
offsets with a scalable component are not yet supported in various places.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D90018
2020-11-05 11:02:18 +00:00
Fangrui Song 96b0b9a5e3 [X86] Enable shrink-wrapping for no-frame-pointer non-nounwind functions on platforms not using compact unwind
The current compact unwind scheme does not work when the prologue is not at the
start (the instructions before the prologue cannot be described).  (Technically
this is fixable, but it requires multiple compact unwind descriptors for one
function.)

rL255175 chose to not perform shrink-wrapping for no-frame-pointer functions not
marked as nounwind to work around PR25614. This is overly restrictive, as platforms
not supporting compact unwind (all non-Darwin) do not need the workaround.
This patch restricts the limitation to compact unwind platforms.

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D89930
2020-11-04 16:51:48 -08:00
Arthur Eubanks ab0ddbc38a Reland [NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback
This allows targets to skip optional optimization passes at -O0.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D90777
2020-11-04 13:11:40 -08:00
Arthur Eubanks 9173b5a99d Revert "[NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback"
This reverts commit 7a83aa0520.

Causing buildbot failures.
2020-11-04 12:57:32 -08:00
Arthur Eubanks 7a83aa0520 [NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback
This allows targets to skip optional optimization passes at -O0.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D90777
2020-11-04 12:53:30 -08:00
Eric Astor 07c4f1d10b [ms] [llvm-ml] Lex MASM strings, including escaping
Allow single-quoted strings and double-quoted character values, as well as doubled-quote escaping.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89731
2020-11-04 15:28:43 -05:00
Cameron McInally c126eb7529 [SelectionDAG] Add legalizations for VECREDUCE_SEQ_FMUL
Hook up legalizations for VECREDUCE_SEQ_FMUL. This is following up on the VECREDUCE_SEQ_FADD work from D90247.

Differential Revision: https://reviews.llvm.org/D90644
2020-11-04 14:20:31 -06:00
Mircea Trofin 5dc47541f9 [NFC] Use Register/MCRegister
Differential Revision: https://reviews.llvm.org/D90724
2020-11-04 12:20:17 -08:00
Craig Topper cc3bf27077 [RISCV] Remove assertsexti32 from fslw/fsrw isel patterns.
The operations in these patterns shouldn't be affected by sign
bits. And the pattern is starting from a sign_extend_inreg so
we aren't expecting sign bits to be passed through either.

Differential Revision: https://reviews.llvm.org/D90739
2020-11-04 11:37:58 -08:00
Craig Topper d47300f503 [RISCV] Correct the operand order for fshl/fshr to fsl/fsr instructions.
fsl/fsr take their shift amount in $rs2 or an immediate. The
sources are $rs1 and $rs3.

fshl/fshr ISD opcodes both concatenate operand 0 in the high bits and
operand 1 in the lower bits. fshl returns the high bits after
shifting and fshr returns the low bits. So a shift amount of 0
returns operand 0 for fshl and operand 1 for fshr.

fsl/fsr concatenate their operands in different orders such that
$rs1 will be returned for a shift amount of 0. So $rs1 needs to
come from operand 0 of fshl and operand 1 of fshr.
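
The fshl/fshr behaviour described above can be sketched for 32-bit values as follows. The helpers `fshl32` and `fshr32` are hypothetical names; they model only the ISD semantics as stated in this message (operand 0 in the high bits, operand 1 in the low bits), not the fsl/fsr instructions.

```cpp
#include <cassert>
#include <cstdint>

// Concatenate operand 0 (high) and operand 1 (low) into one 64-bit value.
inline uint64_t concat(uint32_t Op0, uint32_t Op1) {
  return (static_cast<uint64_t>(Op0) << 32) | Op1;
}

// fshl: shift the concatenation left and return the high 32 bits.
inline uint32_t fshl32(uint32_t Op0, uint32_t Op1, unsigned Amt) {
  assert(Amt < 32 && "shift amount is taken modulo the bit width");
  return static_cast<uint32_t>((concat(Op0, Op1) << Amt) >> 32);
}

// fshr: shift the concatenation right and return the low 32 bits.
inline uint32_t fshr32(uint32_t Op0, uint32_t Op1, unsigned Amt) {
  assert(Amt < 32 && "shift amount is taken modulo the bit width");
  return static_cast<uint32_t>(concat(Op0, Op1) >> Amt);
}
```

With a shift amount of 0, `fshl32` returns its first argument and `fshr32` returns its second, which is why the operand feeding $rs1 differs between the two patterns.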

Differential Revision: https://reviews.llvm.org/D90735
2020-11-04 11:13:25 -08:00
Craig Topper 0122a4ea66 [RISCV] Remove assertsexti32 from inputs to riscv_sllw/srlw nodes in B extension isel patterns.
riscv_sllw/srlw only read the lower 32 bits of the first operand
and the lower 5 bits of the second operand. Whether the upper
32 bits of the input are sign bits or not doesn't matter.

Also use ineg and not to shorten the patterns.

Differential Revision: https://reviews.llvm.org/D90668
2020-11-04 10:35:05 -08:00
Craig Topper 857563eaf0 [RISCV] Check all 64-bits of the mask in SelectRORIW.
We need to ensure the upper 32 bits of the mask are zero
so that the srl shifts zeroes into the lower 32 bits.

Differential Revision: https://reviews.llvm.org/D90585
2020-11-04 10:15:30 -08:00
Christopher Tetreault 900ec97bbe [UBSan] Cannot negate smallest negative signed integer
Silence Undefined Behavior Sanitizer warning:
runtime error: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long'); cast to an unsigned type to negate this value to itself
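
The remedy suggested by the sanitizer message can be sketched as below. `negateAsUnsigned` is a hypothetical helper, not code from the patch; it relies on unsigned arithmetic wrapping modulo 2^64, which is well defined in C++.

```cpp
#include <cstdint>

// Negate through uint64_t so that every input, including INT64_MIN, has a
// defined result. For INT64_MIN the value maps to itself (bit pattern
// 0x8000000000000000), matching the wording of the warning.
inline uint64_t negateAsUnsigned(int64_t V) {
  return uint64_t{0} - static_cast<uint64_t>(V);
}
```

Writing `-V` directly on an `int64_t` holding the smallest value is the case the sanitizer flags; the unsigned form avoids it without changing results for other inputs.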

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D90710
2020-11-04 10:07:52 -08:00
Craig Topper 3701e33a22 [RISCV] Remove custom isel for (srl (shl val, 32), imm). Use pattern instead. NFCI
We don't need custom matching, we just need a predicate to check
the immediate is greater than 32. We can use the existing ImmSub32
to adjust the immediate.
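
The arithmetic behind this pattern can be checked with a short sketch. It only verifies the identity for immediates in the range 33 to 63; the function names are hypothetical, and which instruction the pattern selects is not modelled here.

```cpp
#include <cassert>
#include <cstdint>

// The matched shape: (srl (shl Val, 32), Imm) on a 64-bit value.
inline uint64_t shlThenSrl(uint64_t Val, unsigned Imm) {
  assert(Imm < 64 && "shift amount out of range");
  return (Val << 32) >> Imm;
}

// The replacement: a 32-bit logical right shift of the low half by Imm - 32,
// zero-extended back to 64 bits. Only valid when Imm is greater than 32.
inline uint64_t narrowShift(uint64_t Val, unsigned Imm) {
  assert(Imm > 32 && Imm < 64 && "predicate: immediate must exceed 32");
  return static_cast<uint32_t>(Val) >> (Imm - 32);
}
```

Both functions agree for every immediate in that range because the left shift by 32 discards the upper half of the input, leaving only the low 32 bits to be shifted right.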

I've also used the new predicate in the other location that used
ImmSub32. I tried to create a test case where we would break without
the greater than 32 check on that pattern, but DAG combine defeated me.
Still seemed safer to have it.

Differential Revision: https://reviews.llvm.org/D90546
2020-11-04 09:59:14 -08:00
Joe Nash 58adab34c4 [AMDGPU] Resolve pseudo registers at encoding uses
Pseudo-registers allow different register encodings
between gpu generations. Make sure we resolve the
pseudo regs to real regs whenever we get their
hardware encoding.
Using the correct encodings revealed a register
bank conflict and an unnecessary write dependency.
Tests have been updated to match.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90721

Change-Id: I73c154cd24aecc820993b50bebaf4df97a5710ca
2020-11-04 12:52:32 -05:00
Sebastian Neubauer 31a0b2834f [AMDGPU] Fix iterating in SIFixSGPRCopies
The insertion of waterfall loops splits the current basic block into
three blocks. So the basic block that we iterate over must be updated.

This failed assert(!NodePtr->isKnownSentinel()) in ilist_iterator for
divergent calls in branches before.

Differential Revision: https://reviews.llvm.org/D90596
2020-11-04 18:43:19 +01:00
Paul C. Anagnostopoulos d56cd4291e [TableGen] Add !interleave operator to concatenate a list of values with delimiters
Add a test. Use it in some TableGen files.

Differential Revision: https://reviews.llvm.org/D90469
2020-11-04 09:23:54 -05:00
Simon Moll 351c10cc72 [VE] Add +vpu attribute
`+vpu` controls whether VEISelLowering adds any vregs.  This defaults to
`-vpu` to have scalar code generation out of the box.  We bring up
vector isel under the `+vpu` flag. Once vector isel is stable we switch
to `+vpu` and advertise vregs and vops in TTI.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D90465
2020-11-04 12:42:00 +01:00
Kerry McLaughlin f2412d372d [SVE][CodeGen] Lower scalable integer vector reductions
This patch uses the existing LowerFixedLengthReductionToSVE function to also lower
scalable vector reductions. A separate function has been added to lower VECREDUCE_AND
& VECREDUCE_OR operations with predicate types using ptest.

Lowering scalable floating-point reductions will be addressed in a follow up patch,
for now these will hit the assertion added to expandVecReduce() in TargetLowering.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D89382
2020-11-04 11:38:49 +00:00
Sebastian Neubauer 1124bf4ab7 [AMDGPU] Set rsrc1 flags for graphics shaders
Before they were only set for compute kernels and compute shaders but
not for other shaders.

Differential Revision: https://reviews.llvm.org/D89399
2020-11-04 12:25:41 +01:00
Sebastian Neubauer 76313288cd [AMDGPU] Fix ieee mode default value
Previously, the default value for ieee mode was
- on for compute kernels and compute shaders,
- off for all shaders except compute shaders.

This commit changes the default to be
- on for compute kernels,
- off for shaders.

This aligns the default value with the settings that are actually in
use.  To my knowledge, all users of shader calling conventions (mesa and
llpc) disable the ieee mode by default.

Differential Revision: https://reviews.llvm.org/D89388
2020-11-04 12:25:38 +01:00
David Green eb611930b6 [ARM] Remove unused variable. NFC
2020-11-04 09:00:03 +00:00
Sander de Smalen 73b6cb67dc [NFCI] Replace AArch64StackOffset by StackOffset.
This patch replaces the AArch64StackOffset class by the generic one
defined in TypeSize.h.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D88983
2020-11-04 08:49:00 +00:00
Amara Emerson 393b55380a [AArch64][GlobalISel] Add combine for G_EXTRACT_VECTOR_ELT to allow selection of pairwise FADD.
For the <2 x float> case, instead of adding another combine or legalization to
get it into a <4 x float> form, I'm just adding a GISel specific selection
pattern to cover it.

Differential Revision: https://reviews.llvm.org/D90699
2020-11-03 17:25:14 -08:00
Julien Jorge 0fca651711 [WebAssembly] Don't fold frame offset for global addresses
When machine instructions are in the form of
```
%0 = CONST_I32 @str
%1 = ADD_I32 %stack.0, %0
%2 = LOAD 0, 0, %1
```

In the `ADD_I32` instruction, it is possible to fold it if `%0` is a
`CONST_I32` from an immediate number. But in this case it is a global
address, so we shouldn't do that. But we haven't checked if the operand
of `ADD` is an immediate so far. This fixes the problem. (The case
applies the same for `ADD_I64` and `CONST_I64` instructions.)

Fixes https://bugs.llvm.org/show_bug.cgi?id=47944.

Patch by Julien Jorge (jjorge@quarkslab.com)

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D90577
2020-11-03 14:56:25 -08:00
Sanjay Patel c40126e740 [ARM] remove cost-kind predicate for most math op costs
This is based on the same idea that I am using for the basic model implementation
and what I have partly already done for x86: throughput cost is number of
instructions/uops, so size/blended costs are identical except in special cases
(for example, fdiv or other known-expensive machine instructions or things like
MVE that may require cracking into >1 uop).

Differential Revision: https://reviews.llvm.org/D90692
2020-11-03 17:23:46 -05:00
Jordan Rupprecht 980bf1d5d1 [NFC] Inline wasm assertion-only variable
2020-11-03 13:06:59 -08:00
Andy Wingo 107c3a12d6 [WebAssembly] Implement ref.null
This patch adds a new "heap type" operand kind to the WebAssembly MC
layer, used by ref.null. Currently the possible values are "extern" and
"func"; when typed function references come, though, this operand may be
a type index.

Note that the "heap type" production is still known as "refedtype" in
the draft proposal; changing its name in the spec is
ongoing (https://github.com/WebAssembly/reference-types/issues/123).

The register form of ref.null is still untested.

Differential Revision: https://reviews.llvm.org/D90608
2020-11-03 10:46:23 -08:00
Craig Topper 00eff96e1d [RISCV] Add missing patterns for rotr with immediate for Zbb/Zbp extensions.
DAGCombine doesn't canonicalize rotl/rotr with immediate so we
need patterns for both.

Remove the custom matcher for rotl to RORI and just use a SDNodeXForm
to convert the immediate instead. Doing this gives priority to the
rev32/rev16 versions of grevi over rori since an explicit immediate
is more precise than any immediate. I also added rotr patterns for
rev32/rev16. And removed the (or (shl), (shr)) patterns that should be
combined to rotl by DAG combine.

There is at least one other grev pattern that probably needs
another rotr pattern, but we need more test coverage first.

Differential Revision: https://reviews.llvm.org/D90575
2020-11-03 10:04:52 -08:00
Esme-Yi 5053eab890 Revert "[PowerPC] Extend folding RLWINM + RLWINM to post-RA."
This reverts commit 119ab2181e.
2020-11-03 16:34:02 +00:00
Tim Renouf 89d41f3a2b [AMDGPU] Add gfx1033 target
Differential Revision: https://reviews.llvm.org/D90447

Change-Id: If2650fc7f31bbdd49c76e74a9ca8e3734d769761
2020-11-03 16:27:48 +00:00
Tim Renouf ee3e642627 [AMDGPU] Add gfx90c target
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.

Differential Revision: https://reviews.llvm.org/D90419

Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
2020-11-03 16:27:43 +00:00
Jay Foad 040c50278c [AMDGPU] Fix ds_read2/write2 with unaligned offsets
These instructions use a scaled offset. We were wrongly selecting them
even when the required offset was not a multiple of the scale factor.
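
A minimal sketch of the kind of check involved, under stated assumptions: `encodeScaledOffset` is a hypothetical helper, and the 8-bit width of the encoded field is an assumption made for this example rather than something stated in the commit.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: convert a byte offset into a scaled offset field.
// Returns std::nullopt when the offset cannot be represented, either because
// it is not a multiple of the scale factor or because the scaled value does
// not fit in the (assumed) 8-bit field.
inline std::optional<uint8_t> encodeScaledOffset(uint32_t ByteOffset,
                                                 uint32_t Scale) {
  if (Scale == 0 || ByteOffset % Scale != 0)
    return std::nullopt; // unaligned: selecting the scaled form would be wrong
  uint32_t Units = ByteOffset / Scale;
  if (Units > 0xFF)
    return std::nullopt; // too large for the assumed field width
  return static_cast<uint8_t>(Units);
}
```

With a scale of 4, a byte offset of 8 encodes as 2, while a byte offset of 6 is rejected; silently truncating 6 / 4 to 1 is the kind of mistake the fix describes.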

Differential Revision: https://reviews.llvm.org/D90607
2020-11-03 15:16:10 +00:00
Jameson Nash a0ad066ce4 make the AsmPrinterHandler array public
This lets external consumers customize the output, similar to how
AssemblyAnnotationWriter lets the caller define callbacks when printing
IR. The array of handlers already existed, this just cleans up the code
so that it can be exposed publicly.

Replaces https://reviews.llvm.org/D74158

Differential Revision: https://reviews.llvm.org/D89613
2020-11-03 10:02:09 -05:00
Sanjay Patel 9af561ec99 [x86] update cost table comments for maxnum; NFC
Follow-up suggested in D90613.
2020-11-03 08:09:59 -05:00
David Green bd32386410 [ARM] Remove unused variable. NFC
2020-11-03 12:58:10 +00:00
David Green e474499402 [ARM] Treat memcpy/memset/memmove as call instructions for low overhead loops
If an instruction will be lowered to a call there is no advantage of
using a low overhead loop as the LR register will need to be spilled and
reloaded around the call, and the low overhead will end up being
reverted. This teaches our hardware loop lowering that these memory
intrinsics will be calls under certain situations.

Differential Revision: https://reviews.llvm.org/D90439
2020-11-03 11:53:09 +00:00
Nicholas Guy 54d8627852 [AArch64] Redundant masks in downcast long multiply
Adds patterns to catch masks preceding a long multiply,
and generate a single umull/smull instruction instead.

Differential revision: https://reviews.llvm.org/D89956
2020-11-03 10:12:28 +00:00
Petar Avramovic 0031418dce AMDGPU/GlobalISel: Use same builder/observer in post-legalizer-combiner
Change match/apply functions into methods of a new target-specific combiner
helper class. Use a reference to the MachineIRBuilder from the helper instead of
constructing a new MachineIRBuilder each time a new instruction needs to be made.
This allows correct tracking of newly created instructions.

Differential Revision: https://reviews.llvm.org/D90623
2020-11-03 09:24:50 +01:00
Esme-Yi 119ab2181e [PowerPC] Extend folding RLWINM + RLWINM to post-RA.
Summary: This patch depends on D89846. We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINMs will be generated after RA, for example rGc4690b007743. If an RLWINM generated after RA is followed by another RLWINM, we expect to perform the optimization after RA, too.

Reviewed By: shchenz, steven.zhang

Differential Revision: https://reviews.llvm.org/D89855
2020-11-03 07:44:11 +00:00
Craig Topper 46e91f6701 [RISCV] Remove isel patterns for fshl/fshr with same inputs. NFC
These were being selected to ROL/ROR, but DAG combine should
canonicalize fshl/fshr with same inputs to rotl/rotr which we
also have patterns for.
2020-11-02 23:12:18 -08:00
Esme-Yi b969dfe26f [NFC][PowerPC] Move the folding RLWINMs from ppc-mi-peephole to PPCInstrInfo.
Summary: We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINMs will be generated after RA, for example D88274. If an RLWINM generated after RA is followed by another RLWINM, we expect to perform the optimization after RA, too.
This is an NFC patch to move the folding patterns to PPCInstrInfo; follow-up work will call it in pre-emit-peephole and expand the patterns to handle more cases.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D89846
2020-11-03 06:28:56 +00:00
Jessica Clarke 7601a21738 [RISCV] Only return DestSourcePair from isCopyInstrImpl for registers
ADDI often has a frameindex in operand 1, but consumers of this
interface, such as MachineSink, tend to call getReg() on the Destination
and Source operands, leading to the following crash when building
FreeBSD after this implementation was added in 8cf6778d30:

```
clang: llvm/include/llvm/CodeGen/MachineOperand.h:359: llvm::Register llvm::MachineOperand::getReg() const: Assertion `isReg() && "This is not a register operand!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
 #0 0x00007f4286f9b4d0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) llvm/lib/Support/Unix/Signals.inc:563:0
 #1 0x00007f4286f9b587 PrintStackTraceSignalHandler(void*) llvm/lib/Support/Unix/Signals.inc:630:0
 #2 0x00007f4286f9926b llvm::sys::RunSignalHandlers() llvm/lib/Support/Signals.cpp:71:0
 #3 0x00007f4286f9ae52 SignalHandler(int) llvm/lib/Support/Unix/Signals.inc:405:0
 #4 0x00007f428646ffd0 (/lib/x86_64-linux-gnu/libc.so.6+0x3efd0)
 #5 0x00007f428646ff47 raise /build/glibc-2ORdQG/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #6 0x00007f42864718b1 abort /build/glibc-2ORdQG/glibc-2.27/stdlib/abort.c:81:0
 #7 0x00007f428646142a __assert_fail_base /build/glibc-2ORdQG/glibc-2.27/assert/assert.c:89:0
 #8 0x00007f42864614a2 (/lib/x86_64-linux-gnu/libc.so.6+0x304a2)
 #9 0x00007f428d4078e2 llvm::MachineOperand::getReg() const llvm/include/llvm/CodeGen/MachineOperand.h:359:0
#10 0x00007f428d8260e7 attemptDebugCopyProp(llvm::MachineInstr&, llvm::MachineInstr&) llvm/lib/CodeGen/MachineSink.cpp:862:0
#11 0x00007f428d826442 performSink(llvm::MachineInstr&, llvm::MachineBasicBlock&, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, llvm::SmallVectorImpl<llvm::MachineInstr*>&) llvm/lib/CodeGen/MachineSink.cpp:918:0
#12 0x00007f428d826e27 (anonymous namespace)::MachineSinking::SinkInstruction(llvm::MachineInstr&, bool&, std::map<llvm::MachineBasicBlock*, llvm::SmallVector<llvm::MachineBasicBlock*, 4u>, std::less<llvm::MachineBasicBlock*>, std::allocator<std::pair<llvm::MachineBasicBlock* const, llvm::SmallVector<llvm::MachineBasicBlock*, 4u> > > >&) llvm/lib/CodeGen/MachineSink.cpp:1073:0
#13 0x00007f428d824a2c (anonymous namespace)::MachineSinking::ProcessBlock(llvm::MachineBasicBlock&) llvm/lib/CodeGen/MachineSink.cpp:410:0
#14 0x00007f428d824513 (anonymous namespace)::MachineSinking::runOnMachineFunction(llvm::MachineFunction&) llvm/lib/CodeGen/MachineSink.cpp:340:0
```

Thus, check that operand 1 is also a register in the condition.

Reviewed By: arichardson, luismarques

Differential Revision: https://reviews.llvm.org/D89090
2020-11-03 03:55:47 +00:00
Qiu Chaofan d14e51806b [PowerPC] Skip IEEE 128-bit FP type in FastISel
Vector types, quadword integers and f128 currently cannot be handled in
FastISel. We did not skip f128 type in lowering arguments, which causes
a crash. This patch will fix it.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D90206
2020-11-03 11:17:11 +08:00
Qiu Chaofan 3204ffeade [PowerPC] [NFC] Rename VCMPo to VCMP_rec
Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D90581
2020-11-03 11:10:59 +08:00
Fangrui Song ca01a6b3ac [PowerPC] Parse and ignore .machine ppc64
In the wild, kexec-tools purgatory/arch/ppc64/v2wrap.S and hvcall.S
use this directive.
2020-11-02 16:49:57 -08:00
Krzysztof Parzyszek b26a2755dc [Hexagon] Move isTypeForHVX from Hexagon TTI to HexagonSubtarget, NFC
It's useful outside of Hexagon TTI, and with how TTI is implemented,
it is not accessible outside of TTI.
2020-11-02 14:00:45 -06:00
Stanislav Mekhanoshin c9d6fe6f7d [AMDGPU] Improve FLAT scratch detection
We were using too broad a check for isFLATScratch(), which also
includes FLAT global.

Differential Revision: https://reviews.llvm.org/D90505
2020-11-02 11:37:33 -08:00
Craig Topper 9ac2910093 [RISCV] Make SelectRORIW handle the commutability of OR.
The SHL and SRL could be in opposite order so account for that.

Differential Revision: https://reviews.llvm.org/D90586
2020-11-02 09:32:54 -08:00
Sanjay Patel 35fa3c474f [x86] add AVX2 cost model entries for maxnum of 256-bit vectors
As noticed in D90554 ,
the AVX2 costs for 256-bit vectors did not include FMAXNUM entries,
so we fell back to AVX1 which assumes those ops will be split into
128-bit halves or something close to that.

Differential Revision: https://reviews.llvm.org/D90613
2020-11-02 12:20:17 -05:00
Craig Topper 7142ec3aaf [RISCV] When matching RORIW, make sure the same input is given to both shifts.
The code is looking for (sext_inreg (or (shl X, C2), (shr (and Y, C3), C1))).
We need to ensure X and Y are the same.

Differential Revision: https://reviews.llvm.org/D90580
2020-11-02 09:12:40 -08:00
Momchil Velikov 7360d6d921 [ARM][MachineOutliner] Do not overestimate LR liveness in return block
The `LiveRegUnits` utility (as well as `LivePhysRegs`) considers
callee-saved registers to be alive at the point after the return
instruction in a block. In the ARM backend, the `LR` register is
classified as callee-saved, which is not really correct (from an ARM
eABI or just common sense point of view).  These two conditions cause
the `MachineOutliner` to overestimate the liveness of `LR`, which
results in unnecessary saves/restores of `LR` around calls to outlined
sequences.  It also causes the `MachineVerifier` to crash in some
cases, because the save instruction reads a dead `LR`, for example
when the following program:

int h(int, int);

int f(int a, int b, int c, int d) {
  a = h(a + 1, b - 1);
  b = b + c;
  return 1 + (2 * a + b) * (c - d) / (a - b) * (c + d);
}

int g(int a, int b, int c, int d) {
  a = h(a - 1, b + 1);
  b = b + c;
  return 2 + (2 * a + b) * (c - d) / (a - b) * (c + d);
}

is compiled with `-target arm-eabi -march=armv7-m -Oz`.

This patch computes the liveness of `LR` in return blocks only, while
taking into account the few ARM instructions that read `LR` even though
the register is not mentioned (explicitly or implicitly) in the
instruction operands.

Differential Revision: https://reviews.llvm.org/D89189
2020-11-02 16:47:22 +00:00
Florian Hahn b3b993a7ad Reland "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts the revert commit 408c4408fa.

This version of the patch includes a fix for a crash caused by
treating ICmp/FCmp constant expressions as instructions.

Original message:

On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
2020-11-02 15:39:29 +00:00
Matt Arsenault 86b8f6919b AMDGPU: Reorder checks
2020-11-02 10:21:48 -05:00
Evgeny Leviant cc96a82291 [TableGen][SchedModels] Fix read/write variant substitution
Patch fixes case when sched class has write and read variants belonging
to different processor models.

Differential revision: https://reviews.llvm.org/D89777
2020-11-02 17:39:04 +03:00
Jay Foad 0892d2a311 Revert "Fix ds_read2/write2 unaligned offsets"
This reverts commit 2e7e898c8f.

It was committed by mistake.
2020-11-02 14:01:33 +00:00
Jay Foad 2e7e898c8f Fix ds_read2/write2 unaligned offsets
2020-11-02 13:57:13 +00:00
Simon Pilgrim 36920d5f9d [RISCV] Avoid std::pair<> in FPReg StringSwitch to avoid MSVC compile failures. NFCI.
As discussed on D90322, some MSVC builds are failing with is_trivially_copyable static asserts (see D86126) - we can avoid this by not using the std::pair<unsigned,unsigned> which held both the FP+DP Registers, just handle the FP register and convert to DP on the fly.
2020-11-02 11:30:57 +00:00
Caroline Concatto 71038788ce Revert "[AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register."
This reverts commit 8b281bfaf3.
2020-11-02 08:15:50 +00:00
Caroline Concatto 8b281bfaf3 [AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register.
Only the aliases 'xzr' and 'sp' exist for the physical register x31.
The reason for wanting to remove the alias 'x31' is because it allows users
to write invalid asm that is not accepted by the GNU assembler.

Is there any objection to removing this alias? Or do we want to keep
this for compatibility with existing code that uses w31/x31?

Differential Revision: https://reviews.llvm.org/D90153
2020-11-02 07:57:05 +00:00
Qiu Chaofan 2762e6734f [PowerPC] Fix a crash in POWER 9 setb peephole
Variable InnerIsSel references FalseRes, while FalseRes might be
zext/sext. So InnerIsSel should reference SetOrSelCC, otherwise a crash
will happen.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D90142
2020-11-02 14:29:43 +08:00
Craig Topper e57237f198 Recommit "[RISCV] Remove include of RISCVRegisterInfo.h from RISCVBaseInfo.h. NFCI"
This reverts 781917254d and recommits
781917254d.

I've changed getRegForInlineAsmConstraint to not use a std::pair
of Register in a previous commit. Hopefully that fixes the reported
issue with expensive checks on Windows. I'm still not sure exactly
why this commit removing an include affected a different file.

Original message:

RISCVRegisterInfo.h is part of the CodeGen layer. The Utils library
is intended to be shared with the MC layer so shouldn't use files
from the CodeGen layer.

The register enum names are already available from
RISCVMCTargetDesc.h. It appears what was coming from this include
was a transitive include of the Register class which I've replaced
with MCRegister. Register has a constructor from MCRegister so it
should be convertible.
2020-11-01 10:35:37 -08:00
Craig Topper a76cd10fcd [RISCV] Use 'unsigned' instead of Register in getRegForInlineAsmConstraint. NFC
The return value of this interface still uses an 'unsigned' on all
targets. So we convert Register back to unsigned at the end.

I'm hoping this will prevent the issue that caused the revert of
D90322.
2020-11-01 10:16:52 -08:00
Christudasan Devadasan d6aa4aa29a [AMDGPU] Some refactoring after D90404. NFC.
2020-11-01 13:18:53 +05:30
Christudasan Devadasan 9bb2b4f0aa [AMDGPU] Add alignment check for v3 to v4 load type promotion
It should be enabled only when the load alignment is at least 8-byte.

Fixes: SWDEV-256824

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D90404
2020-11-01 12:05:34 +05:30
Ayke van Laethem e03ba2198d [AVR] Improve inline rotate/shift expansions
These expansions were rather inefficient and were done with more code
than necessary. This change optimizes them to use expansions more
similar to GCC. The code size is the same (when optimizing for code
size) but somehow LLVM reorders blocks in a non-optimal way. Still, this
should be an improvement with a reduction in code size of around 0.12%
(when building compiler-rt).

Differential Revision: https://reviews.llvm.org/D86418
2020-10-31 23:15:49 +01:00
Paul C. Anagnostopoulos ef6f6d1c1a [TableGen] Eliminate uses of true and false in .td files.
They occurred in one NVPTX file and some test files.

Differential Revision: https://reviews.llvm.org/D90513
2020-10-31 10:54:33 -04:00
David Green 30ad742644 [ARM] Fix crash for gather of pointer costs.
If the elt size is unknown due to it being a pointer, a comparison
against 0 will cause an assert. Make sure the elt size is large enough
before comparing and for the moment just return the scalar cost.
2020-10-31 13:10:14 +00:00
Simon Pilgrim 9e406ee808 [X86] Make some basic VarArgsLoweringHelper helper methods const. NFCI.
Fixes a number of cppcheck remarks.
2020-10-31 12:16:49 +00:00
Simon Pilgrim e0cbcf96ce [X86] Make the X86FrameSortingComparator operator const. NFCI.
Fixes a cppcheck remark.
2020-10-31 12:16:49 +00:00
Simon Pilgrim 55dbb7d823 [X86] X86MCTargetDesc - ensure the declaration/definition variable names match. NFCI.
Silences cppcheck mismatch warnings.
2020-10-31 11:50:00 +00:00
Simon Pilgrim 30a1d91127 [X86] Reduce scope of DestReg and use specific Register type not unsigned. NFCI.
2020-10-31 11:46:07 +00:00
Simon Pilgrim ae80ac6db2 [X86] printAsmMRegister - make the X86AsmPrinter arg a const reference. NFC.
Fixes cppcheck warning.
2020-10-31 11:41:14 +00:00
Simon Pilgrim 39f77b3224 [X86] assignValueToReg - fix Wshadow warning. NFCI.
X86OutgoingValueHandler already has a MIB member
2020-10-31 11:39:26 +00:00
Simon Pilgrim 33e20008d1 [X86] printAsmVRegister - remove unused argument. NFC.
2020-10-31 11:34:28 +00:00
Simon Pilgrim ec547a7517 [X86] X86AsmPrinter - ensure the declaration/definition variable names match. NFCI.
Silences cppcheck mismatch warnings.
2020-10-31 11:31:46 +00:00
Simon Pilgrim 5eec049689 [X86] No need to determine pointer when the type is already a MachineInstr*. NFCI.
Caught by cppcheck - appears to be a copy+paste typo as the other var is an iterator that does need the &* pointer operation.
2020-10-31 11:26:25 +00:00
Liu, Chen3 756f597841 [X86] Support Intel avxvnni
This patch mainly made the following changes:

1. Support AVX-VNNI instructions;
2. Introduce ExplicitVEXPrefix flag so that vpdpbusd/vpdpbusds/vpdpwssd/vpdpwssds instructions only use vex-encoding when the user explicitly adds the {vex} prefix.

Differential Revision: https://reviews.llvm.org/D89105
2020-10-31 12:39:51 +08:00
Thomas Lively a787e09779 [WebAssembly] Prototype i64x2.bitmask
As proposed in https://github.com/WebAssembly/simd/pull/368.

Differential Revision: https://reviews.llvm.org/D90514
2020-10-30 17:23:30 -07:00
Wouter van Oortmerssen 86cd2332ce [WebAssembly] Fixed DWARF DW_AT_low_pc encoded as 64-bit in wasm64
Also added general wasm64 DWARF test
Also added asserts for unsupported reloc combinations that triggered this bug.

Differential Revision: https://reviews.llvm.org/D90503
2020-10-30 16:42:48 -07:00
Thomas Lively 0a512a555a [WebAssembly] Prototype i64x2.eq
As proposed in https://github.com/WebAssembly/simd/pull/381. Since it is still
in the prototyping phase, it is only accessible via a target builtin function
and a target intrinsic.

Depends on D90504.

Differential Revision: https://reviews.llvm.org/D90508
2020-10-30 16:38:15 -07:00
Thomas Lively 1cb0b56607 [WebAssembly] Prototype i64x2.widen_{low,high}_i32x4_{s,u}
As proposed in https://github.com/WebAssembly/simd/pull/290. As usual, these
instructions are available only via builtin functions and intrinsics while they
are in the prototyping stage.

Differential Revision: https://reviews.llvm.org/D90504
2020-10-30 15:44:04 -07:00
Florian Hahn 408c4408fa Revert "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts commit 73f01e3df5.

This appears to break
http://lab.llvm.org:8011/#/builders/85/builds/383.
2020-10-30 21:26:14 +00:00
Peter Collingbourne 3d049bce98 hwasan: Support for outlined checks in the Linux kernel.
Add support for match-all tags and GOT-free runtime calls, which
are both required for the kernel to be able to support outlined
checks. This requires extending the access info to let the backend
know when to enable these features. To make the code easier to maintain
introduce an enum with the bit field positions for the access info.

Allow outlined checks to be enabled with -mllvm
-hwasan-inline-all-checks=0. Kernels that contain runtime support for
outlined checks may pass this flag. Kernels lacking runtime support
will continue to link because they do not pass the flag. Old versions
of LLVM will ignore the flag and continue to use inline checks.

With a separate kernel patch [1] I measured the code size of defconfig
+ tag-based KASAN, as well as boot time (i.e. time to init launch)
on a DragonBoard 845c with an Android arm64 GKI kernel. The results
are below:

         code size    boot time
before    92824064      6.18s
after     38822400      6.65s

[1] https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76

Depends on D90425

Differential Revision: https://reviews.llvm.org/D90426
2020-10-30 14:25:40 -07:00
Cameron McInally dda1e74b58 [Legalize] Add legalizations for VECREDUCE_SEQ_FADD
Add Legalization support for VECREDUCE_SEQ_FADD, so that we don't need to depend on ExpandReductionsPass.

Differential Revision: https://reviews.llvm.org/D90247
2020-10-30 16:02:55 -05:00
Peter Collingbourne c9b1a2b41d AArch64: Use SBFX instead of UBFX to extract address granule in outlined HWASan checks.
In a kernel (or in general in environments where bit 55 of the address
is set) the shadow base needs to point to the end of the shadow region,
not the beginning. Bit 55 needs to be sign extended into bits 52-63
of the shadow base offset, otherwise we end up loading from an invalid
address. We can do this by using SBFX instead of UBFX.

Using SBFX should have no effect in the userspace case where bit 55
of the address is clear so we do so unconditionally. I don't think
we need an ABI version bump for this (but one will come anyway when
we switch to x20 for the shadow base register).

Differential Revision: https://reviews.llvm.org/D90424
2020-10-30 12:53:15 -07:00
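The SBFX change above (c9b1a2b41d) can be illustrated with a small sketch. It models an unsigned and a signed bitfield extract of bits 4 to 55 of an address in plain C++. The helper names `ubfx` and `sbfx`, the field position (lsb 4, width 52, assuming 16-byte granules) and the two sample addresses are assumptions made for illustration; they are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned bitfield extract (UBFX-like): bits [lsb, lsb + width) of x,
// zero-extended. Assumes 0 < width < 64.
constexpr uint64_t ubfx(uint64_t x, unsigned lsb, unsigned width) {
  return (x >> lsb) & ((uint64_t{1} << width) - 1);
}

// Signed bitfield extract (SBFX-like): the same field, sign-extended from
// its top bit. Assumes 0 < width < 64.
constexpr int64_t sbfx(uint64_t x, unsigned lsb, unsigned width) {
  const uint64_t field = ubfx(x, lsb, width);
  const uint64_t sign = uint64_t{1} << (width - 1);
  return static_cast<int64_t>((field ^ sign) - sign);
}

// Hypothetical addresses, chosen only for illustration.
constexpr uint64_t kUserAddr = 0x00007fff12345670ULL;   // bit 55 clear
constexpr uint64_t kKernelAddr = 0xffff800012345670ULL; // bit 55 set

inline void checkBitfieldExamples() {
  // With bit 55 clear, both extracts agree.
  assert(ubfx(kUserAddr, 4, 52) == 0x7fff1234567ULL);
  assert(sbfx(kUserAddr, 4, 52) == 0x7fff1234567LL);
  // With bit 55 set, only the signed extract fills bits 52-63 with ones.
  assert(ubfx(kKernelAddr, 4, 52) == 0x000ff80001234567ULL);
  assert(static_cast<uint64_t>(sbfx(kKernelAddr, 4, 52)) ==
         0xfffff80001234567ULL);
}
```

With bit 55 set, the signed result is a negative offset, which is consistent with a shadow base that points to the end of the shadow region.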
Peter Collingbourne 3859fc653f AArch64: Switch to x20 as the shadow base register for outlined HWASan checks.
From a code size perspective it turns out to be better to use a
callee-saved register to pass the shadow base. For non-leaf functions
it avoids the need to reload the shadow base into x9 after each
function call, at the cost of an additional stack slot to save the
caller's x20. But with x9 there is also a stack size cost, either
as a result of copying x9 to a callee-saved register across calls or
by spilling it to stack, so for the non-leaf functions the change to
stack usage is largely neutral.

It is also code size (and stack size) neutral for many leaf functions.
Although they now need to save/restore x20 this can typically be
combined via LDP/STP into the x30 save/restore. In the case where
the function needs callee-saved registers or stack spills we end up
needing, on average, 8 more bytes of stack and 1 more instruction
but given the improvements to other functions this seems like the
right tradeoff.

Unfortunately we cannot change the register for the v1 (non short
granules) check because the runtime assumes that the shadow base
register is stored in x9, so the v1 check still uses x9.

Aside from that there is no change to the ABI because the choice
of shadow base register is a contract between the caller and the
outlined check function, both of which are compiler generated. We do
need to rename the v2 check functions though because the functions
are deduplicated based on their names, not on their contents, and we
need to make sure that when object files from old and new compilers
are linked together we don't end up with a function that uses x9
calling an outlined check that uses x20 or vice versa.

With this change code size of /system/lib64/*.so in an Android build
with HWASan goes from 200066976 bytes to 194085912 bytes, or a 3%
decrease.

Differential Revision: https://reviews.llvm.org/D90422
2020-10-30 12:51:30 -07:00
Craig Topper 6915c76e10 [RISCV] Don't use DCI.CombineTo to replace a single result. NFCI
Just return the new node, which is the standard practice.

I also noticed what appeared to be an unnecessary attempt at
creating an ANY_EXTEND where the type should already be correct.
I replace with an assert to verify the type.

Differential Revision: https://reviews.llvm.org/D90444
2020-10-30 10:46:32 -07:00
Sanjay Patel 251dd7c0f9 [x86] add cost overrides for mul with overflow
I'm assuming the standard size integer instructions for this end up as something like:
mulq %rsi
seto %al

And the 'mul' generally has reciprocal throughput of 1 on typical implementations
(higher latency, but that's not handled here).
The default costs may end up much higher than that, and that's what we see in the test diffs.

Vector types are left as a 'TODO'.

Differential Revision: https://reviews.llvm.org/D90431
2020-10-30 12:38:16 -04:00
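For readers unfamiliar with the operation costed in 251dd7c0f9 above, here is a minimal sketch of source code that produces a multiply with overflow check. It uses the GCC/Clang builtin `__builtin_mul_overflow`; the wrapper names `umulOverflows` and `smulOverflows` are hypothetical. Whether a given compiler emits exactly the mul + seto sequence quoted in the message is an assumption that depends on target and optimization level.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true if a * b does not fit in an unsigned 64-bit value.
// The wrapped (truncated) product is stored in `out` either way.
inline bool umulOverflows(uint64_t a, uint64_t b, uint64_t &out) {
  return __builtin_mul_overflow(a, b, &out);
}

// Signed variant of the same check.
inline bool smulOverflows(int64_t a, int64_t b, int64_t &out) {
  return __builtin_mul_overflow(a, b, &out);
}

inline void checkMulOverflowExamples() {
  uint64_t u = 0;
  assert(!umulOverflows(3, 4, u) && u == 12);
  // 2^32 * 2^32 = 2^64 wraps to 0 and reports overflow.
  assert(umulOverflows(uint64_t{1} << 32, uint64_t{1} << 32, u) && u == 0);

  int64_t s = 0;
  assert(!smulOverflows(-3, 4, s) && s == -12);
  // INT64_MIN * -1 is not representable as int64_t.
  assert(smulOverflows(std::numeric_limits<int64_t>::min(), -1, s));
}
```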
Simon Moll 4474d4d49c [VE][NFC] Split up lowering init
Split up the monolithic VETargetLowering ctor into three initialization phases:
1. initRegisterClasses()
2. initSPUActions()
3. // TODO initVPUActions()

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D90463
2020-10-30 16:18:27 +01:00
Matt Arsenault 790f5771fd AMDGPU: Fix missing writelane cases to skip with exec=0 2020-10-30 11:15:11 -04:00
serge-sans-paille 0f60bcc36c [stack-clash] Fix probing of dynamic alloca
- Perform the probing in the correct direction.
  Related to https://github.com/rust-lang/rust/pull/77885#issuecomment-711062924

- The first touch on a dynamic alloca cannot use a mov because it clobbers
  existing space. Use an xor with 0 instead.

Differential Revision: https://reviews.llvm.org/D90216
2020-10-30 15:34:00 +01:00
Simon Pilgrim 0ff1ab42f2 Use cast<> instead of dyn_cast<> as we dereference the pointer immediately. NFCI.
Fix clang static analyzer warning - we know that the arg should be ConstantInt and we're better off relying on cast<> asserting on failure rather than a null dereference crash.
2020-10-30 14:33:20 +00:00
Florian Hahn 73f01e3df5 [TTI] Add VecPred argument to getCmpSelInstrCost.
On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.

Reviewed By: dmgreen, RKSimon

Differential Revision: https://reviews.llvm.org/D90070
2020-10-30 13:49:08 +00:00
David Sherwood cea69fa4dc [SVE] Add fatal error for unnamed SVE variadic arguments
We don't currently support passing unnamed variadic SVE arguments
so I've added a fatal error if we hit such cases to prevent any
silent ABI issues in future.

Differential Revision: https://reviews.llvm.org/D90230
2020-10-30 13:35:47 +00:00
David Green d14db8c8dc [ARM] Match MVE vqdmulh
This adds ISel matching for a form of VQDMULH. There are several IR
patterns that we could match to that instruction, this one is for:

min(ashr(mul(sext(a), sext(b)), 7), 127)

Which is what llvm will optimize to once it has removed the max that
usually makes up the min/max saturate pattern, as in this case the
compare will always be false. The additional complication to match i32
patterns (which extend into an i64) is that the min will be a
vselect/setcc, as vmin is not supported for i64 vectors. Tablegen
patterns have also been updated to attempt to reuse the MVE_TwoOpPattern
patterns.

Differential Revision: https://reviews.llvm.org/D90096
2020-10-30 13:34:27 +00:00
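The pattern quoted in d14db8c8dc above, min(ashr(mul(sext(a), sext(b)), 7), 127), can be written out for a single i8 lane as a rough sketch. The function names are hypothetical, and the reference in `checkDoublingReference` is the usual definition of a saturating doubling multiply returning the high half, stated here as an assumption rather than taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// One i8 lane of min(ashr(mul(sext(a), sext(b)), 7), 127).
inline int8_t qdmulhLane(int8_t a, int8_t b) {
  const int32_t wide = int32_t{a} * int32_t{b}; // sext both inputs, multiply
  // Arithmetic shift right by 7 (guaranteed for signed values since C++20,
  // and the behaviour of GCC and Clang before that).
  const int32_t shifted = wide >> 7;
  // Only -128 * -128 exceeds 127 after the shift, so a lone min saturates;
  // no max against -128 is needed.
  return static_cast<int8_t>(std::min(shifted, int32_t{127}));
}

// Compares against saturate((2 * a * b) >> 8) for every pair of i8 inputs.
inline void checkDoublingReference() {
  for (int a = -128; a <= 127; ++a) {
    for (int b = -128; b <= 127; ++b) {
      const int ref = std::clamp((2 * a * b) >> 8, -128, 127);
      assert(qdmulhLane(static_cast<int8_t>(a), static_cast<int8_t>(b)) == ref);
    }
  }
}
```

The lower bound never needs clamping because the most negative product, -128 * 127, shifts to -127.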
Simon Pilgrim 781917254d Revert rG22c383763456 "[RISCV] Remove include of RISCVRegisterInfo.h from RISCVBaseInfo.h"
This reverts commit 22c3837634.

This is causing a build failure with MSVC - reported on D90322
2020-10-30 11:59:37 +00:00
alex-t a4f7e4264c [AMDGPU] SILowerControlFlow::removeMBBifRedundant. Refactoring plus fix for the null MBB pointer in MF->splice
Detailed description: This change addresses the refactoring advised by foad. It also contains the fix for the case when getNextNode is null if the successor block is the last in MachineFunction.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D90314
2020-10-30 14:46:08 +03:00
Michael Roe fc0892c1f9 [mips] Implement add.ps, mul.ps and sub.ps
Differential revision: https://reviews.llvm.org/D90321
2020-10-30 10:59:15 +03:00
Krzysztof Parzyszek db60e64036 [Hexagon] Handle additional shuffles that can be made perfect 2020-10-29 19:09:00 -05:00
Craig Topper 74b078294f [RISCV] Improve worklist management in the DAG combine for SLLW/SRLW/SRAW
This combine makes two calls to SimplifyDemandedBits, one for the LHS and one
for the RHS. If the LHS call returns true, we don't make the RHS call. When
SimplifyDemandedBits makes a change, it will add the nodes around the change to
the DAG combiner worklist. If the simplification happens on the first recursion
step, the N will get added to the worklist. But if the simplification happens
deeper in the recursion, then N will not be revisited until the next time the
DAG combiner runs.

This patch explicitly adds N to the worklist any time a simplification is made.
Without this we might miss additional simplifications on the LHS or never
simplify the RHS. Special care also needs to be taken to not add N if it has
been CSEd by the simplification. There are similar examples in DAGCombiner and
the X86 target, but I don't have a test for it for RISC-V. I've also returned
SDValue(N, 0) instead of SDValue() so DAGCombiner knows a change was made and
will update its Statistic variable.

The test here was constructed so that 2 simplifications happen to the LHS.
Without this fix one happens in the post type legalization DAG combine and the
other happens after LegalizeDAG. This prevents the RHS from ever being
simplified, so the left and right shifts that clear the upper 32 bits of the
RHS are left behind.

Differential Revision: https://reviews.llvm.org/D90339
2020-10-29 14:52:53 -07:00
Craig Topper 22c3837634 [RISCV] Remove include of RISCVRegisterInfo.h from RISCVBaseInfo.h
RISCVRegisterInfo.h is part of the CodeGen layer. The Utils library
is intended to be shared with the MC layer so shouldn't use files
from the CodeGen layer.

The register enum names are already available from
RISCVMCTargetDesc.h. It appears what was coming from this include
was a transitive include of the Register class which I've replaced
with MCRegister. Register has a constructor from MCRegister so it
should be convertible.
2020-10-29 11:39:19 -07:00
Thomas Lively be6f50798e [WebAssembly] Implement SIMD signselect instructions
As proposed in https://github.com/WebAssembly/simd/pull/124, using the opcodes
adopted by V8 in
https://chromium-review.googlesource.com/c/v8/v8/+/2486235/2/src/wasm/wasm-opcodes.h.
Uses new builtin functions and a new target intrinsic exclusively to ensure that
the new instructions are only emitted when a user explicitly opts in to using
them since they are still in the prototyping and evaluation phase.

Differential Revision: https://reviews.llvm.org/D90357
2020-10-29 11:06:20 -07:00
Jay Foad 9cee87d72a [AMDGPU] Fix double space in disassembly of ds_gws_sema_* with gds
By setting up the AsmStrings correctly we can remove some special cases
from AMDGPUInstPrinter::printOffset.

Differential Revision: https://reviews.llvm.org/D90307
2020-10-29 17:31:59 +00:00
Jay Foad 58de4b2053 [AMDGPU] Use pseudo instructions for readlane/writelane
This reverts r227987 "R600/SI: Determine target-specific encoding of READLANE and WRITELANE early v2".

All the codegen changes are caused by the post-RA scheduler no longer
treating readlane/writelane as scheduling barriers due to having
unmodelled side effects. (The pseudos are hasSideEffects = 0, but the
real instructions are hasSideEffects = ? which TableGen conservatively
treats as 1.)

Differential Revision: https://reviews.llvm.org/D90401
2020-10-29 16:00:53 +00:00
Nicholas Guy eb9fe24eaf [ARM] Fix IT block generation after Thumb2SizeReduce with -Oz
Fixes a regression caused by D82439, in which IT blocks were no longer being generated when -Oz is present.

Differential Revision: https://reviews.llvm.org/D88496
2020-10-29 15:17:31 +00:00
Jay Foad 7a79921edd [AMDGPU] Remove gds operand from ds_gws_* MachineInstrs
The operand value was always 1 (except in some bad MIR tests) so it was
redundant.

Differential Revision: https://reviews.llvm.org/D90378
2020-10-29 15:04:23 +00:00
Jay Foad a442fad911 [AMDGPU] Fix double space in disassembly of s_set_gpr_idx_mode
Differential Revision: https://reviews.llvm.org/D90374
2020-10-29 14:54:33 +00:00
Jay Foad e9dd2c4fe2 [AMDGPU] Fix double space in disassembly of some DPP instructions
Differential Revision: https://reviews.llvm.org/D90373
2020-10-29 14:54:33 +00:00
Kazushi (Jam) Marukawa 58a6b7bcde [VE] Add missing BCR format
Add missing "BCR %sy, 0, target" format instruction and a regression
test for this format.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90387
2020-10-29 23:30:49 +09:00
Kazushi (Jam) Marukawa 07d1996601 [VE] Support register aliases in llvm-mc
Support register aliases in MC layer to compile existing assembly
files with clang and integrated assembler.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90383
2020-10-29 23:28:32 +09:00
Jay Foad 69f5105f5c [AMDGPU] Simplify insertNoops functions. NFC. 2020-10-29 10:55:20 +00:00
Kazushi (Jam) Marukawa 9c82944b2d [VE] Add vector control instructions
Add LVL/SVL/SMVL/LVIX instructions.  Add regression tests too.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90355
2020-10-29 19:24:31 +09:00
Ben Shi 076a8d915b [NFC][AVR] Improve device list
Reviewed By: dylanmckay

https://reviews.llvm.org/D87968
2020-10-29 10:54:17 +08:00
Kazushi (Jam) Marukawa 7942960199 [VE] Add vector mask operation instructions
Add VFMK/VFMS/VFMF/ANDM/ORM/XORM/EQVM/NNDM/NEGM/PCVM/LZVM/TOVM
instructions.  Add regression tests too.  Also add new patterns
to parse VFMK/VFMS/VFMF mnemonics.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90297
2020-10-29 08:42:41 +09:00
Austin Kerbow de51867343 [AMDGPU] Add Reset function to GCNHazardRecognizer
Reset the tracked emitted instructions when starting scheduling on a new
region.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90347
2020-10-28 16:32:32 -07:00
Jay Foad 5b91a6a88b [AMDGPU] Allow some modifiers on VOP3B instructions
V_DIV_SCALE_F32/F64 are VOP3B encoded so they can't use the ABS src
modifier, but they can still use NEG and the usual output modifiers.

This partially reverts 3b99f12a4e "AMDGPU: Remove modifiers from v_div_scale_*".

Differential Revision: https://reviews.llvm.org/D90296
2020-10-28 21:54:14 +00:00
Jay Foad 50ee22d791 [AMDGPU] Fix double space in disassembly of SDWA instructions with vcc
Differential Revision: https://reviews.llvm.org/D90317
2020-10-28 21:39:39 +00:00
Florian Hahn 772aaa6023 [AArch64] Improve lowering of insert_vector_elt with 0.0 consts.
When moving +0.0 into a float vector, we can use the vi*gpr variants of
INS.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D90176
2020-10-28 21:35:33 +00:00
Austin Kerbow 8b127a8661 [AMDGPU] Fix inserting combined s_nop in bundles
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90334
2020-10-28 14:34:04 -07:00
Florian Hahn ba78cae20f [AArch64] Use DUP for BUILD_VECTOR with few different elements.
If most elements of BUILD_VECTOR are the same, with a few different
elements, it is better to use DUP  for the common elements and
INSERT_VECTOR_ELT for the different elements.

Currently this transform is guarded quite restrictively to only trigger
in clearly beneficial cases.

With D90176, patterns originating from code like
` float32x4_t y = {a,a,a,0};` (common in 3D apps) are lowered even
better (unnecessary fmov is removed).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D90233
2020-10-28 19:48:20 +00:00
Sanjay Patel 7c395f31a6 [CostModel][x86] remove cost-kind predicate for intrinsic costs
We model cost as number of instructions / uops, so it does not
make sense to treat size/blended costs any differently than
throughput.
2020-10-28 14:33:37 -04:00
Thomas Lively 31e944556f [WebAssembly] Prototype extending multiplication SIMD instructions
As proposed in https://github.com/WebAssembly/simd/pull/376. This commit
implements new builtin functions and intrinsics for these instructions, but does
not yet add them to wasm_simd128.h because they have not yet been merged to the
proposal. These are the first instructions with opcodes greater than 0xff, so
this commit updates the MC layer and disassembler to handle that correctly.

Differential Revision: https://reviews.llvm.org/D90253
2020-10-28 09:38:59 -07:00
Paul C. Anagnostopoulos 9d72065cf6 [TableGen] [AMDGPU] Add !sub operator for subtraction
Use it in the AMDGPU target to eliminate !add(value1, !mul(value2, -1))

Differential Revision: https://reviews.llvm.org/D90107
2020-10-28 12:27:53 -04:00
Jay Foad 9e634bc22f [AMDGPU] Omit needless string concatenations. NFC. 2020-10-28 12:56:52 +00:00
Kazushi (Jam) Marukawa cbdee7df06 [VE] Add vector merger operation instructions
Add VMRG/VSHF/VCP/VEX instructions.  Add regression tests too.
Also add new patterns to parse new UImm4 operand.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90292
2020-10-28 19:57:10 +09:00
Kazushi (Jam) Marukawa 7ce2b93cbe [VE] Add vector iterative operation instructions
Add VFIA/VFIS/VFIM/VFIAM/VFISM/VFIMA/VFIMS instructions.
Add regression tests too.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90252
2020-10-28 19:06:46 +09:00
Kazushi (Jam) Marukawa 15f6250bed [VE][NFC] Fix typo in comment 2020-10-28 18:51:07 +09:00
Kazushi (Jam) Marukawa b22e32a9c8 [VE] Specify to expand BRIND and BR_JT
BRIND and BR_JT are not implemented yet, so expand them for now.
Add regression tests too.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90283
2020-10-28 18:50:20 +09:00
David Green 066737fdbc [AArch64] Remove AArch64ISD::NOT, use vnot instead
vnot (xor -1) should be equivalent to the AArch64 specific AArch64ISD::NOT
node, but allows more folding thanks to all the target independent
optimizations. Specifically this allows select(icmp ne, x, y) to
become "cmeq; bsl y, x" as opposed to needing to convert the predicate
with "cmeq; mvn; bsl x, y"

Unfortunately there is a regression in a cmtst test, but the code it
selected from was already non-canonical, with instcombine preferring to
use an eq predicate instead. Plus the more common case of icmp ne is
improved.

Differential Revision: https://reviews.llvm.org/D90126
2020-10-28 08:15:37 +00:00
Carl Ritson 057934a6d7 [AMDGPU] Fix insert of SIPreAllocateWWMRegs in FastRegAlloc
SIPreAllocateWWMRegs was being inserted after RegisterCoalescer
but this pass does not exist during FastAlloc so the pre-allocation
pass was never run.
Insert pre-allocation after TwoAddressInstructionPass instead.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90236
2020-10-28 12:15:15 +09:00
Nemanja Ivanovic 5459d08795 [PowerPC] Fix single-use check and update chain users for ld-splat
When converting a BUILD_VECTOR or VECTOR_SHUFFLE to a splatting load
as of 1461fb6e78, we inaccurately check
for a single user of the load and neglect to update the users of the
output chain of the original load. As a result, we can emit a new
load when the original load is kept and the new load can be reordered
after a dependent store. This patch fixes those two issues.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47891
2020-10-27 16:49:38 -05:00
Stanislav Mekhanoshin 78ae1f6c90 [AMDGPU] Change predicate for fma/fmac legacy
I do not exactly like the use of a negative predicate to
enable instructions' support. Replace HasNoMadMacF32Insts
with HasFmaLegacy32.

Differential Revision: https://reviews.llvm.org/D90250
2020-10-27 12:03:52 -07:00
Victor Huang 2e1a737f46 [PowerPC][PCRelative] Turn on TLS support for PCRel by default
Turn on TLS support for PCRel by default and update the test cases.

Differential Revision: https://reviews.llvm.org/D88738
Reviewed by: stefanp, kamaub
2020-10-27 13:58:44 -05:00
Michael Liao 46c3d5cb05 [amdgpu] Add the late codegen preparation pass.
Summary:
- Teach that pass to widen naturally aligned but not DWORD aligned
  sub-DWORD loads.

Reviewers: rampitec, arsenm

Subscribers:

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80364
2020-10-27 14:07:59 -04:00
Kazushi (Jam) Marukawa a65883a78a [VE] Add vector reduction instructions
Add VSUMS/VSUMX/VFSUM/VMAXS/VMAXX/VFMAX/VRAND/VROR/VRXOR instructions.
Add regression tests too.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90227
2020-10-28 02:33:21 +09:00
Michael Liao 0d092303b4 [amdgpu] Enable use of AA during codegen.
- Add an internal option `-amdgpu-use-aa-in-codegen` to enable or
  disable this feature. By default, it's enabled.

Differential Revision: https://reviews.llvm.org/D89320
2020-10-27 09:46:23 -04:00
Benjamin Kramer 35f7cbf9df [X86] Don't crash on CVTPS2PH with wide vector inputs. 2020-10-27 14:42:02 +01:00
Kazushi (Jam) Marukawa c5fa6bae12 [VE] Add vector float instructions
Add VFAD/VFSB/VFMP/VFDV/VFSQRT/VFCP/VFCM/VFMAD/VFMSB/VFNMAD/VFNMSB/
VRCP/VRSQRT/VRSQRTNEX/VFIX/VFIXX/VFLT/VFLTX/VCVS/VCVD instructions.
Add regression tests too.  Also add additional AsmParser for VFIX
and VFIXX instructions to parse their mnemonic.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90166
2020-10-27 20:42:24 +09:00
Jay Foad 6539ebe97d [AMDGPU] Use DPP instead of Ext in a couple of class names. NFC. 2020-10-27 10:22:30 +00:00
Craig Topper f385823e04 [X86] Alternate implementation of D88194.
This uses PreprocessISelDAG to replace the constant before
instruction selection instead of matching opcodes after.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D89178
2020-10-27 00:20:03 -07:00
Wei Wang d602e79a81 [X86] Encode global address in small code model
In the small code model, a program and its symbols are linked in the lower 2 GB of
the address space. Try encoding the global address even when the range is unknown
in such cases.

Differential Revision: https://reviews.llvm.org/D89341
2020-10-26 23:14:06 -07:00
Bing1 Yu 2c08f1b4b6 [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not all indices are demanded...
In each 128-lane, if at least one index is demanded, not all indices are
demanded, and this 128-lane is not the first 128-lane of the
legalized-vector, then this 128-lane needs an extracti128;
if at least one index in a 128-lane is demanded, this 128-lane
needs an inserti128.

The following cases illustrate the rule.
Assume we insert several elements into a v8i32 vector in avx2:
Case#1: inserting into index 1 needs vpinsrd + inserti128
Case#2: inserting into index 5 needs extracti128 + vpinsrd +
inserti128
Case#3: inserting into indices 4,5,6,7 needs 4*vpinsrd + inserti128.

Reviewed By: pengfei, RKSimon

Differential Revision: https://reviews.llvm.org/D89767
2020-10-27 11:21:13 +08:00
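The per-lane rule described in 2c08f1b4b6 above can be sketched as a small counting function. This is only an illustration of the rule as worded in the message, not the TTI implementation; `LaneOps`, `countLaneOps` and `demandedMask` are hypothetical names, and the sketch counts operations rather than assigning costs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

struct LaneOps {
  unsigned Extracts = 0;       // extracti128 count
  unsigned Inserts = 0;        // inserti128 count
  unsigned ElementInserts = 0; // vpinsrd count
};

// Builds a demanded-elements mask of NumElts entries from a list of indices.
inline std::vector<bool> demandedMask(std::size_t NumElts,
                                      std::initializer_list<std::size_t> Idx) {
  std::vector<bool> Mask(NumElts, false);
  for (std::size_t I : Idx)
    Mask[I] = true;
  return Mask;
}

// Applies the rule lane by lane: a partially demanded lane other than the
// first needs an extract; any lane with a demanded element needs an insert.
inline LaneOps countLaneOps(const std::vector<bool> &Demanded,
                            std::size_t EltsPerLane) {
  assert(EltsPerLane > 0 && "a lane must hold at least one element");
  LaneOps Ops;
  for (std::size_t Base = 0; Base < Demanded.size(); Base += EltsPerLane) {
    const std::size_t End = std::min(Base + EltsPerLane, Demanded.size());
    unsigned NumDemanded = 0;
    for (std::size_t I = Base; I < End; ++I)
      NumDemanded += Demanded[I] ? 1 : 0;
    if (NumDemanded == 0)
      continue; // untouched lane: nothing to do
    const bool AllDemanded = NumDemanded == End - Base;
    if (!AllDemanded && Base != 0)
      ++Ops.Extracts;
    ++Ops.Inserts;
    Ops.ElementInserts += NumDemanded;
  }
  return Ops;
}
```

For a v8i32 vector (four elements per 128-bit lane), `countLaneOps(demandedMask(8, {5}), 4)` yields one extract, one lane insert and one element insert, matching Case#2 in the message.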
Chen Zheng 00e573cadb [LSR] fix typo in comments and rename for a new added hook. 2020-10-26 22:29:22 -04:00
Carl Ritson 7a880ab388 [AMDGPU] Move WQM Pass after MI Scheduler
Exec mask manipulations inserted by SIWholeQuadMode act as barriers to
instruction scheduling.  Move the entire pass after the machine
instruction scheduler and make changes so the pass is correct for
non-SSA operation.  These changes should leave the pass still
usable pre-scheduler, although tests have been updated to reflect
post-scheduler results.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D88081
2020-10-27 10:25:53 +09:00
Amy Kwan 803cc3aff2 [PowerPC] Implement Set Boolean Condition Instructions
This patch implements the set boolean condition instructions introduced in
POWER10.

The set boolean condition instructions (set[n]bc[r]) are used during
the following situations:
- sign/zero/any extending i1 to an i32 or i64,
- reg+reg, reg+imm or floating point comparisons being sign/zero extended to i32 or i64,
- spilling CR bits (using the setnbc instruction)

Differential Revision: https://reviews.llvm.org/D87705
2020-10-26 18:42:51 -05:00
Stanislav Mekhanoshin d176e13ca5 Fixed release build after D89170 2020-10-26 16:00:57 -07:00
Stanislav Mekhanoshin 038d884a50 [AMDGPU] Use flat scratch instructions where available
The support is disabled by default. So far there is instruction
selection, spilling, and frame elimination. It also changes SP
from unswizzled to swizzled as used by flat scratch instructions,
so it cannot be mixed with MUBUF stack access.

At the very least missing:

- GlobalISel;
- Some optimizations in frame elimination in between vector
  and scalar ALU;
- It shall finally allow to always materialize frame index
  as an SGPR, but that is not implemented and frame elimination
  cannot handle it yet;
- Unaligned and/or multidword flat scratch shall work, but it
  is legalized now for MUBUF;
- Operand folding cannot optimize FI like with MUBUF yet;
- It will need scaling the value of the SP/FP in the DWARF
  expression to recover the unswizzled scratch address;

Differential Revision: https://reviews.llvm.org/D89170
2020-10-26 14:40:42 -07:00
Evgeny Leviant a28388f95b [ARM][SchedModels] Move IsLDMBaseRegInListPred to ARMSchedule.td. NFC
This predicate is not specific to cortex-a57 and can be used in other processor
models as well.
2020-10-26 22:31:41 +03:00
Stanislav Mekhanoshin ad8131bb03 [AMDGPU] Fix VC warning about signed/unsigned comparison. NFC.
This is the warning reported in https://reviews.llvm.org/D89599
2020-10-26 11:55:57 -07:00
Evgeny Leviant e74f66125e [ARM][SchedModels] Convert IsLdstsoScaledNotOptimalPred to MCSchedPredicate
Differential revision: https://reviews.llvm.org/D90150
2020-10-26 20:22:41 +03:00
Evgeny Leviant a877bda397 Fix issue in cortex-a57 sched model
Differential revision: https://reviews.llvm.org/D90152
2020-10-26 20:16:40 +03:00
Benjamin Kramer b777d30496 [AMDGPU] Avoid unused variable warning in Release builds. NFC.
SIRegisterInfo.cpp:480:19: error: unused variable 'SOffset'
2020-10-26 18:11:57 +01:00
Kazushi (Jam) Marukawa 9d0db405b5 [VE] Add vector shift instructions
Add VSLL/VSLD/VSRL/VSLA/VSLAX/VSRA/VSRAX/VSFA instructions.  Add
additional AsmParser for VSLD special operand.  Also add regression
tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90143
2020-10-27 00:30:27 +09:00
Kazushi (Jam) Marukawa 83cb423c6e [VE] Add vector logical instructions
Add VAND/VOR/VXOR/VEQV/VLDZ/VPCNT/VBRV/VSEQ instructions and regression
tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90141
2020-10-27 00:29:33 +09:00
Kazushi (Jam) Marukawa cfefef50c1 [VE] Support atomic store
Support atomic store instructions and add a regression test.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90137
2020-10-27 00:28:11 +09:00
Jay Foad 0ca4124798 [AMDGPU] Make more use of printNamedBit in AMDGPUInstPrinter. NFC. 2020-10-26 14:03:35 +00:00
Kazushi (Jam) Marukawa 8aa60f67dc [VE] Add vector comparison and min/max
Add VCMP/VCPS/VCPX/VCMS/VCMX vector instructions.  Also add regression
tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89643
2020-10-26 18:32:04 +09:00
Kazushi (Jam) Marukawa 0acf700243 [VE] Add integer arithmetic vector instructions
Add VADD/VADS/VADX/VSUB/VSBS/VSBX/VMPY/VMPS/VMPX/VMPD/VDIV/VDVS/VDVX
instructions.  Also add regression tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89642
2020-10-26 18:30:11 +09:00
Sebastian Neubauer a094b4fa4b [AMDGPU] Emit new pal metadata by default
If no pal metadata is given, default to the msgpack format instead of
the legacy metadata. This makes tests more readable.

Differential Revision: https://reviews.llvm.org/D90035
2020-10-26 10:16:17 +01:00
Evgeny Leviant a95ce5f65f [ARM][SchedModels] Rename and generalize predicate. NFC 2020-10-26 12:14:55 +03:00