Commit Graph

53908 Commits

Roman Lebedev cc7495a355 [X86][CodeGen][NFC] Delay `combineIncDecVector()` from DAGCombine to X86DAGToDAGISel
Summary:
We were previously doing it in DAGCombine.
But we also want to do `sub %x, C` -> `add %x, (sub 0, C)` for vectors in DAGCombine.
So if we had `sub %x, -1`, we'll transform it to `add %x, 1`,
which `combineIncDecVector()` will immediately transform back into `sub %x, -1`,
and here we go again...
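
A toy illustration of that ping-pong (generic C++, not LLVM code; a real combiner's worklist would never drain once two inverse rewrites coexist):

```
#include <cstdio>

// A node modeled as "op x, C": {'-', -1} stands for (sub x, -1).
struct Node { char Op; int C; };

// Rule A (generic canonicalization): sub x, C -> add x, -C.
bool ruleA(Node &N) {
  if (N.Op != '-') return false;
  N.Op = '+'; N.C = -N.C; return true;
}

// Rule B (the old combineIncDecVector behavior): add x, 1 -> sub x, -1.
bool ruleB(Node &N) {
  if (N.Op != '+' || N.C != 1) return false;
  N.Op = '-'; N.C = -1; return true;
}

int main() {
  Node N{'-', -1};
  for (int I = 0; I < 6; ++I) { // capped; the real loop would not terminate
    if (!ruleA(N) && !ruleB(N)) break;
    std::printf("step %d: %c %d\n", I, N.Op, N.C);
  }
}
```

Moving rule B to X86DAGToDAGISel breaks the cycle because ISel runs after DAGCombine has reached its fixed point.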

I've marked this as NFC since not a single test changes,
but since this 'changes' DAGCombine, it probably isn't fully NFC.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62327

llvm-svn: 370327
2019-08-29 10:50:09 +00:00
Craig Topper c96284002e [X86] Remove isel patterns with X86VBroadcast+scalar_to_vector+load.
The DAG should have these as X86VBroadcast+load.

llvm-svn: 370299
2019-08-29 06:36:16 +00:00
Craig Topper 231e628d69 [X86] Remove some unneeded X86VBroadcast isel patterns that have larger than 128 bit input types.
We should always be shrinking the input to 128 bits or smaller
when the node is created.

llvm-svn: 370296
2019-08-29 06:02:11 +00:00
Craig Topper 1ec5c204b8 [X86] Add a DAG combine to combine INSERTPS and VBROADCAST of a scalar load. Remove corresponding isel patterns.
We had an isel pattern to perform this, but it's better to
do it in DAG combine as a simplification. This also fixes the lack
of patterns for AVX512 targets.

llvm-svn: 370294
2019-08-29 05:48:48 +00:00
Craig Topper 1aadf6f39f [X86] Make inline assembly 'x' and 'v' constraints work for f128.
Including a type legalizer fix to make bitcast operand promotion
work correctly when getSoftenedFloat returns f128 instead of i128.

Fixes PR43157

llvm-svn: 370293
2019-08-29 05:13:56 +00:00
Matt Arsenault 216d8ff60b AMDGPU: Don't use frame virtual registers
SGPR spills aren't really handled after SILowerSGPRSpills. In order to
directly control what happens if the scavenger needs to spill, the
scavenger needs to be used directly. There is an alternative to
spilling in these contexts anyway since the frame register can be
incremented and restored.

This does present another possible issue if spilling is needed for the
unused carry-out when an add is needed. I think this can be avoided by
using a scalar add (although that clobbers SCC, which happens anyway).

llvm-svn: 370281
2019-08-29 01:13:47 +00:00
Matt Arsenault 8ec5c10042 GlobalISel/TableGen: Handle setcc patterns
This is a special case because one node maps to two different G_
instructions, and the operand order is changed.

This mostly enables G_FCMP for AMDGPU. G_ICMP is still manually
selected for now since it has the SALU and VALU complication to deal
with.

llvm-svn: 370280
2019-08-29 01:13:41 +00:00
Craig Topper af364131af [X86] Fix a couple isel patterns to not shrink a volatile load.
Also add a FIXME because I'm not sure why these patterns exist. Looks like a missing combine.

And another FIXME because the AVX512 equivalent of one of the patterns is missing.

llvm-svn: 370276
2019-08-28 23:45:10 +00:00
Shiva Chen b39876d8cd [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall
The patch fixes the issue that RV64 didn't clear the upper bits
when returning a complex floating-point value with the LP64 ABI.

float _Complex
complex_add(float _Complex a, float _Complex b)
{
   return a + b;
}

RealResult = zero_extend(RealA + RealB)
ImageResult = ImageA + ImageB
Return (RealResult | (ImageResult << 32))

The patch introduces the shouldExtendTypeInLibCall target hook to suppress
AssertZext generation when lowering floating-point libcalls.
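
A minimal sketch of the hook (the exact RISC-V implementation details here are assumptions, not the verbatim patch):

```
// Sketch: keep f32 libcall results unextended on RV64 so that no AssertZext
// is attached to a value whose upper 32 bits are not guaranteed to be zero.
bool RISCVTargetLowering::shouldExtendTypeInLibCall(EVT Type) const {
  // Under LP64, an f32 travels in the low half of an i64 GPR; the upper
  // bits are unspecified by the soft-float libcall. (Assumed logic.)
  if (Subtarget.is64Bit() && Type == MVT::f32)
    return false;
  return true;
}
```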

Thanks to Eli's comments from the Bugzilla
https://bugs.llvm.org/show_bug.cgi?id=42820

Differential Revision: https://reviews.llvm.org/D65497

llvm-svn: 370275
2019-08-28 23:40:37 +00:00
Heejin Ahn d85fd5a3f4 [WebAssembly] Add atomic.fence instruction
Summary:
This adds `atomic.fence` instruction:
https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md#fence-operator

And we now emit the new `atomic.fence` instruction for multithread
fences, rather than the previous `atomic.rmw` hack.
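
For illustration (a hypothetical example, not from the patch): a sequentially consistent C++ fence compiled for a threaded wasm target now lowers to `atomic.fence`:

```
#include <atomic>

// With threads enabled, the seq_cst fence below lowers to the new wasm
// `atomic.fence` instruction rather than an idempotent atomic.rmw.
void publish(int *data, std::atomic<bool> &ready) {
  *data = 42;
  std::atomic_thread_fence(std::memory_order_seq_cst);
  ready.store(true, std::memory_order_relaxed);
}
```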

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, tlively, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66794

llvm-svn: 370272
2019-08-28 23:13:43 +00:00
Simon Atanasyan 027f1da010 [mips] Add an empty line to separate different patterns. NFC
llvm-svn: 370269
2019-08-28 22:32:16 +00:00
Simon Atanasyan 59bb3609fa [mips] Fix 64-bit address loading in case of applying 32-bit mask to the result
If the result of a 64-bit address load combines with a 32-bit mask, LLVM
tries to optimize the code and remove the "redundant" load of the upper
32 bits of the address. This leads to incorrect code on MIPS64 targets.

The MIPS backend creates the following chain of commands to load a 64-bit
address in the `MipsTargetLowering::getAddrNonPICSym64` method:
```
(add (shl (add (shl (add %highest(sym), %higher(sym)),
                    16),
               %hi(sym)),
          16),
     %lo(%sym))
```

If the mask is present, LLVM decides to optimize the chain of commands. It
really does not make sense to load the upper 32 bits because the 0x0fffffff
mask clears them anyway. After removing the redundant commands we get this
chain:
```
(add (shl %hi(sym), 16), %lo(%sym))
```

There is no pattern matching `(MipsHi (i64 symbol))`. Due to a bug in the `SYM_32`
predicate definition, the backend incorrectly selects a pattern for 32-bit
symbols and uses the `lui` instruction for loading `%hi(sym)`.

As a result we get an incorrect set of instructions with unnecessary 16-bit
left shifting:
```
lui     at,0x0
    R_MIPS_HI16     foo
dsll    at,at,0x10
daddiu  at,at,0
    R_MIPS_LO16     foo
```

This patch resolves two problems:
- Fix the `SYM_32/SYM_64` predicates to prevent selection of patterns dedicated
  to 32-bit symbols when using the N64 ABI.
- Add missing patterns for 64-bit symbols for `%hi/%lo`.

Fix PR42736.

Differential Revision: https://reviews.llvm.org/D66228

llvm-svn: 370268
2019-08-28 22:32:10 +00:00
Scott Linder 04f6f25421 [AMDGPU] Fix bug when calculating user_sgpr_count for Code Object V3 assembler
Stop counting explicitly disabled user_sgprs in the user_sgpr_count field of the kernel descriptor.

Differential Revision: https://reviews.llvm.org/D66900

llvm-svn: 370250
2019-08-28 19:38:15 +00:00
Jessica Paquette af0bd41e06 [AArch64][GlobalISel] Fall back when translating musttail calls
These are currently translated as normal function calls in AArch64.

Until we have proper tail call lowering, we shouldn't translate these.

Differential Revision: https://reviews.llvm.org/D66842

llvm-svn: 370225
2019-08-28 16:19:01 +00:00
Ryan Taylor 3b1459ed7c [AMDGPU] Adjust number of SGPRs available in Calling Convention
This reduces the number of SGPRs available to the calling convention,
due to concerns about running out of SGPRs if every SGPR that isn't
reserved were made available.

Change-Id: Idb4ca4dc72f5b6808cb524ff7270915a8de5b4c1
llvm-svn: 370215
2019-08-28 15:00:45 +00:00
Hans Wennborg cff90f07cb [SelectionDAG] Don't generate libcalls for wide shifts on Windows (PR42711)
Neither libgcc nor compiler-rt is usually used on Windows, so these
functions can't be called.
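
For example (hypothetical source; `shl128` is an illustrative name), a variable 128-bit shift like the one below could previously lower to a runtime call such as `__ashlti3`, a symbol that has no implementation to link against in a typical Windows environment:

```
// Without libgcc/compiler-rt at link time, a libcall-based lowering of this
// shift produces an unresolved symbol on Windows.
__int128 shl128(__int128 V, int N) { return V << N; }
```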

Differential revision: https://reviews.llvm.org/D66880

llvm-svn: 370204
2019-08-28 13:55:10 +00:00
Simon Atanasyan f46ba4f077 [mips] Use less registers to load address of TargetExternalSymbol
There is no pattern matching `add hi, (MipsLo texternalsym)`. As a result,
loading the address of a 32-bit symbol requires two registers and one
additional instruction:
```
addiu $1, $zero, %lo(foo)
lui   $2, %hi(foo)
addu  $25, $2, $1
```

This patch adds the missing pattern and enables generation of a more
efficient set of instructions:
```
lui   $1, %hi(foo)
addiu $25, $1, %lo(foo)
```

Differential Revision: https://reviews.llvm.org/D66771

llvm-svn: 370196
2019-08-28 12:35:53 +00:00
Amaury Sechet 4f4387dd12 [TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles
Summary: There are at least two ways to express the same shuffle. Various pieces of code explicitly check for both options, but other places do not when they would benefit from doing so. This patch refactors the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent.
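
A sketch of the intended call pattern (a hypothetical caller; the helper's signature follows this patch):

```
// Try to build a shuffle. buildLegalVectorShuffle also checks the commuted
// mask/operands, and returns an empty SDValue if neither form is legal, so
// the caller can simply bail out.
SDValue tryBuildShuffle(SelectionDAG &DAG, const TargetLowering &TLI, EVT VT,
                        const SDLoc &DL, SDValue N0, SDValue N1,
                        MutableArrayRef<int> Mask) {
  if (SDValue Shuf = TLI.buildLegalVectorShuffle(VT, DL, N0, N1, Mask, DAG))
    return Shuf;
  return SDValue(); // neither the mask nor its commutation is legal
}
```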

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66804

llvm-svn: 370190
2019-08-28 12:00:06 +00:00
David Green 379f6186dd [ARM] Move MVEVPTBlockPass to a separate file. NFC
This just pulls the MVEVPTBlockPass into a separate file, as opposed to being
wrapped up in Thumb2ITBlockPass.

Differential revision: https://reviews.llvm.org/D66579

llvm-svn: 370187
2019-08-28 11:37:31 +00:00
David Green 1c5b143c99 [MVE] VMOVX patterns
This adds fp16 VMOVX patterns, using the same patterns as rL362482 with some
adjustments for MVE. It allows us to move fp16 registers without going into and
out of gprs.

VMOVX is able to move the top bits from a fp16 in a fp reg into the bottom bits
of another register, zeroing the rest. This can be used for odd MVE register
lanes. The top bits are not read by fp16 instructions, so no move is required
there if we are dealing with even lanes.

Differential revision: https://reviews.llvm.org/D66793

llvm-svn: 370184
2019-08-28 10:13:23 +00:00
Sam Parker a761ba0f2d [ARM][ParallelDSP] Change search for muls
rL369567 reverted a couple of recent changes made to ARMParallelDSP
because of a miscompilation error: PR43073.

The issue stemmed from an underlying bug that was caused by adding
muls into a reduction before it was proved that they could be executed
in parallel with another mul.

Most of the changes here are from the previously reverted commits.
The additional changes that have been made are:
1) The Search function now doesn't insert any muls into the Reduction
   object. That now happens once the search has successfully finished.
2) For any muls added into the reduction but that weren't paired, we
   accumulate their values as an input into the smlad.

Differential Revision: https://reviews.llvm.org/D66660

llvm-svn: 370171
2019-08-28 08:51:13 +00:00
Matt Arsenault a8bbcbd006 AMDGPU/GlobalISel: Fix constraining scalar and/or/xor
If the result register already had a register class assigned, the
sources may not have been properly constrained.

llvm-svn: 370150
2019-08-28 02:11:03 +00:00
Vlad Tsyrklevich 57076d3199 Revert "Change the X86 datalayout to add three address spaces for 32 bit signed,"
This reverts commit r370083 because it caused check-lld failures on
sanitizer-x86_64-linux-fast.

llvm-svn: 370142
2019-08-28 01:08:54 +00:00
Matt Arsenault 5c7e96dc26 AMDGPU/GlobalISel: Implement addrspacecast for 32-bit constant addrspace
llvm-svn: 370140
2019-08-28 00:58:24 +00:00
Luis Marques c894c6c983 [RISCV] Implement RISCVRegisterInfo::getPointerRegClass
Fixes bug 43041

Differential Revision: https://reviews.llvm.org/D66752

llvm-svn: 370113
2019-08-27 21:37:57 +00:00
Amara Emerson e20b91c265 [GlobalISel] Replace hard coded dynamic alloca handling with G_DYN_STACKALLOC.
This change moves the actual stack pointer manipulation into the legalizer,
available to targets via lower(). The codegen is slightly different because
we're using explicit masks instead of G_PTRMASK, and using G_SUB rather than
adding a negative amount via G_GEP.
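
Roughly, the lowered sequence has this shape (a simplified sketch with assumed builder calls and caller-supplied types/registers, not the verbatim legalizer code):

```
// Lower G_DYN_STACKALLOC: adjust SP with a sub, align with an explicit mask.
void lowerDynAllocaSketch(MachineIRBuilder &MIRBuilder, Register SPReg,
                          LLT PtrTy, LLT IntPtrTy, Register AllocSize,
                          uint64_t Alignment, Register Dst) {
  auto SPTmp = MIRBuilder.buildCopy(PtrTy, SPReg);
  auto SPInt = MIRBuilder.buildPtrToInt(IntPtrTy, SPTmp);
  // The stack grows down: subtract the size, then clear the low bits with
  // an explicit mask (instead of G_PTRMASK, per the description above).
  auto Alloc = MIRBuilder.buildSub(IntPtrTy, SPInt, AllocSize);
  auto Mask = MIRBuilder.buildConstant(IntPtrTy, ~(Alignment - 1));
  auto Aligned = MIRBuilder.buildAnd(IntPtrTy, Alloc, Mask);
  auto NewSP = MIRBuilder.buildIntToPtr(PtrTy, Aligned);
  MIRBuilder.buildCopy(SPReg, NewSP); // commit the adjusted SP
  MIRBuilder.buildCopy(Dst, NewSP);   // the allocation's address
}
```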

Differential Revision: https://reviews.llvm.org/D66678

llvm-svn: 370104
2019-08-27 19:54:27 +00:00
Matt Arsenault ff07631b48 AMDGPU: Add amdgpu-32bit-address-high-bits to MIR serialization
llvm-svn: 370089
2019-08-27 18:18:38 +00:00
Matt Arsenault 0c096da02f AMDGPU: Fix crash from inconsistent register types for v3i16/v3f16
This is something of a workaround since computeRegisterProperties
seems to be doing the wrong thing.

llvm-svn: 370086
2019-08-27 17:51:56 +00:00
Amy Huang 1299945b81 Change the X86 datalayout to add three address spaces for 32 bit signed,
32 bit unsigned, and 64 bit pointers.

llvm-svn: 370083
2019-08-27 17:46:53 +00:00
Craig Topper fc1f08c2f2 [X86] Remove encoding information from the TAILJMP instructions that are lowered by MCInstLowering. Fix LowerPATCHABLE_TAIL_CALL to also convert them to regular JMP/JCC instructions
There are 5 instructions here that are converted from TAILJMP opcodes to regular JMP/JCC opcodes during MCInstLowering, so normally their encoding information isn't used. The exception is when XRay wraps them in PATCHABLE_TAIL_CALL.

For the ones that weren't already handled in MCInstLowering, add handling for those and remove their encoding information.

This patch fixes PATCHABLE_TAIL_CALL to do the same opcode conversion as the regular lowering path, then removes the encoding information.

Differential Revision: https://reviews.llvm.org/D66561

llvm-svn: 370079
2019-08-27 17:24:23 +00:00
Petar Avramovic 4a2a653288 [MIPS GlobalISel] ClampScalar G_SHL, G_ASHR and G_LSHR
ClampScalar G_SHL, G_ASHR and G_LSHR to s32 for MIPS32.

Differential Revision: https://reviews.llvm.org/D66533

llvm-svn: 370067
2019-08-27 14:41:44 +00:00
Simon Pilgrim 8912e2af39 [X86][AVX] Add SimplifyDemandedVectorElts support for KSHIFTL/KSHIFTR
Differential Revision: https://reviews.llvm.org/D66527

llvm-svn: 370055
2019-08-27 13:13:17 +00:00
Tim Northover a7f226f9db AArch64: avoid creating cycle in DAG for post-increment NEON ops.
Inserting a value into Visited has the effect of terminating a search for
predecessors if that node is seen. This is legitimate for the base address, and
acts as a slight performance optimization, but the vector-building node can be
part of a legitimate cycle so we shouldn't stop searching there.
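
A generic illustration of the pitfall (toy code, not the actual DAG walk): pre-seeding the visited set makes the search treat that node as a dead end, which is only sound if the node can never sit on a cycle back to the start.

```
#include <set>
#include <vector>

// Depth-first reachability over an explicit successor list. Any node already
// present in Visited silently terminates that branch of the walk.
bool reaches(int From, int To, const std::vector<std::vector<int>> &Succ,
             std::set<int> &Visited) {
  if (From == To)
    return true;
  if (!Visited.insert(From).second)
    return false; // pre-seeded or already explored: search stops here
  for (int S : Succ[From])
    if (reaches(S, To, Succ, Visited))
      return true;
  return false;
}
```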

PR43056.

llvm-svn: 370036
2019-08-27 10:21:11 +00:00
Pengfei Wang 564fb58a32 [WinEH] Allocate space in funclets stack to save XMM CSRs
Summary:
This is an alternate approach to D63396

Currently funclets reuse the same stack slots that are used in the
parent function for saving callee-saved xmm registers. If the parent
function modifies a callee-saved xmm register before an excpetion is
thrown, the catch handler will overwrite the original saved value.

This patch allocates space in funclets stack for saving callee-saved xmm
registers and uses RSP instead RBP to access memory.

Signed-off-by: Pengfei Wang <pengfei.wang@intel.com>

Reviewers: rnk, RKSimon, craig.topper, annita.zhang, LuoYuanke, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66596

Signed-off-by: Pengfei Wang <pengfei.wang@intel.com>
llvm-svn: 370005
2019-08-27 01:53:24 +00:00
Matt Arsenault 0a6564980b AMDGPU: Combine directly on mul24 intrinsics
The problem these are supposed to work around can occur before the
intrinsics are lowered into the nodes. Try to directly simplify them
so they are matched before the bit assert operations can be optimized
out.

llvm-svn: 369994
2019-08-27 00:18:09 +00:00
Matt Arsenault 3b95986a32 AMDGPU: Run AMDGPUCodeGenPrepare after scalar opts
The mul24 matching could interfere with SLSR and the other addressing
mode related passes. This probably is not the optimal placement, but
is an intermediate step. This should probably be moved after all the
generic IR passes, particularly LSR. Moving this after LSR seems to
help in some cases, and hurts others.

As-is in this patch, in idiv-licm, it saves 1-2 instructions inside
some of the loop bodies, but increases the number in others. Moving
this later helps these loops. In the new lsr tests in
mul24-pass-ordering, the intrinsic prevents introducing more
instructions in the loop preheader, so moving this later ends up
hurting them. This shouldn't be any worse than before the intrinsics
were introduced in r366094, and LSR should probably be smarter. I
think it's because it doesn't know the and inside the loop will be
folded away.

llvm-svn: 369991
2019-08-27 00:08:31 +00:00
Simon Atanasyan d5918edf0d [mips] Fix indentation. NFC
llvm-svn: 369983
2019-08-26 22:40:34 +00:00
Simon Atanasyan ac64924a55 [mips] clang-format the code. NFC
llvm-svn: 369982
2019-08-26 22:40:28 +00:00
Craig Topper 6db7f492d9 [X86] Delay combineIncDecVector until after op legalization.
Probably better to keep add over sub in early DAG combines.

It might make sense to push this to lowering or delay it all
the way to isel. But this was the simplest change.

llvm-svn: 369981
2019-08-26 22:17:54 +00:00
Heejin Ahn 173a3a54bb [WebAssembly] Fix SSA rebuilding in SjLj transformation
Summary:
Previously we skipped uses within the same BB as a def when rebuilding
SSA after SjLj transformation. For example, before transformation,
```
for.cond:
  %0 = phi i32 [ %var, %for.inc ] ...
  %var = ...
  br label %for.inc

for.inc:                               ; preds = %for.cond
  call i32 @setjmp(...)
br label %for.cond
```

In this BB, %var should be defined in all paths from %for.inc to make %0
valid. In the input it was true; %for.inc's only predecessor was
%for.cond. But after SjLj transformation, it is possible that %for.inc
has other predecessors that are reachable without reaching %for.cond.
```
entry.split:
  ...
  br i1 %a, label %bb.1, label %for.inc

for.cond:
  %0 = phi i32 [ %var, %for.inc ] ...  ; Not valid!
  %var = ...
  br label %for.inc

for.inc:                               ; preds = %for.cond, %entry.split
  call i32 @setjmp(...)
  ...
br label %for.cond
```

In this case, we can't use %var in the `phi` instruction in %for.cond,
because %var is not defined in all paths through %for.inc (If the
control flow is %entry -> %entry.split -> %for.inc -> %for.cond, %var
has not been defined until we reach the `phi`). But the previous code
excluded users within the same BB as the def, so those instructions were
not rewritten properly. User instructions within the same BB should also
be candidates for rewriting if they are _before_ the original definition.

Fixes PR43097.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66729

llvm-svn: 369978
2019-08-26 21:51:35 +00:00
Roland Froese 18db4e9ae1 Recommit [PowerPC] Update P9 vector costs for insert/extract
Now that the v1i128 smin regression has been fixed, recommit the P9 cost
updates from D60160.

llvm-svn: 369952
2019-08-26 19:26:08 +00:00
Krzysztof Parzyszek 9e0feaf562 [Hexagon] Improve generated code for test-if-bit-clear
llvm-svn: 369947
2019-08-26 19:08:08 +00:00
Craig Topper 36d1588f01 [X86] Add a hack to combinePMULDQ to manually turn SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG inputs into an ANY_EXTEND_VECTOR_INREG style shuffle
ANY_EXTEND_VECTOR_INREG isn't currently marked Legal which prevents SimplifyDemandedBits from turning SIGN/ZERO_EXTEND_VECTOR_INREG into it after op legalization. And even if we did make it Legal, combineExtInVec doesn't do shuffle combining on the VECTOR_INREG nodes until AVX1.

This patch adds a quick hack to combinePMULDQ to directly emit a vector shuffle corresponding to an ANY_EXTEND_VECTOR_INREG operation. This avoids both of those issues without creating any other regressions on our tests. The xop-ifma.ll change here also showed up when I tried to resurrect D56306 and seemed to be the only improvement that patch creates now. This is a more direct way to get the benefit.

Differential Revision: https://reviews.llvm.org/D66436

llvm-svn: 369942
2019-08-26 18:23:26 +00:00
Craig Topper b8b90ac1c5 [X86][DAGCombiner] Teach narrowShuffle to use concat_vectors instead of inserting into undef
Summary:
Concat_vectors is more canonical during early DAG combine. For example, it's what's used by SelectionDAGBuilder when converting IR shuffles into SelectionDAG shuffles when element counts between inputs and mask don't match. We also have combines in DAGCombiner that can pull concat_vectors through a shuffle. See partitionShuffleOfConcats. So it seems like concat_vectors is a better operation to use here. I had to teach DAGCombiner's SimplifyVBinOp to also handle concat_vectors with undef. I haven't checked yet if we can remove the INSERT_SUBVECTOR version in there or not.

I didn't want to mess with the other caller of getShuffleHalfVectors that's used during shuffle lowering, where insert_subvector probably is what we want to produce, so I've enabled this via a boolean passed to the function.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66504

llvm-svn: 369872
2019-08-25 17:59:49 +00:00
Xing Xue ef039a3ccd [PowerPC][AIX] Adds support for writing the .data section in assembly files
Summary:
Adds support for generating the .data section in assembly files for global variables with a non-zero initialization. The support for writing the .data section in XCOFF object files will be added in a follow-on patch. Relocations are not included in this patch.

Reviewers: hubert.reinterpretcast, sfertile, jasonliu, daltenty, Xiangling_L

Reviewed by: hubert.reinterpretcast

Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, wuzish, shchenz, DiggerLin, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66154

llvm-svn: 369869
2019-08-25 15:17:25 +00:00
Benjamin Kramer 55e8c91dd5 [AMDGPU] Downgrade from StringLiteral to const char* in an attempt to make GCC 5 happy
llvm-svn: 369867
2019-08-25 12:47:31 +00:00
Craig Topper 1abe162a9a [X86] Teach -Os immediate sharing code to not count constant uses that will become INC/DEC.
INC/DEC don't use an immediate so we don't need to count it. We
also shouldn't use the custom isel for it.

Fixes PR42998.

llvm-svn: 369863
2019-08-25 05:22:40 +00:00
Craig Topper cc4b0596b1 [X86] Add isel patterns to match vpdpwssd avx512vnni instruction from add+pmaddwd nodes.
llvm-svn: 369859
2019-08-24 23:14:57 +00:00
Matt Arsenault c6ab2b4fed AMDGPU: Preserve value name when inserting mul24 intrinsic
llvm-svn: 369857
2019-08-24 22:17:10 +00:00
Matt Arsenault b3dd381a73 AMDGPU: Introduce a flag to disable mul24 intrinsic formation
llvm-svn: 369856
2019-08-24 22:14:41 +00:00
Benjamin Kramer 7043477042 Fix some accidental global initializers by using StringLiteral instead of StringRef
llvm-svn: 369850
2019-08-24 15:24:25 +00:00
Benjamin Kramer 16b322914a Use a bit of relaxed constexpr to make FeatureBitset costant intializable
This requires std::initializer_list to be a literal type, which it is
starting with C++14. The downside is that std::bitset is still not
constexpr-friendly so this change contains a re-implementation of most
of it.

Shrinks clang by ~60k.
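
The core trick looks roughly like this (a simplified sketch, not LLVM's actual FeatureBitset):

```
#include <cstdint>
#include <initializer_list>

// A hand-rolled bitset whose constructor is constexpr under C++14, so tables
// of these can be constant-initialized instead of running global ctors.
struct ConstexprBitset {
  uint64_t Words[4] = {};
  constexpr ConstexprBitset(std::initializer_list<unsigned> Bits) {
    for (unsigned B : Bits)
      Words[B / 64] |= uint64_t(1) << (B % 64);
  }
  constexpr bool test(unsigned B) const {
    return (Words[B / 64] >> (B % 64)) & 1;
  }
};

// Evaluated at compile time; no runtime initializer is emitted.
constexpr ConstexprBitset Features = {1, 7, 42};
static_assert(Features.test(7), "bit 7 must be set");
```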

llvm-svn: 369847
2019-08-24 15:02:44 +00:00
Craig Topper dd2cf78381 [X86] Add an assert to mark more code that needs to be removed when the vector widening legalization switch is removed again.
llvm-svn: 369837
2019-08-24 05:59:46 +00:00
Guillaume Chatelet 0b6563e8a2 [LLVM][NFC] Removing unused functions
Summary: Removes a not-so-useful function from DataLayout and cleans up Support/MathExtras.h

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66691

llvm-svn: 369824
2019-08-23 23:19:25 +00:00
Stanislav Mekhanoshin b37d6a750a [AMDGPU] Check for immediate SrcC in mfma in AsmParser
Differential Revision: https://reviews.llvm.org/D66674

llvm-svn: 369819
2019-08-23 22:22:49 +00:00
Stanislav Mekhanoshin e6e1c4eac0 [AMDGPU] w/a for gfx908 mfma SrcC literal HW bug
gfx908 ignores an mfma if SrcC is a literal.

Differential Revision: https://reviews.llvm.org/D66670

llvm-svn: 369818
2019-08-23 22:22:29 +00:00
Stanislav Mekhanoshin 8fe1245a0f [AMDGPU] w/a for gfx908 mfma SrcC literal HW bug
gfx908 ignores an mfma if SrcC is a literal.

Differential Revision: https://reviews.llvm.org/D66670

llvm-svn: 369816
2019-08-23 22:09:58 +00:00
Jessica Paquette 83fe56b3b9 [AArch64][GlobalISel] Import XRO load/store patterns instead of custom selection
Instead of using custom C++ in `earlySelect` for loads and stores, just import
the patterns.

Remove `earlySelectLoad`, since we can just import the work it's doing.

Some minor changes to how `ComplexRendererFns` are returned for the XRO
addressing modes. If you add immediates in two steps, sometimes they are not
imported properly and you only end up with one immediate. I'm not sure if this
is intentional.

- Update load-addressing-modes.mir to include the instructions we can now
  import.

- Add a similar test, store-addressing-modes.mir to show which store opcodes we
  currently import, and show that we can pull in shifts etc.

- Update arm64-fastisel-gep-promote-before-add.ll to use FastISel instead of
  GISel. This test failed with GISel because GISel folds the gep into the load.
  The test checks that FastISel doesn't fold non-pointer-width adds into loads.
  GISel on the other hand, produces a G_CONSTANT of -128 for the add, and then
  a G_GEP, which must be pointer-width.

Note that we don't get STRBRoX right now. It seems like the importer can't
handle `FPR8Op:{ *:[Untyped] }:$Rt` source operands. So, those are not currently
supported.

Differential Revision: https://reviews.llvm.org/D66679

llvm-svn: 369806
2019-08-23 20:31:34 +00:00
Benjamin Kramer dc5f805d31 Do a sweep of symbol internalization. NFC.
llvm-svn: 369803
2019-08-23 19:59:23 +00:00
Craig Topper bc173d4c51 [X86] Move a transform out of combineConcatVectorOps so we don't prematurely turn CONCAT_VECTORS into INSERT_SUBVECTORS.
CONCAT_VECTORS and INSERT_SUBVECTORS can both call combineConcatVectorOps,
but we shouldn't produce INSERT_SUBVECTORS from there. We should
keep CONCAT_VECTORS until vector legalization.

Noticed while looking at the madd_quad_reduction test from madd.ll

llvm-svn: 369802
2019-08-23 19:52:24 +00:00
Roland Froese b4051e57b1 [PowerPC] Expand v1i128 smin
The smin opcode and friends for v1i128 are incorrectly marked as legal for PPC.
Change them to expand.

Differential Revision: https://reviews.llvm.org/D64960

llvm-svn: 369797
2019-08-23 19:04:47 +00:00
Craig Topper bccd183217 [X86] Mark VPDPWSSD and VPDPWSSDS as commutable. Add stack folding tests.
llvm-svn: 369792
2019-08-23 18:05:37 +00:00
Craig Topper e7211bb567 [SelectionDAG][X86] Enable iX SimplifyDemandedBits to vXi1 SimplifyDemandedVectorElts simplification. Add a hack to X86 to avoid a regression
Patch showing the effect of enabling bool vector oversimplification.

Non-VLX builds can simplify a kshift shuffle, but VLX builds simplify:

insert_subvector v8i zeroinitializer, v2i --> insert_subvector v8i undef, v2i

This prevents the removal of the AND that clears the upper bits of the result.

Differential Revision: https://reviews.llvm.org/D53022

llvm-svn: 369780
2019-08-23 17:14:58 +00:00
Simon Atanasyan 5f7d6ac7bf [mips] Reduce number of instructions used for loading a global symbol's value
Now `lw/sw $reg, sym+offset` pseudo instructions for global symbol `sym`
are lowered into the following three instructions.
```
lw     $reg, %got(symbol)($gp)
addiu  $reg, $reg, offset
lw/sw  $reg, 0($reg)
```

It's possible to reduce the number of instructions by taking the offset
into account in the final `lw/sw` command. This patch implements that
optimization.
```
lw     $reg, %got(symbol)($gp)
lw/sw  $reg, offset($reg)
```

Differential Revision: https://reviews.llvm.org/D66553

llvm-svn: 369756
2019-08-23 13:36:24 +00:00
Simon Atanasyan 58492b1895 [mips] Do not include offset into `%got` expression for global symbols
Now the pseudo instruction `la $6, symbol+8($6)` is expanded into the following
chain of commands:
```
lw    $1, %got(symbol+8)($gp)
addiu $1, $1, 8
addu  $6, $1, $6
```

This is incorrect. When a linker handles the `R_MIPS_GOT16` relocation,
it does not expect to get any addend and breaks on an assertion. Otherwise
it would have to create a new GOT entry for each unique "sym + offset" pair.
The offset for a global symbol should be added to the result of loading the
GOT entry by a separate `add` command.

The patch fixes the problem by stripping the offset from the expression
passed to `%got`. Interestingly, even the current code inserts
a separate `add` command.

Differential Revision: https://reviews.llvm.org/D66552

llvm-svn: 369755
2019-08-23 13:36:14 +00:00
Simon Pilgrim c88408cf85 Use VT::getHalfNumVectorElementsVT helpers in a few places. NFCI.
llvm-svn: 369751
2019-08-23 12:37:09 +00:00
Andrea Di Biagio 8e9af64da6 [X86][BtVer2] Add a read-advance to every implicit register use of CMPXCHG8B/16B.
This is a follow up of r369642.

This patch assigns a ReadAfterLd to every implicit register use of the
CMPXCHG8B and CMPXCHG16B instructions. Perf micro-benchmarks show that implicit
registers are read after 3cy from the start of execution.

llvm-svn: 369750
2019-08-23 12:19:45 +00:00
Andrea Di Biagio 1630f64e2f [X86][BtVer2] Fix latency of ALU RMW instructions.
Excluding ADC/SBB and the bit-test instructions (BTR/BTS/BTC), the observed
latency of all other RMW integer arithmetic/logic instructions is 6cy and not
5cy.

Example (ADD):

```
addb $0, (%rsp)            # Latency: 6cy
addb $7, (%rsp)            # Latency: 6cy
addb %sil, (%rsp)          # Latency: 6cy

addw $0, (%rsp)            # Latency: 6cy
addw $511, (%rsp)          # Latency: 6cy
addw %si, (%rsp)           # Latency: 6cy

addl $0, (%rsp)            # Latency: 6cy
addl $511, (%rsp)          # Latency: 6cy
addl %esi, (%rsp)          # Latency: 6cy

addq $0, (%rsp)            # Latency: 6cy
addq $511, (%rsp)          # Latency: 6cy
addq %rsi, (%rsp)          # Latency: 6cy
```

The same latency profile applies to SUB/AND/OR/XOR/INC/DEC.

The observed latency of ADC/SBB is 7-8cy. So we need a different write to model
those.  Latency of BTS/BTR/BTC is not fixed by this patch (they are much slower
than what the model for btver2 currently reports).

Differential Revision: https://reviews.llvm.org/D66636

llvm-svn: 369748
2019-08-23 11:34:10 +00:00
Jay Foad eac23862a8 [AMDGPU] gfx10 atomic optimizer changes.
Summary:
Add support for gfx10, where all DPP operations are confined to work
within a single row of 16 lanes, and wave32.

Reviewers: arsenm, sheredom, critson, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, dstuttard, tpr, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65644

llvm-svn: 369745
2019-08-23 10:07:43 +00:00
Craig Topper 4deb388bca [X86] Make combineLoopSADPattern use CONCAT_VECTORS instead of INSERT_SUBVECTORS for widening with zeros.
CONCAT_VECTORS is more canonical for the early DAG combine runs
until we start getting into the op legalization phases.

llvm-svn: 369734
2019-08-23 06:08:33 +00:00
Craig Topper bdceb9fb14 [X86] Improve lowering of v2i32 SAD handling in combineLoopSADPattern.
For v2i32 we only feed 2 i8 elements into the psadbw instructions
with 0s in the other 14 bytes. The resulting psadbw instruction
will produce zeros in bits [127:16] of the output. We need to take
the result and feed it to a v2i32 add where the first element
includes bits [15:0] of the sad result. The other element should
be zero.

Prior to this patch we were using a truncate to take 0 from
bits 95:64 of the psadbw. This results in a pshufd to move those
bits to 63:32. But since we also have zeroes in bits 63:32 of
the psadbw output, we should just take those bits.

The previous code probably worked better with promoting legalization,
but now we use widening legalization. I've preserved the old
behavior if -x86-experimental-vector-widening-legalization=false
until we get that option removed.

llvm-svn: 369733
2019-08-23 05:33:27 +00:00
Sam Clegg 90b6bb75e8 [MC] Minor cleanup to MCFixup::Kind handling. NFC.
Prefer `MCFixupKind` where possible and add getTargetKind() to
convert to `unsigned` when needed rather than scattering cast
operators around the place.
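
A small sketch of the preferred pattern (the consumer function here is hypothetical):

```
// Hypothetical consumer of a target-specific fixup value.
void handleTargetFixup(unsigned TargetKind);

// Keep the kind typed; convert with getTargetKind() only at the boundary
// where a raw unsigned is genuinely needed, instead of casting everywhere.
void processFixup(const MCFixup &Fixup) {
  MCFixupKind Kind = Fixup.getKind();
  if (Kind >= FirstTargetFixupKind)
    handleTargetFixup(Fixup.getTargetKind());
}
```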

Differential Revision: https://reviews.llvm.org/D59890

llvm-svn: 369720
2019-08-23 01:00:55 +00:00
Peter Collingbourne 2452d7030b IR. Change strip* family of functions to not look through aliases.
I noticed another instance of the issue where references to aliases were
being replaced with aliasees, this time in InstCombine. In the instance that
I saw it turned out to be only a QoI issue (a symbol ended up being missing
from the symbol table due to the last reference to the alias being removed,
preventing HWASAN from symbolizing a global reference), but it could easily
have manifested as incorrect behaviour.

Since this is the third such issue encountered (previously: D65118, D65314)
it seems to be time to address this common error/QoI issue once and for all
and make the strip* family of functions not look through aliases.

Includes a test for the specific issue that I saw, but no doubt there are
other similar bugs fixed here.

As with D65118 this has been tested to make sure that the optimization isn't
load bearing. I built Clang, Chromium for Linux, Android and Windows as well
as the test-suite and there were no size regressions.

Differential Revision: https://reviews.llvm.org/D66606

llvm-svn: 369697
2019-08-22 19:56:14 +00:00
Francis Visoiu Mistrih 5b5ee61b5f [MachO][TLOF] Use hasLocalLinkage to determine if indirect symbol is local
Local symbols in the indirect symbol table contain the value
`INDIRECT_SYMBOL_LOCAL` and the corresponding __pointers entry must
contain the address of the target.

In r349060, I added support for local symbols in the indirect symbol
table, which was checking if the symbol `isDefined` && `!isExternal` to
determine if the symbol is local or not.

It turns out that `isDefined` will return false if the user of the
symbol comes before its definition, and we'll again generate .long 0
which will be the symbol at the address 0x0.

Instead of doing that, use GlobalValue::hasLocalLinkage() to check if
the symbol is local.

Differential Revision: https://reviews.llvm.org/D66563

llvm-svn: 369671
2019-08-22 16:59:00 +00:00
Craig Topper 898a0e9b84 [X86] Remove MCInstLower code that drops operands from some CALL and TAILJMP instructions. Add asserts to verify operand count
It appears the FIXME here was handled at some point. r159728 from 2012 seems to be at least a portion of the fix.

Differential Revision: https://reviews.llvm.org/D66570

llvm-svn: 369665
2019-08-22 16:23:35 +00:00
Andrea Di Biagio c9649eb9da [X86][BtVer2] Fix latency/throughput of scalar integer MUL instructions.
Single operand MUL instructions that implicitly set EAX have the following
latency/throughput profile (see below):

imul %cl              # latency: 3cy - uOPs: 1 - 1 JMul
imul %cx              # latency: 3cy - uOPs: 3 - 3 JMul
imul %ecx             # latency: 3cy - uOPs: 2 - 2 JMul
imul %rcx             # latency: 6cy - uOPs: 2 - 4 JMul

mul %cl               # latency: 3cy - uOPs: 1 - 1 JMul
mul %cx               # latency: 3cy - uOPs: 3 - 3 JMul
mul %ecx              # latency: 3cy - uOPs: 2 - 2 JMul
mul %rcx              # latency: 6cy - uOPs: 2 - 4 JMul

Excluding the 64-bit variant, which has a latency of 6cy, every other instruction
has a latency of 3cy. However, the number of decoded macro-opcodes (as well as
the resource cycles) depends on the MUL size.

The two operand MULs have a more predictable profile (see below):

imul %dx, %dx         # latency: 3cy - uOPs: 1 - 1 JMul
imul %edx, %edx       # latency: 3cy - uOPs: 1 - 1 JMul
imul %rdx, %rdx       # latency: 6cy - uOPs: 1 - 4 JMul

imul $3, %dx, %dx     # latency: 4cy - uOPs: 2 - 2 JMul
imul $3, %ecx, %ecx   # latency: 3cy - uOPs: 1 - 1 JMul
imul $3, %rdx, %rdx   # latency: 6cy - uOPs: 1 - 4 JMul

This patch updates the values in the Jaguar scheduling model and regenerates
llvm-mca tests.

Differential Revision: https://reviews.llvm.org/D66547

llvm-svn: 369661
2019-08-22 15:20:16 +00:00
Sean Fertile 5f85a7b1cf [PowerPC] Add combined ELF ABI and 32/64 bit queries to the subtarget. [NFC]
A lot of places in the code combine checks for both ABI (SVR4/Darwin/AIX) and
addressing mode (64-bit vs 32-bit). In an attempt to make some of the code more
readable I've added a couple of functions that combine checking for the ELF ABI and
64-bit/32-bit code at once. As we add more AIX support I intend to add similar
functions for the AIX ABI.

Differential Revision: https://reviews.llvm.org/D65814

llvm-svn: 369658
2019-08-22 15:11:28 +00:00
Sean Fertile 18fd1b0b49 [PowerPC][XCOFF][MC] Explicitly set containing csect on symbols. [NFC]
Previously we would get the csect a symbol was contained in through its
fragment. This works only if we are writing an object file, and only for
defined symbols. To fix this we set the contating csect explicitly on the
MCSymbolXCOFF object.

Differential Revision: https://reviews.llvm.org/D66032

llvm-svn: 369657
2019-08-22 15:11:23 +00:00
Andrea Di Biagio c6744055ad [X86][BtVer2] Fix latency and throughput of XCHG and XADD.
On Jaguar, XCHG has a latency of 1cy and decodes to 2 macro-opcodes. Maximum
throughput for XCHG is 1 IPC. The byte exchange has worse latency and decodes to
1 extra uOP; maximum observed throughput is 0.5 IPC.

```
xchgb %cl, %dl           # Latency: 2cy  -  uOPs: 3  -  2 ALU
xchgw %cx, %dx           # Latency: 1cy  -  uOPs: 2  -  2 ALU
xchgl %ecx, %edx         # Latency: 1cy  -  uOPs: 2  -  2 ALU
xchgq %rcx, %rdx         # Latency: 1cy  -  uOPs: 2  -  2 ALU
```

The reg-mem forms of XCHG are atomic operations with an observed latency of
16cy.  The resource usage is similar to the XCHGrr variants. The biggest
difference is obviously the bus-locking, which prevents the LS unit from issuing
other memory uOPs in parallel until the unlocking store uOP is executed.

```
xchgb %cl, (%rsp)        # Latency: 16cy  -  uOPs: 3 - ECX latency: 11cy
xchgw %cx, (%rsp)        # Latency: 16cy  -  uOPs: 3 - ECX latency: 11cy
xchgl %ecx, (%rsp)       # Latency: 16cy  -  uOPs: 3 - ECX latency: 11cy
xchgq %rcx, (%rsp)       # Latency: 16cy  -  uOPs: 3 - ECX latency: 11cy
```

The exchanged in/out register operand becomes available after 11cy from the
start of execution. Added test xchg.s to verify that we correctly see that
register write committed in 11cy (and not 16cy).

Reg-reg XADD instructions have the same latency/throughput as the byte
exchange (register-register variant).

```
xaddb %cl, %dl           # latency: 2cy  -  uOPs: 3  -  3 ALU
xaddw %cx, %dx           # latency: 2cy  -  uOPs: 3  -  3 ALU
xaddl %ecx, %edx         # latency: 2cy  -  uOPs: 3  -  3 ALU
xaddq %rcx, %rdx         # latency: 2cy  -  uOPs: 3  -  3 ALU
```

The non-atomic RM variants have a latency of 11cy, and decode to 4
macro-opcodes. They still consume 2 ALU pipes, and the exchange in/out register
operand becomes available in 3cy (it matches the 'load-to-use latency').

```
xaddb %cl, (%rsp)        # latency: 11cy  -  uOPs: 4  -  3 ALU
xaddw %cx, (%rsp)        # latency: 11cy  -  uOPs: 4  -  3 ALU
xaddl %ecx, (%rsp)       # latency: 11cy  -  uOPs: 4  -  3 ALU
xaddq %rcx, (%rsp)       # latency: 11cy  -  uOPs: 4  -  3 ALU
```

The atomic XADD variants execute in 16cy. The in/out register operand is
available after 11cy from the start of execution.

```
lock xaddb %cl, (%rsp)   # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy
lock xaddw %cx, (%rsp)   # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy
lock xaddl %ecx, (%rsp)  # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy
lock xaddq %rcx, (%rsp)  # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy
```

Added test xadd.s to verify those latencies as well as read-advance values.

Differential Revision: https://reviews.llvm.org/D66535

llvm-svn: 369642
2019-08-22 11:32:47 +00:00
Simon Pilgrim 6dd51c2f19 [MVT] Add MVT equivalent to EVT::getHalfNumVectorElementsVT() helper. NFCI.
Allows for some cleanup in a lot of SSE/AVX vector splitting code

llvm-svn: 369640
2019-08-22 11:14:30 +00:00
Sam Tebbs a69d9d6156 Reapply: [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32
The CodeGen/Thumb2/mve-vaddv.ll test needed to be amended to reflect the
changes from the above patch.

This reverts commit cd53ff6, reapplying 7c6b229.

llvm-svn: 369638
2019-08-22 10:29:20 +00:00
Hans Wennborg cd53ff6c0d Revert r369626 "[ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32"
It broke the bots, see e.g. http://lab.llvm.org:8011/builders/clang-cuda-build/builds/36275/

> This patch fixes shifts by a 128/256 bit shift amount. It also fixes
> codegen for shifts of 32 by delegating to LLVM's default optimisation
> instead of emitting a long shift.
>
> Tests that used to generate long shifts of 32 are updated to check for the
> more optimised codegen.
>
> Differential revision: https://reviews.llvm.org/D66519
>
> llvm-svn: 369626

llvm-svn: 369636
2019-08-22 09:16:53 +00:00
Craig Topper d420616313 [X86] Lower the cost of v2i32->v2f64 sint_to_fp under vector widening legalization.
I don't really understand the costs we're using for fp_to_sint,
but prior to widening legalization we used 20 as the cost for this
via the v2i64->v2f64 entry. That number seems better than the 40
we got with widening legalization. So now we need either a
v2i32->v2f64 entry or a v4i32->v2f64 entry depending on whether
AVX is enabled or not since we skip the first SSE2 table look up
under AVX.

llvm-svn: 369628
2019-08-22 08:18:45 +00:00
Sam Tebbs 7c6b229204 [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32
This patch fixes shifts by a 128/256 bit shift amount. It also fixes
codegen for shifts of 32 by delegating to LLVM's default optimisation
instead of emitting a long shift.

Tests that used to generate long shifts of 32 are updated to check for the
more optimised codegen.

Differential revision: https://reviews.llvm.org/D66519

llvm-svn: 369626
2019-08-22 08:12:06 +00:00
Shiva Chen 72a41e7b0d [TargetLowering] Remove optional arguments passing to makeLibCall
The patch introduces a MakeLibCallOptions struct as suggested by @efriedma on D65497.
The struct contains argument flags which will be passed to the makeLibCall function.
The patch should not have any functionality changes.

Differential Revision: https://reviews.llvm.org/D65795

llvm-svn: 369622
2019-08-22 04:59:43 +00:00
Pengfei Wang 7630e24492 [X86] Making X86OptimizeLEAs pass public. NFC
Reviewers: wxiao3, LuoYuanke, andrew.w.kaylor, craig.topper, annita.zhang, liutianle, pengfei, xiangzhangllvm, RKSimon, spatel, andreadb

Reviewed By: RKSimon

Subscribers: andreadb, hiraditya, llvm-commits

Tags: #llvm

Patch by Gen Pei (gpei)

Differential Revision: https://reviews.llvm.org/D65933

llvm-svn: 369612
2019-08-22 02:29:27 +00:00
Craig Topper 78e6507b0a [X86] Correct the scheduler classes for TAILJMP and TCRETURN CodeGenOnly instructions.
We had an odd combination of WriteJump applied to some memory
instructions and WriteJumpLd applied to register and immediate
instructions.

This should hopefully assign them all correctly.

llvm-svn: 369599
2019-08-21 23:17:52 +00:00
Craig Topper 303bbc3be2 [X86] Replace a couple hardcoded '5's with X86::AddrNumOperands for readability. NFC
llvm-svn: 369598
2019-08-21 22:40:07 +00:00
Luis Marques f7cdff4ffd [RISCV] Remove fix introduced by r369573, superseded by r369580
llvm-svn: 369590
2019-08-21 22:02:56 +00:00
Luis Marques 4f488b594a [RISCV] Fix use of side-effects in asserts in decoder functions
llvm-svn: 369580
2019-08-21 21:11:37 +00:00
Richard Smith b73cd33625 Fix -Werror=unused-variable error after r369528.
llvm-svn: 369573
2019-08-21 20:42:37 +00:00
Nico Weber ed18e70c86 Revert r367389 (and follow-up r368404); it caused PR43073.
llvm-svn: 369567
2019-08-21 19:53:42 +00:00
Sam Clegg dde8a25a4b [WebAssembly] Handle aliases in WebAssemblyFixFunctionBitcasts
Fixes: https://github.com/emscripten-core/emscripten/issues/8770

Differential Revision: https://reviews.llvm.org/D66508

llvm-svn: 369566
2019-08-21 19:52:33 +00:00
Craig Topper 3f59bfd5be [MVT] Add v16f16 and v32f16 vectors.
I might look at improving PR43065, which will require being
able to mark 256- and 512-bit vectors of f16 as Legal.

Differential Revision: https://reviews.llvm.org/D66515

llvm-svn: 369565
2019-08-21 19:14:48 +00:00
Simon Atanasyan 159f621c5c [mips] Replace call `expandLoadAddress` by `loadAndAddSymbolAddress`. NFC
When expanding the `lw/sw $reg, symbol($reg)` instruction for PIC it's
enough to call the `loadAndAddSymbolAddress` method. The additional work
performed by `expandLoadAddress` is not required here.

llvm-svn: 369563
2019-08-21 18:54:51 +00:00
Simon Atanasyan bb2f857247 [mips] Remove duplicated case from the `StringSwitch`. NFC
llvm-svn: 369562
2019-08-21 18:54:41 +00:00
Matt Arsenault 954a012b4c GlobalISel: Implement moreElementsVector for G_UNMERGE_VALUES sources
This is necessary for handling <3 x s16> on AMDGPU, assuming this
should be handled as 2 separate legalization actions. The alternative
would be for fewerElementsVector to handle 3->2.

llvm-svn: 369547
2019-08-21 16:59:10 +00:00
David Green 717feabdf0 [ARM] Formatting for ARMInstrMVE.td. NFC
This is just some formatting cleanup, prior to the masked load and store patch
in D66534.

llvm-svn: 369545
2019-08-21 16:20:35 +00:00
Alexander Timofeev 78347c979e [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions
Differential Revision: https://reviews.llvm.org/D63731
Reviewers: qcolombet, rampitec

llvm-svn: 369532
2019-08-21 15:15:04 +00:00
Guillaume Chatelet 1c18a9cb9e [LLVM][Alignment] Introduce Alignment In MachineFrameInfo
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: jfb

Subscribers: hiraditya, dexonsmith, llvm-commits, courbet

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65800

llvm-svn: 369531
2019-08-21 14:29:30 +00:00
Luis Marques c3bf3d14ea [RISCV] Add support for RVC HINT instructions
The hint instructions are enabled by default (if the standard C extension is 
enabled). To disable them pass -mattr=-rvc-hints.

Differential Revision: https://reviews.llvm.org/D62592

llvm-svn: 369528
2019-08-21 14:00:58 +00:00
Petar Avramovic 7f581df649 [MIPS GlobalISel] NarrowScalar G_ZEXTLOAD and G_SEXTLOAD
NarrowScalar G_ZEXTLOAD and G_SEXTLOAD to s32 for MIPS32.

Differential Revision: https://reviews.llvm.org/D66205

llvm-svn: 369512
2019-08-21 09:43:20 +00:00
Petar Avramovic e406aa791c [MIPS GlobalISel] NarrowScalar G_ZEXT and G_SEXT
NarrowScalar G_ZEXT and G_SEXT to s32 for MIPS32.

Differential Revision: https://reviews.llvm.org/D66204

llvm-svn: 369511
2019-08-21 09:35:02 +00:00
Petar Avramovic 61bf2675b9 [MIPS GlobalISel] Consider type1 when legalizing shifts after r351882
r351882 allows a different type for the shift amount than for the result and
the value being shifted. Fix the MIPS legalizer rules to take r351882 into account.

Differential Revision: https://reviews.llvm.org/D66203

llvm-svn: 369510
2019-08-21 09:31:29 +00:00
Petar Avramovic 5b4c5c2c54 [MIPS GlobalISel] NarrowScalar G_TRUNC
Add NarrowScalar for G_TRUNC when NarrowTy is half the size of source.
NarrowScalar G_TRUNC to s32 for MIPS32.

Differential Revision: https://reviews.llvm.org/D66202

llvm-svn: 369509
2019-08-21 09:26:39 +00:00
Luke Cheeseman 71d38b3c62 [AArch64] Update MTE system register encodings
The encodings for the system registers TFSRE0_EL1, TFSR_EL1, TFSR_EL2, TFSR_EL3
and TFSR_EL12 have been changed so that they consistently have CRn=5 and CRm=6
as per https://developer.arm.com/docs/ddi0487/latest.

Differential Revision: https://reviews.llvm.org/D65442

llvm-svn: 369505
2019-08-21 09:09:56 +00:00
Amara Emerson 56606a4db3 [AArch64][GlobalISel] Add support for narrowScalar of G_ZEXT
We do this by merging the source with the high bits set to 0.
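
Schematically (assumed builder calls, not the verbatim legalizer code), narrowing a 64-bit G_ZEXT of a 32-bit source looks like:

```
// The high half of the result is a known zero, so the wide value is just a
// merge of the narrow source with a 32-bit zero constant.
void narrowZExt64Sketch(MachineIRBuilder &MIRBuilder, Register Dst,
                        Register Src) {
  auto Zero = MIRBuilder.buildConstant(LLT::scalar(32), 0);
  MIRBuilder.buildMerge(Dst, {Src, Zero.getReg(0)});
}
```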

Differential Revision: https://reviews.llvm.org/D66181

llvm-svn: 369480
2019-08-21 00:12:37 +00:00
Daniel Sanders a16bd4f9f2 [RISCV GlobalISel] Adding initial GlobalISel infrastructure
Summary:
Add an initial GlobalISel skeleton for RISCV. It can only run the IRTranslator for `ret void`.

Patch by Andrew Wei

Reviewers: asb, sabuasal, apazos, lenary, simoncook, lewis-revill, edward-jones, rogfer01, xiangzhai, rovka, Petar.Avramovic, mgorny, dsanders

Reviewed By: dsanders

Subscribers: pzheng, s.egerton, dsanders, hiraditya, rbar, johnrusso, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, psnobl, benna, Jim, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65219

llvm-svn: 369467
2019-08-20 22:53:24 +00:00
Jessica Paquette e6c299b983 [AArch64][GlobalISel] Select logical_imm32 and logical_imm64 patterns
Add a GlobalISel equivalent for the logical_imm32_XFORM and logical_imm64_XFORM
SDNodeXForms in AArch64InstrFormats.td.

- Add select-logical-imm.mir, which contains tests for each imported pattern.

- Update select-pr32733.mir and select-scalar-shift-imm.mir, since they now
select instructions of this form.

Differential Revision: https://reviews.llvm.org/D66162

llvm-svn: 369465
2019-08-20 22:31:25 +00:00
Jessica Paquette 9a95e79b1b [AArch64][GlobalISel] Select patterns which use shifted register operands
This adds GlobalISel equivalents for the following from AArch64InstrFormats:

- arith_shifted_reg32
- arith_shifted_reg64

And partial support for

- logical_shifted_reg32
- logical_shifted_reg32

The only thing missing for the logical cases is support for rotates. Other than
the missing support, the transformation is identical for the arithmetic shifted
register and the logical shifted register.

Lots of tests here:

- Add select-arith-shifted-reg.mir to show that we correctly select add and
sub instructions which use this pattern.

- Add select-logical-shifted-reg.mir to cover patterns which are not shared
between the arithmetic and logical cases.

- Update addsub-shifted.ll to show that we correctly fold shifts into
adds/subs.

- Update eon.ll to show that we can select the eon instruction by folding xors.

Differential Revision: https://reviews.llvm.org/D66163

llvm-svn: 369460
2019-08-20 22:18:06 +00:00
Craig Topper ba375263e8 [DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors (concat_vectors X, Y), undef)) -> (concat_vectors X, Y, undef, undef)
I also had to add a new combine to X86's combineExtractSubvector to prevent a regression.

This helps our vXi1 code see the full concat operation and allow it optimize undef to a zero if there is already a zero in the concat. This helped us use a movzx instead of an AND in some of the tests. In those tests, one concat comes from SelectionDAGBuilder and the second comes from type legalization of v4i1->i4 bitcasts which uses an additional concat. Though these changes weren't my original motivation.

I'm looking at making X86ISelLowering's narrowShuffle emit a concat_vectors instead of an insert_subvector since concat_vectors is more canonical during early DAG combine. This patch helps prevent a regression from my experiments with that.

Differential Revision: https://reviews.llvm.org/D66456

llvm-svn: 369459
2019-08-20 22:12:50 +00:00
Reid Kleckner 22fb734907 Revert [WinEH] Allocate space in funclets stack to save XMM CSRs
This reverts r367088 (git commit 9ad565f70e)

And the follow up fix r368631 / e9865b9b31

llvm-svn: 369457
2019-08-20 22:08:57 +00:00
Sean Fertile 1e46d4cec5 Adds support for writing the .bss section for XCOFF object files.
Adds wrapper classes for MCSymbol and MCSection to the XCOFF target
object writer. Also adds a class to represent the top-level sections, which we
materialize in the ObjectWriter.

executePostLayoutBinding will map all csects into the appropriate
container depending on their storage mapping class, and map all symbols
into their containing csect. Once all symbols have been processed we
- Assign addresses and symbol table indices.
- Calculate section sizes.
- Build the section header table.
- Assign the sections raw-pointer value for non-virtual sections.

Since the .bss section is virtual, writing the header table is enough to
add support. Writing of a section's raw data, or of any relocations, is
not included in this patch.

Testing is done by dumping the section header table, but it needs to be
extended to include dumping the symbol table once readobj support for
dumping auxiliary entries lands.

Differential Revision: https://reviews.llvm.org/D65159

llvm-svn: 369454
2019-08-20 22:03:18 +00:00
Craig Topper 3a2b08e6c9 [X86] Add a DAG combine to transform (i8 (bitcast (v8i1 (extract_subvector (v16i1 X), 0)))) -> (i8 (trunc (i16 (bitcast (v16i1 X))))) on KNL target
Without AVX512DQ we don't have KMOVB so we can't really copy 8-bits of a k-register to a GPR. We have to copy 16 bits instead. We do this even if the DAG copy is from v8i1->v16i1. If we detect the (i8 (bitcast (v8i1 (extract_subvector (v16i1 X), 0)))) we should rewrite the types to match the copy we do support. By doing this, we can help known bits to propagate without losing the upper 8 bits of the input to the extract_subvector. This allows some zero extends to be removed since we have an isel pattern to use kmovw for (zero_extend (i16 (bitcast (v16i1 X))).

Differential Revision: https://reviews.llvm.org/D66489

llvm-svn: 369434
2019-08-20 20:20:04 +00:00
Craig Topper 250951abf5 [X86] Add isel patterns for (i64 (zext (i8 (bitcast (v16i1 X))))) to use a KMOVW and a SUBREG_TO_REG. Similar for i8 and anyextend.
We already had patterns for extending to i32 to take advantage of
the implicit zeroing of the upper bits of a 32-bit GPR that is
done by KMOVW/KMOVB. But the extend might be all the way to i64,
in which case the existing patterns would fail and we'd get a
KMOVW/B followed by a MOVZX. By adding patterns for i64 we can
use the fact that KMOVW/B zero the upper bits of the 32-bit GPR
and the normal property that 32-bit GPR writes implicitly zero the
upper 32-bits of the full 64-bit GPR.

The anyextend patterns are slightly different since we don't care
about the upper zeros. For the i8->i64 I think this avoids selecting
the anyextend as a MOVZX to prevent a partial register issue that
doesn't exist. For i16->i64 I think we would have just emitted an
insert_subreg on top of the extract_subreg that the vXi16->i16
bitcast pattern emits. The register coalescer or peephole pass
should combine those, but this saves that work and makes i8/16
consistent.

llvm-svn: 369431
2019-08-20 19:43:48 +00:00
Martin Storsjo 514f3a122d [TargetMachine] Don't try to create COFFSTUB references on windows on non-COFF
This avoids spurious relocation types for windows/elf targets.

Differential Revision: https://reviews.llvm.org/D66401

llvm-svn: 369426
2019-08-20 18:58:05 +00:00
Matt Arsenault 4b7fc85c0b Revert "AMDGPU: Fix iterator error when lowering SI_END_CF"
This reverts r367500 and r369203. This is causing various test
failures.

llvm-svn: 369417
2019-08-20 17:45:25 +00:00
Andrea Di Biagio 2e897a94f5 [X86][BtVer2] Use ReadAfterLd entries for the register operands of CMPXCHG.
This is a follow-up of r369365.

llvm-svn: 369412
2019-08-20 17:05:56 +00:00
Craig Topper 22ac9f396f [X86] Use isNullConstant instead of getConstantOperandVal == 0. NFC
llvm-svn: 369410
2019-08-20 16:55:12 +00:00
Sam Tebbs dcfc2d40d3 [ARM] Select vaddva
This patch adds vaddva selection.

Differential revision: https://reviews.llvm.org/D66410

llvm-svn: 369404
2019-08-20 16:33:34 +00:00
Andrea Di Biagio 16111d3795 [X86][BtVer2] Fix latency and throughput of atomic INC/DEC/NEG/NOT.
Latency and throughput of LOCK INC/DEC/NEG/NOT is always 19cy.
Number of uOPs is still 1.

Differential Revision: https://reviews.llvm.org/D66469

llvm-svn: 369388
2019-08-20 14:31:27 +00:00
Alex Bradbury 7cb3cd34e8 [RISCV] Implement getExprForFDESymbol to ensure RISCV_32_PCREL is used for the FDE location
Follow binutils in using RISCV_32_PCREL for the FDE initial location. As
explained in the relevant binutils commit
<a6cbf936e3>,
the ADD/SUB pair of relocations is problematic in the presence of linker
relaxation.

This patch has the same end goal as D64715 but includes test changes and
avoids adding a new global VariantKind to MCExpr.h (preferring
RISCVMCExpr VKs like the rest of the RISC-V backend).

Differential Revision: https://reviews.llvm.org/D66419

llvm-svn: 369375
2019-08-20 12:32:31 +00:00
Andrea Di Biagio b1bdd97a26 [X86][Btver2] Fix latency and throughput of CMPXCHG instructions.
On Jaguar, CMPXCHG has a latency of 11cy, and a maximum throughput of 0.33 IPC.
Throughput is limited to at most 0.33 IPC because of the implicit in/out
dependency on register EAX. In the case of repeated non-atomic CMPXCHG with the
same memory location, store-to-load forwarding occurs and values for subsequent
loads are quickly forwarded from the store buffer.

Interestingly, the functionality in LLVM that computes the reciprocal throughput
doesn't seem to know about RMW instructions. That functionality only looks at
the "consumed resource cycles" for the throughput computation. It should be
fixed/improved by a future patch. In particular, for RMW instructions, that
logic should also take into account the write latency of in/out register
operands.

An atomic CMPXCHG has a latency of ~17cy. Throughput is also limited to
~17cy/inst due to cache locking, which prevents other memory uOPs from starting
to execute before the "lock releasing" store uOP.

CMPXCHG8rr and CMPXCHG8rm are treated specially because they decode to one less
macro opcode. Their latency tends to be the same as the other RR/RM variants. RR
variants are relatively fast at 3cy (but still microcoded - 5 macro opcodes).

CMPXCHG8B is 11cy and unfortunately doesn't seem to benefit from store-to-load
forwarding. That means throughput is clearly limited by the in/out dependency
on GPR registers. The uOP composition is sadly unknown (due to the lack of PMCs
for the Integer pipes). I have reused the same mix of consumed resources from the
other CMPXCHG instructions for CMPXCHG8B too.
LOCK CMPXCHG8B is instead 18 cycles.

CMPXCHG16B is 32 cycles, up to 38 cycles when the LOCK prefix is specified. Due to
the in/out dependencies, throughput is limited to 1 instruction every 32 (or 38)
cycles depending on whether the LOCK prefix is specified or not.
I wouldn't be surprised if the microcode for CMPXCHG16B is similar to 2x
microcode from CMPXCHG8B. So, I have speculatively set the JALU01 consumption to
2x the resource cycles used for CMPXCHG8B.

The two new hasLockPrefix() functions are used by the btver2 scheduling model
to check whether an MCInst/MachineInstr has a LOCK prefix. Calls to hasLockPrefix()
have been encoded in predicates of variant scheduling classes that describe the
latency/throughput of CMPXCHG.
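
A compare-exchange loop like the following (a generic C++ sketch, not code from the patch) lowers to LOCK CMPXCHG on x86, so its sustained throughput is bounded by the atomic latencies described above:

```
#include <atomic>

// Returns the previous value; each failed attempt re-reads memory,
// so back-to-back iterations pay the lock-latency cost repeatedly.
int fetch_add_cas(std::atomic<int> &v, int delta) {
  int old = v.load(std::memory_order_relaxed);
  while (!v.compare_exchange_weak(old, old + delta))
    ; // 'old' is refreshed with the current value on failure
  return old;
}
```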

Differential Revision: https://reviews.llvm.org/D66424

llvm-svn: 369365
2019-08-20 10:23:55 +00:00
Craig Topper 1ada137854 [X86] Add back the -x86-experimental-vector-widening-legalization command line flag and all associated code, but leave it enabled by default
Google is reporting performance issues with the new default behavior
and has asked for a way to switch back to the old behavior while we
investigate and make fixes.

I've restored all of the code that had since been removed and added
additional checks of the command flag onto code paths that are
not otherwise guarded by a check of getTypeAction.

I've also modified the cost model tables to hopefully get us back
to the previous costs.

Hopefully we won't need to support this for very long, since we
have no test coverage of the old behavior and so could very easily
break it.

llvm-svn: 369332
2019-08-20 06:58:00 +00:00
Thomas Raoux a08e139d50 [NFC] Test commit, fix some comment spelling.
llvm-svn: 369326
2019-08-20 05:21:27 +00:00
Karl-Johan Karlsson 40da6be2bd [AsmPrinter] Remove const qualifier from EmitBasicBlockStart.
Overriders may want to modify state in it. AMDGPU wants
to, but has to make its members mutable in order to do so.

Besides, EmitBasicBlockEnd is not const, so why should
Start be?

Patch by Bevin Hansson.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D66341

llvm-svn: 369325
2019-08-20 05:13:57 +00:00
Evgeniy Stepanov 55ccd16354 Refactor isPointerOffset (NFC).
Summary:
Simplify the API using Optional<> and address comments in https://reviews.llvm.org/D66165

Reviewers: vitalybuka

Subscribers: hiraditya, llvm-commits, ostannard, pcc

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66317

llvm-svn: 369300
2019-08-19 21:08:04 +00:00
Evgeniy Stepanov 50affbe47f MemTag: stack initializer merging.
Summary:
MTE provides instructions to update memory tags and data at the same
time. This change makes use of those to generate more compact code for
stack variable tagging + initialization.

We collect memory store and memset instructions following an alloca or a
lifetime.start call, and replace them with the corresponding MTE
intrinsics. Since the intrinsics work on 16-byte aligned chunks, the
stored values are combined as necessary.
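
The kind of source this targets, as a small illustrative C++ sketch (the helper function is hypothetical):

```
void use(long *);

void f() {
  long buf[2] = {1, 2}; // a 16-byte tagged slot: the initializing stores
  use(buf);             // can be merged with the tag-setting instructions
}
```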

Reviewers: pcc, vitalybuka, ostannard

Subscribers: srhines, javed.absar, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66167

llvm-svn: 369297
2019-08-19 20:47:09 +00:00
Sam Clegg 19bf637eb1 [WebAssembly][MC] Allow empty assembly functions
Differential Revision: https://reviews.llvm.org/D66434

llvm-svn: 369292
2019-08-19 19:04:54 +00:00
Craig Topper a0d92c7262 [X86] Teach lowerV4I32Shuffle to only use broadcasts if the mask has more than one undef element. Prioritize shifts over broadcast in lowerV8I16Shuffle.
The motivating cases are the changes in vector-reduce-add.ll where
we were doing extra work in the scalar domain instead of shuffling.
There may be some one use check that needs to be looked into there,
but this patch sidesteps the issue by avoiding broadcasts that
aren't really broadcasting.

Differential Revision: https://reviews.llvm.org/D66071

llvm-svn: 369287
2019-08-19 18:15:50 +00:00
Serge Guelton a023d6b7de [nfc] Silent gcc warning
llvm-svn: 369266
2019-08-19 14:40:33 +00:00
Jinsong Ji 0776da5236 [PeepholeOptimizer] Don't assume bitcast def always has input
Summary:
If we have an MI marked with bitcast bits, but without input operands,
PeepholeOptimizer might crash with an assert.

e.g., if we apply the changes in PPCInstrVSX.td as in this patch:

[(set v4i32:$XT, (bitconvert (v16i8 immAllOnesV)))]>;

We will get assert in PeepholeOptimizer.

```
llvm-lit llvm-project/llvm/test/CodeGen/PowerPC/build-vector-tests.ll -v

llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:417: const
llvm::MachineOperand &llvm::MachineInstr::getOperand(unsigned int)
const: Assertion `i < getNumOperands() && "getOperand() out of range!"'
failed.
```

The fix is to abort if we find an out-of-bounds access.

Reviewers: qcolombet, MatzeB, hfinkel, arsenm

Reviewed By: qcolombet

Subscribers: wdng, arsenm, steven.zhang, wuzish, nemanjai, hiraditya, kbarton, MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65542

llvm-svn: 369261
2019-08-19 14:19:04 +00:00
Alex Bradbury 1c1f8f215d [RISCV] Don't force absolute FK_Data_X fixups to relocs
The current behavior of shouldForceRelocation forces relocations for the
majority of fixups when relaxation is enabled. This makes sense for
fixups which incorporate symbols but is unnecessary for simple data
fixups where the fixup target is already resolved to an absolute value.

Differential Revision: https://reviews.llvm.org/D63404
Patch by Edward Jones.

llvm-svn: 369257
2019-08-19 13:23:02 +00:00
Sam Tebbs f312c1ecf4 [ARM] Add support for MVE vaddv
This patch adds vecreduce_add and the relevant instruction selection for
vaddv.

Differential revision: https://reviews.llvm.org/D66085

llvm-svn: 369245
2019-08-19 09:38:28 +00:00
David Green 2bfc13fde1 [ARM] MVE sext costs
This adds some sext costs for MVE, taken from the length of assembly sequences
that we currently generate.

Differential Revision: https://reviews.llvm.org/D66010

llvm-svn: 369244
2019-08-19 09:13:22 +00:00
Craig Topper ebb7ddc633 [X86] Teach lower1BitShuffle to match right shifts with upper zero elements on types that don't natively support KSHIFT.
We can support these by widening to a supported type,
then shifting all the way to the left and then
back to the right to ensure that we shift in zeroes.
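
A scalar model of the trick (a sketch; the width and names are illustrative only):

```
#include <cstdint>

// Model: a mask with 'used' valid bits widened into a 16-bit register.
// Shifting fully left and then back right guarantees zero fill.
uint16_t shr_with_zero_fill(uint16_t widened, unsigned amt, unsigned used) {
  widened = uint16_t(widened << (16 - used));    // valid bits now at the top
  return uint16_t(widened >> (16 - used + amt)); // zeroes shift in from above
}
```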

llvm-svn: 369232
2019-08-19 05:45:39 +00:00
Craig Topper e47437a6ef [X86] Fix the lower1BitShuffle code added in r369215 to correctly pass the widened vector to the KSHIFT node.
Not sure how to test this as we have tests that exercise this code,
but nothing failed for the types not matching. Since all the k-registers
use equivalent register classes, everything just ends up working.

llvm-svn: 369228
2019-08-19 04:08:44 +00:00
Craig Topper 269c6b1c15 [X86] Teach lower1BitShuffle to match KSHIFTR that doesn't use Zeroable and only relies on undef.
This allows us to widen the type when the KSHIFTR instruction
doesn't exist for the type. If we needed to shift zeroes into
the upper elements, we would need more work to guarantee zeroes
when widening.

llvm-svn: 369227
2019-08-19 04:08:40 +00:00
Craig Topper 2eb7951da3 [X86] Teach lower1BitShuffle to recognize padding a subvector with zeros with V2 as the source and V1 as the zero vector.
Shuffle canonicalization can swap the sources so the zero vector
might be V1 and the subvector that's being padded can be V2.

llvm-svn: 369226
2019-08-19 00:39:22 +00:00
Craig Topper 2ee46c7c4b [X86] Add a special case to LowerCONCAT_VECTORSvXi1 to handle concatenating zero vectors followed by one non-zero vector followed by undef vectors.
For such a case we should only need a KSHIFTL, but we were
previously generating a KSHIFTL followed by a KSHIFTR because
we mistakenly believed we needed to zero the undef elements.

llvm-svn: 369224
2019-08-18 23:30:11 +00:00
Craig Topper 388b8dd94a [X86] Replace uses of getZeroVector for vXi1 vectors with DAG.getConstant.
vXi1 vectors don't need special handling.

llvm-svn: 369222
2019-08-18 23:30:03 +00:00
Craig Topper 9e074c06fe [X86] Improve lower1BitShuffle handling for KSHIFTL on narrow vectors.
We can insert the value into a larger legal type and shift that
by the desired amount.

llvm-svn: 369215
2019-08-18 18:52:46 +00:00
Simon Pilgrim 63b3c56fca Fix signed/unsigned comparison warning. NFCI.
llvm-svn: 369213
2019-08-18 17:26:30 +00:00
Simon Pilgrim fee2546f3f [X86] isTargetShuffleEquivalent - add BUILD_VECTOR matching
Add similar functionality to isShuffleEquivalent - if the mask elements don't match, try matching the BUILD_VECTOR scalars instead.

As target shuffles need to handle SM_Sentinel values, this can get a bit tricky, so this commit just adds actual mask element index handling - full SM_SentinelZero support will be added when the need arises.

This also enables support in matchVectorShuffleWithPACK.

llvm-svn: 369212
2019-08-18 17:15:26 +00:00
Simon Pilgrim a66edd86e2 [X86] isTargetShuffleEquivalent - early out on illegal shuffle masks. NFCI.
Simplifies shuffle mask comparisons by just bailing out if the shuffle mask has any out of range values - will make an upcoming patch much simpler.

llvm-svn: 369211
2019-08-18 16:37:58 +00:00
Matt Arsenault 479f3bdb2c AMDGPU: Fix iterator error when lowering SI_END_CF
If the instruction is the last in the block, there is no next
instruction but the iteration still needs to look at the new block.

llvm-svn: 369203
2019-08-18 00:20:44 +00:00
Matt Arsenault cfdc2b9bd9 AMDGPU: Disambiguate v3f16 format in load/store tables
Currently the searchable tables report the number of dwords. These
round to the same number for 3 and 4 component d16
instructions. Change this to report the number of elements so this
isn't ambiguous.

llvm-svn: 369202
2019-08-18 00:20:43 +00:00
Craig Topper 31f829f0cd [X86] Add a one use check to the combineStore code that handles v16i16->v16i8 truncate+store by extending to v16i32 and then emitting a v16i32->v16i8 truncstore.
This prevents us from emitting a separate truncate and a truncating
store instruction.

llvm-svn: 369200
2019-08-17 22:46:15 +00:00
Paul Walker 26295676a4 Revert Revert [AArch64InstrInfo] Stop getInstSizeInBytes returning non-zero for meta instructions.
This reverts r369132 (git commit 19301d75f0)

llvm-svn: 369186
2019-08-17 09:22:36 +00:00
Paul Walker 93c7a4a47c Revert [AArch64InstrInfo] Stop getInstSizeInBytes returning non-zero for meta instructions.
This reverts r369133 (git commit 2632c677f8)

llvm-svn: 369185
2019-08-17 09:22:28 +00:00
Jian Cai 16fa8b0970 Reland "[ARM] push LR before __gnu_mcount_nc"
This relands r369147 with fixes to unit tests.

https://reviews.llvm.org/D65019

llvm-svn: 369173
2019-08-16 23:30:16 +00:00
Amara Emerson 57ec292ab8 [AArch64][GlobalISel] Fix an assertion during G_UNMERGE selection for s128 types.
llvm-svn: 369172
2019-08-16 23:23:40 +00:00
Jordan Rupprecht d0797ece46 Revert [X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask (reapplied)
This reverts r368662 (git commit 1a8d790cf5)

The compile-time regression repro is in https://bugs.llvm.org/show_bug.cgi?id=43024

llvm-svn: 369167
2019-08-16 23:08:56 +00:00
Eli Friedman eaff844fe9 [ARM] Preserve liveness in ARMConstantIslands.
We currently don't use liveness information after this point, but it can
be useful to catch bugs using -verify-machineinstrs, and optimizations
could potentially use this information in the future.

Differential Revision: https://reviews.llvm.org/D66319

llvm-svn: 369162
2019-08-16 22:20:14 +00:00
Craig Topper a17d1d2250 [X86] Use Register/MCRegister in more places in X86
This was a quick pass through some obvious places. I haven't tried the clang-tidy check.

I also replaced the zeroes in getX86SubSuperRegister with X86::NoRegister which is the real sentinel name.

Differential Revision: https://reviews.llvm.org/D66363

llvm-svn: 369151
2019-08-16 20:50:23 +00:00
Jian Cai 2d957cfe02 Revert "[ARM] push LR before __gnu_mcount_nc"
This reverts commit f4cf3b9593.

llvm-svn: 369149
2019-08-16 20:40:21 +00:00
Jian Cai f4cf3b9593 [ARM] push LR before __gnu_mcount_nc
Push the LR register before calling __gnu_mcount_nc, as it expects the value of LR to be the top value of
the stack on ARM32.

Differential Revision: https://reviews.llvm.org/D65019

llvm-svn: 369147
2019-08-16 20:21:08 +00:00
Guanzhong Chen b1cb9fd1aa [WebAssembly] Forbid use of EM_ASM with setjmp/longjmp
Summary:
We tried to support EM_ASM with setjmp/longjmp in binaryen. But with dynamic
linking thrown into the mix, the code is no longer understandable and cannot
be maintained. We also discovered more bugs in the EM_ASM handling code.

To ensure maintainability and correctness of the binaryen code, EM_ASM will
no longer be supported with setjmp/longjmp. This is probably fine since the
support was added recently and hasn't been published.

Reviewers: tlively, sbc100, jgravelle-google, kripken

Reviewed By: tlively, kripken

Subscribers: dschuff, hiraditya, aheejin, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66356

llvm-svn: 369137
2019-08-16 18:21:08 +00:00
Simon Pilgrim 63b78b678b [X86] resolveTargetShuffleInputs - add DemandedElts variant. NFCI.
Nothing calls this yet; everything still goes through the wrapper that demands all elements.

llvm-svn: 369136
2019-08-16 18:13:22 +00:00
Simon Pilgrim 8ff1b7de4d [X86] combineExtractWithShuffle - handle extract(truncate(x), 0)
Eventually we need to generalize combineExtractWithShuffle to handle all faux shuffles and handle truncate (and X86ISD::VTRUNC etc.) there, but we're not ready yet (still creates nodes on the fly, incomplete DemandedElts support, bad use of recursive Depth limit).

llvm-svn: 369134
2019-08-16 17:35:08 +00:00
Paul Walker 2632c677f8 [AArch64InstrInfo] Stop getInstSizeInBytes returning non-zero for meta instructions.
Recommit with fixes for mac builders.

Summary:
AArch64InstrInfo::getInstSizeInBytes is incorrectly treating meta
instructions (e.g. CFI_INSTRUCTION) as normal instructions and
giving them a size of 4.

This results in branch relaxation calculating block sizes wrong.
Branch relaxation also considers alignment and thus a single
mistake can result in later blocks being incorrectly sized even
when they themselves do not contain meta instructions.

The net result is we might not relax a branch whose destination is
not within range.
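
The fix in spirit, as a sketch (not the actual AArch64 code):

```
#include "llvm/CodeGen/MachineInstr.h"

// Meta instructions (CFI_INSTRUCTION, DBG_VALUE, ...) emit no bytes,
// so they must not contribute to block size calculations.
unsigned getInstSizeInBytes(const llvm::MachineInstr &MI) {
  if (MI.isMetaInstruction())
    return 0;
  return 4; // every real AArch64 instruction is 4 bytes
}
```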

Reviewers: nickdesaulniers, peter.smith

Reviewed By: peter.smith

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66337

> llvm-svn: 369111

llvm-svn: 369133
2019-08-16 17:29:53 +00:00
Paul Walker 19301d75f0 Revert [AArch64InstrInfo] Stop getInstSizeInBytes returning non-zero for meta instructions.
This reverts r369111 (git commit 3ccee5f7c4)

llvm-svn: 369132
2019-08-16 17:29:42 +00:00
Simon Pilgrim 3a8c698771 [X86] Alphabetize pass initialization definitions. NFCI.
llvm-svn: 369126
2019-08-16 16:41:38 +00:00
Krzysztof Parzyszek ac83aab035 [Hexagon] Generate min/max instructions for 64-bit vectors
llvm-svn: 369124
2019-08-16 16:16:27 +00:00
Sander de Smalen f28e1128d9 Relanding r368987 [AArch64] Change location of frame-record within callee-save area.
Changes:
There was a condition for `!NeedsFrameRecord` missing in the assert. The
assert in question has changed to:

+    assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
+            RPI.Reg1 == AArch64::LR) &&
+           "FrameRecord must be allocated together with LR");

This addresses PR43016.

llvm-svn: 369122
2019-08-16 15:42:28 +00:00
David Green b782e61e47 [ARM] MVE sext of a load is free
MVE also has some sext of loads, which will be free just as scalar
instructions are.

Differential Revision: https://reviews.llvm.org/D66008

llvm-svn: 369118
2019-08-16 15:13:37 +00:00
Luis Marques fa06e95898 [RISCV] Convert registers from unsigned to Register
Only in public interfaces that have not yet been converted should there remain
registers with unsigned type.

Differential Revision: https://reviews.llvm.org/D66252

llvm-svn: 369114
2019-08-16 14:27:50 +00:00
Paul Walker 3ccee5f7c4 [AArch64InstrInfo] Stop getInstSizeInBytes returning non-zero for meta instructions.
Summary:
AArch64InstrInfo::getInstSizeInBytes is incorrectly treating meta
instructions (e.g. CFI_INSTRUCTION) as normal instructions and
giving them a size of 4.

This results in branch relaxation calculating block sizes wrong.
Branch relaxation also considers alignment and thus a single
mistake can result in later blocks being incorrectly sized even
when they themselves do not contain meta instructions.

The net result is we might not relax a branch whose destination is
not within range.

Reviewers: nickdesaulniers, peter.smith

Reviewed By: peter.smith

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66337

llvm-svn: 369111
2019-08-16 14:17:52 +00:00
Simon Pilgrim 9da4989c52 [X86] Remove unused include. NFCI.
We don't use anything from TargetOptions.h directly and its included via TargetLowering.h anyhow.

llvm-svn: 369110
2019-08-16 14:05:46 +00:00
David Green 6e1ac42474 [ARM] Correct register for narrowing and widening MVE loads and stores.
The widening and narrowing MVE instructions like VLDRH.32 are only permitted to
use low tGPR registers. This means that if they are used for a stack slot,
where the register used is only decided during frame setup, we need to be able
to correctly pick a thumb1 register over a normal GPR.

This attempts to add the required logic into eliminateFrameIndex and
rewriteT2FrameIndex, only picking the FrameReg if it is a valid register for
the operand's register class, and picking a valid scratch register for the
register class.

Differential Revision: https://reviews.llvm.org/D66285

llvm-svn: 369108
2019-08-16 13:42:39 +00:00
David Green 8c2c5f5045 [ARM] Don't pretend we know how to generate MVE VLDn
We don't yet know how to generate these instructions for MVE. And in the case
of VLD3, we don't even have the instruction. For the moment, don't tell the
vectoriser that we have VLD4, only to end up serialising the results.

Differential Revision: https://reviews.llvm.org/D66009

llvm-svn: 369101
2019-08-16 13:06:49 +00:00
Lewis Revill d3f774d33c [RISCV] Allow parsing of bare symbols with offsets
This patch allows symbols followed by an expression for an offset to be
parsed as bare symbols.

Differential Revision: https://reviews.llvm.org/D57332

llvm-svn: 369097
2019-08-16 12:00:56 +00:00
Lewis Revill 7abf863f76 [RISCV] Lower inline asm constraint A for RISC-V
This allows arguments with the constraint A to be lowered to input nodes
for RISC-V, which implies a memory address stored in a register.

This patch adds the minimal amount of code required to get operands with
the right constraints to compile.
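
For reference, a kernel-style GCC/Clang inline-asm idiom that exercises this constraint (a sketch, not code from the patch):

```
// '+A'(*p) tells the compiler the operand is a memory location whose
// address must live in a general-purpose register, as AMOs require.
int atomic_add_fetch_prev(int *p, int inc) {
  int ret;
  asm volatile("amoadd.w %0, %2, %1"
               : "=r"(ret), "+A"(*p)
               : "r"(inc)
               : "memory");
  return ret; // previous value of *p
}
```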

https://reviews.llvm.org/D54296

llvm-svn: 369095
2019-08-16 10:28:34 +00:00
Craig Topper 120cffccf8 [X86] Manually reimplement getTargetInsertSubreg in X86DAGToDAGISel::matchBitExtract so we can call insertDAGNode on the target constant.
This is needed to maintain the topological sort order.

Fixes PR42992.

llvm-svn: 369084
2019-08-16 04:47:44 +00:00
Nico Weber ee96499a42 Revert r368987, it caused PR43016.
llvm-svn: 369080
2019-08-16 02:21:21 +00:00
Eli Friedman 9b9a308452 [ARM][LowOverheadLoops] Fix generated code for "revert".
Two issues:

1. t2CMPri shouldn't use CPSR if it isn't predicated. This doesn't
really have any visible effect at the moment, but it might matter in the
future.
2. The t2CMPri generated for t2WhileLoopStart might need to use a
register that isn't LR.

My team found this because we have a patch to track register liveness
late in the pass pipeline. I'll look into upstreaming it to help catch
issues like this earlier.

Differential Revision: https://reviews.llvm.org/D66243

llvm-svn: 369069
2019-08-15 23:35:53 +00:00
Philip Reames 5c38ca3534 [SDAG] Minor code cleanup/standardization of atomic accessors [NFC]
llvm-svn: 369057
2019-08-15 22:21:14 +00:00
Evgeniy Stepanov 10ce5f88d1 Add missing MIR serialization text for AArch64II::MO_TAGGED.
Reviewers: pcc

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66312

llvm-svn: 369053
2019-08-15 22:03:55 +00:00
Daniel Sanders 0c47611131 Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).
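
The mechanical shape of the change (an illustrative sketch, not a line from the patch):

```
#include "llvm/CodeGen/MachineInstr.h"

void example(const llvm::MachineInstr &MI) {
  // Before: unsigned Reg = MI.getOperand(0).getReg();
  llvm::Register Reg = MI.getOperand(0).getReg(); // keep the Register type
  (void)Reg;
}
```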

Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor

Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&

Depends on D65919

Reviewers: arsenm, bogner, craig.topper, RKSimon

Reviewed By: arsenm

Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65962

llvm-svn: 369041
2019-08-15 19:22:08 +00:00
Krzysztof Parzyszek 8e987702b1 [Hexagon] Fix instruction selection for vselect v4i8
llvm-svn: 369040
2019-08-15 19:20:09 +00:00
Matt Arsenault 1f2b727298 MVT: Add v3i16/v3f16 vectors
AMDGPU has some buffer intrinsics which theoretically could use
this. Some of the generated tables include the 3 and 4 element vector
versions of these rounded to 64 bits, which is ambiguous. Adding these types
helps the tables disambiguate them.

The assertion change is for the path odd-sized vectors now take for R600.
v3i16 is widened to v4i16, which then needs to be promoted to v4i32.

llvm-svn: 369038
2019-08-15 18:58:25 +00:00
Craig Topper 2a372ba534 [X86] Add custom type legalization for bitcasting mmx to v2i32/v4i16/v8i8 to use movq2dq instead of going through memory.
llvm-svn: 369031
2019-08-15 18:23:37 +00:00
Craig Topper 6eebd2bcd7 [X86] Improve cost model for subvector extraction of less than 128-bit vectors
Now that we're using widening legalization. We need to improve our extract_subvector cost model for these types. This patch begins by modeling these as a subvector extract followed by a permute. I've left FIXMEs in the code for future improvements.

Differential Revision: https://reviews.llvm.org/D65892

llvm-svn: 369022
2019-08-15 17:29:42 +00:00
Krzysztof Parzyszek 8460301d58 [Hexagon] Generate vector min/max for HVX
llvm-svn: 369014
2019-08-15 16:13:17 +00:00
Jonas Devlieghere 0eaee545ee [llvm] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
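
The shape of the mechanical replacement (illustrative types):

```
#include <memory>

struct Foo { int X = 0; };

std::unique_ptr<Foo> makeFoo() {
  // Before: return llvm::make_unique<Foo>();
  return std::make_unique<Foo>(); // the C++14 standard equivalent
}
```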

llvm-svn: 369013
2019-08-15 15:54:37 +00:00
Jinsong Ji 9fd81dc139 [PowerPC] Use xxleqv to set all one vector IMM(-1).
Summary:
xxspltib/vspltisb are 3 cycle PM instructions;
xxleqv is a 2 cycle ALU instruction.

We should use xxleqv to set all-ones vectors.

Reviewers: hfinkel, nemanjai, steven.zhang

Subscribers: hiraditya, kbarton, MaskRay, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65529

llvm-svn: 369006
2019-08-15 14:32:51 +00:00
David Green 3a99101812 [ARM] Fix alignment checks for BE VLDRH
We need to allow any alignment at least 2, not just exactly 2, so that the big
endian loads and stores can be selected successfully. I've also added extra BE
testing for the load and store tests.

Thanks to Oliver for the report.

Differential Revision: https://reviews.llvm.org/D66222

llvm-svn: 368996
2019-08-15 12:54:47 +00:00
Sanjay Patel 57d459309d [SDAG][x86] check for relaxed math when matching an FP reduction
If the last step in an FP add reduction allows reassociation and doesn't care
about -0.0, then we are free to recognize that computation as a reduction
that may reorder the intermediate steps.
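
For example, a plain accumulation loop like this (a sketch) qualifies when built with reassociation and no-signed-zeros enabled (e.g. -ffast-math):

```
// With reassoc+nsz on the final fadd, the intermediate adds may be
// reordered, enabling a vector reduction instead of a serial chain.
float sum(const float *a, int n) {
  float s = 0.0f;
  for (int i = 0; i < n; ++i)
    s += a[i];
  return s;
}
```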

This is requested directly by PR42705:
https://bugs.llvm.org/show_bug.cgi?id=42705
and solves PR42947 (if horizontal math instructions are actually faster than
the alternative):
https://bugs.llvm.org/show_bug.cgi?id=42947

Differential Revision: https://reviews.llvm.org/D66236

llvm-svn: 368995
2019-08-15 12:43:15 +00:00
David Green 0ff2296a49 [ARM] MVE predicate store patterns
Stack loads and stores were already working, but direct stores were not. This
adds the patterns for them, same as predicate loads.

Differential Revision: https://reviews.llvm.org/D66213

llvm-svn: 368988
2019-08-15 10:41:42 +00:00
Sander de Smalen 643adb5576 [AArch64] Change location of frame-record within callee-save area.
This patch changes the location of the frame-record (FP, LR) to the 
bottom of the callee-saved area. According to the AAPCS the location of
the frame-record within the stackframe is unspecified (section 5.2.3 The 
Frame Pointer), so the compiler should be free to choose a different
location.

The reason for changing the location of the frame-record is to prepare
the frame for allocating an SVE area below the callee-saves. This way the 
compiler can use the VL-scaled addressing modes to directly access SVE 
objects from the frame-pointer.

            :                :   
        | stack |        | stack |
        |  args |        |  args |
        +-------+        +-------+
        |  x30  |        |  x19  |
        |  x29  |        |  x20  |
  FP -> |- - - -|        |  x21  |
        |  x19  |   ==>  |  x22  |
        |  x20  |        |- - - -|
        |  x21  |        |  x30  |
        |  x22  |        |  x29  |
        +-------+        +-------+ <- FP
        |///////|        |///////|         // realignment gap 
        |- - - -|        |- - - -|
        |spills/|        |spills/|
        | locals|        | locals|
  SP -> +-------+        +-------+ <- SP

Things to point out:
- The algorithm to find a paired register should be prevented from
  accidentally pairing some callee-saved register with LR that is not 
  FP, since they should always be paired together when the frame
  has a frame-record.
- For Darwin platforms the location of the frame-record is unchanged,
  since the unwind encoding does not allow for encoding this position
  dynamically and other tools currently depend on the former layout. 

Reviewers: efriedma, rovka, rengolin, thegameg, greened, t.p.northover

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D65653

llvm-svn: 368987
2019-08-15 10:34:16 +00:00
David Green 04f2f32869 [ARM] MVE trunc to i1 vectors
This adds patterns for selecting trunc instructions from full vectors to i1's
vectors.

Differential Revision: https://reviews.llvm.org/D66201

llvm-svn: 368981
2019-08-15 09:26:51 +00:00
Craig Topper 1e246b20c0 [X86] Add isel pattern to match VZEXT_MOVL and a v2i64 scalar_to_vector bitcasted from x86mmx to MOVQ2DQ.
We already had the pattern for just the scalar to vector and bitcast,
but not the case where we wanted zeroes in the high half of the xmm.

llvm-svn: 368972
2019-08-15 06:46:30 +00:00
Craig Topper e6409602a1 [X86] Make sure load is non-volatile in the MMX_X86movdq2q (loadv2i64) isel pattern.
This pattern will narrow the load so we should make sure its not
volatile.

llvm-svn: 368971
2019-08-15 06:46:26 +00:00
Craig Topper dbcbbf5658 [X86] Remove unneeded isel pattern for v4f32->v4i32 fp_to_sint and conversion to MMX.
fp_to_sint is turned into X86cvttp2si during isel preprocessing.
The other redundant isel patterns were removed previously, but I
missed this one because it's in the MMX td file.

llvm-svn: 368968
2019-08-15 05:52:02 +00:00
Craig Topper a57734ba4e [X86] Disable custom type legalization for v2i32/v4i16/v8i8->i64.
The default legalization can take care of this.

llvm-svn: 368967
2019-08-15 05:51:58 +00:00
Craig Topper 57286afe4e [X86] Disable custom type legalization for v2i32/v4i16/v8i8->f64 bitcast.
The generic legalization handles this in the same way so just use
that.

llvm-svn: 368966
2019-08-15 05:51:54 +00:00
Craig Topper ba39fcd8c6 [X86] Remove some unreachable code from LowerBITCAST.
llvm-svn: 368965
2019-08-15 05:51:50 +00:00
Craig Topper 14f7560020 [X86] Remove some dead code and combine some repeated code that's left.
If the width is 256 bits, then we must have AVX so the else here
was unnecessary. Once that's removed then the >= 256 bit code is
identical to the 128 bit code with a different VT so combine them.

llvm-svn: 368956
2019-08-15 04:07:43 +00:00
Amara Emerson 1222cfd5fe [AArch64][GlobalISel] Custom selection for s8 load acquire.
Implement this single atomic load instruction so that we can compile stack
protector code.

Differential Revision: https://reviews.llvm.org/D66245

llvm-svn: 368923
2019-08-14 21:30:30 +00:00
Matt Arsenault dbc1f207fa InferAddressSpaces: Move target intrinsic handling to TTI
I'm planning on handling intrinsics that will benefit from checking
the address space enums. Don't bother moving the address collection
for now, since those won't need the enums.

llvm-svn: 368895
2019-08-14 18:13:00 +00:00
Thomas Lively de0133eaa2 [WebAssembly] Stop unrolling SIMD shifts since they are fixed in V8
Summary:
Fixes PR42973. Tests don't change because simd-arith.ll tests behavior
on unimplemented-simd128, which does not include any temporary
workarounds such as the one removed in this revision.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66166

llvm-svn: 368868
2019-08-14 16:24:37 +00:00
Craig Topper 3e44d96170 [X86] Use PSADBW for v8i8 addition reductions.
Improves the 8 byte case from PR42674.
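
The shape of that case, sketched in C++ (the function name is illustrative):

```
#include <cstdint>

// Eight zero-extended byte adds form a v8i8 add reduction, which can
// now be lowered to a single PSADBW against a zero vector.
uint32_t sum8(const uint8_t *p) {
  uint32_t s = 0;
  for (int i = 0; i < 8; ++i)
    s += p[i];
  return s;
}
```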

Differential Revision: https://reviews.llvm.org/D66069

llvm-svn: 368864
2019-08-14 15:57:29 +00:00
Xiangling Liao 49661f94c8 [NFC][AIX] Change assertion
Address one remaining comment on https://reviews.llvm.org/D63547. A minor
change to an assertion.

Differential Revision: https://reviews.llvm.org/D63547

llvm-svn: 368860
2019-08-14 14:57:25 +00:00
Craig Topper 30d3e9c395 [X86][CostModel] Adjust the costs of ZERO_EXTEND/SIGN_EXTEND with less than 128-bit inputs
Now that we legalize by widening, the element types here won't change. Previously these were modeled as the elements being widened and then the instruction might become an AND or SHL/ASHR pair. But now they'll become something like a ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG.

For AVX2, when the destination type is legal it's clear the cost should be 1, since we have extend instructions that can produce 256-bit vectors from less-than-128-bit vectors. I'm a little less sure about the AVX1 costs, but I think the ones I changed were definitely too high before, and they might still be too high.

Differential Revision: https://reviews.llvm.org/D66169

llvm-svn: 368858
2019-08-14 14:52:39 +00:00
Craig Topper 8c545168ee [X86] Add llvm_unreachable to a switch that covers all expected values.
llvm-svn: 368857
2019-08-14 14:51:19 +00:00
Jinsong Ji e71db6584d [PowerPC][NFC] Consolidate duplicate XX3Form_SetZero and XX3Form_Zero.
Rename one to XX3Form_SameOp, remove the other one.

llvm-svn: 368856
2019-08-14 14:16:26 +00:00
Jason Liu 8fc095d453 [AIX] Add call lowering for parameters that could pass onto FPRs
Summary:
This patch adds call lowering functionality to enable passing
parameters onto floating point registers when needed.

Differential Revision: https://reviews.llvm.org/D63654

llvm-svn: 368855
2019-08-14 14:13:11 +00:00
Amara Emerson 2a312fc989 [AArch64][GlobalISel] RBS: Treat s128s like vectors when unmerging.
The destinations should be FPRs (for now).

Differential Revision: https://reviews.llvm.org/D66184

llvm-svn: 368775
2019-08-13 23:51:20 +00:00
Eli Friedman b5eb3e1e82 [AArch64] Remove incorrect usage of MONonTemporal.
This has no effect at the moment, but might matter if we try to
implement non-temporal loads in the future.

llvm-svn: 368770
2019-08-13 23:12:14 +00:00
Xiangling Liao a8c624a1c4 [AIX]Lowering global address for 32/64bit small/large code models
This patch implements global address lowering for 32/64 bit with small/large code models.
1. For the 32-bit large code model on AIX, there are newly added pseudo opcodes LWZtocL and ADDIStocHA32; MC-layer support for them will be provided by future patches.
2. The default code model on AIX should be the small code model.
3. Since AIX does not have a medium code model, "report_fatal_error" when users specify it.

Differential Revision: https://reviews.llvm.org/D63547

llvm-svn: 368744
2019-08-13 20:29:01 +00:00
Tim Renouf 10db641aab [AMDGPU] Fix to 'Fold readlane from copy of SGPR or imm'
That change (r363670) could leave a copy from vgpr to sgpr. Fixed.

Differential Revision: https://reviews.llvm.org/D66133

Change-Id: I00c3fe6fda2e8e1e36f53195b881b1449c777ea4
llvm-svn: 368736
2019-08-13 18:57:55 +00:00
David Green a655393f17 [ARM] Add MVE beats vector cost model
The MVE architecture has the idea of "beats", where a vector instruction can be
executed over several ticks of the architecture. This adds a similar system
into the Arm backend cost model, multiplying the cost of all vector
instructions by a factor.

This factor essentially becomes the expected difference between scalar code
and vector code, on average. MVE vector instructions can also overlap, so their
true cost is often lower. But equally, scalar instructions can in some
situations be dual issued, or have other optimisations such as unrolling or
making use of dsp instructions. The default is chosen as 2. This should not
prevent vectorisation in most cases (as the vector instructions will still be
doing at least 4 times the work), but it will help prevent over-vectorising in
cases where the benefits are less likely.

This adds things so far to the obvious places in ARMTargetTransformInfo, and
updates a few related costs like not treating float instructions as cost 2 just
because they are floats.

Differential Revision: https://reviews.llvm.org/D66005

llvm-svn: 368733
2019-08-13 18:12:08 +00:00
Heejin Ahn 64517a6419 Use Register over unsigned in LateEHPrepare (NFC)
Summary:
While D65962 is pending review, I landed D65475, which added one more
use of `unsigned`. Changed it to `Register`.

Reviewers: dsanders

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66064

llvm-svn: 368727
2019-08-13 17:35:44 +00:00
Hubert Tong 0996705009 Reland r368691: "[AIX] Implement LR prolog/epilog save/restore"
Trying again with the code changes (and not just the new test).

Summary:
This patch fixes the offsets of fields in the stack frame linkage save
area for AIX.

Reviewers: sfertile, hubert.reinterpretcast, jasonliu, Xiangling_L, xingxue, ZarkoCA, daltenty

Reviewed By: hubert.reinterpretcast

Subscribers: wuzish, nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64424

Patch by Chris Bowler!

llvm-svn: 368721
2019-08-13 17:05:53 +00:00
Matt Arsenault 28215caa60 GlobalISel: Partially implement fewerElementsVector G_UNMERGE_VALUES
Odd sized vectors aren't handled yet.

llvm-svn: 368713
2019-08-13 16:26:28 +00:00
Momchil Velikov 114c37e72a [ARM] Fix detection of duplicates when parsing reg list operands
Differential Revision: https://reviews.llvm.org/D65957

llvm-svn: 368712
2019-08-13 16:13:00 +00:00
Momchil Velikov f990e4a4c7 [ARM] Fix encoding of APSR in CLRM instruction
The APSR is encoded by setting bit 15 in the register list of the CLRM
instruction (cf. https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf).

Differential Revision: https://reviews.llvm.org/D65873

llvm-svn: 368711
2019-08-13 16:12:46 +00:00
Matt Arsenault 690645bda0 GlobalISel: Implement lower for G_SHUFFLE_VECTOR
llvm-svn: 368709
2019-08-13 16:09:07 +00:00
Matt Arsenault 5af9cf042f GlobalISel: Change representation of shuffle masks
Currently shufflemasks get emitted as any other constant, and you end
up with a bunch of virtual registers of G_CONSTANT with a
G_BUILD_VECTOR. The AArch64 selector then asserts on anything that
doesn't fit this pattern. This isn't an ideal representation, and
should avoid legalization and have fewer opportunities for a
representational error.

Rather than invent a new shuffle mask operand type, similar to what
ShuffleVectorSDNode does, just track the original IR Constant mask
operand. I don't completely like the idea of adding another link to
the IR, but MIR is already quite dependent on IR constants already,
and this will allow sharing the shuffle mask utility functions with
the IR.

llvm-svn: 368704
2019-08-13 15:34:38 +00:00
Simon Pilgrim e7b350a5d1 [X86] XFormVExtractWithShuffleIntoLoad - handle shuffle mask scaling
If the target shuffle mask is from a wider type, attempt to scale the mask so that the extraction can attempt to peek through.

Fixes the regression mentioned in rL368662

Reapplying this as rL368308 had to be reverted as part of rL368660 to revert rL368276

llvm-svn: 368663
2019-08-13 11:11:42 +00:00
Simon Pilgrim 1a8d790cf5 [X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask (reapplied)
If we don't demand all elements, then attempt to combine to a simpler shuffle.

At the moment we can only do this if Depth == 0 as combineX86ShufflesRecursively uses Depth to track whether the shuffle has really changed or not - we'll need to change this before we can properly start merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts. 

The insertps-combine.ll regression is because XFormVExtractWithShuffleIntoLoad can't see through shuffles of different widths - this will be fixed in a follow-up commit.

Reapplying this as rL368307 had to be reverted as part of rL368660 to revert rL368276

llvm-svn: 368662
2019-08-13 10:51:39 +00:00
Hans Wennborg 5390d25f2b Revert r368276 "[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT"
This introduced a false positive MemorySanitizer warning about use of
uninitialized memory in a vectorized crc function in Chromium. That suggests
maybe something is not right with this transformation. See
https://crbug.com/992853#c7 for a reproducer.

This also reverts the follow-up commits r368307 and r368308 which
depended on this.

> This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.
>
> In particular this helps remove some unnecessary scalar->vector->scalar patterns.
>
> The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue.
>
> Differential Revision: https://reviews.llvm.org/D65887

llvm-svn: 368660
2019-08-13 09:33:25 +00:00
Qiu Chaofan 4fb99a3330 [PowerPC] Fix ICE when truncating some vectors
The legalizer would hit an assertion on the PowerPC platform when truncating
a vector whose size is not a power of 2. This patch adds a check to
prevent vectors with such odd-sized elements from being custom lowered.

Reviewed By: Hal Finkel

Differential Revision: https://reviews.llvm.org/D65261

llvm-svn: 368654
2019-08-13 07:53:29 +00:00
Amara Emerson 72c81b94cb [AArch64][GlobalISel] Replace explicit vreg creation with implicit using SrcOp. NFC.
llvm-svn: 368653
2019-08-13 06:55:32 +00:00
Amara Emerson e14c91b71a [GlobalISel] Make the InstructionSelector instance non-const, allowing state to be maintained.
Currently we can't keep any state in the selector object that we get from
subtarget. As a result we have to plumb through all our variables through
multiple functions. This change makes it non-const and adds a virtual init()
method to allow further state to be captured for each target.

AArch64 makes use of this in this patch to cache a call to hasFnAttribute()
which is expensive to call, and is used on each selection of G_BRCOND.

Differential Revision: https://reviews.llvm.org/D65984

llvm-svn: 368652
2019-08-13 06:26:59 +00:00
Stanislav Mekhanoshin 438315bf69 [AMDGPU] Fix msan failure in printf lowering
llvm-svn: 368645
2019-08-13 01:07:27 +00:00
Stanislav Mekhanoshin 5b32752d10 [AMDGPU] removed unused functions from printf lowering
Differential Revision: https://reviews.llvm.org/D66117

llvm-svn: 368633
2019-08-12 23:32:35 +00:00
Reid Kleckner e9865b9b31 [WinEH] Fix catch block parent frame pointer offset
r367088 made it so that funclets store XMM registers into their local
frame instead of storing them to the parent frame. However, that change
forgot to update the parent frame pointer offset for catch blocks. This
change does that.

Fixes crashes when an exception is rethrown in a catch block that saves
XMMs, as described in https://crbug.com/992860.

llvm-svn: 368631
2019-08-12 23:02:00 +00:00
Daniel Sanders 3836874dbb [risc-v] Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Depends on D65919

Reviewers: lenary

Subscribers: jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision for full review was: https://reviews.llvm.org/D65962

llvm-svn: 368629
2019-08-12 22:41:02 +00:00
Daniel Sanders 5ae66e56cf [aarch64] Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Manual fixups in:
AArch64InstrInfo.cpp - genFusedMultiply() now takes a Register* instead of unsigned*
AArch64LoadStoreOptimizer.cpp - Ternary operator was ambiguous between Register/MCRegister. Settled on Register

Depends on D65919

Reviewers: aemerson

Subscribers: jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision for full review was: https://reviews.llvm.org/D65962

llvm-svn: 368628
2019-08-12 22:40:53 +00:00
Daniel Sanders 05c145d694 [webassembly] Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Reviewers: aheejin

Subscribers: jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision for whole review: https://reviews.llvm.org/D65962

llvm-svn: 368627
2019-08-12 22:40:45 +00:00
Stanislav Mekhanoshin ef8f1c473a [AMDGPU] Use PredicateControl in MIMGBaseOpcode. NFC.
This is infrastructural and will be needed for future work.
For some reason it was only used in MIMG_NoSampler, while it is
needed everywhere we use MIMGBaseOpcode if we want to use
predicates.

Differential Revision: https://reviews.llvm.org/D66115

llvm-svn: 368626
2019-08-12 22:32:21 +00:00
Craig Topper e07e593782 [X86] Allow combineTruncateWithSat to use pack instructions for i16->i8 without AVX512BW.
We need AVX512BW to be able to truncate an i16 vector. If we don't
have that we have to extend i16->i32, then trunc, i32->i8. But we
won't be able to remove the min/max if we do that. At least not
without more special handling.

llvm-svn: 368623
2019-08-12 22:18:23 +00:00
Craig Topper 0761a38e8a [X86] Remove unreachable code from LowerTRUNCATE. NFC
All three 256->128 bit cases were already handled above.

Noticed while looking at the coverage report.

llvm-svn: 368609
2019-08-12 19:26:45 +00:00
Craig Topper a3605baaff [X86] Add a paranoia type check to the code that detects AVG patterns from truncating stores.
If we're after type legalization, we should make sure we won't create
a store with an illegal type when we separate the AVG pattern
from the truncating store.

I don't know of a way to fail for this today. Just noticed while
I was in the vicinity.

llvm-svn: 368608
2019-08-12 19:26:37 +00:00
Craig Topper 1b02909847 [X86] Simplify creation of saturating truncating stores.
We just need to check if the truncating store is legal
instead of going through isSATValidOnAVX512Subtarget.

llvm-svn: 368607
2019-08-12 19:26:30 +00:00
Craig Topper 3f4e9b156d [X86] Replace call to isTruncStoreLegalOrCustom with isTruncStoreLegal. NFC
We have no custom trunc stores on X86.

llvm-svn: 368606
2019-08-12 19:26:22 +00:00
Craig Topper 09d5d15339 [X86] Disable use of zmm registers for varargs musttail calls under prefer-vector-width=256 and min-legal-vector-width=256.
Under this config, the v16f32 type we try to use isn't mapped to a register
class, so the getRegClassFor call will fail.

llvm-svn: 368594
2019-08-12 17:43:26 +00:00
David Green 86876422ef [ARM] sext of a load is free
This teaches the cost model that the sext or zext of a load is going to be
free.
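
For example (a sketch): the extension below folds into the load itself, e.g. an ldrsb, so charging for it separately would overcount.

```
// sext(load i8) on ARM becomes a single extending load instruction.
int load_sext(const signed char *p) {
  return p[0];
}
```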

Differential Revision: https://reviews.llvm.org/D66006

llvm-svn: 368593
2019-08-12 17:39:56 +00:00
Stanislav Mekhanoshin 4c9c98f36b [AMDGPU] Printf runtime binding pass
This pass is a port of the corresponding pass from the HSAIL compiler.
It parses printf calls and sets up the runtime printf buffer.
After that it copies printf arguments to the buffer and fills in
module metadata for the runtime.

Differential Revision: https://reviews.llvm.org/D24035

llvm-svn: 368592
2019-08-12 17:12:29 +00:00
David Green 3e39f39ad9 [ARM] MVE shuffle broadcast costs
A VDUP will perform a vector broadcast in a single instruction. Update the cost
model for MVE accordingly.

Code originally by David Sherwood.

Differential Revision: https://reviews.llvm.org/D63448

llvm-svn: 368589
2019-08-12 16:54:07 +00:00
David Green 83bbfaa5e4 [ARM] Put some of the TTI costmodel behind hasNeon calls.
This puts some of the calls in ARMTargetTransformInfo.cpp behind hasNeon()
checks, now that we have MVE, and updates all the tests accordingly.

Differential Revision: https://reviews.llvm.org/D63447

llvm-svn: 368587
2019-08-12 15:59:52 +00:00
Sam Elliott fee242aed4 [RISCV] Fix ICE in isDesirableToCommuteWithShift
Summary:
Ana Pazos reported a bug where we were not checking that an APInt would
fit into 64-bits before calling `getSExtValue()`. This caused asserts when
compiling large constants, such as i128s, as happens when compiling compiler-rt.

This patch adds a testcase and makes the callback less error-prone.
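
The guard, in spirit (a sketch rather than the patched code):

```
#include "llvm/ADT/APInt.h"
#include <cstdint>

// Only extract a signed 64-bit value when the constant actually fits;
// an i128 constant (as seen in compiler-rt) would otherwise assert.
bool getSExtIfFits(const llvm::APInt &C, int64_t &Out) {
  if (!C.isSignedIntN(64))
    return false;
  Out = C.getSExtValue();
  return true;
}
```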

Reviewers: apazos, asb, luismarques

Reviewed By: luismarques

Subscribers: hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66081

llvm-svn: 368572
2019-08-12 13:51:00 +00:00
Simon Pilgrim 182249daee [X86][SSE] ComputeKnownBits - add basic PSADBW handling
llvm-svn: 368558
2019-08-12 12:19:19 +00:00
Pengfei Wang e28cbbd5d4 [X86] Support -march=tigerlake
Support -march=tigerlake for x86.
Compared with Icelake Client, it includes 4 more new features:
avx512vp2intersect, movdiri, movdir64b, and shstk.

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D65840

llvm-svn: 368543
2019-08-12 01:29:46 +00:00
Craig Topper ce6a2cf966 [X86] Simplify some of the type checks in combineSubToSubus.
If we have SSE2 we can handle any i8/i16 type and let
type legalization deal with it.

llvm-svn: 368538
2019-08-11 17:36:49 +00:00
Craig Topper 637964bfd8 [X86] Don't use SplitOpsAndApply for ISD::USUBSAT.
Target independent type legalization and custom lowering
should be able to handle it.

llvm-svn: 368537
2019-08-11 17:36:45 +00:00
David Green 11c4602fce [MVE] Don't try to unroll vectorised MVE loops
Due to the nature of the beat system in the MVE architecture, along with tail
predication and low-overhead loops, unrolling has less benefit compared to
normal loops. You can not, for example, hide the latency of a load with other
instructions as you can for scalar code. Preventing unrolling also makes the
code easier to read and reason about.

So if a loop contains vector code, don't enable the runtime unrolling. At least
for the time being.

Differential Revision: https://reviews.llvm.org/D65803

llvm-svn: 368530
2019-08-11 08:53:18 +00:00
David Green 44f8d635e2 [ARM] Permit auto-vectorization using MVE
With enough codegen complete, we can now correctly report the number and size
of vector registers for MVE, allowing auto vectorisation. This also allows FP
auto-vectorization for MVE without -Ofast/-ffast-math, due to support for IEEE
FP arithmetic and parity between scalar and vector FP behaviour.

Patch by David Sherwood.

Differential Revision: https://reviews.llvm.org/D63728

llvm-svn: 368529
2019-08-11 08:42:57 +00:00
Heejin Ahn 831efe0e0f Fix __clang_call_terminate's argument for foreign exceptions
Summary:
When exceptions are repeatedly thrown in the middle of handling another
exception, we call `__clang_call_terminate` with the exception pointer
(i32) as an argument. But in case of foreign exceptions, we don't have
the pointer, so we call the function with 0. (This requires that
`__clang_call_terminate` can deal with a 0 argument, which will be done
later.)

But previously the 0 argument was not added as an `i32.const 0` but as an
immediate by mistake, causing the `call` instruction to take not an i32
but rather an exnref, because an `exnref` is left on top of the value
stack if `br_on_exn` is not taken.

```
block i32
  br_on_exn 0, __cpp_exception
                               ;; exnref is on top of stack now
  i32.const 0                  ;; This was missing!
  call __clang_call_terminate
  unreachable
end
call __clang_call_terminate    ;; This takes i32 extracted by br_on_exn
```

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65475

llvm-svn: 368527
2019-08-11 06:24:07 +00:00
Craig Topper 9758e0e1bf [X86] Remove some more code from combineShuffle that is no longer needed with widening legalization.
llvm-svn: 368523
2019-08-11 02:17:18 +00:00
Craig Topper 0f74b82ef1 [X86] Remove some code from combineShuffle that seems largely unnecessary with widening legalization.
The test case that changed is probably better served through
allowing combineTruncatedArithmetic to create narrow vectors. It
also appears InstCombine would have simplified this test case
to remove the zext and trunc anyway.

llvm-svn: 368522
2019-08-11 02:08:38 +00:00
Simon Pilgrim ec128709f0 [X86][SSE] Lower shuffle as ANY_EXTEND_VECTOR_INREG
On SSE41+ targets we always lower vector shuffles to ZERO_EXTEND_VECTOR_INREG, even if we don't need the extended bits.

This patch relaxes this so that we lower to ANY_EXTEND_VECTOR_INREG if we can, meaning that shuffle combines have a better idea of what elements need to be kept zero. This helps the multiple reduction code as we can now combine away a lot more of the pack+extend sequences.

Differential Revision: https://reviews.llvm.org/D65741

llvm-svn: 368515
2019-08-10 16:46:07 +00:00
Craig Topper 74c43a2277 [X86] Match the IR pattern form movmsk on SSE1 only targets where v4i32 isn't legal
Summary:
This patch adds a special DAG combine for SSE1 to recognize the IR pattern InstCombine gives us for movmsk. This only does the recognition for a few cases where it's obvious the input won't be scalarized, which would result in building a vector just to do the movmsk. I've made it separate from our existing matching for movmsk since that's called in multiple places and I didn't spend time to see if the other callers would make sense here. Plus the restrictions and additional checks would complicate that.

This fixes the case from PR42870. But it's probably still broken in the presence of logic ops feeding the movmsk pattern, which would further hide the v4f32 type.

Reviewers: spatel, RKSimon, xbolva00

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65689

llvm-svn: 368506
2019-08-10 07:51:13 +00:00
Craig Topper a8e5e73711 [X86] Improve the diagnostic for larger than 4-bit immediate for vpermil2pd/ps. Only allow MCConstantExprs.
llvm-svn: 368505
2019-08-10 04:28:52 +00:00
Luo, Yuanke c6c86f4f81 [X86] Fix stack probe issue on windows32.
Summary:
On Windows, if the frame size exceeds 4096 bytes, the compiler needs to
generate a call to _alloca_probe. The X86CallFrameOptimization pass
changes the reserved stack size, which causes the stack probe function
not to be inserted. This patch fixes the issue by detecting the call
frame size; if the size exceeds 4096 bytes, X86CallFrameOptimization is dropped.
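
For illustration (the helper function is hypothetical), a frame this large requires a probe on 32-bit Windows:

```
void consume(char *);

void bigFrame() {
  char buf[8192]; // > 4096 bytes: the probe call must not be dropped
  consume(buf);
}
```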

Reviewers: craig.topper, wxiao3, annita.zhang, rnk, RKSimon

Reviewed By: rnk

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65923

llvm-svn: 368503
2019-08-10 02:49:02 +00:00
Daniel Sanders e9a57c2b23 [globalisel] Add G_SEXT_INREG
Summary:
Targets often have instructions that can sign-extend certain cases faster
than the equivalent shift-left/arithmetic-shift-right. Such cases can be
identified by matching a shift-left/shift-right pair but there are some
issues with this in the context of combines. For example, suppose you can
sign-extend 8-bit up to 32-bit with a target extend instruction.
  %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity)
  %2:_(s32) = G_ASHR %1:_(s32), i32 24
  %3:_(s32) = G_ASHR %2:_(s32), i32 1
would reasonably combine to:
  %1:_(s32) = G_SHL %0:_(s32), i32 24
  %2:_(s32) = G_ASHR %1:_(s32), i32 25
which no longer matches the special case. If your shifts and extend are
equal cost, this would break even as a pair of shifts but if your shift is
more expensive than the extend then it's cheaper as:
  %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8
  %3:_(s32) = G_ASHR %2:_(s32), i32 1
It's possible to match the shift-pair in ISel and emit an extend and ashr.
However, this is far from the only way to break this shift pair and make
it hard to match the extends. Another example is that with the right
known-zeros, this:
  %1:_(s32) = G_SHL %0:_(s32), i32 24
  %2:_(s32) = G_ASHR %1:_(s32), i32 24
  %3:_(s32) = G_MUL %2:_(s32), i32 2
can become:
  %1:_(s32) = G_SHL %0:_(s32), i32 24
  %2:_(s32) = G_ASHR %1:_(s32), i32 23

All upstream targets have been configured to lower it to the current
G_SHL,G_ASHR pair but will likely want to make it legal in some cases to
handle their faster cases.
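
As a rough sketch of that configuration (using the LegalizerInfo builder API; treat the exact rule names as approximate), a target with native 32/64-bit support might write:

```
// Sketch only: in the target's LegalizerInfo constructor. Register types the
// target can sign-extend natively stay as G_SEXT_INREG; everything else is
// lowered back to the G_SHL/G_ASHR pair.
const LLT s32 = LLT::scalar(32);
const LLT s64 = LLT::scalar(64);
getActionDefinitionsBuilder(TargetOpcode::G_SEXT_INREG)
    .legalFor({s32, s64}) // keep the single instruction for these types
    .lower();             // otherwise expand to shl + ashr again
```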

To follow-up: Provide a way to legalize based on the constant. At the
moment, I'm thinking that the best way to achieve this is to provide the
MI in LegalityQuery but that opens the door to breaking core principles
of the legalizer (legality is not context sensitive). That said, it's
worth noting that looking at other instructions and acting on that
information doesn't violate this principle in itself. It's only a
violation if, at the end of legalization, a pass that checks legality
without being able to see the context would say an instruction might not be
legal. That's a fairly subtle distinction so to give a concrete example,
saying %2 in:
  %1 = G_CONSTANT 16
  %2 = G_SEXT_INREG %0, %1
is legal is in violation of that principle if the legality of %2 depends
on %1 being constant and/or being 16. However, legalizing to either:
  %2 = G_SEXT_INREG %0, 16
or:
  %1 = G_CONSTANT 16
  %2:_(s32) = G_SHL %0, %1
  %3:_(s32) = G_ASHR %2, %1
depending on whether %1 is constant and 16 does not violate that principle
since both outputs are genuinely legal.

Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm

Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61289

llvm-svn: 368487
2019-08-09 21:11:20 +00:00
Eric Christopher db2f17d362 Remove variable only used in an assert.
llvm-svn: 368486
2019-08-09 21:02:47 +00:00
Craig Topper 6cb05ca044 [X86] Remove custom handling for extloads from LowerLoad.
We don't appear to need this with widening legalization.

llvm-svn: 368479
2019-08-09 20:27:22 +00:00
Simon Pilgrim 60394f47b0 [X86][SSE] Swap X86ISD::BLENDV inputs with an inverted selection mask (PR42825)
As discussed on PR42825, if we are inverting the selection mask we can just swap the inputs and avoid the inversion.

Differential Revision: https://reviews.llvm.org/D65522

llvm-svn: 368438
2019-08-09 12:44:20 +00:00
Simon Atanasyan 242c5a70d4 [Mips][Codegen] Fix fast-isel mixing of FGR64 and AFGR64 registers
Fast-isel was picking the AFGR64 register class for processing call
arguments when the +fp64 option was used. We simply check whether the
+fp64 option is enabled and pick the appropriate register class.

Patch by Mirko Brkusanin.

Differential Revision: https://reviews.llvm.org/D65886

llvm-svn: 368433
2019-08-09 12:02:32 +00:00
Pablo Barrio 3cdd586be2 [AArch64] Set pref. func. align to 8 bytes on Neoverse E1 & Cortex-A65
Summary:
The Arm Neoverse E1 and Cortex-A65 Software Optimization Guide [1][2],
Section "4.7 Branch instruction alignment" state:

"It is preferable for branch targets, including subroutine entry points,
to be placed on aligned 64-bit boundaries to maximize instruction fetch
efficiency."

This patch sets the preferred function alignment on Neoverse E1 and
Cortex-A65 to 2^3=8B. This was already the case in some Cortex-A CPUs
such as Cortex-A53.

[1] https://developer.arm.com/docs/swog466751/latest/arm-neoversetm-e1-core-software-optimization-guide
[2] https://developer.arm.com/docs/swog010045/latest/arm-cortex-a65-core-software-optimization-guide

Reviewers: dmgreen, fhahn, samparker

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65937

llvm-svn: 368431
2019-08-09 11:05:15 +00:00
Tim Northover 01eb869114 AArch64: support TLS on Darwin platforms in GlobalISel.
All TLS access on Darwin is in the "general dynamic" form where we call
a function to resolve the address, so implementation is pretty simple.

llvm-svn: 368418
2019-08-09 09:32:38 +00:00
Tim Northover e1a5f668b3 GlobalISel: pack various parameters for lowerCall into a struct.
I've now needed to add an extra parameter to this call twice recently. Not only
is the signature getting extremely unwieldy, but just updating all of the
callsites and implementations is a pain. Putting the parameters in a struct
sidesteps both issues.

llvm-svn: 368408
2019-08-09 08:26:38 +00:00
Sam Parker 0dba791a25 [ARM][ParallelDSP] Replace SExt uses
As loads are combined and widened, we replaced the operands of their sext
users, whereas we should have been replacing the uses of the sext itself.
I've added a load of tests; only a few of them originally caused assertion
failures, while the rest improve pattern coverage.

Differential Revision: https://reviews.llvm.org/D65740

llvm-svn: 368404
2019-08-09 07:48:50 +00:00
Craig Topper 6179175551 [X86] Remove code that expands truncating stores from combineStore.
We shouldn't form trunc stores that need to be expanded now that
we are using widening legalization.

llvm-svn: 368400
2019-08-09 06:59:53 +00:00
Craig Topper 7e33f11ba7 [X86] Remove stale FIXME from combineMaskedStore. NFC
I believe PR34584 was tracking that FIXME, but it's since been
closed and a test case was added.

llvm-svn: 368397
2019-08-09 05:55:41 +00:00
Craig Topper 8c5c09780d [X86] Remove DAG combine expansion of extending masked load and truncating masked store.
The only way to generate these was through promoting legalization
of narrow vectors, but we widen those types now. So we shouldn't
produce these nodes.

llvm-svn: 368396
2019-08-09 05:53:37 +00:00
Craig Topper 509c8774fa [X86] Remove handler for (U/S)(ADD/SUB)SAT from ReplaceNodeResults. Remove TypeWidenVector check from code that handles X86ISD::VPMADDWD and X86ISD::AVG.
More unneeded code since we now legalize narrow vectors by widening.

llvm-svn: 368395
2019-08-09 05:17:52 +00:00
Craig Topper 824961824f [X86] Remove ISD::SETCC handling from ReplaceNodeResults.
This is no longer needed since we widen v2i32 instead of promoting.

llvm-svn: 368394
2019-08-09 05:17:48 +00:00
Craig Topper ef5b435b00 [X86] Simplify ISD::LOAD handling in ReplaceNodeResults and ISD::STORE handling in LowerStore now that v2i32 is widened to v4i32.
llvm-svn: 368390
2019-08-09 03:09:43 +00:00
Craig Topper 0da681a2be [X86] Merge v2f32 and v2i32 gather/scatter handling in ReplaceNodeResults/LowerMSCATTER now that v2i32 is also widened like v2f32.
llvm-svn: 368389
2019-08-09 03:09:28 +00:00
Craig Topper 6f81db0f68 [X86] Remove now-unreachable handling for f64->v2i32/v4i16/v8i8 bitcasts from ReplaceNodeResults.
We rely on the generic type legalizer for this now.

llvm-svn: 368388
2019-08-09 03:09:19 +00:00
Craig Topper d871f638d7 [X86] Simplify ReplaceNodeResults handling for FP_TO_SINT/UINT for vectors to only handle widening.
llvm-svn: 368387
2019-08-09 03:09:10 +00:00
Craig Topper 0bd44d59db [X86] Simplify ReplaceNodeResults handling for SIGN_EXTEND/ZERO_EXTEND/TRUNCATE for vectors to only handle widening.
llvm-svn: 368386
2019-08-09 03:08:54 +00:00
Craig Topper cdb9a8ebd8 [X86] Simplify ReplaceNodeResults handling for UDIV/UREM/SDIV/SREM for vectors to only handle widening.
llvm-svn: 368385
2019-08-09 03:08:45 +00:00
Craig Topper 35848345f0 [X86] Remove vector promotion handling from the ReplaceNodeResults ISD::MUL handling code.
We now widen illegal vector types so we don't need this anymore.

llvm-svn: 368384
2019-08-09 03:08:28 +00:00
Craig Topper c49d3e6c4d [X86] Improve codegen of v8i64->v8i16 and v16i32->v16i8 truncate with avx512vl, avx512bw, min-legal-vector-width<=256 and prefer-vector-width=256
Under this configuration we'll want to split the v8i64 or v16i32 into two vectors. The default legalization will try to truncate each of those 256-bit pieces one step to 128-bit, concatenate those, then truncate one more time from the new 256 to 128 bits.

With this patch we now truncate the two splits to 64 bits, then concatenate those. We have to do this two different ways depending on whether we have widening legalization enabled. Without widening legalization we have to manually construct X86ISD::VTRUNC to prevent the ISD::TRUNCATE with a narrow result from being promoted to 128 bits with a larger element type than we want, followed by something like a pshufb to grab the lower half of each element to finish the job. With widening legalization we just get the right thing. When we switch to widening by default we can just delete the other code path.

Differential Revision: https://reviews.llvm.org/D65626

llvm-svn: 368349
2019-08-08 21:36:47 +00:00
Brian Cain 6dbbd0f343 [llvm-mc] Add reportWarning() to MCContext
Adding reportWarning() to MCContext, so that it can be used from
the Hexagon assembler backend.
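
A minimal usage sketch (the helper and the diagnostic text are invented for illustration; reportWarning(SMLoc, Twine) is the new entry point):

```
#include "llvm/MC/MCContext.h"
#include "llvm/Support/SMLoc.h"

// Sketch: an assembler backend can now surface a warning instead of having
// to choose between staying silent and emitting a hard error.
static void warnOnOddImmediate(llvm::MCContext &Ctx, llvm::SMLoc Loc,
                               int64_t Imm) {
  if (Imm & 1)
    Ctx.reportWarning(Loc, "odd immediate may be slow on this target");
}
```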

llvm-svn: 368327
2019-08-08 19:13:23 +00:00
Craig Topper 9d55e2c85e [X86] Make CMPXCHG16B feature imply CMPXCHG8B feature.
This fixes znver1 so that it properly enables CMPXCHG8B. We can
probably remove explicit CMPXCHG8B from CPUs that also have
CMPXCHG16B, but keeping this simple to allow a cherry-pick to 9.0.

Fixes PR42935.

llvm-svn: 368324
2019-08-08 18:11:17 +00:00
Pirama Arumuga Nainar 0cb2a33dfd [AArch64] Do not emit '#' before immediates in inline asm
Summary:
The A64 assembly language does not require the '#' character to
introduce constant immediate operands.  Avoid the '#' since the AArch64
asm parser does not accept '#' before the lane specifier and rejects the
following:
  __asm__ ("fmla v2.4s, v0.4s, v1.s[%0]" :: "I"(0x1))

Fix a test to not expect the '#' and add a new test case with the above
asm.

Fixes: https://github.com/android-ndk/ndk/issues/1036

Reviewers: peter.smith, kristof.beyls

Subscribers: javed.absar, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65550

llvm-svn: 368320
2019-08-08 17:50:39 +00:00
Simon Pilgrim eb7a553db8 [X86] XFormVExtractWithShuffleIntoLoad - handle shuffle mask scaling
If the target shuffle mask is from a wider type, attempt to scale the mask so that the extraction can attempt to peek through.

Fixes the regression mentioned in rL368307

llvm-svn: 368308
2019-08-08 16:05:23 +00:00
Simon Pilgrim 67c246bbe6 [X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask
If we don't demand all elements, then attempt to combine to a simpler shuffle.

At the moment we can only do this if Depth == 0 as combineX86ShufflesRecursively uses Depth to track whether the shuffle has really changed or not - we'll need to change this before we can properly start merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts. 

The insertps-combine.ll regression is because XFormVExtractWithShuffleIntoLoad can't see through shuffles of different widths - this will be fixed in a follow-up commit.

llvm-svn: 368307
2019-08-08 15:54:20 +00:00
David Tenty 8558aac82c Enable assembly output of local commons for AIX
Summary:
This patch enable assembly output of local commons for AIX using .lcomm
directives. Adds a EmitXCOFFLocalCommonSymbol to MCStreamer so we can emit the
AIX version of .lcomm assembly directives which include a csect name. Handle the
case of BSS locals in PPCAIXAsmPrinter by using EmitXCOFFLocalCommonSymbol. Adds
a test for generating .lcomm on AIX Targets.

Reviewers: cebowleratibm, hubert.reinterpretcast, Xiangling_L, jasonliu, sfertile

Reviewed By: sfertile

Subscribers: wuzish, nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64825

llvm-svn: 368306
2019-08-08 15:40:35 +00:00
David Green 27ca82f32a [ARM] Add support for MVE pre and post inc loads and stores
This adds pre- and post-increment and decrement for MVE loads and stores. It
uses the builtin pre and post load/store detection, unlike Neon. Loads are
selected with the code in tryT2IndexedLoad; stores are selected with tablegen
patterns. The immediates have a +/-7-bit range, multiplied by the size of the
element.
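
For example (plain C-style source, illustrative only), loops of this shape are where the pointer bump can fold into a post-incremented vector load/store:

```
// Illustrative source: after MVE vectorization, the p/q pointer increments
// can be selected as post-increment addressing on the vector loads/stores
// instead of separate add instructions.
void scale(short *p, short *q, int n) {
  for (int i = 0; i < n; ++i)
    *q++ = (short)(*p++ * 2); // post-increment on both pointers
}
```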

Differential Revision: https://reviews.llvm.org/D63840

llvm-svn: 368305
2019-08-08 15:27:58 +00:00
David Green 824ffd8b12 [ARM] MVE big endian loads/stores
This adds some missing patterns for big endian loads/stores, allowing unaligned
loads/stores to also be selected with an extra VREV, which produces better code
than aligning through a stack. Also moves VLDR_P0 to not be LE only, and
adjusts some of the tests to show all that working.

Differential Revision: https://reviews.llvm.org/D65583

llvm-svn: 368304
2019-08-08 15:15:19 +00:00
Sam Elliott 856d5c5817 [RISCV] Allow ABI Names in Inline Assembly Constraints
Summary:
Clang will replace references to registers using ABI names in inline
assembly constraints with references to architecture names, but other
frontends do not. LLVM uses the regular assembly parser to parse inline asm,
so inline assembly strings can contain references to registers using their
ABI names.

This patch adds support for parsing constraints using either the ABI name or
the architectural register name. This means we do not need to implement the
ABI name replacement code in every single frontend, especially those like
Rust which are a very thin shim on top of LLVM IR's inline asm, and that
constraints can more closely match the assembly strings they refer to.

Reviewers: asb, simoncook

Reviewed By: simoncook

Subscribers: hiraditya, rbar, johnrusso, JDevlieghere, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65947

llvm-svn: 368303
2019-08-08 14:59:16 +00:00
Sam Elliott cd44aee3da [RISCV] Minimal stack realignment support
Summary:
Currently the RISC-V backend does not realign the stack. This can be an issue even for the RV32I/RV64I ABIs (where the stack is 16-byte aligned), though it is rare. It will be much more common with RV32E (though the alignment requirements for common data types remain under-documented...).

This patch adds minimal support for stack realignment. It should cope with large realignments. It will error out if the stack needs realignment and variable sized objects are present.

It feels like a lot of the code like getFrameIndexReference and determineFrameLayout could be refactored somehow, as right now it feels fiddly and brittle. We also seem to allocate a lot more memory than GCC does for equivalent C code.
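
For reference, this (hypothetical) kind of C is what now triggers realignment:

```
// Hypothetical example: the incoming stack pointer is only guaranteed to be
// 16-byte aligned by the RV32I/RV64I ABIs, so a 64-byte-aligned local forces
// the prologue to realign sp.
extern int consume(void *p);

int needs_realignment() {
  alignas(64) char buf[128]; // over-aligned local triggers realignment
  return consume(buf);
}
```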

Reviewers: asb

Reviewed By: asb

Subscribers: wwei, jrtc27, s.egerton, MaskRay, Jim, lenary, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62007

llvm-svn: 368300
2019-08-08 14:40:54 +00:00
Simon Pilgrim 59fabf9c60 [X86][SSE] matchBinaryPermuteShuffle - split INSERTPS combines
We need to prefer INSERTPS with zeros over SHUFPS, but fallback to INSERTPS if that fails.

llvm-svn: 368292
2019-08-08 13:23:53 +00:00
Petar Avramovic caef930699 [MIPS GlobalISel] Select jump_table and brjt
G_JUMP_TABLE and G_BRJT appear from the translation of switch statements.
Select these two instructions for MIPS32, both pic and non-pic.

Differential Revision: https://reviews.llvm.org/D65861

llvm-svn: 368274
2019-08-08 10:21:12 +00:00
Sam Tebbs 7ca980edcd [ARM] Select VFMA
llvm-svn: 368264
2019-08-08 08:21:01 +00:00
Craig Topper 724c6053ac [X86] Remove -x86-experimental-vector-widening-legalization command line option and all its uses.
This option now defaults to true and we don't want to support
turning it off, so remove the option.

llvm-svn: 368258
2019-08-08 06:48:22 +00:00
David Green 1becefd3f7 [ARM] Tighten up VLDRH.32 with low alignments
VLDRH needs to have an alignment of at least 2, including the
widening/narrowing versions. This tightens up the ISel patterns for it and
alters allowsMisalignedMemoryAccesses so that unaligned accesses are expanded
through the stack. It also fixes some incorrect shift amounts, which seemed to
be passing a multiple rather than a shift amount.

Differential Revision: https://reviews.llvm.org/D65580

llvm-svn: 368256
2019-08-08 06:22:03 +00:00
Craig Topper 0aacc7da8b [X86] Add CMOV_FR32X and CMOV_FR64X to the isCMOVPseudo function.
llvm-svn: 368250
2019-08-08 04:40:59 +00:00
Amy Huang 0b870b969f Recommit "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG"
with a fix to clear the SDNode map when SelectionDAG is cleared.

llvm-svn: 368230
2019-08-07 22:49:40 +00:00
Craig Topper 7f7ef0208b [X86] Allow pack instructions to be used for 512->256 truncates when -mprefer-vector-width=256 is causing 512-bit vectors to be split
If we're splitting the 512-bit vector anyway and we have zero/sign bits, then we might as well use pack instructions to concat and truncate at once.

Differential Revision: https://reviews.llvm.org/D65904

llvm-svn: 368210
2019-08-07 21:16:10 +00:00
Craig Topper 8b5f2ab2a4 Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default."
The assert that caused this to be reverted should be fixed now.

Original commit message:

This patch changes our default legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the element widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.

This has the potential to cause regressions so we wanted to get
it in early in the 10.0 cycle so we have plenty of time to
address them.

Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.

llvm-svn: 368183
2019-08-07 16:24:26 +00:00
Oliver Cruickshank 4d4eefda6c [ARM] Expand CTPOP intrinsic for MVE
llvm-svn: 368180
2019-08-07 15:47:45 +00:00
Simon Pilgrim d52bc482a5 [X86] EltsFromConsecutiveLoads - early out for non-byte sized memory (PR42909)
Don't attempt to merge loads for types whose size isn't a multiple of 8 bits.

llvm-svn: 368165
2019-08-07 12:41:59 +00:00
Sander de Smalen 1d2bfa4a86 [AArch64][WinCFI] Do not pair callee-save instructions in LoadStoreOptimizer
Prevent the LoadStoreOptimizer from pairing any load/store instructions with
instructions from the prologue/epilogue if the CFI information has encoded the
operations as separate instructions.  This would otherwise lead to a mismatch
of the actual prologue size from the size as recorded in the Windows CFI.

Reviewers: efriedma, mstorsjo, ssijaric

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D65817

llvm-svn: 368164
2019-08-07 12:41:38 +00:00
Simon Atanasyan e5fa049efa [mips] Make a couple of class methods plain static functions. NFC
llvm-svn: 368162
2019-08-07 12:21:41 +00:00
Simon Atanasyan 8a7c0e7c0a [mips] Use isMicroMips() function to check enabled feature flag. NFC
llvm-svn: 368161
2019-08-07 12:21:32 +00:00
Simon Atanasyan 9f2e076f27 [Mips] Instruction `sc` now accepts symbol as an argument
Function MipsAsmParser::expandMemInst() did not properly handle the
instruction `sc` with a symbol as an argument because the first argument
would be counted twice. We add additional checks and handle this case
separately.

Patch by Mirko Brkusanin.

Differential Revision: https://reviews.llvm.org/D64252

llvm-svn: 368160
2019-08-07 12:21:26 +00:00
Benjamin Kramer 3d5360a439 Replace llvm::MutexGuard/UniqueLock with their standard equivalents
All supported platforms have <mutex> now, so we don't need our own
copies any longer. No functionality change intended.

llvm-svn: 368149
2019-08-07 10:57:25 +00:00
Oliver Cruickshank 30dcae0956 [ARM] Generate MVE VHADDs/VHSUBs
llvm-svn: 368146
2019-08-07 10:26:57 +00:00
Sam Parker 173de03740 [ARM][LowOverheadLoops] Revert after read/write
Currently we check whether LR is stored/loaded to/from in between the
loop decrement and loop end pseudo instructions. There are two problems
here:
- It relies on all load/store instructions being labelled as such in
  tablegen.
- Actually any use of loop decrement is troublesome because the value
  doesn't exist!
    
So we need to check for any read/write of LR that occurs between the
two instructions and revert if we find anything.

Differential Revision: https://reviews.llvm.org/D65792

llvm-svn: 368130
2019-08-07 07:39:19 +00:00
Craig Topper f192cc587c [X86] Allow any 8-bit immediate to be used with bt/btc/btr/bts memory aliases.
We have aliases that disambiguate memory forms of bt/btc/btr/bts
without suffixes to the 32-bit form. These aliases should have
been updated when the instructions were updated in r356413.

llvm-svn: 368127
2019-08-07 06:17:58 +00:00
Craig Topper 624980037d [X86] Use isInt<8> to simplify some code. NFC
llvm-svn: 368126
2019-08-07 06:17:55 +00:00
Craig Topper 29688f4da0 [X86] Limit vpermil2pd/vpermil2ps immediates to 4 bits in the assembly parser.
The upper 4 bits of the immediate byte are used to encode a
register. We need to limit the explicit immediate to fit in the
remaining 4 bits.

Fixes PR42899.

llvm-svn: 368123
2019-08-07 05:34:27 +00:00
Mitch Phillips bd0d97e1c4 Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default."
This reverts commit 3de33245d2.

This commit broke the MSan buildbots. See
https://reviews.llvm.org/rL367901 for more information.

llvm-svn: 368107
2019-08-06 23:00:43 +00:00
Craig Topper ecc1e5d476 [X86] Don't allow combineSIntToFP to create v2i32 vectors after type legalization.
If we're after type legalization we should only be trying to turn
v2i64 into v2i32. So bitcast to v4i32, shuffle the even elements
together. Then use X86ISD::CVTSI2P. The alternative is to leave
the v2i64 type alone and let it be scalarized. Hopefully keeping
it packed is better.

Fixes PR42905.

llvm-svn: 368091
2019-08-06 21:43:15 +00:00
Aditya Nandakumar c8ac029d0a [GISel]: Add GISelKnownBits analysis
https://reviews.llvm.org/D65698

This adds a KnownBits analysis pass for GISel. This was done as a
pass (compared to static functions) so that we can add other features
such as caching queries (within a pass and across passes) in the future.
This patch only adds the basic pass boilerplate, and implements a lazy,
non-caching known-bits implementation (ported from SelectionDAG). I've
also hooked up the AArch64PreLegalizerCombiner pass to use this - there
should be no compile time regression as the analysis is lazy.
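
A usage sketch (signatures approximate): a combine asks the analysis for the known bits of a virtual register and acts on, say, the sign bit:

```
#include "llvm/CodeGen/GlobalISel/GISelKnownBits.h"

// Sketch: query the lazily computed known bits of a vreg from a combine.
static bool isKnownNonNegative(llvm::GISelKnownBits &KB, llvm::Register Reg) {
  llvm::KnownBits Known = KB.getKnownBits(Reg);
  return Known.isNonNegative(); // sign bit known to be zero
}
```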

llvm-svn: 368065
2019-08-06 17:18:29 +00:00
Daniel Sanders d9934d4939 [globalisel] Allow SrcOp to convert an APInt and render it as an immediate operand (MO.isImm() == true)
Summary:
This is tested by D61289 but has been pulled into a separate patch at
a reviewers request.

Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm, rovka

Reviewed By: arsenm

Subscribers: javed.absar, hiraditya, wdng, kristof.beyls, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61321

llvm-svn: 368063
2019-08-06 17:16:27 +00:00
Roman Lebedev 213817327f [X86] Move CPU features for Barcelona/K10 out of line
Summary:
Cleans X86.td's Barcelona entry to be more like the others,
by moving the features out of the `Proc<>`, thus potentially
making it possible to inherit from them.
Split off from D63628

Reviewers: craig.topper, RKSimon

Reviewed By: craig.topper

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65791

llvm-svn: 368061
2019-08-06 17:04:02 +00:00
Sander de Smalen ad7e95df5a [AArch64] NFC: Generalize emitFrameOffset to support more than byte offsets.
Refactor emitFrameOffset to accept a StackOffset struct as its offset argument.
This method currently only supports byte offsets (MVT::i8) but will be extended
in a later patch to support scalable offsets (MVT::nxv1i8) as well.

Reviewers: thegameg, rovka, t.p.northover, efriedma, greened

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D61436

llvm-svn: 368049
2019-08-06 15:06:31 +00:00
Tim Northover b5abc425d2 AArch64: bail instead of asserting on unexpected type in G_CONSTANT 0.
llvm-svn: 368031
2019-08-06 13:34:08 +00:00
Simon Pilgrim cf62047d29 [X86][SSE] Call SimplifyMultipleUseDemandedBits on PACKSS/PACKUS arguments.
This mainly helps to replace unused arguments with UNDEF in the case where they have multiple users.

llvm-svn: 368026
2019-08-06 13:10:42 +00:00
Sander de Smalen 612b038966 [AArch64] NFC: Add generic StackOffset to describe scalable offsets.
To support spilling/filling of scalable vectors we need a more generic
representation of a stack offset than simply 'int'.

For this we introduce the StackOffset struct, which comprises multiple
offsets sized by their respective MVTs. Byte-offsets will thus be a simple
tuple such as { offset, MVT::i8 }. Adding two byte-offsets will result in a
byte offset { offsetA + offsetB, MVT::i8 }. When two offsets have different
types, we can canonicalise them to use the same MVT, as long as their
runtime sizes are guaranteed to have the same size-ratio as they would have
at compile-time.

When we have both scalable- and fixed-size objects on the stack, we can 
create an offset that is: 

  ({ offset_fixed, MVT::i8 } + { offset_scalable, MVT::nxv1i8 })

The struct also contains a getForFrameOffset() method that is specific to
AArch64 and decomposes the frame-offset to be used directly in instructions
that operate on the stack or index into the stack.
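
A condensed sketch of that shape (constructor and field names are illustrative, not the exact patch):

```
// Sketch: fixed and scalable offset parts are tracked separately and can
// only be folded into a single number once the runtime vector length is
// known.
class StackOffset {
  int64_t Bytes = 0;         // fixed part, in MVT::i8 units
  int64_t ScalableBytes = 0; // scalable part, in MVT::nxv1i8 units

public:
  StackOffset(int64_t Fixed, int64_t Scalable)
      : Bytes(Fixed), ScalableBytes(Scalable) {}

  StackOffset operator+(const StackOffset &RHS) const {
    // Adding two offsets adds the components independently.
    return StackOffset(Bytes + RHS.Bytes, ScalableBytes + RHS.ScalableBytes);
  }
};
```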

Note: This patch adds StackOffset as an AArch64-only concept, but we would
like to make this a generic concept/struct that is supported by all 
interfaces that take or return stack offsets (currently as 'int'). Since
that would be a bigger change that is currently pending on D32530 landing,
we thought it makes sense to first show/prove the concept in the AArch64
target before proposing to roll this out further.

Reviewers: thegameg, rovka, t.p.northover, efriedma, greened

Reviewed By: rovka, greened

Differential Revision: https://reviews.llvm.org/D61435

llvm-svn: 368024
2019-08-06 13:06:40 +00:00
Simon Pilgrim 01d267dc4f [X86] SimplifyMultipleUseDemandedBits - target shuffles might not be identity
If we don't demand any non-undef shuffle elements then the assert will fail as all shuffle inputs would still be flagged as 'identity' safe.

Exposed by an incoming patch.

llvm-svn: 368022
2019-08-06 12:41:29 +00:00
Simon Pilgrim c6735aecfa [X86][SSE] Enable min/max partial reduction
As mentioned on D65047 / rL366933 the plan is to enable partial reduction handling wherever possible.

llvm-svn: 368016
2019-08-06 11:00:34 +00:00
Cullen Rhodes ced419f4d7 [SelectionDAG] Extend base addressing modes supported by MGATHER/MSCATTER
Summary:
Before this patch MGATHER/MSCATTER was capable of representing all
common addressing modes, but only when illegal types were used.
This patch adds an IndexType property so more representations
are available when using legal types only.

Original modes:
 vector of bases
 base + vector of signed scaled offsets

New modes:
 base + vector of signed unscaled offsets
 base + vector of unsigned scaled offsets
 base + vector of unsigned unscaled offsets

The current behaviour of addressing modes for gather/scatter remains
unchanged.
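
These modes map naturally onto a small index-type property; a sketch of its shape (enumerator names approximate):

```
// Sketch of the IndexType property carried by MGATHER/MSCATTER nodes.
enum MemIndexType {
  SIGNED_SCALED,    // base + vector of signed, scaled offsets (original mode)
  SIGNED_UNSCALED,  // base + vector of signed, unscaled offsets
  UNSIGNED_SCALED,  // base + vector of unsigned, scaled offsets
  UNSIGNED_UNSCALED // base + vector of unsigned, unscaled offsets
};
```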

Patch by Paul Walker.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D65636

llvm-svn: 368008
2019-08-06 09:46:13 +00:00
Tim Northover de98e92bc2 AArch64: use xzr/wzr for constant 0 in GlobalISel.
COPYs from xzr and wzr can often be folded away entirely during register
allocation, unlike a movz.

llvm-svn: 368003
2019-08-06 09:18:41 +00:00
Bill Wendling ebc2cf9c27 Use "isa" since the variable isn't used.
llvm-svn: 367985
2019-08-06 07:27:26 +00:00
Matt Arsenault f4d3113a5f CodeGen: Migration to using Register
llvm-svn: 367974
2019-08-06 03:59:31 +00:00
Austin Kerbow a05c384132 Re-commit: [AMDGPU] Use S_DENORM_MODE for gfx10
Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10.

Reviewers: arsenm, rampitec

Reviewed By: arsenm, rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65620

llvm-svn: 367969
2019-08-06 02:16:11 +00:00
Shiva Chen b12056bd33 [RISCV] Custom legalize i32 operations for RV64 to reduce signed extensions
Differential Revision: https://reviews.llvm.org/D65434

llvm-svn: 367960
2019-08-06 00:24:00 +00:00
Amara Emerson bc1172df14 [GlobalISel][CallLowering] Rename isArgumentHandler() -> isIncomingArgumentHandler()
Previous name and comment incorrectly implied it was just for formal arg handlers,
which is not true.

llvm-svn: 367945
2019-08-05 23:05:28 +00:00
Keno Fischer 5c3cdef84b [WebAssembly] Fix conflict between ret legalization and sjlj
Summary:
When the WebAssembly backend encounters a return type that doesn't
fit within i32, SelectionDAG performs sret demotion, adding an
additional argument to the start of the function that contains
a pointer to an sret buffer to use instead. However, this conflicts
with the emscripten sjlj lowering pass. There we translate calls like:

```
	call {i32, i32} @foo()
```

into (in pseudo-llvm)
```
	%addr = @foo
	call {i32, i32} @__invoke_{i32,i32}(%addr)
```

i.e. we perform an indirect call through an extra function.
However, the sret transform now transforms this into
the equivalent of
```
        %addr = @foo
        %sret = alloca {i32, i32}
        call {i32, i32} @__invoke_{i32,i32}(%sret, %addr)
```
(while simultaneously translating the implementation of @foo as well).
Unfortunately, this doesn't work out. The __invoke_ ABI expected
the function address to be the first argument, causing crashes.

There are several possible ways to fix this:
1. Implementing the sret rewrite at the IR level as well and performing
   it as part of lowering to __invoke
2. Fixing the wasm backend to recognize that __invoke has a special ABI
3. A change to the binaryen/emscripten ABI to recognize this situation

This revision implements the middle option, teaching the backend to
treat __invoke_ functions specially in sret lowering. This is achieved
by
1) Introducing a new CallingConv ID for invoke functions
2) When this CallingConv ID is seen in the backend and the first argument
   is marked as sret (a function pointer would never be marked as sret),
   swapping the first two arguments.
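
A fragment-level sketch of step 2 (inside the backend's outgoing-argument lowering; variable names approximate): since a function pointer is never marked sret, an sret first argument means the demotion inserted the buffer ahead of the target address, so the two are swapped back:

```
// __invoke_* expects the callee address as the first argument.
if (CallConv == CallingConv::WASM_EmscriptenInvoke && Outs[0].Flags.isSRet())
  std::swap(Outs[0], Outs[1]); // restore (fnaddr, sret, ...) ordering
```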

Reviewed By: tlively, aheejin
Differential Revision: https://reviews.llvm.org/D65463

llvm-svn: 367935
2019-08-05 21:36:09 +00:00
Amara Emerson 85e5e28ab4 [AArch64][GlobalISel] Inline tiny memcpy et al at -O0.
FastISel has already done this since the initial arm64 port was upstreamed, so
it seems there are no issues with doing this at -O0 for very small memcpys.

Gives a 0.2% geomean code size improvement on CTMark.

Differential Revision: https://reviews.llvm.org/D65758

llvm-svn: 367919
2019-08-05 20:02:52 +00:00
Dmitri Gribenko 37aa8ad663 Revert "[AMDGPU] Use S_DENORM_MODE for gfx10"
This reverts commit r367882. It broke the test
MC/Disassembler/AMDGPU/gfx10_dasm_all.txt.

llvm-svn: 367904
2019-08-05 18:36:43 +00:00
Craig Topper 3de33245d2 [X86] Enable -x86-experimental-vector-widening-legalization by default.
This patch changes our default legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the element widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.

This has the potential to cause regressions so we wanted to get
it in early in the 10.0 cycle so we have plenty of time to
address them.

Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.

llvm-svn: 367901
2019-08-05 18:25:36 +00:00
Evandro Menezes a005c1ac4f [AArch64] Expand bcmp() for small block lengths
Patch D56593 by @courbet results in calls to `bcmp()` in some cases, should
the target support it, unless `TTI::MemCmpExpansionOptions()`
is overridden by the target.

In a proprietary benchmark we see a performance drop of about 12% on PNG
compression before this patch, though it passes all tests.

This patch mirrors X86 for AArch64 and initializes
`TTI::MemCmpExpansionOptions()` to then expand calls to `bcmp()` when
appropriate.  No tuning of the parameters was performed, but, at this point,
it's enough to recover the performance drop above.

This problem also exists on ARM.  Once a consensus is reached for AArch64, we
can work to fix ARM as well.
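
The rough shape of the hook (mirroring what X86 does; the field values here are illustrative, not the tuned AArch64 numbers):

```
#include "llvm/Analysis/TargetTransformInfo.h"
using namespace llvm;

// Sketch: returning a non-trivial options struct opts the target into
// expanding small memcmp/bcmp calls inline instead of calling the library.
TargetTransformInfo::MemCmpExpansionOptions
enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) {
  TargetTransformInfo::MemCmpExpansionOptions Options;
  Options.MaxNumLoads = OptSize ? 4 : 8; // cap on loads per expansion
  Options.LoadSizes = {8, 4, 2, 1};      // load widths to try, widest first
  return Options;
}
```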

Authors:
- Evandro Menezes (@evandro) <e.menezes@samsung.com>
- Brian Rzycki (@brzycki) <b.rzycki@samsung.com>

Differential revision: https://reviews.llvm.org/D64805

llvm-svn: 367898
2019-08-05 18:09:14 +00:00
Pablo Barrio a8426b43f8 [AArch64] Set preferred function alignment to 16 bytes on Neoverse N1
Summary:
The Arm Neoverse N1 Software Optimization Guide [1], Section "4.8 Branch
instruction alignment" states:

"Consider aligning subroutine entry points and branch targets to 32B
boundaries, within the bounds of the code-density requirements of the
program."

This patch sets the preferred function alignment on Neoverse N1 to 2^4=16B.
This was already the case in some of the latest Cortex-A CPUs. Benchmarking
in previous Cortex-A CPUs suggested that 16B alignment is already better
than the default. See commit d04ee305.

The reason we don't set it to 32B right now (as the optimisation guide
suggests) is that this will impact code size and perhaps the instruction
cache performance. Therefore we need benchmark numbers first.

I have also added testing for A75 and A76 that we were missing.

[1] https://developer.arm.com/docs/swog309707/latest

Reviewers: fhahn, greened, samparker, dmgreen

Reviewed By: dmgreen

Subscribers: dmgreen, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65654

llvm-svn: 367894
2019-08-05 17:38:58 +00:00
Austin Kerbow 8d229dbb47 [AMDGPU] Use S_DENORM_MODE for gfx10
Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10.

Reviewers: arsenm, rampitec

Reviewed By: arsenm, rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65620

llvm-svn: 367882
2019-08-05 16:09:49 +00:00
Tom Stellard e15d95a987 AMDGPU/LoadStoreOptimizer: Set the correct offset when merging MMOs
Summary:
This is a follow up to r367237.  MachineFunction::getMachineMemOperand()
adds the offset parameter to the existing offset instead of resetting it.
So we need to reset the offset to the correct value after calling this
function.

Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65557

llvm-svn: 367881
2019-08-05 16:08:44 +00:00
Matt Arsenault 3922392969 AMDGPU: Correct behavior of f16 buffer loads
Don't assume format loads for f16. Also fixes support for targets
without i16.

llvm-svn: 367879
2019-08-05 15:59:07 +00:00
Matt Arsenault 0e0a1c80fb AMDGPU: Correct behavior of f16/i16 non-format store intrinsics
This was switching to use a format store for a non-format store for
f16 types. Also fixes i16/f16 stores on targets without legal f16.

The corresponding loads also need to be fixed.

llvm-svn: 367872
2019-08-05 14:57:59 +00:00
Matt Arsenault ff6b007772 AMDGPU/GlobalISel: Alternative mappings for constants
Without context we assume SGPR. Allowing VGPR constants theoretically
helps avoid a copy. This seems to not actually work now, and the
choice isn't based on the use bank.

llvm-svn: 367871
2019-08-05 14:40:26 +00:00
Matt Arsenault 4e21730300 AMDGPU/GlobalISel: Don't reject shader types
I'm not sure what complications these present, but the current
argument lowering is pretty much directly copied from the DAG
lowering, so I assume these work as they should.

No tests because I'm lazy and things are getting pretty close to the
point where the existing calling-conventions.ll can be shared with
SelectionDAG.

llvm-svn: 367870
2019-08-05 14:40:23 +00:00
Cullen Rhodes 2a48176373 [AArch64] Implement initial SVE calling convention support
Summary:

This patch adds initial support for the SVE calling convention such that
SVE types can be passed as arguments and return values to/from a
subroutine.

The SVE AAPCS states [1]:

    z0-z7 are used to pass scalable vector arguments to a subroutine,
    and to return scalable vector results from a function. If a
    subroutine takes arguments in scalable vector or predicate
    registers, or if it is a function that returns results in such
    registers, it must ensure that the entire contents of z8-z23 are
    preserved across the call. In other cases it need only preserve the
    low 64 bits of z8-z15, as described in §5.1.2.

    p0-p3 are used to pass scalable predicate arguments to a subroutine
    and to return scalable predicate results from a function. If a
    subroutine takes arguments in scalable vector or predicate
    registers, or if it is a function that returns results in these
    registers, it must ensure that p4-p15 are preserved across the call.
    In other cases it need not preserve any scalable predicate register
    contents.

SVE predicate and data registers are passed indirectly (i.e. spilled to the
stack with the address passed instead) if they exceed the registers used for argument
passing defined by the PCS referenced above.  Until SVE stack support is merged
we can't spill SVE registers to the stack, so currently an llvm_unreachable is
used where we will eventually handle this.

[1] https://static.docs.arm.com/100986/0000/100986_0000.pdf

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D65448

llvm-svn: 367859
2019-08-05 13:44:10 +00:00
Sanjay Patel eaf13044bd [DAGCombiner][x86] prevent infinite loop from truncate/extend transforms
The test case is based on the example from the post-commit thread for:
https://reviews.llvm.org/rGc9171bd0a955

This replaces the x86-specific simple-type check from:
rL367766
with a check in the DAGCombiner. Adding the check isn't
strictly necessary after the fix from:
rL367768
...but it seems likely that we're heading for trouble if
we are creating weird types in this transform.

I combined the earlier legality check into the initial
clause to simplify the code.

So we should only try the trunc/sext transform at the
earliest combine stage, but we limit the transform to
simple types anyway because the TLI hook is probably
too lax about what it considers a free truncate.

llvm-svn: 367834
2019-08-05 11:27:07 +00:00
Florian Hahn e3ea97b049 [AArch64] Skip isZIPMask check for masks with an odd number of elements.
We process 2 elements at a time and expect the number of elements to be
even. Similar to D60690.

Reviewers: dmgreen, samparker, t.p.northover

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D65400

llvm-svn: 367831
2019-08-05 11:12:23 +00:00
Guillaume Chatelet c97a3d15d2 [LLVM][Alignment] Introduce Alignment Type
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
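
A minimal sketch of the idea (the real type lives in D64790; the names here are illustrative):

```
#include <cassert>
#include <cstdint>

// Sketch: an alignment is a non-zero power of two by construction, so the
// invariant is enforced once at the type boundary instead of at every use.
struct Align {
  uint64_t Value;
  explicit Align(uint64_t V) : Value(V) {
    assert(V != 0 && (V & (V - 1)) == 0 && "alignment must be a power of 2");
  }
};

inline bool isAligned(Align A, uint64_t Offset) {
  return (Offset & (A.Value - 1)) == 0;
}
```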

Reviewers: courbet, jfb, jakehehrlich

Reviewed By: jfb

Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65514

llvm-svn: 367828
2019-08-05 11:02:05 +00:00
Nicolai Haehnle e204786b6c AMDGPU: add missing llvm.amdgcn.{raw,struct}.buffer.atomic.{inc,dec}
Summary:
Wrapping increment/decrement. These aren't exposed by many APIs...

Change-Id: I1df25c7889de5a5ba76468ad8e8a2597efa9af6c

Reviewers: arsenm, tpr, dstuttard

Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65283

llvm-svn: 367821
2019-08-05 09:36:06 +00:00
Oliver Stannard 8ed8353fc4 Reland: Fix and test inter-procedural register allocation for ARM
Add an explicit construction of the ArrayRef; gcc 5 and earlier don't
seem to select the ArrayRef constructor which takes a C array when the
construction is implicit.

Original commit message:

- Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves
  with a null RegScavenger. Simply not updating the register scavenger
  is fine because IPRA only cares about the SavedRegs vector, the acutal
  code of the function has already been generated at this point.
- Add a new hook to TargetRegisterInfo to get the set of registers which
  can be clobbered inside a call, even if the compiler can see both
  sides, by linker-generated code.

Differential revision: https://reviews.llvm.org/D64908

llvm-svn: 367819
2019-08-05 09:04:10 +00:00
Fangrui Song d9b948b6eb Rename F_{None,Text,Append} to OF_{None,Text,Append}. NFC
F_{None,Text,Append} are kept for compatibility since r334221.

llvm-svn: 367800
2019-08-05 05:43:48 +00:00
Craig Topper 635f5ff580 [X86] Fix a bad early out in combineExtInVec that prevented recursive shuffle combining from running with -x86-experimental-vector-widening-legalization.
llvm-svn: 367798
2019-08-05 03:48:31 +00:00
Craig Topper 5a4989e2ac [TargetLowering][X86] Teach SimplifyDemandedVectorElts to replace the base vector of INSERT_SUBVECTOR with undef if none of the elements are demanded even if the node has other users.
Summary:
The SimplifyDemandedVectorElts function can replace a vector with undef
when no elements are demanded, but due to how it interacts with
TargetLoweringOpts, it can only do this when the node has
no other users.

Remove a now unneeded DAG combine from the X86 backend.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65713

llvm-svn: 367788
2019-08-04 17:30:41 +00:00
Simon Pilgrim 436fd52a71 [X86] lowerShuffleAsSpecificZeroOrAnyExtend - use undef PSHUFB mask indices for ANY_EXTEND shuffles
llvm-svn: 367784
2019-08-04 13:15:23 +00:00
Simon Pilgrim c5891eaa34 Fix signed/unsigned comparison warning. NFC.
llvm-svn: 367783
2019-08-04 12:48:19 +00:00