Commit Graph

52710 Commits

Author SHA1 Message Date
Adhemerval Zanella bce9e11a7b [AArch64] Handle ISD::LROUND and ISD::LLROUND for float16
This patch is a follow up for D61391 to add lround/llround
support for float16.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62861

llvm-svn: 362698
2019-06-06 11:53:26 +00:00
Dmitri Gribenko 8c2c072582 Include what you use in LanaiAsmParser.cpp
llvm-svn: 362696
2019-06-06 10:37:06 +00:00
Petar Avramovic 81132ce0e9 [MIPS GlobalISel] Select sqrt
Select G_FSQRT for MIPS32.

Differential Revision: https://reviews.llvm.org/D62905

llvm-svn: 362692
2019-06-06 10:00:41 +00:00
Petar Avramovic 0a1fd355b2 [MIPS GlobalISel] Select fabs
Select G_FABS for MIPS32.

Differential Revision: https://reviews.llvm.org/D62903

llvm-svn: 362690
2019-06-06 09:22:37 +00:00
Petar Avramovic a7d0006447 [MIPS GlobalISel] Select fpext and fptrunc
Select G_FPEXT and G_FPTRUNC for MIPS32.

Differential Revision: https://reviews.llvm.org/D62902

llvm-svn: 362689
2019-06-06 09:16:58 +00:00
Petar Avramovic faaa2b5d21 [MIPS GlobalISel] Select floor and ceil
Select G_FFLOOR and G_FCEIL for MIPS32.

Differential Revision: https://reviews.llvm.org/D62901

llvm-svn: 362688
2019-06-06 09:02:24 +00:00
Amara Emerson d3144a4abc [AArch64][GlobalISel] Add manual selection support for G_ZEXTLOADs to s64.
We already get support for G_ZEXTLOAD to s32 from the importer, but it can't
deal with the SUBREG_TO_REG in the pattern. Tweaking the existing manual
selection code for G_LOAD to handle an additional SUBREG_TO_REG when dealing
with G_ZEXTLOAD isn't much work.

Also add tests to check the imported pattern selections to s32 work.

llvm-svn: 362681
2019-06-06 07:58:37 +00:00
Amara Emerson d940e20051 [AArch64][GlobalISel] Add the new changes to fix PR42129 that were supposed to go into r362666.
The changes weren't staged so ended up just re-commiting the unmodified reverted change.

llvm-svn: 362677
2019-06-06 07:33:47 +00:00
Craig Topper 9226ba6b37 [X86] Don't turn avx masked.load with constant mask into masked.load+vselect when passthru value is all zeroes.
This is intended to enable the use of an immediate blend or
more optimal instruction. But if the passthru is zero we don't
need any additional instructions.

llvm-svn: 362675
2019-06-06 05:41:27 +00:00
Amara Emerson c37ff0d138 Revert "Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp""
When looking through copies, make sure to not try to find the vreg def of a physreg.
Normally getVRegDef will return nullptr in this case, but if there happens to be
multiple defs then it will assert.

This fixes PR42129.

llvm-svn: 362666
2019-06-05 23:46:16 +00:00
Matt Arsenault 34c8b835b1 AMDGPU: Don't fix emergency stack slot at offset 0
This forced the caller to be aware of this, which is an ugly ABI
feature.

Partially reverts r295877. The original reasons for doing this are
mostly fixed. Alloca is now in a non-0 address space, so it should be
OK to have 0 as a valid pointer. Since we treat the absolute address
as the pointer value, this part only really needed to apply to
kernels.

Since r357093, we avoid the need to increment/decrement the offset
register in more cases, and since r354816 the scavenger can fail
without spilling, so it's less critical that we try to avoid an offset
that fits in the MUBUF offset.

Restrict to callable functions for now to split this into 2 steps to
limit thte number of test updates and in case anything breaks.

llvm-svn: 362665
2019-06-05 22:37:50 +00:00
Ulrich Weigand 6c5d5ce551 Allow target to handle STRICT floating-point nodes
The ISD::STRICT_ nodes used to implement the constrained floating-point
intrinsics are currently never passed to the target back-end, which makes
it impossible to handle them correctly (e.g. mark instructions are depending
on a floating-point status and control register, or mark instructions as
possibly trapping).

This patch allows the target to use setOperationAction to switch the action
on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code
will stop converting the STRICT nodes to regular floating-point nodes, but
instead pass the STRICT nodes to the target using normal SelectionDAG
matching rules.

To avoid having the back-end duplicate all the floating-point instruction
patterns to handle both strict and non-strict variants, we make the MI
codegen explicitly aware of the floating-point exceptions by introducing
two new concepts:

- A new MCID flag "mayRaiseFPException" that the target should set on any
  instruction that possibly can raise FP exception according to the
  architecture definition.
- A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI
  instruction resulting from expansion of any constrained FP intrinsic.

Any MI instruction that is *both* marked as mayRaiseFPException *and*
FPExcept then needs to be considered as raising exceptions by MI-level
codegen (e.g. scheduling).

Setting those two new flags is straightforward. The mayRaiseFPException
flag is simply set via TableGen by marking all relevant instruction
patterns in the .td files.

The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes
in the SelectionDAG, and gets inherited in the MachineSDNode nodes created
from it during instruction selection. The flag is then transfered to an
MIFlag when creating the MI from the MachineSDNode. This is handled just
like fast-math flags like no-nans are handled today.

This patch includes both common code changes required to implement the
new features, and the SystemZ implementation.

Reviewed By: andrew.w.kaylor

Differential Revision: https://reviews.llvm.org/D55506

llvm-svn: 362663
2019-06-05 22:33:10 +00:00
Petr Hosek 2f94203e23 Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp"
This reverts commit r362435 as this triggers ICE, see PR42129 for details.

llvm-svn: 362662
2019-06-05 22:27:31 +00:00
Matt Arsenault b812b7a45e AMDGPU: Invert frame index offset interpretation
Since the beginning, the offset of a frame index has been consistently
interpreted backwards. It was treating it as an offset from the
scratch wave offset register as a frame register. The correct
interpretation is the offset from the SP on entry to the function,
before the prolog. Frame index elimination then should select either
SP or another register as an FP.

Treat the scratch wave offset on kernel entry as the pre-incremented
SP. Rely more heavily on the standard hasFP and frame pointer
elimination logic, and clean up the private reservation code. This
saves a copy in most callee functions.

The kernel prolog emission code is still kind of a mess relying on
checking the uses of physical registers, which I would prefer to
eliminate.

Currently selection directly emits MUBUF instructions, which require
using a reference to some register. Use the register chosen for SP,
and then ignore this later. This should probably be cleaned up to use
pseudos that don't refer to any specific base register until frame
index elimination.

Add a workaround for shaders using large numbers of SGPRs. I'm not
sure these cases were ever working correctly, since as far as I can
tell the logic for figuring out which SGPR is the scratch wave offset
doesn't match up with the shader input initialization in the shader
programming guide.

llvm-svn: 362661
2019-06-05 22:20:47 +00:00
Craig Topper 3975b15dba [X86] Fix mistake that marked VADDSSrrb_Int/VADDSDrrb_Int/VMULSSrrb_Int/VMULSDrrb_Int as commutable.
One of the sources controls the pass through value for the upper bits
of the result so we can't really commute it.

In practice this problem isn't a functional issue because we would
only try to commute this instruction in order to fold a load. But
we can't do embedded rounding and fold a load at the same time. So
the load fold would never succeed so I don't think we would ever
commute or at least keep the version after commuting.

llvm-svn: 362647
2019-06-05 21:00:31 +00:00
Matt Arsenault 4fb580c314 AMDGPU: Remove amdgpu-max-work-group-size attribute
This has been deprecated for a long time, and mesa recently switched
to amdgpu-flat-work-group-size.

llvm-svn: 362641
2019-06-05 20:32:32 +00:00
Matt Arsenault 0f8a764e8f AMDGPU: Fix using 2 different enums for same operand flags
These enums are really for the same namespace of flags set on
arbitrary MachineOperands, so merge them to avoid value collisions.

llvm-svn: 362640
2019-06-05 20:32:25 +00:00
Dan Gohman 53572d0470 [WebAssembly] Limit PIC support to the Emscripten target
The current PIC support currently only works with Emscripten, so
disable it for other targets.

This is the PIC portion of https://reviews.llvm.org/D62542.

Reviewed By: dschuff, sbc100

llvm-svn: 362638
2019-06-05 20:01:01 +00:00
Craig Topper d0fff89b81 [X86] Add the vector integer min/max instructions to isAssociativeAndCommutative.
As far as I know these should be freely reassociatable just like
the floating point MAXC/MINC instructions.

The *reduce* test changes are largely regressions and caused by
the "generic" CPU we default to not having a scheduler model.

The machine-combiner-int-vec.ll test shows the positive benefits
of this change.

Differential Revision: https://reviews.llvm.org/D62787

llvm-svn: 362629
2019-06-05 18:25:09 +00:00
Sanjay Patel 2bf82879bd [x86] split more 256-bit stores of concatenated vectors
As suggested in D62498 - collectConcatOps() matches both
concat_vectors and insert_subvector patterns, and we see
more test improvements by using the more general match.

llvm-svn: 362620
2019-06-05 16:40:57 +00:00
Simon Pilgrim de586bd1fd [X86][AVX] Generalize split256BitStore to splitVectorStore. NFCI.
Enables us to use this to split 512-bit vectors in future patches.

llvm-svn: 362617
2019-06-05 16:14:14 +00:00
Petar Avramovic 22e99c434f [MIPS GlobalISel] Select fcmp
Select floating point compare for MIPS32.

Differential Revision: https://reviews.llvm.org/D62721

llvm-svn: 362603
2019-06-05 14:03:13 +00:00
Simon Pilgrim 886a55eaa0 [X86][AVX] combineX86ShuffleChain - combine shuffle(extractsubvector(x),extractsubvector(y))
We already handle the case where we combine shuffle(extractsubvector(x),extractsubvector(x)), this relaxes the requirement to permit different sources as long as they have the same value type.

This causes a couple of cases where the VPERMV3 binary shuffles occur at a wider width than before, which I intend to improve in future commits - but as only the subvector's mask indices are defined, these will broadcast so we don't see any increase in constant size.

llvm-svn: 362599
2019-06-05 12:56:53 +00:00
Dmitri Gribenko 6fc4c1cc54 Include what you use in PPCFrameLowering.h
llvm-svn: 362590
2019-06-05 08:58:00 +00:00
Nemanja Ivanovic 7c842fadf1 [PowerPC] Collapse RLDICL/RLDICR into RLDIC when possible
Generally speaking, we lower to an optimal rotate sequence for nodes visible in
the SDAG. However, there are instances where the two rotates are not visible at
ISEL time - most notably those in a very common sequence when lowering switch
statements to jump tables.

A common situation is a switch on a 32-bit integer. This value has to have the
upper 32 bits cleared and because jump table offsets are word offsets, the value
needs to be shifted left by 2 bits. We currently emit the clear and the left
shift as two separate instructions, but this is not needed as we can lower it to
a single RLDIC.

This patch just cleans that up.

Differential revision: https://reviews.llvm.org/D60402

llvm-svn: 362576
2019-06-05 02:36:40 +00:00
Craig Topper 78fdce25a1 [X86] Cleanup convertIntLogicToFPLogic a little. NFCI
-Use early returns to reduce indentation
-Replace multipe ifs with a switch.
-Replace an assert with an llvm_unreachable default in the switch.
-Check that the FP type we're going to use for the
 X86ISD::FAND/FOR/FXOR is legal rather than checking that the
 integer type matches the width of a legal scalar fp type. This all
 runs after legalization so it shouldn't really matter, but making
 sure we're using a valid type in the X86ISD node is really
 whats important.

llvm-svn: 362565
2019-06-05 01:00:34 +00:00
Amara Emerson 2d37cb82f0 [AArch64][GlobalISel] Make extloads to i64 legal.
Although we had the support in the prelegalizer combiner to generate the
G_SEXTLOAD or G_ZEXTLOAD ops, the legalizer definitions for arm64 had them as
lowering back to separate ops.

llvm-svn: 362553
2019-06-04 21:51:34 +00:00
Thomas Lively 3d9ca00e74 [WebAssembly] Fix ISel crash on sext_inreg/extract type mismatch
Summary:
Adjusts the index and adds a bitcast around the vector operand of
EXTRACT_VECTOR_ELT so that its lane type matches the source type of
its parent sext_inreg. Without this bitcast the ISel patterns do not
match and ISel fails.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62646

llvm-svn: 362547
2019-06-04 21:08:20 +00:00
Craig Topper 137de38009 [X86] Mutate fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE during PreProcessIselDAG to cut down on pattern permutations
We already need to have patterns for X86ISD::RNDSCALE to support software intrinsics. But we currently have 5 sets of patterns for the 5 rounding operations. For of these 6 patterns we have to support 3 vectors widths, 2 element sizes, sse/vex/evex encodings, load folding, and broadcast load folding. This results in a fair amount of bytes in the isel table.

This patch adds code to PreProcessIselDAG to morph the fceil/ffloor/ftrunc/fnearbyint/frint to X86ISD::RNDSCALE. This way we can remove everything, but the intrinsic pattern while still allowing the operations to be considered Legal for DAGCombine and Legalization. This shrinks the DAGISel by somewhere between 9K and 10K.

There is one complication to this, the STRICT versions of these nodes are currently mutated to their none strict equivalents at isel time when the node is visited. This won't be true in the future since that loses the chain ordering information. For now I've also added support for the non-STRICT nodes to Select so we can change the STRICT versions there after they've been mutated to their non-STRICT versions. We'll probably need a STRICT version of RNDSCALE or something to handle this in the future. Which will take us back to needing 2 sets of patterns for strict and non-strict, but that's still better than the 11 or 12 sets of patterns we'd need.

We can probably do something similar for scalar, but I haven't looked at it yet.

Differential Revision: https://reviews.llvm.org/D62757

llvm-svn: 362535
2019-06-04 18:03:07 +00:00
Benjamin Kramer 03ff1b3c30 [X86] Fold single-use variable into assert. NFC.
Avoids an unused variable warning in Release builds.

llvm-svn: 362534
2019-06-04 18:01:07 +00:00
Sanjay Patel 606eb2367f [x86] split 256-bit store of concatenated vectors
This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3

But as we can see in the pile of existing test diffs, it's actually a widespread problem
that affects any AVX or later target. Apart from a couple of oddballs, I think these are
all improvements for the reasons stated in the code comment: we do not want to enable YMM
unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit
stores anyway.

We could say that MergeConsecutiveStores() is going overboard on some of these examples,
but that won't solve the problem completely. But that is a reason I'm proposing this as
a lowering rather than a combine: we will infinite loop fighting the merge code if we try
this earlier.

Differential Revision: https://reviews.llvm.org/D62498

llvm-svn: 362524
2019-06-04 16:40:04 +00:00
Peter Smith f15e3d856f [AArch64][ELF] Add support for PLT decoding with BTI instructions present
Arm Architecture v8.5a introduces Branch Target Identification (BTI). When
enabled all indirect branches must target a bti instruction of the
appropriate form. As PLT sequences may sometimes be the target of an
indirect branch and PLT[0] always is, a static linker may need to generate
PLT sequences that contain "bti c" as the first instruction. In effect:
bti     c
adrp    x16, page offset to .got.plt
...
Instead of:
adrp    x16, page offset to .got.plt
...
At present the PLT decoding assumes the adrp will always be the first
instruction. This patch adds support for a single "bti c" to prefix it. A
test binary has been uploaded with such a PLT sequence. A forthcoming LLD
patch will make heavy use of the PLT decoding code.

Differential Revision: https://reviews.llvm.org/D62598

llvm-svn: 362523
2019-06-04 16:35:40 +00:00
Jinsong Ji 3144d7a2da [PowerPC] P9 Scheduling Model: dispatching rule fixes
This is to address some of the problems in existing P9 resource modeling,
especially about the dispatching rules.

Instead of using a hypothetical DISPATCHER , we try to use the number of
actual dispatch slots, and define SchedWriteRes to model dispatch rules,
then update instruction classes according to dispatch rules.

All the dispatch rules and instruction classes update are made according
to POWER9 User Manual.

Differential Revision: https://reviews.llvm.org/D61873

llvm-svn: 362509
2019-06-04 15:22:23 +00:00
Sanjay Patel 1e63dd0b44 [SelectionDAG][x86] limit post-legalization store merging by type
The proposal in D62498 showed that x86 would benefit from vector
store splitting, but that may conflict with the generic DAG
combiner's store merging transforms.

Add memory type to the existing TLI hook that enables the merging
transforms, so we can limit those changes to scalars only for x86.

llvm-svn: 362507
2019-06-04 15:15:59 +00:00
Simon Pilgrim a6e289e9f8 [X86][SSE] Pulled out (sub (xor X, M), M) 'ConditionalNegate' out pattern match code. NFCI.
As discussed on D62777 - we should be able to use this in more SSE41+ cases as well but that requires us to separate it from the OR(AND(),ANDN()) matcher.

llvm-svn: 362504
2019-06-04 15:02:33 +00:00
Shawn Landden 669775f9db [Support] make countLeadingZeros() countTrailingZeros() countLeadingOnes() and countTrailingOnes() return unsigned
This matches APInt's versions of these functions, and there is no need for these to be size_t.

(as well as __builtin_clzll())

Differential Revision: https://reviews.llvm.org/D60823

llvm-svn: 362503
2019-06-04 14:51:15 +00:00
Dmitri Gribenko 454fc77872 Include what you use in PPCRegisterInfo.cpp
llvm-svn: 362495
2019-06-04 12:55:00 +00:00
Mikhail Maltsev 08da01b496 [ARM] Add FP16 vector insert/extract patterns
This change adds two FP16 extraction and two insertion patterns
(one per possible vector length).
Extractions are handled by copying a Q/D register into one of VFP2
class registers, where single FP32 sub-registers can be accessed. Then
the extraction of even lanes are simple sub-register extractions
(because we don't care about the top parts of registers for FP16
operations). Odd lanes need an additional VMOVX instruction.

Unfortunately, insertions cannot be handled in the same way, because:
* There is no instruction to insert FP16 into an even lane (VINS only
  works with odd lanes)
* The patterns for odd lanes will have a form of a DAG (not a tree),
  and will not be implementable in pure tablegen

Because of this insertions are handled in the same way as 16-bit
integer insertions (with conversions between FP registers and GPRs
using VMOVHR instructions).

Without these patterns the ARM backend would sometimes fail during
instruction selection.

This patch also adds patterns which combine:
* an FP16 element extraction and a store into a single VST1
  instruction
* an FP16 load and insertion into a single VLD1 instruction

Differential Revision: https://reviews.llvm.org/D62651

llvm-svn: 362482
2019-06-04 09:39:55 +00:00
Dmitri Gribenko 73a15d4b78 Include what you use in PPC.h
llvm-svn: 362477
2019-06-04 09:16:35 +00:00
Dmitri Gribenko 067a17b51d Include what you use in PPCMachineScheduler.cpp
llvm-svn: 362476
2019-06-04 09:16:31 +00:00
Dmitri Gribenko 9d1c5ea165 Include what you use in PPCRegisterInfo.h
llvm-svn: 362475
2019-06-04 09:13:08 +00:00
Simon Tatham ac02445524 [ARM] Turn some undefined encoding bits into 0s.
The family of 32-bit Thumb instruction encodings that include t2ORR,
t2AND and t2EOR are all listed in the ArmARM as having (0) in bit 15.
The Tablegen descriptions of those instructions listed them as ?. This
change tightens that up by making them into 0 + Unpredictable.

In the specific case of t2ORR, we tighten it up still further by
making the zero bit mandatory. This change comes from Arm v8.1-M, in
which encodings with that bit equal to 1 will now be used for
different instructions.


Reviewers: dmgreen, samparker, SjoerdMeijer, efriedma

Reviewed By: dmgreen, efriedma

Subscribers: efriedma, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60705

llvm-svn: 362470
2019-06-04 08:28:48 +00:00
Matt Arsenault 0ceda9fb5c AMDGPU: Disable stack realignment for kernels
This is something of a workaround, and the state of stack realignment
controls is kind of a mess. Ideally, we would be able to specify the
stack is infinitely aligned on entry to a kernel.

TargetFrameLowering provides multiple controls which apply at
different points. The StackRealignable field is used during
SelectionDAG, and for some reason distinct from this
hook. StackAlignment is a single field not dependent on the
function. It would probably be better to make that dependent on the
calling convention, and the maximum value for kernels.

Currently this doesn't really change anything, since the frame
lowering mostly does its own thing. This helps avoid regressions in a
future change which will rely more heavily on hasFP.

llvm-svn: 362447
2019-06-03 21:33:22 +00:00
Jessica Paquette 7500c97ce4 [AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp
Instead of emitting all of the test stuff for a compare when it's only used by
a select, instead, just emit the compare + select. The select will use the
value of NZCV correctly, so we don't need to emit all of the test instructions
etc.

For now, only support fp selects which use G_FCMP. Also only support condition
codes which will only require one select to represent.

Also add a test.

Differential Revision: https://reviews.llvm.org/D62695

llvm-svn: 362446
2019-06-03 20:47:20 +00:00
Craig Topper dcf865f0ca [X86] Fix the pattern for merge masked vcvtps2pd.
r362199 fixed it for zero masking, but not zero masking. The load
folding in the peephole pass hid the bug. This patch turns off
the peephole pass on the relevant test to ensure coverage.

llvm-svn: 362440
2019-06-03 19:29:14 +00:00
Nemanja Ivanovic bad43d8f49 [PowerPC] Look through copies for compare elimination
We currently miss the opportunities for optmizing comparisons in the peephole
optimizer if the input is the result of a COPY since we look for record-form
versions of the producing instruction.

This patch simply lets the optimization peek through copies.

Differential revision: https://reviews.llvm.org/D59633

llvm-svn: 362438
2019-06-03 19:09:15 +00:00
Matt Arsenault 8dbeb9256c TTI: Improve default costs for addrspacecast
For some reason multiple places need to do this, and the variant the
loop unroller and inliner use was not handling it.

Also, introduce a new wrapper to be slightly more precise, since on
AMDGPU some addrspacecasts are free, but not no-ops.

llvm-svn: 362436
2019-06-03 18:41:34 +00:00
Dmitri Gribenko 26c43d0ef8 Include what you use in Lanai.h
Other files were not relying on these transitive includes, so I'm
submitting this change separately.

llvm-svn: 362423
2019-06-03 17:02:15 +00:00
Dmitri Gribenko b8aeaf882e Include what you use in LanaiAsmPrinter.cpp
llvm-svn: 362422
2019-06-03 17:02:07 +00:00
Dmitri Gribenko dc136847e3 Include what you use in LanaiMemAluCombiner.cpp
llvm-svn: 362421
2019-06-03 17:02:02 +00:00
Dmitri Gribenko f4d22bd0b4 Include what you use in LanaiISelDAGToDAG.cpp
llvm-svn: 362420
2019-06-03 17:01:57 +00:00
Dmitri Gribenko 179154f6b9 Include what you use in LanaiFrameLowering.{cpp,h}
llvm-svn: 362419
2019-06-03 17:01:52 +00:00
Dmitri Gribenko 8e317e29da Include what you use in LanaiRegisterInfo.cpp
llvm-svn: 362416
2019-06-03 16:31:37 +00:00
Dmitri Gribenko bedcaea99a Include what you use in LanaiInstrInfo.cpp
llvm-svn: 362408
2019-06-03 15:26:25 +00:00
Dmitri Gribenko b3bd866c7f Include what you use in PPCInstrInfo.h
llvm-svn: 362405
2019-06-03 15:04:05 +00:00
Dmitri Gribenko 2b369f83c5 Include what you use in NVPTX.h
Other files were not relying on these transitive includes, so I'm
submitting this change separately.

llvm-svn: 362403
2019-06-03 14:37:26 +00:00
Dmitri Gribenko 14c69fefe6 Include what you use in NVPTX.h
I also fixed all other files that were including NVPTX.h and were
relying on transitive includes.

llvm-svn: 362402
2019-06-03 14:26:50 +00:00
Dmitry Preobrazhensky 9111f35f02 [AMDGPU][MC] Added support of SCC, VCCZ and EXECZ operands
See bug 39292: https://bugs.llvm.org/show_bug.cgi?id=39292

Reviewers: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D62660

llvm-svn: 362400
2019-06-03 13:51:24 +00:00
Dmitri Gribenko 7a3e4ab286 Include what you use in LanaiInstPrinter.cpp
llvm-svn: 362395
2019-06-03 12:53:05 +00:00
Dmitri Gribenko 2f66316c96 Include what you use in LanaiMCCodeEmitter.cpp
LanaiMCCodeEmitter.cpp was not using any APIs from Lanai.h, and was only
including it for transitive dependencies.  Doing so is problematic from
include-what-you-use perspective, but it is also a layering issue (it
creates a dependency cycle between the primary Lanai target library and
the MCTargetDesc library).

llvm-svn: 362394
2019-06-03 12:42:48 +00:00
Dmitri Gribenko c69ee63cb9 Include what you use in LanaiDisassembler.cpp
llvm-svn: 362392
2019-06-03 12:37:11 +00:00
Nicolai Haehnle edfa756f3f AMDGPU/GFX10: V_CMPX_xxx instructions still have an omod operand
Summary: Change-Id: If6ee98e4a723b643bc37254fc6ef8b3812db16da

Reviewers: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62720

Change-Id: Id547ef152b2f92b24dc1c0efbf7e4467c4fb4b6e
llvm-svn: 362390
2019-06-03 12:07:41 +00:00
Dmitri Gribenko 8668fc0102 Include what you use in HexagonInstPrinter.cpp
HexagonInstPrinter.cpp was not using any APIs from HexagonAsmPrinter.h.
Doing so is problematic from include-what-you-use perspective, but it is
also a layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).

llvm-svn: 362389
2019-06-03 11:41:22 +00:00
Dmitri Gribenko 61b49ccb77 Include what you use in HexagonAsmPrinter.h
llvm-svn: 362388
2019-06-03 11:41:18 +00:00
Dmitri Gribenko 03d1b33041 Include what you use in HexagonMCInstrInfo.cpp
HexagonMCInstrInfo.cpp was not using any APIs from Hexagon.h.  Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).

llvm-svn: 362387
2019-06-03 11:25:37 +00:00
Dmitri Gribenko 970b9f961f Include what you use in HexagonMCCodeEmitter.cpp
HexagonMCCodeEmitter.cpp was not using any APIs from Hexagon.h.  Doing
so is problematic from include-what-you-use perspective, but it is also
a layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).

llvm-svn: 362386
2019-06-03 11:20:53 +00:00
Dmitri Gribenko ebe360edfa Include what you use in HexagonMCCompound.cpp
HexagonMCCompound.cpp was not using any APIs from Hexagon.h.  Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).

llvm-svn: 362385
2019-06-03 11:20:48 +00:00
Dmitri Gribenko 6e076a081a Include what you use in HexagonShuffler.cpp
HexagonShuffler.cpp was not using any APIs from Hexagon.h, and was only
including it for transitive dependencies.  Doing so is problematic from
include-what-you-use perspective, but it is also a layering issue (it
creates a dependency cycle between the primary Hexagon target library
and the MCTargetDesc library).

llvm-svn: 362384
2019-06-03 11:14:20 +00:00
Dmitri Gribenko 6214b577b7 Include what you use in HexagonMCChecker.cpp
HexagonMCChecker.cpp was not using any APIs from Hexagon.h.  Doing so is
problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).

llvm-svn: 362383
2019-06-03 11:14:15 +00:00
Dmitri Gribenko bf2a356ec0 Include what you use in HexagonMCTargetDesc.cpp
HexagonMCTargetDesc.cpp was not using any APIs from Hexagon.h.  Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).

llvm-svn: 362382
2019-06-03 11:14:10 +00:00
Dmitri Gribenko beb7f48a29 Include what you use in HexagonMCShuffler.cpp
HexagonMCShuffler.cpp was not using any APIs from Hexagon.h.  Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).

llvm-svn: 362381
2019-06-03 11:14:05 +00:00
Dmitri Gribenko 7ebfbebfe1 Include what you use in HexagonELFObjectWriter.cpp
HexagonELFObjectWriter.cpp was not using any APIs from Hexagon.h, and
was only including it for transitive dependencies.  Doing so is
problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).

llvm-svn: 362376
2019-06-03 09:56:40 +00:00
Dmitri Gribenko 0aa374a306 Include what you use in HexagonAsmBackend.cpp
HexagonAsmBackend.cpp was not using any APIs from Hexagon.h.  Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).

llvm-svn: 362372
2019-06-03 09:43:05 +00:00
Dmitri Gribenko 301f8fd632 Include what you use in HexagonAsmParser.cpp
HexagonAsmParser.cpp was not using any APIs from Hexagon.h.  Doing so is
problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the AsmParser library).

llvm-svn: 362370
2019-06-03 09:38:48 +00:00
Dmitri Gribenko c5327ab71d Include what you use in HexagonShuffler.h
HexagonShuffler.h was not using any APIs from Hexagon.h, and was only
including it for transitive dependencies.  Doing so is problematic from
include-what-you-use perspective, but it is also a layering issue (it
creates a dependency cycle between the primary Hexagon target library
and the MCTargetDesc library).

llvm-svn: 362369
2019-06-03 09:33:48 +00:00
Dmitri Gribenko 3c837201e0 Include what you use in BPFMCTargetDesc.cpp
BPFMCTargetDesc.cpp was not using any APIs from BPF.h.  Doing so is
problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
BPF target library and the MCTargetDesc library).

llvm-svn: 362368
2019-06-03 09:29:51 +00:00
Diogo N. Sampaio df92f84110 [ARM][FIX] Ran out of registers due tail recursion
Summary:
- pr42062
When compiling for MinSize,
ARMTargetLowering::LowerCall decides to indirect
multiple calls to a same function. However,
it disconsiders the limitation that thumb1
indirect calls require the callee to be in a
register from r0 to r3 (llvm limiation).
If all those registers are used by arguments, the
compiler dies with "error: run out of registers
during register allocation".
This patch tells the function
IsEligibleForTailCallOptimization if we intend to
perform indirect calls, as to avoid tail call
optimization.

Reviewers: dmgreen, efriedma

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62683

llvm-svn: 362366
2019-06-03 08:58:05 +00:00
Sam Parker a0bd6f8a1a [AArch64] Check for simple type in FPToUInt
DAGCombiner was hitting a SimpleType assertion when trying to combine
a v3f32 before type legalization.

bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41916

Differential Revision: https://reviews.llvm.org/D62734

llvm-svn: 362365
2019-06-03 08:49:17 +00:00
Jim Lin 20b14dacbb [AVR] Fix incorrect source regclass of LDWRdPtr
Summary:
LDWRdPtr would be expanded to ld+ldd. ldd only accepts the pointer register is Y or Z.
So the register class of pointer of LDWRdPtr should be PTRDISPREGS instead of PTRREGS.

Reviewers: dylanmckay

Reviewed By: dylanmckay

Subscribers: dylanmckay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62300

llvm-svn: 362351
2019-06-03 02:31:07 +00:00
Simon Pilgrim 8a32ca381d [CostModel][X86] Improve masked load/store AVX1/AVX2 costs
A mixture of internal tests and review of the scheduler models indicates we're overestimating the cost of a masked load, which we're estimating at 4x regular memory ops - more realistic values indicates that its closer to 2x. Masked stores costs are a lot more diverse but 8x is roughly in the middle of the range.

e.g. SandyBridge
defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>;
defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>;
defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;
defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;

e.g. Btver2
defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>;
defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>;
defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>;
defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>;

Differential Revision: https://reviews.llvm.org/D61257

llvm-svn: 362338
2019-06-02 20:37:02 +00:00
Simon Pilgrim 59a8db628b [TTI][X86] Cleanup getMaskedMemoryOpCost. NFCI.
Prep work before resurrecting D61257.

llvm-svn: 362335
2019-06-02 18:06:42 +00:00
Simon Pilgrim 71a39bcf68 [X86] isHorizontalBinOp - add extract_subvector(shuffle(x)) handling (PR39921)
Let's us match horizontal op patterns on fast-variable-shuffle targets (Haswell etc.)

llvm-svn: 362327
2019-06-02 15:47:49 +00:00
Simon Pilgrim 7a869e7036 [DAGCombine] Fold insert_subvector(bitcast(x),bitcast(y),c1) -> bitcast(insert_subvector(x,y),c2)
Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize.

Differential Revision: https://reviews.llvm.org/D59188

llvm-svn: 362324
2019-06-02 14:42:11 +00:00
Craig Topper 396a915c26 [X86] Add the SSE versions of PMULLW and PMULLD to isAssociativeAndCommutative.
llvm-svn: 362309
2019-06-02 00:42:58 +00:00
Simon Atanasyan 25694e0084 [mips] Extend range of register indexes accepted by cfcmsa/ctcmsa
The `cfcmsa` and `ctcmsa` instructions accept index of MSA control
register. The MIPS64 SIMD Architecture define eight MSA control
registers. But register index for `cfcmsa` and `ctcmsa` instructions
might be any number in 0..31 range. If the index is greater then 7,
`cfcmsa` writes zero to the destination registers and `ctcmsa` does
nothing [1].

[1] MIPS Architecture for Programmers Volume IV-j:
    The MIPS64 SIMD Architecture Module
https://www.mips.com/?do-download=the-mips64-simd-architecture-module

Differential Revision: https://reviews.llvm.org/D62597

llvm-svn: 362299
2019-06-01 13:55:18 +00:00
Dylan McKay 45eb4c7e55 [AVR] Disable register coalescing to the PTRDISPREGS class
If we would allow register coalescing on PTRDISPREGS class then register
allocator can lock Z register to some virtual register. Larger instructions
requiring a memory acces then fail during the register allocation phase since
there is no available register to hold a pointer if Y register was already
taken for a stack frame. This patch prevents it by keeping Z register
spillable. It does it by not allowing coalescer to lock it.

Original discussion on https://github.com/avr-rust/rust/issues/128.

llvm-svn: 362298
2019-06-01 12:38:56 +00:00
Craig Topper c288a19bb7 [X86] Add AVX512BF16 and AVX512VP2INTERSECT instructions to the loading folding tables.
llvm-svn: 362288
2019-06-01 06:20:59 +00:00
Craig Topper 48fdb61766 [X86] Make the X86FoldTablesEmitter functional again. Fix the spacing in the output to make it easier to diff.
Fix a few other formatting issues in the manual table. And remove some
old FIXMEs.

llvm-svn: 362287
2019-06-01 06:20:55 +00:00
Tom Tan eb4d6142dc [COFF, ARM64] Add CodeView register mapping
CodeView has its own register map which is defined in cvconst.h. Missing this
mapping before saving register to CodeView causes debugger to show incorrect
value for all register based variables, like variables in register and local
variables addressed by register (stack pointer + offset).

This change added mapping between LLVM register and CodeView register so the
correct register number will be stored to CodeView/PDB, it aso fixed the
mapping from CodeView register number to register name based on current
CPUType but print PDB to yaml still assumes X86 CPU and needs to be fixed.

Differential Revision: https://reviews.llvm.org/D62608

llvm-svn: 362280
2019-05-31 23:43:31 +00:00
Nick Desaulniers 7fcad2f171 [PowerPC] check for INLINEASM_BR along w/ INLINEASM
Summary:
It looks like since INLINEASM_BR was created off of INLINEASM (r353563),
a few checks for INLINEASM needed to be updated to check for either
case.

pr/41999

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: nemanjai, hiraditya, kbarton, jsji, llvm-commits, craig.topper, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62403

llvm-svn: 362278
2019-05-31 23:02:13 +00:00
Matt Arsenault 302eedcbfa AMDGPU: Fix not adding ImplicitBufferPtr as a live-in
Fixes missing test from r293000.

llvm-svn: 362275
2019-05-31 22:47:36 +00:00
Stanislav Mekhanoshin fbbe5230f4 [AMDGPU] Use InliningThresholdMultiplier for inline hint
AMDGPU uses multiplier 9 for the inline cost. It is taken into account
everywhere except for inline hint threshold. As a result we are penalizing
functions with the inline hint making them less probable to be inlined
than those without the hint. Defaults are 225 for a normal function and
325 for a function with an inline hint. Currently we have effective
threshold 225 * 9 = 2025 for normal functions and just 325 for those with
the hint. That is fixed by this patch.

Differential Revision: https://reviews.llvm.org/D62707

llvm-svn: 362239
2019-05-31 16:19:26 +00:00
Guozhi Wei c3a24e93d5 [PPC] Correctly adjust branch probability in PPCReduceCRLogicals
In PPCReduceCRLogicals after splitting the original MBB into 2, the 2 impacted branches still use original branch probability. This is unreasonable. Suppose we have following code, and the probability of each successor is 50%.

    condc = conda || condb
    br condc, label %target, label %fallthrough

It can be transformed to following,

    br conda, label %target, label %newbb
  newbb:
    br condb, label %target, label %fallthrough

Since each branch has a probability of 50% to each successor, the total probability to %fallthrough is 25% now, and the total probability to %target is 75%. This actually changed the original profiling data. A more reasonable probability can be set to 70% to the false side for each branch instruction, so the total probability to %fallthrough is close to 50%.

This patch assumes the branch target with two incoming edges have same edge frequency and computes new probability fore each target, and keep the total probability to original targets unchanged.

Differential Revision: https://reviews.llvm.org/D62430

llvm-svn: 362237
2019-05-31 16:11:17 +00:00
Cullen Rhodes 0fc3a07398 [AArch64][SVE2] Asm: support WHILE instructions
Summary:
Patch adds support for the following instructions:
    * WHILEGE, WHILEGT, WHILEHS, WHILEHI, WHILEWR, WHILERW

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62601

llvm-svn: 362215
2019-05-31 09:13:55 +00:00
Cullen Rhodes 087d1337f8 [AArch64][SVE2] Asm: support TBL/TBX instructions
Summary:
A three sources variant of the TBL instruction is added to the existing
SVE instruction in SVE2. This is implemented with minor changes to the
existing TableGen class. TBX is a new instruction with its own
definition.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62600

llvm-svn: 362214
2019-05-31 09:06:53 +00:00
Cullen Rhodes 2e870011b6 [AArch64][SVE2] Asm: support SVE2 store instructions
Summary:
Patch adds support for the following instructions:
    * STNT1B, STNT1H, STNT1S, STNT1D

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62599

llvm-svn: 362213
2019-05-31 08:59:40 +00:00
Petar Avramovic efcd3c0009 [MIPS GlobalISel] Handle position independent code
Handle position independent code for MIPS32.
When callee is global address, lower call will emit callee
as G_GLOBAL_VALUE and add target flag if needed.
Support $gp in getRegBankFromRegClass().
Select G_GLOBAL_VALUE, specially handle case when
there are target flags attached by lowerCall.

Differential Revision: https://reviews.llvm.org/D62589

llvm-svn: 362210
2019-05-31 08:27:06 +00:00
Petar Avramovic 9058b50fb2 [mips] Move initGlobalBaseReg to MipsFunctionInfo. NFC
Move initGlobalBaseReg from MipsSEDAGToDAGISel to MipsFunctionInfo.
This way functions used for handling position independent code during
instruction selection, getGlobalBaseReg and initGlobalBaseReg,
end up in same class.

Differential Revision: https://reviews.llvm.org/D62586

llvm-svn: 362206
2019-05-31 08:15:28 +00:00
Petar Avramovic f4a6dd28b6 [MIPS GlobalISel] Lower call for callee that is register
Lower call for callee that is register for MIPS32.
Register should contain callee function address.

Differential Revision: https://reviews.llvm.org/D62585

llvm-svn: 362204
2019-05-31 08:06:17 +00:00
Craig Topper 31d00d80a2 [X86] Remove patterns for X86VSintToFP/X86VUintToFP+loadv4f32 to v2f64.
These patterns can incorrectly narrow a volatile load from 128-bits to 64-bits.
Similar to PR42079.

Switch to using (v4i32 (bitcast (v2i64 (scalar_to_vector (loadi64))))) as the
load pattern used in the instructions.

This probably still has issues in 32-bit mode where loadi64 isn't legal. Maybe
we should use VZMOVL for widened loads even when we don't need the upper bits
as zeroes?

llvm-svn: 362203
2019-05-31 07:38:26 +00:00
Craig Topper b79cc5f802 [X86] Remove avx512 isel patterns for fpextend+load. Prefer to only match fp extloads instead.
DAG combine will usually fold fpextend+load to an fp extload anyway. So the
256 and 512 patterns were probably unnecessary. The 128 bit pattern was special
in that it looked for a v4f32 load, but then used it in an instruction that
only loads 64-bits. This is bad if the load happens to be volatile. We could
probably make the patterns volatile aware, but that's more work for something
that's probably rare. The peephole pass might kick in and save us anyway. We
might also be able to fix this with some additional DAG combines.

This also adds patterns for vselect+extload to enabled masked vcvtps2pd to be
used. Previously we looked for the unlikely vselect+fpextend+load.

llvm-svn: 362199
2019-05-31 06:21:53 +00:00
Craig Topper 23066033a1 [X86] Correct the ins operand order for MASKPAIR16STORE to match other store instructions.
This makes the 5 address operands come first. And the data operand comes last.

This matches the operand order the instruction is created with. It's also the
expected order in X86MCInstLower. So everything appeared to work, but the
operands didn't match their declared type.

Fixes a -verify-machineinstrs failure.

Also remove the isel patterns from these instructions since they should only
be used for stack spills and reloads. I'm not even sure what types the patterns
were looking for to match.

llvm-svn: 362193
2019-05-31 05:20:27 +00:00
Pengfei Wang 2e67d0c842 [X86] Add VP2INTERSECT instructions
Support Intel AVX512 VP2INTERSECT instructions in llvm

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D62366

llvm-svn: 362188
2019-05-31 02:50:41 +00:00
Craig Topper 70dc2200a2 [X86] Remove result type constraints from the extloadv2f32/extloadv4f32/extloadv8f32 PatFrags. NFC
The result types aren't mentioned in the pattern name so really shouldn't be in the PatFrags.

The users of these either have their own type constraint or rely on the type constranit system to realize the only legal extend would be to f64.

llvm-svn: 362175
2019-05-30 23:35:24 +00:00
Craig Topper d6b74cc859 [X86] Remove code that unnecessarily sets EXTLOAD with src type of v2f32/v4f32/v8f32 as Legal for SSE2/AVX/AVX512 respectively. NFC
The LoadExt table defaults to all combinations being Legal. For
vector types, only src VTs with an i1 element type were ever changed.
So we don't need to mark them legal manually.

llvm-svn: 362170
2019-05-30 22:29:06 +00:00
Matt Arsenault e0a4da8c0a AMDGPU/GlobalISel: Add wave scratch offset argument
Avoids crashing in PEI in a future change.

llvm-svn: 362136
2019-05-30 19:33:18 +00:00
Tim Renouf 7fecdf36cc [AMDGPU] Added target-specific attribute amdgpu-max-memory-clause
With LLPC, previous investigation has suggested that si-scheduler
interacts badly with SiFormMemoryClauses on an XNACK target in some
games.

That needs further investigation in the future. In the meantime, this
commit adds a target-specific attribute to allow us to disable
SIFormMemoryClauses by setting it to 1 on a per-function basis for LLPC
to use.

Differential Revision: https://reviews.llvm.org/D62572

Change-Id: Ia0ca12ce79093cbbe86caded723ffb13384ede92
llvm-svn: 362127
2019-05-30 18:46:34 +00:00
Sam Parker 913604a637 [NFC][ARM][ParallelDSP] Refactor narrow sequence
Most of the code used for finding a 'narrow' sequence is not used,
so I've removed it and simplified the calls from the smlad matcher.

llvm-svn: 362104
2019-05-30 15:26:37 +00:00
Sjoerd Meijer eb072b5a6a [ARM] Change the MC names for VMAXNM/VMINNM
Now the NEON ones have a prefix "NEON_", and the VFP ones have a
prefix "VFP_". This is so that the regex in ARMScheduleA57.td can be
made to match both of _those_ classes of VMAXNM without also matching
the MVE ones that are going to be introduced soon. NFCI.

Patch by Simon Tatham.

Differential Revision: https://reviews.llvm.org/D60700

llvm-svn: 362097
2019-05-30 14:34:29 +00:00
Simon Pilgrim 5359bb4d31 [ARM] LowerVECTOR_SHUFFLE - fix uninitialized variable warnings. NFCI.
llvm-svn: 362094
2019-05-30 14:01:24 +00:00
Sjoerd Meijer 930dee2c0b [ARM] add target arch definitions for 8.1-M and MVE
This adds:
- LLVM subtarget features to make all the new instructions conditional on,
- CPU and FPU names for use on clang's command line, with default FPUs set
  so that "armv8.1-m.main+fp" and "armv8.1-m.main+fp.dp" will select the right
  FPU features,
- architecture extension names "mve" and "mve.fp",
- ABI build attribute support for v8.1-M (a new value for Tag_CPU_arch) and MVE
  (a new actual tag).

Patch mostly by Simon Tatham.

Differential Revision: https://reviews.llvm.org/D60698

llvm-svn: 362090
2019-05-30 12:57:04 +00:00
Sjoerd Meijer 7eb95d672d [ARM] Introduce separate features for FP registers
The MVE extension in Arm v8.1-M permits the use of some move, load and
store isntructions which access the FP registers, even if there's no
actual FP support in the processor (in particular, if you have the
integer-only version of MVE).

Therefore, we need separate subtarget features to condition those
instructions on, which are implied by both FP and MVE but are not part
of either.

Patch mostly by Simon Tatham.

Differential Revision: https://reviews.llvm.org/D60694

llvm-svn: 362088
2019-05-30 12:37:05 +00:00
Simon Pilgrim 32aac1727a [X86][SSE] Improve bool vector extload (PR26091)
We already have good codegen for (vXiY *ext(vXi1 bitcast(iX))) cases, this patch uses it for loads of vXi1 types as well - changing the load into a iX integer load, and bitcasting so that combineToExtendBoolVectorInReg can then use it.

Differential Revision: https://reviews.llvm.org/D62449

llvm-svn: 362081
2019-05-30 10:25:20 +00:00
Cullen Rhodes 7fad428931 [AArch64][SVE2] Asm: support SVE2 vector splice (constructive)
Summary:
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62530

llvm-svn: 362073
2019-05-30 08:51:39 +00:00
Cullen Rhodes ebe23041f0 [AArch64][SVE2] Asm: support SVE2 load instructions
Summary:
Patch adds support for the following instructions:
    * LDNT1SB, LDNT1B, LDNT1SH, LDNT1H, LDNT1SW, LDNT1W, LDNT1D

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62528

llvm-svn: 362072
2019-05-30 08:44:27 +00:00
Cullen Rhodes 455c529f77 [AArch64][SVE2] Asm: support FCVTX/FLOGB instructions
Summary:

Patch completes SVE2 support for:

    SVE Floating Point Unary Operations - Predicated Group

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62526

llvm-svn: 362071
2019-05-30 08:35:12 +00:00
Cullen Rhodes 028413f5ae [AArch64][SVE2] Asm: add ext (immediate offset, constructive) instruction
Summary:
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62518

llvm-svn: 362070
2019-05-30 08:25:17 +00:00
Sjoerd Meijer 5857bf5d1e [ARM] Add an MVE execution domain
MVE architecturally specifies a 'beat' system in which a vector
instruction executed now will complete its actual operation over the
next four cycles, so it can overlap with the execution of the previous
and next MVE instruction.

This makes it generally an advantage to avoid moving values back and
forth between MVE registers and anywhere else, if there's any sensible
way to do the same processing in whatever register type the values
already occupied.

That's just what the 'execution domain' system is supposed to achieve.
So here we add a new execution domain which will contain all the MVE
vector instructions when they are added.

Patch by: Simon Tatham

Differential Revision: https://reviews.llvm.org/D60703

llvm-svn: 362068
2019-05-30 08:07:06 +00:00
Pengfei Wang 1f67d94279 [X86] Add ENQCMD instructions
For more details about these instructions, please refer to the latest
ISE document:
https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference.

Patch by Tianqing Wang (tianqing)

Differential Revision: https://reviews.llvm.org/D62281

llvm-svn: 362053
2019-05-30 03:59:16 +00:00
Pete Couperus d80024c687 [ARC] Cleanup ARCAsmPrinter.
Summary:
Remove unused getTargetStreamer.
Remove unused headers.

Reviewers: dantrushin

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62549

llvm-svn: 362021
2019-05-29 20:07:35 +00:00
Aakanksha Patil d5443f8c21 AMDGPU: Return address lowering
The patch computes the return address for the current function.

Differential revision: https://reviews.llvm.org/D59666

llvm-svn: 362001
2019-05-29 18:20:11 +00:00
Simon Atanasyan 188162118f [mips] Iterate over MSACtrlRegClass to reserve all MSA control registers. NFC
llvm-svn: 361965
2019-05-29 14:58:56 +00:00
Simon Atanasyan af7bf2f687 [mips] Use range-based for loops. NFC
llvm-svn: 361964
2019-05-29 14:58:50 +00:00
Sjoerd Meijer 24c5629625 [ARM] Split predicates out into their own .td file
The new ARMPredicates.td is included from ARM.td, early enough that
the predicate definitions are already in scope when ARMSchedule.td is
included. This will make it possible to refer to them in
UnsupportedFeatures fields of scheduling models.

NFC: the chunk of Tablegen being moved here is copied and pasted
verbatim.

Patch by: Simon Tatham

Differential Revision: https://reviews.llvm.org/D60693

llvm-svn: 361958
2019-05-29 13:41:57 +00:00
Cullen Rhodes 6c04ef3d48 [AArch64][SVE2] Asm: support SVE Bitwise Logical - Unpredicated Group
Summary:
Patch adds support for the following instructions:
    * EOR3, BSL, BCAX, BSL1N, BSL2N, NBSL, XAR

Aliases for types .B/.H/.S for EOR3 and BCAX have been added, the
preferred disassembly is .D.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62387

llvm-svn: 361936
2019-05-29 09:03:27 +00:00
Cullen Rhodes 75dfbdc2da [AArch64][SVE2] Asm: support Floating Point Widening Multiply-Add
Summary:
Patch adds support for the indexed and unpredicated vectors forms of the
FMLALB, FMLALT, FMLSLB and FMLSLT instructions.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62386

llvm-svn: 361935
2019-05-29 08:53:06 +00:00
Cullen Rhodes 4f58ad4e72 [AArch64][SVE2] Asm: support SVE2 Floating Point Pairwise Group
Summary:
Patch adds support for the following instructions:

SVE2 floating-point pairwise operations:
    * FADDP, FMAXNMP, FMINNMP, FMAXP, FMINP

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62383

llvm-svn: 361933
2019-05-29 08:40:33 +00:00
Pengfei Wang 72e3f9662b Revert "[X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to"
This reverts commit c1b3716614bc0a107e6f41a7d3d503baefad8a5b.

llvm-svn: 361918
2019-05-29 02:49:59 +00:00
Pengfei Wang 818c652643 [X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to
avoid static check fail

RegClassOrBank is an object of RegClassOrRegBank, which is defined as
using llvm::RegClassOrRegBank = typedef PointerUnion<const
TargetRegisterClass *, const RegisterBank *>
so control flow can not get here. Use ""llvm_unreachable" here to avoid
"null pointer" confusion.

Patch by Shengchen Kan (skan)

Differential Revision: https://reviews.llvm.org/D62006

Signed-off-by: pengfei <pengfei.wang@intel.com>
llvm-svn: 361912
2019-05-29 02:20:37 +00:00
Fangrui Song 656afe370d [X86] Fix x86-64 call *foo@tlsdesc(%rax) and support R_386_TLSGOTDESC R_386_TLS_DESC_CALL
D18885 emitted 5 bytes for call *foo@tlsdesc(%rax). It should use the
2-byte form instead and let R_X86_64_TLSDESC_CALL apply to the beginning
of the call instruction.

The 2-byte form was deliberately chosen to make ->LE and ->IE relaxation work:

    0:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 7 <.text+0x7>
                         3: R_X86_64_GOTPC32_TLSDESC     a-0x4
    7:   ff 10                   callq  *(%rax)
                         7: R_X86_64_TLSDESC_CALL        a

=>

    0:   48 c7 c0 fc ff ff ff    mov    $0xfffffffffffffffc,%rax
    7:   66 90                   xchg   %ax,%ax

Also change the symbol type to STT_TLS when VK_TLSCALL or VK_TLSDESC is
seen.

Reviewed By: compnerd

Differential Revision: https://reviews.llvm.org/D62512

llvm-svn: 361910
2019-05-29 02:02:59 +00:00
Thomas Lively 26d711be6e [WebAssembly] Add signatures for RINT builtins
Reviewers: azakai, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, aheejin, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62564

llvm-svn: 361904
2019-05-29 01:06:00 +00:00
Jessica Paquette b73ea75b38 [AArch64][GlobalISel] Select FCMPSri/FCMPDri when comparing against 0.0
Add support for selecting FCMPSri and FCMPDri when comparing against 0.0, and
factor out opcode selection for G_FCMP into its own function.

Add a test to show that we don't do this with other immediates.

Differential Revision: https://reviews.llvm.org/D62539

llvm-svn: 361888
2019-05-28 22:52:49 +00:00
Heejin Ahn 5514658591 [WebAssembly] Support for atomic fences
Summary:
This adds support for translation of LLVM IR fence instruction. We
convert a singlethread fence to a pseudo compiler barrier which becomes
0 instructions in final binary, and a thread fence to an idempotent
atomicrmw instruction to a memory address.

Reviewers: dschuff, jfb, sunfish, tlively

Subscribers: sbc100, jgravelle-google, llvm-commits

Differential Revision: https://reviews.llvm.org/D50277

llvm-svn: 361884
2019-05-28 22:09:12 +00:00
Konstantin Zhuravlyov fe23ed2c68 AMDGPU: Temporary drop s_mul_hi_i/u32 patterns
It introduces performance regressions in several applications.

This has already been submitted downstream.

llvm-svn: 361879
2019-05-28 21:18:34 +00:00
Adhemerval Zanella 34d8daae53 [AArch64] Handle ISD::LRINT and ISD::LLRINT
This patch optimizes ISD::LRINT and ISD::LLRINT to frintx plus
fcvtzs. It currently only handles the scalar version.

Reviewed By: SjoerdMeijer, mstorsjo

Differential Revision: https://reviews.llvm.org/D62018

llvm-svn: 361877
2019-05-28 21:04:29 +00:00
Adhemerval Zanella 6d7bf5e8df [CodeGen] Add lrint/llrint builtins
This patch add the ISD::LRINT and ISD::LLRINT along with new
intrinsics.  The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.

The idea is to optimize lrint/llrint generation for AArch64
in a subsequent patch.  Current semantic is just route it to libm
symbol.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D62017

llvm-svn: 361875
2019-05-28 20:47:44 +00:00
Michael Liao 5fc1dfa784 [AMDGPU] Correct the handling of inlineasm output registers.
Summary:
- There's a regression due to the cross-block RC assignment. Use the
  proper way to derive the output register RC in inline asm.

Reviewers: rampitec, alex-t

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, eraman, hiraditya, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62537

llvm-svn: 361868
2019-05-28 19:37:09 +00:00
Sanjay Patel f7980e727f Revert "[x86] split 256-bit store of concatenated vectors"
This reverts commit d5a8637072.

Most likely suspect for this bot failure:
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9684

llvm-svn: 361850
2019-05-28 17:37:58 +00:00
Matt Arsenault 24e80b8d04 AMDGPU: Don't enable all lanes with non-CSR VGPR spills
If the only VGPRs used for SGPR spilling were not CSRs, this was
enabling all laness and immediately restoring exec. This is the usual
situation in leaf functions.

llvm-svn: 361848
2019-05-28 16:46:02 +00:00
Michael Liao 7166843f1e [AMDGPU] Fix the mis-handling of `vreg_1` copied from scalar register.
Summary:
- Don't treat the use of a scalar register as `vreg_1` an VGPR usage.
  Otherwise, that promotes that scalar register into vector one, which
  breaks the assumption that scalar register holds the lane mask.
- The issue is triggered in a complicated case, where if the uses of
  that (lane mask) scalar register is legalized firstly before its
  definition, e.g., due to the mismatch block placement and its
  topological order or loop. In that cases, the legalization of PHI
  introduces the use of that scalar register as `vreg_1`.

Reviewers: rampitec, nhaehnle, arsenm, alex-t

Subscribers: kzhuravl, jvesely, wdng, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62492

llvm-svn: 361847
2019-05-28 16:29:39 +00:00
Simon Tatham 760df47b77 [ARM] Replace fp-only-sp and d16 with fp64 and d32.
Those two subtarget features were awkward because their semantics are
reversed: each one indicates the _lack_ of support for something in
the architecture, rather than the presence. As a consequence, you
don't get the behavior you want if you combine two sets of feature
bits.

Each SubtargetFeature for an FP architecture version now comes in four
versions, one for each combination of those options. So you can still
say (for example) '+vfp2' in a feature string and it will mean what
it's always meant, but there's a new string '+vfp2d16sp' meaning the
version without those extra options.

A lot of this change is just mechanically replacing positive checks
for the old features with negative checks for the new ones. But one
more interesting change is that I've rearranged getFPUFeatures() so
that the main FPU feature is appended to the output list *before*
rather than after the features derived from the Restriction field, so
that -fp64 and -d32 can override defaults added by the main feature.

Reviewers: dmgreen, samparker, SjoerdMeijer

Subscribers: srhines, javed.absar, eraman, kristof.beyls, hiraditya, zzheng, Petar.Avramovic, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D60691

llvm-svn: 361845
2019-05-28 16:13:20 +00:00
Fangrui Song 448a79d123 [AArch64] Delete unused VariantKind in AArch64MCExpr
llvm-svn: 361844
2019-05-28 16:11:56 +00:00
David Greene 561fcc0d63 [X86-64] Fix 256-bit SET0 lowering for non-VLX targets
If we don't have VLX then 256-bit SET0 should be lowered
to VPXOR with ZMM registers.  This restores functionality
accidentally removed by r309926.

Differential Revision: https://reviews.llvm.org/D62415

llvm-svn: 361843
2019-05-28 15:37:01 +00:00
Sanjay Patel d5a8637072 [x86] split 256-bit store of concatenated vectors
This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3

But as we can see in the pile of existing test diffs, it's actually a widespread problem
that affects any AVX or later target. Apart from a couple of oddballs, I think these are
all improvements for the reasons stated in the code comment: we do not want to enable YMM
unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit
stores anyway.

We could say that MergeConsecutiveStores() is going overboard on some of these examples,
but that won't solve the problem completely. But that is the reason I'm proposing this as
a lowering rather than a combine: we will infinite loop fighting the merge code if we try
this earlier.

Differential Revision: https://reviews.llvm.org/D62498

llvm-svn: 361822
2019-05-28 13:54:17 +00:00
Sanjay Patel 6bf4ca9d2e [x86] fix 256-bit vector store splitting to honor 'volatile'
Forking this out of the discussion in D62498
(and assuming that will be committed later, so adding the helper function here).
The LangRef says:
"the backend should never split or merge target-legal volatile load/store instructions."

Differential Revision: https://reviews.llvm.org/D62506

llvm-svn: 361815
2019-05-28 12:58:07 +00:00
Benjamin Kramer 57e267a2e9 [X86] Custom lower CONCAT_VECTORS of v2i1
The generic legalizer cannot handle this. Add an assert instead of
silently miscompiling vectors with elements smaller than 8 bits.

llvm-svn: 361814
2019-05-28 12:52:57 +00:00
Graham Hunter 19e91253c0 [NFC] Test commit, delete trailing whitespace
llvm-svn: 361813
2019-05-28 12:36:39 +00:00
Simon Pilgrim 4b48aa0e30 [X86] X86CmovConverterPass::collectCmovCandidates - fix uninitialized variable warnings. NFCI.
llvm-svn: 361804
2019-05-28 10:53:23 +00:00
Cullen Rhodes f57bd6bd23 [AArch64][SVE2] Asm: support SVE2 Floating Point Convert Group
Summary:
Patch adds support for the following intructions:

SVE2 floating-point convert precision:
    * FCVTXNT, FCVTNT, FCVTLT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62382

llvm-svn: 361801
2019-05-28 09:36:52 +00:00
Cullen Rhodes 8e91dd7934 [AArch64][SVE2] Asm: support SVE2 Crypto Extensions Group
Summary:
Patch adds support for the following instructions:

SVE2 crypto constructive binary operations:
    * SM4EKEY, RAX1

SVE2 crypto destructive binary operations:
    * AESE, AESD, SM4E

SVE2 crypto unary operations:
    * AESMC, AESIMC

AESE, AESD, AESMC and AESIMC are enabled with +sve2-aes.  SM4E and
SM4EKEY are enabled with +sve2-sm4. RAX1 is enabled with +sve2-sha3.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62307

llvm-svn: 361797
2019-05-28 09:13:17 +00:00
Cullen Rhodes c4ed601bd9 [AArch64][SVE2] Asm: support SVE2 Histogram Computation Groups
Summary:
Patch adds support for the following instructions:

SVE2 histogram generation (segment):
    * HISTSEG

SVE2 histogram generation (vector):
    * HISTCNT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62306

llvm-svn: 361796
2019-05-28 08:51:59 +00:00
Cullen Rhodes 7d9cac5bba [AArch64][SVE2] Asm: support SVE2 Misc Group
Summary:
Patch adds support for the following instructions:

SVE2 bitwise exclusive-or interleaved:
    * EORBT, EORTB

SVE2 bitwise permute:
    * BEXT, BDEP, BGRP

SVE2 bitwise shift left long:
    * SSHLLB, SSHLLT, USHLLB, USHLLT

SVE2 integer add/subtract interleaved long:
    * SADDLBT, SSUBLBT, SSUBLTB

BDEP, BEXT and BGRP are enabled with SVE2 feature +bitperm, all other
instructions in this group are enabled with +sve2.

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62304

llvm-svn: 361795
2019-05-28 08:42:22 +00:00
Alexander Timofeev f4040a0dd8 [AMDGPU] Fix for the address sanitizer failure. Fixing typo
llvm-svn: 361776
2019-05-27 18:17:21 +00:00
Dmitri Gribenko 5379f1a6c5 Include what you use in AArch64AsmBackend.cpp
AArch64AsmBackend.cpp was not using any APIs from AArch64.h, and was
only including it for transitive dependencies.  Doing so is problematic
from include-what-you-use perspective, but it is also a layering issue
(it creates a dependency cycle between the primary AArch64 target
library and the MCTargetDesc library).

llvm-svn: 361774
2019-05-27 17:03:57 +00:00
Alexander Timofeev 4a7c4069ae [AMDGPU] Fix for the address sanitizer failure caused by the ifollowing commit:
1a8b2ea611cf4ca7cb09562e0238cfefa27c05b5  Divergence driven ISel. Assign register class for cross block values according to the divergence.

llvm-svn: 361770
2019-05-27 15:03:29 +00:00
Dmitry Preobrazhensky b79af7930c [AMDGPU][MC] Enabled constant expressions as operands of s_waitcnt
See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D61017

llvm-svn: 361763
2019-05-27 14:08:43 +00:00
Diana Picus 68b20c589c [ARM GlobalISel] Cleanup CallLowering a bit
We never actually use the Offsets produced by ComputeValueVTs, so remove
them until we need them.

llvm-svn: 361755
2019-05-27 10:30:33 +00:00
Yonghong Song e698958ad8 [BPF] generate R_BPF_NONE relocation for BTF DataSec variables
The variables in BTF DataSec type encode in-section offset.
R_BPF_NONE should be generated instead of R_BPF_64_32.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D62460

llvm-svn: 361742
2019-05-26 21:26:06 +00:00
Alexander Timofeev ba447bae74 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
             the correct register classes to the cross block values beforehand. For the divergent targets
             same value type requires different register classes dependent on the value divergence.

    Reviewers: rampitec, nhaehnle

    Differential Revision: https://reviews.llvm.org/D59990

    This commit was reverted because of the build failure.
    The reason was mlformed patch.
    Build failure fixed.

llvm-svn: 361741
2019-05-26 20:33:26 +00:00
Shawn Landden 343578759e [SimplifyCFG] back out all SwitchInst commits
They caused the sanitizer builds to fail.

My suspicion is the change the countLeadingZeros().

llvm-svn: 361736
2019-05-26 18:15:51 +00:00
Simon Pilgrim a044410f37 [X86][SSE] Add shuffle combining support for ISD::ANY_EXTEND_VECTOR_INREG
Reuses what we already have in place for ISD::ZERO_EXTEND_VECTOR_INREG just with a different sentinel

llvm-svn: 361734
2019-05-26 16:00:35 +00:00
Shawn Landden b7cc093db2 [Support] make countLeadingZeros() and countTrailingZeros() return unsigned
This matches countLeadingOnes() and countTrailingOnes(), and
APInt's countLeadingZeros() and countTrailingZeros().

(as well as __builtin_clzll())

llvm-svn: 361724
2019-05-26 13:49:58 +00:00
David Green 0dbafe191e [ARM] Select fp16 fma
This adds a pattern for fma, similar to the float and double patterns.

Differential Revision: https://reviews.llvm.org/D62330

llvm-svn: 361719
2019-05-26 11:34:30 +00:00
David Green 21542cd6f4 [ARM] Select a number of fp16 rounding functions
This add patterns for fp16 round and ceil etc. Same as the float and double
patterns.

Differential Revision: https://reviews.llvm.org/D62326

llvm-svn: 361718
2019-05-26 11:13:00 +00:00
David Green c9f4b7d201 [ARM] Promote various fp16 math intrinsics
Promote a number of fp16 math intrinsics to float, so that the relevant float
math routines can be used. Copysign is expanded so as to be handled in-place.

Differential Revision: https://reviews.llvm.org/D62325

llvm-svn: 361717
2019-05-26 10:59:21 +00:00
Simon Pilgrim 58a8541dcc [X86][AVX] combineBitcastvxi1 - peek through bitops to determine size of original vector
We were only testing for direct SETCC results - this allows us to peek through AND/OR/XOR combinations of the comparison results as well.

There's a missing SEXT(PACKSS) fold that I need to investigate for v8i1 cases before I can enable it there as well.

llvm-svn: 361716
2019-05-26 10:54:23 +00:00
David Green 2881325b17 [ARM] Select fp16 fabs
This adds a pattern for the fabs intrinsic, the same as float and double.

Differential Revision: https://reviews.llvm.org/D62324

llvm-svn: 361715
2019-05-26 10:51:58 +00:00
David Green aeade651f3 [ARM] Select fp16 fsqrt
This adds a pattern for the sqrt intrinsic, the same as float and double.

Differential Revision: https://reviews.llvm.org/D62322

llvm-svn: 361714
2019-05-26 10:42:24 +00:00
David Green caf8a11b65 [ARM] Promote fp16 frem
Promote fp16 frem operations on ARM to floats so they call fmodf.

Differential Revision: https://reviews.llvm.org/D62321

llvm-svn: 361713
2019-05-26 10:30:22 +00:00
Simon Pilgrim 40fa52b174 [X86] lowerBuildVectorToBitOp - support build_vector(shift()) -> shift(build_vector(),C)
Commonly occurs in sign-extension cases

llvm-svn: 361706
2019-05-25 18:02:17 +00:00
Nikita Popov d87eceda0e [X86] Combine fminnum/fmaxnum with non-nan operand to fmin/fmax
If we have a known non-nan operand, place it in the second operand
of fmin/fmax that is returned if either operand is nan.

Differential Revision: https://reviews.llvm.org/D62448

llvm-svn: 361704
2019-05-25 16:44:29 +00:00
Craig Topper 46e5052b8e [X86FixupLEAs] Turn optIncDec into a generic two address LEA optimizer. Support LEA64_32r properly.
INC/DEC is really a special case of a more generic issue. We should also turn leas into add reg/reg or add reg/imm regardless of the slow lea flags.

This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg.

One thing to note, the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve it in the add reg/reg case if we think its important. Not sure which operand its supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed since they were probably just created by an INSERT_SUBREG from a 32-bit input.

Differential Revision: https://reviews.llvm.org/D61472

llvm-svn: 361691
2019-05-25 06:17:47 +00:00
Craig Topper 4b08fcdeb1 [X86] Add zero idioms to the haswell, broadwell, and skylake schedule models. Add 256-bit fp xor to sandybridge zero idioms
This copies the Sandy Bridge zero idiom support to later CPUs. Adding the AVX2 and AVX512F/VL instructions as appropriate.

Differential Revision: https://reviews.llvm.org/D62360

llvm-svn: 361690
2019-05-25 04:47:49 +00:00
Peter Collingbourne 3b93737446 Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence."
Broke sanitizer bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio

llvm-svn: 361688
2019-05-25 01:52:38 +00:00
Jessica Paquette 97d668d70f [GlobalISel][AArch64] Make FP constraint checks consider possible use/def banks
In a few places in getInstrMapping, we check if use/def instructions for the
instruction we're mapping have floating point constraints.

We can improve this check and reduce the number of copies in GISel-compiled code
if we make a couple observations:

- For a def instruction, it only matters if the def instruction must always
  output a value stored on a FPR

- For a use instruction, it only matters if the use instruction must always
  only take in values stored in FPRs

This adds two new functions:

- onlyUsesFP
- onlyDefinesFP

Then we can use those when we're checking the uses/defs instead.

Without this patch, the load, unmerge, store, and select in the added test
would have unnecessary copies.

Differential Revision: https://reviews.llvm.org/D62426

llvm-svn: 361679
2019-05-24 23:08:45 +00:00
Jessica Paquette bede937b16 [GlobalISel][AArch64] NFC: Factor out HasFPConstraints into a proper function
Factor it out into a function, and replace places where we had the same check
with the new function.

Differential Revision: https://reviews.llvm.org/D62421

llvm-svn: 361677
2019-05-24 22:12:21 +00:00
Jason Liu 8e1d921bb3 Implement call lowering without parameters on AIX
Summary:dd
This patch implements call lowering for calls without parameters
on AIX as initial support.

Reviewers: sfertile, hubert.reinterpretcast, aheejin, efriedma

Differential Revision: https://reviews.llvm.org/D61948

llvm-svn: 361669
2019-05-24 20:54:35 +00:00
Jessica Paquette 56503865ed [GlobalISel][AArch64] Improve register bank mappings for G_SELECT
The fcsel and csel instructions differ in only the register banks they work on.

So, they're entirely interchangeable otherwise.

With this in mind, this does two things:

- Teach AArch64RegisterBankInfo to consider the inputs to G_SELECT as well as
  the outputs.
- Teach it to choose the best register bank mapping based off the constraints
  of the inputs and outputs.

The "best" in this case means the one that requires the smallest number of
copies to properly emit a fcsel/csel.

For example, if the inputs are all already going to be on FPRs, we should
emit a fcsel, even if the output is a GPR. This costs one copy to produce the
result, but saves us from copying the inputs into GPRs.

Also update the regbank-select.mir to check that we end up with the right
select instruction.

Differential Revision: https://reviews.llvm.org/D62267

llvm-svn: 361665
2019-05-24 19:35:25 +00:00
Nick Desaulniers 33bc64202b [AArch64] check for INLINEASM_BR along w/ INLINEASM
Summary:
It looks like since INLINEASM_BR was created off of INLINEASM, a few
checks for INLINEASM needed to be updated to check for either case.

pr/41999

Reviewers: t.p.northover, peter.smith

Reviewed By: peter.smith

Subscribers: craig.topper, javed.absar, kristof.beyls, hiraditya, llvm-commits, peter.smith, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62402

llvm-svn: 361661
2019-05-24 19:00:13 +00:00
Nick Desaulniers 9f7bd71cf5 [ARM] additionally check for ARM::INLINEASM_BR w/ ARM::INLINEASM
Summary:
We were observing failures for arm32 allyesconfigs of the Linux kernel
with the asm goto Clang patch, where ldr's were being generated to
offsets too far away to encode in imm12.

It looks like since INLINEASM_BR was created off of INLINEASM, a few
checks for INLINEASM needed to be updated to check for either case.

pr/41999

Link: https://github.com/ClangBuiltLinux/linux/issues/490

Reviewers: peter.smith, kristof.beyls, ostannard, rengolin, t.p.northover

Reviewed By: peter.smith

Subscribers: jyu2, javed.absar, hiraditya, llvm-commits, nathanchance, craig.topper, kees, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62400

llvm-svn: 361659
2019-05-24 18:58:21 +00:00
Matt Arsenault 3d59e388ca AMDGPU: Activate all lanes when spilling CSR VGPR for SGPR spills
If some lanes weren't active on entry to the function, this could
clobber their VGPR values.

llvm-svn: 361655
2019-05-24 18:18:51 +00:00
Matt Arsenault 0ff901fba0 AMDGPU: Boost inline threshold with addrspacecasted alloca arguments
This was skipping GetUnderlyingObject for nonprivate addresses, but an
alloca could also be found through an addrspacecast if it's flat.

llvm-svn: 361649
2019-05-24 16:52:35 +00:00
Alexander Timofeev dffedea014 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
         the correct register classes to the cross block values beforehand. For the divergent targets
         same value type requires different register classes dependent on the value divergence.

Reviewers: rampitec, nhaehnle

Differential Revision: https://reviews.llvm.org/D59990

llvm-svn: 361644
2019-05-24 15:32:18 +00:00
Stefan Pintilie 522307fa40 [PowerPC] Remove CRBits Copy Of Unset/set CBit
For the situation, where we generate the following code:

       crxor 8, 8, 8
       < Some instructions>
.LBB0_1:
       < Some instructions>
       cror 1, 8, 8

cror (COPY of CRbit) depends on the result of the crxor instruction.
CR8 is known to be zero as crxor is equivalent to CRUNSET. We can simply use
crxor 1, 1, 1 instead to zero out CR1, which does not have any dependency on
any previous instruction.

This patch will optimize it to:

        < Some instructions>
.LBB0_1:
        < Some instructions>
        cror 1, 1, 1

Patch By: Victor Huang (NeHuang)

Differential Revision: https://reviews.llvm.org/D62044

llvm-svn: 361632
2019-05-24 12:05:37 +00:00
Cullen Rhodes b3e58df80c [AArch64][SVE2] Asm: support SVE2 String Processing Group
Summary:
Patch adds support for the SVE2 character match instructions MATCH and NMATCH.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62206

llvm-svn: 361627
2019-05-24 10:32:01 +00:00
Cullen Rhodes adb1d74bf9 [AArch64][SVE2] Asm: support SVE2 Narrowing Group
Summary:
Patch adds support for the following instructions:

SVE2 bitwise shift right narrow:
    * SQSHRUNB, SQSHRUNT, SQRSHRUNB, SQRSHRUNT, SHRNB, SHRNT, RSHRNB, RSHRNT,
      SQSHRNB, SQSHRNT, SQRSHRNB, SQRSHRNT, UQSHRNB, UQSHRNT, UQRSHRNB,
      UQRSHRNT

SVE2 integer add/subtract narrow high part:
    * ADDHNB, ADDHNT, RADDHNB, RADDHNT, SUBHNB, SUBHNT, RSUBHNB, RSUBHNT

SVE2 saturating extract narrow:
    * SQXTNB, SQXTNT, UQXTNB, UQXTNT, SQXTUNB, SQXTUNT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62205

llvm-svn: 361624
2019-05-24 10:22:30 +00:00
Cullen Rhodes 5f04f00282 [AArch64][SVE2] Asm: support SVE2 Accumulate Group
Summary:
Patch adds support for the following instructions:

SVE2 bitwise shift and insert:
    * SRI, SLI

SVE2 bitwise shift right and accumulate:
    * SSRA, USRA, SRSRA, URSRA

SVE2 complex integer add:
    * CADD, SQCADD

SVE2 integer absolute difference and accumulate:
    * SABA, UABA

SVE2 integer absolute difference and accumulate long:
    * SABALB, SABALT, UABALB, UABALT

SVE2 integer add/subtract long with carry:
    * ADCLB, ADCLT, SBCLB, SBCLT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62204

llvm-svn: 361622
2019-05-24 10:10:34 +00:00
Simon Pilgrim 95b8d9bbf8 [SelectionDAG] computeKnownBits - support constant pool values from target
This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets.

computeKnownBits then uses this function to improve codegen, notably vector code after legalization.

A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit.

This required a couple of fixes:
* SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR).
* Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets.

Differential Revision: https://reviews.llvm.org/D61887

llvm-svn: 361620
2019-05-24 10:03:11 +00:00
Cullen Rhodes 980f760515 [AArch64][SVE2] Asm: add PMULLB/PMULLT instructions
Summary:
This patch adds support for the polynomial multiplication instructions
PMULLB/PMULLT. The 64-bit source and 128-bit destination element
variants are enabled with crypto extensions (+sve2-aes), similar to the
NEON PMULL2 instruction. All other variants are enabled with +sve2.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62145

llvm-svn: 361619
2019-05-24 09:56:23 +00:00
Cullen Rhodes 8bcea9daaa [AArch64][SVE2] Asm: add integer add/sub long/wide instructions
Summary:
Patch adds support for the following instructions:

SVE2 integer add/subtract long:
    * SADDLB, SADDLT, UADDLB, UADDLT, SSUBLB, SSUBLT, USUBLB, USUBLT,
      SABDLB, SABDLT, UABDLB, UABDLT

SVE2 integer add/subtract wide:
    * SADDWB, SADDWT, UADDWB, UADDWT, SSUBWB, SSUBWT, USUBWB, USUBWT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62142

llvm-svn: 361615
2019-05-24 09:28:27 +00:00
Cullen Rhodes 968cb0e049 [AArch64][SVE2] Asm: add various bitwise shift instructions
Summary:
This patch adds support for the SVE2 saturating/rounding bitwise shift
left (predicated) group of instructions:

    * SRSHL, URSHL, SRSHLR, URSHLR, SQSHL, UQSHL, SQRSHL, UQRSHL,
      SQSHLR, UQSHLR, SQRSHLR, UQRSHLR

Immediate forms of the SQSHL and UQSHL instructions are also added to
the existing SVE bitwise shift by immediate (predicated) group, as well
as three new instructions SRSHR/URSHR/SQSHLU. The new instructions in
this group are encoded similarly and are implemented using the same
TableGen class with a minimal change (1 bit in encoding).

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62140

llvm-svn: 361612
2019-05-24 09:17:23 +00:00
Cullen Rhodes 6bca64fe5e [AArch64][SVE2] Asm: add saturating add/sub instructions
Summary:
Patch adds support for the following instructions:

    * SQADD, UQADD, SUQADD, USQADD
    * SQSUB, UQSUB, SQSUBR, UQSUBR

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62130

llvm-svn: 361611
2019-05-24 09:06:37 +00:00
Cullen Rhodes d9bb7b69ab [AArch64][SVE2] Asm: fix overlapping bit
Summary:
Bit 20 in sve2_int_arith_pred TableGen class was overlapping. The
encodings are not affected as bit 20 is defined by the opc bits
and this was overwriting the earlier error of setting bit 20 to 0.

Raised by Momchil: https://reviews.llvm.org/D62130

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62292

llvm-svn: 361609
2019-05-24 08:45:37 +00:00
Tim Northover 3b2157aeed GlobalISel: support swifterror attribute on AArch64.
swifterror marks an argument as a register pretending to be a pointer, so we
need a guaranteed mem2reg-like analysis of its uses. Fortunately most of the
infrastructure can be reused from the DAG world.

llvm-svn: 361608
2019-05-24 08:40:13 +00:00
Simon Atanasyan c1b482f2a5 [mips] Always check that `shift and add` optimization is efficient.
The D45316 introduced the `shouldTransformMulToShiftsAddsSubs` function
to check that breaking down constant multiplications into a series
of shifts, adds, and subs is efficient. Unfortunately, this function
does not check maximum number of steps on all paths of the algorithm.
This patch fixes this bug.

Fix for PR41929.

Differential Revision: https://reviews.llvm.org/D62166

llvm-svn: 361606
2019-05-24 08:39:40 +00:00
Sjoerd Meijer 937af54666 [ARM] ARMExpandPseudoInsts: add debug messages
This pass wasn't printing any messages at all, which I find really inconvenient
while debugging/tracing things. It now dumps the before and after of expanded
instructions. It doesn't do this yet for all instructions, but this is a good
start I guess.

Differential Revision: https://reviews.llvm.org/D62297

llvm-svn: 361604
2019-05-24 08:25:02 +00:00
QingShan Zhang 449bfdd1b0 [Power9] Add a specific heuristic to schedule the addi before the load
When we are scheduling the load and addi, if all other heuristic didn't take effect,
 we will try to schedule the addi before the load, to hide the latency, and avoid the
 true dependency added by RA. And this only take effects for Power9.

Differential Revision: https://reviews.llvm.org/D61930

llvm-svn: 361600
2019-05-24 05:30:09 +00:00
Reid Kleckner b7a78c7dff [AArch64] Preserve X8 for thunks ending in variadic musttail calls
Summary:
On Windows, X8 may be used to pass in the address of an aggregate that
is returned indirectly. Therefore, it should be forwarded to variadic
musttail calls and preserved in thunks.

Fixes PR41997

Reviewers: mgrang, efriedma

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62344

llvm-svn: 361585
2019-05-24 01:27:20 +00:00
Serge Pavlov ed595e8627 [AArch64] Add nvcast patterns for v2f32 -> v1f64
Summary: Constant stores of f32 values can create such NvCast nodes.

Reviewers: t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62285

llvm-svn: 361584
2019-05-24 01:20:34 +00:00
Thomas Lively 55229f6b10 [WebAssembly] Expand more SIMD float ops
Summary: These were previously causing ISel failures.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62354

llvm-svn: 361577
2019-05-24 00:15:04 +00:00
Matt Arsenault 5c714cbdd8 AMDGPU: Correct maximum possible private allocation size
We were assuming a much larger possible per-wave visible stack
allocation than is possible:

faa3ae5138/src/core/runtime/amd_gpu_agent.cpp (L70)

Based on this, we can assume the high 15 bits of a frame index or sret
are 0. The frame index value is the per-lane offset, so the maximum
frame index value is MAX_WAVE_SCRATCH / wavesize.

Remove the corresponding subtarget feature and option that made
this configurable.

llvm-svn: 361541
2019-05-23 19:38:14 +00:00
Robert Lougher 170dfeb2ff Resubmit r360436 "[X86] Avoid SFB - Fix inconsistent codegen with/without debug info"
Fixes https://bugs.llvm.org/show_bug.cgi?id=40969

The functions findPotentiallyBlockedCopies and buildCopy are currently not
accounting for the presence of debug instructions. In the former this results
in the optimization not being trigerred, and in the latter results in
inconsistent codegen.

This patch enables the optimization to be performed in a debug build and
ensures the codegen is consistent with non-debug builds.

Patch by Chris Dawson.

Differential Revision: https://reviews.llvm.org/D61680

llvm-svn: 361527
2019-05-23 18:15:12 +00:00
Thomas Lively e18b5c6237 [WebAssembly] Implement ReplaceNodeResults to fix a SIMD crash
Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61037

llvm-svn: 361526
2019-05-23 18:09:26 +00:00
Matt Arsenault 0f3ba44b57 AMDGPU/GlobalISel: Legality for integer min/max
llvm-svn: 361519
2019-05-23 17:58:48 +00:00
Thomas Lively eafe8ef6f2 [WebAssembly] Add multivalue and tail-call target features
Summary:
These features will both be implemented soon, so I thought I would
save time by adding the boilerplate for both of them at the same time.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D62047

llvm-svn: 361516
2019-05-23 17:26:47 +00:00
Lewis Revill 74927554e2 [RISCV] Support assembling TLS LA pseudo instructions
This patch adds the pseudo instructions la.tls.ie and la.tls.gd, used in
the initial-exec and global-dynamic TLS models respectively when
addressing a global. The pseudo instructions are expanded in the
assembly parser.

llvm-svn: 361499
2019-05-23 14:46:27 +00:00
Sam Parker 617cdc5a6d [ARM][CGP] Clear SafeWrap before each search
The previous patch added a member set to store instructions that we
could allow to wrap. But this wasn't cleared between searches meaning
that they could get promoted, incorrectly, during the promotion of a
separate valid chain.

Differential Revision: https://reviews.llvm.org/D62254

llvm-svn: 361462
2019-05-23 07:46:39 +00:00
Thomas Lively 1a3cbe720c [WebAssembly] Implement __builtin_return_address for emscripten
Summary:
In this patch, `ISD::RETURNADDR` is lowered on the emscripten target
to the new Emscripten runtime function `emscripten_return_address`, which
implements the functionality.

Patch by Guanzhong Chen

Reviewers: tlively, aheejin

Reviewed By: tlively

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62210

llvm-svn: 361454
2019-05-23 01:24:01 +00:00
Fangrui Song 86c9ca48c3 [X86] Support -fno-plt __tls_get_addr calls
In general dynamic/local dynamic TLS models, with -fno-plt,

* x86: emit `calll *___tls_get_addr@GOT(%ebx)` instead of `calll ___tls_get_addr@PLT`
  Note, on x86, if we can get rid of %ebx as the PIC register,
  it may be better to use a register not preserved across function calls.
* x86_64: emit `callq *__tls_get_addr@GOTPCREL(%rip)` instead of `callq __tls_get_addr@PLT`

Reorganize the code by separating 32-bit and 64-bit.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D62106

llvm-svn: 361453
2019-05-23 01:05:13 +00:00
Craig Topper 93f38e1f1a [X86] Explcitly disable VEXTRACT instruction matching for an immediate of 0. Remove a bunch of isel patterns that become unnecessary.
We effectively had a second set of isel patterns that tried to use a
regular store instruction and an extract_subreg instruction. Or a masked move
and an extract_subreg. These patterns were intended to override the
matching of VEXTRACT instructions by taking advantage of the priority
of the explicit immediate 0 for the index.

This patch instaed just disables the immediate 0 matchin the VEXTRACT
patterns. This each of the component pieces of the larger patterns will
match by themselves.

This found a bug of sorts were we didn't use 128-bit store for 512->128
extract on KNL. Its unclear what the right thing here should be.
Using the vextract avoids constraining the register allocator to use
xmm0-15. But it always results in a longer encoding if the register
allocator ends up choosing xmm0-15 anyway.

llvm-svn: 361431
2019-05-22 21:00:18 +00:00
Galina Kistanova ed49f6d8e6 Reverted r361134 because of a failing test left unattended for a long time.
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/17792/steps/test-check-all/logs/stdio
Failing Tests (1):
    LLVM :: CodeGen/AMDGPU/regbank-reassign.mir

llvm-svn: 361430
2019-05-22 20:42:56 +00:00
Craig Topper 9816d55776 [X86][InstCombine] Remove InstCombine code that turns X86 round intrinsics into llvm.ceil/floor. Remove some isel patterns that existed because that was happening.
We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into
llvm.floor/ceil.  The llvm.ceil/floor intrinsics are supposed to correspond
to the libm functions.  For the libm functions we need to disable the
precision exception so the llvm.floor/ceil functions should always map to
encodings 0x9 and 0xA.

We had a mix of isel patterns where some used 0x9 and 0xA and others used
0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA.

Since we have no way in isel of knowing where the llvm.ceil/floor came
from, we can't map X86 specific intrinsics with encodings 1 or 2 to it.
We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like
to see a use case and optimization advantage first.

I've left the backend test cases to show the blend we now emit without
the extra isel patterns. But I've removed the InstCombine tests completely.

llvm-svn: 361425
2019-05-22 20:04:55 +00:00
Alexey Lapshin 53726588f6 [DebugInfo][AArch64] Recognise target specific instruction as mov instr
This fix is for the problem from https://bugs.llvm.org/show_bug.cgi?id=38714.
Specifically, Simple Register Coalescing creates following conversion :

 undef %0.sub_32:gpr64 = ORRWrs $wzr, %3:gpr32common, 0, debug-location !24;

It copies 32-bit value from gpr32 into gpr64. But Live DEBUG_VALUE analysis
is not able to create debug location record for that instruction. So the problem
is in that debug info for argc variable is incorrect. The fix is
to write custom isCopyInstrImpl() which would recognize the ORRWrs instr.

llvm-svn: 361417
2019-05-22 18:48:58 +00:00
Matt Arsenault 418e23e33c AMDGPU: Move disassembler support check to constructor
Don't check for unsupported targets for every instruction.

llvm-svn: 361406
2019-05-22 16:28:48 +00:00
Matt Arsenault ca64ef2043 MC: Allow getMaxInstLength to depend on the subtarget
Keep it optional in cases this is ever needed in some global
context. Currently it's only used for getting an upper bound inline
asm code size.

For AMDGPU, gfx10 increases the maximum instruction size to
20-bytes. This avoids penalizing older subtargets when estimating code
size, and making some annoying branch relaxation test adjustments.

llvm-svn: 361405
2019-05-22 16:28:41 +00:00
Dmitry Preobrazhensky 7773fc478d [AMDGPU][MC] Corrected parsing of op_sel* and neg_* modifiers
See bug 41361: https://bugs.llvm.org/show_bug.cgi?id=41361

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D61012

llvm-svn: 361386
2019-05-22 13:59:01 +00:00
Simon Pilgrim 9b40dd6318 [Hexagon] assert getRegisterBitWidth returns non-zero value. NFCI.
Fixes scan-build warning.

llvm-svn: 361375
2019-05-22 12:25:46 +00:00
Sjoerd Meijer aa4f1ffca4 [TargetMachine] error message unsupported code model
When the tiny code model is requested for a target machine that does not
support this, we get an error message (which is nice) but also this diagnostic
and request to submit a bug report:

    fatal error: error in backend: Target does not support the tiny CodeModel
    [Inferior 2 (process 31509) exited with code 0106]
    clang-9: error: clang frontend command failed with exit code 70 (use -v to see invocation)
    (gdb) clang version 9.0.0 (http://llvm.org/git/clang.git 29994b0c63a40f9c97c664170244a7bba5ecc15e) (http://llvm.org/git/llvm.git 95606fdf91c2d63a931e865f4b78b2e9828ddc74)
    Target: arm-arm-none-eabi
    Thread model: posix
    clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
    clang-9: note: diagnostic msg:
    ********************
    PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
    Preprocessed source(s) and associated run script(s) are located at:
    clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.c
    clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.sh
    clang-9: note: diagnostic msg:

But this is not a bug, this is a feature. :-) Not only is this not a bug, this
is also pretty confusing. This patch causes just to print the fatal error and
not the diagnostic:

fatal error: error in backend: Target does not support the tiny CodeModel

Differential Revision: https://reviews.llvm.org/D62236

llvm-svn: 361370
2019-05-22 10:40:26 +00:00
Fangrui Song 1c61471ab1 [PPC64] Parse -elfv1 -elfv2 when specified on target triple
Summary:
For big-endian powerpc64, the default ABI is ELFv1. OpenPower ABI ELFv2 is supported when -mabi=elfv2 is specified. FreeBSD support for PowerPC64 ELFv2 ABI with LLVM is in progress[1]. This patch adds an alternative way to specify ELFv2 ABI on target triple [2].

The following results are expected:

ELFv1 when using:
-target powerpc64-unknown-freebsd12.0
-target powerpc64-unknown-freebsd12.0 -mabi=elfv1
-target powerpc64-unknown-freebsd12.0-elfv1

ELFv2 when using:
-target powerpc64-unknown-freebsd12.0 -mabi=elfv2
-target powerpc64-unknown-freebsd12.0-elfv2

[1] https://wiki.freebsd.org/powerpc/llvm-elfv2
[2] https://clang.llvm.org/docs/CrossCompilation.html

Patch by Alfredo Dal'Ava Júnior!

Differential Revision: https://reviews.llvm.org/D61950

llvm-svn: 361355
2019-05-22 07:29:59 +00:00
Sjoerd Meijer eec021658b [AArch64] Subtarget crypto extension defaults
The Armv8.2-A crypto extensions all defaulted to true, but should default to
false, like all the other extensions.

Differential Revision: https://reviews.llvm.org/D62180

llvm-svn: 361354
2019-05-22 07:10:27 +00:00
Nikita Popov 15df05152d [X86] Don't compare i128 through vector if construction not cheap (PR41971)
Fix for https://bugs.llvm.org/show_bug.cgi?id=41971. Make the
combineVectorSizedSetCCEquality() transform more conservative by
checking that the bitcast to the vector type will be cheap/free
for both operands. I'm considering it cheap if it's a constant,
a load or already a vector. I've dropped the explicit check for
f128 because it should fall out naturally (in the cases where
it'd be detrimental).

Differential Revision: https://reviews.llvm.org/D62220

llvm-svn: 361352
2019-05-22 06:47:06 +00:00
Chen Zheng b727b0483c [PowerPC] use meaningful name for displacement form aligned with x-form - NFC
llvm-svn: 361347
2019-05-22 03:17:39 +00:00
Chen Zheng 9970665f60 [PowerPC] [ISEL] select x-form instruction for unaligned offset
Differential Revision: https://reviews.llvm.org/D62173

llvm-svn: 361346
2019-05-22 02:57:31 +00:00
Pengfei Wang 6a0d432e9e [X86] [CET] Deal with return-twice function such as vfork, setjmp when
CET-IBT enabled

Return-twice functions will indirectly jump after the caller's position.
So when CET-IBT is enable, we should make sure these is endbr*
instructions follow these Return-twice function caller. Like GCC does.

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D61881

llvm-svn: 361342
2019-05-22 00:50:21 +00:00
Matt Arsenault 2cba91b8db AMDGPU: Assume calls read exec
llvm-svn: 361333
2019-05-21 23:23:16 +00:00
Matt Arsenault dd1ffa00a5 AMDGPU: Assume call pseudos are convergent
There should probably be nonconvergent versions, but my guess is it
doesn't matter in practice.

llvm-svn: 361331
2019-05-21 23:23:10 +00:00
Matt Arsenault 60ba03e210 AMDGPU: Fix not marking new gfx10 SGPRs as CSRs
llvm-svn: 361330
2019-05-21 23:23:05 +00:00
Dan Gohman a49496fb2a [WebAssembly] Add the signature for the new llround builtin function
r360889 added new llround builtin functions. This patch adds their
signatures for the WebAssembly backend.

It also adds wasm32 support to utils/update_llc_test_checks.py, since
that's the script other targets are using for their testcases for this
feature.

Differential Revision: https://reviews.llvm.org/D62207

llvm-svn: 361327
2019-05-21 23:06:34 +00:00
Craig Topper ed6df47bae [X86] Remove an unneeded ZERO_EXTEND creation from LowerINTRINSIC_W_CHAIN. NFC
We were trying to ZERO_EXTEND from an i8 X86ISD::SETCC to i8 again.

llvm-svn: 361288
2019-05-21 19:03:45 +00:00
Simon Pilgrim 4b82e50315 [X86][SSE] computeKnownBitsForTargetNode - add X86ISD::ANDNP support
Fixes PACKSS-PSHUFB shuffle regressions mentioned on D61692

llvm-svn: 361270
2019-05-21 15:20:24 +00:00
Fangrui Song cd36a2857e [PPC64] Update LocalEntry from assigned symbols
On PowerPC64 ELFv2 ABI, functions may have 2 entry points: global and local.
The local entry point location of a function is stored in the st_other field of the symbol, as an offset relative to the global entry point.

In order to make symbol assignments (e.g. .equ/.set) work properly with this, PPCTargetELFStreamer already copies the local entry bits from the source symbol to the destination one, on emitAssignment(). The problem is that this copy is performed only at the assignment location, where the source symbol may not yet have processed the .localentry directive, that sets the local entry. This may cause the destination symbol to end up with wrong local entry information. Other symbol info is not affected by this because, in this case, the destination symbol value is actually a symbol reference.

This change keeps track of these assignments, and update all needed st_other fields when finish() is called.

Patch by Leandro Lupori!

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D56586

llvm-svn: 361237
2019-05-21 10:41:25 +00:00
Florian Hahn 4a8835c655 [AArch64] Skip mask checks for masks with an odd number of elements.
Some checks in isShuffleMaskLegal expect an even number of elements,
e.g. isTRN_v_undef_Mask or isUZP_v_undef_Mask, otherwise they access
invalid elements and crash. This patch adds checks to the impacted
functions.

Fixes PR41951

Reviewers: t.p.northover, dmgreen, samparker

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D60690

llvm-svn: 361235
2019-05-21 10:05:26 +00:00
Cullen Rhodes 7f47b75d18 [AArch64][SVE2] Asm: add integer unary instructions (predicated)
Summary:
Patch adds support for the following instructions:

    * URECPE, URSQRTE, SQABS, SQNEG

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62129

llvm-svn: 361230
2019-05-21 09:06:51 +00:00
Cullen Rhodes e798e8d9d2 [AArch64][SVE2] Asm: add integer pairwise arithmetic instructions
Summary:
Patch adds support for the following instructions:

    ADDP, SMAXP, UMAXP, SMINP, UMINP

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62128

llvm-svn: 361229
2019-05-21 08:59:00 +00:00
Sam Parker 3141bbd52d [ARM][CGP] Skip nuw in PrepareConstants
PrepareConstants step converts add/sub with 'negative' immediates to
sub/add with a 'positive' imm to make promotion more simple. nuw
already states that the add shouldn't cause an unsigned wrap, so
it shouldn't need any tweaking. Plus, we also don't allow a sub with
a 'negative' immediate to be safe wrap, so this functionality has
been removed. The PrepareConstants step now just handles the add
instructions that we've determined would be safe if they wrap around
zero.

Differential Revision: https://reviews.llvm.org/D62057

llvm-svn: 361227
2019-05-21 07:56:47 +00:00
Dylan McKay e967308da4 Add TargetLoweringInfo hook for explicitly setting the ABI calling convention endianess
Summary:
The endianess used in the calling convention does not always match the
endianess of the target on all architectures, namely AVR.

When an argument is too large to be legalised by the architecture and is
split for the ABI, a new hook TargetLoweringInfo::shouldSplitFunctionArgumentsAsLittleEndian
is queried to find the endianess that function arguments must be laid
out in.

This approach was recommended by Eli Friedman.

Originally reported in https://github.com/avr-rust/rust/issues/129.

Patch by Carl Peto.

Reviewers: bogner, t.p.northover, RKSimon, niravd, efriedma

Reviewed By: efriedma

Subscribers: JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62003

llvm-svn: 361222
2019-05-21 06:38:02 +00:00
Chen Zheng c4c407a0eb [PowerPC] use more meaningful name - NFC
llvm-svn: 361218
2019-05-21 03:54:42 +00:00
Matt Arsenault 6dd08e335f AMDGPU: Force skip branches over calls
Unfortunately the way SIInsertSkips works is backwards, and is
required for correctness. r338235 added handling of some special cases
where skipping is mandatory to avoid side effects if no lanes are
active. It conservatively handled asm correctly, but the same logic
needs to apply to calls.

Usually the call sequence code is larger than the skip threshold,
although the way the count is computed is really broken, so I'm not
sure if anything was likely to really hit this.

llvm-svn: 361202
2019-05-20 22:04:42 +00:00
Martin Storsjo 4ed18e5ef5 [AArch64] Handle lowering lround on windows, where long is 32 bit
Differential Revision: https://reviews.llvm.org/D62108

llvm-svn: 361192
2019-05-20 19:53:28 +00:00
Bjorn Pettersson eee0f2330d [AMDGPU] Fix std::array initializers to avoid warnings with older tool chains. NFC
A std::array is implemented as a template with an array
inside a struct. Older versions of clang, like 3.6,
require an extra set of curly braces around std::array
initializations to avoid warnings.

The C++ language was changed regarding this by CWG 1270.
So more modern tool chains does not complaing even if
leaving out one level of braces.

llvm-svn: 361171
2019-05-20 16:41:08 +00:00
Matt Arsenault 5239298b0d R600: Fix unconditional return in loop
llvm-svn: 361167
2019-05-20 16:22:11 +00:00
Cullen Rhodes 523789fa6b [AArch64][SVE2] Asm: add SADALP and UADALP instructions
Summary:
This patch adds support for the integer pairwise add and accumulate long
instructions SADALP/UADALP. These instructions are predicated.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62001

llvm-svn: 361154
2019-05-20 13:50:15 +00:00
Petar Jovanovic e85bbf564d [DebugInfoMetadata] Refactor DIExpression::prepend constants (NFC)
Refactor DIExpression::With* into a flag enum in order to be less
error-prone to use (as discussed on D60866).

Patch by Djordje Todorovic.

Differential Revision: https://reviews.llvm.org/D61943

llvm-svn: 361137
2019-05-20 10:35:57 +00:00
Cullen Rhodes 96c5929926 [AArch64][SVE2] Asm: add int halving add/sub (predicated) instructions
Summary:
This patch adds support for the predicated integer halving add/sub
instructions:

    * SHADD, UHADD, SRHADD, URHADD
    * SHSUB, UHSUB, SHSUBR, UHSUBR

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D62000

llvm-svn: 361136
2019-05-20 10:35:23 +00:00
Cullen Rhodes 0fc6347b35 [AArch64][SVE2] Asm: add saturating multiply-add interleaved long instructions
Summary:
Patch adds support for SQDMLALBT and SQDMLSLBT instructions.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D61998

llvm-svn: 361135
2019-05-20 10:29:48 +00:00
Fangrui Song 68774edcd6 Use llvm::sort. NFC
llvm-svn: 361134
2019-05-20 10:18:35 +00:00
Carl Ritson 34e95ce259 [AMDGPU] gfx1010 Avoid SMEM WAR hazard for some s_waitcnt values
Summary:
Avoid introducing hazard mitigation when lgkmcnt is reduced to 0.
Clarify code comments to explain assumptions made for this hazard
mitigation.  Expand and correct test cases to cover variants of
s_waitcnt.

Reviewers: nhaehnle, rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62058

llvm-svn: 361124
2019-05-20 07:20:12 +00:00
Craig Topper 3164b50af7 [X86] Remove combineShift function. Just dispatch directly to the handler for each flavor from the main switch. NFC
llvm-svn: 361108
2019-05-19 01:01:46 +00:00
Matt Arsenault 2f29220d6d AMDGPU/GlobalISel: Implement s64->s64 [SU]ITOFP
llvm-svn: 361082
2019-05-17 23:05:18 +00:00
Matt Arsenault 02b5ca8cd1 GlobalISel: Implement lower for S64->S32 [SU]ITOFP
This is ported from the custom AMDGPU DAG implementation. I think this
is a better default expansion than what the DAG currently uses, at
least if the target has CTLZ.

This implements the signed version in terms of the unsigned
conversion, which is implemented with bit operations. SelectionDAG has
several other implementations that should eventually be ported
depending on what instructions are legal.

llvm-svn: 361081
2019-05-17 23:05:13 +00:00
Sam Clegg 13717bd54b [WebAssembly] Remove expected failure of builtin-location.C test
This seems to have been fixed by https://reviews.llvm.org/D61956

Yay

Differential Revision: https://reviews.llvm.org/D62075

llvm-svn: 361071
2019-05-17 19:55:17 +00:00
Simon Pilgrim 065431c82b [X86][SSE] Fold movmsk(not(x)) -> not(movmsk)
Helps to improve folding of comparisons with movmsk results.

llvm-svn: 361056
2019-05-17 17:56:25 +00:00
Simon Pilgrim 2c2f8e74b9 [X86][SSE] Match all-of bool scalar reductions into a bitcast/movmsk + cmp.
Same as what we do for vector reductions in combineHorizontalPredicateResult, use movmsk+cmp for scalar (and(extract(x,0),extract(x,1)) reduction patterns. 

llvm-svn: 361052
2019-05-17 17:25:55 +00:00
Dmitry Preobrazhensky 198611b0ff [AMDGPU][MC] Corrected parsing of NAME:VALUE modifiers
See bug 41298: https://bugs.llvm.org/show_bug.cgi?id=41298

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D61009

llvm-svn: 361045
2019-05-17 16:04:17 +00:00
Dmitry Preobrazhensky 5ae3113969 [AMDGPU][MC] Enabled labels with s_call_b64 and s_cbranch_i_fork
See https://bugs.llvm.org/show_bug.cgi?id=41888

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D62016

llvm-svn: 361040
2019-05-17 14:57:04 +00:00
Simon Pilgrim 279314e81b [X86][AVX] Remove LowerCTTZ's AVX1 custom vector handling.
We can now rely on generic expansion to handle this.

llvm-svn: 361038
2019-05-17 14:37:19 +00:00
Simon Pilgrim 62c7032c18 [X86][AVX] isNOT - add extract_subvector(xor X, -1) -> extract_subvector(X) fold.
Prep work for the removal of the remaining x86 CTTZ vector lowering.

llvm-svn: 361035
2019-05-17 14:04:56 +00:00
Dmitry Preobrazhensky 43fcc79837 [AMDGPU][MC] Enabled expressions for most operands which accept integer values
See bug 40873: https://bugs.llvm.org/show_bug.cgi?id=40873

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D60768

llvm-svn: 361031
2019-05-17 13:17:48 +00:00
Matt Arsenault 1a02d30c87 AMDGPU: Fix unused variable warnings in release builds
llvm-svn: 361030
2019-05-17 12:59:27 +00:00
Matt Arsenault a510b570c2 AMDGPU/GlobalISel: Legalize G_FCEIL
llvm-svn: 361028
2019-05-17 12:20:05 +00:00
Matt Arsenault 6aebcd5499 AMDGPU/GlobalISel: Legalize G_INTRINSIC_TRUNC
llvm-svn: 361027
2019-05-17 12:20:01 +00:00
Matt Arsenault 6aafc5e19d AMDGPU/GlobalISel: Legalize G_FRINT
llvm-svn: 361026
2019-05-17 12:19:57 +00:00
Matt Arsenault 1448f5689e AMDGPU/GlobalISel: Legalize G_FCOPYSIGN
llvm-svn: 361025
2019-05-17 12:19:52 +00:00
Matt Arsenault 568f193847 AMDGPU/GlobalISel: RegBankSelect for llvm.amdgcn.s.buffer.load
llvm-svn: 361023
2019-05-17 12:02:34 +00:00
Matt Arsenault a3b5a386fa AMDGPU/GlobalISel: Use subreg index instead of extra unmerge
This saves instructions and extra steps, but I'm not sure about
introducing subregister indexes at this point.

llvm-svn: 361022
2019-05-17 12:02:31 +00:00
Matt Arsenault b3dc73634c AMDGPU/GlobalISel: Use waterfall loop for buffer_load
This adds support for more complex waterfall loops that need to handle
operands > 32-bits, and multiple operands.

llvm-svn: 361021
2019-05-17 12:02:27 +00:00
Simon Pilgrim a6d3bd486b [X86] Pull out IsNOT helper. NFCI.
Return the input value for the NOT pattern: (xor X, -1) -> X

llvm-svn: 361012
2019-05-17 10:37:08 +00:00
Rhys Perry c4bc61bad7 [AMDGPU] detect WaW hazards when moving/merging load/store instructions
Summary:
    In order to combine memory operations efficiently, the load/store
    optimizer might move some instructions around. It's usually safe
    to move instructions down past the merged instruction because the
    pass checks if memory operations can be re-ordered.

    Though, the current logic doesn't handle Write-after-Write hazards.

    This fixes a reflection issue with Monster Hunter World and DXVK.

    v2: - rebased on top of master
        - clean up the test case
        - handle WaW hazards correctly

    Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=40130

Original patch by Samuel Pitoiset.

Reviewers: tpr, arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: ronlieb, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D61313

llvm-svn: 361008
2019-05-17 09:32:23 +00:00
Cullen Rhodes 7f605c3550 [AArch64][SVE2] Asm: add saturating multiply-add long instructions
Summary:
Patch adds support for indexed and unpredicated vectors forms of the
following instructions:

    * SQDMLALB, SQDMLALT, SQDMLSLB, SQDMLSLT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D61997

llvm-svn: 361005
2019-05-17 09:29:43 +00:00
Cullen Rhodes 334130a199 [AArch64][SVE2] Asm: add integer multiply-add long instructions
Summary:
Patch adds support for indexed and unpredicated vectors forms of the
following instructions:

    * SMLALB, SMLALT, UMLALB, UMLALT, SMLSLB, SMLSLT, UMLSLB, UMLSLT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D61951

llvm-svn: 361003
2019-05-17 09:19:41 +00:00
Cullen Rhodes 0d47f00821 [AArch64][SVE2] Asm: add integer multiply long instructions
Summary:
Patch adds support for indexed and unpredicated vectors forms of the
following instructions:

    * SMULLB, SMULLT, UMULLB, UMULLT, SQDMULLB, SQDMULLT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D61936

llvm-svn: 361002
2019-05-17 09:04:44 +00:00
Craig Topper ae1597d360 [X86] Add FeatureFastScalarShiftMasks and FeatureFastVectorShiftMasks to the ignore list for inlining compatibility.
These are tuning flags and won't cause any codegen issue if we inline a function
with a different value.

llvm-svn: 360992
2019-05-17 06:40:21 +00:00
Fangrui Song ad7199f3e6 [PowerPC] Support .reloc *, R_PPC{,64}_NONE, *
This can be used to create references among sections. When --gc-sections
is used, the referenced section will be retained if the origin section
is retained.

llvm-svn: 360990
2019-05-17 06:04:11 +00:00
Fangrui Song e18a6ad0b8 [MC][PowerPC] Clean up PPCAsmBackend
Replace the member variable Target with Triple
Use Triple instead of TheTarget.getName() to dispatch on 32-bit/64-bit.
Delete redundant parameters

llvm-svn: 360986
2019-05-17 05:44:26 +00:00
Fangrui Song 2463239777 [X86] Support .reloc *, R_{386,X86_64}_NONE, *
This can be used to create references among sections. When --gc-sections
is used, the referenced section will be retained if the origin section
is retained.

See R_MIPS_NONE (D13659), R_ARM_NONE (D61992), R_AARCH64_NONE (D61973) for similar changes.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D62014

llvm-svn: 360983
2019-05-17 03:25:39 +00:00
Fangrui Song aa6102ad8e [AArch64] Support .reloc *, R_AARCH64_NONE, *
Summary:
This can be used to create references among sections. When --gc-sections
is used, the referenced section will be retained if the origin section
is retained.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D61973

llvm-svn: 360981
2019-05-17 03:05:07 +00:00
Fangrui Song 43ca0e9eb8 [ARM] Support .reloc *, R_ARM_NONE, *
R_ARM_NONE can be used to create references among sections. When
--gc-sections is used, the referenced section will be retained if the
origin section is retained.

Add a generic MCFixupKind FK_NONE as this kind of no-op relocation is
ubiquitous on ELF and COFF, and probably available on many other binary
formats. See D62014.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D61992

llvm-svn: 360980
2019-05-17 02:51:54 +00:00
Jonas Paulsson 9427961c89 [SystemZ] Bugfix in SystemZTargetLowering::combineIntDIVREM()
Make sure to not unroll a vector division/remainder (with a constant splat
divisor) after type legalization, since the scalar type may then be illegal.

Review: Ulrich Weigand
https://reviews.llvm.org/D62036

llvm-svn: 360965
2019-05-17 00:50:35 +00:00
David L. Jones 4a5e01faa4 [X86][AsmParser] Add mnemonics missed in r360954.
These are valid Jcc, but aren't based on the EFLAGS condition codes (Intel 64
and IA-32 Architetcures Software Developer's Manual Vol. 1, Appendix B). These
are covered in clang/test, but not llvm/test.

llvm-svn: 360960
2019-05-17 00:19:20 +00:00
David L. Jones add7ed2281 [X86][AsmParser] Ignore "short" even harder in Intel syntax ASM.
In Intel syntax, it's not uncommon to see a "short" modifier on Jcc conditional
jumps, which indicates the offset should be a "short jump" (8-bit immediate
offset from EIP, -128 to +127). This patch expands to all recognized Jcc
condition codes, and removes the inline restriction.

Clang already ignores "jmp short" in inline assembly. However, only "jmp" and a
couple of Jcc are actually checked, and only inline (i.e., not when using the
integrated assembler for asm sources). A quick search through asm-containing
libraries at hand shows a pretty broad range of Jcc conditions spelled with
"short."

GAS ignores the "short" modifier, and instead uses an encoding based on the
given immediate. MS inline seems to do the same, and I suspect MASM does, too.
NASM will yield an error if presented with an out-of-range immediate value.

Example of GCC 9.1 and MSVC v19.20, "jmp short" with offsets that do and do not
fit within 8 bits: https://gcc.godbolt.org/z/aFZmjY

Differential Revision: https://reviews.llvm.org/D61990

llvm-svn: 360954
2019-05-16 23:27:07 +00:00
David L. Jones 11305984d0 [X86][AsmParser] Rename "ConditionCode" variable to "ConditionPredicate".
This better matches the verbiage in Intel documentation, and should help avoid
confusion between these two different kinds of values, both of which are parsed
from mnemonics.

llvm-svn: 360953
2019-05-16 23:27:05 +00:00
Reid Kleckner 08c15df29f [X86] Deduplicate symbol lowering logic, NFC
Summary:
This refactors four pieces of code that create SDNodes for references to
symbols:
- normal global address lowering (LEA, MOV, etc)
- callee global address lowering (CALL)
- external symbol address lowering (LEA, MOV, etc)
- external symbol address lowering (CALL)

Each of these pieces of code need to:
- classify the reference
- lower the symbol
- emit a RIP wrapper if needed
- emit a load if needed
- add offsets if needed

I think handling them all in one place will make the code easier to
maintain in the future.

Reviewers: craig.topper, RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61690

llvm-svn: 360952
2019-05-16 23:15:26 +00:00
Craig Topper f09b9d419f [X86] Use 0x9 instead of 0x1 as the immediate in some masked floor pattern. Similarly change 0x2 to 0xA for ceil.
This suppresses exceptions which is what we should be doing for ceil and floor. We already use the correct immediate
in patterns without masking.

llvm-svn: 360915
2019-05-16 16:53:50 +00:00
Matt Arsenault 99e6f4d11a AMDGPU: Introduce TokenFactor for ABI register copies in call sequence
The call was missing chain dependencies on the pre-call copies. I
don't think this was causing any real issues however.

llvm-svn: 360906
2019-05-16 15:10:27 +00:00
Matt Arsenault df24c92c0f AMDGPU: Assume xnack is enabled by default
This is the conservatively correct default. It is always safe to
assume xnack is enabled, but not the converse.

Introduce a feature to blacklist targets where xnack can never be
meaningfully enabled. I'm not sure the targets this is applied to is
100% correct.

llvm-svn: 360903
2019-05-16 14:48:34 +00:00
Adhemerval Zanella 2d28db6b9f [AArch64] Handle ISD::LROUND and ISD::LLROUND
This patch optimizes ISD::LROUND and ISD::LLROUND to fcvtas
instruction. It currently only handles the scalar version.

llvm-svn: 360894
2019-05-16 13:30:18 +00:00
Adhemerval Zanella 73643b5041 [CodeGen] Add lround/llround builtins
This patch add the ISD::LROUND and ISD::LLROUND along with new
intrinsics.  The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.

The idea is to optimize lround/llround generation for AArch64
in a subsequent patch.  Current semantic is just route it to libm
symbol.

llvm-svn: 360889
2019-05-16 13:15:27 +00:00
Matt Arsenault a8f88c388f AMDGPU/GlobalISel: Correct regbank for 1-bit and/or/xor
Bool values should use the scc/vcc regbank since r350611.

llvm-svn: 360877
2019-05-16 12:06:41 +00:00
Cullen Rhodes 472c6ef8b0 [AArch64][SVE2] Asm: implement CMLA/SQRDCMLAH instructions
Summary:
This patch adds support for the indexed and unpredicated vectors forms
of the CMLA and SQRDCMLAH instructions.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D61906

llvm-svn: 360871
2019-05-16 09:42:22 +00:00
Cullen Rhodes 07eba98dd7 [AArch64][SVE2] Asm: implement CDOT instruction
Summary:
The complex DOT instructions perform a dot-product on quadtuplets from
two source vectors and the resuling wide real or wide imaginary is
accumulated into the destination register. The instructions come in two
forms:

Vector form, e.g.
  cdot z0.s, z1.b, z2.b, #90    - complex dot product on four 8-bit quad-tuplets,
                                  accumulating results in 32-bit elements. The
                                  complex numbers in the second source vector are
                                  rotated by 90 degrees.

  cdot z0.d, z1.h, z2.h, #180   - complex dot product on four 16-bit quad-tuplets,
                                  accumulating results in 64-bit elements.
                                  The complex numbers in the second source
                                  vector are rotated by 180 degrees.

Indexed form, e.g.
  cdot z0.s, z1.b, z2.b[3], #0  - complex dot product on four 8-bit quad-tuplets,
                                  with specified quadtuplet from second source vector,
                                  accumulating results in 32-bit elements.
  cdot z0.d, z1.h, z2.h[1], #0  - complex dot product on four 16-bit quad-tuplets,
                                  with specified quadtuplet from second source vector,
                                  accumulating results in 64-bit elements.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer, rovka

Differential Revision: https://reviews.llvm.org/D61903

llvm-svn: 360870
2019-05-16 09:33:44 +00:00
Cullen Rhodes 064f6ab556 [AArch64][SVE2] Asm: add unpredicated integer multiply instructions
Summary:
Add support for the following instructions:

  * MUL (indexed and unpredicated vectors forms)
  * SQDMULH (indexed and unpredicated vectors forms)
  * SQRDMULH (indexed and unpredicated vectors forms)
  * SMULH (unpredicated, predicated form added in SVE)
  * UMULH (unpredicated, predicated form added in SVE)
  * PMUL (unpredicated)

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer, rovka

Differential Revision: https://reviews.llvm.org/D61902

llvm-svn: 360867
2019-05-16 09:07:26 +00:00
Fangrui Song 3e92df3e39 Add Triple::isPPC64()
llvm-svn: 360864
2019-05-16 08:31:22 +00:00
Craig Topper e43bdf144c [X86] Delay creating index register negations during address matching until after we know for sure the match will succeed
If we're trying to match an LEA, its possible the LEA match will be deemed unprofitable. In which case the negation we created in matchAddress would be left dangling in the SelectionDAG. This could artificially increase use counts for other nodes in the DAG. Though I don't have an example of that. But it just seems like bad form to have dangling nodes in isel.

Differential Revision: https://reviews.llvm.org/D61047

llvm-svn: 360823
2019-05-15 21:59:53 +00:00
Simon Atanasyan 0b0cc23fb6 [mips] Use range-based `for` loops. NFC
llvm-svn: 360817
2019-05-15 21:26:25 +00:00
Mandeep Singh Grang 814435fe87 [AArch64] only indicate CFI on Windows if we emitted CFI
Summary:
Otherwise, we emit directives for CFI without any actual CFI opcodes to
go with them, which causes tools to malfunction.  The technique is
similar to what the x86 backend already does.

Fixes https://bugs.llvm.org/show_bug.cgi?id=40876

Patch by: froydnj (Nathan Froyd)

Reviewers: mstorsjo, eli.friedman, rnk, mgrang, ssijaric

Reviewed By: rnk

Subscribers: javed.absar, kristof.beyls, llvm-commits, dmajor

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61960

llvm-svn: 360816
2019-05-15 21:23:41 +00:00
Craig Topper 439228727a [X86] Strengthen type constraints on some specialized X86 ISD opcodes that don't have any flexibility. NFC
These particular instructions only operate on 128-bit vectors and have no wider equivalents. And the
element size is always known.

One could argue that MOVSS/MOVSD could be merged, but that's probably disruptive to code in
X86ISelLowering and probably low value.

llvm-svn: 360815
2019-05-15 21:16:28 +00:00
Pete Couperus 1ca049959f Uncomment LLVM_FALLTHROUGH.
llvm-svn: 360798
2019-05-15 19:46:17 +00:00
Ryan Taylor 29257eb76c [AMDGPU] Increases available SGPR for Calling Convention
Summary:
SGPR in CC can be either hw initialized or set by other chained shaders
and so this increases the SGPR count availalbe to CC to 105.

Change-Id: I3dfadc750fe4a3e2bd07117a2899fd13f3e2fef3

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61261

llvm-svn: 360778
2019-05-15 14:43:55 +00:00
David Green 0582b22f10 [ARM] Don't use the Machine Scheduler for cortex-m at minsize
The new cortex-m schedule in rL360768 helps performance, but can increase the
amount of high-registers used. This, on average, ends up increasing the
codesize by a fair amount (because less instructions are converted from T2 to
T1). On cortex-m at -Oz, where we are quite size-paranoid, it is better to use
the existing DAG scheduler with the RegPressure scheduling preference (at least
until the issues around T2 vs T1 instructions can be improved).

I have also made sure that the Sched::RegPressure dag scheduler is always
chosen for MinSize.

The test shows one case where we increase the number of registers used.

Differential Revision: https://reviews.llvm.org/D61882

llvm-svn: 360769
2019-05-15 12:58:02 +00:00
David Green d2d0f46cd2 [ARM] Cortex-M4 schedule
This patch adds a simple Cortex-M4 schedule, renaming the existing M3
schedule to M4 and filling in the latencies as-per the Cortex-M4 TRM:
https://developer.arm.com/docs/ddi0439/latest

Most of these are 1, with the important exception being loads taking 2
cycles. A few others are also higher, but I don't believe they make a
large difference. I've repurposed the M3 schedule as the latencies are
mostly the same between the two cores, with the M4 having more FP and
DSP instructions. We also turn on MISched and UseAA for the cores that
now use this.

It also adds some schedule Write's to various instruction to make things
simpler.

Differential Revision: https://reviews.llvm.org/D54142

llvm-svn: 360768
2019-05-15 12:41:58 +00:00
Simon Atanasyan 4c68c5ae71 [mips] LLVM and GAS now use same instructions for CFA Definition. NFCI
LLVM previously used `DW_CFA_def_cfa` instruction in .eh_frame to set
the register and offset for current CFA rule. We change it to
`DW_CFA_def_cfa_register` which is the same one used by GAS that only
changes the register but keeping the old offset.

Patch by Mirko Brkusanin.

Differential Revision: https://reviews.llvm.org/D61899

llvm-svn: 360765
2019-05-15 12:05:27 +00:00
Craig Topper 384d46c0d5 [X86] Use OR32mi8Locked instead of LOCK_OR32mi8 in emitLockedStackOp.
They encode the same way, but OR32mi8Locked sets hasUnmodeledSideEffects set
which should be stronger than the mayLoad/mayStore on LOCK_OR32mi8. I think
this makes sense since we are using it as a fence.

This also seems to hide the operation from the speculative load hardening pass
so I've reverted r360511.

llvm-svn: 360747
2019-05-15 04:15:46 +00:00
Philip Reames 658cad1287 [NFC] Reuse a helper function to eliminate duplicate code
llvm-svn: 360740
2019-05-15 01:39:07 +00:00
Richard Trieu 5f7d4ab5f9 [XCore] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360738
2019-05-15 01:28:30 +00:00
Richard Trieu 0116385452 [X86] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360736
2019-05-15 01:17:58 +00:00
Richard Trieu c6c421379d [WebAssembly] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360735
2019-05-15 01:03:00 +00:00
Richard Trieu 1e6f98b89d [SystemZ] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360734
2019-05-15 00:46:18 +00:00
Richard Trieu cf82d4a483 [Sparc] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360733
2019-05-15 00:35:37 +00:00
Richard Trieu 51fc56d603 [RISCV] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360732
2019-05-15 00:24:15 +00:00
Richard Trieu ee6ced196d [PowerPC] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360731
2019-05-15 00:09:58 +00:00
Richard Trieu e8f83befd5 [NVPTX] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360729
2019-05-14 23:56:18 +00:00
Richard Trieu a57ce32eff [MSP430] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360728
2019-05-14 23:45:18 +00:00
Richard Trieu 313b78150c [Mips] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360727
2019-05-14 23:34:37 +00:00
Richard Trieu 2e50dc78c5 [Lanai] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360726
2019-05-14 23:17:18 +00:00
Richard Trieu 7ef172998b [Hexagon] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360724
2019-05-14 23:04:55 +00:00
Richard Trieu a68ee931e6 [BPF] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360722
2019-05-14 22:54:06 +00:00
Richard Trieu e982b42003 [AVR] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360721
2019-05-14 22:41:58 +00:00
Philip Reames 445f942fc4 Use an offset from TOS for idempotent rmw locked op lowering
This was the portion split off D58632 so that it could follow the redzone API cleanup. Note that I changed the offset preferred from -8 to -64. The difference should be very minor, but I thought it might help address one concern which had been previously raised.

Differential Revision: https://reviews.llvm.org/D61862

llvm-svn: 360719
2019-05-14 22:32:42 +00:00
Richard Trieu f3011b9b10 [ARM] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360718
2019-05-14 22:29:50 +00:00
Richard Trieu 7f9a008a2d [ARC] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360716
2019-05-14 22:06:04 +00:00
Richard Trieu 8ce2ee9d56 [AMDGPU] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360713
2019-05-14 21:54:37 +00:00
Richard Trieu b26592e04d [AArch64] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360709
2019-05-14 21:33:53 +00:00
Dmitry Preobrazhensky ee51d851ea [AMDGPU][GFX8][GFX9] Corrected predicate of v_*_co_u32 aliases
Reviewers: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D61905

llvm-svn: 360702
2019-05-14 19:16:24 +00:00
Stanislav Mekhanoshin 05791d90c9 [AMDGPU] Fixed handling of imemdiate i1 literals
This bug was exposed by the rL360395.

Differential Revision: https://reviews.llvm.org/D61812

llvm-svn: 360689
2019-05-14 16:18:00 +00:00
Tim Renouf 33cb8f5b54 [AMDGPU] Fixed +DumpCode
The +DumpCode attribute is a horrible hack in AMDGPU to embed the
disassembly of the generated code into the elf file. It is used by LLPC
to implement an extension that allows the application to read back the
disassembly of the code. Longer term, we should re-implement that by
using the LLVM disassembler from the Vulkan driver.

Recent LLVM changes broke +DumpCode. With -filetype=asm it crashed, and
with -filetype=obj I think it did not include any instructions, only the
labels. Fixed with this commit: now it has no effect with -filetype=asm,
and works as intended with -filetype=obj.

Differential Revision: https://reviews.llvm.org/D60682

Change-Id: I6436d86fe2ea220d74a643a85e64753747c9366b
llvm-svn: 360688
2019-05-14 16:17:14 +00:00
Simon Pilgrim c2d9cfd925 [X86] Disable shouldFoldConstantShiftPairToMask for scalar shifts on AMD targets (PR40758)
D61068 handled vector shifts, this patch does the same for scalars where there are similar number of pipes for shifts as bit ops - this is true almost entirely for AMD targets where the scalar ALUs are well balanced.

This combine avoids AND immediate mask which usually means we reduce encoding size.

Some tests show use of (slow, scaled) LEA instead of SHL in some cases, but thats due to particular shift immediates - shift+mask generate these just as easily.

Differential Revision: https://reviews.llvm.org/D61830

llvm-svn: 360684
2019-05-14 15:21:28 +00:00
Cullen Rhodes 3b917019a5 [AArch64][SVE2] Asm: add SQRDMLAH/SQRDMLSH instructions
Summary:
This patch adds support for the indexed and unpredicated vectors forms of the
SQRDMLAH and SQRDMLSH instructions.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D61515

llvm-svn: 360683
2019-05-14 15:10:16 +00:00
Cullen Rhodes e029da46e6 [AArch64][SVE2] Asm: add integer multiply-add/subtract (indexed) instructions
Summary:
This patch adds support for the following instructions:

  MLA mul-add, writing addend (Zda = Zda +  Zn * Zm[idx])
  MLS mul-sub, writing addend (Zda = Zda + -Zn * Zm[idx])

Predicated forms of these instructions were added in SVE.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D61514

llvm-svn: 360682
2019-05-14 15:01:00 +00:00
Lei Huang 22561972af [PowerPC] Custom lower known CR bit spills
For known CRBit spills, CRSET/CRUNSET, it is more efficient to load and spill
the known value instead of extracting the bit.

eg. This sequence is currently used to spill a CRUNSET:
    crclr   4*cr5+lt
    mfocrf  r3,4
    rlwinm  r3,r3,20,0,0
    stw     r3,132(r1)

This patch custom lower it to:
    li  r3,0
    stw r3,132(r1)

Differential Revision: https://reviews.llvm.org/D61754

llvm-svn: 360677
2019-05-14 14:27:06 +00:00
Simon Pilgrim 2747ee2c83 [X86] X86TargetLowering::LowerINTRINSIC_WO_CHAIN - ensure rounding control is initialized. NFCI.
Fixes scan-build warnings

llvm-svn: 360664
2019-05-14 11:30:39 +00:00
Tim Northover ff6875acd9 AArch64: support binutils-like things on arm64_32.
This adds support for the arm64_32 watchOS ABI to LLVM's low level tools,
teaching them about the specific MachO choices and constants needed to
disassemble things.

llvm-svn: 360663
2019-05-14 11:25:44 +00:00
Philip Reames 3098e44daa [X86] Prefer locked stack op over mfence for seq_cst 64-bit stores on 32-bit targets
This is a follow on to D58632, with the same logic. Given a memory operation which needs ordering, but doesn't need to modify any particular address, prefer to use a locked stack op over an mfence.

Differential Revision: https://reviews.llvm.org/D61863

llvm-svn: 360649
2019-05-14 04:43:37 +00:00
Sanjay Patel 3a13d970aa [SDAG, x86] allow targets to override test for binop opcodes
This follows the pattern of the existing isCommutativeBinOp().

x86 shows improvements from vector narrowing for the min/max opcodes.

llvm-svn: 360639
2019-05-14 00:39:40 +00:00
Craig Topper e2966473dd [X86] Use ISD::MERGE_VALUES to return from lowerAtomicArith instead of calling ReplaceAllUsesOfValueWith and returning SDValue().
Returning SDValue() makes the caller think that nothing happened and it will
end up executing the Expand path. This generates extra nodes that will need to
be pruned as dead code.

Returning an ISD::MERGE_VALUES will tell the caller that we'd like to make a
change and it will take care of replacing uses. This will prevent falling into
the Expand path.

llvm-svn: 360627
2019-05-13 22:17:13 +00:00
Craig Topper 5f999c2bea [X86] Various type corrections to the code that creates LOCK_OR32mi8/OR32mi8Locked to the stack for idempotent atomic rmw and atomic fence.
These are updates to match how isel table would emit a LOCK_OR32mi8 node.

-Use i32 for the immediate zero even though only 8 bits are encoded.
-Use i16 for segment register.
-Use LOCK_OR32mi8 for idempotent atomic operations in 32-bit mode to match
64-bit mode. I'm not sure why OR32mi8Locked and LOCK_OR32mi8 both exist. The
only difference seems to be that OR32mi8Locked is marked as UnmodeledSideEffects=1.
-Emit an extra i32 result for the flags output.

I don't know if the types here really matter just noticed it was inconsistent
with normal behavior.

llvm-svn: 360619
2019-05-13 21:01:24 +00:00
Nikita Popov 323dc634b9 [WebAssembly] Don't assume that zext/sext result is i32/i64 in fast isel (PR41841)
Usually this will abort fast-isel at the instruction using the
non-legal result, but if the only use is in a different basic block,
we'll incorrectly assume that the zext/sext is to i32 (rather than
i128 in this case).

Differential Revision: https://reviews.llvm.org/D61823

llvm-svn: 360616
2019-05-13 19:40:18 +00:00
Stanislav Mekhanoshin 79b2828b3f [AMDGPU] Reorder includes per coding standard. NFC.
llvm-svn: 360609
2019-05-13 18:05:10 +00:00
Stanislav Mekhanoshin 21088639ae [AMDGPU] Remove now unused V2FP16_ONE constant def. NFC.
llvm-svn: 360608
2019-05-13 17:52:57 +00:00
Robert Lougher 91a9d4ef4b Revert [X86] Avoid SFB - Fix inconsistent codegen with/without debug info
Revert r360436 as it is causing clang-x64-windows-msvc buildbot to fail.

llvm-svn: 360606
2019-05-13 17:36:46 +00:00
Nick Desaulniers c33f754e74 [TargetLowering] Handle multi depth GEPs w/ inline asm constraints
Summary:
X86TargetLowering::LowerAsmOperandForConstraint had better support than
TargetLowering::LowerAsmOperandForConstraint for arbitrary depth
getelementpointers for "i", "n", and "s" extended inline assembly
constraints. Hoist its support from the derived class into the base
class.

Link: https://github.com/ClangBuiltLinux/linux/issues/469

Reviewers: echristo, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61560

llvm-svn: 360604
2019-05-13 17:27:44 +00:00
Simon Pilgrim 73aee29095 [X86][SSE] LowerBuildVectorv4x32 - don't insert MOVQ for undef elts
Fixes the regression noted in D61782 where a VZEXT_MOVL was being inserted because we weren't discriminating between 'zeroable' and 'all undef' for the upper elts.

Differential Revision: https://reviews.llvm.org/D61782

llvm-svn: 360596
2019-05-13 16:10:11 +00:00
Simon Pilgrim cf5a8eb7cd [X86][SSE] Relax use limits for lowerAddSubToHorizontalOp (PR32433)
Now that we can use HADD/SUB for scalar additions from any pair of extracted elements (D61263), we can relax the one use limit as we will be able to merge multiple uses into using the same HADD/SUB op.

This exposes a couple of missed opportunities in LowerBuildVectorv4x32 which will be committed separately.

Differential Revision: https://reviews.llvm.org/D61782

llvm-svn: 360594
2019-05-13 16:02:45 +00:00
Simon Pilgrim d9aa928603 [X86] Add SimplifyDemandedBits support for PEXTRB/PEXTRW (PR39709)
Test case will be included in a followup - its being used but its tricky to show a case that isn't caught at a later stage anyway.

llvm-svn: 360588
2019-05-13 15:31:27 +00:00
Cullen Rhodes 6dcef8fc0c [AArch64][SVE2] Add SVE2 target features to backend and TargetParser
Summary:
This patch adds the following features defined by Arm SVE2 architecture
extension:

  sve2, sve2-aes, sve2-sm4, sve2-sha3, bitperm

For existing CPUs these features are declared as unsupported to prevent
scheduler errors.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewers: SjoerdMeijer, sdesmalen, ostannard, rovka

Reviewed By: SjoerdMeijer, rovka

Subscribers: rovka, javed.absar, tschuett, kristof.beyls, kristina, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61513

llvm-svn: 360573
2019-05-13 10:10:24 +00:00
Ulrich Weigand 8e42f6ddc8 [SystemZ] Model floating-point control register
This adds the FPC (floating-point control register) as a reserved
physical register and models its use by SystemZ instructions.

Note that only the current rounding modes and the IEEE exception
masks are modeled.  *Changes* of the FPC due to exceptions (in
particular the IEEE exception flags and the DXC) are not modeled.

At this point, this patch is mostly NFC, but it will prevent
scheduling of floating-point instructions across SPFC/LFPC etc.

llvm-svn: 360570
2019-05-13 09:47:26 +00:00
Sam Parker a33e311a3b [ARM][ParallelDSP] Relax alias checks
When deciding the safety of generating smlad, we checked for any
writes within the block that may alias with any of the loads that
need to be widened. This is overly conservative because it only
matters when there's a potential aliasing write to a location
accessed by a pair of loads.

Now we check for aliasing writes only once, during setup. If two
loads are found to have an aliasing write between them, we don't add
these loads to LoadPairs. This means that later during the transform,
we can safely widened a pair without worrying about aliasing.

However, to maintain correctness, we also need to change the way that
wide loads are inserted because the order is now important.

The MatchSMLAD method has also been changed, absorbing
MatchReductions and AddMACCandidate to hopefully improve readability.

Differential Revision: https://reviews.llvm.org/D6102

llvm-svn: 360567
2019-05-13 09:23:32 +00:00
Fangrui Song f3be557159 [WebAssembly] Add dependency on WebAssemblyDesc to fix BUILD_SHARED_LIBS=on builds after rL360550
This fixes the link error

ld.lld: error: undefined symbol: llvm::WebAssembly::anyTypeToString(unsigned int)
>>> referenced by WebAssemblyDisassembler.cpp

llvm-svn: 360558
2019-05-13 05:51:39 +00:00
Yonghong Song 98fe9c9869 [BPF] emit BTF sections only if debuginfo available
Currently, without -g, BTF sections may still be emitted with
data sections, e.g., for linux kernel bpf selftest
test_tcp_check_syncookie_kern.c issue discovered by Martin
as shown below.

-bash-4.4$ bpftool btf dump file test_tcp_check_syncookie_kern.o
[1] VAR 'results' type_id=0, linkage=global-alloc
[2] VAR '_license' type_id=0, linkage=global-alloc
[3] DATASEC 'license' size=0 vlen=1
        type_id=2 offset=0 size=4
[4] DATASEC 'maps' size=0 vlen=1
        type_id=1 offset=0 size=28

Let disable BTF generation if no debuginfo, which is
the original design.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D61826

llvm-svn: 360556
2019-05-13 05:00:23 +00:00
Craig Topper 61e556d2bd Recommit r358887 "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"
I've included a new fix in X86RegisterInfo to prevent PR41619 without
reintroducing r359392. We might be able to improve that in the base class
implementation of shouldRewriteCopySrc somehow. But this hopefully enables
forward progress on SimplifyDemandedBits improvements for now.

Original commit message:

This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.

The AMDGPU backend needed an extra  (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGComb
but it caused a lot of noise on other targets - some improvements, some regressions.

The X86 changes are all definite wins.

llvm-svn: 360552
2019-05-13 04:03:35 +00:00
David L. Jones a263aa25e1 [WebAssembly] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360550
2019-05-13 03:32:41 +00:00
Simon Pilgrim a7fc763082 [X86][AVX] Split VZEXT_MOVL ymm/zmm if the upper elements are not demanded.
Removes unnecessary vzeroupper noted in D61806

llvm-svn: 360543
2019-05-12 15:16:29 +00:00
Simon Pilgrim fda6bffd3b [X86][SSE] SimplifyDemandedBits - call PEXTRB/PEXTRW SimplifyDemandedVectorElts as well.
See if we can simplify the demanded vector elts from the extraction before trying to simplify the demanded bits.

This helps us with target shuffles and hops in particular.

llvm-svn: 360535
2019-05-11 21:35:50 +00:00
Simon Pilgrim 6b10fde69b [CostModel][X86] Add min/max reduction costs for all SSE targets
The original costs stopped at SSE42, I've added conservative estimates for everything down to SSE1/SSE2 and moved some of the SSE42 costs to SSE41 (really only the addition of PCMPGT makes any difference).

I've also added missing vXi8 costs (we use PHMINPOSUW for i8/i16 for scarily quick results) and 256-bit vector costs for AVX1.

llvm-svn: 360528
2019-05-11 17:12:52 +00:00
Simon Pilgrim e4c5b6d9bd [X86][SSE] Add SimplifyDemandedVectorElts HADD/HSUB handling.
Still missing PHADDW/PHSUBW tests because PEXTRW doesn't call SimplifyDemandedVectorElts

llvm-svn: 360526
2019-05-11 16:07:12 +00:00
Simon Pilgrim 5e0f92acad FixupLEAPass::fixupIncDec - non-LEA opcodes should not happen here. NFCI.
Matches what we do in other functions and fixes scan-build warning about uninitialized NewOpcode variable.

llvm-svn: 360525
2019-05-11 16:02:34 +00:00
Craig Topper c9d7484aa3 [X86] Add CMOV_FR32X/CMOV_FR64X pseudo instructions. Use them in fast isel to fix a machine verifier error after adding test cases.
Fast isel picks the FR32X/FR64X register classes when lowering pseudo select, but it didn't have the right opcode to go with it.

llvm-svn: 360524
2019-05-11 16:00:28 +00:00
Craig Topper 74a436596d [X86] Sink some fast isel code into the only if that uses it. NFC
llvm-svn: 360523
2019-05-11 16:00:19 +00:00
Craig Topper 26f2b13a65 [X86] Use TLI.getRegClassFor to simplify some more fast isel code. NFCI
llvm-svn: 360522
2019-05-11 16:00:13 +00:00
Simon Pilgrim e7c51137aa HexagonConstEvaluator::evaluateHexExt - check incoming opcodes. NFCI.
Only certain extension opcodes are supported - fixes scan build warning.

llvm-svn: 360520
2019-05-11 15:24:34 +00:00
Craig Topper 682cc09675 [X86] Use getRegClassFor to simplify some code in fast isel. NFCI
No need to select the register class based on type and features. It should
already be setup by X86ISelLowering.

llvm-svn: 360513
2019-05-11 05:18:58 +00:00
Craig Topper 31f7adb94f [X86] Don't emit MOVNTDQA loads from fast-isel without SSE4.1.
We were checking for SSE4.1 for FP types, but not integer 128-bit types.

Fixes PR41837.

llvm-svn: 360512
2019-05-11 04:19:33 +00:00
Craig Topper bdef12df8d [X86] Add a test case for idempotent atomic operations with speculative load hardening. Fix an additional issue found by the test.
This test covers the fix from r360475 as well.

llvm-svn: 360511
2019-05-11 04:00:27 +00:00
Richard Trieu d0124bd762 [SystemZ] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360510
2019-05-11 03:36:16 +00:00
Richard Trieu 03fe9d82c4 [Sparc] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360506
2019-05-11 02:59:02 +00:00
Richard Trieu 00ecf67045 [RISCV] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure

llvm-svn: 360505
2019-05-11 02:43:58 +00:00
Richard Trieu 4bdb136b0f [PowerPC] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360502
2019-05-11 02:33:18 +00:00
Richard Trieu 4b620fcf0f [NVPTX] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360500
2019-05-11 02:09:13 +00:00
Richard Trieu 61fb6700a5 [MSP430] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360498
2019-05-11 01:58:52 +00:00
Richard Trieu fa29bee9d0 [Mips] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360497
2019-05-11 01:38:56 +00:00
Richard Trieu 4c3890ddbf [Lanai] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360496
2019-05-11 01:25:58 +00:00
Richard Trieu 48803aa65c [BPF] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360494
2019-05-11 01:13:21 +00:00
Richard Trieu bf9e67b5b9 [AVR] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360493
2019-05-11 01:03:03 +00:00
Richard Trieu 5e3ee4b84e [ARM] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360490
2019-05-11 00:34:07 +00:00
Richard Trieu dcf1ea08e5 [ARC] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360488
2019-05-11 00:13:01 +00:00
Richard Trieu c0bd7bd481 [AMDGPU] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360487
2019-05-11 00:03:35 +00:00
Richard Trieu 7ba0605511 [AArch64] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360486
2019-05-10 23:50:01 +00:00
Richard Trieu f48ef2f2ba [XCore] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360485
2019-05-10 23:36:49 +00:00
Richard Trieu b28b8b7724 [X86] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360484
2019-05-10 23:24:38 +00:00
Philip Reames 849ef823df Factor out redzone ABI checks [NFCI]
As requested in D58632, cleanup our red zone detection logic in the X86 backend. The existing X86MachineFunctionInfo flag is used to track whether we *use* the redzone (via a particularly optimization?), but there's no common way to check whether the function *has* a red zone.

I'd appreciate careful review of the uses being updated. I think they are NFC, but a careful eye from someone else would be appreciated.

Differential Revision: https://reviews.llvm.org/D61799

llvm-svn: 360479
2019-05-10 22:55:42 +00:00
Craig Topper df10cc6068 [X86] Disable speculative load hardening for operations with an explicit RSP base.
After D58632, we can create idempotent atomic operations to the top of stack.
This confused speculative load hardening because it thinks accesses should have
virtual register base except for the cases it already excluded.

This commit adds a new exclusion for this case. I'll try to reduce a test case
for this, but this fix was verified to work by the reporter. This should avoid
needing to revert D58632.

llvm-svn: 360475
2019-05-10 22:03:33 +00:00
Mircea Trofin ff3bed0e61 Skip over prefetches
Summary: Skip over prefetches when assigning debug info to instructions with memory operands. This way, the debug info is stable after instrumenting a binary with prefetches, allowing for iterative profiling and instrumentation.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61789

llvm-svn: 360471
2019-05-10 21:27:55 +00:00
Robert Lougher 986b6b86bb [X86] Avoid SFB - Fix inconsistent codegen with/without debug info
Fixes https://bugs.llvm.org/show_bug.cgi?id=40969

The functions findPotentiallyBlockedCopies and buildCopy are currently not
accounting for the presence of debug instructions. In the former this results
in the optimization not being trigerred, and in the latter results in
inconsistent codegen.

This patch enables the optimization to be performed in a debug build and
ensures the codegen is consistent with non-debug builds.

Patch by Chris Dawson.

Differential Revision: https://reviews.llvm.org/D61680

llvm-svn: 360436
2019-05-10 15:55:06 +00:00
Simon Pilgrim a0b1518a4a [X86][SSE] Add getHopForBuildVector vector splitting
If we only use the lower xmm of a ymm hop, then extract the xmm's (for free), perform the xmm hop and then insert back into a ymm (for free).

Fixes some of the regressions noted in D61782

llvm-svn: 360435
2019-05-10 15:46:04 +00:00
Lei Huang 1ac6e9636c [PowerPC] custom lower `v2f64 fpext v2f32`
Reduces scalarization overhead via custom lowering of v2f64 fpext v2f32.

eg. For the following IR
  %0 = load <2 x float>, <2 x float>* %Ptr, align 8
  %1 = fpext <2 x float> %0 to <2 x double>
  ret <2 x double> %1

Pre custom lowering:
  ld r3, 0(r3)
  mtvsrd f0, r3
  xxswapd vs34, vs0
  xscvspdpn f0, vs0
  xxsldwi vs1, vs34, vs34, 3
  xscvspdpn f1, vs1
  xxmrghd vs34, vs0, vs1

After custom lowering:
  lfd f0, 0(r3)
  xxmrghw vs0, vs0, vs0
  xvcvspdp vs34, vs0

Differential Revision: https://reviews.llvm.org/D57857

llvm-svn: 360429
2019-05-10 14:04:06 +00:00
Sam Clegg ea38ac5ba3 [WebAssembly] Don't assume that strongly defined symbols are DSO-local
The current PIC model for WebAssembly is more like ELF in that it
allows symbol interposition.

This means that more functions end up being addressed via the GOT
and fewer directly added to the wasm table.

One effect is a reduction in the number of wasm table entries similar
to the previous attempt in https://reviews.llvm.org/D61539 which was
reverted.

Differential Revision: https://reviews.llvm.org/D61772

llvm-svn: 360402
2019-05-10 01:52:08 +00:00
Sam Clegg 2147365484 [WebAssembly] Remove friend18.C from list of known gcc torture test failures. NFC.
Differential Revision: https://reviews.llvm.org/D61775

llvm-svn: 360401
2019-05-10 01:45:34 +00:00
Mircea Trofin 5c31c05fbd [llvm] X86DiscriminateMemOps: insert debug info when missing
Reviewers: davidxl

Reviewed By: davidxl

Subscribers: aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61735

llvm-svn: 360396
2019-05-10 00:12:51 +00:00
Stanislav Mekhanoshin 64196850f0 [AMDGPU] Pattern for v_xor3_b32
This also allows three op patterns to use increased constant bus
limit of GFX10.

Differential Revision: https://reviews.llvm.org/D61763

llvm-svn: 360395
2019-05-10 00:09:01 +00:00
Philip Reames bd588dfd59 [X86] Improve lowering of idemptotent RMW operations
The current lowering uses an mfence. mfences are substaintially higher latency than the locked operations originally requested, but we do want to avoid contention on the original cache line. As such, use a locked instruction on a cache line assumed to be thread local.

Differential Revision: https://reviews.llvm.org/D58632

llvm-svn: 360393
2019-05-09 23:23:42 +00:00
Bill Wendling 6ee7f31484 Add ".dword" directive
Summary:
The ".dword" directive is a synonym for ".xword" and is used used
by klibc, a minimalistic libc subset for initramfs.

Reviewers: t.p.northover, nickdesaulniers

Reviewed By: nickdesaulniers

Subscribers: nickdesaulniers, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61719

llvm-svn: 360381
2019-05-09 21:57:44 +00:00
Stanislav Mekhanoshin a76da34b1d [AMDGPU] gfx1010 v_interp_* instructions
Differential Revision: https://reviews.llvm.org/D61703

llvm-svn: 360364
2019-05-09 18:38:55 +00:00
Simon Pilgrim 93bfa5af48 [X86][SSE] Fold add(shuffle(),shuffle()) to hadd on 'slow' targets (PR39920)
As reported on PR39920, "slow horizontal ops" targets tend to internally expand to 2*shuffle+add/sub - so if we can reduce 2*shuffle+add/sub to a hadd/sub then we should do it - similar port usage but reduced instruction count.

This works out in most cases, although the "PR22377" regression in vector-shuffle-combining.ll is annoying - going from 2*shuffle+add+shuffle to hadd+2*shuffle - I've opened PR41813 to cover this.

Differential Revision: https://reviews.llvm.org/D61308

llvm-svn: 360360
2019-05-09 17:45:01 +00:00
Stanislav Mekhanoshin 4d4c9e0757 [AMDGPU] gfx1010 changes for PAL metadata
Differential Revision: https://reviews.llvm.org/D61704

llvm-svn: 360353
2019-05-09 16:34:13 +00:00
Roman Lebedev 9db0e72570 [X86] AMD Piledriver (BdVer2): major cleanup (mainly inverse throughput)
I've started this cleanup more several times now, but got sidetracked
elsewhere, e.g. by llvm-exegesis problems. Not this time, finally!

This is mainly cleaning up the inverse throughput values,
and a few latencies/uops, based on the llvm-exegesis measured values.

Though this is not complete by any means,
there's certainly more cleanup to be done.

The performance numbers (i've only checked by RawSpeed benchmark) aren't
really surprising - overall this *slightly* (< -1%) improves perf.

llvm-svn: 360341
2019-05-09 13:54:51 +00:00
Sam Parker d7b650cc72 [ARM][CGP] Guard against signext args and sitofp
Add an Argument that has the SExtAttr attached, as well as SIToFP
instructions, as values that generate sign bits. SIToFP doesn't
strictly do this and could be treated as a sink to be sign-extended.

Differential Revision: https://reviews.llvm.org/D61381

llvm-svn: 360331
2019-05-09 11:56:16 +00:00
Diana Picus 3531453371 [ARM GlobalISel] Map DBG_VALUE for types != s32
...and make sure we fail elegantly for unsupported values.

s64 goes into DPR, anything <= 32 into GPR.

llvm-svn: 360321
2019-05-09 09:49:36 +00:00
Hans Wennborg b1b09e5b55 X86WinAllocaExpander: Drop code looking through register copies (PR41786)
This code was never covered by tests, in PR41786 it was pointed out that
the deletion part doesn't work, and in a full Chrome build I was never
able to hit the code path that looks through copies. It seems the
situation it's supposed to handle doesn't actually come up in practice.

Delete it to simplify the code.

Differential revision: https://reviews.llvm.org/D61671

llvm-svn: 360320
2019-05-09 09:22:56 +00:00
Matt Arsenault 462403a5c8 AMDGPU: Mark scheduler classes as final
llvm-svn: 360294
2019-05-08 22:10:04 +00:00
Matt Arsenault 01434f9377 AMDGPU: Select VOP3 form of add
The VOP3 form should always be the preferred selection, to be shrunk
later. This should only be an optimization issue, but this partially
works around a problem from clobbering VCC when SIFixSGPRCopies
rewrites an SCC defining operation directly to VCC.

3 of the testcases are regressions from failing to fold the immediate
in cases it should. These can be avoided by improving the VCC liveness
handling in SIFoldOperands. Simply increasing the threshold to
computeRegisterLiveness works, although this is common enough that VCC
liveness should probably be tracked throughout the pass. The hack of
leaving behind an implicit_def instruction to avoid breaking iterator
wastes instruction count, which inhibits finding the VCC def in long
chains of adds. Doing this however exposes different, worse looking
regressions from poor scheduling behavior. This could probably be
avoided around by forcing the shrink of the addc here, but the
scheduler should probably be fixed.

The r600 add test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.

llvm-svn: 360293
2019-05-08 22:09:57 +00:00
Stanislav Mekhanoshin 1dbf721315 [AMDGPU] gfx1010 exp modifications
Differential Revision: https://reviews.llvm.org/D61701

llvm-svn: 360287
2019-05-08 21:23:37 +00:00
Changpeng Fang 73b7272e7a AMDGPU: Fix a mis-placed bracket
Differential Revision:
  https://reviews.llvm.org/D61430

llvm-svn: 360283
2019-05-08 19:46:04 +00:00
Alina Sbirlea f31eba6494 [MemorySSA] Teach LoopSimplify to preserve MemorySSA.
Summary:
Preserve MemorySSA in LoopSimplify, in the old pass manager, if the analysis is available.
Do not preserve it in the new pass manager.
Update tests.

Subscribers: nemanjai, jlebar, javed.absar, Prazek, kbarton, zzheng, jsji, llvm-commits, george.burgess.iv, chandlerc

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60833

llvm-svn: 360270
2019-05-08 17:05:36 +00:00
Simon Pilgrim e461e9a77d [AArch64] Remove scan-build "Value stored during its initialization is never read" warnings. NFCI.
llvm-svn: 360268
2019-05-08 16:29:39 +00:00
Simon Pilgrim 12521b2d43 [AArch64] Fix scan-build null/uninitialized pointer warnings. NFCI.
llvm-svn: 360267
2019-05-08 16:27:24 +00:00
Simon Pilgrim e3eec06dde [AMDGPU] Reapplied BFE canonicalization from D60462
This was committed in rL358887 but reverted in rL360066 due to a x86 regression, really it should be have been pre-committed instead of being part of the SimplifyDemandedBits bitcast patch.

llvm-svn: 360263
2019-05-08 15:49:10 +00:00
Simon Pilgrim ec58090491 [Hexagon] Fix cppcheck reduce variable scope warnings. NFCI.
Also fixes a static analyzer "Value stored to 'S2' during its initialization is never read" warning.

llvm-svn: 360244
2019-05-08 11:02:46 +00:00
Tim Northover 18adcf331b ARM: disallow SP as Rn for Thumb2 TST & TEQ instructions
Using SP in this position is unpredictable in ARMv7. CMP and CMN are not
affected, and of course v8 relaxes this requirement, but that's handled
elsewhere.

llvm-svn: 360242
2019-05-08 10:59:08 +00:00
Simon Pilgrim 02937dad69 R600InstrInfo.cpp - Add getTransSwizzle assert for the swizzle op index. NFCI.
Fixes static analyzer undefined value warning.

llvm-svn: 360239
2019-05-08 10:39:56 +00:00
Simon Pilgrim be9ade93d1 [SIMode] Fix typo in Status constructor
As noted in https://www.viva64.com/en/b/0629/ (Snippet No. 36) and the scan-build CI reports (https://llvm.org/reports/scan-build/report-SIModeRegister.cpp-Status-1-1.html#EndPath), rL348754 introduced a typo in the Status constructor due to argument variable names shadowing the member variable names.

Differential Revision: https://reviews.llvm.org/D61595

llvm-svn: 360236
2019-05-08 10:24:22 +00:00
Reid Kleckner 6bf108d77a [COFF] Use COFF stubs for extern_weak functions
Summary:
A COFF stub indirects the reference to a symbol through memory. A
.refptr.$sym global variable pointer is created to refer to $sym.
Typically mingw uses these for external global variable declarations,
but we can use them for weak function declarations as well.

Updates the dso_local classification to add a special case for
extern_weak symbols on COFF in both clang and LLVM.

Fixes PR37598

Reviewers: smeenai, mstorsjo

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D61615

llvm-svn: 360207
2019-05-07 23:06:21 +00:00
Austin Kerbow 8a3d3a9af6 [AMDGPU] Check MI bundles for hazards
Summary: GCNHazardRecognizer fails to identify hazards that are in and around bundles. This patch allows the hazard recognizer to consider bundled instructions in both scheduler and hazard recognizer mode. We ignore “bundledness” for the purpose of detecting hazards and examine the instructions individually.

Reviewers: arsenm, msearles, rampitec

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61564

llvm-svn: 360199
2019-05-07 22:12:15 +00:00
Eric Christopher 4727221734 Make sure that the DAG combiner doesn't merge stores that we explicitly
asked not be greater than preferred vector width for the vectorizer.
Test for both 128 and 256 with a skylake architecture.

llvm-svn: 360183
2019-05-07 19:25:34 +00:00
Simon Pilgrim debb2b2a1e Fix local shadow variable warning. NFCI.
llvm-svn: 360157
2019-05-07 14:56:34 +00:00
Nemanja Ivanovic b4f028f0f3 [PowerPC] Use the two-constant NR algorithm for refining estimates
The single-constant algorithm produces infinities on a lot of denormal values.
The precision of the two-constant algorithm is actually sufficient across the
range of denormals. We will switch to that algorithm for now to avoid the
infinities on denormals. In the future, we will re-evaluate the algorithm to
find the optimal one for PowerPC.

Differential revision: https://reviews.llvm.org/D60037

llvm-svn: 360144
2019-05-07 13:48:03 +00:00
Diana Picus 0a47fb8884 [ARM GlobalISel] Widen G_SELECT operands
...except for the condition operand.

llvm-svn: 360135
2019-05-07 11:39:30 +00:00
Simon Pilgrim b0f51266b8 [X86][AVX] Fold concat(packus(),packus()) -> packus(concat(),concat()) (PR34773)
Basic "revectorization" combine, we can probably do more opcodes here but it can be a tricky cost-benefit depending on where the subvectors came from - but this case helps shuffle combining.

llvm-svn: 360134
2019-05-07 11:17:39 +00:00
Simon Pilgrim a80abeea88 Fixed "Value stored to 'Opc' is never read" warning. NFCI.
llvm-svn: 360133
2019-05-07 11:09:16 +00:00
Simon Pilgrim 3c975a0ab5 [X86] Reduce scope of variables where possible. NFCI.
Fixes cppcheck warnings.

llvm-svn: 360131
2019-05-07 10:50:11 +00:00
Diana Picus d6d3808fa4 [ARM GlobalISel] Widen G_INTTOPTR/G_PTRTOINT
We actually have a couple of G_PTRTOINT to s8 when building clang, so
we should do something about them.

llvm-svn: 360130
2019-05-07 10:48:01 +00:00
Simon Pilgrim c5ac14eef8 Fix uninitialized variable warning. NFCI.
This also fixes a scan-build "array subscript is undefined" warning.

llvm-svn: 360128
2019-05-07 10:30:22 +00:00
Diana Picus d18bac5d19 [ARM GlobalISel] Widen G_GEP index operand
llvm-svn: 360127
2019-05-07 10:11:57 +00:00
Nicolai Haehnle 79ea85c6af AMDGPU: Verify that SOP2/SOPC instructions have at most one immediate operand
Summary:
No test case because I don't know of a way to trigger this, but I
accidentally caused this to fail while working on a different change.

Change-Id: I8015aa447fe27163cc4e4902205a203bd44bf7e3

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61490

llvm-svn: 360123
2019-05-07 09:19:09 +00:00
Fangrui Song da82ce99b7 [DebugInfo] Delete TypedDINodeRef
TypedDINodeRef<T> is a redundant wrapper of Metadata * that is actually a T *.

Accordingly, change DI{Node,Scope,Type}Ref uses to DI{Node,Scope,Type} * or their const variants.
This allows us to delete many resolve() calls that clutter the code.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D61369

llvm-svn: 360108
2019-05-07 02:06:37 +00:00
Craig Topper a75630302d [X86] Use extended vector register classes in getRegForInlineAsmConstraint to support x/y/zmm16-31 when the type is mismatched.
The FR32/FR64/VR128/VR256 register classes don't contain the upper 16 registers. For most cases we use the default implementation which will find any register class that contains the register in question if the VT is legal for the register class. But if the VT is i32 or i64, we won't find a matching register class and will instead up in the code modified in this patch.

If the requested register is x/y/zmm16-31 we weren't returning a register class that contains those registers and will hit an assertion in the caller.

To fix this, I've changed to use the extended register class instead. I don't believe we need a subtarget check to see if avx512 is enabled. The default implementation just pick whatever register class it finds first. I checked and we currently pick FR32X for XMM0 with an f32 type using the default implementation regardless of whether avx512 is enabled. So I assume its it is ok to do the same for i32.

Differential Revision: https://reviews.llvm.org/D61457

llvm-svn: 360102
2019-05-06 23:57:42 +00:00
Eli Friedman 2ea088173d [ARM] Glue register copies to tail calls.
This generally follows what other targets do. I don't completely
understand why the special case for tail calls existed in the first
place; even when the code was committed in r105413, call lowering didn't
work in the way described in the comments.

Stack protector lowering breaks if the register copies are not glued to
a tail call: we have to insert the stack protector check before the tail
call, and we choose the location based on the assumption that all
physical register dependencies of a tail call are adjacent to the tail
call. (See FindSplitPointForStackProtector.) This is sort of fragile,
but I don't see any reason to break that assumption.

I'm guessing nobody has seen this before just because it's hard to
convince the scheduler to actually schedule the code in a way that
breaks; even without the glue, the only computation that could actually
be scheduled after the register copies is the computation of the call
address, and the scheduler usually prefers to schedule that before the
copies anyway.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41417

Differential Revision: https://reviews.llvm.org/D60427

llvm-svn: 360099
2019-05-06 23:21:59 +00:00
Stanislav Mekhanoshin 491746a584 [AMDGPU] gfx1010 verifier changes
Differential Revision: https://reviews.llvm.org/D61521

llvm-svn: 360095
2019-05-06 22:49:45 +00:00
Stanislav Mekhanoshin 971cb8b633 [AMDGPU] gfx1010: prefer V_MUL_LO_U32 over V_MUL_LO_I32
GFX10 deprecates v_mul_lo_i32 instruction, so choose u32 form for
all targets.

Differential Revision: https://reviews.llvm.org/D61525

llvm-svn: 360094
2019-05-06 22:27:05 +00:00
Stanislav Mekhanoshin 1bc001dec4 [AMDGPU] gfx1010 memory legalizer
Differential Revision: https://reviews.llvm.org/D61535

llvm-svn: 360087
2019-05-06 21:57:02 +00:00
Craig Topper d10a200ceb [X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly printing.
We require d/q suffixes on the memory form of these instructions to disambiguate the memory size.
We don't require it on the register forms, but need to support parsing both with and without it.

Previously we always printed the d/q suffix on the register forms, but it's redundant and
inconsistent with gcc and objdump.

After this patch we should support the d/q for parsing, but not print it when its unneeded.

llvm-svn: 360085
2019-05-06 21:39:51 +00:00
Martin Storsjo 899f3cd581 [AArch64] Default to SEH exception handling on MinGW
The SEH implementation is pretty mature at this point.

Differential Revision: https://reviews.llvm.org/D61590

llvm-svn: 360080
2019-05-06 21:18:15 +00:00
Amara Emerson 3d1128cc9e [GlobalISel] Handle <1 x T> vector return types properly.
After support for dealing with types that need to be extended in some way was
added in r358032 we didn't correctly handle <1 x T> return types. These types
don't have a GISel direct representation, instead we just see them as scalars.
When we need to pad them into <2 x T> types however we need to use a
G_BUILD_VECTOR instead of trying to do a G_CONCAT_VECTOR.

This fixes PR41738.

llvm-svn: 360068
2019-05-06 19:41:01 +00:00
Craig Topper 55a71b575c Revert r359392 and r358887
Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead"
Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"

Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list.

Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead
moving to the integer domain(in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer
contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with
respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work
there. I'll file a separate PR for that and add test cases.

Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised
if that bug can still be hit independent of that.

This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again.

llvm-svn: 360066
2019-05-06 19:29:24 +00:00
Guillaume Chatelet edd69fca3e Modernize repmovsb implementation of x86 memcpy and allow runtime sizes.
Summary: This is a prerequisite to RFC http://lists.llvm.org/pipermail/llvm-dev/2019-April/131973.html

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61593

Fix typo.

Turn this patch into an NFC.

Addressing comments

llvm-svn: 360050
2019-05-06 15:10:19 +00:00
Simon Pilgrim 2a0ef0530b [X86] Fix uninitialized members in constructor warnings. NFCI.
Initialize all member variables in X86ATTInstPrinter and X86DAGToDAGISel constructors to fix cppcheck warning.

llvm-svn: 360047
2019-05-06 14:48:02 +00:00
Alexandre Ganea 799d96ec39 Fix compilation warnings when compiling with GCC 7.3
Differential Revision: https://reviews.llvm.org/D61046

llvm-svn: 360044
2019-05-06 13:41:54 +00:00
Nemanja Ivanovic 70afe4f7e1 [PowerPC] Fix erroneous condition for converting uint-to-fp vector conversion
A condition for exiting the legalization of v4i32 conversion to v2f64 through
extract/convert/build erroneously checks for the extract having type i32.
This is not adequate as smaller extracts are actually legalized to i32 as well.
Furthermore, an early exit is missing which means that we only check that
both extracts are from the same vector if that check fails.
As a result, both cases in the included test case fail - the first gets a
select error and the second generates incorrect code.

The culprit commit is r274535.

llvm-svn: 360043
2019-05-06 13:35:49 +00:00
Simon Pilgrim d672d0e246 X86DAGToDAGISel::tryVPTESTM - fix uninitialized variable warning. NFCI.
findBroadcastedOp should always initialize the value if it returns true but static-analyzer isn't great at recognising this.

llvm-svn: 360037
2019-05-06 11:52:16 +00:00
Simon Pilgrim 04dad8f66d [X86] X86InstrInfo::findThreeSrcCommutedOpIndices - fix unread variable warning.
scan-build was reporting that CommutableOpIdx1 never used its original initialized value - move it down to where its first used to make the real initialization more obvious (and matches the comment that's there).

llvm-svn: 360028
2019-05-06 10:15:34 +00:00
Simon Pilgrim 07d91cd98a [X86] lowerVectorShuffle - use any_of to detect out of bounds shuffle indices. NFCI.
Fixes cppcheck local shadow warning as well.

llvm-svn: 360027
2019-05-06 10:11:24 +00:00
Luo, Yuanke beec41c656 Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake
Summary:
1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake;
2. Enable VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS  instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision.
VCVTNE2PS2BF16: Convert Two Packed Single Data to One Packed BF16 Data.
VCVTNEPS2BF16: Convert Packed Single Data to Packed BF16 Data.
VDPBF16PS: Dot Product of BF16 Pairs Accumulated into Packed Single Precision.
For more details about BF16 isa, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference

Author: LiuTianle

Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, RKSimon, spatel

Reviewed By: craig.topper

Subscribers: kristina, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60550

llvm-svn: 360017
2019-05-06 08:22:37 +00:00
Simon Pilgrim 8462cc3c74 [X86] Pull out repeated Subtarget feature tests. NFCI.
Avoids a scan-build "uninitialized value" warning in X86FastISel::X86SelectFPExtOrFPTrunc

llvm-svn: 360001
2019-05-05 20:45:20 +00:00
Simon Pilgrim addc90e4e8 [TTI][X86] Make getAddressComputationCost cost value const. NFCI.
llvm-svn: 359999
2019-05-05 20:03:51 +00:00
Simon Pilgrim 5170c0e5fe Move getOpcode() call into if statement. NFCI.
Avoids a cppcheck "Local variable name shadows outer variable" warning. 

llvm-svn: 359991
2019-05-05 18:34:38 +00:00
Simon Pilgrim 70ee2def90 [X86] Make X86RegisterInfo(const Triple &TT) constructor explicit.
Fixes cppcheck warning.

llvm-svn: 359981
2019-05-05 12:51:47 +00:00
Simon Pilgrim cbcd9b1b92 [X86] Fix some cppcheck "Local variable name shadows outer variable" warnings. NFCI.
llvm-svn: 359976
2019-05-05 12:00:14 +00:00
Stanislav Mekhanoshin 5ddd564e19 [AMDGPU] Fixed asan error after D61536
llvm-svn: 359963
2019-05-04 06:40:20 +00:00
Stanislav Mekhanoshin 51d1415a16 AMDGPU] gfx1010 hazard recognizer
Differential Revision: https://reviews.llvm.org/D61536

llvm-svn: 359961
2019-05-04 04:30:57 +00:00
Stanislav Mekhanoshin 28a1936f6d [AMDGPU] gfx1010: use fmac instructions
Differential Revision: https://reviews.llvm.org/D61527

llvm-svn: 359959
2019-05-04 04:20:37 +00:00
Jessica Paquette 910630c1e4 [AArch64][GlobalISel] Use fcsel instead of csel for G_SELECT on FPRs
This saves us some unnecessary copies.

If the inputs to a G_SELECT are floating point, we should use fcsel rather than
csel.

Changes here are...

- Teach selectCopy about s1-to-s1 copies across register banks.
- AArch64RegisterBankInfo about G_SELECT in general.
- Teach the instruction selector about the FCSEL instructions.

Also add two tests:

- select-select.mir to show that we get the expected FCSEL
- regbank-select.mir (unfortunately named) to show the register banks on
G_SELECT are properly preserved

And update fast-isel-select.ll to show that we do the same thing as other
instruction selectors in these cases.

llvm-svn: 359940
2019-05-03 22:37:46 +00:00
Stanislav Mekhanoshin d9dcf392c7 [AMDGPU] gfx1010 wait count insertion
Differential Revision: https://reviews.llvm.org/D61534

llvm-svn: 359938
2019-05-03 21:53:53 +00:00
Stanislav Mekhanoshin 41bbe101a2 [AMDGPU] gfx1010 s_code_end generation
Also add some missing metadata in the streamer.

Differential Revision: https://reviews.llvm.org/D61531

llvm-svn: 359937
2019-05-03 21:26:39 +00:00
Stanislav Mekhanoshin 93f15c922f [AMDGPU] gfx1010 loop alignment
Differential Revision: https://reviews.llvm.org/D61529

llvm-svn: 359935
2019-05-03 21:17:29 +00:00
Mandeep Singh Grang 5dc8aeb26d [COFF, ARM64] Fix ABI implementation of struct returns
Summary:
Refer the ABI doc at: https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#return-values

Related clang patch: D60349

Reviewers: rnk, efriedma, TomTan, ssijaric

Reviewed By: rnk, efriedma

Subscribers: mstorsjo, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60348

llvm-svn: 359934
2019-05-03 21:12:36 +00:00
Brian Cain 3428c9daef [hexagon] change AsmParser assertion to error
For immediates that can't be evaluated in assembler-mapped instructions, we
should return 'invalid operand' instead of assert.

llvm-svn: 359905
2019-05-03 16:50:38 +00:00
Craig Topper a8f3840c62 [X86] Allow assembly parser to accept x/y/z suffixes on non-memory vfpclassps/pd and on memory forms in intel syntax
The x/y/z suffix is needed to disambiguate the memory form in at&t syntax since no xmm/ymm/zmm register is mentioned.

But we should also allow it for the register and broadcast forms where its not needed for consistency. This matches gas.

The printing code will still only use the suffix for the memory form where it is needed.

llvm-svn: 359903
2019-05-03 16:15:15 +00:00
Simon Pilgrim b323d5ec7c [X86] LowerToHorizontalOp - Tidyup calls to getHopForBuildVector. NFCI.
Merge the if() tests for the various HADD/SUB + Subtarget tests

llvm-svn: 359901
2019-05-03 15:56:06 +00:00
Matt Arsenault 657ef48a88 AMDGPU: Select VOP3 form of sub
The VOP3 form should always be the preferred selection form to be
shrunk later.

The r600 sub test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.

llvm-svn: 359899
2019-05-03 15:37:07 +00:00
Matt Arsenault cfd0ca38b0 AMDGPU: Support shrinking add with FI in SIFoldOperands
Avoids test regression in a future patch

llvm-svn: 359898
2019-05-03 15:21:53 +00:00
Matt Arsenault 344d68d3c9 AMDGPU: Remove redundant patterns for shifts
llvm-svn: 359895
2019-05-03 15:08:36 +00:00
Matt Arsenault ada33314a2 AMDGPU: Remove redundant patterns for sub
There were 2 patterns for sub, one selecting to sub and one to
subrev. Only one of these will succeed, so remove the reversed one.

llvm-svn: 359894
2019-05-03 15:08:35 +00:00
Matt Arsenault 0446fbe45e AMDGPU: Replace shrunk instruction with dummy implicit_def
This was broken if the original operand was killed. The kill flag
would appear on both instructions, and fail the verifier. Keep the
kill flag, but remove the operands from the old instruction. This has
an added benefit of really reducing the use count for future folds.

Ideally the pass would be structured more like what PeepholeOptimizer
does to avoid this hack to avoid breaking instruction iterators.

llvm-svn: 359891
2019-05-03 14:40:10 +00:00
Simon Pilgrim bfdd0f75a8 [X86] Remove repeated variables. NFCI.
llvm-svn: 359889
2019-05-03 14:37:00 +00:00
Simon Pilgrim aa49be4926 Avoid cppcheck operator precedence warnings. NFCI.
Prefer ((X & Y) ? A : B) to (X & Y ? A : B)

llvm-svn: 359884
2019-05-03 13:50:38 +00:00
Matt Arsenault 2c8936fd26 AMDGPU: Fix incorrect commute with sub when folding immediates
When a fold of an immediate into a sub/subrev required shrinking the
instruction, the wrong VOP2 opcode was used. This was using the VOP2
equivalent of the original instruction, not the commuted instruction
with the inverted opcode.

llvm-svn: 359883
2019-05-03 13:42:56 +00:00
Simon Pilgrim a359ef192b [X86] LowerMULH - remove unused Lo/Hi vector indices. NFCI.
Leftover from before we had the extract128BitVector helpers.

llvm-svn: 359871
2019-05-03 10:32:07 +00:00
Simon Pilgrim 88f9117168 Reduce variable scope to just the if() block its actually used in. NFCI.
llvm-svn: 359869
2019-05-03 10:13:41 +00:00
Craig Topper d724360695 [X86] Add more one checks to masked compare patterns that were missed in r358358.
This covers the patterns we use for widening 128/256 comparisons to 512-bit when
AVX512VL isn't supported.

llvm-svn: 359863
2019-05-03 07:14:05 +00:00
Eli Friedman 7238353848 [AArch64][MC] Reject "add x0, x1, w2, lsl #1" etc.
Looks like just a minor oversight in the parsing code.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41504.

Differential Revision: https://reviews.llvm.org/D60840

llvm-svn: 359855
2019-05-03 00:59:52 +00:00
Craig Topper bf29238e1a [X86] Remove LEA16r references from X86FixupLEAs. NFCI
As far as I know, we never emit LEA16r

llvm-svn: 359840
2019-05-02 22:46:23 +00:00
Craig Topper e1e38d4248 [X86] Correct the register class for specific mask register constraints in getRegForInlineAsmConstraint when the VT is a scalar type
The default impementation in the base class for TargetLowering::getRegForInlineAsmConstraint doesn't work for mask registers when the VT is a scalar type integer types since the only legal mask types are vXi1. So we end up just getting whatever the first register class that contains the register. Currently this appears to be VK1, but its really dependent on the order tablegen outputs the register classes.

Some code in the caller ends up looking up the type for this register class and find v1i1 then generates a copyfromreg from the physical k-register with the v1i1 type. Then it generates an any_extend from v1i1 to the scalar VT which isn't legal. This bad any_extend sticks around until isel where it selects a MOVZX32rr8 with a v1i1 input or maybe a i8 input. Not sure but eventually we pick up a copy from VK1 to GR8 in MachineIR which isn't supported. This leads to a failure in physical register copying.

This patch uses the scalar type to find a VK class of the right size. In the attached test case this will be VK16. This causes a bitcast from vk16 to i16 to be generated instead of an any_extend. This will be properly iseled to a VK16 to GR32 copy and a GR32->GR16 extract_subreg.

Fixes PR41678

Differential Revision: https://reviews.llvm.org/D61453

llvm-svn: 359837
2019-05-02 22:26:40 +00:00
Evandro Menezes 111df108e6 [AArch64] Update for Exynos
Fix the forwarding of multiplication results for Exynos M4.

llvm-svn: 359834
2019-05-02 22:01:39 +00:00
Craig Topper 47d8865a38 [X86] Remove string literal from an if. NFC
This if used to be an assert that got refactored into an if, but left the string literal behind.

Fixes PR41718

llvm-svn: 359833
2019-05-02 21:57:18 +00:00
Sanjay Patel 284472be6d [SelectionDAG] remove constant folding limitations based on FP exceptions
We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops),
so it does not make sense to have them here in the DAG either. Nothing else in the backend tries
to preserve exceptions (again outside of strict ops), so I don't see how this could have ever
worked for real code that cares about FP exceptions.

There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least
partially) to preserve exceptions without even asking if the target supports FP exceptions. Those
should be corrected in subsequent patches.

Real support for FP exceptions requires several changes to handle the constrained/strict FP ops.

Differential Revision: https://reviews.llvm.org/D61331

llvm-svn: 359791
2019-05-02 14:47:59 +00:00
Simon Pilgrim df8daf0ef4 [X86][SSE] lowerAddSubToHorizontalOp - enable ymm extraction+fold
Limiting scalar hadd/hsub generation to the lowest xmm looks to be unnecessary - we will be extracting one upper xmm whatever, and we can remove a shuffle by using the hop which is inline with what shouldUseHorizontalOp expects to happen anyway.

Testing on btver2 (the main target for fast-hops) shows this is beneficial even for float ops where we have a 'shuffle' to extract the float result:
https://godbolt.org/z/0R-U-K

Differential Revision: https://reviews.llvm.org/D61426

llvm-svn: 359786
2019-05-02 14:00:55 +00:00
Simon Pilgrim 9fa56f7829 [X86][SSE] Move shouldUseHorizontalOp inside isHorizontalBinOp. NFCI.
Matches what we do for lowerAddSubToHorizontalOp and will make it easier to peek through subvectors to help fix PR39921

llvm-svn: 359782
2019-05-02 12:18:24 +00:00
Diana Picus 1136ea2d44 [ARM GlobalISel] Fixup r359768
Get rid of local variable used only in assertion.

llvm-svn: 359772
2019-05-02 10:08:29 +00:00
Diana Picus 06a61ccc42 [ARM GlobalISel] Select extensions to < 32 bits
Select G_SEXT and G_ZEXT with destination types smaller than 32 bits in
the exact same way as 32 bits. This overwrites the higher bits, but that
should be ok since all legal users of types smaller than 32 bits ignore
those bits anyway.

llvm-svn: 359768
2019-05-02 09:28:00 +00:00
Diana Picus 53bcf6f2e7 [ARM GlobalISel] Legalize extensions to < 32 bits
Make it legal to extend from e.g. s1 to s8 or s16.

llvm-svn: 359766
2019-05-02 09:21:46 +00:00
Kang Zhang 1a0d6d6899 [NFC][PowerPC] Return early if the element type is not byte-sized in combineBVOfConsecutiveLoads
Summary:
Based on the Eli Friedman's comments in https://reviews.llvm.org/D60811 , we'd better return early if the element type is not byte-sized in `combineBVOfConsecutiveLoads`.

Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D61076

llvm-svn: 359764
2019-05-02 08:15:13 +00:00
Stanislav Mekhanoshin 64399da8b8 [AMDGPU] gfx1010 lost VOP2 forms of some add/sub
Add legalization of V_ADD_I32, V_SUB_I32, V_SUBREV_I32.

Differential Revision:

llvm-svn: 359757
2019-05-02 04:26:35 +00:00
Stanislav Mekhanoshin 5cf8167735 [AMDGPU] gfx1010 allows VOP3 to have a literal
Differential Revision: https://reviews.llvm.org/D61413

llvm-svn: 359756
2019-05-02 04:01:39 +00:00
Stanislav Mekhanoshin f2baae0abb [AMDGPU] gfx1010 constant bus limit
Constant bus limit has increased to 2 with GFX10.

Differential Revision: https://reviews.llvm.org/D61404

llvm-svn: 359754
2019-05-02 03:47:23 +00:00
Craig Topper b929a0062e [X86] Remove the redundant suffix in vfpclassp[d,s]'s broadcasting variant
The broadcasting variant for instruction vfpclassp[d,s] shouldn't use suffix q/l. So remove them from the template.

Patch by Pengfei Wang

Differential Revision: https://reviews.llvm.org/D61295

llvm-svn: 359753
2019-05-02 03:25:50 +00:00
Jessica Paquette a3843fe6f4 [GlobalISel][AArch64] Use fmov for G_FCONSTANT when possible
This adds support for using fmov rather than a standard mov to materialize
G_FCONSTANT when it's safe to do so.

Update arm64-fast-isel-materialize.ll and select-constant.mir to show that the
selection is correct.

llvm-svn: 359734
2019-05-01 22:39:43 +00:00
Simon Pilgrim 9f04d97cd7 [X86][SSE] Fold scalar horizontal add/sub for non-0/1 element extractions
We already perform horizontal add/sub if we extract from elements 0 and 1, this patch extends it to non-0/1 element extraction indices (as long as they are from the lowest 128-bit vector).

Differential Revision: https://reviews.llvm.org/D61263

llvm-svn: 359707
2019-05-01 17:13:35 +00:00
Stanislav Mekhanoshin 3b7925f035 [AMDGPU] gfx1010 GCNRegBankReassign pass
Reassign registers to reduce register bank conflicts.

Differential Revision: https://reviews.llvm.org/D61344

llvm-svn: 359704
2019-05-01 16:49:31 +00:00
Stanislav Mekhanoshin c29d491596 [AMDGPU] gfx1010 GCNNSAReassign pass
Convert NSA into non-NSA images.

Differential Revision: https://reviews.llvm.org/D61341

llvm-svn: 359700
2019-05-01 16:40:49 +00:00
Stanislav Mekhanoshin 692560dc98 [AMDGPU] gfx1010 MIMG implementation
Differential Revision: https://reviews.llvm.org/D61339

llvm-svn: 359698
2019-05-01 16:32:58 +00:00
Stanislav Mekhanoshin a224f68a10 [AMDGPU] gfx1010 DS implementation
Differential Revision: https://reviews.llvm.org/D61332

llvm-svn: 359696
2019-05-01 16:11:11 +00:00
Simon Pilgrim f5bdff7747 Fix 80 column violation. NFCI.
llvm-svn: 359694
2019-05-01 16:01:49 +00:00
Simon Pilgrim 6711b9699a [X86][SSE] Add demanded elts support X86ISD::PMULDQ\PMULUDQ
Add to SimplifyDemandedVectorEltsForTargetNode and SimplifyDemandedBitsForTargetNode

llvm-svn: 359686
2019-05-01 14:50:50 +00:00
Simon Pilgrim 3d6899e369 [X86][SSE] Add SSE vector shift support to SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359680
2019-05-01 13:51:09 +00:00
Simon Pilgrim ba372c6e62 [X86][SSE] Split 512-bit -> 128-bit vector directly in SimplifyDemandedVectorEltsForTargetNode
llvm-svn: 359678
2019-05-01 12:48:42 +00:00
Simon Pilgrim 951a6b4579 [X86][SSE] Add 512-bit vector support to SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359677
2019-05-01 12:37:41 +00:00
Simon Pilgrim 37c2419cc7 [X86][SSE] Add X86ISD::PACKSS\PACKUS to SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359673
2019-05-01 11:29:36 +00:00
Simon Pilgrim 3353cee06c [X86][SSE] Add X86ISD::UNPCKL\UNPCK to SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359670
2019-05-01 11:08:03 +00:00
Simon Pilgrim f7b978a71b [X86][SSE] Move extract_subvector(pshufb) fold to SimplifyDemandedVectorEltsForTargetNode
This lets us hit more cases than combineExtractSubvector and allows us reuse more code.

llvm-svn: 359669
2019-05-01 10:58:38 +00:00
Simon Pilgrim a7d107a3e0 [X86] SimplifyDemandedVectorEltsForTargetNode - pull out vector halving code. NFCI.
Pull out the HADD/HSUB code to halve vector widths if the upper half isn't used - prep work to adding support for other opcodes.

llvm-svn: 359667
2019-05-01 10:38:10 +00:00
Simon Pilgrim 99eefe94b5 [X86][SSE] Extract i1 elements from vXi1 bool vectors
This is an alternative to D59669 which more aggressively extracts i1 elements from vXi1 bool vectors using a MOVMSK.

Differential Revision: https://reviews.llvm.org/D61189

llvm-svn: 359666
2019-05-01 10:02:22 +00:00
Craig Topper dd66acef96 [X86FixupLEAs] Hoist the calls to isLEA out of the 3 separate functions and put it in the basic block instruction loop. NFC
Now need to check it 3 different times. Just do it once at the top of the loop.

llvm-svn: 359658
2019-05-01 06:53:03 +00:00