Commit Graph

49400 Commits

Author SHA1 Message Date
Simon Pilgrim ffdfe45645 [X86][SSE] LowerMULH vXi8 - use SSE shifts directly.
We know these vXi16 extended cases are legal constant splat shifts.

llvm-svn: 340414
2018-08-22 15:37:11 +00:00
Sam Parker 4d519fc3b5 [ARM] Rotated operand patterns for *xtb16
Add intrinsic isel patterns for sxtb16, sxtab16, uxtb16 and uxtab16
so that they can perform a ror.

Differential Revision: https://reviews.llvm.org/D51034

llvm-svn: 340405
2018-08-22 12:58:36 +00:00
David Green 9dd1d451d9 [AArch64] Add Tiny Code Model for AArch64
This adds the plumbing for the Tiny code model for the AArch64 backend. This,
instead of loading addresses through the normal ADRP;ADD pair used in the Small
model, uses a single ADR. The 21 bit range of an ADR means that the code and
its statically defined symbols need to be within 1MB of each other.

This makes it mostly interesting for embedded applications where we want to fit
as much as we can in as small a space as possible.

Differential Revision: https://reviews.llvm.org/D49673

llvm-svn: 340397
2018-08-22 11:31:39 +00:00
Matt Arsenault bb8e64e7f5 AMDGPU: Fix not respecting byval alignment in call frame setup
This was hackily adding in the 4-bytes reserved for the callee's
emergency stack slot. Treat it like a normal stack allocation
so we get the correct alignment padding behavior. This fixes
an inconsistency between the caller and callee.

llvm-svn: 340396
2018-08-22 11:09:45 +00:00
Stefan Maksimovic 6ccbd16433 [mips] Handle missing CondCodes
Add patterns for unhandled CondCode enumerables:
SETEQ, SETGE, SETGT, SETLE, SETLT, SETNE.

Stated at the ISD::CondCode enum declaration:
`All of these (except for the 'always folded ops')
should be handled for floating point.`

Add patterns which use these nodes, same as corresponding
'ordered' CondCode nodes.

Referring to 'Ordered means that neither operand is a QNAN'
we assume it is safe to match ex. SETLT node to the same
instruction as SETOLT.

Differential Revision: https://reviews.llvm.org/D50757

llvm-svn: 340392
2018-08-22 09:34:44 +00:00
Heejin Ahn 684325955c [WebAssembly] Fix typos in mem.grow/memory.grow opcodes
This should be not 0x3f but 0x40.

llvm-svn: 340373
2018-08-22 00:33:34 +00:00
Heejin Ahn c4df1d182c [WebAssembly] Change comments on SP writing back (NFC)
Summary: We now write back not to memory but to __stack_pointer global.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D51074

llvm-svn: 340372
2018-08-22 00:20:02 +00:00
Alina Sbirlea ab6f84f763 Update MemorySSA in BasicBlockUtils.
Summary:
Extend BasicBlocksUtils to update MemorySSA.

Subscribers: sanjoy, arsenm, nhaehnle, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D45300

llvm-svn: 340365
2018-08-21 23:32:03 +00:00
Scott Linder 72855e36c5 [AMDGPU] Consider loads from flat addrspace to be potentially divergent
In general we can't assume flat loads are uniform, and cases where we can prove
they are should be handled through infer-address-spaces.

Differential Revision: https://reviews.llvm.org/D50991

llvm-svn: 340343
2018-08-21 21:24:31 +00:00
Heejin Ahn 78d1910891 [WebAssembly] Restore __stack_pointer after catch instructions
Summary:
After the stack is unwound due to a thrown exception, the
`__stack_pointer` global can point to an invalid address. This inserts
instructions that restore `__stack_pointer` global.

Reviewers: jgravelle-google, dschuff

Subscribers: mgorny, sbc100, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D50980

llvm-svn: 340339
2018-08-21 21:23:07 +00:00
Thomas Lively 22442924a8 [WebAssembly] v128.const
Summary:
This CL implements v128.const for each vector type. New operand types
are added to ensure the vector contents can be serialized without LEB
encoding. Tests are added for instruction selection, encoding,
assembly and disassembly.

Reviewers: aheejin, dschuff, aardappel

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D50873

llvm-svn: 340336
2018-08-21 21:03:18 +00:00
Heejin Ahn 20c9c4438e [WebAssembly] Change writeSPToMemory to writeSPToGlobal (NFC)
Summary: SP is now a __stack_pointer global and not a memory address anymore.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D51046

llvm-svn: 340328
2018-08-21 19:52:19 +00:00
Heejin Ahn ed5e06b0a7 [WebAssembly] Add isEHScopeReturn instruction property
Summary:
So far, `isReturn` property is used to mean both a return instruction
from a functon and the end of an EH scope, a scope that starts with a EH
scope entry BB and ends with a catchret or a cleanupret instruction.
Because WinEH uses funclets, all EH-scope-ending instructions are also
real return instruction from a function. But for wasm, they only serve
as the end marker of an EH scope but not a return instruction that
exits a function. This mismatch caused incorrect prolog and epilog
generation in wasm EH scopes. This patch fixes this.

This patch is in the same vein with rL333045, which splits
`MachineBasicBlock::isEHFuncletEntry` into `isEHFuncletEntry` and
`isEHScopeEntry`.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D50653

llvm-svn: 340325
2018-08-21 19:44:11 +00:00
Benjamin Kramer d66dde5a98 [NVPTX] Remove ftz variants of cvt with rounding mode
These do not exist in ptxas, it refuses to compile them.

Differential Revision: https://reviews.llvm.org/D51042

llvm-svn: 340317
2018-08-21 18:44:25 +00:00
Eric Christopher 3dc594c1e6 Temporarily Revert "[PowerPC] Generate Power9 extswsli extend sign and shift immediate instruction" due to it causing a compiler crash on valid.
This reverts commit r340016, testcase forthcoming.

llvm-svn: 340315
2018-08-21 18:35:08 +00:00
Simon Pilgrim 50eba6b380 [X86][SSE] Lower vXi8 general shifts to SSE shifts directly. NFCI.
Most of these shifts are extended to vXi16 so we don't gain anything from forcing another round of generic shift lowering - we know these extended cases are legal constant splat shifts.

llvm-svn: 340307
2018-08-21 17:27:03 +00:00
Simon Pilgrim 98eb4ae499 [X86][SSE] Lower v8i16 general shifts to SSE shifts directly. NFCI.
We don't gain anything from forcing another round of generic shift lowering - we know these are legal constant splat shifts.

llvm-svn: 340302
2018-08-21 17:05:07 +00:00
Simon Pilgrim dbe4e9e3ff [X86][SSE] Lower directly to SSE shifts in the BLEND(SHIFT, SHIFT) combine. NFCI.
We don't gain anything from forcing another round of generic shift lowering - we know these are legal constant splat shifts.

llvm-svn: 340300
2018-08-21 16:46:48 +00:00
Farhana Aleen 3528c80378 [AMDGPU] Support idot2 pattern.
Summary: Transform add (mul ((i32)S0.x, (i32)S1.x),

         add( mul ((i32)S0.y, (i32)S1.y), (i32)S3) => i/udot2((v2i16)S0, (v2i16)S1, (i32)S3)

Author: FarhanaAleen

Reviewed By: arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D50024

llvm-svn: 340295
2018-08-21 16:21:15 +00:00
Simon Pilgrim 5a83a1fd13 [X86][SSE] Add helper function to convert to/between the SSE vector shift opcodes. NFCI.
Also remove some more getOpcode calls from LowerShift when we already have Opc.

llvm-svn: 340290
2018-08-21 15:57:33 +00:00
Daniel Sanders 6a943fb16a [aarch64][mc] Don't lookup symbols when there is no symbol lookup callback
Summary: When run under llvm-mc-disassemble-fuzzer, there is no symbol lookup callback so tryAddingSymbolicOperand() must fail gracefully instead of crashing

Reviewers: aemerson, javed.absar

Reviewed By: aemerson

Subscribers: lhames, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D51005

llvm-svn: 340287
2018-08-21 15:47:25 +00:00
Tim Renouf bb5ee41ab4 [AMDGPU] Allow int types for MUBUF vdata
Summary:
Previously the new llvm.amdgcn.raw/struct.buffer.load/store intrinsics
only allowed float types for the data to be loaded or stored, which
sometimes meant the frontend needed to generate a bitcast. In this, the
new intrinsics copied the old buffer intrinsics.

This commit extends the new intrinsics to allow int types as well.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D50315

Change-Id: I8202af2d036455553681dcbb3d7d32ae273f8f85
llvm-svn: 340270
2018-08-21 11:08:12 +00:00
Tim Renouf 4f703f5e11 [AMDGPU] New buffer intrinsics
Summary:
This commit adds new intrinsics
  llvm.amdgcn.raw.buffer.load
  llvm.amdgcn.raw.buffer.load.format
  llvm.amdgcn.raw.buffer.load.format.d16
  llvm.amdgcn.struct.buffer.load
  llvm.amdgcn.struct.buffer.load.format
  llvm.amdgcn.struct.buffer.load.format.d16
  llvm.amdgcn.raw.buffer.store
  llvm.amdgcn.raw.buffer.store.format
  llvm.amdgcn.raw.buffer.store.format.d16
  llvm.amdgcn.struct.buffer.store
  llvm.amdgcn.struct.buffer.store.format
  llvm.amdgcn.struct.buffer.store.format.d16
  llvm.amdgcn.raw.buffer.atomic.*
  llvm.amdgcn.struct.buffer.atomic.*

with the following changes from the llvm.amdgcn.buffer.*
intrinsics:

* there are separate raw and struct versions: raw does not have an
  index arg and sets idxen=0 in the instruction, and struct always sets
  idxen=1 in the instruction even if the index is 0, to allow for the
  fact that gfx9 does bounds checking differently depending on whether
  idxen is set;

* there is a combined cachepolicy arg (glc+slc)

* there are now only two offset args: one for the offset that is
  included in bounds checking and swizzling, to be split between the
  instruction's voffset and immoffset fields, and one for the offset
  that is excluded from bounds checking and swizzling, to go into the
  instruction's soffset field.

The AMDISD::BUFFER_* SD nodes always have an index operand, all three
offset operands, combined cachepolicy operand, and an extra idxen
operand.

The obsolescent llvm.amdgcn.buffer.* intrinsics continue to work.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D50306

Change-Id: If897ea7dc34fcbf4d5496e98cc99a934f62fc205
llvm-svn: 340269
2018-08-21 11:07:10 +00:00
Tim Renouf 35484c9d50 [AMDGPU] New tbuffer intrinsics
Summary:
This commit adds new intrinsics
  llvm.amdgcn.raw.tbuffer.load
  llvm.amdgcn.struct.tbuffer.load
  llvm.amdgcn.raw.tbuffer.store
  llvm.amdgcn.struct.tbuffer.store

with the following changes from the llvm.amdgcn.tbuffer.* intrinsics:

* there are separate raw and struct versions: raw does not have an index
  arg and sets idxen=0 in the instruction, and struct always sets
  idxen=1 in the instruction even if the index is 0, to allow for the
  fact that gfx9 does bounds checking differently depending on whether
  idxen is set;

* there is a combined format arg (dfmt+nfmt)

* there is a combined cachepolicy arg (glc+slc)

* there are now only two offset args: one for the offset that is
  included in bounds checking and swizzling, to be split between the
  instruction's voffset and immoffset fields, and one for the offset
  that is excluded from bounds checking and swizzling, to go into the
  instruction's soffset field.

The AMDISD::TBUFFER_* SD nodes always have an index operand, all three
offset operands, combined format operand, combined cachepolicy operand,
and an extra idxen operand.

The tbuffer pseudo- and real instructions now also have a combined
format operand.

The obsolescent llvm.amdgcn.tbuffer.* and llvm.SI.tbuffer.store
intrinsics continue to work.

V2: Separate raw and struct intrinsics.
V3: Moved extract_glc and extract_slc defs to a more sensible place.
V4: Rebased on D49995.
V5: Only two separate offset args instead of three.
V6: Pseudo- and real instructions have joint format operand.
V7: Restored optionality of dfmt and nfmt in assembler.
V8: Addressed minor review comments.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49026

Change-Id: If22ad77e349fac3a5d2f72dda53c010377d470d4
llvm-svn: 340268
2018-08-21 11:06:05 +00:00
Petar Jovanovic 3b953c37f8 [MIPS GlobalISel] Select bitwise instructions
Select bitwise instructions for i32.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D50183

llvm-svn: 340258
2018-08-21 08:15:56 +00:00
Heejin Ahn 487992cc09 [WebAssembly] Revert type of wake count in atomic.wake to i32
Summary:
We decided to revert this from i64 to i32 in Nov 28 CG meeting. Fixes
PR38632.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D51010

llvm-svn: 340234
2018-08-20 23:49:29 +00:00
Heejin Ahn c2c33c8e64 [WebAssembly] Remove an unused argument from writeSPToMemory (NFC)
Reviewers: dschuff

Subscribers: dschuff, sbc100, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D50933

llvm-svn: 340230
2018-08-20 23:02:15 +00:00
Craig Topper 210ccfe3db [X86] Prevent lowerVectorShuffleByMerging128BitLanes from creating cycles
Due to some splat handling code in getVectorShuffle, its possible for NewV1/NewV2 to have their mask modified from what is requested. This can lead to cycles being created in the DAG.

This patch examines the returned mask and makes sure its different. Long term we may need to look closer at that splat code in getVectorShuffle, or add more splat awareness to getVectorShuffle.

Fixes PR38639

Differential Revision: https://reviews.llvm.org/D50981

llvm-svn: 340214
2018-08-20 21:08:35 +00:00
Craig Topper 7dcb2c4b0a [X86] Teach combineTruncatedArithmetic to handle some cases of ISD::SUB
We can safely avoid interfering with the subus combine if both inputs are freely truncatable. Either both extends, or an extend and a constant vector.

Differential Revision: https://reviews.llvm.org/D50878

llvm-svn: 340212
2018-08-20 20:57:35 +00:00
Vitaly Buka 30b5ed3eb7 Revert "AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space"
As it introduces out of bound access.

This reverts commit r340172 and r340171

llvm-svn: 340202
2018-08-20 19:31:03 +00:00
Marcello Maggioni 5ca4128b45 [PSV] Update API to be able to use TargetCustom without UB.
getTargetCustom() requires values for "Kind" in the constructor
that are not in the PSVKind enum. Passing a value that is not inside
an enum as an argument to a constructor of the type of the enum is
UB. Changing to the underlying type of the enum would solve the UB

Differential Revision: https://reviews.llvm.org/D50909

llvm-svn: 340200
2018-08-20 19:23:45 +00:00
Samuel Pitoiset 216a2da577 AMDGPU: fix compilation errors since r340171
Some buildbot slaves reports compilation errors, but it
compiled fine on my side, sorry for the breakage.

llvm-svn: 340172
2018-08-20 13:31:41 +00:00
Samuel Pitoiset c95ef77d37 AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space
32-bit constant address space is declared as 6, so the
maximum number of address spaces is 6, not 5.

Fixes "LLVM ERROR: Pointer address space out of range".

v3: use static_assert()
v2: add a very simple test for 32-bit addr space

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
llvm-svn: 340171
2018-08-20 13:18:59 +00:00
Sander de Smalen 07db432265 [AArch64][SVE] Asm: Add SVE System registers
This patch adds system registers for controlling aspects of SVE:
- ZCR_EL1  (r/w)   visible at EL1 and EL0.
- ZCR_EL2  (r/w)   visible at EL2 and Non-secure EL1 and EL0.
- ZCR_EL3  (r/w)   visible at all exception levels.

and a system register identifying SVE:
- ID_AA64ZFR0_EL1  (r)  SVE Feature identifier.

Reviewers: SjoerdMeijer, samparker, pbarrio, fhahn, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D50885

llvm-svn: 340158
2018-08-20 09:16:59 +00:00
QingShan Zhang f8f9af7ba5 [PowerPC] Add a peephole post RA to transform the inst that fed by add
If the arch is P8, we will select XFLOAD to load the floating point, and then, expand it to vsx and non-vsx X-form instruction post RA. This patch is trying to convert the X-form to D-form if it meets the requirement that one operand of the x-form inst is the special Zero register, and another operand fed by add inst. i.e.
y = add imm, reg
LFDX. 0, y
-->
LFD imm(reg)

Reviewers: Nemanjai
Differential Revision: https://reviews.llvm.org/D49007

llvm-svn: 340149
2018-08-20 02:52:55 +00:00
Craig Topper 803912ea57 [X86] Fix an issue in the matching for ADDUS.
We were basically assuming only one operand of the compare could be an ADD node and using that to swap operands. But we can have a normal add followed by a saturing add.

This rewrites the canonicalization to just be based on the condition code.

llvm-svn: 340134
2018-08-19 04:26:31 +00:00
Craig Topper 2b03df9b05 [X86] Use SDValue::operator== instead of DAG.isEqualTo in strictly integer matching.
isEqualTo is more useful for floating point. operator== is sufficient for integer.

llvm-svn: 340130
2018-08-18 19:16:56 +00:00
Craig Topper 3e299d896f [X86] Simplify the PADDUS legality check in combineSelect to match PSUBUS. NFC
While there remove some trailing whitespace.

llvm-svn: 340129
2018-08-18 18:51:04 +00:00
Craig Topper 40c9559b74 [X86] Add support for using 512-bit PSUBUS to combineSelect.
The code already support 128 and 256 and even knows to split 256 for AVX1. So we really just needed to stop looking for specific VTs and subtarget features and just look for legal VTs with i8/i16 elements.

While there, add some curly braces around outer if statement bodies that contain only another if. It makes all the closing curly braces look more regular.

llvm-svn: 340128
2018-08-18 18:51:03 +00:00
Simon Pilgrim 9c1761a6fd [X86] Replace all single match schedule class instregexs with instrs entries
Helps reduce cost of instrw collection

llvm-svn: 340124
2018-08-18 18:04:29 +00:00
Simon Pilgrim ebfd6ebba7 [X86] Merge shift/rotate schedule class instregexs
Helps reduce cost of instrw collection

llvm-svn: 340123
2018-08-18 15:58:19 +00:00
Zachary Turner bc94ae437f Add the extended XMM registers mappings for AVX-512.
After this we should have the entire AVX-512 register set
mapping in place.

llvm-svn: 340118
2018-08-18 03:54:16 +00:00
Craig Topper 62dd1b1b4f [X86] Remove detectAddSubSatPattern.
This was added very recently in r339650, but appears to be completely untested and has at least one bug in it.

llvm-svn: 340086
2018-08-17 21:19:28 +00:00
Krzysztof Parzyszek 9937e205e8 [Hexagon] Remove unused functions from HexagonInstPrinter, NFC
llvm-svn: 340081
2018-08-17 21:12:37 +00:00
Simon Pilgrim 2f48122cc9 [X86][SSE] Lower constant vXi8 ISD::SRL/ISD::SRA using PMULLW
Extending the concept introduced in D49562, this patch lowers constant vXi8 ISD::SRL/ISD::SRA by zero/sign extending to vXi16 and using PMULLW and then truncating the high 8 bits of the result.

Differential Revision: https://reviews.llvm.org/D50781

llvm-svn: 340062
2018-08-17 18:03:11 +00:00
Craig Topper 730890dbdb [X86] Use hasOneUse instead of isOnlyUserOf. NFCI
isOnlyUserOf is a little heavier because it allows the node to be used multiple times by the other node. In this case we are looking at a truncate which only has one operand so we know it can only use it once. Thus hasOneUse is better.

llvm-svn: 340059
2018-08-17 17:57:25 +00:00
Stefan Pintilie 39869ccf51 [PowerPC] Generate lxsd instead of the ld->mtvsrd sequence for vector loads
This patch addresses:

- Implementation within PPCISelLowering.cpp to check if we should use direct
load into vector instructions (such as lxsd/lfd ) when the scalar_to_vector
function is used; which will allow us to catch as many cases of the
scalar_to_vector uses as possible to translate the ld->mtvsrd sequence into
lxsd.

- Test cases to exhibit the behaviour of emitting lxsd/lfd.

Patch by amyk

Differential revision: https://reviews.llvm.org/D49698

llvm-svn: 340037
2018-08-17 15:15:26 +00:00
Francis Visoiu Mistrih 8bff832534 [X86] Fix liveness information when expanding X86::EH_SjLj_LongJmp64
test/CodeGen/X86/shadow-stack.ll has the following machine verifier
errors:

```
*** Bad machine code: Using a killed virtual register ***
- function:    bar
- basic block: %bb.6 entry (0x7fdc81857818)
- instruction: %3:gr64 = MOV64rm killed %2:gr64, 1, $noreg, 8, $noreg
- operand 1:   killed %2:gr64

*** Bad machine code: Using a killed virtual register ***
- function:    bar
- basic block: %bb.6 entry (0x7fdc81857818)
- instruction: $rsp = MOV64rm killed %2:gr64, 1, $noreg, 16, $noreg
- operand 1:   killed %2:gr64

*** Bad machine code: Virtual register killed in block, but needed live out. ***
- function:    bar
- basic block: %bb.2 entry (0x7fdc818574f8)
Virtual register %2 is used after the block.
```

The fix here is to only copy the machine operand's register without the
kill flags for all the instructions except the very last one of the
sequence.

I had to insert dummy PHIs in the test case to force the NoPHI function
property to be set to false. More on this here: https://llvm.org/PR38439

Differential Revision: https://reviews.llvm.org/D50260

llvm-svn: 340033
2018-08-17 14:46:56 +00:00
Krzysztof Parzyszek 39a979c838 [Hexagon] Expand vgather pseudos during packetization
This will allow packetizing the vgather expansion with other instructions.

llvm-svn: 340028
2018-08-17 14:24:24 +00:00
Roger Ferrer Ibanez 734a04ea33 [RISCV] Remove unused function
This function is not virtual, it is private and it is not called anywhere. No
regression is introduced by removing it.

I think we can safely remove it.

Differential Revision: https://reviews.llvm.org/D50836

llvm-svn: 340024
2018-08-17 13:40:03 +00:00
Luke Cheeseman 64dcdec60c [AArch64] - Generate pointer authentication instructions
- Generate pointer authentication instructions
- The functions instrumented depend on function attribtues:
  all (all functions instrumentent)
  non-leaf (only those that spill LR)
  none
- Function epilogues sign the LR before spilling to the stack and authenticate
  the LR once restored
- If the target is v8.3a or greater than can use the combined authenticate and
  return instruction

Differential revision: https://reviews.llvm.org/D49793

llvm-svn: 340018
2018-08-17 12:53:22 +00:00
Nemanja Ivanovic 39751276b0 [PowerPC] Generate Power9 extswsli extend sign and shift immediate instruction
Add a DAG combine for the PowerPC code generator to generate the Power9 extswsli
extend sign and shift immediate instruction.

Patch by RolandF.

Differential revision: https://reviews.llvm.org/D49879

llvm-svn: 340016
2018-08-17 12:35:44 +00:00
Bernard Ogden b828bb2a15 [ARM/AArch64] Support FP16 +fp16fml instructions
Add +fp16fml feature for new FP16 instructions, which are a
mandatory part of FP16 from v8.4-A and an optional part of FP16
from v8.2-A. It doesn't seem to be possible to model this in
LLVM, but the relationship between the options is handled by
the related clang patch.

In keeping with what I think is the usual practice, the fp16fml
extension is accepted regardless of base architecture version.

Builds on/replaces Sjoerd Meijer's patch to add these instructions at
https://reviews.llvm.org/D49839.

Differential Revision: https://reviews.llvm.org/D50228

llvm-svn: 340013
2018-08-17 11:29:49 +00:00
Daniel Cederman 0c597ca223 [Sparc] Get sret arg size from CallLoweringInfo.getArgs()
Summary:
Looking at the callee argument list, as is done now, might not work if
the function has been typecasted into one that is expected to return
a struct. This change also simplifies the code.

The isFP128ABICall() function can be removed as it is no longer needed.
The test in fp128.ll has been updated to verify this.

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D48117

llvm-svn: 340008
2018-08-17 10:40:00 +00:00
Daniel Cederman 7d3e08ff8d [Sparc] Flush register windows for @llvm.returnaddress(1)
Summary: When @llvm.returnaddress is called with a value higher than 0
it needs to read from the call stack to get the return address. This
means that the register windows needs to be flushed to the stack to
guarantee that the data read is valid. For values higher than 1 this
is done indirectly by the call to getFRAMEADDR(), but not for the value 1.

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D48636

llvm-svn: 340003
2018-08-17 09:18:31 +00:00
Sjoerd Meijer 31239a4c6a [ARM][NFC] ARMCodeGenPrepare: some refactoring and algorithm description
Differential Revision: https://reviews.llvm.org/D50846

llvm-svn: 339997
2018-08-17 07:34:01 +00:00
Heejin Ahn a93e726170 [WebAssembly] Modify LateEHPrepare one-line description (NFC)
llvm-svn: 339972
2018-08-17 00:12:04 +00:00
Heejin Ahn e76fa9ecca [WebAssembly] CFG stackify support for exception handling
Summary:
This adds support for exception handling to CFGStackify pass. This only
adds TRY / END_TRY markers and DOES NOT yet fix unwind mismatches that
can be created by the linearization of the CFG into the structural wasm
format. The mismatch fix will be added by following patches.

In detail, this patch
- Added support for TRY / END_TRY markers to support EH
- Changed many static functions into class member functions as they take
too many arguments now
- Added several more bookeeping data structures
- Refactored routines that decide where to insert markers, because
without refactoring this got too complicated as we added support for new
kinds of markers (TRY/END_TRY).
- Rewrote rethrow instructions' BB arguments to relative depths in EH
pad stack.

Reviewers: dschuff, sunfish

Subscribers: sbc100, jgravelle-google, llvm-commits

Differential Revision: https://reviews.llvm.org/D48273

llvm-svn: 339967
2018-08-16 23:50:59 +00:00
Craig Topper bde2b43cb3 [X86] In EFLAGS copy pass, don't emit EXTRACT_SUBREG instructions since we're after peephole
Normally the peephole pass converts EXTRACT_SUBREG to COPY instructions. But we're after peephole so we can't rely on it to clean these up.

To fix this, the eflags pass now emits a COPY with a subreg input.

I also noticed that in 32-bit mode we need to constrain the input to the copy to ensure the subreg is valid. Otherwise we'll fail verify-machineinstrs

Differential Revision: https://reviews.llvm.org/D50656

llvm-svn: 339945
2018-08-16 21:54:02 +00:00
Chandler Carruth c73c0307fe [MI] Change the array of `MachineMemOperand` pointers to be
a generically extensible collection of extra info attached to
a `MachineInstr`.

The primary change here is cleaning up the APIs used for setting and
manipulating the `MachineMemOperand` pointer arrays so chat we can
change how they are allocated.

Then we introduce an extra info object that using the trailing object
pattern to attach some number of MMOs but also other extra info. The
design of this is specifically so that this extra info has a fixed
necessary cost (the header tracking what extra info is included) and
everything else can be tail allocated. This pattern works especially
well with a `BumpPtrAllocator` which we use here.

I've also added the basic scaffolding for putting interesting pointers
into this, namely pre- and post-instruction symbols. These aren't used
anywhere yet, they're just there to ensure I've actually gotten the data
structure types correct. I'll flesh out support for these in
a subsequent patch (MIR dumping, parsing, the works).

Finally, I've included an optimization where we store any single pointer
inline in the `MachineInstr` to avoid the allocation overhead. This is
expected to be the overwhelmingly most common case and so should avoid
any memory usage growth due to slightly less clever / dense allocation
when dealing with >1 MMO. This did require several ergonomic
improvements to the `PointerSumType` to reasonably support the various
usage models.

This also has a side effect of freeing up 8 bits within the
`MachineInstr` which could be repurposed for something else.

The suggested direction here came largely from Hal Finkel. I hope it was
worth it. ;] It does hopefully clear a path for subsequent extensions
w/o nearly as much leg work. Lots of thanks to Reid and Justin for
careful reviews and ideas about how to do all of this.

Differential Revision: https://reviews.llvm.org/D50701

llvm-svn: 339940
2018-08-16 21:30:05 +00:00
Jacob Gravelle 3d668d3928 [WebAssembly] Remove temporary workaround for function bitcasts
Summary:
EM_ASM no longer is lowered as varargs in C, so this workaround is
obsolete.

Reviewers: dschuff, sunfish

Subscribers: sbc100, aheejin, llvm-commits

Differential Revision: https://reviews.llvm.org/D50859

llvm-svn: 339925
2018-08-16 19:24:31 +00:00
Reid Kleckner bd5d71229d [codeview] Use push_macro to avoid conflicts instead of a prefix
Summary:
This prefix was added in r333421, and it changed our dumper output to
say things like "CVRegEAX" instead of just "EAX". That's a functional
change that I'd rather avoid.

I tested GCC, Clang, and MSVC, and all of them support #pragma
push_macro. They don't issue warnings whem the macro is not defined
either.

I don't have a Mac so I can't test the real termios.h header, but I
looked at the termios.h sources online and looked for other conflicts.
I saw only the CR* macros, so those are the ones we work around.

Reviewers: zturner, JDevlieghere

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D50851

llvm-svn: 339907
2018-08-16 17:34:31 +00:00
Matt Arsenault 7121bed210 AMDGPU: Custom lower fexp
This will allow the library to just use __builtin_expf directly
without expanding this itself. Note f64 still won't work because
there is no exp instruction for it.

llvm-svn: 339902
2018-08-16 17:07:52 +00:00
Nirav Dave 7fd992a755 [MC][X86] Enhance X86 Register expression handling to more closely match GCC.
Allow the comparison of x86 registers in the evaluation of assembler
directives. This generalizes and simplifies the extension from r334022
to catch another case found in the Linux kernel.

Reviewers: rnk, void

Reviewed By: rnk

Subscribers: hiraditya, nickdesaulniers, llvm-commits

Differential Revision: https://reviews.llvm.org/D50795

llvm-svn: 339895
2018-08-16 16:31:14 +00:00
Zachary Turner 2838b59121 Add support for AVX-512 CodeView registers.
When compiling with /arch:AVX512 and optimizations turned on,
we could crash while emitting debug info because we did not
have CodeView register constants for the AVX 512 register
set defined.  This patch defines them.

Differential Revision: https://reviews.llvm.org/D50819

llvm-svn: 339893
2018-08-16 16:17:55 +00:00
Sam Parker 0d51197051 [ARM] Ignore GEPs in ARMCodeGenPrepare
While searching through the use-def tree, ignore GetElementPtrInst
instructions because they don't need promoting and neither do their
indices. Otherwise, the wide indices prevent the transformation from
happening.

Differential Revision: https://reviews.llvm.org/D50762

llvm-svn: 339871
2018-08-16 12:24:40 +00:00
Sam Parker 0e2f0bd48e [ARM] Allow zext in ARMCodeGenPrepare
Treat zext instructions as roots, like we do for truncs.

Differential Revision: https://reviews.llvm.org/D50759

llvm-svn: 339868
2018-08-16 11:54:09 +00:00
Sam Parker 13567dbbd8 [ARM] Allow signed icmps in ARMCodeGenPrepare
Originally committed in r339755 which was reverted in r339806 due to
an asan issue. The issue was caused by my assumption that operands to
a CallInst mapped to the FunctionType Params. CallInsts are now
handled by iterating over their ArgOperands instead of Operands.
    
Original Message:
  Treat signed icmps as 'sinks', allowing them to be in the use-def
  tree, enabling more promotions to be performed. As a sink, any
  promoted incoming values need to be truncated before being used by
  the signed icmp.
    
  Differential Revision: https://reviews.llvm.org/D50067

llvm-svn: 339858
2018-08-16 10:05:39 +00:00
Simon Atanasyan a8ac4308aa [mips] Remove dead code from MipsPassConfig
Found by GCC's -Wunused-function.

Patch by Kim Gräsman.

Differential revision: https://reviews.llvm.org/D50612

llvm-svn: 339847
2018-08-16 08:43:17 +00:00
Craig Topper 9c1d9fdeaa [X86] Remove masking from the 512-bit padds and psubs intrinsics. Use select in IR instead.
llvm-svn: 339842
2018-08-16 06:20:24 +00:00
Craig Topper 9d6983c9fd [X86] Remove the unused masked 128 and 256-bit masked padds/psubs intrinsics.
Still need to remove masking from the 512-bit versions.

llvm-svn: 339841
2018-08-16 06:20:22 +00:00
Chandler Carruth 00c35c7794 [x86] Actually initialize the SLH pass with the x86 backend and use
a shorter name ('x86-slh') for the internal flags and pass name.

Without this, you can't use the -stop-after or -stop-before
infrastructure. I seem to have just missed this when originally adding
the pass.

The shorter name solves two problems. First, the flag names were ...
really long and hard to type/manage. Second, the pass name can't be the
exact same as the flag name used to enable this, and there are already
some users of that flag name so I'm avoiding changing it unnecessarily.

llvm-svn: 339836
2018-08-16 01:22:19 +00:00
Matt Arsenault f533e6b0ed AMDGPU: Fold fneg into fmed3
llvm-svn: 339821
2018-08-15 21:46:27 +00:00
Matt Arsenault a816073764 AMDGPU: Improve extract_vector_elt reduction combine
Handle fmul, fsub and preserve flags.

Also really test minnum/maxnum reductions.
The existing tests were only checking from
minnum/maxnum matched from a fast math compare
and select which is not the same.

llvm-svn: 339820
2018-08-15 21:34:06 +00:00
Matt Arsenault b3a80e5397 AMDGPU: Implement llvm.amdgcn.icmp/fcmp for i16/f16
Also support these on targets without support for these,
since it will allow us to freely create these in instcombine.

llvm-svn: 339819
2018-08-15 21:25:20 +00:00
Craig Topper 08e082619a [X86] Improve AVX1 shuffle lowering for v8f32 shuffles where the low half comes from V1 and the high half comes from V2 and the halves do the same operation
To lower this we now create a new V1 containing the low half of both sources and a new V2 containing the upper half of both sources. Then we created a repeated lane shuffle of those new sources to create the final result.

This fixes PR35833

Differential Revison: https://reviews.llvm.org/D41794

llvm-svn: 339818
2018-08-15 21:21:52 +00:00
Matt Arsenault 6c7ba82900 AMDGPU: Address todo for handling 1/(2 pi)
llvm-svn: 339814
2018-08-15 21:03:55 +00:00
Vitaly Buka ed4239f482 Revert "[ARM] Allow signed icmps in ARMCodeGenPrepare"
use-after-poison in check-llvm under asan

This reverts commit r339755.

llvm-svn: 339806
2018-08-15 20:09:35 +00:00
Thomas Lively 5222cb601b [WebAssembly][NFC] Standardize SIMD multiclass format
Summary:
This CL changes the ExtractLane ISEL multiclass to more closely mirror
the structure of the splat and replace_lane multiclasses.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D50794

llvm-svn: 339801
2018-08-15 18:15:18 +00:00
Thomas Lively 39fe480832 [WebAssembly] Test commit
Changes a comment and some whitespace to test commit access.

llvm-svn: 339798
2018-08-15 17:50:22 +00:00
Derek Schuff 82812fb986 [WebAssembly] SIMD replace_lane
Implement and test replace_lane instructions.

Patch by Thomas Lively

Differential Revision: https://reviews.llvm.org/D50750

llvm-svn: 339786
2018-08-15 16:18:51 +00:00
Nemanja Ivanovic 5b9a4f8ee5 [PowerPC] Enhance the selection(ISD::VSELECT) of vector type
To make ISD::VSELECT available(legal) so long as there are altivec instruction,
otherwise it's default behavior is expanding.
Use xxsel to match vselect if vsx is open, or use vsel.

In order to do not write many patterns in td file, promote (for vector it's
bitcast) all other type into v4i32 and only pattern match vselect of v4i32 into
vsel or xxsel.

Patch by wuzish
Differential revision: https://reviews.llvm.org/D49531

llvm-svn: 339779
2018-08-15 15:30:36 +00:00
Krzysztof Parzyszek 2a119b9a98 [SystemZ] Replace subreg_r with subreg_h
Change
  subreg_r32  -> subreg_h32
  subreg_r64  -> subreg_h64
  subreg_hr32 -> subreg_hh32

The subregisters subreg_r32 and subreg_r64 were added to emphasize the
fact that modifying these subregisters may clobber the entire register.
This is not necessarily the case for subreg_h32, et al.

However, the ability to compose subreg_h64 with subreg_r32, and with
subreg_h32 and subreg_l32 at the same time makes the compositions be
treated as non-overlapping (leading to problems when tracking subreg
liveness). See D50468 for more details.

Differential Revision: https://reviews.llvm.org/D50725

llvm-svn: 339778
2018-08-15 15:21:23 +00:00
Jonas Paulsson d5a9c2d551 [SystemZ] New CL option to enable subreg liveness
This option is needed to enable subreg liveness tracking during register
allocation.

Review: Ulrich Weigand
https://reviews.llvm.org/D50779

llvm-svn: 339776
2018-08-15 15:04:49 +00:00
Sam Parker fabf7fe5f8 [ARM] TypeSize lower bound for ARMCodeGenPrepare
We only try to promote types with are smaller than 16-bits, but we
also need to check that the type is not less than 8-bits.

Differential Revision: https://reviews.llvm.org/D50769

llvm-svn: 339770
2018-08-15 13:29:50 +00:00
Nemanja Ivanovic 8b4bd09e22 [PowerPC] Don't run BV DAG Combine before legalization if it assumes legal types
When trying to combine a DAG that builds a vector out of sign-extensions of
vector extracts, the code assumes legal input types. Due to that, we have to
disable this combine prior to legalization.
In some cases, the DAG will look slightly different after legalization so
account for that in the matching code.

This is a fix for https://bugs.llvm.org/show_bug.cgi?id=38087

Differential Revision: https://reviews.llvm.org/D49080

llvm-svn: 339769
2018-08-15 12:58:13 +00:00
Simon Pilgrim f3b5943ffc Remove lambda default argument to fix gcc pedantic warning.
llvm-svn: 339767
2018-08-15 12:32:09 +00:00
Sam Parker 6548cd3905 [ARM] Allow signed icmps in ARMCodeGenPrepare
Treat signed icmps as 'sinks', allowing them to be in the use-def
tree, enabling more promotions to be performed. As a sink, any
promoted incoming values need to be truncated before being used by
the signed icmp.

Differential Revision: https://reviews.llvm.org/D50067

llvm-svn: 339755
2018-08-15 08:23:03 +00:00
Sam Parker 7def86bbdb [ARM] Allow pointer values in ARMCodeGenPrepare
Add pointers to the list of allowed types, but don't try to promote
them. Also fixed a bug with the promotion of undef values, so a new
value is now created instead of mutating in place. We also now only
promote if there's an instruction in the use-def chains other than
the icmp, sinks and sources.

Differential Revision: https://reviews.llvm.org/D50054

llvm-svn: 339754
2018-08-15 07:52:35 +00:00
Craig Topper 633fe98e27 [X86] Change legacy SSE scalar fp to integer intrinsics to use specific ISD opcodes instead of keeping as intrinsics. Unify SSE and AVX512 isel patterns.
AVX512 added new versions of these intrinsics that take a rounding mode. If the rounding mode is 4 the new intrinsics are equivalent to the old intrinsics.

The AVX512 intrinsics were being lowered to ISD opcodes, but the legacy SSE intrinsics were left as intrinsics. This resulted in the AVX512 instructions needing separate patterns for the ISD opcodes and the legacy SSE intrinsics.

Now we convert SSE intrinsics and AVX512 intrinsics with rounding mode 4 to the same ISD opcode so we can share the isel patterns.

llvm-svn: 339749
2018-08-15 01:23:00 +00:00
Chandler Carruth 139b35192a [SDAG] Update the AVR backend for the SelectionDAG API changes in
r339740, fixing the build for this target.

llvm-svn: 339748
2018-08-15 01:22:50 +00:00
Derek Schuff 4ec8bca13e [WebAssembly] SIMD Splats
Implement and test SIMD splat ops.

Patch by Thomas Lively

Differential Revision: https://reviews.llvm.org/D50741

llvm-svn: 339744
2018-08-15 00:30:27 +00:00
Chandler Carruth 66654b72c9 [SDAG] Remove the reliance on MI's allocation strategy for
`MachineMemOperand` pointers attached to `MachineSDNodes` and instead
have the `SelectionDAG` fully manage the memory for this array.

Prior to this change, the memory management was deeply confusing here --
The way the MI was built relied on the `SelectionDAG` allocating memory
for these arrays of pointers using the `MachineFunction`'s allocator so
that the raw pointer to the array could be blindly copied into an
eventual `MachineInstr`. This creates a hard coupling between how
`MachineInstr`s allocate their array of `MachineMemOperand` pointers and
how the `MachineSDNode` does.

This change is motivated in large part by a change I am making to how
`MachineFunction` allocates these pointers, but it seems like a layering
improvement as well.

This would run the risk of increasing allocations overall, but I've
implemented an optimization that should avoid that by storing a single
`MachineMemOperand` pointer directly instead of allocating anything.
This is expected to be a net win because the vast majority of uses of
these only need a single pointer.

As a side-effect, this makes the API for updating a `MachineSDNode` and
a `MachineInstr` reasonably different which seems nice to avoid
unexpected coupling of these two layers. We can map between them, but we
shouldn't be *surprised* at where that occurs. =]

Differential Revision: https://reviews.llvm.org/D50680

llvm-svn: 339740
2018-08-14 23:30:32 +00:00
Eli Friedman 0d12e90bf5 [ARM] Make PerformSHLSimplify add nodes to the DAG worklist correctly.
Intentionally excluding nodes from the DAGCombine worklist is likely to
lead to weird optimizations and infinite loops, so it's generally a bad
idea.

To avoid the infinite loops, fix DAGCombine to use the
isDesirableToCommuteWithShift target hook before performing the
transforms in question, and implement the target hook in the ARM backend
disable the transforms in question.

Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a
reduced testcase for that bug. But we should have sufficient test
coverage for PerformSHLSimplify given that we're not playing weird
tricks with the worklist. I can try to bugpoint it if necessary,
though.)

Differential Revision: https://reviews.llvm.org/D50667

llvm-svn: 339734
2018-08-14 22:10:25 +00:00
Heejin Ahn c9c711a0ac [WebAssembly] Fix encoding of non-SIMD vector-typed instructions
Previously SIMD_I was the same as a normal instruction except for the
addition of a HasSIM128 predicate. However, rL339186 changed the
encoding of SIMD_I instructions to automatically contain the SIMD
prefix byte. This broke the encoding of non-SIMD vector-typed
instructions, which had instantiated SIMD_I. This CL corrects this
error.

Reviewers: aheejin

Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits

Differential Revision: https://reviews.llvm.org/D50682

Patch by Thomas Lively (tlively)

llvm-svn: 339710
2018-08-14 19:03:36 +00:00
Heejin Ahn a0fd9c3e9a [WebAssembly] SIMD extract_lane
Implement instruction selection for all versions of the extract_lane
instruction. Use explicit sext/zext to differentiate between
extract_lane_s and extract_lane_u for applicable types, otherwise
default to extract_lane_u.

Reviewers: aheejin

Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits

Differential Revision: https://reviews.llvm.org/D50597

Patch by Thomas Lively (tlively)

llvm-svn: 339707
2018-08-14 18:53:27 +00:00
Andrea Di Biagio 9eaf5aa006 [Tablegen][MCInstPredicate] Removed redundant template argument from class TIIPredicate, and implemented verification rules for TIIPredicates.
This patch removes redundant template argument `TargetName` from TIIPredicate.
Tablegen can always infer the target name from the context. So we don't need to
force users of TIIPredicate to always specify it.

This allows us to better modularize the tablegen class hierarchy for the
so-called "function predicates". class FunctionPredicateBase has been added; it
is currently used as a building block for TIIPredicates. However, I plan to
reuse that class to model other function predicate classes too (i.e. not just
TIIPredicates). For example, this can be a first step towards implementing
proper support for dependency breaking instructions in tablegen.

This patch also adds a verification step on TIIPredicates in tablegen.
We cannot have multiple TIIPredicates with the same name. Otherwise, this will
cause build errors later on, when tablegen'd .inc files are included by cpp
files and then compiled.

Differential Revision: https://reviews.llvm.org/D50708

llvm-svn: 339706
2018-08-14 18:36:54 +00:00
Simon Pilgrim 2ce3d6e135 [X86][SSE] Avoid duplicate shuffle input sources in combineX86ShufflesRecursively
rL339686 added the case where a faux shuffle might have repeated shuffle inputs coming from either side of the OR().

This patch improves the insertion of the inputs into the source ops lists to account for this, as well as making it trivial to add support for shuffles with more than 2 inputs in the future.

llvm-svn: 339696
2018-08-14 17:22:37 +00:00
Simon Pilgrim ed55138247 [X86][SSE] Add shuffle combine support for OR(PSHUFB,PSHUFB) style patterns.
If each element is zero from one (or both) inputs then we can combine these into a single shuffle mask.

llvm-svn: 339686
2018-08-14 16:00:05 +00:00
Simon Pilgrim df9880f257 [X86][SSE] Generalize lowerVectorShuffleAsBlendOfPSHUFBs to work with any vXi8 type.
We still only use this for v16i8, but this cleans up the code to support v32i8/v64i8 sometime in the future.

llvm-svn: 339679
2018-08-14 14:00:14 +00:00
Roger Ferrer Ibanez c8f4dbbc63 [RISCV] Fix incorrect use of MCInstBuilder
This is a fix for r339314.

MCInstBuilder uses the named parameter idiom and an 'operator MCInst&' to ease
the creation of MCInsts. As the object of MCInstBuilder owns the MCInst is
manipulating, the lifetime of the MCInst is bound to that of MCInstBuilder.

In r339314 I bound a reference to the MCInst in an initializer. The
temporary of MCInstBuilder (and also its MCInst) is destroyed at the end of
the declaration leading to a dangling reference.

Fix this by using MCInstBuilder inside an argument of a function call.
Temporaries in function calls are destroyed in the enclosing full expression,
so the the reference to MCInst is still valid when emitToStreamer executes.

llvm-svn: 339654
2018-08-14 08:30:42 +00:00
Chih-Mao Chen 5d94b25ffe Test commit: fix punctuation
llvm-svn: 339652
2018-08-14 08:08:39 +00:00
Tomasz Krupa 86a63889f3 [X86] Lowering addus/subus intrinsics to native IR
Summary: This revision improves previous version (rL330322) which has been reverted due to crashes.

This is the patch that lowers x86 intrinsics to native IR
in order to enable optimizations. The patch also includes folding
of previously missing saturation patterns so that IR emits the same
machine instructions as the intrinsics.

Reviewers: craig.topper, spatel, RKSimon

Reviewed By: craig.topper

Subscribers: mike.dvoretsky, DavidKreitzer, sroland, llvm-commits

Differential Revision: https://reviews.llvm.org/D46179

llvm-svn: 339650
2018-08-14 08:00:56 +00:00
Sjoerd Meijer 3c859b3ec3 [ARM] ParallelDSP: add option to enable/disable the pass
Differential Revision: https://reviews.llvm.org/D50511

llvm-svn: 339645
2018-08-14 07:43:49 +00:00
Wouter van Oortmerssen a7be375586 Revert "[WebAssembly] Added default stack-only instruction mode for MC."
This reverts commit 917a99b71ce21c975be7bfbf66f4040f965d9f3c.

llvm-svn: 339630
2018-08-13 23:12:49 +00:00
Craig Topper cade635c77 [X86] Don't ignore 0x66 prefix on relative jumps in 64-bit mode. Fix opcode selection of relative jumps in 16-bit mode. Treat jno/jo like other jcc instructions.
The behavior in 64-bit mode is different between Intel and AMD CPUs. Intel ignores the 0x66 prefix. AMD does not. objump doesn't ignore the 0x66 prefix. Since LLVM aims to match objdump behavior, we should do the same.

While I was trying to fix this I had change brtarget16/32 to use ENCODING_IW/ID instead of ENCODING_Iv to get the 0x66+REX.W case to act sort of sanely. It's still wrong, but that's a problem for another day.

The change in encoding exposed the fact that 16-bit mode disassembly of relative jumps was creating JMP_4 with a 2 byte immediate. It should have been JMP_2. From just printing you can't tell the difference, but if you dumped the encoding it wouldn't have matched what we started with.

While fixing that, it exposed that jo/jno opcodes were missing from the switch that this patch deleted and there were no test cases for them.

Fixes PR38537.

llvm-svn: 339622
2018-08-13 22:06:28 +00:00
Andrea Di Biagio 7b77b14198 [X86][BtVer2] Use NoSchedPredicate to model default transitions in variant scheduling classes. NFC.
llvm-svn: 339589
2018-08-13 17:52:39 +00:00
Krzysztof Parzyszek cce15c76d3 [Hexagon] Silence -Wuninitialized warning from GCC 5.4, NFC
Patch by Kim Gräsman.

Differential Revision: https://reviews.llvm.org/D50623

llvm-svn: 339576
2018-08-13 15:08:25 +00:00
Daniel Cederman dc3e4c6d95 Revert "[Sparc] Add support for the cycle counter available in GR740"
It breaks when using EXPENSIVE_CHECKS with the error message
"Bad machine code: Using an undefined physical register".

llvm-svn: 339570
2018-08-13 14:18:09 +00:00
Sid Manning 8d4a6615e1 Check for tied operands
Differential Revision: https://reviews.llvm.org/D50592

llvm-svn: 339567
2018-08-13 14:01:25 +00:00
Jonas Paulsson 5ffb27b166 [SystemZ] Increase the amount of inlining.
Implement getInliningThresholdMultiplier() and have it return 3.

Review: Ulrich Weigand
llvm-svn: 339563
2018-08-13 13:31:30 +00:00
Daniel Cederman 1bfbc62022 [Sparc] Add support for the cycle counter available in GR740
Summary: The GR740 provides an up cycle counter in the
registers ASR22 and ASR23. As these registers can not be
read together atomically we only use the value of ASR23
for llvm.readcyclecounter(). The ASR23 register holds the
32 LSBs of the up-counter.

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D48638

llvm-svn: 339551
2018-08-13 10:49:48 +00:00
Luke Geeson 4ce41d2bb7 [ARM] Added FP16 VREV Vector Instrinsic CodeGen support
llvm-svn: 339546
2018-08-13 08:37:41 +00:00
Matt Arsenault 13b0db9285 AMDGPU: Check NSZ MI flag when folding omod
I'm not sure the exact nsz flag combination that
is OK. I think as long as it's on either, this is OK.
For now just check it on the omod multiply.

llvm-svn: 339513
2018-08-12 08:44:25 +00:00
Matt Arsenault b5acec1f79 AMDGPU: Use splat vectors for undefs when folding canonicalize
If one of the elements is undef, use the canonicalized constant
from the other element instead of 0.

Splat vectors are more useful for other optimizations, such
as matching vector clamps. This was breaking on clamps
of half3 from the undef 4th component.

llvm-svn: 339512
2018-08-12 08:42:54 +00:00
Matt Arsenault 3ead7d7389 AMDGPU: Fix packing undef parts of build_vector
llvm-svn: 339511
2018-08-12 08:42:46 +00:00
Craig Topper ed8a114c86 [X86] Remove unnecessary AddedComplexity line. NFC
The use of the or_is_add predicate already gives enough of a complexity boost to get the patterns ordered properly.

llvm-svn: 339507
2018-08-12 03:22:18 +00:00
Craig Topper b3e3477649 [X86] Remove the AL/AX/EAX/RAX short immediate forms from the macro fusion shouldScheduleAdjacent. NFC
These instructions are only created by the backend during MCInst lowering.

llvm-svn: 339499
2018-08-11 06:42:51 +00:00
Craig Topper c6cf169940 [X86] Add the mem-reg form of CMP to the macro fusion shouldScheduleAdjacent.
Unlike the other arithmetic instructions the mem-reg form of compare is just a load and not a RMW operation. According to the Intel optimization manual, this form is also supported by macro fusion.

llvm-svn: 339498
2018-08-11 06:42:50 +00:00
Craig Topper 616eeb827d [X86] Remove ADD8mi and ADDmr from the macro fusion shouldScheduleAdjacent.
The are RMW of memory operations. They aren't eligible for macro fusion.

llvm-svn: 339497
2018-08-11 06:42:49 +00:00
Craig Topper 570d47a010 [X86] Change the MOV32ri64 pseudo instruction to def a GR64 directly instead of wrapping it in a SUBREG_TO_REG.
Now we switch to the subregister in expandPostRAPseudos where we already switched the opcode.

This simplifies a few isel patterns that used the pseudo directly. And magically seems to have improved our ability to CSE it in the undef-label.ll test.

llvm-svn: 339496
2018-08-11 05:33:00 +00:00
Richard Trieu 01f99f3cd6 Fix WebAssembly instruction printer after r339474
Treat the stack variants of control instructions the same as regular
instructions.  Otherwise, the vector ControlFlowStack will be the wrong
size and have out-of-bounds access.  This was detected by MemorySanitizer.

llvm-svn: 339495
2018-08-11 04:18:05 +00:00
Tom Stellard 8adc86a7dc AMDGPU/GlobalISel: Define instruction mapping for G_INSERT
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49625

llvm-svn: 339491
2018-08-11 00:51:54 +00:00
Eli Friedman 6b84a48953 Fix unused lambda capture warning from r339472.
llvm-svn: 339479
2018-08-10 22:03:25 +00:00
Wouter van Oortmerssen ab26bd0647 [WebAssembly] Added default stack-only instruction mode for MC.
Summary:
Moved Explicit Locals pass to last.
Made that pass obligatory.
Made it convert from register to stack based instructions, and removed the registers.
Fixes to related code that was expecting register based instructions.
Added the correct testing flag to all tests, depending on what the
format they were expecting so far.
Translated one test to stack format as example: reg-stackify-stack.ll

tested:
llvm-lit -v `find test -name WebAssembly`
unittests/MC/*

Reviewers: dschuff, sunfish

Subscribers: jfb, llvm-commits, aheejin, eraman, jgravelle-google, sbc100

Differential Revision: https://reviews.llvm.org/D50568

llvm-svn: 339474
2018-08-10 21:32:47 +00:00
Eli Friedman e1687a89e8 [ARM] Adjust AND immediates to make them cheaper to select.
LLVM normally prefers to minimize the number of bits set in an AND
immediate, but that doesn't always match the available ARM instructions.
In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer
a two-instruction sequence movs+ands or movs+bics.

Some potential improvements outlined in
ARMTargetLowering::targetShrinkDemandedConstant, but seems to work
pretty well already.

The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX
instruction due to a larger-than-expected mask. (It's orthogonal, in
some sense, but as far as I can tell it's either impossible or nearly
impossible to reproduce the bug without this change.)

According to my testing, this seems to consistently improve codesize by
a small amount by forming bic more often for ISD::AND with an immediate.

Differential Revision: https://reviews.llvm.org/D50030

llvm-svn: 339472
2018-08-10 21:21:53 +00:00
Matt Arsenault 940e6075e4 AMDGPU: More canonicalized operations
llvm-svn: 339464
2018-08-10 19:20:17 +00:00
Matt Arsenault 3dcf4ce435 AMDGPU: Combine and of seto/setuo and fp_class
Clear the nan (or non-nan) test bits from the mask.

llvm-svn: 339462
2018-08-10 18:58:56 +00:00
Matt Arsenault 8ad00d30fa AMDGPU: Match isfinite pattern to class instructions
llvm-svn: 339460
2018-08-10 18:58:41 +00:00
Matt Arsenault 5bb9d798b4 AMDGPU: Add LLVM_FALLTHROUGH
llvm-svn: 339458
2018-08-10 17:57:12 +00:00
Sam Parker 8c4b964c5a [ARM] Disallow zexts in ARMCodeGenPrepare
Enabling ARMCodeGenPrepare by default caused a whole load of
failures. This is due to zexts and truncs not being handled properly.
ZExts are messy so it's just easier to disable for now and truncs
are allowed only as 'sinks'. I still need to figure out why allowing
them as 'sources' causes so many failures. The other main changes are
that we are explicit in the types that we converting to, it's now
always 'TypeSize'. Type support is also now performed while checking
for valid opcodes as it unnecessarily complicated having the checks
are different stages.
    
I've moved the tests around too, so we have the zext and truncs in
their own file as well as the overflowing opcode tests.

Differential Revision: https://reviews.llvm.org/D50518

llvm-svn: 339432
2018-08-10 13:57:13 +00:00
Simon Pilgrim 130b00bc43 [X86][SSE] Pull out repeated shift getOpcode() calls. NFCI.
llvm-svn: 339425
2018-08-10 11:42:42 +00:00
Heejin Ahn 5831e9cc79 [WebAssembly] Gate i64x2 and f64x2 on -wasm-enable-unimplemented
Summary:
i64x2 and f64x2 operations are not implemented in V8, so we normally
do not want to emit them. However, they are in the SIMD spec proposal,
so we still want to be able to test them in the toolchain. This patch
adds a flag to enable their emission.

Reviewers: aheejin, dschuff

Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits

Differential Revision: https://reviews.llvm.org/D50423

Patch by Thomas Lively (tlively)

llvm-svn: 339407
2018-08-09 23:58:51 +00:00
Craig Topper 9a8136f7b4 [X86] Qualify one of the heuristics in combineMul to only apply to positive multiply amounts.
This seems to slightly help the performance of one of our internal benchmarks. We probably need better heuristics here.

llvm-svn: 339406
2018-08-09 23:27:42 +00:00
Heejin Ahn 41b25c6cf4 [WebAssembly] Fix wasm backend compilation on gcc 5.4: variable name cannot match class
Summary:
gcc does not like

const Region *Region;

It wants a different name for the variable.

Is there a better convention for what name to use in such a case?

Reviewers: sbc100, aheejin

Subscribers: aheejin, jgravelle-google, dschuff, llvm-commits

Differential Revision: https://reviews.llvm.org/D50472

Patch by Alon Zakai (kripken)

llvm-svn: 339398
2018-08-09 22:35:23 +00:00
Reid Kleckner fce7f73bec [MC] Move EH DWARF encodings from MC to CodeGen, NFC
Summary:
The TType encoding, LSDA encoding, and personality encoding are all
passed explicitly by CodeGen to the assembler through .cfi_* directives,
so only the AsmPrinter needs to know about them.

The FDE CFI encoding however, controls the encoding of the label
implicitly created by the .cfi_startproc directive. That directive seems
to be special in that it doesn't take an encoding, so the assembler just
has to know how to encode one DSO-local label reference from .eh_frame
to .text.

As a result, it looks like MC will continue to have to know when the
large code model is in use. Perhaps we could invent a '.cfi_startproc
[large]' flag so that this knowledge doesn't need to pollute the
assembler.

Reviewers: davide, lliu0, JDevlieghere

Subscribers: hiraditya, fedor.sergeev, llvm-commits

Differential Revision: https://reviews.llvm.org/D50533

llvm-svn: 339397
2018-08-09 22:24:04 +00:00
Ana Pazos 10de234905 [RISC-V] Fixed alias for addi x2, x2, 0
A missing check for non-zero immediate in MCOperandPredicate
caused c.addi16sp sp, 0 to be selected which is not a valid
instruction.

llvm-svn: 339381
2018-08-09 20:51:53 +00:00
Krzysztof Parzyszek 75c2ca3638 [Hexagon] Map ISD::TRAP to J2_trap0(#0)
llvm-svn: 339365
2018-08-09 18:03:45 +00:00
Evandro Menezes 8c4366273c [ARM] Adjust the feature set for Exynos
Enable `FeatureZCZeroing`, `FeatureHasSlowFPVMLx`, `FeatureExpandMLx`,
`FeatureProfUnpredicate`, `FeatureSlowVDUP32`, `FeatureSlowVGETLNi32`,
`FeatureSplatVFPToNeon`, `FeatureHasRetAddrStack`, `FeatureSlowFPBrcc` for
all Exynos processors.

llvm-svn: 339356
2018-08-09 16:34:38 +00:00
Evandro Menezes 9a92fe0c9e [ARM] Replace processor check with feature
Add new feature, `FeatureUseWideStrideVFP`, that replaces the need for a
processor check.  Otherwise, NFC.

llvm-svn: 339354
2018-08-09 16:13:24 +00:00
Andrea Di Biagio f3bde0485c [MC][PredicateExpander] Extend the grammar to support simple switch and return statements.
This patch introduces tablegen class MCStatement.

Currently, an MCStatement can be either a return statement, or a switch
statement.

```
MCStatement:
   MCReturnStatement
   MCOpcodeSwitchStatement
```

A MCReturnStatement expands to a return statement, and the boolean expression
associated with the return statement is described by a MCInstPredicate.

An MCOpcodeSwitchStatement is a switch statement where the condition is a check
on the machine opcode. It allows the definition of multiple checks, as well as a
default case. More details on the grammar implemented by these two new
constructs can be found in the diff for TargetInstrPredicates.td.

This patch makes it easier to read the body of auto-generated TargetInstrInfo
predicates.

In future, I plan to reuse/extend the MCStatement grammar to describe more
complex target hooks. For now, this is just a first step (mostly a minor
cosmetic change to polish the new predicates framework).

Differential Revision: https://reviews.llvm.org/D50457

llvm-svn: 339352
2018-08-09 15:32:48 +00:00
Sjoerd Meijer 806f70d229 [ARM] FP16: codegen support for VTRN
Differential Revision: https://reviews.llvm.org/D50454

llvm-svn: 339340
2018-08-09 12:45:09 +00:00
Simon Pilgrim 511c3fc529 [X86][SSE] Remove PMULDQ/PMULUDQ by zero
Exposed by D50328

Differential Revision: https://reviews.llvm.org/D50328

llvm-svn: 339337
2018-08-09 12:37:36 +00:00
Simon Pilgrim 01ae462fef [X86][SSE] Combine (some) target shuffles with multiple uses
As discussed on D41794, we have many cases where we fail to combine shuffles as the input operands have other uses.

This patch permits these shuffles to be combined as long as they don't introduce additional variable shuffle masks, which should reduce instruction dependencies and allow the total number of shuffles to still drop without increasing the constant pool.

However, this may mean that some memory folds may no longer occur, and on pre-AVX require the occasional extra register move.

This also exposes some poor PMULDQ/PMULUDQ codegen which was doing unnecessary upper/lower calculations which will in fact fold to zero/undef - the fix will be added in a followup commit.

Differential Revision: https://reviews.llvm.org/D50328

llvm-svn: 339335
2018-08-09 12:30:02 +00:00
Andrew V. Tischenko 24f63bcb34 [X86] Improved sched models for X86 XCHG*rr and XADD*rr instructions.
Differential Revision: https://reviews.llvm.org/D49861

llvm-svn: 339321
2018-08-09 09:23:26 +00:00
Jonas Hahnfeld 20526bf483 [NVPTX] Select atomic loads and stores
According to PTX ISA .volatile has the same memory synchronization
semantics as .relaxed.sys, so it can be used to implement monotonic
atomic loads and stores. This is important for OpenMP's atomic
construct where
 - 'read's and 'write's are lowered to atomic loads and stores, and
 - an update of float or double types are lowered into a cmpxchg loop.
(Note that PTX could do better because it has atom.add.f{32,64} but
LLVM's atomicrmw instruction only allows integer types.)

Higher levels of atomicity (like acquire and release) need additional
synchronization properties which were added with PTX ISA 6.0 / sm_70.
So using these instructions still results in an error.

Differential Revision: https://reviews.llvm.org/D50391

llvm-svn: 339316
2018-08-09 07:45:49 +00:00
Roger Ferrer Ibanez 577a97e2b9 [RISCV] Add "lla" pseudo-instruction to assembler
This pseudo-instruction is similar to la but uses PC-relative addressing
unconditionally. This is, la is only different to lla when using -fPIC. This
pseudo-instruction seems often forgotten in several specs but it is definitely
mentioned in binutils opcodes/riscv-opc.c. The semantics are defined both in
page 37 of the "RISC-V Reader" book but also in function macro found in
gas/config/tc-riscv.c.

This is a very first step towards adding PIC support for Linux in the RISC-V
backend.

The lla pseudo-instruction expands to a sequence of auipc + addi with a couple
of pc-rel relocations where the second points to the first one. This is
described in
https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#pc-relative-symbol-addresses

For now, this patch only introduces support of that pseudo instruction at the
assembler parser.

Differential Revision: https://reviews.llvm.org/D49661

llvm-svn: 339314
2018-08-09 07:08:20 +00:00
Eli Friedman 5b45a39056 [ARM] Avoid spilling lr with Thumb1 tail calls.
Normally, if any registers are spilled, we prefer to spill lr on Thumb1
so we can fold the "bx lr" into the "pop".  However, if there are tail
calls involved, restoring lr is expensive, so skip the optimization in
that case.

The spill of r7 in the new test also isn't necessary, but that's
mostly orthogonal to this patch. (It's the same code in
ARMFrameLowering, but it's not related to tail calls.)

Differential Revision: https://reviews.llvm.org/D49459

llvm-svn: 339283
2018-08-08 20:03:10 +00:00
Krzysztof Parzyszek 1df7059150 [Hexagon] Diagnose misaligned absolute loads and stores
Differential Revision: https://reviews.llvm.org/D50405

llvm-svn: 339272
2018-08-08 17:00:09 +00:00
Matt Arsenault 935f3b70fe AMDGPU: Error more gracefully on libcalls
I think this is the only situation where the callsite
will have a null instruction.

llvm-svn: 339271
2018-08-08 16:58:39 +00:00
Matt Arsenault e719139b10 AMDGPU: Fix shifts for i128
llvm-svn: 339270
2018-08-08 16:58:33 +00:00
Zaara Syeda b2595b988b [PowerPC] Improve codegen for vector loads using scalar_to_vector
This patch aims to improve the codegen for vector loads involving the
scalar_to_vector (load X) sequence. Initially, ld->mv instructions were used
for scalar_to_vector (load X), so this patch allows scalar_to_vector (load X)
to utilize:

LXSD and LXSDX for i64 and f64
LXSIWAX for i32 (sign extension to i64)
LXSIWZX for i32 and f64

Committing on behalf of Amy Kwan.
Differential Revision: https://reviews.llvm.org/D48950

llvm-svn: 339260
2018-08-08 15:20:43 +00:00
Alex Bradbury 07224dfb47 [RISCV] Add mnemonic alias: move, sbreak and scall.
Further improve compatibility with the GNU assembler.

Differential Revision: https://reviews.llvm.org/D50217
Patch by Kito Cheng.

llvm-svn: 339255
2018-08-08 14:53:45 +00:00
Alex Bradbury 7d8d87c143 [RISCV] Add InstAlias definitions for add[w], and, xor, or, sll[w], srl[w], sra[w], slt and sltu with immediate
Match the GNU assembler in supporting immediate operands for these 
instructions even when the reg-reg mnemonic is used.

Differential Revision: https://reviews.llvm.org/D50046
Patch by Kito Cheng.

llvm-svn: 339252
2018-08-08 14:45:44 +00:00
Sjoerd Meijer f8c394f0f5 [ARM] FP16: codegen support for VEXT
Differential Revision: https://reviews.llvm.org/D50427

llvm-svn: 339241
2018-08-08 13:26:38 +00:00
Sjoerd Meijer db5908deb9 [ARM] FP16: vector vmov and vdup support
This adds codegen support for the vmov_n_f16 and vdup_n_f16 variants.

Differential Revision: https://reviews.llvm.org/D50329

llvm-svn: 339238
2018-08-08 13:11:31 +00:00
Sjoerd Meijer 920a453485 [ARM] FP16: vector VMUL variants
This adds codegen support for the vmul_lane_f16 and vmul_n_f16 variants.

Differential Revision: https://reviews.llvm.org/D50326

llvm-svn: 339232
2018-08-08 10:27:34 +00:00
Benjamin Kramer 83996e4dee [Wasm] Don't iterate over MachineBasicBlock::successors while erasing from it
This will read out of bounds. Found by asan.

llvm-svn: 339230
2018-08-08 10:13:19 +00:00
Sjoerd Meijer b33a4c02cc [ARM] FP16: support vector INT_TO_FP and FP_TO_INT
This adds codegen support for the different vcvt_f16 variants.

Differential Revision: https://reviews.llvm.org/D50393

llvm-svn: 339227
2018-08-08 09:45:34 +00:00
Sjoerd Meijer b264944ed5 [ARM] FP16: support the vector vmin and vmax variants
Differential Revision: https://reviews.llvm.org/D50238

llvm-svn: 339221
2018-08-08 07:20:15 +00:00
Jan Vesely 7b2c98ab59 AMDGPU: Remove broken i16 ternary patterns
Fixup test to check for GCN prefix
These patterns always zero extend the result even though it might need sign extension.
This has been broken since the addition of i16 support.
It has popped up in mad_sat(char) test since min(max()) combination is turned into v_med3, resulting in the following (incorrect) sequence:
        v_mad_i16 v2, v10, v9, v11
        v_med3_i32 v2, v2, v8, v7

Fixes mad_sat(char) piglit on VI.

Differential Revision: https://reviews.llvm.org/D49836

llvm-svn: 339190
2018-08-07 21:54:37 +00:00
Derek Schuff 51ed131ed2 [WebAssembly] Update SIMD binary arithmetic
Add missing SIMD types (v2f64) and binary ops. Also adds
tablegen support for automatically prepending prefix byte to SIMD
opcodes.

Differential Revision: https://reviews.llvm.org/D50292

Patch by Thomas Lively

llvm-svn: 339186
2018-08-07 21:24:01 +00:00
Krzysztof Parzyszek e7ce247dd7 [Hexagon] Allow use of gather intrinsics even with no-packets
Vgather requires must be in a packet with a store, which contradicts
the no-packets feature. As a consequence, gather/scatter could not be
used with no-packets. Relax this, and allow gather packets as exceptions
to the no-packets requirements.

llvm-svn: 339177
2018-08-07 20:33:47 +00:00
Heejin Ahn 7fb68d2679 [WebAssembly] CFG sort support for exception handling
Summary:
This patch extends CFGSort pass to support exception handling. Once it
places a loop header, it does not place blocks that are not dominated by
the loop header until all the loop blocks are sorted. This patch extends
the same algorithm to exception 'catch' part, using the information
calculated by WebAssemblyExceptionInfo class.

Reviewers: dschuff, sunfish

Subscribers: sbc100, jgravelle-google, llvm-commits

Differential Revision: https://reviews.llvm.org/D46500

llvm-svn: 339172
2018-08-07 20:19:23 +00:00
Craig Topper deb2899b2d [SelectionDAG][X86][SystemZ] Add a generic nonvolatile_store/nonvolatile_load pattern fragment in TargetSelectionDAG.td
Differential Revision: https://reviews.llvm.org/D50358

llvm-svn: 339156
2018-08-07 17:34:59 +00:00
Sjoerd Meijer b39cd886b9 [ARM] FP16: codegen support for VACGT
Differential Revision: https://reviews.llvm.org/D50236

llvm-svn: 339148
2018-08-07 15:11:47 +00:00
Jonas Paulsson 5438f1debc [SystemZ] Comment update.
Update the comment in nextGroup since the ProcResourceCounters are not anymore
always decremented with '1'.

llvm-svn: 339140
2018-08-07 13:48:09 +00:00
Jonas Paulsson 25cbfdd423 [SystemZ] NFC: Remove redundant check in SystemZHazardRecognizer.
Remove the redundant check against zero when updating ProcResourceCounters in
nextGroup(), as pointed out in https://reviews.llvm.org/D50187.

Review: Ulrich Weigand.
llvm-svn: 339139
2018-08-07 13:44:11 +00:00
Aleksandar Beserminji 949a17c016 [mips] Handle branch expansion corner cases
When potential jump instruction and target are in the same segment, use
jump instruction with immediate field.

In cases where offset does not fit immediate value of a bc/j instructions,
offset is stored into register, and then jump register instruction is used.

Differential Revision: https://reviews.llvm.org/D48019

llvm-svn: 339126
2018-08-07 10:45:45 +00:00
Matt Arsenault 96b678427a AMDGPU: Add feature vi-insts
This is necessary to add a VI specific builtin,
__builtin_amdgcn_s_dcache_wb. We already have an
overly specific feature for one of these builtins,
for s_memrealtime. I'm not sure whether it's better
to add more of those, or to get rid of that and merge
it with vi-insts.

Alternatively, maybe this logically goes with scalar-stores?

llvm-svn: 339104
2018-08-07 07:28:46 +00:00
Craig Topper 9de1797c50 [SelectionDAG][X86] Rename MaskedLoadSDNode::getSrc0 to getPassThru.
Src0 doesn't really convey any meaning to what the operand is. Passthru matches what's used in the documentation for the intrinsic this comes from.

llvm-svn: 339101
2018-08-07 06:52:49 +00:00
Craig Topper 17989208a9 [SelectionDAG][X86] Rename getValue to getPassThru for gather SDNodes.
getValue is more meaningful name for scatter than it is for gather. Split them and use getPassThru for gather.

llvm-svn: 339096
2018-08-07 06:13:40 +00:00
Heejin Ahn e8653bb89a [WebAssembly] Enable atomic expansion for unsupported atomicrmws
Summary:
Wasm does not have direct counterparts to some of LLVM IR's atomicrmw
instructions (min, max, umin, umax, and nand). This enables atomic
expansion using cmpxchg instruction within a loop for those atomicrmw
instructions.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49440

llvm-svn: 339084
2018-08-07 00:22:22 +00:00
Derek Schuff 2c78385960 [WebAssembly] Replace SIMD expression types with V128
Summary:
The spec only defines a SIMD expression type of V128 and
leaves interpretation of different vector types to the instructions.

Differential Revision: https://reviews.llvm.org/D50367

Patch by Thomas Lively

llvm-svn: 339082
2018-08-06 23:16:50 +00:00
Matt Arsenault 08f3fe4fae AMDGPU: cvt_pk_rtz_f16 canonicalizes
llvm-svn: 339078
2018-08-06 23:01:31 +00:00
Matt Arsenault e94ee833f9 AMDGPU: Handle some vector operations in isCanonicalized
llvm-svn: 339077
2018-08-06 22:45:51 +00:00
Matt Arsenault a29e76244a AMDGPU: Push fcanonicalize through partially constant build_vector
This usually avoids some re-packing code, and may
help find canonical sources.

llvm-svn: 339072
2018-08-06 22:30:44 +00:00
Matt Arsenault f2a167fb1d AMDGPU: Refactor fcanonicalize combine
This will make more complex combines easier.

llvm-svn: 339070
2018-08-06 22:10:26 +00:00
Matt Arsenault d49ab0b214 AMDGPU: Treat more custom operations as canonicalizing
Everything should quiet, and I think everything should
flush.

I assume the min3/med3/max3 follow the same rules
as regular min/max for flushing, which should at
least be conservatively correct.

There are still more operations that need to
be handled.

llvm-svn: 339065
2018-08-06 21:58:11 +00:00
Matt Arsenault ce6d61fba8 AMDGPU: Conversions always produce canonical results
Not sure why this was checking for denormals for f16.
My interpretation of the IEEE standard is conversions
should produce a canonical result, and the ISA manual
says denormals are created when appropriate.

llvm-svn: 339064
2018-08-06 21:51:52 +00:00
Matt Arsenault f8768bfc84 AMDGPU: Fix implementation of isCanonicalized
If denormals are enabled, denormals are canonical.
Also fix a few other issues. minnum/maxnum are supposed
to canonicalize. Temporarily improve workaround for the
instruction behavior change in gfx9.

Handle selects and fcopysign.

The tests were also largely broken, since they were
checking for a flush used on some targets after the
store of the result.

llvm-svn: 339061
2018-08-06 21:38:27 +00:00
Easwaran Raman 10fd92dd94 [X86] Recognize a splat of negate in isFNEG
Summary:
Expand isFNEG so that we generate the appropriate F(N)M(ADD|SUB)
instructions in more cases. For example, the following sequence
a = _mm256_broadcast_ss(f)
d = _mm256_fnmadd_ps(a, b, c)

generates an fsub and fma without this patch and an fnma with this
change.

Reviewers: craig.topper

Subscribers: llvm-commits, davidxl, wmi

Differential Revision: https://reviews.llvm.org/D48467

llvm-svn: 339043
2018-08-06 19:23:38 +00:00
Craig Topper 0076477a4c [X86] When using "and $0" and "orl $-1" to store 0 and -1 for minsize, make sure the store isn't volatile
If the store is volatile this might be a memory mapped IO access. In that case we shouldn't generate a load that didn't exist in the source

Differential Revision: https://reviews.llvm.org/D50270

llvm-svn: 339041
2018-08-06 18:44:26 +00:00
Matt Arsenault 0d1b3934e2 AMDGPU: Fold v_lshl_or_b32 with 0 src0
Appears from expansion of some packed cases.

llvm-svn: 339025
2018-08-06 15:40:20 +00:00
Bryan Chan e023706471 [AArch64] Fix assertion failure on widened f16 BUILD_VECTOR
Summary:
Ensure that NormalizedBuildVector returns a BUILD_VECTOR with operands of the
same type. This fixes an assertion failure in VerifySDNode.

Reviewers: SjoerdMeijer, t.p.northover, javed.absar

Reviewed By: SjoerdMeijer

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D50202

llvm-svn: 339013
2018-08-06 14:14:41 +00:00
Tim Northover 9956e4a24b ARM-MachO: don't add Thumb bit for addend to non-external relocation.
ld64 supplies its own Thumb bit for Thumb functions, and intentionally zeroes
out that part of any addend in an object file. But it only does that for
symbols marked N_EXT -- i.e. external symbols. So LLVM should avoid setting
that extra bit in other cases.

llvm-svn: 339007
2018-08-06 11:32:44 +00:00
David Bolvansky c0aa4b75a4 Enrich inline messages
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.

Patch by: yrouban (Yevgeny Rouban)


Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00

Reviewed By: tejohnson, xbolva00

Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith

Differential Revision: https://reviews.llvm.org/D49412

llvm-svn: 338969
2018-08-05 14:53:08 +00:00
Craig Topper 3c869cb5e5 [X86] Add isel patterns for atomic_load+sub+atomic_sub.
Despite the comment removed in this patch, this is beneficial when the RHS of the sub is a register.

llvm-svn: 338930
2018-08-03 22:08:30 +00:00
Craig Topper d7391eefdf [X86] Remove RELEASE_ and ACQUIRE_ pseudo instructions. Use isel patterns and the normal instructions instead
At one point in time acquire implied mayLoad and mayStore as did release. Thus we needed separate pseudos that also carried that property. This appears to no longer be the case. I believe it was changed in 2012 with a comment saying that atomic memory accesses are marked volatile which preserves the ordering.

So from what I can tell we shouldn't need additional pseudos since they aren't carry any flags that are different from the normal instructions. The only thing I can think of is that we may consider them for load folding candidates in the peephole pass now where we didn't before. If that's important hopefully there's something in the memory operand we can check to prevent the folding without relying on pseudo instructions.

Differential Revision: https://reviews.llvm.org/D50212

llvm-svn: 338925
2018-08-03 21:40:44 +00:00
Matt Arsenault c3dc8e65e2 DAG: Enhance isKnownNeverNaN
Add a parameter for testing specifically for
sNaNs - at least one instruction pattern on AMDGPU
needs to check specifically for this.

Also handle more cases, and add a target hook
for custom nodes, similar to the hooks for known
bits.

llvm-svn: 338910
2018-08-03 18:27:52 +00:00
Artem Belevich 0a11b6366a [NVPTX] Handle __nvvm_reflect("__CUDA_ARCH").
Summary:
libdevice in recent CUDA versions relies on __nvvm_reflect() to select
GPU-specific bitcode. This patch addresses the requirement.

Reviewers: jlebar

Subscribers: jholewinski, sanjoy, hiraditya, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D50207

llvm-svn: 338908
2018-08-03 18:05:24 +00:00
Craig Topper feb2a58860 [X86] Add a DAG combine for the __builtin_parity idiom used by clang to enable better codegen
Clang uses "ctpop & 1" to implement __builtin_parity. If the popcnt instruction isn't supported this generates a large amount of code to calculate the population count. Instead we can bisect the data down to a single byte using xor and then check the parity flag.

Even when popcnt is supported, its still a good idea to split 64-bit data on 32-bit targets using an xor in front of a single popcnt. Otherwise we get two popcnts and an add before the and.

I've specifically targeted this at the sizes supported by clang builtins, but we could generalize this if we think that's useful.

Differential Revision: https://reviews.llvm.org/D50165

llvm-svn: 338907
2018-08-03 18:00:29 +00:00
Nicholas Wilson e408a89a3a [WebAssembly] Cleanup of the way globals and global flags are handled
Differential Revision: https://reviews.llvm.org/D44030

llvm-svn: 338894
2018-08-03 14:33:37 +00:00
Jonas Paulsson f107b7275c [SystemZ] Improve handling of instructions which expand to several groups
Some instructions expand to more than one decoder group.

This has been hitherto ignored, but is handled with this patch.

Review: Ulrich Weigand
https://reviews.llvm.org/D50187

llvm-svn: 338849
2018-08-03 10:43:05 +00:00
Sjoerd Meijer d62c5ec2fe [ARM] FP16: support vector zip and unzip
This is addressing PR38404.

Differential Revision: https://reviews.llvm.org/D50186

llvm-svn: 338835
2018-08-03 09:24:29 +00:00
Sjoerd Meijer 9b30213828 [ARM] FP16: support VFMA
This is addressing PR38404.

llvm-svn: 338830
2018-08-03 09:12:56 +00:00
Craig Topper a7a12399a1 [X86] Remove all the vector NOP bitcast patterns. Use a few lines of code in the Select method in X86ISelDAGToDAG.cpp instead.
There are a lot of permutations of types here generating a lot of patterns in the isel table. It's more efficient to just ReplaceUses and RemoveDeadNode from the Select function.

The test changes are because we have a some shuffle patterns that have a bitcast as their root node. But the behavior is identical to another instruction whose pattern doesn't start with a bitcast. So this isn't a functional change.

llvm-svn: 338824
2018-08-03 07:01:10 +00:00
Craig Topper e902b7d0b0 [X86] Support fp128 and/or/xor/load/store with VEX and EVEX encoded instructions.
Move all the patterns to X86InstrVecCompiler.td so we can keep SSE/AVX/AVX512 all in one place.

To save some patterns we'll use an existing DAG combine to convert f128 fand/for/fxor to integer when sse2 is enabled. This allows use to reuse all the existing patterns for v2i64.

I believe this now makes SHA instructions the only case where VEX/EVEX and legacy encoded instructions could be generated simultaneously.

llvm-svn: 338821
2018-08-03 06:12:56 +00:00
Craig Topper a80352c04e [X86] When post-processing the DAG to remove zero extending moves for YMM/ZMM, make sure the producing instruction is VEX/XOP/EVEX encoded.
If the producing instruction is legacy encoded it doesn't implicitly zero the upper bits. This is important for the SHA instructions which don't have a VEX encoded version. We might also be able to hit this with the incomplete f128 support that hasn't been ported to VEX.

llvm-svn: 338812
2018-08-03 04:49:42 +00:00
Craig Topper b2cc9a1d44 [X86] Add R13D to the isInefficientLEAReg in FixupLEAs.
I'm assuming the R13 restriction extends to R13D. Guessing this restriction is related to the funny encoding of this register as base always requiring a displacement to be encoded.

llvm-svn: 338806
2018-08-03 03:45:19 +00:00
Craig Topper 2c095444a4 [X86] Prevent promotion of i16 add/sub/and/or/xor to i32 if we can fold an atomic load and atomic store.
This makes them consistent with i8/i32/i64. Which still seems to be more aggressive on folding than icc, gcc, or MSVC.

llvm-svn: 338795
2018-08-03 00:37:34 +00:00
Tim Renouf 366a49d986 [AMDGPU] Minor change to d16 buffer load implementation
Summary:
By not reconstructing the operand list of the SDNode, this change makes
it easier to add the forthcoming new tbuffer and buffer intrinsics.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49995

Change-Id: I0cb79ef0801532645d7dd954a6d7355139db7b38
llvm-svn: 338784
2018-08-02 23:33:01 +00:00
Tim Renouf abd85fb1f5 [AMDGPU] Reworked SIFixWWMLiveness
Summary:
I encountered some problems with SIFixWWMLiveness when WWM is in a loop:

1. It sometimes gave invalid MIR where there is some control flow path
   to the new implicit use of a register on EXIT_WWM that does not pass
   through any def.

2. There were lots of false positives of registers that needed to have
   an implicit use added to EXIT_WWM.

3. Adding an implicit use to EXIT_WWM (and adding an implicit def just
   before the WWM code, which I tried in order to fix (1)) caused lots
   of the values to be spilled and reloaded unnecessarily.

This commit is a rework of SIFixWWMLiveness, with the following changes:

1. Instead of considering any register with a def that can reach the WWM
   code and a def that can be reached from the WWM code, it now
   considers three specific cases that need to be handled.

2. A register that needs liveness over WWM to be synthesized now has it
   done by adding itself as an implicit use to defs other than the
   dominant one.

Also added the following fixmes:

FIXME: We should detect whether a register in one of the above
categories is already live at the WWM code before deciding to add the
implicit uses to synthesize its liveness.

FIXME: I believe this whole scheme may be flawed due to the possibility
of the register allocator doing live interval splitting.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46756

Change-Id: Ie7fba0ede0378849181df3f1a9a7a39ed1a94a94
llvm-svn: 338783
2018-08-02 23:31:32 +00:00
Craig Topper 63873db5c4 [X86] Allow 'atomic_store (neg/not atomic_load)' to isel to a RMW instruction.
There was a FIXMe in the td file about a type inference issue that was easy to fix.

llvm-svn: 338782
2018-08-02 23:30:38 +00:00
Tim Renouf f1c7b92a6a [AMDGPU] Avoid using divergent value in mubuf addr64 descriptor
Summary:
This fixes a problem where a load from global+idx generated incorrect
code on <=gfx7 when the index is divergent.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47383

Change-Id: Ib4d177d6254b1dd3f8ec0203fdddec94bd8bc5ed
llvm-svn: 338779
2018-08-02 22:53:57 +00:00
Krzysztof Parzyszek d91a9e27a9 [Hexagon] Simplify CFG after atomic expansion
This will remove suboptimal branching from the generated ll/sc loops.
The extra simplification pass affects a lot of testcases, which have
been modified to accommodate this change: either by modifying the
test to become immune to the CFG simplification, or (less preferablt)
by adding option -hexagon-initial-cfg-clenaup=0.

llvm-svn: 338774
2018-08-02 22:17:53 +00:00
Heejin Ahn 4128cb0b6b [WebAssembly] Support for atomic.wait / atomic.wake instructions
Summary:
This adds support for atomic.wait / atomic.wake instructions in the wasm
thread proposal.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49395

llvm-svn: 338770
2018-08-02 21:44:24 +00:00
Sam Clegg 41d7047de5 [WebAssembly] Ensure bitcasts that would result in invalid wasm are removed by FixFunctionBitcasts
Rather than allowing invalid bitcasts to be lowered to wasm
call instructions that won't validate, generate wrappers that
contain unreachable thereby delaying the error until runtime.

Differential Revision: https://reviews.llvm.org/D49517

llvm-svn: 338744
2018-08-02 17:38:06 +00:00
Craig Topper 0423881820 [X86] Allow fake unary unpckhpd and movhlps to be commuted for execution domain fixing purposes
These instructions perform the same operation, but the semantic of which operand is destroyed is reversed. If the same register is used as both operands we can change the execution domain without worrying about this difference.

Unfortunately, this really only works in cases where the input register is killed by the instruction. If its not killed, the two address isntruction pass inserts a copy that will become a move instruction. This makes the instruction use different physical registers that contain the same data at the time the unpck/movhlps executes. I've considered using a unary pseudo instruction with tied operand to trick the two address instruction pass. We could then expand the pseudo post regalloc to get the same physical register on both inputs.

Differential Revision: https://reviews.llvm.org/D50157

llvm-svn: 338735
2018-08-02 16:48:01 +00:00
Matt Arsenault 36cdcfadcf AMDGPU: Fix scalarizing v4f16 fcanonicalize
llvm-svn: 338714
2018-08-02 13:43:42 +00:00
Simon Pilgrim 8b16e15d47 [X86][SSE] Pull out duplicate VSELECT to shuffle mask code. NFCI.
llvm-svn: 338693
2018-08-02 09:20:27 +00:00
Alexander Ivchenko 48eba54f18 [GlobalISel] Fix typo with missed override specifier
llvm-svn: 338689
2018-08-02 08:55:05 +00:00
Alexander Ivchenko 49168f6778 [GlobalISel] Rewrite CallLowering::lowerReturn to accept multiple VRegs per Value
This is logical continuation of https://reviews.llvm.org/D46018 (r332449)

Differential Revision: https://reviews.llvm.org/D49660

llvm-svn: 338685
2018-08-02 08:33:31 +00:00
David Green ea60446c6d [AArch64] Add support for got relocated LDR's
As a part of adding the tiny codemodel, we need to support ldr's with :got:
relocations on them. This seems to be mostly already done, just needs the
relocation type support.

Differential Revision: https://reviews.llvm.org/D50137

llvm-svn: 338673
2018-08-02 06:24:40 +00:00
Kito Cheng dffce953bf Test commit.
llvm-svn: 338672
2018-08-02 05:38:18 +00:00
Nemanja Ivanovic e1a525ed06 [PowerPC] Do not round values prior to converting to integer
Adding the FP_ROUND nodes when combining FP_TO_[SU]INT of elements
feeding a BUILD_VECTOR into an FP_TO_[SU]INT of the built vector
loses precision. This patch removes the code that adds these nodes
to true f64 operands. It also adds patterns required to ensure
the code is still vectorized rather than converting individual
elements and inserting into a vector.

Fixes https://bugs.llvm.org/show_bug.cgi?id=38342

Differential Revision: https://reviews.llvm.org/D50121

llvm-svn: 338658
2018-08-02 00:03:22 +00:00
Lei Liu 8e422b8403 [AArch64] DWARF: do not generate AT_location for thread local
AArch64 ELF ABI does not define a static relocation type for TLS offset within
a module, which makes it impossible for compiler to generate a valid
DW_AT_location content for thread local variables. Currently LLVM generates an
invalid R_AARCH64_ABS64 relocation at the DW_AT_location field for a TLS
variable. That causes trouble for linker because thread local variable does
not have an absolute address at link time. AArch64 GCC solves the problem by
not generating DW_AT_location for thread local variables. We should do the
same in LLVM.

Differential Revision: https://reviews.llvm.org/D43860

llvm-svn: 338655
2018-08-01 23:46:49 +00:00
Reid Kleckner a30a6d2c29 Load from the GOT for external symbols in the large, PIC code model
Do the same handling for external symbols that we do for jump table
symbols and global values.

Fixes one of the cases in PR38385

llvm-svn: 338651
2018-08-01 22:56:05 +00:00
Matt Arsenault a7dfd48310 AMDGPU: Use SPseudoInst helper
llvm-svn: 338631
2018-08-01 20:49:00 +00:00
Matt Arsenault 709374d186 AMDGPU: Improve hack for packing conversion ops
Mutate the node type during selection when it
doesn't matter. This avoids an intermediate bitcast
node on targets with legal i16/f16.

Also fixes missing output modifiers on v_cvt_pkrtz_f32_f16,
which I assume are OK.

llvm-svn: 338619
2018-08-01 20:13:58 +00:00
Matt Arsenault 55ab9213d3 AMDGPU: Partially fix handling of packed amdgpu_ps arguments
Fixes annoying limitations when writing tests.
Also remove more leftover code for manually scalarizing arguments
and return values.

llvm-svn: 338618
2018-08-01 19:57:34 +00:00
Heejin Ahn b3724b7169 [WebAssembly] Support for a ternary atomic RMW instruction
Summary: This adds support for a ternary atomic RMW instruction: cmpxchg.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49195

llvm-svn: 338617
2018-08-01 19:40:28 +00:00
Craig Topper c985d42903 [X86] Canonicalize the pattern for __builtin_ffs in a similar way to '__builtin_ffs + 5'
We now emit a move of -1 before the cmov and do the addition after the cmov just like the case with an extra addition.

This may be slightly worse for code size, but is more consistent with other compilers. And we might be able to hoist the mov -1 outside of loops.

llvm-svn: 338613
2018-08-01 18:38:46 +00:00
Jan Vesely 93b252799b AMDGPU/R600: Convert kernel param loads to use PARAM_I_ADDRESS
Non ext aligned i32 loads are still optimized to use CONSTANT_BUFFER (AS 8)

llvm-svn: 338610
2018-08-01 18:36:07 +00:00
Vlad Tsyrklevich ab016e00ec [X86] FastISel fall back on !absolute_symbol GVs
Summary:
D25878, which added support for !absolute_symbol for normal X86 ISel,
did not add support for materializing references to absolute symbols for
X86 FastISel. This causes build failures because FastISel generates
PC-relative relocations for absolute symbols. Fall back to normal ISel
for references to !absolute_symbol GVs. Fix for PR38200.

Reviewers: pcc, craig.topper

Reviewed By: pcc

Subscribers: hiraditya, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D50116

llvm-svn: 338599
2018-08-01 17:44:37 +00:00
Simon Pilgrim 931ebe3be1 [X86] Assign from a brace initializer to match style guide. NFCI.
llvm-svn: 338598
2018-08-01 17:43:38 +00:00
Simon Pilgrim a3548c960e [SelectionDAG] Make binop reduction matcher available to all targets
There is nothing x86-specific about this code, so it'd be nice to make this available for other targets to use in the future (and get it out of X86ISelLowering!).

Differential Revision: https://reviews.llvm.org/D50083

llvm-svn: 338586
2018-08-01 16:52:28 +00:00
Jan Vesely 5ba1b4bdab AMDGPU: Allow fp32-denormals feature for r600 targets
This was accidentally removed in r335942.

Differential Revision: https://reviews.llvm.org/D49934

llvm-svn: 338569
2018-08-01 15:04:36 +00:00
Bryan Chan 67106b5e08 [AArch64] Fix FCCMP with FP16 operands
Summary: This patch adds support for FCCMP instruction with FP16 operands, avoiding an assertion during instruction selection.

Reviewers: olista01, SjoerdMeijer, t.p.northover, javed.absar

Reviewed By: SjoerdMeijer

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D50115

llvm-svn: 338554
2018-08-01 13:50:29 +00:00
Simon Pilgrim 564762cf32 [X86] Use isNullConstant helper. NFCI.
llvm-svn: 338530
2018-08-01 13:06:14 +00:00
Ryan Taylor 894c8fd0e2 [AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero
Summary:
Add _L to _LZ image intrinsic table mapping to table gen.
In ISelLowering check if image intrinsic has lod and if it's equal
to zero, if so remove lod and change opcode to equivalent mapped _LZ.

Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71

Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49483

llvm-svn: 338523
2018-08-01 12:12:01 +00:00
Ulrich Weigand 58a9786e81 [SystemZ, TableGen] Fix shift count handling
The DAG combiner logic to simplify AND masks in shift counts is invalid.
While it is true that the SystemZ shift instructions ignore all but the
low 6 bits of the shift count, it is still invalid to simplify the AND
masks while the DAG still uses the standard shift operators (which are
*not* defined to match the SystemZ instruction behavior).

Instead, this patch performs equivalent operations during instruction
selection. For completely removing the AND, this now happens via
additional DAG match patterns implemented by a multi-alternative
PatFrags. For simplifying a 32-bit AND to a 16-bit AND, the existing DAG
patterns were already mostly OK, they just needed an output XForm to
actually truncate the immediate value.

Unfortunately, the latter change also exposed a bug in TableGen: it
seems XForms are currently only handled correctly for direct operands of
the outermost operation node. This patch also fixes that bug by simply
recurring through the whole pattern. This should be NFC for all other
targets.

Differential Revision: https://reviews.llvm.org/D50096

llvm-svn: 338521
2018-08-01 11:57:58 +00:00
Simon Pilgrim e447a273bd [X86] Use isNullConstant helper. NFCI.
llvm-svn: 338516
2018-08-01 11:24:11 +00:00
Andrew V. Tischenko dad919d357 [X86] Improved sched models for X86 BT*rr instructions.
Differential Revision: https://reviews.llvm.org/D49243

llvm-svn: 338507
2018-08-01 10:24:27 +00:00
Petar Jovanovic 64c10ba8e2 [MIPS GlobalISel] Select global address
Select G_GLOBAL_VALUE for position dependent code.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D49803

llvm-svn: 338499
2018-08-01 09:03:23 +00:00
David Bolvansky fbbb83c782 Revert "Enrich inline messages", tests fail
llvm-svn: 338496
2018-08-01 08:02:40 +00:00
David Bolvansky 7f36cd9d96 Enrich inline messages
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.

Patch by: yrouban (Yevgeny Rouban)


Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00

Reviewed By: tejohnson, xbolva00

Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith

Differential Revision: https://reviews.llvm.org/D49412

llvm-svn: 338494
2018-08-01 07:37:16 +00:00
Martin Storsjo d4590c38ab [AArch64] Disallow the MachO specific .loh directive for windows
Also add a test for it being unsupported for linux.

Differential Revision: https://reviews.llvm.org/D49929

llvm-svn: 338493
2018-08-01 06:50:18 +00:00
Craig Topper 65a1388881 [X86] When looking for (CMOV C-1, (ADD (CTTZ X), C), (X != 0)) -> (ADD (CMOV (CTTZ X), -1, (X != 0)), C), make sure we really have a compare with 0.
It's not strictly required by the transform of the cmov and the add, but it makes sure we restrict it to the cases we know we want to match.

While there canonicalize the operand order of the cmov to simplify the matching and emitting code.

llvm-svn: 338492
2018-08-01 06:36:20 +00:00
Chandler Carruth 2ce191e220 [x86] Fix a really subtle miscompile due to a somewhat glaring bug in
EFLAGS copy lowering.

If you have a branch of LLVM, you may want to cherrypick this. It is
extremely unlikely to hit this case empirically, but it will likely
manifest as an "impossible" branch being taken somewhere, and will be
... very hard to debug.

Hitting this requires complex conditions living across complex control
flow combined with some interesting memory (non-stack) initialized with
the results of a comparison. Also, because you have to arrange for an
EFLAGS copy to be in *just* the right place, almost anything you do to
the code will hide the bug. I was unable to reduce anything remotely
resembling a "good" test case from the place where I hit it, and so
instead I have constructed synthetic MIR testing that directly exercises
the bug in question (as well as the good behavior for completeness).

The issue is that we would mistakenly assume any SETcc with a valid
condition and an initial operand that was a register and a virtual
register at that to be a register *defining* SETcc...

It isn't though....

This would in turn cause us to test some other bizarre register,
typically the base pointer of some memory. Now, testing this register
and using that to branch on doesn't make any sense. It even fails the
machine verifier (if you are running it) due to the wrong register
class. But it will make it through LLVM, assemble, and it *looks*
fine... But wow do you get a very unsual and surprising branch taken in
your actual code.

The fix is to actually check what kind of SETcc instruction we're
dealing with. Because there are a bunch of them, I just test the
may-store bit in the instruction. I've also added an assert for sanity
that ensure we are, in fact, *defining* the register operand. =D

llvm-svn: 338481
2018-08-01 03:01:58 +00:00
Konstantin Zhuravlyov bb30ef7af4 AMDGPU: Add clamp bit to dot intrinsics
Differential Revision: https://reviews.llvm.org/D49874

llvm-svn: 338470
2018-08-01 01:31:30 +00:00
Reid Kleckner b32ff46ff7 Revert r338354 "[ARM] Revert r337821"
Disable ARMCodeGenPrepare by default again. It is causing verifier
failues in V8 that look like:

  Duplicate integer as switch case
  switch i32 %trunc, label %if.end13 [
    i32 0, label %cleanup36
    i32 0, label %if.then8
  ], !dbg !4981
  i32 0
  fatal error: error in backend: Broken function found, compilation aborted!

I will continue reducing the test case and send it along.

llvm-svn: 338452
2018-07-31 23:09:42 +00:00
Jonas Paulsson 590b1fc881 [SystemZ] Fix bad assert composition.
Use '&&' before the string instead of '||'

llvm-svn: 338429
2018-07-31 19:58:42 +00:00
Matt Arsenault feedabfde7 AMDGPU: Break 64-bit arguments into 32-bit pieces
llvm-svn: 338421
2018-07-31 19:29:04 +00:00
Matt Arsenault 0395da7842 AMDGPU: Split wide vectors of i16/f16 into 32-bit regs on calls
This improves code for the same reasons as scalarizing 32-bit
element vectors.

llvm-svn: 338418
2018-07-31 19:17:47 +00:00
Matt Arsenault 9ced1e0d80 AMDGPU: Scalarize vector argument types to calls
When lowering calling conventions, prefer to decompose vectors
into the constitute register types. This avoids artifical constraints
to satisfy a wide super-register.

This improves code quality because now optimizations don't need to
deal with the super-register constraint. For example the immediate
folding code doesn't deal with 4 component reg_sequences, so by
breaking the register down earlier the existing immediate folding
code is able to work.

This also avoids the need for the shader input processing code
to manually split vector types.

llvm-svn: 338416
2018-07-31 19:05:14 +00:00
Simon Pilgrim 67caf04d3a [X86] WriteBSWAP sched classes are reg-reg only.
Don't declare them as X86SchedWritePair when the folded class will never be used.

Note: MOVBE (load/store endian conversion) instructions tend to have a very different behaviour to BSWAP.
llvm-svn: 338412
2018-07-31 18:24:24 +00:00
Simon Pilgrim 5d9b00d15b [X86][SSE] Use ISD::MULHU for constant/non-zero ISD::SRL lowering (PR38151)
As was done for vector rotations, we can efficiently use ISD::MULHU for vXi8/vXi16 ISD::SRL lowering.

Shift-by-zero cases are still problematic (mainly on v32i8 due to extra AND/ANDN/OR or VPBLENDVB blend masks but v8i16/v16i16 aren't great either if PBLENDW fails) so I've limited this first patch to known non-zero cases if we can't easily use PBLENDW.

Differential Revision: https://reviews.llvm.org/D49562

llvm-svn: 338407
2018-07-31 18:05:56 +00:00
Craig Topper bef126fb71 [X86] Add pattern matching for PMADDUBSW
Summary:
Similar to D49636, but for PMADDUBSW. This instruction has the additional complexity that the addition of the two products saturates to 16-bits rather than wrapping around. And one operand is treated as signed and the other as unsigned.

A C example that triggers this pattern

```
static const int N = 128;

int8_t A[2*N];
uint8_t B[2*N];
int16_t C[N];

void foo() {
  for (int i = 0; i != N; ++i)
    C[i] = MIN(MAX((int16_t)A[2*i]*(int16_t)B[2*i] + (int16_t)A[2*i+1]*(int16_t)B[2*i+1], -32768), 32767);
}
```

Reviewers: RKSimon, spatel, zvi

Reviewed By: RKSimon, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49829

llvm-svn: 338402
2018-07-31 17:12:08 +00:00
Francis Visoiu Mistrih ae8002c1cf [X86] Preserve more liveness information in emitStackProbeInline
This commit fixes two issues with the liveness information after the
call:

1) The code always spills RCX and RDX if InProlog == true, which results
in an use of undefined phys reg.
2) FinalReg, JoinReg, RoundedReg, SizeReg are not added as live-ins to
the basic blocks that use them, therefore they are seen undefined.

https://llvm.org/PR38376

Differential Revision: https://reviews.llvm.org/D50020

llvm-svn: 338400
2018-07-31 16:41:12 +00:00
David Bolvansky ab79414f7b Revert Enrich inline messages
llvm-svn: 338389
2018-07-31 14:47:22 +00:00
David Bolvansky b562dbabda Enrich inline messages
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.

Patch by: yrouban (Yevgeny Rouban)


Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00

Reviewed By: tejohnson, xbolva00

Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith

Differential Revision: https://reviews.llvm.org/D49412

llvm-svn: 338387
2018-07-31 14:25:24 +00:00
Matt Arsenault 638a202760 AMDGPU: Don't handle FP16_TO_FP in isCanonicalized
This needs more special handling to do correctly.
Fixes test in subsequent commit.

llvm-svn: 338381
2018-07-31 14:15:16 +00:00
Matt Arsenault 4aec86d37a AMDGPU: Fold undef fcanonicalize to qNaN
We could choose a free 0 for this, but this
matches the behavior for fmul undef, 1.0. Also,
the NaN use is more useful for folding use operations
although if it's not eliminated it is more expensive
in terms of code size.

llvm-svn: 338376
2018-07-31 13:34:31 +00:00
Andrea Di Biagio a1852b6194 [llvm-mca][BtVer2] Teach how to identify dependency-breaking idioms.
This patch teaches llvm-mca how to identify dependency breaking instructions on
btver2.

An example of dependency breaking instructions is the zero-idiom XOR (example:
`XOR %eax, %eax`), which always generates zero regardless of the actual value of
the input register operands.
Dependency breaking instructions don't have to wait on their input register
operands before executing. This is because the computation is not dependent on
the inputs.

Not all dependency breaking idioms are also zero-latency instructions. For
example, `CMPEQ %xmm1, %xmm1` is independent on
the value of XMM1, and it generates a vector of all-ones.
That instruction is not eliminated at register renaming stage, and its opcode is
issued to a pipeline for execution. So, the latency is not zero. 

This patch adds a new method named isDependencyBreaking() to the MCInstrAnalysis
interface. That method takes as input an instruction (i.e. MCInst) and a
MCSubtargetInfo.
The default implementation of isDependencyBreaking() conservatively returns
false for all instructions. Targets may override the default behavior for
specific CPUs, and return a value which better matches the subtarget behavior.

In future, we should teach to Tablegen how to automatically generate the body of
isDependencyBreaking from scheduling predicate definitions. This would allow us
to expose the knowledge about dependency breaking instructions to the machine
schedulers (and, potentially, other codegen passes).

Differential Revision: https://reviews.llvm.org/D49310

llvm-svn: 338372
2018-07-31 13:21:43 +00:00
Simon Pilgrim 0aa2867545 Revert r338365: [X86] Improved sched models for X86 BT*rr instructions.
https://reviews.llvm.org/D49243

Contains WIP code that should not have been included.

llvm-svn: 338369
2018-07-31 13:00:51 +00:00
Jonas Paulsson 2f12e45d5a [SystemZ] Improve decoding in case of instructions with four register operands.
Since z13, the max group size will be 2 if any μop has more than 3 register
sources.

This has been ignored sofar in the SystemZHazardRecognizer, but is now
handled by recognizing those instructions and adjusting the tracking of
decoding and the cost heuristic for grouping.

Review: Ulrich Weigand
https://reviews.llvm.org/D49847

llvm-svn: 338368
2018-07-31 13:00:42 +00:00
Andrew V. Tischenko e6f5ace81a [X86] Improved sched models for X86 BT*rr instructions.
https://reviews.llvm.org/D49243

llvm-svn: 338365
2018-07-31 12:33:48 +00:00
Andrew V. Tischenko e564055671 [X86] Improved sched models for X86 SHLD/SHRD* instructions.
Differential Revision: https://reviews.llvm.org/D9611

llvm-svn: 338359
2018-07-31 10:14:43 +00:00
Simon Pilgrim 99d475f97d [X86][SSE] isFNEG - Use getTargetConstantBitsFromNode to handle all constant cases
isFNEG was duplicating much of what was done by getTargetConstantBitsFromNode in its own calls to getTargetConstantFromNode.

Noticed while reviewing D48467.

llvm-svn: 338358
2018-07-31 10:13:17 +00:00
Martin Storsjo 293079f2de [ARM] Allow automatically deducing the thumb instruction size for .inst
This matches GAS, that allows unsuffixed .inst for thumb.

Differential Revision: https://reviews.llvm.org/D49937

llvm-svn: 338357
2018-07-31 09:27:07 +00:00
Martin Storsjo af18947f0a [ARM] Support the .inst directive for MachO and COFF targets
Contrary to ELF, we don't add any markers that distinguish data generated
with .short/.long from normal instructions, so the .inst directive only
adds compatibility with assembly that uses it.

Differential Revision: https://reviews.llvm.org/D49936

llvm-svn: 338356
2018-07-31 09:27:01 +00:00
Martin Storsjo 3e3d39d07e [AArch64] Support the .inst directive for MachO and COFF targets
Contrary to ELF, we don't add any markers that distinguish data generated
with .long from normal instructions, so the .inst directive only adds
compatibility with assembly that uses it.

Differential Revision: https://reviews.llvm.org/D49935

llvm-svn: 338355
2018-07-31 09:26:52 +00:00
Sam Parker 2a6c842fda [ARM] Revert r337821
Re-enabling ARMCodeGenPrepare by default after failing to reproduce
the bootstrap issues that I was concerned it was causing.

llvm-svn: 338354
2018-07-31 09:04:14 +00:00
Craig Topper 9164b9b16e [X86] Stop accidentally running the Bonnell LEA fixup path on Goldmont.
In one place we checked X86Subtarget.slowLEA() to decide if the pass should run. But to decide what the pass should we only check isSLM. This resulted in Goldmont going down the Bonnell path.

llvm-svn: 338342
2018-07-31 00:43:54 +00:00
Amara Emerson 1e8c164c63 [AArch64][GlobalISel] Add isel support for G_BLOCK_ADDR.
Also refactors some existing code to materialize addresses for the large code
model so it can be shared between G_GLOBAL_VALUE and G_BLOCK_ADDR.

This implements PR36390.

Differential Revision: https://reviews.llvm.org/D49903

llvm-svn: 338337
2018-07-31 00:09:02 +00:00
Amara Emerson 0e86c07077 [AArch64][GlobalISel] Make G_BLOCK_ADDR legal.
Differential Revision: https://reviews.llvm.org/D49902

llvm-svn: 338336
2018-07-31 00:08:56 +00:00
Craig Topper 2f60ef2c78 [DAGCombiner][TargetLowering] Pass a SmallVector instead of a std::vector to BuildSDIV/BuildUDIV/etc.
The vector contains the SDNodes that these functions create. The number of nodes is always a small number so we should use SmallVector to avoid a heap allocation.

llvm-svn: 338329
2018-07-30 23:22:00 +00:00
Craig Topper a568a27dfa [DAGCombiner][PowerPC][AArch64] Pass Created vector by reference to BuildSDIVPow2.
llvm-svn: 338303
2018-07-30 21:04:34 +00:00
Fangrui Song f78650a8de Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

llvm-svn: 338293
2018-07-30 19:41:25 +00:00
Jessica Paquette fa3bee4756 [MachineOutliner][AArch64] Add support for saving LR to a register
This teaches the outliner to save LR to a register rather than the stack when
possible. This allows us to avoid bumping the stack in outlined functions in
some cases. By doing this, in a later patch, we can teach the outliner to do
something like this:

f1:
  ...
  bl OUTLINED_FUNCTION
  ...

f2:
  ...
  move LR's contents to a register
  bl OUTLINED_FUNCTION
  move the register's contents back

instead of falling back to saving LR in both cases.

llvm-svn: 338278
2018-07-30 17:45:28 +00:00
Craig Topper f014ec9b3b [X86] Fix typo in comment. NFC
llvm-svn: 338274
2018-07-30 17:34:31 +00:00
Craig Topper dd0ef801f8 Recommit r338204 "[X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'."
This checks in a more direct way without triggering a UBSAN error.

llvm-svn: 338273
2018-07-30 17:29:57 +00:00
Thomas Preud'homme 6c1b075299 Fix uninitialized read in ARM's PrintAsmOperand
Summary:
Fix read of uninitialized RC variable in ARM's PrintAsmOperand when
hasRegClassConstraint returns false. This was causing
inline-asm-operand-implicit-cast test to fail in r338206.

Reviewers: t.p.northover, weimingz, javed.absar, chill

Reviewed By: chill

Subscribers: chill, eraman, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D49984

llvm-svn: 338268
2018-07-30 16:45:40 +00:00
Sander de Smalen e64206a02c [AArch64][SVE] Asm: Enable instructions to be prefixed.
This patch enables instructions that are destructive on their
destination- and first source operand, to be prefixed with a
MOVPRFX instruction.

This patch also adds a variety of tests:

- positive tests for all instructions and forms that accept a
  movprfx for either or both predicated and unpredicated forms.

- negative tests for all instructions and forms that do not accept
  an unpredicated or predicated movprfx.

- negative tests for the diagnostics that get emitted when a MOVPRFX
  instruction is used incorrectly.

This is patch [2/2] in a series to add MOVPRFX instructions:
- Patch [1/2]: https://reviews.llvm.org/D49592
- Patch [2/2]: https://reviews.llvm.org/D49593

Reviewers: rengolin, SjoerdMeijer, samparker, fhahn, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D49593

llvm-svn: 338261
2018-07-30 16:05:45 +00:00
Sander de Smalen 9b33309c87 [AArch64][SVE] Asm: Add MOVPRFX instructions.
This patch adds predicated and unpredicated MOVPRFX instructions, which
can be prepended to SVE instructions that are destructive on their first
source operand, to make them a constructive operation, e.g.

  add z1.s, p0/m, z1.s, z2.s        <=> z1 = z1 + z2

can be made constructive:

  movprfx z0, z1
  add z0.s, p0/m, z0.s, z2.s        <=> z0 = z1 + z2

The predicated MOVPRFX instruction can additionally be used to zero
inactive elements, e.g.

  movprfx z0.s, p0/z, z1.s
  add z0.s, p0/m, z0.s, z2.s

Not all instructions can be prefixed with the MOVPRFX instruction
which is why this patch also adds a mechanism to validate prefixed
instructions. The exact rules when a MOVPRFX applies is detailed in
the SVE supplement of the Architectural Reference Manual.

This is patch [1/2] in a series to add MOVPRFX instructions:
- Patch [1/2]: https://reviews.llvm.org/D49592
- Patch [2/2]: https://reviews.llvm.org/D49593

Reviewers: rengolin, SjoerdMeijer, samparker, fhahn, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D49592

llvm-svn: 338258
2018-07-30 15:42:46 +00:00
Krzysztof Parzyszek 24fae50905 [Hexagon] Simplify A4_rcmp[n]eqi R, 0
Consider cases when register R is known to be zero/non-zero, or when it
is defined by a C2_muxii instruction.

llvm-svn: 338251
2018-07-30 14:28:02 +00:00
Matt Arsenault de496c32a4 AMDGPU: Reduce code size with fcanonicalize (fneg x)
When fcanonicalize is lowered to a mul, we can
use -1.0 for free and avoid the cost of the bigger
encoding for source modifers.

llvm-svn: 338244
2018-07-30 12:16:58 +00:00
Matt Arsenault f3c9a34def AMDGPU: Make fneg combine handle fcanonicalize
llvm-svn: 338243
2018-07-30 12:16:47 +00:00
Francis Visoiu Mistrih 7d003657de [MachineOutliner][X86] Use TAILJMPd64 instead of JMP_1 for TailCall construction
The machine verifier asserts with:

Assertion failed: (isMBB() && "Wrong MachineOperand accessor"), function getMBB, file ../include/llvm/CodeGen/MachineOperand.h, line 542.

It calls analyzeBranch which tries to call getMBB if the opcode is
JMP_1, but in this case we do:

JMP_1 @OUTLINED_FUNCTION

I believe we have to use TAILJMPd64 instead of JMP_1 since JMP_1 is used
with brtarget8.

Differential Revision: https://reviews.llvm.org/D49299

llvm-svn: 338237
2018-07-30 09:59:33 +00:00
Dean Michael Berris 927b3da6c9 Revert "[X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'."
This reverts commit r338204.

llvm-svn: 338236
2018-07-30 09:45:09 +00:00
Nicolai Haehnle 7f0d05d532 AMDGPU: Force skip over s_sendmsg and exp instructions
Summary:
These instructions interact with hardware blocks outside the shader core,
and they can have "scalar" side effects even when EXEC = 0. We don't
want these scalar side effects to occur when all lanes want to skip
these instructions, so always add the execz skip branch instruction
for basic blocks that contain them.

Also ensure that we skip scalar stores / atomics, though we don't
code-gen those yet.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48431

Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44
llvm-svn: 338235
2018-07-30 09:23:59 +00:00
Petr Pavlu 8b6eff4e77 [ARM] Fix over-alignment in arguments that are HA of 128-bit vectors
Code in `CC_ARM_AAPCS_Custom_Aggregate()` is responsible for handling
homogeneous aggregates for `CC_ARM_AAPCS_VFP`. When an aggregate ends up
fully on stack, the function tries to pack all resulting items of the
aggregate as tightly as possible according to AAPCS.

Once the first item was laid out, the alignment used for consecutive
items was the size of one item. This logic went wrong for 128-bit
vectors because their alignment is normally only 64 bits, and so could
result in inserting unexpected padding between the first and second
element.

The patch fixes the problem by updating the alignment with the item size
only if this results in reducing it.

Differential Revision: https://reviews.llvm.org/D49720

llvm-svn: 338233
2018-07-30 08:49:30 +00:00
Dylan McKay 6bc5d5c6db [AVR] Re-enable expansion of ADDE/ADDC/SUBE/SUBC in ISel
This was disabled in r333748, which broke four tests.

In the future, these need to be updated to UADDO/ADDCARRY or
USUBO/SUBCARRY.

llvm-svn: 338212
2018-07-29 11:38:36 +00:00
Sander de Smalen ad88a99956 [AArch64][SVE] Asm: Support for WHILE(LE|LO|LS|LT) instructions.
The WHILE instructions generate a predicate that is true while the 
comparison of the first scalar operand (incremented for each predicate
element) with the second scalar operand is true and false thereafter.

  WHILELE  While incrementing signed scalar less than or equal to scalar
  WHILELO  While incrementing unsigned scalar lower than scalar
  WHILELS  While incrementing unsigned scalar lower than or same as scalar
  WHILELT  While incrementing signed scalar less than scalar

e.g.

  whilele  p0.s, x0, x1

  generates predicate p0 (for 32bit elements) by incrementing
  (signed) x0 and comparing that vector to splat(x1).

llvm-svn: 338211
2018-07-29 08:51:08 +00:00
Sander de Smalen e70ed3187c [AArch64][SVE] Asm: Instructions to perform serialized operations.
The instructions added in this patch permit active elements within
a vector to be processed sequentially without unpacking the vector.

  PFIRST      Set the first active element to true.
  PNEXT       Find next active element in predicate.
  CTERMEQ     Compare and terminate loop when equal.
  CTERMNE     Compare and terminate loop when not equal.

llvm-svn: 338210
2018-07-29 08:00:16 +00:00
Craig Topper 5daa032546 [X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'.
X86 normally requires immediates to be a signed 32-bit value which would exclude i64 0x80000000. But for add/sub we can negate the constant and use the opposite instruction.

llvm-svn: 338204
2018-07-28 18:21:46 +00:00
Craig Topper ba208b07b6 [X86] Use alignTo and divideCeil to make some code more readable. NFC
llvm-svn: 338203
2018-07-28 18:21:45 +00:00
Sander de Smalen 5b3a289424 [AArch64][SVE] Asm: Support for PFALSE and PTEST instructions.
This patch adds PFALSE (unconditionally sets all elements of
the predicate to false) and PTEST (set the status flags for the
predicate).

llvm-svn: 338198
2018-07-28 14:18:11 +00:00
Matt Arsenault 8f9dde94b7 AMDGPU: Stop wasting argument registers with v3i32/v3f32
SelectionDAGBuilder widens v3i32/v3f32 arguments to
to v4i32/v4f32 which consume an additional register.
In addition to wasting argument space, this produces extra
instructions since now it appears the 4th vector component has
a meaningful value to most combines.

llvm-svn: 338197
2018-07-28 14:11:34 +00:00
Sander de Smalen 3878bf83dd [AArch64][SVE] Asm: Data-dependent loop predicate partitioning instructions.
This patch adds support for instructions that partition a predicate
based on data-dependent termination conditions in a loop.

  BRKA      Break after the first true condition
  BRKAS     Break after the first true condition, setting condition flags
  BRKB      Break before the first true condition
  BRKBS     Break before the first true condition, setting condition flags

  BRKPA     Break after the first true condition, propagating from the 
            previous partition
  BRKPAS    Break after the first true condition, propagating from the 
            previous partition, setting condition flags
  BRKPB     Break before the first true condition, propagating from the 
            previous partition
  BRKPBS    Break before the first true condition, propagating from the 
            previous partition, setting condition flags

  BRKN      Propagate break to next partition
  BKRNS     Propagate break to next partition, setting condition flags

llvm-svn: 338196
2018-07-28 14:04:52 +00:00
Matt Arsenault 81920b0a25 DAG: Add calling convention argument to calling convention funcs
This seems like a pretty glaring omission, and AMDGPU
wants to treat kernels differently from other calling
conventions.

llvm-svn: 338194
2018-07-28 13:25:19 +00:00
Matt Arsenault 72b0e38b26 AMDGPU: Stop trying to extend arguments for clover
This was trying to replace i8/i16 arguments with i32, which
was broken and no longer necessary.

llvm-svn: 338193
2018-07-28 12:34:25 +00:00
Wouter van Oortmerssen a90d24da1c Revert "[WebAssembly] Added default stack-only instruction mode for MC."
This reverts commit d3c9af4179eae7793d1487d652e2d4e23844555f.
(SVN revision 338164)

llvm-svn: 338176
2018-07-27 23:19:51 +00:00
Craig Topper c3e11bf3f7 [X86] Add support expanding multiplies by constant where the constant is -3/-5/-9 multplied by a power of 2.
These can be replaced with an LEA, a shift, and a negate. This seems to match what gcc and icc would do.

llvm-svn: 338174
2018-07-27 23:04:59 +00:00
Wouter van Oortmerssen a67c4137c3 [WebAssembly] Added default stack-only instruction mode for MC.
Summary:
Moved Explicit Locals pass to last.
Made that pass obligatory.
Made it convert from register to stack based instructions, and removed the registers.
Fixes to related code that was expecting register based instructions.
Added the correct testing flag to all tests, depending on what the
format they were expecting so far.
Translated one test to stack format as example: reg-stackify-stack.ll

tested:
llvm-lit -v `find test -name WebAssembly`
unittests/MC/*

Reviewers: dschuff, sunfish

Subscribers: sbc100, jgravelle-google, eraman, aheejin, llvm-commits

Differential Revision: https://reviews.llvm.org/D49160

llvm-svn: 338164
2018-07-27 20:56:43 +00:00
Jessica Paquette f90edbe3d6 Recommit "Enable MachineOutliner by default under -Oz for AArch64"
Fixed the ASAN failure from before in r338148, so recommiting.

This patch enables the MachineOutliner by default in AArch64 under -Oz.

The MachineOutliner offers around a 4.5% improvement on the current -Oz code
size improvements.

We have done work into improving the debuggability of outlined code, so that
users of -Oz won't be surprised by the optimization. We have also been executing
the LLVM test suite and common external tests such as the SPEC suites
continuously with no issue. The outliner has a low compile-time overhead of
roughly 1%. At this point, the outliner would be a really good addition to the
-Oz pass pipeline!

llvm-svn: 338160
2018-07-27 20:18:27 +00:00
Jessica Paquette 9d93c6026a [MachineOutliner] Exit getOutliningCandidateInfo when we erase all candidates
There was a missing check for if a candidate list was entirely deleted. This
adds that check.

This fixes an asan failure caused by running test/CodeGen/AArch64/addsub_ext.ll
with the MachineOutliner enabled.

llvm-svn: 338148
2018-07-27 18:21:57 +00:00
Evandro Menezes fcca45f0dd [ARM] Add new target feature to fuse literal generation
This feature enables the fusion of such operations on Cortex A57 and Cortex
A72, as recommended in their Software Optimisation Guides, sections 4.14 and
4.11, respectively.

Differential revision: https://reviews.llvm.org/D49563

llvm-svn: 338147
2018-07-27 18:16:47 +00:00
Jessica Paquette faea2d3130 Revert "Enable MachineOutliner by default under -Oz for AArch64"
It failed an Asan test on a bot:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/21543/steps/check-llvm%20asan/logs/stdio

Fixing that before recommitting.

llvm-svn: 338136
2018-07-27 17:25:38 +00:00
Yonghong Song 04ccfda075 bpf: add missing RegState to notify MachineInstr verifier necessary register usage
Errors like the following are reported by:

  https://urldefense.proofpoint.com/v2/url?u=http-3A__lab.llvm.org-3A8011_builders_llvm-2Dclang-2Dx86-5F64-2Dexpensive-2Dchecks-2Dwin_builds_11261&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=DA8e1B5r073vIqRrFz7MRA&m=929oWPCf7Bf2qQnir4GBtowB8ZAlIRWsAdTfRkDaK-g&s=9k-wbEUVpUm474hhzsmAO29VXVvbxJPWD9RTgCD71fQ&e=

  *** Bad machine code: Explicit definition marked as use ***
  - function:    cal_align1
  - basic block: %bb.0 entry (0x47edd98)
  - instruction: LDB $r3, $r2, 0
  - operand 0:   $r3

This is because RegState info was missing for ScratchReg inside
expandMEMCPY. This caused incomplete register usage information to
MachineInstr verifier which then would complain as there could be potential
code-gen issue if the complained MachineInstr is used in place where
register usage information matters even though the memcpy expanding is not
in such case as it happens at the last stage of IR optimization pipeline.

We should always specify those register usage information which compiler
couldn't deduct automatically whenever we add a hardware register manually.

Reported-by: Builder llvm-clang-x86_64-expensive-checks-win Build #11261
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 338134
2018-07-27 16:58:52 +00:00
Jessica Paquette d4229b985c Enable MachineOutliner by default under -Oz for AArch64
This patch enables the MachineOutliner by default in AArch64 under -Oz.

The MachineOutliner offers around a 4.5% improvement on the current -Oz code
size improvements.

We have done work into improving the debuggability of outlined code, so that
users of -Oz won't be surprised by the optimization. We have also been executing
the LLVM test suite and common external tests such as the SPEC suites
continuously with no issue. The outliner has a low compile-time overhead of
roughly 1%. At this point, the outliner would be a really good addition to the
-Oz pass pipeline!

llvm-svn: 338133
2018-07-27 16:44:42 +00:00
Jan Vesely 6ff58ed5ca AMDGPU/R600: Add MOV instructions to BFE patterns
R600 can't handle immediates for BFE, these will be eliminated later.
Fixes powr/pow regressions n r600 since r334817

Differential Revision: https://reviews.llvm.org/D49641

llvm-svn: 338127
2018-07-27 15:00:13 +00:00
Sander de Smalen a703b8dc71 [AArch64][SVE] Asm: Predicated integer reductions.
This patch adds support for various integer reduction operations:

  SADDV    signed add reduction to scalar
  UADDV    unsigned add reduction to scalar

  SMAXV    signed maximum reduction to scalar
  SMINV    signed minimum reduction to scalar
  UMAXV    unsigned maximum reduction to scalar
  UMINV    unsigned minimum reduction to scalar

  ANDV     logical AND reduction to scalar
  ORV      logical OR reduction to scalar
  EORV     logical EOR reduction to scalar

The reduction is predicated, e.g.
  smaxv s0, p0, z1.s

performs a signed maximum reduction on active elements in z1,
and stores the (signed max value) result in s0.

llvm-svn: 338126
2018-07-27 14:24:55 +00:00
Sander de Smalen fcb636d222 [AArch64][SVE] Asm: Predicated floating point reductions.
This patch adds support for various floating-point
reduction operations:

  FADDA    strictly-ordered add reduction, accumulating in scalar
  FADDV    recursive add reduction to scalar
  FMAXV    recursive max reduction to scalar
  FMINV    recursive min reduction to scalar
  FMAXNMV  recursive max number reduction to scalar
  FMINNMV  recursive min number reduction to scalar

The reduction is predicated, e.g.

  fadda d0, p0, d0, z1.d

performs the add-reduction in strict order on active elements
in z1, accumulating into d0.

  faddv d0, p0, z1.d

performs the add-reduction (not in strict order)
on active elements in z1, storing the result in d0.

llvm-svn: 338123
2018-07-27 13:58:48 +00:00
Sander de Smalen 88e154ff90 [AArch64][SVE] Asm: Support for FEXPA and FTSSEL.
This patch adds support for transcendental acceleration
instructions 'FEXPA' (exponential accelerator) and 'FTSSEL'
(trigonometric select coefficient).

llvm-svn: 338121
2018-07-27 12:40:09 +00:00
Sander de Smalen 71929e7cad [AArch64][SVE] Asm: Support for FRECPE and FRSQRTE.
Support for floating-point instructions for reciprocal
estimate (FRECPE) and reciprocal square root estimate (FRSQRTE).

llvm-svn: 338120
2018-07-27 12:26:24 +00:00
Matt Arsenault 0183c56c11 AMDGPU: Fix code size for return_to_epilog pseudo
llvm-svn: 338113
2018-07-27 09:15:03 +00:00
Tom Stellard e9bdc5f1d8 AMDGPU/GlobalISel: Fix crash in regbankselect on non-power-of-2 types
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D49624

llvm-svn: 338102
2018-07-27 06:04:40 +00:00
Craig Topper 561e298e29 [X86] Remove an unnecessary 'if' that prevented treating INT64_MAX and -INT64_MAX as power of 2 minus 1 in the multiply expansion code.
Not sure why they were being explicitly excluded, but I believe all the math inside the if works. I changed the absolute value to be uint64_t instead of int64_t so INT64_MIN+1 wouldn't be signed wrap.

llvm-svn: 338101
2018-07-27 05:56:27 +00:00
Craig Topper e364baa88b [X86] Add matching for another pattern of PMADDWD.
Summary:
This is the pattern you get from the loop vectorizer for something like this

int16_t A[1024];
int16_t B[1024];
int32_t C[512];

void pmaddwd() {
  for (int i = 0; i != 512; ++i)
    C[i] = (A[2*i]*B[2*i]) + (A[2*i+1]*B[2*i+1]);
}

In this case we will have (add (mul (build_vector), (build_vector)), (mul (build_vector), (build_vector))). This is different than the pattern we currently match which has the build_vectors between an add and a single multiply. I'm not sure what C code would get you that pattern.

Reviewers: RKSimon, spatel, zvi

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49636

llvm-svn: 338097
2018-07-27 04:29:10 +00:00
Craig Topper f7bc550223 [X86] When removing sign extends from gather/scatter indices, make sure we handle UpdateNodeOperands finding an existing node to CSE with.
If this happens the operands aren't updated and the existing node is returned. Make sure we pass this existing node up to the DAG combiner so that a proper replacement happens. Otherwise we get stuck in an infinite loop with an unoptimized node.

llvm-svn: 338090
2018-07-27 00:00:30 +00:00
Scott Linder eb1f75d561 [AMDGPU] Fix VGPR spills where offset doesn't fit in 12 bits
Scale the offset of VGPR spills by the wave size when it cannot fit in the
12-bit offset immediate field and so is added to the soffset SGPR. This
accounts for hardware swizzling of scratch memory.

Differential Revision: https://reviews.llvm.org/D49448

llvm-svn: 338060
2018-07-26 19:47:51 +00:00
Ana Pazos 2e4106b73d [RISCV] Add support for _interrupt attribute
- Save/restore only registers that are used.
This includes Callee saved registers and Caller saved registers
(arguments and temporaries) for integer and FP registers.
- If there is a call in the interrupt handler, save/restore all
Caller saved registers (arguments and temporaries) and all FP registers.
- Emit special return instructions depending on "interrupt"
attribute type.
Based on initial patch by Zhaoshi Zheng.

Reviewers: asb

Reviewed By: asb

Subscribers: rkruppe, the_o, MartinMosbeck, brucehoult, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, llvm-commits

Differential Revision: https://reviews.llvm.org/D48411

llvm-svn: 338047
2018-07-26 17:49:43 +00:00
Alexey Bataev 4dd7558fab [DEBUGINFO, NVPTX] Emit correct debug information for local variables.
Summary:
NVPTX target dos not use register-based frame information. Instead it
relies on the artificial local_depot that is used instead of the frame
and the data for variables must be emitted relatively to this
local_depot.

Reviewers: tra, jlebar, echristo

Subscribers: jholewinski, aprantl, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D45963

llvm-svn: 338039
2018-07-26 16:29:52 +00:00
Luke Cheeseman 66b5e7da4c Enable some pointer authentication instructions for aarch64 v8a targets
- Some of the v8.3 pointer authentication instruction inhabit the Hint space
- These instructions can be assembled to hint instructions which act as NOP instructions prior to v8.3
- This patch permits using the hint instructions for all v8a targets
- Also, correct the RETA{A,B} instructions to match the instruction attributes of RET (set isTerminator and isBarrier)

Differential Revision: https://reviews.llvm.org/D49786

llvm-svn: 338029
2018-07-26 14:00:50 +00:00
Stefan Maksimovic 4a612d4bf2 [mips] Sign extend i32 return values on MIPS64
Override getTypeForExtReturn so that functions returning
an i32 typed value have it sign extended on MIPS64.

Also provide patterns to get rid of unneeded sign extensions
for arithmetic instructions which implicitly sign extend
their results.

Differential Revision: https://reviews.llvm.org/D48374

llvm-svn: 338019
2018-07-26 10:59:35 +00:00
Chandler Carruth 1387159b93 [x86/SLH] Extract the logic to trace predicate state through calls to
a helper function with a nice overview comment. NFC.

This is a preperatory refactoring to implementing another component of
mitigation here that was descibed in the design document but hadn't been
implemented yet.

llvm-svn: 338016
2018-07-26 09:42:57 +00:00
Sjoerd Meijer dc198344ce [AArch64] Armv8.2-A: add the crypto extensions
This adds MC support for the crypto instructions that were made optional
extensions in Armv8.2-A (AArch64 only).

Differential Revision: https://reviews.llvm.org/D49370

llvm-svn: 338010
2018-07-26 07:13:59 +00:00
Craig Topper 4e687d5bb2 [X86] Don't use CombineTo to skip adding new nodes to the DAGCombiner worklist in combineMul.
I'm not sure if this was trying to avoid optimizing the new nodes further or what. Or maybe to prevent a cycle if something tried to reform the multiply? But I don't think its a reliable way to do that. If the user of the expanded multiply is visited by the DAGCombiner after this conversion happens, the DAGCombiner will check its operands, see that they haven't been visited by the DAGCombiner before and it will then add the first node to the worklist. This process will repeat until all the new nodes are visited.

So this seems like an unreliable prevention at best. So this patch just returns the new nodes like any other combine. If this starts causing problems we can try to add target specific nodes or something to more directly prevent optimizations.

Now that we handle the combine normally, we can combine any negates the mul expansion creates into their users since those will be visited now.

llvm-svn: 338007
2018-07-26 05:40:10 +00:00
Craig Topper 370bdd3a0f [X86] Remove some unnecessary explicit calls to DCI.AddToWorkList.
These calls were making sure some newly created nodes were added to worklist, but the DAGCombiner has internal support for ensuring it has visited all nodes. Any time it visits a node it ensures the operands have been queued to be visited as well. This means if we only need to return the last new node. The DAGCombiner will take care of adding its inputs thus walking backwards through all the new nodes.

llvm-svn: 337996
2018-07-26 03:20:27 +00:00
Matthias Braun 57dd5b3dea CodeGen: Cleanup regmask construction; NFC
- Avoid duplication of regmask size calculation.
- Simplify allocateRegisterMask() call.
- Rename allocateRegisterMask() to allocateRegMask() to be consistent
  with naming in MachineOperand.

llvm-svn: 337986
2018-07-26 00:27:47 +00:00
Yonghong Song 71d81e5c8f bpf: new option -bpf-expand-memcpy-in-order to expand memcpy in order
Some BPF JIT backends would want to optimize memcpy in their own
architecture specific way.

However, at the moment, there is no way for JIT backends to see memcpy
semantics in a reliable way. This is due to LLVM BPF backend is expanding
memcpy into load/store sequences and could possibly schedule them apart from
each other further. So, BPF JIT backends inside kernel can't reliably
recognize memcpy semantics by peephole BPF sequence.

This patch introduce new intrinsic expand infrastructure to memcpy.

To get stable in-order load/store sequence from memcpy, we first lower
memcpy into BPF::MEMCPY node which then expanded into in-order load/store
sequences in expandPostRAPseudo pass which will happen after instruction
scheduling. By this way, kernel JIT backends could reliably recognize
memcpy through scanning BPF sequence.

This new memcpy expand infrastructure is gated by a new option:

  -bpf-expand-memcpy-in-order

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 337977
2018-07-25 22:40:02 +00:00
Martin Storsjo d78b394543 Add missing 'override', fixing compilation with some compilers since SVN r337950
llvm-svn: 337952
2018-07-25 19:01:36 +00:00
Martin Storsjo d2662c32fb [COFF] Hoist constant pool handling from X86AsmPrinter into AsmPrinter
In SVN r334523, the first half of comdat constant pool handling was
hoisted from X86WindowsTargetObjectFile (which despite the name only
was used for msvc targets) into the arch independent
TargetLoweringObjectFileCOFF, but the other half of the handling was
left behind in X86AsmPrinter::GetCPISymbol.

With only half of the handling in place, inconsistent comdat
sections/symbols are created, causing issues with both GNU binutils
(avoided for X86 in SVN r335918) and with the MS linker, which
would complain like this:

fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x4

Differential Revision: https://reviews.llvm.org/D49644

llvm-svn: 337950
2018-07-25 18:35:31 +00:00
Eli Friedman 733f4ed1bb [ARM] Prefer lsls+lsrs over lsls+ands or lsrs+ands in Thumb1.
Saves materializing the immediate for the "ands".

Corresponding patterns exist for lsrs+lsls, but that seems less common
in practice.

Now implemented as a DAGCombine.

Differential Revision: https://reviews.llvm.org/D49585

llvm-svn: 337945
2018-07-25 18:22:22 +00:00
Stanislav Mekhanoshin 7e7268ac1c [AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion
Differential Revision: https://reviews.llvm.org/D49761

llvm-svn: 337938
2018-07-25 17:02:11 +00:00
Krzysztof Parzyszek 4e07509d18 [Hexagon] Properly scale bit index when extracting elements from vNi1
For example v = <2 x i1> is represented as bbbbaaaa in a predicate register,
where b = v[1], a = v[0]. Extracting v[1] is equivalent to extracting bit 4
from the predicate register.

llvm-svn: 337934
2018-07-25 16:20:59 +00:00
Petar Jovanovic 58c0210023 [MIPS GlobalISel] Lower pointer arguments
Add support for lowering pointer arguments.
Changing type from pointer to integer is already done in
MipsTargetLowering::getRegisterTypeForCallingConv.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D49419

llvm-svn: 337912
2018-07-25 12:35:01 +00:00
Jonas Paulsson 374af8070e [SystemZ] Use tablegen loops in SchedModels
NFC changes to make scheduler TableGen files more readable, by using loops
instead of a lot of similar defs with just e.g. a latency value that changes.

https://reviews.llvm.org/D49598
Review: Ulrich Weigand, Javed Abshar

llvm-svn: 337909
2018-07-25 11:42:55 +00:00
Chandler Carruth 4f6481dc81 [x86/SLH] Sink the return hardening into the main block-walk + hardening
code.

This consolidates all our hardening calls, and simplifies the code
a bit. It seems much more clear to handle all of these together.

No functionality changed here.

llvm-svn: 337895
2018-07-25 09:18:48 +00:00
Chandler Carruth 196e719acd [x86/SLH] Improve name and comments for the main hardening function.
This function actually does two things: it traces the predicate state
through each of the basic blocks in the function (as that isn't directly
handled by the SSA updater) *and* it hardens everything necessary in the
block as it goes. These need to be done together so that we have the
currently active predicate state to use at each point of the hardening.

However, this also made obvious that the flag to disable actual
hardening of loads was flawed -- it also disabled tracing the predicate
state across function calls within the body of each block. So this patch
sinks this debugging flag test to correctly guard just the hardening of
loads.

Unless load hardening was disabled, no functionality should change with
tis patch.

llvm-svn: 337894
2018-07-25 09:00:26 +00:00
Simon Atanasyan b524459288 [mips] Replace custom parsing logic for data directives by the `addAliasForDirective`
The target independent AsmParser doesn't recognise .hword, .word, .dword
which are required for Mips. Currently MipsAsmParser recognises these
through dispatch to MipsAsmParser::parseDataDirective. This contains
equivalent logic to AsmParser::parseDirectiveValue. This patch allows
reuse of AsmParser::parseDirectiveValue by making use of
addAliasForDirective to support .hword, .word and .dword.

Original patch provided by Alex Bradbury at D47001 was modified to fix
handling of microMIPS symbols. The `AsmParser::parseDirectiveValue`
calls either `EmitIntValue` or `EmitValue`. In this patch we override
`EmitIntValue` in the `MipsELFStreamer` to clear a pending set of
microMIPS symbols.

Differential revision: https://reviews.llvm.org/D49539

llvm-svn: 337893
2018-07-25 07:07:43 +00:00
Craig Topper dc0e8a601d [X86] Use X86ISD::MUL_IMM instead of ISD::MUL for multiply we intend to be selected to LEA.
This prevents other combines from possibly disturbing it.

llvm-svn: 337890
2018-07-25 05:33:36 +00:00
Chandler Carruth 7024921c0a [x86/SLH] Teach the x86 speculative load hardening pass to harden
against v1.2 BCBS attacks directly.

Attacks using spectre v1.2 (a subset of BCBS) are described in the paper
here:
https://people.csail.mit.edu/vlk/spectre11.pdf

The core idea is to speculatively store over the address in a vtable,
jumptable, or other target of indirect control flow that will be
subsequently loaded. Speculative execution after such a store can
forward the stored value to subsequent loads, and if called or jumped
to, the speculative execution will be steered to this potentially
attacker controlled address.

Up until now, this could be mitigated by enableing retpolines. However,
that is a relatively expensive technique to mitigate this particular
flavor. Especially because in most cases SLH will have already mitigated
this. To fully mitigate this with SLH, we need to do two core things:
1) Unfold loads from calls and jumps, allowing the loads to be post-load
   hardened.
2) Force hardening of incoming registers even if we didn't end up
   needing to harden the load itself.

The reason we need to do these two things is because hardening calls and
jumps from this particular variant is importantly different from
hardening against leak of secret data. Because the "bad" data here isn't
a secret, but in fact speculatively stored by the attacker, it may be
loaded from any address, regardless of whether it is read-only memory,
mapped memory, or a "hardened" address. The only 100% effective way to
harden these instructions is to harden the their operand itself. But to
the extent possible, we'd like to take advantage of all the other
hardening going on, we just need a fallback in case none of that
happened to cover the particular input to the control transfer
instruction.

For users of SLH, currently they are paing 2% to 6% performance overhead
for retpolines, but this mechanism is expected to be substantially
cheaper. However, it is worth reminding folks that this does not
mitigate all of the things retpolines do -- most notably, variant #2 is
not in *any way* mitigated by this technique. So users of SLH may still
want to enable retpolines, and the implementation is carefuly designed to
gracefully leverage retpolines to avoid the need for further hardening
here when they are enabled.

Differential Revision: https://reviews.llvm.org/D49663

llvm-svn: 337878
2018-07-25 01:51:29 +00:00
Craig Topper fc501a9223 [X86] Use a shift plus an lea for multiplying by a constant that is a power of 2 plus 2/4/8.
The LEA allows us to combine an add and the multiply by 2/4/8 together so we just need a shift for the larger power of 2.

llvm-svn: 337875
2018-07-25 01:15:38 +00:00
Craig Topper 5be253d988 [X86] Expand mul by pow2 + 2 using a shift and two adds similar to what we do for pow2 - 2.
llvm-svn: 337874
2018-07-25 01:15:35 +00:00
Craig Topper 56c104f104 [X86] Use a two lea sequence for multiply by 37, 41, and 73.
These fit a pattern used by 11, 21, and 19.

llvm-svn: 337871
2018-07-24 23:44:17 +00:00
Craig Topper f8fcee70a3 [X86] Change multiply by 26 to use two multiplies by 5 and an add instead of multiply by 3 and 9 and a subtract.
Same number of operations, but ending in an add is friendlier due to it being commutable.

llvm-svn: 337869
2018-07-24 23:44:12 +00:00
Craig Topper 5ddc0a2b14 [X86] When expanding a multiply by a negative of one less than a power of 2, like 31, don't generate a negate of a subtract that we'll never optimize.
We generated a subtract for the power of 2 minus one then negated the result. The negate can be optimized away by swapping the subtract operands, but DAG combine doesn't know how to do that and we don't add any of the new nodes to the worklist anyway.

This patch makes use explicitly emit the swapped subtract.

llvm-svn: 337858
2018-07-24 21:31:21 +00:00
Craig Topper 6d29891bef [X86] Generalize the multiply by 30 lowering to generic multipy by power 2 minus 2.
Use a left shift and 2 subtracts like we do for 30. Move this out from behind the slow lea check since it doesn't even use an LEA.

Use this for multiply by 14 as well.

llvm-svn: 337856
2018-07-24 21:15:41 +00:00
Craig Topper 86d6320b94 [X86] Change multiply by 19 to use (9 * X) * 2 + X instead of (5 * X) * 4 - 1.
The new lowering can be done in 2 LEAs. The old code took 1 LEA, 1 shift, and 1 sub.

llvm-svn: 337851
2018-07-24 20:31:48 +00:00
Jessica Paquette 69f517df27 [MachineOutliner][NFC] Move target frame info into OutlinedFunction
Just some gardening here.

Similar to how we moved call information into Candidates, this moves outlined
frame information into OutlinedFunction. This allows us to remove
TargetCostInfo entirely.

Anywhere where we returned a TargetCostInfo struct, we now return an
OutlinedFunction. This establishes OutlinedFunctions as more of a general
repeated sequence, and Candidates as occurrences of those repeated sequences.

llvm-svn: 337848
2018-07-24 20:13:10 +00:00
Peter Collingbourne e06bac4796 Put "built-in" function definitions in global Used list, for LTO. (fix bug 34169)
When building with LTO, builtin functions that are defined but whose calls have not been inserted yet, get internalized. The Global Dead Code Elimination phase in the new LTO implementation then removes these function definitions. Later optimizations add calls to those functions, and the linker then dies complaining that there are no definitions. This CL fixes the new LTO implementation to check if a function is builtin, and if so, to not internalize (and later DCE) the function. As part of this fix I needed to move the RuntimeLibcalls.{def,h} files from the CodeGen subidrectory to the IR subdirectory. I have updated all the files that accessed those two files to access their new location.

Fixes PR34169

Patch by Caroline Tice!

Differential Revision: https://reviews.llvm.org/D49434

llvm-svn: 337847
2018-07-24 19:34:37 +00:00
Chandler Carruth c9313a9ecb [x86] Teach the x86 backend that it can fold between TCRETURNm* and TCRETURNr* and fix latent bugs with register class updates.
Summary:
Enabling this fully exposes a latent bug in the instruction folding: we
never update the register constraints for the register operands when
fusing a load into another operation. The fused form could, in theory,
have different register constraints on its operands. And in fact,
TCRETURNm* needs its memory operands to use tailcall compatible
registers.

I've updated the folding code to re-constrain all the registers after
they are mapped onto their new instruction.

However, we still can't enable folding in the general case from
TCRETURNr* to TCRETURNm* because doing so may require more registers to
be available during the tail call. If the call itself uses all but one
register, and the folded load would require both a base and index
register, there will not be enough registers to allocate the tail call.

It would be better, IMO, to teach the register allocator to *unfold*
TCRETURNm* when it runs out of registers (or specifically check the
number of registers available during the TCRETURNr*) but I'm not going
to try and solve that for now. Instead, I've just blocked the forward
folding from r -> m, leaving LLVM free to unfold from m -> r as that
doesn't introduce new register pressure constraints.

The down side is that I don't have anything that will directly exercise
this. Instead, I will be immediately using this it my SLH patch. =/

Still worse, without allowing the TCRETURNr* -> TCRETURNm* fold, I don't
have any tests that demonstrate the failure to update the memory operand
register constraints. This patch still seems correct, but I'm nervous
about the degree of testing due to this.

Suggestions?

Reviewers: craig.topper

Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D49717

llvm-svn: 337845
2018-07-24 19:04:37 +00:00
Jessica Paquette fca55129b1 [MachineOutliner][NFC] Make Candidates own their call information
Before this, TCI contained all the call information for each Candidate.

This moves that information onto the Candidates. As a result, each Candidate
can now supply how it ought to be called. Thus, Candidates will be able to,
say, call the same function in cheaper ways when possible. This also removes
that information from TCI, since it's no longer used there.

A follow-up patch for the AArch64 outliner will demonstrate this.

llvm-svn: 337840
2018-07-24 17:42:11 +00:00
Simon Atanasyan 28ded4ee19 [mips] Fix local dynamic TLS with Sym64
For the final DTPREL addition, rather than a lui/daddiu/daddu triple,
LLVM was erronously emitting a daddiu/daddiu pair, treating the %dtprel_hi
as if it were a %dtprel_lo, since Mips::Hi expands unshifted for Sym64.
Instead, use a new TlsHi node and, although unnecessary due to the exact
structure of the nodes emitted, use TlsHi for local exec too to prevent
future bugs. Also garbage-collect the unused TprelLo and TlsGd nodes,
and TprelHi since its functionality is provided by the new common TlsHi node.

Patch by James Clarke.

Differential revision: https://reviews.llvm.org/D49259

llvm-svn: 337827
2018-07-24 13:47:52 +00:00
Chandler Carruth 54529146c6 [x86/SLH] Extract the core register hardening logic to a low-level
helper and restructure the post-load hardening to use this.

This isn't as trivial as I would have liked because the post-load
hardening used a trick that only works for it where it swapped in
a temporary register to the load rather than replacing anything.
However, there is a simple way to do this without that trick that allows
this to easily reuse a friendly API for hardening a value in a register.
That API will in turn be usable in subsequent patcehs.

This also techincally changes the position at which we insert the subreg
extraction for the predicate state, but that never resulted in an actual
instruction and so tests don't change at all.

llvm-svn: 337825
2018-07-24 12:44:00 +00:00
Chandler Carruth 376113da89 [x86/SLH] Tidy up a comment, using doxygen structure and wording it to
be more accurate and understandable.

llvm-svn: 337822
2018-07-24 12:19:01 +00:00
Sam Parker 8b93e82c3d [ARM] Disable ARMCodeGenPrepare by default
ARM Stage 2 builders have been suspiciously broken since the pass was
committed. Disabling to hopefully fix the bots and give me time to
debug.

llvm-svn: 337821
2018-07-24 12:04:23 +00:00
Tom Stellard b7f19e6d1e AMDGPU/GlobalISel: Legalize G_INSERT
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49601

llvm-svn: 337798
2018-07-24 02:19:20 +00:00
Tom Stellard 2d37929c10 AMDGPU/GlobalISel: Remove unnecessary legality constraint for G_EXTRACT
Summary:
We were marking G_EXTRACT operations unsupported if the output type
was larger than the input type.  I don't see how this could ever actually
happen, so I dropped the constraint.  Doing this makes it possible to
reuse the same legality code for G_INSERT.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49600

llvm-svn: 337794
2018-07-24 01:43:49 +00:00
Chandler Carruth 66fbbbca60 [x86/SLH] Simplify the code for hardening a loaded value. NFC.
This is in preparation for extracting this into a re-usable utility in
this code.

llvm-svn: 337785
2018-07-24 00:35:36 +00:00
Chandler Carruth b46c22de00 [x86/SLH] Remove complex SHRX-based post-load hardening.
This code was really nasty, had several bugs in it originally, and
wasn't carrying its weight. While on Zen we have all 4 ports available
for SHRX, on all of the Intel parts with Agner's tables, SHRX can only
execute on 2 ports, giving it 1/2 the throughput of OR.

Worse, all too often this pattern required two SHRX instructions in
a chain, hurting the critical path by a lot.

Even if we end up needing to safe/restore EFLAGS, that is no longer so
bad. We pay for a uop to save the flag, but we very likely get fusion
when it is used by forming a test/jCC pair or something similar. In
practice, I don't expect the SHRX to be a significant savings here, so
I'd like to avoid the complex code required. We can always resurrect
this if/when someone has a specific performance issue addressed by it.

llvm-svn: 337781
2018-07-24 00:21:59 +00:00
Martin Storsjo c2b701408e [AArch64] Use MCAsmInfoMicrosoft and MCAsmInfoGNUCOFF as base classes
This matches the structure used on X86 and ARM. This requires
a little bit of duplication of the parts that are equal in both
AArch64 COFF variants though.

Before SVN r335286, these classes didn't add anything that MCAsmInfoCOFF
didn't, but now they do.

This makes AArch64 match X86 in how comdat is used for float constants
for MinGW.

Differential Revision: https://reviews.llvm.org/D49637

llvm-svn: 337755
2018-07-23 22:15:14 +00:00
Reid Kleckner 980c4df037 Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models"
Don't try to generate large PIC code for non-ELF targets. Neither COFF
nor MachO have relocations for large position independent code, and
users have been using "large PIC" code models to JIT 64-bit code for a
while now. With this change, if they are generating ELF code, their
JITed code will truly be PIC, but if they target MachO or COFF, it will
contain 64-bit immediates that directly reference external symbols. For
a JIT, that's perfectly fine.

llvm-svn: 337740
2018-07-23 21:14:35 +00:00
Krzysztof Parzyszek 9500a24fce [Hexagon] Handle unnamed globals in HexagonConstExpr
Instead of comparing names, compare positions in the parent module.

llvm-svn: 337723
2018-07-23 18:30:17 +00:00
Fangrui Song 58407ca045 [ARM] Use unique_ptr to fix memory leak introduced in r337701
llvm-svn: 337714
2018-07-23 17:43:21 +00:00
Jordan Rupprecht e5daf61229 OpChain has subclasses, so add a virtual destructor.
Summary:
OpChain has subclasses, so add a virtual destructor.

This fixes an issue when deleting subclasses of OpChain (see MatchSMLAD() specifically) in r337701.

Reviewers: javed.absar

Subscribers: llvm-commits, SjoerdMeijer, samparker

Differential Revision: https://reviews.llvm.org/D49681

llvm-svn: 337713
2018-07-23 17:38:05 +00:00
Matt Morehouse d773cb99f1 [ARM] Follow-up to r337709.
Fix double-free.

llvm-svn: 337711
2018-07-23 17:22:53 +00:00
Matt Morehouse a70685f63a [ARM] Add doFinalization() to ARMCodeGenPrepare pass.
Attempt to fix the leak introduced in r337687 and make sanitizer
buildbots green again.

llvm-svn: 337709
2018-07-23 17:00:45 +00:00
Sam Parker 89a3799a69 [ARM][NFC] ParallelDSP reorganisation
In preparing to allow ARMParallelDSP pass to parallelise more than
smlads, I've restructed some elements:

- The ParallelMAC struct has been renamed to BinOpChain.
- The BinOpChain struct holds two value lists: LHS and RHS, as well
  as inheriting from the OpChain base class.
- The OpChain struct holds all the values of the represented chain
  and has had the memory locations functionality inserted into it.
- ParallelMACList becomes OpChainList and it now holds pointers
  instead of objects.

Differential Revision: https://reviews.llvm.org/D49020

llvm-svn: 337701
2018-07-23 15:25:59 +00:00
Jonas Paulsson 59c94bec0d [SystemZ] Fix dumpSU() method in SystemZHazardRecognizer.
Two minor issues: The new MCD SchedWrite name does not contain "Unit" like
all the others, so a check is needed. Also, print "LSU" instead of "LS".

Review: Ulrich Weigand
llvm-svn: 337700
2018-07-23 15:08:35 +00:00
Sam Parker 3828c6ff94 [ARM] ARMCodeGenPrepare backend pass
Arm specific codegen prepare is implemented to perform type promotion
on icmp operands, which can enable the removal of uxtb and uxth
(unsigned extend) instructions. This is possible because performing
type promotion before ISel alleviates this duty from the DAG builder
which has to perform legalisation, but has a limited view on data
ranges.
    
The pass visits any instruction operand of an icmp and creates a
worklist to traverse the use-def tree to determine whether the values
can simply be promoted. Our concern is values in the registers
overflowing the narrow (i8, i16) data range, so instructions marked
with nuw can be promoted easily. For add and sub instructions, we are
able to use the parallel dsp instructions to operate on scalar data
types and avoid overflowing bits. Underflowing adds and subs are also
permitted when the result is only used by an unsigned icmp.

Differential Revision: https://reviews.llvm.org/D48832

llvm-svn: 337687
2018-07-23 12:27:47 +00:00
Roman Lebedev 52b85377eb [NFC][MCA] ZnVer1: Update RegisterFile to identify false dependencies on partially written registers.
Summary:
Pretty mechanical follow-up for D49196.

As microarchitecture.pdf notes, "20 AMD Ryzen pipeline",
"20.8 Register renaming and out-of-order schedulers":
  The integer register file has 168 physical registers of 64 bits each.
  The floating point register file has 160 registers of 128 bits each.
"20.14 Partial register access":
  The processor always keeps the different parts of an integer register together.
  ...
  An instruction that writes to part of a register will therefore have a false dependence
  on any previous write to the same register or any part of it.

Reviewers: andreadb, courbet, RKSimon, craig.topper, GGanesh

Reviewed By: GGanesh

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D49393

llvm-svn: 337676
2018-07-23 10:10:13 +00:00
Chandler Carruth 1d926fb9f4 [x86/SLH] Fix a bug where we would harden tail calls twice -- once as
a call, and then again as a return.

Also added a comment to try and explain better why we would be doing
what we're doing when hardening the (non-call) returns.

llvm-svn: 337673
2018-07-23 07:56:15 +00:00
Chandler Carruth 0477b40137 [x86/SLH] Rename and comment the main hardening function. NFC.
This provides an overview of the algorithm used to harden specific
loads. It also brings this our terminology further in line with
hardening rather than checking.

Differential Revision: https://reviews.llvm.org/D49583

llvm-svn: 337667
2018-07-23 04:01:34 +00:00
Craig Topper b2a626b52e [X86] Remove the max vector width restriction from combineLoopMAddPattern and rely splitOpsAndApply to handle splitting.
This seems to be a net improvement. There's still an issue under avx512f where we have a 512-bit vpaddd, but not vpmaddwd so we end up doing two 256-bit vpmaddwds and inserting the results before a 512-bit vpaddd. It might be better to do two 512-bits paddds with zeros in the upper half. Same number of instructions, but breaks a dependency.

llvm-svn: 337656
2018-07-22 19:44:35 +00:00
Simon Atanasyan 2c5b18f70f [mips] Factor out register class selection for global base register. NFC
Factor out register class selection for global base register into a
separate function to escape long chain of ternary operators.

llvm-svn: 337647
2018-07-21 16:16:08 +00:00
Simon Atanasyan ecd1e0afdd [mips] Move out the WrapperPat declaration from the NotInMicroMips predicate
This is a follow-up to the rL335185. Those commit adds some WrapperPat
patterns for microMIPS target. But declaration of the WrapperPat class
is under the NotInMicroMips predicate and microMIPS patterns cannot be
selected because predicate (Subtarget->inMicroMipsMode()) &&
(!Subtarget->inMicroMipsMode()) is always false.

This change move out the WrapperPat class declaration from the
NotInMicroMips predicate and enables microMIPS WrapperPat patterns.

Differential revision: https://reviews.llvm.org/D49533

llvm-svn: 337646
2018-07-21 16:16:03 +00:00
Matt Arsenault 5ae765e68c AMDGPU: Use existing function to check for VGPRs
llvm-svn: 337621
2018-07-20 21:20:36 +00:00
Benjamin Kramer 64c7fa3201 Revert "[X86][AVX] Convert X86ISD::VBROADCAST demanded elts combine to use SimplifyDemandedVectorElts"
This reverts commit r337547. It triggers an infinite loop.

llvm-svn: 337617
2018-07-20 20:59:46 +00:00
Craig Topper 28ac623f6f [X86] Remove isel patterns for MOVSS/MOVSD ISD opcodes with integer types.
Ideally our ISD node types going into the isel table would have types consistent with their instruction domain. This prevents us having to duplicate patterns with different types for the same instruction.

Unfortunately, it seems our shuffle combining is currently relying on this a little remove some bitcasts. This seems to enable some switching between shufps and shufd. Hopefully there's some way we can address this in the combining.

Differential Revision: https://reviews.llvm.org/D49280

llvm-svn: 337590
2018-07-20 17:57:53 +00:00
Craig Topper 6194ccf8c7 [X86] Remove what appear to be unnecessary uses of DCI.CombineTo
CombineTo is most useful when you need to replace multiple results, avoid the worklist management, or you need to something else after the combine, etc. Otherwise you should be able to just return the new node and let DAGCombiner go through its usual worklist code.

All of the places changed in this patch look to be standard cases where we should be able to use the more stand behavior of just returning the new node.

Differential Revision: https://reviews.llvm.org/D49569

llvm-svn: 337589
2018-07-20 17:57:42 +00:00
Simon Pilgrim 70fcd0f481 [X86][XOP] Fix SUB constant folding for VPSHA/VPSHL shift lowering
We can safely use getConstant here as we're still lowering, which allows constant folding to kick in and simplify the vector shift codegen.

Noticed while working on D49562.

llvm-svn: 337578
2018-07-20 16:55:18 +00:00
Evandro Menezes fffa9b5897 [ARM] Add new feature to enable optimizing the VFP registers
Enable the optimization of operations on DPR and SPR via a feature instead
of checking the target.

Differential revision: https://reviews.llvm.org/D49463

llvm-svn: 337575
2018-07-20 16:49:28 +00:00
Simon Pilgrim c7132031a2 [X86][SSE] Use SplitOpsAndApply to improve HADD/HSUB lowering
Improve AVX1 256-bit vector HADD/HSUB matching by using SplitOpsAndApply to split into 128-bit instructions.

llvm-svn: 337568
2018-07-20 16:20:45 +00:00
Simon Pilgrim a85b86a982 [X86][AVX] Add support for i16 256-bit vector horizontal op redundant shuffle removal
llvm-svn: 337566
2018-07-20 15:51:01 +00:00
Simon Pilgrim 7c56bce996 [X86][AVX] Add support for 32/64 bits 256-bit vector horizontal op redundant shuffle removal
llvm-svn: 337561
2018-07-20 15:24:12 +00:00
Simon Pilgrim 6fb8b68b2d [X86][AVX] Convert X86ISD::VBROADCAST demanded elts combine to use SimplifyDemandedVectorElts
This is an early step towards using SimplifyDemandedVectorElts for target shuffle combining - this merely moves the existing X86ISD::VBROADCAST simplification code to use the SimplifyDemandedVectorElts mechanism.

Adds X86TargetLowering::SimplifyDemandedVectorEltsForTargetNode to handle X86ISD::VBROADCAST - in time we can support all target shuffles (and other ops) here.

llvm-svn: 337547
2018-07-20 13:26:51 +00:00
Jonas Paulsson c88d3f6a99 [SystemZ] Reimplent SchedModel IssueWidth and WriteRes/ReadAdvance mappings.
As a consequence of recent discussions
(http://lists.llvm.org/pipermail/llvm-dev/2018-May/123164.html), this patch
changes the SystemZ SchedModels so that the IssueWidth is 6, which is the
decoder capacity, and NumMicroOps become the number of decoder slots needed
per instruction.

In addition, the SchedWrite latencies now match the MachineInstructions
def-operand indexes, and ReadAdvances have been added on instructions with
one register operand and one memory operand.

Review: Ulrich Weigand
https://reviews.llvm.org/D47008

llvm-svn: 337538
2018-07-20 09:40:43 +00:00
Andrew V. Tischenko ee2e3144ba Improved sched model for X86 BSWAP* instrs.
Differential Revision: https://reviews.llvm.org/D49477

llvm-svn: 337537
2018-07-20 09:39:14 +00:00
Matt Arsenault 4bec7d4261 Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering"
Reverts r337079 with fix for msan error.

llvm-svn: 337535
2018-07-20 09:05:08 +00:00
Sander de Smalen 33f588acb9 [AArch64][SVE] Asm: Support for bit/byte reverse operations.
This patch adds the following instructions:

  RBIT      reverse bits within each active elemnt (predicated), e.g.
                rbit z0.d, p0/m, z1.d

            for 8, 16, 32 and 64 bit elements.

  REV       reverse order of elements in data/predicate vector
            (unpredicated), e.g.
                rev z0.d, z1.d
                rev p0.d, p1.d

            for 8, 16, 32 and 64 bit elements.

  REVB      reverse order of bytes within each active element, e.g.
                revb z0.d, p0/m, z1.d

            for 16, 32 and 64 bit elements.

  REVH      reverse order of 16-bit half-words within each active
            element, e.g.
                revh z0.d, p0/m, z1.d

            for 32 and 64 bit elements.

  REVW      reverse order of 32-bit words within each active element,
            e.g.
                revw z0.d, p0/m, z1.d

            for 64 bit elements.

llvm-svn: 337534
2018-07-20 09:00:44 +00:00
Sander de Smalen 3ed7f81ce1 [AArch64][SVE] Asm: Support for FTMAD instruction.
Floating-point trigonometric multiply-add coefficient,
e.g.

  ftmad z0.h, z0.h, z1.h, #7

with variants for 16, 32 and 64-bit elements.

llvm-svn: 337533
2018-07-20 08:47:26 +00:00
Heejin Ahn 022a37af77 [WebAssembly] Disable a test that violates DR1696
Summary:
lifetime2.C violates DR1696, which prevents reference members from being
initialized to temporaries, whose lifetime would end at the end of ctor.

Reviewers: sbc100

Subscribers: dschuff, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49577

llvm-svn: 337512
2018-07-20 00:13:42 +00:00
Chandler Carruth a3a03ac247 [x86/SLH] Clean up helper naming for return instruction handling and
remove dead declaration of a call instruction handling helper.

This moves to the 'harden' terminology that I've been trying to settle
on for returns. It also adds a really detailed comment explaining what
all we're trying to accomplish with return instructions and why.
Hopefully this makes it much more clear what exactly is being
"hardened".

Differential Revision: https://reviews.llvm.org/D49571

llvm-svn: 337510
2018-07-19 23:46:24 +00:00
Simon Pilgrim 1d181bc992 [X86][AVX] Use extract_subvector to reduce vector op widths (PR36761)
We have a number of cases where we fail to reduce vector op widths, performing the op in a larger vector and then extracting a subvector. This is often because by default it would create illegal types.

This peephole patch attempts to handle a few common cases detailed in PR36761, which typically involved extension+conversion to vX2f64 types.

Differential Revision: https://reviews.llvm.org/D49556

llvm-svn: 337500
2018-07-19 21:52:06 +00:00
Craig Topper 9888670c6b [X86] Fix some 'return SDValue()' after DCI.CombineTo instead return the output of CombineTo
Returning SDValue() means nothing was changed. Returning the result of CombineTo returns the first argument of CombineTo. This is specially detected by DAGCombiner as meaning that something changed, but worklist management was already taken care of.

I think the only real effect of this change is that we now properly update the Statistic the counts the number of combines performed. That's the only thing between the check for null and the check for N in the DAGCombiner.

llvm-svn: 337491
2018-07-19 20:10:44 +00:00
Stefan Pintilie 0c122d5a41 [Power9] Code Cleanup - Remove needsAggressiveScheduling()
As we already return true from needsAggressiveScheduling() for the most recent
hardware it would be cleaner to just return true for all PowerPC hardware.

Differential Revision: https://reviews.llvm.org/D48663

llvm-svn: 337488
2018-07-19 19:34:18 +00:00
Andrea Di Biagio b6022aa8d9 [X86][BtVer2] correctly model the latency/throughput of LEA instructions.
This patch fixes the latency/throughput of LEA instructions in the BtVer2
scheduling model.

On Jaguar, A 3-operands LEA has a latency of 2cy, and a reciprocal throughput of
1. That is because it uses one cycle of SAGU followed by 1cy of ALU1.  An LEA
with a "Scale" operand is also slow, and it has the same latency profile as the
3-operands LEA. An LEA16r has a latency of 3cy, and a throughput of 0.5 (i.e.
RThrouhgput of 2.0).

This patch adds a new TIIPredicate named IsThreeOperandsLEAFn to X86Schedule.td.
The tablegen backend (for instruction-info) expands that definition into this
(file X86GenInstrInfo.inc):
```
static bool isThreeOperandsLEA(const MachineInstr &MI) {
  return (
    (
      MI.getOpcode() == X86::LEA32r
      || MI.getOpcode() == X86::LEA64r
      || MI.getOpcode() == X86::LEA64_32r
      || MI.getOpcode() == X86::LEA16r
    )
    && MI.getOperand(1).isReg()
    && MI.getOperand(1).getReg() != 0
    && MI.getOperand(3).isReg()
    && MI.getOperand(3).getReg() != 0
    && (
      (
        MI.getOperand(4).isImm()
        && MI.getOperand(4).getImm() != 0
      )
      || (MI.getOperand(4).isGlobal())
    )
  );
}
```

A similar method is generated in the X86_MC namespace, and included into
X86MCTargetDesc.cpp (the declaration lives in X86MCTargetDesc.h).

Back to the BtVer2 scheduling model:
A new scheduling predicate named JSlowLEAPredicate now checks if either the
instruction is a three-operands LEA, or it is an LEA with a Scale value
different than 1.
A variant scheduling class uses that new predicate to correctly select the
appropriate latency profile.

Differential Revision: https://reviews.llvm.org/D49436

llvm-svn: 337469
2018-07-19 16:42:15 +00:00
Tim Northover a7c18ee8c4 ARM: switch armv7em MachO triple to hard-float defaults and libcalls.
We were emitting incorrect calls to libm functions that LLVM had decided it
knew about because the default is soft-float.

Recommitted without breaking ELF this time.

llvm-svn: 337450
2018-07-19 12:44:51 +00:00
Chandler Carruth 4b0028a3d1 [x86/SLH] Major refactoring of SLH implementaiton. There are two big
changes that are intertwined here:

1) Extracting the tracing of predicate state through the CFG to its own
   function.
2) Creating a struct to manage the predicate state used throughout the
   pass.

Doing #1 necessitates and motivates the particular approach for #2 as
now the predicate management is spread across different functions
focused on different aspects of it. A number of simplifications then
fell out as a direct consequence.

I went with an Optional to make it more natural to construct the
MachineSSAUpdater object.

This is probably the single largest outstanding refactoring step I have.
Things get a bit more surgical from here. My current goal, beyond
generally making this maintainable long-term, is to implement several
improvements to how we do interprocedural tracking of predicate state.
But I don't want to do that until the predicate state management and
tracing is in reasonably clear state.

Differential Revision: https://reviews.llvm.org/D49427

llvm-svn: 337446
2018-07-19 11:13:58 +00:00
Simon Pilgrim dd7bf598cc Fix spelling mistake in comments. NFCI.
llvm-svn: 337442
2018-07-19 09:14:39 +00:00
Heejin Ahn 47068a42d2 [WebAssembly] Add missing -mattr=+exception-handling guards
Summary:
The use of exception handling instructions should only be enabled with
`-mattr=+exception-handling` option.

Reviewers: jgravelle-google

Subscribers: dschuff, sbc100, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49391

llvm-svn: 337425
2018-07-18 21:42:22 +00:00
Tim Northover da142d10d9 Revert "ARM: switch armv7em triple to hard-float defaults and libcalls."
This reverts commit r337385 until it can be targeted at MachO only.

llvm-svn: 337424
2018-07-18 21:32:49 +00:00
Simon Pilgrim d4b82da113 [X86][SSE] Canonicalize scalar fp arithmetic shuffle patterns
As discussed on PR38197, this canonicalizes MOVS*(N0, OP(N0, N1)) --> MOVS*(N0, SCALAR_TO_VECTOR(OP(N0[0], N1[0])))

This returns the scalar-fp codegen lost by rL336971.

Additionally it handles the OP(N1, N0)) case for commutable (FADD/FMUL) ops.

Differential Revision: https://reviews.llvm.org/D49474

llvm-svn: 337419
2018-07-18 19:55:19 +00:00
Simon Atanasyan e9d7b3198a [mips] Fix predicate for the MipsTruncIntFP pattern
This is a follow-up to the rL337171. This patch fixes regression
introduced by the r337171 and enables MipsTruncIntFP pattern.

Differential revision: https://reviews.llvm.org/D49469

llvm-svn: 337392
2018-07-18 14:11:22 +00:00
Simon Pilgrim 3a45369b9e [X86][SSE] Remove BLENDPD canonicalization from combineTargetShuffle
When rL336971 removed the scalar-fp isel patterns, we lost the need for this canonicalization - commutation/folding can handle everything else.

llvm-svn: 337387
2018-07-18 13:01:20 +00:00
Tim Northover e00cf4fc68 ARM: stop explicitly marking armv7k libcalls as hard-float. NFC.
Since the triple's default is hard float, the libcalls will already use VFP
registers.

llvm-svn: 337386
2018-07-18 12:37:43 +00:00
Tim Northover d4abd14c1b ARM: switch armv7em triple to hard-float defaults and libcalls.
We were emitting incorrect calls to libm functions that LLVM had decided it
knew about because the default is soft-float.

llvm-svn: 337385
2018-07-18 12:37:04 +00:00
Tim Northover 097a3e3d95 ARM: deduplicate hard-float detection code. NFC.
ARMSubtarget had a copy/pasted block to determine whether the target was
hard-float, but it just delegated to triple features anyway so it's better at
the TargetMachine level.

llvm-svn: 337384
2018-07-18 12:36:25 +00:00
Sander de Smalen 330d887d72 [AArch64][SVE] Asm: Support for unpredicated FP operations.
This patch adds support for the following unpredicated
floating-point instructions:

  FADD      Floating point add
  FSUB      Floating point subtract
  FMUL      Floating point multiplication
  FTSMUL    Floating point trigonometric starting value
  FRECPS    Floating point reciprocal step
  FRSQRTS   Floating point reciprocal square root step

The instructions have the following assembly format:
  fadd z0.h, z1.h, z2.h
and have variants for 16, 32 and 64-bit FP elements.

llvm-svn: 337383
2018-07-18 11:59:12 +00:00
Daniel Cederman 959c8bf51c Revert "[Sparc] Use the IntPair reg class for r constraints with value type f64"
This reverts commit 55222c9183c6e07f53a54c4061677734f54feac1.

I missed that this patch has a dependency on https://reviews.llvm.org/D49219
that has not been approved yet.

llvm-svn: 337373
2018-07-18 10:05:30 +00:00
Sander de Smalen ccdc7ebc1d [AArch64][SVE] Asm: Support for UDOT/SDOT instructions.
The signed/unsigned DOT instructions perform a dot-product on
quadtuplets from two source vectors and accumulate the result in
the destination register. The instructions come in two forms:

Vector form, e.g.
  sdot  z0.s, z1.b, z2.b     - signed dot product on four 8-bit quad-tuplets,
                               accumulating results in 32-bit elements.

  udot  z0.d, z1.h, z2.h     - unsigned dot product on four 16-bit quad-tuplets,
                               accumulating results in 64-bit elements.

Indexed form, e.g.
  sdot  z0.s, z1.b, z2.b[3]  - signed dot product on four 8-bit quad-tuplets
                               with specified quadtuplet from second
                               source vector, accumulating results in 32-bit
                               elements.
  udot  z0.d, z1.h, z2.h[1]  - dot product on four 16-bit quad-tuplets
                               with specified quadtuplet from second
                               source vector, accumulating results in 64-bit
                               elements.

llvm-svn: 337372
2018-07-18 09:37:51 +00:00
Daniel Cederman 4e38df18ea [Sparc] Use the IntPair reg class for r constraints with value type f64
Summary: This is how it appears to be handled in GCC and it prevents a
"Unknown mismatch" error in the SelectionDAGBuilder.

Reviewers: venkatra, jyknight, jrtc27

Reviewed By: jyknight, jrtc27

Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D49218

llvm-svn: 337370
2018-07-18 09:25:33 +00:00
Sander de Smalen 889fe81ce5 [AArch64][SVE] Asm: Integer divide instructions.
This patch adds the following predicated instructions:

  UDIV    Unsigned divide active elements
  UDIVR   Unsigned divide active elements, reverse form.
  SDIV    Signed divide active elements
  SDIVR   Signed divide active elements, reverse form.

e.g.
  udiv  z0.s, p0/m, z0.s, z1.s
    (unsigned divide active elements in z0 by z1, store result in z0)

  sdivr z0.s, p0/m, z0.s, z1.s
    (signed divide active elements in z1 by z0, store result in z0)

llvm-svn: 337369
2018-07-18 09:17:29 +00:00
Sander de Smalen ac0cb5bf75 [AArch64][SVE] Asm: Support for integer MUL instructions.
This patch adds the following instructions:
  MUL   - multiply vectors, e.g.
    mul z0.h, p0/m, z0.h, z1.h
        - multiply with immediate, e.g.
    mul z0.h, z0.h, #127

  SMULH - signed multiply returning high half, e.g.
    smulh z0.h, p0/m, z0.h, z1.h

  UMULH - unsigned multiply returning high half, e.g.
    umulh z0.h, p0/m, z0.h, z1.h

llvm-svn: 337358
2018-07-18 08:10:03 +00:00
Craig Topper 92ea7a7b48 [X86] Enable commuting of VUNPCKHPD to VMOVLHPS to enable load folding by using VMOVLPS with a modified address.
This required an annoying amount of tablegen multiclass changes to make only VUNPCKHPDZ128rr commutable.

llvm-svn: 337357
2018-07-18 07:31:32 +00:00
Hiroshi Inoue cd83d459bc [NFC] fix trivial typos in comments
llvm-svn: 337351
2018-07-18 06:04:43 +00:00
Justin Hibbits 22e939a15b Fix build failures from r337347, found by clang
* Delete a no-longer-used override, and mark the other
getRegisterTypeForCallingConv() as override.
* SPE only supports i32, not i64, as the internal type, so simply remove
the type check, so that DestReg and Opc are provably always set.

GCC 6.4 did not warn about either of the above.

llvm-svn: 337350
2018-07-18 05:19:25 +00:00
Craig Topper 95063a45b8 [X86] Remove patterns that mix X86ISD::MOVLHPS/MOVHLPS with v2i64/v2f64 types.
The X86ISD::MOVLHPS/MOVHLPS should now only be emitted in SSE1 only. This means that the v2i64/v2f64 types would be illegal thus we don't need these patterns.

llvm-svn: 337349
2018-07-18 05:10:53 +00:00
Craig Topper 1425e10cc6 [X86] Generate v2f64 X86ISD::UNPCKL/UNPCKH instead of X86ISD::MOVLHPS/MOVHLPS for unary v2f64 {0,0} and {1,1} shuffles with SSE2.
I'm trying to restrict the MOVLHPS/MOVHLPS ISD nodes to SSE1 only. With SSE2 we can use unpcks. I believe this will allow some patterns to be cleaned up to require fewer bitcasts.

I've put in an odd isel hack to still select MOVHLPS instruction from the unpckh node to avoid changing tests and because movhlps is a shorter encoding. Ideally we'd do execution domain switching on this, but the operands are in the wrong order and are tied. We might be able to try a commute in the domain switching using custom code.

We already support domain switching for UNPCKLPD and MOVLHPS.

llvm-svn: 337348
2018-07-18 05:10:51 +00:00
Justin Hibbits d52990c71b Introduce codegen for the Signal Processing Engine
Summary:
The Signal Processing Engine (SPE) is found on NXP/Freescale e500v1,
e500v2, and several e200 cores.  This adds support targeting the e500v2,
as this is more common than the e500v1, and is in SoCs still on the
market.

This patch is very intrusive because the SPE is binary incompatible with
the traditional FPU.  After discussing with others, the cleanest
solution was to make both SPE and FPU features on top of a base PowerPC
subset, so all FPU instructions are now wrapped with HasFPU predicates.

Supported by this are:
* Code generation following the SPE ABI at the LLVM IR level (calling
conventions)
* Single- and Double-precision math at the level supported by the APU.

Still to do:
* Vector operations
* SPE intrinsics

As this changes the Callee-saved register list order, one test, which
tests the precise generated code, was updated to account for the new
register order.

Reviewed by: nemanjai
Differential Revision: https://reviews.llvm.org/D44830

llvm-svn: 337347
2018-07-18 04:25:10 +00:00
Justin Hibbits 4fa4fa6a73 Complete the SPE instruction set patterns
This is the lead-up to having SPE codegen.  Add the rest of the
instructions, along with MC tests.

Differential Revision:  https://reviews.llvm.org/D44829

llvm-svn: 337346
2018-07-18 04:24:57 +00:00
Justin Hibbits ceb3cd96f7 Add PowerPC e500(v2) core scheduler and directives.
Differential Revision: https://reviews.llvm.org/D44828

llvm-svn: 337345
2018-07-18 04:24:49 +00:00
Craig Topper a29f58dc31 [X86] Remove the vector alignment requirement from the patterns added in r337320.
The resulting instruction will only load 64 bits so alignment isn't required.

llvm-svn: 337334
2018-07-17 23:26:20 +00:00
Craig Topper 9ef92865ec [X86] Add patterns for folding full vector load into MOVHPS and MOVLPS with SSE1 only.
llvm-svn: 337320
2018-07-17 20:16:18 +00:00
Chandler Carruth c0cb5731fc [x86/SLH] Flesh out the data-invariant instruction table a bit based on feedback from Craig.
Summary:
The only thing he suggested that I've skipped here is the double-wide
multiply instructions. Multiply is an area I'm nervous about there being
some hidden data-dependent behavior, and it doesn't seem important for
any benchmarks I have, so skipping it and sticking with the minimal
multiply support that matches what I know is widely used in existing
crypto libraries. We can always add double-wide multiply when we have
clarity from vendors about its behavior and guarantees.

I've tried to at least cover the fundamentals here with tests, although
I've not tried to cover every width or permutation. I can add more tests
where folks think it would be helpful.

Reviewers: craig.topper

Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D49413

llvm-svn: 337308
2018-07-17 18:07:59 +00:00
Sam Clegg 28b3e99482 [WebAssembly] Update WebAssemblyLowerEmscriptenEHSjLj to handle separate compilation
Previously we were assuming whole program compilation. Now that
separate compilation is a thing we need to update this pass.
Firstly, it can no longer assert on the existence of malloc and free.
This functions might not be in the current translation unit.  If we
need them then we will generate not imports for them.

Secondly the global helper function we create should be marked as
weak since we will be generating a separate copy in each translation
unit.

Finally the names of the symbols used must be unique and fixed since
they need to agree across translation units.

Differential Revision: https://reviews.llvm.org/D49263

llvm-svn: 337301
2018-07-17 16:40:03 +00:00
Craig Topper 9187bca71b [X86] Remove some standalone patterns in favor of the patterns in the MOVLPD instruction definitions.
Previously we passed 'null_frag' into the instruction definition. The multiclass is shared with MOVHPD which doesn't use null_frag. It turns out by passing X86Movsd it produces patterns equivalent to some standalone patterns.

llvm-svn: 337299
2018-07-17 16:24:33 +00:00
Sander de Smalen 7b33faaf38 [AArch64][SVE]: Integer multiply-add/subtract instructions.
This patch adds support for the following instructions:
  MLA  mul-add, writing addend       (Zda = Zda +   Zn * Zm)
  MLS  mul-sub, writing addend       (Zda = Zda +  -Zn * Zm)
  MAD  mul-add, writing multiplicant (Zdn =  Za +  Zdn * Zm)
  MSB  mul-sub, writing multiplicant (Zdn =  Za + -Zdn * Zm)

llvm-svn: 337293
2018-07-17 15:41:58 +00:00
Petar Jovanovic f10e4798b4 [Mips][FastISel] Fix handling of icmp with i1 type
The Mips FastISel back-end does not extend i1 values while lowering icmp.
Ensure that we bail into DAG ISel when handling this case.

Patch by Dragan Mladjenovic.

Differential Revision: https://reviews.llvm.org/D49290

llvm-svn: 337288
2018-07-17 14:57:46 +00:00
Sander de Smalen f41dd122d6 [AArch64][SVE] Asm: FP fused multiply-add/subtract instructions.
This patch adds support for the following instructions:

  FMLA    mul-add, writing addend                (Zda =  Zda +   Zn * Zm)
  FNMLA   negated mul-add, writing addend        (Zda = -Zda +  -Zn * Zm)

  FMLS    mul-sub, writing addend                (Zda =  Zda +  -Zn * Zm)
  FNMLS   negated mul-sub, writing addend        (Zda = -Zda +   Zn * Zm)

  FMAD    mul-add, writing multiplicant          (Zdn =  Za  +  Zdn * Zm)
  FNMAD   negated mul-add, writing multiplicant  (Zdn = -Za  + -Zdn * Zm)

  FMSB    mul-sub, writing multiplicant          (Zdn =  Za  + -Zdn * Zm)
  FNMSB   negated mul-sub, writing multiplicant  (Zdn = -Za  +  Zdn * Zm)

llvm-svn: 337282
2018-07-17 13:58:46 +00:00
Sander de Smalen 5dabcf887b [AArch64][SVE] Asm: Support for predicated FP operations (FP immediate)
This patch completes support for the following floating point
instructions that take FP immediates:
  FADD*  (addition)
  FSUB   (subtract)
  FSUBR  (subtract reverse form)
  FMUL*  (multiplication)
  FMAX*  (maximum)
  FMAXNM (maximum number)
  FMIN   (maximum)
  FMINNM (maximum number)

All operations are predicated and take a FP immediate operand,
e.g.

  fadd z0.h, p0/m, z0.h, #0.5
  fmin z0.s, p0/m, z0.s, #1.0
        ^___________^ (tied)

* Instructions added in a previous patch.

llvm-svn: 337272
2018-07-17 12:36:08 +00:00
whitequark 7c4a074505 [LLVM-C] Add target triple normalization to the C API.
rL333307 was introduced to remove automatic target triple
normalization when calling sys::getDefaultTargetTriple(), arguing
that users of the latter already called Triple::normalize()
if necessary. However, users of the C API currently have no way of
doing target triple normalization.

This patch introduces an LLVMNormalizeTargetTriple function to
the C API which wraps Triple::normalize() and can be used on
the result of LLVMGetDefaultTargetTriple to achieve the same effect.

Differential Revision: https://reviews.llvm.org/D49414

Reviewed By: whitequark

llvm-svn: 337263
2018-07-17 10:57:39 +00:00
Sander de Smalen 3b9e342ae1 [AArch64][SVE] Asm: Support for predicated FP operations.
This patch adds support for the following floating point
instructions:
  FABD   (absolute difference)
  FADD   (addition)
  FSUB   (subtract)
  FSUBR  (subtract reverse form)
  FDIV   (divide)
  FDIVR  (divide reverse form)
  FMAX   (maximum)
  FMAXNM (maximum number)
  FMIN   (minimum)
  FMINNM (minimum number)
  FSCALE (adjust exponent)
  FMULX  (multiply extended)

All operations are predicated and binary form, e.g.

  fadd z0.h, p0/m, z0.h, z1.h
        ^___________^ (tied)

Supporting 16, 32 and 64-bit FP elements.

llvm-svn: 337259
2018-07-17 09:48:57 +00:00
Simon Pilgrim e4d12bb2d6 [DAGCombiner] Call SimplifyDemandedVectorElts from EXTRACT_VECTOR_ELT
If we are only extracting vector elements via EXTRACT_VECTOR_ELT(s) we may be able to use SimplifyDemandedVectorElts to avoid unnecessary vector ops.

Differential Revision: https://reviews.llvm.org/D49262

llvm-svn: 337258
2018-07-17 09:45:35 +00:00
Sander de Smalen ec229abb9b [AArch64][SVE] Asm: Support for SPLICE instruction.
The SPLICE instruction splices two vectors into one vector using a
predicate. It copies the active elements from the first vector, and
then fills the remaining elements with the low-numbered elements from
the second vector.

The instruction has the following form, e.g.

  splice z0.b, p0, z0.b, z1.b

for 8-bit elements. It also supports 16, 32 and
64-bit elements.

llvm-svn: 337253
2018-07-17 08:52:45 +00:00
Sander de Smalen 78fe30a662 [AArch64][SVE] Asm: Support for EXT instruction.
This patch adds an instruction that allows extracting
a vector from a pair of vectors, given an immediate index
that describes the element position to extract from.

The instruction has the following assembly:
  ext z0.b, z0.b, z1.b, #imm

where #imm is an immediate between 0 and 255.

llvm-svn: 337251
2018-07-17 08:39:48 +00:00
Craig Topper 880f92ad71 [X86] Properly qualify some MOVSS/MOVSD patterns with OptSize.
These are integer versions of patterns that I already fixed for floating point.

llvm-svn: 337240
2018-07-17 06:24:16 +00:00
Daniel Cederman c812dcc3db [Sparc] Do not depend on icc for ta 1
The ta instruction will always trap, regardless of the value
of the integer condition codes. TRAPri is marked as using icc,
so we cannot use a pattern for TRAPri to implement ta 1, as
verify-machineinstrs can complain that icc is not defined.
Instead we implement ta 1 the same way as ta 5.

llvm-svn: 337236
2018-07-17 05:49:33 +00:00
Craig Topper c376a1916b [X86] Add full set of patterns for turning ceil/floor/trunc/rint/nearbyint into rndscale with loads, broadcast, and masking.
This amounts to pretty ridiculous number of patterns. Ideally we'd canonicalize the X86ISD::VRNDSCALE earlier to reuse those patterns. I briefly looked into doing that, but some strict FP operations could still get converted to rint and nearbyint during isel. It's probably still worthwhile to look into. This patch is meant as a starting point to work from.

llvm-svn: 337234
2018-07-17 05:48:48 +00:00
Craig Topper 6751727d76 [X86] Add a missing FMA3 scalar intrinsic pattern.
This allows us to use 231 form to fold an insertelement on the add input to the fma. There is technically no software intrinsic that can use this until AVX512F, but it can be manually built up from other intrinsics.

llvm-svn: 337223
2018-07-16 23:10:58 +00:00
Sam Clegg cf2a9e28b1 [WebAssembly] Remove ELF file support.
This support was partial and temporary.  Now that we have
wasm object file support its no longer needed.

Differential Revision: https://reviews.llvm.org/D48744

llvm-svn: 337222
2018-07-16 23:09:29 +00:00
Farhana Aleen c370d7b33d [AMDGPU] [AMDGPU] Support a fdot2 pattern.
Summary: Optimize fma((float)S0.x, (float)S1.x fma((float)S0.y, (float)S1.y, z))
                   -> fdot2((v2f16)S0, (v2f16)S1, (float)z)

Author: FarhanaAleen

Reviewed By: rampitec, b-sumner

Subscribers: AMDGPU

Differential Revision: https://reviews.llvm.org/D49146

llvm-svn: 337198
2018-07-16 18:19:59 +00:00
Chandler Carruth fa065aa75c [x86/SLH] Completely rework how we sink post-load hardening past data
invariant instructions to be both more correct and much more powerful.

While testing, I continued to find issues with sinking post-load
hardening. Unfortunately, it was amazingly hard to create any useful
tests of this because we were mostly sinking across copies and other
loading instructions. The fact that we couldn't sink past normal
arithmetic was really a big oversight.

So first, I've ported roughly the same set of instructions from the data
invariant loads to also have their non-loading varieties understood to
be data invariant. I've also added a few instructions that came up so
often it again made testing complicated: inc, dec, and lea.

With this, I was able to shake out a few nasty bugs in the validity
checking. We need to restrict to hardening single-def instructions with
defined registers that match a particular form: GPRs that don't have
a NOREX constraint directly attached to their register class.

The (tiny!) test case included catches all of the issues I was seeing
(once we can sink the hardening at all) except for the NOREX issue. The
only test I have there is horrible. It is large, inexplicable, and
doesn't even produce an error unless you try to emit encodings. I can
keep looking for a way to test it, but I'm out of ideas really.

Thanks to Ben for giving me at least a sanity-check review. I'll follow
up with Craig to go over this more thoroughly post-commit, but without
it SLH crashes everywhere so landing it for now.

Differential Revision: https://reviews.llvm.org/D49378

llvm-svn: 337177
2018-07-16 14:58:32 +00:00
Simon Atanasyan 738a6e449e [mips] Eliminate the usage of hasStdEnc in MipsPat.
Instead, the pattern is tagged with the correct predicate when
it is declared. Some patterns have been duplicated as necessary.

Patch by Simon Dardis.

Differential revision: https://reviews.llvm.org/D48365

llvm-svn: 337171
2018-07-16 13:52:41 +00:00
Petar Jovanovic 021e4c82eb [MIPS GlobalISel] Select instructions to load and store i32 on stack
Add code for selection of G_LOAD, G_STORE, G_GEP, G_FRAMEINDEX and
G_CONSTANT. Support loads and stores of i32 values.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D48957

llvm-svn: 337168
2018-07-16 13:29:32 +00:00
Roman Lebedev de506632aa [X86][AArch64][DAGCombine] Unfold 'check for [no] signed truncation' pattern
Summary:

[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.

But the IR-optimal patter does not lower efficiently, so we want to undo it..

This handles the simple pattern.
There is a second pattern with predicate and constants inverted.

NOTE: we do not check uses here. we always do the transform.

Reviewers: spatel, craig.topper, RKSimon, javed.absar

Reviewed By: spatel

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D49266

llvm-svn: 337166
2018-07-16 12:44:10 +00:00
Daniel Cederman 92a700a215 [Sparc] Use the correct encoding for ta 3
Summary: The old encoding generated a "tn %g1 + 3" instruction instead
of the expected "ta 3".

Reviewers: venkatra, jyknight

Reviewed By: jyknight

Subscribers: fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D49171

llvm-svn: 337165
2018-07-16 12:28:26 +00:00
Daniel Cederman ab09da7b57 [Sparc] Use the names .rem and .urem instead of __modsi3 and __umodsi3
Summary: These are the names used in libgcc.

Reviewers: venkatra, jyknight, ekedaigle

Reviewed By: jyknight

Subscribers: joerg, fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D48915

llvm-svn: 337164
2018-07-16 12:22:08 +00:00
Daniel Cederman 68765757d4 [Sparc] Generate ta 1 for the @llvm.debugtrap intrinsic
Summary: Software trap number one is the trap used for breakpoints
in the Sparc ABI.

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D48637

llvm-svn: 337163
2018-07-16 12:16:53 +00:00
Chandler Carruth e66a6f48e3 [x86/SLH] Fix a bug where we would try to post-load harden non-GPRs.
Found cases that hit the assert I added. This patch factors the validity
checking into a nice helper routine and calls it when deciding to harden
post-load, and asserts it when doing so later.

I've added tests for the various ways of loading a floating point type,
as well as loading all vector permutations. Even though many of these go
to identical instructions, it seems good to somewhat comprehensively
test them.

I'm confident there will be more fixes needed here, I'll try to add
tests each time as I get this predicate adjusted.

llvm-svn: 337160
2018-07-16 11:38:48 +00:00
Chandler Carruth 3620b9957a [x86/SLH] Extract another small helper function, add better comments and
use better terminology. NFC.

llvm-svn: 337157
2018-07-16 10:46:16 +00:00
Mark Searles f4e7025299 [AMDGPU][Waitcnt] Re-apply fix "comparison of integers of different signs" build error"
Re-apply "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error""
( fe0a456510131f268e388c4a18a92f575c0db183 ), which was inadvertantly reverted via
2b2ee080f0164485562593b1b87291a48cea4a9a .

llvm-svn: 337156
2018-07-16 10:21:36 +00:00
Mark Searles 72da47df25 run post-RA hazard recognizer pass late
Memory legalizer, waitcnt, and shrink  passes can perturb the instructions,
which means that the post-RA hazard recognizer pass should run after them.
Otherwise, one of those passes may invalidate the work done by the hazard
recognizer. Note that this has adverse side-effect that any consecutive
S_NOP 0's, emitted by the hazard recognizer, will not be shrunk into a
single S_NOP <N>. This should be addressed in a follow-on patch.

Differential Revision: https://reviews.llvm.org/D49288

llvm-svn: 337154
2018-07-16 10:02:41 +00:00
Mark Searles c2d5d9adb5 Revert "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error"
This reverts commit fe0a456510131f268e388c4a18a92f575c0db183.

llvm-svn: 337153
2018-07-16 10:02:40 +00:00
Craig Topper 07a1787501 [X86] Merge the FR128 and VR128 regclass since they have identical spill and alignment characteristics.
This unfortunately requires a bunch of bitcasts to be added added to SUBREG_TO_REG, COPY_TO_REGCLASS, and instructions in output patterns. Otherwise tablegen seems to default to picking f128 and then we fail when something tries to get the register class for f128 which isn't always valid.

The test changes are because we were previously mixing fr128 and vr128 due to contrainRegClass finding FR128 first and passes like live range shrinking weren't handling that well.

llvm-svn: 337147
2018-07-16 06:56:09 +00:00
Chandler Carruth bc46bca99e [x86/SLH] Fix an unused variable warning in release builds after
r337144.

llvm-svn: 337145
2018-07-16 04:42:27 +00:00
Chandler Carruth cdf0addc65 [x86/SLH] Teach speculative load hardening to correctly harden the
indices used by AVX2 and AVX-512 gather instructions.

The index vector is hardened by broadcasting the predicate state
into a vector register and then or-ing. We don't even have to worry
about EFLAGS here.

I've added a test for all of the gather intrinsics to make sure that we
don't miss one. A particularly interesting creation is the gather
prefetch, which needs to be marked as potentially "loading" to get the
correct behavior. It's a memory access in many ways, and is actually
relevant for SLH. Based on discussion with Craig in review, I've moved
it to be `mayLoad` and `mayStore` rather than generic side effects. This
matches how we model other prefetch instructions.

Many thanks to Craig for the review here.

Differential Revision: https://reviews.llvm.org/D49336

llvm-svn: 337144
2018-07-16 04:17:51 +00:00
Chandler Carruth 3ffcc0383e [x86/SLH] Extract one of the bits of logic to its own function. NFC.
This is just a refactoring to start cleaning up the code here and make
it more readable and approachable.

llvm-svn: 337138
2018-07-15 23:46:36 +00:00
Craig Topper cdb0ed2910 [X86] Add custom execution domain fixing for 128/256-bit integer logic operations with AVX512F, but not AVX512DQ.
AVX512F only has integer domain logic instructions. AVX512DQ added FP domain logic instructions.

Execution domain fixing runs before EVEX->VEX. So if we have AVX512F and not AVX512DQ we fail to do execution domain switching of the logic operations. This leads to mismatches in execution domain and more test differences.

This patch adds custom domain fixing that switches EVEX integer logic operations to VEX fp logic operations if XMM16-31 are not used.

llvm-svn: 337137
2018-07-15 23:32:36 +00:00
Craig Topper d553ff3e2e [X86] Add load patterns for cases where we select X86Movss/X86Movsd to blend instructions.
This allows us to fold the load during isel without waiting for the peephole pass to do it.

llvm-svn: 337136
2018-07-15 21:49:01 +00:00
Craig Topper ec0038398a [X86] Use 128-bit blends instead vmovss/vmovsd for 512-bit vzmovl patterns to match AVX.
llvm-svn: 337135
2018-07-15 18:51:08 +00:00
Craig Topper 8f34858779 [X86] Use 128-bit ops for 256-bit vzmovl patterns.
128-bit ops implicitly zero the upper bits. This should address the comment about domain crossing for the integer version without AVX2 since we can use a 128-bit VBLENDW without AVX2.

The only bad thing I see here is that we failed to reuse an vxorps in some of the tests, but I think that's already known issue.

llvm-svn: 337134
2018-07-15 18:51:07 +00:00
Andrea Di Biagio ff630c2cdc [llvm-mca][BtVer2] teach how to identify false dependencies on partially written
registers.

The goal of this patch is to improve the throughput analysis in llvm-mca for the
case where instructions perform partial register writes.

On x86, partial register writes are quite difficult to model, mainly because
different processors tend to implement different register merging schemes in
hardware.

When the code contains partial register writes, the IPC (instructions per
cycles) estimated by llvm-mca tends to diverge quite significantly from the
observed IPC (using perf).

Modern AMD processors (at least, from Bulldozer onwards) don't rename partial
registers. Quoting Agner Fog's microarchitecture.pdf:
" The processor always keeps the different parts of an integer register together.
For example, AL and AH are not treated as independent by the out-of-order
execution mechanism. An instruction that writes to part of a register will
therefore have a false dependence on any previous write to the same register or
any part of it."

This patch is a first important step towards improving the analysis of partial
register updates. It changes the semantic of RegisterFile descriptors in
tablegen, and teaches llvm-mca how to identify false dependences in the presence
of partial register writes (for more details: see the new code comments in
include/Target/TargetSchedule.h - class RegisterFile).

This patch doesn't address the case where a write to a part of a register is
followed by a read from the whole register.  On Intel chips, high8 registers
(AH/BH/CH/DH)) can be stored in separate physical registers. However, a later
(dirty) read of the full register (example: AX/EAX) triggers a merge uOp, which
adds extra latency (and potentially affects the pipe usage).
This is a very interesting article on the subject with a very informative answer
from Peter Cordes:
https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to

In future, the definition of RegisterFile can be extended with extra information
that may be used to identify delays caused by merge opcodes triggered by a dirty
read of a partial write.

Differential Revision: https://reviews.llvm.org/D49196

llvm-svn: 337123
2018-07-15 11:01:38 +00:00
Dylan McKay 0603bae41c [AVR] Document some public functions
llvm-svn: 337122
2018-07-15 07:24:27 +00:00
Craig Topper 60ce856134 [X86] Add some optsize patterns for 256-bit X86vzmovl.
These patterns use VMOVSS/SD. Without optsize we use BLENDI instead.

llvm-svn: 337119
2018-07-15 06:03:19 +00:00
Chandler Carruth fb503ac027 [x86/SLH] Fix an issue where we wouldn't harden any loads if we found
no conditions.

This is only valid to do if we're hardening calls and rets with LFENCE
which results in an LFENCE guarding the entire entry block for us.

llvm-svn: 337089
2018-07-14 09:32:37 +00:00
Craig Topper 7426cf6717 [X86] Fix a subtle bug in the custom execution domain fixing for blends.
The code tried to find the immediate by using getNumOperands() on the MachineInstr, but there might be implicit-defs after the immediate that get counted.

Instead use getNumOperands() from the instruction description which will only count the operands that are defined in the td file.

llvm-svn: 337088
2018-07-14 06:30:30 +00:00
Craig Topper f0b164415c [X86] Prefer blendi over movss/sd when avx512 is enabled unless optimizing for size.
AVX512 doesn't have an immediate controlled blend instruction. But blend throughput is still better than movss/sd on SKX.

This commit changes AVX512 to use the AVX blend instructions instead of MOVSS/MOVSD. This constrains the register allocation since it won't be able to use XMM16-31, but hopefully the increased throughput and reduced port 5 pressure makes up for that.

llvm-svn: 337083
2018-07-14 02:05:08 +00:00
Evgeniy Stepanov 1971ba097d Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering"
This reverts commit r337021.

WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7
    #1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3
    #2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3
    #3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18
    #4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23
    #5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5
    #6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const*, llvm::raw_ostream&, char const*) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5
    #7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3
    #8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26

[...]

Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv'
    #0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192

llvm-svn: 337079
2018-07-14 01:20:53 +00:00
Chandler Carruth a24fe067c0 [x86/SLH] Add an assert to catch if we ever end up trying to harden
post-load a register that isn't valid for use with OR or SHRX.

llvm-svn: 337078
2018-07-14 00:52:09 +00:00
Krzysztof Parzyszek 7ced04c0fd [Hexagon] Avoid introducing calls into coalesced range of HVX vector pairs
If an HVX vector register is to be coalesced into a vector pair, make
sure that the vector pair will not have a function call in its live range,
unless it already had one. All HVX vector registers are volatile, so
any vector register live across a function call will have to be spilled.

If a vector needs to be spilled, and it's coalesced into a vector pair
then the whole pair will need to be spilled (even if only a part of it is
live), taking extra stack space.

llvm-svn: 337073
2018-07-13 23:42:29 +00:00
Craig Topper 8315667d99 [X86][SLH] Remove PDEP and PEXT from isDataInvariantLoad
Ryzen has something like an 18 cycle latency on these based on Agner's data. AMD's own xls is blank. So it seems like there might be something tricky here.

Agner's data for Intel CPUs indicates these are a single uop there.

Probably safest to remove them. We never generate them without an intrinsic so this should be ok.

Differential Revision: https://reviews.llvm.org/D49315

llvm-svn: 337067
2018-07-13 22:41:52 +00:00
Craig Topper 41fa858262 [X86][SLH] Add VEX and EVEX conversion instructions to isDataInvariantLoad
-Drop the intrinsic versions of conversion instructions. These should be handled when we do vectors. They shouldn't show up in scalar code.
-Add the float<->double conversions which were missing.
-Add the AVX512 and AVX version of the conversion instructions including the unsigned integer conversions unique to AVX512

Differential Revision: https://reviews.llvm.org/D49313

llvm-svn: 337066
2018-07-13 22:41:50 +00:00
Craig Topper 445abf700d [X86][SLH] Regroup the instructions in isDataInvariantLoad a little. NFC
-Move BSF/BSR to the same group as TZCNT/LZCNT/POPCNT.
-Split some of the bit manipulation instructions away from TZCNT/LZCNT/POPCNT. These are things like 'x & (x - 1)' which are composed of a few simple arithmetic operations. These aren't nearly as complicated/surprising as counting bits.
-Move BEXTR/BZHI into their own group. They aren't like a simple arithmethic op or the bit manipulation instructions. They're more like a shift+and.

Differential Revision: https://reviews.llvm.org/D49312

llvm-svn: 337065
2018-07-13 22:41:46 +00:00
Craig Topper 2260e4149a [X86] Use the correct types in some recently added isel patterns.
These were supposed to be integer types since we are selecting integer instructions.

Found while preparing to remove these patterns for another patch.

llvm-svn: 337057
2018-07-13 22:27:53 +00:00
Tom Stellard ac68471326 AMDGPU/GlobalISel: Implement select() for 32-bit @llvm.minnun and @llvm.maxnum
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D46172

llvm-svn: 337056
2018-07-13 22:16:03 +00:00
Craig Topper da424ba1c5 [X86][FastISel] Support uitofp with avx512.
llvm-svn: 337055
2018-07-13 22:09:30 +00:00
Fangrui Song dcdc9ac7a2 [X86] Correct comment of TEST elimination in BSF/TZCNT
llvm-svn: 337052
2018-07-13 21:40:08 +00:00
Tom Stellard 390a5f4774 AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.exp
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45882

llvm-svn: 337046
2018-07-13 21:05:14 +00:00
Craig Topper f0831eef0b [X86][FastISel] Add EVEX support to sitofp handling.
llvm-svn: 337045
2018-07-13 21:03:43 +00:00
Fangrui Song 90d9c201dc [X86] Try fixing r336768
llvm-svn: 337043
2018-07-13 20:54:24 +00:00
Matt Arsenault d362b6ad29 AMDGPU: Properly handle shader inputs with split arguments
This needs to refer to arguments by their original argument
index, not the argument split index which depends on what
the type splitting decides to do.

Also avoid increment PSInputNum for each split piece.

llvm-svn: 337022
2018-07-13 16:40:37 +00:00
Matt Arsenault de95077780 AMDGPU: Fix handling of alignment padding in DAG argument lowering
This was completely broken if there was ever a struct argument, as
this information is thrown away during the argument analysis.

The offsets as passed in to LowerFormalArguments are not useful,
as they partially depend on the legalized result register type,
and they don't consider the alignment in the first place.

Ignore the Ins array, and instead figure out from the raw IR type
what we need to do. This seems to fix the padding computation
if the DAG lowering is forced (and stops breaking arguments
following padded arguments if the arguments were only partially
lowered in the IR)

llvm-svn: 337021
2018-07-13 16:40:25 +00:00
Sjoerd Meijer ceabd50a5c [AArch64] Armv8.4-A: LDAPR & STLR with immediate offset instructions (cont'd)
Follow up of rL336913: fix base class description. Thanks to Ahmed Bougacha
for pointing this out.

Differential Revision: https://reviews.llvm.org/D49284

llvm-svn: 337009
2018-07-13 15:25:42 +00:00
Nemanja Ivanovic 080c35050e [PowerPC] Materialize more constants with CR-field set in late peephole
Revision r322373 fixed a bug in how we materialize constants when the CR-field
needs to be set.

However the fix is overly conservative. It will only do the transform if
AND-ing the input with the new constant produces the same new constant.
This is of course correct, but not necessarily required.

If there are no futher uses of the constant, the constant can be changed.
If there are no uses of the GPR result, the final result of the materialization
isn't important other than it needs to compare to zero correctly (lt, gt, eq).

Differential revision: https://reviews.llvm.org/D42109

llvm-svn: 337008
2018-07-13 15:21:03 +00:00
Joel Galenson 06e7e5798f [cfi-verify] Support AArch64.
This patch adds support for AArch64 to cfi-verify.

This required three changes to cfi-verify.  First, it generalizes checking if an instruction is a trap by adding a new isTrap flag to TableGen (and defining it for x86 and AArch64).  Second, the code that ensures that the operand register is not clobbered between the CFI check and the indirect call needs to allow a single dereference (in x86 this happens as part of the jump instruction).  Third, we needed to ensure that return instructions are not counted as indirect branches.  Technically, returns are indirect branches and can be covered by CFI, but LLVM's forward-edge CFI does not protect them, and x86 does not consider them, so we keep that behavior.

In addition, we had to improve AArch64's code to evaluate the branch target of a MCInst to handle calls where the destination is not the first operand (which it often is not).

Differential Revision: https://reviews.llvm.org/D48836

llvm-svn: 337007
2018-07-13 15:19:33 +00:00
Erich Keane ac8cb22ad5 Add parens to silence Wparentheses warning, introduced by 336990
llvm-svn: 337002
2018-07-13 14:43:20 +00:00
Ulrich Weigand c48aefb63b [TableGen] Support multi-alternative pattern fragments
A TableGen instruction record usually contains a DAG pattern that will
describe the SelectionDAG operation that can be implemented by this
instruction. However, there will be cases where several different DAG
patterns can all be implemented by the same instruction. The way to
represent this today is to write additional patterns in the Pattern
(or usually Pat) class that map those extra DAG patterns to the
instruction. This usually also works fine.

However, I've noticed cases where the current setup seems to require
quite a bit of extra (and duplicated) text in the target .td files.
For example, in the SystemZ back-end, there are quite a number of
instructions that can implement an "add-with-overflow" operation.
The same instructions also need to be used to implement just plain
addition (simply ignoring the extra overflow output). The current
solution requires creating extra Pat pattern for every instruction,
duplicating the information about which particular add operands
map best to which particular instruction.

This patch enhances TableGen to support a new PatFrags class, which
can be used to encapsulate multiple alternative patterns that may
all match to the same instruction.  It operates the same way as the
existing PatFrag class, except that it accepts a list of DAG patterns
to match instead of just a single one.  As an example, we can now define
a PatFrags to match either an "add-with-overflow" or a regular add
operation:

  def z_sadd : PatFrags<(ops node:$src1, node:$src2),
                        [(z_saddo node:$src1, node:$src2),
                         (add node:$src1, node:$src2)]>;

and then use this in the add instruction pattern:

  defm AR : BinaryRRAndK<"ar", 0x1A, 0xB9F8, z_sadd, GR32, GR32>;

These SystemZ target changes are implemented here as well.


Note that PatFrag is now defined as a subclass of PatFrags, which
means that some users of internals of PatFrag need to be updated.
(E.g. instead of using PatFrag.Fragment you now need to use
!head(PatFrag.Fragments).)


The implementation is based on the following main ideas:
- InlinePatternFragments may now replace each original pattern
  with several result patterns, not just one.
- parseInstructionPattern delays calling InlinePatternFragments
  and InferAllTypes.  Instead, it extracts a single DAG match
  pattern from the main instruction pattern.
- Processing of the DAG match pattern part of the main instruction
  pattern now shares most code with processing match patterns from
  the Pattern class.
- Direct use of main instruction patterns in InferFromPattern and
  EmitResultInstructionAsOperand is removed; everything now operates
  solely on DAG match patterns.


Reviewed by: hfinkel

Differential Revision: https://reviews.llvm.org/D48545

llvm-svn: 336999
2018-07-13 13:18:00 +00:00
Chandler Carruth 90358e1ef1 [SLH] Introduce a new pass to do Speculative Load Hardening to mitigate
Spectre variant #1 for x86.

There is a lengthy, detailed RFC thread on llvm-dev which discusses the
high level issues. High level discussion is probably best there.

I've split the design document out of this patch and will land it
separately once I update it to reflect the latest edits and updates to
the Google doc used in the RFC thread.

This patch is really just an initial step. It isn't quite ready for
prime time and is only exposed via debugging flags. It has two major
limitations currently:
1) It only supports x86-64, and only certain ABIs. Many assumptions are
   currently hard-coded and need to be factored out of the code here.
2) It doesn't include any options for more fine-grained control, either
   of which control flow edges are significant or which loads are
   important to be hardened.
3) The code is still quite rough and the testing lighter than I'd like.

However, this is enough for people to begin using. I have had numerous
requests from people to be able to experiment with this patch to
understand the trade-offs it presents and how to use it. We would also
like to encourage work to similar effect in other toolchains.

The ARM folks are actively developing a system based on this for
AArch64. We hope to merge this with their efforts when both are far
enough along. But we also don't want to block making this available on
that effort.

Many thanks to the *numerous* people who helped along the way here. For
this patch in particular, both Eric and Craig did a ton of review to
even have confidence in it as an early, rough cut at this functionality.

Differential Revision: https://reviews.llvm.org/D44824

llvm-svn: 336990
2018-07-13 11:13:58 +00:00
Chandler Carruth caa7b03a50 [x86] Teach the EFLAGS copy lowering to handle much more complex control
flow patterns including forks, merges, and even cyles.

This tries to cover a reasonably comprehensive set of patterns that
still don't require PHIs or PHI placement. The coverage was inspired by
the amazing variety of patterns produced when copy EFLAGS and restoring
it to implement Speculative Load Hardening. Without this patch, we
simply cannot make such complex and invasive changes to x86 instruction
sequences due to EFLAGS.

I've added "just" one test, but this test covers many different
complexities and corner cases of this approach. It is actually more
comprehensive, as far as I can tell, than anything that I have
encountered in the wild on SLH.

Because the test is so complex, I've tried to give somewhat thorough
comments and an ASCII-art diagram of the control flows to make it a bit
easier to read and maintain long-term.

Differential Revision: https://reviews.llvm.org/D49220

llvm-svn: 336985
2018-07-13 09:39:10 +00:00
Sander de Smalen 7c3c0f24a3 [AArch64][SVE] Asm: Vector Unpack Low/High instructions.
This patch adds support for the following unpack instructions:
  
- PUNPKLO, PUNPKHI   Unpack elements from low/high half and
                     place into elements of twice their size.

  e.g. punpklo p0.h, p0.b

- UUNPKLO, UUNPKHI   Unpack elements from low/high half and 
  SUNPKLO, SUNPKHI   place into elements of twice their size
                     after zero- or sign-extending the values.

  e.g. uunpklo z0.h, z0.b

llvm-svn: 336982
2018-07-13 09:25:43 +00:00
Sander de Smalen faee91a52b [AArch64][SVE] Asm: Support for insert element (INSR) instructions.
Insert general purpose register into shifted vector, e.g.
  insr    z0.s, w0
  insr    z0.d, x0

Insert SIMD&FP scalar register into shifted vector, e.g.
  insr    z0.b, b0
  insr    z0.h, h0
  insr    z0.s, s0
  insr    z0.d, d0

llvm-svn: 336979
2018-07-13 08:51:57 +00:00
Craig Topper a8efe59a0a [X86] Prefer MOVSS/SD over BLEND under optsize in isel.
Previously we iseled to blend, commuted to another blend, and then commuted back to movss/movsd or blend depending on optsize. Now we do it directly.

llvm-svn: 336976
2018-07-13 06:25:31 +00:00
Craig Topper 2ab325ba23 [X86] Remove isel patterns that turns packed add/sub/mul/div+movss/sd into scalar intrinsic instructions.
This is not an optimization we should be doing in isel. This is more suitable for a DAG combine.

My main concern is a future time when we support more FPENV. Changing a packed op to a scalar op could cause us to miss some exceptions that should have occured if we had done a packed op. A DAG combine would be better able to manage this.

llvm-svn: 336971
2018-07-13 04:50:39 +00:00
Matthias Braun 90ad6835dd CodeGen: Remove pipeline dependencies on StackProtector; NFC
This re-applies r336929 with a fix to accomodate for the Mips target
scheduling multiple SelectionDAG instances into the pass pipeline.

PrologEpilogInserter and StackColoring depend on the StackProtector analysis
being alive from the point it is run until PEI, which requires that they are all
scheduled in the same FunctionPassManager. Inserting a (machine) ModulePass
between StackProtector and PEI results in these passes being in separate
FunctionPassManagers and the StackProtector is not available for PEI.

PEI and StackColoring don't use much information from the StackProtector pass,
so transfering the required information to MachineFrameInfo is cleaner than
keeping the StackProtector pass around. This commit moves the SSP layout
information to MFI instead of keeping it in the pass.

This patch set (D37580, D37581, D37582, D37583, D37584, D37585, D37586, D37587)
is a first draft of the pagerando implementation described in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html.

Patch by Stephen Crane <sjc@immunant.com>

Differential Revision: https://reviews.llvm.org/D49256

llvm-svn: 336964
2018-07-13 00:08:38 +00:00
Craig Topper 3a13477214 [X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions.
These are the patterns for matching fceil, ffloor, and sqrt to intrinsic instructions if they have a MOVSS/SD.

llvm-svn: 336954
2018-07-12 22:14:10 +00:00
Craig Topper b0053b79d6 Revert r336950 and r336951 "[X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions." and "foo"
One of them had a bad title and they should have been squashed.

llvm-svn: 336953
2018-07-12 21:58:03 +00:00
Craig Topper 3b837b6b63 [X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions.
These are the patterns for matching fceil, ffloor, and sqrt to intrinsic instructions if they have a MOVSS/SD.

llvm-svn: 336951
2018-07-12 21:53:23 +00:00
Craig Topper b01a355354 foo
llvm-svn: 336950
2018-07-12 21:53:07 +00:00
Craig Topper 57c4585bab [X86][FastISel] Support EVEX version of sqrt.
llvm-svn: 336939
2018-07-12 19:58:06 +00:00
Matt Arsenault 4dca0a9904 AMDGPU: Fix assert in truncate combine with vectors
The piece above probably has the same problem, but I need
to try to come up with a test for it.

llvm-svn: 336935
2018-07-12 19:40:16 +00:00
Craig Topper abc307e6fa [X86] Connect the flags user from PCMPISTR instructions to the correct node from the instruction.
We were accidentally connecting it to result 0 instead of result 1. This was caught by the machine verifier that noticed the flags were dead, but we were using them somehow. I'm still not clear what actually happened downstream.

llvm-svn: 336925
2018-07-12 18:04:05 +00:00
Craig Topper d43f58231c [X86][FastISel] Choose EVEX instructions when possible when lowering x86_sse_cvttss2si and similar intrinsics.
This should fix a machine verifier error.

llvm-svn: 336924
2018-07-12 18:03:56 +00:00
Sjoerd Meijer 83a2a62fb4 [AArch64] Armv8.4-A: LDAPR & STLR with immediate offset instructions
These instructions are added to AArch64 only.

llvm-svn: 336913
2018-07-12 14:57:59 +00:00
Simon Pilgrim 44b89fa900 [X86][SSE] Utilize ZeroableElements for canWidenShuffleElements
canWidenShuffleElements can do a better job if given a mask with ZeroableElements info. Apparently, ZeroableElements was being only used to identify AllZero candidates, but possibly we could plug it into more shuffle matchers.

Original Patch by Zvi Rackover @zvi

Differential Revision: https://reviews.llvm.org/D42044

llvm-svn: 336903
2018-07-12 13:29:41 +00:00
Simon Pilgrim 8a463897e9 [X86][AVX] Use Zeroable mask to improve shuffle mask widening
Noticed while updating D42044, lowerV2X128VectorShuffle can improve the shuffle mask with the zeroable data to create a target shuffle mask to recognise more 'zero upper 128' patterns.

NOTE: lowerV4X128VectorShuffle could benefit as well but the code needs refactoring first to discriminate between SM_SentinelUndef and SM_SentinelZero for negative shuffle indices.

Differential Revision: https://reviews.llvm.org/D49092

llvm-svn: 336900
2018-07-12 13:03:58 +00:00
Simon Atanasyan 053ff54478 [mips] Mark standard encoded instructions as not being in MIPS16e
Mark standard encoded instructions and pseudo "standard encoded"
as not being in MIPS16e by default.

Patch by Simon Dardis.

Differential revision: https://reviews.llvm.org/D48379

llvm-svn: 336893
2018-07-12 08:50:11 +00:00
Craig Topper 73c7dceffd [X86] Remove i128 type from FR128 regclass.
i128 isn't a legal type in our x86 implementation today. So remove this and the few patterns that used it until it becomes necessary.

llvm-svn: 336889
2018-07-12 07:30:01 +00:00
Craig Topper 73347ec081 [X86] Remove patterns and ISD nodes for the old scalar FMA intrinsic lowering.
We now use llvm.fma.f32/f64 or llvm.x86.fmadd.f32/f64 intrinsics that use scalar types rather than vector types. So we don't these special ISD nodes that operate on the lowest element of a vector.

llvm-svn: 336883
2018-07-12 03:42:41 +00:00
Chandler Carruth b4faf4ce08 [x86] Fix another trivial bug in x86 flags copy lowering that has been
there for a long time.

The boolean tracking whether we saw a kill of the flags was supposed to
be per-block we are scanning and instead was outside that loop and never
cleared. It requires a quite contrived test case to hit this as you have
to have multiple levels of successors and interleave them with kills.
I've included such a test case here.

This is another bug found testing SLH and extracted to its own focused
patch.

llvm-svn: 336876
2018-07-12 01:43:21 +00:00
Craig Topper be996bd2d9 [X86] Add patterns to use VMOVSS/SD zero masking for scalar f32/f64 select with zero.
These showed up in some of the upgraded FMA code. We really need to improve these test cases more, but this helps for now.

llvm-svn: 336875
2018-07-12 00:54:40 +00:00
Chandler Carruth 1c8234f639 [x86] Fix EFLAGS copy lowering to correctly handle walking past uses in
multiple successors where some of the uses end up killing the EFLAGS
register.

There was a bug where rather than skipping to the next basic block
queued up with uses once we saw a kill, we stopped processing the blocks
entirely. =/

Test case produces completely nonsensical code w/o this tiny fix.

This was found testing Speculative Load Hardening and split out of that
work.

Differential Revision: https://reviews.llvm.org/D49211

llvm-svn: 336874
2018-07-12 00:52:50 +00:00
Craig Topper 034adf2683 [X86] Remove and autoupgrade the scalar fma intrinsics with masking.
This converts them to what clang is now using for codegen. Unfortunately, there seem to be a few kinks to work out still. I'll try to address with follow up patches.

llvm-svn: 336871
2018-07-12 00:29:56 +00:00
Eli Friedman 0319c28459 [CodeGen] Emit more precise AssertZext/AssertSext nodes.
This is marginally helpful for removing redundant extensions, and the
code is easier to read, so it seems like an all-around win. In the new
test i8-phi-ext.ll, we used to emit an AssertSext i8; now we emit an
AssertZext i2, which allows the extension of the return value to be
eliminated.

Differential Revision: https://reviews.llvm.org/D49004

llvm-svn: 336868
2018-07-11 23:26:35 +00:00
Tom Stellard 752ddbd068 AMDGPU/SI: Initialize InstrInfo before TargetLoweringInfo in GCNSubtarget
SITargetLowering queries SIInstrInfo in its constructor, so SIInstrInfo
must be initialized first.  This fixes msan buildbot failures and was
introduced by r336851.

llvm-svn: 336861
2018-07-11 22:15:15 +00:00
Tom Stellard 1a8adce24f AMDGPU: Remove duplicate call to initializeSubtargetDependencies()
This was added in r336851.

llvm-svn: 336853
2018-07-11 21:12:03 +00:00
Tom Stellard 5bfbae5cb1 AMDGPU: Refactor Subtarget classes
Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
  AMDGPUSubtarget::Generation.

Reviewers: arsenm, jvesely

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D49037

llvm-svn: 336851
2018-07-11 20:59:01 +00:00
Craig Topper 38b290f7d7 [X86] Remove patterns for inserting a load into a zero vector.
We can instead block the load folding isProfitableToFold. Then isel will emit a register->register move for the zeroing part and a separate load. The PostProcessISelDAG should be able to remove the register->register move.

This saves us patterns and fixes the fact that we only had unaligned load patterns. The test changes show places where we should have been using an aligned load.

llvm-svn: 336828
2018-07-11 18:09:04 +00:00
Konstantin Zhuravlyov bde59e9989 AMDGPU/NFC: Use already available explicit kernarg
size instead of calculating it again when filling
out the metadata.

llvm-svn: 336825
2018-07-11 17:27:17 +00:00
Andrea Di Biagio 483db141e3 [X86] Fix MayLoad/HasSideEffect flag for (V)MOVLPSrm instructions.
Before revision 336728, the "mayLoad" flag for instruction (V)MOVLPSrm was
inferred directly from the "default" pattern associated with the instruction
definition.

r336728 removed special node X86Movlps, and all the patterns associated to it.
Now instruction (V)MOVLPSrm doesn't have a pattern associated to it, and the
'mayLoad/hasSideEffects' flags are left unset.

When the instruction info is emitted by tablegen, method
CodeGenDAGPatterns::InferInstructionFlags() sees that (V)MOVLPSrm doesn't have a
pattern, and flags are undefined. So, it conservatively sets the
"hasSideEffects" flag for it.

As a consequence, we were losing the 'mayLoad' flag, and we were gaining a
'hasSideEffect' flag in its place.
This patch fixes the issue (originally reported by Michael Holmen).

The mca tests show the differences in the instruction info flags.  Instructions
that were affected by this problem were: MOVLPSrm/VMOVLPSrm/VMOVLPSZ128rm.

Differential Revision: https://reviews.llvm.org/D49182

llvm-svn: 336818
2018-07-11 15:27:50 +00:00
Simon Atanasyan e523792a9c [mips] Update the P5600 scheduler model not to use instruction itineraries.
This mostly brings the P5600 scheduler model to a mostly complete
status. There are a number of instructions which trigger the
`error:'MipsP5600Model' lacks information for` error. These are certain
codegen only instructions relating to MIPS64 which can be addressed by
using the correct predicates for them. That will be done in a full-up
patch.

Patch by Simon Dardis.

Differential revision: https://reviews.llvm.org/D45245

llvm-svn: 336802
2018-07-11 13:21:10 +00:00
Sjoerd Meijer 53449daa96 [ARM] ParallelDSP: multiple reduction stmts in loop
This fixes an issue that we were not properly supporting multiple reduction
stmts in a loop, and not generating SMLADs for these cases. The alias analysis
checks were done too early, making it too conservative.

Differential revision: https://reviews.llvm.org/D49125

llvm-svn: 336795
2018-07-11 12:36:25 +00:00
Sander de Smalen ea45a89e5c [AArch64][SVE] Asm: Support for COMPACT instruction.
The compact instruction shuffles active elements of vector
into lowest numbered elements and sets remaining elements
to zero. 

e.g.
  compact z0.s, p0, z1.s

llvm-svn: 336789
2018-07-11 11:22:26 +00:00
Sander de Smalen a90530f7c1 [AArch64][SVE] Asm: Support for LAST(A|B) and CLAST(A|B) instructions.
The LASTB and LASTA instructions extract the last active element,
or element after the last active, from the source vector.

The added variants are:

  Scalar:
  last(a|b)  w0, p0, z0.b
  last(a|b)  w0, p0, z0.h
  last(a|b)  w0, p0, z0.s
  last(a|b)  x0, p0, z0.d

  SIMD & FP Scalar:
  last(a|b)  b0, p0, z0.b
  last(a|b)  h0, p0, z0.h
  last(a|b)  s0, p0, z0.s
  last(a|b)  d0, p0, z0.d

The CLASTB and CLASTA conditionally extract the last or element after
the last active element from the source vector.

The added variants are:

  Scalar:
  clast(a|b)  w0, p0, w0, z0.b
  clast(a|b)  w0, p0, w0, z0.h
  clast(a|b)  w0, p0, w0, z0.s
  clast(a|b)  x0, p0, x0, z0.d

  SIMD & FP Scalar:
  clast(a|b)  b0, p0, b0, z0.b
  clast(a|b)  h0, p0, h0, z0.h
  clast(a|b)  s0, p0, s0, z0.s
  clast(a|b)  d0, p0, d0, z0.d

  Vector:
  clast(a|b)  z0.b, p0, z0.b, z1.b
  clast(a|b)  z0.h, p0, z0.h, z1.h
  clast(a|b)  z0.s, p0, z0.s, z1.s
  clast(a|b)  z0.d, p0, z0.d, z1.d

Please refer to the architecture specification for more details on
the semantics of the added instructions.

llvm-svn: 336783
2018-07-11 10:08:00 +00:00
Simon Atanasyan 6cb1c6b35a [mips] Remove dead code. NFC
llvm-svn: 336777
2018-07-11 09:41:28 +00:00
Eric Liu 867b0e41fe [WebAssembly] Only call llvm::value::dump() in debug build.
This fixes compile error in r336759. llvm::value::dump is not available
in released build.

llvm-svn: 336770
2018-07-11 08:16:47 +00:00
Craig Topper 02867f0fa3 [X86] The TEST instruction is eliminated when BSF/TZCNT is used
Summary:
These changes cover the PR#31399.
Now the ffs(x) function is lowered to (x != 0) ? llvm.cttz(x) + 1 : 0
and it corresponds to the following llvm code:
  %cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true)
  %tobool = icmp eq i32 %v, 0
  %.op = add nuw nsw i32 %cnt, 1
  %add = select i1 %tobool, i32 0, i32 %.op
and x86 asm code:
  bsfl     %edi, %ecx
  addl     $1, %ecx
  testl    %edi, %edi
  movl     $0, %eax
  cmovnel  %ecx, %eax
In this case the 'test' instruction can't be eliminated because
the 'add' instruction modifies the EFLAGS, namely, ZF flag
that is set by the 'bsf' instruction when 'x' is zero.

We now produce the following code:
  bsfl     %edi, %ecx
  movl     $-1, %eax
  cmovnel  %ecx, %eax
  addl     $1, %eax

Patch by Ivan Kulagin

Reviewers: davide, craig.topper, spatel, RKSimon

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48765

llvm-svn: 336768
2018-07-11 06:57:42 +00:00
Craig Topper 1d6a80cd95 [X86] Remove some composite MOVSS/MOVSD isel patterns.
These patterns looked for a MOVSS/SD followed by a scalar_to_vector. Or a scalar_to_vector followed by a load.

In both cases we emitted a MOVSS/SD for the MOVSS/SD part, a REG_CLASS for the scalar_to_vector, and a MOVSS/SD for the load.

But we have patterns that do each of those 3 things individually so there's no reason to build large patterns.

Most of the test changes are just reorderings. The one test that had a meaningful change is pr30430.ll and it appears to be a regression. But its doing -O0 so I think it missed a lot of opportunities and was just getting lucky before.

llvm-svn: 336762
2018-07-11 04:51:40 +00:00
Sam Clegg 92617559bb [WebAssembly] Add pass to infer prototypes for prototype-less functions
See https://bugs.llvm.org/show_bug.cgi?id=35385

Differential Revision: https://reviews.llvm.org/D48471

llvm-svn: 336759
2018-07-11 04:29:36 +00:00
Stefan Pintilie b9d01aa29e [Power9] Add remaining __flaot128 builtin support for FMA round to odd
Implement this as it is done on GCC:

__float128 a, b, c, d;
a = __builtin_fmaf128_round_to_odd (b, c, d);         // generates xsmaddqpo
a = __builtin_fmaf128_round_to_odd (b, c, -d);        // generates xsmsubqpo
a = - __builtin_fmaf128_round_to_odd (b, c, d);       // generates xsnmaddqpo
a = - __builtin_fmaf128_round_to_odd (b, c, -d);      // generates xsnmsubpqp

Differential Revision: https://reviews.llvm.org/D48218

llvm-svn: 336754
2018-07-11 01:42:22 +00:00
Eli Friedman d2c739230c [ARM] Treat cmn immediates as legal in isLegalICmpImmediate.
The original code attempted to do this, but the std::abs() call didn't
actually do anything due to implicit type conversions.  Fix the type
conversions, and perform the correct check for negative immediates.

This probably has very little practical impact, but it's worth fixing
just to avoid confusion in the future, I think.

Differential Revision: https://reviews.llvm.org/D48907

llvm-svn: 336742
2018-07-10 23:44:37 +00:00
Craig Topper 27c77fe4ce [X86] Remove AddedComplexity from all patterns that use X86vzmovl as their root.
Some added 20 and some added 15. Its unclear when to use which value and whether they are required at all.

This patch removes them all. If we start finding real world issues we may need to add them back with proper tests.

llvm-svn: 336735
2018-07-10 22:23:54 +00:00
Richard Trieu d5e57ed9c2 Fix -Wmismatched-tags warning
class -> struct in forward declaration.

llvm-svn: 336733
2018-07-10 22:09:33 +00:00
Craig Topper 860ab496d3 [X86] Teach X86InstrInfo::commuteInstructionImpl to use MOVSD/MOVSS for BLEND under optsize when the immediate allows it.
Isel currently emits movss/movsd a lot of the time and an accidental double commute turns it into a blend.

Ideally we'd select blend directly in isel under optspeed and not rely on the double commute to create blend.

llvm-svn: 336731
2018-07-10 22:02:23 +00:00
Craig Topper dea0b88b04 [X86] Remove X86ISD::MOVLPS and X86ISD::MOVLPD. NFCI
These ISD nodes try to select the MOVLPS and MOVLPD instructions which are special load only instructions. They load data and merge it into the lower 64-bits of an XMM register. They are logically equivalent to our MOVSD node plus a load.

There was only one place in X86ISelLowering that used MOVLPD and no places that selected MOVLPS. The one place that selected MOVLPD had to choose between it and MOVSD based on whether there was a load. But lowering is too early to tell if the load can really be folded. So in isel we have patterns that use MOVSD for MOVLPD if we can't find a load.

We also had patterns that select the MOVLPD instruction for a MOVSD if we can find a load, but didn't choose the MOVLPD ISD opcode for some reason.

So it seems better to just standardize on MOVSD ISD opcode and manage MOVSD vs MOVLPD instruction with isel patterns.

llvm-svn: 336728
2018-07-10 21:00:22 +00:00
Scott Linder 01ce144ddf [AMDGPU] Fix layering issue with AMDGPUHSAMetadataStreamer (NFC)
llvm-svn: 336722
2018-07-10 20:07:22 +00:00
Craig Topper fb302d0198 [X86] Remove dead SDNode object from X86InstrFragmentsSIMD.td. NFC
It points to an opcode that doesn't exist.

llvm-svn: 336720
2018-07-10 20:03:51 +00:00
Craig Topper 04ded1ac1f [X86] Remove AddedComplexity from register form of NOT. NFCI
I believe isProfitableToFold will stop the load folding that this was intended to overcome.

Given an (xor load, -1), isProfitableToFold will see that the immediate can be folded with the xor using a one byte immediate since it can be sign extended. It doesn't know about NOT, but the one byte immediate check is enough to stop the fold.

llvm-svn: 336712
2018-07-10 19:09:00 +00:00
Craig Topper 0f6275ab43 [X86] Remove AddedComplexity from MMX_X86movw2d patterns.
There were only 3 patterns with this node as a root and they all the same AddedComplexity. So this doesn't really do anything.

llvm-svn: 336711
2018-07-10 18:41:58 +00:00
Scott Linder 2ad2c18b82 [AMDGPU] Refactor HSAMetadataStream::emitKernel (NFC)
Move all metadata construction into AMDGPUHSAMetadataStreamer.

Differential Revision: https://reviews.llvm.org/D48176

llvm-svn: 336707
2018-07-10 17:31:32 +00:00
Alexander Ivchenko 48ca0550dd [GlobalISel][X86_64] Support for G_SITOFP
The instruction selection is automatically handled by tablegen

llvm-svn: 336703
2018-07-10 16:38:35 +00:00
Konstantin Zhuravlyov f0badd5ac1 AMDGPU: Make hidden argument metadata consistent with
amdgpu-implicitarg-num-bytes attribute

Differential Revision: https://reviews.llvm.org/D49096

llvm-svn: 336697
2018-07-10 16:12:51 +00:00
Sander de Smalen 53108d48f7 [AArch64][SVE] Asm: Support for predicated unary operations.
This patch adds support for the following instructions:
  CLS  (Count Leading Sign bits)
  CLZ  (Count Leading Zeros)
  CNT  (Count non-zero bits)
  CNOT (Logically invert boolean condition in vector)
  NOT  (Bitwise invert vector)
  FABS (Floating-point absolute value)
  FNEG (Floating-point negate)

All operations are predicated and unary, e.g.
  clz  z0.s, p0/m, z1.s

- CLS, CLZ, CNT, CNOT and NOT have variants for 8, 16, 32
  and 64 bit elements.

- FABS and FNEG have variants for 16, 32 and 64 bit elements.

llvm-svn: 336677
2018-07-10 14:05:55 +00:00
Matt Arsenault a680199a96 Reapply "AMDGPU: Force inlining if LDS global address is used"
This reverts commit r336623

llvm-svn: 336675
2018-07-10 14:03:41 +00:00
Krzysztof Parzyszek c052451a02 [Hexagon] Add implicit uses even when untied explicit uses are present
An explicit untied use is not sufficient to maintain liveness of a
register redefined in a predicated instruction. For example
  %1 = COPY %0
  ...
  %1 = A2_paddif %2, %1, 1
could become
  $r1 = COPY $r0
  ...
  $r1 = A2_paddif $p0, $r1, 1
and later
  $r1 = COPY $r0                ;; this is not really dead!
  ...
  $r1 = A2_paddif $p0, $r0, 1

llvm-svn: 336662
2018-07-10 12:57:49 +00:00
Simon Pilgrim d32ca2c0b7 [X86][SSE] Prefer BLEND(SHL(v,c1),SHL(v,c2)) over MUL(v, c3)
Now that rL336250 has landed, we should prefer 2 immediate shifts + a shuffle blend over performing a multiply. Despite the increase in instructions, this is quicker (especially for slow v4i32 multiplies), avoid loads and constant pool usage. It does mean however that we increase register pressure. The code size will go up a little but by less than what we save on the constant pool data.

This patch also adds support for v16i16 to the BLEND(SHIFT(v,c1),SHIFT(v,c2)) combine, and also prevents blending on pre-SSE41 shifts if it would introduce extra blend masks/constant pool usage.

Differential Revision: https://reviews.llvm.org/D48936

llvm-svn: 336642
2018-07-10 07:58:33 +00:00
Craig Topper 08b81a5508 [X86] Use IsProfitableToFold to block vinsertf128rm in favor of insert_subreg instead of artifically increasing pattern complexity to give priority.
This is a much more direct way to solve the issue than just giving extra priority.

llvm-svn: 336639
2018-07-10 06:19:54 +00:00
Craig Topper db73f56489 [X86] Remove some seemingly unnecessary patterns.
We're missing the EVEX equivalents of these patterns and seem to get along fine.

I think we end up with X86vzload for the obvious IR cases that would produce this DAG.

llvm-svn: 336638
2018-07-10 05:31:42 +00:00
Craig Topper 866a377e91 [X86] Correct vfixupimm load patterns to look for an integer load, not a floating point load bitcasted to integer.
DAG combine wouldn't let a floating point load bitcasted to integer exist. It would just be an integer load.

llvm-svn: 336626
2018-07-10 00:49:49 +00:00
Craig Topper e4f46e4f31 [X86] Remove FloatVT from X86VectorVTInfo in X86InstrAVX512.td
The only places it was used where places where VT was the same as FloatVT. So switch those uses to VT and drop it.

llvm-svn: 336624
2018-07-10 00:49:45 +00:00
Vlad Tsyrklevich 688e752207 Revert "AMDGPU: Force inlining if LDS global address is used"
This reverts commit r336587, it was causing test failures on the
sanitizer bots.

llvm-svn: 336623
2018-07-10 00:46:07 +00:00
Heejin Ahn fed7382ef6 [WebAssembly] Support for binary atomic RMW instructions
Summary:
This adds support for binary atomic read-modify-write instructions:
add, sub, and, or, xor, and xchg.

This does not yet support translations of some of LLVM IR atomicrmw
instructions (nand, max, min, umax, and umin) that do not have a direct
counterpart in wasm instructions.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49088

llvm-svn: 336615
2018-07-09 22:30:51 +00:00
Stefan Pintilie 133acb22bb [Power9] Add __float128 builtins for Rounding Operations
Added __float128 support for a number of rounding operations:

trunc
rint
nearbyint
round
floor
ceil

Differential Revision: https://reviews.llvm.org/D48415

llvm-svn: 336601
2018-07-09 20:38:40 +00:00
Heejin Ahn d31bc9866b [WebAssembly] Improve readability of load/stores and tests. NFC.
Summary:
- Changed variable/function names to be more consistent
- Improved comments in test files
- Added more tests
- Fixed a few typos
- Misc. cosmetic changes

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49087

llvm-svn: 336598
2018-07-09 20:18:21 +00:00
Stefan Pintilie 58e3e0a827 [Power9] [LLVM] Add __float128 support for trunc to double round to odd
Add support for this builtin:
double builtin_truncf128_round_to_odd(float128)

Differential Revision: https://reviews.llvm.org/D48483

llvm-svn: 336595
2018-07-09 20:09:22 +00:00
Mark Searles 5bfd8d8991 [AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error
Build error on Android; reported by and fix provided by (thanks) by Mauro Rossi <issor.oruam@gmail.com>

Fixes the following building error:

external/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1903:61:
error: comparison of integers of different signs:
'typename iterator_traits<__wrap_iter<MachineBasicBlock **> >::difference_type'
(aka 'int') and 'unsigned int' [-Werror,-Wsign-compare]
                      BlockWaitcntProcessedSet.end(), &MBB) < Count)) {
                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~
1 error generated.

Differential Revision: https://reviews.llvm.org/D49089

llvm-svn: 336588
2018-07-09 19:28:14 +00:00
Matt Arsenault 40cb6cab56 AMDGPU: Force inlining if LDS global address is used
These won't work for the forseeable future. These aren't allowed
from OpenCL, but IPO optimizations can make them appear.

Also directly set the attributes on functions, regardless
of the linkage rather than cloning functions like before.

llvm-svn: 336587
2018-07-09 19:22:22 +00:00
Roman Lebedev 5ccae1750b [X86][TLI] DAGCombine: Unfold variable bit-clearing mask to two shifts.
Summary:
This adds a reverse transform for the instcombine canonicalizations
that were added in D47980, D47981.

As discussed later, that was worse at least for the code size,
and potentially for the performance, too.

https://rise4fun.com/Alive/Zmpl

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: spatel

Subscribers: reames, llvm-commits

Differential Revision: https://reviews.llvm.org/D48768

llvm-svn: 336585
2018-07-09 19:06:42 +00:00
Stefan Pintilie 83a5fe146e [Power9] Add __float128 builtins for Round To Odd
GCC has builtins for these round to odd instructions:

__float128 __builtin_sqrtf128_round_to_odd (__float128)
__float128 __builtin_{add,sub,mul,div}f128_round_to_odd (__float128, __float128)
__float128 __builtin_fmaf128_round_to_odd (__float128, __float128, __float128)

Differential Revision: https://reviews.llvm.org/D47550

llvm-svn: 336578
2018-07-09 18:50:06 +00:00
Craig Topper 47170b3153 [X86] In combineFMA, make sure we bitcast the result of isFNEG back the expected type before creating the new FMA node.
Previously, we were creating malformed SDNodes, but nothing noticed because the type constraints prevented isel from noticing.

llvm-svn: 336566
2018-07-09 17:43:24 +00:00
Craig Topper e9cff7d47b [X86] Remove some patterns that include a bitcast of a floating point load to an integer type.
DAG combine should have converted the type of the load.

llvm-svn: 336557
2018-07-09 16:03:02 +00:00
Craig Topper 16ee4b4957 [X86] Remove some patterns that seems to be unreachable.
These patterns mapped (v2f64 (X86vzmovl (v2f64 (scalar_to_vector FR64:$src)))) to a MOVSD and an zeroing XOR. But the complexity of a pattern for (v2f64 (X86vzmovl (v2f64))) that selects MOVQ is artificially and hides this MOVSD pattern.

Weirder still, the SSE version of the pattern was explicitly blocked on SSE41, but yet we had copied it to AVX and AVX512.

llvm-svn: 336556
2018-07-09 16:03:01 +00:00
Craig Topper 22330c700b [X86] Remove some seemingly unnecessary AddedComplexity lines.
Looking at the generated tables this didn't seem to make an obvious difference in pattern priority.

llvm-svn: 336555
2018-07-09 16:02:59 +00:00
Sander de Smalen d3efb59f29 [AArch64][SVE] Asm: Support for CNT(B|H|W|D) and CNTP instructions.
This patch adds support for the following instructions:

  CNTB CNTH - Determine the number of active elements implied by
  CNTW CNTD   the named predicate constant, multiplied by an
              immediate, e.g.

                cnth x0, vl8, #16

  CNTP      - Count active predicate elements, e.g.
                cntp  x0, p0, p1.b

              counts the number of active elements in p1, predicated
              by p0, and stores the result in x0.

llvm-svn: 336552
2018-07-09 15:22:08 +00:00
Stefan Pintilie 3d76326d24 [Power9] Add __float128 support for compare operations
Added handling for the select f128.

Differential Revision: https://reviews.llvm.org/D48294

llvm-svn: 336548
2018-07-09 13:36:14 +00:00
Sander de Smalen 813b21e33a [AArch64][SVE] Asm: Support for remaining shift instructions.
This patch completes support for shifts, which include:
- LSL   - Logical Shift Left
- LSLR  - Logical Shift Left, Reversed form
- LSR   - Logical Shift Right
- LSRR  - Logical Shift Right, Reversed form
- ASR   - Arithmetic Shift Right
- ASRR  - Arithmetic Shift Right, Reversed form
- ASRD  - Arithmetic Shift Right for Divide

In the following variants:

- Predicated shift by immediate - ASR, LSL, LSR, ASRD
  e.g.
    asr z0.h, p0/m, z0.h, #1

  (active lanes of z0 shifted by #1)

- Unpredicated shift by immediate - ASR, LSL*, LSR*
  e.g.
    asr z0.h, z1.h, #1

  (all lanes of z1 shifted by #1, stored in z0)

- Predicated shift by vector - ASR, LSL*, LSR*
  e.g.
    asr z0.h, p0/m, z0.h, z1.h

  (active lanes of z0 shifted by z1, stored in z0)

- Predicated shift by vector, reversed form - ASRR, LSLR, LSRR
  e.g.
    lslr z0.h, p0/m, z0.h, z1.h

  (active lanes of z1 shifted by z0, stored in z0)

- Predicated shift left/right by wide vector - ASR, LSL, LSR
  e.g.
    lsl z0.h, p0/m, z0.h, z1.d

  (active lanes of z0 shifted by wide elements of vector z1)

- Unpredicated shift left/right by wide vector - ASR, LSL, LSR
  e.g.
    lsl z0.h, z1.h, z2.d

  (all lanes of z1 shifted by wide elements of z2, stored in z0)

*Variants added in previous patches.

llvm-svn: 336547
2018-07-09 13:23:41 +00:00
Stefan Maksimovic 0a23998fe7 [mips] Addition of the [d]rem and [d]remu instructions
Related to http://reviews.llvm.org/D15772
Depends on http://reviews.llvm.org/D16889
Adds [D]REM[U] instructions.

Patch By: Srdjan Obucina
Contributions from: Simon Dardis

Differential Revision: https://reviews.llvm.org/D17036

llvm-svn: 336545
2018-07-09 13:06:44 +00:00
Sander de Smalen 54077dcfcb [AArch64][SVE] Asm: Support for TBL instruction.
Support for SVE's TBL instruction for programmable table
lookup/permute using vector of element indices, e.g.

  tbl  z0.d, { z1.d }, z2.d

stores elements from z1, indexed by elements from z2, into z0.

llvm-svn: 336544
2018-07-09 12:32:56 +00:00
Sander de Smalen c69944c6b0 [AArch64][SVE] Asm: Support for ADR instruction.
Supporting various addressing modes:
- adr z0.s, [z0.s, z0.s]
- adr z0.s, [z0.s, z0.s, lsl #<shift>]
- adr z0.d, [z0.d, z0.d]
- adr z0.d, [z0.d, z0.d, lsl #<shift>]
- adr z0.d, [z0.d, z0.d, uxtw #<shift>]
- adr z0.d, [z0.d, z0.d, sxtw #<shift>]

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D48870

llvm-svn: 336533
2018-07-09 09:58:24 +00:00
Sander de Smalen bd513b42a1 [AArch64][SVE] Asm: Support for UZP and TRN instructions.
This patch adds support for:
  UZP1  Concatenate even elements from two vectors
  UZP2  Concatenate  odd elements from two vectors
  TRN1  Interleave  even elements from two vectors
  TRN2  Interleave   odd elements from two vectors

With variants for both data and predicate vectors, e.g.
  uzp1    z0.b, z1.b, z2.b
  trn2    p0.s, p1.s, p2.s

llvm-svn: 336531
2018-07-09 09:12:17 +00:00
Craig Topper b8145ec667 [X86] Improve the message for some asserts. Remove an if that is guaranteed true by said asserts.
This replaces some asserts in lowerV2F64VectorShuffle with the similar asserts from lowerVIF64VectorShuffle which are more readable. The original asserts mentioned a blend, but there's no guarantee that it is a blend.

Also remove an if that the asserts prove is always true. Mask[0] is always less than 2 and Mask[1] is always at least 2. Therefore (Mask[0] >= 2) + (Mask[1] >= 2) == 1 must wlays be true.

llvm-svn: 336517
2018-07-09 01:52:56 +00:00
Craig Topper c98c675f03 [X86] Remove an AddedComplexity line that seems unnecessary.
It only existed on SSE and AVX version. AVX512 version didn't have it.

I checked the generated table and this didn't seem necessary to creat a match preference.

llvm-svn: 336516
2018-07-08 22:57:33 +00:00
Roman Lebedev 75ce45376b [X86][Nearly NFC] Split SHLD/SHRD into their own WriteShiftDouble class
Summary:
{F6603964}
While there is still some discrepancies within that new group,
it is clearly separate from the other shifts.
And Agner's tables agree, these double shifts are clearly
different from the normal shifts/rotates.

I'm guessing `FeatureSlowSHLD` is related.

Indeed, a basic sched pair is *not* the /best/ match.
But keeping it in the WriteShift is /clearly/ not ideal either.
This can and likely will be fine-tuned later.

This is purely mechanical change, it does not change any numbers,
as the [lack of the change of] mca tests show.

Reviewers: craig.topper, RKSimon, andreadb

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49015

llvm-svn: 336515
2018-07-08 19:01:55 +00:00
Craig Topper 9e17073c21 [X86] Enhance combineFMA to look for FNEG behind an EXTRACT_VECTOR_ELT.
llvm-svn: 336514
2018-07-08 18:04:00 +00:00
Simon Pilgrim 2eced71ecf [X86][SSE] Combine v16i8 SHL by constants to multiplies
Pre-AVX512 (which can perform a quick extend/shift/truncate), extending to 2 v8i16 for the PMULLW and then truncating is more performant than relying on the generic PBLENDVB vXi8 shift path and uses a similar amount of mask constant pool data.

Differential Revision: https://reviews.llvm.org/D48963

llvm-svn: 336513
2018-07-08 12:47:50 +00:00
Simon Pilgrim 1795870bb8 [X86] Set scheduler classes to unsupported. NFCI.
While looking at PR36895 I noticed how much of the atom model was still setting schedules for unsupported SSE4+ instructions.

llvm-svn: 336512
2018-07-08 10:32:07 +00:00
Roman Lebedev fa988853bd [X86][Basically NFC] Sched: split WriteBitScan into WriteBSF/WriteBSR.
Summary:
Motivation: {F6597954}

This only does the mechanical splitting, does not actually change
any numbers, as the tests added in previous revision show.

Reviewers: craig.topper, RKSimon, courbet

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48998

llvm-svn: 336511
2018-07-08 09:50:25 +00:00
Craig Topper f1a981c705 [X86] Add back some intrinsic table entries lost in r336506.
llvm-svn: 336508
2018-07-08 01:23:49 +00:00
Craig Topper fdf3f1ff82 [X86] Add new scalar fma intrinsics with rounding mode that use f32/f64 types.
This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma.

I had to add new rounding mode CodeGenOnly instructions to support isel when we can't find a movss to grab the upper bits from to use the b_Int instruction.

Fast-isel tests have been updated to match new clang codegen.

We are currently having trouble folding fneg into the new intrinsic. I'm going to correct that in a follow up patch to keep the size of this one down.

A future patch will also remove the old intrinsics.

llvm-svn: 336506
2018-07-08 01:10:43 +00:00
Simon Pilgrim 23f9eddabe [SelectionDAG] Split float and integer isKnownNeverZero tests
Splits off isKnownNeverZeroFloat to handle +/- 0 float cases.

This will make it easier to be more aggressive with the integer isKnownNeverZero tests (similar to ValueTracking), use computeKnownBits etc.

Differential Revision: https://reviews.llvm.org/D48969

llvm-svn: 336492
2018-07-07 18:17:14 +00:00
Simon Pilgrim dc113dc7ed [CostModel][X86] Add SREM/UREM general and constant costs (PR38056)
We penalize general SDIV/UDIV costs but don't do the same for SREM/UREM.

This patch makes general vector SREM/UREM x20 as costly as scalar, the same approach as we do for SDIV/UDIV. The patch also extends the existing SDIV/UDIV constant costs for SREM/UREM - at the moment this means the additional cost of a MUL+SUB (see D48975).

Differential Revision: https://reviews.llvm.org/D48980

llvm-svn: 336486
2018-07-07 16:53:30 +00:00
Yvan Roux a96a04558d [MachineOutliner] Assert that Liveness tracking is accurate (NFC)
The checking is done deeper inside MachineBasicBlock, but this will
hopefully help to find issues when porting the machine outliner to a
target where Liveness tracking is broken (like ARM).

Differential Revision: https://reviews.llvm.org/D49023

llvm-svn: 336481
2018-07-07 08:02:19 +00:00
Craig Topper 2c27e33a58 [X86] Merge INTR_TYPE_3OP_RM with INTR_TYPE_3OP. Remove unused INTR_TYPE_1OP_RM.
llvm-svn: 336476
2018-07-07 01:04:22 +00:00
Vedant Kumar b3091da3af Use Type::isIntOrPtrTy where possible, NFC
It's a bit neater to write T.isIntOrPtrTy() over `T.isIntegerTy() ||
T.isPointerTy()`.

I used Python's re.sub with this regex to update users:

  r'([\w.\->()]+)isIntegerTy\(\)\s*\|\|\s*\1isPointerTy\(\)'

llvm-svn: 336462
2018-07-06 20:17:42 +00:00
Craig Topper f61c631b25 [X86] Remove patterns for MOVLPD/MOVLPS nodes with integer types.
Lowering shouldn't generate these. If we need to use them for integer types, it should use a bitcast.

llvm-svn: 336458
2018-07-06 18:47:57 +00:00
Craig Topper 77edbffabd [X86] Add more FMA3 memory folding patterns. Remove patterns that are no longer needed.
We've removed the legacy FMA3 intrinsics and are now using llvm.fma and extractelement/insertelement. So we don't need patterns for the nodes that could only be created by the old intrinscis. Those ISD opcodes still exist because we haven't dropped the AVX512 intrinsics yet, but those should go to EVEX instructions.

llvm-svn: 336457
2018-07-06 18:47:55 +00:00
Tom Stellard ec4feae1b6 AMDGPU: Fix UBSan error caused by r335942
Summary: Fixes PR38071.

Reviewers: arsenm, dstenb

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48979

llvm-svn: 336448
2018-07-06 17:16:17 +00:00
Sjoerd Meijer b3e06faa28 [ARM] ParallelDSP: added statistics, NFC.
Added statistics for the number of SMLAD instructions created, and
als renamed the pass name to -arm-parallel-dsp.

Differential Revision: https://reviews.llvm.org/D48971

llvm-svn: 336441
2018-07-06 14:47:09 +00:00
Sjoerd Meijer 35bd8f5d1e [AArch64] Armv8.4-A: TLB support
This adds:
- outer shareable TLB Maintenance instructions, and
- TLB range maintenance instructions.

llvm-svn: 336434
2018-07-06 13:00:16 +00:00
Sjoerd Meijer a3dad801b7 Recommit: [AArch64] Armv8.4-A: Flag manipulation instructions
Now with the asm operand definition included.

llvm-svn: 336432
2018-07-06 12:32:33 +00:00
Sjoerd Meijer 8203177e5e Revert [AArch64] Armv8.4-A: Flag manipulation instructions
It's causing build errors.

llvm-svn: 336422
2018-07-06 08:39:43 +00:00
Sjoerd Meijer 6f5f6d5b2e [AArch64] Armv8.4-A: Flag manipulation instructions
These instructions are added to AArch64 only.

Differential Revision: https://reviews.llvm.org/D48926

llvm-svn: 336421
2018-07-06 08:12:20 +00:00
Sjoerd Meijer 2a57b357a3 [AArch64][ARM] Armv8.4-A: Trace synchronization barrier instruction
This adds the Armv8.4-A Trace synchronization barrier (TSB) instruction.

Differential Revision: https://reviews.llvm.org/D48918

llvm-svn: 336418
2018-07-06 08:03:12 +00:00
Craig Topper c60e1807b3 [X86] Remove FMA4 scalar intrinsics. Use llvm.fma intrinsic instead.
The intrinsics can be implemented with a f32/f64 llvm.fma intrinsic and an insert into a zero vector.

There are a couple regressions here due to SelectionDAG not being able to pull an fneg through an extract_vector_elt. I'm not super worried about this though as InstCombine should be able to do it before we get to SelectionDAG.

llvm-svn: 336416
2018-07-06 07:14:41 +00:00
Craig Topper 7b35585ff1 [X86] Remove all of the avx512 masked packed fma intrinsics. Use llvm.fma or unmasked 512-bit intrinsics with rounding mode.
This upgrades all of the intrinsics to use fneg instructions to convert fma into fmsub/fnmsub/fnmadd/fmsubadd. And uses a select instruction for masking.

This matches how clang uses the intrinsics these days.

llvm-svn: 336409
2018-07-06 03:42:09 +00:00
Stefan Pintilie b351f09c9e [Power9] Add __float128 library call for frem
Power 9 does not have a hardware instruction for frem but we can call fmodf128.

Differential Revision: https://reviews.llvm.org/D48552

llvm-svn: 336406
2018-07-06 02:47:02 +00:00
Maksim Panchenko 89e4abe7b7 [X86][Disassembler] Fix LOCK prefix disassembler support
Summary:
If LOCK prefix is not the first prefix in an instruction, LLVM
disassembler silently drops the prefix.

The fix is to select a proper instruction with a builtin LOCK prefix if
one exists.

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49001

llvm-svn: 336400
2018-07-05 23:32:42 +00:00
Heejin Ahn 80d9f1708f [WebAssembly] Add missing _S opcodes of atomic stores to InstPrinter
Summary: This was missing in D48839 (rL336145).

Reviewers: aardappel

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D48992

llvm-svn: 336390
2018-07-05 21:27:09 +00:00
Sander de Smalen e2c10f8f47 This is a recommit of r336322, previously reverted in r336324 due to
a deficiency in TableGen that has been addressed in r336334.

[AArch64][SVE] Asm: Support for predicated FP rounding instructions.

This patch also adds instructions for predicated FP square-root and
reciprocal exponent.

The added instructions are:
- FRINTI  Round to integral value (current FPCR rounding mode)
- FRINTX  Round to integral value (current FPCR rounding mode, signalling inexact)
- FRINTA  Round to integral value (to nearest, with ties away from zero)
- FRINTN  Round to integral value (to nearest, with ties to even)
- FRINTZ  Round to integral value (toward zero)
- FRINTM  Round to integral value (toward minus Infinity)
- FRINTP  Round to integral value (toward plus Infinity)
- FSQRT   Floating-point square root
- FRECPX  Floating-point reciprocal exponent

llvm-svn: 336387
2018-07-05 20:21:21 +00:00
Craig Topper 88d361e976 [X86] Remove the last of the 'x86.fma.' intrinsics and autoupgrade them to 'llvm.fma'. Add upgrade tests for all.
Still need to remove the AVX512 masked versions.

llvm-svn: 336383
2018-07-05 18:43:58 +00:00
Craig Topper 4fe321d1ce [X86] Add SHUF128 to target shuffle decoding.
Differential Revision: https://reviews.llvm.org/D48954

llvm-svn: 336376
2018-07-05 17:10:17 +00:00
Matt Arsenault 29f303799b AMDGPU/GlobalISel: Implement custom kernel arg lowering
Avoid using allocateKernArg / AssignFn. We do not want any
of the type splitting properties of normal calling convention
lowering.

For now at least this exists alongside the IR argument lowering
pass. This is necessary to handle struct padding correctly while
some arguments are still skipped by the IR argument lowering
pass.

llvm-svn: 336373
2018-07-05 17:01:20 +00:00
Simon Pilgrim 8c3765dc6b [CostModel][X86] Add UDIV/UREM by pow2 costs
Normally InstCombine would have simplified these to SRL/AND instructions but we may still see these during SLP vectorization etc.

llvm-svn: 336371
2018-07-05 16:56:28 +00:00
Lei Huang 5612b90694 [Power9] Add lib calls for float128 operations with no equivalent PPC instructions
Map the following instructions to the proper float128 lib calls:
  pow[i], exp[2], log[2|10], sin, cos, fmin, fmax

Differential Revision: https://reviews.llvm.org/D48544

llvm-svn: 336361
2018-07-05 15:21:37 +00:00
Ryan Taylor 5f04458a61 [AMDGPU] Add VALU to V_INTERP Instructions
Wait states are not properly being inserted after buffer_store for v_interp instructions.

Add VALU to V_INTERP instructions so that the GCNHazardRecognizer can
check and insert the appropriate wait states when needed.

Differential Revision: https://reviews.llvm.org/D48772

Change-Id: Id540c9b074fc69b5c1de6b182276aa089c74aa64
llvm-svn: 336339
2018-07-05 12:02:07 +00:00
Simon Pilgrim 6dc45e6ca0 Try to fix -Wimplicit-fallthrough warning. NFCI.
llvm-svn: 336331
2018-07-05 09:48:01 +00:00
Aleksandar Beserminji 3239ba8c0e [mips] Fix atomic operations at O0, v3
Similar to PR/25526, fast-regalloc introduces spills at the end of basic
blocks. When this occurs in between an ll and sc, the stores can cause the
atomic sequence to fail.

This patch fixes the issue by introducing more pseudos to represent atomic
operations and moving their lowering to after the expansion of postRA
pseudos.

This version addresses issues with the initial implementation and covers
all atomic operations.

This resolves PR/32020.

Thanks to James Cowgill for reporting the issue!

Patch By: Simon Dardis

Differential Revision: https://reviews.llvm.org/D31287

llvm-svn: 336328
2018-07-05 09:27:05 +00:00
Ivan A. Kosarev 466037900c [NEON] Fix combining of vldx_dup intrinsics with updating of base addresses
Resolves:
Unsupported ARM Neon intrinsics in Target-specific DAG combine
function for VLDDUP
https://bugs.llvm.org/show_bug.cgi?id=38031

Related diff: D48439

Differential Revision: https://reviews.llvm.org/D48920

llvm-svn: 336325
2018-07-05 08:59:49 +00:00
Sander de Smalen 097ab704c9 Reverting r336322 for now, as it causes an assert failure
in TableGen, for which there is already a patch in Phabricator
(D48937) that needs to be committed first.

llvm-svn: 336324
2018-07-05 08:52:03 +00:00
Sander de Smalen ef44226c4f [AArch64][SVE] Asm: Support for predicated FP rounding instructions.
This patch also adds instructions for predicated FP square-root and
reciprocal exponent.

The added instructions are:
- FRINTI  Round to integral value (current FPCR rounding mode)
- FRINTX  Round to integral value (current FPCR rounding mode, signalling inexact)
- FRINTA  Round to integral value (to nearest, with ties away from zero)
- FRINTN  Round to integral value (to nearest, with ties to even)
- FRINTZ  Round to integral value (toward zero)
- FRINTM  Round to integral value (toward minus Infinity)
- FRINTP  Round to integral value (toward plus Infinity)
- FSQRT   Floating-point square root
- FRECPX  Floating-point reciprocal exponent

llvm-svn: 336322
2018-07-05 08:38:30 +00:00
Sjoerd Meijer 27be58b307 [ARM] ParallelDSP: only support i16 loads for now
We were miscompiling i8 loads, so reject them as unsupported narrow operations
for now.

Differential Revision: https://reviews.llvm.org/D48944

llvm-svn: 336319
2018-07-05 08:21:40 +00:00
Sander de Smalen 592718f906 [AArch64][SVE] Asm: Support for signed/unsigned MIN/MAX/ABD
This patch implements the following varieties:

- Unpredicated signed max,   e.g. smax z0.h, z1.h, #-128
- Unpredicated signed min,   e.g. smin z0.h, z1.h, #-128

- Unpredicated unsigned max, e.g. umax z0.h, z1.h, #255
- Unpredicated unsigned min, e.g. umin z0.h, z1.h, #255

- Predicated signed max,     e.g. smax z0.h, p0/m, z0.h, z1.h
- Predicated signed min,     e.g. smin z0.h, p0/m, z0.h, z1.h
- Predicated signed abd,     e.g. sabd z0.h, p0/m, z0.h, z1.h

- Predicated unsigned max,   e.g. umax z0.h, p0/m, z0.h, z1.h
- Predicated unsigned min,   e.g. umin z0.h, p0/m, z0.h, z1.h
- Predicated unsigned abd,   e.g. uabd z0.h, p0/m, z0.h, z1.h

llvm-svn: 336317
2018-07-05 07:54:10 +00:00
Lei Huang 66e22c21c3 [Power9] Optimize codgen for conversions of int to float128
Optimize code sequences for integer conversion to fp128 when the integer is a result of:
  * float->int
  * float->long
  * double->int
  * double->long

Differential Revision: https://reviews.llvm.org/D48429

llvm-svn: 336316
2018-07-05 07:46:01 +00:00
Craig Topper 350c5f1881 [X86] Remove X86 specific scalar FMA intrinsics and upgrade to tart independent FMA and extractelement/insertelement.
llvm-svn: 336315
2018-07-05 06:52:55 +00:00
Lei Huang a855e17f09 [Power9] Ensure float128 in non-homogenous aggregates are passed via VSX reg
Non-homogenous aggregates are passed in consecutive GPRs, in GPRs and in memory,
or in memory. This patch ensures that float128 members of non-homogenous
aggregates are passed via VSX registers.

This is done via custom lowering a bitcast of a build_pari(i64,i64) to float128
to a new PPCISD node, BUILD_FP128.

Differential Revision: https://reviews.llvm.org/D48308

llvm-svn: 336310
2018-07-05 06:21:37 +00:00
Lei Huang d17c39ccaa [Power9]Legalize and emit code for quad-precision convert from single-precision
Legalize and emit code for quad-precision floating point operation conversion of
single-precision value to quad-precision.

Differential Revision: https://reviews.llvm.org/D47569

llvm-svn: 336307
2018-07-05 04:18:37 +00:00
Lei Huang a26f3be454 [Power9] Implement float128 parameter passing and return values
This patch enable parameter passing and return by value for float128 types.
Passing aggregate/union which contain float128 members will be submitted in
subsequent patches.

Differential Revision: https://reviews.llvm.org/D47552

llvm-svn: 336306
2018-07-05 04:10:15 +00:00
Craig Topper 2db909cfae [X86] Remove some isel patterns for X86ISD::SELECTS that specifically looked for the v1i1 mask to have come from a scalar_to_vector from GR8.
We have patterns for SELECTS that top at v1i1 and we have a pattern for (v1i1 (scalar_to_vector GR8)). The patterns being removed here do the same thing as the two other patterns combined so there is no need for them.

llvm-svn: 336305
2018-07-05 03:01:29 +00:00
Craig Topper 95eb88abfe [X86] Add support for combining FMSUB/FNMADD/FNMSUB ISD nodes with an fneg input.
Previously we could only negate the FMADD opcodes. This used to be mostly ok when we lowered FMA intrinsics during lowering. But with the move to llvm.fma from target specific intrinsics, we can combine (fneg (fma)) to (fmsub) earlier. So if we start with (fneg (fma (fneg))) we would get stuck at (fmsub (fneg)).

This patch fixes that so we can also combine things like (fmsub (fneg)).

llvm-svn: 336304
2018-07-05 02:52:56 +00:00
Craig Topper e4b9257b69 [X86] Remove some of the packed FMA3 intrinsics since we no longer use them in clang.
There's a regression in here due to inability to combine fneg inputs of X86ISD::FMSUB/FNMSUB/FNMADD nodes.

More removals to come, but I wanted to stop and fix the regression that showed up in this first.

llvm-svn: 336303
2018-07-05 02:52:54 +00:00
Lei Huang 6270ab6ce4 [Power9]Legalize and emit code for round & convert quad-precision values
Legalize and emit code for round & convert float128 to double precision and
single precision.

Differential Revision: https://reviews.llvm.org/D46997

llvm-svn: 336299
2018-07-04 21:59:16 +00:00
Vladimir Stefanovic 87b60a0e00 [mips] Warn when crc, ginv, virt flags are used with too old revision
CRC and GINV ASE require revision 6, Virtualization requires revision 5.
Print a warning when revision is older than required.

Differential Revision: https://reviews.llvm.org/D48843

llvm-svn: 336296
2018-07-04 19:26:31 +00:00
Stefan Pintilie cb4f0c5c07 [PowerPC] Replace the Post RA List Scheduler with the Machine Scheduler
We want to run the Machine Scheduler instead of the List Scheduler after RA.
  Checked with a performance run on a Power 9 machine with SPEC 2006 and while
  some benchmarks improved and others degraded the geomean was slightly improved
  with the Machine Scheduler.

  Differential Revision: https://reviews.llvm.org/D45265

llvm-svn: 336295
2018-07-04 18:54:25 +00:00
Volodymyr Turanskyy 17c0c4e742 [ARM] [Assembler] Support negative immediates: cover few missing cases
Support for negative immediates was implemented in
https://reviews.llvm.org/rL298380, however few instruction options were missing.

This change adds negative immediates support and respective tests
for the following:

ADD
ADDS
ADDS.W
AND.W
ANDS
BIC.W
BICS
BICS.W
SUB
SUBS
SUBS.W

Differential Revision: https://reviews.llvm.org/D48649

llvm-svn: 336286
2018-07-04 16:11:15 +00:00
Yvan Roux eaececf5e0 [MachineOutliner] Fix typo in getOutliningCandidateInfo function name
getOutlininingCandidateInfo -> getOutliningCandidateInfo

Differential Revision: https://reviews.llvm.org/D48867

llvm-svn: 336285
2018-07-04 15:37:08 +00:00
Sander de Smalen 1e4dc2e97d [AArch64][SVE] Asm: Support for reversed subtract (SUBR) instruction.
This patch adds both a vector and an immediate form, e.g.                  
                                                                           
- Vector form:                                                             
                                                                           
    subr z0.h, p0/m, z0.h, z1.h                                            
                                                                           
  subtract active elements of z0 from z1, and store the result in z0.      
                                                                           
- Immediate form:                                                          
                                                                           
    subr z0.h, z0.h, #255                                                  
                                                                           
  subtract elements of z0, and store the result in z0.

llvm-svn: 336274
2018-07-04 14:05:33 +00:00
Sander de Smalen ab2b0530d9 [AArch64][SVE] Asm: Support for instructions to set/read FFR.
Includes instructions to read the First-Faulting Register (FFR):
- RDFFR (unpredicated)
    rdffr   p0.b
- RDFFR (predicated)
    rdffr   p0.b, p0/z
- RDFFRS (predicated, sets condition flags)
    rdffr   p0.b, p0/z

Includes instructions to set/write the FFR:
- SETFFR (no arguments, sets the FFR to all true)
    setffr
- WRFFR  (unpredicated)
    wrffr   p0.b

llvm-svn: 336267
2018-07-04 12:58:46 +00:00
Sander de Smalen 80283b2af4 [AArch64][SVE] Asm: Support for FP conversion instructions.
The variants added are:

- fcvt   (FP convert precision)
- scvtf  (signed int -> FP) 
- ucvtf  (unsigned int -> FP) 
- fcvtzs (FP -> signed int (round to zero))
- fcvtzu (FP -> unsigned int (round to zero))

For example:
  fcvt   z0.h, p0/m, z0.s  (single- to half-precision FP) 
  scvtf  z0.h, p0/m, z0.s  (32-bit int to half-precision FP) 
  ucvtf  z0.h, p0/m, z0.s  (32-bit unsigned int to half-precision FP) 
  fcvtzs z0.s, p0/m, z0.h  (half-precision FP to 32-bit int)
  fcvtzu z0.s, p0/m, z0.h  (half-precision FP to 32-bit unsigned int)

llvm-svn: 336265
2018-07-04 12:13:17 +00:00
Simon Pilgrim c3e1617bf9 [X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values (REAPPLIED)
We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case.

Reapplied with a fixed (extra null tests) version of rL336113 after reversion in rL336189 - extra test case added at rL336247.

llvm-svn: 336250
2018-07-04 09:12:48 +00:00
Sander de Smalen e31e6d46dd [AArch64][SVE] Asm: Support for SVE condition code aliases
SVE overloads the AArch64 PSTATE condition flags and introduces
a set of condition code aliases for the assembler. The 
details are described in section 2.2 of the architecture
reference manual supplement for SVE.

In short:

  SVE alias =>  AArch64 name
  --------------------------
  NONE      => EQ
  ANY       => NE
  NLAST     => HS
  LAST      => LO
  FIRST     => MI
  NFRST     => PL
  PMORE     => HI
  PLAST     => LS
  TCONT     => GE
  TSTOP     => LT

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48869

llvm-svn: 336245
2018-07-04 08:50:49 +00:00
Jacques Pienaar 0b1e1a4cd4 [lanai] Handle atomic load of i8 like regular load.
Loads and stores less than 64-bits are already atomic, this adds support for a special case thereof. This needs to be expanded.

llvm-svn: 336236
2018-07-03 22:57:51 +00:00
Fangrui Song 78ab286aa0 [X86][AsmParser] Fix inconsistent declaration parameter name in r336218
llvm-svn: 336232
2018-07-03 21:40:03 +00:00
Benjamin Kramer 2f0dd1405e [NVPTX] Expand v2f16 INSERT_VECTOR_ELT
Vectorization can create them.

llvm-svn: 336227
2018-07-03 20:40:04 +00:00
Craig Topper e317533dcf [X86] Remove repeated 'the' from multiple comments that have been copy and pasted. NFC
llvm-svn: 336226
2018-07-03 20:39:55 +00:00
Fangrui Song 68169343a5 [ARM] Fix inconsistent declaration parameter name in r336195
llvm-svn: 336223
2018-07-03 19:12:27 +00:00
Fangrui Song bc5c7f2ef0 [AArch64] Make function parameter names in declarations match those of definitions
llvm-svn: 336222
2018-07-03 19:07:53 +00:00
Craig Topper adc51ae425 [X86][AsmParser] Rework the in/out (%dx) hack one more time.
This patch adds a new token type specifically for (%dx). We will now always create this token when we parse (%dx). After all operands have been parsed, if the mnemonic is in/out we'll morph this token to a regular register token. Otherwise we keep it as the special DX token which won't match any instructions.

This removes the need for passing Mnemonic through the parsing functions. It also seems closer to gas where when its used on the wrong instruction it just gets diagnosed as an invalid operand rather than a bad memory address.

llvm-svn: 336218
2018-07-03 18:07:30 +00:00
Craig Topper bc598f0d61 [X86][AsmParser] Don't consider %eip as a valid register outside of 32-bit mode.
This might make the error message added in r335668 unneeded, but I'm not sure yet.

The check for RIP is technically unnecessary since RIP is in GR64, but that fact is kind of surprising so be explicit.

llvm-svn: 336217
2018-07-03 17:40:51 +00:00
Sander de Smalen 128fdfa23f [AArch64][SVE] Asm: Support for FP Complex ADD/MLA.
The variants added in this patch are:

- Predicated Complex floating point ADD with rotate, e.g.

   fcadd   z0.h, p0/m, z0.h, z1.h, #90

- Predicated Complex floating point MLA with rotate, e.g.

   fcmla   z0.h, p0/m, z1.h, z2.h, #180

- Unpredicated Complex floating point MLA with rotate (indexed operand), e.g.

   fcmla   z0.h, p0/m, z1.h, z2.h[0], #180

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48824

llvm-svn: 336210
2018-07-03 16:01:27 +00:00
Amara Emerson d912ffaba5 [AArch64][GlobalISel] Fix fallbacks introduced in r336120 due to unselectable stores.
r336120 resulted in falling back to SelectionDAG more often due to the G_STORE
MMOs not matching the vreg size. This fixes that by explicitly any-extending the
value.

llvm-svn: 336209
2018-07-03 15:59:26 +00:00
Sander de Smalen 8cd1f53334 [AArch64][SVE] Asm: Support for FMUL (indexed)
Unpredicated FP-multiply of SVE vector with a vector-element given by
vector[index], for example:

  fmul z0.s, z1.s, z2.s[0]

which performs an unpredicated FP-multiply of all 32-bit elements in
'z1' with the first element from 'z2'.

This patch adds restricted register classes for SVE vectors:
  ZPR_3b (only z0..z7 are allowed)  - for indexed vector of 16/32-bit elements.
  ZPR_4b (only z0..z15 are allowed) - for indexed vector of 64-bit elements.

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48823

llvm-svn: 336205
2018-07-03 15:31:04 +00:00
Sander de Smalen cbd224941f [AArch64][SVE] Asm: Support for predicated unary operations.
The patch includes support for the following instructions:

       ABS z0.h, p0/m, z0.h
       NEG z0.h, p0/m, z0.h

  (S|U)XTB z0.h, p0/m, z0.h
  (S|U)XTB z0.s, p0/m, z0.s
  (S|U)XTB z0.d, p0/m, z0.d

  (S|U)XTH z0.s, p0/m, z0.s
  (S|U)XTH z0.d, p0/m, z0.d

  (S|U)XTW z0.d, p0/m, z0.d

llvm-svn: 336204
2018-07-03 14:57:48 +00:00
Sam Parker ffc1681620 [ARM][NFC] Refactor sequential access for DSP
With a view to support parallel operations that have their results
stored to memory, refactor the consecutive access helper out so it
could support stores instructions.

Differential Revision: https://reviews.llvm.org/D48872

llvm-svn: 336195
2018-07-03 12:44:16 +00:00
Sjoerd Meijer 173b7f0ec7 [AArch64] Armv8.4-A: system registers
This adds the following system registers:
- RAS registers,
- MPAM registers,
- Activitiy monitor registers,
- Trace Extension registers,
- Timing insensitivity of data processing instructions,
- Enhanced Support for Nested Virtualization.

Differential Revision: https://reviews.llvm.org/D48871

llvm-svn: 336193
2018-07-03 12:09:20 +00:00
Benjamin Kramer fd171f2f89 Revert "[X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values"
This reverts commit r336113. It causes crashes.

llvm-svn: 336189
2018-07-03 11:15:17 +00:00
Sander de Smalen 7fc8543208 [AArch64][SVE] Asm: Support for saturing ADD/SUB instructions.
The variants added are:
    signed Saturating ADD/SUB (immediate)  e.g. sqadd z0.h, z0.h, #42
  unsigned Saturating ADD/SUB (immediate)  e.g. uqadd z0.h, z0.h, #42
    signed Saturating ADD/SUB (vectors)    e.g. sqadd z0.h, z0.h, z1.h
  unsigned Saturating ADD/SUB (vectors)    e.g. uqadd z0.h, z0.h, z1.h

llvm-svn: 336186
2018-07-03 09:48:22 +00:00
Petar Jovanovic 226e6117ae [MIPS GlobalISel] Lower arguments using stack
Lower more than 4 arguments using stack. This patch targets MIPS32.
It supports only functions with arguments of type i32.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D47934

llvm-svn: 336185
2018-07-03 09:31:48 +00:00
Sander de Smalen 8fcc3f5feb [AArch64][SVE] Asm: Support for vector element FP compare.
Contains the following variants:

- Compare with (elements from) other vector
  instructions: fcmeq, fcmgt, fcmge, fcmne, fcmuo.
  aliases: fcmle, fcmlt.

  e.g. fcmle   p0.h, p0/z, z0.h, z1.h => fcmge p0.h, p0/z, z1.h, z0.h

- Compare absolute values with (absolute values from) other vector.
  instructions: facge, facgt.
  aliases: facle, faclt.

  e.g. facle   p0.h, p0/z, z0.h, z1.h => facge   p0.h, p0/z, z1.h, z0.h

- Compare vector elements with #0.0
  instructions: fcmeq, fcmgt, fcmge, fcmle, fcmlt, fcmne.

  e.g. fcmle   p0.h, p0/z, z0.h, #0.0

llvm-svn: 336182
2018-07-03 09:07:23 +00:00
Heejin Ahn 402b490843 [WebAssembly] Support for atomic stores
Summary: Add support for atomic store instructions.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D48839

llvm-svn: 336145
2018-07-02 21:22:59 +00:00
Vadzim Dambrouski fd10286e04 [ARM] Fix PR37382: Don't optimize mul.with.overflow on thumbv6m.
Reviewers: efriedma, rogfer01, javed.absar

Reviewed By: efriedma, rogfer01

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D48846

llvm-svn: 336144
2018-07-02 21:05:26 +00:00
Dan Gohman b01d87622b [WebAssembly] Fix fast-isel optimization of branch conditions.
LLVM doesn't guarantee anything about the high bits of a register holding
an i1 value at the IR level, so don't translate LLVM IR i1 values directly
into WebAssembly conditional branch operands. WebAssembly's conditional
branches do demand all 32 bits be valid.

Fixes PR38019.

llvm-svn: 336138
2018-07-02 19:45:57 +00:00
Krzysztof Parzyszek fd97494984 [X86] Add phony registers for high halves of regs with low halves
Add registers still missing after r328016 (D43353):
- for bits 15-8  of SI, DI, BP, SP (*H), and R8-R15 (*BH),
- for bits 31-16 of R8-R15 (*WH).

Thanks to Craig Topper for pointing it out.

llvm-svn: 336134
2018-07-02 19:05:09 +00:00
Craig Topper 56440b9745 [X86] Don't use aligned load/store instructions for fp128 if the load/store isn't aligned.
Similarily, don't fold fp128 loads into SSE instructions if the load isn't aligned. Unless we're targeting an AMD CPU that doesn't check alignment on arithmetic instructions.

Should fix PR38001

llvm-svn: 336121
2018-07-02 17:01:54 +00:00
Amara Emerson 846f2436e8 [AArch64][GlobalISel] Any-extend vararg parameters to stack slot size on Darwin.
We currently don't any-extend vararg parameters before storing them to the stack
locations on Darwin. However, SelectionDAG however does this, and so user code
is in the wild which inadvertently relies on this extension. This can manifest
in cases where the value stored is (int)0, but the actual parameter is interpreted
by va_arg as a pointer, and so not extending to 64 bits causes the callee to
load additional undefined bits.

llvm-svn: 336120
2018-07-02 16:39:09 +00:00
Simon Pilgrim 2bc8e079f2 [X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values
We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case.

llvm-svn: 336113
2018-07-02 15:14:07 +00:00
Alex Bradbury c48908781d [X86] Use addAliasForDirective to support the .word directive (reland)
The X86 asm parser currently has custom parsing logic for .word. Rather than
use this custom logic, we can just use addAliasForDirective to enable the
reuse of AsmParser::parseDirectiveValue.

See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon
(rL332607) backends.

Differential Revision: https://reviews.llvm.org/D47004

This is a fixed reland of rL336100. This should have been caught in 
pre-commit testing so apologies for the noise.

llvm-svn: 336104
2018-07-02 13:49:52 +00:00