Commit Graph

47310 Commits

Author SHA1 Message Date
Craig Topper ebd3e4a69c [X86] Remove SLDT64m instruction.
It doesn't really exist. The instruction always writes 16-bits of memory. Putting a REX.w on it won't change anything.

While I was touching the encoding tests to remove it, I added some other missing register form test cases.

llvm-svn: 331135
2018-04-29 04:50:53 +00:00
Craig Topper 8e3dbfd725 [X86] Remove unnecessary InstAliases. NFCI
These used to disambiguate MOV16ms/MOV16sm from other size instructions that no longer exist.

llvm-svn: 331134
2018-04-29 04:06:02 +00:00
Craig Topper a3f52aaa19 [X86] Use getX86SubSuperRegister in addGR32orGR64Operands in the AsmParser instead of duplicating its functionality. NFC
llvm-svn: 331128
2018-04-29 00:53:10 +00:00
Craig Topper 06624e1a93 [X86] Restrict many of the InstAliases to either to only att or intel syntax. NFCI
Many of these aliases exist to give one syntax or the other a slightly different mnemonic and the other variant gets a duplicate of its normal mnemonic

This patch restricts a lot of these to only one variant so we don't get the duplication.

This removes a lot of duplicate entries from the matcher table. It also reduces the number of warnings printed when you enable the ambiguous match warning in tablegen.

llvm-svn: 331117
2018-04-28 18:46:11 +00:00
Simon Pilgrim badf63e95c [X86] Remove unnecessary rotate-carry folded InstRW overrides.
Merge some remaining instregex entries.

llvm-svn: 331116
2018-04-28 18:45:16 +00:00
Daniel Sanders 5eb9f581b6 [globalisel][legalizerinfo] Introduce dedicated extending loads and add lowerings for them
Summary:
Previously, a extending load was represented at (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
  registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
  extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
  improve optimization of extends and truncates, this legality requirement
  would spread without considerable care w.r.t when certain combines were
  permitted.
* The SelectionDAG importer required some ugly and fragile pattern
  rewriting to translate patterns into this style.

This patch begins changing the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.

This patch introduces the new generic instructions and new variation on
G_LOAD and adds lowering for them to convert back to the existing
representations.

Depends on D45466

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, aemerson, javed.absar

Reviewed By: aemerson

Subscribers: aemerson, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D45540

llvm-svn: 331115
2018-04-28 18:14:50 +00:00
Simon Pilgrim 39d7720354 [X86] Remove unnecessary shift/rotate folded InstRW overrides.
llvm-svn: 331110
2018-04-28 15:32:19 +00:00
Simon Pilgrim 9f561dd54a [X86][SSE] Stop hard coding some instruction scheduler classes.
Make these arguments to the multiclass to allow easier specialization.

llvm-svn: 331107
2018-04-28 14:08:51 +00:00
Simon Pilgrim c2fa056a2d [X86][HW] Cleanup Haswell model. NFCI.
Moved LAHF/SAHF to instrs instead of instregex.

Removed some unnecessary instregex entries.

llvm-svn: 331106
2018-04-28 14:06:28 +00:00
Craig Topper 9e11f96df6 [X86] Remove mayLoad flag from BNDMK/BNDCL/BNDCN/BNDCU.
The instruction documentation specifically says that these instruction don't access memory.

llvm-svn: 331105
2018-04-28 06:58:27 +00:00
Craig Topper 6d6b2b9503 [X86] Change memory operand of BNDMK/BNDCL/BNDCU/BNDCN/BNDST to anymem.
These instruction don't use their memory operands as normal memory operands. They're just used as addresses. They don't have a size because they aren't directly representing a load or store.

llvm-svn: 331104
2018-04-28 06:58:26 +00:00
Craig Topper ef3866a859 [X86] Remove REX.W from 64-bit mode BND instructions.
As far as I can tell from the docs, the instructions are automatically 64-bit in 64-bit mode. We don't need REX.W.

llvm-svn: 331102
2018-04-28 06:02:40 +00:00
Craig Topper 8a6532ae84 [X86] Rename BNDMOV instructions and hide redundant instruction encoding from the assembler.
Favor the 0x1a encoding for register/register move to match gas.

The instructions used RM and MR in their name along with rr/rm/mr at the end. To make more consistent with other instructions remove the RM/MR and use rr/rm/mr/rr_REV.

Hide the _REV encoding from the assembler but leave it for the disassembler.

llvm-svn: 331101
2018-04-28 06:02:39 +00:00
Jessica Paquette 0b6724917a [MachineOutliner] Add defs to calls + don't track liveness on outlined functions
This commit makes it so that if you outline a def of some register, then the
call instruction created by the outliner actually reflects that the register
is defined by the call. It also makes it so that outlined functions don't
have the TracksLiveness property.

Outlined calls shouldn't break liveness assumptions that someone might make.

This also un-XFAILs the noredzone test, and updates the calls test.

llvm-svn: 331095
2018-04-27 23:36:35 +00:00
Craig Topper d656410293 [X86] Make the STTNI flag intrinsics use the flags from pcmpestrm/pcmpistrm if the mask instrinsics are also used in the same basic block.
Summary:
Previously the flag intrinsics always used the index instructions even if a mask instruction also exists.

To fix fix this I've created a single ISD node type that returns index, mask, and flags. The SelectionDAG CSE process will merge all flavors of intrinsics with the same inputs to a s ingle node. Then during isel we just have to look at which results are used to know what instruction to generate. If both mask and index are used we'll need to emit two instructions. But for all other cases we can emit a single instruction.

Since I had to do manual isel anyway, I've removed the pseudo instructions and custom inserter code that was working around tablegen limitations with multiple implicit defs.

I've also renamed the recently added sse42.ll test case to sttni.ll since it focuses on that subset of the sse4.2 instructions.

Reviewers: chandlerc, RKSimon, spatel

Reviewed By: chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46202

llvm-svn: 331091
2018-04-27 22:15:33 +00:00
Simon Pilgrim 8ee7d01dcf [X86] Merge some x87 instruction instregex single matches. NFCI.
llvm-svn: 331084
2018-04-27 21:14:19 +00:00
Daniel Sanders 27fe8a5011 [globalisel][legalizerinfo] Add support for legalization based on the MachineMemOperand
Summary:
Currently only the memory size is supported but others can be added as
needed.

narrowScalar for G_LOAD and G_STORE now correctly update the
MachineMemOperand and will refuse to legalize atomics since those need more
careful expansions to maintain atomicity.

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, aemerson, javed.absar

Reviewed By: aemerson

Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D45466

llvm-svn: 331071
2018-04-27 19:48:53 +00:00
Jun Bum Lim 47aece1344 [CodeGen] Use RegUnits to track register aliases (NFC)
Summary: Use RegUnits to track register aliases in PostRASink and AArch64LoadStoreOptimizer.

Reviewers: thegameg, mcrosier, gberry, qcolombet, sebpop, MatzeB, t.p.northover, javed.absar

Reviewed By: thegameg, sebpop

Subscribers: javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45695

llvm-svn: 331066
2018-04-27 18:44:37 +00:00
Simon Pilgrim 8a937e00d8 [X86] Split WriteFBlend/WriteFVarBlend/WriteFVarShuffle into XMM and YMM/ZMM scheduler classes
This removes all the WriteFBlend/WriteFVarBlend InstRW overrides - some WriteFVarShuffle remain to be fixed.

llvm-svn: 331065
2018-04-27 18:19:48 +00:00
Simon Pilgrim c3c767bf50 [X86] Split WriteFHadd into XMM and YMM/ZMM scheduler classes
This removes all the HADD/HSUB PS/PD InstRW overrides.

llvm-svn: 331054
2018-04-27 16:11:57 +00:00
Simon Pilgrim b2aa89c909 [X86][AVX] Split WriteFLogic into XMM and YMM/ZMM scheduler classes
This removes all the AND/ANDN/OR/XOR PS/PD InstRW overrides.

llvm-svn: 331051
2018-04-27 15:50:33 +00:00
Simon Dardis e3c3c5a7a7 [mips] Analyze and provide selection patterns microMIPSR6 branches
These branches were previously unanalyzable and unselectable. Add them and
recognize how to generate their inverses.

Reviewers: smaksimovic, atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D46113

llvm-svn: 331050
2018-04-27 15:49:49 +00:00
Francis Visoiu Mistrih c855e92ca9 [AArch64] Place the first ldp at the end when ReverseCSRRestoreSeq is true
Put the first ldp at the end, so that the load-store optimizer can run
and merge the ldp and the add into a post-index ldp.

This didn't work in case no frame was needed and resulted in code size
regressions.

llvm-svn: 331044
2018-04-27 15:30:54 +00:00
Jonas Paulsson 9a485985cd [SystemZ] Remove scheduling info from some Pseudo instructions (NFC).
If the MachineInstr uses a custom inserter and is then erased after
instruction selection, there is no use for mapping it to a sched class.

Review: Ulrich Weigand
llvm-svn: 331040
2018-04-27 14:09:03 +00:00
Oliver Stannard 76088a5929 [AArch64] Codegen for v8.2A dot product intrinsics
This adds IR intrinsics for the AArch64 dot-product instructions introduced in
v8.2-A.

Differential revisioon: https://reviews.llvm.org/D46107

llvm-svn: 331036
2018-04-27 13:45:32 +00:00
Benjamin Kramer 733c7fc55d [NVPTX] Turn on Loop/SLP vectorization
Since PTX has grown a <2 x half> datatype vectorization has become more
important. The late LoadStoreVectorizer intentionally only does loads
and stores, but now arithmetic has to be vectorized for optimal
throughput too.

This is still very limited, SLP vectorization happily creates <2 x half>
if it's a legal type but there's still a lot of register moving
happening to get that fed into a vectorized store. Overall it's a small
performance win by reducing the amount of arithmetic instructions.

I haven't really checked what the loop vectorizer does to PTX code, the
cost model there might need some more tweaks. I didn't see it causing
harm though.

Differential Revision: https://reviews.llvm.org/D46130

llvm-svn: 331035
2018-04-27 13:36:05 +00:00
Simon Pilgrim aef5ca7299 [X86] Replace some system instruction instregex single matches with instrs entry. NFCI.
llvm-svn: 331034
2018-04-27 13:32:42 +00:00
Aleksandar Beserminji 3546c1603a [mips] Fix how compiler fuse instructions to fmadd/fmsub
This patch makes compiler does not fuse fmul and fadd/fsub into
fmadd/fmsub by default. Instead, -fp-contract=fast option can
be used when such behavior is desired.

Differential Revision: https://reviews.llvm.org/D46057

llvm-svn: 331033
2018-04-27 13:30:27 +00:00
Oliver Stannard f3632143da [ARM] Codegen for v8.2A dot product intrinsics
This adds IR intrinsics for the ARM dot-product instructions introduced in
v8.2-A.

Differential revision: https://reviews.llvm.org/D46106

llvm-svn: 331032
2018-04-27 12:50:40 +00:00
David Green c4cccea4c9 [ARM] Enable misched for R52.
Back when the R52 schedule was added in rL286949, there was no way
to enable machine schedules in ARM for specific cores. Since then a
target feature has been added. This enables the feature for R52,
removing the need to manually specify compiler flags.

llvm-svn: 331027
2018-04-27 11:29:49 +00:00
Petar Jovanovic d4349f3bf6 [mips] Add support for Virtualization ASE
This includes

  Instructions: tlbginv, tlbginvf, tlbgp, tlbgr, tlbgwi, tlbgwr, hypcall
                mfgc0, mtgc0, mfhgc0, mthgc0, dmfgc0, dmtgc0,

  Assembler directives: .set virt, .set novirt, .module virt, .module novirt

  Attribute: virt

  .MIPS.abiflags: VZ (0x100)

Patch by Vladimir Stefanovic.

Differential Revision: https://reviews.llvm.org/D44905

llvm-svn: 331024
2018-04-27 09:12:08 +00:00
Eli Friedman da018e5687 [MachineOutliner] Don't outline from functions with a section marking.
The program might have unusual expectations for functions; for example,
the Linux kernel's build system warns if it finds references from .text
to .init.data.

I'm not sure this is something we actually want to make any guarantees
about (there isn't any explicit rule that would disallow outlining
in this case), but we might want to be conservative anyway.

Differential Revision: https://reviews.llvm.org/D46091

llvm-svn: 331007
2018-04-27 00:21:34 +00:00
Chandler Carruth 16429acacb [x86] Revert r330322 (& r330323): Lowering x86 adds/addus/subs/subus intrinsics
The LLVM commit introduces a crash in LLVM's instruction selection.

I filed http://llvm.org/PR37260 with the test case.

llvm-svn: 330997
2018-04-26 21:46:01 +00:00
Simon Atanasyan d4d892ff9f [mips] Accept 32-bit offsets for lb and lbu commands
`lb` and `lbu` commands accepts 16-bit signed offsets. But GAS accepts
larger offsets for these commands. If an offset does not fit in 16-bit
range, `lb` command is translated into lui/lb or lui/addu/lb series.
It's interesting that initially LLVM assembler supported this feature,
but later it was broken.

This patch restores support for 32-bit offsets. It replaces `mem_simm16`
operand for `LB` and `LBu` definitions by the new `mem_simmptr` operand.
This operand is intended to check that offset fits to the same size as
using for pointers. Later we will be able to extend this rule and
accepts 64-bit offsets when it is possible.

Some issues remain:
- The regression also affects LD, SD, LH, LHU commands. I'm going
  to fix them by a separate patch.

- GAS accepts any 32-bit values as an offset. Now LLVM accepts signed
  16-bit values and this patch extends the range to signed 32-bit offsets.
  In other words, the following code accepted by GAS and still triggers
  an error by LLVM:
```
  lb      $4, 0x80000004

  # gas
  lui     a0, 0x8000
    lb      a0, 4(a0)
```

- In case of 64-bit pointers GAS accepts a 64-bit offset and translates
  it to the li/dsll/lb series of commands. LLVM still rejects it.
  Probably this feature has never been implemented in LLVM. This issue
  is for a separate patch.
```
  lb      $4, 0x800000001

  # gas
  li      a0, 0x8000
  dsll    a0, a0, 0x14
  lb      a0, 4(a0)
```

Differential Revision: https://reviews.llvm.org/D45020

llvm-svn: 330983
2018-04-26 19:55:28 +00:00
Sam Clegg 6a31a0d694 [WebAssembly] Write DWARF data into wasm object file
- Writes ".debug_XXX" into corresponding custom sections.
- Writes relocation records into "reloc.debug_XXX" sections.

Patch by Yury Delendik!

Differential Revision: https://reviews.llvm.org/D44184

llvm-svn: 330982
2018-04-26 19:27:28 +00:00
Matt Arsenault 540512c297 DAG: Fix not legalizing vector fcanonicalizes
If an fcanoncialize was done on a vector type that was legal,

llvm-svn: 330981
2018-04-26 19:21:37 +00:00
Matt Arsenault fcc5ba46b7 AMDGPU: Extend extract_vector_elt fneg combine to fabs
Fixes a regression in a future commit.

llvm-svn: 330980
2018-04-26 19:21:32 +00:00
Matt Arsenault 8474803c7c AMDGPU: Consolidate SubtargetPredicate definitions
llvm-svn: 330979
2018-04-26 19:21:26 +00:00
Geoff Berry 08ab8c9544 [AArch64] Fix scavenged spill slot base when stack realignment required.
Summary:
Use the FP for scavenged spill slot accesses to prevent corruption of
the callee-save region when the SP is re-aligned.

Based on problem and patch reported by @paulwalker-arm

This is an alternative to solution proposed in D45770

Reviewers: t.p.northover, paulwalker-arm, thegameg, javed.absar

Subscribers: qcolombet, mcrosier, paulwalker-arm, kristof.beyls, rengolin, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46063

llvm-svn: 330976
2018-04-26 18:50:45 +00:00
Mark Searles 2a19af6e17 [AMDGPU][Waitcnt] As of gfx7, VMEM operations do not increment the export counter and the input registers are available in the next instruction; update the waitcnt pass to take this into account.
Differential Revision: https://reviews.llvm.org/D46067

llvm-svn: 330954
2018-04-26 16:11:19 +00:00
Simon Dardis 8086b9db3d [mips] Correct the definitions of some control instructions
Correct the definitions of ei, di, eret, deret, wait, syscall and break.
Also provide microMIPS specific aliases to match the MIPS aliases.

Additionally correct the definition of the wait instruction so that
it is present in the instruction mapping tables.

Reviewers: smaksimovic, abeserminji, atanasyan

Differential Revision: https://reviews.llvm.org/D45939

llvm-svn: 330952
2018-04-26 16:06:34 +00:00
Alex Bradbury fda6037e98 [RISCV] Implement isLoadFromStackSlot and isStoreToStackSlot
This causes some slight shuffling but no meaningful codegen differences on the 
corpus I used for testing, but it has a larger impact when combined with e.g. 
rematerialisation. Regardless, it makes sense to report as accurate 
target-specific information as possible.

llvm-svn: 330949
2018-04-26 15:34:27 +00:00
Benjamin Kramer 7dd437710e [NVPTX] Make the legalizer expand shufflevector of <2 x half>
There's no direct instruction for this, but it's trivially implemented
with two movs. Without this the code generator just dies when
encountering a shufflevector.

Differential Revision: https://reviews.llvm.org/D46116

llvm-svn: 330948
2018-04-26 15:26:29 +00:00
Alex Bradbury 15e894baee [RISCV] Implement isZextFree
This returns true for 8-bit and 16-bit loads, allowing LBU/LHU to be selected
and avoiding unnecessary masks.

llvm-svn: 330943
2018-04-26 14:04:18 +00:00
Matthew Simpson b4096ebe26 [TTI, AArch64] Add transpose shuffle kind
This patch adds a new shuffle kind useful for transposing a 2xn matrix. These
transpose shuffle masks read corresponding even- or odd-numbered vector
elements from two n-dimensional source vectors and write each result into
consecutive elements of an n-dimensional destination vector. The transpose
shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such,
this patch also considers transpose shuffles in the AArch64 implementation of
getShuffleCost.

Differential Revision: https://reviews.llvm.org/D45982

llvm-svn: 330941
2018-04-26 13:48:33 +00:00
Alex Bradbury 130b8b3f2b [RISCV] Implement isTruncateFree
Adapted from ARM's implementation introduced in r313533 and r314280.

llvm-svn: 330940
2018-04-26 13:37:00 +00:00
Lama Saba a331f91853 [X86] Fix Update Kill Register in Avoid SFB Pass - Bug 37153
Differential Revision: https://reviews.llvm.org/D45823

Change-Id: Icf6f34f6babc3cb2ff5292fde003472473037a71
llvm-svn: 330939
2018-04-26 13:16:11 +00:00
Alex Bradbury dcbff63c24 [RISCV] Implement isLegalICmpImmediate
I'm unable to construct a representative test case that demonstrates the 
advantage, but it seems sensible to report accurate target-specific 
information regardless.

llvm-svn: 330938
2018-04-26 13:15:17 +00:00
Alex Bradbury 5c41ecedf8 [RISCV] Implement isLegalAddImmediate
This causes a trivial improvement in the recently added lsr-legaladdimm.ll 
test case.

llvm-svn: 330937
2018-04-26 13:00:37 +00:00
Sander de Smalen fe17a78b86 [AArch64][SVE] Enable DiagnosticPredicates for SVE LD1 instructions.
This patch extends the PredicateMethod of AsmOperands used in SVE's 
LD1 instructions with a DiagnosticPredicate. This makes them 'context
sensitive' to the operand that has been parsed and tells the user to
use the right register (with expected shift/extend), rather than telling 
the immediate is out of range when it actually parsed a register.

Patch [2/2] in a series to improve assembler diagnostics for SVE:
-  Patch [1/2]: https://reviews.llvm.org/D45879
-  Patch [2/2]: https://reviews.llvm.org/D45880

Reviewers: olista01, stoklund, craig.topper, mcrosier, rengolin, echristo, fhahn, SjoerdMeijer, evandro, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D45880

llvm-svn: 330934
2018-04-26 12:54:42 +00:00
Benjamin Kramer bd89647229 [NVPTX] Deduplicate code. No functionality change.
llvm-svn: 330933
2018-04-26 12:30:16 +00:00
Alex Bradbury 09926296df [RISCV] Implement isLegalAddressingMode for RISC-V
This has no impact on codegen for the current RISC-V unit tests or my small 
benchmark set and very minor changes in a few programs in the GCC torture 
suite. Based on this, I haven't been able to produce a representative test 
program that demonstrates a benefit from isLegalAddressingMode. I'm committing 
the patch anyway, on the basis that presenting accurate information to the 
target-independent code is preferable to relying on incorrect generic 
assumptions.

llvm-svn: 330932
2018-04-26 12:13:48 +00:00
Sander de Smalen 74f9e6720b [AArch64][SVE] Asm: Support for gather LD1/LDFF1 (scalar + vector) load instructions.
Patch [2/3] in series to add support for SVE's gather load instructions
that use scalar+vector addressing modes:
- Patch [1/3]: https://reviews.llvm.org/D45951
- Patch [2/3]: https://reviews.llvm.org/D46023
- Patch [3/3]: https://reviews.llvm.org/D45958

Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, t.p.northover, echristo, evandro, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D46023

llvm-svn: 330928
2018-04-26 08:19:53 +00:00
Craig Topper bc26f3b61b [X86] Print 'tbyte ptr' instead of 'xword ptr' for f80mem in Intel syntax.
This matches objdump.

llvm-svn: 330922
2018-04-26 05:07:40 +00:00
Craig Topper b0227189fd [X86] Remove alignment restriction on loading folding of pcmp[ei]str* during isel too.
This is a follow up to the changes in r330896 which enabled folding after isel during peephole and register allocation.

llvm-svn: 330897
2018-04-26 03:53:39 +00:00
Chandler Carruth eb631ef51e [x86] Allow folding unaligned memory operands into pcmp[ei]str*
instructions.

These have special permission according to the x86 manual to read
unaligned memory, and this folding is done by ICC and GCC as well.

This corrects one of the issues identified in PR37246.

llvm-svn: 330896
2018-04-26 03:17:25 +00:00
Simon Pilgrim 2faf606fb6 [CostModel][X86] Remove hard coded SDIV/UDIV vector costs
Algorithmically compute the 'x20' SDIV/UDIV vector costs - this is necessary for PR36550 when DIV costs will be driven from the scheduler models.

llvm-svn: 330870
2018-04-25 20:59:16 +00:00
Tom Stellard dce46fa1cf AMDGPU/R600: Move int_r600_store_stream_output to the public intrinsic file
Summary:
The TableGen'd GlobalISel instruction selector assumes all intrinsics are in
the public Intrinsic:: namespace.

Reviewers: jvesely, nhaehnle

Reviewed By: jvesely, nhaehnle

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45989

llvm-svn: 330866
2018-04-25 20:02:53 +00:00
Mark Searles ec58183e1b [AMDGPU] Waitcnt pass: add debug options
- Add "amdgpu-waitcnt-forcezero" to force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

- Add debug counters to control force emit of s_waitcnt instrs; debug counters:
si-insert-waitcnts-forceexp: force emit s_waitcnt expcnt(0) instrs
si-insert-waitcnts-forcevm: force emit s_waitcnt lgkmcnt(0) instrs
si-insert-waitcnts-forcelgkm: force emit s_waitcnt vmcnt(0) instrs

- Add some debug statements

Note that a variant of this patch was previously committed/reverted.

Differential Revision: https://reviews.llvm.org/D45888

llvm-svn: 330862
2018-04-25 19:21:26 +00:00
Craig Topper 300e20d61c [X86] Form MUL_IMM for multiplies with 3/5/9 to encourage LEA formation over load folding.
Previously we only formed MUL_IMM when we split a constant. This blocked load folding on those cases. We should also form MUL_IMM for 3/5/9 to favor LEA over load folding.

Differential Revision: https://reviews.llvm.org/D46040

llvm-svn: 330850
2018-04-25 17:35:03 +00:00
Alex Bradbury cd8688a4c2 [RISCV] Allow call pseudoinstruction to be used to call a function name that coincides with a register name
Previously `call zero`, `call f0` etc would fail. This leads to compilation 
failures if building programs that define functions with those names and using 
-save-temps.

llvm-svn: 330846
2018-04-25 17:25:29 +00:00
Simon Pilgrim 58e03a09db [CostModel][X86] Recursive call for cost of imul for packed v16i16 constant shift left.
Don't just assume cost = 1.

llvm-svn: 330834
2018-04-25 15:22:03 +00:00
Amara Emerson 1f5d994119 [AArch64][GlobalISel] Implement selection for the llvm.trap intrinsic.
rdar://38674040

llvm-svn: 330831
2018-04-25 14:43:59 +00:00
Shiva Chen d58bd8dc4a [RISCV] Expand function call to "call" pseudoinstruction
To do this:
1. Change GlobalAddress SDNode to TargetGlobalAddress to avoid legalizer
   split the symbol.

2. Change ExternalSymbol SDNode to TargetExternalSymbol to avoid legalizer
   split the symbol.

3. Let PseudoCALL match direct call with target operand TargetGlobalAddress
   and TargetExternalSymbol.

Differential Revision: https://reviews.llvm.org/D44885

llvm-svn: 330827
2018-04-25 14:19:12 +00:00
Shiva Chen 98f9389f65 [RISCV] Support "call" pseudoinstruction in the MC layer
To do this:
1. Add PseudoCALLIndirct to match indirect function call.

2. Add PseudoCALL to support parsing and print pseudo `call` in assembly

3. Expand PseudoCALL to the following form with R_RISCV_CALL relocation type
   while encoding:
        auipc ra, func
        jalr ra, ra, 0

If we expand PseudoCALL before emitting assembly, we will see auipc and jalr
pair when compile with -S. It's hard for assembly parser to parsing this
pair and identify it's semantic is function call and then insert R_RISCV_CALL
relocation type. Although we could insert R_RISCV_PCREL_HI20 and
R_RISCV_PCREL_LO12_I relocation types instead of R_RISCV_CALL.
Due to RISCV relocation design, auipc and jalr pair only can relax to jal with
R_RISCV_CALL + R_RISCV_RELAX relocation types.

We expand PseudoCALL as late as encoding(RISCVMCCodeEmitter) instead of before
emitting assembly(RISCVAsmPrinter) because we want to preserve call
pseudoinstruction in assembly code. It's more readable and assembly parser
could identify call assembly and insert R_RISCV_CALL relocation type.

Differential Revision: https://reviews.llvm.org/D45859

llvm-svn: 330826
2018-04-25 14:18:55 +00:00
Simon Dardis 0f2f5976d0 [mips] Teach the delay slot filler to transform 'jal' for microMIPS
ISel is currently picking 'JAL' over 'JAL_MM' for calling a function when
targeting microMIPS. A later patch will correct this behaviour.

This patch extends the mechanism for transforming instructions into their short
delay to recognise 'JAL_MM' for transforming into 'JALS_MM'.

llvm-svn: 330825
2018-04-25 14:12:57 +00:00
Simon Pilgrim dbd1ae7ddd [X86] Split WriteFMA into XMM, Scalar and YMM/ZMM scheduler classes
This removes all the FMA InstRW overrides.

If we ever get PR36924, then we can remove many of these declarations from models.

llvm-svn: 330820
2018-04-25 13:07:58 +00:00
Alexander Timofeev b934728cd2 [AMDGPU] Revert b0efc4fd6 (https://reviews.llvm.org/D40556)
llvm-svn: 330818
2018-04-25 12:32:46 +00:00
Simon Pilgrim 6a82e96ed9 [X86][SKX] Setup WriteFAdd and remove unnecessary InstRW scheduler overrides.
llvm-svn: 330813
2018-04-25 10:51:19 +00:00
Simon Pilgrim 98e21c5ade [X86][SNB] Remove unnecessary WriteFBlendLd InstRW scheduler overrides.
llvm-svn: 330812
2018-04-25 10:50:39 +00:00
Simon Dardis eac9301cdb [mips] Fix the definition of sync, synci
Also, fix the disassembly of synci for microMIPS.

Reviewers: abeserminji, smaksimovic, atanasyan

Differential Revision: https://reviews.llvm.org/D45870

llvm-svn: 330810
2018-04-25 10:19:22 +00:00
Sander de Smalen eb896b148b [AArch64][SVE] Asm: Add AsmOperand classes for SVE gather/scatter addressing modes.
This patch adds parsing support for 'vector + shift/extend' and
corresponding asm operand classes, needed for implementing SVE's
gather/scatter addressing modes.

The added combinations of vector (ZPR) and Shift/Extend are:

Unscaled:
  ZPR64ExtLSL8:           signed 64-bit offsets  (z0.d)
  ZPR32ExtUXTW8:        unsigned 32-bit offsets  (z0.s, uxtw)
  ZPR32ExtSXTW8:          signed 32-bit offsets  (z0.s, sxtw)

Unpacked and unscaled:
  ZPR64ExtUXTW8:        unsigned 32-bit offsets  (z0.d, uxtw)
  ZPR64ExtSXTW8:          signed 32-bit offsets  (z0.d, sxtw)

Unpacked and scaled:
  ZPR64ExtUXTW<scale>:  unsigned 32-bit offsets  (z0.d, uxtw #<shift>)
  ZPR64ExtSXTW<scale>:    signed 32-bit offsets  (z0.d, sxtw #<shift>)

Scaled:
  ZPR32ExtUXTW<scale>:  unsigned 32-bit offsets  (z0.s, uxtw #<shift>)
  ZPR32ExtSXTW<scale>:    signed 32-bit offsets  (z0.s, sxtw #<shift>)
  ZPR64ExtLSL<scale>:   unsigned 64-bit offsets  (z0.d,  lsl #<shift>)
  ZPR64ExtLSL<scale>:     signed 64-bit offsets  (z0.d,  lsl #<shift>)


Patch [1/3] in series to add support for SVE's gather load instructions
that use scalar+vector addressing modes:
- Patch [1/3]: https://reviews.llvm.org/D45951
- Patch [2/3]: https://reviews.llvm.org/D46023
- Patch [3/3]: https://reviews.llvm.org/D45958

Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, t.p.northover, echristo, evandro, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D45951

llvm-svn: 330805
2018-04-25 09:26:47 +00:00
Jessica Paquette 4f56428de1 [MachineOutliner] Check for explicit uses of LR/W30 in MI operands
Before, the outliner would grab ADRPs that used LR/W30. This patch fixes
that by checking for explicit uses of those registers before the special-casing
for ADRPs.

This also adds a test that ensures that those sorts of ADRPs won't be outlined.

llvm-svn: 330783
2018-04-24 22:38:15 +00:00
Warren Ristow b960d2cb40 [X86] Account for partial stack slot spills (PR30821)
Previously, _any_ store or load instruction was considered to be
operating on a spill if it had a frameindex as an operand, and thus
was fair game for optimisations such as "StackSlotColoring". This
usually works, except on architectures where spills can be partially
restored, for example on X86 where a spilt vector can have a single
component loaded (zeroing the rest of the target register). This can be
mis-interpreted and the zero extension unsoundly eliminated, see
pr30821.

To avoid this, this commit optionally provides the caller to
isLoadFromStackSlot and isStoreToStackSlot with the number of bytes
spilt/loaded by the given instruction. Optimisations can then determine
that a full spill followed by a partial load (or vice versa), for
example, cannot necessarily be commuted.

Patch by Jeremy Morse!

Differential Revision: https://reviews.llvm.org/D44782

llvm-svn: 330778
2018-04-24 22:01:50 +00:00
Tom Stellard a2be8f4c35 AMDGPU: Remove deprecated llvm.AMDGPU.kilp intrinsic
Summary: This is no longer used by mesa since its 18.0.0 release.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D45988

llvm-svn: 330775
2018-04-24 21:37:57 +00:00
Tom Stellard 257882ff72 AMDGPU/GlobalISel: Fall-back to SelectionDAG for non-void functions
Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45843

llvm-svn: 330774
2018-04-24 21:29:36 +00:00
Tom Stellard c7709e1c29 AMDGPU/GlobalISel: Add support for amdgpu_ps calling convention
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45837

llvm-svn: 330767
2018-04-24 20:51:28 +00:00
Simon Pilgrim c4d25a2922 [X86][SKX] Setup WriteFMul and remove unnecessary InstRW scheduler overrides.
llvm-svn: 330760
2018-04-24 19:22:01 +00:00
Simon Pilgrim 27bc83e228 [X86] Split off PHMINPOSUW to their own schedule class
This also fixes Jaguar's schedule which was treating it as the WriteVecIMul default. 

llvm-svn: 330756
2018-04-24 18:49:25 +00:00
Stanislav Mekhanoshin a4bfb3c446 [AMDGPU] Truncate packed inline constant
If a packed inline constant is sign extended it must be truncated
after the shift. I.e. a constant (0xH0000, 0xHBC00), will be represented
as 0xFFFFFFFFBC000000 in the IR because the immediate is sign extended
to 64 bit. After the value shifted right by 16 to use it in a low part
with op_sel_hi it becomes 0xFFFFFFFFBC00 and does not qualify as inline
constant any longer.

Fixed the error and added verification code. Without the fix and with
the verification bug is causing pk_max_f16_literal.ll to fail.

Differential Revision: https://reviews.llvm.org/D45987

llvm-svn: 330752
2018-04-24 18:17:55 +00:00
Simon Pilgrim 81cb67ad82 [XOP] v4i32 IFMA 'VPMACS' instructions should use the WritePMULLD schedule class
llvm-svn: 330751
2018-04-24 18:13:57 +00:00
Simon Pilgrim cf0199a289 [AVX512] VPERMQ/VPERMPD/VPERMIL single op shuffles are not variable shuffles
These variants all take an immediate shuffle mask value and should be scheduled as such.

llvm-svn: 330747
2018-04-24 17:59:54 +00:00
Simon Dardis d2ac0faf3b Reland "[mips] Guard traps for microMIPS correctly"
This is part of fixing the instruction predicates for MIPS.

Reviewers: atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D44212


This patch relands r327409, hopefully without the problematic part of the
tests that cause FileCheck to assert on the windows expensive checks bot.

llvm-svn: 330741
2018-04-24 17:11:37 +00:00
Simon Pilgrim f0945aa0e0 [X86][F16C] Add WriteCvtF2FSt scheduling class
Fixes the classification of VCVTPS2PHmr/VCVTPS2PHYmr which were tagged as WriteCvtF2FLd_WriteRMW (PR36887)

llvm-svn: 330737
2018-04-24 16:43:07 +00:00
Simon Pilgrim 828ef9e013 [X86][BtVer2] Fix VCVTPS2PHmr/VCVTPS2PHYmr latencies
These are stores, not loads, so don't need to account for load latency.

llvm-svn: 330735
2018-04-24 16:26:51 +00:00
Simon Atanasyan 9df3be3ccb [mips] Show an error if register number is out of range
Current code does not check that a register number is in the 0-31 range.
Sometimes the parser checks that later for some kinds of instructions,
but that leads to unclear / incorrect error messages like that:

  % cat test.s
  .text
  lb $4, 8($32)

  % llvm-mc test.s -triple=mips64-unknown-linux
  test.s:2:10: error: expected memory with 16-bit signed offset
    lb $4, 8($32)
           ^

Sometimes the parser just crashes:

  % cat test.s
  .text
  lw  $4, 8($32)

  % llvm-mc test.s -triple=mips64-unknown-linux

This patch resolves the problem by checking that register number after
'$' sign is in the 0-31 range. If the number is out of the range the
parser shows the `invalid register number` error, but treats invalid
register number as a normal one to continue parsing and catch other
possible errors.

Differential Revision: https://reviews.llvm.org/D45919

llvm-svn: 330732
2018-04-24 16:14:00 +00:00
Mark Searles 70901b9047 [AMDGPU][Waitcnt] NFC. Cleanup some code/naming consistency:
- s/SWaitcnt/Waitcnt s/WaitCnt/Waitcnt

llvm-svn: 330730
2018-04-24 15:59:59 +00:00
Simon Pilgrim 16299273d0 [X86] Remove unnecessary FMA reg-mem InstRW scheduler overrides.
llvm-svn: 330720
2018-04-24 14:47:11 +00:00
Ulrich Weigand 497c70fff1 [SystemZ] Use preferred 16-byte function alignment
While not necessary for correctness, it is preferable for
performance reasons on all architectures we currently support
to align functions to 16-byte boundaries by default.

llvm-svn: 330718
2018-04-24 14:03:21 +00:00
Simon Pilgrim f7d2a93d5f [X86] Add vector element insertion/extraction scheduler classes
Split off pinsr/pextr and extractps instructions.

(Mostly) fixes PR36887.

Note: It might be worth adding a WriteFInsertLd class as well in the future.

Differential Revision: https://reviews.llvm.org/D45929

llvm-svn: 330714
2018-04-24 13:21:41 +00:00
Alexander Ivchenko 5717fbaf4c [X86] Replace action Promote with Expand for operation ISD::SINT_TO_FP
Summary:
If attribute "use-soft-float"="true" is set then X86ISelLowering.cpp sets
'Promote' action for ISD::SINT_TO_FP operation on type i32.

But 'Promote' action is not proper in this case since lib function
__floatsidf is available for casting from signed int to float type.
Thus Expand action is more suitable here.

The Expand action should be set for ISD::UINT_TO_FP for soft float as well.

If function attribute "use-soft-float"="true" is set then infinite looping
can happen in DAG combining, function visitSINT_TO_FP() replaces SINT_TO_FP
node with UINT_TO_FP node and function combineUIntToFP() replace vice versa in cycle.
The fix prevents it.

Patch by vrybalov

Differential Revision: https://reviews.llvm.org/D45572

llvm-svn: 330711
2018-04-24 12:57:51 +00:00
Petar Jovanovic e2bfcd6394 Correct dwarf unwind information in function epilogue
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.
It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.

The second part is platform independent and ensures that:

* CFI instructions do not affect code generation (they are not counted as
  instructions when tail duplicating or tail merging)
* Unwind information remains correct when a function is modified by
  different passes. This is done in a late pass by analyzing information
  about cfa offset and cfa register in BBs and inserting additional CFI
  directives where necessary.

Added CFIInstrInserter pass:

* analyzes each basic block to determine cfa offset and register are valid
  at its entry and exit
* verifies that outgoing cfa offset and register of predecessor blocks match
  incoming values of their successors
* inserts additional CFI directives at basic block beginning to correct the
  rule for calculating CFA

Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.
CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.

Patch by Violeta Vukobrat.

Differential Revision: https://reviews.llvm.org/D42848

llvm-svn: 330706
2018-04-24 10:32:08 +00:00
Simon Dardis fce722e6f8 [mips] Correct the patterns for bswap
Guard the MIPS64 variant correctly for i64, mark the MIPS32 version as not
in microMIPS and provide the microMIPS version.

Additionally, remove a related stale XFAIL'd test as bswap has its own test
case providing coverage.

Reviewers: smaksimovic, abeserminji, atanasyan

Differential Revision: https://reviews.llvm.org/D45816

llvm-svn: 330705
2018-04-24 10:19:29 +00:00
Sander de Smalen eb1053f9d3 [AArch64][SVE] Asm: Support for contiguous, first-faulting LDFF1 (scalar+scalar) load instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, t.p.northover, echristo, evandro, javed.absar

Reviewed By: rengolin

Subscribers: tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45946

llvm-svn: 330697
2018-04-24 08:59:08 +00:00
Craig Topper 19b85103a3 [X86] Add a BSWAP16 instruction using the 32-bit encoding plus a 0x66 prefix.
This encoding is recognized by the CPU, but the behavior is undefined. This makes the disassembler handle it correctly so we don't print bswapl with a 16-bit register.

llvm-svn: 330682
2018-04-24 04:28:02 +00:00
Eric Christopher b9733d0f7c Remove unused function HexagonEarlyIfConversion::replacePhiEdges. NFC.
llvm-svn: 330678
2018-04-24 02:10:59 +00:00
Simon Pilgrim e5e4bf02d6 [X86] Remove unnecessary vector memory folded InstRW overrides.
We have test coverage for these with resources-sse*/avx*

llvm-svn: 330662
2018-04-23 22:45:04 +00:00
Simon Pilgrim eb6090941c [X86] Remove unnecessary BMI2 InstRW overrides.
We have test coverage for these with resources-bmi2.s

llvm-svn: 330659
2018-04-23 22:19:55 +00:00
Simon Pilgrim ed09ebb48d [X86] Remove unnecessary WriteLEA InstRW overrides.
llvm-svn: 330648
2018-04-23 21:04:23 +00:00
Gabor Buella 1a2ce572bf [X86] Revert r330638 - accidental commit
llvm-svn: 330640
2018-04-23 20:05:51 +00:00