Commit Graph

45412 Commits

Author SHA1 Message Date
Matt Arsenault 4512d0a68b AMDGPU: Replace list of SMEM buffer opcodes
llvm-svn: 318506
2017-11-17 04:18:26 +00:00
Matt Arsenault 03c67d1eb2 AMDGPU: Fix breaking SMEM clauses
This was completely ignoring subregisters,
so was not very useful. Also only break them
if xnack is actually enabled.

llvm-svn: 318505
2017-11-17 04:18:24 +00:00
David Blaikie b3bde2ea50 Fix a bunch more layering of CodeGen headers that are in Target
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).

llvm-svn: 318490
2017-11-17 01:07:10 +00:00
Yi Kong 39bcd4ed3e [ARM] 't' asm constraint should accept i32
't' constraint normally only accepts f32 operands, but for VCVT the
operands can be i32. LLVM is overly restrictive and rejects asm like:

  float foo() {
    float result;
    __asm__ __volatile__(
      "vcvt.f32.s32 %[result], %[arg1]\n"
      : [result]"=t"(result)
      : [arg1]"t"(0x01020304) );
    return result;
  }

Relax the value type for 't' constraint to either f32 or i32.

Differential Revision: https://reviews.llvm.org/D40137

llvm-svn: 318472
2017-11-16 23:38:17 +00:00
Craig Topper 089082378f [X86] Add DAG combine to remove sext i32->i64 from gather/scatter instructions.
Only do this pre-legalize in case we're using the sign extend to legalize for KNL.

This recovers all of the tests that changed when I stopped SelectionDAGBuilder from deleting sign extends.

There's more work that could be done here particularly to fix the i8->i64 test case that experienced split.

llvm-svn: 318468
2017-11-16 23:09:06 +00:00
Mandeep Singh Grang 47fbc5911d [RISCV] Fix 64-bit data layout mismatch between backend and target description
Reviewers: asb

Reviewed By: asb

Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, llvm-commits

Differential Revision: https://reviews.llvm.org/D40145

llvm-svn: 318454
2017-11-16 20:30:49 +00:00
Craig Topper e85ff4f732 [X86] Pre-truncate gather/scatter indices that have element sizes larger than 64-bits before Legalize.
The wider element type will normally cause legalize to try to split and scalarize the gather/scatter, but we can't handle that. Instead, truncate the index early so the gather/scatter node is insulated from the legalization.

This really shouldn't happen in practice since InstCombine will normalize index types to the same size as pointers.

llvm-svn: 318452
2017-11-16 20:23:22 +00:00
Craig Topper 04be793cec [X86] DAGCombinerInfo is in TargetLowering not X86TargetLowering.
llvm-svn: 318451
2017-11-16 20:23:17 +00:00
Daniel Sanders 170baca646 [arc] Fix ambiguous overloaded operator error
lib/Target/ARC/ARCISelLowering.cpp:490:22: error: use of overloaded operator '<<' is ambiguous (with operand types 'llvm::raw_ostream' and 'llvm::MVT::SimpleValueType')
                     << RegVT.getSimpleVT().SimpleTy << "\n");
                     ^  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

llvm-svn: 318443
2017-11-16 19:16:56 +00:00
Yonghong Song ce96738dee bpf: print backward branch target properly
Currently, it prints the backward branch offset as unsigned value
like below:
       7:       7d 34 0b 00 00 00 00 00         if r4 s>= r3 goto 11 <LBB0_3>
       8:       b7 00 00 00 00 00 00 00         r0 = 0
LBB0_2:
       9:       07 00 00 00 01 00 00 00         r0 += 1
      ......
      17:       bf 31 00 00 00 00 00 00         r1 = r3
      18:       6d 32 f6 ff 00 00 00 00         if r2 s> r3 goto 65526 <LBB0_3+0x7FFB0>

The correct print insn 18 should be:
      18:       6d 32 f6 ff 00 00 00 00         if r2 s> r3 goto -10 <LBB0_2>

To provide better clarity and be consistent with kernel verifier output,
the insn 7 output is changed to the following with "+" added to
non-negative branch offset:
       7:       7d 34 0b 00 00 00 00 00         if r4 s>= r3 goto +11 <LBB0_3>

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 318442
2017-11-16 19:15:36 +00:00
Daniel Sanders 1eaf300fac [arc] Update TargetInfo to include the new backend name argument
Also update a comment about the usage of RegisterTarget() that didn't mention
the new argument.

llvm-svn: 318441
2017-11-16 19:10:26 +00:00
Azharuddin Mohammed fa8420d0a1 Fix RISCV build after r318352
Reviewers: asb, apazos, mgrang

Reviewed By: mgrang

Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, llvm-commits

Differential Revision: https://reviews.llvm.org/D40139

llvm-svn: 318437
2017-11-16 18:39:31 +00:00
Guozhi Wei 433e8d3e04 [PPC] Change i32 constant in store instruction to i64
This patch changes all i32 constant in store instruction to i64 with truncation, to increase the chance that the referenced constant can be shared with other i64 constant.

Differential Revision: https://reviews.llvm.org/D39352

llvm-svn: 318436
2017-11-16 18:27:34 +00:00
Mohammed Agabaria 6e6d5326a1 [TTI][X86] update costs of interleaved load\store of i64\double
This patch contains more accurate cost of interelaved load\store of stride 2 for the types int64\double on AVX2.

Reviewers: delena, RKSimon, craig.topper, dorit

Reviewed By: dorit

Differential Revision: https://reviews.llvm.org/D40008

llvm-svn: 318385
2017-11-16 09:38:32 +00:00
Craig Topper 46a5d58b8c [X86] Update TTI to report that v1iX/v1fX types aren't legal for masked gather/scatter/load/store.
The type legalizer will try to scalarize these operations if it sees them, but there is no handling for scalarizing them. This leads to a fatal error. With this change they will now be scalarized by the mem intrinsic scalarizing pass before SelectionDAG.

llvm-svn: 318380
2017-11-16 06:02:05 +00:00
Eric Christopher 6348188e87 Fix thinko in last commit.
llvm-svn: 318374
2017-11-16 03:25:02 +00:00
Eric Christopher 3148a1be88 Add NDEBUG checks around LLVM_DUMP_METHOD functions for Wunused-function warnings.
llvm-svn: 318373
2017-11-16 03:18:15 +00:00
Craig Topper e6601fd30e [X86] Custom type legalize v2f32 masked gathers instead of trying to cleanup after type legalization.
llvm-svn: 318368
2017-11-16 02:07:45 +00:00
Yonghong Song 4c3ce59e61 bpf: enable llvm-objdump to print out symbolized jmp target
Add hook in BPF backend so that llvm-objdump can print out
the jmp target with label names, e.g.,
  ...
  if r1 != 2 goto 6 <LBB0_2>
  ...
  goto 7 <LBB0_4>
  ...
 LBB0_2:
  ...
 LBB0_4:
  ...

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 318358
2017-11-16 00:52:30 +00:00
Daniel Sanders f76f315436 [globalisel][tablegen] Generate rule coverage and use it to identify untested rules
Summary:
This patch adds a LLVM_ENABLE_GISEL_COV which, like LLVM_ENABLE_DAGISEL_COV,
causes TableGen to instrument the generated table to collect rule coverage
information. However, LLVM_ENABLE_GISEL_COV goes a bit further than
LLVM_ENABLE_DAGISEL_COV. The information is written to files
(${CMAKE_BINARY_DIR}/gisel-coverage-* by default). These files can then be
concatenated into ${LLVM_GISEL_COV_PREFIX}-all after which TableGen will
read this information and use it to emit warnings about untested rules.

This technique could also be used by SelectionDAG and can be further
extended to detect hot rules and give them priority over colder rules.

Usage:
* Enable LLVM_ENABLE_GISEL_COV in CMake
* Build the compiler and run some tests
* cat gisel-coverage-[0-9]* > gisel-coverage-all
* Delete lib/Target/*/*GenGlobalISel.inc*
* Build the compiler

Known issues:
* ${LLVM_GISEL_COV_PREFIX}-all must be generated as a manual
  step due to a lack of a portable 'cat' command. It should be the
  concatenation of all ${LLVM_GISEL_COV_PREFIX}-[0-9]* files.
* There's no mechanism to discard coverage information when the ruleset
  changes

Depends on D39742

Reviewers: ab, qcolombet, t.p.northover, aditya_nandakumar, rovka

Reviewed By: rovka

Subscribers: vsk, arsenm, nhaehnle, mgorny, kristof.beyls, javed.absar, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D39747

llvm-svn: 318356
2017-11-16 00:46:35 +00:00
Reid Kleckner 8d8a8bb7ee Try to fix WebAssembly build after r318352
llvm-svn: 318355
2017-11-16 00:32:19 +00:00
Daniel Sanders 725584e26d Add backend name to Target to enable runtime info to be fed back into TableGen
Summary:
Make it possible to feed runtime information back to tablegen to enable
profile-guided tablegen-eration, detection of untested tablegen definitions, etc.

Being a cross-compiler by nature, LLVM will potentially collect data for multiple
architectures (e.g. when running 'ninja check'). We therefore need a way for
TableGen to figure out what data applies to the backend it is generating at the
time. This patch achieves that by including the name of the 'def X : Target ...'
for the backend in the TargetRegistry.

Reviewers: qcolombet

Reviewed By: qcolombet

Subscribers: jholewinski, arsenm, jyknight, aditya_nandakumar, sdardis, nemanjai, ab, nhaehnle, t.p.northover, javed.absar, qcolombet, llvm-commits, fedor.sergeev

Differential Revision: https://reviews.llvm.org/D39742

llvm-svn: 318352
2017-11-15 23:55:44 +00:00
Evandro Menezes 82665b1ec4 [AArch64] Adjust the cost model for Exynos M1 and M2
Fix the modeling of FP stores.

llvm-svn: 318351
2017-11-15 23:49:58 +00:00
Matt Arsenault 301162c4fe AMDGPU: Replace i64 add/sub lowering
Use VOP3 add/addc like usual.

This has some tradeoffs. Inline immediates fold
a little better, but other constants are worse off.
SIShrinkInstructions could be made smarter to handle
these cases.

This allows us to avoid selecting scalar adds where we
need to track the carry in scc and replace its users.
This makes it easier to use the carryless VALU adds.

llvm-svn: 318340
2017-11-15 21:51:43 +00:00
Evandro Menezes 5ba804bc11 [AArch64] Refactor the loads and stores optimizer
Move remaining inline matching of instructions of some optimizations into
separate functions, like in the other optimizations.  Otherwise, NFC.

Differential revision: https://reviews.llvm.org/D40090

llvm-svn: 318335
2017-11-15 21:06:22 +00:00
Craig Topper 54b57b0dd8 [X86] Add a return to the end of a switch to prevent an accidental fallthrough in the future.
llvm-svn: 318330
2017-11-15 20:42:47 +00:00
Sean Fertile 0f0837e84e [PowerPC] Implement mayBeEmittedAsTailCall for PPC
Implements TargetLowering callback 'mayBeEmittedAsTailCall' that enables
CodeGenPrepare to duplicate returns when they might enable a tail-call.

Differential Revision: https://reviews.llvm.org/D39777

llvm-svn: 318321
2017-11-15 18:58:27 +00:00
Evandro Menezes cbf70486bc [AArch64] Adjust the cost model for Exynos M1 and M2
Fix the modeling of loads and stores using the pre or post indexed
addressing modes.

llvm-svn: 318312
2017-11-15 17:39:37 +00:00
Simon Pilgrim 56415772d6 [X86] Add CBW/CDQ/CDQE/CQO/CWD/CWDE to WriteALU schedule class
Some CPUs are already overriding these sign extension instructions but we should be able to use the WriteALU schedule class by default.

Differential Revision: https://reviews.llvm.org/D39899

llvm-svn: 318308
2017-11-15 17:11:24 +00:00
Sean Fertile 7b056b3048 [PowerPC] Split out the tailcall calling convention checks. NFC.
Move the calling convention checks for tail-call eligibility for the 64-bit
SysV ABI into a separate function. This is so that it can be shared with
'mayBeEmittedAsTailCall' in a subsequent change.

llvm-svn: 318305
2017-11-15 16:53:41 +00:00
Sander de Smalen 8e607346af [AArch64][SVE] Asm: Report SVE parsing diagnostics only once
Summary:
Prevent an issue where a diagnostic is reported multiple times by bailing out with a ParseFail if an invalid SVE register element qualifier/suffix is specified, for example:

 <stdin>:10:18: error: invalid sve vector kind qualifier
 add z20.h, z2.h, z31.x
                 ^
 <stdin>:10:18: error: invalid sve vector kind qualifier
 add z20.h, z2.h, z31.x
 
 ...
 
 <stdin>:10:18: error: invalid sve vector kind qualifier
 add z20.h, z2.h, z31.x
                 ^


Reviewers: fhahn, rengolin

Reviewed By: rengolin

Subscribers: aemerson, javed.absar, tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D39894

llvm-svn: 318297
2017-11-15 15:44:43 +00:00
Petar Jovanovic cd729ead01 [mips] Improve genConstMult() to work with arbitrary precision
APInt is now used instead of uint64_t in function genConstMult() allowing
multiplication optimizations with constants of arbitrary length.

Patch by Milos Stojanovic.

Differential Revision: https://reviews.llvm.org/D38130

llvm-svn: 318296
2017-11-15 15:24:04 +00:00
Momchil Velikov 4a91fb93db [ARM] Split Arm jump table branch into i12 and rs suffixed versions
This is a refactoring/cleanup of Arm `addrmode2` operand class. The patch
removes it completely.

Differential Revision: https://reviews.llvm.org/D39832

llvm-svn: 318291
2017-11-15 12:02:55 +00:00
Craig Topper 16a91cee6c [X86] Redefine the 128-bit version of VPGATHERQD and VGATHERQPS to use a VK2 mask instead of a VK4 mask.
This allows us to remove extra extend creation during lowering and more accurately reflects the semantics of the instruction.

While there add an extra output VT to X86 masked gather node to better match the isel pattern predicate. Currently we're exploiting the fact that the isel table doesn't count how many output results a node actually has if the result type of any can be inferred from the first result and the type constraints defined in tablegen. I think we might ultimately want to lower all MGATHER/MSCATTER to an X86ISD node with the extra mask result and stop relying on this hole in the isel checking.

llvm-svn: 318278
2017-11-15 07:46:43 +00:00
Hiroshi Inoue 72a1f98a67 [PowerPC] fix up in redundant compare elimination
This patch fixes a potential problem in my previous commit (https://reviews.llvm.org/rL312514) by introducing an additional check.

llvm-svn: 318266
2017-11-15 04:23:26 +00:00
Matt Arsenault 10c472dd83 AMDGPU: Add separate definitions for DS insts without m0 use
llvm-svn: 318246
2017-11-15 01:34:06 +00:00
Matt Arsenault 45b98189bd AMDGPU: Don't use MUBUF vaddr if address may overflow
Effectively revert r263964. Before we would not
allow this if vaddr was not known to be positive.

llvm-svn: 318240
2017-11-15 00:45:43 +00:00
Matt Arsenault c8903125cd AMDGPU: Handle or in multi-use shl ptr combine
llvm-svn: 318223
2017-11-14 23:46:42 +00:00
Simon Dardis de5ed0c58e Reland "[mips][mt][6/7] Add support for mftr, mttr instructions."
This adjusts the tests to hopfully pacify the
llvm-clang-x86_64-expensive-checks-win buildbot.

Unlike many other instructions, these instructions have aliases which
take coprocessor registers, gpr register, accumulator (and dsp accumulator)
registers, floating point registers, floating point control registers and
coprocessor 2 data and control operands.

For the moment, these aliases are treated as pseudo instructions which are
expanded into the underlying instruction. As a result, disassembling these
instructions shows the underlying instruction and not the alias.

Reviewers: slthakur, atanasyan

Differential Revision: https://reviews.llvm.org/D35253

llvm-svn: 318207
2017-11-14 22:26:42 +00:00
Richard Smith 7007f07664 Fix unused variable warning.
llvm-svn: 318201
2017-11-14 21:26:46 +00:00
Matt Arsenault 9ba465a972 AMDGPU: Error on stack size overflow
llvm-svn: 318189
2017-11-14 20:33:14 +00:00
Ulrich Weigand 5f4373a2fc [SystemZ] Do not crash when selecting an OR of two constants
In rare cases, common code will attempt to select an OR of two
constants.  This confuses the logic in splitLargeImmediate,
causing an internal error during isel.  Fixed by simply leaving
this case to common code to handle.

This fixes PR34859.

llvm-svn: 318187
2017-11-14 20:00:34 +00:00
Evandro Menezes 1c94538693 [AArch64] Adjust the cost model for Exynos M1 and M2
Fix the modeling of loads and stores of registers pairs.

llvm-svn: 318186
2017-11-14 19:59:43 +00:00
Martin Storsjo 4629f52312 [ARM, AArch64] Fix an assert message, Darwin isn't the only target supporting TLS. NFC.
llvm-svn: 318184
2017-11-14 19:57:59 +00:00
Ulrich Weigand 55b8590e03 [SystemZ] Fix invalid codegen using RISBMux on out-of-range bits
Before using the 32-bit RISBMux set of instructions we need to
verify that the input bits are actually within range of the 32-bit
instruction.  This fixer PR35289.

llvm-svn: 318177
2017-11-14 19:20:46 +00:00
Artem Belevich 55dcf5e586 Mark intrinsics operating on the whole warp as IntrInaccessibleMemOnly
It's needed to model the fact that they do access data from other threads in a
warp and thus can't be CSE'd.

llvm-svn: 318173
2017-11-14 19:14:00 +00:00
Craig Topper 2153114227 [X86] Fix typo in comment. NFC
llvm-svn: 318156
2017-11-14 16:14:00 +00:00
Tim Northover 5cdc4f9c33 ARM: correctly update CFG when splitting BB to fix branch.
Because the block-splitting code is multi-purpose, we have to meddle with the
branches when using it to fixup a conditional branch destination. We got the
code right, but forgot to update the CFG so the verifier complained when
expensive checks were on.

Probably harmless since constant-islands comes so late, but best to fix it
anyway.

llvm-svn: 318148
2017-11-14 11:43:54 +00:00
Diana Picus 21a42bcc0b [ARM GlobalISel] Remove C++ code for G_CONSTANT
Get rid of the handwritten instruction selector code for handling
G_CONSTANT. This code wasn't checking all the preconditions correctly
anyway, so it's better to leave it to TableGen, which can handle at
least some cases correctly (e.g. MOVi, MOVi16, folding into binary
operations). Also add tests to cover those cases.

llvm-svn: 318146
2017-11-14 11:20:32 +00:00
Momchil Velikov dc86e1444d [ARM] Fix incorrect conversion of a tail call to an ordinary call
When we emit a tail call for Armv8-M, but then discover that the caller needs to
save/restore `LR`, we convert the tail call to an ordinary one, since restoring
`LR` takes extra instructions, which may negate the benefits of the tail
call. If the callee, however, takes stack arguments, this conversion is
incorrect, since nothing has been done to pass the stack arguments.

Thus the patch reverts https://reviews.llvm.org/rL294000

Also, we improve the instruction sequence for popping `LR` in the case when we
couldn't immediately find a scratch low register, but we can use as a temporary
one of the callee-saved low registers and restore `LR` before popping other
callee-saves.

Differential Revision: https://reviews.llvm.org/D39599

llvm-svn: 318143
2017-11-14 10:36:52 +00:00
Matt Arsenault 57c37b2dcd AMDGPU: Fix producing saveexec when the copy is spilled
If the register from the copy from exec was spilled,
the copy before the spill was deleted leaving a spill
of undefined register verifier error and miscompiling.
Check for other use instructions of the copy register.

llvm-svn: 318132
2017-11-14 02:16:54 +00:00
Hans Wennborg 08b34a017a Update some code.google.com links
llvm-svn: 318115
2017-11-13 23:47:58 +00:00
Matt Arsenault 4b7938c658 AMDGPU: Fix not converting d16 load/stores to offset
Fixes missed optimization with new MUBUF instructions.

llvm-svn: 318106
2017-11-13 23:24:26 +00:00
Matt Arsenault 4eea3f3da3 AMDGPU: Implement computeKnownBitsForTargetNode for mbcnt
llvm-svn: 318100
2017-11-13 22:55:05 +00:00
Evgeniy Stepanov 76d5ac4906 [arm] Fix Unnecessary reloads from GOT.
Summary:
This fixes PR35221.
Use pseudo-instructions to let MachineCSE hoist global address computation.

Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D39871

llvm-svn: 318081
2017-11-13 20:45:38 +00:00
Craig Topper c314f461dd [X86] Allow X86ISD::Wrapper to be folded into the base of gather/scatter address
If the base of our gather corresponds to something contained in X86ISD::Wrapper we should be able to fold it into the address.

This patch refactors some of the address matching to more fully use the X86ISelAddressMode struct and the getAddressOperands helper. A new helper function matchVectorAddress is added to call matchWrapper or fall back to matchAddressBase.

We should also be able to support constant offsets from a wrapper, but I'll look into that in a future patch. We may even be able to completely reuse matchAddress here, but I wanted to start simple and work up to it.

Differential Revision: https://reviews.llvm.org/D39927

llvm-svn: 318057
2017-11-13 17:53:59 +00:00
Jan Vesely b17f32040c AMDGPU: Drop duplicate setOperationAction
These are set with other scalar int ops few lines up

Differential Revision: https://reviews.llvm.org/D39928

llvm-svn: 318051
2017-11-13 16:46:07 +00:00
Uriel Korach 2aa707bdaa [X86] test/testn intrinsics lowering to IR. llvm part.
Remove builtins from llvm and add AutoUpgrade support.
Also add fast-isel tests for the TEST and TESTN instructions.

Differential Revision: https://reviews.llvm.org/D38736

llvm-svn: 318036
2017-11-13 12:51:18 +00:00
Momchil Velikov 842aa90192 [ARM] Place jump table as the first operand in additions
When generating table jump code for switch statements, place the jump
table label as the first operand in the various addition instructions
in order to enable addressing mode selectors to better match index
computation and possibly fold them into the addressing mode of the
table entry load instruction.

Differential revision: https://reviews.llvm.org/D39752

llvm-svn: 318033
2017-11-13 11:56:48 +00:00
Sander de Smalen 070a7ff1ad Test commit
llvm-svn: 318027
2017-11-13 09:57:20 +00:00
Jina Nahias 9a7f9f123c [x86][AVX512] Lowering shuffle i/f intrinsics to LLVM IR
This patch, together with a matching clang patch (https://reviews.llvm.org/D38672), implements the lowering of X86 shuffle i/f intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D38671

Change-Id: I1e7d359a74743e995ec356237a85214ce55d3661
llvm-svn: 318026
2017-11-13 09:16:39 +00:00
Gadi Haber c9f2300652 [X86][SKX] Adding scheduling info of non-intrinsic + commutable SKX opcodes.
Updated the scheduling information of the SKX subtarget  in the file X86SchedSkylakeServer.td under lib/Target/X86 to:
1. add regular opcodes in addition to the suffixed "_Int" opcodes
2. add the (V)MAXCPD/MAXCPS/MAXCSD/MAXCSS/MINCPD/MINCPS/MINCSD/MINCSS
    instructions that are equivalent to their counterparts without the 'C' as they are part of a hack to
    make floating point min/max commutable under fast math.

Reviewers: zvi, RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D39833

Change-Id: Ie13702a5ce1b1a08af91ca637a52b6962881e7d6
llvm-svn: 318024
2017-11-13 08:42:07 +00:00
Craig Topper 1af2adb9f3 [X86] Limit NOPs to 7 bytes when 'slm' is spelled 'silvermont'.
We support 2 spelling for silvermont and we should accept both here.

llvm-svn: 318023
2017-11-13 08:17:30 +00:00
Craig Topper 75d71540f8 [X86] Use sse_load_f32/f64 to improve load folding of scalar vfscalefss/sd, vrcp14ss/sd, rsqrt14ss/sd instructions.
llvm-svn: 318022
2017-11-13 08:07:33 +00:00
Craig Topper ca8abedb2a [X86] Use sse_load_f32/f64 to improve load folding for scalar VFPCLASS intrinsics.
llvm-svn: 318019
2017-11-13 06:46:48 +00:00
Matt Arsenault e5e0c742df AMDGPU: Preserve nuw in shl add ptr combine
llvm-svn: 318017
2017-11-13 05:33:35 +00:00
Craig Topper d4f6094091 [X86] Fix SQRTSS/SQRTSD/RCPSS/RCPSD intrinsics to use sse_load_f32/sse_load_f64 to increase load folding opportunities.
llvm-svn: 318016
2017-11-13 05:25:24 +00:00
Matt Arsenault fbe9533509 AMDGPU: Fix multi-use shl/add combine
This was using a custom function that didn't handle the
addressing modes properly for private. Use
isLegalAddressingMode to avoid duplicating this.

Additionally, skip the combine if there is only one use
since the standard combine will handle it.

llvm-svn: 318013
2017-11-13 05:11:54 +00:00
Craig Topper 23493f3777 [X86] Attempt to fix signed and unsigned comparison warning.
llvm-svn: 318010
2017-11-13 02:19:13 +00:00
Craig Topper deee24b83c [X86] Use sse_load_f32/f64 in patterns for the memory forms of VRNDSCALESS/SD.
llvm-svn: 318009
2017-11-13 02:03:01 +00:00
Craig Topper 63157c4784 [X86] Use EVEX encoded VRNDSCALE instructions to implement the legacy round intrinsics.
The VRNDSCALE instructions implement a superset of the (V)ROUND instructions. They are equivalent if the upper 4-bits of the immediate are 0.

This patch lowers the legacy intrinsics to the VRNDSCALE ISD node and masks the upper bits of the immediate to 0. This allows us to take advantage of the larger register encoding space.

We should maybe consider converting VRNDSCALE back to VROUND in the EVEX to VEX pass if the extended registers are not being used.

I notice some load folding opportunities being missed for the VRNDSCALESS/SD instructions that I'll try to fix in future patches.

llvm-svn: 318008
2017-11-13 02:03:00 +00:00
Craig Topper 0af48f1ad4 [X86] Split VRNDSCALE/VREDUCE/VGETMANT/VRANGE ISD nodes into versions with and without the rounding operand. NFCI
I want to reuse the VRNDSCALE node for the legacy SSE rounding intrinsics so that those intrinsics can use EVEX instructions. All of these nodes share tablegen multiclasses so I split them all so that they all remain similar in their implementations.

llvm-svn: 318007
2017-11-13 02:02:58 +00:00
Matt Arsenault e1cd482fda AMDGPU: Select d16 loads into low component of register
llvm-svn: 318005
2017-11-13 00:22:09 +00:00
Craig Topper b42a23ff8f [X86] Add an X86ISD::RANGES opcode to use for the scalar intrinsics.
This fixes a bug where we selected packed instructions for scalar intrinsics.

llvm-svn: 317999
2017-11-12 18:51:09 +00:00
Craig Topper 1382932c12 [X86] Remove some no longer needed intrinsic lowering code.
llvm-svn: 317997
2017-11-12 18:51:06 +00:00
Mandeep Singh Grang d104673257 [llvm] Remove redundant return [NFC]
Reviewers: davidxl, olista01, Eugene.Zelenko

Reviewed By: Eugene.Zelenko

Subscribers: sdardis, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D39917

llvm-svn: 317995
2017-11-12 03:47:50 +00:00
Craig Topper ac250825c6 [X86] Use vrndscaleps/pd for 128/256 ffloor/ftrunc/fceil/fnearbyint/frint when avx512vl is enabled.
This matches what we do for scalar and 512-bit types.

llvm-svn: 317991
2017-11-11 21:44:51 +00:00
Simon Pilgrim 294b87b432 [X86] Attempt to match multiple binary reduction ops at once. NFCI
matchBinOpReduction currently matches against a single opcode, but we already have a case where we repeat calls to try to match against AND/OR and I'll be shortly adding another case for SMAX/SMIN/UMAX/UMIN (D39729).

This NFCI patch alters matchBinOpReduction to try and pattern match against any of the provided list of candidate bin ops at once to save time.

Differential Revision: https://reviews.llvm.org/D39726

llvm-svn: 317985
2017-11-11 18:16:55 +00:00
Craig Topper 0ccec70ff5 [X86] Add scalar register class versions of VRNDSCALE instructions and rename the existing versions to _Int.
This is consistent with out normal implementation of scalar instructions.

While there disable load folding for the patterns with IMPLICIT_DEF unless optimizing for size which is also our standard practice.

llvm-svn: 317977
2017-11-11 08:24:15 +00:00
Craig Topper 80405076b0 [X86] Inline some SDNode operand multiclass operands that don't vary. NFC
llvm-svn: 317975
2017-11-11 08:24:12 +00:00
Craig Topper 4a63843706 [X86] Set the execution domain for VFPCLASS to SSEPackedSingle/Double.
llvm-svn: 317974
2017-11-11 06:57:44 +00:00
Craig Topper 1a093934a9 [X86] Set the execution domain for vptest instruction to the integer domain.
llvm-svn: 317973
2017-11-11 06:19:12 +00:00
Craig Topper 0eb4a43384 [X86] Correct the execution domain on ROUND/VROUND instructions.
llvm-svn: 317968
2017-11-11 02:26:05 +00:00
Craig Topper bf9b944ea7 [X86] Remove the default for one of the arguments to some tablegen multiclasses. NFC
No one ever uses this default and probably shouldn't since it sets the execution domain to generic.

llvm-svn: 317967
2017-11-11 02:26:02 +00:00
Krzysztof Parzyszek e8926438a9 Recommit r317904: [Hexagon] Create HexagonISelDAGToDAG.h, NFC
The Windows builder did not reconstruct the HexagonGenDAGISel.inc file
after the TableGen binary has changed.

llvm-svn: 317921
2017-11-10 20:09:46 +00:00
Konstantin Zhuravlyov 27b0a033d8 AMDGPU/NFC: Split Processors.td into GCNProcessors.td and R600Processors.td
Differential Revision: https://reviews.llvm.org/D39880

llvm-svn: 317920
2017-11-10 20:01:58 +00:00
Krzysztof Parzyszek 79dae95f4a Revert "[Hexagon] Create HexagonISelDAGToDAG.h, NFC"
This reverts r317904: broke Windows build.

llvm-svn: 317916
2017-11-10 19:27:18 +00:00
Craig Topper bb001c6ddc [X86] Merge the template method selectAddrOfGatherScatterNode into selectVectorAddr. NFCI
Just need to initialize a couple variables differently based on the node type. No need for a whole separate template method.

llvm-svn: 317915
2017-11-10 19:26:04 +00:00
Mandeep Singh Grang 5f043ae2e1 [RISCV] Silence an unused variable warning in release builds [NFC]
Summary:
Also minor cleanups:
1. Avoided multiple calls to Fixup.getKind()
2. Avoided multiple calls to getFixupKindInfo()
3. Removed a redundant return.

Reviewers: asb, apazos

Reviewed By: asb

Subscribers: rbar, johnrusso, llvm-commits

Differential Revision: https://reviews.llvm.org/D39881

llvm-svn: 317908
2017-11-10 19:09:28 +00:00
Krzysztof Parzyszek 89765acc6c [Hexagon] Create HexagonISelDAGToDAG.h, NFC
llvm-svn: 317904
2017-11-10 18:39:45 +00:00
Jonas Paulsson 4b017e682d [RegAlloc, SystemZ] Increase number of LOCRs by passing "hard" regalloc hints.
* The method getRegAllocationHints() is now of bool type instead of void. If
true is returned, regalloc (AllocationOrder) will *only* try to allocate the
hints, as opposed to merely trying them before non-hinted registers.

* TargetRegisterInfo::getRegAllocationHints() is implemented for SystemZ with
an increase in number of LOCRs.

In this case, it is desired to force the hints even though there is a slight
increase in spilling, because if a non-hinted register would be allocated,
the LOCRMux pseudo would have to be expanded with a jump sequence. The LOCR
(Load On Condition) SystemZ instruction must have both operands in either the
low or high part of the 64 bit register.

Reviewers: Quentin Colombet and Ulrich Weigand
https://reviews.llvm.org/D36795

llvm-svn: 317879
2017-11-10 08:46:26 +00:00
Craig Topper 1a0da2db5f [X86] Add support for combining FMADDSUB(A, B, FNEG(C))->FMSUBADD(A, B, C)
Support the opposite direction as well. Also add a TODO for not being able to combine FMSUB/FNMADD/FNMSUB with FNEG.

llvm-svn: 317878
2017-11-10 08:22:37 +00:00
Yaxun Liu 35845f06a4 [AMDGPU] Fix pointer info for lowering load/store for r600 for amdgiz environment
r600 uses dummy pointer info for lowering load/store. Since dummy pointer info
assumes address space 0, this causes isel failure when temporary load/store SDNodes
are generated for amdgiz environment.

Since the offest is not constant, FixedStack pseudo source value cannot be used
to create the pointer info. This patch creates pointer info using llvm undef value.
At least this provides correct address space so that isel can be done correctly.

Differential Revision: https://reviews.llvm.org/D39698

llvm-svn: 317862
2017-11-10 02:03:28 +00:00
Yaxun Liu 920cc2f813 [AMDGPU] Fix pointer info for pseudo source for r600
The pointer info for pseudo source for r600 is not correct when
alloca addr space is not 0, which causes invalid SDNode for r600---amdgiz.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D39670

llvm-svn: 317861
2017-11-10 01:53:24 +00:00
Ulrich Weigand d39e9dca1b [SystemZ] Add support for the "o" inline asm constraint
We don't really need any special handling of "offsettable"
memory addresses, but since some existing code uses inline
asm statements with the "o" constraint, add support for this
constraint for compatibility purposes.

llvm-svn: 317807
2017-11-09 16:31:57 +00:00
Simon Dardis c2d3e38ba6 [mips] Correct microMIP's jump and add unconditional branch pseudo
Correct the definition of 'j' as being unavailable for microMIPS32R6 and
provide the 'b' assembly idiom for codegen purposes for microMIPS32r3.

Provide the necessary 'br' pattern for microMIPS32R6 as it now longer
incorrectly uses the 'j' instruction.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D39741

llvm-svn: 317801
2017-11-09 16:02:18 +00:00
Alex Bradbury 8c345c5aa9 [RISCV] MC layer support for the standard RV32A instruction set extension
llvm-svn: 317791
2017-11-09 15:00:03 +00:00
Alex Bradbury a47514ce3f [RISCV] MC layer support for the standard RV32M instruction set extension
llvm-svn: 317788
2017-11-09 14:46:30 +00:00
Andrew V. Tischenko f8c75b8794 Sched model improving on btver2: JFPU01 resource, vtestp* for xmm.
Differential Revision: https://reviews.llvm.org/D39802

llvm-svn: 317785
2017-11-09 14:19:59 +00:00
Andrew V. Tischenko 3543f0a712 Add -print-schedule scheduling comments to inline asm.
Differential Revision: https://reviews.llvm.org/D39728

llvm-svn: 317782
2017-11-09 12:45:40 +00:00
Craig Topper 5bfa5ffe5e [X86] Give priority to EVEX FMA instructions over FMA4 instructions.
No existing processor has both so it doesn't really matter what we do here. But we were previously just relying on pattern order which gave FMA4 priority.

llvm-svn: 317775
2017-11-09 08:26:26 +00:00
Vitaly Buka bee1964d80 Fix "default label in switch which covers all enumeration values" warning
llvm-svn: 317771
2017-11-09 07:46:13 +00:00
Craig Topper 7a6e294a6c [X86] Make X86ISD::FMADDS3 isel patterns commutable.
This was missed when FMADDS3 was split from X86ISD::FMADDS3_RND.

llvm-svn: 317769
2017-11-09 06:17:05 +00:00
Marek Olsak 58410f37ff AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4
Summary:
Only 56 shaders (out of 48486) are affected.

Totals from affected shaders (changed stats only):
SGPRS: 2420 -> 2460 (1.65 %)
Spilled VGPRs: 94 -> 112 (19.15 %)
Scratch size: 524 -> 528 (0.76 %) dwords per thread
Code Size: 187400 -> 184992 (-1.28 %) bytes

One DiRT Showdown shader spills 6 more VGPRs.
One Grid Autosport shader spills 12 more VGPRs.

The other 54 shaders only have a decrease in code size.
(I'm ignoring the SGPR noise)

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D39012

llvm-svn: 317755
2017-11-09 01:52:55 +00:00
Marek Olsak 5cec64195c AMDGPU: Lower buffer store and atomic intrinsics manually
Summary:
Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before every
buffer store and atomic instruction.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D39060

llvm-svn: 317754
2017-11-09 01:52:48 +00:00
Marek Olsak 4c421a2db2 AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4
Summary: Only 3 (out of 48486) shaders are affected.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D38951

llvm-svn: 317753
2017-11-09 01:52:36 +00:00
Marek Olsak 6a0548acaa AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4
Summary:
-9.9% code size decrease in affected shaders.

Totals (changed stats only):
SGPRS: 2151462 -> 2170646 (0.89 %)
VGPRS: 1634612 -> 1640288 (0.35 %)
Spilled SGPRs: 8942 -> 8940 (-0.02 %)
Code Size: 52940672 -> 51727288 (-2.29 %) bytes
Max Waves: 373066 -> 371718 (-0.36 %)

Totals from affected shaders:
SGPRS: 283520 -> 302704 (6.77 %)
VGPRS: 227632 -> 233308 (2.49 %)
Spilled SGPRs: 3966 -> 3964 (-0.05 %)
Code Size: 12203080 -> 10989696 (-9.94 %) bytes
Max Waves: 44070 -> 42722 (-3.06 %)

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38950

llvm-svn: 317752
2017-11-09 01:52:30 +00:00
Marek Olsak b953cc36e2 AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4
Summary:
Only constant offsets (*_IMM opcodes) are merged.
It reuses code for LDS load/store merging.
It relies on the scheduler to group loads.

The results are mixed, I think they are mostly positive. Most shaders are
affected, so here are total stats only:

 SGPRS: 2072198 -> 2151462 (3.83 %)
 VGPRS: 1628024 -> 1634612 (0.40 %)
 Spilled SGPRs: 7883 -> 8942 (13.43 %)
 Spilled VGPRs: 97 -> 101 (4.12 %)
 Scratch size: 1488 -> 1492 (0.27 %) dwords per thread
 Code Size: 60222620 -> 52940672 (-12.09 %) bytes
 Max Waves: 374337 -> 373066 (-0.34 %)

There is 13.4% increase in SGPR spilling, DiRT Showdown spills a few more
VGPRs (now 37), but 12% decrease in code size.

These are the new stats for SGPR spilling. We already spill a lot SGPRs,
so it's uncertain whether more spilling will make any difference since
SGPRs are always spilled to VGPRs:

 SGPR SPILLING APPS   Shaders SpillSGPR AvgPerSh
 alien_isolation         2938       100      0.0
 batman_arkham_origins    589         6      0.0
 bioshock-infinite       1769         4      0.0
 borderlands2            3968        22      0.0
 counter_strike_glob..   1142        60      0.1
 deus_ex_mankind_div..   1410        79      0.1
 dirt-showdown            533         4      0.0
 dirt_rally               364      1163      3.2
 divinity                1052         2      0.0
 dota2                   1747         7      0.0
 f1-2015                  776      1515      2.0
 grid_autosport          1767      1505      0.9
 hitman                  1413       273      0.2
 left_4_dead_2           1762         4      0.0
 life_is_strange         1296        26      0.0
 mad_max                  358        96      0.3
 metro_2033_redux        2670        60      0.0
 payday2                 1362        22      0.0
 portal                   474         3      0.0
 saints_row_iv           1704         8      0.0
 serious_sam_3_bfe        392      1348      3.4
 shadow_of_mordor        1418        12      0.0
 shadow_warrior          3956       239      0.1
 talos_principle          324      1735      5.4
 thea                     172        17      0.1
 tomb_raider             1449       215      0.1
 total_war_warhammer      242        56      0.2
 ue4_effects_cave         295        55      0.2
 ue4_elemental            572        12      0.0
 unigine_tropics          210        56      0.3
 unigine_valley           278       152      0.5
 victor_vran             1262        84      0.1
 yofrankie                 82         2      0.0

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38949

llvm-svn: 317751
2017-11-09 01:52:23 +00:00
Marek Olsak ffadcb744b AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEM
Summary:
-5.3% code size in affected shaders.

Changed stats only:

48486 shaders in 30489 tests
Totals:
SGPRS: 2086406 -> 2072430 (-0.67 %)
VGPRS: 1626872 -> 1627960 (0.07 %)
Spilled SGPRs: 7865 -> 7912 (0.60 %)
Code Size: 60978060 -> 60188764 (-1.29 %) bytes
Max Waves: 374530 -> 374342 (-0.05 %)

Totals from affected shaders:
SGPRS: 299664 -> 285688 (-4.66 %)
VGPRS: 233844 -> 234932 (0.47 %)
Spilled SGPRs: 3959 -> 4006 (1.19 %)
Code Size: 14905272 -> 14115976 (-5.30 %) bytes
Max Waves: 46202 -> 46014 (-0.41 %)

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38915

llvm-svn: 317750
2017-11-09 01:52:17 +00:00
Craig Topper 93e27d2ecc [X86] Make sure we don't read too many operands from X86ISD::FMADDS1/FMADDS3 nodes when doing FNEG combine.
r317453 added new ISD nodes without rounding modes that were added to an existing if/else chain. But all the previous nodes handled there included a rounding mode. The final code after this if/else chain expected an extra operand that isn't present for the new nodes.

llvm-svn: 317748
2017-11-09 01:06:47 +00:00
Craig Topper cfd510678f [X86] X86MaskedGatherSDNode shouldn't inherit from MaskedGatherScatterSDNode
The classof implementation in MaskedGatherScatterSDNode doesn't consider X86MaskedGatherSDNode so its misleading.

llvm-svn: 317733
2017-11-08 22:26:41 +00:00
Craig Topper 61f81f9637 [X86] Preserve memory refs when folding loads into divides.
This is similar to what we already do for multiplies. Without this we can't unfold and hoist an invariant load.

llvm-svn: 317732
2017-11-08 22:26:39 +00:00
Craig Topper 55029d811f [X86] Remove an if check on the result of a cast. NFC
cast takes a non-null input and produces a non-null output. So this if can never fail.

llvm-svn: 317731
2017-11-08 22:26:37 +00:00
Reid Kleckner 7adb2fdbba Revert "Correct dwarf unwind information in function epilogue for X86"
This reverts r317579, originally committed as r317100.

There is a design issue with marking CFI instructions duplicatable. Not
all targets support the CFIInstrInserter pass, and targets like Darwin
can't cope with duplicated prologue setup CFI instructions. The compact
unwind info emission fails.

When the following code is compiled for arm64 on Mac at -O3, the CFI
instructions end up getting tail duplicated, which causes compact unwind
info emission to fail:
  int a, c, d, e, f, g, h, i, j, k, l, m;
  void n(int o, int *b) {
    if (g)
      f = 0;
    for (; f < o; f++) {
      m = a;
      if (l > j * k > i)
        j = i = k = d;
      h = b[c] - e;
    }
  }

We get assembly that looks like this:
; BB#1:                                 ; %if.then
Lloh3:
	adrp	x9, _f@GOTPAGE
Lloh4:
	ldr	x9, [x9, _f@GOTPAGEOFF]
	mov	 w8, wzr
Lloh5:
	str		wzr, [x9]
	stp	x20, x19, [sp, #-16]!   ; 8-byte Folded Spill
	.cfi_def_cfa_offset 16
	.cfi_offset w19, -8
	.cfi_offset w20, -16
	cmp		w8, w0
	b.lt	LBB0_3
	b	LBB0_7
LBB0_2:                                 ; %entry.if.end_crit_edge
Lloh6:
	adrp	x8, _f@GOTPAGE
Lloh7:
	ldr	x8, [x8, _f@GOTPAGEOFF]
Lloh8:
	ldr		w8, [x8]
	stp	x20, x19, [sp, #-16]!   ; 8-byte Folded Spill
	.cfi_def_cfa_offset 16
	.cfi_offset w19, -8
	.cfi_offset w20, -16
	cmp		w8, w0
	b.ge	LBB0_7
LBB0_3:                                 ; %for.body.lr.ph

Note the multiple .cfi_def* directives. Compact unwind info emission
can't handle that.

llvm-svn: 317726
2017-11-08 21:31:14 +00:00
Alex Bradbury fa18b9e73c Set hasSideEffects=0 for PHI and fix affected passes
Previously, hasSideEffects was ? for TargetOpcode::PHI and would be inferred 
as 1. D37065 sets the previously inferred properties explicitly. This patch sets 
hasSideEffects=0 for PHI, as it is for G_PHI. MachineInstr::isSafeToMove has 
been updated so it still returns false for PHI.

Additionally, HexagonBitSimplify relied on a PHI node having the 
hasUnmodeledSideEffects property. This patch fixes that assumption.

Differential Revision: https://reviews.llvm.org/D37097

llvm-svn: 317721
2017-11-08 20:19:16 +00:00
Craig Topper 78a770402a [X86] Correct the implementation of BEXTR load folding to use the shift as the parent node and pass a separate root.
We were calling tryFoldLoad with the 'and' node was the root and parent node of the load. But the parent of the load should be the shift that proceeds the and. While the and node is correctly the root node.

To fix this I had to make tryFoldLoad take a separate use and root input. I've added a convenience version with the old signature to avoid updating the other call sites.

llvm-svn: 317720
2017-11-08 20:17:33 +00:00
Sam Clegg 6368442fb7 [WebAssembly] Update test expectations
I believe these were fixed in rL317707

Differential Revision: https://reviews.llvm.org/D39813

llvm-svn: 317718
2017-11-08 20:14:06 +00:00
Craig Topper e6094f9bd9 [X86] Don't call validateInstruction from MatchAndEmitInstruction when MatchingInlineAsm is set. The MCInst won't be populated.
Without this we can't parse gather instructions in ms inline asm blocks. The validateInstruction function was introduced in r316700 to check gather constraints.

llvm-svn: 317713
2017-11-08 19:38:48 +00:00
Dan Gohman 0828ba1e1e [WebAssembly] Call signExtend to get sign extended register
Patch by Jatin Bhateja!

Differential Revision: https://reviews.llvm.org/D39529

llvm-svn: 317710
2017-11-08 19:24:21 +00:00
Dan Gohman b465aa0504 [WebAssembly] Revise the strategy for inline asm.
Previously, an "r" constraint would mean the compiler provides a value
on WebAssembly's operand stack. This was tricky to use properly,
particularly since it isn't possible to declare a new local from within
an inline asm string.

With this patch, "r" provides the value in a WebAssembly local, and the
local index is provided to the inline asm string. This requires inline
asm to use get_local and set_local to read the register. This does
potentially result in larger code size, however inline asm should
hopefully be quite rare in WebAssembly.

This also means that the "m" constraint can no longer be supported, as
WebAssembly has nothing like a "memory operand" that includes an
implicit get_local.

This fixes PR34599 for the wasm32-unknown-unknown-wasm target (though
not for the ELF target).

llvm-svn: 317707
2017-11-08 19:18:08 +00:00
Alex Bradbury a337675cdb [RISCV] Initial support for function calls
Note that this is just enough for simple function call examples to generate 
working code. Support for varargs etc follows in future patches.

Differential Revision: https://reviews.llvm.org/D29936

llvm-svn: 317691
2017-11-08 13:41:21 +00:00
Alex Bradbury 74913e1c70 [RISCV] Codegen for conditional branches
A good portion of this patch is the extra functions that needed to be 
implemented to support the test case. e.g. storeRegToStackSlot, 
loadRegFromStackSlot, eliminateFrameIndex.

Setting ISD::BR_CC to Expand may appear non-obvious on an architecture with 
branch+cmp instructions. However, I found it much easier to deal with matching 
the expanded form.

I had to change simm13_lsb0 and simm21_lsb0 to inherit from the 
Operand<OtherVT> class rather than Operand<i32> in order to keep tablegen 
happy. This isn't a big deal, but it does seem a shame to lose the uniformity 
across immediate types when there's not an obvious benefit (I'm hoping a 
tablegen expert will educate me on what I'm missing here!).

Differential Revision: https://reviews.llvm.org/D29935

llvm-svn: 317690
2017-11-08 13:31:40 +00:00
Alex Bradbury ec8aa91305 [RISCV] Codegen support for memory operations on global addresses
Differential Revision: https://reviews.llvm.org/D39103

llvm-svn: 317688
2017-11-08 13:24:21 +00:00
Alex Bradbury cfa6291bb1 [RISCV] Codegen support for memory operations
This required the implementation of RISCVTargetInstrInfo::copyPhysReg. Support
for lowering global addresses follow in the next patch.

Differential Revision: https://reviews.llvm.org/D29934

llvm-svn: 317685
2017-11-08 12:20:01 +00:00
Alex Bradbury 0f0e1b54f0 [RISCV] Codegen support for materializing constants
Differential Revision: https://reviews.llvm.org/D39101

llvm-svn: 317684
2017-11-08 12:02:22 +00:00
Simon Dardis 789f7ca265 [mips] Guard indirect and tailcall pseudo instructions correctly.
Previously these pseudo instructions were not guarded by ISA, so their
select was dependant on the ordering of the entries in the DAG matcher.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D39723

llvm-svn: 317681
2017-11-08 11:13:44 +00:00
Alex Bradbury cc988415fe [NFCI] Ensure TargetOpcode::* are compatible with guessInstructionProperties=0
rL162640 introduced CodeGenTarget::guessInstructionProperties. If a target 
sets guessInstructionProperties=0 in its FooInstrInfo, tablegen will error if 
it has to guess properties from patterns. Unfortunately, 
guessInstructionProperties=0 can't be used with current upstream LLVM as 
instructions in the TargetOpcode namespace are always included and sometimes 
have inferred properties for mayLoad, mayStore, and hasSideEffects. This patch 
provides the simplest possible fix to this problem, setting default values for 
these fields in the TargetOpcode scope. There is no intended functional 
change, as the explicitly set properties should match what was previously 
inferred. A number of the instructions had hasSideEffects=1 inferred 
unintentionally. This patch makes it explicit, while future patches (such as 
D37097) correct the property.

Differential Revision: https://reviews.llvm.org/D37065

llvm-svn: 317674
2017-11-08 09:26:06 +00:00
Craig Topper 65e6d0b758 [X86] Add patterns to fold EVEX store with EVEX encoded vcvtps2ph instructions. Remove bad pattern that had vf432 vcvtps2ph storing 128-bits.
llvm-svn: 317662
2017-11-08 04:00:31 +00:00
Craig Topper b832ee68b4 [X86] Allow legacy vcvtps2ph intrinsics to select EVEX encoded instructions. Rely on EVEX->VEX to convert back.
Missed store folding opportunities will be fixed in a subsequent commit.

llvm-svn: 317661
2017-11-08 04:00:30 +00:00
David Blaikie 3f833edc7c Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layering
This header includes CodeGen headers, and is not, itself, included by
any Target headers, so move it into CodeGen to match the layering of its
implementation.

llvm-svn: 317647
2017-11-08 01:01:31 +00:00
Matt Arsenault 4709ab9124 AMDGPU: Set correct sched model on v_mad_u64_u32
llvm-svn: 317645
2017-11-08 00:48:25 +00:00
Sriraman Tallam 056b3fd6fb Attribute nonlazybind should not affect calls to functions with hidden visibility.
Differential Revision: https://reviews.llvm.org/D39625

llvm-svn: 317639
2017-11-08 00:01:05 +00:00
Justin Lebar da9e0bd3a2 [NVPTX] Implement __nvvm_atom_add_gen_d builtin.
Summary:
This just seems to have been an oversight.  We already supported the f64
atomic add with an explicit scope (e.g. "cta"), but not the scopeless
version.

Reviewers: tra

Subscribers: jholewinski, sanjoy, cfe-commits, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D39638

llvm-svn: 317623
2017-11-07 22:10:54 +00:00
Graham Yiu 5cd044e8c8 Use new vector insert half-word and byte instructions when we see insertelement on '8 x i16' and '16 x i8' types. Also extended existing lit testcase to cover these cases.
Differential Revision: https://reviews.llvm.org/D34630

llvm-svn: 317613
2017-11-07 20:55:43 +00:00
Krzysztof Parzyszek 385a4e0489 [Hexagon] Make a test more flexible in HexagonLoopIdiomRecognition
An "or" that sets the sign-bit can be replaced with a "xor", if
the sign-bit was known to be clear before. With some changes to
instruction combining, the simple sign-bit check was failing.
Replace it with a more flexible one to catch more cases.

llvm-svn: 317592
2017-11-07 17:05:54 +00:00
Florian Hahn b936810833 [AArch64][SVE] Asm: Add support for (ADD|SUB)_ZZZ
Patch [5/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions.

Patch by Sander De Smalen.

Reviewed by: rengolin

Differential Revision: https://reviews.llvm.org/D39091

llvm-svn: 317591
2017-11-07 16:58:13 +00:00
Florian Hahn 91f11e5ad1 [AArch64][SVE] Asm: Add SVE (Z) Register definitions and parsing support
Patch [3/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions.

To summarise, this patch adds:

 * SVE register definitions
 * Methods to parse SVE register operands
 * Methods to print SVE register operands
 * RegKind SVEDataVector to distinguish it from other data types like scalar register or Neon vector.
 * k_SVEDataRegister and SVEDataRegOp to describe SVE registers (which will be extended by further patches with e.g. ElementWidth and the shift-extend type).


Patch by Sander De Smalen.

Reviewed by: rengolin

Differential Revision: https://reviews.llvm.org/D39089

llvm-svn: 317590
2017-11-07 16:45:48 +00:00
Florian Hahn d825bbdc41 [AArch64][SVE] Asm: Set SVE as unsupported feature for existing scheduler models.
Patch [4/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions.

We add SVE as unsupported feature for CPUs that don't have SVE to prevent errors from scheduler models saying it lacks information for these instructions.

Patch by Sander De Smalen.

Reviewed by: rengolin

Differential Revision: https://reviews.llvm.org/D39090

llvm-svn: 317582
2017-11-07 15:03:11 +00:00
Petar Jovanovic e2a585dddc Reland "Correct dwarf unwind information in function epilogue for X86"
Reland r317100 with minor fix regarding ComputeCommonTailLength function in
BranchFolding.cpp. Skipping top CFI instructions block needs to executed on
several more return points in ComputeCommonTailLength().

Original r317100 message:

"Correct dwarf unwind information in function epilogue for X86"

This patch aims to provide correct dwarf unwind information in function
epilogue for X86.

It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.

The second part is platform independent and ensures that:

- CFI instructions do not affect code generation
- Unwind information remains correct when a function is modified by
  different passes. This is done in a late pass by analyzing information
  about cfa offset and cfa register in BBs and inserting additional CFI
  directives where necessary.

Changed CFI instructions so that they:

- are duplicable
- are not counted as instructions when tail duplicating or tail merging
- can be compared as equal

Added CFIInstrInserter pass:

- analyzes each basic block to determine cfa offset and register valid at
  its entry and exit
- verifies that outgoing cfa offset and register of predecessor blocks match
  incoming values of their successors
- inserts additional CFI directives at basic block beginning to correct the
  rule for calculating CFA

Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.

CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.

Patch by Violeta Vukobrat.

llvm-svn: 317579
2017-11-07 14:40:27 +00:00
Alexey Bataev e25a6fd390 [SLP] Fix PR35047: Fix default cost model for cast op in X86.
Summary:
The cost calculation for default case on X86 target does not always
follow correct wayt because of missing 4-th argument in
`BaseT::getCastInstrCost()` call. Added this missing parameter.

Reviewers: hfinkel, mkuper, RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39687

llvm-svn: 317576
2017-11-07 14:23:44 +00:00
Florian Hahn c4422247b3 [AArch64][SVE] Asm: Replace 'IsVector' by 'RegKind' in AArch64AsmParser (NFC)
Patch [2/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions.

This change is a non functional change that adds RegKind as an alternative to 'isVector' to prepare it for newer types (SVE data vectors and predicate vectors) that will be added in next patches (where the SVE data vector is added as part of this patch set)

Patch by Sander De Smalen.

Reviewed by: rengolin

Differential Revision: https://reviews.llvm.org/D39088

llvm-svn: 317569
2017-11-07 13:07:50 +00:00
Kristof Beyls af9814a1fc [GlobalISel] Enable legalizing non-power-of-2 sized types.
This changes the interface of how targets describe how to legalize, see
the below description.

1. Interface for targets to describe how to legalize.

In GlobalISel, the API in the LegalizerInfo class is the main interface
for targets to specify which types are legal for which operations, and
what to do to turn illegal type/operation combinations into legal ones.

For each operation the type sizes that can be legalized without having
to change the size of the type are specified with a call to setAction.
This isn't different to how GlobalISel worked before. For example, for a
target that supports 32 and 64 bit adds natively:

  for (auto Ty : {s32, s64})
    setAction({G_ADD, 0, s32}, Legal);

or for a target that needs a library call for a 32 bit division:

  setAction({G_SDIV, s32}, Libcall);

The main conceptual change to the LegalizerInfo API, is in specifying
how to legalize the type sizes for which a change of size is needed. For
example, in the above example, how to specify how all types from i1 to
i8388607 (apart from s32 and s64 which are legal) need to be legalized
and expressed in terms of operations on the available legal sizes
(again, i32 and i64 in this case). Before, the implementation only
allowed specifying power-of-2-sized types (e.g. setAction({G_ADD, 0,
s128}, NarrowScalar).  A worse limitation was that if you'd wanted to
specify how to legalize all the sized types as allowed by the LLVM-IR
LangRef, i1 to i8388607, you'd have to call setAction 8388607-3 times
and probably would need a lot of memory to store all of these
specifications.

Instead, the legalization actions that need to change the size of the
type are specified now using a "SizeChangeStrategy".  For example:

   setLegalizeScalarToDifferentSizeStrategy(
       G_ADD, 0, widenToLargerAndNarrowToLargest);

This example indicates that for type sizes for which there is a larger
size that can be legalized towards, do it by Widening the size.
For example, G_ADD on s17 will be legalized by first doing WidenScalar
to make it s32, after which it's legal.
The "NarrowToLargest" indicates what to do if there is no larger size
that can be legalized towards. E.g. G_ADD on s92 will be legalized by
doing NarrowScalar to s64.

Another example, taken from the ARM backend is:
   for (unsigned Op : {G_SDIV, G_UDIV}) {
     setLegalizeScalarToDifferentSizeStrategy(Op, 0,
         widenToLargerTypesUnsupportedOtherwise);
     if (ST.hasDivideInARMMode())
       setAction({Op, s32}, Legal);
     else
       setAction({Op, s32}, Libcall);
   }

For this example, G_SDIV on s8, on a target without a divide
instruction, would be legalized by first doing action (WidenScalar,
s32), followed by (Libcall, s32).

The same principle is also followed for when the number of vector lanes
on vector data types need to be changed, e.g.:

   setAction({G_ADD, LLT::vector(8, 8)}, LegalizerInfo::Legal);
   setAction({G_ADD, LLT::vector(16, 8)}, LegalizerInfo::Legal);
   setAction({G_ADD, LLT::vector(4, 16)}, LegalizerInfo::Legal);
   setAction({G_ADD, LLT::vector(8, 16)}, LegalizerInfo::Legal);
   setAction({G_ADD, LLT::vector(2, 32)}, LegalizerInfo::Legal);
   setAction({G_ADD, LLT::vector(4, 32)}, LegalizerInfo::Legal);
   setLegalizeVectorElementToDifferentSizeStrategy(
       G_ADD, 0, widenToLargerTypesUnsupportedOtherwise);

As currently implemented here, vector types are legalized by first
making the vector element size legal, followed by then making the number
of lanes legal. The strategy to follow in the first step is set by a
call to setLegalizeVectorElementToDifferentSizeStrategy, see example
above.  The strategy followed in the second step
"moreToWiderTypesAndLessToWidest" (see code for its definition),
indicating that vectors are widened to more elements so they map to
natively supported vector widths, or when there isn't a legal wider
vector, split the vector to map it to the widest vector supported.

Therefore, for the above specification, some example legalizations are:
  * getAction({G_ADD, LLT::vector(3, 3)})
    returns {WidenScalar, LLT::vector(3, 8)}
  * getAction({G_ADD, LLT::vector(3, 8)})
    then returns {MoreElements, LLT::vector(8, 8)}
  * getAction({G_ADD, LLT::vector(20, 8)})
    returns {FewerElements, LLT::vector(16, 8)}


2. Key implementation aspects.

How to legalize a specific (operation, type index, size) tuple is
represented by mapping intervals of integers representing a range of
size types to an action to take, e.g.:

       setScalarAction({G_ADD, LLT:scalar(1)},
                       {{1, WidenScalar},  // bit sizes [ 1, 31[
                        {32, Legal},       // bit sizes [32, 33[
                        {33, WidenScalar}, // bit sizes [33, 64[
                        {64, Legal},       // bit sizes [64, 65[
                        {65, NarrowScalar} // bit sizes [65, +inf[
                       });

Please note that most of the code to do the actual lowering of
non-power-of-2 sized types is currently missing, this is just trying to
make it possible for targets to specify what is legal, and how non-legal
types should be legalized.  Probably quite a bit of further work is
needed in the actual legalizing and the other passes in GlobalISel to
support non-power-of-2 sized types.

I hope the documentation in LegalizerInfo.h and the examples provided in the
various {Target}LegalizerInfo.cpp and LegalizerInfoTest.cpp explains well
enough how this is meant to be used.

This drops the need for LLT::{half,double}...Size().


Differential Revision: https://reviews.llvm.org/D30529

llvm-svn: 317560
2017-11-07 10:34:34 +00:00
Bjorn Steinbrink c02b237e46 [X86] Don't clobber reserved registers with stack adjustments
Summary:
Calls using invoke in funclet based functions are assumed to clobber
all registers, which causes the stack adjustment using pops to consider
all registers not defined by the call to be undefined, which can
unfortunately include the base pointer, if one is needed.

To prevent this (and possibly other hazards), skip reserved registers
when looking for candidate registers.

This fixes issue #45034 in the Rust compiler.

Reviewers: mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39636

llvm-svn: 317551
2017-11-07 08:50:21 +00:00
Craig Topper e7fb300226 [X86] Add patterns to fold a 64-bit load into the EVEX vcvtph2ps instructions.
llvm-svn: 317548
2017-11-07 07:13:07 +00:00
Craig Topper 0231b1d445 [X86] Add patterns for folding a v16i8 with the VEX vcvtph2ps intrinsics.
Disable the peephole pass to prove that the pattern is working.

llvm-svn: 317547
2017-11-07 07:13:06 +00:00
Craig Topper cf8e6d0a76 [X86] Add support for using EVEX instructions for the legacy vcvtph2ps intrinsics.
Looks like there's some missed load folding opportunities for i64 loads.

llvm-svn: 317544
2017-11-07 07:13:03 +00:00
Craig Topper afc3c8206e [X86] Use IMPLICIT_DEF in VEX/EVEX vcvtss2sd/vcvtsd2ss patterns instead of a COPY_TO_REGCLASS.
ExeDepsFix pass should take care of making the registers match.

llvm-svn: 317542
2017-11-07 04:44:22 +00:00
Craig Topper 4ad81b51ed [X86] Remove 'Requires' from instructions with no patterns. NFC
llvm-svn: 317541
2017-11-07 04:44:21 +00:00
Matt Arsenault 6119f80034 AMDGPU: Remove redundant combine
This combine was already done in two places. The
generic combiner already has done this since
r217610, for adds (with a single use).

This one was added in r303641, and added support for handling
or as well. r313251 later added support to the generic
combine for or. It also turns out the isOrEquivalentToAdd
check is not necessary for this combine.

Additionally, we already reproduce this combine in yet
another place in the backend, although in that version
multiple uses of the add are still folded if it will
allow a fold into the addressing mode. That version needs
to be improved to understand ors though, as well as the
correct legal offsets for private.

llvm-svn: 317526
2017-11-07 00:06:32 +00:00
Craig Topper 428a4e6374 [X86] Make FeatureAVX512 imply FeatureF16C.
The EVEX to VEX pass is already assuming this is true under AVX512VL. We had special patterns to use zmm instructions if VLX and F16C weren't available.

Instead just make AVX512 imply F16C to make the EVEX to VEX behavior explicitly legal and remove the extra patterns.

All known CPUs with AVX512 have F16C so this should safe for now.

llvm-svn: 317521
2017-11-06 22:49:04 +00:00
Craig Topper cb6c38612e [X86] Make FeatureAVX512 imply FeatureFMA.
Previously our VEX patterns were checking Subtarget.hasFMA() which checked FMA || AVX512. So we were behaving as if AVX512 implied it anyway. Which means we'd allow VEX encoded 128/256 FMA when AVX512F was enabled but AVX512VL is off. Regardless of the FMA flag.

EVEX to VEX also transforms scalar EVEX FMA instructions to their VEX versions even without the FMA flag. Similarly for 128/256 under AVX512VL.

So this makes AVX512 imply FeatureFMA to make our current behavior explicit.

All known CPUs that support AVX512 have VEX FMA instructions.

llvm-svn: 317520
2017-11-06 22:49:01 +00:00
Graham Yiu 52a52a6cab Fix buildbot breakages from r317503. Add parentheses to assignment when using result as a condition.
llvm-svn: 317508
2017-11-06 21:04:19 +00:00
Graham Yiu 030621bbcb Adds code to PPC ISEL lowering to recognize byte inserts from vector_shuffles, and use P9 shift and vector insert byte instructions instead of vperm. Extends tests from vector insert half-word.
Differential Revision: https://reviews.llvm.org/D34497

llvm-svn: 317503
2017-11-06 20:18:30 +00:00
Guozhi Wei e3b8d9a312 [PPC] Use xxbrd to speed up bswap64
Power doesn't have bswap instructions, so llvm generates following code sequence for bswap64.

  rotldi   5, 3, 16
  rotldi   4, 3, 8
  rotldi   9, 3, 24
  rotldi   10, 3, 32
  rotldi   11, 3, 48
  rotldi   12, 3, 56
  rldimi 4, 5, 8, 48
  rldimi 4, 9, 16, 40
  rldimi 4, 10, 24, 32
  rldimi 4, 11, 40, 16
  rldimi 4, 12, 48, 8
  rldimi 4, 3, 56, 0

But Power9 has vector bswap instructions, they can also be used to speed up scalar bswap intrinsic. With this patch, bswap64 can be translated to:

  mtvsrdd 34, 3, 3
  xxbrd 34, 34
  mfvsrld 3, 34

Differential Revision: https://reviews.llvm.org/D39510

llvm-svn: 317499
2017-11-06 19:09:38 +00:00
Matt Arsenault 4f6318fe1b AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32
llvm-svn: 317492
2017-11-06 17:04:37 +00:00
Sanjay Patel 629c411538 [IR] redefine 'UnsafeAlgebra' / 'reassoc' fast-math-flags and add 'trans' fast-math-flag
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
and again more recently:
http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html

...this is a step in cleaning up our fast-math-flags implementation in IR to better match
the capabilities of both clang's user-visible flags and the backend's flags for SDNode.

As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the 
'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic 
reassociation - 'AllowReassoc'.

We're also adding a bit to allow approximations for library functions called 'ApproxFunc' 
(this was initially proposed as 'libm' or similar).

...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did 
look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits), 
but that's apparently already used for other purposes. Also, I don't think we can just 
add a field to FPMathOperator because Operator is not intended to be instantiated. 
We'll defer movement of FMF to another day.

We keep the 'fast' keyword. I thought about removing that, but seeing IR like this:
%f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2
...made me think we want to keep the shortcut synonym.

Finally, this change is binary incompatible with existing IR as seen in the 
compatibility tests. This statement:
"Newer releases can ignore features from older releases, but they cannot miscompile 
them. For example, if nsw is ever replaced with something else, dropping it would be 
a valid way to upgrade the IR." 
( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility )
...provides the flexibility we want to make this change without requiring a new IR 
version. Ie, we're not loosening the FP strictness of existing IR. At worst, we will 
fail to optimize some previously 'fast' code because it's no longer recognized as 
'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'.

Note: an inter-dependent clang commit to use the new API name should closely follow 
commit.

Differential Revision: https://reviews.llvm.org/D39304

llvm-svn: 317488
2017-11-06 16:27:15 +00:00
Simon Pilgrim ad9b9720e8 [X86][SSE] Merge combineExtractVectorElt_SSE into combineExtractVectorElt. NFCI.
We still early-out for X86ISD::PEXTRW/X86ISD::PEXTRB so no actual change in behaviour, but it'll make it easier to add support in a future patch.

llvm-svn: 317485
2017-11-06 15:28:25 +00:00
Simon Pilgrim 14450720e6 [X86][SSE] Combine EXTRACT_VECTOR_ELT with combineExtractWithShuffle before XFormVExtractWithShuffleIntoLoad
combineExtractWithShuffle can handle more complex shuffles/bitcasts than we can with the equivalent code in XFormVExtractWithShuffleIntoLoad.

Mainly a compile time improvement now (combineExtractWithShuffle combines will have always failed late on inside XFormVExtractWithShuffleIntoLoad), and will let us merge combineExtractVectorElt_SSE in a future commit.

llvm-svn: 317481
2017-11-06 14:34:19 +00:00
Yaxun Liu cc56a8b108 [AMDGPU] Change alloca addr space of r600 to 5 for amdgiz environment
Differential Revision: https://reviews.llvm.org/D39657

llvm-svn: 317479
2017-11-06 14:32:33 +00:00
Jonas Paulsson e54cc1a436 [SystemZ] implement hasDivRemOp()
SystemZ can do division and remainder in a single instruction for scalar
integer types, which are now reflected by returning true in this hook for
those cases.

Review: Ulrich Weigand
llvm-svn: 317477
2017-11-06 13:10:31 +00:00
Yaxun Liu 1ac16619d2 [AMDGPU] Fix assertion due to assuming pointer in default addr space is 32 bit
The backend assumes pointer in default addr space is 32 bit, which is not
true for the new addr space mapping and causes assertion for unresolved
functions.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D39643

llvm-svn: 317476
2017-11-06 13:01:33 +00:00
Simon Dardis 169df4e24b [mips] Add movep for microMIPS32R6 and fix microMIPS32r3 version
Previously, the 'movep' instruction was defined for microMIPS32r3 and
shared that definition with microMIPS32R6. 'movep' was re-encoded for
microMIPS32r6, so this patch provides the correct encoding.

Secondly, correct the encoding of the 'rs' and 'rt' operands which have
an instruction specific encoding for the registers those operands accept.

Finally, correct the decoding of the 'dst_regs' operand which was extracting
the relevant field from the instruction, but was actually extracting the
field from the alreadly extracted field.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D39495

llvm-svn: 317475
2017-11-06 12:59:53 +00:00
Mohammed Agabaria 6691758364 [LV][X86] update the cost of interleaving mem. access of floats
Recommit:
This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8.
fixed the location of the lit test it works with make check-all.

Differential Revision: https://reviews.llvm.org/D39403

llvm-svn: 317471
2017-11-06 10:56:20 +00:00
Simon Dardis e57795384c [mips] Fix PR35140
Mark all symbols involved with TLS relocations as being TLS symbols.

This resolves PR35140.

Thanks to Alex Crichton for reporting the issue!

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D39591

llvm-svn: 317470
2017-11-06 10:50:04 +00:00
Uriel Korach bb86686a8b [X86][AVX512] Improve lowering of AVX512 test intrinsics
Added TESTM and TESTNM to the list of instructions that already zeroing unused upper bits
and does not need the redundant shift left and shift right instructions afterwards.
Added a pattern for TESTM and TESTNM in iselLowering, so now icmp(neq,and(X,Y), 0) goes folds into TESTM
and icmp(eq,and(X,Y), 0) goes folds into TESTNM
This commit is a preparation for lowering the test and testn X86 intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D38732

llvm-svn: 317465
2017-11-06 09:22:38 +00:00
Uriel Korach eb47d95d52 [X86] Replace duplicate function call with variable. NFC
Change from:
if (N->getOperand(0).getValueType() == MVT::v8i32 ||
    N->getOperand(0).getValueType() == MVT::v8f32)

to:
EVT OpVT = N->getOperand(0).getValueType();
if (OpVT == MVT::v8i32 || OpVT == MVT::v8f32)

Change-Id: I5a105f8710b73a828e6cfcd55fac2eae6153ce25
llvm-svn: 317464
2017-11-06 08:32:45 +00:00
Zvi Rackover 3122698040 X86 ISel: Basic support for variable-index vector permutations
Summary:
Try to lower a BUILD_VECTOR composed of extract-extract chains that can be
reasoned to be a permutation of a vector by indices in a non-constant vector.

We saw this pattern created by ISPC, which resolts to creating it due to the
requirement that shufflevector's mask operand be a *constant* vector.
I didn't check this but we could possibly use this pattern for lowering the X86 permute
C-instrinsics instead of llvm.x86 instrinsics.

This change can be followed by more improvements:
1. Handle vectors with undef elements.
2. Utilize pshufb and zero-mask-blending to support more effiecient
   construction of vectors with constant-0 elements.
3. Use smaller-element vectors of same width, and "interpolate" the indices,
   when no native operation available.

Reviewers: RKSimon, craig.topper

Reviewed By: RKSimon

Subscribers: chandlerc, DavidKreitzer

Differential Revision: https://reviews.llvm.org/D39126

llvm-svn: 317463
2017-11-06 08:25:46 +00:00
Jina Nahias 3844f1ad5c Revert "adding a pattern for broadcastm"
This reverts commit r317457.

Change-Id: If07f1fca1e3453d16c1dac906e87768661384e91
llvm-svn: 317462
2017-11-06 07:48:58 +00:00
Jina Nahias 7b705f1f91 [x86][AVX512] Lowering Broadcastm intrinsics to LLVM IR
This patch, together with a matching clang patch (https://reviews.llvm.org/D38683), implements the lowering of X86 broadcastm intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D38684

Change-Id: I709ac0b34641095397e994c8ff7e15d1315b3540
llvm-svn: 317458
2017-11-06 07:09:24 +00:00
Jina Nahias 9c6561b648 adding a pattern for broadcastm
Change-Id: I6551fb13879e098aed74de410e29815cf37d9ab5
llvm-svn: 317457
2017-11-06 07:09:09 +00:00
Craig Topper 70eaeae7f0 [X86] Use EVEX encoded intrinsics for legacy FMA intrinsics when possible.
llvm-svn: 317454
2017-11-06 05:48:26 +00:00
Craig Topper 07dac55d95 [X86] Add scalar FMA ISD nodes without rounding mode. NFC
Next step is to use them for the legacy FMA scalar intrinsics as well. This will enable the legacy intrinsics to use EVEX encoded opcodes and the extended registers.

llvm-svn: 317453
2017-11-06 05:48:25 +00:00
Craig Topper eff606cc0e [X86] Use EVEX encoded instructions for legacy scalar sqrt intrinsics.
Fixes PR35161.

llvm-svn: 317445
2017-11-06 04:04:01 +00:00
Craig Topper d6471cb934 [X86] Add missing predicate to a pattern. NFC
Other patterns had higher priority so this wasn't noticed. But we shouldn't be dependent on pattern order.

llvm-svn: 317442
2017-11-05 21:14:06 +00:00
Craig Topper 4e2f53511a [X86] Remove some more RCP and RSQRT patterns from InstrAVX512.td that I missed in r317413.
llvm-svn: 317441
2017-11-05 21:14:05 +00:00
Craig Topper 948c39c480 [X86] Fix outdated comment. NFC
llvm-svn: 317440
2017-11-05 21:14:04 +00:00
Mohammed Agabaria acd69dbc7c [REVERT][LV][X86] update the cost of interleaving mem. access of floats
reverted my changes will be committed later after fixing the failure
This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8.

Differential Revision: https://reviews.llvm.org/D39403

llvm-svn: 317433
2017-11-05 09:36:54 +00:00
Mohammed Agabaria f74c767de6 [LV][X86] update the cost of interleaving mem. access of floats
This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8.

Differential Revision: https://reviews.llvm.org/D39403

llvm-svn: 317432
2017-11-05 09:06:23 +00:00
Craig Topper 692c8efe30 [X86] Don't use RCP14 and RSQRT14 for reciprocal estimations or for legacy SSE rcp/rsqrt intrinsics when AVX512 features are enabled.
Summary:
AVX512 added RCP14 and RSQRT instructions which improve accuracy over the legacy RCP and RSQRT instruction, but not enough accuracy to remove the need for a Newton Raphson refinement.

Currently we use these new instructions for the legacy packed SSE instrinics, but not the scalar instrinsics. And we use it for fast math optimization of division and reciprocal sqrt.

I think switching the legacy instrinsics maybe surprising to the user since it changes the answer based on which processor you're using regardless of any fastmath settings. It's also weird that we did something different between scalar and packed.

As far at the reciprocal estimation, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc and icc and godbolt suggest they don't change which instructions they use here.

This patch adds new X86ISD nodes for the RCP14/RSQRT14 and uses those for the new intrinsics. Leaving the old intrinsics to use the old instructions.

Going forward I think our focus should be on
-Supporting 512-bit vectors, which will have to use the RCP14/RSQRT14.
-Using RSQRT28/RCP28 to remove the Newton Raphson step on processors with AVX512ER
-Supporting double precision.

Reviewers: zvi, DavidKreitzer, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39583

llvm-svn: 317413
2017-11-04 18:26:41 +00:00
Craig Topper e5d44cefea [X86] Teach EVEX->VEX pass to turn SHUFI32X4/SHUFF32X4/SHUFI64X/SHUFF64X2 into VPERM2F128/VPERM2I128.
This recovers some of the tests that were changed by r317403.

llvm-svn: 317410
2017-11-04 18:10:03 +00:00
Yaxun Liu 0d9673cff2 [AMDGPU] Remove hardcoded address space value from AMDGPULibFunc
AMDGPULibFunc hardcodes address space values of the old address space mapping,
which causes invalid addrspacecast instructions and undefined functions in
APPSDK sample MonteCarloAsianDP.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D39616

llvm-svn: 317409
2017-11-04 17:37:43 +00:00
Craig Topper a96d62b360 [X86] Teach shuffle lowering to use 256-bit SHUF128 when possible.
This allows masked operations to be used and allows the register allocator to use YMM16-31 if necessary.

As a follow up I'll look into teaching EVEX->VEX how to turn this back into PERM2X128 if any of the additional features don't work out.

llvm-svn: 317403
2017-11-04 06:44:47 +00:00
Craig Topper d21a53f246 [X86] Give unary PERMI priority over SHUF128 in lowerV8I64VectorShuffle to make it possible to fold a load.
llvm-svn: 317382
2017-11-03 22:48:13 +00:00
David Blaikie 1be62f0327 Move TargetFrameLowering.h to CodeGen where it's implemented
This header already includes a CodeGen header and is implemented in
lib/CodeGen, so move the header there to match.

This fixes a link error with modular codegeneration builds - where a
header and its implementation are circularly dependent and so need to be
in the same library, not split between two like this.

llvm-svn: 317379
2017-11-03 22:32:11 +00:00
Aaron Ballman ecf0e95267 Add llvm::for_each as a range-based extensions to <algorithm> and make use of it in some cases where it is a more clear alternative to std::for_each.
llvm-svn: 317356
2017-11-03 20:01:25 +00:00
Evandro Menezes 9dcf099944 [AArch64] Fix the number of iterations for the Newton series
The number of iterations was incorrectly determined for DP FP vector types
and the tests were insufficient to flag this issue.

Differential revision: https://reviews.llvm.org/D39507

llvm-svn: 317349
2017-11-03 18:56:36 +00:00
Simon Dardis d3b9f61c52 [mips] Match 'ins' and its' variants with C++ code
Change the ISel matching of 'ins', 'dins[mu]' from tablegen code to
C++ code. This resolves an issue where ISel would select 'dins' instead
of 'dinsm' when the instructions size and position were individually in
range but their sum was out of range according to the ISA specification.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D39117

llvm-svn: 317331
2017-11-03 15:35:13 +00:00
Andrew V. Tischenko 0916c6b654 Fix for Bug 34475 - LOCK/REP/REPNE prefixes emitted as instruction on their own.
Differential Revision: https://reviews.llvm.org/D39546

llvm-svn: 317330
2017-11-03 15:25:13 +00:00
Simon Pilgrim ae1f013495 [X86][SSE] Add PACKUS support to combineVectorTruncation
Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value.

We have to account for pre-SSE41 targets not supporting PACKUSDW

llvm-svn: 317315
2017-11-03 11:33:48 +00:00
Diana Picus acf4bf21ab [ARM GlobalISel] Move the check for Thumb higher up
We're currently bailing out for Thumb targets while lowering formal
parameters, but there used to be some other checks before it, which
could've caused some functions (e.g. those without formal parameters) to
sneak through unnoticed.

llvm-svn: 317312
2017-11-03 10:30:12 +00:00
Martin Storsjo 9befcd7d8d [AArch64] Use dwarf exception handling on MinGW
Ideally we should probably produce WinEH here as well, but until
then, we can use dwarf exceptions, without any further changes
required in clang, libunwind or libcxxabi.

Differential Revision: https://reviews.llvm.org/D39535

llvm-svn: 317304
2017-11-03 07:33:20 +00:00
Craig Topper 333897ec31 [X86] Remove PALIGNR/VALIGN handling from combineBitcastForMaskedOp and move to isel patterns instead. Prefer 128-bit VALIGND/VALIGNQ over PALIGNR during lowering when possible.
llvm-svn: 317299
2017-11-03 06:48:02 +00:00
Sriraman Tallam 7cdb10f1aa Avoid PLT for external calls when attribute nonlazybind is used.
Differential Revision: https://reviews.llvm.org/D39065

llvm-svn: 317292
2017-11-03 00:10:19 +00:00
Quentin Colombet b6afac1f9a [AArch64][RegisterBankInfo] Add mapping for G_FPEXT.
This fixes http://llvm.org/PR32560. We were missing a description for
half floating point type and as a result were using the FPR 32 mapping.
Because of the size mismatch the generic code was complaining that the
default mapping is not appropriate. Fix the mapping description so that
the default mapping can be properly applied.

llvm-svn: 317287
2017-11-02 23:38:19 +00:00
Quentin Colombet 619d649878 [AArch64][RegisterBankInfo] Add FPR16 support in value mapping.
NFC.

llvm-svn: 317286
2017-11-02 23:38:13 +00:00
Craig Topper 086c04c8a7 [X86] Give AVX512VL instructions priority over their AVX equivalents.
I thought we had gotten all these priority bugs worked out, but I guess not.

llvm-svn: 317283
2017-11-02 23:23:37 +00:00
Konstantin Zhuravlyov 275a4f76c4 AMDGPU: Fix warning discovered by r317266 [-Wunused-private-field]
llvm-svn: 317280
2017-11-02 22:35:22 +00:00
Krzysztof Parzyszek 058014fca5 [Hexagon] Prefer L2_loadrub_io over L4_loadrub_rr
If the offset is an immediate, avoid putting it in a register
to get Rs+Rt<<#0.

llvm-svn: 317275
2017-11-02 21:56:59 +00:00
Konstantin Zhuravlyov b695cd41b3 AMDGPU: Remove outdated fixme (it was already fixed)
llvm-svn: 317266
2017-11-02 20:48:06 +00:00
Simon Dardis 725acb2d91 [mips] Use register scavenging with MSA.
MSA stores and loads to the stack are more likely to require an
emergency GPR spill slot due to the smaller offsets available
with those instructions.

Handle this by overestimating the size of the stack by determining
the largest offset presuming that all callee save registers are
spilled and accounting of incoming arguments when determining
whether an emergency spill slot is required.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D39056

llvm-svn: 317204
2017-11-02 12:47:22 +00:00
Sam Parker 242052c6b4 [ARM] and, or, xor and add with shl combine
The generic dag combiner will fold:

(shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
(shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2)

This can create constants which are too large to use as an immediate.
Many ALU operations are also able of performing the shl, so we can
unfold the transformation to prevent a mov imm instruction from being
generated.

Other patterns, such as b + ((a << 1) | 510), can also be simplified
in the same manner.

Differential Revision: https://reviews.llvm.org/D38084

llvm-svn: 317197
2017-11-02 10:43:10 +00:00
Andrew V. Tischenko 3c8bf5ec37 The patch updates sched numbers for YMM AVX instrs such as VMOVx, VORx, VXOR, VPERMILx, VBROADCASTx, etc.
PR32857 should be closed.
Differential Revision: https://reviews.llvm.org/D39227

llvm-svn: 317196
2017-11-02 10:33:41 +00:00
Petar Jovanovic bb5c84fb57 Revert "Correct dwarf unwind information in function epilogue for X86"
This reverts r317100 as it introduced sanitizer-x86_64-linux-autoconf
buildbot failure (build #15606).

llvm-svn: 317136
2017-11-01 23:05:52 +00:00
Craig Topper 3837322a6b [X86] Use foreach in X86.td to combine some of the CPU names that are obviously aliases. NFC
llvm-svn: 317134
2017-11-01 22:15:49 +00:00
Craig Topper 7a754c4622 [X86] Add CMOV feature to 'i686' processor, making it a proper alias of pentiumpro which I believe it should be.
This is consistent with current gcc behavior.

llvm-svn: 317133
2017-11-01 22:15:40 +00:00
Simon Pilgrim e152c2c447 [X86][SSE] Add PACKUS support to LowerTruncate
Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value.

We have to account for pre-SSE41 targets not supporting PACKUSDW

llvm-svn: 317128
2017-11-01 21:52:29 +00:00
Craig Topper 4e56ba271e [X86] Add custom code to EVEX to VEX pass to turn unmasked 128-bit VPALIGND/Q into VPALIGNR if the extended registers aren't being used.
This will enable us to prefer VALIGND/Q during shuffle lowering in order to get the extended register encoding space when BWI isn't available. But if we end up not using the extended registers we can switch VPALIGNR for the shorter VEX encoding.

Differential Revision: https://reviews.llvm.org/D39401

llvm-svn: 317122
2017-11-01 21:00:59 +00:00
Konstantin Zhuravlyov 435151ad75 AMDGPU: Fix set but not used warnings related to AMDGPUAS
Differential Revision: https://reviews.llvm.org/D39499

llvm-svn: 317114
2017-11-01 19:12:38 +00:00
Craig Topper ca1aa83cbe [X86] Prevent fast isel from folding loads into the instructions listed in hasPartialRegUpdate.
This patch moves the check for opt size and hasPartialRegUpdate into the lower level implementation of foldMemoryOperandImpl to catch the entry point that fast isel uses.

We're still folding undef register instructions in AVX that we should also probably disable, but that's a problem for another patch.

Unfortunately, this requires reordering a bunch of functions which is why the diff is so large. I can do the function reordering separately if we want.

Differential Revision: https://reviews.llvm.org/D39402

llvm-svn: 317112
2017-11-01 18:10:06 +00:00
Graham Yiu 671526148c Adds code to PPC ISEL lowering to recognize half-word inserts from vector_shuffles, and use P9 shift and vector insert instructions instead of vperm.
Differential Revision: https://reviews.llvm.org/D34160

llvm-svn: 317111
2017-11-01 18:06:56 +00:00
Craig Topper 5ae677e102 [X86] Add 64-bit int to float/double conversion with AVX to X86FastISel::X86SelectSIToFP
Summary:
[X86] Teach fast isel to handle i64 sitofp with AVX.

For some reason we only handled i32 sitofp with AVX. But with SSE only we support i64 so we should do the same with AVX.

Also add i686 command lines for the 32-bit tests. 64-bit tests are in a separate file to avoid a fast-isel abort failure in 32-bit mode.

Reviewers: RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39450

llvm-svn: 317102
2017-11-01 16:23:06 +00:00
Andrew V. Tischenko 3d971e39f8 Update VCVTx, VMOVNTPx and VROUNDYPx instructions scheduling on btver2.
Differential Revision: https://reviews.llvm.org/D39059

llvm-svn: 317101
2017-11-01 16:10:20 +00:00
Petar Jovanovic f2faee92aa Correct dwarf unwind information in function epilogue for X86
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.

It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.

The second part is platform independent and ensures that:

- CFI instructions do not affect code generation
- Unwind information remains correct when a function is modified by
  different passes. This is done in a late pass by analyzing information
  about cfa offset and cfa register in BBs and inserting additional CFI
  directives where necessary.

Changed CFI instructions so that they:

- are duplicable
- are not counted as instructions when tail duplicating or tail merging
- can be compared as equal

Added CFIInstrInserter pass:

- analyzes each basic block to determine cfa offset and register valid at
  its entry and exit
- verifies that outgoing cfa offset and register of predecessor blocks match
  incoming values of their successors
- inserts additional CFI directives at basic block beginning to correct the
  rule for calculating CFA

Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.

CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.


Patch by Violeta Vukobrat.

Differential Revision: https://reviews.llvm.org/D35844

llvm-svn: 317100
2017-11-01 16:04:11 +00:00
Simon Pilgrim 778810eb42 [X86][SSE] Begun generalizing truncateVectorWithPACKSS to work with PACKSS/PACKUS functions
Renamed to truncateVectorWithPACK

llvm-svn: 317098
2017-11-01 15:31:51 +00:00
Roger Ferrer Ibanez 9dfbc10522 Revert r313618 "[ARM] Use ADDCARRY / SUBCARRY"
That change causes PR35103, so reverting until I figure it out.

llvm-svn: 317092
2017-11-01 14:06:57 +00:00
NAKAMURA Takumi 1657f2ad99 Fix warnings discovered by rL317076. [-Wunused-private-field]
llvm-svn: 317091
2017-11-01 13:47:55 +00:00
NAKAMURA Takumi f7d7a59b9e Suppress a warning discovered by rL317076. [-Wunused-private-field]
llvm-svn: 317090
2017-11-01 13:47:51 +00:00
Simon Pilgrim f657ba0cb6 [X86][SSE] Truncate with PACKSS any input with sufficient sign-bits
So far we've only been using PACKSS truncations with 'all-bits or zero-bits' patterns (vector comparison results etc.). When really we can safely use it for any case as long as the number of sign bits reach down to the last 16-bits (or 8-bits if we're truncating to bytes).

The next steps after this is add the equivalent support for PACKUS and to support packing to sub-128 bit vectors for truncating stores etc.

Differential Revision: https://reviews.llvm.org/D39476

llvm-svn: 317086
2017-11-01 11:47:44 +00:00
Craig Topper 688f0ca6a7 [X86] Add more type qualifiers to INSERT_SUBREG operations in rotate patterns so they don't get created with a v64i8 type.
Not sure why tablegen didn't error on this.

Fixes PR35158.

llvm-svn: 317079
2017-11-01 07:11:32 +00:00
Craig Topper a827f84dcc [X86] Add AVX512 support to X86FastISel::fastMaterializeFloatZero.
llvm-svn: 317059
2017-11-01 00:47:45 +00:00
Benjamin Kramer f9ab3ddb8f [AMDGPU] Clean up symbols in the global namespace.
llvm-svn: 317051
2017-10-31 23:21:30 +00:00
Marek Olsak 5914ece6aa AMDGPU: Select s_buffer_load_dword with a non-constant SGPR offset
Summary:
Apps that benefit:
- alien isolation
- bioshock infinite
- civilization: beyond earth
- company of heroes 2
- dirt showdown
- dota 2
- F1 2015
- grid autosport
- hitman
- legend of grimrock
- serious sam 3: bfe
- shadow warrior
- talos principle
- total war: warhammer
- UE4 demos: effects cave, elemental, sun temple

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38914

llvm-svn: 317038
2017-10-31 21:06:42 +00:00
Reid Kleckner 39970069b1 [X86][AsmParser] Treat '%' as the modulo operator under Intel syntax
It can't be a register prefix, anyway. This is consistent with the masm
docs on MSDN: https://msdn.microsoft.com/en-us/library/t4ax90d2.aspx

This is a straight-forward extension of our support for "MOD"
implemented in https://reviews.llvm.org/D33876 / r306425

llvm-svn: 317011
2017-10-31 16:47:38 +00:00
Simon Pilgrim f3c33ca83e [X86][SSE] Add VSRLI/VSRAI/VSLLI demanded elts support to computeKnownBits/ComputeNumSignBits
Mainly a perf improvements as most combines will have occurred before we lower to these instructions

llvm-svn: 317005
2017-10-31 16:06:21 +00:00
Michael Zuckerman 9e58831cb8 [AVX512] Adding new patterns for extract_subvector of vXi1
extract subvector of vXi1 from vYi1 is poorly supported by LLVM and most of the time end with an assertion.
This patch fixes this issue by adding new patterns to the TD file.

Reviewers:
1. guyblank
2. igorb
3. zvi
4. ayman
5. craig.topper

Differential Revision: https://reviews.llvm.org/D39292

Change-Id: Ideb4d7e946c8d40cfce2920891f2d89fe64c58f8
llvm-svn: 316981
2017-10-31 10:00:19 +00:00
Craig Topper beed653135 [X86] Make AVX512_512_SET0 XMM16-31 lower to 128-bit XOR when AVX512VL is enabled. Use 128-bit VLX instruction when VLX is enabled.
Unfortunately, this weakens our ability to do domain fixing when AVX512DQ is not enabled, but it is consistent with our 256-bit behavior.

Maybe we should add custom handling to domain fixing to allow EVEX integer XOR/AND/OR/ANDN to switch to VEX encoded fp instructions if the high registers aren't being used?

llvm-svn: 316978
2017-10-31 06:01:04 +00:00
Craig Topper 668b1ab6f1 [X86] Clang-format some code. NFC
llvm-svn: 316973
2017-10-31 02:34:29 +00:00
Javed Absar d13d419d4a [AArch64]: range loopify frame-lowering
llvm-svn: 316960
2017-10-30 22:00:06 +00:00
Craig Topper 9f01f6093c [X86] Add AVX512 support to fast isel's X86ChooseCmpOpcode.
llvm-svn: 316955
2017-10-30 21:09:19 +00:00
Stefan Pintilie 6262fd4b0a Revert "[PowerPC] Try to simplify a Swap if it feeds a Splat"
Revert r316478.
A test case has failed.
Will recommit this change once we find and fix the failure.

This reverts commit 7c330fabaedaba3d02c58bc3cc1198896c895f34.

llvm-svn: 316952
2017-10-30 19:55:38 +00:00
Jina Nahias 5bf6620b15 [X86][AVX512] Adding a pattern for broadcastm intrinsic.
Differential Revision: https://reviews.llvm.org/D38312

Change-Id: I71c8605a8e4c98013ef25289694afc5cfd46bb0b
llvm-svn: 316921
2017-10-30 16:37:28 +00:00
Rafael Espindola 6f36637be0 Move isDSOLocal check and add a comment.
llvm-svn: 316920
2017-10-30 16:32:31 +00:00
Fangrui Song 2696db90d1 [PPC CodeGen] Fix the bitreverse.i64 intrinsic.
Summary: The two 32-bit words were swapped. Update a test omitted in reverted r316270.

Reviewers: jtony, aaron.ballman

Subscribers: nemanjai, kbarton

Differential Revision: https://reviews.llvm.org/D39163

llvm-svn: 316916
2017-10-30 16:03:44 +00:00
Craig Topper 4e13d4de52 [X86] Make sure we don't create locked inc/dec instructions when the carry flag is being used.
Summary:
INC/DEC don't update the carry flag so we need to make sure we don't try to use it.

This patch introduces new X86ISD opcodes for locked INC/DEC. Teaches lowerAtomicArithWithLOCK to emit these nodes if INC/DEC is not slow or the function is being optimized for size. An additional flag is added that allows the INC/DEC to be disabled if the caller determines that the carry flag is being requested.

The test_sub_1_cmp_1_setcc_ugt test is currently showing this bug. The other test case changes are recovering cases that were regressed in r316860.

This should fully fix PR35068 finishing the fix started in r316860.

Reviewers: RKSimon, zvi, spatel

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39411

llvm-svn: 316913
2017-10-30 14:51:37 +00:00
Craig Topper 367cc12fa9 [X86] Remove AVX512 early out from X86FastISel::X86SelectCmp.
This shouldn't be needed anymore since i1 isn't a legal type.

llvm-svn: 316912
2017-10-30 14:50:11 +00:00
Yaxun Liu c928f2a6d4 [AMDGPU] Emit metadata for hidden arguments for kernel enqueue
Identifies kernels which performs device side kernel enqueues and emit
metadata for the associated hidden kernel arguments. Such kernels are
marked with calls-enqueue-kernel function attribute by
AMDGPUOpenCLEnqueueKernelLowering pass and later on
hidden kernel arguments metadata HiddenDefaultQueue and
HiddenCompletionAction are emitted for them.

Differential Revision: https://reviews.llvm.org/D39255

llvm-svn: 316907
2017-10-30 14:30:28 +00:00
Clement Courbet b2c3eb8cf1 [CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2).
- Targets that want to support memcmp expansions now return the list of
   supported load sizes.
 - Expansion codegen does not assume that all power-of-two load sizes
   smaller than the max load size are valid. For examples, this is not the
   case for x86(32bit)+sse2.

Fixes PR34887.

llvm-svn: 316905
2017-10-30 14:19:33 +00:00
Krzysztof Parzyszek bef1c56724 [Hexagon] Allow the RDF optimizations to be run in .mir testcases
llvm-svn: 316904
2017-10-30 14:11:52 +00:00
Javed Absar 5cde1ccb29 [GlobalISel|ARM] : Allow legalizing G_FSUB
Adding support for VSUB.
Reviewed by: @rovka
Differential Revision: https://reviews.llvm.org/D39261

llvm-svn: 316902
2017-10-30 13:51:56 +00:00
Andrew V. Tischenko f94da596a7 Invalid used of 'w' suffix on push and pop using 64-bit register.
Differential Revision: https://reviews.llvm.org/D38626

llvm-svn: 316898
2017-10-30 12:02:06 +00:00
Jina Nahias e63db55c67 Revert "[X86][AVX512] Adding a pattern for broadcastm intrinsic."
This reverts commit r316890.

Change-Id: I683cceee9848ef309b452293086b1f26a941950d
llvm-svn: 316894
2017-10-30 10:35:53 +00:00
Jina Nahias 70280f9a0d [X86][AVX512] Adding a pattern for broadcastm intrinsic.
Differential Revision: https://reviews.llvm.org/D38312

Change-Id: I6551fb13879e098aed74de410e29815cf37d9ab5
llvm-svn: 316890
2017-10-30 09:59:52 +00:00
Craig Topper 85bcc297c3 [X86] Rearrange code in X86InstrInfo.cpp to put all the foldMemoryOperandImpl methods together without partial/undef register handling in the middle. NFC
I have a future patch that wants to make use of the one of the partial functions in one of the earlier memory folding methods and the current ordering prevents that.

llvm-svn: 316883
2017-10-30 04:39:18 +00:00
Craig Topper c848355335 [X86] Simplify code by removing an unnecessary temporary variable. NFC
llvm-svn: 316882
2017-10-30 03:35:44 +00:00
Craig Topper 730414b0ca [X86] Move some EVEX->VEX code to a helper function to prepare for a future patch. NFC
llvm-svn: 316881
2017-10-30 03:35:43 +00:00
Craig Topper 495a1bc893 [X86] Remove combine that turns X86ISD::LSUB into X86ISD::LADD. Update patterns that depended on this.
If the carry flag is being used, this transformation isn't safe.

This does prevent some test cases from using DEC now, but I'll try to look into that separately.

Fixes PR35068.

llvm-svn: 316860
2017-10-29 06:51:04 +00:00
Craig Topper 7a60e29185 [X86] Fix typo in comment. NFC
llvm-svn: 316859
2017-10-29 06:51:02 +00:00
Craig Topper 912f3b8e4b [X86] Use the extended vector register classes in fast isel with AVX512F/VL.
llvm-svn: 316857
2017-10-29 05:14:26 +00:00
Craig Topper 5f2289a13c [X86] Add AVX512 support to X86FastISel::X86SelectFPExt and X86FastISel::X86SelectFPTrunc.
llvm-svn: 316856
2017-10-29 02:50:31 +00:00
Craig Topper 1e30d783dd [X86] Add AVX512 support to X86FastISel::X86MaterializeFP
llvm-svn: 316853
2017-10-29 02:18:41 +00:00
Craig Topper 0692ca4bd2 [X86] Remove invalid code from LowerVSELECT.
This code attempted to say that v8i16/v16i16 VSELECT is legal if BWI and VLX are enabled, but the only way we could reach this point is if the condition was not a vXi1 type. Which means it really wasn't legal.

We don't have any tests that exercise this code. So I'm hoping it wasn't really reachable.

llvm-svn: 316851
2017-10-28 23:10:13 +00:00
Simon Pilgrim 294f88dfa0 [X86][SSE] Combine 128-bit target shuffles to PACKSS/PACKUS.
llvm-svn: 316845
2017-10-28 20:51:27 +00:00
Simon Pilgrim bd3852aa5e [X86][SSE] Split off matchVectorShuffleWithPACK. NFCI.
Split matchVectorShuffleWithPACK from lowerVectorShuffleWithPACK so that we can reuse it for target shuffle combines

llvm-svn: 316844
2017-10-28 20:27:22 +00:00
Craig Topper 40f0584f08 [X86] Fix a mistake in the X86ISelDAGToDAG.cpp code for MUL8r/IMUL8r.
I think this code is unreachable due to some promotions that occur elsewhere. I'll look into that to be sure, but for now I thought I should at least fix the obvious typo.

llvm-svn: 316840
2017-10-28 19:56:57 +00:00
Craig Topper 202b559ae0 [X86] Replace some default cases in X86SelectShift with llvm_unreachable.
llvm-svn: 316839
2017-10-28 19:56:56 +00:00
Sanjay Patel b049173157 [SimplifyCFG] use pass options and remove the latesimplifycfg pass
This is no-functional-change-intended.

This is repackaging the functionality of D30333 (defer switch-to-lookup-tables) and 
D35411 (defer folding unconditional branches) with pass parameters rather than a named
"latesimplifycfg" pass. Now that we have individual options to control the functionality,
we could decouple when these fire (but that's an independent patch if desired). 

The next planned step would be to add another option bit to disable the sinking transform
mentioned in D38566. This should also make it clear that the new pass manager needs to
be updated to limit simplifycfg in the same way as the old pass manager.

Differential Revision: https://reviews.llvm.org/D38631

llvm-svn: 316835
2017-10-28 18:43:07 +00:00
Simon Pilgrim 25808c303f [X86][SSE] Rename truncateVectorCompareWithPACKSS to truncateVectorWithPACKSS. NFC.
We no longer rely on the vector source being a comparison result, just have sufficient sign bits.

llvm-svn: 316834
2017-10-28 17:59:56 +00:00
Craig Topper f8b92661b8 [X86] Remove unneeded MVT::i1 related code from fast isel.
llvm-svn: 316825
2017-10-28 05:52:23 +00:00
Tom Stellard d0c6cf2e8c AMDGPU/GlobalISel: Mark 32-bit G_FADD as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D38439

llvm-svn: 316815
2017-10-27 23:57:41 +00:00
Krzysztof Parzyszek 4dc04e6a70 [Hexagon] Adjust patterns to reflect instruction selection preferences
llvm-svn: 316804
2017-10-27 22:24:49 +00:00
David Blaikie 8699f71310 Add a few missing headers for modularization/IWYU/etc
Several cases where class definitions are required for DenseMap pointer
traits handling.

llvm-svn: 316803
2017-10-27 22:12:46 +00:00
Rafael Espindola 2393c3b4e1 Handle undefined weak hidden symbols on all architectures.
We were handling the non-hidden case in lib/Target/TargetMachine.cpp,
but the hidden case was handled in architecture dependent code and
only X86_64 and AArch64 were covered.

While it is true that some code sequences in some ABIs might be able
to produce the correct value at runtime, that doesn't seem to be the
common case.

I left the AArch64 code in place since it also forces a got access for
non-pic code. It is not clear if that is needed, but it is probably
better to change that in another commit.

llvm-svn: 316799
2017-10-27 21:18:48 +00:00
Craig Topper d69453290e [X86] Remove fast-isel code for handling i8 shifts. This is handled by auto generated code.
llvm-svn: 316797
2017-10-27 21:00:59 +00:00
Craig Topper 728fa7b4e2 [X86] Teach fastisel to use VLX VMOVNTDQA for v4f64 and 256-bit integers when available.
This looks to have been missed from r280682.

llvm-svn: 316790
2017-10-27 20:13:10 +00:00
Krzysztof Parzyszek 92a2635bbd [Hexagon] Fix an incorrect assertion in HexagonConstExtenders.cpp
Making sure that an instruction has fewer operands than required, then
attempting to access one out of range is going to fail.

llvm-svn: 316785
2017-10-27 18:52:28 +00:00
Simon Pilgrim 5e3808afa2 [X86][F16C] Fix btver2 AGU pipe scheduling
Use the store AGU for stores, and the load AGU needs to be the first pipe for loads

llvm-svn: 316771
2017-10-27 16:34:58 +00:00
David Blaikie 6265130054 InstructionSelectorImpl.h: Modularize/remove ODR violations by using a static member function to expose the debug name
llvm-svn: 316715
2017-10-26 23:39:54 +00:00
Eli Friedman d5dfb62de7 [ARM] Honor -mfloat-abi for libcall calling convention
As far as I can tell, this matches gcc: -mfloat-abi determines the
calling convention for all functions except those explicitly defined as
soft-float in the ARM RTABI.

This change only affects cases where the user specifies -mfloat-abi to
override the default calling convention derived from the target triple.

Fixes https://bugs.llvm.org//show_bug.cgi?id=34530.

Differential Revision: https://reviews.llvm.org/D38299

llvm-svn: 316708
2017-10-26 21:42:32 +00:00
Craig Topper b8d7d4d683 [X86] Improve handling of UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG to support 64-bit extensions.
If the extend type is 64-bits, emit a 32-bit -> 64-bit extend after the UDIVREM8_ZEXT_HREG/UDIVREM8_SEXT_HREG operation.

This gives a shorter encoding for the second extend in the sext case, and allows us to completely remove the second extend in the zext case.

This also adds known bit and num sign bits support for UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG.

Differential Revision: https://reviews.llvm.org/D38275

llvm-svn: 316702
2017-10-26 21:12:03 +00:00
Craig Topper 8a2a104129 [X86] Teach the assembly parser to warn on duplicate registers in gather instructions.
Fixes PR32238.

Differential Revision: https://reviews.llvm.org/D39077

llvm-svn: 316700
2017-10-26 21:03:54 +00:00
Sanjay Patel ac50f3e907 [x86] use an insert op to put one variable element into a constant of vectors
Instead of loading (a potential ton of) scalar constants, load those as a vector and then insert into it.

Differential Revision: https://reviews.llvm.org/D38756

llvm-svn: 316685
2017-10-26 18:27:55 +00:00
Yichao Yu 221dae31a5 Clear LastMappingSymbols and LastEMS(Info) when resetting the ARM(AArch64)ELFStreamer
Summary:
This causes a segfault on ARM when (I think) the pass manager is used multiple times.

Reset set the (last) current section to NULL without saving the corresponding LastEMSInfo back into the map. The next use of the streamer then save the LastEMSInfo for the NULL section leaving the LastEMSInfo mapping for the last current section (the one that was there before the reset) NULL which cause the LastEMSInfo to be set to NULL when the section is being used again.

The reuse of the section (pointer) might mean that the map was holding dangling pointers previously which is why I went for clearing the map and resetting the info, making it as similar to the state right after the constructor run as possible. The AArch64 one doesn't have segfault (since LastEMS isn't a pointer) but it seems to have the same issue.

The segfault is likely caused by https://reviews.llvm.org/D30724 which turns LastEMSInfo into a pointer. As mentioned above, it seems that the actual issue was older though.

No test is included since the test is believed to be too complicated for such an obvious fix and not worth doing.

Reviewers: llvm-commits, shankare, t.p.northover, peter.smith, rengolin

Reviewed By: rengolin

Subscribers: mgorny, aemerson, rengolin, javed.absar, kristof.beyls

Differential Revision: https://reviews.llvm.org/D38588

llvm-svn: 316679
2017-10-26 17:36:43 +00:00
Sean Fertile c70d28bff5 Represent runtime preemption in the IR.
Currently we do not represent runtime preemption in the IR, which has several
drawbacks:

  1) The semantics of GlobalValues differ depending on the object file format
     you are targeting (as well as the relocation-model and -fPIE value).
  2) We have no way of disabling inlining of run time interposable functions,
     since in the IR we only know if a function is link-time interposable.
     Because of this llvm cannot support elf-interposition semantics.
  3) In LTO builds of executables we will have extra knowledge that a symbol
     resolved to a local definition and can't be preemptable, but have no way to
     propagate that knowledge through the compiler.

This patch adds preemptability specifiers to the IR with the following meaning:

dso_local --> means the compiler may assume the symbol will resolve to a
 definition within the current linkage unit and the symbol may be accessed
 directly even if the definition is not within this compilation unit.

dso_preemptable --> means that the compiler must assume the GlobalValue may be
replaced with a definition from outside the current linkage unit at runtime.

To ease transitioning dso_preemptable is treated as a 'default' in that
low-level codegen will still do the same checks it did previously to see if a
symbol should be accessed indirectly. Eventually when IR producers emit the
specifiers on all Globalvalues we can change dso_preemptable to mean 'always
access indirectly', and remove the current logic.

Differential Revision: https://reviews.llvm.org/D20217

llvm-svn: 316668
2017-10-26 15:00:26 +00:00
Marek Olsak 2232243863 AMDGPU: Handle s_buffer_load_dword hazard on SI
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D39171

llvm-svn: 316666
2017-10-26 14:43:02 +00:00
Simon Dardis b633acac9f [mips] Fix (dis)assembly of abs.fmt for micromips
These instructions were previously marked as codegen only preventing
them from being assembled as microMIPS or disassembled.

Reviewers: atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D39123

llvm-svn: 316656
2017-10-26 11:36:54 +00:00
Simon Dardis 13452383cd [mips] Fix PR35071
PR35071 exposed the fact that MipsInstrInfo::removeBranch did not walk past
debug instructions when removing branches for the control flow optimizer, which
lead to duplicated conditional branches. If the target of the branch was a
removable block, only the conditional branch in the terminating position would
have it's MBB operands updated, leaving the first branch with a dangling MBB
operand. The MIPS long branch pass would then trigger an assertion when
attempting to examine the instruction with dangling MBB operand.

This resolves PR35071.

Thanks to Alex Richardson for reporting the issue!

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D39288

llvm-svn: 316654
2017-10-26 10:58:36 +00:00
Hiroshi Inoue b72b1fb0de [PowerPC] Use record-form instruction for Less-or-Equal -1 and Greater-or-Equal 1
Currently a record-form instruction is used for comparison of "greater than -1" and "less than 1" by modifying the predicate (e.g. LT 1 into LE 0) in addition to the naive case of comparison against 0.
This patch also enables emitting a record-form instruction for "less than or equal to -1" (i.e. "less than 0") and "greater than or equal to 1" (i.e. "greater than 0") to increase the optimization opportunities.

Differential Revision: https://reviews.llvm.org/D38941

llvm-svn: 316647
2017-10-26 09:01:51 +00:00
Craig Topper 0551556ed2 [AsmParser][TableGen] Add VariantID argument to the generated mnemonic spell check function so it can use the correct table based on variant.
I'm considering implementing the mnemonic spell checker for x86, and that would require the separate intel and att variants.

llvm-svn: 316641
2017-10-26 06:46:41 +00:00
Craig Topper 2a06028c0a [AsmParser][TableGen] Make the generated mnemonic spell checker function a file local static function.
Also only emit in targets that specificially request it. This is required so we don't get an unused static function error.

llvm-svn: 316640
2017-10-26 06:46:40 +00:00
Craig Topper 619b15283d [X86] Use correct type for return value of ComputeAvailableFeatures in the AsmParser. NFC
There aren't enough used bits to make this a functional change, but we should fix it for consistency.

llvm-svn: 316639
2017-10-26 06:46:38 +00:00
David Blaikie cc7763ba92 Hexagon: Fold a single-use textual header into its use
llvm-svn: 316604
2017-10-25 19:52:21 +00:00
Krzysztof Parzyszek 27056da9a8 [Hexagon] Account for negative offset when limiting max deviation
In getOffsetRange, Max can be set to 0 to force the extender replacement
to be at or below the original value. This would cause the new offset to
be non-negative, which is preferred for memory instructions (to reduce
the likelihood of it getting constant-extended due to predication). The
problem happens when the range is shifted by an offset (present in the
instruction being examined) and the offset is negative. The entire range
for the allowable deviation will then be strictly negative. This creates
a problem, since 0 is assumed to be a valid deviation.

llvm-svn: 316601
2017-10-25 18:46:40 +00:00
Craig Topper 6fae2eedf3 [X86] Add avx512vpopcntdq to Knights Mill
As indicated by Table 1-1 in Intel Architecture Instruction Set Extensions and Future Features Programming Reference from October 2017.

llvm-svn: 316592
2017-10-25 17:10:32 +00:00
Simon Dardis 7af3edc4f4 [mips] Clean up some whitespace (NFC).
Also test that my email address was updated.

llvm-svn: 316575
2017-10-25 13:35:53 +00:00
Diana Picus b35022121d [ARM GlobalISel] Fix call opcodes
We were generating BLX for all the calls, which was incorrect in most
cases. Update ARMCallLowering to generate BL for direct calls, and BLX,
BX_CALL or BMOVPCRX_CALL for indirect calls.

llvm-svn: 316570
2017-10-25 11:42:40 +00:00
Sam Parker 1f742117bd [ARM] OrCombineToBFI function
Extract the functionality to combine OR to BFI into its own function.

Differential Revision: https://reviews.llvm.org/D39001

llvm-svn: 316563
2017-10-25 08:37:33 +00:00
Sam Parker ccb209bb97 [ARM] Swap cmp operands for automatic shifts
Swap the compare operands if the lhs is a shift and the rhs isn't,
as in arm and T2 the shift can be performed by the compare for its
second operand.

Differential Revision: https://reviews.llvm.org/D39004

llvm-svn: 316562
2017-10-25 08:33:06 +00:00
Martin Storsjo 373c8efa1e [AArch64] Add support for dllimport of values and functions
Previously, the dllimport attribute did the right thing in terms
of treating it as a pointer to a value, but this makes sure the
names get mangled properly, and calls to such functions load the
function from the __imp_ pointer.

This is based on SVN r212431 and r212430 where the same was
implemented for ARM.

Differential Revision: https://reviews.llvm.org/D38530

llvm-svn: 316555
2017-10-25 07:25:18 +00:00
Matt Arsenault 28f52e51f1 AMDGPU: Add max-mix-insts subtarget feature
llvm-svn: 316553
2017-10-25 07:00:51 +00:00
Yonghong Song 9af998e86e bpf: fix an uninitialized variable issue
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 316519
2017-10-24 21:36:33 +00:00
David Blaikie c70b392e49 ARMAddressingModes.h: Don't mark header functions as file local
llvm-svn: 316517
2017-10-24 21:29:21 +00:00
David Blaikie 4016da602e HexagonDepTimingClasses.h: Don't mark header functions as file local
llvm-svn: 316508
2017-10-24 21:29:16 +00:00
David Blaikie 75bda3006b WebassemblyAsmPrinter.h: Include WebAssemblyMachineFunctionInfo for use with MachineFunction::getInfo
llvm-svn: 316507
2017-10-24 21:29:15 +00:00
David Blaikie 1032b51aa0 X86Operand.h: Include X86MCTargetDesc.h for SSE register enum/names
llvm-svn: 316506
2017-10-24 21:29:15 +00:00
David Blaikie 6a2b124248 X86AsmPrinter.h: Add missing header for complete type needed for MCCodeEmitter dtor.
llvm-svn: 316505
2017-10-24 21:29:14 +00:00
Artem Belevich cb8f6328dc [NVPTX] allow address space inference for volatile loads/stores.
If particular target supports volatile memory access operations, we can
avoid AS casting to generic AS. Currently it's only enabled in NVPTX for
loads and stores that access global & shared AS.

Differential Revision: https://reviews.llvm.org/D39026

llvm-svn: 316495
2017-10-24 20:31:44 +00:00
Gadi Haber 323f2e1715 [X86][Broadwell] Added the instruction scheduling information for the Broadwell CPU.
Adding the scheduling information for the Browadwell (BDW) CPU target.

This patch adds the instruction scheduling information for the Broadwell (BDW) architecture target by adding the file X86SchedBroadwell.td located under the X86 Target.
We used the scheduling information retrieved from the Broadwell architects in order to create the file.
The scheduling information includes latency, number of micro-Ops and used ports by each BDW instruction.

The patch continues the scheduling replacement and insertion effort started with the SandyBridge (SNB) target in r310792, the Haswell (HSW) target in r311879, the SkylakeClient (SKL) target in rL313613 + rL315978 and the SkylakeServer (SKX) in rL315175.

Performance fluctuations may be expected due to code alignment effects.

Reviewers: zvi, RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D39054

Change-Id: If6f799e5ff60e1091c8d43b05ea78c53581bae01
llvm-svn: 316492
2017-10-24 20:19:47 +00:00
Yonghong Song ee68d8e41f bpf: fix a bug in trunc-op optimization
Previous implementation for per-function scope
is incorrect and too conservative.

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 316481
2017-10-24 18:21:10 +00:00
Stefan Pintilie 8f0c783095 [PowerPC] Try to simplify a Swap if it feeds a Splat
If we have the situation where a Swap feeds a Splat we can sometimes change the
  index on the Splat and then remove the Swap instruction.

Fixed the test case that was failing and recommit after pulling the original
  commit.

  Original revision is here: https://reviews.llvm.org/D39009

llvm-svn: 316478
2017-10-24 17:44:27 +00:00
Yonghong Song 0f836d5dc5 bpf: fix a bug in bpf-isel trunc-op optimization
In BPF backend, we try to optimize away redundant
trunc operations so that kernel verifier rewrite
remains valid. Previous implementation only works
for a single function.

This patch fixed the issue for multiple functions.
It clears internal map data structure before
performing optimization for each function.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 316469
2017-10-24 17:29:03 +00:00
Simon Pilgrim 5e8c3f328f [X86][AVX] ComputeNumSignBitsForTargetNode - add support for X86ISD::VTRUNC
llvm-svn: 316462
2017-10-24 17:04:57 +00:00
Saleem Abdulrasool fb490a0bcc PowerPC: support the separator character in the IAS
PowerPC uses ; as a comment leader and the @ as a separator character.
Support this properly.

llvm-svn: 316454
2017-10-24 16:19:56 +00:00
Simon Pilgrim 0a12c239b6 [X86] truncateVectorCompareWithPACKSS - use PACKSSDW/PACKSSWB instead of just PACKSSWB.
By using the widest type possible for PACKSS truncation we have a better chance of being able to peek through bitcasts and improves other combines driven by ComputeNumSignBits.

llvm-svn: 316448
2017-10-24 15:38:16 +00:00
Oliver Stannard 03ded27bbc [ARM] Error for invalid shift in memory operand
Report a diagnostic when we fail to parse a shift in a memory operand because
the shift type is not an identifier. Without this, we were silently ignoring
the whole instruction.

Differential revision: https://reviews.llvm.org/D39237

llvm-svn: 316441
2017-10-24 14:19:08 +00:00
Simon Pilgrim c36dd6ae9c [X86] truncateVectorCompareWithPACKSS - remove duplicate variables. NFCI.
llvm-svn: 316440
2017-10-24 14:18:32 +00:00
Andrew V. Tischenko f4fbe4a51b Update f16c instruction scheduling on btver2.
Differential Revision: https://reviews.llvm.org/D39051

llvm-svn: 316435
2017-10-24 13:38:30 +00:00
Zvi Rackover bf31bf78e7 X86CallFrameOptimization: Update comments and variable names. NFCI.
Following up on D38738.

llvm-svn: 316434
2017-10-24 13:24:26 +00:00
Zvi Rackover 31b101a186 X86CallFrameOptimization: Recognize 'store 0/-1 using and/or' idioms
Summary:
r264440 added or/and patterns for storing -1 or 0 with the intention of decreasing code size. However,
X86CallFrameOptimization does not recognize these memory accesses so it will not replace them with push's when profitable.

This patch fixes this problem by teaching X86CallFrameOptimization these store 0/-1 idioms.

An alternative fix would be to prevent the 'store 0/1 idioms' patterns from firing when accessing the stack. This would save
the need to teach the pass about these idioms. However, because X86CallFrameOptimization does not always fire we may result
in cases where neither X86CallFrameOptimization not the patterns for 'store 0/1 idioms' fire.

Fixes pr34863

Reviewers: DavidKreitzer, guyblank, aymanmus

Reviewed By: aymanmus

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38738

llvm-svn: 316431
2017-10-24 12:13:05 +00:00
Marek Olsak ce76ea0394 AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1)
Summary:
Kill the thread if operand 0 == false.
llvm.amdgcn.wqm.vote can be applied to the operand.

Also allow kill in all shader stages.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D38544

llvm-svn: 316427
2017-10-24 10:27:13 +00:00
Marek Olsak 2114fc3bcb AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D38543

llvm-svn: 316426
2017-10-24 10:26:59 +00:00
Oliver Stannard ce256a3a01 [ARM] Replace development diagnostics with normal DEBUG macro
* Remove the -arm-asm-parser-dev-diags option.
* Use normal DEBUG(dbgs()) printing for the extra development information about
  missing diagnostics.

Differential Revision: https://reviews.llvm.org/D39194

llvm-svn: 316423
2017-10-24 09:46:56 +00:00
Oliver Stannard 6d5a5b98ab [ARM] tSETEND needs IsThumb
This is the Thumb encoding, so the Requires list must include IsThumb.

No test because we happen to select the ARM one first, but that's just luck.

Differential Revision: https://reviews.llvm.org/D39190

llvm-svn: 316421
2017-10-24 09:03:33 +00:00
Oliver Stannard c507b370a1 [ARM] Remove tCPS alias which just crashed
This alias caused a crash when trying to print the "cps #0" instruction in a
diagnostic for thumbv6 (which doesn't have that instruction).
	    
The comment was incorrect, this instruction is UNPREDICTABLE if no flag bits
are set, so I don't think it's worth keeping.

Differential Revision: https://reviews.llvm.org/D39191

llvm-svn: 316420
2017-10-24 08:55:36 +00:00
Zvi Rackover 3c0d385598 X86: Fix X86CallFrameOptimization to search for the COPY StackPointer
SelectionDAG inserts a copy of ESP into a virtual register.
X86CallFrameOptimization assumed that the COPY, if present, is always
right after the call-frame setup instruction (ADJCALLSTACKDOWN). This was a
wrong assumption as the COPY can be located anywhere between the call-frame setup
instruction and its first use. If the COPY happened to be located in a different
location than what X86CallFrameOptimization assumed, visiting it while
processing the call chain would lead to a conservative bail-out.

The fix is quite straightfoward, scan ahead for the stack-pointer copy and make note
of it so it can be ignored while processing the call chain.

Fixes pr34903

Differential Revision: https://reviews.llvm.org/D38730

llvm-svn: 316416
2017-10-24 07:38:29 +00:00
Omer Paparo Bivas 2251c79aba [MC] Adding code padding for performance stability - infrastructure. NFC.
Infrastructure designed for padding code with nop instructions in key places such that preformance improvement will be achieved.
The infrastructure is implemented such that the padding is done in the Assembler after the layout is done and all IPs and alignments are known.
This patch by itself in a NFC. Future patches will make use of this infrastructure to implement required policies for code padding.

Reviewers:
aaboud
zvi
craig.topper
gadi.haber

Differential revision: https://reviews.llvm.org/D34393

Change-Id: I92110d0c0a757080a8405636914a93ef6f8ad00e
llvm-svn: 316413
2017-10-24 06:16:03 +00:00
Zvi Rackover c6d0b6c103 X86: Register the X86CallFrameOptimization pass
Summary:
The motivation of this change is to enable .mir testing for this pass.
Added one test case to cover the functionality, this same case will be improved by
a future patch.

Reviewers: igorb, guyblank, DavidKreitzer

Reviewed By: guyblank, DavidKreitzer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38729

llvm-svn: 316412
2017-10-24 05:47:07 +00:00
Konstantin Zhuravlyov 339e74440a AMDGPU: Initialize WavefrontSize from TD files
Differential Revision: https://reviews.llvm.org/D39205

llvm-svn: 316389
2017-10-23 23:02:39 +00:00
Simon Pilgrim 321e54f72d [X86][SSE] combineBitcastvxi1 - use PACKSSWB directly to pack v8i16 to v16i8
Avoid difficulties determining the number of sign bits later on in shuffle lowering to lower to PACKSS

llvm-svn: 316383
2017-10-23 22:05:02 +00:00
Stefan Pintilie 52bbd587ac Revert "[PowerPC] Try to simplify a Swap if it feeds a Splat"
Revert commit r316366.
Previous commit causes p8-scalar_vector_conversions.ll to fail.

This reverts commit 990e764ad8a2eec206ce5dda6aefab059ccd4e92.

llvm-svn: 316371
2017-10-23 20:22:23 +00:00
Krzysztof Parzyszek 6f06b6edff [Hexagon] Return the correct chain edge for i1 function calls
In HexagonISelLowering, there is code to handle the case when
a function returns an i1 type. In this case, we need to generate
extra nodes to copy the result from R0 to a predicate register.

The code was returning the wrong value for the chain edge which
caused an assert "Wrong topological sorting" when converting the
instructions to MIs.

This patch fixes the problem by returning the chain for the final
copy.

Patch by Brendon Cahoon.

llvm-svn: 316367
2017-10-23 19:35:25 +00:00
Stefan Pintilie feafa1d7f0 [PowerPC] Try to simplify a Swap if it feeds a Splat
If we have the situation where a Swap feeds a Splat we can sometimes change the
index on the Splat and then remove the Swap instruction.

Differential Revision: https://reviews.llvm.org/D39009

llvm-svn: 316366
2017-10-23 19:33:31 +00:00
Krzysztof Parzyszek 273678823b [Hexagon] Add extra pattern for S4_addaddi
One combination was missing: add(add(x,y),c).

llvm-svn: 316363
2017-10-23 19:07:50 +00:00
Daniel Sanders d66e0901ae [globalisel][tablegen] Import stores and allow GISel to automatically substitute zero regs like WZR/XZR/$zero.
This patch enables the import of stores. Unfortunately, doing so by itself,
loses an optimization where storing 0 to memory makes use of WZR/XZR.

To mitigate this, this patch also introduces a new feature that allows register
operands to nominate a zero register. When this is done, GlobalISel will
substitute (G_CONSTANT 0) with the nominated register automatically. This
is currently configured to only apply to the stores.

Applying it to GPR32/GPR64 register classes in general will be done after
review see (https://reviews.llvm.org/D39150).

llvm-svn: 316360
2017-10-23 18:19:24 +00:00
Matt Arsenault a030e2688f AMDGPU: Cleanup local atomic node names
llvm-svn: 316349
2017-10-23 17:16:43 +00:00
Matt Arsenault b791802aef AMDGPU: Fix default range in non-kernel functions
The range should be assumed to be the hardware maximum
if a workitem intrinsic is used in a callable function
which does not know the restricted limit of the calling
kernel.

llvm-svn: 316346
2017-10-23 17:09:35 +00:00
Craig Topper 8d5a246ebe [X86] Change VMPTRST to use PS instead of TB to match VMPTRLD.
llvm-svn: 316340
2017-10-23 16:22:40 +00:00
Craig Topper 1db2f0828e [X86] Change RDRAND to use PS instead of TB.
Should be no functional change for now. A future disassembler change will prevent disassembling with 0xf2/0xf3.

llvm-svn: 316339
2017-10-23 16:22:38 +00:00
Craig Topper 4d93adfed5 [X86] Change XRSTOR to use PS instead of TB to match XSAVE.
I don't think this changes anything functionally yet, but I plan to fix the disassembler to use this to disable matching certain instructions with 0xf3/0xf2/0x66 prefixes.

llvm-svn: 316337
2017-10-23 16:11:33 +00:00
Simon Pilgrim 1dcb913be6 [X86][SSE] Remove AssertZext stage from PEXTRW/PEXTRB lowering. NFCI.
Remove AssertZext and instead add PEXTRW/PEXTRB support to computeKnownBitsForTargetNode to simplify instruction selection.

Differential Revision: https://reviews.llvm.org/D39169

llvm-svn: 316336
2017-10-23 16:00:57 +00:00
Andrew V. Tischenko 777308b548 Update DPPD/DPPS instruction scheduling on btver2.
Differential Revision: https://reviews.llvm.org/D39046

llvm-svn: 316334
2017-10-23 15:53:30 +00:00
Craig Topper 8f182fdd8b [X86] Add PTWRITE instruction for assembler and disassembler.
llvm-svn: 316333
2017-10-23 15:53:21 +00:00
Craig Topper 5f0339d2f3 [X86] Add RDPID instruction for assembler and disassembler.
llvm-svn: 316332
2017-10-23 15:53:16 +00:00
Andrew V. Tischenko eff4fc0d41 Fix for Bug 30718 - Failure to disassemble certain MOV with rex.R. The issue was in illegal segment register index.
Differential Revision: https://reviews.llvm.org/D38786

llvm-svn: 316319
2017-10-23 09:36:33 +00:00
Haojian Wu 1afddd4136 Fix a -Wpedantic warning.
llvm-svn: 316315
2017-10-23 09:02:59 +00:00
Sam Parker 487ab86942 [ARM] Allow unrolling of multi-block loops.
Before, loop unrolling was only enabled for loops with a single
block. This restriction has been removed and replaced by:
- allow a maximum of two exiting blocks,
- a four basic block limit for cores with a branch predictor.

Differential Revision: https://reviews.llvm.org/D38952

llvm-svn: 316313
2017-10-23 08:05:14 +00:00
Craig Topper 326008c615 [X86] Fix disassembly of EVEX rounding control and SAE instructions.
Fixes PR31955.

llvm-svn: 316308
2017-10-23 02:26:24 +00:00
Benjamin Kramer a7c822a238 [X86] Add missing override. NFC.
llvm-svn: 316299
2017-10-22 19:16:31 +00:00
Simon Pilgrim ce55eab936 Strip trailing whitespace. NFCI.
llvm-svn: 316296
2017-10-22 18:38:57 +00:00
Marina Yatsina f9371d821f Add logic to greedy reg alloc to avoid bad eviction chains
This fixes bugzilla 26810
https://bugs.llvm.org/show_bug.cgi?id=26810

This is intended to prevent sequences like:
movl %ebp, 8(%esp) # 4-byte Spill
movl %ecx, %ebp
movl %ebx, %ecx
movl %edi, %ebx
movl %edx, %edi
cltd
idivl %esi
movl %edi, %edx
movl %ebx, %edi
movl %ecx, %ebx
movl %ebp, %ecx
movl 16(%esp), %ebp # 4 - byte Reload

Such sequences are created in 2 scenarios:

Scenario #1:
vreg0 is evicted from physreg0 by vreg1
Evictee vreg0 is intended for region splitting with split candidate physreg0 (the reg vreg0 was evicted from)
Region splitting creates a local interval because of interference with the evictor vreg1 (normally region spliiting creates 2 interval, the "by reg" and "by stack" intervals. Local interval created when interference occurs.)
one of the split intervals ends up evicting vreg2 from physreg1
Evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills

Scenario #2
vreg0 is evicted from physreg0 by vreg1
vreg2 is evicted from physreg2 by vreg3 etc
Evictee vreg0 is intended for region splitting with split candidate physreg1
Region splitting creates a local interval because of interference with the evictor vreg1
one of the split intervals ends up evicting back original evictor vreg1 from physreg0 (the reg vreg0 was evicted from)
Another evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills

As compile time was a concern, I've added a flag to control weather we do cost calculations for local intervals we expect to be created (it's on by default for X86 target, off for the rest).

Differential Revision: https://reviews.llvm.org/D35816

Change-Id: Id9411ff7bbb845463d289ba2ae97737a1ee7cc39
llvm-svn: 316295
2017-10-22 17:59:38 +00:00
Momchil Velikov d6a4ab3d49 [ARM] Dynamic stack alignment for 16-bit Thumb
This patch implements dynamic stack (re-)alignment for 16-bit Thumb. When
targeting processors, which support only the 16-bit Thumb instruction set
the compiler ignores the alignment attributes of automatic variables and may
silently generate incorrect code.

Differential revision: https://reviews.llvm.org/D38143

llvm-svn: 316289
2017-10-22 11:56:35 +00:00
Guy Blank 92d5ce3bd4 [X86] Add a pass to convert instruction chains between domains.
The pass scans the function to find instruction chains that define
registers in the same domain (closures).
It then calculates the cost of converting the closure to another domain.
If found profitable, the instructions are converted to instructions in
the other domain and the register classes are changed accordingly.

This commit adds the pass infrastructure and a simple conversion from
the GPR domain to the Mask domain.

Differential Revision:
https://reviews.llvm.org/D37251

Change-Id: Ic2cf1d76598110401168326d411128ae2580a604
llvm-svn: 316288
2017-10-22 11:43:08 +00:00
Craig Topper a33846aca6 [X86] Add VEX_WIG to applicable AVX512 instructions.
This should be NFC. Will be used in future patches to fix disassembler bugs.

llvm-svn: 316284
2017-10-22 06:18:23 +00:00
Craig Topper 1bcb0d8a7f [X86] Add VEX_WIG to VROUNDSSrr/VROUNDSSrm/VROUNDSDrr/VROUNDSDrm
llvm-svn: 316283
2017-10-22 06:18:20 +00:00
Craig Topper 158bc6474a [X86] Don't allow gather/scatter to disassembler if memory operand does not use a SIB byte.
Fixes PR34998.

llvm-svn: 316282
2017-10-22 04:32:30 +00:00
Simon Pilgrim ab6dbe2b29 Strip trailing whitespace. NFCI.
llvm-svn: 316277
2017-10-21 20:40:49 +00:00
Aaron Ballman fc02869c96 Reverting r316270 due to failing build bots.
http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules-2/builds/12899
http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/7951

llvm-svn: 316276
2017-10-21 20:38:15 +00:00
Simon Pilgrim 3cb024490a [X86][SSE] Add extractps/pextrd equivalence to domain tables
Differential Revision: https://reviews.llvm.org/D39135

llvm-svn: 316274
2017-10-21 20:19:48 +00:00
Craig Topper ca2382d809 [X86] Fix disassembling of EVEX instructions to stop accidentally decoding the SIB index register as an XMM/YMM/ZMM register.
This introduces a new operand type to encode the whether the index register should be XMM/YMM/ZMM. And new code to fixup the results created by readSIB.

This has the nice effect of removing a bunch of code that hard coded the name of every GATHER and SCATTER instruction to map the index type.

This fixes PR32807.

llvm-svn: 316273
2017-10-21 20:03:20 +00:00
Simon Pilgrim cb028c7321 Fix MSVC 'result of 32-bit shift implicitly converted to 64 bits' warning. NFCI.
llvm-svn: 316271
2017-10-21 17:23:04 +00:00
Fangrui Song c7b749bd06 [PPC CodeGen] Fix the bitreverse.i64 intrinsic.
Summary: The two 32-bit words were swapped.

Subscribers: nemanjai, kbarton

Differential Revision: https://reviews.llvm.org/D38705

llvm-svn: 316270
2017-10-21 16:59:40 +00:00
Craig Topper fcf27188d7 [X86] Do not generate __multi3 for mul i128 on X86
Summary: __multi3 is not available on x86 (32-bit). Setting lib call name for MULI_128 to nullptr forces DAGTypeLegalizer::ExpandIntRes_MUL to generate instructions for 128-bit multiply instead of a call to an undefined function.  This fixes PR20871 though it may be worth looking at why licm and indvars combine to generate 65-bit multiplies in that test.

Patch by Riyaz V Puthiyapurayil

Reviewers: craig.topper, schweitz

Reviewed By: craig.topper, schweitz

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D38668

llvm-svn: 316254
2017-10-21 02:26:00 +00:00
Krzysztof Parzyszek 9d19c8cac9 [Packetizer] Add function to check for aliasing between instructions
llvm-svn: 316243
2017-10-20 22:08:40 +00:00
Sam Clegg 12fd3da9d1 [WebAssembly] MC: Fix crash when -g specified.
At this point we don't output any debug sections or thier
relocations.

Differential Revision: https://reviews.llvm.org/D39076

llvm-svn: 316240
2017-10-20 21:28:38 +00:00
Daniel Sanders 1e4569fdc1 [globalisel][tablegen] Fix small spelling nits. NFC
ComplexRendererFn -> ComplexRendererFns
Corrected a couple lingering references to tied operands that were missed.

llvm-svn: 316237
2017-10-20 20:55:29 +00:00
Krzysztof Parzyszek 022922b31a [Hexagon] Report error instead of crashing on wrong inline-asm constraints
llvm-svn: 316236
2017-10-20 20:24:44 +00:00
Krzysztof Parzyszek 64e5d7d3ae [Hexagon] Reorganize and update instruction patterns
llvm-svn: 316228
2017-10-20 19:33:12 +00:00
Simon Pilgrim 29b32472b4 [X86][SSE] getTargetShuffleMask - check shuffle input value types. NFCI.
To help identify shuffle combine issues

llvm-svn: 316222
2017-10-20 18:07:50 +00:00
Dave Lee f9b72327b0 Make x86 __ehhandler comdat if parent function is
Summary:
This change comes from using lld for i686-windows-msvc. Before this change, lld
emits an error of:

    error: relocation against symbol in discarded section: .xdata

It's possible that this could be addressed in lld, but I think this change is
reasonable on its own.

At a high level, this is being generated:

    A (.text comdat) -> B (.text) -> C (.xdata comdat)

Where A is a C++ inline function, which references B, an exception handler
thunk, which references C, the exception handling info.

With this structure, lld will error when applying relocations to B if the C it
references has been discarded (some other C has been selected).

This change checks if A is comdat, and if so places the exception registration
thunk (B) in the comdata group of A (and B).

It appears that MSVC makes the __ehhandler function comdat.

Is it possible that duplicate thunks are being emitted into the final binary
with other linkers, or are they stripping the unused thunks?

Reviewers: rnk, majnemer, compnerd, smeenai

Reviewed By: rnk, compnerd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38940

llvm-svn: 316219
2017-10-20 17:04:43 +00:00
Krzysztof Parzyszek 3818aeaeb9 [Hexagon] Allow redefinition with immediates for hw loop conversion
Normally, if the registers holding the induction variable's bounds
are redefined inside of the loop's body, the loop cannot be converted
to a hardware loop. However, if the redefining instruction is actually
loading an immediate value into the register, this conversion is both
possible and legal (since the immediate itself will be used in the
loop setup in the preheader).

llvm-svn: 316218
2017-10-20 16:56:33 +00:00
Aleksandar Beserminji 143572984d Revert "[mips] Reordering callseq* nodes to be linear"
This reverts commit r314507, because the original patch is causing test
failures.

llvm-svn: 316215
2017-10-20 14:35:41 +00:00
Eugene Leviant 27b226fb65 [ARM] Use post-RA MI scheduler when +use-misched is set
Differential revision: https://reviews.llvm.org/D39100

llvm-svn: 316214
2017-10-20 14:29:17 +00:00
Nemanja Ivanovic 0026c06e11 Disabling the transformation introduced in r315888
The commit at https://reviews.llvm.org/rL315888 is causing some failures
with internal testing. Disabling this code until we can resolve the issues.

llvm-svn: 316199
2017-10-20 00:36:46 +00:00
Alex Bradbury c6c4e8bd5a [RISCV] Add missing hunk from r316188
r316188 didn't set guessInstructionProperties=1 as it should have done.

llvm-svn: 316189
2017-10-19 21:43:29 +00:00
Alex Bradbury 8971842f43 [RISCV] Initial codegen support for ALU operations
This adds the minimum necessary to support codegen for simple ALU operations
on RV32. Prolog and epilog insertion, support for memory operations etc etc 
follow in future patches.

Leave guessInstructionProperties=1 until https://reviews.llvm.org/D37065 is 
reviewed and lands.

Differential Revision: https://reviews.llvm.org/D29933

llvm-svn: 316188
2017-10-19 21:37:38 +00:00
Craig Topper 7bce79a539 [X86] Remove LowerEXTRACT_SUBVECTOR handler. All EXTRACT_SUBVECTORs are marked as legal.
llvm-svn: 316182
2017-10-19 20:59:40 +00:00
Graham Yiu 488782efa3 The cost of splitting a large vector instruction is not being taken into account by the getUserCost function. This was leading to some loops being over unrolled. The cost of a vector instruction is now being multiplied by the cost of the type legalization. This will return a more accurate cost.
Committing on behalf on Brad Nemanich (brad.nemanich@ibm.com)

Differential Revision: https://reviews.llvm.org/D38961

llvm-svn: 316174
2017-10-19 18:16:31 +00:00
Krzysztof Parzyszek e4d0e199bf [Hexagon] Fix store conversion from rr to io in optimize addressing modes
llvm-svn: 316170
2017-10-19 16:59:22 +00:00
Alex Bradbury 3c941e7ed9 [RISCV] RISCVAsmParser: early exit if RISCVOperand isn't immediate as expected
This is necessary to avoid an assertion in the included test case and similar 
assembler inputs.

llvm-svn: 316168
2017-10-19 16:22:51 +00:00
Alex Bradbury baa54d4ac8 [RISCV][NFC] Drop unused parameter from createImm helper in RISCVAsmParser
llvm-svn: 316167
2017-10-19 16:09:20 +00:00
Simon Pilgrim fdd63d1535 [X86] Replace custom scalar integer absolute matching with ISD::ABS lowering.
x86 has its own copy of integer absolute pattern matching to combine directly to a SUB+CMOV.

This patch removes the x86 combine and adds custom lowering support for ISD::ABS instead, allowing us to use the DAGCombiner version.

Additional test cases are already covered by iabs.ll (rL315706 and rL315711).

Differential Revision: https://reviews.llvm.org/D38895

llvm-svn: 316162
2017-10-19 15:02:24 +00:00
Alex Bradbury ee7c7ecd03 [RISCV] Prepare for the use of variable-sized register classes
While parameterising by XLen, also take the opportunity to clean up the 
formatting of the RISCV .td files.

This commit unifies the in-tree code with my patchset at 
<https://github.com/lowrisc/riscv-llvm>.

llvm-svn: 316159
2017-10-19 14:29:03 +00:00
Sumanth Gundapaneni e1983bcf55 [Hexagon] New HVX target features.
This patch lets the llvm tools handle the new HVX target features that
are added by frontend (clang). The target-features are of the form
"hvx-length64b" for 64 Byte HVX mode, "hvx-length128b" for 128 Byte mode HVX.
"hvx-double" is an alias to "hvx-length128b" and is soon will be deprecated.
The hvx version target feature is upgated form "+hvx" to "+hvxv{version_number}.
Eg: "+hvxv62"

For the correct HVX code generation, the user must use the following
target features.
For 64B mode: "+hvxv62" "+hvx-length64b"
For 128B mode: "+hvxv62" "+hvx-length128b"

Clang picks a default length if none is specified. If for some reason,
no hvx-length is specified to llvm, the compilation will bail out.
There is a corresponding clang patch.

Differential Revision: https://reviews.llvm.org/D38851

llvm-svn: 316101
2017-10-18 18:07:07 +00:00
Sumanth Gundapaneni 9d954c4169 [Hexagon] Update Hexagon ArchEnum and sync some downstream changes(NFC)
Differential Revision: https://reviews.llvm.org/D38850

llvm-svn: 316099
2017-10-18 17:45:22 +00:00
Krzysztof Parzyszek 8c53c95137 [Hexagon] Mark vector loads as predicable, update instruction mappings
All loads of form V6_vL32b_{,cur,nt,tmp,nt_cur,nt_tmp}_{ai,pi,ppu} are
predicable on v62 (but not on v60). Mark them all as predicable in the
instruction definitions, and handle the v60 case in HII::isPredicable.

llvm-svn: 316098
2017-10-18 17:36:46 +00:00
Konstantin Zhuravlyov 8d5e9e110c AMDGPU: Rename MaxFlatWorkgroupSize to MaxFlatWorkGroupSize for consistency
Differential Revision: https://reviews.llvm.org/D38957

llvm-svn: 316097
2017-10-18 17:31:09 +00:00
Alex Bradbury 13ce95b77f [RISCV] Bugfix createRISCVELFObjectWriter
r315275 set the IsLittleEndian parameter incorrectly. This patch corrects 
this, and adds a test to ensure such mistakes will be caught in the future.

llvm-svn: 316091
2017-10-18 16:11:31 +00:00
Andre Vieira d4a25707f0 [ARM] Fix disassembly for conditional VMRS and VMSR instructions in ARM mode
Differential Revision: https://reviews.llvm.org/D38347

llvm-svn: 316085
2017-10-18 14:47:37 +00:00
Simon Dardis 03c2c65b2d [mips] Fix analyzeBranch to handle debug data
In the case where there was a conditional branch followed by a unconditional
branch with debug instruction separating them, MipsInstrInfo::analyzeBranch
would not skip past debug instruction when searching for the second branch
which give erroneous results about the control flow of the block.

This could lead to the branch folder to merge the non-fall through case
into it's predecessor, leaving the conditional branch with a dangling
basic block operand.

This resolves PR34975.

Thanks to Alexander Richardson for reporting the issue!

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D39003

llvm-svn: 316084
2017-10-18 14:35:29 +00:00
NAKAMURA Takumi 6f43bd4bde Untabify.
llvm-svn: 316079
2017-10-18 13:31:28 +00:00
Dylan McKay bebde41ec5 [AVR] Update to current LLVM API
r315410 broke a number of things in the AVR backend, which are now
fixed.

llvm-svn: 316076
2017-10-18 12:35:15 +00:00
Michael Zuckerman 49293264cc [AVX512][AVX2]Cost calculation for interleave load/store patterns {v8i8,v16i8,v32i8,v64i8}
This patch adds accurate instructions cost.
The formula presents two cases(stride 3 and stride 4) and calculates the cost according to the VF and stride.

Reviewers:
1. delena
2. Farhana
3. zvi
4. dorit
5. Ayal

Differential Revision: https://reviews.llvm.org/D38762

Change-Id: If4cfbd4ac0e63694e8144cb78c7fa34850647ff7
llvm-svn: 316072
2017-10-18 11:41:55 +00:00
Hiroshi Inoue 5388e66d3a [PowerPC] Use helper functions to check sign-/zero-extended value
Helper functions to identify sign- and zero-extending machine instruction is introduced in rL315888.
This patch makes PPCInstrInfo::optimizeCompareInstr use the helper functions. It simplifies the code and also makes possible more optimizations since the helper can do more analysis than the original check code; I observed about 5000 more compare instructions are eliminated while building LLVM.

Also, this patch fixes a bug in helpers on ANDIo instruction handling due to the order of checks. This bug causes a failure in an existing test case for optimizeCompareInstr.

Differential Revision: https://reviews.llvm.org/D38988

llvm-svn: 316071
2017-10-18 10:31:19 +00:00
Michael Zuckerman 72a6f893cb Fixing bug issue https://bugs.llvm.org/show_bug.cgi?id=34978
Change-Id: I7f13d5bcb181be2860377df7b40e1579a8ad4add
llvm-svn: 316067
2017-10-18 08:04:31 +00:00
Daniel Sanders 30247fd1d9 [aarch64][globalisel] Register banks and classes should have distinct names.
Otherwise they are ambiguous in MIR.

llvm-svn: 316047
2017-10-18 00:12:43 +00:00
Wei Ding 7ab1f7a421 AMDGPU : Fix an error for the llvm.cttz implementation.
Differential Revision: http://reviews.llvm.org/D39014

llvm-svn: 316037
2017-10-17 21:49:52 +00:00
Matthias Braun a2f96b5bde AArch64: Enable AES instruction fusion on Cyclone.
Note that cyclone itself doesn't fuse, but newer apple chips do and we
are using cyclone as the default when targeting apple OSes.

The current code also does not capture all fusion patterns of apple CPUs
yet; I am still looking for ways to refactor the code nicely to extend
it.

llvm-svn: 316036
2017-10-17 21:46:15 +00:00
Tim Northover 350a87eaf1 AArch64: account for possible frame index operand in compares.
If the address of a local is used in a comparison, AArch64 can fold the
address-calculation into the comparison via "adds". Unfortunately, a couple of
places (both hit in this one test) are not ready to deal with that yet and just
assume the first source operand is a register.

llvm-svn: 316035
2017-10-17 21:43:52 +00:00
Eugene Zelenko 6cadde7f40 [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 316034
2017-10-17 21:27:42 +00:00
Konstantin Zhuravlyov 7dabe9ced7 AMDGPU: Start generating metadata for MaxFlatWorkGroupSize
Differential Revision: https://reviews.llvm.org/D38958

llvm-svn: 316024
2017-10-17 20:03:21 +00:00
Yichao Yu a46eb8e649 Fix `FaultMaps` crash when the out streamer is reused
Summary:
Make sure the map is cleared before processing a new module. Similar to what is done on `StackMaps`.

This issue is similar to D38588, though this time for FaultMaps (on x86) rather than ARM/AArch64. Other than possible mixing of information between modules, the crash is caused by the pointers values in the map that was allocated by the bump pointer allocator that is unwinded when emitting the next file. This issue has been around since 3.8.

This issue is likely much harder to write a test for since AFAICT it requires emitting something much more compilcated (and possibly real code) instead of just some random bytes.

Reviewers: skatkov, sanjoy

Reviewed By: skatkov, sanjoy

Subscribers: sanjoy, aemerson, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D38924

llvm-svn: 315990
2017-10-17 11:44:34 +00:00
Gadi Haber 1e0f1f476a [X86][SKL] Updated scheduling information for the SkylakeClient target
Updated the scheduling information for the SkylakeClient target with the following changes:

1. regrouped the instructions after adding load and store latencies.
2. regrouped the instructions after adding identified missing ports in several groups.
The changes were made after revisiting the latencies impact of all the load and store uOps.

Reviewers: zvi, RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D38727

Change-Id: I778a308cc11e490e8fa5e27e2047412a1dca029f
llvm-svn: 315978
2017-10-17 06:47:04 +00:00
Craig Topper fbb1985c14 [X86] Fix typo in comment. NFC
llvm-svn: 315969
2017-10-17 04:17:54 +00:00
Mark Searles 4e3d6160db Use the return value of UpdateNodeOperands(); in some cases, UpdateNodeOperands() modifies the node in-place and using the return value isn’t strictly necessary. However, it does not necessarily modify the node, but may return a resultant node if it already exists in the DAG. See comments in UpdateNodeOperands(). In that case, the return value must be used to avoid such scenarios as an infinite loop (node is assumed to have been updated, so added back to the worklist, and re-processed; however, node hasn’t changed so it is once again passed to UpdateNodeOperands(), assumed modified, added back to worklist; cycle infinitely repeats).
Differential Revision: https://reviews.llvm.org/D38466

llvm-svn: 315957
2017-10-16 23:38:53 +00:00
Quentin Colombet 0bd2825517 Re-apply [AArch64][RegisterBankInfo] Use the statically computed mappings for COPY
This reverts commit r315823, thus re-applying r315781.

Also make sure we don't use G_BITCAST mapping for non-generic registers.
Non-generic registers don't have a type but do have a reg bank.
Something the COPY mapping now how to deal with but the G_BITCAST
mapping don't.

-- Original Commit Message --
We use to resort on the generic implementation to get the mappings for
COPYs. The generic implementation resorts on table lookup and
dynamically allocated objects to get the valid mappings.

Given we already know how to map G_BITCAST and have the static mappings
for them, use that code path for COPY as well. This is much more
efficient.

Improve the compile time of RegBankSelect by up to 20%.

Note: When we eventually generate all the mappings via TableGen, we
wouldn't have to do that dance to shave compile time. The intent of this
change was to make sure that moving to static structure really pays off.

NFC.

llvm-svn: 315947
2017-10-16 22:28:40 +00:00
Quentin Colombet 9f20af6135 [AArch64][RegisterBankInfo] Add mapping support for G_BITCAST of s128
Anything bigger than 64-bit just map to FPR.

llvm-svn: 315946
2017-10-16 22:28:38 +00:00
Quentin Colombet 7c114d3d70 [AArch64][LegalizerInfo] Mark s128 G_BITCAST legal
We used to mark all G_BITCAST of 128-bit legal but only for vector
types. Scalars of this size are just fine as well.

llvm-svn: 315945
2017-10-16 22:28:27 +00:00
Krzysztof Parzyszek 72518eaa6f Add iterator range MachineRegisterInfo::liveins(), adopt users, NFC
llvm-svn: 315927
2017-10-16 19:08:41 +00:00
Krzysztof Parzyszek 02893de4ef [Hexagon] Rangify some loops, NFC
Recommit r315763 with a fix.

llvm-svn: 315925
2017-10-16 18:43:08 +00:00
Simon Dardis 0d378a9eed [mips][micromips] Fix (dis)assembly of bc1(t|f)
Previously these instructions were marked codegen only and had
an under-specified instruction description that did not record the
fcc register.

Reviewers: atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D38847

llvm-svn: 315905
2017-10-16 14:20:22 +00:00
Simon Pilgrim 73bd5aa049 Fix or vs || typo.
llvm-svn: 315903
2017-10-16 14:01:59 +00:00
Stefan Maksimovic ee6b5a79dc [mips] Provide alternate predicates for constant synthesis
Ordering of patterns should not be of importance anymore
since the predicates used are mutually exclusive now.

llvm-svn: 315901
2017-10-16 13:18:21 +00:00
Hiroshi Inoue a7eb78b47f [PowerPC] fix up in sign-/zero-extension elimination
This patch fixes a potential problem in my previous commit (https://reviews.llvm.org/rL315888) by adding a null check.

llvm-svn: 315900
2017-10-16 12:11:15 +00:00
Andrew V. Tischenko bfc9061593 This patch is a result of D37262: The issues with X86 prefixes. It closes PR7709, PR17697, PR19251, PR32809 and PR21640. There could be other bugs closed by this patch.
llvm-svn: 315899
2017-10-16 11:14:29 +00:00
Daniel Sanders 01805b6747 [aarch64][globalisel] Fix a crash in selectAddrModeIndexed() caused by incorrect G_FRAME_INDEX handling
The wrong operand was being rendered to the result instruction.

The crash was detected by Bitcode/simd_ops/AArch64_halide_runtime.bc

llvm-svn: 315890
2017-10-16 05:39:30 +00:00
Yonghong Song 6621cf67cf bpf: fix bug on silently truncating 64-bit immediate
We came across an llvm bug when compiling some testcases that 64-bit
immediates are silently truncated into 32-bit and then packed into
BPF_JMP | BPF_K encoding.  This caused comparison with wrong value.

This bug looks to be introduced by r308080.  The Select_Ri pattern is
supposed to be lowered into J*_Ri while the latter only support 32-bit
immediate encoding, therefore Select_Ri should have similar immediate
predicate check as what J*_Ri are doing.

Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 315889
2017-10-16 04:14:53 +00:00
Hiroshi Inoue e3a3e3c9e9 [PowerPC] Eliminate sign- and zero-extensions if already sign- or zero-extended
This patch enables redundant sign- and zero-extension elimination in PowerPC MI Peephole pass.
If the input value of a sign- or zero-extension is known to be already sign- or zero-extended, the operation is redundant and can be eliminated.
One common case is sign-extensions for a method parameter or for a method return value; they must be sign- or zero-extended as defined in PPC ELF ABI. 
For example of the following simple code, two extsw instructions are generated before the invocation of int_func and before the return. With this patch, both extsw are eliminated.

void int_func(int);
void ii_test(int a) {
    if (a & 1) return int_func(a);
}

Such redundant sign- or zero-extensions are quite common in many programs; e.g. I observed about 60,000 occurrences of the elimination while compiling the LLVM+CLANG.

Differential Revision: https://reviews.llvm.org/D31319

llvm-svn: 315888
2017-10-16 04:12:57 +00:00
Daniel Sanders ea8711b88e Re-commit r315885: [globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed*
Summary:
iPTR is a pointer of subtarget-specific size to any address space. Therefore
type checks on this size derive the SizeInBits from a subtarget hook.

At this point, we can import the simplests G_LOAD rules and select load
instructions using them. Further patches will support for the predicates to
enable additional loads as well as the stores.

The previous commit failed on MSVC due to a failure to convert an
initializer_list to a std::vector. Hopefully, MSVC will accept this version.

Depends on D37457

Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar

Reviewed By: qcolombet

Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb

Differential Revision: https://reviews.llvm.org/D37458

llvm-svn: 315887
2017-10-16 03:36:29 +00:00
Daniel Sanders ce72d611af Revert r315885: [globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed*
MSVC doesn't like one of the constructors.

llvm-svn: 315886
2017-10-16 02:15:39 +00:00
Daniel Sanders 6735ea86cd [globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed*
Summary:
iPTR is a pointer of subtarget-specific size to any address space. Therefore
type checks on this size derive the SizeInBits from a subtarget hook.

At this point, we can import the simplests G_LOAD rules and select load
instructions using them. Further patches will support for the predicates to
enable additional loads as well as the stores.

Depends on D37457

Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar

Reviewed By: qcolombet

Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb

Differential Revision: https://reviews.llvm.org/D37458

llvm-svn: 315885
2017-10-16 01:16:35 +00:00
Krzysztof Parzyszek 7467119149 [Hexagon] Add LLVM_ATTRIBUTE_UNUSED to operator<<, NFC
This should silence "unused function" warnings.

llvm-svn: 315883
2017-10-16 00:29:47 +00:00
Daniel Sanders df39cbae2f Re-commit r315863: [globalisel][tablegen] Import ComplexPattern when used as an operator
Summary:
It's possible for a ComplexPattern to be used as an operator in a match
pattern. This is used by the load/store patterns in AArch64 to name the
suboperands returned by ComplexPattern predicate so that they can be broken
apart and referenced independently in the result pattern.

This patch adds support for this in order to enable the import of load/store
patterns.

Depends on D37445

Hopefully fixed the ambiguous constructor that a large number of bots reported.

Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar

Reviewed By: qcolombet

Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D37456

llvm-svn: 315869
2017-10-15 18:22:54 +00:00
Daniel Sanders bb082a36d3 Revert r315863: [globalisel][tablegen] Import ComplexPattern when used as an operator
A large number of bots are failing on an ambiguous constructor call.

llvm-svn: 315866
2017-10-15 17:51:07 +00:00
Daniel Sanders b95b867dd8 [globalisel][tablegen] Import ComplexPattern when used as an operator
Summary:
It's possible for a ComplexPattern to be used as an operator in a match
pattern. This is used by the load/store patterns in AArch64 to name the
suboperands returned by ComplexPattern predicate so that they can be broken
apart and referenced independently in the result pattern.

This patch adds support for this in order to enable the import of load/store
patterns.

Depends on D37445

Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar

Reviewed By: qcolombet

Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D37456

llvm-svn: 315863
2017-10-15 17:03:36 +00:00
Craig Topper 2738117326 [X86] Remove the SlowBTMem feature flag entirely
Turns out we have no patterns on the instructions that were using this feature flag for other reasons. These instructions are slow on all modern CPUs so it seems unlikely that we will spend any effort supporting these instructions going forward. So we might as well just kill of the feature flag and just fix up the comments.

llvm-svn: 315862
2017-10-15 16:57:33 +00:00
Craig Topper a5af4a64d0 [AVX512] Don't mark EXTLOAD as legal with AVX512. Continue using custom lowering.
Summary:
This was impeding our ability to combine the extending shuffles with other shuffles as you can see from the test changes.

There's one special case that needed to be added to use VZEXT directly for v8i8->v8i64 since the custom lowering requires v64i8.

Reviewers: RKSimon, zvi, delena

Reviewed By: delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38714

llvm-svn: 315860
2017-10-15 16:41:17 +00:00
Craig Topper a1f9c9dd8b [X86] Add FeatureSlowBTMem to Haswell, Broadwell, Skylake, Cannonlake, and Knights Landing CPUs.
Summary: I see nothing in Agner Fog's tables to indicate that this improved between Ivy Bridge and Haswell. It's also set for all Atom CPUs so I assume KNL should have it too.

Reviewers: RKSimon, zvi, gadi.haber

Reviewed By: gadi.haber

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38890

llvm-svn: 315859
2017-10-15 16:41:15 +00:00
Aaron Ballman 615eb47035 Reverting r315590; it did not include changes for llvm-tblgen, which is causing link errors for several people.
Error LNK2019 unresolved external symbol "public: void __cdecl `anonymous namespace'::MatchableInfo::dump(void)const " (?dump@MatchableInfo@?A0xf4f1c304@@QEBAXXZ) referenced in function "public: void __cdecl `anonymous namespace'::AsmMatcherEmitter::run(class llvm::raw_ostream &)" (?run@AsmMatcherEmitter@?A0xf4f1c304@@QEAAXAEAVraw_ostream@llvm@@@Z) llvm-tblgen D:\llvm\2017\utils\TableGen\AsmMatcherEmitter.obj 1

llvm-svn: 315854
2017-10-15 14:32:27 +00:00
Amjad Aboud c8d67979c0 [X86] Ignore DBG instructions in X86CmovConversion optimization to resolve PR34565
Differential Revision: https://reviews.llvm.org/D38359

llvm-svn: 315851
2017-10-15 11:00:56 +00:00
Craig Topper a9cd59fb5d [X86] Lower vselect with constant condition to vector_shuffle even with AVX512 instructions.
Summary:
It's better to use our shuffle lowering code to handle these than loading an immediate into a k-register.

It really feels like this should be a DAG combine optimization rather than a lowering operation, but that's a problem for another day.

Reviewers: RKSimon, delena, zvi

Reviewed By: delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38932

llvm-svn: 315849
2017-10-15 06:39:07 +00:00
Vitaly Buka 7450398e01 Remove unused variables
llvm-svn: 315847
2017-10-15 05:35:02 +00:00
Davide Italiano 76067588dc [Hexagon] Mark RangeTree::dump() with LLVM_DUMP_METHOD.
GCC otherwise emits a "defined but not used" warning on the
member function.

llvm-svn: 315838
2017-10-14 23:46:01 +00:00
Konstantin Zhuravlyov 8c18f5b3d4 AMDGPU: Don't use TargetStreamer if it has not been initialized
Fixes cfe/trunk/test/Misc/backend-resource-limit-diagnostics.cl
test after r315808

We may hit few other similar issues, but I want to discuss good
solution offline.

llvm-svn: 315830
2017-10-14 22:16:26 +00:00
Simon Pilgrim 36fe00ee17 [X86][SSE] Don't attempt to reduce the imul vector width of odd sized vectors (PR34947)
llvm-svn: 315825
2017-10-14 19:57:19 +00:00
Bruno Cardoso Lopes caac2fbd19 Revert "[AArch64][RegisterBankInfo] Use the statically computed mappings for COPY"
This reverts commit r315781, breaks:
http://green.lab.llvm.org/green/job/Compiler_Verifiers_GlobalISEL/9882

llvm-svn: 315823
2017-10-14 19:31:03 +00:00
Konstantin Zhuravlyov a01d8b0b63 AMDGPU: Bring HSA metadata on par with the specification
Differential Revision: https://reviews.llvm.org/D38753

llvm-svn: 315821
2017-10-14 19:03:51 +00:00
Simon Pilgrim f5b9f353c3 Pull out repeated calls to VT.getVectorNumElements(). NFCI.
llvm-svn: 315818
2017-10-14 17:37:42 +00:00
Simon Pilgrim cded82837d Use DAG::getBitcast() helper. NFCI.
llvm-svn: 315815
2017-10-14 17:14:42 +00:00
Konstantin Zhuravlyov 219066bab8 AMDGPU: Improve note directive verification in assembler
- Do not allow amd_amdgpu_isa directives on non-amdgcn architectures
  - Do not allow amd_amdgpu_hsa_metadata on non-amdhsa OSes
  - Do not allow amd_amdgpu_pal_metadata on non-amdpal OSes

Differential Revision: https://reviews.llvm.org/D38750

llvm-svn: 315812
2017-10-14 16:15:28 +00:00
Konstantin Zhuravlyov eda425edd4 AMDGPU: Do not emit deprecated notes for code object v3
Differential Revision: https://reviews.llvm.org/D38749

llvm-svn: 315810
2017-10-14 15:59:07 +00:00
Konstantin Zhuravlyov 9c05b2bc3b AMDGPU: Add support for isa version note
- Emit NT_AMD_AMDGPU_ISA
  - Add assembler parsing for isa version directive
    - If isa version directive does not match command line arguments, then return error

Differential Revision: https://reviews.llvm.org/D38748

llvm-svn: 315808
2017-10-14 15:40:33 +00:00
Simon Pilgrim f367c27d2d [X86][SSE] Support combining AND(EXTRACT(SHUF(X)), C) -> EXTRACT(SHUF(X))
If we are applying a byte mask to a value extracted from a shuffle, see if we can combine the mask into shuffle.

Fixes the last issue with PR22415

llvm-svn: 315807
2017-10-14 15:01:36 +00:00
Craig Topper f7e777763d [X86] Add patterns for vzmovl+cvtpd2dq/cvttpd2dq with a load.
llvm-svn: 315802
2017-10-14 07:04:48 +00:00
Craig Topper 61010a85b8 [X86] Add AVX512 versions of VCVTPD2PS to load folding tables.
llvm-svn: 315801
2017-10-14 05:55:43 +00:00
Craig Topper ee277e190c [X86] Add patterns for vzmovl+cvtpd2ps with a load.
llvm-svn: 315800
2017-10-14 05:55:42 +00:00
Craig Topper aec05a9303 [X86] Remove some patterns for bitcasted alignednonedtemporalloads.
These select the same instruction as the non-bitcasted pattern. So this provides no additional value.

llvm-svn: 315799
2017-10-14 04:18:11 +00:00
Craig Topper 009f0aaeb0 [X86] Remove unnecessary bitconverts as the root of patterns for zero extended VCVTPD2UDQZ128rr and VCVTTPD2UDQZ128rr.
We don't need a bitconvert as a root pattern in these cases. The types in the other parts of the pattern are sufficient to express the behavior of these instructions.

llvm-svn: 315798
2017-10-14 04:18:10 +00:00
Craig Topper d746747d03 [X86] Add additional patterns for folding loads with 128-bit VCVTDQ2PD and VCVTUDQ2PD.
This matches the patterns we have for the SSE/AVX version.

This is a prerequisite for D38714.

llvm-svn: 315797
2017-10-14 04:18:09 +00:00
Craig Topper 134241e4af [X86] Add AVX512 flavors of VCVTDQ2PD plus VCVTUDQ2PD to the load folding tables.
llvm-svn: 315796
2017-10-14 04:18:08 +00:00
Craig Topper 0b64e67b0d [X86] Remove TB_NO_REVERSE from VCVTDQ2PDYrr and VCVTPS2PDYrr in the load folding tables.
I believe these were added incorrectly under the belief that the load size was smaller than the input register size, but that's not true.

llvm-svn: 315795
2017-10-14 04:18:07 +00:00
Craig Topper 53b0cb7fa9 [X86] Add an additional isel pattern to CVTDQ2PDrm/VCVTDQ2PDrm to enable load folding without the peephole pass.
This pattern is already used in AVX512VL version of these instructions. Though AVX512VL version is missing other patterns.

llvm-svn: 315794
2017-10-14 04:18:06 +00:00
Quentin Colombet dc2da06c55 [AArch64][RegisterBankInfo] Use the statically computed mappings for COPY
We use to resort on the generic implementation to get the mappings for
COPYs. The generic implementation resorts on table lookup and
dynamically allocated objects to get the valid mappings.

Given we already know how to map G_BITCAST and have the static mappings
for them, use that code path for COPY as well. This is much more
efficient.

Improve the compile time of RegBankSelect by up to 20%.

Note: When we eventually generate all the mappings via TableGen, we
wouldn't have to do that dance to shave compile time. The intent of this
change was to make sure that moving to static structure really pays off.

NFC.

llvm-svn: 315781
2017-10-14 00:43:48 +00:00
Krzysztof Parzyszek a7e5c84590 Revert r315763: "[Hexagon] Rangify some loops, NFC"
Broke some builds (using libstdc++).

llvm-svn: 315769
2017-10-13 21:57:11 +00:00
Craig Topper f6c69564e7 [X86] Use X86ISD::VBROADCAST in place of v2f64 X86ISD::MOVDDUP when AVX2 is available
This is particularly important for AVX512VL where we are better able to recognize the VBROADCAST loads to fold with other operations.

For AVX512VL we now use X86ISD::VBROADCAST for all of the patterns and remove the 128-bit X86ISD::VMOVDDUP.

We may be able to use this for AVX1 as well which would allow us to remove more isel patterns.

I also had to add X86ISD::VBROADCAST as a node to call combineShuffle for so that we treat it similar to X86ISD::MOVDDUP.

Differential Revision: https://reviews.llvm.org/D38836

llvm-svn: 315768
2017-10-13 21:56:48 +00:00
Krzysztof Parzyszek 63ca5d6196 [Hexagon] Rangify some loops, NFC
llvm-svn: 315763
2017-10-13 21:43:00 +00:00
Daniel Sanders 11300cead8 [globalisel][tablegen] Add support for fpimm and import of APInt/APFloat based ImmLeaf.
Summary:
There's only a tablegen testcase for IntImmLeaf and not a CodeGen one
because the relevant rules are rejected for other reasons at the moment.
On AArch64, it's because there's an SDNodeXForm attached to the operand.
On X86, it's because the rule either emits multiple instructions or has
another predicate using PatFrag which cannot easily be supported at the
same time.

Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar

Reviewed By: qcolombet

Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D36569

llvm-svn: 315761
2017-10-13 21:28:03 +00:00
Matt Arsenault e11d8aca77 AMDGPU: Implement hasBitPreservingFPLogic
llvm-svn: 315754
2017-10-13 21:10:22 +00:00
Benjamin Kramer 9f21ca6361 [Hexagon] Avoid unused variable warnings in release builds.
No functionality change intended.

llvm-svn: 315749
2017-10-13 20:46:14 +00:00
Matt Arsenault 550c66d10f AMDGPU: Look for src mods before fp_extend
When selecting modifiers for mad_mix instructions,
look at fneg/fabs that occur before the conversion.

llvm-svn: 315748
2017-10-13 20:45:49 +00:00
Daniel Sanders 649c585710 [aarch64] Support APInt and APFloat in ImmLeaf subclasses and make AArch64 use them.
Summary:
The purpose of this patch is to expose more information about ImmLeaf-like
PatLeaf's so that GlobalISel can learn to import them. Previously, ImmLeaf
could only be used to test int64_t's produced by sign-extending an APInt.
Other tests on immediates had to use the generic PatLeaf and extract the
constant using C++.

With this patch, tablegen will know how to generate predicates for APInt,
and APFloat. This will allow it to 'do the right thing' for both SelectionDAG
and GlobalISel which require different methods of extracting the immediate
from the IR.

This is NFC for SelectionDAG since the new code is equivalent to the
previous code. It's also NFC for FastISel because FastIselShouldIgnore is 1
for the ImmLeaf subclasses. Enabling FastIselShouldIgnore == 0 for these new
subclasses will require a significant re-factor of FastISel.

For GlobalISel, it's currently NFC because the relevant code to import the
affected rules is not yet present. This will be added in a later patch.

Depends on D36086

Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar

Reviewed By: qcolombet

Subscribers: bjope, aemerson, rengolin, javed.absar, igorb, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D36534

llvm-svn: 315747
2017-10-13 20:42:18 +00:00
Matt Arsenault 4d70754e3c AMDGPU: Implement isFPExtFoldable
This helps match v_mad_mix* in some cases.

llvm-svn: 315744
2017-10-13 20:18:59 +00:00
Matt Arsenault f2db97d8fa DAG: Add opcode and source type to isFPExtFree
This is only currently used for mad/fma transforms.
This is the only case where it should be used for AMDGPU,
so add an opcode to be sure.

llvm-svn: 315740
2017-10-13 19:55:45 +00:00
Krzysztof Parzyszek 7c9c05888c [Hexagon] Minimize number of repeated constant extenders
Each constant extender requires an extra instruction, which adds to the
code size and also reduces the number of available slots in an instruction
packet. In most cases, the value of a repeated constant extender could be
loaded into a register, and the instructions using the extender could be
replaced with their counterparts that use that register instead.

This patch adds a pass that tries to reduce the number of constant
extenders, including extenders which differ only in an immediate offset
known at compile time, e.g. @global and @global+12.

llvm-svn: 315735
2017-10-13 19:02:59 +00:00
Craig Topper 5d692917f4 [X86] Add initial skeleton support for knm cpu
This adds Intel's Knights Mill CPU to valid CPU names for the backend. For now its an alias of "knl", but ultimately we need to support AVX5124FMAPS and AVX5124VNNIW instruction sets for it.

Differential Revision: https://reviews.llvm.org/D38811

llvm-svn: 315722
2017-10-13 18:10:17 +00:00
Craig Topper 5805fb3dfc [X86] Fix some inconsistent formatting in the processor feature lists.
llvm-svn: 315696
2017-10-13 16:06:06 +00:00
Craig Topper 54541c4675 [X86] Add ProcIntelBDW to BroadwellProc class not BDWFeatures class.
This isn't a property we want inherited.

llvm-svn: 315695
2017-10-13 16:04:08 +00:00
Krzysztof Parzyszek a0f2f7c413 [Hexagon] Add patterns for cmpb/cmph with immediate arguments
Patch by Sumanth Gundapaneni.

llvm-svn: 315692
2017-10-13 15:43:12 +00:00
Craig Topper 0817346aef [X86] Stop creating CMOV nodes with a second MVT::Glue result
Summary: We seem to inconsistently create CMOV nodes some with a Glue result and some without. But I can't find any cases that use the Glue result. So I've tried to remove all the place that did this.

Reviewers: RKSimon, spatel, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38664

llvm-svn: 315686
2017-10-13 15:28:35 +00:00
Craig Topper bf0de9d3b6 [X86] Remove patterns that select unmasked vbroadcastf2x32/vbroadcasti2x32. Prefer vbroadcastsd/vpbroadcastq instead.
There's no advantage to using these instructions when they aren't masked. This enables some additional execution domain switching without needing to update the table.

llvm-svn: 315674
2017-10-13 06:07:10 +00:00
Matthias Braun bb8507e63c Revert "TargetMachine: Merge TargetMachine and LLVMTargetMachine"
Reverting to investigate layering effects of MCJIT not linking
libCodeGen but using TargetMachine::getNameWithPrefix() breaking the
lldb bots.

This reverts commit r315633.

llvm-svn: 315637
2017-10-12 22:57:28 +00:00
Matthias Braun 3a9c114b24 TargetMachine: Merge TargetMachine and LLVMTargetMachine
Merge LLVMTargetMachine into TargetMachine.

- There is no in-tree target anymore that just implements TargetMachine
  but not LLVMTargetMachine.
- It should still be possible to stub out all the various functions in
  case a target does not want to use lib/CodeGen
- This simplifies the code and avoids methods ending up in the wrong
  interface.

Differential Revision: https://reviews.llvm.org/D38489

llvm-svn: 315633
2017-10-12 22:28:54 +00:00
Craig Topper 060cb43721 [X86] Add CLWB intrinsic. llvm part
llvm-svn: 315613
2017-10-12 20:08:31 +00:00
Wei Ding 5676acad9e Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.
Differential Revision: http://reviews.llvm.org/D37348

llvm-svn: 315610
2017-10-12 19:37:14 +00:00
Konstantin Zhuravlyov 70303c011f AMDGPU/NFC: Move AMDGPU specific note types to ELF.h
Differential Revision: https://reviews.llvm.org/D38747

llvm-svn: 315608
2017-10-12 18:59:54 +00:00
Artem Belevich 3bafc2f0d9 [NVPTX] Implemented wmma intrinsics and instructions.
WMMA = "Warp Level Matrix Multiply-Accumulate".
These are the new instructions introduced in PTX6.0 and available
on sm_70 GPUs.

Differential Revision: https://reviews.llvm.org/D38645

llvm-svn: 315601
2017-10-12 18:27:55 +00:00
Reid Kleckner 1a7e387849 [codeview] Don't emit FPO data in funclet prologues
Attempt 3 to work around bugs in FPO data with funclets.

llvm-svn: 315600
2017-10-12 18:20:35 +00:00
Konstantin Zhuravlyov 63e87f5a02 AMDGPU: Fix warnings introduced in r315526
llvm-svn: 315596
2017-10-12 17:34:05 +00:00
Lei Huang 0724fea2da [PowerPC] Add profitablilty check for conversion to mtctr loops
Add profitability checks for modifying counted loops to use the mtctr instruction.

The latency of mtctr is only justified if there are more than 4 comparisons that
will be removed as a result.  Usually counted loops are formed relatively early
and before unrolling, so most low trip count loops often don't survive.  However
we want to ensure that if they do, we do not mistakenly update them to mtctr loops.

Use CodeMetrics to ensure we are only doing this for small loops with small trip counts.

Differential Revision: https://reviews.llvm.org/D38212

llvm-svn: 315592
2017-10-12 16:43:33 +00:00
Tim Renouf c8ffffe462 [AMDGPU] For amdpal, widen interpolation mode workaround
Summary:
The interpolation mode workaround ensures that at least one
interpolation mode is enabled in PSInputAddr. It does not also check
PSInputEna on the basis that the user might enable bits in that
depending on run-time state.

However, for amdpal os type, the user does not enable some bits after
compilation based on run-time states; the register values being
generated here are the final ones set in the hardware. Therefore, apply
the workaround to PSInputAddr and PSInputEnable together. (The case
where a bit is set in PSInputAddr but not in PSInputEnable is where the
frontend set up an input arg for a particular interpolation mode, but
nothing uses that input arg. Really we should have an earlier pass that
removes such an arg.)

Reviewers: arsenm, nhaehnle, dstuttard

Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D37758

llvm-svn: 315591
2017-10-12 16:16:41 +00:00
Don Hinton 3e0199f7eb [dump] Remove NDEBUG from test to enable dump methods [NFC]
Summary:
Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with
LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP.

Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods.

Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so
it'll be picked up by public headers.

Differential Revision: https://reviews.llvm.org/D38406

llvm-svn: 315590
2017-10-12 16:16:06 +00:00
Sanjay Patel 3a72909b7e [x86] replace isEqualTo with == for efficiency
This is a follow-up suggested in D37534.
Patch by Yulia Koval.

llvm-svn: 315589
2017-10-12 16:15:38 +00:00
Simon Pilgrim 0903085ec3 [X86][SSE] Pull out repeated INSERT_VECTOR_ELT code from LowerBUILD_VECTOR v16i8/v8i16 insertion. NFCI.
llvm-svn: 315587
2017-10-12 15:52:01 +00:00
Reid Kleckner d925f98375 Speculative build fix 2
llvm-svn: 315542
2017-10-12 00:28:28 +00:00
Wei Mi 1736efd16a Revert r307036 because of PR34919.
llvm-svn: 315540
2017-10-12 00:24:52 +00:00
Reid Kleckner 9c0126ec0b Speculative build fix, apparently I built llc without my patch applied to test it
llvm-svn: 315539
2017-10-12 00:20:50 +00:00
Reid Kleckner 29cfa6f11f [codeview] Disable FPO in functions using EH funclets
Funclets are emitted by WinException which doesn't have access to
X86TargetStreamer so it's hard to make a quick fix for this.

llvm-svn: 315538
2017-10-12 00:06:57 +00:00
Reid Kleckner c18c12e385 Fix AMDGPU build issue
llvm-svn: 315535
2017-10-11 23:53:36 +00:00
Reid Kleckner ec4ff24f79 [X86] Sink X86AsmPrinter ctor into .cpp file, NFC
I keep adding and removing code here, so let's sink it.

llvm-svn: 315534
2017-10-11 23:53:12 +00:00
Lang Hames 2241ffa43c [MC] Have MCObjectStreamer take its MCAsmBackend argument via unique_ptr.
MCObjectStreamer owns its MCCodeEmitter -- this fixes the types to reflect that,
and allows us to remove the last instance of MCObjectStreamer's weird "holding
ownership via someone else's reference" trick.

llvm-svn: 315531
2017-10-11 23:34:47 +00:00
Konstantin Zhuravlyov 516651b154 AMDGPU/NFC: Minor clean ups in HSA metadata
- Use HSA metadata streamer directly from AMDGPUAsmPrinter
  - Make naming consistent with PAL metadata

Differential Revision: https://reviews.llvm.org/D38746

llvm-svn: 315526
2017-10-11 22:59:35 +00:00
Konstantin Zhuravlyov c3beb6a075 AMDGPU/NFC: Minor clean ups in PAL metadata
- Move PAL metadata definitions to AMDGPUMetadata
  - Make naming consistent with HSA metadata

Differential Revision: https://reviews.llvm.org/D38745

llvm-svn: 315523
2017-10-11 22:41:09 +00:00
Konstantin Zhuravlyov a63b0f9d20 AMDGPU/NFC: Rename code object metadata as HSA metadata
- Rename AMDGPUCodeObjectMetadata to AMDGPUMetadata (PAL metadata will be included in this file in the follow up change)
  - Rename AMDGPUCodeObjectMetadataStreamer to AMDGPUHSAMetadataStreamer
  - Introduce HSAMD namespace
  - Other minor name changes in function and test names

llvm-svn: 315522
2017-10-11 22:18:53 +00:00
Reid Kleckner 9cdd4df81a [codeview] Implement FPO data assembler directives
Summary:
This adds a set of new directives that describe 32-bit x86 prologues.
The directives are limited and do not expose the full complexity of
codeview FPO data. They are merely a convenience for the compiler to
generate more readable assembly so we don't need to generate tons of
labels in CodeGen. If our prologue emission changes in the future, we
can change the set of available directives to suit our needs. These are
modelled after the .seh_ directives, which use a different format that
interacts with exception handling.

The directives are:
  .cv_fpo_proc _foo
  .cv_fpo_pushreg ebp/ebx/etc
  .cv_fpo_setframe ebp/esi/etc
  .cv_fpo_stackalloc 200
  .cv_fpo_endprologue
  .cv_fpo_endproc
  .cv_fpo_data _foo

I tried to follow the implementation of ARM EHABI CFI directives by
sinking most directives out of MCStreamer and into X86TargetStreamer.
This helps avoid polluting non-X86 code with WinCOFF specific logic.

I used cdb to confirm that this can show locals in parent CSRs in a few
cases, most importantly the one where we use ESI as a frame pointer,
i.e. the one in http://crbug.com/756153#c28

Once we have cdb integration in debuginfo-tests, we can add integration
tests there.

Reviewers: majnemer, hans

Subscribers: aemerson, mgorny, kristof.beyls, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D38776

llvm-svn: 315513
2017-10-11 21:24:33 +00:00
Krzysztof Parzyszek c4a9a8d8e0 [Hexagon] Make sure that new-value jump is packetized with producer
llvm-svn: 315510
2017-10-11 21:20:43 +00:00
Lei Huang 263dc4ef3a [PowerPC] Utilize DQ-Form instructions for spill/restore and fix FrameIndex elimination to only use `lis/addi` if necessary.
Currently we produce a bunch of unnecessary code when emitting the
prologue/epilogue for spills/restores.  Namely, if the load from stack
slot/store to stack slot instruction is an X-Form instruction, we will
always produce an LIS/ORI sequence for the stack offset.

Furthermore, we have not exploited the P9 vector D-Form loads/stores for this
purpose.

This patch address both issues.

Specifying the D-Form load as the instruction to use for stack spills/reloads
should be safe because:

1. The stack should be aligned according to the ABI
2. If the stack isn't aligned, PPCRegisterInfo::eliminateFrameIndex() will
   check for the offset being a multiple of 16 and will convert it to an
   X-Form instruction if it isn't.

Differential Revision : https://reviews.llvm.org/D38758

llvm-svn: 315500
2017-10-11 20:20:58 +00:00
Sanjay Patel 6c0aef77aa [x86] avoid infinite loop from SoftenFloatOperand (PR34866)
Legalization of fp128 assumes things that we should have asserts for,
so that's another potential improvement.

Differential Revision: https://reviews.llvm.org/D38771

llvm-svn: 315485
2017-10-11 18:24:21 +00:00
Krzysztof Parzyszek bf626195df [Hexagon] Handle non-immediate operands to A2_addi in getIncrementValue
llvm-svn: 315472
2017-10-11 16:15:31 +00:00
Simon Pilgrim 7db366630c Spelling mistake in comment. NFCI.
llvm-svn: 315471
2017-10-11 16:10:05 +00:00
Craig Topper 3dc22bba47 [X86] Remove MVT::i1 handling code from LowerTRUNCATE
Summary: I don't think this is necessary with i1 being illegal now.

Reviewers: RKSimon, zvi, guyblank

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38784

llvm-svn: 315469
2017-10-11 16:05:05 +00:00
Krzysztof Parzyszek 12bdcab59c [Pipeliner] Fix offset value for instrs dependent on post-inc load/stores
The software pipeliner and the packetizer try to break dependence
between the post-increment instruction and the dependent memory
instructions by changing the base register and the offset value.
However, in some cases, the existing logic didn't work properly
and created incorrect offset value.

Patch by Jyotsna Verma.

llvm-svn: 315468
2017-10-11 15:59:51 +00:00
Krzysztof Parzyszek 8f174dde92 [Pipeliner] Improve serialization order for post-increments
The pipeliner is generating a serial sequence that causes poor
register allocation when a post-increment instruction appears
prior to the use of the post-increment register. This occurs when
there is a circular set of dependences involved with a sequence
of instructions in the same cycle. In this case, there is no
serialization of the parallel semantics that will not cause an
additional register to be allocated.

This patch fixes the problem by changing the instructions so that
the post-increment instruction is used by the subsequent
instruction, which enables the register allocator to make a
better decision and not require another register.

Patch by Brendon Cahoon.

llvm-svn: 315466
2017-10-11 15:51:44 +00:00
Alex Bradbury 5c1eef4618 [RISCV] Fix build after r315327
Differential Revision: https://reviews.llvm.org/D38779
Patch by Chih-Mao Chen.

llvm-svn: 315455
2017-10-11 12:09:06 +00:00
Simon Dardis 41851e3546 [mips] Add support for parsing target specific flags for MIR
Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D38620

llvm-svn: 315451
2017-10-11 11:11:35 +00:00
Oliver Stannard 4191b9eaea [Asm] Add debug tracing in table-generated assembly matcher
This adds debug tracing to the table-generated assembly instruction matcher,
enabled by the -debug-only=asm-matcher option.

The changes in the target AsmParsers are to add an MCInstrInfo reference under
a consistent name, so that we can use it from table-generated code. This was
already being used this way for targets that use deprecation warnings, but 5
targets did not have it, and Hexagon had it under a different name to the other
backends.

llvm-svn: 315445
2017-10-11 09:17:43 +00:00
Lang Hames 02d330548d [MC] Have MCObjectStreamer take its MCAsmBackend argument via unique_ptr.
MCObjectStreamer owns its MCAsmBackend -- this fixes the types to reflect that,
and allows us to remove another instance of MCObjectStreamer's weird "holding
ownership via someone else's reference" trick.

llvm-svn: 315410
2017-10-11 01:57:21 +00:00
Craig Topper 85b1da1dc4 [X86] Remove temporary std::string creation from shuffle comment printing. We can just write directly to the raw_ostream.
llvm-svn: 315399
2017-10-11 00:46:09 +00:00
Craig Topper 6ce20bd184 [X86] Add 128-bit version of vbroadcasti32x2 to shuffle comment decoding.
llvm-svn: 315395
2017-10-11 00:11:53 +00:00
Craig Topper bb0e316dc7 [X86] Add broadcast patterns that allow a scalar_to_vector between the broadcast and the load.
We already have these patterns for AVX512VL, but not AVX1 or 2.

llvm-svn: 315382
2017-10-10 22:40:31 +00:00
Craig Topper ad3d03193a [X86] Fix some patterns that select VLX instructions, but were incorrectly also checking presence of BWI instructions.
The EVEX->VEX pass probably obscures this.

llvm-svn: 315365
2017-10-10 21:07:14 +00:00
Simon Dardis b994128d14 [mips] Correct the instruction predicates for microMIPSr3
Rather than using the AdditionalPredicates mechanism to guard
the microMIPS instructions, use the existing predicates to properly
guard those instructions.

This also resolves a case where an instruction pattern was incorrectly
available for microMIPS32R6, which caused a register allocation failure
as the registers specified in the pattern were not available.

Reviewers: nitesh.jain, atanasyan

Differential Revision: https://reviews.llvm.org/D38451

llvm-svn: 315362
2017-10-10 20:52:53 +00:00
Matt Arsenault f42074b699 AMDGPU: Fix missing skipFunction calls
llvm-svn: 315361
2017-10-10 20:48:36 +00:00
Matt Arsenault d674e0ac0d AMDGPU: Fix failure to select branch with optnone
opt-bisect/optnone disable the AMDGPUUniformAnnotateValues pass.
The heuristic in the custom selector for brcond deferred the
branch uniformity check to the pattern, which would fail.

llvm-svn: 315360
2017-10-10 20:34:49 +00:00
Matt Arsenault cc85223f87 AMDGPU: Fix incorrect selection of pseudo-branches
These should only be used if the machine structurizer is enabled.

llvm-svn: 315357
2017-10-10 20:22:07 +00:00
Yaxun Liu de4b88d9a1 [AMDGPU] Lower enqueued blocks and generate runtime metadata
This patch adds a post-linking pass which replaces the function pointer of enqueued
block kernel with a global variable (runtime handle) and adds
runtime-handle attribute to the enqueued block kernel.

In LLVM CodeGen the runtime-handle metadata will be translated to
RuntimeHandle metadata in code object. Runtime allocates a global buffer
for each kernel with RuntimeHandel metadata and saves the kernel address
required for the AQL packet into the buffer. __enqueue_kernel function
in device library knows that the invoke function pointer in the block
literal is actually runtime handle and loads the kernel address from it
and puts it into AQL packet for dispatching.

This cannot be done in FE since FE cannot create a unique global variable
with external linkage across LLVM modules. The global variable with internal
linkage does not work since optimization passes will try to replace loads
of the global variable with its initialization value.

Differential Revision: https://reviews.llvm.org/D38610

llvm-svn: 315352
2017-10-10 19:39:48 +00:00
Derek Schuff 669300db9c [WebAssembly] Update MCObjectWriter and associated interfaces after r315327
llvm-svn: 315335
2017-10-10 17:31:43 +00:00
Lang Hames 232cdb48fc [MC] Add another missing <memory> include left out of r315327.
llvm-svn: 315332
2017-10-10 16:59:01 +00:00
Lang Hames 3a67075a3a [MC] Add a missing <memory> include left out of r315327.
llvm-svn: 315331
2017-10-10 16:58:26 +00:00
Lang Hames 60fbc7cc38 [MC] Thread unique_ptr<MCObjectWriter> through the create.*ObjectWriter
functions.

This makes the ownership of the resulting MCObjectWriter clear, and allows us
to remove one instance of MCObjectStreamer's bizarre "holding ownership via
someone else's reference" trick.

llvm-svn: 315327
2017-10-10 16:28:07 +00:00
Jacob Gravelle 37af00e7d0 [WebAssembly] Narrow the scope of WebAssemblyFixFunctionBitcasts
Summary:
The pass to fix function bitcasts generates thunks for functions that
are called directly with a mismatching signature. It was also generating
thunks in cases where the function was address-taken, causing aliasing
problems in otherwise valid cases.
This patch tightens the restrictions for when the pass runs.

Reviewers: sunfish, dschuff

Subscribers: jfb, sbc100, llvm-commits, aheejin

Differential Revision: https://reviews.llvm.org/D38640

llvm-svn: 315326
2017-10-10 16:20:18 +00:00
Simon Dardis 96d35fe06a [mips] Duplicate the reciprocal instruction definitions for FP32
Add instruction definitions for FP32 mode for recip.d and rsqrt.d.

Previously these instructions were only defined when targeting the
full 64-bit FPU model but were not guarded properly.

Reviewers: nitesh.jain, atanasyan

Differential Revision: https://reviews.llvm.org/D38400

llvm-svn: 315318
2017-10-10 14:41:11 +00:00
Stefan Pintilie cc330daa5b [PowerPC] Add missing record form instructions to the P9 Scheduling Model
A number of record form instructions were missing from the P9 scheduling
model. Added those instructions and marked the P9 model as complete.

Differential Revision: https://reviews.llvm.org/D38560

llvm-svn: 315313
2017-10-10 13:45:35 +00:00
Uriel Korach 059e211aa1 after fixing the i386 case
Change-Id: If6fe0b6ec01f111115fb734fe31c0e152dbc165f
llvm-svn: 315311
2017-10-10 13:43:09 +00:00
Simon Dardis a17a7b619a [mips] Partially fix PR34391
Previously, the parsing of the 'subu $reg, ($reg,) imm' relied on a parser
which also rendered the operand to the instruction. In some cases the
general parser could construct an MCExpr which was not a MCConstantExpr
which MipsAsmParser was expecting.

Address this by altering the special handling to cope with unexpected inputs
and fine-tune the handling of cases where an register name that is not
available in the current ABI is regarded as not a match for the custom parser
but also not as an outright error.

Also enforces the binutils restriction that only constants are accepted.

This partially resolves PR34391.

Thanks to Ed Maste for reporting the issue!

Reviewers: nitesh.jain, arichardson

Differential Revision: https://reviews.llvm.org/D37476

llvm-svn: 315310
2017-10-10 13:34:45 +00:00
Oliver Stannard 30b732c942 [ARM, Asm] Harden GNU LDRD/STRD aliases against invalid inputs
Previously, the code that implemented the GNU assembler aliases for the
LDRD and STRD instructions (where the second register is omitted)
assumed that the input was a valid instruction. This caused assertion
failures for every example in ldrd-strd-gnu-bad-inst.s.

This improves this code so that it bails out if the instruction is not
in the expected format, the check bails out, and the asm parser is run
on the unmodified instruction.

It also relaxes the alias on thumb targets, so that unaligned pairs of
registers can be used. The restriction that Rt must be even-numbered
only applies to the ARM versions of these instructions.

Differential revision: https://reviews.llvm.org/D36732

llvm-svn: 315305
2017-10-10 12:38:22 +00:00
Oliver Stannard cd3306f62f [ARM, Asm] Add diagnostics for floating-point register operands
This adds diagnostic strings for the ARM floating-point register
classes, which will be used when these classes are expected by the
assembler, but the provided operand is not valid.

One of these, DPR, requires C++ code to select the correct error
message, as that class contains different registers depending on the
FPU. The rest can all have their diagnostic strings stored in the
tablegen decription of them.

Differential revision: https://reviews.llvm.org/D36693

llvm-svn: 315304
2017-10-10 12:35:09 +00:00
Oliver Stannard bbad419e94 [ARM, Asm] Add diagnostics for general-purpose register operands
This adds diagnostic strings for the ARM general-purpose register
classes, which will be used when these classes are expected by the
assembler, but the provided operand is not valid.

One of these, rGPR, requires C++ code to select the correct error
message, as that class contains different registers in pre-v8 and v8
targets. The rest can all have their diagnostic strings stored in the
tablegen description of them.

Differential revision: https://reviews.llvm.org/D36692

llvm-svn: 315303
2017-10-10 12:31:53 +00:00
Nicolai Haehnle 312b64f4d7 AMDGPU: Split MUBUF offset into aligned components
Summary:
Atomic buffer operations do not work (and trap on gfx9) when the
components are unaligned, even if their sum is aligned.

Previously, we generated an offset of 4156 without an SGPR by
splitting it as 4095 + 61 (immediate + inline constant). The
highest offset for which we can do this correctly is 4156 = 4092 + 64.

Fixes dEQP-GLES31.functional.ssbo.atomic.*

Reviewers: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D37850

llvm-svn: 315302
2017-10-10 12:22:23 +00:00
Nemanja Ivanovic 7bf866eb10 Fix for PR34888.
The issue is that we assume operand zero of the input to the add instruction
is a register. In this case, the input comes from inline assembly and
operand zero is not a register thereby causing a crash.
The code will bail anyway if the input instruction doesn't have the right
opcode. So do that check first and let short-circuiting prevent the crash.

llvm-svn: 315285
2017-10-10 08:46:10 +00:00
NAKAMURA Takumi aba2b3d1f3 SILoadStoreOptimizer.cpp: Fix build; Clang doesn't like "using anonymous struct" since rL315256.
llvm-svn: 315283
2017-10-10 08:30:53 +00:00
Alex Bradbury 8cc99f1887 [RISCV] Fix build after r315254
createELFObjectWriter now takes a std::unique_ptr<MCELFObjectTargetWriter> 
rather than a MCELFObjectTargetWriter*.

llvm-svn: 315275
2017-10-10 07:19:18 +00:00
Craig Topper a88306e6fb [AVX512] Add patterns to commute integer comparison instructions during isel.
This enables broadcast loads to be commuted and allows normal loads to be folded without the peephole pass.

llvm-svn: 315274
2017-10-10 06:36:46 +00:00
Reid Kleckner e52d1e6787 [SEH] Use reportError instead of report_fatal_error for bad directives
This makes the .seh_ directives slightly more usable from standalone
assembly files.

This removes a large number of report_fatal_errors and recovers from the
error by ignoring the directive.

llvm-svn: 315262
2017-10-10 01:26:25 +00:00
Lang Hames 1301a878f1 [MC] Plumb unique_ptr<MCWasmObjectTargetWriter> through createWasmObjectWriter
to WasmObjectWriter's constructor.

Fixes the same ownership issue for COFF that r315245 did for MachO:
WasmObjectWriter takes ownership of its MCWasmObjectTargetWriter, so we want to
pass this through to the constructor via a unique_ptr, rather than a raw ptr.

llvm-svn: 315260
2017-10-10 01:15:10 +00:00
Reid Kleckner a11b983e11 Fix Wasm build after r315254
llvm-svn: 315258
2017-10-10 00:52:40 +00:00
Lang Hames 77dff39cb4 [MC] Plumb unique_ptr<MCWinCOFFObjectTargetWriter> through
createWinCOFFObjectWriter to WinCOFFObjectWriter's constructor.

Fixes the same ownership issue for COFF that r315245 did for MachO:
WinCOFFObjectWriter takes ownership of its MCWinCOFFObjectTargetWriter, so we
want to pass this through to the constructor via a unique_ptr, rather than a
raw ptr.

llvm-svn: 315257
2017-10-10 00:50:29 +00:00
Lang Hames dcb312bdb9 [MC] Plumb unique_ptr<MCELFObjectTargetWriter> through createELFObjectWriter to
ELFObjectWriter's constructor.

Fixes the same ownership issue for ELF that r315245 did for MachO:
ELFObjectWriter takes ownership of its MCELFObjectTargetWriter, so we want to
pass this through to the constructor via a unique_ptr, rather than a raw ptr.

llvm-svn: 315254
2017-10-09 23:53:15 +00:00
Lang Hames 9b206a7d60 [MC] Plumb unique_ptr<MCMachObjectTargetWriter> through createMachObjectWriter
to MCObjectWriter's constructor.

MCObjectWriter takes ownership of its MCMachObjectTargetWriter argument -- this
patch plumbs that ownership relationship through the constructor (which
previously took raw MCMachObjectTargetWriter*) and the createMachObjectWriter
function.

llvm-svn: 315245
2017-10-09 22:38:13 +00:00
Aditya Nandakumar c3bfc81a1f [GISel]: Fix generation of illegal COPYs during CallLowering
We end up creating COPY's that are either truncating/extending and this
should be illegal.

https://reviews.llvm.org/D37640

Patch for X86 and ARM by igorb, rovka

llvm-svn: 315240
2017-10-09 20:07:43 +00:00
Zvi Rackover c1d5955684 [X86] Unsigned saturation subtraction canonicalization [the backend part]
Summary:
On behalf of julia.koval@intel.com

The patch transforms canonical version of unsigned saturation, which is sub(max(a,b),a) or sub(a,min(a,b)) to special psubus insturuction on targets, which support it(8bit and 16bit uints).
umax(a,b) - b -> subus(a,b)
a - umin(a,b) -> subus(a,b)

There is also extra case handled, when right part of sub is 32 bit and can be truncated, using UMIN(this transformation was discussed in https://reviews.llvm.org/D25987).

The example of special case code:

```
void foo(unsigned short *p, int max, int n) {

  int i;
  unsigned m;
  for (i = 0; i < n; i++) {
    m = *--p;
    *p = (unsigned short)(m >= max ? m-max : 0);
  }
}
```
Max in this example is truncated to max_short value, if it is greater than m, or just truncated to 16 bit, if it is not. It is vaid transformation, because if max > max_short, result of the expression will be zero.

Here is the table of types, I try to support, special case items are bold:

| Size | 128 | 256 | 512
| -----  | -----  | -----   | -----
| i8 | v16i8 | v32i8 | v64i8
| i16 | v8i16 | v16i16 | v32i16
| i32 | | **v8i32** | **v16i32**
| i64 | | | **v8i64**

Reviewers: zvi, spatel, DavidKreitzer, RKSimon

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37534

llvm-svn: 315237
2017-10-09 20:01:10 +00:00
Amara Emerson 24ca39ce71 [AArch64] Improve codegen for inverted overflow checking intrinsics
E.g. if we have a (xor(overflow-bit), 1) where overflow-bit comes from an
intrinsic like llvm.sadd.with.overflow then we can kill the xor and use the
inverted condition code for the CSEL.

rdar://28495949

Reviewed By: kristof.beyls

Differential Revision: https://reviews.llvm.org/D38160

llvm-svn: 315205
2017-10-09 15:15:09 +00:00
Craig Topper c88883b07d [X86] Remove a setLoadExtAction from the AVX512 section that uses an AVX512BW type and is alraedy present in the AVX512BW section.
llvm-svn: 315202
2017-10-09 01:05:16 +00:00
Craig Topper 4f8656a7af [X86] Enable extended comparison predicate support for SETUEQ/SETONE when targeting AVX instructions.
We believe that despite AMD's documentation, that they really do support all 32 comparision predicates under AVX.

Differential Revision: https://reviews.llvm.org/D38609

llvm-svn: 315201
2017-10-09 01:05:15 +00:00
Simon Pilgrim 2c742f919a [X86][SSE] Don't call combineTo inside combineX86ShufflesRecursively. NFCI.
Return the combined shuffle from combineX86ShufflesRecursively and perform the combineTo in the caller.

Makes it easier for future patches to use this in functions that aren't actually shuffles themselves.

llvm-svn: 315195
2017-10-08 20:58:14 +00:00
Simon Pilgrim 6abbd33ec0 Tidyup with clang-format. NFCI.
llvm-svn: 315187
2017-10-08 19:24:30 +00:00
Benjamin Kramer 16610028ea Remove unused variables. No functionality change.
llvm-svn: 315185
2017-10-08 19:11:02 +00:00
Simon Pilgrim dc32c844f9 [X86] getTargetConstantBitsFromNode - add support for decoding scalar constants
llvm-svn: 315182
2017-10-08 17:21:18 +00:00
Craig Topper c97775c03c [X86] Prefer MOVSS/SD over BLENDI during legalization. Remove BLENDI versions of scalar arithmetic patterns
Summary:
We currently disable some converting of shuffles to MOVSS/MOVSD during legalization if SSE41 is enabled. But later during shuffle combining we go back to prefering MOVSS/MOVSD.

Additionally we have patterns that look for BLENDIs to detect scalar arithmetic operations. I believe due to the combining using MOVSS/MOVSD these are unnecessary.

Interestingly, we still codegen blend instructions even though lowering/isel emit movss/movsd instructions. Turns out machine CSE commutes them to blend, and then commuting those blends back into blends that are equivalent to the original movss/movsd.

This patch fixes the inconsistency in legalization to prefer MOVSS/MOVSD. The one test change was caused by this change. The problem is that we have integer types and are mostly selecting integer instructions except for the shufps. This shufps forced the execution domain, but the vpblendw couldn't have its domain changed with a naive instruction swap. We could fix this by special casing VPBLENDW based on the immediate to widen the element type.

The rest of the patch is removing all the excess scalar patterns.

Long term we should probably add isel patterns to make MOVSS/MOVSD emit blends directly instead of relying on the double commute. We may also want to consider emitting movss/movsd for optsize. I also wonder if we should still use the VEX encoded blendi instructions even with AVX512. Blends have better throughput, and that may outweigh the register constraint.

Reviewers: RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38023

llvm-svn: 315181
2017-10-08 16:57:23 +00:00
Amara Emerson 1cd89ca669 [AArch64][GlobalISel] Make G_PHI of p0 types legal.
Differential Revision: https://reviews.llvm.org/D38621

llvm-svn: 315177
2017-10-08 15:29:11 +00:00
Gadi Haber 684944b822 [X86][SKX] Adding the scheduling information for the SKX target.
Adding the scheduling information for the SkylakeServer (SKX) target.

This patch adds the instruction scheduling information for the SkylakeServer (SKX) architecture target by adding the file X86SchedSkylakeServer.td located under the X86 Target.
We used the scheduling information retrieved from the Skylake architects in order to create the file.
The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction.

The patch continues the scheduling replacement and insertion effort started with the SNB target in r310792, the HSW target in r311879 and the SkylakeClient (SKL) target in rL313613.

Please expect some performance fluctuations due to code alignment effects.

Reviewers: zvi, RKSimon, craig.topper, chandlerc, aymanmu
Differential Revision: https://reviews.llvm.org/D38443

Change-Id: I5c228fcc09e9e5a99b6116e62b356c4f9b971185
llvm-svn: 315175
2017-10-08 12:52:54 +00:00
Ayman Musa 1170deb9c8 [X86] Add missing entries in 'MemoryFoldTable2Addr' to get complete form of the table.
Get the folding table 'MemoryFoldTable2Addr' to a complete state as part of the process explained in https://reviews.llvm.org/D38028

Differential Revision: https://reviews.llvm.org/D38500

llvm-svn: 315174
2017-10-08 09:46:50 +00:00
Ayman Musa 993339b941 [X86][TableGen] Recommitting the X86 memory folding tables TableGen backend while disabling it by default.
After the original commit ([[ https://reviews.llvm.org/rL304088 | rL304088 ]]) was reverted, a discussion in llvm-dev was opened on 'how to accomplish this task'.
In the discussion we concluded that the best way to achieve our goal (which is to automate the folding tables and remove the manually maintained tables) is:

 # Commit the tablegen backend disabled by default.

 # Proceed with an incremental updating of the manual tables - while checking the validity of each added entry.

 # Repeat previous step until we reach a state where the generated and the manual tables are identical. Then we can safely remove the manual tables and include the generated tables instead.

 # Schedule periodical (1 week/2 weeks/1 month) runs of the pass:

   - if changes appear (new entries):
      - make sure the entries are legal
      - If they are not, mark them as illegal to folding
   - Commit the changes (if there are any).

CMake flag added for this purpose is "X86_GEN_FOLD_TABLES". Building with this flags will run the pass and emit the X86GenFoldTables.inc file under build/lib/Target/X86/ directory which is a good reference for any developer who wants to take part in the effort of completing the current folding tables.

Differential Revision: https://reviews.llvm.org/D38028

llvm-svn: 315173
2017-10-08 09:20:32 +00:00
Craig Topper bbca2f2978 [X86] Stop LowerSIGN_EXTEND_AVX512 from creating v8i16/v16i16/v16i8 vselects with a v8i1/v16i1 condition when BWI is not available.
Some of the tests in vector-shuffle-v1.ll would get into an infinite loop without this.

llvm-svn: 315172
2017-10-08 08:50:59 +00:00
Ayman Musa 5fc6dc58d7 [X86] Add new attribute to X86 instructions to enable marking them as "not memory foldable"
This attribute will be used in a tablegen backend that generated the X86 memory folding tables which will be added in a future pass.
Instructions with this attribute unset will be excluded from the full set of X86 instructions available for the pass.

Differential Revision: https://reviews.llvm.org/D38027

llvm-svn: 315171
2017-10-08 08:32:56 +00:00
Craig Topper 9563cab961 [X86] Simplify some code in getInsertVINSERTImmediate and getExtractVEXTRACTImmediate. NFC
Replace one of the divides with a multiply.

llvm-svn: 315162
2017-10-08 01:33:42 +00:00
Craig Topper 27170fee8d [X86] If we see an insert of a bitcast into zero vector, canonicalize it to move the bitcast to the other side of the insert.
This improves detection of zeroing of upper bits during isel.

llvm-svn: 315161
2017-10-08 01:33:41 +00:00
Craig Topper f7a19db649 [X86] Remove ISD::INSERT_SUBVECTOR handling from combineBitcastForMaskedOp. Add isel patterns to make up for it.
This will allow for some flexibility in canonicalizing bitcasts around insert_subvector.

llvm-svn: 315160
2017-10-08 01:33:40 +00:00
Craig Topper 16f2044fa8 [X86] Use getConstantOperandVal to simplify some code. NFC
llvm-svn: 315159
2017-10-08 01:33:38 +00:00
Simon Pilgrim 9508fe7924 [X86][SSE] Match bitcasted BUILD_VECTOR of constants for v2i64 shifts on 64-bit targets (PR34855)
Extension to rL315155, generate constant shifts on 64-bits as well as 32-bits.

llvm-svn: 315156
2017-10-07 17:57:22 +00:00
Simon Pilgrim 70e1db78db [X86][SSE] Match bitcasted v4i32 BUILD_VECTORS for v2i64 shifts on 64-bit targets (PR34855)
We were already doing this for 32-bit targets, but we can generate these on 64-bits as well.

llvm-svn: 315155
2017-10-07 17:42:17 +00:00
Craig Topper 2f60295364 [X86] Add X86ISD::CMOV to computeKnownBitsForTargetNode and ComputeNumSignBitsForTargetNode.
Summary: Implementations based on ISD::SELECT.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38663

llvm-svn: 315153
2017-10-07 16:51:19 +00:00
Simon Pilgrim 73f143e774 [X86][SSE] Improve shuffling combining with horizontal operations
Recognise cases when we can merge the shuffles with their horizontal (HADD/HSUB/PACK) instruction inputs.

Replaces an older implementation which performed some of this during lowering, expanding an existing target shuffle combine stage instead.

Differential Revision: https://reviews.llvm.org/D38506

llvm-svn: 315150
2017-10-07 12:42:23 +00:00
Martin Storsjo 5e9d482b0a [X86] Update an outdated comment about SjLj
The SjLj intrinsics in the X86 backend are intended for use with
SjLj exception handling as well, since SVN r271244.

Differential Revision: https://reviews.llvm.org/D38532

llvm-svn: 315146
2017-10-07 06:00:32 +00:00
Craig Topper e79eff3bb5 [X86] Correct result type for the flag result of RDSEED and RDRAND nodes. Correct the CC type for the CMOV used with RDSEED/RDRAND.
The flag result was MVT::Glue, but should be MVT::i32. The CC type was MVT::i8, but should be MVT::i32.

llvm-svn: 315145
2017-10-07 05:11:59 +00:00
Jessica Paquette 13593843f6 [MachineOutliner] Disable outlining from LinkOnceODRs by default
Say you have two identical linkonceodr functions, one in M1 and one in M2.
Say that the outliner outlines A,B,C from one function, and D,E,F from another
function (where letters are instructions). Now those functions are not
identical, and cannot be deduped. Locally to M1 and M2, these outlining
choices would be good-- to the whole program, however, this might not be true!

To mitigate this, this commit makes it so that the outliner sees linkonceodr
functions as unsafe to outline from. It also adds a flag,
-enable-linkonceodr-outlining, which allows the user to specify that they
want to outline from such functions when they know what they're doing.

Changing this handles most code size regressions in the test suite caused by
competing with linker dedupe. It also doesn't have a huge impact on the code
size improvements from the outliner. There are 6 tests that regress > 5% from
outlining WITH linkonceodrs to outlining WITHOUT linkonceodrs. Overall, most
tests either improve or are not impacted.

Not outlined vs outlined without linkonceodrs:
https://hastebin.com/raw/qeguxavuda

Not outlined vs outlined with linkonceodrs:
https://hastebin.com/raw/edepoqoqic

Outlined with linkonceodrs vs outlined without linkonceodrs:
https://hastebin.com/raw/awiqifiheb

Numbers generated using compare.py with -m size.__text. Tests run for AArch64
with -Oz -mllvm -enable-machine-outliner -mno-red-zone.

llvm-svn: 315136
2017-10-07 00:16:34 +00:00
Cameron McInally 9d64101fe8 [AVX512] Fix TERNLOG when folding broadcast
Patch to fix ternlog instructions with a folded
broadcast. The broadcast decorator, e.g. {1toX}, was missing.

Differential Revision: https://reviews.llvm.org/D38649

llvm-svn: 315122
2017-10-06 22:31:29 +00:00
Stanislav Mekhanoshin de42c29a68 [AMDGPU] New 64 bit div/rem expansion
Old expansion was 20 VGPRs, 78 SGPRs and ~380 instructions.
This expansion is 11 VGPRs, 12 SGPRs and ~120 instructions.

Passes OpenCL conformance test_integer_ops quick_[u]long_math

Differential Revision: https://reviews.llvm.org/D38607

llvm-svn: 315081
2017-10-06 17:24:45 +00:00
Diana Picus e393bc72ee [ARM] GlobalISel: Select shifts
Unfortunately TableGen doesn't handle this yet:
Unable to deduce gMIR opcode to handle Src (which is a leaf).

Just add some temporary hand-written code to generate the proper MOVsr.

llvm-svn: 315071
2017-10-06 15:39:16 +00:00
Diana Picus a81a4b17e5 [ARM] GlobalISel: Map shift operands to GPRs
llvm-svn: 315067
2017-10-06 14:52:43 +00:00
Diana Picus 2c95730450 [ARM] GlobalISel: Mark shifts as legal for s32
The new legalize combiner introduces shifts all over the place, so we
should support them sooner rather than later.

llvm-svn: 315064
2017-10-06 14:30:05 +00:00
Jonas Paulsson c63ed222b8 [SystemZ] Enable machine scheduler.
The machine scheduler (before register allocation) is enabled by default for
SystemZ.

The SelectionDAG scheduling preference now becomes source order scheduling
(was regpressure).

Review: Ulrich Weigand
https://reviews.llvm.org/D37977

llvm-svn: 315063
2017-10-06 13:59:28 +00:00
Reid Kleckner 676941909d [X86] Extract CATCHRET handling from emitEpilogue, NFC
llvm-svn: 315023
2017-10-05 21:37:39 +00:00
Derek Schuff 885dc59297 [WebAssembly] Add the rest of the atomic loads
Add extending loads and constant offset patterns
A bit more refactoring of the tablegen to make the patterns fairly nice and
uniform between the regular and atomic loads.

Differential Revision: https://reviews.llvm.org/D38523

llvm-svn: 315022
2017-10-05 21:18:42 +00:00
Krzysztof Parzyszek a114941fa8 [Hexagon] Make PS_fi and PS_fia extendable (they both expand to A2_addi)
llvm-svn: 315019
2017-10-05 20:20:06 +00:00
Krzysztof Parzyszek 7ae3ae9ef4 [Hexagon] Give uniform names to functions changing addressing modes, NFC
The new format is changeAddrMode_xx_yy, where xx is the current mode,
and yy is the new one.

Old name:               New name:
getBaseWithImmOffset    changeAddrMode_abs_io
getAbsoluteForm         changeAddrMode_io_abs
getBaseWithRegOffset    changeAddrMode_io_rr
xformRegToImmOffset     changeAddrMode_rr_io
getBaseWithLongOffset   changeAddrMode_rr_ur
getRegShlForm           changeAddrMode_ur_rr

llvm-svn: 315013
2017-10-05 20:01:38 +00:00
Reid Kleckner 7344282c36 [X86] Simplify X86 epilogue frame size calculation, NFC
Sink the insertion of "pop ebp" out of the frame size calculation
branches. They all check for HasFP.

Our handling of CLEANUPRET and CATCHRET was equivalent, both are
funclets and use the same frame size. We can eliminate the CLEANUPRET
case.

Hoist the hasFP(MF) query into a local bool.

Rename TargetMBB to CatchRetTarget to be more descriptive.

Eliminate the Optional<unsigned> RetOpcode local, now that it has one
use.

It's only a net savings of 10 lines, but hopefully it's *slightly* more
readable.

llvm-svn: 315000
2017-10-05 18:27:08 +00:00
Petar Jovanovic 65f10246bb [mips] implement .set dspr2 directive
Implement .set dspr2 directive with appropriate feature bits. This
directive is a counterpart of -mattr=dspr2 command line option with the
exception that it does not influence elf header flags.

Patch by Milos Stojanovic.

Differential Revision: https://reviews.llvm.org/D38537

llvm-svn: 314994
2017-10-05 17:40:32 +00:00
Matt Arsenault 2d3f8f333d AMDGPU: Set v2i32 any_extend to expand
llvm-svn: 314993
2017-10-05 17:38:30 +00:00
Krzysztof Parzyszek 9f3e88ae64 [RDF] Simplify construction of maximal registers
The old algoritm was not correct, although it worked most of the time.
Avoid the complex reachability analysis and simply calculate the maximal
registers out of the set of all referenced registers.

llvm-svn: 314991
2017-10-05 17:12:49 +00:00
Artur Pilipenko 7b15254c8f [X86] Fix chains update when lowering BUILD_VECTOR to a vector load
The code which lowers BUILD_VECTOR of consecutive loads into a single vector
load doesn't update chains properly. As a result the vector load can be
reordered with the store to the same location.

The current code in EltsFromConsecutiveLoads only updates the chain following
the first load. The fix is to update the chains following all the loads
comprising the vector.

This is a fix for PR10114.

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D38547

llvm-svn: 314988
2017-10-05 16:28:21 +00:00
Konstantin Zhuravlyov aa0835a7ab AMDGPU: Add and set AMDGPU-specific e_flags
Differential Revision: https://reviews.llvm.org/D38556

llvm-svn: 314987
2017-10-05 16:19:18 +00:00
Simon Dardis 51a7ae2a29 [mips] Place certain 64 bit FPU instructions in their own decoder namespace
Previously, instructions that were defined to use the FGR64 register class
were associated with the Mips64 table which was incorrect.

Reviewers: nitesh.jain, atanasyan

Differential Revision: https://reviews.llvm.org/D38454

llvm-svn: 314976
2017-10-05 10:27:37 +00:00
Eugene Zelenko 60433b682f [X86] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 314953
2017-10-05 00:33:50 +00:00
Matt Arsenault f48e5c9ce5 AMDGPU: Add comment about clamps
llvm-svn: 314952
2017-10-05 00:13:20 +00:00
Matt Arsenault aafff87dda AMDGPU: Do not fold clamp instructions when sources are different
Patch by hakzsam (Samuel Pitoiset)

llvm-svn: 314951
2017-10-05 00:13:17 +00:00
Matt Arsenault 9ab1fa6803 AMDGPU: Fix not accounting for instruction size in bundles
These were counted as 0. Fixes branch limit exceeded errors
in some large programs.

llvm-svn: 314944
2017-10-04 22:59:12 +00:00
Konstantin Zhuravlyov 8684f7b4f9 AMDGPU: Correctly set EI_OSABI based on the os
Differential Revision: https://reviews.llvm.org/D38555

llvm-svn: 314943
2017-10-04 22:44:13 +00:00
Sanjay Patel 4c33d5213b [SimplifyCFG] put the optional assumption cache pointer in the options struct; NFCI
This is a follow-up to https://reviews.llvm.org/D38138. 

I fixed the capitalization of some functions because we're changing those
lines anyway and that helped verify that we weren't accidentally dropping 
any options by using default param values.

llvm-svn: 314930
2017-10-04 20:26:25 +00:00
Simon Pilgrim 9edbe110e8 [X86][AVX] Improve (i8 bitcast (v8i1 x)) handling for v8i64/v8f64 512-bit vector compare results.
AVX1/AVX2 targets were missing a chance to use vmovmskps for v8f32/v8i32 results for bool vector bitcasts

llvm-svn: 314921
2017-10-04 18:00:42 +00:00
Krzysztof Parzyszek 4697ddeea4 [Hexagon] Add a member Subtarget to HexagonInstrInfo, NFC
llvm-svn: 314920
2017-10-04 18:00:15 +00:00
Hans Wennborg 2a6c9adb2f Revert r314886 "[X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.)"
It broke the Chromium / SQLite build; see PR34830.

> Summary:
>    1/  Operand folding during complex pattern matching for LEAs has been
>        extended, such that it promotes Scale to accommodate similar operand
>        appearing in the DAG.
>        e.g.
>          T1 = A + B
>          T2 = T1 + 10
>          T3 = T2 + A
>        For above DAG rooted at T3, X86AddressMode will no look like
>          Base = B , Index = A , Scale = 2 , Disp = 10
>
>    2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
>        so that if there is an opportunity then complex LEAs (having 3 operands)
>        could be factored out.
>        e.g.
>          leal 1(%rax,%rcx,1), %rdx
>          leal 1(%rax,%rcx,2), %rcx
>        will be factored as following
>          leal 1(%rax,%rcx,1), %rdx
>          leal (%rdx,%rcx)   , %edx
>
>    3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
>       thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy
>
> Reviewed By: lsaba
>
> Subscribers: jmolloy, spatel, igorb, llvm-commits
>
>     Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 314919
2017-10-04 17:54:06 +00:00
Simon Pilgrim b47b3f2564 [X86][SSE] Add support for lowering v8i16 binary shuffles to PACKSS/PACKUS
Missed in D38472

llvm-svn: 314916
2017-10-04 17:31:28 +00:00
Craig Topper 6fb55716e9 [X86] Redefine MOVSS/MOVSD instructions to take VR128 regclass as input instead of FR32/FR64
This patch redefines the MOVSS/MOVSD instructions to take VR128 as its second input. This allows the MOVSS/SD->BLEND commute to work without requiring a COPY to be inserted.

This should fix PR33079

Overall this looks to be an improvement in the generated code. I haven't checked the EXPENSIVE_CHECKS build but I'll do that and update with results.

Differential Revision: https://reviews.llvm.org/D38449

llvm-svn: 314914
2017-10-04 17:20:12 +00:00
Yonghong Song 09b01b3555 bpf: fix an insn encoding issue for neg insn
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 314911
2017-10-04 16:11:52 +00:00
Simon Pilgrim 46a366ccb7 [X86][SSE] Early out from ComputeNumSignBitsForTargetNode. NFCI.
Early out from vector shift by immediates that will exceed eltsize - don't bother making an unnecessary ComputeNumSignBits recursive call.

llvm-svn: 314903
2017-10-04 13:41:26 +00:00
Simon Pilgrim bd5d2f0284 [X86][SSE] Add support for lowering unary shuffles to PACKSS/PACKUS
Extension to D38472

llvm-svn: 314901
2017-10-04 13:12:08 +00:00
Dylan McKay 8dd702c1cd [AVR] Implement LPMWRdZ pseudo-instruction's expansion.
FIXME: implementation is mostly copy-pasted from LDWRdPtr, so we should
refactor a bit and unify the two

Patch by Gerdo Erdi.

llvm-svn: 314898
2017-10-04 10:37:22 +00:00
Dylan McKay 3f71f1c91e [AVR] Factor out mayLoad in tablegen patterns
Patch by Gergo Erdi.

llvm-svn: 314897
2017-10-04 10:36:07 +00:00
Dylan McKay d00f9c1ef1 [AVR] Elaborate LDWRdPtr into `ld r, X++; ld r+1, X`
Patch by Gergo Erdi.

llvm-svn: 314896
2017-10-04 10:33:36 +00:00
Dylan McKay 39069208d5 [AVR] Insert JMP for long branches
Previously, on long branches (relative jumps of >4 kB), an assertion
failure was hit, as AVRInstrInfo::insertIndirectBranch was not
implemented. Despite its name, it is called by the branch relaxator
for *all* unconditional jumps.

Patch by Thomas Backman.

llvm-svn: 314891
2017-10-04 09:51:28 +00:00
Dylan McKay c4b002bf5a [AVR] Fix displacement overflow for LDDW/STDW
In some cases, the code generator attempts to generate instructions such as:

lddw r24, Y+63

which expands to:

ldd r24, Y+63
ldd r25, Y+64 # Oops! This is actually ld r25, Y in the binary

This commit limits the first offset to 62, and thus the second to 63.
It also updates some asserts in AVRExpandPseudoInsts.cpp, including for
INW and OUTW, which appear to be unused.

Patch by Thomas Backman.

llvm-svn: 314890
2017-10-04 09:51:21 +00:00
Oliver Stannard 878216dd05 [ARM] Add diag string for movw/movt immediates in assembly
This adds diagnostics for invalid immediate operands to the MOVW and MOVT
instructions (ARM and Thumb).

Differential revision: https://reviews.llvm.org/D31879

llvm-svn: 314888
2017-10-04 09:24:54 +00:00
Oliver Stannard 5a7aae3a80 [ARM, Asm] Change grammar of immediate operand diagnostics
Currently, our diagnostics for assembly operands are not consistent.
Some start with (for example) "immediate operand must be ...",
and some with "operand must be an immediate ...". I think the latter
form is preferable for a few reasons:
* It's unambiguous that it is referring to the expected type of operand, not
  the type the user provided. For example, the user could provide an register
  operand, and get a message taking about an operand is if it is already an
  immediate, just not in the accepted range.
* It allows us to have a consistent style once we add diagnostics for operands
  that could take two forms, for example a label or pc-relative memory operand.

Differential revision: https://reviews.llvm.org/D36689

llvm-svn: 314887
2017-10-04 09:18:07 +00:00
Jatin Bhateja 3c29bacd43 [X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.)
Summary:
   1/  Operand folding during complex pattern matching for LEAs has been
       extended, such that it promotes Scale to accommodate similar operand
       appearing in the DAG.
       e.g.
         T1 = A + B
         T2 = T1 + 10
         T3 = T2 + A
       For above DAG rooted at T3, X86AddressMode will no look like
         Base = B , Index = A , Scale = 2 , Disp = 10

   2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
       so that if there is an opportunity then complex LEAs (having 3 operands)
       could be factored out.
       e.g.
         leal 1(%rax,%rcx,1), %rdx
         leal 1(%rax,%rcx,2), %rcx
       will be factored as following
         leal 1(%rax,%rcx,1), %rdx
         leal (%rdx,%rcx)   , %edx

   3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
      thus avoiding creation of any complex LEAs within a loop.

Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy

Reviewed By: lsaba

Subscribers: jmolloy, spatel, igorb, llvm-commits

    Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 314886
2017-10-04 09:02:10 +00:00
Martin Storsjo e14145dcb0 [X86] Fix using the SJLJ jump table on x86_64
The previous version didn't work if the jump table base address didn't
fit in 32 bit, since it was encoded as an immediate offset. And in case
the jump table is encoded as 32 bit label differences, we need to
load and add them to the table base first.

This solves the first half of the issues mentioned in PR34720.

Also fix some of the errors pointed out by -verify-machineinstrs, by
using GR32_NOSPRegClass.

Differential Revision: https://reviews.llvm.org/D38333

llvm-svn: 314876
2017-10-04 05:12:10 +00:00
Balaram Makam e0c43152b5 [AArch64] Use LateSimplifyCFG after expanding atomic operations.
Summary:
After r308422 we defer optimizations that can destroy loop canonical forms to
LateSimplifyCFG. Running LateSimplifyCFG after expanding atomic operations
can exploit more control-flow opportunities.

Reviewers: mcrosier, t.p.northover, efriedma

Reviewed By: efriedma

Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D38262

llvm-svn: 314857
2017-10-03 22:39:24 +00:00
Konstantin Zhuravlyov 22bc039c89 AMDGPU: Expand setcc for v2f32 and v4f32
llvm-svn: 314853
2017-10-03 21:45:01 +00:00
Konstantin Zhuravlyov 908fa90b51 AMDGPU: Expand setcc for v2i32 and v4i32
llvm-svn: 314852
2017-10-03 21:31:24 +00:00
Reid Kleckner 33cbbbc62f [X86] Remove dead declaration convertArgMovsToPushes, NFC
This was dead when it landed in r252578. We have this functionality, if
not for stack probe calls, but for regular calls in
X86CallFrameOptimization.cpp.

llvm-svn: 314845
2017-10-03 21:12:18 +00:00
Stefan Pintilie e1d7547237 [PowerPC] Revert P9 scheduling model to incomplete
Partially revert a previous change from commit: https://llvm.org/svn/llvm-project/llvm/trunk@314026
The previous change caused regressions on Power 9.

llvm-svn: 314835
2017-10-03 20:27:30 +00:00
Tim Renouf 72800f0436 [AMDGPU] implemented pal metadata
Summary:
For the amdpal OS type:

We write an AMDGPU_PAL_METADATA record in the .note section in the ELF
(or as an assembler directive). It contains key=value pairs of 32 bit
ints. It is a merge of metadata from codegen of the shaders, and
metadata provided by the frontend as _amdgpu_pal_metadata IR metadata.
Where both sources have a key=value with the same key, the two values
are ORed together.

This .note record is part of the amdpal ABI and will be documented in
docs/AMDGPUUsage.rst in a future commit.

Eventually the amdpal OS type will stop generating the .AMDGPU.config
section once the frontend has safely moved over to using the .note
records above instead of .AMDGPU.config.

Reviewers: arsenm, nhaehnle, dstuttard

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D37753

llvm-svn: 314829
2017-10-03 19:03:52 +00:00
Alexander Timofeev 4651396584 [AMDGPU] Avoid predicated execution of the basic blocks containing scalar
instructions.

Differential revision: https://reviews.llvm.org/D38293

llvm-svn: 314828
2017-10-03 18:55:36 +00:00
Hans Wennborg 660531085a CodeView: Provide a .def file with the register ids
The list of register ids was previously written out in a couple of dirrent
places. This puts it in a .def file and also adds a few more registers (e.g.
the x87 regs) which should lead to more readable dumps, but I didn't include
the whole list since that seems unnecessary.

X86_MC::initLLVMToSEHAndCVRegMapping is pretty ugly, but at least it's not
relying on magic constants anymore. The TODO of using tablegen still stands.

Differential revision: https://reviews.llvm.org/D38480

llvm-svn: 314821
2017-10-03 18:27:22 +00:00
Oliver Stannard 0d5c792223 [ARM] Use table-gen'd assembly operand diags in ARM asm parser
This switches the ARM AsmParser to use assembly operand diagnostics from
tablegen, rather than a switch statement on the ARMMatchResultTy. It
moves the existing diagnostic strings to tablegen, but adds no new ones,
so this is NFC except for one diagnostic string that had an off-by-1 error
in the hand-written switch statement.

Differential revision: https://reviews.llvm.org/D31607

llvm-svn: 314804
2017-10-03 14:38:52 +00:00
Oliver Stannard 55114fd9f0 [ARM, Asm] Use correct source location for register tokens
tryParseRegister advances the lexer, so we need to take copies of the start and
end locations of the register operand before calling it.

Previously, the caret in the diagnostic pointer to the comma after the r0
operand in the test, rather than the start of the operand.

Differential revision: https://reviews.llvm.org/D31537

llvm-svn: 314799
2017-10-03 14:30:58 +00:00
Simon Dardis 055192ccd3 [mips] Enable spilling and reloading of the dsp register set.
The dsp register class is an alias of the gpr register class, so
we have to define instructions for spilling and reloading.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D38038

llvm-svn: 314798
2017-10-03 13:45:49 +00:00
Oliver Stannard 68aa7de517 [ARM, Asm] Fix ubsan failure caused by out-of-range enum value
In this code, we use ~0U as a sentinel value for any operand class that doesn't
have a user-friendly error message, but this value isn't in range of the
MatchClassKind enum, so we need to ensure it does not get passed to isSubclass.

llvm-svn: 314793
2017-10-03 12:45:18 +00:00
Simon Pilgrim cf99d069c3 [X86][SSE] Add support for decoding PACKSS/PACKUS shuffles masks with UNDEF
llvm-svn: 314792
2017-10-03 12:41:39 +00:00
Oliver Stannard 5daee987fd [ARM, Asm] Remove dead code causing MSan failure.
r314779 caused ErrorInfo to be red uninitialised, but also made this code dead,
so it can just be removed.

llvm-svn: 314791
2017-10-03 12:28:28 +00:00
Simon Pilgrim f5f291d129 [X86][SSE] Add support for lowering shuffles to PACKSS/PACKUS
If the upper bits of a truncation shuffle patterns have at least the minimum number of sign/zero bits on their inputs then we can safely use PACKSS/PACKUS as shuffles.

Partial fix for https://bugs.llvm.org/show_bug.cgi?id=34773

Differential Revision: https://reviews.llvm.org/D38472

llvm-svn: 314788
2017-10-03 12:01:31 +00:00
Oliver Stannard e093bad472 [ARM] Use new assembler diags for ARM
This converts the ARM AsmParser to use the new assembly matcher error
reporting mechanism, which allows errors to be reported for multiple
instruction encodings when it is ambiguous which one the user intended
to use.

By itself this doesn't improve many error messages, because we don't have
diagnostic text for most operand types, but as we add that then this will allow
more of those diagnostic strings to be used when they are relevant.

Differential revision: https://reviews.llvm.org/D31530

llvm-svn: 314779
2017-10-03 10:26:11 +00:00
Simon Pilgrim d87af9a1c0 Remove unused variable. NFCI.
llvm-svn: 314778
2017-10-03 10:01:02 +00:00
Simon Pilgrim 640fbf5132 [X86][SSE] Add support for shuffle combining from PACKSS/PACKUS
Mentioned in D38472

llvm-svn: 314777
2017-10-03 09:54:03 +00:00
Simon Pilgrim 19d535e75b [X86][SSE] Add support for PACKSS/PACKUS constant folding
Pulled out of D38472

llvm-svn: 314776
2017-10-03 09:41:00 +00:00
Sjoerd Meijer 7a22a4948f ISel type legalization: add debug messages. NFCI.
This adds some more debug messages to the type legalizer and functions
like PromoteNode, ExpandNode, ExpandLibCall in an attempt to make
the debug messages a little bit more informative and useful.

Differential Revision: https://reviews.llvm.org/D38450

llvm-svn: 314773
2017-10-03 08:54:15 +00:00
Hiroshi Inoue 224661d94b [trivial] fix format, NFC
llvm-svn: 314769
2017-10-03 07:28:58 +00:00
Martin Storsjo 1e54738676 [X86] Provide the LSDA pointer with RIP relative addressing if necessary
This makes sure the LSDA pointer isn't truncated to 32 bit.

Make LowerINTRINSIC_WO_CHAIN a member function instead of a static
function, so that it can use the getGlobalWrapperKind method.

This solves the second half of the issues mentioned in PR34720.

Differential Revision: https://reviews.llvm.org/D38343

llvm-svn: 314767
2017-10-03 06:29:58 +00:00
Matt Arsenault 90c7593a75 AMDGPU: Remove global isGCN predicates
These are problematic because they apply to everything,
and can easily clobber whatever more specific predicate
you are trying to add to a function.

Currently instructions use SubtargetPredicate/PredicateControl
to apply this to patterns applied to an instruction definition,
but not to free standing Pats. Add a wrapper around Pat
so the special PredicateControls requirements can be appended
to the final predicate list like how Mips does it.

llvm-svn: 314742
2017-10-03 00:06:41 +00:00
Amjad Aboud 8ef85a088e [X86][NFC] Add X86CmovConverterPass to the pass registry.
Differential Revision: https://reviews.llvm.org/D38355

llvm-svn: 314726
2017-10-02 21:46:37 +00:00
Michael Liao c6004d0371 Remove dead file.
llvm-svn: 314720
2017-10-02 21:00:52 +00:00
Matt Arsenault c6baa85fc6 AMDGPU: Fix typos
llvm-svn: 314715
2017-10-02 20:31:18 +00:00
Walter Lee 35b09cbd42 Add support for Myriad ma2x8x series of CPUs
Summary: Also add support for some older Myriad CPUs that were missing.

Reviewers: jyknight

Subscribers: fedor.sergeev

Differential Revision: https://reviews.llvm.org/D37552

llvm-svn: 314705
2017-10-02 18:50:48 +00:00
Bjorn Pettersson 8e978c0151 [X86][SSE] Fix -Wsign-compare problems introduced in r314658
The refactoring in
"[X86][SSE] Add createPackShuffleMask helper function. NFCI."
resulted in warning when compiling the code (seen in build bots).

This patch restores some types from int to unsigned to avoid
those warnings.

llvm-svn: 314667
2017-10-02 12:46:38 +00:00
Simon Pilgrim e2e27aff9b [X86][SSE] Add createPackShuffleMask helper function. NFCI.
llvm-svn: 314658
2017-10-02 10:12:51 +00:00
Simon Pilgrim c04c7443ea [X86][SSE] matchBinaryVectorShuffle - add support for different src/dst value shuffle types
Preparation for support for combining to PACKSS/PACKUS

llvm-svn: 314656
2017-10-02 09:45:08 +00:00
Hiroshi Inoue dcedd66b00 [PowerPC] support ZERO_EXTEND in tryBitPermutation
This patch add a support of ISD::ZERO_EXTEND in PPCDAGToDAGISel::tryBitPermutation to increase the opportunity to use rotate-and-mask by reordering ZEXT and ANDI.
Since tryBitPermutation stops analyzing nodes if it hits a ZEXT node while traversing SDNodes, we want to avoid ZEXT between two nodes that can be folded into a rotate-and-mask instruction.

For example, we allow these nodes

      t9: i32 = add t7, Constant:i32<1>
    t11: i32 = and t9, Constant:i32<255>
  t12: i64 = zero_extend t11
t14: i64 = shl t12, Constant:i64<2>

to be folded into a rotate-and-mask instruction.
Such case often happens in array accesses with logical AND operation in the index, e.g. array[i & 0xFF];

Differential Revision: https://reviews.llvm.org/D37514

llvm-svn: 314655
2017-10-02 09:24:00 +00:00
Simon Pilgrim 3bbbf31590 Fix typo in comment. NFCI.
llvm-svn: 314653
2017-10-02 09:10:50 +00:00
Simon Pilgrim e575651370 [X86] Cleanup uses of computeKnownBits by using MaskedValueIsZero helper instead. NFCI.
llvm-svn: 314652
2017-10-02 09:08:45 +00:00
Michael Zuckerman e4084f6bdb [X86][LLVM]Expanding Supports lowerInterleaved{store|load}() in X86InterleavedAccess (VF64 stride 3-4)
I continue to support different VF interleaved and in this pass for this patch,
I added the vf64 stride3 support for both load and store.
I also added support fot the stride4 store.

Reviewers:
1. zvi
2. dorit
3. igorb
4. guyblank

Differential Revision: https://reviews.llvm.org/D37687

Change-Id: I3d238efedf217d1768b348d710de1efa2f19d27b
llvm-svn: 314651
2017-10-02 07:35:25 +00:00
Craig Topper d37625859a [X86] Fix copy pasto in X86FastISel::fastEmitInst_rrrr.
The 4th operand was not being constrained and the third operand was being constrained twice.

llvm-svn: 314648
2017-10-02 05:46:53 +00:00
Craig Topper bb7866162c [X86] Use a bool flag instead of assigning an unsigned to two different values that we only use in an equality comparison.
llvm-svn: 314647
2017-10-02 05:46:52 +00:00
Craig Topper c05c390a7c [X86] Use _NOREX MOVZX instructions for some patterns even in 32-bit mode.
This unifies the patterns between both modes. This should be effectively NFC since all the available registers in 32-bit mode statisfy this constraint.

llvm-svn: 314643
2017-10-02 00:44:50 +00:00
Ron Lieberman 9bcdd80b66 [Hexagon] Check vector elements for equivalence in the HexagonVectorLoopCarriedReuse pass
If the two instructions being compared for equivalence have corresponding operands
    that are integer constants, then check their values to determine equivalence.
    
    Patch by Suyog Sarda!
    

llvm-svn: 314642
2017-10-02 00:34:07 +00:00
Ron Lieberman f90493d220 [Hexagon] Patch to Extract i1 element from vector of i1
This patch extracts 1 element from vector consisting
of elements of size 1 bit at given index.

llvm-svn: 314641
2017-10-02 00:16:15 +00:00
Craig Topper c20b46da2f [X86] Change register&memory TEST instructions from MRMSrcMem to MRMDstMem
Summary:
Intel documentation shows the memory operand as the first operand. But we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order and the isel matching is commutable.

For the register&register form order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register&register form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem.

I believe this supercedes D38025 which was trying to switch the register&register form back to pre-PR22995.

Reviewers: aymanmus, RKSimon, zvi

Reviewed By: aymanmus

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38120

llvm-svn: 314639
2017-10-01 23:53:53 +00:00
Craig Topper 00230604d3 [X86] Remove a couple unnecessary COPY_TO_REGCLASS from some output patterns where the instruction already produces the correct register class.
llvm-svn: 314638
2017-10-01 23:53:50 +00:00
Simon Pilgrim df23a2700d [X86][SSE] Add faux shuffle combining support for PACKUS
llvm-svn: 314631
2017-10-01 18:43:48 +00:00
Simon Pilgrim 836fa6dcfd [X86][SSE] Improve shuffle combining of PACKSS instructions.
Support unary packing and fix the faux shuffle mask for vectors larger than 128 bits.

llvm-svn: 314629
2017-10-01 17:54:55 +00:00
Sanjay Patel c7076a3ba9 [x86] formatting; NFC
llvm-svn: 314627
2017-10-01 14:39:10 +00:00
Simon Pilgrim a8dd6f4f30 [X86][SSE] Fold (VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1
Remove sign extend in register style pattern if the sign is already extended enough

llvm-svn: 314599
2017-09-30 17:57:34 +00:00
Craig Topper 619569841a [AVX-512] Add patterns to make fp compare instructions commutable during isel.
llvm-svn: 314598
2017-09-30 17:02:39 +00:00
Michael Zuckerman b92b6d424f Code refactoring for the interleaved code <NFC>
Change-Id: I7831c9febad8e14278a5bc87584a0053dc837be1
llvm-svn: 314596
2017-09-30 14:55:03 +00:00
Craig Topper d92ade96f4 [X86] Support v64i8 mulhu/mulhs
Implemented by splitting into two v32i8 mulhu/mulhs and concatenating the results.

Differential Revision: https://reviews.llvm.org/D38307

llvm-svn: 314584
2017-09-30 04:21:46 +00:00
Stanislav Mekhanoshin 1d8cf2be89 [AMDGPU] Set fast-math flags on functions given the options
We have a single library build without relaxation options.
When inlined library functions remove fast math attributes
from the functions they are integrated into.

This patch sets relaxation attributes on the functions after
linking provided corresponding relaxation options are given.
Math instructions inside the inlined functions remain to have
no fast flags, but inlining does not prevent fast math
transformations of a surrounding caller code anymore.

Differential Revision: https://reviews.llvm.org/D38325

llvm-svn: 314568
2017-09-29 23:40:19 +00:00
Nicolai Haehnle ce4ddd06da AMDGPU: VALU carry-in and v_cndmask condition cannot be EXEC
The hardware will only forward EXEC_LO; the high 32 bits will be zero.

Additionally, inline constants do not work. At least,

   v_addc_u32_e64 v0, vcc, v0, v1, -1

which could conceivably be used to combine (v0 + v1 + 1) into a single
instruction, acts as if all carry-in bits are zero.

The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine

   s_mov_b64 s[0:1], exec
   v_cndmask_b32_e64 v0, v1, v2, s[0:1]

into

   v_mov_b32 v0, v3

but it's not particularly high priority.

Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.*

llvm-svn: 314522
2017-09-29 15:37:31 +00:00
Jonas Paulsson c9e363ac69 [SystemZ] implement shouldCoalesce()
Implement shouldCoalesce() to help regalloc avoid running out of GR128
registers.

If a COPY involving a subreg of a GR128 is coalesced, the live range of the
GR128 virtual register will be extended. If this happens where there are
enough phys-reg clobbers present, regalloc will run out of registers (if
there is not a single GR128 allocatable register available).

This patch tries to allow coalescing only when it can prove that this will be
safe by checking the (local) interval in question.

Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D37899
https://bugs.llvm.org/show_bug.cgi?id=34610

llvm-svn: 314516
2017-09-29 14:31:39 +00:00
Amara Emerson 7d6c55f8aa [X86] Improve codegen for inverted overflow checking intrinsics.
Adds a new combine for: xor(setcc cc, val), 1 --> setcc (invert(cc), val)

Differential Revision: https://reviews.llvm.org/D38161

llvm-svn: 314514
2017-09-29 13:53:44 +00:00
Sam Parker 963da5b119 [ARM] v8.3-a complex number support
New instructions are added to AArch32 and AArch64 to aid
floating-point multiplication and addition of complex numbers, where
the complex numbers are packed in a vector register as a pair of
elements. The Imaginary part of the number is placed in the more
significant element, and the Real part of the number is placed in the
less significant element.

This patch adds assembler for the ARM target.

Differential Revision: https://reviews.llvm.org/D36789

llvm-svn: 314511
2017-09-29 13:11:33 +00:00
Michael Zuckerman 0b5db55b96 Small modification <NFC>
Change-Id: I360abccee12cae29bd2ac4f8399c9ecc92eb7f13
llvm-svn: 314510
2017-09-29 12:45:54 +00:00
Aleksandar Beserminji 29341b88ac [mips] Reordering callseq* nodes to be linear
Fix nested callseq* nodes by moving callseq_start after the
arguments calculation to temporary registers, so that callseq* nodes
in resulting DAG are linear.

Recommitting r314497. This version does not contain test which fails
when compiler is not build in debug mode.

Differential Revision: https://reviews.llvm.org/D37328

llvm-svn: 314507
2017-09-29 11:05:02 +00:00
Aleksandar Beserminji a0a01e7172 Revert "[mips] Reordering callseq* nodes to be linear"
Added test relies on the compiler being built in debug mode,
which may not be the case.

This reverts commit r314497.

llvm-svn: 314506
2017-09-29 10:52:03 +00:00
Simon Dardis f21d8d6ad5 [mips] Add missing license info, formatting changes. NFCI
Add missing license information to MicroMipsInstrFPU.td and
fix most of the formatting errors present. Others will be
addressed in a follow up commits.

llvm-svn: 314505
2017-09-29 10:08:06 +00:00
Tim Renouf ef1ae8ffac [AMDGPU] calling conventions for AMDPAL OS type
Summary:
This commit adds comments on how the AMDPAL OS type overloads the
existing AMDGPU_ calling conventions used by Mesa, and adds a couple of
new ones.

Reviewers: arsenm, nhaehnle, dstuttard

Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D37752

llvm-svn: 314502
2017-09-29 09:51:22 +00:00
Tim Renouf 132291589f [AMDGPU] AMDPAL scratch buffer support
Summary:
Added support for scratch (including spilling) for OS type amdpal:
generates code to set up the scratch descriptor if it is needed.

With amdpal, the scratch resource descriptor is loaded from offset 0 of
the global information table. The low 32 bits of the address of the
global information table is passed in s0.

Added amdgpu-git-ptr-high function attribute to hard-wire the high 32
bits of the address of the global information table. If the function
attribute is not specified, or is 0xffffffff, then the backend generates
code to use the high 32 bits of pc.

The documentation for the AMDPAL ABI will be added in a later commit.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye

Differential Revision: https://reviews.llvm.org/D37483

llvm-svn: 314501
2017-09-29 09:49:35 +00:00
Tim Renouf 9f7ead3334 [Triple] Add AMDPAL operating system type
Summary:
This operating system type represents the AMDGPU PAL runtime, and will
be required by the AMDGPU backend in order to generate correct code for
this runtime.

Currently it generates the same code as not specifying an OS at all.
That will change in future commits.

Patch from Tim Corringham.

Subscribers: arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D37380

llvm-svn: 314500
2017-09-29 09:48:12 +00:00
Aleksandar Beserminji 502dcb035a [mips] Reordering callseq* nodes to be linear
Fix nested callseq* nodes by moving callseq_start after the
arguments calculation to temporary registers, so that callseq* nodes
in resulting DAG are linear.

Differential Revision: https://reviews.llvm.org/D37328

llvm-svn: 314497
2017-09-29 09:32:14 +00:00
Coby Tayree c3d24118e8 [X86][MS-InlineAsm] Extended support for variables / identifiers on memory / immediate expressions
Allow the proper recognition of Enum values and global variables inside ms inline-asm memory / immediate expressions, as they require some additional overhead and treated incorrect if doesn't early recognized.
supersedes D33278, D35774

Differential Revision: https://reviews.llvm.org/D37412

llvm-svn: 314493
2017-09-29 07:02:46 +00:00
Craig Topper 6255c7b675 [X86] Don't select (cmp (and, imm), 0) to testw
Summary:
X86ISelDAGToDAG tries to analyze ANDs compared with 0 to optimize to narrower immediates using subregisters.

I don't think we should be optimizing to 16-bit test instructions. It goes against our normal behavior of promoting i16 operations to i32. It only saves one byte due to the need to add a 0x66 prefix. I think it would also be subject to a length changing prefix penalty in the decoders on Intel CPUs.

Reviewers: RKSimon, zvi, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38273

llvm-svn: 314474
2017-09-28 23:35:36 +00:00
Matthias Braun 51687912a4 ARM: Fix cases where CSI Restored bit is not cleared
LR is an untypical callee saved register in that it is restored into a
different register (PC) and thus does not live-out of the return block.
This case requires the `Restored` flag in CalleeSavedInfo to be cleared.

This fixes a number of cases where this wasn't handled correctly yet.

llvm-svn: 314471
2017-09-28 23:12:06 +00:00
Yonghong Song ef29a84d48 bpf: fix a bug for disassembling ld_pseudo inst
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 314469
2017-09-28 22:47:34 +00:00
Eugene Zelenko 3b87336a0c [Hexagon] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 314467
2017-09-28 22:27:31 +00:00
Ulrich Weigand df86855f61 [SystemZ] Fix fall-out from r314428
The expensive-checks build bot found a problem with the r314428 commit:
if CC is live after a ATOMIC_CMP_SWAPW instruction, it needs to be
marked as live-in to the block after the loop the pseudo gets expanded
to.  This actually fixes a code-gen bug as well, since if the CC isn't
live, the CR and JLH are merged to a CRJLH which doesn't actually set
the condition code any more.

llvm-svn: 314465
2017-09-28 22:08:25 +00:00
Craig Topper ed19350293 [X86] Make use of vpmovwb when possible in LowerMULH
If we have BWI, we can truncate in a much simpler way by using vpmovwb. This even works without VLX by using the wider zmm->ymm truncate with a subvector extract.

Differential Revision: https://reviews.llvm.org/D38375

llvm-svn: 314457
2017-09-28 20:10:34 +00:00
Martin Storsjo d6218cc385 [ARM] Restore the right frame pointer register in Int_eh_sjlj_longjmp
In setupEntryBlockAndCallSites in CodeGen/SjLjEHPrepare.cpp,
we fetch and store the actual frame pointer, but on return via
the longjmp intrinsic, it always was restored into the r7 variable.

On windows, the frame pointer should be restored into r11 instead of r7.

On Darwin (where sjlj exception handling is used by default), the frame
pointer is always r7, both in arm and thumb mode, and likewise, on
windows, the frame pointer always is r11.

On linux however, if sjlj exception handling is enabled (which it isn't
by default), libcxxabi and the user code can be built in differing modes
using different registers as frame pointer. Therefore, when restoring
registers on a platform where we don't always use the same register
depending on code mode, restore both r7 and r11.

Differential Revision: https://reviews.llvm.org/D38253

llvm-svn: 314451
2017-09-28 19:04:30 +00:00
Martin Storsjo adceba59a2 [ARM] Fix SJLJ exception handling when manually chosen on a platform where it isn't default
Differential Revision: https://reviews.llvm.org/D38252

llvm-svn: 314450
2017-09-28 19:04:14 +00:00
Craig Topper 3819be6cf6 [X86] Use target independent ZERO_EXTEND/SIGN_EXTEND nodes were possible in LowerMULH
We aren't do any in register extends here so we should be able to just the target independent nodes directly and allow them to be lowered as necessary.

llvm-svn: 314447
2017-09-28 18:45:28 +00:00
Craig Topper fc104bfbc0 [X86] Move a setOperation action for ISD::TRUNCATE near another one in the same if. Remove one that is redundant with another subtarget features.
llvm-svn: 314446
2017-09-28 18:45:27 +00:00
Craig Topper ceff6da6e9 [X86] Use BWI instructions to improve lowering of v32i8 MULHU/S
Summary: If we have BWI instructions we can widen to v32i16 to do the multiply instead of splitting.

Reviewers: RKSimon, spatel, zvi

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38305

llvm-svn: 314432
2017-09-28 17:00:21 +00:00
Craig Topper fd6b8a67fb [X86] Remove dead code from X86ISelDAGToDAG.cpp multiply handling
Summary:
Lowering never creates X86ISD::UMUL for 8-bit types. X86ISD::UMUL8 is used instead. If X86ISD::UMUL 8-bit were ever used it would crash.

DAGCombiner replaces UMUL_LOHI/SMUL_LOHI with a wider MUL and a shift if the type twice as wide is legal. So we should never see i8 UMUL_LOHI/SMUL_LOHI. In fact I think there was a bug in part of the i8 code. Similar is true for i16 though without the bug.

Reviewers: RKSimon, spatel, zvi

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38276

llvm-svn: 314430
2017-09-28 16:56:36 +00:00
Craig Topper 71a8cf9f99 [X86] Use correct subvector index when combining two insert subvectors featuring zero vectors.
Previously we were using one of the subvector indices twice. The included test case causes an assert without this change.

Thanks to Simon Pilgrim for catching this.

llvm-svn: 314429
2017-09-28 16:53:16 +00:00
Ulrich Weigand 0f1de04979 [SystemZ] Custom-expand ATOMIC_CMP_AND_SWAP_WITH_SUCCESS
The SystemZ compare-and-swap instructions already provide the "success"
indication via a condition-code value, so the default expansion of those
operations generates an unnecessary extra comparsion.

llvm-svn: 314428
2017-09-28 16:22:54 +00:00
Simon Pilgrim 2ff339303e Use SDValue::getConstantOperandVal helper. NFCI.
llvm-svn: 314425
2017-09-28 15:53:27 +00:00
Simon Dardis c8e33c5ca1 [mips] Remove codegen support for branch likely instructions.
This patch disables codegen support for branch likely instructions to
address a potential bug. These branches were unselectable as
they had the same patterns as the normal branches but came after them
when ISel was concerned.

The branch likely instructions were marked as having no delay
slots when they have annulling delay slots. The delay slot filler
does not currently handle annulling delay slot branches, so this
would lead to wrong codegen if these branches were generated.

Reviewers: atanasyan, nitesh.jain

Differential Revision: https://reviews.llvm.org/D38169

llvm-svn: 314421
2017-09-28 15:24:07 +00:00
Coby Tayree 566348f2a0 [x86][AsmParser] Allow some more MS size directives
MS allows the following size directives: float/double and long as synonymous to dword/qword and dword, respectively.
Differential Revision: https://reviews.llvm.org/D37190

llvm-svn: 314410
2017-09-28 11:04:08 +00:00
Alex Bradbury 5518cbfc41 Teach TargetInstrInfo::getInlineAsmLength to parse .space directives with integer arguments
It's currently quite difficult to test passes like branch relaxation, which
requires branches with large displacement to be generated. The .space assembler
directive makes it easy to create arbitrarily large basic blocks, but
getInlineAsmLength is not able to parse it and so the size of the block is not
correctly estimated. Other backends (AArch64, AMDGPU) introduce options just
for testing that artificially restrict the ranges of branch instructions (e.g.
aarch64-tbz-offset-bits). Although parsing a single form of the .space
directive feels inelegant, it does allow a more direct testing approach.

This patch adapts the .space parsing code from
Mips16InstrInfo::getInlineAsmLength and removes it now the extra functionality
is provided by the base implementation. I want to move this functionality to
the generic getInlineAsmLength as 1) I need the same for RISC-V, and 2) I feel
other backends will benefit from more direct testing of large branch
displacements.

Differential Revision: https://reviews.llvm.org/D37798

llvm-svn: 314393
2017-09-28 09:31:46 +00:00
Hiroshi Inoue 79c0bec06e [PowerPC] eliminate partially redundant compare instruction
This is a follow-on of D37211.
D37211 eliminates a compare instruction if two conditional branches can be made based on the one compare instruction, e.g.
if (a == 0) { ... }
else if (a < 0) { ... }

This patch extends this optimization to support partially redundant cases, which often happen in while loops.
For example, one compare instruction is moved from the loop body into the preheader by this optimization in the following example.
do {
  if (a == 0) dummy1();
  a = func(a);
} while (a > 0);

Differential Revision: https://reviews.llvm.org/D38236

llvm-svn: 314390
2017-09-28 08:38:19 +00:00
Alex Bradbury 9d3f12501a [RISCV] Add common fixups and relocations
%lo(), %hi(), and %pcrel_hi() are supported and test cases have been added to 
ensure the appropriate fixups and relocations are generated. I've added an 
instruction format field which is used in RISCVMCCodeEmitter to, for 
instance, tell whether it should emit a lo12_i fixup or a lo12_s fixup 
(RISC-V has two 12-bit immediate encodings depending on the instruction 
type).

Differential Revision: https://reviews.llvm.org/D23568

llvm-svn: 314389
2017-09-28 08:26:24 +00:00
Yonghong Song e9165f8720 bpf: add new insns for bswap_to_le and negation
This patch adds new insn, "reg = be16/be32/be64 reg",
for bswap to little endian for big-endian target (bpfeb).
It also adds new insn for negation "reg = -reg".

Currently, for source code, e.g.,
  b = -a
LLVM still prefers to generate:
  b = 0 - a
But "reg = -reg" format can be used in assembly code.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 314376
2017-09-28 02:46:11 +00:00
Galina Kistanova 1c6f0bb63e Reverted r313993.
This patch produces a crash and hexagon_vector_loop_carried_reuse_constant.ll test fails on Windows (llvm-clang-x86_64-expensive-checks-win build bot).

llvm-svn: 314361
2017-09-27 23:09:14 +00:00
Jessica Paquette 4cf187b5b4 [MachineOutliner] AArch64: Avoid saving + restoring LR if possible
This commit allows the outliner to avoid saving and restoring the link register
on AArch64 when it is dead within an entire class of candidates.

This introduces changes to the way the outliner interfaces with the target.
For example, the target now interfaces with the outliner using a
MachineOutlinerInfo struct rather than by using getOutliningCallOverhead and
getOutliningFrameOverhead.

This also improves several comments on the outliner's cost model.

https://reviews.llvm.org/D36721

llvm-svn: 314341
2017-09-27 20:47:39 +00:00
Craig Topper c16a472966 Revert r314249 "Recommit r314151 "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST."""
This caused PR34751

llvm-svn: 314339
2017-09-27 20:34:17 +00:00
Craig Topper e0d8290094 Revert r314248 "[X86] Don't emit X86::MOV8rr_NOREX from X86InstrInfo::copyPhysReg."
This contributed to PR34751

llvm-svn: 314338
2017-09-27 20:34:13 +00:00
Simon Pilgrim 870007b4f8 [X86][SSE] Pull out variable shuffle mask combine logic. NFCI.
Hopefully this will make it easier to vary the combine depth threshold per-target.

llvm-svn: 314337
2017-09-27 20:19:53 +00:00
Craig Topper 7b1d503d7f [X86] Rewrite the zero vector checks in lowerV2X128VectorShuffle to use the Zeroable APInt
We already have zeroable bits in an APInt. We might as well use that instead of checking for an all zero BUILD_VECTOR.

Differential Revision: https://reviews.llvm.org/D37950

llvm-svn: 314332
2017-09-27 18:56:20 +00:00
Craig Topper 05f71dd036 [X86] In combineLoopSADPattern, pad result with zeros and use full size add instead of using a smaller add and inserting.
In some cases the result psadbw is smaller than the type of the add that started the match. Currently in these cases we are using a smaller add and inserting the result.

If we instead combine the psadbw with zeros and use the full size add we can take advantage of implicit zeroing we get if we emit a narrower move before the add.

In a future patch, I want to make isel aware that the psadbw itself already zeroed the upper bits and remove the move entirely.

Differential Revision: https://reviews.llvm.org/D37453

llvm-svn: 314331
2017-09-27 18:36:45 +00:00
Geoff Berry c032b2beb0 [AArch64][Falkor] Ignore SP based loads in HW prefetch fixups.
Reviewers: mcrosier

Subscribers: aemerson, rengolin, javed.absar, kristof.beyls

Differential Revision: https://reviews.llvm.org/D38301

llvm-svn: 314319
2017-09-27 17:14:10 +00:00
Sanjay Patel 0f9b4773c1 [SimplifyCFG] add a struct to house optional folds (PR34603)
This was intended to be no-functional-change, but it's not - there's a test diff.

So I thought I should stop here and post it as-is to see if this looks like what was expected 
based on the discussion in PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603

Notes:
 1. The test improvement occurs because the existing 'LateSimplifyCFG' marker is not carried 
    through the recursive calls to 'SimplifyCFG()->SimplifyCFGOpt().run()->SimplifyCFG()'. 
    The parameter isn't passed down, so we pick up the default value from the function signature 
    after the first level. I assumed that was a bug, so I've passed 'Options' down in all of the 
    'SimplifyCFG' calls.

 2. I split 'LateSimplifyCFG' into 2 bits: ConvertSwitchToLookupTable and KeepCanonicalLoops. 
    This would theoretically allow us to differentiate the transforms controlled by those params 
    independently.

 3. We could stash the optional AssumptionCache pointer and 'LoopHeaders' pointer in the struct too. 
    I just stopped here to minimize the diffs.

 4. Similarly, I stopped short of messing with the pass manager layer. I have another question that 
    could wait for the follow-up: why is the new pass manager creating the pass with LateSimplifyCFG 
    set to true no matter where in the pipeline it's creating SimplifyCFG passes?

    // Create an early function pass manager to cleanup the output of the
    // frontend.
    EarlyFPM.addPass(SimplifyCFGPass());

    -->

    /// \brief Construct a pass with the default thresholds
    /// and switch optimizations.
    SimplifyCFGPass::SimplifyCFGPass()
       : BonusInstThreshold(UserBonusInstThreshold),
         LateSimplifyCFG(true) {}   <-- switches get converted to lookup tables and loops may not be in canonical form

    If this is unintended, then it's possible that the current behavior of dropping the 'LateSimplifyCFG' 
    setting via recursion was masking this bug.

Differential Revision: https://reviews.llvm.org/D38138

llvm-svn: 314308
2017-09-27 14:54:16 +00:00
Hiroshi Inoue ed1ffa49a4 [PowerPC] eliminate unconditional branch to the next instruction
This patch makes analyzeBranch eliminate unconditional branch to the next instruction.
After basic blocks are re-organized by optimizers, such as machine block placement, a BB may end with an unconditional branch to the next (fallthrough) BB. This patch removes such redundant branch instruction.

Differential Revision: https://reviews.llvm.org/D37730

llvm-svn: 314297
2017-09-27 10:33:02 +00:00
Coby Tayree 836c50cc2f [X86][AsmParser] fix PR32035
Differential Revision: https://reviews.llvm.org/D37473

llvm-svn: 314295
2017-09-27 10:29:29 +00:00
Simon Pilgrim 3b0d9e789e [X86][AVX] Improve (i4 bitcast (v4i1 x)) handling for 256-bit vector compare results.
As commented on D37849 and rL313547, AVX1 targets were missing a chance to use vmovmskpd for v4f64/v4i64 results for bool vector bitcasts

llvm-svn: 314293
2017-09-27 10:10:17 +00:00
Sam Parker 211f47aa37 [ARM] isTruncateFree fix
I implemented isTruncateFree in rL313533, this patch fixes the logic
to match my comment, as the previous logic was too general. Now the
only truncates that are free are i64 -> i32.

Differential Revision: https://reviews.llvm.org/D38234

llvm-svn: 314280
2017-09-27 08:30:45 +00:00
Martin Storsjo aa1533bf9b [X86] Fix SJLJ struct offsets for x86_64
This is necessary, but not sufficient, for having working SJLJ exception
handling on x86_64.

Differential Revision: https://reviews.llvm.org/D38254

llvm-svn: 314277
2017-09-27 06:08:23 +00:00
Martin Storsjo eccaf04e40 [X86] Remove erroneous callsite offsetting in SJLJ landing pads
The callsite value is already stored indexed from 0 in
the _Unwind_Context struct. When accessed via the functions
_Unwind_GetIP and _Unwind_SetIP, the value is indexed from 1,
but those functions handle the offseting. When reading directly
from the struct here, we shouldn't subtract 1.

This matches the code generated by the ARM target, where SJLJ
exception handling is used by default on iOS.

This makes clang-built object files for 32 bit x86 mingw work when
linked with libgcc/libstdc++.

Differential Revision: https://reviews.llvm.org/D38251

llvm-svn: 314276
2017-09-27 06:08:16 +00:00
Craig Topper 177a3923ce [X86] Use extract128BitVector in LowerMULH so we can extract from constant build vectors.
llvm-svn: 314274
2017-09-27 06:04:55 +00:00
Geoff Berry bbfa246ad3 [AArch64][Falkor] Fix bug in falkor prefetcher fix pass.
Summary:
In rare cases, loads that don't get prefetched that were marked as
strided loads could cause a crash if they occurred in a loop with other
colliding loads.

Reviewers: mcrosier

Subscribers: aemerson, rengolin, javed.absar, kristof.beyls

Differential Revision: https://reviews.llvm.org/D38261

llvm-svn: 314252
2017-09-26 21:40:46 +00:00
Geoff Berry a4b2f5df5e [AArch64][Falkor] Fix correctness bug in falkor prefetcher fix pass and correct some opcode tag computations.
Summary:
This addresses a correctness bug for LD[1234]*_POST opcodes that have
the prefetcher fix applied to them: the base register was not being
written back from the temp after being incremented, so it would appear
to never be incremented.

Also, fix some opcode tag computations based on some updated HW details
to get better tag avoidance and thus better prefetcher performance.

Reviewers: mcrosier

Subscribers: aemerson, rengolin, javed.absar, kristof.beyls

Differential Revision: https://reviews.llvm.org/D38256

llvm-svn: 314251
2017-09-26 21:40:41 +00:00
Craig Topper b7e4c94c6c [X86] Fix register class name in a comment. NFC
llvm-svn: 314250
2017-09-26 21:35:11 +00:00
Craig Topper 7f0eeb428b Recommit r314151 "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST.""
The late MOV8rr_NOREX that caused the crash has been removed.

llvm-svn: 314249
2017-09-26 21:35:09 +00:00
Craig Topper ab3c0075b8 [X86] Don't emit X86::MOV8rr_NOREX from X86InstrInfo::copyPhysReg.
This hook is called after register allocation with two physical registers. We don't need a separate instruction at that time to force register class constraints. I left in the assert though. We also have a fatal error in X86MCCodeEmitter if we ever encode an H-reg and a REX prefix.

llvm-svn: 314248
2017-09-26 21:35:06 +00:00
Craig Topper 0768bced39 [X86] Fix typo in comment. NFC
llvm-svn: 314247
2017-09-26 21:35:04 +00:00
Nemanja Ivanovic e22ebeab1a [PowerPC] Reverting sequence of patches for elimination of comparison instructions
In the past while, I've committed a number of patches in the PowerPC back end
aimed at eliminating comparison instructions. However, this causes some failures
in proprietary source and these issues are not observed in SPEC or any open
source packages I've been able to run.
As a result, I'm pulling the entire series and will refactor it to:
- Have a single entry point for easy control
- Have fine-grained control over which patterns we transform

A side-effect of this is that test cases for these patches (and modified by
them) are XFAIL-ed. This is a temporary measure as it is counter-productive
to remove/modify these test cases and then have to modify them again when
the refactored patch is recommitted.
The failure will be investigated in parallel to the refactoring effort and
the recommit will either have a fix for it or will leave this transformation
off by default until the problem is resolved.

llvm-svn: 314244
2017-09-26 20:42:47 +00:00
Michael Zuckerman 645f777e40 [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF{8|16|32} stride 3)
This patch expands the support of lowerInterleavedStore to {8|16|32}x8i stride 3.

LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8|16|32}) .
This patch is part two of two patches and it covers the store (interlevaed) side.

The patch goal is to optimize the following sequence:
a0 a1 a2 a3 a4 a5 a6 a7
b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7

into
a0 b0 c0 a1 b1 c1 a2 b2
c2 a3 b3 c3 a4 b4 c4 a5
b5 c5 a6 b6 c6 a7 b7 c7

Reviewers:
zvi
guyblank
dorit
Ayal

Differential Revision: https://reviews.llvm.org/D37117

Change-Id: I56ced8bcbea809a37654060771911ade20246ccc
llvm-svn: 314234
2017-09-26 18:49:11 +00:00