Commit Graph

116221 Commits

Author SHA1 Message Date
Wei Mi 10ef7d20b2 [RegAlloc][NFC] Fix the help string of the option "huge-size-for-split".
llvm-svn: 337402
2018-07-18 16:56:33 +00:00
Nirav Dave e24fcd5382 [MC] Fix nested macro body parsing
Add missing .rep case in nestlevel checking for macro body parsing.

llvm-svn: 337398
2018-07-18 16:17:03 +00:00
Simon Atanasyan e9d7b3198a [mips] Fix predicate for the MipsTruncIntFP pattern
This is a follow-up to the rL337171. This patch fixes regression
introduced by the r337171 and enables MipsTruncIntFP pattern.

Differential revision: https://reviews.llvm.org/D49469

llvm-svn: 337392
2018-07-18 14:11:22 +00:00
Simon Pilgrim 2b37ddce4b [SLPVectorizer] Avoid duplicate scalar cost calculations in BoUpSLP::getEntryCost. NFCI.
Pulled out from D49225, we have a lot of repeated scalar cost calculations, often with arguments that don't look the same but turn out to be.

llvm-svn: 337390
2018-07-18 13:53:55 +00:00
Tim Northover 43d64b0b36 [Support] Build fix for Haiku when checking for a local filesystem
Haiku does not expose information about local versus remote mounts, so just
return false, like Cygwin.

Patch by Niels Sascha Reedijk.

llvm-svn: 337389
2018-07-18 13:42:18 +00:00
Simon Pilgrim 3a45369b9e [X86][SSE] Remove BLENDPD canonicalization from combineTargetShuffle
When rL336971 removed the scalar-fp isel patterns, we lost the need for this canonicalization - commutation/folding can handle everything else.

llvm-svn: 337387
2018-07-18 13:01:20 +00:00
Tim Northover e00cf4fc68 ARM: stop explicitly marking armv7k libcalls as hard-float. NFC.
Since the triple's default is hard float, the libcalls will already use VFP
registers.

llvm-svn: 337386
2018-07-18 12:37:43 +00:00
Tim Northover d4abd14c1b ARM: switch armv7em triple to hard-float defaults and libcalls.
We were emitting incorrect calls to libm functions that LLVM had decided it
knew about because the default is soft-float.

llvm-svn: 337385
2018-07-18 12:37:04 +00:00
Tim Northover 097a3e3d95 ARM: deduplicate hard-float detection code. NFC.
ARMSubtarget had a copy/pasted block to determine whether the target was
hard-float, but it just delegated to triple features anyway so it's better at
the TargetMachine level.

llvm-svn: 337384
2018-07-18 12:36:25 +00:00
Sander de Smalen 330d887d72 [AArch64][SVE] Asm: Support for unpredicated FP operations.
This patch adds support for the following unpredicated
floating-point instructions:

  FADD      Floating point add
  FSUB      Floating point subtract
  FMUL      Floating point multiplication
  FTSMUL    Floating point trigonometric starting value
  FRECPS    Floating point reciprocal step
  FRSQRTS   Floating point reciprocal square root step

The instructions have the following assembly format:
  fadd z0.h, z1.h, z2.h
and have variants for 16, 32 and 64-bit FP elements.

llvm-svn: 337383
2018-07-18 11:59:12 +00:00
Roman Lebedev 3cb87e905c [InstCombine] Re-commit: Fold 'check for [no] signed truncation' pattern
Summary:
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.

The DAGCombine will reverse this transform, see
https://reviews.llvm.org/D49266

This transform is surprisingly frustrating.
This does not deal with non-splat shift amounts, or with undef shift amounts.
I've outlined what i think the solution should be:
```
  // Potential handling of non-splats: for each element:
  //  * if both are undef, replace with constant 0.
  //    Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0.
  //  * if both are not undef, and are different, bailout.
  //  * else, only one is undef, then pick the non-undef one.
```

This is a re-commit, as the original patch, committed in rL337190
was reverted in rL337344 as it broke chromium build:
https://bugs.llvm.org/show_bug.cgi?id=38204 and
https://crbug.com/864832
Proofs that the fixed folds are ok: https://rise4fun.com/Alive/VYM

Differential Revision: https://reviews.llvm.org/D49320

llvm-svn: 337376
2018-07-18 10:55:17 +00:00
Daniel Cederman 959c8bf51c Revert "[Sparc] Use the IntPair reg class for r constraints with value type f64"
This reverts commit 55222c9183c6e07f53a54c4061677734f54feac1.

I missed that this patch has a dependency on https://reviews.llvm.org/D49219
that has not been approved yet.

llvm-svn: 337373
2018-07-18 10:05:30 +00:00
Sander de Smalen ccdc7ebc1d [AArch64][SVE] Asm: Support for UDOT/SDOT instructions.
The signed/unsigned DOT instructions perform a dot-product on
quadtuplets from two source vectors and accumulate the result in
the destination register. The instructions come in two forms:

Vector form, e.g.
  sdot  z0.s, z1.b, z2.b     - signed dot product on four 8-bit quad-tuplets,
                               accumulating results in 32-bit elements.

  udot  z0.d, z1.h, z2.h     - unsigned dot product on four 16-bit quad-tuplets,
                               accumulating results in 64-bit elements.

Indexed form, e.g.
  sdot  z0.s, z1.b, z2.b[3]  - signed dot product on four 8-bit quad-tuplets
                               with specified quadtuplet from second
                               source vector, accumulating results in 32-bit
                               elements.
  udot  z0.d, z1.h, z2.h[1]  - dot product on four 16-bit quad-tuplets
                               with specified quadtuplet from second
                               source vector, accumulating results in 64-bit
                               elements.

llvm-svn: 337372
2018-07-18 09:37:51 +00:00
Daniel Cederman 4e38df18ea [Sparc] Use the IntPair reg class for r constraints with value type f64
Summary: This is how it appears to be handled in GCC and it prevents a
"Unknown mismatch" error in the SelectionDAGBuilder.

Reviewers: venkatra, jyknight, jrtc27

Reviewed By: jyknight, jrtc27

Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D49218

llvm-svn: 337370
2018-07-18 09:25:33 +00:00
Sander de Smalen 889fe81ce5 [AArch64][SVE] Asm: Integer divide instructions.
This patch adds the following predicated instructions:

  UDIV    Unsigned divide active elements
  UDIVR   Unsigned divide active elements, reverse form.
  SDIV    Signed divide active elements
  SDIVR   Signed divide active elements, reverse form.

e.g.
  udiv  z0.s, p0/m, z0.s, z1.s
    (unsigned divide active elements in z0 by z1, store result in z0)

  sdivr z0.s, p0/m, z0.s, z1.s
    (signed divide active elements in z1 by z0, store result in z0)

llvm-svn: 337369
2018-07-18 09:17:29 +00:00
Simon Pilgrim 3cb3056fca Fix -Wdocumentation warning. NFCI.
llvm-svn: 337367
2018-07-18 09:07:54 +00:00
Sander de Smalen ac0cb5bf75 [AArch64][SVE] Asm: Support for integer MUL instructions.
This patch adds the following instructions:
  MUL   - multiply vectors, e.g.
    mul z0.h, p0/m, z0.h, z1.h
        - multiply with immediate, e.g.
    mul z0.h, z0.h, #127

  SMULH - signed multiply returning high half, e.g.
    smulh z0.h, p0/m, z0.h, z1.h

  UMULH - unsigned multiply returning high half, e.g.
    umulh z0.h, p0/m, z0.h, z1.h

llvm-svn: 337358
2018-07-18 08:10:03 +00:00
Craig Topper 92ea7a7b48 [X86] Enable commuting of VUNPCKHPD to VMOVLHPS to enable load folding by using VMOVLPS with a modified address.
This required an annoying amount of tablegen multiclass changes to make only VUNPCKHPDZ128rr commutable.

llvm-svn: 337357
2018-07-18 07:31:32 +00:00
Hiroshi Inoue cd83d459bc [NFC] fix trivial typos in comments
llvm-svn: 337351
2018-07-18 06:04:43 +00:00
Justin Hibbits 22e939a15b Fix build failures from r337347, found by clang
* Delete a no-longer-used override, and mark the other
getRegisterTypeForCallingConv() as override.
* SPE only supports i32, not i64, as the internal type, so simply remove
the type check, so that DestReg and Opc are provably always set.

GCC 6.4 did not warn about either of the above.

llvm-svn: 337350
2018-07-18 05:19:25 +00:00
Craig Topper 95063a45b8 [X86] Remove patterns that mix X86ISD::MOVLHPS/MOVHLPS with v2i64/v2f64 types.
The X86ISD::MOVLHPS/MOVHLPS should now only be emitted in SSE1 only. This means that the v2i64/v2f64 types would be illegal thus we don't need these patterns.

llvm-svn: 337349
2018-07-18 05:10:53 +00:00
Craig Topper 1425e10cc6 [X86] Generate v2f64 X86ISD::UNPCKL/UNPCKH instead of X86ISD::MOVLHPS/MOVHLPS for unary v2f64 {0,0} and {1,1} shuffles with SSE2.
I'm trying to restrict the MOVLHPS/MOVHLPS ISD nodes to SSE1 only. With SSE2 we can use unpcks. I believe this will allow some patterns to be cleaned up to require fewer bitcasts.

I've put in an odd isel hack to still select MOVHLPS instruction from the unpckh node to avoid changing tests and because movhlps is a shorter encoding. Ideally we'd do execution domain switching on this, but the operands are in the wrong order and are tied. We might be able to try a commute in the domain switching using custom code.

We already support domain switching for UNPCKLPD and MOVLHPS.

llvm-svn: 337348
2018-07-18 05:10:51 +00:00
Justin Hibbits d52990c71b Introduce codegen for the Signal Processing Engine
Summary:
The Signal Processing Engine (SPE) is found on NXP/Freescale e500v1,
e500v2, and several e200 cores.  This adds support targeting the e500v2,
as this is more common than the e500v1, and is in SoCs still on the
market.

This patch is very intrusive because the SPE is binary incompatible with
the traditional FPU.  After discussing with others, the cleanest
solution was to make both SPE and FPU features on top of a base PowerPC
subset, so all FPU instructions are now wrapped with HasFPU predicates.

Supported by this are:
* Code generation following the SPE ABI at the LLVM IR level (calling
conventions)
* Single- and Double-precision math at the level supported by the APU.

Still to do:
* Vector operations
* SPE intrinsics

As this changes the Callee-saved register list order, one test, which
tests the precise generated code, was updated to account for the new
register order.

Reviewed by: nemanjai
Differential Revision: https://reviews.llvm.org/D44830

llvm-svn: 337347
2018-07-18 04:25:10 +00:00
Justin Hibbits 4fa4fa6a73 Complete the SPE instruction set patterns
This is the lead-up to having SPE codegen.  Add the rest of the
instructions, along with MC tests.

Differential Revision:  https://reviews.llvm.org/D44829

llvm-svn: 337346
2018-07-18 04:24:57 +00:00
Justin Hibbits ceb3cd96f7 Add PowerPC e500(v2) core scheduler and directives.
Differential Revision: https://reviews.llvm.org/D44828

llvm-svn: 337345
2018-07-18 04:24:49 +00:00
Bob Haarman 4ebe5d59b6 Revert "[InstCombine] Fold 'check for [no] signed truncation' pattern"
This reverts r337190 (and a few follow-up commits), which caused the
Chromium build to fail. See
https://bugs.llvm.org/show_bug.cgi?id=38204 and
https://crbug.com/864832

llvm-svn: 337344
2018-07-18 02:18:28 +00:00
Peter Collingbourne fc50498ced CodeGen: Don't create address significance table entries for thread-local variables.
The presence of these symbols in the symbol table can cause symbol type
mismatch errors (or undefined symbol errors on emulated TLS targets)
and they can't be ICF'd anyway.

llvm-svn: 337338
2018-07-18 00:21:40 +00:00
Craig Topper a29f58dc31 [X86] Remove the vector alignment requirement from the patterns added in r337320.
The resulting instruction will only load 64 bits so alignment isn't required.

llvm-svn: 337334
2018-07-17 23:26:20 +00:00
Peter Collingbourne bd9d313d5c CodeGen: Add a target option for emitting .addrsig directives for all address-significant symbols.
Differential Revision: https://reviews.llvm.org/D48143

llvm-svn: 337331
2018-07-17 22:40:08 +00:00
Peter Collingbourne 3e22733698 MC: Implement support for new .addrsig and .addrsig_sym directives.
Part of the address-significance tables proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123514.html

Differential Revision: https://reviews.llvm.org/D47744

llvm-svn: 337328
2018-07-17 22:17:18 +00:00
Craig Topper 9ef92865ec [X86] Add patterns for folding full vector load into MOVHPS and MOVLPS with SSE1 only.
llvm-svn: 337320
2018-07-17 20:16:18 +00:00
Fangrui Song 5391bb62fb [Demangle] Add missing header files
llvm-svn: 337318
2018-07-17 19:50:41 +00:00
Zachary Turner 427bce1114 Add missing include.
llvm-svn: 337317
2018-07-17 19:48:46 +00:00
Zachary Turner 8a0efd0919 Add some helper functions to the demangle utility classes.
These are all methods that, while not currently used in the
Itanium demangler, are generally useful enough that it's
likely the itanium demangler could find a use for them.  More
importantly, they are all necessary for the Microsoft demangler
which is up and coming in a subsequent patch.  Rather than
combine these into a single monolithic patch, I think it makes
sense to commit this utility code first since it is very simple,
this way it won't detract from the substance of the MS demangler
patch.

llvm-svn: 337316
2018-07-17 19:42:29 +00:00
Vedant Kumar 9ece818291 [InstCombine] Preserve debug value when simplifying cast-of-select
InstCombine has a cast transform that matches a cast-of-select:

  Orig = cast (Src = select Cond TV FV)

And tries to replace it with a select which has the cast folded in:

  NewSel = select Cond (cast TV) (cast FV)

The combiner does RAUW(Orig, NewSel), so any debug values for Orig would
survive the transform. But debug values for Src would be lost.

This patch teaches InstCombine to replace all debug uses of Src with
NewSel (taking care of doing any necessary DIExpression rewriting).

Differential Revision: https://reviews.llvm.org/D49270

llvm-svn: 337310
2018-07-17 18:08:36 +00:00
Chandler Carruth c0cb5731fc [x86/SLH] Flesh out the data-invariant instruction table a bit based on feedback from Craig.
Summary:
The only thing he suggested that I've skipped here is the double-wide
multiply instructions. Multiply is an area I'm nervous about there being
some hidden data-dependent behavior, and it doesn't seem important for
any benchmarks I have, so skipping it and sticking with the minimal
multiply support that matches what I know is widely used in existing
crypto libraries. We can always add double-wide multiply when we have
clarity from vendors about its behavior and guarantees.

I've tried to at least cover the fundamentals here with tests, although
I've not tried to cover every width or permutation. I can add more tests
where folks think it would be helpful.

Reviewers: craig.topper

Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D49413

llvm-svn: 337308
2018-07-17 18:07:59 +00:00
Sam Clegg 28b3e99482 [WebAssembly] Update WebAssemblyLowerEmscriptenEHSjLj to handle separate compilation
Previously we were assuming whole program compilation. Now that
separate compilation is a thing we need to update this pass.
Firstly, it can no longer assert on the existence of malloc and free.
This functions might not be in the current translation unit.  If we
need them then we will generate not imports for them.

Secondly the global helper function we create should be marked as
weak since we will be generating a separate copy in each translation
unit.

Finally the names of the symbols used must be unique and fixed since
they need to agree across translation units.

Differential Revision: https://reviews.llvm.org/D49263

llvm-svn: 337301
2018-07-17 16:40:03 +00:00
Craig Topper 9187bca71b [X86] Remove some standalone patterns in favor of the patterns in the MOVLPD instruction definitions.
Previously we passed 'null_frag' into the instruction definition. The multiclass is shared with MOVHPD which doesn't use null_frag. It turns out by passing X86Movsd it produces patterns equivalent to some standalone patterns.

llvm-svn: 337299
2018-07-17 16:24:33 +00:00
Sander de Smalen 7b33faaf38 [AArch64][SVE]: Integer multiply-add/subtract instructions.
This patch adds support for the following instructions:
  MLA  mul-add, writing addend       (Zda = Zda +   Zn * Zm)
  MLS  mul-sub, writing addend       (Zda = Zda +  -Zn * Zm)
  MAD  mul-add, writing multiplicant (Zdn =  Za +  Zdn * Zm)
  MSB  mul-sub, writing multiplicant (Zdn =  Za + -Zdn * Zm)

llvm-svn: 337293
2018-07-17 15:41:58 +00:00
Petar Jovanovic f10e4798b4 [Mips][FastISel] Fix handling of icmp with i1 type
The Mips FastISel back-end does not extend i1 values while lowering icmp.
Ensure that we bail into DAG ISel when handling this case.

Patch by Dragan Mladjenovic.

Differential Revision: https://reviews.llvm.org/D49290

llvm-svn: 337288
2018-07-17 14:57:46 +00:00
Florian Hahn d95761d9d0 [IPSCCP] Run Solve each time we resolved an undef in a function.
Once we resolved an undef in a function we can run Solve, which could
lead to finding a constant return value for the function, which in turn
could turn undefs into constants in other functions that call it, before
resolving undefs there.

Computationally the amount of work we are doing stays the same, just the
order we process things is slightly different and potentially there are
a few less undefs to resolve.

We are still relying on the order of functions in the IR, which means
depending on the order, we are able to resolve the optimal undef first
or not. For example, if @test1 comes before @testf, we find the constant
return value of @testf too late and we cannot use it while solving
@test1.

This on its own does not lead to more constants removed in the
test-suite, probably because currently we have to be very lucky to visit
applicable functions in the right order.

Maybe we manage to come up with a better way of resolving undefs in more
'profitable' functions first.

Reviewers: efriedma, mssimpso, davide

Reviewed By: efriedma, davide

Differential Revision: https://reviews.llvm.org/D49385

llvm-svn: 337283
2018-07-17 14:04:59 +00:00
Sander de Smalen f41dd122d6 [AArch64][SVE] Asm: FP fused multiply-add/subtract instructions.
This patch adds support for the following instructions:

  FMLA    mul-add, writing addend                (Zda =  Zda +   Zn * Zm)
  FNMLA   negated mul-add, writing addend        (Zda = -Zda +  -Zn * Zm)

  FMLS    mul-sub, writing addend                (Zda =  Zda +  -Zn * Zm)
  FNMLS   negated mul-sub, writing addend        (Zda = -Zda +   Zn * Zm)

  FMAD    mul-add, writing multiplicant          (Zdn =  Za  +  Zdn * Zm)
  FNMAD   negated mul-add, writing multiplicant  (Zdn = -Za  + -Zdn * Zm)

  FMSB    mul-sub, writing multiplicant          (Zdn =  Za  + -Zdn * Zm)
  FNMSB   negated mul-sub, writing multiplicant  (Zdn = -Za  +  Zdn * Zm)

llvm-svn: 337282
2018-07-17 13:58:46 +00:00
Simon Pilgrim 1a4f3c93fb [SLPVectorizer] Don't attempt horizontal reduction on pointer types (PR38191)
TTI::getMinMaxReductionCost typically can't handle pointer types - until this is changed its better to limit horizontal reduction to integer/float vector types only.

llvm-svn: 337280
2018-07-17 13:43:33 +00:00
Tim Renouf e1016f1bc7 More fixes for subreg join failure in RegCoalescer
Summary:
Part of the adjustCopiesBackFrom method wasn't correctly dealing with SubRange
intervals when updating.

2 changes. The first to ensure that bogus SubRange Segments aren't propagated when
encountering Segments of the form [1234r, 1234d:0) when preparing to merge value
numbers. These can be removed in this case.

The second forces a shrinkToUses call if SubRanges end on the copy index
(instead of just the parent register).

V2: Addressed review comments, plus MIR test instead of ll test

Subscribers: MatzeB, qcolombet, nhaehnle

Differential Revision: https://reviews.llvm.org/D40308

Change-Id: I1d2b2b4beea802fce11da01edf71feb2064aab05
llvm-svn: 337273
2018-07-17 12:38:39 +00:00
Sander de Smalen 5dabcf887b [AArch64][SVE] Asm: Support for predicated FP operations (FP immediate)
This patch completes support for the following floating point
instructions that take FP immediates:
  FADD*  (addition)
  FSUB   (subtract)
  FSUBR  (subtract reverse form)
  FMUL*  (multiplication)
  FMAX*  (maximum)
  FMAXNM (maximum number)
  FMIN   (maximum)
  FMINNM (maximum number)

All operations are predicated and take a FP immediate operand,
e.g.

  fadd z0.h, p0/m, z0.h, #0.5
  fmin z0.s, p0/m, z0.s, #1.0
        ^___________^ (tied)

* Instructions added in a previous patch.

llvm-svn: 337272
2018-07-17 12:36:08 +00:00
Joerg Sonnenberger 7f457d79ec Don't assert that a size_t fits into 64bit.
Avoids tautological compare warnings on 32bit platforms.

llvm-svn: 337269
2018-07-17 12:30:34 +00:00
whitequark a41b24f32d [LLVM-C] Fix name mangling on AggressiveInstCombine
Similarly to rL336736, at least one more C API function does not
properly get declared as extern "C" due to a missing header, causing
name mangling and linking errors.

This patch fixes calls to LLVMAddAggressiveInstCombinerPass().

Differential Revision: https://reviews.llvm.org/D49416

Reviewed By: whitequark

llvm-svn: 337264
2018-07-17 11:13:58 +00:00
whitequark 7c4a074505 [LLVM-C] Add target triple normalization to the C API.
rL333307 was introduced to remove automatic target triple
normalization when calling sys::getDefaultTargetTriple(), arguing
that users of the latter already called Triple::normalize()
if necessary. However, users of the C API currently have no way of
doing target triple normalization.

This patch introduces an LLVMNormalizeTargetTriple function to
the C API which wraps Triple::normalize() and can be used on
the result of LLVMGetDefaultTargetTriple to achieve the same effect.

Differential Revision: https://reviews.llvm.org/D49414

Reviewed By: whitequark

llvm-svn: 337263
2018-07-17 10:57:39 +00:00
Sander de Smalen 3b9e342ae1 [AArch64][SVE] Asm: Support for predicated FP operations.
This patch adds support for the following floating point
instructions:
  FABD   (absolute difference)
  FADD   (addition)
  FSUB   (subtract)
  FSUBR  (subtract reverse form)
  FDIV   (divide)
  FDIVR  (divide reverse form)
  FMAX   (maximum)
  FMAXNM (maximum number)
  FMIN   (minimum)
  FMINNM (minimum number)
  FSCALE (adjust exponent)
  FMULX  (multiply extended)

All operations are predicated and binary form, e.g.

  fadd z0.h, p0/m, z0.h, z1.h
        ^___________^ (tied)

Supporting 16, 32 and 64-bit FP elements.

llvm-svn: 337259
2018-07-17 09:48:57 +00:00
Simon Pilgrim e4d12bb2d6 [DAGCombiner] Call SimplifyDemandedVectorElts from EXTRACT_VECTOR_ELT
If we are only extracting vector elements via EXTRACT_VECTOR_ELT(s) we may be able to use SimplifyDemandedVectorElts to avoid unnecessary vector ops.

Differential Revision: https://reviews.llvm.org/D49262

llvm-svn: 337258
2018-07-17 09:45:35 +00:00
Simon Pilgrim a0220b0570 Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI.
llvm-svn: 337257
2018-07-17 09:39:55 +00:00
Sander de Smalen ec229abb9b [AArch64][SVE] Asm: Support for SPLICE instruction.
The SPLICE instruction splices two vectors into one vector using a
predicate. It copies the active elements from the first vector, and
then fills the remaining elements with the low-numbered elements from
the second vector.

The instruction has the following form, e.g.

  splice z0.b, p0, z0.b, z1.b

for 8-bit elements. It also supports 16, 32 and
64-bit elements.

llvm-svn: 337253
2018-07-17 08:52:45 +00:00
Sander de Smalen 78fe30a662 [AArch64][SVE] Asm: Support for EXT instruction.
This patch adds an instruction that allows extracting
a vector from a pair of vectors, given an immediate index
that describes the element position to extract from.

The instruction has the following assembly:
  ext z0.b, z0.b, z1.b, #imm

where #imm is an immediate between 0 and 255.

llvm-svn: 337251
2018-07-17 08:39:48 +00:00
Craig Topper 880f92ad71 [X86] Properly qualify some MOVSS/MOVSD patterns with OptSize.
These are integer versions of patterns that I already fixed for floating point.

llvm-svn: 337240
2018-07-17 06:24:16 +00:00
Daniel Cederman c812dcc3db [Sparc] Do not depend on icc for ta 1
The ta instruction will always trap, regardless of the value
of the integer condition codes. TRAPri is marked as using icc,
so we cannot use a pattern for TRAPri to implement ta 1, as
verify-machineinstrs can complain that icc is not defined.
Instead we implement ta 1 the same way as ta 5.

llvm-svn: 337236
2018-07-17 05:49:33 +00:00
Craig Topper c376a1916b [X86] Add full set of patterns for turning ceil/floor/trunc/rint/nearbyint into rndscale with loads, broadcast, and masking.
This amounts to pretty ridiculous number of patterns. Ideally we'd canonicalize the X86ISD::VRNDSCALE earlier to reuse those patterns. I briefly looked into doing that, but some strict FP operations could still get converted to rint and nearbyint during isel. It's probably still worthwhile to look into. This patch is meant as a starting point to work from.

llvm-svn: 337234
2018-07-17 05:48:48 +00:00
Craig Topper 6751727d76 [X86] Add a missing FMA3 scalar intrinsic pattern.
This allows us to use 231 form to fold an insertelement on the add input to the fma. There is technically no software intrinsic that can use this until AVX512F, but it can be manually built up from other intrinsics.

llvm-svn: 337223
2018-07-16 23:10:58 +00:00
Sam Clegg cf2a9e28b1 [WebAssembly] Remove ELF file support.
This support was partial and temporary.  Now that we have
wasm object file support its no longer needed.

Differential Revision: https://reviews.llvm.org/D48744

llvm-svn: 337222
2018-07-16 23:09:29 +00:00
Sanjay Patel c71adc8040 [Intrinsics] define funnel shift IR intrinsics + DAG builder support
As discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.html
http://lists.llvm.org/pipermail/llvm-dev/2018-July/124400.html

We want to add rotate intrinsics because the IR expansion of that pattern is 4+ instructions, 
and we can lose pieces of the pattern before it gets to the backend. Generalizing the operation 
by allowing 2 different input values (plus the 3rd shift/rotate amount) gives us a "funnel shift" 
operation which may also be a single hardware instruction.

Initially, I thought we needed to define new DAG nodes for these ops, and I spent time working 
on that (much larger patch), but then I concluded that we don't need it. At least as a first 
step, we have all of the backend support necessary to match these ops...because it was required. 
And shepherding these through the IR optimizer is the primary concern, so the IR intrinsics are 
likely all that we'll ever need.

There was also a question about converting the intrinsics to the existing ROTL/ROTR DAG nodes
(along with improving the oversized shift documentation). Again, I don't think that's strictly 
necessary (as the test results here prove). That can be an efficiency improvement as a small 
follow-up patch.

So all we're left with is documentation, definition of the IR intrinsics, and DAG builder support. 

Differential Revision: https://reviews.llvm.org/D49242

llvm-svn: 337221
2018-07-16 22:59:31 +00:00
Zachary Turner 21808b1037 Add missing includes.
llvm-svn: 337218
2018-07-16 21:34:25 +00:00
Zachary Turner fac2da1ba7 [LLVMDemangle] Move some utility classes to header files.
In a followup I'm looking to add a Microsoft demangler.  Doing
so needs a lot of the same utility classes and feature test
macros which are already implemented in ItaniumDemangle.cpp.
So move all of these things into header files so that they
can be re-used by a new demangler.

Differential Revision: https://reviews.llvm.org/D49399

llvm-svn: 337217
2018-07-16 21:24:03 +00:00
Fangrui Song cb0bab86b3 [CodeGen] Fix inconsistent declaration parameter name
llvm-svn: 337200
2018-07-16 18:51:40 +00:00
Farhana Aleen c370d7b33d [AMDGPU] [AMDGPU] Support a fdot2 pattern.
Summary: Optimize fma((float)S0.x, (float)S1.x fma((float)S0.y, (float)S1.y, z))
                   -> fdot2((v2f16)S0, (v2f16)S1, (float)z)

Author: FarhanaAleen

Reviewed By: rampitec, b-sumner

Subscribers: AMDGPU

Differential Revision: https://reviews.llvm.org/D49146

llvm-svn: 337198
2018-07-16 18:19:59 +00:00
Mandeep Singh Grang 20239b18bb [llvm] Change 2 instances of std::sort to llvm::sort
llvm-svn: 337192
2018-07-16 17:26:37 +00:00
Roman Lebedev b79b4f539b [InstCombine] Fold 'check for [no] signed truncation' pattern
Summary:
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.

Proofs for this transform: https://rise4fun.com/Alive/mgu
This transform is surprisingly frustrating.
This does not deal with non-splat shift amounts, or with undef shift amounts.
I've outlined what i think the solution should be:
```
  // Potential handling of non-splats: for each element:
  //  * if both are undef, replace with constant 0.
  //    Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0.
  //  * if both are not undef, and are different, bailout.
  //  * else, only one is undef, then pick the non-undef one.
```

The DAGCombine will reverse this transform, see
https://reviews.llvm.org/D49266

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: JDevlieghere, rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D49320

llvm-svn: 337190
2018-07-16 16:45:42 +00:00
Wei Mi 40c4aa7637 [RegAlloc] Skip global splitting if the live range is huge and its spill is
trivially rematerializable.

We run into a case where machineLICM hoists a large number of live ranges
outside of a big loop because it thinks those live ranges are trivially
rematerializable. In regalloc, global splitting is tried out first for those
live ranges before they are spilled and rematerialized. Because the global
splitting algorithm is quadratic, increasing a lot of global splitting
candidates causes huge compile time increase (50s to 1400s on my local
machine when compiling a module).

However, we think for live ranges which are very large and are trivially
rematerialiable, it is better to just skip global splitting so as to save
compile time with little chance of sacrificing performance.  We uses the
segment size of live range to indirectly evaluate whether the global
splitting of the live range can introduce high cost, and use an option
as a knob to adjust the size limit threshold.

Differential Revision: https://reviews.llvm.org/D49353

llvm-svn: 337186
2018-07-16 15:42:20 +00:00
Teresa Johnson d68935c5ac Restore "[ThinLTO] Ensure we always select the same function copy to import"
This reverts commit r337081, therefore restoring r337050 (and fix in
r337059), with test fix for bot failure described after the original
description below.

In order to always import the same copy of a linkonce function,
even when encountering it with different thresholds (a higher one then a
lower one), keep track of the summary we decided to import.
This ensures that the backend only gets a single definition to import
for each GUID, so that it doesn't need to choose one.

Move the largest threshold the GUID was considered for import into the
current module out of the ImportMap (which is part of a larger map
maintained across the whole index), and into a new map just maintained
for the current module we are computing imports for. This saves some
memory since we no longer have the thresholds maintained across the
whole index (and throughout the in-process backends when doing a normal
non-distributed ThinLTO build), at the cost of some additional
information being maintained for each invocation of ComputeImportForModule
(the selected summary pointer for each import).

There is an additional map lookup for each callee being considered for
importing, however, this was able to subsume a map lookup in the
Worklist iteration that invokes computeImportForFunction. We also are
able to avoid calling selectCallee if we already failed to import at the
same or higher threshold.

I compared the run time and peak memory for the SPEC2006 471.omnetpp
benchmark (running in-process ThinLTO backends), as well as for a large
internal benchmark with a distributed ThinLTO build (so just looking at
the thin link time/memory). Across a number of runs with and without
this change there was no significant change in the time and memory.

(I tried a few other variations of the change but they also didn't
improve time or peak memory).

The new commit removes a test that no longer makes sense
(Transforms/FunctionImport/hotness_based_import2.ll), as exposed by the
reverse-iteration bot. The test depends on the order of processing the
summary call edges, and actually depended on the old problematic
behavior of selecting more than one summary for a given GUID when
encountered with different thresholds. There was no guarantee even
before that we would eventually pick the linkonce copy with the hottest
call edges, it just happened to work with the test and the old code, and
there was no guarantee that we would end up importing the selected
version of the copy that had the hottest call edges (since the backend
would effectively import only one of the selected copies).

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D48670

llvm-svn: 337184
2018-07-16 15:30:27 +00:00
Chandler Carruth fa065aa75c [x86/SLH] Completely rework how we sink post-load hardening past data
invariant instructions to be both more correct and much more powerful.

While testing, I continued to find issues with sinking post-load
hardening. Unfortunately, it was amazingly hard to create any useful
tests of this because we were mostly sinking across copies and other
loading instructions. The fact that we couldn't sink past normal
arithmetic was really a big oversight.

So first, I've ported roughly the same set of instructions from the data
invariant loads to also have their non-loading varieties understood to
be data invariant. I've also added a few instructions that came up so
often it again made testing complicated: inc, dec, and lea.

With this, I was able to shake out a few nasty bugs in the validity
checking. We need to restrict to hardening single-def instructions with
defined registers that match a particular form: GPRs that don't have
a NOREX constraint directly attached to their register class.

The (tiny!) test case included catches all of the issues I was seeing
(once we can sink the hardening at all) except for the NOREX issue. The
only test I have there is horrible. It is large, inexplicable, and
doesn't even produce an error unless you try to emit encodings. I can
keep looking for a way to test it, but I'm out of ideas really.

Thanks to Ben for giving me at least a sanity-check review. I'll follow
up with Craig to go over this more thoroughly post-commit, but without
it SLH crashes everywhere so landing it for now.

Differential Revision: https://reviews.llvm.org/D49378

llvm-svn: 337177
2018-07-16 14:58:32 +00:00
Simon Atanasyan 738a6e449e [mips] Eliminate the usage of hasStdEnc in MipsPat.
Instead, the pattern is tagged with the correct predicate when
it is declared. Some patterns have been duplicated as necessary.

Patch by Simon Dardis.

Differential revision: https://reviews.llvm.org/D48365

llvm-svn: 337171
2018-07-16 13:52:41 +00:00
Petar Jovanovic 021e4c82eb [MIPS GlobalISel] Select instructions to load and store i32 on stack
Add code for selection of G_LOAD, G_STORE, G_GEP, G_FRAMEINDEX and
G_CONSTANT. Support loads and stores of i32 values.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D48957

llvm-svn: 337168
2018-07-16 13:29:32 +00:00
Roman Lebedev de506632aa [X86][AArch64][DAGCombine] Unfold 'check for [no] signed truncation' pattern
Summary:

[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.

But the IR-optimal patter does not lower efficiently, so we want to undo it..

This handles the simple pattern.
There is a second pattern with predicate and constants inverted.

NOTE: we do not check uses here. we always do the transform.

Reviewers: spatel, craig.topper, RKSimon, javed.absar

Reviewed By: spatel

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D49266

llvm-svn: 337166
2018-07-16 12:44:10 +00:00
Daniel Cederman 92a700a215 [Sparc] Use the correct encoding for ta 3
Summary: The old encoding generated a "tn %g1 + 3" instruction instead
of the expected "ta 3".

Reviewers: venkatra, jyknight

Reviewed By: jyknight

Subscribers: fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D49171

llvm-svn: 337165
2018-07-16 12:28:26 +00:00
Daniel Cederman ab09da7b57 [Sparc] Use the names .rem and .urem instead of __modsi3 and __umodsi3
Summary: These are the names used in libgcc.

Reviewers: venkatra, jyknight, ekedaigle

Reviewed By: jyknight

Subscribers: joerg, fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D48915

llvm-svn: 337164
2018-07-16 12:22:08 +00:00
Daniel Cederman 68765757d4 [Sparc] Generate ta 1 for the @llvm.debugtrap intrinsic
Summary: Software trap number one is the trap used for breakpoints
in the Sparc ABI.

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D48637

llvm-svn: 337163
2018-07-16 12:16:53 +00:00
Daniel Cederman c3d8002c2e Avoid losing Hi part when expanding VAARG nodes on big endian machines
Summary:
If the high part of the load is not used the offset to the next element
will not be set correctly.

For example, on Sparc V8, the following code will read val2 from offset 4
instead of 8.

```
int val = __builtin_va_arg(va, long long);
int val2 = __builtin_va_arg(va, int);
```

Reviewers: jyknight

Reviewed By: jyknight

Subscribers: fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D48595

llvm-svn: 337161
2018-07-16 12:14:17 +00:00
Chandler Carruth e66a6f48e3 [x86/SLH] Fix a bug where we would try to post-load harden non-GPRs.
Found cases that hit the assert I added. This patch factors the validity
checking into a nice helper routine and calls it when deciding to harden
post-load, and asserts it when doing so later.

I've added tests for the various ways of loading a floating point type,
as well as loading all vector permutations. Even though many of these go
to identical instructions, it seems good to somewhat comprehensively
test them.

I'm confident there will be more fixes needed here, I'll try to add
tests each time as I get this predicate adjusted.

llvm-svn: 337160
2018-07-16 11:38:48 +00:00
Alexander Potapenko d1a381b17a MSan: minor fixes, NFC
- remove an extra space after |ID| declaration
 - drop the unused |FirstInsn| parameter in getShadowOriginPtrUserspace()

llvm-svn: 337159
2018-07-16 10:57:19 +00:00
Jonas Devlieghere 98062cb396 [AccelTable] Provide DWARF5AccelTableStaticData for dsymutil.
For dsymutil we want to store offsets in the accelerator table entries
rather than DIE pointers. In addition, we need a way to communicate
which CU a DIE belongs to. This patch provides support for both of these
issues.

Differential revision: https://reviews.llvm.org/D49102

llvm-svn: 337158
2018-07-16 10:52:27 +00:00
Chandler Carruth 3620b9957a [x86/SLH] Extract another small helper function, add better comments and
use better terminology. NFC.

llvm-svn: 337157
2018-07-16 10:46:16 +00:00
Mark Searles f4e7025299 [AMDGPU][Waitcnt] Re-apply fix "comparison of integers of different signs" build error"
Re-apply "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error""
( fe0a456510131f268e388c4a18a92f575c0db183 ), which was inadvertantly reverted via
2b2ee080f0164485562593b1b87291a48cea4a9a .

llvm-svn: 337156
2018-07-16 10:21:36 +00:00
Alexander Potapenko 725a4ddc9e [MSan] factor userspace-specific declarations into createUserspaceApi(). NFC
This patch introduces createUserspaceApi() that creates function/global
declarations for symbols used by MSan in the userspace.
This is a step towards the upcoming KMSAN implementation patch.

Reviewed at https://reviews.llvm.org/D49292

llvm-svn: 337155
2018-07-16 10:03:30 +00:00
Mark Searles 72da47df25 run post-RA hazard recognizer pass late
Memory legalizer, waitcnt, and shrink  passes can perturb the instructions,
which means that the post-RA hazard recognizer pass should run after them.
Otherwise, one of those passes may invalidate the work done by the hazard
recognizer. Note that this has adverse side-effect that any consecutive
S_NOP 0's, emitted by the hazard recognizer, will not be shrunk into a
single S_NOP <N>. This should be addressed in a follow-on patch.

Differential Revision: https://reviews.llvm.org/D49288

llvm-svn: 337154
2018-07-16 10:02:41 +00:00
Mark Searles c2d5d9adb5 Revert "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error"
This reverts commit fe0a456510131f268e388c4a18a92f575c0db183.

llvm-svn: 337153
2018-07-16 10:02:40 +00:00
Alexandros Lamprineas f854ce84c4 [MemorySSAUpdater] Remove deleted trivial Phis from active workset
Bug fix for PR37808. The regression test is a reduced version of the
original reproducer attached to the bug report. As stated in the report,
the problem was that InsertedPHIs was keeping dangling pointers to
deleted Memory-Phis. MemoryPhis are created eagerly and sometimes get
zapped shortly afterwards. I've used WeakVH instead of an expensive
removal operation from the active workset.

Differential Revision: https://reviews.llvm.org/D48372

llvm-svn: 337149
2018-07-16 07:51:27 +00:00
Craig Topper 07a1787501 [X86] Merge the FR128 and VR128 regclass since they have identical spill and alignment characteristics.
This unfortunately requires a bunch of bitcasts to be added added to SUBREG_TO_REG, COPY_TO_REGCLASS, and instructions in output patterns. Otherwise tablegen seems to default to picking f128 and then we fail when something tries to get the register class for f128 which isn't always valid.

The test changes are because we were previously mixing fr128 and vr128 due to contrainRegClass finding FR128 first and passes like live range shrinking weren't handling that well.

llvm-svn: 337147
2018-07-16 06:56:09 +00:00
Chandler Carruth bc46bca99e [x86/SLH] Fix an unused variable warning in release builds after
r337144.

llvm-svn: 337145
2018-07-16 04:42:27 +00:00
Chandler Carruth cdf0addc65 [x86/SLH] Teach speculative load hardening to correctly harden the
indices used by AVX2 and AVX-512 gather instructions.

The index vector is hardened by broadcasting the predicate state
into a vector register and then or-ing. We don't even have to worry
about EFLAGS here.

I've added a test for all of the gather intrinsics to make sure that we
don't miss one. A particularly interesting creation is the gather
prefetch, which needs to be marked as potentially "loading" to get the
correct behavior. It's a memory access in many ways, and is actually
relevant for SLH. Based on discussion with Craig in review, I've moved
it to be `mayLoad` and `mayStore` rather than generic side effects. This
matches how we model other prefetch instructions.

Many thanks to Craig for the review here.

Differential Revision: https://reviews.llvm.org/D49336

llvm-svn: 337144
2018-07-16 04:17:51 +00:00
Chen Zheng ccc8422464 [InstCombine] add more SPFofSPF folding
Differential Revision: https://reviews.llvm.org/D49238

llvm-svn: 337143
2018-07-16 02:23:00 +00:00
Chen Zheng b972273f98 [InstCombine] fold icmp pred (sub 0, X) C for vector type
Differential Revision: https://reviews.llvm.org/D49283

llvm-svn: 337141
2018-07-16 00:51:40 +00:00
Michael J. Spencer 7bb2767fba Recommit r335794 "Add support for generating a call graph profile from Branch Frequency Info." with fix for removed functions.
llvm-svn: 337140
2018-07-16 00:28:24 +00:00
Chandler Carruth 3ffcc0383e [x86/SLH] Extract one of the bits of logic to its own function. NFC.
This is just a refactoring to start cleaning up the code here and make
it more readable and approachable.

llvm-svn: 337138
2018-07-15 23:46:36 +00:00
Craig Topper cdb0ed2910 [X86] Add custom execution domain fixing for 128/256-bit integer logic operations with AVX512F, but not AVX512DQ.
AVX512F only has integer domain logic instructions. AVX512DQ added FP domain logic instructions.

Execution domain fixing runs before EVEX->VEX. So if we have AVX512F and not AVX512DQ we fail to do execution domain switching of the logic operations. This leads to mismatches in execution domain and more test differences.

This patch adds custom domain fixing that switches EVEX integer logic operations to VEX fp logic operations if XMM16-31 are not used.

llvm-svn: 337137
2018-07-15 23:32:36 +00:00
Craig Topper d553ff3e2e [X86] Add load patterns for cases where we select X86Movss/X86Movsd to blend instructions.
This allows us to fold the load during isel without waiting for the peephole pass to do it.

llvm-svn: 337136
2018-07-15 21:49:01 +00:00
Craig Topper ec0038398a [X86] Use 128-bit blends instead vmovss/vmovsd for 512-bit vzmovl patterns to match AVX.
llvm-svn: 337135
2018-07-15 18:51:08 +00:00
Craig Topper 8f34858779 [X86] Use 128-bit ops for 256-bit vzmovl patterns.
128-bit ops implicitly zero the upper bits. This should address the comment about domain crossing for the integer version without AVX2 since we can use a 128-bit VBLENDW without AVX2.

The only bad thing I see here is that we failed to reuse an vxorps in some of the tests, but I think that's already known issue.

llvm-svn: 337134
2018-07-15 18:51:07 +00:00
Sanjay Patel 79a423cfa2 [DAGCombiner] fix typo in comment; NFC
llvm-svn: 337132
2018-07-15 17:09:35 +00:00
Sanjay Patel 9d2099cc03 [InstCombine] Corrections in comments for division transformation (NFC)
The actual code seems to be correct, but the comments were misleading.

Patch by Aaron Puchert!

Differential Revision: https://reviews.llvm.org/D49276

llvm-svn: 337131
2018-07-15 17:06:59 +00:00
Sanjay Patel a41c886c55 [DAGCombiner] extend(ifpositive(X)) -> shift-right (not X)
This is almost the same as an existing IR canonicalization in instcombine, 
so I'm assuming this is a good early generic DAG combine too.

The motivation comes from reduced bit-hacking for select-of-constants in IR 
after rL331486. We want to restore that functionality in the DAG as noted in
the commit comments for that change and the llvm-dev discussion here:
http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html

The PPC and AArch tests show that those targets are already doing something 
similar. x86 will be neutral in the minimal case and generally better when 
this pattern is extended with other ops as shown in the signbit-shift.ll tests.

Note the asymmetry: we don't include the (extend (ifneg X)) transform because 
it already exists in SimplifySelectCC(), and that is verified in the later 
unchanged tests in the signbit-shift.ll files. Without the 'not' op, the 
general transform to use a shift is always a win because that's a single 
instruction.

Alive proofs:
https://rise4fun.com/Alive/ysli

Name: if pos, get -1
  %c = icmp sgt i16 %x, -1
  %r = sext i1 %c to i16
  =>
  %n = xor i16 %x, -1
  %r = ashr i16 %n, 15

Name: if pos, get 1
  %c = icmp sgt i16 %x, -1
  %r = zext i1 %c to i16
  =>
  %n = xor i16 %x, -1
  %r = lshr i16 %n, 15

Differential Revision: https://reviews.llvm.org/D48970

llvm-svn: 337130
2018-07-15 16:27:07 +00:00
Sanjay Patel 92d0c1c129 [InstSimplify] fold minnum/maxnum with NaN arg
This fold is repeated/misplaced in instcombine, but I'm
not sure if it's safe to remove that yet because some
other folds appear to be asserting that the transform
has occurred within instcombine itself.

This isn't the best fix for PR37776, but it probably
hides the bug with the given code example:
https://bugs.llvm.org/show_bug.cgi?id=37776

We have another test to demonstrate the more general bug.

llvm-svn: 337127
2018-07-15 14:52:16 +00:00
Andrea Di Biagio ff630c2cdc [llvm-mca][BtVer2] teach how to identify false dependencies on partially written
registers.

The goal of this patch is to improve the throughput analysis in llvm-mca for the
case where instructions perform partial register writes.

On x86, partial register writes are quite difficult to model, mainly because
different processors tend to implement different register merging schemes in
hardware.

When the code contains partial register writes, the IPC (instructions per
cycles) estimated by llvm-mca tends to diverge quite significantly from the
observed IPC (using perf).

Modern AMD processors (at least, from Bulldozer onwards) don't rename partial
registers. Quoting Agner Fog's microarchitecture.pdf:
" The processor always keeps the different parts of an integer register together.
For example, AL and AH are not treated as independent by the out-of-order
execution mechanism. An instruction that writes to part of a register will
therefore have a false dependence on any previous write to the same register or
any part of it."

This patch is a first important step towards improving the analysis of partial
register updates. It changes the semantic of RegisterFile descriptors in
tablegen, and teaches llvm-mca how to identify false dependences in the presence
of partial register writes (for more details: see the new code comments in
include/Target/TargetSchedule.h - class RegisterFile).

This patch doesn't address the case where a write to a part of a register is
followed by a read from the whole register.  On Intel chips, high8 registers
(AH/BH/CH/DH)) can be stored in separate physical registers. However, a later
(dirty) read of the full register (example: AX/EAX) triggers a merge uOp, which
adds extra latency (and potentially affects the pipe usage).
This is a very interesting article on the subject with a very informative answer
from Peter Cordes:
https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to

In future, the definition of RegisterFile can be extended with extra information
that may be used to identify delays caused by merge opcodes triggered by a dirty
read of a partial write.

Differential Revision: https://reviews.llvm.org/D49196

llvm-svn: 337123
2018-07-15 11:01:38 +00:00
Dylan McKay 0603bae41c [AVR] Document some public functions
llvm-svn: 337122
2018-07-15 07:24:27 +00:00
Craig Topper 60ce856134 [X86] Add some optsize patterns for 256-bit X86vzmovl.
These patterns use VMOVSS/SD. Without optsize we use BLENDI instead.

llvm-svn: 337119
2018-07-15 06:03:19 +00:00
Roman Lebedev c7bc4c02eb [NFC][InstCombine] foldICmpWithLowBitMaskedVal(): update comments.
All predicates are handled.
There does not seem to be any other possible folds here.
There are some more folds possible with inverted mask though.

llvm-svn: 337112
2018-07-14 20:08:52 +00:00
Roman Lebedev b972fc3e8a [InstCombine] Fold x & (-1 >> y) s< x to x s> (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337111
2018-07-14 20:08:47 +00:00
Roman Lebedev f14426101e [InstCombine] Fold x & (-1 >> y) s>= x to x s<= (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337109
2018-07-14 20:08:37 +00:00
Roman Lebedev 1e61e358a4 [InstCombine] Fold x s<= x & (-1 >> y) to x s<= (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337107
2018-07-14 20:08:26 +00:00
Roman Lebedev 859e14aeaa [InstCombine] Fold x s> x & (-1 >> y) to x s> (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337105
2018-07-14 20:08:16 +00:00
Roman Lebedev 0f5ec8921b [InstCombine] Fold x u<= x & C to x u<= C
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/Fqp

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337102
2018-07-14 16:44:54 +00:00
Roman Lebedev 74f611a1f5 [InstCombine] Fold x u> x & C to x u> C
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/JvS

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337100
2018-07-14 16:44:43 +00:00
Roman Lebedev e3dc587ae0 [InstCombine] Fold x & (-1 >> y) u< x to x u> (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/ocb

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337098
2018-07-14 12:20:16 +00:00
Roman Lebedev fac48474ce [InstCombine] Fold x & (-1 >> y) u>= x to x u<= (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/azI

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337096
2018-07-14 12:20:06 +00:00
Francis Visoiu Mistrih f905bf1408 [MachineOutliner] Check the last instruction from the sequence when updating liveness
The MachineOutliner was doing an std::for_each from the call (inserted
before the outlined sequence) to the iterator at the end of the
sequence.

std::for_each needs the iterator past the end, so the last instruction
was not taken into account when propagating the liveness information.

This fixes the machine verifier issue in machine-outliner-disubprogram.ll.

Differential Revision: https://reviews.llvm.org/D49295

llvm-svn: 337090
2018-07-14 09:40:01 +00:00
Chandler Carruth fb503ac027 [x86/SLH] Fix an issue where we wouldn't harden any loads if we found
no conditions.

This is only valid to do if we're hardening calls and rets with LFENCE
which results in an LFENCE guarding the entire entry block for us.

llvm-svn: 337089
2018-07-14 09:32:37 +00:00
Craig Topper 7426cf6717 [X86] Fix a subtle bug in the custom execution domain fixing for blends.
The code tried to find the immediate by using getNumOperands() on the MachineInstr, but there might be implicit-defs after the immediate that get counted.

Instead use getNumOperands() from the instruction description which will only count the operands that are defined in the td file.

llvm-svn: 337088
2018-07-14 06:30:30 +00:00
Nico Weber 17172c6b80 Give llvm-lib rudimentary help output.
https://reviews.llvm.org/D49318

llvm-svn: 337084
2018-07-14 02:29:44 +00:00
Craig Topper f0b164415c [X86] Prefer blendi over movss/sd when avx512 is enabled unless optimizing for size.
AVX512 doesn't have an immediate controlled blend instruction. But blend throughput is still better than movss/sd on SKX.

This commit changes AVX512 to use the AVX blend instructions instead of MOVSS/MOVSD. This constrains the register allocation since it won't be able to use XMM16-31, but hopefully the increased throughput and reduced port 5 pressure makes up for that.

llvm-svn: 337083
2018-07-14 02:05:08 +00:00
Teresa Johnson b78c5d0602 Revert "[ThinLTO] Ensure we always select the same function copy to import"
This reverts commits r337050 and r337059. Caused failure in
reverse-iteration bot that needs more investigation.

llvm-svn: 337081
2018-07-14 01:45:49 +00:00
Evgeniy Stepanov 1971ba097d Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering"
This reverts commit r337021.

WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7
    #1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3
    #2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3
    #3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18
    #4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23
    #5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5
    #6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const*, llvm::raw_ostream&, char const*) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5
    #7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3
    #8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26

[...]

Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv'
    #0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192

llvm-svn: 337079
2018-07-14 01:20:53 +00:00
Chandler Carruth a24fe067c0 [x86/SLH] Add an assert to catch if we ever end up trying to harden
post-load a register that isn't valid for use with OR or SHRX.

llvm-svn: 337078
2018-07-14 00:52:09 +00:00
Tim Shen a064622bd3 Re-apply "[SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428)."
llvm-svn: 337075
2018-07-13 23:58:46 +00:00
Krzysztof Parzyszek 7ced04c0fd [Hexagon] Avoid introducing calls into coalesced range of HVX vector pairs
If an HVX vector register is to be coalesced into a vector pair, make
sure that the vector pair will not have a function call in its live range,
unless it already had one. All HVX vector registers are volatile, so
any vector register live across a function call will have to be spilled.

If a vector needs to be spilled, and it's coalesced into a vector pair
then the whole pair will need to be spilled (even if only a part of it is
live), taking extra stack space.

llvm-svn: 337073
2018-07-13 23:42:29 +00:00
Tim Shen 9e25d5d2ce [LSR] If no Use is interesting, early return.
Summary:
By looking at the callers of getUse(), we can see that even though
IVUsers may offer uses, but they may not be interesting to
LSR. It's possible that none of them is interesting.

Reviewers: sanjoy

Subscribers: jlebar, hiraditya, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D49049

llvm-svn: 337072
2018-07-13 23:40:00 +00:00
Craig Topper 8315667d99 [X86][SLH] Remove PDEP and PEXT from isDataInvariantLoad
Ryzen has something like an 18 cycle latency on these based on Agner's data. AMD's own xls is blank. So it seems like there might be something tricky here.

Agner's data for Intel CPUs indicates these are a single uop there.

Probably safest to remove them. We never generate them without an intrinsic so this should be ok.

Differential Revision: https://reviews.llvm.org/D49315

llvm-svn: 337067
2018-07-13 22:41:52 +00:00
Craig Topper 41fa858262 [X86][SLH] Add VEX and EVEX conversion instructions to isDataInvariantLoad
-Drop the intrinsic versions of conversion instructions. These should be handled when we do vectors. They shouldn't show up in scalar code.
-Add the float<->double conversions which were missing.
-Add the AVX512 and AVX version of the conversion instructions including the unsigned integer conversions unique to AVX512

Differential Revision: https://reviews.llvm.org/D49313

llvm-svn: 337066
2018-07-13 22:41:50 +00:00
Craig Topper 445abf700d [X86][SLH] Regroup the instructions in isDataInvariantLoad a little. NFC
-Move BSF/BSR to the same group as TZCNT/LZCNT/POPCNT.
-Split some of the bit manipulation instructions away from TZCNT/LZCNT/POPCNT. These are things like 'x & (x - 1)' which are composed of a few simple arithmetic operations. These aren't nearly as complicated/surprising as counting bits.
-Move BEXTR/BZHI into their own group. They aren't like a simple arithmethic op or the bit manipulation instructions. They're more like a shift+and.

Differential Revision: https://reviews.llvm.org/D49312

llvm-svn: 337065
2018-07-13 22:41:46 +00:00
Vedant Kumar d37e078d39 Fix comments which mixed up 'before' and 'after', NFC
llvm-svn: 337061
2018-07-13 22:39:31 +00:00
Craig Topper 2260e4149a [X86] Use the correct types in some recently added isel patterns.
These were supposed to be integer types since we are selecting integer instructions.

Found while preparing to remove these patterns for another patch.

llvm-svn: 337057
2018-07-13 22:27:53 +00:00
Tom Stellard ac68471326 AMDGPU/GlobalISel: Implement select() for 32-bit @llvm.minnun and @llvm.maxnum
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D46172

llvm-svn: 337056
2018-07-13 22:16:03 +00:00
Craig Topper da424ba1c5 [X86][FastISel] Support uitofp with avx512.
llvm-svn: 337055
2018-07-13 22:09:30 +00:00
Eli Friedman 835297a951 [LTO] Fix linking with an alias defined using another alias.
When we're linking an alias which will be defined later, we neeed to
build a GlobalAlias, or else we'll crash later in
IRLinker::linkGlobalValueBody.

clang sometimes constructs aliases like this for C++ destructors.

Differential Revision: https://reviews.llvm.org/D49316

llvm-svn: 337053
2018-07-13 21:58:55 +00:00
Fangrui Song dcdc9ac7a2 [X86] Correct comment of TEST elimination in BSF/TZCNT
llvm-svn: 337052
2018-07-13 21:40:08 +00:00
Teresa Johnson d94c0594d9 [ThinLTO] Ensure we always select the same function copy to import
In order to always import the same copy of a linkonce function,
even when encountering it with different thresholds (a higher one then a
lower one), keep track of the summary we decided to import.
This ensures that the backend only gets a single definition to import
for each GUID, so that it doesn't need to choose one.

Move the largest threshold the GUID was considered for import into the
current module out of the ImportMap (which is part of a larger map
maintained across the whole index), and into a new map just maintained
for the current module we are computing imports for. This saves some
memory since we no longer have the thresholds maintained across the
whole index (and throughout the in-process backends when doing a normal
non-distributed ThinLTO build), at the cost of some additional
information being maintained for each invocation of ComputeImportForModule
(the selected summary pointer for each import).

There is an additional map lookup for each callee being considered for
importing, however, this was able to subsume a map lookup in the
Worklist iteration that invokes computeImportForFunction. We also are
able to avoid calling selectCallee if we already failed to import at the
same or higher threshold.

I compared the run time and peak memory for the SPEC2006 471.omnetpp
benchmark (running in-process ThinLTO backends), as well as for a large
internal benchmark with a distributed ThinLTO build (so just looking at
the thin link time/memory). Across a number of runs with and without
this change there was no significant change in the time and memory.

(I tried a few other variations of the change but they also didn't
improve time or peak memory).

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D48670

llvm-svn: 337050
2018-07-13 21:35:51 +00:00
Tom Stellard 390a5f4774 AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.exp
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45882

llvm-svn: 337046
2018-07-13 21:05:14 +00:00
Craig Topper f0831eef0b [X86][FastISel] Add EVEX support to sitofp handling.
llvm-svn: 337045
2018-07-13 21:03:43 +00:00
Fangrui Song 90d9c201dc [X86] Try fixing r336768
llvm-svn: 337043
2018-07-13 20:54:24 +00:00
Vlad Tsyrklevich cd1559366d [LowerTypeTests] Limit when icall jumptable entries are emitted
Summary:
Currently LowerTypeTests emits jumptable entries for all live external
and address-taken functions; however, we could limit the number of
functions that we emit entries for significantly.

For Cross-DSO CFI, we continue to emit jumptable entries for all
exported definitions.  In the non-Cross-DSO CFI case, we only need to
emit jumptable entries for live functions that are address-taken in live
functions. This ignores exported functions and functions that are only
address taken in dead functions. This change uses ThinLTO summary data
(now emitted for all modules during ThinLTO builds) to determine
address-taken and liveness info.

The logic for emitting jumptable entries is more conservative in the
regular LTO case because we don't have summary data in the case of
monolithic LTO builds; however, once summaries are emitted for all LTO
builds we can unify the Thin/monolithic LTO logic to only use summaries
to determine the liveness of address taking functions.

This change is a partial fix for PR37474. It reduces the build size for
nacl_helper by ~2-3%, the reduction is due to nacl_helper compiling in
lots of unused code and unused functions that are address taken in dead
functions no longer being being considered live due to emitted jumptable
references. The reduction for chromium is ~0.1-0.2%.

Reviewers: pcc, eugenis, javed.absar

Reviewed By: pcc

Subscribers: aheejin, dexonsmith, dschuff, mehdi_amini, eraman, steven_wu, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D47652

llvm-svn: 337038
2018-07-13 19:57:39 +00:00
Jonas Devlieghere 327e7a1608 [dwarfdump] Add pretty printer for accelerator table based on Atom.
For instance, When dumping .apple_types, the second atom represents the
DW_TAG. In addition to printing the raw value, we now also pretty print
the value if the ATOM tells us how.

llvm-svn: 337026
2018-07-13 17:21:51 +00:00
Matt Arsenault d362b6ad29 AMDGPU: Properly handle shader inputs with split arguments
This needs to refer to arguments by their original argument
index, not the argument split index which depends on what
the type splitting decides to do.

Also avoid increment PSInputNum for each split piece.

llvm-svn: 337022
2018-07-13 16:40:37 +00:00
Matt Arsenault de95077780 AMDGPU: Fix handling of alignment padding in DAG argument lowering
This was completely broken if there was ever a struct argument, as
this information is thrown away during the argument analysis.

The offsets as passed in to LowerFormalArguments are not useful,
as they partially depend on the legalized result register type,
and they don't consider the alignment in the first place.

Ignore the Ins array, and instead figure out from the raw IR type
what we need to do. This seems to fix the padding computation
if the DAG lowering is forced (and stops breaking arguments
following padded arguments if the arguments were only partially
lowered in the IR)

llvm-svn: 337021
2018-07-13 16:40:25 +00:00
Evgeniy Stepanov ca8c2f7638 Revert "CallGraphSCCPass: iterate over all functions."
This reverts commit r336419: use-after-free on CallGraph::FunctionMap elements
due to the use of a stale iterator in CGPassManager::runOnModule.

The iterator may be invalidated if a pass removes a function, ex.:
  llvm::LegacyInlinerBase::inlineCalls
  inlineCallsImpl
  llvm::CallGraph::removeFunctionFromModule

llvm-svn: 337018
2018-07-13 16:32:31 +00:00
Jonas Devlieghere 8afd926077 [dwarfdump] Pretty print DW_AT_APPLE_runtime_class
Instead of printing

  DW_AT_APPLE_runtime_class       (0x10)

we now print

  DW_AT_APPLE_runtime_class       (DW_LANG_ObjC)

llvm-svn: 337011
2018-07-13 16:06:17 +00:00
Sjoerd Meijer ceabd50a5c [AArch64] Armv8.4-A: LDAPR & STLR with immediate offset instructions (cont'd)
Follow up of rL336913: fix base class description. Thanks to Ahmed Bougacha
for pointing this out.

Differential Revision: https://reviews.llvm.org/D49284

llvm-svn: 337009
2018-07-13 15:25:42 +00:00
Nemanja Ivanovic 080c35050e [PowerPC] Materialize more constants with CR-field set in late peephole
Revision r322373 fixed a bug in how we materialize constants when the CR-field
needs to be set.

However the fix is overly conservative. It will only do the transform if
AND-ing the input with the new constant produces the same new constant.
This is of course correct, but not necessarily required.

If there are no futher uses of the constant, the constant can be changed.
If there are no uses of the GPR result, the final result of the materialization
isn't important other than it needs to compare to zero correctly (lt, gt, eq).

Differential revision: https://reviews.llvm.org/D42109

llvm-svn: 337008
2018-07-13 15:21:03 +00:00
Joel Galenson 06e7e5798f [cfi-verify] Support AArch64.
This patch adds support for AArch64 to cfi-verify.

This required three changes to cfi-verify.  First, it generalizes checking if an instruction is a trap by adding a new isTrap flag to TableGen (and defining it for x86 and AArch64).  Second, the code that ensures that the operand register is not clobbered between the CFI check and the indirect call needs to allow a single dereference (in x86 this happens as part of the jump instruction).  Third, we needed to ensure that return instructions are not counted as indirect branches.  Technically, returns are indirect branches and can be covered by CFI, but LLVM's forward-edge CFI does not protect them, and x86 does not consider them, so we keep that behavior.

In addition, we had to improve AArch64's code to evaluate the branch target of a MCInst to handle calls where the destination is not the first operand (which it often is not).

Differential Revision: https://reviews.llvm.org/D48836

llvm-svn: 337007
2018-07-13 15:19:33 +00:00
Erich Keane ac8cb22ad5 Add parens to silence Wparentheses warning, introduced by 336990
llvm-svn: 337002
2018-07-13 14:43:20 +00:00
Erich Keane 3b6bcf6d67 [NFC] Silence Wparentheses warning in DomTreeUpdater, introduced by 336968
llvm-svn: 337001
2018-07-13 14:41:15 +00:00
Ulrich Weigand c48aefb63b [TableGen] Support multi-alternative pattern fragments
A TableGen instruction record usually contains a DAG pattern that will
describe the SelectionDAG operation that can be implemented by this
instruction. However, there will be cases where several different DAG
patterns can all be implemented by the same instruction. The way to
represent this today is to write additional patterns in the Pattern
(or usually Pat) class that map those extra DAG patterns to the
instruction. This usually also works fine.

However, I've noticed cases where the current setup seems to require
quite a bit of extra (and duplicated) text in the target .td files.
For example, in the SystemZ back-end, there are quite a number of
instructions that can implement an "add-with-overflow" operation.
The same instructions also need to be used to implement just plain
addition (simply ignoring the extra overflow output). The current
solution requires creating extra Pat pattern for every instruction,
duplicating the information about which particular add operands
map best to which particular instruction.

This patch enhances TableGen to support a new PatFrags class, which
can be used to encapsulate multiple alternative patterns that may
all match to the same instruction.  It operates the same way as the
existing PatFrag class, except that it accepts a list of DAG patterns
to match instead of just a single one.  As an example, we can now define
a PatFrags to match either an "add-with-overflow" or a regular add
operation:

  def z_sadd : PatFrags<(ops node:$src1, node:$src2),
                        [(z_saddo node:$src1, node:$src2),
                         (add node:$src1, node:$src2)]>;

and then use this in the add instruction pattern:

  defm AR : BinaryRRAndK<"ar", 0x1A, 0xB9F8, z_sadd, GR32, GR32>;

These SystemZ target changes are implemented here as well.


Note that PatFrag is now defined as a subclass of PatFrags, which
means that some users of internals of PatFrag need to be updated.
(E.g. instead of using PatFrag.Fragment you now need to use
!head(PatFrag.Fragments).)


The implementation is based on the following main ideas:
- InlinePatternFragments may now replace each original pattern
  with several result patterns, not just one.
- parseInstructionPattern delays calling InlinePatternFragments
  and InferAllTypes.  Instead, it extracts a single DAG match
  pattern from the main instruction pattern.
- Processing of the DAG match pattern part of the main instruction
  pattern now shares most code with processing match patterns from
  the Pattern class.
- Direct use of main instruction patterns in InferFromPattern and
  EmitResultInstructionAsOperand is removed; everything now operates
  solely on DAG match patterns.


Reviewed by: hfinkel

Differential Revision: https://reviews.llvm.org/D48545

llvm-svn: 336999
2018-07-13 13:18:00 +00:00
Tim Renouf f3d8295105 DivergenceAnalysis: added debug output
Summary:
This commit does two things:

1. modified the existing DivergenceAnalysis::dump() so it dumps the
   whole function with added DIVERGENT: annotations;

2. added code to do that dump if the appropriate -debug-only option is
   on.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47700

Change-Id: Id97b605aab1fc6f5a11a20c58a99bbe8c565bf83
llvm-svn: 336998
2018-07-13 13:13:30 +00:00
Chandler Carruth 90358e1ef1 [SLH] Introduce a new pass to do Speculative Load Hardening to mitigate
Spectre variant #1 for x86.

There is a lengthy, detailed RFC thread on llvm-dev which discusses the
high level issues. High level discussion is probably best there.

I've split the design document out of this patch and will land it
separately once I update it to reflect the latest edits and updates to
the Google doc used in the RFC thread.

This patch is really just an initial step. It isn't quite ready for
prime time and is only exposed via debugging flags. It has two major
limitations currently:
1) It only supports x86-64, and only certain ABIs. Many assumptions are
   currently hard-coded and need to be factored out of the code here.
2) It doesn't include any options for more fine-grained control, either
   of which control flow edges are significant or which loads are
   important to be hardened.
3) The code is still quite rough and the testing lighter than I'd like.

However, this is enough for people to begin using. I have had numerous
requests from people to be able to experiment with this patch to
understand the trade-offs it presents and how to use it. We would also
like to encourage work to similar effect in other toolchains.

The ARM folks are actively developing a system based on this for
AArch64. We hope to merge this with their efforts when both are far
enough along. But we also don't want to block making this available on
that effort.

Many thanks to the *numerous* people who helped along the way here. For
this patch in particular, both Eric and Craig did a ton of review to
even have confidence in it as an early, rough cut at this functionality.

Differential Revision: https://reviews.llvm.org/D44824

llvm-svn: 336990
2018-07-13 11:13:58 +00:00
Simon Pilgrim 36b944e778 [SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED-2)
We currently only support binary instructions in the alternate opcode shuffles.

This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:

1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.

Reapplied with fix to only accept 2 different casts if they come from the same source type (PR38154).

Differential Revision: https://reviews.llvm.org/D49135

llvm-svn: 336989
2018-07-13 11:09:52 +00:00
Chandler Carruth caa7b03a50 [x86] Teach the EFLAGS copy lowering to handle much more complex control
flow patterns including forks, merges, and even cyles.

This tries to cover a reasonably comprehensive set of patterns that
still don't require PHIs or PHI placement. The coverage was inspired by
the amazing variety of patterns produced when copy EFLAGS and restoring
it to implement Speculative Load Hardening. Without this patch, we
simply cannot make such complex and invasive changes to x86 instruction
sequences due to EFLAGS.

I've added "just" one test, but this test covers many different
complexities and corner cases of this approach. It is actually more
comprehensive, as far as I can tell, than anything that I have
encountered in the wild on SLH.

Because the test is so complex, I've tried to give somewhat thorough
comments and an ASCII-art diagram of the control flows to make it a bit
easier to read and maintain long-term.

Differential Revision: https://reviews.llvm.org/D49220

llvm-svn: 336985
2018-07-13 09:39:10 +00:00
Sander de Smalen 7c3c0f24a3 [AArch64][SVE] Asm: Vector Unpack Low/High instructions.
This patch adds support for the following unpack instructions:
  
- PUNPKLO, PUNPKHI   Unpack elements from low/high half and
                     place into elements of twice their size.

  e.g. punpklo p0.h, p0.b

- UUNPKLO, UUNPKHI   Unpack elements from low/high half and 
  SUNPKLO, SUNPKHI   place into elements of twice their size
                     after zero- or sign-extending the values.

  e.g. uunpklo z0.h, z0.b

llvm-svn: 336982
2018-07-13 09:25:43 +00:00
Sander de Smalen faee91a52b [AArch64][SVE] Asm: Support for insert element (INSR) instructions.
Insert general purpose register into shifted vector, e.g.
  insr    z0.s, w0
  insr    z0.d, x0

Insert SIMD&FP scalar register into shifted vector, e.g.
  insr    z0.b, b0
  insr    z0.h, h0
  insr    z0.s, s0
  insr    z0.d, d0

llvm-svn: 336979
2018-07-13 08:51:57 +00:00
Petar Jovanovic be2e80af12 [LiveDebugValues] Tracking copying value between registers
During the execution of long functions or functions that have a lot of
inlined code it could come to the situation where tracked value could be
transferred from one register to another. The transfer is recognized only if
destination register is a callee saved register and if source register is
killed. We do not salvage caller-saved registers since there is a great
chance that killed register would outlive it.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D44016

llvm-svn: 336978
2018-07-13 08:24:26 +00:00
Craig Topper a8efe59a0a [X86] Prefer MOVSS/SD over BLEND under optsize in isel.
Previously we iseled to blend, commuted to another blend, and then commuted back to movss/movsd or blend depending on optsize. Now we do it directly.

llvm-svn: 336976
2018-07-13 06:25:31 +00:00
Dean Michael Berris 10141261e1 [XRay][compiler-rt] Add PID field to llvm-xray tool and add PID metadata record entry in FDR mode
Summary:
llvm-xray changes:
- account-mode - process-id  {...} shows after thread-id
- convert-mode - process {...} shows after thread
- parses FDR and basic mode pid entries
- Checks version number for FDR log parsing.

Basic logging changes:
- Update header version from 2 -> 3

FDR logging changes:
- Update header version from 2 -> 3
- in writeBufferPreamble, there is an additional PID Metadata record (after thread id record and tsc record)

Test cases changes:
- fdr-mode.cc, fdr-single-thread.cc, fdr-thread-order.cc modified to catch process id output in the log.

Reviewers: dberris

Reviewed By: dberris

Subscribers: hiraditya, llvm-commits, #sanitizers

Differential Revision: https://reviews.llvm.org/D49153

llvm-svn: 336974
2018-07-13 05:38:22 +00:00
Craig Topper 2ab325ba23 [X86] Remove isel patterns that turns packed add/sub/mul/div+movss/sd into scalar intrinsic instructions.
This is not an optimization we should be doing in isel. This is more suitable for a DAG combine.

My main concern is a future time when we support more FPENV. Changing a packed op to a scalar op could cause us to miss some exceptions that should have occured if we had done a packed op. A DAG combine would be better able to manage this.

llvm-svn: 336971
2018-07-13 04:50:39 +00:00
Chijun Sima 00712cb749 [DomTreeUpdater] Ignore updates when both DT and PDT are nullptrs
Summary:
Previously, when both DT and PDT are nullptrs and the UpdateStrategy is Lazy, DomTreeUpdater still pends updates inside.
After this patch, DomTreeUpdater will ignore all updates from(`applyUpdates()/insertEdge*()/deleteEdge*()`) in this case. (call `delBB()` still pends BasicBlock deletion until a flush event according to the doc).
The behavior of DomTreeUpdater previously documented won't change after the patch.

Reviewers: dmgreen, davide, kuhar, brzycki, grosser

Reviewed By: kuhar

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48974

llvm-svn: 336968
2018-07-13 04:02:13 +00:00
Sanjay Patel 70043b7e9a [InstCombine] return when SimplifyAssociativeOrCommutative makes a change
This bug was created by rL335258 because we used to always call instsimplify
after trying the associative folds. After that change it became possible
for subsequent folds to encounter unsimplified code (and potentially assert
because of it). 

Instead of carrying changed state through instcombine, we can just return 
immediately. This allows instsimplify to run, so we can continue assuming
that easy folds have already occurred.

llvm-svn: 336965
2018-07-13 01:18:07 +00:00
Matthias Braun 90ad6835dd CodeGen: Remove pipeline dependencies on StackProtector; NFC
This re-applies r336929 with a fix to accomodate for the Mips target
scheduling multiple SelectionDAG instances into the pass pipeline.

PrologEpilogInserter and StackColoring depend on the StackProtector analysis
being alive from the point it is run until PEI, which requires that they are all
scheduled in the same FunctionPassManager. Inserting a (machine) ModulePass
between StackProtector and PEI results in these passes being in separate
FunctionPassManagers and the StackProtector is not available for PEI.

PEI and StackColoring don't use much information from the StackProtector pass,
so transfering the required information to MachineFrameInfo is cleaner than
keeping the StackProtector pass around. This commit moves the SSP layout
information to MFI instead of keeping it in the pass.

This patch set (D37580, D37581, D37582, D37583, D37584, D37585, D37586, D37587)
is a first draft of the pagerando implementation described in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html.

Patch by Stephen Crane <sjc@immunant.com>

Differential Revision: https://reviews.llvm.org/D49256

llvm-svn: 336964
2018-07-13 00:08:38 +00:00
Piotr Padlewski c63b492bcd Simplify recursive launder.invariant.group and strip
Summary:
This patch is crucial for proving equality laundered/stripped
pointers. eg:

  bool foo(A *a) {
    return a == std::launder(a);
  }

Clang with -fstrict-vtable-pointers will emit something like:

    define dso_local zeroext i1 @_Z3fooP1A(%struct.A* %a) {
    entry:
      %c = bitcast %struct.A* %a to i8*
      %call = tail call i8* @llvm.launder.invariant.group.p0i8(i8* %c)
      %0 = bitcast %struct.A* %a to i8*
      %1 = tail call i8* @llvm.strip.invariant.group.p0i8(i8* %0)
      %2 = tail call i8* @llvm.strip.invariant.group.p0i8(i8* %call)
      %cmp = icmp eq i8* %1, %2
      ret i1 %cmp
    }

and because %2 can be replaced with @llvm.strip.invariant.group(%0)
and that %2 and %1 will produce the same value (because strip is readnone)
we can replace compare with true.

Reviewers: rsmith, hfinkel, majnemer, amharc, kuhar

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D47423

llvm-svn: 336963
2018-07-12 23:55:20 +00:00
Fangrui Song 9bb6c392e3 [InstCombine] Simplify isKnownNegation
llvm-svn: 336957
2018-07-12 22:56:23 +00:00
Craig Topper 3a13477214 [X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions.
These are the patterns for matching fceil, ffloor, and sqrt to intrinsic instructions if they have a MOVSS/SD.

llvm-svn: 336954
2018-07-12 22:14:10 +00:00
Craig Topper b0053b79d6 Revert r336950 and r336951 "[X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions." and "foo"
One of them had a bad title and they should have been squashed.

llvm-svn: 336953
2018-07-12 21:58:03 +00:00
George Burgess IV e7cdb7edf7 Remove redundant *_or_null checks; NFC
For the first one, we dereference `NewDef` right before the `if` anyway.
For the second, we shouldn't have NULL users().

llvm-svn: 336952
2018-07-12 21:56:31 +00:00
Craig Topper 3b837b6b63 [X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions.
These are the patterns for matching fceil, ffloor, and sqrt to intrinsic instructions if they have a MOVSS/SD.

llvm-svn: 336951
2018-07-12 21:53:23 +00:00
Craig Topper b01a355354 foo
llvm-svn: 336950
2018-07-12 21:53:07 +00:00
Martin Storsjo 86b95489fd Revert "[SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED)"
This reverts commit r336812, which broke compilation of a number
of projects, see PR38154.

llvm-svn: 336949
2018-07-12 21:33:42 +00:00
Matt Morehouse 4543816150 [SanitizerCoverage] Add associated metadata to 8-bit counters.
Summary:
This allows counters associated with unused functions to be
dead-stripped along with their functions.  This approach is the same one
we used for PC tables.

Fixes an issue where LLD removes an unused PC table but leaves the 8-bit
counter.

Reviewers: eugenis

Reviewed By: eugenis

Subscribers: llvm-commits, hiraditya, kcc

Differential Revision: https://reviews.llvm.org/D49264

llvm-svn: 336941
2018-07-12 20:24:58 +00:00
Craig Topper 57c4585bab [X86][FastISel] Support EVEX version of sqrt.
llvm-svn: 336939
2018-07-12 19:58:06 +00:00
Matt Arsenault 4dca0a9904 AMDGPU: Fix assert in truncate combine with vectors
The piece above probably has the same problem, but I need
to try to come up with a test for it.

llvm-svn: 336935
2018-07-12 19:40:16 +00:00
Matthias Braun f03f32d478 Revert "(HEAD -> master, origin/master, arcpatch-D37582) CodeGen: Remove pipeline dependencies on StackProtector; NFC"
This was triggering pass scheduling failures.

This reverts commit r336929.

llvm-svn: 336934
2018-07-12 19:27:01 +00:00
Matthias Braun 9436570cbd CodeGen: Remove pipeline dependencies on StackProtector; NFC
PrologEpilogInserter and StackColoring depend on the StackProtector analysis
being alive from the point it is run until PEI, which requires that they are all
scheduled in the same FunctionPassManager. Inserting a (machine) ModulePass
between StackProtector and PEI results in these passes being in separate
FunctionPassManagers and the StackProtector is not available for PEI.

PEI and StackColoring don't use much information from the StackProtector pass,
so transfering the required information to MachineFrameInfo is cleaner than
keeping the StackProtector pass around. This commit moves the SSP layout
information to MFI instead of keeping it in the pass.

This patch set (D37580, D37581, D37582, D37583, D37584, D37585, D37586, D37587)
is a first draft of the pagerando implementation described in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html.

Patch by Stephen Crane <sjc@immunant.com>

Differential Revision: https://reviews.llvm.org/D49256

llvm-svn: 336929
2018-07-12 18:33:32 +00:00
Wolfgang Pieb fcf3810cf7 [DWARF v5] Generate range list tables into the .debug_rnglists section. No support for split DWARF
and no use of DW_FORM_rnglistx with the DW_AT_ranges attribute.

Reviewer: aprantl

Differential Revision: https://reviews.llvm.org/D49214

llvm-svn: 336927
2018-07-12 18:18:21 +00:00
Craig Topper abc307e6fa [X86] Connect the flags user from PCMPISTR instructions to the correct node from the instruction.
We were accidentally connecting it to result 0 instead of result 1. This was caught by the machine verifier that noticed the flags were dead, but we were using them somehow. I'm still not clear what actually happened downstream.

llvm-svn: 336925
2018-07-12 18:04:05 +00:00
Craig Topper d43f58231c [X86][FastISel] Choose EVEX instructions when possible when lowering x86_sse_cvttss2si and similar intrinsics.
This should fix a machine verifier error.

llvm-svn: 336924
2018-07-12 18:03:56 +00:00
Sjoerd Meijer 83a2a62fb4 [AArch64] Armv8.4-A: LDAPR & STLR with immediate offset instructions
These instructions are added to AArch64 only.

llvm-svn: 336913
2018-07-12 14:57:59 +00:00
Roman Lebedev 74f899f0f4 [InstCombine] Fold x & (-1 >> y) != x to x u> (-1 >> y)
Summary:
A complementary fold to D49179.

https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/Rny

Caveat: one more thing in `test/Transforms/InstCombine/icmp-logical.ll` breaks.

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49205

llvm-svn: 336911
2018-07-12 14:56:12 +00:00
Andrew Ng ac305e18a4 [ThinLTO] Escape module paths when printing
We have located a bug in AssemblyWriter::printModuleSummaryIndex(). This
function outputs path strings incorrectly. Backslashes in the strings
are not correctly escaped.

Consequently, if a path name contains a backslash followed by two
hexadecimal characters, the sequence is incorrectly interpreted when the
output is read by another component. This mangles the path and results
in error.

This patch fixes this issue by calling printEscapedString() to output
the module paths.

Patch by Chris Jackson.

Differential Revision: https://reviews.llvm.org/D49090

llvm-svn: 336908
2018-07-12 14:40:21 +00:00
Simon Pilgrim 44b89fa900 [X86][SSE] Utilize ZeroableElements for canWidenShuffleElements
canWidenShuffleElements can do a better job if given a mask with ZeroableElements info. Apparently, ZeroableElements was being only used to identify AllZero candidates, but possibly we could plug it into more shuffle matchers.

Original Patch by Zvi Rackover @zvi

Differential Revision: https://reviews.llvm.org/D42044

llvm-svn: 336903
2018-07-12 13:29:41 +00:00
Simon Pilgrim 8a463897e9 [X86][AVX] Use Zeroable mask to improve shuffle mask widening
Noticed while updating D42044, lowerV2X128VectorShuffle can improve the shuffle mask with the zeroable data to create a target shuffle mask to recognise more 'zero upper 128' patterns.

NOTE: lowerV4X128VectorShuffle could benefit as well but the code needs refactoring first to discriminate between SM_SentinelUndef and SM_SentinelZero for negative shuffle indices.

Differential Revision: https://reviews.llvm.org/D49092

llvm-svn: 336900
2018-07-12 13:03:58 +00:00
David Green 2557e437fd [UnJ] Use SmallPtrSets for block collections. NFC
We no longer care about the order of blocks in these collections,
so can change to SmallPtrSets, making contains checks quicker.

Differential revision: https://reviews.llvm.org/D49060

llvm-svn: 336897
2018-07-12 10:44:47 +00:00
Simon Atanasyan 053ff54478 [mips] Mark standard encoded instructions as not being in MIPS16e
Mark standard encoded instructions and pseudo "standard encoded"
as not being in MIPS16e by default.

Patch by Simon Dardis.

Differential revision: https://reviews.llvm.org/D48379

llvm-svn: 336893
2018-07-12 08:50:11 +00:00
Craig Topper 73c7dceffd [X86] Remove i128 type from FR128 regclass.
i128 isn't a legal type in our x86 implementation today. So remove this and the few patterns that used it until it becomes necessary.

llvm-svn: 336889
2018-07-12 07:30:01 +00:00
Stefan Granitz d3b69c6be9 Fix few typos in comments (write access test commit)
llvm-svn: 336887
2018-07-12 06:41:41 +00:00
Craig Topper 73347ec081 [X86] Remove patterns and ISD nodes for the old scalar FMA intrinsic lowering.
We now use llvm.fma.f32/f64 or llvm.x86.fmadd.f32/f64 intrinsics that use scalar types rather than vector types. So we don't these special ISD nodes that operate on the lowest element of a vector.

llvm-svn: 336883
2018-07-12 03:42:41 +00:00
Chen Zheng fdf13ef342 [InstSimplify] simplify add instruction if two operands are negative
Differential Revision: https://reviews.llvm.org/D49216

llvm-svn: 336881
2018-07-12 03:06:04 +00:00
Fangrui Song 28f69c72bb [AsmParser] Fix inconsistent declaration parameter name
llvm-svn: 336879
2018-07-12 02:03:53 +00:00
Eric Christopher 7289ac8a19 Temporarily revert "Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters." as it's causing miscompiles.
A testcase was provided in the original review thread.

This reverts commit r336098.

llvm-svn: 336877
2018-07-12 01:53:21 +00:00
Chandler Carruth b4faf4ce08 [x86] Fix another trivial bug in x86 flags copy lowering that has been
there for a long time.

The boolean tracking whether we saw a kill of the flags was supposed to
be per-block we are scanning and instead was outside that loop and never
cleared. It requires a quite contrived test case to hit this as you have
to have multiple levels of successors and interleave them with kills.
I've included such a test case here.

This is another bug found testing SLH and extracted to its own focused
patch.

llvm-svn: 336876
2018-07-12 01:43:21 +00:00
Craig Topper be996bd2d9 [X86] Add patterns to use VMOVSS/SD zero masking for scalar f32/f64 select with zero.
These showed up in some of the upgraded FMA code. We really need to improve these test cases more, but this helps for now.

llvm-svn: 336875
2018-07-12 00:54:40 +00:00
Chandler Carruth 1c8234f639 [x86] Fix EFLAGS copy lowering to correctly handle walking past uses in
multiple successors where some of the uses end up killing the EFLAGS
register.

There was a bug where rather than skipping to the next basic block
queued up with uses once we saw a kill, we stopped processing the blocks
entirely. =/

Test case produces completely nonsensical code w/o this tiny fix.

This was found testing Speculative Load Hardening and split out of that
work.

Differential Revision: https://reviews.llvm.org/D49211

llvm-svn: 336874
2018-07-12 00:52:50 +00:00
Craig Topper 034adf2683 [X86] Remove and autoupgrade the scalar fma intrinsics with masking.
This converts them to what clang is now using for codegen. Unfortunately, there seem to be a few kinks to work out still. I'll try to address with follow up patches.

llvm-svn: 336871
2018-07-12 00:29:56 +00:00
Duncan P. N. Exon Smith 9f22752c4b IR: Skip -print-*-all after -print-*
This changes `-print-*` from transformation passes to analysis passes so
that `-print-after-all` and `-print-before-all` don't trigger.  This
avoids some redundant output.

Patch by Son Tuan Vu!

llvm-svn: 336869
2018-07-11 23:30:25 +00:00
Eli Friedman 0319c28459 [CodeGen] Emit more precise AssertZext/AssertSext nodes.
This is marginally helpful for removing redundant extensions, and the
code is easier to read, so it seems like an all-around win. In the new
test i8-phi-ext.ll, we used to emit an AssertSext i8; now we emit an
AssertZext i2, which allows the extension of the return value to be
eliminated.

Differential Revision: https://reviews.llvm.org/D49004

llvm-svn: 336868
2018-07-11 23:26:35 +00:00
Craig Topper ed6acde8cf [LoopIdiomRecognize] Don't convert a do while loop to ctlz.
This commit suppresses turning loops like this into "(bitwidth - ctlz(input))".

unsigned foo(unsigned input) {
  unsigned num = 0;
  do {
    ++num;
    input >>= 1;
  } while (input != 0);
  return num;
}

The loop version returns a value of 1 for both an input of 0 and an input of 1. Converting to a naive ctlz does not preserve that.

Theoretically we could do better if we checked isKnownNonZero or we could insert a select to handle the divergence. But until we have motivating cases for that, this is the easiest solution.

llvm-svn: 336864
2018-07-11 22:35:28 +00:00
Tom Stellard 752ddbd068 AMDGPU/SI: Initialize InstrInfo before TargetLoweringInfo in GCNSubtarget
SITargetLowering queries SIInstrInfo in its constructor, so SIInstrInfo
must be initialized first.  This fixes msan buildbot failures and was
introduced by r336851.

llvm-svn: 336861
2018-07-11 22:15:15 +00:00
Alina Sbirlea 0f53355e83 [MemorySSA] Add APIs to move memory accesses between blocks, following CFG changes.
Summary:
The move APIs added in this patch will be used to update MemorySSA when CFG changes merge or split blocks, by moving memory accesses accordingly in MemorySSA's internal data structures.
[Split from D45299 for easier review]

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D48897

llvm-svn: 336860
2018-07-11 22:11:46 +00:00
Tom Stellard 1a8adce24f AMDGPU: Remove duplicate call to initializeSubtargetDependencies()
This was added in r336851.

llvm-svn: 336853
2018-07-11 21:12:03 +00:00
Tom Stellard 5bfbae5cb1 AMDGPU: Refactor Subtarget classes
Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
  AMDGPUSubtarget::Generation.

Reviewers: arsenm, jvesely

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D49037

llvm-svn: 336851
2018-07-11 20:59:01 +00:00
Fangrui Song 24452316c6 [DebugInfo] Fix getPreviousSibling after r336823
llvm-svn: 336837
2018-07-11 19:09:37 +00:00
Roman Lebedev 68d54cf5b3 [InstCombine] Fold x & (-1 >> y) == x to x u<= (-1 >> y)
Summary:
https://bugs.llvm.org/show_bug.cgi?id=38123

This pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958
https://bugs.llvm.org/show_bug.cgi?id=21530
in unsigned case, therefore it is probably a good idea to improve it.

https://rise4fun.com/Alive/Rny
^ there are more opportunities for folds, i will follow up with them afterwards.

Caveat: this somehow exposes a missing opportunities
in `test/Transforms/InstCombine/icmp-logical.ll`
It seems, the problem is in `foldLogOpOfMaskedICmps()` in `InstCombineAndOrXor.cpp`.
But i'm not quite sure what is wrong, because it calls `getMaskedTypeForICmpPair()`,
which calls `decomposeBitTestICmp()` which should already work for these cases...
As @spatel notes in https://reviews.llvm.org/D49179#1158760,
that code is a rather complex mess, so we'll let it slide.

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: yamauchi, majnemer, t.p.northover, llvm-commits

Differential Revision: https://reviews.llvm.org/D49179

llvm-svn: 336834
2018-07-11 19:05:04 +00:00
Craig Topper 38b290f7d7 [X86] Remove patterns for inserting a load into a zero vector.
We can instead block the load folding isProfitableToFold. Then isel will emit a register->register move for the zeroing part and a separate load. The PostProcessISelDAG should be able to remove the register->register move.

This saves us patterns and fixes the fact that we only had unaligned load patterns. The test changes show places where we should have been using an aligned load.

llvm-svn: 336828
2018-07-11 18:09:04 +00:00
Simon Pilgrim 667a5b541f [TargetTransformInfo] Add pow2 analysis for scalar constants
Add ConstantInt analysis to getOperandInfo so we get more realistic div/rem expansion costs comparable to the vector costs.

llvm-svn: 336827
2018-07-11 17:51:27 +00:00
Konstantin Zhuravlyov bde59e9989 AMDGPU/NFC: Use already available explicit kernarg
size instead of calculating it again when filling
out the metadata.

llvm-svn: 336825
2018-07-11 17:27:17 +00:00
Jonas Devlieghere 3f27e57ade [DebugInfo] Make children iterator bidirectional
Make the DIE iterator bidirectional so we can move to the previous
sibling of a DIE.

Differential revision: https://reviews.llvm.org/D49173

llvm-svn: 336823
2018-07-11 17:11:11 +00:00
Andrea Di Biagio 483db141e3 [X86] Fix MayLoad/HasSideEffect flag for (V)MOVLPSrm instructions.
Before revision 336728, the "mayLoad" flag for instruction (V)MOVLPSrm was
inferred directly from the "default" pattern associated with the instruction
definition.

r336728 removed special node X86Movlps, and all the patterns associated to it.
Now instruction (V)MOVLPSrm doesn't have a pattern associated to it, and the
'mayLoad/hasSideEffects' flags are left unset.

When the instruction info is emitted by tablegen, method
CodeGenDAGPatterns::InferInstructionFlags() sees that (V)MOVLPSrm doesn't have a
pattern, and flags are undefined. So, it conservatively sets the
"hasSideEffects" flag for it.

As a consequence, we were losing the 'mayLoad' flag, and we were gaining a
'hasSideEffect' flag in its place.
This patch fixes the issue (originally reported by Michael Holmen).

The mca tests show the differences in the instruction info flags.  Instructions
that were affected by this problem were: MOVLPSrm/VMOVLPSrm/VMOVLPSZ128rm.

Differential Revision: https://reviews.llvm.org/D49182

llvm-svn: 336818
2018-07-11 15:27:50 +00:00
Simon Pilgrim 876e99bf2c [SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED)
We currently only support binary instructions in the alternate opcode shuffles.

This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:

1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.

Reapplied with fix to only accept 2 different casts if they come from the same source type.

Differential Revision: https://reviews.llvm.org/D49135

llvm-svn: 336812
2018-07-11 15:05:10 +00:00
Simon Pilgrim 1edde95abd Revert rL336804: [SLPVectorizer] Add initial alternate opcode support for cast instructions.
Reverting due to buildbot failures

llvm-svn: 336806
2018-07-11 14:08:16 +00:00
Simon Pilgrim 2f963a7e83 [SLPVectorizer] Add initial alternate opcode support for cast instructions.
We currently only support binary instructions in the alternate opcode shuffles.

This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:

1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.

Differential Revision: https://reviews.llvm.org/D49135

llvm-svn: 336804
2018-07-11 13:34:09 +00:00
Krzysztof Parzyszek 0b492f7fb8 [CodeGen] Ignore debug uses in MachineCopyPropagation
Debug uses should not count as real uses, since the presence of debug
information could affect the generated code.

llvm-svn: 336803
2018-07-11 13:30:27 +00:00
Simon Atanasyan e523792a9c [mips] Update the P5600 scheduler model not to use instruction itineraries.
This mostly brings the P5600 scheduler model to a mostly complete
status. There are a number of instructions which trigger the
`error:'MipsP5600Model' lacks information for` error. These are certain
codegen only instructions relating to MIPS64 which can be addressed by
using the correct predicates for them. That will be done in a full-up
patch.

Patch by Simon Dardis.

Differential revision: https://reviews.llvm.org/D45245

llvm-svn: 336802
2018-07-11 13:21:10 +00:00
Diogo N. Sampaio b0d85ef975 [NFC][InstCombine] Converts isLegalNarrowLoad into isLegalNarrowLdSt
Reuse this function as to test correctness and profitability of
reducing width of either load or store operations.

Reviewsers: samparker

Differential Revision: https://reviews.llvm.org/D48624

llvm-svn: 336800
2018-07-11 12:59:42 +00:00
Sjoerd Meijer 53449daa96 [ARM] ParallelDSP: multiple reduction stmts in loop
This fixes an issue that we were not properly supporting multiple reduction
stmts in a loop, and not generating SMLADs for these cases. The alias analysis
checks were done too early, making it too conservative.

Differential revision: https://reviews.llvm.org/D49125

llvm-svn: 336795
2018-07-11 12:36:25 +00:00
Jonas Devlieghere 26ddf274d7 Use debug-prefix-map for AT_NAME
AT_NAME was being emitted before the directory paths were remapped. This
ensures that all paths are remapped before anything is emitted.

An additional test case has been added.

Note that this only works if the replacement string is an absolute path.
If not, then AT_decl_file believes the new path is a relative path, and
joins that path with the compilation directory. I do not know of a good
way to resolve this.

Patch by: Siddhartha Bagaria (starsid)

Differential revision: https://reviews.llvm.org/D49169

llvm-svn: 336793
2018-07-11 12:30:35 +00:00
Sander de Smalen ea45a89e5c [AArch64][SVE] Asm: Support for COMPACT instruction.
The compact instruction shuffles active elements of vector
into lowest numbered elements and sets remaining elements
to zero. 

e.g.
  compact z0.s, p0, z1.s

llvm-svn: 336789
2018-07-11 11:22:26 +00:00
Sander de Smalen a90530f7c1 [AArch64][SVE] Asm: Support for LAST(A|B) and CLAST(A|B) instructions.
The LASTB and LASTA instructions extract the last active element,
or element after the last active, from the source vector.

The added variants are:

  Scalar:
  last(a|b)  w0, p0, z0.b
  last(a|b)  w0, p0, z0.h
  last(a|b)  w0, p0, z0.s
  last(a|b)  x0, p0, z0.d

  SIMD & FP Scalar:
  last(a|b)  b0, p0, z0.b
  last(a|b)  h0, p0, z0.h
  last(a|b)  s0, p0, z0.s
  last(a|b)  d0, p0, z0.d

The CLASTB and CLASTA conditionally extract the last or element after
the last active element from the source vector.

The added variants are:

  Scalar:
  clast(a|b)  w0, p0, w0, z0.b
  clast(a|b)  w0, p0, w0, z0.h
  clast(a|b)  w0, p0, w0, z0.s
  clast(a|b)  x0, p0, x0, z0.d

  SIMD & FP Scalar:
  clast(a|b)  b0, p0, b0, z0.b
  clast(a|b)  h0, p0, h0, z0.h
  clast(a|b)  s0, p0, s0, z0.s
  clast(a|b)  d0, p0, d0, z0.d

  Vector:
  clast(a|b)  z0.b, p0, z0.b, z1.b
  clast(a|b)  z0.h, p0, z0.h, z1.h
  clast(a|b)  z0.s, p0, z0.s, z1.s
  clast(a|b)  z0.d, p0, z0.d, z1.d

Please refer to the architecture specification for more details on
the semantics of the added instructions.

llvm-svn: 336783
2018-07-11 10:08:00 +00:00
Paul Semel b98f504850 [llvm-readobj] Add -hex-dump (-x) option
Differential Revision: https://reviews.llvm.org/D48281

llvm-svn: 336782
2018-07-11 10:00:29 +00:00
Simon Pilgrim 075b04a55f [SelectionDAG] Add constant buildvector support to isKnownNeverZero
This allows us to use SelectionDAG::isKnownNeverZero in DAGCombiner::visitREM (visitSDIVLike/visitUDIVLike handle the checking for constants).

llvm-svn: 336779
2018-07-11 09:56:41 +00:00
Simon Atanasyan 6cb1c6b35a [mips] Remove dead code. NFC
llvm-svn: 336777
2018-07-11 09:41:28 +00:00
Simon Pilgrim df9d59771b [DAGCombiner] Support non-uniform X%C -> X-(X/C)*C folds
First stage in PR38057 - support non-uniform constant vectors in the combine to reuse the division-by-constant logic.

We can definitely do better for srem pow2 remainders (and avoid that extra multiply....) but this at least helps keep everything on the vector unit.

Differential Revision: https://reviews.llvm.org/D48975

llvm-svn: 336774
2018-07-11 09:22:42 +00:00
Simon Pilgrim 97cf111689 [DAGCombiner] Add (urem X, -1) -> select(X == -1, 0, x) fold
llvm-svn: 336773
2018-07-11 09:14:37 +00:00
Simon Tatham 09f256575b [TableGen] Add missing std::moves to fix build failure.
gcc 4.7 seems to disagree with gcc 5.3 about whether you need to say
'return std::move(thing)' instead of just 'return thing'. All the
json::Arrays and json::Objects that I was implicitly turning into
json::Values by returning them from functions now have explicit
std::move wrappers, so hopefully 4.7 will be happy now.

llvm-svn: 336772
2018-07-11 08:57:56 +00:00
Simon Tatham 6a8c6cadf1 [TableGen] Add a general-purpose JSON backend.
The aim of this backend is to output everything TableGen knows about
the record set, similarly to the default -print-records backend. But
where -print-records produces output in TableGen's input syntax
(convenient for humans to read), this backend produces it as
structured JSON data, which is convenient for loading into standard
scripting languages such as Python, in order to extract information
from the data set in an automated way.

The output data contains a JSON representation of the variable
definitions in output 'def' records, and a few pieces of metadata such
as which of those definitions are tagged with the 'field' prefix and
which defs are derived from which classes. It doesn't dump out
absolutely every piece of knowledge it _could_ produce, such as type
information and complicated arithmetic operator nodes in abstract
superclasses; the main aim is to allow consumers of this JSON dump to
essentially act as new backends, and backends don't generally need to
depend on that kind of data.

The new backend is implemented as an EmitJSON() function similar to
all of llvm-tblgen's other EmitFoo functions, except that it lives in
lib/TableGen instead of utils/TableGen on the basis that I'm expecting
to add it to clang-tblgen too in a future patch.

To test it, I've written a Python script that loads the JSON output
and tests properties of it based on comments in the .td source - more
or less like FileCheck, except that the CHECK: lines have Python
expressions after them instead of textual pattern matches.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: arichardson, labath, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D46054

llvm-svn: 336771
2018-07-11 08:40:19 +00:00
Eric Liu 867b0e41fe [WebAssembly] Only call llvm::value::dump() in debug build.
This fixes compile error in r336759. llvm::value::dump is not available
in released build.

llvm-svn: 336770
2018-07-11 08:16:47 +00:00
Craig Topper 02867f0fa3 [X86] The TEST instruction is eliminated when BSF/TZCNT is used
Summary:
These changes cover the PR#31399.
Now the ffs(x) function is lowered to (x != 0) ? llvm.cttz(x) + 1 : 0
and it corresponds to the following llvm code:
  %cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true)
  %tobool = icmp eq i32 %v, 0
  %.op = add nuw nsw i32 %cnt, 1
  %add = select i1 %tobool, i32 0, i32 %.op
and x86 asm code:
  bsfl     %edi, %ecx
  addl     $1, %ecx
  testl    %edi, %edi
  movl     $0, %eax
  cmovnel  %ecx, %eax
In this case the 'test' instruction can't be eliminated because
the 'add' instruction modifies the EFLAGS, namely, ZF flag
that is set by the 'bsf' instruction when 'x' is zero.

We now produce the following code:
  bsfl     %edi, %ecx
  movl     $-1, %eax
  cmovnel  %ecx, %eax
  addl     $1, %eax

Patch by Ivan Kulagin

Reviewers: davide, craig.topper, spatel, RKSimon

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48765

llvm-svn: 336768
2018-07-11 06:57:42 +00:00
Lang Hames 709f773a88 Revert r336760: "[ORC] Add unit tests for the reexports utility that were..."
This patch broke a few buildbots. I will investigate and re-apply when I have
a fix.

llvm-svn: 336767
2018-07-11 06:46:17 +00:00
Craig Topper 1d6a80cd95 [X86] Remove some composite MOVSS/MOVSD isel patterns.
These patterns looked for a MOVSS/SD followed by a scalar_to_vector. Or a scalar_to_vector followed by a load.

In both cases we emitted a MOVSS/SD for the MOVSS/SD part, a REG_CLASS for the scalar_to_vector, and a MOVSS/SD for the load.

But we have patterns that do each of those 3 things individually so there's no reason to build large patterns.

Most of the test changes are just reorderings. The one test that had a meaningful change is pr30430.ll and it appears to be a regression. But its doing -O0 so I think it missed a lot of opportunities and was just getting lucky before.

llvm-svn: 336762
2018-07-11 04:51:40 +00:00
Lang Hames fdf1a855e0 [ORC] Add unit tests for the reexports utility that were left out of r336741,
and fix a bug that these exposed.

llvm-svn: 336760
2018-07-11 04:39:11 +00:00
Sam Clegg 92617559bb [WebAssembly] Add pass to infer prototypes for prototype-less functions
See https://bugs.llvm.org/show_bug.cgi?id=35385

Differential Revision: https://reviews.llvm.org/D48471

llvm-svn: 336759
2018-07-11 04:29:36 +00:00
Stefan Pintilie b9d01aa29e [Power9] Add remaining __flaot128 builtin support for FMA round to odd
Implement this as it is done on GCC:

__float128 a, b, c, d;
a = __builtin_fmaf128_round_to_odd (b, c, d);         // generates xsmaddqpo
a = __builtin_fmaf128_round_to_odd (b, c, -d);        // generates xsmsubqpo
a = - __builtin_fmaf128_round_to_odd (b, c, d);       // generates xsnmaddqpo
a = - __builtin_fmaf128_round_to_odd (b, c, -d);      // generates xsnmsubpqp

Differential Revision: https://reviews.llvm.org/D48218

llvm-svn: 336754
2018-07-11 01:42:22 +00:00
Eli Friedman d2c739230c [ARM] Treat cmn immediates as legal in isLegalICmpImmediate.
The original code attempted to do this, but the std::abs() call didn't
actually do anything due to implicit type conversions.  Fix the type
conversions, and perform the correct check for negative immediates.

This probably has very little practical impact, but it's worth fixing
just to avoid confusion in the future, I think.

Differential Revision: https://reviews.llvm.org/D48907

llvm-svn: 336742
2018-07-10 23:44:37 +00:00
Lang Hames a3c473e650 [ORC] Generalize alias materialization to support re-exports (i.e. aliasing of
symbols in another VSO).

Also fixes a bug where chained aliases within a single VSO would deadlock on
materialization.

llvm-svn: 336741
2018-07-10 23:34:56 +00:00
George Burgess IV 3482871645 Sort includes + include a missing `extern "C"` header
If we don't include Initialization.h,
`LLVMInitializeAggressiveInstCombiner` won't see its `extern "C"` decl.
This causes sadness, name mangling, and linker errors.

Reported on the mailing lists by Vladimir Vissoultchev. Thanks!

llvm-svn: 336736
2018-07-10 22:48:13 +00:00
Craig Topper 27c77fe4ce [X86] Remove AddedComplexity from all patterns that use X86vzmovl as their root.
Some added 20 and some added 15. Its unclear when to use which value and whether they are required at all.

This patch removes them all. If we start finding real world issues we may need to add them back with proper tests.

llvm-svn: 336735
2018-07-10 22:23:54 +00:00
Richard Trieu d5e57ed9c2 Fix -Wmismatched-tags warning
class -> struct in forward declaration.

llvm-svn: 336733
2018-07-10 22:09:33 +00:00
Craig Topper 860ab496d3 [X86] Teach X86InstrInfo::commuteInstructionImpl to use MOVSD/MOVSS for BLEND under optsize when the immediate allows it.
Isel currently emits movss/movsd a lot of the time and an accidental double commute turns it into a blend.

Ideally we'd select blend directly in isel under optspeed and not rely on the double commute to create blend.

llvm-svn: 336731
2018-07-10 22:02:23 +00:00
Craig Topper dea0b88b04 [X86] Remove X86ISD::MOVLPS and X86ISD::MOVLPD. NFCI
These ISD nodes try to select the MOVLPS and MOVLPD instructions which are special load only instructions. They load data and merge it into the lower 64-bits of an XMM register. They are logically equivalent to our MOVSD node plus a load.

There was only one place in X86ISelLowering that used MOVLPD and no places that selected MOVLPS. The one place that selected MOVLPD had to choose between it and MOVSD based on whether there was a load. But lowering is too early to tell if the load can really be folded. So in isel we have patterns that use MOVSD for MOVLPD if we can't find a load.

We also had patterns that select the MOVLPD instruction for a MOVSD if we can find a load, but didn't choose the MOVLPD ISD opcode for some reason.

So it seems better to just standardize on MOVSD ISD opcode and manage MOVSD vs MOVLPD instruction with isel patterns.

llvm-svn: 336728
2018-07-10 21:00:22 +00:00
Scott Linder 01ce144ddf [AMDGPU] Fix layering issue with AMDGPUHSAMetadataStreamer (NFC)
llvm-svn: 336722
2018-07-10 20:07:22 +00:00
Teresa Johnson c0320ef47b [ThinLTO] Use std::map to get determistic imports files
Summary:
I noticed that the .imports files emitted for distributed ThinLTO
backends do not have consistent ordering. This is because StringMap
iteration order is not guaranteed to be deterministic. Since we already
have a std::map with this information, used when emitting the individual
index files (ModuleToSummariesForIndex), use it for the imports files as
well.

This issue is likely causing some unnecessary rebuilds of the ThinLTO
backends in our distributed build system as the imports files are inputs
to those backends.

Reviewers: pcc, steven_wu, mehdi_amini

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D48783

llvm-svn: 336721
2018-07-10 20:06:04 +00:00
Craig Topper fb302d0198 [X86] Remove dead SDNode object from X86InstrFragmentsSIMD.td. NFC
It points to an opcode that doesn't exist.

llvm-svn: 336720
2018-07-10 20:03:51 +00:00
Craig Topper 04ded1ac1f [X86] Remove AddedComplexity from register form of NOT. NFCI
I believe isProfitableToFold will stop the load folding that this was intended to overcome.

Given an (xor load, -1), isProfitableToFold will see that the immediate can be folded with the xor using a one byte immediate since it can be sign extended. It doesn't know about NOT, but the one byte immediate check is enough to stop the fold.

llvm-svn: 336712
2018-07-10 19:09:00 +00:00
Craig Topper 0f6275ab43 [X86] Remove AddedComplexity from MMX_X86movw2d patterns.
There were only 3 patterns with this node as a root and they all the same AddedComplexity. So this doesn't really do anything.

llvm-svn: 336711
2018-07-10 18:41:58 +00:00
Scott Linder 2ad2c18b82 [AMDGPU] Refactor HSAMetadataStream::emitKernel (NFC)
Move all metadata construction into AMDGPUHSAMetadataStreamer.

Differential Revision: https://reviews.llvm.org/D48176

llvm-svn: 336707
2018-07-10 17:31:32 +00:00
Alexander Ivchenko 48ca0550dd [GlobalISel][X86_64] Support for G_SITOFP
The instruction selection is automatically handled by tablegen

llvm-svn: 336703
2018-07-10 16:38:35 +00:00
Eugene Leviant 6a572b8e79 [Evaluator] Examine alias when evaluating function call
This fixes PR38120

llvm-svn: 336702
2018-07-10 16:34:23 +00:00
Simon Pilgrim 4cb4609392 [DAGCombiner] Add special case fast paths for udiv x,1 and udiv x,-1
udiv x,-1 was going down the (slow) BuildUDIV route resulting in unnecessary shifts.

llvm-svn: 336701
2018-07-10 16:33:07 +00:00
Jonas Devlieghere 3192e35b60 Revert "[AccelTable] Provide abstraction for emitting DWARF5 accelerator tables."
This reverts r336529 because an alternative approach turned out to be a
better fit for dsymuil.

llvm-svn: 336698
2018-07-10 16:18:56 +00:00
Konstantin Zhuravlyov f0badd5ac1 AMDGPU: Make hidden argument metadata consistent with
amdgpu-implicitarg-num-bytes attribute

Differential Revision: https://reviews.llvm.org/D49096

llvm-svn: 336697
2018-07-10 16:12:51 +00:00
Sanjay Patel c8d9d812ec [InstCombine] allow flag propagation when using safe constant
This corresponds with the code for the single binop pattern
added in rL336684.

llvm-svn: 336696
2018-07-10 16:09:49 +00:00
Ulrich Weigand b961fdc509 [gcov] Fix ABI when calling llvm_gcov_... routines from instrumentation code
The llvm_gcov_... routines in compiler-rt are regular C functions that
need to be called using the proper C ABI for the target. The current
code simply calls them using plain LLVM IR types. Since the type are
mostly simple, this happens to just work on certain targets. But other
targets still need special handling; in particular, it may be necessary
to sign- or zero-extended sub-word values to comply with the ABI. This
caused gcov failures on SystemZ in particular.

Now the very same problem was already fixed for the llvm_profile_ calls
here: https://reviews.llvm.org/D21736

This patch uses the same method to fix the llvm_gcov_ calls, in
particular calls to llvm_gcda_start_file, llvm_gcda_emit_function, and
llvm_gcda_emit_arcs.

Reviewed By: marco-c

Differential Revision: https://reviews.llvm.org/D49134

llvm-svn: 336692
2018-07-10 16:05:47 +00:00
Jonas Devlieghere e13e6dbe40 [MC] Add interface to finish pending labels.
When manually finishing the object writer in dsymutil, it's possible
that there are pending labels that haven't been resolved. This results
in an assertion when the assembler tries to fixup a label that doesn't
have an address yet.

Differential revision: https://reviews.llvm.org/D49131

llvm-svn: 336688
2018-07-10 15:32:17 +00:00
Sanjay Patel 509a1e7a9b [InstCombine] safely allow non-commutative binop identity constant folds
This was originally intended with D48893, but as discussed there, we
have to make the folds safe from producing extra poison. This should
give the single binop folds the same capabilities as the existing
folds for 2-binops+shuffle.

LLVM binary opcode review: there are a total of 18 binops. There are 7 
commutative binops (add, mul, and, or, xor, fadd, fmul) which we already 
fold. We're able to fold 6 more opcodes with this patch (shl, lshr, ashr,
fdiv, udiv, sdiv). There are no folds for srem/urem/frem AFAIK. We don't 
bother with sub/fsub with constant operand 1 because those are 
canonicalized to add/fadd. 7 + 6 + 3 + 2 = 18.

llvm-svn: 336684
2018-07-10 15:12:31 +00:00
Paul Robinson c17c8bf749 Support -fdebug-prefix-map in llvm-mc. This is useful to omit the
debug compilation dir when compiling assembly files with -g.
Part of PR38050.

Patch by Siddhartha Bagaria!

Differential Revision: https://reviews.llvm.org/D48988

llvm-svn: 336680
2018-07-10 14:41:54 +00:00
Sanjay Patel 3333106a62 [InstCombine] drop poison flags when shuffle mask undef propagates to constant
llvm-svn: 336679
2018-07-10 14:27:55 +00:00
Sander de Smalen 53108d48f7 [AArch64][SVE] Asm: Support for predicated unary operations.
This patch adds support for the following instructions:
  CLS  (Count Leading Sign bits)
  CLZ  (Count Leading Zeros)
  CNT  (Count non-zero bits)
  CNOT (Logically invert boolean condition in vector)
  NOT  (Bitwise invert vector)
  FABS (Floating-point absolute value)
  FNEG (Floating-point negate)

All operations are predicated and unary, e.g.
  clz  z0.s, p0/m, z1.s

- CLS, CLZ, CNT, CNOT and NOT have variants for 8, 16, 32
  and 64 bit elements.

- FABS and FNEG have variants for 16, 32 and 64 bit elements.

llvm-svn: 336677
2018-07-10 14:05:55 +00:00
Matt Arsenault a680199a96 Reapply "AMDGPU: Force inlining if LDS global address is used"
This reverts commit r336623

llvm-svn: 336675
2018-07-10 14:03:41 +00:00
Sanjay Patel 06ea4206ad [InstCombine] allow more shuffle-binop folds with safe constants
The case with 2 variables is more complicated than the case where
we eliminate the shuffle entirely because a shuffle with an undef 
mask element creates an undef result. 

I'm not aware of any current analysis/transform that recognizes that 
undef propagating to a div/rem/shift, but we have to guard against 
the possibility.

llvm-svn: 336668
2018-07-10 13:33:26 +00:00
Anastasis Grammenos 612bf7cac5 [DebugInfo][LoopVectorize] Preserve DL in induction PHI and Add
Differential Revision: https://reviews.llvm.org/D48968

llvm-svn: 336667
2018-07-10 13:29:50 +00:00
Simon Pilgrim 641097d561 [DAGCombiner] visitREM - call visitSDIVLike/visitUDIVLike directly to avoid recursive combining.
As suggested by @efriedma on D48975 use the visitSDIVLike/visitUDIVLike functions introduced at rL336656.

llvm-svn: 336664
2018-07-10 13:18:16 +00:00
Krzysztof Parzyszek c052451a02 [Hexagon] Add implicit uses even when untied explicit uses are present
An explicit untied use is not sufficient to maintain liveness of a
register redefined in a predicated instruction. For example
  %1 = COPY %0
  ...
  %1 = A2_paddif %2, %1, 1
could become
  $r1 = COPY $r0
  ...
  $r1 = A2_paddif $p0, $r1, 1
and later
  $r1 = COPY $r0                ;; this is not really dead!
  ...
  $r1 = A2_paddif $p0, $r0, 1

llvm-svn: 336662
2018-07-10 12:57:49 +00:00
Karl-Johan Karlsson 1ffeb5d7f0 [LowerSwitch] Fixed faulty PHI nodes
Summary:
Fixed two cases of where PHI nodes need to be updated by lowerswitch.

When lowerswitch find out that the switch default branch is not
reachable it remove the old default and replace it with the most
popular block from the cases, but it forget to update the PHI
nodes in the default block.

The PHI nodes also need to be updated when the switch is replaced
with a single branch.

Reviewers: hans, reames, arsenm

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D47203

llvm-svn: 336659
2018-07-10 12:06:16 +00:00
Sam McCall e6057bc689 [Support] Harded JSON against invalid UTF-8.
Parsing invalid UTF-8 input is now a parse error.
Creating JSON values from invalid UTF-8 now triggers an assertion, and
(in no-assert builds) substitutes the unicode replacement character.
Strings retrieved from json::Value are always valid UTF-8.

llvm-svn: 336657
2018-07-10 11:51:26 +00:00
Simon Pilgrim ce5c19b623 [DAGCombiner] Split SDIV/UDIV optimization expansions from the rest of the combines. NFCI.
As suggested by @efriedma on D48975, this patch separates the BuildDiv/Pow2 style optimizations from the rest of the visitSDIV/visitUDIV to make it easier to reuse the combines and will allow us to avoid some rather nasty node recursive combining in visitREM.

llvm-svn: 336656
2018-07-10 11:38:00 +00:00
Chandler Carruth 148861f579 [PM/Unswitch] Fix unused variable in r336646.
llvm-svn: 336647
2018-07-10 08:57:04 +00:00
Chandler Carruth 47dc3a346e [PM/Unswitch] Fix a collection of closely related issues with trivial
switch unswitching.

The core problem was that the way we handled unswitching trivial exit
edges through the default successor of a switch. For some reason
I thought the right way to do this was to add a block containing
unreachable and point the default successor at this block. In
retrospect, this has an amazing number of problems.

The first issue is the one that this pass has always worked around -- we
have to *detect* such edges and avoid unswitching them again. This
seemed pretty easy really. You juts look for an edge to a block
containing unreachable. However, this pattern is woefully unsound. So
many things can break it. The amazing thing is that I found a test case
where *simple-loop-unswitch itself* breaks this! When we do
a *non-trivial* unswitch of a switch we will end up splitting this exit
edge. The result will be a default successor that is an exit and
terminates in ... a perfectly normal branch. So the first test case that
I started trying to fix is added to the nontrivial test cases. This is
a ridiculous example that did just amazing things previously. With just
unswitch, it would create 10+ copies of this stuff stamped out. But if
you combine it *just right* with a bunch of other passes (like
simplify-cfg, loop rotate, and some LICM) you can get it to do this
infinitely. Or at least, I never got it to finish. =[

This, in turn, uncovered another related issue. When we are manipulating
these switches after doing a trivial unswitch we never correctly updated
PHI nodes to reflect our edits. As soon as I started changing how these
edges were managed, it became obvious there were more issues that
I couldn't realistically leave unaddressed, so I wrote more test cases
around PHI updates here and ensured all of that works now.

And this, in turn, required some adjustment to how we collect and manage
the exit successor when it is the default successor. That showed a clear
bug where we failed to include it in our search for the outer-most loop
reached by an unswitched exit edge. This was actually already tested and
the test case didn't work. I (wrongly) thought that was due to SCEV
failing to analyze the switch. In fact, it was just a simple bug in the
code that skipped the default successor. While changing this, I handled
it correctly and have updated the test to reflect that we now get
precise SCEV analysis of trip counts for the outer loop in one of these
cases.

llvm-svn: 336646
2018-07-10 08:36:05 +00:00
Simon Pilgrim d32ca2c0b7 [X86][SSE] Prefer BLEND(SHL(v,c1),SHL(v,c2)) over MUL(v, c3)
Now that rL336250 has landed, we should prefer 2 immediate shifts + a shuffle blend over performing a multiply. Despite the increase in instructions, this is quicker (especially for slow v4i32 multiplies), avoid loads and constant pool usage. It does mean however that we increase register pressure. The code size will go up a little but by less than what we save on the constant pool data.

This patch also adds support for v16i16 to the BLEND(SHIFT(v,c1),SHIFT(v,c2)) combine, and also prevents blending on pre-SSE41 shifts if it would introduce extra blend masks/constant pool usage.

Differential Revision: https://reviews.llvm.org/D48936

llvm-svn: 336642
2018-07-10 07:58:33 +00:00
Craig Topper 08b81a5508 [X86] Use IsProfitableToFold to block vinsertf128rm in favor of insert_subreg instead of artifically increasing pattern complexity to give priority.
This is a much more direct way to solve the issue than just giving extra priority.

llvm-svn: 336639
2018-07-10 06:19:54 +00:00
Craig Topper db73f56489 [X86] Remove some seemingly unnecessary patterns.
We're missing the EVEX equivalents of these patterns and seem to get along fine.

I think we end up with X86vzload for the obvious IR cases that would produce this DAG.

llvm-svn: 336638
2018-07-10 05:31:42 +00:00
Craig Topper 866a377e91 [X86] Correct vfixupimm load patterns to look for an integer load, not a floating point load bitcasted to integer.
DAG combine wouldn't let a floating point load bitcasted to integer exist. It would just be an integer load.

llvm-svn: 336626
2018-07-10 00:49:49 +00:00
Craig Topper e4f46e4f31 [X86] Remove FloatVT from X86VectorVTInfo in X86InstrAVX512.td
The only places it was used where places where VT was the same as FloatVT. So switch those uses to VT and drop it.

llvm-svn: 336624
2018-07-10 00:49:45 +00:00
Vlad Tsyrklevich 688e752207 Revert "AMDGPU: Force inlining if LDS global address is used"
This reverts commit r336587, it was causing test failures on the
sanitizer bots.

llvm-svn: 336623
2018-07-10 00:46:07 +00:00
Wolfgang Pieb e194f73e9f [DWARF][NFC] Refactor range list emission to use a static helper
This is prep for DWARF v5 range list emission. Emission of a single range list is moved
to a static helper function.

Reviewer: jdevlieghere

Differential Revision: https://reviews.llvm.org/D49098

llvm-svn: 336621
2018-07-10 00:10:11 +00:00
Sanjay Patel 69faf464ed [InstCombine] allow more shuffle folds using safe constants
getSafeVectorConstantForBinop() was calling getBinOpIdentity() assuming
that the constant we wanted was operand 1 (RHS). That's wrong, but I
don't think we could expose a bug or even a suboptimal fold from that
because the callers have other guards for any binop that would have
been affected.

llvm-svn: 336617
2018-07-09 23:22:47 +00:00
Heejin Ahn fed7382ef6 [WebAssembly] Support for binary atomic RMW instructions
Summary:
This adds support for binary atomic read-modify-write instructions:
add, sub, and, or, xor, and xchg.

This does not yet support translations of some of LLVM IR atomicrmw
instructions (nand, max, min, umax, and umin) that do not have a direct
counterpart in wasm instructions.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49088

llvm-svn: 336615
2018-07-09 22:30:51 +00:00
Manoj Gupta 77eeac3d9e llvm: Add support for "-fno-delete-null-pointer-checks"
Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.

More details : https://lkml.org/lkml/2018/4/4/601

GCC option description for -fdelete-null-pointer-checks:
This Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.

-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.

This feature is implemented in LLVM IR in this CL as the function attribute
"null-pointer-is-valid"="true" in IR (Under review at D47894).
The CL updates several passes that assumed null pointer dereferencing is
undefined to not optimize when the "null-pointer-is-valid"="true"
attribute is present.

Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv

Reviewed By: efriedma, george.burgess.iv

Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits

Differential Revision: https://reviews.llvm.org/D47895

llvm-svn: 336613
2018-07-09 22:27:23 +00:00
Rui Ueyama 0230f7c763 Use StringRef instead of `const char *`.
I don't think there's a need to use `const char *`. In most (probably all?)
cases, we need a length of a name later, so discarding a length will
lead to a wasted effort.

Differential Revision: https://reviews.llvm.org/D49046

llvm-svn: 336612
2018-07-09 22:26:49 +00:00
George Burgess IV 3fbfa9c403 Make llvm.objectsize more conservative with null
In non-zero address spaces, we were reporting that an object at `null`
always occupies zero bytes. This is incorrect in many cases, so just
return `unknown` in those cases for now.

Differential Revision: https://reviews.llvm.org/D48860

llvm-svn: 336611
2018-07-09 22:21:16 +00:00
Lang Hames f07dad3d8f [ORC] Rename MaterializationResponsibility::delegate to replace and add a new
delegate method (and unit test).

The name 'replace' better captures what the old delegate method did: it
returned materialization responsibility for a set of symbols to the VSO.

The new delegate method delegates responsibility for a set of symbols to a new
MaterializationResponsibility instance. This can be used to split responsibility
between multiple threads, or multiple materialization methods.

llvm-svn: 336603
2018-07-09 20:54:36 +00:00
Stefan Pintilie 133acb22bb [Power9] Add __float128 builtins for Rounding Operations
Added __float128 support for a number of rounding operations:

trunc
rint
nearbyint
round
floor
ceil

Differential Revision: https://reviews.llvm.org/D48415

llvm-svn: 336601
2018-07-09 20:38:40 +00:00
Heejin Ahn d31bc9866b [WebAssembly] Improve readability of load/stores and tests. NFC.
Summary:
- Changed variable/function names to be more consistent
- Improved comments in test files
- Added more tests
- Fixed a few typos
- Misc. cosmetic changes

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49087

llvm-svn: 336598
2018-07-09 20:18:21 +00:00
Stefan Pintilie 58e3e0a827 [Power9] [LLVM] Add __float128 support for trunc to double round to odd
Add support for this builtin:
double builtin_truncf128_round_to_odd(float128)

Differential Revision: https://reviews.llvm.org/D48483

llvm-svn: 336595
2018-07-09 20:09:22 +00:00
Mark Searles 7139dea6d9 RenameIndependentSubregs: Fix handling of undef tied operands
Ensure that, if updating a tied operand pair, to only update
that pair.

Differential Revision: https://reviews.llvm.org/D49052

llvm-svn: 336593
2018-07-09 20:07:03 +00:00
Daniel Sanders 9481399c0f [globalisel][irtranslator] Add support for atomicrmw and (strong) cmpxchg
Summary:
This patch adds support for the atomicrmw instructions and the strong
cmpxchg instruction to the IRTranslator.

I've left out weak cmpxchg because LangRef.rst isn't entirely clear on what
difference it makes to the backend. As far as I can tell from the code, it
only matters to AtomicExpandPass which is run at the LLVM-IR level.

Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, javed.absar

Reviewed By: qcolombet

Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D40092

llvm-svn: 336589
2018-07-09 19:33:40 +00:00
Mark Searles 5bfd8d8991 [AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error
Build error on Android; reported by and fix provided by (thanks) by Mauro Rossi <issor.oruam@gmail.com>

Fixes the following building error:

external/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1903:61:
error: comparison of integers of different signs:
'typename iterator_traits<__wrap_iter<MachineBasicBlock **> >::difference_type'
(aka 'int') and 'unsigned int' [-Werror,-Wsign-compare]
                      BlockWaitcntProcessedSet.end(), &MBB) < Count)) {
                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~
1 error generated.

Differential Revision: https://reviews.llvm.org/D49089

llvm-svn: 336588
2018-07-09 19:28:14 +00:00
Matt Arsenault 40cb6cab56 AMDGPU: Force inlining if LDS global address is used
These won't work for the forseeable future. These aren't allowed
from OpenCL, but IPO optimizations can make them appear.

Also directly set the attributes on functions, regardless
of the linkage rather than cloning functions like before.

llvm-svn: 336587
2018-07-09 19:22:22 +00:00
Roman Lebedev 5ccae1750b [X86][TLI] DAGCombine: Unfold variable bit-clearing mask to two shifts.
Summary:
This adds a reverse transform for the instcombine canonicalizations
that were added in D47980, D47981.

As discussed later, that was worse at least for the code size,
and potentially for the performance, too.

https://rise4fun.com/Alive/Zmpl

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: spatel

Subscribers: reames, llvm-commits

Differential Revision: https://reviews.llvm.org/D48768

llvm-svn: 336585
2018-07-09 19:06:42 +00:00
Stefan Pintilie 83a5fe146e [Power9] Add __float128 builtins for Round To Odd
GCC has builtins for these round to odd instructions:

__float128 __builtin_sqrtf128_round_to_odd (__float128)
__float128 __builtin_{add,sub,mul,div}f128_round_to_odd (__float128, __float128)
__float128 __builtin_fmaf128_round_to_odd (__float128, __float128, __float128)

Differential Revision: https://reviews.llvm.org/D47550

llvm-svn: 336578
2018-07-09 18:50:06 +00:00
Maksim Panchenko fa762cc19b [DebugInfo] Change default value of FDEPointerEncoding
Summary:
If the encoding is not specified in CIE augmentation string, then it
should be DW_EH_PE_absptr instead of DW_EH_PE_omit.

Reviewers: ruiu, MaskRay, plotfi, rafauler

Reviewed By: MaskRay

Subscribers: rafauler, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D49000

llvm-svn: 336577
2018-07-09 18:45:38 +00:00
Craig Topper e3b0c7e5bd [SelectionDAG] Add VT consistency checks to the creation of ISD::FMA.
This is similar to what is done for binops. I don't know if this would have helped us catch the bug fixed in r336566 earlier or not, but I figured it couldn't hurt.

llvm-svn: 336576
2018-07-09 18:23:55 +00:00
Diego Caballero 29a07b37bf [LoopInfo] Port loop exit interfaces from Loop to LoopBase
This patch ports hasDedicatedExits, getUniqueExitBlocks and
getUniqueExitBlock in Loop to LoopBase so that they can be used
from other LoopBase sub-classes.

Reviewers: chandlerc, sanjoy, hfinkel, fhahn

Reviewed By: chandlerc

Differential Revision: https://reviews.llvm.org/D48817

llvm-svn: 336572
2018-07-09 17:52:49 +00:00
Craig Topper 47170b3153 [X86] In combineFMA, make sure we bitcast the result of isFNEG back the expected type before creating the new FMA node.
Previously, we were creating malformed SDNodes, but nothing noticed because the type constraints prevented isel from noticing.

llvm-svn: 336566
2018-07-09 17:43:24 +00:00
Sanjay Patel 7cd32419ab [InstCombine] avoid extra poison when moving shift above shuffle
As discussed in D49047 / D48987, shift-by-undef produces poison,
so we can't use undef vector elements in that case..

Note that we need to extend this for poison-generating flags,
and there's a proposal to create poison from FMF in D47963,

llvm-svn: 336562
2018-07-09 17:20:20 +00:00
Steven Wu e1f7c5f8c7 [BitcodeReader] Infer the correct runtime preemption for GlobalValue
Summary:
To allow bitcode built by old compiler to pass the current verifer,
BitcodeReader needs to auto infer the correct runtime preemption from
linkage and visibility for GlobalValues.

Since llvm-6.0 bitcode already contains the new field but can be
incorrect in some cases, the attribute needs to be recomputed all the
time in BitcodeReader. This will make all the GVs has dso_local marked
correctly if read from bitcode, and it should still allow the verifier
to catch mistakes in optimization passes.

This should fix PR38009.

Reviewers: sfertile, vsk

Reviewed By: vsk

Subscribers: dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49039

llvm-svn: 336560
2018-07-09 16:52:05 +00:00
Sanjay Patel a62725317b [InstCombine] generalize safe vector constant utility
This is almost NFC, but there could be some case where the original
code had undefs in the constants (rather than just the shuffle mask),
and we'll use safe constants rather than undefs now.

The FIXME noted in foldShuffledBinop() is already visible in existing
tests, so correcting that is the next step.

llvm-svn: 336558
2018-07-09 16:16:51 +00:00
Craig Topper e9cff7d47b [X86] Remove some patterns that include a bitcast of a floating point load to an integer type.
DAG combine should have converted the type of the load.

llvm-svn: 336557
2018-07-09 16:03:02 +00:00
Craig Topper 16ee4b4957 [X86] Remove some patterns that seems to be unreachable.
These patterns mapped (v2f64 (X86vzmovl (v2f64 (scalar_to_vector FR64:$src)))) to a MOVSD and an zeroing XOR. But the complexity of a pattern for (v2f64 (X86vzmovl (v2f64))) that selects MOVQ is artificially and hides this MOVSD pattern.

Weirder still, the SSE version of the pattern was explicitly blocked on SSE41, but yet we had copied it to AVX and AVX512.

llvm-svn: 336556
2018-07-09 16:03:01 +00:00
Craig Topper 22330c700b [X86] Remove some seemingly unnecessary AddedComplexity lines.
Looking at the generated tables this didn't seem to make an obvious difference in pattern priority.

llvm-svn: 336555
2018-07-09 16:02:59 +00:00
Diego Caballero d09530144a [VPlan][LV] Introduce condition bit in VPBlockBase
This patch introduces a VPValue in VPBlockBase to represent the condition
bit that is used as successor selector when a block has multiple successors.
This information wasn't necessary until now, when we are about to introduce
outer loop vectorization support in VPlan code gen.

Reviewers: fhahn, rengolin, mkuper, hfinkel, mssimpso

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48814

llvm-svn: 336554
2018-07-09 15:57:09 +00:00
Sander de Smalen d3efb59f29 [AArch64][SVE] Asm: Support for CNT(B|H|W|D) and CNTP instructions.
This patch adds support for the following instructions:

  CNTB CNTH - Determine the number of active elements implied by
  CNTW CNTD   the named predicate constant, multiplied by an
              immediate, e.g.

                cnth x0, vl8, #16

  CNTP      - Count active predicate elements, e.g.
                cntp  x0, p0, p1.b

              counts the number of active elements in p1, predicated
              by p0, and stores the result in x0.

llvm-svn: 336552
2018-07-09 15:22:08 +00:00
Xin Tong b467233d8b [CVP] Handle calls with void return value. No need to create CVPLattice state for it.
Summary:
Tests: 10
Metric: compile_time

Program                                         unpatch-result  patch-result diff

Bullet/bullet                                  32.39           30.54        -5.7%
SPASS/SPASS                                    18.14           17.25        -4.9%
mafft/pairlocalalign                           12.10           11.64        -3.8%
ClamAV/clamscan                                19.21           19.63         2.2%
7zip/7zip-benchmark                            49.55           48.85        -1.4%
kimwitu++/kc                                   15.68           15.87         1.2%
lencod/lencod                                  21.13           21.34         1.0%
consumer-typeset/consumer-typeset              13.65           13.62        -0.2%
tramp3d-v4/tramp3d-v4                          29.88           29.92         0.1%
sqlite3/sqlite3                                18.48           18.46        -0.1%
       unpatch-result  patch-result       diff
count  10.000000       10.000000     10.000000
mean   23.022000       22.712400    -0.011671
std    11.362831       11.094183     0.027338
min    12.104000       11.640000    -0.057298
25%    16.299000       16.214000    -0.032282
50%    18.844000       19.048000    -0.001350
75%    27.689000       27.774000     0.007752
max    49.552000       48.852000     0.021861

I also tested only this pass by concatenating all the code from the
llvm/lib/Analysis/ folder and do clang -g followed by opt. I get close to 20% speedup
for the pass. I expect a majority of the gain come from skipping the dbg intrinsics.

Before patch (opt -time-passes -called-value-propagation):
============
===-------------------------------------------------------------------------===
 ... Pass execution timing report ...
===-------------------------------------------------------------------------===
 Total Execution Time: 3.8303 seconds (3.8279 wall clock)

 ---User Time--- --System Time-- --User+System-- ---Wall Time--- ---
Name ---
 2.0768 ( 57.3%) 0.0990 ( 48.0%) 2.1757 ( 56.8%) 2.1757 ( 56.8%) Bitcode
Writer
 0.8444 ( 23.3%) 0.0600 ( 29.1%) 0.9044 ( 23.6%) 0.9044 ( 23.6%) Called
Value Propagation
 0.7031 ( 19.4%) 0.0472 ( 22.9%) 0.7502 ( 19.6%) 0.7478 ( 19.5%) Module
Verifier
 3.6242 (100.0%) 0.2062 (100.0%) 3.8303 (100.0%) 3.8279 (100.0%) Total

After patch (opt -time-passes -called-value-propagation):
============
===-------------------------------------------------------------------------===
 ... Pass execution timing report ...
===-------------------------------------------------------------------------===
 Total Execution Time: 3.6605 seconds (3.6579 wall clock)

 ---User Time--- --System Time-- --User+System-- ---Wall Time--- ---
Name ---
 2.0716 ( 59.7%) 0.0990 ( 52.5%) 2.1705 ( 59.3%) 2.1706 ( 59.3%) Bitcode
Writer
 0.7144 ( 20.6%) 0.0300 ( 15.9%) 0.7444 ( 20.3%) 0.7444 ( 20.4%) Called
Value Propagation
 0.6859 ( 19.8%) 0.0596 ( 31.6%) 0.7455 ( 20.4%) 0.7429 ( 20.3%) Module
Verifier
 3.4719 (100.0%) 0.1886 (100.0%) 3.6605 (100.0%) 3.6579 (100.0%) Total

Reviewers: davide, mssimpso

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49078

llvm-svn: 336551
2018-07-09 14:53:37 +00:00
Stefan Pintilie 3d76326d24 [Power9] Add __float128 support for compare operations
Added handling for the select f128.

Differential Revision: https://reviews.llvm.org/D48294

llvm-svn: 336548
2018-07-09 13:36:14 +00:00
Sander de Smalen 813b21e33a [AArch64][SVE] Asm: Support for remaining shift instructions.
This patch completes support for shifts, which include:
- LSL   - Logical Shift Left
- LSLR  - Logical Shift Left, Reversed form
- LSR   - Logical Shift Right
- LSRR  - Logical Shift Right, Reversed form
- ASR   - Arithmetic Shift Right
- ASRR  - Arithmetic Shift Right, Reversed form
- ASRD  - Arithmetic Shift Right for Divide

In the following variants:

- Predicated shift by immediate - ASR, LSL, LSR, ASRD
  e.g.
    asr z0.h, p0/m, z0.h, #1

  (active lanes of z0 shifted by #1)

- Unpredicated shift by immediate - ASR, LSL*, LSR*
  e.g.
    asr z0.h, z1.h, #1

  (all lanes of z1 shifted by #1, stored in z0)

- Predicated shift by vector - ASR, LSL*, LSR*
  e.g.
    asr z0.h, p0/m, z0.h, z1.h

  (active lanes of z0 shifted by z1, stored in z0)

- Predicated shift by vector, reversed form - ASRR, LSLR, LSRR
  e.g.
    lslr z0.h, p0/m, z0.h, z1.h

  (active lanes of z1 shifted by z0, stored in z0)

- Predicated shift left/right by wide vector - ASR, LSL, LSR
  e.g.
    lsl z0.h, p0/m, z0.h, z1.d

  (active lanes of z0 shifted by wide elements of vector z1)

- Unpredicated shift left/right by wide vector - ASR, LSL, LSR
  e.g.
    lsl z0.h, z1.h, z2.d

  (all lanes of z1 shifted by wide elements of z2, stored in z0)

*Variants added in previous patches.

llvm-svn: 336547
2018-07-09 13:23:41 +00:00
Sanjay Patel 5bd36644c8 [InstCombine] fix shuffle-of-binops transform to avoid poison/undef
As noted in D48987, there are many different ways for this transform to go wrong. 
In particular, the poison potential for shifts means we have to more careful with those ops. 
I added tests to make that behavior visible for all of the different cases that I could find.

This is a partial fix. To make this review easier, I did not make changes for the single binop 
pattern (handled in foldSelectShuffleWith1Binop()). I also left out some potential optimizations 
noted with TODO comments. I'll follow-up once we're confident that things are correct here.

The goal is to correct all marked FIXME tests to either avoid the shuffle transform or do it safely.

Note that distinguishing when the shuffle mask contains undefs and using getBinOpIdentity() allows 
for some improvements to div/rem patterns, so there are wins along with the missed opportunities 
and fixes.

Differential Revision: https://reviews.llvm.org/D49047

llvm-svn: 336546
2018-07-09 13:21:46 +00:00
Stefan Maksimovic 0a23998fe7 [mips] Addition of the [d]rem and [d]remu instructions
Related to http://reviews.llvm.org/D15772
Depends on http://reviews.llvm.org/D16889
Adds [D]REM[U] instructions.

Patch By: Srdjan Obucina
Contributions from: Simon Dardis

Differential Revision: https://reviews.llvm.org/D17036

llvm-svn: 336545
2018-07-09 13:06:44 +00:00
Sander de Smalen 54077dcfcb [AArch64][SVE] Asm: Support for TBL instruction.
Support for SVE's TBL instruction for programmable table
lookup/permute using vector of element indices, e.g.

  tbl  z0.d, { z1.d }, z2.d

stores elements from z1, indexed by elements from z2, into z0.

llvm-svn: 336544
2018-07-09 12:32:56 +00:00
Sam McCall d93eaeb7c3 [Support] Make JSON handle doubles and int64s losslessly
Summary:
This patch adds a new "integer" ValueType, and renames Number -> Double.
This allows us to preserve the full precision of int64_t when parsing integers
from the wire, or constructing from an integer.
The API is unchanged, other than giving asInteger() a clearer contract.

In addition, always output doubles with enough precision that parsing will
reconstruct the same double.

Reviewers: simon_tatham

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46209

llvm-svn: 336541
2018-07-09 12:16:40 +00:00
Chandler Carruth ed2965438e [PM/Unswitch] Fix a nasty bug in the new PM's unswitch introduced in
r335553 with the non-trivial unswitching of switches.

The code correctly updated most aspects of the CFG and analyses, but
missed some crucial aspects:
1) When multiple cases have the same successor, we unswitch that
   a single time and replace the switch with a direct branch. The CFG
   here is correct, but the target of this direct branch may have had
   a PHI node with multiple entries in it.
2) When we still have to clone a successor of the switch into an
   unswitched copy of the loop, we'll delete potentially multiple edges
   entering this successor, not just one.
3) We also have to delete multiple edges entering the successors in the
   original loop when they have to be retained.
4) When the "retained successor" *also* occurs as a case successor, we
   just assert failed everywhere. This doesn't happen very easily
   because its always valid to simply drop the case -- the retained
   successor for switches is always the default successor. However, it
   is likely possible through some contrivance of different loop passes,
   unrolling, and simplifying for this to occur in practice and
   certainly there is nothing "invalid" about the IR so this pass needs
   to handle it.
5) In the case of #4, we also will replace these multiple edges with
   a direct branch much like in #1 and need to collapse the entries in
   any PHI nodes to a single enrty.

All of this stems from the delightful fact that the same successor can
show up in multiple parts of the switch terminator, and each of these
are considered a distinct edge for the purpose of PHI nodes (and
iterating the successors and predecessors) but not for unswitching
itself, the dominator tree, or many other things. For the record,
I intensely dislike this "feature" of the IR in large part because of
the complexity it causes in passes like this. We already have a ton of
logic building sets and handling duplicates, and we just had to add
a bunch more.

I've added a complex test case that covers all five of the above failure
modes. I've also added a variation on it where #4 and #5 occur in loop
exit, adding fun where we have an LCSSA PHI node with "multiple entries"
despite have dedicated exits. There were no additional issues found by
this, but it seems a useful corner case to cover with testing.

One thing that working on all of this code has made painfully clear for
me as well is how amazingly inefficient our PHI node representation is
(in terms of the in-memory data structures and the APIs used to update
them). This code has truly marvelous complexity bounds because every
time we remove an entry from a PHI node we do a linear scan to find it
and then a linear update to the data structure to remove it. We could in
theory batch all of the PHI node updates into a single linear walk of
the operands making this much more efficient, but the APIs fight hard
against this and the fact that we have to handle duplicates in the
peculiar manner we do (removing all but one in some cases) makes even
implementing that very tedious and annoying. Anyways, none of this is
new here or specific to loop unswitching. All code in LLVM that updates
PHI node operands suffers from these problems.

llvm-svn: 336536
2018-07-09 10:30:48 +00:00
Sam McCall 6be3824721 Lift JSON library from clang-tools-extra/clangd to llvm/Support.
Summary:
This consists of four main parts:
 - an type json::Expr representing JSON values of dynamic kind, which can be
   composed, inspected, and modified
 - a JSON parser from string -> json::Expr
 - a JSON printer from json::Expr -> string, with optional pretty-printing
 - a convention for mapping json::Expr <=> native types (fromJSON/toJSON)
   Mapping functions are provided for primitives (e.g. int, vector) and the
   ObjectMapper helper helps implement fromJSON for struct/object types.

Based on clangd's usage, a couple of places I'd appreciate review attention:
 - fromJSON returns only bool. A richer error-signaling mechanism may be useful
   to provide useful messages, or let recursive fromJSONs (containers/structs)
   do careful error recovery.
 - should json::obj be always explicitly written (like json::ary)
 - there's no streaming parse API. I suspect there are some simple wins like
   a callback API where the document is a long array, and each element is small.
   But this can probably be bolted on easily when we see the need.

Reviewers: bkramer, labath

Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, llvm-commits

Differential Revision: https://reviews.llvm.org/D45753

llvm-svn: 336534
2018-07-09 10:05:41 +00:00
Sander de Smalen c69944c6b0 [AArch64][SVE] Asm: Support for ADR instruction.
Supporting various addressing modes:
- adr z0.s, [z0.s, z0.s]
- adr z0.s, [z0.s, z0.s, lsl #<shift>]
- adr z0.d, [z0.d, z0.d]
- adr z0.d, [z0.d, z0.d, lsl #<shift>]
- adr z0.d, [z0.d, z0.d, uxtw #<shift>]
- adr z0.d, [z0.d, z0.d, sxtw #<shift>]

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D48870

llvm-svn: 336533
2018-07-09 09:58:24 +00:00
Sander de Smalen bd513b42a1 [AArch64][SVE] Asm: Support for UZP and TRN instructions.
This patch adds support for:
  UZP1  Concatenate even elements from two vectors
  UZP2  Concatenate  odd elements from two vectors
  TRN1  Interleave  even elements from two vectors
  TRN2  Interleave   odd elements from two vectors

With variants for both data and predicate vectors, e.g.
  uzp1    z0.b, z1.b, z2.b
  trn2    p0.s, p1.s, p2.s

llvm-svn: 336531
2018-07-09 09:12:17 +00:00
Jonas Devlieghere 5e810a878d [AccelTable] Provide abstraction for emitting DWARF5 accelerator tables.
When emitting the DWARF accelerator tables from dsymutil, we don't have
a DwarfDebug instance and we use a custom class to represent Dwarf
compile units. This patch adds an interface AccelTableWriterInfo to
abstract these from the Dwarf5AccelTableWriter, so we can have a custom
implementation for this in dsymutil.

Differential revision: https://reviews.llvm.org/D49031

llvm-svn: 336529
2018-07-09 09:08:44 +00:00
Jonas Devlieghere e60ca777b6 [AccelTable] Dwarf5AccelTableEmitter -> Writer (NFC)
Renames Dwarf5AccelTableEmitter to Dwarf5AccelTableWriter as suggested
in D49031.

llvm-svn: 336525
2018-07-09 08:47:38 +00:00
Chijun Sima 9e1e0c7b2a [PGOMemOPSize] Preserve the DominatorTree
Summary:
PGOMemOPSize only modifies CFG in a couple of places; thus we can preserve the DominatorTree with little effort.
When optimizing SQLite with -O3, this patch can decrease 3.8% of the numbers of nodes traversed by DFS and 5.7% of the times DominatorTreeBase::recalculation is called.

Reviewers: kuhar, davide, dmgreen

Reviewed By: dmgreen

Subscribers: mzolotukhin, vsk, llvm-commits

Differential Revision: https://reviews.llvm.org/D48914

llvm-svn: 336522
2018-07-09 08:07:21 +00:00
Craig Topper b8145ec667 [X86] Improve the message for some asserts. Remove an if that is guaranteed true by said asserts.
This replaces some asserts in lowerV2F64VectorShuffle with the similar asserts from lowerVIF64VectorShuffle which are more readable. The original asserts mentioned a blend, but there's no guarantee that it is a blend.

Also remove an if that the asserts prove is always true. Mask[0] is always less than 2 and Mask[1] is always at least 2. Therefore (Mask[0] >= 2) + (Mask[1] >= 2) == 1 must wlays be true.

llvm-svn: 336517
2018-07-09 01:52:56 +00:00
Craig Topper c98c675f03 [X86] Remove an AddedComplexity line that seems unnecessary.
It only existed on SSE and AVX version. AVX512 version didn't have it.

I checked the generated table and this didn't seem necessary to creat a match preference.

llvm-svn: 336516
2018-07-08 22:57:33 +00:00
Roman Lebedev 75ce45376b [X86][Nearly NFC] Split SHLD/SHRD into their own WriteShiftDouble class
Summary:
{F6603964}
While there is still some discrepancies within that new group,
it is clearly separate from the other shifts.
And Agner's tables agree, these double shifts are clearly
different from the normal shifts/rotates.

I'm guessing `FeatureSlowSHLD` is related.

Indeed, a basic sched pair is *not* the /best/ match.
But keeping it in the WriteShift is /clearly/ not ideal either.
This can and likely will be fine-tuned later.

This is purely mechanical change, it does not change any numbers,
as the [lack of the change of] mca tests show.

Reviewers: craig.topper, RKSimon, andreadb

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49015

llvm-svn: 336515
2018-07-08 19:01:55 +00:00
Craig Topper 9e17073c21 [X86] Enhance combineFMA to look for FNEG behind an EXTRACT_VECTOR_ELT.
llvm-svn: 336514
2018-07-08 18:04:00 +00:00
Simon Pilgrim 2eced71ecf [X86][SSE] Combine v16i8 SHL by constants to multiplies
Pre-AVX512 (which can perform a quick extend/shift/truncate), extending to 2 v8i16 for the PMULLW and then truncating is more performant than relying on the generic PBLENDVB vXi8 shift path and uses a similar amount of mask constant pool data.

Differential Revision: https://reviews.llvm.org/D48963

llvm-svn: 336513
2018-07-08 12:47:50 +00:00
Simon Pilgrim 1795870bb8 [X86] Set scheduler classes to unsupported. NFCI.
While looking at PR36895 I noticed how much of the atom model was still setting schedules for unsupported SSE4+ instructions.

llvm-svn: 336512
2018-07-08 10:32:07 +00:00
Roman Lebedev fa988853bd [X86][Basically NFC] Sched: split WriteBitScan into WriteBSF/WriteBSR.
Summary:
Motivation: {F6597954}

This only does the mechanical splitting, does not actually change
any numbers, as the tests added in previous revision show.

Reviewers: craig.topper, RKSimon, courbet

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48998

llvm-svn: 336511
2018-07-08 09:50:25 +00:00
Craig Topper 2835278ee0 [LoopIdiomRecognize] Support for converting loops that use LSHR to CTLZ.
In the 'detectCTLZIdiom' function support for loops that use LSHR instruction instead of ASHR has been added.

This supports creating ctlz from the following code.

int lzcnt(int x) {
     int count = 0;
     while (x > 0)  {
          count++;
          x = x >> 1;
     }
    return count;
}

Patch by Olga Moldovanova

Differential Revision: https://reviews.llvm.org/D48354

llvm-svn: 336509
2018-07-08 01:45:47 +00:00
Craig Topper f1a981c705 [X86] Add back some intrinsic table entries lost in r336506.
llvm-svn: 336508
2018-07-08 01:23:49 +00:00
Craig Topper fdf3f1ff82 [X86] Add new scalar fma intrinsics with rounding mode that use f32/f64 types.
This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma.

I had to add new rounding mode CodeGenOnly instructions to support isel when we can't find a movss to grab the upper bits from to use the b_Int instruction.

Fast-isel tests have been updated to match new clang codegen.

We are currently having trouble folding fneg into the new intrinsic. I'm going to correct that in a follow up patch to keep the size of this one down.

A future patch will also remove the old intrinsics.

llvm-svn: 336506
2018-07-08 01:10:43 +00:00
Simon Pilgrim 23f9eddabe [SelectionDAG] Split float and integer isKnownNeverZero tests
Splits off isKnownNeverZeroFloat to handle +/- 0 float cases.

This will make it easier to be more aggressive with the integer isKnownNeverZero tests (similar to ValueTracking), use computeKnownBits etc.

Differential Revision: https://reviews.llvm.org/D48969

llvm-svn: 336492
2018-07-07 18:17:14 +00:00
Simon Pilgrim 0a36daa5a2 Use const APInt& to avoid extra copy. NFCI.
As discussed on D48825.

llvm-svn: 336491
2018-07-07 17:33:48 +00:00
Simon Pilgrim c1d1944053 [DAGCombiner] Add EXTRACT_SUBVECTOR to SimplifyDemandedVectorElts
As discussed on PR37989, this patch adds EXTRACT_SUBVECTOR handling to TargetLowering::SimplifyDemandedVectorElts and calls it from DAGCombiner::visitEXTRACT_SUBVECTOR.

Differential Revision: https://reviews.llvm.org/D48825

llvm-svn: 336490
2018-07-07 17:30:06 +00:00
Simon Pilgrim dc113dc7ed [CostModel][X86] Add SREM/UREM general and constant costs (PR38056)
We penalize general SDIV/UDIV costs but don't do the same for SREM/UREM.

This patch makes general vector SREM/UREM x20 as costly as scalar, the same approach as we do for SDIV/UDIV. The patch also extends the existing SDIV/UDIV constant costs for SREM/UREM - at the moment this means the additional cost of a MUL+SUB (see D48975).

Differential Revision: https://reviews.llvm.org/D48980

llvm-svn: 336486
2018-07-07 16:53:30 +00:00
Yvan Roux a96a04558d [MachineOutliner] Assert that Liveness tracking is accurate (NFC)
The checking is done deeper inside MachineBasicBlock, but this will
hopefully help to find issues when porting the machine outliner to a
target where Liveness tracking is broken (like ARM).

Differential Revision: https://reviews.llvm.org/D49023

llvm-svn: 336481
2018-07-07 08:02:19 +00:00
Chandler Carruth d8b0c8ce1b [PM/LoopUnswitch] Fix PR37889, producing the correct loop nest structure
after trivial unswitching.

This PR illustrates that a fundamental analysis update was not performed
with the new loop unswitch. This update is also somewhat fundamental to
the core idea of the new loop unswitch -- we actually *update* the CFG
based on the unswitching. In order to do that, we need to update the
loop nest in addition to the domtree.

For some reason, when writing trivial unswitching, I thought that the
loop nest structure cannot be changed by the transformation. But the PR
helps illustrate that it clearly can. I've expanded this to a number of
different test cases that try to cover the different cases of this. When
we unswitch, we move an exit edge of a loop out of the loop. If this
exit edge changes which loop reached by an exit is the innermost loop,
it changes the parent of the loop. Essentially, this transformation may
hoist the inner loop up the nest. I've added the simple logic to handle
this reliably in the trivial unswitching case. This just requires
updating LoopInfo and rebuilding LCSSA on the impacted loops. In the
trivial case, we don't even need to handle dedicated exits because we're
only hoisting the one loop and we just split its preheader.

I've also ported all of these tests to non-trivial unswitching and
verified that the logic already there correctly handles the loop nest
updates necessary.

Differential Revision: https://reviews.llvm.org/D48851

llvm-svn: 336477
2018-07-07 01:12:56 +00:00
Craig Topper 2c27e33a58 [X86] Merge INTR_TYPE_3OP_RM with INTR_TYPE_3OP. Remove unused INTR_TYPE_1OP_RM.
llvm-svn: 336476
2018-07-07 01:04:22 +00:00
Tim Shen 2ed501d656 Revert "[SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428)."
This reverts commit r336140. Our tests shows that LSR assert fails with it.

llvm-svn: 336473
2018-07-06 23:20:35 +00:00
Benjamin Kramer 9fc944ae36 [PDB] memicmp only exists on Windows, use StringRef::compare_lower instead
llvm-svn: 336469
2018-07-06 21:56:57 +00:00
Vedant Kumar 71c7c43695 Fix DIExpression::ExprOperand::appendToVector
appendToVector used the wrong overload of SmallVector::append, resulting
in it appending the same element to a vector `getSize()` times. This did
not cause a problem when initially committed because appendToVector was
only used to append 1-element operands.

This changes appendToVector to use the correct overload of append().

Testing: ./unittests/IR/IRTests --gtest_filter='*DIExpressionTest*'
llvm-svn: 336466
2018-07-06 21:06:21 +00:00
Vedant Kumar 8a3680852e Remove a redundant null-check in DIExpression::prepend, NFC
Code outside of an `if (Expr)` block dereferenced `Expr`, so the null
check was redundant.

llvm-svn: 336465
2018-07-06 21:06:20 +00:00
Zachary Turner 648bebdc67 [PDB] One more fix for hasing GSI records.
The reference implementation uses a case-insensitive string
comparison for strings of equal length.  This will cause the
string "tEo" to compare less than "VUo".  However we were using
a case sensitive comparison, which would generate the opposite
outcome.  Switch to a case insensitive comparison.  Also, when
one of the strings contains non-ascii characters, fallback to
a straight memcmp.

The only way to really test this is with a DIA test.  Before this
patch, the test will fail (but succeed if link.exe is used instead
of lld-link).  After the patch, it succeeds even with lld-link.

llvm-svn: 336464
2018-07-06 21:01:42 +00:00
Vedant Kumar b3091da3af Use Type::isIntOrPtrTy where possible, NFC
It's a bit neater to write T.isIntOrPtrTy() over `T.isIntegerTy() ||
T.isPointerTy()`.

I used Python's re.sub with this regex to update users:

  r'([\w.\->()]+)isIntegerTy\(\)\s*\|\|\s*\1isPointerTy\(\)'

llvm-svn: 336462
2018-07-06 20:17:42 +00:00
Fangrui Song 3c1b5dbbf1 [IR] Fix inconsistent declaration parameter name
llvm-svn: 336459
2018-07-06 19:26:00 +00:00
Craig Topper f61c631b25 [X86] Remove patterns for MOVLPD/MOVLPS nodes with integer types.
Lowering shouldn't generate these. If we need to use them for integer types, it should use a bitcast.

llvm-svn: 336458
2018-07-06 18:47:57 +00:00
Craig Topper 77edbffabd [X86] Add more FMA3 memory folding patterns. Remove patterns that are no longer needed.
We've removed the legacy FMA3 intrinsics and are now using llvm.fma and extractelement/insertelement. So we don't need patterns for the nodes that could only be created by the old intrinscis. Those ISD opcodes still exist because we haven't dropped the AVX512 intrinsics yet, but those should go to EVEX instructions.

llvm-svn: 336457
2018-07-06 18:47:55 +00:00
Nico Weber 038dbf3c24 Revert 336426 (and follow-ups 428, 440), it very likely caused PR38084.
llvm-svn: 336453
2018-07-06 17:37:24 +00:00
Vedant Kumar 6379a62250 [Local] replaceAllDbgUsesWith: Update debug values before RAUW
The replaceAllDbgUsesWith utility helps passes preserve debug info when
replacing one value with another.

This improves upon the existing insertReplacementDbgValues API by:

- Updating debug intrinsics in-place, while preventing use-before-def of
  the replacement value.
- Falling back to salvageDebugInfo when a replacement can't be made.
- Moving the responsibiliy for rewriting llvm.dbg.* DIExpressions into
  common utility code.

Along with the API change, this teaches replaceAllDbgUsesWith how to
create DIExpressions for three basic integer and pointer conversions:

- The no-op conversion. Applies when the values have the same width, or
  have bit-for-bit compatible pointer representations.
- Truncation. Applies when the new value is wider than the old one.
- Zero/sign extension. Applies when the new value is narrower than the
  old one.

Testing:

- check-llvm, check-clang, a stage2 `-g -O3` build of clang,
  regression/unit testing.
- This resolves a number of mis-sized dbg.value diagnostics from
  Debugify.

Differential Revision: https://reviews.llvm.org/D48676

llvm-svn: 336451
2018-07-06 17:32:39 +00:00
Tom Stellard ec4feae1b6 AMDGPU: Fix UBSan error caused by r335942
Summary: Fixes PR38071.

Reviewers: arsenm, dstenb

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48979

llvm-svn: 336448
2018-07-06 17:16:17 +00:00
Sanjay Patel e85a300a77 [Constants] extend getBinOpIdentity(); NFC
The enhanced version will be used in D48893 and related patches
and an almost identical (fadd is different) version is proposed 
in D28907, so adding this as a preliminary step.

llvm-svn: 336444
2018-07-06 15:18:58 +00:00
Sanjay Patel e6dda2fee7 [Constant] add undef element query for vector constants; NFC
This is likely to be used in D48987 and similar patches, 
so adding it as an NFC preliminary step.

llvm-svn: 336442
2018-07-06 14:52:36 +00:00
Sjoerd Meijer b3e06faa28 [ARM] ParallelDSP: added statistics, NFC.
Added statistics for the number of SMLAD instructions created, and
als renamed the pass name to -arm-parallel-dsp.

Differential Revision: https://reviews.llvm.org/D48971

llvm-svn: 336441
2018-07-06 14:47:09 +00:00
Benjamin Kramer 3687ac52a9 [LoopSink] Make the enforcement of determinism deterministic.
LoopBlockNumber is a DenseMap<BasicBlock*, int>, comparing the result of
find() will compare a pair<BasicBlock*, int>. That's of course depending
on pointer ordering which varies from run to run. Reverse iteration
doesn't find this because we're copying to a vector first.

This bug has been there since 2016 but only recently showed up on clang
selfhost with FDO and ThinLTO, which is also why I didn't manage to get
a reasonable test case for this. Add an assert that would've caught
this.

llvm-svn: 336439
2018-07-06 14:20:58 +00:00
Sjoerd Meijer 35bd8f5d1e [AArch64] Armv8.4-A: TLB support
This adds:
- outer shareable TLB Maintenance instructions, and
- TLB range maintenance instructions.

llvm-svn: 336434
2018-07-06 13:00:16 +00:00
Sjoerd Meijer a3dad801b7 Recommit: [AArch64] Armv8.4-A: Flag manipulation instructions
Now with the asm operand definition included.

llvm-svn: 336432
2018-07-06 12:32:33 +00:00
Diogo N. Sampaio 17be994942 Added missing semicolon
llvm-svn: 336428
2018-07-06 10:09:04 +00:00
Diogo N. Sampaio 742bf1a255 [SelectionDAG] https://reviews.llvm.org/D48278
D48278

Allow to reduce redundant shift masks.
For example:
x1 = x & 0xAB00
x2 = (x >> 8) & 0xAB

can be reduced to:
x1 = x & 0xAB00
x2 = x1 >> 8
It only allows folding when the masks and shift values are constants.

llvm-svn: 336426
2018-07-06 09:42:25 +00:00
Sjoerd Meijer 8203177e5e Revert [AArch64] Armv8.4-A: Flag manipulation instructions
It's causing build errors.

llvm-svn: 336422
2018-07-06 08:39:43 +00:00
Sjoerd Meijer 6f5f6d5b2e [AArch64] Armv8.4-A: Flag manipulation instructions
These instructions are added to AArch64 only.

Differential Revision: https://reviews.llvm.org/D48926

llvm-svn: 336421
2018-07-06 08:12:20 +00:00
Tim Northover 7ee46ed992 CallGraphSCCPass: iterate over all functions.
Previously we only iterated over functions reachable from the set of
external functions in the module. But since some of the passes under
this (notably the always-inliner and coroutine lowerer) are required for
correctness, they need to run over everything.

This just adds an extra layer of iteration over the CallGraph to keep
track of which functions we've already visited and get the next batch of
SCCs.

Should fix PR38029.

llvm-svn: 336419
2018-07-06 08:04:47 +00:00
Sjoerd Meijer 2a57b357a3 [AArch64][ARM] Armv8.4-A: Trace synchronization barrier instruction
This adds the Armv8.4-A Trace synchronization barrier (TSB) instruction.

Differential Revision: https://reviews.llvm.org/D48918

llvm-svn: 336418
2018-07-06 08:03:12 +00:00
Craig Topper c60e1807b3 [X86] Remove FMA4 scalar intrinsics. Use llvm.fma intrinsic instead.
The intrinsics can be implemented with a f32/f64 llvm.fma intrinsic and an insert into a zero vector.

There are a couple regressions here due to SelectionDAG not being able to pull an fneg through an extract_vector_elt. I'm not super worried about this though as InstCombine should be able to do it before we get to SelectionDAG.

llvm-svn: 336416
2018-07-06 07:14:41 +00:00
Max Kazantsev 20da7e467a Revert "[InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done"
llvm-svn: 336410
2018-07-06 04:04:13 +00:00
Craig Topper 7b35585ff1 [X86] Remove all of the avx512 masked packed fma intrinsics. Use llvm.fma or unmasked 512-bit intrinsics with rounding mode.
This upgrades all of the intrinsics to use fneg instructions to convert fma into fmsub/fnmsub/fnmadd/fmsubadd. And uses a select instruction for masking.

This matches how clang uses the intrinsics these days.

llvm-svn: 336409
2018-07-06 03:42:09 +00:00
Stefan Pintilie b351f09c9e [Power9] Add __float128 library call for frem
Power 9 does not have a hardware instruction for frem but we can call fmodf128.

Differential Revision: https://reviews.llvm.org/D48552

llvm-svn: 336406
2018-07-06 02:47:02 +00:00
Zachary Turner 1f200adfa7 [PDB] Sort globals symbols by name in GSI hash buckets.
It seems like the debugger first computes a symbol's bucket,
and then does a binary search of entries in the bucket using the
symbol's name in order to find it.  If the bucket entries are not
in sorted order, this obviously won't work.  After this patch a
couple of simple test cases show that we generate an exactly
identical GSI hash stream, which is very nice.

llvm-svn: 336405
2018-07-06 02:33:58 +00:00
Mandeep Singh Grang 083f4d7da4 [OpenEmbedded] Add OpenEmbedded vendor
Summary: The lib paths are not correctly picked up for OpenEmbedded sysroots
(like arm-oe-linux-gnueabi). I fix this in a follow-up clang patch. But in
order to add the correct libs I need to detect if the vendor is oe. For this
reason, it is first necessary to teach llvm to detect oe vendor, which is what
this patch does.

Reviewers: chandlerc, compnerd, rengolin, javed.absar

Reviewed By: compnerd

Subscribers: kristof.beyls, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D48861

llvm-svn: 336401
2018-07-05 23:41:17 +00:00
Maksim Panchenko 89e4abe7b7 [X86][Disassembler] Fix LOCK prefix disassembler support
Summary:
If LOCK prefix is not the first prefix in an instruction, LLVM
disassembler silently drops the prefix.

The fix is to select a proper instruction with a builtin LOCK prefix if
one exists.

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49001

llvm-svn: 336400
2018-07-05 23:32:42 +00:00
Michael Zolotukhin a5f2c52a1e Revert r332168: "Reapply "[PR16756] Use SSAUpdaterBulk in JumpThreading.""
There were a couple of issues reported (PR38047, PR37929) - I'll reland
the patch when I figure out and fix the rootcause.

llvm-svn: 336393
2018-07-05 22:10:31 +00:00
Heejin Ahn 80d9f1708f [WebAssembly] Add missing _S opcodes of atomic stores to InstPrinter
Summary: This was missing in D48839 (rL336145).

Reviewers: aardappel

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D48992

llvm-svn: 336390
2018-07-05 21:27:09 +00:00
Heejin Ahn e69ba6e6d5 [ORC] Add BitReader/BitWriter to target_link_libraries
Summary:
CompileOnDemandLayer.cpp uses function in these libraries, and builds
with `-DSHARED_LIB=ON` fail without this.

Reviewers: lhames

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D48995

llvm-svn: 336389
2018-07-05 21:23:15 +00:00
Sander de Smalen e2c10f8f47 This is a recommit of r336322, previously reverted in r336324 due to
a deficiency in TableGen that has been addressed in r336334.

[AArch64][SVE] Asm: Support for predicated FP rounding instructions.

This patch also adds instructions for predicated FP square-root and
reciprocal exponent.

The added instructions are:
- FRINTI  Round to integral value (current FPCR rounding mode)
- FRINTX  Round to integral value (current FPCR rounding mode, signalling inexact)
- FRINTA  Round to integral value (to nearest, with ties away from zero)
- FRINTN  Round to integral value (to nearest, with ties to even)
- FRINTZ  Round to integral value (toward zero)
- FRINTM  Round to integral value (toward minus Infinity)
- FRINTP  Round to integral value (toward plus Infinity)
- FSQRT   Floating-point square root
- FRECPX  Floating-point reciprocal exponent

llvm-svn: 336387
2018-07-05 20:21:21 +00:00
Lang Hames 7bd8970743 [ORC] In CompileOnDemandLayer2, clone modules on to different contexts by
writing them to a buffer and re-loading them.

Also introduces a multithreaded variant of SimpleCompiler
(MultiThreadedSimpleCompiler) for compiling IR concurrently on multiple
threads.

These changes are required to JIT IR on multiple threads correctly.

No test case yet. I will be looking at how to modify LLI / LLJIT to test
multithreaded JIT support soon.

llvm-svn: 336385
2018-07-05 19:01:27 +00:00
Diogo N. Sampaio 734cfd11fe Testing commit permision
llvm-svn: 336384
2018-07-05 18:49:32 +00:00
Craig Topper 88d361e976 [X86] Remove the last of the 'x86.fma.' intrinsics and autoupgrade them to 'llvm.fma'. Add upgrade tests for all.
Still need to remove the AVX512 masked versions.

llvm-svn: 336383
2018-07-05 18:43:58 +00:00
Craig Topper 4fe321d1ce [X86] Add SHUF128 to target shuffle decoding.
Differential Revision: https://reviews.llvm.org/D48954

llvm-svn: 336376
2018-07-05 17:10:17 +00:00
Matt Arsenault 24ce89b717 Fix asserts in AMDGCN fmed3 folding by handling more cases of NaN
Better NaN handling for AMDGCN fmed3.

All operands are checked for NaN now. The checks
were moved before the canonicalization to provide
a better mapping from fclamp. Changed the behaviour
of fmed3(x,y,NaN) to return max(x,y) instead of
min(x,y) in light of this. Updated tests as a result
and added some new cases to cover the fix.

Patch by Alan Baker

llvm-svn: 336375
2018-07-05 17:05:36 +00:00
Matt Arsenault 29f303799b AMDGPU/GlobalISel: Implement custom kernel arg lowering
Avoid using allocateKernArg / AssignFn. We do not want any
of the type splitting properties of normal calling convention
lowering.

For now at least this exists alongside the IR argument lowering
pass. This is necessary to handle struct padding correctly while
some arguments are still skipped by the IR argument lowering
pass.

llvm-svn: 336373
2018-07-05 17:01:20 +00:00
Simon Pilgrim 8c3765dc6b [CostModel][X86] Add UDIV/UREM by pow2 costs
Normally InstCombine would have simplified these to SRL/AND instructions but we may still see these during SLP vectorization etc.

llvm-svn: 336371
2018-07-05 16:56:28 +00:00
Lei Huang 5612b90694 [Power9] Add lib calls for float128 operations with no equivalent PPC instructions
Map the following instructions to the proper float128 lib calls:
  pow[i], exp[2], log[2|10], sin, cos, fmin, fmax

Differential Revision: https://reviews.llvm.org/D48544

llvm-svn: 336361
2018-07-05 15:21:37 +00:00
Simon Pilgrim dafd828c97 [SLPVectorizer] Begin abstracting InstructionsState alternate matching away from opcodes. NFCI.
This is an early step towards matching Instructions by attributes other than the opcode. This will be necessary for cast/call alternates which share the same opcode but have different types/intrinsicIDs etc. - which we could vectorize as long as we split them using the alternate mechanism.

Differential Revision: https://reviews.llvm.org/D48945

llvm-svn: 336344
2018-07-05 12:30:44 +00:00
Ryan Taylor 5f04458a61 [AMDGPU] Add VALU to V_INTERP Instructions
Wait states are not properly being inserted after buffer_store for v_interp instructions.

Add VALU to V_INTERP instructions so that the GCNHazardRecognizer can
check and insert the appropriate wait states when needed.

Differential Revision: https://reviews.llvm.org/D48772

Change-Id: Id540c9b074fc69b5c1de6b182276aa089c74aa64
llvm-svn: 336339
2018-07-05 12:02:07 +00:00
Simon Pilgrim 6dc45e6ca0 Try to fix -Wimplicit-fallthrough warning. NFCI.
llvm-svn: 336331
2018-07-05 09:48:01 +00:00
Aleksandar Beserminji 3239ba8c0e [mips] Fix atomic operations at O0, v3
Similar to PR/25526, fast-regalloc introduces spills at the end of basic
blocks. When this occurs in between an ll and sc, the stores can cause the
atomic sequence to fail.

This patch fixes the issue by introducing more pseudos to represent atomic
operations and moving their lowering to after the expansion of postRA
pseudos.

This version addresses issues with the initial implementation and covers
all atomic operations.

This resolves PR/32020.

Thanks to James Cowgill for reporting the issue!

Patch By: Simon Dardis

Differential Revision: https://reviews.llvm.org/D31287

llvm-svn: 336328
2018-07-05 09:27:05 +00:00
Ivan A. Kosarev 466037900c [NEON] Fix combining of vldx_dup intrinsics with updating of base addresses
Resolves:
Unsupported ARM Neon intrinsics in Target-specific DAG combine
function for VLDDUP
https://bugs.llvm.org/show_bug.cgi?id=38031

Related diff: D48439

Differential Revision: https://reviews.llvm.org/D48920

llvm-svn: 336325
2018-07-05 08:59:49 +00:00
Sander de Smalen 097ab704c9 Reverting r336322 for now, as it causes an assert failure
in TableGen, for which there is already a patch in Phabricator
(D48937) that needs to be committed first.

llvm-svn: 336324
2018-07-05 08:52:03 +00:00
Sander de Smalen ef44226c4f [AArch64][SVE] Asm: Support for predicated FP rounding instructions.
This patch also adds instructions for predicated FP square-root and
reciprocal exponent.

The added instructions are:
- FRINTI  Round to integral value (current FPCR rounding mode)
- FRINTX  Round to integral value (current FPCR rounding mode, signalling inexact)
- FRINTA  Round to integral value (to nearest, with ties away from zero)
- FRINTN  Round to integral value (to nearest, with ties to even)
- FRINTZ  Round to integral value (toward zero)
- FRINTM  Round to integral value (toward minus Infinity)
- FRINTP  Round to integral value (toward plus Infinity)
- FSQRT   Floating-point square root
- FRECPX  Floating-point reciprocal exponent

llvm-svn: 336322
2018-07-05 08:38:30 +00:00
Sjoerd Meijer 27be58b307 [ARM] ParallelDSP: only support i16 loads for now
We were miscompiling i8 loads, so reject them as unsupported narrow operations
for now.

Differential Revision: https://reviews.llvm.org/D48944

llvm-svn: 336319
2018-07-05 08:21:40 +00:00
Sander de Smalen 592718f906 [AArch64][SVE] Asm: Support for signed/unsigned MIN/MAX/ABD
This patch implements the following varieties:

- Unpredicated signed max,   e.g. smax z0.h, z1.h, #-128
- Unpredicated signed min,   e.g. smin z0.h, z1.h, #-128

- Unpredicated unsigned max, e.g. umax z0.h, z1.h, #255
- Unpredicated unsigned min, e.g. umin z0.h, z1.h, #255

- Predicated signed max,     e.g. smax z0.h, p0/m, z0.h, z1.h
- Predicated signed min,     e.g. smin z0.h, p0/m, z0.h, z1.h
- Predicated signed abd,     e.g. sabd z0.h, p0/m, z0.h, z1.h

- Predicated unsigned max,   e.g. umax z0.h, p0/m, z0.h, z1.h
- Predicated unsigned min,   e.g. umin z0.h, p0/m, z0.h, z1.h
- Predicated unsigned abd,   e.g. uabd z0.h, p0/m, z0.h, z1.h

llvm-svn: 336317
2018-07-05 07:54:10 +00:00
Lei Huang 66e22c21c3 [Power9] Optimize codgen for conversions of int to float128
Optimize code sequences for integer conversion to fp128 when the integer is a result of:
  * float->int
  * float->long
  * double->int
  * double->long

Differential Revision: https://reviews.llvm.org/D48429

llvm-svn: 336316
2018-07-05 07:46:01 +00:00
Craig Topper 350c5f1881 [X86] Remove X86 specific scalar FMA intrinsics and upgrade to tart independent FMA and extractelement/insertelement.
llvm-svn: 336315
2018-07-05 06:52:55 +00:00
Serge Pavlov 892bd81a62 [demangler] Avoid alignment warning
The alignment specified by a constant for the field
`BumpPointerAllocator::InitialBuffer` exceeded the alignment
guaranteed by `malloc` and `new` on Windows. This change set
the alignment value to that of `long double`, which is defined
by the used platform.

It fixes https://bugs.llvm.org/show_bug.cgi?id=37944.

Differential Revision: https://reviews.llvm.org/D48889

llvm-svn: 336311
2018-07-05 06:22:39 +00:00
Lei Huang a855e17f09 [Power9] Ensure float128 in non-homogenous aggregates are passed via VSX reg
Non-homogenous aggregates are passed in consecutive GPRs, in GPRs and in memory,
or in memory. This patch ensures that float128 members of non-homogenous
aggregates are passed via VSX registers.

This is done via custom lowering a bitcast of a build_pari(i64,i64) to float128
to a new PPCISD node, BUILD_FP128.

Differential Revision: https://reviews.llvm.org/D48308

llvm-svn: 336310
2018-07-05 06:21:37 +00:00
Lei Huang d17c39ccaa [Power9]Legalize and emit code for quad-precision convert from single-precision
Legalize and emit code for quad-precision floating point operation conversion of
single-precision value to quad-precision.

Differential Revision: https://reviews.llvm.org/D47569

llvm-svn: 336307
2018-07-05 04:18:37 +00:00
Lei Huang a26f3be454 [Power9] Implement float128 parameter passing and return values
This patch enable parameter passing and return by value for float128 types.
Passing aggregate/union which contain float128 members will be submitted in
subsequent patches.

Differential Revision: https://reviews.llvm.org/D47552

llvm-svn: 336306
2018-07-05 04:10:15 +00:00
Craig Topper 2db909cfae [X86] Remove some isel patterns for X86ISD::SELECTS that specifically looked for the v1i1 mask to have come from a scalar_to_vector from GR8.
We have patterns for SELECTS that top at v1i1 and we have a pattern for (v1i1 (scalar_to_vector GR8)). The patterns being removed here do the same thing as the two other patterns combined so there is no need for them.

llvm-svn: 336305
2018-07-05 03:01:29 +00:00
Craig Topper 95eb88abfe [X86] Add support for combining FMSUB/FNMADD/FNMSUB ISD nodes with an fneg input.
Previously we could only negate the FMADD opcodes. This used to be mostly ok when we lowered FMA intrinsics during lowering. But with the move to llvm.fma from target specific intrinsics, we can combine (fneg (fma)) to (fmsub) earlier. So if we start with (fneg (fma (fneg))) we would get stuck at (fmsub (fneg)).

This patch fixes that so we can also combine things like (fmsub (fneg)).

llvm-svn: 336304
2018-07-05 02:52:56 +00:00
Craig Topper e4b9257b69 [X86] Remove some of the packed FMA3 intrinsics since we no longer use them in clang.
There's a regression in here due to inability to combine fneg inputs of X86ISD::FMSUB/FNMSUB/FNMADD nodes.

More removals to come, but I wanted to stop and fix the regression that showed up in this first.

llvm-svn: 336303
2018-07-05 02:52:54 +00:00
Lei Huang 6270ab6ce4 [Power9]Legalize and emit code for round & convert quad-precision values
Legalize and emit code for round & convert float128 to double precision and
single precision.

Differential Revision: https://reviews.llvm.org/D46997

llvm-svn: 336299
2018-07-04 21:59:16 +00:00
Vladimir Stefanovic 87b60a0e00 [mips] Warn when crc, ginv, virt flags are used with too old revision
CRC and GINV ASE require revision 6, Virtualization requires revision 5.
Print a warning when revision is older than required.

Differential Revision: https://reviews.llvm.org/D48843

llvm-svn: 336296
2018-07-04 19:26:31 +00:00
Stefan Pintilie cb4f0c5c07 [PowerPC] Replace the Post RA List Scheduler with the Machine Scheduler
We want to run the Machine Scheduler instead of the List Scheduler after RA.
  Checked with a performance run on a Power 9 machine with SPEC 2006 and while
  some benchmarks improved and others degraded the geomean was slightly improved
  with the Machine Scheduler.

  Differential Revision: https://reviews.llvm.org/D45265

llvm-svn: 336295
2018-07-04 18:54:25 +00:00
Sanjay Patel 9c2e7ceb1a [InstCombine] allow narrowing of min/max/abs
We have bailout hacks based on min/max in various places in instcombine 
that shouldn't be necessary. The affected test was added for:
D48930 
...which is a consequence of the improvement in:
D48584 (https://reviews.llvm.org/rL336172)

I'm assuming the visitTrunc bailout in this patch was added specifically 
to avoid a change from SimplifyDemandedBits, so I'm just moving that 
below the EvaluateInDifferentType optimization. A narrow min/max is still
a min/max.

llvm-svn: 336293
2018-07-04 17:44:04 +00:00
Simon Pilgrim ae1c4dcc6e Fix some irregular whitespace/indentation. NFCI.
llvm-svn: 336291
2018-07-04 17:24:05 +00:00
Volodymyr Turanskyy 17c0c4e742 [ARM] [Assembler] Support negative immediates: cover few missing cases
Support for negative immediates was implemented in
https://reviews.llvm.org/rL298380, however few instruction options were missing.

This change adds negative immediates support and respective tests
for the following:

ADD
ADDS
ADDS.W
AND.W
ANDS
BIC.W
BICS
BICS.W
SUB
SUBS
SUBS.W

Differential Revision: https://reviews.llvm.org/D48649

llvm-svn: 336286
2018-07-04 16:11:15 +00:00
Yvan Roux eaececf5e0 [MachineOutliner] Fix typo in getOutliningCandidateInfo function name
getOutlininingCandidateInfo -> getOutliningCandidateInfo

Differential Revision: https://reviews.llvm.org/D48867

llvm-svn: 336285
2018-07-04 15:37:08 +00:00
Paul Semel d2af4d6f1b [llvm-objdump] Add --file-headers (-f) option
llvm-svn: 336284
2018-07-04 15:25:03 +00:00
Andrew Ng 089303d8ff [ThinLTO] Update ThinLTO cache file atimes when on Windows
ThinLTO cache file access times are used for expiration based pruning
and since Vista, file access times are not updated by Windows by
default:

https://blogs.technet.microsoft.com/filecab/2006/11/07/disabling-last-access-time-in-windows-vista-to-improve-ntfs-performance

This means on Windows, cache files are currently being pruned from
creation time. This change manually updates cache files that are
accessed by ThinLTO, when on Windows.

Patch by Owen Reynolds.

Differential Revision: https://reviews.llvm.org/D47266

llvm-svn: 336276
2018-07-04 14:17:10 +00:00
Sander de Smalen 1e4dc2e97d [AArch64][SVE] Asm: Support for reversed subtract (SUBR) instruction.
This patch adds both a vector and an immediate form, e.g.                  
                                                                           
- Vector form:                                                             
                                                                           
    subr z0.h, p0/m, z0.h, z1.h                                            
                                                                           
  subtract active elements of z0 from z1, and store the result in z0.      
                                                                           
- Immediate form:                                                          
                                                                           
    subr z0.h, z0.h, #255                                                  
                                                                           
  subtract elements of z0, and store the result in z0.

llvm-svn: 336274
2018-07-04 14:05:33 +00:00
Sander de Smalen ab2b0530d9 [AArch64][SVE] Asm: Support for instructions to set/read FFR.
Includes instructions to read the First-Faulting Register (FFR):
- RDFFR (unpredicated)
    rdffr   p0.b
- RDFFR (predicated)
    rdffr   p0.b, p0/z
- RDFFRS (predicated, sets condition flags)
    rdffr   p0.b, p0/z

Includes instructions to set/write the FFR:
- SETFFR (no arguments, sets the FFR to all true)
    setffr
- WRFFR  (unpredicated)
    wrffr   p0.b

llvm-svn: 336267
2018-07-04 12:58:46 +00:00
Sander de Smalen 80283b2af4 [AArch64][SVE] Asm: Support for FP conversion instructions.
The variants added are:

- fcvt   (FP convert precision)
- scvtf  (signed int -> FP) 
- ucvtf  (unsigned int -> FP) 
- fcvtzs (FP -> signed int (round to zero))
- fcvtzu (FP -> unsigned int (round to zero))

For example:
  fcvt   z0.h, p0/m, z0.s  (single- to half-precision FP) 
  scvtf  z0.h, p0/m, z0.s  (32-bit int to half-precision FP) 
  ucvtf  z0.h, p0/m, z0.s  (32-bit unsigned int to half-precision FP) 
  fcvtzs z0.s, p0/m, z0.h  (half-precision FP to 32-bit int)
  fcvtzu z0.s, p0/m, z0.h  (half-precision FP to 32-bit unsigned int)

llvm-svn: 336265
2018-07-04 12:13:17 +00:00
Anastasis Grammenos 204726b345 [DebugInfo][LoopVectorize] Preserve DL in generated phi instruction
When creating `phi` instructions to resume at the scalar part of the loop,
copy the DebugLoc from the original phi over to the new one.

Differential Revision: https://reviews.llvm.org/D48769

llvm-svn: 336256
2018-07-04 10:16:55 +00:00
Anastasis Grammenos 509d79789f [DebugInfo][InstCombine] Preserve DI after combining zext
When zext is EvaluatedInDifferentType, InstCombine
drops the dbg.value intrinsic. This patch tries to
preserve said DI, by inserting the zext's old DI in the
resulting instruction. (Only for integer type for now)

Differential Revision: https://reviews.llvm.org/D48331

llvm-svn: 336254
2018-07-04 09:55:46 +00:00
Simon Pilgrim c3e1617bf9 [X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values (REAPPLIED)
We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case.

Reapplied with a fixed (extra null tests) version of rL336113 after reversion in rL336189 - extra test case added at rL336247.

llvm-svn: 336250
2018-07-04 09:12:48 +00:00
Sander de Smalen e31e6d46dd [AArch64][SVE] Asm: Support for SVE condition code aliases
SVE overloads the AArch64 PSTATE condition flags and introduces
a set of condition code aliases for the assembler. The 
details are described in section 2.2 of the architecture
reference manual supplement for SVE.

In short:

  SVE alias =>  AArch64 name
  --------------------------
  NONE      => EQ
  ANY       => NE
  NLAST     => HS
  LAST      => LO
  FIRST     => MI
  NFRST     => PL
  PMORE     => HI
  PLAST     => LS
  TCONT     => GE
  TSTOP     => LT

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48869

llvm-svn: 336245
2018-07-04 08:50:49 +00:00
Max Kazantsev e8e01143ec [ImplicitNullChecks] Check for rewrite of register used in 'test' instruction
The following code pattern:

       mov %rax, %rcx
       test %rax, %rax
       %rax = ....
       je  throw_npe
       mov(%rcx), %r9
       mov(%rax), %r10

gets transformed into the following incorrect code after implicit null check pass:
        mov %rax, %rcx
       %rax = ....
       faulting_load_op("movl (%rax), %r10", throw_npe)
       mov(%rcx), %r9

For implicit null check pass, if the register that is checked for null value (ie, the register used in the 'test' instruction) is written into before the condition jump, we should avoid doing the optimization.

Patch by Surya Kumari Jangala!

Differential Revision: https://reviews.llvm.org/D48627
Reviewed By: skatkov

llvm-svn: 336241
2018-07-04 08:01:26 +00:00
Jacques Pienaar 0b1e1a4cd4 [lanai] Handle atomic load of i8 like regular load.
Loads and stores less than 64-bits are already atomic, this adds support for a special case thereof. This needs to be expanded.

llvm-svn: 336236
2018-07-03 22:57:51 +00:00
Fangrui Song 78ab286aa0 [X86][AsmParser] Fix inconsistent declaration parameter name in r336218
llvm-svn: 336232
2018-07-03 21:40:03 +00:00
Benjamin Kramer 2f0dd1405e [NVPTX] Expand v2f16 INSERT_VECTOR_ELT
Vectorization can create them.

llvm-svn: 336227
2018-07-03 20:40:04 +00:00
Craig Topper e317533dcf [X86] Remove repeated 'the' from multiple comments that have been copy and pasted. NFC
llvm-svn: 336226
2018-07-03 20:39:55 +00:00
Fangrui Song 68169343a5 [ARM] Fix inconsistent declaration parameter name in r336195
llvm-svn: 336223
2018-07-03 19:12:27 +00:00
Fangrui Song bc5c7f2ef0 [AArch64] Make function parameter names in declarations match those of definitions
llvm-svn: 336222
2018-07-03 19:07:53 +00:00
Craig Topper adc51ae425 [X86][AsmParser] Rework the in/out (%dx) hack one more time.
This patch adds a new token type specifically for (%dx). We will now always create this token when we parse (%dx). After all operands have been parsed, if the mnemonic is in/out we'll morph this token to a regular register token. Otherwise we keep it as the special DX token which won't match any instructions.

This removes the need for passing Mnemonic through the parsing functions. It also seems closer to gas where when its used on the wrong instruction it just gets diagnosed as an invalid operand rather than a bad memory address.

llvm-svn: 336218
2018-07-03 18:07:30 +00:00
Craig Topper bc598f0d61 [X86][AsmParser] Don't consider %eip as a valid register outside of 32-bit mode.
This might make the error message added in r335668 unneeded, but I'm not sure yet.

The check for RIP is technically unnecessary since RIP is in GR64, but that fact is kind of surprising so be explicit.

llvm-svn: 336217
2018-07-03 17:40:51 +00:00
Vladimir Stefanovic beb9d9799f Fix typo in lib/Support/Path.cpp to test commit access
llvm-svn: 336216
2018-07-03 17:26:43 +00:00
Sanjay Patel 8307bc407b [Constants] add identity constants for fadd/fmul
As the test diffs show, the current users of getBinOpIdentity()
are InstCombine and Reassociate. SLP vectorizer is a candidate
for using this functionality too (D28907).

The InstCombine shuffle improvements are part of the planned
enhancements noted in D48830.

InstCombine actually has several other uses of getBinOpIdentity() 
via SimplifyUsingDistributiveLaws(), but we don't call that for 
any FP ops. Fixing that might be another part of removing the
custom reassociation in InstCombine that is only done for fadd+fmul.

llvm-svn: 336215
2018-07-03 17:12:59 +00:00
Sander de Smalen 128fdfa23f [AArch64][SVE] Asm: Support for FP Complex ADD/MLA.
The variants added in this patch are:

- Predicated Complex floating point ADD with rotate, e.g.

   fcadd   z0.h, p0/m, z0.h, z1.h, #90

- Predicated Complex floating point MLA with rotate, e.g.

   fcmla   z0.h, p0/m, z1.h, z2.h, #180

- Unpredicated Complex floating point MLA with rotate (indexed operand), e.g.

   fcmla   z0.h, p0/m, z1.h, z2.h[0], #180

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48824

llvm-svn: 336210
2018-07-03 16:01:27 +00:00
Amara Emerson d912ffaba5 [AArch64][GlobalISel] Fix fallbacks introduced in r336120 due to unselectable stores.
r336120 resulted in falling back to SelectionDAG more often due to the G_STORE
MMOs not matching the vreg size. This fixes that by explicitly any-extending the
value.

llvm-svn: 336209
2018-07-03 15:59:26 +00:00
Teresa Johnson fc801f5f18 Rename lazy initialization functions to reflect behavior (NFC)
Suggested in review for D48698.

llvm-svn: 336207
2018-07-03 15:52:57 +00:00
Sander de Smalen 8cd1f53334 [AArch64][SVE] Asm: Support for FMUL (indexed)
Unpredicated FP-multiply of SVE vector with a vector-element given by
vector[index], for example:

  fmul z0.s, z1.s, z2.s[0]

which performs an unpredicated FP-multiply of all 32-bit elements in
'z1' with the first element from 'z2'.

This patch adds restricted register classes for SVE vectors:
  ZPR_3b (only z0..z7 are allowed)  - for indexed vector of 16/32-bit elements.
  ZPR_4b (only z0..z15 are allowed) - for indexed vector of 64-bit elements.

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48823

llvm-svn: 336205
2018-07-03 15:31:04 +00:00
Sander de Smalen cbd224941f [AArch64][SVE] Asm: Support for predicated unary operations.
The patch includes support for the following instructions:

       ABS z0.h, p0/m, z0.h
       NEG z0.h, p0/m, z0.h

  (S|U)XTB z0.h, p0/m, z0.h
  (S|U)XTB z0.s, p0/m, z0.s
  (S|U)XTB z0.d, p0/m, z0.d

  (S|U)XTH z0.s, p0/m, z0.s
  (S|U)XTH z0.d, p0/m, z0.d

  (S|U)XTW z0.d, p0/m, z0.d

llvm-svn: 336204
2018-07-03 14:57:48 +00:00
Simon Pilgrim 74cc4cfa94 [DAGCombiner] visitSDIV - Permit MIN_SIGNED_VALUE in pow2 vector codegen
Now that D45806 has landed, we can re-enable support for MIN_SIGNED_VALUE in the sdiv by pow2-constant code

llvm-svn: 336198
2018-07-03 14:11:32 +00:00
Sanjay Patel 3074b9e53f [InstCombine] fold shuffle-with-binop and common value
This is the last significant change suggested in PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806#c5
...though there are several follow-ups noted in the code comments 
in this patch to complete this transform.

It's possible that a binop feeding a select-shuffle has been eliminated 
by earlier transforms (or the code was just written like this in the 1st 
place), so we'll fail to match the patterns that have 2 binops from: 
D48401, 
D48678, 
D48662, 
D48485.

In that case, we can try to materialize identity constants for the remaining
binop to fill in the "ghost" lanes of the vector (where we just want to pass 
through the original values of the source operand).

I added comments to ConstantExpr::getBinOpIdentity() to show planned follow-ups. 
For now, we only handle the 5 commutative integer binops (add/mul/and/or/xor).

Differential Revision: https://reviews.llvm.org/D48830

llvm-svn: 336196
2018-07-03 13:44:22 +00:00
Sam Parker ffc1681620 [ARM][NFC] Refactor sequential access for DSP
With a view to support parallel operations that have their results
stored to memory, refactor the consecutive access helper out so it
could support stores instructions.

Differential Revision: https://reviews.llvm.org/D48872

llvm-svn: 336195
2018-07-03 12:44:16 +00:00
Bjorn Pettersson aa02580935 [IR] Strip trailing whitespace. NFC
llvm-svn: 336194
2018-07-03 12:39:52 +00:00
Sjoerd Meijer 173b7f0ec7 [AArch64] Armv8.4-A: system registers
This adds the following system registers:
- RAS registers,
- MPAM registers,
- Activitiy monitor registers,
- Trace Extension registers,
- Timing insensitivity of data processing instructions,
- Enhanced Support for Nested Virtualization.

Differential Revision: https://reviews.llvm.org/D48871

llvm-svn: 336193
2018-07-03 12:09:20 +00:00
Bjorn Pettersson 8dd6cf711f [DebugInfo] Corrections for salvageDebugInfo
Summary:
When salvaging a dbg.declare/dbg.addr we should not add
DW_OP_stack_value to the DIExpression
(see test/Transforms/InstCombine/salvage-dbg-declare.ll).

Consider this example
  %vla = alloca i32, i64 2
  call void @llvm.dbg.declare(metadata i32* %vla, metadata !1, metadata !DIExpression())

Instcombine will turn it into
  %vla1 = alloca [2 x i32]
  %vla1.sub = getelementptr inbounds [2 x i32], [2 x i32]* %vla, i64 0, i64 0
  call void @llvm.dbg.declare(metadata [2 x i32]* %vla1.sub, metadata !19, metadata !DIExpression())

If the GEP can be eliminated, then the dbg.declare will be salvaged
and we should get
  %vla1 = alloca [2 x i32]
  call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression())

The problem was that salvageDebugInfo did not recognize dbg.declare
as being indirect (%vla1 points to the value, it does not hold the
value), so we incorrectly got
  call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression(DW_OP_stack_value))

I also made sure that llvm::salvageDebugInfo and
DIExpression::prependOpcodes do not add DW_OP_stack_value to
the DIExpression in case no new operands are added to the
DIExpression. That way we avoid to, unneccessarily, turn a
register location expression into an implicit location expression
in some situations (see test11 in test/Transforms/LICM/sinking.ll).

Reviewers: aprantl, vsk

Reviewed By: aprantl, vsk

Subscribers: JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D48837

llvm-svn: 336191
2018-07-03 11:29:00 +00:00
Benjamin Kramer fd171f2f89 Revert "[X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values"
This reverts commit r336113. It causes crashes.

llvm-svn: 336189
2018-07-03 11:15:17 +00:00
Sander de Smalen 7fc8543208 [AArch64][SVE] Asm: Support for saturing ADD/SUB instructions.
The variants added are:
    signed Saturating ADD/SUB (immediate)  e.g. sqadd z0.h, z0.h, #42
  unsigned Saturating ADD/SUB (immediate)  e.g. uqadd z0.h, z0.h, #42
    signed Saturating ADD/SUB (vectors)    e.g. sqadd z0.h, z0.h, z1.h
  unsigned Saturating ADD/SUB (vectors)    e.g. uqadd z0.h, z0.h, z1.h

llvm-svn: 336186
2018-07-03 09:48:22 +00:00
Petar Jovanovic 226e6117ae [MIPS GlobalISel] Lower arguments using stack
Lower more than 4 arguments using stack. This patch targets MIPS32.
It supports only functions with arguments of type i32.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D47934

llvm-svn: 336185
2018-07-03 09:31:48 +00:00
Chandler Carruth 3897ded691 [PM/LoopUnswitch] Fix PR37651 by correctly invalidating SCEV when
unswitching loops.

Original patch trying to address this was sent in D47624, but that
didn't quite handle things correctly. There are two key principles used
to select whether and how to invalidate SCEV-cached information about
loops:

1) We must invalidate any info SCEV has cached before unswitching as we
   may change (or destroy) the loop structure by the act of unswitching,
   and make it hard to recover everything we want to invalidate within
   SCEV.

2) We need to invalidate all of the loops whose CFGs are mutated by the
   unswitching. Notably, this isn't the *entire* loop nest, this is
   every loop contained by the outermost loop reached by an exit block
   relevant to the unswitch.

And we need to do this even when doing trivial unswitching.

I've added more focused tests that directly check that SCEV starts off
with imprecise information and after unswitching (and simplifying
instructions) re-querying SCEV will produce precise information. These
tests also specifically work to check that an *outer* loop's information
becomes precise.

However, the testing here is still a bit imperfect. Crafting test cases
that reliably fail to be analyzed by SCEV before unswitching and succeed
afterward proved ... very, very hard. It took me several hours and
careful work to build these, and I'm not optimistic about necessarily
coming up with more to cover more elaborate possibilities. Fortunately,
the code pattern we are testing here in the pass is really
straightforward and reliable.

Thanks to Max Kazantsev for the initial work on this as well as the
review, and to Hal Finkel for helping me talk through approaches to test
this stuff even if it didn't come to much.

Differential Revision: https://reviews.llvm.org/D47624

llvm-svn: 336183
2018-07-03 09:13:27 +00:00
Sander de Smalen 8fcc3f5feb [AArch64][SVE] Asm: Support for vector element FP compare.
Contains the following variants:

- Compare with (elements from) other vector
  instructions: fcmeq, fcmgt, fcmge, fcmne, fcmuo.
  aliases: fcmle, fcmlt.

  e.g. fcmle   p0.h, p0/z, z0.h, z1.h => fcmge p0.h, p0/z, z1.h, z0.h

- Compare absolute values with (absolute values from) other vector.
  instructions: facge, facgt.
  aliases: facle, faclt.

  e.g. facle   p0.h, p0/z, z0.h, z1.h => facge   p0.h, p0/z, z1.h, z0.h

- Compare vector elements with #0.0
  instructions: fcmeq, fcmgt, fcmge, fcmle, fcmlt, fcmne.

  e.g. fcmle   p0.h, p0/z, z0.h, #0.0

llvm-svn: 336182
2018-07-03 09:07:23 +00:00
Shiva Chen a0a52bf195 [DebugInfo] Fix PR37395.
DbgLabelInst has no address as its operands.

Differential Revision: https://reviews.llvm.org/D46738

Patch by Hsiangkai Wang.

llvm-svn: 336176
2018-07-03 07:56:04 +00:00
Max Kazantsev 3097b76e8c [InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done
This patch changes order of transform in InstCombineCompares to avoid
performing transforms based on ranges which produce complex bit arithmetics
before more simple things (like folding with constants) are done. See PR37636
for the motivating example.

Differential Revision: https://reviews.llvm.org/D48584
Reviewed By: spatel, lebedev.ri

llvm-svn: 336172
2018-07-03 06:23:57 +00:00
Jakub Kuderski 5e3ab7a940 Reappl "[Dominators] Add the DomTreeUpdater class"
Summary:
This patch is the first in a series of patches related to the [[ http://lists.llvm.org/pipermail/llvm-dev/2018-June/123883.html | RFC - A new dominator tree updater for LLVM ]].

This patch introduces the DomTreeUpdater class, which provides a cleaner API to perform updates on available dominator trees (none, only DomTree, only PostDomTree, both) using different update strategies (eagerly or lazily) to simplify the updating process.

—Prior to the patch—

   - Directly calling update functions of DominatorTree updates the data structure eagerly while DeferredDominance does updates lazily.
   - DeferredDominance class cannot be used when a PostDominatorTree also needs to be updated.
   - Functions receiving DT/DDT need to branch a lot which is currently necessary.
   - Functions using both DomTree and PostDomTree need to call the update function separately on both trees.
   - People need to construct an additional DeferredDominance class to use functions only receiving DDT.

—After the patch—

Patch by Chijun Sima <simachijun@gmail.com>.

Reviewers: kuhar, brzycki, dmgreen, grosser, davide

Reviewed By: kuhar, brzycki

Author: NutshellySima

Subscribers: vsk, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D48383

llvm-svn: 336163
2018-07-03 02:06:23 +00:00
Erik Pilkington 988a16af92 Revert r336159, r336157. Some bots failed on qualified std::max_align_t, and other on unqualified max_align_t.
I'll take another stab at this tomorrow. Any ideas for fixing this would be appreciated!

http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/23071/steps/build_Lld/logs/stdio
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/11185/steps/build-stage1-compiler/logs/stdio

llvm-svn: 336162
2018-07-03 01:30:53 +00:00
Teresa Johnson f8182f1aef [ThinLTO] Fix printing of aliases for distributed backend indexes
Summary:
When we import an alias (which will import a copy of the aliasee), but
aren't going to import the aliasee directly, the distributed backend
index will not contain the aliasee summary. Handle this in the summary
assembly printer by printing "null" as the aliasee.

Reviewers: davidxl, dexonsmith

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits

Differential Revision: https://reviews.llvm.org/D48699

llvm-svn: 336160
2018-07-03 01:11:43 +00:00
Erik Pilkington 0409a8ade7 Some buildbots were choking on std::max_align_t, try using the global alias.
llvm-svn: 336159
2018-07-03 00:48:27 +00:00
Erik Pilkington d26ace3955 [demangler] Fix a MSVC alignment warning.
This should fix llvm.org/PR37944

llvm-svn: 336157
2018-07-03 00:23:18 +00:00
Lang Hames adae9bfa24 [ORC] Verify modules when running LLLazyJIT in LLI, and deal with fallout.
The verifier identified several modules that were broken due to incorrect
linkage on declarations. To fix this, CompileOnDemandLayer2::extractFunction
has been updated to change decls to external linkage.

llvm-svn: 336150
2018-07-02 22:30:18 +00:00
Teresa Johnson 8fc766681d [ThinLTO] Fix printing of module paths for distributed backend indexes
Summary:
In the individual index files emitted for distributed ThinLTO backends,
the module path ids are not contiguous. Assign slots to module paths in
order to handle this better and also to get contiguous numbering in the
summary assembly.

Reviewers: davidxl, dexonsmith

Subscribers: mehdi_amini, inglorion, eraman, llvm-commits, steven_wu

Differential Revision: https://reviews.llvm.org/D48698

llvm-svn: 336148
2018-07-02 22:09:23 +00:00
Heejin Ahn 402b490843 [WebAssembly] Support for atomic stores
Summary: Add support for atomic store instructions.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D48839

llvm-svn: 336145
2018-07-02 21:22:59 +00:00
Vadzim Dambrouski fd10286e04 [ARM] Fix PR37382: Don't optimize mul.with.overflow on thumbv6m.
Reviewers: efriedma, rogfer01, javed.absar

Reviewed By: efriedma, rogfer01

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D48846

llvm-svn: 336144
2018-07-02 21:05:26 +00:00
Tim Shen c7cef4bcc4 [SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428).
Summary:
Comment on Transforms/LoopVersioning/incorrect-phi.ll: With the change
SCEV is able to prove that the loop doesn't wrap-self (due to zext i16
to i64), disabling the entire loop versioning pass. Removed the zext and
just use i64.

Reviewers: sanjoy

Subscribers: jlebar, hiraditya, javed.absar, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D48409

llvm-svn: 336140
2018-07-02 20:01:54 +00:00
Dan Gohman b01d87622b [WebAssembly] Fix fast-isel optimization of branch conditions.
LLVM doesn't guarantee anything about the high bits of a register holding
an i1 value at the IR level, so don't translate LLVM IR i1 values directly
into WebAssembly conditional branch operands. WebAssembly's conditional
branches do demand all 32 bits be valid.

Fixes PR38019.

llvm-svn: 336138
2018-07-02 19:45:57 +00:00
Krzysztof Parzyszek fd97494984 [X86] Add phony registers for high halves of regs with low halves
Add registers still missing after r328016 (D43353):
- for bits 15-8  of SI, DI, BP, SP (*H), and R8-R15 (*BH),
- for bits 31-16 of R8-R15 (*WH).

Thanks to Craig Topper for pointing it out.

llvm-svn: 336134
2018-07-02 19:05:09 +00:00
Alina Sbirlea 0e15501fa7 Replace "Replacable" with "Replaceable". [NFC]
llvm-svn: 336133
2018-07-02 18:53:40 +00:00
Farhana Aleen 3b416db19b [SLP] Recognize min/max pattern using instructions producing same values.
Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization.

         %1 = extractelement <2 x i32> %a, i32 0
         %2 = extractelement <2 x i32> %a, i32 1
         %cond = icmp sgt i32 %1, %2
         %3 = extractelement <2 x i32> %a, i32 0
         %4 = extractelement <2 x i32> %a, i32 1
         %select = select i1 %cond, i32 %3, i32 %4

Author: FarhanaAleen

Reviewed By: ABataev, RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D47608

llvm-svn: 336130
2018-07-02 17:55:31 +00:00
Sanjay Patel b999d74132 [InstCombine] reverse canonicalization of add --> or to allow more shuffle folding
This extends D48485 to allow another pair of binops (add/or) to be combined either
with or without a leading shuffle:
or X, C --> add X, C (when X and C have no common bits set)

Here, we need value tracking to determine that the 'or' can be reversed into an 'add',
and we've added general infrastructure to allow extending to other opcodes or moving 
to where other passes could use that functionality.

Differential Revision: https://reviews.llvm.org/D48662

llvm-svn: 336128
2018-07-02 17:42:29 +00:00
Francis Visoiu Mistrih 4d5b1073ba [MC] Error on a .zerofill directive in a non-virtual section
On darwin, all virtual sections have zerofill type, and having a
.zerofill directive in a non-virtual section is not allowed. Instead of
asserting, show a nicer error.

In order to use the equivalent of .zerofill in a non-virtual section,
the usage of .zero of .space is required.

This patch replaces the assert with an error.

Differential Revision: https://reviews.llvm.org/D48517

llvm-svn: 336127
2018-07-02 17:29:43 +00:00
Craig Topper 56440b9745 [X86] Don't use aligned load/store instructions for fp128 if the load/store isn't aligned.
Similarily, don't fold fp128 loads into SSE instructions if the load isn't aligned. Unless we're targeting an AMD CPU that doesn't check alignment on arithmetic instructions.

Should fix PR38001

llvm-svn: 336121
2018-07-02 17:01:54 +00:00
Amara Emerson 846f2436e8 [AArch64][GlobalISel] Any-extend vararg parameters to stack slot size on Darwin.
We currently don't any-extend vararg parameters before storing them to the stack
locations on Darwin. However, SelectionDAG however does this, and so user code
is in the wild which inadvertently relies on this extension. This can manifest
in cases where the value stored is (int)0, but the actual parameter is interpreted
by va_arg as a pointer, and so not extending to 64 bits causes the callee to
load additional undefined bits.

llvm-svn: 336120
2018-07-02 16:39:09 +00:00
Jakub Kuderski 198f3b16dc Revert "[Dominators] Add the DomTreeUpdater class"
Temporary revert because of a failing test on some buildbots.

This reverts commit r336114.

llvm-svn: 336117
2018-07-02 16:10:49 +00:00
Jakub Kuderski e813a9b380 [Dominators] Add the DomTreeUpdater class
Summary:
This patch is the first in a series of patches related to the [[ http://lists.llvm.org/pipermail/llvm-dev/2018-June/123883.html | RFC - A new dominator tree updater for LLVM ]].

This patch introduces the DomTreeUpdater class, which provides a cleaner API to perform updates on available dominator trees (none, only DomTree, only PostDomTree, both) using different update strategies (eagerly or lazily) to simplify the updating process.

—Prior to the patch—

   - Directly calling update functions of DominatorTree updates the data structure eagerly while DeferredDominance does updates lazily.
   - DeferredDominance class cannot be used when a PostDominatorTree also needs to be updated.
   - Functions receiving DT/DDT need to branch a lot which is currently necessary.
   - Functions using both DomTree and PostDomTree need to call the update function separately on both trees.
   - People need to construct an additional DeferredDominance class to use functions only receiving DDT.

—After the patch—

Patch by Chijun Sima <simachijun@gmail.com>.

Reviewers: kuhar, brzycki, dmgreen, grosser, davide

Reviewed By: kuhar, brzycki

Subscribers: vsk, mgorny, llvm-commits

Author: NutshellySima

Differential Revision: https://reviews.llvm.org/D48383

llvm-svn: 336114
2018-07-02 15:37:41 +00:00
Simon Pilgrim 2bc8e079f2 [X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values
We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case.

llvm-svn: 336113
2018-07-02 15:14:07 +00:00
Sanjay Patel 284ba0c18f [ValueTracking] allow undef elements when matching vector abs
llvm-svn: 336111
2018-07-02 14:43:40 +00:00
David Stenberg 23bba56fce [CodeGen] Make block removal order deterministic in CodeGenPrepare
Summary:
Replace use of a SmallPtrSet with a SmallSetVector to make the worklist
iteration order deterministic. This is done as the order the blocks are
removed may affect whether or not PHI nodes in successor blocks are
removed.

For example, consider the following case where %bb1 and %bb2 are
removed:

    bb1:
      br i1 undef, label %bb3, label %bb4
    bb2:
      br i1 undef, label %bb4, label %bb3
    bb3:
      pv1 = phi type [ undef, %bb1 ], [ undef, %bb2], [ v0, %other ]
      br label %bb4
    bb4:
      pv2 = phi type [ undef, %bb1 ], [ undef, %bb2 ],
                     [ pv1, %bb3 ], [ v0, %other ]

If %bb2 is removed before %bb1, the incoming values from %bb1 and %bb2
to pv1 will be removed before %bb1 is removed as a predecessor to %bb4.
The pv1 node will thus be optimized out (to v0) at the time %bb1 is
removed as a predecessor to %bb4, leaving the blocks as following when
the incoming value from %bb1 has been removed:

    bb3: ; pv1 optimized out, incoming value to pv2 is v0
      br label %bb4
    bb4:
      pv2 = phi type [ v0, %bb3 ], [ v0, %other ]

The pv2 PHI node will be optimized away by removePredecessor() as all
incoming values are identical.

In case %bb2 is removed after %bb1, pv1 will not be optimized out at the
time %bb2 is removed as a predecessor to %bb4, leaving the blocks as
following when the incoming value from %bb2 to pv2 has been removed:

    bb3:
      pv1 = phi type [ undef, %bb2 ], [ v0, %other ]
      br label %bb4
    bb4:
      pv2 = phi type [ pv1, %bb3 ], [ v0, %other ]

The pv2 PHI node will thus not be removed in this case, ultimately
leading to the following output

    bb3: ; pv1 optimized out, incoming value to pv2 is v0
      br label %bb4
    bb4:
      pv2 = phi type [ v0, %bb3 ], [ v0, %other ]

I have not looked into changing DeleteDeadBlock() so that the redundant
PHI nodes are removed.

I have not added a test case, as I was not able to create a particularly
small and (not messy) reproducer. This is likely due to SmallPtrSet
behaving deterministically when in small mode.

Reviewers: void, dexonsmith, spatel, skatkov, fhahn, bkramer, nhaehnle

Reviewed By: fhahn

Subscribers: mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D48369

llvm-svn: 336109
2018-07-02 14:23:48 +00:00
Alex Bradbury c48908781d [X86] Use addAliasForDirective to support the .word directive (reland)
The X86 asm parser currently has custom parsing logic for .word. Rather than
use this custom logic, we can just use addAliasForDirective to enable the
reuse of AsmParser::parseDirectiveValue.

See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon
(rL332607) backends.

Differential Revision: https://reviews.llvm.org/D47004

This is a fixed reland of rL336100. This should have been caught in 
pre-commit testing so apologies for the noise.

llvm-svn: 336104
2018-07-02 13:49:52 +00:00
Alex Bradbury c000e4dcb5 Revert r336100
This was a bad change. .word == 2byte on x86.

llvm-svn: 336103
2018-07-02 13:43:45 +00:00
Simon Pilgrim d5fb50e3bf [SLPVectorizer] Remove nullptr early-outs from Instruction::ShuffleVector getEntryCost
This code is only used by alternate opcodes so the InstructionsState has already confirmed that every Value is an Instruction, plus we use cast<Instruction> which will assert on failure.

llvm-svn: 336102
2018-07-02 13:41:29 +00:00
Alex Bradbury 42485ec9ca [X86] Use addAliasForDirective to support the .word directive
The X86 asm parser currently has custom parsing logic for .word. Rather than 
use this custom logic, we can just use addAliasForDirective to enable the 
reuse of AsmParser::parseDirectiveValue.

See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon 
(rL332607) backends.

Differential Revision: https://reviews.llvm.org/D47004

llvm-svn: 336100
2018-07-02 13:37:15 +00:00
Florian Hahn 4ebba909a2 Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters.
This version contains a fix to add values for which the state in ParamState change
to the worklist if the state in ValueState did not change. To avoid adding the
same value multiple times, mergeInValue returns true, if it added the value to
the worklist. The value is added to the worklist depending on its state in
ValueState.

Original message:
For comparisons with parameters, we can use the ParamState lattice
elements which also provide constant range information. This improves
the code for PR33253 further and gets us closer to use
ValueLatticeElement for all values.

Also, as we are using the range information in the solver directly, we
do not need tryToReplaceWithConstantRange afterwards anymore.

Reviewers: dberlin, mssimpso, davide, efriedma

Reviewed By: mssimpso

Differential Revision: https://reviews.llvm.org/D43762

llvm-svn: 336098
2018-07-02 12:44:04 +00:00
Simon Pilgrim 265793d52a [SLPVectorizer] Fix alternate opcode + shuffle cost function to correct handle SK_Select patterns.
We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case.

This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now...

llvm-svn: 336095
2018-07-02 11:28:01 +00:00
Simon Pilgrim 409bd5f487 [SLPVectorizer] Only Alternate opcodes use ShuffleVector cases for getEntryCost/vectorizeTree. NFCI.
Add assertions - we're already assuming this in how we use the AltOpcode and treat everything as BinaryOperators.

llvm-svn: 336092
2018-07-02 10:54:19 +00:00
Sander de Smalen 8d4c01a702 [AArch64][SVE] Asm: Support for (SQ)INCP/DECP (scalar, vector)
Increments/decrements the result with the number of active bits
from the predicate.

The inc/dec variants added are:
- incp   x0, p0.h     (scalar)
- incp   z0.h, p0     (vector)

The unsigned saturating inc/dec variants added are:
- uqincp x0, p0.h     (scalar)
- uqincp w0, p0.h     (scalar, 32bit)
- uqincp z0.h, p0     (vector)

The signed saturating inc/dec variants added are:
- sqincp x0, p0.h     (scalar)
- sqincp x0, p0.h, w0 (scalar, 32bit)
- sqincp z0.h, p0     (vector)

llvm-svn: 336091
2018-07-02 10:08:36 +00:00
Sander de Smalen c504101781 [AArch64][SVE] Asm: Support for (saturating) vector INC/DEC instructions.
Increment/decrement vector by multiple of predicate constraint
element count.

The variants added by this patch are:
 - INCH, INCW, INC 

and (saturating):
 - SQINCH, SQINCW, SQINCD
 - UQINCH, UQINCW, UQINCW
 - SQDECH, SQINCW, SQINCD
 - UQDECH, UQINCW, UQINCW

For example:
  incw z0.s, all, mul #4

llvm-svn: 336090
2018-07-02 09:31:11 +00:00
Simon Pilgrim e389434a8a [X86][BtVer2] Added Jaguar FPU Pipe0/1 uop counters to permit basic llvm-exegesis uop testing
We don't have PMCs to cover many of the Jaguar resources but we can at least monitor the FPU issue pipes which give an indication of the fpu uop count, just not the execution resources.

llvm-svn: 336089
2018-07-02 09:15:01 +00:00
Petar Jovanovic 3af2c992dc [Mips][FastISel] Do not duplicate condition while lowering branches
This change fixes the issue that arises when we duplicate condition from
the predecessor block. If the condition's arguments are not considered alive
across the blocks, fast regalloc gets confused and starts generating reloads
from the slots that have never been spilled to. This change also leads to
smaller code given that, unlike on architectures with condition codes, on
Mips we can branch directly on register value, thus we gain nothing by
duplication.

Patch by Dragan Mladjenovic.

Differential Revision: https://reviews.llvm.org/D48642

llvm-svn: 336084
2018-07-02 08:56:57 +00:00
Sander de Smalen 8eea4f1c7d [AArch64][SVE] Asm: Support for vector element compares (immediate).
Compare vector elements with a signed/unsigned immediate, e.g.
  cmpgt   p0.s, p0/z, z0.s, #-16
  cmphi   p0.s, p0/z, z0.s, #127

llvm-svn: 336081
2018-07-02 08:20:59 +00:00
Sander de Smalen 0325e304b9 Reapply r334980 and r334983.
These patches were previously reverted as they led to 
buildbot time-outs caused by large switch statement in
printAliasInstr when using UBSan and O3.  The issue has
been addressed with a workaround (r335525).

llvm-svn: 336079
2018-07-02 07:34:52 +00:00
Craig Topper e06dabd3ca [X86] Put some cases in switch statements back on one line to be more compact and make it easier to see the similarities. NFC
It looks like someone ran clang-format over this entire file which reformatted these switches into a multiline form. But I think the single line form is more useful here.

llvm-svn: 336077
2018-07-02 06:42:42 +00:00
Craig Topper 0661f67296 [X86] Remove FMA3Info DenseMap. Break into sorted tables that we can binary search.
I separated out the rounding and broadcast groups into their own tables because it made the ordering in the main table easier.

Further splitting of the tables might make it possible to directly index using bits from the TSFlags, but its probably not worth it right now.

llvm-svn: 336075
2018-07-02 06:23:39 +00:00
QingShan Zhang 3b2aa2b4b4 [PowerPC] Don't make it as pre-inc candidate if displacement isn't 4's multiple for i64 pre-inc load/store
For the below case, pre-inc prep think it's a good candidate to use pre-inc for the bucket, but 64bit integer load/store update (pre-inc) instruction on Power requires the displacement field should be DS-form (4's multiple). Since it can't satisfy the constraint, we have to do some fix ups later. As below, the original load/stores could be well-form, it makes things worse.

unsigned long long result = 0;
unsigned long long foo(char *p, unsigned long long n) {
  for (unsigned long long i = 0; i < n; i++) {
    unsigned long long x1 = *(unsigned long long *)(p - 50000 + i);
    unsigned long long x2 = *(unsigned long long *)(p - 61024 + i);
    unsigned long long x3 = *(unsigned long long *)(p - 62048 + i);
    unsigned long long x4 = *(unsigned long long *)(p - 64096 + i);
    result *= x1 * x2 * x3 * x4;
  }
  return result;
}

Patch by jedilyn(Kewen Lin).

Differential Revision: https://reviews.llvm.org/D48813 
--This line, and  those below, will be ignored--

M    lib/Target/PowerPC/PPCLoopPreIncPrep.cpp
A    test/CodeGen/PowerPC/preincprep-i64-check.ll

llvm-svn: 336074
2018-07-02 05:46:09 +00:00
Piotr Padlewski 5b3db45e8f Implement strip.invariant.group
Summary:
This patch introduce new intrinsic -
strip.invariant.group that was described in the
RFC: Devirtualization v2

Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar

Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits

Differential Revision: https://reviews.llvm.org/D47103

Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 336073
2018-07-02 04:49:30 +00:00
Eric Christopher 53054141a7 Add an entry for rodata constant merge sections to the default
section flags in the ELF assembler. This matches the defaults
given in the rest of MC.

Fixes PR37997 where we couldn't assemble our own assembly output
without warnings.

llvm-svn: 336072
2018-07-02 00:16:39 +00:00
Craig Topper c004aa6c5f [X86] Remove the places that return nullptr from X86InstrInfo::commuteInstructionImpl.
findCommutedOpIndices does the pre-checking for whether commuting is possible. There should be no reason left to fail in commuteInstructionImpl. There was a missing pre-check that I've added there and changed the check to an assert in commuteInstructionImpl.

llvm-svn: 336070
2018-07-01 23:27:41 +00:00
Simon Pilgrim 3dafb553d9 [SLPVectorizer] Call InstructionsState.isOpcodeOrAlt with Instruction instead of an opcode. NFCI.
llvm-svn: 336069
2018-07-01 20:22:46 +00:00
Simon Pilgrim ef9c97c343 [SLPVectorizer] Replace sameOpcodeOrAlt with InstructionsState.isOpcodeOrAlt helper. NFCI.
This is a basic step towards matching more general instructions types than just opcodes.

llvm-svn: 336068
2018-07-01 20:07:30 +00:00
Craig Topper 4d8ec92fb0 [X86][Disassembler] Remove TYPE_BNDR from translateImmediate.
I've check the disassembler tables and this shouldn't be reachable. Which is good since if it was reachable there should have been a 'return' after the addOperand line.

llvm-svn: 336066
2018-07-01 17:50:29 +00:00
Simon Pilgrim 77d2067677 [SLPVectorizer] Use InstructionsState Op/Alt opcodes directly. NFCI.
llvm-svn: 336063
2018-07-01 13:41:58 +00:00
David Green 963401d2be [UnrollAndJam] New Unroll and Jam pass
This is a simple implementation of the unroll-and-jam classical loop
optimisation.

The basic idea is that we take an outer loop of the form:

  for i..
    ForeBlocks(i)
    for j..
      SubLoopBlocks(i, j)
    AftBlocks(i)

Instead of doing normal inner or outer unrolling, we unroll as follows:

  for i... i+=2
    ForeBlocks(i)
    ForeBlocks(i+1)
    for j..
      SubLoopBlocks(i, j)
      SubLoopBlocks(i+1, j)
    AftBlocks(i)
    AftBlocks(i+1)
  Remainder Loop

So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now jammed loops.

To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).

Differential Revision: https://reviews.llvm.org/D41953

llvm-svn: 336062
2018-07-01 12:47:30 +00:00
Eugene Leviant 6e4134459b [Evaluator] Improve evaluation of call instruction
Recommit of r335324 after buildbot failure fix

llvm-svn: 336059
2018-07-01 11:02:07 +00:00
Craig Topper a2d30b3134 [X86] Remove unnecessary include. NFC
Leftover from when the pass contained a DenseMap before it switched to binary search.

llvm-svn: 336057
2018-07-01 05:54:22 +00:00
Craig Topper 4e78213ae4 [X86] Move the memory unfolding table creation into its own class and make it a ManagedStatic.
Also move the static folding tables, their search functions and the new class into new cpp/h files.

The unfolding table is effectively static data. It's just a different ordering and a subset of the static folding tables.

By putting it in a separate ManagedStatic we ensure we only have one copy instead of one per X86InstrInfo object. This way also makes it only get initialized when really needed.

llvm-svn: 336056
2018-07-01 05:47:49 +00:00
Craig Topper 84199deb17 [X86] Move the X86InstrFMA3Info class into the cpp file. Expose only a getFMA3Group free function. NFCI
The class only exists to hold a DenseMap and is only created as a ManagedStatic. It used to expose a single static method that outside code was expected to use.

This patch moves that static function out of the class and moves it implementation into the cpp file. It can now access the ManagedStatic directly by name without the need for the other static method that accessed the ManagedStatic.

llvm-svn: 336055
2018-06-30 22:38:42 +00:00
Craig Topper 731740744f [X86] Remove the AsmName from the HAX,HDX,HCX,HBX,HSI,HDI,HBP,HSP,HIP artificial registers so they can't be parsed by the assembly parser.
There are no instructions that use them so they weren't causing any bad matches. But they weren't being diagnosed as "invalid register name" if they were used and would instead trigger some form of invalid operand.

llvm-svn: 336054
2018-06-30 22:38:41 +00:00
Craig Topper 1b7b9b8596 [X86] Use MVT::i8 for scalar shift amounts since that is what they ultimately need to legalize to.
I believe all of these are constants so legalizing them should be pretty trivial, but this saves a step.

In one case it looks like we may have been creating a shift amount larger than the shift input itself.

llvm-svn: 336052
2018-06-30 18:30:31 +00:00
Craig Topper 5f28d50d27 [X86] When combining load to BZHI, make sure we create the shift instruction with an i8 type.
This combine runs pretty late and causes us to introduce a shift after the op legalization phase has run. We need to be sure we create the shift with the proper type for the shift amount. If we don't do this, we will still re-legalize the operation properly, but we won't get a chance to fully optimize the truncate that gets inserted.

So this patch adds the necessary truncate when the shift is created. I've also narrowed the subtract that gets created to always be an i32 type. The truncate would have trigered SimplifyDemandedBits to optimize it anyway. But using a more appropriate VT here is free and saves an optimization step.

llvm-svn: 336051
2018-06-30 17:49:42 +00:00
Simon Pilgrim fae337704e [DAGCombiner] Handle correctly non-splat power of 2 -1 divisor (PR37119)
The combine added in commit 329525 overlooked the case where one, but not all, of the divisor elements is -1, -1 is the only power of two value for which the sdiv expansion recipe breaks.

Thanks to @zvi for the original patch.

Differential Revision: https://reviews.llvm.org/D45806

llvm-svn: 336048
2018-06-30 12:22:55 +00:00
Tom Stellard eebbfc2809 AMDGPU/GlobalISel: Make IMPLICIT_DEF of all sizes < 512 legal.
Summary:
We could split sizes that are not power of two into smaller sized
G_IMPLICIT_DEF instructions, but this ends up generating
G_MERGE_VALUES instructions which we then have to handle in the instruction
selector.  Since G_IMPLICIT_DEF is really a no-op it's easier just to
keep everything that can fit into a register legal.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48777

llvm-svn: 336041
2018-06-30 04:09:44 +00:00
Jessica Paquette 8bda1881ca [MachineOutliner] Add support for target-default outlining.
This adds functionality to the outliner that allows targets to
specify certain functions that should be outlined from by default.

If a target supports default outlining, then it specifies that in
its TargetOptions. In the case that it does, and the user hasn't
specified that they *never* want to outline, the outliner will
be added to the pass pipeline and will run on those default functions.

This is a preliminary patch for turning the outliner on by default
under -Oz for AArch64.

https://reviews.llvm.org/D48776

llvm-svn: 336040
2018-06-30 03:56:03 +00:00
Craig Topper 59f2f38fe0 [X86] Remove masking from avx512 rotate intrinsics. Use select in IR instead.
llvm-svn: 336035
2018-06-30 01:32:04 +00:00
Chandler Carruth 7c557f804d [instsimplify] Move the instsimplify pass to use more obvious file names
and diretory.

Also cleans up all the associated naming to be consistent and removes
the public access to the pass ID which was unused in LLVM.

Also runs clang-format over parts that changed, which generally cleans
up a bunch of formatting.

This is in preparation for doing some internal cleanups to the pass.

Differential Revision: https://reviews.llvm.org/D47352

llvm-svn: 336028
2018-06-29 23:36:03 +00:00
Zachary Turner 68e1919d14 [CodeView] Correctly compute the name of S_PROCREF symbols.
We have a function which switches on the type of a symbol record
to return a hardcoded offset into the record that contains the
symbol name.  Not all symbols have names to begin with, and for
those records we return -1 for the offset.

Names are used for various things.  Importantly for this particular
bug, a hash of the record name is used as a key for certain hash
tables which are serialied into the PDB file.  One of these hash
tables is for the global symbol stream, which is basically a
collection of S_PROCREF symbols which contain the name of the
symbol, a module, and an address offset.

However, for S_PROCREF symbols, the function to return the offset
of the name was returning -1: basically it wasn't implemented.
As a result of this, all global symbols were hashing to the same
value, essentially it was as if every single global symbol's name
was the empty string.

This manifests in the VS debugger when you try to call a function
(global or member, doesn't matter) through the immediate window
and the debugger simply reports an error because it can't find the
function.  This makes perfect sense, because it is hashing the name
for real, looking in the global symbol hash table, and there is only
1 entry there which corresponds to a symbol whose name is the empty
string.

Fixing this fixes the MSVC debugger in this case.

llvm-svn: 336024
2018-06-29 22:19:02 +00:00
Heejin Ahn a86152d0a7 [WebAssembly] Comment out a switch block in ISelDAGToDAG
Summary: Fixes PR37977.

Reviewers: RKSimon

Subscribers: dschuff, sbc100, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D48737

llvm-svn: 336017
2018-06-29 21:19:22 +00:00
Alina Sbirlea da1e80feb7 [MemorySSA] Add APIs to MemoryPhis to delete incoming blocks/values, and an updater API to remove blocks.
Summary:
MemoryPhis now have APIs analogous to BB Phis to remove an incoming value/block.
The MemorySSAUpdater uses the above APIs when updating MemorySSA given a set of dead blocks about to be deleted.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D48396

llvm-svn: 336015
2018-06-29 20:46:16 +00:00
Alex Shlyapnikov 788764ca12 [HWASan] Do not retag allocas before return from the function.
Summary:
Retagging allocas before returning from the function might help
detecting use after return bugs, but it does not work at all in real
life, when instrumented and non-instrumented code is intermixed.
Consider the following code:

F_non_instrumented() {
  T x;
  F1_instrumented(&x);
  ...
}

{
  F_instrumented();
  F_non_instrumented();
}

- F_instrumented call leaves the stack below the current sp tagged
  randomly for UAR detection
- F_non_instrumented allocates its own vars on that tagged stack,
  not generating any tags, that is the address of x has tag 0, but the
  shadow memory still contains tags left behind by F_instrumented on the
  previous step
- F1_instrumented verifies &x before using it and traps on tag mismatch,
  0 vs whatever tag was set by F_instrumented

Reviewers: eugenis

Subscribers: srhines, llvm-commits

Differential Revision: https://reviews.llvm.org/D48664

llvm-svn: 336011
2018-06-29 20:20:17 +00:00
Vedant Kumar 69ee62cef8 [LLVMContext] Detecting leaked instructions with metadata
When instructions with metadata are accidentally leaked, the result is a
difficult-to-find memory corruption in ~LLVMContextImpl that leads to
random crashes.

Patch by Arvīds Kokins!

llvm-svn: 336010
2018-06-29 20:13:13 +00:00
Paul Robinson 50f8ca38ee Pass DWARFUnit to verifier by reference not by value. I am moderately
sure this should not cause a memory leak.

llvm-svn: 336007
2018-06-29 19:17:44 +00:00
Sean Fertile cd0d7634f6 Revert "Extend CFGPrinter and CallPrinter with Heat Colors"
This reverts r335996 which broke graph printing in Polly.

llvm-svn: 336000
2018-06-29 17:48:58 +00:00
Matt Arsenault f5be3ad7f8 AMDGPU: Don't use struct type for argument layout
This was introducing unnecessary padding after the explicit
arguments, depending on the alignment of the total struct type.
Also has the side effect of avoiding creating an extra GEP for
the offset from the base kernel argument to the explicit kernel
argument offset.

llvm-svn: 335999
2018-06-29 17:31:42 +00:00
Craig Topper 87b107dd69 [X86] Limit the number of target specific nodes emitted in LowerShiftParts
The important part is the creation of the SHLD/SHRD nodes. The compare and the conditional move can use target independent nodes that can be legalized on their own. This gives some opportunities to trigger the optimizations present in the lowering for those things. And its just better to limit the number of places we emit target specific nodes.

The changed test cases still aren't optimal.

Differential Revision: https://reviews.llvm.org/D48619

llvm-svn: 335998
2018-06-29 17:24:07 +00:00
Sean Fertile 3b0535b424 Extend CFGPrinter and CallPrinter with Heat Colors
Extends the CFGPrinter and CallPrinter with heat colors based on heuristics or
profiling information. The colors are enabled by default and can be toggled
on/off for CFGPrinter by using the option -cfg-heat-colors for both
-dot-cfg[-only] and -view-cfg[-only].  Similarly, the colors can be toggled
on/off for CallPrinter by using the option -callgraph-heat-colors for both
-dot-callgraph and -view-callgraph.

Patch by Rodrigo Caetano Rocha!

Differential Revision: https://reviews.llvm.org/D40425

llvm-svn: 335996
2018-06-29 17:13:58 +00:00
Craig Topper 7c96f051d2 [X86] Use a std::vector for the memory unfolding table.
Previously we used a DenseMap which is costly to set up due to multiple full table rehashes as the size increases and causes the table to be reallocated.

This patch changes the table to a vector of structs. We now walk the reg->mem tables and push new entries in the mem->reg table for each row not marked TB_NO_REVERSE. Once all the table entries have been created, we sort the vector. Then we can use a binary search for lookups.

Differential Revision: https://reviews.llvm.org/D48585

llvm-svn: 335994
2018-06-29 17:11:26 +00:00
Petar Jovanovic cccc236a96 [mips] Support shrink-wrapping
Except for -O0, it's enabled by default.

Patch by Vladimir Stefanovic.

Differential Revision: https://reviews.llvm.org/D47947

llvm-svn: 335989
2018-06-29 16:37:16 +00:00
Stanislav Mekhanoshin 20d4795d93 [AMDGPU] Enable LICM in the BE pipeline
This allows to hoist code portion to compute reciprocal of loop
invariant denominator in integer division after codegen prepare
expansion.

Differential Revision: https://reviews.llvm.org/D48604

llvm-svn: 335988
2018-06-29 16:26:53 +00:00
Jessica Paquette 79917b9686 [MachineOutliner] Add always and never options to -enable-machine-outliner
This is a recommit of r335887, which was erroneously committed earlier.

To enable the MachineOutliner by default on AArch64, we need to be able to
disable the MachineOutliner and also provide an option to "always" enable the
outliner.

This adds that capability. It allows the user to still use the old
-enable-machine-outliner option, which defaults to "always". This is building
up to allowing the user to specify "always" versus the target default
outlining behaviour.

https://reviews.llvm.org/D48682

llvm-svn: 335986
2018-06-29 16:12:45 +00:00
Alexey Bataev 2a03d4296a [DEBUG_INFO, NVPTX] Do not emit .debug_loc section.
Summary:
.debug_loc section is not supported for NVPTX target. If there is an
object whose location can change during its lifetime, we do not generate
debug location info for this variable.

Reviewers: echristo

Subscribers: jholewinski, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D48730

llvm-svn: 335976
2018-06-29 14:23:28 +00:00
Krzysztof Parzyszek ce3a66804a [Hexagon] Remove unused instruction itineraties, NFC
llvm-svn: 335975
2018-06-29 13:55:28 +00:00
Sanjay Patel da66753e01 [InstCombine] enhance shuffle-of-binops to allow different variable ops (PR37806)
This was discussed in D48401 as another improvement for:
https://bugs.llvm.org/show_bug.cgi?id=37806

If we have 2 different variable values, then we shuffle (select) those lanes, 
shuffle (select) the constants, and then perform the binop. This eliminates a binop.

The new shuffle uses the same shuffle mask as the existing shuffle, so there's no 
danger of creating a difficult shuffle.

All of the earlier constraints still apply, but we also check for extra uses to 
avoid creating more instructions than we'll remove.

Additionally, we're disallowing the fold for div/rem because that could expose a
UB hole.

Differential Revision: https://reviews.llvm.org/D48678

llvm-svn: 335974
2018-06-29 13:44:06 +00:00
Roman Shirokiy 272eac85c7 Fix overconfident assert in ScalarEvolution::isImpliedViaMerge
We can have AddRec with loops having many predecessors.
This changes an assert to an early return.

Differential Revision: https://reviews.llvm.org/D48766

llvm-svn: 335965
2018-06-29 11:46:30 +00:00
Sjoerd Meijer 3b599d75d5 [AArch64] Armv8.4-A: Virtualization system registers
This adds the Secure EL2 extension.

Differential Revision: https://reviews.llvm.org/D48711

llvm-svn: 335962
2018-06-29 11:03:15 +00:00
Simon Pilgrim aab8660e23 [X86][SSE] Support v16i8/v32i8 vector rotations
This uses the same technique as for shifts - split the rotation into 4/2/1-bit partial rotations and select those partials based on the amount bit, making use of PBLENDVB if available. This halves the use of PBLENDVB compared to expanding to shifts, which can be a slow op.

Unfortunately I haven't found a decent way to share much of this code with the shift equivalent.

Differential Revision: https://reviews.llvm.org/D48655

llvm-svn: 335957
2018-06-29 09:36:39 +00:00
Sjoerd Meijer 195e904002 [ARM][AArch64] Armv8.4-A Enablement
Initial patch adding assembly support for Armv8.4-A.

Besides adding v8.4 as a supported architecture to the usual places, this also
adds target features for the different crypto algorithms. Armv8.4-A introduced
new crypto algorithms, made them optional, and allows different combinations:

- none of the v8.4 crypto functions are supported, which is independent of the
  implementation of the Armv8.0 SHA1 and SHA2 instructions.
- the v8.4 SHA512 and SHA3 support is implemented, in this case the Armv8.0
  SHA1 and SHA2 instructions must also be implemented.
- the v8.4 SM3 and SM4 support is implemented, which is independent of the
  implementation of the Armv8.0 SHA1 and SHA2 instructions.
- all of the v8.4 crypto functions are supported, in this case the Armv8.0 SHA1
  and SHA2 instructions must also be implemented.

The v8.4 crypto instructions are added to AArch64 only, and not AArch32,
and are made optional extensions to Armv8.2-A.

The user-facing Clang options will map on these new target features, their
naming will be compatible with GCC and added in follow-up patches.

The Armv8.4-A instruction sets can be downloaded here:
https://developer.arm.com/products/architecture/a-profile/exploration-tools

Differential Revision: https://reviews.llvm.org/D48625

llvm-svn: 335953
2018-06-29 08:43:19 +00:00
Roman Lebedev 8d081b78e4 SCEVExpander::expandAddRecExprLiterally(): check before casting as Instruction
Summary:
An alternative to D48597.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=37936 | PR37936 ]].

The problem is as follows:
1. `indvars` marks `%dec` as `NUW`.
2. `loop-instsimplify` runs `instsimplify`, which constant-folds `%dec` to -1 (D47908)
3. `loop-reduce` tries to do some further modification, but crashes
    with an type assertion in cast, because `%dec` is no longer an `Instruction`,

If the runline is split into two, i.e. you first run `-indvars -loop-instsimplify`,
store that into a file, and then run `-loop-reduce`, there is no crash.

So it looks like the problem is due to `-loop-instsimplify` not discarding SCEV.
But in this case we can just not crash if it's not an `Instruction`.
This is just a local fix, unlike D48597, so there may very well be other problems.

Reviewers: mkazantsev, uabelho, sanjoy, silviu.baranga, wmi

Reviewed By: mkazantsev

Subscribers: evstupac, javed.absar, spatel, llvm-commits

Differential Revision: https://reviews.llvm.org/D48599

llvm-svn: 335950
2018-06-29 07:44:20 +00:00
Craig Topper 875e9f8fa4 [X86] Remove masking from the avx512 packed sqrt intrinsics. Use select in IR instead.
While there improve the coverage of the intrinsic testing and add fast-isel tests.

llvm-svn: 335944
2018-06-29 05:43:26 +00:00
Tom Stellard c5a154db48 AMDGPU: Separate R600 and GCN TableGen files
Summary:
We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
registers, ISel patterns, etc.  This should help reduce compile time
since each sub-target now only has to consider information that
is specific to itself.  This will also help prevent the R600
sub-target from slowing down new features for GCN, like disassembler
support, GlobalISel, etc.

Reviewers: arsenm, nhaehnle, jvesely

Reviewed By: arsenm

Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46365

llvm-svn: 335942
2018-06-28 23:47:12 +00:00
Eli Friedman 65d885e376 [ARM] Assert that ARMDAGToDAGISel creates valid UBFX/SBFX nodes.
We don't ever check these again (unless you're using
-fno-integrated-as), so make sure the extracted bits are well-defined.

I don't think it's possible to trigger any of the assertions on trunk,
but it's difficult to prove.  (The first one depends on DAGCombine to
minimize the number of set bits in AND masks; I think the others are
mathematically impossible to hit.)

llvm-svn: 335931
2018-06-28 21:49:41 +00:00
Jessica Paquette 0c5d3ffbb8 [MachineOutliner] Never add the outliner in -O0
This is a recommit of r335879.

We shouldn't add the outliner when compiling at -O0 even if
-enable-machine-outliner is passed in. This makes sure that we
don't add it in this case.

This also removes -O0 from the outliner DWARF test.

llvm-svn: 335930
2018-06-28 21:49:24 +00:00
Jake Ehrlich 0f440d832f [llvm-readobj] Add experimental support for SHT_RELR sections
This change adds experimental support for SHT_RELR sections, proposed
here: https://groups.google.com/forum/#!topic/generic-abi/bX460iggiKg

Definitions for the new ELF section type and dynamic array tags, as well
as the encoding used in the new section are all under discussion and are
subject to change. Use with caution!

Author: rahulchaudhry

Differential Revision: https://reviews.llvm.org/D47919

llvm-svn: 335922
2018-06-28 21:07:34 +00:00
Sanjay Patel d512853aa3 [InstCombine] fix opcode check in shuffle fold
There's no way to expose this difference currently, 
but we should use the updated variable because the
original opcodes can go stale if we transform into
something new.

llvm-svn: 335920
2018-06-28 20:52:43 +00:00
Martin Storsjo 2a9bd7b756 [COFF] Fix constant sharing regression for MinGW
This fixes a regression since SVN r334523, where the object files
built targeting MinGW were rejected by GNU binutils tools. Prior to
that commit, we only put constants in comdat for MSVC configurations.

Differential Revision: https://reviews.llvm.org/D48567

llvm-svn: 335918
2018-06-28 20:28:29 +00:00
Teresa Johnson e87868b7e9 [ThinLTO] Port InlinerFunctionImportStats handling to new PM
Summary:
The InlinerFunctionImportStats will collect and dump stats regarding how
many function inlined into the module were imported by ThinLTO.

Reviewers: wmi, dexonsmith

Subscribers: mehdi_amini, inglorion, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D48729

llvm-svn: 335914
2018-06-28 20:07:47 +00:00
Benjamin Kramer 23d8282047 [NVPTX] Delete dead code
No functionality change.

llvm-svn: 335913
2018-06-28 20:05:35 +00:00
Eli Friedman 6613efbd4e [ARM] Add missing Thumb2 assembler diagnostics.
Mostly just adding checks for Thumb2 instructions which correspond to
ARM instructions which already had diagnostics. While I'm here, also fix
ARM-mode strd to check the input registers correctly.

Differential Revision: https://reviews.llvm.org/D48610

llvm-svn: 335909
2018-06-28 19:53:12 +00:00
Anastasis Grammenos 425df22ee3 [SROA] Preserve DebugLoc when rewriting alloca partitions
When rewriting an alloca partition copy the DL from the
old alloca over the the new one.

Differential Revision: https://reviews.llvm.org/D48640

llvm-svn: 335904
2018-06-28 18:58:30 +00:00
Zachary Turner 1adca7c4a5 Add a flag to FileOutputBuffer that allows modification.
FileOutputBuffer creates a temp file and on commit atomically
renames the temp file to the destination file.  Sometimes we
want to modify an existing file in place, but still have the
atomicity guarantee.  To do this we can initialize the contents
of the temp file from the destination file (if it exists), that
way the resulting FileOutputBuffer can have only selective
bytes modified.  Committing will then atomically replace the
destination file as desired.

llvm-svn: 335902
2018-06-28 18:49:09 +00:00
Simon Pilgrim c09b5e31d7 Remove unnecessary semicolon. NFCI.
Fixes -Wpedantic warning.

llvm-svn: 335901
2018-06-28 18:37:16 +00:00
Craig Topper 90317d1d94 [X86] Suppress load folding into and/or/xor if it will prevent matching btr/bts/btc.
This is a follow up to r335753. At the time I forgot about isProfitableToFold which makes this pretty easy.

Differential Revision: https://reviews.llvm.org/D48706

llvm-svn: 335895
2018-06-28 17:58:01 +00:00
Jonas Devlieghere b757fc3878 Revert "Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models""
Reverting because this is causing failures in the LLDB test suite on
GreenDragon.

  LLVM ERROR: unsupported relocation with subtraction expression, symbol
  '__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction
  expression

llvm-svn: 335894
2018-06-28 17:56:43 +00:00
Sanjay Patel 57bda365bf [InstCombine] allow shl+mul combos with shuffle (select) fold (PR37806)
This is an enhancement to D48401 that was discussed in:
https://bugs.llvm.org/show_bug.cgi?id=37806

We can convert a shift-left-by-constant into a multiply (we canonicalize IR in the other 
direction because that's generally better of course). This allows us to remove the shuffle 
as we do in the regular opcodes-are-the-same cases.

This requires a small hack to make sure we don't introduce any extra poison:
https://rise4fun.com/Alive/ZGv

Other examples of opcodes where this would work are add+sub and fadd+fsub, but we already 
canonicalize those subs into adds, so there's nothing to do for those cases AFAICT. There 
are planned enhancements for opcode transforms such or -> add.

Note that there's a different fold needed if we've already managed to simplify away a binop 
as seen in the test based on PR37806, but we manage to get that one case here because this 
fold is positioned above the demanded elements fold currently.

Differential Revision: https://reviews.llvm.org/D48485

llvm-svn: 335888
2018-06-28 17:48:04 +00:00
Jessica Paquette dafa198c96 [MachineOutliner] Define MachineOutliner support in TargetOptions
Targets should be able to define whether or not they support the outliner
without the outliner being added to the pass pipeline. Before this, the
outliner pass would be added, and ask the target whether or not it supports the
outliner.

After this, it's possible to query the target in TargetPassConfig, before the
outliner pass is created. This ensures that passing -enable-machine-outliner
will not modify the pass pipeline of any target that does not support it.

https://reviews.llvm.org/D48683

llvm-svn: 335887
2018-06-28 17:45:43 +00:00
Simon Pilgrim 9c70d48cb2 [DAGCombiner] Ensure we use the correct CC result type in visitSDIV (REAPPLIED)
We could get away with it for constant folded cases, but not for rL335719.

Thanks to Krzysztof Parzyszek for noticing.

Reapply original commit rL335821 which was reverted at rL335871 due to a WebAssembly bug that was fixed at rL335884.

llvm-svn: 335886
2018-06-28 17:33:41 +00:00
Simon Pilgrim 99f701673d [WebAssembly] Add getSetCCResultType placeholder override to handle vector compare results.
Necessary to get the rL335821 bugfix (which was reverted at rL335871) un-reverted.

llvm-svn: 335884
2018-06-28 17:27:09 +00:00
Jessica Paquette d6261bef7b Revert "[MachineOutliner] Add always and never options to -enable-machine-outliner"
I accidentally committed this instead of D48683 because I haven't had coffee
yet.

llvm-svn: 335883
2018-06-28 17:26:19 +00:00
Jessica Paquette f3a44fe833 Revert "[MachineOutliner] Never add the outliner in -O0"
This reverts commit 9c7c10e4073a0bc6a759ce5cd33afbac74930091.

It relies on r335872 since that introduces the machine outliner
flags test. I meant to commit D48683 in that commit, but got mixed
up and committed D48682 instead. So, I'm reverting this and
r335872, since D48682 hasn't made it through review yet.

llvm-svn: 335882
2018-06-28 17:26:18 +00:00
Jessica Paquette c9d675266e [MachineOutliner] Never add the outliner in -O0
We shouldn't add the outliner when compiling at -O0 even if
-enable-machine-outliner is passed in. This makes sure that we
don't add it in this case.

This also updates machine-outliner-flags to reflect the change
and improves the comment describing what that test does.

llvm-svn: 335879
2018-06-28 17:05:57 +00:00
Matthias Braun da5e7e11d1 SelectionDAGBuilder, mach-o: Skip trap after noreturn call (for Mach-O)
Add NoTrapAfterNoreturn target option which skips emission of traps
behind noreturn calls even if TrapUnreachable is enabled.

Enable the feature on Mach-O to save code size; Comments suggest it is
not possible to enable it for the other users of TrapUnreachable.

rdar://41530228

DifferentialRevision: https://reviews.llvm.org/D48674
llvm-svn: 335877
2018-06-28 17:00:45 +00:00
Jessica Paquette 1ccb66c5fb [MachineOutliner] Add always and never options to -enable-machine-outliner
To enable the MachineOutliner by default on AArch64, we need to be able to
disable the MachineOutliner and also provide an option to "always" enable the
outliner.

This adds that capability. It allows the user to still use the old
-enable-machine-outliner option, which defaults to "always". This is building
up to allowing the user to specify "always" versus the target-default
outlining behaviour.

llvm-svn: 335872
2018-06-28 16:39:42 +00:00
Haojian Wu 2103990e63 Revert "[DAGCombiner] Ensure we use the correct CC result type in visitSDIV"
This reverts commit r335821.

This crashes the webassembly test, run "ninja check-llvm-codegen-webassembly" to reproduce.

llvm-svn: 335871
2018-06-28 16:25:57 +00:00
Stanislav Mekhanoshin 67aa18f165 [AMDGPU] Early expansion of 32 bit udiv/urem
This allows hoisting of a common code, for instance if denominator
is loop invariant. Current change is expansion only, adding licm to
the target pass list going to be a separate patch. Given this patch
changes to codegen are minor as the expansion is similar to that on
DAG. DAG expansion still must remain for R600.

Differential Revision: https://reviews.llvm.org/D48586

llvm-svn: 335868
2018-06-28 15:59:18 +00:00
Stanislav Mekhanoshin 298a61590a [AMDGPU] Overload llvm.amdgcn.fmad.ftz to support f16
Differential Revision: https://reviews.llvm.org/D48677

llvm-svn: 335866
2018-06-28 15:24:46 +00:00
John Brawn bdbbd8381f Add a PhiValuesAnalysis pass to calculate the underlying values of phis
This pass is being added in order to make the information available to BasicAA,
which can't do caching of this information itself, but possibly this information
may be useful for other passes.

Incorporates code based on Daniel Berlin's implementation of Tarjan's algorithm.

Differential Revision: https://reviews.llvm.org/D47893

llvm-svn: 335857
2018-06-28 14:13:06 +00:00
Benjamin Kramer 269eb21e1c Revert "Add support for generating a call graph profile from Branch Frequency Info."
This reverts commits r335794 and r335797. Breaks ThinLTO+FDO selfhost.

llvm-svn: 335851
2018-06-28 13:15:03 +00:00
Sjoerd Meijer c89ca5582a [ARM] Parallel DSP Pass
Armv6 introduced instructions to perform 32-bit SIMD operations. The purpose of
this pass is to do some straightforward IR pattern matching to create ACLE DSP
intrinsics, which map on these 32-bit SIMD operations.

Currently, only the SMLAD instruction gets recognised. This instruction
performs two multiplications with 16-bit operands, and stores the result in an
accumulator. We will follow this up with patches to recognise SMLAD in more
cases, and also to generate other DSP instructions (like e.g. SADD16).

Patch by: Sam Parker and Sjoerd Meijer

Differential Revision: https://reviews.llvm.org/D48128

llvm-svn: 335850
2018-06-28 12:55:29 +00:00
Jesper Antonsson 514b6b5796 Comment change to verify commit rights. NFC.
Summary: Just a silly one-character correction.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48709

llvm-svn: 335832
2018-06-28 10:55:04 +00:00
Hans Wennborg a257376003 s/TablesChecked/TableChecked/ after r335823
llvm-svn: 335831
2018-06-28 10:24:38 +00:00
Matt Arsenault 75e7192ba3 AMDGPU: Remove MFI::ABIArgOffset
We have too many mechanisms for tracking the various offsets
used for kernel arguments, so remove one. There's still a lot of
confusion with these because there are two different "implicit"
argument areas located at the beginning and end of the kernarg
segment.

Additionally, the offset was determined based on the memory
size of the split element types. This would break in a future
commit where v3i32 is decomposed into separate i32 pieces.

llvm-svn: 335830
2018-06-28 10:18:55 +00:00
Matt Arsenault 1fb9013368 AMDGPU: Error on calls from graphics shaders
In principle nothing should stop these from working, but
work is necessary to create an ABI for dealing with the stack
related registers.

llvm-svn: 335829
2018-06-28 10:18:36 +00:00
Matt Arsenault 12269dda5c AMDGPU: Fix AMDGPUCodeGenPrepare using uninitialized AMDGPUAS struct
Not sure how this wasn't noticed before.

llvm-svn: 335828
2018-06-28 10:18:23 +00:00
Matt Arsenault 513e0c0ea4 AMDGPU: Fix assert on aggregate type kernel arguments
Just fix the crash for now by not doing the optimization since
figuring out how to properly convert the bits for an arbitrary
struct is a pain.

Also fix a crash when there is only an empty struct argument.

llvm-svn: 335827
2018-06-28 10:18:11 +00:00
Benjamin Kramer f9613b2995 Unify sorted asserts to use the existing atomic pattern
These are all benign races and only visible in !NDEBUG. tsan complains
about it, but a simple atomic bool is sufficient to make it happy.

llvm-svn: 335823
2018-06-28 10:03:45 +00:00
Simon Pilgrim abebe4c746 [DAGCombiner] Ensure we use the correct CC result type in visitSDIV
We could get away with it for constant folded cases, but not for rL335719.

Thanks to Krzysztof Parzyszek for noticing.

llvm-svn: 335821
2018-06-28 09:54:28 +00:00
Florian Hahn 388af14f85 [SCCP] Mark CFG as preserved.
SCCP does not change the CFG, so we can mark it as preserved.

Reviewers: dberlin, efriedma, davide

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D47149

llvm-svn: 335820
2018-06-28 09:53:38 +00:00
Simon Pilgrim 49cb65bb7b [DAGCombiner] Remove unused variable. NFCI.
Noticed in D45806 review.

llvm-svn: 335817
2018-06-28 09:29:08 +00:00
Max Kazantsev f5ba37182e [IndVarSimplify] Ignore unreachable users of truncs
If a trunc has a user in a block which is not reachable from entry,
we can safely perform trunc elimination as if this user didn't exist.

llvm-svn: 335816
2018-06-28 08:20:03 +00:00
Petar Jovanovic d175aeb881 [DwarfDebug] Remove unused argument (NFC)
Remove unused ByteStreamer argument from function emitDebugLocValue.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D48590

llvm-svn: 335811
2018-06-28 04:50:40 +00:00
Craig Topper ec5d568ac1 [X86] Use PatFrag with hardcoded numbers for FROUND_NO_EXC/FROUND_CURRENT instead of ImmLeafs with predicates where one of the two numbers was hardcoded.
This more efficient for the isel table generator since we can use CheckChildInteger instead of MoveChild, CheckPredicate, MoveParent. This reduced the table size by 1-2K.

I wish there was a way to share the values with X86BaseInfo.h and still use a PatFrag like this. These numbers are fixed by the X86 intrinsic spec going back many years and we should never need to change them. So we shouldn't waste table bytes to support sharing.

llvm-svn: 335806
2018-06-28 01:45:44 +00:00
Craig Topper ab70f58891 [X86] Change how we prefer shift by immediate over folding a load into a shift.
BMI2 added new shift by register instructions that have the ability to fold a load.

Normally without doing anything special isel would prefer folding a load over folding an immediate because the load folding pattern has higher "complexity". This would require an instruction to move the immediate into a register. We would rather fold the immediate instead and have a separate instruction for the load.

We used to enforce this priority by artificially lowering the complexity of the load pattern.

This patch changes this to instead reject the load fold in isProfitableToFoldLoad if there is an immediate. This is more consistent with other binops and feels less hacky.

llvm-svn: 335804
2018-06-28 00:47:41 +00:00
Michael J. Spencer 98f5475f44 [CGProfile] Fix unused variable warning.
llvm-svn: 335797
2018-06-28 00:12:04 +00:00
Michael J. Spencer 5bf1ead377 Add support for generating a call graph profile from Branch Frequency Info.
=== Generating the CG Profile ===

The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions.  For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight.

After scanning all the functions, it generates an appending module flag containing the data. The format looks like:
```
!llvm.module.flags = !{!0}

!0 = !{i32 5, !"CG Profile", !1}
!1 = !{!2, !3, !4} ; List of edges
!2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32
!3 = !{void (i1)* @freq, void ()* @a, i64 11}
!4 = !{void (i1)* @freq, void ()* @b, i64 20}
```

Differential Revision: https://reviews.llvm.org/D48105

llvm-svn: 335794
2018-06-27 23:58:08 +00:00
Zachary Turner ee8010abe3 Move some code from PDBFileBuilder to MSFBuilder.
The code to emit the pieces of the MSF file were actually in
PDBFileBuilder.  Move this to MSFBuilder so that we can
theoretically emit an MSF without having a PDB file.

llvm-svn: 335789
2018-06-27 21:18:15 +00:00
Benjamin Kramer e214f046af [X86] Make folding table checking threadsafe
This is a benign race, but tsan likes to complain about it. Just make it
happy.

llvm-svn: 335788
2018-06-27 21:01:53 +00:00
Craig Topper 880e34ed45 [X86] In X86DAGToDAGISel::PreprocessISelDAG, make sure we don't access N after we delete it.
If we turn X86ISD::AND into ISD::AND, we delete N. But we were continuing onto the next block of code even though N no longer existed.

Just happened to notice it. I assume asan didn't notice it because we explicitly unpoison deleted nodes and give them a DELETE_NODE opcode.

llvm-svn: 335787
2018-06-27 20:58:46 +00:00
Sameer AbuAsal 9b65ffb097 [RISCV] Add machine function pass to merge base + offset
Summary:
   In r333455 we added a peephole to fix the corner cases that result
   from separating base + offset lowering of global address.The
   peephole didn't handle some of the cases because it only has a basic
   block view instead of a function level view.

   This patch replaces that logic with a machine function pass. In
   addition to handling the original cases it handles uses of the global
   address across blocks in function and folding an offset from LW\SW
   instruction. This pass won't run for OptNone compilation, so there
   will be a negative impact overall vs the old approach at O0.

Reviewers: asb, apazos, mgrang

Reviewed By: asb

Subscribers: MartinMosbeck, brucehoult, the_o, rogfer01, mgorny, rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, llvm-commits, edward-jones

Differential Revision: https://reviews.llvm.org/D47857

llvm-svn: 335786
2018-06-27 20:51:42 +00:00
Nirav Dave 7c57ae57a8 [DAGCombine] Disable TokenFactor simplifications when optnone.
llvm-svn: 335773
2018-06-27 19:41:25 +00:00
Fangrui Song b0d57a535b [X86] Fix unmatched parenthesis in r335768
llvm-svn: 335769
2018-06-27 19:12:07 +00:00
Craig Topper 6bea2c7f9b [X86] Teach the disassembler to use %eiz/%riz instead of NoRegister when the SIB byte is present, but doesn't encode an index register and there was another shorter encoding that would achieve the same result.
The %eiz/%riz are dummy registers that force the encoder to emit a SIB byte when it normally wouldn't. By emitting them in the disassembly output we ensure that assembling the disassembler output would also produce a SIB byte.

This should match the behavior of objdump from binutils.

llvm-svn: 335768
2018-06-27 19:03:36 +00:00
Daniel Sanders bdeb880d14 [globalisel][legalizer] Add AtomicOrdering to LegalityQuery and use it in AArch64
Now that we have the ability to legalize based on MMO's. Add support for
legalizing based on AtomicOrdering and use it to correct the legalization
of the atomic instructions.

Also extend all() to be a variadic template as this ruleset now requires
3 and 4 argument versions.

llvm-svn: 335767
2018-06-27 19:03:21 +00:00
Sanjay Patel d052de856d [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros
As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc 
can produce -0.0 where the original code does not:

#include <stdio.h>
  
int main(int argc) {
  float x;
  x = -0.8 * argc;
  printf("%f\n", (float)((int)x));
  return 0;
}

$ clang -O0 -mavx fp.c ; ./a.out 
0.000000
$ clang -O1 -mavx fp.c ; ./a.out 
-0.000000

Ideally, we'd use IR/node flags to predicate the transform, but the IR parser 
doesn't currently allow fast-math-flags on the cast instructions. So for now, 
just use the function attribute that corresponds to clang's "-fno-signed-zeros" 
option.

Differential Revision: https://reviews.llvm.org/D48085

llvm-svn: 335761
2018-06-27 18:16:40 +00:00
Teresa Johnson 7e7b13d016 [ThinLTO] Print names in function import debug messages when available
Summary:
Rather than just print the GUID, when it is available in the index,
print the global name as well in the function import thin link debug
messages. Names will be available when the combined index is being
built by the same process, e.g. a linker or "llvm-lto2 run".

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits

Differential Revision: https://reviews.llvm.org/D48612

llvm-svn: 335760
2018-06-27 18:03:39 +00:00
Jessica Paquette f472f6159a [MachineOutliner] Don't outline sequences where x16/x17/nzcv are live across
It isn't safe to outline sequences of instructions where x16/x17/nzcv live
across the sequence.

This teaches the outliner to check whether or not a specific canidate has
x16/x17/nzcv live across it and discard the candidate in the case that that is
true.

https://bugs.llvm.org/show_bug.cgi?id=37573
https://reviews.llvm.org/D47655

llvm-svn: 335758
2018-06-27 17:43:27 +00:00
Craig Topper 812fcb35e7 [X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit position
If we are just modifying a single bit at a variable bit position we can use the BT* instructions to make the change instead of shifting a 1(or rotating a -1) and doing a binop. These instruction also ignore the upper bits of their index input so we can also remove an and if one is present on the index.

Fixes PR37938.

llvm-svn: 335754
2018-06-27 16:47:39 +00:00
Jakub Kuderski 555e41bbf2 [AliasSet] Fix UnknownInstructions printing
Summary:
AliasSet::print uses `I->printAsOperand` to print UnknownInstructions. The problem is that not all UnknownInstructions have names (e.g. call instructions). When such instructions are printed, they appear as `<badref>` in AliasSets, which is very confusing, as the values are perfectly valid.

This patch fixes that by printing UnknownInstructions without a name using `print` instead of `printAsOperand`.

Reviewers: asbirlea, chandlerc, sanjoy, grosser

Reviewed By: asbirlea

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48609

llvm-svn: 335751
2018-06-27 16:34:30 +00:00
Craig Topper 31cbe75b3b [X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name.
I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask".

llvm-svn: 335744
2018-06-27 15:57:53 +00:00
Stanislav Mekhanoshin 1a1687f1bb [AMDGPU] Convert rcp to rcp_iflag
If a source of rcp instruction is a result of any conversion from
an integer convert it into rcp_iflag instruction. No FP exception
can ever happen except division by zero if a single precision rcp
argument is a representation of an integral number.

Differential Revision: https://reviews.llvm.org/D48569

llvm-svn: 335742
2018-06-27 15:33:33 +00:00
Luke Geeson 316327150b [AArch64] Reverting FP16 vcvth_n_s64_f16 to fix
llvm-svn: 335737
2018-06-27 14:34:40 +00:00
Adhemerval Zanella cadcfed7aa [AArch64] Add custom lowering for v4i8 trunc store
This patch adds a custom trunc store lowering for v4i8 vector types.
Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h)
and default action for v4i8 is to extract each element and issue 4
byte stores.

A better strategy would be to extended the promoted v4i16 to v8i16
(with undef elements) and extract and store the word lane which
represents the v4i8 subvectores. The construction:

  define void @foo(<4 x i16> %x, i8* nocapture %p) {
    %0 = trunc <4 x i16> %x to <4 x i8>
    %1 = bitcast i8* %p to <4 x i8>*
    store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2
    ret void
  }

Can be optimized from:

  umov    w8, v0.h[3]
  umov    w9, v0.h[2]
  umov    w10, v0.h[1]
  umov    w11, v0.h[0]
  strb    w8, [x0, #3]
  strb    w9, [x0, #2]
  strb    w10, [x0, #1]
  strb    w11, [x0]
  ret

To:

  xtn     v0.8b, v0.8h
  str     s0, [x0]
  ret

The patch also adjust the memory cost for autovectorization, so the C
code:

  void foo (const int *src, int width, unsigned char *dst)
  {
    for (int i = 0; i < width; i++)
       *dst++ = *src++;
  }

can be vectorized to:

  .LBB0_4:                                // %vector.body
                                          // =>This Inner Loop Header: Depth=1
        ldr     q0, [x0], #16
        subs    x12, x12, #4            // =4
        xtn     v0.4h, v0.4s
        xtn     v0.8b, v0.8h
        st1     { v0.s }[0], [x2], #4
        b.ne    .LBB0_4

Instead of byte operations.

llvm-svn: 335735
2018-06-27 13:58:46 +00:00
Ivan A. Kosarev 7231598fce [NEON] Support vldNq intrinsics in AArch32 (LLVM part)
This patch adds support for the q versions of the dup
(load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for
example.

Currently, non-q versions of the dup intrinsics are implemented
in clang by generating IR that first loads the elements of the
structure into the first lane with the lane (to-single-lane)
intrinsics, and then propagating it other lanes. There are at
least two problems with this approach. First, there are no
double-spaced to-single-lane byte-element instructions. For
example, there is no such instruction as 'vld2.8 { d0[0], d2[0]
}, [r0]'. That means we cannot rely on the to-single-lane
intrinsics and instructions to implement the q versions of the
dup intrinsics. Note that to-all-lanes instructions do support
all sizes of data items, including bytes.

The second problem with the current approach is that we need a
separate vdup instruction to propagate the structure to each
lane. So for vld4q_dup_f16() we would need four vdup instructions
in addition to the initial vld instruction.

This patch introduces dup LLVM intrinsics and reworks handling of
the currently supported (non-q) NEON dup intrinsics to expand
them into those LLVM intrinsics, thus eliminating the need for
using to-single-lane intrinsics and instructions.

Additionally, this patch adds support for u64 and s64 dup NEON
intrinsics. These are marked as Arch64-only in the ARM NEON
Reference, but it seems there are no reasons to not support them
in AArch32 mode. Please correct, if that is wrong.

That's what we generate with this patch applied:

vld2q_dup_f16:
  vld2.16 {d0[], d2[]}, [r0]
  vld2.16 {d1[], d3[]}, [r0]

vld3q_dup_f16:
  vld3.16 {d0[], d2[], d4[]}, [r0]
  vld3.16 {d1[], d3[], d5[]}, [r0]

vld4q_dup_f16:
  vld4.16 {d0[], d2[], d4[], d6[]}, [r0]
  vld4.16 {d1[], d3[], d5[], d7[]}, [r0]

Differential Revision: https://reviews.llvm.org/D48439

llvm-svn: 335733
2018-06-27 13:57:52 +00:00
Simon Pilgrim d3e583a52d [DAGCombiner] visitSDIV - add special case handling for (sdiv X, 1) -> X in pow2 expansion
For divisor = 1, perform a select of X - reduces scalarisation of simple SDIVs

llvm-svn: 335727
2018-06-27 12:45:31 +00:00
Simon Pilgrim e835f662fa [DAGCombiner] visitSDIV - simplify pow2 handling. NFCI.
Use the builtin constant folding of getNode() etc. instead of doing it manually.

llvm-svn: 335720
2018-06-27 10:51:55 +00:00
Simon Pilgrim dfbcc66adc [DAGCombiner] Fold SDIV(%X, MIN_SIGNED) -> SELECT(%X == MIN_SIGNED, 1, 0)
Fixes PR37569.

llvm-svn: 335719
2018-06-27 10:21:06 +00:00
Simon Pilgrim 0a566bc0ae [DAGCombiner] Don't accept signbit sdiv divisors in sdiv-by-pow2 vector expansion (PR37569)
llvm-svn: 335717
2018-06-27 09:41:22 +00:00
Luke Geeson 68cb233c0f [AArch64] Remove Duplicate FP16 Patterns with same encoding, match on existing patterns
llvm-svn: 335715
2018-06-27 09:20:13 +00:00
Konstantin Zhuravlyov 30f03b3bc0 AMDGPU/NFC: Fix typo in comment
llvm-svn: 335707
2018-06-27 05:36:03 +00:00
Vedant Kumar f6c0b41fb7 [InstCombine] Avoid creating mis-sized dbg.values in commonCastTransforms()
This prevents InstCombine from creating mis-sized dbg.values when
replacing a sequence of casts with a simpler cast. For example, in:

  (fptrunc (floor (fpext X))) -> (floorf X)

We no longer emit dbg.value(X) (with a 32-bit float operand) to describe
(fpext X) (which is a 64-bit float).

This was diagnosed by the debugify check added in r335682.

llvm-svn: 335696
2018-06-27 00:47:53 +00:00
Craig Topper 33aba0eb4c [X86] Don't store register and memory FMA3 opcodes in the same X86InstrFMA3Group.
Nothing was using this relationship. By splitting them we no longer need to worry about register or memory entries being empty in a group.

The memory folding tables in X86InstrInfo.cpp can be used to access this relationship if needed.

llvm-svn: 335694
2018-06-27 00:42:24 +00:00
Evgeniy Stepanov 289a7d4c7d Revert "[asan] Instrument comdat globals on COFF targets"
Causes false positive ODR violation reports on __llvm_profile_raw_version.

llvm-svn: 335681
2018-06-26 22:43:48 +00:00
Lang Hames 2f17824463 [ORC] Don't call isa<> on a null value.
This should fix the recent builder failures in the test-global-ctors.ll testcase.

llvm-svn: 335680
2018-06-26 22:43:01 +00:00
Lang Hames 8f9dbb1d64 [ORC] Fix a missing return value.
llvm-svn: 335677
2018-06-26 22:30:42 +00:00
Michael Zolotukhin d3b8bdef01 [JumpThreading] Don't try to rewrite a use if it's already valid.
Summary:
When recording uses we need to rewrite after cloning a loop we need to
check if the use is not dominated by the original def. The initial
assumption was that the cloned basic block will introduce a new path and
thus the original def will only dominate the use if they are in the same
BB, but as the reproducer from PR37745 shows it's not always the case.

This fixes PR37745.

Reviewers: haicheng, Ka-Ka

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48111

llvm-svn: 335675
2018-06-26 22:19:48 +00:00
Lang Hames bf7b532cbc [ORC] Add a dependence on MC to LLVMBuild.txt
llvm-svn: 335673
2018-06-26 22:12:02 +00:00
Lang Hames 6a94134b11 [ORC] Add LLJIT and LLLazyJIT, and replace OrcLazyJIT in LLI with LLLazyJIT.
LLJIT is a prefabricated ORC based JIT class that is meant to be the go-to
replacement for MCJIT. Unlike OrcMCJITReplacement (which will continue to be
supported) it is not API or bug-for-bug compatible, but targets the same
use cases: Simple, non-lazy compilation and execution of LLVM IR.

LLLazyJIT extends LLJIT with support for function-at-a-time lazy compilation,
similar to what was provided by LLVM's original (now long deprecated) JIT APIs.

This commit also contains some simple utility classes (CtorDtorRunner2,
LocalCXXRuntimeOverrides2, JITTargetMachineBuilder) to support LLJIT and
LLLazyJIT.

Both of these classes are works in progress. Feedback from JIT clients is very
welcome!

llvm-svn: 335670
2018-06-26 21:35:48 +00:00
Konstantin Zhuravlyov 777477705a AMDGPU: Silence unused warnings in waitcnt insertion pass in release build
Differential Revision: https://reviews.llvm.org/D48607

llvm-svn: 335669
2018-06-26 21:33:38 +00:00
Jessica Paquette 67599c2e1e [X86][AsmParser] Recommit r335658
Recommit of r335658 so that it does not change the behaviour of any
existing error output.

llvm-svn: 335668
2018-06-26 21:30:34 +00:00
Vedant Kumar 1cb63dc2d5 Rename skipDebugInfo -> skipDebugIntrinsics, NFC
This addresses post-commit feedback about the name 'skipDebugInfo' being
misleading. This name could be interpreted as meaning 'a function that
skips instructions with debug locations'.

The new name, 'skipDebugIntrinsics', makes it clear that this function
only skips debug info intrinsics.

Thanks to Adrian Prantl for pointing this out!

llvm-svn: 335667
2018-06-26 21:16:59 +00:00
Lang Hames 2795a0a06e [ORC] Reset AsynchronousSymbolQuery's NotifySymbolsResolved callback on error.
AsynchronousSymbolQuery::canStillFail checks the value of the callback to
prevent sending it redundant error notifications, so we need to reset it after
running it.

llvm-svn: 335664
2018-06-26 20:59:50 +00:00
Lang Hames 831c575829 [ORC] Move the VSOList typedef out of VSO.
llvm-svn: 335663
2018-06-26 20:59:49 +00:00
Lang Hames ec8f5c8e5a [ORC] Fix a FIXME by moving MangleAndInterner to Core.h.
llvm-svn: 335661
2018-06-26 20:59:46 +00:00
Jessica Paquette 0a80af0761 Revert "[X86][AsmParser] Emit an error when RIP-relative instructions are used in 32-bit mode"
This reverts commit 4850a9aae8b38c7deadc103d634ec7397e6c323b.

It caused MC/X86/x86_errors.s to fail. Will fix and recommit shortly.

llvm-svn: 335660
2018-06-26 20:57:19 +00:00
Jessica Paquette 0e40d4bfc3 [X86][AsmParser] Emit an error when RIP-relative instructions are used in 32-bit mode
Right now, when we use RIP-relative instructions in 32-bit mode, we'll just
assert and crash.

This adds an error message which tells the user that they can't do that in
32-bit mode, so that we don't crash (and also can see the issue outside of
assert builds).

llvm-svn: 335658
2018-06-26 20:33:46 +00:00
Stanislav Mekhanoshin dacda79ee6 [AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsic
This intrinsic selects v_mad_f32 regardless of fp32 denorm support.

Differential Revision: https://reviews.llvm.org/D48573

llvm-svn: 335654
2018-06-26 20:04:19 +00:00
Sanjay Patel fb9c440ba5 [DAGCombiner] use isBitwiseNot to simplify code; NFC
llvm-svn: 335652
2018-06-26 19:46:56 +00:00
Matt Arsenault 8c4a35237a AMDGPU: Add pass to lower kernel arguments to loads
This replaces most argument uses with loads, but for
now not all.

The code in SelectionDAG for calling convention lowering
is actively harmful for amdgpu_kernel. It attempts to
split the argument types into register legal types, which
results in low quality code for arbitary types. Since
all kernel arguments are passed in memory, we just want the
raw types.

I've tried a couple of methods of mitigating this in SelectionDAG,
but it's easier to just bypass this problem alltogether. It's
possible to hack around the problem in the initial lowering,
but the real problem is the DAG then expects to be able to use
CopyToReg/CopyFromReg for uses of the arguments outside the block.

Exposing the argument loads in the IR also has the advantage
that the LoadStoreVectorizer can merge them.

I'm not sure the best approach to dealing with the IR
argument list is. The patch as-is just leaves the IR arguments
in place, so all the existing code will still compute the same
kernarg size and pointlessly lowers the arguments.

Arguably the frontend should emit kernels with an empty argument
list in the first place. Alternatively a dummy array could be
inserted as a single argument just to reserve space.

This does have some disadvantages. Local pointer kernel arguments can
no longer have AssertZext placed  on them as the equivalent !range
metadata is not valid on pointer  typed loads. This is mostly bad
for SI which needs to know about the known bits in order to use the
DS instruction offset, so in this case this is not done.

More importantly, this skips noalias arguments since this pass
does not yet convert this to the equivalent !alias.scope and !noalias
metadata. Producing this metadata correctly seems to be tricky,
although this logically is the same as inlining into a function which
doesn't exist. Additionally, exposing these loads to the vectorizer
may result in degraded aliasing information if a pointer load is
merged with another argument load.

I'm also not entirely sure this is preserving the current clover
ABI, although I would greatly prefer if it would stop widening
arguments and match the HSA ABI. As-is I think it is extending
< 4-byte arguments to 4-bytes but doesn't align them to 4-bytes.

llvm-svn: 335650
2018-06-26 19:10:00 +00:00
Matt Arsenault 7e991d30c0 ConstantFold: Don't fold global address vs. null for addrspace != 0
Not sure why this logic seems to be repeated in 2 different places,
one called by the other.

On AMDGPU addrspace(3) globals start allocating at 0, so these
checks will be incorrect (not that real code actually tries
to compare these addresses)

llvm-svn: 335649
2018-06-26 18:55:43 +00:00
Vedant Kumar 78ff0f1b83 Use a variable to appease a no-asserts bot, NFC
Failure URL:
http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/22836

llvm-svn: 335648
2018-06-26 18:55:26 +00:00
Tim Shen b32823cbe9 [ConstantRange] Add support of mul in makeGuaranteedNoWrapRegion.
Summary: This is trying to add support for r334428.

Reviewers: sanjoy

Subscribers: jlebar, hiraditya, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D48399

llvm-svn: 335646
2018-06-26 18:54:10 +00:00
Matt Arsenault 2c1a570aab LoopUnroll: Allow analyzing intrinsic call costs
I'm not sure why the code here is skipping calls since
TTI does try to do something for general calls, but it
at least should allow intrinsics.

Skip intrinsics that should not be omitted as calls, which
is by far the most common case on AMDGPU.

llvm-svn: 335645
2018-06-26 18:51:17 +00:00
Vedant Kumar c85ca4cdab [Local] Add a convenient insertReplacementDbgValues overload, NFC
Add an overload for the common case where the replacement dbg.values
have the same DIExpressions as the originals.

llvm-svn: 335643
2018-06-26 18:44:53 +00:00
Vedant Kumar de46f65bbd [Local] Sink salvageDI's early exit into helper functions, NFC
salvageDebugInfo() performs a check that allows it to exit early without
doing a DenseMap lookup. It's a bit neater and marginally more useful to
sink this early exit into the findDbg{Addr,Users,Values} helpers.

llvm-svn: 335642
2018-06-26 18:44:52 +00:00
Brendon Cahoon b7169c435a [Hexagon] Add a "generic" cpu
Add the generic processor for Hexagon so that it can be used
with 3rd party programs that create a back-end with the
"generic" CPU. This patch also enables the JIT for Hexagon.

Differential Revision: https://reviews.llvm.org/D48571

llvm-svn: 335641
2018-06-26 18:44:05 +00:00
Simon Pilgrim 7f55af37f4 [DAGCombiner] Don't accept -1 sdiv divisors in sdiv-by-pow2 vector expansion (PR37119)
Temporary fix until I've managed to get D45806 updated - both +1 and -1 special cases need to be properly supported.

llvm-svn: 335637
2018-06-26 17:46:51 +00:00
Sanjay Patel ad0bfb844d [InstSimplify] fold shifts by sext bool
https://rise4fun.com/Alive/c3Y

llvm-svn: 335633
2018-06-26 17:31:38 +00:00
Sanjay Patel 9adea01c9f [InstCombine] simplify code for urem fold; NFCI
llvm-svn: 335623
2018-06-26 16:39:29 +00:00
Sanjay Patel 3575f0c0b3 [InstCombine] fold urem with sext bool divisor
Similar to other patches in this series:
https://reviews.llvm.org/rL335512
https://reviews.llvm.org/rL335527
https://reviews.llvm.org/rL335597
https://reviews.llvm.org/rL335616

...this is filling a gap in analysis that is exposed by an unrelated select-of-constants transform.
I didn't see a way to unify the sext cases because each div/rem opcode results in a different fold.

Note that in this case, the backend might want to convert the select into math:
Name: sext urem
%e = sext i1 %x to i32
%r = urem i32 %y, %e
=>
%c = icmp eq i32 %y, -1
%z = zext i1 %c to i32
%r = add i32 %z, %y

llvm-svn: 335622
2018-06-26 16:30:00 +00:00
Simon Pilgrim bbfc18b5b5 [SLPVectorizer] Recognise non uniform power of 2 constants
Since D46637 we are better at handling uniform/non-uniform constant Pow2 detection; this patch tweaks the SLP argument handling to support them.

As SLP works with arrays of values I don't think we can easily use the pattern match helpers here.

Differential Revision: https://reviews.llvm.org/D48214

llvm-svn: 335621
2018-06-26 16:20:16 +00:00
Simon Pilgrim 133b1cdf08 [DAGCombiner] Pull out VT bitwidth in visitSDIV. NFCI.
llvm-svn: 335617
2018-06-26 15:39:16 +00:00
Sanjay Patel 2b7e31095d [InstSimplify] fold srem with sext bool divisor
llvm-svn: 335616
2018-06-26 15:32:54 +00:00
Krzysztof Parzyszek 9f199ebec0 Silence "unused variable" warning in LiveIntervals.cpp after r335607
llvm-svn: 335610
2018-06-26 14:55:04 +00:00
Krzysztof Parzyszek 70f027022c Account for undef values from predecessors in extendSegmentsToUses
It is legal for a PHI node not to have a live value in a predecessor
as long as the end of the predecessor is jointly dominated by an undef
value.

llvm-svn: 335607
2018-06-26 14:37:16 +00:00
Simon Pilgrim aa2bf2be31 [TargetLowering] isVectorClearMaskLegal - use ArrayRef<int> instead of const SmallVectorImpl<int>&
This is more generic and matches isShuffleMaskLegal.

Differential Revision: https://reviews.llvm.org/D48591

llvm-svn: 335605
2018-06-26 14:15:31 +00:00
Than McIntosh 3190993a02 [X86,ARM] Retain split-stack prolog check for sibling calls
Summary:
If a routine with no stack frame makes a sibling call, we need to
preserve the stack space check even if the local stack frame is empty,
since the call target could be a "no-split" function (in which case
the linker needs to be able to fix up the prolog sequence in order to
switch to a larger stack).

This fixes PR37807.

Reviewers: cherry, javed.absar

Subscribers: srhines, llvm-commits

Differential Revision: https://reviews.llvm.org/D48444

llvm-svn: 335604
2018-06-26 14:11:30 +00:00
Simon Pilgrim cfe2f9d4d2 Fix spelling mistakes in comments. NFCI.
llvm-svn: 335603
2018-06-26 14:06:23 +00:00
Teresa Johnson 63ee0e73e4 [ThinLTO] Parse module summary index from assembly
Summary:
Adds assembly parsing support for the module summary index (follow on
to r333335 which added the assembly writing support).

I added support to llvm-as to invoke the index parsing, so that it can
create either a bitcode file with a Module and a per-module index, or
a combined index without a Module.

I will send follow on patches soon to do the following:
- add support to tools such as llvm-lto2 to parse the per-module indexes
from assembly instead of bitcode when testing the thin link.
- verification support.

Depends on D47844 and D47842.

Reviewers: pcc, dexonsmith, mehdi_amini

Subscribers: inglorion, eraman, steven_wu, llvm-commits

Differential Revision: https://reviews.llvm.org/D47905

llvm-svn: 335602
2018-06-26 13:56:49 +00:00
Sanjay Patel 7c45debaea [InstCombine] fold udiv with sext bool divisor
Note: I didn't add a hasOneUse() check because the existing,
related fold doesn't have that check. I suspect that the
improved analysis and codegen make these some of the rare
canonicalization cases where we allow an increase in
instructions.

llvm-svn: 335597
2018-06-26 12:41:15 +00:00
Tim Northover b73efb85ba ARM: correctly decode VFP instructions following unpredictable t2IT
When the condition code for an IT instruction is "AL" we get strange "15"
predicates on subsequent instructions. These are dealt with for most
instructions by treating them as "ARMCC::AL", but VFP takes a different path
which didn't have this code.

llvm-svn: 335594
2018-06-26 11:39:20 +00:00
Tim Northover bf54858115 ARM: diagnose unpredictable IT instructions
IT instructions are allowed to have the 'AL' predicate, but it must never
result in an 'NV' predicated instruction. Essentially this means that all
branches must be 't' rather than 'e' if the predicate is 'AL'.

This patch adds a diagnostic for this during assembly (error because parsing
hits an assertion if allowed to continue) and an annotation during disassembly.

llvm-svn: 335593
2018-06-26 11:38:41 +00:00
Simon Pilgrim bfaa09220b [X86] Just use ArrayRef instead of SmallVectorImpl in a few static method arguments. NFCI.
llvm-svn: 335590
2018-06-26 10:45:41 +00:00
Florian Hahn 4a69b0bb36 [IPSCCP] Change dead blocks to unreachable after visiting all executable blocks.
changeToUnreachable may remove PHI nodes from executable blocks we found values
for and we would fail to replace them. By changing dead blocks to unreachable after
we replaced constants in all executable blocks, we ensure such PHI nodes are replaced
by their known value before.

Fixes PR37780.

Reviewers: efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D48421

llvm-svn: 335588
2018-06-26 10:15:02 +00:00
Simon Pilgrim dcf5bd271f Fix MSVC "signed/unsigned mismatch" warning. NFCI.
llvm-svn: 335587
2018-06-26 10:02:12 +00:00
Simon Pilgrim 9b3b0fe763 Fix MSVC "not all control paths return a value" warnings. NFCI.
llvm-svn: 335584
2018-06-26 09:31:18 +00:00
Bjorn Pettersson 550517bcab Improve ConvertDebugDeclareToDebugValue
Summary:
This is a follow-up to r334830 and r335031.

In the valueCoversEntireFragment check we now also handle
the situation when there is a variable length array (VLA)
involved, and the length of the array has been reduced to
a constant.

The ConvertDebugDeclareToDebugValue functions that are related
to PHI nodes and load instructions now avoid inserting dbg.value
intrinsics when the value does not, for certain, cover the
variable/fragment that should be described.
In r334830 we assumed that the value always covered the entire
var/fragment and we had assertions in the code to show that
assumption. However, those asserts failed when compiling code
with VLAs, so we removed the asserts in r335031. Now when we
know that the valueCoversEntireFragment check can fail also for
PHI/Load instructions we avoid to insert the faulty dbg.value
intrinsic in such situations. Compared to the Store instruction
scenario we simply drop the dbg.value here (as the variable does
not change its value due to PHI/Load, so an earlier dbg.value
describing the variable should still be valid).

Reviewers: aprantl, vsk, efriedma

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48547

llvm-svn: 335580
2018-06-26 06:17:00 +00:00
Gil Rapaport da2e2caa6c [InstCombine] (A + 1) + (B ^ -1) --> A - B
Turn canonicalized subtraction back into (-1 - B) and combine it with (A + 1) into (A - B).
This is similar to the folding already done for (B ^ -1) + Const into (-1 + Const) - B.

Differential Revision: https://reviews.llvm.org/D48535

llvm-svn: 335579
2018-06-26 05:31:18 +00:00
Craig Topper 08dae1682d [X86] Don't use getScalarShiftAmountTy to get the immediate type for target specific VSHLDQ/VSRLDQ nodes.
These opcodes have a fixed type of i8 for their immediate and shouldn't have anything to do with the scalar shift amount used by target independent shift nodes.

llvm-svn: 335578
2018-06-26 04:53:42 +00:00
Dan Gohman 910ba33d0c [WebAssembly] Fix lowering of varargs functions with non-legal fixed arguments.
CallLoweringInfo's NumFixedArgs field gives the number of fixed arguments
before legalization. The ISD::OutputArg "Outs" array holds legalized
arguments, so when indexing into it to find the non-fixed arguemn, we need
to use the number of arguments after legalization.

Fixes PR37934.

llvm-svn: 335576
2018-06-26 03:18:38 +00:00
Craig Topper c42ed4e3c4 [X86] Use XOR for SUB (C, X) during isel if will help fold an immediate
Summary:
Same idea as D48529, but restricted to X86 and done very late to avoid any surprises where subtract might be better for DAG combining.

This seems like the safest way to do this trick. And we consider doing it as a DAG combine later.

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48557

llvm-svn: 335575
2018-06-26 03:11:15 +00:00
Dan Gohman fd2f7aeb12 [WebAssembly] Fix a typo in a comment.
llvm-svn: 335574
2018-06-26 03:03:41 +00:00
Teresa Johnson 519055336d [ThinLTO] Add string saver onto index for value names
Summary:
Adds a string saver to the ModuleSummaryIndex so it can store value
names in the case of adding a ValueInfo for a GUID when we don't
have the name stored in a Module string table. This is motivated
by the upcoming summary parser patch, where we will read value names
from the summary entry and want to store them, even when a Module
is not available.

Currently this allows us to store the name in the legacy bitcode case,
and I have added a test to show that.

Reviewers: pcc, dexonsmith

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits

Differential Revision: https://reviews.llvm.org/D47842

llvm-svn: 335570
2018-06-26 02:29:08 +00:00
Craig Topper 689e363ff2 [X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction.
This recommits r335562 and 335563 as a single commit.

The frontend will surround the intrinsic with the appropriate marshalling to/from a scalar type to match the sigature of the builtin that software expects.

By exposing the vXi1 type directly in the llvm intrinsic we make it available to optimizers much earlier. This can enable the scalar marshalling code to be optimized away.

llvm-svn: 335568
2018-06-26 01:37:02 +00:00
Teresa Johnson 9766fd64fb [ThinLTO] Add per-module indexes to combined index consistently
Summary:
Without this change we only add module paths to the combined index when
there is a module hash or at least one global value. Make this more
consistent by adding the module to the index whenever there is a summary
section, and it is a per-module summary (had a MODULE_CODE_SOURCE_FILENAME
record).

Since we will no longer add module paths lazily, add a new interface to get
the module info from the index that asserts it is already added.

Fixes PR37899.

Reviewers: Vlad, pcc

Subscribers: mehdi_amini, inglorion, steven_wu, llvm-commits

Differential Revision: https://reviews.llvm.org/D48511

llvm-svn: 335567
2018-06-26 01:32:58 +00:00
Craig Topper 6f4fdfa9af Revert r335562 and 335563 "[X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction."
These were supposed to have been squashed to a single commit.

llvm-svn: 335566
2018-06-26 01:31:53 +00:00
Lang Hames ce72161ddf [ORC] Add a symbolAliases function to the Core APIs.
symbolAliases can be used to define symbol aliases within a VSO.

llvm-svn: 335565
2018-06-26 01:22:29 +00:00
Craig Topper 9b4322ce31 foo
llvm-svn: 335562
2018-06-26 00:43:34 +00:00
Teresa Johnson 7bea1aad6a [ThinLTO] Compute GUID directly from GV when building per-module index
Summary:
I discovered when writing the summary parsing support that the
per-module index builder and writer are computing the GUID from the
value name alone (ignoring the linkage type). This was ok since those
GUID were not emitted in the bitcode, and there are never multiple
conflicting names in a single module.

However, I don't see a reason for making the GUID computation different
for the per-module case. It also makes things simpler on the parsing
side to have the GUID computation consistent. So this patch changes the
summary analysis phase and the per-module summary writer to compute the
GUID using the facility on the GlobalValue.

Reviewers: pcc, dexonsmith

Subscribers: llvm-commits, inglorion

Differential Revision: https://reviews.llvm.org/D47844

llvm-svn: 335560
2018-06-26 00:20:49 +00:00
Eric Christopher b7a52bb28a Add a warning if someone attempts to add extra section flags to sections
with well defined semantics like .rodata.

llvm-svn: 335558
2018-06-25 23:53:54 +00:00
Tim Shen 802c31cc28 [APInt] Add helpers for rounding u/sdivs.
Reviewers: sanjoy, craig.topper

Subscribers: jlebar, hiraditya, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D48498

llvm-svn: 335557
2018-06-25 23:49:20 +00:00
Chandler Carruth 1652996fd6 [PM/LoopUnswitch] Teach the new unswitch to handle nontrivial
unswitching of switches.

This works much like trivial unswitching of switches in that it reliably
moves the switch out of the loop. Here we potentially clone the entire
loop into each successor of the switch and re-point the cases at these
clones.

Due to the complexity of actually doing nontrivial unswitching, this
patch doesn't create a dedicated routine for handling switches -- it
would duplicate far too much code. Instead, it generalizes the existing
routine to handle both branches and switches as it largely reduces to
looping in a few places instead of doing something once. This actually
improves the results in some cases with branches due to being much more
careful about how dead regions of code are managed. With branches,
because exactly one clone is created and there are exactly two edges
considered, somewhat sloppy handling of the dead regions of code was
sufficient in most cases. But with switches, there are much more
complicated patterns of dead code and so I've had to move to a more
robust model generally. We still do as much pruning of the dead code
early as possible because that allows us to avoid even cloning the code.

This also surfaced another problem with nontrivial unswitching before
which is that we weren't as precise in reconstructing loops as we could
have been. This seems to have been mostly harmless, but resulted in
pointless LCSSA PHI nodes and other unnecessary cruft. With switches, we
have to get this *right*, and everything benefits from it.

While the testing may seem a bit light here because we only have two
real cases with actual switches, they do a surprisingly good job of
exercising numerous edge cases. Also, because we share the logic with
branches, most of the changes in this patch are reasonably well covered
by existing tests.

The new unswitch now has all of the same fundamental power as the old
one with the exception of the single unsound case of *partial* switch
unswitching -- that really is just loop specialization and not
unswitching at all. It doesn't fit into the canonicalization model in
any way. We can add a loop specialization pass that runs late based on
profile data if important test cases ever come up here.

Differential Revision: https://reviews.llvm.org/D47683

llvm-svn: 335553
2018-06-25 23:32:54 +00:00
Sanjay Patel 38a86d3136 [InstCombine] cleanup udiv folds; NFCI
This removes a "UDivFoldAction" in favor of a simple constant
matcher. In theory, the existing code could do more matching,
but I don't see any evidence or need for it. I've left a TODO
about using ValueTracking in case we see any regressions.

llvm-svn: 335545
2018-06-25 22:50:26 +00:00
Benjamin Kramer 1649774816 [Instrumentation] Remove unused include
It's also a layering violation.

llvm-svn: 335528
2018-06-25 21:43:09 +00:00
Sanjay Patel 6a96d90acd [InstCombine] fold sdiv with sext bool divisor
llvm-svn: 335527
2018-06-25 21:39:41 +00:00
Florian Hahn b10b141a79 Revert r335513: [SCEVExp] Advance found insertion point
llvm-svn: 335522
2018-06-25 20:55:26 +00:00
Craig Topper 27847868b7 [LoopIdiomRecognize] Fix a couple places where it appears we were unintenionally making copies of DebugLoc.
llvm-svn: 335521
2018-06-25 20:45:45 +00:00
Craig Topper 913abc8b58 [X86] Simplify intrinsic table binary search to not require a temporary struct.
std::lower_bound doesn't require the thing to search for to be the same type as the table entries. We just need to define an appropriate comparison function that can take an table entry and an intrinsic number.

llvm-svn: 335518
2018-06-25 20:27:46 +00:00
Craig Topper 614f192471 [X86] Add comment about the sorting of the memory folding tables added in r335501.
llvm-svn: 335517
2018-06-25 20:11:16 +00:00
Lei Huang 5d109ee3d4 [PowerPC] Fix incorrectly encoded wait instruction
Encoding for the wait instruction was wrong. Fix according to ISA 3.0.

Differential Revision: https://reviews.llvm.org/D48550

llvm-svn: 335514
2018-06-25 19:28:27 +00:00
Florian Hahn 5947c17fd4 [SCEVExp] Advance found insertion point until we find a non-dbg instruction.
This avoids creating unnecessary casts if the IP used to be a dbg info
intrinsic. Fixes PR37727.

Reviewers: vsk, aprantl, sanjoy, efriedma

Reviewed By: vsk, efriedma

Differential Revision: https://reviews.llvm.org/D47874

llvm-svn: 335513
2018-06-25 19:17:29 +00:00
Sanjay Patel 1e911fa746 [InstSimplify] fold div/rem of zexted bool
I was looking at an unrelated fold and noticed that
we don't have this simplification (because the other
fold would break existing tests).

Name: zext udiv
  %z = zext i1 %x to i32
  %r = udiv i32 %y, %z
=>
  %r = %y

Name: zext urem
  %z = zext i1 %x to i32
  %r = urem i32 %y, %z
=>
  %r = 0

Name: zext sdiv
  %z = zext i1 %x to i32
  %r = sdiv i32 %y, %z
=>
  %r = %y

Name: zext srem
  %z = zext i1 %x to i32
  %r = srem i32 %y, %z
=>
  %r = 0

https://rise4fun.com/Alive/LZ9

llvm-svn: 335512
2018-06-25 18:51:21 +00:00
Kamil Rytarowski a8448ad098 Handle NetBSD specific path in findDebugBinary()
Summary:
The NetBSD Operating System installs debuginfo
files into /usr/libdata/debug, rather than other path
like in some other popular distribution.

This change makes llvm-symbolizer functional with
the basesystem executables.

Reviewers: joerg, vitalybuka

Reviewed By: vitalybuka

Subscribers: JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D48525

llvm-svn: 335511
2018-06-25 18:49:13 +00:00
Reid Kleckner 88fee5fdbc Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models"
The large code model allows code and data segments to exceed 2GB, which
means that some symbol references may require a displacement that cannot
be encoded as a displacement from RIP. The large PIC model even relaxes
the assumption that the GOT itself is within 2GB of all code. Therefore,
we need a special code sequence to materialize it:
  .LtmpN:
    leaq .LtmpN(%rip), %rbx
    movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch
    addq %rax, %rbx # GOT base reg

From that, non-local references go through the GOT base register instead
of being PC-relative loads. Local references typically use GOTOFF
symbols, like this:
    movq extern_gv@GOT(%rbx), %rax
    movq local_gv@GOTOFF(%rbx), %rax

All calls end up being indirect:
    movabsq $local_fn@GOTOFF, %rax
    addq %rbx, %rax
    callq *%rax

The medium code model retains the assumption that the code segment is
less than 2GB, so calls are once again direct, and the RIP-relative
loads can be used to access the GOT. Materializing the GOT is easy:
    leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg

DSO local data accesses will use it:
    movq local_gv@GOTOFF(%rbx), %rax

Non-local data accesses will use RIP-relative addressing, which means we
may not always need to materialize the GOT base:
    movq extern_gv@GOTPCREL(%rip), %rax

Direct calls are basically the same as they are in the small code model:
They use direct, PC-relative addressing, and the PLT is used for calls
to non-local functions.

This patch adds reasonably comprehensive testing of LEA, but there are
lots of interesting folding opportunities that are unimplemented.

I restricted the MCJIT/eh-lg-pic.ll test to Linux, since the large PIC
code model is not implemented for MachO yet.

Differential Revision: https://reviews.llvm.org/D47211

llvm-svn: 335508
2018-06-25 18:16:27 +00:00
Craig Topper 3cc6cb1d35 [X86] Sort the static memory folding tables by reg opcode. Remove the reg->mem DenseMaps in favor of binary search.
With the static tables sorted we can binary search them directly for reg->mem lookups. This removes 6 DenseMaps that had to be created when X86InstrInfo is constructed.

We still have one Mem->Reg DenseMap for the reverse direction. This is created just as before by walking the reg->mem arrays to populate it.

Differential Revision: https://reviews.llvm.org/D48527

llvm-svn: 335501
2018-06-25 17:26:56 +00:00
Craig Topper b9cb88a4b0 [X86] Allow base and index for gather instructions to appear in other order for Intel syntax.
llvm-svn: 335500
2018-06-25 17:26:51 +00:00
Vedant Kumar b725c69f12 [SelectionDAG] Remove debug locations from ConstantSD(FP)Nodes
This removes debug locations from ConstantSDNode and ConstantSDFPNode.

When this kind of node is materialized we no longer create a line table
entry which jumps back to the constant's first point of use. This makes
single-stepping behavior smoother, and it matches the model used by IR,
where Constants have no locations. See this thread for more context:

  http://lists.llvm.org/pipermail/llvm-dev/2018-June/124164.html

I'd like to handle constant BuildVectorSDNodes and to try to eliminate
passing SDLocs to SelectionDAG::getConstant*() in follow-up commits.

Differential Revision: https://reviews.llvm.org/D48468

llvm-svn: 335497
2018-06-25 17:06:18 +00:00
Alexander Richardson 85e200e934 Add Triple::isMIPS()/isMIPS32()/isMIPS64(). NFC
There are quite a few if statements that enumerate all these cases. It gets
even worse in our fork of LLVM where we also have a Triple::cheri (which
is mips64 + CHERI instructions) and we had to update all if statements that
check for Triple::mips64 to also handle Triple::cheri. This patch helps to
reduce our diff to upstream and should also make some checks more readable.

Reviewed By: atanasyan

Differential Revision: https://reviews.llvm.org/D48548

llvm-svn: 335493
2018-06-25 16:49:20 +00:00
Matt Arsenault b1cc4f52ff AMDGPU/GlobalISel: Add support for llvm.amdgcn.kernarg.segment.ptr
Note a normal select test is not currently possible because this
relies on input registers tracked in SIMachineFunctionInfo which
are not currently serializable in MIR, but this does work end-to-end
from the IR.

llvm-svn: 335490
2018-06-25 16:17:48 +00:00
Matt Arsenault 921f7a27cc StackSlotColoring: Decide colors per stack ID
I thought I fixed this in r308673, but that fix was
very broken. The assumption that any frame index can be used
in place of another was more widespread than I realized.
Even when stack slot sharing was disabled, this was still
replacing frame index uses with a different ID with a different
stack slot.

Really fix this by doing the coloring per-stack ID, so all of
the coloring logically done in a separate namespace. This is a lot
simpler than trying to figure out how to change the color if
the stack ID is different.

llvm-svn: 335488
2018-06-25 16:05:55 +00:00
Matt Arsenault 2811a20f77 AMDGPU: Remove commented out code
llvm-svn: 335486
2018-06-25 15:42:20 +00:00
Matt Arsenault b3feccd7fa AMDGPU/GlobalISel: Fix G_IMPLICIT_DEF for pointers
llvm-svn: 335485
2018-06-25 15:42:12 +00:00
Wei Mi e555127435 [SampleFDO] Add an option to turn on/off warning about samples unused.
If a function has sample to use, but cannot use them because of no debug
information, currently a warning will be issued to inform the missing
opportunity.

This warning assumes the binary generating the profile and the binary using
the profile are similar enough. It is not always the case. Sometimes even
if the binaries are not quite similar, we may still get some benefit by
using sampleFDO. In those cases, we may still want to apply sampleFDO but
not want to see a lot of such warnings pop up.

The patch adds an option for the warning.

Differential Revision: https://reviews.llvm.org/D48510

llvm-svn: 335484
2018-06-25 15:40:31 +00:00
David Green 8699492304 [DA] Delinearise AddRecs if we can prove they don't wrap
We can prove that some delinearized subscripts do not wrap around to become
negative by the fact that they are from inbound geps of load/store locations.
This helps improve the delinearisation in cases where we can't prove that they
are non-negative from SCEV alone.

Differential Revision: https://reviews.llvm.org/D48481

llvm-svn: 335481
2018-06-25 15:13:26 +00:00
Matt Arsenault 73eeb42e50 AMDGPU: Respect align argument parameter
This should avoid relying on the pointee type
to get the alignment, particularly since pointee
types are supposed to be removed at some point.

Also fixes not getting the alignment for unsized types.

llvm-svn: 335478
2018-06-25 14:29:04 +00:00
Artur Pilipenko ac6e6864e8 SafepointIRVerifier should ignore dead blocks and dead edges
Not only should SafepointIRVerifier ignore unreachable blocks (as suggested in https://reviews.llvm.org/D47011) but it also has to ignore dead blocks.

In @test2 (see the new tests):

  br i1 true, label %right, label %left
left:
  ...
right:
  ...
merge:
  %val = phi i8 addrspace(1)* [ ..., %left ], [ ..., %right ]
  use %val
both left and right branches are reachable.
If they collide then SafepointIRVerifier reports an error.

Because of the foldable branch condition GVN finds the left branch dead and removes the phi node entry that merges values from right and left. Then the use comes from the right branch. This results in no collision.

So, SafepointIRVerifier ends up in different results depending on either GVN is run or not.

To solve this issue this patch adds Dead Block detection to SafepointIRVerifier which can ignore dead blocks while validating IR. The Dead Block detection algorithm is taken from GVN but modified to not split critical edges. That is needed to keep CFG unchanged by SafepointIRVerifier.

Patch by Yevgeny Rouban.

Reviewed By: anna, apilipenko, DaniilSuchkov

Differential Revision: https://reviews.llvm.org/D47441

llvm-svn: 335473
2018-06-25 13:51:11 +00:00
Krzysztof Parzyszek 4581f37e7c Improve handling of COPY instructions with identical value numbers
Testcases provided by Tim Renouf.

Differential Revision: https://reviews.llvm.org/D48102

llvm-svn: 335472
2018-06-25 13:46:41 +00:00
Artur Pilipenko ddc7f391d2 Revert change 335077 "[InlineSpiller] Fix a crash due to lack of forward progress from remat specifically for STATEPOINT"
This change caused widespread assertion failures in our downstream testing:
lib/CodeGen/LiveInterval.cpp:409: bool llvm::LiveRange::overlapsFrom(const llvm::LiveRange&, llvm::LiveRange::const_iterator) const: Assertion `!empty() && "empty range"' failed.

llvm-svn: 335462
2018-06-25 12:58:13 +00:00
Simon Pilgrim 79e474bf46 Use APInt[] bit access to avoid "32-bit shift implicitly converted to 64 bits" MSVC warning (again). NFCI.
llvm-svn: 335457
2018-06-25 11:46:24 +00:00
Simon Pilgrim 3a0e13f347 Use APInt[] bit access to avoid "32-bit shift implicitly converted to 64 bits" MSVC warning. NFCI.
llvm-svn: 335454
2018-06-25 11:38:27 +00:00
Simon Pilgrim 5b6b500687 Fix -Wparentheses gcc warning. NFCI.
llvm-svn: 335451
2018-06-25 11:19:05 +00:00
Craig Topper facea6b4a6 [X86] Block commuting operand 1 of FMA*_Int instructions in findThreeSrcCommutedOpIndices. Remove uncommutable returns from getThreeSrcCommuteCase/getFMA3OpcodeToCommuteOperands.
We should be blocking the operand while we are in the routine that tries to find commutable operand indices. Doing it later means we might have missed out on another valid set of operands we could have commuted.

The intrinsic case was the only case that could really prevent commuting in getFMA3OpcodeToCommuteOperands. All the other cases in getThreeSrcCommuteCase were not reachable conditions as they were protected by findThreeSrcCommutedOpIndices.

With that abort case pushed earlier, we can remove all the abort checks and replace with asserts.

llvm-svn: 335446
2018-06-25 06:05:37 +00:00
George Burgess IV 97ec62455d [MSSA] Add domination number verifier; NFC
It's easy for domination numbers to get out-of-date, and this is no more
costly than any of the other verifiers we already have, so it seems nice
to have.

A stage3 build with this Works On My Machine, so this hasn't caught any
bugs... yet. :)

llvm-svn: 335444
2018-06-25 05:30:36 +00:00
Heejin Ahn 04c4894911 [WebAssembly] Add WebAssemblyException information analysis
Summary:
A WebAssemblyException object contains BBs that belong to a 'catch' part
of the try-catch-end structure. Because CFGSort requires all the BBs
within a catch part to be sorted together as it does for loops, this
pass calculates the nesting structure of catch part of exceptions in a
function. Now this assumes the use of Windows EH instructions.

Reviewers: dschuff, majnemer

Subscribers: jfb, mgorny, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D44134

llvm-svn: 335439
2018-06-25 01:20:21 +00:00
Heejin Ahn 4934f76b58 [WebAssembly] Add WebAssemblyLateEHPrepare pass
Summary:
Add WebAssemblyLateEHPrepare pass that does several small jobs for
exception handling. This runs before CFGSort, and is different from
WasmEHPrepare pass that runs before ISel, even though the names are
similar.

Reviewers: dschuff, majnemer

Subscribers: sbc100, jgravelle-google, sunfish, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D46803

llvm-svn: 335438
2018-06-25 01:07:11 +00:00
Craig Topper 3b18bdc46d [X86] Simplify some code by using isOneConstant. NFC
llvm-svn: 335437
2018-06-25 01:01:47 +00:00
Craig Topper 4331d6218d [X86] Remove the changes to combineScalarToVector made in r335037.
They appear to be untested other than the test case for p37879.ll and I believe we should be using SimplifyDemandedElts here to handle these cases.

llvm-svn: 335436
2018-06-25 00:21:53 +00:00
Craig Topper ecf7c5b75f [X86] Reduce the number of patterns needed for masked scalar ceil/floor isel.
The scalar to vector on the mask register should not be part of the patterns.

llvm-svn: 335435
2018-06-25 00:05:09 +00:00
Brad Smith df1f50579f [mips][ias] Enable IAS by default for OpenBSD / FreeBSD mips64/mips64el.
Reviewers: atanasyan

Differential Review: https://reviews.llvm.org/D31557

llvm-svn: 335434
2018-06-24 15:44:47 +00:00
Sanjay Patel 962ee178fa [DAGCombiner] eliminate setcc bool math when input is low-bit of some value
This patch has the same motivating example as D48466:
define void @foo(i64 %x, i32 %c.0282.in, i32 %d.0280, i32* %ptr0, i32* %ptr1) {
    %c.0282 = and i32 %c.0282.in, 268435455
    %a16 = lshr i64 32508, %x
    %a17 = and i64 %a16, 1
    %tobool = icmp eq i64 %a17, 0
    %. = select i1 %tobool, i32 1, i32 2
    %.286 = select i1 %tobool, i32 27, i32 26
    %shr97 = lshr i32 %c.0282, %.
    %shl98 = shl i32 %c.0282.in, %.286
    %or99 = or i32 %shr97, %shl98
    %shr100 = lshr i32 %d.0280, %.
    %shl101 = shl i32 %d.0280, %.286
    %or102 = or i32 %shr100, %shl101
    store i32 %or99, i32* %ptr0
    store i32 %or102, i32* %ptr1
    ret void
}

...but I'm trying to kill the setcc bool math sooner rather than later.

By matching a larger pattern that includes both the low-bit mask and the trailing add/sub, 
we can create a universally good fold because we always eliminate the condition code 
intermediate value.

Here are Alive proofs for these (currently instcombine folds the 'add' variants, but 
misses the 'sub' patterns):
https://rise4fun.com/Alive/Gsyp

Name: sub of zext cmp mask
  %a = and i8 %x, 1
  %c = icmp eq i8 %a, 0
  %z = zext i1 %c to i32
  %r = sub i32 C1, %z
  =>
  %optional_cast = zext i8 %a to i32
  %r = add i32 %optional_cast, C1-1

Name: add of zext cmp mask
  %a = and i32 %x, 1
  %c = icmp eq i32 %a, 0
  %z = zext i1 %c to i8
  %r = add i8 %z, C1
  =>
  %optional_cast = trunc i32 %a to i8
  %r = sub i8 C1+1, %optional_cast

All of the tests look like improvements or neutral to me. But it is possible that x86 
test+set+bitop is better than what we now show here. I suspect we could do better by 
adding another fold for the 'sub' variants.

We start with select-of-constant in IR in the larger motivating test, so that's why I 
included tests with selects. Proofs for those variants:
https://rise4fun.com/Alive/Bx1

Name: true const is bigger
Pre: C2 == (C1 + 1)
  %a = and i8 %x, 1
  %c = icmp eq i8 %a, 0
  %r = select i1 %c, i64 C2, i64 C1
  =>
  %z = zext i8 %a to i64
  %r = sub i64 C2, %z

Name: false const is bigger
Pre: C2 == (C1 + 1)
  %a = and i8 %x, 1
  %c = icmp eq i8 %a, 0
  %r = select i1 %c, i64 C1, i64 C2
  =>
  %z = zext i8 %a to i64
  %r = add i64 C1, %z

Differential Revision: https://reviews.llvm.org/D48466

llvm-svn: 335433
2018-06-24 14:37:30 +00:00
Craig Topper 03523f6741 [X86] Regroup some isel patterns. NFC
For some reason the 64-bit patterns were separated from their 8/16/32-bit friends, but only for add/sub/mul. For and/or/xor they were together.

llvm-svn: 335429
2018-06-24 06:56:49 +00:00
Craig Topper 19772c89c7 [X86] Rename VFPCLASSSS and VFPCLASSSD internal instruction names to include a Z to match other EVEX instructions.
llvm-svn: 335428
2018-06-24 06:29:50 +00:00
Brad Smith 8c17d5921d Add OpenBSD support to the Threading code
llvm-svn: 335426
2018-06-23 22:02:59 +00:00
Duncan P. N. Exon Smith f4c82cffc6 ADT: Use EBO to shrink SmallVector size 1
SmallVectorStorage is empty when its size is 1; use inheritance so that
the empty base class optimization kicks in.

llvm-svn: 335421
2018-06-23 18:39:44 +00:00
Jonas Devlieghere c02bf01f64 [TableGen] Use WithColor for printing errors/warnings
Use the WithColor helper from support to print errors and warnings.

llvm-svn: 335415
2018-06-23 16:48:03 +00:00
Craig Topper d8d64a56b5 [X86] Make %eiz usage in 64-bit mode, force a 0x67 address size prefix. Fix some test CHECK lines.
llvm-svn: 335414
2018-06-23 06:15:04 +00:00
Craig Topper 2545529034 [X86] Teach disassembler to use %eip instead of %rip when 0x67 prefix is used on a rip-relative address.
llvm-svn: 335413
2018-06-23 06:03:48 +00:00
Craig Topper 68d64e3859 [X86][AsmParser] Improve base/index register checks.
-Ensure EIP isn't used with an index reigster.
-Ensure EIP isn't used as index register.
-Ensure base register isn't a vector register.
-Ensure eiz/riz usage matches the size of their base register.

llvm-svn: 335412
2018-06-23 05:53:00 +00:00
Stanislav Mekhanoshin d8c9374797 Fix invariant fdiv hoisting in LICM
FDiv is replaced with multiplication by reciprocal and invariant
reciprocal is hoisted out of the loop, while multiplication remains
even if invariant.

Switch checks for all invariant operands and only invariant
denominator to fix the issue.

Differential Revision: https://reviews.llvm.org/D48447

llvm-svn: 335411
2018-06-23 04:01:28 +00:00
Reid Kleckner fd7c9ab971 [AMDGPU] Update includes for intrinsic changes :(
llvm-svn: 335409
2018-06-23 03:05:39 +00:00
Lang Hames d716a26e8b [ORC] Fix formatting and list pending queries in VSO::dump.
llvm-svn: 335408
2018-06-23 02:22:10 +00:00
Reid Kleckner f5890e4e43 [IR] Split Intrinsics.inc into enums and implementations
Implements PR34259

Intrinsics.h is a very popular header. Most LLVM TUs care about things
like dbg_value, but they don't care how they are implemented. After I
split these out, IntrinsicImpl.inc is 1.7 MB, so this saves each LLVM TU
from scanning 1.7 MB of source that gets pre-processed away.

It also means we can modify intrinsic properties without triggering a
full rebuild, but that's probably less of a win.

I think the next best thing to do would be to split out the target
intrinsics into their own header. Very, very few TUs care about
target-specific intrinsics. It's very hard to split up the target
independent intrinsics like llvm.expect, assume, and dbg.value, though.

llvm-svn: 335407
2018-06-23 02:02:38 +00:00
Craig Topper abdbb2c67a [X86][AsmParser] Rework that allows (%dx) to be used in place of %dx with in/out instructions.
Previously, to support (%dx) we left a wide open hole in our 16-bit memory address checking. This let this address value be used with any instruction without error in the parser. It would later fail in the encoder with an assertion failure on debug builds and who knows what on release builds.

This patch passes the mnemonic down to the memory operand parsing function so we can allow the (%dx) form only on specific instructions.

llvm-svn: 335403
2018-06-23 00:03:20 +00:00
Reid Kleckner 330f65b3e8 [RuntimeDyld] Implement the ELF PIC large code model relocations
Prerequisite for https://reviews.llvm.org/D47211 which improves our ELF
large PIC codegen.

llvm-svn: 335402
2018-06-22 23:53:22 +00:00
Eli Friedman 203eaaf5ba [LoopReroll] Rewrite induction variable rewriting.
This gets rid of a bunch of weird special cases; instead, just use SCEV
rewriting for everything.  In addition to being simpler, this fixes a
bug where we would use the wrong stride in certain edge cases.

The one bit I'm not quite sure about is the trip count handling,
specifically the FIXME about overflow.  In general, I think we need to
widen the exit condition, but that's probably not profitable if the new
type isn't legal, so we probably need a check somewhere.  That said, I
don't think I'm making the existing problem any worse.

As a followup to this, a bunch of IV-related code in root-finding could
be cleaned up; with SCEV-based rewriting, there isn't any reason to
assume a loop will have exactly one or two PHI nodes.

Differential Revision: https://reviews.llvm.org/D45191

llvm-svn: 335400
2018-06-22 22:58:55 +00:00
George Burgess IV 2cbf9730b0 [MSSA] Remove incorrect comment + `auto`ify dyn_cast results; NFC
llvm-svn: 335399
2018-06-22 22:34:07 +00:00
Craig Topper 10e2f73793 [X86][AsmParser] Keep track of whether an explicit scale was specified while parsing an address in Intel syntax. Use it for improved error checking.
This allows us to check these:
-16-bit addressing doesn't support scale so we should error if we find one there.
-Multiplying ESP/RSP by a scale even if the scale is 1 should be an error because ESP/RSP can't be an index.

llvm-svn: 335398
2018-06-22 22:28:39 +00:00
Craig Topper 1d707539e4 [X86][AsmParser] In Intel syntax make sure we support ESP/RSP being the second register in memory expressions like [EAX+ESP].
By default, the second register gets assigned to the index register slot. But ESP can't be an index register so we need to swap it with the other register.

There's still a slight bug that we allow [EAX+ESP*1]. The existence of the multiply even though its with 1 should force ESP to the index register and trigger an error, but it doesn't currently.

llvm-svn: 335394
2018-06-22 21:57:24 +00:00
Tobias Edler von Koch 7609cb83e6 Re-land "[LTO] Enable module summary emission by default for regular LTO"
Since we are now producing a summary also for regular LTO builds, we
need to run the NameAnonGlobals pass in those cases as well (the
summary cannot handle anonymous globals).

See https://reviews.llvm.org/D34156 for details on the original change.

This reverts commit 6c9ee4a4a438a8059aacc809b2dd57128fccd6b3.

llvm-svn: 335385
2018-06-22 20:23:21 +00:00
Craig Topper 9bc2c059c3 [X86] Don't accept (%si,%bp) 16-bit address expressions.
The second register is the index register and should only be %si or %di if used with a base register. And in that case the base register should be %bp or %bx.

This makes us compatible with gas.

We do still need to support both orders with Intel syntax which uses [bp+si] and [si+bp]

llvm-svn: 335384
2018-06-22 20:20:38 +00:00
Craig Topper c26c62e0e5 [X86][AsmParser] Allow (%bp,%si) and (%bp,%di) to be encoded without using a zero displacement.
(%bp) can't be encoded without a displacement. The encoding is instead used for displacement alone. So a 1 byte displacement of 0 must be used. But if there is an index register we can encode without a displacement.

llvm-svn: 335379
2018-06-22 19:42:21 +00:00
Craig Topper cd18bb523c [X86][AsmParser] Check for invalid 16-bit base register in Intel syntax.
llvm-svn: 335373
2018-06-22 17:50:40 +00:00
Craig Topper 22d1db122a [X86] Don't allow ESP/RSP to be used as an index register in assembly.
Fixes PR37892

llvm-svn: 335370
2018-06-22 17:15:58 +00:00
Alina Sbirlea bee50036d3 [LoopUnswitch]Fix comparison for DomTree updates.
Summary:
In LoopUnswitch when replacing a branch Parent -> Succ with a conditional
branch Parent -> True & Parent->False, the DomTree updates should insert an edge for
each of True/False if True/False are different than Succ, and delete Parent->Succ edge
if both are different. The comparison with Succ appears to be incorect,
it's comparing with Parent instead.
There is no test failing either before or after this change, but it seems to me this is
the right way to do the update.

Reviewers: chandlerc, kuhar

Subscribers: sanjoy, jlebar, llvm-commits

Differential Revision: https://reviews.llvm.org/D48457

llvm-svn: 335369
2018-06-22 17:14:35 +00:00
Krzysztof Parzyszek 358a916aa8 Initialize LiveRegs once in BranchFolder::mergeCommonTails
llvm-svn: 335365
2018-06-22 16:38:38 +00:00
Simon Pilgrim 9d3ef8ee2b [SLPVectorizer] Support alternate opcodes in tryToVectorizeList
Enable tryToVectorizeList to support InstructionsState alternate opcode patterns at a root (build vector etc.) as well as further down the vectorization tree.

NOTE: This patch reduces some of the debug reporting if there are opcode mismatches - I can try to add it back if it proves a problem. But it could get rather messy trying to provide equivalent verbose debug strings via getSameOpcode etc.

Differential Revision: https://reviews.llvm.org/D48488

llvm-svn: 335364
2018-06-22 16:37:34 +00:00
Simon Pilgrim 213cb1b82d [SLPVectorizer] reorderAltShuffleOperands should just take InstructionsState. NFCI.
All calls were extracting the InstructionsState Opcode/AltOpcode values so we might as well pass it directly

llvm-svn: 335359
2018-06-22 16:10:26 +00:00
Paul Robinson 11539b0969 [DWARFv5] Allow ".loc 0" to refer to the root file.
DWARF v5 explicitly represents file #0 in the line table.  Prior
versions did not, so ".loc 0" is still an error in those cases.

Differential Revision: https://reviews.llvm.org/D48452

llvm-svn: 335350
2018-06-22 14:16:11 +00:00
Simon Pilgrim 1e564504bb [SLPVectorizer] Relax alternate opcodes to accept any BinaryOperator pair
SLP currently only accepts (F)Add/(F)Sub alternate counterpart ops to be merged into an alternate shuffle.

This patch relaxes this to accept any pair of BinaryOperator opcodes instead, assuming the target's cost model accepts the vectorization+shuffle.

Differential Revision: https://reviews.llvm.org/D48477

llvm-svn: 335349
2018-06-22 14:04:06 +00:00
Sanjay Patel a52963b404 [InstCombine] rearrange shuffle-of-binops logic; NFC
The commutative matcher makes things more complicated
here, and I'm planning an enhancement where this 
form is more readable.

llvm-svn: 335343
2018-06-22 12:46:16 +00:00
George Rimar dcf59c5480 Recommit r335333 "[MC] - Add .stack_size sections into groups and link them with .text"
With compilation fix.

Original commit message:

D39788 added a '.stack-size' section containing metadata on function stack sizes
to output ELF files behind the new -stack-size-section flag.

This change does following two things on top:

1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs. 
    The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to
    eliminate them fast during resolving the COMDATs.
2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text.
   With that linker will be able to do -gc-sections on dead stack sizes sections.

Differential revision: https://reviews.llvm.org/D46874

llvm-svn: 335336
2018-06-22 10:53:47 +00:00
Simon Pilgrim e0a6eb1f4f [IR] Use Instruction::isBinaryOp helper instead of raw enum range tests. NFCI.
llvm-svn: 335335
2018-06-22 10:48:02 +00:00
George Rimar 6d448da1be Revert r335332 "[MC] - Add .stack_size sections into groups and link them with .text"
It broke bots.

http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/12891
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/9443
http://lab.llvm.org:8011/builders/lldb-x86_64-ubuntu-14.04-buildserver/builds/25551

llvm-svn: 335333
2018-06-22 10:27:33 +00:00
George Rimar e14485a0c6 [MC] - Add .stack_size sections into groups and link them with .text
D39788 added a '.stack-size' section containing metadata on function stack sizes
to output ELF files behind the new -stack-size-section flag.

This change does following two things on top:

1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs. 
    The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to
    eliminate them fast during resolving the COMDATs.
2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text.
   With that linker will be able to do -gc-sections on dead stack sizes sections.

Differential revision: https://reviews.llvm.org/D46874

llvm-svn: 335332
2018-06-22 10:10:53 +00:00
Sjoerd Meijer 1043dffbd3 Recommit of r335326, with the test fixed that I missed.
llvm-svn: 335331
2018-06-22 10:03:03 +00:00
Simon Pilgrim 9c8f9374b5 [CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc
AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) was being severely overestimated by the default shuffle expansion.

This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174.

I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more.

Differential Revision: https://reviews.llvm.org/D48172

llvm-svn: 335329
2018-06-22 09:45:31 +00:00
Sjoerd Meijer 7ee5b090de Reverting r335326 while I look at the test failure
llvm-svn: 335328
2018-06-22 09:17:08 +00:00
Eugene Leviant 6d711ca168 Revert r335324 due to a builtbot failure
llvm-svn: 335327
2018-06-22 08:57:01 +00:00
Sjoerd Meijer 8d2f1565b7 [ARM] ARMv6m and v8m.baseline strict align
This sets target feature FeatureStrictAlign for Armv6-m and Armv8-m.baseline,
because it has no support for unaligned accesses.
It looks like we always pass target feature "+strict-align" from
Clang, so this is not a user facing problem, but querying the subtarget
(in e.g. llc) for unaligned access support is incorrect.

Differential Revision: https://reviews.llvm.org/D48437

llvm-svn: 335326
2018-06-22 08:48:13 +00:00
Matt Arsenault 3f8e7a3dbc AMDGPU: Add patterns for i32/i64 local atomic load/store
Not sure why the 32/64 split is needed in the atomic_load
store hierarchies. The regular PatFrags do this, but we don't
do it for the existing handling for global.

llvm-svn: 335325
2018-06-22 08:39:52 +00:00
Eugene Leviant ea19c9473c [Evaluator] Improve evaluation of call instruction
Differential revision: https://reviews.llvm.org/D46584

llvm-svn: 335324
2018-06-22 08:29:36 +00:00
Mikhail Dvoretckii 0963562083 [X86] Changing the check for valid inputs in combineScalarToVector
Changing the logic of scalar mask folding to check for valid input types rather
than against invalid ones, making it more robust and fixing PR37879.

Differential Revision: https://reviews.llvm.org/D48366

llvm-svn: 335323
2018-06-22 08:28:05 +00:00
Chandler Carruth aa5f4d2e23 Revert r335306 (and r335314) - the Call Graph Profile pass.
This is the first pass in the main pipeline to use the legacy PM's
ability to run function analyses "on demand". Unfortunately, it turns
out there are bugs in that somewhat-hacky approach. At the very least,
it leaks memory and doesn't support -debug-pass=Structure. Unclear if
there are larger issues or not, but this should get the sanitizer bots
back to green by fixing the memory leaks.

llvm-svn: 335320
2018-06-22 05:33:57 +00:00
Tom Stellard 6af7307650 AMDGPU/GlobalISel: Default to using TableGen'd instruction selector
Summary:
We can select all instructions that are marked as legal in a full piglit run,
so now is a good time to make the TableGen'd instruction selector default
for all opcodes.  This is NFC for a full piglit run, which is why there are
no tests.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48198

llvm-svn: 335319
2018-06-22 03:04:35 +00:00
Tom Stellard 26fac0f8e1 AMDGPU/GlobalISel: legalize and select 32-bit G_ASHR
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D48196

llvm-svn: 335318
2018-06-22 02:54:57 +00:00
Chandler Carruth fe70b29cf7 [LegacyPM] Fix PR37888 by teaching the legacy loop pass manager how to
clear out deleted loops from the current queue beyond just the current
loop.

This is important because SimpleLoopUnswitch will now enqueue the same
loop to be re-processed. When it does this with the legacy PM, we don't
have a way of canceling the rest of the pipeline and so we can end up
deleting the loop before we reprocess it. =/

This change also makes it easy to support deleting other loops in the
queue to process, although I don't have any use cases for that.

Differential Revision: https://reviews.llvm.org/D48470

llvm-svn: 335317
2018-06-22 02:43:41 +00:00
Tom Stellard 9a6535718e AMDGPU/GlobalISel: legalize and select 32-bit G_SITOFP
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48195

llvm-svn: 335316
2018-06-22 02:34:29 +00:00
Tom Stellard 7712ee8891 AMDGPU/GlobalISel: Implement select() for COPY
Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46151

llvm-svn: 335315
2018-06-22 00:44:29 +00:00
Sanjay Patel 4784e1506e [InstCombine] fix shuffle-of-binops bug
With non-commutative binops, we could be using the same
variable value as operand 0 in 1 binop and operand 1 in 
the other, so we have to check for that possibility and
bail out.

llvm-svn: 335312
2018-06-21 23:56:59 +00:00
Tom Stellard 3f1c6fe156 AMDGPU/GlobalISel: Implement select() for G_IMPLICIT_DEF
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46150

llvm-svn: 335307
2018-06-21 23:38:20 +00:00
Michael J. Spencer fc93dd8e18 [Instrumentation] Add Call Graph Profile pass
This patch adds support for generating a call graph profile from Branch Frequency Info.

The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions. For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight.

After scanning all the functions, it generates an appending module flag containing the data. The format looks like:

!llvm.module.flags = !{!0}

!0 = !{i32 5, !"CG Profile", !1}
!1 = !{!2, !3, !4} ; List of edges
!2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32
!3 = !{void (i1)* @freq, void ()* @a, i64 11}
!4 = !{void (i1)* @freq, void ()* @b, i64 20}

Differential Revision: https://reviews.llvm.org/D48105

llvm-svn: 335306
2018-06-21 23:31:10 +00:00
Reid Kleckner 2ef486690c [X86] Fix 32-bit mingw comdat names, only add one underscore
llvm-svn: 335304
2018-06-21 23:06:33 +00:00
Reid Kleckner 3a2fd1c2f3 Revert r335297 "[X86] Implement more of x86-64 large and medium PIC code models"
MCJIT can't handle R_X86_64_GOT64 yet.

llvm-svn: 335300
2018-06-21 22:19:05 +00:00
Reid Kleckner 3286a6c896 [X86] Commit some comments that weren't in the medium code model patch
llvm-svn: 335298
2018-06-21 21:57:44 +00:00
Reid Kleckner 247fe6aeab [X86] Implement more of x86-64 large and medium PIC code models
Summary:
The large code model allows code and data segments to exceed 2GB, which
means that some symbol references may require a displacement that cannot
be encoded as a displacement from RIP. The large PIC model even relaxes
the assumption that the GOT itself is within 2GB of all code. Therefore,
we need a special code sequence to materialize it:
  .LtmpN:
    leaq .LtmpN(%rip), %rbx
    movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch
    addq %rax, %rbx # GOT base reg

From that, non-local references go through the GOT base register instead
of being PC-relative loads. Local references typically use GOTOFF
symbols, like this:
    movq extern_gv@GOT(%rbx), %rax
    movq local_gv@GOTOFF(%rbx), %rax

All calls end up being indirect:
    movabsq $local_fn@GOTOFF, %rax
    addq %rbx, %rax
    callq *%rax

The medium code model retains the assumption that the code segment is
less than 2GB, so calls are once again direct, and the RIP-relative
loads can be used to access the GOT. Materializing the GOT is easy:
    leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg

DSO local data accesses will use it:
    movq local_gv@GOTOFF(%rbx), %rax

Non-local data accesses will use RIP-relative addressing, which means we
may not always need to materialize the GOT base:
    movq extern_gv@GOTPCREL(%rip), %rax

Direct calls are basically the same as they are in the small code model:
They use direct, PC-relative addressing, and the PLT is used for calls
to non-local functions.

This patch adds reasonably comprehensive testing of LEA, but there are
lots of interesting folding opportunities that are unimplemented.

Reviewers: chandlerc, echristo

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D47211

llvm-svn: 335297
2018-06-21 21:55:08 +00:00
Matthew Voss 30648ab233 [GVN] Avoid casting a vector of size less than 8 bits to i8
Summary:
A reprise of D25849.

This crash was found through fuzzing some time ago and was documented in PR28879.

No check for load size has been added due to the following tests:
  - Transforms/GVN/invariant.group.ll
  - Transforms/GVN/pr10820.ll

These tests expect load sizes that are not a multiple of eight.

Thanks to @davide for the original patch.

Reviewers: nlopes, davide, RKSimon, reames, efriedma

Reviewed By: efriedma

Subscribers: davide, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D48330

llvm-svn: 335294
2018-06-21 21:43:20 +00:00
Tim Shen 63f244c4f4 [SCEV] Re-apply r335197 (with Polly fixes).
Summary:
This initiates a discussion on changing Polly accordingly while re-applying r335197 (D48338).

I have never worked on Polly. The proposed change to param_div_div_div_2.ll is not educated, but just patterns that match the output.

All LLVM files are already reviewed in D48338.

Reviewers: jdoerfert, bollu, efriedma

Subscribers: jlebar, sanjoy, hiraditya, llvm-commits, bixia

Differential Revision: https://reviews.llvm.org/D48453

llvm-svn: 335292
2018-06-21 21:29:54 +00:00
Konstantin Zhuravlyov e004b3d97b AMDGPU: Remove ability to reserve VGPRs for debugger
Differential Revision: https://reviews.llvm.org/D48234

llvm-svn: 335288
2018-06-21 20:28:19 +00:00
Reid Kleckner 13c9ee684c [mingw] Fix GCC ABI compatibility for comdat things
Summary:
GCC and the binutils COFF linker do comdats differently from MSVC.
If we want to be ABI compatible, we have to do what they do, which is to
emit unique section names like ".text$_Z3foov" instead of short section
names like ".text". Otherwise, the binutils linker gets confused and
reports multiple definition errors when two object files from GCC and
Clang containing the same inline function are linked together.

The best description of the issue is probably at
https://github.com/Alexpux/MINGW-packages/issues/1677, we don't seem to
have a good one in our tracker.

I fixed up the .pdata and .xdata sections needed everywhere other than
32-bit x86. GCC doesn't use associative comdats for those, it appears to
rely on the section name.

Reviewers: smeenai, compnerd, mstorsjo, martell, mati865

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D48402

llvm-svn: 335286
2018-06-21 20:27:38 +00:00
Sanjay Patel a76b70069d [InstCombine] fold vector select of binops with constant ops to 1 binop (PR37806)
This is the simplest case from PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806

If we have a common variable operand used in a pair of binops with vector constants 
that are vector selected together, then we can constant shuffle the constant vectors 
to eliminate the shuffle instruction.

This has some tricky parts that are hopefully addressed in the tests and their 
respective comments:

  1. If the shuffle mask contains an undef element, then that lane of the result is 
     undef:
     http://llvm.org/docs/LangRef.html#shufflevector-instruction

     Therefore, we can replace the constant in that lane with an undef value except 
     for div/rem. With div/rem, an undef in the divisor would cause the whole op to 
     be undef. So I'm using the same hack as in D47686 - replace the undefs with '1'.

  2. Intersect the wrapping and FMF of the original binops for the new binop. There 
     should be no extra poison or fast-math potential in the new binop that wasn't 
     possible in the original code.

  3. Disregard other uses. Given that we're eliminating uses (shortening the 
     dependency chain), I think that's always the right IR canonicalization. But 
     I purposely chose the udiv test to demonstrate the scenario where both 
     intermediate values have other uses because that seems likely worse for 
     codegen with an expensive math op. This seems like a very rare possibility to 
     me, so I don't think it requires a backend patch first.

Differential Revision: https://reviews.llvm.org/D48401

llvm-svn: 335283
2018-06-21 20:15:09 +00:00
Scott Linder 1e8c2c705d [AMDGPU] Update assembler for HSA Code Object v3
Update AMDGPU assembler syntax behind the code-object-v3 feature:

* Replace/rename most AMDGPU assembler directives/symbols and document them.
* Provide more diagnostics (e.g. values out of range, missing values, repeated
  values).
* Provide path for backwards compatibility, even with underlying descriptor
  changes.

Differential Revision: https://reviews.llvm.org/D47736

llvm-svn: 335281
2018-06-21 19:38:56 +00:00
Francis Visoiu Mistrih ac599b6951 Revert r335206 "Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions."
This reverts commit r335206.

As discussed here: https://reviews.llvm.org/rL333740, a fix will come
tomorrow. In the meanwhile, revert this to fix some bots.

llvm-svn: 335272
2018-06-21 19:18:36 +00:00
Simon Dardis 3505045b42 [mips] Modify comment to test new email address (NFC).
llvm-svn: 335269
2018-06-21 18:52:32 +00:00
Scott Linder 5792dd0f39 [AMDGPU] Fix bug with tracking processed blocks in SIInsertWaitcnts
BlockWaitcntProcessedSet was not being cleared between calls, so it was
producing incorrect counts in cases where MBB addresses happened to coincide
across multiple calls.

Differential Revision: https://reviews.llvm.org/D48391

llvm-svn: 335268
2018-06-21 18:48:48 +00:00
Konstantin Zhuravlyov 766c77efd7 AMDGPU/AMDHSA: Remove GridWorkGroupCountX/Y/Z
and everything that comes with it from implementation
and v3 header files.

Leave definition in v2 header files for backwards
compatibility.

Differential Revision: https://reviews.llvm.org/D48191

llvm-svn: 335267
2018-06-21 18:36:04 +00:00
Matt Davis d041f21810 [DebugInfo] Ignore DBG_VALUE instructions in PostRA Machine Sink
Summary:
The logic for handling the sinking of COPY instructions was generating
different code when building with debug flags.

The original code did not take into consideration debug instructions.  This
resulted in the registers in the DBG_VALUE instructions being treated as used,
and prevented the COPY from being sunk.  This patch avoids analyzing debug
instructions when trying to sink COPY instructions.

This patch also creates a routine from the code in MachineSinking::SinkInstruction to
perform the logic of sinking an instruction along with its debug instructions.
This functionality is used in multiple places, including the code for sinking COPY instrs.


Reviewers: junbuml, javed.absar, MatzeB, bjope

Reviewed By: bjope

Subscribers: aprantl, probinson, thegameg, jonpa, bjope, vsk, kristof.beyls, JDevlieghere, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D45637

llvm-svn: 335264
2018-06-21 17:59:52 +00:00
Sanjay Patel 3244537a3c [InstCombine] use constant pattern matchers with icmp+sext
The previous code worked with vectors, but it failed when the
vector constants contained undef elements. 
The matchers handle those cases.

llvm-svn: 335262
2018-06-21 17:51:44 +00:00
Sanjay Patel 7b0fc75f73 [InstCombine] simplify binops before trying other folds
This is outwardly NFC from what I can tell, but it should be more efficient 
to simplify first (despite the name, SimplifyAssociativeOrCommutative does
not actually simplify as InstSimplify does - it creates/morphs instructions).

This should make it easier to refactor duplicated code that runs for all binops.

llvm-svn: 335258
2018-06-21 17:06:36 +00:00
Paul Robinson 547adcaaac [DWARF] Warn on and ignore ".file 0" for DWARF v4 and earlier.
This had been messing with the directory table for prior versions, and
also could induce a crash when generating asm output.

llvm-svn: 335254
2018-06-21 16:42:03 +00:00
Sirish Pande b60acb9e48 Revert "[AArch64] Coalesce Copy Zero during instruction selection"
This reverts commit d8f57105010cc7e78026e511d5def873fc91e0e7.

Original Commit:

Author: Haicheng Wu <haicheng@codeaurora.org>
Date:   Sun Feb 18 13:51:33 2018 +0000

    [AArch64] Coalesce Copy Zero during instruction selection

    Add special case for copy of zero to avoid a double copy.

    Differential Revision: https://reviews.llvm.org/D36104

Author's intention is to remove a BB that has one mov instruction. In
order to do that, d8f571050 pessmizes MachineSinking by introducing a
copy, such that mov instruction is NOT moved to the BB. Optimization
downstream gets rid of the BB with only mov instruction. This works well
if we have only one fall through branch as there is only one "extra"
mov instruction.

If we have multiple fall throughs, we will have a lot of redundant movs.
In such a case, it's better to have this BB which has one mov instruction.

This is causing degradation in jpeg, fft and other codebases. I believe
if we want to remove a BB with only one branch instruction, we should not
pessimize Machine Sinking at all, and find some other solution.

llvm-svn: 335251
2018-06-21 16:05:24 +00:00
Stanislav Mekhanoshin 22ee191c3e DAG combine "and|or (select c, -1, 0), x" -> "select c, x, 0|-1"
Allowed folding for "and/or" binops with non-constant operand if
arguments of select are 0/-1 values.

Normally this code with "and" opcode does not get to a DAG combiner
and simplified yet in the InstCombine. However AMDGPU produces it
during lowering and InstCombine has no chance to optimize it out.

In turn the same pattern with "or" opcode can reach DAG.

Differential Revision: https://reviews.llvm.org/D48301

llvm-svn: 335250
2018-06-21 16:02:05 +00:00
David Green 21a2973cc4 [ARM] Enable useAA() for the in-order Cortex-R52
This option allows codegen (such as DAGCombine or MI scheduling) to use alias
analysis information, which can help with the codegen on in-order cpu's,
especially machine scheduling. Here I have done things the same way as AArch64,
adding a subtarget feature to enable this for specific cores, and enabled it for
the R52 where we have a schedule to make use of it.

Differential Revision: https://reviews.llvm.org/D48074

llvm-svn: 335249
2018-06-21 15:48:29 +00:00
Sanjay Patel 3e5c051a06 [InstCombine] make div/rem vector constant utility function; NFCI
This was originally in D48401 and will be used there.

llvm-svn: 335242
2018-06-21 14:59:35 +00:00
Sameer AbuAsal e01e711c64 [RISCV] Tail calls don't need to save return address
Summary:
 When expanding the PseudoTail in expandFunctionCall() we were using X6
 to save the return address. Since this is a tail call the return
 address is not needed, this patch replaces it with X0 to be ignored.

 This matches the behaviour listed in the ISA V2.2 document page 110.
 tail offset -----> jalr x0, x6, offset

 GCC exhibits the same behavior.

Reviewers: apazos, asb, mgrang

Reviewed By: asb

Subscribers: rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01

Differential Revision: https://reviews.llvm.org/D48343

llvm-svn: 335239
2018-06-21 14:37:09 +00:00
Mikhail Dvoretckii 22c82af5c8 [x86] Lower some trunc + shuffle patterns to vpmov[q|d][b|w]
This should help in lowering the following four intrinsics:
 _mm256_cvtepi32_epi8
 _mm256_cvtepi64_epi16
 _mm256_cvtepi64_epi8
 _mm512_cvtepi64_epi8

Differential Revision: https://reviews.llvm.org/D46957

llvm-svn: 335238
2018-06-21 14:16:45 +00:00
Krzysztof Parzyszek 6a3229301f [CodeGen] Avoid handling DBG_VALUE in LiveRegUnits::stepBackward
Patch by Jesper Antonsson.

Differential Revision: https://reviews.llvm.org/D48420

llvm-svn: 335233
2018-06-21 13:38:43 +00:00
Nicolai Haehnle 15745ba5c1 AMDGPU: Remove redundant MIMG instruction variants
Summary:
For sample and gather ops, we can accurately determine the set of
vaddr-size instruction variants that are required. This reduces
the size of instruction tables by ~5%.

The number of machine instruction opcodes is reduced from 10002
to 9476.

Change-Id: Ie7fc65d3657b762c7816017fe70b2e9bec644a8a

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D48168

llvm-svn: 335232
2018-06-21 13:37:55 +00:00
Nicolai Haehnle db6911a6f9 AMDGPU: Remove old-style image intrinsics
Summary:
This also removes the need for atomic pseudo instructions, since
we select the correct encoding directly in SITargetLowering::lowerImage
for dimension-aware image intrinsics.

Mesa uses dimension-aware image intrinsics since
commit a9a7993441.

Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a

Reviewers: arsenm, rampitec, mareko, tpr, b-sumner

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48167

llvm-svn: 335231
2018-06-21 13:37:45 +00:00
Nicolai Haehnle b29ee70122 InstCombine/AMDGPU: Add dimension-aware image intrinsics to SimplifyDemanded
Summary:
Use the expanded features of the TableGen generic tables to avoid manually
adding the combinatorially exploded set of intrinsics. The
getAMDGPUImageDimIntrinsic lookup function is early-out,
i.e. non-AMDGPU intrinsics will never look at the underlying table.

Use a generic approach for getting the new intrinsic overload to keep the
code simple, and make the image dmask handling more generic:
- handle non-sampler image loads
- handle the case where the set of demanded elements is not a prefix

There is some overlap between this code and an optimization that happens
in the backend during code generation. They currently complement each other:

- only the codegen optimization can generate vec3 loads
- only the InstCombine optimization can handle D16

The InstCombine optimization also likely covers more cases since the
codegen optimization is fairly ad-hoc. Ideally, we'll remove the optimization
in codegen once the infrastructure for vec3 is in place (which will probably
take a long time).

Modify the test cases to use dimension-aware intrinsics. This makes it
easier to see that the test coverage for the new intrinsics is equivalent,
and the old style intrinsics will be removed in a follow-up commit anyway.

Change-Id: I4b91ea661413d13004956fe4ef7d13d41b8ce3ad

Reviewers: arsenm, rampitec, majnemer

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48165

llvm-svn: 335230
2018-06-21 13:37:31 +00:00
Nicolai Haehnle 7a9c03f484 AMDGPU: Select MIMG instructions manually in SITargetLowering
Summary:
Having TableGen patterns for image intrinsics is hitting limitations:
for D16 we already have to manually pre-lower the packing of data
values, and we will have to do the same for A16 eventually.

Since there is already some custom C++ code anyway, it is arguably easier
to just do everything in C++, now that we can use the beefed-up generic
tables backend of TableGen to provide all the required metadata and map
intrinsics to corresponding opcodes. With this approach, all image
intrinsic lowering happens in SITargetLowering::lowerImage. That code is
dense due to all the cases that it handles, but it should still be easier
to follow than what we had before, by virtue of it all being done in a
single location, and by virtue of not relying on the TableGen pattern
magic that very few people really understand.

This means that we will have MachineSDNodes with MIMG instructions
during DAG combining, but that seems alright: previously we had
intrinsic nodes instead, but those are similarly opaque to the generic
CodeGen infrastructure, and the final pattern matching just did a 1:1
translation to machine instructions anyway. If anything, the fact that
we now merge the address words into a vector before DAG combine should
be an advantage.

Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6

Reviewers: arsenm, rampitec, rtaylor, tstellar

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48017

llvm-svn: 335228
2018-06-21 13:36:57 +00:00
Nicolai Haehnle 0ab200b6c9 AMDGPU: Refactor MIMG instruction TableGen using generic tables
Summary:
This allows us to access rich information about MIMG opcodes from C++ code.
Simplifying the mapping between equivalent opcodes of different data size
becomes quite natural.

This also flattens the MIMG-related class and multiclass hierarchy a little,
and collapses together some of the scaffolding for sample and gather4 opcodes.

Change-Id: I1a2549fdc1e881ff100e5393d2d87e73729a0ccd

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48016

llvm-svn: 335227
2018-06-21 13:36:44 +00:00
Nicolai Haehnle e741d7e0fd AMDGPU: Use generic tables instead of SearchableTable
Summary:

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48014

Change-Id: Ibb43f90d955275571aff17d0c3ecfb5e5b299641
llvm-svn: 335226
2018-06-21 13:36:33 +00:00
Nicolai Haehnle 2367f03565 AMDGPU: Pass AMDGPUSampleVariant to MIMG_{Sampler,Gather}(_WQM)
Summary:
This will allows us to provide rich metadata about the instructions
in tables that are accessible by custom C++ code.

Change-Id: Id9305a26304ab6a6cceb6c65c8cd49141cc0101d

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48011

llvm-svn: 335224
2018-06-21 13:36:13 +00:00
Nicolai Haehnle b3a9b68513 AMDGPU: Add implicit def of SCC to kill and indirect pseudos
Summary:
Kill instructions sometimes do use SCC in unusual circumstances, when
v_cmpx cannot be used due to the operands that are involved.

Additionally, even if SCC was never defined by the expansion, kill pseudos
could previously occur between an s_cmp and an s_cbranch_scc, which breaks
the SCC liveness tracking when the pseudo is expanded to split the basic
block. While it would be possible to explicitly mark the SCC as live-in for
the successor basic block, it's simpler to just mark the pseudo as using SCC,
so that such a sequence is never emitted by instruction selection in the
first place.

A similar issue affects indirect source/dest pseudos in principle, although
I haven't been able to come up with a test case where it actually matters
(this affects instruction selection, so a MIR test can't be used).

Fixes: dEQP-GLES3.functional.shaders.discard.dynamic_loop_always
Change-Id: Ica8d82ecff1a763b892a1112cf1b06c948863a4f

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47761

llvm-svn: 335223
2018-06-21 13:36:08 +00:00
Nicolai Haehnle f267431901 AMDGPU: Turn D16 for MIMG instructions into a regular operand
Summary:
This allows us to reduce the number of different machine instruction
opcodes, which reduces the table sizes and helps flatten the TableGen
multiclass hierarchies.

We can do this because for each hardware MIMG opcode, we have a full set
of IMAGE_xxx_Vn_Vm machine instructions for all required sizes of vdata
and vaddr registers. Instead of having separate D16 machine instructions,
a packed D16 instructions loading e.g. 4 components can simply use the
same V2 opcode variant that non-D16 instructions use.

We still require a TSFlag for D16 buffer instructions, because the
D16-ness of buffer instructions is part of the opcode. Renaming the flag
should help avoid future confusion.

The one non-obvious code change is that for gather4 instructions, the
disassembler can no longer automatically decide whether to use a V2 or
a V4 variant. The existing logic which choose the correct variant for
other MIMG instruction is extended to cover gather4 as well.

As a bonus, some of the assembler error messages are now more helpful
(e.g., complaining about a wrong data size instead of a non-existing
instruction).

While we're at it, delete a whole bunch of dead legacy TableGen code.

Change-Id: I89b02c2841c06f95e662541433e597f5d4553978

Reviewers: arsenm, rampitec, kzhuravl, artem.tamazov, dp, rtaylor

Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47434

llvm-svn: 335222
2018-06-21 13:36:01 +00:00
Nicolai Haehnle 7d69e0f37d TableGen: Allow foreach in multiclass to depend on template args
Summary:
This also allows inner foreach loops to have a list that depends on
the iteration variable of an outer foreach loop. The test cases show
some very simple examples of how this can be used.

This was perhaps the last remaining major non-orthogonality in the
TableGen frontend.

Change-Id: I79b92d41a5c0e7c03cc8af4000c5e1bda5ef464d

Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D47431

llvm-svn: 335221
2018-06-21 13:35:44 +00:00
David Green d143c65de3 [DA] Enable -da-delinearize by default
This enables da-delinearize in Dependence Analysis for delinearizing array
accesses into multiple dimensions. This can help to increase the power of
Dependence analysis on multi-dimensional arrays and prevent having to fall
back to the slower and less accurate MIV tests. It adds static checks on the
bounds of the arrays to ensure that one dimension doesn't overflow into
another, and brings our code in line with our tests.

Differential Revision: https://reviews.llvm.org/D45872

llvm-svn: 335217
2018-06-21 11:53:16 +00:00
Simon Pilgrim 2a9cde026c [X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882)
These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does. 

llvm-svn: 335216
2018-06-21 11:37:13 +00:00
Mikael Holmen 42f7bc96dd [DebugInfo] Make sure all DBG_VALUEs' reguse operands have IsDebug property
Summary:
In some cases, these operands lacked the IsDebug property, which is meant to signal that
they should not affect codegen. This patch adds a check for this property in the
MachineVerifier and adds it where it was missing.

This includes refactorings to use MachineInstrBuilder construction functions instead of
manually setting up the intrinsic everywhere.

Patch by: JesperAntonsson

Reviewers: aprantl, rnk, echristo, javed.absar

Reviewed By: aprantl

Subscribers: qcolombet, sdardis, nemanjai, JDevlieghere, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D48319

llvm-svn: 335214
2018-06-21 10:03:34 +00:00
David Green a465188500 [DAGCombine] Fix alignment for offset loads/stores
The alignment parameter to getExtLoad is treated as a base alignment,
not the alignment of the load (base + offset). When we infer a better
alignment for a Ptr we need to ensure that it applies to the base to
prevent the alignment on the load from being wrong.

This fixes a bug where the alignment could then be used to incorrectly
prove noalias between a load and a store, leading to a miscompile.

Differential Revision: https://reviews.llvm.org/D48029

llvm-svn: 335210
2018-06-21 08:30:07 +00:00
Eric Christopher 12e221d9b9 Remove FIXME comment about WIP.
This is the only line other than the function signature remaining
of the original patch.

llvm-svn: 335208
2018-06-21 07:15:19 +00:00
Eric Christopher 76cfef46f3 Add some explanatory text to the associated symbol support.
llvm-svn: 335207
2018-06-21 07:15:14 +00:00
Florian Hahn d36aa1f763 Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
r335150 should resolve the issues with the clang-with-thin-lto-ubuntu
and clang-with-lto-ubuntu builders.

Original message:
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.

As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.

Reviewers: davide, mssimpso, dberlin, efriedma

Reviewed By: davide, dberlin

llvm-svn: 335206
2018-06-21 07:15:08 +00:00
Mikael Holmen 57b33f6aac [DebugInfo] Keep DBG_VALUE undef in LiveDebugVariables
Summary:
Fixes PR36579.

For cases where we had e.g.

 DBG_VALUE 42
 [...]
 DBG_VALUE undef

LiveDebugVariables would discard all undef DBG_VALUEs and then it would
look like the variable had the value 42 throughout the rest of the
function, which is incorrect.

With this patch we don't remove all undef DBG_VALUEs in LiveDebugVariables
so they will be kept after register allocation just like other DBG_VALUEs
which will yield more correct debug information.

Reviewers: aprantl

Reviewed By: aprantl

Subscribers: bjope, Ka-Ka, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D48277

llvm-svn: 335205
2018-06-21 07:02:46 +00:00
Chandler Carruth d1dab0c3c0 [PM/LoopUnswitch] Add partial non-trivial unswitching for invariant
conditions feeding a chain of `and`s or `or`s for a branch.

Much like with full non-trivial unswitching, we rely on the pass manager
to handle iterating until all of the profitable unswitches have been
done. This is to allow other more profitable unswitches to fire on any
of the cloned, simpler versions of the loop if viable.

Threading the partial unswiching through the non-trivial unswitching
logic motivated some minor refactorings. If those are too disruptive to
make it reasonable to review this patch, I can separate them out, but
it'll be somewhat timeconsuming so I wanted to send it for initial
review as-is. Feel free to tell me whether it warrants pulling apart.

I've tried to re-use (and factor out) logic form the partial trivial
unswitching, but not as much could be shared as I had haped. Still, this
wasn't as bad as I naively expected.

Some basic testing is added, but I probably need more. Suggestions for
things you'd like to see tested more than welcome. One thing I'd like to
do is add some testing that when we schedule this with loop-instsimplify
it effectively cleans up the cruft created.

Last but not least, this uncovered a bug that has been in loop cloning
the entire time for non-trivial unswitching. Specifically, we didn't
correctly add the outer-most cloned loop to the list of cloned loops.
This meant that LCSSA wouldn't be updated for it hypothetically, and
more significantly that we would never visit it in the loop pass
manager. I noticed this while checking loop-instsimplify by hand. I'll
try to separate this bugfix out into its own patch with a more focused
test. But it is just one line, so shouldn't significantly confuse the
review here.

After this patch, the only missing "feature" in this unswitch I'm aware
of us non-trivial unswitching of switches. I'll try implementing *full*
non-trivial unswitching of switches (which is at least a sound thing to
implement), but *partial* non-trivial unswitching of switches is
something I don't see any sound and principled way to implement. I also
have no interesting test cases for the latter, so I'm not really
worried. The rest of the things that need to be ported are bug-fixes and
more narrow / targeted support for specific issues.

Differential Revision: https://reviews.llvm.org/D47522

llvm-svn: 335203
2018-06-21 06:14:03 +00:00
Michael Zolotukhin 336d75cc73 ProvenanceAnalysis: Store WeakTrackingVH instead of Value* in UnderlyingValue Cache.
Summary:
Since the value stored in the cache might be deleted or replaced with
something else, we need to use tracking ValueHandlers instead of plain
Value pointers. It was discovered in one of internal builds, and
unfortunately there is no small reproducer for the issue.

The cache was introduced in rL327328.

Reviewers: ahatanak, pete

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48407

llvm-svn: 335201
2018-06-21 05:14:00 +00:00
Craig Topper 296526bf46 [X86] Remove masking from 512-bit floating max/min intrinsics. Use select instruction instead.
llvm-svn: 335199
2018-06-21 05:00:56 +00:00
Tim Shen 433b9761ce Revert "[SCEV] Improve zext(A /u B) and zext(A % B)"
This reverts commit r335197, as some bots are not happy.

llvm-svn: 335198
2018-06-21 02:15:32 +00:00
Tim Shen 5af61e0a28 [SCEV] Improve zext(A /u B) and zext(A % B)
Summary:
Try to match udiv and urem patterns, and sink zext down to the leaves.

I'm not entirely sure why some unrelated tests change, but the added <nsw>s seem right.

Reviewers: sanjoy

Subscribers: jlebar, hiraditya, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D48338

llvm-svn: 335197
2018-06-21 01:49:07 +00:00
Wolfgang Pieb 61d8c8d9b3 [DWARF] Improved error reporting for range lists.
Errors found processing the DW_AT_ranges attribute are propagated by lower level 
routines and reported by their callers.

Reviewer: JDevlieghere

Differential Revision: https://reviews.llvm.org/D48344

llvm-svn: 335188
2018-06-20 22:56:37 +00:00
Simon Dardis 0f111dd704 [mips] Add microMIPS specific addressing patterns.
These are identical but use microMIPS instructions instead of MIPS instructions.

Also, flatten the 'let AdditionalPredicates = [InMicroMips]' by using the
ISA_MICROMIPS adjective. Add tests for constant materialization.

Reviewers: atanasyan, abeserminji, smaksimovic

Differential Revision: https://reviews.llvm.org/D48275

llvm-svn: 335185
2018-06-20 22:40:12 +00:00
Alina Sbirlea dfd14adeb0 Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred.
Summary:
Two utils methods have essentially the same functionality. This is an attempt to merge them into one.
1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor

Prior to the patch:
1. MergeBasicBlockIntoOnlyPred
Updates either DomTree or DeferredDominance
Moves all instructions from Pred to BB, deletes Pred
Asserts BB has single predecessor
If address was taken, replace the block address with constant 1 (?)

2. MergeBlockIntoPredecessor
Updates DomTree, LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken

After the patch:
Method 2. MergeBlockIntoPredecessor is attempting to become the new default:
Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken

Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:

1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
Updated in this patch. No challenges.

2. lib/CodeGen/CodeGenPrepare.cpp
Updated in this patch.
  i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation.
  ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks
Some interesting aspects:
  - Since Pred is not deleted (BB is), the entry block does not need updating.
  - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred.
  - isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
  - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.
  - adding some test owner as subscribers for the interesting tests modified:
    test/CodeGen/X86/avx-cmp.ll
    test/CodeGen/AMDGPU/nested-loop-conditions.ll
    test/CodeGen/AMDGPU/si-annotate-cf.ll
    test/CodeGen/X86/hoist-spill.ll
    test/CodeGen/X86/2006-11-17-IllegalMove.ll

3. lib/Transforms/Scalar/JumpThreading.cpp
Not covered in this patch. It is the only use case using the DeferredDominance.
I would defer to Brian Rzycki to make this replacement.

Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar

Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits

Differential Revision: https://reviews.llvm.org/D48202

llvm-svn: 335183
2018-06-20 22:01:04 +00:00
Alina Sbirlea 201d02c75c [MemorySSA] Verify Phi incoming blocks are block predecessors.
Summary: Make the MemorySSA verify also check that all Phi incoming blocks are block predecessors.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D48333

llvm-svn: 335174
2018-06-20 21:06:13 +00:00
Craig Topper c2696d577b [X86] Use setcc ISD opcode for AVX512 integer comparisons all the way to isel
I don't believe there is any real reason to have separate X86 specific opcodes for vector compares. Setcc has the same behavior just uses a different encoding for the condition code.

I had to change the CondCodeAction for SETLT and SETLE to prevent some transforms from changing SETGT lowering.

Differential Revision: https://reviews.llvm.org/D43608

llvm-svn: 335173
2018-06-20 21:05:02 +00:00
Simon Pilgrim 3d1c8c97b8 [SLPVectorizer] Provide InstructionsState down the BoUpSLP vectorization call tree
As described in D48359, this patch pushes InstructionsState down the BoUpSLP call hierarchy instead of the corresponding raw OpValue. This makes it easier to track the alternate opcode etc. and avoids us having to call getAltOpcode which makes it difficult to support more than one alternate opcode.

Differential Revision: https://reviews.llvm.org/D48382

llvm-svn: 335170
2018-06-20 20:54:52 +00:00
Stanislav Mekhanoshin 20279dc025 Allow binop C1, (select cc, CF, CT) -> select folding
Previously this folding was done only if select is a first operand.
However, for non-commutative operations constant may go before
select.

Differential Revision: https://reviews.llvm.org/D48223

llvm-svn: 335167
2018-06-20 20:24:20 +00:00
Simon Dardis 6021424c10 [mips] Correct predicates for loads, bit manipulation instructions and some pseudos
Additionally, correct the definition of the rdhwr instruction.

Reviewers: atanasyan, abeserminji, smaksimovic

Differential Revision: https://reviews.llvm.org/D48216

llvm-svn: 335162
2018-06-20 19:59:58 +00:00
Matt Arsenault 5a4ec8127f AMDGPU: Fix scalar_to_vector for v4i16/v4f16
llvm-svn: 335161
2018-06-20 19:45:48 +00:00
Matt Arsenault 3d06668ad4 AMDGPU: Fix missing C++ mode comment
llvm-svn: 335160
2018-06-20 19:45:40 +00:00
Sanjay Patel 3597588493 [IR] add/use isIntDivRem convenience function
There are more existing potential users of this,
but I've limited this patch to the first couple
that I found to minimize typo risk.

llvm-svn: 335157
2018-06-20 19:02:17 +00:00
Chandler Carruth 4da3331d3d [PM/LoopUnswitch] Support partial trivial unswitching.
The idea of partial unswitching is to take a *part* of a branch's
condition that is loop invariant and just unswitching that part. This
primarily makes sense with i1 conditions of branches as opposed to
switches. When dealing with i1 conditions, we can easily extract loop
invariant inputs to a a branch and unswitch them to test them entirely
outside the loop.

As part of this, we now create much more significant cruft in the loop
body, so this relies on adding cleanup passes to the loop pipeline and
revisiting unswitched loops to do that cleanup before continuing to
process them.

This already appears to be more powerful at unswitching than the old
loop unswitch pass, and so I'd appreciate pretty careful review in case
I'm just missing some correctness checks. The `LIV-loop-condition` test
case is not unswitched by the old unswitch pass, but is with this pass.

Thanks to Sanjoy and Fedor for the review!

Differential Revision: https://reviews.llvm.org/D46706

llvm-svn: 335156
2018-06-20 18:57:07 +00:00
Alex Bradbury fafdebcfcb [RISCV] Accept fmv.s.x and fmv.x.s as mnemonic aliases for fmv.w.x and fmv.x.w
These instructions were renamed in version 2.2 of the user-level ISA spec, but 
the old name should also be accepted by standard tools.

llvm-svn: 335154
2018-06-20 18:42:25 +00:00
Vedant Kumar 4e93f3dcf8 [Local] Generalize insertReplacementDbgValues, NFC
This utility should operate on Values, not Instructions. While I'm here,
I've also made it possible to skip emitting replacement dbg.values for
certain debug users (by having RewriteExpr return nullptr).

llvm-svn: 335152
2018-06-20 18:40:14 +00:00
Florian Hahn 5ac2629823 [PredicateInfo] Order instructions in different BBs by DFSNumIn.
Using OrderedInstructions::dominates as comparator for instructions in
BBs without dominance relation can cause a non-deterministic order
between such instructions. That in turn can cause us to materialize
copies in a non-deterministic order. While this does not effect
correctness, it causes some minor non-determinism in the final generated
code, because values have slightly different labels.

Without this patch, running -print-predicateinfo on a reasonably large
module produces slightly different output on each run.

This patch uses the dominator trees DFSInNum to order instruction from
different BBs, which should enforce a deterministic ordering and
guarantee that dominated instructions come after the instructions that
dominate them.

Reviewers: dberlin, efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D48230

llvm-svn: 335150
2018-06-20 17:42:01 +00:00
Paul Robinson 8e3e374e5f [DWARF] Don't keep a ref to possibly stack allocated data.
llvm-svn: 335146
2018-06-20 17:08:46 +00:00
Vlad Tsyrklevich 34b9112216 IRMover: Account for matching types present across modules
Summary:
Due to uniqueing of DICompositeTypes, it's possible for a type from one
module to be loaded into another earlier module without being renamed.
Then when the defining module is being IRMoved, the type can be used as
a Mapping destination before being loaded, such that when it's requested
using TypeMapTy::get() it will fail with an assertion that the type is a
source type when it's actually a type in both the source and
destination modules. Correctly handle that case by allowing a non-opaque
non-literal struct type be present in both modules.

Fix for PR37684.

Reviewers: pcc, tejohnson

Reviewed By: pcc, tejohnson

Subscribers: tobiasvk, mehdi_amini, steven_wu, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D47898

llvm-svn: 335145
2018-06-20 16:50:56 +00:00
Vedant Kumar 6fa24b0b7f [Local] Add a utility to insert replacement dbg.values, NFC
The purpose of this utility is to make it easier for optimizations to
insert replacement dbg.values for instructions they are deleting. This
is useful in situations where salvageDebugInfo is inapplicable, say,
because the new dbg.value cannot refer to an operand of the dying value.

The utility is called insertReplacementDbgValues.

It assumes that the instruction 'From' is going to be deleted, and
inserts replacement dbg.values for each debug user of 'From'. The
newly-inserted dbg.values refer to 'To' instead of 'From'. Each
replacement dbg.value has the same location and variable as the debug
user it replaces, has a DIExpression determined by the result of
'RewriteExpr' applied to an old debug user of 'From', and is placed
before 'InsertBefore'.

This should simplify future patches, like D48331.

llvm-svn: 335144
2018-06-20 16:50:25 +00:00
Paul Robinson 37fbb92652 Remove a redundant initialization. NFC
llvm-svn: 335143
2018-06-20 16:12:03 +00:00
Simon Pilgrim 292651a5b7 [SLPVectorizer] Move isOneOf after InstructionsState type. NFCI.
A future patch will have isOneOf use InstructionsState.

llvm-svn: 335142
2018-06-20 16:11:00 +00:00
Bjorn Pettersson 7bf676662a [DAG] Don't map a TableId to itself in the ReplacedValues map
Summary:
Found some regressions (infinite loop in DAGTypeLegalizer::RemapId)
after r334880. This patch makes sure that we do map a TableId to
itself.

Reviewers: niravd

Reviewed By: niravd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48364

llvm-svn: 335141
2018-06-20 16:06:09 +00:00
Nirav Dave cd558887d3 [DAG] Fix and-mask folding when narrowing loads.
Summary:
Check that and masks are strictly smaller than implicit mask from
narrowed load.

Fixes PR37820.

Reviewers: samparker, RKSimon, nemanjai

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48335

llvm-svn: 335137
2018-06-20 15:36:29 +00:00
Sam Clegg 52564675c4 [WebAssembly] Update know failures for the wasm waterfall
Summary:
The waterfall no longer builds .s files and no longers uses
the wasm-o when it builds object files.

Subscribers: dschuff, jgravelle-google, aheejin, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D48371

llvm-svn: 335135
2018-06-20 15:17:12 +00:00
Simon Pilgrim 0c9f8dcde7 [SLPVectorizer] Use InstructionsState to record AltOpcode
This is part of a move towards generalizing the alternate opcode mechanism and not just supporting (F)Add/(F)Sub counterparts.

The patch embeds the AltOpcode in the InstructionsState instead of calling getAltOpcode so often.

I'm hoping to eventually remove all uses of getAltOpcode and handle alternate opcode selection entirely within getSameOpcode, that will require us to use InstructionsState throughout the BoUpSLP call hierarchy (similar to some of the changes in D28907), which I will begin in future patches.

Differential Revision: https://reviews.llvm.org/D48359

llvm-svn: 335134
2018-06-20 15:13:40 +00:00
Simon Pilgrim 2e2f20a949 [SLPVectorizer] Relax "alternate" opcode vectorisation to work with any SK_Select shuffle pattern
D47985 saw the old SK_Alternate 'alternating' shuffle mask replaced with the SK_Select mask which accepts either input operand for each lane, equivalent to a vector select with a constant condition operand.

This patch updates SLPVectorizer to make full use of this SK_Select shuffle pattern by removing the 'isOdd()' limitation.

The AArch64 regression will be fixed by D48172.

Differential Revision: https://reviews.llvm.org/D48174

llvm-svn: 335130
2018-06-20 14:26:28 +00:00
Sanjay Patel 0c57de4c21 [InstSimplify] Fix missed optimization in simplifyUnsignedRangeCheck()
For both operands are unsigned, the following optimizations are valid, and missing:

   1. X > Y && X != 0 --> X > Y
   2. X > Y || X != 0 --> X != 0
   3. X <= Y || X != 0 --> true
   4. X <= Y || X == 0 --> X <= Y
   5. X > Y && X == 0 --> false

unsigned foo(unsigned x, unsigned y) { return x > y && x != 0; }
should fold to x > y, but I found we haven't done it right now.
besides, unsigned foo(unsigned x, unsigned y) { return x < y && y != 0; }
Has been folded to x < y, so there may be a bug.

Patch by: Li Jia He!

Differential Revision: https://reviews.llvm.org/D47922

llvm-svn: 335129
2018-06-20 14:22:49 +00:00
Alex Bradbury 79d2b50ca8 [RISCV] Add InstAlias definitions for fgt.{s|d}, fge.{s|d}
These are produced by GCC and supported by GAS, but not currently contained in 
the pseudoinstruction listing in the RISC-V ISA manual.

llvm-svn: 335127
2018-06-20 14:03:02 +00:00
Krzysztof Parzyszek d8b780dcd6 [Hexagon] Remove 'T' from HasVNN predicates, NFC
Patch by Sumanth Gundapaneni.

llvm-svn: 335124
2018-06-20 13:56:09 +00:00
Simon Dardis eae99120b0 [mips] Fix the predicates of some DSP instructions from AdditionalPredicates to ASEPredicate
Reviewers: atanasyan, abeserminji, smaksimovic

Differential Revision: https://reviews.llvm.org/D48166

llvm-svn: 335122
2018-06-20 13:29:57 +00:00
Sanjay Patel 825a4faa8d [InstCombine] ignore debuginfo when removing redundant assumes (PR37726)
This is similar to:
rL335083

Fixes::
https://bugs.llvm.org/show_bug.cgi?id=37726

llvm-svn: 335121
2018-06-20 13:22:26 +00:00
Alex Bradbury 18b9bd7d6c [RISCV] Add InstAlias definitions for sgt and sgtu
These are produced by GCC and supported by GAS, but not currently contained in 
the pseudoinstruction listing in the RISC-V ISA manual.

llvm-svn: 335120
2018-06-20 12:54:02 +00:00
Tim Northover 644a819534 ARM: convert ORR instructions to ADD where possible on Thumb.
Thumb has more 16-bit encoding space dedicated to ADD than ORR, allowing both a
3-address encoding and a wider range of immediates. So, particularly when
optimizing for code size (but it doesn't make things worse elsewhere) it's
beneficial to select an OR operation to an ADD if we know overflow won't occur.

This is made even better by LLVM's penchant for putting operations in canonical
form by converting the other way.

llvm-svn: 335119
2018-06-20 12:09:44 +00:00
Tim Northover 70666e7765 [AArch64] Implement FLT_ROUNDS macro.
Very similar to ARM implementation, just maps to an MRS.

Should fix PR25191.

Patch by Michael Brase.

llvm-svn: 335118
2018-06-20 12:09:01 +00:00
Andrea Di Biagio 2145b13fc9 [llvm-mca][X86] Teach how to identify register writes that implicitly clear the upper portion of a super-register.
This patch teaches llvm-mca how to identify register writes that implicitly zero
the upper portion of a super-register.

On X86-64, a general purpose register is implemented in hardware as a 64-bit
register. Quoting the Intel 64 Software Developer's Manual: "an update to the
lower 32 bits of a 64 bit integer register is architecturally defined to zero
extend the upper 32 bits".  Also, a write to an XMM register performed by an AVX
instruction implicitly zeroes the upper 128 bits of the aliasing YMM register.

This patch adds a new method named clearsSuperRegisters to the MCInstrAnalysis
interface to help identify instructions that implicitly clear the upper portion
of a super-register.  The rest of the patch teaches llvm-mca how to use that new
method to obtain the information, and update the register dependencies
accordingly.

I compared the kernels from tests clear-super-register-1.s and
clear-super-register-2.s against the output from perf on btver2.  Previously
there was a large discrepancy between the estimated IPC and the measured IPC.
Now the differences are mostly in the noise.

Differential Revision: https://reviews.llvm.org/D48225

llvm-svn: 335113
2018-06-20 10:08:11 +00:00
Simon Pilgrim b7ac037797 [SLPVectorizer] Split Tree/Reduction cost calls to simplify debugging. NFCI.
llvm-svn: 335110
2018-06-20 09:39:01 +00:00
Roman Lebedev 42a1ff11fb [NFC][SCEV] Add tests related to bit masking (PR37793)
Summary:
Related to https://bugs.llvm.org/show_bug.cgi?id=37793, https://reviews.llvm.org/D46760#1127287

We'd like to do this canonicalization https://rise4fun.com/Alive/Gmc
But it is currently restricted by rL155136 / rL155362, which says:
```
    // This is a constant shift of a constant shift. Be careful about hiding
    // shl instructions behind bit masks. They are used to represent multiplies
    // by a constant, and it is important that simple arithmetic expressions
    // are still recognizable by scalar evolution.
    //
    // The transforms applied to shl are very similar to the transforms applied
    // to mul by constant. We can be more aggressive about optimizing right
    // shifts.
    //
    // Combinations of right and left shifts will still be optimized in
    // DAGCombine where scalar evolution no longer applies.
```

I think these tests show that for *constants*, SCEV has no issues with that canonicalization.

Reviewers: mkazantsev, spatel, efriedma, sanjoy

Reviewed By: mkazantsev

Subscribers: sanjoy, javed.absar, llvm-commits, stoklund, bixia

Differential Revision: https://reviews.llvm.org/D48229

llvm-svn: 335101
2018-06-20 07:54:11 +00:00
Roman Lebedev d23b6831de [X86][Znver1] Specify Register Files, RCU; FP scheduler capacity.
Summary:
First off: i do not have any access to that processor,
so this is purely theoretical, no benchmarks.

I have been looking into b**d**ver2 scheduling profile, and while cross-referencing
the existing b**t**ver2, znver1 profiles, and the reference docs
(`Software Optimization Guide for AMD Family {15,16,17}h Processors`),
i have noticed that only b**t**ver2 scheduling profile specifies these.

Also, there is no mca test coverage.

Reviewers: RKSimon, craig.topper, courbet, GGanesh, andreadb

Reviewed By: GGanesh

Subscribers: gbedwell, vprasad, ddibyend, shivaram, Ashutosh, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D47676

llvm-svn: 335099
2018-06-20 07:01:14 +00:00
Clement Courbet 7b9913fb9f [X86] Add sched class WriteLAHFSAHF and fix values.
Summary:
I ran llvm-exegesis on SKX, SKL, BDW, HSW, SNB.
Atom is from Agner and SLM is a guess.
I've left AMD processors alone.

Reviewers: RKSimon, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48079

llvm-svn: 335097
2018-06-20 06:13:39 +00:00
Hiroshi Inoue c73b6d6bf7 [NFC] fix trivial typos in comments
llvm-svn: 335096
2018-06-20 05:29:26 +00:00
Craig Topper ddd88a559f [DAGCombiner] Add some comments to some true/false arguments to make it obvious what they are. NFC
llvm-svn: 335095
2018-06-20 04:32:07 +00:00
Craig Topper 59da74370b [SelectionDAG] Don't crash on inline assembly errors when the inline assembly return type is a struct.
Summary:
If we get an error building the SelectionDAG for inline assembly we try to continue and still build the DAG.

But if the return type for the inline assembly is a struct we end up crashing because we try to create an UNDEF node with a struct type which isn't valid.

Instead we need to create an UNDEF for each element of the struct and join them with merge_values.

This patch relies on single operand merge_values being handled gracefully by getMergeValues. If the return type is void there will be no VTs returned by ComputeValueVTs and now we just return instead of calling setValue. Hopefully that's ok, I assumed nothing would need to look up the mapped value for void node.

Fixes PR37359

Reviewers: rengolin, rovka, echristo, efriedma, bogner

Reviewed By: efriedma

Subscribers: craig.topper, llvm-commits

Differential Revision: https://reviews.llvm.org/D46560

llvm-svn: 335093
2018-06-20 04:32:05 +00:00
Craig Topper d22ad8568c [X86] Use binary search of the EVEX->VEX static tables instead of populating two DenseMaps for lookups
Summary:
After r335018, the static tables are guaranteed sorted by the EVEX opcode to convert. We can use this to do a binary search and remove the need for any secondary data structures.

Right now one table is 736 entries and the other is 482 entries. It might make sense to merge the two tables as a follow up. The effort it takes to select the table is probably similar to the extra binary search step it would require for a larger table.

I haven't done any measurements to see if this has any effect on compile time, but I don't imagine that EVEX->VEX conversion is a place we spend a lot of time.

Reviewers: RKSimon, spatel, chandlerc

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48312

llvm-svn: 335092
2018-06-20 04:32:04 +00:00
Vlad Tsyrklevich 98724e582e Revert r334980 and 334983
This reverts commits r334980 and r334983 because they were causing build
timeouts on the x86_64-linux-ubsan bot.

llvm-svn: 335085
2018-06-20 00:02:32 +00:00
Vedant Kumar f01827f2d1 [IR] Introduce helpers to skip debug instructions (NFC)
This patch introduces two helpers to make it easier to ignore debug
intrinsics:

- Instruction::getNextNonDebugInstruction()

This is just like Instruction::getNextNode(), except that it skips debug
info.

- skipDebugInfo(BasicBlock::iterator)

A free function which advances a BasicBlock iterator past any debug
info. This is a no-op when the iterator already points to a non-debug
instruction.

Part of: llvm.org/PR37728
Related to: https://reviews.llvm.org/D47874

Differential Revision: https://reviews.llvm.org/D48305

llvm-svn: 335083
2018-06-19 23:42:17 +00:00
Philip Reames 8befa295ea [InlineSpiller] Fix a crash due to lack of forward progress from remat specifically for STATEPOINT
This patch covers up a fairly fundemental issue around remat and register allocation which shows up with psuedo instructions with more vreg uses than there are physical registers.  This patch essentially just disables remat for STATEPOINTs which are the only case we've seen so far, but long term we need a better fix.

For STATEPOINTs specifically, this is a strict improvement.  It unblocks progress towards enabling a currently off-by-default mode which integrates deopt bundle operand lowering with register allocator spilling so that we end up with smaller stack sizes and more optimally placed spills.  Assming no other issues turn up during my next round of integration testing - which based on experience so far, is admittedly unlikely - we might finally be able to enable something I've been working towards in small bits and pieces for years now.  :)

For psuedo ops in general, there are a couple of ideas for a "proper fix" discussed on the bug, but I'm far enough outside my knowledge area to not be able to see any of them through to a successful conclusion.  If anyone wants to help out here, please do.  

Differential Revision: https://reviews.llvm.org/D41098

llvm-svn: 335077
2018-06-19 21:19:59 +00:00
Jessica Paquette 32de26d432 [MachineOutliner] NFC: Remove insertOutlinerPrologue, rename insertOutlinerEpilogue
insertOutlinerPrologue was not used by any target, and prologue-esque code was
beginning to appear in insertOutlinerEpilogue. Refactor that into one function,
buildOutlinedFrame.

This just removes insertOutlinerPrologue and renames insertOutlinerEpilogue.

llvm-svn: 335076
2018-06-19 21:14:48 +00:00
Heejin Ahn 891a747266 [WebAssembly] Fix liveness tracking info after drop insertion
Summary:
This fixes liveness tracking information after `drop` instruction
insertion in ExplicitLocals pass.

When a drop instruction is inserted to drop a dead register operand, the
original operand should be marked not dead anymore because it is now
used by the new drop instruction. And the operand to the new drop
instruction should be marked killed instead. This bug caused some
programs to fail when `llc` is run with `-verify-machineinstrs` option.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D48253

llvm-svn: 335074
2018-06-19 20:30:42 +00:00
Sanjay Patel 2ca3360b11 [IR] move shuffle mask queries from TTI to ShuffleVectorInst
The optimizer is getting smarter (eg, D47986) about differentiating shuffles 
based on its mask values, so we should make queries on the mask constant 
operand generally available to avoid code duplication.

We'll probably use this soon in the vectorizers and instcombine (D48023 and 
https://bugs.llvm.org/show_bug.cgi?id=37806).

We might clean up TTI a bit more once all of its current 'SK_*' options are 
covered.

Differential Revision: https://reviews.llvm.org/D48236

llvm-svn: 335067
2018-06-19 18:44:00 +00:00
Matt Davis a245c765a8 [MIRParser] Update a diagnostic message to use the correct register sigil. NFC
Summary:
Patch r323922 changed the sigil for physical registers to '$',  instead of '%'.
An error message was missed during this change, and reports the wrong sigil.
This patch corrects that diagnostic and the tests that check that error string.


Reviewers: zer0, bjope

Reviewed By: bjope

Subscribers: bjope, thegameg, plotfi, llvm-commits

Differential Revision: https://reviews.llvm.org/D48086

llvm-svn: 335066
2018-06-19 18:39:40 +00:00
Krzysztof Parzyszek 03aa8f3a24 [Hexagon] Fix the value of HexagonII::TypeCVI_FIRST
This value is the first vector instruction type in numerical order. The
previous value was incorrect, leaving TypeCVI_GATHER outside of the range
for vector instructions. This caused vector .new instructions to be
incorrectly encoded in the presence of gather.

llvm-svn: 335065
2018-06-19 18:09:54 +00:00
Craig Topper 0b7936737b [X86] Initialize FMA3Info directly in its constructor instead of relying on std::call_once
FMA3Info only exists as a managed static. As far as I know the ManagedStatic construction proccess is thread safe. It doesn't look like we ever access the ManagedStatic object without immediately doing a query on it that would require the map to be populated. So I don't think we're ever deferring the calculation of the tables from the construction of the object.

So I think we should be able to just populate the FMA3Info map directly in the constructor and get rid of all of the initGroupsOnce stuff.

Differential Revision: https://reviews.llvm.org/D48194

llvm-svn: 335064
2018-06-19 18:06:52 +00:00
Craig Topper 7ffa976993 [X86] Don't fold unaligned loads into SSE ROUNDPS/ROUNDPD for ceil/floor/nearbyint/rint/trunc.
Incorrect patterns were added in r334460. This changes them to check alignment properly for SSE.

llvm-svn: 335062
2018-06-19 17:51:42 +00:00
Krzysztof Parzyszek 5c2944c4f2 [Hexagon] Enforce restrictions on packetizing cache instructions
llvm-svn: 335061
2018-06-19 17:26:20 +00:00
Simon Dardis af38a8fed6 [mips] Mark microMIPS64 as being unsupported.
There are no provided instruction definitions for this architecture.

Reviewers: smaksimovic, atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D48320

llvm-svn: 335057
2018-06-19 16:05:44 +00:00
Simon Dardis 9e80c340f7 [mips] Fix the predicates of some aliases
Previously, some aliases were marked as not being available for microMIPS32R6,
but this was overridden at the top level.

Reviewers: atanasyan, abeserminji, smaksimovic

Differential Revision: https://reviews.llvm.org/D48321

llvm-svn: 335053
2018-06-19 15:25:01 +00:00
Simon Pilgrim 0461393660 [SLPVectorizer] Remove default OperandValueKind arguments from getArithmeticInstrCost calls (NFC)
The getArithmeticInstrCost calls for shuffle vectors entry costs specify TargetTransformInfo::OperandValueKind arguments, but are just using the method's default values. This seems to be a copy + paste issue and doesn't affect the costs in anyway. The TargetTransformInfo::OperandValueProperties default arguments are already not being used.

Noticed while working on D47985.

Differential Revision: https://reviews.llvm.org/D48008

llvm-svn: 335045
2018-06-19 13:40:00 +00:00
Strahinja Petrovic bb2b00bb80 [PowerPC] Fix label address calculation for ppc32
This patch fixes calculating address of label on ppc32 (for -fPIC).

Differential Revision: https://reviews.llvm.org/D46582

llvm-svn: 335043
2018-06-19 13:07:40 +00:00
Mikhail Dvoretckii 8393f90717 [InstCombine] Replacing X86-specific rounding intrinsics with generic floor-ceil
This patch replaces calls to X86-specific intrinsics with floor-ceil semantics
with calls to target-independent @llvm.floor.* and @llvm.ceil.* intrinsics. This
doesn't affect the resulting machine code, as those intrinsics are lowered to
the same instructions, but exposes these specific rounding cases to generic
optimizations.

Differential Revision: https://reviews.llvm.org/D48067

llvm-svn: 335039
2018-06-19 10:49:12 +00:00
Mikhail Dvoretckii b1ce7765be [X86] VRNDSCALE* folding from masked and scalar ffloor and fceil patterns
This patch handles back-end folding of generic patterns created by lowering the
X86 rounding intrinsics to native IR in cases where the instruction isn't a
straightforward packed values rounding operation, but a masked operation or a
scalar operation.

Differential Revision: https://reviews.llvm.org/D45203

llvm-svn: 335037
2018-06-19 10:37:52 +00:00
David Green e6a9c24878 [LoopSimplifyCFG] Invalidate SCEV in LoopSimplifyCFG
LoopSimplifyCFG, being a loop pass, needs to preserve scalar
evolution. This invalidates SE for the loops altered during
block merging.

Differential Revision: https://reviews.llvm.org/D48258

llvm-svn: 335036
2018-06-19 09:43:36 +00:00
Simon Pilgrim c966f7213e [SLPVectorizer] Pull out AltOpcode determination from reorderAltShuffleOperands.
Minor step towards making the alternate opcode system work with a wider range of opcode pairs.

llvm-svn: 335032
2018-06-19 09:16:06 +00:00
Bjorn Pettersson 2015a39955 Remove valueCoversEntireFragment asserts in ConvertDebugDeclareToDebugValue
This is a fixup for r334830 causing problems in polly-aosp buildbot.

Focus in r334830 was to fix a problem seen with
ConvertDebugDeclareToDebugValue involving store instructions.
It also added some asserts to find out of similar problems
existed for the ConvertDebugDeclareToDebugValue functions
involving load and phi instructions. One of those asserts seems
to blow in the polly-aosp buildbot, so I'll revert the asserts
while debugging.

llvm-svn: 335031
2018-06-19 08:41:34 +00:00
Florian Hahn d8fcf0de31 [LoopInterchange] Move PHI handling to adjustLoopBranches.
This patch moves the logic to handle reduction PHI nodes to the end of
adjustLoopBranches. Reduction PHI nodes in the outer loop header can be
moved to the inner loop header and reduction PHI nodes from the inner loop
header can be moved to the outer loop header. In the latter situation,
we have to deal with 1 kind of PHI nodes:

    PHI nodes that are part of inner loop-only reductions.

We can replace the PHI node with the value coming from outside
the inner loop.

Reviewers: mcrosier, efriedma, karthikthecool

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D46198

llvm-svn: 335027
2018-06-19 08:03:24 +00:00
Mikhail Dvoretckii bd8ed2dbaa Test commit.
llvm-svn: 335026
2018-06-19 07:55:10 +00:00
QingShan Zhang 9f0fe9a3f8 If the arch is P9, we will select the DFLOADf32/DFLOADf64 pseudo instruction when we are loading a floating,
and expand it post RA basing on the register pressure. However, we miss to do the add-imm peephole for these pseudo instruction.

Differential Revision: https://reviews.llvm.org/D47568
Reviewed By: Nemanjai

llvm-svn: 335024
2018-06-19 06:54:51 +00:00
Max Kazantsev 37da4333a8 [SimplifyIndVars] Eliminate redundant truncs
This patch adds logic to deal with the following constructions:

  %iv = phi i64 ...
  %trunc = trunc i64 %iv to i32
  %cmp = icmp <pred> i32 %trunc, %invariant

Replacing it with
  %iv = phi i64 ...
  %cmp = icmp <pred> i64 %iv, sext/zext(%invariant)

In case if it is legal. Specifically, if `%iv` has signed comparison users, it is
required that `sext(trunc(%iv)) == %iv`, and if it has unsigned comparison
uses then we require `zext(trunc(%iv)) == %iv`. The current implementation
bails if `%trunc` has other uses than `icmp`, but in theory we can handle more
cases here (e.g. if the user of trunc is bitcast).

Differential Revision: https://reviews.llvm.org/D47928
Reviewed By: reames

llvm-svn: 335020
2018-06-19 04:48:34 +00:00
Craig Topper c2965214ef [X86] Add the ability to force an EVEX2VEX mapping table entry from the .td files. Remove remaining manual table entries from the tablegen emitter.
This adds an EVEX2VEXOverride string to the X86 instruction class in X86InstrFormats.td. If this field is set it will add manual entry in the EVEX->VEX tables that doesn't check the encoding information.

Then use this mechanism to map VMOVDU/A8/16, 128-bit VALIGN, and VPSHUFF/I instructions to VEX instructions.

Finally, remove the manual table from the emitter.

This has the bonus of fully sorting the autogenerated EVEX->VEX tables by their EVEX instruction enum value. We may be able to use this to do a binary search for the conversion and get rid of the need to create a DenseMap.

llvm-svn: 335018
2018-06-19 04:24:44 +00:00
Craig Topper 0a5e90cc2a [X86] Add a new VEX_WPrefix encoding to tag EVEX instruction that have VEX.W==1, but can be converted to their VEX equivalent that uses VEX.W==0.
EVEX makes heavy use of the VEX.W bit to indicate 64-bit element vs 32-bit elements. Many of the VEX instructions were split into 2 versions with different masking granularity.

The EVEX->VEX table generate can collapse the two versions if the VEX version uses is tagged as VEX_WIG. But if the VEX version is instead marked VEX.W==0 we can't combine them because we don't know if there is also a VEX version with VEX.W==1.

This patch adds a new VEX_W1X tag that indicates the EVEX instruction encodes with VEX.W==1, but is safe to convert to a VEX instruction with VEX.W==0.

This allows us to remove a bunch of manual EVEX->VEX table entries. We may want to look into splitting up the VEX_WPrefix field which would simplify the disassembler.

llvm-svn: 335017
2018-06-19 04:24:42 +00:00
Sanjoy Das 6e9b355cc9 Revert "[SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags"
This reverts r334428.  It incorrectly marks some multiplications as nuw.  Tim
Shen is working on a proper fix.

Original commit message:

[SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags where safe.

Summary:
Previously we would add them for adds, but not multiplies.

llvm-svn: 335016
2018-06-19 04:09:44 +00:00
Craig Topper 46c0b368d6 [X86] Simplify the TSFlags checking code in EvexToVexInstPass. NFCI
The code was previously checking the L2 and L flag on 3 separate lines, treating the combination as an encoding. Instead its better to think of the L2 bit as being something that can't be done with VEX and early returning. Then we just need to check the L bit.

llvm-svn: 335015
2018-06-19 03:17:46 +00:00
Heejin Ahn 817811caae [WebAssembly] Add more utility functions
Summary:
Added more utility functions that will be used in EH-related passes Also
changed `LoopBottom` function to `getBottom` and uses templates to be
able to handle other classes as well, which will be used in CFGSort
later.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D48262

llvm-svn: 335006
2018-06-19 00:32:03 +00:00
Heejin Ahn 33c3fce592 [WebAssembly] Add WasmEHFuncInfo for unwind destination information
Summary:
Add WasmEHFuncInfo and routines to calculate and fill in this struct to
keep track of unwind destination information. This will be used in
other EH related passes.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D48263

llvm-svn: 335005
2018-06-19 00:26:39 +00:00
Heejin Ahn 6279d71227 [WebAssembly] Make rethrow instruction take a target BB argument
Summary:
This patch changes the rethrow instruction to take a BB argument in LLVM
backend, like `br` and `br_if`s. This BB is a target catch BB the
rethrow instruction unwinds to. This BB argument will be converted to an
relative depth immediate at the end of CFGStackify pass, as in the same
way of branches.

RETHROW_TO_CALLER is a codegen-only instruction that should be used when
a rethrow instruction does not have an unwind destination BB, i.e., it
should rethrow to its caller function.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D48260

llvm-svn: 334998
2018-06-18 23:54:29 +00:00
Michael Berg 7b993d762f Utilize new SDNode flag functionality to expand current support for fadd
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed.

Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar

Reviewed By: spatel

Subscribers: wdng, nhaehnle

Differential Revision: https://reviews.llvm.org/D47909

llvm-svn: 334996
2018-06-18 23:44:59 +00:00
Craig Topper a7b7f2f4d8 [X86] Remove ReadAfterLd from avx512_shift_rmbi multiclass.
The instructions that use this class don't have another source register. So I think this was just marking one of the address operands as ReadAfterLd?

llvm-svn: 334994
2018-06-18 23:20:57 +00:00
Xin Tong 54b4227f32 Revert "Simplify blockaddress usage before giving up in MergeBlockIntoPredecessor"
This reverts commit f976cf4cca0794267f28b54e468007fd476d37d9.

I am reverting this because it causes break in a few bots and its going
to take me sometime to look at this.

llvm-svn: 334993
2018-06-18 23:20:08 +00:00
Xin Tong bfd8cfcb8d Simplify blockaddress usage before giving up in MergeBlockIntoPredecessor
Summary:
Simplify blockaddress usage before giving up in MergeBlockIntoPredecessor

This is a missing small optimization in MergeBlockIntoPredecessor.

This helps with one simplifycfg test which expects this case to be handled.

Reviewers: davide, spatel, brzycki, asbirlea

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48284

llvm-svn: 334992
2018-06-18 22:59:13 +00:00
Eric Christopher 88bbad24c0 Tidy comment language and explanation.
llvm-svn: 334990
2018-06-18 22:21:19 +00:00
Eric Christopher b1faaf3069 Pull non-lazy stub table emission into a separate function alongside
the individual stub creation to increase readability a bit in the
non-object file format specific function.

llvm-svn: 334989
2018-06-18 22:21:18 +00:00
Eric Christopher b6d0b99f3b Add return statements to make it clear that all of these are mutually exclusive conditions.
else if would have worked just as well, but this keeps the original readability a bit more clear.

llvm-svn: 334988
2018-06-18 22:21:13 +00:00
Wouter van Oortmerssen 48dac3109e [WebAssembly] Modified tablegen defs to have 2 parallel instuction sets.
Summary:
One for register based, much like the existing definitions,
and one for stack based (suffix _S).

This allows us to use registers in most of LLVM (which works better),
and stack based in MC (which results in a simpler and more readable
assembler / disassembler).

Tried to keep this change as small as possible while passing tests,
follow-up commit will:
- Add reg->stack conversion in MI.
- Fix asm/disasm in MC to be stack based.
- Fix emitter to be stack based.

tests passing:
llvm-lit -v `find test -name WebAssembly`

test/CodeGen/WebAssembly
test/MC/WebAssembly
test/MC/Disassembler/WebAssembly
test/DebugInfo/WebAssembly
test/CodeGen/MIR/WebAssembly
test/tools/llvm-objdump/WebAssembly

Reviewers: dschuff, sbc100, jgravelle-google, sunfish

Subscribers: aheejin, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D48183

llvm-svn: 334985
2018-06-18 21:22:44 +00:00
Michael Berg 932ba20af8 refactor of visitFADD for AllowNewConst cases
Summary: Refactoring for all constant cases which require AllowNewConst and some staging for future fmf usage.

Reviewers: spatel, hfinkel, wristow

Reviewed By: spatel

Subscribers: nhaehnle

Differential Revision: https://reviews.llvm.org/D48289

llvm-svn: 334984
2018-06-18 21:12:21 +00:00
Sander de Smalen 067eee1c13 [AArch64][SVE] Asm: Fix predicate pattern diagnostics.
This patch uses the DiagnosticPredicate for SVE predicate patterns
to improve their diagnostics, now giving a 'invalid operand' diagnostic
if the type is not an immediate or one of the expected pattern
labels.

Reviewers: samparker, SjoerdMeijer, javed.absar, fhahn

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48220

llvm-svn: 334983
2018-06-18 21:03:02 +00:00
Sander de Smalen 7ac9e193ec [AArch64][SVE] Asm: Support for saturating INC/DEC (32bit scalar) instructions.
The variants added by this patch are:
- SQINC     signed increment, e.g. sqinc x0, w0, all, mul #4
- SQDEC     signed decrement, e.g. sqdec x0, w0, all, mul #4
- UQINC   unsigned increment, e.g. uqinc w0, all, mul #4
- UQDEC   unsigned decrement, e.g. uqdec w0, all, mul #4
 
This patch includes asmparser changes to parse a GPR64 as a GPR32 in
order to satisfy the constraint check:
  x0 == GPR64(w0)
in:
  sqinc x0, w0, all, mul #4
         ^___^ (must match)

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47716

llvm-svn: 334980
2018-06-18 20:50:33 +00:00
Wouter van Oortmerssen 78c62966c2 [WebAssembly] Cleaned up register accessors in WebAssemblyMachineFunctionInfo.h
Tested: llvm-lit -v `find test -name WebAssembly`

(This is a commit access "test commit" :)

llvm-svn: 334979
2018-06-18 20:45:49 +00:00
Craig Topper 17bd84c12c [X86] Encode the EVEX2VEX exception list information in .td files instead of the emitter source.
Rather than having an exclusion list in tablegen sources, add a flag to the X86 instruction records that can be used to suppress checking for convertibility.

llvm-svn: 334971
2018-06-18 18:47:07 +00:00
Michael Berg cafe947445 [NFC] make MIFlag accessor functions consistant with usage model
llvm-svn: 334970
2018-06-18 18:37:48 +00:00
Florian Hahn 3385caaafd [VPlan] Add VPInstruction to VPRecipe transformation.
This patch introduces a VPInstructionToVPRecipe transformation, which
allows us to generate code for a VPInstruction based VPlan re-using the
existing infrastructure.

Reviewers: dcaballe, hsaito, mssimpso, hfinkel, rengolin, mkuper, javed.absar, sguggill

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D46827

llvm-svn: 334969
2018-06-18 18:28:49 +00:00
Lang Hames 68c9b8d6a1 [ORC] Add an initial implementation of a replacement CompileOnDemandLayer.
CompileOnDemandLayer2 is a replacement for CompileOnDemandLayer built on the ORC
Core APIs. Functions in added modules are extracted and compiled lazily.
CompileOnDemandLayer2 supports multithreaded JIT'd code, and compilation on
multiple threads.

llvm-svn: 334967
2018-06-18 18:01:43 +00:00
Lang Hames 2e96114caf [ORC] Keep weak flag on VSO symbol tables during materialization, but treat
materializing weak symbols as strong.

This removes some elaborate flag tweaking and plays nicer with RuntimeDyld,
which relies of weak/common flags to determine whether it should emit a given
weak definition. (Switching to strong up-front makes it appear as if there is
already an overriding definition, which would require an extra back-channel to
override).

llvm-svn: 334966
2018-06-18 18:01:41 +00:00
Krzysztof Parzyszek 546017322f Shrink interval after moving copy in removePartialRedundancy
llvm-svn: 334963
2018-06-18 17:16:39 +00:00
Nirav Dave b35f9e1459 Fix typoed cast to avoid assertion in MCFragment::dump.
llvm-svn: 334959
2018-06-18 16:26:11 +00:00
Simon Pilgrim 5b962b2fc3 [SLPVectorizer] Tidyup isShuffle helper
Ensure we keep track of the input vectors in all cases instead of just for SK_Select.

Ideally we'd reuse the shuffle mask pattern matching in TargetTransformInfo::getInstructionThroughput here to easily add support for all TargetTransformInfo::ShuffleKind without mass code duplication, I've added a TODO for now but D48236 should help us here.

Differential Revision: https://reviews.llvm.org/D48023

llvm-svn: 334958
2018-06-18 16:25:01 +00:00
Florian Hahn 63cbcf98a5 [VPlanRecipeBase] Add eraseFromParent().
Reviewers: dcaballe, hsaito, mkuper, hfinkel

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D48081

llvm-svn: 334951
2018-06-18 15:18:48 +00:00
Sander de Smalen 13684d8400 [AArch64][SVE] Asm: Support for saturating INC/DEC (64bit scalar) instructions.
Summary:
The variants added by this patch are:
- SQINC  (signed increment)
- UQINC  (unsigned increment)
- SQDEC  (signed decrement)
- UQDEC  (unsigned decrement)

For example:
  uqincw  x0, all, mul #4

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Differential Revision: https://reviews.llvm.org/D47715

llvm-svn: 334948
2018-06-18 14:47:52 +00:00
Simon Pilgrim 9173c97ce4 [X86][BtVer2] Flag AVX2+ scheduler classes as unsupported
Jaguar only supports up to AVX1

Differential Revision: https://reviews.llvm.org/D48274

llvm-svn: 334947
2018-06-18 14:31:14 +00:00
Florian Hahn 3bcff3662c [VPlan] Fix sanitizer problem with insertBefore.
llvm-svn: 334943
2018-06-18 13:51:28 +00:00
Simon Pilgrim 99a5832016 [SLPVectorizer] Avoid calling const VL.size() repeatedly in for-loop. NFCI.
llvm-svn: 334934
2018-06-18 11:35:36 +00:00
Florian Hahn 7591e4e94a [VPlanRecipeBase] Add insertBefore helper.
Reviewers: dcaballe, mkuper, hfinkel, hsaito, mssimpso

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D48080

llvm-svn: 334933
2018-06-18 11:34:17 +00:00
Sander de Smalen d521c4353e [AArch64][SVE] Asm: Support for vector element compares.
This patch adds instructions for comparing elements from two vectors, e.g.
  cmpgt p0.s, p0/z, z0.s, z1.s

and also adds support for comparing to a 64-bit wide element vector, e.g.
  cmpgt p0.s, p0/z, z0.s, z1.d

The patch also contains aliases for certain comparisons, e.g.:
  cmple p0.s, p0/z, z0.s, z1.s => cmpge p0.s, p0/z, z1.s, z0.s
  cmplo p0.s, p0/z, z0.s, z1.s => cmphi p0.s, p0/z, z1.s, z0.s
  cmpls p0.s, p0/z, z0.s, z1.s => cmphs p0.s, p0/z, z1.s, z0.s
  cmplt p0.s, p0/z, z0.s, z1.s => cmpgt p0.s, p0/z, z1.s, z0.s

llvm-svn: 334931
2018-06-18 10:59:19 +00:00
Clement Courbet 0d9da88d18 [X86] Fix NOOP sched overrides on BDW/HSW/SKL.
Summary: Noop certainly does not use resources.

Reviewers: RKSimon, craig.topper, andreadb

Subscribers: gbedwell, llvm-commits, gchatelet

Differential Revision: https://reviews.llvm.org/D48028

llvm-svn: 334927
2018-06-18 06:48:22 +00:00
Craig Topper f0ab7bd196 [X86] Create X86InstrFMA3Group objects fully in a static table instead of on the heap. NFCI
Previously we heap allocated the X86InstrFMA3Group objects which were created by passing them small register/memory opcode arrays that existed as individual static tables.

Rather than a bunch of small static arrays we now have one large static table of X86InstrFMA3Group objects. Rather than storing a pointer to the opcode arrays in the X86InstrFMA3Group object, we now store have a register and memory array as part of the object. If a group doesn't have memory or register opcodes, the array entries will be 0.

This greatly simplifies the destruction of the X86InstrFMA3Info object. We no longer need to delete the X86InstrFMA3Group objects as we destruct the DenseMap. And we don't need to keep track of which ones we already deleted.

This reduces the llc binary size on my local machine by ~50k. I can only assume that's really due to the fact that we had something like 512 small static arrays that we passed to the init functions either one at a time or in pairs. So there were between 256 and 512 distinct calls to the init functions in the initOnceImpl method.

llvm-svn: 334925
2018-06-18 06:32:22 +00:00
Craig Topper 16fdde5e63 [X86] Add '.s' aliases to the assembler for the various redundant move encodings to match gas and our EVEX instructions.
We already have these aliases for EVEX enocded instructions, but not for the GPR, MMX, SSE, and VEX versions.

Also remove the vpextrw.s EVEX alias. That's not something gas implements.

llvm-svn: 334922
2018-06-18 05:00:50 +00:00
Craig Topper 916d0cf649 [X86] Move the 'vmovq.s' and similar assembly strings for EVEX vector moves with reversed operands to InstAliases.
The .s assembly strings allow the reversed forms to be targeted from assembly which matches gas behavior. But when printing the instructions we should print them without the .s to match other tooling like objdump. By using InstAliases we can use the normal string in the instruction and just hide it from the assembly parser.

Ideally we'd add the .s versions to the legacy SSE and VEX versions as well for full compatibility with gas. Not sure how we got to state where only EVEX was supported.

llvm-svn: 334920
2018-06-18 01:28:05 +00:00
Lang Hames 0705ee8dc2 [ORC] Remove redundant condition
llvm-svn: 334918
2018-06-17 23:54:58 +00:00
Lang Hames a5247cc5c7 [ORC] Only notify queries that they are resolved/ready when the query state
changes.

This guards against redundant notifications.

llvm-svn: 334916
2018-06-17 18:59:01 +00:00
Craig Topper 9fe45d846e [X86] Add all the FMA instructions direclty to the load folding table instead of proxying through X86InstrFMA3Info.
These increases the size of the static tables, but is closer to what we would get if used the autogenerated table directly. This reduces the remaining large deltas between what's in the manual table and what's in the autogenerated table.

llvm-svn: 334915
2018-06-17 18:00:16 +00:00
Lang Hames cd018a4467 [ORC] Suppress an unused variable warning for a debug-mode only use.
llvm-svn: 334911
2018-06-17 17:18:12 +00:00
Lang Hames df5776b1dc [ORC] Erase empty dependence sets when adding new symbol dependencies.
llvm-svn: 334910
2018-06-17 16:59:53 +00:00
Lang Hames 11adecfb2c [ORC] In MaterializationResponsibility, only maintain the Materializing flag on
symbols in debug mode.

The MaterializationResponsibility class hijacks the Materializing flag to track
symbols that have not yet been resolved in order to guard against redundant
resolution. Since this is an API contract check and only enforced in debug mode
there is no reason to maintain the flag state in release mode.

llvm-svn: 334909
2018-06-17 16:59:52 +00:00
Craig Topper b0e986f88e [X86] Pass the parent SDNode to X86DAGToDAGISel::selectScalarSSELoad to simplify the hasSingleUseFromRoot handling.
Some of the calls to hasSingleUseFromRoot were passing the load itself. If the load's chain result has a user this would count against that. By getting the true parent of the match and ensuring any intermediate between the match and the load have a single use we can avoid this case. isLegalToFold will take care of checking users of the load's data output.

This fixed at least fma-scalar-memfold.ll to succed without the peephole pass.

llvm-svn: 334908
2018-06-17 16:29:46 +00:00
Sander de Smalen 279b7e74e7 [AArch64][SVE] Asm: Support for bitwise operations on predicate vectors.
This patch adds support for instructions performing bitwise operations
on predicate vectors, including AND, BIC, EOR, NAND, NOR, ORN, ORR, and
their status flag setting variants ANDS, BICS, EORS, NANDS, ORNS, ORRS.

This patch also adds several aliases:

  orr  p0.b, p1/z, p1.b, p1.b  => mov  p0.b, p1.b
  orrs p0.b, p1/z, p1.b, p1.b  => movs p0.b, p1.b

  and  p0.b, p1/z, p2.b, p2.b  => mov  p0.b, p1/z, p2.b
  ands p0.b, p1/z, p2.b, p2.b  => movs p0.b, p1/z, p2.b

  eor  p0.b, p1/z, p2.b, p1.b  => not  p0.b, p1/z, p2.b
  eors p0.b, p1/z, p2.b, p1.b  => nots p0.b, p1/z, p2.b

llvm-svn: 334906
2018-06-17 10:48:21 +00:00
Sander de Smalen 2c25b4cd36 [AArch64][SVE] Asm: Support for SEL (vector/predicate) instructions.
Support for SVE's predicated select instructions to select elements
from either vector, both in a data-vector and a predicate-vector
variant.

llvm-svn: 334905
2018-06-17 10:11:04 +00:00
Jonas Hahnfeld c7410ed47a [NVPTX] Ignore target-cpu and -features for inlining
We don't want to prevent inlining because of target-cpu and -features
attributes that were added to newer versions of LLVM/Clang: There are
no incompatible functions in PTX, ptxas will throw errors in such cases.

Differential Revision: https://reviews.llvm.org/D47691

llvm-svn: 334904
2018-06-17 09:55:20 +00:00
Heejin Ahn 9786946731 [WebAssembly] Simple comment fix. NFC.
llvm-svn: 334899
2018-06-17 00:37:56 +00:00
Craig Topper 29f22d7baa [X86] More additions to the load folding tables based on the autogenerated tables.
Including more additions for NotMemoryFoldable to remove some entries from the autogenerated table.

llvm-svn: 334898
2018-06-16 23:25:50 +00:00
Craig Topper c435632862 [X86] Hide POP16/32/64rmr and PUSH16/32/64rmr instructions from the assembly parser.
These all have a short form encoding that the assembler already prefers. Though that preference seems to only be based on order in the .td fie. Hiding the long form saves space in the table and prevents us from breaking the implicit order based priority.

llvm-svn: 334897
2018-06-16 23:25:48 +00:00
Craig Topper 74412c7d59 [X86] Fix an inconsistency between AVX512 and AVX/SSE version on a couple instructions.
VMOVPQIto64Zmr is not a 64-bit mode only instruction. But I don't know how to test this because VMOVPQIto64mr should always have priority over it in 32-bit mode since its only advantage is XMM16-XMM31 which aren't usable in 32-bit mode.

VMOVPQIto64Zrr is a 64-bit mode only instruction, but we don't need to explicitly mark it as such because it uses a GR64 register which won't parse in 32-bit mode.

llvm-svn: 334896
2018-06-16 23:25:47 +00:00
Michael Zolotukhin 158a7c3323 CorrelatedValuePropagation: Preserve DT.
Summary:
We only modify CFG in a couple of places, and we can preserve DT there
with a little effort.

Reviewers: davide, vsk

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48059

llvm-svn: 334895
2018-06-16 18:57:31 +00:00
Benjamin Kramer 1193bbf6b7 Fix namespaces. No functionality change.
llvm-svn: 334890
2018-06-16 13:37:52 +00:00
Stanislav Mekhanoshin 3b11794dbf [AMDGPU] setcc (select cc, CT, CF), CF, eq | ne -> xor cc, -1 | cc
This is the common case in the BE when we serialize condition and then
rematerialize it. Use either original or inverted condition.

Differential Revision: https://reviews.llvm.org/D48246

llvm-svn: 334882
2018-06-16 03:46:59 +00:00
Nirav Dave d4ff2f8a74 Avoid needing to walk out legalization tables. NFCI.
Relanding after fixing expensive check from modifying tables.

To avoid redundant work, during DAG legalization we keep tables
mapping pre-legalized SDValues to post-legalized SDValues and a
SDValue-to-SDValue map to enable fast node replacements. However, as
the keys are nodes which may be reused it is possible that an entry in
a table refers to a now deleted node N (that should have been renamed
by the value replacement map) while a new node N' exists. If N' is
then replaced that entry would be wrong. Previously we avoided this by
when potentially violating this property, walking every table and
updating all node pointers. This is very expensive but hopefully rare
occurance.

This patch assigns each instance of a SDValue used in legalization a
unique id and uses these ids in the legalization tables. This avoids
any such aliasing issue, avoiding the full table search and allowing
more aggressive incremental table pruning.

In some cases this is a 1000x speedup to compilation.

Reviewers: jyknight, echristo, bogner, tra

Reviewed By: bogner

Subscribers: dberris, grandinj, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D47959

llvm-svn: 334880
2018-06-16 02:51:29 +00:00
Justin Lebar 3f5490af21 Revert "[SCEV] Use LLVM_MARK_AS_BITMASK_ENUM in SCEV." -- breaks MSVC builds.
This reverts D48237.

llvm-svn: 334878
2018-06-16 00:14:10 +00:00
Justin Lebar 018c3790f9 Revert "[SCEV] Simplify some flags expressions." -- dependent revision breaks MSVC builds.
This reverts D48238.

llvm-svn: 334877
2018-06-16 00:13:57 +00:00
Michael Berg 8e570c3390 Utilize new SDNode flag functionality to expand current support for fma
Summary: This patch originated from D47388 and is a proper subset of the originating changes, containing only the fmf optimization guard extensions.

Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar, rampitec, nhaehnle, nemanjai

Reviewed By: rampitec, nhaehnle

Subscribers: tpr, nemanjai, wdng

Differential Revision: https://reviews.llvm.org/D47918

llvm-svn: 334876
2018-06-16 00:03:06 +00:00
Justin Lebar af30bb1c90 [SCEV] Simplify some flags expressions.
Summary:
Sending for presubmit review out of an abundance of caution; it would be
bad to mess this up.

Reviewers: sanjoy

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48238

llvm-svn: 334875
2018-06-15 23:52:11 +00:00
Justin Lebar 6cb702d00d [SCEV] Use LLVM_MARK_AS_BITMASK_ENUM in SCEV.
Summary:
Obviates the need for mask/clear/setFlags helpers.

There are some expressions here which can be simplified, but to keep
this easy to review, I have not simplified them in this patch.

No functional change.

Reviewers: sanjoy

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48237

llvm-svn: 334874
2018-06-15 23:51:57 +00:00
Daniel Sanders 8ead1290e6 [globalisel][tablegen] Add support for C++ predicates on PatFrags and use it to support BFC on ARM.
So far, we've only handled special cases of PatFrag like ImmLeaf. This patch
adds support for the remaining cases using similar mechanisms.

Like most C++ code from SelectionDAG, GISel and DAGISel expect to operate on
different types and representations and as such the code is not compatible
between the two. It's therefore necessary to add an alternative implementation
in the GISelPredicateCode field.

The target test for this feature could easily be done with IntImmLeaf and this
would save on a little boilerplate. The reason I've chosen to implement this
using PatFrag.GISelPredicateCode and not IntImmLeaf is because I was unable to
find a rule that was blocked solely by lack of support for PatFrag predicates. I
found that the ones I investigated as being likely candidates for the test
were further blocked by other things.

llvm-svn: 334871
2018-06-15 23:13:43 +00:00
Francis Visoiu Mistrih dc705a6a89 Revert r334729 "[DAG] Avoid needing to walk out legalization tables. NFCI."
This reverts commit r334729.

llvm-svn: 334869
2018-06-15 23:05:41 +00:00
Francis Visoiu Mistrih 1c9df30eca Revert r334731 "Avoid unused variable in non-assert builds."
This reverts commit r334731.

It breaks EXPENSIVE_CHECKS bots.

llvm-svn: 334868
2018-06-15 23:05:40 +00:00
Craig Topper d00e375310 [X86] Add more instructions to the hasUndefRegUpdate list.
Not sure any of these matter today because I don't think we ever produce them with IMPLICIT_DEF as an input. But by listing them we don't be suprised in the future.

llvm-svn: 334867
2018-06-15 22:25:04 +00:00
Benjamin Kramer 7f68a309ac [BPI] Remove unnecessary std::list
vector is sufficient here. No functionality change intended.

llvm-svn: 334865
2018-06-15 21:06:43 +00:00
Cameron McInally 7caac670b2 [FPEnv] Expand constrained FP POWI
Modify ExpandStrictFPOp(...) to handle nodes that have scalar
operands. 

Also, add a Strict FMA test and do some other light cleanup in the
Strict FP code.

Differential Revision: https://reviews.llvm.org/D48149

llvm-svn: 334863
2018-06-15 20:57:55 +00:00
Michael Berg 02d1c6c0cf Utilize new SDNode flag functionality to expand current support for fdiv
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed.

Reviewers: spatel, hfinkel, wristow, arsenm

Reviewed By: spatel

Subscribers: wdng, nhaehnle

Differential Revision: https://reviews.llvm.org/D47954

llvm-svn: 334862
2018-06-15 20:44:55 +00:00
Matt Morehouse 0ea9a90b3d [SanitizerCoverage] Add associated metadata to pc-tables.
Summary:
Using associated metadata rather than llvm.used allows linkers to
perform dead stripping with -fsanitize-coverage=pc-table.  Unfortunately
in my local tests, LLD was the only linker that made use of this metadata.

Partially addresses https://bugs.llvm.org/show_bug.cgi?id=34636 and fixes
https://github.com/google/sanitizers/issues/971.

Reviewers: eugenis

Reviewed By: eugenis

Subscribers: Dor1s, hiraditya, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D48203

llvm-svn: 334858
2018-06-15 20:12:58 +00:00
Sean Fertile cac28aeb3f [PowerPC] Add support for high and higha symbol modifiers on tls modifers.
Enables using the high and high-adjusted symbol modifiers on thread local
storage modifers in powerpc assembly. Needed to be able to support 64 bit
thread-pointer and dynamic-thread-pointer access sequences.

Differential Revision: https://reviews.llvm.org/D47754

llvm-svn: 334856
2018-06-15 19:47:16 +00:00
Sean Fertile 80b8f82f17 [PPC64] Support "symbol@high" and "symbol@higha" symbol modifers.
Add support for the "@high" and "@higha" symbol modifiers in powerpc64 assembly.
The modifiers represent accessing the segment consiting of bits 16-31 of a
64-bit address/offset.

Differential Revision: https://reviews.llvm.org/D47729

llvm-svn: 334855
2018-06-15 19:47:11 +00:00
Tomasz Krupa bcaab53d47 [X86] Lowering sqrt intrinsics to native IR
Summary: Complementary patch to lowering sqrt intrinsics in Clang.

Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k

Reviewed By: craig.topper

Subscribers: tkrupa, mike.dvoretsky, llvm-commits

Differential Revision: https://reviews.llvm.org/D41599

llvm-svn: 334849
2018-06-15 18:05:24 +00:00
Craig Topper 1657b7b8d2 [X86] Prevent folding stack reloads into instructions in hasUndefRegUpdate.
An earlier commit prevented folds from the peephole pass by checking for IMPLICIT_DEF. But later in the pipeline IMPLICIT_DEF just becomes and Undef flag on the input register so we need to check for that case too.

llvm-svn: 334848
2018-06-15 17:56:17 +00:00
Krzysztof Parzyszek 1a70426ac1 Remove <undef> from rematerialized full register
When coalescing a small register into a subregister of a larger register,
if the larger register is rematerialized, the function updateRegDefUses
can add an <undef> flag to the rematerialized definition (since it's
treating it as only definining the coalesced subregister). While with that
assumption doing so is not incorrect, make sure to remove the flag later
on after the call to updateRegDefUses.

llvm-svn: 334845
2018-06-15 16:58:22 +00:00
Joseph Tremoulet 6f406d4f02 [InstCombine] Avoid iteration/mutation conflict
Summary:
When iterating users of a multiply in processUMulZExtIdiom, the
call to setOperand in the truncation case may replace the use
being visited; make sure the iterator has been advanced before
doing that replacement.

Reviewers: majnemer, davide

Reviewed By: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48192

llvm-svn: 334844
2018-06-15 16:52:40 +00:00
Sander de Smalen a6edca72ba [AArch64][SVE] Asm: Support for CPY SIMD/FP and GPR instructions.
Predicated splat/copy of SIMD/FP register or general purpose
register to SVE vector, along with MOV-aliases.

llvm-svn: 334842
2018-06-15 16:39:46 +00:00
Jordan Rose 7e535bc4bf Avoid copying PrettyStackTrace messages an extra time on Apple OSs
We were unnecessarily going from SmallString to std::string just to
get a null-terminated C string. So just...don't do that. Crash
slightly faster!

llvm-svn: 334841
2018-06-15 16:35:31 +00:00
Diego Caballero 68795245cf [LV] Prevent LV to run cost model twice for VF=2
This is a minor fix for LV cost model, where the cost for VF=2 was
computed twice when the vectorization of the loop was forced without
specifying a VF.

Reviewers: xusx595, hsaito, fhahn, mkuper

Reviewed By: hsaito, xusx595

Differential Revision: https://reviews.llvm.org/D48048

llvm-svn: 334840
2018-06-15 16:21:35 +00:00
Sander de Smalen 18ac8f9f25 [AArch64][SVE] Asm: Support for INC/DEC (scalar) instructions.
Increment/decrement scalar register by (scaled) element count given by
predicate pattern, e.g. 'incw x0, all, mul #4'.

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47713

llvm-svn: 334838
2018-06-15 15:47:44 +00:00
Matt Arsenault 63bc0e3cb9 AMDGPU: Add combine for short vector extract_vector_elts
Try to access pieces 4 bytes at a time. This helps
various hasOneUse extract_vector_elt combines, such
as load width reductions.

Avoids test regressions in a future commit.

llvm-svn: 334836
2018-06-15 15:31:36 +00:00
Matt Arsenault 02dc7e19e2 AMDGPU: Make v4i16/v4f16 legal
Some image loads return these, and it's awkward working
around them not being legal.

llvm-svn: 334835
2018-06-15 15:15:46 +00:00
Sander de Smalen 5eb51d7495 [AArch64][SVE] Asm: Support for FADD, FMUL and FMAX immediate instructions.
Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: javed.absar

Differential Revision: https://reviews.llvm.org/D47712

llvm-svn: 334831
2018-06-15 13:57:51 +00:00
Bjorn Pettersson 428caf988b Re-apply "[DebugInfo] Check size of variable in ConvertDebugDeclareToDebugValue"
This is r334704 (which was reverted in r334732) with a fix for
types like x86_fp80. We need to use getTypeAllocSizeInBits and
not getTypeStoreSizeInBits to avoid dropping debug info for
such types.

Original commit msg:
> Summary:
> Do not convert a DbgDeclare to DbgValue if the store
> instruction only refer to a fragment of the variable
> described by the DbgDeclare.
>
> Problem was seen when for example having an alloca for an
> array or struct, and there were stores to individual elements.
> In the past we inserted a DbgValue intrinsics for each store,
> just as if the store wrote the whole variable.
>
> When handling store instructions we insert a DbgValue that
> indicates that the variable is "undefined", as we do not know
> which part of the variable that is updated by the store.
>
> When ConvertDebugDeclareToDebugValue is used with a load/phi
> instruction we assert that the referenced value is large enough
> to cover the whole variable. Afaict this should be true for all
> scenarios where those methods are used on trunk. If the assert
> blows in the future I guess we could simply skip to insert a
> dbg.value instruction.
>
> In the future I think we should examine which part of the variable
> that is accessed, and add a DbgValue instrinsic with an appropriate
> DW_OP_LLVM_fragment expression.
>
> Reviewers: dblaikie, aprantl, rnk
>
> Reviewed By: aprantl
>
> Subscribers: JDevlieghere, llvm-commits
>
> Tags: #debug-info
>
> Differential Revision: https://reviews.llvm.org/D48024

llvm-svn: 334830
2018-06-15 13:48:55 +00:00
Simon Dardis 98b9849d34 [mips] Add licensing information of the microMIPS tablegen files. (NFC)
llvm-svn: 334827
2018-06-15 13:29:35 +00:00
Sander de Smalen 3cbf171479 [AArch64][SVE] Asm: Add parsing/printing support for exact FP immediates.
Some instructions require of a limited set of FP immediates as operands,
for example '#0.5 or #1.0' for SVE's FADD instruction.

This patch adds support for parsing and printing such FP immediates as
exact values (e.g. #0.499999 is not accepted for #0.5).

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47711

llvm-svn: 334826
2018-06-15 13:11:49 +00:00
Matt Arsenault df2f4ef29d DAG: Fix creating concat_vectors with illegal type
Test passes as is, but fails with future patch to make v4i16/v4f16
legal.

llvm-svn: 334823
2018-06-15 12:09:15 +00:00
Roman Lebedev 84c11aed10 [InstCombine] Recommit: Fold (x << y) >> y -> x & (-1 >> y)
Summary:
We already do it for splat constants, but not just values.
Also, undef cases are mostly non-functional.

The original commit was reverted because
it broke tests for amdgpu backend, which i didn't check.
Now, the backed was updated to recognize these new
patterns, so we are good.

https://bugs.llvm.org/show_bug.cgi?id=37603
https://rise4fun.com/Alive/cplX

Reviewers: spatel, craig.topper, mareko, bogner, rampitec, nhaehnle, arsenm

Reviewed By: spatel, rampitec, nhaehnle

Subscribers: wdng, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D47980

llvm-svn: 334818
2018-06-15 09:56:52 +00:00
Roman Lebedev dec562c849 [AMDGPU] Recognize x & ~(-1 << y) pattern.
Summary: The same pattern as D48010, but this one is IR-canonical as of D47428.

Reviewers: nhaehnle, bogner, tstellar, arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #amdgpu

Differential Revision: https://reviews.llvm.org/D48012

llvm-svn: 334817
2018-06-15 09:56:45 +00:00
Roman Lebedev 9c17dad8f2 [AMDGPU] Recognize x & ((1 << y) - 1) pattern.
Summary:
As a followup for D48007.

Since we already handle `x << (bitwidth - y) >> (bitwidth - y)` pattern,
which does not have ub for both the edge cases (`y == 0`, `y == bitwidth`),
i think also handling a pattern that is ub for `y == bitwidth` should be fine.

Reviewers: nhaehnle, bogner, tstellar, arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #amdgpu

Differential Revision: https://reviews.llvm.org/D48010

llvm-svn: 334816
2018-06-15 09:56:39 +00:00
Roman Lebedev aa8587d1fc [AMDGPU] Recognize x & (-1 >> (32 - y)) pattern.
Summary:
D47980 will canonicalize the `x << (32 - y) >> (32 - y)`,
which is the pattern the AMDGPU expects to `x &  (-1 >> (32 - y))`,
which is not recognized by AMDGPU.

Thus, it needs to be recognized, too.

Reviewers: nhaehnle, bogner, tstellar, arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #amdgpu

Differential Revision: https://reviews.llvm.org/D48007

llvm-svn: 334815
2018-06-15 09:56:31 +00:00
Peter Smith 1503fc0fd0 [MC] Move bundling and MCSubtargetInfo to MCEncodedFragment [NFC]
Instruction bundling is only supported on descendants of the
MCEncodedFragment type. By moving the bundling functionality and
MCSubtargetInfo to this class it makes it easier to set and extract the
MCSubtargetInfo when it is necessary.

This is a refactoring change that will make it easier to pass the
MCSubtargetInfo through to writeNops when nop padding is required.

Differential Revision: https://reviews.llvm.org/D45959

llvm-svn: 334814
2018-06-15 09:48:18 +00:00
Craig Topper c8a763ed84 Revert r334802 "[X86] Prevent folding stack reloads with instructions that have an undefined register update."
There's a typo causing the build to fail.

llvm-svn: 334803
2018-06-15 06:15:26 +00:00
Craig Topper 5ec210cc27 [X86] Prevent folding stack reloads with instructions that have an undefined register update.
We want to keep the load unfolded so we can use the same register for both sources to avoid a false dependency.

llvm-svn: 334802
2018-06-15 06:11:36 +00:00
Craig Topper 3c4cc01226 [X86] Add more instructions to the memory folding tables using the autogenerated table as a guide.
I think this covers most of the unmasked vector instructions. We're still missing a lot of the masked instructions.

There are some test changes here because of the new folding support. I don't think these particular cases should be folded because it creates an undef register dependency. I think the changes introduced in r334175 are not handling stack folding. They're only blocking the peephole pass.

llvm-svn: 334800
2018-06-15 05:49:19 +00:00
Craig Topper f43807dd89 [X86] Add 'Z' to the internal names of various EVEX instructions for overall consistency.
llvm-svn: 334785
2018-06-15 04:42:54 +00:00
Andrew Kaylor 36bb0ad078 Add debug info for OProfile profiling support
Patch by Gaetano Priori

Differential Revision: https://reviews.llvm.org/D47925

llvm-svn: 334782
2018-06-15 00:07:28 +00:00
Eli Friedman 3f1ce093ea Make uitofp and sitofp defined on overflow.
IEEE 754 defines the expected result on overflow. As far as I know,
hardware implementations (of f16), and compiler-rt (__floatuntisf)
correctly return +-Inf on overflow. And I can't think of any useful
transform that would take advantage of overflow being undefined here.

Differential Revision: https://reviews.llvm.org/D47807

llvm-svn: 334777
2018-06-14 22:58:48 +00:00
Lang Hames 5d6c509944 [ORC] Strip weak flags from a symbol once it is selected for materialization.
Once a symbol has been selected for materialization it can no longer be
overridden. Stripping the weak flag guarantees this (override attempts will
then be treated as duplicate definitions and result in a DuplicateDefinition
error).

llvm-svn: 334771
2018-06-14 21:16:29 +00:00
Michael Berg 0c20447a02 easing the constraint for isNegatibleForFree and GetNegatedExpression
Summary:
Here we relax the old constraint which utilized unsafe with the TargetOption flag HonorSignDependentRoundingFPMathOption, with the assertion that unsafe is no longer needed or never was required for correctness on FDIV/FMUL.  



Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar

Reviewed By: spatel

Subscribers: efriedma, wdng, tpr

Differential Revision: https://reviews.llvm.org/D48057

llvm-svn: 334769
2018-06-14 20:54:13 +00:00
George Burgess IV aa283d80fe [MSSA] Print more optimization information
In particular, when asked to print a MemoryAccess, we'll now print where
defs are optimized to, and we'll print optimized access types.

This patch also introduces an operator<< to make printing AliasResults
easier.

Patch by Juneyoung Lee!

Differential Revision: https://reviews.llvm.org/D47860

llvm-svn: 334760
2018-06-14 19:55:53 +00:00
Sanjay Patel f85ca6abee [x86] be more selective about converting 'and' to shuffle (PR37749)
isVectorClearMaskLegal() is the TLI hook used by the generic
DAGCombiner::XformToShuffleWithZero().

We've grown to accomodate/expect this transform to shuffle
(disabling it more generally results in many regressions).
So I'm narrowly excluding the 256-bit types that clearly 
are not worthwhile for AVX1. 

I think in most cases we are able to recover by converting 
the shuffle back into 'and' ops, but the cases in:
https://bugs.llvm.org/show_bug.cgi?id=37749
...show that there are cracks.

llvm-svn: 334759
2018-06-14 19:55:02 +00:00
Craig Topper bfa94d5086 [X86] Fix stale comment in folding tables.
llvm-svn: 334758
2018-06-14 19:28:31 +00:00
Tom Stellard a92847359a AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.cvt.pkrtz
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45907

llvm-svn: 334757
2018-06-14 19:26:37 +00:00
Justin Bogner 3b83edb037 Re-apply "[VirtRegRewriter] Avoid clobbering registers when expanding copy bundles"
This is r334750 (which was reverted in r334754) with a fix for an
uninitialized variable that was caught by msan.

Original commit message:
> If a copy bundle happens to involve overlapping registers, we can end
> up with emitting the copies in an order that ends up clobbering some
> of the subregisters. Since instructions in the copy bundle
> semantically happen at the same time, this is incorrect and we need to
> make sure we order the copies such that this doesn't happen.

llvm-svn: 334756
2018-06-14 19:24:03 +00:00
Justin Bogner 36c7f40f20 Revert "[VirtRegRewriter] Avoid clobbering registers when expanding copy bundles"
There's an msan failure:

  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/19549

This reverts r334750.

llvm-svn: 334754
2018-06-14 19:10:57 +00:00
Michael Berg 4663ceb63f updating isNegatibleForFree and GetNegatedExpression with fmf for fadd
Summary:  A FMF constraint is added to FADD with unsafe still available as the fallback

Reviewers: spatel, wristow, arsenm, hfinkel

Reviewed By: spatel

Subscribers: wdng

Differential Revision: https://reviews.llvm.org/D48180

llvm-svn: 334753
2018-06-14 18:48:31 +00:00
Sam Clegg 277f898a4d [WebAssembly] Ignore explicit section names for functions
WebAssembly doesn't support more than one function per section
and we rely on function sections being unique. This change ignores
the section provided by the function to avoid two functions being
in the same section.

Without this change the object writer produces the following
error for this test:
 LLVM ERROR: section already has a defining function: baz

Differential Revision: https://reviews.llvm.org/D48178

llvm-svn: 334752
2018-06-14 18:48:19 +00:00
Justin Bogner 866d9f02be [VirtRegRewriter] Avoid clobbering registers when expanding copy bundles
If a copy bundle happens to involve overlapping registers, we can end
up with emitting the copies in an order that ends up clobbering some
of the subregisters. Since instructions in the copy bundle
semantically happen at the same time, this is incorrect and we need to
make sure we order the copies such that this doesn't happen.

Differential Revision: https://reviews.llvm.org/D48154

llvm-svn: 334750
2018-06-14 18:32:55 +00:00
Justin Lebar bdb0a58c91 [SCEV] Fix a variable name, NFC.
llvm-svn: 334738
2018-06-14 17:14:01 +00:00
Justin Lebar fe455464eb [SCEV] Simplify zext/trunc idiom that appears when handling bitmasks.
Summary:
Specifically, we transform

  zext(2^K * (trunc X to iN)) to iM ->
  2^K * (zext(trunc X to i{N-K}) to iM)<nuw>

This is helpful because pulling the 2^K out of the zext allows further
optimizations.

Reviewers: sanjoy

Subscribers: hiraditya, llvm-commits, timshen

Differential Revision: https://reviews.llvm.org/D48158

llvm-svn: 334737
2018-06-14 17:13:48 +00:00
Justin Lebar b326904dba [SCEV] Simplify trunc-of-add/mul to add/mul-of-trunc under more circumstances.
Summary:
Previously we would do this simplification only if it did not introduce
any new truncs (excepting new truncs which replace other cast ops).

This change weakens this condition: If the number of truncs stays the
same, but we're able to transform trunc(X + Y) to X + trunc(Y), that's
still simpler, and it may open up additional transformations.

While we're here, also clean up some duplicated code.

Reviewers: sanjoy

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48160

llvm-svn: 334736
2018-06-14 17:13:35 +00:00
Justin Lebar 62a0747926 [SCEV] Fix indentation and combine two if statements in getMulExpr, NFC.
llvm-svn: 334735
2018-06-14 17:13:22 +00:00
Sam Clegg c0dba0af01 Revert "[MC] Factor MCObjectStreamer::addFragmentAtoms out of MachO streamer."
This reverts rL331412.  We didn't up using fragment atoms
in the wasm object writer after all.

Differential Revision: https://reviews.llvm.org/D48173

llvm-svn: 334734
2018-06-14 17:11:19 +00:00
Bjorn Pettersson 972fd1c9e7 Revert rL334704: "[DebugInfo] Check size of variable in ConvertDebugDeclareToDebugValue"
This reverts commit r334704.

Buildbots detected an assertion in "test tsan in debug compiler-rt build".

llvm-svn: 334732
2018-06-14 16:08:22 +00:00
Nirav Dave 41e69a8e8b Avoid unused variable in non-assert builds.
llvm-svn: 334731
2018-06-14 15:55:15 +00:00
Nirav Dave a1ee983a95 [DAG] Avoid needing to walk out legalization tables. NFCI.
To avoid redundant work, during DAG legalization we keep tables
mapping pre-legalized SDValues to post-legalized SDValues and a
SDValue-to-SDValue map to enable fast node replacements. However, as
the keys are nodes which may be reused it is possible that an entry in
a table refers to a now deleted node N (that should have been renamed
by the value replacement map) while a new node N' exists. If N' is
then replaced that entry would be wrong. Previously we avoided this by
when potentially violating this property, walking every table and
updating all node pointers. This is very expensive but hopefully rare
occurance.

This patch assigns each instance of a SDValue used in legalization a
unique id and uses these ids in the legalization tables. This avoids
any such aliasing issue, avoiding the full table search and allowing
more aggressive incremental table pruning.

In some cases this is a 1000x speedup to compilation.

Reviewers: jyknight, echristo, bogner, tra

Reviewed By: bogner

Subscribers: dberris, grandinj, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D47959

llvm-svn: 334729
2018-06-14 15:46:23 +00:00
Craig Topper 3ffeb41f6b [X86] Add more vector instructions to the memory folding table using the autogenerated table as a guide.
The test cahnge is because we now fold stack reload into RNDSCALE and RNDSCALE can be turned into ROUND by EVEX->VEX.

llvm-svn: 334728
2018-06-14 15:40:31 +00:00
Craig Topper 82fa048371 [X86] Remove '128' from the internal name of some scalar FP instructions to be consistent with other scalar instructions.
llvm-svn: 334727
2018-06-14 15:40:30 +00:00
Craig Topper b0742bf30d [X86] Disable load unfolding for a bunch of instruction where unfolding would increase the size of the load.
Found by an audit of the manual table vs the autogenerated table.

llvm-svn: 334726
2018-06-14 15:40:29 +00:00
Craig Topper 9f829f76e8 [X86] Remove NotMemoryFoldable from some AVX/AVX512 scalar instructions.
Some of these instructions are already in the manual folding table so we should have them in the auto table too.

llvm-svn: 334725
2018-06-14 15:40:27 +00:00
Lang Hames b7788ebb4a [ORC] Filter out self-dependencies in VSO::addDependencies.
llvm-svn: 334724
2018-06-14 15:32:59 +00:00
Lang Hames bd49fb83aa [ORC] Assert that the query argument to VSO::lookup must be non-null.
llvm-svn: 334723
2018-06-14 15:32:59 +00:00
Lang Hames 784fecfe71 [ORC] Add a WaitUntilReady argument to blockingLookup.
If WaitUntilReady is set to true then blockingLookup will return once all
requested symbols are ready. If WaitUntilReady is set to false then
blockingLookup will return as soon as all requested symbols have been
resolved. In the latter case, if any error occurs in finalizing the symbols it
will be reported to the ExecutionSession, rather than returned by
blockingLookup.

llvm-svn: 334722
2018-06-14 15:32:58 +00:00
Lang Hames 03395d2e58 [ORC] Strip the Materializing flag off finalized symbols in VSOs.
Finalized symbols are no longer in the materializing state.

llvm-svn: 334721
2018-06-14 15:32:56 +00:00
Simon Pilgrim dee9c67f24 [EarlyCSE] Fix MSVC build. NFCI.
MSVC doesn't let you assign different lambdas through a ternary operator.

llvm-svn: 334715
2018-06-14 14:22:03 +00:00
Sam Clegg 8c32e913b5 [MC] Move MCAssembler::dump into the correct cpp file. NFC
Differential Revision: https://reviews.llvm.org/D46556

llvm-svn: 334713
2018-06-14 14:04:23 +00:00
Paul Robinson cc7344aae3 [DWARFv5] Tolerate files not all having an MD5 checksum.
In some cases, for example when compiling a preprocessed file, the
front-end is not able to provide an MD5 checksum for all files. When
that happens, omit the MD5 checksums from the final DWARF, because
DWARF doesn't have a way to indicate that some but not all files have
a checksum.

When assembling a .s file, and some but not all .file directives
provide an MD5 checksum, issue a warning and don't emit MD5 into the
DWARF.

Fixes PR37623.

Differential Revision: https://reviews.llvm.org/D48135

llvm-svn: 334710
2018-06-14 13:38:20 +00:00
Simon Dardis 6ad680ab6a [mips] Correct predicates for MSA pseudo instructions
llvm-svn: 334708
2018-06-14 13:03:53 +00:00
Max Kazantsev ff6d1c9188 [EarlyCSE] Propagate conditions of AND and OR instructions
This patches teaches EarlyCSE to figure out that if `and i1 %x, %y` is true then both
`%x` and `%y` are true in the taken branch, and if `or i1 %x, %y` is false then both
`%x` and `%y` are false in non-taken branch. Fix for PR37635.

Differential Revision: https://reviews.llvm.org/D47574
Reviewed By: reames

llvm-svn: 334707
2018-06-14 13:02:13 +00:00
Bjorn Pettersson e406b29c22 [DebugInfo] Check size of variable in ConvertDebugDeclareToDebugValue
Summary:
Do not convert a DbgDeclare to DbgValue if the store
instruction only refer to a fragment of the variable
described by the DbgDeclare.

Problem was seen when for example having an alloca for an
array or struct, and there were stores to individual elements.
In the past we inserted a DbgValue intrinsics for each store,
just as if the store wrote the whole variable.

When handling store instructions we insert a DbgValue that
indicates that the variable is "undefined", as we do not know
which part of the variable that is updated by the store.

When ConvertDebugDeclareToDebugValue is used with a load/phi
instruction we assert that the referenced value is large enough
to cover the whole variable. Afaict this should be true for all
scenarios where those methods are used on trunk. If the assert
blows in the future I guess we could simply skip to insert a
dbg.value instruction.

In the future I think we should examine which part of the variable
that is accessed, and add a DbgValue instrinsic with an appropriate
DW_OP_LLVM_fragment expression.

Reviewers: dblaikie, aprantl, rnk

Reviewed By: aprantl

Subscribers: JDevlieghere, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D48024

llvm-svn: 334704
2018-06-14 11:23:42 +00:00
Simon Pilgrim b234ff136e [SLPVectorizer] Remove RawInstructionsData/getMainOpcode and merge into getSameOpcode
This is part of the work to cleanup use of 'alternate' ops so we can use the more general SK_Select shuffle type.

Only getSameOpcode calls getMainOpcode and much of the logic is repeated in both functions. This will require some reworking of D28907 but that patch has hit trouble and is unlikely to be completed anytime soon.

Differential Revision: https://reviews.llvm.org/D48120

llvm-svn: 334701
2018-06-14 10:25:19 +00:00
Simon Pilgrim c0d53aba7b [CostModel] Cleanup isSingleSourceVectorMask to match other shuffle matchers. NFCI.
llvm-svn: 334699
2018-06-14 09:48:19 +00:00
Simon Pilgrim 32702cc86a [CostModel] Recognise REVERSE shuffle mask if the elements come from the second src
llvm-svn: 334698
2018-06-14 09:35:00 +00:00
Hiroshi Inoue f209649dfc [NFC] fix trivial typos in comments
llvm-svn: 334687
2018-06-14 05:41:49 +00:00
Craig Topper b2552e1e08 [x86] fix mappings of cvttp2si/cvttp2ui x86 intrinsics to x86-specific nodes and isel patterns (PR37551)
Summary:
The tests in:
https://bugs.llvm.org/show_bug.cgi?id=37751
...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes.

This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll and avx512-cvttp2i.ll

Reviewers: RKSimon, gbedwell, spatel

Reviewed By: spatel

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D47993

llvm-svn: 334685
2018-06-14 03:16:58 +00:00
Tom Stellard 46bbbc33c0 AMDGPU/GlobalISel: Implement select() for 32-bit G_FADD and G_FMUL
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46171

llvm-svn: 334665
2018-06-13 22:30:47 +00:00
Zachary Turner 9b8b0794b8 Revert "Enable ThreadPool to queue tasks that return values."
This is failing to compile when LLVM_ENABLE_THREADS is false,
and the fix is not immediately obvious, so reverting while I look
into it.

llvm-svn: 334658
2018-06-13 21:24:19 +00:00
Francis Visoiu Mistrih 03185797d7 Reland: [Timers] Use the pass argument name for JSON keys in time-passes
When using clang --save-stats -mllvm -time-passes, both timers and stats
end up in the same json file.

We could end up with things like:

{
  "asm-printer.EmittedInsts": 1,
  "time.pass.Virtual Register Map.wall": 2.9015541076660156e-04,
  "time.pass.Virtual Register Map.user": 2.0500000000000379e-04,
  "time.pass.Virtual Register Map.sys": 8.5000000000001741e-05,
}

This patch makes use of the pass argument name (if available) in the
JSON key to end up with things like:

{
  "asm-printer.EmittedInsts": 1,
  "time.pass.virtregmap.wall": 2.9015541076660156e-04,
  "time.pass.virtregmap.user": 2.0500000000000379e-04,
  "time.pass.virtregmap.sys": 8.5000000000001741e-05,
}

This also helps avoiding to write another JSON printer to handle all the
cases that we could have in our pass names.

Fixed test instead of adding a new one originally from r334649.

Differential Revision: https://reviews.llvm.org/D48109

llvm-svn: 334657
2018-06-13 21:03:56 +00:00
Reid Kleckner 12395b7795 [WinASan] Don't instrument globals in sections containing '$'
Such globals are very likely to be part of a sorted section array, such
the .CRT sections used for dynamic initialization. The uses its own
sorted sections called ATL$__a, ATL$__m, and ATL$__z. Instead of special
casing them, just look for the dollar sign, which is what invokes linker
section sorting for COFF.

Avoids issues with ASan and the ATL uncovered after we started
instrumenting comdat globals on COFF.

llvm-svn: 334653
2018-06-13 20:47:21 +00:00
Francis Visoiu Mistrih 0c3a7761f3 Revert r334649 "[Timers] Use the pass argument name for JSON keys in time-passes"
This reverts commit r334649.

This breaks a test.

llvm-svn: 334651
2018-06-13 20:44:02 +00:00
Francis Visoiu Mistrih fbd450b052 [Timers] Use the pass argument name for JSON keys in time-passes
When using clang --save-stats -mllvm -time-passes, both timers and stats
end up in the same json file.

We could end up with things like:

{
  "asm-printer.EmittedInsts": 1,
  "time.pass.Virtual Register Map.wall": 2.9015541076660156e-04,
  "time.pass.Virtual Register Map.user": 2.0500000000000379e-04,
  "time.pass.Virtual Register Map.sys": 8.5000000000001741e-05,
}

This patch makes use of the pass argument name (if available) in the
JSON key to end up with things like:

{
  "asm-printer.EmittedInsts": 1,
  "time.pass.virtregmap.wall": 2.9015541076660156e-04,
  "time.pass.virtregmap.user": 2.0500000000000379e-04,
  "time.pass.virtregmap.sys": 8.5000000000001741e-05,
}

This also helps avoiding to write another JSON printer to handle all the
cases that we could have in our pass names.

Differential Revision: https://reviews.llvm.org/D48109

llvm-svn: 334649
2018-06-13 20:09:59 +00:00
Craig Topper f7f663e0a9 [X86] Move RCPSSr_Int, RSQRTSSr_Int, SQRTSDr_Int, SQRTSSr_Int to the correct load folding table.
They were in the operand 1 folding table, but their foldable operand is operand 2.

llvm-svn: 334648
2018-06-13 20:03:42 +00:00
Zachary Turner 1b76a128a8 Enable ThreadPool to support tasks that return values.
Previously ThreadPool could only queue async "jobs", i.e. work
that was done for its side effects and not for its result.  It's
useful occasionally to queue async work that returns a value.
From an API perspective, this is very intuitive.  The previous
API just returned a shared_future<void>, so all we need to do is
make it return a shared_future<T>, where T is the type of value
that the operation returns.

Making this work required a little magic, but ultimately it's not
too bad.  Instead of keeping a shared queue<packaged_task<void()>>
we just keep a shared queue<unique_ptr<TaskBase>>, where TaskBase
is a class with a pure virtual execute() method, then have a
templated derived class that stores a packaged_task<T()>.  Everything
else works out pretty cleanly.

Differential Revision: https://reviews.llvm.org/D48115

llvm-svn: 334643
2018-06-13 19:29:16 +00:00
Stanislav Mekhanoshin 7bec57300c [AMDGPU] Corrected computeKnownBits for V_PERM_B32
Differential Revision: https://reviews.llvm.org/D48133

llvm-svn: 334640
2018-06-13 18:52:54 +00:00
Peter Collingbourne 881ba10465 LTO: Keep file handles open for memory mapped files.
On Windows we've observed that if you open a file, write to it, map it into
memory and close the file handle, the contents of the memory mapping can
sometimes be incorrect. That was what we did when adding an entry to the
ThinLTO cache using the TempFile and MemoryBuffer classes, and it was causing
intermittent build failures on Chromium's ThinLTO bots on Windows. More
details are in the associated Chromium bug (crbug.com/786127).

We can prevent this from happening by keeping a handle to the file open while
the mapping is active. So this patch changes the mapped_file_region class to
duplicate the file handle when mapping the file and close it upon unmapping it.

One gotcha is that the file handle that we keep open must not have been
created with FILE_FLAG_DELETE_ON_CLOSE, as otherwise the operating system
will prevent other processes from opening the file. We can achieve this
by avoiding the use of FILE_FLAG_DELETE_ON_CLOSE altogether.  Instead,
we use SetFileInformationByHandle with FileDispositionInfo to manage the
delete-on-close bit. This lets us remove the hack that we used to use to
clear the delete-on-close bit on a file opened with FILE_FLAG_DELETE_ON_CLOSE.

A downside of using SetFileInformationByHandle/FileDispositionInfo as
opposed to FILE_FLAG_DELETE_ON_CLOSE is that it prevents us from using
CreateFile to open the file while the flag is set, even within the same
process. This doesn't seem to matter for almost every client of TempFile,
except for LockFileManager, which calls sys::fs::create_link to create a
hard link from the lock file, and in the process of doing so tries to open
the file. To prevent this change from breaking LockFileManager I changed it
to stop using TempFile by effectively reverting r318550.

Differential Revision: https://reviews.llvm.org/D48051

llvm-svn: 334630
2018-06-13 18:03:14 +00:00
Yaxun Liu fb17bf60dd [AMDGPU] Change enqueue kernel handle type
Currently the handle type is a global pointer which holds 8 bytes.
We need a larger type which hold 16 bytes, therefore change it
to [i64 x 2].

Differential Revision: https://reviews.llvm.org/D48094

llvm-svn: 334625
2018-06-13 17:31:51 +00:00
Dmitry Preobrazhensky 32c6b5cb70 [AMDGPU][MC] Enabled parsing of relocations on VALU instructions
See bug 37566: https://bugs.llvm.org/show_bug.cgi?id=37566

Reviewers: artem.tamazov, arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D47884

llvm-svn: 334622
2018-06-13 17:02:03 +00:00
Simon Pilgrim 54a138a0c5 [CostModel] Recognise BROADCAST shuffle mask if the elements come from the second src
llvm-svn: 334620
2018-06-13 16:52:02 +00:00
Dmitry Preobrazhensky ffbee7acdc [AMDGPU][MC][GFX8][GFX9] Allow LDS direct reads for BUFFER_LOAD_DWORDX2/X3/X4
See bug 37653: https://bugs.llvm.org/show_bug.cgi?id=37653

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D47885

llvm-svn: 334609
2018-06-13 15:32:46 +00:00
Sanjay Patel 7d4929611c [DAGCombiner] remove hasOneUse() check from fadd constants transform
We're constant folding here, so we shouldn't check uses. This matches
the IR optimizer behavior.

The x86 test shows the expected win. The AArch64 test shows something
else. This only seems to happen if the "generic" AArch64 CPU model is 
used by MachineCombiner, so I'll file a bug report to follow-up.

llvm-svn: 334608
2018-06-13 15:22:48 +00:00
Tom Stellard 264c171f36 AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLowering
Summary:
The code that handles ISD:Register and ISD::CopyFromReg assumes
the target is amdgcn, so this is broken on r600.  We don't
need this analysis on r600 anyway so we can safely move
it to SITargetLowering.

Reviewers: alex-t, arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: msearles, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46298

llvm-svn: 334607
2018-06-13 15:06:37 +00:00
Cameron McInally f37bd01ddc [FPEnv] Expand constrained FP operations
Add a helper function to expand constrained FP operations as needed. 
Note that the Strict POWI operation is not handled in this patch since 
the format is slightly different from the others.

Differential Revision: https://reviews.llvm.org/D47491

llvm-svn: 334603
2018-06-13 14:32:12 +00:00
Hans Wennborg 12ba9ec929 Do not enforce absolute path argv0 in windows
Even if we support no-canonical-prefix on
clang-cl(https://reviews.llvm.org/D47480), argv0 becomes absolute path
in clang-cl and that embeds absolute path in /showIncludes.

This patch removes such full path normalization from InitLLVM on
windows, and that removes absolute path from clang-cl output
(obj/stdout/stderr) when debug flag is disabled.

Patch by Takuto Ikuta!

Differential Revision https://reviews.llvm.org/D47578

llvm-svn: 334602
2018-06-13 14:29:26 +00:00
Krzysztof Parzyszek 3e039f86cc Revert "Improve handling of COPY instructions with identical value numbers"
This reverts r334594, it breaks buildbots and fails with expensive checks.

llvm-svn: 334598
2018-06-13 13:49:06 +00:00
Zoran Jovanovic 3a7654c15d [mips][microMIPS] Extending size reduction pass with LWP and SWP
Author: milena.vujosevic.janicic
Reviewers: sdardis
The patch extends size reduction pass for MicroMIPS.
It introduces reduction of two instructions into one instruction:
Two SW instructions are transformed into one SWP instrucition.
Two LW instructions are transformed into one LWP instrucition.
Differential Revision: https://reviews.llvm.org/D39115

llvm-svn: 334595
2018-06-13 12:51:37 +00:00
Krzysztof Parzyszek 36b816f814 Improve handling of COPY instructions with identical value numbers
Differential Revision: https://reviews.llvm.org/D48102

llvm-svn: 334594
2018-06-13 12:47:17 +00:00
Sanjay Patel b983ac6fe1 [x86] eliminate even more sign-bit tests with vector select
This shortcoming was noted in D47330, and the test diffs show we already 
had other examples where we failed to fold to a SHRUNKBLEND:

/// Dynamic (non-constant condition) vector blend where only the sign bits
/// of the condition elements are used. This is used to enforce that the
/// condition mask is not valid for generic VSELECT optimizations.

This patch implements an idea from D48043 and would obsolete that patch 
because it catches more cases (notable the AVX1 case that was missed there). 
All we're doing is allowing the existing transform to fire more often by 
removing the post-legalize constraint. All of the relevant feature checks 
and other predicates are left as-is.

Differential Revision: https://reviews.llvm.org/D48078

llvm-svn: 334592
2018-06-13 12:28:32 +00:00
Alex Bradbury 96f492d7df [RISCV] Add codegen support for atomic load/stores with RV32A
Fences are inserted according to table A.6 in the current draft of version 2.3
of the RISC-V Instruction Set Manual, which incorporates the memory model
changes and definitions contributed by the RISC-V Memory Consistency Model
task group.

Instruction selection failures will now occur for 8/16/32-bit atomicrmw and 
cmpxchg operations when targeting RV32IA until lowering for these operations 
is added in a follow-on patch.

Differential Revision: https://reviews.llvm.org/D47589

llvm-svn: 334591
2018-06-13 12:04:51 +00:00
Alex Bradbury dc790dd5d0 [RISCV] Codegen support for atomic operations on RV32I
This patch adds lowering for atomic fences and relies on AtomicExpandPass to
lower atomic loads/stores, atomic rmw, and cmpxchg to __atomic_* libcalls.

test/CodeGen/RISCV/atomic-* are modelled on the exhaustive
test/CodeGen/PPC/atomics-regression.ll, and will prove more useful once RV32A
codegen support is introduced.

Fence mappings are taken from table A.6 in the current draft of version 2.3 of
the RISC-V Instruction Set Manual, which incorporates the memory model changes
and definitions contributed by the RISC-V Memory Consistency Model task group.

Differential Revision: https://reviews.llvm.org/D47587

llvm-svn: 334590
2018-06-13 11:58:46 +00:00
Simon Pilgrim 2c9d2adff5 [SLPVectorizer] getSameOpcode - remove useless cast [NFC]
There's no need to cast the base Value to an Instruction

llvm-svn: 334588
2018-06-13 10:49:24 +00:00
Simon Pilgrim 1224260f83 [SLPVectorizer] getSameOpcode - remove unusued alternate code [NFC]
We early-out for the case where we don't use alternate opcodes, so no need to check for it later.

llvm-svn: 334587
2018-06-13 10:14:27 +00:00
Clement Courbet 5eeed77f87 [TableGen] Emit a fatal error on inconsistencies in resource units vs cycles.
Summary:
For targets I'm not familiar with, I've automatically made the "default to 1 for each resource" behaviour explicit in the td files.
For more obvious cases, I've ventured a fix.

Some notes:
 - Exynos is especially fishy.
 - AArch64SchedThunderX2T99.td had some truncated entries. If I understand correctly, the person who wrote that interpreted the ResourceCycle as a range. I made the decision to use the upper/lower bound for consistency with the 'Latency' value. I'm sure there is a better choice.
 - The change to X86ScheduleBtVer2.td is an NFC, it just makes values more explicit.

Also see PR37310.

Reviewers: RKSimon, craig.topper, javed.absar

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D46356

llvm-svn: 334586
2018-06-13 09:41:49 +00:00
Hiroshi Inoue 0f7f59f073 [PowerPC] fix trivial typos in comment, NFC
llvm-svn: 334583
2018-06-13 08:54:13 +00:00
Hans Wennborg 32e611d1a9 Fix -DLLVM_ENABLE_THREADS=OFF build after r334537
llvm-svn: 334582
2018-06-13 08:43:03 +00:00
Hiroshi Inoue 9bffc94cf0 [PowerPC] avoid verification failure due to PowerPC VSX Swap Removal pass
This patch fixes a failure in lnt tests with -verify-machineinstrs option.
When VSX Swap Removal pass swaps two register operands, it did not maintain kill flags associated with operands. This patch swaps flags as well as register number to avoid inconsistent kill flags information.

llvm-svn: 334579
2018-06-13 08:25:14 +00:00
Pavel Labath 4adc88ed25 [DWARF/AccelTable] Remove getDIESectionOffset for DWARF v5 entries
Summary:
This method was not correct for entries in DWO files as it assumed it
could just add up the CU and DIE offsets to get the absolute DIE offset.
This is not correct for the DWO files, as here the CU offset will
reference the skeleton unit, whereas the DIE offset will be the offset
in the full unit in the DWO file.

Unfortunately, this means that we are not able to determine the absolute
DIE offset using the information in the .debug_names section alone,
which means we have to offload some of this work to the users of this
class.

To demonstrate how this can be done, I've added/fixed the ability to
lookup entries using accelerator tables in DWO files in llvm-dwarfdump.
To make this happen, I've needed to make two extra changes in other
classes:
- made the DWARFContext method to lookup a CU based on the section
  offset public. I've needed this functionality to lookup a CU, and this
  seems like a useful thing in general.
- made DWARFUnit::getDWOId call extractDIEsIfNeeded. Before this, the
  DWOId was filled in only if the root DIE happened to be parsed
  before we called the accessor. Since the lazy parsing is supposed to
  happen under the hood, calling extractDIEsIfNeeded seems appropriate.

Reviewers: JDevlieghere, aprantl, dblaikie

Subscribers: mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D48009

llvm-svn: 334578
2018-06-13 08:14:27 +00:00
Craig Topper 3829d258ee [X86] Remove masking from avx512vbmi2 concat and shift by immediate intrinsics. Use select in IR instead.
llvm-svn: 334576
2018-06-13 07:19:21 +00:00
Max Kazantsev 0ed79620c6 [SimplifyIndVars] Ignore dead users
IndVarSimplify sometimes makes transforms basing on users that are trivially dead. In particular,
if DCE wasn't run before it, there may be a dead `sext/zext` in loop that will trigger widening
transforms, however it makes no sense to do it.

This patch teaches IndVarsSimplify ignore the mist trivial cases of that.

Differential Revision: https://reviews.llvm.org/D47974
Reviewed By: sanjoy

llvm-svn: 334567
2018-06-13 02:25:32 +00:00
Craig Topper 55488731be [X86] Mark all instructions that have masked store semantics with NotMemoryFoldable. Remove dependency on SchedRW from memory table autogenerator.
Previously we were whitelisting in instructions based on their SchedRW value. With the masked store instructions explicitly removed via NotMemoryFoldable, we don't seem to need this check anymore.

llvm-svn: 334563
2018-06-13 00:04:08 +00:00
Craig Topper 4f9cac667b [X86] Remove VPCOMPRESSB/W from the autogenerated load folding table.
llvm-svn: 334562
2018-06-13 00:04:04 +00:00
Stanislav Mekhanoshin 8fd3c4e431 [AMDGPU] DAG combine to produce V_PERM_B32
Differential Revision: https://reviews.llvm.org/D48099

llvm-svn: 334559
2018-06-12 23:50:37 +00:00
Krzysztof Parzyszek 82d284c1d2 [DAGCombiner] Recognize more patterns for ABS
Differential Revision: https://reviews.llvm.org/D47831

llvm-svn: 334553
2018-06-12 21:51:49 +00:00
Reid Kleckner 04920989a2 Remove malloc.h include from Intel JIT events code
llvm-svn: 334547
2018-06-12 21:15:27 +00:00
Reid Kleckner 722e2ea247 Add null check to Intel JIT event listener
llvm-svn: 334544
2018-06-12 20:54:11 +00:00
Lang Hames 2aae25819e [ORC] Add a fallback definition generator for VSOs.
If a VSO has a fallback definition generator attached it will be called during
lookup (and lookupFlags) for any unresolved symbols. The definition generator
can add new definitions to the VSO for any unresolved symbol. This allows VSOs
to generate new definitions on demand.

The immediate use case for this code is supporting VSOs that can import
definitions found via dlsym on demand.

llvm-svn: 334538
2018-06-12 20:43:18 +00:00
Lang Hames 253584fdaf [ORC] Refactor blocking lookup logic into the blockingLookup function, and
implement existing blocking lookups (the lookup function) and
JITSymbolResolverAdapter on top of that.

llvm-svn: 334537
2018-06-12 20:43:17 +00:00
Lang Hames 22a7440200 [RuntimeDyld] Add an assert to catch misbehaving symbol resolvers.
Resolvers are required to find results for all requested symbols or return an
error, but if a resolver fails to adhere to this contract (by returning results
for only a subset of the requested symbols) then this code will infinite loop.
This assertion catches resolvers that fail to adhere to the contract.

llvm-svn: 334536
2018-06-12 20:43:17 +00:00
Lang Hames e7989ef47f [MCJIT] Call materializeAll on modules before compiling them in MCJIT.
This only affects modules with lazy GVMaterializers attached (usually modules
read off disk using the lazy bitcode reader). For such modules, materializing
before compiling prevents crashes due to missing function bodies /
initializers.

llvm-svn: 334535
2018-06-12 20:43:15 +00:00
Petr Hosek 7250908016 [AArch64] Support reserving x20 register
Register x20 is a callee-saved register which may be used for other
purposes in certain contexts, for example to hold special variables
within the kernel. This change adds support for reserving this register
both to frontend and backend to make this register usable for these
purposes.

Differential Revision: https://reviews.llvm.org/D46552

llvm-svn: 334531
2018-06-12 20:00:50 +00:00
Craig Topper 3a34c3596d [X86] Remove mayLoad flag from AVX512 truncating store instructions.
llvm-svn: 334529
2018-06-12 19:59:08 +00:00
Reid Kleckner 98117a47e6 [MS][ARM64] Hoist __ImageBase handling into TargetLoweringObjectFileCOFF
All COFF targets should use @IMGREL32 relocations for symbol differences
against __ImageBase. Do the same for getSectionForConstant, so that
immediates lowered to globals get merged across TUs.

Patch by Chris January

Differential Revision: https://reviews.llvm.org/D47783

llvm-svn: 334523
2018-06-12 18:56:05 +00:00
Konstantin Zhuravlyov ce25bc3e82 AMDHSA/NFC: Code object v3 updates (additional):
- Move section selection and alignment to AMDGPUAsmPrinter

llvm-svn: 334521
2018-06-12 18:33:51 +00:00
Roman Tereshin b2d3f2e5da [MIR][MachineCSE] Implementing proper MachineInstr::getNumExplicitDefs()
Apparently, MachineInstr class definition as well as pretty much all of
the machine passes assume that the only kind of MachineInstr's operands
that is variadic for variadic opcodes is explicit non-definitions.

In particular, this assumption is made by MachineInstr::defs(), uses(),
and explicit_uses() methods, as well as by MachineCSE pass.

The assumption is incorrect judging from at least TableGen backend
implementation, that recognizes variable_ops in OutOperandList, and the
very existence of G_UNMERGE_VALUES generic opcode, or ARM load multiple
instructions, all of which have variadic defs.

In particular, MachineCSE pass breaks MIR with CSE'able G_UNMERGE_VALUES
instructions in it.

This commit implements MachineInstr::getNumExplicitDefs() similar to
pre-existing MachineInstr::getNumExplicitOperands(), fixes
MachineInstr::defs(), uses(), and explicit_uses(), and fixes MachineCSE
pass.

As the issue addressed seems to affect only machine passes that could be
ran mid-GlobalISel pipeline at the moment, the other passes aren't fixed
by this commit, like MachineLICM: that could be done on per-pass basis
when (if ever) they get adopted for GlobalISel.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D45640

llvm-svn: 334520
2018-06-12 18:30:37 +00:00
Konstantin Zhuravlyov 00f2cb1116 AMDHSA: Code object v3 updates
- Do not emit following assembler directives:
  - .hsa_code_object_version
  - .hsa_code_object_isa
  - .amd_amdgpu_isa
  - .amd_amdgpu_hsa_metadata
  - .amd_amdgpu_pal_metadata
- Do not emit .note entries
- Cleanup and bring in sync kernel descriptor header file
- Emit kernel descriptor into .rodata with appropriate relocations and
  alignments

llvm-svn: 334519
2018-06-12 18:02:46 +00:00
Zachary Turner 08426e1f9f Refactor ExecuteAndWait to take StringRefs.
This simplifies some code which had StringRefs to begin with, and
makes other code more complicated which had const char* to begin
with.

In the end, I think this makes for a more idiomatic and platform
agnostic API.  Not all platforms launch process with null terminated
c-string arrays for the environment pointer and argv, but the api
was designed that way because it allowed easy pass-through for
posix-based platforms.  There's a little additional overhead now
since on posix based platforms we'll be takign StringRefs which
were constructed from null terminated strings and then copying
them to null terminate them again, but from a readability and
usability standpoint of the API user, I think this API signature
is strictly better.

llvm-svn: 334518
2018-06-12 17:43:52 +00:00
Fangrui Song f72cdb50be [MC] [X86] Teach leaq _GLOBAL_OFFSET_TABLE(%rip), %r15 to use R_X86_64_GOTPC32 instead of R_X86_64_PC32
Summary:
This is similar to D46319 (ARM). x86-64 psABI p40 gives an example:

  leaq _GLOBAL_OFFSET_TABLE(%rip), %r15 # GOTPC32 reloc

GNU as creates R_X86_64_GOTPC32. However, MC currently emits R_X86_64_PC32.

Reviewers: javed.absar, echristo

Subscribers: kristof.beyls, llvm-commits, peter.smith, grimar

Differential Revision: https://reviews.llvm.org/D47507

llvm-svn: 334515
2018-06-12 16:20:44 +00:00
Michael Berg 5d49f66570 Utilize new SDNode flag functionality to expand current support for fmul
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fmul.

Reviewers: spatel, hfinkel, wristow, arsenm

Reviewed By: spatel

Subscribers: nhaehnle, wdng

Differential Revision: https://reviews.llvm.org/D47911

llvm-svn: 334514
2018-06-12 16:13:11 +00:00
Simon Pilgrim e39fa6cbbb [CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744)
As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources:

e.g. v4f32: <0,5,2,7> or <4,1,6,3>

This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline:

e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.

This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns.

Differential Revision: https://reviews.llvm.org/D47985

llvm-svn: 334513
2018-06-12 16:12:29 +00:00
Paul Robinson f69316c617 [DWARFv5] llvm-mc -dwarf-version does not imply -g.
Don't provide the assembler source as the "root file" unless the user
asked to have debug info for the assembler source (with -g).

If the source doesn't provide an explicit ".file 0" then (a) use the
compilation directory as directory #0, and (b) use the file #1 info
for file #0 also.

Differential Revision: https://reviews.llvm.org/D48055

llvm-svn: 334512
2018-06-12 16:09:03 +00:00
Craig Topper ede97c9548 [X86] Remove TB_ALIGN_16 from VEXTRACTF128/VEXTRACTI128 in the memory folding table.
llvm-svn: 334511
2018-06-12 15:48:03 +00:00
Simon Pilgrim 51ef8dabcd Fix signed/unsigned warning. NFCI.
llvm-svn: 334509
2018-06-12 15:14:34 +00:00
Krzysztof Parzyszek bea23d065e [Hexagon] Make floating point operations expensive for vectorization
llvm-svn: 334508
2018-06-12 15:12:50 +00:00
Simon Pilgrim 0783921987 [CostModel] Treat Identity shuffle masks as zero cost
As discussed on D47985, identity shuffle masks should probably be free.

I've limited this to the case where the input and output types all match - but we could probably accept all cases.

Differential Revision: https://reviews.llvm.org/D47986

llvm-svn: 334506
2018-06-12 14:47:13 +00:00
Sanjay Patel c3466d2568 [x86] move shrunkblend transform to helper function; NFCI
We should be able to obsolete D48043 by easing the constraints
on this existing code. 

llvm-svn: 334504
2018-06-12 14:21:51 +00:00
Krzysztof Parzyszek 3d671248ab [SelectionDAG] Provide default expansion for rotates
Implement default legalization of rotates: either in terms of the rotation
in the opposite direction (if legal), or in terms of shifts and ors.

Implement generating of rotate instructions for Hexagon. Hexagon only
supports rotates by an immediate value, so implement custom lowering of
ROTL/ROTR on Hexagon. If a rotate is not legal, use the default expansion.

Differential Revision: https://reviews.llvm.org/D47725

llvm-svn: 334497
2018-06-12 12:49:36 +00:00
Florian Hahn a1cc848399 Use SmallPtrSet explicitly for SmallSets with pointer types (NFC).
Currently SmallSet<PointerTy> inherits from SmallPtrSet<PointerTy>. This
patch replaces such types with SmallPtrSet, because IMO it is slightly
clearer and allows us to get rid of unnecessarily including SmallSet.h

Reviewers: dblaikie, craig.topper

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D47836

llvm-svn: 334492
2018-06-12 11:16:56 +00:00
Simon Dardis 74fb5e6789 [mips] Guard some floating point instructions correctly
Reviewers: smaksimovic, atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D47636

llvm-svn: 334491
2018-06-12 10:28:06 +00:00
Aleksandar Beserminji 8acdc10220 [mips] Extend LONG_BRANCH_LUi/ADDiu with extra parameter
Extend LONG_BRANCH_LUi and LONG_BRANCH_ADDiu pseudo instructions with
additional flag, so instead of always lowering to lui %hi(...),
addiu %lo(...) or addiu %hi(...), now they can lower to either %lo, %hi,
%higher or %highest depending on the added flag.

Differential Revision: https://reviews.llvm.org/D47941

llvm-svn: 334490
2018-06-12 10:23:49 +00:00
Luke Geeson dc82aa44e6 [AArch64] Audit on rL333879 to fix FP16 64bit bitpatterns
llvm-svn: 334488
2018-06-12 09:35:20 +00:00
Craig Topper 88c230265b [X86] Add NotMemoryFoldable to the VPCOMPRESS instructions.
llvm-svn: 334481
2018-06-12 07:32:19 +00:00
Craig Topper 5799e4df75 [X86] Add NotMemoryFoldable to more instructions.
These include PUSH/POP instructions that don't match the manual table. This also includes CMPXCHG which we never emit in non-locked form.

llvm-svn: 334479
2018-06-12 07:32:17 +00:00
Wei Mi d9be2c7e64 [NFC] Change sample profile format enum name SPF_Raw_Binary to SPF_Binary.
Some out-of-tree targets depend on the enum name SPF_Binary. Keep the name
can avoid unnecessary churn to those targets.

llvm-svn: 334476
2018-06-12 05:53:49 +00:00
Craig Topper 66572df76e [X86] Add NotMemoryFoldable to a bunch of instructions to suppress them from the autogenerated load folding table.
Most of these are system instructions or other instructions we don't use in CodeGen. No point wasting space for them in the table. Removing them from the autogenerated table makes it easier to review the manual table.

A few are real opcode collisions where the memory and register forms are completely different instructions.

llvm-svn: 334474
2018-06-12 04:34:59 +00:00
Craig Topper 957b738432 [X86] Add isel patterns for folding loads when creating ROUND instructions from ffloor/fnearbyint/fceil/frint/ftrunc.
We were missing packed isel folding patterns for all of sse41, avx, and avx512.

For some reason avx512 had scalar load folding patterns under optsize(due to partial/undef reg update), but we didn't have the equivalent sse41 and avx patterns.

Sometimes we would get load folding due to peephole pass anyway, but we're also missing avx512 instructions from the load folding table. I'll try to fix that in another patch.

Some of this was spotted in the review for D47993.

This patch adds all the folds to isel, adds a few spot tests, and disables the peephole pass on a few tests to ensure we're testing some of these patterns.

llvm-svn: 334460
2018-06-12 00:48:57 +00:00
Mark Searles 987f292c56 [AMDGPU] prevent hitting Assertion `isReg() && "Wrong MachineOperand accessor"'
The use iterator, used within findMaskOperands(), can return anything which is
not a def. isUse() requires a register, so check isReg() before calling isUse().

Differential Revision: https://reviews.llvm.org/D48047

llvm-svn: 334459
2018-06-12 00:41:26 +00:00
Wei Mi c6b96c8db2 Fix a warning issued by clang.
llvm-svn: 334453
2018-06-11 23:09:04 +00:00
George Burgess IV c72204d5b5 Simplify; NFC
Not shown in the diff: AQ is a `vector<SUnit *>`, and SU is a `SUnit *`

llvm-svn: 334451
2018-06-11 22:58:32 +00:00
Wei Mi a0c0857e7a [SampleFDO] Add a new compact binary format for sample profile.
Name table occupies a big chunk of size in current binary format sample profile.
In order to reduce its size, the patch changes the sample writer/reader to
save/restore MD5Hash of names in the name table. Sample annotation phase will
also use MD5Hash of name to query samples accordingly.

Experiment shows compact binary format can reduce the size of sample profile by
2/3 compared with binary format generally.

Differential Revision: https://reviews.llvm.org/D47955

llvm-svn: 334447
2018-06-11 22:40:43 +00:00
Konstantin Zhuravlyov 3e5d66ac66 AMDGPU: Add 64-bit relative variant kind
Differential Revision: https://reviews.llvm.org/D47601

llvm-svn: 334443
2018-06-11 21:37:57 +00:00
Matt Arsenault 5615fa0a87 DAG: Fix extract_subvector combine for a single element
This would fail before because 1x vectors aren't legal,
so instead just use the scalar type.

Avoids regressions in a future AMDGPU commit to add
v4i16/v4f16 as legal types.

Test update is just the one test that this triggers
on in tree now. It wasn't checking anything before.
The result is completely  changed since the selects
are eliminated. Not sure if it's considered better
or not.

llvm-svn: 334440
2018-06-11 21:27:41 +00:00
Craig Topper 3efdb7ce19 [X86] Push some variable declarations down into the individual switch cases that need them. NFC
All of the cases are already wrapped in curly braces so declaring a variable there isn't an issue. And the variables aren't assigned or used in the larger scope.

llvm-svn: 334436
2018-06-11 20:50:58 +00:00
Craig Topper ceed99baf0 [X86] Reorder some type constraints to force things to be vectors and integer/fp before forcing them to be the same size.
This may be needed by another patch that I'm working on. It should have no effect on any of the generated outputs.

llvm-svn: 334430
2018-06-11 19:20:15 +00:00
Justin Lebar 4da41c13a5 [SCEV] Add transform zext((A * B * ...)<nuw>) --> (zext(A) * zext(B) * ...)<nuw>.
Reviewers: sanjoy

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48041

llvm-svn: 334429
2018-06-11 18:57:58 +00:00
Justin Lebar aa4fec94d8 [SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags where safe.
Summary:
Previously we would add them for adds, but not multiplies.

Reviewers: sanjoy

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D48038

llvm-svn: 334428
2018-06-11 18:57:42 +00:00
Justin Lebar 7b4656c1d3 Fix indentation in ScalarEvolution.cpp.
Whitespace-only change.  (clang-formatted the whole block.)

llvm-svn: 334427
2018-06-11 18:57:27 +00:00
Krzysztof Parzyszek dd9415d550 [Hexagon] Late predicate producers cannot be used as dot-new sources
llvm-svn: 334426
2018-06-11 18:45:52 +00:00
Tim Shen cc63761720 [SCEV] Canonicalize "A /u C1 /u C2" to "A /u (C1*C2)".
Summary: FWIW InstCombine already folds this. Also avoid the case where C1*C2 overflows.

Reviewers: sunfish, sanjoy

Subscribers: hiraditya, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D47965

llvm-svn: 334425
2018-06-11 18:44:58 +00:00
Simon Pilgrim 14ee66ef37 [X86][AVX512] Tag AVX5124FMAPS/AVX5124VNNIW with missing scheduler classes
Necessary for D46276 as even though btver2 doesn't use these instructions, its now flagged as complete so complains if ANY instruction isn't tagged.....

UnsupportedFeatures wouldn't help here as these instructions don't appear to have a feature predicate (like a lot of AVX512).

llvm-svn: 334423
2018-06-11 17:28:00 +00:00
Stanislav Mekhanoshin 7ba3fc730c [AMDGPU] Do not consider indirect acces through phi for wave limiter
Rational: if there is indirect access that is usually an issue
because load is not ready by the use. However, if use is inside a
loop and load is outside that is potentially an issue for a first
iteration only.

Differential Revision: https://reviews.llvm.org/D47740

llvm-svn: 334420
2018-06-11 16:50:49 +00:00
Aleksandar Beserminji 62cf9d21ab [mips] Fix spill slot for mips3, n64 abi
When program is compiled for mips3 with n64 abi, wrong register class
is used for creating an emergency spill slot. This patch fixes the
correct register class to be chosen.

This patch resolves PR35859.

Thanks to John Baldwin for reporting the issue!

Differential Revision: https://reviews.llvm.org/D47938

llvm-svn: 334419
2018-06-11 16:50:28 +00:00
Dylan McKay d011869c82 [AVR] Set trackLivenessAfterRegAlloc
This sets trackLivenessAfterRegAlloc on AVRRegisterInfo.

Most existing targets set this flag. Without it, specific IR inputs
cause LLVM to fail with:

Assertion failed: (getParent()->getProperties().hasProperty( MachineFunctionProperties::Property::TracksLiveness) &&
                   "Liveness information is accurate"), function livein_begin
file MachineBasicBlock.cpp, line 1354.

With this commit, this no longer happens.

Patch by Peter Nimmervoll.

llvm-svn: 334409
2018-06-11 14:46:48 +00:00
Clement Courbet 7db69cc08a [X86] Fix skylake server scheduling info.
Summary:
This fixes most of the scheduling info for SKX vector operations.
I had to split a lot of the YMM/ZMM classes into separate classes for YMM and ZMM.

The before/after llvm-exegesis analysis are in the phabricator diff.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47721

llvm-svn: 334407
2018-06-11 14:37:53 +00:00
Pavel Labath b2c73c4cc0 Fix build errors on some configurations
It's been reported
<http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180611/559616.html>
that template argument deduction for RetryAfterSignal fails if open is
not prefixed with "::".

This should help us build correctly on those platforms and explicitly
specifying the namespace is more correct anyway.

llvm-svn: 334403
2018-06-11 13:30:47 +00:00
Pavel Labath d8c6290ba4 Move VersionTuple from clang/Basic to llvm/Support
Summary:
This kind of functionality is useful to other project apart from clang.
LLDB works with version numbers a lot, but it does not have a convenient
abstraction for this. Moving this class to a lower level library allows
it to be freely used within LLDB.

Since this class is used in a lot of places in clang, and it used to be
in the clang namespace, it seemed appropriate to add it to the list of
adopted classes in LLVM.h to avoid prefixing all uses with "llvm::".

Also, I didn't find any tests specific for this class, so I wrote a
couple of quick ones for the more interesting bits of functionality.

Reviewers: zturner, erik.pilkington

Subscribers: mgorny, cfe-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D47887

llvm-svn: 334399
2018-06-11 10:28:04 +00:00
Clement Courbet f4f6899cdf [ExynosM1][Sched] Fix resource usage in scheduling model.
This is part of https://reviews.llvm.org/D46356.

llvm-svn: 334391
2018-06-11 07:33:08 +00:00
Clement Courbet c48435bfe5 [X86] Explicitly mark unsupported classes in scheduling models.
Summary: In preparation for D47721. HSW and SNB still define unsupported
classes as they are used by KNL and generic models respectively.

Reviewers: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47763

llvm-svn: 334389
2018-06-11 07:00:08 +00:00
Craig Topper 0e25c8239a [X86] Remove masking from dbpsadbw intrinsics, use select in IR instead.
llvm-svn: 334384
2018-06-11 06:18:22 +00:00
Daniel Cederman 33f67a256b [Sparc] Add support for 13-bit PIC
Summary: When compiling with -fpic, in contrast to -fPIC, use only the
immediate field to index into the GOT. This saves space if the GOT is
known to be small. The linker will warn if the GOT is too large for
this method.

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: brad, fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D47136

llvm-svn: 334383
2018-06-11 05:50:08 +00:00
Brock Wyma b60532f89a [CodeView] Omit forward references for unnamed structs and unions
Codeview references to unnamed structs and unions are expected to refer to the
complete type definition instead of a forward reference so Visual Studio can
resolve the type properly.

Differential Revision: https://reviews.llvm.org/D32498

llvm-svn: 334382
2018-06-11 01:39:34 +00:00
Craig Topper e71ad1f6d0 [X86] Remove and autoupgrade the expandload and compressstore intrinsics.
We use the target independent intrinsics now.

llvm-svn: 334381
2018-06-11 01:25:22 +00:00
Sanjay Patel 3e5c70cc1d [DAGCombiner] match vector compare and select sizes with extload operand (PR37427)
This patch started off much more general and ambitious, but it's been a nightmare 
seeing all the ways x86 vector codegen can go wrong.

So the code is still structured to allow extending easily, but it's currently 
limited in several ways:

1. Only handle cases with an extending load.
2. Only handle cases with a zero constant compare.
3. Ignore setcc with vector bitmask (SetCCWidth != 1) - so AVX512 should be unaffected.

The motivating case from PR37427:
https://bugs.llvm.org/show_bug.cgi?id=37427
...is the 1st test, and that shows the expected win - we eliminated the unnecessary 
intermediate cast.

There's a clear regression in the last test (sgt_zero_fp_select) because we longer 
recognize a 'SHRUNKBLEND' opportunity. I think that general problem is also present 
in sgt_zero, so I'll try to fix that in a follow-up. We need to match a sign-bit 
setcc from a sign-extended operand and remove it.

Differential Revision: https://reviews.llvm.org/D47330

llvm-svn: 334378
2018-06-10 23:09:50 +00:00
Craig Topper 860562c915 [X86] Miscellaneous fixes to get the load folding table generator to work again.
llvm-svn: 334377
2018-06-10 21:48:24 +00:00
Zachary Turner 15243d5a6d Attempt 3: Resubmit "[Support] Expose flattenWindowsCommandLine."
I took some liberties and quoted fewer characters than before,
based on an article from MSDN which says that only certain characters
cause an arg to require quoting.  This seems to be incorrect, though,
and worse it seems to be a difference in Windows version.  The bot
that fails is Windows 7, and I can't reproduce the failure on Win
10.  But it's definitely related to quoting and special characters,
because both tests that fail have a * in the argument, which is one
of the special characters that would cause an argument to be quoted
before but not any longer after the new patch.

Since I don't have Win 7, all I can do is just guess that I need to
restore the old quoting rules.  So this patch does that in hopes that
it fixes the problem on Windows 7.

llvm-svn: 334375
2018-06-10 20:57:14 +00:00
Roman Lebedev ebb3252f00 Revert rL334371 / D47980: "[InstCombine] Fold (x << y) >> y -> x & (-1 >> y)"
test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll broke,
and i did not notice because i did not build that backend.

llvm-svn: 334373
2018-06-10 20:32:03 +00:00
Roman Lebedev eb795a0661 [InstCombine] Fold (x >> y) << y -> x & (-1 << y)
Summary:
We already do it for matching splat constants, but not just values.

Further improvements for non-matching splat constants, as noted in
https://reviews.llvm.org/D46760#1123713 will be needed,
but i'd prefer to do that as a follow-up.

https://bugs.llvm.org/show_bug.cgi?id=37603
https://rise4fun.com/Alive/cplX
https://rise4fun.com/Alive/0HF

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47981

llvm-svn: 334372
2018-06-10 20:10:13 +00:00
Roman Lebedev 4cdc59ecf2 [InstCombine] Fold (x << y) >> y -> x & (-1 >> y)
Summary:
We already do it for splat constants, but not just values.
Also, undef cases are mostly non-functional.

https://bugs.llvm.org/show_bug.cgi?id=37603
https://rise4fun.com/Alive/cplX

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47980

llvm-svn: 334371
2018-06-10 20:10:06 +00:00
Ivan A. Kosarev 847daa11f8 [NEON] Support VST1xN intrinsics in AArch32 mode (LLVM part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47447

llvm-svn: 334361
2018-06-10 09:27:27 +00:00
Craig Topper 98a79934af [X86] Remove masking from the 512-bit masked floating point add/sub/mul/div intrinsics. Use a select in IR instead.
llvm-svn: 334358
2018-06-10 06:01:36 +00:00
Fangrui Song 69d6418d60 Cleanup. NFC
llvm-svn: 334357
2018-06-10 04:53:14 +00:00
Zachary Turner 071a09053a Revert "Resubmit "[Support] Expose flattenWindowsCommandLine.""
This reverts commit 65243b6d19143cb7a03f68df0169dcb63e8b4632.

Seems like it's not a flake.  It might have something to do with
the '*' character being in a command line.

llvm-svn: 334356
2018-06-10 03:16:25 +00:00
Zachary Turner 5e119768a1 Resubmit "[Support] Expose flattenWindowsCommandLine."
There were a few linux compilation failures, but other than that
I think this was just a flake that caused the tests to fail.  I'm
going to resubmit and see if the failures go away, if not I'll
revert again.

llvm-svn: 334355
2018-06-10 02:46:11 +00:00
Zachary Turner 1fbca91c07 Revert "[Support] Expose flattenWindowsCommandLine."
This reverts commit 10d2e88e87150a35dc367ba30716189d2af26774.

This is causing some test failures for some reason, reverting
while I investigate.

llvm-svn: 334354
2018-06-09 23:07:39 +00:00
Zachary Turner 48c3341cfe [Support] Expose flattenWindowsCommandLine.
This function was internal to Program.inc, but I've needed this
on several occasions when I've had to use CreateProcess without
llvm's sys::Execute functions.  In doing so, I noticed that the
function was written using unsafe C-string access and was pretty
hard to understand / make sense of, so I've also re-written the
functions to use more modern LLVM constructs.

llvm-svn: 334353
2018-06-09 22:44:44 +00:00
Gabor Buella 5aa26980c4 [X86] NFC Use member initialization in X86Subtarget
The separate initializeEnvironment function was sort of
useless since r217071.
ARM did this move already with r273556.

llvm-svn: 334345
2018-06-09 09:19:40 +00:00
Serge Pavlov 15681ad00b Use uniform mechanism for OOM errors handling
This is a recommit of r333506, which was reverted in r333518.
The original commit message is below.

In r325551 many calls of malloc/calloc/realloc were replaces with calls of
their safe counterparts defined in the namespace llvm. There functions
generate crash if memory cannot be allocated, such behavior facilitates
handling of out of memory errors on Windows.

If the result of *alloc function were checked for success, the function was
not replaced with the safe variant. In these cases the calling function made
the error handling, like:

    T *NewElts = static_cast<T*>(malloc(NewCapacity*sizeof(T)));
    if (NewElts == nullptr)
      report_bad_alloc_error("Allocation of SmallVector element failed.");

Actually knowledge about the function where OOM occurred is useless. Moreover
having a single entry point for OOM handling is convenient for investigation
of memory problems. This change removes custom OOM errors handling and
replaces them with calls to functions `llvm::safe_*alloc`.

Declarations of `safe_*alloc` are moved to a separate include file, to avoid
cyclic dependency in SmallVector.h

Differential Revision: https://reviews.llvm.org/D47440

llvm-svn: 334344
2018-06-09 05:19:45 +00:00
Craig Topper 61998289f9 Use SmallPtrSet instead of SmallSet in places where we iterate over the set.
SmallSet forwards to SmallPtrSet for pointer types. SmallPtrSet supports iteration, but a normal SmallSet doesn't. So if it wasn't for the forwarding, this wouldn't work.

These places were found by hiding the begin/end methods in the SmallSet forwarding

llvm-svn: 334343
2018-06-09 05:04:20 +00:00
Eli Friedman 864df22307 [ARM] Allow CMPZ transforms even if the input has multiple uses.
It looks like this got left in by accident in r289794; I can't think of
any reason this check would be necessary.  (Maybe it was meant to be a
check that the AND has one use? But we check that a few lines earlier.)

Differential Revision: https://reviews.llvm.org/D47921

llvm-svn: 334322
2018-06-08 21:16:56 +00:00
Krzysztof Parzyszek b10ea39270 [SCEV] Look through zero-extends in howFarToZero
An expression like
  (zext i2 {(trunc i32 (1 + %B) to i2),+,1}<%while.body> to i32)
will become zero exactly when the nested value becomes zero in its type.
Strip injective operations from the input value in howFarToZero to make
the value simpler.

Differential Revision: https://reviews.llvm.org/D47951

llvm-svn: 334318
2018-06-08 20:43:07 +00:00
Davide Italiano 189c2cf114 [InstCombine] Skip dbg.value(s) when looking at stack{save,restore}.
Fixes PR37713.

llvm-svn: 334317
2018-06-08 20:42:36 +00:00
Reid Kleckner 0bab222084 [asan] Instrument comdat globals on COFF targets
Summary:
If we can use comdats, then we can make it so that the global metadata
is thrown away if the prevailing definition of the global was
uninstrumented. I have only tested this on COFF targets, but in theory,
there is no reason that we cannot also do this for ELF.

This will allow us to re-enable string merging with ASan on Windows,
reducing the binary size cost of ASan on Windows.

Reviewers: eugenis, vitalybuka

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D47841

llvm-svn: 334313
2018-06-08 18:33:16 +00:00
Sanjay Patel 498564e6fb [DAGCombiner] clean up comments; NFC
llvm-svn: 334312
2018-06-08 18:00:46 +00:00
Simon Pilgrim 5c32989c91 [X86][SSE] Support v8i16/v16i16 rotations
Extension to D46954 (PR37426), this patch adds support for v8i16/v16i16 rotations in a similar manner - the conversion of the shift/rotate amount to a multiplication factor and the use of PMULLW to shift left and PMULHUW (ISD::MULHU) to shift the wrapped bits back around to be ORd together.

Differential Revision: https://reviews.llvm.org/D47822

llvm-svn: 334309
2018-06-08 17:58:42 +00:00
Michael Berg bf90d1f263 Utilize new SDNode flag functionality to expand current support for fsub
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fsub.

Reviewers: spatel, hfinkel, wristow, arsenm

Reviewed By: spatel

Subscribers: wdng

Differential Revision: https://reviews.llvm.org/D47910

llvm-svn: 334306
2018-06-08 17:39:50 +00:00
Florian Hahn 45e5d5b4be [VPlan] Move recipe construction to VPRecipeBuilder.
This patch moves the recipe-creation functions out of
LoopVectorizationPlanner, which should do the high-level
orchestration of the transformations.

Reviewers: dcaballe, rengolin, hsaito, Ayal

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D47595

llvm-svn: 334305
2018-06-08 17:30:45 +00:00
Simon Pilgrim 89deac6694 [X86][BtVer2] Add support for all SUB/XOR 32/64 scalar instructions that should match the dependency-breaking 'zero-idiom'
As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), these instructions are dependency breaking and fast-path zero the destination register (and appropriate EFLAGS bits).

llvm-svn: 334303
2018-06-08 17:00:45 +00:00
Daniil Fukalov c9a098b314 [AMDGPU] Inline asm - added i16, half and i128 types support
AMDGPU inline assembler support i16, half and i128 typed variables in constraints, but they were reported as error.
Needed to fix https://github.com/RadeonOpenCompute/ROCm/issues/341,
e.g. to be able to load with global_load_dwordx4 to a 128bit integer variable

Differential Revision: https://reviews.llvm.org/D44920

llvm-svn: 334301
2018-06-08 16:29:04 +00:00
Daniil Fukalov 37433dc2e1 reapply r334209 with fixes for harfbuzz in Chromium
r334209 description:
[LSR] Check yet more intrinsic pointer operands

the patch fixes another assertion in isLegalUse()

Differential Revision: https://reviews.llvm.org/D47794

llvm-svn: 334300
2018-06-08 16:22:52 +00:00
Roman Lebedev f87321a2dc [NFC][InstSimplify] SimplifyAddInst(): coding style: variable names.
llvm-svn: 334299
2018-06-08 15:44:53 +00:00
Roman Lebedev b060ce45ca [InstSimplify] add nuw %x, -1 -> -1 fold.
Summary:
`%ret = add nuw i8 %x, C`
From [[ https://llvm.org/docs/LangRef.html#add-instruction | langref ]]:
    nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”,
    respectively. If the nuw and/or nsw keywords are present,
    the result value of the add is a poison value if unsigned
    and/or signed overflow, respectively, occurs.

So if `C` is `-1`, `%x` can only be `0`, and the result is always `-1`.

I'm not sure we want to use `KnownBits`/`LVI` here, because there is
exactly one possible value (all bits set, `-1`), so some other pass
should take care of replacing the known-all-ones with constant `-1`.

The `test/Transforms/InstCombine/set-lowbits-mask-canonicalize.ll` change *is* confusing.
What happening is, before this: (omitting `nuw` for simplicity)
1. First, InstCombine D47428/rL334127 folds `shl i32 1, %NBits`) to `shl nuw i32 -1, %NBits`
2. Then, InstSimplify D47883/rL334222 folds `shl nuw i32 -1, %NBits` to `-1`,
3. `-1` is inverted to `0`.
But now:
1. *This* InstSimplify fold `%ret = add nuw i32 %setbit, -1` -> `-1` happens first,
   before InstCombine D47428/rL334127 fold could happen.
Thus we now end up with the opposite constant,
and it is all good: https://rise4fun.com/Alive/OA9

https://rise4fun.com/Alive/sldC
Was mentioned in D47428 review.
Follow-up for D47883.

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47908

llvm-svn: 334298
2018-06-08 15:44:47 +00:00
Alexander Kornienko f084913223 commandLineFitsWithinSystemLimits Overestimates System Limits
Summary:
The function `llvm::sys::commandLineFitsWithinSystemLimits` appears to be overestimating the system limits. This issue was discovered while attempting to enable response files in the Swift compiler. When the compiler submits its frontend jobs, those jobs are subjected to the system limits on command line length. `commandLineFitsWithinSystemLimits` is used to determine if the job's arguments need to be wrapped in a response file. There are some cases where the argument size for the job passes `commandLineFitsWithinSystemLimits`, but actually exceeds the real system limit, and the job fails.

`clang` also uses this function to decide whether or not to wrap it's job arguments in response files. See: https://github.com/llvm-mirror/clang/blob/master/lib/Driver/Driver.cpp#L1341. Clang will also fail for response files who's size falls within a certain range. I wrote a script that should find a failure point for `clang++`. All that is needed to run it is Python 2.7, and a simple "hello world" program for `test.cc`. It should run on Linux and on macOS. The script is available here: https://gist.github.com/dabelknap/71bd083cd06b91c5b3cef6a7f4d3d427. When it hits a failure point, you should see a `clang: error: unable to execute command: posix_spawn failed: Argument list too long`.

The proposed solution is to mirror the behavior of `xargs` in `commandLinefitsWithinSystemLimits`. `xargs` defaults to 128k for the command line length size (See: https://fossies.org/dox/findutils-4.6.0/buildcmd_8c_source.html#l00551). It adjusts this depending on the value of `ARG_MAX`.

Reviewers: alexfh

Reviewed By: alexfh

Subscribers: llvm-commits

Tags: #clang

Patch by Austin Belknap!

Differential Revision: https://reviews.llvm.org/D47795

llvm-svn: 334295
2018-06-08 15:19:16 +00:00
Zachary Turner 66ef5d3cd6 Clean up some code in Program.
NFC here, this just raises some platform specific ifdef hackery
out of a class and creates proper platform-independent typedefs
for the relevant things.  This allows these typedefs to be
reused in other places without having to reinvent this preprocessor
logic.

llvm-svn: 334294
2018-06-08 15:16:25 +00:00
Zachary Turner 6edfecb883 Add a file open flag that disables O_CLOEXEC.
O_CLOEXEC is the right default, but occasionally you don't
want this.  This is especially true for tools like debuggers
where you might need to spawn the child process with specific
files already open, but it's occasionally useful in other
scenarios as well, like when you want to do some IPC between
parent and child.

llvm-svn: 334293
2018-06-08 15:15:56 +00:00
Simon Pilgrim a6afa310c9 [X86][SSE] Simplify combineVectorTruncationWithPACKUS to reduce code duplication
Simplify combineVectorTruncationWithPACKUS to mask the upper bits followed by calling truncateVectorWithPACK instead of duplicating with similar code.

This results in the codegen using (V)PACKUSDW on SSE41+ targets for vXi64/vXi32 inputs where before it always used PACKUSWB (along with a lot more bitcasting).

I've raised PR37749 as until we avoid unnecessary concats back to 256-bit for bitwise ops, we can't avoid splitting the input value into 128-bit subvectors for masking.

llvm-svn: 334289
2018-06-08 13:59:11 +00:00
Artur Pilipenko 4d063e7bb1 [BPI] Apply invoke heuristic before loop branch heuristic
Currently the loop branch heuristic is applied before the invoke heuristic which makes us overestimate the probability of the unwind destination of invokes inside loops. This in turn makes us grossly underestimate the frequencies of loops with invokes.

Reviewed By: skatkov, vsk

Differential Revision: https://reviews.llvm.org/D47371

llvm-svn: 334285
2018-06-08 13:03:21 +00:00
Florian Hahn b3c6f07dde [VPlan] Move recipe based VPlan generation to separate function.
This first step separates VPInstruction-based and VPRecipe-based
VPlan creation, which should make it easier to migrate to VPInstruction
based code-gen step by step.

Reviewers: Ayal, rengolin, dcaballe, hsaito, mkuper, mzolotukhin

Reviewed By: dcaballe

Subscribers: bollu, tschuett, rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D47477

llvm-svn: 334284
2018-06-08 12:53:51 +00:00
Simon Dardis 1d6254f7e9 [mips] Correct the predicates for a number of codegen only instructions
Reviewers: smaksimovic, atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D47638

llvm-svn: 334280
2018-06-08 10:55:34 +00:00
Alex Bradbury ed53ca73ec [RISCV] Implement MC layer support for the fence.tso instruction
The instruction makes use of a previously ignored field in the fence
instruction. It is introduced in the version 2.3 draft of the RISC-V
specification after much work by the Memory Model Task Group.

As clarified here <https://github.com/riscv/riscv-isa-manual/issues/186>,
the fence.tso assembler mnemonic does not have operands.

llvm-svn: 334278
2018-06-08 10:39:05 +00:00
Simon Pilgrim ad45efc445 [X86][SSE] Consistently prefer lowering to PACKUS over PACKSS
We have some combines/lowerings that attempt to use PACKSS-then-PACKUS and others that use PACKUS-then-PACKSS.

PACKUS is much easier to combine with if we know the upper bits are zero as ComputeKnownBits can easily see through BITCASTs etc. especially now that rL333995 and rL334007 have landed. It also effectively works at byte level which further simplifies shuffle combines.

The only (minor) annoyances are that ComputeKnownBits can sometimes take longer as it doesn't fail as quickly as ComputeNumSignBits (but I'm not seeing any actual regressions in tests) and PACKUSDW only became available after SSE41 so we have more codegen diffs between targets.

llvm-svn: 334276
2018-06-08 10:29:00 +00:00
Roman Shirokiy 9ba0aa2da0 [LV] Fix PR36983. For a given recurrence, fix all phis in exit block
There could be more than one PHIs in exit block using same loop recurrence.
Don't assume there is only one and fix each user.

Differential Revision: https://reviews.llvm.org/D47788

llvm-svn: 334271
2018-06-08 08:21:20 +00:00
Matt Arsenault 6fc3759811 AMDGPU: Error on LDS global address in functions
These won't work as expected now, so error on them to avoid
wasting time debugging this in the future.

llvm-svn: 334269
2018-06-08 08:05:54 +00:00
Sam Parker 16f963ba0d [DAGCombine] Fix for PR37667
While trying to propagate AND masks back to loads, we currently allow
one non-load node to be included as a leaf in chain. This fix now
limits that node to produce only a single data value.

Differential Revision: https://reviews.llvm.org/D47878

llvm-svn: 334268
2018-06-08 07:49:04 +00:00
Hiroshi Inoue 863fb7a4b8 [NFC] fix formatting
llvm-svn: 334263
2018-06-08 04:00:54 +00:00
Craig Topper 2ed4328281 [X86] Improve some shuffle decoding code to remove a conditional from a loop and reduce the number of temporary variables. NFCI
The NumControlBits variable was definitely sketchy. I think that only worked because the expected value was 1 or 2 and the number of lanes was 2 or 4. Had their been 8 lanes the number of bits should have been 3 not 4 as the previous code would have given.

llvm-svn: 334258
2018-06-08 01:09:31 +00:00
Tony Tye 6db1f5da4f [AMDGPU] Simplify memory legalizer (add missing virtual descructor)
Differential Revision: https://reviews.llvm.org/D47504

llvm-svn: 334257
2018-06-08 01:00:11 +00:00
Reid Kleckner a3609f75b2 Revert r334209 "[LSR] Check yet more intrinsic pointer operands"
This causes cast failures when compiling harfbuzz in Chromium.
Reproducer on the way.

llvm-svn: 334254
2018-06-08 00:43:27 +00:00
Zachary Turner 9d2cfa6ccc Expose a single global file open function.
This one allows much more flexibility than the standard
openFileForRead / openFileForWrite functions.  Since there is now
just one "real" function that does the work, all other implementations
simply delegate to this one.

llvm-svn: 334246
2018-06-07 23:25:13 +00:00
Michael Berg 77b5be7ec6 propagate fast math flags via IR on fma and sub expressions
Summary: This change uses fmf subflags to guard fma optimizations as well as unsafe. These changes originated from D46483 and have been simplified via getNode.

Reviewers: spatel, arsenm, hfinkel, javed.absar

Reviewed By: spatel

Subscribers: nemanjai, wdng

Differential Revision: https://reviews.llvm.org/D47388

llvm-svn: 334242
2018-06-07 22:49:09 +00:00
Tony Tye a5a7c331e7 [AMDGPU] Simplify memory legalizer
- Make code easier to maintain.
- Avoid generating waitcnts for VMEM if the address sppace does not involve VMEM.
- Add support to generate waitcnts for LDS and GDS memory.

Differential Revision: https://reviews.llvm.org/D47504

llvm-svn: 334241
2018-06-07 22:28:32 +00:00
Petr Hosek 623e2c928a [Support] Link libzircon.so when building LLVM for Fuchsia
This is necessary for zx_* symbols.

Differential Revision: https://reviews.llvm.org/D47848

llvm-svn: 334232
2018-06-07 21:01:32 +00:00
Zachary Turner 154a72ddf7 Fix unused private variable.
This parameter got lost in the refactor.  Add it back.

llvm-svn: 334223
2018-06-07 20:07:08 +00:00
Roman Lebedev 2683802ba0 [InstSimplify] shl nuw C, %x -> C iff signbit is set on C.
Summary:
`%r = shl nuw i8 C, %x`

As per langref:
```
If the nuw keyword is present, then the shift produces
a poison value if it shifts out any non-zero bits.
```
Thus, if the sign bit is set on `C`, then `%x` can only be `0`,
which means that `%r` can only be `C`.
Or in other words, set sign bit means that the signed value
is negative, so the constant is `<= 0`.

https://rise4fun.com/Alive/WMk
https://rise4fun.com/Alive/udv

Was mentioned in D47428 review.

We already handle the `0` constant, https://godbolt.org/g/UZq1sJ, so this only handles negative constants.

Could use computeKnownBits() / LazyValueInfo,
but the cost-benefit analysis (https://reviews.llvm.org/D47891)
suggests it isn't worth it.

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47883

llvm-svn: 334222
2018-06-07 20:03:45 +00:00
Zachary Turner 1f67a3cba9 [FileSystem] Split up the OpenFlags enumeration.
This breaks the OpenFlags enumeration into two separate
enumerations: OpenFlags and CreationDisposition.  The first
controls the behavior of the API depending on whether or not
the target file already exists, and is not a flags-based
enum.  The second controls more flags-like values.

This yields a more easy to understand API, while also allowing
flags to be passed to the openForRead api, where most of the
values didn't make sense before.  This also makes the apis more
testable as it becomes easy to enumerate all the configurations
which make sense, so I've added many new tests to exercise all
the different values.

llvm-svn: 334221
2018-06-07 19:58:58 +00:00
Matt Arsenault e8eb567e17 DAG: Avoid bitcast/ext/build_vector combine
This avoids regressions in a future AMDGPU change
to make v4i16/v4f16 legal. For these types, build_vector
is implemented as bitcasted operations on v2i32. This
combine was creating v4i16s out of what would have been
already been a v2i32 build_vector, creating a mess
of nodes that never get cleaned up.

I'm not sure this is the right condition to check.
I initially tried just checking for the legality of the
new build_vector. This works for my case, but breaks dozens
of x86 tests. A Mips test seems to show some improvement
or at least a neutral change. I don't want to think
about how long it would take to analyze the set of
different x86 vector operations impacted.

Test included in future commit.

llvm-svn: 334218
2018-06-07 19:42:27 +00:00
Sanjay Patel 4b1205b40f [TargetLibraryInfo] add mappings from LLVM sin/cos intrinsics to SVML calls
These weren't included in D19544 - probably just an oversight.
D40044 made it more likely that we'll have LLVM math intrinsics rather 
than libcalls, so this bug was more easily exposed.
As the tests/code show, we already have the complete mappings for pow/exp/log.

I don't have any experience with SVML, so I don't know if anything else is 
missing. It's also not clear to me that we should be doing this transform in 
IR rather than DAG/isel, but that's a separate issue.

Differential Revision: https://reviews.llvm.org/D47610

llvm-svn: 334211
2018-06-07 18:21:24 +00:00
Daniil Fukalov 12c0663a25 [LSR] Check yet more intrinsic pointer operands
the patch fixes another assertion in isLegalUse()

Differential Revision: https://reviews.llvm.org/D47794

llvm-svn: 334209
2018-06-07 17:30:58 +00:00
Simon Pilgrim 51ceef775b [X86][SSE] Updated comment - combineVectorSignBitsTruncation handles PACKSS and PACKUS. NFCI.
llvm-svn: 334204
2018-06-07 16:08:40 +00:00
Alex Bradbury 6a4b5441e4 [RISCV] AsmParser support for the li pseudo instruction
The implementation follows the MIPS backend and expands the pseudo instruction 
directly during asm parsing. As the result, only real MC instructions are 
emitted to the MCStreamer. The actual expansion to real instructions is 
similar to the expansion performed by the GNU Assembler.

This patch supersedes D41949.

Differential Revision: https://reviews.llvm.org/D46118
Patch by Mario Werner.

llvm-svn: 334203
2018-06-07 15:35:47 +00:00
Alex Bradbury 6cfb31c7c1 [AVR] Fix build after r334078
r334078 added MCSubtargetInfo to fixupNeedsRelaxation and applyFixup. This 
patch makes the necessary adjustment for the AVR target.

llvm-svn: 334202
2018-06-07 15:29:09 +00:00
Simon Pilgrim 51ff15f472 [X86][SSE] Simplify combineVectorTruncationWithPACKUS. NFCI.
Move code only used by combineVectorTruncationWithPACKUS out of combineVectorTruncation.

llvm-svn: 334201
2018-06-07 14:53:32 +00:00
Hiroshi Inoue 01ef4c2c64 [PowerPC] avoid unprofitable Repl32 flag in BitPermutationSelector
BitPermutationSelector sets Repl32 flag for bit groups which can be (potentially) benefit from 32-bit rotate-and-mask instructions with bit replication, i.e. rlwinm/rlwimi copies lower 32 bits into upper 32 bits on 64-bit PowerPC before rotation.
However, enforcing 32-bit instruction sometimes results in redundant generated code.
For example, the following simple code is compiled into rotldi + rlwimi while it can be compiled into only rldimi instruction if Repl32 flag is not set on the bit group for (a & 0xFFFFFFFF).

uint64_t func(uint64_t a, uint64_t b) {
	return (a & 0xFFFFFFFF) | (b << 32) ;
}

To avoid such problem, this patch checks the potential benefit of Repl32 flag before setting it. If a bit group does not require rotation (i.e. RLAmt == 0) and won't be merged into another group, we do not benefit from Repl32 flag on this group.

Differential Revision: https://reviews.llvm.org/D47867

llvm-svn: 334195
2018-06-07 13:21:14 +00:00
Petar Jovanovic 241f286bd7 [Mips] Silencing warnings in instruction info (NFC)
isORCopyInst and isReadOrWriteToDSPReg functions were producing warning
that some statements my fall through.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D47876

llvm-svn: 334194
2018-06-07 13:06:06 +00:00
Simon Pilgrim 09953d8412 [X86][SSE] Simplify combineVectorTruncationWithPACKSS to reduce code duplication
Simplify combineVectorTruncationWithPACKSS to just a SIGN_EXTEND_INREG followed by using the existing truncateVectorWithPACK instead of duplicating code.

llvm-svn: 334193
2018-06-07 13:01:42 +00:00
Hiroshi Inoue b557846083 [PowerPC] fix trivial typos in comment, NFC
llvm-svn: 334191
2018-06-07 12:49:12 +00:00
Matt Arsenault f1c868ef08 AMDGPU: Fix not including v2f64 in SReg_128
Fixes assertion with calls returning v2f64.

llvm-svn: 334189
2018-06-07 12:16:31 +00:00
Florian Hahn 0d6b01761c [Mem2Reg] Avoid replacing load with itself in promoteSingleBlockAlloca.
We do the same thing in rewriteSingleStoreAlloca.

Fixes PR37632.

Reviewers: chandlerc, davide, efriedma

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D47825

llvm-svn: 334187
2018-06-07 11:09:05 +00:00
Matt Arsenault 697300bd4f AMDGPU: Use scalar operations for f16 fabs/fneg patterns
Fixes unnecessary differences between subtargets.

llvm-svn: 334184
2018-06-07 10:15:20 +00:00
Matt Arsenault 90083d3088 AMDGPU: Try a lot harder to emit scalar loads
This has two main components. First, widen
widen short constant loads in DAG when they have
the correct alignment. This is already done a bit in
AMDGPUCodeGenPrepare, since that has access to
DivergenceAnalysis. This can't help kernarg loads
created in the DAG. Start to use DAG divergence analysis
to help this case.

The second part is to avoid kernel argument lowering
breaking the alignment of short vector elements because
calling convention lowering wants to split everything
into legal register types.

When loading a split type, load the nearest 4-byte aligned
segment and shift to get the desired bits. This extra
load of the earlier argument piece ends up merging,
and the bit extract hopefully folds out.

There are a number of improvements and regressions with
this, but I think as-is this is a better compromise between
several of the worst parts of SelectionDAG.

Particularly when i16 is legal, this produces worse code
for i8 and i16 element vector kernel arguments. This is
partially due to the very weak load merging the DAG does.
It only looks for fairly specific combines between pairs
of loads which no longer appear. In particular this
causes v4i16 loads to be split into 2 components when
previously the two halves were merged.

Worse, because of the newly introduced shifts, there
is a lot more unnecessary vector packing and unpacking code
emitted. At least some of this is due to reporting
false for isTypeDesirableForOp for i16 as a workaround for
the lack of divergence information in the DAG. The cases
where this happens it doesn't actually matter, but the
relevant code in SimplifyDemandedBits doens't have the context
to know to ignore this.

The use of the  scalar cache is probably more important
than the mess of mostly scalar instructions doing this packing
and unpacking. Future work can fix this, possibly by making better
use of the new DAG divergence information for controlling promotion
decisions, or adding another version of shift + trunc + shift
combines that doesn't only know about the used types.

llvm-svn: 334180
2018-06-07 09:54:49 +00:00
Clement Courbet 4281b1d3b5 [X86][NFC] Fix harmless typo in BtVer2 model.
See D46356 for context.

llvm-svn: 334178
2018-06-07 09:26:33 +00:00
Tomasz Krupa f8c7637027 [X86] Block UndefRegUpdate
Summary: Prevent folding of operations with memory loads when one of the sources has undefined register update.

Reviewers: craig.topper

Subscribers: llvm-commits, mike.dvoretsky, ashlykov

Differential Revision: https://reviews.llvm.org/D47621

llvm-svn: 334175
2018-06-07 08:48:45 +00:00
Max Kazantsev b4b2ccea6d [NFC] Use variable instead of accessing pair many times
llvm-svn: 334173
2018-06-07 08:47:19 +00:00
Tomasz Krupa 145825162a Test commit access.
Added a bunch of periods after comments.

llvm-svn: 334171
2018-06-07 08:20:28 +00:00
Clement Courbet 9212ef0a0a [X86][NFC] Fix harmless typos in BDW/ZnVer1 sched models.
See D46356 for context.

llvm-svn: 334164
2018-06-07 07:37:49 +00:00
Karl-Johan Karlsson abb11f805f [BranchFolding] Fix live-in's when hoisting code
Summary:
When the branch folder hoist code into a predecessor it adjust live-in's
in the blocks it hoist code from. However it fail to handle hoisted code
that contain a defed register that originally is live-in in the block
through a super register.

This is fixed by replacing the live-in handling code with calls to
utility functions in LivePhysRegs.

Reviewers: kparzysz, gberry, MatzeB, uweigand, aprantl

Reviewed By: kparzysz

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47529

llvm-svn: 334163
2018-06-07 07:20:33 +00:00
Jonas Paulsson e80d405760 [SystemZ] Build Load And Test from scratch in convertToLoadAndTest.
This is needed to get CC operand in right place, as expected by the
SchedModel.

Review: Ulrich Weigand
https://reviews.llvm.org/D47820

llvm-svn: 334161
2018-06-07 05:59:07 +00:00
Michael Zolotukhin 31800864dc SpeculativeExecution Pass: Set PreserveCFG to avoid unnecessary analyses invalidation.
The pass doesn't touch CFG in any way, only moves instructions between
blocks.

llvm-svn: 334150
2018-06-07 00:19:29 +00:00
Stanislav Mekhanoshin df61be70b2 [AMDGPU] Improve reciprocal handling
When denormals are supported we are producing a full division for
1.0f / x. That still can be replaced by the faster version:

    bool c = fabs(x) > 0x1.0p+96f;
    float s = c ? 0x1.0p-32f : 1.0f;
    x *= s;
    return s * v_rcp_f32(x)

in case if requested accuracy is 2.5ulp or less. The same version
is used if denormals are not supported for non 1.0 numerators, where
just v_rcp_f32 is then used for 1.0 numerator.

The optimization of 1/x is extended to the case -1/x, which is the
same except for the resulting sign bit.

OpenCL conformance passed with both enabled and disabled denorms.

Differential Revision: https://reviews.llvm.org/D47805

llvm-svn: 334142
2018-06-06 22:22:32 +00:00
Teresa Johnson 4ffc3e7834 [ThinLTO] Rename index IsAnalysis flag to HaveGVs (NFC)
With the upcoming patch to add summary parsing support, IsAnalysis would
be true in contexts where we are not performing module summary analysis.
Rename to the more specific and approprate HaveGVs, which is essentially
what this flag is indicating.

llvm-svn: 334140
2018-06-06 22:22:01 +00:00
Sanjay Patel 3cd1aa88f9 [InstCombine] fold another shifty abs pattern to cmp+sel (PR36036)
The bug report:
https://bugs.llvm.org/show_bug.cgi?id=36036

...requests a DAG change for this, but an IR canonicalization
probably handles most cases. If we still want to match this
pattern in the backend, there's a proposal for that too:
D47831

Alive proofs including nsw/nuw cases that were first noted in:
D46988

https://rise4fun.com/Alive/Kmp

This patch is largely copied from the existing code that was
initially added with:
D40984
...but I didn't see much gain from trying to share code.

llvm-svn: 334137
2018-06-06 21:58:12 +00:00
Matt Arsenault e9524f1fb3 AMDGPU: Custom lower v2f16 fneg/fabs with illegal f16
Fixes terrible code on targets without f16 support. The
legalization creates a mess that is difficult to recover
from. Also should avoid randomly breaking these tests
multiple times in sequence in future commits.

Some regressions in cases where it happens to be better
to pull the source modifier after the conversion.

llvm-svn: 334132
2018-06-06 21:28:11 +00:00
Roman Lebedev cbf8446359 [InstCombine] PR37603: low bit mask canonicalization
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37603 | PR37603 ]].

https://godbolt.org/g/VCMNpS
https://rise4fun.com/Alive/idM

When doing bit manipulations, it is quite common to calculate some bit mask,
and apply it to some value via `and`.

The typical C code looks like:
```
int mask_signed_add(int nbits) {
    return (1 << nbits) - 1;
}
```
which is translated into (with `-O3`)
```
define dso_local i32 @mask_signed_add(int)(i32) local_unnamed_addr #0 {
  %2 = shl i32 1, %0
  %3 = add nsw i32 %2, -1
  ret i32 %3
}
```

But there is a second, less readable variant:
```
int mask_signed_xor(int nbits) {
    return ~(-(1 << nbits));
}
```
which is translated into (with `-O3`)
```
define dso_local i32 @mask_signed_xor(int)(i32) local_unnamed_addr #0 {
  %2 = shl i32 -1, %0
  %3 = xor i32 %2, -1
  ret i32 %3
}
```

Since we created such a mask, it is quite likely that we will use it in `and` next.
And then we may get rid of `not` op by folding into `andn`.

But now that i have actually looked:
https://godbolt.org/g/VTUDmU
_some_ backend changes will be needed too.
We clearly loose `bzhi` recognition.

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47428

llvm-svn: 334127
2018-06-06 19:38:27 +00:00
Roman Lebedev 488d28d4e5 [X86] Emit BZHI when mask is ~(-1 << nbits))
Summary:
In D47428, i propose to choose the `~(-(1 << nbits))` as the canonical form of low-bit-mask formation.
As it is seen from these tests, there is a reason for that.

AArch64 currently better handles `~(-(1 << nbits))`, but not the more traditional `(1 << nbits) - 1` (sic!).
The other way around for X86.
It would be much better to canonicalize.

This patch is completely monkey-typing.
I don't really understand how this works :)
I have based it on `// x & (-1 >> (32 - y))` pattern.

Also, when we only have `BMI`, i wonder if we could use `BEXTR` with `start=0` ?

Related links:
https://bugs.llvm.org/show_bug.cgi?id=36419
https://bugs.llvm.org/show_bug.cgi?id=37603
https://bugs.llvm.org/show_bug.cgi?id=37610
https://rise4fun.com/Alive/idM

Reviewers: craig.topper, spatel, RKSimon, javed.absar

Reviewed By: craig.topper

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D47453

llvm-svn: 334125
2018-06-06 19:38:16 +00:00
Krzysztof Parzyszek c1e712baa5 [Hexagon] Implement vector-pair zero as V6_vsubw_dv
llvm-svn: 334123
2018-06-06 19:34:40 +00:00
Craig Topper ef813a5226 [X86] Properly disassemble gather/scatter instructions where xmm4/ymm4/zmm4 are used as the index.
These encodings correspond to the cases in the normal encoding scheme where there is no index and our modrm reading code initially decodes it as such. The VSIB handling code tried to compensate for this, but failed to add the base needed to make later code do the right thing.

Fixes PR37712.

llvm-svn: 334121
2018-06-06 19:15:15 +00:00
Craig Topper d04cc8e640 [X86] Rename vy512mem->vy512xmem and vz256xmem->vz256mem.
The index size is represented by the letter after the 'v'. The number represents the memory size. If an 'x' appears after the number its means the index register can be from VR128X/VR256X instead of VR128/VR256.

As vy512mem uses a VR256X index it should have an x.
And vz256mem uses a VR512 index so it shouldn't have an x.

I admit these names kind of suck and are confusing.

llvm-svn: 334120
2018-06-06 19:15:12 +00:00
Simon Pilgrim aef5bdbea1 [X86][BtVer2] Add support for all vector instructions that should match the dependency-breaking 'zero-idiom'
As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), all these instructions are dependency breaking and zero the destination register.

llvm-svn: 334119
2018-06-06 19:06:09 +00:00
Evandro Menezes b2c8244715 [AArch64, ARM] Add support for Samsung Exynos M4
Create a separate feature set for Exynos M4 and add test cases.

llvm-svn: 334115
2018-06-06 18:56:00 +00:00
Michael Berg cc1c4b6912 guard fsqrt with fmf sub flags
Summary:
This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483.
It contains only context for fsqrt.


Reviewers: spatel, hfinkel, arsenm

Reviewed By: spatel

Subscribers: hfinkel, wdng, andrew.w.kaylor, wristow, efriedma, nemanjai

Differential Revision: https://reviews.llvm.org/D47749

llvm-svn: 334113
2018-06-06 18:47:55 +00:00
Krzysztof Parzyszek 0da1fe3770 [Hexagon] Split CTPOP of vector pairs
llvm-svn: 334109
2018-06-06 18:03:29 +00:00
Petar Jovanovic 8cb6a521be Change TII isCopyInstr way of returning arguments(NFC)
Make TII isCopyInstr() return MachineOperands through pointer to pointer
instead via reference.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D47364

llvm-svn: 334105
2018-06-06 16:36:30 +00:00
David Green 25312b2b6c [GlobalMerge] Set the alignment on merged global structs
If no alignment is set, the abi/preferred alignment of structs will be
used which may be higher than required. This can lead to extra padding
and in the end an increase in data size.

Differential Revision: https://reviews.llvm.org/D47633

llvm-svn: 334099
2018-06-06 14:48:32 +00:00
Tim Northover 9b80060d7b InstCombine: ignore debug instructions during fence combine
We should never get different CodeGen based on whether the code is being
compiled in debug mode so we must skip over @llvm.dbg.value (and similar)
calls.

Should fix at least the worst part of PR37690.

llvm-svn: 334090
2018-06-06 12:46:02 +00:00
Simon Pilgrim f06ff16049 Fix MSVC '*/' found outside of comment warning. NFCI.
llvm-svn: 334086
2018-06-06 11:10:11 +00:00
Ilya Biryukov 3c9c10649b Fix compilation of WebAssembly and RISCV after r334078
llvm-svn: 334085
2018-06-06 10:57:50 +00:00
Simon Pilgrim 3d14158891 [X86][BMI][TBM] Only demand bottom 16-bits of the BEXTR control op (PR34042)
Only the bottom 16-bits of BEXTR's control op are required (0:8 INDEX, 15:8 LENGTH).

Differential Revision: https://reviews.llvm.org/D47690

llvm-svn: 334083
2018-06-06 10:52:10 +00:00
Peter Smith 57f661bd7d [MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixup
On targets like Arm some relaxations may only be performed when certain
architectural features are available. As functions can be compiled with
differing levels of architectural support we must make a judgement on
whether we can relax based on the MCSubtargetInfo for the function. This
change passes through the MCSubtargetInfo for the function to
fixupNeedsRelaxation so that the decision on whether to relax can be made
per function. In this patch, only the ARM backend makes use of this
information. We must also pass the MCSubtargetInfo to applyFixup because
some fixups skip error checking on the assumption that relaxation has
occurred, to prevent code-generation errors applyFixup must see the same
MCSubtargetInfo as fixupNeedsRelaxation.

Differential Revision: https://reviews.llvm.org/D44928

llvm-svn: 334078
2018-06-06 09:40:06 +00:00
Petar Jovanovic 326ec32403 [MIPS GlobalISel] Add lowerCall
Add minimal support to lower function calls.
Support only functions with arguments/return that go through registers
and have type i32.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D45627

llvm-svn: 334071
2018-06-06 07:24:52 +00:00
Petr Hosek fc9b29bd61 [Support] Use zx_cache_flush on Fuchsia to flush instruction cache
Fuchsia doesn't use __clear_cache, instead it provide zx_cache_flush
system call. Use it to flush instruction cache.

Differential Revision: https://reviews.llvm.org/D47753

llvm-svn: 334068
2018-06-06 06:26:18 +00:00
Sanjay Patel 59313be8d3 [CodeGen] assume max/default throughput for unspecified instructions
This is a fix for the problem arising in D47374 (PR37678):
https://bugs.llvm.org/show_bug.cgi?id=37678

We may not have throughput info because it's not specified in the model 
or it's not available with variant scheduling, so assume that those
instructions can execute/complete at max-issue-width.

Differential Revision: https://reviews.llvm.org/D47723

llvm-svn: 334055
2018-06-05 23:34:45 +00:00
Amaury Sechet a79b6b3ef0 [Mips] Remove uneeded variants of ADDC/ADDE lowering
Summary: As it turns out, the lowering for the Mips16* family of target is the exact same thing as what the ops expands to, so the code handling them can be removed and the ops only enabled for the MipsSE* family of targets.

Reviewers: smaksimovic, atanasyan, abeserminji

Subscribers: sdardis, arichardson, llvm-commits

Differential Revision: https://reviews.llvm.org/D47703

llvm-svn: 334052
2018-06-05 22:13:56 +00:00
Guozhi Wei c4c6b548c5 [CodeGenPrepare] Move Extension Instructions Through Logical And Shift Instructions
CodeGenPrepare pass move extension instructions close to load instructions in different BB, so they can be combined later. But the extension instructions can't move through logical and shift instructions in current implementation. This patch enables this enhancement, so we can eliminate more extension instructions.

Differential Revision: https://reviews.llvm.org/D45537

This is re-commit of r331783, which was reverted by r333305. The performance regression was caused by some unlucky alignment, not a code generation problem.

llvm-svn: 334049
2018-06-05 21:03:52 +00:00
Zachary Turner 8ac1c38a72 [FileSystem] Remove OpenFlags param from several functions.
There was only one place in the entire codebase where a non
default value was being passed, and that place was already hidden
in an implementation file.  So we can delete the extra parameter
and all existing clients continue to work as they always have,
while making the interface a bit simpler.

Differential Revision: https://reviews.llvm.org/D47789

llvm-svn: 334046
2018-06-05 19:58:26 +00:00
Matt Arsenault 57e541e87e AMDGPU: Preserve metadata when widening loads
Preserves the low bound of the !range. I don't think
it's legal to do anything with the top half since it's
theoretically reading garbage.

llvm-svn: 334045
2018-06-05 19:52:56 +00:00
Matt Arsenault 9224c00d2b AMDGPU: Use more custom insert/extract_vector_elt lowering
Apply to i8 vectors.

llvm-svn: 334044
2018-06-05 19:52:46 +00:00
Krzysztof Parzyszek b984ffcc71 [Hexagon] Add pattern to generate 64-bit neg instruction
llvm-svn: 334043
2018-06-05 19:52:39 +00:00
Krzysztof Parzyszek d8b093efef [Hexagon] Add more patterns for generating abs/absp instructions
llvm-svn: 334038
2018-06-05 19:00:50 +00:00
Michael Berg 96925fe0df guard fneg with fmf sub flags
Summary: This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483.

Reviewers: spatel, hfinkel

Reviewed By: spatel

Subscribers: nemanjai

Differential Revision: https://reviews.llvm.org/D47389

llvm-svn: 334037
2018-06-05 18:49:47 +00:00
Simon Dardis 0d95ff03f2 [mips] Fix the predicates for arithmetic operations
Reviewers: smaksimovic, atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D47635

llvm-svn: 334031
2018-06-05 17:53:22 +00:00
Simon Pilgrim f2f043acbb [X86][SSE] Use multiplication scale factors for v8i16 SHL on pre-AVX2 targets.
Similar to v4i32 SHL, convert v8i16 shift amounts to scale factors instead to improve performance and reduce instruction count. We were already doing this for constant shifts, this adds variable shift support.

Reduces the serial nature of the codegen, which relies on chains of plendvb/pand+pandn+por shifts.

This is a step towards adding support for vXi16 vector rotates.

Differential Revision: https://reviews.llvm.org/D47546

llvm-svn: 334023
2018-06-05 15:17:39 +00:00
Nirav Dave 05b589101e [MC][X86] Allow assembler variable assignment to register name.
Summary:
Allow extended parsing of variable assembler assignment syntax and modify X86 to permit
VAR = register assignment. As we emit these as .set directives when possible, we inline
such expressions in output assembly.

Fixes PR37425.

Reviewers: rnk, void, echristo

Reviewed By: rnk

Subscribers: nickdesaulniers, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D47545

llvm-svn: 334022
2018-06-05 15:13:39 +00:00
Matt Arsenault 191bc71541 DAG: Stop dropping invariant/dereferencable
When legalizing illegal FP load results, this was
for some reason dropping the invariant and dereferencable
memory flags. There doesn't seem to be any reason for this,
and the equivalent isn't done for integer loads.

Fixes an issue in a future AMDGPU commit where some identical
loads fail to merge because one of the loads ends up
dropping the flags.

llvm-svn: 334020
2018-06-05 14:52:24 +00:00
John Brawn e4ff0bd401 [InstCombine] Correct the cmp operand type used when canonicalizing abs/nabs
When adjusting a cmp in order to canonicalize an abs/nabs select pattern we need
to use the type of the existing operand when creating a new operand not the
type of a select operand, as the two may be different.

This fixes PR37686.

llvm-svn: 334019
2018-06-05 14:10:55 +00:00
Gabor Buella 1181f94ae4 [X86] NFC Fix typo introduced in r328016 HSI->HDI
llvm-svn: 334016
2018-06-05 12:55:12 +00:00
Krzysztof Parzyszek aafb8c204c [Hexagon] Minor cleanups in isel lowering
llvm-svn: 334015
2018-06-05 12:49:19 +00:00
Hiroshi Inoue 955655f558 [PowerPC] reduce rotate in BitPermutationSelector
BitPermutationSelector builds the output value by repeating rotate-and-mask instructions with input registers.
Here, we may avoid one rotate instruction if we start building from an input register that does not require rotation.

For example of the test case bitfieldinsert.ll, it first rotates left r4 by 8 bits and then inserts some bits from r5 without rotation.
This can be executed by one rlwimi instruction, which rotates r4 by 8 bits and inserts its bits into r5.

This patch adds a check for rotation amounts in the comparator used in sorting to process the input without rotation first.

Differential Revision: https://reviews.llvm.org/D47765

llvm-svn: 334011
2018-06-05 11:58:01 +00:00
Simon Pilgrim fef9b6eea6 [X86][SSE] Add target shuffle support to X86TargetLowering::computeKnownBitsForTargetNode
Ideally we'd use resolveTargetShuffleInputs to handle faux shuffles as well but:
(a) that code path doesn't handle general/pre-legalized ops/types very well.
(b) I'm concerned about the compute time as they recurse to calls to computeKnownBits/ComputeNumSignBits which would need depth limiting somehow.

llvm-svn: 334007
2018-06-05 10:52:29 +00:00
Gabor Buella 349ffcee87 [X86] NFC Refactor some code in InstPrinters
Summary:
Bringing some come duplicated in the AT&T and the Intel printers
into a common parent class.

Reviewers: craig.topper

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D47682

llvm-svn: 334005
2018-06-05 10:41:39 +00:00
Peter Smith ef945b2240 [MC][ARM] Add range checking for Thumb2 resolved fixups.
When the branch target of a Thumb2 unconditional or conditonal branch is
resolved at assembly time, no range checking is performed on the result
leading to incorrect immediates. This change adds a range check:
+- 16 Megabytes for unconditional branches, +- 1 Megabyte for the
conditional branch.

Differential Revision: https://reviews.llvm.org/D46306

llvm-svn: 333997
2018-06-05 10:00:56 +00:00
Simon Pilgrim 7bbe7a2920 [X86][SSE] Add basic PACKUS support to X86TargetLowering::computeKnownBitsForTargetNode
Helps improve analysis of saturation ops

llvm-svn: 333995
2018-06-05 09:45:03 +00:00
Peter Smith 0aafe0cee5 [MC][ARM] Correct Thumb BL instruction range
The Thumb BL range is + or - either 16 Megabytes or 4 Megabytes depending
on whether the CPU supports Thumb2 or the v8-m baseline ops. The existing
check for BL range is incorrectly set at +- 32 Megabytes. This change
corrects the higher range and uses the lower range if the featurebits
don't have the necessary support for it.

Differential Revision: https://reviews.llvm.org/D46305

llvm-svn: 333991
2018-06-05 09:32:28 +00:00
Alexander Ivchenko 964b27fa21 [X86][CET] Shadow stack fix for setjmp/longjmp
This is the new version of D46181, allowing setjmp/longjmp
to work correctly with the Intel CET shadow stack by storing
SSP on setjmp and fixing it on longjmp. The patch has been
updated to use the cf-protection-return module flag instead
of HasSHSTK, and the bug that caused D46181 to be reverted
has been fixed with the test expanded to track that fix.

patch by mike.dvoretsky

Differential Revision: https://reviews.llvm.org/D47311

llvm-svn: 333990
2018-06-05 09:22:30 +00:00
Craig Topper f17b33d6c6 [X86] Make all instructions that operate on MMX types, but were added after the initial MMX support via one of the SSE features flags make them require the MMX feature as well.
Passing -mattr=-mmx needs to disable these instructions since the MMX register class won't have been set up. But we don't want -mattr=-mmx to disable SSE so we have to do it separately.

llvm-svn: 333984
2018-06-05 06:20:06 +00:00
Nirav Dave e5eb99668c [RegAllocGreedy] Use simpler map class for EvicteeInfo. NFCI.
RegAlloc keeps a insertion-time ordered map of evictee information,
but we only use membership. Replace MapVector with contextually
equivalent DenseMap which is smaller and faster.

llvm-svn: 333981
2018-06-05 03:16:28 +00:00
Francis Visoiu Mistrih 2c0ef67327 Use MF instead of Fn for MachineFunction references. NFC
llvm-svn: 333973
2018-06-05 00:27:28 +00:00
Francis Visoiu Mistrih ca69b3bf6d [ShrinkWrap] Add optimization remarks to the shrink-wrapping pass
Start by emitting remarks for very basic unsupported cases such as
irreducible CFGs and EHFunclets. The end goal is to be able to cover all
the cases where we give up with an explanation.

llvm-svn: 333972
2018-06-05 00:27:24 +00:00
Amara Emerson d496cc8ffb [MIRParser] Add parser support for 'true' and 'false' i1s.
We already output true and false in the printer, but the parser isn't able to
read it.

Differential Revision: https://reviews.llvm.org/D47424

llvm-svn: 333970
2018-06-05 00:17:13 +00:00
Reid Kleckner adcaddb6da Fix -Wcovered-switch-default warning and clang-format it
llvm-svn: 333967
2018-06-04 23:47:29 +00:00
David Blaikie 10d25ffe7d Move Compiler.h from Demangle back to Support
Code review feedback from r328123 prefers copying the few feature test
macros used by Demangle into there, rather than sinking the header into
an odd corner like Demangle.

llvm-svn: 333965
2018-06-04 22:53:38 +00:00
Derek Schuff 72f19241d6 Simplified WebAssemblyAsmBackend by removing explicit ELF variant.
The ELF version was broken (does not deal with wasm specific fixups),
and now is slightly less broken. It will be removed in its entirety
in the future which this change makes slightly easier (just remove
the IsELF bool).

Differential Revision: https://reviews.llvm.org/D47745

Patch by Wouter van Oortmerssen

llvm-svn: 333964
2018-06-04 22:53:36 +00:00
Sanjay Patel dcb8d304c3 [InstCombine] refine UB-handling in shuffle-binop transform
As noted in rL333782, we can be both better for optimization and
safer with this transform:
BinOp (shuffle V1, Mask), C --> shuffle (BinOp V1, NewC), Mask

The only potentially unsafe-to-speculate binops are integer div/rem.
All other binops are always safe (although I don't see a way to
assert that in code here).

For opcodes like shifts that can produce poison, it can't matter
here because we know the lanes with undef are dropped by the
subsequent shuffle.

Differential Revision: https://reviews.llvm.org/D47686

llvm-svn: 333962
2018-06-04 22:26:45 +00:00
David Blaikie 31b98d2e99 Move Analysis/Utils/Local.h back to Transforms
Review feedback from r328165. Split out just the one function from the
file that's used by Analysis. (As chandlerc pointed out, the original
change only moved the header and not the implementation anyway - which
was fine for the one function that was used (since it's a
template/inlined in the header) but not in general)

llvm-svn: 333954
2018-06-04 21:23:21 +00:00
Jessica Paquette aa087327ce [MachineOutliner] NFC - Move intermediate data structures to MachineOutliner.h
This is setting up to fix bug 37573 cleanly.

This moves data structures that are technically both used in some way by the
target and the general-purpose outlining algorithm into MachineOutliner.h. In
particular, the `Candidate` class is of importance.

Before, the outliner passed the locations of `Candidates` to the target, which
would then make some decisions about the prospective outlined function. This
change allows us to just pass `Candidates` along to the target. This will allow
the target to discard `Candidates` that would be considered unsafe before cost
calculation. Thus, we will be able to remove the unsafe candidates described in
the bug without resorting to torching the entire prospective function.

Also, as a side-effect, it makes the outliner a bit cleaner.

https://bugs.llvm.org/show_bug.cgi?id=37573

llvm-svn: 333952
2018-06-04 21:14:16 +00:00
Alexander Ivchenko 2f038c4094 [X86][ELF][CET] Adding the .note.gnu.property ELF section in X86
In preparation for the proposed linker ABI changes
(https://github.com/hjl-tools/linux-abi/wiki/linux-abi-draft.pdf,
https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-cet.pdf),
this patch enables emission of the .note.gnu.property section to
ELF object files when building CET-enabled modules.

patch by mike.dvoretsky

Differential Revision: https://reviews.llvm.org/D47145

llvm-svn: 333951
2018-06-04 21:07:35 +00:00
Scott Linder ba81d7f1eb [CodeGen] Always update divergence in SelectionDAG::UpdateNodeOperands
Some overloads failed to update divergence.

Differential Revision: https://reviews.llvm.org/D47148

llvm-svn: 333947
2018-06-04 20:19:45 +00:00
Zachary Turner 63db25ba0d [Support] Add functions that operate on native file handles on Windows.
Windows' CRT has a limit of 512 open file descriptors, and fds which are
generated by converting a HANDLE via _get_osfhandle count towards this
limit as well.

Regardless, often you find yourself marshalling back and forth between
native HANDLE objects and fds anyway. If we know from the getgo that
we're going to need to work directly with the handle, we can cut out the
marshalling layer while also not contributing to filling up the CRT's
very limited handle table.

On Unix these functions just delegate directly to the existing set of
functions since an fd *is* the native file type. It would be nice, very
long term, if we could convert most uses of fds to file_t.

Differential Revision: https://reviews.llvm.org/D47688

llvm-svn: 333945
2018-06-04 19:38:11 +00:00
Amaury Sechet da661e9236 [DAGcombine] Teach the combiner about -a = ~a + 1
Summary: This include variant for add, uaddo and addcarry. usubo and subcarry require the carry to be flipped to preserve semantic, but we chose to do the transform anyway in that case as to push the transform down the carry chain.

Reviewers: efriedma, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46505

llvm-svn: 333943
2018-06-04 19:23:22 +00:00
Amaury Sechet 93a7d2aa3c Get rid of SETCCE
Summary: It has been deprecated in favor of SETCCCARRY for a year now and isn't used by any in tree backend.

Reviewers: efriedma, craig.topper, dblaikie, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47685

llvm-svn: 333939
2018-06-04 18:36:22 +00:00
Dmitry Mikulin 4539487650 In thin and full LTO + CFI, direct function calls may go through jump table
entries to reach the target. Since these calls don't require type checks,
we can short-circuit them to their real targets, except in cases when they
can be pre-empted.

Differential Revision: https://reviews.llvm.org/D46326

llvm-svn: 333937
2018-06-04 18:18:12 +00:00
Craig Topper 1f956e2c5f [X86] Don't pass ParitySrc array into isAddSubOrSubAddMask. Instead use a bool output parameter to get the real piece of info we care about. NFC
The ParitySrc array is more of an implementation detail. A single bool to get the final parity is sufficient.

llvm-svn: 333935
2018-06-04 17:58:45 +00:00
Stanislav Mekhanoshin 838c07c531 [AMDGPU] Small refactoring in the scheduler
After last changes some code can be simplified.

Differential Revision: https://reviews.llvm.org/D47661

llvm-svn: 333934
2018-06-04 17:57:40 +00:00
Stanislav Mekhanoshin 28624f94d5 [AMDGPU] Factored out common part of GCNRPTracker::reset()
Differential Revision: https://reviews.llvm.org/D47664

llvm-svn: 333931
2018-06-04 17:21:54 +00:00
Sam Clegg 675a51750a [MachO] Add out-of-bounds check to MachOObjectFile.cpp
This is a followup to rL333496.

Differential Revision: https://reviews.llvm.org/D47544

llvm-svn: 333929
2018-06-04 17:01:20 +00:00
Sam Clegg 537afe6f0e [WebAssembly] Fix .td files after rL333900
Differential Revision: https://reviews.llvm.org/D47727

llvm-svn: 333928
2018-06-04 16:59:26 +00:00
John Brawn c5a6392be3 [ValueTracking] Match select abs pattern when there's an sext involved
When checking a select to see if it matches an abs, allow the true/false values
to be a sign-extension of the comparison value instead of requiring that they're
directly the comparison value, as all the comparison cares about is the sign of
the value.

This fixes a regression due to r333702, where we were no longer generating ctlz
due to isKnownNonNegative failing to match such a pattern.

Differential Revision: https://reviews.llvm.org/D47631

llvm-svn: 333927
2018-06-04 16:53:57 +00:00
Mark Searles f0b93f1e9e [AMDGPU][Waitcnt] Fix handling of flat instrs
On GFX9 and earlier, flat memory ops may decrement VMCNT out-of-order as well as LGKMCNT out-of-order.

Differential Revision: https://reviews.llvm.org/D46616

llvm-svn: 333926
2018-06-04 16:51:59 +00:00
Simon Pilgrim 7c000d4267 [X86] Only accept const SelectionDAG to resolveTargetShuffleInputs/getFauxShuffleMask
These methods should only be using SelectionDAG for analysis (known/sign bits etc), not node creation.

llvm-svn: 333925
2018-06-04 16:48:13 +00:00
Benjamin Kramer f663eba561 [NVPTX] Delete dead code from the AsmPrinter.
llvm-svn: 333924
2018-06-04 16:12:33 +00:00
Andrea Di Biagio 39e5a5695f [RFC][patch 3/3] Add support for variant scheduling classes in llvm-mca.
This patch is the last of a sequence of three patches related to LLVM-dev RFC
"MC support for variant scheduling classes".
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123181.html

This fixes PR36672.

The main goal of this patch is to teach llvm-mca how to solve variant scheduling
classes.  This patch does that, plus it adds new variant scheduling classes to
the BtVer2 scheduling model to identify so-called zero-idioms (i.e. so-called
dependency breaking instructions that are known to generate zero, and that are
optimized out in hardware at register renaming stage).

Without the BtVer2 change, this patch would not have had any meaningful tests.
This patch is effectively the union of two changes:
 1) a change that teaches llvm-mca how to resolve variant scheduling classes.
 2) a change to the BtVer2 scheduling model that allows us to special-case
    packed XOR zero-idioms (this partially fixes PR36671).

Differential Revision: https://reviews.llvm.org/D47374 

llvm-svn: 333909
2018-06-04 15:43:09 +00:00
Krzysztof Parzyszek 623eb54361 [SelectionDAG] Add missing closing parentheses in comments, NFC
llvm-svn: 333907
2018-06-04 14:54:53 +00:00
Nicolai Haehnle 59198ed040 AMDGPU: Make various NamedOperands upper case
Summary:
Avoid name clashes with the corresponding bit fields in the instruction
encoding.

Change-Id: Id1644e703e976e78f7af93788d9f44cb48c3251f

Reviewers: arsenm, rampitec, kzhuravl

Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47433

llvm-svn: 333905
2018-06-04 14:45:20 +00:00
Nicolai Haehnle 01d261f18d TableGen: Streamline the semantics of NAME
Summary:
The new rules are straightforward. The main rules to keep in mind
are:

1. NAME is an implicit template argument of class and multiclass,
   and will be substituted by the name of the instantiating def/defm.

2. The name of a def/defm in a multiclass must contain a reference
   to NAME. If such a reference is not present, it is automatically
   prepended.

And for some additional subtleties, consider these:

3. defm with no name generates a unique name but has no special
   behavior otherwise.

4. def with no name generates an anonymous record, whose name is
   unique but undefined. In particular, the name won't contain a
   reference to NAME.

Keeping rules 1&2 in mind should allow a predictable behavior of
name resolution that is simple to follow.

The old "rules" were rather surprising: sometimes (but not always),
NAME would correspond to the name of the toplevel defm. They were
also plain bonkers when you pushed them to their limits, as the old
version of the TableGen test case shows.

Having NAME correspond to the name of the toplevel defm introduces
"spooky action at a distance" and breaks composability:
refactoring the upper layers of a hierarchy of nested multiclass
instantiations can cause unexpected breakage by changing the value
of NAME at a lower level of the hierarchy. The new rules don't
suffer from this problem.

Some existing .td files have to be adjusted because they ended up
depending on the details of the old implementation.

Change-Id: I694095231565b30f563e6fd0417b41ee01a12589

Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D47430

llvm-svn: 333900
2018-06-04 14:26:05 +00:00
Simon Dardis fb4dde1142 [mips] Restore the availablity of trap for microMIPS
Reviewers: smaksimovic, atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D47584

llvm-svn: 333895
2018-06-04 12:50:32 +00:00
Luke Geeson 43e4367961 [AArch64] Audit on rL333634 to fix FP16 Disasm BitPatterns
llvm-svn: 333879
2018-06-04 09:41:32 +00:00
Sander de Smalen d0a6f6a502 [AArch64][SVE] Fix range for DUP immediates (16bit elts)
For immediates used in DUP instructions that have the range
-128 to 127, or a multiple of 256 in the range -32768 to 32512,
one could argue that when the result element size is 16bits (.h),
the value can be considered both signed and unsigned.

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47619

llvm-svn: 333873
2018-06-04 07:24:23 +00:00
Sander de Smalen fd54a781f6 [AArch64][SVE] Asm: Print indexed element 0 as FPR.
Print the first indexed element as a FP register, for example:

  mov z0.d, z1.d[0]

Is now printed as:

  mov z0.d, d1

Next to printing, this patch also adds aliases to parse 'mov z0.d, d1'.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47571

llvm-svn: 333872
2018-06-04 07:07:35 +00:00
Sander de Smalen c33d668ab7 [AArch64][SVE] Asm: Support for indexed DUP instructions.
Unpredicated copy of indexed SVE element to SVE vector,
along with MOV-aliases.

For example:

  dup     z0.h, z1.h[0]

duplicates the first 16-bit element from z1 to all elements in
the result vector z0.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47570

llvm-svn: 333871
2018-06-04 06:40:55 +00:00
Sander de Smalen 367a53b059 [AArch64][SVE] Asm: Support for FCPY immediate instructions.
Predicated copy of floating-point immediate value to SVE vector,
along with MOV-aliases.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: javed.absar

Differential Revision: https://reviews.llvm.org/D47518

llvm-svn: 333869
2018-06-04 05:58:06 +00:00
Sander de Smalen 512d57f1a5 [AArch64][SVE] Asm: Support for CPY immediate instructions
Predicated copy of possibly shifted immediate value into SVE
vector, along with MOV-aliases.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47517

llvm-svn: 333868
2018-06-04 05:40:46 +00:00
Serguei Katkov d894fb4288 [InstCombine] Fix div handling
When we optimize select basing on fact that div by 0 is undef
we should not traverse the instruction which are not guaranteed to
transfer execution to next instruction. Guard intrinsic is an example.

Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47576

llvm-svn: 333864
2018-06-04 02:52:36 +00:00
Vedant Kumar adbd27a599 [Debugify] Don't apply DI before the bitcode writer pass
Applying synthetic debug info before the bitcode writer pass has no
testing-related purpose. This commit prevents that from happening.

It also adds tests which check that IR produced with/without
-debugify-each enabled is identical after stripping. This makes it
possible to check that individual passes (or full pipelines) are
invariant to debug info.

llvm-svn: 333861
2018-06-04 00:11:49 +00:00
Craig Topper 9923eac358 [X86] Remove and autoupgrade masked avx512vnni intrinsics using the unmasked intrinsics and select instructions.
llvm-svn: 333857
2018-06-03 23:24:17 +00:00
Lang Hames d6155ff002 [ORC] Add a constructor to create an IRMaterializationUnit from a module and
pre-existing SymbolFlags and SymbolToDefinition maps.

This constructor is useful when delegating work from an existing
IRMaterialiaztionUnit to a new one, as it avoids the cost of re-computing these
maps.

llvm-svn: 333852
2018-06-03 19:22:48 +00:00
Sanjay Patel 3bd957b7ae [InstCombine] improve sub with bool folds
There's a patchwork of existing transforms trying to handle
these cases, but as seen in the changed test, we weren't
catching them all.

llvm-svn: 333845
2018-06-03 16:35:26 +00:00
Amaury Sechet 99909e9308 Remove SETCCE use from Lanai's backend
Summary: This creates a small perf regression, but after talking with Jacques Pienaar, he was good with it to get things moving toward removng SETCCE.

Reviewers: jpienaar, bryant

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47626

llvm-svn: 333838
2018-06-03 12:56:24 +00:00
Ivan A. Kosarev 60a991ed1a [NEON] Support VLD1xN intrinsics in AArch32 mode (LLVM part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47120

llvm-svn: 333825
2018-06-02 16:40:03 +00:00
Ivan A. Kosarev 73c5337a64 Revert r333819 "[NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)"
The LLVM part was committed instead of the Clang part.

Differential Revision: https://reviews.llvm.org/D47121

llvm-svn: 333824
2018-06-02 16:38:38 +00:00
Michael J. Spencer ae6eeaea92 [MC] Add assembler support for .cg_profile.
Object FIle Representation
At codegen time this is emitted into the ELF file a pair of symbol indices and a weight. In assembly it looks like:

.cg_profile a, b, 32
.cg_profile freq, a, 11
.cg_profile freq, b, 20

When writing an ELF file these are put into a SHT_LLVM_CALL_GRAPH_PROFILE (0x6fff4c02) section as (uint32_t, uint32_t, uint64_t) tuples as (from symbol index, to symbol index, weight).

Differential Revision: https://reviews.llvm.org/D44965

llvm-svn: 333823
2018-06-02 16:33:01 +00:00
Craig Topper 93d8fbd8f2 [X86] Add tied source operand to AVX5124FMAPS and AVX5124VNNIW instructions.
This doesn't affect the assembly or disassembly, but is more accurate.

llvm-svn: 333822
2018-06-02 16:30:39 +00:00