Commit Graph

Simon Pilgrim 29279f29c8 [X86][SSE] Add extract_subvector(PSHUFB) -> PSHUFB(extract_subvector()) combine
Fixes PR32160 by reducing the size of PSHUFB if we only use one of the lanes.

This approach can probably be generalized to handle any target shuffle (and any subvector index) but we have no test coverage at the moment.

llvm-svn: 344336
2018-10-12 12:10:34 +00:00
Andrea Di Biagio 6eebbe0a97 [tblgen][llvm-mca] Add the ability to describe move elimination candidates via tablegen.
This patch adds the ability to identify instructions that are "move elimination
candidates". It also allows scheduling models to describe processor register
files that allow move elimination.

A move elimination candidate is an instruction that can be eliminated at the
register renaming stage.
Each subtarget can specify which instructions are move elimination candidates
with the help of tablegen class "IsOptimizableRegisterMove" (see
llvm/Target/TargetInstrPredicate.td).

For example, on X86, BtVer2 allows both GPR and MMX/SSE moves to be eliminated.
The definition of 'IsOptimizableRegisterMove' for BtVer2 looks like this:

```
def : IsOptimizableRegisterMove<[
  InstructionEquivalenceClass<[
    // GPR variants.
    MOV32rr, MOV64rr,

    // MMX variants.
    MMX_MOVQ64rr,

    // SSE variants.
    MOVAPSrr, MOVUPSrr,
    MOVAPDrr, MOVUPDrr,
    MOVDQArr, MOVDQUrr,

    // AVX variants.
    VMOVAPSrr, VMOVUPSrr,
    VMOVAPDrr, VMOVUPDrr,
    VMOVDQArr, VMOVDQUrr
  ], CheckNot<CheckSameRegOperand<0, 1>> >
]>;
```

Definitions of IsOptimizableRegisterMove from processor models of the same
Target are processed by the SubtargetEmitter to auto-generate a target-specific
override for each of the following predicate methods:

```
bool TargetSubtargetInfo::isOptimizableRegisterMove(const MachineInstr *MI) const;
bool MCInstrAnalysis::isOptimizableRegisterMove(const MCInst &MI, unsigned CPUID) const;
```

By default, those methods return false (i.e. conservatively assume that there
are no move elimination candidates).

Tablegen class RegisterFile has been extended with the following information:
 - The set of register classes that allow move elimination.
 - Maximum number of moves that can be eliminated every cycle.
 - Whether move elimination is restricted to moves from registers that are
   known to be zero.

This patch is structured in three parts:

The first part (which is mostly boilerplate) adds the new
'isOptimizableRegisterMove' target hooks, and extends existing register file
descriptors in MC by introducing new fields to describe properties related to
move elimination.

The second part uses the new tablegen constructs to describe move elimination
in the BtVer2 scheduling model.

The third part teaches llvm-mca how to query the new
'isOptimizableRegisterMove' hook to mark instructions that are candidates for
move elimination. It also teaches class RegisterFile how to describe
constraints on move elimination at PRF granularity.

llvm-mca tests for btver2 show differences before/after this patch.

Differential Revision: https://reviews.llvm.org/D53134

llvm-svn: 344334
2018-10-12 11:23:04 +00:00
Simon Pilgrim c844bc84dd [X86] Ignore float/double non-temporal loads (PR39256)
Scalar non-temporal loads were asserting instead of just being ignored.

Reduced from https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10895

llvm-svn: 344331
2018-10-12 10:20:16 +00:00
Stefan Maksimovic 285c0f4fdc [mips] Mark fmaxl as a long double emulation routine
Failure was discovered upon running
projects/compiler-rt/test/builtins/Unit/divtc3_test.c
in a stage2 compiler build.

When compiling projects/compiler-rt/lib/builtins/divtc3.c,
a call to fmaxl within the divtc3 implementation had its
return value read from registers $2 and $3 instead of $f0 and $f2.
Include fmaxl in the list of long double emulation routines
to have its return value correctly interpreted as f128.

Almost exact issue here: https://reviews.llvm.org/D17760

Differential Revision: https://reviews.llvm.org/D52649

llvm-svn: 344326
2018-10-12 08:18:38 +00:00
Tom Stellard a894043910 Revert "AMDGPU/GlobalISel: Implement select for G_INSERT"
This reverts commit r344310.

The test case was failing on some bots.

llvm-svn: 344317
2018-10-11 23:36:46 +00:00
Matthias Braun d6131c9633 X86/TargetTransformInfo: Report div/rem constant immediate costs as TCC_Free
DIV/REM by constants should always be expanded into mul/shift/etc.
patterns. Unfortunately, the ConstantHoisting pass runs too early, at a
point where the pattern isn't expanded yet. However, after
ConstantHoisting has hoisted some immediate, the result may no longer
expand. The hoisting also typically doesn't make sense because it
operates on immediates that will change completely during the expansion.

Report DIV/REM as TCC_Free so ConstantHoisting will not touch them.
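
For context, here is what that expansion looks like; a minimal standalone
sketch in C++ (the magic constant for dividing by 5 is illustrative, not
taken from this patch):

```
#include <cassert>
#include <cstdint>

// Division by a constant becomes a multiply-high + shift with a
// precomputed "magic" multiplier, so hoisting the immediate out of the
// div only blocks this expansion. 0xCCCCCCCD is the magic for x / 5.
uint32_t div5_expanded(uint32_t x) {
  return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 34);
}

int main() {
  for (uint32_t x = 0; x < 1000000; ++x)
    assert(div5_expanded(x) == x / 5);
  return 0;
}
```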

Differential Revision: https://reviews.llvm.org/D53174

llvm-svn: 344315
2018-10-11 23:14:35 +00:00
Tom Stellard 4733be6e7b AMDGPU/GlobalISel: Implement select for G_INSERT
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53116

llvm-svn: 344310
2018-10-11 22:49:54 +00:00
Ana Pazos 0a5fcefa31 [RISCV] Fix disassembling of fence instruction with invalid field
Summary:
An instruction with 0 in the fence field was being disassembled as "fence , iorw".
Now "unknown" is printed instead, to match GAS behavior.

This bug was uncovered by an LLVM MC Disassembler Protocol Buffer Fuzzer
for the RISC-V assembly language.

Reviewers: asb

Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, jfb, PkmX, jocewei, asb

Differential Revision: https://reviews.llvm.org/D51828

llvm-svn: 344309
2018-10-11 22:49:13 +00:00
Richard Trieu dfd1760b5f Inline variable into assert to avoid unused variable warning.
llvm-svn: 344308
2018-10-11 22:42:41 +00:00
Craig Topper 35d513c7e4 [X86] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector.
On 64-bit targets the generic legalize will use an i64 load and a scalar_to_vector for us. But on 32-bit targets i64 isn't legal and the generic legalizer will end up emitting two 32-bit loads. We have DAG combines that try to put those two loads back together with pretty good success.

This patch instead uses f64 to avoid the splitting entirely. I've made it do the same for 64-bit mode for consistency and to keep the load in the fp domain.
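
Modeled in plain C++, the idea is roughly the following (a sketch only,
not the DAG code; the helper name is hypothetical):

```
#include <cstdint>
#include <cstring>

// Load two adjacent floats as one 64-bit value, mirroring the f64 load +
// scalar_to_vector strategy: a 64-bit FP load stays whole on 32-bit
// targets, while an i64 load would be split into two i32 loads.
uint64_t loadV2F32AsF64(const float *P) {
  uint64_t Bits;
  std::memcpy(&Bits, P, sizeof(Bits)); // one 8-byte load
  return Bits;
}
```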

There are a few things in here that look like regressions in 32-bit mode, but I believe they bring us closer to the 64-bit mode codegen. And that the 64-bit mode code could be better. I think those issues should be looked at separately.

Differential Revision: https://reviews.llvm.org/D52528

llvm-svn: 344291
2018-10-11 20:36:06 +00:00
Thomas Lively f04bed8e79 [WebAssembly][NFC] Remove repetition of Defs = [ARGUMENTS] (fixed)
llvm-svn: 344287
2018-10-11 20:21:22 +00:00
Sumanth Gundapaneni a4a9155e4f [Hexagon] Restrict compound instructions with constant value.
Having a constant value operand in the compound instruction
is not always profitable. This patch improves coremark by ~4% on
Hexagon.

Differential Revision: https://reviews.llvm.org/D53152

llvm-svn: 344284
2018-10-11 19:48:15 +00:00
Thomas Lively ab37189f7e [WebAssembly] Revert rL344180, which was breaking expensive checks
llvm-svn: 344280
2018-10-11 18:45:48 +00:00
Krzysztof Parzyszek 5d3a6f76a8 [Hexagon] Eliminate potential sources of non-determinism in HCE
Also, avoid comparing GUIDs when ordering global addresses, because the
source file location can cause a different GUID to be calculated. As a
result, a pair of symbols can compare "less" in one directory, but
"greater" in another.

llvm-svn: 344271
2018-10-11 18:26:02 +00:00
Craig Topper fb2ac8969e [X86] Restore X86ISelDAGToDAG::matchBEXTRFromAnd. Teach address matching to create a BEXTR pattern from a (shl (and X, mask >> C1) if C1 can be folded into addressing mode.
This is an alternative to D53080 since I think using a BEXTR for a shifted mask is definitely an improvement when the shl can be absorbed into addressing mode. The other cases I'm less sure about.

We already have several tricks for handling an and of a shift in address matching. This adds a new case for BEXTR.

I've moved the BEXTR matching code back to X86ISelDAGToDAG to allow it to match. I suppose alternatively we could directly emit a X86ISD::BEXTR node that isel could pattern match. But I'm trying to view BEXTR matching as an isel concern so DAG combine can see 'and' and 'shift' operations that are well understood. We did lose a couple cases from tbm_patterns.ll, but I think there are ways to recover that.

I've also put back the manual load folding code in matchBEXTRFromAnd that I removed a few months ago in r324939. This gives us some more freedom to make decisions based on the ability to fold a load. I haven't done anything with that yet.

Differential Revision: https://reviews.llvm.org/D53126

llvm-svn: 344270
2018-10-11 18:06:07 +00:00
Diogo N. Sampaio 352a2fa1e7 [AARCH64][FIX] Emit data symbol for constant pool data
The AArch64 ELF emitter would omit printing the data
symbol for zero-filled constant data. This patch
overrides the emitFill method to ensure that
the symbol is correctly printed.

Differential revision: https://reviews.llvm.org/D53132

llvm-svn: 344248
2018-10-11 14:10:32 +00:00
Roman Lebedev 4225f4adff [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ~(-1 << nbits) pattern
Summary:
As discussed in D48491, we can't really do this in the TableGen,
since we need to produce *two* instructions. This only implements
one single pattern. The other 3 patterns will be in follow-ups.

I'm not sure yet if we want to also fuse shift into here
(i.e `(x >> start) & ...`)
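
In C terms, the single pattern implemented here is the mask-from-bit-count
idiom below (an illustration; the helper is hypothetical):

```
#include <cstdint>

// The shape selected as BEXTR: keep the low `nbits` bits of x.
// ~(-1 << nbits) builds nbits ones without computing (1 << nbits) - 1.
// Assumes nbits < 64 (shifting by 64 would be undefined behavior).
uint64_t lowBits(uint64_t x, unsigned nbits) {
  return x & ~(~0ULL << nbits);
}
```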

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D52304

llvm-svn: 344224
2018-10-11 07:51:13 +00:00
Thomas Lively 7fa7e6a284 [WebAssembly][NFC] Use intrinsic dag nodes directly
Summary: Instead of custom lowering to WebAssemblyISD nodes first.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53119

llvm-svn: 344211
2018-10-11 00:49:24 +00:00
Thomas Lively 2ebacb107b [WebAssembly] Saturating float to int intrinsics
Summary:
Although the saturating float to int instructions are already
emitted from normal IR, the fpto{s,u}i instructions produce poison
values if the argument cannot fit in the result type. These intrinsics
are therefore necessary to get guaranteed defined saturating behavior.
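
For reference, the guaranteed saturating behavior can be modeled in scalar
C++ like this (a sketch of the semantics, not the intrinsic implementation):

```
#include <cmath>
#include <cstdint>
#include <limits>

// Saturating float->int: NaN maps to 0 and out-of-range values clamp to
// the extremes, whereas a plain fptosi on such inputs yields poison.
int32_t satF32ToI32(float F) {
  if (std::isnan(F))
    return 0;
  if (F <= (float)std::numeric_limits<int32_t>::min())
    return std::numeric_limits<int32_t>::min();
  if (F >= (float)std::numeric_limits<int32_t>::max())
    return std::numeric_limits<int32_t>::max();
  return (int32_t)F;
}
```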

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53004

llvm-svn: 344204
2018-10-11 00:01:25 +00:00
Craig Topper b5421c498d [X86] Prevent non-temporal loads from folding into instructions by blocking them in X86DAGToDAGISel::IsProfitableToFold rather than with a predicate.
Remove tryFoldVecLoad since tryFoldLoad would call IsProfitableToFold and pick up the new check.

This saves about 5K out of ~600K on the generated isel table.

llvm-svn: 344189
2018-10-10 21:48:34 +00:00
George Burgess IV 6ef8002c2c Replace most users of UnknownSize with LocationSize::unknown(); NFC
Moving away from UnknownSize is part of the effort to migrate us to
LocationSizes (e.g. the cleanup promised in D44748).

This doesn't entirely remove all of the uses of UnknownSize; some uses
require tweaks to assume that UnknownSize isn't just some kind of int.
This patch is intended to just be a trivial replacement for all places
where LocationSize::unknown() will Just Work.

llvm-svn: 344186
2018-10-10 21:28:44 +00:00
Thomas Lively eff0542c56 [WebAssembly][NFC] Remove repetition of Defs = [ARGUMENTS]
Summary:
By moving that line into the `I` multiclass.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53093

llvm-svn: 344180
2018-10-10 20:40:54 +00:00
Roman Lebedev 33d84c6dac [X86] Move X86DAGToDAGISel::matchBEXTRFromAnd() into X86ISelLowering
Summary:
As discussed in [[ https://bugs.llvm.org/show_bug.cgi?id=38938 | PR38938 ]],
we fail to emit `BEXTR` if the mask is shifted.
We can't deal with that in `X86DAGToDAGISel` before the address mode for the inc is selected,
and we can't really do it in the normal DAGCombine, because we don't have a generic
`ISD::BitFieldExtract` node, and if we simply turn the shifted mask into a normal mask +
shift-left, it will be folded back.
So it would seem X86ISelLowering is the place to handle this.

This patch only moves the matchBEXTRFromAnd()
from X86DAGToDAGISel to X86ISelLowering.
It does not add support for the 'shifted mask' pattern.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52426

llvm-svn: 344179
2018-10-10 20:40:12 +00:00
Thomas Lively 103f0161b3 [WebAssembly][NFC] Use vnot patfrag to simplify v128.not
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53097

llvm-svn: 344175
2018-10-10 19:09:16 +00:00
Sanjay Patel 6cca8af227 [x86] allow single source horizontal op matching (PR39195)
This is intended to restore horizontal codegen to what it looked like before
the IR demanded-elements analysis improved in rL343727.

As noted in PR39195:
https://bugs.llvm.org/show_bug.cgi?id=39195
...horizontal ops can be worse for performance than a shuffle+regular binop, so I've added a TODO. Ideally, we'd 
solve that in a machine instruction pass, but a quicker solution will be adding a 'HasFastHorizontalOp' feature
bit to deal with it here in the DAG.
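
As a scalar model of the single-source case (illustrative only), haddps
with both operands equal to the same vector computes:

```
#include <array>

// Scalar model of SSE3 haddps(v, v): every output element is a
// horizontal pairwise sum drawn from the single source vector.
std::array<float, 4> haddSingleSource(const std::array<float, 4> &V) {
  return {V[0] + V[1], V[2] + V[3], V[0] + V[1], V[2] + V[3]};
}
```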

Differential Revision: https://reviews.llvm.org/D52997

llvm-svn: 344141
2018-10-10 13:39:59 +00:00
Simon Pilgrim 5cb3a82892 [TargetLowering] Add root node back to work list after successful SimplifyDemandedBits/SimplifyDemandedVectorElts
Similar to what already happens in the DAGCombiner wrappers, this patch adds the root nodes back onto the worklist if the DCI wrappers' SimplifyDemandedBits/SimplifyDemandedVectorElts were successful.

Differential Revision: https://reviews.llvm.org/D53026

llvm-svn: 344132
2018-10-10 10:44:15 +00:00
Jonas Paulsson bf66f38705 [SystemZ] Temporarily disable high VFs with integer div/rem.
Until the mischeduler is clever enough to avoid spilling in a vectorized loop
with many (scalar) DLRs, it is better to avoid high vectorization factors (8
and above).

llvm-svn: 344129
2018-10-10 09:30:29 +00:00
Craig Topper 02c62aa58a [X86] Remove FeatureRTM from Skylake processor list
Summary:
There are a LOT of Skylakes and later without TSX-NI. Examples:
- SKL: https://ark.intel.com/products/136863/Intel-Core-i3-8121U-Processor-4M-Cache-up-to-3-20-GHz-
- KBL: https://ark.intel.com/products/97540/Intel-Core-i7-7560U-Processor-4M-Cache-up-to-3-80-GHz-
- KBL-R: https://ark.intel.com/products/149091/Intel-Core-i7-8565U-Processor-8M-Cache-up-to-4-60-GHz-
- CNL: https://ark.intel.com/products/136863/Intel-Core-i3-8121U-Processor-4M-Cache-up-to-3_20-GHz

This feature seems to be present only on high-end desktop and server
chips (I can't find any SKX without it). This commit leaves it disabled
for all processors, but can be re-enabled for specific builds with
-mrtm.

Patch by Thiago Macieira

Reviewers: erichkeane, craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53041

llvm-svn: 344116
2018-10-10 07:43:35 +00:00
Jonas Paulsson 2c8b33770c [SystemZ] Take better care when computing needed vector registers in TTI.
A new function getNumVectorRegs() is better to use for the number of needed
vector registers instead of getNumberOfParts(). This is to make sure that the
number of vector registers (and typically operations) required for a vector
type is accurate.

getNumberOfParts(), which was previously used, works by splitting the vector
type until it is legal, and gives incorrect results for types with a
non-power-of-two number of elements (rare).
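
A hypothetical sketch of the difference, assuming the new function rounds
the total bit width up to whole 128-bit vector registers:

```
// Hypothetical model: a vector type occupies ceil(totalBits / 128)
// 128-bit SystemZ vector registers, e.g. 6 x i32 (192 bits) -> 2
// registers, whereas counting legalization "parts" can differ for
// non-power-of-two element counts.
unsigned numVectorRegs(unsigned EltBits, unsigned NumElts) {
  unsigned TotalBits = EltBits * NumElts;
  return (TotalBits + 127) / 128;
}
```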

A new static function getScalarSizeInBits() also checks for a pointer type
and returns 64U for it (since otherwise it would get a value of 0). It is
used in a few places where Ty may be a pointer.

Review: Ulrich Weigand
llvm-svn: 344115
2018-10-10 07:36:27 +00:00
QingShan Zhang bc1586352e [PowerPC] Fix the assert of ISD::SIGN_EXTEND_INREG when type is v2i16 and v2i8
The ISD::SIGN_EXTEND_INREG operation on v2i16 and v2i8 types causes an assert
because they are registered as a custom operation. The type legalization phase
then enters the custom hook, which does not handle ISD::SIGN_EXTEND_INREG and
falls through into an unreachable assert.

Patch By: wuzish (Zixuan Wu)
Differential Revision: https://reviews.llvm.org/D52449

llvm-svn: 344109
2018-10-10 02:33:48 +00:00
Thomas Lively 108e98ec32 [WebAssembly] Fix fneg lowering
Summary:
Subtraction from zero and floating-point negation do not have the same
semantics, so fix the lowering.
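
The classic case where the two differ is signed zero; a one-line C++
demonstration (illustrative, not the lowering itself):

```
#include <cstdio>

// fsub(0.0, x) and fneg(x) disagree on signed zero: for x = +0.0,
// 0.0 - x is +0.0 but -x is -0.0.
int main() {
  float X = 0.0f;
  std::printf("%g %g\n", 0.0f - X, -X); // prints: 0 -0
  return 0;
}
```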

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52948

llvm-svn: 344107
2018-10-10 01:09:09 +00:00
Heejin Ahn 5d900954bd [WebAssembly] Improve comments for SIMD instruction definitions
llvm-svn: 344106
2018-10-10 01:04:02 +00:00
Thomas Lively 409f5840a7 [WebAssembly] Handle V128 register class in explicit locals pass
Summary:
Also add tests to catch crashes in passes that are not normally run in
tests.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52959

llvm-svn: 344094
2018-10-09 23:33:16 +00:00
Rong Xu 5c7bf1a756 [X86] Fix sanitizer bot failure from 344085
Fix the memory issue exposed by the sanitizer.

llvm-svn: 344092
2018-10-09 23:10:56 +00:00
Heejin Ahn d9a6de3c38 [WebAssembly] Improve readability of SIMD instructions (NFC)
Summary:
- Categorize instructions into the categories as in the SIMD spec
- Move SIMD-related definition to WebAssemblyInstrSIMD.td
- Put definition and use of patterns together
- Add newlines here and there

Reviewers: tlively

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53045

llvm-svn: 344086
2018-10-09 22:23:39 +00:00
Rong Xu 3d2efdfdea Recommit r343993: [X86] condition branches folding for three-way conditional codes
Fix the memory issue exposed by the sanitizer.

llvm-svn: 344085
2018-10-09 22:03:40 +00:00
Nemanja Ivanovic 87873d04c3 [PowerPC] Implement hasBitPreservingFPLogic for types that can be supported
This is the PPC-specific non-controversial part of
https://reviews.llvm.org/D44548 that simply enables this combine for PPC
since PPC has these instructions.
This commit will allow the target-independent portion to be truly target
independent.

llvm-svn: 344077
2018-10-09 20:35:15 +00:00
Craig Topper f6d8400869 [X86] When lowering unsigned v2i64 setcc without SSE42, flip the sign bits in the v2i64 type then bitcast to v4i32.
This may give slightly better opportunities for DAG combine to simplify with the operations before the setcc. It also matches the type the xors will eventually be promoted to anyway so it saves a legalization step.
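
For reference, the scalar form of this sign-flip trick (illustrative only):

```
#include <cstdint>

// Unsigned compare via signed compare: x <u y iff
// (x ^ SignBit) <s (y ^ SignBit); flipping the sign bit maps the
// unsigned range onto the signed range order-preservingly.
bool ultViaSigned(uint64_t X, uint64_t Y) {
  const uint64_t SignBit = 0x8000000000000000ULL;
  return (int64_t)(X ^ SignBit) < (int64_t)(Y ^ SignBit);
}
```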

Almost all of the test changes are because our constant pool entry is now v2i64 instead of v4i32 on 64-bit targets. On 32-bit targets getConstant should be emitting a v4i32 build_vector and a v4i32->v2i64 bitcast.

There are a couple of test cases where it appears we now combine a bitwise not with one of these xors, which caused a new constant vector to be generated. This prevented a constant pool entry from being shared. But if that's an issue we're concerned about, it seems we need to address it another way than just relying on a bitcast to hide it.

This came about from experiments I've been trying with pushing the promotion of and/or/xor to vXi64 later than LegalizeVectorOps where it is today. We run LegalizeVectorOps in a bottom-up order. So the and/or/xor are promoted before their users are legalized. The bitcasts added for the promotion act as a barrier to computeKnownBits if we try to use it during vector legalization of a later operation. So by moving the promotion out we can hopefully get better results from computeKnownBits/computeNumSignBits like in LowerTruncate on AVX512. I've also looked at running LegalizeVectorOps in a top-down order like LegalizeDAG, but that's showing some other issues.

llvm-svn: 344071
2018-10-09 19:05:50 +00:00
Sanjay Patel f5fac1826a [x86] use demanded bits to simplify masked store codegen
As noted in D52747, if we prefer IR to use trunc for bool vectors rather 
than and+icmp, we can expose codegen shortcomings as seen here with masked store.

Replace a hard-coded PCMPGT simplification with the more general demanded bits call
to improve things.

Differential Revision: https://reviews.llvm.org/D52964

llvm-svn: 344048
2018-10-09 14:04:14 +00:00
Simon Atanasyan d465318c6d [mips] Set pointer size to 4 bytes for N32 ABI
CodePointerSize and CalleeSaveStackSlotSize values are used in DWARF
generation. In the case of MIPS, it's incorrect to check only for
Triple::isMIPS64(), because this function returns true for the N32 ABI too.

For now, we do not have a method to recognize N32 if it's specified by a
command-line option and is not part of the target triple, so we check for
Triple::GNUABIN32 only. It's better than nothing.

Differential revision: https://reviews.llvm.org/D52874

llvm-svn: 344039
2018-10-09 11:29:45 +00:00
Nemanja Ivanovic 4c0b110e3e [PowerPC] Remove self-copies in pre-emit peephole
There are occasional instances where AADB rewrites registers in such a way
that a reg-reg copy becomes a self-copy. Such an instruction is obviously
redundant and can be removed. This patch does precisely that.
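
The core check is simple; a minimal sketch (a hypothetical helper, not the
actual pass code):

```
#include "llvm/CodeGen/MachineInstr.h"
using namespace llvm;

// A reg-reg COPY whose destination and source ended up as the same
// register does nothing and can be erased. (The nop self-copies noted
// below have their own opcodes and are not plain COPYs, so this check
// leaves them alone.)
static bool isRedundantSelfCopy(const MachineInstr &MI) {
  return MI.isCopy() &&
         MI.getOperand(0).getReg() == MI.getOperand(1).getReg();
}
```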

Note that this will not remove various nop's that we insert (which are
themselves just self-copies). The reason those are left alone is that all of
them have their own opcodes (that just encode to a self-copy).

What prompted this patch is the fact that these self-copies sometimes end up
using registers that make the instruction a priority-setting nop, thereby
having a significant effect on performance.

Differential revision: https://reviews.llvm.org/D52432

llvm-svn: 344036
2018-10-09 10:54:04 +00:00
Simon Pilgrim 720db8ed7b [X86][AVX1] Enable *_EXTEND_VECTOR_INREG lowering of 256-bit vectors
As discussed on D52964, this adds 256-bit *_EXTEND_VECTOR_INREG lowering support for AVX1 targets to help improve SimplifyDemandedBits handling.

Differential Revision: https://reviews.llvm.org/D52980

llvm-svn: 344019
2018-10-09 07:42:01 +00:00
Petar Jovanovic aa97890d66 [MIPS GlobalISel] Legalize i64 add
Custom legalize s64 G_ADD for MIPS32.
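
What the custom legalization boils down to, modeled on scalars (an
illustrative sketch, not the GlobalISel code):

```
#include <cstdint>

// s64 add on a 32-bit target: add the low halves, derive the carry from
// unsigned wrap-around, and fold it into the sum of the high halves.
void add64On32(uint32_t ALo, uint32_t AHi, uint32_t BLo, uint32_t BHi,
               uint32_t &RLo, uint32_t &RHi) {
  RLo = ALo + BLo;
  uint32_t Carry = RLo < ALo; // 1 if the low add wrapped
  RHi = AHi + BHi + Carry;
}
```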

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D52652

llvm-svn: 344007
2018-10-08 23:59:37 +00:00
Rong Xu 47fd015163 [X86] Revert r343993 condition branches folding for three-way conditional codes
Some buildbots failed.

llvm-svn: 343998
2018-10-08 22:08:43 +00:00
Craig Topper ff9f02580d [X86] Prefer isTypeLegal over checking isSimple in a DAG combine.
Simple types are a superset of what all in-tree targets in LLVM could possibly have as a legal type. This means the behavior of using isSimple to check for a supported type for X86 could change over time. For example, this could change if a v256i1 type were added to MVT in the future.

llvm-svn: 343995
2018-10-08 20:02:59 +00:00
Rong Xu 67b1b328f7 [X86] condition branches folding for three-way conditional codes
This patch implements a pass that optimizes conditional branches on x86 by
taking advantage of the three-way conditional code generated by compare
instructions.

Currently, it tries to hoist EQ and NE conditional branches to a dominating
conditional branch where the same EQ/NE conditional code is
computed. An example:
bb_0:
  cmp %0, 19
  jg bb_1
  jmp bb_2
bb_1:
  cmp %0, 40
  jg bb_3
  jmp bb_4
bb_4:
  cmp %0, 20
  je bb_5
  jmp bb_6
Here we could combine the two compares in bb_0 and bb_4 and have the
following code:

bb_0:
  cmp %0, 20
  jg bb_1
  jl bb_2
  jmp bb_5
bb_1:
  cmp %0, 40
  jg bb_3
  jmp bb_6

For the case of %0 == 20 (bb_5), we eliminate two jumps, and the control height
for bb_6 is also reduced. bb_4 is gone after the optimization.

This optimization is motivated by the branch pattern generated by the switch
lowering: we always have a pivot-1 compare for the inner nodes and we do a
pivot compare against the leaf (like the pattern above).

This pass currently is enabled on Intel's Sandybridge and later arches. Some
reviewers pointed out that on some arches (like AMD Jaguar), this pass may
increase branch density to the point where it hurts the performance of the
branch predictor.

Differential Revision: https://reviews.llvm.org/D46662

llvm-svn: 343993
2018-10-08 18:52:39 +00:00
Scott Linder 823549a6ec [AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructions
Emit a waterfall loop in the general case for a potentially-divergent Rsrc
operand. When practical, avoid this by using Addr64 instructions.

Recommits r341413 with changes to update the MachineDominatorTree when present.

Differential Revision: https://reviews.llvm.org/D51742

llvm-svn: 343992
2018-10-08 18:47:01 +00:00
Simon Pilgrim 6fc8d05565 [X86][AVX2] Enable ZERO_EXTEND_VECTOR_INREG lowering of 256-bit vectors
Some necessary yak shaving before lowering *_EXTEND_VECTOR_INREG 256-bit vectors on AVX1 targets as suggested by D52964.

Differential Revision: https://reviews.llvm.org/D52970

llvm-svn: 343991
2018-10-08 18:40:50 +00:00
Sanjay Patel 43bf9917cc [x86] make horizontal binop matching clearer; NFCI
The instructions are complicated, so this code will
probably never be very obvious, but hopefully this
makes it better. 

As shown in PR39195:
https://bugs.llvm.org/show_bug.cgi?id=39195
...we need to improve the matching to not miss cases
where we're h-opping on 1 source vector, and that
should be a small patch after this rearranging.

llvm-svn: 343989
2018-10-08 18:08:02 +00:00
Tom Stellard 14d8807d9a AMDGPU/GlobalISel: Select amdgcn.cvt.pkrtz to 64-bit instructions
Summary: The 32-bit variants do not exist on VI+.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52958

llvm-svn: 343985
2018-10-08 17:49:29 +00:00