Commit Graph

17365 Commits

Author SHA1 Message Date
Igor Breger 1a388871b9 [AVX512] In some cases KORTEST instruction may be used instead of ZEXT + TEST sequence.
Differential Revision: http://reviews.llvm.org/D23490

llvm-svn: 279960
2016-08-29 08:52:52 +00:00
Craig Topper 713085e60a [X86] Don't lower FABS/FNEG masking directly to a ConstantPool load. Just create a ConstantFPSDNode and let that be lowered.
This allows broadcast loads to used when available.

llvm-svn: 279958
2016-08-29 04:49:31 +00:00
Craig Topper 71584cd0f0 [AVX-512] Add 512-bit fabs tests with and without AVX512DQ.
llvm-svn: 279956
2016-08-29 04:49:24 +00:00
Craig Topper 850feaf3b7 [AVX-512] Add support for selecting 512-bit VPABSB/VPABSW when BWI is available.
llvm-svn: 279951
2016-08-28 22:20:51 +00:00
Craig Topper a47fc6e5b5 [AVX-512] Add testcases showing that we don't emit 512-bit vpabsb/vpabsw. Will be fixed in a future commit.
llvm-svn: 279949
2016-08-28 22:20:45 +00:00
Sanjay Patel cd7d0c6aca [x86] add tests for <3 x N> vector types (PR29114)
llvm-svn: 279939
2016-08-28 18:31:32 +00:00
Simon Pilgrim 5369cd9e9c [X86][AVX512] Only combine EVEX targets shuffles to shuffles of the same number of vector elements
Over eager combing prevents the correct folding of writemasks.

At the moment this occurs for ALL EVEX shuffles, in the future we need to check that the user of the root shuffle is a VSELECT that can fold to a writemask.

llvm-svn: 279934
2016-08-28 17:27:14 +00:00
Hal Finkel 5728200f33 [PowerPC] Implement lowering for atomicrmw min/max/umin/umax
Implement lowering for atomicrmw min/max/umin/umax. Fixes PR28818.

llvm-svn: 279933
2016-08-28 16:17:58 +00:00
Craig Topper abe80cc04d [AVX-512] Promote AND/OR/XOR to v2i64/v4i64/v8i64 even when we have AVX512F/AVX512VL.
Previously we weren't creating masked logical operations if bitcasts appeared between the logic operation and the select. The IR optimizers can move bitcasts across logic operations and create these cases. To minimize the number of cases we need to handle, this change promotes all logic ops to an i64 vector type just like when only SSE or AVX is available.

Unfortunately, this also has the consequence of making it difficult to select unmasked VPANDD/VPORD/VPXORD in all the cases it was previously used. This is the cause of most of the test change. This shouldn't result in any functional change though.

llvm-svn: 279929
2016-08-28 06:06:28 +00:00
Craig Topper 8046e2033e [AVX-512] Add tests to show that we don't select masked logic ops if there are bitcasts between the logic op and the select.
This is taken from optimized IR of clang test cases for masked logic ops.

llvm-svn: 279928
2016-08-28 06:06:24 +00:00
Jan Vesely 38814fa2fd AMDGPU/R600: Enable Load combine
Fix and improve tests

Differential Revision: https://reviews.llvm.org/D23899

llvm-svn: 279925
2016-08-27 19:09:43 +00:00
Craig Topper 144fdef66b [X86] Enable FR32/FR64 cmpeq/cmpne/cmpunord/cmpord to be commuted.
llvm-svn: 279913
2016-08-27 05:22:12 +00:00
Craig Topper 4891c724aa [AVX-512] Add load folding for EVEX vcmpps/pd/ss/sd.
llvm-svn: 279912
2016-08-27 05:22:08 +00:00
Matt Arsenault 2712d4a3d8 AMDGPU: Select mulhi 24-bit instructions
llvm-svn: 279902
2016-08-27 01:32:27 +00:00
Matt Arsenault 22e417956d AMDGPU: Move cndmask pseudo to be isel pseudo
There's only one use of this for the convenience
of a pattern. I think v_mov_b64_pseudo should also be
moved, but SIFoldOperands does currently make use of it.

llvm-svn: 279901
2016-08-27 01:00:37 +00:00
Quentin Colombet 374796d678 [GlobalISel] Add a fallback path to SDISel.
When global-isel fails on a MachineFunction MF, MF will be cleaned up
and given to SDISel.
Thanks to this fallback, we can already perform correctness test even if
we support only a small portion of the functions in a test.

llvm-svn: 279891
2016-08-27 00:18:31 +00:00
Michael Kuperstein aea50f8b84 [X86] Add baseline test for "odd" shuffles. NFC.
Adds a baseline test for lowering shuffles where the width of the output
vector is not twice the size of the input vectors. Many of those sequences
are suboptimal, and will hopefully be improved in follow-up patches.

llvm-svn: 279888
2016-08-27 00:10:24 +00:00
Tom Stellard e175d8aba5 AMDGPU/SI: Canonicalize offset order for merged DS instructions
Summary:
If the scheduler clusters the loads, then the offsets will be sorted,
but it is possible for the scheduler to scheduler loads together
without out explicitly clustering them, which would give us non-sorted
offsets.

Also, we will want to do this if we move the load/store optimizer before
the scheduler.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23776

llvm-svn: 279870
2016-08-26 21:36:47 +00:00
Manman Ren 66b54e9f32 Swift Calling Convetion: add support for AArch64.
It will just be the same as the regular calling convention.

rdar://28029509

llvm-svn: 279853
2016-08-26 19:28:17 +00:00
Tim Northover 85cf564c51 AArch64: avoid assertion on illegal types in performFDivCombine.
In the code to detect fixed-point conversions and make use of AArch64's special
instructions, we weren't prepared for weird types. The fptosi direction got
fixed recently, but not the similar sitofp code.

llvm-svn: 279852
2016-08-26 18:52:31 +00:00
Chad Rosier 58f505ba24 [AArch64] Avoid materializing constant values when generating csel instructions.
Differential Revision: https://reviews.llvm.org/D23677

llvm-svn: 279849
2016-08-26 18:05:50 +00:00
Tim Northover bc1701c7fb GlobalISel: mark G_FPEXT legal from float to double.
llvm-svn: 279845
2016-08-26 17:46:22 +00:00
Tim Northover 30bd36e3fc GlobalISel: mark G_FCMP legal on float & double.
llvm-svn: 279844
2016-08-26 17:46:19 +00:00
Tim Northover 051b8ad3d9 GlobalISel: simplify G_ICMP legalization regime.
It's unclear how the old

    %res(32) = G_ICMP { s32, s32 } intpred(eq), %0, %1

is actually different from an s1 verison

    %res(1) = G_ICMP { s1, s32 } intpred(eq), %0, %1

so we'll remove it for now.

llvm-svn: 279843
2016-08-26 17:46:17 +00:00
Tim Northover cecee56abb GlobalISel: legalize sdiv and srem operations.
llvm-svn: 279842
2016-08-26 17:46:13 +00:00
Tim Northover 7a753d9bec GlobalISel: legalize under-width divisions.
llvm-svn: 279841
2016-08-26 17:46:06 +00:00
Tim Northover 1d18a99a53 GlobalISel: mark selects legal
llvm-svn: 279840
2016-08-26 17:46:03 +00:00
Tim Northover 5d0eaa4e79 GlobalISel: mark float/int conversions legal
llvm-svn: 279839
2016-08-26 17:45:58 +00:00
Chad Rosier 39c1dbb845 [AArch64] Avoid materializing constant 1 by using csinc, rather than csel.
This is similar to what was done in r261675, but for CSINC rather than CSINV.

Differential Revision: https://reviews.llvm.org/D23892

llvm-svn: 279822
2016-08-26 14:01:55 +00:00
Pablo Barrio b8ec630583 Handle empty functions with debug info in load/store opt pass
Summary:
In fuctions that contained debug info but were empty otherwise,
the ARM load/store optimizer could abort. This was because
function MergeReturnIntoLDM handled the special case where a
Machine Basic BLock is empty by calling MBB.empty(). However, this
returns false in presence of debug info, although the function
should be considered empty in the eyes of the load/store optimizer.
This has been fixed by handling the case where searching through the
block finds only debug instructions.

Reviewers: rengolin, dexonsmith, llvm-commits, jmolloy

Subscribers: t.p.northover, aemerson, rengolin, samparker

Differential Revision: https://reviews.llvm.org/D23847

llvm-svn: 279820
2016-08-26 13:00:39 +00:00
Simon Pilgrim 091c4c781c [X86][SSE4A] The EXTRQ/INSERTQ bit extraction/insertion ops should be in the integer domain
llvm-svn: 279811
2016-08-26 09:55:41 +00:00
Craig Topper 8f27f51192 [X86][SSE] Add CMPSS/CMPSD intrinsic scalar load folding support.
llvm-svn: 279806
2016-08-26 07:08:00 +00:00
Matt Arsenault f403df38eb Replace subregister uses when processing tied operands
This was for some reason skipping operands that are subregisters
instead of keeping the same subregister index.

v_movreld_b32 expects src0 to be the subregister of the tied
super register use/def.

e.g.

v_movreld_b32 v0, v9, <imp-def, tied3> v[0:3], <imp-use, tied2> v[0:3]

was being replaced with

v[4:7] = copy v[0:3]
v_movreld_b32 v0, v9, <imp-def, tied3> v[4:7], <imp-use, tied2> v[4:7],

which really writes to v[0:3]

llvm-svn: 279804
2016-08-26 06:31:32 +00:00
Michael Kuperstein 2ee911e985 Revert r274613 because it breaks the test suite with AVX512
This reverts most of r274613 (AKA r274626) and its follow-ups (r276347, r277289),
due to miscompiles in the test suite. The FastISel change was left in, because
it apparently fixes an unrelated issue.

(Recommit of r279782 which was broken due to a bad merge.)

This fixes 4 out of the 5 test failures in PR29112.

llvm-svn: 279788
2016-08-25 22:48:11 +00:00
Michael Kuperstein 6e271f4ce8 Revert r279782 due to debug buildbot breakage.
llvm-svn: 279785
2016-08-25 22:14:45 +00:00
Michael Kuperstein a6ccc8d365 Revert r274613 because it breaks the test suite with AVX512
This reverts most of r274613 and its follow-ups (r276347, r277289), due to
miscompiles in the test suite. The FastISel change was left in, because it
apparently fixes an unrelated issue.

This fixes 4 out of the 5 test failures in PR29112.

llvm-svn: 279782
2016-08-25 21:55:41 +00:00
Tim Northover fe880a8801 GlobalISel: mark simple ops legal even on types < 32-bit.
The 32-bit variants of these operations don't depend on the bits not being
operated on, so they also naturally model operations narrower than the actual
register width.

llvm-svn: 279760
2016-08-25 17:37:39 +00:00
Tim Northover 7a1ec0141a GlobalISel: mark pointer constants as legal on AArch64.
llvm-svn: 279759
2016-08-25 17:37:35 +00:00
Tim Northover 438c77ca1a GlobalISel: perform multi-step legalization
llvm-svn: 279758
2016-08-25 17:37:32 +00:00
Tim Northover 2c4a838e24 GlobalISel: mark small extends as legal on AArch64
llvm-svn: 279757
2016-08-25 17:37:25 +00:00
Michael Kuperstein 40887c5566 [X86] 512-bit VPAVG requires AVX512BW
Fix VPAVG detection to require AVX512BW, not AVX512F for 512-bit widths,
and change associated asserts to assert in the right direction...

This fixes PR29111.

llvm-svn: 279755
2016-08-25 17:17:46 +00:00
Ron Lieberman c93d123b86 [Hexagon] vector store print tracing.
Add vector store print tracing option for hexagon vector instructions.

https://reviews.llvm.org/D23870

llvm-svn: 279739
2016-08-25 13:35:48 +00:00
Simon Pilgrim 3125501bba [X86][AVX] Improved AVX512F/AVX512VL SubVectorBroadcast tests
llvm-svn: 279736
2016-08-25 12:50:13 +00:00
Simon Pilgrim 0ad9f3e93b [X86][AVX] Provide SubVectorBroadcast fallback if load fold fails (PR29133)
Fix for PR29133, matching the approach that was taken for AVX1 scalar broadcasts.

llvm-svn: 279735
2016-08-25 12:45:16 +00:00
Matthias Braun 1eb473680a MachineFunctionProperties/MIRParser: Rename AllVRegsAllocated->NoVRegs, compute it
Rename AllVRegsAllocated to NoVRegs. This avoids the connotation of
running after register and simply describes that no vregs are used in
a machine function. With that we can simply compute the property and do
not need to dump/parse it in .mir files.

Differential Revision: http://reviews.llvm.org/D23850

llvm-svn: 279698
2016-08-25 01:27:13 +00:00
Kyle Butt 90e51b1bef Test: Add REQUIRES: asserts to test that now requires stats.
Test was modified in r279670

llvm-svn: 279690
2016-08-25 00:06:52 +00:00
Krzysztof Parzyszek 6dff336ad1 [Hexagon] Check for block end when skipping debug instructions
llvm-svn: 279681
2016-08-24 22:36:35 +00:00
Matthias Braun a319e2cae0 MIRParser/MIRPrinter: Compute HasInlineAsm instead of printing/parsing it
llvm-svn: 279680
2016-08-24 22:34:06 +00:00
Matthias Braun 5dce48e0a7 Missed a test in my last commit
llvm-svn: 279679
2016-08-24 22:32:11 +00:00
Matthias Braun f1b20c5225 MachineRegisterInfo/MIR: Initialize tracksSubRegLiveness early, do not print/parser it
tracksSubRegLiveness only depends on the Subtarget and a cl::opt, there
is not need to change it or save/parse it in a .mir file.
Make the field const and move the initialization LiveIntervalAnalysis to the
MachineRegisterInfo constructor. Also cleanup some code and fix some
instances which better use MachineRegisterInfo::subRegLivenessEnabled() instead
of TargetSubtargetInfo::enableSubRegLiveness().

llvm-svn: 279676
2016-08-24 22:17:45 +00:00
Kyle Butt a8c7371d16 CodeGen: If Convert blocks that would form a diamond when tail-merged.
The following function currently relies on tail-merging for if
conversion to succeed. The common tail of cond_true and cond_false is
extracted, and this then forms a diamond pattern that can be
successfully if converted.

If this block does not get extracted, either because tail-merging is
disabled or the threshold is higher, we should still recognize this
pattern and if-convert it.

Fixed a regression in the original commit. Need to un-reverse branches after
reversing them, or other conversions go awry.

define i32 @t2(i32 %a, i32 %b) nounwind {
entry:
        %tmp1434 = icmp eq i32 %a, %b           ; <i1> [#uses=1]
        br i1 %tmp1434, label %bb17, label %bb.outer

bb.outer:               ; preds = %cond_false, %entry
        %b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ]
        %a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ]
        br label %bb

bb:             ; preds = %cond_true, %bb.outer
        %indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ]
        %tmp. = sub i32 0, %b_addr.021.0.ph
        %tmp.40 = mul i32 %indvar, %tmp.
        %a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph
        %tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph
        br i1 %tmp3, label %cond_true, label %cond_false

cond_true:              ; preds = %bb
        %tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph
        %tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph
        %indvar.next = add i32 %indvar, 1
        br i1 %tmp1437, label %bb17, label %bb

cond_false:             ; preds = %bb
        %tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0
        %tmp14 = icmp eq i32 %a_addr.026.0, %tmp10
        br i1 %tmp14, label %bb17, label %bb.outer

bb17:           ; preds = %cond_false, %cond_true, %entry
        %a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ]
        ret i32 %a_addr.026.1
}

Without tail-merging or diamond-tail if conversion:
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ble     LBB1_3
@ BB#2:                                 @ %cond_true
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r0, r0, r1
        cmp     r1, r0
        it      ne
        cmpne   r0, r1
        bgt     LBB1_4
LBB1_3:                                 @ %cond_false
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r1, r1, r0
        cmp     r1, r0
        bne     LBB1_1
LBB1_4:                                 @ %bb17
        bx      lr

With diamond-tail if conversion, but without tail-merging:
@ BB#0:                                 @ %entry
        cmp     r0, r1
        it      eq
        bxeq    lr
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ite     le
        suble   r1, r1, r0
        subgt   r0, r0, r1
        cmp     r1, r0
        bne     LBB1_1
@ BB#2:                                 @ %bb17
        bx      lr

llvm-svn: 279671
2016-08-24 21:34:27 +00:00
Kyle Butt 6262ca3448 IfConversion: Rescan diamonds.
The cost of predicating a diamond is only the instructions that are not shared
between the two branches. Additionally If a predicate clobbering instruction
occurs in the shared portion of the branches (e.g. a cond move), it may still
be possible to if convert the sub-cfg. This change handles these two facts by
rescanning the non-shared portion of a diamond sub-cfg to recalculate both the
predication cost and whether both blocks are pred-clobbering.

Fixed 2 bugs before recommitting. Branch instructions must be compared and found
identical before diamond conversion. Also, predicate-clobbering instructions in
the shared prefix disqualifies a potential diamond conversion. Includes tests
for both.

llvm-svn: 279670
2016-08-24 21:34:24 +00:00
Changpeng Fang 75f0968b39 AMDGCN/SI: Implement readlane/readfirstlane intrinsics
Summary:
  This patch implements readlane/readfirstlane intrinsics.
TODO: need to define a new register class to consider the case
that the source could be a vector register or M0.

Reviewed by:
  arsenm and tstellarAMD

Differential Revision:
  http://reviews.llvm.org/D22489

llvm-svn: 279660
2016-08-24 20:35:23 +00:00
Rafael Espindola 70c6a3976b Use isTargetMachO instead of isTargetDarwin.
llvm-svn: 279655
2016-08-24 19:02:29 +00:00
Simon Pilgrim e14653e17d [X86][SSE] Add MINSD/MAXSD/MINSS/MAXSS intrinsic scalar load folding support
These are no different in load behaviour to the existing ADD/SUB/MUL/DIV scalar ops but were missing from isNonFoldablePartialRegisterLoad

llvm-svn: 279652
2016-08-24 18:40:53 +00:00
Simon Pilgrim 941bd6bbae [X86][SSE] Add support for combining VZEXT_MOVL target shuffles
Includes adding more general support for the pattern: VZEXT_MOVL(VZEXT_LOAD(ptr)) -> VZEXT_LOAD(ptr)

This has unearthed a couple of latent poor codegen issues (MINSS/MAXSS scalar load folding and MOVDDUP/BROADCAST load folding patterns), which will be fixed shortly.

Its also reduced a couple of tests so that they no longer reach the instruction threshold necessary to be combined to PSHUFB (see PR26183).

llvm-svn: 279646
2016-08-24 18:07:53 +00:00
Tim Northover 65f6336ff9 GlobalISel: fix cmp test to be in SSA form
llvm-svn: 279633
2016-08-24 15:37:51 +00:00
Simon Pilgrim 2725217680 [X86][SSE] Regenerate scalar math load folding tests for 32 and 64 bit targets
llvm-svn: 279630
2016-08-24 15:07:11 +00:00
Wei Ding 1041a646a9 AMDGPU : Add V_SAD_U32 instruction pattern.
Differential Revision: http://reviews.llvm.org/D23069

llvm-svn: 279629
2016-08-24 14:59:47 +00:00
Krzysztof Parzyszek a7ed090bba Create subranges for new intervals resulting from live interval splitting
The register allocator can split a live interval of a register into a set
of smaller intervals. After the allocation of registers is complete, the
rewriter will modify the IR to replace virtual registers with the corres-
ponding physical registers. At this stage, if a register corresponding
to a subregister of a virtual register is used, the rewriter will check
if that subregister is undefined, and if so, it will add the <undef> flag
to the machine operand. The function verifying liveness of the subregis-
ter would assume that it is undefined, unless any of the subranges of the
live interval proves otherwise.
The problem is that the live intervals created during splitting do not
have any subranges, even if the original parent interval did. This could
result in the <undef> flag placed on a register that is actually defined.

Differential Revision: http://reviews.llvm.org/D21189

llvm-svn: 279625
2016-08-24 13:37:55 +00:00
Simon Dardis f114820912 [mips] Preparatory work for a generic scheduler
Extend instruction definitions from nearly all ISAs to include
appropriate instruction itineraries. Change MIPS16s gp prologue
generation to use real instructions instead of using a pseudo
instruction.

Reviewers: dsanders, vkalintiris

Differential Review: https://reviews.llvm.org/D23548

llvm-svn: 279623
2016-08-24 13:00:47 +00:00
Simon Pilgrim 7a50c8c2ba [X86][AVX2] Ensure on 32-bit targets that we broadcast f64 types not i64 (PR29101)
llvm-svn: 279622
2016-08-24 12:42:31 +00:00
Simon Pilgrim 3c8cd3df5e [X86][F16C] Regenerated f16c tests
llvm-svn: 279621
2016-08-24 11:56:15 +00:00
Matthias Braun 733fe3676c CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
Re-apply this patch, hopefully I will get away without any warnings
in the constructor now.

This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279602
2016-08-24 01:52:46 +00:00
Matthias Braun 79f85b3b8f MIRParser/MIRPrinter: Compute isSSA instead of printing/parsing it.
Specifying isSSA is an extra line at best and results in invalid MI at
worst. Compute the value instead.

Differential Revision: http://reviews.llvm.org/D22722

llvm-svn: 279600
2016-08-24 01:32:41 +00:00
Richard Smith 8c3fbdc6c4 Revert r279564. It introduces undefined behavior (binding a reference to a
dereferenced null pointer) in MachineModuleInfo::MachineModuleInfo that causes
-Werror builds (including several buildbots) to fail.

llvm-svn: 279580
2016-08-23 22:08:27 +00:00
Tim Northover d0cfb7344e GlobalISel: add some G_TRUNCs to make icmp test valid MIR.
llvm-svn: 279579
2016-08-23 22:07:31 +00:00
Tim Northover 4bdf473590 GlobalISel: add forgotten test-case for G_ICMP
llvm-svn: 279569
2016-08-23 21:11:36 +00:00
Tim Northover bdf67c9a00 GlobalISel: make truncate/extend casts uniform
They really should have both types represented, but early variants were created
before MachineInstrs could have multiple types so they're rather ambiguous.

llvm-svn: 279567
2016-08-23 21:01:33 +00:00
Tim Northover b3a0be4d38 GlobalISel: legalize conditional branches on AArch64.
llvm-svn: 279565
2016-08-23 21:01:20 +00:00
Matthias Braun 4c1f1f120c CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
Re-apply this commit with the deletion of a MachineFunction delegated to
a separate pass to avoid use after free when doing this directly in
AsmPrinter.

This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279564
2016-08-23 20:58:29 +00:00
Tim Northover 456a3c03ac GlobalISel: mark pointer casts legal on AArch64.
llvm-svn: 279553
2016-08-23 19:30:38 +00:00
Tim Northover 3c73e367c0 GlobalISel: legalize 1-bit load/store and mark 8/16 bit variants legal on AArch64.
llvm-svn: 279548
2016-08-23 18:20:09 +00:00
Simon Pilgrim 95580d6ed2 [X86][SSE] Demonstrate inability to recognise that (v)cvtpd2dq & (v)cvttpd2dq intrinsics implicitly zeroes the upper half of the xmm
llvm-svn: 279527
2016-08-23 16:11:21 +00:00
Krzysztof Parzyszek 38e2ccc8d0 [Hexagon] Packetize return value setup with the return instruction
Commit r279241 unintentionally reverted that ability.

llvm-svn: 279526
2016-08-23 16:01:01 +00:00
Simon Pilgrim 04b99fcded [X86][AVX] Updated fptosi_2f64_to_4i32 test to show missed opportunity to implicit zero the upper elements
llvm-svn: 279521
2016-08-23 15:10:39 +00:00
Simon Pilgrim 22c415a696 [X86][AVX] Add v2i32 fp to int conversion tests
llvm-svn: 279520
2016-08-23 15:00:52 +00:00
Simon Pilgrim cb96142fb8 [X86][AVX] Add AVX2/AVX512 fp to int conversion tests
llvm-svn: 279518
2016-08-23 14:37:35 +00:00
Simon Pilgrim 2ed547513d [X86][SSE] Demonstrate inability to recognise that (v)cvtpd2ps intrinsics implicitly zeroes the upper half of the xmm
llvm-svn: 279511
2016-08-23 11:26:28 +00:00
Simon Pilgrim 9eb978b47b [X86][SSE] Demonstrate inability to recognise that (v)cvtpd2ps implicitly zeroes the upper half of the xmm
llvm-svn: 279508
2016-08-23 10:35:24 +00:00
Oliver Stannard 9aa6f010a4 [ARM] Generate consistent frame records for Thumb2
There is not an official documented ABI for frame pointers in Thumb2,
but we should try to emit something which is useful.

We use r7 as the frame pointer for Thumb code, which currently means
that if a function needs to save a high register (r8-r11), it will get
pushed to the stack between the frame pointer (r7) and link register
(r14). This means that while a stack unwinder can follow the chain of
frame pointers up the stack, it cannot know the offset to lr, so does
not know which functions correspond to the stack frames.

To fix this, we need to push the callee-saved registers in two batches,
with the first push saving the low registers, fp and lr, and the second
push saving the high registers. This is already implemented, but
previously only used for iOS. This patch turns it on for all Thumb2
targets when frame pointers are required by the ABI, and the frame
pointer is r7 (Windows uses r11, so this isn't a problem there). If
frame pointer elimination is enabled we still emit a single push/pop
even if we need a frame pointer for other reasons, to avoid increasing
code size.

We must also ensure that lr is pushed to the stack when using a frame
pointer, so that we end up with a complete frame record. Situations that
could cause this were rare, because we already push lr in most
situations so that we can return using the pop instruction.

Differential Revision: https://reviews.llvm.org/D23516

llvm-svn: 279506
2016-08-23 09:19:22 +00:00
Matthias Braun 7f66202d38 Revert "(HEAD -> master, origin/master, origin/HEAD) CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses"
Reverting while tracking down a use after free.

This reverts commit r279502.

llvm-svn: 279503
2016-08-23 05:17:11 +00:00
Matthias Braun fd936841eb CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279502
2016-08-23 03:20:09 +00:00
Matt Arsenault 567631bdd4 BranchRelaxation: Fix handling of blocks with multiple conditional
branches

Looping over all terminators exposed AArch64 tests hitting
an assert from analyzeBranch failing. I believe these cases
were miscompiled before.

e.g.
  fcmp s0, s1
  b.ne LBB0_1
  b.vc LBB0_2
  b LBB0_2
LBB0_1:
  ; Large block
LBB0_2:
 ; ...

Both of the individual conditional branches need to
be expanded, since neither can reach the final block.

Split the original block into ones which analyzeBranch
will be able to understand.

llvm-svn: 279499
2016-08-23 01:30:30 +00:00
Matt Arsenault 78fc9daf8d AMDGPU: Split SILowerControlFlow into two pieces
Do most of the lowering in a pre-RA pass. Keep the skip jump
insertion late, plus a few other things that require more
work to move out.

One concern I have is now there may be COPY instructions
which do not have the necessary implicit exec uses
if they will be lowered to v_mov_b32.

This has a positive effect on SGPR usage in shader-db.

llvm-svn: 279464
2016-08-22 19:33:16 +00:00
James Molloy 5bf2114265 [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
[Recommitting now an unrelated assertion in SROA is sorted out]

The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables.

This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup.

llvm-svn: 279460
2016-08-22 19:07:15 +00:00
James Molloy 475f4a763f Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd"
This reverts commit r279443. It caused buildbot failures.

llvm-svn: 279447
2016-08-22 18:13:12 +00:00
James Molloy 353052698a [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables.

This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup.

llvm-svn: 279443
2016-08-22 17:40:23 +00:00
Simon Pilgrim c8ad5c069c [X86][AVX] Don't use SubVectorBroadcast if there are additional users of the chain (PR29088)
We could improve on this by making X86SubVBroadcast a full memory intrinsic similar to X86vzload

llvm-svn: 279441
2016-08-22 16:47:55 +00:00
Simon Pilgrim 2279e59573 [X86][SSE] Avoid specifying unused arguments in SHUFPD lowering
As discussed on PR26491, we are missing the opportunity to make use of the smaller MOVHLPS instruction because we set both arguments of a SHUFPD when using it to lower a single input shuffle.

This patch sets the lowered argument to UNDEF if that shuffle element is undefined. This in turn makes it easier for target shuffle combining to decode UNDEF shuffle elements, allowing combines to MOVHLPS to occur.

A fix to match against MOVHPD stores was necessary as well.

This builds on the improved MOVLHPS/MOVHLPS lowering and memory folding support added in D16956

Adding similar support for SHUFPS will have to wait until have better support for target combining of binary shuffles.

Differential Revision: https://reviews.llvm.org/D23027

llvm-svn: 279430
2016-08-22 12:56:54 +00:00
Guy Blank 9ae797a798 [AVX512][FastISel] Do not use K registers in TEST instructions
In some cases, FastIsel was emitting TEST instruction with K reg input, which is illegal.
Changed to using KORTEST when dealing with K regs.

Differential Revision: https://reviews.llvm.org/D23163

llvm-svn: 279393
2016-08-21 08:02:27 +00:00
Duncan P. N. Exon Smith 8f44c98d04 ARM: Avoid dereferencing end() in ARMFrameLowering::emitEpilogue
This fixes the crash from PR29072, where the MachineBasicBlock::iterator
wasn't being properly checked against MachineBasicBlock::end() before
iterating.  This was another bug exposed by the new
ilist::iterator::operator*() assertion from r279314.

This testcase is poor quality.  bugpoint couldn't reduce any further,
and I haven't had time to dig into what's going on so I can't invent a
better one.  I didn't even get good CHECK lines in: this is just a
crasher.

I'm committing anyway since this is a real crash with an obvious fix,
but I'll leave PR29072 open and ask an ARM maintainer to help improve
the testcase.

llvm-svn: 279391
2016-08-21 00:08:10 +00:00
Simon Pilgrim 636422a898 [X86][SSE] Regenerate 32-bit buildvector test
llvm-svn: 279389
2016-08-20 23:09:57 +00:00
Simon Pilgrim ead5076753 [X86][SSE] Regenerate subvector extraction widening test
llvm-svn: 279388
2016-08-20 22:00:53 +00:00
Simon Pilgrim b65d476549 [X86] Regenerate fp truncate tests
llvm-svn: 279387
2016-08-20 21:56:33 +00:00
Simon Pilgrim 8275583be6 Regenerate test
llvm-svn: 279386
2016-08-20 21:37:30 +00:00
Simon Pilgrim a1142579f1 Regenerate test
llvm-svn: 279385
2016-08-20 21:35:45 +00:00
Simon Pilgrim ae9b81e684 [X86][XOP] Tweak vpermil2pd test to stop it being combined away
llvm-svn: 279384
2016-08-20 21:07:41 +00:00
Simon Pilgrim cb0ba1067f [X86][SSE] Added vector interleave test (PR21281)
llvm-svn: 279375
2016-08-20 17:07:38 +00:00
Tim Northover a11be04769 GlobalISel: support legalization of G_FCONSTANTs
llvm-svn: 279341
2016-08-19 22:40:08 +00:00
Tim Northover ea904f9424 GlobalISel: teach legalizer how to handle integer constants.
llvm-svn: 279340
2016-08-19 22:40:00 +00:00
Matthias Braun a7d6fc9618 MachineFunction: Cleanup/simplify MachineFunctionProperties::print()
- Always compile print() regardless of LLVM_ENABLE_DUMP. (We usually
  only gard dump() functions with that).
- Only show the set properties to reduce output clutter.
- Remove the unused variant that even shows the unset properties.
- Fix comments

llvm-svn: 279338
2016-08-19 22:31:45 +00:00
Tim Northover b78e4cafde GlobalISel: translate floating-point round/extend
llvm-svn: 279320
2016-08-19 20:48:23 +00:00
Tim Northover d5c23bcfc9 GlobalISel: translate floating-point comparisons
llvm-svn: 279319
2016-08-19 20:48:16 +00:00
Justin Lebar d13880a750 [NVPTX] Switch nvptx-use-infer-addrspace to true.
Summary:
This switches us to use a different, more powerful algorithm for address
space inference.  I've tested this locally and it seems to work great.
Once we're more confident in it, we can remove the old pass altogether.

Reviewers: jingyue

Subscribers: llvm-commits, tra, jholewinski

Differential Revision: https://reviews.llvm.org/D23694

llvm-svn: 279317
2016-08-19 20:46:45 +00:00
Reid Kleckner 98a48afa5d Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd"
This reverts commit r279229. It breaks intrinsic function calls in
diamonds.

llvm-svn: 279313
2016-08-19 20:22:39 +00:00
Tim Northover b16734fbaa GlobalISel: translate floating-point constants
llvm-svn: 279311
2016-08-19 20:09:15 +00:00
Tim Northover d3761cd165 GlobalISel: translate float/int conversion instructions.
llvm-svn: 279310
2016-08-19 20:09:11 +00:00
Tim Northover 5a28c3642f GlobalISel: support translating select instructions.
llvm-svn: 279309
2016-08-19 20:09:07 +00:00
Tim Northover bbbfb1cfb8 GlobalISel: translate insertvalue instructions.
This adds a G_INSERT instruction, which technically makes G_SEQUENCE redundant
(it's equivalent to a G_INSERT into an IMPLICIT_DEF). We'll leave G_SEQUENCE
for now though: it's likely to be far more common as it's a fundamental part of
legalization, so avoiding the mess and bloat of the extra IMPLICIT_DEFs is
probably worthwhile.

llvm-svn: 279306
2016-08-19 20:08:55 +00:00
Krzysztof Parzyszek 021151d6c1 [Hexagon] Add RUN line to test
llvm-svn: 279304
2016-08-19 19:36:35 +00:00
Krzysztof Parzyszek 505eb498bd [Hexagon] Allow i1 values for 'r' constraint in inline-asm
llvm-svn: 279302
2016-08-19 19:17:28 +00:00
Tim Northover 26b76f2c59 GlobalISel: improve representation of G_SEQUENCE and G_EXTRACT
First, make sure all types involved are represented, rather than being implicit
from the register width.

Second, canonicalize all types to scalar. These operations just act in bits and
don't care about vectors.

Also standardize spelling of Indices in the MachineIRBuilder (NFC here).

llvm-svn: 279294
2016-08-19 18:32:14 +00:00
Kyle Butt ce0196de3f Revert "CodeGen: If Convert blocks that would form a diamond when tail-merged."
This reverts commit 0fda93481c4231c06b838ef476c0c404c51ff875.

llvm-svn: 279288
2016-08-19 18:17:04 +00:00
Tim Northover 2fa5fa391f GlobalISel: allow extractvalue to extract an aggregate.
llvm-svn: 279287
2016-08-19 18:09:41 +00:00
Krzysztof Parzyszek 3d9946eb23 [Hexagon] Fixes for new-value jump formation
- Recognize C2_cmpgtui, S2_tstbit_i, and S4_ntstbit_i.
- Avoid creating new-value instructions with both source operands equal.

llvm-svn: 279286
2016-08-19 17:54:49 +00:00
Tim Northover 6f80b08c64 GlobalISel: support translation of extractvalue instructions.
llvm-svn: 279285
2016-08-19 17:47:05 +00:00
Tim Northover 91c8173093 GlobalISel: support overflow arithmetic intrinsics.
Unsigned addition and subtraction can reuse the instructions created to
legalize large width operations (i.e. both produce and consume a carry flag).
Signed operations and multiplies get a dedicated op-with-overflow instruction.

Once this is produced the two values are combined into a struct register (which
will almost always be merged with a corresponding G_EXTRACT as part of
legalization).

llvm-svn: 279278
2016-08-19 17:17:06 +00:00
Simon Pilgrim d7a3782ae4 [X86][SSE] Generalised combining to VZEXT_MOVL to any vector size
This doesn't change tests codegen as we already combined to blend+zero which is what we lower VZEXT_MOVL to on SSE41+ targets, but it does put us in a better position when we improve shuffling for optsize.

llvm-svn: 279273
2016-08-19 17:02:00 +00:00
Krzysztof Parzyszek 639545b4d8 [Hexagon] Enforce LLSC packetization rules
Ensure that load locked and store conditional instructions are only
packetized with ALU32 instructions.

Patch by Ben Craig.

llvm-svn: 279272
2016-08-19 16:57:05 +00:00
Krzysztof Parzyszek 9335bf0ec5 [Hexagon] Fix incorrect generation of S4_subi_asl_ri
Patch by Jyotsna Verma.

llvm-svn: 279267
2016-08-19 16:35:05 +00:00
Krzysztof Parzyszek 0ba9754584 [Hexagon] Allow tail-call optimization when mixing C and fast calling conv
Patch by Arnold Schwaighofer.

llvm-svn: 279251
2016-08-19 15:02:18 +00:00
Krzysztof Parzyszek 66dd6797e8 [Hexagon] Check for empty live interval
Patch by Brendon Cahoon.

llvm-svn: 279249
2016-08-19 14:29:43 +00:00
Anton Korobeynikov b38195c1a8 Revert r279242 - it's failing the tests
llvm-svn: 279247
2016-08-19 14:18:34 +00:00
Anton Korobeynikov 2aae31a945 Fix PR27500: on MSP430 the branch destination offset is measured in words, not bytes.
In addition, the branch instructions will have proper BB destinations, not offsets, like before.

Patch by Vadzim Dambrouski!

Differential Revision: https://reviews.llvm.org/D20162

llvm-svn: 279242
2016-08-19 14:07:10 +00:00
Krzysztof Parzyszek 6421b934ec [Hexagon] Mark PS_jumpret as pseudo-instruction, expand it into J2_jumpr
llvm-svn: 279241
2016-08-19 14:04:45 +00:00
Krzysztof Parzyszek bd8ef4b8ce [Hexagon] Improvements to handling and generation of FP instructions
Improved handling of fma, floating point min/max, additional load/store
instructions for floating point types.

Patch by Jyotsna Verma.

llvm-svn: 279239
2016-08-19 13:34:31 +00:00
Simon Pilgrim f1b8fdc074 [X86][SSE] Add support for matching commuted insertps patterns
INSERTPS doesn't fit well with our shuffle mask canonicalization, so we need to attempt both the original mask and the commuted mask to more likely get a match

llvm-svn: 279230
2016-08-19 10:31:53 +00:00
James Molloy 11a1936b70 [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

llvm-svn: 279229
2016-08-19 10:10:27 +00:00
James Molloy 7ee640f9b6 [CodeGen] Fix a trivial type conversion bug dating back to pre-2008
The heuristic above this code is incredibly suspect, but disregarding that it mutates the cast opcode so we need to check the *mutated* opcode later to see if we need to emit an AssertSext or AssertZext node.

Fixes PR29041.

llvm-svn: 279223
2016-08-19 08:38:50 +00:00
Dean Michael Berris 1dd1ca9727 [XRay] Synthesize a reference to the xray_instr_map
Without the synthesized reference to a symbol in the xray_instr_map,
linker section garbage collection will helpfully remove the whole
xray_instr_map section from the final executable (or archive). This will
cause the runtime to not be able to identify the sleds and hot-patch the
calls/jumps into the runtime trampolines.

This change adds a reference from the text section at the end of the
function to keep around the associated xray_instr_map section as well.

We also make sure that we catch this reference in the test.

Reviewers: chandlerc, echristo, majnemer, mehdi_amini

Subscribers: mehdi_amini, llvm-commits, dberris

Differential Revision: https://reviews.llvm.org/D23398

llvm-svn: 279204
2016-08-19 04:44:30 +00:00
Matthias Braun fdc4c6b426 Revert "RegScavenging: Add scavengeRegisterBackwards()"
The ppc64 multistage bot fails on this.

This reverts commit r279124.

Also Revert "CodeGen: Add/Factor out LiveRegUnits class; NFCI" because it depends on the previous change
This reverts commit r279171.

llvm-svn: 279199
2016-08-19 03:03:24 +00:00
Kyle Butt 780b517d6b CodeGen: If Convert blocks that would form a diamond when tail-merged.
The following function currently relies on tail-merging for if
conversion to succeed. The common tail of cond_true and cond_false is
extracted, and this then forms a diamond pattern that can be
successfully if converted.

If this block does not get extracted, either because tail-merging is
disabled or the threshold is higher, we should still recognize this
pattern and if-convert it.

Fixed a regression in the original commit. Need to un-reverse branches after
reversing them, or other conversions go awry.

Regression on self-hosting bots with no obvious explanation. Tidied up range
handling to be more obviously correct, but there was no smoking gun.

define i32 @t2(i32 %a, i32 %b) nounwind {
entry:
        %tmp1434 = icmp eq i32 %a, %b           ; <i1> [#uses=1]
        br i1 %tmp1434, label %bb17, label %bb.outer

bb.outer:               ; preds = %cond_false, %entry
        %b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ]
        %a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ]
        br label %bb

bb:             ; preds = %cond_true, %bb.outer
        %indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ]
        %tmp. = sub i32 0, %b_addr.021.0.ph
        %tmp.40 = mul i32 %indvar, %tmp.
        %a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph
        %tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph
        br i1 %tmp3, label %cond_true, label %cond_false

cond_true:              ; preds = %bb
        %tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph
        %tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph
        %indvar.next = add i32 %indvar, 1
        br i1 %tmp1437, label %bb17, label %bb

cond_false:             ; preds = %bb
        %tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0
        %tmp14 = icmp eq i32 %a_addr.026.0, %tmp10
        br i1 %tmp14, label %bb17, label %bb.outer

bb17:           ; preds = %cond_false, %cond_true, %entry
        %a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ]
        ret i32 %a_addr.026.1
}

Without tail-merging or diamond-tail if conversion:
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ble     LBB1_3
@ BB#2:                                 @ %cond_true
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r0, r0, r1
        cmp     r1, r0
        it      ne
        cmpne   r0, r1
        bgt     LBB1_4
LBB1_3:                                 @ %cond_false
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r1, r1, r0
        cmp     r1, r0
        bne     LBB1_1
LBB1_4:                                 @ %bb17
        bx      lr

With diamond-tail if conversion, but without tail-merging:
@ BB#0:                                 @ %entry
        cmp     r0, r1
        it      eq
        bxeq    lr
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ite     le
        suble   r1, r1, r0
        subgt   r0, r0, r1
        cmp     r1, r0
        bne     LBB1_1
@ BB#2:                                 @ %bb17
        bx      lr

llvm-svn: 279168
2016-08-18 22:09:27 +00:00
Zhan Jun Liau cf2f4b3251 [SystemZ] Use valid base/index regs for inline asm
Summary:
Inline asm memory constraints can have the base or index register be assigned
to %r0 right now. Make sure that we assign only ADDR64 registers to the base
and index.

Reviewers: uweigand

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23367

llvm-svn: 279157
2016-08-18 21:44:15 +00:00
Tom Stellard a1619cd9aa AMDGPU/SI: Fix a test in wqm.ll to always use s_cbranch_vcc*
Summary:
We need to use floating-point compares to ensure that s_cbranch_vcc*
instructions are always generated.  With integer compares, future
optimizations could cause s_cbranch_scc* to be generated instead.

Reviewers: arsenm, nhaehnle

Subscribers: llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23401

llvm-svn: 279148
2016-08-18 21:21:53 +00:00
Wei Ding 52bb661dec AMDGPU : Fix QSAD and MQSAD instructions' incorrect data type.
Differential Revision: http://reviews.llvm.org/D23689

llvm-svn: 279126
2016-08-18 19:51:14 +00:00
Matthias Braun 075d0c23d5 RegScavenging: Add scavengeRegisterBackwards()
Re-apply r276044 with off-by-1 instruction fix for the reload placement.

This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.

This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.

Differential Revision: http://reviews.llvm.org/D21885

llvm-svn: 279124
2016-08-18 19:47:59 +00:00
Simon Pilgrim 99fd9c5f56 [X86][SSE] Missed insertps shuffle patterns
llvm-svn: 279111
2016-08-18 18:19:28 +00:00
Valery Pykhtin 609c2f8137 [AMDGPU] add s_incperflevel/s_decperflevel intrinsics.
Differential revision: https://reviews.llvm.org/D23666

llvm-svn: 279106
2016-08-18 18:06:20 +00:00
Elliot Colp 687691aeac Fix SystemZ compilation abort caused by negative AND mask
Normally, when an AND with a constant is lowered to NILL, the constant value is truncated to 16 bits. However, since r274066, ANDs whose results are used in a shift are caught by a different pattern that does not truncate. The instruction printer expects a 16-bit unsigned immediate operand for NILL, so this results in an abort.

This patch adds code to manually truncate the constant in this situation. The rest of the bits are then set, so we will detect a case for NILL "naturally" rather than using peephole optimizations.

Differential Revision: http://reviews.llvm.org/D21854

llvm-svn: 279105
2016-08-18 18:04:26 +00:00
Duncan P. N. Exon Smith 84c2da47f9 AArch64: Don't call getIterator() on iterators
Remove an unnecessary round-trip:

    iterator => operator->() => getIterator()

In some cases, the iterator is end(), so the dereference of operator->
is invalid (UB).

The testcase only crashes with r278974 (currently reverted to
investigate this), which adds an assertion for invalid dereferences of
ilist nodes.

Fixes PR29035.

llvm-svn: 279104
2016-08-18 17:58:09 +00:00
Dan Gohman c9623db884 [WebAssembly] Disable the store-results optimization.
The WebAssemly spec removing the return value from store instructions, so
remove the associated optimization from LLVM.

This patch leaves the store instruction operands in place for now, so stores
now always write to "$drop"; these will be removed in a seperate patch.

llvm-svn: 279100
2016-08-18 17:51:27 +00:00
Ahmed Bougacha 33e19fe1c4 [AArch64][GlobalISel] Select floating-point binary ops.
There is no FREM instruction, but the others are straightforward.

llvm-svn: 279081
2016-08-18 16:05:11 +00:00
Ahmed Bougacha 71d033a17f [GlobalISel] Add floating-point binary ops.
llvm-svn: 279080
2016-08-18 16:05:06 +00:00
Derek Schuff ccdceda128 [WebAssembly] Refactor WebAssemblyLowerEmscriptenException pass for setjmp/longjmp
This patch changes the code structure of
WebAssemblyLowerEmscriptenException pass to support both exception
handling and setjmp/longjmp. It also changes the name of the pass and
the source file.

1. Change the file/pass name to WebAssemblyLowerEmscriptenExceptions ->
WebAssemblyLowerEmscriptenEHSjLj to make it clear that it supports both
EH and SjLj
2. List function / global variable names at the top so they
can be changed easily
3. Some cosmetic changes

Patch by Heejin Ahn

Differential Revision: https://reviews.llvm.org/D23588

llvm-svn: 279075
2016-08-18 15:27:25 +00:00
Ahmed Bougacha 1d0560b14d [AArch64][GlobalISel] Select G_SDIV/G_UDIV.
There is no REM instruction; that will require an expansion.
It's not obvious that should be done in select, rather than as a
(custom?) legalization.

llvm-svn: 279074
2016-08-18 15:17:13 +00:00
Ahmed Bougacha 13db94540c [GlobalISel] Add support for DIV/REM.
llvm-svn: 279073
2016-08-18 15:17:01 +00:00
Krzysztof Parzyszek b1b0372337 [Hexagon] Create vcombine in HexagonCopyToCombine
llvm-svn: 279067
2016-08-18 14:12:34 +00:00
Simon Pilgrim ab7c46eccf [X86][SSE] Add SSE1 tests to make sure we don't merge loads on illegal types
llvm-svn: 279065
2016-08-18 13:41:26 +00:00
Simon Dardis ea3431598e [mips] Correct tail call encoding for MIPSR6
r277708 enabled tails calls for MIPS but used the 'jr' instruction when the 
jump target was held in a register. For MIPSR6, 'jalr $zero, $reg' should
have been used. Additionally, add missing patterns for external and global
symbols for tail calls.

Reviewers: dsanders, vkalintiris

Differential Review: https://reviews.llvm.org/D23301

llvm-svn: 279064
2016-08-18 13:22:43 +00:00
Matthias Braun c9130ea6a3 Testcase for r279022
llvm-svn: 279031
2016-08-18 02:21:54 +00:00
Tim Northover de3aea0412 GlobalISel: support irtranslation of icmp instructions.
llvm-svn: 278969
2016-08-17 20:25:25 +00:00
Marina Yatsina 53ce3f9d02 Fix for PR29010
This is a fix for https://llvm.org/bugs/show_bug.cgi?id=29010
Root cause of the bug is that the register class of the machine instruction operand does not fully reflect if this registers that can be allocated.
Both for i386 and x86_64 the operand's register class is VR128RegClass and thus contains xmm0-xmm15, though in i386 we can only use xmm0-xmm8.
In order to get the actual allocable registers of the class we need to use RegisterClassInfo.

Differential Revision: https://reviews.llvm.org/D23613

llvm-svn: 278954
2016-08-17 19:07:40 +00:00
Jonas Paulsson 7a79422536 [LoopStrenghtReduce] Refactoring and addition of a new target cost function.
Refactored so that a LSRUse owns its fixups, as oppsed to letting the
LSRInstance own them. This makes it easier to rate formulas for
LSRUses, since the fixups are available directly. The Offsets vector
has been removed since it was no longer necessary.

New target hook isFoldableMemAccessOffset(), which is used during formula
rating.

For SystemZ, this is useful to express that loads and stores with
float or vector types with a big/negative offset should be avoided in
loops. Without this, LSR will generate a lot of negative offsets that
would require extra instructions for loading the address.

Updated tests:
test/CodeGen/SystemZ/loop-01.ll

Reviewed by: Quentin Colombet and Ulrich Weigand.
https://reviews.llvm.org/D19152

llvm-svn: 278927
2016-08-17 13:24:19 +00:00
Ayman Musa 71b43c5c1d Fix bug in DAGBuilder for getelementptr with expanded vector.
Replacing the usage of MVT with EVT in case the vector type is expanded.
Differential Revision: https://reviews.llvm.org/D23306

llvm-svn: 278913
2016-08-17 07:52:15 +00:00
Chuang-Yu Cheng f7ba716bcb [ppc64] Don't apply sibling call optimization if callee has any byval arg
This is a quick work around, because in some cases, e.g. caller's stack
size > callee's stack size, we are still able to apply sibling call
optimization even callee has any byval arg.

This patch fix: https://llvm.org/bugs/show_bug.cgi?id=28328

Reviewers: hfinkel kbarton nemanjai amehsan
Subscribers: hans, tjablin

https://reviews.llvm.org/D23441

llvm-svn: 278900
2016-08-17 03:17:44 +00:00
Kyle Butt 07d61425e3 Codegen: Don't tail-duplicate blocks with un-analyzable fallthrough.
If AnalyzeBranch can't analyze a block and it is possible to
fallthrough, then duplicating the block doesn't make sense, as only one
block can be the layout predecessor for the un-analyzable fallthrough.

Submitted wit a test case, but NOTE: the test case doesn't currently
fail. However, the test case fails with D20505 and would have saved me
some time debugging.

llvm-svn: 278866
2016-08-16 22:56:14 +00:00
Sanjay Patel 904cd39b05 [x86] Allow merging multiple instances of an immediate within a basic block for code size savings, for 64-bit constants.
This patch handles 64-bit constants which can be encoded as 32-bit immediates.

It extends the functionality added by https://reviews.llvm.org/D11363 for 32-bit constants to 64-bit constants.

Patch by Sunita Marathe!

Differential Revision: https://reviews.llvm.org/D23391

llvm-svn: 278857
2016-08-16 21:35:16 +00:00
Haicheng Wu 9780df5385 [BranchFolding] Change a test case of r278575.
Rename the operands to make the test less brittle.

llvm-svn: 278841
2016-08-16 20:06:25 +00:00
Sjoerd Meijer 15c81b05ea [MBP] do not reorder and move up loop latch block
Do not reorder and move up a loop latch block before a loop header
when optimising for size because this will generate an extra 
unconditional branch.

Differential Revision: https://reviews.llvm.org/D22521

llvm-svn: 278840
2016-08-16 19:50:33 +00:00
Simon Dardis 4893aff94e [mips] Enforce compact branch restrictions
Check both operands for use of the $zero register which cannot be used with
a compact branch instruction.

Reviewers: dsanders, vkalintris

Differential Review: https://reviews.llvm.org/D23547

llvm-svn: 278824
2016-08-16 17:16:11 +00:00
Wei Mi db68c9adbd Remove a stale comment from the test, NFC.
llvm-svn: 278821
2016-08-16 16:57:15 +00:00
Ahmed Bougacha e4c03abddd [AArch64][GlobalISel] Select G_MUL.
llvm-svn: 278810
2016-08-16 14:37:46 +00:00
Brendon Cahoon 65b6ebccad [Pipeliner] Fix an asssert due to invalid Phi in the epilog
The pipeliner was generating an invalid Phi name for an operand
in the epilog block, which caused an assert in the live variable
analysis pass. The fix is to the code that generates new Phis
in the epilog block. In this case, there is an existing Phi that
needs to be reused rather than creating a new Phi instruction.

Differential Revision: https://reviews.llvm.org/D23513

llvm-svn: 278805
2016-08-16 14:29:24 +00:00
Ahmed Bougacha 2ac5bf94bc [AArch64][GlobalISel] Select (variable) shifts.
For now, no support for immediates.

llvm-svn: 278804
2016-08-16 14:02:47 +00:00
Ahmed Bougacha 7e508a8fcd [AArch64][GlobalISel] Robustize select tests. NFC.
Using the same register means nothing was checking for operand order.

llvm-svn: 278803
2016-08-16 14:02:44 +00:00
Ahmed Bougacha 0306b5ef07 [AArch64][GlobalISel] Select p0 G_FRAME_INDEX.
And mark it as legal.

llvm-svn: 278802
2016-08-16 14:02:42 +00:00
Simon Pilgrim 25d2506029 [X86][AVX] Fixed typo in zero element insertion
llvm-svn: 278798
2016-08-16 13:33:33 +00:00
Ron Lieberman a481c7db93 [Hexagon] Improve test to check for @PCREL, only run llc, not opt -> llc.
llvm-svn: 278796
2016-08-16 13:10:09 +00:00
Simon Pilgrim cc316f013a [X86][SSE] Add support for combining v2f64 target shuffles to VZEXT_MOVL byte rotations
The combine was only matching v2i64 as it assumed lowering to MOVQ - but we have v2f64 patterns that match in a similar fashion

llvm-svn: 278794
2016-08-16 12:52:06 +00:00
Simon Pilgrim d2d3202532 [X86][AVX512BW] Updated tests to demonstrate AVX512BW's inability to vectorize v64i8 shifts
llvm-svn: 278790
2016-08-16 11:05:47 +00:00
Simon Pilgrim f16cd361d4 [X86][SSE] Add support for combining target shuffles to PALIGNR byte rotations
llvm-svn: 278787
2016-08-16 10:03:23 +00:00
Guy Blank 722caebdae [X86] Add xgetbv/xsetbv intrinsics to non-windows platforms
Differential Revision: https://reviews.llvm.org/D21958

llvm-svn: 278782
2016-08-16 06:41:00 +00:00
Eli Friedman 98151d6440 Fix typo in lowering for fp128 ueq.
Regression from r259791.

Differential Revision: https://reviews.llvm.org/D23374

llvm-svn: 278750
2016-08-15 21:46:19 +00:00
Jan Vesely 0486f739a4 AMDGPU/R600: Convert buffer id to VTX_READ input
Use patterns instead of multiple instructions
Add buffer id to asm string

https://reviews.llvm.org/D22650

llvm-svn: 278749
2016-08-15 21:38:30 +00:00
Tim Northover 28fdc4272d GlobalISel: support loads and stores of strange types.
Before we mischaracterized structs and i1 types as a scalar with size 0 in
various ways.

llvm-svn: 278744
2016-08-15 21:13:17 +00:00
Reid Kleckner bb8652312a Fix WAsm test after LSR change in r278658
Now the increment is done in a different location

llvm-svn: 278713
2016-08-15 18:51:42 +00:00
Matt Arsenault 3661e90e71 AMDGPU: Don't fold subregister extracts into tied operands
llvm-svn: 278676
2016-08-15 16:18:36 +00:00
Reid Kleckner 70a600b8bb Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd"
This reverts commit r278660.

It causes downstream assertion failure in InstCombine on shuffle
instructions. Comes up in __mm_swizzle_epi32.

llvm-svn: 278672
2016-08-15 15:42:31 +00:00
James Molloy 9a3c82f5cf [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

llvm-svn: 278660
2016-08-15 08:04:56 +00:00
James Molloy 196ad0823e [LSR] Don't try and create post-inc expressions on non-rotated loops
If a loop is not rotated (for example when optimizing for size), the latch is not the backedge. If we promote an expression to post-inc form, we not only increase register pressure and add a COPY for that IV expression but for all IVs!

Motivating testcase:

    void f(float *a, float *b, float *c, int n) {
      while (n-- > 0)
        *c++ = *a++ + *b++;
    }

It's imperative that the pointer increments be located in the latch block and not the header block; if not, we cannot use post-increment loads and stores and we have to keep both the post-inc and pre-inc values around until the end of the latch which bloats register usage.

llvm-svn: 278658
2016-08-15 07:53:03 +00:00
Igor Breger 505f2cc468 [AVX512] Fix VFPCLASSSD/VFPCLASSSS intrinsic lowering. The i1 result should be zero extended according to SPEC.
Differential Revision: http://reviews.llvm.org/D23489

llvm-svn: 278626
2016-08-14 13:58:57 +00:00
Igor Breger 6fc00b0acf autogenerate checks
llvm-svn: 278624
2016-08-14 09:34:39 +00:00
Igor Breger 8672408db0 [AVX512] Fix insertelement i1 lowering.
1. Use shuffle to insert element i1 into vector. The previous implementation was incorrect ( dest_bit OR src_bit , it doesn't clear the bit if src_bit=0 )
2. Improve shuffle i1 vector, use CVT2MASK if supported instead TRUNCATE.

Differential Revision: http://reviews.llvm.org/D23347

llvm-svn: 278623
2016-08-14 05:25:07 +00:00
Diana Picus 68be1eb885 Revert "CodeGen: If Convert blocks that would form a diamond when tail-merged."
This reverts commit r278287.

This commit broke the clang-cmake-thumbv7-a15-full-sh bot.
See https://llvm.org/bugs/show_bug.cgi?id=28949

llvm-svn: 278621
2016-08-14 02:10:18 +00:00
Diana Picus 35ccf53e75 Revert "Codegen: Don't tail-duplicate blocks with un-analyzable fallthrough."
This reverts commit r278288.

r278287 broke the clang-cmake-thumbv7-a15-full-sh bot.
Revert this so we can get to r278287.

llvm-svn: 278620
2016-08-14 02:10:12 +00:00
Ron Lieberman 822ee88ab8 Fix unsupported relocation type R_HEX_6_X' for symbol .rodata
LowerTargetConstantPool is not properly setting the TargetFlag to indicate
desired relocation. Coding error, the offset parameter was omitted, so the
TargetFlag was used as the offset, and the TargetFlag defaulted to zero.

This only affects -fpic compilation, and only those items created in a
Constant Pool, for example a vector of constants. Halide ran into this issue.

llvm-svn: 278614
2016-08-13 23:41:11 +00:00
Mehdi Amini 8c629ecf3a Revert "Revert "Invariant start/end intrinsics overloaded for address space""
This reverts commit 32fc6488e48eafc0ca1bac1bd9cbf0008224d530.

llvm-svn: 278609
2016-08-13 23:31:24 +00:00
Mehdi Amini 164ac651da Revert "Invariant start/end intrinsics overloaded for address space"
This reverts commit r276447.

llvm-svn: 278608
2016-08-13 23:27:32 +00:00
Sanjay Patel 08c876673e [x86] add tests to show missed 64-bit immediate merging
Tests are slightly modified versions of those written by
Sunita Marathe in D23391.

llvm-svn: 278599
2016-08-13 18:42:14 +00:00
Craig Topper 3f8126e6fa [AVX-512] Remove an AddedComplexity that was prioritizing basic vzmovl patterns over more complex ones that produce better code.
llvm-svn: 278593
2016-08-13 05:43:20 +00:00
Craig Topper 600685d510 [AVX-512] Add patterns to support VZEXT_MOVL from 512-bit vectors with 64-bit and 32-bit elements.
Fixes PR28961.

llvm-svn: 278592
2016-08-13 05:33:12 +00:00
Matt Arsenault 3cc1e0066d AMDGPU: Fix missing test for addressing mode with odd offsets
Add test if the constant offset looks unaligned.

llvm-svn: 278589
2016-08-13 01:43:51 +00:00
Dominic Chen 4a9b99ee92 [WebAssembly] Re-enable disabled debug value test
Summary:
This test was resulting in asan/valgrind failures due to undefined
DWARF register mappings for WebAssembly, and was disabled in r278495.
These have been resolved.

Reviewers: sunfish, dschuff

Subscribers: bkramer, llvm-commits, jfb

Differential Revision: https://reviews.llvm.org/D23459

llvm-svn: 278576
2016-08-12 23:14:18 +00:00
Haicheng Wu 7c4535d1e7 Reapply [BranchFolding] Restrict tail merging loop blocks after MBP
Fixed a bug in the test case.

To fix PR28104, this patch restricts tail merging to blocks that belong to the
same loop after MBP.

llvm-svn: 278575
2016-08-12 23:13:38 +00:00
Artem Belevich 2f0a3dfe64 [NVPTX] Use untyped (.b) integer registers in PTX.
This bring LLVM-generated PTX closer to what nvcc generates and avoids
triggering issues in ptxas.

For instance, ptxas does not accept .s16 (or .u16) registers as operands
for .fp16 instructions.

Differential Revision: https://reviews.llvm.org/D23460

llvm-svn: 278568
2016-08-12 22:02:19 +00:00
Eli Friedman f184e4befc [AArch64LoadStoreOptimizer] Check aliasing correctly when creating paired loads/stores.
The existing code accidentally skipped the aliasing check in edge cases.

Differential revision: https://reviews.llvm.org/D23372

llvm-svn: 278562
2016-08-12 20:39:51 +00:00
Eli Friedman 8585e9d33d [AArch64LoadStoreOpt] Handle offsets correctly for post-indexed paired loads.
Trunk would try to create something like "stp x9, x8, [x0], #512", which isn't actually a valid instruction.

Differential revision: https://reviews.llvm.org/D23368

llvm-svn: 278559
2016-08-12 20:28:02 +00:00
Artur Pilipenko 87e4038a91 [x86] X86ISelLowering zext(add_nuw(x, C)) --> add(zext(x), C_zext)
Currently X86ISelLowering has a similar transformation for sexts:
sext(add_nsw(x, C)) --> add(sext(x), C_sext)

In this change I extend this code to handle zexts as well.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D23359

llvm-svn: 278520
2016-08-12 16:08:30 +00:00
James Y Knight 2cc9da9a65 Revert "[Sparc] Leon errata fix passes."
...and the two followup commits:
Revert "[Sparc][Leon] Missed resetting option flags from check-in 278489."
Revert "[Sparc][Leon] Errata fixes for various errata in different
versions of the Leon variants of the Sparc 32 bit processor."

This reverts commit r274856, r278489, and r278492.

llvm-svn: 278511
2016-08-12 14:48:09 +00:00
Simon Pilgrim 687d71e877 [X86][SSE] Add support for combining target shuffles to PSLLDQ/PSRLDQ byte shifts
llvm-svn: 278502
2016-08-12 11:24:34 +00:00
Benjamin Kramer 05e760ec4b [Webassembly] disable unstable test.
It reads uninitialized memory and crashes randomly.

llvm-svn: 278495
2016-08-12 10:13:45 +00:00
Simon Pilgrim ed96b9adfb [X86][SSE] Fixed PALIGNR target shuffle decode
The PALIGNR target shuffle decode was not taking into account that DecodePALIGNRMask (rather oddly) expects the operands to be in reverse order, nor was it detecting unary patterns, causing combines to combine with the incorrect input.

The cgbuiltin, auto upgrade and instruction comments code correctly swap the operands so are not affected.

llvm-svn: 278494
2016-08-12 10:10:51 +00:00
Chris Dewhurst 829f8efe55 [Sparc][Leon] Errata fixes for various errata in different versions of the Leon variants of the Sparc 32 bit processor.
The nature of the errata are listed in the comments preceding the errata fix passes. Relevant unit tests are implemented for each of these.

These changes update older versions of these errata fixes with improvements to code and unit tests.

Differential Revision: https://reviews.llvm.org/D21960

llvm-svn: 278489
2016-08-12 09:34:26 +00:00
Haicheng Wu d9cbb1608f Revert "[BranchFolding] Restrict tail merging loop blocks after MBP"
This reverts commit r278463 because it hits the bot.

llvm-svn: 278484
2016-08-12 08:40:24 +00:00
Wei Mi 7e103d92cc Recommit 'Remove the restriction that MachineSinking is now stopped by
"insert_subreg, subreg_to_reg, and reg_sequence" instructions' after
adjusting some unittest checks.

This is to solve PR28852. The restriction was added at 2010 to make better register
coalescing. We assumed that it was not necessary any more. Testing results on x86
supported the assumption.

We will look closely to any performance impact it will bring and will be prepared
to help analyzing performance problem found on other architectures.

Differential Revision: https://reviews.llvm.org/D23210

llvm-svn: 278466
2016-08-12 03:33:22 +00:00
Haicheng Wu ea02372059 [BranchFolding] Restrict tail merging loop blocks after MBP
To fix PR28014, this patch restricts tail merging to blocks that belong to the
same loop after MBP.

Differential Revision: https://reviews.llvm.org/D23191

llvm-svn: 278463
2016-08-12 03:30:23 +00:00
Vyacheslav Klochkov 6daefcf626 X86-FMA3: Implemented commute transformation for EVEX/AVX512 FMA3 opcodes.
This helped to improved memory-folding and register coalescing optimizations.

Also, this patch fixed the tracker #17229.

Reviewer: Craig Topper.
Differential Revision: https://reviews.llvm.org/D23108

llvm-svn: 278431
2016-08-11 22:07:33 +00:00
Tim Northover 8e0c53a018 GlobalISel: support 'null' constant in translation.
It's sharing the integer G_CONSTANT for now since I don't *think* it creates
any ambiguity (even on weird archs). If that turns out wrong we can create a
G_PTRCONSTANT or something.

llvm-svn: 278423
2016-08-11 21:40:55 +00:00
Krzysztof Parzyszek 1b689da04e [Hexagon] Allow non-returning calls in hardware loops
llvm-svn: 278416
2016-08-11 21:14:25 +00:00
Tim Northover da6f5f2d0a Remove empty file left by partial reversion.
llvm-svn: 278411
2016-08-11 21:01:15 +00:00
Tim Northover 30e67ce793 GlobalISel: add translation support for shift operations.
llvm-svn: 278410
2016-08-11 21:01:13 +00:00
Tim Northover f1f7bf1279 GlobalISel: support zext & sext during translation phase.
llvm-svn: 278409
2016-08-11 21:01:10 +00:00
Wei Ding 70cda07526 AMDGPU : Add intrinsic for instruction v_cvt_pk_u8_f32
Differential Revision: http://reviews.llvm.org/D23336

llvm-svn: 278403
2016-08-11 20:34:48 +00:00
Wei Mi 3ab5816000 Revert rL278384 which caused several buildbot failures (like check failures in CodeGen/X86/clz.ll).
llvm-svn: 278402
2016-08-11 20:33:37 +00:00
Wei Mi ec19b35179 Remove the restriction that MachineSinking is now stopped by "insert_subreg,
subreg_to_reg, and reg_sequence" instructions.

This is to solve PR28852. The restriction was added at 2010 to make better register
coalescing. We assumed that it was not necessary any more. Testing results on x86
supported the assumption.

We will look closely to any performance impact it will bring and will be prepared
to help analyzing performance problem found on other architectures.

Differential Revision: https://reviews.llvm.org/D23210

llvm-svn: 278384
2016-08-11 18:42:56 +00:00
Krzysztof Parzyszek a003b76391 If-conversion incorrectly calculates liveness of redefined registers
Differential Revision: https://reviews.llvm.org/D23207

llvm-svn: 278383
2016-08-11 18:42:06 +00:00
Krzysztof Parzyszek 60f0b51485 [Hexagon] Skip byval arguments when checking parameter attributes
From the point of view of register assignment, byval parameters are
ignored: a byval parameter is not going to be assigned to a register,
and it will not affect the assignments of subsequent parameters.
When matching registers with parameters in the bit tracker, make sure
to skip byval parameters before advancing the registers.

llvm-svn: 278375
2016-08-11 18:15:16 +00:00
Dominic Chen 6ba19659cb Improve virtual register handling when computing debug information
Summary: Some backends, like WebAssembly, use virtual registers instead of physical registers. This crashes the DbgValueHistoryCalculator pass, which assumes that all registers are physical. Instead, skip virtual registers when iterating aliases, and assume that they are clobbered.

Reviewers: dexonsmith, dschuff, aprantl

Subscribers: yurydelendik, llvm-commits, jfb, sunfish

Differential Revision: https://reviews.llvm.org/D22590

llvm-svn: 278371
2016-08-11 17:52:40 +00:00
Michael Kuperstein e36d7716c3 Make TwoAddressInstructionPass::rescheduleMIBelowKill subreg-aware
This fixes PR28824.

Differential Revision: https://reviews.llvm.org/D23220

llvm-svn: 278370
2016-08-11 17:38:33 +00:00
Matt Arsenault 56684d4538 AMDGPU: Fix crashes on memory functions
llvm-svn: 278369
2016-08-11 17:31:42 +00:00
Wei Ding d3344378c6 AMDGPU : Fix SAD related instruction LIT tests function atttibute issues.
Differential Revision: http://reviews.llvm.org/D23133

llvm-svn: 278360
2016-08-11 17:14:17 +00:00
Wei Ding 34e1753585 AMDGPU : Add LLVM intrinsics for SAD related instructions.
Differential Revision: http://reviews.llvm.org/D23133

llvm-svn: 278354
2016-08-11 16:33:53 +00:00
Tim Northover 0d51044b69 GlobalISel: clear vreg mapping after translating each function
Otherwise we only materialize (shared) constants in the first function they
appear in. This doesn't go well.

llvm-svn: 278351
2016-08-11 16:21:29 +00:00
Igor Breger a77b14d02c [AVX512] Fix extractelement i1 lowering.
The previous implementation (not custom) doesn't enforce zeroing off upper bits. The assumption is that i1 PRODUCER (truncate and extractelement) must zero all upper bits, so i1 CONSUMER instructions ( test, zext, save, etc) can be done without additional zeroing.
Make extractelement i1 lowering custom for all vector i1.

Differential Revision: http://reviews.llvm.org/D23246

llvm-svn: 278328
2016-08-11 12:13:46 +00:00
Marina Yatsina 88f0c31f13 Avoid false dependencies of undef machine operands
This patch helps avoid false dependencies on undef registers by updating the machine instructions' undef operand to use a register that the instruction is truly dependent on, or use a register with clearance higher than Pref.

Pseudo example:

loop:
xmm0 = ...
xmm1 = vcvtsi2sdl eax, xmm0<undef>
... = inst xmm0
jmp loop

In this example, selecting xmm0 as the undef register creates false dependency between loop iterations.
This false dependency cannot be solved by inserting an xor before vcvtsi2sdl because xmm0 is alive at the point of the vcvtsi2sdl instruction.
Selecting a different register instead of xmm0, especially a register that is not used in the loop, will eliminate this problem.

Differential Revision: https://reviews.llvm.org/D22466

llvm-svn: 278321
2016-08-11 07:32:08 +00:00
Craig Topper a78b768ed4 [AVX-512] Promote 512-bit integer loads to v8i64 similar to what is done for 128/256-bit vectors for overall consistency.
llvm-svn: 278318
2016-08-11 06:04:07 +00:00
Craig Topper 14aa2665d3 [AVX-512] Add patterns to allow EVEX encoded stores of v16i16/v8i16/v16i8/v32i8 even when BWI is not supported.
llvm-svn: 278317
2016-08-11 06:04:04 +00:00
Craig Topper 3563d0f622 [AVX-512] Fix the 128-bit and 256-bit nontemporal load patterns with elements type other than i64. These loads have all been promoted to v2i64/v4i64 loads so we need bitcasts or we end up selecting VMOVDQA32/VMOVDQU32 instead.
llvm-svn: 278316
2016-08-11 06:04:00 +00:00
Tim Northover 357f1be2ca GlobalISel: support same ConstantExprs as Instructions.
It's more than just inttoptr, but the others can't be tested until we have
support for non-trivial constants (they currently get unavoidably folded to a
ConstantInt).

llvm-svn: 278303
2016-08-10 23:02:41 +00:00
Tim Northover 2ff5935a95 GlobalISel: add tests forgotten in r278293.
llvm-svn: 278296
2016-08-10 22:13:48 +00:00
Changpeng Fang fb9c3818dd AMDGPU/SI: Implement amdgcn image intrinsics with sampler
Summary:
  This patch define and implement amdgcn image intrinsics with sampler.

    1. define vdata type to be llvm_anyfloat_ty, address type to be llvm_anyfloat_ty,
       and rsrc type to be llvm_anyint_ty. As a result, we expect the intrinsics name
       to have three suffixes to overload each of these three types;

    2. D128 as well as two other flags are implied in the three types, for example,
       if you use v8i32 as resource type, then r128 is 0!

    3. don't expose TFE flag, and other flags are exposed in the instruction order:
       unrm, glc, slc, lwe and da.

Differential Revision: http://reviews.llvm.org/D22838

Reviewed by:
  arsenm and tstellarAMD

llvm-svn: 278291
2016-08-10 21:15:30 +00:00
Kyle Butt 81d32846b0 Codegen: Don't tail-duplicate blocks with un-analyzable fallthrough.
If AnalyzeBranch can't analyze a block and it is possible to
fallthrough, then duplicating the block doesn't make sense, as only one
block can be the layout predecessor for the un-analyzable fallthrough.

Submitted wit a test case, but NOTE: the test case doesn't currently
fail. However, the test case fails with D20505 and would have saved me
some time debugging.

llvm-svn: 278288
2016-08-10 21:03:27 +00:00
Kyle Butt e1c931b171 CodeGen: If Convert blocks that would form a diamond when tail-merged.
The following function currently relies on tail-merging for if
conversion to succeed. The common tail of cond_true and cond_false is
extracted, and this then forms a diamond pattern that can be
successfully if converted.

If this block does not get extracted, either because tail-merging is
disabled or the threshold is higher, we should still recognize this
pattern and if-convert it.

Fixed a regression in the original commit. Need to un-reverse branches after
reversing them, or other conversions go awry.

define i32 @t2(i32 %a, i32 %b) nounwind {
entry:
        %tmp1434 = icmp eq i32 %a, %b           ; <i1> [#uses=1]
        br i1 %tmp1434, label %bb17, label %bb.outer

bb.outer:               ; preds = %cond_false, %entry
        %b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ]
        %a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ]
        br label %bb

bb:             ; preds = %cond_true, %bb.outer
        %indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ]
        %tmp. = sub i32 0, %b_addr.021.0.ph
        %tmp.40 = mul i32 %indvar, %tmp.
        %a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph
        %tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph
        br i1 %tmp3, label %cond_true, label %cond_false

cond_true:              ; preds = %bb
        %tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph
        %tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph
        %indvar.next = add i32 %indvar, 1
        br i1 %tmp1437, label %bb17, label %bb

cond_false:             ; preds = %bb
        %tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0
        %tmp14 = icmp eq i32 %a_addr.026.0, %tmp10
        br i1 %tmp14, label %bb17, label %bb.outer

bb17:           ; preds = %cond_false, %cond_true, %entry
        %a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ]
        ret i32 %a_addr.026.1
}

Without tail-merging or diamond-tail if conversion:
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ble     LBB1_3
@ BB#2:                                 @ %cond_true
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r0, r0, r1
        cmp     r1, r0
        it      ne
        cmpne   r0, r1
        bgt     LBB1_4
LBB1_3:                                 @ %cond_false
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r1, r1, r0
        cmp     r1, r0
        bne     LBB1_1
LBB1_4:                                 @ %bb17
        bx      lr

With diamond-tail if conversion, but without tail-merging:
@ BB#0:                                 @ %entry
        cmp     r0, r1
        it      eq
        bxeq    lr
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ite     le
        suble   r1, r1, r0
        subgt   r0, r0, r1
        cmp     r1, r0
        bne     LBB1_1
@ BB#2:                                 @ %bb17
        bx      lr

llvm-svn: 278287
2016-08-10 20:45:56 +00:00
Matt Arsenault 57431c9680 AMDGPU: Change insertion point of si_mask_branch
Insert before the skip branch if one is created.
This is a somewhat more natural placement relative
to the skip branches, and makes it possible to implement
analyzeBranch for skip blocks.

The test changes are mostly due to a quirk where
the block label is not emitted if there is a terminator
that is not also a branch.

llvm-svn: 278273
2016-08-10 19:11:42 +00:00
Sanjay Patel 5ccc85fe83 [x86, AVX] allow FP vector select folding to bitwise logic ops (PR28895)
This handles the case in:
https://llvm.org/bugs/show_bug.cgi?id=28895

...but we are not getting all of the possibilities yet. 
Eg, we use 'X86::FANDN' for scalar FP select combines.

That enhancement is filed as:
https://llvm.org/bugs/show_bug.cgi?id=28925

Differential Revision: https://reviews.llvm.org/D23337

llvm-svn: 278270
2016-08-10 19:00:11 +00:00
Nicolai Haehnle 02d784172c LiveIntervalAnalysis: fix a crash in repairOldRegInRange
Summary:
See the new test case for one that was (non-deterministically) crashing
on trunk and deterministically hit the assertion that I added in D23302.
Basically, the machine function contains a sequence

     DS_WRITE_B32 %vreg4, %vreg14:sub0, ...
     DS_WRITE_B32 %vreg4, %vreg14:sub0, ...
     %vreg14:sub1<def> = COPY %vreg14:sub0

and SILoadStoreOptimizer::mergeWrite2Pair merges the two DS_WRITE_B32
instructions into one before calling repairIntervalsInRange.

Now repairIntervalsInRange wants to repair %vreg14, in particular, and
ends up trying to repair %vreg14:sub1 as well, but that only becomes
active _after_ the range that is to be repaired, hence the crash due
to LR.find(...) == LR.begin() at the start of repairOldRegInRange.

I believe that just skipping those subrange is fine, but again, not too
familiar with that code.

Reviewers: MatzeB, kparzysz, tstellarAMD

Subscribers: llvm-commits, MatzeB

Differential Revision: https://reviews.llvm.org/D23303

llvm-svn: 278268
2016-08-10 18:51:14 +00:00
Kyle Butt 71b1ca1be4 Codegen: Tail Merge: Be less aggressive with special cases.
This change makes it possible for tail-duplication and tail-merging to
be disjoint. By being less aggressive when merging during layout, there are no
overlapping cases between tail-duplication and tail-merging, provided the
thresholds are disjoint.

There is a remaining TODO to benchmark the succ_size() test for non-layout tail
merging.

llvm-svn: 278265
2016-08-10 18:36:18 +00:00
Krzysztof Parzyszek 0bbad0fc86 [Hexagon] Simplify the SplitConst32/64 pass
llvm-svn: 278256
2016-08-10 18:05:47 +00:00
Krzysztof Parzyszek 3b946c90ef [Hexagon] Add extra patterns for single-precision min/max instructions
llvm-svn: 278252
2016-08-10 17:56:24 +00:00
Tim Northover 7552ef5a00 GlobalISel: avoid inserting redundant COPYs for bitcasts.
If the value produced by the bitcast hasn't been referenced yet, we can simply
reuse the input register avoiding an unnecessary COPY instruction.

llvm-svn: 278245
2016-08-10 16:51:14 +00:00
Krzysztof Parzyszek a3386501af [Hexagon] Use integer instructions for floating point immediates
Floating point instructions use general purpose registers, so the few
instructions that can put floating point immediates into registers are,
in fact, integer instruction. Use them explicitly instead of having
pseudo-instructions specifically for dealing with floating point values.

Simplify the constant loading instructions (from sdata) to have only two:
one for 32-bit values and one for 64-bit values: CONST32 and CONST64.

llvm-svn: 278244
2016-08-10 16:46:36 +00:00
Simon Pilgrim b204f03004 [X86][XOP] Tweak vpermil2pd test to stop it being combined away
The target shuffle combined to a BLENDPD pattern which we will shortly add support for.

llvm-svn: 278233
2016-08-10 15:15:56 +00:00
Simon Pilgrim f1f55198c1 [X86][SSE] Regenerate vector shift lowering tests
llvm-svn: 278232
2016-08-10 15:13:49 +00:00
Sanjay Patel 2c677a9306 use different comparison predicates for better test coverage
llvm-svn: 278229
2016-08-10 15:06:11 +00:00
Simon Pilgrim ac8fa6c2c6 [X86][SSE] Add support for combining target shuffles to MOVSS/MOVSD
Only do this on pre-SSE41 targets where we should be lowering to BLENDPS/BLENDPD instead

llvm-svn: 278228
2016-08-10 14:15:41 +00:00
Simon Pilgrim d99242c44d [X86][SSE] Regenerate SSE1 tests
Properly demonstrate the nasty codegen we get for vselect without integer vectors

llvm-svn: 278215
2016-08-10 12:26:40 +00:00
Simon Pilgrim cb5a189b90 Regenerate test
llvm-svn: 278214
2016-08-10 12:24:19 +00:00
Simon Pilgrim 85c7ea86ae [DAGCombine] Avoid INSERT_SUBVECTOR reinsertions (PR28678)
If the input vector to INSERT_SUBVECTOR is another INSERT_SUBVECTOR, and this inserted subvector replaces the last insertion, then insert into the common source vector.

i.e. 
INSERT_SUBVECTOR( INSERT_SUBVECTOR( Vec, SubOld, Idx ), SubNew, Idx ) --> INSERT_SUBVECTOR( Vec, SubNew, Idx )

Differential Revision: https://reviews.llvm.org/D23330

llvm-svn: 278211
2016-08-10 10:50:53 +00:00
Sam Parker 62965c96df [ARM] Improve sxta{b|h} and uxta{b|h} tests
Created a Thumb2 predicated pattern matcher that uses Thumb2 and
HasT2ExtractPack and used it to redefine the patterns for sxta{b|h}
and uxta{b|h}. Also used the similar patterns to fill in isel pattern
gaps for the corresponding instructions in the ARM backend.
The patch is mainly changes to tests since most of this functionality
appears not to have been tested.

Differential Revision: https://reviews.llvm.org/D23273

llvm-svn: 278207
2016-08-10 09:34:34 +00:00