Commit Graph

39162 Commits

Author SHA1 Message Date
Jan Vesely 38814fa2fd AMDGPU/R600: Enable Load combine
Fix and improve tests

Differential Revision: https://reviews.llvm.org/D23899

llvm-svn: 279925
2016-08-27 19:09:43 +00:00
Craig Topper 6943aa306e [X86] Rename predicate function that detects if requires one of the REX.B, REX.X or REX.R bits. It's old name conflicted with a function in X8II namespace that doesnt' quite do the same thing. NFC
llvm-svn: 279924
2016-08-27 17:13:43 +00:00
Craig Topper 45793a1f7a [X86] Keep looping over operands looking for byte registers even if we already found a register that requires a REX prefix. Otherwise we don't error if a high byte register is used after SPL/BPL/DIL/SIL.
llvm-svn: 279923
2016-08-27 17:13:41 +00:00
Craig Topper 6acca80e17 [X86] Include XMM/YMM/ZMM16-23 in X86II::isX86_64ExtendedReg. This feels more consistent with its name and simplifies assembler code.
llvm-svn: 279922
2016-08-27 17:13:37 +00:00
Craig Topper 06c60c067f [X86] Don't allow DR8-DR15 to be assembled in 32-bit mode. Add missing test for CR8-CR15.
llvm-svn: 279921
2016-08-27 17:13:34 +00:00
Craig Topper ed71f04abb [X86] Remove stale comment about FixupBWInsts pass being off by default. NFC
llvm-svn: 279915
2016-08-27 05:26:54 +00:00
Craig Topper 225da2cb84 [AVX-512] Allow EVEX encoding unordered/ordered/equal/notequal VCMPPS/PD/SS/SD to be commuted just like the SSE and AVX counterparts.
llvm-svn: 279914
2016-08-27 05:22:15 +00:00
Craig Topper 144fdef66b [X86] Enable FR32/FR64 cmpeq/cmpne/cmpunord/cmpord to be commuted.
llvm-svn: 279913
2016-08-27 05:22:12 +00:00
Craig Topper 4891c724aa [AVX-512] Add load folding for EVEX vcmpps/pd/ss/sd.
llvm-svn: 279912
2016-08-27 05:22:08 +00:00
Matt Arsenault a15ea4e217 AMDGPU: Mark sched model complete
Fixes bug 26800

llvm-svn: 279910
2016-08-27 03:39:27 +00:00
Matt Arsenault 71ed8a67e8 AMDGPU: Remove unneeded implicit exec uses/defs
SI_BREAK, SI_IF_BREAK, and SI_ELSE_BREAK do not def exec.
SI_IF_BREAK and SI_ELSE_BREAK do not read it either.

llvm-svn: 279909
2016-08-27 03:00:51 +00:00
Matt Arsenault 2712d4a3d8 AMDGPU: Select mulhi 24-bit instructions
llvm-svn: 279902
2016-08-27 01:32:27 +00:00
Matt Arsenault 22e417956d AMDGPU: Move cndmask pseudo to be isel pseudo
There's only one use of this for the convenience
of a pattern. I think v_mov_b64_pseudo should also be
moved, but SIFoldOperands does currently make use of it.

llvm-svn: 279901
2016-08-27 01:00:37 +00:00
Matt Arsenault e949744474 AMDGPU: Fix sched type for branches
llvm-svn: 279900
2016-08-27 00:51:02 +00:00
Matt Arsenault f98a596954 AMDGPU: Remove register operand from si_mask_branch
It isn't used for anything, and is also misleading since
it could be spilled at the end of the block, so it can't be relied
on. There ends up being a verifier error about using an undefined
register since the spill kills the register.

llvm-svn: 279899
2016-08-27 00:42:21 +00:00
Matt Arsenault 00e102baf4 AMDGPU: Improve error reporting for maximum branch distance
Unfortunately this seems to only help the assembler diagnostic.

llvm-svn: 279895
2016-08-27 00:21:22 +00:00
Quentin Colombet a94caa5673 [AArch64][CallLowering] Do not assert for not implemented part.
When doing the ABI lowering, report a failure to the caller instead of
asserting. This gives a chance for the caller to recover.

llvm-svn: 279890
2016-08-27 00:18:28 +00:00
Tom Stellard e175d8aba5 AMDGPU/SI: Canonicalize offset order for merged DS instructions
Summary:
If the scheduler clusters the loads, then the offsets will be sorted,
but it is possible for the scheduler to scheduler loads together
without out explicitly clustering them, which would give us non-sorted
offsets.

Also, we will want to do this if we move the load/store optimizer before
the scheduler.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23776

llvm-svn: 279870
2016-08-26 21:36:47 +00:00
Tom Stellard 4b5cd87ed3 XXX
llvm-svn: 279868
2016-08-26 21:16:40 +00:00
Tom Stellard 7c463c9168 AMDGPU/SI: Use a better method for determining the largest pressure sets
Summary:
There are a few different sgpr pressure sets, but we only care about
the one which covers all of the sgprs.  We were using hard-coded
register pressure set names to determine the reg set id for the
biggest sgpr set.  However, we were using the wrong name, and this
method is pretty fragile, since the reg pressure set names may
change.

The new method just looks for the pressure set that contains the most
reg units and sets that set as our SGPR pressure set.  We've also
adopted the same technique for determining our VGPR pressure set.

Reviewers: arsenm

Subscribers: MatzeB, arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23687

llvm-svn: 279867
2016-08-26 21:16:37 +00:00
Manman Ren 66b54e9f32 Swift Calling Convetion: add support for AArch64.
It will just be the same as the regular calling convention.

rdar://28029509

llvm-svn: 279853
2016-08-26 19:28:17 +00:00
Tim Northover 85cf564c51 AArch64: avoid assertion on illegal types in performFDivCombine.
In the code to detect fixed-point conversions and make use of AArch64's special
instructions, we weren't prepared for weird types. The fptosi direction got
fixed recently, but not the similar sitofp code.

llvm-svn: 279852
2016-08-26 18:52:31 +00:00
Chad Rosier 58f505ba24 [AArch64] Avoid materializing constant values when generating csel instructions.
Differential Revision: https://reviews.llvm.org/D23677

llvm-svn: 279849
2016-08-26 18:05:50 +00:00
Reid Kleckner a5b1eef846 [MC] Move .cv_loc management logic out of MCContext
MCContext already has many tasks, and separating CodeView out from it is
probably a good idea. The .cv_loc tracking was modelled on the DWARF
tracking which lived directly in MCContext.

Removes the inclusion of MCCodeView.h from MCContext.h, so now there are
only 10 build actions while I hack on CodeView support instead of 265.

llvm-svn: 279847
2016-08-26 17:58:37 +00:00
Tim Northover bc1701c7fb GlobalISel: mark G_FPEXT legal from float to double.
llvm-svn: 279845
2016-08-26 17:46:22 +00:00
Tim Northover 30bd36e3fc GlobalISel: mark G_FCMP legal on float & double.
llvm-svn: 279844
2016-08-26 17:46:19 +00:00
Tim Northover 051b8ad3d9 GlobalISel: simplify G_ICMP legalization regime.
It's unclear how the old

    %res(32) = G_ICMP { s32, s32 } intpred(eq), %0, %1

is actually different from an s1 verison

    %res(1) = G_ICMP { s1, s32 } intpred(eq), %0, %1

so we'll remove it for now.

llvm-svn: 279843
2016-08-26 17:46:17 +00:00
Tim Northover cecee56abb GlobalISel: legalize sdiv and srem operations.
llvm-svn: 279842
2016-08-26 17:46:13 +00:00
Tim Northover 7a753d9bec GlobalISel: legalize under-width divisions.
llvm-svn: 279841
2016-08-26 17:46:06 +00:00
Tim Northover 1d18a99a53 GlobalISel: mark selects legal
llvm-svn: 279840
2016-08-26 17:46:03 +00:00
Tim Northover 5d0eaa4e79 GlobalISel: mark float/int conversions legal
llvm-svn: 279839
2016-08-26 17:45:58 +00:00
Chad Rosier 39c1dbb845 [AArch64] Avoid materializing constant 1 by using csinc, rather than csel.
This is similar to what was done in r261675, but for CSINC rather than CSINV.

Differential Revision: https://reviews.llvm.org/D23892

llvm-svn: 279822
2016-08-26 14:01:55 +00:00
Pablo Barrio b8ec630583 Handle empty functions with debug info in load/store opt pass
Summary:
In fuctions that contained debug info but were empty otherwise,
the ARM load/store optimizer could abort. This was because
function MergeReturnIntoLDM handled the special case where a
Machine Basic BLock is empty by calling MBB.empty(). However, this
returns false in presence of debug info, although the function
should be considered empty in the eyes of the load/store optimizer.
This has been fixed by handling the case where searching through the
block finds only debug instructions.

Reviewers: rengolin, dexonsmith, llvm-commits, jmolloy

Subscribers: t.p.northover, aemerson, rengolin, samparker

Differential Revision: https://reviews.llvm.org/D23847

llvm-svn: 279820
2016-08-26 13:00:39 +00:00
Simon Pilgrim 091c4c781c [X86][SSE4A] The EXTRQ/INSERTQ bit extraction/insertion ops should be in the integer domain
llvm-svn: 279811
2016-08-26 09:55:41 +00:00
Craig Topper 8f27f51192 [X86][SSE] Add CMPSS/CMPSD intrinsic scalar load folding support.
llvm-svn: 279806
2016-08-26 07:08:00 +00:00
Michael Kuperstein 2ee911e985 Revert r274613 because it breaks the test suite with AVX512
This reverts most of r274613 (AKA r274626) and its follow-ups (r276347, r277289),
due to miscompiles in the test suite. The FastISel change was left in, because
it apparently fixes an unrelated issue.

(Recommit of r279782 which was broken due to a bad merge.)

This fixes 4 out of the 5 test failures in PR29112.

llvm-svn: 279788
2016-08-25 22:48:11 +00:00
Michael Kuperstein 6e271f4ce8 Revert r279782 due to debug buildbot breakage.
llvm-svn: 279785
2016-08-25 22:14:45 +00:00
Michael Kuperstein a6ccc8d365 Revert r274613 because it breaks the test suite with AVX512
This reverts most of r274613 and its follow-ups (r276347, r277289), due to
miscompiles in the test suite. The FastISel change was left in, because it
apparently fixes an unrelated issue.

This fixes 4 out of the 5 test failures in PR29112.

llvm-svn: 279782
2016-08-25 21:55:41 +00:00
Tim Northover 3495647d0d ARM: by default don't set the Thumb bit on MachO relocated values.
Its existence is largely historical, apparently we tried to make ARM object
files look maybe-almost-possibly runnable by putting our best guess at the
actual value into relocated locations. Of course, the real linker then comes
along and can completely change things.

But it should only be there for word-sized and movw/movt relocations. It can't
be encoded in branch relocations, and I've seen it mess up validity
calculations twice in the last couple of weeks so the default is clearly problematic.

llvm-svn: 279773
2016-08-25 20:41:30 +00:00
Tim Northover d8a6d7ce91 GlobalISel: mark overflow bit of overflow ops legal.
It's expected this will map to NZCV register class and be properly selectable.

llvm-svn: 279761
2016-08-25 17:37:41 +00:00
Tim Northover fe880a8801 GlobalISel: mark simple ops legal even on types < 32-bit.
The 32-bit variants of these operations don't depend on the bits not being
operated on, so they also naturally model operations narrower than the actual
register width.

llvm-svn: 279760
2016-08-25 17:37:39 +00:00
Tim Northover 7a1ec0141a GlobalISel: mark pointer constants as legal on AArch64.
llvm-svn: 279759
2016-08-25 17:37:35 +00:00
Tim Northover 438c77ca1a GlobalISel: perform multi-step legalization
llvm-svn: 279758
2016-08-25 17:37:32 +00:00
Tim Northover 2c4a838e24 GlobalISel: mark small extends as legal on AArch64
llvm-svn: 279757
2016-08-25 17:37:25 +00:00
Michael Kuperstein 40887c5566 [X86] 512-bit VPAVG requires AVX512BW
Fix VPAVG detection to require AVX512BW, not AVX512F for 512-bit widths,
and change associated asserts to assert in the right direction...

This fixes PR29111.

llvm-svn: 279755
2016-08-25 17:17:46 +00:00
Simon Pilgrim 5aa9c203ac [X86][SSE] INSERTPS is only combined on v4f32 types. NFCI.
llvm-svn: 279751
2016-08-25 17:02:00 +00:00
Ron Lieberman a3c739b977 [Hexagon] Remove extraneous debug output from HexagonCopyToCombine.cpp
BB# ...

llvm-svn: 279750
2016-08-25 16:46:09 +00:00
Simon Pilgrim 6fe4a9ed1e Fix line endings
llvm-svn: 279745
2016-08-25 15:45:27 +00:00
Ron Lieberman c93d123b86 [Hexagon] vector store print tracing.
Add vector store print tracing option for hexagon vector instructions.

https://reviews.llvm.org/D23870

llvm-svn: 279739
2016-08-25 13:35:48 +00:00
Simon Pilgrim 0ad9f3e93b [X86][AVX] Provide SubVectorBroadcast fallback if load fold fails (PR29133)
Fix for PR29133, matching the approach that was taken for AVX1 scalar broadcasts.

llvm-svn: 279735
2016-08-25 12:45:16 +00:00
Craig Topper 5ef7a0f45a [X86] Simplify getOperandBias as a bit. NFC
There's no reason for it to return a signed type. Just return the operand bias in each if instead of starting from 0 and adding in the 'if'.

llvm-svn: 279720
2016-08-25 04:16:10 +00:00
Craig Topper 969e56a2cc [X86] Fix indentation per coding standards. NFC
llvm-svn: 279719
2016-08-25 04:16:08 +00:00
Matthias Braun 1eb473680a MachineFunctionProperties/MIRParser: Rename AllVRegsAllocated->NoVRegs, compute it
Rename AllVRegsAllocated to NoVRegs. This avoids the connotation of
running after register and simply describes that no vregs are used in
a machine function. With that we can simply compute the property and do
not need to dump/parse it in .mir files.

Differential Revision: http://reviews.llvm.org/D23850

llvm-svn: 279698
2016-08-25 01:27:13 +00:00
George Burgess IV 381fc0ee3c Make some LLVM_CONSTEXPR variables const. NFC.
This patch changes LLVM_CONSTEXPR variable declarations to const
variable declarations, since LLVM_CONSTEXPR expands to nothing if the
current compiler doesn't support constexpr. In all of the changed
cases, it looks like the code intended the variable to be const instead
of sometimes-constexpr sometimes-not.

llvm-svn: 279696
2016-08-25 01:05:08 +00:00
Heejin Ahn b6cd5121b7 [WebAssembly] Change a comment line
Test for commit access.

llvm-svn: 279683
2016-08-24 22:53:00 +00:00
Krzysztof Parzyszek 6dff336ad1 [Hexagon] Check for block end when skipping debug instructions
llvm-svn: 279681
2016-08-24 22:36:35 +00:00
Krzysztof Parzyszek 951fb36120 [Hexagon] Change insertion of expand-condsets pass to avoid memory leaks
llvm-svn: 279678
2016-08-24 22:27:36 +00:00
Tim Northover 9c3633f516 ARM: don't diagnose cbz/cbnz to Thumb functions.
A branch-distance to a Thumb function shouldn't be forced to be odd for
CBZ/CBNZ instructions because (assuming it's within range), it's going to be a
valid, even offset.

llvm-svn: 279665
2016-08-24 21:21:29 +00:00
Changpeng Fang 75f0968b39 AMDGCN/SI: Implement readlane/readfirstlane intrinsics
Summary:
  This patch implements readlane/readfirstlane intrinsics.
TODO: need to define a new register class to consider the case
that the source could be a vector register or M0.

Reviewed by:
  arsenm and tstellarAMD

Differential Revision:
  http://reviews.llvm.org/D22489

llvm-svn: 279660
2016-08-24 20:35:23 +00:00
Rafael Espindola 70c6a3976b Use isTargetMachO instead of isTargetDarwin.
llvm-svn: 279655
2016-08-24 19:02:29 +00:00
Simon Pilgrim e14653e17d [X86][SSE] Add MINSD/MAXSD/MINSS/MAXSS intrinsic scalar load folding support
These are no different in load behaviour to the existing ADD/SUB/MUL/DIV scalar ops but were missing from isNonFoldablePartialRegisterLoad

llvm-svn: 279652
2016-08-24 18:40:53 +00:00
Evandro Menezes 5395187fe5 [AArch64] Adjust the feature set for Exynos M1.
Enable zero cycle zeroing.

llvm-svn: 279648
2016-08-24 18:17:30 +00:00
Simon Pilgrim 941bd6bbae [X86][SSE] Add support for combining VZEXT_MOVL target shuffles
Includes adding more general support for the pattern: VZEXT_MOVL(VZEXT_LOAD(ptr)) -> VZEXT_LOAD(ptr)

This has unearthed a couple of latent poor codegen issues (MINSS/MAXSS scalar load folding and MOVDDUP/BROADCAST load folding patterns), which will be fixed shortly.

Its also reduced a couple of tests so that they no longer reach the instruction threshold necessary to be combined to PSHUFB (see PR26183).

llvm-svn: 279646
2016-08-24 18:07:53 +00:00
Krzysztof Parzyszek b5ec48755d [Hexagon] Enable subregister liveness tracking
llvm-svn: 279642
2016-08-24 17:17:39 +00:00
Krzysztof Parzyszek cbd559f507 [Hexagon] Remove the utilization of IMPLICIT_DEFs from expand-condsets
This is no longer necessary, because since r279625 the subregister
liveness properly accounts for read-undefs.

llvm-svn: 279637
2016-08-24 16:36:37 +00:00
Wei Ding 1041a646a9 AMDGPU : Add V_SAD_U32 instruction pattern.
Differential Revision: http://reviews.llvm.org/D23069

llvm-svn: 279629
2016-08-24 14:59:47 +00:00
Krzysztof Parzyszek a7ed090bba Create subranges for new intervals resulting from live interval splitting
The register allocator can split a live interval of a register into a set
of smaller intervals. After the allocation of registers is complete, the
rewriter will modify the IR to replace virtual registers with the corres-
ponding physical registers. At this stage, if a register corresponding
to a subregister of a virtual register is used, the rewriter will check
if that subregister is undefined, and if so, it will add the <undef> flag
to the machine operand. The function verifying liveness of the subregis-
ter would assume that it is undefined, unless any of the subranges of the
live interval proves otherwise.
The problem is that the live intervals created during splitting do not
have any subranges, even if the original parent interval did. This could
result in the <undef> flag placed on a register that is actually defined.

Differential Revision: http://reviews.llvm.org/D21189

llvm-svn: 279625
2016-08-24 13:37:55 +00:00
Simon Dardis f114820912 [mips] Preparatory work for a generic scheduler
Extend instruction definitions from nearly all ISAs to include
appropriate instruction itineraries. Change MIPS16s gp prologue
generation to use real instructions instead of using a pseudo
instruction.

Reviewers: dsanders, vkalintiris

Differential Review: https://reviews.llvm.org/D23548

llvm-svn: 279623
2016-08-24 13:00:47 +00:00
Simon Pilgrim 7a50c8c2ba [X86][AVX2] Ensure on 32-bit targets that we broadcast f64 types not i64 (PR29101)
llvm-svn: 279622
2016-08-24 12:42:31 +00:00
Simon Pilgrim 6392b8d4ce [X86][SSE] Add support for 32-bit element vectors to X86ISD::VZEXT_LOAD
Consecutive load matching (EltsFromConsecutiveLoads) currently uses VZEXT_LOAD (load scalar into lowest element and zero uppers) for vXi64 / vXf64 vectors only.

For vXi32 / vXf32 vectors it instead creates a scalar load, SCALAR_TO_VECTOR and finally VZEXT_MOVL (zero upper vector elements), relying on tablegen patterns to match this into an equivalent of VZEXT_LOAD.

This patch adds the VZEXT_LOAD patterns for vXi32 / vXf32 vectors directly and updates EltsFromConsecutiveLoads to use this.

This has proven necessary to allow us to easily make VZEXT_MOVL a full member of the target shuffle set - without this change the call to combineShuffle (which is the main caller of EltsFromConsecutiveLoads) tended to recursively recreate VZEXT_MOVL nodes......

Differential Revision: https://reviews.llvm.org/D23673

llvm-svn: 279619
2016-08-24 10:46:40 +00:00
Matthias Braun 733fe3676c CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
Re-apply this patch, hopefully I will get away without any warnings
in the constructor now.

This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279602
2016-08-24 01:52:46 +00:00
Philip Reames e83c4b30ca [stackmaps] More extraction of common code [NFCI]
General cleanup before starting to work on the part I want to actually change.

llvm-svn: 279586
2016-08-23 23:33:29 +00:00
Richard Smith 8c3fbdc6c4 Revert r279564. It introduces undefined behavior (binding a reference to a
dereferenced null pointer) in MachineModuleInfo::MachineModuleInfo that causes
-Werror builds (including several buildbots) to fail.

llvm-svn: 279580
2016-08-23 22:08:27 +00:00
Matthias Braun 90799ce8b2 MachineFunction: Introduce NoPHIs property
I want to compute the SSA property of .mir files automatically in
upcoming patches. The problem with this is that some inputs will be
reported as static single assignment with some passes claiming not to
support SSA form.  In reality though those passes do not support PHI
instructions => Track the presence of PHI instructions separate from the
SSA property.

Differential Revision: https://reviews.llvm.org/D22719

llvm-svn: 279573
2016-08-23 21:19:49 +00:00
Tim Northover 6cd4b23a0f GlobalISel: legalize integer comparisons on AArch64.
Next step is doing both legalizations at the same time! Marvel at GlobalISel's
cunning.

llvm-svn: 279566
2016-08-23 21:01:26 +00:00
Tim Northover b3a0be4d38 GlobalISel: legalize conditional branches on AArch64.
llvm-svn: 279565
2016-08-23 21:01:20 +00:00
Matthias Braun 4c1f1f120c CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
Re-apply this commit with the deletion of a MachineFunction delegated to
a separate pass to avoid use after free when doing this directly in
AsmPrinter.

This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279564
2016-08-23 20:58:29 +00:00
Tim Northover a01bece1dc GlobalISel: extend legalizer interface to handle multiple types.
Instructions like G_ICMP have multiple types that may need to be legalized (the
boolean output and nearly arbitrary inputs in this case). So the legalizer must
be capable of deciding what to do for each of them separately.

llvm-svn: 279554
2016-08-23 19:30:42 +00:00
Tim Northover 456a3c03ac GlobalISel: mark pointer casts legal on AArch64.
llvm-svn: 279553
2016-08-23 19:30:38 +00:00
Tim Northover 3c73e367c0 GlobalISel: legalize 1-bit load/store and mark 8/16 bit variants legal on AArch64.
llvm-svn: 279548
2016-08-23 18:20:09 +00:00
Krzysztof Parzyszek 38e2ccc8d0 [Hexagon] Packetize return value setup with the return instruction
Commit r279241 unintentionally reverted that ability.

llvm-svn: 279526
2016-08-23 16:01:01 +00:00
Jacques Pienaar 8ed1cd291d [lanai] Use const instead of constexpr
The windows build bot did not like constexpr.

llvm-svn: 279517
2016-08-23 14:36:53 +00:00
Elliot Colp a4092104cb Fix SystemZ hang caused by r279105
The change in r279105 causes an infinite loop in some cases, as it sets the upper bits of an AND mask constant, which DAGCombiner::SimplifyDemandedBits then unsets.
This patch reverts that part of the behaviour, instead relying on .td peepholes to perform the transformation to NILL. I reapplied my original fix for the problem addressed by r279105 (unsetting the upper bits, which prevents a compiler abort for a different reason).

Differential Revision: https://reviews.llvm.org/D23781

llvm-svn: 279515
2016-08-23 14:03:02 +00:00
NAKAMURA Takumi 708591d231 LLVMLanaDesc: Update libdesp.
llvm-svn: 279510
2016-08-23 10:47:40 +00:00
NAKAMURA Takumi 031a6330c4 Change the target's name, s/LanaiMCTargetDesc/LanaiDesc/g.
"AllTargetsDescs" in llvm-mc/CMakeLists.txt expects not ${target}MCTargetDesc, but ${target}Desc.

llvm-svn: 279509
2016-08-23 10:43:01 +00:00
Oliver Stannard 9aa6f010a4 [ARM] Generate consistent frame records for Thumb2
There is not an official documented ABI for frame pointers in Thumb2,
but we should try to emit something which is useful.

We use r7 as the frame pointer for Thumb code, which currently means
that if a function needs to save a high register (r8-r11), it will get
pushed to the stack between the frame pointer (r7) and link register
(r14). This means that while a stack unwinder can follow the chain of
frame pointers up the stack, it cannot know the offset to lr, so does
not know which functions correspond to the stack frames.

To fix this, we need to push the callee-saved registers in two batches,
with the first push saving the low registers, fp and lr, and the second
push saving the high registers. This is already implemented, but
previously only used for iOS. This patch turns it on for all Thumb2
targets when frame pointers are required by the ABI, and the frame
pointer is r7 (Windows uses r11, so this isn't a problem there). If
frame pointer elimination is enabled we still emit a single push/pop
even if we need a frame pointer for other reasons, to avoid increasing
code size.

We must also ensure that lr is pushed to the stack when using a frame
pointer, so that we end up with a complete frame record. Situations that
could cause this were rare, because we already push lr in most
situations so that we can return using the pop instruction.

Differential Revision: https://reviews.llvm.org/D23516

llvm-svn: 279506
2016-08-23 09:19:22 +00:00
Matthias Braun 7f66202d38 Revert "(HEAD -> master, origin/master, origin/HEAD) CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses"
Reverting while tracking down a use after free.

This reverts commit r279502.

llvm-svn: 279503
2016-08-23 05:17:11 +00:00
Matthias Braun fd936841eb CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279502
2016-08-23 03:20:09 +00:00
Matt Arsenault 567631bdd4 BranchRelaxation: Fix handling of blocks with multiple conditional
branches

Looping over all terminators exposed AArch64 tests hitting
an assert from analyzeBranch failing. I believe these cases
were miscompiled before.

e.g.
  fcmp s0, s1
  b.ne LBB0_1
  b.vc LBB0_2
  b LBB0_2
LBB0_1:
  ; Large block
LBB0_2:
 ; ...

Both of the individual conditional branches need to
be expanded, since neither can reach the final block.

Split the original block into ones which analyzeBranch
will be able to understand.

llvm-svn: 279499
2016-08-23 01:30:30 +00:00
Jacques Pienaar 0e2171904e [lanai] Exit early in Mem Alu combiner if sentinel reach.
LanaiMemAluCombiner could try to query the debug value of a list sentinel. Add check to exit early instead.

llvm-svn: 279497
2016-08-23 01:04:41 +00:00
Matt Arsenault 78fc9daf8d AMDGPU: Split SILowerControlFlow into two pieces
Do most of the lowering in a pre-RA pass. Keep the skip jump
insertion late, plus a few other things that require more
work to move out.

One concern I have is now there may be COPY instructions
which do not have the necessary implicit exec uses
if they will be lowered to v_mov_b32.

This has a positive effect on SGPR usage in shader-db.

llvm-svn: 279464
2016-08-22 19:33:16 +00:00
Simon Pilgrim c8ad5c069c [X86][AVX] Don't use SubVectorBroadcast if there are additional users of the chain (PR29088)
We could improve on this by making X86SubVBroadcast a full memory intrinsic similar to X86vzload

llvm-svn: 279441
2016-08-22 16:47:55 +00:00
Simon Atanasyan eb9ed61021 [mips][ias] Support .dtprel[d]word and .tprel[d]word directives
Assembler directives .dtprelword, .dtpreldword, .tprelword, and
.tpreldword generates relocations R_MIPS_TLS_DTPREL32, R_MIPS_TLS_DTPREL64,
R_MIPS_TLS_TPREL32, and R_MIPS_TLS_TPREL64 respectively.

The main motivation for this patch is to be able to write test cases
for checking correctness of the LLD linker's behaviour.

Differential Revision: https://reviews.llvm.org/D23669

llvm-svn: 279439
2016-08-22 16:18:42 +00:00
Simon Pilgrim 13fa33012b [X86] Only accept SM_SentinelUndef (-1) as an undefined shuffle mask in range
As discussed on D23027 we should be trying to be more strict on what is an undefined mask value.

llvm-svn: 279435
2016-08-22 13:18:56 +00:00
Simon Pilgrim 2279e59573 [X86][SSE] Avoid specifying unused arguments in SHUFPD lowering
As discussed on PR26491, we are missing the opportunity to make use of the smaller MOVHLPS instruction because we set both arguments of a SHUFPD when using it to lower a single input shuffle.

This patch sets the lowered argument to UNDEF if that shuffle element is undefined. This in turn makes it easier for target shuffle combining to decode UNDEF shuffle elements, allowing combines to MOVHLPS to occur.

A fix to match against MOVHPD stores was necessary as well.

This builds on the improved MOVLHPS/MOVHLPS lowering and memory folding support added in D16956

Adding similar support for SHUFPS will have to wait until have better support for target combining of binary shuffles.

Differential Revision: https://reviews.llvm.org/D23027

llvm-svn: 279430
2016-08-22 12:56:54 +00:00
Hrvoje Varga f0ed16eae5 [mips][microMIPS] Implement BLTZC, BLEZC, BGEZC and BGTZC instructions, fix disassembly and add operand checking to existing B<cond>C implementations
Differential Revision: https://reviews.llvm.org/D22667

llvm-svn: 279429
2016-08-22 12:17:59 +00:00
Craig Topper 5f8419da34 [X86] Create a new instruction format to handle 4VOp3 encoding. This saves one bit in TSFlags and simplifies MRMSrcMem/MRMSrcReg format handling.
llvm-svn: 279424
2016-08-22 07:38:50 +00:00
Craig Topper 9b20fece81 [X86] Create a new instruction format to handle MemOp4 encoding. This saves one bit in TSFlags and simplifies MRMSrcMem/MRMSrcReg format handling.
llvm-svn: 279423
2016-08-22 07:38:45 +00:00
Craig Topper 61b62e56b7 [X86] Space out the encodings of X86 instruction formats. I plan to add some new encodings in future commits and this will reduce the size of those commits. NFC
This tries to keep all the ModRM memory and register forms in their own regions of the encodings. Hoping to make it simple on some of the switch statements that operate on these encodings.

llvm-svn: 279422
2016-08-22 07:38:41 +00:00
Craig Topper ca0eda3e6a [X86] Merge hasVEX_i8ImmReg into the ImmFormat type which had extra unused encodings. This saves one bit in TSFlags. NFC
llvm-svn: 279412
2016-08-22 01:37:19 +00:00
Craig Topper 522541231a [X86] Remove ignoreVEX_L from TSFlags. Only the disassembler needs it and the disassembler doesn't use TSFlags. NFC
llvm-svn: 279411
2016-08-22 01:37:16 +00:00
NAKAMURA Takumi 9d0b53129c Reformat.
llvm-svn: 279409
2016-08-22 00:58:47 +00:00
NAKAMURA Takumi 59a20649c6 Untabify.
llvm-svn: 279408
2016-08-22 00:58:04 +00:00
Simon Pilgrim 67e7e22462 [X86][AVX] Dropped combineShuffle256 - this can now be performed by EltsFromConsecutiveLoads
llvm-svn: 279397
2016-08-21 15:39:45 +00:00
Guy Blank 9ae797a798 [AVX512][FastISel] Do not use K registers in TEST instructions
In some cases, FastIsel was emitting TEST instruction with K reg input, which is illegal.
Changed to using KORTEST when dealing with K regs.

Differential Revision: https://reviews.llvm.org/D23163

llvm-svn: 279393
2016-08-21 08:02:27 +00:00
Duncan P. N. Exon Smith 8f44c98d04 ARM: Avoid dereferencing end() in ARMFrameLowering::emitEpilogue
This fixes the crash from PR29072, where the MachineBasicBlock::iterator
wasn't being properly checked against MachineBasicBlock::end() before
iterating.  This was another bug exposed by the new
ilist::iterator::operator*() assertion from r279314.

This testcase is poor quality.  bugpoint couldn't reduce any further,
and I haven't had time to dig into what's going on so I can't invent a
better one.  I didn't even get good CHECK lines in: this is just a
crasher.

I'm committing anyway since this is a real crash with an obvious fix,
but I'll leave PR29072 open and ask an ARM maintainer to help improve
the testcase.

llvm-svn: 279391
2016-08-21 00:08:10 +00:00
Tim Northover a11be04769 GlobalISel: support legalization of G_FCONSTANTs
llvm-svn: 279341
2016-08-19 22:40:08 +00:00
Tim Northover ea904f9424 GlobalISel: teach legalizer how to handle integer constants.
llvm-svn: 279340
2016-08-19 22:40:00 +00:00
Tim Shen b5e0f5ac95 [GraphTraits] Make nodes_iterator dereference to NodeType*/NodeRef
Currently nodes_iterator may dereference to a NodeType* or a NodeType&. Make them all dereference to NodeType*, which is NodeRef later.

Differential Revision: https://reviews.llvm.org/D23704
Differential Revision: https://reviews.llvm.org/D23705

llvm-svn: 279326
2016-08-19 21:20:13 +00:00
Krzysztof Parzyszek 29a6a2eb8f [Hexagon] Avoid register dependencies on indirect branches in packetizer
Do not packetize the instruction setting the branch address with the
indirect branch itself.

llvm-svn: 279324
2016-08-19 21:07:35 +00:00
Justin Lebar d13880a750 [NVPTX] Switch nvptx-use-infer-addrspace to true.
Summary:
This switches us to use a different, more powerful algorithm for address
space inference.  I've tested this locally and it seems to work great.
Once we're more confident in it, we can remove the old pass altogether.

Reviewers: jingyue

Subscribers: llvm-commits, tra, jholewinski

Differential Revision: https://reviews.llvm.org/D23694

llvm-svn: 279317
2016-08-19 20:46:45 +00:00
Krzysztof Parzyszek fb4c4178a2 [Hexagon] Fix subesthetic indentation
llvm-svn: 279303
2016-08-19 19:29:15 +00:00
Krzysztof Parzyszek 505eb498bd [Hexagon] Allow i1 values for 'r' constraint in inline-asm
llvm-svn: 279302
2016-08-19 19:17:28 +00:00
Krzysztof Parzyszek 8849a51370 [Hexagon] Do not cache alloca instructions during isel
They can be deleted or replicated, so the cache may become outdated.
They only need to be visited once during frame lowering, so just scan
the function instead.

llvm-svn: 279297
2016-08-19 18:46:13 +00:00
Krzysztof Parzyszek 3d9946eb23 [Hexagon] Fixes for new-value jump formation
- Recognize C2_cmpgtui, S2_tstbit_i, and S4_ntstbit_i.
- Avoid creating new-value instructions with both source operands equal.

llvm-svn: 279286
2016-08-19 17:54:49 +00:00
Krzysztof Parzyszek 5a7bef9c14 [Hexagon] Fix a few omissions in HexagonInstrInfo
llvm-svn: 279280
2016-08-19 17:20:57 +00:00
Simon Pilgrim d7a3782ae4 [X86][SSE] Generalised combining to VZEXT_MOVL to any vector size
This doesn't change tests codegen as we already combined to blend+zero which is what we lower VZEXT_MOVL to on SSE41+ targets, but it does put us in a better position when we improve shuffling for optsize.

llvm-svn: 279273
2016-08-19 17:02:00 +00:00
Krzysztof Parzyszek 639545b4d8 [Hexagon] Enforce LLSC packetization rules
Ensure that load locked and store conditional instructions are only
packetized with ALU32 instructions.

Patch by Ben Craig.

llvm-svn: 279272
2016-08-19 16:57:05 +00:00
Krzysztof Parzyszek b7640d4df0 [Hexagon] Minor updates to register definitions
llvm-svn: 279269
2016-08-19 16:40:19 +00:00
Krzysztof Parzyszek 9335bf0ec5 [Hexagon] Fix incorrect generation of S4_subi_asl_ri
Patch by Jyotsna Verma.

llvm-svn: 279267
2016-08-19 16:35:05 +00:00
Krzysztof Parzyszek dddb097a1f [Hexagon] Add missing pattern for C4_cmplte
llvm-svn: 279265
2016-08-19 16:11:33 +00:00
Krzysztof Parzyszek 0b8672269c [Hexagon] Make p0 an explicit operand in VA1_clr* subinstructions, NFC
llvm-svn: 279255
2016-08-19 15:17:19 +00:00
Krzysztof Parzyszek 6ce82951c3 [Hexagon] Add explicit default constructor for HexagonSelectionDAGInfo
llvm-svn: 279254
2016-08-19 15:13:54 +00:00
Krzysztof Parzyszek 0ba9754584 [Hexagon] Allow tail-call optimization when mixing C and fast calling conv
Patch by Arnold Schwaighofer.

llvm-svn: 279251
2016-08-19 15:02:18 +00:00
Krzysztof Parzyszek 66dd6797e8 [Hexagon] Check for empty live interval
Patch by Brendon Cahoon.

llvm-svn: 279249
2016-08-19 14:29:43 +00:00
Krzysztof Parzyszek db019ae801 [Hexagon] Consider zext/sext of a load to i32 to be free
llvm-svn: 279248
2016-08-19 14:22:07 +00:00
Anton Korobeynikov b38195c1a8 Revert r279242 - it's failing the tests
llvm-svn: 279247
2016-08-19 14:18:34 +00:00
Krzysztof Parzyszek a243adfd27 [Hexagon] Handle J2_jumptpt and J2_jumpfpt instructions
llvm-svn: 279246
2016-08-19 14:14:09 +00:00
Krzysztof Parzyszek 067debe0a0 [Hexagon] Fix indentation, NFC
llvm-svn: 279245
2016-08-19 14:12:51 +00:00
Krzysztof Parzyszek 9273ecc176 [Hexagon] Remove unnecessary llvm::, NFC
llvm-svn: 279244
2016-08-19 14:10:57 +00:00
Krzysztof Parzyszek 75e74ee699 [Hexagon] Rename the HEXAGON_MC namespace to Hexagon_MC, NFC
llvm-svn: 279243
2016-08-19 14:09:47 +00:00
Anton Korobeynikov 2aae31a945 Fix PR27500: on MSP430 the branch destination offset is measured in words, not bytes.
In addition, the branch instructions will have proper BB destinations, not offsets, like before.

Patch by Vadzim Dambrouski!

Differential Revision: https://reviews.llvm.org/D20162

llvm-svn: 279242
2016-08-19 14:07:10 +00:00
Krzysztof Parzyszek 6421b934ec [Hexagon] Mark PS_jumpret as pseudo-instruction, expand it into J2_jumpr
llvm-svn: 279241
2016-08-19 14:04:45 +00:00
Krzysztof Parzyszek bd8ef4b8ce [Hexagon] Improvements to handling and generation of FP instructions
Improved handling of fma, floating point min/max, additional load/store
instructions for floating point types.

Patch by Jyotsna Verma.

llvm-svn: 279239
2016-08-19 13:34:31 +00:00
Simon Pilgrim f1b8fdc074 [X86][SSE] Add support for matching commuted insertps patterns
INSERTPS doesn't fit well with our shuffle mask canonicalization, so we need to attempt both the original mask and the commuted mask to more likely get a match

llvm-svn: 279230
2016-08-19 10:31:53 +00:00
Dean Michael Berris 1dd1ca9727 [XRay] Synthesize a reference to the xray_instr_map
Without the synthesized reference to a symbol in the xray_instr_map,
linker section garbage collection will helpfully remove the whole
xray_instr_map section from the final executable (or archive). This will
cause the runtime to not be able to identify the sleds and hot-patch the
calls/jumps into the runtime trampolines.

This change adds a reference from the text section at the end of the
function to keep around the associated xray_instr_map section as well.

We also make sure that we catch this reference in the test.

Reviewers: chandlerc, echristo, majnemer, mehdi_amini

Subscribers: mehdi_amini, llvm-commits, dberris

Differential Revision: https://reviews.llvm.org/D23398

llvm-svn: 279204
2016-08-19 04:44:30 +00:00
Andrew Kaylor 81901d658f Include X86CallFrameOptimization in the opt-bisect process.
Differential Revision: https://reviews.llvm.org/D23683

llvm-svn: 279175
2016-08-18 22:49:51 +00:00
Saleem Abdulrasool dab786fb78 AArch64: remove extraneous padding
The structs BarrierOp, PrefetchOp, PSBHintOp are in AArch64AsmParser.cpp
(inside anonymous namespace).  This diff changes the order of fields and
removes the excessive padding (8 bytes).

Patch by Alexander Shaposhnikov!

llvm-svn: 279173
2016-08-18 22:35:06 +00:00
Zhan Jun Liau cf2f4b3251 [SystemZ] Use valid base/index regs for inline asm
Summary:
Inline asm memory constraints can have the base or index register be assigned
to %r0 right now. Make sure that we assign only ADDR64 registers to the base
and index.

Reviewers: uweigand

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23367

llvm-svn: 279157
2016-08-18 21:44:15 +00:00
Michael Kuperstein 2bc3d4d46c [SelectionDAG] Rename fextend -> fpextend, fround -> fpround, frnd -> fround
The names of the tablegen defs now match the names of the ISD nodes.
This makes the world a slightly saner place, as previously "fround" matched
ISD::FP_ROUND and not ISD::FROUND.

Differential Revision: https://reviews.llvm.org/D23597

llvm-svn: 279129
2016-08-18 20:08:15 +00:00
Wei Ding 52bb661dec AMDGPU : Fix QSAD and MQSAD instructions' incorrect data type.
Differential Revision: http://reviews.llvm.org/D23689

llvm-svn: 279126
2016-08-18 19:51:14 +00:00
Valery Pykhtin 609c2f8137 [AMDGPU] add s_incperflevel/s_decperflevel intrinsics.
Differential revision: https://reviews.llvm.org/D23666

llvm-svn: 279106
2016-08-18 18:06:20 +00:00
Elliot Colp 687691aeac Fix SystemZ compilation abort caused by negative AND mask
Normally, when an AND with a constant is lowered to NILL, the constant value is truncated to 16 bits. However, since r274066, ANDs whose results are used in a shift are caught by a different pattern that does not truncate. The instruction printer expects a 16-bit unsigned immediate operand for NILL, so this results in an abort.

This patch adds code to manually truncate the constant in this situation. The rest of the bits are then set, so we will detect a case for NILL "naturally" rather than using peephole optimizations.

Differential Revision: http://reviews.llvm.org/D21854

llvm-svn: 279105
2016-08-18 18:04:26 +00:00
Duncan P. N. Exon Smith 84c2da47f9 AArch64: Don't call getIterator() on iterators
Remove an unnecessary round-trip:

    iterator => operator->() => getIterator()

In some cases, the iterator is end(), so the dereference of operator->
is invalid (UB).

The testcase only crashes with r278974 (currently reverted to
investigate this), which adds an assertion for invalid dereferences of
ilist nodes.

Fixes PR29035.

llvm-svn: 279104
2016-08-18 17:58:09 +00:00
Eugene Zelenko 61a72d8850 [LLVM] Fix some Clang-tidy modernize-use-using and Include What You Use warnings
Differential revision: https://reviews.llvm.org/D23675

llvm-svn: 279102
2016-08-18 17:56:27 +00:00
Dan Gohman c9623db884 [WebAssembly] Disable the store-results optimization.
The WebAssemly spec removing the return value from store instructions, so
remove the associated optimization from LLVM.

This patch leaves the store instruction operands in place for now, so stores
now always write to "$drop"; these will be removed in a seperate patch.

llvm-svn: 279100
2016-08-18 17:51:27 +00:00
Ahmed Bougacha 33e19fe1c4 [AArch64][GlobalISel] Select floating-point binary ops.
There is no FREM instruction, but the others are straightforward.

llvm-svn: 279081
2016-08-18 16:05:11 +00:00
Derek Schuff ccdceda128 [WebAssembly] Refactor WebAssemblyLowerEmscriptenException pass for setjmp/longjmp
This patch changes the code structure of
WebAssemblyLowerEmscriptenException pass to support both exception
handling and setjmp/longjmp. It also changes the name of the pass and
the source file.

1. Change the file/pass name to WebAssemblyLowerEmscriptenExceptions ->
WebAssemblyLowerEmscriptenEHSjLj to make it clear that it supports both
EH and SjLj
2. List function / global variable names at the top so they
can be changed easily
3. Some cosmetic changes

Patch by Heejin Ahn

Differential Revision: https://reviews.llvm.org/D23588

llvm-svn: 279075
2016-08-18 15:27:25 +00:00
Ahmed Bougacha 1d0560b14d [AArch64][GlobalISel] Select G_SDIV/G_UDIV.
There is no REM instruction; that will require an expansion.
It's not obvious that should be done in select, rather than as a
(custom?) legalization.

llvm-svn: 279074
2016-08-18 15:17:13 +00:00
Sanjay Patel 7d37b221a2 fix typo; NFC
llvm-svn: 279068
2016-08-18 14:17:34 +00:00