Commit Graph

30679 Commits

Author SHA1 Message Date
Matt Arsenault 4d90625271 AMDGPU/GlobalISel: Fix load/store of types in other address spaces
There should probably be a size only matcher.

llvm-svn: 371155
2019-09-06 00:36:06 +00:00
Matt Arsenault 9ceb6edf11 GlobalISel/TableGen: Fix handling of EXTRACT_SUBREG constraints
This was only using the correct register constraints if this was the
final result instruction. If the extract was a sub instruction of the
result, it would attempt to use GIR_ConstrainSelectedInstOperands on a
COPY, which won't work. Move the handling to
createAndImportSubInstructionRenderer so it works correctly.

I don't fully understand why runOnPattern and
createAndImportSubInstructionRenderer both need to handle these
special cases, and constrain them with slightly different methods. If
I remove the runOnPattern handling, it does break the constraint when
the final result instruction is EXTRACT_SUBREG.

llvm-svn: 371150
2019-09-06 00:05:58 +00:00
Matt Arsenault 60c8b8bcf2 AMDGPU: Allow getMemOperandWithOffset to analyze stack accesses
Report soffset as a base register if the scratch resource can be
ignored.

llvm-svn: 371149
2019-09-05 23:54:35 +00:00
Matt Arsenault 59ff77ee38 AMDGPU: Fix emitting multiple stack loads for stack passed workitems
The same stack is loaded for each workitem ID, and each use. Nothing
prevents you from creating multiple fixed stack objects with the same
offsets, so this was creating a load for each unique frame index,
despite them being the same offset. Re-use the same frame index so the
loads are CSEable.

llvm-svn: 371148
2019-09-05 23:40:14 +00:00
Eli Friedman 9dd453ce8d [AArch64] Add testcase for codegen for sdiv by 2.
llvm-svn: 371147
2019-09-05 23:40:03 +00:00
Puyan Lotfi dc97ca9f25 [MIR] MIRNamer pass for improving MIR test authoring experience.
This patch reuses the MIR vreg renamer from the MIRCanonicalizerPass to cleanup
names of vregs in a MIR file for MIR test authors. I found it useful when
writing a regression test for a globalisel failure I encountered recently and
thought it might be useful for other folks as well.

Differential Revision: https://reviews.llvm.org/D67209

llvm-svn: 371121
2019-09-05 20:44:33 +00:00
Jessica Paquette 20e8667098 Recommit "[AArch64][GlobalISel] Teach AArch64CallLowering to handle basic sibling calls"
Recommit basic sibling call lowering (https://reviews.llvm.org/D67189)

The issue was that if you have a return type other than void, call lowering
will emit COPYs to get the return value after the call.

Disallow sibling calls other than ones that return void for now. Also
proactively disable swifterror tail calls for now, since there's a similar issue
with COPYs there.

Update call-translator-tail-call.ll to include test cases for each of these
things.

llvm-svn: 371114
2019-09-05 20:18:34 +00:00
Eli Friedman cae1e47f6e [IfConversion] Fix diamond conversion with unanalyzable branches.
The code was incorrectly counting the number of identical instructions,
and therefore tried to predicate an instruction which should not have
been predicated.  This could have various effects: a compiler crash,
an assembler failure, a miscompile, or just generating an extra,
unnecessary instruction.

Instead of depending on TargetInstrInfo::removeBranch, which only
works on analyzable branches, just remove all branch instructions.

Fixes https://bugs.llvm.org/show_bug.cgi?id=43121 and
https://bugs.llvm.org/show_bug.cgi?id=41121 .

Differential Revision: https://reviews.llvm.org/D67203

llvm-svn: 371111
2019-09-05 20:02:38 +00:00
Craig Topper 0fde412140 [X86] Enable BuildSDIVPow2 for i16.
We're able to use a 32-bit ADD and CMOV here and should work
well with our other i16->i32 promotion optimizations.

llvm-svn: 371107
2019-09-05 18:49:52 +00:00
Craig Topper b8d6ba3ca2 [X86] Override BuildSDIVPow2 for X86.
As noted in PR43197, we can use test+add+cmov+sra to implement
signed division by a power of 2.

This is based off the similar version in AArch64, but I've
adjusted it to use target independent nodes where AArch64 uses
target specific CMP and CSEL nodes. I've also blocked INT_MIN
as the transform isn't valid for that.

I've limited this to i32 and i64 on 64-bit targets for now and only
when CMOV is supported. i8 and i16 need further investigation to be
sure they get promoted to i32 well.

I adjusted a few tests to enable cmov to demonstrate the new
codegen. I also changed twoaddr-coalesce-3.ll to 32-bit mode
without cmov to avoid perturbing the scenario that is being
set up there.

Differential Revision: https://reviews.llvm.org/D67087

llvm-svn: 371104
2019-09-05 18:15:07 +00:00
Sanjay Patel 10412a69f9 [x86] fix horizontal math bug exposed by improved demanded elements analysis (PR43225)
https://bugs.llvm.org/show_bug.cgi?id=43225

llvm-svn: 371095
2019-09-05 17:28:17 +00:00
Craig Topper 673da001c5 [X86] Remove unneeded CHECK lines from a test. NFC
llvm-svn: 371093
2019-09-05 17:24:25 +00:00
Sanjay Patel 3856512334 [x86] add test for horizontal math bug (PR43225); NFC
llvm-svn: 371088
2019-09-05 16:58:18 +00:00
Krzysztof Parzyszek 0ce93194fe [Hexagon] Fix type in HexagonTargetLowering::ReplaceNodeResults
llvm-svn: 371083
2019-09-05 16:19:47 +00:00
Simon Pilgrim 29361c704d [X86][SSE] EltsFromConsecutiveLoads - ignore non-zero offset base loads (PR43227)
As discussed on D64551 and PR43227, we don't correctly handle cases where the base load has a non-zero byte offset.

Until we can properly handle this, we must bail from EltsFromConsecutiveLoads.

llvm-svn: 371078
2019-09-05 15:07:07 +00:00
David Green 83a3341246 [ARM] Fixup the creation of VPT blocks
This attempts to just fix the creation of VPT blocks, fixing up the iterating,
which instructions are considered in the bundle, and making sure that we do not
overrun the end of the block.

Differential Revision: https://reviews.llvm.org/D67219

llvm-svn: 371064
2019-09-05 13:37:04 +00:00
Simon Pilgrim 215910eeb2 [X86][SSE] Add (failing) test case for PR43227
llvm-svn: 371061
2019-09-05 12:36:11 +00:00
Petar Avramovic a4bfc8dfda [MIPS GlobalISel] Select G_FENCE
G_FENCE comes form fence instruction. For MIPS fence is generated in
AtomicExpandPass when atomic instruction gets surrounded with fence
instruction when needed.
G_FENCE arguments don't have LLT, because of that there is no job for
legalizer and regbankselect. Instruction select G_FENCE for MIPS32.

Differential Revision: https://reviews.llvm.org/D67181

llvm-svn: 371056
2019-09-05 11:20:32 +00:00
Petar Avramovic f5c7fe0795 [MIPS GlobalISel] Select llvm.trap intrinsic
Select G_INTRINSIC_W_SIDE_EFFECTS for Intrinsic::trap for MIPS32
via legalizeIntrinsic.

Differential Revision: https://reviews.llvm.org/D67180

llvm-svn: 371055
2019-09-05 11:16:37 +00:00
Petar Avramovic d2574d79b6 [MIPS GlobalISel] Lower SRet pointer arguments
Instead of returning structure by value clang usually adds pointer
to that structure as an argument. Pointers don't require special
handling no matter the SRet flag. Remove unsuccessful exit from
lowerCall for arguments with SRet flag if they are pointers.

Differential Revision: https://reviews.llvm.org/D67179

llvm-svn: 371054
2019-09-05 11:12:01 +00:00
Simon Pilgrim 071287c5a9 Revert rL370996 from llvm/trunk: [AArch64][GlobalISel] Teach AArch64CallLowering to handle basic sibling calls
This adds support for basic sibling call lowering in AArch64. The intent here is
to only handle tail calls which do not change the ABI (hence, sibling calls.)

At this point, it is very restricted. It does not handle

- Vararg calls.
- Calls with outgoing arguments.
- Calls whose calling conventions differ from the caller's calling convention.
- Tail/sibling calls with BTI enabled.

This patch adds

- `AArch64CallLowering::isEligibleForTailCallOptimization`, which is equivalent
   to the same function in AArch64ISelLowering.cpp (albeit with the restrictions
   above.)
- `mayTailCallThisCC` and `canGuaranteeTCO`, which are identical to those in
   AArch64ISelLowering.cpp.
- `getCallOpcode`, which is exactly what it sounds like.

Tail/sibling calls are lowered by checking if they pass target-independent tail
call positioning checks, and checking if they satisfy
`isEligibleForTailCallOptimization`. If they do, then a tail call instruction is
emitted instead of a normal call. If we have a sibling call (which is always the
case in this patch), then we do not emit any stack adjustment operations. When
we go to lower a return, we check if we've already emitted a tail call. If so,
then we skip the return lowering.

For testing, this patch

- Adds call-translator-tail-call.ll to test which tail calls we currently lower,
  which ones we don't, and which ones we shouldn't.
- Updates branch-target-enforcement-indirect-calls.ll to show that we fall back
  as expected.

Differential Revision: https://reviews.llvm.org/D67189
........
This fails on EXPENSIVE_CHECKS builds due to a -verify-machineinstrs test failure in CodeGen/AArch64/dllimport.ll

llvm-svn: 371051
2019-09-05 10:38:39 +00:00
Jonas Paulsson 821858780e [SystemZ] Recognize INLINEASM_BR in backend
Handle the remaining cases also by handling asm goto in
SystemZInstrInfo::getBranchInfo().

Review: Ulrich Weigand
https://reviews.llvm.org/D67151

llvm-svn: 371048
2019-09-05 10:20:05 +00:00
Guillaume Chatelet aff45e4b23 [LLVM][Alignment] Make functions using log of alignment explicit
Summary:
This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:

 - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation.
 - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation,
 - `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation,

Reviewers: lattner, thegameg, courbet

Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65945

llvm-svn: 371045
2019-09-05 10:00:22 +00:00
Matt Arsenault f581d575ce AMDGPU: Add intrinsics for address space identification
The library currently uses ptrtoint and directly checks the queue ptr
for this, which counts as a pointer capture.

llvm-svn: 371009
2019-09-05 02:20:39 +00:00
Matt Arsenault 69b1a2ae65 AMDGPU/GlobalISel: Restore insert point when getting aperture
Avoids SSA violations in a future patch.

llvm-svn: 371008
2019-09-05 02:20:32 +00:00
Matt Arsenault 25156ae7ea AMDGPU/GlobalISel: Fix placeholder value used for addrspacecast
llvm-svn: 371007
2019-09-05 02:20:29 +00:00
Matt Arsenault d51a3746d0 AMDGPU/GlobalISel: Fix assert on load from constant address
llvm-svn: 371006
2019-09-05 02:20:25 +00:00
Puyan Lotfi 6d3ea2d9b6 [mir-canon][NFC] Adding -verify-machineinstrs to mir-canon tests.
In the review process for some of the refactoring of MIRCanonicalizationPass it
was noted that some of the tests didn't have verifier enabled. Enabling here.

llvm-svn: 371005
2019-09-05 02:10:41 +00:00
Reid Kleckner 29ccc8523a Use -mtriple to fix AMDGPU test sensitive to object file format
GOTPCREL32 doesn't exist on COFF, so it isn't used when this test runs
on Windows.

llvm-svn: 371000
2019-09-05 00:34:01 +00:00
Jessica Paquette b78324fc40 [AArch64][GlobalISel] Teach AArch64CallLowering to handle basic sibling calls
This adds support for basic sibling call lowering in AArch64. The intent here is
to only handle tail calls which do not change the ABI (hence, sibling calls.)

At this point, it is very restricted. It does not handle

- Vararg calls.
- Calls with outgoing arguments.
- Calls whose calling conventions differ from the caller's calling convention.
- Tail/sibling calls with BTI enabled.

This patch adds

- `AArch64CallLowering::isEligibleForTailCallOptimization`, which is equivalent
   to the same function in AArch64ISelLowering.cpp (albeit with the restrictions
   above.)
- `mayTailCallThisCC` and `canGuaranteeTCO`, which are identical to those in
   AArch64ISelLowering.cpp.
- `getCallOpcode`, which is exactly what it sounds like.

Tail/sibling calls are lowered by checking if they pass target-independent tail
call positioning checks, and checking if they satisfy
`isEligibleForTailCallOptimization`. If they do, then a tail call instruction is
emitted instead of a normal call. If we have a sibling call (which is always the
case in this patch), then we do not emit any stack adjustment operations. When
we go to lower a return, we check if we've already emitted a tail call. If so,
then we skip the return lowering.

For testing, this patch

- Adds call-translator-tail-call.ll to test which tail calls we currently lower,
  which ones we don't, and which ones we shouldn't.
- Updates branch-target-enforcement-indirect-calls.ll to show that we fall back
  as expected.

Differential Revision: https://reviews.llvm.org/D67189

llvm-svn: 370996
2019-09-04 22:54:52 +00:00
Matt Arsenault 2df41a8e38 AMDGPU/GlobalISel: Select G_BITREVERSE
llvm-svn: 370980
2019-09-04 20:46:31 +00:00
Matt Arsenault 5ff310e298 GlobalISel: Add basic legalization for G_BITREVERSE
llvm-svn: 370979
2019-09-04 20:46:15 +00:00
Alina Sbirlea 6da79ce1fe [MemorySSA] Re-enable MemorySSA use.
Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370957
2019-09-04 19:16:04 +00:00
Craig Topper f0081dac81 [X86] Pre-commit test cases and test run line changes for D67087
llvm-svn: 370937
2019-09-04 17:33:38 +00:00
Matt Arsenault 84489b34f6 AMDGPU: Handle frame index expansion with no free SGPRs pre gfx9
Since an add instruction must produce an unused carry out, this
requires additional SGPRs. This can be avoided by keeping the entire
offset computation in SGPRs. If one SGPR is still available, this only
costs one extra mov. If none are available, the entire computation can
be done in place and reversed.

This does assume the use is a VGPR operand. This was already assumed,
and we currently only select frame indexes to VALU instructions. This
should probably be fixed at some point to handle more possible MIR.

llvm-svn: 370929
2019-09-04 17:12:57 +00:00
Matt Arsenault 70becc20fa GlobalISel: Add G_BITREVERSE
This is the first failing pattern for AMDGPU and is trivial to handle.

llvm-svn: 370927
2019-09-04 17:06:53 +00:00
Matt Arsenault d9af712da4 AMDGPU/GlobalISel: Make 16-bit constants legal
This is mostly for the benefit of patterns which use 16-bit constants.

llvm-svn: 370921
2019-09-04 16:19:45 +00:00
Krzysztof Parzyszek 08a09822a5 [Hexagon] Improve generated code for test-if-bit-clear, one more time
Adjust isel patterns after recent commit. Fixes https://llvm.org/PR43194.

llvm-svn: 370913
2019-09-04 15:22:36 +00:00
Sam Parker fea532230b [ARM][ParallelDSP] SExt mul for accumulation
For any unpaired muls, we accumulate them as an input to the
reduction. Check the type of the mul and perform a sext if the
existing accumlator input type is not the same.

Differential Revision: https://reviews.llvm.org/D66993

llvm-svn: 370851
2019-09-04 08:41:34 +00:00
Jim Lin b77aa1d248 [RISCV] Enable tail call opt for variadic function
Summary: Tail call opt can treat variadic function call the same as normal function call

Reviewers: mgrang, asb, lenary, lewis-revill

Reviewed By: lenary

Subscribers: luismarques, pzheng, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66278

llvm-svn: 370835
2019-09-04 02:03:36 +00:00
Reid Kleckner 3fa07dee94 Revert [Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn
This reverts r370525 (git commit 0bb1630685)
Also reverts r370543 (git commit 185ddc08ee)

The approach I took only works for functions marked `noreturn`. In
general, a call that is not known to be noreturn may be followed by
unreachable for other reasons. For example, there could be multiple call
sites to a function that throws sometimes, and at some call sites, it is
known to always throw, so it is followed by unreachable. We need to
insert an `int3` in these cases to pacify the Windows unwinder.

I think this probably deserves its own standalone, Win64-only fixup pass
that runs after block placement. Implementing that will take some time,
so let's revert to TrapUnreachable in the mean time.

llvm-svn: 370829
2019-09-03 22:27:27 +00:00
Heejin Ahn 49e7ee4dd5 [WebAssembly] Compare functions by names in Emscripten Sjlj
Summary:
This removes all string constants for function names and compares
functions by string directly when needed. Many of these constants are
used only once or twice so the benefit of defining them separately is
not very clear, and this actually fixes a bug.

When we already have a `malloc` declaration which is an alias to
something else within the module,
```
@malloc = weak hidden alias i8* (i32), i8* (i32)* @dlmalloc
```
(this happens compiling with emscripten with `-s WASM_OBJECT_FILES=0`
because all bc files are merged before being fed into `wasm-ld` which
runs the backend optimizations as LTO)

`Module::getFunction("malloc")` in `canLongjmp` returns `nullptr`
because `Module::getFunction` dyncasts pointer into `Function`, but the
alias is a `GlobalValue` but not a `Function`. This makes `canLongjmp`
return false for `malloc` in this case, and we end up adding a lot of
longjmp handling code around malloc. This is not only a code size
increase but actually a bug because `malloc` is used in the entry block
when preparing for setjmp tables for emscripten sjlj handling, and this
makes initial setjmp preparation, which has to happen in the entry
block, move to another split block, and this interferes with SSA update
later.

This also adds two more functions, `getTempRet0` and `setTempRet0`, in
the list of not longjmp-able functions.

Fixes https://github.com/emscripten-core/emscripten/issues/8935.

Reviewers: sbc100

Subscribers: mehdi_amini, jgravelle-google, hiraditya, sunfish, dexonsmith, dschuff, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67129

llvm-svn: 370828
2019-09-03 22:26:49 +00:00
Amara Emerson 2a2c25ba48 [AArch64][GlobalISel] Legalize 128 bit divisions to libcalls.
Now that we have the infrastructure to support s128 types as parameters
we can expand these to libcalls.

Differential Revision: https://reviews.llvm.org/D66185

llvm-svn: 370823
2019-09-03 21:42:32 +00:00
Amara Emerson fbaf425b79 [GlobalISel][CallLowering] Add support for splitting types according to calling conventions.
On AArch64, s128 types have to be split into s64 GPRs when passed as arguments.
This change adds the generic support in call lowering for dealing with multiple
registers, for incoming and outgoing args.

Support for splitting for return types not yet implemented.

Differential Revision: https://reviews.llvm.org/D66180

llvm-svn: 370822
2019-09-03 21:42:28 +00:00
Alina Sbirlea ccb1862bc9 [MemorySSA] Disable MemorySSA use.
Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370821
2019-09-03 21:20:46 +00:00
Bjorn Pettersson b0eb394417 [CodeGen] Use FSHR in DAGTypeLegalizer::ExpandIntRes_MULFIX
Summary:
Simplify the right shift of the intermediate result (given
in four parts) by using funnel shift.

There are some impact on lit tests, but that seems to be
related to register allocation differences due to how FSHR
is expanded on X86 (giving a slightly different operand order
for the OR operations compared to the old code).

Reviewers: leonardchan, RKSimon, spatel, lebedev.ri

Reviewed By: RKSimon

Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, pzheng, bevinh, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67036

llvm-svn: 370813
2019-09-03 19:35:07 +00:00
Alina Sbirlea e331d50534 [MemorySSA] Re-enable MemorySSA use.
Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370811
2019-09-03 19:28:37 +00:00
Jessica Paquette 15036acb05 [AArch64][GlobalISel] Don't import i64imm_32bit pattern at -O0
This pattern, when imported at -O0 adds an extra copy via the SUBREG_TO_REG.

This is because the SUBREG_TO_REG is not eliminated. At all other opt levels,
it is eliminated.

This is a 1% geomean code size savings at -O0 on CTMark.

Differential Revision: https://reviews.llvm.org/D67027

llvm-svn: 370789
2019-09-03 17:21:12 +00:00
Jonas Paulsson a0a811739d [SystemZ] Recognize INLINEASM_BR in backend.
SystemZInstrInfo::analyzeBranch() needs to check for INLINEASM_BR
instructions, or it will crash.

Review: Ulrich Weigand
llvm-svn: 370753
2019-09-03 13:31:22 +00:00
David Green 2f3574c168 [ARM] Ignore Implicit CPSR regs when lowering from Machine to MC operands
The code here seems to date back to r134705, when tablegen lowering was first
being added. I don't believe that we need to include CPSR implicit operands on
the MCInst. This now works more like other backends (like AArch64), where all
implicit registers are skipped.

This allows the AliasInst for CSEL's to match correctly, as can be seen in the
test changes.

Differential revision: https://reviews.llvm.org/D66703

llvm-svn: 370745
2019-09-03 11:30:54 +00:00
Jonas Paulsson f12415812c [SystemZ] Add support for fentry.
SystemZAsmPrinter now properly emits function calls to __fentry__.

Review: Ulrich Weigand
llvm-svn: 370743
2019-09-03 11:21:12 +00:00
David Green 61973d978b [ARM] Invert CSEL predicates if the opposite is a simpler constant to materialise
This moves ConstantMaterializationCost into ARMBaseInstrInfo so that it can
also be used in ISel Lowering, adding codesize values to the computed costs, to
be able to compare either approximate instruction counts or codesize costs.

It also adds a HasLowerConstantMaterializationCost, which compares the
ConstantMaterializationCost of two values, returning true if the first is
smaller either in instruction count/codesize, or falling back to the other in
the case that they are equal.

This is used in constant CSEL lowering to invert the predicate if the opposite
is easier to materialise.

Differential revision: https://reviews.llvm.org/D66701

llvm-svn: 370741
2019-09-03 11:06:24 +00:00
David Green 57cc65ff47 [ARM] Generate 8.1-m CSINC, CSNEG and CSINV instructions.
Arm 8.1-M adds a number of related CSEL instructions, including CSINC, CSNEG and CSINV. These choose between two values given the content in CPSR and a condition, performing an increment, negation or inverse of the false value.

This adds some selection for them, either from constant values or patterns. It does not include CSEL directly, which is currently not always making code better. It is still useful, but we will have to check more carefully where it should and shouldn't be used.

Code by Ranjeet Singh and Simon Tatham, with some modifications from me.

Differential revision: https://reviews.llvm.org/D66483

llvm-svn: 370739
2019-09-03 10:53:07 +00:00
David Green a1ae7e3734 [ARM] Add csel tests. NFC
llvm-svn: 370738
2019-09-03 10:32:46 +00:00
Simon Atanasyan 25d5b54542 [mips] Switch to the `.text` section after emitting asm file preamble
Now the last `.section` directive in the MIPS asm file preamble
is the `.section .mdebug.abi`. If assembler code injected for example
by the LLVM `module asm` or the C ` __asm` directives do not contain
explicit switching to the `.text` section it goes to the `.mdebug.abi`
section. It might be unexpected to the user and in fact for example
breaks building some existing code like FreeBSD libc [1].

The patch forces switching to the `.text` section after emitting MIPS
assembler file preamble.

[1] https://bugs.llvm.org/show_bug.cgi?id=43119

Fix PR43119.

Differential Revision: https://reviews.llvm.org/D67014

llvm-svn: 370735
2019-09-03 10:24:07 +00:00
David Green 3e8d5f335d [ARM] Fix MVE ldst offset ranges
We were using isShiftedInt<7, Shift>(RHSC) to detect the ranges of offsets to
fold into MVE loads/stores. The instructions actually take a 7 bit unsigned
integer which is either added or subtracted. So something more like
isShiftedUInt<7, Shift>(abs(RHSC)).

Instead I've changes this to use the isScaledConstantInRange method, same as in
SelectT2AddrModeImm7Offset used by pre/post inc, which seemed to already be
getting this correct.

Differential revision: https://reviews.llvm.org/D66997

llvm-svn: 370731
2019-09-03 09:57:02 +00:00
Oliver Stannard 39bf484d92 Bug fix on function epilog optimization (ARM backend)
To save a 'add sp,#val' instruction by adding registers to the final pop instruction,
the first register transferred by this pop instruction need to be found.
If the function to be optimized has a non-void return value, the operand list contains
r0 (implicit) which prevents the optimization to take place.
Therefore implicit register references should be skipped in the search loop,
because this registers are never popped from the stack.

Patch by Rainer Herbertz (rOptimizer)!

Differential revision: https://reviews.llvm.org/D66730

llvm-svn: 370728
2019-09-03 09:51:19 +00:00
David Green 855caf2335 [ARM] More MVE load/store tests for offsets around the negative limit. NFC
llvm-svn: 370726
2019-09-03 09:42:16 +00:00
James Molloy 935499579c [MachinePipeliner] Add a way to unit-test the schedule emitter
Emitting a schedule is really hard. There are lots of corner cases to take care of; in fact, of the 60+ SWP-specific testcases in the Hexagon backend most of those are testing codegen rather than the schedule creation itself.

One issue is that to test an emission corner case we must craft an input such that the generated schedule uses that corner case; sometimes this is very hard and convolutes testcases. Other times it is impossible but we want to test it anyway.

This patch adds a simple test pass that will consume a module containing a loop and generate pipelined code from it. We use post-instr-symbols as a way to annotate instructions with the stage and cycle that we want to schedule them at.

We also provide a flag that causes the MachinePipeliner to generate these annotations instead of actually emitting code; this allows us to generate an input testcase with:

  llc < %s -stop-after=pipeliner -pipeliner-annotate-for-testing -o test.mir

And run the emission in isolation with:

  llc < test.mir -run-pass=modulo-schedule-test

llvm-svn: 370705
2019-09-03 08:20:31 +00:00
Sam Tebbs 8b2df85d02 [ARM] Select vmla
This patch adds vmla selection.

Differential revision: https://reviews.llvm.org/D66297

llvm-svn: 370704
2019-09-03 08:17:46 +00:00
Craig Topper 9dc8c448ed [X86] Don't use Expand for i32 fp_to_uint on SSE1/2 targets on 32-bit target.
Use Custom lowering instead. Fall back to default expansion only
when the scalar FP type belongs in an XMM register. This improves
lowering for i32 to fp80, and also i32 to double on SSE1 only.

llvm-svn: 370699
2019-09-03 05:57:18 +00:00
Craig Topper f255f44336 [X86] Add an exhaustive test for i32 fptosi/fptoui across different triples and features.
llvm-svn: 370698
2019-09-03 05:57:14 +00:00
Craig Topper dcecc7ea46 [X86] Custom promote i32->f80 uint_to_fp on AVX512 64-bit targets.
Reuse the same code to promote all i32 uint_to_fp on 64-bit targets
to simplify the X86ISelLowering constructor.

llvm-svn: 370693
2019-09-03 02:51:10 +00:00
Craig Topper 45cd185109 [X86] Enable fp128 as a legal type with SSE1 rather than with MMX.
FP128 values are passed in xmm registers so should be asssociated
with an SSE feature rather than MMX which uses a different set
of registers.

llc enables sse1 and sse2 by default with x86_64. But does not
enable mmx. Clang enables all 3 features by default.

I've tried to add command lines to test with -sse
where possible, but any test that returns a value in an xmm
register fails with a fatal error with -sse since we have no
defined ABI for that scenario.

llvm-svn: 370682
2019-09-02 20:16:30 +00:00
David Green a5fd8d8f47 [ARM] MVE predicate bitcast test and VPSEL adjustment. NFC
llvm-svn: 370678
2019-09-02 19:03:35 +00:00
David Green a95ec59fa5 [ARM] Use MQPR not QPR for MVE registers
We should be using MQPR, and if we don't we can get COPYs and PHIs created for
QPR. These get folded into instructions, failing verification checks.

Differential revision: https://reviews.llvm.org/D66214

llvm-svn: 370676
2019-09-02 17:18:23 +00:00
Robert Lougher 13190c4225 [TargetLowering][PS4] Add sincos(f) lib functions when target is PS4
PS4 supports sincosf and sincos. Adding the library functions enables
the sin(f)+cos(f) -> sincos(f) optimization.

Differential Revision: https://reviews.llvm.org/D67009

llvm-svn: 370675
2019-09-02 16:53:32 +00:00
Ulrich Weigand b21e245711 [SystemZ] Support constrained fpto[su]i intrinsics
Now that constrained fpto[su]i intrinsic are available,
add codegen support to the SystemZ backend.

In addition to pure back-end changes, I've also needed
to add the strict_fp_to_[su]int and any_fp_to_[su]int
pattern fragments in the obvious way.

llvm-svn: 370674
2019-09-02 16:49:29 +00:00
Kerry McLaughlin da4ef9b4c8 [SVE][Inline-Asm] Support for SVE asm operands
Summary:
Adds the following inline asm constraints for SVE:
  - w: SVE vector register with full range, Z0 to Z31
  - x: Restricted to registers Z0 to Z15 inclusive.
  - y: Restricted to registers Z0 to Z7 inclusive.

This change also adds the "z" modifier to interpret a register as an SVE register.

Not all of the bitconvert patterns added by this patch are used, but they have been included here for completeness.

Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, rengolin, cameron.mcinally, greened

Reviewed By: sdesmalen

Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66302

llvm-svn: 370673
2019-09-02 16:12:31 +00:00
Sanjay Patel 4e54cf3e0e [DAGCombiner] try to form test+set out of shift+mask patterns
The motivating bugs are:
https://bugs.llvm.org/show_bug.cgi?id=41340
https://bugs.llvm.org/show_bug.cgi?id=42697

As discussed there, we could view this as a failure of IR canonicalization,
but then we would need to implement a backend fixup with target overrides
to get this right in all cases. Instead, we can just view this as a codegen
opportunity. It's not even clear for x86 exactly when we should favor
test+set; some CPUs have better theoretical throughput for the ALU ops than
bt/test.

This patch is made more complicated than I expected because there's an early
DAGCombine for 'and' that can change types of the intermediate ops via
trunc+anyext.

Differential Revision: https://reviews.llvm.org/D66687

llvm-svn: 370668
2019-09-02 14:52:09 +00:00
Jay Foad 6e18266aa4 Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0"
Summary:
D61491 caused us to use relocs when they're not strictly necessary, to
refer to symbols in the text section. This is a pessimization and it's a
problem for some loaders that don't support relocs yet.

Reviewers: nhaehnle, arsenm, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65813

llvm-svn: 370667
2019-09-02 14:40:57 +00:00
Piotr Sobczak 252a584cbd [AMDGPU] Add test
Summary:
Add test checking that the redundant immediate MOV instruction
(by-product of handling phi nodes) is not found in the generated code.

Reviewers: arsenm, anton-afanasyev, craig.topper, rtereshin, bogner

Reviewed By: arsenm

Subscribers: kzhuravl, yaxunl, dstuttard, tpr, t-tye, wdng, jvesely, nhaehnle, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63860

llvm-svn: 370634
2019-09-02 10:02:54 +00:00
Amara Emerson 453ef4e376 [AArch64][GlobalISel] Fix zext narrowScalar to use the right type when creating
the merges.

Fixes PR43171.

llvm-svn: 370627
2019-09-02 08:18:55 +00:00
Craig Topper 3ab210862a [X86] Add initial support for unfolding broadcast loads from arithmetic instructions to enable LICM hoisting of the load
MachineLICM can hoist an invariant load, but if that load is folded it needs to be unfolded. On AVX512 sometimes this load is an broadcast load which we were previously unable to unfold. This patch adds initial support for that with a very basic list of supported instructions as a starting point.

Differential Revision: https://reviews.llvm.org/D67017

llvm-svn: 370620
2019-09-01 22:14:36 +00:00
Sanjay Patel c882208367 [DAGCombiner] improve throughput of shift+logic+shift
The motivating case for this is a long way from here:
https://bugs.llvm.org/show_bug.cgi?id=43146
...but I think this is where we have to start.

We need to canonicalize/optimize sequences of shift and logic to ease
pattern matching for things like bswap and improve perf in general.
But without the artificial limit of '!LegalTypes' (early combining),
there are a lot of test diffs, and not all are good.

In the minimal tests added for this proposal, x86 should have better
throughput in all cases. AArch64 is neutral for scalar tests because
it can fold shifts into bitwise logic ops.

There are 3 shift opcodes and 3 logic opcodes for a total of 9 possible patterns:
https://rise4fun.com/Alive/VlI
https://rise4fun.com/Alive/n1m
https://rise4fun.com/Alive/1Vn

Differential Revision: https://reviews.llvm.org/D67021

llvm-svn: 370617
2019-09-01 18:38:15 +00:00
David Green 8469a39af3 [ARM] Remove MVE masked loads/stores
These were never enabled correctly and are causing other problems. Taking them
out for the moment, whilst we work on the issues.

This reverts r370329.

llvm-svn: 370607
2019-09-01 10:11:40 +00:00
Shiva Chen adfdcb9c26 [TargetLowering] Fix Bugzilla ID 43183 to avoid soften comparison broken with constant inputs
Summary:
  This fixes the bugzilla id 43183 which triggerd by the following commit:
  [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall

llvm-svn: 370604
2019-09-01 04:52:54 +00:00
Puyan Lotfi 75a8a212d4 [GlobalISel][NFC] Regression test cases for aarch64 legalizer (s128 sext+icmp).
There were legalizer asserts in aarch64 globalisel (in debug mode) with s128
sext+icmp before r367060 and r366943 landed. These are just a couple reduced
mir and ir regression tests that came from a build where these were encountered.

llvm-svn: 370602
2019-09-01 00:45:28 +00:00
Simon Pilgrim f8d1d00190 [X86] EltsFromConsecutiveLoads - Don't confuse elt count with vector element count (PR43170)
EltsFromConsecutiveLoads was assuming that the number of input elts was the same as the number of elements in the output vector type when creating a zeroing shuffle, causing an assert when subvectors were being combined instead of just scalars.

llvm-svn: 370592
2019-08-31 16:21:31 +00:00
Simon Pilgrim 20be06db97 [X86][AVX512] Regenerate tests with common prefixes
llvm-svn: 370591
2019-08-31 16:04:39 +00:00
Sanjay Patel 11704d0f51 [AArch64][x86] increase value type coverage in tests; NFC
This goes with D67021.

llvm-svn: 370590
2019-08-31 15:49:16 +00:00
Amaury Sechet 82825ab882 [DAGCombiner] Match (add X, X) as (shl X, 1) when detecting rotate.
Summary: The combiner transforms (shl X, 1) into (add X, X).

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66882

llvm-svn: 370578
2019-08-31 11:40:02 +00:00
Thomas Lively d0d9317061 [WebAssembly] Add SIMD QFMA/QFMS
Summary:
Adds clang builtins and LLVM intrinsics for these experimental
instructions. They are not implemented in engines yet, but that is ok
because the user must opt into using them by calling the builtins.

Reviewers: aheejin, dschuff

Reviewed By: aheejin

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D67020

llvm-svn: 370556
2019-08-31 00:12:29 +00:00
Reid Kleckner a33474d595 [X86] Print register names in .seh_* directives
Also improve assembler parser register validation for .seh_ directives.
This requires moving X86-specific seh directive handling into the x86
backend, which addresses some assembler FIXMEs.

Differential Revision: https://reviews.llvm.org/D66625

llvm-svn: 370533
2019-08-30 21:23:05 +00:00
Sanjay Patel cfe959709f [x86] add tests for shift-logic-shift; NFC
llvm-svn: 370529
2019-08-30 20:51:51 +00:00
Sanjay Patel 82847b50e9 [AArch64] add tests for shift-logic-shift; NFC
llvm-svn: 370528
2019-08-30 20:48:43 +00:00
Reid Kleckner 0bb1630685 [Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn
Users have complained llvm.trap produce two ud2 instructions on Win64,
one for the trap, and one for unreachable. This change fixes that.

TrapUnreachable was added and enabled for Win64 in r206684 (April 2014)
to avoid poorly understood issues with the Windows unwinder.

There seem to be two major things in play:
- the unwinder
- C++ EH, _CxxFrameHandler3 & co

The unwinder disassembles forward from the return address to scan for
epilogues. Inserting a ud2 had the effect of stopping the unwinder, and
ensuring that it ran the EH personality function for the current frame.
However, it's not clear what the unwinder does when the return address
happens to be the last address of one function and the first address of
the next function.

The Visual C++ EH personality, _CxxFrameHandler3, needs to figure out
what the current EH state number is. It does this by consulting the
ip2state table, which maps from PC to state number. This seems to go
wrong when the return address is the last PC of the function or catch
funclet.

I'm not sure precisely which system is involved here, but in order to
address these real or hypothetical problems, I believe it is enough to
insert int3 after a call site if it would otherwise be the last
instruction in a function or funclet.  I was able to reproduce some
similar problems locally by arranging for a noreturn call to appear at
the end of a catch block immediately before an unrelated function, and I
confirmed that the problems go away when an extra trailing int3
instruction is added.

MSVC inserts int3 after every noreturn function call, but I believe it's
only necessary to do it if the call would be the last instruction. This
change inserts a pseudo instruction that expands to int3 if it is in the
last basic block of a function or funclet. I did what I could to run the
Microsoft compiler EH tests, and the ones I was able to run showed no
behavior difference before or after this change.

Differential Revision: https://reviews.llvm.org/D66980

llvm-svn: 370525
2019-08-30 20:46:39 +00:00
Sanjay Patel a39ef6dea6 [Thumb2] tighten CHECK lines in test; NFC
The sequence between the function call and the asm start
may change without affecting what this test is looking for,
but we should have a better idea about what that sequence
looks like.

llvm-svn: 370518
2019-08-30 20:15:01 +00:00
Craig Topper 4b61b6476b [X86] Fix mul test cases in avx512-broadcast-unfold.ll to not get canonicalized to fadd. Remove the fsub test cases which were also testing fadd.
Not sure how to prevent an fsub by constant getting turned into an fadd by negative constant.

llvm-svn: 370515
2019-08-30 20:04:23 +00:00
Craig Topper a707ced18f [X86] Regenerate the test cases added in r370506.
Something weird happened with the v2i64/v2f64 test cases which
don't use broadcast. So they should already be hoisted, but
weren't in the version I submitted in r370506. This fixes that.
Not sure if something changed or I screwed up.

llvm-svn: 370507
2019-08-30 19:42:48 +00:00
Craig Topper 2396919200 [X86] Add test caes for opportunities for machine LICM to unfold broadcasted constant pool loads.
MachineLICM is able to unfold loads to move an invariant load out
a loop, but X86 infrastructure currently lacks the ability to do
this when avx512 embedded broadcasting is used.

This test adds examples for the basic float point operations,
add, mul, and, or, and xor.

llvm-svn: 370506
2019-08-30 19:26:06 +00:00
Jinsong Ji fb4b86af92 [PowerPC][NFC] Avoid checking non-relevant .cfi instructions
Summary:
This is brought up in
https://reviews.llvm.org/D64662?id=209923#inline-599490

CFI information are non-relevant to quite some testcases,
we should get rid of checking them when its unecessary.

This patch avoid generating cfi info in testcases that are not
testing prolog/epilog or exception handling.

Reviewers: kbarton, hfinkel, nemanjai, #powerpc

Reviewed By: hfinkel

Subscribers: MaskRay, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67016

llvm-svn: 370505
2019-08-30 19:24:25 +00:00
Craig Topper 18e8d02e8c [X86] Pass v32i16/v64i8 in zmm registers on KNL target.
gcc and icc pass these types in zmm registers in zmm registers.

This patch implements a quick hack to override the register
type before calling convention handling to one that is legal.
Longer term we might want to do something similar to 256-bit
integer registers on AVX1 where we just split all the operations.

Fixes PR42957

Differential Revision: https://reviews.llvm.org/D66708

llvm-svn: 370495
2019-08-30 17:35:08 +00:00
Evgeniy Stepanov 04647f5e22 MemTag: unchecked load/store optimization.
Summary:
MTE allows memory access to bypass tag check iff the address argument
is [SP, #imm]. This change takes advantage of this to demote uses of
tagged addresses to regular FrameIndex operands, reducing register
pressure in large functions.

MO_TAGGED target flag is used to signal that the FrameIndex operand
refers to memory that might be tagged, and needs to be handled with
care. Such operand must be lowered to [SP, #imm] directly, without a
scratch register.

The transformation pass attempts to predict when the offset will be
out of range and disable the optimization.
AArch64RegisterInfo::eliminateFrameIndex has an escape hatch in case
this prediction has been wrong, but it is quite inefficient and should
be avoided.

Reviewers: pcc, vitalybuka, ostannard

Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66457

llvm-svn: 370490
2019-08-30 17:23:02 +00:00
Simon Atanasyan 68f73bf262 [mips] Merge common checkings under the same check prefix. NFC
llvm-svn: 370467
2019-08-30 12:15:12 +00:00
Luis Marques c2b3d527fa [RISCV] Fix a couple of tests' CHECKs
llvm-svn: 370466
2019-08-30 12:11:47 +00:00
Amaury Sechet 485760f4c0 [X86] Add tests for rotate matching. NFC
llvm-svn: 370464
2019-08-30 11:35:28 +00:00
Petar Avramovic e96892a8aa [MIPS GlobalISel] Lower uitofp
Add custom lowering for G_UITOFP for MIPS32.

Differential Revision: https://reviews.llvm.org/D66930

llvm-svn: 370432
2019-08-30 05:51:12 +00:00
Petar Avramovic 6412b56513 [MIPS GlobalISel] Lower fptoui
Add lower for G_FPTOUI. Algorithm is similar to the SDAG version
in TargetLowering::expandFP_TO_UINT.
Lower G_FPTOUI for MIPS32.

Differential Revision: https://reviews.llvm.org/D66929

llvm-svn: 370431
2019-08-30 05:44:02 +00:00
Dan Gohman 8cfeeaf9de [CodeGen] Fix lowering for returning the result of an extractvalue
When the number of return values exceeds the number of registers available,
SelectionDAGBuilder::visitRet transforms a function's return to use a
pointer to a buffer to hold return values. When the returned value is an
operator such as extractvalue, the value may have a non-zero result number.
Add that number to the indexing when obtaining the values to store.

This fixes https://bugs.llvm.org/show_bug.cgi?id=43132.

Differential Revision: https://reviews.llvm.org/D66978

llvm-svn: 370430
2019-08-30 04:33:22 +00:00
Jinsong Ji 54a1ad5bd7 [PowerPC][NFC] Use -mtriple in RUN line, remove target triple in tls.ll
To avoid confusion, especially when -mtriple are also added for PPC32.

llvm-svn: 370427
2019-08-30 02:57:33 +00:00
Fangrui Song 7704b54389 [PPC32] Emit R_PPC_GOT_TPREL16 instead R_PPC_GOT_TPREL16_LO
Unlike ppc64, which has ADDISgotTprelHA+LDgotTprelL pairs,
ppc32 just uses LDgotTprelL32, so it does not make lots of sense to use
_LO without a paired _HA.

Emit R_PPC_GOT_TPREL16 instead R_PPC_GOT_TPREL16_LO to match GCC, and
get better linker relocation check. Note, R_PPC_GOT_TPREL16_{HA,LO}
don't have good linker support:

(a) lld does not support R_PPC_GOT_TPREL16_{HA,LO}.
(b) Top of tree ld.bfd does not support R_PPC_GOT_REL16_HA Initial-Exec -> Local-Exec relaxation:

  // a.o
  addis 3, 3, tsd_tls@got@tprel@ha
  lwz 3, tsd_tls@got@tprel@l(3)
  add 3, 3, tsd_tls@tls
  // b.o
  .section .tdata,"awT"; .globl tsd_tls; tsd_tls:

  // ld/ld-new a.o b.o
  internal error, aborting at ../../bfd/elf32-ppc.c:7952 in ppc_elf_relocate_section

Reviewed By: adalava

Differential Revision: https://reviews.llvm.org/D66925

llvm-svn: 370426
2019-08-30 02:20:49 +00:00
Jinsong Ji 1ed7d2119e [PowerPC] Support extended mnemonics mffprwz etc.
Summary:
Reported in https://github.com/opencv/opencv/issues/15413.

We have serveral extended mnemonics for Move To/From Vector-Scalar Register Instructions
eg: mffprd,mtfprd etc.

We only support one of them, this patch add the others.

Reviewers: nemanjai, steven.zhang, hfinkel, #powerpc

Reviewed By: hfinkel

Subscribers: wuzish, qcolombet, hiraditya, kbarton, MaskRay, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66963

llvm-svn: 370411
2019-08-29 21:53:59 +00:00
Jessica Paquette 04e657be28 [AArch64][GlobalISel] Select arithmetic extended register patterns
This teaches GISel to select patterns which fold an extend plus optional shift
into the addressing mode. In particular, adds and subs.

Factor out the arith extended register ComplexPatterns in AArch64InstrFormats.td
and create GISel equivalents.

Add some equivalent functions to the ones in AArch64ISelDAGToDAG:

- `selectArithExtendedRegister`
- `narrowExtendRegIfNeeded`
- `getExtendTypeForInst`

`getExtendTypeForInst` includes the checks for loads and stores. This will be
used for WRO addressing modes in loads + stores.

Teach selectCopy to properly handle subregister copies on the same bank in
order to support `narrowExtendRegIfNeeded`. The extended register must be a
GPR32, so we need to support same-bank subregister copies.

Fix a bug in getSubRegForClass which would cause registers on things like
GPR32common to end up getting ssub. Just change the check to look for FPR32
rather than GPR32.

For tests:

- Add select-arith-extended-reg.mir
- Update addsub_ext.ll to include GlobalISel checks

Differential Revision: https://reviews.llvm.org/D66835

llvm-svn: 370410
2019-08-29 21:53:58 +00:00
Reid Kleckner 5b79e603d3 [X86] Don't emit unreachable stack adjustments
Summary:
This is a minor improvement on our past attempts to do this. Fixes
PR43155.

Reviewers: hans

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66905

llvm-svn: 370409
2019-08-29 21:24:41 +00:00
Simon Pilgrim 3d705a1fa4 [X86][SSE] combinePMULDQ - pmuldq(x, 0) -> zero vector (PR43159)
ISD::isBuildVectorAllZeros permits undef elements to be present, which means we can't return it as a zero vector. PMULDQ/PMULUDQ is an extending multiply so a multiply by zero of the lower 32-bits should result in a zero 64-bit element.

llvm-svn: 370404
2019-08-29 20:22:08 +00:00
Matt Arsenault cbd1782c79 AMDGPU/GlobalISel: Legalize sin/cos
llvm-svn: 370402
2019-08-29 20:06:48 +00:00
Jordan Rupprecht f9f81289e6 Revert [MBP] Disable aggressive loop rotate in plain mode
This reverts r369664 (git commit 51f48295cb)

It causes many benchmark regressions, internally and in llvm's benchmark suite.

llvm-svn: 370398
2019-08-29 19:03:58 +00:00
Alina Sbirlea 4b87023bae Revert enabling MemorySSA.
Breaks sanitizers bots.

Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370397
2019-08-29 19:01:23 +00:00
Craig Topper 5a43fdd313 [X86] Remove what little support we had for MPX
-Deprecate -mmpx and -mno-mpx command line options
-Remove CPUID detection of mpx for -march=native
-Remove MPX from all CPUs
-Remove MPX preprocessor define

I've left the "mpx" string in the backend so we don't fail on old IR, but its not connected to anything.

gcc has also deprecated these command line options. https://www.phoronix.com/scan.php?page=news_item&px=GCC-Patch-To-Drop-MPX

Differential Revision: https://reviews.llvm.org/D66669

llvm-svn: 370393
2019-08-29 18:09:02 +00:00
Matt Arsenault caff0a88dd GlobalISel: Add known bits to InstructionSelector
AMDGPU uses this for some addressing mode selection patterns. The
analysis run itself doesn't do anything so it seems easier to just
always require this than adding a way to opt in.

llvm-svn: 370388
2019-08-29 17:24:32 +00:00
Alina Sbirlea 6289ee941d [MemorySSA & LoopPassManager] Enable MemorySSA as loop dependency. Update tests.
Summary:
I'm not planning to check this in at the moment, but feedback is very welcome, in particular how this affects performance.
The feedback obtains here will guide the next steps towards enabling this.

This patch enables the use of MemorySSA in the loop pass manager.

Passes that currently use MemorySSA:
 - EarlyCSE
Passes that use MemorySSA after this patch:
 - EarlyCSE
 - LICM
 - SimpleLoopUnswitch
Loop passes that update MemorySSA (and do not use it yet, but could use it after this patch):
 - LoopInstSimplify
 - LoopSimplifyCFG
 - LoopUnswitch
 - LoopRotate
 - LoopSimplify
 - LCSSA
Loop passes that do *not* update MemorySSA:
 - IndVarSimplify
 - LoopDelete
 - LoopIdiom
 - LoopSink
 - LoopUnroll
 - LoopInterchange
 - LoopUnrollAndJam
 - LoopVectorize
 - LoopReroll
 - IRCE

Reviewers: chandlerc, george.burgess.iv, davide, sanjoy, gberry

Subscribers: jlebar, Prazek, dmgreen, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370384
2019-08-29 17:08:13 +00:00
Jessica Paquette ba04f5fac1 [GlobalISel][AArch64] Select llvm.aarch64.stxr* intrinsics.
Add a GISelPredicateCode to the stxr_* PatFrags in AArch64InstrAtomics.td.

This allows us to select these intrinsics.

Differential Revision: https://reviews.llvm.org/D65779

llvm-svn: 370382
2019-08-29 16:55:55 +00:00
Jessica Paquette b8b23a1648 [GlobalISel][AArch64] Use a GISelPredicateCode to select llvm.aarch64.stlxr.*
Remove manual selection code for this intrinsic and use a GISelPredicateCode
instead.

This allows us to fully select this intrinsic without any tricky custom C++
matching.

Differential Revision: https://reviews.llvm.org/D65780

llvm-svn: 370380
2019-08-29 16:45:19 +00:00
Jessica Paquette 87720ac8c8 [AArch64][GlobalISel] Select @llvm.aarch64.ldxr.* intrinsics
Same thing as D66897, but for ldxr.* instead. Add a GISelPredicateCode to the
ldxr_* definitions, which allows us to import them.

Add select-ldxr-intrin.mir, and update arm64-ldxr-stxr.ll.

Differential Revision: https://reviews.llvm.org/D66898

llvm-svn: 370378
2019-08-29 16:33:01 +00:00
Jessica Paquette c327daeea5 [AArch64][GlobalISel] Select @llvm.aarch64.ldaxr.* intrinsics
Add a GISelPredicateCode to ldaxr_*. This allows us to import the patterns for
@llvm.aarch64.ldaxr.*, and thus select them.

Add `isLoadStoreOfNumBytes` for the GISelPredicateCode, since each of these
intrinsics involves the same check.

Add select-ldaxr-intrin.mir, and update arm64-ldxr-stxr.ll.

Differential Revision: https://reviews.llvm.org/D66897

llvm-svn: 370377
2019-08-29 16:16:38 +00:00
Jinsong Ji 8b0317ad7d [PowerPC][NFC] Update fp-int-conversions-direct-moves.ll using script
Also add -ppc-asm-full-reg-names,-ppc-vsr-nums-as-vr.

llvm-svn: 370375
2019-08-29 15:38:02 +00:00
Luis Marques cf3b39391e [RISCV] Fix callee-saved-gprs.ll test ABIs
llvm-svn: 370359
2019-08-29 14:05:59 +00:00
Jeremy Morse ca0e4b3689 [DebugInfo] LiveDebugValues: correctly discriminate kinds of variable locations
The missing line added by this patch ensures that only spilt variable
locations are candidates for being restored from the stack. Otherwise,
register or constant-value information can be interpreted as a spill
location, through a union.

The added regression test replicates a scenario where this occurs: the
stack load from [rsp] causes the register-location DBG_VALUE to be
"restored" to rsi, when it should be left alone. See PR43058 for details.

Un x-fail a test that was suffering from this from a previous patch.

Differential Revision: https://reviews.llvm.org/D66895

llvm-svn: 370334
2019-08-29 11:20:54 +00:00
David Green 942c2e3795 [ARM] MVE Masked loads and stores
Masked loads and store fit naturally with MVE, the instructions being easily
predicated. This adds lowering for the simple cases of masked loads and stores.
It does not yet deal with widening/narrowing or pre/post inc.

The llvm masked load intrinsic will accept a "passthru" value, dictating the
values used for the zero masked lanes. In MVE the instructions write 0 to the
zero predicated lanes, so we need to match a passthru that isn't 0 (or undef)
with a select instruction to pull in the correct data after the load.

We also need to do something with unaligned loads/stores. Currently this uses a
similar method used in big endian, using an VLDRB.8 (and potentially a VREV in
BE). This does mean that the predicate mask is converted from, for example, a
v4i1 to a v16i1. The VLDR instructions are defined as using the first bit of
the relevant mask lane, so this could potentially load different results if the
predicate is little odd. As the input is a v4i1 however, I believe this is OK
and all the bits required should be set in the predicate, making the VLDRB.8
load the same data.

Differential Revision: https://reviews.llvm.org/D66534

llvm-svn: 370329
2019-08-29 10:54:35 +00:00
Jeremy Morse 313d2ce999 [DebugInfo] LiveDebugValues should always revisit backedges if it skips them
The "join" method in LiveDebugValues does not attempt to join unseen
predecessor blocks if their out-locations aren't yet initialized, instead
the block should be re-visited later to see if any locations have changed
validity. However, because the set of blocks were all being "process"'d
once before "join" saw them, that logic in "join" was actually ignoring
legitimate out-locations on the first pass through. This meant that some
invalidated locations were not removed from the head of loops, allowing
illegal locations to persist.

Fix this by removing the run of "process" before the main join/process loop
in ExtendRanges. Now the unseen predecessors that "join" skips truly are
uninitialized, and we come back to the block at a later time to re-run
"join", see the @baz function added.

This also fixes another fault where stack/register transfers in the entry
block (or any other before-any-loop-block) had their tranfers initially
ignored, and were then never revisited. The MIR test added tests for this
behaviour.

XFail a test that exposes another bug; a fix for this is coming in D66895.

Differential Revision: https://reviews.llvm.org/D66663

llvm-svn: 370328
2019-08-29 10:53:29 +00:00
Roman Lebedev cc7495a355 [X86][CodeGen][NFC] Delay `combineIncDecVector()` from DAGCombine to X86DAGToDAGISel
Summary:
We were previously doing it in DAGCombine.
But we also want to do `sub %x, C` -> `add %x, (sub 0, C)` for vectors in DAGCombine.
So if we had `sub %x, -1`, we'll transform it to `add %x, 1`,
which `combineIncDecVector()` will immediately transform back into `sub %x, -1`,
and here we go again...

I've marked this as NFC since not a single test changes,
but since that 'changes' DAGCombine, probably this isn't fully NFC.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62327

llvm-svn: 370327
2019-08-29 10:50:09 +00:00
Amaury Sechet 8365e42010 [DAGCombiner] (insert_vector_elt (vector_shuffle X, Y), (extract_vector_elt X, N), IdxC) -> (vector_shuffle X, Y)
Summary: This is beneficial when the shuffle is only used once and end up being generated in a few places when some node is combined into a shuffle.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66718

llvm-svn: 370326
2019-08-29 10:35:51 +00:00
David Green e9211b764c [ARM] Masked load and store and predicate tests. NFC
llvm-svn: 370325
2019-08-29 10:32:12 +00:00
Craig Topper 1ec5c204b8 [X86] Add a DAG combine to combine INSERTPS and VBROADCAST of a scalar load. Remove corresponding isel patterns.
We had an isel pattern to perform this, but its better to
do it in DAG combine as a simplification. This also fixes the lack
of patterns for AVX512 targets.

llvm-svn: 370294
2019-08-29 05:48:48 +00:00
Craig Topper 1aadf6f39f [X86] Make inline assembly 'x' and 'v' constraints work for f128.
Including a type legalizer fix to make bitcast operand promotion
work correctly when getSoftenedFloat returns f128 instead of i128.

Fixes PR43157

llvm-svn: 370293
2019-08-29 05:13:56 +00:00
Matt Arsenault 216d8ff60b AMDGPU: Don't use frame virtual registers
SGPR spills aren't really handled after SILowerSGPRSpills. In order to
directly control what happens if the scavenger needs to spill, the
scavenger needs to be used directly. There is an alternative to
spilling in these contexts anyway since the frame register can be
increment and restored.

This does present another possible issue if spilling is needed for the
unused carry out if an add is needed. I think this can be avoided by
using a scalar add (although that clobbers SCC, which happens anyway).

llvm-svn: 370281
2019-08-29 01:13:47 +00:00
Matt Arsenault 8ec5c10042 GlobalISel/TableGen: Handle setcc patterns
This is a special case because one node maps to two different G_
instructions, and the operand order is changed.

This mostly enables G_FCMP for AMDPGPU. G_ICMP is still manually
selected for now since it has the SALU and VALU complication to deal
with.

llvm-svn: 370280
2019-08-29 01:13:41 +00:00
Richard Trieu e4a7f0182d Add requirement to test.
-debug-only option for llc is only available in debug builds so
"REQUIRES: asserts" is needed in the tes.

llvm-svn: 370279
2019-08-29 00:46:57 +00:00
Shiva Chen b39876d8cd [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall
The patch fixed the issue that RV64 didn't clear the upper bits
when return complex floating value with lp64 ABI.

float _Complex
complex_add(float _Complex a, float _Complex b)
{
   return a + b;
}

RealResult = zero_extend(RealA + RealB)
ImageResult = ImageA + ImageB
Return (RealResult | (ImageResult << 32))

The patch introduces shouldExtendTypeInLibCall target hook to suppress
the AssertZext generation when lowering floating LibCall.

Thanks to Eli's comments from the Bugzilla
https://bugs.llvm.org/show_bug.cgi?id=42820

Differential Revision: https://reviews.llvm.org/D65497

llvm-svn: 370275
2019-08-28 23:40:37 +00:00
Heejin Ahn d85fd5a3f4 [WebAssembly] Add atomic.fence instruction
Summary:
This adds `atomic.fence` instruction:
https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md#fence-operator

And we now emit the new `atomic.fence` instruction for multithread
fences, rather than the prevous `atomic.rmw` hack.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, tlively, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66794

llvm-svn: 370272
2019-08-28 23:13:43 +00:00
Simon Atanasyan 59bb3609fa [mips] Fix 64-bit address loading in case of applying 32-bit mask to the result
If result of 64-bit address loading combines with 32-bit mask, LLVM
tries to optimize the code and remove "redundant" loading of upper
32-bits of the address. It leads to incorrect code on MIPS64 targets.

MIPS backend creates the following chain of commands to load 64-bit
address in the `MipsTargetLowering::getAddrNonPICSym64` method:
```
(add (shl (add (shl (add %highest(sym), %higher(sym)),
                    16),
               %hi(sym)),
          16),
     %lo(%sym))
```

If the mask presents, LLVM decides to optimize the chain of commands. It
really does not make sense to load upper 32-bits because the 0x0fffffff
mask anyway clears them. After removing redundant commands we get this
chain:
```
(add (shl (%hi(sym), 16), %lo(%sym))
```

There is no patterns matched `(MipsHi (i64 symbol))`. Due a bug in `SYM_32`
predicate definition, backend incorrectly selects a pattern for a 32-bit
symbols and uses the `lui` instruction for loading `%hi(sym)`.

As a result we get incorrect set of instructions with unnecessary 16-bit
left shifting:
```
lui     at,0x0
    R_MIPS_HI16     foo
dsll    at,at,0x10
daddiu  at,at,0
    R_MIPS_LO16     foo
```

This patch resolves two problems:
- Fix `SYM_32/SYM_64` predicates to prevent selection of patterns dedicated
  to 32-bit symbols in case of using N64 ABI.
- Add missed patterns for 64-bit symbols for `%hi/%lo`.

Fix PR42736.

Differential Revision: https://reviews.llvm.org/D66228

llvm-svn: 370268
2019-08-28 22:32:10 +00:00
Philip Reames 0b62951e1d Use the handle --check-prefixes mechanism to de-verbosify a couple atomics tests [NFC]
llvm-svn: 370256
2019-08-28 20:27:39 +00:00
Jessica Paquette 7080ffa21a [GlobalISel] Import patterns containing SUBREG_TO_REG
Reuse the logic for INSERT_SUBREG to also import SUBREG_TO_REG patterns.

- Split `inferSuperRegisterClass` into two functions, one which tries to use
  an existing TreePatternNode (`inferSuperRegisterClassForNode`), and one that
  doesn't. SUBREG_TO_REG doesn't have a node to leverage, which is the cause
  for the split.

- Rename GlobalISelEmitterInsertSubreg.td to GlobalISelEmitterSubreg.td and
  update it.

- Update impacted tests in the AArch64 and X86 backends.

This is kind of a hit/miss for code size improvements/regressions. E.g. in
add-ext.ll, we now get some identity copies. This isn't really anything the
importer can handle, since it's caused by a later pass introducing the copy for
the sake of correctness.

Differential Revision: https://reviews.llvm.org/D66769

llvm-svn: 370254
2019-08-28 20:12:31 +00:00
Kevin P. Neal ddf13c00ed [FPEnv] Add fptosi and fptoui constrained intrinsics.
This implements constrained floating point intrinsics for FP to signed and
unsigned integers.

Quoting from D32319:
The purpose of the constrained intrinsics is to force the optimizer to
respect the restrictions that will be necessary to support things like the
STDC FENV_ACCESS ON pragma without interfering with optimizations when
these restrictions are not needed.

Reviewed by:	Andrew Kaylor, Craig Topper, Hal Finkel, Cameron McInally, Roman Lebedev, Kit Barton
Approved by:	Craig Topper
Differential Revision:	http://reviews.llvm.org/D63782

llvm-svn: 370228
2019-08-28 16:33:36 +00:00
Jessica Paquette af0bd41e06 [AArch64][GlobalISel] Fall back when translating musttail calls
These are currently translated as normal functions calls in AArch64.

Until we have proper tail call lowering, we shouldn't translate these.

Differential Revision: https://reviews.llvm.org/D66842

llvm-svn: 370225
2019-08-28 16:19:01 +00:00
Ryan Taylor 3b1459ed7c [AMDGPU] Adjust number of SGPRs available in Calling Convention
This reduces the number of SGPRs due to some concerns about running
out of SGPRs if you make all the SGPRs that aren't reserved available
for the calling convention.

Change-Id: Idb4ca4dc72f5b6808cb524ff7270915a8de5b4c1
llvm-svn: 370215
2019-08-28 15:00:45 +00:00
Hans Wennborg cff90f07cb [SelectionDAG] Don't generate libcalls for wide shifts on Windows (PR42711)
Neither libgcc or compiler-rt are usually used on Windows, so these
functions can't be called.

Differential revision: https://reviews.llvm.org/D66880

llvm-svn: 370204
2019-08-28 13:55:10 +00:00
Amaury Sechet 3b44c36b29 [X86] Add test for rotate combining when add X, X is used instead of shl X, 1. NFC
llvm-svn: 370203
2019-08-28 13:52:32 +00:00
Simon Atanasyan f46ba4f077 [mips] Use less registers to load address of TargetExternalSymbol
There is no pattern matched `add hi, (MipsLo texternalsym)`. As a result,
loading an address of 32-bit symbol requires two registers and one more
additional instruction:
```
addiu $1, $zero, %lo(foo)
lui   $2, %hi(foo)
addu  $25, $2, $1
```

This patch adds the missed pattern and enables generation more effective
set of instructions:
```
lui   $1, %hi(foo)
addiu $25, $1, %lo(foo)
```

Differential Revision: https://reviews.llvm.org/D66771

llvm-svn: 370196
2019-08-28 12:35:53 +00:00
David Green 1c5b143c99 [MVE] VMOVX patterns
This adds fp16 VMOVX patterns, using the same patterns as rL362482 with some
adjustments for MVE. It allows us to move fp16 registers without going into and
out of gprs.

VMOVX is able to move the top bits from a fp16 in a fp reg into the bottom bits
of another register, zeroing the rest. This can be used for odd MVE register
lanes. The top bits are not read by fp16 instructions, so no move is required
there if we are dealing with even lanes.

Differential revision: https://reviews.llvm.org/D66793

llvm-svn: 370184
2019-08-28 10:13:23 +00:00
Sam Parker a761ba0f2d [ARM][ParallelDSP] Change search for muls
rL369567 reverted a couple of recent changes made to ARMParallelDSP
because of a miscompilation error: PR43073.

The issue stemmed from an underlying bug that was caused by adding
muls into a reduction before it was proved that they could be executed
in parallel with another mul.

Most of the changes here are from the previously reverted commits.
The additional changes have been made area:
1) The Search function now doesn't insert any muls into the Reduction
   object. That now happens once the search has successfully finished.
2) For any muls added into the reduction but that weren't paired, we
   accumulate their values as an input into the smlad.

Differential Revision: https://reviews.llvm.org/D66660

llvm-svn: 370171
2019-08-28 08:51:13 +00:00
Matt Arsenault a8bbcbd006 AMDGPU/GlobalISel: Fix constraining scalar and/or/xor
If the result register already had a register class assigned, the
sources may not have been properly constrained.

llvm-svn: 370150
2019-08-28 02:11:03 +00:00
Matt Arsenault 5c7e96dc26 AMDGPU/GlobalISel: Implement addrspacecast for 32-bit constant addrspace
llvm-svn: 370140
2019-08-28 00:58:24 +00:00
Amara Emerson e20b91c265 [GlobalISel] Replace hard coded dynamic alloca handling with G_DYN_STACKALLOC.
This change moves the actual stack pointer manipulation into the legalizer,
available to targets via lower(). The codegen is slightly different because
we're using explicit masks instead of G_PTRMASK, and using G_SUB rather than
adding a negative amount via G_GEP.

Differential Revision: https://reviews.llvm.org/D66678

llvm-svn: 370104
2019-08-27 19:54:27 +00:00
Matt Arsenault 2910184936 DAG: computeNumSignBits for MUL
Copied directly from the IR version.

Most of the testcases I've added for this are somewhat problematic
because they really end up testing the yet to be implemented version
for MUL_I24/MUL_U24.

llvm-svn: 370099
2019-08-27 19:05:33 +00:00
Matt Arsenault 2797474dbb AMDGPU: Add baseline test for num sign bits of mul
llvm-svn: 370098
2019-08-27 19:01:02 +00:00
Matt Arsenault ff07631b48 AMDGPU: Add amdgpu-32bit-address-high-bits to MIR serialization
llvm-svn: 370089
2019-08-27 18:18:38 +00:00
Matt Arsenault 0c096da02f AMDGPU: Fix crash from inconsistent register types for v3i16/v3f16
This is something of a workaround since computeRegisterProperties
seems to be doing the wrong thing.

llvm-svn: 370086
2019-08-27 17:51:56 +00:00
Jessica Paquette a2ea8a1eca Recommit "[GlobalISel] Import patterns containing INSERT_SUBREG"
I thought `llvm::sort` was stable for some reason but it's not.

Use `llvm::stable_sort` in `CodeGenTarget::getSuperRegForSubReg`.

Original patch: https://reviews.llvm.org/D66498

llvm-svn: 370084
2019-08-27 17:47:06 +00:00
Jessica Paquette 3d9b39b733 Revert "[GlobalISel] Import patterns containing INSERT_SUBREG"
When EXPENSIVE_CHECKS are enabled, GlobalISelEmitterSubreg.td doesn't get
stable output.

Reverting while I debug it.

See: https://reviews.llvm.org/D66498
llvm-svn: 370080
2019-08-27 17:26:44 +00:00
Sanjay Patel b516f1afdd [DAGCombiner] cancel fnegs from multiplied operands of FMA
(-X) * (-Y) + Z --> X * Y + Z

This is a missing optimization that shows up as a potential regression in D66050,
so we should solve it first. We appear to be partly missing this fold in IR as well.

We do handle the simpler case already:
(-X) * (-Y) --> X * Y

And it might be beneficial to make the constraint less conservative (eg, if both
operands are cheap, but not necessarily cheaper), but that causes infinite looping
for the existing fmul transform.

Differential Revision: https://reviews.llvm.org/D66755

llvm-svn: 370071
2019-08-27 15:17:46 +00:00
Jason Liu fc056950aa Handle local commons for XCOFF object file writing
Summary:
Adds support for emitting common local global symbols to an XCOFF object file.
Local commons are emitted into the .bss section with a storage class of
C_HIDEXT.

Patch by: daltenty

Reviewers: sfertile, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D66097

llvm-svn: 370070
2019-08-27 15:14:45 +00:00
Jinsong Ji 7f536bcf22 Revert "[CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks"
This reverts commit b3d258fc44.

@skatkov is reporting crash in D63972#1646303
Contacted @ZhangKang, and revert the commit on behalf of him.

llvm-svn: 370069
2019-08-27 14:59:08 +00:00
Petar Avramovic 4a2a653288 [MIPS GlobalISel] ClampScalar G_SHL, G_ASHR and G_LSHR
ClampScalar G_SHL, G_ASHR and G_LSHR to s32 for MIPS32.

Differential Revision: https://reviews.llvm.org/D66533

llvm-svn: 370067
2019-08-27 14:41:44 +00:00
Petar Avramovic d568ed40e0 [GlobalISel] Fix narrowScalar for shifts to match algorithm from SDAG
Fix typos. Use Hi and Lo prefixes for Or instead of LHS and RHS
to match names of surrounding variables.

Differential Revision: https://reviews.llvm.org/D66587

llvm-svn: 370062
2019-08-27 14:22:32 +00:00
Simon Pilgrim 8912e2af39 [X86][AVX] Add SimplifyDemandedVectorElts support for KSHIFTL/KSHIFTR
Differential Revision: https://reviews.llvm.org/D66527

llvm-svn: 370055
2019-08-27 13:13:17 +00:00
Tim Northover a7f226f9db AArch64: avoid creating cycle in DAG for post-increment NEON ops.
Inserting a value into Visited has the effect of terminating a search for
predecessors if that node is seen. This is legitimate for the base address, and
acts as a slight performance optimization, but the vector-building node can be
paert of a legitimate cycle so we shouldn't stop searching there.

PR43056.

llvm-svn: 370036
2019-08-27 10:21:11 +00:00
Pengfei Wang 564fb58a32 [WinEH] Allocate space in funclets stack to save XMM CSRs
Summary:
This is an alternate approach to D63396

Currently funclets reuse the same stack slots that are used in the
parent function for saving callee-saved xmm registers. If the parent
function modifies a callee-saved xmm register before an excpetion is
thrown, the catch handler will overwrite the original saved value.

This patch allocates space in funclets stack for saving callee-saved xmm
registers and uses RSP instead RBP to access memory.

Signed-off-by: Pengfei Wang <pengfei.wang@intel.com>

Reviewers: rnk, RKSimon, craig.topper, annita.zhang, LuoYuanke, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66596

Signed-off-by: Pengfei Wang <pengfei.wang@intel.com>
llvm-svn: 370005
2019-08-27 01:53:24 +00:00
Matt Arsenault 0a6564980b AMDGPU: Combine directly on mul24 intrinsics
The problem these are supposed to work around can occur before the
intrinsics are lowered into the nodes. Try to directly simplify them
so they are matched before the bit assert operations can be optimized
out.

llvm-svn: 369994
2019-08-27 00:18:09 +00:00
Matt Arsenault 3b95986a32 AMDGPU: Run AMDGPUCodeGenPrepare after scalar opts
The mul24 matching could interfere with SLSR and the other addressing
mode related passes. This probably is not the optimal placement, but
is an intermediate step. This should probably be moved after all the
generic IR passes, particularly LSR. Moving this after LSR seems to
help in some cases, and hurts others.

As-is in this patch, in idiv-licm, it saves 1-2 instructions inside
some of the loop bodies, but increases the number in others. Moving
this later helps these loops. In the new lsr tests in
mul24-pass-ordering, the intrinsic prevents introducing more
instructions in the loop preheader, so moving this later ends up
hurting them. This shouldn't be any worse than before the intrinsics
were introduced in r366094, and LSR should probably be smarter. I
think it's because it doesn't know the and inside the loop will be
folded away.

llvm-svn: 369991
2019-08-27 00:08:31 +00:00
Craig Topper 6db7f492d9 [X86] Delay combineIncDecVector until after op legalization.
Probably better to keep add over sub in early DAG combines.

It might make sense to push this to lowering or delay it all
the way to isel. But this was the simplest change.

llvm-svn: 369981
2019-08-26 22:17:54 +00:00
Heejin Ahn 173a3a54bb [WebAssembly] Fix SSA rebuilding in SjLj transformation
Summary:
Previously we skipped uses within the same BB as a def when rebuilding
SSA after SjLj transformation. For example, before transformation,
```
for.cond:
  %0 = phi i32 [ %var, %for.inc ] ...
  %var = ...
  br label %for.inc

for.inc:                               ; preds = %for.cond
  call i32 @setjmp(...)
  br %for.cond
```

In this BB, %var should be defined in all paths from %for.inc to make %0
valid. In the input it was true; %for.inc's only predecessor was
%for.cond. But after SjLj transformation, it is possible that %for.inc
has other predecessors that are reachable without reaching %for.cond.
```
entry.split:
  ...
  br i1 %a, label %bb.1, label %for.inc

for.cond:
  %0 = phi i32 [ %var, %for.inc ] ...  ; Not valid!
  %var = ...
  br label %for.inc

for.inc:                               ; preds = %for.cond, %entry.split
  call i32 @setjmp(...)
  ...
  br %for.cond
```

In this case, we can't use %var in the `phi` instruction in %for.cond,
because %var is not defined in all paths through %for.inc (If the
control flow is %entry -> %entry.split -> %for.inc -> %for.cond, %var
has not been defined until we reach the `phi`). But the previous code
excluded users within the same BB, skipping instructions within the same
BB so they are not rewritten properly. User instructions within the same
BB also should be candidates for rewriting if they are _before_ the
original definition.

Fixes PR43097.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66729

llvm-svn: 369978
2019-08-26 21:51:35 +00:00
Heejin Ahn 1266191d6f [WebAssembly] Combine emscripten SjLj tests
Summary:
Combine a test in lower-em-sjlj-longjmp-only.ll into lower-em-sjlj.ll,
because the test command is the same and I don't see any reason it
should be a separate file. Also converted tabs into spaces and fixed
indentations in lower-em-sjlj-sret.ll. (lower-em-sjlj.ll uses a
different test command (llc), so it couldn't be combined)

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66728

llvm-svn: 369974
2019-08-26 21:41:17 +00:00
Jessica Paquette 69400f867d [GlobalISel] Import patterns containing INSERT_SUBREG
This teaches the importer to handle INSERT_SUBREG instructions.

We were missing patterns involving INSERT_SUBREG in AArch64. It appears in
AArch64InstrInfo.td 107 times, and 14 times in AArch64InstrFormats.td.

To meaningfully import it, the GlobalISelEmitter needs to know how to infer a
super register class for a given register class.

This patch introduces the following:

- `getSuperRegForSubReg`, a function which finds the largest register class
which supports a value type and subregister index

- `inferSuperRegisterClass`, a function which finds the appropriate super
register class for an INSERT_SUBREG'

- `inferRegClassFromPattern`, a function which allows for some trivial
lookthrough into instructions

- `getRegClassFromLeaf`, a helper function which returns the register class for
a leaf `TreePatternNode`

- Support for subregister index operands in `importExplicitUseRenderer`

It also

- Updates tests in each backend which are impacted by the change

- Adds GlobalISelEmitterSubreg.td to test that we import and skip the expected
patterns

As a result of this patch, INSERT_SUBREG patterns in X86 may use the
LOW32_ADDR_ACCESS_RBP register class instead of GR32. This is correct, since the
register class contains the same registers as GR32 (except with the addition of
RBP). So, this also teaches X86 to handle that register class. This is in line
with X86ISelLowering, which treats this as a GR class.

Differential Revision: https://reviews.llvm.org/D66498

llvm-svn: 369973
2019-08-26 21:38:57 +00:00
Krzysztof Parzyszek 9e0feaf562 [Hexagon] Improve generated code for test-if-bit-clear
llvm-svn: 369947
2019-08-26 19:08:08 +00:00
Craig Topper 36d1588f01 [X86] Add a hack to combinePMULDQ to manually turn SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG inputs into an ANY_EXTEND_VECTOR_INREG style shuffle
ANY_EXTEND_VECTOR_INREG isn't currently marked Legal which prevents SimplifyDemandedBits from turning SIGN/ZERO_EXTEND_VECTOR_INREG into it after op legalization. And even if we did make it Legal, combineExtInVec doesn't do shuffle combining on the VECTOR_INREG nodes until AVX1.

This patch adds a quick hack to combinePMULDQ to directly emit a vector shuffle corresponding to an ANY_EXTEND_VECTOR_INREG operation. This avoids both of those issues without creating any other regressions on our tests. The xop-ifma.ll change here also showed up when I tried to resurrect D56306 and seemed to be the only improvement that patch creates now. This is a more direct way to get the benefit.

Differential Revision: https://reviews.llvm.org/D66436

llvm-svn: 369942
2019-08-26 18:23:26 +00:00
Craig Topper 846429de74 [DAGCombiner][X86] Teach SimplifyVBinOp to fold VBinOp (concat X, undef/constant), (concat Y, undef/constant) -> concat (VBinOp X, Y), VecC
This improves the combine I included in D66504 to handle constants in the upper operands of the concat. If we can constant fold them away we can pull the concat after the bin op. This helps with chains of madd reductions on X86 from loop unrolling. The loop madd reduction pattern creates pmaddwd with half the width of the add that follows it using zeroes to fill the upper bits. If we have two of these added together we can pull the zeroes through the accumulating add and then shrink it.

Differential Revision: https://reviews.llvm.org/D66680

llvm-svn: 369937
2019-08-26 17:59:11 +00:00
Sanjay Patel 442a5765ce [PowerPC] add tests for fma with negated ops; NFC
llvm-svn: 369923
2019-08-26 16:20:09 +00:00
Amaury Sechet 298c0b352d [X86] Automatically generate various tests. NFC
llvm-svn: 369909
2019-08-26 13:53:29 +00:00
Zi Xuan Wu e18aa1e0a2 [NFC][Regalloc] Add testcases for D66576
llvm-svn: 369877
2019-08-26 05:06:30 +00:00
Amaury Sechet 1ec3ad9ed8 [X86] Automatically generate stack folding tests. NFC
llvm-svn: 369876
2019-08-25 20:48:14 +00:00
Sanjay Patel 7bd08fbae9 [Hexagon] remove noise from tests; NFC
llvm-svn: 369875
2019-08-25 18:34:07 +00:00
Sanjay Patel b882c973ec [Hexagon][x86] add tests for bit-test; NFC
More coverage for D66687
(assuming we make this a generic combine with TLI hook).

llvm-svn: 369874
2019-08-25 18:25:22 +00:00
Amaury Sechet 1475fad1d0 [X86] Add test case for inserting/extracting from two shuffled vectors. NFC
llvm-svn: 369871
2019-08-25 15:49:29 +00:00
Amaury Sechet 6075f6cc5c [X86] Add test case for inserting/extracting from shuffled vectors. NFC
llvm-svn: 369870
2019-08-25 15:19:20 +00:00
Xing Xue ef039a3ccd [PowerPC][AIX] Adds support for writing the .data section in assembly files
Summary:
Adds support for generating the .data section in assembly files for global variables with a non-zero initialization. The support for writing the .data section in XCOFF object files will be added in a follow-on patch. Any relocations are not included in this patch.

Reviewers: hubert.reinterpretcast, sfertile, jasonliu, daltenty, Xiangling_L

Reviewed by: hubert.reinterpretcast

Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, wuzish, shchenz, DiggerLin, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66154

llvm-svn: 369869
2019-08-25 15:17:25 +00:00
Nikita Popov aa71c977ba [SDAG] Fold umul_lohi with 0 or 1 multiplicand
These can turn up during multiplication legalization. In principle
these should also apply to smul_lohi, but I wasn't able to figure
out how to produce those with the necessary operands.

Differential Revision: https://reviews.llvm.org/D66380

llvm-svn: 369864
2019-08-25 08:04:22 +00:00
Craig Topper 1abe162a9a [X86] Teach -Os immediate sharing code to not count constant uses that will become INC/DEC.
INC/DEC don't use an immediate so we don't need to count it. We
also shouldn't use the custom isel for it.

Fixes PR42998.

llvm-svn: 369863
2019-08-25 05:22:40 +00:00
Craig Topper 6e2776c9c4 [X86] Add test cases for PR42998. NFC
llvm-svn: 369862
2019-08-25 05:22:36 +00:00
Craig Topper cc4b0596b1 [X86] Add isel patterns to match vpdpwssd avx512vnni instruction from add+pmaddwd nodes.
llvm-svn: 369859
2019-08-24 23:14:57 +00:00
Matt Arsenault 74115ef791 AMDGPU: Add baseline test for mul24 ordering issues
llvm-svn: 369858
2019-08-24 22:22:38 +00:00
Matt Arsenault b3dd381a73 AMDGPU: Introduce a flag to disable mul24 intrinsic formation
llvm-svn: 369856
2019-08-24 22:14:41 +00:00
Matt Arsenault c4dd1d1873 AMDGPU: Generate check lines
Checking all the instructions will help catch LICM changes when passes
are reordered. Also switch to using gfx9 since global stores make the
relevant instructions more obvious.

llvm-svn: 369855
2019-08-24 22:14:37 +00:00
Stanislav Mekhanoshin 8fe1245a0f [AMDGPU] w/a for gfx908 mfma SrcC literal HW bug
gfx908 ignores an mfma if SrcC is a literal.

Differential Revision: https://reviews.llvm.org/D66670

llvm-svn: 369816
2019-08-23 22:09:58 +00:00
Sanjay Patel a3b831aec3 [x86] add tests for bt/test; NFC
llvm-svn: 369812
2019-08-23 21:15:27 +00:00
Jessica Paquette 83fe56b3b9 [AArch64][GlobalISel] Import XRO load/store patterns instead of custom selection
Instead of using custom C++ in `earlySelect` for loads and stores, just import
the patterns.

Remove `earlySelectLoad`, since we can just import the work it's doing.

Some minor changes to how `ComplexRendererFns` are returned for the XRO
addressing modes. If you add immediates in two steps, sometimes they are not
imported properly and you only end up with one immediate. I'm not sure if this
is intentional.

- Update load-addressing-modes.mir to include the instructions we can now
  import.

- Add a similar test, store-addressing-modes.mir to show which store opcodes we
  currently import, and show that we can pull in shifts etc.

- Update arm64-fastisel-gep-promote-before-add.ll to use FastISel instead of
  GISel. This test failed with GISel because GISel folds the gep into the load.
  The test checks that FastISel doesn't fold non-pointer-width adds into loads.
  GISel on the other hand, produces a G_CONSTANT of -128 for the add, and then
  a G_GEP, which must be pointer-width.

Note that we don't get STRBRoX right now. It seems like the importer can't
handle `FPR8Op:{ *:[Untyped] }:$Rt` source operands. So, those are not currently
supported.

Differential Revision: https://reviews.llvm.org/D66679

llvm-svn: 369806
2019-08-23 20:31:34 +00:00
Volkan Keles 277631e3b8 [GlobalISel] Legalizer: Retry combining illegal artifacts as long as there new artifacts
Summary:
Currently, Legalizer aborts if it’s unable to legalize artifacts. However, it’s
possible to combine them after processing the rest of the instruction because
the legalization is likely to generate more artifacts that allow ArtifactCombiner
to combine away them.

Instead, move illegal artifacts to another list called RetryList and wait until all of the
instruction in InstList are legalized. After that, check if there is any new artifacts and
try to combine them again if that’s the case. If not, abort. The idea is similar to D59339,
but the approach is a bit different.

This patch fixes the issue described above, but the legalizer still may be unable to handle
some cases depending on when to legalize artifacts. So, in the long run, we probably need
a different legalization strategy that handles this dependency in a better way.

Reviewers: dsanders, aditya_nandakumar, qcolombet, arsenm, aemerson, paquette

Reviewed By: dsanders

Subscribers: jvesely, wdng, nhaehnle, rovka, javed.absar, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65894

llvm-svn: 369805
2019-08-23 20:30:35 +00:00
Roland Froese b4051e57b1 [PowerPC] Expand v1i128 smin
The smin opcode and friends for v1i128 are incorrectly marked as legal for PPC.
Change them to expand.

Differential Revision: https://reviews.llvm.org/D64960

llvm-svn: 369797
2019-08-23 19:04:47 +00:00
Amaury Sechet 1fd2e69e28 [X86] Automatically generate load-local-v3i1.ll . NFC
llvm-svn: 369793
2019-08-23 18:12:33 +00:00
Craig Topper bccd183217 [X86] Mark VPDPWSSD and VPDPWSSDS as commutable. Add stack folding tests.
llvm-svn: 369792
2019-08-23 18:05:37 +00:00
Amaury Sechet 05f56a1ddd [AMDGPU] Automatically generate various tests. NFC
llvm-svn: 369787
2019-08-23 17:58:49 +00:00
Amaury Sechet 57ae79d7a2 [PowerPC] Automatically generate various tests. NFC
llvm-svn: 369754
2019-08-23 13:30:45 +00:00
Simon Pilgrim 04906ef1f2 [DAGCombine] GetNegatedExpression - add FMA\FMAD support
If the accumulator and either of the multiply operands are negatable then we can we negate the entire expression.

Differential Revision: https://reviews.llvm.org/D63141

llvm-svn: 369746
2019-08-23 10:49:46 +00:00
Jay Foad eac23862a8 [AMDGPU] gfx10 atomic optimizer changes.
Summary:
Add support for gfx10, where all DPP operations are confined to work
within a single row of 16 lanes, and wave32.

Reviewers: arsenm, sheredom, critson, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, dstuttard, tpr, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65644

llvm-svn: 369745
2019-08-23 10:07:43 +00:00
Craig Topper 85a968e9d5 [X86] Add a further unrolled madd reduction test case that shows several deficiencies.
The AVX2 check lines show two issues. An ADD that became an OR
because we knew the input was disjoint, but really it was zero
so we should have just removed the ADD/OR all together.

Relatedly we use 128-bit VPMADDWD instructions followed by
256-bit VPADDD operations. We should be able to narrow these
VPADDDs.

llvm-svn: 369736
2019-08-23 07:38:25 +00:00
Craig Topper bdceb9fb14 [X86] Improve lowering of v2i32 SAD handling in combineLoopSADPattern.
For v2i32 we only feed 2 i8 elements into the psadbw instructions
with 0s in the other 14 bytes. The resulting psadbw instruction
will produce zeros in bits [127:16] of the output. We need to take
the result and feed it to a v2i32 add where the first element
includes bits [15:0] of the sad result. The other element should
be zero.

Prior to this patch we were using a truncate to take 0 from
bits 95:64 of the psadbw. This results in a pshufd to move those
bits to 63:32. But since we also have zeroes in bits 63:32 of
the psadbw output, we should just take those bits.

The previous code probably worked better with promoting legalization,
but now we use widening legalization. I've preserved the old
behavior if -x86-experimental-vector-widening-legalization=false
until we get that option removed.

llvm-svn: 369733
2019-08-23 05:33:27 +00:00
Amaury Sechet 83f5333491 [ARM] Automatically generate dsp-mlal.ll . NFC
llvm-svn: 369718
2019-08-22 23:43:48 +00:00
Amaury Sechet 0ddb0e9fcb [PowerPC] Automatically generate vec_buildvector_loadstore.ll . NFC
llvm-svn: 369703
2019-08-22 20:42:50 +00:00
Amaury Sechet cac5274b20 [PowerPC] Automatically generate various tests. NFC
llvm-svn: 369700
2019-08-22 20:26:56 +00:00
Amaury Sechet e5d6f07e9d [AArch64] autogenerate some tests. NFC
llvm-svn: 369685
2019-08-22 18:53:41 +00:00
Matt Arsenault fba82858f2 GlobalISel: Don't create G_UADDE with constant false carry in
The x86 tests are now broken (in paticular add-scalar.ll now hits the
DAG fallback) due to not handling G_UADDO. The DAG x86 backend has a
custom lowering for this, so that will need to be implemented.

llvm-svn: 369673
2019-08-22 17:29:17 +00:00
Guozhi Wei 51f48295cb [MBP] Disable aggressive loop rotate in plain mode
Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse.

To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true.

Differential Revision: https://reviews.llvm.org/D65673

llvm-svn: 369664
2019-08-22 16:21:32 +00:00
Amaury Sechet 95cf66de7c [DAGCombiner] Remove explicit call to AddToWorklist in sqrt and reciprocal computations
Summary: These nodes end up being processed regardless due to DAGCombiner ensuring arguments are processed. This changes the order in which nodes are processed, which fixes an issue on PowerPC.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri, mcberg2017, stefanp, hfinkel

Subscribers: nemanjai, MaskRay, jsji, steven.zhang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66548

llvm-svn: 369662
2019-08-22 15:35:45 +00:00
Simon Pilgrim ab2f68d5ad [PowerPC] Regenerate reciprocal tests, as discussed on D66548
llvm-svn: 369659
2019-08-22 15:14:52 +00:00
Sam Tebbs a69d9d6156 Reapply: [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32
The CodeGen/Thumb2/mve-vaddv.ll test needed to be amended to reflect the
changes from the above patch.

This reverts commit cd53ff6, reapplying 7c6b229.

llvm-svn: 369638
2019-08-22 10:29:20 +00:00
Hans Wennborg cd53ff6c0d Revert r369626 "[ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32"
It broke the bots, see e.g. http://lab.llvm.org:8011/builders/clang-cuda-build/builds/36275/

> This patch fixes shifts by a 128/256 bit shift amount. It also fixes
> codegen for shifts of 32 by delegating to LLVM's default optimisation
> instead of emitting a long shift.
>
> Tests that used to generate long shifts of 32 are updated to check for the
> more optimised codegen.
>
> Differential revision: https://reviews.llvm.org/D66519
>
> llvm-svn: 369626

llvm-svn: 369636
2019-08-22 09:16:53 +00:00
Sam Tebbs 7c6b229204 [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32
This patch fixes shifts by a 128/256 bit shift amount. It also fixes
codegen for shifts of 32 by delegating to LLVM's default optimisation
instead of emitting a long shift.

Tests that used to generate long shifts of 32 are updated to check for the
more optimised codegen.

Differential revision: https://reviews.llvm.org/D66519

llvm-svn: 369626
2019-08-22 08:12:06 +00:00
Fangrui Song 246750c2a9 [COFF] Fix section name for constants larger than 64 bits on Windows
APIntToHexString returns wrong value ("0000000000000000ffffffffffffffff")
for integer larger than 64 bits, and thus
TargetLoweringObjectFileCOFF::getSectionForConstant returns same section name
for all numbers larger than 64 bits. This patch tries to fix it.

Differential Revision: https://reviews.llvm.org/D66458
Patch by Senran Zhang

llvm-svn: 369610
2019-08-22 01:48:34 +00:00
Nico Weber ed18e70c86 Revert r367389 (and follow-up r368404); it caused PR43073.
llvm-svn: 369567
2019-08-21 19:53:42 +00:00
Sam Clegg dde8a25a4b [WebAssembly] Handle aliases in WebAssemblyFixFunctionBitcasts
Fixes: https://github.com/emscripten-core/emscripten/issues/8770

Differential Revision: https://reviews.llvm.org/D66508

llvm-svn: 369566
2019-08-21 19:52:33 +00:00
Amaury Sechet c0f190a048 [DAGCombiner] Remove mostly redundant calls to AddToWorklist
Summary:
These calls change the order in which some nodes are processed and so have an effect on codegen.

The change in fixup-bw-copy.ll is due to (and (load anyext)) gets transformed into (load zext) while previously the and was removed by SimplifyDemandedBits, so the (load anyext) remained.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66543

llvm-svn: 369561
2019-08-21 18:51:08 +00:00
Matt Arsenault 954a012b4c GlobalISel: Implement moreElementsVector for G_UNMERGE_VALUES sources
This is necessary for handling <3 x s16> on AMDGPU, assuming this
should be handled as 2 separate legalization actions. The alternative
would be for fewerElementsVector to handle 3->2.

llvm-svn: 369547
2019-08-21 16:59:10 +00:00
Alexander Timofeev 78347c979e [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions
Differential Revision: https://reviews.llvm.org/D63731
Reviewers: qcolombet, rampitec

llvm-svn: 369532
2019-08-21 15:15:04 +00:00
Petar Avramovic 7f581df649 [MIPS GlobalISel] NarrowScalar G_ZEXTLOAD and G_SEXTLOAD
NarrowScalar G_ZEXTLOAD and G_SEXTLOAD to s32 for MIPS32.

Differential Revision: https://reviews.llvm.org/D66205

llvm-svn: 369512
2019-08-21 09:43:20 +00:00
Petar Avramovic e406aa791c [MIPS GlobalISel] NarrowScalar G_ZEXT and G_SEXT
NarrowScalar G_ZEXT and G_SEXT to s32 for MIPS32.

Differential Revision: https://reviews.llvm.org/D66204

llvm-svn: 369511
2019-08-21 09:35:02 +00:00
Petar Avramovic 61bf2675b9 [MIPS GlobalISel] Consider type1 when legalizing shifts after r351882
r351882 allows different type for shift amount then result and value
being shifted. Fix MIPS Legalizer rules to take r351882 into account.

Differential Revision: https://reviews.llvm.org/D66203

llvm-svn: 369510
2019-08-21 09:31:29 +00:00
Petar Avramovic 5b4c5c2c54 [MIPS GlobalISel] NarrowScalar G_TRUNC
Add NarrowScalar for G_TRUNC when NarrowTy is half the size of source.
NarrowScalar G_TRUNC to s32 for MIPS32.

Differential Revision: https://reviews.llvm.org/D66202

llvm-svn: 369509
2019-08-21 09:26:39 +00:00
Amara Emerson 56606a4db3 [AArch64][GlobalISel] Add support for narrowScalar of G_ZEXT
We do this by merging the source with the high bits set to 0.

Differential Revision: https://reviews.llvm.org/D66181

llvm-svn: 369480
2019-08-21 00:12:37 +00:00
Amaury Sechet 4ccf5ba941 [X86] Automatically generate shift tests. NFC
llvm-svn: 369475
2019-08-20 23:47:19 +00:00
Amaury Sechet 33c283adfd [X86] Autogenerate vec_* tests. NFC
llvm-svn: 369469
2019-08-20 23:11:29 +00:00
Daniel Sanders a16bd4f9f2 [RISCV GlobalISel] Adding initial GlobalISel infrastructure
Summary:
Add an initial GlobalISel skeleton for RISCV. It can only run ir translator for `ret void`.

Patch by Andrew Wei

Reviewers: asb, sabuasal, apazos, lenary, simoncook, lewis-revill, edward-jones, rogfer01, xiangzhai, rovka, Petar.Avramovic, mgorny, dsanders

Reviewed By: dsanders

Subscribers: pzheng, s.egerton, dsanders, hiraditya, rbar, johnrusso, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, psnobl, benna, Jim, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65219

llvm-svn: 369467
2019-08-20 22:53:24 +00:00
Jessica Paquette e6c299b983 [AArch64][GlobalISel] Select logical_imm32 and logical_imm64 patterns
Add a GlobalISel equivalent for the logical_imm32_XFORM and logical_imm64_XFORM
SDNodeXForms in AArch64InstrFormats.td.

- Add select-logical-imm.mir, which contains tests for each imported pattern.

- Update select-pr32733.mir and select-scalar-shift-imm.mir, since they now
select instructions of this form.

Differential Revision: https://reviews.llvm.org/D66162

llvm-svn: 369465
2019-08-20 22:31:25 +00:00
Jessica Paquette 9a95e79b1b [AArch64][GlobalISel] Select patterns which use shifted register operands
This adds GlobalISel equivalents for the following from AArch64InstrFormats:

- arith_shifted_reg32
- arith_shifted_reg64

And partial support for

- logical_shifted_reg32
- logical_shifted_reg32

The only thing missing for the logical cases is support for rotates. Other than
the missing support, the transformation is identical for the arithmetic shifted
register and the logical shifted register.

Lots of tests here:

- Add select-arith-shifted-reg.mir to show that we correctly select add and
sub instructions which use this pattern.

- Add select-logical-shifted-reg.mir to cover patterns which are not shared
between the arithmetic and logical cases.

- Update addsub-shifted.ll to show that we correctly fold shifts into
adds/subs.

- Update eon.ll to show that we can select the eon instruction by folding xors.

Differential Revision: https://reviews.llvm.org/D66163

llvm-svn: 369460
2019-08-20 22:18:06 +00:00
Craig Topper ba375263e8 [DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors (concat_vectors X, Y), undef)) -> (concat_vectors X, Y, undef, undef)
I also had to add a new combine to X86's combineExtractSubvector to prevent a regression.

This helps our vXi1 code see the full concat operation and allow it optimize undef to a zero if there is already a zero in the concat. This helped us use a movzx instead of an AND in some of the tests. In those tests, one concat comes from SelectionDAGBuilder and the second comes from type legalization of v4i1->i4 bitcasts which uses an additional concat. Though these changes weren't my original motivation.

I'm looking at making X86ISelLowering's narrowShuffle emit a concat_vectors instead of an insert_subvector since concat_vectors is more canonical during early DAG combine. This patch helps prevent a regression from my experiments with that.

Differential Revision: https://reviews.llvm.org/D66456

llvm-svn: 369459
2019-08-20 22:12:50 +00:00
Reid Kleckner 22fb734907 Revert [WinEH] Allocate space in funclets stack to save XMM CSRs
This reverts r367088 (git commit 9ad565f70e)

And the follow up fix r368631 / e9865b9b31

llvm-svn: 369457
2019-08-20 22:08:57 +00:00
Sean Fertile 1e46d4cec5 Adds support for writing the .bss section for XCOFF object files.
Adds Wrapper classes for MCSymbol and MCSection into the XCOFF target
object writer. Also adds a class to represent the top-level sections, which we
materialize in the ObjectWriter.

executePostLayoutBinding will map all csects into the appropriate
container depending on its storage mapping class, and map all symbols
into their containing csect. Once all symbols have been processed we
- Assign addresses and symbol table indices.
- Calaculte section sizes.
- Build the section header table.
- Assign the sections raw-pointer value for non-virtual sections.

Since the .bss section is virtual, writing the header table is enough to
add support. Writing of a sections raw data, or of any relocations is
not included in this patch.

Testing is done by dumping the section header table, but it needs to be
extended to include dumping the symbol table once readobj support for
dumping auxiallary entries lands.

Differential Revision: https://reviews.llvm.org/D65159

llvm-svn: 369454
2019-08-20 22:03:18 +00:00
Martin Storsjo 100957153a [test] Fix tests when run on windows after SVN r369426. NFC.
When running tests on windows, invoking "llc -march=<arch>" will
implicitly use windows as the target os, making these tests misbehave
after this change.

Fix the issue by using more specific -mtriple values instead of plain
-march in these tests.

This should hopefully fix buildbot failures like
http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/9816.

llvm-svn: 369443
2019-08-20 20:58:02 +00:00
Craig Topper 3a2b08e6c9 [X86] Add a DAG combine to transform (i8 (bitcast (v8i1 (extract_subvector (v16i1 X), 0)))) -> (i8 (trunc (i16 (bitcast (v16i1 X))))) on KNL target
Without AVX512DQ we don't have KMOVB so we can't really copy 8-bits of a k-register to a GPR. We have to copy 16 bits instead. We do this even if the DAG copy is from v8i1->v16i1. If we detect the (i8 (bitcast (v8i1 (extract_subvector (v16i1 X), 0)))) we should rewrite the types to match the copy we do support. By doing this, we can help known bits to propagate without losing the upper 8 bits of the input to the extract_subvector. This allows some zero extends to be removed since we have an isel pattern to use kmovw for (zero_extend (i16 (bitcast (v16i1 X))).

Differential Revision: https://reviews.llvm.org/D66489

llvm-svn: 369434
2019-08-20 20:20:04 +00:00
Craig Topper 250951abf5 [X86] Add isel patterns for (i64 (zext (i8 (bitcast (v16i1 X))))) to use a KMOVW and a SUBREG_TO_REG. Similar for i8 and anyextend.
We already had patterns for extending to i32 to take advantage of
the impliciting zeroing of the upper bits of a 32-bit GPR that is
done by KMOVW/KMOVB. But the extend might be all the way to i64,
in which case the existing patterns would fail and we'd get a
KMOVW/B followed by a MOVZX. By adding patterns for i64 we can
use the fact that KMOVW/B zero the upper bits of the 32-bit GPR
and the normal property that 32-bit GPR writes implicitly zero the
upper 32-bits of the full 64-bit GPR.

The anyextend patterns are slightly different since we don't care
about the upper zeros. For the i8->i64 I think this avoids selecting
the anyextend as a MOVZX to prevent a partial register issue that
doesn't exist. For i16->i64 I think we would have just emitted an
insert_subreg on top of the extract_subreg that the vXi16->i16
bitcast pattern emits. The register coalescer or peephole pass
should combine those, but this saves that work and makes i8/16
consistent.

llvm-svn: 369431
2019-08-20 19:43:48 +00:00
Martin Storsjo 514f3a122d [TargetMachine] Don't try to create COFFSTUB references on windows on non-COFF
This avoids spurious relocation types for windows/elf targets.

Differential Revision: https://reviews.llvm.org/D66401

llvm-svn: 369426
2019-08-20 18:58:05 +00:00
Matt Arsenault 4b7fc85c0b Revert "AMDGPU: Fix iterator error when lowering SI_END_CF"
This reverts r367500 and r369203. This is causing various test
failures.

llvm-svn: 369417
2019-08-20 17:45:25 +00:00
Thomas Raoux 53ab6bef98 [CodeGen] Add EarlyIfConvert test missed in previous commit
llvm-svn: 369405
2019-08-20 16:34:47 +00:00
Sam Tebbs dcfc2d40d3 [ARM] Select vaddva
This patch adds vaddva selection.

Differential revision: https://reviews.llvm.org/D66410

llvm-svn: 369404
2019-08-20 16:33:34 +00:00
Aditya Nandakumar 08bd080872 [GlobalISel] Handle multiple registers in dbg.value intrinsic
https://reviews.llvm.org/D66077

The value passed into dbg.value may relate to multiple registers,
each of which need a DBG_VALUE.

This fix calls MIRBuilder.buildDirectDbgValue for each register.

Without this, IR passed in from flang-compiler/flang may fail an
assertion in getOrCreateVReg.

Patch by : peterwaller-arm.

llvm-svn: 369403
2019-08-20 16:28:37 +00:00
Simon Pilgrim cec028fc14 [X86][FMA] Add FMA 'negated expression' combine tests for D63141
llvm-svn: 369384
2019-08-20 13:25:55 +00:00
Vyacheslav Zakharin f7229ac7d8 Fixed placement of llvm.global_dtors on Windows.
Differential revision: https://reviews.llvm.org/D66373

llvm-svn: 369299
2019-08-19 21:07:03 +00:00
Evgeniy Stepanov 50affbe47f MemTag: stack initializer merging.
Summary:
MTE provides instructions to update memory tags and data at the same
time. This change makes use of those to generate more compact code for
stack variable tagging + initialization.

We collect memory store and memset instructions following an alloca or a
lifetime.start call, and replace them with the corresponding MTE
intrinsics. Since the intrinsics work on 16-byte aligned chunks, the
stored values are combined as necessary.

Reviewers: pcc, vitalybuka, ostannard

Subscribers: srhines, javed.absar, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66167

llvm-svn: 369297
2019-08-19 20:47:09 +00:00
Craig Topper a0d92c7262 [X86] Teach lowerV4I32Shuffle to only use broadcasts if the mask has more than one undef element. Prioritize shifts over broadcast in lowerV8I16Shuffle.
The motivating case are the changes in vector-reduce-add.ll where
we were doing extra work in the scalar domain instead of shuffling.
There may be some one use check that needs to be looked into there,
but this patch sidesteps the issue by avoiding broadcasts that
aren't really broadcasting.

Differential Revision: https://reviews.llvm.org/D66071

llvm-svn: 369287
2019-08-19 18:15:50 +00:00
Roman Lebedev edfaee0811 [TargetLowering] x s% C == 0 fold: vector divisor with INT_MIN handling
Summary:
The general fold is only valid for positive divisors.
Which effectively means, it is invalid for `INT_MIN` divisors,
and we currently bailout if we see them.

But that is too strict, we can just fix-up the results.
For that, let's do a second computation 'in parallel':
```
Name: srem -> and
Pre: isPowerOf2(C)
%o = srem i8 %X, C
%r = icmp eq %o, 0
  =>
%n = and i8 %X, C-1
%r = icmp eq %n, 0
```
https://rise4fun.com/Alive/Sup

And then just blend results: if the divisor was `INT_MIN`,
pick the value we got via bit-test,
else pick the value from general fold.

There's interesting observation - `ISD::ROTR` is set to
`LegalizeAction::Expand` before AVX512, so we should not
treat `INT_MIN` divisor as even; and as it can be seen
while `@test_srem_odd_even_one` improves on all run-lines,
`@test_srem_odd_even_INT_MIN` only improves for AVX512.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66300

llvm-svn: 369268
2019-08-19 15:01:42 +00:00
Amaury Sechet 8130154115 Automatically generate AVX512 test cases. NFC
llvm-svn: 369264
2019-08-19 14:34:08 +00:00
Jinsong Ji 0776da5236 [PeepholeOptimizer] Don't assume bitcast def always has input
Summary:
If we have a MI marked with bitcast bits, but without input operands,
PeepholeOptimizer might crash with assert.

eg:
If we apply the changes in PPCInstrVSX.td as in this patch:

[(set v4i32:$XT, (bitconvert (v16i8 immAllOnesV)))]>;

We will get assert in PeepholeOptimizer.

```
llvm-lit llvm-project/llvm/test/CodeGen/PowerPC/build-vector-tests.ll -v

llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:417: const
llvm::MachineOperand &llvm::MachineInstr::getOperand(unsigned int)
const: Assertion `i < getNumOperands() && "getOperand() out of range!"'
failed.
```

The fix is to abort if we found out of bound access.

Reviewers: qcolombet, MatzeB, hfinkel, arsenm

Reviewed By: qcolombet

Subscribers: wdng, arsenm, steven.zhang, wuzish, nemanjai, hiraditya, kbarton, MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65542

llvm-svn: 369261
2019-08-19 14:19:04 +00:00
David Stenberg 88df53e6ea [DebugInfo] Allow bundled calls in the MIR's call site info
Summary:
Extend the MIR parser and writer so that the call site information can
refer to calls that are bundled.

Reviewers: aprantl, asowda, NikolaPrica, djtodoro, ivanbaev, vsk

Reviewed By: aprantl

Subscribers: arsenm, hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D66145

llvm-svn: 369256
2019-08-19 12:41:22 +00:00
Sam Tebbs f312c1ecf4 [ARM] Add support for MVE vaddv
This patch adds vecreduce_add and the relevant instruction selection for
vaddv.

Differential revision: https://reviews.llvm.org/D66085

llvm-svn: 369245
2019-08-19 09:38:28 +00:00
Craig Topper ebb7ddc633 [X86] Teach lower1BitShuffle to match right shifts with upper zero elements on types that don't natively support KSHIFT.
We can support these by widening to a supported type,
then shifting all the way to the left and then
back to the right to ensure that we shift in zeroes.

llvm-svn: 369232
2019-08-19 05:45:39 +00:00
Craig Topper 269c6b1c15 [X86] Teach lower1BitShuffle to match KSHIFTR that doesn't use Zeroable and only relies on undef.
This allows us to widen the type when the KSHIFTR instruction
doesn't exist for the type. If we need to shift in zeroes into
the upper elements we would need more work to guarantee zeroes
when widening.

llvm-svn: 369227
2019-08-19 04:08:40 +00:00
Craig Topper 2eb7951da3 [X86] Teach lower1BitShuffle to recognize padding a subvector with zeros with V2 as the source and V1 as the zero vector.
Shuffle canonicalization can swap the sources so the zero vector
might be V1 and the subvector that's being padded can be V2.

llvm-svn: 369226
2019-08-19 00:39:22 +00:00
Craig Topper c9ee4c7c22 [X86] Add test case for missed opportunity to recognize a vXi1 shuffle as an insert into a zero vector.
We are currently missing this because shuffle canonicalization
puts the zero vector as V1 and the subvector as V2. Our current
code doesn't recognize this case.

llvm-svn: 369225
2019-08-19 00:39:18 +00:00
Craig Topper 2ee46c7c4b [X86] Add a special case to LowerCONCAT_VECTORSvXi1 to handle concatenating zero vectors followed by one non-zero vector followed by undef vectors.
For such a case we should only need a KSHIFTL, but we were
previously generating a KSHIFTL followed by a KSHIFTR because
we mistakenly believed we need to zero the undef elements.

llvm-svn: 369224
2019-08-18 23:30:11 +00:00
Craig Topper 6bd2e8eff8 [X86] Add test cases for suboptimal insertion of a vXi1 vector into a larger vector with zeros in the lower elements and undef upper elements.
Currently we generate kshifts to clear both the upper and lower
elements, but we only need one kshift.

llvm-svn: 369223
2019-08-18 23:30:07 +00:00