Commit Graph

26589 Commits

Author SHA1 Message Date
Nick Desaulniers 103bd108a7 [RegisterCoalescer] fix potential use of undef value. NFC
Summary:
Fixes a warning produced from scan-build (llvm.org/reports/scan-build/),
further warnings found by annotation isMoveInstr [[nodiscard]].

isMoveInstr potentially does not assign to its parameters, so if they
were uninitialized, they will potentially stay uninitialized.  It seems
most call sites pass references to uninitialized values, then use them
without checking the return value.

Reviewers: wmi

Reviewed By: wmi

Subscribers: MatzeB, qcolombet, hiraditya, tpr, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62109

llvm-svn: 362265
2019-05-31 21:20:13 +00:00
Puyan Lotfi 3ea6b24f41 [MIR-Canon] Don't do vreg skip for independent instructions if there are none.
We don't want to create vregs if there is nothing to use them for. That causes
verifier errors.

Differential Revision: https://reviews.llvm.org/D62740

llvm-svn: 362247
2019-05-31 17:34:25 +00:00
Kevin P. Neal ac79007205 Revert revert of r362112 with minor SystemZ test file corrections.
[FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes

This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.

Submitted by:	Drew Wock <drew.wock@sas.com>
Reviewed by:	Cameron McInally, Kevin P. Neal
Approved by:	Cameron McInally
Differential Revision:	https://reviews.llvm.org/D62546

llvm-svn: 362241
2019-05-31 16:32:12 +00:00
Jinsong Ji 18e7bf5c4d [MachinePipeliner][NFC] Add some debug log and statistics
This is to add some log and statistics for debugging

Differential Revision: https://reviews.llvm.org/D62165

llvm-svn: 362233
2019-05-31 15:35:19 +00:00
Puyan Lotfi 0d63cef180 [MIR-Canon] Skip the first N vreg names lazily.
This consolidates the vreg skip code into one function (SkipVRegs()).
SkipVRegs() now knows if it should skip as if it is the first initialization or
subsequent skips.

The first skip is also done the first time createVirtualRegister is called by
the cursor instead of by the cursor's constructor. This prevents verifier
errors on machine functions that have no vregs (where the verifier will
complain that there are vregs when the function uses none).

Differential Revision: https://reviews.llvm.org/D62717

llvm-svn: 362195
2019-05-31 06:02:38 +00:00
Puyan Lotfi 2a901401fe [MIR-Canon] Hardening propagateLocalCopies.
This is am almost NFC, it does the following:
- If there is no register class for a COPY's src or dst, bail.
- Fixes uses iterator invalidation bug.

Differential Revision: https://reviews.llvm.org/D62713

llvm-svn: 362191
2019-05-31 04:49:58 +00:00
Matt Arsenault 18659f84b2 MISched: Fix -misched-regpressure=0 if subreg liveness enabled
Test is waiting on fixing several more crashes in the AMDGPU scheduler
implementation with this.

llvm-svn: 362174
2019-05-30 23:31:36 +00:00
Francis Visoiu Mistrih 6ada11f134 [Remarks][NFC] Move the serialization to lib/Remarks
Separate the remark serialization to YAML from the LLVM Diagnostics.

This adds a new serialization abstraction: remarks::Serializer. It's
completely independent from lib/IR and it provides an easy way to
replace YAML by providing a new remarks::Serializer.

Differential Revision: https://reviews.llvm.org/D62632

llvm-svn: 362160
2019-05-30 21:45:59 +00:00
Puyan Lotfi daaecf98c9 [MIR-Canon] Fixing case where MachineFunction is empty.
In cases where the machine function is empty: bail on the RPO traversal.

Differential Revision: https://reviews.llvm.org/D62617

llvm-svn: 362158
2019-05-30 21:37:25 +00:00
Roman Lebedev 46511d75b5 [DAGCombine] Limit 'hoist add/sub binop w/ constant op' to non-opaque consts
I don't have a test case for these, but there is a test case for D62266
where, even after all the constant-folding patches, we still end up
with endless combine loop. Which makes sense, since we don't constant
fold for opaque constants.

llvm-svn: 362156
2019-05-30 21:10:37 +00:00
Roman Lebedev a4e3b50e26 [DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold. Try 2
Summary:
Only vector tests are being affected here,
since subtraction by scalar constant is rewritten
as addition by negated constant.

No surprising test changes.

https://rise4fun.com/Alive/pbT

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62257

llvm-svn: 362146
2019-05-30 20:37:49 +00:00
Roman Lebedev 57aa36ff91 [DAGCombine] (x - C) - y -> (x - y) - C fold. Try 3
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..

https://rise4fun.com/Alive/AAq

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62294

llvm-svn: 362145
2019-05-30 20:37:39 +00:00
Roman Lebedev 63b4741534 [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 3
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.

It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..

https://rise4fun.com/Alive/ZRl

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: RKSimon, craig.topper, spatel, arsenm

Reviewed By: RKSimon, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62263

llvm-svn: 362144
2019-05-30 20:37:29 +00:00
Roman Lebedev 05ad5fd213 [DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 3
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?

The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`

https://rise4fun.com/Alive/ffh

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62252

llvm-svn: 362143
2019-05-30 20:37:18 +00:00
Roman Lebedev 1d9ec7a81b [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 3
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.

AArch64 test changes all look good (`neg` created), or neutral.

X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).

I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.

I'm unable to interpret AMDGPU change, looks neutral-ish?

This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: craig.topper, RKSimon, spatel, arsenm

Reviewed By: RKSimon

Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62223

llvm-svn: 362142
2019-05-30 20:36:54 +00:00
Roman Lebedev 7eb8b5b5dd [DAGCombine] ((c1-A)-c2) -> ((c1-c2)-A) constant-fold
Summary: https://rise4fun.com/Alive/B0A

Reviewers: t.p.northover, RKSimon, spatel, craig.topper

Reviewed By: RKSimon

Subscribers: javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62691

llvm-svn: 362135
2019-05-30 19:27:51 +00:00
Roman Lebedev 691b5e2ecc [DAGCombine] (A-C1)-C2 -> A-(C1+C2) constant-fold
Summary: https://rise4fun.com/Alive/Mb1M

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62689

llvm-svn: 362134
2019-05-30 19:27:42 +00:00
Roman Lebedev 0a3dbbcdfb [DAGCombine] (A+C1)-C2 -> A+(C1-C2) constant-fold
Summary:
Direct sibling of D62662, the root cause of the endless combine loop in D62257

https://rise4fun.com/Alive/d3W

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62664

llvm-svn: 362133
2019-05-30 19:27:32 +00:00
Roman Lebedev 9ff3159b4a [DAGCombine] Use FoldConstantArithmetic() to perform C2-(A+C1) -> (C2-C1)-A fold
Summary:
No tests change, and i'm not sure how to test this, but it's better safe than sorry.

Reviewers: spatel, RKSimon, craig.topper, t.p.northover

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62663

llvm-svn: 362132
2019-05-30 19:27:26 +00:00
Roman Lebedev cc9a9cf237 [DAGCombine] ((A-c1)+c2) -> (A+(c2-c1)) constant-fold
Summary:
This was the root cause of the endless combine loop in D62257

https://rise4fun.com/Alive/d3W

Reviewers: RKSimon, spatel, craig.topper, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62662

llvm-svn: 362131
2019-05-30 19:27:19 +00:00
Roman Lebedev ef95679741 [DAGCombine] Use FoldConstantArithmetic() to perform ((c1-A)+c2) -> (c1+c2)-A fold
Summary: No tests change, and i'm not sure how to test this, but it's better safe than sorry.

Reviewers: spatel, RKSimon, craig.topper, t.p.northover

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62661

llvm-svn: 362130
2019-05-30 19:27:10 +00:00
Tim Northover b7141207a4 Reapply: IR: add optional type to 'byval' function parameters
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.

If present, the type must match the pointee type of the argument.

The original commit did not remap byval types when linking modules, which broke
LTO. This version fixes that.

Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.

llvm-svn: 362128
2019-05-30 18:48:23 +00:00
Puyan Lotfi 0f4446b270 [MIR-Canon] Add support for rewriting VRegs that are typed but don't have an RC.
There were crashes (addrspace-memoperands.mir was only one of them) in MIR that
had operands that came from before register classes were set. With these
operands, creating a replacement vreg (for MIR-Canon's renaming) needs to use
the vreg type rather than the RegisterClass which is not present.

Differential Revision: https://reviews.llvm.org/D62543

llvm-svn: 362122
2019-05-30 18:06:28 +00:00
Kevin P. Neal 51ce0b196a Correct error in revert of r362112.
Differential Revision:	http://reviews.llvm.org/D62546

llvm-svn: 362118
2019-05-30 17:21:45 +00:00
Kevin P. Neal d3db7b40b0 Revert r362112, it broke the bots with the message "Unsupported vector argument or return type"
Differential Revision:	http://reviews.llvm.org/D62546

llvm-svn: 362117
2019-05-30 17:10:21 +00:00
Kevin P. Neal 2e1807678d [FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes
This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.

Submitted by:	Drew Wock <drew.wock@sas.com>
Reviewed by:	Cameron McInally, Kevin P. Neal
Approved by:	Cameron McInally
Differential Revision:	http://reviews.llvm.org/D62546

llvm-svn: 362112
2019-05-30 16:44:47 +00:00
Roman Lebedev 019d270e43 [DAGCombine] Revert of recommit of "binop-with-const hoisting" patches
I was looking into an endless combine loop the uncommitted follow-up patch
was causing, and it appears even these patches can exibit such an
endless loop. The root cause is that we try to hoist one binop (add/sub) with
constant operand, and if we get two such binops both of which are
eligible for this hoisting, we get stuck.

Some cases may highlight missing constant-folds.

Reverts r361871,r361872,r361873,r361874.

llvm-svn: 362109
2019-05-30 16:07:11 +00:00
Amy Huang 325003be02 CodeView - add static data members to global variable debug info.
Summary:
Add static data members to IR debug info's list of global variables
so that they are emitted as S_CONSTANT records.

Related to https://bugs.llvm.org/show_bug.cgi?id=41615.

Reviewers: rnk

Subscribers: aprantl, cfe-commits, llvm-commits, thakis

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D62167

llvm-svn: 362038
2019-05-29 21:45:34 +00:00
Tim Northover 71ee3d0237 Revert "IR: add optional type to 'byval' function parameters"
The IRLinker doesn't delve into the new byval attribute when mapping types, and
this breaks LTO.

llvm-svn: 362029
2019-05-29 20:46:38 +00:00
Benjamin Kramer 107f8d9873 [DAGCombiner] Replace gathers with a zero mask with the passthru value
These can be created by the legalizer when splitting a larger gather.

See https://llvm.org/PR42055 for a motivating example.

Differential Revision: https://reviews.llvm.org/D62613

llvm-svn: 362015
2019-05-29 19:24:19 +00:00
Tim Northover 6e07f16fae IR: add optional type to 'byval' function parameters
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.

If present, the type must match the pointee type of the argument.

Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.

llvm-svn: 362012
2019-05-29 19:12:48 +00:00
Sam McCall 4f09d9fcfa Qualify use of llvm::empty that's ambiguous with std::empty
llvm-svn: 361968
2019-05-29 15:02:16 +00:00
Richard Trieu c77aff7e17 Inline a variable into debug section to fix unused variable warning.
llvm-svn: 361927
2019-05-29 04:09:32 +00:00
Richard Trieu e8698ead9d Inline value into debug statement to avoid unused variable warning.
llvm-svn: 361924
2019-05-29 03:43:01 +00:00
Peter Collingbourne 31fda09b2d Add IR support, ELF section and user documentation for partitioning feature.
The partitioning feature was proposed here:
http://lists.llvm.org/pipermail/llvm-dev/2019-February/130583.html

This is mostly just documentation. The feature itself will be contributed
in subsequent patches.

Differential Revision: https://reviews.llvm.org/D60242

llvm-svn: 361923
2019-05-29 03:29:01 +00:00
Jinsong Ji f6cb3bcb4c Support resource tracking with InstrSchedModel
The current design use DFA to do resource tracking in SMS,
and DFA only support InstrItins, and also has scaling limitation.

This patch extend SMS to allow Subtarget to use ProcResource in
InstrSchedModel instead.

Differential Revision: https://reviews.llvm.org/D62163

llvm-svn: 361919
2019-05-29 03:02:59 +00:00
Pengfei Wang 72e3f9662b Revert "[X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to"
This reverts commit c1b3716614bc0a107e6f41a7d3d503baefad8a5b.

llvm-svn: 361918
2019-05-29 02:49:59 +00:00
Pengfei Wang 818c652643 [X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to
avoid static check fail

RegClassOrBank is an object of RegClassOrRegBank, which is defined as
using llvm::RegClassOrRegBank = typedef PointerUnion<const
TargetRegisterClass *, const RegisterBank *>
so control flow can not get here. Use ""llvm_unreachable" here to avoid
"null pointer" confusion.

Patch by Shengchen Kan (skan)

Differential Revision: https://reviews.llvm.org/D62006

Signed-off-by: pengfei <pengfei.wang@intel.com>
llvm-svn: 361912
2019-05-29 02:20:37 +00:00
Quentin Colombet a6f57ad2c9 [RegUsageInfoCollector] Don't mark as saved registers that don't have subregister lanes
To determine the list of clobbered registers, the RegUsageInfoCollector pass
uses the list of callee saved registers provided by the target and then augments
it with the list of registers which have all their subregisters saved. It then
basically does the difference between all the registers and the saved registers
to come up with what is clobbered (plus it checks that the register is defined
within that functions).

The patch fixes a bug where when register does not have any subregister lane,
hence when checking if any of its subregister are not saved, we would find none
and think the register is saved as well.

That's obviously wrong.

The code was actually kind of checking for something like that with the
CoveredBySubRegs bit. What this bit says is that a register is completely
covered by its subregisters.
We required that this bit was set, to check that a register was saved by its
subregister lanes, since without this bit, we potentially would miss to check
some part of the register.

However, this bit is used de facto on registers that don't have any
subregisters (e.g., on ARM) and the code was not prepared for that.

This patch fixes this by checking that a register has subregisters before
declaring it saved when none of its lanes are modified.

llvm-svn: 361901
2019-05-28 23:43:12 +00:00
Adhemerval Zanella 6d7bf5e8df [CodeGen] Add lrint/llrint builtins
This patch add the ISD::LRINT and ISD::LLRINT along with new
intrinsics.  The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.

The idea is to optimize lrint/llrint generation for AArch64
in a subsequent patch.  Current semantic is just route it to libm
symbol.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D62017

llvm-svn: 361875
2019-05-28 20:47:44 +00:00
Roman Lebedev dfc34f0211 [DAGCombine] (x - C) - y -> (x - y) - C fold. Try 2
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..

https://rise4fun.com/Alive/AAq

This is a recommit, originally committed in rL361856, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62294

llvm-svn: 361874
2019-05-28 20:40:10 +00:00
Roman Lebedev d485c6bc9f [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 2
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.

It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..

https://rise4fun.com/Alive/ZRl

This is a recommit, originally committed in rL361855, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel, arsenm

Reviewed By: RKSimon, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62263

llvm-svn: 361873
2019-05-28 20:40:03 +00:00
Roman Lebedev 96c9986199 [DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 2
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?

The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`

https://rise4fun.com/Alive/ffh

This is a recommit, originally committed in rL361853, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62252

llvm-svn: 361872
2019-05-28 20:39:55 +00:00
Roman Lebedev 2feb7e56e2 [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 2
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.

AArch64 test changes all look good (`neg` created), or neutral.

X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).

I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.

I'm unable to interpret AMDGPU change, looks neutral-ish?

This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.

Reviewers: craig.topper, RKSimon, spatel, arsenm

Reviewed By: RKSimon

Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62223

llvm-svn: 361871
2019-05-28 20:39:39 +00:00
Roman Lebedev 272d70c366 Revert DAGCombine "hoist binop with const" folds
Appear to introduce test-suite compile-time hang.

http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/22825

This reverts r361852,r361853,r361854,r361855,r361856

llvm-svn: 361865
2019-05-28 19:04:21 +00:00
Roman Lebedev 7669665432 [DAGCombine] (x - C) - y -> (x - y) - C fold
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..

https://rise4fun.com/Alive/AAq

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62294

llvm-svn: 361856
2019-05-28 17:54:21 +00:00
Roman Lebedev 8c9b3e4e4a [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.

It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..

https://rise4fun.com/Alive/ZRl

Reviewers: RKSimon, craig.topper, spatel, arsenm

Reviewed By: RKSimon, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62263

llvm-svn: 361855
2019-05-28 17:54:13 +00:00
Roman Lebedev 6a24c9b9ab [DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold
Summary:
Only vector tests are being affected here,
since subtraction by scalar constant is rewritten
as addition by negated constant.

No surprising test changes.

https://rise4fun.com/Alive/pbT

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62257

llvm-svn: 361854
2019-05-28 17:54:04 +00:00
Roman Lebedev 1499f65ac1 [DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?

The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`

https://rise4fun.com/Alive/ffh

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62252

llvm-svn: 361853
2019-05-28 17:53:54 +00:00
Roman Lebedev 19f51ec04a [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.

AArch64 test changes all look good (`neg` created), or neutral.

X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).

I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.

I'm unable to interpret AMDGPU change, looks neutral-ish?

This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)

Reviewers: craig.topper, RKSimon, spatel, arsenm

Reviewed By: RKSimon

Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62223

llvm-svn: 361852
2019-05-28 17:53:43 +00:00
Simon Pilgrim 9cd9624fb6 [DAG] LegalizeVectorTypes - reduce scope of local variables. NFCI.
Move the element index/count variables into the block where they are actually used - appeases cppcheck and helps avoid shadow variable warnings.

llvm-svn: 361821
2019-05-28 13:46:26 +00:00
David Stenberg 5d0e6b6755 Stop undef fragments from closing non-overlapping fragments
Summary:
When DwarfDebug::buildLocationList() encountered an undef debug value,
it would truncate all open values, regardless if they were overlapping or
not. This patch fixes so that it only does that for overlapping fragments.

This change unearthed a bug that I had introduced in D57511,
which I have fixed in this patch. The code in DebugHandlerBase that
changes labels for parameter debug values could break DwarfDebug's
assumption that the labels for the entries in the debug value history
are monotonically increasing. Before this patch, that bug could result
in location list entries whose ending address was lower than the
beginning address, and with the changes for undef debug values that this
patch introduces it could trigger an assertion, due to attempting to
emit location list entries with empty ranges. A reproducer for the bug
is added in param-reg-const-mix.mir.

Reviewers: aprantl, jmorse, probinson

Reviewed By: aprantl

Subscribers: javed.absar, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D62379

llvm-svn: 361820
2019-05-28 13:23:25 +00:00
Matt Arsenault d3ed418ad3 MIR: Fix printer crashing on dead CSR frame indexes
llvm-svn: 361819
2019-05-28 13:08:31 +00:00
Benjamin Kramer 57e267a2e9 [X86] Custom lower CONCAT_VECTORS of v2i1
The generic legalizer cannot handle this. Add an assert instead of
silently miscompiling vectors with elements smaller than 8 bits.

llvm-svn: 361814
2019-05-28 12:52:57 +00:00
Matt Arsenault ca84c4be4b RegAllocFast: Set MayLiveAcrossBlocks when allocating uses
Setting mayLiveOut based only on use instructions after allocating the
def block did not work if the use block was allocated before the def
block, since the virtual register uses were already removed.

Fixes bug 41973.

llvm-svn: 361781
2019-05-27 20:37:31 +00:00
Sanjay Patel 2f99d009c1 [SelectionDAG] fold concat of extract subvectors
This is derived from the related fold for build vectors.
We also have a version of this in DAGCombiner. The benefit of
having this fold at node creation time is (1) efficiency and
(2) preventing infinite looping from creating patterns that
should not exist in the first place.

Currently, the inf-loop could happen with MergeConsecutiveStores()
because it naively creates concat of extracts when forming a wider
vector store. That could fight with target-specific store narrowing.

llvm-svn: 361780
2019-05-27 20:26:21 +00:00
Sanjay Patel e13ae3e4d8 [SelectionDAG] fix formatting and redundant comments; NFC
There's a possible missing fold here for extracting from the
same source vector. It's similar to a check that we use to
squash a build vector with all extracted elements from the
same source vector.

llvm-svn: 361778
2019-05-27 18:26:43 +00:00
Michael Liao 9c70c574b4 [SelectionDAG] Enhance the simplification of `copyto` from `implicit-def`.
Summary:
- The current implementation simplifies the case where the source of
  `copyto` is `implicit-def`ed. However, it only works when that
  `implicit-def` is single-used since it detects that from
  `implicit-def` and cannot determine which destination vreg should be
  used if there are multiple uses.
- This patch changes that detection when `copyto` is being emitted. If
  that `copyto`'s source is defined from `implicit-def`, it simplifies
  it. Hence, it works even that `implicit-def` is multi-used.
- Except it simplifies the internal IR, it won't improve the quality of
  code generation. However, it helps to detect 'implicit-def` in a
  straight-forward manner in some passes, such as `si-i1-copies`. A test
  case is added.

Reviewers: sunfish, nhaehnle

Subscribers: jvesely, hiraditya, asbirlea, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62342

llvm-svn: 361777
2019-05-27 18:26:29 +00:00
Simon Pilgrim ebb053b139 [SelectionDAG] GetDemandedBits - add demanded elements wrapper implementation
The DemandedElts variable is pretty much inert at the moment - the original GetDemandedBits implementation calls it with an 'all ones' DemandedElts value so the function is active and behaves exactly as it used to.

llvm-svn: 361773
2019-05-27 16:39:25 +00:00
Nikola Prica 441ad62531 Test commit (NFC)
Add blank line.

llvm-svn: 361761
2019-05-27 13:51:30 +00:00
David L. Jones 0ff41b8a5a Revert r361356: "[MIR] Add simple PRE pass to MachineCSE"
This is problematic on buildbots, as discussed here: https://reviews.llvm.org/rL361356

It seems like the plan already was to revert, but that hasn't happened yet.

llvm-svn: 361746
2019-05-27 06:00:00 +00:00
Alexander Timofeev ba447bae74 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
             the correct register classes to the cross block values beforehand. For the divergent targets
             same value type requires different register classes dependent on the value divergence.

    Reviewers: rampitec, nhaehnle

    Differential Revision: https://reviews.llvm.org/D59990

    This commit was reverted because of the build failure.
    The reason was mlformed patch.
    Build failure fixed.

llvm-svn: 361741
2019-05-26 20:33:26 +00:00
Simon Pilgrim 06e02856ab [SelectionDAG] GetDemandedBits - cleanup to more closely match SimplifyDemandedBits. NFCI.
Prep work before adding demanded elts support.

llvm-svn: 361739
2019-05-26 18:58:14 +00:00
Simon Pilgrim 2916b9e28c [SelectionDAG] MaskedValueIsZero - add demanded elements implementation
Will be used in an upcoming patch but I've updated the original implementation to call this to ensure test coverage.

llvm-svn: 361738
2019-05-26 18:43:44 +00:00
Sanjay Patel 91131b6500 [SelectionDAG] soften assertion when legalizing narrow vector FP ops
The test based on PR42010:
https://bugs.llvm.org/show_bug.cgi?id=42010
...may show an inaccuracy for PPC's target defs, but we should not
be so aggressive with an assert here. There's no telling what out-of-tree
targets look like.

llvm-svn: 361696
2019-05-25 13:48:07 +00:00
Peter Collingbourne 3b93737446 Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence."
Broke sanitizer bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio

llvm-svn: 361688
2019-05-25 01:52:38 +00:00
Alexander Timofeev dffedea014 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
         the correct register classes to the cross block values beforehand. For the divergent targets
         same value type requires different register classes dependent on the value divergence.

Reviewers: rampitec, nhaehnle

Differential Revision: https://reviews.llvm.org/D59990

llvm-svn: 361644
2019-05-24 15:32:18 +00:00
Simon Pilgrim 95b8d9bbf8 [SelectionDAG] computeKnownBits - support constant pool values from target
This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets.

computeKnownBits then uses this function to improve codegen, notably vector code after legalization.

A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit.

This required a couple of fixes:
* SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR).
* Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets.

Differential Revision: https://reviews.llvm.org/D61887

llvm-svn: 361620
2019-05-24 10:03:11 +00:00
Bjorn Pettersson b4771425f5 Use the DataLayout::typeSizeEqualsStoreSize helper. NFC
Just a minor refactoring to use the new helper method
DataLayout::typeSizeEqualsStoreSize(). This is done when
checking if getTypeSizeInBits is equal/non-equal to
getTypeStoreSizeInBits.

llvm-svn: 361613
2019-05-24 09:20:20 +00:00
Tim Northover 3b2157aeed GlobalISel: support swifterror attribute on AArch64.
swifterror marks an argument as a register pretending to be a pointer, so we
need a guaranteed mem2reg-like analysis of its uses. Fortunately most of the
infrastructure can be reused from the DAG world.

llvm-svn: 361608
2019-05-24 08:40:13 +00:00
Tim Northover 3d7a057b0d CodeGen: factor out swifterror value tracking.
llvm-svn: 361607
2019-05-24 08:39:43 +00:00
Sanjay Patel 7d6c0bce50 [DAGCombiner] make folds of binops safe for opcodes that produce >1 value
This is no-functional-change-intended currently because the definition
of isBinOp() only includes opcodes that produce 1 value. But if we
share that implementation with isCommutativeBinOp() as proposed in
D62191, then we need to make sure that the callers bail out for
opcodes that they are not prepared to handle correctly.

llvm-svn: 361547
2019-05-23 20:17:25 +00:00
Matt Arsenault 0f3ba44b57 AMDGPU/GlobalISel: Legality for integer min/max
llvm-svn: 361519
2019-05-23 17:58:48 +00:00
Shoaib Meenai 87226a7202 [AsmPrinter] Treat a narrowing PtrToInt like Trunc
When printing assembly for PtrToInt, AsmPrinter::lowerConstant
incorrectly assumed that if PtrToInt was not converting to an
int with exactly the same number of bits, it must be widening
to a larger int. But this isn't necessarily true; PtrToInt can
also shrink the size, which is useful when you want to produce
a known 32-bit pointer on a 64-bit platform (on x86_64 ELF
this yields a R_X86_64_32 relocation).

The old behavior of falling through to the widening case for a
narrowing PtrToInt yields bogus assembly code like this, which
fails to assemble because the no-op bit and it accidentally
creates is not a valid relocation:

```
        .long   a&-1
```

The fix is to treat a narrowing PtrToInt exactly the same as
it already treats Trunc: just emit the expression and let
the assembler deal with truncating it in the appropriate way.

Patch by Mat Hostetter <mjh@fb.com>.

Differential Revision: https://reviews.llvm.org/D61325

llvm-svn: 361508
2019-05-23 16:29:09 +00:00
Petar Jovanovic aa28b6d198 [LiveDebugValues] Rename 'DMI' into 'DebugInstr' (NFC)
This will improve code readability.

Patch by Djordje Todorovic.

Differential Revision: https://reviews.llvm.org/D62295

llvm-svn: 361497
2019-05-23 13:49:06 +00:00
Clement Courbet 43882b16a3 [MergeICmps] Make the pass compatible with the new pass manager.
Reviewers: gchatelet, spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62287

llvm-svn: 361490
2019-05-23 12:35:26 +00:00
Petar Jovanovic ff47d83e78 [DwarfExpression] Refactor dwarf expression (NFC)
Refactor location description kind in order to be easier for extensions
(needed for D60866).
In addition, cut off some bits from the other class fields.

Patch by Djordje Todorovic.

Differential Revision: https://reviews.llvm.org/D62002

llvm-svn: 361480
2019-05-23 10:37:13 +00:00
Matt Arsenault ca64ef2043 MC: Allow getMaxInstLength to depend on the subtarget
Keep it optional in cases this is ever needed in some global
context. Currently it's only used for getting an upper bound inline
asm code size.

For AMDGPU, gfx10 increases the maximum instruction size to
20-bytes. This avoids penalizing older subtargets when estimating code
size, and making some annoying branch relaxation test adjustments.

llvm-svn: 361405
2019-05-22 16:28:41 +00:00
Kees Cook c2187c20a4 [TargetLowering] Extend bool args to inline-asm according to getBooleanType
Summary:
This extends Krzysztof Parzyszek's X86-specific solution
(https://reviews.llvm.org/D60208) to the generic code pointed out by
James Y Knight.

Reviewers: kparzysz, craig.topper, nickdesaulniers

Subscribers: efriedma, sdardis, nemanjai, javed.absar, eraman, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, srhines, void, nickdesaulniers, jyknight

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60224

llvm-svn: 361404
2019-05-22 16:16:15 +00:00
Kees Cook a7a687e500 [TargetLowering] Add blank line (test commit)
llvm-svn: 361403
2019-05-22 16:02:13 +00:00
Anton Afanasyev df00c6a54f [MIR] Add simple PRE pass to MachineCSE
This is the second part of the commit fixing PR38917 (hoisting
partitially redundant machine instruction). Most of PRE (partitial
redundancy elimination) and CSE work is done on LLVM IR, but some of
redundancy arises during DAG legalization. Machine CSE is not enough
to deal with it. This simple PRE implementation works a little bit
intricately: it passes before CSE, looking for partitial redundancy
and transforming it to fully redundancy, anticipating that the next
CSE step will eliminate this created redundancy. If CSE doesn't
eliminate this, than created instruction will remain dead and eliminated
later by Remove Dead Machine Instructions pass.

The third part of the commit is supposed to refactor MachineCSE,
to make it more clear and to merge MachinePRE with MachineCSE,
so one need no rely on further Remove Dead pass to clear instrs
not eliminated by CSE.

First step: https://reviews.llvm.org/D54839

Fixes llvm.org/PR38917

llvm-svn: 361356
2019-05-22 07:41:34 +00:00
Stanislav Mekhanoshin 44d17ca02e Fix register coalescer failure to prune value
Register coalescer fails for the test in the patch with the assertion in
JoinVals::ConflictResolution `DefMI != nullptr'. It attempts to join
live intervals for two adjacent instructions and erase the copy:

    %2:vreg_256 = COPY %1
    %3:vreg_256 = COPY killed %1

The LI needs to be adjusted to kill subrange for the erased instruction
and extend the subrange of the original def. That was done for the main
interval only but not for the subrange. As a result subrange had a VNI
pointing to the erased slot resulting in the above failure.

Differential Revision: https://reviews.llvm.org/D62162

llvm-svn: 361293
2019-05-21 19:32:41 +00:00
Leonard Chan 0bada7ce6c [Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them. The
result is saturated and clamped between the largest and smallest representable
values of the first 2 operands.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D55720

llvm-svn: 361289
2019-05-21 19:17:19 +00:00
Sanjay Patel 10f6b39899 [SelectionDAG] fold insert subvector of undef into undef
DAGCombiner simplifies this more liberally as:
  // If inserting an UNDEF, just return the original vector.
  if (N1.isUndef())
    return N0;

So there's no way to make this visible in output AFAIK, but
doing this at node creation time should be slightly more efficient.

llvm-svn: 361287
2019-05-21 18:53:53 +00:00
Sanjay Patel 51dc59d090 [SelectionDAG] remove redundant code; NFCI
getNode() squashes concatenation of undefs via FoldCONCAT_VECTORS():
  // Concat of UNDEFs is UNDEF.
  if (llvm::all_of(Ops, [](SDValue Op) { return Op.isUndef(); }))
    return DAG.getUNDEF(VT);

llvm-svn: 361284
2019-05-21 18:28:22 +00:00
Sanjay Patel 78c3f58122 [DAGCombiner] prevent unsafe reassociation of FP ops
There are no FP callers of DAGCombiner::reassociateOps() currently,
but we can add a fast-math check to make sure this API is not being
misused.

This was noted as a potential risk (and that risk might increase) with:
D62191

llvm-svn: 361268
2019-05-21 14:47:38 +00:00
Florian Hahn f9b28e53c7 [ScheduleDAGInstrs] Compute topological ordering on demand.
In most cases, the topological ordering does not get changed in
ScheduleDAGInstrs. We can compute the ordering on demand, similar to
D60125.

This drastically cuts down the number of times we need to compute the
topological ordering, e.g. for SPEC2006, SPEC2k and MultiSource, we get
the following stats for -O3 -flto on X86 (showing the top reductions,
with small absolute values filtered). The smallest reduction is -50%.

Slightly positive impact on compile-time (-0.1 % geomean speedup for
test-suite + SPEC & co, with -O1 on X86)

Tests: 243
Metric: pre-RA-sched.NumTopoInits

Program                                        base       patch  diff
 test-suite...ngs-C/fixoutput/fixoutput.test   115.00      3.00   -97.4%
 test-suite...ks/Prolangs-C/cdecl/cdecl.test   957.00     26.00   -97.3%
 test-suite...math/automotive-basicmath.test   107.00      3.00   -97.2%
 test-suite...rolangs-C++/deriv2/deriv2.test   144.00      6.00   -95.8%
 test-suite...lowfish/security-blowfish.test   410.00     18.00   -95.6%
 test-suite...frame_layout/frame_layout.test   441.00     23.00   -94.8%
 test-suite...rolangs-C++/employ/employ.test   159.00     11.00   -93.1%
 test-suite...s/Ptrdist/anagram/anagram.test   157.00     11.00   -93.0%
 test-suite...s-C/unix-smail/unix-smail.test   829.00     59.00   -92.9%
 test-suite...chmarks/Olden/power/power.test   154.00     11.00   -92.9%
 test-suite...T95/147.vortex/147.vortex.test   19876.00  1434.00  -92.8%
 test-suite...000/255.vortex/255.vortex.test   19881.00  1435.00  -92.8%
 test-suite...ce/Applications/Burg/burg.test   2203.00   168.00   -92.4%
 test-suite...urce/Applications/hbd/hbd.test   1067.00    85.00   -92.0%
 test-suite...ternal/HMMER/hmmcalibrate.test   3145.00   251.00   -92.0%
 test-suite.../Applications/spiff/spiff.test   1037.00    84.00   -91.9%
 test-suite...SPEC/CINT95/130.li/130.li.test   5913.00   487.00   -91.8%
 test-suite.../CINT95/134.perl/134.perl.test   12532.00  1041.00  -91.7%
 test-suite...ce/Benchmarks/Olden/bh/bh.test   220.00     19.00   -91.4%
 test-suite :: External/Nurbs/nurbs.test       2304.00   206.00   -91.1%
 test-suite...arks/VersaBench/dbms/dbms.test   773.00     75.00   -90.3%
 test-suite...ce/Applications/siod/siod.test   9043.00   878.00   -90.3%
 test-suite...pplications/treecc/treecc.test   4510.00   438.00   -90.3%
 test-suite...T2006/456.hmmer/456.hmmer.test   7093.00   697.00   -90.2%
 test-suite...s-C/Pathfinder/PathFinder.test   882.00     87.00   -90.1%
 test-suite.../CINT2000/176.gcc/176.gcc.test   64978.00  6721.00  -89.7%
 test-suite...cations/hexxagon/hexxagon.test   657.00     69.00   -89.5%
 test-suite...fice-ispell/office-ispell.test   2712.00   285.00   -89.5%
 test-suite.../CINT2006/403.gcc/403.gcc.test   139613.00 14992.00 -89.3%
 test-suite...lications/ClamAV/clamscan.test   25880.00  2785.00  -89.2%

Reviewers: MatzeB, atrick, efriedma, niravd

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D60839

llvm-svn: 361253
2019-05-21 13:04:53 +00:00
Dylan McKay e967308da4 Add TargetLoweringInfo hook for explicitly setting the ABI calling convention endianess
Summary:
The endianess used in the calling convention does not always match the
endianess of the target on all architectures, namely AVR.

When an argument is too large to be legalised by the architecture and is
split for the ABI, a new hook TargetLoweringInfo::shouldSplitFunctionArgumentsAsLittleEndian
is queried to find the endianess that function arguments must be laid
out in.

This approach was recommended by Eli Friedman.

Originally reported in https://github.com/avr-rust/rust/issues/129.

Patch by Carl Peto.

Reviewers: bogner, t.p.northover, RKSimon, niravd, efriedma

Reviewed By: efriedma

Subscribers: JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62003

llvm-svn: 361222
2019-05-21 06:38:02 +00:00
Craig Topper 97d4f7c194 [SelectionDAGBuilder] Flush PendingExports before creating INLINEASM_BR node for asm goto.
Since INLINEASM_BR is a terminator we need to flush the pending exports before
emitting it. If we don't do this, a TokenFactor can be inserted between it and
the BR instruction emitted to finish the callbr lowering.

It looks like nodes are glued to the INLINEASM_BR so I had to make sure we emit
the TokenFactor before that.

Differential Revision: https://reviews.llvm.org/D59981

llvm-svn: 361177
2019-05-20 17:08:02 +00:00
Craig Topper af7a188453 [Intrinsics] Merge lround.i32 and lround.i64 into a single intrinsic with overloaded result type. Make result type for llvm.llround overloaded instead of fixing to i64
We shouldn't really make assumptions about possible sizes for long and long long. And longer term we should probably support vectorizing these intrinsics. By making the result types not fixed we can support vectors as well.

Differential Revision: https://reviews.llvm.org/D62026

llvm-svn: 361169
2019-05-20 16:27:09 +00:00
Craig Topper 203bfdd0f0 [DAGCombiner] Refactor code in visitShiftByConstant slightly to make it more readable. NFC
This changes the isShift variable to include the constant operand
check that was previously in the if statement.

While there fix an 80 column violation and an unnecessary use of
getNode. Also fix variable name capitalization.

llvm-svn: 361168
2019-05-20 16:26:55 +00:00
Nikita Popov 9060b6df97 [SDAG] Vector op legalization for overflow ops
Fixes issue reported by aemerson on D57348. Vector op legalization
support is added for uaddo, usubo, saddo and ssubo (umulo and smulo
were already supported). As usual, by extracting TargetLowering methods
and calling them from vector op legalization.

Vector op legalization doesn't really deal with multiple result nodes,
so I'm explicitly performing a recursive legalization call on the
result value that is not being legalized.

There are some existing test changes because expansion happens
earlier, so we don't get a DAG combiner run in between anymore.

Differential Revision: https://reviews.llvm.org/D61692

llvm-svn: 361166
2019-05-20 16:09:22 +00:00
Matt Arsenault 7c8ec18964 RegAlloc: Fix verifier error with undef identity copies
The code did not match the example in the comment, and was checking
the undef flag on the copy dest instead of source. The existing tests
were only hitting the > 2 operands case.

llvm-svn: 361156
2019-05-20 14:09:36 +00:00
Guillaume Chatelet e386a01e84 [NFC] Refactor visitIntrinsicCall so it doesn't return a const char*
Summary: API simplification

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61306

llvm-svn: 361140
2019-05-20 11:01:30 +00:00
Petar Jovanovic e85bbf564d [DebugInfoMetadata] Refactor DIExpression::prepend constants (NFC)
Refactor DIExpression::With* into a flag enum in order to be less
error-prone to use (as discussed on D60866).

Patch by Djordje Todorovic.

Differential Revision: https://reviews.llvm.org/D61943

llvm-svn: 361137
2019-05-20 10:35:57 +00:00
Guillaume Chatelet a760e69840 Revert "[NFC] Refactor visitIntrinsicCall so it doesn't return a const char*"
This reverts commit 706d3cd6388cc3446aab282f3af879862b10cbed.

llvm-svn: 361130
2019-05-20 09:00:12 +00:00
Guillaume Chatelet fa8c152576 [NFC] Refactor visitIntrinsicCall so it doesn't return a const char*
Summary: API simplification

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61306

llvm-svn: 361129
2019-05-20 08:52:10 +00:00
Simon Pilgrim 2b45a70fd6 MemCmpExpansion::getCompareLoadPairs - assert we find a comparison diff. NFCI.
Fix scan-build uninitialized warning and assert the final diff isn't null.

llvm-svn: 361095
2019-05-18 11:31:48 +00:00
Matt Arsenault 02b5ca8cd1 GlobalISel: Implement lower for S64->S32 [SU]ITOFP
This is ported from the custom AMDGPU DAG implementation. I think this
is a better default expansion than what the DAG currently uses, at
least if the target has CTLZ.

This implements the signed version in terms of the unsigned
conversion, which is implemented with bit operations. SelectionDAG has
several other implementations that should eventually be ported
depending on what instructions are legal.

llvm-svn: 361081
2019-05-17 23:05:13 +00:00
Matt Arsenault f3cedf4823 GlobalISel: Define integer min/max instructions
Doesn't attempt to emit them for anything yet, but some legalizations
I want to port use them.

llvm-svn: 361061
2019-05-17 18:36:31 +00:00
Roman Lebedev 64c756b991 [DAGCombiner] visitShiftByConstant(): drop bogus signbit check
Summary:
That check claims that the transform is illegal otherwise.
That isn't true:
1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter
   https://rise4fun.com/Alive/K4A
2. For `ISD::AND`, there is no restriction on constants:
   https://rise4fun.com/Alive/Wy3
3. For `ISD::OR`, there is no restriction on constants:
   https://rise4fun.com/Alive/GOH
3. For `ISD::XOR`, there is no restriction on constants:
   https://rise4fun.com/Alive/ml6

So, why is it there then?

This changes the testcase that was touched by @spatel in rL347478,
but i'm not sure that test tests anything particular?

Reviewers: RKSimon, spatel, craig.topper, jojo, rengolin

Reviewed By: spatel

Subscribers: javed.absar, llvm-commits, spatel

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61918

llvm-svn: 361044
2019-05-17 15:52:58 +00:00
Matt Arsenault 1448f5689e AMDGPU/GlobalISel: Legalize G_FCOPYSIGN
llvm-svn: 361025
2019-05-17 12:19:52 +00:00
Fangrui Song ec6dc3089e [GlobalISel] Fix -Wsign-compare on 32-bit -DLLVM_ENABLE_ASSERTIONS=on builds
llvm-svn: 360989
2019-05-17 05:53:39 +00:00
Ben Dunbobbin 1d16515fb4 [ELF] Implement Dependent Libraries Feature
This patch implements a limited form of autolinking primarily designed to allow
either the --dependent-library compiler option, or "comment lib" pragmas (
https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017) in
C/C++ e.g. #pragma comment(lib, "foo"), to cause an ELF linker to automatically
add the specified library to the link when processing the input file generated
by the compiler.

Currently this extension is unique to LLVM and LLD. However, care has been taken
to design this feature so that it could be supported by other ELF linkers.

The design goals were to provide:

- A simple linking model for developers to reason about.
- The ability to to override autolinking from the linker command line.
- Source code compatibility, where possible, with "comment lib" pragmas in other
  environments (MSVC in particular).

Dependent library support is implemented differently for ELF platforms than on
the other platforms. Primarily this difference is that on ELF we pass the
dependent library specifiers directly to the linker without manipulating them.
This is in contrast to other platforms where they are mapped to a specific
linker option by the compiler. This difference is a result of the greater
variety of ELF linkers and the fact that ELF linkers tend to handle libraries in
a more complicated fashion than on other platforms. This forces us to defer
handling the specifiers to the linker.

In order to achieve a level of source code compatibility with other platforms
we have restricted this feature to work with libraries that meet the following
"reasonable" requirements:

1. There are no competing defined symbols in a given set of libraries, or
   if they exist, the program owner doesn't care which is linked to their
   program.
2. There may be circular dependencies between libraries.

The binary representation is a mergeable string section (SHF_MERGE,
SHF_STRINGS), called .deplibs, with custom type SHT_LLVM_DEPENDENT_LIBRARIES
(0x6fff4c04). The compiler forms this section by concatenating the arguments of
the "comment lib" pragmas and --dependent-library options in the order they are
encountered. Partial (-r, -Ur) links are handled by concatenating .deplibs
sections with the normal mergeable string section rules. As an example, #pragma
comment(lib, "foo") would result in:

.section ".deplibs","MS",@llvm_dependent_libraries,1
         .asciz "foo"

For LTO, equivalent information to the contents of a the .deplibs section can be
retrieved by the LLD for bitcode input files.

LLD processes the dependent library specifiers in the following way:

1. Dependent libraries which are found from the specifiers in .deplibs sections
   of relocatable object files are added when the linker decides to include that
   file (which could itself be in a library) in the link. Dependent libraries
   behave as if they were appended to the command line after all other options. As
   a consequence the set of dependent libraries are searched last to resolve
   symbols.
2. It is an error if a file cannot be found for a given specifier.
3. Any command line options in effect at the end of the command line parsing apply
   to the dependent libraries, e.g. --whole-archive.
4. The linker tries to add a library or relocatable object file from each of the
   strings in a .deplibs section by; first, handling the string as if it was
   specified on the command line; second, by looking for the string in each of the
   library search paths in turn; third, by looking for a lib<string>.a or
   lib<string>.so (depending on the current mode of the linker) in each of the
   library search paths.
5. A new command line option --no-dependent-libraries tells LLD to ignore the
   dependent libraries.

Rationale for the above points:

1. Adding the dependent libraries last makes the process simple to understand
   from a developers perspective. All linkers are able to implement this scheme.
2. Error-ing for libraries that are not found seems like better behavior than
   failing the link during symbol resolution.
3. It seems useful for the user to be able to apply command line options which
   will affect all of the dependent libraries. There is a potential problem of
   surprise for developers, who might not realize that these options would apply
   to these "invisible" input files; however, despite the potential for surprise,
   this is easy for developers to reason about and gives developers the control
   that they may require.
4. This algorithm takes into account all of the different ways that ELF linkers
   find input files. The different search methods are tried by the linker in most
   obvious to least obvious order.
5. I considered adding finer grained control over which dependent libraries were
   ignored (e.g. MSVC has /nodefaultlib:<library>); however, I concluded that this
   is not necessary: if finer control is required developers can fall back to using
   the command line directly.

RFC thread: http://lists.llvm.org/pipermail/llvm-dev/2019-March/131004.html.

Differential Revision: https://reviews.llvm.org/D60274

llvm-svn: 360984
2019-05-17 03:44:15 +00:00
Amy Huang c2029068bc Emit global variables as S_CONSTANT records for codeview debug info.
Summary:
This emits S_CONSTANT records for global variables.
Currently this emits records for the global variables already being tracked in the
LLVM IR metadata, which are just constant global variables; we'll also want S_CONSTANTs
for static data members and enums.

Related to https://bugs.llvm.org/show_bug.cgi?id=41615

Reviewers: rnk

Subscribers: aprantl, hiraditya, llvm-commits, thakis

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61926

llvm-svn: 360948
2019-05-16 22:28:52 +00:00
Tim Renouf e3cbdaf1b5 [CodeGen] Fixed de-optimization of legalize subvector extract
The recent introduction of v3i32 etc as an MVT, and its use in AMDGPU
3-dword memory instructions, caused a de-optimization problem for code
with such a load that then bitcasts via vector of i8, because v12i8 is
not an MVT so it legalizes the bitcast by widening it.

This commit adds the ability to widen a bitcast using extract_subvector
on the result, so the value does not need to go via memory.

Differential Revision: https://reviews.llvm.org/D60457

Change-Id: Ie4abb7760547e54a2445961992eafc78e80d4b64
llvm-svn: 360942
2019-05-16 21:49:06 +00:00
Adhemerval Zanella 73643b5041 [CodeGen] Add lround/llround builtins
This patch add the ISD::LROUND and ISD::LLROUND along with new
intrinsics.  The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.

The idea is to optimize lround/llround generation for AArch64
in a subsequent patch.  Current semantic is just route it to libm
symbol.

llvm-svn: 360889
2019-05-16 13:15:27 +00:00
Matt Arsenault 828b685ebe RegAllocFast: Improve hinting heuristic
Trace through multiple COPYs when looking for a physreg source. Add
hinting for vregs that will be copied into physregs (we only hinted
for vregs getting copied to a physreg previously).  Give hinted a
register a bonus when deciding which value to spill.  This is part of
my rewrite regallocfast series. In fact this one doesn't even have an
effect unless you also flip the allocation to happen from back to
front of a basic block. Nonetheless it helps to split this up to ease
review of D52010

Patch by Matthias Braun

llvm-svn: 360887
2019-05-16 12:50:39 +00:00
Matt Arsenault 27ac8408f6 GlobalISel: Add DstOp version of buildIntrinsic
llvm-svn: 360879
2019-05-16 12:22:56 +00:00
Matt Arsenault 11be78bc7a GlobalISel: Add buildFConstant for APFloat
llvm-svn: 360853
2019-05-16 04:09:06 +00:00
Matt Arsenault 012ecbbbba GlobalISel: Fix indentation
llvm-svn: 360851
2019-05-16 04:08:46 +00:00
Matt Arsenault 55146d3139 GlobalISel: Add G_FCOPYSIGN
llvm-svn: 360850
2019-05-16 04:08:39 +00:00
Reid Kleckner 4882490349 [codeview] Fix SDNode representation of annotation labels
Before this change, they were erroneously constructed with the EH_LABEL
SDNode opcode, which caused other passes to interact with them in
incorrect ways. See the FIXME about fastisel that this addresses in the
existing test case.

Fixes PR41890

llvm-svn: 360818
2019-05-15 21:46:05 +00:00
Nicolai Haehnle f672b6170c [MachineOperand] Add a ChangeToGA method
Summary:
Analogous to the other ChangeToXXX methods. See the next patch for a
use case.

Change-Id: I6548d614706834fb9109ab3c8fe915e9c6ece2a7

Reviewers: arsenm, kzhuravl

Subscribers: wdng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61651

llvm-svn: 360789
2019-05-15 17:48:10 +00:00
Nicolai Haehnle 664ceeda68 RegAlloc: try to fail more gracefully when out of registers
Summary:
The emitError path allows the program to continue, unlike report_fatal_error.
This is friendlier to use cases where LLVM is embedded in a larger program,
because the caller may be able to deal with the error somewhat gracefully.

Change the number of requested NOP bytes in the AArch64 and PowerPC
test cases to avoid triggering an unrelated assertion. The compilation
still fails, as verified by the test.

Change-Id: Iafb9ca341002a597b82e59ddc7a1f13c78758e3d

Reviewers: arsenm, MatzeB

Subscribers: qcolombet, nemanjai, wdng, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61489

llvm-svn: 360786
2019-05-15 17:29:58 +00:00
Clement Courbet d9d0665d1c [[DAGCombiner][NFC] Add a comment.
As suggested in D61846.

llvm-svn: 360755
2019-05-15 08:21:18 +00:00
Fangrui Song f4dfd63c74 [IR] Disallow llvm.global_ctors and llvm.global_dtors of the 2-field form in textual format
The 3-field form was introduced by D3499 in 2014 and the legacy 2-field
form was planned to be removed in LLVM 4.0

For the textual format, this patch migrates the existing 2-field form to
use the 3-field form and deletes the compatibility code.
test/Verifier/global-ctors-2.ll checks we have a friendly error message.

For bitcode, lib/IR/AutoUpgrade UpgradeGlobalVariables will upgrade the
2-field form (add i8* null as the third field).

Reviewed By: rnk, dexonsmith

Differential Revision: https://reviews.llvm.org/D61547

llvm-svn: 360742
2019-05-15 02:35:32 +00:00
Fangrui Song 2f6ef2fc92 DWARF v5: emit DW_AT_addr_base if DW_AT_low_pc references .debug_addr
The condition !AddrPool.empty() is tested before attachRangesOrLowHighPC(), which may add an entry to AddrPool. We emit DW_AT_low_pc (DW_FORM_addrx) but may incorrectly omit DW_AT_addr_base for LineTablesOnly. This can be easily reproduced:

clang -gdwarf-5 -gmlt -c a.cc

Fix this by moving !AddrPool.empty() below.

This was discovered while investigating an lld crash (fixed by D61889) on such object files: ld.lld --gdb-index a.o

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D61891

llvm-svn: 360678
2019-05-14 14:37:26 +00:00
Diana Picus a568222ddd [IRTranslator] Don't hardcode GEP index type
When breaking up loads and stores of aggregates, the IRTranslator uses
LLT::scalar(64) for the index type of the G_GEP instructions that
compute the addresses. This is unnecessarily large for 32-bit targets.
Use the int ptr type provided by the DataLayout instead.

Note that we're already doing the right thing when translating
getelementptr instructions from the IR. This is just an oversight when
generating new ones while translating loads/stores.

Both x86 and AArch64 already have tests confirming that the old
behaviour is preserved for 64-bit targets.

Differential Revision: https://reviews.llvm.org/D61852

llvm-svn: 360656
2019-05-14 09:25:17 +00:00
Sanjay Patel 99d6420a82 [SDAG] fix unused variable warning and unneeded indirection; NFC
llvm-svn: 360640
2019-05-14 00:57:31 +00:00
Sanjay Patel 3a13d970aa [SDAG, x86] allow targets to override test for binop opcodes
This follows the pattern of the existing isCommutativeBinOp().

x86 shows improvements from vector narrowing for the min/max opcodes.

llvm-svn: 360639
2019-05-14 00:39:40 +00:00
Nick Desaulniers c33f754e74 [TargetLowering] Handle multi depth GEPs w/ inline asm constraints
Summary:
X86TargetLowering::LowerAsmOperandForConstraint had better support than
TargetLowering::LowerAsmOperandForConstraint for arbitrary depth
getelementpointers for "i", "n", and "s" extended inline assembly
constraints. Hoist its support from the derived class into the base
class.

Link: https://github.com/ClangBuiltLinux/linux/issues/469

Reviewers: echristo, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61560

llvm-svn: 360604
2019-05-13 17:27:44 +00:00
Simon Pilgrim d3cedee3c6 [TargetLowering] Add SimplifyDemandedBits support for ZERO_EXTEND_VECTOR_INREG
More work for PR39709.

llvm-svn: 360592
2019-05-13 15:51:26 +00:00
Sanjay Patel 05dafb1c97 [DAGCombiner] narrow vector binop with inserts/extract
We catch most of these patterns (on x86 at least) by matching
a concat vectors opcode early in combining, but the pattern may
emerge later using insert subvector instead.

The AVX1 diffs for add/sub overflow show another missed narrowing
pattern. That one may be falling though the cracks because of
combine ordering and multiple uses.

llvm-svn: 360585
2019-05-13 14:31:14 +00:00
Kevin P. Neal 5987749e33 Add constrained fptrunc and fpext intrinsics.
The new fptrunc and fpext intrinsics are constrained versions of the
regular fptrunc and fpext instructions.

Reviewed by:	Andrew Kaylor, Craig Topper, Cameron McInally, Conner Abbot
Approved by:	Craig Topper
Differential Revision: https://reviews.llvm.org/D55897

llvm-svn: 360581
2019-05-13 13:23:30 +00:00
Simon Pilgrim d845bc3d0c TargetLowering::SimplifyDemandedBits - early-out for UNDEF ops. NFCI.
llvm-svn: 360579
2019-05-13 12:44:03 +00:00
Clement Courbet 9afc4764dd [DAGCombiner] Fix invalid alias analysis.
Summary:
When we know for sure whether two addresses do or do not alias, we
should immediately return from DAGCombiner::isAlias().

I think this comes from a bad copy/paste, Sorry for not catching that during the
code review.

Fixes PR41855.

Reviewers: niravd, gchatelet, EricWF

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61846

llvm-svn: 360566
2019-05-13 09:07:37 +00:00
Craig Topper 61e556d2bd Recommit r358887 "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"
I've included a new fix in X86RegisterInfo to prevent PR41619 without
reintroducing r359392. We might be able to improve that in the base class
implementation of shouldRewriteCopySrc somehow. But this hopefully enables
forward progress on SimplifyDemandedBits improvements for now.

Original commit message:

This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.

The AMDGPU backend needed an extra  (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGComb
but it caused a lot of noise on other targets - some improvements, some regressions.

The X86 changes are all definite wins.

llvm-svn: 360552
2019-05-13 04:03:35 +00:00
Sanjay Patel a09e686821 [DAGCombiner] try to move bitcast after extract_subvector
I noticed that we were failing to narrow an x86 ymm math op in a case similar
to the 'madd' test diff. That is because a bitcast is sitting between the math
and the extract subvector and thwarting our pattern matching for narrowing:

       t56: v8i32 = add t59, t58
      t68: v4i64 = bitcast t56
    t73: v2i64 = extract_subvector t68, Constant:i64<2>
  t96: v4i32 = bitcast t73

There are a few wins and neutral diffs in the other tests.

Differential Revision: https://reviews.llvm.org/D61806

llvm-svn: 360541
2019-05-12 14:43:20 +00:00
Simon Pilgrim 605a840747 [DAG] Add SimplifyDemandedBits support for BITREVERSE
Pulled out of D58017 while I continue to investigate the BSWAP regression on PPC

llvm-svn: 360534
2019-05-11 20:56:05 +00:00
Simon Pilgrim aeed0a30c0 SelectionDAGISel::CodeGenAndEmitDAG - remove unused variable. NFCI.
llvm-svn: 360514
2019-05-11 11:00:37 +00:00
Jordan Rupprecht 16c7fbd112 Revert [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor
This reverts r360171 (git commit a9d6c32eaf). A repro showing the asan/msan failures is forthcoming.

llvm-svn: 360481
2019-05-10 23:20:02 +00:00
Craig Topper 114f763f37 [LegalizeVectorOps] Remove calls to LegalizeOp on the return value from ExpandLoad/ExpandStore.
We already updated the LegalizedNodes map at the end of the Expand call. This
would have marked the new node as being mapped to itself. So the LegalizeOp
call will find that an immediately return.

llvm-svn: 360472
2019-05-10 21:42:27 +00:00
Nikita Popov 9f7537bd48 [SDAG] Recursively legalize both vector mulo results
Split out from D61692 per RKSimon's suggestion. Vector op
legalization will automatically recursively legalize the returned
SDValue, but we need to take care of the other results ourselves.
Otherwise it will end up getting legalized only during op
legalization, by which point it might be too late (though I'm not
aware of any specific cases right now).

There are codegen differences because expansion occurs earlier now
and we don't get a DAGCombiner run in between.

Differential Revision: https://reviews.llvm.org/D61744

llvm-svn: 360470
2019-05-10 20:42:48 +00:00
Sanjay Patel b37ddeafc0 [DAGCombiner] reduce code duplication; NFC
llvm-svn: 360462
2019-05-10 20:02:30 +00:00
David Blaikie 7598b71488 DebugInfo: Only move types out of type units if they're named or type united
Follow up to r359122, after a bug was reported in it - the original
change too aggressively tried to move related types out of type units,
which included unnamed types (like array types) which can't reasonably
be declared-but-not-defined.

A step beyond that is that some types in type units can be anonymous, if
they are types with a name for linkage purposes (eg: "typedef struct { }
x;"). So ensure those don't get turned into plain declarations (without
signatures) because, lacking names, they can't be resolved to the
definition.

[Also include a fix for llvm-dwarfdump/libDebugInfoDWARF to pretty print
types in type units]

llvm-svn: 360458
2019-05-10 19:15:29 +00:00
Momchil Velikov c396f09ce9 Adjust MachineScheduler to use ProcResource counts
This fix allows the scheduler to take into account the number of instances of
each ProcResource specified. Previously a declaration in a scheduler of
ProcResource<1> would be treated identically to a declaration of
ProcResource<2>. Now the hazard recognizer would report a hazard only after all
of the resource instances are busy.

Patch by Jackson Woodruff and Momchil Velikov.

Differential Revision: https://reviews.llvm.org/D51160

llvm-svn: 360441
2019-05-10 16:54:32 +00:00
Tim Northover 6c1e3f9493 SelectionDAG: accommodate atomic floating stores.
We were applying a pointer truncation to floating types, which crashed LLVM.
That is Not A Good Thing(TM).

llvm-svn: 360421
2019-05-10 11:23:04 +00:00
Cameron McInally 156eb28289 [CodeGen] Add comment about FSUB <-> FNEG xforms
Differential Revision: https://reviews.llvm.org/D61741

llvm-svn: 360366
2019-05-09 19:28:52 +00:00
Florian Hahn be10bc71f9 [DAGCombiner] Limit number of nodes explored as store candidates.
To find the candidates to merge stores we iterate over all nodes in a chain
for each store, which leads to quadratic compile times for large basic blocks
with a large number of stores.

Reviewers: niravd, spatel, craig.topper

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D61511

llvm-svn: 360357
2019-05-09 17:05:52 +00:00
Simon Pilgrim 38ef296265 [CodeGenPrepare] Ensure we get a non-null result from getTrueOrFalseValue. NFCI.
llvm-svn: 360328
2019-05-09 10:51:26 +00:00
Markus Lavin 92d5db524e Make sub-registers index names case sensitive in the MIRParser
Prior to this change sub-register index names are assumed to be lower
case (but they are printed with original casing). This means that if a
target has some upper case characters in its sub-register names then
mir-export directly followed by mir-import is not possible. This also
means that sub-register indices currently are (and will continue to be)
slightly inconsistent with register names which are printed and assumed
to be lower case.

As the current textual representation of mir has a few inconsistencies
in this area it is a bit arbitrary how to address the matter. This
change is towards the direction that we feel is most correct (i.e. case
sensitivity).

Differential Revision: https://reviews.llvm.org/D61499

llvm-svn: 360318
2019-05-09 08:29:04 +00:00
Pengfei Wang c05aad0532 Bugfix for nullptr check by klocwork
Klocwork static check:
Pointer from call to function `DebugLoc::operator DILocation *() const `
may be NULL and will be dereference in function `printExtendedName```
Patch by Shengchen Kan (skan)
Differential Revision: https://reviews.llvm.org/D61715

llvm-svn: 360317
2019-05-09 08:09:21 +00:00
Bjorn Pettersson 8d19e94f13 [CodeGen] Use "DL.getPointerSizeInBits" instead of "8 * DL.getPointerSize". NFC
llvm-svn: 360315
2019-05-09 08:07:36 +00:00
Leonard Chan 95b7abdcc5 [SelectionDAG] Expand ADD/SUBCARRY
This patch allows for expansion of ADDCARRY and SUBCARRY when the target does not support it.

Differential Revision: https://reviews.llvm.org/D61411

llvm-svn: 360303
2019-05-09 01:17:48 +00:00
Eric Christopher c93f56d39e Temporarily Revert "[DebugInfo] Terminate more location-list ranges at the end of blocks"
as it was causing significant compile time regressions.

This reverts commit r359426 while we come up with testcases and additional ideas.

llvm-svn: 360301
2019-05-08 23:54:03 +00:00
Sanjay Patel 902b3ecdad [SelectionDAG] fold 'fneg undef' to undef
This is extracted from the original draft of D61419 with some additional tests.
We don't currently get this in IR (it's conservatively turned into a NaN),
but presumably that'll get updated as we add real IR support for 'fneg'
rather than 'fsub -0.0, x'.

The x86-32 run shows the following, and I haven't looked further to see why,
but that seems to be independent:
  Legalizing: t1: f32 = undef
  Trying to expand node
  Creating fp constant: t4: f32 = ConstantFP<0.000000e+00>

Differential Revision: https://reviews.llvm.org/D61516

llvm-svn: 360296
2019-05-08 22:19:52 +00:00
Quentin Colombet 157427245a [RegAllocFast] Scan physcial reg definitions before assigning virtual reg definitions
When assigning the definitions of an instruction we were updating
the available registers while walking the definitions. Some of
those definitions may be from physical registers and thus, they are
not available for other definitions to take, but by the time we see
that we may have already assign these registers to another
virtual register.

Fix that by walking through all the definitions and mark as unavailable
the physical register definitions, then do the virtual register assignments.

PR41790

llvm-svn: 360278
2019-05-08 18:30:26 +00:00
Craig Topper 493aec3ef5 [FastISel][X86] Support FNeg instruction in target independent fast isel handling
This patch adds support for calling selectFNeg for FNeg instructions in addition to the fsub idiom

Differential Revision: https://reviews.llvm.org/D61624

llvm-svn: 360273
2019-05-08 17:27:08 +00:00
Simon Pilgrim 2788ad3ee2 [LegalizeDAG] Assert non-power-of-2 load/store op splits are in range. NFCI.
Fixes static analyzer undefined/out-of-range shift warnings.

llvm-svn: 360245
2019-05-08 11:22:10 +00:00
Simon Pilgrim 97a0c54179 Fix cppcheck operator precedence warning. NFCI.
llvm-svn: 360234
2019-05-08 10:07:34 +00:00
QingShan Zhang 0e71a6e755 [CodeGenPrepare] Don't split the store if it is volatile
We shouldn't split the store when it is volatile.

Differential Revision: https://reviews.llvm.org/D61169

llvm-svn: 360228
2019-05-08 07:32:12 +00:00
QingShan Zhang e065af6a42 [NFC] Add a static function to do the endian check
Add a new function to do the endian check, as I will commit another patch later, which will also need the endian check. 

Differential Revision: https://reviews.llvm.org/D61236

llvm-svn: 360226
2019-05-08 07:21:37 +00:00
Austin Kerbow 6e6480e216 [CodeGen] Rename DEBUG_TYPE for default hazard recognizer.
Summary:
The DEBUG_TYPE of the default hazard recognizer should be updated to
match the DEBUG_TYPE of the machine-scheduler pass.

Reviewers: rampitec

Reviewed By: rampitec

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61359

llvm-svn: 360198
2019-05-07 22:09:04 +00:00
Adrian Prantl e6e8db5e9b Debug Info: Support address space attributes on rvalue references.
DWARF5, 2.12 20ff says that

Any debugging information entry representing a pointer or reference
type [may have a DW_AT_address_class attribute].

The existing code (https://reviews.llvm.org/D29670) seems to take a
quite literal interpretation of that wording. I don't see a reason why
an rvalue reference isn't a reference type in the spirit of that
paragraph. This patch allows rvalue references to also have address
spaces.

rdar://problem/50511483

Differential Revision: https://reviews.llvm.org/D61625

llvm-svn: 360176
2019-05-07 17:42:38 +00:00
Florian Hahn a9d6c32eaf [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor
When simplifying TokenFactors, we potentially iterate over all
operands of a large number of TokenFactors. This causes quadratic
compile times in some cases and the large token factors cause additional
scalability problems elsewhere.

This patch adds some limits to the number of nodes explored for the
cases mentioned above.

Reviewers: niravd, spatel, craig.topper

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D61397

llvm-svn: 360171
2019-05-07 16:47:27 +00:00
Simon Pilgrim 3044ac058b Avoid use-after-move warnings by using swap instead. NFCI.
Swap should be as quick in these cases, and leaves the original variables in a known (empty) state.

llvm-svn: 360164
2019-05-07 15:45:00 +00:00
Craig Topper c6d445f9c1 [FastISel][X86] If selectFNeg fails, fall back to SelectionDAG not treating it as an fsub.
Summary:
If fneg lowering for fsub -0.0, x fails we currently fall back to treating it as an fsub. This has different behavior for nans than the xor with sign bit trick we normally try to do. On X86, the xor trick for double fails fast-isel in 32-bit mode with sse2 due to 64 bit integer types not being available. With -O2 we would always use an xorpd for this case. If we use subsd, this creates an observable behavior difference between -O0 and -O2. So fall back to SelectionDAG if we can't fast-isel it, that way SelectionDAG will use the xorpd.

I believe this patch is restoring the behavior prior to r345295 from last October. This was missed then because our fast isel case in 32-bit mode aborted fast-isel earlier for another reason. But I've added new tests to cover that.

Reviewers: andrew.w.kaylor, cameron.mcinally, spatel, efriedma

Reviewed By: cameron.mcinally

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61622

llvm-svn: 360111
2019-05-07 04:25:24 +00:00
Fangrui Song da82ce99b7 [DebugInfo] Delete TypedDINodeRef
TypedDINodeRef<T> is a redundant wrapper of Metadata * that is actually a T *.

Accordingly, change DI{Node,Scope,Type}Ref uses to DI{Node,Scope,Type} * or their const variants.
This allows us to delete many resolve() calls that clutter the code.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D61369

llvm-svn: 360108
2019-05-07 02:06:37 +00:00
Amy Huang 987b969bab Fix bug in getCompleteTypeIndex in codeview debug info
Summary:
When there are multiple instances of a forward decl record type, only the first one is emitted with a type index, because
the type is added to a map with a null type index. Avoid this by reordering so that forward decl types aren't added to the map.

Reviewers: rnk

Subscribers: aprantl, hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61460

llvm-svn: 360101
2019-05-06 23:37:03 +00:00
Craig Topper 39f1a97417 [FastISel] Pass the fneg input operand to hasTrivialKill in FastISel::selectFNeg.
We're trying to calculate the kill flag for OpReg which is the input so we need to pass the input here.

llvm-svn: 360097
2019-05-06 23:09:09 +00:00
Philip Reames 2f53d79bff Fix pr33010, a 2 year old crashing regression
The problem was that we were creating a CMOV64rr <TargetFrameIndex>, <TargetFrameIndex>.  The entire point of a TFI is that address code is not generated, so there's no way to legalize/lower this.  Instead, simply prevent it's creation.

Arguably, we shouldn't be using *Target*FrameIndices in StatepointLowering at all, but that's a much deeper change.  

llvm-svn: 360090
2019-05-06 22:09:31 +00:00
Craig Topper ad56843dd7 [SelectionDAG][X86] Support inline assembly returning an mmx register into a type with fewer than 64 bits.
It's possible to use the 'y' mmx constraint with a type narrower than 64-bits.

This patch supports this by bitcasting the mmx type to 64-bits and then
truncating to the desired type.

There are probably other missing type combinations we need to support, but this
is the case we have a bug report for.

Fixes PR41748.

Differential Revision: https://reviews.llvm.org/D61582

llvm-svn: 360069
2019-05-06 19:50:14 +00:00
Craig Topper 55a71b575c Revert r359392 and r358887
Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead"
Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"

Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list.

Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead
moving to the integer domain(in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer
contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with
respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work
there. I'll file a separate PR for that and add test cases.

Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised
if that bug can still be hit independent of that.

This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again.

llvm-svn: 360066
2019-05-06 19:29:24 +00:00
Nikita Popov cfe786a195 [SDAG][AArch64] Boolean and/or reduce to umax/min reduce (PR41635)
This addresses one half of https://bugs.llvm.org/show_bug.cgi?id=41635
by combining a VECREDUCE_AND/OR into VECREDUCE_UMIN/UMAX (if latter is
legal but former is not) for zero-or-all-ones boolean reductions (which
are detected based on sign bits).

Differential Revision: https://reviews.llvm.org/D61398

llvm-svn: 360054
2019-05-06 16:17:17 +00:00
Craig Topper f723490e76 [SelectionDAG] Replace llvm_unreachable at the end of getCopyFromParts with a report_fatal_error.
Based on PR41748, not all cases are handled in this function.

llvm_unreachable is treated as an optimization hint than can prune code paths
in a release build. This causes weird behavior when PR41748 is encountered on a
release build. It appears to generate an fp_round instruction from the floating
point code.

Making this a report_fatal_error prevents incorrect optimization of the code
and will instead generate a message to file a bug report.

llvm-svn: 360008
2019-05-06 04:01:49 +00:00
Roman Lebedev 1a1b922177 [NFC] BasicBlock: refactor changePhiUses() out of replacePhiUsesWith(), use it
Summary:
It is a common thing to loop over every `PHINode` in some `BasicBlock`
and change old `BasicBlock` incoming block to a new `BasicBlock` incoming block.
`replaceSuccessorsPhiUsesWith()` already had code to do that,
it just wasn't a function.
So outline it into a new function, and use it.

Reviewers: chandlerc, craig.topper, spatel, danielcdh

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61013

llvm-svn: 359996
2019-05-05 18:59:39 +00:00
Roman Lebedev e3b1d82b53 [NFC] PHINode: introduce replaceIncomingBlockWith() function, use it
Summary:
There is `PHINode::getBasicBlockIndex()`, `PHINode::setIncomingBlock()`
and `PHINode::getNumOperands()`, but no function to replace every
specified `BasicBlock*` predecessor with some other specified `BasicBlock*`.
Clearly, there are a lot of places that could use that functionality.

Reviewers: chandlerc, craig.topper, spatel, danielcdh

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61011

llvm-svn: 359995
2019-05-05 18:59:30 +00:00
Simon Pilgrim 0f89b76b84 [SelectionDAG] Use any_of/all_of where possible. NFCI.
llvm-svn: 359974
2019-05-05 10:30:04 +00:00
Sanjay Patel 5ab41a7a05 [CodeGenPrepare] limit overflow intrinsic matching to a single basic block (2nd try)
This is a subset of the original commit from rL359879
which was reverted because it could crash when using the 'RemovedInstructions'
structure that enables delayed deletion of dead instructions. The motivating
compile-time win does not require that change though. We should get most of
that win from this change alone.

Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.

See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html

Differential Revision: https://reviews.llvm.org/D61075

llvm-svn: 359969
2019-05-04 12:46:32 +00:00
Matt Arsenault b6c599afd3 Reapply r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block"
This reverts commit r359912.

This should pass now, since the clang test was made less fragile in
r359918.

llvm-svn: 359919
2019-05-03 19:06:57 +00:00
Simon Pilgrim 5d3b100750 [DAGCombine] Remove repeated variables. NFCI.
llvm-svn: 359915
2019-05-03 18:20:28 +00:00
Nico Weber bb852a9672 Revert r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block"
Makes clang/test/Misc/backend-stack-frame-diagnostics-fallback.cpp fail.

llvm-svn: 359912
2019-05-03 18:08:03 +00:00
Simon Pilgrim 308b5ec1ff [TargetLowering] SimplifySetCC - remove repeated variable. NFCI.
Also reduce scope of Temp variable.

llvm-svn: 359911
2019-05-03 18:02:33 +00:00
Evgeniy Stepanov 46ec57e576 Revert "[CodeGenPrepare] limit overflow intrinsic matching to a single basic block"
This reverts commit r359879, which introduced a compiler crash.

llvm-svn: 359908
2019-05-03 17:31:49 +00:00
Matt Arsenault daf2d653fa RegAllocFast: Add heuristic to detect values not live-out of a block
Add an improved/new heuristic to catch more cases when values are not
live out of a basic block.

Patch by Matthias Braun

llvm-svn: 359906
2019-05-03 17:03:24 +00:00
Simon Pilgrim d857f64c31 [SelectionDAG] CreateTopologicalOrder - don't use iterator
We shouldn't use an iterator to loop across a std::vector when the same loop is adding elements to that std::vector

Found by cppcheck

llvm-svn: 359900
2019-05-03 15:50:37 +00:00
Simon Pilgrim bc876df3a5 [TargetLowering] ShrinkDemandedConstant - reduce scope of TLO.DAG variable. NFCI.
Only ever used in one block

llvm-svn: 359890
2019-05-03 14:38:24 +00:00
Sanjay Patel 8ff072e48e [CodeGenPrepare] limit overflow intrinsic matching to a single basic block
Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.

Also, we were restarting the iterator loops when doing the overflow intrinsic
transforms by marking the dominator tree for update. That was done to prevent
iterating over a removed instruction. But we can postpone the deletion using
the existing "RemovedInsts" structure, and that means we don't need to update
the DT.

See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html

Differential Revision: https://reviews.llvm.org/D61075

llvm-svn: 359879
2019-05-03 13:09:18 +00:00
Simon Pilgrim e798e3a346 [TargetLowering] expandUnalignedStore - cleanup EVT variables. NFCI.
Avoid duplicated EVTs and rename Store/Load VTs to avoid -Wshadow warnings.

llvm-svn: 359877
2019-05-03 12:55:25 +00:00
Anton Afanasyev 6d08b8dbae Revert "[MIR] Add simple PRE pass to MachineCSE"
This reverts commit 9c20156de3.
It breaks stage 2 of clang-ppc64be-linux-multistage.

llvm-svn: 359875
2019-05-03 12:36:22 +00:00
Simon Pilgrim 42d2b604b5 [SelectionDAG] Use INT_MIN as (1 << 31) is UB for signed integers. NFCI.
llvm-svn: 359873
2019-05-03 11:32:00 +00:00
Simon Pilgrim bfd00a6440 [SelectionDAG] computeKnownBits - remove some duplicate/shadow variables. NFCI.
llvm-svn: 359872
2019-05-03 11:11:03 +00:00
Anton Afanasyev 9c20156de3 [MIR] Add simple PRE pass to MachineCSE
This is the second part of the commit fixing PR38917 (hoisting
partitially redundant machine instruction). Most of PRE (partitial
redundancy elimination) and CSE work is done on LLVM IR, but some of
redundancy arises during DAG legalization. Machine CSE is not enough
to deal with it. This simple PRE implementation works a little bit
intricately: it passes before CSE, looking for partitial redundancy
and transforming it to fully redundancy, anticipating that the next
CSE step will eliminate this created redundancy. If CSE doesn't
eliminate this, than created instruction will remain dead and eliminated
later by Remove Dead Machine Instructions pass.

The third part of the commit is supposed to refactor MachineCSE,
to make it more clear and to merge MachinePRE with MachineCSE,
so one need no rely on further Remove Dead pass to clear instrs
not eliminated by CSE.

First step: https://reviews.llvm.org/D54839

Fixes llvm.org/PR38917

Reviewers: RKSimon

Subscribers: hfinkel, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D56772

llvm-svn: 359870
2019-05-03 10:30:59 +00:00
Quentin Colombet c9256cc6ba [IRTranslator] Use the alloc size instead of the store size when translating allocas
We use to incorrectly use the store size instead of the alloc size when
creating the stack slot for allocas.
On aarch64 this can be demonstrated by allocating weirdly sized types.

For instance, in the added test case, we use an alloca for i19. We used
to allocate a slot of size 24-bit (19 rounded up to the next byte),
whereas we really want to use a full 32-bit slot for this type.

llvm-svn: 359856
2019-05-03 01:23:56 +00:00
Eli Friedman 0b61d220c9 [AArch64][Windows] Compute function length correctly in unwind tables.
The primary fix here is to WinException.cpp: we need to exclude jump
tables when computing the length of a function, or else we fail to
correctly compute the length. (We can only compute the number of bytes
consumed by certain assembler directives after the entire file is
parsed. ".p2align" is one of those directives, and is used by jump table
generation.)

The secondary fix, to MCWin64EH, is to make sure we don't silently
miscompile if we hit a similar situation in the future.

It's possible we could extend ARM64EmitUnwindInfo so it allows function
bodies that contain assembler directives, but that's a lot more
complicated; see the FIXME in MCWin64EH.cpp.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41581 .

Differential Revision: https://reviews.llvm.org/D61095

llvm-svn: 359849
2019-05-03 00:10:45 +00:00
Craig Topper e8a1cde886 [SelectionDAG] Add asserts to verify the vectorness of input and output types of TRUNCATE/ZERO_EXTEND/ANY_EXTEND/SIGN_EXTEND agree
As a result of the underlying cause of PR41678 we created an ANY_EXTEND node with a scalar result type and v1i1 input type. Ideally we would have asserted for this instead of letting it go through to instruction selection and generate bad machine IR

Differential Revision: https://reviews.llvm.org/D61463

llvm-svn: 359836
2019-05-02 22:26:26 +00:00
Sanjay Patel 1972826178 [DAGCombiner] try repeated fdiv divisor transform before building estimate (2nd try)
The original patch was committed at rL359398 and reverted at rL359695 because of
infinite looping.

This includes a fix to check for a vector splat of "1.0" to avoid the infinite loop.

Original commit message:

This was originally part of D61028, but it's an independent diff.

If we try the repeated divisor reciprocal transform before producing an estimate sequence,
then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5
vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the
full-precision division is only 3 cycle throughput, so that's probably the better perf
default option and avoids problems from x86's inaccurate estimates.

The last 2 tests show that users still have the option to override the defaults by using
the function attributes for reciprocal estimates, but those patterns are potentially made
faster by converting the vector ops (including ymm ops) to scalar math.

Differential Revision: https://reviews.llvm.org/D61149

llvm-svn: 359793
2019-05-02 15:02:08 +00:00
Sanjay Patel 284472be6d [SelectionDAG] remove constant folding limitations based on FP exceptions
We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops),
so it does not make sense to have them here in the DAG either. Nothing else in the backend tries
to preserve exceptions (again outside of strict ops), so I don't see how this could have ever
worked for real code that cares about FP exceptions.

There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least
partially) to preserve exceptions without even asking if the target supports FP exceptions. Those
should be corrected in subsequent patches.

Real support for FP exceptions requires several changes to handle the constrained/strict FP ops.

Differential Revision: https://reviews.llvm.org/D61331

llvm-svn: 359791
2019-05-02 14:47:59 +00:00
Sanjay Patel 64d5751254 Revert "[DAGCombiner] try repeated fdiv divisor transform before building estimate"
This reverts commit fb9a5307a9 (rL359398)
because it can cause an infinite loop due to opposing combines.

llvm-svn: 359695
2019-05-01 16:06:21 +00:00
Tim Northover ee2474df9f DAG: allow DAG pointer size different from memory representation.
In preparation for supporting ILP32 on AArch64, this modifies the SelectionDAG
builder code so that pointers are allowed to have a larger type when "live" in
the DAG compared to memory.

Pointers get zero-extended whenever they are loaded, and truncated prior to
stores.  In addition, a few not quite so obvious locations need updating:

  * A GEP that has not been marked inbounds needs to enforce the IR-documented
    2s-complement wrapping at the memory pointer size. Inbounds GEPs are
    undefined if they overflow the address space, so no additional operations
    are needed.
  * Signed comparisons would give incorrect results if performed on the
    zero-extended values.

This shouldn't affect CodeGen for now, but will become active when the AArch64
ILP32 support is committed.

llvm-svn: 359676
2019-05-01 12:37:30 +00:00
Sanjay Patel 0387bf5269 [SelectionDAG] remove div-by-zero constant folding restriction
We don't have this restriction in IR, so it should not be here
either simply out of consistency. Code that wants to handle FP
exceptions is expected to use the 'strict' variants of these
nodes.

We don't get the frem case because frem by 0.0 produces NaN (invalid),
and that's the remaining check here (so the removed check for frem
was dead code AFAIK).

This is the only place in SDAG that uses "HasFPExceptions", so I
think we should remove that entirely as a follow-up patch.

llvm-svn: 359566
2019-04-30 14:37:15 +00:00
Sjoerd Meijer 0ed4619679 [TargetLowering] findOptimalMemOpLowering. NFCI.
This was a local static funtion in SelectionDAG, which I've promoted to
TargetLowering so that I can reuse it to estimate the cost of a memory
operation in D59787.

Differential Revision: https://reviews.llvm.org/D59766

llvm-svn: 359543
2019-04-30 10:09:15 +00:00
Fangrui Song 7bce25cd7d [AsmPrinter] Make AsmPrinter::HandlerInfo::Handler a unique_ptr
Handlers.clear() in AsmPrinter::doFinalization() will destroy these handlers.
A unique_ptr makes the ownership clearer.

llvm-svn: 359541
2019-04-30 09:14:02 +00:00
Sjoerd Meijer 180f1ae57c [TargetLowering] Change getOptimalMemOpType to take a function attribute list
The MachineFunction wasn't used in getOptimalMemOpType, but more importantly,
this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType.

This is the groundwork for the changes in D59766 and D59787, that allows
implementation of TTI::getMemcpyCost.

Differential Revision: https://reviews.llvm.org/D59785

llvm-svn: 359537
2019-04-30 08:38:12 +00:00
Markus Lavin a475da36eb [DebugInfo] DW_OP_deref_size in PrologEpilogInserter.
The PrologEpilogInserter need to insert a DW_OP_deref_size before
prepending a memory location expression to an already implicit
expression to avoid having the existing expression act on the memory
address instead of the value behind it.

The reason for using DW_OP_deref_size and not plain DW_OP_deref is that
big-endian targets need to read the right size as simply truncating a
larger read would yield the wrong result (LSB bytes are not at the lower
address).

This re-commit fixes issues reported in the first one. Namely deref was
inserted under wrong conditions and additionally the deref_size argument
was incorrectly encoded.

Differential Revision: https://reviews.llvm.org/D59687

llvm-svn: 359535
2019-04-30 07:58:57 +00:00
Zi Xuan Wu 49d60fdc2e [DAGCombiner] Do not generate ISD::ADDE node if adde is not legal for the target when combine ISD::TRUNC node
Do not combine (trunc adde(X, Y, Carry)) into (adde trunc(X), trunc(Y), Carry), 
if adde is not legal for the target. Even it's at type-legalize phase. 
Because adde is special and will not be legalized at operation-legalize phase later.

This fixes: PR40922
https://bugs.llvm.org/show_bug.cgi?id=40922

Differential Revision: https://reviews.llvm.org//D60854

llvm-svn: 359532
2019-04-30 03:01:14 +00:00
Simon Pilgrim 9b17b80a0e computePolynomialFromPointer - add missing early-out return for non-pointer types.
Reported in https://www.viva64.com/en/b/0629/

llvm-svn: 359486
2019-04-29 19:25:16 +00:00
Daniel Sanders 8f079844d0 [globalisel] Improve Legalizer debug output
* LegalizeAction should be printed by name rather than number
* Newly created instructions are incomplete at the point the observer first sees
  them. They are therefore recorded in a small vector and printed just before
  the legalizer moves on to another instruction. By this point, the instruction
  must be complete.

llvm-svn: 359481
2019-04-29 18:45:59 +00:00
Bjorn Pettersson 820994572c [DAG] Refactor DAGCombiner::ReassociateOps
Summary:
Extract the logic for doing reassociations
from DAGCombiner::reassociateOps into a helper
function DAGCombiner::reassociateOpsCommutative,
and use that helper to trigger reassociation
on the original operand order, or the commuted
operand order.

Codegen is not identical since the operand order will
be different when doing the reassociations for the
commuted case. That causes some unfortunate churn in
some test cases. Apart from that this should be NFC.

Reviewers: spatel, craig.topper, tstellar

Reviewed By: spatel

Subscribers: dmgreen, dschuff, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61199

llvm-svn: 359476
2019-04-29 17:50:10 +00:00
Jeremy Morse 055aee1d8a [DebugInfo] Terminate more location-list ranges at the end of blocks
This patch fixes PR40795, where constant-valued variable locations can
"leak" into blocks placed at higher addresses. The root of this is that
DbgEntityHistoryCalculator terminates all register variable locations at
the end of each block, but not constant-value variable locations.

Fixing this requires constant-valued DBG_VALUE instructions to be
broadcast into all blocks where the variable location remains valid, as
documented in the LiveDebugValues section of SourceLevelDebugging.rst,
and correct termination in DbgEntityHistoryCalculator.

Differential Revision: https://reviews.llvm.org/D59431

llvm-svn: 359426
2019-04-29 09:13:16 +00:00
Sanjay Patel fb9a5307a9 [DAGCombiner] try repeated fdiv divisor transform before building estimate
This was originally part of D61028, but it's an independent diff.

If we try the repeated divisor reciprocal transform before producing an estimate sequence,
then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5
vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the
full-precision division is only 3 cycle throughput, so that's probably the better perf
default option and avoids problems from x86's inaccurate estimates.

The last 2 tests show that users still have the option to override the defaults by using
the function attributes for reciprocal estimates, but those patterns are potentially made
faster by converting the vector ops (including ymm ops) to scalar math.

Differential Revision: https://reviews.llvm.org/D61149

llvm-svn: 359398
2019-04-28 12:23:43 +00:00
Nick Desaulniers 7ab164c4a4 [AsmPrinter] refactor to support %c w/ GlobalAddress'
Summary:
Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when
printing the address of a MachineOperand::MO_GlobalAddress. Move that
handling into a new overriden method in each base class. A virtual
method was added to the base class for handling the generic case.

Refactors a few subclasses to support the target independent %a, %c, and
%n.

The patch also contains small cleanups for AVRAsmPrinter and
SystemZAsmPrinter.

It seems that NVPTXTargetLowering is possibly missing some logic to
transform GlobalAddressSDNodes for
TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended
inline assembly asm constraints.

Fixes:
- https://bugs.llvm.org/show_bug.cgi?id=41402
- https://github.com/ClangBuiltLinux/linux/issues/449

Reviewers: echristo, void

Reviewed By: void

Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60887

llvm-svn: 359337
2019-04-26 18:45:04 +00:00
Simon Pilgrim ef54b1dddf [DAGCombine] Cleanup visitEXTRACT_SUBVECTOR. NFCI.
Use ArrayRef::slice, reduce some rather awkward long lines for legibility and run clang-format.

llvm-svn: 359326
2019-04-26 17:49:02 +00:00
Simon Pilgrim 5d6ef94c36 [X86][SSE] Disable shouldFoldConstantShiftPairToMask for btver1/btver2 targets (PR40758)
As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs).

Differential Revision: https://reviews.llvm.org/D61068

llvm-svn: 359293
2019-04-26 10:49:13 +00:00
Marcello Maggioni c596584f67 [GlobalISel] Fix inserting copies in the right position for reg definitions
When constrainRegClass is called if the constraining happens on a use the COPY
needs to be inserted before the instruction that contains the MachineOperand,
but if we are constraining a definition it actually needs to be added
after the instruction. In addition, the COPY needs to have its operands
flipped (in the use case we are copying from the old unconstrained register
to the new constrained register, while in the definition case we are copying
from the new constrained register that the instruction defines to the old
unconstrained register).

llvm-svn: 359282
2019-04-26 07:21:56 +00:00
Craig Topper f9c30eddd0 [SelectionDAG][X86] Use stack load/store in PromoteIntRes_BITCAST when the input needs to be be split and the output type is a vector.
We had special case handling here, but it uses a scalar any_extend for the
promotion then bitcasts to the final type. This won't split up the input data
into multiple promoted elements like we need.

This patch falls back to doing the conversion through memory.

Fixes PR41594 which I believe was reflected in the bitcast-vector-bool.ll
changes. The changes to vector-half-conversions.ll are fixing a previously
unknown miscompile from this issue.

Differential Revision: https://reviews.llvm.org/D61114

llvm-svn: 359219
2019-04-25 18:19:59 +00:00
Jessica Paquette ba55767f51 [GlobalISel][AArch64] Legalize G_FNEARBYINT
Add legalizer support for G_FNEARBYINT. It's the same as G_FCEIL etc.

Since the importer allows us to automatically select this after legalization,
also add tests for selection etc. Also update arm64-vfloatintrinsics.ll.

llvm-svn: 359204
2019-04-25 16:44:40 +00:00
Jessica Paquette bd7ac30b15 [GlobalISel] Add IRTranslator support for G_FNEARBYINT
Translate llvm.nearbyint into G_FNEARBYINT as a simple intrinsic. Update
arm64-irtranslator.ll.

Differential Revision: https://reviews.llvm.org/D60922

llvm-svn: 359203
2019-04-25 16:39:28 +00:00
Amy Huang 68c9199493 Recommitting r358783 and r358786 "[MS] Emit S_HEAPALLOCSITE debug info" with fixes for buildbot error (undefined assembler label).
Summary:
This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug
info in codeview. Currently only changes FastISel, so emitting labels still
needs to be implemented in SelectionDAG.

Reviewers: rnk

Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D61083

llvm-svn: 359149
2019-04-24 23:02:48 +00:00
Sanjay Patel 6f41bf948b [DAGCombiner] scale repeated FP divisor by splat factor
If we have a vector FP division with a splatted divisor, use the existing transform
that converts 'x/y' into 'x * (1.0/y)' to allow more conversions. This can then
potentially be converted into a scalar FP division by existing combines (rL358984)
as seen in the tests here.

That can be a potentially big perf difference if scalar fdiv has better timing
(including avoiding possible frequency throttling for vector ops).

Differential Revision: https://reviews.llvm.org/D61028

llvm-svn: 359147
2019-04-24 22:28:58 +00:00
David Blaikie 832c7d9f36 DebugInfo: Emit only declarations (not whole definitions) of non-unit user defined types into type units
While this doesn't come up in reasonable cases currently (the only user
defined types not in type units are ones without linkage - which makes
for near-ODR violations, because it'd be a type with linkage referencing
a type without linkage - such a type can't be validly defined in more
than one TU, so arguably it shouldn't be in a type unit to begin with -
but it's a convenient way to demonstrate an issue that will become more
revalent with homed modular debug info type definitions - which also
don't need to be in type units but more legitimately so).

Precursor to the Clang change to de-type-unit (by omitting the
'identifier') types homed due to strong linkage vtables. (making that
change without this one would lead to major type duplication in type
units)

llvm-svn: 359122
2019-04-24 18:09:44 +00:00
Bjorn Pettersson 71e8c6f20f Add "const" in GetUnderlyingObjects. NFC
Summary:
Both the input Value pointer and the returned Value
pointers in GetUnderlyingObjects are now declared as
const.

It turned out that all current (in-tree) uses of
GetUnderlyingObjects were trivial to update, being
satisfied with have those Value pointers declared
as const. Actually, in the past several of the users
had to use const_cast, just because of ValueTracking
not providing a version of GetUnderlyingObjects with
"const" Value pointers. With this patch we get rid
of those const casts.

Reviewers: hfinkel, materi, jkorous

Reviewed By: jkorous

Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61038

llvm-svn: 359072
2019-04-24 06:55:50 +00:00
Francis Visoiu Mistrih 7fee2b89fd [Remarks] Add string deduplication using a string table
* Add support for uniquing strings in the remark streamer and emitting the string table in the remarks section.

* Add parsing support for the string table in the RemarkParser.

From this remark:

```
--- !Missed
Pass:     inline
Name:     NoDefinition
DebugLoc: { File: 'test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c',
            Line: 7, Column: 3 }
Function: printArgsNoRet
Args:
  - Callee:   printf
  - String:   ' will not be inlined into '
  - Caller:   printArgsNoRet
    DebugLoc: { File: 'test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c',
                Line: 6, Column: 0 }
  - String:   ' because its definition is unavailable'
...
```

to:

```
--- !Missed
Pass: 0
Name: 1
DebugLoc: { File: 3, Line: 7, Column: 3 }
Function: 2
Args:
  - Callee:   4
  - String:   5
  - Caller:   2
    DebugLoc: { File: 3, Line: 6, Column: 0 }
  - String:   6
...
```

And the string table in the .remarks/__remarks section containing:

```
inline\0NoDefinition\0printArgsNoRet\0
test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c\0printf\0
will not be inlined into \0 because its definition is unavailable\0
```

This is mostly supposed to be used for testing purposes, but it gives us
a 2x reduction in the remark size, and is an incremental change for the
updates to the remarks file format.

Differential Revision: https://reviews.llvm.org/D60227

llvm-svn: 359050
2019-04-24 00:06:24 +00:00
Francis Visoiu Mistrih 1646851b87 [CGP] Look through bitcasts when duplicating returns for tail calls
The simple case of:

```
int *callee();
void *caller(void *a) {
  if (a == NULL)
    return callee();
  return a;
}
```

would generate a regular call instead of a tail call because we don't
look through the bitcast of the call to `callee` when duplicating the
return blocks.

Differential Revision: https://reviews.llvm.org/D60837

llvm-svn: 359041
2019-04-23 21:57:46 +00:00
Amy Huang fc79ab9857 Revert "[MS] Emit S_HEAPALLOCSITE debug info" because of ToTWin64(db)
buildbot failure.

This reverts commit d07d6d6177 and
c774f687b6.

llvm-svn: 359034
2019-04-23 21:12:58 +00:00
Jessica Paquette 3cc6d1f542 [AArch64][GlobalISel] Legalize G_INTRINSIC_ROUND
Add it to the same rule as G_FCEIL etc. Add a legalizer test, and add a missing
switch case to AArch64LegalizerInfo.cpp.

llvm-svn: 359033
2019-04-23 21:11:57 +00:00
David Blaikie 2f51176223 Reapply: "DebugInfo: Emit only one kind of accelerated access/name table""
Originally committed in r358931
Reverted in r358997

Seems this change made Apple accelerator tables miss names (because
names started respecting the CU NameTableKind GNU & assuming that
shouldn't produce accelerated names too), which is never correct (apple
accelerator tables don't have separators or CU lists - if present, they
must describe all names in all CUs).

Original Description:
Currently to opt in to debug_names in DWARFv5, the IR must contain
'nameTableKind: Default' which also enables debug_pubnames.

Instead, only allow one of {debug_names, apple_names, debug_pubnames,
debug_gnu_pubnames}.

nameTableKind: Default gives debug_names in DWARFv5 and greater,
debug_pubnames in v4 and earlier - and apple_names when tuning for lldb
on MachO.
nameTableKind: GNU always gives gnu_pubnames

llvm-svn: 359026
2019-04-23 19:00:45 +00:00
Jessica Paquette 56342642a0 [AArch64][GlobalISel] Legalize G_INTRINSIC_TRUNC
Same patch as G_FCEIL etc.

Add the missing switch case in widenScalar, add G_INTRINSIC_TRUNC to the correct
rule in AArch64LegalizerInfo.cpp, and add a test.

llvm-svn: 359021
2019-04-23 18:20:44 +00:00
David Blaikie a2470a4653 Revert "DebugInfo: Emit only one kind of accelerated access/name table"
Regresses some apple_names situations - still investigating.

This reverts commit r358931.

llvm-svn: 358997
2019-04-23 15:03:24 +00:00
Fangrui Song efd94c56ba Use llvm::stable_sort
While touching the code, simplify if feasible.

llvm-svn: 358996
2019-04-23 14:51:27 +00:00
Sanjay Patel 06ff5eae5b [DAGCombiner] generalize binop-of-splats scalarization
If we only match build vectors, we can miss some patterns
that use shuffles as seen in the affected tests.

Note that the underlying calls within getSplatSourceVector()
have the potential for compile-time explosion because of
exponential recursion looking through binop opcodes, but
currently the list of supported opcodes is very limited.
Both of those problems should be addressed in follow-up
patches.

llvm-svn: 358984
2019-04-23 13:16:41 +00:00
Bjorn Pettersson f97b29be88 [DAGCombiner] Combine OR as ADD when no common bits are set
Summary:
The DAGCombiner is rewriting (canonicalizing) an ISD::ADD
with no common bits set in the operands as an ISD::OR node.

This could sometimes result in "missing out" on some
combines that normally are performed for ADD. To be more
specific this could happen if we already have rewritten an
ADD into OR, and later (after legalizations or combines)
we expose patterns that could have been optimized if we
had seen the OR as an ADD (e.g. reassociations based on ADD).

To make the DAG combiner less sensitive to if ADD or OR is
used for these "no common bits set" ADD/OR operations we
now apply most of the ADD combines also to an OR operation,
when value tracking indicates that the operands have no
common bits set.

Reviewers: spatel, RKSimon, craig.topper, kparzysz

Reviewed By: spatel

Subscribers: arsenm, rampitec, lebedev.ri, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59758

llvm-svn: 358965
2019-04-23 10:01:08 +00:00
Chandler Carruth bbddf21f90 Revert "Use const DebugLoc&"
This reverts r358910 (git commit 2b74466530)

While this patch *seems* trivial and safe and correct, it is not. The
copies are actually load bearing copies. You can observe this with MSan
or other ways of checking for use-after-destroy, but otherwise this may
result in ... difficult to debug inexplicable behavior.

I suspect the issue is that the debug location is used after the
original reference to it is removed. The metadata backing it gets
destroyed as its last references goes away, and then we reference it
later through these const references.

llvm-svn: 358940
2019-04-23 01:42:07 +00:00
David Blaikie 68602ab2f3 DebugInfo: Emit only one kind of accelerated access/name table
Currently to opt in to debug_names in DWARFv5, the IR must contain
'nameTableKind: Default' which also enables debug_pubnames.

Instead, only allow one of {debug_names, apple_names, debug_pubnames,
debug_gnu_pubnames}.

nameTableKind: Default gives debug_names in DWARFv5 and greater,
debug_pubnames in v4 and earlier - and apple_names when tuning for lldb
on MachO.
nameTableKind: GNU always gives gnu_pubnames

llvm-svn: 358931
2019-04-22 22:45:11 +00:00
Sanjay Patel bf8aacb715 [SelectionDAG] move splat util functions up from x86 lowering
This was supposed to be NFC, but the change in SDLoc
definitions causes instruction scheduling changes.

There's nothing x86-specific in this code, and it can
likely be used from DAGCombiner's simplifyVBinOp().

llvm-svn: 358930
2019-04-22 22:43:36 +00:00
Matt Arsenault 2b74466530 Use const DebugLoc&
llvm-svn: 358910
2019-04-22 19:14:27 +00:00
Matt Arsenault 8f624abc1d GlobalISel: Legalize scalar G_EXTRACT sources
llvm-svn: 358892
2019-04-22 15:10:42 +00:00
Simon Pilgrim 6276ce0142 [TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling
This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.

The AMDGPU backend needed an extra  (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions.

The X86 changes are all definite wins.

Differential Revision: https://reviews.llvm.org/D60462

llvm-svn: 358887
2019-04-22 14:04:35 +00:00
Sanjay Patel 9bc6c77220 [DAGCombiner] make variable name less ambiguous; NFC
llvm-svn: 358886
2019-04-22 13:42:50 +00:00
Sanjay Patel d6989daae9 [DAGCombiner] prepare shuffle-of-splat to handle more patterns; NFC
llvm-svn: 358884
2019-04-22 13:36:07 +00:00
Amara Emerson 4286652556 Revert r358800. Breaks Obsequi from the test suite.
The last attempt fixed gcc and consumer-typeset, but Obsequi seems to fail with
a different issue.

llvm-svn: 358829
2019-04-20 21:25:00 +00:00
Fangrui Song d3b2682351 [ExecutionDomainFix] Optimize a binary search insertion
llvm-svn: 358815
2019-04-20 13:00:50 +00:00
Amara Emerson eac69e9377 Revert "Revert "[GlobalISel] Add legalization support for non-power-2 loads and stores""
We were shifting the wrong component of a split load when trying to combine them
back into a single value.

llvm-svn: 358800
2019-04-19 23:54:44 +00:00
Jessica Paquette d5c69e0836 [GlobalISel][AArch64] Legalize + select G_FRINT
Exactly the same as G_FCEIL, G_FABS, etc.

Add tests for the fp16/nofp16 behaviour, update arm64-vfloatintrinsics, etc.

Differential Revision: https://reviews.llvm.org/D60895

llvm-svn: 358799
2019-04-19 23:41:52 +00:00
Jessica Paquette ad69af3e95 [GlobalISel] Add IRTranslator support for G_FRINT
Add it as a simple intrinsic, update arm64-irtranslator.ll.

Differential Revision: https://reviews.llvm.org/D60893

llvm-svn: 358787
2019-04-19 21:46:12 +00:00
Amy Huang d07d6d6177 Attempt to fix buildbot failure in commit 1bb57bac959ac163fd7d8a76d734ca3e0ecee6ab.
llvm-svn: 358786
2019-04-19 21:44:30 +00:00
Amy Huang c774f687b6 [MS] Emit S_HEAPALLOCSITE debug info
Summary:
This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug
info in codeview. Currently only changes FastISel, so emitting labels still
needs to be implemented in SelectionDAG.

Reviewers: hans, rnk

Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D60800

llvm-svn: 358783
2019-04-19 21:09:11 +00:00
Amara Emerson 36c5baef49 Revert "[GlobalISel] Add legalization support for non-power-2 loads and stores"
This introduces some runtime failures which I'll need to investigate further.

llvm-svn: 358771
2019-04-19 17:42:13 +00:00
Jessica Paquette dfd87f6fa1 [GlobalISel][AArch64] Legalize vector G_FPOW
This instruction is legalized in the same way as G_FSIN, G_FCOS, G_FLOG10, etc.

Update legalize-pow.mir and arm64-vfloatintrinsics.ll to reflect the change.

Differential Revision: https://reviews.llvm.org/D60218

llvm-svn: 358764
2019-04-19 16:28:08 +00:00
Sanjay Patel e197c617a6 [SelectionDAG] soften splat mask assert/unreachable (PR41535)
These are general queries, so they should not die when given
a degenerate input like an all undef mask. Callers should be
able to deal with an op that will eventually be simplified away.

llvm-svn: 358761
2019-04-19 15:31:11 +00:00
Bjorn Pettersson 238c9d6308 [CodeGen] Add "const" to MachineInstr::mayAlias
Summary:
The basic idea here is to make it possible to use
MachineInstr::mayAlias also when the MachineInstr
is const (or the "Other" MachineInstr is const).

The addition of const in MachineInstr::mayAlias
then rippled down to the need for adding const
in several other places, such as
TargetTransformInfo::getMemOperandWithOffset.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60856

llvm-svn: 358744
2019-04-19 09:08:38 +00:00
James Molloy 9ad4cb3de4 [PATCH] [MachineScheduler] Check pending instructions when an instruction is scheduled
Pending instructions that may have been blocked from being available by the HazardRecognizer may no longer may not be blocked any more when an instruction is scheduled; pending instructions should be re-checked in this case.

This is primarily aimed at VLIW targets with large parallelism and esoteric constraints.

No testcase as no in-tree targets have this behavior.

Differential revision: https://reviews.llvm.org/D60861

llvm-svn: 358743
2019-04-19 09:00:55 +00:00
Ali Tamur 783d84bb39 [llvm] Prevent duplicate files in debug line header in dwarf 5: another attempt
Another attempt to land the changes in debug line header to prevent duplicate
files in Dwarf 5. I rolled back my previous commit because of a mistake in
generating the object file in a test. Meanwhile, I addressed some offline
comments and changed the implementation; the largest difference is that
MCDwarfLineTableHeader does not keep DwarfVersion but gets it as a parameter. I
also merged the patch to fix two lld tests that will strt to fail into this
patch.

Original Commit:

https://reviews.llvm.org/D59515

Original Message:
Motivation: In previous dwarf versions, file name indexes started from 1, and
the primary source file was not explicit. Dwarf 5 standard (6.2.4) prescribes
the primary source file to be explicitly given an entry with an index number 0.

The current implementation honors the specification by just duplicating the
main source file, once with index number 0, and later maybe with another
index number. While this is compliant with the letter of the standard, the
duplication causes problems for consumers of this information such as lldb.
(Some files are duplicated, where only some of them have a line table although
all refer to the same file)

With this change, dwarf 5 debug line section files always start from 0, and
the zeroth entry is not duplicated whenever possible. This requires different
handling of dwarf 4 and dwarf 5 during generation (e.g. when a function returns
an index zero for a file name, it signals an error in dwarf 4, but not in dwarf
5) However, I think the minor complication is worth it, because it enables all
consumers (lldb, gdb, dwarfdump, objdump, and so on) to treat all files in the
file name list homogenously.

llvm-svn: 358732
2019-04-19 02:26:56 +00:00
Michael Berg d573aa0156 [NFC] FMF propagation for GlobalIsel
llvm-svn: 358702
2019-04-18 18:48:57 +00:00
Ali Tamur 6263365b08 Fix a typo in comments. [NFC]
llvm-svn: 358639
2019-04-18 02:39:37 +00:00
Aditya Nandakumar 9266337656 [GISel]:IRTranslator: Prefer a buidInstr form that allows CSE of cast instructions
https://reviews.llvm.org/D60844

Use the style of buildInstr that allows CSEing.

llvm-svn: 358637
2019-04-18 02:19:29 +00:00
Nick Desaulniers 9609ce2f33 [AsmPrinter] hoist %a output template to base class for ARM+Aarch64
Summary:
X86 is quite complicated; so I intend to leave it as is. ARM+Aarch64 do
basically the same thing (Aarch64 did not correctly handle immediates,
ARM has a test llvm/test/CodeGen/ARM/2009-04-06-AsmModifier.ll that uses
%a with an immediate) for a flag that should be target independent
anyways.

Reviewers: echristo, peter.smith

Reviewed By: echristo

Subscribers: javed.absar, eraman, kristof.beyls, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60841

llvm-svn: 358618
2019-04-17 22:21:10 +00:00
Amara Emerson d51adf0568 Add a getSizeInBits() accessor to MachineMemOperand. NFC.
Cleans up a bunch of places where we do getSize() * 8.

Differential Revision: https://reviews.llvm.org/D60799

llvm-svn: 358617
2019-04-17 22:21:05 +00:00
Amara Emerson daf6e66ac5 [GlobalISel] Add legalization support for non-power-2 loads and stores
Legalize things like i24 load/store by splitting them into smaller power of 2 operations.

This matches how SelectionDAG handles these operations.

Differential Revision: https://reviews.llvm.org/D59971

llvm-svn: 358613
2019-04-17 21:30:07 +00:00