Commit Graph

25406 Commits

Author SHA1 Message Date
Craig Topper b4c96f5a32 [SelectionDAG] Remove ISD::ADDC/ADDE from some undef handling code in getNode. NFCI
These nodes should have two results. A real VT and a Glue. But this code would have returned Undef which would only be a single result. But we're in the single result version of getNode so these opcodes should never be seen by this function anyway.

llvm-svn: 348670
2018-12-08 00:27:34 +00:00
Jessica Paquette cc4b6920b3 [GlobalISel] Add IR translation support for the @llvm.log10 intrinsic
This adds IR translation support for @llvm.log10 and updates relevant tests.

https://reviews.llvm.org/D55392

llvm-svn: 348657
2018-12-07 22:08:02 +00:00
Pete Cooper 782a490dfb Follow-up from r348441 to add the rest of the objc ARC intrinsics.
This adds the other intrinsics used by ARC and codegen's them to their respective runtime methods.

llvm-svn: 348646
2018-12-07 21:28:47 +00:00
Sanjay Patel bc47ff86fe [DAGCombiner] split trunc from extend in hoistLogicOpWithSameOpcodeHands; NFC
This duplicates several shared checks, but we need to split
this up to fix underlying bugs in smaller steps.

llvm-svn: 348627
2018-12-07 18:51:08 +00:00
Sanjay Patel 3af4ae9735 [DAGCombiner] disable truncation of binops by default
As discussed in the post-commit thread of r347917, this
transform is fighting with an existing transform causing
an infinite loop or out-of-memory, so this is effectively 
reverting r347917 and its follow-up r348195 while we
investigate the bug.

llvm-svn: 348604
2018-12-07 15:47:52 +00:00
Sanjay Patel bb796cd61c [DAGCombiner] remove explicit calls to AddToWorkList; NFCI
As noted in the post-commit thread for rL347917:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608936.html
...we don't need to repeat these calls because the combiner does it automatically.

llvm-svn: 348597
2018-12-07 15:00:56 +00:00
Simon Pilgrim d498dee7a2 [SelectionDAG] Don't pass on DemandedElts when handling SCALAR_TO_VECTOR
Fixes an assertion:

llc: lib/CodeGen/SelectionDAG/SelectionDAG.cpp:2200: llvm::KnownBits llvm::SelectionDAG::computeKnownBits(llvm::SDValue, const llvm::APInt&, unsigned int) const: Assertion `(!Op.getValueType().isVector() || NumElts == Op.getValueType().getVectorNumElements()) && "Unexpected vector size"' failed.

Committed on behalf of: @pendingchaos (Rhys Perry)

Differential Revision: https://reviews.llvm.org/D55223

llvm-svn: 348574
2018-12-07 09:18:44 +00:00
Sanjay Patel c6441c8547 [DAGCombiner] use root SDLoc for all nodes created by logic fold
If this is not a valid way to assign an SDLoc, then we get this
wrong all over SDAG.

I don't know enough about the SDAG to explain this. IIUC, theoretically,
debug info is not supposed to affect codegen. But here it has clearly
affected 3 different targets, and the x86 change is an actual improvement.

llvm-svn: 348552
2018-12-07 00:01:57 +00:00
Sanjay Patel 86cb679851 [DAGCombiner] don't bother saving a SDLoc for a node that's dead; NFCI
We shouldn't care about the debug location for a node that
we're creating, but attaching the root of the pattern should
be the best effort. (If this is not true, then we are doing
it wrong all over the SDAG).

This is no-functional-change-intended, and there are no
regression test diffs...and that's what I expected. But
there's a similar line above this diff, where those
assumptions apparently do not hold.

llvm-svn: 348550
2018-12-06 23:53:58 +00:00
Sanjay Patel 276cef343c [DAGCombiner] more clean up in hoistLogicOpWithSameOpcodeHands(); NFC
This code can still misbehave.

llvm-svn: 348547
2018-12-06 23:39:28 +00:00
Sanjay Patel 70af85b0ac [DAGCombiner] don't group bswap with casts in logic hoisting fold
This was probably organized as it was because bswap is a unary op.
But that's where the similarity to the other opcodes ends. We should
not limit this transform to scalars, and we should not try it if
either input has other uses. This is another step towards trying to
clean this whole function up to prevent it from causing infinite loops
and memory explosions. 

Earlier commits in this series:
rL348501
rL348508
rL348518

llvm-svn: 348534
2018-12-06 22:10:44 +00:00
Sanjay Patel 03a3ef2a0c [DAGCombiner] reduce indent; NFC
Unlike some of the folds in hoistLogicOpWithSameOpcodeHands()
above this shuffle transform, this has the expected hasOneUse()
checks in place.

llvm-svn: 348523
2018-12-06 20:02:47 +00:00
Andrea Di Biagio 52a2bac583 [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef.
This patch introduces a new DAGCombiner rule to simplify concat_vectors nodes:

concat_vectors( bitcast (scalar_to_vector %A), UNDEF)
    --> bitcast (scalar_to_vector %A)

This patch only partially addresses PR39257. In particular, it is enough to fix
one of the two problematic cases mentioned in PR39257. However, it is not enough
to fix the original test case posted by Craig; that particular case would
probably require a more complicated approach (and knowledge about used bits).

Before this patch, we used to generate the following code for function PR39257
(-mtriple=x86_64 , -mattr=+avx):

vmovsd  (%rdi), %xmm0           # xmm0 = mem[0],zero
vxorps  %xmm1, %xmm1, %xmm1
vblendps        $3, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3]
vmovaps %ymm0, (%rsi)
vzeroupper
retq

Now we generate this:

vmovsd  (%rdi), %xmm0           # xmm0 = mem[0],zero
vmovaps %ymm0, (%rsi)
vzeroupper
retq

As a side note: that VZEROUPPER is completely redundant...

I guess the vzeroupper insertion pass doesn't realize that the definition of
%xmm0 from vmovsd is already zeroing the upper half of %ymm0. Note that on
%-mcpu=btver2, we don't get that vzeroupper because pass vzeroupper insertion
%pass is disabled.

Differential Revision: https://reviews.llvm.org/D55274

llvm-svn: 348522
2018-12-06 19:55:38 +00:00
Sanjay Patel bfc7ffa40f [DAGCombiner] don't hoist logic op if operands have other uses, part 2
The PPC test with 2 extra uses seems clearly better by avoiding this transform. 
With 1 extra use, we also prevent an extra register move (although that might
be an RA problem). The general rule should be to only make a change here if
it is always profitable. The x86 diffs are all neutral.

llvm-svn: 348518
2018-12-06 19:18:56 +00:00
Simon Pilgrim 845d5a0aa8 Fix Wdocumentation warning. NFCI.
llvm-svn: 348517
2018-12-06 19:17:28 +00:00
Sanjay Patel c3717cd0d5 [DAGCombiner] don't hoist logic op if operands have other uses
The AVX512 diffs are neutral, but the bswap test shows a clear overreach in 
hoistLogicOpWithSameOpcodeHands(). If we don't check for other uses, we can 
increase the instruction count.

This could also fight with transforms trying to go in the opposite direction 
and possibly blow up/infinite loop. This might be enough to solve the bug 
noted here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html

I did not add the hasOneUse() checks to all opcodes because I see a perf 
regression for at least one opcode. We may decide that's irrelevant in the
face of potential compiler crashing, but I'll see if I can salvage that first.

llvm-svn: 348508
2018-12-06 18:16:32 +00:00
Sanjay Patel e9bf78fa23 [DAGCombiner] refactor function that hoists bitwise logic; NFCI
Added FIXME and TODO comments for lack of safety checks.
This function is a suspect in out-of-memory errors as discussed in
the follow-up thread to r347917:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html

llvm-svn: 348501
2018-12-06 17:08:03 +00:00
Simon Pilgrim 105a366254 DAGCombiner::visitINSERT_VECTOR_ELT - pull out repeated VT.getVectorNumElements(). NFCI.
llvm-svn: 348494
2018-12-06 15:39:25 +00:00
Markus Lavin 8ba5ee57a0 Test commit: Removed trailing space in .txt file.
llvm-svn: 348483
2018-12-06 13:20:27 +00:00
Pete Cooper e13d0992dc Add objc.* ARC intrinsics and codegen them to their runtime methods.
Reviewers: erik.pilkington, ahatanak

Differential Revision: https://reviews.llvm.org/D55233

llvm-svn: 348441
2018-12-06 00:52:54 +00:00
Jessica Paquette 3cd70b385d [MachineOutliner][NFC] Move yet another std::vector out of a loop
Once again, following the wisdom of the LLVM Programmer's Manual.

I think that's enough refactoring for today. :)

llvm-svn: 348439
2018-12-06 00:26:21 +00:00
Jessica Paquette d4e7d0749b [MachineOutliner][NFC] Move std::vector out of loop
See http://llvm.org/docs/ProgrammersManual.html#vector

llvm-svn: 348433
2018-12-06 00:04:03 +00:00
Jessica Paquette ca3ed964f1 [MachineOutliner][NFC] Remove IntegerInstructionMap from InstructionMapper
Refactoring.

This map was only used when we used a string of integers to output the outlined
sequence. Since it's no longer used for anything, there's no reason to keep it
around.

llvm-svn: 348432
2018-12-06 00:01:51 +00:00
Amara Emerson a0b15d8f3e [GlobalISel] Introduce G_BUILD_VECTOR, G_BUILD_VECTOR_TRUNC and G_CONCAT_VECTOR opcodes.
These opcodes are intended to subsume some of the capability of G_MERGE_VALUES,
as it was too powerful and thus complex to add deal with throughout the GISel
pipeline.

G_BUILD_VECTOR creates a vector value from a sequence of uniformly typed
scalar values. G_BUILD_VECTOR_TRUNC is a special opcode for handling scalar
operands which are larger than the destination vector element type, and
therefore does an implicit truncate.

G_CONCAT_VECTOR creates a vector by concatenating smaller, uniformly typed,
vectors together.

These will be used in a subsequent commit. This commit just adds the initial
infrastructure.

Differential Revision: https://reviews.llvm.org/D53594

llvm-svn: 348430
2018-12-05 23:53:30 +00:00
Jessica Paquette ce3a2dcf70 [MachineOutliner][NFC] Remove buildCandidateList and replace with findCandidates
More refactoring.

Since the pruning logic has changed, and the candidate list is gone,
everything can be sunk into findCandidates.

We no longer need to keep track of the length of the longest substring, so we
can drop all of that logic as well.

After this, we just find all of the candidates and move to outlining.

llvm-svn: 348428
2018-12-05 23:39:07 +00:00
Jessica Paquette e18d6ff036 [MachineOutliner][NFC] Candidates don't need to be shared_ptrs anymore
More refactoring.

After the changes to the pruning logic, and removing CandidateList, there's
no reason for Candiates to be shared_ptrs (or pointers at all).

std::shared_ptr<Candidate> -> Candidate.

llvm-svn: 348427
2018-12-05 23:24:22 +00:00
Jessica Paquette 4ae3b71df0 [MachineOutliner][NFC] Remove CandidateList, since it's now unused.
After removing the pruning logic, there's no reason to populate a list of
Candidates. Remove CandidateList and update comments.

llvm-svn: 348422
2018-12-05 22:50:26 +00:00
Jessica Paquette d9d9309bd4 Fix buildbot capture warning
A bot didn't like my lambda. This ought to fix it.

Example:

http://lab.llvm.org:8011/builders/lld-x86_64-win7/builds/30139/steps/build%20lld/logs/stdio

error C3493: 'AlreadyRemoved' cannot be implicitly captured because no default
capture mode has been specified

llvm-svn: 348421
2018-12-05 22:47:25 +00:00
Jessica Paquette 235d877eea [MachineOutliner][NFC] Simplify and unify pruning/outlining logic
Since we're now performing outlining per OutlinedFunction rather than per
Candidate, we can simply outline each candidate as it shows up.

Instead of having a pruning phase, instead, we'll outline entire functions.
Then we'll update the UnsignedVec we mapped to reflect the deletion. If any
candidate is in a space that's marked dirty, then we'll drop it.

This lets us remove the pruning logic entirely, and greatly simplifies the
code.

llvm-svn: 348420
2018-12-05 22:27:38 +00:00
Jessica Paquette 962b3ae659 [MachineOutliner] Outline functions by order of benefit
Mostly NFC, only change is the order of outlined function names.

Loop over the outlined functions instead of walking the candidate list.

This is a bit easier to understand. It's far more natural to create a function,
then replace all of its occurrences with calls than the other way around.

The functions outlined after this do not change, but their names will be
decided by their benefit. E.g, OUTLINED_FUNCTION_0 will now always be the
most beneficial function, rather than the first one seen.

This makes it easier to enforce an ordering on the outlined functions. So,
this also adds a test to make sure that the ordering works as expected.

llvm-svn: 348414
2018-12-05 21:36:04 +00:00
Aditya Nandakumar f75d4f329c [GISel]: Provide standard interface to observe changes in GISel passes
https://reviews.llvm.org/D54980

This provides a standard API across GISel passes to observe and notify
passes about changes (insertions/deletions/mutations) to MachineInstrs.
This patch also removes the recordInsertion method in MachineIRBuilder
and instead provides method to setObserver.

Reviewed by: vkeles.

llvm-svn: 348406
2018-12-05 20:14:52 +00:00
Jessica Paquette 34b618bf7e [MachineOutliner][NFC] Don't create outlined sequence from integer mapping
Some gardening/refactoring.

It's cleaner to copy the instructions into the MachineFunction using the first
candidate instead of going to the mapper.

Also, by doing this we can remove the Seq member from OutlinedFunction entirely.

llvm-svn: 348390
2018-12-05 17:57:33 +00:00
Sanjay Patel 33a448f935 [DAGCombiner] don't try to extract a fraction of a vector binop and crash (PR39893)
Because we're potentially peeking through a bitcast in this transform,
we need to use overall bitwidths rather than number of elements to
determine when it's safe to proceed.

Should fix:
https://bugs.llvm.org/show_bug.cgi?id=39893

llvm-svn: 348383
2018-12-05 17:10:30 +00:00
Simon Pilgrim 8fdaf5c915 [TargetLowering] Remove ISD::ANY_EXTEND/ANY_EXTEND_VECTOR_INREG opcodes from SimplifyDemandedVectorElts
These have no test coverage and the KnownZero flags can't be guaranteed unlike SIGN/ZERO_EXTEND cases.

llvm-svn: 348361
2018-12-05 12:20:05 +00:00
Simon Pilgrim 180639afe5 [SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467)
This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions.

Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code.

Differential Revision: https://reviews.llvm.org/D54698

llvm-svn: 348353
2018-12-05 11:12:12 +00:00
Simon Pilgrim cd8a152b18 Remove superfluous comments. NFCI.
As requested in D54698.

llvm-svn: 348350
2018-12-05 10:45:44 +00:00
Simon Pilgrim d24730cdda [TargetLowering] SimplifyDemandedVectorElts - don't alter DemandedElts mask
Fix potential issue with the ISD::INSERT_VECTOR_ELT case tweaking the DemandedElts mask instead of using a local copy - so later uses of the mask use the tweaked version.....

Noticed while investigating adding zero/undef folding to SimplifyDemandedVectorElts and the altered DemandedElts mask was causing mismatches.

llvm-svn: 348348
2018-12-05 10:37:45 +00:00
Craig Topper 6934202dc0 [MachineLICM][X86][AMDGPU] Fix subtle bug in the updating of PhysRegClobbers in post-RA LICM
It looks like MCRegAliasIterator can visit the same physical register twice. When this happens in this code in LICM we end up setting the PhysRegDef and then later in the same loop visit the register again. Now we see that PhysRegDef is set from the earlier iteration so now set PhysRegClobber.

This patch splits the loop so we have one that uses the previous value of PhysRegDef to update PhysRegClobber and second loop that updates PhysRegDef.

The X86 atomic test is an improvement. I had to add sideeffect to the two shrink wrapping tests to prevent hoisting from occurring. I'm not sure about the AMDGPU tests. It looks like the branch instruction changed at end the of the loops. And in the branch-relaxation test I think there is now "and vcc, exec, -1" instruction that wasn't there before.

Differential Revision: https://reviews.llvm.org/D55102

llvm-svn: 348330
2018-12-05 03:41:26 +00:00
Amara Emerson 814a6794ba [SelectionDAG] Split very large token factors for loads into 64k chunks.
There's a 64k limit on the number of SDNode operands, and some very large
functions with 64k or more loads can cause crashes due to this limit being hit
when a TokenFactor with this many operands is created. To fix this, create
sub-tokenfactors if we've exceeded the limit.

No test case as it requires a very large function.

rdar://45196621

Differential Revision: https://reviews.llvm.org/D55073

llvm-svn: 348324
2018-12-05 00:41:30 +00:00
Nirav Dave ce26c27b2a [SelectionDAG] Redefine isGAPlusOffset in terms of unwrapAddress. NFCI.
llvm-svn: 348288
2018-12-04 17:59:43 +00:00
Matt Arsenault 43153024ab MIR: Add method to stop after specific runs of passes
Currently if you use -{start,stop}-{before,after}, it picks
the first instance with the matching pass name. If you run
the same pass multiple times, there's no way to distinguish them.

Allow specifying a run index wih ,N to specify which you mean.

llvm-svn: 348285
2018-12-04 17:45:12 +00:00
Simon Pilgrim 0add090e24 [TargetLowering] expandFP_TO_UINT - avoid FPE due to out of range conversion (PR17686)
PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction.

The existing code converts both the inrange and outofrange cases using FP_TO_SINT and then selects the result, this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment.

The X87 cases don't need the strict flag but generates much nicer code with it....

Differential Revision: https://reviews.llvm.org/D53794

llvm-svn: 348251
2018-12-04 11:21:30 +00:00
Simon Pilgrim 666261cdc8 [TargetLowering] Add SimplifyDemandedVectorElts support to EXTEND opcodes
Add support for ISD::*_EXTEND and ISD::*_EXTEND_VECTOR_INREG opcodes.

The extra broadcast in trunc-subvector.ll will be fixed in an upcoming patch.

llvm-svn: 348246
2018-12-04 10:41:06 +00:00
Sanjay Patel d24f63477d [DAGCombiner] narrow truncated vector binops when legal
This is the smallest vector enhancement I could find to D54640.
Here, we're allowing narrowing to only legal vector ops because we'll see
regressions without that. All of the test diffs are wins from what I can tell.
With AVX/AVX512, we can shrink ymm/zmm ops to xmm.

x86 vector multiplies are the problem case that we're avoiding due to the
patchwork ISA, and it's not clear to me if we can dance around those
regressions using TLI hooks or if we need preliminary patches to plug those
holes.

Differential Revision: https://reviews.llvm.org/D55126

llvm-svn: 348195
2018-12-03 21:57:35 +00:00
Craig Topper e35b01f8ea [X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8.
Summary:
Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalizdd to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but Under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction.

The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54836

llvm-svn: 348158
2018-12-03 18:26:24 +00:00
Sanjay Patel b205606d3e [SelectionDAG] fold constant with undef vector per element
This makes the SDAG behavior consistent with the way we do this in IR.
It's possible that we were getting the wrong answer before. For example,
'xor undef, undef --> 0' but 'xor undef, C' --> undef. 

But the most practical improvement is likely as shown in the tests here - 
for FP, we were overconstraining undef lanes to NaN, and that can prevent 
vector simplifications/narrowing (see D51553).

llvm-svn: 348090
2018-12-02 13:48:42 +00:00
Sanjay Patel 2daceedf92 [DAGCombiner] guard against an oversized shift crash
This change prevents the crash noted in the post-commit comments 
for rL347478 :
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181119/605166.html

We can't guarantee that an oversized shift amount is folded away, 
so we have to check for it.

Note that I committed an incomplete fix for that crash with:
rL347502

But as discussed here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181126/605679.html
...we have to try harder.

So I'm not sure how to expose the bug now (and apparently no fuzzers have found 
a way yet either).

On the plus side, we have discovered that we're missing real optimizations by 
not simplifying nodes sooner, so the earlier fix still has value, and there's 
likely more value in extending that so we can simplify more opcodes and simplify 
when doing RAUW and/or putting nodes on the combiner worklist.

Differential Revision: https://reviews.llvm.org/D54954

llvm-svn: 348089
2018-12-02 13:33:56 +00:00
Simon Pilgrim e017ed3245 [SelectionDAG] Improve SimplifyDemandedBits to SimplifyDemandedVectorElts simplification
D52935 introduced the ability for SimplifyDemandedBits to call SimplifyDemandedVectorElts through BITCASTs if the demanded bit mask entirely covered the sub element.

This patch relaxes this to demanding an element if we need any bit from it.

Differential Revision: https://reviews.llvm.org/D54761

llvm-svn: 348073
2018-12-01 12:08:55 +00:00
Nicolai Haehnle a9cc92c247 AMDGPU: Fix various issues around the VirtReg2Value mapping
Summary:
The VirtReg2Value mapping is crucial for getting consistently
reliable divergence information into the SelectionDAG. This
patch fixes a bunch of issues that lead to incorrect divergence
info and introduces tight assertions to ensure we don't regress:

1. VirtReg2Value is generated lazily; there were some cases where
   a lookup was performed before all relevant virtual registers were
   created, leading to an out-of-sync mapping. Those cases were:

  - Complex code to lower formal arguments that generated CopyFromReg
    nodes from live-in registers (fixed by never querying the mapping
    for live-in registers).

  - Code that generates CopyToReg for formal arguments that are used
    outside the entry basic block (fixed by never querying the
    mapping for Register nodes, which don't need the divergence info
    anyway).

2. For complex values that are lowered to a sequence of registers,
   all registers must be reflected in the VirtReg2Value mapping.

I am not adding any new tests, since I'm not actually aware of any
bugs that these problems are causing with trunk as-is. However,
I recently added a test case (in r346423) which fails when D53283 is
applied without this change. Also, the new assertions should provide
most of the effective test coverage.

There is one test change in sdwa-peephole.ll. The underlying issue
is that since the divergence info is now correct, the DAGISel will
select V_OR_B32 directly instead of S_OR_B32. This leads to an extra
COPY which affects the behavior of MachineLICM in a way that ends up
with the S_MOV_B32 with the constant in a different basic block than
the V_OR_B32, which is presumably what defeats the peephole.

Reviewers: alex-t, arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D54340

llvm-svn: 348049
2018-11-30 22:55:29 +00:00
Sanjay Patel 1901a12e76 [SelectionDAG] fold FP binops with 2 undef operands to undef
llvm-svn: 348016
2018-11-30 18:38:52 +00:00