Summary:
This issue from the bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41177
When the two operands for BUILD_VECTOR are same, we will get assert error.
llvm::SDValue combineBVOfConsecutiveLoads(llvm::SDNode*, llvm::SelectionDAG&):
Assertion `!(InputsAreConsecutiveLoads && InputsAreReverseConsecutive) &&
"The loads cannot be both consecutive and reverse consecutive."' failed.
This error caused by the wrong ElemSIze when calling isConsecutiveLS(). We
should use `getScalarType().getStoreSize();` to get the ElemSize instread of
`getScalarSizeInBits() / 8`.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D60811
llvm-svn: 358644
Summary:
None of these derived classes do anything that the base class cannot.
If we remove these case statements, then the base class can handle them
just fine.
Reviewers: peter.smith, echristo
Reviewed By: echristo
Subscribers: nemanjai, javed.absar, eraman, kristof.beyls, hiraditya, kbarton, jsji, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60803
llvm-svn: 358603
Summary:
Some llc debug options need pass-name as the parameters.
But if we use the pass-name ppc-early-ret, we will get below error:
llc test.ll -stop-after ppc-early-ret
LLVM ERROR: "ppc-early-ret" pass is not registered.
Below pass-names have the pass is not registered error:
ppc-ctr-loops
ppc-ctr-loops-verify
ppc-loop-preinc-prep
ppc-toc-reg-deps
ppc-vsx-copy
ppc-early-ret
ppc-vsx-fma-mutate
ppc-vsx-swaps
ppc-reduce-cr-ops
ppc-qpx-load-splat
ppc-branch-coalescing
ppc-branch-select
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D60248
llvm-svn: 358271
Summary:
Some llc debug options need pass-name as the parameters.
But if we use the pass-name ppc-early-ret, we will get below error:
llc test.ll -stop-after ppc-early-ret
LLVM ERROR: "ppc-early-ret" pass is not registered.
Below pass-names have the pass is not registered error:
ppc-ctr-loops
ppc-ctr-loops-verify
ppc-loop-preinc-prep
ppc-toc-reg-deps
ppc-vsx-copy
ppc-early-ret
ppc-vsx-fma-mutate
ppc-vsx-swaps
ppc-reduce-cr-ops
ppc-qpx-load-splat
ppc-branch-coalescing
ppc-branch-select
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D60248
llvm-svn: 358256
There are 3 operands of maddld, (add (mul %1, %2), %3) and sometimes
they are constant. If there is constant operand, it takes extra li to
materialize the operand, and one more extra register too. So it's not
profitable to use maddld to optimize mul-add pattern.
Differential Revision: https://reviews.llvm.org/D60181
llvm-svn: 358253
Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not
seeing through to the constant in other blocks. Revert this patch while we come
up with a better way to handle that.
I will try to follow this up with some better tests.
llvm-svn: 358113
Summary:
Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if
the virtual reg used has one def only.
This can be particularly useful when calling isBaseWithConstantOffset()
with the ISD::CopyFromReg argument, as more optimizations may get enabled
in the result.
Also add a missing truncation on X86, found by testing of this patch.
Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa
Reviewers: bogner, craig.topper, RKSimon
Reviewed By: RKSimon
Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59535
llvm-svn: 357745
Summary:
Nodes that have no uses are eventually pruned when they are selected
from the worklist. Record nodes newly added to the worklist or DAG and
perform pruning after every combine attempt.
Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight
Reviewed By: jyknight
Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58070
llvm-svn: 357283
Summary:
PowerPC64/PowerPC64le supports the builtin function __builtin_setrnd to set the floating point rounding mode. This function will use the least significant two bits of integer argument to set the floating point rounding mode.
double __builtin_setrnd(int mode);
The effective values for mode are:
0 - round to nearest
1 - round to zero
2 - round to +infinity
3 - round to -infinity
Note that the mode argument will modulo 4, so if the int argument is greater than 3, it will only use the least significant two bits of the mode. Namely, builtin_setrnd(102)) is equal to builtin_setrnd(2).
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D59405
llvm-svn: 357241
A shift and add/sub sequence combination is faster in place of a multiply by constant.
Because the cycle or latency of multiply is not huge, we only consider such following
worthy patterns.
```
(mul x, 2^N + 1) => (add (shl x, N), x)
(mul x, -(2^N + 1)) => -(add (shl x, N), x)
(mul x, 2^N - 1) => (sub (shl x, N), x)
(mul x, -(2^N - 1)) => (sub x, (shl x, N))
```
And the cycles or latency is subtarget-dependent so that we need consider the
subtarget to determine to do or not do such transformation.
Also data type is considered for different cycles or latency to do multiply.
Differential Revision: https://reviews.llvm.org/D58950
llvm-svn: 357233
The UseVSXReg flag can be safely removed and the code cleaned up.
Patch By: Yi-Hong Liu
Differential Revision: https://reviews.llvm.org/D58685
llvm-svn: 357028
Various SelectionDAG non-combine operations (e.g. the getNode smart
constructor and legalization) may leave dangling nodes by applying
optimizations or not fully pruning unused result values. This can
result in nodes that are never added to the worklist and therefore can
not be pruned.
Add a node inserter as the current node deleter to make sure such
nodes have the chance of being pruned.
Many minor changes, mostly positive.
llvm-svn: 356996
The 2nd loop calculates spill costs but reports free registers as cost
0 anyway, so there is little benefit from having a separate early
loop.
Surprisingly this is not NFC, as many register are marked regDisabled
so the first loop often picks up later registers unnecessarily instead
of the first one available in the allocation order...
Patch by Matthias Braun
llvm-svn: 356499
Summary:
A number of optimizations are inhibited by single-use TokenFactors not
being merged into the TokenFactor using it. This makes we consider if
we can do the merge immediately.
Most tests changes here are due to the change in visitation causing
minor reorderings and associated reassociation of paired memory
operations.
CodeGen tests with non-reordering changes:
X86/aligned-variadic.ll -- memory-based add folded into stored leaq
value.
X86/constant-combiners.ll -- Optimizes out overlap between stores.
X86/pr40631_deadstore_elision -- folds constant byte store into
preceding quad word constant store.
Reviewers: RKSimon, craig.topper, spatel, efriedma, courbet
Reviewed By: courbet
Subscribers: dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, eraman, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59260
llvm-svn: 356068
Vector imm setting instructions like XXLXORz/XXLXORspz/XXLXORdpz
Should behave like LI8.
We should set corresponding flags to allow rematerialization and other
opts in LICM, RA, Scheduling etc.
Differential Revision: https://reviews.llvm.org/D58645
llvm-svn: 355948
In file PPCBranchSelector.cpp we tend to over estimate code size due to large
alignment and inline assembly. Usually it causes larger computed branch offset,
it is not big problem. But sometimes it may also causes smaller computed branch
offset than actual branch offset. If the offset is close to the limit of
encoding, it may cause problem at run time.
Following is a simplified example.
actual estimated
address address
...
bne Far 100 10c
.p2align 4
Near: 110 110
...
Far: 8108 8108
Actual offset: 0x8108 - 0x100 = 0x8008
Computed offset: 0x8108 - 0x10c = 0x7ffc
The computed offset is at most ((1 << alignment) - 4) bytes smaller than actual
offset. So we add this number to the offset for safety.
Differential Revision: https://reviews.llvm.org/D57718
llvm-svn: 355529
Move the stdu instruction in the prologue and epilogue.
This should provide a small performance boost in functions that are able to do
this. I've kept this change rather conservative at the moment and functions
with frame pointers or base pointers will not try to move the stack pointer
update.
Differential Revision: https://reviews.llvm.org/D42590
llvm-svn: 355085
Summary:
Turns out the test was not correct, I had to adjust the test to work. I
also added CHECK-LABELs for better error messages from FileCheck while
I'm here.
Reviewers: jsji
Subscribers: nemanjai, eraman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58614
llvm-svn: 355079
Summary:
Fast selection of llvm fptoi & fptrunc instructions is not handled well about
VSX instruction support.
We'd use VSX float convert integer instruction instead of non-vsx float convert
integer instruction if the operand register class is VSSRC or VSFRC because i32
and i64 are mapped to VSSRC and VSFRC correspondingly if VSX feature is
openeded.
For float trunc instruction, we do this silimar work like float convert integer
instruction to try to use VSX instruction.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D58430
llvm-svn: 354762
Added four test files to check the existing behaviour of prologue
and epilogue code generation. This patch was done as a setup for
the upcoming patch listed on Phabricator that will change how the
prologue and epilogue work.
The upcoming patch is: https://reviews.llvm.org/D42590
llvm-svn: 353994
If we're comparing some value for equality against 2 constants
and those constants have an absolute difference of just 1 bit,
then we can offset and mask off that 1 bit and reduce to a single
compare against zero:
and/or (setcc X, C0, ne), (setcc X, C1, ne/eq) -->
setcc ((add X, -C1), ~(C0 - C1)), 0, ne/eq
https://rise4fun.com/Alive/XslKj
This transform is disabled by default using a TLI hook
("convertSetCCLogicToBitwiseLogic()").
That should be overridden for AArch64, MIPS, Sparc and possibly
others based on the asm shown in:
https://bugs.llvm.org/show_bug.cgi?id=40611
llvm-svn: 353859
The PowerPC code generator currently scalarizes vector truncates that would fit in a vector register, resulting in vector extracts, scalar operations, and vector merges. This patch custom lowers a vector truncate that would fit in a register to a vector shuffle instead.
Differential Revision: https://reviews.llvm.org/D56507
llvm-svn: 353724
The sqrt case is faster and we already do this for the case where
the exponent is 0.25. This adds the 0.75 case which is also not
sensitive to signed zeros.
Patch by Whitney Tsang (Whitney)
Differential revision: https://reviews.llvm.org/D57434
llvm-svn: 353557
I noticed that we are missing this canonicalization in IR:
rL352515
...and then realized that we don't get this right in SDAG either,
so this has to be fixed first regardless of what we choose to do in IR.
The existing fold was limited to scalars and using the wrong predicate
to guard the transform. We have a boolean contents TLI query that can
be used to decide which direction to fold.
This may eventually lead back to the problems/question in:
https://bugs.llvm.org/show_bug.cgi?id=40486
...but it makes no difference to that yet.
Differential Revision: https://reviews.llvm.org/D57401
llvm-svn: 353433
Summary:
Experimentally we found that promotion to scalars carries less benefits
than sinking and hoisting in LICM. When using MemorySSA, we build an
AliasSetTracker on demand in order to reuse the current infrastructure.
We only build it if less than AccessCapForMSSAPromotion exist in the
loop, a cap that is by default set to 250. This value ensures there are
no runtime regressions, and there are small compile time gains for
pathological cases. A much lower value (20) was found to yield a single
regression in the llvm-test-suite and much higher benefits for compile
times. Conservatively we set the current cap to a high value, but we will
explore lowering it when MemorySSA is enabled by default.
Reviewers: sanjoy, chandlerc
Subscribers: nemanjai, jlebar, Prazek, george.burgess.iv, jfb, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D56625
llvm-svn: 353339
There are 2 changes visible here:
1. There's no reason to limit this transform based on number
of condition registers. That diff allows PPC to produce
slightly better (dot-instructions should be generally good)
code.
Note: someone that cares about PPC codegen might want to
look closer at that output because it seems like we could
still improve this.
2. We (probably?) should not bother trying to form uaddo (or
other overflow ops) when there's no target support for such
an op. This goes beyond checking whether the op is expanded
because both PPC and AArch64 show better codegen for standard
types regardless of whether the op is legal/custom.
llvm-svn: 353001
This fixes most references to the paths:
llvm.org/svn/
llvm.org/git/
llvm.org/viewvc/
github.com/llvm-mirror/
github.com/llvm-project/
reviews.llvm.org/diffusion/
to instead point to https://github.com/llvm/llvm-project.
This is *not* a trivial substitution, because additionally, all the
checkout instructions had to be migrated to instruct users on how to
use the monorepo layout, setting LLVM_ENABLE_PROJECTS instead of
checking out various projects into various subdirectories.
I've attempted to not change any scripts here, only documentation. The
scripts will have to be addressed separately.
Additionally, I've deleted one document which appeared to be outdated
and unneeded:
lldb/docs/building-with-debug-llvm.txt
Differential Revision: https://reviews.llvm.org/D57330
llvm-svn: 352514
If bottom of block BB has only one successor OldTop, in most cases it is profitable to move it before OldTop, except the following case:
-->OldTop<-
| . |
| . |
| . |
---Pred |
| |
BB-----
Move BB before OldTop can't reduce the number of taken branches, this patch detects this case and prevent the moving.
Differential Revision: https://reviews.llvm.org/D57067
llvm-svn: 352236
Fast selection of llvm icmp and fcmp instructions is not handled well about VSX instruction support.
We'd use VSX float comparison instruction instead of non-vsx float comparison instruction
if the operand register class is VSSRC or VSFRC because i32 and i64 are mapped to VSSRC and
VSFRC correspondingly if VSX feature is opened.
If the target does not have corresponding VSX instruction comparison for some type,
just copy VSX-related register to common float register class and use non-vsx comparison instruction.
Differential Revision: https://reviews.llvm.org/D57078
llvm-svn: 352174