This is a follow-up refactoring patch after the introduction of usable TreeEntry pointers in D61706.
The EdgeInfo struct can now use a TreeEntry pointer instead of an index in VectorizableTree.
Committed on behalf of @vporpo (Vasileios Porpodas)
Differential Revision: https://reviews.llvm.org/D61795
llvm-svn: 361110
This reduces the number of parameters we need to pass in and they seem a
natural fit in LoopVectorizationCostModel. Also simplifies things for
D59995.
As a follow up refactoring, we could only expose a expose a
shouldUseVectorIntrinsic() helper in LoopVectorizationCostModel, instead
of calling getVectorCallCost/getVectorIntrinsicCost in
InnerLoopVectorizer/VPRecipeBuilder.
Reviewers: Ayal, hsaito, dcaballe, rengolin
Reviewed By: rengolin
Differential Revision: https://reviews.llvm.org/D61638
llvm-svn: 360758
This patch fixes the TreeEntry dangling pointer issue caused by reallocations of VectorizableTree.
Committed on behalf of @vporpo (Vasileios Porpodas)
Differential Revision: https://reviews.llvm.org/D61706
llvm-svn: 360456
Summary:
Preserve MemorySSA in LoopSimplify, in the old pass manager, if the analysis is available.
Do not preserve it in the new pass manager.
Update tests.
Subscribers: nemanjai, jlebar, javed.absar, Prazek, kbarton, zzheng, jsji, llvm-commits, george.burgess.iv, chandlerc
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60833
llvm-svn: 360270
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024
The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:
A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.
In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.
Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel
Reviewed By: hfinkel
Subscribers: bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D60831
llvm-svn: 360162
Properly initialize store type to null then ensure we find a real store type in the chain.
Fixes scan-build null dereference warning and makes the code clearer.
llvm-svn: 360031
Summary:
When refactoring vectorization flags, vectorization was disabled by default in the new pass manager.
This patch re-enables is for both managers, and changes the assumptions opt makes, based on the new defaults.
Comments in opt.cpp should clarify the intended use of all flags to enable/disable vectorization.
Reviewers: chandlerc, jgorbe
Subscribers: jlebar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61091
llvm-svn: 359167
Summary: The code did not check if operand was undef before casting it to Instruction.
Reviewers: RKSimon, ABataev, dtemirbulatov
Reviewed By: ABataev
Subscribers: uabelho
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61024
llvm-svn: 359136
Summary:
Trying to add the plumbing necessary to add tuning options to the new pass manager.
Testing with the flags for loop vectorize.
Reviewers: chandlerc
Subscribers: sanjoy, mehdi_amini, jlebar, steven_wu, dexonsmith, dang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59723
llvm-svn: 358763
This is a refactoring patch which should have all the functionality of the current code. Its goal is twofold:
i. Cleanup and simplify the reordering code, and
ii. Generalize reordering so that it will work for an arbitrary number of operands, not just 2.
This is the second patch in a series of patches that will enable operand reordering across chains of operations. An example of this was presented in EuroLLVM'18 https://www.youtube.com/watch?v=gIEn34LvyNo .
Committed on behalf of @vporpo (Vasileios Porpodas)
Differential Revision: https://reviews.llvm.org/D59973
llvm-svn: 358519
Summary:
Enable some of the existing size optimizations for cold code under PGO.
A ~5% code size saving in big internal app under PGO.
The way it gets BFI/PSI is discussed in the RFC thread
http://lists.llvm.org/pipermail/llvm-dev/2019-March/130894.html
Note it doesn't currently touch loop passes.
Reviewers: davidxl, eraman
Reviewed By: eraman
Subscribers: mgorny, javed.absar, smeenai, mehdi_amini, eraman, zzheng, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59514
llvm-svn: 358422
1. Use computed VF for stress testing.
2. If the computed VF does not produce vector code (VF smaller than 2), force VF to be 4.
3. Test vectorization of i64 data on AArch64 to make sure we generate VF != 4 (on X86 that was already tested on AVX).
Patch by Francesco Petrogalli <francesco.petrogalli@arm.com>
Differential Revision: https://reviews.llvm.org/D59952
llvm-svn: 358056
Bug: https://bugs.llvm.org/show_bug.cgi?id=41180
In the bug test case the debug location was missing for the cmp instruction in
the "middle block" BB. This patch fixes the bug by copying the debug location
from the cmp of the scalar loop's terminator branch, if it exists.
The patch also fixes the debug location on the subsequent branch instruction.
It was previously using the location of the of the original loop's pre-header
block terminator. Both of these instructions will now map to the source line of
the conditional branch in the original loop.
A regression test has been added that covers these issues.
Patch by Orlando Cazalet-Hyams!
Differential Revision: https://reviews.llvm.org/D59944
llvm-svn: 357499
For the cases where the icmp/fcmp predicate is commutative, use reorderInputsAccordingToOpcode to collect and commute the operands.
This requires a helper to recognise commutativity in both general Instruction and CmpInstr types - the CmpInst::isCommutative doesn't overload the Instruction::isCommutative method for reasons I'm not clear on (maybe because its based on predicate not opcode?!?).
Differential Revision: https://reviews.llvm.org/D59992
llvm-svn: 357266
We should be able to match elements with the swapped predicate as well - as long as we commute the source operands.
Differential Revision: https://reviews.llvm.org/D59956
llvm-svn: 357243
With this change, the VPlan native path is triggered with the directive:
#pragma clang loop vectorize(enable)
There is no need to specify the vectorize_width(N) clause.
Patch by Francesco Petrogalli <francesco.petrogalli@arm.com>
Differential Revision: https://reviews.llvm.org/D57598
llvm-svn: 357156
As discussed on D59738, this generalizes reorderInputsAccordingToOpcode to handle multiple + non-commutative instructions so we can get rid of reorderAltShuffleOperands and make use of the extra canonicalizations that reorderInputsAccordingToOpcode brings.
Differential Revision: https://reviews.llvm.org/D59784
llvm-svn: 356939
Remove attempts to commute non-Instructions to the LHS - the codegen changes appear to rely on chance more than anything else and also have a tendency to fight existing instcombine canonicalization which moves constants to the RHS of commutable binary ops.
This is prep work towards:
(a) reusing reorderInputsAccordingToOpcode for alt-shuffles and removing the similar reorderAltShuffleOperands
(b) improving reordering to optimized cases with commutable and non-commutable instructions to still find splat/consecutive ops.
Differential Revision: https://reviews.llvm.org/D59738
llvm-svn: 356913
Remove the I.getOperand() calls from inside shouldReorderOperands - reorderInputsAccordingToOpcode should handle the creation of the operand lists and shouldReorderOperands should just check to see whether the i'th element should be commuted.
llvm-svn: 356854
This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree().
To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order.
This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo
Patch by: @vporpo (Vasileios Porpodas)
Differential Revision: https://reviews.llvm.org/D59059
llvm-svn: 356814
This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree().
To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order.
This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo
Patch by: @vporpo (Vasileios Porpodas)
Differential Revision: https://reviews.llvm.org/D59059
........
Reverted due to buildbot failures that I don't have time to track down.
llvm-svn: 355913
This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree().
To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order.
This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo
Patch by: @vporpo (Vasileios Porpodas)
Differential Revision: https://reviews.llvm.org/D59059
llvm-svn: 355906
Change from original commit: move test (that uses an X86 triple) into the X86
subdirectory.
Original description:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.
Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal
Reviewed By: sdesmalen
Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57728
llvm-svn: 355889
GCC correctly moans that PlainCFGBuilder::isExternalDef(llvm::Value*) and
StackSafetyDataFlowAnalysis::verifyFixedPoint() are defined but not used
in Release builds. Hide them behind 'ifndef NDEBUG'.
llvm-svn: 355205
This requires a couple of tweaks to existing vectorization functions as they were assuming that only the second call argument (ctlz/cttz/powi) could ever be the 'always scalar' argument, but for smul.fix + umul.fix its the third argument.
Differential Revision: https://reviews.llvm.org/D58616
llvm-svn: 354790
Add plumbing to get MemorySSA in the remaining loop passes.
Also update unit test to add the dependency.
[EnableMSSALoopDependency remains disabled].
llvm-svn: 353901
Loop::setAlreadyUnrolled() and
LoopVectorizeHints::setLoopAlreadyUnrolled() both add loop metadata that
stops the same loop from being transformed multiple times. This patch
merges both implementations.
In doing so we fix 3 potential issues:
* setLoopAlreadyUnrolled() kept the llvm.loop.vectorize/interleave.*
metadata even though it will not be used anymore. This already caused
problems such as http://llvm.org/PR40546. Change the behavior to the
one of setAlreadyUnrolled which deletes this loop metadata.
* setAlreadyUnrolled() used to create a new LoopID by calling
MDNode::get with nullptr as the first operand, then replacing it by
the returned references using replaceOperandWith. It is possible
that MDNode::get would instead return an existing node (due to
de-duplication) that then gets modified. To avoid, use a fresh
TempMDNode that does not get uniqued with anything else before
replacing it with replaceOperandWith.
* LoopVectorizeHints::matchesHintMetadataName() only compares the
suffix of the attribute to set the new value for. That is, when
called with "enable", would erase attributes such as
"llvm.loop.unroll.enable", "llvm.loop.vectorize.enable" and
"llvm.loop.distribute.enable" instead of the one to replace.
Fortunately, function was only called with "isvectorized".
Differential Revision: https://reviews.llvm.org/D57566
llvm-svn: 353738
This cleans up all GetElementPtr creation in LLVM to explicitly pass a
value type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57173
llvm-svn: 352913