This uses the same technique as for shifts - split the rotation into 4/2/1-bit partial rotations and select those partials based on the amount bit, making use of PBLENDVB if available. This halves the use of PBLENDVB compared to expanding to shifts, which can be a slow op.
Unfortunately I haven't found a decent way to share much of this code with the shift equivalent.
Differential Revision: https://reviews.llvm.org/D48655
llvm-svn: 335957
Extension to D46954 (PR37426), this patch adds support for v8i16/v16i16 rotations in a similar manner - the conversion of the shift/rotate amount to a multiplication factor and the use of PMULLW to shift left and PMULHUW (ISD::MULHU) to shift the wrapped bits back around to be ORd together.
Differential Revision: https://reviews.llvm.org/D47822
llvm-svn: 334309
Similar to v4i32 SHL, convert v8i16 shift amounts to scale factors instead to improve performance and reduce instruction count. We were already doing this for constant shifts, this adds variable shift support.
Reduces the serial nature of the codegen, which relies on chains of plendvb/pand+pandn+por shifts.
This is a step towards adding support for vXi16 vector rotates.
Differential Revision: https://reviews.llvm.org/D47546
llvm-svn: 334023
Noticed while fixing PR37426, for splat rotations (rotation by an uniform value) its better to just expand back to shift ops than performing as a general non-uniform rotation.
llvm-svn: 333661
This improves splat rotations (rotation by an uniform value), to avoid having to use the generic non-uniform shift code (extension to PR37426).
llvm-svn: 333641
As suggested by Fabian on PR37426, we can use PMULUDQ to perform v4i32 vector rotations as the upper 32bits of the multiply will contain the 'wrapped' bits of the rotation.
v8i16/v16i8 rotations would be straightforward to add to lowerRotate in the future - ideally we'd mostly share code with the vector shifts lowering.
Differential Revision: https://reviews.llvm.org/D46954
llvm-svn: 332832
As suggested by Fabian on PR37441, use PSHUFLW to extend shift amount types for use with PSRAD/PSRLD to reduce register pressure.
Some of this ideally would be done by combineTargetShuffle but its tricky to do as most of the shuffles are sharing inputs.
Differential Revision: https://reviews.llvm.org/D46959
llvm-svn: 332524
Re-enable commit r323991 now that r325931 has been committed to make
MachineOperand::isRenamable() check more conservative w.r.t. code
changes and opt-in on a per-target basis.
llvm-svn: 326208
This reverts commit r323991.
This commit breaks target that don't model all the register constraints
in TableGen. So far the workaround was to set the
hasExtraXXXRegAllocReq, but it proves that it doesn't cover all the
cases.
For instance, when mutating an instruction (like in the lowering of
COPYs) the isRenamable flag is not properly updated. The same problem
will happen when attaching machine operand from one instruction to
another.
Geoff Berry is working on a fix in https://reviews.llvm.org/D43042.
llvm-svn: 325421
Summary:
This change extends MachineCopyPropagation to do COPY source forwarding
and adds an additional run of the pass to the default pass pipeline just
after register allocation.
This version of this patch uses the newly added
MachineOperand::isRenamable bit to avoid forwarding registers is such a
way as to violate constraints that aren't captured in the
Machine IR (e.g. ABI or ISA constraints).
This change is a continuation of the work started in D30751.
Reviewers: qcolombet, javed.absar, MatzeB, jonpa, tstellar
Subscribers: tpr, mgorny, mcrosier, nhaehnle, nemanjai, jyknight, hfinkel, arsenm, inouehrs, eraman, sdardis, guyblank, fedor.sergeev, aheejin, dschuff, jfb, myatsina, llvm-commits
Differential Revision: https://reviews.llvm.org/D41835
llvm-svn: 323991
Discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html
In preparation for adding support for named vregs we are changing the sigil for
physical registers in MIR to '$' from '%'. This will prevent name clashes of
named physical register with named vregs.
llvm-svn: 323922
BWI supports shifting by word amounts. Even if VLX isn't support we can still widen to v32i16 and extract the lower half. For SKX its preferrable to not use 512-bit vector if we can.
llvm-svn: 321059
As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.
The MIR printer prints the IR name of a MBB only for block definitions.
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix
Differential Revision: https://reviews.llvm.org/D40422
llvm-svn: 319665
As part of the unification of the debug format and the MIR format,
always print registers as lowercase.
* Only debug printing is affected. It now follows MIR.
Differential Revision: https://reviews.llvm.org/D40417
llvm-svn: 319187
This patch redefines the MOVSS/MOVSD instructions to take VR128 as its second input. This allows the MOVSS/SD->BLEND commute to work without requiring a COPY to be inserted.
This should fix PR33079
Overall this looks to be an improvement in the generated code. I haven't checked the EXPENSIVE_CHECKS build but I'll do that and update with results.
Differential Revision: https://reviews.llvm.org/D38449
llvm-svn: 314914
Issues addressed since original review:
- Avoid bug in regalloc greedy/machine verifier when forwarding to use
in an instruction that re-defines the same virtual register.
- Fixed bug when forwarding to use in EarlyClobber instruction slot.
- Fixed incorrect forwarding to register definitions that showed up in
explicit_uses() iterator (e.g. in INLINEASM).
- Moved removal of dead instructions found by
LiveIntervals::shrinkToUses() outside of loop iterating over
instructions to avoid instructions being deleted while pointed to by
iterator.
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
llvm-svn: 314729
This teach simplifyDemandedBits to handle constant splat vector shifts.
This required changing some uses of getZExtValue to getLimitedValue since we can't rely on legalization using getShiftAmountTy for the shift amount.
I believe there may have been a bug in the ((X << C1) >>u ShAmt) handling where we didn't check if the inner shift was too large. I've fixed that here.
I had to add new patterns to ARM because the zext/sext the patterns were trying to look for got turned into an any_extend with this patch. Happy to split that out too, but not sure how to test without this change.
Differential Revision: https://reviews.llvm.org/D37665
llvm-svn: 314139
Issues addressed since original review:
- Moved removal of dead instructions found by
LiveIntervals::shrinkToUses() outside of loop iterating over
instructions to avoid instructions being deleted while pointed to by
iterator.
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
llvm-svn: 312328
It caused PR34387: Assertion failed: (RegNo < NumRegs && "Attempting to access record for invalid register number!")
> Issues identified by buildbots addressed since original review:
> - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
> - The pass no longer forwards COPYs to physical register uses, since
> doing so can break code that implicitly relies on the physical
> register number of the use.
> - The pass no longer forwards COPYs to undef uses, since doing so
> can break the machine verifier by creating LiveRanges that don't
> end on a use (since the undef operand is not considered a use).
>
> [MachineCopyPropagation] Extend pass to do COPY source forwarding
>
> This change extends MachineCopyPropagation to do COPY source forwarding.
>
> This change also extends the MachineCopyPropagation pass to be able to
> be run during register allocation, after physical registers have been
> assigned, but before the virtual registers have been re-written, which
> allows it to remove virtual register COPY LiveIntervals that become dead
> through the forwarding of all of their uses.
llvm-svn: 312178
Issues identified by buildbots addressed since original review:
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
llvm-svn: 312154
Two issues identified by buildbots were addressed:
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
Reviewers: qcolombet, javed.absar, MatzeB, jonpa
Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny
Differential Revision: https://reviews.llvm.org/D30751
llvm-svn: 311135
This reverts commit r311038.
Several buildbots are breaking, and at least one appears to be due to
the forwarding of physical regs enabled by this change. Reverting while
I investigate further.
llvm-svn: 311062
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
Reviewers: qcolombet, javed.absar, MatzeB, jonpa
Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny
Differential Revision: https://reviews.llvm.org/D30751
llvm-svn: 311038
These were taking priority over the aligned load instructions since there is no vmovda8/16. I don't think there is really a difference between aligned and unaligned on newer cpus so I don't think it matters which instructions we use.
But with this change we reduce the size of the isel table a little and we allow the aligned information to pass through to the evex->vec pass and produce the same output has avx/avx2 in some cases.
I also generally dislike patterns rooted in a bitcast which these were.
Differential Revision: https://reviews.llvm.org/D35977
llvm-svn: 309589
The patch itself is simple: stop discriminating against vectors in visitAnd() and again in
SimplifyDemandedBits().
Some notes for reference:
1. We're not consistent about calls to SimplifyDemandedBits in the various visitXXX functions.
Sometimes, we check if the RHS is a constant first. Other times (like here), we just dive in.
2. I'd like to break the vector shackles in steps for the sake of risk minimization, but we could
make similar simultaneous changes in other places if we think that would be better.
3. I don't know what the intent of the changed tests in this patch was supposed to be, but since
they wiggled in a positive way, I'm just going with that. :)
4. In the rotate tests, note that we can see through non-splat constants. This is a result of D24253.
5. My motivation for being here now is to make D31944 look better, so this is step 1 of N towards
improving the vector codegen in that patch without writing any actual new code.
Differential Revision: https://reviews.llvm.org/D32230
llvm-svn: 300725
VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be.
Differential Revision: https://reviews.llvm.org/D29874
llvm-svn: 296859
Similar was already done for several other shuffles in this function.
The test changes are because the old code used explicity zeroing for elements that could have been undef.
While I was here I also changed other shuffle vectors in the same function to use the same input twice instead of creating UNDEF nodes. getVectorShuffle can create the UNDEF for us.
llvm-svn: 294130
Summary:
For instructions such as PSLLW/PSLLD/PSLLQ a variable shift amount may be passed in an XMM register.
The lower 64-bits of the register are evaluated to determine the shift amount.
This patch improves the construction of the vector containing the shift amount.
Reviewers: craig.topper, delena, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28353
llvm-svn: 291120
The script now replace '.LCPI888_8' style asm symbols with the {{\.LCPI.*}} re pattern - this helps stop hardcoded symbols in 32-bit x86 tests changing with every edit of the file
Refreshed some tests to demonstrate the new check
llvm-svn: 272488
The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant shift amounts, but if it is constant then we can efficiently reduce the VSELECT to shuffles with the pre-SSE41 lowering.
llvm-svn: 263383
autogenerated.
Also update existing test cases which appear to be generated by it and
weren't modified (other than addition of the header) by rerunning it.
llvm-svn: 253917
We don't need a mask of a rotation result to be a constant splat - any constant scalar/vector can be usefully folded.
Followup to D13851.
llvm-svn: 251197
This patch adds support for lowering to the XOP VPROT / VPROTI vector bit rotation instructions.
This has required changes to the DAGCombiner rotation pattern matching to support vector types - so far I've only changed it to support splat vectors, but generalising this further is feasible in the future.
Differential Revision: http://reviews.llvm.org/D13851
llvm-svn: 251188