Commit Graph

193 Commits

Author SHA1 Message Date
Qiu Chaofan 41ba9d7723 [PowerPC] Support constrained vector fp/int conversion
This patch makes these operations legal, and add necessary codegen
patterns.

There's still some issue similar to D77033 for conversion from v1i128
type. But normal type tests synced in vector-constrained-fp-intrinsic
are passed successfully.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D83654
2020-08-24 10:10:27 +08:00
Qiu Chaofan a5b7b8cce0 [PowerPC] Support constrained scalar sitofp/uitofp
This patch adds support for constrained scalar int to fp operations on
PowerPC. Besides, this also fixes the FP exception bit of FCFID*
instructions.

Reviewed By: steven.zhang, uweigand

Differential Revision: https://reviews.llvm.org/D81669
2020-08-22 02:10:29 +08:00
Qiu Chaofan 131b3b9ed4 [PowerPC] Support constrained scalar fptosi/fptoui
This patch adds support for constrained scalar fp to int operations on
PowerPC. Besides, this fixes the FP exception bit of quad-precision
convert & truncate instructions.

Reviewed By: steven.zhang, uweigand

Differential Revision: https://reviews.llvm.org/D81537
2020-08-20 13:29:43 +08:00
Kang Zhang a18953c1c0 [PowerPC] Fix RM operands for some instructions
Summary:
Some instructions have set the wrong [RM] flag, this patch is to fix it.

Instructions x(v|s)r(d|s)pi[zmp]? and fri[npzm] use fixed rounding
directions without referencing current rounding mode.

Also, the SETRNDi, SETRND, BCLRn, MTFSFI, MTFSB0, MTFSB1, MTFSFb,
MTFSFI, MTFSFI_rec, MTFSF, MTFSF_rec should also fix the RM flag.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D81360
2020-07-30 02:10:49 +00:00
Kit Barton 5ca75130f5 [PPC][NFC] Add Subtarget and replace all uses of PPCSubTarget with Subtarget.
Summary:
In preparation for GlobalISel, PPCSubTarget needs to be renamed to Subtarget as there places in GlobalISel that assume the presence of the variable Subtarget.
This patch introduces the variable Subtarget, and replaces all existing uses of PPCSubTarget with Subtarget. A subsequent patch will remove the definiton of
PPCSubTarget, once any downstream users have the opportunity to rename any uses they have.

Reviewers: hfinkel, nemanjai, jhibbits, #powerpc, echristo, lkail

Reviewed By: #powerpc, echristo, lkail

Subscribers: echristo, lkail, wuzish, nemanjai, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81623
2020-06-26 11:23:38 -05:00
Nemanja Ivanovic 1fed131660 [PowerPC] Canonicalize shuffles to match more single-instruction masks on LE
We currently miss a number of opportunities to emit single-instruction
VMRG[LH][BHW] instructions for shuffles on little endian subtargets. Although
this in itself is not a huge performance opportunity since loading the permute
vector for a VPERM can always be pulled out of loops, producing such merge
instructions is useful to downstream optimizations.
Since VPERM is essentially opaque to all subsequent optimizations, we want to
avoid it as much as possible. Other permute instructions have semantics that can
be reasoned about much more easily in later optimizations.

This patch does the following:
- Canonicalize shuffles so that the first element comes from the first vector
  (since that's what most of the mask matching functions want)
- Switch the elements that come from splat vectors so that they match the
  corresponding elements from the other vector (to allow for merges)
- Adds debugging messages for when a shuffle is matched to a VPERM so that
  anyone interested in improving this further can get the info for their code

Differential revision: https://reviews.llvm.org/D77448
2020-06-18 21:54:22 -05:00
Qiu Chaofan 13edcd696e [PowerPC] Support constrained rounding operations
This patch adds handling of constrained FP intrinsics about round,
truncate and extend for PowerPC target, with necessary tests.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D64193
2020-06-14 23:43:31 +08:00
Qiu Chaofan 7a001a2d92 [PowerPC] Require nsz flag for c-a*b to FNMSUB
On PowerPC, FNMSUB (both VSX and non-VSX version) means -(a*b-c). But
the backend used to generate these instructions regardless whether nsz
flag exists or not. If a*b-c==0, such transformation changes sign of
zero.

This patch introduces PPC specific FNMSUB ISD opcode, which may help
improving combined FMA code sequence.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D76585
2020-06-04 16:41:27 +08:00
Nemanja Ivanovic 1a493b0fa5 [PowerPC] Add missing handling for half precision
The fix for PR39865 took care of some of the handling for half precision
but it missed a number of issues that still exist. This patch fixes the
remaining issues that cause crashes in the PPC back end.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=45776

Differential revision: https://reviews.llvm.org/D79283
2020-05-22 07:50:11 -05:00
Qiu Chaofan 8ffe8891cd [PowerPC] Exploit VSX neg, abs and nabs for f32
xsnegdp, xsabsdp and xsnabsdp can be used to operate on f32 operand.

This patch adds the missing patterns since we prefer VSX instructions
when available.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D75344
2020-05-13 14:28:50 +08:00
Qiu Chaofan e8d2ff22f0 [PowerPC] Add fma/fsqrt/fmax strict-fp intrinsics
This patch adds strict-fp intrinsics support for fma, fsqrt, fmaxnum and
fminnum on PowerPC.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D72749
2020-05-12 13:44:09 +08:00
Nemanja Ivanovic 8ca2fc9993 [PowerPC] Refactor PPCInstrVSX.td
Over time, we have made many additions to this file and it has frankly become a
bit of a mess. This has led to at least one issue - we have a number of
instructions where the side effects flag should be set to false and we neglected
to do this. This patch suggests a refactoring that should make the file much
more maintainable. The file is split up into major sections and the nesting
level is reduced, predicate blocks merged, etc.

Sections:
  - Custom PPCISD node definitions
  - Predicate definitions
  - Instruction formats
  - Instruction definitions
  - Helper DAG definitions
  - Anonymous patterns
  - Instruction aliases

Differential revision: https://reviews.llvm.org/D78132
2020-05-01 19:17:39 -05:00
Kazuaki Ishizaki 0312b9f550 [llvm] NFC: Fix trivial typo in rst and td files
Differential Revision: https://reviews.llvm.org/D77469
2020-04-23 14:26:32 +09:00
Nemanja Ivanovic 512600e3c0 [PowerPC] Handle f16 as a storage type only
The PPC back end currently crashes (fails to select) with f16 input. This patch
expands it on subtargets prior to ISA 3.0 (Power9) and uses the HW conversions
on Power9.

Fixes https://bugs.llvm.org/show_bug.cgi?id=39865

Differential revision: https://reviews.llvm.org/D68237
2020-04-11 07:34:47 -05:00
Nemanja Ivanovic ecd8435483 [NFC][PowerPC] Fix register class for patterns using XXPERMDIs
There are a few patterns where we use a superclass for inputs to this
instruction rather than the correct class. This can sometimes lead to
unncessary copies.
2020-04-07 14:06:08 -05:00
Nemanja Ivanovic bfa9ce1cb2 [PowerPC] Improve handling of some BUILD_VECTOR nodes
An analysis of real world code turned up a number of patterns with BUILD_VECTOR
of nodes resulting from operations on extracted vector elements for which we
produce poor code. This addresses those cases. No attempt is made for
completeness as that would entail a large amount of work for something that
there is no evidence of in real code.

Differential revision: https://reviews.llvm.org/D72660
2020-03-23 17:34:29 -05:00
QingShan Zhang d0fb34dc09 [PowerPC] Replace the PPCISD:: SExtVElems with ISD::SIGN_EXTEND_INREG to leverage the combine rules
The PPCISD::SExtVElems was added by commit https://reviews.llvm.org/D34009. However,
we have another ISD node ISD::SIGN_EXTEND_INREG that perfectly match the semantics
of SExtVElems. And the DAGCombiner has some combine rules for SIGN_EXTEND_INREG
that produce better code.

Differential Revision: https://reviews.llvm.org/D70771
2020-03-13 07:28:28 +00:00
Qiu Chaofan 096d545376 [PowerPC] Add strict-fp intrinsic to FP arithmetic
This patch adds basic strict-fp intrinsics support to PowerPC backend,
including basic arithmetic operations (add/sub/mul/div).

Reviewed By: steven.zhang, andrew.w.kaylor

Differential Revision: https://reviews.llvm.org/D63916
2020-03-12 17:02:54 +08:00
Qiu Chaofan 87c773082a [PowerPC] Exploit VSX rounding instrs for rint
Exploit native VSX rounding instruction, x(v|s)r(d|s)pic, which does
rounding using current rounding mode.

According to C standard library, rint may raise INEXACT exception while
nearbyint won't.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D72685
2020-02-13 20:59:50 +08:00
Jinsong Ji 24ee4edee8 [PowerPC][NFC] Rename record instructions to use _rec suffix instead of o
We use o suffix to indicate record form instuctions,
(as it is similar to dot '.' in mne?)

This was fine before, as we did not support XO-form.
However, with https://reviews.llvm.org/D66902,
we now have XO-form support.

It becomes confusing now to still use 'o' for record form,
and it is weird to have something like 'Oo' .

This patch rename all 'o' instructions to use '_rec' instead.
Also rename `isDot` to `isRecordForm`.

Reviewed By: #powerpc, hfinkel, nemanjai, steven.zhang, lkail

Differential Revision: https://reviews.llvm.org/D70758
2020-01-06 22:27:07 +00:00
Nemanja Ivanovic 0f0330a787 [PowerPC] Legalize rounding nodes
VSX provides a full complement of rounding instructions yet we somehow ended up
with some of them legal and others not. This just legalizes all of the FP
rounding nodes and the FP -> int rounding nodes with unsafe math.

Differential revision: https://reviews.llvm.org/D69949
2019-12-30 08:03:53 -06:00
QingShan Zhang 6d5e35e89d [Power9] Remove the PPCISD::XXREVERSE as it has completely the same semantics of ISD::BSWAP
The custom node PPCISD::XXREVERSE has completely the same semantics of generic node ISD::BSWAP.
We need to clean up it as we have the combine rules for bswap in the base class, while nothing for xxreverse.

Differential Revision: https://reviews.llvm.org/D70657
2019-12-23 07:44:33 +00:00
Nemanja Ivanovic a5da8d90da [PowerPC] Add missing legalization for vector BSWAP
We somehow missed doing this when we were working on Power9 exploitation.
This just adds the missing legalization and cost for producing the vector
intrinsics.

Differential revision: https://reviews.llvm.org/D70436
2019-12-17 19:07:34 -06:00
Jinsong Ji 3d41a58eac [PowerPC][NFC] Rename ANDI(S)o8 to ANDI(S)8o
Summary:
This is found during https://reviews.llvm.org/D70758
All the other record forms are having suffix o at the end.
ANDIo8 and ANDISo8 are the only two that put o before 8.

This patch rename them to be consistent with others.

Reviewers: #powerpc, hfinkel, nemanjai, lei, steven.zhang, echristo, jhibbits, joerg

Reviewed By: jhibbits

Subscribers: wuzish, hiraditya, kbarton, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70928
2019-12-09 19:21:34 +00:00
Stefan Pintilie 6512473cee [PowerPC] Improve float vector gather codegen
This patch aims to improve the code generation for float vector gather on POWER9.
Patterns have been implemented to utilize instructions that deliver improved
performance.

Patch by: Kamau Bridgeman

Differential Revision: https://reviews.llvm.org/D62908
2019-11-18 15:53:32 -06:00
QingShan Zhang 529bb8a980 [PowerPC] Fix the incorrect 'RM' flag set on load/store instr
The 'RM' flag model the "Rounding Mode" and it has nothing to do with the load/store instructions.

Differential Revision: https://reviews.llvm.org/D69551
2019-11-06 02:46:37 +00:00
QingShan Zhang f15cf93899 [PowerPC] Clear the sideeffect bit for those instructions that didn't have the match pattern
If the instruction have match pattern, llvm-tblgen will infer the sideeffect bit from the match pattern and it works well.
If not, the tblgen will set it as true that hurt the scheduling.

PowerPC has some instructions that didn't specify the match pattern(i.e. LXSD etc), which is manually selected post-ra according
to the register pressure. We need to clear the sideeffect flag for these instructions.

Differential Revision: https://reviews.llvm.org/D69232
2019-10-30 07:59:32 +00:00
Nemanja Ivanovic 25a41ad242 [PowerPC] Emit scalar fp min/max instructions
VSX provides floating point minimum and maximum instructions that conform
to IEEE semantics. This legalizes the respective nodes and emits VSX code
for them. Furthermore, on Power9 cores we have xsmaxcdp and xsmincdp
instructions that conform to language semantics for the conditional operator
even in the presence of NaNs.

Differential revision: https://reviews.llvm.org/D62993
2019-10-28 19:13:33 -05:00
Jinsong Ji 4a6881eabc [PowerPC] Adjust the naming and operand order of fnmsub patterns
Summary:
This is follow up patch of https://reviews.llvm.org/D67595.
Adjust naming and the Commutable operands for additional patterns
to make it easier to read.

The testcase update also show that we can save some unecessary fmr as
well.

Reviewers: #powerpc, steven.zhang, hfinkel, nemanjai

Reviewed By: #powerpc, nemanjai

Subscribers: wuzish, hiraditya, kbarton, MaskRay, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68112

llvm-svn: 373652
2019-10-03 19:36:42 +00:00
Jinsong Ji be13c43e08 [PowerPC] Fix typo in rL372985
llvm-svn: 372991
2019-09-26 15:49:11 +00:00
Jinsong Ji eaf6746db0 [PowerPC] Add missing pattern for VSX Scalar Negative Multiply-Subtract Single Precision
Summary:
This was found during review of https://reviews.llvm.org/D66050.
In the simple test of fdiv, we miss to fold
```
        fneg 2, 2
        xsmaddasp 3, 2, 0
```
to
```
        xsnmsubasp 3, 2, 0
```
We have the patterns for Double Precision and vectors, just missing
Single Precision, the patch add that.

Reviewers: #powerpc, hfinkel, nemanjai, steven.zhang

Reviewed By: #powerpc, steven.zhang

Subscribers: wuzish, hiraditya, kbarton, MaskRay, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67595

llvm-svn: 372985
2019-09-26 15:11:33 +00:00
Matt Arsenault 3ecab8e455 Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)

This was missing one switch to getTargetConstant in an untested case.

llvm-svn: 372338
2019-09-19 16:26:14 +00:00
Hans Wennborg 13bdae8541 Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This broke the Chromium build, causing it to fail with e.g.

  fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>

See llvm-commits thread of r372285 for details.

This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.

> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.

llvm-svn: 372314
2019-09-19 12:33:07 +00:00
Matt Arsenault d8399d12cd GlobalISel: Don't materialize immarg arguments to intrinsics
Encode them directly as an imm argument to G_INTRINSIC*.

Since now intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.

This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.

SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.

Most of the work here is to cleanup target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.

The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, simlar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.

This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to uses "imm" in an output with a "timm" pattern source.

llvm-svn: 372285
2019-09-19 01:33:14 +00:00
Nemanja Ivanovic 1461fb6e78 [PowerPC] Exploit single instruction load-and-splat for word and doubleword
We currently produce a load, followed by (possibly a move for integers and) a
splat as separate instructions. VSX has always had a splatting load for
doublewords, but as of Power9, we have it for words as well. This patch just
exploits these instructions.

Differential revision: https://reviews.llvm.org/D63624

llvm-svn: 372139
2019-09-17 16:45:20 +00:00
Nemanja Ivanovic e63c676825 [PowerPC] Cust lower fpext v2f32 to v2f64 from extract_subvector v4f32
Add the missing piece of r372029.
Somehow when the patch for review D61961 was committed, only the test case
went in and the code didn't. This of course caused all kinds of build bot
breaks.
This patch just adds the code for that patch.

Author: Lei Huang
Differential revision: https://reviews.llvm.org/D61961

llvm-svn: 372043
2019-09-16 22:54:52 +00:00
Jinsong Ji 1ed7d2119e [PowerPC] Support extended mnemonics mffprwz etc.
Summary:
Reported in https://github.com/opencv/opencv/issues/15413.

We have serveral extended mnemonics for Move To/From Vector-Scalar Register Instructions
eg: mffprd,mtfprd etc.

We only support one of them, this patch add the others.

Reviewers: nemanjai, steven.zhang, hfinkel, #powerpc

Reviewed By: hfinkel

Subscribers: wuzish, qcolombet, hiraditya, kbarton, MaskRay, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66963

llvm-svn: 370411
2019-08-29 21:53:59 +00:00
Jinsong Ji 0776da5236 [PeepholeOptimizer] Don't assume bitcast def always has input
Summary:
If we have a MI marked with bitcast bits, but without input operands,
PeepholeOptimizer might crash with assert.

eg:
If we apply the changes in PPCInstrVSX.td as in this patch:

[(set v4i32:$XT, (bitconvert (v16i8 immAllOnesV)))]>;

We will get assert in PeepholeOptimizer.

```
llvm-lit llvm-project/llvm/test/CodeGen/PowerPC/build-vector-tests.ll -v

llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:417: const
llvm::MachineOperand &llvm::MachineInstr::getOperand(unsigned int)
const: Assertion `i < getNumOperands() && "getOperand() out of range!"'
failed.
```

The fix is to abort if we found out of bound access.

Reviewers: qcolombet, MatzeB, hfinkel, arsenm

Reviewed By: qcolombet

Subscribers: wdng, arsenm, steven.zhang, wuzish, nemanjai, hiraditya, kbarton, MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65542

llvm-svn: 369261
2019-08-19 14:19:04 +00:00
Jinsong Ji 9fd81dc139 [PowerPC] Use xxleqv to set all one vector IMM(-1).
Summary:
xxspltib/vspltisb are 3 cycle PM instructions,
xxleqv is 2 cycle ALU instruction.

We should use xxleqv to set all one vectors.

Reviewers: hfinkel, nemanjai, steven.zhang

Subscribers: hiraditya, kbarton, MaskRay, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65529

llvm-svn: 369006
2019-08-15 14:32:51 +00:00
Jinsong Ji e71db6584d [PowerPC][NFC] Consolidate duplicate XX3Form_SetZero and XX3Form_Zero.
Rename one to XX3Form_SameOp, remove the other one.

llvm-svn: 368856
2019-08-14 14:16:26 +00:00
Zi Xuan Wu 66c320908b recommit:[PowerPC] Eliminate loads/swap feeding swap/store for vector type by using big-endian load/store
In PowerPC, there is instruction to load vector in big endian element order when it's in little endian target. 
So we can combine vector load + reverse into big endian load to eliminate the swap instruction.
Also combine vector reverse + store into big endian store.

Differential Revision: https://reviews.llvm.org/D65063

llvm-svn: 367516
2019-08-01 05:26:02 +00:00
Zi Xuan Wu 54d446f70e revert r367382 because buildbot failure
llvm-svn: 367388
2019-07-31 07:03:42 +00:00
Zi Xuan Wu e85f6bf66c [PowerPC] Eliminate loads/swap feeding swap/store for vector type by using big-endian load/store
In PowerPC, there is instruction to load vector in big endian element order when it's in little endian target. 
So we can combine vector load + reverse into big endian load to eliminate the swap instruction.
Also combine vector reverse + store into big endian store.

llvm-svn: 367382
2019-07-31 02:56:00 +00:00
Jinsong Ji 5bb6202c44 [PowerPC][NFC]Fix a typo in comment.
llvm-svn: 367252
2019-07-29 19:27:54 +00:00
Zi Xuan Wu 588a170970 [NFC][PowerPC] Move XS*QP series instruction apart from XS*QPO series in position of td file
llvm-svn: 364620
2019-06-28 02:51:03 +00:00
Nemanja Ivanovic 47b7d13459 [PowerPC] Emit XXSEL for vec_sel and code that has the same pattern
As pointed out in https://bugs.llvm.org/show_bug.cgi?id=41777
we do not emit a vector select even when the pretty much asks for one.
This patch changes that.

Differential revision: https://reviews.llvm.org/D61658

llvm-svn: 364289
2019-06-25 10:46:13 +00:00
Nemanja Ivanovic ef4a3aa549 [PowerPC] Exploit the vector min/max instructions
Use the PPC vector min/max instructions for computing the corresponding
operation as these should be faster than the compare/select sequences
we currently emit.

Differential revision: https://reviews.llvm.org/D47332

llvm-svn: 362759
2019-06-06 23:49:01 +00:00
Chen Zheng b727b0483c [PowerPC] use meaningful name for displacement form aligned with x-form - NFC
llvm-svn: 361347
2019-05-22 03:17:39 +00:00
Chen Zheng 9970665f60 [PowerPC] [ISEL] select x-form instruction for unaligned offset
Differential Revision: https://reviews.llvm.org/D62173

llvm-svn: 361346
2019-05-22 02:57:31 +00:00
Lei Huang 1ac6e9636c [PowerPC] custom lower `v2f64 fpext v2f32`
Reduces scalarization overhead via custom lowering of v2f64 fpext v2f32.

eg. For the following IR
  %0 = load <2 x float>, <2 x float>* %Ptr, align 8
  %1 = fpext <2 x float> %0 to <2 x double>
  ret <2 x double> %1

Pre custom lowering:
  ld r3, 0(r3)
  mtvsrd f0, r3
  xxswapd vs34, vs0
  xscvspdpn f0, vs0
  xxsldwi vs1, vs34, vs34, 3
  xscvspdpn f1, vs1
  xxmrghd vs34, vs0, vs1

After custom lowering:
  lfd f0, 0(r3)
  xxmrghw vs0, vs0, vs0
  xvcvspdp vs34, vs0

Differential Revision: https://reviews.llvm.org/D57857

llvm-svn: 360429
2019-05-10 14:04:06 +00:00