Commit Graph

70 Commits

Author SHA1 Message Date
Amy Kwan ba627a32e1 [PowerPC] Update Refactored Load/Store Implementation, XForm VSX Patterns, and Tests
This patch includes the following updates to the load/store refactoring effort introduced in D93370:
 - Update various VSX patterns that use to "force" an XForm, to instead just XForm.
   This allows the ability for the patterns to compute the most optimal addressing
   mode (and to produce a DForm instruction when possible)
- Update pattern and test case for the LXVD2X/STXVD2X intrinsics
- Update LIT test cases that use to use the XForm instruction to use the DForm instruction

Differential Revision: https://reviews.llvm.org/D95115
2021-07-16 09:28:48 -05:00
Qiu Chaofan bc104fdcec [PowerPC] Relax register superclasses for paired memops
Relaxing superclass constraint for VSX register classes helps reducing
32-byte spills and copies when register pressure is high.

In test case affected, some of them introduces more copies due to new
allocation order. However, this patch should not be the root cause, and
we may be able to fix it in other places of register allocation.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D104006
2021-06-11 14:54:03 +08:00
Nemanja Ivanovic 03e7fefff8 [PowerPC] Canonicalize shuffles on big endian targets as well
Extend shuffle canonicalization and conversion of shuffles fed by vectorized
scalars to big endian subtargets. For big endian subtargets, loads and direct
moves of scalars into vector registers put the data in the correct element for
SCALAR_TO_VECTOR if the data type is 8 bytes wide. However, if the data type is
narrower, the value still ends up in the wrong place - althouth a different
wrong place than on little endian targets.

This patch extends the combine that keeps values where they are if they feed a
shuffle to big endian targets.

Differential revision: https://reviews.llvm.org/D100478
2021-04-20 07:29:47 -05:00
Bradley Smith 9745dce8c3 [SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps
This is currently performed in SelectionDAGLegalize, here we make it also
happen in LegalizeVectorOps, allowing a target to lower the SETCC condition
codes first in LegalizeVectorOps and then lower to a custom node afterwards,
without having to duplicate all of the SETCC condition legalization in the
target specific lowering.

As a result of this, fixed length floating point SETCC nodes can now be
properly lowered for SVE.

Differential Revision: https://reviews.llvm.org/D98939
2021-03-29 15:32:25 +01:00
Craig Topper 03c8d6a0c4 [LegalizeDAG][RISCV][PowerPC][AMDGPU][WebAssembly] Improve expansion of SETONE/SETUEQ on targets without SETO/SETUO.
If SETO/SETUO aren't legal, they'll be expanded and we'll end up
with 3 comparisons.

SETONE is equivalent to (SETOGT || SETOLT)
so if one of those operations is supported use that expansion. We
don't need both since we can commute the operands to make the other.

SETUEQ can be implemented with !(SETOGT || SETOLT) or (SETULE && SETUGE).
I've only implemented the first because it didn't look like most of the
affected targets had legal SETULE/SETUGE.

Reviewed By: frasercrmck, tlively, nemanjai

Differential Revision: https://reviews.llvm.org/D94450
2021-01-12 10:45:03 -08:00
Matt Arsenault 89baeaef2f Reapply "RegAllocFast: Rewrite and improve"
This reverts commit 73a6a164b8.
2020-09-30 10:35:25 -04:00
Muhammad Omair Javaid 73a6a164b8 Revert "Reapply Revert "RegAllocFast: Rewrite and improve""
This reverts commit 55f9f87da2.

Breaks following buildbots:
http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4306
http://lab.llvm.org:8011/builders/lldb-aarch64-ubuntu/builds/9154
2020-09-22 14:40:06 +05:00
Matt Arsenault 55f9f87da2 Reapply Revert "RegAllocFast: Rewrite and improve"
This reverts commit dbd53a1f0c.

Needed lldb test updates
2020-09-21 15:45:27 -04:00
Eric Christopher dbd53a1f0c Temporarily Revert "RegAllocFast: Rewrite and improve"
as it's breaking a few tests in the lldb test suite.

Bot: http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4226/steps/test/logs/stdio

This reverts commit c8757ff3aa.
2020-09-18 18:11:21 -07:00
Matt Arsenault c8757ff3aa RegAllocFast: Rewrite and improve
This rewrites big parts of the fast register allocator. The basic
strategy of doing block-local allocation hasn't changed but I tweaked
several details:

Track register state on register units instead of physical
registers. This simplifies and speeds up handling of register aliases.
Process basic blocks in reverse order: Definitions are known to end
register livetimes when walking backwards (contrary when walking
forward then uses may or may not be a kill so we need heuristics).

Check register mask operands (calls) instead of conservatively
assuming everything is clobbered.  Enhance heuristics to detect
killing uses: In case of a small number of defs/uses check if they are
all in the same basic block and if so the last one is a killing use.
Enhance heuristic for copy-coalescing through hinting: We check the
first k defs of a register for COPYs rather than relying on there just
being a single definition.  When testing this on the full llvm
test-suite including SPEC externals I measured:

average 5.1% reduction in code size for X86, 4.9% reduction in code on
aarch64. (ranging between 0% and 20% depending on the test) 0.5%
faster compiletime (some analysis suggests the pass is slightly slower
than before, but we more than make up for it because later passes are
faster with the reduced instruction count)

Also adds a few testcases that were broken without this patch, in
particular bug 47278.

Patch mostly by Matthias Braun
2020-09-18 14:05:18 -04:00
Matt Arsenault 870fd53e4f Reapply "RegAllocFast: Record internal state based on register units"
The regressions this caused should be fixed when
https://reviews.llvm.org/D52010 is applied.

This reverts commit a21387c654.
2020-09-18 14:05:18 -04:00
Hans Wennborg a21387c654 Revert "RegAllocFast: Record internal state based on register units"
This seems to have caused incorrect register allocation in some cases,
breaking tests in the Zig standard library (PR47278).

As discussed on the bug, revert back to green for now.

> Record internal state based on register units. This is often more
> efficient as there are typically fewer register units to update
> compared to iterating over all the aliases of a register.
>
> Original patch by Matthias Braun, but I've been rebasing and fixing it
> for almost 2 years and fixed a few bugs causing intermediate failures
> to make this patch independent of the changes in
> https://reviews.llvm.org/D52010.

This reverts commit 66251f7e1d, and
follow-ups 931a68f26b
and 0671a4c508. It also adjust some
test expectations.
2020-09-15 13:25:41 +02:00
Nemanja Ivanovic 1fed131660 [PowerPC] Canonicalize shuffles to match more single-instruction masks on LE
We currently miss a number of opportunities to emit single-instruction
VMRG[LH][BHW] instructions for shuffles on little endian subtargets. Although
this in itself is not a huge performance opportunity since loading the permute
vector for a VPERM can always be pulled out of loops, producing such merge
instructions is useful to downstream optimizations.
Since VPERM is essentially opaque to all subsequent optimizations, we want to
avoid it as much as possible. Other permute instructions have semantics that can
be reasoned about much more easily in later optimizations.

This patch does the following:
- Canonicalize shuffles so that the first element comes from the first vector
  (since that's what most of the mask matching functions want)
- Switch the elements that come from splat vectors so that they match the
  corresponding elements from the other vector (to allow for merges)
- Adds debugging messages for when a shuffle is matched to a VPERM so that
  anyone interested in improving this further can get the info for their code

Differential revision: https://reviews.llvm.org/D77448
2020-06-18 21:54:22 -05:00
Matt Arsenault 66251f7e1d RegAllocFast: Record internal state based on register units
Record internal state based on register units. This is often more
efficient as there are typically fewer register units to update
compared to iterating over all the aliases of a register.

Original patch by Matthias Braun, but I've been rebasing and fixing it
for almost 2 years and fixed a few bugs causing intermediate failures
to make this patch independent of the changes in
https://reviews.llvm.org/D52010.
2020-06-03 16:51:46 -04:00
Kang Zhang 86e3abc9e6 [PowerPC] Add some InstAlias definitions
Summary:
This patch add the InstAlias definitions for below instructions.

ADDI ADDIS ADDI8 ADDIS8
RLWINM8
ISEL ISEL8
OR OR_rec ORI ORI8 XORI8
CNTLZW8 CNTLZW8_rec
TEND TSR
RFEBB
NOR NOR_rec
MTCRF
SUBF SUBF_rec SUBFC SUBFC_rec
RLDICL_32_64
TW

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D77559
2020-05-24 14:05:28 +00:00
Kang Zhang 513976df2e [PowerPC] Ignore implicit register operands for MCInst
Summary:
When doing the conversion: MachineInst -> MCInst, we should ignore the
implicit operands, it will expose more opportunity for InstiAlias.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D77118
2020-04-16 16:22:43 +00:00
Nemanja Ivanovic bfa9ce1cb2 [PowerPC] Improve handling of some BUILD_VECTOR nodes
An analysis of real world code turned up a number of patterns with BUILD_VECTOR
of nodes resulting from operations on extracted vector elements for which we
produce poor code. This addresses those cases. No attempt is made for
completeness as that would entail a large amount of work for something that
there is no evidence of in real code.

Differential revision: https://reviews.llvm.org/D72660
2020-03-23 17:34:29 -05:00
QingShan Zhang 7fdb3a293b [PowerPC] Implement the areMemAccessesTriviallyDisjoint hook
After implemented this hook, we will model the memory dependency in the scheduling dependency graph more precise,
and will have more opportunity to reorder the load/stores, as they didn't have the dependency at some condition

Differential Revision: https://reviews.llvm.org/D63804

llvm-svn: 364886
2019-07-02 03:28:52 +00:00
Matt Arsenault 828b685ebe RegAllocFast: Improve hinting heuristic
Trace through multiple COPYs when looking for a physreg source. Add
hinting for vregs that will be copied into physregs (we only hinted
for vregs getting copied to a physreg previously).  Give hinted a
register a bonus when deciding which value to spill.  This is part of
my rewrite regallocfast series. In fact this one doesn't even have an
effect unless you also flip the allocation to happen from back to
front of a basic block. Nonetheless it helps to split this up to ease
review of D52010

Patch by Matthias Braun

llvm-svn: 360887
2019-05-16 12:50:39 +00:00
Matt Arsenault b6c599afd3 Reapply r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block"
This reverts commit r359912.

This should pass now, since the clang test was made less fragile in
r359918.

llvm-svn: 359919
2019-05-03 19:06:57 +00:00
Nico Weber bb852a9672 Revert r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block"
Makes clang/test/Misc/backend-stack-frame-diagnostics-fallback.cpp fail.

llvm-svn: 359912
2019-05-03 18:08:03 +00:00
Matt Arsenault daf2d653fa RegAllocFast: Add heuristic to detect values not live-out of a block
Add an improved/new heuristic to catch more cases when values are not
live out of a basic block.

Patch by Matthias Braun

llvm-svn: 359906
2019-05-03 17:03:24 +00:00
Matt Arsenault c2e35a6f32 RegAllocFast: Remove early selection loop, the spill calculation will report cost 0 anyway for free regs
The 2nd loop calculates spill costs but reports free registers as cost
0 anyway, so there is little benefit from having a separate early
loop.

Surprisingly this is not NFC, as many register are marked regDisabled
so the first loop often picks up later registers unnecessarily instead
of the first one available in the allocation order...

Patch by Matthias Braun

llvm-svn: 356499
2019-03-19 19:01:34 +00:00
Nemanja Ivanovic 0f7715afe1 [PowerPC] Complete the custom legalization of vector int to fp conversion
A recent patch has added custom legalization of vector conversions of
v2i16 -> v2f64. This just rounds it out for other types where the input vector
has an illegal (narrower) type than the result vector. Specifically, this will
handle the following conversions:

v2i8 -> v2f64
v4i8 -> v4f32
v4i16 -> v4f32

Differential revision: https://reviews.llvm.org/D54663

llvm-svn: 350155
2018-12-29 13:40:48 +00:00
Zaara Syeda 7509880b54 [Power9] Add support for stxvw4x.be and stxvd2x.be intrinsics
On Power9, we don't have patterns to select the following intrinsics:
llvm.ppc.vsx.stxvw4x.be
llvm.ppc.vsx.stxvd2x.be

This patch adds support for these.

Differential Revision: https://reviews.llvm.org/D53581

llvm-svn: 346148
2018-11-05 17:31:26 +00:00
Nemanja Ivanovic 5d06f17b8a [PowerPC] Revert commit r339779
This commit has caused failures in some internal benchmarks. Temporarily
reverting this patch until the issue can be diagnosed and fixed.

llvm-svn: 340740
2018-08-27 13:20:42 +00:00
Stefan Pintilie bddd7d8b7b [PowerPC] Change Test Options [NFC]
Patch by amyk

llvm-svn: 340639
2018-08-24 19:24:20 +00:00
Stefan Pintilie cb4f0c5c07 [PowerPC] Replace the Post RA List Scheduler with the Machine Scheduler
We want to run the Machine Scheduler instead of the List Scheduler after RA.
  Checked with a performance run on a Power 9 machine with SPEC 2006 and while
  some benchmarks improved and others degraded the geomean was slightly improved
  with the Machine Scheduler.

  Differential Revision: https://reviews.llvm.org/D45265

llvm-svn: 336295
2018-07-04 18:54:25 +00:00
Lei Huang 263dc4ef3a [PowerPC] Utilize DQ-Form instructions for spill/restore and fix FrameIndex elimination to only use `lis/addi` if necessary.
Currently we produce a bunch of unnecessary code when emitting the
prologue/epilogue for spills/restores.  Namely, if the load from stack
slot/store to stack slot instruction is an X-Form instruction, we will
always produce an LIS/ORI sequence for the stack offset.

Furthermore, we have not exploited the P9 vector D-Form loads/stores for this
purpose.

This patch address both issues.

Specifying the D-Form load as the instruction to use for stack spills/reloads
should be safe because:

1. The stack should be aligned according to the ABI
2. If the stack isn't aligned, PPCRegisterInfo::eliminateFrameIndex() will
   check for the offset being a multiple of 16 and will convert it to an
   X-Form instruction if it isn't.

Differential Revision : https://reviews.llvm.org/D38758

llvm-svn: 315500
2017-10-11 20:20:58 +00:00
Matthias Braun 7a482e2302 RegisterScavenging: Followup to r305625
This does some improvements/cleanup to the recently introduced
scavengeRegisterBackwards() functionality:

- Rewrite findSurvivorBackwards algorithm to use the existing
  LiveRegUnit::accumulateBackward() code. This also avoids the Available
  and Candidates bitset and just need 1 LiveRegUnit instance
  (= 1 bitset).
- Pick registers in allocation order instead of register number order.

llvm-svn: 305817
2017-06-20 18:43:14 +00:00
Nemanja Ivanovic b89c27f515 [PowerPC] Emit VMX loads/stores for aligned ops to avoid adding swaps on LE
Fixes PR30730.
This is a re-commit of a pulled commit. The commit was pulled because some
software projects contained uses of Altivec vectors that violated alignment
requirements. Known issues have now been fixed.

Committing on behalf of Lei Huang.

Differential Revision: https://reviews.llvm.org/D26861

llvm-svn: 301892
2017-05-02 01:47:34 +00:00
Joerg Sonnenberger 400e7b7811 Use PIC relocation model as default for PowerPC64 ELF.
Most of the PowerPC64 code generation for the ELF ABI is already PIC.
There are four main exceptions:
(1) Constant pointer arrays etc. should in writeable sections.
(2) The TOC restoration NOP after a call is needed for all global
symbols. While GNU ld has a workaround for questionable GCC self-calls,
we trigger the checks for calls from COMDAT sections as they cross input
sections and are therefore not considered self-calls. The current
decision is questionable and suboptimal, but outside the scope of the
change.
(3) TLS access can not use the initial-exec model.
(4) Jump tables should use relative addresses. Note that the current
encoding doesn't work for the large code model, but it is more compact
than the default for any non-trivial jump table. Improving this is again
beyond the scope of this change.

At least (1) and (3) are assumptions made in target-independent code and
introducing additional hooks is a bit messy. Testing with clang shows
that a -fPIC binary is 600KB smaller than the corresponding -fno-pic
build. Separate testing from improved jump table encodings would explain
only about 100KB or so. The rest is expected to be a result of more
aggressive immediate forming for -fno-pic, where the -fPIC binary just
uses TOC entries.

This change brings the LLVM output in line with the GCC output, other
PPC64 compilers like XLC on AIX are known to produce PIC by default
as well. The relocation model can still be provided explicitly, i.e.
when using MCJIT.

One test case for case (1) is included, other test cases with relocation
mode sensitive behavior are wired to static for now. They will be
reviewed and adjusted separately.

Differential Revision: https://reviews.llvm.org/D26566

llvm-svn: 289743
2016-12-15 00:01:53 +00:00
Nemanja Ivanovic f57f150b1b Revert https://reviews.llvm.org/rL287679
This commit caused some miscompiles that did not show up on any of the bots.
Reverting until we can investigate the cause of those failures.

llvm-svn: 288214
2016-11-29 23:00:33 +00:00
Nemanja Ivanovic df1cb520df [PowerPC] Improvements for BUILD_VECTOR Vol. 1
This patch corresponds to review:
https://reviews.llvm.org/D25912

This is the first patch in a series of 4 that improve the lowering and combining
for BUILD_VECTOR nodes on PowerPC.

llvm-svn: 288152
2016-11-29 16:11:34 +00:00
Nemanja Ivanovic b8e30d6db6 [PowerPC] Emit VMX loads/stores for aligned ops to avoid adding swaps on LE
This patch corresponds to review:
https://reviews.llvm.org/D26861

It also fixes PR30730.

Committing on behalf of Lei Huang.

llvm-svn: 287679
2016-11-22 19:02:07 +00:00
Tony Jiang 5f850cd1b1 [PowerPC] Implement BE VSX load/store builtins - llvm portion.
This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE,
they behaves exactly the same with vec_xl and vec_xst, therefore they are
simply implemented by defining a matching macro. On LE, they are implemented
by defining new builtins and intrinsics. For int/float/long long/double, it
is just a load (lxvw4x/lxvd2x) or store(stxvw4x/stxvd2x). For char/char/short,
we also need some extra shuffling before or after call the builtins to get the
desired BE order. For int128, simply call vec_xl or vec_xst.

llvm-svn: 286967
2016-11-15 14:25:56 +00:00
Nemanja Ivanovic 11049f8f07 [Power9] Part-word VSX integer scalar loads/stores and sign extend instructions
This patch corresponds to review:
https://reviews.llvm.org/D23155

This patch removes the VSHRC register class (based on D20310) and adds
exploitation of the Power9 sub-word integer loads into VSX registers as well
as vector sign extensions.
The new instructions are useful for a few purposes:

    Int to Fp conversions of 1 or 2-byte values loaded from memory
    Building vectors of 1 or 2-byte integers with values loaded from memory
    Storing individual 1 or 2-byte elements from integer vectors

This patch implements all of those uses.

llvm-svn: 283190
2016-10-04 06:59:23 +00:00
Nemanja Ivanovic 6f22b41398 [Power9] Builtins for ELF v.2 API conformance - back end portion
This patch corresponds to review:
https://reviews.llvm.org/D24396

This patch adds support for the "vector count trailing zeroes",
"vector compare not equal" and "vector compare not equal or zero instructions"
as well as "scalar count trailing zeroes" instructions. It also changes the
vector negation to use XXLNOR (when VSX is enabled) so as not to increase
register pressure (previously this was done with a splat immediate of all
ones followed by an XXLXOR). This was done because the altivec.h
builtins (patch to follow) use vector negation and the use of an additional
register for the splat immediate is not optimal.

llvm-svn: 282478
2016-09-27 08:42:12 +00:00
Ehsan Amiri a538b0f023 Adding -verify-machineinstrs option to PowerPC tests
Currently we have a number of tests that fail with -verify-machineinstrs.
To detect this cases earlier we add the option to the testcases with the
exception of tests that will currently fail with this option. PR 27456 keeps
track of this failures.

No code review, as discussed with Hal Finkel.

llvm-svn: 277624
2016-08-03 18:17:35 +00:00
Nemanja Ivanovic 44513e545f [PowerPC] - Legalize vector types by widening instead of integer promotion
This patch corresponds to review:
http://reviews.llvm.org/D20443

It changes the legalization strategy for illegal vector types from integer
promotion to widening. This only applies for vectors with elements of width
that is a multiple of a byte since we have hardware support for vectors with
1, 2, 3, 8 and 16 byte elements.
Integer promotion for vectors is quite expensive on PPC due to the sequence
of breaking apart the vector, extending the elements and reconstituting the
vector. Two of these operations are expensive.
This patch causes between minor and major improvements in performance on most
benchmarks. There are very few benchmarks whose performance regresses. These
regressions can be handled in a subsequent patch with a DAG combine (similar
to how this patch handles int -> fp conversions of illegal vector types).

llvm-svn: 274535
2016-07-05 09:22:29 +00:00
Bill Schmidt 34af5e1c76 [PowerPC] Add an MI SSA peephole pass.
This patch adds a pass for doing PowerPC peephole optimizations at the
MI level while the code is still in SSA form.  This allows for easy
modifications to the instructions while depending on a subsequent pass
of DCE.  Both passes are very fast due to the characteristics of SSA.

At this time, the only peepholes added are for cleaning up various
redundancies involving the XXPERMDI instruction.  However, I would
expect this will be a useful place to add more peepholes for
inefficiencies generated during instruction selection.  The pass is
placed after VSX swap optimization, as it is best to let that pass
remove unnecessary swaps before performing any remaining clean-ups.

The utility of these clean-ups are demonstrated by changes to four
existing test cases, all of which now have tighter expected code
generation.  I've also added Eric Schweiz's bugpoint-reduced test from
PR25157, for which we now generate tight code.  One other test started
failing for me, and I've fixed it
(test/Transforms/PlaceSafepoints/finite-loops.ll) as well; this is not
related to my changes, and I'm not sure why it works before and not
after.  The problem is that the CHECK-NOT: of "statepoint" from test1
fails because of the "statepoint" in test2, and so forth.  Adding a
CHECK-LABEL in between keeps the different occurrences of that string
properly scoped.

llvm-svn: 252651
2015-11-10 21:38:26 +00:00
Nemanja Ivanovic be5f0c04f1 Fix for bootstrap bug introduced in r244921
This revision has introduced an issue that only affects bootstrapped compiler
when it is printing the ASM. It turns out that the new code path taken due to
legalizing a scalar_to_vector of i64 -> v2i64 exposes a missing check in a
micro optimization to change a load followed by a scalar_to_vector into a
load and splat instruction on PPC.

llvm-svn: 251798
2015-11-02 14:01:11 +00:00
Nemanja Ivanovic 5f1cea4141 Temporary fix for the self-host failures introduced by rL244921.
This revision has introduced an issue that only affects bootstrapped compiler
when it is printing the ASM. I am working on resolving the issue, but in the
meantime, I'm disabling the legalization of scalar_to_vector operation for v2i64
and the associated testing until I can get this fixed.

llvm-svn: 245481
2015-08-19 19:04:47 +00:00
Nemanja Ivanovic 1c39ca6501 Scalar to vector conversions using direct moves
This patch corresponds to review:
http://reviews.llvm.org/D11471

It improves the code generated for converting a scalar to a vector value. With
direct moves from GPRs to VSRs, we no longer require expensive stack operations
for this. Subsequent patches will handle the reverse case and more general
operations between vectors and their scalar elements.

llvm-svn: 244921
2015-08-13 17:40:44 +00:00
Bill Schmidt 54cced54a6 [PowerPC] v4i32 is a VSRCRegClass
I was looking at some vector code generation and kept seeing
unnecessary vector copies into the Altivec half of the VSX registers.
I discovered that we overlooked v4i32 when adding the register classes
for VSX; we only added v4f32 and v2f64.  This means that anything that
canonicalizes into v4i32 (which is a LOT of stuff) ends up being
forced into VRRC on its way to VSRC.

The fix is one line.  The rest of the patch is fixing up some test
cases whose code generation has changed as a result.

This seems like it would be a good candidate for backport to 3.7.

llvm-svn: 242442
2015-07-16 21:14:07 +00:00
Bill Schmidt ae94f11d55 [PPC64LE] Enable missing lxvdsx optimization, and related swap optimization
When adding little-endian vector support for PowerPC last year, I
inadvertently disabled an optimization that recognizes a load-splat
idiom and generates the lxvdsx instruction.  This patch moves the
offending logic so lxvdsx is once again generated.

This pattern is frequently generated by the vectorizer for scalar
loads of an effective constant.  Previously the lxvdsx instruction was
wrongly listed as lane-sensitive for the VSX swap optimization (since
both doublewords are identical, swaps are safe).  This patch fixes
this as well, so that vectorized code using lxvdsx can now have swaps
removed from the computation.

There is an existing test (@test50) in test/CodeGen/PowerPC/vsx.ll
that checks for the missing optimization.  However, vsx.ll was only
being tested for POWER7 with big-endian code generation.  I've added
a little-endian RUN statement and expected LE code generation for all
the tests in vsx.ll to give us a bit better VSX coverage, including
what's needed for this patch.

llvm-svn: 241183
2015-07-01 19:40:07 +00:00
Hal Finkel 7c5cb066d0 [PowerPC] Enable printing instructions using aliases
TableGen had been nicely generating code to print a number of instructions using
shorter aliases (and PowerPC has plenty of short mnemonics), but we were not
calling it. For some of the aliases we support in the parser, TableGen can't
infer the "inverse" alias relationship, so there is still more to do.

Thus, after some hours of updating test cases...

llvm-svn: 235616
2015-04-23 18:30:38 +00:00
David Blaikie a79ac14fa6 [opaque pointer type] Add textual IR support for explicit type parameter to load instruction
Essentially the same as the GEP change in r230786.

A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)

import fileinput
import sys
import re

pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

for line in sys.stdin:
  sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7649

llvm-svn: 230794
2015-02-27 21:17:42 +00:00
Bill Seurer 1baeeb029b [PowerPC]Update Power VSX test cases to also test fast-isel
Update of some of the VSX test cases for Power to check fast-isel codegen as well as the regular codegen.

http://reviews.llvm.org/D6357

llvm-svn: 223509
2014-12-05 20:32:05 +00:00
Bill Schmidt 9c54bbd791 [PATCH] Support select-cc for VSFRC when VSX is enabled
A previous patch enabled SELECT_VSRC and SELECT_CC_VSRC for VSX to
handle <2 x double> cases.  This patch adds SELECT_VSFRC and
SELECT_CC_VSFRC to allow use of all 64 vector-scalar registers for the
f64 type when VSX is enabled.  The changes are analogous to those in
the previous patch.  I've added a new variant to vsx.ll to test the
code generation.

(I also cleaned up a little formatting in PPCInstrVSX.td from the
previous patch.)

llvm-svn: 220395
2014-10-22 16:58:20 +00:00