We use o suffix to indicate record form instuctions,
(as it is similar to dot '.' in mne?)
This was fine before, as we did not support XO-form.
However, with https://reviews.llvm.org/D66902,
we now have XO-form support.
It becomes confusing now to still use 'o' for record form,
and it is weird to have something like 'Oo' .
This patch rename all 'o' instructions to use '_rec' instead.
Also rename `isDot` to `isRecordForm`.
Reviewed By: #powerpc, hfinkel, nemanjai, steven.zhang, lkail
Differential Revision: https://reviews.llvm.org/D70758
This fixes an assertion failure that triggers inside
getMemOperandWithOffset when Machine Sinking calls it on a MachineInstr
that is not a memory operation.
Different backends implement getMemOperandWithOffset differently: some
return false on non-memory MachineInstrs, others assert.
The Machine Sinking pass in at least SinkingPreventsImplicitNullCheck
relies on getMemOperandWithOffset to return false on non-memory
MachineInstrs, instead of asserting.
This patch updates the documentation on getMemOperandWithOffset that it
should return false on any MachineInstr it cannot handle, instead of
asserting. It also adapts the in-tree backends accordingly where
necessary.
Differential Revision: https://reviews.llvm.org/D71359
Summary:
This is found during https://reviews.llvm.org/D70758
All the other record forms are having suffix o at the end.
ANDIo8 and ANDISo8 are the only two that put o before 8.
This patch rename them to be consistent with others.
Reviewers: #powerpc, hfinkel, nemanjai, lei, steven.zhang, echristo, jhibbits, joerg
Reviewed By: jhibbits
Subscribers: wuzish, hiraditya, kbarton, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70928
When converting reg+reg shifts to reg+imm rotates, we neglect to consider the
CodeGenOnly versions of the 32-bit shift mnemonics. This means we produce a
rotate with missing operands which causes a crash.
Committing this fix without review since it is non-controversial that the list
of mnemonics to consider should include the 64-bit aliases for the exact
mnemonics.
Fixes PR44183.
Summary:
This patch renames the DarwinDirective (used to identify which CPU was defined)
to CPUDirective. It also adds the getCPUDirective() method and replaces all uses
of getDarwinDirective() with getCPUDirective().
Once this patch lands and downstream users of the getDarwinDirective() method
have switched to the getCPUDirective() method, the old getDarwinDirective()
method will be removed.
Reviewers: nemanjai, hfinkel, power-llvm-team, jsji, echristo, #powerpc, jhibbits
Reviewed By: hfinkel, jsji, jhibbits
Subscribers: hiraditya, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70352
This patch provides support for peudo ops including ADDIStocHA8, ADDIStocHA, LWZtocL,
LDtoc, LDtocL for AIX, lowering them from MIR to assembly.
Differential Revision: https://reviews.llvm.org/D68341
llvm-svn: 375113
Doing this makes MSVC complain that `empty(someRange)` could refer to
either C++17's std::empty or LLVM's llvm::empty, which previously we
avoided via SFINAE because std::empty is defined in terms of an empty
member rather than begin and end. So, switch callers over to the new
method as it is added.
https://reviews.llvm.org/D68439
llvm-svn: 373935
Store rlwinm Rx, Ry, 32, 0, 31 as rlwinm Rx, Ry, 0, 0, 31 and store
rldicl Rx, Ry, 64, 0 as rldicl Rx, Ry, 0, 0. Otherwise SH field is overflow and
fails assertion in assembly printing stage.
Differential Revision: https://reviews.llvm.org/D66991
llvm-svn: 373519
Neither the base implementation of findCommutedOpIndices nor any in-tree target modifies the instruction passed in and there is no reason why they would in the future.
Committed on behalf of @hvdijk (Harald van Dijk)
Differential Revision: https://reviews.llvm.org/D66138
llvm-svn: 372882
Recommit: fix asan errors.
The way MachinePipeliner uses these target hooks is stateful - we reduce trip
count by one per call to reduceLoopCount. It's a little overfit for hardware
loops, where we don't have to worry about stitching a loop induction variable
across prologs and epilogs (the induction variable is implicit).
This patch introduces a new API:
/// Analyze loop L, which must be a single-basic-block loop, and if the
/// conditions can be understood enough produce a PipelinerLoopInfo object.
virtual std::unique_ptr<PipelinerLoopInfo>
analyzeLoopForPipelining(MachineBasicBlock *LoopBB) const;
The return value is expected to be an implementation of the abstract class:
/// Object returned by analyzeLoopForPipelining. Allows software pipelining
/// implementations to query attributes of the loop being pipelined.
class PipelinerLoopInfo {
public:
virtual ~PipelinerLoopInfo();
/// Return true if the given instruction should not be pipelined and should
/// be ignored. An example could be a loop comparison, or induction variable
/// update with no users being pipelined.
virtual bool shouldIgnoreForPipelining(const MachineInstr *MI) const = 0;
/// Create a condition to determine if the trip count of the loop is greater
/// than TC.
///
/// If the trip count is statically known to be greater than TC, return
/// true. If the trip count is statically known to be not greater than TC,
/// return false. Otherwise return nullopt and fill out Cond with the test
/// condition.
virtual Optional<bool>
createTripCountGreaterCondition(int TC, MachineBasicBlock &MBB,
SmallVectorImpl<MachineOperand> &Cond) = 0;
/// Modify the loop such that the trip count is
/// OriginalTC + TripCountAdjust.
virtual void adjustTripCount(int TripCountAdjust) = 0;
/// Called when the loop's preheader has been modified to NewPreheader.
virtual void setPreheader(MachineBasicBlock *NewPreheader) = 0;
/// Called when the loop is being removed.
virtual void disposed() = 0;
};
The Pipeliner (ModuloSchedule.cpp) can use this object to modify the loop while
allowing the target to hold its own state across all calls. This API, in
particular the disjunction of creating a trip count check condition and
adjusting the loop, improves the code quality in ModuloSchedule.cpp.
llvm-svn: 372463
The way MachinePipeliner uses these target hooks is stateful - we reduce trip
count by one per call to reduceLoopCount. It's a little overfit for hardware
loops, where we don't have to worry about stitching a loop induction variable
across prologs and epilogs (the induction variable is implicit).
This patch introduces a new API:
/// Analyze loop L, which must be a single-basic-block loop, and if the
/// conditions can be understood enough produce a PipelinerLoopInfo object.
virtual std::unique_ptr<PipelinerLoopInfo>
analyzeLoopForPipelining(MachineBasicBlock *LoopBB) const;
The return value is expected to be an implementation of the abstract class:
/// Object returned by analyzeLoopForPipelining. Allows software pipelining
/// implementations to query attributes of the loop being pipelined.
class PipelinerLoopInfo {
public:
virtual ~PipelinerLoopInfo();
/// Return true if the given instruction should not be pipelined and should
/// be ignored. An example could be a loop comparison, or induction variable
/// update with no users being pipelined.
virtual bool shouldIgnoreForPipelining(const MachineInstr *MI) const = 0;
/// Create a condition to determine if the trip count of the loop is greater
/// than TC.
///
/// If the trip count is statically known to be greater than TC, return
/// true. If the trip count is statically known to be not greater than TC,
/// return false. Otherwise return nullopt and fill out Cond with the test
/// condition.
virtual Optional<bool>
createTripCountGreaterCondition(int TC, MachineBasicBlock &MBB,
SmallVectorImpl<MachineOperand> &Cond) = 0;
/// Modify the loop such that the trip count is
/// OriginalTC + TripCountAdjust.
virtual void adjustTripCount(int TripCountAdjust) = 0;
/// Called when the loop's preheader has been modified to NewPreheader.
virtual void setPreheader(MachineBasicBlock *NewPreheader) = 0;
/// Called when the loop is being removed.
virtual void disposed() = 0;
};
The Pipeliner (ModuloSchedule.cpp) can use this object to modify the loop while
allowing the target to hold its own state across all calls. This API, in
particular the disjunction of creating a trip count check condition and
adjusting the loop, improves the code quality in ModuloSchedule.cpp.
llvm-svn: 372376
Summary:
Since the SPE4RC register class contains an identical set of registers
and an identical spill size to the GPRC class its slightly confusing
the tablegen emitter. It's preventing the GPRC_and_GPRC_NOR0 synthesized
register class from inheriting VTs and AltOrders from GPRC or GPRC_NOR0.
This is because SPE4C is found first in the super register class list
when inheriting these properties and it doesn't set the VTs or
AltOrders the same way as GPRC or GPRC_NOR0.
This patch replaces all uses of GPE4RC with GPRC and allows GPRC and
GPRC_NOR0 to contain f32.
The test changes here are because the AltOrders are being inherited
to GPRC_NOR0 now.
Found while trying to determine if getCommonSubClass needs to take
a VT argument. It was originally added to support fp128 on x86-64,
I've changed some things about that so that it might be needed
anymore. But a PowerPC test crashed without it and I think its
due to this subclass issue.
Reviewers: jhibbits, nemanjai, kbarton, hfinkel
Subscribers: wuzish, nemanjai, mehdi_amini, hiraditya, kbarton, MaskRay, dexonsmith, jsji, shchenz, steven.zhang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67513
llvm-svn: 371779
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).
Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
Depends on D65919
Reviewers: arsenm, bogner, craig.topper, RKSimon
Reviewed By: arsenm
Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65962
llvm-svn: 369041
Summary:
xxspltib/vspltisb are 3 cycle PM instructions,
xxleqv is 2 cycle ALU instruction.
We should use xxleqv to set all one vectors.
Reviewers: hfinkel, nemanjai, steven.zhang
Subscribers: hiraditya, kbarton, MaskRay, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65529
llvm-svn: 369006
Summary:
In PostRA phase, we often have to find out the most recent definition
of a register. This patch adds getDefMIPostRA so that other methods
can use it rather than implementing it repeatedly.
Differential Revision: https://reviews.llvm.org/D65131
llvm-svn: 366990
Summary:
Since we are planning to add ADDIStocHA for 32bit in later patch, we decided
to change 64bit one first to follow naming convention with 8 behind opcode.
Patch by: Xiangling_L
Differential Revision: https://reviews.llvm.org/D64814
llvm-svn: 366731
Summary:
Missed in the original commit, use the correct callee-saved register
list for spilling, instead of the standard SVR432 list. This avoids
needlessly spilling the SPE non-volatile registers when they're not used.
As part of this, also add where missing, and sort, the spill opcode
checks for SPE and SPE4 register classes.
Reviewers: nemanjai, hfinkel, joerg
Subscribers: kbarton, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D56703
llvm-svn: 366319
After implemented this hook, we will model the memory dependency in the scheduling dependency graph more precise,
and will have more opportunity to reorder the load/stores, as they didn't have the dependency at some condition
Differential Revision: https://reviews.llvm.org/D63804
llvm-svn: 364886
Avoids using a plain unsigned for registers throughoug codegen.
Doesn't attempt to change every register use, just something a little
more than the set needed to build after changing the return type of
MachineOperand::getReg().
llvm-svn: 364191
Implement necessary target hooks to enable MachinePipeliner for P9 only.
The pass is off by default, can be enabled with -ppc-enable-pipeliner for P9.
Differential Revision: https://reviews.llvm.org/D62164
llvm-svn: 363085
We currently miss the opportunities for optmizing comparisons in the peephole
optimizer if the input is the result of a COPY since we look for record-form
versions of the producing instruction.
This patch simply lets the optimization peek through copies.
Differential revision: https://reviews.llvm.org/D59633
llvm-svn: 362438
Summary:
It looks like since INLINEASM_BR was created off of INLINEASM (r353563),
a few checks for INLINEASM needed to be updated to check for either
case.
pr/41999
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: nemanjai, hiraditya, kbarton, jsji, llvm-commits, craig.topper, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62403
llvm-svn: 362278
For the situation, where we generate the following code:
crxor 8, 8, 8
< Some instructions>
.LBB0_1:
< Some instructions>
cror 1, 8, 8
cror (COPY of CRbit) depends on the result of the crxor instruction.
CR8 is known to be zero as crxor is equivalent to CRUNSET. We can simply use
crxor 1, 1, 1 instead to zero out CR1, which does not have any dependency on
any previous instruction.
This patch will optimize it to:
< Some instructions>
.LBB0_1:
< Some instructions>
cror 1, 1, 1
Patch By: Victor Huang (NeHuang)
Differential Revision: https://reviews.llvm.org/D62044
llvm-svn: 361632
Vector imm setting instructions like XXLXORz/XXLXORspz/XXLXORdpz
Should behave like LI8.
We should set corresponding flags to allow rematerialization and other
opts in LICM, RA, Scheduling etc.
Differential Revision: https://reviews.llvm.org/D58645
llvm-svn: 355948
This was found when we generated COPY from G8RC to F8RC in
EmitInstrWithCustomInserter without checking proper architecture,
we silently generated mtvsrd, which require P8 and up.
This is a NFC patch to add assert when we call copyPhysReg, in case
someone accidentally generate COPY between G8RC to F8RC for P7 and
below.
llvm-svn: 355920
Move the stdu instruction in the prologue and epilogue.
This should provide a small performance boost in functions that are able to do
this. I've kept this change rather conservative at the moment and functions
with frame pointers or base pointers will not try to move the stack pointer
update.
Differential Revision: https://reviews.llvm.org/D42590
llvm-svn: 355085
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
If we are changing the MI operand from Reg to Imm, we need also handle its implicit use if have.
Differential Revision: https://reviews.llvm.org/D56078
llvm-svn: 350115
Fix assert about using an undefined physical register in machine instruction verify pass.
The reason is that register flag undef is missing when doing transformation from If Conversion Pass.
```
Bad machine code: Using an undefined physical register
- function: func_65
- basic block: %bb.0 entry (0x10024740738)
- instruction: BCLR killed $cr5lt, implicit $lr8, implicit $rm, implicit undef $x3
- operand 0: killed $cr5lt
LLVM ERROR: Found 1 machine code errors.
```
There are also other existing testcases with same issue. So I add -verify-machineinstrs option to open verifying.
Differential Revision: https://reviews.llvm.org/D55408
llvm-svn: 348566
Summary:
There are 4 instructions which have Inconsistent ImmMustBeMultipleOf in the
function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD.
These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D54738
llvm-svn: 348109
This reverts commits r347532. Forget add the option
-mtriple powerpc64-unknown-linux-gnu. So other platform is error except
for PowerPC.
llvm-svn: 347534
Summary:
There are 4 instructions which have Inconsistent ImmMustBeMultipleOf in the
function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD.
These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D54738
llvm-svn: 347532
The D-Form VSX loads introduced in ISA 3.0 are not direct D-Form equivalent of
the corresponding X-Forms since they only target the Altivec registers.
Namely LXSSPX can load into any of the 64 VSX registers whereas LXSSP can only
load into the upper 32 VSX registers. Similarly with the remaining affected
instructions.
There is currently no way that I can see to trigger the bug, but as we add other
ways of exploiting these instructions, there may very well be instances that do.
This is an NFC patch in practical terms since the changes it introduces can not
be triggered without an MIR test.
Differential revision: https://reviews.llvm.org/D53323
llvm-svn: 344894
Going from XForm Load to DSForm Load requires that the immediate be 4 byte
aligned.
If we are not aligned we must leave the load as LDX (XForm).
This bug is causing a compile-time failure in the benchmark h264ref.
Differential Revision: https://reviews.llvm.org/D51988
llvm-svn: 343525
This is a follow-up to the previous patch that eliminated some of the rotates.
With this addition, we will also emit the record-form andis.
This patch increases the number of record-form rotates we eliminate by
more than 70%.
Differential revision: https://reviews.llvm.org/D44897
llvm-svn: 342478
Both ANDIo and ANDISo (and the 64-bit versions) are record-form instructions.
When optimizing compares, we handle the former in order to eliminate the compare
instruction but not the latter. This patch just adds the latter to the set of
instructions we optimize.
The reason these instructions need to be handled separately is that they are not
part of the RecFormRel map (since they don't have a non-record-form). The
missing "and-immediate-shifted" is just an oversight in the initial
implementation.
Differential revision: https://reviews.llvm.org/D51353
llvm-svn: 342472
This patch will address using the xscpsgndp instruction to copy floating point
scalar registers instead of the xxlor (specifically XXLORf) instruction that is
currently used. Additionally, this patch of utilizing xscpsgndp will apply to
P9, while pre-P9 will still use xxlor.
Patch by amyk
Differential Revision: https://reviews.llvm.org/D50004
llvm-svn: 340643
If the arch is P8, we will select XFLOAD to load the floating point, and then, expand it to vsx and non-vsx X-form instruction post RA. This patch is trying to convert the X-form to D-form if it meets the requirement that one operand of the x-form inst is the special Zero register, and another operand fed by add inst. i.e.
y = add imm, reg
LFDX. 0, y
-->
LFD imm(reg)
Reviewers: Nemanjai
Differential Revision: https://reviews.llvm.org/D49007
llvm-svn: 340149
Summary:
The Signal Processing Engine (SPE) is found on NXP/Freescale e500v1,
e500v2, and several e200 cores. This adds support targeting the e500v2,
as this is more common than the e500v1, and is in SoCs still on the
market.
This patch is very intrusive because the SPE is binary incompatible with
the traditional FPU. After discussing with others, the cleanest
solution was to make both SPE and FPU features on top of a base PowerPC
subset, so all FPU instructions are now wrapped with HasFPU predicates.
Supported by this are:
* Code generation following the SPE ABI at the LLVM IR level (calling
conventions)
* Single- and Double-precision math at the level supported by the APU.
Still to do:
* Vector operations
* SPE intrinsics
As this changes the Callee-saved register list order, one test, which
tests the precise generated code, was updated to account for the new
register order.
Reviewed by: nemanjai
Differential Revision: https://reviews.llvm.org/D44830
llvm-svn: 337347
Revision r322373 fixed a bug in how we materialize constants when the CR-field
needs to be set.
However the fix is overly conservative. It will only do the transform if
AND-ing the input with the new constant produces the same new constant.
This is of course correct, but not necessarily required.
If there are no futher uses of the constant, the constant can be changed.
If there are no uses of the GPR result, the final result of the materialization
isn't important other than it needs to compare to zero correctly (lt, gt, eq).
Differential revision: https://reviews.llvm.org/D42109
llvm-svn: 337008
and expand it post RA basing on the register pressure. However, we miss to do the add-imm peephole for these pseudo instruction.
Differential Revision: https://reviews.llvm.org/D47568
Reviewed By: Nemanjai
llvm-svn: 335024
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=37039
The condition only covers one of the two 64-bit rotate instructions. This just
adds the second (RLDICLo).
Patch by Josh Stone.
llvm-svn: 329852
A new function getOpcodeForSpill should now be the only place to get
the opcode for a given spilled register.
Differential Revision: https://reviews.llvm.org/D43086
llvm-svn: 328556
This patch adds functions to allow MachineLICM to hoist invariant stores.
Currently, MachineLICM does not hoist any store instructions, however
when storing the same value to a constant spot on the stack, the store
instruction should be considered invariant and be hoisted. The function
isInvariantStore iterates each operand of the store instruction and checks
that each register operand satisfies isCallerPreservedPhysReg. The store
may be fed by a copy, which is hoisted by isCopyFeedingInvariantStore.
This patch also adds the PowerPC changes needed to consider the stack
register as caller preserved.
Differential Revision: https://reviews.llvm.org/D40196
llvm-svn: 328326
This patch adds functions to allow MachineLICM to hoist invariant stores.
Currently, MachineLICM does not hoist any store instructions, however
when storing the same value to a constant spot on the stack, the store
instruction should be considered invariant and be hoisted. The function
isInvariantStore iterates each operand of the store instruction and checks
that each register operand satisfies isCallerPreservedPhysReg. The store
may be fed by a copy, which is hoisted by isCopyFeedingInvariantStore.
This patch also adds the PowerPC changes needed to consider the stack
register as caller preserved.
Differential Revision: https://reviews.llvm.org/D40196
llvm-svn: 327856
Up until Power9, the performance profile for rlwinm., rldicl. and andi. looked
more or less equivalent. However with Power9, the rotates are still 2-way
cracked whereas the and-immediate is not.
This patch just ensures that we don't emit record-form rotates when an andi.
is adequate.
As first pointed out by Carrot in https://bugs.llvm.org/show_bug.cgi?id=30833
(this patch is a fix for that PR).
Differential Revision: https://reviews.llvm.org/D43977
llvm-svn: 326736
Revision 320791 introduced a pass that transforms reg+reg instructions to
reg+imm if they're fed by "load immediate". However, it didn't
handle out-of-range shifts correctly as reported in PR35688.
This patch fixes that and therefore the PR.
Furthermore, there was undefined behaviour in the patch where the RHS of an
initialization expression was 32 bits and constant `1` was shifted left 32
bits. This was fixed by ensuring the RHS is 64 bits just like the LHS.
Differential Revision: https://reviews.llvm.org/D41369
llvm-svn: 321551
This patch adds the necessary infrastructure to convert instructions that
take two register operands to those that take a register and immediate if
the necessary operand is produced by a load-immediate. Furthermore, it uses
this infrastructure to perform such conversions twice - first at MachineSSA
and then pre-emit.
There are a number of reasons we may end up with opportunities for this
transformation, including but not limited to:
- X-Form instructions chosen since the exact offset isn't available at ISEL time
- Atomic instructions with constant operands (we will add patterns for this
in the future)
- Tail duplication may duplicate code where one block contains this redundancy
- When emitting compare-free code in PPCDAGToDAGISel, we don't handle constant
comparands specially
Furthermore, this patch moves the initialization of PPCMIPeepholePass so that
it can be used for MIR tests.
llvm-svn: 320791
Work towards the unification of MIR and debug output by printing
`@foo` instead of `<ga:@foo>`.
Also print target flags in the MIR format since most of them are used on
global address operands.
Only debug syntax is affected.
llvm-svn: 320682
Headers/Implementation files should be named after the class they
declare/define.
Also eliminated an `#include "llvm/CodeGen/LiveIntervalAnalysis.h"` in
favor of `class LiveIntarvals;`
llvm-svn: 320546
As part of the unification of the debug format and the MIR format, avoid
printing "vreg" for virtual registers (which is one of the current MIR
possibilities).
Basically:
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/%vreg([0-9]+)/%\1/g"
* grep -nr '%vreg' . and fix if needed
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/ vreg([0-9]+)/ %\1/g"
* grep -nr 'vreg[0-9]\+' . and fix if needed
Differential Revision: https://reviews.llvm.org/D40420
llvm-svn: 319427
Separate the handling of AND/AND8 out from PHI/OR/ISEL checking. The reasoning
is the others need all their operands to be sign/zero extended for their output
to also be sign/zero extended. This is true for AND and sign-extension, but for
zero-extension we only need at least one of the input operands to be zero
extended for the result to also be zero extended.
Differential Revision: https://reviews.llvm.org/D39078
llvm-svn: 319289
As part of the unification of the debug format and the MIR format,
always print registers as lowercase.
* Only debug printing is affected. It now follows MIR.
Differential Revision: https://reviews.llvm.org/D40417
llvm-svn: 319187
This patch adds a peep hole optimization to remove any redundant toc save
instructions added as part of the call sequence for indirect calls. It removes
any toc saves within a function that are dominated by another toc save.
Differential Revision: https://reviews.llvm.org/D39736
llvm-svn: 319087
The VSX versions have the advantage of a full 64-register target whereas the FP
ones have the advantage of lower latency and higher throughput. So what we’re
after is using the faster instructions in low register pressure situations and
using the larger register file in high register pressure situations.
The heuristic chooses between the following 7 pairs of instructions.
PPC::LXSSPX vs PPC::LFSX
PPC::LXSDX vs PPC::LFDX
PPC::STXSSPX vs PPC::STFSX
PPC::STXSDX vs PPC::STFDX
PPC::LXSIWAX vs PPC::LFIWAX
PPC::LXSIWZX vs PPC::LFIWZX
PPC::STXSIWX vs PPC::STFIWX
Differential Revision: https://reviews.llvm.org/D38486
llvm-svn: 318651
Currently a record-form instruction is used for comparison of "greater than -1" and "less than 1" by modifying the predicate (e.g. LT 1 into LE 0) in addition to the naive case of comparison against 0.
This patch also enables emitting a record-form instruction for "less than or equal to -1" (i.e. "less than 0") and "greater than or equal to 1" (i.e. "greater than 0") to increase the optimization opportunities.
Differential Revision: https://reviews.llvm.org/D38941
llvm-svn: 316647
Helper functions to identify sign- and zero-extending machine instruction is introduced in rL315888.
This patch makes PPCInstrInfo::optimizeCompareInstr use the helper functions. It simplifies the code and also makes possible more optimizations since the helper can do more analysis than the original check code; I observed about 5000 more compare instructions are eliminated while building LLVM.
Also, this patch fixes a bug in helpers on ANDIo instruction handling due to the order of checks. This bug causes a failure in an existing test case for optimizeCompareInstr.
Differential Revision: https://reviews.llvm.org/D38988
llvm-svn: 316071
This patch enables redundant sign- and zero-extension elimination in PowerPC MI Peephole pass.
If the input value of a sign- or zero-extension is known to be already sign- or zero-extended, the operation is redundant and can be eliminated.
One common case is sign-extensions for a method parameter or for a method return value; they must be sign- or zero-extended as defined in PPC ELF ABI.
For example of the following simple code, two extsw instructions are generated before the invocation of int_func and before the return. With this patch, both extsw are eliminated.
void int_func(int);
void ii_test(int a) {
if (a & 1) return int_func(a);
}
Such redundant sign- or zero-extensions are quite common in many programs; e.g. I observed about 60,000 occurrences of the elimination while compiling the LLVM+CLANG.
Differential Revision: https://reviews.llvm.org/D31319
llvm-svn: 315888
Currently we produce a bunch of unnecessary code when emitting the
prologue/epilogue for spills/restores. Namely, if the load from stack
slot/store to stack slot instruction is an X-Form instruction, we will
always produce an LIS/ORI sequence for the stack offset.
Furthermore, we have not exploited the P9 vector D-Form loads/stores for this
purpose.
This patch address both issues.
Specifying the D-Form load as the instruction to use for stack spills/reloads
should be safe because:
1. The stack should be aligned according to the ABI
2. If the stack isn't aligned, PPCRegisterInfo::eliminateFrameIndex() will
check for the offset being a multiple of 16 and will convert it to an
X-Form instruction if it isn't.
Differential Revision : https://reviews.llvm.org/D38758
llvm-svn: 315500
This patch makes analyzeBranch eliminate unconditional branch to the next instruction.
After basic blocks are re-organized by optimizers, such as machine block placement, a BB may end with an unconditional branch to the next (fallthrough) BB. This patch removes such redundant branch instruction.
Differential Revision: https://reviews.llvm.org/D37730
llvm-svn: 314297
This patch updates register allocation to enable spilling gprs to
volatile vector registers rather than the stack. It can be enabled
for Power9 with option -ppc-enable-gpr-to-vsr-spills.
Differential Revision: https://reviews.llvm.org/D34815
llvm-svn: 313886
In optimizeCompareInstr, a compare instruction is eliminated by using a record form instruction if possible.
If the branch instruction that uses the result of the compare has a static branch hint, the optimization does not happen.
This patch makes this optimization happen regardless of the branch hint by splitting branch hint and branch condition before checking the predicate to identify the possible optimizations.
Differential Revision: https://reviews.llvm.org/D35801
llvm-svn: 309255
Define target hook isReallyTriviallyReMaterializable() to explicitly specify
PowerPC instructions that are trivially rematerializable. This will allow
the MachineLICM pass to accurately identify PPC instructions that should always
be hoisted.
Differential Revision: https://reviews.llvm.org/D34255
llvm-svn: 305932
This patch fixes a potential verification error (64-bit register operands for cmpw) with -verify-machineinstrs.
Differential Revision: https://reviews.llvm.org/D34208
llvm-svn: 305479
This patch builds upon https://reviews.llvm.org/rL302810 to add
handling for bitwise logical operations in general purpose registers.
The idea is to keep the values in GPRs as long as possible - only
extracting them to a condition register bit when no further operations
are to be done.
Differential Revision: https://reviews.llvm.org/D31851
llvm-svn: 304282
Summary
clang -c -mcpu=pwr9 test/CodeGen/PowerPC/build-vector-tests.ll causes an assertion failure during the binary encoding.
The failure occurs when a D-form load instruction takes two register operands instead of a register + an immediate.
This patch fixes the problem and also adds an assertion to catch this failure earlier before the binary encoding (i.e. during lit test).
The fix is from Nemanja Ivanovic @nemanjai.
Differential Revision: https://reviews.llvm.org/D33482
llvm-svn: 304133
PPC backend eliminates compare instructions by using record-form instructions in PPCInstrInfo::optimizeCompareInstr, which is called from peephole optimization pass.
This patch improves this optimization to eliminate more compare instructions in two types of common case.
- comparison against a constant 1 or -1
The record-form instructions set CR bit based on signed comparison against 0. So, the current implementation does not exploit the record-form instruction for comparison against a non-zero constant.
This patch enables record-form optimization for constant of 1 or -1 if possible; it changes the condition "greater than -1" into "greater than or equal to 0" and "less than 1" into "less than or equal to 0".
With this patch, compare can be eliminated in the following code sequence, as an example.
uint64_t a, b;
if ((a | b) & 0x8000000000000000ull) { ... }
else { ... }
- andi for 32-bit comparison on PPC64
Since record-form instructions execute 64-bit signed comparison and so we have limitation in eliminating 32-bit comparison, i.e. with cmplwi, using the record-form. The original implementation already has such checks but andi. is not recognized as an instruction which executes implicit zero extension and hence safe to convert into record-form if used for equality check.
%1 = and i32 %a, 10
%2 = icmp ne i32 %1, 0
br i1 %2, label %foo, label %bar
In this simple example, LLVM generates andi. + cmplwi + beq on PPC64.
This patch make it possible to eliminate the cmplwi for this case.
I added andi. for optimization targets if it is safe to do so.
Differential Revision: https://reviews.llvm.org/D30081
llvm-svn: 303500
Summary:
This fixes pr32392.
The lowering pipeline is:
llvm.ppc.cfence in IR -> PPC::CFENCE8 in isel -> Actual instructions in
expandPostRAPseudo.
The reason why expandPostRAPseudo is chosen is because previous passes
are likely eliminating instructions like cmpw 3, 3 (early CSE) and bne-
7, .+4 (some branch pass(s)).
Differential Revision: https://reviews.llvm.org/D32763
llvm-svn: 303205
In addition to the original commit, tighten the condition for when to
pad empty functions to COFF Windows. This avoids running into problems
when targeting e.g. Win32 AMDGPU, which caused test failures when this
was committed initially.
llvm-svn: 301047
Empty functions can lead to duplicate entries in the Guard CF Function
Table of a binary due to multiple functions sharing the same RVA,
causing the kernel to refuse to load that binary.
We had a terrific bug due to this in Chromium.
It turns out we were already doing this for Mach-O in certain
situations. This patch expands the code for that in
AsmPrinter::EmitFunctionBody() and renames
TargetInstrInfo::getNoopForMachoTarget() to simply getNoop() since it
seems it was used for not just Mach-O anyway.
Differential Revision: https://reviews.llvm.org/D32330
llvm-svn: 301040
Summary:
powerpc64 big-endian is not supported, but I believe that most logic can
be shared, except for xray_powerpc64.cc.
Also add a function InvalidateInstructionCache to xray_util.h, which is
copied from llvm/Support/Memory.cpp. I'm not sure if I need to add a unittest,
and I don't know how.
Reviewers: dberris, echristo, iteratee, kbarton, hfinkel
Subscribers: mehdi_amini, nemanjai, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D29742
llvm-svn: 294781
Generally, the ISEL is expanded into if-then-else sequence, in some
cases (like when the destination register is the same with the true
or false value register), it may just be expanded into just the if
or else sequence.
llvm-svn: 292154
Generally, the ISEL is expanded into if-then-else sequence, in some
cases (like when the destination register is the same with the true
or false value register), it may just be expanded into just the if
or else sequence.
llvm-svn: 292128
Rename from addOperand to just add, to match the other method that has been
added to MachineInstrBuilder for adding more than just 1 operand.
See https://reviews.llvm.org/D28057 for the whole discussion.
Differential Revision: https://reviews.llvm.org/D28556
llvm-svn: 291891
This patch corresponds to review:
The newly added VSX D-Form (register + offset) memory ops target the upper half
of the VSX register set. The existing ones target the lower half. In order to
unify these and have the ability to target all the VSX registers using D-Form
operations, this patch defines Pseudo-ops for the loads/stores which are
expanded post-RA. The expansion then choses the correct opcode based on the
register that was allocated for the operation.
llvm-svn: 283212
This patch corresponds to review:
https://reviews.llvm.org/D23155
This patch removes the VSHRC register class (based on D20310) and adds
exploitation of the Power9 sub-word integer loads into VSX registers as well
as vector sign extensions.
The new instructions are useful for a few purposes:
Int to Fp conversions of 1 or 2-byte values loaded from memory
Building vectors of 1 or 2-byte integers with values loaded from memory
Storing individual 1 or 2-byte elements from integer vectors
This patch implements all of those uses.
llvm-svn: 283190
This patch corresponds to review:
https://reviews.llvm.org/D19825
The new lxvx/stxvx instructions do not require the swaps to line the elements
up correctly. In order to select them over the lxvd2x/lxvw4x instructions which
require swaps, the patterns for the old instruction have a predicate that
ensures they won't be selected on Power9 and newer CPUs.
llvm-svn: 282143
Avoid implicit conversions from MachineInstrBundleIterator to
MachineInstr* in the PowerPC backend, mainly by preferring MachineInstr&
over MachineInstr* when a pointer isn't nullable and using range-based
for loops.
There was one piece of questionable code in PPCInstrInfo::AnalyzeBranch,
where a condition checked a pointer converted from an iterator for
nullptr. Since this case is impossible (moreover, the code above
guarantees that the iterator is valid), I removed the check when I
changed the pointer to a reference.
Despite that case, there should be no functionality change here.
llvm-svn: 276864
This is mostly a mechanical change to make TargetInstrInfo API take
MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator)
when the argument is expected to be a valid MachineInstr. This is a
general API improvement.
Although it would be possible to do this one function at a time, that
would demand a quadratic amount of churn since many of these functions
call each other. Instead I've done everything as a block and just
updated what was necessary.
This is mostly mechanical fixes: adding and removing `*` and `&`
operators. The only non-mechanical change is to split
ARMBaseInstrInfo::getOperandLatencyImpl out from
ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a
`MachineInstr*` which it updated to the instruction bundle leader; now,
the latter calls the former either with the same `MachineInstr&` or the
bundle leader.
As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.
Note: I updated WebAssembly, Lanai, and AVR (despite being
off-by-default) since it turned out to be easy. I couldn't run tests
for AVR since llc doesn't link with it turned on.
llvm-svn: 274189
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.
llvm-svn: 272512
Fix PR27943 "Bad machine code: Using an undefined physical register".
SUBFC8 implicitly defines the CR0 register, but this was omitted in
the instruction definition.
Patch by Jameson Nash <jameson@juliacomputing.com>
Reviewers: hfinkel
Differential Revision: http://reviews.llvm.org/D20802
llvm-svn: 271425
This patch corresponds to review:
http://reviews.llvm.org/D19683
Simply adds the bits for being able to specify -mcpu=pwr9 to the back end.
llvm-svn: 268950
Resolve Bug 27046 (https://llvm.org/bugs/show_bug.cgi?id=27046).
The PPCInstrInfo::optimizeCompareInstr function could create a new use of
CR0, even if CR0 were previously dead. This patch marks CR0 live if a use of
CR0 is created.
Author: Tom Jablin (tjablin)
Reviewers: hfinkel kbarton cycheng
http://reviews.llvm.org/D18884
llvm-svn: 266040
Change TargetInstrInfo API to take `MachineInstr&` instead of
`MachineInstr*` in the functions related to predicated instructions
(I'll try to come back later and get some of the rest). All of these
functions require non-null parameters already, so references are more
clear. As a bonus, this happens to factor away a host of implicit
iterator => pointer conversions.
No functionality change intended.
llvm-svn: 261605
Some compilers don't do exhaustive switch checking. For those compilers,
add an initialization to prevent un-initialized variable warnings from
firing. For compilers with exhaustive switch checking, we still get a
guarantee that the switch is exhaustive, and hence the initializations
are redundant, and a non-functional change.
llvm-svn: 257923
Only non-weighted predicates were handled in PPCInstrInfo::insertSelect. Handle
the weighted predicates as well.
This latent bug was triggered by r255398, because it added use of the
branch-weighted predicates.
While here, switch over an enum instead of an int to get the compiler to enforce
totality in the future.
llvm-svn: 257518
Also, remove an enum hack where enum values were used as indexes into an array.
We may want to make this a real class to allow pattern-based queries/customization (D13417).
llvm-svn: 252196
To commute a trivial rlwimi instructions (meaning one with a full mask and zero
shift), we'd need to ability to form an all-zero mask (instead of an all-one
mask) using rlwimi. We can't represent this, however, and we'll miscompile code
if we try.
The code quality problem that this highlights (that SDAG simplification can
lead to us generating an ISD::OR node with a constant zero LHS) will be fixed
as a follow-up.
Fixes PR24719.
llvm-svn: 246937
Add support for MIR serialization of PowerPC-specific operand target flags
(based on the generic infrastructure added in r244185 and r245383).
I won't even pretend that this is good test coverage, but this includes the
regression test associated with r246372. Adding an MIR test for that fix is far
superior to adding an IR-level test because particular instruction-scheduling
decisions are necessary in order to expose the bug, and using an MIR test we
can start the pipeline post-scheduling.
llvm-svn: 246373
This commit removes the global manager variable which is responsible for
storing and allocating pseudo source values and instead it introduces a new
manager class named 'PseudoSourceValueManager'. Machine functions now own an
instance of the pseudo source value manager class.
This commit also modifies the 'get...' methods in the 'MachinePointerInfo'
class to construct pseudo source values using the instance of the pseudo
source value manager object from the machine function.
This commit updates calls to the 'get...' methods from the 'MachinePointerInfo'
class in a lot of different files because those calls now need to pass in a
reference to a machine function to those methods.
This change will make it easier to serialize pseudo source values as it will
enable me to transform the mips specific MipsCallEntry PseudoSourceValue
subclass into two target independent subclasses.
Reviewers: Akira Hatanaka
llvm-svn: 244693
This is a direct port of the code from the X86 backend (r239486/r240361), which
uses the MachineCombiner to reassociate (floating-point) adds/muls to increase
ILP, to the PowerPC backend. The rationale is the same.
There is a lot of copy-and-paste here between the X86 code and the PowerPC
code, and we should extract at least some of this into CodeGen somewhere.
However, I don't want to do that until this code is enhanced to handle FMAs as
well. After that, we'll be in a better position to extract the common parts.
llvm-svn: 242279
PowerPC uses itineraries to describe processor pipelines (and dispatch-group
restrictions for P7/P8 cores). Unfortunately, the target-independent
implementation of TII.getInstrLatency calls ItinData->getStageLatency, and that
looks for the largest cycle count in the pipeline for any given instruction.
This, however, yields the wrong answer for the PPC itineraries, because we
don't encode the full pipeline. Because the functional units are fully
pipelined, we only model the initial stages (there are no relevant hazards in
the later stages to model), and so the technique employed by getStageLatency
does not really work. Instead, we should take the maximum output operand
latency, and that's what PPCInstrInfo::getInstrLatency now does.
This caused some test-case churn, including two unfortunate side effects.
First, the new arrangement of copies we get from function parameters now
sometimes blocks VSX FMA mutation (a FIXME has been added to the code and the
test cases), and we have one significant test-suite regression:
SingleSource/Benchmarks/BenchmarkGame/spectral-norm
56.4185% +/- 18.9398%
In this benchmark we have a loop with a vectorized FP divide, and it with the
new scheduling both divides end up in the same dispatch group (which in this
case seems to cause a problem, although why is not exactly clear). The grouping
structure is hard to predict from the bottom of the loop, and there may not be
much we can do to fix this.
Very few other test-suite performance effects were really significant, but
almost all weakly favor this change. However, in light of the issues
highlighted above, I've left the old behavior available via a
command-line flag.
llvm-svn: 242188
This patch corresponds to review:
http://reviews.llvm.org/D9440
It adds a new register class to the PPC back end to contain single precision
values in VSX registers. Additionally, it adds scalar loads and stores for
VSX registers.
llvm-svn: 236755
This patch adds Hardware Transaction Memory (HTM) support supported by ISA 2.07
(POWER8). The intrinsic support is based on GCC one [1], but currently only the
'PowerPC HTM Low Level Built-in Function' are implemented.
The HTM instructions follows the RC ones and the transaction initiation result
is set on RC0 (with exception of tcheck). Currently approach is to create a
register copy from CR0 to GPR and comapring. Although this is suboptimal, since
the branch could be taken directly by comparing the CR0 value, it generates code
correctly on both test and branch and just return value. A possible future
optimization could be elimitate the MFCR instruction to branch directly.
The HTM usage requires a recently newer kernel with PPC HTM enabled. Tested on
powerpc64 and powerpc64le.
This is send along a clang patch to enabled the builtins and option switch.
[1] https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Hardware-Transactional-Memory-Built-in-Functions.html
Phabricator Review: http://reviews.llvm.org/D8247
llvm-svn: 233204
This adds support for the QPX vector instruction set, which is used by the
enhanced A2 cores on the IBM BG/Q supercomputers. QPX vectors are 256 bytes
wide, holding 4 double-precision floating-point values. Boolean values, modeled
here as <4 x i1> are actually also represented as floating-point values
(essentially { -1, 1 } for { false, true }). QPX shares many features with
Altivec and VSX, but is distinct from both of them. One major difference is
that, instead of adding completely-separate vector registers, QPX vector
registers are extensions of the scalar floating-point registers (lane 0 is the
corresponding scalar floating-point value). The operations supported on QPX
vectors mirrors that supported on the scalar floating-point values (with some
additional ones for permutations and logical/comparison operations).
I've been maintaining this support out-of-tree, as part of the bgclang project,
for several years. This is not the entire bgclang patch set, but is most of the
subset that can be cleanly integrated into LLVM proper at this time. Adding
this to the LLVM backend is part of my efforts to rebase bgclang to the current
LLVM trunk, but is independently useful (especially for codes that use LLVM as
a JIT in library form).
The assembler/disassembler test coverage is complete. The CodeGen test coverage
is not, but I've included some tests, and more will be added as follow-up work.
llvm-svn: 230413
Our register allocation has become better recently, it seems, and is now
starting to generate cross-block copies into inflated register classes. These
copies are not transformed into subregister insertions/extractions by the
PPCVSXCopy class, and so need to be handled directly by
PPCInstrInfo::copyPhysReg. The code to do this was *almost* there, but not
quite (it was unnecessarily restricting itself to only the direct
sub/super-register-class case (not copying between, for example, something in
VRRC and the lower-half of VSRC which are super-registers of F8RC).
Triggering this behavior manually is difficult; I'm including two
bugpoint-reduced test cases from the test suite.
llvm-svn: 229457
PPCInstrInfo.cpp has ended up containing several small MI-level passes, and
this is making the file harder to read than necessary. Split out
PPCEarlyReturn into its own source file. NFC.
Now that PPCInstrInfo.cpp does not also contain pass implementations, I hope
that it will be slightly less unwieldy.
llvm-svn: 227775
PPCInstrInfo.cpp has ended up containing several small MI-level passes, and
this is making the file harder to read than necessary. Split out
PPCVSXCopy into its own source file. NFC.
llvm-svn: 227771