This patch adds spilling for the new WACC registers.
In order to get the spilling test to work the MMA instructions from Power 10 are
now supported for Future CPU except that they are all using the new WACC
registers instead of the ACC registers from Power 10.
Reviewed By: amyk, saghir
Differential Revision: https://reviews.llvm.org/D136728
This patch teaches getStoreOpcodesForSpillArray and
getLoadOpcodesForSpillArray to return ArrayRef. This way,
isLoadFromStackSlot and isStoreToStackSlot can use llvm::is_contained.
Summary: We currently optimize the comparison only in SSA, therefore we will miss some optimization opportunities where the input of comparison is lowered from COPY in post-RA.
Ie. ExpandPostRA::LowerCopy is called after PPCInstrInfo::optimizeCompareInstr.
This patch optimizes the comparison in post-RA and only the cases that compare against zero can be handled.
D131374 converts the comparison and its user to a compare against zero with the appropriate predicate on the branch, which creates additional opportunities for this patch.
Reviewed By: shchenz, lkail
Differential Revision: https://reviews.llvm.org/D131873
doesn't happened in peephole optimizer.
Summary: Converting a comparison against 1 or -1 into a comparison
against 0 can exploit record-form instructions for comparison optimization.
The conversion will happen only when a record-form instruction can be used
to replace the comparison during the peephole optimizer (see function optimizeCompareInstr).
In post-RA, we also want to optimize the comparison by using the record
form (see D131873) and it requires additional dataflow analysis to reliably
find uses of the CR register set.
It's reasonable to common the conversion for both peephole optimizer and
post-RA optimizer.
Converting to comparison against zero even when the optimization doesn't
happened in peephole optimizer may create additional opportunities for the
post-RA optimization.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D131374
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.
Change call sites to use `std::size` instead.
Differential Revision: https://reviews.llvm.org/D133429
This patch fixes the following two bugs in `PPCInstrInfo::isSignOrZeroExtended` helper, which is used from sign-/zero-extension elimination in PPCMIPeephole pass.
- Registers defined by load with update (e.g. LBZU) were identified as already sign or zero-extended. But it is true only for the first def (loaded value) and not for the second def (i.e. updated pointer).
- Registers defined by ORIS/XORIS were identified as already sign-extended. But, it is not true for sign extension depending on the immediate (while it is ok for zero extension).
To handle the first case, the parameter for the helpers is changed from `MachineInstr` to a register number to distinguish first and second defs. Also, this patch moves the initialization of PPCMIPeepholePass to allow mir test case.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D40554
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.
Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.
Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
Support allocation of huge stack frame(>2g) on PPC64.
For ELFv2 ABI on Linux, quoted from the spec 2.2.3.1 General Stack Frame Requirements
> There is no maximum stack frame size defined.
On AIX, XL allows such huge frame.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D107886
This reverts commit ef82063207.
- It conflicts with the existing llvm::size in STLExtras, which will now
never be called.
- Calling it without llvm:: breaks C++17 compat
Currently, the floating point instructions that depend on
rounding mode are correctly marked in the PPC back end with
an implicit use of the RM register. Similarly, instructions
that explicitly define the register are marked with an
implicit def of the same register. So for the most part,
RM-using code won't be moved across RM-setting instructions.
However, calls are not marked as RM-setting instructions so
code can be moved across calls. This is generally desired,
but so is the ability to turn off this behaviour with an
appropriate option - and -frounding-math really should be
that option.
This patch provides a set of call instructions (for direct
and indirect calls) that are marked with an implicit def of
the RM register. These will be used for calls that are marked
with the strictfp attribute.
Differential revision: https://reviews.llvm.org/D111433
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.
This allows us to ensure that Support doesn't have includes from MC/*.
Differential Revision: https://reviews.llvm.org/D111454
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.
Differential Revision: https://reviews.llvm.org/D110807
This patch marks splat immediate instructions XXSPLTIW and XXSPLTIDP as
rematerializable to prevent MachineLICM from moving them out of loops.
Reviewed By: lei, amy
Differential revision: https://reviews.llvm.org/D108823
This patch adds a fix to do early if conversion to select when
conditional branch not using physical register to prevent the crash when
expanding ISEL instruction.
Reviewed By: lei, kamaub, PowerPC
Differential revision: https://reviews.llvm.org/D108302
The backend generally uses 64-bit immediates (e.g. what
MachineOperand::getImm() returns), so use that for analyzeCompare()
and optimizeCompareInst() as well. This avoids truncation for
targets that support immediates larger 32-bit. In particular, we
can avoid the bugprone value normalization hack in the AArch64
target.
This is a followup to D108076.
Differential Revision: https://reviews.llvm.org/D108875
When the instruction has imm form and fed by LI, we can remove the redundat LI instruction.
Below is an example:
```
renamable $x5 = LI8 2
renamable $x4 = exact SRD killed renamable $x4, killed renamable $r5, implicit $x5
```
will be converted to:
```
renamable $x5 = LI8 2
renamable $x4 = exact RLDICL killed renamable $x4, 62, 2, implicit killed $x5
```
But when we do this optimization, we forget to remove implicit killed $x5
This bug has caused a lnt case error. This patch is to fix above bug.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D85288
Export `lq`, `stq`, `lqarx` and `stqcx.` in preparation for implementing 16-byte lock free atomic operations on AIX.
Add a new register class `g8prc` for these instructions, since these instructions require even-odd register pair.
Reviewed By: nemanjai, jsji, #powerpc
Differential Revision: https://reviews.llvm.org/D103010
- Add new variantKinds for the symbol's variable offset and region handle
- Print the proper relocation specifier @gd in the asm streamer when emitting
the TC Entry for the variable offset for the symbol
- Fix the switch section failure between the TC Entry of variable offset and
region handle
- Put .__tls_get_addr symbol in the ProgramCodeSects with XTY_ER property
Reviewed by: sfertile
Differential Revision: https://reviews.llvm.org/D100956
This patch exploits the xxsplti32dx instruction available on Power10
in place of constant pool loads where xxspltidp would not be able to,
usually because the immediate cannot fit into 32 bits.
Differential Revision: https://reviews.llvm.org/D95458
When a D-Form instruction is fed by an add-immediate, we attempt
to merge the two immediates to form a single displacement so we
can remove the add-immediate.
However, we don't check whether the new displacement fits into
a 16-bit signed immediate field early enough. Namely, we do a
sign-extend from 16 bits first which will discard high bits and
then we check whether the result is a 16-bit signed immediate.
It of course will always be.
Move the check prior to the sign extend to ensure we are checking
the correct value.
Fixes https://bugs.llvm.org/show_bug.cgi?id=49640
Prefer (self-documenting) return values to output parameters (which are
liable to be used).
While here, rename Noop to Nop which is more widely used and improves
consistency with hasEmitNops/setEmitNops/emitNop/etc.
NOTE: This patch was originally written by Anil Mahmud. His code has been
rebased but otherwise left mostly unchanged.
A new instructon on Power 10 allows for the materialization of 34 bit
immediate values. This patch allows the compiler to take advantage of
the new instruction in this situation.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D92879
Reassociating some patterns to generate more fma instructions to
reduce register pressure.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92071
Reassociating some patterns to generate more fma instructions to
reduce register pressure.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92071
In `PPCInstrInfo::optimizeCompareInstr` we seek opportunities to fold `cmp(d|w)` and `subf` as an `subf.`. However, if `subf.` gets overflow, `cr0` can't reflect the correct order, violating the semantics of `cmp(d|w)`.
Fixed https://bugs.llvm.org/show_bug.cgi?id=47830.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D90156
add a new goal MustReduceRegisterPressure for machine combiner pass.
PowerPC will use this new goal to do some register pressure related optimization.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D92068
Summary: We have the patterns to fold 2 RLWINMs before RA, while some RLWINM will be generated after RA, for example rGc4690b007743. If the RLWINM generated after RA followed by another RLWINM, we expect to perform the optimization too.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D89855