Commit Graph

6630 Commits

Author SHA1 Message Date
Qiu Chaofan f7294ac809 [PowerPC] Remove extra swap for extract+vperm on LE
This is a simple fix on LE. On BE, vector shuffles are categorized into
different ops. We may need more work to eliminate these in
tablegen/pre-isel.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D101605
2021-05-07 13:48:08 +08:00
Victor Huang bb113b9845 [AIX][TLS] Add support for TLSGD relocations to XCOFF objects
- Add branch absolute reloction R_RBA, R_TLS relocation for the variable offset
  for the tlsgd model and R_TLSM for the region handle for the tlsgd model
- Properly set the relocation fixed values for R_TLS and R_TLSM
- Emit the TCEntry with the variant kind in the XCOFFStreamer

Reviewed by: sfertile, nemanjai, DiggerLin

Differential Revision: https://reviews.llvm.org/D100214
2021-05-06 09:01:47 -05:00
Ahsan Saghir 670736a904 [PowerPC] Prevent argument promotion of types with size greater than 128 bits
This patch prevents argument promotion of types having
type size greater than 128 bits.

Fixes Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=49952

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D101188
2021-05-04 12:09:25 -05:00
Amy Kwan 1998a08655 [PowerPC][NFC] Update atomic patterns to use the refactored load/store implementation
This patch updates the scalar atomic patterns to use the refactored load/store
implementation introduced in D93370.
All existing test cases pass with when the refactored patterns are utilized.

Differential Revision: https://reviews.llvm.org/D94498
2021-05-04 10:46:45 -05:00
Zarko Todorovski d98e5e02ad [AIX] Remove unused vector registers from allocation order in the default AltiVec ABI
The previous implementation of the default AltiVec ABI marked registers V20-V31
as reserved.  This failed to prevent reserved VFRC registers being allocated.
In this patch instead of marking the registers reserved we remove unallowed
registers from the allocation order completely.

This is a slight rework of an implementation by @nemanjai

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D100050
2021-05-03 13:50:51 -04:00
Daniil Fukalov 3489c2d7b1 [TTI] NFC: Change getTypeLegalizationCost to return InstructionCost.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen, kparzysz

Differential Revision: https://reviews.llvm.org/D101533
2021-04-30 22:51:51 +03:00
Amy Kwan 64d951be61 [PowerPC] Add new infrastructure to select load/store instructions, update P8/P9 load/store patterns.
This patch introduces a new infrastructure that is used to select the load and
store instructions in the PPC backend.

The primary motivation is that the current implementation of selecting load/stores
is dependent on the ordering of patterns in TableGen. Given this limitation, we
are not able to easily and reliably generate the P10 prefixed load and stores
instructions (such as when the immediates that fit within 34-bits). This
refactoring is meant to provide us with more control over the patterns/different
forms to exploit, as well as eliminating dependency of pattern declaration in TableGen.

The idea of this refactoring is that it introduces a set of addressing modes that
correspond to different instruction formats of a particular load and store
instruction, along with a set of common flags that describes a load/store.
Whenever a load/store instruction is being selected, we analyze the instruction
and compute a set of flags for it. The computed flags are then used to
select the most optimal load/store addressing mode.

This patch is the first of a series of patches to be committed - it contains the
initial implementation of the refactored load/store selection infrastructure and
also updates P8/P9 patterns to adopt this infrastructure. The idea is that
incremental patches will add more implementation and support, and eventually
the old implementation will be removed.

Differential Revision: https://reviews.llvm.org/D93370
2021-04-30 09:53:19 -05:00
Sidharth Baveja 70c433a184 [XCOFF][AIX] Add Global Variables Directly to TOC for 32 bit AIX
Summary:
This patch implements the backend implementation of adding global variables
directly to the table of contents (TOC), rather than adding the address of the
variable to the TOC.
Currently, this patch will look for the "toc-data" attribute on symbols in the
IR, and then add those symbols to the TOC.
ATM, this is implemented for 32 bit AIX.

Reviewers: sfertile
Differential Revision: https://reviews.llvm.org/D101178
2021-04-30 14:48:02 +00:00
Victor Huang ae3377c553 [AIX][TLS] Add ASM portion changes to support TLSGD relocations to XCOFF objects
- Add new variantKinds for the symbol's variable offset and region handle
- Print the proper relocation specifier @gd in the asm streamer when emitting
  the TC Entry for the variable offset for the symbol
- Fix the switch section failure between the TC Entry of variable offset and
  region handle
- Put .__tls_get_addr symbol in the ProgramCodeSects with XTY_ER property

Reviewed by: sfertile

Differential Revision: https://reviews.llvm.org/D100956
2021-04-29 13:18:59 -05:00
Qiu Chaofan 56d923efdb [SPE] Support constrained float operations on SPE
This patch enables support on SPE for constrained arithmetic and
comparison operations. This fixes bugzilla 50070.

One thing not covered is fcmp vs. fcmps on SPE. Some condition code
generates singaling comparison while some not. In this patch, all are
considered as singaling. So there might be still some issue when
compiling from C code.

Reviewed By: jhibbits

Differential Revision: https://reviews.llvm.org/D101282
2021-04-29 16:34:10 +08:00
Qiu Chaofan d5c2492455 [PowerPC] Fix SELECT_CC with i64 operand on PPC32
This patch fixes the infinite loop in legalization of PPC32 SELECT_CC
with 64-bit operand.
2021-04-28 17:48:33 +08:00
Victor Huang 241c2da406 [AIX][Power10] Restrict prefixed instructions from crossing the 64byte boundary
This patch adds the support to restrict prefixed instruction from
crossing the 64 byte boundary:
- Add the infrastructure to register a custom XCOFF streamer
- Add a custom XCOFF streamer for PowerPC to allow us to
  intercept instructions as they are being emitted and align all 8 byte
  instructions to a 64 byte boundary if required by adding a 4 byte nop.

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D101107
2021-04-27 11:55:18 -05:00
Zarko Todorovski f818ec9dd1 [AIX] Allow safe for 32bit P9 VSX extract and insert pattern matches
In https://reviews.llvm.org/D92789 PPC64 checks were added that disallowed most
VSX pattern matching.  We enable some safe ones for 32bit in this patch.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D97503
2021-04-27 07:27:43 -04:00
Nemanja Ivanovic 6725b90a02 [PowerPC] Add vec_ctsl and vec_ctul to altivec.h
These are added for compatibility with XLC. They are similar to
vec_cts and vec_ctu except that the result is a doubleword vector
regardless of the parameter type.
2021-04-23 11:03:38 -05:00
Sander de Smalen f9a50f04ba [TTI] NFC: Change getIntImmCost[Inst|Intrin] to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100565
2021-04-23 16:06:36 +01:00
Jay Foad 82d34fe2b3 Fix typo "beneficiates" in comments 2021-04-22 12:30:16 +01:00
Nemanja Ivanovic 092619cf6b [PowerPC] Improve codegen for vector fp to int widening conversions
We currently do not utilize instructions that convert single
precision vectors to doubleword integer vectors. These conversions
come up in code occasionally and this improvement allows us to
open code some functions that need to be added to altivec.h.
2021-04-22 05:04:06 -05:00
Nemanja Ivanovic 03e7fefff8 [PowerPC] Canonicalize shuffles on big endian targets as well
Extend shuffle canonicalization and conversion of shuffles fed by vectorized
scalars to big endian subtargets. For big endian subtargets, loads and direct
moves of scalars into vector registers put the data in the correct element for
SCALAR_TO_VECTOR if the data type is 8 bytes wide. However, if the data type is
narrower, the value still ends up in the wrong place - althouth a different
wrong place than on little endian targets.

This patch extends the combine that keeps values where they are if they feed a
shuffle to big endian targets.

Differential revision: https://reviews.llvm.org/D100478
2021-04-20 07:29:47 -05:00
Qiu Chaofan 2432d80d3b [PowerPC] Use mtvsrdd to put callee-saved GPR into VSR
This patch exploits mtvsrdd instruction (available in ISA3.0+) to save
two callee-saved GPR registers into a single VSR, making it more
efficient.

Reviewed By: jsji, nemanjai

Differential Revision: https://reviews.llvm.org/D62565
2021-04-20 16:43:24 +08:00
Qiu Chaofan b820339752 [PowerPC] Support f128 under VSX
This patch is the last one in backend to support fp128 type in
pre-POWER9 subtargets with VSX, removing temporary option and updating
remaining tests.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D92374
2021-04-20 15:49:52 +08:00
Jinsong Ji d88d8c5b86 [PowerPC] Disable relative lookup table converter pass for AIX
XCOFF hasn't implemented lowerRelativeReference.
So we need to disable new pass introduced by https://reviews.llvm.org/D94355 for
AIX for now.

Reviewed By: gulfem

Differential Revision: https://reviews.llvm.org/D100584
2021-04-19 19:28:11 +00:00
Nick Desaulniers c440b97d89 [TargetLowering] move "o" and "X" constraint handling to base class
These constraints are machine agnostic; there's no reason to handle
these per-arch. If arches don't support these constraints, then they
will fail elsewhere during instruction selection. We don't need virtual
calls to look these up; TargetLowering::getInlineAsmMemConstraint should
only be overridden by architectures with additional unique memory
constraints.

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D100416
2021-04-19 10:53:31 -07:00
Serge Guelton d6de1e1a71 Normalize interaction with boolean attributes
Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from

        if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

to

        if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

Introduce a getValueAsBool that normalize the check, with the following
behavior:

no attributes or attribute set to "false" => return false
attribute set to "true" => return true

Differential Revision: https://reviews.llvm.org/D99299
2021-04-17 08:17:33 +02:00
Nemanja Ivanovic ff769dd111 [PowerPC] Minor improvement for insert_vector_elt codegen
For v2f64, all VSX subtargets can insert an element with a single
XXPERMDI.
2021-04-16 18:52:37 -05:00
Stefan Pintilie f28cb01be0 [PowerPC] Add ROP Protection Instructions for PowerPC
There are four new PowerPC instructions that are introduced in
Power 10. They are hashst, hashchk, hashstp, hashchkp.

These instructions will be used for ROP Protection.
This patch adds the four instructions.

Reviewed By: nemanjai, amyk, #powerpc

Differential Revision: https://reviews.llvm.org/D99375
2021-04-15 11:38:38 -05:00
Sander de Smalen 4f42d873c2 [TTI] NFC: Change getArithmeticInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100317
2021-04-14 17:20:36 +01:00
Sander de Smalen 1af35e77f4 [TTI] NFC: Change getVectorInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100315
2021-04-14 17:20:35 +01:00
Sander de Smalen 174e8f6c5e [TTI] NFC: Change getShuffleCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100314
2021-04-14 17:20:35 +01:00
Sander de Smalen 14b934f8a6 [TTI] NFC: Change getCFInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D100313
2021-04-14 17:20:34 +01:00
Zarko Todorovski 6b7838b68c [AIX] Allow safe for 32bit P8 VSX pattern matching
Pull some of the safe for 32bit pattern matching for Pwr8 and above.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D97909
2021-04-14 08:12:48 -04:00
Nemanja Ivanovic 8be3181df6 [PowerPC] Fix incorrect subreg typo from 0148bf53f0 2021-04-14 05:01:12 -05:00
Nemanja Ivanovic 0148bf53f0 [PowerPC] Use correct node to get a super register from a subreg
The VSX tablegen file has some rather eggregious uses of
COPY_TO_REGCLASS even in situations where it needs to use
SUBREG_TO_REG. While this produces correct code, it often doesn't
allow the register coalescer to coalesce copies and the resulting
code ends up being suboptimal. This patch just changes over
patterns that should use SUBREG_TO_REG.
2021-04-13 19:52:21 -05:00
Sander de Smalen 03f47bdcb1 [TTI] NFC: Change get[Interleaved]MemoryOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100205
2021-04-13 14:21:02 +01:00
Sander de Smalen db134e2428 [TTI] NFC: Change getCmpSelInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100203
2021-04-13 14:21:01 +01:00
Sander de Smalen 92d8421f49 [TTI] NFC: Change getCastInstrCost and getExtractWithExtendCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100199
2021-04-13 14:20:58 +01:00
Chen Zheng 80aa9b0f7b [PowerPC] stop reverse mem op generation for some cases.
We should consider the feeder user number when we do reverse memory
operation transformation. Otherwise, we may get negative impact.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D100166
2021-04-12 22:41:28 -04:00
Qiu Chaofan ece7345859 [PowerPC] Lower f128 SETCC/SELECT_CC as libcall if p9vector disabled
XSCMPUQP is not available for pre-P9 subtargets. This patch will lower
them into libcall for correct behavior on power7/power8.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D92083
2021-04-12 10:33:32 +08:00
dfukalov 8f4b7e94a2 [AMDGPU][CostModel] Refine cost model for control-flow instructions.
Added cost estimation for switch instruction, updated costs of branches, fixed
phi cost.
Had to increase `-amdgpu-unroll-threshold-if` default value since conditional
branch cost (size) was corrected to higher value.
Test renamed to "control-flow.ll".

Removed redundant code in `X86TTIImpl::getCFInstrCost()` and
`PPCTTIImpl::getCFInstrCost()`.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D96805
2021-04-10 09:20:24 +03:00
Mitch Phillips 1a2756b777 Revert "[PowerPC] Add ROP Protection Instructions for PowerPC"
This reverts commit 16fe741c69.

Reason: Broke the UBSan buildbots. More information available in the
phabricator review: https://reviews.llvm.org/D99375
2021-04-09 13:36:41 -07:00
Stefan Pintilie 5bca7cdafb Add correct types to the xxsplti32dx pattern.
Regiser types for xxsplti32dx for two td file patterns was incorrect.
Fixed the two types and added a test case that was reduced from a larger
failing test.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D100223
2021-04-09 14:11:34 -05:00
Stefan Pintilie 16fe741c69 [PowerPC] Add ROP Protection Instructions for PowerPC
There are four new PowerPC instructions that are introduced in
Power 10. They are hashst, hashchk, hashstp, hashchkp.

These instructions will be used for ROP Protection.
This patch adds the four instructions.

Reviewed By: nemanjai, amyk, #powerpc

Differential Revision: https://reviews.llvm.org/D99375
2021-04-09 12:09:01 -05:00
Chen Zheng 74e77295e7 [PowerPC] fixup killed flags for ri + addi to ri transformation
Fixup killed flags if DefMI and MI are not in the same basic blocks.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D100023
2021-04-07 22:04:08 -04:00
Qiu Chaofan 033c9c2552 [PowerPC] Fix use check of swap-reduction
This will fix swap-reduction in DAGISel for cases where COPY_TO_REGCLASS
has multiple uses.
2021-04-07 15:55:52 +08:00
Amy Kwan bd6033eca7 [PowerPC] Materialize 34-bit constants with pli directly
Previously, 34-bit constants were materialized in selectI64Imm(), and we relied
on td pattern matching to instead produce a pli. This becomes problematic as
there is no guarantee that the 34-bit constant will reach the td pattern
selection for pli. It is also possible for other transformations (such as complex
bit permutations) to also produce and utilize the 34-bit constant materialized
through selectI64Imm().

This patch instead produces pli on Power10 directly whenever the constant fits
within 34-bits.

Differential Revision: https://reviews.llvm.org/D99906
2021-04-06 13:38:11 -05:00
Nikita Popov 665065821e [FastISel] Remove kill tracking
This is a followup to D98145: As far as I know, tracking of kill
flags in FastISel is just a compile-time optimization. However,
I'm not actually seeing any compile-time regression when removing
the tracking. This probably used to be more important in the past,
before FastRA was switched to allocate instructions in reverse
order, which means that it discovers kills as a matter of course.

As such, the kill tracking doesn't really seem to serve a purpose
anymore, and just adds additional complexity and potential for
errors. This patch removes it entirely. The primary changes are
dropping the hasTrivialKill() method and removing the kill
arguments from the emitFast methods. The rest is mechanical fixup.

Differential Revision: https://reviews.llvm.org/D98294
2021-04-03 15:50:13 +02:00
Shimin Cui 00c0c8c87d [PowerPC] [MLICM] Enable hoisting of caller preserved registers on AIX
On ppc64 linux , MachineLICM will hoist caller preserved registers, including TOC loads of the global variable address, out of loops. This is to enable this on AIX for both ppc64 and ppc32.

Differential Revision: https://reviews.llvm.org/D99076
2021-03-31 12:46:25 -04:00
Sander de Smalen 2f6f249a49 NFC: Change getIntrinsicInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Depends on D97468

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D97469
2021-03-31 14:04:41 +01:00
Sander de Smalen 3ccbd4f3c7 NFC: Change getUserCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Depends on D97382

Reviewed By: ctetreau, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D97466
2021-03-31 10:13:09 +01:00
Tomas Matheson a9968c0a33 [NFC][CodeGen] Tidy up TargetRegisterInfo stack realignment functions
Currently needsStackRealignment returns false if canRealignStack returns false.
This means that the behavior of needsStackRealignment does not correspond to
it's name and description; a function might need stack realignment, but if it
is not possible then this function returns false. Furthermore,
needsStackRealignment is not virtual and therefore some backends have made use
of canRealignStack to indicate whether a function needs stack realignment.

This patch attempts to clarify the situation by separating them and introducing
new names:

 - shouldRealignStack - true if there is any reason the stack should be
   realigned

 - canRealignStack - true if we are still able to realign the stack (e.g. we
   can still reserve/have reserved a frame pointer)

 - hasStackRealignment = shouldRealignStack && canRealignStack (not target
   customisable)

Targets can now override shouldRealignStack to indicate that stack realignment
is required.

This change will make it easier in a future change to handle the case where we
need to realign the stack but can't do so (for example when the register
allocator creates an aligned spill after the frame pointer has been
eliminated).

Differential Revision: https://reviews.llvm.org/D98716

Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87
2021-03-30 17:31:39 +01:00
Albion Fung e29bb074c6 [PowerPC] Exploit xxsplti32dx (constant materialization) for scalars
This patch exploits the xxsplti32dx instruction available on Power10
in place of constant pool loads where xxspltidp would not be able to,
usually because the immediate cannot fit into 32 bits.

Differential Revision: https://reviews.llvm.org/D95458
2021-03-24 15:59:59 -04:00
Sander de Smalen 55d18b3cc2 [TTI] Return a TypeSize from getRegisterBitWidth.
This patch changes the interface to take a RegisterKind, to indicate
whether the register bitwidth of a scalar register, fixed-width vector
register, or scalable vector register must be returned.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D98874
2021-03-24 14:45:13 +00:00
Stefan Pintilie 91f4c11133 [PowerPC] Add mprivileged option
Add an option to tell the compiler that it can use privileged instructions.

This patch only adds the option. Backend implementation will be added in a
future patch.

Reviewed By: lei, amyk

Differential Revision: https://reviews.llvm.org/D99193
2021-03-24 08:33:22 -05:00
Stefan Pintilie 0e4f5f3ea6 [PowerPC] Change option to mrop-protect
In order to have the same option on power PC LLVM and power PC gcc
the option will be changed from -mrop-protection to -mrop-protect.

The feature will be off by default and turned on when the option is used.

Reviewed By: lei, amyk

Differential Revision: https://reviews.llvm.org/D99185
2021-03-24 05:51:35 -05:00
Stefan Pintilie b8f3c6d011 [PowerPC][NFC] Do not enter prefix selection if it cannot do better.
Do not try to materialize a constant using prefix instructions if the selection
using non prefix instructions was able to do it using a single non prefix
instruction.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D98791
2021-03-22 09:17:52 -05:00
Qiu Chaofan 52f33f7953 [PowerPC] Enable redundant TOC save removal on AIX
Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D97039
2021-03-22 14:29:22 +08:00
Nemanja Ivanovic ea48bf8649 [PowerPC][NFC] Do not produce i64 constants in 32-bit mode
There are some instances where we produce constants of type MVT::i64
unconditionally in the target DAG combines. This is not actually
valid in 32-bit mode.
2021-03-19 22:54:47 -05:00
Anshil Gandhi 697f90ebfa [NFC] [PowerPC] Determine Endianness in PPCTargetMachine
The TargetMachine uses the triple to determine endianness. Just
use that logic rather than replicating it in PPCSubtarget.

Differential revision: https://reviews.llvm.org/D98674
2021-03-19 20:22:16 -05:00
Nemanja Ivanovic a8697c57fa [PowerPC] Fix the check for 16-bit signed field in peephole
When a D-Form instruction is fed by an add-immediate, we attempt
to merge the two immediates to form a single displacement so we
can remove the add-immediate.

However, we don't check whether the new displacement fits into
a 16-bit signed immediate field early enough. Namely, we do a
sign-extend from 16 bits first which will discard high bits and
then we check whether the result is a 16-bit signed immediate.
It of course will always be.

Move the check prior to the sign extend to ensure we are checking
the correct value.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49640
2021-03-19 07:15:53 -05:00
David Green e2935dcfc4 [TTI] Add a Mask to getShuffleCost
This adds an Mask ArrayRef to getShuffleCost, so that if an exact mask
can be provided a more accurate cost can be provided by the backend.
For example VREV costs could be returned by the ARM backend. This should
be an NFC until then, laying the groundwork for that to be added.

Differential Revision: https://reviews.llvm.org/D98206
2021-03-17 17:46:26 +00:00
Fangrui Song 5d44c92bf8 Change void getNoop(MCInst &NopInst) to MCInst getNop()
Prefer (self-documenting) return values to output parameters (which are
liable to be used).
While here, rename Noop to Nop which is more widely used and improves
consistency with hasEmitNops/setEmitNops/emitNop/etc.
2021-03-15 12:05:34 -07:00
Simon Pilgrim f6524b4ada [PPC] Fix UBSAN warning about out of range shift. NFCI. 2021-03-12 12:03:48 +00:00
Simon Pilgrim 400952980f [PPC] Fix static analyzer / UBSAN warnings about out of range shifts. NFCI. 2021-03-12 10:34:35 +00:00
Stefan Pintilie e021de0aab [PowerPC] Exploit paddi instruction on Power 10 for constant materialization
Starting with Power 10 the instruction paddi is available to use.
The instruction allows for immediates that are 34 bits.

This patch adds exploitation of the paddi instruction to allow us
to materialize constants.

Reviewed By: lei, amyk

Differential Revision: https://reviews.llvm.org/D93300
2021-03-11 08:37:49 -06:00
Qiu Chaofan 72c4cbd60e [PowerPC] Fix multi-use case for swap reduction
4c973ae implemented reduction of vector swap for lane-insensitive
operations. This commit fixes it for checking number of uses of the
vector operation.
2021-03-11 21:58:33 +08:00
Nikita Popov 2489cbaa80 [PowerPC] Fix infinite loop in peephole CR optimization (PR49509)
If we encounter a degenerate select node where both operands are
the same, then we can continue negating the condition while swapping
operands, resulting in an infinite loop. Avoid this by bailing out
if both operands are the same.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49509.

Differential Revision: https://reviews.llvm.org/D98340
2021-03-11 14:25:22 +01:00
Amy Kwan 8b540c542c [PowerPC] Implement patterns for PC-Rel zextload/extload byte loads
This patch adds patterns to select the PC-Relative extloadi1 and zextloadi1 byte loads.

Differential Revision: https://reviews.llvm.org/D98042
2021-03-10 12:18:13 -06:00
Qiu Chaofan 4c973ae51b [PowerPC] Reduce symmetrical swaps for lane-insensitive vector ops
This patch simplifies pattern (xxswap (vec-op (xxswap a) (xxswap b)))
into (vec-op a b) if vec-op is lane-insensitive. The motivating case
is ScalarToVector-VecOp-ExtractElement sequence on LE, but the
peephole itself is not related to endianness, so BE may also benefit
from this.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D97658
2021-03-10 15:21:32 +08:00
Albion Fung 9b6ac9e999 [P10] [Power PC] Exploiting new load rightmost vector element instructions.
This pull request implements patterns to exploit the load rightmost vector
element instructions for loading element 0 on little endian PowerPC subtargets
into v8i16 and v16i8 vector registers for i16 and i8 data types.

Differential Revision: https://reviews.llvm.org/D94816#inline-921403
2021-03-09 16:08:17 -05:00
Lei Huang 535a4192a9 [AIX][TLS] Generate 64-bit general-dynamic access code sequence
Add support for the TLS general dynamic access model to assembly
files on AIX 64-bit.

Reviewed By: sfertile

Differential Revision: https://reviews.llvm.org/D98078
2021-03-08 16:41:25 -06:00
Masoud Ataei 820f508b08 [PowerPC] Removing _massv place holder
Since P8 is the oldest machine supported by MASSV pass,
_massv place holder is removed and the oldest version of
MASSV functions is assumed. If the P9 vector specific is
detected in the compilation process, the P8 prefix will
be updated to P9.

Differential Revision: https://reviews.llvm.org/D98064
2021-03-08 21:43:24 +00:00
Nemanja Ivanovic b0f0115308 [AIX][TLS] Generate 32-bit general-dynamic access code sequence
Adds support for the TLS general dynamic access model to
assembly files on AIX 32-bit.

To generate the correct code sequence when accessing a TLS variable
`v`, we first create two TOC entry nodes, one for the variable offset, one
for the region handle. These nodes are followed by a `PPCISD::TLSGD_AIX`
node (new node introduced by this patch).
The `PPCISD::TLSGD_AIX` node (`TLSGDAIX` pseudo instruction) is
expanded to 2 copies (to put the variable offset and region handle in
the right registers) and a call to `__tls_get_addr`.

This patch also changes the way TC entries are generated in asm files.
If the generated TC entry is for the region handle of a TLS variable,
we add the `@m` relocation and the `.` prefix to the entry name.
For example:

```
L..C0:
  .tc .v[TC],v[TL]@m -> region handle
L..C1:
  .tc v[TC],v[TL] -> variable offset
```

Reviewed By: nemanjai, sfertile

Differential Revision: https://reviews.llvm.org/D97948
2021-03-08 09:30:19 -06:00
Ahsan Saghir acce401068 [PowerPC] Change target data layout for 16-byte stack alignment
This changes the target data layout to make stack align to 16 bytes
on Power10. Before this change, stack was being aligned to 32 bytes.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D96265
2021-03-08 08:13:08 -06:00
Sean Fertile f0904a6208 [PowePC][AIX] Handle variadic vector call operands.
Patch adds support for passing vector call operands to variadic
functions. Arguments which are fixed shadow GPRs and stack space even
when they are passed in vector registers, while arguments passed through
ellipses are passed in properly aligned GPRs if available and on the
stack once all GPR arguments registers are consumed.

Differential Revision: https://reviews.llvm.org/D97956
2021-03-06 13:49:55 -05:00
Fangrui Song 3110187f1f [MC][PowerPC] Support .reloc *, BFD_RELOC_{NONE,16,32,64}, *
BFD_RELOC_NONE is useful for ld --gc-sections: it provides a generic way indicating a dependency between two sections.
2021-03-05 21:31:45 -08:00
Zarko Todorovski 2b50ce1524 [PowerPC][AIX] Enable the default AltiVec ABI on AIX
This patch adds support for the default AltiVec ABI for AIX.

Vector registers 20 through 31 are marked as reserved and cannot
be used in the default ABI. This patch adds handling for this case
and also remove the default AltiVec ABI errors.

Reviewed By: sfertile

Differential Revision: https://reviews.llvm.org/D96351
2021-03-05 12:46:27 -05:00
Jinsong Ji cc21de6789 [PowerPC] Update Copy/Paste encodings according to ISA3.1
Copy-paste P9 insns were added back in 2016,
however, looks like the opcodes has changed in ISA3.1.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D97416
2021-03-05 17:05:50 +00:00
Chen Zheng 87bbf3d1f8 [XCOFF][DebugInfo] support DWARF for XCOFF for assembly output.
Reviewed By: jasonliu

Differential Revision: https://reviews.llvm.org/D95518
2021-03-04 21:07:52 -05:00
Jinsong Ji 7967221a72 [PowerPC] Disable more extended mne on AIX
To avoid assembler errors.

Reviewed By: sfertile

Differential Revision: https://reviews.llvm.org/D97418
2021-03-04 21:13:37 +00:00
Benjamin Kramer e897feeb8a [PPC] Silence unused variable warning in release builds. NFC. 2021-03-04 21:43:19 +01:00
Sean Fertile aaeffbe007 [PowerPC][AIX] Handle variadic vector formal arguments.
Patch adds support for passing vector arguments to variadic functions.
Arguments which are fixed shadow GPRs and stack space even when they are
passed in vector registers, while arguments passed through ellipses are
passed in(properly aligned GPRs if available and on the stack once all
GPR arguments registers are consumed.

Differential Revision: https://reviews.llvm.org/D97485
2021-03-04 10:56:53 -05:00
Qiu Chaofan 72d4a41ba6 [PowerPC] Allow spilling GPR to VSR on AIX
This patch enables spilling GPR to VSRs instead of stack under AIX ABI.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D97367
2021-03-03 13:32:39 +08:00
Victor Huang 1756b2adc9 [AIX][TLS] Generate TLS variables in assembly files
This patch allows generating TLS variables in assembly files on AIX.
Initialized and external uninitialized variables are generated with the
.csect pseudo-op and local uninitialized variables are generated with
the .comm/.lcomm pseudo-ops. The patch also adds a check to
explicitly say that TLS is not yet supported on AIX.

Reviewed by: daltenty, jasonliu, lei, nemanjai, sfertile
Originally patched by: bsaleil
Commandeered by: NeHuang

Differential Revision: https://reviews.llvm.org/D96184
2021-03-02 18:22:48 -06:00
Amara Emerson 8a316045ed [AArch64][GlobalISel] Enable use of the optsize predicate in the selector.
To do this while supporting the existing functionality in SelectionDAG of using
PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis
dependencies to the instruction selector pass.

Then, use the predicate to generate constant pool loads for f32 materialization,
if we're targeting optsize/minsize.

Differential Revision: https://reviews.llvm.org/D97732
2021-03-02 12:55:51 -08:00
Kazu Hirata 4ed47858ab [llvm] Use llvm::drop_begin (NFC) 2021-02-22 20:17:16 -08:00
Jacques Pienaar 3bec7ed59e Different fix for gcc bug
Was still running into

from definition of 'template<class T> struct llvm::DenseMapInfo'
[-fpermissive]
 template <typename T> struct DenseMapInfo;
                               ^
2021-02-19 16:41:00 -08:00
Leonard Chan c77659e549 [llvm][IR] Do not place constants with static relocations in a mergeable section
This patch provides two major changes:

1. Add getRelocationInfo to check if a constant will have static, dynamic, or
   no relocations. (Also rename the original needsRelocation to needsDynamicRelocation.)
2. Only allow a constant with no relocations (static or dynamic) to be placed
   in a mergeable section.

This will allow unused symbols that contain static relocations and happen to
fit in mergeable constant sections (.rodata.cstN) to instead be placed in
unique-named sections if -fdata-sections is used and subsequently garbage collected
by --gc-sections.

See https://lists.llvm.org/pipermail/llvm-dev/2021-February/148281.html.

Differential Revision: https://reviews.llvm.org/D95960
2021-02-18 15:39:00 -08:00
Sean Fertile bb260b1ca7 [PowerPC][AIX] Add support for vector arg passing on the stack.
Enable passing more vector arguments then available vector
argument passing registers.

Differential Revision: https://reviews.llvm.org/D96415
2021-02-18 13:32:40 -05:00
Baptiste Saleil 34dc1ccb96 [PowerPC] Exploit the vinsw, vinsd, and vins[wd][lr]x instructions on P10
This patch generates the vinsw, vinsd, vinsblx, vinshlx, vinswlx, vinsdlx,
vinsbrx, vinshrx, vinswrx and vinsdrx instructions for vector insertion on P10.

Differential Revision: https://reviews.llvm.org/D94454
2021-02-18 14:17:47 +00:00
Stefan Pintilie b80357d46e [PowerPC] Add option for ROP Protection
Added -mrop-protection for Power PC to turn on codegen that provides some
protection from ROP attacks.

The option is off by default and can be turned on for Power 8, Power 9 and
Power 10.

This patch is for the option only. The feature will be implemented by a later
patch.

Reviewed By: amyk

Differential Revision: https://reviews.llvm.org/D96512
2021-02-18 12:15:50 +00:00
Chen Zheng 5517923b1c [XCOFF][NFC] make csect properties optional for getXCOFFSection
We are going to support debug sections for XCOFF. So the csect
properties are not necessary. This patch makes these properties
optional.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D95931
2021-02-17 20:51:42 -05:00
Sidharth Baveja cb2876800c [PowerPC][AIX] Enable Shrinkwrapping on 32 and 64 bit AIX.
Summary:
Currently Shrinkwrap is not enabled on AIX.
This patch enables shrink wrap on 32 and 64 bit AIX, and 64 bit ELF.

Reviewed By: sfertile, nemanjai

Differential Revision: https://reviews.llvm.org/D95094
2021-02-17 14:54:57 +00:00
Sean Fertile 4e127bce2d [PowerPC] Handle FP physical register in inline asm constraint.
Do not defer to the base class when the register constraint is a
physical fpr. The base class will select SPILLTOVSRRC as the register
class and register allocation will fail on subtargets without VSX
registers.

Differential Revision: https://reviews.llvm.org/D91629
2021-02-17 09:27:03 -05:00
Douglas Yung 0e3d7e6186 Fix gcc build after de3a485d9 due to a gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92598
This should fix gcc based builders such as http://lab.llvm.org:8011/#/builders/76/builds/1683
2021-02-16 21:57:12 -08:00
Victor Huang de3a485d9c [NFC][PPC] Refactor TOC representation to allow several entries for the same symbol
We currently represent TOC entries by an MCSymbol. This is not enough in some situations.
For example, when accessing an initialized TLS variable v on AIX using the general dynamic
model, we need to generate the two following entries for v:

.tc .v[TC],v@m
.tc v[TC],v

One is for the region handle (with the @m relocation), the other is for the variable offset.
This refactoring allows storing several entries for the same symbol with different VariantKind
in the TOC. If the VariantKind is not specified, we default to VK_None.

The AIX TLS implementation using this refactoring to generate the two entries will be posted
in a subsequent patch.

Patched By: bsaleil
Reviewed By: sfertile

Differential Revision: https://reviews.llvm.org/D96346
2021-02-16 21:32:16 +00:00
Nemanja Ivanovic a5222aa085 [DAGCombine] Do not remove masking argument to FP16_TO_FP for some targets
As of commit 284f2bffc9, the DAG Combiner gets rid of the masking of the
input to this node if the mask only keeps the bottom 16 bits. This is because
the underlying library function does not use the high order bits. However, on
PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits
from the register. Therefore, the library implementation of __gnu_h2f_ieee will
return an incorrect result if the bits aren't cleared.

This combine is desired for ARM (and possibly other targets) so this patch adds
a query to Target Lowering to check if this zeroing needs to be kept.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092

Differential revision: https://reviews.llvm.org/D96283
2021-02-09 06:33:48 -06:00
Simon Pilgrim 518af8df44 [PowerPC] Fix multiclass template parameter types. NFC.
Fixes TableGen parser errors reported by D95874.
2021-02-06 15:39:26 +00:00
Craig Topper 11ef356d9e [TargetLowering] Use Align in allowsMisalignedMemoryAccesses.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96097
2021-02-04 19:22:06 -08:00
Chen Zheng b0869a7d72 [PowerPC] [NFC] fix wording typos
Post commit comments address for D92071.
2021-02-02 21:03:17 -05:00
Stefan Pintilie 288f762b6f [PowerPC] Materialize 34 bit constants with pli on Power 10.
NOTE: This patch was originally written by Anil Mahmud. His code has been
rebased but otherwise left mostly unchanged.

A new instructon on Power 10 allows for the materialization of 34 bit
immediate values. This patch allows the compiler to take advantage of
the new instruction in this situation.

Reviewed By: amyk

Differential Revision: https://reviews.llvm.org/D92879
2021-02-02 09:49:22 -06:00
Craig Topper 94206f1f90 [PowerPC] Remove vnot_ppc and replace with the standard vnot.
immAllOnesV has special support for looking through bitcasts
automatically so isel patterns don't need to explicitly look
for the bitconvert.
2021-01-31 19:41:33 -08:00
Kazu Hirata 627b5bda11 [llvm] Add missing header guards (NFC)
Identified with llvm-header-guard.
2021-01-30 09:53:42 -08:00
Kazu Hirata 8ed1636184 [llvm] Use isa instead of dyn_cast (NFC) 2021-01-29 23:23:37 -08:00
Kazu Hirata 7925aa091d [llvm] Populate SmallVector at construction time (NFC) 2021-01-28 22:21:14 -08:00
Albion Fung 2e470e03b4 [PowerPC][Power10] Fix XXSPLI32DX not correctly exploiting specific cases
Some cases may be transformed into 32 bit splats before hitting the boolean statement, which may cause incorrect behaviour and provide XXSPLTI32DX with the incorrect values of splat. The condition was reversed so that the shortcut prevents this problem.

Differential Revision: https://reviews.llvm.org/D95634
2021-01-28 15:17:32 -05:00
Nemanja Ivanovic 54e570d94a [PowerPC] Do not emit XXSPLTI32DX for sub 64-bit constants
If the APInt returned by BuildVectorSDNode::isConstantSplat() is narrower than
64 bits, the result produced by XXSPLTI32DX is incorrect. The result returned
by the function appears to be incorrect and we'll investigate/fix it in a
follow-up commit. However, since this causes miscompiles, we must
temporarily disable emitting this instruction for such values.
2021-01-28 04:16:48 -06:00
Nemanja Ivanovic 8018f731f0 [PowerPC] Do not emit HW loop with half precision operations
If a loop has any operations on half precision values, there will be calls to
library functions on Power8. Even on Power9, there is a small subset of
instructions that are actually supported for the type.

This patch disables HW loops whenever any operations on the type are found
(other than the handfull of supported ones when compiling for Power9). Fixes a
few PR's opened by Julia:

https://bugs.llvm.org/show_bug.cgi?id=48785
https://bugs.llvm.org/show_bug.cgi?id=48786
https://bugs.llvm.org/show_bug.cgi?id=48519

Differential revision: https://reviews.llvm.org/D94980
2021-01-25 20:55:56 -06:00
Nemanja Ivanovic 1150bfa6bb [PowerPC] Add missing negate for VPERMXOR on little endian subtargets
This intrinsic is supposed to have the permute control vector complemented on
little endian systems (as the ABI specifies and GCC implements). With the
current code gen, the result vector is byte-reversed.

Differential revision: https://reviews.llvm.org/D95004
2021-01-25 12:23:33 -06:00
QingShan Zhang ffc3e800c6 [NFC] [DAGCombine] Correct the result for sqrt even the iteration is zero
For now, we correct the result for sqrt if iteration > 0. This doesn't make
sense as they are not strict relative.

Reviewed By: dmgreen, spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D94480
2021-01-25 04:02:44 +00:00
Chen Zheng 0ed4cf4bf3 [PowerPC] support register pressure reduction in machine combiner.
Reassociating some patterns to generate more fma instructions to
reduce register pressure.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D92071
2021-01-24 21:28:21 -05:00
Kazu Hirata 16baad8f4e [llvm] Use pop_back_val (NFC) 2021-01-24 12:18:57 -08:00
Kazu Hirata 054444177b [Target] Use llvm::append_range (NFC) 2021-01-24 12:18:56 -08:00
Kazu Hirata e4847a7fcf Revert "[Target] Use llvm::append_range (NFC)"
This reverts commit cc7a238286.

The X86WinEHState.cpp hunk seems to break certain builds.
2021-01-23 11:25:27 -08:00
Kazu Hirata cc7a238286 [Target] Use llvm::append_range (NFC) 2021-01-23 10:56:31 -08:00
Stanislav Mekhanoshin 607bec0bb9 Change materializeFrameBaseRegister() to return register
The only caller of this function is in the LocalStackSlotAllocation
and it creates base register of class returned by the target's
getPointerRegClass(). AMDGPU wants to use a different reg class
here so let materializeFrameBaseRegister to just create and return
whatever it wants.

Differential Revision: https://reviews.llvm.org/D95268
2021-01-22 15:51:06 -08:00
Kazu Hirata cfa241680f [llvm] Don't include StringSwitch.h where unnecessary (NFC) 2021-01-21 19:59:48 -08:00
Qiu Chaofan 449f2f7140 [PowerPC] Duplicate inherited heuristic from base scheduler
PowerPC has its custom scheduler heuristic. It calls parent classes'
tryCandidate in override version, but the function returns void, so this
way doesn't actually help. This patch duplicates code from base scheduler
into PPC machine scheduler class, which does what we wanted.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D94464
2021-01-22 10:11:03 +08:00
Albion Fung 719b563ecf [PowerPC][Power10] Exploit splat instruction xxsplti32dx in Power10
Exploits the instruction xxsplti32dx.

It can be used to materialize any 64 bit scalar/vector splat by using two instances, one for the upper 32 bits and the other for the lower 32 bits. It should not materialize the cases which can be materialized by using the instruction xxspltidp.

Differential Revision: https://https://reviews.llvm.org/D90173
2021-01-20 12:55:52 -05:00
Kazu Hirata 8857202489 [llvm] Use llvm::find (NFC) 2021-01-19 20:19:14 -08:00
Victor Huang 909d6c86ea [PowerPC] Fix the check for the instruction using FRSP/XSRSP output register
When performing peephole optimization to simplify the code, after removing
passed FPSP/XSRSP instruction we will set any uses of that FRSP/XSRSP to the
source of the FRSP/XSRSP.

We are finding the machine instruction using virtual register holding FRSP/XSRSP
results by searching all following instructions and encountering an issue
that the first use of the virtual register is a debug MI causing:
1. virtual register in the debug MI removed unexpectedly.
2. virtual register used in non-debug MI not replaced with the source of
  FRSP/XSRSP. which stays in a undef status.

This patch fix the issue by only searching non-debug machine instruction using
virtual register holding FRSP/XSRSP results when the vr only has one non debug
usage.

Differential Revisien: https://reviews.llvm.org/D94711
Reviewed by: nemanjai
2021-01-19 09:20:03 -06:00
Nemanja Ivanovic 61f69153e8 [PowerPC] Sign extend comparison operand for signed atomic comparisons
As of 8dacca943a, we sign extend the atomic loaded
operand for signed subword comparisons. However, the assumption that the other
operand is correctly sign extended doesn't always hold. This patch sign extends
the other operand if it needs to be sign extended.

This is a second fix for https://bugs.llvm.org/show_bug.cgi?id=30451

Differential revision: https://reviews.llvm.org/D94058
2021-01-18 21:19:25 -06:00
Sean Fertile ead71a23ed [PowerPC][AIX]Do not emit xxspltd mnemonic on AIX.
A bug in the system assembler can assemble the xxspltd extended
menemonic into the wrong instruction (extracting the wrong element).
Emit the full xxpermdi with all operands to work around the problem.

Differential Revision: https://reviews.llvm.org/D94419
2021-01-18 09:25:31 -05:00
Tres Popp 3bd24574c7 Revert "[PowerPC] support register pressure reduction in machine combiner."
This reverts commit 26a396c4ef.

See https://reviews.llvm.org/D92071 for a description of the issue.
2021-01-18 12:01:57 +01:00
Chen Zheng 26a396c4ef [PowerPC] support register pressure reduction in machine combiner.
Reassociating some patterns to generate more fma instructions to
reduce register pressure.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D92071
2021-01-17 23:56:13 -05:00
Kazu Hirata 7dc3575ef2 [llvm] Remove redundant return and continue statements (NFC)
Identified with readability-redundant-control-flow.
2021-01-14 20:30:34 -08:00
Jinsong Ji 0f588ac03e [PowerPC] Only use some extend mne if assembler is modern enough
Legacy AIX assembly might not support all extended mnes,
add one feature bit to control the generation in MC,
and avoid generating them by default on AIX.

Reviewed By: sfertile

Differential Revision: https://reviews.llvm.org/D94458
2021-01-14 20:36:10 +00:00
Esme-Yi ff40fb07ad [PowerPC] Try to fold sqrt/sdiv test results with the branch.
Summary: The patch tries to fold sqrt/sdiv test node, i.g FTSQRT, XVTDIVDP, and the branch, i.e br_cc if they meet these patterns:
(br_cc seteq, (truncateToi1 SWTestOp), 0) -> (BCC PRED_NU, SWTestOp)
(br_cc seteq, (and SWTestOp, 2), 0) -> (BCC PRED_NE, SWTestOp)
(br_cc seteq, (and SWTestOp, 4), 0) -> (BCC PRED_LE, SWTestOp)
(br_cc seteq, (and SWTestOp, 8), 0) -> (BCC PRED_GE, SWTestOp)
(br_cc setne, (truncateToi1 SWTestOp), 0) -> (BCC PRED_UN, SWTestOp)
(br_cc setne, (and SWTestOp, 2), 0) -> (BCC PRED_EQ, SWTestOp)
(br_cc setne, (and SWTestOp, 4), 0) -> (BCC PRED_GT, SWTestOp)
(br_cc setne, (and SWTestOp, 8), 0) -> (BCC PRED_LT, SWTestOp)

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D94054
2021-01-14 02:15:19 +00:00
Jinsong Ji 93b54b7c67 [PowerPC][NFCI] PassSubtarget to ASMWriter
Subtarget feature bits are needed to change instprinter's behavior based
on feature bits.

Most of the other popular targets were updated back in 2015,
in https://reviews.llvm.org/rGb46d0234a6969
we should update it too.

Reviewed By: sfertile

Differential Revision: https://reviews.llvm.org/D94449
2021-01-12 16:25:35 +00:00
Nemanja Ivanovic 3f7b4ce960 [PowerPC] Add support for embedded devices with EFPU2
PowerPC cores like e200z759n3 [1] using an efpu2 only support single precision
hardware floating point instructions. The single precision instructions efs*
and evfs* are identical to the spe float instructions while efd* and evfd*
instructions trigger a not implemented exception.

This patch introduces a new command line option -mefpu2 which leads to
single-hardware / double-software code generation.

[1] Core reference:
  https://www.nxp.com/files-static/32bit/doc/ref_manual/e200z759CRM.pdf

Differential revision: https://reviews.llvm.org/D92935
2021-01-12 09:47:00 -06:00
Kazu Hirata e5b4dbab04 [llvm] Simplify string comparisons (NFC)
Identified with readability-string-compare.
2021-01-11 18:48:09 -08:00
Kazu Hirata e3d3dbd339 [llvm] Ensure newlines at the end of files (NFC)
This patch eliminates pesky "No newline at end of file" messages from
git diff.
2021-01-10 09:24:57 -08:00
Kazu Hirata b7c5e0b02c [Target, Transforms] Use *Set::contains (NFC) 2021-01-08 18:39:54 -08:00
Fangrui Song 022cc6e343 [PowerPC] Delete dead Lower* 2021-01-06 21:58:40 -08:00
Fangrui Song bfa6ca07a8 [PowerPC] Delete remnant Darwin ISelLowering code 2021-01-06 21:40:40 -08:00
Fangrui Song 01a2508aa5 [PowerPC] Delete remnant isOSDarwin references 2021-01-06 21:18:35 -08:00
Kit Barton 4bdab54826 [PPC] Remove old PPCSubTarget variable.
The PPCSubTarget variable has been replaced with the Subtarget variable. This
removes the remaining instances of PPCSubTarget as they are no longer necessary.
2021-01-06 17:44:07 -06:00
Stefan Pintilie cb0c034edc [PowerPC] Fix issue where vsrq is given incorrect shift vector
The new Power10 instruction vsrq was being given the wrong shift vector.
The original code assumed that the shift would be found in bits 121 to 127.
This is not correct. The shift is found in bits 57 to 63.
This can be fixed by swaping the first and second double words.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D94113
2021-01-06 05:56:09 -06:00
Christudasan Devadasan d68458bd56 [GlobalISel] Base implementation for sret demotion.
If the return values can't be lowered to registers
SelectionDAG performs the sret demotion. This patch
contains the basic implementation for the same in
the GlobalISel pipeline.

Furthermore, targets should bring relevant changes
during lowerFormalArguments, lowerReturn and
lowerCall to make use of this feature.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D92953
2021-01-06 10:30:50 +05:30
Qiu Chaofan b6c8feb29f [NFC] [PowerPC] Remove dead code in BUILD_VECTOR peephole
The piece of code tries to use splat+shift to lower build_vector with
repeating bit pattern. And immediate field of vector splat is only 5
bits (-16~15). It iterates over them one by one to find which
shifts/rotates to number in build_vector.

This patch removes code to try matching constant with algebraic
right-shift because that's meaningless - any negative number's algebraic
right-shift won't produce result smaller than itself. Besides, code
(int)((unsigned)i >> j) means logical shift-right in C.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D93937
2021-01-05 11:35:00 +08:00
Kai Luo f6515b0520 [PowerPC] Do not fold `cmp(d|w)` and `subf` instruction to `subf.` if `nsw` is not present
In `PPCInstrInfo::optimizeCompareInstr` we seek opportunities to fold `cmp(d|w)` and `subf` as an `subf.`. However, if `subf.` gets overflow, `cr0` can't reflect the correct order, violating the semantics of `cmp(d|w)`.

Fixed https://bugs.llvm.org/show_bug.cgi?id=47830.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D90156
2021-01-04 07:54:15 +00:00
Brandon Bergren 8f004471c2 [PowerPC] Add the LLVM triple for powerpcle [1/5]
Add a triple for powerpcle-*-*.

This is a little-endian encoding of the 32-bit PowerPC ABI, useful in certain niche situations:

1) A loader such as the FreeBSD loader which will be loading a little endian kernel. This is required for PowerPC64LE to load properly in pseries VMs.
Such a loader is implemented as a freestanding ELF32 LSB binary.

2) Userspace emulation of a 32-bit LE architecture such as x86 on 64-bit hosts such as PowerPC64LE with tools like box86 requires having a 32-bit LE toolchain and library set, as they operate by translating only the main binary and switching to native code when making library calls.

3) The Void Linux for PowerPC project is experimenting with running an entire powerpcle userland.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D93918
2021-01-02 12:17:22 -06:00
Kai Luo f904d50c29 [PowerPC] Remaining KnownBits should be constant when performing non-sign comparison
In `PPCTargetLowering::DAGCombineTruncBoolExt`, when checking if it's correct to perform the transformation for non-sign comparison, as the comment says
```
      // This is neither a signed nor an unsigned comparison, just make sure
      // that the high bits are equal.
```
Origin check
```
      if (Op1Known.Zero != Op2Known.Zero || Op1Known.One != Op2Known.One)
        return SDValue();
```
is not strong enough. For example,
```
Op1Known = 111x000x;
Op2Known = 111x000x;
```
Bit 4, besides bit 0, is still unknown and affects the final result.

This patch fixes https://bugs.llvm.org/show_bug.cgi?id=48388.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D93092
2020-12-30 02:00:47 +00:00
Nemanja Ivanovic 7486de1b2e [PowerPC] Provide patterns for permuted scalar to vector for pre-P8
We will emit these permuted nodes on all VSX little endian subtargets
but don't have the patterns available to match them on subtargets
that don't have direct moves.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=47916
2020-12-29 06:49:25 -06:00
Nemanja Ivanovic 0a19fc3088 [PowerPC] Disable CTR loops containing operations on half-precision
On subtargets prior to Power9, conversions to/from half precision
are lowered to libcalls. This makes loops containing such operations
invalid candidates for HW loops.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=48519
2020-12-29 05:12:50 -06:00
Nemanja Ivanovic 4f568fbd21 [PowerPC] Do not emit HW loop when TLS var accessed in PHI of loop exit
If any PHI nodes in loop exit blocks have incoming values from the
loop that are accesses of TLS variables with local dynamic or general
dynamic TLS model, the address will be computed inside the loop. Since
this includes a call to __tls_get_addr, this will in turn cause the
CTR loops verifier to complain.
Disable CTR loops in such cases.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=48527
2020-12-28 20:36:16 -06:00
Fangrui Song f931290308 [PowerPC] Parse and ignore .machine
glibc/sysdeps/powerpc/powerpc64 has .machine
{altivec,power4,power5,power6,power7,power8} (.machine power9 is planned in
sysdeps/powerpc/powerpc64/power9/strcmp.S).
The diagnostic is not useful anyway so just delete it.
2020-12-28 12:20:40 -08:00
Nemanja Ivanovic e73f885c98 [PowerPC] Remove redundant COPY_TO_REGCLASS introduced by 8a58f21f5b 2020-12-28 09:26:51 -06:00
Kamau Bridgeman 8a58f21f5b [PowerPC][Power10] Exploit store rightmost vector element instructions
Using the store rightmost vector element instructions to do vector
element extraction and store. The rightmost vector element on little
endian is the zeroth vector element, with these patterns that element
can be extracted and stored in one instruction for all vector types.

Differential Revision: https://reviews.llvm.org/D89195
2020-12-22 12:06:43 -05:00
Nemanja Ivanovic ba1202a1e4 [PowerPC] Restore stack ptr from base ptr when available
On subtargets that have a red zone, we will copy the stack pointer to the base
pointer in the prologue prior to updating the stack pointer. There are no other
updates to the base pointer after that. This suggests that we should be able to
restore the stack pointer from the base pointer rather than loading it from the
back chain or adding the frame size back to either the stack pointer or the
frame pointer.
This came about because functions that call setjmp need to restore the SP from
the FP because the back chain might have been clobbered
(see https://reviews.llvm.org/D92906). However, if the stack is realigned, the
restored SP might be incorrect (which is what caused the failures in the two
ASan test cases).

This patch was tested quite extensivelly both with sanitizer runtimes and
general code.

Differential revision: https://reviews.llvm.org/D93327
2020-12-22 05:44:03 -06:00
Fangrui Song d9a0c40bce [MC] Split MCContext::createTempSymbol, default AlwaysAddSuffix to true, and add comments
CanBeUnnamed is rarely false. Splitting to a createNamedTempSymbol makes the
intention clearer and matches the direction of reverted r240130 (to drop the
unneeded parameters).

No behavior change.
2020-12-21 14:04:13 -08:00
Fangrui Song d33abc337c Migrate MCContext::createTempSymbol call sites to AlwaysAddSuffix=true
Most call sites set AlwaysAddSuffix to true. The two use cases do not really
need false and can be more consistent with other temporary symbol usage.
2020-12-21 14:04:13 -08:00