Some cases may be transformed into 32 bit splats before hitting the boolean statement, which may cause incorrect behaviour and provide XXSPLTI32DX with the incorrect values of splat. The condition was reversed so that the shortcut prevents this problem.
Differential Revision: https://reviews.llvm.org/D95634
If the APInt returned by BuildVectorSDNode::isConstantSplat() is narrower than
64 bits, the result produced by XXSPLTI32DX is incorrect. The result returned
by the function appears to be incorrect and we'll investigate/fix it in a
follow-up commit. However, since this causes miscompiles, we must
temporarily disable emitting this instruction for such values.
This intrinsic is supposed to have the permute control vector complemented on
little endian systems (as the ABI specifies and GCC implements). With the
current code gen, the result vector is byte-reversed.
Differential revision: https://reviews.llvm.org/D95004
For now, we correct the result for sqrt if iteration > 0. This doesn't make
sense as they are not strict relative.
Reviewed By: dmgreen, spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D94480
Reassociating some patterns to generate more fma instructions to
reduce register pressure.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92071
The only caller of this function is in the LocalStackSlotAllocation
and it creates base register of class returned by the target's
getPointerRegClass(). AMDGPU wants to use a different reg class
here so let materializeFrameBaseRegister to just create and return
whatever it wants.
Differential Revision: https://reviews.llvm.org/D95268
PowerPC has its custom scheduler heuristic. It calls parent classes'
tryCandidate in override version, but the function returns void, so this
way doesn't actually help. This patch duplicates code from base scheduler
into PPC machine scheduler class, which does what we wanted.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D94464
Exploits the instruction xxsplti32dx.
It can be used to materialize any 64 bit scalar/vector splat by using two instances, one for the upper 32 bits and the other for the lower 32 bits. It should not materialize the cases which can be materialized by using the instruction xxspltidp.
Differential Revision: https://https://reviews.llvm.org/D90173
When performing peephole optimization to simplify the code, after removing
passed FPSP/XSRSP instruction we will set any uses of that FRSP/XSRSP to the
source of the FRSP/XSRSP.
We are finding the machine instruction using virtual register holding FRSP/XSRSP
results by searching all following instructions and encountering an issue
that the first use of the virtual register is a debug MI causing:
1. virtual register in the debug MI removed unexpectedly.
2. virtual register used in non-debug MI not replaced with the source of
FRSP/XSRSP. which stays in a undef status.
This patch fix the issue by only searching non-debug machine instruction using
virtual register holding FRSP/XSRSP results when the vr only has one non debug
usage.
Differential Revisien: https://reviews.llvm.org/D94711
Reviewed by: nemanjai
As of 8dacca943a, we sign extend the atomic loaded
operand for signed subword comparisons. However, the assumption that the other
operand is correctly sign extended doesn't always hold. This patch sign extends
the other operand if it needs to be sign extended.
This is a second fix for https://bugs.llvm.org/show_bug.cgi?id=30451
Differential revision: https://reviews.llvm.org/D94058
A bug in the system assembler can assemble the xxspltd extended
menemonic into the wrong instruction (extracting the wrong element).
Emit the full xxpermdi with all operands to work around the problem.
Differential Revision: https://reviews.llvm.org/D94419
Reassociating some patterns to generate more fma instructions to
reduce register pressure.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92071
Legacy AIX assembly might not support all extended mnes,
add one feature bit to control the generation in MC,
and avoid generating them by default on AIX.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D94458
Subtarget feature bits are needed to change instprinter's behavior based
on feature bits.
Most of the other popular targets were updated back in 2015,
in https://reviews.llvm.org/rGb46d0234a6969
we should update it too.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D94449
PowerPC cores like e200z759n3 [1] using an efpu2 only support single precision
hardware floating point instructions. The single precision instructions efs*
and evfs* are identical to the spe float instructions while efd* and evfd*
instructions trigger a not implemented exception.
This patch introduces a new command line option -mefpu2 which leads to
single-hardware / double-software code generation.
[1] Core reference:
https://www.nxp.com/files-static/32bit/doc/ref_manual/e200z759CRM.pdf
Differential revision: https://reviews.llvm.org/D92935
The PPCSubTarget variable has been replaced with the Subtarget variable. This
removes the remaining instances of PPCSubTarget as they are no longer necessary.
The new Power10 instruction vsrq was being given the wrong shift vector.
The original code assumed that the shift would be found in bits 121 to 127.
This is not correct. The shift is found in bits 57 to 63.
This can be fixed by swaping the first and second double words.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D94113
If the return values can't be lowered to registers
SelectionDAG performs the sret demotion. This patch
contains the basic implementation for the same in
the GlobalISel pipeline.
Furthermore, targets should bring relevant changes
during lowerFormalArguments, lowerReturn and
lowerCall to make use of this feature.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D92953
The piece of code tries to use splat+shift to lower build_vector with
repeating bit pattern. And immediate field of vector splat is only 5
bits (-16~15). It iterates over them one by one to find which
shifts/rotates to number in build_vector.
This patch removes code to try matching constant with algebraic
right-shift because that's meaningless - any negative number's algebraic
right-shift won't produce result smaller than itself. Besides, code
(int)((unsigned)i >> j) means logical shift-right in C.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D93937
In `PPCInstrInfo::optimizeCompareInstr` we seek opportunities to fold `cmp(d|w)` and `subf` as an `subf.`. However, if `subf.` gets overflow, `cr0` can't reflect the correct order, violating the semantics of `cmp(d|w)`.
Fixed https://bugs.llvm.org/show_bug.cgi?id=47830.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D90156
Add a triple for powerpcle-*-*.
This is a little-endian encoding of the 32-bit PowerPC ABI, useful in certain niche situations:
1) A loader such as the FreeBSD loader which will be loading a little endian kernel. This is required for PowerPC64LE to load properly in pseries VMs.
Such a loader is implemented as a freestanding ELF32 LSB binary.
2) Userspace emulation of a 32-bit LE architecture such as x86 on 64-bit hosts such as PowerPC64LE with tools like box86 requires having a 32-bit LE toolchain and library set, as they operate by translating only the main binary and switching to native code when making library calls.
3) The Void Linux for PowerPC project is experimenting with running an entire powerpcle userland.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D93918
In `PPCTargetLowering::DAGCombineTruncBoolExt`, when checking if it's correct to perform the transformation for non-sign comparison, as the comment says
```
// This is neither a signed nor an unsigned comparison, just make sure
// that the high bits are equal.
```
Origin check
```
if (Op1Known.Zero != Op2Known.Zero || Op1Known.One != Op2Known.One)
return SDValue();
```
is not strong enough. For example,
```
Op1Known = 111x000x;
Op2Known = 111x000x;
```
Bit 4, besides bit 0, is still unknown and affects the final result.
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=48388.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D93092
We will emit these permuted nodes on all VSX little endian subtargets
but don't have the patterns available to match them on subtargets
that don't have direct moves.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=47916
On subtargets prior to Power9, conversions to/from half precision
are lowered to libcalls. This makes loops containing such operations
invalid candidates for HW loops.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=48519
If any PHI nodes in loop exit blocks have incoming values from the
loop that are accesses of TLS variables with local dynamic or general
dynamic TLS model, the address will be computed inside the loop. Since
this includes a call to __tls_get_addr, this will in turn cause the
CTR loops verifier to complain.
Disable CTR loops in such cases.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=48527
glibc/sysdeps/powerpc/powerpc64 has .machine
{altivec,power4,power5,power6,power7,power8} (.machine power9 is planned in
sysdeps/powerpc/powerpc64/power9/strcmp.S).
The diagnostic is not useful anyway so just delete it.
Using the store rightmost vector element instructions to do vector
element extraction and store. The rightmost vector element on little
endian is the zeroth vector element, with these patterns that element
can be extracted and stored in one instruction for all vector types.
Differential Revision: https://reviews.llvm.org/D89195
On subtargets that have a red zone, we will copy the stack pointer to the base
pointer in the prologue prior to updating the stack pointer. There are no other
updates to the base pointer after that. This suggests that we should be able to
restore the stack pointer from the base pointer rather than loading it from the
back chain or adding the frame size back to either the stack pointer or the
frame pointer.
This came about because functions that call setjmp need to restore the SP from
the FP because the back chain might have been clobbered
(see https://reviews.llvm.org/D92906). However, if the stack is realigned, the
restored SP might be incorrect (which is what caused the failures in the two
ASan test cases).
This patch was tested quite extensivelly both with sanitizer runtimes and
general code.
Differential revision: https://reviews.llvm.org/D93327
CanBeUnnamed is rarely false. Splitting to a createNamedTempSymbol makes the
intention clearer and matches the direction of reverted r240130 (to drop the
unneeded parameters).
No behavior change.
Summary: Some constants can be handled with less instructions than our current results. And it seems our original approach is not very easy to extend. Therefore this patch proposes to materialize all 64-bit constants by enumerated patterns.
I traversed almost all constants to verified the functionality of these pattens. A traversed comparison of the number of instructions used by the original method and the new method has also been completed, where no degradation was caused by this patch. This patch also passed Bootstrap test and SPEC test.
Improvements of this patch are shown in llvm/test/CodeGen/PowerPC/constants-i64.ll
Reviewed By: steven.zhang, stefanp
Differential Revision: https://reviews.llvm.org/D92089
This patch does two things:
1: fix the typo that intrinsic mfvscr should be with no readmem property
2: since VSCR is not modeled yet, add has side effect for SAT bit clobber
intrinsics/instructions.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D90807
The LD/STD likewise instruction are selected only when the alignment in
the load/store >= 4 to deal with the case that the offset might not be
known(i.e. relocations). That means we have to select the X-Form load
for %0 = load i64, i64* %arrayidx, align 2 In fact, we can still select
the D-Form load if the offset is known. So, we only query the load/store
alignment when we don't know if the offset is a multiple of 4.
Reviewed By: jji, Nemanjai
Differential Revision: https://reviews.llvm.org/D93099
On PPC, the vector pair instructions are independent from MMA.
This patch renames the vector pair LLVM intrinsics and Clang builtins to replace the _mma_ prefix by _vsx_ in their names.
We also move the vector pair type/intrinsic/builtin tests to their own files.
Differential Revision: https://reviews.llvm.org/D91974
The PPCCTRLoop pass has been moved to HardwareLoops,
so the comments and some useless code are deprecated now.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D93336
X86 and AArch64 expand it as libcall inside the target. And PowerPC also
want to expand them as libcall for P8. So, propose an implement in the
legalizer to common the logic and remove the code for X86/AArch64 to
avoid the duplicate code.
Reviewed By: Craig Topper
Differential Revision: https://reviews.llvm.org/D91331
SUMMARY:
In order for the runtime on AIX to find the compact unwind section(EHInfo table),
we would need to set the following on the traceback table:
The 6th byte's longtbtable field to true to signal there is an Extended TB Table Flag.
The Extended TB Table Flag to be 0x08 to signal there is an exception handling info presents.
Emit the offset between ehinfo TC entry and TOC base after all other optional portions of traceback table.
The patch is authored by Jason Liu.
Reviewers: David Tenty, Digger Lin
Differential Revision: https://reviews.llvm.org/D92766
This patch enables the Clang type __vector_pair and its associated LLVM
intrinsics even when MMA is disabled. With this patch, the type is now controlled
by the PPC paired-vector-memops option. The builtins and intrinsics will be
renamed to drop the mma prefix in another patch.
Differential Revision: https://reviews.llvm.org/D91819
If a function happens to:
- call setjmp
- do a 16-byte stack allocation
- call a function that sets up a stack frame and longjmp's back
The stack pointer that is restores by setjmp will no longer point to a valid
back chain. According to the ABI, stack accesses in such a function are to be
frame pointer based - so it is an error (quite obviously) to restore the stack
from the back chain.
We already restore the stack from the frame pointer when there are calls to
fast_cc functions. We just need to also do that when there are calls to setjmp.
This patch simply does that.
This was pointed out by the Julia team.
Differential revision: https://reviews.llvm.org/D92906
The runtime library has two family library implementation for ppc_fp128 and fp128.
For IBM Long double(ppc_fp128), it is suffixed with 'l', i.e(sqrtl). For
IEEE Long double(fp128), it is suffixed with "ieee128" or "f128".
We miss to map several libcall for IEEE Long double.
Reviewed By: qiucf
Differential Revision: https://reviews.llvm.org/D91675
add a new goal MustReduceRegisterPressure for machine combiner pass.
PowerPC will use this new goal to do some register pressure related optimization.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D92068
Some of the pattern matching in PPCInstrVSX.td and node lowering involving vectors assumes 64bit mode. This patch disables some of the unsafe pattern matching and lowering of BUILD_VECTOR in 32bit mode.
Reviewed By: Xiangling_L
Differential Revision: https://reviews.llvm.org/D92789
Summary:
"Speculative fix for link failure on bots" with a mention of "the clang-ppc64le-rhel bot fails on link: http://lab.llvm.org:8011/#/builders/57/builds/2307/steps/6/logs/stdio".
PPCAsmPrinter.cpp:(.text._ZN12_GLOBAL__N_116PPCAIXAsmPrinter19emitFunctionBodyEndEv+0x2f8): undefined reference to `llvm::XCOFF::getNameForTracebackTableLanguageId(llvm::XCOFF::TracebackTable::LanguageID)'
PPCAsmPrinter.cpp:(.text._ZN12_GLOBAL__N_116PPCAIXAsmPrinter19emitFunctionBodyEndEv+0x2170): undefined reference to `llvm::XCOFF::parseParmsType(unsigned int, unsigned int)'
SUMMARY:
1. added a new option -xcoff-traceback-table to control whether generate traceback table for function.
2. implement the functionality of emit traceback table of a function.
Reviewers: hubert.reinterpretcast, Jason Liu
Differential Revision: https://reviews.llvm.org/D92398
We defined SubRegIndex for 256/512 regs,
but we did not set the offset for higher part,
so the offset of lower and higher part are the same.
This may cause problem in assessing ranges of SubReg,
it is great that this haven't affected any testcases,
but I think we should fix it to avoid hidden bugs in the future.
Reviewed By: bsaleil, #powerpc
Differential Revision: https://reviews.llvm.org/D92864
Weak functions can be replaced by other functions at link time. Previously it
was assumed that no matter what the weak callee function was replaced with it
would still share the same TOC as the caller. This is no longer true as a weak
callee with a TOC setup can be replaced by another function that was compiled
with PC Relative and does not have a TOC at all.
This patch makes sure that all calls to functions defined as weak from a caller
that has a valid TOC have a nop after the call to allow a place for the linker
to restore the TOC.
Reviewed By: NeHuang
Differential Revision: https://reviews.llvm.org/D91983
Instruction darn was introduced in ISA 3.0. It means 'Deliver A Random
Number'. The immediate number L means:
- L=0, the number is 32-bit (higher 32-bits are all-zero)
- L=1, the number is 'conditioned' (processed by hardware to reduce bias)
- L=2, the number is not conditioned, directly from noise source
GCC implements them in three separate intrinsics: __builtin_darn,
__builtin_darn_32 and __builtin_darn_raw. This patch implements the
same intrinsics. And this change also addresses Bugzilla PR39800.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D92465
Summary: The imm operands of some instructions are not defined accurately in td.
This is a small patch to correct these definitions.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D91603
The xxeval instruction was intorduced in Power PC in Power 10.
The instruction accepts three vector registers and an immediate.
Depending on the value of the immediate the instruction can be used
to perform certain bitwise boolean operations (and, or, xor, ...) on
the given vector registers.
This patch implements the AND and NAND patterns that can be used by
the instruction.
Reviewed By: nemanjai, #powerpc, bsaleil, NeHuang, jsji
Differential Revision: https://reviews.llvm.org/D92420
Summary: This patch added support for the intrinsics llvm.ppc.dcbfps and llvm.ppc.dcbstps.
dcbfps and dcbstps are actually extended mnemonics of dcbf.
dcbfps RA,RB ---> dcbf RA,RB,4
dcbstps RA,RB ---> dcbf RA,RB,6
Reviewed By: amyk, steven.zhang
Differential Revision: https://reviews.llvm.org/D91323
A simple SELECT is used for converting i1 to floating types on ppc32,
but in constrained cases, the chain is not handled properly. This patch
will fix that.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D92365
This PR adds more register class support in PowerPC,
mark OperandType for imm and memory operands.
Also added more unit tests for SnippetGenerator.
Reviewed By: #powerpc, steven.zhang
Differential Revision: https://reviews.llvm.org/D88044
When using accumulators in loops, they are passed around in PHI nodes of unprimed
accumulators, causing the generation of additional prime/unprime instructions.
This patch detects these cases and changes these PHI nodes to primed accumulator
PHI nodes. We also add IR and MIR test cases for several PHI node cases.
Differential Revision: https://reviews.llvm.org/D91391
PowerPC ISA support the input test for vector type v4f32 and v2f64.
Replace the software compare with hw test will improve the perf.
Reviewed By: ChenZheng
Differential Revision: https://reviews.llvm.org/D90914
Summary:
AIX uses the existing EH infrastructure in clang and llvm.
The major differences would be
1. AIX do not have CFI instructions.
2. AIX uses a new personality routine, named __xlcxx_personality_v1.
It doesn't use the GCC personality rountine, because the
interoperability is not there yet on AIX.
3. AIX do not use eh_frame sections. Instead, it would use a eh_info
section (compat unwind section) to store the information about
personality routine and LSDA data address.
Reviewed By: daltenty, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D91455
In lowering of FLT_ROUNDS_, FPSCR content will be moved into FP register
and then GPR, and then truncated into word.
For subtargets without direct move support, it will store and then load.
The load address needs adjustment (+4) only on big-endian targets. This
patch fixes it on using generic opcodes on little-endian and subtargets
with direct-move.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D91845
i1 is the native type for PowerPC if crbits is enabled. However, we need
to promote the i1 to i64 as we didn't have the pattern for i1.
Reviewed By: Qiu Chao Fang
Differential Revision: https://reviews.llvm.org/D92067
Continue the work started at D50989.
The code has been long dead since the triple has been removed (D75494).
Reviewed By: nickdesaulniers, void
Differential Revision: https://reviews.llvm.org/D91836
For now, we will hardcode the result as 0.0 if the input is denormal or 0. That will
have the impact the precision. As the fsqrt added belong to the cold path of the
cmp+branch, it won't impact the performance for normal inputs for PowerPC, but improve
the precision if the input is denormal.
Reviewed By: Spatel
Differential Revision: https://reviews.llvm.org/D80974
This patch enables passing non variadic vector type parameters on the caller and callee side and vector return on AIX that are passed in vector registers only.
So far, support is enabled for only the AIX extended Altivec ABI Calling convention.
Reviewed By: sfertile, DiggerLin
Differential Revision: https://reviews.llvm.org/D86476
If smax() is legal, this is likely to result in smaller codegen expansion for abs(x) than the xor(add,ashr) method.
This is also what PowerPC has been doing for its abs implementation, so it lets us get rid of a load of custom lowering code there (and which was never updated when they added smax lowering).
Alive2: https://alive2.llvm.org/ce/z/xRk3cD
Differential Revision: https://reviews.llvm.org/D92095
During reviewing https://reviews.llvm.org/D84419, @efriedma mentioned the gap between realigned stack pointer and origin stack pointer should be probed too whatever the alignment is. This patch fixes the issue for PPC64.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D88078
PowerPC has instruction ftsqrt/xstsqrtdp etc to do the input test for software square root.
LLVM now tests it with smallest normalized value using abs + setcc. We should add hook to
target that has test instructions.
Reviewed By: Spatel, Chen Zheng, Qiu Chao Fang
Differential Revision: https://reviews.llvm.org/D80706
This patch is the initial patch for support of the AIX extended vector ABI. The extended ABI treats vector registers V20-V31 as non-volatile and we add them as callee saved registers in this patch.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D88676
For now, we are using the GPR to pass the arguments/return value for fp128 on Power8,
which is incorrect. It should be VSR. The reason why we do it this way is that,
we are setting the fp128 as illegal which make LLVM try to emulate it with i128 on
Power8. So, we need to correct it as legal.
Reviewed By: Nemanjai
Differential Revision: https://reviews.llvm.org/D91527
Added support for the options mabi=vec-extabi and mabi=vec-default which are analogous to qvecnvol and qnovecnvol when using XL on AIX.
The extended Altivec ABI on AIX is enabled using mabi=vec-extabi in clang and vec-extabi in llc.
Reviewed By: Xiangling_L, DiggerLin
Differential Revision: https://reviews.llvm.org/D89684
When the operand to an (s/u)int_to_fp node is an illegally typed load we
cannot reuse the load address since we can not build a proper dependancy
chain. The legalized loads will use a different chain output then the
illegal load. If we reuse the load address then we will build a
conversion node that uses the chain of the illegal load and operations
which modify the memory address in the other dependancy chain can be
scheduled before the floating point load which feeds the conversion.
Differential Revision: https://reviews.llvm.org/D91265
New pseudo instructions GETtlsADDRPCREL and GETtlsldADDRPCREL are added for properly
setting REGMASK for tls_get_addr function when using PCRelative address.
Differential Revisien: https://reviews.llvm.org/D91420
Reviewed by: bsaleil
It is possible that we have different constants in different slots
of second vector double (float) of pow function. So, in this case
Exp->getSplatValue() will return nullptr. Here, I handle it properly.
Reviewed By: steven.zhang, PowerPC
Differential Revision: https://reviews.llvm.org/D91729
Support reserved [0-100] and non-reserved[101-65535] Clang/GNU init
priority values on AIX.
This patch maps Clang/GNU values into priority values used in sinit/sterm
functions. User can play with values and be able to get init to occur
before or after XL init and vice versa.
Differential Revision: https://reviews.llvm.org/D91272
Summary: We have the patterns to fold 2 RLWINMs before RA, while some RLWINM will be generated after RA, for example rGc4690b007743. If the RLWINM generated after RA followed by another RLWINM, we expect to perform the optimization too.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D89855
The default version only works if the returned node has a single
result. The X86 and PowerPC versions support multiple results
and allow a single result to be returned from a node with
multiple outputs. And allow a single result that is not result 0
of the node.
Also replace the Mips version since the new version should work
for it. The original version handled multiple results, but only
if the new node and original node had the same number of results.
Differential Revision: https://reviews.llvm.org/D91846
Clang generates a '%' prefix for some registers in CFI directives. E.g.
".cfi_register lr, r12" becomes ".cfi_register lr, %r12" after
processing.
Differential Revision: https://reviews.llvm.org/D91735
In some situations, the compiler may insert an accumulator prime instruction and
an accumulator unprime instruction with no use of that accumulator between the two.
That's for example the case when we store an accumulator after assembling it or
restoring it. This patch adds a peephole to remove these prime and unprime instructions.
Differential Revision: https://reviews.llvm.org/D91386
This patch factors out the part of printInstruction that gets the
mnemonic string for a given MCInst. This is intended to be used
subsequently for the instruction-mix remarks to display the final
mnemonic (D90040).
Unfortunately making `getMnemonic` available to the AsmPrinter
seems to require making it virtual. Not sure if there's a way around
that with the current layering of the AsmPrinters.
Reviewed By: Paul-C-Anagnostopoulos
Differential Revision: https://reviews.llvm.org/D90039
No longer rely on an external tool to build the llvm component layout.
Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.
These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.
Differential Revision: https://reviews.llvm.org/D90848
Summary: This patch try to do the following transformation if the multiplier doen't fit int16:
(mul X, c1 << c2) -> (rldicr (mulli X, c1) c2)
Reviewed By: jsji, steven.zhang
Differential Revision: https://reviews.llvm.org/D87384
Summary: This patch depends on D89846. We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINM will be generated after RA, for example rGc4690b007743. If the RLWINM generated after RA followed by another RLWINM, we expect to perform the optimization after RA, too.
Reviewed By: shchenz, steven.zhang
Differential Revision: https://reviews.llvm.org/D89855
Summary: We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINM will be generated after RA, for example D88274. If the RLWINM generated after RA followed by another RLWINM, we expect to perform the optimization after RA, too.
This is a NFC patch to move the folding patterns to PPCInstrInfo, and the follow-up works will be calling it in pre-emit-peephole and expand the patterns to handle more cases.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D89846
Vector types, quadword integers and f128 currently cannot be handled in
FastISel. We did not skip f128 type in lowering arguments, which causes
a crash. This patch will fix it.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D90206
This reverts the revert commit 408c4408fa.
This version of the patch includes a fix for a crash caused by
treating ICmp/FCmp constant expressions as instructions.
Original message:
On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.
This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.
This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.
I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
Variable InnerIsSel references FalseRes, while FalseRes might be
zext/sext. So InnerIsSel should reference SetOrSelCC, otherwise a crash
will happen.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D90142
On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.
This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.
This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.
I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
Reviewed By: dmgreen, RKSimon
Differential Revision: https://reviews.llvm.org/D90070
When converting a BUILD_VECTOR or VECTOR_SHUFFLE to a splatting load
as of 1461fb6e78, we inaccurately check
for a single user of the load and neglect to update the users of the
output chain of the original load. As a result, we can emit a new
load when the original load is kept and the new load can be reordered
after a dependent store. This patch fixes those two issues.
Fixes https://bugs.llvm.org/show_bug.cgi?id=47891
Turn on TLS support for PCRel by default and update the test cases.
Differential Revision: https://reviews.llvm.org/D88738
Reviewed by: stefanp, kamaub
This patch implements the set boolean condition instructions introduced in
POWER10.
The set boolean condition instructions (set[n]bc[r]) are used during
the following situations:
- sign/zero/any extending i1 to an i32 or i64,
- reg+reg, reg+imm or floating point comparisons being sign/zero extended to i32 or i64,
- spilling CR bits (using the setnbc instruction)
Differential Revision: https://reviews.llvm.org/D87705
In this patch, Predicates fix added for the following:
* disable prefix-instrs will disable pcrelative-memops
* set two predicates PairedVectorMemops and PrefixInstrs for PLXVP/PSTXVP definitions
Differential Revision: https://reviews.llvm.org/D89727
Reviewed by: amyk, steven.zhang
Some instructions may be removable through processes such as IfConversion,
however DefinesPredicate can not be made aware of when this should be considered.
This parameter allows DefinesPredicate to distinguish these removable instructions
on a per-call basis, allowing for more fine-grained control from processes like
ifConversion.
Renames DefinesPredicate to ClobbersPredicate, to better reflect it's purpose
Differential Revision: https://reviews.llvm.org/D88494
MULH is often expanded on targets.
This patch removes the isMulhCheaperThanMulShift hook and uses
isOperationLegalOrCustom instead.
Differential Revision: https://reviews.llvm.org/D80485
In most of lib/Target we know that we are not dealing with scalable
types so it's perfectly fine to replace TypeSize comparison operators
with their fixed width equivalents, making use of getFixedSize()
and so on.
Differential Revision: https://reviews.llvm.org/D89101
This patch adds support for assemble disassemble intrinsics
for MMA.
Reviewed By: bsaleil, #powerpc
Differential Revision: https://reviews.llvm.org/D88739
Summary: This patch is derived from D87384.
In this patch we expand the existing decomposition of mul-by-constant to be more general by implementing 2 patterns:
```
mul x, (2^N + 2^M) --> (add (shl x, N), (shl x, M))
mul x, (2^N - 2^M) --> (sub (shl x, N), (shl x, M))
```
The conversion will be trigged if the multiplier is a big constant that the target can't use a single multiplication instruction to handle. This is controlled by the hook `decomposeMulByConstant`.
More over, the conversion benefits from an ILP improvement since the instructions are independent. A case with the sequence like following also gets benefit since a shift instruction is saved.
```
*res1 = a * 0x8800;
*res2 = a * 0x8080;
```
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D88201
SUMMARY:
In IBM compiler xlclang , there is an option -fnovisibility which suppresses visibility. For more details see: https://www.ibm.com/support/knowledgecenter/SSGH3R_16.1.0/com.ibm.xlcpp161.aix.doc/compiler_ref/opt_visibility.html.
We need to add the option -mignore-xcoff-visibility for compatibility with the IBM AIX OS (as the option is enabled by default in AIX). With this option llvm does not emit any visibility attribute to ASM or XCOFF object file.
The option only work on the AIX OS, for other non-AIX OS using the option will report an unsupported options error.
In AIX OS:
1.1 the option -mignore-xcoff-visibility is enabled by default , if there is not -fvisibility=* and -mignore-xcoff-visibility explicitly in the clang command .
1.2 if there is -fvisibility=* explicitly but not -mignore-xcoff-visibility explicitly in the clang command. it will generate visibility attributes.
1.3 if there are both -fvisibility=* and -mignore-xcoff-visibility explicitly in the clang command. The option "-mignore-xcoff-visibility" wins , it do not emit the visibility attribute.
The option -mignore-xcoff-visibility has no effect on visibility attribute when compile with -emit-llvm option to generated LLVM IR.
Reviewer: daltenty,Jason Liu
Differential Revision: https://reviews.llvm.org/D87451
getNode handling for ISD:SETCC calls FoldSETCC which can canonicalize
FP constants to the RHS. When this happens we should create the node
with the FMF that was requested. By using FlagInserter when can ensure
any calls to getNode/getSetcc during canonicalization will also get the flags.
Differential Revision: https://reviews.llvm.org/D88063
Summary: This patch implements the builtins for xvtdivdp, xvtdivsp, xvtsqrtdp, xvtsqrtsp.
The instructions correspond to the following builtins:
int vec_test_swdiv(vector double v1, vector double v2);
int vec_test_swdivs(vector float v1, vector float v2);
int vec_test_swsqrt(vector double v1);
int vec_test_swsqrts(vector float v1);
This patch depends on D88274, which fixes the bug in copying from CRRC to GPRC/G8RC.
Reviewed By: steven.zhang, amyk
Differential Revision: https://reviews.llvm.org/D88278
Summary: How we copying the CRRC to GRC is using a single MFOCRF to copy the contents of CR field n (CR bits 4×n+32:4×n+35) into bits 4×n+32:4×n+35 of register GRC. That’s not correct because we expect the value of destination register equals to source so we have to put the the contents of CR field in the lowest 4 bits. This patch adds a RLWINM after MFOCRF to achieve that.
The problem came up when adding builtins for xvtdivdp, xvtdivsp, xvtsqrtdp, xvtsqrtsp, as posted in D88278. We need to move the outputs (in CR register) to GRC. However outputs of these instructions may not in a fixed CR# register, so we can’t directly add a rotation instruction in the .td patterns, but need to wait until the CR register is determined. Then we confirmed this should be a bug in POST-RA PSEUDO PASS.
Reviewed By: nemanjai, shchenz
Differential Revision: https://reviews.llvm.org/D88274
Summary:
Some design decision worth noting about:
I've noticed a recent mailing discussing about why string literal is
not affected by -fdata-sections for ELF target:
http://lists.llvm.org/pipermail/llvm-dev/2020-September/145121.html
But on AIX, our linker could not split the mergeable string like other target.
So I think it would make more sense for us to emit separate csect for
every mergeable string in -fdata-sections mode,
as there might not be other ways for linker to do garbage collection
on unused mergeable string.
Reviewed By: daltenty, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D88339
This patch adds outer product instructions for MMA, including related infrastructure, and their tests.
Depends on D84968.
Reviewed By: #powerpc, bsaleil, amyk
Differential Revision: https://reviews.llvm.org/D88043
It looks like in some circumstances when compiling with `-mcpu=pwr9` we create an EXTSWSLI node when which causes llc to fail. No such error occurs in pwr8 or lower.
This occurs in 32BIT AIX and BE Linux. the cause seems to be that the default return in combineSHL is to create an EXTSWSLI node. Adding a check for whether we are in PPC64 before that fixes the issue.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D87046
After removal of Darwin as a PowerPC subtarget, the VRSAVE
save/restore/spill/update code is no longer needed by any supported
subtarget, so remove it while keeping support for vrsave and related instruction
aliases for inline asm. I've pre-commited tests to document the existing vrsave
handling in relation to @llvm.eh.unwind.init and inline asm usage, as
well as a test which shows a beahviour change on AIX related to
returning vector type as we were wrongly emiting VRSAVE_UPDATE on AIX.
This patch legalizes the v256i1 and v512i1 types that will be used for MMA.
It implements loads and stores of these types.
v256i1 is a pair of VSX registers, so for this type, we load/store the two
underlying registers. v512i1 is used for MMA accumulators. So in addition to
loading and storing the 4 associated VSX registers, we generate instructions to
prime (copy the VSX registers to the accumulator) after loading and unprime
(copy the accumulator back to the VSX registers) before storing.
This patch also adds the UACC register class that is necessary to implement the
loads and stores. This class represents accumulator in their unprimed form and
allow the distinction between primed and unprimed accumulators to avoid invalid
copies of the VSX registers associated with primed accumulators.
Differential Revision: https://reviews.llvm.org/D84968
According to POWER ISA, floating point instructions altering exception
bits in FPSCR should be 'may raise FP exception'. (excluding those
read or write the whole FPSCR directly, like mffs/mtfsf) We need to
model FPSCR well in future patches to handle the special case properly.
Instructions added mayRaiseFPException:
- fre(s)/frsqrte(s)
- fmadd(s)/fmsub(s)/fnmadd(s)/fnmsub(s)
- xscmpoqp/xscmpuqp/xscmpeqdp/xscmpgedp/xscmpgtdp
- xscvdphp/xscvhpdp/xvcvhpsp/xvcvsphp/xsrqpxp
- xsmaxcdp/xsincdp/xsmaxjdp/xsminjdp
Instructions removed mayRaiseFPException:
- xstdivdp/xvtdiv(d|s)p/xstsqrtdp/xvtsqrt(d|s)p
- xsabsdp/xsnabsdp/xvabs(d|s)p/xvnabs(d|s)p
- xsnegdp/xscpsgndp/xvneg(d|s)p/xvcpsgn(d|s)p
- xvcvsxwdp/xvcvuxwdp
- xscvdpspn/xscvspdpn
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D87738
This patch adds the xxmfacc, xxmtacc and xxsetaccz instructions to manipulate
accumulator registers. It also adds the ACC register class definition for the
accumulator registers.
Differential Revision: https://reviews.llvm.org/D84847
This patch implements the vec_[all|any]_[eq | ne | lt | gt | le | ge] builtins for vector signed/unsigned __int128.
Differential Revision: https://reviews.llvm.org/D87910
This patch is the initial support for the Local Dynamic Thread Local Storage
model to produce code sequence and relocation correct to the ABI for the model
when using PC relative memory operations.
Differential Revision: https://reviews.llvm.org/D87721
This patch implements the vector string isolate (predicate and non-predicate
versions) builtins. The predicate builtins are custom selected within PPCISelDAGToDAG.
Differential Revision: https://reviews.llvm.org/D87671
This patch implements the 128-bit vector divide extended builtins in Clang/LLVM.
These builtins map to the vdivesq and vdiveuq instructions respectively.
Differential Revision: https://reviews.llvm.org/D87729
Stop combining loads and stores with PPCISD::ADD_TLS before we can merge the
node with with TLS_LOCAL_EXEC_MAT_ADDR. The issue is that
TLS_LOCAL_EXEC_MAT_ADDR cannot be selected by itself and requires the previous
ADD_TLS node that goes with it. However, we sometimes try to combine ADD_TLS
with loads and stores that come after it. If this happens then the ADD_TLS is
removed and TLS_LOCAL_EXEC_MAT_ADDR cannot be selected.
While this bug fix will address the issue it my not be ideal from a performance
perspective as we may be able to add patterns to combine TLS_LOCAL_EXEC_MAT_ADDR
with ADD_TLS with the load and store that comes after it all in one. However,
this is beyond the scope of this patch.
Reviewed By: NeHuang
Differential Revision: https://reviews.llvm.org/D88030
Changes TTI function getIntImmCostInst to take an additional Instruction parameter,
which enables us to be able to check it is part of a min(max())/max(min()) pattern that will match SSAT.
We can then mark the constant used as free to prevent it being hoisted so SSAT can still be generated.
Required minor changes in some non-ARM backends to allow for the optional parameter to be included.
Differential Revision: https://reviews.llvm.org/D87457
This patch adds support for the lxvp, lxvpx, plxvp, stxvp, stxvpx and pstxvp
instructions in the PowerPC backend. These instructions allow loading and
storing VSX register pairs. This patch also adds the VSRp register class
definition needed for these instructions.
Differential Revision: https://reviews.llvm.org/D84359
This is a follow-up of D86605. For strict DAG FP node, if its FP
exception behavior metadata is ignore, it should have nofpexcept flag.
But during custom lowering, this flag isn't passed down.
This is also seen on X86 target.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D87390
This patch implements the vec_gen[b|h|w|d|q]m function prototypes in altivec.h
in order to utilize the move to VSR with mask instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82725
This patch adds the instruction definitions and assembly/disassembly tests for
the set boolean condition instructions. This also includes the negative, and
reverse variants of the instruction.
Differential Revision: https://reviews.llvm.org/D86252
This patch implements the vec_cntm function prototypes in altivec.h in order to
utilize the vector count mask bits instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82726
llc would crash for (store (fptosi-f128-i32)) when -mcpu=pwr8, we should
not generate FP_TO_(S|U)INT_IN_VSR for f128 types at this time. This
patch fixes it.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D86686
This patch is the initial support for the Local Exec Thread Local
Storage model to produce code sequence and relocations correct
to the ABI for the model when using PC relative memory operations.
Patch by: Kamau Bridgeman
Differential Revision: https://reviews.llvm.org/D83404
Summary:
In small code model, AIX assembler could not deal with labels that
could not be reached within the [-0x8000, 0x8000) range from TOC base.
So when generating the assembly, we would need to help the assembler
by subtracting an offset from the label to keep the actual value
within [-0x8000, 0x8000).
Reviewed By: hubert.reinterpretcast, Xiangling_L
Differential Revision: https://reviews.llvm.org/D86879
with P9 Model
Enable the pre-ra and post-ra scheduler strategy for Power10 as we want
to customize the heuristic later. And switch the scheduler model with P9
model before P10 Model is available. The NoSchedModel is modelled as
in-order cpu and the pre-ra scheduler is not bi-directional which will
have big impact on the scheduler.
Reviewed By: jji
Differential Revision: https://reviews.llvm.org/D86865
From ISA, fcmpu will raise the Floating-Point Invalid Operation
Exception (SNaN) if either of the operands is a Signaling NaN by setting
the bit VXSNAN. But the instruction description didn't set the
mayRaiseFPException which might have impact on the scheduling or some
backend optimization.
Reviewed By: qiucf
Differential Revision: https://reviews.llvm.org/D83937
This adds the initial GlobalISel skeleton for PowerPC. It can only run
ir-translator and legalizer for `ret void`.
This is largely based on the initial GlobalISel patch for RISCV
(https://reviews.llvm.org/D65219).
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D83100
22a0edd0 introduced a config IsStrictFPEnabled, which controls the
strict floating point mutation (transforming some strict-fp operations
into non-strict in ISel). This patch disables the mutation by default
since we've finished PowerPC strict-fp enablement in backend.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D87222
In standard C library, both rint and nearbyint returns rounding result
in current rounding mode. But nearbyint never raises inexact exception.
On PowerPC, x(v|s)r(d|s)pic may modify FPSCR XX, raising inexact
exception. So we can't select constrained fnearbyint into xvrdpic.
One exception here is xsrqpi, which will not raise inexact exception, so
fnearbyint f128 is okay here.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D87220
This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130.
This required adding a SDNodeFlags to SelectionDAG::getSetCC.
Now we manage to contant fold some stuff undefs during the
initial getNode that we don't do in later DAG combines.
Differential Revision: https://reviews.llvm.org/D87200
Commit 3c0b3250 introduced memory cluster under pwr10 target, but a
check for operands was unexpectedly removed. This adds it back to avoid
regression.
Without gcc 7.4 warns with
../lib/Target/PowerPC/PPCInstrInfo.cpp:2284:25: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
BaseOp1.isFI() &&
~~~~~~~~~~~~~~~^~
"Only base registers and frame indices are supported.");
~
On Power10, it's profitable to schedule some stores with adjacent target
address together. This patch implements this feature.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D86754
This patch implements the vec_expandm function prototypes in altivec.h in order
to utilize the vector expand with mask instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82727
Libcall __gcc_qtou is not available, which breaks some tests needing
it. On PowerPC, we have code to manually expand the operation, this
patch applies it to constrained conversion. To keep it strict-safe,
it's using the algorithm similar to expandFP_TO_UINT.
For constrained operations marking FP exception behavior as 'ignore',
we should set the NoFPExcept flag. However, in some custom lowering
the flag is missed. This should be fixed by future patches.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D86605
The test case in https://bugs.llvm.org/show_bug.cgi?id=47373 exposed
two bugs in the PPC back end. The first one was fixed in commit
2771407584 but the test case had to
be added without -verify-machineinstrs due to the second bug.
This commit fixes the use-after-kill that is left behind by the
PPC MI peephole optimization.
Quite a while ago, we legalized these nodes as we added custom
handling for reciprocal estimates in the back end. We have since
moved to target-independent combines but neglected to turn off
legalization. As a result, we can now get selection failures on
non-VSX subtargets as evidenced in the listed PR.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=47373
This patch implements the builtins for Vector Multiply Builtins (vmulxxd family of instructions), and adds the appropriate test cases for these builtins. The builtins utilize the vector multiply instructions itnroduced with ISA 3.1.
Differential Revision: https://reviews.llvm.org/D83955
General purpose registers 30 and 31 are handled differently when they are
reserved as the base-pointer and frame-pointer respectively. This fixes the
offset of their fixed-stack objects when there are fpr calle-saved registers.
Differential Revision: https://reviews.llvm.org/D85850
On -O0, i1 strict_fsetcc will be promoted to i32. We don't handle that
in TD patterns. This patch fills logic in PPCISelDAGToDAG to handle more
cases.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D86595
This patch adds the td instruction definitions of the xvcvspbf16 and xvcvbf16spn
instructions, along with their respective MC tests.
Differential Revision: https://reviews.llvm.org/D86794
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.
So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline.
This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.
Differential Revision: https://reviews.llvm.org/D86744
This patch implements the builtins for Vector Load with Zero and Signed Extend Builtins (lxvr_x for b, h, w, d), and adds the appropriate test cases for these builtins. The builtins utilize the vector load instructions itnroduced with ISA 3.1.
Differential Revision: https://reviews.llvm.org/D82502#inline-797941
When collecting `i1` values via `findAllDefs`, ignore Constant's
operands, since Constant's operands might not be `i1`.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46923 which causes ICE
```
llvm-project/llvm/lib/IR/Constants.cpp:1924: static llvm::Constant *llvm::ConstantExpr::getZExt(llvm::Constant *, llvm::Type *, bool): Assertion `C->getType()->getScalarSizeInBits() < Ty->getScalarSizeInBits()&& "SrcTy must be smaller than DestTy for ZExt!"' failed.
```
Differential Revision: https://reviews.llvm.org/D85007
This patch implements the function prototypes vec_mulh and vec_dive in order to
utilize the vector multiply high (vmulh[s|u][w|d]) and vector divide extended
(vdive[s|u][w|d]) instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82609
Summary:
Support TOCU and TOCL relocation type for object file generation.
Reviewed by: DiggerLin
Differential Revision: https://reviews.llvm.org/D84549
Without the fix gcc 7.4 warns with
../lib/Target/PowerPC/PPCAsmPrinter.cpp: In member function 'void {anonymous}::PPCAsmPrinter::EmitTlsCall(const llvm::MachineInstr*, llvm::MCSymbolRefExpr::VariantKind)':
../lib/Target/PowerPC/PPCAsmPrinter.cpp:525:53: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
MCInstBuilder(Subtarget->isPPC64() ? Opcode : PPC::BL_TLS)
~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
PC-Relative addressing introduces a fair bit of complexity for correctly
eliminating TOC accesses. FastISel does not include any of that handling so we
miscompile code with -mcpu=pwr10 -O0 if it includes an external call that
FastISel does not handle followed by any of the following:
Floating point constant materialization
Materialization of a GlobalValue
Call that FastISel does handle
This patch switches to SDISel for any of the above.
Differential revision: https://reviews.llvm.org/D86343
Current custom lowering of truncate vector handles a source of up to 128 bits, but that only uses one of the two shuffle vector operands. Extend it to use both operands to handle 256 bit sources.
Differential Revision: https://reviews.llvm.org/D68035
This patch adds frontend and backend options to enable and disable
the PowerPC MMA operations added in ISA 3.1. Instructions using these
options will be added in subsequent patches.
Differential Revision: https://reviews.llvm.org/D81442
We may meet Invalid CTR loop crash when there's constrained ops inside.
This patch adds constrained FP intrinsics to the list so that CTR loop
verification doesn't complain about it.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D81924
This patch makes these operations legal, and add necessary codegen
patterns.
There's still some issue similar to D77033 for conversion from v1i128
type. But normal type tests synced in vector-constrained-fp-intrinsic
are passed successfully.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D83654
This patch adds support for constrained scalar int to fp operations on
PowerPC. Besides, this also fixes the FP exception bit of FCFID*
instructions.
Reviewed By: steven.zhang, uweigand
Differential Revision: https://reviews.llvm.org/D81669
This patch is the initial support for the Intial Exec Thread Local
Local Storage model to produce code sequence and relocations correct
to the ABI for the model when using PC relative memory operations.
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D81947
This patch is the initial support for the General Dynamic Thread Local
Local Storage model to produce code sequence and relocations correct
to the ABI for the model when using PC relative memory operations.
Patch by: NeHuang
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D82315
This patch adds support for constrained scalar fp to int operations on
PowerPC. Besides, this fixes the FP exception bit of quad-precision
convert & truncate instructions.
Reviewed By: steven.zhang, uweigand
Differential Revision: https://reviews.llvm.org/D81537
Summary:
This is a follow up for D82481. For .lcomm directive, although it's
not necessary to have .rename emitted, it's still desirable to do
it so that we do not see internal 'Rename..' gets print out in
symbol table. And we could have consistent naming between TC entry
and .lcomm. And also have consistent naming between IR and final
object file.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D86075
This patch implements the vec_extractm function prototypes in altivec.h in
order to utilize the vector extract with mask instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82675
This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line.
This patch adds MC layer support a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables . These features lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all target as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned.
One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU.
I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning.
Differential Revision: https://reviews.llvm.org/D85165
A unique module id, which is a part of sinit and sterm function names, is
necessary to be unique. However, `getUniqueModuleId` will fail if there is
no strong external symbol within a module. We turn to use Pid and timestamp
when this happens.
Differential Revision: https://reviews.llvm.org/D85527
This patch implements the builtins for the vector shifts (shl, srl, sra), and
adds the appropriate test cases for these builtins. The builtins utilize the
vector shift instructions introduced within ISA 3.1.
Differential Revision: https://reviews.llvm.org/D83338
SUMMARY:
1. in the patch , remove setting storageclass in function .getXCOFFSection and construct function of class MCSectionXCOFF
there are
XCOFF::StorageMappingClass MappingClass;
XCOFF::SymbolType Type;
XCOFF::StorageClass StorageClass;
in the MCSectionXCOFF class,
these attribute only used in the XCOFFObjectWriter, (asm path do not need the StorageClass)
we need get the value of StorageClass, Type,MappingClass before we invoke the getXCOFFSection every time.
actually , we can get the StorageClass of the MCSectionXCOFF from it's delegated symbol.
2. we also change the oprand of branch instruction from symbol name to qualify symbol name.
for example change
bl .foo
extern .foo
to
bl .foo[PR]
extern .foo[PR]
3. and if there is reference indirect call a function bar.
we also add
extern .bar[PR]
Reviewers: Jason liu, Xiangling Liao
Differential Revision: https://reviews.llvm.org/D84765
Changes the Offset arguments to both functions from int64_t to TypeSize
& updates all uses of the functions to create the offset using TypeSize::Fixed()
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85220
Summary:
Use TE SMC instead of TC SMC in large code model mode,
so that large code model TOC entries could get placed after all
the small code model TOC entries, which reduces the chance of TOC overflow.
Reviewed By: Xiangling_L
Differential Revision: https://reviews.llvm.org/D85455
Summary:
AIX assembler does not generate correct relocation when .rename
appear between tc entry label and .tc directive.
So only emit .rename after .tc/.comm or other linkage is emitted.
Reviewed By: daltenty, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D85317
On the frontend side, this patch recovers AIX static init implementation to
use the linkage type and function names Clang chooses for sinit related function.
On the backend side, this patch sets correct linkage and function names on aliases
created for sinit/sterm functions.
Differential Revision: https://reviews.llvm.org/D84534
Add a hidden option to the compiler to control a the PC Relative GOT indirect
linker optimization.
If this option is set to false the compiler will no loger produce the
relocations required by the linker to perform the optimization.
Reviewed By: nemanjai, NeHuang, #powerpc
Differential Revision: https://reviews.llvm.org/D85377
This patch introduces two intrinsics: llvm.ppc.setflm and
llvm.ppc.readflm. They read from or write to FPSCR register
(floating-point status & control) which contains rounding mode and
exception status.
To ensure correctness of program, we need to prevent FP operations from
being moved across these intrinsics (mffs/mtfsf instruction), so here I
set them as scheduling boundaries. We can relax such restriction if
FPSCR is modeled well in the future.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D84914
This patch adds the instruction definitions and assembly/disassembly tests for
the following set of instructions:
Vector Extract [byte | half | word | doubleword | quad] with mask
Vector Expand [byte | half | word | doubleword | quad] with mask
Move to VSR [byte | byte immediate | half | word | doubleword | quad] with mask
Vector Count Mask Bits [byte | half | word | doubleword]
Differential Revision: https://reviews.llvm.org/D83724
Introduce a fatal error if any thread local storage code is compiled
using pc relative memory operations as well as a hidden override
option `-enable-ppc-pcrel-tls` so that this support can be incrementally
added if possible.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D85448
This patch implements the function prototypes vec_extractl and vec_extracth in altivec.h to utilize the vector extract double element instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D84622
The swap removal pass looks to remove swaps when a loaded value is swapped, some
number of lane-insensitive operations are performed and then the value is
swapped again and stored.
However, in a situation where we load the value, swap it and then store it
without swapping again, the pass erroneously removes the single swap. The
reason is that both checks in the same equivalence class:
- load feeds a swap
- swap feeds a store
pass. However, there is no check that the two swaps are actually a single swap.
This patch just fixes that.
Differential revision: https://reviews.llvm.org/D84785
The custom lowering saves an instruction over the generic expansion, by
taking advantage of the fact that PowerPC shift instructions are well
defined in the shift-by-bitwidth case.
Differential Revision: https://reviews.llvm.org/D83948
For FP_TO_INT and INT_TO_FP lowering, we have direct-move and
non-direct-move methods. But they share some conversion logic, so we can
reduce redundant code by introducing new methods.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D81818
This is a refactor patch to prepare for adding the support for strict-fsetcc
in PowerPC backend. We want to move their definition into a uniform way so that,
we could add the strict node easier.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D81712
SPE doesn't have a fsel instruction, so don't try to lower to it.
This fixes a "Cannot select: tN: f64 = PPCISD::FSEL tX, tY, tZ" error.
Reviewed By: #powerpc, lkail
Differential Revision: https://reviews.llvm.org/D77773
The patterns were incorrect copies from the FPU code, and are
unnecessary, since there's no extended load for SPE. Just let LLVM
itself do the work by marking it expand.
Reviewed By: #powerpc, lkail
Differential Revision: https://reviews.llvm.org/D78670
This patch implements the instruction definition and MC tests for the vector
string isolate instructions.
Differential Revision: https://reviews.llvm.org/D84197
Scheduler will try to retrieve the offset and base addr to determine if two
loads/stores are disjoint memory access. PowerPC failed to handle this for
frame index which will bring extra memory dependency for loads/stores.
Reviewed By: jji
Differential Revision: https://reviews.llvm.org/D84308
Summary:
This patch implements -ffunction-sections on AIX.
This patch focuses on assembly generation.
Follow-on patch needs to handle:
1. -ffunction-sections implication for jump table.
2. Object file generation path and associated testing.
Differential Revision: https://reviews.llvm.org/D83875
Summary:
Some instructions have set the wrong [RM] flag, this patch is to fix it.
Instructions x(v|s)r(d|s)pi[zmp]? and fri[npzm] use fixed rounding
directions without referencing current rounding mode.
Also, the SETRNDi, SETRND, BCLRn, MTFSFI, MTFSB0, MTFSB1, MTFSFb,
MTFSFI, MTFSFI_rec, MTFSF, MTFSF_rec should also fix the RM flag.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D81360