Weak functions can be replaced by other functions at link time. Previously it
was assumed that no matter what the weak callee function was replaced with it
would still share the same TOC as the caller. This is no longer true as a weak
callee with a TOC setup can be replaced by another function that was compiled
with PC Relative and does not have a TOC at all.
This patch makes sure that all calls to functions defined as weak from a caller
that has a valid TOC have a nop after the call to allow a place for the linker
to restore the TOC.
Reviewed By: NeHuang
Differential Revision: https://reviews.llvm.org/D91983
Instruction darn was introduced in ISA 3.0. It means 'Deliver A Random
Number'. The immediate number L means:
- L=0, the number is 32-bit (higher 32-bits are all-zero)
- L=1, the number is 'conditioned' (processed by hardware to reduce bias)
- L=2, the number is not conditioned, directly from noise source
GCC implements them in three separate intrinsics: __builtin_darn,
__builtin_darn_32 and __builtin_darn_raw. This patch implements the
same intrinsics. And this change also addresses Bugzilla PR39800.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D92465
Summary: The imm operands of some instructions are not defined accurately in td.
This is a small patch to correct these definitions.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D91603
The xxeval instruction was intorduced in Power PC in Power 10.
The instruction accepts three vector registers and an immediate.
Depending on the value of the immediate the instruction can be used
to perform certain bitwise boolean operations (and, or, xor, ...) on
the given vector registers.
This patch implements the AND and NAND patterns that can be used by
the instruction.
Reviewed By: nemanjai, #powerpc, bsaleil, NeHuang, jsji
Differential Revision: https://reviews.llvm.org/D92420
Summary: This patch added support for the intrinsics llvm.ppc.dcbfps and llvm.ppc.dcbstps.
dcbfps and dcbstps are actually extended mnemonics of dcbf.
dcbfps RA,RB ---> dcbf RA,RB,4
dcbstps RA,RB ---> dcbf RA,RB,6
Reviewed By: amyk, steven.zhang
Differential Revision: https://reviews.llvm.org/D91323
A simple SELECT is used for converting i1 to floating types on ppc32,
but in constrained cases, the chain is not handled properly. This patch
will fix that.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D92365
This PR adds more register class support in PowerPC,
mark OperandType for imm and memory operands.
Also added more unit tests for SnippetGenerator.
Reviewed By: #powerpc, steven.zhang
Differential Revision: https://reviews.llvm.org/D88044
When using accumulators in loops, they are passed around in PHI nodes of unprimed
accumulators, causing the generation of additional prime/unprime instructions.
This patch detects these cases and changes these PHI nodes to primed accumulator
PHI nodes. We also add IR and MIR test cases for several PHI node cases.
Differential Revision: https://reviews.llvm.org/D91391
PowerPC ISA support the input test for vector type v4f32 and v2f64.
Replace the software compare with hw test will improve the perf.
Reviewed By: ChenZheng
Differential Revision: https://reviews.llvm.org/D90914
Summary:
AIX uses the existing EH infrastructure in clang and llvm.
The major differences would be
1. AIX do not have CFI instructions.
2. AIX uses a new personality routine, named __xlcxx_personality_v1.
It doesn't use the GCC personality rountine, because the
interoperability is not there yet on AIX.
3. AIX do not use eh_frame sections. Instead, it would use a eh_info
section (compat unwind section) to store the information about
personality routine and LSDA data address.
Reviewed By: daltenty, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D91455
In lowering of FLT_ROUNDS_, FPSCR content will be moved into FP register
and then GPR, and then truncated into word.
For subtargets without direct move support, it will store and then load.
The load address needs adjustment (+4) only on big-endian targets. This
patch fixes it on using generic opcodes on little-endian and subtargets
with direct-move.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D91845
i1 is the native type for PowerPC if crbits is enabled. However, we need
to promote the i1 to i64 as we didn't have the pattern for i1.
Reviewed By: Qiu Chao Fang
Differential Revision: https://reviews.llvm.org/D92067
Continue the work started at D50989.
The code has been long dead since the triple has been removed (D75494).
Reviewed By: nickdesaulniers, void
Differential Revision: https://reviews.llvm.org/D91836
For now, we will hardcode the result as 0.0 if the input is denormal or 0. That will
have the impact the precision. As the fsqrt added belong to the cold path of the
cmp+branch, it won't impact the performance for normal inputs for PowerPC, but improve
the precision if the input is denormal.
Reviewed By: Spatel
Differential Revision: https://reviews.llvm.org/D80974
This patch enables passing non variadic vector type parameters on the caller and callee side and vector return on AIX that are passed in vector registers only.
So far, support is enabled for only the AIX extended Altivec ABI Calling convention.
Reviewed By: sfertile, DiggerLin
Differential Revision: https://reviews.llvm.org/D86476
If smax() is legal, this is likely to result in smaller codegen expansion for abs(x) than the xor(add,ashr) method.
This is also what PowerPC has been doing for its abs implementation, so it lets us get rid of a load of custom lowering code there (and which was never updated when they added smax lowering).
Alive2: https://alive2.llvm.org/ce/z/xRk3cD
Differential Revision: https://reviews.llvm.org/D92095
During reviewing https://reviews.llvm.org/D84419, @efriedma mentioned the gap between realigned stack pointer and origin stack pointer should be probed too whatever the alignment is. This patch fixes the issue for PPC64.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D88078
PowerPC has instruction ftsqrt/xstsqrtdp etc to do the input test for software square root.
LLVM now tests it with smallest normalized value using abs + setcc. We should add hook to
target that has test instructions.
Reviewed By: Spatel, Chen Zheng, Qiu Chao Fang
Differential Revision: https://reviews.llvm.org/D80706
This patch is the initial patch for support of the AIX extended vector ABI. The extended ABI treats vector registers V20-V31 as non-volatile and we add them as callee saved registers in this patch.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D88676
For now, we are using the GPR to pass the arguments/return value for fp128 on Power8,
which is incorrect. It should be VSR. The reason why we do it this way is that,
we are setting the fp128 as illegal which make LLVM try to emulate it with i128 on
Power8. So, we need to correct it as legal.
Reviewed By: Nemanjai
Differential Revision: https://reviews.llvm.org/D91527
Added support for the options mabi=vec-extabi and mabi=vec-default which are analogous to qvecnvol and qnovecnvol when using XL on AIX.
The extended Altivec ABI on AIX is enabled using mabi=vec-extabi in clang and vec-extabi in llc.
Reviewed By: Xiangling_L, DiggerLin
Differential Revision: https://reviews.llvm.org/D89684
When the operand to an (s/u)int_to_fp node is an illegally typed load we
cannot reuse the load address since we can not build a proper dependancy
chain. The legalized loads will use a different chain output then the
illegal load. If we reuse the load address then we will build a
conversion node that uses the chain of the illegal load and operations
which modify the memory address in the other dependancy chain can be
scheduled before the floating point load which feeds the conversion.
Differential Revision: https://reviews.llvm.org/D91265
New pseudo instructions GETtlsADDRPCREL and GETtlsldADDRPCREL are added for properly
setting REGMASK for tls_get_addr function when using PCRelative address.
Differential Revisien: https://reviews.llvm.org/D91420
Reviewed by: bsaleil
It is possible that we have different constants in different slots
of second vector double (float) of pow function. So, in this case
Exp->getSplatValue() will return nullptr. Here, I handle it properly.
Reviewed By: steven.zhang, PowerPC
Differential Revision: https://reviews.llvm.org/D91729
Support reserved [0-100] and non-reserved[101-65535] Clang/GNU init
priority values on AIX.
This patch maps Clang/GNU values into priority values used in sinit/sterm
functions. User can play with values and be able to get init to occur
before or after XL init and vice versa.
Differential Revision: https://reviews.llvm.org/D91272
Summary: We have the patterns to fold 2 RLWINMs before RA, while some RLWINM will be generated after RA, for example rGc4690b007743. If the RLWINM generated after RA followed by another RLWINM, we expect to perform the optimization too.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D89855
The default version only works if the returned node has a single
result. The X86 and PowerPC versions support multiple results
and allow a single result to be returned from a node with
multiple outputs. And allow a single result that is not result 0
of the node.
Also replace the Mips version since the new version should work
for it. The original version handled multiple results, but only
if the new node and original node had the same number of results.
Differential Revision: https://reviews.llvm.org/D91846
Clang generates a '%' prefix for some registers in CFI directives. E.g.
".cfi_register lr, r12" becomes ".cfi_register lr, %r12" after
processing.
Differential Revision: https://reviews.llvm.org/D91735
In some situations, the compiler may insert an accumulator prime instruction and
an accumulator unprime instruction with no use of that accumulator between the two.
That's for example the case when we store an accumulator after assembling it or
restoring it. This patch adds a peephole to remove these prime and unprime instructions.
Differential Revision: https://reviews.llvm.org/D91386
This patch factors out the part of printInstruction that gets the
mnemonic string for a given MCInst. This is intended to be used
subsequently for the instruction-mix remarks to display the final
mnemonic (D90040).
Unfortunately making `getMnemonic` available to the AsmPrinter
seems to require making it virtual. Not sure if there's a way around
that with the current layering of the AsmPrinters.
Reviewed By: Paul-C-Anagnostopoulos
Differential Revision: https://reviews.llvm.org/D90039
No longer rely on an external tool to build the llvm component layout.
Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.
These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.
Differential Revision: https://reviews.llvm.org/D90848
Summary: This patch try to do the following transformation if the multiplier doen't fit int16:
(mul X, c1 << c2) -> (rldicr (mulli X, c1) c2)
Reviewed By: jsji, steven.zhang
Differential Revision: https://reviews.llvm.org/D87384
Summary: This patch depends on D89846. We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINM will be generated after RA, for example rGc4690b007743. If the RLWINM generated after RA followed by another RLWINM, we expect to perform the optimization after RA, too.
Reviewed By: shchenz, steven.zhang
Differential Revision: https://reviews.llvm.org/D89855
Summary: We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINM will be generated after RA, for example D88274. If the RLWINM generated after RA followed by another RLWINM, we expect to perform the optimization after RA, too.
This is a NFC patch to move the folding patterns to PPCInstrInfo, and the follow-up works will be calling it in pre-emit-peephole and expand the patterns to handle more cases.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D89846
Vector types, quadword integers and f128 currently cannot be handled in
FastISel. We did not skip f128 type in lowering arguments, which causes
a crash. This patch will fix it.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D90206
This reverts the revert commit 408c4408fa.
This version of the patch includes a fix for a crash caused by
treating ICmp/FCmp constant expressions as instructions.
Original message:
On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.
This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.
This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.
I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
Variable InnerIsSel references FalseRes, while FalseRes might be
zext/sext. So InnerIsSel should reference SetOrSelCC, otherwise a crash
will happen.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D90142
On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.
This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.
This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.
I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
Reviewed By: dmgreen, RKSimon
Differential Revision: https://reviews.llvm.org/D90070
When converting a BUILD_VECTOR or VECTOR_SHUFFLE to a splatting load
as of 1461fb6e78, we inaccurately check
for a single user of the load and neglect to update the users of the
output chain of the original load. As a result, we can emit a new
load when the original load is kept and the new load can be reordered
after a dependent store. This patch fixes those two issues.
Fixes https://bugs.llvm.org/show_bug.cgi?id=47891
Turn on TLS support for PCRel by default and update the test cases.
Differential Revision: https://reviews.llvm.org/D88738
Reviewed by: stefanp, kamaub
This patch implements the set boolean condition instructions introduced in
POWER10.
The set boolean condition instructions (set[n]bc[r]) are used during
the following situations:
- sign/zero/any extending i1 to an i32 or i64,
- reg+reg, reg+imm or floating point comparisons being sign/zero extended to i32 or i64,
- spilling CR bits (using the setnbc instruction)
Differential Revision: https://reviews.llvm.org/D87705
In this patch, Predicates fix added for the following:
* disable prefix-instrs will disable pcrelative-memops
* set two predicates PairedVectorMemops and PrefixInstrs for PLXVP/PSTXVP definitions
Differential Revision: https://reviews.llvm.org/D89727
Reviewed by: amyk, steven.zhang
Some instructions may be removable through processes such as IfConversion,
however DefinesPredicate can not be made aware of when this should be considered.
This parameter allows DefinesPredicate to distinguish these removable instructions
on a per-call basis, allowing for more fine-grained control from processes like
ifConversion.
Renames DefinesPredicate to ClobbersPredicate, to better reflect it's purpose
Differential Revision: https://reviews.llvm.org/D88494
MULH is often expanded on targets.
This patch removes the isMulhCheaperThanMulShift hook and uses
isOperationLegalOrCustom instead.
Differential Revision: https://reviews.llvm.org/D80485
In most of lib/Target we know that we are not dealing with scalable
types so it's perfectly fine to replace TypeSize comparison operators
with their fixed width equivalents, making use of getFixedSize()
and so on.
Differential Revision: https://reviews.llvm.org/D89101
This patch adds support for assemble disassemble intrinsics
for MMA.
Reviewed By: bsaleil, #powerpc
Differential Revision: https://reviews.llvm.org/D88739
Summary: This patch is derived from D87384.
In this patch we expand the existing decomposition of mul-by-constant to be more general by implementing 2 patterns:
```
mul x, (2^N + 2^M) --> (add (shl x, N), (shl x, M))
mul x, (2^N - 2^M) --> (sub (shl x, N), (shl x, M))
```
The conversion will be trigged if the multiplier is a big constant that the target can't use a single multiplication instruction to handle. This is controlled by the hook `decomposeMulByConstant`.
More over, the conversion benefits from an ILP improvement since the instructions are independent. A case with the sequence like following also gets benefit since a shift instruction is saved.
```
*res1 = a * 0x8800;
*res2 = a * 0x8080;
```
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D88201
SUMMARY:
In IBM compiler xlclang , there is an option -fnovisibility which suppresses visibility. For more details see: https://www.ibm.com/support/knowledgecenter/SSGH3R_16.1.0/com.ibm.xlcpp161.aix.doc/compiler_ref/opt_visibility.html.
We need to add the option -mignore-xcoff-visibility for compatibility with the IBM AIX OS (as the option is enabled by default in AIX). With this option llvm does not emit any visibility attribute to ASM or XCOFF object file.
The option only work on the AIX OS, for other non-AIX OS using the option will report an unsupported options error.
In AIX OS:
1.1 the option -mignore-xcoff-visibility is enabled by default , if there is not -fvisibility=* and -mignore-xcoff-visibility explicitly in the clang command .
1.2 if there is -fvisibility=* explicitly but not -mignore-xcoff-visibility explicitly in the clang command. it will generate visibility attributes.
1.3 if there are both -fvisibility=* and -mignore-xcoff-visibility explicitly in the clang command. The option "-mignore-xcoff-visibility" wins , it do not emit the visibility attribute.
The option -mignore-xcoff-visibility has no effect on visibility attribute when compile with -emit-llvm option to generated LLVM IR.
Reviewer: daltenty,Jason Liu
Differential Revision: https://reviews.llvm.org/D87451
getNode handling for ISD:SETCC calls FoldSETCC which can canonicalize
FP constants to the RHS. When this happens we should create the node
with the FMF that was requested. By using FlagInserter when can ensure
any calls to getNode/getSetcc during canonicalization will also get the flags.
Differential Revision: https://reviews.llvm.org/D88063
Summary: This patch implements the builtins for xvtdivdp, xvtdivsp, xvtsqrtdp, xvtsqrtsp.
The instructions correspond to the following builtins:
int vec_test_swdiv(vector double v1, vector double v2);
int vec_test_swdivs(vector float v1, vector float v2);
int vec_test_swsqrt(vector double v1);
int vec_test_swsqrts(vector float v1);
This patch depends on D88274, which fixes the bug in copying from CRRC to GPRC/G8RC.
Reviewed By: steven.zhang, amyk
Differential Revision: https://reviews.llvm.org/D88278
Summary: How we copying the CRRC to GRC is using a single MFOCRF to copy the contents of CR field n (CR bits 4×n+32:4×n+35) into bits 4×n+32:4×n+35 of register GRC. That’s not correct because we expect the value of destination register equals to source so we have to put the the contents of CR field in the lowest 4 bits. This patch adds a RLWINM after MFOCRF to achieve that.
The problem came up when adding builtins for xvtdivdp, xvtdivsp, xvtsqrtdp, xvtsqrtsp, as posted in D88278. We need to move the outputs (in CR register) to GRC. However outputs of these instructions may not in a fixed CR# register, so we can’t directly add a rotation instruction in the .td patterns, but need to wait until the CR register is determined. Then we confirmed this should be a bug in POST-RA PSEUDO PASS.
Reviewed By: nemanjai, shchenz
Differential Revision: https://reviews.llvm.org/D88274
Summary:
Some design decision worth noting about:
I've noticed a recent mailing discussing about why string literal is
not affected by -fdata-sections for ELF target:
http://lists.llvm.org/pipermail/llvm-dev/2020-September/145121.html
But on AIX, our linker could not split the mergeable string like other target.
So I think it would make more sense for us to emit separate csect for
every mergeable string in -fdata-sections mode,
as there might not be other ways for linker to do garbage collection
on unused mergeable string.
Reviewed By: daltenty, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D88339
This patch adds outer product instructions for MMA, including related infrastructure, and their tests.
Depends on D84968.
Reviewed By: #powerpc, bsaleil, amyk
Differential Revision: https://reviews.llvm.org/D88043
It looks like in some circumstances when compiling with `-mcpu=pwr9` we create an EXTSWSLI node when which causes llc to fail. No such error occurs in pwr8 or lower.
This occurs in 32BIT AIX and BE Linux. the cause seems to be that the default return in combineSHL is to create an EXTSWSLI node. Adding a check for whether we are in PPC64 before that fixes the issue.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D87046
After removal of Darwin as a PowerPC subtarget, the VRSAVE
save/restore/spill/update code is no longer needed by any supported
subtarget, so remove it while keeping support for vrsave and related instruction
aliases for inline asm. I've pre-commited tests to document the existing vrsave
handling in relation to @llvm.eh.unwind.init and inline asm usage, as
well as a test which shows a beahviour change on AIX related to
returning vector type as we were wrongly emiting VRSAVE_UPDATE on AIX.
This patch legalizes the v256i1 and v512i1 types that will be used for MMA.
It implements loads and stores of these types.
v256i1 is a pair of VSX registers, so for this type, we load/store the two
underlying registers. v512i1 is used for MMA accumulators. So in addition to
loading and storing the 4 associated VSX registers, we generate instructions to
prime (copy the VSX registers to the accumulator) after loading and unprime
(copy the accumulator back to the VSX registers) before storing.
This patch also adds the UACC register class that is necessary to implement the
loads and stores. This class represents accumulator in their unprimed form and
allow the distinction between primed and unprimed accumulators to avoid invalid
copies of the VSX registers associated with primed accumulators.
Differential Revision: https://reviews.llvm.org/D84968
According to POWER ISA, floating point instructions altering exception
bits in FPSCR should be 'may raise FP exception'. (excluding those
read or write the whole FPSCR directly, like mffs/mtfsf) We need to
model FPSCR well in future patches to handle the special case properly.
Instructions added mayRaiseFPException:
- fre(s)/frsqrte(s)
- fmadd(s)/fmsub(s)/fnmadd(s)/fnmsub(s)
- xscmpoqp/xscmpuqp/xscmpeqdp/xscmpgedp/xscmpgtdp
- xscvdphp/xscvhpdp/xvcvhpsp/xvcvsphp/xsrqpxp
- xsmaxcdp/xsincdp/xsmaxjdp/xsminjdp
Instructions removed mayRaiseFPException:
- xstdivdp/xvtdiv(d|s)p/xstsqrtdp/xvtsqrt(d|s)p
- xsabsdp/xsnabsdp/xvabs(d|s)p/xvnabs(d|s)p
- xsnegdp/xscpsgndp/xvneg(d|s)p/xvcpsgn(d|s)p
- xvcvsxwdp/xvcvuxwdp
- xscvdpspn/xscvspdpn
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D87738
This patch adds the xxmfacc, xxmtacc and xxsetaccz instructions to manipulate
accumulator registers. It also adds the ACC register class definition for the
accumulator registers.
Differential Revision: https://reviews.llvm.org/D84847
This patch implements the vec_[all|any]_[eq | ne | lt | gt | le | ge] builtins for vector signed/unsigned __int128.
Differential Revision: https://reviews.llvm.org/D87910
This patch is the initial support for the Local Dynamic Thread Local Storage
model to produce code sequence and relocation correct to the ABI for the model
when using PC relative memory operations.
Differential Revision: https://reviews.llvm.org/D87721
This patch implements the vector string isolate (predicate and non-predicate
versions) builtins. The predicate builtins are custom selected within PPCISelDAGToDAG.
Differential Revision: https://reviews.llvm.org/D87671
This patch implements the 128-bit vector divide extended builtins in Clang/LLVM.
These builtins map to the vdivesq and vdiveuq instructions respectively.
Differential Revision: https://reviews.llvm.org/D87729
Stop combining loads and stores with PPCISD::ADD_TLS before we can merge the
node with with TLS_LOCAL_EXEC_MAT_ADDR. The issue is that
TLS_LOCAL_EXEC_MAT_ADDR cannot be selected by itself and requires the previous
ADD_TLS node that goes with it. However, we sometimes try to combine ADD_TLS
with loads and stores that come after it. If this happens then the ADD_TLS is
removed and TLS_LOCAL_EXEC_MAT_ADDR cannot be selected.
While this bug fix will address the issue it my not be ideal from a performance
perspective as we may be able to add patterns to combine TLS_LOCAL_EXEC_MAT_ADDR
with ADD_TLS with the load and store that comes after it all in one. However,
this is beyond the scope of this patch.
Reviewed By: NeHuang
Differential Revision: https://reviews.llvm.org/D88030
Changes TTI function getIntImmCostInst to take an additional Instruction parameter,
which enables us to be able to check it is part of a min(max())/max(min()) pattern that will match SSAT.
We can then mark the constant used as free to prevent it being hoisted so SSAT can still be generated.
Required minor changes in some non-ARM backends to allow for the optional parameter to be included.
Differential Revision: https://reviews.llvm.org/D87457
This patch adds support for the lxvp, lxvpx, plxvp, stxvp, stxvpx and pstxvp
instructions in the PowerPC backend. These instructions allow loading and
storing VSX register pairs. This patch also adds the VSRp register class
definition needed for these instructions.
Differential Revision: https://reviews.llvm.org/D84359
This is a follow-up of D86605. For strict DAG FP node, if its FP
exception behavior metadata is ignore, it should have nofpexcept flag.
But during custom lowering, this flag isn't passed down.
This is also seen on X86 target.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D87390
This patch implements the vec_gen[b|h|w|d|q]m function prototypes in altivec.h
in order to utilize the move to VSR with mask instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82725
This patch adds the instruction definitions and assembly/disassembly tests for
the set boolean condition instructions. This also includes the negative, and
reverse variants of the instruction.
Differential Revision: https://reviews.llvm.org/D86252
This patch implements the vec_cntm function prototypes in altivec.h in order to
utilize the vector count mask bits instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82726
llc would crash for (store (fptosi-f128-i32)) when -mcpu=pwr8, we should
not generate FP_TO_(S|U)INT_IN_VSR for f128 types at this time. This
patch fixes it.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D86686