llvm-project

Commit Graph

Author	SHA1	Message	Date
jasonliu	7049fbf960	[XCOFF] Handle the case when personality routine is an alias Summary: Personality routine could be an alias to another personality routine. Fix the situation when we compile the file that contains the personality routine and the file also has functions that need to refer to the personality routine. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D101401	2021-04-29 22:03:30 +00:00
Victor Huang	ae3377c553	[AIX][TLS] Add ASM portion changes to support TLSGD relocations to XCOFF objects - Add new variantKinds for the symbol's variable offset and region handle - Print the proper relocation specifier @gd in the asm streamer when emitting the TC Entry for the variable offset for the symbol - Fix the switch section failure between the TC Entry of variable offset and region handle - Put .__tls_get_addr symbol in the ProgramCodeSects with XTY_ER property Reviewed by: sfertile Differential Revision: https://reviews.llvm.org/D100956	2021-04-29 13:18:59 -05:00
Qiu Chaofan	56d923efdb	[SPE] Support constrained float operations on SPE This patch enables support on SPE for constrained arithmetic and comparison operations. This fixes bugzilla 50070. One thing not covered is fcmp vs. fcmps on SPE. Some condition codes generate signaling comparisons while others do not. In this patch, all are considered signaling. So there might still be some issues when compiling from C code. Reviewed By: jhibbits Differential Revision: https://reviews.llvm.org/D101282	2021-04-29 16:34:10 +08:00
Qiu Chaofan	d5c2492455	[PowerPC] Fix SELECT_CC with i64 operand on PPC32 This patch fixes the infinite loop in legalization of PPC32 SELECT_CC with 64-bit operand.	2021-04-28 17:48:33 +08:00
Victor Huang	241c2da406	[AIX][Power10] Restrict prefixed instructions from crossing the 64byte boundary This patch adds the support to restrict prefixed instruction from crossing the 64 byte boundary: - Add the infrastructure to register a custom XCOFF streamer - Add a custom XCOFF streamer for PowerPC to allow us to intercept instructions as they are being emitted and align all 8 byte instructions to a 64 byte boundary if required by adding a 4 byte nop. Reviewed By: stefanp Differential Revision: https://reviews.llvm.org/D101107	2021-04-27 11:55:18 -05:00
Zarko Todorovski	f818ec9dd1	[AIX] Allow safe for 32bit P9 VSX extract and insert pattern matches In https://reviews.llvm.org/D92789 PPC64 checks were added that disallowed most VSX pattern matching. We enable some safe ones for 32bit in this patch. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D97503	2021-04-27 07:27:43 -04:00
Chen Zheng	e5000eef81	[XCOFF] make .file directive have directory info The .file directive is changed to only have basename in D36018 for ELF. But on AIX, we require the .file directive to also contain the directory info. This aligns with other AIX compiler like XLC and is required by some AIX tool like DBX. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D99785	2021-04-27 00:15:23 -04:00
Nemanja Ivanovic	6725b90a02	[PowerPC] Add vec_ctsl and vec_ctul to altivec.h These are added for compatibility with XLC. They are similar to vec_cts and vec_ctu except that the result is a doubleword vector regardless of the parameter type.	2021-04-23 11:03:38 -05:00
Nemanja Ivanovic	092619cf6b	[PowerPC] Improve codegen for vector fp to int widening conversions We currently do not utilize instructions that convert single precision vectors to doubleword integer vectors. These conversions come up in code occasionally and this improvement allows us to open code some functions that need to be added to altivec.h.	2021-04-22 05:04:06 -05:00
Nemanja Ivanovic	03e7fefff8	[PowerPC] Canonicalize shuffles on big endian targets as well Extend shuffle canonicalization and conversion of shuffles fed by vectorized scalars to big endian subtargets. For big endian subtargets, loads and direct moves of scalars into vector registers put the data in the correct element for SCALAR_TO_VECTOR if the data type is 8 bytes wide. However, if the data type is narrower, the value still ends up in the wrong place - although a different wrong place than on little endian targets. This patch extends the combine that keeps values where they are if they feed a shuffle to big endian targets. Differential revision: https://reviews.llvm.org/D100478	2021-04-20 07:29:47 -05:00
Qiu Chaofan	2432d80d3b	[PowerPC] Use mtvsrdd to put callee-saved GPR into VSR This patch exploits mtvsrdd instruction (available in ISA3.0+) to save two callee-saved GPR registers into a single VSR, making it more efficient. Reviewed By: jsji, nemanjai Differential Revision: https://reviews.llvm.org/D62565	2021-04-20 16:43:24 +08:00
Qiu Chaofan	b820339752	[PowerPC] Support f128 under VSX This patch is the last one in backend to support fp128 type in pre-POWER9 subtargets with VSX, removing temporary option and updating remaining tests. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D92374	2021-04-20 15:49:52 +08:00
Nemanja Ivanovic	ff769dd111	[PowerPC] Minor improvement for insert_vector_elt codegen For v2f64, all VSX subtargets can insert an element with a single XXPERMDI.	2021-04-16 18:52:37 -05:00
Zarko Todorovski	6b7838b68c	[AIX] Allow safe for 32bit P8 VSX pattern matching Pull some of the safe for 32bit pattern matching for Pwr8 and above. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D97909	2021-04-14 08:12:48 -04:00
Nemanja Ivanovic	0148bf53f0	[PowerPC] Use correct node to get a super register from a subreg The VSX tablegen file has some rather egregious uses of COPY_TO_REGCLASS even in situations where it needs to use SUBREG_TO_REG. While this produces correct code, it often doesn't allow the register coalescer to coalesce copies and the resulting code ends up being suboptimal. This patch just changes over patterns that should use SUBREG_TO_REG.	2021-04-13 19:52:21 -05:00
Chen Zheng	80aa9b0f7b	[PowerPC] stop reverse mem op generation for some cases. We should consider the feeder user number when we do reverse memory operation transformation. Otherwise, we may get negative impact. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D100166	2021-04-12 22:41:28 -04:00
Qiu Chaofan	ece7345859	[PowerPC] Lower f128 SETCC/SELECT_CC as libcall if p9vector disabled XSCMPUQP is not available for pre-P9 subtargets. This patch will lower them into libcall for correct behavior on power7/power8. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D92083	2021-04-12 10:33:32 +08:00
Stefan Pintilie	5bca7cdafb	Add correct types to the xxsplti32dx pattern. Register types for xxsplti32dx in two td file patterns were incorrect. Fixed the two types and added a test case that was reduced from a larger failing test. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D100223	2021-04-09 14:11:34 -05:00
Thomas Preud'homme	0494b6b676	[PowerPC, test] Fix use of undef FileCheck var LLVM test CodeGen/PowerPC/ctrloops-softfloat.ll tries to check for the absence of sequences of instructions with several CHECK-NOT with one of those directives using a variable defined in another. However CHECK-NOT are checked independently so that is using a variable defined in a pattern that should not occur in the input. This commit replaces the occurrence of the variable with the regex used in its definition, thereby making each CHECK-NOT independent. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D99881	2021-04-09 12:55:02 +01:00
Thomas Preud'homme	bb69173ae5	[PowerPC, test] Fix use of undef FileCheck var Commit `6ad3d05b68` disables the definition of CSR that a follow-up CHECK-NOT directive depends on. This commit replaces the undefined CSR variable use by the regex used to define it. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D99870	2021-04-09 12:54:35 +01:00
Thomas Preud'homme	494ba60bb7	[PowerPC, test] Fix use of undef FileCheck var Commit `6646033e6e` removed the definition of variable RESULT used in two CHECK-NOT directives in LLVM test CodeGen/PowerPC/ppc64-i128-abi.ll. This commit replaces the uses by the regex that was used to define that variable. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D99868	2021-04-09 12:09:48 +01:00
Chen Zheng	4b54345e47	[NFC][PowerPC] add test cases for reverse memory op transformation	2021-04-09 03:38:39 -04:00
Chen Zheng	74e77295e7	[PowerPC] fixup killed flags for ri + addi to ri transformation Fixup killed flags if DefMI and MI are not in the same basic blocks. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D100023	2021-04-07 22:04:08 -04:00
Thomas Preud'homme	73a7d451a2	[PowerPC, test] Fix use of undef FileCheck var LLVM test CodeGen/PowerPC/ppc-disable-non-volatile-cr.ll tries to check for the absence of a sequence of instructions with several CHECK-NOT with one of those directives using a variable defined in another. However CHECK-NOT are checked independently so that is using a variable defined in a pattern that should not occur in the input. This commit replaces the occurrence of the variable with the regex used in its definition, thereby making each CHECK-NOT independent. Reviewed By: NeHuang, nemanjai Differential Revision: https://reviews.llvm.org/D99880	2021-04-07 09:45:21 +01:00
Qiu Chaofan	033c9c2552	[PowerPC] Fix use check of swap-reduction This will fix swap-reduction in DAGISel for cases where COPY_TO_REGCLASS has multiple uses.	2021-04-07 15:55:52 +08:00
Amy Kwan	bd6033eca7	[PowerPC] Materialize 34-bit constants with pli directly Previously, 34-bit constants were materialized in selectI64Imm(), and we relied on td pattern matching to instead produce a pli. This becomes problematic as there is no guarantee that the 34-bit constant will reach the td pattern selection for pli. It is also possible for other transformations (such as complex bit permutations) to also produce and utilize the 34-bit constant materialized through selectI64Imm(). This patch instead produces pli on Power10 directly whenever the constant fits within 34-bits. Differential Revision: https://reviews.llvm.org/D99906	2021-04-06 13:38:11 -05:00
Victor Huang	f98567b3fe	[AIX][TLS] Add support for TLS variables to XCOFF object writer This patch adds support for TLS variables to the XCOFF object writer: - Add TData and TBSS sections - Add CsectGroups for the mapping classes XCOFF::XMC_TL and XCOFF::XMC_UL - Add XMC_UL in the enum entry of CsectStorageMapping class to print the string while reading the symbol properties for TLS variables - Fix the starting address of TData and TBSS sections Reviewed by: hubert.reinterpretcast, DiggerLin Differential Revision: https://reviews.llvm.org/D98946	2021-04-06 10:46:07 -05:00
Simon Pilgrim	0ba0a7315c	[PPC] Regenerate PR27078 test checks	2021-04-01 18:11:46 +01:00
Simonas Kazlauskas	777a58e05b	Support {S,U}REMEqFold before legalization This allows these optimisations to apply to e.g. `urem i16` directly before `urem` is promoted to i32 on architectures where i16 operations are not intrinsically legal (such as on Aarch64). The legalization then later can happen more directly and generated code gets a chance to avoid wasting time on computing results in types wider than necessary, in the end. Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D88785	2021-04-01 01:35:41 +03:00
Shimin Cui	00c0c8c87d	[PowerPC] [MLICM] Enable hoisting of caller preserved registers on AIX On ppc64 Linux, MachineLICM will hoist caller preserved registers, including TOC loads of the global variable address, out of loops. This is to enable this on AIX for both ppc64 and ppc32. Differential Revision: https://reviews.llvm.org/D99076	2021-03-31 12:46:25 -04:00
Bradley Smith	9745dce8c3	[SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps This is currently performed in SelectionDAGLegalize, here we make it also happen in LegalizeVectorOps, allowing a target to lower the SETCC condition codes first in LegalizeVectorOps and then lower to a custom node afterwards, without having to duplicate all of the SETCC condition legalization in the target specific lowering. As a result of this, fixed length floating point SETCC nodes can now be properly lowered for SVE. Differential Revision: https://reviews.llvm.org/D98939	2021-03-29 15:32:25 +01:00
Sanjay Patel	ad8010e598	[PowerPC] auto-generate complete testchecks; NFC The full checks demonstrate a problem that comes up in: https://llvm.org/PR49610	2021-03-25 15:52:39 -04:00
Albion Fung	e29bb074c6	[PowerPC] Exploit xxsplti32dx (constant materialization) for scalars This patch exploits the xxsplti32dx instruction available on Power10 in place of constant pool loads where xxspltidp would not be able to, usually because the immediate cannot fit into 32 bits. Differential Revision: https://reviews.llvm.org/D95458	2021-03-24 15:59:59 -04:00
Stefan Pintilie	91f4c11133	[PowerPC] Add mprivileged option Add an option to tell the compiler that it can use privileged instructions. This patch only adds the option. Backend implementation will be added in a future patch. Reviewed By: lei, amyk Differential Revision: https://reviews.llvm.org/D99193	2021-03-24 08:33:22 -05:00
Stefan Pintilie	0e4f5f3ea6	[PowerPC] Change option to mrop-protect In order to have the same option on power PC LLVM and power PC gcc the option will be changed from -mrop-protection to -mrop-protect. The feature will be off by default and turned on when the option is used. Reviewed By: lei, amyk Differential Revision: https://reviews.llvm.org/D99185	2021-03-24 05:51:35 -05:00
Qiu Chaofan	52f33f7953	[PowerPC] Enable redundant TOC save removal on AIX Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D97039	2021-03-22 14:29:22 +08:00
Simon Pilgrim	9d2df96407	[DAG] computeKnownBits - add ISD::MULHS/MULHU/SMUL_LOHI/UMUL_LOHI handling Reuse the existing KnownBits multiplication code to handle the 'extend + multiply + extract high bits' pattern for multiply-high ops. Noticed while looking at the codegen for D88785 / D98587 - the patch helps division-by-constant expansion code in particular, which suggests that we might have some further KnownBits div/rem cases we could handle - but this was far easier to implement. Differential Revision: https://reviews.llvm.org/D98857	2021-03-19 16:02:31 +00:00
Nemanja Ivanovic	a8697c57fa	[PowerPC] Fix the check for 16-bit signed field in peephole When a D-Form instruction is fed by an add-immediate, we attempt to merge the two immediates to form a single displacement so we can remove the add-immediate. However, we don't check whether the new displacement fits into a 16-bit signed immediate field early enough. Namely, we do a sign-extend from 16 bits first which will discard high bits and then we check whether the result is a 16-bit signed immediate. It of course will always be. Move the check prior to the sign extend to ensure we are checking the correct value. Fixes https://bugs.llvm.org/show_bug.cgi?id=49640	2021-03-19 07:15:53 -05:00
edwin-wang	fd302e21b3	[NFC] [XCOFF] Update PowerPC readobj test case with expression This patch is to replace the fixed value with expression. Keep .file section as fixed values as it might be changed. The remaining sections will hardly be modified. So the Index values are sequential. By using expression, we can avoid the fixed value changes in coming patches. This is a follow-up of patch D97117. Reviewed By: hubert.reinterpretcast, shchenz Differential Revision: https://reviews.llvm.org/D98620	2021-03-17 16:02:50 +08:00
Amy Kwan	e582c073d1	[NFC][PowerPC] Add additional load/store test cases This patch adds additional load/store test cases involving scalars, vectors, and PC-Rel in preparation for the refactored load and store implementation introduced in D93370. Differential Revision: https://reviews.llvm.org/D97391	2021-03-15 08:54:38 -05:00
Roman Lebedev	78b8ce40ef	Reland [SCEV] Improve modelling for (null) pointer constants This reverts commit `329aeb5db4`, and relands commit `61f006ac65`. This is a continuation of D89456. As it was suggested there, now that SCEV models `PtrToInt`, we can try to improve SCEV's pointer handling. In particular, i believe, i will need this in the future to further fix `SCEVAddExpr`operation type handling. This removes special handling of `ConstantPointerNull` from `ScalarEvolution::createSCEV()`, and add constant folding into `ScalarEvolution::getPtrToIntExpr()`. This way, `null` constants stay as such in SCEV's, but gracefully become zero integers when asked. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D98147	2021-03-13 16:05:34 +03:00
Roman Lebedev	329aeb5db4	Temporarily revert "[SCEV] Improve modelling for (null) pointer constants" This appears to have broken ubsan bot: https://lab.llvm.org/buildbot/#/builders/85/builds/3062 https://reviews.llvm.org/D98147#2623549 It looks like LSR needs some kind of a change around insertion point handling. Reverting until i have a fix. This reverts commit `61f006ac65`.	2021-03-13 09:10:28 +03:00
Roman Lebedev	61f006ac65	[SCEV] Improve modelling for (null) pointer constants This is a continuation of D89456. As it was suggested there, now that SCEV models `PtrToInt`, we can try to improve SCEV's pointer handling. In particular, i believe, i will need this in the future to further fix `SCEVAddExpr`operation type handling. This removes special handling of `ConstantPointerNull` from `ScalarEvolution::createSCEV()`, and add constant folding into `ScalarEvolution::getPtrToIntExpr()`. This way, `null` constants stay as such in SCEV's, but gracefully become zero integers when asked. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D98147	2021-03-12 22:11:58 +03:00
Simonas Kazlauskas	a2eca31da2	Test cases for rem-seteq fold with illegal types This also briefly tests a larger set of architectures than the more exhaustive functionality tests for AArch64 and x86. As requested in D88785 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D98339	2021-03-12 16:28:04 +02:00
Stefan Pintilie	e021de0aab	[PowerPC] Exploit paddi instruction on Power 10 for constant materialization Starting with Power 10 the instruction paddi is available to use. The instruction allows for immediates that are 34 bits. This patch adds exploitation of the paddi instruction to allow us to materialize constants. Reviewed By: lei, amyk Differential Revision: https://reviews.llvm.org/D93300	2021-03-11 08:37:49 -06:00
Qiu Chaofan	72c4cbd60e	[PowerPC] Fix multi-use case for swap reduction `4c973ae` implemented reduction of vector swap for lane-insensitive operations. This commit fixes it for checking number of uses of the vector operation.	2021-03-11 21:58:33 +08:00
Nikita Popov	2489cbaa80	[PowerPC] Fix infinite loop in peephole CR optimization (PR49509) If we encounter a degenerate select node where both operands are the same, then we can continue negating the condition while swapping operands, resulting in an infinite loop. Avoid this by bailing out if both operands are the same. Fixes https://bugs.llvm.org/show_bug.cgi?id=49509. Differential Revision: https://reviews.llvm.org/D98340	2021-03-11 14:25:22 +01:00
Daniel Sanders	134a179dee	[mir] Change 'undef' for MMO base addresses to 'unknown-address' Differential Revision: https://reviews.llvm.org/D98100	2021-03-10 16:46:44 -08:00
Amy Kwan	8b540c542c	[PowerPC] Implement patterns for PC-Rel zextload/extload byte loads This patch adds patterns to select the PC-Relative extloadi1 and zextloadi1 byte loads. Differential Revision: https://reviews.llvm.org/D98042	2021-03-10 12:18:13 -06:00
Qiu Chaofan	e82a54ae87	[NFC] [PowerPC] Remove unsafe-fp-math in some tests As we're going to replace this ambiguous option with more precise instruction-level fast-math description, some tests need to be updated and the option doesn't play any role in some of them.	2021-03-10 17:27:21 +08:00
Qiu Chaofan	4c973ae51b	[PowerPC] Reduce symmetrical swaps for lane-insensitive vector ops This patch simplifies pattern (xxswap (vec-op (xxswap a) (xxswap b))) into (vec-op a b) if vec-op is lane-insensitive. The motivating case is ScalarToVector-VecOp-ExtractElement sequence on LE, but the peephole itself is not related to endianness, so BE may also benefit from this. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D97658	2021-03-10 15:21:32 +08:00
Albion Fung	9b6ac9e999	[P10] [Power PC] Exploiting new load rightmost vector element instructions. This pull request implements patterns to exploit the load rightmost vector element instructions for loading element 0 on little endian PowerPC subtargets into v8i16 and v16i8 vector registers for i16 and i8 data types. Differential Revision: https://reviews.llvm.org/D94816#inline-921403	2021-03-09 16:08:17 -05:00
Lei Huang	535a4192a9	[AIX][TLS] Generate 64-bit general-dynamic access code sequence Add support for the TLS general dynamic access model to assembly files on AIX 64-bit. Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D98078	2021-03-08 16:41:25 -06:00
Masoud Ataei	820f508b08	[PowerPC] Removing _massv place holder Since P8 is the oldest machine supported by MASSV pass, _massv place holder is removed and the oldest version of MASSV functions is assumed. If the P9 vector specific is detected in the compilation process, the P8 prefix will be updated to P9. Differential Revision: https://reviews.llvm.org/D98064	2021-03-08 21:43:24 +00:00
Nemanja Ivanovic	b0f0115308	[AIX][TLS] Generate 32-bit general-dynamic access code sequence Adds support for the TLS general dynamic access model to assembly files on AIX 32-bit. To generate the correct code sequence when accessing a TLS variable `v`, we first create two TOC entry nodes, one for the variable offset, one for the region handle. These nodes are followed by a `PPCISD::TLSGD_AIX` node (new node introduced by this patch). The `PPCISD::TLSGD_AIX` node (`TLSGDAIX` pseudo instruction) is expanded to 2 copies (to put the variable offset and region handle in the right registers) and a call to `__tls_get_addr`. This patch also changes the way TC entries are generated in asm files. If the generated TC entry is for the region handle of a TLS variable, we add the `@m` relocation and the `.` prefix to the entry name. For example: ``` L..C0: .tc .v[TC],v[TL]@m -> region handle L..C1: .tc v[TC],v[TL] -> variable offset ``` Reviewed By: nemanjai, sfertile Differential Revision: https://reviews.llvm.org/D97948	2021-03-08 09:30:19 -06:00
Ahsan Saghir	acce401068	[PowerPC] Change target data layout for 16-byte stack alignment This changes the target data layout to make stack align to 16 bytes on Power10. Before this change, stack was being aligned to 32 bytes. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D96265	2021-03-08 08:13:08 -06:00
Sanjay Patel	f75b5305f4	[ConstantFold] allow folding icmp of null and constexpr I noticed that we were not folding expressions like this: icmp ult (constexpr), null in https://llvm.org/PR49355, so we end up with extremely large icmp instructions as the constant expressions pile up on each other. There is no potential to mis-fold an unsigned boundary condition with a zero/null, so this is just falling through a crack in the pattern matching. The more general case of comparisons of non-zero constants and constexpr are more tricky and may require the datalayout to know how to cast to different types, etc. Negative tests verify that we are only changing a subset of potential patterns. Differential Revision: https://reviews.llvm.org/D98150	2021-03-08 08:53:59 -05:00
Sean Fertile	f0904a6208	[PowerPC][AIX] Handle variadic vector call operands. Patch adds support for passing vector call operands to variadic functions. Arguments which are fixed shadow GPRs and stack space even when they are passed in vector registers, while arguments passed through ellipses are passed in properly aligned GPRs if available and on the stack once all GPR arguments registers are consumed. Differential Revision: https://reviews.llvm.org/D97956	2021-03-06 13:49:55 -05:00
Zarko Todorovski	2b50ce1524	[PowerPC][AIX] Enable the default AltiVec ABI on AIX This patch adds support for the default AltiVec ABI for AIX. Vector registers 20 through 31 are marked as reserved and cannot be used in the default ABI. This patch adds handling for this case and also remove the default AltiVec ABI errors. Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D96351	2021-03-05 12:46:27 -05:00
Daniel Sanders	9fc2be6f28	[mir] Fix confusing MIR when MMO's value is nullptr but offset is non-zero :: (store 1 + 4, addrspace 1) -> :: (store 1 into undef + 4, addrspace 1) An offset without a base isn't terribly useful but it's convenient to update the offset without checking the value. For example, when breaking apart stores into smaller units Differential Revision: https://reviews.llvm.org/D97812	2021-03-04 10:34:30 -08:00
Sean Fertile	aaeffbe007	[PowerPC][AIX] Handle variadic vector formal arguments. Patch adds support for passing vector arguments to variadic functions. Arguments which are fixed shadow GPRs and stack space even when they are passed in vector registers, while arguments passed through ellipses are passed in properly aligned GPRs if available and on the stack once all GPR arguments registers are consumed. Differential Revision: https://reviews.llvm.org/D97485	2021-03-04 10:56:53 -05:00
Baptiste Saleil	54c0f520c7	[VirtRegRewriter] Insert missing killed flags when tracking subregister liveness VirtRegRewriter may sometimes fail to correctly apply the kill flag where necessary, which causes unnecessary code gen on PowerPC. This patch fixes the way masks for defined lanes are computed and the way mask for used lanes is computed. Contact albion.fung@ibm.com instead of author for problems related to this commit. Differential Revision: https://reviews.llvm.org/D92405	2021-03-03 12:02:04 -05:00
Qiu Chaofan	72d4a41ba6	[PowerPC] Allow spilling GPR to VSR on AIX This patch enables spilling GPR to VSRs instead of stack under AIX ABI. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D97367	2021-03-03 13:32:39 +08:00
Victor Huang	1756b2adc9	[AIX][TLS] Generate TLS variables in assembly files This patch allows generating TLS variables in assembly files on AIX. Initialized and external uninitialized variables are generated with the .csect pseudo-op and local uninitialized variables are generated with the .comm/.lcomm pseudo-ops. The patch also adds a check to explicitly say that TLS is not yet supported on AIX. Reviewed by: daltenty, jasonliu, lei, nemanjai, sfertile Originally patched by: bsaleil Commandeered by: NeHuang Differential Revision: https://reviews.llvm.org/D96184	2021-03-02 18:22:48 -06:00
Arthur Eubanks	040c1b49d7	Move EntryExitInstrumentation pass location This seems to be more of a Clang thing rather than a generic LLVM thing, so this moves it out of LLVM pipelines and as Clang extension hooks into LLVM pipelines. Move the post-inline EEInstrumentation out of the backend pipeline and into a late pass, similar to other sanitizer passes. It doesn't fit into the codegen pipeline. Also fix up EntryExitInstrumentation not running at -O0 under the new PM. PR49143 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D97608	2021-03-01 10:08:10 -08:00
Craig Topper	e745f7c563	[LegalizeTypes] Improve ExpandIntRes_XMULO codegen. The code previously used two BUILD_PAIRs to concatenate the two UMULO results with 0s in the lower bits to match original VT. Then it created an ADD and a UADDO with the original bit width. Each of those operations needs to be expanded since they have illegal types. Since we put 0s in the lower bits before the ADD, the lower half of the ADD result will be 0. So the lower half of the UADDO result is solely determined by the other operand. Since the UADDO needs to be split in half, we don't really need an operation for the lower bits. Unfortunately, we don't see that in type legalization and end up creating something more complicated and DAG combine or lowering aren't always able to recover it. This patch directly generates the narrower ADD and UADDO to avoid needing to legalize them. Now only the MUL is done on the original type. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97440	2021-03-01 09:54:32 -08:00
Masoud Ataei	5fe0cab79e	[PowerPC] Removing sqrtd2 and sqrtf4 from list of vectorizable function with MASSV Under -O3 and -Ofast, the MASSV conversion prevents the sqrt call to be inlined. Inline sqrt is faster than MASSV call on leppc. Differential Revision: https://reviews.llvm.org/D97487	2021-03-01 15:42:19 +00:00
Fangrui Song	47c5576d7d	ELF: Create unique SHF_GNU_RETAIN sections for llvm.used global objects If a global object is listed in `@llvm.used`, place it in a unique section with the `SHF_GNU_RETAIN` flag. The section is a GC root under `ld --gc-sections` with LLD>=13 or GNU ld>=2.36. For front ends which do not expect to see multiple sections of the same name, consider emitting `@llvm.compiler.used` instead of `@llvm.used`. SHF_GNU_RETAIN is restricted to ELFOSABI_GNU and ELFOSABI_FREEBSD in binutils. We don't do the restriction - see the rationale in D95749. The integrated assembler has supported SHF_GNU_RETAIN since D95730. GNU as>=2.36 supports section flag 'R'. We don't need to worry about GNU ld support because older GNU ld just ignores the unknown SHF_GNU_RETAIN. With this change, `__attribute__((retain))` functions/variables emitted by clang will get the SHF_GNU_RETAIN flag. Differential Revision: https://reviews.llvm.org/D97448	2021-02-26 16:38:44 -08:00
Zarko Todorovski	1c051b7b70	[NFC][AIX] Rename aix-csr-vector.ll to aix-csr-vector-extabi.ll	2021-02-24 22:12:01 -05:00
Chen Zheng	71a3986247	[XCOFF] add C_FILE symbol at index 0 of symbol table. This is for XCOFF DWARF support. Seems when DWARF debug is enabled, symbol 0 has special usage for AIX binder. At least, symbol 0 can not be the .text section. Otherwise, we get some binding time error. Add correct C_FILE symbol at index 0 here to make AIX binder work. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D97117	2021-02-23 22:21:56 -05:00
Sean Fertile	bb260b1ca7	[PowerPC][AIX] Add support for vector arg passing on the stack. Enable passing more vector arguments than available vector argument passing registers. Differential Revision: https://reviews.llvm.org/D96415	2021-02-18 13:32:40 -05:00
Baptiste Saleil	34dc1ccb96	[PowerPC] Exploit the vinsw, vinsd, and vins[wd][lr]x instructions on P10 This patch generates the vinsw, vinsd, vinsblx, vinshlx, vinswlx, vinsdlx, vinsbrx, vinshrx, vinswrx and vinsdrx instructions for vector insertion on P10. Differential Revision: https://reviews.llvm.org/D94454	2021-02-18 14:17:47 +00:00
Stefan Pintilie	b80357d46e	[PowerPC] Add option for ROP Protection Added -mrop-protection for Power PC to turn on codegen that provides some protection from ROP attacks. The option is off by default and can be turned on for Power 8, Power 9 and Power 10. This patch is for the option only. The feature will be implemented by a later patch. Reviewed By: amyk Differential Revision: https://reviews.llvm.org/D96512	2021-02-18 12:15:50 +00:00
Sidharth Baveja	cb2876800c	[PowerPC][AIX] Enable Shrinkwrapping on 32 and 64 bit AIX. Summary: Currently Shrinkwrap is not enabled on AIX. This patch enables shrink wrap on 32 and 64 bit AIX, and 64 bit ELF. Reviewed By: sfertile, nemanjai Differential Revision: https://reviews.llvm.org/D95094	2021-02-17 14:54:57 +00:00
Sean Fertile	4e127bce2d	[PowerPC] Handle FP physical register in inline asm constraint. Do not defer to the base class when the register constraint is a physical fpr. The base class will select SPILLTOVSRRC as the register class and register allocation will fail on subtargets without VSX registers. Differential Revision: https://reviews.llvm.org/D91629	2021-02-17 09:27:03 -05:00
Fangrui Song	39db16e75b	[test] Make ELF tests less reliant on the lexicographical order of non-local symbols	2021-02-13 01:01:06 -08:00
Craig Topper	5744502a13	[TargetLowering][RISCV][AArch64][PowerPC] Enable BuildUDIV/BuildSDIV on illegal types before type legalization if we can find a larger legal type that supports MUL. If we wait until the type is legalized, we'll lose information about the original type and need to use larger magic constants. This gets especially bad on RISCV64 where i64 is the only legal type. I've limited this to simple scalar types so it only works for i8/i16/i32 which are most likely to occur. For more odd types we might want to do a small promotion to a type where MULH is legal instead. Unfortunately, this does prevent some urem/srem+seteq matching since that still requires legal types. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D96210	2021-02-11 09:43:13 -08:00
Nemanja Ivanovic	a5222aa085	[DAGCombine] Do not remove masking argument to FP16_TO_FP for some targets As of commit `284f2bffc9`, the DAG Combiner gets rid of the masking of the input to this node if the mask only keeps the bottom 16 bits. This is because the underlying library function does not use the high order bits. However, on PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits from the register. Therefore, the library implementation of __gnu_h2f_ieee will return an incorrect result if the bits aren't cleared. This combine is desired for ARM (and possibly other targets) so this patch adds a query to Target Lowering to check if this zeroing needs to be kept. Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092 Differential revision: https://reviews.llvm.org/D96283	2021-02-09 06:33:48 -06:00
Stefan Pintilie	288f762b6f	[PowerPC] Materialize 34 bit constants with pli on Power 10. NOTE: This patch was originally written by Anil Mahmud. His code has been rebased but otherwise left mostly unchanged. A new instruction on Power 10 allows for the materialization of 34 bit immediate values. This patch allows the compiler to take advantage of the new instruction in this situation. Reviewed By: amyk Differential Revision: https://reviews.llvm.org/D92879	2021-02-02 09:49:22 -06:00
Albion Fung	2e470e03b4	[PowerPC][Power10] Fix XXSPLTI32DX not correctly exploiting specific cases Some cases may be transformed into 32 bit splats before hitting the boolean statement, which may cause incorrect behaviour and provide XXSPLTI32DX with the incorrect values of splat. The condition was reversed so that the shortcut prevents this problem. Differential Revision: https://reviews.llvm.org/D95634	2021-01-28 15:17:32 -05:00
Nemanja Ivanovic	54e570d94a	[PowerPC] Do not emit XXSPLTI32DX for sub 64-bit constants If the APInt returned by BuildVectorSDNode::isConstantSplat() is narrower than 64 bits, the result produced by XXSPLTI32DX is incorrect. The result returned by the function appears to be incorrect and we'll investigate/fix it in a follow-up commit. However, since this causes miscompiles, we must temporarily disable emitting this instruction for such values.	2021-01-28 04:16:48 -06:00
Nemanja Ivanovic	8018f731f0	[PowerPC] Do not emit HW loop with half precision operations If a loop has any operations on half precision values, there will be calls to library functions on Power8. Even on Power9, there is a small subset of instructions that are actually supported for the type. This patch disables HW loops whenever any operations on the type are found (other than the handful of supported ones when compiling for Power9). Fixes a few PRs opened by Julia: https://bugs.llvm.org/show_bug.cgi?id=48785 https://bugs.llvm.org/show_bug.cgi?id=48786 https://bugs.llvm.org/show_bug.cgi?id=48519 Differential revision: https://reviews.llvm.org/D94980	2021-01-25 20:55:56 -06:00
Nemanja Ivanovic	1150bfa6bb	[PowerPC] Add missing negate for VPERMXOR on little endian subtargets This intrinsic is supposed to have the permute control vector complemented on little endian systems (as the ABI specifies and GCC implements). With the current code gen, the result vector is byte-reversed. Differential revision: https://reviews.llvm.org/D95004	2021-01-25 12:23:33 -06:00
Chen Zheng	0ed4cf4bf3	[PowerPC] support register pressure reduction in machine combiner. Reassociating some patterns to generate more fma instructions to reduce register pressure. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D92071	2021-01-24 21:28:21 -05:00
Qiu Chaofan	449f2f7140	[PowerPC] Duplicate inherited heuristic from base scheduler PowerPC has its custom scheduler heuristic. It calls parent classes' tryCandidate in override version, but the function returns void, so this way doesn't actually help. This patch duplicates code from base scheduler into PPC machine scheduler class, which does what we wanted. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D94464	2021-01-22 10:11:03 +08:00
Albion Fung	719b563ecf	[PowerPC][Power10] Exploit splat instruction xxsplti32dx in Power10 Exploits the instruction xxsplti32dx. It can be used to materialize any 64 bit scalar/vector splat by using two instances, one for the upper 32 bits and the other for the lower 32 bits. It should not materialize the cases which can be materialized by using the instruction xxspltidp. Differential Revision: https://reviews.llvm.org/D90173	2021-01-20 12:55:52 -05:00
Victor Huang	909d6c86ea	[PowerPC] Fix the check for the instruction using FRSP/XSRSP output register When performing peephole optimization to simplify the code, after removing the passed FRSP/XSRSP instruction we set any uses of that FRSP/XSRSP to the source of the FRSP/XSRSP. We find the machine instruction using the virtual register holding the FRSP/XSRSP result by searching all following instructions, and hit an issue when the first use of the virtual register is a debug MI, causing: 1. the virtual register in the debug MI to be removed unexpectedly; 2. the virtual register used in the non-debug MI not to be replaced with the source of the FRSP/XSRSP, leaving it in an undef state. This patch fixes the issue by only searching non-debug machine instructions using the virtual register holding the FRSP/XSRSP result when the vr has only one non-debug use. Differential Revision: https://reviews.llvm.org/D94711 Reviewed by: nemanjai	2021-01-19 09:20:03 -06:00
Nemanja Ivanovic	61f69153e8	[PowerPC] Sign extend comparison operand for signed atomic comparisons As of `8dacca943a`, we sign extend the atomic loaded operand for signed subword comparisons. However, the assumption that the other operand is correctly sign extended doesn't always hold. This patch sign extends the other operand if it needs to be sign extended. This is a second fix for https://bugs.llvm.org/show_bug.cgi?id=30451 Differential revision: https://reviews.llvm.org/D94058	2021-01-18 21:19:25 -06:00
Sean Fertile	ead71a23ed	[PowerPC][AIX] Do not emit xxspltd mnemonic on AIX. A bug in the system assembler can assemble the xxspltd extended mnemonic into the wrong instruction (extracting the wrong element). Emit the full xxpermdi with all operands to work around the problem. Differential Revision: https://reviews.llvm.org/D94419	2021-01-18 09:25:31 -05:00
Tres Popp	3bd24574c7	Revert "[PowerPC] support register pressure reduction in machine combiner." This reverts commit `26a396c4ef`. See https://reviews.llvm.org/D92071 for a description of the issue.	2021-01-18 12:01:57 +01:00
Chen Zheng	26a396c4ef	[PowerPC] support register pressure reduction in machine combiner. Reassociating some patterns to generate more fma instructions to reduce register pressure. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D92071	2021-01-17 23:56:13 -05:00
Qiu Chaofan	f776d8b12f	[Legalizer] Promote result type in expanding FP_TO_XINT This patch promotes result integer type of FP_TO_XINT in expanding. So crash in conversion from ppc_fp128 to i1 will be fixed. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D92473	2021-01-18 11:56:11 +08:00
Qiu Chaofan	2d9890775f	[PowerPC] [NFC] Add AIX triple to some regression tests As part of the effort to improve AIX support, regression test coverage is missing quite a lot for the AIX subtarget. This patch adds the AIX triple to those tests that don't need extra changes, and we can cover more cases in following commits. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D94159	2021-01-18 11:44:00 +08:00
Bjorn Pettersson	4f15556731	[LegalizeDAG] Handle NeedInvert when expanding BR_CC This is a follow-up fix to commit `03c8d6a0c4`. Seems like we now end up with NeedInvert being set in the result from LegalizeSetCCCondCode more often than in the past, so we need to handle NeedInvert when expanding BR_CC. Not sure how to deal with the "Tmp4.getNode()" case properly, but current assumption is that that code path isn't impacted by the changes in `03c8d6a0c4` so we can simply move the old assert into the if-branch and only handle NeedInvert in the else-branch. I think that the test case added here, for PowerPC, might have failed also before commit `03c8d6a0c4`. But we started to hit the assert more often downstream when having merged that commit. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D94762	2021-01-16 14:33:19 +01:00
Esme-Yi	ff40fb07ad	[PowerPC] Try to fold sqrt/sdiv test results with the branch. Summary: The patch tries to fold the sqrt/sdiv test node, e.g. FTSQRT, XVTDIVDP, and the branch, i.e. br_cc, if they meet these patterns: (br_cc seteq, (truncateToi1 SWTestOp), 0) -> (BCC PRED_NU, SWTestOp) (br_cc seteq, (and SWTestOp, 2), 0) -> (BCC PRED_NE, SWTestOp) (br_cc seteq, (and SWTestOp, 4), 0) -> (BCC PRED_LE, SWTestOp) (br_cc seteq, (and SWTestOp, 8), 0) -> (BCC PRED_GE, SWTestOp) (br_cc setne, (truncateToi1 SWTestOp), 0) -> (BCC PRED_UN, SWTestOp) (br_cc setne, (and SWTestOp, 2), 0) -> (BCC PRED_EQ, SWTestOp) (br_cc setne, (and SWTestOp, 4), 0) -> (BCC PRED_GT, SWTestOp) (br_cc setne, (and SWTestOp, 8), 0) -> (BCC PRED_LT, SWTestOp) Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D94054	2021-01-14 02:15:19 +00:00
Craig Topper	03c8d6a0c4	[LegalizeDAG][RISCV][PowerPC][AMDGPU][WebAssembly] Improve expansion of SETONE/SETUEQ on targets without SETO/SETUO. If SETO/SETUO aren't legal, they'll be expanded and we'll end up with 3 comparisons. SETONE is equivalent to (SETOGT || SETOLT) so if one of those operations is supported use that expansion. We don't need both since we can commute the operands to make the other. SETUEQ can be implemented with !(SETOGT || SETOLT) or (SETULE && SETUGE). I've only implemented the first because it didn't look like most of the affected targets had legal SETULE/SETUGE. Reviewed By: frasercrmck, tlively, nemanjai Differential Revision: https://reviews.llvm.org/D94450	2021-01-12 10:45:03 -08:00
Nemanja Ivanovic	3f7b4ce960	[PowerPC] Add support for embedded devices with EFPU2 PowerPC cores like e200z759n3 [1] using an efpu2 only support single precision hardware floating point instructions. The single precision instructions efs* and evfs* are identical to the spe float instructions while efd* and evfd* instructions trigger a not implemented exception. This patch introduces a new command line option -mefpu2 which leads to single-hardware / double-software code generation. [1] Core reference: https://www.nxp.com/files-static/32bit/doc/ref_manual/e200z759CRM.pdf Differential revision: https://reviews.llvm.org/D92935	2021-01-12 09:47:00 -06:00
Craig Topper	b1c304c494	[CodeGen] Try to make the print of memory operand alignment a little more user friendly. Memory operands store a base alignment that does not factor in the effect of the offset on the alignment. Previously the printing code only printed the base alignment if it was different than the size. If there is an offset, the reader would need to figure out the effective alignment themselves. This has confused me before and someone else was recently confused on IRC. This patch prints the possibly offset adjusted alignment if it is different than the size. And prints the base alignment if it is different than the alignment. The MIR parser has been updated to read basealign in addition to align. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D94344	2021-01-11 19:58:47 -08:00
Mircea Trofin	05e90cefeb	[NFC] Disallow unused prefixes under llvm/test/CodeGen This patch finishes addressing unused prefixes under CodeGen: 2 remaining tests fixed, and then undo-ing the lit.local.cfg changes under various subdirs and moving the policy under CodeGen. Differential Revision: https://reviews.llvm.org/D94430	2021-01-11 12:32:18 -08:00
Mircea Trofin	7200d2cf08	[NFC] Disallow unused prefixes in CodeGen/PowerPC tests. Also removed where applicable. Differential Revision: https://reviews.llvm.org/D94385	2021-01-11 09:24:52 -08:00
Paul Robinson	c161775dec	[FastISel] Flush local value map on every instruction Local values are constants or addresses that can't be folded into the instruction that uses them. FastISel materializes these in a "local value" area that always dominates the current insertion point, to try to avoid materializing these values more than once (per block). https://reviews.llvm.org/D43093 added code to sink these local value instructions to their first use, which has two beneficial effects. One, it is likely to avoid some unnecessary spills and reloads; two, it allows us to attach the debug location of the user to the local value instruction. The latter effect can improve the debugging experience for debuggers with a "set next statement" feature, such as the Visual Studio debugger and PS4 debugger, because instructions to set up constants for a given statement will be associated with the appropriate source line. There are also some constants (primarily addresses) that could be produced by no-op casts or GEP instructions; the main difference from "local value" instructions is that these are values from separate IR instructions, and therefore could have multiple users across multiple basic blocks. D43093 avoided sinking these, even though they were emitted to the same "local value" area as the other instructions. The patch comment for D43093 states: Local values may also be used by no-op casts, which adds the register to the RegFixups table. Without reversing the RegFixups map direction, we don't have enough information to sink these instructions. This patch undoes most of D43093, and instead flushes the local value map after(*) every IR instruction, using that instruction's debug location. This avoids sometimes incorrect locations used previously, and emits instructions in a more natural order.
In addition, constants materialized due to PHI instructions are not assigned a debug location immediately; instead, when the local value map is flushed, if the first local value instruction has no debug location, it is given the same location as the first non-local-value-map instruction. This prevents PHIs from introducing unattributed instructions, which would either be implicitly attributed to the location for the preceding IR instruction, or given line 0 if they are at the beginning of a machine basic block. Neither of those consequences is good for debugging. This does mean materialized values are not re-used across IR instruction boundaries; however, only about 5% of those values were reused in an experimental self-build of clang. (*) Actually, just prior to the next instruction. It seems like it would be cleaner the other way, but I was having trouble getting that to work. This reapplies commits `cf1c774d` and `dc35368c`, and adds the modification to PHI handling, which should avoid problems with debugging under gdb. Differential Revision: https://reviews.llvm.org/D91734	2021-01-11 08:32:36 -08:00
Paul Robinson	e5eb5c8a7f	NFC: Use -LABEL more There were a number of tests needing updates for D91734, and I added a bunch of LABEL directives to help track down where those had to go. These directives are an improvement independent of the functional patch, so I'm committing them as their own separate patch.	2021-01-11 08:14:58 -08:00
Qiu Chaofan	6175fcf01f	[NFC] Update some PPC tests marked as auto-generated Update CodeGen regression tests with marker at first line telling it's auto-generated by the script, under PowerPC directory. For some reason, these tests are generated but manually written, which makes things unclear when someone's change affecting them. However, some tests only show simple change after re-generated, like extra blank lines, disappearing '.localentry', etc. Besides, some tests are generated but added checks for debug output. This commit doesn't try updating them.	2021-01-08 17:59:13 +08:00
Nikita Popov	7d48eff8ba	[PowerPC] Avoid call to undef in test (NFC) Replace call to undef with a dummy function, to avoid affecting this change by changes to call undef folding.	2021-01-06 21:09:02 +01:00
Stefan Pintilie	cb0c034edc	[PowerPC] Fix issue where vsrq is given incorrect shift vector The new Power10 instruction vsrq was being given the wrong shift vector. The original code assumed that the shift would be found in bits 121 to 127. This is not correct. The shift is found in bits 57 to 63. This can be fixed by swapping the first and second double words. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D94113	2021-01-06 05:56:09 -06:00
Jinsong Ji	f26bc0ddd5	[RegisterClassInfo] Return non-zero for RC without allocatable reg In some cases, the RC may have 0 allocatable regs, e.g. VRSAVERC in PowerPC, which has only 1 reg, but it is also reserved. The current implementation will keep calling computePSetLimit because getRegPressureSetLimit assumes computePSetLimit will return a non-zero value. The fix simply returns the value from TableGen early for such special cases. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92907	2021-01-05 16:18:34 +00:00
Qiu Chaofan	48340fbe6a	[NFC] [PowerPC] Update vec_constants test to reflect more patterns This patch uses update_llc_check script to update vec_constants.ll, and add two cases to cover 'vsplti+vsldoi' with 16-bit and 24-bit offset.	2021-01-05 11:29:08 +08:00
Qiu Chaofan	ae61485163	[UpdateTestChecks] Fix PowerPC RE to support AIX assembly Current update_llc_test_checks.py cannot generate checks for AIX (powerpc64-ibm-aix-xcoff) properly. The assembly generated is a little bit different from Linux. So I use the begin function comment here to capture the function name. Reviewed By: MaskRay, steven.zhang Differential Revision: https://reviews.llvm.org/D93676	2021-01-05 10:28:00 +08:00
Kai Luo	f6515b0520	[PowerPC] Do not fold `cmp(d|w)` and `subf` instruction to `subf.` if `nsw` is not present In `PPCInstrInfo::optimizeCompareInstr` we seek opportunities to fold `cmp(d|w)` and `subf` as a `subf.`. However, if `subf.` overflows, `cr0` can't reflect the correct order, violating the semantics of `cmp(d|w)`. Fixed https://bugs.llvm.org/show_bug.cgi?id=47830. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D90156	2021-01-04 07:54:15 +00:00
Roman Lebedev	2461cdb417	[CodeGen][SimplifyCFG] Teach DwarfEHPrepare to preserve DomTree Once the default for SimplifyCFG flips, we can no longer pass nullptr instead of DomTree to SimplifyCFG, so we need to propagate it here. We don't strictly need to actually preserve DomTree in DwarfEHPrepare, but we might as well do it, since it's trivial.	2021-01-02 01:01:19 +03:00
Roman Lebedev	b23b1bcc26	[NFC][CodeGen][Tests] Mark all tests that fail to preserve DomTree for SimplifyCFG as such These tests start to fail when the SimplifyCFG's default regarding DomTree updating is switched on, so mark them as needing changes.	2021-01-02 01:01:19 +03:00
Fangrui Song	2047c10c22	[TargetMachine] Drop implied dso_local for definitions in ELF static relocation model/PIE TargetMachine::shouldAssumeDSOLocal currently implies dso_local for such definitions. Since clang -fno-pic adds the dso_local specifier, we don't need to special case.	2020-12-30 16:57:50 -08:00
Fangrui Song	bf1160c1d6	[test] Add explicit dso_local to definitions in ELF static relocation model tests	2020-12-30 16:52:23 -08:00
Fangrui Song	a964e0f085	[test] Add explicit dso_local to definitions in ELF static relocation model tests	2020-12-30 15:47:16 -08:00
Fangrui Song	88cadb894c	[PowerPC][test] Add explicit dso_local to definitions in ELF static relocation model tests TargetMachine::shouldAssumeDSOLocal currently implies dso_local for such definitions. Adding explicit dso_local makes these tests align with the clang -fpic behavior and allow the removal of the TargetMachine::shouldAssumeDSOLocal special case. Rewrite preemption.ll to dsolocal-static.ll and dsolocal-pic.ll, and add "PIC Level" metadata.	2020-12-30 10:32:34 -08:00
Kai Luo	e3e25cfb44	[PowerPC] Add mir test to show effect of `optimizeCompareInstr` when `equalityOnly` is true. NFC.	2020-12-30 02:23:05 +00:00
Kai Luo	f904d50c29	[PowerPC] Remaining KnownBits should be constant when performing non-sign comparison In `PPCTargetLowering::DAGCombineTruncBoolExt`, when checking if it's correct to perform the transformation for non-sign comparison, as the comment says ``` // This is neither a signed nor an unsigned comparison, just make sure // that the high bits are equal. ``` The original check ``` if (Op1Known.Zero != Op2Known.Zero || Op1Known.One != Op2Known.One) return SDValue(); ``` is not strong enough. For example, ``` Op1Known = 111x000x; Op2Known = 111x000x; ``` Bit 4, besides bit 0, is still unknown and affects the final result. This patch fixes https://bugs.llvm.org/show_bug.cgi?id=48388. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D93092	2020-12-30 02:00:47 +00:00
Nemanja Ivanovic	7486de1b2e	[PowerPC] Provide patterns for permuted scalar to vector for pre-P8 We will emit these permuted nodes on all VSX little endian subtargets but don't have the patterns available to match them on subtargets that don't have direct moves. Fixes: https://bugs.llvm.org/show_bug.cgi?id=47916	2020-12-29 06:49:25 -06:00
Nemanja Ivanovic	0a19fc3088	[PowerPC] Disable CTR loops containing operations on half-precision On subtargets prior to Power9, conversions to/from half precision are lowered to libcalls. This makes loops containing such operations invalid candidates for HW loops. Fixes: https://bugs.llvm.org/show_bug.cgi?id=48519	2020-12-29 05:12:50 -06:00
Nemanja Ivanovic	4f568fbd21	[PowerPC] Do not emit HW loop when TLS var accessed in PHI of loop exit If any PHI nodes in loop exit blocks have incoming values from the loop that are accesses of TLS variables with local dynamic or general dynamic TLS model, the address will be computed inside the loop. Since this includes a call to __tls_get_addr, this will in turn cause the CTR loops verifier to complain. Disable CTR loops in such cases. Fixes: https://bugs.llvm.org/show_bug.cgi?id=48527	2020-12-28 20:36:16 -06:00
Gabriel Hjort Åkerlund	b9a7c89d43	[MIRPrinter] Fix incorrect output of unnamed stack names The MIRParser expects unnamed stack entries to have empty names (''). In case of unnamed alloca instructions, the MIRPrinter would output '<unnamed alloca>', which caused the MIRParser to reject the generated code. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D93685	2020-12-28 18:01:40 +01:00
Evgeniy Brevnov	9fb074e7bb	[BPI] Improve static heuristics for "cold" paths. Current approach doesn't work well in cases when multiple paths are predicted to be "cold". By "cold" paths I mean those containing "unreachable" instruction, call marked with 'cold' attribute and 'unwind' handler of 'invoke' instruction. The issue is that heuristics are applied one by one until the first match, which essentially ignores the relative hotness/coldness of other paths. New approach unifies processing of "cold" paths by assigning predefined absolute weight to each block estimated to be "cold". Then we propagate these weights up/down IR similarly to existing approach. And finally set up edge probabilities based on estimated block weights. One important difference is how we propagate weight up. Existing approach propagates the same weight to all blocks that are post-dominated by a block with some "known" weight. This is useless at least because it always gives 50\50 distribution which is assumed by default anyway. Worse, it causes the algorithm to skip further heuristics and can miss setting more accurate probability. New algorithm propagates the weight up only to the blocks that dominate and are post-dominated by a block with some "known" weight. In other words, those blocks that are either always executed or not executed together. In addition new approach processes loops in a uniform way as well. Essentially loop exit edges are estimated as "cold" paths relative to back edges and should be considered uniformly with other coldness/hotness markers. Reviewed By: yrouban Differential Revision: https://reviews.llvm.org/D79485	2020-12-23 22:47:36 +07:00
Kamau Bridgeman	8a58f21f5b	[PowerPC][Power10] Exploit store rightmost vector element instructions Using the store rightmost vector element instructions to do vector element extraction and store. The rightmost vector element on little endian is the zeroth vector element, with these patterns that element can be extracted and stored in one instruction for all vector types. Differential Revision: https://reviews.llvm.org/D89195	2020-12-22 12:06:43 -05:00
Nemanja Ivanovic	ba1202a1e4	[PowerPC] Restore stack ptr from base ptr when available On subtargets that have a red zone, we will copy the stack pointer to the base pointer in the prologue prior to updating the stack pointer. There are no other updates to the base pointer after that. This suggests that we should be able to restore the stack pointer from the base pointer rather than loading it from the back chain or adding the frame size back to either the stack pointer or the frame pointer. This came about because functions that call setjmp need to restore the SP from the FP because the back chain might have been clobbered (see https://reviews.llvm.org/D92906). However, if the stack is realigned, the restored SP might be incorrect (which is what caused the failures in the two ASan test cases). This patch was tested quite extensively both with sanitizer runtimes and general code. Differential revision: https://reviews.llvm.org/D93327	2020-12-22 05:44:03 -06:00
Fangrui Song	d33abc337c	Migrate MCContext::createTempSymbol call sites to AlwaysAddSuffix=true Most call sites set AlwaysAddSuffix to true. The two use cases do not really need false and can be more consistent with other temporary symbol usage.	2020-12-21 14:04:13 -08:00
Esme-Yi	29eb3dcfe6	[PowerPC] Materialize i64 constants by enumerated patterns. Summary: Some constants can be handled with fewer instructions than our current results. And it seems our original approach is not very easy to extend. Therefore this patch proposes to materialize all 64-bit constants by enumerated patterns. I traversed almost all constants to verify the functionality of these patterns. A traversed comparison of the number of instructions used by the original method and the new method has also been completed, where no degradation was caused by this patch. This patch also passed Bootstrap test and SPEC test. Improvements of this patch are shown in llvm/test/CodeGen/PowerPC/constants-i64.ll Reviewed By: steven.zhang, stefanp Differential Revision: https://reviews.llvm.org/D92089	2020-12-21 05:21:07 +00:00
Chen Zheng	564066524a	[PowerPC] add has side effect for SAT bit clobber intrinsics/instructions This patch does two things: 1: fix the typo that intrinsic mfvscr should be with no readmem property 2: since VSCR is not modeled yet, add has side effect for SAT bit clobber intrinsics/instructions. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D90807	2020-12-20 19:48:26 -05:00
Chen Zheng	4dce7c2e20	[MachineLICM] delete dead flag if the duplicated def outside of loop is dead. Fixup dead flags for CSE-ed instructions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D92557	2020-12-20 19:26:22 -05:00
QingShan Zhang	477b6505fa	[PowerPC] Select the D-Form load if we know its offset meets the requirement The LD/STD-like instructions are selected only when the alignment in the load/store >= 4, to deal with the case that the offset might not be known (i.e. relocations). That means we have to select the X-Form load for %0 = load i64, i64* %arrayidx, align 2 In fact, we can still select the D-Form load if the offset is known. So, we only query the load/store alignment when we don't know if the offset is a multiple of 4. Reviewed By: jji, Nemanjai Differential Revision: https://reviews.llvm.org/D93099	2020-12-18 07:27:26 +00:00
Layton Kifer	385e9a2a04	[DAGCombiner] Improve shift by select of constant Clean up a TODO, to support folding a shift of a constant by a select of constants, on targets with different shift operand sizes. Reviewed By: RKSimon, lebedev.ri Differential Revision: https://reviews.llvm.org/D90349	2020-12-18 02:21:42 +00:00
Baptiste Saleil	c2892978e9	[PowerPC] Rename the vector pair intrinsics and builtins to replace the _mma_ prefix by _vsx_ On PPC, the vector pair instructions are independent from MMA. This patch renames the vector pair LLVM intrinsics and Clang builtins to replace the _mma_ prefix by _vsx_ in their names. We also move the vector pair type/intrinsic/builtin tests to their own files. Differential Revision: https://reviews.llvm.org/D91974	2020-12-17 13:19:27 -05:00
QingShan Zhang	ebdd20f430	Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128 X86 and AArch64 expand it as a libcall inside the target. PowerPC also wants to expand them as libcalls for P8. So, propose an implementation in the legalizer to share the logic and remove the code for X86/AArch64 to avoid duplication. Reviewed By: Craig Topper Differential Revision: https://reviews.llvm.org/D91331	2020-12-17 07:59:30 +00:00
Esme-Yi	2ea7210e39	Revert "[PowerPC] Extend folding RLWINM + RLWINM to post-RA." This reverts commit `1c0941e152`.	2020-12-16 17:12:24 +00:00
diggerlin	a1e1dcabe4	[XCOFF][AIX] Emit EH information in traceback table SUMMARY: In order for the runtime on AIX to find the compact unwind section (EHInfo table), we would need to set the following on the traceback table: The 6th byte's longtbtable field to true to signal there is an Extended TB Table Flag. The Extended TB Table Flag to be 0x08 to signal that exception handling info is present. Emit the offset between ehinfo TC entry and TOC base after all other optional portions of traceback table. The patch is authored by Jason Liu. Reviewers: David Tenty, Digger Lin Differential Revision: https://reviews.llvm.org/D92766	2020-12-16 09:34:59 -05:00
Baptiste Saleil	57d83c3a90	[PowerPC] Enable paired vector type and intrinsics when MMA is disabled This patch enables the Clang type __vector_pair and its associated LLVM intrinsics even when MMA is disabled. With this patch, the type is now controlled by the PPC paired-vector-memops option. The builtins and intrinsics will be renamed to drop the mma prefix in another patch. Differential Revision: https://reviews.llvm.org/D91819	2020-12-15 15:14:11 -06:00
Nemanja Ivanovic	bfdc19e778	[PowerPC] Restore stack ptr from frame ptr with setjmp If a function happens to: - call setjmp - do a 16-byte stack allocation - call a function that sets up a stack frame and longjmp's back The stack pointer that is restored by setjmp will no longer point to a valid back chain. According to the ABI, stack accesses in such a function are to be frame pointer based - so it is an error (quite obviously) to restore the stack from the back chain. We already restore the stack from the frame pointer when there are calls to fast_cc functions. We just need to also do that when there are calls to setjmp. This patch simply does that. This was pointed out by the Julia team. Differential revision: https://reviews.llvm.org/D92906	2020-12-14 11:34:16 -06:00
QingShan Zhang	08e287aaf3	[PowerPC][FP128] Fix the incorrect signature for math library call The runtime library has two families of library implementations, for ppc_fp128 and fp128. For IBM long double (ppc_fp128), the name is suffixed with 'l', e.g. sqrtl. For IEEE long double (fp128), it is suffixed with "ieee128" or "f128". We failed to map several libcalls for IEEE long double. Reviewed By: qiucf Differential Revision: https://reviews.llvm.org/D91675	2020-12-14 07:52:56 +00:00
Zarko Todorovski	ce4040a43d	[PPC] Check for PPC64 when emitting 64bit specific VSX nodes when pattern matching built vectors Some of the pattern matching in PPCInstrVSX.td and node lowering involving vectors assumes 64bit mode. This patch disables some of the unsafe pattern matching and lowering of BUILD_VECTOR in 32bit mode. Reviewed By: Xiangling_L Differential Revision: https://reviews.llvm.org/D92789	2020-12-12 15:28:28 -05:00
diggerlin	997d286f2d	[AIX][XCOFF] emit traceback table for function in aix SUMMARY: 1. added a new option -xcoff-traceback-table to control whether generate traceback table for function. 2. implement the functionality of emit traceback table of a function. Reviewers: hubert.reinterpretcast, Jason Liu Differential Revision: https://reviews.llvm.org/D92398	2020-12-11 17:50:25 -05:00
Jinsong Ji	cf638f84a4	[PowerPC] Remove duplicate layout	2020-12-11 15:52:29 +00:00
QingShan Zhang	68dbb7789e	[NFC][Test] Add a test to verify the instruction form we got from isel	2020-12-11 10:36:46 +00:00
QingShan Zhang	08280c4b73	[NFC][Test] Format the PowerPC test for incoming patch	2020-12-11 09:53:20 +00:00
Jinsong Ji	02b2c02419	[PowerPC] Precommit testcases for regpressure compute fix	2020-12-09 03:37:00 +00:00
Chen Zheng	66a03d1022	[PowerPC] prepare more dq form for P10 pair load/store Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D92393	2020-12-08 21:01:40 -05:00
Stefan Pintilie	2812c15156	[PowerPC] Fix missing nop after call to weak callee. Weak functions can be replaced by other functions at link time. Previously it was assumed that no matter what the weak callee function was replaced with it would still share the same TOC as the caller. This is no longer true as a weak callee with a TOC setup can be replaced by another function that was compiled with PC Relative and does not have a TOC at all. This patch makes sure that all calls to functions defined as weak from a caller that has a valid TOC have a nop after the call to allow a place for the linker to restore the TOC. Reviewed By: NeHuang Differential Revision: https://reviews.llvm.org/D91983	2020-12-08 09:38:44 -06:00
Qiu Chaofan	92160b23f5	[NFC] [PowerPC] Move i1-to-fp tests and use script	2020-12-08 15:20:15 +08:00
Qiu Chaofan	5e85a2ba16	[PowerPC] Implement intrinsic for DARN instruction Instruction darn was introduced in ISA 3.0. It means 'Deliver A Random Number'. The immediate number L means: - L=0, the number is 32-bit (higher 32-bits are all-zero) - L=1, the number is 'conditioned' (processed by hardware to reduce bias) - L=2, the number is not conditioned, directly from noise source GCC implements them in three separate intrinsics: __builtin_darn, __builtin_darn_32 and __builtin_darn_raw. This patch implements the same intrinsics. And this change also addresses Bugzilla PR39800. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D92465	2020-12-08 14:08:52 +08:00
Kai Luo	44bd8ea167	[DAGCombine][PowerPC] Simplify nabs by using legal `smin` operation Convert `0 - abs(x)` to `smin (x, -x)` if `smin` is a legal operation. Verification: https://alive2.llvm.org/ce/z/vpquFR Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92637	2020-12-08 03:24:07 +00:00
Stefan Pintilie	49921d1c3c	[PowerPC] Exploitation of xxeval instruction for AND and NAND The xxeval instruction was introduced in Power PC in Power 10. The instruction accepts three vector registers and an immediate. Depending on the value of the immediate the instruction can be used to perform certain bitwise boolean operations (and, or, xor, ...) on the given vector registers. This patch implements the AND and NAND patterns that can be used by the instruction. Reviewed By: nemanjai, #powerpc, bsaleil, NeHuang, jsji Differential Revision: https://reviews.llvm.org/D92420	2020-12-07 12:36:54 -06:00
Esme-Yi	28fdeea952	[PowerPC] Add support for intrinsics dcbfps and dcbstps in P10. Summary: This patch added support for the intrinsics llvm.ppc.dcbfps and llvm.ppc.dcbstps. dcbfps and dcbstps are actually extended mnemonics of dcbf. dcbfps RA,RB ---> dcbf RA,RB,4 dcbstps RA,RB ---> dcbf RA,RB,6 Reviewed By: amyk, steven.zhang Differential Revision: https://reviews.llvm.org/D91323	2020-12-07 05:19:06 +00:00
Qiu Chaofan	efdd463050	[PowerPC] Fix chain for i1-to-fp operation A simple SELECT is used for converting i1 to floating types on ppc32, but in constrained cases, the chain is not handled properly. This patch will fix that. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D92365	2020-12-07 10:38:56 +08:00
Layton Kifer	ac522f8700	[DAGCombiner] Fold (sext (not i1 x)) -> (add (zext i1 x), -1) Move fold of (sext (not i1 x)) -> (add (zext i1 x), -1) from X86 to DAGCombiner to improve codegen on other targets. Differential Revision: https://reviews.llvm.org/D91589	2020-12-06 11:52:10 -05:00
Simon Pilgrim	4a8b5e9896	[PowerPC] Regenerate p10-vector-rotate.ll Reorder check-prefixes to stop update_llc_test_checks.py complaining	2020-12-04 15:33:01 +00:00
QingShan Zhang	c25b039e21	[PowerPC] Fix the regression caused by commit `9c588f53fc` Add a TypeLegal check for MVT::i1 and add the test.	2020-12-04 10:22:13 +00:00
Kai Luo	f5d52916ce	[PowerPC] Pre-commit neg abs test for vector. NFC.	2020-12-04 06:52:05 +00:00
Baptiste Saleil	45ec3a37b0	[PowerPC] Fix for excessive ACC copies due to PHI nodes When using accumulators in loops, they are passed around in PHI nodes of unprimed accumulators, causing the generation of additional prime/unprime instructions. This patch detects these cases and changes these PHI nodes to primed accumulator PHI nodes. We also add IR and MIR test cases for several PHI node cases. Differential Revision: https://reviews.llvm.org/D91391	2020-12-03 09:51:23 -06:00
QingShan Zhang	9bf0fea372	[PowerPC] Add the hw sqrt test for vector type v4f32/v2f64 PowerPC ISA support the input test for vector type v4f32 and v2f64. Replace the software compare with hw test will improve the perf. Reviewed By: ChenZheng Differential Revision: https://reviews.llvm.org/D90914	2020-12-03 03:19:18 +00:00
jasonliu	2c63e7604c	[XCOFF][AIX] Alternative path in EHStreamer for platforms do not have uleb128 support Summary: Not all system assemblers support the `.uleb128 label2 - label1` form. When the target does not support this form, we have to take an alternative manual calculation to get the offsets from them. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D92058	2020-12-02 20:03:15 +00:00
jasonliu	a65d8c5d72	[XCOFF][AIX] Generate LSDA data and compact unwind section on AIX Summary: AIX uses the existing EH infrastructure in clang and llvm. The major differences would be 1. AIX does not have CFI instructions. 2. AIX uses a new personality routine, named __xlcxx_personality_v1. It doesn't use the GCC personality routine, because the interoperability is not there yet on AIX. 3. AIX does not use eh_frame sections. Instead, it would use an eh_info section (compact unwind section) to store the information about personality routine and LSDA data address. Reviewed By: daltenty, hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D91455	2020-12-02 18:42:44 +00:00
Simon Pilgrim	8aa40de5ec	[PowerPC] Regenerate cmpb tests Helps to reduce diff in D90113	2020-12-02 18:00:41 +00:00
Qiu Chaofan	ffa2dce590	[PowerPC] Fix FLT_ROUNDS_ on little endian In lowering of FLT_ROUNDS_, FPSCR content will be moved into FP register and then GPR, and then truncated into word. For subtargets without direct move support, it will store and then load. The load address needs adjustment (+4) only on big-endian targets. This patch fixes it on using generic opcodes on little-endian and subtargets with direct-move. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D91845	2020-12-02 17:16:32 +08:00
QingShan Zhang	47f784ace6	[PowerPC] Promote the i1 to i64 for SINT_TO_FP/FP_TO_SINT i1 is the native type for PowerPC if crbits is enabled. However, we need to promote the i1 to i64 as we didn't have the pattern for i1. Reviewed By: Qiu Chao Fang Differential Revision: https://reviews.llvm.org/D92067	2020-12-02 05:37:45 +00:00
David Blaikie	615f63e149	Revert "[FastISel] Flush local value map on ever instruction" and dependent patches This reverts commit `cf1c774d6a`. This change caused several regressions in the gdb test suite - at least a sample of which was due to line zero instructions making breakpoints un-lined. I think they're worth investigating/understanding more (& possibly addressing) before moving forward with this change. Revert "[FastISel] NFC: Clean up unnecessary bookkeeping" This reverts commit `3fd39d3694`. Revert "[FastISel] NFC: Remove obsolete -fast-isel-sink-local-values option" This reverts commit `a474657e30`. Revert "Remove static function unused after cf1c774." This reverts commit `dc35368ccf`. Revert "[lldb] Fix TestThreadStepOut.py after "Flush local value map on every instruction"" This reverts commit `53a14a47ee`.	2020-12-01 14:26:23 -08:00
Simon Pilgrim	6dbd0d36a1	[DAG] Move vselect(icmp_ult, -1, add(x,y)) -> uaddsat(x,y) to DAGCombine (PR40111) Move the X86 VSELECT->UADDSAT fold to DAGCombiner - there's nothing target specific about these folds. The SSE42 test diffs are relatively benign - its avoiding an extra constant load in exchange for an extra xor operation - there are extra register moves, which is annoying as all those operations should commute them away. Differential Revision: https://reviews.llvm.org/D91876	2020-12-01 11:56:26 +00:00
QingShan Zhang	4d83aba422	[DAGCombine] Adding a hook to improve the precision of fsqrt if the input is denormal For now, we will hardcode the result as 0.0 if the input is denormal or 0. That will have the impact the precision. As the fsqrt added belong to the cold path of the cmp+branch, it won't impact the performance for normal inputs for PowerPC, but improve the precision if the input is denormal. Reviewed By: Spatel Differential Revision: https://reviews.llvm.org/D80974	2020-11-27 02:10:55 +00:00
Zarko Todorovski	6d648e69c0	[AIX] Add support for non var_arg extended vector ABI calling convention on AIX This patch enables passing non variadic vector type parameters on the caller and callee side and vector return on AIX that are passed in vector registers only. So far, support is enabled for only the AIX extended Altivec ABI Calling convention. Reviewed By: sfertile, DiggerLin Differential Revision: https://reviews.llvm.org/D86476	2020-11-26 12:03:51 -05:00
Craig Topper	2d6042937b	[SelectionDAGBuilder] Add SPF_NABS support to visitSelect We currently don't match this which limits the effectiveness of D91120 until InstCombine starts canonicalizing to llvm.abs. This should be easy to remove if/when we remove the SPF_ABS handling. Differential Revision: https://reviews.llvm.org/D92118	2020-11-25 14:54:26 -08:00
Paul Robinson	cf1c774d6a	[FastISel] Flush local value map on ever instruction Local values are constants or addresses that can't be folded into the instruction that uses them. FastISel materializes these in a "local value" area that always dominates the current insertion point, to try to avoid materializing these values more than once (per block). https://reviews.llvm.org/D43093 added code to sink these local value instructions to their first use, which has two beneficial effects. One, it is likely to avoid some unnecessary spills and reloads; two, it allows us to attach the debug location of the user to the local value instruction. The latter effect can improve the debugging experience for debuggers with a "set next statement" feature, such as the Visual Studio debugger and PS4 debugger, because instructions to set up constants for a given statement will be associated with the appropriate source line. There are also some constants (primarily addresses) that could be produced by no-op casts or GEP instructions; the main difference from "local value" instructions is that these are values from separate IR instructions, and therefore could have multiple users across multiple basic blocks. D43093 avoided sinking these, even though they were emitted to the same "local value" area as the other instructions. The patch comment for D43093 states: Local values may also be used by no-op casts, which adds the register to the RegFixups table. Without reversing the RegFixups map direction, we don't have enough information to sink these instructions. This patch undoes most of D43093, and instead flushes the local value map after(*) every IR instruction, using that instruction's debug location. This avoids sometimes incorrect locations used previously, and emits instructions in a more natural order. This does mean materialized values are not re-used across IR instruction boundaries; however, only about 5% of those values were reused in an experimental self-build of clang.
(*) Actually, just prior to the next instruction. It seems like it would be cleaner the other way, but I was having trouble getting that to work. Differential Revision: https://reviews.llvm.org/D91734	2020-11-25 13:05:00 -05:00
Simon Pilgrim	6588592684	[PowerPC] Regenerate vec_select.ll tests and add <1 x i128> test case	2020-11-25 14:28:16 +00:00
Kai Luo	97e7ce3b15	[PowerPC] Probe the gap between stackptr and realigned stackptr During reviewing https://reviews.llvm.org/D84419, @efriedma mentioned the gap between realigned stack pointer and origin stack pointer should be probed too whatever the alignment is. This patch fixes the issue for PPC64. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D88078	2020-11-25 07:01:45 +00:00
QingShan Zhang	9c588f53fc	[DAGCombine] Add hook to allow target specific test for sqrt input PowerPC has instruction ftsqrt/xstsqrtdp etc to do the input test for software square root. LLVM now tests it with smallest normalized value using abs + setcc. We should add hook to target that has test instructions. Reviewed By: Spatel, Chen Zheng, Qiu Chao Fang Differential Revision: https://reviews.llvm.org/D80706	2020-11-25 05:37:15 +00:00
Kai Luo	8e6d92026c	[DAG][PowerPC] Fix dropped `nsw` flag in `SimplifySetCC` by adding `doesNodeExist` helper `SimplifySetCC` invokes `getNodeIfExists` without passing `Flags` argument and `getNodeIfExists` uses a default `SDNodeFlags` to intersect the original flags, as a consequence, flags like `nsw` is dropped. Added a new helper function `doesNodeExist` to check if a node exists without modifying its flags. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D89938	2020-11-25 04:39:03 +00:00
Zarko Todorovski	be7d425edc	[PPC][AIX] Add vector callee saved registers for AIX extended vector ABI This patch is the initial patch for support of the AIX extended vector ABI. The extended ABI treats vector registers V20-V31 as non-volatile and we add them as callee saved registers in this patch. Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D88676	2020-11-24 23:01:51 -05:00
QingShan Zhang	60c28a5a2b	[NFC][Test] Format the test for IEEE Long double	2020-11-25 03:00:24 +00:00
QingShan Zhang	fa42f08b26	[PowerPC][FP128] Fix the incorrect calling convention for IEEE long double on Power8 For now, we are using the GPR to pass the arguments/return value for fp128 on Power8, which is incorrect. It should be VSR. The reason why we do it this way is that, we are setting the fp128 as illegal which make LLVM try to emulate it with i128 on Power8. So, we need to correct it as legal. Reviewed By: Nemanjai Differential Revision: https://reviews.llvm.org/D91527	2020-11-25 01:43:48 +00:00
Zarko Todorovski	c92f29b05e	[AIX] Add mabi=vec-extabi options to enable the AIX extended and default vector ABIs. Added support for the options mabi=vec-extabi and mabi=vec-default which are analogous to qvecnvol and qnovecnvol when using XL on AIX. The extended Altivec ABI on AIX is enabled using mabi=vec-extabi in clang and vec-extabi in llc. Reviewed By: Xiangling_L, DiggerLin Differential Revision: https://reviews.llvm.org/D89684	2020-11-24 18:17:53 -05:00
Sean Fertile	4f5355ee73	[PowerPC] Don't reuse an illegal typed load for int_to_fp conversion. When the operand to an (s/u)int_to_fp node is an illegally typed load we cannot reuse the load address since we can not build a proper dependency chain. The legalized loads will use a different chain output than the illegal load. If we reuse the load address then we will build a conversion node that uses the chain of the illegal load and operations which modify the memory address in the other dependency chain can be scheduled before the floating point load which feeds the conversion. Differential Revision: https://reviews.llvm.org/D91265	2020-11-24 15:45:33 -05:00
Janek van Oirschot	42eaf4fe0a	[HardwareLoops] Change order of SCEV expression construction for InitLoopCount. Putting the +1 before the zero-extend will allow scalar evolution to fold the expression in some cases such as the one shown in PowerPC's `shrink-wrap.ll` test. Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D91724	2020-11-24 18:01:42 +00:00
Victor Huang	1f5c4a0d04	[PowerPC][PCRelative] Add new pseudo instructions for PCRel TLS to fix R2 clobber issue New pseudo instructions GETtlsADDRPCREL and GETtlsldADDRPCREL are added for properly setting REGMASK for tls_get_addr function when using PCRelative address. Differential Revision: https://reviews.llvm.org/D91420 Reviewed by: bsaleil	2020-11-24 11:34:32 -06:00
Masoud Ataei	b86a1cd2f8	[PowerPC] dyn_cast should be dyn_cast_or_null in MASSV pass It is possible that we have different constants in different slots of second vector double (float) of pow function. So, in this case Exp->getSplatValue() will return nullptr. Here, I handle it properly. Reviewed By: steven.zhang, PowerPC Differential Revision: https://reviews.llvm.org/D91729	2020-11-24 16:21:12 +00:00
Kai Luo	5931be60b5	[DAGCombine][PowerPC] Convert negated abs to trivial arithmetic ops This patch converts `0 - abs(x)` to `Y = sra (X, size(X)-1); sub (Y, xor (X, Y))` for better codegen. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D91120	2020-11-24 09:43:35 +00:00
Kai Luo	da3bc99bdd	[PowerPC] Pre-commit more tests for `select` codegen. NFC.	2020-11-24 06:32:38 +00:00
Xiangling Liao	01b3e6e026	[AIX] Support init priority Support reserved [0-100] and non-reserved[101-65535] Clang/GNU init priority values on AIX. This patch maps Clang/GNU values into priority values used in sinit/sterm functions. User can play with values and be able to get init to occur before or after XL init and vice versa. Differential Revision: https://reviews.llvm.org/D91272	2020-11-23 14:50:05 -05:00
Esme-Yi	1c0941e152	[PowerPC] Extend folding RLWINM + RLWINM to post-RA. Summary: We have the patterns to fold 2 RLWINMs before RA, while some RLWINM will be generated after RA, for example rGc4690b007743. If the RLWINM generated after RA followed by another RLWINM, we expect to perform the optimization too. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D89855	2020-11-22 07:37:24 +00:00
Matt Arsenault	1d1234b2a4	OpaquePtr: Update more tests to use typed sret	2020-11-20 20:08:43 -05:00
Matt Arsenault	20c43d6bd5	OpaquePtr: Bulk update tests to use typed sret	2020-11-20 17:58:26 -05:00
Matt Arsenault	06c192d454	OpaquePtr: Bulk update tests to use typed byval Upgrade of the IR text tests should be the only thing blocking making typed byval mandatory. Partially done through regex and partially manual.	2020-11-20 14:00:46 -05:00
QingShan Zhang	1b5921f4d8	[NFC][Test] Update test for IEEE Long Double	2020-11-20 09:57:45 +00:00
Qiu Chaofan	4cb510d284	[NFC] Pre-commit test for flt_rounds on PowerPC	2020-11-20 15:14:58 +08:00
Baptiste Saleil	18db29ea6f	[PowerPC] Add peephole to remove redundant accumulator prime/unprime instructions In some situations, the compiler may insert an accumulator prime instruction and an accumulator unprime instruction with no use of that accumulator between the two. That's for example the case when we store an accumulator after assembling it or restoring it. This patch adds a peephole to remove these prime and unprime instructions. Differential Revision: https://reviews.llvm.org/D91386	2020-11-18 15:01:07 -06:00
Esme-Yi	163929d7a6	[NFC][PowerPC] Added testcases of constant-i64.	2020-11-18 10:13:16 +00:00
QingShan Zhang	63a8ee3dda	[NFC][Test] Add more tests for IEEE Longdouble for PowerPC	2020-11-18 02:12:01 +00:00
Kai Luo	c2460c3254	[PowerPC] Add negated abs test using llvm.abs intrinsic. NFC.	2020-11-17 09:28:56 +00:00
Esme-Yi	8063905b04	[NFC][PowerPC] Add testcase of constant-i64.	2020-11-17 04:49:19 +00:00
Victor Huang	6bb2ceac90	Fix the compilation assertion due to unreachable BB pruning not deleting the associated BB from the jump tables This patch is added to remove the unreachable MBBs reference in the jump table. Differential Revision: https://reviews.llvm.org/D90498 Reviewed by: amyk, bsaleil	2020-11-16 10:35:31 -06:00
QingShan Zhang	2b84784a25	[NFC][Test] Add test coverage for IEEE Long Double on Power8	2020-11-16 03:45:51 +00:00
Baptiste Saleil	3f78605a8c	[PowerPC] Add paired vector load and store builtins and intrinsics This patch adds the Clang builtins and LLVM intrinsics to load and store vector pairs. Differential Revision: https://reviews.llvm.org/D90799	2020-11-13 12:35:10 -06:00
Kai Luo	96ff53fbae	[PowerPC] Add test case for negated abs. NFC.	2020-11-13 08:06:31 +00:00
Baptiste Saleil	37c4ac8545	[PowerPC] Accumulator/Unprimed Accumulator register copy, spill and restore This patch adds support for accumulator/unprimed accumulator register copy, spill and restore for MMA. Authored By: Baptiste Saleil Reviewed By: #powerpc, bsaleil, amyk Differential Revision: https://reviews.llvm.org/D90616	2020-11-11 16:23:45 -06:00
Chen Zheng	09e34048bf	[SelectionDAG] fminnum should be a binary operator Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D91163	2020-11-11 03:41:40 -05:00

3123 Commits
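
Several of the DAGCombine commits listed above (44bd8ea167, 5931be60b5, ac522f8700) describe integer rewrites: `0 - abs(x)` to `smin (x, -x)`, `0 - abs(x)` to `Y = sra (X, size(X)-1); sub (Y, xor (X, Y))`, and `(sext (not i1 x))` to `(add (zext i1 x), -1)`. The following is a minimal sketch, not LLVM code, that checks these identities exhaustively on 8-bit wrapping integers. The 8-bit width, the wrapping behaviour of `abs` at the minimum value, and all function names are assumptions made for illustration.

```python
BITS = 8                      # assumed width, chosen so the check is exhaustive
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)


def wrap(v):
    """Reduce v to a signed BITS-wide two's-complement value."""
    v &= MASK
    return v - (1 << BITS) if v & SIGN else v


def nabs_reference(x):
    """0 - abs(x), with abs wrapping at the minimum value (assumption)."""
    return wrap(0 - wrap(abs(x)))


def nabs_smin(x):
    """The smin(x, -x) form from commit 44bd8ea167."""
    return min(x, wrap(-x))


def nabs_sra(x):
    """The sra/xor/sub form from commit 5931be60b5."""
    y = x >> (BITS - 1)       # Python's >> on ints is an arithmetic shift
    return wrap(y - (x ^ y))


def sext_not_i1(b):
    """sext(not i1 b): a set i1 bit sign-extends to -1."""
    return -(b ^ 1)


def add_zext_i1(b):
    """add(zext i1 b, -1), the form from commit ac522f8700."""
    return b + (-1)


def check_all():
    """Compare every form against the reference over the whole input range."""
    for x in range(-SIGN, SIGN):
        assert nabs_reference(x) == nabs_smin(x) == nabs_sra(x), x
    for b in (0, 1):
        assert sext_not_i1(b) == add_zext_i1(b), b
    return True


if __name__ == "__main__":
    print(check_all())
```

The commits themselves link an Alive2 proof for the `smin` fold; this brute-force check is only a quick way to see why the rewritten forms agree, including at the minimum signed value where negation wraps.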