llvm-project

Commit Graph

Author	SHA1	Message	Date
Ties Stuij	b430782be3	[ARM] emit PACBTI-M build attributes This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest The following people contributed to this patch: - Victor Campos - Ties Stuij Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D112425	2021-12-01 11:05:29 +00:00
David Green	6d41de380f	[ARM] Teach getIntImmCostInst about the cost of saturating fp converts Given a min(max(fptosi, INT_MIN), INT_MAX) with the correct constants, we can now generate a fptosi.sat. But in the arm backend, the constant can be treated as high cost, pulling it out of the basic block in a way that the DAG combine can no longer see it. This teaches it again that it is a low cost constant, not worth hoisting out. Differential Revision: https://reviews.llvm.org/D114380	2021-12-01 10:25:52 +00:00
Mircea Trofin	520f641877	[test] Avoid dumping .o in source tree (expand-pseudos.ll) Piping the input to llc avoids that (i.e. llc .... < %s vs llc ... %s)	2021-11-30 16:56:53 -08:00
David Green	9e8a71caf0	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 15:29:14 +00:00
Hans Wennborg	a87782c34d	Revert "[DAG] Create fptosi.sat from clamped fptosi" It causes builds to fail with this assert: llvm/include/llvm/ADT/APInt.h:990: bool llvm::APInt::operator==(const llvm::APInt &) const: Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed. See comment on the code review. > This adds a fold in DAGCombine to create fptosi_sat from sequences for > smin(smax(fptosi(x))) nodes, where the min/max saturate the output of > the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because > it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, > ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need > to be handled similarly. > > A shouldConvertFpToSat method was added to control when converting may > be profitable. The original fptosi will have a less strict semantics > than the fptosisat, with less values that need to produce defined > behaviour. > > This especially helps on ARM/AArch64 where the vcvt instructions > naturally saturate the result. > > Differential Revision: https://reviews.llvm.org/D111976 This reverts commit `52ff3b0093`.	2021-11-30 15:36:56 +01:00
David Green	52ff3b0093	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 11:05:32 +00:00
Nick Desaulniers	89453ed6f2	[ARM] create new pseudo t2LDRLIT_ga_pcrel for stack guards We can't use the existing pseudo ARM::tLDRLIT_ga_pcrel for loading the stack guard for PIC code that references the GOT, since arm-pseudo may expand this to the narrow tLDRpci rather than the wider t2LDRpci. Create a new pseudo, t2LDRLIT_ga_pcrel, and expand it to t2LDRpci. Fixes: https://bugs.chromium.org/p/chromium/issues/detail?id=1270361 Reviewed By: ardb Differential Revision: https://reviews.llvm.org/D114762	2021-11-30 08:46:05 +01:00
Guozhi Wei	f1d8345a2a	[TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB Currently we create register mappings for registers used only once in current MBB. For registers with multiple uses, when all the uses are in the current MBB, we can also create mappings for them similarly according to the last use. For example %reg101 = ... = ... reg101 %reg103 = ADD %reg101, %reg102 We can create mapping between %reg101 and %reg103. Differential Revision: https://reviews.llvm.org/D113193	2021-11-29 19:01:59 -08:00
David Green	410d276400	[DAG] Add tests for fptosi.sat for various architectures. NFC	2021-11-29 21:57:13 +00:00
David Green	04b5c00952	[ARM] Add testing for various fptosi.sat patterns. NFC	2021-11-28 16:36:17 +00:00
Simon Pilgrim	2778f9a9f6	[DAG] SimplifyDemandedVectorElts - attempt to handle ADD(x,x) as single use If the ADD node is the only user of the repeated operand, then treat this as single use - allows us to peek through shl(x,1) patterns.	2021-11-26 10:32:10 +00:00
Zarko Todorovski	7f7dac7126	[NFC][llvm] Inclusive language: reword uses of sanity test and check Part of continuing work to use more inclusive language. Reworded uses of sanity check and sanity test in llvm/test/	2021-11-25 07:21:42 -05:00
David Green	871418c5b0	[ARM] Expand rev.ll test with more triples. NFC Useful in showing Thumb2 and Thumb1 rev instructions as well as the arm already tested, as well as testing the more canonical llvm.bswap.i16 form.	2021-11-23 14:24:58 +00:00
Quinn Pham	592504aa26	[NFC][llvm] Inclusive language: replace master with main in 2007-04-02-RegScavengerAssert.ll [NFC] As part of using inclusive language within the llvm project, this patch replaces master with main in `2007-04-02-RegScavengerAssert.ll`. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D114276	2021-11-22 14:41:19 -06:00
Simon Pilgrim	4a5e1ffcf9	[ARM] Regenerate sxt_rot.ll tests	2021-11-21 18:33:29 +00:00
Simon Pilgrim	3234f2d9c1	[ARM][ParallelDSP] Regenerate complex_dot_prod.ll test	2021-11-21 12:01:44 +00:00
Simon Pilgrim	812e64ef0c	[DAG] MatchRotate - support rotate-by-constant of illegal types Patch to fix some of the regressions in D77804. By folding to rotate/funnel-shift by constant amounts for illegal types, we prevent SimplifyDemandedBits from destroying the patterns prematurely, allowing us to use the rotate/funnel-shift legalization that was added in D112443. Differential Revision: https://reviews.llvm.org/D113192	2021-11-19 11:12:04 +00:00
Ard Biesheuvel	a19da876ab	[ARM] implement support for TLS register based stack protector Implement support for loading the stack canary from a memory location held in the TLS register, with an optional offset applied. This is used by the Linux kernel to implement per-task stack canaries, which is impossible on SMP systems when using a global variable for the stack canary. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D112768	2021-11-09 18:19:47 +01:00
Ard Biesheuvel	2caf85ad7a	[ARM] implement LOAD_STACK_GUARD for remaining targets Currently, LOAD_STACK_GUARD on ARM is only implemented for Mach-O targets, and other targets rely on the generic support which may result in spilling of the stack canary value or address, or may cause it to be kept in a callee save register across function calls, which means they essentially get spilled as well, only by the callee when it wants to free up this register. So let's implement LOAD_STACK GUARD for other targets as well. This ensures that the load of the stack canary is rematerialized fully in the epilogue. This code was split off from D112768: [ARM] implement support for TLS register based stack protector for which it is a prerequisite. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D112811	2021-11-08 22:59:15 +01:00
Simon Pilgrim	4ed13275b7	[ARM] Precommit i128 test from D111530	2021-11-08 16:08:21 +00:00
Jay Foad	bdaa181007	[TwoAddressInstructionPass] Update existing physreg live intervals In TwoAddressInstructionPass::processTiedPairs with -early-live-intervals, update any preexisting physreg live intervals, as well as virtreg live intervals. By default (without -precompute-phys-liveness) physreg live intervals only exist for registers that are live-in to some basic block. Differential Revision: https://reviews.llvm.org/D113191	2021-11-05 21:20:30 +00:00
Jay Foad	0321bd64e6	Revert "[TwoAddressInstructionPass] Update existing physreg live intervals" This reverts commit `ec0e1e88d2`. It was pushed by mistake.	2021-11-05 09:54:26 +00:00
Jay Foad	ec0e1e88d2	[TwoAddressInstructionPass] Update existing physreg live intervals In TwoAddressInstructionPass::processTiedPairs with -early-live-intervals, update any preexisting physreg live intervals, as well as virtreg live intervals. By default (without -precompute-phys-liveness) physreg live intervals only exist for registers that are live-in to some basic block. Differential Revision: https://reviews.llvm.org/D113191	2021-11-05 09:10:24 +00:00
David Green	091244023a	[ARM] Move VPTBlock pass after post-ra scheduling Currently when tail predicating loops, vpt blocks need to be created with the vctp predicate in case we need to revert to non-tail predicated form. This has the unfortunate side effect of severely hampering post-ra scheduling at times as the instructions are already stuck in vpt blocks, not allowed to be independently ordered. This patch addresses that by just moving the creation of VPT blocks later in the pipeline, after post-ra scheduling has been performed. This allows more optimal scheduling post-ra before the vpt blocks are created, leading to more optimal tail predicated loops. Differential Revision: https://reviews.llvm.org/D113094	2021-11-04 18:42:12 +00:00
Simon Pilgrim	a763d0010c	[ARM] Regenerate shift-combine.ll test checks	2021-11-04 14:27:31 +00:00
Simon Pilgrim	325031786e	[SelectionDAG] Optimize expansion for rotates/funnel shifts If the type of a funnel shift needs to be expanded, expand it to two funnel shifts instead of regular shifts. For constant shifts, this doesn't make much difference, but for variable shifts it allows a more optimal lowering. Also use the optimized funnel shift lowering for rotates. Alive2: https://alive2.llvm.org/ce/z/TvHDB- / https://alive2.llvm.org/ce/z/yzPept (Branched from D108058 as getting this completed should help unlock some other WIP patches). Original Patch: @efriedma (Eli Friedman) Differential Revision: https://reviews.llvm.org/D112443	2021-11-02 11:38:25 +00:00
Daniel Kiss	d8075e8781	Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This is relanding commit `da1d1a0869` . This patch additionally addresses failures found in buildbots & post review comments. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-28 21:45:09 +02:00
Daniel Kiss	66e03db814	Revert "Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."" This reverts commit `b6420e575f`.	2021-10-28 17:24:53 +02:00
Daniel Kiss	b6420e575f	Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This is relanding commit `da1d1a0869` . This patch additionally addresses failures found in buildbots & post review comments. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-28 16:49:19 +02:00
Max Kazantsev	8daf76935d	[Test] Regenerate some of llc test checks using auto updater	2021-10-28 16:18:30 +07:00
Ard Biesheuvel	d7e089f2d6	[ARM] Use hardware TLS register in Thumb2 mode when -mtp=cp15 is passed In ARM mode, passing -mtp=cp15 forces the use of an inline MRC system register read to move the thread pointer value into a register. Currently, in Thumb2 mode, -mtp=cp15 is ignored, and a call to the __aeabi_read_tp helper is emitted instead. This is inconsistent, and breaks the Linux/ARM build for Thumb2 targets, as the Linux kernel does not provide an implementation of __aeabi_read_tp. Reviewed By: nickdesaulniers, peter.smith Differential Revision: https://reviews.llvm.org/D112600	2021-10-27 16:42:11 -07:00
Daniel Kiss	894ddba1c9	Revert "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This reverts commit `da1d1a0869`.	2021-10-27 14:29:35 +02:00
Daniel Kiss	da1d1a0869	[ARM] __cxa_end_cleanup should be called instead of _UnwindResume. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-27 10:40:00 +02:00
Simon Pilgrim	d8e50c9dba	[CodeGen] Add PR50197 AArch64/ARM/X86 test coverage Pre-commit for D111530	2021-10-22 14:22:46 +01:00
Craig Topper	b75f3dd88e	[ARM] Use correct name of floating point ceil intrinsic in test. The intrinsic is called llvm.ceil not llvm.fceil. The checks weren't strong enough to notice that a call to llvm.fceil was emitted in the final assembly.	2021-10-20 17:30:26 -07:00
John Brawn	082fa56819	[ARM] Fix MOVCC peephole to not use an incorrect register class The MOVCC peephole eliminates a MOVCC by making one of its inputs a conditional instruction, but when doing this it should be using both inputs of the MOVCC to decide on the register class to use as otherwise we can get an error when using -verify-machineinstrs. Differential Revision: https://reviews.llvm.org/D111714	2021-10-15 10:54:26 +01:00
Andrew Savonichev	dc8a41de34	[ARM] Simplify address calculation for NEON load/store The patch attempts to optimize a sequence of SIMD loads from the same base pointer: %0 = gep float, float base, i32 4 %1 = bitcast float* %0 to <4 x float>* %2 = load <4 x float>, <4 x float>* %1 ... %n1 = gep float, float base, i32 N %n2 = bitcast float* %n1 to <4 x float>* %n3 = load <4 x float>, <4 x float>* %n2 For AArch64 the compiler generates a sequence of LDR Qt, [Xn, #16]. However, 32-bit NEON VLD1/VST1 lack the [Wn, #imm] addressing mode, so the address is computed before every ld/st instruction: add r2, r0, #32 add r0, r0, #16 vld1.32 {d18, d19}, [r2] vld1.32 {d22, d23}, [r0] This can be improved by computing address for the first load, and then using a post-indexed form of VLD1/VST1 to load the rest: add r0, r0, #16 vld1.32 {d18, d19}, [r0]! vld1.32 {d22, d23}, [r0] In order to do that, the patch adds more patterns to DAGCombine: - (load (add ptr inc1)) and (add ptr inc2) are now folded if inc1 and inc2 are constants. - (or ptr inc) is now recognized as a pointer increment if ptr is sufficiently aligned. In addition to that, we now search for all possible base updates and then pick the best one. Differential Revision: https://reviews.llvm.org/D108988	2021-10-14 15:23:10 +03:00
Guozhi Wei	6599961c17	[TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation This patch contains the following enhancements to SrcRegMap and DstRegMap: 1 In findOnlyInterestingUse, check not only whether the Reg is a two-address use, but also whether it can become one after commutation. 2 If a physical register is clobbered, remove SrcRegMap entries that are mapped to it. 3 In processTiedPairs, when creating a new COPY instruction, add a SrcRegMap entry only when the COPY instruction is coalescable. (The COPY src is killed) With these enhancements isProfitableToCommute can make better commute decisions, and ultimately more register copies are removed. Differential Revision: https://reviews.llvm.org/D108731	2021-10-11 15:28:31 -07:00
Qiu Chaofan	573531fb1f	Fix typo of colon to semicolon in lit tests	2021-10-09 10:03:50 +08:00
Pengxuan Zheng	b0045f5595	[ARM] Fix a bug in finding a pair of extracts to create VMOVRRD D100244 missed a check on the ResNo of the extract's operand 0 when finding a pair of extracts to combine into a VMOVRRD (extract(x, n); extract(x, n+1) -> VMOVRRD(extract x, n/2)). As a result, it can incorrectly pair an extract(x, n) with another extract(x:3, n+1) for example. This patch fixes the bug by adding the proper check on ResNo. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D111188	2021-10-06 10:03:32 -07:00
David Green	ffaaa9b05c	[ARM] Reset speculation-hardening-sls.ll test checks. The commit `e497b12a69` went and regenerated all the checks lines in the Arm speculation-hardening-sls.ll test in a way that removed most of the important checks. This just resets them back to how they were before, with the single character fix to change: ; NOHARDENARM: {{bxge lr$}} to ; NOHARDENARM: {{bxgt lr$}} Differential Revision: https://reviews.llvm.org/D111074	2021-10-05 10:51:18 +01:00
Amara Emerson	8bde5e58c0	Delay outgoing register assignments to last. The delayed stack protector feature which is currently used for SDAG (and thus allows for more commonly generating tail calls) depends on being able to extract the tail call into a separate return block. To do this it also has to extract the vreg->physreg copies that set up the call's arguments, since if it doesn't then the call inst ends up using undefined physregs in its new spliced block. SelectionDAG implementations can do this because they delay emitting register copies until after the stack arguments are set up. GISel however just processes and emits the arguments in IR order, so stack arguments always end up last, and thus this breaks the code that looks for any register arg copies that precede the call instruction. This patch adds a thunk argument to the assignValueToReg() and custom assignment hooks. For outgoing arguments, register assignments use this return param to return a thunk that does the actual generating of the copies. We collect these until all the outgoing stack assignments have been done and then execute them, so that the copies (and perhaps some artifacts like G_SEXTs) are placed after any stores. Differential Revision: https://reviews.llvm.org/D110610	2021-10-04 12:33:20 -07:00
David Green	20b1a16a69	[ARM] Mark <= -1 immediate constant as cheap A <= -1 constant on a compare can be converted to a < 0 operation, which is usually cheap. If we mark the constant as cheap, preventing hoisting, we allow that fold to happen even across different blocks. Differential Revision: https://reviews.llvm.org/D109360	2021-10-03 19:30:08 +01:00
David Green	d6482df683	[ARM] Tests for constant hoisting -1 immediates	2021-10-03 16:32:31 +01:00
Stanislav Mekhanoshin	08d7eec06e	Revert "Allow rematerialization of virtual reg uses" Reverted due to two distinct performance regression reports. This reverts commit `92c1fd19ab`.	2021-09-24 10:26:11 -07:00
Jay Foad	7863cc6c1c	[LiveIntervals] Fix repairOldRegInRange for simple def cases The fix applied in D23303 "LiveIntervalAnalysis: fix a crash in repairOldRegInRange" was over-zealous. It would bail out when the end of the range to be repaired was in the middle of the first segment of the live range of Reg, which was always the case when the range contained a single def of Reg. This patch fixes it as suggested by Matthias Braun in post-commit review on the original patch, and tests it by adding -early-live-intervals to a selection of existing lit tests that now pass. (Note that D23303 was originally applied to fix a crash in SILoadStoreOptimizer, but that is now moot since D23814 updated SILoadStoreOptimizer to run before scheduling so it no longer has to update live intervals.) Differential Revision: https://reviews.llvm.org/D110238 Unrevert with some changes to the tests: - Add -verify-machineinstrs to check for remaining problems in live interval support in TwoAddressInstructionPass. - Drop test/CodeGen/AMDGPU/extract-load-i1.ll since it suffers from some of those remaining problems.	2021-09-24 11:44:49 +01:00
Jay Foad	deb2ca566a	Revert "[LiveIntervals] Fix repairOldRegInRange for simple def cases" This reverts commit `8229cb7412`. It was failing on buildbots with expensive checks enabled.	2021-09-23 17:55:05 +01:00
Jay Foad	8229cb7412	[LiveIntervals] Fix repairOldRegInRange for simple def cases The fix applied in D23303 "LiveIntervalAnalysis: fix a crash in repairOldRegInRange" was over-zealous. It would bail out when the end of the range to be repaired was in the middle of the first segment of the live range of Reg, which was always the case when the range contained a single def of Reg. This patch fixes it as suggested by Matthias Braun in post-commit review on the original patch, and tests it by adding -early-live-intervals to a selection of existing lit tests that now pass. (Note that D23303 was originally applied to fix a crash in SILoadStoreOptimizer, but that is now moot since D23814 updated SILoadStoreOptimizer to run before scheduling so it no longer has to update live intervals.) Differential Revision: https://reviews.llvm.org/D110238	2021-09-23 17:16:14 +01:00
David Green	c49611f909	Mark CFG as preserved in TypePromotion and InterleaveAccess passes Neither of these passes modify the CFG, allowing us to preserve DomTree and LoopInfo across them by using setPreservesCFG. Differential Revision: https://reviews.llvm.org/D110161	2021-09-22 18:58:00 +01:00
Petar Avramovic	e4c46ddd91	[GlobalISel] Improve elimination of dead instructions in legalizer Add eraseInstr(s) utility functions. Before deleting an instruction collects its use instructions. After deletion deletes use instructions that became trivially dead. This patch clears all dead instructions in existing legalizer mir tests. Differential Revision: https://reviews.llvm.org/D109154	2021-09-20 13:00:58 +02:00
David Green	1da52ef294	[ARM] Add VGETLANEu patterns for v4f16 and v8f16 These were apparently missing, having no pattern that could convert a VGETLANEu of a v4f16 to an i32. Added bf16 whilst here, following the same code.	2021-09-19 14:25:21 +01:00
Alexandros Lamprineas	1bd5ea968e	[ARM] Mitigate the cve-2021-35465 security vulnerability. A vulnerability was recently found in the implementation of the VLLDM instruction in the Arm Cortex-M33, Cortex-M35P and Cortex-M55. If the VLLDM instruction is abandoned due to an exception when it is partially completed, it is possible for a subsequent non-secure handler to access and modify the partially restored register values. This vulnerability is identified as CVE-2021-35465. The mitigation sequence varies between v8-m and v8.1-m as follows: v8-m.main --------- mrs r5, control tst r5, #8 /* CONTROL_S.SFPA */ it ne .inst.w 0xeeb00a40 /* vmovne s0, s0 */ 1: vlldm sp /* Lazy restore of d0-d16 and FPSCR. */ v8.1-m.main ----------- vscclrm {vpr} /* Clear VPR. */ vlldm sp /* Lazy restore of d0-d16 and FPSCR. */ More details on developer.arm.com/support/arm-security-updates/vlldm-instruction-security-vulnerability Differential Revision: https://reviews.llvm.org/D109157	2021-09-16 12:56:43 +01:00
Alexandros Lamprineas	61f25daa8d	[ARM][CMSE] Clear the secure fp-registers when using softfp abi. When expanding the non-secure call instruction we are emitting code to clear the secure floating-point registers only if the targeted architecture has floating-point support. The potential problem is when the source code containing non-secure calls is built with -mfloat-abi=soft but some other part of the system has been built with -mfloat-abi=softfp (soft and softfp are compatible as they use the same procedure calling standard). In this case floating-point registers could leak to non-secure state as the non-secure won't have cleared them assuming no floating point has been used. Differential Revision: https://reviews.llvm.org/D109153	2021-09-16 12:56:43 +01:00
Philip Reames	debbf8049d	autogen a test for ease of update	2021-09-15 11:11:07 -07:00
David Green	a2332d5332	[ARM] Prevent continuous folding of SUBC Under some situations under Thumb1, we could be stuck in an infinite loop recombining the same instruction. This puts a limit on that, not combining SUBC with SUBE repeatedly.	2021-09-15 11:23:32 +01:00
Matt Arsenault	54d755a034	DAG: Fix incorrect folding of fmul -1 to fneg The fmul is a canonicalizing operation, and fneg is not so this would break denormals that need flushing and also would not quiet signaling nans. Fold to fsub instead, which is also canonicalizing.	2021-09-14 21:25:02 -04:00
Matt Arsenault	4a36e96c3f	RegAllocGreedy: Account for reserved registers in num regs heuristic This simple heuristic uses the estimated live range length combined with the number of registers in the class to switch which heuristic to use. This was taking the raw number of registers in the class, even though not all of them may be available. AMDGPU heavily relies on dynamically reserved numbers of registers based on user attributes to satisfy occupancy constraints, so the raw number is highly misleading. There are still a few problems here. In the original testcase that made me notice this, the live range size is incorrect after the scheduler rearranges instructions, since the instructions don't have the original InstrDist offsets. Additionally, I think it would be more appropriate to use the number of disjointly allocatable registers in the class. For the AMDGPU register tuples, there are a large number of registers in each tuple class, but only a small fraction can actually be allocated at the same time since they all overlap with each other. It seems we do not have a query that corresponds to the number of independently allocatable registers. Relatedly, I'm still debugging some allocation failures where overlapping tuples seem to not be handled correctly. The test changes are mostly noise. There are a handful of x86 tests that look like regressions with an additional spill, and a handful that now avoid a spill. The worst looking regression is likely test/Thumb2/mve-vld4.ll which introduces a few additional spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll shows a massive improvement by completely eliminating a large number of spills inside a loop.	2021-09-14 21:00:29 -04:00
Nikita Popov	f5806830e0	[ARM] Support neon.vld auto-upgrade with opaque pointers This code manually constructs the intrinsic name, so we need to use p0 instead of p0i8 in opaque pointer mode.	2021-09-11 16:34:32 +02:00
Arthur Eubanks	fe15347a1e	Port the cost model printer to New PM Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D109284	2021-09-08 14:47:05 -07:00
David Green	d8d24c64fe	[DAG] Fix GT -> GE condition when creating SetCC `79845ed6df` folded some setcc(ashr) conditions to setcc, but got the condition for NE incorrect, using GT where it should be using GE.	2021-09-08 12:41:51 +01:00
Peter Smith	e63455d5e0	[MC] Use local MCSubtargetInfo in writeNops On some architectures such as Arm and X86 the encoding for a nop may change depending on the subtarget in operation at the time of encoding. This change replaces the per module MCSubtargetInfo retained by the targets AsmBackend in favour of passing through the local MCSubtargetInfo in operation at the time. On Arm using the architectural NOP instruction can have a performance benefit on some implementations. For Arm I've deleted the copy of the AsmBackend's MCSubtargetInfo to limit the chances of this causing problems in the future. I've not done this for other targets such as X86 as there is more frequent use of the MCSubtargetInfo and it looks to be for stable properties that we would not expect to vary per function. This change required threading STI through MCNopsFragment and MCBoundaryAlignFragment. I've attempted to take into account the in tree experimental backends. Differential Revision: https://reviews.llvm.org/D45962	2021-09-07 15:46:19 +01:00
Ben Shi	63ca9371c7	[ARM] Implement target hook function to decide folding (mul (add x, c1), c2) Prevent the folding in DAGCombine if it leads to worse code. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D109124	2021-09-07 15:42:43 +08:00
Ben Shi	20f890696f	[ARM][test] Add new tests for (mul (add r, c0), c1) Reviewed By: RKSimon, dmgreen Differential Revision: https://reviews.llvm.org/D109123	2021-09-07 15:42:32 +08:00
David Green	1b83aaaefa	[DAG] Remove oneuse check in select_cc setgt X, -1, C, ~C fold This appears to produce better code, even if the condition may need to be replicated.	2021-09-05 16:18:31 +01:00
David Green	8523fb96a6	[DAG] Fold select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C Given a select_cc producing a constant and an inversion of the constant for a comparison more than zero, we can produce an xor with ashr instead, which produces smaller code. The ashr either sets all bits or clears all bits depending on if the value is negative. This is then xor'd with the constant to optionally negate the value. https://alive2.llvm.org/ce/z/DTFaBZ This includes a OneUseCheck on the Cmp, which seems to make things a little worse and will be removed in a followup. Differential Revision: https://reviews.llvm.org/D109149	2021-09-05 16:04:01 +01:00
David Green	79845ed6df	[DAG] Fold setcc eq with ashr to compare to zero. Pulled out of D109149, this folds set_cc seteq (ashr X, BW-1), -1 -> set_cc setlt X, 0 to prevent some regressions later on when folding select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C Differential Revision: https://reviews.llvm.org/D109214	2021-09-05 14:06:47 +01:00
David Green	7801d7963d	[DAG] Add tests for select_cc and setcc with constant patterns.	2021-09-05 10:17:21 +01:00
David Green	adfd12e6d1	[ARM] Add patterns for store(fptosisat(..)) As an extension to D107866, this adds store(fptosisat(..)) patterns, similar to the existing fptosi patterns, to prevent unnecessarily moving into gpr regs where we can use fp stores directly. Differential Revision: https://reviews.llvm.org/D108378	2021-09-03 19:22:11 +01:00
David Green	f37e132263	[ARM] Add VFP lowering for fptosi.sat This extends D107865 to the VFP instructions, lowering llvm.fptosi.sat and llvm.fptoui.sat to VCVT instructions that inherently perform the saturate. Differential Revision: https://reviews.llvm.org/D107866	2021-09-03 18:11:08 +01:00
David Green	9cb8f4d1ad	[ARM] Add a tail-predication loop predicate register The semantics of tail predication loops means that the value of LR as an instruction is executed determines the predicate. In other words: mov r3, #3 DLSTP lr, r3 // Start tail predication, lr==3 VADD.s32 q0, q1, q2 // Lanes 0,1 and 2 are updated in q0. mov lr, #1 VADD.s32 q0, q1, q2 // Only first lane is updated. This means that the value of lr cannot be spilled and re-used in tail predication regions without potentially altering the behaviour of the program. More lanes than required could be stored, for example, and in the case of a gather those lanes might not have been setup, leading to alignment exceptions. This patch adds a new lr predicate operand to MVE instructions in order to keep a reference to the lr that they use as a tail predicate. It will usually hold the zeroreg meaning not predicated, being set to the LR phi value in the MVETPAndVPTOptimisationsPass. This will prevent it from being spilled anywhere that it needs to be used. A lot of tests needed updating. Differential Revision: https://reviews.llvm.org/D107638	2021-09-02 13:42:58 +01:00
Nick Desaulniers	5c91b98c5d	[ARMISelLowering] avoid emitting libcalls to __mulodi4() __has_builtin(__builtin_mul_overflow) returns true for 32b ARM targets, but Clang is deferring to compiler RT when encountering `long long` types. This breaks sanitizer builds of the Linux kernel that are using __builtin_mul_overflow with these types for these targets. If the semantics of __has_builtin mean "the compiler resolves these, always" then we shouldn't conditionally emit a libcall. This will still need to be worked around in the Linux kernel in order to continue to support allmodconfig builds of the Linux kernel for this target with older releases of clang. Link: https://bugs.llvm.org/show_bug.cgi?id=28629 Link: https://github.com/ClangBuiltLinux/linux/issues/1438 Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D108842	2021-08-27 15:14:47 -07:00
David Green	bd0959354f	[ARM] Add Extra FpToIntSat tests. This adds extra MVE vector fptosi.sat and fptoui.sat tests, along with adding or adjusting the existing scalar tests to cover more architectures and instruction combinations.	2021-08-25 20:10:18 +01:00
Stanislav Mekhanoshin	92c1fd19ab	Allow rematerialization of virtual reg uses Currently isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that is not a trivial rematerialization and that we do not want to extend liveranges. It appears that LRE logic does not attempt to extend a liverange of a source register for rematerialization so that is not an issue. That is checked in the LiveRangeEdit::allUsesAvailableAt(). The only non-trivial aspect of it is accounting for tied-defs which normally represent a read-modify-write operation and not rematerializable. The test for a tied-def situation already exists in the /CodeGen/AMDGPU/remat-vop.mir, test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve. The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets where I more or less understand the asm it seems to reduce spilling (as expected) or be neutral. However, it needs a review by all targets' specialists. Differential Revision: https://reviews.llvm.org/D106408	2021-08-24 11:09:02 -07:00
Petar Avramovic	2bf4eeeeb6	[GlobalISel] Avoid creating COPY in LegalizationArtifactCombiner When Src and Dst used in buildAnyExtOrTrunc or buildSExtOrTrunc have the same type (creates COPY) use Src register directly or use replaceRegOrBuildCopy instead. Differential Revision: https://reviews.llvm.org/D108306	2021-08-24 11:09:56 +02:00
Martin Storsjö	039b469b85	[ARM] Allow using ';' as asm statement separator in MSVC mode This does the same as D96259, but for ARM, just like AArch64, using the same comment char as for ELF and MinGW mode. As the assembly input/output of LLVM is GAS style, trying to match what MS armasm.exe does isn't needed (because the comment char used is the least concern when it comes to that; all directives differ too). If a separate armasm compatible mode is implemented, it can use its own comment style (just like llvm-ml implements MS ml.exe compatible assembly parsing). This fixes building compiler-rt assembly files for ARM in MSVC mode. The updated testcase literals-comments.s was only intended to make sure that '#' isn't interpreted as a comment char. Differential Revision: https://reviews.llvm.org/D107251	2021-08-24 11:01:49 +03:00
David Green	d10f23a25d	[ISel] Expand saddsat and ssubsat via asr and xor This changes the lowering of saddsat and ssubsat so that instead of using: r,o = saddo x, y c = setcc r < 0 s = c ? INTMAX : INTMIN ret o ? s : r into using asr and xor to materialize the INTMAX/INTMIN constants: r,o = saddo x, y s = ashr r, BW-1 x = xor s, INTMIN ret o ? x : r https://alive2.llvm.org/ce/z/TYufgD This seems to reduce the instruction count in most testcases across most architectures. X86 has some custom lowering added to compensate for cases where it can increase instruction count. Differential Revision: https://reviews.llvm.org/D105853	2021-08-19 16:08:07 +01:00
Petr Hosek	2d4470ab89	Revert "Allow rematerialization of virtual reg uses" This reverts commit `877572cc19` which introduced PR51516.	2021-08-18 00:12:41 -07:00
Stanislav Mekhanoshin	877572cc19	Allow rematerialization of virtual reg uses Currently isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that is not a trivial rematerialization and that we do not want to extend liveranges. It appears that LRE logic does not attempt to extend a liverange of a source register for rematerialization so that is not an issue. That is checked in the LiveRangeEdit::allUsesAvailableAt(). The only non-trivial aspect of it is accounting for tied-defs which normally represent a read-modify-write operation and not rematerializable. The test for a tied-def situation already exists in the /CodeGen/AMDGPU/remat-vop.mir, test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve. The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets where I more or less understand the asm it seems to reduce spilling (as expected) or be neutral. However, it needs a review by all targets' specialists. Differential Revision: https://reviews.llvm.org/D106408	2021-08-16 12:42:42 -07:00
Dávid Bolvanský	49de6070a2	Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop" This reverts commit `435785214f`. Still same compile time issues for -O0 -g, eg. +1.3% for sqlite3.	2021-08-15 11:44:13 +02:00
Anshil Gandhi	435785214f	[Remarks] Emit optimization remarks for atomics generating CAS loop Implements ORE in AtomicExpand pass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891	2021-08-14 23:37:23 -06:00
David Green	e8d60e75fc	[ARM] Regenerate ARM neon-copy.ll test. NFC This test didn't include all test check lines, thanks to .'s in function names. It also changed the triple to hard float to make a more interesting test for NEON code generation.	2021-08-09 08:24:28 +01:00
David Green	77e8f4eeee	[ARM] Define ComplexPatternFuncMutatesDAG Some of the Arm complex pattern functions call canExtractShiftFromMul, which can modify the DAG in-place. For this to be valid and handled successfully we need to define ComplexPatternFuncMutatesDAG. Differential Revision: https://reviews.llvm.org/D107476	2021-08-06 17:35:11 +01:00
Simon Pilgrim	dbce6a8d9d	[ARM] Fold insert_subvector to concat_vectors D107068 fixed the same problem on aarch64 but the arm variant wasn't exposed in existing test coverage. I've copied the arm64-neon-copy tests (and stripped the intrinsic test from it) for testing on arm neon builds as well.	2021-08-06 11:21:31 +01:00
Tomas Matheson	40650f27b5	[ARM][atomicrmw] Fix CMP_SWAP_32 expand assert This assert is intended to ensure that the high registers are not selected when it is passed to one of the thumb UXT instructions. However it was triggering even for 32 bit where no UXT instruction is emitted. Fixes PR51313. Differential Revision: https://reviews.llvm.org/D107363	2021-08-04 15:02:02 +01:00
Arthur Eubanks	ad25344620	[MC][CodeGen] Emit constant pools earlier Previously we would emit constant pool entries for ldr inline asm at the very end of AsmPrinter::doFinalization(). However, if we're emitting dwarf aranges, that would end all sections with aranges. Then if we have constant pool entries to be emitted in those same sections, we'd hit an assert that the section has already been ended. We want to emit constant pool entries before emitting dwarf aranges. This patch splits out arm32/64's constant pool entry emission into its own MCTargetStreamer virtual method. Fixes PR51208 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D107314	2021-08-03 20:55:31 -07:00
Guozhi Wei	50b6273145	[MBP] findBestLoopTopHelper should exit if OldTop is not a chain header Function findBestLoopTopHelper tries to find a new loop top block which can also fall through to OldTop, but it's impossible if OldTop is not a chain header, so it should exit immediately. Differential Revision: https://reviews.llvm.org/D106329	2021-07-28 19:00:45 -07:00
Johannes Doerfert	25a3130d89	[Local] Do not introduce a new `llvm.trap` before `unreachable` This is the second attempt to remove the `llvm.trap` insertion after https://reviews.llvm.org/rGe14e7bc4b889dfaffb7180d176a03311df2d4ae6 reverted the first one. It is not clear what the exact issue was back then and it might already be gone by now, it has been >5 years after all. Replaces D106299. Differential Revision: https://reviews.llvm.org/D106308	2021-07-26 23:33:36 -05:00
Tim Northover	19d2e42be2	ARM: don't return by popping PC if we have to adjust the stack afterwards. In mandatory tail calling conventions we might have to deallocate stack space used by our arguments before return. This happens after popping CSRs, so the pop cannot be turned into the return itself in this case. The else branch here was already a nop, so removing it as a tidy-up.	2021-07-21 09:35:14 +01:00
David Green	5561ad8b36	[ARM] Remove PromotedBitwiseVT for NEON types This removes the promotion of NEON AND, OR and XOR nodes to v2i32/v4i32, treating them the same as the AArch64 and MVE backends where we just add the relevant patterns for each legal type. This prevents a lot of bitcasts from being added to the DAG, which have the potential to make optimizations more difficult. It does mean adding extra patterns, and some codegen can change due to the types now being legal, not promoted. Differential Revision: https://reviews.llvm.org/D105588	2021-07-19 16:36:33 +01:00
Matt Arsenault	e91da668d0	GlobalISel: Track argument pointeriness with arg flags Since we're still building on top of the MVT based infrastructure, we need to track the pointer type/address space on the side so we can end up with the correct pointer LLTs when interpreting CCValAssigns.	2021-07-15 19:11:40 -04:00
Tim Northover	b18bda6791	ARM: reuse existing libcall global variable if possible. If we try to create a new GlobalVariable on each iteration, the Module will detect the name collision and "helpfully" rename later iterations by appending ".1" etc. But "___udivsi3.1" doesn't exist and we definitely don't want to try to call it. So instead check whether there's already a global with the right name in the module and use that if so.	2021-07-14 14:14:47 +01:00
Djordje Todorovic	df686842bc	[RemoveRedundantDebugValues] Add a Pass that removes redundant DBG_VALUEs This new MIR pass removes redundant DBG_VALUEs. After the register allocator is done, more precisely, after the Virtual Register Rewriter, we end up having duplicated DBG_VALUEs, since some virtual registers are being rewritten into the same physical register as some of existing DBG_VALUEs. Each DBG_VALUE should indicate (at least before the LiveDebugValues) variable assignment, but it is being clobbered for function parameters during the SelectionDAG since it generates new DBG_VALUEs after COPY instructions, even though the parameter has no assignment. For example, if we had a DBG_VALUE $regX as an entry debug value representing the parameter, and a COPY and after the COPY, DBG_VALUE $virt_reg, and after the virtregrewrite the $virt_reg gets rewritten into $regX, we'd end up having redundant DBG_VALUE. This breaks the definition of the DBG_VALUE since some analysis passes might be built on top of that premise..., and this patch tries to fix the MIR with respect to that. This first patch performs a backward scan, by trying to detect a sequence of consecutive DBG_VALUEs, and to remove all DBG_VALUEs describing one variable but the last one: For example: (1) DBG_VALUE $edi, !"var1", ... (2) DBG_VALUE $esi, !"var2", ... (3) DBG_VALUE $edi, !"var1", ... ... in this case, we can remove (1). By combining the forward scan that will be introduced in the next patch (from this stack), by inspecting the statistics, the RemoveRedundantDebugValues removes 15032 instructions by using gdb-7.11 as a testbed. Differential Revision: https://reviews.llvm.org/D105279	2021-07-14 04:29:42 -07:00
Daniel Egger	98c2e4115d	[ARM] Add lowering of uadd_sat to uq{add|sub}8 and uq{add|sub}16 This follows the lead of https://reviews.llvm.org/D68974 to add lowering of unsigned saturated addition/subtraction. Differential Revision: https://reviews.llvm.org/D105413	2021-07-11 15:58:11 +01:00
David Green	dc0bbc9d89	[IfCvt] Don't use pristine register for counting liveins for predicated instructions. The test case here hits machine verifier problems. There are volatile long loads whose results do not get used, loading into two dead registers. IfCvt will predicate them and as it does will add implicit uses of the predicating registers due to thinking they are live in. As nothing has used the register, the machine verifier disagrees that they are really live and we end up with a failure. The registers come from Pristine regs that LivePhysRegs counts as live. This patch adds an addLiveInsNoPristines method to be used instead in IfCvt, so that only really live in regs need to be added as implicit operands. Differential Revision: https://reviews.llvm.org/D90965	2021-07-11 14:45:54 +01:00
David Green	4ce26deac2	[DAG] Reassociate Add with Or We already have reassociation code for Adds and Ors separately in DAG combiner, this adds it for the combination of the two where Ors act like Adds. It reassociates (add (or x, c), y) -> (add (add x, y), c) where we know that the Or's operands have no common bits set, and the Or has one use. Differential Revision: https://reviews.llvm.org/D104765	2021-07-07 10:21:07 +01:00
Eli Friedman	7ac1c7bead	Recommit [ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers. As part of making ScalarEvolution's handling of pointers consistent, we want to forbid multiplying a pointer by -1 (or any other value). This means we can't blindly subtract pointers. There are a few ways we could deal with this: 1. We could completely forbid subtracting pointers in getMinusSCEV() 2. We could forbid subtracting pointers with different pointer bases (this patch). 3. We could try to ptrtoint pointer operands. The option in this patch is more friendly to non-integral pointers: code that works with normal pointers will also work with non-integral pointers. And it seems like there are very few places that actually benefit from the third option. As a minimal patch, the ScalarEvolution implementation of getMinusSCEV still ends up subtracting pointers if they have the same base. This should eliminate the shared pointer base, but eventually we'll need to rewrite it to avoid negating the pointer base. I plan to do this as a separate step to allow measuring the compile-time impact. This doesn't cause obvious functional changes in most cases; the one case that is significantly affected is ICmpZero handling in LSR (which is the source of almost all the test changes). The resulting changes seem okay to me, but suggestions welcome. As an alternative, I tried explicitly ptrtoint'ing the operands, but the result doesn't seem obviously better. I deleted the test lsr-undef-in-binop.ll because I couldn't figure out how to repair it to test what it was actually trying to test. Recommitting with fix to MemoryDepChecker::isDependent. Differential Revision: https://reviews.llvm.org/D104806	2021-07-06 12:16:05 -07:00
David Green	be0924ad17	[Tests] Update some tests for D104765. NFC	2021-07-06 19:23:52 +01:00
Eli Friedman	a6d081b2cb	Revert "[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers." This reverts commit `74d6ce5d5f`. Seeing crashes on buildbots in MemoryDepChecker::isDependent.	2021-07-06 11:17:13 -07:00
Eli Friedman	74d6ce5d5f	[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers. As part of making ScalarEvolution's handling of pointers consistent, we want to forbid multiplying a pointer by -1 (or any other value). This means we can't blindly subtract pointers. There are a few ways we could deal with this: 1. We could completely forbid subtracting pointers in getMinusSCEV() 2. We could forbid subtracting pointers with different pointer bases (this patch). 3. We could try to ptrtoint pointer operands. The option in this patch is more friendly to non-integral pointers: code that works with normal pointers will also work with non-integral pointers. And it seems like there are very few places that actually benefit from the third option. As a minimal patch, the ScalarEvolution implementation of getMinusSCEV still ends up subtracting pointers if they have the same base. This should eliminate the shared pointer base, but eventually we'll need to rewrite it to avoid negating the pointer base. I plan to do this as a separate step to allow measuring the compile-time impact. This doesn't cause obvious functional changes in most cases; the one case that is significantly affected is ICmpZero handling in LSR (which is the source of almost all the test changes). The resulting changes seem okay to me, but suggestions welcome. As an alternative, I tried explicitly ptrtoint'ing the operands, but the result doesn't seem obviously better. I deleted the test lsr-undef-in-binop.ll because I couldn't figure out how to repair it to test what it was actually trying to test. Differential Revision: https://reviews.llvm.org/D104806	2021-07-06 10:54:41 -07:00
Roman Lebedev	261c56f80b	[NFC][Codegen] Tune a few tests to not end with a naked `unreachable` terminator These rely on the fact that currently simplifycfg won't really propagate said `unreachable`, but that is about to change.	2021-07-02 23:33:30 +03:00
David Green	3d48775b89	[ARM] Reassociate BFI D104868 removed an (incorrect) fold for distributing BFI instructions in a chain, combining them into a single instruction. BFIs like that are hard to test, as the patterns are often destroyed before they become BFIs. But it can come up in places, with chains of BFIs that can be combined. This patch adds a replacement, which reassociates BFI instructions with non-overlapping insertion masks so that low bits are inserted first. This can end up sorting the nodes so that adjacent inserts are next to one another, allowing the existing folds to combine into a single BFI. Differential Revision: https://reviews.llvm.org/D105096	2021-07-01 21:08:13 +01:00
David Green	42d7d52314	[ARM] Extra BFI codegen tests. NFC	2021-07-01 16:56:23 +01:00
Matt Arsenault	28f2f66200	GlobalISel: Use LLT in memory legality queries This enables proper lowering of non-byte sized loads. We still aren't faithfully preserving memory types everywhere, so the legality checks still only consider the size.	2021-06-30 17:44:13 -04:00
Matt Arsenault	fae05692a3	CodeGen: Print/parse LLTs in MachineMemOperands This will currently accept the old number of bytes syntax, and convert it to a scalar. This should be removed in the near future (I think I converted all of the tests already, but likely missed a few). Not sure what the exact syntax and policy should be. We can continue printing the number of bytes for non-generic instructions to avoid test churn and only allow non-scalar types for generic instructions. This will currently print the LLT in parentheses, but accept parsing the existing integers and implicitly converting to scalar. The parentheses are a bit ugly, but the parser logic seems unable to deal without either parentheses or some keyword to indicate the start of a type.	2021-06-30 16:54:13 -04:00
David Green	cd76f43b49	[ARM] Set the immediate cost of GEP operands to 0 This prevents constant gep operands from being hoisted by the Constant Hoisting pass, leaving them to CodegenPrepare which can usually do a better job at splitting large offsets. This can, in general, improve performance and decrease codesize, especially for v6m where many constants have a high cost. Differential Revision: https://reviews.llvm.org/D104877	2021-06-30 19:19:03 +01:00
Igor Kudrin	657e067bb5	[ARMInstPrinter] Print the target address of a branch instruction This follows other patches that changed printing immediate values of branch instructions to target addresses, see D76580 (x86), D76591 (PPC), D77853 (AArch64). As observing immediate values might sometimes be useful, they are printed as comments for branch instructions. // llvm-objdump -d output (before) 000200b4 <_start>: 200b4: ff ff ff fa blx #-4 <thumb> 000200b8 <thumb>: 200b8: ff f7 fc ef blx #-8 <_start> // llvm-objdump -d output (after) 000200b4 <_start>: 200b4: ff ff ff fa blx 0x200b8 <thumb> @ imm = #-4 000200b8 <thumb>: 200b8: ff f7 fc ef blx 0x200b4 <_start> @ imm = #-8 // GNU objdump -d. 000200b4 <_start>: 200b4: faffffff blx 200b8 <thumb> 000200b8 <thumb>: 200b8: f7ff effc blx 200b4 <_start> Differential Revision: https://reviews.llvm.org/D104701	2021-06-30 16:35:28 +07:00
David Green	aaf6a7ac34	[ARM] Extra test for gep immediate costs. NFC	2021-06-29 16:51:47 +01:00
David Green	371ee32e01	[ARM] Fold extract of ARM_BUILD_VECTOR This adds a small fold for extract (ARM_BUILD_VECTOR) to fold to the original node. This can help simplify the resulting codegen in some cases. Differential Revision: https://reviews.llvm.org/D104860	2021-06-29 11:03:19 +01:00
David Green	991a88b177	[ARM] Regenerate big-endian-vector-caller.ll test checks. NFC	2021-06-26 13:21:54 +01:00
Amara Emerson	f9b3840c3d	[ARM] Fix crash in chained BFI combine due to incorrectly RAUW'ing a node. For a bfi chain like: a = bfi input, x, y b = bfi a, x', y' The previous code was RAUW'ing a with x, mutating the second 'b' bfi, and when SelectionDAG's CSE code ended up deleting it unexpectedly, bad things happened. There's no need to RAUW in this case because we can just return our newly created replacement BFI node. It also looked incorrect because it didn't account for other users of the 'a' bfi. Since it seems that chains of more than 2 BFI nodes are hard/impossible to produce without this combine kicking in at some point, I've removed that functionality since it had no test coverage. rdar://79095399 Differential Revision: https://reviews.llvm.org/D104868	2021-06-24 23:35:47 -07:00
Craig Topper	03f9e04bc3	[TargetLowering][ARM] Don't alter opaque constants in TargetLowering::ShrinkDemandedConstant. We don't constant fold based on demanded bits elsewhere in SimplifyDemandedBits, so I don't think we should shrink them either. The affected ARM test changes because a constant became non-opaque and eventually enabled some constant folding. This no longer happens. I checked and InstCombine is able to simplify this test. I'm not sure exactly what it was trying to test. Reviewed By: lebedev.ri, dmgreen Differential Revision: https://reviews.llvm.org/D104832	2021-06-24 10:09:36 -07:00
Roman Lebedev	9c4c2f2472	[SimplifyCFG] Tail-merging all blocks with `ret` terminator Based on top of D104598, which is a NFCI-ish refactoring. Here, a restriction, that only empty blocks can be merged, is lifted. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D104597	2021-06-24 13:15:39 +03:00
Roman Lebedev	3c94869632	[NFC][ARM] Fix update_llc_test_checks for aarch64-apple-ios/thumbv7s-apple-darwin, autogenerate a few tests	2021-06-23 16:31:19 +03:00
Roman Lebedev	15be15073e	[NFC][ARM] Fix update_llc_test_checks for thumbv7-apple-ios, autogenerate switch-minsize.ll	2021-06-23 16:31:19 +03:00
Roman Lebedev	4de0c40031	[NFC][ARM] Fix update_llc_test_checks for armv7-apple-ios, autogenerate ifcvt5.ll/ifcvt6.ll	2021-06-23 16:31:19 +03:00
Roman Lebedev	ff4b1d379f	[NFCI-ish][SimplifyCFGPass] Rework and generalize `ret` block tail-merging This changes the approach taken to tail-merge the blocks to always create a new block instead of trying to reuse some block, and generalizes it to support dealing not with just the `ret` in the future. This effectively lifts the CallBr restriction, although this isn't really intentional. That is the only non-NFC change here, i'm not sure if it's reasonable/feasible to temporarily retain it. Other restrictions of the transform remain. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D104598	2021-06-23 14:33:18 +03:00
Fangrui Song	f53d791520	Improve the diagnostic of DiagnosticInfoResourceLimit (and warn-stack-size in particular) Before: `warning: stack size limit exceeded (888) in main` After: `warning: stack frame size (888) exceeds limit (100) in function 'main'` (the -Wframe-larger-than limit will be mentioned) Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D104667	2021-06-22 09:55:20 -07:00
Nick Desaulniers	8ace121305	[IR] convert warn-stack-size from module flag to fn attr Otherwise, this causes issues when building with LTO for object files that use different values. Link: https://github.com/ClangBuiltLinux/linux/issues/1395 Reviewed By: dblaikie, MaskRay Differential Revision: https://reviews.llvm.org/D104342	2021-06-21 15:09:25 -07:00
Roman Lebedev	e497b12a69	[NFC][AArch64][ARM][Thumb][Hexagon] Autogenerate some tests These all (and some others) are being affected by D104597, but they are manually-written, which rather complicates checking the effect that change has on them.	2021-06-20 14:12:45 +03:00
Bjorn Pettersson	4c7f820b2b	Update @llvm.powi to handle different int sizes for the exponent This can be seen as a follow up to commit `0ee439b705`, that changed the second argument of __powidf2, __powisf2 and __powitf2 in compiler-rt from si_int to int. That was to align with how those runtimes are defined in libgcc. One thing that seems to have been missing in that patch was to make sure that the rest of LLVM also handles that the argument now depends on the size of int (not using the si_int machine mode for 32-bit). When using __builtin_powi for a target with 16-bit int clang crashed. And when emitting libcalls to those rtlib functions (typically when lowering @llvm.powi), the backend would always prepare the exponent argument as an i32 which caused miscompiles when the rtlib was compiled with 16-bit int. The solution used here is to use an overloaded type for the second argument in @llvm.powi. This way clang can use the "correct" type when lowering __builtin_powi, and then later when emitting the libcall it is assumed that the type used in @llvm.powi matches the rtlib function. One thing that needed some extra attention was that when vectorizing calls several passes did not support that several arguments could be overloaded in the intrinsics. This patch allows overload of a scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for powi. Differential Revision: https://reviews.llvm.org/D99439	2021-06-17 09:38:28 +02:00
Kristina Bessonova	f6b9836b09	[ARM][NEON] Combine base address updates for vld1Ndup intrinsics Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D103836	2021-06-13 11:18:32 +02:00
Koutheir Attouchi	789708617d	Do not generate calls to the 128-bit function __multi3() on 32-bit ARM Re-applying this patch after bots failures. Should be fine now. The function __multi3() is undefined on 32-bit ARM, so a call to it should never be emitted. Instead, plain instructions need to be generated to perform 128-bit multiplications. Differential Revision: https://reviews.llvm.org/D103906	2021-06-11 11:45:21 +01:00
Nick Desaulniers	fc018ebb60	[IR] make -warn-frame-size into a module attr -Wframe-larger-than= is an interesting warning; we can't know the frame size until PrologueEpilogueInsertion (PEI); very late in the compilation pipeline. -Wframe-larger-than= was propagated through CC1 as an -mllvm flag, then was a cl::opt in LLVM's PEI pass; this meant it was dropped during LTO and needed to be re-specified via -plugin-opt. Instead, make it part of the IR proper as a module level attribute, similar to D103048. Introduce -fwarn-stack-size CC1 option. Reviewed By: rsmith, qcolombet Differential Revision: https://reviews.llvm.org/D103928	2021-06-10 16:15:27 -07:00
Nico Weber	68a1d9a1f5	Revert "Do not generate calls to the 128-bit function __multi3() on 32-bit ARM" This reverts commit `64e9aa3302`. Breaks check-llvm everywhere, see https://reviews.llvm.org/D103906	2021-06-09 13:21:05 -04:00
Koutheir Attouchi	64e9aa3302	Do not generate calls to the 128-bit function __multi3() on 32-bit ARM The function __multi3() is undefined on 32-bit ARM, so a call to it should never be emitted. Instead, plain instructions need to be generated to perform 128-bit multiplications. Differential Revision: https://reviews.llvm.org/D103906	2021-06-09 16:21:16 +01:00
Yvan Roux	6c78dbd4ca	[ARM] Fix Machine Outliner LDRD/STRD handling in Thumb mode. This is a fix for PR50481. Immediate values for AddrModeT2_i8s4 are already scaled in MCinst operand. This patch changes the number of bits and scale factor to reflect that state when checking stack offset status. AddrModeT2_i7s[2|4] also have this particularity but since MVE instructions are not outlined, just move these cases to the unhandled ones. Differential Revision: https://reviews.llvm.org/D103167	2021-06-09 15:37:21 +02:00
Arthur Eubanks	8815ce03e8	Remove "Rewrite Symbols" from codegen pipeline It breaks up the function pass manager in the codegen pipeline. With empty parameters, it looks at the -mllvm flag -rewrite-map-file. This is likely not in use. Add a check that we only have one function pass manager in the codegen pipeline. Some tests relied on the fact that we had a module pass somewhere in the codegen pipeline. addr-label.ll crashes on ARM due to this change. This is because an ARMConstantPoolConstant containing a BasicBlock to represent a blockaddress may hold an invalid pointer to a BasicBlock if the blockaddress is invalidated by its BasicBlock getting removed. In that case all referencing blockaddresses are RAUW a constant int. Making ARMConstantPoolConstant::CVal a WeakVH fixes the crash, but I'm not sure that's the right fix. As a workaround, create a barrier right before ISel so that IR optimizations can't happen while an ARMConstantPoolConstant has been created. Reviewed By: rnk, MaskRay, compnerd Differential Revision: https://reviews.llvm.org/D99707	2021-05-31 08:32:36 -07:00
Tim Northover	9ff2eb1ea5	SwiftTailCC: teach verifier musttail rules applicable to this CC. SwiftTailCC has a different set of requirements than the C calling convention for a tail call. The exact argument sequence doesn't have to match, but fewer ABI-affecting attributes are allowed. Also make sure the musttail diagnostic triggers if a musttail call isn't actually a tail call.	2021-05-28 11:12:00 +01:00
Tim Northover	d88f96dff3	ARM: support mandatory tail calls for tailcc & swifttailcc This adds support for callee-pop conventions to the ARM backend so that it can ensure a call marked "tail" is actually a tail call.	2021-05-28 11:10:51 +01:00
Kristina Bessonova	44843e2a04	[ARM][NEON] Combine base address updates for vld1x intrinsics Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D102855	2021-05-25 11:06:39 +02:00
David Spickett	0cd2629d97	[llvm][ARM] Remove non-existent arm1176j-s CPU This was removed in https://reviews.llvm.org/D52594 for clang. The one test using it has been updated to use the mpcore CPU as the linked clang change does. This is part of fixing https://bugs.llvm.org/show_bug.cgi?id=50454. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D103022	2021-05-25 08:56:55 +00:00
serge-sans-paille	4ab3041acb	Revert "[NFC] remove explicit default value for strboolattr attribute in tests" This reverts commit `bda6e5bee0`. See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance	2021-05-24 19:43:40 +02:00
serge-sans-paille	bda6e5bee0	[NFC] remove explicit default value for strboolattr attribute in tests Since `d6de1e1a71`, no attribute is equivalent to setting the attribute to false. This is a preliminary commit for https://reviews.llvm.org/D99080	2021-05-24 19:31:04 +02:00
Daniel Kiss	801ab71032	[ARM][AArch64] SLSHardening: make non-comdat thunks possible Linker scripts might not handle COMDAT sections. SLSHardening adds a new section for each __llvm_slsblr_thunk_xN. This new option allows the generation of the thunks into the normal text section to handle these exceptional cases. ,comdat or ,noncomdat can be added to harden-sls to control the codegen. -mharden-sls=[all|retbr|blr],nocomdat. Reviewed By: kristof.beyls Differential Revision: https://reviews.llvm.org/D100546	2021-05-20 17:07:05 +02:00
Kristina Bessonova	d59a2a32b9	[ARM][NEON] Combine base address updates for vst1x intrinsics Differential Revision: https://reviews.llvm.org/D102256	2021-05-19 14:05:55 +02:00
Arthur Eubanks	1c7f32334d	[TargetLowering] Only inspect attributes in the arguments for ArgListEntry Parameter attributes are considered part of the function [1], and like mismatched calling conventions [2], we can't have the verifier check for mismatched parameter attributes. This is a reland after fixing MSan issues in D102667. [1] https://llvm.org/docs/LangRef.html#parameter-attributes [2] https://llvm.org/docs/FAQ.html#why-does-instcombine-simplifycfg-turn-a-call-to-a-function-with-a-mismatched-calling-convention-into-unreachable-why-not-make-the-verifier-reject-it Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D101806	2021-05-18 14:30:22 -07:00
Arthur Eubanks	341902672c	Revert "[TargetLowering] Only inspect attributes in the arguments for ArgListEntry" This reverts commit `16748bd2fb`. Causes https://crbug.com/1209013	2021-05-16 22:02:10 -07:00
David Green	dd5c52029d	[CPG][ARM] Optimize towards branch on zero in codegenprepare This adds a simple fold into codegenprepare that converts comparison of branches towards comparison with zero if possible. For example: %c = icmp ult %x, 8 br %c, bla, blb %tc = lshr %x, 3 becomes %tc = lshr %x, 3 %c = icmp eq %tc, 0 br %c, bla, blb As a first order approximation, this can reduce the number of instructions needed to perform the branch as the shift is (often) needed anyway. At the moment this does not affect very much, as llvm tends to prefer the opposite form. But it can protect against regressions from commits like rG9423f78240a2. Simple cases of Add and Sub are added along with Shift, equally as the comparison to zero can often be folded with cpsr flags. Differential Revision: https://reviews.llvm.org/D101778	2021-05-16 17:54:06 +01:00
David Green	d539357e1b	[ARM] Extra branch on zero tests. NFC	2021-05-16 17:22:52 +01:00
Tomas Matheson	34c098b780	[ARM] Prevent spilling between ldrex/strex pairs Based on the same for AArch64: `4751cadcca` At -O0, the fast register allocator may insert spills between the ldrex and strex instructions inserted by AtomicExpandPass when expanding atomicrmw instructions in LL/SC loops. To avoid this, expand to cmpxchg loops and therefore expand the cmpxchg pseudos after register allocation. Required a tweak to ARMExpandPseudo::ExpandCMP_SWAP to use the 4-byte encoding of UXT, since the pseudo instruction can be allocated a high register (R8-R15) which the 2-byte encoding doesn't support. However, the 4-byte encodings are not present for ARM v8-M Baseline. To enable this, two new pseudos are added for Thumb which are only valid for v8mbase, tCMP_SWAP_8 and tCMP_SWAP_16. The previously committed attempt in D101164 had to be reverted due to runtime failures in the test suites. Rather than spending time fixing that implementation (adding another implementation of atomic operations and more divergence between backends) I have chosen to follow the approach taken in D101163. Differential Revision: https://reviews.llvm.org/D101898 Depends on D101912	2021-05-12 09:43:21 +01:00
Tomas Matheson	edf9d88266	[ARM] Precommit test for D101898 Differential Revision: https://reviews.llvm.org/D101912	2021-05-12 09:43:21 +01:00
Arthur Eubanks	16748bd2fb	[TargetLowering] Only inspect attributes in the arguments for ArgListEntry Parameter attributes are considered part of the function [1], and like mismatched calling conventions [2], we can't have the verifier check for mismatched parameter attributes. [1] https://llvm.org/docs/LangRef.html#parameter-attributes [2] https://llvm.org/docs/FAQ.html#why-does-instcombine-simplifycfg-turn-a-call-to-a-function-with-a-mismatched-calling-convention-into-unreachable-why-not-make-the-verifier-reject-it Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D101806	2021-05-10 12:35:11 -07:00
Momchil Velikov	5c7b43aa82	[clang][AArch32] Correctly align HA arguments when passed on the stack Analogously to https://reviews.llvm.org/D98794 this patch uses the `alignstack` attribute to fix incorrect passing of homogeneous aggregate (HA) arguments on AArch32. The EABI/AAPCS was recently updated to clarify how VFP co-processor candidates are aligned: `4488e34998` Differential Revision: https://reviews.llvm.org/D100853	2021-05-10 16:28:46 +01:00
David Green	76786037c6	[ARM] Fix postinc of vst1xN These nodes are not handled correctly by CombineBaseUpdate. For the moment, similar to `5f1cad4d29` mark them as unsupported.	2021-05-09 21:57:55 +01:00
Matt Arsenault	fa0b93b5a0	GlobalISel: Use DAG call lowering infrastructure in a more compatible way Unfortunately the current call lowering code is built on top of the legacy MVT/DAG based code. However, GlobalISel was not using it the same way. In short, the DAG passes legalized types to the assignment function, and GlobalISel was passing the original raw type if it was simple. I do believe the DAG lowering is conceptually broken since it requires picking a type up front before knowing how/where the value will be passed. This ends up being a problem for AArch64, which wants to pass i1/i8/i16 values as a different size if passed on the stack or in registers. The argument type decision is split across 3 different places which is hard to follow. SelectionDAG builder uses getRegisterTypeForCallingConv to pick a legal type, tablegen gives the illusion of controlling the type, and the target may have additional hacks in the C++ part of the call lowering. AArch64 hacks around this by not using the standard AnalyzeFormalArguments and special casing i1/i8/i16 by looking at the underlying type of the original IR argument. I believe people have generally assumed the calling convention code is processing the original types, and I've discovered a number of dead paths in several targets. x86 actually relies on the opposite behavior from AArch64, and relies on x86_32 and x86_64 sharing calling convention code where the 64-bit cases implicitly do not work on x86_32 due to using the pre-legalized types. AMDGPU targets without legal i16/f16 have always used a broken ABI that promotes to i32/f32. GlobalISel accidentally fixed this to be the ABI we should have, but this fixes it so we're using the worse ABI that is compatible with the DAG. Ideally we would fix the DAG to match the old GlobalISel behavior, but I don't wish to fight that battle. A new native GlobalISel call lowering framework should let the target process the incoming types directly. 
CCValAssigns select a "ValVT" and "LocVT" but the meanings of these aren't entirely clear. Different targets don't use them consistently, even within their own call lowering code. My current belief is the intent was "ValVT" is supposed to be the legalized value type to use in the end, and LocVT was supposed to be the ABI passed type (which is also legalized). With the default CCState::Analyze functions always passing the same type for these arguments, these only differ when the TableGen part of the lowering decides to promote the type from one legal type to another. AArch64's i1/i8/i16 hack ends up inverting the meanings of these values, so I had to add an additional hack to let the target interpret how large the argument memory is. Since targets don't consistently interpret ValVT and LocVT, this doesn't produce quite equivalent code to the initial DAG lowerings. I've opted to consistently interpret LocVT as the in-memory size for stack passed values, and ValVT as the register type to assign from that memory. We therefore produce extending loads directly out of the IRTranslator, whereas the DAG would emit regular loads of smaller values. This will also produce loads/stores that are wider than the argument value if the allocated stack slot is larger (and there will be undef padding bytes). If we had the optimizations to reduce load/stores based on truncated values, this wouldn't produce a different end result. Since ValVT/LocVT are more consistently interpreted, we now will emit more G_BITCASTS as requested by the CCAssignFn. For example AArch64 was directly assigning types to some physical vector registers which according to the tablegen spec should have been casted to a vector with a different element type. This also moves the responsibility for inserting G_ASSERT_SEXT/G_ASSERT_ZEXT from the target ValueHandlers into the generic code, which is closer to how SelectionDAGBuilder works. 
I had to xfail an x86 test since I don't see a quick way to fix it right now (I filed bug 50035 for this). It's broken independently of this change, and only triggers since now we end up with more ands which hit the improperly handled selection pattern. I also observed that FP arguments that need promotion (e.g. f16 passed as f32) are broken, and use regular G_TRUNC and G_ANYEXT. TLDR; the current call lowering infrastructure is bad and nobody has ever understood how it chooses types.	2021-05-05 17:35:02 -04:00
Simon Moll	1db4dbba24	Recommit "[VP,Integer,#2] ExpandVectorPredication pass" This reverts the revert `02c5ba8679` Fix: Pass was registered as DUMMY_FUNCTION_PASS causing the newpm-pass functions to be doubly defined. Triggered in -DLLVM_ENABLE_MODULES=1 builds. Original commit: This patch implements expansion of llvm.vp.* intrinsics (https://llvm.org/docs/LangRef.html#vector-predication-intrinsics). VP expansion is required for targets that do not implement VP code generation. Since expansion is controllable with TTI, targets can switch on the VP intrinsics they do support in their backend offering a smooth transition strategy for VP code generation (VE, RISC-V V, ARM SVE, AVX512, ..). Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D78203	2021-05-04 11:47:52 +02:00
Tomas Matheson	9d86095ff8	Revert "[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0" This reverts commit `753185031d`.	2021-05-03 21:48:20 +01:00
Tomas Matheson	753185031d	[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0 atomicrmw instructions are expanded by AtomicExpandPass before register allocation into cmpxchg loops. Register allocation can insert spills between the exclusive loads and stores, which invalidates the exclusive monitor and can lead to infinite loops. To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them after register allocation. Floating point legalisation: f16 ATOMIC_LOAD_FADD(f16, f16) is legalised to f32 ATOMIC_LOAD_FADD(i16, f32) and then eventually f32 ATOMIC_LOAD_FADD_16(*i16, f32) Differential Revision: https://reviews.llvm.org/D101164 Originally submitted as `3338290c18`. Reverted in `c7df6b1223`.	2021-05-03 20:25:15 +01:00
Adrian Prantl	02c5ba8679	Revert "[VP,Integer,#2] ExpandVectorPredication pass" This reverts commit `43bc584dc0`. The commit broke the -DLLVM_ENABLE_MODULES=1 builds. http://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/31603/consoleFull#2136199809a1ca8a51-895e-46c6-af87-ce24fa4cd561	2021-04-30 17:02:28 -07:00
Tomas Matheson	c7df6b1223	Revert "[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0" This reverts commit `3338290c18`. Broke expensive checks on debian.	2021-04-30 16:53:14 +01:00
Tomas Matheson	3338290c18	[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0 atomicrmw instructions are expanded by AtomicExpandPass before register allocation into cmpxchg loops. Register allocation can insert spills between the exclusive loads and stores, which invalidates the exclusive monitor and can lead to infinite loops. To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them after register allocation. Floating point legalisation: f16 ATOMIC_LOAD_FADD(f16, f16) is legalised to f32 ATOMIC_LOAD_FADD(i16, f32) and then eventually f32 ATOMIC_LOAD_FADD_16(*i16, f32) Differential Revision: https://reviews.llvm.org/D101164	2021-04-30 16:40:33 +01:00
Simon Moll	43bc584dc0	[VP,Integer,#2] ExpandVectorPredication pass This patch implements expansion of llvm.vp.* intrinsics (https://llvm.org/docs/LangRef.html#vector-predication-intrinsics). VP expansion is required for targets that do not implement VP code generation. Since expansion is controllable with TTI, targets can switch on the VP intrinsics they do support in their backend offering a smooth transition strategy for VP code generation (VE, RISC-V V, ARM SVE, AVX512, ..). Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D78203	2021-04-30 15:47:28 +02:00
David Green	94c7bd7eb2	[ARM] Expand VMOVRRD simplification pattern This expands the VMOVRRD(extract(..(build_vector(a, b, c, d)))) pattern, to also handle insert_vectors. Providing we can find the correct insert, this helps further simplify patterns by removing the redundant VMOVRRD. Differential Revision: https://reviews.llvm.org/D100245	2021-04-26 12:27:38 +01:00
Dávid Bolvanský	ef2dc7ed9f	[Analysis] Attribute alignment should not prevent tail call optimization Fixes tail folding issue mentioned in D100879. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D101230	2021-04-24 19:57:42 +02:00
David Green	21a8b9d9e9	[ARM] Limit PerformExtractEltToVMOVRRD to when f64 is legal. The generic SoftFloatVectorExtract.ll test was failing when run on arm machines, as it tries to create a f64 under soft float. Limit the transform to when f64 is legal. Also add a missing override, as reported in D100244.	2021-04-20 16:24:36 +01:00
David Green	48cef1fa8e	[ARM] Create VMOVRRD from adjacent vector extracts This adds a combine for extract(x, n); extract(x, n+1) -> VMOVRRD(extract x, n/2). This allows two vector lanes to be moved at the same time in a single instruction, and thanks to the other VMOVRRD folds we have added recently can help reduce the amount of executed instructions. Floating point types are very similar, but will include a bitcast to an integer type. This also adds a shouldRewriteCopySrc, to prevent copy propagation from DPR to SPR, which can break as not all DPR regs can be extracted from directly. Otherwise the machine verifier is unhappy. Differential Revision: https://reviews.llvm.org/D100244	2021-04-20 15:15:43 +01:00
David Green	806b47ade3	[ARM] Regenerate a couple of tests. NFC	2021-04-20 10:54:41 +01:00
David Penry	ca8eef7e3d	[CodeGen] Use ProcResGroup information in SchedBoundary When the ProcResGroup has BufferSize=0, 1. if there is a subunit in the list of write resources for the scheduling class, do not attempt to schedule the ProcResGroup. 2. if there is not a subunit in the list of write resources for the scheduling class, choose a subunit to use instead of the ProcResGroup. 3. having both the ProcResGroup and any of its subunits in the resources implied by a InstRW is not supported. Used to model parallel uses from a pool of resources. Differential Revision: https://reviews.llvm.org/D98976	2021-04-19 21:27:45 +01:00
David Penry	78a871abf7	[ARM] Use ProcResGroup in Cortex-M7 scheduling model Used to model structural hazards on FP issue, where some instructions take up 2 issue slots and others one as well as similar structural hazards on load issue, where some instructions take up two load lanes and others one. Differential Revision: https://reviews.llvm.org/D98977	2021-04-19 21:23:05 +01:00
Tim Northover	5e3d9fcc3a	StackProtector: ensure protection does not interfere with tail call frame. The IR stack protector pass must insert stack checks before the call instead of between it and the return. Similarly, SDAG one should recognize that ADJCALLFRAME instructions could be part of the terminal sequence of a tail call. In this case because such call frames cannot be nested in LLVM the stack protection code must skip over the whole sequence (or risk clobbering argument registers).	2021-04-13 15:14:57 +01:00
Arthur Eubanks	c88b87f9ce	Revert "Remove "Rewrite Symbols" from codegen pipeline" This reverts commit `6210261ecb`. addr-label.ll crashes on armv7.	2021-04-10 23:28:16 -07:00
Arthur Eubanks	6210261ecb	Remove "Rewrite Symbols" from codegen pipeline It breaks up the function pass manager in the codegen pipeline. With empty parameters, it looks at the -mllvm flag -rewrite-map-file. This is likely not in use. Add a check that we only have one function pass manager in the codegen pipeline. This required reverting commit 9583a3f2625818b78c0cf6d473cdedb9f23ad82c: "[AsmPrinter] Delete dead takeDeletedSymbsForFunction()". This was not NFC as initially thought. By coalescing two function pass managers, this exposed the reverted code as necessary. addr-label.ll was crashing due to an emitted blockaddress's block being removed but the label not emitted. Some tests relied on the fact that we had a module pass somewhere in the codegen pipeline. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99707	2021-04-10 22:38:44 -07:00
Simonas Kazlauskas	777a58e05b	Support {S,U}REMEqFold before legalization This allows these optimisations to apply to e.g. `urem i16` directly before `urem` is promoted to i32 on architectures where i16 operations are not intrinsically legal (such as on Aarch64). The legalization then later can happen more directly and generated code gets a chance to avoid wasting time on computing results in types wider than necessary, in the end. Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D88785	2021-04-01 01:35:41 +03:00
David Green	7b6f760fcd	[ARM] MVE vector lane interleaving MVE does not have a single sext/zext or trunc instruction that takes the bottom half of a vector and extends to a full width, like NEON has with MOVL. Instead it is expected that this happens through top/bottom instructions. So the MVE equivalent VMOVLT/B instructions take either the even or odd elements of the input and extend them to the larger type, producing a vector with half the number of elements each of double the bitwidth. As there is no simple instruction for a normal extend, we often have to expand sext/zext/trunc into a series of lane moves (or stack loads/stores, which we do not do yet). This pass takes vector code that starts at truncs, looks for interconnected blobs of operations that end with sext/zext and transforms them by adding shuffles so that the lanes are interleaved and the MVE VMOVL/VMOVN instructions can be used. This is done pre-ISel so that it can work across basic blocks. This initial version of the pass just handles a limited set of instructions, not handling constants or splats or FP, which can all come as extensions to this base. Differential Revision: https://reviews.llvm.org/D95804	2021-03-28 19:34:58 +01:00
Simon Pilgrim	9d2df96407	[DAG] computeKnownBits - add ISD::MULHS/MULHU/SMUL_LOHI/UMUL_LOHI handling Reuse the existing KnownBits multiplication code to handle the 'extend + multiply + extract high bits' pattern for multiply-high ops. Noticed while looking at the codegen for D88785 / D98587 - the patch helps division-by-constant expansion code in particular, which suggests that we might have some further KnownBits div/rem cases we could handle - but this was far easier to implement. Differential Revision: https://reviews.llvm.org/D98857	2021-03-19 16:02:31 +00:00
Simon Pilgrim	d9b5338cfb	[ARM] Regenerate select-imm.ll tests	2021-03-18 11:07:16 +00:00
David Green	402f2cae7d	[ARM] Use ldrsb for more thumb1 loads. Given a sextload i16, we can usually generate "ldrsh [rn, rm]". If we don't naturally have a rn, rm addressing mode, we can either generate "ldrh [rn, #0]; sxth" or "mov rm, #0; ldrsh [rn, rm]". We currently generate the first, always creating a sxth. They are both the same number of instructions, but if we generate the second then the mov #0 will likely be CSE'd or pulled out of a loop, etc. This adjusts the ISel patterns to do that, creating a mov instead of a sxth. Differential Revision: https://reviews.llvm.org/D98693	2021-03-17 15:29:02 +00:00
Simonas Kazlauskas	a2eca31da2	Test cases for rem-seteq fold with illegal types This also briefly tests a larger set of architectures than the more exhaustive functionality tests for AArch64 and x86. As requested in D88785 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D98339	2021-03-12 16:28:04 +02:00
Daniel Sanders	134a179dee	[mir] Change 'undef' for MMO base addresses to 'unknown-address' Differential Revision: https://reviews.llvm.org/D98100	2021-03-10 16:46:44 -08:00
Craig Topper	0eb405c3b8	[SelectionDAG] Add computeKnownBits support for ISD::USUBSAT. The result of ISD::USUBSAT will never be larger than the LHS. We can use this to put a bound on the number of leading zeros. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D98133	2021-03-07 09:48:42 -08:00
Daniel Sanders	9fc2be6f28	[mir] Fix confusing MIR when MMO's value is nullptr but offset is non-zero :: (store 1 + 4, addrspace 1) -> :: (store 1 into undef + 4, addrspace 1) An offset without a base isn't terribly useful but it's convenient to update the offset without checking the value. For example, when breaking apart stores into smaller units Differential Revision: https://reviews.llvm.org/D97812	2021-03-04 10:34:30 -08:00
Arthur Eubanks	040c1b49d7	Move EntryExitInstrumentation pass location This seems to be more of a Clang thing rather than a generic LLVM thing, so this moves it out of LLVM pipelines and as Clang extension hooks into LLVM pipelines. Move the post-inline EEInstrumentation out of the backend pipeline and into a late pass, similar to other sanitizer passes. It doesn't fit into the codegen pipeline. Also fix up EntryExitInstrumentation not running at -O0 under the new PM. PR49143 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D97608	2021-03-01 10:08:10 -08:00
Craig Topper	fe50be12c8	[LegalizeIntegerTypes] Further improve ExpandIntRes_SADDSUBO for targets where SADDO/SSUBO aren't supported. Rather than converting 3 signbits to bools and comparing them, we can do bitwise logic on the whole vector and convert the resulting sign bit to a bool at the end. This is still a different algorithm than what we do in LegalizeDAG through expandSADDOSSUBO. That algorithm needs to know that the RHS of SSUBO is > 0, but that's costly when the type is split. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97325	2021-02-24 10:05:38 -08:00
David Green	03892a27d6	[ARM] Expand the range of allowed post-incs in load/store optimizer Currently the load/store optimizer will only fold in increments of the same size as the load/store. This patch expands that to any legal immediate for the post-inc instruction. This is a recommit of `3b34b06fc5` with correctness fixes and extra tests. Differential Revision: https://reviews.llvm.org/D95885	2021-02-24 08:46:15 +00:00
David Green	8fa2bbaed9	[ARM] Mir test for pre/postinc ldstopt combines. NFC	2021-02-23 22:27:06 +00:00
Craig Topper	eb165090bb	[LegalizeIntegerTypes] Improve ExpandIntRes_SADDSUBO codegen on targets without SADDO/SSUBO. This code creates 3 setccs that need to be expanded. It was creating a sign bit test as setge X, 0 which is non-canonical. Canonical would be setgt X, -1. This misses the special case in IntegerExpandSetCCOperands for sign bit tests that assumes canonical form. If we don't hit this special case we end up with a multipart setcc instead of just checking the sign of the high part. To fix this I've reversed the polarity of all of the setccs to setlt X, 0 which is canonical. The rest of the logic should still work. This seems to produce better code on RISCV which lacks a setgt instruction. This probably still isn't the best code sequence we could use here. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97181	2021-02-23 09:40:32 -08:00
Sjoerd Meijer	e1c3bf6afe	[ARM] do not consider sp as deprecated for ldm/stm Early versions of the ARMv7 reference manuals considered the sp register as a deprecated register for ldm/stm family of instructions. However, later versions such as ARM DDI 0406C.d added a note to the Appendix: D9.3 Use of the SP as a general-purpose register Most ARM instructions, unlike Thumb instructions, provide exactly the same access to the SP as to R0-R12. This means that it is possible to use the SP as a general-purpose register. Earlier issues of this manual deprecated the use of SP in an ARM instruction, in any way that is deprecated, not permitted, or not possible in the corresponding Thumb instruction. However, user feedback indicates a number of cases where these instructions are useful. Therefore, ARM no longer deprecates these instruction uses. Also Armv8 manuals no longer consider SP as deprecated register for ldm/stm A32 instructions. Furthermore, GNU as also does not print a deprecated warning when using SP with those instructions. Drop deprecation warning for pop/ldm/push/stm instructions. Patch by: Stefan Agner. Differential Revision: https://reviews.llvm.org/D82692	2021-02-23 13:26:18 +00:00
David Green	ebb6583e02	[ARM] Add pre/post inc tests of various sizes. NFC	2021-02-23 10:53:22 +00:00
David Green	7a5c26e99a	Revert "[ARM] Expand the range of allowed post-incs in load/store optimizer" This reverts commit `3b34b06fc5` as runtime errors were reported.	2021-02-19 13:15:10 +00:00
David Green	3b34b06fc5	[ARM] Expand the range of allowed post-incs in load/store optimizer Currently the load/store optimizer will only fold in increments of the same size as the load/store. This patch expands that to any legal immediate for the post-inc instruction. Differential Revision: https://reviews.llvm.org/D95885	2021-02-18 14:59:02 +00:00
David Green	1a6744e3dc	[ARM] Add larger than legal ICmp costs A v8i32 compare will produce a v8i1 predicate, but during codegen the v8i32 will be split into two v4i32, potentially requiring two v4i1 predicates to be merged into a single v8i1. Because this merging of two v4i1's into a v8i1 is very expensive, we need to make the cost of the compare equally high. This patch adds the cost of that to ARMTTIImpl::getCmpSelInstrCost. Because we don't know whether the user of the predicate can be split, and the cost model is mostly per-instruction, we may be pessimistic but that should only be for larger and legal types. This also adds min/max detection to the costmodel where it can be detected, to keep those in line with the cost of simple min/max instructions. Otherwise for the most part, costs that were already expensive have become more expensive. Differential Revision: https://reviews.llvm.org/D96692	2021-02-18 11:42:17 +00:00
David Green	1fbb3287fc	[ARM] MVE ICmp costing tests. NFC	2021-02-18 10:50:34 +00:00
Sjoerd Meijer	f78aa8b2c2	[LSR] Add a flag that overrides the target's preferred addressing mode This adds a new flag -lsr-preferred-addressing-mode to override the target's preferred addressing mode. It replaces flag -lsr-backedge-indexing, which is equivalent to preindexed addressing that is one of the options that -lsr-preferred-addressing-mode accepts. Differential Revision: https://reviews.llvm.org/D96855	2021-02-17 16:50:21 +00:00
David Green	a838a4f69f	[ARM] Extend search for increment in load/store optimizer Currently the findIncDecAfter will only look at the next instruction for post-inc candidates in the load/store optimizer. This extends that to a search through the current BB, until an instruction that modifies or uses the increment reg is found. This allows more post-inc load/stores and ldm/stm's to be created, especially in cases where a schedule might move instructions further apart. We make sure not to look any further for an SP, as that might invalidate stack slots that are still in use. Differential Revision: https://reviews.llvm.org/D95881	2021-02-15 13:17:21 +00:00
Simon Pilgrim	60ba5397df	[DAG] PromoteIntRes_ADDSUBSHLSAT - use promoted ISD::USUBSAT directly As discussed on D96413, as long as the promoted bits of the args are zero we can use the basic ISD::USUBSAT pattern directly, without the shifting like we do for other ops. I think something similar should be possible for ISD::UADDSAT as well, which I'll look at later. Also, create a ISD::USUBSAT node directly - this will be expanded back by the legalizer later on if necessary. Differential Revision: https://reviews.llvm.org/D96622	2021-02-13 12:35:10 +00:00
Serge Pavlov	816053bc71	[FPEnv][ARM] Implement lowering of llvm.set.rounding Differential Revision: https://reviews.llvm.org/D96501	2021-02-13 11:16:29 +07:00
Simon Pilgrim	4841a225b7	[DAG] Move basic USUBSAT pattern matches from X86 to DAGCombine Begin transitioning the X86 vector code to recognise sub(umax(a,b),b) or sub(a,umin(a,b)) USUBSAT patterns to make it more generic and available to all targets. This initial patch just moves the basic umin/umax patterns to DAG, removing some vector-only checks on the way - these are some of the patterns that the legalizer will try to expand back to so we can be reasonably relaxed about matching these pre-legalization. We can handle the trunc(sub(..)) variants as well, which helps with patterns where we were promoting to a wider type to detect overflow/saturation. The remaining x86 code requires some cleanup first - some of it isn't actually tested etc. I also need to resurrect D25987. Differential Revision: https://reviews.llvm.org/D96413	2021-02-12 18:22:57 +00:00
Lukas Sommer	6577cef9b0	[CodeGen] New pass: Replace vector intrinsics with call to vector library This patch adds a pass to replace calls to vector intrinsics (i.e., LLVM intrinsics operating on vector operands) with calls to a vector library. Currently, calls to LLVM intrinsics are only replaced with calls to vector libraries when scalar calls to intrinsics are vectorized by the Loop- or SLP-Vectorizer. With this pass, it is now possible to replace calls to LLVM intrinsics already operating on vector operands, e.g., if such code was generated by MLIR. For the replacement, information from the TargetLibraryInfo, e.g., as specified via -vector-library is used. This is a re-try of the original commit `2303e93e66` that was reverted due to pass manager problems. Other minor changes have also been made. Differential Revision: https://reviews.llvm.org/D95373	2021-02-12 12:53:27 -05:00
Sanjay Patel	c981f6f8e1	Revert "[Codegen][ReplaceWithVecLib] add pass to replace vector intrinsics with calls to vector library" This reverts commit `2303e93e66`. Investigating bot failures.	2021-02-05 15:10:11 -05:00
Lukas Sommer	2303e93e66	[Codegen][ReplaceWithVecLib] add pass to replace vector intrinsics with calls to vector library This patch adds a pass to replace calls to vector intrinsics (i.e., LLVM intrinsics operating on vector operands) with calls to a vector library. Currently, calls to LLVM intrinsics are only replaced with calls to vector libraries when scalar calls to intrinsics are vectorized by the Loop- or SLP-Vectorizer. With this pass, it is now possible to replace calls to LLVM intrinsics already operating on vector operands, e.g., if such code was generated by MLIR. For the replacement, information from the TargetLibraryInfo, e.g., as specified via -vector-library is used. Differential Revision: https://reviews.llvm.org/D95373	2021-02-05 14:25:19 -05:00
Ayke van Laethem	aecdf15cc7	[ARM] Do not emit ldrexd/strexd on Cortex-M chips The ldrexd/strexd instructions are not supported on M-class chips, see for example https://developer.arm.com/documentation/dui0489/e/arm-and-thumb-instructions/memory-access-instructions/ldrex-and-strex which says: > All these 32-bit Thumb instructions are available in ARMv6T2 and > above, except that LDREXD and STREXD are not available in the ARMv7-M > architecture. Looking at the ARMv8-M architecture, it appears that these instructions aren't supported either. The Architecture Reference Manual lists ldrex/strex but not ldrexd/strexd: https://developer.arm.com/documentation/ddi0553/bn/ Godbolt example on LLVM 11.0.0, which incorrectly emits ldrexd/strexd instructions: https://llvm.godbolt.org/z/5qqPnE Differential Revision: https://reviews.llvm.org/D95891	2021-02-04 21:55:34 +01:00
David Green	649a3d00df	[ARM] Handle f16 in GeneratePerfectShuffle This new f16 shuffle under Neon would hit an assert in GeneratePerfectShuffle as it would try to treat a f16 vector as an i8. Add f16 handling, treating them like an i16. Differential Revision: https://reviews.llvm.org/D95446	2021-02-04 11:14:52 +00:00
David Green	5805521207	[ARM] Simplify VMOVRRD from extracts of buildvectors Under the softfp calling convention, we are often left with VMOVRRD(extract(bitcast(build_vector(a, b, c, d)))) for the return value of the function. These can be simplified to a,b or c,d directly, depending on the value of the extract. Big endian is a little different because the bitcast switches the lanes around, meaning we end up with b,a or d,c. Differential Revision: https://reviews.llvm.org/D94989	2021-02-01 16:09:25 +00:00
David Green	ad12e6ee95	[ARM] Turn sext_inreg(VGetLaneu) into VGetLanes This adds a DAG combine for converting sext_inreg of VGetLaneu into VGetLanes, provided the types match correctly. Differential Revision: https://reviews.llvm.org/D95073	2021-02-01 11:10:35 +00:00
Roman Lebedev	ddc4b56eef	[ExpandMemCmpPass] Preserve Dominator Tree, if available This finishes getting rid of all the avoidable Dominator Tree recalculations in X86 optimized codegen pipeline.	2021-01-30 01:14:51 +03:00
Roman Lebedev	056385921d	[ScalarizeMaskedMemIntrin] Preserve Dominator Tree, if available This de-pessimizes the arguably more usual case of no masked mem intrinsics, and gets rid of one more Dominator Tree recalculation. As per llvm/test/CodeGen/X86/opt-pipeline.ll, there's one more Dominator Tree recalculation left that we could get rid of.	2021-01-29 01:11:36 +03:00
Roman Lebedev	6617529a1d	[CodeGen][DwarfEHPrepare] Preserve Dominator Tree Now that D94827 has flipped the switch, and SimplifyCFG is officially marked as production-ready regarding Dominator Tree preservation, we can update this user pass to also preserve Dominator Tree. This is a geomean compile-time win of `-0.05%`..`-0.08%`. https://llvm-compile-time-tracker.com/compare.php?from=51a25846c198cff00abad0936f975167357afa6f&to=082499aac236a5c141e50a9e77870d5be2de5f0b&stat=instructions Differential Revision: https://reviews.llvm.org/D95548	2021-01-28 14:11:34 +03:00
David Green	c1c1944e69	[ARM] Regenerate constant hoisting test. NFC	2021-01-28 10:37:16 +00:00
Roman Lebedev	51a25846c1	[CodeGen] SafeStack: preserve DominatorTree if it is available While this is mostly NFC right now, because only ARM happens to run this pass with DomTree available before it, and required after it, more backends will be affected once the SimplifyCFG's switch for domtree preservation is flipped, and DwarfEHPrepare also preserves the domtree.	2021-01-27 18:32:35 +03:00
David Green	9e2768a3d9	[ARM] Add neon FP16 scalar_to_vector patterns. This adds some simple fp16 scalar_to_vector patterns, preventing a selection failure if this came up. Differential Revision: https://reviews.llvm.org/D95427	2021-01-27 09:59:15 +00:00

4596 Commits