llvm-project

Commit Graph

Author	SHA1	Message	Date
Oliver Stannard	a76620143c	[ARM] Patterns for vector conversion between half and float These patterns were omitted because clang only allows converting between these types using intrinsics, but other front-ends or optimisation passes may want to use them. Differential revision: https://reviews.llvm.org/D119354	2022-02-10 09:51:55 +00:00
serge-sans-paille	ef736a1c39	Cleanup LLVMMC headers There's a few relevant forward declarations in there that may require downstream adding explicit includes: llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h llvm/MC/MCObjectStreamer.h no longer include llvm/MC/MCAssembler.h llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h Counting preprocessed lines required to rebuild llvm-project on my setup: before: 1052436830 after: 1049293745 Which is significant and backs up the change in addition to the usual benefits of decreasing coupling between headers and compilation units. Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D119244	2022-02-09 11:09:17 +01:00
Mark Murray	3d7662142d	[ARM] Undeprecate complex IT blocks AArch32/Armv8A introduced the performance deprecation of certain patterns of IT instructions. After some debate internal to ARM, this is now being reverted; i.e. no IT instruction patterns are performance deprecated anymore, as the performance degradation is not significant enough. This reverts the following: "ARMv8-A deprecates some uses of the T32 IT instruction. All uses of IT that apply to instructions other than a single subsequent 16-bit instruction from a restricted set are deprecated, as are explicit references to the PC within that single 16-bit instruction. This permits the non-deprecated forms of IT and subsequent instructions to be treated as a single 32-bit conditional instruction." The deprecation no longer applies, but the behaviour may be controlled by the -arm-restrict-it and -arm-no-restrict-it command-line options, with the latter being the default. No warnings about complex IT blocks will be generated. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D118044	2022-02-07 15:47:53 +00:00
Kazu Hirata	3a3cb929ab	[llvm] Use = default (NFC)	2022-02-06 22:18:35 -08:00
David Green	b7d3a2b62f	[ARM] Mark i64 and f64 shuffles as Custom for MVE This way they get lowered through the ARMISD::BUILD_VECTOR, which can produce more efficient D register moves. Also helps D115653 not get stuck in a loop.	2022-02-06 16:17:06 +00:00
Simon Pilgrim	5aa2acc86b	[DAG] SimplifyDemandedVectorElts - remove KnownZero/KnownUndef from DCI helper wrapper None of the external users actually touch these (they're purely used internally down the recursive call) - it's trivial to add another wrapper if anything ever does want to track known elements.	2022-02-02 12:04:49 +00:00
tyb0807	762f0b5463	[ARM] Make getInstSizeInBytes() use instruction size from InstrInfo.td Currently, ARMBaseInstrInfo::getInstSizeInBytes() uses hard-coded instruction size for some pseudo-instructions, while this information should ideally be found in ARMInstrInfo.td, ARMInstrThumb(2).td files (which can be accessed via MCInstrDesc). Hence, the .td files should be updated and no hard-coded instruction sizes should be used by getInstSizeInBytes() anymore. Differential Revision: https://reviews.llvm.org/D118009	2022-02-01 10:39:14 +00:00
David Sherwood	daa80339df	[CodeGen] Support folds of not(cmp(cc, ...)) -> cmp(!cc, ...) for scalable vectors I have updated TargetLowering::isConstTrueVal to also consider SPLAT_VECTOR nodes with constant integer operands. This allows the optimisation to also work for targets that support scalable vectors. Differential Revision: https://reviews.llvm.org/D117210	2022-02-01 09:50:00 +00:00
Ties Stuij	6b1e844b69	[ARM] Add Cortex-X1C Support for Clang and LLVM This patch upstreams support for the Arm-v8 Cortex-X1C processor for AArch64 and ARM. For more information, see: - https://community.arm.com/arm-community-blogs/b/announcements/posts/arm-cortex-x1c - https://developer.arm.com/documentation/101968/0002/Functional-description/Technical-overview/Components The following people contributed to this patch: - Simon Tatham - Ties Stuij Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D117202	2022-01-31 14:23:35 +00:00
Nikita Popov	93122b2567	[ARM] Don't look through pointer types in canTailPredicateLoop() Inspecting the pointer element type here is incompatible with opaque pointers, and doesn't seem necessary to me. I think the intention might have been to check the type of load/store pointer arguments, but I believe those should get checked through their return type or value operand anyway. I don't get any test failures if I simply drop this. Differential Revision: https://reviews.llvm.org/D118353	2022-01-28 09:34:13 +01:00
David Green	82973edfb7	[ARM][AArch64] Introduce qrdmlah and qrdmlsh intrinsics Since its introduction, the qrdmlah has been represented as a qrdmulh and a sadd_sat. This doesn't produce the same result for all input values though. This patch fixes that by introducing a qrdmlah (and qrdmlsh) intrinsic specifically for the vqrdmlah and sqrdmlah instructions. The old test cases will now produce a qrdmulh and sqadd, as expected. Fixes #53120 and #50905 and #51761. Differential Revision: https://reviews.llvm.org/D117592	2022-01-27 19:19:46 +00:00
Benjamin Kramer	f15014ff54	Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17" This reverts commit `ef82063207`. - It conflicts with the existing llvm::size in STLExtras, which will now never be called. - Calling it without llvm:: breaks C++17 compat	2022-01-26 16:55:53 +01:00
serge-sans-paille	ef82063207	Rename llvm::array_lengthof into llvm::size to match std::size from C++17 As a consequence move llvm::array_lengthof from STLExtras.h to STLForwardCompat.h (which is included by STLExtras.h so no build breakage expected).	2022-01-26 16:17:45 +01:00
Jim Lin	da1cac7d19	[NFC] Remove duplicate include	2022-01-26 15:10:16 +08:00
Nikita Popov	aa97bc116d	[NFC] Remove uses of PointerType::getElementType() Instead use either Type::getPointerElementType() or Type::getNonOpaquePointerElementType(). This is part of D117885, in preparation for deprecating the API.	2022-01-25 09:44:52 +01:00
serge-sans-paille	75e164f61d	[llvm] Cleanup header dependencies in ADT and Support The cleanup was manual, but assisted by "include-what-you-use". It consists of 1. Removing unused forward declaration. No impact expected. 2. Removing unused headers in .cpp files. No impact expected. 3. Removing unused headers in .h files. This removes implicit dependencies and is generally considered a good thing, but this may break downstream builds. I've updated llvm, clang, lld, lldb and mlir deps, and included a list of the modifications in the second part of the commit. 4. Replacing header inclusion by forward declaration. This has the same impact as 3. Notable changes: - llvm/Support/TargetParser.h no longer includes llvm/Support/AArch64TargetParser.h nor llvm/Support/ARMTargetParser.h - llvm/Support/TypeSize.h no longer includes llvm/Support/WithColor.h - llvm/Support/YAMLTraits.h no longer includes llvm/Support/Regex.h - llvm/ADT/SmallVector.h no longer includes llvm/Support/MemAlloc.h nor llvm/Support/ErrorHandling.h You may need to add some of these headers in your compilation units, if need be. As a hint of the impact of the cleanup, running clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Support/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l before: 8000919 lines after: 7917500 lines Reduced dependencies also help incremental rebuilds and are more ccache friendly, something not shown by the above metric :-) Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup/5831	2022-01-21 13:54:49 +01:00
Jim Lin	d6b0734837	[NFC] Use Register instead of unsigned	2022-01-19 20:17:04 +08:00
Fangrui Song	2e589c9c42	[MC][ARM] Replace MCContext::reportFatalError call with reportError This call is slightly tricky. We need to postpone getFixupKindNumBytes.	2022-01-15 00:32:03 -08:00
Fangrui Song	e2b66928e5	[MC][ARM] Replace MCContext::reportFatalError call with reportError	2022-01-15 00:13:49 -08:00
Ties Stuij	7c70f96a91	[ARM] fix bug causing shrinkwrapping not always being off using PAC If you want to check for all uses of PAC, the SpillsLR argument to shouldSignReturnAddress should be true instead of false, as that value will be returned from the function if the other checks fall through. Reviewed By: miyuki Differential Revision: https://reviews.llvm.org/D116213	2022-01-13 10:37:00 +00:00
Rosie Sumpter	552eb372cb	[LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter This is required to query the legality more precisely in the LoopVectorizer. This adds another TTI function named 'forceScalarizeMaskedGather/Scatter' to work around the hack introduced for MVE, where isLegalMaskedGather/Scatter would return an answer by second-guessing where the function was called from, based on the Type passed in (vector vs scalar). The new interface makes this explicit. It is also used by X86 to check for vector widths where gather/scatters aren't profitable (or don't exist) for certain subtargets. Differential Revision: https://reviews.llvm.org/D115329	2022-01-12 13:34:12 +00:00
David Green	351edf1c47	[ARM] Remove FeaturePerfMon from armv7-m FeaturePerfMon relates to the PMU extensions available in armv7-a, and should not be available in v7-m (it requires loading from a system register with a mrc). Sink it down a level in the dependency map so that it isn't present in ARMv7m or HasV8MMainlineOps. It is also removed from the Neoverse-N2, as it will already be transitively included. Differential Revision: https://reviews.llvm.org/D117022	2022-01-12 09:44:53 +00:00
Tim Northover	0b5b35fdbd	ARM: make FastISel & GISel pass -1 to ADJCALLSTACKUP to signal no callee pop. The interface for these instructions changed with support for mandatory tail calls, and now -1 indicates the CalleePopAmount argument is not valid. Unfortunately I didn't realise FastISel or GISel did calls at the time so didn't update them.	2022-01-11 11:31:13 +00:00
Kazu Hirata	4e2ec7e38d	[llvm] Remove unused forward declarations (NFC)	2022-01-07 20:00:34 -08:00
Kazu Hirata	2aed08131d	[llvm] Use true/false instead of 1/0 (NFC) Identified with modernize-use-bool-literals.	2022-01-07 00:39:14 -08:00
Kazu Hirata	f3a344d212	[Target] Remove redundant member initialization (NFC) Identified with readability-redundant-member-init.	2022-01-06 22:01:44 -08:00
Kazu Hirata	e5947760c2	Revert "[llvm] Remove redundant member initialization (NFC)" This reverts commit `fd4808887e`. This patch causes gcc to issue a lot of warnings like: warning: base class ‘class llvm::MCParsedAsmOperand’ should be explicitly initialized in the copy constructor [-Wextra]	2022-01-03 11:28:47 -08:00
Lucas Prates	cd7f621a0a	[ARM][AArch64] Introduce Armv9.3-A This patch introduces support for targeting the Armv9.3-A architecture, which should map to the existing Armv8.8-A extensions. Differential Revision: https://reviews.llvm.org/D116158	2022-01-03 12:40:43 +00:00
Kazu Hirata	41bfac6aed	[Target] Remove unused forward declarations (NFC)	2022-01-02 10:20:15 -08:00
Kazu Hirata	fd4808887e	[llvm] Remove redundant member initialization (NFC) Identified with readability-redundant-member-init.	2022-01-01 16:18:18 -08:00
David Green	319e77592f	[ARM] Verify addressing immediates This adds an extra check into ARMBaseInstrInfo::verifyInstruction to verify the offsets used in addressing mode immediates using isLegalAddressImm. Some tests needed fixing up as a result, adjusting the opcode created from CMSE stack adjustments. Differential Revision: https://reviews.llvm.org/D114939	2022-01-01 20:08:45 +00:00
Kazu Hirata	69ccc96162	[llvm] Use the default constructor for SDValue (NFC)	2022-01-01 10:36:59 -08:00
Simon Tatham	d50072f74e	[ARM] Introduce an empty "armv8.8-a" architecture. This is the first commit in a series that implements support for "armv8.8-a" architecture. This should contain all the necessary boilerplate to make the 8.8-A architecture exist from LLVM and Clang's point of view: it adds the new arch as a subtarget feature, a definition in TargetParser, a name on the command line, an appropriate set of predefined macros, and adds appropriate tests. The new architecture name is supported in both AArch32 and AArch64. However, in this commit, no actual _functionality_ is added as part of the new architecture. If you specify -march=armv8.8a, the compiler will accept it and set the right predefines, but generate no code any differently. Differential Revision: https://reviews.llvm.org/D115694	2021-12-31 16:43:53 +00:00
Kazu Hirata	5a667c0e74	[llvm] Use nullptr instead of 0 (NFC) Identified with modernize-use-nullptr.	2021-12-28 08:52:25 -08:00
Kazu Hirata	8445883327	[llvm] Drop unnecessary const from return types (NFC) Identified with readability-const-return-type.	2021-12-27 15:58:03 -08:00
David Green	2ec3ca7477	[ARM] Extend IsCMPZCSINC to handle CMOV A 'CMOV 1, 0, CC, %cpsr, Cmp' is the same as a 'CSINC 0, 0, CC, Cmp', and can be treated the same in IsCMPZCSINC added in D114013. This allows us to remove the unnecessary CMOV in the same way that we could remove a CSINC. Differential Revision: https://reviews.llvm.org/D115188	2021-12-27 14:15:03 +00:00
Kazu Hirata	0a5788ab57	[Target] Use range-based for loops (NFC)	2021-12-26 23:49:38 -08:00
Kazu Hirata	c5cf7d910e	[ARM] Use range-based for loops (NFC)	2021-12-20 23:06:47 -08:00
Kazu Hirata	de90490060	Revert "[ARM] Use range-based for loops (NFC)" This reverts commit `93d79cac2e`. This patch seems to break llvm/test/CodeGen/ARM/constant-islands-cfg.mir under asan.	2021-12-20 10:51:36 -08:00
Kazu Hirata	93d79cac2e	[ARM] Use range-based for loops (NFC)	2021-12-20 00:04:53 -08:00
David Green	4ece4cd77e	[ARM] Fold away CMP/CSINC from CMOV This makes use of the code in D114013 to fold away unnecessary CMPZ/CSINC starting from a CMOV, in a similar way to how we fold away CSINV/CSINC/etc Differential Revision: https://reviews.llvm.org/D115185	2021-12-19 21:53:50 +00:00
Kazu Hirata	26bd534a79	[llvm] Use none_of instead of !any_of (NFC)	2021-12-17 13:48:57 -08:00
David Green	6bd8f114c8	[ARM] Handle splats of constants for MVE qr instruction Some MVE instructions have qr variants that take a Q and R register, splatting the R register for each lane. This is usually handled fine for standard splats as we sink the splat into the loop and combine the resulting dup into the qr instruction. It does not work for constant splats though, as we generate a vmovimm or constant pool load instead. This intercepts that, generating a vdup of the constant instead where we can turn the result into a qr instruction variant. Differential Revision: https://reviews.llvm.org/D115242	2021-12-17 09:16:28 +00:00
David Green	26f6fbe2be	[ARM] Add AddrModeT2_i8neg addressing mode support for frame lowering. As reported from a failing firefox build, we can sometimes get frame indices with negative offsets from a t2LDRi8. This adds support for them, to prevent the crash.	2021-12-14 12:49:27 +00:00
Kazu Hirata	483499670e	[Target] Use llvm::reverse (NFC)	2021-12-12 08:34:24 -08:00
Kazu Hirata	c2bb9637d9	Use llvm::any_of and llvm::all_of (NFC)	2021-12-11 11:54:37 -08:00
Kazu Hirata	36b8a4f9f3	[llvm] Use llvm::is_contained (NFC)	2021-12-11 11:42:09 -08:00
Mikael Holmen	d0f55a0d80	[ARM] Fix gcc warning about mix of enumeral and non-enumeral types gcc warned with ../lib/Target/ARM/ARMFrameLowering.cpp:797:31: warning: enumeral and non-enumeral type in conditional expression [-Wextra] 797 | Reg == ARM::R12 ? ARM::RA_AUTH_CODE : Reg, true); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~	2021-12-09 10:31:56 +01:00
Mircea Trofin	b012742405	[NFC] Rename MachineFunction::deleteMachineInstr (coding style)	2021-12-08 20:36:13 -08:00
David Green	d43c801d13	[ARM] Peek through And 1 in IsCMPZCSINC We can be in situations where And 1 zext nodes will not have been yet, preventing us from detecting removable cmpz/csinc patterns. This peeks through those nodes allowing us to simplify more code. Differential Revision: https://reviews.llvm.org/D115176	2021-12-08 15:40:23 +00:00
Ties Stuij	63eb7ff47d	[ARM] Implement PAC return address signing mechanism for PACBTI-M This patch implements PAC return address signing for armv8-m. This patch roughly accomplishes the following things: - PAC and AUT instructions are generated. - They're part of the stack frame setup, so that shrink-wrapping can move them inwards to cover only part of a function - The auth code generated by PAC is saved across subroutine calls so that AUT can find it again to check - PAC is emitted before stacking registers (so that the SP it signs is the one on function entry). - The new pseudo-register ra_auth_code is mentioned in the DWARF frame data - With CMSE also in use: PAC is emitted before stacking FPCXTNS, and AUT validates the corresponding value of SP - Emit correct unwind information when PAC is replaced by PACBTI - Handle tail calls correctly Some notes: We make the assembler accept the `.save {ra_auth_code}` directive that is emitted by the compiler when it saves a register that contains a return address authentication code. For EHABI we need to have the `FrameSetup` flag on the instruction and handle the `t2PACBTI` opcode (identically to `t2PAC`), so we can emit `.save {ra_auth_code}`, instead of `.save {r12}`. For PACBTI-M, the instruction which computes return address PAC should use SP value before adjustment for the argument registers save area (used for variadic functions and when a parameter is split between stack and register), but at the same time it should be after the instruction that saves FPCXT when compiling a CMSE entry function. This patch moves the varargs SP adjustment after the FPCXT save (they are never enabled at the same time), so in a following patch handling of the `PAC` instruction can be placed between them. Epilogue emission code adjusted in a similar manner. PACBTI-M code generation should not emit any instructions for architectures v6-m, v8-m.base, and for A- and R-class cores. Diagnostic message for such cases is handled separately by a future ticket. Note on tail calls: If the called function has four arguments that occupy registers `r0`-`r3`, the only option for holding the function pointer itself is `r12`, but this register is used to keep the PAC during function prologue/epilogue and clobbers the function pointer. When we do the tail call we need the five registers (`r0`-`r3` and `r12`) to keep six values - the four function arguments, the function pointer and the PAC, which is obviously impossible. One option would be to authenticate the return address before all callee-saved registers are restored, so we have a scratch register to temporarily keep the value of `r12`. The issue with this approach is that it violates a fundamental invariant that PAC is computed using CFA as a modifier. It would also mean using separate instructions to pop `lr` and the rest of the callee-saved registers, which would offset the advantages of doing a tail call. Instead, this patch disables indirect tail calls when the called function takes four or more arguments and the return address sign and authentication is enabled for the caller function, conservatively assuming the caller function would spill LR. This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest The following people contributed to this patch: - Momchil Velikov - Ties Stuij Reviewed By: danielkiss Differential Revision: https://reviews.llvm.org/D112429	2021-12-07 10:15:19 +00:00
Ties Stuij	0fbb17458a	[ARM] Implement setjmp BTI placement for PACBTI-M This patch intends to guard indirect branches performed by longjmp by inserting BTI instructions after calls to setjmp. Calls with 'returns-twice' are lowered to a new pseudo-instruction named t2CALL_BTI that is later expanded to a bundle of {tBL,t2BTI}. This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest The following people contributed to this patch: - Alexandros Lamprineas - Ties Stuij Reviewed By: labrinea Differential Revision: https://reviews.llvm.org/D112427	2021-12-06 11:07:10 +00:00
David Green	d8495e0352	[ARM] Add a vrinta.f16.f16 alias The v8.1-m ARMARM uses the vrinta.f16.f16 names, as opposed to vrinta.f16. This adds an alias for it in the same way that we have for f32 and f64. Differential Revision: https://reviews.llvm.org/D68127	2021-12-06 11:06:25 +00:00
David Green	ab0c5cea0b	[ARM] Use v2i1 for MVE and CDE intrinsics This adjusts all the MVE and CDE intrinsics now that v2i1 is a legal type, to use a <2 x i1> as opposed to emulating the predicate with a <4 x i1>. The v4i1 workarounds have been removed leaving the natural v2i1 types, notably in vctp64 which now generates a v2i1 type. AutoUpgrade code has been added to upgrade old IR, which needs to convert the old v4i1 to a v2i1 by converting it back and forth to an integer with arm.mve.v2i and arm.mve.i2v intrinsics. These should be optimized away in the final assembly. Differential Revision: https://reviews.llvm.org/D114455	2021-12-03 15:27:58 +00:00
David Green	255ad73424	[ARM] Make MVE v2i1 predicates legal MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the two halves. This was never treated as a legal type in llvm in the past as there are not many 64bit instructions and no 64bit compares. There are a few instructions that could use it though, notably a VSELECT (as it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for similar reasons, some gathers/scatter and long multiplies and VCTP64 instructions. This patch goes through and makes v2i1 a legal type, handling all the cases that fall out of that. It also makes VSELECT legal for v2i64 as a side benefit. A lot of the codegen changes as a result - usually in a way that is a little better or a little worse, but still expensive. Costs can change a little too in the process, again in a way that expensive things remain expensive. A lot of the tests that changed are mainly to ensure correctness - the code can hopefully be improved in the future where it comes up in practice. The intrinsics currently remain using the v4i1 they previously did to emulate a v2i1. This will be changed in a followup patch but this one was already large enough. Differential Revision: https://reviews.llvm.org/D114449	2021-12-03 14:05:41 +00:00
David Green	b8f1ccb0ac	[ARM] Introduce i8neg and i8pos addressing modes Some instructions with i8 immediate ranges can only hold negative values (like t2LDRHi8), only hold positive values (like t2STRT) or hold +/- depending on the U bit (like the pre/post inc instructions. e.g t2LDRH_POST). This patch splits the AddrModeT2_i8 into AddrModeT2_i8, AddrModeT2_i8pos and AddrModeT2_i8neg to make this clear. This allows us to get the offset ranges of t2LDRHi8 correct in the load/store optimizer, fixing issues where we could end up creating instructions with positive offsets (which may then be encoded as ldrht). Differential Revision: https://reviews.llvm.org/D114638	2021-12-02 17:10:26 +00:00
David Green	e629302558	[ARM] Correct range in isLegalAddressImm The ranges in isLegalAddressImm were off by one, not allowing the maximum values for unscaled offsets. Differential Revision: https://reviews.llvm.org/D114636	2021-12-02 11:33:40 +00:00
David Green	646c872f9d	[ARM] Teach getIntImmCostInst about the cost of saturating fp converts Given a min(max(fptosi, INT_MIN), INT_MAX) with the correct constants, we can now generate a fptosi.sat. But in the arm backend, the constant can be treated as high cost, pulling it out of the basic block in a way that the DAG combine can no longer see it. This teaches it again that it is a low cost constant, not worth hoisting out. Recommitted from `0e98659ea1` with a fix for APInt comparison. Differential Revision: https://reviews.llvm.org/D114380	2021-12-02 07:56:27 +00:00
David Green	13e66c070b	Revert "[ARM] Teach getIntImmCostInst about the cost of saturating fp converts" This reverts commit `6d41de380f` as the windows bots are not happy, in a way I do not understand. Revert whilst we figure out what is wrong.	2021-12-01 15:25:19 +00:00
Ties Stuij	f5f28d5b0c	[ARM] Implement BTI placement pass for PACBTI-M This patch implements a new MachineFunction in the ARM backend for placing BTI instructions. It is similar to the existing AArch64 aarch64-branch-targets pass. BTI instructions are inserted into basic blocks that: - Have their address taken - Are the entry block of a function, if the function has external linkage or has its address taken - Are mentioned in jump tables - Are exception/cleanup landing pads Each BTI instruction is placed in the beginning of a BB after the so-called meta instructions (e.g. exception handler labels). Each outlining candidate and the outlined function need to be in agreement about whether BTI placement is enabled or not. If branch target enforcement is disabled for a function, the outliner should not covertly enable it by emitting a call to an outlined function, which begins with BTI. The cost model of the outliner is adjusted to account for the extra BTI instructions in the outlined function. The ARM Constant Islands pass will maintain the count of the jump tables, which reference a block. A `BTI` instruction is removed from a block only if the reference count reaches zero. PAC instructions in entry blocks are replaced with PACBTI instructions (tests for this case will be added in a later patch because the compiler currently does not generate PAC instructions). The ARM Constant Island pass is adjusted to handle BTI instructions correctly. Functions with static linkage that don't have their address taken can still be called indirectly by linker-generated veneers and thus their entry points need to be marked with BTI or PACBTI. The changes are tested using "LLVM IR -> assembly" tests, jump tables also have a MIR test. Unfortunately it is not possible to add MIR tests for exception handling and computed gotos because of MIR parser limitations. This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest The following people contributed to this patch: - Mikhail Maltsev - Momchil Velikov - Ties Stuij Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D112426	2021-12-01 12:54:05 +00:00
Ties Stuij	b430782be3	[ARM] emit PACBTI-M build attributes This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest The following people contributed to this patch: - Victor Campos - Ties Stuij Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D112425	2021-12-01 11:05:29 +00:00
Ties Stuij	c12c7a84b0	[ARM] add common parts for PACBTI-M support in the backend This patch encapsulates decision logic about when and how to generate PAC/BTI related code. It's a part shared by PAC-RET, BTI placement, build attribute emission, etc, so it makes sense to commit it separately in order to unblock the aforementioned parts, which can proceed concurrently. This patch adds a few member functions to `ARMFunctionInfo`, which are currently unused, therefore there is no testing for them at the moment. This code is tested in follow-up PAC/BTI code gen patches. This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest The following people contributed to this patch: - Momchil Velikov - Ties Stuij Reviewed By: danielkiss Differential Revision: https://reviews.llvm.org/D112423	2021-12-01 10:48:30 +00:00
David Green	6d41de380f	[ARM] Teach getIntImmCostInst about the cost of saturating fp converts Given a min(max(fptosi, INT_MIN), INT_MAX) with the correct constants, we can now generate a fptosi.sat. But in the arm backend, the constant can be treated as high cost, pulling it out of the basic block in a way that the DAG combine can no longer see it. This teaches it again that it is a low cost constant, not worth hoisting out. Differential Revision: https://reviews.llvm.org/D114380	2021-12-01 10:25:52 +00:00
David Green	388bfc5408	[ARM] Fix some indenting in ARMAsmPrinter::emitInstruction, NFC	2021-12-01 10:08:37 +00:00
David Green	9e8a71caf0	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 15:29:14 +00:00
Hans Wennborg	a87782c34d	Revert "[DAG] Create fptosi.sat from clamped fptosi" It causes builds to fail with this assert: llvm/include/llvm/ADT/APInt.h:990: bool llvm::APInt::operator==(const llvm::APInt &) const: Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed. See comment on the code review. > This adds a fold in DAGCombine to create fptosi_sat from sequences for > smin(smax(fptosi(x))) nodes, where the min/max saturate the output of > the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because > it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, > ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need > to be handled similarly. > > A shouldConvertFpToSat method was added to control when converting may > be profitable. The original fptosi will have a less strict semantics > than the fptosisat, with less values that need to produce defined > behaviour. > > This especially helps on ARM/AArch64 where the vcvt instructions > naturally saturate the result. > > Differential Revision: https://reviews.llvm.org/D111976 This reverts commit `52ff3b0093`.	2021-11-30 15:36:56 +01:00
David Green	52ff3b0093	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 11:05:32 +00:00
Ties Stuij	5cff77c23f	[clang][ARM] PACBTI-M assembly support Introduce assembly support for Armv8.1-M PACBTI extension. This is an optional extension in v8.1-M. There are 10 new system registers and 5 new instructions, all predicated on the feature. The attribute for llvm-mc is called "pacbti". For armclang, an architecture extension also called "pacbti" was created. This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest The following people contributed to this patch: - Victor Campos - Ties Stuij Reviewed By: labrinea Differential Revision: https://reviews.llvm.org/D112420	2021-11-30 09:28:18 +00:00
Nick Desaulniers	89453ed6f2	[ARM] create new pseudo t2LDRLIT_ga_pcrel for stack guards We can't use the existing pseudo ARM::tLDRLIT_ga_pcrel for loading the stack guard for PIC code that references the GOT, since arm-pseudo may expand this to the narrow tLDRpci rather than the wider t2LDRpci. Create a new pseudo, t2LDRLIT_ga_pcrel, and expand it to t2LDRpci. Fixes: https://bugs.chromium.org/p/chromium/issues/detail?id=1270361 Reviewed By: ardb Differential Revision: https://reviews.llvm.org/D114762	2021-11-30 08:46:05 +01:00
Kazu Hirata	c73fc74ce0	[llvm] Use range-based for loops (NFC)	2021-11-28 10:04:54 -08:00
David Green	5c64d8ef8c	[ARM] CSINC/CSINV patterns from CMOV We sometimes end up generating CMOV with constant operands that can be simplified to CSINC or CSINV under Arm-8.1m. This adds some simple patterns for them. Differential Revision: https://reviews.llvm.org/D114349	2021-11-27 20:21:41 +00:00
David Green	7d5d063c77	[ARM] Fold away unnecessary CSET/CMPZ Codegen from expanded vector operations can end up with unnecessary CMPZ/CSINC, of the form: CSXYZ A, B, C1 (CMPZ (CSINC 0, 0, C2, D), 0) These can be converted to remove the CMPZ and CSINC, depending on the condition. if C1==NE -> CSXYZ A, B, C2, D if C1==EQ -> CSXYZ A, B, NOT(C2), D Differential Revision: https://reviews.llvm.org/D114013	2021-11-27 19:07:16 +00:00
Kazu Hirata	387927bbaf	[Target] Use range-based for loops (NFC)	2021-11-26 21:21:17 -08:00
Kazu Hirata	562356d6e3	[Target] Use range-based for loops (NFC)	2021-11-26 08:23:01 -08:00
David Green	c76d6dd192	[ARM] Generate VCTP from SETCC This converts a vector SETCC([0,1,2,..], splat(n), ult) to vctp n, which can be fewer instructions and prevent the need for constant pool loads. Differential Revision: https://reviews.llvm.org/D114177	2021-11-26 10:57:14 +00:00
David Green	fbb61adb70	[ARM] Convert fptoi.sat to fixed point multiply This is a very small addition to the existing MVE fixed point vcvt code to also create them from FP_TO_SINT_SAT and FP_TO_UINT_SAT nodes, which should be equally valid for native saturating converts under MVE. Differential Revision: https://reviews.llvm.org/D114360	2021-11-25 15:43:45 +00:00
Simon Pilgrim	63b1e58f07	[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (REAPPLIED) If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift. For the ARM/AArch64 rev16/32 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task. Reapplied with fix for AArch64 rev patterns to match the ARM fix. https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount) https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount) https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount) https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount) Differential Revision: https://reviews.llvm.org/D114354	2021-11-25 11:14:15 +00:00
Benjamin Kramer	d32787230d	Revert "[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl" This reverts commit `3cf4a2c620`. It makes llc hang on the following test case. ``` target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128" target triple = "aarch64-unknown-linux-gnu" define dso_local void @_PyUnicode_EncodeUTF16() local_unnamed_addr #0 { entry: br label %while.body117.i while.body117.i: ; preds = %cleanup149.i, %entry %out.6269.i = phi i16* [ undef, %cleanup149.i ], [ undef, %entry ] %0 = load i16, i16* undef, align 2 %1 = icmp eq i16 undef, -10240 br i1 %1, label %fail.i, label %cleanup149.i cleanup149.i: ; preds = %while.body117.i %or130.i = call i16 @llvm.bswap.i16(i16 %0) #2 store i16 %or130.i, i16* %out.6269.i, align 2 br label %while.body117.i fail.i: ; preds = %while.body117.i ret void } ; Function Attrs: nofree nosync nounwind readnone speculatable willreturn declare i16 @llvm.bswap.i16(i16) #1 attributes #0 = { "target-features"="+neon,+v8a" } attributes #1 = { nofree nosync nounwind readnone speculatable willreturn } attributes #2 = { mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon,+v8a" } ```	2021-11-24 14:42:54 +01:00
Simon Pilgrim	3cf4a2c620	[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift. For the ARM rev16 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task. https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount) https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount) https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount) https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount) Differential Revision: https://reviews.llvm.org/D114354	2021-11-24 11:28:35 +00:00
David Green	581f837355	[ARM] Fold (fadd x, (vselect c, y, -1.0)) into (vselect c, (fadd x, y), x) This is similar to D113574, but as a DAG combine, not tablegen patterns. Doing the fold as a DAG combine allows the fadd to be folded with a fmul, finally producing a predicated vfma. It performs the same fold of fadd(x, vselect(p, y, -0.0)) to (vselect p, (fadd x, y), x) using -0.0 as the identity value of a fadd. Differential Revision: https://reviews.llvm.org/D113584	2021-11-24 10:41:00 +00:00
David Green	d9af9c2c5a	[ARM] Fold floating point select(binop) patterns Similar to D84091 which added extra predicated folds for integer operations using the identity element of the operation, this adds them for floating point operations for the form `BinOp(x, select(p, y, Identity))`. They are folded back to predicated versions of the operator, with fadd having the identity -0.0, fsub using the identity 0.0 and fmul using 1.0. Differential Revision: https://reviews.llvm.org/D113574	2021-11-24 10:22:20 +00:00
Kazu Hirata	d45cb1d7ea	[llvm] Use range-based for loops (NFC)	2021-11-23 08:54:48 -08:00
Kazu Hirata	fc981cedea	[llvm] Use range-based for loops (NFC)	2021-11-21 10:36:18 -08:00
Quinn Pham	3f3bee42d2	[NFC][llvm] Inclusive language: remove instance of master from Thumb2SizeReduction.cpp [NFC] As part of using inclusive language within the llvm project, this patch replaces master with main in `Thumb2SizeReduction.cpp`. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D114196	2021-11-19 16:07:58 -06:00
Mubashar Ahmad	8e47b83ec9	[AArch64][ARM] Enablement of Cortex-A710 Support Phabricator review: https://reviews.llvm.org/D113256	2021-11-18 10:58:05 +00:00
Zarko Todorovski	5b8bbbecfa	[NFC][llvm] Inclusive language: reword and remove uses of sanity in llvm/lib/Target Reworded or removed code comments that contain `sanity check` and `sanity test`.	2021-11-17 21:59:00 -05:00
Jay Foad	3264e95938	[CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress Delegate updating of LiveIntervals to each target's convertToThreeAddress implementation, instead of repairing LiveIntervals after the fact in TwoAddressInstruction::convertInstTo3Addr. Differential Revision: https://reviews.llvm.org/D113493	2021-11-17 10:16:47 +00:00
David Green	4c3bfdc7f1	[ARM] Fix GatherScatter AddLikeOr condition	2021-11-15 09:44:41 +00:00
Kazu Hirata	d243cbf8ea	[llvm] Use isa instead of dyn_cast (NFC)	2021-11-14 19:40:46 -08:00
Kazu Hirata	efa896e5f7	[Target] Use SDNode::uses (NFC)	2021-11-12 21:23:04 -08:00
Kazu Hirata	2ca45adf24	[CodeGen, Target] Use MachineRegisterInfo::use_operands (NFC)	2021-11-11 22:28:55 -08:00
Benjamin Kramer	194897eccf	[ARM] Fix unused variable warning in Release builds	2021-11-09 18:55:52 +01:00
Ard Biesheuvel	a19da876ab	[ARM] implement support for TLS register based stack protector Implement support for loading the stack canary from a memory location held in the TLS register, with an optional offset applied. This is used by the Linux kernel to implement per-task stack canaries, which is impossible on SMP systems when using a global variable for the stack canary. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D112768	2021-11-09 18:19:47 +01:00
Ard Biesheuvel	2caf85ad7a	[ARM] implement LOAD_STACK_GUARD for remaining targets Currently, LOAD_STACK_GUARD on ARM is only implemented for Mach-O targets, and other targets rely on the generic support which may result in spilling of the stack canary value or address, or may cause it to be kept in a callee save register across function calls, which means they essentially get spilled as well, only by the callee when it wants to free up this register. So let's implement LOAD_STACK_GUARD for other targets as well. This ensures that the load of the stack canary is rematerialized fully in the epilogue. This code was split off from D112768: [ARM] implement support for TLS register based stack protector for which it is a prerequisite. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D112811	2021-11-08 22:59:15 +01:00
Kazu Hirata	41ef3187e0	[ARM, X86] Use MachineBasicBlock::{predecessors,successors} (NFC)	2021-11-07 09:53:16 -08:00
David Green	091244023a	[ARM] Move VPTBlock pass after post-ra scheduling Currently when tail predicating loops, vpt blocks need to be created with the vctp predicate in case we need to revert to non-tail predicated form. This has the unfortunate side effect of severely hampering post-ra scheduling at times as the instructions are already stuck in vpt blocks, not allowed to be independently ordered. This patch addresses that by just moving the creation of VPT blocks later in the pipeline, after post-ra scheduling has been performed. This allows more optimal scheduling post-ra before the vpt blocks are created, leading to more optimal tail predicated loops. Differential Revision: https://reviews.llvm.org/D113094	2021-11-04 18:42:12 +00:00
David Green	3bc586b9aa	[ARM] Treat MVE gather add-like-or's like adds LLVM has the habit of turning adds with no common bits set into ors, which means we need to detect them and treat them like adds again in the MVE gather/scatter lowering pass. Differential Revision: https://reviews.llvm.org/D112922	2021-11-03 11:41:06 +00:00
David Green	d36dd1f842	[ARM] Push gather/scatter shl index updates out of loops This teaches the MVE gather scatter lowering pass that SHL is essentially the same as Mul, where we are able to optimize the induction of a gather/scatter address by pushing them out of loops. https://alive2.llvm.org/ce/z/wG4VyT Differential Revision: https://reviews.llvm.org/D112920	2021-11-03 11:00:05 +00:00
Yi Kong	803d4f8a35	[ARM][AsmParser] Don't emit "deprecated instruction in IT block" warning if requested Also fixed formatting in AsmMatcherEmitter because it was confusing. Differential Revision: https://reviews.llvm.org/D112993	2021-11-03 17:18:04 +08:00
Daniel Kiss	d8075e8781	Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This is relanding commit `da1d1a0869` . This patch additionally addresses failures found in buildbots & post review comments. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-28 21:45:09 +02:00
Daniel Kiss	66e03db814	Revert "Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."" This reverts commit `b6420e575f`.	2021-10-28 17:24:53 +02:00
Daniel Kiss	b6420e575f	Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This is relanding commit `da1d1a0869` . This patch additionally addresses failures found in buildbots & post review comments. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-28 16:49:19 +02:00
Ard Biesheuvel	d7e089f2d6	[ARM] Use hardware TLS register in Thumb2 mode when -mtp=cp15 is passed In ARM mode, passing -mtp=cp15 forces the use of an inline MRC system register read to move the thread pointer value into a register. Currently, in Thumb2 mode, -mtp=cp15 is ignored, and a call to the __aeabi_read_tp helper is emitted instead. This is inconsistent, and breaks the Linux/ARM build for Thumb2 targets, as the Linux kernel does not provide an implementation of __aeabi_read_tp. Reviewed By: nickdesaulniers, peter.smith Differential Revision: https://reviews.llvm.org/D112600	2021-10-27 16:42:11 -07:00
Daniel Kiss	894ddba1c9	Revert "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This reverts commit `da1d1a0869`.	2021-10-27 14:29:35 +02:00
Daniel Kiss	da1d1a0869	[ARM] __cxa_end_cleanup should be called instead of _UnwindResume. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-27 10:40:00 +02:00
Kazu Hirata	d8e4170b0a	Ensure newlines at the end of files (NFC)	2021-10-23 08:45:29 -07:00
Craig Topper	04c184bba7	[TargetLowering] Simplify the interface of expandABS. NFC Instead of returning a bool to indicate success and a separate SDValue, return the SDValue and have the callers check if it is null. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112331	2021-10-22 10:22:23 -07:00
Simon Pilgrim	71e39e3f18	[ADT] Add APInt::isNegatedPowerOf2() helper Inspired by D111968, provide a isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos..... Differential Revision: https://reviews.llvm.org/D111998	2021-10-19 14:38:21 +01:00
Alexandros Lamprineas	04dc68710a	[DebugInfo][ARM] Fix incorrect debug information for RWPI accessed globals When compiling for the RWPI relocation model the debug information is wrong: * the debug location is described as { DW_OP_addr Var } instead of { DW_OP_constNu Var DW_OP_bregX 0 DW_OP_plus } * the relocation type is R_ARM_ABS32 instead of R_ARM_SBREL32 Differential Revision: https://reviews.llvm.org/D111404	2021-10-18 21:29:46 +01:00
John Brawn	082fa56819	[ARM] Fix MOVCC peephole to not use an incorrect register class The MOVCC peephole eliminates a MOVCC by making one of its inputs a conditional instruction, but when doing this it should be using both inputs of the MOVCC to decide on the register class to use as otherwise we can get an error when using -verify-machineinstrs. Differential Revision: https://reviews.llvm.org/D111714	2021-10-15 10:54:26 +01:00
Andrew Savonichev	dc8a41de34	[ARM] Simplify address calculation for NEON load/store The patch attempts to optimize a sequence of SIMD loads from the same base pointer: %0 = gep float, float base, i32 4 %1 = bitcast float* %0 to <4 x float>* %2 = load <4 x float>, <4 x float>* %1 ... %n1 = gep float, float base, i32 N %n2 = bitcast float* %n1 to <4 x float>* %n3 = load <4 x float>, <4 x float>* %n2 For AArch64 the compiler generates a sequence of LDR Qt, [Xn, #16]. However, 32-bit NEON VLD1/VST1 lack the [Wn, #imm] addressing mode, so the address is computed before every ld/st instruction: add r2, r0, #32 add r0, r0, #16 vld1.32 {d18, d19}, [r2] vld1.32 {d22, d23}, [r0] This can be improved by computing address for the first load, and then using a post-indexed form of VLD1/VST1 to load the rest: add r0, r0, #16 vld1.32 {d18, d19}, [r0]! vld1.32 {d22, d23}, [r0] In order to do that, the patch adds more patterns to DAGCombine: - (load (add ptr inc1)) and (add ptr inc2) are now folded if inc1 and inc2 are constants. - (or ptr inc) is now recognized as a pointer increment if ptr is sufficiently aligned. In addition to that, we now search for all possible base updates and then pick the best one. Differential Revision: https://reviews.llvm.org/D108988	2021-10-14 15:23:10 +03:00
David Green	860b4479dc	[ARM] Be more explicit about disabling CombineBaseUpdate for MVE. This shouldn't be called for non-neon targets at the moment in either case, but it is good to be explicit about the CombineBaseUpdate being a NEON function, not expecting to be run under MVE.	2021-10-11 21:51:45 +01:00
Victor Campos	3550e242fa	[Clang][ARM][AArch64] Add support for Armv9-A, Armv9.1-A and Armv9.2-A armv9-a, armv9.1-a and armv9.2-a can be targeted using the -march option both in ARM and AArch64. - Armv9-A maps to Armv8.5-A. - Armv9.1-A maps to Armv8.6-A. - Armv9.2-A maps to Armv8.7-A. - The SVE2 extension is enabled by default on these architectures. - The cryptographic extensions are disabled by default on these architectures. The Armv9-A architecture is described in the Arm® Architecture Reference Manual Supplement Armv9, for Armv9-A architecture profile (https://developer.arm.com/documentation/ddi0608/latest). Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D109517	2021-10-11 17:44:09 +01:00
Reid Kleckner	b3a6d096d7	Fix shlib builds for all lib/Target/*/TargetInfo libs They all must depend on MC now that the target registry is in MC. Also fix llvm-cxxdump	2021-10-08 15:21:13 -07:00
Reid Kleckner	89b57061f7	Move TargetRegistry.(h|cpp) from Support to MC This moves the registry higher in the LLVM library dependency stack. Every client of the target registry needs to link against MC anyway to actually use the target, so we might as well move this out of Support. This allows us to ensure that Support doesn't have includes from MC/*. Differential Revision: https://reviews.llvm.org/D111454	2021-10-08 14:51:48 -07:00
Michael Forster	53801a59eb	Fix two unused-variable warnings.	2021-10-07 15:17:58 +02:00
David Green	73346f5848	[ARM] Introduce a MQPRCopy Currently when creating tail predicated loops, we need to validate that all the live-outs of a loop will be equivalent with and without tail predication, and if they are not we cannot legally create a tail-predicated loop, leaving expensive vctp and vpst instructions in the loop. These notably can include register-allocation instructions like stack loads and stores, and copies lowered from COPYs to MVE_VORRs. Instead of trying to prove this is valid late in the pipeline, this patch introduces a MQPRCopy pseudo instruction that COPY is lowered to. This can then either be converted to a MVE_VORR where possible, or to a couple of VMOVD instructions if not. This way they do not behave differently within and outside of tail-predicated regions, and we can know by construction that they are always valid. The idea is that we can do the same with stack loads and stores, converting them to VLDR/VSTR or VLDM/VSTM where required to prove tail predication is always valid. This does unfortunately mean inserting multiple VMOVD instructions, instead of a single MVE_VORR, but my experiments show it to be an improvement in general. Differential Revision: https://reviews.llvm.org/D111048	2021-10-07 12:52:12 +01:00
Itay Bookstein	40ec1c0f16	[IR][NFC] Rename getBaseObject to getAliaseeObject To better reflect the meaning of the now-disambiguated {GlobalValue, GlobalAlias}::getBaseObject after breaking off GlobalIFunc::getResolverFunction (D109792), the function is renamed to getAliaseeObject.	2021-10-06 19:33:10 -07:00
Pengxuan Zheng	b0045f5595	[ARM] Fix a bug in finding a pair of extracts to create VMOVRRD D100244 missed a check on the ResNo of the extract's operand 0 when finding a pair of extracts to combine into a VMOVRRD (extract(x, n); extract(x, n+1) -> VMOVRRD(extract x, n/2)). As a result, it can incorrectly pair an extract(x, n) with another extract(x:3, n+1) for example. This patch fixes the bug by adding the proper check on ResNo. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D111188	2021-10-06 10:03:32 -07:00
Amara Emerson	8bde5e58c0	Delay outgoing register assignments to last. The delayed stack protector feature which is currently used for SDAG (and thus allows for more commonly generating tail calls) depends on being able to extract the tail call into a separate return block. To do this it also has to extract the vreg->physreg copies that set up the call's arguments, since if it doesn't then the call inst ends up using undefined physregs in its new spliced block. SelectionDAG implementations can do this because they delay emitting register copies until after the stack arguments are set up. GISel however just processes and emits the arguments in IR order, so stack arguments always end up last, and thus this breaks the code that looks for any register arg copies that precede the call instruction. This patch adds a thunk argument to the assignValueToReg() and custom assignment hooks. For outgoing arguments, register assignments use this return param to return a thunk that does the actual generating of the copies. We collect these until all the outgoing stack assignments have been done and then execute them, so that the copies (and perhaps some artifacts like G_SEXTs) are placed after any stores. Differential Revision: https://reviews.llvm.org/D110610	2021-10-04 12:33:20 -07:00
Jay Foad	a9bceb2b05	[APInt] Stop using soft-deprecated constructors and methods in llvm. NFC. Stop using APInt constructors and methods that were soft-deprecated in D109483. This fixes all the uses I found in llvm, except for the APInt unit tests which should still test the deprecated methods. Differential Revision: https://reviews.llvm.org/D110807	2021-10-04 08:57:44 +01:00
David Green	20b1a16a69	[ARM] Mark <= -1 immediate constant as cheap A <= -1 constant on a compare can be converted to a < 0 operation, which is usually cheap. If we mark the constant as cheap, preventing hoisting, we allow that fold to happen even across different blocks. Differential Revision: https://reviews.llvm.org/D109360	2021-10-03 19:30:08 +01:00
Kazu Hirata	c1e32b3fc0	[Target] Migrate from getNumArgOperands to arg_size (NFC) Note that getNumArgOperands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-10-02 12:06:29 -07:00
David Green	f9aa8623fe	[ARM] Add more MVE intrinsics to sink splats to This adds a few more unpredicated intrinsics to sink splats to, in order to create more qr instruction variants. Notably this includes saddsat/uaddsat but also some of the unpredicated mve intrinsics. Differential Revision: https://reviews.llvm.org/D110333	2021-09-30 14:41:23 +01:00
David Green	fdd8c10959	[ARM] Delay reverting WLS in arm-block-placement As we have to split blocks, we may be left in an invalid loop state after a WLS is reverted to a DLS. Instead remember the WLS that could not be fixed and revert them after finishing processing all other loops. Differential Revision: https://reviews.llvm.org/D110567	2021-09-28 15:38:29 +01:00
David Green	2c53215e99	[ARM] Skip debug info in recomputeVPTBlockMask The ARMLowOverheadLoops pass recalculates VPT block masks when it converts VCMP's inside VPT blocks into VPT's. The function to do so doesn't seem to handle debug info though, leading to invalid block creation or asserts at compile time. Make sure the function skips any debug info between the MVE instructions it inspects. Differential Revision: https://reviews.llvm.org/D110564	2021-09-28 14:58:13 +01:00
David Green	bb2d23dcd4	[ARM] Improve detection of fallthrough when aligning blocks We align non-fallthrough branches under Cortex-M at O3 to lead to fewer instruction fetches. This improves that for the block after a LE or LETP. These blocks will still have terminating branches until the LowOverheadLoops pass is run (as they are not handled by analyzeBranch, the branch is not removed until later), so canFallThrough will return false. These extra branches will eventually be removed, leaving a fallthrough, so treat them as such and don't add unnecessary alignments. Differential Revision: https://reviews.llvm.org/D107810	2021-09-27 11:21:21 +01:00
David Green	883758ed48	[ARM] Fix Arm block placement creating branches after jump tables. Given: - A jump table - Which jumps to the next block - The next block ends in a WLS - Where the WLS conditionally jumps to block earlier in the program. The Arm block placement pass would attempt to move the block containing the WLS earlier, as the WLS instruction can only branch forward. In doing so it would add a branch from the jumptable block to the WLS block, thinking it previously fell-through. This in itself would be fine, if a little inefficient, but the constant island pass expects all instructions after a jump-table branch to have been removed by analyzeBranch. So it gets confused and can assign the same labels to multiple jump table blocks. I've changed the condition to the same as used in analyzeBranch.	2021-09-25 11:32:25 +01:00
Jay Foad	6cef28ed2d	[TII] Remove the MFI argument to convertToThreeAddress. NFC. This simplifies the API and addresses a FIXME in TwoAddressInstructionPass::convertInstTo3Addr. Differential Revision: https://reviews.llvm.org/D110229	2021-09-23 08:58:46 +01:00
Simon Pilgrim	b1f38a27f0	[Target][CodeGen] Remove default CostKind arguments on inner/impl TTI overrides Based off a discussion on D110100, we should be avoiding default CostKinds whenever possible. This initial patch removes them from the 'inner' target implementation callbacks - these should only be used by the main TTI calls, so this should guarantee that we don't cause changes in CostKind by missing it in an inner call. This exposed a few missing arguments in getGEPCost and reduction cost calls that I've cleaned up. Differential Revision: https://reviews.llvm.org/D110242	2021-09-22 15:28:08 +01:00
David Green	02cd8a6b91	[ARM] Allow smaller VMOVL in tail predicated loops This allows VMOVL in tail predicated loops so long as the vector size the VMOVL is extending into is less than or equal to the size of the VCTP in the tail predicated loop. These cases represent a sign-extend-inreg (or zero-extend-inreg), which needn't block tail predication as in https://godbolt.org/z/hdTsEbx8Y. For this a vecsize has been added to the TSFlag bits of MVE instructions, which stores the size of the elements that the MVE instruction operates on. In the case of multiple sizes (such as a MVE_VMOVLs8bh that extends from i8 to i16), the largest size is chosen. The sizes are encoded as 00 = i8, 01 = i16, 10 = i32 and 11 = i64, which often (but not always) comes from the instruction encoding directly. A unit test was added, and although only a subset of the vecsizes are currently used, the rest should be useful for other cases. Differential Revision: https://reviews.llvm.org/D109706	2021-09-22 12:07:52 +01:00
Kazu Hirata	85b4b21c8b	[llvm] Use make_early_inc_range (NFC)	2021-09-20 19:30:02 -07:00
David Green	3f90df22f1	[ARM] MVE reverse shuffles. The vectorizer can sometimes make reverse shuffles from indices that count down. In MVE, we don't have a 128bit rev instruction, but we can select this to a VREV64 with some lane movs to swap the two halves. Ideally this would use VMOVD's, but only gets as far as VMOVS's at the moment. Differential Revision: https://reviews.llvm.org/D69510	2021-09-20 13:48:01 +01:00
Kazu Hirata	84b07c9b3a	[llvm] Use pop_back_val (NFC)	2021-09-19 13:44:23 -07:00
David Green	1da52ef294	[ARM] Add VGETLANEu patterns for v4f16 and v8f16 These were apparently missing, having no pattern that could convert a VGETLANEu of a v4f16 to an i32. Added bf16 whilst here, following the same code.	2021-09-19 14:25:21 +01:00
David Green	cb5e3f7959	[ARM] Prevent large integer VQDMULH pattern crashes Put a limit on the size of constant integers we test when looking for VQDMULH, to prevent it from crashing from values more than 64bits.	2021-09-18 18:47:02 +01:00
Alexandros Lamprineas	1bd5ea968e	[ARM] Mitigate the cve-2021-35465 security vulnerability. Recently a vulnerability was found in the implementation of the VLLDM instruction in the Arm Cortex-M33, Cortex-M35P and Cortex-M55. If the VLLDM instruction is abandoned due to an exception when it is partially completed, it is possible for a subsequent non-secure handler to access and modify the partially restored register values. This vulnerability is identified as CVE-2021-35465. The mitigation sequence varies between v8-m and v8.1-m as follows: v8-m.main --------- mrs r5, control tst r5, #8 /* CONTROL_S.SFPA */ it ne .inst.w 0xeeb00a40 /* vmovne s0, s0 */ 1: vlldm sp /* Lazy restore of d0-d16 and FPSCR. */ v8.1-m.main ----------- vscclrm {vpr} /* Clear VPR. */ vlldm sp /* Lazy restore of d0-d16 and FPSCR. */ More details on developer.arm.com/support/arm-security-updates/vlldm-instruction-security-vulnerability Differential Revision: https://reviews.llvm.org/D109157	2021-09-16 12:56:43 +01:00
Alexandros Lamprineas	61f25daa8d	[ARM][CMSE] Clear the secure fp-registers when using softfp abi. When expanding the non-secure call instruction we are emitting code to clear the secure floating-point registers only if the targeted architecture has floating-point support. The potential problem is when the source code containing non-secure calls is built with -mfloat-abi=soft but some other part of the system has been built with -mfloat-abi=softfp (soft and softfp are compatible as they use the same procedure calling standard). In this case floating-point registers could leak to non-secure state as the non-secure call won't have cleared them, assuming no floating point has been used. Differential Revision: https://reviews.llvm.org/D109153	2021-09-16 12:56:43 +01:00
Martin Storsjö	b33a43e57c	[ARM] Move fetching of ARMSubtarget into the scopes that need it. NFC. This was requested in D38253, but missed back then. Differential Revision: https://reviews.llvm.org/D109046	2021-09-15 15:03:20 +03:00
David Green	a2332d5332	[ARM] Prevent continuous folding of SUBC Under some situations under Thumb1, we could be stuck in an infinite loop recombining the same instruction. This puts a limit on that, not combining SUBC with SUBE repeatedly.	2021-09-15 11:23:32 +01:00
David Green	5a6dfbb8cd	[ARM] Teach DemandedVectorElts about VMOVN lanes The class of instructions that write to narrow top/bottom lanes only demand the even or odd elements of the input lanes. Which means that a pair of VMOVNT; VMOVNB demands no lanes from the original input. This teaches that to instcombine from the target hooks available through ARMTTIImpl. Differential Revision: https://reviews.llvm.org/D109325	2021-09-14 11:05:31 +01:00
Nikita Popov	45c467346a	[LAA] Pass access type to getPtrStride() Pass the access type to getPtrStride(), so it is not determined from the pointer element type. Many cases still fetch the element type at a higher level though, so this only partially addresses the issue.	2021-09-11 19:16:49 +02:00
Kazu Hirata	c9fca53af1	[CodeGen, Target] Use pred_empty and succ_empty (NFC)	2021-09-10 11:11:31 -07:00
David Green	deefeffb5d	[ARM] Remove unused tblgen arguments. NFC As per D109359, this removes or makes use of some of the existing unused NEON and base ARM tblgen arguments.	2021-09-10 18:03:54 +01:00
David Green	6b7cdb40da	[ARM] Remove unused tblgen arguments. NFCI As per D109359, this removes or makes use of some of the existing unused MVE tblgen arguments.	2021-09-10 15:06:31 +01:00
Florian Hahn	5d1a6d0d1a	[ARM] Remove unnecessary use of replaceSymbolicStrideSCEV (NFC). When passing an empty strides map, there's nothing to replace for replaceSymbolicStrideSCEV and it just returns the SCEV for Ptr. There should be no need to call the function. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D109462	2021-09-10 10:47:26 +02:00
Craig Topper	9af8f1b18e	[SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode. Soft deprecate isNullValue/isAllOnesValue and update in tree callers. This matches the changes to the APInt interface from D109483. Reviewed By: lattner Differential Revision: https://reviews.llvm.org/D109535	2021-09-09 13:28:30 -07:00
Chris Lattner	d51da74889	[CodeGen] Use DAG.getAllOnesConstant where possible to simplify code. NFC.	2021-09-09 10:22:51 -07:00
Chris Lattner	735f46715d	[APInt] Normalize naming on keep constructors / predicate methods. This renames the primary methods for creating a zero value to `getZero` instead of `getNullValue` and renames predicates like `isAllOnesValue` to simply `isAllOnes`. This achieves two things: 1) This starts standardizing predicates across the LLVM codebase, following (in this case) ConstantInt. The word "Value" doesn't convey anything of merit, and is missing in some of the other things. 2) Calling an integer "null" doesn't make any sense. The original sin here is mine and I've regretted it for years. This moves us to calling it "zero" instead, which is correct! APInt is widely used and I don't think anyone is keen to take massive source breakage on anything so core, at least not all in one go. As such, this doesn't actually delete any entrypoints, it "soft deprecates" them with a comment. Included in this patch are changes to a bunch of the codebase, but there are more. We should normalize SelectionDAG and other APIs as well, which would make the API change more mechanical. Differential Revision: https://reviews.llvm.org/D109483	2021-09-09 09:50:24 -07:00
Peter Smith	e63455d5e0	[MC] Use local MCSubtargetInfo in writeNops On some architectures such as Arm and X86 the encoding for a nop may change depending on the subtarget in operation at the time of encoding. This change replaces the per module MCSubtargetInfo retained by the targets AsmBackend in favour of passing through the local MCSubtargetInfo in operation at the time. On Arm using the architectural NOP instruction can have a performance benefit on some implementations. For Arm I've deleted the copy of the AsmBackend's MCSubtargetInfo to limit the chances of this causing problems in the future. I've not done this for other targets such as X86 as there is more frequent use of the MCSubtargetInfo and it looks to be for stable properties that we would not expect to vary per function. This change required threading STI through MCNopsFragment and MCBoundaryAlignFragment. I've attempted to take into account the in tree experimental backends. Differential Revision: https://reviews.llvm.org/D45962	2021-09-07 15:46:19 +01:00
Peter Smith	5e71839f77	[MC] Add MCSubtargetInfo to MCAlignFragment In preparation for passing the MCSubtargetInfo (STI) through to writeNops so that it can use the STI in operation at the time, we need to record the STI in operation when a MCAlignFragment may write nops as padding. The STI is currently unused, a further patch will pass it through to writeNops. There are many places that can create an MCAlignFragment, in most cases we can find out the STI in operation at the time. In a few places this isn't possible as we are in initialisation or finalisation, or are emitting constant pools. When possible I've tried to find the most appropriate existing fragment to obtain the STI from, when none is available use the per module STI. For constant pools we don't actually need to use EmitCodeAlign as the constant pools are data anyway so falling through into it via an executable NOP is no better than falling through into data padding. This is a prerequisite for D45962 which uses the STI to emit the appropriate NOP for the STI. Which can differ per fragment. Note that involves an interface change to InitSections. It is now called initSections and requires a SubtargetInfo as a parameter. Differential Revision: https://reviews.llvm.org/D45961	2021-09-07 15:46:19 +01:00
Ben Shi	63ca9371c7	[ARM] Implement target hook function to decide folding (mul (add x, c1), c2) Prevent the folding in DAGCombine if it leads to worse code. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D109124	2021-09-07 15:42:43 +08:00
David Green	adfd12e6d1	[ARM] Add patterns for store(fptosisat(..)) As an extension to D107866, this adds store(fptosisat(..)) patterns, similar to the existing fptosi patterns, to prevent unnecessarily moving into gpr regs where we can use fp stores directly. Differential Revision: https://reviews.llvm.org/D108378	2021-09-03 19:22:11 +01:00
David Green	f37e132263	[ARM] Add VFP lowering for fptosi.sat This extends D107865 to the VFP instructions, lowering llvm.fptosi.sat and llvm.fptoui.sat to VCVT instructions that inherently perform the saturate. Differential Revision: https://reviews.llvm.org/D107866	2021-09-03 18:11:08 +01:00
David Green	9cb8f4d1ad	[ARM] Add a tail-predication loop predicate register The semantics of tail predication loops means that the value of LR as an instruction is executed determines the predicate. In other words: mov r3, #3 DLSTP lr, r3 // Start tail predication, lr==3 VADD.s32 q0, q1, q2 // Lanes 0,1 and 2 are updated in q0. mov lr, #1 VADD.s32 q0, q1, q2 // Only first lane is updated. This means that the value of lr cannot be spilled and re-used in tail predication regions without potentially altering the behaviour of the program. More lanes than required could be stored, for example, and in the case of a gather those lanes might not have been setup, leading to alignment exceptions. This patch adds a new lr predicate operand to MVE instructions in order to keep a reference to the lr that they use as a tail predicate. It will usually hold the zeroreg meaning not predicated, being set to the LR phi value in the MVETPAndVPTOptimisationsPass. This will prevent it from being spilled anywhere that it needs to be used. A lot of tests needed updating. Differential Revision: https://reviews.llvm.org/D107638	2021-09-02 13:42:58 +01:00
David Green	49476a4d66	[ARM] Add MVE lowering for fptosi.sat This adds lowering of the llvm.fptosi.sat and llvm.fptoui.sat intrinsics, selecting a VCVT instruction which under MVE will inherently perform the saturate. Differential Revision: https://reviews.llvm.org/D107865	2021-09-01 22:38:47 +01:00
Philip Reames	29fa37ec9f	[SCEV] If max BTC is zero, then so is the exact BTC [2 of 2] This extends D108921 into a generic rule applied to constructing ExitLimits along all paths. The remaining paths (primarily howFarToZero) don't have the same reasoning about UB sensitivity as the howManyLessThan ones did. Instead, the remaining cause for max counts being more precise than exact counts is that we apply context sensitive loop guards on the max path, and not on the exact path. That choice is mildly suspect, but out of scope of this patch. The MVETailPredication.cpp change deserves a bit of explanation. We were previously figuring out that two SCEVs happened to be equal because they happened to be identical. When we optimized one with context sensitive information, but not the other, we lost the ability to prove them equal. So, cover this case by subtracting and then applying loop guards again. Without this, we see changes in test/CodeGen/Thumb2/mve-blockplacement.ll Differential Revision: https://reviews.llvm.org/D109015	2021-09-01 11:51:48 -07:00
David Green	22c384129e	[ARM] Add missing validForTailPredication for VMINNM/VMAXNM Apparently this was missing, preventing the generation of tail predication loops containing VMINNM, VMAXNM, VMINNMA and VMAXNMA.	2021-08-31 18:19:03 +01:00
Simon Wallis	f417b660ee	[Arm] Add assert in T2 Imm7s code emitter Add assert to provoke failure in object file output, not just in disassembly output. Reviewed By: yroux Differential Revision: https://reviews.llvm.org/D107259	2021-08-31 08:16:48 +01:00
David Green	efa340fbd2	[ARM] Workaround tailpredication min/max costmodel The min/max intrinsics are not yet canonical, but when they are the tail predications analysis will change from treating them like icmp to treating them like intrinsics. Unfortunately, they can currently produce better code by not being tail predicated thanks to the vectorizer picking higher VF's and the backend folding to better instructions (especially for saturate patterns). In the long run we will need to improve the vectorizers cost modelling, recognizing the instruction directly, but in the meantime this treats min/max as before to prevent performance regressions.	2021-08-30 19:19:51 +01:00
Nikita Popov	0529e2e018	[InstrInfo] Use 64-bit immediates for analyzeCompare() (NFCI) The backend generally uses 64-bit immediates (e.g. what MachineOperand::getImm() returns), so use that for analyzeCompare() and optimizeCompareInst() as well. This avoids truncation for targets that support immediates larger than 32-bit. In particular, we can avoid the bugprone value normalization hack in the AArch64 target. This is a followup to D108076. Differential Revision: https://reviews.llvm.org/D108875	2021-08-30 19:46:04 +02:00
Nick Desaulniers	5c91b98c5d	[ARMISelLowering] avoid emitting libcalls to __mulodi4() __has_builtin(__builtin_mul_overflow) returns true for 32b ARM targets, but Clang is deferring to compiler RT when encountering `long long` types. This breaks sanitizer builds of the Linux kernel that are using __builtin_mul_overflow with these types for these targets. If the semantics of __has_builtin mean "the compiler resolves these, always" then we shouldn't conditionally emit a libcall. This will still need to be worked around in the Linux kernel in order to continue to support allmodconfig builds of the Linux kernel for this target with older releases of clang. Link: https://bugs.llvm.org/show_bug.cgi?id=28629 Link: https://github.com/ClangBuiltLinux/linux/issues/1438 Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D108842	2021-08-27 15:14:47 -07:00
Martin Storsjö	039b469b85	[ARM] Allow using ';' as asm statement separator in MSVC mode This does the same as D96259, but for ARM, just like AArch64, using the same comment char as for ELF and MinGW mode. As the assembly input/output of LLVM is GAS style, trying to match what MS armasm.exe does isn't needed (because the comment char used is the least concern when it comes to that; all directives differ too). If a separate armasm compatible mode is implemented, it can use its own comment style (just like llvm-ml implements MS ml.exe compatible assembly parsing). This fixes building compiler-rt assembly files for ARM in MSVC mode. The updated testcase literals-comments.s was only intended to make sure that '#' isn't interpreted as a comment char. Differential Revision: https://reviews.llvm.org/D107251	2021-08-24 11:01:49 +03:00
David Green	605489d593	[ARM] Fix VQDMULH fold for scalar smin Add a variant of mve-vqdmulh tests that uses min/max intrinsics directly, including a scalar test that shows it misbehaving for min intrinsics and a fix for the combine to prevent it from misbehaving.	2021-08-21 16:33:18 +01:00
Arthur Eubanks	46cf82532c	[NFC] Replace Function handling of attributes with less confusing calls To avoid magic constants and confusing indexes.	2021-08-17 21:05:40 -07:00
Simon Pilgrim	1e770f0388	[ARM] ARMDAGToDAGISel::tryReadRegister/tryWriteRegister - don't dereference dyn_cast<> results. dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the cast is correct. Fixes static analyser warnings.	2021-08-17 18:40:59 +01:00
David Green	52e0cf9d61	[ARM] Enable subreg liveness This enables subreg liveness in the arm backend when MVE is present, which allows the register allocator to detect when subregisters are alive/dead, compared to only acting on full registers. This can help produce better code on MVE with the way MQPR registers are made up of SPR registers, but is especially helpful for MQQPR and MQQQQPR registers, where there are very few "registers" available and being able to split them up into subregs can help produce much better code. Differential Revision: https://reviews.llvm.org/D107642	2021-08-17 14:10:33 +01:00
David Green	62e892fa2d	[ARM] Add MQQPR and MQQQQPR spill and reload pseudo instructions As a part of D107642, this adds pseudo instructions for MQQPR and MQQQQPR register classes, that can spill and reload entire registers whilst keeping them combined, not splitting them into multiple D subregs that a VLDMIA/VSTMIA would use. This can help certain analyses, and helps to prevent verifier issues with subreg liveness.	2021-08-17 13:51:34 +01:00
David Green	9236dea255	[ARM] Create MQQPR and MQQQQPR register classes Similar to the MQPR register class as the MVE equivalent to QPR, this adds MQQPR and MQQQQPR register classes for the MVE equivalents of QQPR and QQQQPR registers. The MVE MQPR seems to have worked out quite well, and adding MQQPR and MQQQQPR allows us to specify the number of registers a little more accurately, calculating register pressure limits a little better. Differential Revision: https://reviews.llvm.org/D107463	2021-08-16 22:58:12 +01:00
Simon Pilgrim	d6fe8d37c6	[DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) -> concat_vectors(x,y,a,b) Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal. This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands. Differential Revision: https://reviews.llvm.org/D107597	2021-08-16 16:06:54 +01:00
Arthur Eubanks	92ce6db9ee	[NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr() This is more consistent with similar methods.	2021-08-13 11:09:18 -07:00
David Green	ae9a346ef8	[ARM] Fix DAG combine loop in reduction distribution Given a constant operand, the MVE and DAGCombine combines could fight, each redistributing in the opposite order. Add a guard to the MVE vecreduce distribution to prevent that.	2021-08-12 16:37:39 +01:00
David Green	8c50b5fbfe	[ARM] Add extra debug messages for validating live outs. NFC We are running into more and more cases where the liveouts of low overhead loops do not validate. Add some extra debug messages to make it clearer why.	2021-08-11 10:35:53 +01:00
David Green	c140ff493e	[ARM] Change a couple of instances of LiveRegs.contains to !LiveRegs.available This changes a couple of calls to LiveRegs.contains to !LiveRegs.available, one in Thumb1FrameLoweringInfo (which modifies a test to look more correct to me, given r7 should be the frame pointer so is not available), and another in the ARMLoadStoreOptimizer, that I don't have a test for, it was just found by inspection. Differential Revision: https://reviews.llvm.org/D107454	2021-08-10 09:53:26 +01:00
Amara Emerson	2b067e3335	Change TargetLowering::canMergeStoresTo() to take a MF instead of DAG. DAG is unnecessary and we need this hook to implement store merging on GlobalISel too.	2021-08-06 12:57:53 -07:00
David Green	77e8f4eeee	[ARM] Define ComplexPatternFuncMutatesDAG Some of the Arm complex pattern functions call canExtractShiftFromMul, which can modify the DAG in-place. For this to be valid and handled successfully we need to define ComplexPatternFuncMutatesDAG. Differential Revision: https://reviews.llvm.org/D107476	2021-08-06 17:35:11 +01:00
Simon Pilgrim	dbce6a8d9d	[ARM] Fold insert_subvector to concat_vectors D107068 fixed the same problem on aarch64 but the arm variant wasn't exposed in existing test coverage. I've copied the arm64-neon-copy tests (and stripped the intrinsic test from it) for testing on arm neon builds as well.	2021-08-06 11:21:31 +01:00
Amara Emerson	4fee756c75	Delete copy-ctor of MachineFrameInfo. I just hit a nasty bug when writing a unit test after calling MF->getFrameInfo() without declaring the variable as a reference. Deleting the copy-constructor also showed a place in the ARM backend which was doing the same thing, albeit it didn't impact correctness there from the looks of it.	2021-08-05 23:24:37 -07:00
Igor Kudrin	2c14798ead	[ARM][llvm-objdump] Annotate PC-relative memory operands of VLDR instructions This extends D105979 and adds support for VLDR instructions. Differential Revision: https://reviews.llvm.org/D105980	2021-08-05 14:11:11 +07:00
Igor Kudrin	ddbe812bcc	[ARM][llvm-objdump] Annotate PC-relative memory operands This implements `MCInstrAnalysis::evaluateMemoryOperandAddress()` for Arm so that the disassembler can print the target address of memory operands that use PC+immediate addressing. Differential Revision: https://reviews.llvm.org/D105979	2021-08-05 14:11:11 +07:00
Tomas Matheson	40650f27b5	[ARM][atomicrmw] Fix CMP_SWAP_32 expand assert This assert is intended to ensure that the high registers are not selected when it is passed to one of the thumb UXT instructions. However it was triggering even for 32 bit where no UXT instruction is emitted. Fixes PR51313. Differential Revision: https://reviews.llvm.org/D107363	2021-08-04 15:02:02 +01:00
Arthur Eubanks	ad25344620	[MC][CodeGen] Emit constant pools earlier Previously we would emit constant pool entries for ldr inline asm at the very end of AsmPrinter::doFinalization(). However, if we're emitting dwarf aranges, that would end all sections with aranges. Then if we have constant pool entries to be emitted in those same sections, we'd hit an assert that the section has already been ended. We want to emit constant pool entries before emitting dwarf aranges. This patch splits out arm32/64's constant pool entry emission into its own MCTargetStreamer virtual method. Fixes PR51208 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D107314	2021-08-03 20:55:31 -07:00
Roman Lebedev	6f6e9a867f	[BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop I'm not sure this is the best way to approach this, but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D107271	2021-08-03 00:57:26 +03:00
David Green	c423a586a7	[ARM] Remove setPreservesCFG from ARMBlockPlacement As of `2829391840` it no longer preserves the CFG, needing to split blocks in order to add DLS instructions.	2021-08-02 14:15:45 +01:00
Simon Pilgrim	0579050116	Fix MSVC signed/unsigned comparison warning. NFCI.	2021-08-02 11:23:43 +01:00
David Green	2829391840	[ARM] Revert WLSTP to DLSTP if the target block is out of range If the block target for a WLSTP instruction is known to be out of range, and cannot be fixed by the ARMBlockPlacementPass, we can relax it to a DLSTP (and cmp/branch) to still allow the creation of tail predicated loops. That is what this patch does, adding extra revert code to the fallback path of ARMBlockPlacementPass. Due to the code produced when reverting, this creates a DLSTP between a Bcc and a Br. As a DLS isn't necessarily a terminator we need to split the block to move the DLS/Br into. Differential Revision: https://reviews.llvm.org/D104709	2021-08-02 10:59:52 +01:00
David Green	15a1d7e839	[ARM] Switch order of creating VADDV and VMLAV. It can be beneficial to attempt to try the larger VMLAV patterns before VADDV, in case both may match the same code.	2021-07-31 16:28:52 +01:00
David Green	69cdadddec	[ARM] Distribute reductions based on ascending load offset This distributes reductions based on the relative offset of loads, if one is found from their operands. Given chains of reductions this will then sort them in ascending load order, which in turn can help simple prefetches latch on to increasing strides more easily. Differential Revision: https://reviews.llvm.org/D106569	2021-07-30 19:50:07 +01:00
David Green	532d05b714	[ARM] Attempt to distribute reductions This adds a combine for adds of reductions, distributing them so that they occur sequentially to enable better use of accumulating VADDVA instructions. It combines: add(X, add(vecreduce(Y), vecreduce(Z))) -> add(add(X, vecreduce(Y)), vecreduce(Z)) and add(add(A, reduce(B)), add(C, reduce(D))) -> add(add(add(A, C), reduce(B)), reduce(D)) These together distribute the add's so that more reductions can be selected to VADDVA. Differential Revision: https://reviews.llvm.org/D106532	2021-07-30 14:48:31 +01:00
David Green	4b56306762	[ARM] Turn vecreduce_add(add(x, y)) into vecreduce(x) + vecreduce(y) Under MVE we can use VADDV/VADDVA's to perform integer add reductions, so it can be beneficial to use more reductions than summing subvectors and reducing once. Especially for VMLAV/VMLAVA the mul can be incorporated into the reduction, producing less instructions. Some of the test cases currently get larger due to extra integer adds, but will be improved in a followup patch. Differential Revision: https://reviews.llvm.org/D106531	2021-07-30 10:10:41 +01:00
David Green	d4a2daa919	[ARM] Define a couple more ssub indexes. NFC Same as `91bd3ad128`, this doesn't really change anything but gives the registers better names than the ones tablegen would define. And fills in the missing gaps.	2021-07-29 23:00:35 +01:00
David Green	41cedb1c9a	[LV][ARM] Tighten up MLA reduction costing This makes a couple of changes to the costing of MLA reduction patterns, to more accurately cost various patterns that can come up from vectorization. - The Arm implementation of getExtendedAddReductionCost is altered to only provide costs for legal or smaller types. Larger than legal types need to be split, which currently does not work very well, especially for predicated reductions where the predicate may be legal but needs to be split. Currently we limit it to legal or smaller input types. - The getReductionPatternCost has learnt that reduce(ext(mul(ext, ext)) is a pattern that can come up, and can be treated the same as reduce(mul(ext, ext)) providing the extension types match. - And it has been adjusted to not count the ext in reduce(mul(ext, ext)) as part of a reduce(mul) pattern. Together these changes help to more accurately cost the mla reductions in cases such as where the extend types don't match or the extend opcodes are different, picking better vector factors that don't result in expanded reductions. Differential Revision: https://reviews.llvm.org/D106166	2021-07-28 12:50:58 +01:00
Anirudh Prasad	a8cfa4b9bd	[SystemZ][z/OS] Initial code to generate assembly files on z/OS - This patch consists of the bare basic code needed in order to generate some assembly for the z/OS target. - Only the .text and the .bss sections are added for now. - The relevant MCSectionGOFF/Symbol interfaces have been added. This enables us to print out the GOFF machine code sections. - This patch enables us to add simple lit tests wherever possible, and contribute to the testing coverage for the z/OS target - Further improvements and additions will be made in future patches. Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D106380	2021-07-27 11:29:15 -04:00
David Green	54c91c0c74	[ARM] Implement isLoad/StoreFromStackSlot for MVE stack stores accesses This implements the isLoadFromStackSlot and isStoreToStackSlot for MVE MVE_VSTRWU32 and MVE_VLDRWU32 functions. They behave the same as many other loads/stores, expecting a FI in Op1 and zero offset in Op2. At the same time this alters VLDR_P0_off and VSTR_P0_off to use the same code too, as they too should be returning VPR in Op0, take a FI in Op1 and zero offset in Op2. Differential Revision: https://reviews.llvm.org/D106797	2021-07-27 09:11:58 +01:00
David Green	010f8e3057	[ARM] Ensure correct regclass in distributing postinc The register class required for some MVE loads/stores is more constrained than the register we use when creating postinc. Make sure we constrain the register class to keep the code correct.	2021-07-26 14:26:38 +01:00
David Sherwood	0aff1798b5	[Analysis] Add simple cost model for strict (in-order) reductions I have added a new FastMathFlags parameter to getArithmeticReductionCost to indicate what type of reduction we are performing: 1. Tree-wise. This is the typical fast-math reduction that involves continually splitting a vector up into halves and adding each half together until we get a scalar result. This is the default behaviour for integers, whereas for floating point we only do this if reassociation is allowed. 2. Ordered. This now allows us to estimate the cost of performing a strict vector reduction by treating it as a series of scalar operations in lane order. This is the case when FP reassociation is not permitted. For scalable vectors this is more difficult because at compile time we do not know how many lanes there are, and so we use the worst case maximum vscale value. I have also fixed getTypeBasedIntrinsicInstrCost to pass in the FastMathFlags, which meant fixing up some X86 tests where we always assumed the vector.reduce.fadd/mul intrinsics were 'fast'. New tests have been added here: Analysis/CostModel/AArch64/reduce-fadd.ll Analysis/CostModel/AArch64/sve-intrinsics.ll Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll Differential Revision: https://reviews.llvm.org/D105432	2021-07-26 10:26:06 +01:00
David Green	ba42f6a4b5	[ARM] Pass SelectionDAG to methods that dont require DCI. NFC In these methods DCI is never used, only the DAG from it. Pass the DAG directly, cleaning up the code a little.	2021-07-21 22:11:09 +01:00
Tim Northover	19d2e42be2	ARM: don't return by popping PC if we have to adjust the stack afterwards. In mandatory tail calling conventions we might have to deallocate stack space used by our arguments before return. This happens after popping CSRs, so the pop cannot be turned into the return itself in this case. The else branch here was already a nop, so removing it as a tidy-up.	2021-07-21 09:35:14 +01:00
David Green	5561ad8b36	[ARM] Remove PromotedBitwiseVT for NEON types This removes the promotion of NEON AND, OR and XOR nodes to v2i32/v4i32, treating them the same as the AArch64 and MVE backends where we just add the relevant patterns for each legal type. This prevents a lot of bitcasts from being added to the DAG, which have the potential to make optimizations more difficult. It does mean adding extra patterns, and some codegen can change due to the types now being legal, not promoted. Differential Revision: https://reviews.llvm.org/D105588	2021-07-19 16:36:33 +01:00
David Green	eb1e95dbdf	[ARM] Extend more reductions during lowering This relaxes the VMLAV and VADDV reduction recognition code to handle smaller than legal types, extending them as needed. That was already handled for some reductions, this extends it to more types in a more generic way. If a smaller than legal value is found it is extended to the legal type as needed. Differential Revision: https://reviews.llvm.org/D106051	2021-07-19 08:58:03 +01:00

11846 Commits