llvm-project

Commit Graph

Author	SHA1	Message	Date
Fangrui Song	b0df70403d	[Target] llvm::Optional => std::optional The updated functions are mostly internal with a few exceptions (virtual functions in TargetInstrInfo.h, TargetRegisterInfo.h). To minimize changes to LLVMCodeGen, GlobalISel files are skipped. https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-04 22:43:14 +00:00
Kazu Hirata	20cde15415	[Target] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 20:36:06 -08:00
Filipp Zhinkin	945a1468c9	[ARM] Support all versions of AND, ORR, EOR and BIC in optimizeCompareInstr Combine cmp with zero and all versions of AND, ORR, EOR and BIC instructions into S-suffixed versions. Related issue: https://github.com/llvm/llvm-project/issues/57122 Reviewed By: efriedma, samtebbs Differential Revision: https://reviews.llvm.org/D131786	2022-10-01 12:41:37 +03:00
Joe Loser	5e96cea1db	[llvm] Use std::size instead of llvm::array_lengthof LLVM contains a helpful function for getting the size of a C-style array: `llvm::array_lengthof`. This is useful prior to C++17, but not as helpful for C++17 or later: `std::size` already has support for C-style arrays. Change call sites to use `std::size` instead. Differential Revision: https://reviews.llvm.org/D133429	2022-09-08 09:01:53 -06:00
Kazu Hirata	2833760c57	[Target] Qualify auto in range-based for loops (NFC)	2022-08-28 17:35:09 -07:00
David Penry	ced705c440	[ModuloSchedule] Add interface call to accept/reject SMS schedules This interface allows a target to reject a proposed SMS schedule. For Hexagon/PowerPC, all schedules are accepted, leaving behavior unchanged. For ARM, schedules which exceed register pressure limits are rejected. Also, two RegisterPressureTracker methods now need to be public so that register pressure can be computed by more callers. Reapplication of D128941/(reversion:D132037) with small fix. Differential Revision: https://reviews.llvm.org/D132170	2022-08-22 12:10:13 -07:00
Kazu Hirata	ec5eab7e87	Use range-based for loops (NFC)	2022-08-20 21:18:32 -07:00
David Penry	1c9f0408bc	Revert "[ModuloSchedule] Add interface call to accept/reject SMS schedules" This reverts commit `8c4aea438c`. Needed because buildbot failures (warnings) gave a clue that there was a functional bug in the ARM rejection logic. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D132037	2022-08-17 09:32:43 -07:00
David Penry	8c4aea438c	[ModuloSchedule] Add interface call to accept/reject SMS schedules This interface allows a target to reject a proposed SMS schedule. For Hexagon/PowerPC, all schedules are accepted, leaving behavior unchanged. For ARM, schedules which exceed register pressure limits are rejected. Also, two RegisterPressureTracker methods now need to be public so that register pressure can be computed by more callers. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D128941	2022-08-17 08:13:26 -07:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Matt Arsenault	8d0383eb69	CodeGen: Remove AliasAnalysis from regalloc This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check for if a load instruction is rematerializable. I also don't think this was entirely correct, since it was implicitly assuming constant loads are also dereferenceable. Remove this and rely only on the invariant+dereferenceable flags in the memory operand. Set the flag based on the AA query upfront. This should have the same net benefit, but has the possible disadvantage of making this AA query nonlazy. Preserve the behavior of assuming pointsToConstantMemory implying dereferenceable for now, but maybe this should be changed.	2022-07-18 17:23:41 -04:00
Kazu Hirata	621f58e716	[Target, CodeGen] Use isImm(), isReg(), etc (NFC)	2022-06-18 07:41:04 -07:00
Kazu Hirata	3b9707dbc0	[llvm] Convert for_each to range-based for loops (NFC)	2022-06-05 12:07:14 -07:00
Martin Storsjö	d8e67c1ccc	[ARM] Add SEH opcodes in frame lowering Skip inserting regular CFI instructions if using WinCFI. This is based a fair amount on the corresponding ARM64 implementation, but instead of trying to insert the SEH opcodes one by one where we generate other prolog/epilog instructions, we try to walk over the whole prolog/epilog range and insert them. This is done because in many cases, the exact number of instructions inserted is abstracted away deeper. For some cases, we manually insert specific SEH opcodes directly where instructions are generated, where the automatic mapping of instructions to SEH opcodes doesn't hold up (e.g. for __chkstk stack probes). Skip Thumb2SizeReduction for SEH prologs/epilogs, and force tail calls to wide instructions (just like on MachO), to make sure that the unwind info actually matches the width of the final instructions, without heuristics about what later passes will do. Mark SEH instructions as scheduling boundaries, to make sure that they aren't reordered away from the instruction they describe by PostRAScheduler. Mark the SEH instructions with the NoMerge flag, to avoid doing tail merging of functions that have multiple epilogs that all end with the same sequence of "b <other>; .seh_nop_w, .seh_endepilogue". Differential Revision: https://reviews.llvm.org/D125648	2022-06-02 12:28:46 +03:00
David Penry	917dc0749b	[ARM] Recognize t2LoopEnd for software pipelining - Add t2LoopEnd to TargetInstrInfo::analyzeBranch and related functions. As there are many side effects of analyzing a branch, only do so if software pipelining is enabled to maintain previous behavior when pipelining is not desired. - Make sure that t2LoopEndDec is immediately followed by a t2B when it is synthesized from a t2LoopEnd. This is done because the t2LoopEnd might have acquired a fall-through path, but IfConversion assumes that fall-through are only possible on analyzable branches. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D126322	2022-05-26 09:55:42 -07:00
David Penry	dcb77643e3	Reapply [CodeGen][ARM] Enable Swing Module Scheduling for ARM Fixed "private field is not used" warning when compiled with clang. original commit: `28d09bbbc3` reverted in: `fa49021c68` ------ This patch permits Swing Modulo Scheduling for ARM targets turns it on by default for the Cortex-M7. The t2Bcc instruction is recognized as a loop-ending branch. MachinePipeliner is extended by adding support for "unpipelineable" instructions. These instructions are those which contribute to the loop exit test; in the SMS papers they are removed before creating the dependence graph and then inserted into the final schedule of the kernel and prologues. Support for these instructions was not previously necessary because current targets supporting SMS have only supported it for hardware loop branches, which have no loop-exit-contributing instructions in the loop body. The current structure of the MachinePipeliner makes it difficult to remove/exclude these instructions from the dependence graph. Therefore, this patch leaves them in the graph, but adds a "normalization" method which moves them in the schedule to stage 0, which causes them to appear properly in kernel and prologues. It was also necessary to be more careful about boundary nodes when iterating across successors in the dependence graph because the loop exit branch is now a non-artificial successor to instructions in the graph. In additional, schedules with physical use/def pairs in the same cycle should be treated as creating an invalid schedule because the scheduling logic doesn't respect physical register dependence once scheduled to the same cycle. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D122672	2022-04-29 10:54:39 -07:00
David Penry	fa49021c68	Revert "[CodeGen][ARM] Enable Swing Module Scheduling for ARM" This reverts commit `28d09bbbc3` while I investigate a buildbot failure.	2022-04-28 13:29:27 -07:00
David Penry	28d09bbbc3	[CodeGen][ARM] Enable Swing Module Scheduling for ARM This patch permits Swing Modulo Scheduling for ARM targets turns it on by default for the Cortex-M7. The t2Bcc instruction is recognized as a loop-ending branch. MachinePipeliner is extended by adding support for "unpipelineable" instructions. These instructions are those which contribute to the loop exit test; in the SMS papers they are removed before creating the dependence graph and then inserted into the final schedule of the kernel and prologues. Support for these instructions was not previously necessary because current targets supporting SMS have only supported it for hardware loop branches, which have no loop-exit-contributing instructions in the loop body. The current structure of the MachinePipeliner makes it difficult to remove/exclude these instructions from the dependence graph. Therefore, this patch leaves them in the graph, but adds a "normalization" method which moves them in the schedule to stage 0, which causes them to appear properly in kernel and prologues. It was also necessary to be more careful about boundary nodes when iterating across successors in the dependence graph because the loop exit branch is now a non-artificial successor to instructions in the graph. In additional, schedules with physical use/def pairs in the same cycle should be treated as creating an invalid schedule because the scheduling logic doesn't respect physical register dependence once scheduled to the same cycle. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D122672	2022-04-28 13:01:18 -07:00
Shengchen Kan	37b378386e	[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments	2022-03-16 20:25:42 +08:00
Jessica Paquette	6d58f4ab07	[MachineOutliner] NFC: Hide LRU-related stuff behind helper functions It's not particularly user-friendly to have to call `initLRU` everywhere. Also, it wasn't particularly great that the LRU for registers used in a sequence was also initialized by `initLRU`. This patch hides this stuff behind some helper functions: * `isAvailableAcrossAndOutOfSeq` * `isAnyUnavailableAcrossOrOutOfSeq` * `isAvailableInsideSeq` This allows the user to avoid calling `initLRU` explicitly. Also, it allows us to separate initializing the used-in-sequence LRU from the main LRU. Since both ARM and AArch64 check LR liveness in `insertOutlinedCall`, this refactor requires that we de-const the Candidate there. Some other quality-of-code improvements: * LRUs in outliner::Candidate now have more descriptive names * Use `Register` instead of `unsigned` in some places * Improve readability in some places by using ranges rather than `std::for_each` This is a preparatory commit for a larger compile time related change for the AArch64 outliner.	2022-02-16 11:39:07 -08:00
tyb0807	762f0b5463	[ARM] Make getInstSizeInBytes() use instruction size from InstrInfo.td Currently, ARMBaseInstrInfo::getInstSizeInBytes() uses hard-coded instruction size for some pseudo-instructions, while this information should ideally be found in ARMInstrInfo.td, ARMInstrThumb(2).td files (which can be accessed via MCInstrDesc). Hence, the .td files should be updated and no hard-coded instruction sizes should be used by getInstSizeInBytes() anymore. Differential Revision: https://reviews.llvm.org/D118009	2022-02-01 10:39:14 +00:00
Benjamin Kramer	f15014ff54	Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17" This reverts commit `ef82063207`. - It conflicts with the existing llvm::size in STLExtras, which will now never be called. - Calling it without llvm:: breaks C++17 compat	2022-01-26 16:55:53 +01:00
serge-sans-paille	ef82063207	Rename llvm::array_lengthof into llvm::size to match std::size from C++17 As a conquence move llvm::array_lengthof from STLExtras.h to STLForwardCompat.h (which is included by STLExtras.h so no build breakage expected).	2022-01-26 16:17:45 +01:00
David Green	319e77592f	[ARM] Verify addressing immediates This adds at extra check into ARMBaseInstrInfo::verifyInstruction to verify the offsets used in addressing mode immediates using isLegalAddressImm. Some tests needed fixing up as a result, adjusting the opcode created from CMSE stack adjustments. Differential Revision: https://reviews.llvm.org/D114939	2022-01-01 20:08:45 +00:00
Kazu Hirata	483499670e	[Target] Use llvm::reverse (NFC)	2021-12-12 08:34:24 -08:00
Ties Stuij	63eb7ff47d	[ARM] Implement PAC return address signing mechanism for PACBTI-M This patch implements PAC return address signing for armv8-m. This patch roughly accomplishes the following things: - PAC and AUT instructions are generated. - They're part of the stack frame setup, so that shrink-wrapping can move them inwards to cover only part of a function - The auth code generated by PAC is saved across subroutine calls so that AUT can find it again to check - PAC is emitted before stacking registers (so that the SP it signs is the one on function entry). - The new pseudo-register ra_auth_code is mentioned in the DWARF frame data - With CMSE also in use: PAC is emitted before stacking FPCXTNS, and AUT validates the corresponding value of SP - Emit correct unwind information when PAC is replaced by PACBTI - Handle tail calls correctly Some notes: We make the assembler accept the `.save {ra_auth_code}` directive that is emitted by the compiler when it saves a register that contains a return address authentication code. For EHABI we need to have the `FrameSetup` flag on the instruction and handle the `t2PACBTI` opcode (identically to `t2PAC`), so we can emit `.save {ra_auth_code}`, instead of `.save {r12}`. For PACBTI-M, the instruction which computes return address PAC should use SP value before adjustment for the argument registers save are (used for variadic functions and when a parameter is is split between stack and register), but at the same it should be after the instruction that saves FPCXT when compiling a CMSE entry function. This patch moves the varargs SP adjustment after the FPCXT save (they are never enabled at the same time), so in a following patch handling of the `PAC` instruction can be placed between them. Epilogue emission code adjusted in a similar manner. PACBTI-M code generation should not emit any instructions for architectures v6-m, v8-m.base, and for A- and R-class cores. Diagnostic message for such cases is handled separately by a future ticket. note on tail calls: If the called function has four arguments that occupy registers `r0`-`r3`, the only option for holding the function pointer itself is `r12`, but this register is used to keep the PAC during function/prologue epilogue and clobbers the function pointer. When we do the tail call we need the five registers (`r0`-`r3` and `r12`) to keep six values - the four function arguments, the function pointer and the PAC, which is obviously impossible. One option would be to authenticate the return address before all callee-saved registers are restored, so we have a scratch register to temporarily keep the value of `r12`. The issue with this approach is that it violates a fundamental invariant that PAC is computed using CFA as a modifier. It would also mean using separate instructions to pop `lr` and the rest of the callee-saved registers, which would offset the advantages of doing a tail call. Instead, this patch disables indirect tail calls when the called function take four or more arguments and the return address sign and authentication is enabled for the caller function, conservatively assuming the caller function would spill LR. This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest The following people contributed to this patch: - Momchil Velikov - Ties Stuij Reviewed By: danielkiss Differential Revision: https://reviews.llvm.org/D112429	2021-12-07 10:15:19 +00:00
David Green	b8f1ccb0ac	[ARM] Introduce i8neg and i8pos addressing modes Some instructions with i8 immediate ranges can only hold negative values (like t2LDRHi8), only hold positive values (like t2STRT) or hold +/- depending on the U bit (like the pre/post inc instructions. e.g t2LDRH_POST). This patch splits the AddrModeT2_i8 into AddrModeT2_i8, AddrModeT2_i8pos and AddrModeT2_i8neg to make this clear. This allows us to get the offset ranges of t2LDRHi8 correct in the load/store optimizer, fixing issues where we could end up creating instructions with positive offsets (which may then be encoded as ldrht). Differential Revision: https://reviews.llvm.org/D114638	2021-12-02 17:10:26 +00:00
Ties Stuij	f5f28d5b0c	[ARM] Implement BTI placement pass for PACBTI-M This patch implements a new MachineFunction in the ARM backend for placing BTI instructions. It is similar to the existing AArch64 aarch64-branch-targets pass. BTI instructions are inserted into basic blocks that: - Have their address taken - Are the entry block of a function, if the function has external linkage or has its address taken - Are mentioned in jump tables - Are exception/cleanup landing pads Each BTI instructions is placed in the beginning of a BB after the so-called meta instructions (e.g. exception handler labels). Each outlining candidate and the outlined function need to be in agreement about whether BTI placement is enabled or not. If branch target enforcement is disabled for a function, the outliner should not covertly enable it by emitting a call to an outlined function, which begins with BTI. The cost mode of the outliner is adjusted to account for the extra BTI instructions in the outlined function. The ARM Constant Islands pass will maintain the count of the jump tables, which reference a block. A `BTI` instruction is removed from a block only if the reference count reaches zero. PAC instructions in entry blocks are replaced with PACBTI instructions (tests for this case will be added in a later patch because the compiler currently does not generate PAC instructions). The ARM Constant Island pass is adjusted to handle BTI instructions correctly. Functions with static linkage that don't have their address taken can still be called indirectly by linker-generated veneers and thus their entry points need be marked with BTI or PACBTI. The changes are tested using "LLVM IR -> assembly" tests, jump tables also have a MIR test. Unfortunately it is not possible add MIR tests for exception handling and computed gotos because of MIR parser limitations. This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest The following people contributed to this patch: - Mikhail Maltsev - Momchil Velikov - Ties Stuij Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D112426	2021-12-01 12:54:05 +00:00
Nick Desaulniers	89453ed6f2	[ARM] create new pseudo t2LDRLIT_ga_pcrel for stack guards We can't use the existing pseudo ARM::tLDRLIT_ga_pcrel for loading the stack guard for PIC code that references the GOT, since arm-pseudo may expand this to the narrow tLDRpci rather than the wider t2LDRpci. Create a new pseudo, t2LDRLIT_ga_pcrel, and expand it to t2LDRpci. Fixes: https://bugs.chromium.org/p/chromium/issues/detail?id=1270361 Reviewed By: ardb Differential Revision: https://reviews.llvm.org/D114762	2021-11-30 08:46:05 +01:00
Kazu Hirata	562356d6e3	[Target] Use range-based for loops (NFC)	2021-11-26 08:23:01 -08:00
Jay Foad	3264e95938	[CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress Delegate updating of LiveIntervals to each target's convertToThreeAddress implementation, instead of repairing LiveIntervals after the fact in TwoAddressInstruction::convertInstTo3Addr. Differential Revision: https://reviews.llvm.org/D113493	2021-11-17 10:16:47 +00:00
Ard Biesheuvel	a19da876ab	[ARM] implement support for TLS register based stack protector Implement support for loading the stack canary from a memory location held in the TLS register, with an optional offset applied. This is used by the Linux kernel to implement per-task stack canaries, which is impossible on SMP systems when using a global variable for the stack canary. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D112768	2021-11-09 18:19:47 +01:00
Ard Biesheuvel	2caf85ad7a	[ARM] implement LOAD_STACK_GUARD for remaining targets Currently, LOAD_STACK_GUARD on ARM is only implemented for Mach-O targets, and other targets rely on the generic support which may result in spilling of the stack canary value or address, or may cause it to be kept in a callee save register across function calls, which means they essentially get spilled as well, only by the callee when it wants to free up this register. So let's implement LOAD_STACK GUARD for other targets as well. This ensures that the load of the stack canary is rematerialized fully in the epilogue. This code was split off from D112768: [ARM] implement support for TLS register based stack protector for which it is a prerequisite. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D112811	2021-11-08 22:59:15 +01:00
Kazu Hirata	41ef3187e0	[ARM, X86] Use MachineBasicBlock::{predecessors,successors} (NFC)	2021-11-07 09:53:16 -08:00
John Brawn	082fa56819	[ARM] Fix MOVCC peephole to not use an incorrect register class The MOVCC peephole eliminates a MOVCC by making one of its inputs a conditional instruction, but when doing this it should be using both inputs of the MOVCC to decide on the register class to use as otherwise we can get an error when using -verify-machineinstrs. Differential Revision: https://reviews.llvm.org/D111714	2021-10-15 10:54:26 +01:00
David Green	73346f5848	[ARM] Introduce a MQPRCopy Currently when creating tail predicated loops, we need to validate that all the live-outs of a loop will be equivalent with and without tail predication, and if they are not we cannot legally create a tail-predicated loop, leaving expensive vctp and vpst instructions in the loop. These notably can include register-allocation instructions like stack loads and stores, and copys lowered from COPYs to MVE_VORRs. Instead of trying to prove this is valid late in the pipeline, this patch introduces a MQPRCopy pseudo instruction that COPY is lowered to. This can then either be converted to a MVE_VORR where possible, or to a couple of VMOVD instructions if not. This way they do not behave differently within and outside of tail-predications regions, and we can know by construction that they are always valid. The idea is that we can do the same with stack load and stores, converting them to VLDR/VSTR or VLDM/VSTM where required to prove tail predication is always valid. This does unfortunately mean inserting multiple VMOVD instructions, instead of a single MVE_VORR, but my experiments show it to be an improvement in general. Differential Revision: https://reviews.llvm.org/D111048	2021-10-07 12:52:12 +01:00
Jay Foad	6cef28ed2d	[TII] Remove the MFI argument to convertToThreeAddress. NFC. This simplifies the API and addresses a FIXME in TwoAddressInstructionPass::convertInstTo3Addr. Differential Revision: https://reviews.llvm.org/D110229	2021-09-23 08:58:46 +01:00
David Green	9cb8f4d1ad	[ARM] Add a tail-predication loop predicate register The semantics of tail predication loops means that the value of LR as an instruction is executed determines the predicate. In other words: mov r3, #3 DLSTP lr, r3 // Start tail predication, lr==3 VADD.s32 q0, q1, q2 // Lanes 0,1 and 2 are updated in q0. mov lr, #1 VADD.s32 q0, q1, q2 // Only first lane is updated. This means that the value of lr cannot be spilled and re-used in tail predication regions without potentially altering the behaviour of the program. More lanes than required could be stored, for example, and in the case of a gather those lanes might not have been setup, leading to alignment exceptions. This patch adds a new lr predicate operand to MVE instructions in order to keep a reference to the lr that they use as a tail predicate. It will usually hold the zeroreg meaning not predicated, being set to the LR phi value in the MVETPAndVPTOptimisationsPass. This will prevent it from being spilled anywhere that it needs to be used. A lot of tests needed updating. Differential Revision: https://reviews.llvm.org/D107638	2021-09-02 13:42:58 +01:00
Nikita Popov	0529e2e018	[InstrInfo] Use 64-bit immediates for analyzeCompare() (NFCI) The backend generally uses 64-bit immediates (e.g. what MachineOperand::getImm() returns), so use that for analyzeCompare() and optimizeCompareInst() as well. This avoids truncation for targets that support immediates larger 32-bit. In particular, we can avoid the bugprone value normalization hack in the AArch64 target. This is a followup to D108076. Differential Revision: https://reviews.llvm.org/D108875	2021-08-30 19:46:04 +02:00
David Green	62e892fa2d	[ARM] Add MQQPR and MQQQQPR spill and reload pseudo instructions As a part of D107642, this adds pseudo instructions for MQQPR and MQQQQPR register classes, that can spill and reloads entire registers whilst keeping them combined, not splitting them into multiple D subregs that a VLDMIA/VSTMIA would use. This can help certain analyses, and helps to prevent verifier issues with subreg liveness.	2021-08-17 13:51:34 +01:00
David Green	9236dea255	[ARM] Create MQQPR and MQQQQPR register classes Similar to the MQPR register class as the MVE equivalent to QPR, this adds MQQPR and MQQQQPR register classes for the MVE equivalents of QQPR and QQQQPR registers. The MVE MQPR seemed have worked out quite well, and adding MQQPR and MQQQQPR allows us to a little more accurately specify the number of registers, calculating register pressure limits a little better. Differential Revision: https://reviews.llvm.org/D107463	2021-08-16 22:58:12 +01:00
David Green	54c91c0c74	[ARM] Implement isLoad/StoreFromStackSlot for MVE stack stores accesses This implements the isLoadFromStackSlot and isStoreToStackSlot for MVE MVE_VSTRWU32 and MVE_VLDRWU32 functions. They behave the same as many other loads/stores, expecting a FI in Op1 and zero offset in Op2. At the same time this alters VLDR_P0_off and VSTR_P0_off to use the same code too, as they too should be returning VPR in Op0, take a FI in Op1 and zero offset in Op2. Differential Revision: https://reviews.llvm.org/D106797	2021-07-27 09:11:58 +01:00
David Green	bee2f618d5	[ARM] Introduce t2WhileLoopStartTP This adds t2WhileLoopStartTP, similar to the t2DoLoopStartTP added in D90591. It keeps a reference to both the tripcount register and the element count register, so that the ARMLowOverheadLoops pass in the backend can pick the correct one without having to search for it from the operand of a VCTP. Differential Revision: https://reviews.llvm.org/D103236	2021-06-13 13:55:34 +01:00
Yvan Roux	6c78dbd4ca	[ARM] Fix Machine Outliner LDRD/STRD handling in Thumb mode. This is a fix for PR50481 Immediate values for AddrModeT2_i8s4 are already scaled in MCinst operand. This patch changes the number of bits and scale factor to reflect that state when checking stack offset status. AddrModeT2_i7s[2\|4] also have this particularity but since MVE instructions are not outlined, just move these cases to the unhandled ones. Differential Revision: https://reviews.llvm.org/D103167	2021-06-09 15:37:21 +02:00
Tomas Matheson	9d86095ff8	Revert "[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0" This reverts commit `753185031d`.	2021-05-03 21:48:20 +01:00
Tomas Matheson	753185031d	[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0 atomicrmw instructions are expanded by AtomicExpandPass before register allocation into cmpxchg loops. Register allocation can insert spills between the exclusive loads and stores, which invalidates the exclusive monitor and can lead to infinite loops. To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them after register allocation. Floating point legalisation: f16 ATOMIC_LOAD_FADD(f16, f16) is legalised to f32 ATOMIC_LOAD_FADD(i16, f32) and then eventually f32 ATOMIC_LOAD_FADD_16(*i16, f32) Differential Revision: https://reviews.llvm.org/D101164 Originally submitted as `3338290c18`. Reverted in `c7df6b1223`.	2021-05-03 20:25:15 +01:00
Tomas Matheson	c7df6b1223	Revert "[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0" This reverts commit `3338290c18`. Broke expensive checks on debian.	2021-04-30 16:53:14 +01:00
Tomas Matheson	3338290c18	[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0 atomicrmw instructions are expanded by AtomicExpandPass before register allocation into cmpxchg loops. Register allocation can insert spills between the exclusive loads and stores, which invalidates the exclusive monitor and can lead to infinite loops. To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them after register allocation. Floating point legalisation: f16 ATOMIC_LOAD_FADD(f16, f16) is legalised to f32 ATOMIC_LOAD_FADD(i16, f32) and then eventually f32 ATOMIC_LOAD_FADD_16(*i16, f32) Differential Revision: https://reviews.llvm.org/D101164	2021-04-30 16:40:33 +01:00
David Green	fad70c3068	[ARM] Improve WLS lowering Recently we improved the lowering of low overhead loops and tail predicated loops, but concentrated first on the DLS do style loops. This extends those improvements over to the WLS while loops, improving the chance of lowering them successfully. To do this the lowering has to change a little as the instructions are terminators that produce a value - something that needs to be treated carefully. Lowering starts at the Hardware Loop pass, inserting a new llvm.test.start.loop.iterations that produces both an i1 to control the loop entry and an i32 similar to the llvm.start.loop.iterations intrinsic added for do loops. This feeds into the loop phi, properly gluing the values together: %wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div) %wls0 = extractvalue { i32, i1 } %wls, 0 %wls1 = extractvalue { i32, i1 } %wls, 1 br i1 %wls1, label %loop.ph, label %loop.exit ... loop: %lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ] .. %iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1) %cmp = icmp ne i32 %iv.next, 0 br i1 %cmp, label %loop, label %loop.exit The llvm.test.start.loop.iterations need to be lowered through ISel lowering as a pair of WLS and WLSSETUP nodes, which each get converted to t2WhileLoopSetup and t2WhileLoopStart Pseudos. This helps prevent t2WhileLoopStart from being a terminator that produces a value, something difficult to control at that stage in the pipeline. Instead the t2WhileLoopSetup produces the value of LR (essentially acting as a lr = subs rn, 0), t2WhileLoopStart consumes that lr value (the Bcc). These are then converted into a single t2WhileLoopStartLR at the same point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop to prevent them from progressing further in the pipeline. The t2WhileLoopStartLR is a single instruction that takes a GPR and produces LR, similar to the WLS instruction. %1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3 t2B %bb.1 ... bb.2.loop: %2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2 ... %3:gprlr = t2LoopEndDec %2:gprlr, %bb.2 t2B %bb.3 The t2WhileLoopStartLR can then be treated similar to the other low overhead loop pseudos, eventually being lowered to a WLS providing the branches are within range. Differential Revision: https://reviews.llvm.org/D97729	2021-03-11 17:56:19 +00:00
David Green	2753722b0f	[ARM] Mark MVE_VMOV_to_lane_32 as isInsertSubregLike This allows the peephole optimizer to know that a MVE_VMOV_to_lane_32 is the same as an insert subreg, allowing it to optimize some redundant lane moves. Differential Revision: https://reviews.llvm.org/D95433	2021-02-02 16:35:47 +00:00

1 2 3 4 5 ...

730 Commits