llvm-project

Commit Graph

Author	SHA1	Message	Date
Krzysztof Parzyszek	864aaa21b4	TargetLowering: convert Optional to std::optional	2022-12-01 16:19:10 -08:00
Stanislav Mekhanoshin	bcaf31ec3f	[AMDGPU] Allow finer grain control of an unaligned access speed A target can return if a misaligned access is 'fast' as defined by the target or not. In reality there can be different levels of 'fast' and 'slow'. This patch changes the boolean 'Fast' argument of the allowsMisalignedMemoryAccesses family of functions to an unsigned representing its speed. A target can still define it as it wants and the direct translation of the current code uses 0 and 1 for current false and true. This makes the change an NFC. A subsequent patch will start using an actual value of speed in the load/store vectorizer to check whether a vectorized access is going to be not just fast, but not slower than before. Differential Revision: https://reviews.llvm.org/D124217	2022-11-17 09:23:53 -08:00
Nicholas Guy	d52e2839f3	[ARM][CodeGen] Add support for complex deinterleaving Adds the Complex Deinterleaving Pass implementing support for complex numbers in a target-independent manner, deferring to the TargetLowering for the given target to create a target-specific intrinsic. Differential Revision: https://reviews.llvm.org/D114174	2022-11-14 14:02:27 +00:00
David Green	f970b007e5	[ARM] Fix vector ule zero lowering The instruction icmp ule <4 x i32> %0, zeroinitializer will usually be simplified to icmp eq <4 x i32> %0, zeroinitializer. It is not guaranteed though, and the code for lowering vector compares could pick the wrong form of the instruction if this happened. I've tried to make the code more explicit about the supported conditions. This fixes NEON being unable to select VCMPZ with HS conditions, and fixes some incorrect MVE patterns. Fixes #58514. Differential Revision: https://reviews.llvm.org/D136447	2022-11-02 22:34:05 +00:00
Zhiyao Ma	7e8af2fc0c	[ARM] Support -mexecute-only with -mlong-calls. Instead of using constant pools, use movw movt pair. Differential Revision: https://reviews.llvm.org/D136203	2022-10-24 11:41:24 -07:00
Archibald Elliott	7d15212b8c	[ARM] Support fp16/bf16 using w constraint fp16 and bf16 values can be used in GCC's inline assembly using the "w" constraint, which means "VFP floating-point registers d0-d31" - fp16 and bf16 values are stored in S registers (which alias the D registers). This change ensures that LLVM is compatible with GCC for programs that use fp16 and the 'w' constraint. Differential Revision: https://reviews.llvm.org/D135662	2022-10-13 10:32:06 +01:00
David Green	f2fde99461	[ARM] More bf16 shuffle handling, including perfect shuffles.	2022-10-02 14:31:51 +01:00
Archibald Elliott	ff4027d152	[ARM] Support fp16/bf16 using t constraint fp16 and bf16 values can be used in GCC's inline assembly using the "t" constraint, which means "VFP floating-point registers s0-s31" - fp16 and bf16 values are stored in S registers too. This change ensures that LLVM is compatible with GCC for programs that use fp16 and the 't' constraint. Fixes #57753 Differential Revision: https://reviews.llvm.org/D134553	2022-09-28 14:48:21 +01:00
Momchil Velikov	6602110152	[ARM] Enable and/cmp0 folding The `CodeGenPrepare` pass can sink bitwise `and` used by compare to zero into the basic blocks where the users are. This operation is guarded by a lowering hook, which is disabled for ARM. In the ARM architecture versions from v7-M up these two operations can be folded into a `tst rN, #imm` instruction. Sinking of `and` can also enable the cmov-to-bfi DAG combiner. This patch fixes some benchmark regressions caused by https://reviews.llvm.org/D129370 as well as scoring slightly better overall. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D134360	2022-09-26 11:31:23 +01:00
Sergei Barannikov	c6acb4eb0f	[SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s All in-tree targets pass pointer-sized ConstantSDNodes to the method. This overload reduces the amount of boilerplate code a bit. This also makes getCALLSEQ_END consistent with getCALLSEQ_START, which already takes uint64_ts.	2022-09-15 14:02:12 -04:00
Craig Topper	38ffa2bb96	[LegalizeTypes] Improve splitting for urem/udiv by constant for some constants. For remainder: If (1 << (BitWidth / 2)) % Divisor == 1, we can add the high and low halves together and use a (BitWidth / 2) urem. If (BitWidth / 2) is a legal integer type, this urem will be expanded by DAGCombiner using multiply by magic constant. We do have to take into account that adding high and low together can produce a carry, making it a (BitWidth / 2)+1 bit number. So we need to also add back in the carry from the first addition. For division: We can use the above trick to compute the remainder, subtract that remainder from the dividend, then multiply by the multiplicative inverse of the Divisor modulo (1 << BitWidth). This is based on the section "Remainder by Summing Digits" in Hacker's Delight. The remainder trick is similar to a trick you may have learned for determining if a decimal number is divisible by 3. You can add all the digits together and see if the sum is divisible by 3. If you're not sure if the sum is divisible by 3, you can add its digits together. This can be repeated until you have a single decimal digit. If that digit is 3, 6, or 9, then the original number is divisible by 3. This works because 10 % 3 == 1. gcc already does this same trick. There are additional tricks gcc does for urem as well as srem, udiv, and sdiv that I plan to add in future patches. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130862	2022-09-12 10:34:52 -07:00
Matthias Gehre	c1502425ba	Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth Also remove new-pass-manager version of ExpandLargeDivRem because there is no way yet to access TargetLowering in the new pass manager. Differential Revision: https://reviews.llvm.org/D133691	2022-09-12 17:06:16 +01:00
Joe Loser	5e96cea1db	[llvm] Use std::size instead of llvm::array_lengthof LLVM contains a helpful function for getting the size of a C-style array: `llvm::array_lengthof`. This is useful prior to C++17, but not as helpful for C++17 or later: `std::size` already has support for C-style arrays. Change call sites to use `std::size` instead. Differential Revision: https://reviews.llvm.org/D133429	2022-09-08 09:01:53 -06:00
John Brawn	e26cadcc32	[ARM] Constant pools need 4-byte alignment if we only have tADR When the only ADR instruction we have is the 16-bit thumb one then all constant pool entries need to be 4-byte aligned, as tADR has an offset that's a multiple of 4. It looks like previously there happened to be no situations in which we encountered a constant pool entry with alignment less than 4, so failing to do this didn't cause any problems, but the expansion of cttz to a table added by D128911 does use a constant pool with alignment 1, so we now need to handle it correctly. Differential Revision: https://reviews.llvm.org/D133199	2022-09-06 11:36:12 +01:00
Kazu Hirata	2833760c57	[Target] Qualify auto in range-based for loops (NFC)	2022-08-28 17:35:09 -07:00
Simon Pilgrim	f9de13232f	[X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis. For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling. Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU. Differential Revision: https://reviews.llvm.org/D132520	2022-08-24 17:28:18 +01:00
Daniil Fukalov	7ed3d81333	[NFCI] Move cost estimation from TargetLowering to TargetTransformInfo. TargetLowering had two last InstructionCost related `getTypeLegalizationCost()` and `getScalingFactorCost()` members, but all other costs are processed in TTI. E.g. it is inconvenient to use other TTI members in these two functions when they are overridden in a target. Minor refactoring: `getTypeLegalizationCost()` now doesn't need the DataLayout parameter - it was always passed from TTI. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D117723	2022-08-18 00:38:55 +03:00
Kazu Hirata	6d9cd9199a	Use llvm::all_of (NFC)	2022-08-14 16:25:36 -07:00
Pengxuan Zheng	9bb6622423	[ARM] Do not use LOAD_STACK_GUARD with ROPI/RWPI ROPI/RWPI are not supported with LOAD_STACK_GUARD currently. Reviewed By: nickdesaulniers, rengolin Differential Revision: https://reviews.llvm.org/D131427	2022-08-09 14:59:08 -07:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Kazu Hirata	a2d4501718	[llvm] Fix comment typos (NFC)	2022-08-07 00:16:14 -07:00
Dawid Jurczak	1bd31a6898	[NFC] Add SmallVector constructor to allow creation of SmallVector<T> from ArrayRef of items convertible to type T Extracted from https://reviews.llvm.org/D129781 and address comment: https://reviews.llvm.org/D129781#3655571 Differential Revision: https://reviews.llvm.org/D130268	2022-08-05 13:35:41 +02:00
Nikita Popov	b1b1086973	[ARM] Add target feature to force 32-bit atomics This adds a +atomic-32 target feature, which instructs LLVM to assume that lock-free 32-bit atomics are available for this target, even if they usually wouldn't be. If only atomic loads/stores are used, then this won't emit libcalls. If atomic CAS is used, then the user is responsible for providing any necessary __sync implementations (e.g. by masking interrupts for single-core privileged use cases). See https://reviews.llvm.org/D120026#3674333 for context on this change. The tl;dr is that the thumbv6m target in Rust has historically made atomic load/store only available, which is incompatible with the change from D120026, which switched these to use libatomic. Differential Revision: https://reviews.llvm.org/D130480	2022-07-27 10:00:31 +02:00
David Green	6cb9529001	[ARM] Remove VBICimm if no cleared bits are demanded If none of the bits of a VBICimm are demanded, we can remove the node entirely using the input operand instead. Differential Revision: https://reviews.llvm.org/D129966	2022-07-19 11:53:47 +01:00
Simon Pilgrim	0f6b0461b0	[DAG] SimplifyDemandedBits - relax "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" to match only demanded bits The "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold is currently limited to the XOR mask being a shifted all-bits mask, but we can relax this to only need to match under the demanded bits. This helps expose more bit extraction/clearing patterns and fixes the PowerPC testCompares*.ll regressions from D127115 Alive2: https://alive2.llvm.org/ce/z/fl7T7K Differential Revision: https://reviews.llvm.org/D129933	2022-07-19 10:59:07 +01:00
Simon Pilgrim	259c36e7c1	[DAG] Add asserts to isDesirableToCommuteWithShift overrides to ensure it's being called from a shift. NFC.	2022-07-18 13:11:24 +01:00
Simon Pilgrim	07ab0cb4e7	[DAG] Add missing asserts to shouldFoldConstantShiftPairToMask overrides to ensure a shl/srl pair is used. NFC.	2022-07-18 13:11:23 +01:00
John Brawn	ddd9485129	[MVE] Don't distribute add of vecreduce if it has more than one use If the add has more than one use then applying the transformation won't cause it to be removed, so we can end up applying it again causing an infinite loop. Differential Revision: https://reviews.llvm.org/D129361	2022-07-11 14:13:29 +01:00
David Green	0a11ad2aa8	[ARM] Expand MVE i1 fptoint and inttofp if mve.fp is not present. If MVE.fp is not present then we cannot select the vector i1 fp operations to VCMP instructions, so need to expand.	2022-07-11 13:03:30 +01:00
Kazu Hirata	0916d96d12	Don't use Optional::hasValue (NFC)	2022-06-20 20:17:57 -07:00
Guillaume Chatelet	6725d80640	[NFC][Alignment] Use Align in shouldAlignPointerArgs	2022-06-14 10:56:36 +00:00
David Sherwood	007917b95c	[MVE] Fold fadd(select(..., +0.0)) into a predicated fadd We already have patterns for matching fadd(select(..., -0.0)), but an upcoming patch will lead to patterns using +0.0 as the identity instead of -0.0. I'm adding support for these patterns now to avoid any regressions for MVE. Differential Revision: https://reviews.llvm.org/D127275	2022-06-10 11:09:55 +01:00
Guillaume Chatelet	0788186182	[Alignment][NFC] Remove usage of MemSDNode::getAlignment I can't remove the function just yet as it is used in the generated .inc files. I would also like to provide a way to compare alignment with TypeSize since it came up a few times. Differential Revision: https://reviews.llvm.org/D126910	2022-06-07 13:52:20 +00:00
Martin Storsjö	668bb96379	[ARM] Implement lowering of the sponentry intrinsic This is needed for SEH based setjmp on Windows. Differential Revision: https://reviews.llvm.org/D126763	2022-06-02 12:29:59 +03:00
Zongwei Lan	ad73ce318e	[Target] use getSubtarget<> instead of static_cast<>(getSubtarget()) Differential Revision: https://reviews.llvm.org/D125391	2022-05-26 11:22:41 -07:00
David Green	a86cfaea54	[ARM] Add register-mask for tail returns The TC_RETURN/TCRETURNdi under Arm does not currently add the register-mask operand when tail folding, which leads to the register (like LR) not being 'used' by the return. This changes the code to unconditionally set the register mask on the call, as opposed to skipping it for tail calls. I don't believe this will currently alter any codegen, but should glue things together better post-frame lowering. It matches the AArch64 code better. Differential Revision: https://reviews.llvm.org/D125906	2022-05-21 15:28:24 +01:00
David Green	f848798b7d	[ARM] Delay creation of MVE Imm shifts to legalization The reasoning for creating VSHLIMM/VSHRsIMM/VSHRuIMM nodes in a combine - because matching i64 constants is difficult - does not apply for MVE, as there are no v2i64 shifts. Delaying the creation of the nodes can allow extra transforms on target independent shl/shr.	2022-05-04 22:12:09 +01:00
Matt Arsenault	c4ea925f50	AtomicExpand: Change return type for shouldExpandAtomicStoreInIR Use the same enum as the other atomic instructions for consistency, in preparation for addition of another strategy. Introduce a new "Expand" option, since the store expansion does not use cmpxchg. Alternatively, the existing CmpXChg strategy could be renamed to Expand.	2022-04-06 22:34:04 -04:00
Shao-Ce SUN	662b9fa02c	[NFC][CodeGen] Add a setTargetDAGCombine use ArrayRef Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122557	2022-03-29 09:53:24 +08:00
Eli Friedman	ddca66622c	[ARM] Fix shouldExpandAtomicLoadInIR for subtargets without ldrexd. Regression from 2f497ec3; we should not try to generate ldrexd on targets that don't have it. Also, while I'm here, fix shouldExpandAtomicStoreInIR, for consistency. That doesn't really have any practical effect, though. On Thumb targets where we need to use __sync_* libcalls, there is no libcall for stores, so SelectionDAG calls __sync_lock_test_and_set_8 anyway.	2022-03-18 15:54:38 -07:00
Eli Friedman	2f497ec3a0	[ARM] Fix ARM backend to correctly use atomic expansion routines. Without this patch, clang would generate calls to __sync_* routines on targets where it does not make sense; we can't assume the routines exist on unknown targets. Linux has special implementations of the routines that work on old ARM targets; other targets have no such routines. In general, atomic operations which aren't natively supported should go through libatomic (__atomic_) APIs, which can support arbitrary atomics through locks. ARM targets older than v6, where this patch makes a difference, are rare in practice, but not completely extinct. See, for example, discussion on D116088. This also affects Cortex-M0, but I don't think __sync_ routines actually exist in any Cortex-M0 libraries. So in practice this just leads to a slightly different linker error for those cases, I think. Mechanically, this patch does the following: - Ensures we run atomic expansion unconditionally; it never makes sense to completely skip it. - Fixes getMaxAtomicSizeInBitsSupported() so it returns an appropriate number on all ARM subtargets. - Fixes shouldExpandAtomicRMWInIR() and shouldExpandAtomicCmpXchgInIR() to correctly handle subtargets that don't have atomic instructions. Differential Revision: https://reviews.llvm.org/D120026	2022-03-18 12:43:57 -07:00
Tomas Matheson	831ab35b2f	[ARM][AArch64] generate subtarget feature flags Reland of D120906 after sanitizer failures. This patch aims to reduce a lot of the boilerplate around adding new subtarget features. From the SubtargetFeatures tablegen definitions, a series of calls to the macro GET_SUBTARGETINFO_MACRO are generated in ARM/AArch64GenSubtargetInfo.inc. ARMSubtarget/AArch64Subtarget can then use this macro to define bool members and the corresponding getter methods. Some naming inconsistencies have been fixed to allow this, and one unused member removed. This implementation only applies to boolean members; in future both BitVector and enum members could also be generated. Differential Revision: https://reviews.llvm.org/D120906	2022-03-18 16:07:00 +00:00
Tomas Matheson	62c481542e	Revert "[ARM][AArch64] generate subtarget feature flags" This reverts commit `dd8b0fecb9`.	2022-03-18 11:58:20 +00:00
Tomas Matheson	dd8b0fecb9	[ARM][AArch64] generate subtarget feature flags This patch aims to reduce a lot of the boilerplate around adding new subtarget features. From the SubtargetFeatures tablegen definitions, a series of calls to the macro GET_SUBTARGETINFO_MACRO are generated in ARM/AArch64GenSubtargetInfo.inc. ARMSubtarget/AArch64Subtarget can then use this macro to define bool members and the corresponding getter methods. Some naming inconsistencies have been fixed to allow this, and one unused member removed. This implementation only applies to boolean members; in future both BitVector and enum members could also be generated. Differential Revision: https://reviews.llvm.org/D120906	2022-03-18 11:48:20 +00:00
Sterling Augustine	bd38234d76	Reland "Use a stable-sort when combining bases" Differential Revision: https://reviews.llvm.org/D121922	2022-03-17 11:32:16 -07:00
Sterling Augustine	84810e1f74	Revert "Use a stable-sort when combining bases" This reverts commit `81417261a1`.	2022-03-17 09:54:13 -07:00
Sterling Augustine	81417261a1	Use a stable-sort when combining bases While experimenting with different algorithms for std::sort I discovered that combine-vmovdrr.ll fails if this sort is not stable. I suspect that the test is too stringent in its check--the resultant code looks functionally identical to me under both stable and unstable sorting, but a generic fix is quite a bit more difficult to implement. Thanks to scw@google.com for finding the proper fix. Differential Revision: https://reviews.llvm.org/D121870	2022-03-17 09:02:13 -07:00
Arthur Eubanks	2371c5a0e0	[OpaquePtr][ARM] Use elementtype on ldrex/ldaex/stlex/strex Includes verifier changes checking the elementtype, clang codegen changes to emit the elementtype, and ISel changes using the elementtype. Basically the same as D120527. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D121847	2022-03-16 14:11:53 -07:00
Shengchen Kan	37b378386e	[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments	2022-03-16 20:25:42 +08:00
Craig Topper	c7d6448d03	[DAGCombiner][TargetLowering] Pass SDValue by value to isMulAddWithConstProfitable. Internally to DAGCombiner the SDValues were passed by non-const reference despite not being modified. They were then passed by const reference to TLI. This patch passes them by value which is consistent with the vast majority of code. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120420	2022-02-23 12:40:45 -08:00
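
The urem/udiv splitting described in the 38ffa2bb96 message can be illustrated outside the compiler. The following is only a minimal sketch in plain C++, not the LLVM implementation: it assumes BitWidth = 64 and Divisor = 3 (so that (1 << 32) % 3 == 1 holds), and the function names `urem3_by_halves` and `udiv3_by_inverse` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstdint>

// Remainder of a 64-bit value by 3 using only 32-bit arithmetic on the halves.
// This relies on (1ull << 32) % 3 == 1, so x % 3 == (hi + lo) % 3.
uint32_t urem3_by_halves(uint64_t x) {
  uint32_t lo = static_cast<uint32_t>(x);
  uint32_t hi = static_cast<uint32_t>(x >> 32);
  uint32_t sum = lo + hi;               // may wrap around modulo 2^32
  uint32_t carry = sum < lo ? 1u : 0u;  // 1 if the addition wrapped
  sum += carry;                         // add the carry back; cannot wrap again
  return sum % 3u;                      // half-width urem
}

// Quotient of a 64-bit value by 3: subtract the remainder so the division is
// exact, then multiply by the multiplicative inverse of 3 modulo 2^64.
uint64_t udiv3_by_inverse(uint64_t x) {
  const uint64_t inv3 = 0xAAAAAAAAAAAAAAABull;  // 3 * inv3 == 1 (mod 2^64)
  uint64_t exact = x - urem3_by_halves(x);      // now divisible by 3
  return exact * inv3;                          // wraps modulo 2^64 by design
}
```

For any `uint64_t x`, `urem3_by_halves(x)` should agree with `x % 3` and `udiv3_by_inverse(x)` with `x / 3`. The carry add-back is the step the commit message calls out: the sum of the two halves can need 33 bits, and since 2^32 leaves remainder 1 modulo 3, the lost carry bit contributes exactly 1.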


2153 Commits