In a future change, we will sometimes use a VGPR offset for doing
spills to memory, in which case we need 2 free VGPRs to do the SGPR
spill. In most cases we could spill the VGPR along with the SGPR being
spilled, but we don't have any free lanes for SGPR_1024 in wave32, so
we could still potentially need a second scavenging slot.
As noted in D116804, we want to effectively invert that patch
for CPUs (Intel) that don't break the false dependency on
sbb %eax, %eax
So we will likely want to create that here in the
X86DAGToDAGISel::Select() case for X86::SETCC_CARRY.
setjmp can return twice, but PostDominatorTree is unaware of this. As
such, it overestimates postdominance, leaving some cases where memory
does not get untagged on return. This causes false positives later in
the program execution.
This is a workaround for now; in the longer term, PostDominatorTree
should be made aware of returns_twice, as this may cause problems
elsewhere.
See D118647 for the equivalent fix to HWASan.
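To illustrate the kind of detection involved (a hedged sketch, not the code in this patch; the helper name is invented):
```
#include "llvm/IR/Function.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/InstrTypes.h"

// Illustrative helper: does F contain a call to a returns_twice function
// such as setjmp? If so, PostDominatorTree will overestimate postdominance
// in F, so a tagging pass can fall back to conservative untagging.
static bool callsReturnsTwice(const llvm::Function &F) {
  for (const llvm::Instruction &I : llvm::instructions(F))
    if (const auto *CB = llvm::dyn_cast<llvm::CallBase>(&I))
      if (CB->hasFnAttr(llvm::Attribute::ReturnsTwice))
        return true;
  return false;
}
```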
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D118749
VLMaxSentinel happens to be represented as a -1 TargetConstant. A
user-provided -1 would be an ISD::Constant. We shouldn't assume that they
are the same thing. I'm still not entirely convinced that we should be
treating -1 from the user as VLMAX.
Also fix one place that failed to use XLenVT for the VLMaxSentinel,
using MVT::i64 in code that only executes on RV32.
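A minimal sketch of the distinction (hypothetical helper, not code from this patch; RISCV::VLMaxSentinel is declared in the backend's RISCVISelLowering.h):
```
#include "llvm/CodeGen/SelectionDAGNodes.h"

using namespace llvm;

// The internal sentinel is a -1 TargetConstant; a -1 written by the user
// reaches us as a plain ISD::Constant, so matching only on the value -1
// conflates the two.
static bool isVLMaxSentinelVL(SDValue VL) {
  return VL.getOpcode() == ISD::TargetConstant &&
         cast<ConstantSDNode>(VL)->getSExtValue() == RISCV::VLMaxSentinel;
}
```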
Warn on inline assembly clobbering reserved registers. It should also
warn on at least some reserved register defs, but that isn't happening
right now. If you have a def and re-use of a register we reserve, the
register coalescer will eliminate the intermediate virtual
register. When the reserved reg def is introduced later by the
backend, it will end up clobbering the value the register coalescer
assumed was live through the range.
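Roughly, the kind of check being added looks like this (an illustrative sketch only; the helper and diagnostic text are invented, and the real code distinguishes clobbers from other defs):
```
#include "llvm/ADT/Twine.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/IR/DiagnosticInfo.h"

using namespace llvm;

// Illustrative only: after the reserved set is frozen, warn on INLINEASM
// operands that define (clobber) a reserved physical register.
static void warnOnReservedInlineAsmClobber(const MachineInstr &MI,
                                           const MachineFunction &MF) {
  if (!MI.isInlineAsm())
    return;
  const MachineRegisterInfo &MRI = MF.getRegInfo();
  const TargetRegisterInfo &TRI = *MF.getSubtarget().getRegisterInfo();
  for (const MachineOperand &MO : MI.operands()) {
    if (!MO.isReg() || !MO.isDef() || !MO.getReg().isPhysical())
      continue;
    MCRegister Reg = MO.getReg().asMCReg();
    if (MRI.isReserved(Reg))
      MF.getFunction().getContext().diagnose(DiagnosticInfoInlineAsm(
          "inline asm clobbers reserved register " + Twine(TRI.getName(Reg)),
          DS_Warning));
  }
}
```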
There is also isInlineAsmReadOnlyReg, although I don't understand what
the distinction really is. It's called in SelectionDAGBuilder, long
before the set of reserved registers is frozen, so I'm not sure how
that can possibly work reliably.
Unfortunately this is also using the ugly TableGen-generated names for
the registers.
This adds or reuses ISD opcodes for vwadd.wv, vwaddu.wv, vwadd.vv and
vwaddu.vv, and a similar set for sub.
I've included support for narrowing scalar splats that have known
sign/zero bits similar to what was done for MUL_VL.
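The known-bits test is roughly the following (illustrative helper, not the exact code):
```
#include "llvm/ADT/APInt.h"
#include "llvm/CodeGen/SelectionDAG.h"

using namespace llvm;

// Can the splatted scalar Op be truncated to NarrowBits and then sign- or
// zero-extended back without changing its value?
static bool isNarrowableSplatScalar(SDValue Op, unsigned NarrowBits,
                                    bool IsSigned, SelectionDAG &DAG) {
  unsigned OpBits = Op.getValueType().getScalarSizeInBits();
  if (IsSigned)
    // Need more sign bits than the number of bits we intend to drop.
    return DAG.ComputeNumSignBits(Op) > OpBits - NarrowBits;
  // For the unsigned case every dropped bit must be known zero.
  return DAG.MaskedValueIsZero(Op, APInt::getBitsSetFrom(OpBits, NarrowBits));
}
```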
The conversion to vwadd.vv proceeds in two phases. First we'll form
a vwadd.wv by narrowing one of the operands. Then we'll visit the
vwadd.wv to try to narrow the other operand. This turned out to be
simpler than catching all the cases in one step. The forming of
vwadd.wv can happen for either operand of add, but only the right-hand
side of sub, since sub isn't commutable.
An interesting quirk is that ADD_VL and VZEXT_VL/VSEXT_VL are formed
during vector op legalization, but VMV_V_X_VL isn't usually formed
until op legalization when BUILD_VECTORS are handled. This leads to
VWADD_W_VL forming in one DAG combine round, and then a later DAG combine
round sees the VMV_V_X_VL and needs to commute the operands to get the
splat in position. This alone necessitated a VWADD_W_VL combine function
which made forming vwadd.vv in two stages an easy choice.
I've left out trying hard to form vwadd.wx instructions for now. It would
only save an extend in the scalar domain, which isn't as interesting.
Might need to review the test coverage a bit. Most of the vwadd.wv
instructions are coming from vXi64 tests on rv64. The tests were
copy-pasted from the existing multiply tests.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D117954
This allows us to set the noclobber flag on (the MMO of) a load
instruction instead of on the pointer. This fixes a bug where noclobber
was being applied to all loads from the same pointer, even if some of
them were clobbered.
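Conceptually the change amounts to something like this (a hedged sketch; the flag value is target-specific, so it is passed in here rather than named):
```
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineMemOperand.h"

// Illustrative only: mark the load's own memory operand(s) with the
// target's no-clobber MMO flag, so the property applies to this load and
// not to every load of the same pointer.
static void markLoadNoClobber(llvm::MachineInstr &LoadMI,
                              llvm::MachineMemOperand::Flags NoClobberFlag) {
  for (llvm::MachineMemOperand *MMO : LoadMI.memoperands())
    MMO->setFlags(NoClobberFlag);
}
```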
Differential Revision: https://reviews.llvm.org/D118775
This patch introduces the conversions from math function calls
to MASS library calls. To resolve calls generated with these conversions, one
needs to link the libxlopt.a library. This patch is tested on PowerPC Linux and AIX.
Differential: https://reviews.llvm.org/D101759
Reviewer: bmahjour
LLVM has a couple of ways of producing ccmp - either from chains in isel
or from a later ifcvt-style pass. This adds a simple DAG combine to
capture more cases, converting and(csel(0, 1, cc0), csel(0, 1, cc1))
into a csel(ccmp(.., cc0)), depending on cc1 (a SUBS in this case).
Differential Revision: https://reviews.llvm.org/D118327
This is an intentionally limited/different form of D90113.
That patch bravely tries to generalize folds where we pull
a binop into the arms of a select:
N0 + (Cond ? 0 : FVal) --> Cond ? N0 : (N0 + FVal)
...but it is not universally profitable.
This is the inverse of IR canonicalization as discussed in
D113442.
We know that this transform is not entirely profitable even
within x86, so we only handle x86 vector fadd/fsub as a first
step. The intent is to prevent AVX512 regressions as mentioned
in D113442.
The plan is to port this to DAGCombiner (so it will eventually
look more like D90113) and add more types/cases in pieces with
many more tests to verify that we are seeing improvements.
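A DAG-combine-style sketch of the basic fold (simplified; the in-tree X86 code restricts this to the vector fadd/fsub cases above and carries extra legality/profitability checks):
```
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/CodeGen/SelectionDAGNodes.h"

using namespace llvm;

// True if V is a constant such that "X op V == X" always holds:
// -0.0 for fadd, +0.0 on the RHS of fsub (with nsz either zero works).
static bool isStrictFPNeutral(unsigned Opcode, SDValue V) {
  ConstantFPSDNode *C = isConstOrConstSplatFP(V, /*AllowUndefs=*/true);
  if (!C || !C->isZero())
    return false;
  return Opcode == ISD::FADD ? C->isNegative() : !C->isNegative();
}

// Sketch: fadd N0, (vselect Cond, Neutral, FVal)
//           --> vselect Cond, N0, (fadd N0, FVal)
// i.e. pull the binop into the select arm, inverting the IR canonicalization.
static SDValue foldBinopIntoSelectArm(SDNode *N, SelectionDAG &DAG) {
  unsigned Opcode = N->getOpcode();
  if (Opcode != ISD::FADD && Opcode != ISD::FSUB)
    return SDValue();
  SDValue N0 = N->getOperand(0), N1 = N->getOperand(1);
  if (N1.getOpcode() != ISD::VSELECT || !N1.hasOneUse())
    return SDValue();
  SDValue Cond = N1.getOperand(0);
  SDValue TVal = N1.getOperand(1), FVal = N1.getOperand(2);
  if (!isStrictFPNeutral(Opcode, TVal))
    return SDValue();
  SDLoc DL(N);
  EVT VT = N->getValueType(0);
  SDValue Arm = DAG.getNode(Opcode, DL, VT, N0, FVal, N->getFlags());
  return DAG.getNode(ISD::VSELECT, DL, VT, Cond, N0, Arm);
}
```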
Differential Revision: https://reviews.llvm.org/D118644
None of the external users actually touch these (they're purely used internally down the recursive call) - it's trivial to add another wrapper if anything ever does want to track known elements.
Apparently GCC 5.4 (a supported compiler) has a bug where it will
use the "MachineInstr &MI" defined by the range-based for loop
to evaluate the for loop expression. Pick a different variable
name to avoid this.
This patch adds custom lowering support for ISD::SDIV and ISD::UDIV
when SVE is enabled, regardless of the minimum SVE vector length. We do
this because NEON simply does not have vector integer divide support, so
we want to take advantage of these instructions in SVE.
As part of this patch I've also simplified LowerToPredicatedOp to avoid
re-asking the same question about whether we should be using SVE for
fixed-length vectors. Once we've made the decision to call
LowerToPredicatedOp, we should simply assert that we are using SVE.
I've updated the 128-bit min SVE vector bits tests here:
CodeGen/AArch64/sve-fixed-length-int-div.ll
CodeGen/AArch64/sve-fixed-length-int-rem.ll
Differential Revision: https://reviews.llvm.org/D117871
Whilst adding legal types <-> register classes for Streaming SVE in
D118561, I noticed the hasSVE-predicated block sets operation actions for
opcodes that may not be legal in Streaming SVE. Move these operations to
the later hasSVE block, which contains loops over the same types.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D118560
The new LEGALAVL node annotates that the AVL refers to packs of 64 bits.
We use a two-stage lowering approach with LEGALAVL:
First, standard SDNodes are translated into illegal VVP layer nodes.
Regardless of source (VP or standard), all VVP nodes have a mask and AVL
parameter. The AVL parameter refers to the element position (just as in
VP intrinsics).
Second, we legalize the AVL usage in VVP layer nodes. If the element
size is less than 64 bits, the AVL parameter has to be adjusted to refer
to packs of 64 bits. We wrap the legalized AVL in a LEGALAVL node to track this.
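For the 32-bit-element (packed) case the adjustment is roughly a divide-and-round-up by two, as sketched below (illustrative helper; the i32 AVL type and operand handling are assumptions, not the exact in-tree code):
```
// Two 32-bit elements share one 64-bit pack, so the hardware AVL is
// ceil(AVL / 2). The result is wrapped in LEGALAVL so later code knows
// this AVL is already legalized.
static SDValue legalizePackedAVL(SDValue AVL, const SDLoc &DL,
                                 SelectionDAG &DAG) {
  if (AVL.getOpcode() == VEISD::LEGALAVL)
    return AVL; // already legalized
  SDValue One = DAG.getConstant(1, DL, MVT::i32);
  SDValue PlusOne = DAG.getNode(ISD::ADD, DL, MVT::i32, AVL, One);
  SDValue Packs = DAG.getNode(ISD::SRL, DL, MVT::i32, PlusOne, One);
  return DAG.getNode(VEISD::LEGALAVL, DL, MVT::i32, Packs);
}
```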
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D118321
This patch fixes the atomicrmw result value to be the value before the
operation instead of the value after the operation. This was a bug, left
as a FIXME in the code (see https://reviews.llvm.org/D97127).
From the LangRef:
> The contents of memory at the location specified by the <pointer>
> operand are atomically read, modified, and written back. The original
> value at the location is returned.
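As a generic IR-level illustration of that contract (a hedged sketch using a cmpxchg loop, not the backend expansion changed in this patch), note that the atomicrmw's users receive the value loaded before the add:
```
#include <cassert>

#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Expand "atomicrmw add" into a cmpxchg loop. The value RAUW'd over the
// atomicrmw is the memory contents *before* the add, per the LangRef.
// (The initial load is shown as a plain load for brevity.)
static void expandAtomicAddToCASLoop(AtomicRMWInst *AI) {
  assert(AI->getOperation() == AtomicRMWInst::Add && "sketch handles add only");
  BasicBlock *Entry = AI->getParent();
  Function *F = Entry->getParent();
  LLVMContext &Ctx = F->getContext();

  // Everything from the atomicrmw onwards moves to 'End'; 'Loop' retries.
  BasicBlock *End = Entry->splitBasicBlock(AI->getIterator(), "atomic.end");
  BasicBlock *Loop = BasicBlock::Create(Ctx, "atomic.loop", F, End);
  Entry->getTerminator()->setSuccessor(0, Loop);

  Value *Ptr = AI->getPointerOperand();
  Value *Val = AI->getValOperand();
  Type *Ty = AI->getType();

  IRBuilder<> B(Loop);
  PHINode *Loaded = B.CreatePHI(Ty, 2, "loaded");
  Value *NewVal = B.CreateAdd(Loaded, Val, "new");
  Value *Pair = B.CreateAtomicCmpXchg(
      Ptr, Loaded, NewVal, AI->getAlign(), AI->getOrdering(),
      AtomicCmpXchgInst::getStrongestFailureOrdering(AI->getOrdering()));
  Value *OldVal = B.CreateExtractValue(Pair, 0, "old");
  Value *Success = B.CreateExtractValue(Pair, 1, "success");
  B.CreateCondBr(Success, End, Loop);

  // Seed the PHI: a plain load before the loop, the CAS result on retries.
  IRBuilder<> EntryB(Entry->getTerminator());
  Loaded->addIncoming(EntryB.CreateLoad(Ty, Ptr, "init"), Entry);
  Loaded->addIncoming(OldVal, Loop);

  // The atomicrmw's result is the original value at the location.
  AI->replaceAllUsesWith(OldVal);
  AI->eraseFromParent();
}
```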
Doing this expansion early allows the register allocator to arrange
registers in such a way that commutable operations are simply swapped
around as needed, which results in shorter code while still being
correct.
Differential Revision: https://reviews.llvm.org/D117725
Based on the output of include-what-you-use.
This is a big chunk of changes. It is very likely to break downstream code
unless it took a lot of care to avoid hidden header dependencies, something
the LLVM codebase doesn't do that well :-/
I've tried to summarize the biggest changes below:
- llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h
- llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h
- llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h
- llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h
- llvm/IR/LegacyPassManager.h no longer includes llvm/Pass.h
- llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h
- llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h
And the usual count of preprocessed lines:
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 6400831
after: 6189948
200k fewer lines to process is not that bad ;-)
Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D118652
MemorySSA considers any atomic to be a def for any operation it
dominates, just like a barrier or fence. That is correct from a memory
state perspective, but not required for the no-clobber metadata since
we are not using it for reordering. Skip such atomics during the
scan, just like a barrier, if they do not alias with the load.
Differential Revision: https://reviews.llvm.org/D118661
Currently we cannot convert a vector load into a scalar load if there
is a dominating barrier or fence, because these are considered
clobbering memory accesses in order to prevent the reordering of memory
operations. While such reordering is indeed not possible, the actual
memory is not clobbered by a barrier or fence, and we can still use a
scalar load for a uniform pointer.
The solution is not to bail on the first clobbering access but to
traverse MemorySSA to the root, excluding barriers and fences.
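A sketch of the walk being described (the helper and its barrier predicate are illustrative, not the exact code):
```
#include "llvm/ADT/STLExtras.h"
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/Analysis/MemorySSA.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Illustrative only: instead of giving up at the first clobbering access,
// walk the defining accesses towards liveOnEntry, skipping fences and
// target barriers (identified by IsBarrier) and anything that does not
// actually modify the loaded location.
static bool hasRealClobber(LoadInst *Load, MemorySSA &MSSA, AAResults &AA,
                           function_ref<bool(const Instruction *)> IsBarrier) {
  const MemoryLocation Loc = MemoryLocation::get(Load);
  MemoryAccess *MA = MSSA.getWalker()->getClobberingMemoryAccess(Load);
  while (!MSSA.isLiveOnEntryDef(MA)) {
    auto *Def = dyn_cast<MemoryDef>(MA);
    if (!Def)
      return true; // MemoryPhi: be conservative in this sketch.
    Instruction *DefInst = Def->getMemoryInst();
    // Fences and barriers order memory but do not write the loaded data.
    if (isa<FenceInst>(DefInst) || IsBarrier(DefInst) ||
        !isModSet(AA.getModRefInfo(DefInst, Loc))) {
      MA = Def->getDefiningAccess();
      continue;
    }
    return true;
  }
  return false; // reached function entry without a real clobber
}
```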
Differential Revision: https://reviews.llvm.org/D118419
The first phase of the analysis can avoid a vsetvli if an earlier
instruction in the block used an SEW and LMUL that when combined with
the EEW of the load/store would produce the desired EMUL. If we
avoided a vsetvli this will affect the global analysis we do in the
second phase.
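The compatibility condition the first phase checks can be stated as a ratio test; a small illustrative helper (LMUL and EMUL expressed in eighths so that fractional values stay integral):
```
#include <cstdint>

// A load/store with element width EEW and register-group multiplier EMUL
// can reuse the VL from an earlier vsetvli with SEW/LMUL exactly when the
// SEW:LMUL and EEW:EMUL ratios match, since equal ratios give equal VLMAX.
static bool sameVLMAX(unsigned PrevSEW, unsigned PrevLMULx8, unsigned EEW,
                      unsigned EMULx8) {
  return static_cast<uint64_t>(PrevSEW) * EMULx8 ==
         static_cast<uint64_t>(EEW) * PrevLMULx8;
}
```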
The third phase where we really insert the vsetvlis needs to agree
with the first phase. If it doesn't we can insert vsetvlis that
invalidate the global analysis.
In the test case there is a VSETVLI in the preheader that sets
SEW=64 and LMUL=1. Inside the loop there is a VADD with SEW=64 and LMUL=1.
This VADD is followed by a store that wants SEW=32 and LMUL=1/2.
Because it has EEW=32 as part of the opcode, the SEW=64, LMUL=1 from the
VADD can become EMUL=1/2 for the store. So the first phase determines no
vsetvli is needed.
The third phase manages CurInfo differently than BBInfo.Change from the
first phase. CurInfo is only updated when we see a vsetvli or insert
a vsetvli. This was done to allow predecessor block information from
the global analysis to be applied to multiple instructions. Since the
loop body has no vsetvli we won't update CurInfo for either the VADD
or the VSE. This prevented us from checking the store vsetvli elision
for the VSE, resulting in a vsetvli with SEW=32, LMUL=1/2 being emitted,
which invalidated the global analysis.
To mitigate this, I've added a BBLocalInfo variable that more closely
matches the first phase propagation. This gets updated based on the
VADD and prevents emitting a vsetvli for the store, matching the
decision made in the first phase.
I wonder if we should do an earlier phase to handle the load/store case
by adding more pseudo opcodes and changing the SEW/LMUL for those
instructions before the insertion analysis. That might be more robust
than trying to guarantee two phases make the same decision.
Fixes the test from D118629.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D118667
This patch updates the P10 patterns with a load feeding into an insertelt to
utilize the refactored load and store infrastructure, as well as updating any
tests that exhibit any codegen changes.
Furthermore, custom legalization is added for v4f32 on Power9 and above, not
only to assist with adjusting the refactored load/stores for P10 vector insert,
but also to enable the utilization of direct moves.
Differential Revision: https://reviews.llvm.org/D115691
Pointer element types do not imply that the pointer is ABI aligned.
We should either use an explicit align attribute here or fall back to
an alignment of 1. This fixes a new element type access introduced in
D117764.
I don't think this makes any practical difference though, as the
lowering does not depend on alignment.
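For illustration, the conservative alternative looks like this (generic IRBuilder usage, not the code touched here):
```
#include "llvm/IR/IRBuilder.h"
#include "llvm/Support/Alignment.h"

// Without an explicit align attribute, request the conservative alignment
// of 1 instead of deriving ABI alignment from the pointee type.
static llvm::LoadInst *emitByteAlignedLoad(llvm::IRBuilderBase &B,
                                           llvm::Type *EltTy,
                                           llvm::Value *Ptr) {
  return B.CreateAlignedLoad(EltTy, Ptr, llvm::Align(1));
}
```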
Differential Revision: https://reviews.llvm.org/D118681
Currently, ARMBaseInstrInfo::getInstSizeInBytes() uses hard-coded
instruction sizes for some pseudo-instructions, while this
information should ideally be found in the ARMInstrInfo.td and
ARMInstrThumb(2).td files (which can be accessed via MCInstrDesc). Hence,
the .td files should be updated and no hard-coded instruction sizes
should be used by getInstSizeInBytes() anymore.
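The direction described amounts to something like this (illustrative, not the exact ARM change):
```
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/MC/MCInstrDesc.h"

// Read the byte size recorded by TableGen (the .td 'Size' field) via
// MCInstrDesc instead of keeping hard-coded per-pseudo sizes in C++.
// A result of 0 means the .td entry does not specify a size yet.
static unsigned instSizeFromDesc(const llvm::MachineInstr &MI) {
  return MI.getDesc().getSize();
}
```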
Differential Revision: https://reviews.llvm.org/D118009
Currently, AArch64InstrInfo::getInstSizeInBytes() uses hard-coded
instruction sizes for some pseudo-instructions, while this
information should ideally be found in the AArch64InstrInfo.td file (which
can be accessed via MCInstrDesc). Hence, the .td file should be updated
and no hard-coded instruction sizes should be used by
getInstSizeInBytes() anymore.
Differential Revision: https://reviews.llvm.org/D117970
I have updated TargetLowering::isConstTrueVal to also consider
SPLAT_VECTOR nodes with constant integer operands. This allows the
optimisation to also work for targets that support scalable vectors.
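The extra case amounts to looking through SPLAT_VECTOR as well (hypothetical helper for illustration):
```
#include "llvm/CodeGen/SelectionDAGNodes.h"

using namespace llvm;

// Return the splatted integer constant whether the value is a constant
// BUILD_VECTOR splat (fixed-length vectors) or a SPLAT_VECTOR of a
// constant (scalable vectors), so checks like isConstTrueVal can handle
// both forms.
static ConstantSDNode *getSplatConstantInt(SDValue V) {
  if (auto *BV = dyn_cast<BuildVectorSDNode>(V))
    return BV->getConstantSplatNode();
  if (V.getOpcode() == ISD::SPLAT_VECTOR)
    return dyn_cast<ConstantSDNode>(V.getOperand(0));
  return dyn_cast<ConstantSDNode>(V);
}
```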
Differential Revision: https://reviews.llvm.org/D117210
Summary:
Add code object v5 support (default is still v4)
Generate metadata for implicit kernel args for the new ABI
Set the metadata version to be 1.2
Reviewers:
t-tye, b-sumner, arsenm, and bcahoon
Fixes:
SWDEV-307188, SWDEV-307189
Differential Revision:
https://reviews.llvm.org/D118272
This removes `HasPAUTH` from `AArch64SubTarget`, as it seems to be a
redundant, unused copy of `HasPAuth`.
Differential Revision: https://reviews.llvm.org/D117782
New target SDNodes are added: AArch64ISD::MOPS_MEMSET, etc.
Each intrinsic is translated to one of these in SelectionDAGBuilder
via EmitTargetCodeForMOPS.
A custom lowering routine for INTRINSIC_W_CHAIN is added to handle
llvm.aarch64.mops.memset.tag. This takes a separate path from the common
intrinsics but ultimately ends up in the same EmitMOPS().
This is part 4/4 of a series of patches split from
https://reviews.llvm.org/D117405 to facilitate reviewing.
Patch by Tomas Matheson, Lucas Prates and Son Tuan Vu.
Differential Revision: https://reviews.llvm.org/D117764
This implements codegen for Armv8.8/9.3 Memory Operations extension (MOPS).
Any memcpy/memset/memmove intrinsics will always be emitted as a series
of three consecutive instructions P, M and E which perform the
operation. The SelectionDAG implementation is split into a separate
patch.
AArch64LegalizerInfo will now consider the following generic opcodes
if +mops is available, instead of legalising by expanding them to
libcalls: G_BZERO, G_MEMCPY_INLINE, G_MEMCPY, G_MEMMOVE, G_MEMSET
The s8 value of memset is legalised to s64 to match the pseudos.
AArch64O0PreLegalizerCombinerInfo will still be able to combine
G_MEMCPY_INLINE even if +mops is present, as it is unclear whether it is
better to generate fixed length copies or MOPS instructions for the
inline code of small or zero-sized memory operations, so we choose to be
conservative for now.
AArch64InstructionSelector will select the above as new pseudo
instructions: AArch64::MOPSMemory{Copy/Move/Set/SetTagging}. These are
each expanded to a series of three instructions (e.g. SETP/SETM/SETE),
which must be emitted together during code emission to avoid scheduler
reordering.
This is part 3/4 of a series of patches split from
https://reviews.llvm.org/D117405 to facilitate reviewing.
Patch by Tomas Matheson and Son Tuan Vu
Differential Revision: https://reviews.llvm.org/D117763
According to the specification, MOPS instructions define/use NZCV flags as
part of their semantics (see discussion in
https://reviews.llvm.org/D116157).
More specifically, the specification of the MOPS extension states that
each memcpy/memset/memmove operation will be performed by a series
of three MOPS instructions P, M and E. The P instruction writes to the
NZCV flags, while the others (M and E) read from the NZCV flags.
This is part 2/4 of a series of patches split from
https://reviews.llvm.org/D117405 to facilitate reviewing.
Differential Revision: https://reviews.llvm.org/D117757
This reverts commit 00bf4755e9.
This change broke the emscripten builder (among other things):
https://ci.chromium.org/ui/p/emscripten-releases/builders/try/linux/b8823500584349280721/overview
Sample failure:
```
test_unistd_unlink (test_core.core0) ...
wasm-ld: error: symbol type mismatch: __stdio_write
>>> defined as WASM_SYMBOL_TYPE_FUNCTION in /usr/local/google/home/sbc/dev/wasm/emscripten/cache/sysroot/lib/wasm32-emscripten/libc-debug.a(__stdio_write.o)
>>> defined as WASM_SYMBOL_TYPE_DATA in /usr/local/google/home/sbc/dev/wasm/emscripten/cache/sysroot/lib/wasm32-emscripten/libc-debug.a(stderr.o)
```
The optimization added in D118139 causes a crash on the added test case
while trying to zero-extend a vector of floats.
Fix the crash by bailing out for floating point operands.
Reviewed By: DavidTruby
Differential Revision: https://reviews.llvm.org/D118615
We convert VLEN to vscale by dividing by RVVBitsPerBlock which is
currently 64. This is only correct if VLEN is evenly divisible by
64. With only Zvl32b we can't assume that.
This patch adds a fatal_error to prevent generating code that may
be broken.
We probably need to look at how we size stack frame objects too.
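The guard amounts to something like this (illustrative sketch; RVVBitsPerBlock is 64):
```
#include "llvm/Support/ErrorHandling.h"

// The VLEN -> vscale conversion divides by 64, which is only sound when
// VLEN is a multiple of 64; e.g. Zvl32b alone does not guarantee that.
static unsigned vlenToVScale(unsigned VLenInBits) {
  constexpr unsigned RVVBitsPerBlock = 64;
  if (VLenInBits % RVVBitsPerBlock != 0)
    llvm::report_fatal_error(
        "VLEN that is not a multiple of 64 is not yet supported");
  return VLenInBits / RVVBitsPerBlock;
}
```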
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D118583
We had previously hardcoded this to assume that vector registers
are 128 bits. This was true when only V existed, but after Zve
extensions were added this became incorrect.
This patch adjusts it to support 128-, 64-, or 32-bit vectors depending
on Zvl. The 128-bit limit is artificial, but we don't have any test
coverage showing that we handle larger values, so I was being conservative.
None of our lit tests depend on this code today due to the custom
lowering of ISD::VSCALE that inserts the appropriate left or right
shift to convert from VLENB to VSCALE. That code was added after
this code in computeKnownBitsForTargetNode.
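The computation this would feed looks roughly like the following (illustrative helper; it relies on VLEN being a power of two between the Zvl minimum and the artificial 128-bit cap mentioned above):
```
#include "llvm/Support/KnownBits.h"
#include "llvm/Support/MathExtras.h"

// VLENB = VLEN / 8 and VLEN is a power of two in [MinVLen, MaxVLen], so
// all bits below log2(MinVLen/8) and above log2(MaxVLen/8) are known zero.
static llvm::KnownBits vlenbKnownBits(unsigned MinVLen, unsigned MaxVLen,
                                      unsigned BitWidth) {
  llvm::KnownBits Known(BitWidth);
  Known.Zero.setLowBits(llvm::Log2_32(MinVLen / 8));
  Known.Zero.setBitsFrom(llvm::Log2_32(MaxVLen / 8) + 1);
  return Known;
}
```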
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D118582
The spec doesn't seem to be written as if Zfh implies Zfhmin. They
seem to be separate extensions.
This patch changes the Zfhmin instructions to be enabled with
either the Zfh or Zfhmin extension.
Reviewed By: achieveartificialintelligence
Differential Revision: https://reviews.llvm.org/D118581
We already call SimplifyDemandedVectorElts using whether each vector mask element is zero/nonzero; this just extends that to also try SimplifyDemandedBits, using the demanded bits mask generated from the nonzero elements.
This also requires an additional TargetLowering::SimplifyDemandedBits wrapper that takes both DemandedBits and DemandedElts.