llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	68fab44acf	AMDGPU: Fix visiting physreg dest users when folding immediate copies This can fold the immediate into the physical destination, but this should not look for further users of the register. Fixes regression introduced by `766cb615a3`.	2020-08-10 13:46:51 -04:00
Craig Topper	96dfc783b2	[BreakFalseDeps][X86] Move operand loop out of X86's getUndefRegClearance and put in the pass. X86 is the only user of this interface in tree. Previously the X86 pass would loop over operands looking for one undef operand for the pass to fix. But there could theoretically be multiple operands to fix. So it makes more sense for the pass to do the looping and ask the target if an operand needs to be fixed.	2020-08-10 10:32:29 -07:00
Wouter van Oortmerssen	582fd474dd	[WebAssembly] wasm64: fix memory.init operand types I had assumed they would all become in i64, but this is not necessary as long as data segments stay 32-bit, see: https://github.com/WebAssembly/memory64/blob/master/proposals/memory64/Overview.md Differential Revision: https://reviews.llvm.org/D85552	2020-08-10 10:15:20 -07:00
Simon Pilgrim	9a368d2b00	[X86][SSE] shuffle(hop,hop) - canonicalize unary hop(x,x) shuffle masks If a shuffle is referring to both the lower and upper half lanes of an unary horizontal op, then canonicalize the mask to only refer to the lower half.	2020-08-10 16:09:27 +01:00
jasonliu	7866442b3f	[XCOFF] Adjust .rename emission sequence Summary: AIX assembler does not generate correct relocation when .rename appear between tc entry label and .tc directive. So only emit .rename after .tc/.comm or other linkage is emitted. Reviewed By: daltenty, hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D85317	2020-08-10 14:48:24 +00:00
Xiangling Liao	6ef801aa6b	[AIX] Static init frontend recovery and backend support On the frontend side, this patch recovers AIX static init implementation to use the linkage type and function names Clang chooses for sinit related function. On the backend side, this patch sets correct linkage and function names on aliases created for sinit/sterm functions. Differential Revision: https://reviews.llvm.org/D84534	2020-08-10 10:10:49 -04:00
Simon Pilgrim	07e673a02b	[X86][SSE] Pull out shuffle(hop,hop) combine into combineShuffleWithHorizOp helper. NFC.	2020-08-10 15:08:57 +01:00
Stefan Pintilie	81883ca074	[PowerPC] Add option to control PCRel GOT indirect linker optimization Add a hidden option to the compiler to control a the PC Relative GOT indirect linker optimization. If this option is set to false the compiler will no loger produce the relocations required by the linker to perform the optimization. Reviewed By: nemanjai, NeHuang, #powerpc Differential Revision: https://reviews.llvm.org/D85377	2020-08-10 09:07:17 -05:00
Sam Parker	4f9f4b21e0	[ARM] Unrestrict Armv8-a IT when at minsize IT blocks with more than one instruction were performance deprecated in Armv8 but that doesn't mean we should follow that advise when optimising for size. Differential Revision: https://reviews.llvm.org/D85638	2020-08-10 14:59:53 +01:00
Simon Pilgrim	e6dc2c8ce7	[X86][SSE] combineTargetShuffle - rearrange shuffle(hop,hop) matching to delay shuffle mask manipulation. NFC. Check that we're shuffling hadd/pack ops first before altering shuffle masks. First step towards adding extra functionality, plus it avoids costly shuffle mask manipulation if not necessary.	2020-08-10 14:13:19 +01:00
Matt Arsenault	40188f807d	AMDGPU/GlobalISel: Don't try to handle undef source operand This is now illegal MIR	2020-08-10 08:49:43 -04:00
Matt Arsenault	f9c279b057	PeepholeOptimizer: Use Register	2020-08-10 08:49:36 -04:00
Matt Arsenault	a0ec81f70d	AMDGPU/GlobalISel: Merge load/store select cases	2020-08-10 08:46:26 -04:00
Matt Arsenault	c8b17874e5	AMDGPU/GlobalISel: Fix typo	2020-08-10 08:41:17 -04:00
Matt Arsenault	9533f0ea68	AMDGPU/GlobalISel: Use nicer form of buildInstr	2020-08-10 08:41:07 -04:00
Qiu Chaofan	dbcfbffc7a	[PowerPC] Add intrinsic to read or set FPSCR register This patch introduces two intrinsics: llvm.ppc.setflm and llvm.ppc.readflm. They read from or write to FPSCR register (floating-point status & control) which contains rounding mode and exception status. To ensure correctness of program, we need to prevent FP operations from being moved across these intrinsics (mffs/mtfsf instruction), so here I set them as scheduling boundaries. We can relax such restriction if FPSCR is modeled well in the future. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D84914	2020-08-10 18:27:45 +08:00
Petar Avramovic	0d58d9e8fb	AMDGPU/GlobalISel: Lower G_FREM Add custom lower for G_FREM. Differential Revision: https://reviews.llvm.org/D84324	2020-08-10 10:10:46 +02:00
Piotr Sobczak	62d8b8a225	Fix 64-bit copy to SCC Fix 64-bit copy to SCC by restricting the pattern resulting in such a copy to subtargets supporting 64-bit scalar compare, and mapping the copy to S_CMP_LG_U64. Before introducing the S_CSELECT pattern with explicit SCC (`0045786f14`), there was no need for handling 64-bit copy to SCC ($scc = COPY sreg_64). The proposed handling to read only the low bits was however based on a false premise that it is only one bit that matters, while in fact the copy source might be a vector of booleans and all bits need to be considered. The practical problem of mapping the 64-bit copy to SCC is that the natural instruction to use (S_CMP_LG_U64) is not available on old hardware. Fix it by restricting the problematic pattern to subtargets supporting the instruction (hasScalarCompareEq64). Differential Revision: https://reviews.llvm.org/D85207	2020-08-09 20:50:30 +02:00
David Green	186a7f81e8	[ARM] Add VADDV and VMLAV patterns for v16i16 This adds patterns for v16i16's vecreduce, using all the existing code to go via an i32 VADDV/VMLAV and truncating the result. Differential Revision: https://reviews.llvm.org/D85452	2020-08-09 11:09:49 +01:00
David Green	8590e5abad	[ARM] Allow vecreduce_add in tail predicated loops This allows vecreduce_add in loops so that we can tailpredicate them. Differential Revision: https://reviews.llvm.org/D85454	2020-08-09 10:57:17 +01:00
David Green	296faa91ed	[ARM] Some formatting and predicate VRHADD patterns. NFC This formats some of the MVE patterns, and adds a missing Predicates = [HasMVEInt] to some VRHADD patterns I noticed as going through. Although I don't believe NEON would ever use the patterns (as it would use ADDL and VSHRN instead) they should ideally be predicated on having MVE instructions.	2020-08-09 10:07:52 +01:00
Craig Topper	bc8be30540	[X86][GlobalISel] Remove unneeded code for handling zext i8->16, i8->i64, i16->i64, i32->i64. These all seem to be handled by tablegen pattern imports.	2020-08-09 00:26:15 -07:00
Thomas Lively	cc612c2908	[WebAssembly] Fix FastISel address calculation bug Fixes PR47040, in which an assertion was improperly triggered during FastISel's address computation. The issue was that an `Address` set to be relative to the FrameIndex with offset zero was incorrectly considered to have an unset base. When the left hand side of an add set the Address to be 0 off the FrameIndex, the right side would not detect that the Address base had already been set and could try to set the Address to be relative to a register instead, triggering an assertion. This patch fixes the issue by explicitly tracking whether an `Address` has been set rather than interpreting an offset of zero to mean the `Address` has not been set. Differential Revision: https://reviews.llvm.org/D85581	2020-08-08 15:23:11 -07:00
Craig Topper	d3153b5ca2	[X86] Remove a DCI.isBeforeLegalize() call from combineVSelectWithAllOnesOrZeros. This was blocking isTypeLegal call so that we could do a particular transform on illegal types before type legalization. But the we create a target specific node using that type. We shouldn't do that if the type isn't legal. So I think we should just always make sure the type is legal. I suspect that in order to get the condition VT to not be a vector of i1 we already completed type legalization anyway so this probably doesn't matter much in practice.	2020-08-08 14:19:13 -07:00
Craig Topper	966a58e329	[X86] Support matching VPTERNLOG when the root node is X86ISD::ANDNP.	2020-08-08 13:11:47 -07:00
Dávid Bolvanský	c814eca3e4	[AArch64RegisterInfo] Supress new warning	2020-08-08 21:47:01 +02:00
Craig Topper	815a9b256b	[X86] Remove isSafeToClobberEFLAGS helper and just inline it into the call sites. This is just a thin wrapper around computeRegisterLivness which we can just call directly. The only real difference is that isSafeToClobberEFLAGS returns a bool and computeRegisterLivness returns an enum. So we need to check for the specific enum value that isSafeToClobberEFLAGS was hiding. I've also adjusted which sites pass an explicit value for Neighborhood since the default for computeRegisterLivness is 10.	2020-08-08 12:31:58 -07:00
Craig Topper	8d3ae64b04	Recommit "[X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places" I messed up the bug numbers in the commit message before Previously this function searched 4 instructions forwards or backwards to determine if it was ok to clobber eflags. This is called in 3 places: rematerialization, turning 2 operand leas into adds or splitting 3 ops leas into an lea and add on some CPU targets. This patch increases the search limit to 10 instructions for rematerialization and 2 operand lea to add. I've left the old treshold for 3 ops lea spliting as that increases code size. Fixes PR47024 and PR46315.	2020-08-08 11:53:14 -07:00
Craig Topper	761f568420	Revert "[X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places" This reverts commit `44b260cb0a`. I messed up the bug number in the commit message so I'm reverting to fix it.	2020-08-08 11:53:14 -07:00
Simon Pilgrim	cc15380f10	[X86][SSE] combineTargetShuffle - use scaleShuffleMask helper to widen shuffle mask. NFCI. Use scaleShuffleMask helper for the shuffle(hadd,hadd) canonicalization.	2020-08-08 19:36:18 +01:00
Craig Topper	44b260cb0a	[X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places Previously this function searched 4 instructions forwards or backwards to determine if it was ok to clobber eflags. This is called in 3 places: rematerialization, turning 2 operand leas into adds or splitting 3 ops leas into an lea and add on some CPU targets. This patch increases the search limit to 10 instructions for rematerialization and 2 operand lea to add. I've left the old treshold for 3 ops lea spliting as that increases code size. Fixes PR47024 and PR43014	2020-08-08 11:29:41 -07:00
Craig Topper	514b00c439	[X86] Limit the scope of the min/max canonicalization in combineSelect Previously the transform was doing these two canonicalizations (x > y) ? x : y -> (x >= y) ? x : y (x < y) ? x : y -> (x <= y) ? x : y But those don't seem to be useful generally. And they actively pessimize the cases in PR47049. This patch limits it to (x > 0) ? x : 0 -> (x >= 0) ? x : 0 (x < -1) ? x : -1 -> (x <= -1) ? x : -1 These are the cases mentioned in the comments as the motivation for the canonicalization. These allow the CMOV to use the S flag from the compare thus improving opportunities to use a TEST or the flags from an arithmetic instruction.	2020-08-07 22:51:49 -07:00
Keno Fischer	c58674df14	[X86] Don't produce bad x86andp nodes for i1 vectors In D85499, I attempted to fix this same issue by canonicalizing andnp for i1 vectors, but since there was some opposition to such a change, this commit just fixes the bug by using two different forms depending on which kind of vector type is in use. We can then always decide to switch the canonical forms later. Description of the original bug: We have a DAG combine that tries to fold (vselect cond, 0000..., X) -> (andnp cond, x). However, it does so by attempting to create an i64 vector with the number of elements obtained by truncating division by 64 from the bitwidth. This is bad for mask vectors like v8i1, since that division is just zero. Besides, we don't want i64 vectors anyway. For i1 vectors, switch the pattern to (andnp (not cond), x), which is the canonical form for `kandn` on mask registers. Fixes https://github.com/JuliaLang/julia/issues/36955. Differential Revision: https://reviews.llvm.org/D85553	2020-08-07 20:05:47 -04:00
Matt Arsenault	3c0597a9e4	AMDGPU: Avoid explicitly listing all the memory nodes	2020-08-07 19:22:46 -04:00
Arthur Eubanks	1bf4629f11	[PPC] Rename bool-ret-to-int -> ppc-bool-ret-to-int Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D85391	2020-08-07 11:27:05 -07:00
Vang Thao	04bd5b5286	[AMDGPU] Fix not rescheduling without clustering Regions are sometimes skipped which should be rescheduled without memory op clustering. RegionIdx is not incremented when iterating over regions that are flagged to be skipped, causing the index to be incorrect. Thanks to Vang Thao for discovering this bug! Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D85498	2020-08-07 11:15:58 -07:00
Amy Kwan	98eccec3ae	[PowerPC] Add Vector Extract/Expand/Count with Mask, Move to VSR Mask Instruction Definitions and MC Tests This patch adds the instruction definitions and assembly/disassembly tests for the following set of instructions: Vector Extract [byte \| half \| word \| doubleword \| quad] with mask Vector Expand [byte \| half \| word \| doubleword \| quad] with mask Move to VSR [byte \| byte immediate \| half \| word \| doubleword \| quad] with mask Vector Count Mask Bits [byte \| half \| word \| doubleword] Differential Revision: https://reviews.llvm.org/D83724	2020-08-07 11:02:08 -05:00
Kamau Bridgeman	d8c6d083c9	[PowerPC][PCRelative] Set TLS unsupported with PC relative memops Introduce a fatal error if any thread local storage code is compiled using pc relative memory operations as well as a hidden override option `-enable-ppc-pcrel-tls` so that this support can be incrementally added if possible. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D85448	2020-08-07 10:56:24 -05:00
Bevin Hansson	5de6c56f7e	[Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. Summary: This patch adds two intrinsics, llvm.sshl.sat and llvm.ushl.sat, which perform signed and unsigned saturating left shift, respectively. These are useful for implementing the Embedded-C fixed point support in Clang, originally discussed in http://lists.llvm.org/pipermail/llvm-dev/2018-August/125433.html and http://lists.llvm.org/pipermail/cfe-dev/2018-May/058019.html Reviewers: leonardchan, craig.topper, bjope, jdoerfert Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83216	2020-08-07 15:09:24 +02:00
Kazushi (Jam) Marukawa	63bc5d7863	[VE] Change to expand multiply related instructions Change to expand MULHU/MULHS/UMUL_LOHI/SMUL_LOHI for i32 and i64 since those instructions are not available on Aurora SX VE. Some of them are used in expansion of i128 multiply, so need to modify them to support i128. Then, update basic arithmetic regression tests of i128 and signed/unsigned i32 typed integer values. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D85490	2020-08-07 18:22:25 +09:00
David Sherwood	0905d9f31e	[SVE][CodeGen] Fix bug with store of unpacked FP scalable vectors Fixed an incorrect pattern in lib/Target/AArch64/AArch64SVEInstrInfo.td for storing out <vscale x 2 x f32> unpacked scalable vectors. Added a couple of tests to test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll Differential Revision: https://reviews.llvm.org/D85441	2020-08-07 07:19:09 +01:00
biplmish	cce1b0e891	[PowerPC] Implement Vector Extract Low/High Order Builtins in LLVM/Clang This patch implements the function prototypes vec_extractl and vec_extracth in altivec.h to utilize the vector extract double element instructions introduced in Power10. Differential Revision: https://reviews.llvm.org/D84622	2020-08-07 01:02:29 -05:00
QingShan Zhang	55de46f3b2	[PowerPC] Support constrained fp operation for setcc The constrained fp operation fcmp was added by https://reviews.llvm.org/D69281. This patch is trying to add the support for PowerPC backend. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D81727	2020-08-07 05:16:36 +00:00
Kazushi (Jam) Marukawa	f92e0d9384	[VE] Optimize trunc related instructions Change to not generate truncate instructions if all use of a truncate operation don't care about higher bits. For example, an i32 add instruction doesn't care about higher 32 bits in 64 bit registers. Updates regression tests also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D85418	2020-08-07 09:21:05 +09:00
Yonghong Song	c50f5dece9	BPF: fix libLLVMBPFCodeGen.so build failure Buildbot reported a build failure when building shared library libLLVMBPFCodeGen.so with unknown reference to "createCFGSimplificationPass". Commit `87cba43402` ("BPF: add a SimplifyCFG IR pass during generic Scalar/IPO optimization") added an IR pass SimplifyCFG by BPF target. The commit called function createCFGSimplificationPass() defined in "Scalar" library. Add this library in Target/BPF/LLVMBuild.txt so shared library build can succeed.	2020-08-06 15:27:15 -07:00
Matt Arsenault	87b2af8140	AMDGPU/GlobalISel: Enable s_{and\|or}n2_{b32\|b64} patterns	2020-08-06 18:00:38 -04:00
Yonghong Song	87cba43402	BPF: add a SimplifyCFG IR pass during generic Scalar/IPO optimization The following bpf linux kernel selftest failed with latest llvm: $ ./test_progs -n 7/10 ... The sequence of 8193 jumps is too complex. verification time 126272 usec stack depth 320 processed 114799 insns (limit 1000000) ... libbpf: failed to load object 'pyperf600_nounroll.o' test_bpf_verif_scale:FAIL:110 #7/10 pyperf600_nounroll.o:FAIL #7 bpf_verif_scale:FAIL After some investigation, I found the following llvm patch https://reviews.llvm.org/D84108 is responsible. The patch disabled hoisting common instructions in SimplifyCFG by default. Later on, the code changes and a SimplifyCFG phase with hoisting on cannot do the work any more. A test is provided to demonstrate the problem. The IR before simplifyCFG looks like: for.cond: %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ] %cmp = icmp ult i32 %i.0, 6 br i1 %cmp, label %for.body, label %for.cond.cleanup for.cond.cleanup: %2 = load i8, i8* %frame_ptr, align 8, !tbaa !2 %cmp2 = icmp eq i8* %2, null %conv = zext i1 %cmp2 to i32 call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %1) #3 call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %0) #3 ret i32 %conv for.body: %3 = load i8, i8* %frame_ptr, align 8, !tbaa !2 %tobool.not = icmp eq i8* %3, null br i1 %tobool.not, label %for.inc, label %land.lhs.true The first two insns of `for.cond.cleanup` and `for.body`, load and icmp, can be hoisted to `for.cond` block. With Patch D84108, the optimization is delayed. But unfortunately, later on loop rotation added addition phi nodes to `for.body` and hoisting cannot be done any more. Note such a hoisting is beneficial to bpf programs as bpf verifier does path sensitive analysis and verification. The hoisting preverts reloading from stack which will assume conservative value and increase exploited insns. In this case, it caused verifier failure. To fix this problem, I added an IR pass from bpf target to performance additional simplifycfg with hoisting common inst enabled. Differential Revision: https://reviews.llvm.org/D85434	2020-08-06 13:16:00 -07:00
dfukalov	4ccc38813e	[AMDGPU][CostModel] Add f16, f64 and contract cases to fused costs estimation. Add cases of fused fmul+fadd/fsub with f16 and f64 operands to cost model. Also added operations with contract attribute. Fixed line endings in test. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D84995	2020-08-06 21:43:27 +03:00
Matt Arsenault	e00201539f	GlobalISel: Implement fewerElementsVector for G_EXTRACT_VECTOR_ELT Use the same basic strategy as LegalizeVectorTypes. Try to index into smaller pieces if there's a constant index, and otherwise fall back to a stack temporary.	2020-08-06 14:33:16 -04:00
Matt Arsenault	1a0c0944c6	AMDGPU: Define raw/struct variants of buffer atomic fadd Somehow the new FP atomic buffer intrinsics ended up using the legacy style for buffer intrinsics.	2020-08-06 13:36:19 -04:00
Matt Arsenault	eae9c54148	AArch64/GlobalISel: Fix verifier error after selecting returnaddress This was caching the wrong register to re-use later.	2020-08-06 13:18:05 -04:00
Matt Arsenault	90eb7d5283	AMDGPU: Fix spilling of 96-bit AGPRs	2020-08-06 12:42:07 -04:00
Matt Arsenault	56270d1d42	AMDGPU/GlobalISel: Start trying to handle AGPR bank Try to use AGPR banks for the various merge/unmerge type operations. Previously these would introduce copies to VGPR.	2020-08-06 12:39:50 -04:00
Matt Arsenault	34040a4f61	GlobalISel: Define InvalidRegBankID enum value	2020-08-06 12:39:49 -04:00
Matt Arsenault	63cdc9a49f	AMDGPU/GlobalISel: Handle llvm.amdgcn.ds.{fadd\|fmin\|fmax} These intrinsics are missing mangling for both the pointer and data type.	2020-08-06 11:09:08 -04:00
Matt Arsenault	63c4be53cf	AMDGPU/GlobalISel: Try to promote to use packed saturating add/sub This produces worse results right now for i8 vectors, but that should be addressed when we actually try to optimize packed vectors.	2020-08-06 11:08:45 -04:00
Matt Arsenault	dcf3ffb0a8	AMDGPU/GlobalISel: Move frame index selection to patterns Doesn't really save any code until global value is handled too.	2020-08-06 10:42:15 -04:00
Matt Arsenault	d188a608bd	AMDGPU: Fix code duplication between the selectors Not sure this is the right place for this helper.	2020-08-06 10:42:15 -04:00
Matt Arsenault	5a503521e7	AMDGPU/GlobalISel: Implement expansion for rsq.clamp Not sure why we handle this removed instruction on newer subtargets for this one and no others, but maintain compatibility with the DAG.	2020-08-06 10:23:25 -04:00
Matt Arsenault	c015cbc68b	AMDGPU/GlobalISel: Fix trying to widen <3 x s1> boolean ops	2020-08-06 10:07:22 -04:00
Matt Arsenault	28124a0a63	AMDGPU/GlobalISel: Stop using G_EXTRACT in argument lowering We really need to put this undef padding stuff into a helper somewhere, but leave that for when this is moved to generic code.	2020-08-06 09:55:35 -04:00
Matt Arsenault	6c7f640bf7	AMDGPU/GlobalISel: Implement LLT version of allowsMisalignedMemoryAccesses	2020-08-06 09:50:36 -04:00
Matt Arsenault	37894ba661	AMDGPU/GlobalISel: Make s16 phi legal If we were to have an operation with an s16 def that needs to be executed in a waterfall loop, not having s16 legal would place an avoidable burden on RegBankSelect to widen it.	2020-08-06 09:41:14 -04:00
Matt Arsenault	5316256709	AMDGPU/GlobalISel: Fix assert on copy to vcc This was trying to constrain a physical register. By the verifier's understanding, it's impossible to have a 1-bit copy to vcc/vcc_lo so don't try to handle physregs.	2020-08-06 09:41:14 -04:00
Paul Walker	0d33a8ef5b	[SVE] Lower scalable vector mul operations. This allows us to remove extra patterns from AArch64SVEInstrInfo.td because we can reuse those required for fixed length vectors. Differential Revision: https://reviews.llvm.org/D85328	2020-08-06 11:15:35 +01:00
Paul Walker	3ed59b775d	[SVE] Implement lowering for fixed length vector multiplication. NOTE: Also uses SVE code generation for NEON size vectors, instead of expanding i64 based vector multiplications. Differential Revision: https://reviews.llvm.org/D85327	2020-08-06 11:01:39 +01:00
Martin Storsjö	f5e6fbac24	[AArch64] [Windows] Error out on unsupported symbol locations These might occur in seemingly generic assembly. Previously when targeting COFF, they were silently ignored, which certainly won't give the right result. Instead clearly error out, to make it clear that the assembly needs to be adjusted for this target. Also change a preexisting report_fatal_error into a proper error message, pointing out the offending source instruction. This isn't strictly an internal error, as it can be triggered by user input. Differential Revision: https://reviews.llvm.org/D85242	2020-08-06 09:23:46 +03:00
Martin Storsjö	5eedc01a82	[ARM, AArch64] Fix a comment typo. NFC.	2020-08-06 09:23:45 +03:00
Craig Topper	0215ae9735	[X86] Remove incomplete custom handling of i128 sdivrem/udivrem on Windows. We need to have special handling of i128 div/rem on Windows due to a weird calling convention needed for the libcall. There was also some code that made it look like we do the same for sdivrem/udiv, but the code didn't account for multiple return values of those functions so couldn't possibly work. I think this code never triggers because we don't have libcall names defined for those functions by default so DAGCombine never creates DIVREM nodes.	2020-08-05 23:01:07 -07:00
Matt Arsenault	0ee1eba581	AMDGPU: Remove ATOMIC_PK_FADD The f32 and v2f16 cases should be handled the same way.	2020-08-05 22:00:52 -04:00
Ruiling Song	5ddc8b49ba	[AMDGPU] add buffer_atomic_swap for float The functionality is used when calling imageAtomicExhange() on float type imageBuffer in Graphics shaders. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D85187	2020-08-06 09:45:48 +08:00
Craig Topper	08b2d0a963	[X86] Disable copy elision in LowerMemArgument for scalarized vectors when the loc VT is a different size than the original element. For example a v4f16 argument is scalarized to 4 i32 values. So the values are spread out instead of being packed tightly like in the original vector. Fixes PR47000.	2020-08-05 15:44:54 -07:00
Stanislav Mekhanoshin	0bcda1a261	[AMDGPU] Scavenge temp reg for AGPR spill Differential Revision: https://reviews.llvm.org/D85234	2020-08-05 13:29:19 -07:00
Matt Arsenault	ec8c172d01	AMDGPU: Correct prolog SP initialization logic Having callees that will read SP is not the only reason we need to reference the stack pointer.	2020-08-05 15:47:53 -04:00
Stanislav Mekhanoshin	ea7d0e2996	[AMDGPU] gfx1031 target Differential Revision: https://reviews.llvm.org/D85337	2020-08-05 12:36:26 -07:00
Matt Arsenault	83eaf5d55d	AMDGPU: Eliminate BUFFER_ATOMIC_PK_ADD_F16 node This is redundant with the other no return buffer atomic node, and we don't really need a separate type profile for it.	2020-08-05 15:16:51 -04:00
Matt Arsenault	43c0c9252a	AMDGPU: Refactor buffer atomic intrinsic lowering Move raw/struct buffer atomic lowering to separate functions. This avoids a long nested switch, and simplifies a future patch.	2020-08-05 14:44:55 -04:00
Matt Arsenault	3e52667433	AMDGPU: Fix verifier error with undef source producing s_bitset* This needs to preserve the undef flag.	2020-08-05 14:42:20 -04:00
Simon Pilgrim	b60f998859	[X86][SSE] Fold 128-bit PACK(EXTEND(X),EXTEND(Y)) -> CONCAT(X,Y) subvectors This is seen in the sub-128-bit vector trunc(ext()) of comparison results Fixes pr46585.ll regression in D66004	2020-08-05 18:27:40 +01:00
Simon Pilgrim	6a06c7a0a7	[X86] isHorizontalBinOp - only update LHS/RHS references on success We've had issues in the past where isHorizontalBinOp calls would affect later combines as the LHS/RHS references had been commuted but still failed to match.	2020-08-05 15:09:52 +01:00
Simon Pilgrim	a57bfb44bc	[X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for integer types	2020-08-05 15:09:51 +01:00
Sam Parker	f2675ab45f	[ARM][CostModel] Implement getCFInstrCost As with other targets, set the throughput cost of control-flow instructions to free so that we don't miss out of vectorization opportunities. Differential Revision: https://reviews.llvm.org/D85283	2020-08-05 12:44:51 +01:00
Paul Walker	927fc536ca	[SVE] Add lowering for fixed length vector and, or & xor operations. Since there are no ill effects when performing these operations with undefined elements, they are lowered to the already supported unpredicated scalable vector equivalents. Differential Revision: https://reviews.llvm.org/D85117	2020-08-05 11:28:34 +01:00
Sander de Smalen	f2916636f8	[AArch64][SVE] Disable tail calls if callee does not preserve SVE regs. This fixes an issue triggered by the following code, where emitEpilogue got confused when trying to restore the SVE registers after the call, whereas the call to bar() is implemented as a TCReturn: int non_sve(); int sve(svint32_t x) { return non_sve(); } Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84869	2020-08-05 09:38:54 +01:00
Jay Foad	8cbf4a17ac	[AMDGPU] Propagate fast math flags in frem lowering Differential Revision: https://reviews.llvm.org/D84518	2020-08-05 09:09:38 +01:00
Jay Foad	04cf4a5a65	[AMDGPU] Lower frem f16 Without this it would fail to select on subtargets that have 16-bit instructions. Differential Revision: https://reviews.llvm.org/D84517	2020-08-05 09:08:40 +01:00
Yonghong Song	00602ee7ef	BPF: simplify IR generation for __builtin_btf_type_id() This patch simplified IR generation for __builtin_btf_type_id(). For __builtin_btf_type_id(obj, flag), previously IR builtin looks like if (obj is a lvalue) llvm.bpf.btf.type.id(obj.ptr, 1, flag) !type else llvm.bpf.btf.type.id(obj, 0, flag) !type The purpose of the 2nd argument is to differentiate __builtin_btf_type_id(obj, flag) where obj is a lvalue vs. __builtin_btf_type_id(obj.ptr, flag) Note that obj or obj.ptr is never used by the backend and the `obj` argument is only used to derive the type. This code sequence is subject to potential llvm CSE when - obj is the same .e.g., nullptr - flag is the same - metadata type is different, e.g., typedef of struct "s" and strust "s". In the above, we don't want CSE since their metadata is different. This patch change IR builtin to llvm.bpf.btf.type.id(seq_num, flag) !type and seq_num is always increasing. This will prevent potential llvm CSE. Also report an error if the type name is empty for remote relocation since remote relocation needs non-empty type name to do relocation against vmlinux. Differential Revision: https://reviews.llvm.org/D85174	2020-08-04 16:29:42 -07:00
Arthur Eubanks	f50b3ff02e	[Hexagon] Use InstSimplify instead of ConstantProp This is the last remaining use of ConstantProp, migrate it to InstSimplify in the goal of removing ConstantProp. Add -hexagon-instsimplify option to enable skipping of instsimplify in tests that can't handle the extra optimization. Differential Revision: https://reviews.llvm.org/D85047	2020-08-04 15:42:39 -07:00
Krzysztof Parzyszek	09897b146a	[RDF] Remove uses of RDFRegisters::normalize (deprecate) This function has been reduced to an identity function for some time.	2020-08-04 17:02:12 -05:00
Matt Arsenault	486e84dfa4	AMDGPU/GlobalISel: Use live in helper function for returnaddress	2020-08-04 17:36:01 -04:00
Matt Arsenault	89011fc3c9	AMDGPU/GlobalISel: Select llvm.returnaddress	2020-08-04 17:14:38 -04:00
Matt Arsenault	f8fb7835d6	GlobalISel: Add utilty for getting function argument live ins Get the argument register and ensure there's a copy to the virtual register. AMDGPU and AArch64 have similarish code to get the livein value, and I also want to use this in multiple places. This is a bit more aggressive about setting the register class than the original function, but that's probably OK. I think we're missing a few verifier checks for function live ins. I noticed AArch64's calling convention code is not actually adding liveins to functions, only the entry block (which apparently might not matter that much?). There should probably be a verifier check that entry block live ins are also live into the function. We also might need a verifier check that the copy to the livein virtual register is in the entry block.	2020-08-04 16:55:55 -04:00
Eli Friedman	95efea4b93	[AArch64][SVE] Widen narrow sdiv/udiv operations. The SVE instruction set only supports sdiv/udiv for 32-bit and 64-bit integers. If we see an 8-bit or 16-bit divide, widen the operands to 32 bits, and narrow the result. Differential Revision: https://reviews.llvm.org/D85170	2020-08-04 13:22:15 -07:00
Yonghong Song	6d218b4adb	BPF: support type exist/size and enum exist/value relocations Four new CO-RE relocations are introduced: - TYPE_EXISTENCE: whether a typedef/record/enum type exists - TYPE_SIZE: the size of a typedef/record/enum type - ENUM_VALUE_EXISTENCE: whether an enum value of an enum type exists - ENUM_VALUE: the enum value of an enum type These additional relocations will make CO-RE bpf programs more adaptive for potential kernel internal data structure changes. Differential Revision: https://reviews.llvm.org/D83878	2020-08-04 12:35:39 -07:00
David Blaikie	e31cfc4cd3	Fix -Wconstant-conversion warning with explicit cast Introduced by `fd6584a220` Following similar use of casts in AsmParser.cpp, for instance - ideally this type would use unsigned chars as they're more representative of raw data and don't get confused around implementation defined choices of char's signedness, but this is what it is & the signed/unsigned conversions are (so far as I understand) safe/bit preserving in this usage and what's intended, given the API design here.	2020-08-04 10:41:27 -07:00
Matt Arsenault	0de547ed4a	AMDGPU/GlobalISel: Ensure subreg is valid when selecting G_UNMERGE_VALUES Fixes verifier error with SGPR unmerges with 96-bit result types.	2020-08-04 12:27:34 -04:00
Nemanja Ivanovic	14d726acd6	[PowerPC] Don't remove single swap between the load and store The swap removal pass looks to remove swaps when a loaded value is swapped, some number of lane-insensitive operations are performed and then the value is swapped again and stored. However, in a situation where we load the value, swap it and then store it without swapping again, the pass erroneously removes the single swap. The reason is that both checks in the same equivalence class: - load feeds a swap - swap feeds a store pass. However, there is no check that the two swaps are actually a single swap. This patch just fixes that. Differential revision: https://reviews.llvm.org/D84785	2020-08-04 10:38:15 -05:00
Jay Foad	28e322ea93	[PowerPC] Custom lowering for funnel shifts The custom lowering saves an instruction over the generic expansion, by taking advantage of the fact that PowerPC shift instructions are well defined in the shift-by-bitwidth case. Differential Revision: https://reviews.llvm.org/D83948	2020-08-04 16:30:49 +01:00
Jay Foad	8ec8ad868d	[AMDGPU] Use fma for lowering frem This gives shorter f64 code and perhaps better accuracy. Differential Revision: https://reviews.llvm.org/D84516	2020-08-04 16:18:23 +01:00
Simon Pilgrim	6f0da46d53	[X86] getFauxShuffleMask - drop unnecessary computeKnownBits OR(X,Y) shuffle decoding. Now that rG47cea9e82dda941e lets us aggressively decode multi-use shuffles for the OR(SHUFFLE(),SHUFFLE()) case we don't need the computeKnownBits variant any more.	2020-08-04 15:57:47 +01:00
Simon Pilgrim	051f293b78	[X86] Remove unused canScaleShuffleElements helper The only use was removed at rG36750ba5bd0e9e72 Thanks to @nemanjai for the heads up	2020-08-04 14:51:23 +01:00
Simon Pilgrim	36750ba5bd	[X86][AVX] isHorizontalBinOp - relax lane-crossing limits for AVX1-only targets. Permit lane-crossing post shuffles on AVX1 targets as long as every element comes from the same source lane, which for v8f32/v4f64 cases can be efficiently lowered with the LowerShuffleAsLanePermuteAnd* style methods.	2020-08-04 14:27:01 +01:00
Sander de Smalen	bb3344c7d8	[AArch64][SVE] Add missing unwind info for SVE registers. This patch adds a CFI entry for each SVE callee saved register that needs unwind info at an offset from the CFA. The offset is a DWARF expression because the offset is partly scalable. The CFI entries only cover a subset of the SVE callee-saves and only encodes the lower 64-bits, thus implementing the lowest common denominator ABI. Existing unwinders may support VG but only restore the lower 64-bits. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84044	2020-08-04 11:47:06 +01:00
Sander de Smalen	fd6584a220	[AArch64][SVE] Fix CFA calculation in presence of SVE objects. The CFA is calculated as (SP/FP + offset), but when there are SVE objects on the stack the SP offset is partly scalable and should instead be expressed as the DWARF expression: SP + offset + scalable_offset * VG where VG is the Vector Granule register, containing the number of 64bits 'granules' in a scalable vector. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84043	2020-08-04 11:47:06 +01:00
Paul Walker	4be13b15d6	[SVE] Replace remaining _MERGE_OP1 nodes with _PRED variants. This is the final bit of work to relax the register allocation requirements when code generating normal LLVM IR, which rarely care about the result of inactive lanes. By using _PRED nodes we can make better use of SVE's reversed instructions. Also removes a redundant parameter from the min/max tests. Differential Revision: https://reviews.llvm.org/D85142	2020-08-04 11:19:17 +01:00
Meera Nakrani	20283ff491	[ARM] Generated SSAT and USAT instructions with shift Added patterns so that both SSAT and USAT instructions are generated with shifts. Added corresponding regression tests. Differential Review: https://reviews.llvm.org/D85120	2020-08-04 09:38:17 +00:00
Simon Pilgrim	47cea9e82d	Revert rG66e7dce714fab "Revert "[X86][SSE] Shuffle combine blends to OR(X,Y) if the relevant elements are known zero."" [X86][SSE] Shuffle combine blends to OR(X,Y) if the relevant elements are known zero (REAPPLIED) This allows us to remove the (depth violating) code in getFauxShuffleMask where we were combining the OR(SHUFFLE,SHUFFLE) shuffle inputs as well, and not just the OR(). This is a minor step toward being able to shuffle combine from/to SELECT/BLENDV as a faux shuffle. Reapplied with fixed signed/unsigned comparisons.	2020-08-04 10:32:39 +01:00
Florian Hahn	f7658241cb	[AArch64] Consider instruction-level contract FMFs in combiner patterns. Currently, instruction level fast math flags are not considered when generating patterns for the machine combiner. This currently leads to some missed opportunities to generate FMAs in combination with `#pragma clang fp contract (fast)`. For example, when building the example below with -O3 for AArch64, no FMADD is generated. If built with -O2 and the DAGCombiner is used instead of the MachineCombiner for FMAs, an FMADD is generated. With this patch, the same code is generated in both cases. float madd_contract(float a, float b, float c) { #pragma clang fp contract (fast) return (a * b) + c; } Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D84930	2020-08-04 10:25:16 +01:00
Qiu Chaofan	6a78a8dd37	[NFC] [PowerPC] Refactor fp/int conversion lowering For FP_TO_INT and INT_TO_FP lowering, we have direct-move and non-direct-move methods. But they share some conversion logic, so we can reduce redundant code by introducing new methods. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D81818	2020-08-04 15:48:16 +08:00
Wang, Pengfei	6bc7ea2d8d	[X86][AVX512] Fix build fail after D81548 Test function mask_cmp_128 failed during ISEL LLVM ERROR: Cannot select: t37: v8i1 = X86ISD::KSHIFTL t48, TargetConstant:i8<4> due to v8i1 only available under AVX512DQ. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D84922	2020-08-04 12:31:04 +08:00
Chen Zheng	45c46d180e	[PowerPC] mark r+i as legal address mode for vector type after pwr9 Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D84735	2020-08-04 00:02:37 -04:00
Carl Ritson	57899934ea	[AMDGPU] Make GCNRegBankReassign assign based on subreg banks When scavenging consider the sub-register of the source operand to determine the bank of a candidate register (not just sub0). Without this it is possible to introduce an infinite loop, e.g. $sgpr15_sgpr16_sgpr17 can be assigned for a conflict between $sgpr0 and SGPR_96:sub1. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D84910	2020-08-04 12:54:44 +09:00
Chen Zheng	ba955397ac	[SCEVExpander][PowerPC]clear scev rewriter before deleting instructions. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D85130	2020-08-03 20:36:08 -04:00
Christopher Tetreault	c9e6887f83	[SVE] Remove bad calls to VectorType::getNumElements() from X86 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D85156	2020-08-03 16:34:10 -07:00
hgreving	509f5c4ec2	[MC] Fix memory leak when allocating MCInst with bump allocator Adds the function createMCInst() to MCContext that creates a MCInst using a typed bump alloctor. MCInst contains a SmallVector<MCOperand, 8>. The SmallVector is POD only for <= 8 operands. The default untyped bump pointer allocator of MCContext does not delete the MCInst, so if the SmallVector grows, it's a leak. This fixes https://bugs.llvm.org/show_bug.cgi?id=46900.	2020-08-03 16:08:26 -07:00
Christopher Tetreault	3b92db4c84	[SVE] Remove bad call to VectorType::getNumElements() from AMDGPU Differential Revision: https://reviews.llvm.org/D85151	2020-08-03 15:56:10 -07:00
Christopher Tetreault	b5059b7140	[SVE] Remove bad call to VectorType::getNumElements() from ARM Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D85152	2020-08-03 15:41:14 -07:00
Jordan Rupprecht	af3ec731d5	[NFC][ARM] Silence unused variable in release builds	2020-08-03 15:21:44 -07:00
Christopher Tetreault	b43791e701	[SVE] Remove bad calls to VectorType::getNumElements() from PowerPC Differential Revision: https://reviews.llvm.org/D85154	2020-08-03 15:15:20 -07:00
Mitch Phillips	9a05fa10bd	[HWASan] [GlobalISel] Add +tagged-globals backend feature for GlobalISel GlobalISel is the default ISel for aarch64 at -O0. Prior to D78465, GlobalISel didn't have support for dealing with address-of-global lowerings, so it fell back to SelectionDAGISel. HWASan Globals require special handling, as they contain the pointer tag in the top 16-bits, and are thus outside the code model. We need to generate a `movk` in the instruction sequence with a G3 relocation to ensure the bits are relocated properly. This is implemented in SelectionDAGISel, this patch does the same for GlobalISel. GlobalISel and SelectionDAGISel differ in their lowering sequence, so there are differences in the final instruction sequence, explained in `tagged-globals.ll`. Both of these implementations are correct, but GlobalISel is slightly larger code size / slightly slower (by a couple of arithmetic instructions). I don't see this as a problem for now as GlobalISel is only on by default at `-O0`. Reviewed By: aemerson, arsenm Differential Revision: https://reviews.llvm.org/D82615	2020-08-03 14:28:44 -07:00
David Green	22916481c1	[ARM] Convert VPSEL to VMOV in tail predicated loops VPSEL has slightly different semantics under tail predication (it can end up selecting from Qn, Qm and Qd). We do not model that at the moment so they block tail predicated loops from being formed. This just converts them into a predicated VMOV instead (via a VORR), allowing tail predication to happen whilst still modelling the original behaviour of the input. Differential Revision: https://reviews.llvm.org/D85110	2020-08-03 22:03:14 +01:00
Thomas Lively	cb32792210	[WebAssembly] Implement prototype v128.load{32,64}_zero instructions Specified in https://github.com/WebAssembly/simd/pull/237, these instructions load the first vector lane from memory and zero the other lanes. Since these instructions are not officially part of the SIMD proposal, they are only available on an opt-in basis via LLVM intrinsics and clang builtin functions. If these instructions are merged to the proposal, this implementation will change so that the instructions will be generated from normal IR. At that point the intrinsics and builtin functions would be removed. This PR also changes the opcodes for the experimental f32x4.qfm{a,s} instructions because their opcodes conflicted with those of the v128.load{32,64}_zero instructions. The new opcodes were chosen to match those used in V8. Differential Revision: https://reviews.llvm.org/D84820	2020-08-03 13:54:00 -07:00
Mitch Phillips	66e7dce714	Revert "[X86][SSE] Shuffle combine blends to OR(X,Y) if the relevant elements are known zero." This reverts commit `219f32f4b6`. Commit contains unsigned compasions that break bots that build with -Wsign-compare.	2020-08-03 13:48:30 -07:00
Eli Friedman	dca23ed895	[AArch64] Add missing isel patterns for fcvtzs/u intrinsic on v1f64. Fixes test-suite compile failure caused by `8dfb5d7`. While I'm in the area, add some more test coverage to related operations, to make sure we aren't missing any other patterns.	2020-08-03 13:04:59 -07:00
Jian Cai	c6334db577	[X86] support .nops directive Add support of .nops on X86. This addresses llvm.org/PR45788. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D82826	2020-08-03 11:50:56 -07:00
Joao Moreira	f208c659fb	[X86] Make ENDBR instruction a scheduling boundary Instructions should not be scheduled across ENDBR instructions, as this would result in the ENDBR being displaced, breaking the parity needed for the Indirect Branch Tracking feature of CET. Currently, the X86IndirectBranchTracking pass is later than the instruction scheduling in the pipeline, what causes the bug to be unnoticeable and very hard (if not unfeasible) to be triggered while compiling C files with the standard LLVM setup. Yet, for correctness and to prevent issues in future changes, the compiler should prevent the such scheduling. Differential Revision: https://reviews.llvm.org/D84862	2020-08-03 10:47:23 -07:00
Simon Pilgrim	219f32f4b6	[X86][SSE] Shuffle combine blends to OR(X,Y) if the relevant elements are known zero. This allows us to remove the (depth violating) code in getFauxShuffleMask where we were combining the OR(SHUFFLE,SHUFFLE) shuffle inputs as well, and not just the OR(). This is a minor step toward being able to shuffle combine from/to SELECT/BLENDV as a faux shuffle.	2020-08-03 18:32:47 +01:00
Craig Topper	ac82b918c7	[X86] Use h-register for final XOR of __builtin_parity on 64-bit targets. This adds an isel pattern and special XOR8rr_NOREX instruction to enable the use of h-registers for __builtin_parity. This avoids a copy and a shift instruction. The NOREX instruction is in case register allocation doesn't use the matching l-register for some reason. If a R8-R15 register gets picked instead, we won't be able to encode the instruction since an h-register can't be used with a REX prefix. Fixes PR46954	2020-08-03 10:10:17 -07:00
Cameron McInally	31c7a2fd5c	[FPEnv] Don't transform FSUB(-0,X)->FNEG(X) in SelectionDAGBuilder. This patch stops unconditionally transforming FSUB(-0,X) into an FNEG(X) while building the DAG. There is also one small change to handle the new FSUB(-0,X) similarly to FNEG(X) in the AMDGPU backend. Differential Revision: https://reviews.llvm.org/D84056	2020-08-03 10:22:25 -05:00
Matt Arsenault	2414bab5d7	AMDGPU/GlobalISel: Remove old hacks for boolean selection There were various hacks used to try to avoid making s1 SGPR vs. s1 VCC ambiguous after constraining the register before we had a strategy to deal with this. This also attempted to handle undef operands, which are now illegal gMIR.	2020-08-03 09:04:14 -04:00
Matt Arsenault	fd63e46941	AMDGPU/GlobalISel: Apply load bitcast to s.buffer.load intrinsic Should also apply this to the non-scalar buffer loads.	2020-08-03 08:54:29 -04:00
Simon Pilgrim	99a971cadf	[X86][SSE] Start shuffle combining from ANY_EXTEND_VECTOR_INREG on SSE targets We already do this on AVX (+ for ZERO_EXTEND_VECTOR_INREG), but this enables it for all SSE targets - we attempted something similar back at rL357057 but hit issues with the ZERO_EXTEND_VECTOR_INREG handling (PR41249). I'm still looking at the vector-mul.ll regression - which is due to 32-bit targets performing the load as a f64, resulting in the shuffle combiner thinking it has to create a shuffle in the float domain.	2020-08-03 13:41:48 +01:00
Matt Arsenault	d8ef1d1251	AMDGPU/GlobalISel: Fix selecting broken copies for s32->s64 anyext These should probably not be legal in the first place, but that might also be a pain.	2020-08-03 08:36:41 -04:00
Nicholas Guy	18279a54b5	[ARM] Fix IT block generation after Thumb2SizeReduce with -Oz Fixes a regression caused by D82439, in which IT blocks were no longer being generated when -Oz is present. This was due to the CPSR register being marked as dead, while this case was not accounted for. Differential Revision: https://reviews.llvm.org/D83667	2020-08-03 13:20:32 +01:00
Fangrui Song	40da58a04b	[MC] Default MCAsmBackend::mayNeedRelaxation() to false	2020-08-02 22:13:59 -07:00
QingShan Zhang	62e4644616	[NFC][PowerPC] Add a multiclass for fsetcc to define them in a uniform way This is a refactor patch to prepare for adding the support for strict-fsetcc in PowerPC backend. We want to move their definition into a uniform way so that, we could add the strict node easier. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D81712	2020-08-03 03:28:03 +00:00
StephenFan	a96921afa7	[RISCV] eliminate the repetition declare of SDLoc DL Differential revision: https://reviews.llvm.org/D85002	2020-08-03 10:24:30 +08:00
Craig Topper	64516ec7c1	[X86] Use parity flag from byte test/cmp instruction for __builtin_parity when input fits in 8 bits. If the upper bits of the __builtin_parity idiom are known to be 0 we were previously emitting an xor with 0 to get the parity flag. But we can use cmp/test instead which may expose opportunities for load folding or combining an AND.	2020-08-02 10:45:04 -07:00
Matt Arsenault	212570abcf	GlobalISel: Implement bitcast action for G_EXTRACT_VECTOR_ELEMENT For AMDGPU, vectors with elements < 32 bits should be indexed in 32-bit elements and the desired bits extracted from there. For elements > 64-bits, these should be reduce to 64/32 elements to enable the normal dynamic indexing paths. In the dynamic index cases, this produces shorter code most of the time. This does immediately regress the constant index cases, but this should be fixed once we have the most basic of shift combines. The element size > 64 case is pretty much ported from the exisiting DAG implementation for extract element promote. The increasing element size case is new.	2020-08-02 10:42:07 -04:00
Simon Pilgrim	00d0f354f2	X86InstrInfo.cpp - fix include ordering. NFCI.	2020-08-02 15:34:18 +01:00
Simon Pilgrim	7dd4f03595	Use merge null and isa<> tests into isa_and_nonnull<>. NFCI.	2020-08-02 15:34:18 +01:00
Simon Pilgrim	d14a22da5e	[DAG] TargetLowering::LowerAsmOutputForConstraint - pass SDLoc as const& Try to be more consistent with the SDLoc param in the TargetLowering methods.	2020-08-02 15:12:02 +01:00
Simon Pilgrim	20fbbbc583	[X86] Use const APInt& in for-range loop to avoid unnecessary copies. NFCI. Fixes clang-tidy warning.	2020-08-02 14:32:23 +01:00
Simon Pilgrim	d7e2616741	[X86] Pass SDLoc by const reference. NFCI.	2020-08-02 14:32:22 +01:00
Simon Pilgrim	3f276840b6	[X86] Use const APInt& in for-range loop to avoid unnecessary copies. NFCI. Fixes clang-tidy warning.	2020-08-02 14:32:22 +01:00
Simon Pilgrim	2700311cce	[X86] combineX86ShuffleChain - pull out repeated RootVT.getSizeInBits() calls. NFCI.	2020-08-02 14:32:22 +01:00
Craig Topper	56166a3a52	[X86] Improve parity idiom recognition to handle (and (truncate (ctpop X)), 1). Fixes part of PR46954	2020-08-01 22:59:43 -07:00
Kazu Hirata	60434989e5	Use llvm::is_contained where appropriate (NFC) Use llvm::is_contained where appropriate (NFC) Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D85083	2020-08-01 21:51:06 -07:00
Craig Topper	e297d928dc	[X86] Add assembler support for {disp8} and {disp32} to control the size of displacement used for memory operands. These prefixes should override the default behavior and force a larger immediate size. I don't believe gas issues any warning if you use {disp8} when a 32-bit displacement is already required. And this patch doesn't either. This completes the {disp8} and {disp32} support from PR46650. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D84793	2020-08-01 13:26:35 -07:00
Simon Pilgrim	82a5c848e7	[X86][AVX512] Fold concat(and(x,y),and(z,w)) -> and(concat(x,z),concat(y,w)) for 512-bit vectors Helps vpternlog folding on non-AVX512BW targets	2020-08-01 20:34:39 +01:00
Simon Pilgrim	bb13c34c3a	[X86][AVX] Ensure we only combine to PSHUFLW/PSHUFHW on supporting targets Noticed while investigating combining from concatenated shuffle vectors, we weren't checking that PSHUFLW/PSHUFHW was legal - we were depending on lowering splitting to subvectors.	2020-08-01 19:18:11 +01:00
David Green	fd69df62ed	[ARM] Distribute post-inc for Thumb2 sign/zero extending loads/stores This adds sign/zero extending scalar loads/stores to the MVE instructions added in D77813, allowing us to create up more post-inc instructions. These are comparatively simple, compared to LDR/STR (which may be better turned into an LDRD/LDM), but still require some additions over MVE instructions. Because there are i12 and i8 variants of the offset loads/stores dealing with different signs, we may need to convert an i12 address to a i8 negative instruction. t2LDRBi12 can also be shrunk to a tLDRi under the right conditions, so we need to be careful with codesize too. Differential Revision: https://reviews.llvm.org/D78625	2020-08-01 14:01:18 +01:00
Simon Pilgrim	1b1901536a	[X86][AVX] Extend v2f64 BROADCAST(LOAD) -> BROADCAST_LOAD to v2i64/v4f32/v4i32 Minor precursor fix for D66004, but helps the SSE41 tests as well as they run with -disable-peephole	2020-08-01 12:28:29 +01:00
Craig Topper	75f134eec1	[X86] Refactor the broadcast and load folding in tryVPTESTM to reduce some code. Now we try to load and broadcast together for operand 1. Followed by load and broadcast for operand 1. Previously we tried load operand 1, load operand 1, broadcast operand 0, broadcast operand 1. Now we have a single helper that tries load and broadcast for one operand that we can just call twice.	2020-07-31 23:57:13 -07:00
Craig Topper	1bd7046e4c	[X86] Use TargetLowering::getRegClassFor to simplify some code in tryVPTESTM. NFCI	2020-07-31 21:39:10 -07:00
Justin Hibbits	7e9153e940	PowerPC: Don't lower SELECT_CC to PPCISD::FSEL on SPE SPE doesn't have a fsel instruction, so don't try to lower to it. This fixes a "Cannot select: tN: f64 = PPCISD::FSEL tX, tY, tZ" error. Reviewed By: #powerpc, lkail Differential Revision: https://reviews.llvm.org/D77773	2020-07-31 22:52:47 -05:00
Justin Hibbits	914dbf4808	PowerPC: Fix SPE extloadf32 handling. The patterns were incorrect copies from the FPU code, and are unnecessary, since there's no extended load for SPE. Just let LLVM itself do the work by marking it expand. Reviewed By: #powerpc, lkail Differential Revision: https://reviews.llvm.org/D78670	2020-07-31 22:42:57 -05:00
Kazushi (Jam) Marukawa	605fd4d77c	[VE] Change calling convention to follow ABI Change to expand all arguments and return values to i64 to follow ABI. Update regression tests also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D84581	2020-08-01 10:08:54 +09:00
Huihui Zhang	01bfe2e494	[AArch64][SVE] Allow vector of pointers as legal type for masked load/store. Refer to LangRef http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics 'llvm.masked.load/store.*’ intrinsics are overloaded intrinsic, which allow the load/store data to be a vector of any integer, floating-point or pointer data type. Therefore, allow pointer data type when checking 'isLegalMaskedLoadStore()'. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D85045	2020-07-31 17:30:23 -07:00
Craig Topper	93c678a79b	[X86] Simplify vpternlog immediate selection. Rather than hardcoding immediate values for 12 different combinations in a nested pair of switches, we can perform the matched logic operation on 3 magic constants to calculate the immediate. Special thanks to this tweet https://twitter.com/rygorous/status/1187034321992871936 for making me realize I could do this.	2020-07-31 17:16:27 -07:00
Hsiangkai Wang	47a4a27f47	Upgrade MC to v0.9. Differential revision: https://reviews.llvm.org/D80802	2020-08-01 07:42:06 +08:00
Sidharth Baveja	b7cfa6ca92	[Loop Peeling] Separate the Loop Peeling Utilities from the Loop Unrolling Utilities Summary: This patch separates the Loop Peeling Utilities from Loop Unrolling. The reason for this change is that Loop Peeling is no longer only being used by loop unrolling; Patch D82927 introduces loop peeling with fusion, such that loops can be modified to have to same trip count, making them legal to be peeled. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D83056	2020-07-31 18:31:58 +00:00
Albion Fung	93fd8dbdc2	[PowerPC] Add Vector String Isolate instruction definitions and MC Tests This patch implements the instruction definition and MC tests for the vector string isolate instructions. Differential Revision: https://reviews.llvm.org/D84197	2020-07-31 12:32:29 -05:00
Benjamin Kramer	c6f08b14d4	Hide some internal symbols. NFC.	2020-07-31 17:28:02 +02:00
Matt Arsenault	57bd64ff84	Support addrspacecast initializers with isNoopAddrSpaceCast Moves isNoopAddrSpaceCast to the TargetMachine. It logically belongs with the DataLayout.	2020-07-31 10:42:43 -04:00
Vitaly Buka	b0eb40ca39	[NFC] Remove unused GetUnderlyingObject paramenter Depends on D84617. Differential Revision: https://reviews.llvm.org/D84621	2020-07-31 02:10:03 -07:00
QingShan Zhang	9b04fec002	[PowerPC] Retrieve the offset from load/store if it stores to stack slots Scheduler will try to retrieve the offset and base addr to determine if two loads/stores are disjoint memory access. PowerPC failed to handle this for frame index which will bring extra memory dependency for loads/stores. Reviewed By: jji Differential Revision: https://reviews.llvm.org/D84308	2020-07-31 07:08:20 +00:00
Craig Topper	30a0dbb70d	[X86] Remove x86_sse42_crc32_64_64 from X86TTIImpl::simplifyDemandedUseBitsIntrinsic It doesn't do any simplifying. It just computes known bits. We can just let InstCombine call computeKnownBits which will handle this just as well.	2020-07-30 21:51:23 -07:00
Vitaly Buka	89051ebace	[NFC] GetUnderlyingObject -> getUnderlyingObject I am going to touch them in the next patch anyway	2020-07-30 21:08:24 -07:00
Craig Topper	916d9e1877	[X86] Pass the OperandVector by reference to ParseIntelOperand and ParseRoundingMode. NFCI Similar to what was recently done to ParseATTOperand. Make ParseIntelOperand directly responsible for adding to the operand vector instead of returning the operand. Return a bool for error. Remove ErrorOperand since it is no longer used.	2020-07-30 19:52:38 -07:00
Vitaly Buka	b256cb88a7	[ValueTracking] Remove AllocaForValue parameter findAllocaForValue uses AllocaForValue to cache resolved values. The function is used only to resolve arguments of lifetime intrinsic which usually are not fare for allocas. So result reuse is likely unnoticeable. In followup patches I'd like to replace the function with GetUnderlyingObjects. Depends on D84616. Differential Revision: https://reviews.llvm.org/D84617	2020-07-30 18:48:34 -07:00
Scott Constable	ec1445c5af	[X86] Fix for ballooning compile times due to Load Value Injection (LVI) mitigations Fix for the issue raised in https://github.com/rust-lang/rust/issues/74632. The current heuristic for inserting LFENCEs uses a quadratic-time algorithm. This can apparently cause substantial compilation slowdowns for building Rust projects, where functions > 5000 LoC are apparently common. The updated heuristic in this patch implements a linear-time algorithm. On a set of benchmarks, the slowdown factor for the generated code was comparable (2.55x geo mean for the quadratic-time heuristic, vs. 2.58x for the linear-time heuristic). Both heuristics offer the same security properties, namely, mitigating LVI. This patch also includes some formatting fixes. Differential Revision: https://reviews.llvm.org/D84471	2020-07-30 17:22:33 -07:00
Craig Topper	3ad09fd03c	[X86] Separate CPU Feature lists in X86.td between architecture features and tuning features After the recent change to the tuning settings for pentium4 to improve our default 32-bit behavior, I've decided to see about implementing -mtune support. This way we could have a default architecture CPU of "pentium4" or "x86-64" and a default tuning cpu of "generic". And we could change our "pentium4" tuning settings back to what they were before. As a step to supporting this, this patch separates all of the features lists for the CPUs into 2 lists. I'm using the Proc class and a new ProcModel class to concat the 2 lists before passing to the target independent ProcessorModel. Future work to truly support mtune would change ProcessorModel to take 2 lists separately. I've diffed the X86GenSubtargetInfo.inc file before and after this patch to ensure that the final feature list for the CPUs isn't changed. Differential Revision: https://reviews.llvm.org/D84879	2020-07-30 17:19:19 -07:00
Amara Emerson	09f9f7dd1b	[AArch64][GlobalISel] Add legalization & selection support for G_INTRINSIC_LRINT. Differential Revision: https://reviews.llvm.org/D84552	2020-07-30 16:14:56 -07:00
Matt Arsenault	e56e9022bc	AMDGPU: Fix liveness errors when copying AGPR tuples Avoid recursively calling copyPhysReg for AGPR handling. This was dropping the necessary super register implicit defs to avoid liveness verifier errors.	2020-07-30 18:13:04 -04:00
Changpeng Fang	243376cdc7	AMDGPU: Put inexpensive ops first in AMDGPUAnnotateUniformValues::visitLoadInst Summary: This is in response to the review of https://reviews.llvm.org/D84873: The expensive check should be reordered last Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D84890	2020-07-30 14:37:06 -07:00
Wouter van Oortmerssen	ce1eb7af9d	[WebAssembly] Fixed 64-bit indices in br_table LLVM selection dag assumes "switch" indices are pointer sized, which causes problems for our 32-bit br_table. The new function ensures 32-bit operands don't get unnecessarily extended, and 64-bit operands get truncated. Note that the changes to the existing test test exactly that: the addition of -NEXT in 2 places ensures no extension is inserted (which the test previously ignored) and that the wrap is present (previously omitted in wasm64 mode). Differential Revision: https://reviews.llvm.org/D84705	2020-07-30 10:52:16 -07:00
Stanislav Mekhanoshin	5b32518f96	[AMDGPU] Do not use undef on indirect source We are using undef on the indirect move source subreg and then using implicit super-reg. This creates a problem in RA when Greedy decides to split the register. It reassigns the implicit super-reg but does not bother to change undef source because it is really does not matter. The fix is to stop lying to RA and drop undef flag. This has also hit a problem in SIFoldOperands as it can fold immediate into an indirect move since there is no undef flag anymore. That results in multiple test failures, so added the check for this case. Differential Revision: https://reviews.llvm.org/D84899	2020-07-30 10:41:59 -07:00
Craig Topper	3632f765dc	[WebAssembly] Fix GCC 5 build. Hans' speculative fix in `b7292f2db0` didn't work for me. This seems to.	2020-07-30 10:00:28 -07:00
hsmahesha	33fd4a18e7	[AMDGPU/MemOpsCluster] Clean-up fixme's around mem ops clustering logic Get rid of all fixmes and base heuristic on `num-clustered-dwords`. The main intuition behind this is as follows. The existing heuristic roughly summarizes as below: * Assume, all the mem ops instructions participating in the clustering process, loads/stores same num bytes * If num bytes loaded by each mem op is 4 bytes, then cluster at max 5 mem ops, that is at max 20 bytes * If num bytes loaded by each mem op is 8 bytes, then cluster at max 3 mem ops, that is at max 24 bytes * If num bytes loaded by each mem op is 16 bytes, then cluster at max 2 mem ops, that is at max 32 bytes So, we need to make sure that the new heuristic do not completey deviate away from the above one, and it properly handles both the sub-word loads and the wide loads. Reviewed By: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D84354	2020-07-30 21:41:13 +05:30
Fangrui Song	d2c2248722	[X86] Parse and ignore .arch directives We parse .arch so that some `.arch i386; .code32` code can assemble. It seems that X86AsmParser does not do a good job tracking what features are needed to assemble instructions. GNU as's x86 port supports a very wide range of .arch operands. Ignore the operand for now. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D84900	2020-07-30 08:30:06 -07:00
Momchil Velikov	ef4e665435	[AArch64] Fix operand definitions of XPACI/XPACD The operand to these instructions is both input and output. These are not yet emitted by the compiler and the assembler already works fine, so can't test in this patch. But D75044 will use XPACI and provide test coverage for this patch as well. Differential Revision: https://reviews.llvm.org/D84298	2020-07-30 15:31:44 +01:00
Hans Wennborg	b7292f2db0	Speculative GCC 5 build fix It's complaining about specializing the template in a different namespace.	2020-07-30 16:12:52 +02:00
jasonliu	04dc9691eb	[XCOFF][AIX] Enable -ffunction-sections Summary: This patch implements -ffunction-sections on AIX. This patch focuses on assembly generation. Follow-on patch needs to handle: 1. -ffunction-sections implication for jump table. 2. Object file generation path and associated testing. Differential Revision: https://reviews.llvm.org/D83875	2020-07-30 13:30:01 +00:00
Simon Pilgrim	cc529285fd	VectorUtils.h - reduce unnecessary includes. NFC. Replace TargetLibraryInfo.h include with forward declaration and fix implicit dependencies. Reduce SmallSet.h include to SmallVector.h include.	2020-07-30 12:27:49 +01:00
Simon Pilgrim	2dec72ba5c	[X86][SSE] combineExtractWithShuffle - extend extract(truncate(x),0) for any source vector size As long as we can extract the lowest 128-bit subvector from the pre-truncated source vector, then we don't care what size it is. The next stage will be to support non-zero extraction indices, as long as its still coming from the lowest 128-bit subvector.	2020-07-30 12:27:49 +01:00
David Sherwood	23ad660b5d	[SVE][CodeGen] At -O0 fallback to DAG ISel when translating alloca with scalable types When building code at -O0 We weren't falling back to DAG ISel correctly when encountering alloca instructions with scalable vector types. This is because the alloca has no operands that are scalable. I've fixed this by adding a check in AArch64ISelLowering::fallBackToDAGISel for alloca instructions with scalable types. Differential Revision: https://reviews.llvm.org/D84746	2020-07-30 08:40:53 +01:00
Craig Topper	07bb8240a0	[X86] Pass the OperandVector to ParseMemOperand instead of returning the operand. NFCI Continue the change made to ParseATTOperand to take the vector by reference. Let ParseMemOperand add its memory operand to the vector and just return true/false to indicate error.	2020-07-29 23:44:56 -07:00
Craig Topper	17597442db	[X86] Don't pass some many parameters to ParseMemOperand by reference. Pointers and SMLocs are cheap to copy. Even though the function modifies some of these the caller doesn't use them after the call.	2020-07-29 23:44:56 -07:00
Craig Topper	9611ee5f40	[X86] Teach the assembler parser to handle a '' between segment register and base/index/displacement part of an address A '' after the segment is equivalent to a '' before the segment register. To make the AsmMatcher table work we need to place the '' token into the operand vector before the full memory operand. To accomplish this I've modified some portions of operand parsing to expose the operand vector to ParseATTOperand so that the token can be pushed to the vector after parsing the segment register and before creating the memory operand using that segment register. Fixes PR46879 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D84895	2020-07-29 21:15:04 -07:00
Kang Zhang	a18953c1c0	[PowerPC] Fix RM operands for some instructions Summary: Some instructions have set the wrong [RM] flag, this patch is to fix it. Instructions x(v\|s)r(d\|s)pi[zmp]? and fri[npzm] use fixed rounding directions without referencing current rounding mode. Also, the SETRNDi, SETRND, BCLRn, MTFSFI, MTFSB0, MTFSB1, MTFSFb, MTFSFI, MTFSFI_rec, MTFSF, MTFSF_rec should also fix the RM flag. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D81360	2020-07-30 02:10:49 +00:00
Matt Arsenault	0da582d9b6	GlobalISel: Handle llvm.roundeven I still think it's highly questionable that we have two intrinsics with identical behavior and only vary by the name of the libcall used if it happens to be lowered that way, but try to reduce the feature delta between SDAG and GlobalISel for recently added intrinsics. I'm not sure which opcode should be considered the canonical one, but lower roundeven back to round.	2020-07-29 20:01:12 -04:00
Craig Topper	b1c1825b99	[X86] Remove unused argument from HandleAVX512Operand in the assembly parser.	2020-07-29 14:23:01 -07:00
Simon Pilgrim	a1c9529e60	[X86][AVX] isHorizontalBinOp - relax no-lane-crossing limit for AVX1-only targets. Instead of never accepting v8f32/v4f64 FHADD/FHSUB if the input shuffle masks cross lanes, perform the matching and determine if the post shuffle mask simplifies to a 'whole lane shuffle' mask - in which case we are guaranteed to cheaply perform this as a VPERM2F128 shuffle.	2020-07-29 20:49:10 +01:00
Stanislav Mekhanoshin	decfdb8ce3	[AMDGPU] Fixed formatting in GCNHazardRecognizer.cpp. NFC.	2020-07-29 12:21:28 -07:00
Stanislav Mekhanoshin	13b63be472	[AMDGPU] prefer non-mfma in post-RA schedule MFMA instructions shall not be scheduled back to back to avoid MAI SIMD stall. Tell post-RA schedule we would prefer some other instruction instead. Differential Revision: https://reviews.llvm.org/D84883	2020-07-29 12:17:50 -07:00
Baptiste Saleil	7aaa85627b	[PowerPC] Add options to control paired vector memops support Adds frontend and backend options to enable and disable the PowerPC paired vector memory operations added in ISA 3.1. Instructions using these options will be added in subsequent patches. Differential Revision: https://reviews.llvm.org/D83722	2020-07-29 14:00:53 -05:00
Amara Emerson	d8ba622209	[AArch64][GlobalISel] Selection support for vector DUP[X]lane instructions. In future, we'd like to use the perfect-shuffle mechanism to deal with these shuffle permutations. For now, this improves performance by avoiding the super-expensive const-pool load + tbl instruction. Differential Revision: https://reviews.llvm.org/D84866	2020-07-29 11:41:37 -07:00
Matt Arsenault	59fac51ff2	AMDGPU/GlobalISel: Handle llvm.amdgcn.reloc.constant	2020-07-29 14:24:21 -04:00
Matt Arsenault	0b7de7966f	GlobalISel: Implement lower for G_EXTRACT_VECTOR_ELT Use the basic store to stack and reload.	2020-07-29 14:16:28 -04:00
Jessica Paquette	7ff9575594	[AArch64][GlobalISel] Select XRO addressing mode with wide immediates Port the wide immediate case from AArch64DAGToDAGISel::SelectAddrModeXRO. If we have a wide immediate which can't be represented in an add, we can end up with code like this: ``` mov x0, imm add x1, base, x0 ldr x2, [x1, 0] ``` If we use the [base, xN] addressing mode instead, we can produce this: ``` mov x0, imm ldr x2, [base, x0] ``` This saves 0.4% code size on 7zip at -O3, and gives a geomean code size improvement of 0.1% on CTMark. Differential Revision: https://reviews.llvm.org/D84784	2020-07-29 11:02:10 -07:00
Matt Arsenault	766cb615a3	AMDGPU: Relax restriction on folding immediates into physregs I never completed the work on the patches referenced by `f8bf7d7f42`, but this was intended to avoid folding immediate writes into m0 which the coalescer doesn't understand very well. Relax this to allow simple SGPR immediates to fold directly into VGPR copies. This pattern shows up routinely in current GlobalISel code since nothing is smart enough to emit VGPR constants yet.	2020-07-29 14:01:53 -04:00
Heejin Ahn	276f9e8cfa	[WebAssembly] Fix getBottom for loops When it was first created, CFGSort only made sure BBs in each `MachineLoop` are sorted together. After we added exception support, CFGSort now also sorts BBs in each `WebAssemblyException`, which represents a `catch` block, together, and `Region` class was introduced to be a thin wrapper for both `MachineLoop` and `WebAssemblyException`. But how we compute those loops and exceptions is different. `MachineLoopInfo` is constructed using the standard loop computation algorithm in LLVM; the definition of loop is "a set of BBs that are dominated by a loop header and have a path back to the loop header". So even if some BBs are semantically contained by a loop in the original program, or in other words dominated by a loop header, if they don't have a path back to the loop header, they are not considered a part of the loop. For example, if a BB is dominated by a loop header but contains `call abort()` or `rethrow`, it wouldn't have a path back to the header, so it is not included in the loop. But `WebAssemblyException` is wasm-specific data structure, and its algorithm is simple: a `WebAssemblyException` consists of an EH pad and all BBs dominated by the EH pad. So this scenario is possible: (This is also the situation in the newly added test in cfg-stackify-eh.ll) ``` Loop L: header, A, ehpad, latch Exception E: ehpad, latch, B ``` (B contains `abort()`, so it does not have a path back to the loop header, so it is not included in L.) And it is sorted in this order: ``` header A ehpad latch B ``` And when CFGStackify places `end_loop` or `end_try` markers, it previously used `WebAssembly::getBottom()`, which returns the latest BB in the sorted order, and placed the marker there. So in this case the marker placements will be like this: ``` loop header try A catch ehpad latch end_loop <-- misplaced! B end_try ``` in which nesting between the loop and the exception is not correct. `end_loop` marker has to be placed after `B`, and also after `end_try`. Maybe the fundamental way to solve this problem is to come up with our own algorithm for computing loop region too, in which we include all BBs dominated by a loop header in a loop. But this takes a lot more effort. The only thing we need to fix is actually, `getBottom()`. If we make it return the right BB, which means in case of a loop, the latest BB of the loop itself and all exceptions contained in there, we are good. This renames `Region` and `RegionInfo` to `SortRegion` and `SortRegionInfo` and extracts them into their own file. And add `getBottom` to `SortRegionInfo` class, from which it can access `WebAssemblyExceptionInfo`, so that it can compute a correct bottom block for loops. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D84724	2020-07-29 10:36:32 -07:00
Craig Topper	c4823b24a4	[X86] Add custom lowering for llvm.roundeven with sse4.1. We can use the roundss/sd/ps/pd instructions like we do for ceil/floor/trunc/rint/nearbyint. Differential Revision: https://reviews.llvm.org/D84592	2020-07-29 10:23:08 -07:00
Roman Lebedev	1d51dc38d8	[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline I've been looking at missed vectorizations in one codebase. One particular thing that stands out is that some of the loops reach vectorizer in a rather mangled form, with weird PHI's, and some of the loops aren't even in a rotated form. After taking a more detailed look, that happened because the loop's headers were too big by then. It is evident that SimplifyCFG's common code hoisting transform is at fault there, because the pattern it handles is precisely the unrotated loop basic block structure. Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled by default, and is always run, unlike it's friend, common code sinking transform, `SinkCommonCodeFromPredecessors()`, which is not enabled by default and is only run once very late in the pipeline. I'm proposing to harmonize this, and disable common code hoisting until //late// in pipeline. Definition of //late// may vary, here currently i've picked the same one as for code sinking, but i suppose we could enable it as soon as right after loop rotation happens. Experimentation shows that this does indeed unsurprizingly help, more loops got rotated, although other issues remain elsewhere. Now, this undoubtedly seriously shakes phase ordering. This will undoubtedly be a mixed bag in terms of both compile- and run- time performance, codesize. Since we no longer aggressively hoist+deduplicate common code, we don't pay the price of said hoisting (which wasn't big). That may allow more loops to be rotated, so we pay that price. That, in turn, that may enable all the transforms that require canonical (rotated) loop form, including but not limited to vectorization, so we pay that too. And in general, no deduplication means more [duplicate] instructions going through the optimizations. But there's still late hoisting, some of them will be caught late. As per benchmarks i've run {F12360204}, this is mostly within the noise, there are some small improvements, some small regressions. One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure this will expose many more pre-existing missed optimizations, as usual :S llvm-compile-time-tracker.com thoughts on this: http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions * this does regress compile-time by +0.5% geomean (unsurprizingly) * size impact varies; for ThinLTO it's actually an improvement The largest fallout appears to be in GVN's load partial redundancy elimination, it spends much more time in `MemoryDependenceResults::getNonLocalPointerDependency()`. Non-local `MemoryDependenceResults` is widely-known to be, uh, costly. There does not appear to be a proper solution to this issue, other than silencing the compile-time performance regression by tuning cut-off thresholds in `MemoryDependenceResults`, at the cost of potentially regressing run-time performance. D84609 attempts to move in that direction, but the path is unclear and is going to take some time. If we look at stats before/after diffs, some excerpts: * RawSpeed (the target) {F12360200} * -14 (-73.68%) loops not rotated due to the header size (yay) * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer * -3937 (-64.19%) common instructions hoisted * +561 (+0.06%) x86 asm instructions * -2 basic blocks * +2418 (+0.11%) IR instructions * vanilla test-suite + RawSpeed + darktable {F12360201} * -36396 (-65.29%) common instructions hoisted * +1676 (+0.02%) x86 asm instructions * +662 (+0.06%) basic blocks * +4395 (+0.04%) IR instructions It is likely to be sub-optimal for when optimizing for code size, so one might want to change tune pipeline by enabling sinking/hoisting when optimizing for size. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D84108	2020-07-29 20:05:30 +03:00
Kang Zhang	802c043078	[PowerPC] Set v1i128 to expand for SETCC to avoid crash Summary: PPC only supports the instruction selection for v16i8, v8i16, v4i32, v2i64, v4f32 and v2f64 for ISD::SETCC, don't support the v1i128, so v1i128 for ISD::SETCC will crash. This patch is to set v1i128 to expand to avoid crash. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D84238	2020-07-29 16:39:27 +00:00
Matt Arsenault	d42c7b2211	AMDGPU: Account for the size of LDS globals used through constant expressions. Also "fix" the longstanding bug where the computed size depends on the order of the visitation. We could try to predict the allocation order used by legalization, but it would never be 100% perfect. Until we start fixing the addresses somehow (or have a more reliable allocation scheme later), just try to compute the size based on the worst case padding.	2020-07-29 11:40:42 -04:00
Simon Pilgrim	d1abca187d	[CostModel][X86] Add SSE costs for SMAX/SMIN/UMAX/UMIN intrinsics	2020-07-29 15:55:43 +01:00
Simon Pilgrim	0a0f28254a	[CostModel][X86] Add SSE costs for ABS intrinsics	2020-07-29 14:33:59 +01:00
David Green	9ddb28964c	[ARM] Tune getCastInstrCost for extending masked loads and truncating masked stores This patch uses the feature added in D79162 to fix the cost of a sext/zext of a masked load, or a trunc for a masked store. Previously, those were considered cheap or even free, but it's not the case as we cannot split the load in the same way we would for normal loads. This updates the costs to better reflect reality, and adds a test for it in test/Analysis/CostModel/ARM/cast.ll. It also adds a vectorizer test that showcases the improvement: in some cases, the vectorizer will now choose a smaller VF when tail-predication is enabled, which results in better codegen. (Because if it were to use a higher VF in those cases, the code we see above would be generated, and the vmovs would block tail-predication later in the process, resulting in very poor codegen overall) Original Patch by Pierre van Houtryve Differential Revision: https://reviews.llvm.org/D79163	2020-07-29 13:41:34 +01:00
David Green	60280e9818	[Analysis] TTI: Add CastContextHint for getCastInstrCost Currently, getCastInstrCost has limited information about the cast it's rating, often just the opcode and types. Sometimes there is a context instruction as well, but it isn't trustworthy: for instance, when the vectorizer is rating a plan, it calls getCastInstrCost with the old instructions when, in fact, it's trying to evaluate the cost of the instruction post-vectorization. Thus, the current system can get the cost of certain casts incorrect as the correct cost can vary greatly based on the context in which it's used. For example, if the vectorizer queries getCastInstrCost to evaluate the cost of a sext(load) with tail predication enabled, getCastInstrCost will think it's free most of the time, but it's not always free. On ARM MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar situations can come up with how masked loads can be extended when being split. To fix that, this path adds a new parameter to getCastInstrCost to give it a hint about the context of the cast. It adds a CastContextHint enum which contains the type of the load/store being created by the vectorizer - one for each of the types it can produce. Original patch by Pierre van Houtryve Differential Revision: https://reviews.llvm.org/D79162	2020-07-29 13:32:53 +01:00
Sjoerd Meijer	85342c27a3	[ARM] Optimize immediate selection Optimize some specific immediates selection by materializing them with sub/mvn instructions as opposed to loading them from the constant pool. Patch by Ben Shi, powerman1st@163.com. Differential Revision: https://reviews.llvm.org/D83745	2020-07-29 13:29:17 +01:00
Matt Arsenault	200bb5191a	AMDGPU/GlobalISel: Refactor special argument management	2020-07-29 08:27:31 -04:00
Matt Arsenault	c230965ccf	AMDGPU: Make saturating add/sub legal for DAG path	2020-07-29 08:27:31 -04:00
Matt Arsenault	cdd45d5f9c	AMDGPU/GlobalISel: Select llvm.amdgcn.global.atomic.csub Remove the custom node boilerplate. Not sure why this tried to handle the LDS atomic stuff.	2020-07-29 08:27:31 -04:00
Simon Pilgrim	0c005be6eb	[X86][SSE] getV4X86ShuffleImm8 - canonicalize broadcast masks If the mask input to getV4X86ShuffleImm8 only refers to a single source element (+ undefs) then canonicalize to a full broadcast. getV4X86ShuffleImm8 defaults to inline values for undefs, which can be useful for shuffle widening/narrowing but does leave SimplifyDemanded* calls thinking the shuffle depends on unnecessary elements. I'm still investigating what we should do more generally to avoid these undemanded elements, but broadcast cases was a simpler win.	2020-07-29 11:32:44 +01:00
Ikhlas Ajbar	d50d4c3d44	[Hexagon] Correct the order of operands when lowering funnel shift-left This patch corrects the order of operands in the pattern that lowers fshl in Hexagon.	2020-07-28 21:22:41 -05:00
Kang Zhang	00046d789c	[PowerPC] Add Def CR1 for MTFSFI_rec and MTFSF_rec	2020-07-29 01:47:23 +00:00
Matt Arsenault	44211f20a8	AMDGPU: Optimize copies to exec with other insts after exec def It's possible to have terminator instructions after a write to exec, so skip over them to find it.	2020-07-28 21:34:50 -04:00
Matt Arsenault	b6ebc77326	AMDGPU/GlobalISel: Fix selecting llvm.amdgcn.s.getreg This introduces the same bug llvm.amdgcn.s.setreg has where if the user specified an immediate outside of the valid 16-bit range, it will select into a verifier error.	2020-07-28 21:34:50 -04:00
Thomas Lively	11bb7eef41	[WebAssembly] Remove intrinsics for SIMD widening ops Instead, pattern match extends of extract_subvectors to generate widening operations. Since extract_subvector is not a legal node, this is implemented via a custom combine that recognizes extract_subvector nodes before they are legalized. The combine produces custom ISD nodes that are later pattern matched directly, just like the intrinsic was. Also removes the clang builtins for these operations since the instructions can now be generated from portable code sequences. Differential Revision: https://reviews.llvm.org/D84556	2020-07-28 18:25:55 -07:00
Craig Topper	06cf6f770d	[X86] Add FeatureCMPXCHG8B and FeatureSlowUAMem16 to 'lakemont' in X86.td We already had CMPXCH8B feature on this CPU for the frontend so this doesn't have much effect. The FeatureSlowUAMem16 only matters if someone compiles with -march=lakemont -msse which doesn't make sense, but is consistent with all our pre-sse4.2 CPUs. Maybe the feature flag should be FeatureFastUAMem16 and set on the newer CPUs instead.	2020-07-28 18:24:46 -07:00
Matt Arsenault	6a7b6dd54b	AMDGPU: Don't assert in canInsertSelect Currently GlobalISel doesn't force all VGPR phi operands to VGPRs, so this hit a case where it was queried with a VGPR and SGPR. This could arguably be a verifier error, but it's currently not.	2020-07-28 21:01:06 -04:00
Thomas Lively	ffd8c23ccb	[WebAssembly] Implement truncating vector stores Rather than expanding truncating stores so that vectors are stored one lane at a time, lower them to a sequence of instructions using narrowing operations instead, when possible. Since the narrowing operations have saturating semantics, but truncating stores require truncation, mask the stored value to manually truncate it before narrowing. Also, since narrowing is a binary operation, pass in the original vector as the unused second argument. Differential Revision: https://reviews.llvm.org/D84377	2020-07-28 17:46:45 -07:00
Matt Arsenault	068808d102	AMDGPU: Don't assume call targets are registers GlobalISel let through a call to null, which would then fold into the source operand like any other inline immediate. The SelectionDAG lowering deletes calls to null and undef as a workaround from before calls were supported. We should probably drop the special handling case in the DAG lowering now, since the middle end optimizers delete null calls anyway.	2020-07-28 20:46:06 -04:00
Matt Arsenault	8860daf0ed	AMDGPU: Handle a few missing cases in getAddrModeArguments	2020-07-28 20:22:38 -04:00
Matt Arsenault	b3e63aa8a4	AMDGPU: Don't assume there is only one terminator copy This would stop on the first in reverse order, failing the verifier if there were more earlier in the block.	2020-07-28 20:22:38 -04:00
Matt Arsenault	592f2e8d1c	AMDGPU: Fix verifier error on spilling partially defined SGPRs This needs an implicit def of the super-register in case one of the lanes isn't defined, similar to copyPhysReg (or the not-VGPR spill case below). This showed up in GlobalISel testing since it currently doesn't fold out many undef instructions.	2020-07-28 20:01:57 -04:00
Matt Arsenault	66d60e06cb	AMDGPU: Serialize MFI spill fields These should probably be inferred from the function on parse, but the target specific infrastructure currently does not give you a way to do this. SILowerSGPRSpills early exits without this reporting spills, which makes it difficult to write a MIR test for.	2020-07-28 20:01:57 -04:00
Matt Arsenault	e87356b498	GlobalISel: Don't assert on operations with no type indices Fix not marking G_FENCE as legal on AMDGPU This was apparently defaulting to legal using the "legacy" rules, whatever those are.	2020-07-28 16:49:55 -04:00
Matt Arsenault	5174e7b443	GlobalISel: Add typeIsNot LegalityPredicate This allows sorting the legal/custom rules first as is recommended	2020-07-28 16:49:55 -04:00
Matt Arsenault	9731ef3ec5	AMDGPU/GlobalISel: Add SReg_96 to SGPRRegBank	2020-07-28 16:49:55 -04:00
Matt Arsenault	e9b236f411	AMDGPU: Check for other defs when folding conditions into s_andn2_b64 We can't fold the masked compare value through the select if the select condition is re-defed after the and instruction. Fixes a verifier error and trying to use the outgoing value defined in the block. I'm not sure why this pass is bothering to handle physregs. It's making this more complex and forces extra liveness computation.	2020-07-28 16:36:23 -04:00
Roman Lebedev	e1dd212c87	[X86] Remove disabled miscompiling X86CondBrFolding pass As briefly discussed in IRC with @craig.topper, the pass is disabled basically since it's original introduction (nov 2018) due to known correctness issues (miscompilations), and there hasn't been much work done to fix that. While i won't promise that i will "fix" the pass, i have looked at it previously, and i'm sure i won't try to fix it if that requires actually fixing this existing code. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D84775	2020-07-28 23:35:04 +03:00
Craig Topper	69152a11cf	[X86] Merge the two 'Emit the normal disp32 encoding' cases in SIB byte handling in emitMemModRMByte. NFCI By repeating the Disp.isImm() check in a couple spots we can make the normal case for immediate and for expression the same. And then always rely on the ForceDisp32 flag to remove a later non-zero immediate check. This should make {disp32} pseudo prefix handling slightly easier as we need the normal disp32 handler to handle a immediate of 0.	2020-07-28 12:12:09 -07:00
jasonliu	f8ab66538c	[NFC][XCOFF] Use getFunctionEntryPointSymbol from TLOF to simplify logic Reviewed By: Xiangling_L Differential Revision: https://reviews.llvm.org/D84693	2020-07-28 18:59:51 +00:00
Simon Pilgrim	4838cd46a9	[X86][XOP] Shuffle v16i8 using VPPERM(X,Y) instead of OR(PSHUFB(X),PSHUFB(Y))	2020-07-28 19:56:10 +01:00
Austin Kerbow	adeeac9d5a	[AMDGPU] Spill CSR VGPR which is reserved for SGPR spills Update logic for reserving VGPR for SGPR spills. A CSR VGPR being reserved for SGPR spills could be clobbered if there were no free lower VGPR's available. Create a stack object so that it will be spilled in the prologue. Also adds more tests. Differential Revision: https://reviews.llvm.org/D83730	2020-07-28 11:53:02 -07:00
Craig Topper	91b8c1fd0f	[X86] Simplify some code in emitMemModRMByte. NFCI	2020-07-28 10:46:04 -07:00
Craig Topper	6c3dc6e1d5	[X86] Merge disp8 and cdisp8 handling into a single helper function to reduce some code. We currently handle EVEX and non-EVEX separately in two places. By sinking the EVEX check into the existing helper for CDisp8 we can simplify these two places. Differential Revision: https://reviews.llvm.org/D84730	2020-07-28 10:46:01 -07:00
Anna Welker	0c64233bb7	[ARM][MVE] Teach MVEGatherScatterLowering to merge successive getelementpointers A patch following up on the introduction of pointer induction variables, adding a preprocessing step to the address optimisation in the MVEGatherScatterLowering pass. If the getelementpointer that is the address is itself using a getelementpointer as base, they will be merged into one by summing up the offsets, after checking that this will not cause an overflow (this can be repeated recursively). Differential Revision: https://reviews.llvm.org/D84027	2020-07-28 17:31:20 +01:00
Arthur Eubanks	2ca6c422d2	[FunctionAttrs] Rename functionattrs -> function-attrs To match NewPM pass name, and also for readability. Also rename rpo-functionattrs -> rpo-function-attrs while we're here. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D84694	2020-07-28 09:09:13 -07:00
Matt Arsenault	16bcd54570	AMDGPU/GlobalISel: Mark GlobalISel classes as final	2020-07-28 11:42:17 -04:00
Matt Arsenault	bb23b5cfe0	AMDGPU/GlobalISel: Merge identical select cases	2020-07-28 11:42:17 -04:00
Matt Arsenault	a4edc04693	AMDGPU/GlobalISel: Use clamp modifier for [us]addsat/[us]subsat We also have never handled this for SelectionDAG, which needs additional work.	2020-07-28 11:18:05 -04:00
Sander de Smalen	cda2eb3ad2	[AArch64][SVE] Fix epilogue for SVE when the stack is realigned. While deallocating the stackframe, the offset used to reload the callee-saved registers was not pointing to the SVE callee-saves, but rather to the whole SVE area. +--------------+ \| GRP callee \| \| saves \| +--------------+ <- FP \| SVE callee \| \| saves \| +--------------+ <- Should restore SVE callee saves from here \| SVE Spills \| \| and Locals \| +--------------+ <- instead of from here. \| \| : : \| \| +--------------+ <- SP Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D84539	2020-07-28 15:45:53 +01:00
Sander de Smalen	26b4ef3694	[AArch64][SVE] Don't align the last SVE callee save. Instead of aligning the last callee-saved-register slot to the stack alignment (16 bytes), just align the SVE callee-saved block. This also simplifies the code that allocates space for the callee-saves. This change is needed to make sure the offset to which the callee-saved register is spilled, corresponds to the offset used for e.g. unwind call frame instructions. Reviewers: efriedma, paulwalker-arm, david-arm, rengolin Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84042	2020-07-28 15:45:53 +01:00
Sander de Smalen	54492a5843	[AArch64][SVE] Don't support fixedStack for SVE objects. Fixed stack objects are preallocated and defined to be allocated before any of the regular stack objects. These are normally used to model stack arguments. The AAPCS does not support passing SVE registers on the stack by value (only by reference). The current layout also doesn't place them before all stack objects, but rather before all SVE objects. Removing this simplifies the code that emits the allocation/deallocation around callee-saved registers (D84042). This patch also removes all uses of fixedStack from from framelayout-sve.mir, where this was used purely for testing purposes. Reviewers: paulwalker-arm, efriedma, rengolin Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D84538	2020-07-28 15:45:53 +01:00
Jinsong Ji	d28f86723f	Re-land "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support" This reverts commit `bf544fa1c3`. Fixed the typo in PPCInstrInfo.cpp.	2020-07-28 14:00:11 +00:00
Tim Northover	39108f4c7a	ARM: make Thumb1 instructions non-flag-setting in IT block. Many Thumb1 instructions are defined to set CPSR if executed outside an IT block, but leave it alone from inside one. In MachineIR this is represented by whether an optional register is CPSR or NoReg (0), and affects how the instructions are printed. This sets the instruction to the appropriate form during if-conversion.	2020-07-28 13:31:17 +01:00
Stefan Pintilie	97470897c4	[PowerPC] Split s34imm into two types Currently the instruction paddi always takes s34imm as the type for the 34 bit immediate. However, the PC Relative form of the instruction should not produce the same fixup as the non PC Relative form. This patch splits the s34imm type into s34imm and s34imm_pcrel so that two different fixups can be emitted. Reviewed By: nemanjai, #powerpc, kamaub Differential Revision: https://reviews.llvm.org/D83255	2020-07-28 05:55:56 -05:00
Simon Pilgrim	182111777b	[X86][SSE] Attempt to match OP(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE(HOP(X,Y)) An initial backend patch towards fixing the various poor HADD combines (PR34724, PR41813, PR45747 etc.). This extends isHorizontalBinOp to check if we have per-element horizontal ops (odd+even element pairs), but not in the expected serial order - in which case we build a "post shuffle mask" that we can apply to the HOP result, assuming we have fast-hops/optsize etc. The next step will be to extend the SHUFFLE(HOP(X,Y)) combines as suggested on PR41813 - accepting more post-shuffle masks even on slow-hop targets if we can fold it into another shuffle. Differential Revision: https://reviews.llvm.org/D83789	2020-07-28 10:04:14 +01:00
Craig Topper	647e861e08	[X86] Detect if EFLAGs is live across XBEGIN pseudo instruction. Add it as livein to the basic blocks created when expanding the pseudo XBEGIN causes several based blocks to be inserted. If flags are live across it we need to make eflags live in the new basic blocks to avoid machine verifier errors. Fixes PR46827 Reviewed By: ivanbaev Differential Revision: https://reviews.llvm.org/D84479	2020-07-27 21:15:35 -07:00
Craig Topper	25f193fb46	[X86] Add support for {disp32} to control size of jmp and jcc instructions in the assembler By default we pick a 1 byte displacement and let relaxation enlarge it if necessary. The GNU assembler supports a pseudo prefix to basically pre-relax the instruction the larger size. I plan to add {disp8} and {disp32} support for memory operands in another patch which is why I've included the parsing code and enum for {disp8} pseudo prefix as well. Reviewed By: echristo Differential Revision: https://reviews.llvm.org/D84709	2020-07-27 21:11:48 -07:00
Craig Topper	a0ebac52df	[X86] Properly encode a 32-bit address with an index register and no base register in 16-bit mode. In 16-bit mode we can encode a 32-bit address using 0x67 prefix. We were failing to do this when the index register was a 32-bit register, the base register was not present, and the displacement fit in 16-bits. Fixes PR46866.	2020-07-27 21:11:42 -07:00
Matt Arsenault	ce944af33c	AMDGPU/GlobalISel: Mark G_ATOMICRMW_{NAND\|FSUB} as lower These aren't implemented and we're still relying on the AtomicExpand pass, but mark these as lower to eliminate a few of the few remaining no rules defined cases.	2020-07-27 18:47:40 -04:00
Matt Arsenault	8b81d0633f	AMDGPU: global_atomic_csub is not always dereferenceable	2020-07-27 18:47:39 -04:00
Francesco Petrogalli	adb28e0fb2	[llvm][CodeGen] Addressing modes for SVE ldN. Reviewers: c-rhodes, efriedma, sdesmalen Subscribers: huihuiz, tschuett, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77251	2020-07-27 22:18:28 +00:00
Jinsong Ji	bf544fa1c3	Revert "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support" This reverts commit `adffce7153`. This is breaking test-suite, revert while investigation.	2020-07-27 21:07:00 +00:00
Arthur Eubanks	beb7e3bb70	Rename t2-reduce-size -> thumb2-reduce-size For readability and consistency with other thumb2 passes like "thumb2-it". Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D84696	2020-07-27 13:42:13 -07:00
Jinsong Ji	adffce7153	[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support Per RFC http://lists.llvm.org/pipermail/llvm-dev/2020-April/141295.html no one is making use of QPX/A2Q/BGQ/BGP CNK anymore. This patch remove the support of QPX/A2Q in llvm, BGQ/BGP in clang, CNK support in openmp/polly. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D83915	2020-07-27 19:24:39 +00:00
Arthur Eubanks	2a672767cc	Prefix some AArch64/ARM passes with "aarch64-"/"arm-" For consistency with other target specific passes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D84560	2020-07-27 11:00:39 -07:00
Kazu Hirata	902cbcd59e	Use llvm::is_contained where appropriate (NFC) Summary: This patch replaces std::find with llvm::is_contained where appropriate. Reviewers: efriedma, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, jvesely, nhaehnle, hiraditya, rogfer01, kerbowa, llvm-commits, vkmr Tags: #llvm Differential Revision: https://reviews.llvm.org/D84489	2020-07-27 10:20:44 -07:00
Craig Topper	51e1c028d4	[X86] Add back comment inadvertently lost in `1a1448e656`.	2020-07-27 10:02:38 -07:00
Francesco Petrogalli	dbeb184b7f	[NFC][AArch64] Replace some template methods/invocations... ...with the non-template version, as the template version might increase the size of the compiler build. Methods affected: 1.`findAddrModeSVELoadStore` 2. `SelectPredicatedStore` Also, remove the `const` qualifier from the `unsigned` parameters of the methods to conform with other similar methods in the class.	2020-07-27 16:52:08 +00:00
Simon Pilgrim	4d84d94969	[X86][SSE] Relax 128-bit restriction on extract_subvector(ext_vector_inreg(X),0) -> ext_vector_inreg(extract_subvector(X,0)) fold We only need to ensure that the source is larger than the subvector result type	2020-07-27 17:50:36 +01:00
jasonliu	c25f61cf6a	[XCOFF][AIX] Handle llvm.used and llvm.compiler.used global array For now, just return and do nothing when we see llvm.used and llvm.compiler.used global array. Hopefully, we could come up with a good solution later to prevent linker from eliminating symbols in llvm.used array. Reviewed By: DiggerLin, daltenty Differential Revision: https://reviews.llvm.org/D84363	2020-07-27 15:28:32 +00:00
Jon Roelofs	f5e1ec8c58	[AArch64] fjcvtzs,rmif,cfinv,setf* all clobber nzcv Differential Revision: https://reviews.llvm.org/D83818	2020-07-27 09:17:53 -06:00
Simon Pilgrim	ab4ffa52f0	[X86][AVX] Fold extract_subvector(truncate(x),0) -> truncate(extract_subvector(x),0) This is currently only supported for VLX targets where the op should be legal.	2020-07-27 14:51:29 +01:00
Simon Pilgrim	f720c9c68c	[X86] combineExtractSubvector - pull out repeated getSizeInBits() calls. NFCI.	2020-07-27 14:51:28 +01:00
Tim Northover	0f1494be43	AArch64: avoid UB shift of negative value Left shifting a negative value is undefined behaviour, so this just moves the negation afterwards to avoid it.	2020-07-27 13:49:50 +01:00
Tim Northover	216b67e202	AArch64: diagnose out of range relocation addends on MachO. MachO only has 24-bit addends for most relocations, small enough that it can overflow in semi-reasonable functions and cause insidious bugs if compiled without assertions enabled. Switch it to an actual error instead. The condition isn't quite identical because ld64 treats the addend as a signed number.	2020-07-27 13:01:22 +01:00
Piotr Sobczak	590dd73c6e	[AMDGPU] Make generating cache invalidating instructions optional Summary: D78800 skipped generating cache invalidating instrucions altogether on AMDPAL. However, this is sometimes too restrictive - we want a more flexible option to be able to toggle this behaviour on and off while we work towards developing a correct implementation of the alternative memory model. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, dexonsmith, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D84448	2020-07-27 09:24:11 +02:00
David Sherwood	14bc85e0eb	[SVE] Don't use LocalStackAllocation for SVE objects I have introduced a new TargetFrameLowering query function: isStackIdSafeForLocalArea that queries whether or not it is safe for objects of a given stack id to be bundled into the local area. The default behaviour is to always bundle regardless of the stack id, however for AArch64 this is overriden so that it's only safe for fixed-size stack objects. There is future work here to extend this algorithm for multiple local areas so that SVE stack objects can be bundled together and accessed from their own virtual base-pointer. Differential Revision: https://reviews.llvm.org/D83859	2020-07-27 08:22:01 +01:00
biplmish	825ed2d43d	[PowerPC] Add Vector Extract Double Instruction Definitions and MC tests. This patch adds the td definitions and asm/disasm tests for the following instructions: Vector Extract Double Left Index - vextdubvlx, vextduhvlx, vextduwvlx, vextddvlx Vector Extract Double Right Index - vextdubvrx, vextduhvrx, vextduwvrx, vextddvrx Differential Revision: https://reviews.llvm.org/D84384	2020-07-26 23:56:19 -05:00
Matt Arsenault	e97aa5609f	AMDGPU/GlobalISel: Don't assert in LegalizerInfo constructor We don't really need these asserts. The LegalizerInfo is also overly-aggressivly constructed, even when not in use. It needs to not assert on dummy targets that have manually specified, unrelated features.	2020-07-26 23:01:28 -04:00
Craig Topper	df12524e6b	[X86] Turn X86DAGToDAGISel::tryVPTERNLOG into a fully custom instruction selector that can handle bitcasts between logic ops Previously we just matched the logic ops and replaced with an X86ISD::VPTERNLOG node that we would send through the normal pattern match. But that approach couldn't handle a bitcast between the logic ops. Extending that approach would require us to peek through the bitcasts and emit new bitcasts to match the types. Those new bitcasts would then have to be properly topologically sorted. This patch instead switches to directly emitting the MachineSDNode and skips the normal tablegen pattern matching. We do have to handle load folding and broadcast load folding ourselves now. Which also means commuting the immediate control. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D83630	2020-07-26 12:19:08 -07:00
Craig Topper	1a75d88b3e	[X86] Move getGatherOverhead/getScatterOverhead into X86TargetTransformInfo. These cost methods don't make much sense in X86Subtarget. Make them methods in X86's TTI and move the feature checks from the X86Subtarget constructor into these methods. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D84594	2020-07-26 10:38:42 -07:00
Simon Pilgrim	17eafe0841	[X86][SSE] lowerV2I64Shuffle - use undef elements in PSHUFD mask widening If we lower a v2i64 shuffle to PSHUFD, we currently clamp undef elements to 0, (elements 0,1 of the v4i32) which can result in the shuffle referencing more elements of the source vector than expected, affecting later shuffle combines and KnownBits/SimplifyDemanded calls. By ensuring we widen the undef mask element we allow getV4X86ShuffleImm8 to use inline elements as the default, which are more likely to fold.	2020-07-26 16:04:22 +01:00
Matt Arsenault	d35e2c101d	AMDGPU/GlobalISel: Fix not constraining ds_append/consume operands	2020-07-26 10:17:36 -04:00
Matt Arsenault	f6176f8a5f	GlobalISel: Handle G_PTR_ADD in narrowScalar	2020-07-26 10:08:17 -04:00
Matt Arsenault	7c09c173a2	AMDGPU/GlobalISel: Reorder G_CONSTANT legality rules The legal cases should be the first rules.	2020-07-26 10:05:05 -04:00
Matt Arsenault	bcf5184a68	AMDGPU/GlobalISel: Make sure <2 x s1> phis are scalarized	2020-07-26 10:04:47 -04:00
Matt Arsenault	6f961a1e7e	AMDGPU/GlobalISel: Legalize GDS atomics I noticed these don't use the _gfx9, non-m0 reading variants but not sure if that's a bug or not. It's the same in the DAG.	2020-07-26 10:03:34 -04:00
Matt Arsenault	5819159995	AMDGPU/GlobalISel: Pack constant G_BUILD_VECTOR_TRUNCs when selecting	2020-07-26 09:55:34 -04:00
Matt Arsenault	4033aa1467	AMDGPU/GlobalISel: Sign extend integer constants This matches the DAG behavior and fixes immediate folding	2020-07-26 09:30:14 -04:00
Amara Emerson	9b19400004	[AArch64][GlobalISel] Make <8 x s16> and <16 x s8> legal types for G_SHUFFLE_VECTOR and G_IMPLICIT_DEF. Trivial change, we're still missing support for rev matching for these types in the combiner.	2020-07-26 00:48:09 -07:00
Craig Topper	1a1448e656	[X86] Merge X86MCInstLowering's maxLongNopLength into emitNop and remove check for FeatureNOPL. The switch in emitNop uses 64-bit registers for nops exceeding 2 bytes. This isn't valid outside 64-bit mode. We could fix this easily enough, but there are no users that ask for more than 2 bytes outside 64-bit mode. Inlining the method to make the coupling between the two methods more explicit.	2020-07-25 22:11:47 -07:00
Craig Topper	14c59b4577	[X86] Remove getProcFamily() method from X86Subtarget. NFC This isn't used and we've decided in the past that a CPU enum for tuning is not a good idea.	2020-07-25 22:11:45 -07:00
Craig Topper	1df8804ce5	[X86] Replace a use of ProcIntelSLM with FeatureFast7ByteNOP.	2020-07-25 20:46:48 -07:00
Nemanja Ivanovic	cdead4f89c	[PowerPC][NFC] Fix an assert that cannot trip from `7d076e19e3` I mixed up the precedence of operators in the assert and thought I had it right since there was no compiler warning. This just adds the parentheses in the expression as needed.	2020-07-25 20:28:52 -04:00
Matt Arsenault	392b969c32	AMDGPU/GlobalISel: Don't assert on G_INSERT > 128-bits Just fallback for now. Really tablegen needs to generate all of the subregister index handling we need.	2020-07-25 10:05:44 -04:00
Simon Pilgrim	3b21823e4a	[X86][SSE] combineX86ShufflesRecursively - move all Root node asserts to the same location. NFCI. Minor tidyup for some upcoming shuffle combine improvements.	2020-07-25 12:48:14 +01:00
Simon Pilgrim	66998ae59f	[X86][SSE] getFauxShuffle - ignore undemanded sources for PACKSS/PACKUS faux shuffles If we don't care about an entire LHS/RHS of the PACK op, then can just treat it the same as undef (we don't care if it saturates) and is safe to treat as a shuffle. This can happen if we attempt to decode as a faux shuffle before SimplifyDemandedVectorElts has been called on the PACK which should replace the source with UNDEF entirely.	2020-07-25 10:51:14 +01:00
Jessica Paquette	604e33e83a	[AArch64][GlobalISel] Look through constants when selection stores of 0 Very minor code size improvements (hits 8 times in Bullet at -O3), but still something. Also very minor NFC change to make sure we only search for a 0 constant when selecting a store. Before, we'd do this for loads as well. Differential Revision: https://reviews.llvm.org/D84573	2020-07-24 22:46:14 -07:00
Amy Kwan	739cd2638b	[PowerPC] Exploit the High Order Vector Multiply Instructions on Power10 This patch aims to exploit the following vector multiply high instructions on Power10. vmulhsw VRT, VRA, VRB vmulhsd VRT, VRA, VRB vmulhuw VRT, VRA, VRB vmulhud VRT, VRA, VRB Differential Revision: https://reviews.llvm.org/D82584	2020-07-24 20:57:57 -05:00
Amy Kwan	74790a5dde	[PowerPC] Implement Truncate and Store VSX Vector Builtins This patch implements the `vec_xst_trunc` function in altivec.h in order to utilize the Store VSX Vector Rightmost [byte \| half \| word \| doubleword] Indexed instructions introduced in Power10. Differential Revision: https://reviews.llvm.org/D82467	2020-07-24 19:22:39 -05:00
Jessica Paquette	fcc55c0952	[AArch64][GlobalISel] Use wzr/xzr for 16 and 32 bit stores of zero We weren't performing this optimization on 16 and 32 bit stores. SDAG happily does this though. e.g. https://godbolt.org/z/cWocKr This saves about 0.2% in code size on CTMark at -O3. Differential Revision: https://reviews.llvm.org/D84568	2020-07-24 17:15:20 -07:00
Amara Emerson	f320f83f3a	[AArch64][GlobalISel] Promote G_UITOFP vector operands to same elt size as result. Fixes legalization failures.	2020-07-24 17:00:50 -07:00
Matt Arsenault	2bd72abef0	AMDGPU: Skip other terminators before inserting s_cbranch_exec[n]z PHIElimination/createPHISourceCopy inserts non-branch terminators after the control flow pseudo if a successor phi reads a register defined by the control flow pseudo. If this happens, we need to split the expansion of the control flow pseudo to ensure all the branches are after all of the other mask management instructions. GlobalISel hit this in testscases that happened to be tail duplicated. The original testcase still does not work, since the same problem appears to be present in a later pass.	2020-07-24 16:51:59 -04:00
Eli Friedman	c02aa53ecb	[AArch64][SVE] Add "fast" fcmp operations. `dacf8d3` added support for most fcmp operations, but there are some extra variations I hadn't considered: SelectionDAG supports float comparisons that are neither ordered nor unordered. Add support for the missing operations. Differential Revision: https://reviews.llvm.org/D84460	2020-07-24 13:22:41 -07:00
Nemanja Ivanovic	7d076e19e3	[PowerPC] Fix computation of offset for load-and-splat for permuted loads Unfortunately this is another regression from my canonicalization patch (`1fed131660`). The patch contained two implicit assumptions: 1. That we would have a permuted load only if we are loading a partial vector 2. That a partial vector load would necessarily be as wide as the splat However, assumption 2 is not correct since it is possible to do a wider load and only splat a half of it. This patch corrects this assumption by simply checking if the load is permuted and adjusting the offset if it is.	2020-07-24 15:38:46 -04:00
madhur13490	4a577c3a22	[AMDGPU] Fix incorrect arch assert while setting up FlatScratchInit Reviewers: arsenm, foad, rampitec, scott.linder Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D84391	2020-07-24 18:19:04 +00:00
Craig Topper	945ed22f33	[X86] Move the implicit enabling of sse2 for 64-bit mode from X86Subtarget::initSubtargetFeatures to X86_MC::ParseX86Triple. ParseX86Triple already checks for 64-bit mode and produces a static string. We can just add +sse2 to the end of that static string. This avoids a potential reallocation when appending it to the std::string at runtime. This is a slight change to the behavior of tools that only use MC layer which weren't implicitly enabling sse2 before, but will now. I don't think we check for sse2 explicitly in any MC layer components so this shouldn't matter in practice. And if it did matter the new behavior is more correct.	2020-07-24 11:14:20 -07:00
Francesco Petrogalli	809600d664	[llvm][sve] Reg + Imm addressing mode for ld1ro. Reviewers: kmclaughlin, efriedma, sdesmalen Subscribers: tschuett, hiraditya, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83357	2020-07-24 17:48:47 +00:00
Craig Topper	8158f0cefe	[X86] Use X86_MC::ParseX86Triple to add mode features to feature string in X86Subtarget::initSubtargetFeatures. Remove mode flags from constructor and remove calls to ToggleFeature for the mode bits. By adding them to the feature string we handle initializing the mode member variables in X86Subtarget and the feature bits in MCSubtargetInfo in one shot.	2020-07-24 10:48:22 -07:00
Meera Nakrani	db37937a47	[ARM] Added additional patterns to VABD instruction Added extra patterns to VABD instruction so it is selected in place of VSUB and VABS. Added corresponding regression test too. Differential Revision: https://reviews.llvm.org/D84500	2020-07-24 17:46:25 +00:00
Meera Nakrani	805e6bcf22	Test Commit Test commit - added whitespace in ARMInstrMVE.td	2020-07-24 17:22:56 +00:00
Dmitry Preobrazhensky	6b8948922c	[AMDGPU][MC] Added support of SP3 syntax for MTBUF format modifier Currently supported LLVM MTBUF syntax is shown below. It is not compatible with SP3. op dst, addr, rsrc, FORMAT, soffset This change adds support for SP3 syntax: op dst, addr, rsrc, soffset SP3FORMAT In addition to being compatible with SP3, this syntax allows using symbolic names for data, numeric and unified formats. Below is a list of added syntax variants. format:<expression> format:[<numeric-format-name>,<data-format-name>] format:[<data-format-name>,<numeric-format-name>] format:[<data-format-name>] format:[<numeric-format-name>] format:[<unified-format-name>] The last syntax variant is supported for GFX10 only. See llvm bug 37738 Reviewers: arsenm, rampitec, vpykhtin Differential Revision: https://reviews.llvm.org/D84026	2020-07-24 16:41:03 +03:00
Simon Pilgrim	0128b9505c	Revert rG5dd566b7c7b78bd- "PassManager.h - remove unnecessary Function.h/Module.h includes. NFCI." This reverts commit `5dd566b7c7`. Causing some buildbot failures that I'm not seeing on MSVC builds.	2020-07-24 13:02:33 +01:00
Simon Pilgrim	5dd566b7c7	PassManager.h - remove unnecessary Function.h/Module.h includes. NFCI. PassManager.h is one of the top headers in the ClangBuildAnalyzer frontend worst offenders list. This exposes a large number of implicit dependencies on various forward declarations/includes in other headers that need addressing.	2020-07-24 12:40:50 +01:00
Petar Avramovic	47bd41d099	AMDGPU/GlobalISel: Select set.inactive intrinsic Differential Revision: https://reviews.llvm.org/D84407	2020-07-24 10:14:14 +02:00
Craig Topper	205e8b7e89	[X86] Make the X86ProcFamilyEnum private to X86Subtarget. Removed unneeded 'protected' from X86Subtarget. NFC	2020-07-23 23:42:11 -07:00
Matt Arsenault	891759db73	GlobalISel: Add scalarSameSizeAs LegalizeRule Widen or narrow a type to a type with the same scalar size as another. This can be used to force G_PTR_ADD/G_PTRMASK's scalar operand to match the bitwidth of the pointer type. Use this to disallow narrower types for G_PTRMASK.	2020-07-23 21:17:31 -04:00
Eli Friedman	993c1a3219	[AArch64][SVE] Teach copyPhysReg to copy ZPR2/3/4. It's sort of tricky to hit this in practice, but not impossible. I have a synthetic C testcase if anyone is interested. The implementation is identical to the equivalent NEON register copies. Differential Revision: https://reviews.llvm.org/D84373	2020-07-23 16:41:37 -07:00
Amy Kwan	1dc1a3fb0c	[PowerPC] Implement low-order Vector Multiply, Modulus and Divide Instructions This patch aims to implement the low order vector multiply, divide and modulo instructions available on Power10. The patch involves legalizing the ISD nodes MUL, UDIV, SDIV, UREM and SREM for v2i64 and v4i32 vector types in order to utilize the following instructions: - Vector Multiply Low Doubleword: vmulld - Vector Modulus Word/Doubleword: vmodsw, vmoduw, vmodsd, vmodud - Vector Divide Word/Doubleword: vdivsw, vdivsd, vdivuw, vdivud Differential Revision: https://reviews.llvm.org/D82510	2020-07-23 17:18:36 -05:00
David Green	b37e92201c	[ARM] Add predicated mla reduction patterns Similar to `8fa824d7a3` but this time for MLA patterns, this selects predicated vmlav/vmlava/vmlalv/vmlava instructions from vecreduce.add(select(p, mul(x, y), 0)) nodes. Differential Revision: https://reviews.llvm.org/D84102	2020-07-23 21:47:59 +01:00
Matt Arsenault	b9c644ec61	AMDGPU: Fix failures from overflowing uint8_t number of operands If the operand index exceeded the limit of unsigned char, it wrapped and would point to the wrong operand. Increase the size of the operand index field to avoid this, and also don't bother trying to fold into implicit operands.	2020-07-23 15:39:33 -04:00
Amara Emerson	3b10e42ba1	[AArch64][GlobalISel] Add post-legalize combine for sext(trunc(sextload)) -> trunc/copy On AArch64 we generate redundant G_SEXTs or G_SEXT_INREGs because of this. Differential Revision: https://reviews.llvm.org/D81993	2020-07-23 12:06:35 -07:00
Matt Arsenault	d2b8fcff34	AMDGPU/GlobalISel: Handle call return values The only case that I know doesn't work is the implicit sret case when the return type doesn't fit in the return registers.	2020-07-23 14:29:35 -04:00
Craig Topper	5dbcf5e3cc	[X86] Add Feature64Bit to the 'generic' CPU and remove feature string hacking in X86Subtarget constructor Feature64Bit is only used by a check in the X86Subtarget constructor to ensure that the CPU selected supports 64-bit mode when the triple is for 64-bit mode. 'generic' is the default CPU in llc and so needs to be able to pass this check. Previously we did this by detecting the name and adding the feature to the feature string. But there doesn't seem to be any reason we can't just add the feature to the CPU directly.	2020-07-23 09:16:18 -07:00
Sebastian Neubauer	896679733d	[AMDGPU] Fix typo. NFC	2020-07-23 17:01:12 +02:00
Simon Pilgrim	d720ba1e4b	[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add SSE shift multiple use handling Add SimplifyMultipleUseDemandedVectorElts peek through for imm/var SSE shifts	2020-07-23 14:39:03 +01:00
Ulrich Weigand	68a80a4436	[SystemZ] Ensure -mno-vx disables any use of vector features When passing the -vector feature to LLVM (or equivalently the -mno-vx command line argument to clang), the intent is that generated code must not use any vector features (in particular, no vector registers must be used). However, there are some cases where we still could generate such uses; these are all related to some of the additional vector features (like +vector-enhancements-1). Since none of those features are actually usable with -vector, just make sure we disable them all if -vector is given.	2020-07-23 15:34:59 +02:00
Jay Foad	b35833b84e	[GlobalISel][AMDGPU] Legalize saturating add/subtract Add support in LegalizerHelper for lowering G_SADDSAT etc. either using add/subtract-with-overflow or using max/min instructions. Enable this lowering for AMDGPU so it can be tested. The legalization rules are still approximate and skips out on using the clamp bit to treat these as legal, which has never been used before. This also doesn't yet try to deal with expanding SALU cases.	2020-07-23 09:06:42 -04:00
Craig Topper	ebe5f17f9c	[X86] Remove the DeprecatedMPX feature flag. We deprecated mpx feature in 10.0. I left this feature flag in case someone still had IR files containing the feature in a target-feature attribute. At the time I think I thought it would fail the test if the feature couldn't be found. Further review suggests that at worst it prints a message to stderr about ignoring the feature.	2020-07-22 17:44:07 -07:00
Craig Topper	b2c65beb14	[X86] Rework the "sahf" feature flag to only apply to 64-bit mode. SAHF/LAHF instructions are always available in 32-bit mode. Early 64-bit capable CPUs made the undefined opcodes in 64-bit mode. This was changed on later CPUs. We have a feature flag to control our usage of these instructions. This feature flag is hooked up to a clang command line option -msahf/-mno-sahf specifically to give control of the 64-bit mode behavior. In the backend X86Subtarget constructor we were explicitly forcing +sahf into the feature flag string if we were not compiling for 64-bit mode. This was intended to make the predicates always allow the instructions outside of 64-bit mode. Unfortunately, the way it was placed into the string allowed -mno-sahf from clang to disable SAHF instructions in 32-bit mode. This causes an assertion to fire if you compile a floating point comparison with something like "-march=pentium -mno-sahf" as our floating point comparison handling on CPUs that don't support FCOMI/FUCOMI instructions requires SAHF. To fix this, this commit restricts the feature flag to only apply to 64-bit mode by ignoring the flag outside 64-bit mode in X86Subtarget::hasLAHFSAHF(). This way we don't need to mess with the feature string at all.	2020-07-22 16:57:46 -07:00
Amy Kwan	5f11027395	[PowerPC][Power10] Fix vinsvlx instructions to have i32 arguments. Previously, the vinsvlx instructions were incorrectly defined with i64 as the second argument. This patches fixes this issue by correcting the second argument of the vins*vlx instructions/intrinsics to be i32. Differential Revision: https://reviews.llvm.org/D84277	2020-07-22 17:58:14 -05:00
Craig Topper	deeb2fdbf4	[X86] Remove a couple temporary std::string for CPU names that I don't need to exist. The input to these functions is a StringRef. We then convert it to a std::string. Then maybe replace with "generic". I think we can just overwrite the incoming StringRef with "generic" if needed and then pass it along without creating any std::string.	2020-07-22 15:55:04 -07:00
David Green	411eb87c79	[ARM] Fix missing MVE_VMUL_qr predicate This was missed out of `1030e82598`, but hopefully fixes the issues reported with NEON accidentally generating MVE instructions.	2020-07-22 20:43:02 +01:00
Amy Kwan	08b4a50e39	[PowerPC][Power10] Fix the Test LSB by Byte (xvtlsbb) Builtins Implementation The implementation of the xvtlsbb builtins/intrinsics were not correct as the intrinsics previously used i1 as an argument type. This patch changes the i1 argument type used in these intrinsics to be i32 instead, as having the second as an i1 can lead to issues in the backend. Differential Revision: https://reviews.llvm.org/D84291	2020-07-22 13:27:05 -05:00
Matt Arsenault	d26526fd09	AArch64: Use Register	2020-07-22 14:14:44 -04:00
Matt Arsenault	0c92bfa4b8	GlobalISel: Don't use virtual for distinguishing arg handlers There's no reason to involve the hassle of a virtual method targets have to override for a simple boolean. Not sure exactly what's going on with Mips, but it seems to define its own totally separate handler classes.	2020-07-22 14:14:43 -04:00
Matt Arsenault	6f437117af	AMDGPU: Don't assert on f16 inv2pi immediates pre-gfx8 v_cvt_f32_f16 can still accept this value as a literal constant. This showed up in GlobalISel since it doesn't have constant folding for G_FPEXT.	2020-07-22 13:59:03 -04:00
Matt Arsenault	b98f902f18	GlobalISel: Restructure argument lowering loop in handleAssignments This was structured in a way that implied every split argument is in memory, or in registers. It is possible to pass an original argument partially in registers, and partially in memory. Transpose the logic here to only consider a single piece at a time. Every individual CCValAssign should be treated independently, and any merge to original value needs to be handled later. This is in preparation for merging some preprocessing hacks in the AMDGPU calling convention lowering into the generic code. I'm also not sure what the correct behavior for memlocs where the promoted size is larger than the original value. I've opted to clamp the memory access size to not exceed the value register to avoid the explicit trunc/extend/vector widen/vector extract instruction. This happens for AMDGPU for i8 arguments that end up stack passed, which are promoted to i16 (I think this is a preexisting DAG bug though, and they should not really be promoted when in memory).	2020-07-22 13:31:11 -04:00
Matt Arsenault	1fd1beea18	AMDGPU/GlobalISel: Fix translation of indirect calls	2020-07-22 13:13:21 -04:00
David Green	8fa824d7a3	[ARM] Add predicated add reduction patterns Given a vecreduce.add(select(p, x, 0)), we can convert that to a predicated vaddv, as the else value for the select is the identity value, a zero. That is what this patch does for the vaddv, vaddva, vaddlv and vaddlva instructions, copying the existing patterns to also handle predication through a select. Differential Revision: https://reviews.llvm.org/D84101	2020-07-22 17:30:02 +01:00
Dmitry Preobrazhensky	0b8fd77ad9	[AMDGPU][MC] Corrected decoding of 16-bit literals 16-bit literals are encoded as 32-bit values. If high 16-bits of the value is 0xFFFF, the decoded instruction cannot be reassembled. For example, the following code 0xff,0x04,0x04,0x52,0xcd,0xab,0xff,0xff was decoded as v_mul_lo_u16_e32 v2, 0xffffabcd, v2 However this literal is actually a 64-bit constant 0x00000000ffffabcd which violates requirements described in the documentation - the truncation is not safe. This change corrects decoding to make reassembly possible. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D84098	2020-07-22 17:20:43 +03:00
Stefan Pintilie	a60251d739	[PowerPC] Add linker opt for PC Relative GOT indirect accesses A linker optimization is available on PowerPC for GOT indirect PCRelative loads. The idea is that we can mark a usual GOT indirect load: pld 3, vec@got@pcrel(0), 1 lwa 3, 4(3) With a relocation to say that if we don't need to go through the GOT we can let the linker further optimize this and replace a load with a nop. pld 3, vec@got@pcrel(0), 1 .Lpcrel1: .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8) lwa 3, 4(3) This patch adds the logic that allows the compiler to add the R_PPC64_PCREL_OPT. Reviewers: nemanjai, lei, hfinkel, sfertile, efriedma, tstellar, grosbach Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D79864	2020-07-22 09:08:23 -05:00
jasonliu	b98b1700ef	[XCOFF] Enable symbol alias for AIX Summary: AIX assembly's .set directive is not usable for aliasing purpose. We need to use extra-label-at-defintion strategy to generate symbol aliasing on AIX. Reviewed By: DiggerLin, Xiangling_L Differential Revision: https://reviews.llvm.org/D83252	2020-07-22 14:03:55 +00:00
Sebastian Neubauer	2a6c871596	[InstCombine] Move target-specific inst combining For a long time, the InstCombine pass handled target specific intrinsics. Having target specific code in general passes was noted as an area for improvement for a long time. D81728 moves most target specific code out of the InstCombine pass. Applying the target specific combinations in an extra pass would probably result in inferior optimizations compared to the current fixed-point iteration, therefore the InstCombine pass resorts to newly introduced functions in the TargetTransformInfo when it encounters unknown intrinsics. The patch should not have any effect on generated code (under the assumption that code never uses intrinsics from a foreign target). This introduces three new functions: TargetTransformInfo::instCombineIntrinsic TargetTransformInfo::simplifyDemandedUseBitsIntrinsic TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic A few target specific parts are left in the InstCombine folder, where it makes sense to share code. The largest left-over part in InstCombineCalls.cpp is the code shared between arm and aarch64. This allows to move about 3000 lines out from InstCombine to the targets. Differential Revision: https://reviews.llvm.org/D81728	2020-07-22 15:59:49 +02:00
David Green	f8abecf337	[ARM] Extra MVE select(binop) patterns This is very similar to 243970d03cace2, but handling a slightly different form of predicated operations. When starting with a pattern of the form select(p, BinOp(x, y), x), Instcombine will often transform this to BinOp(x, select(p, y, 0)), where 0 is the identity value of the binop (0 for adds/subs, 1 for muls, -1 for ands etc). This adds the patterns that transforms those back into predicated binary operations. There is also a very minor adjustment to tablegen null_frag in here, to allow it to also be recognized as a PatLeaf node, so that it can be used in MVE_TwoOpPattern to easily exclude the cases where we do not need the alternate transform. Differential Revision: https://reviews.llvm.org/D84091	2020-07-22 14:08:29 +01:00
David Green	3533e0a08d	[ARM] Add patterns for select(p, BinOp(x, y), z) -> BinOpT(x, y,p z) Most MVE instructions can be predicated to fold a select into the instruction, using the predicate and the selects else as a passthough. This adds tablegen patterns for most two operand instructions using the newly added TwoOpPattern from `1030e82598`. Differential Revision: https://reviews.llvm.org/D83222	2020-07-22 13:24:01 +01:00
Chen Zheng	36f9fe2d34	[PowerPC] fixupIsDeadOrKill start and end in different block fixing In fixupIsDeadOrKill, we assume StartMI and EndMI not exist in same basic block, so we add an assertion in that function. This is wrong before RA, as before RA the true definition may exist in another block through copy like instructions. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D83365	2020-07-22 06:27:13 -04:00
Sander de Smalen	bef56f7fe2	[AArch64][SVE] Correctly allocate scavenging slot in presence of SVE. This patch addresses two issues: * Forces the availability of the base-pointer (x19) when the frame has both scalable vectors and variable-length arrays. Otherwise it will be expensive to access non-SVE locals. * In presence of SVE stack objects, it will allocate the emergency scavenging slot close to the SP, so that they can be accessed from the SP or BP if available. If accessed from the frame-pointer, it will otherwise need an extra register to access the scavenging slot because of mixed scalable/non-scalable addressing modes. Reviewers: efriedma, ostannard, cameron.mcinally, rengolin, david-arm Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D70174	2020-07-22 10:50:36 +01:00
Simon Wallis	94e4e37d55	[Thumb] set code alignment for 16-bit load from constant pool Summary: [Thumb] set code alignment for 16-bit load from constant pool LLVM miscompiles this code when compiling for a target with v8.2-A FP16 and the Thumb ISA at -O0: extern void bar(__fp16 P5); int main() { __fp16 P5 = 1.96875; bar(P5); } The code section containing main has 2 byte alignment. It needs to have 4 byte alignment, because the load literal instruction has an offset from the load address with the low 2 bits zeroed. I do not include a test case in this check-in. llc and llvm-mc do not exhibit this bug. They do not set code section alignment in the same manner as clang. Reviewers: dnsampaio Reviewed By: dnsampaio Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D84169	2020-07-22 10:12:41 +01:00
Petar Avramovic	44967fc604	AMDGPU: Simplify f16 to i64 custom lowering Range that f16 can represent fits into i32. Lower as f16->i32->i64 instead of f16->f32->i64 since f32->i64 has long expansion. Differential Revision: https://reviews.llvm.org/D84166	2020-07-22 10:32:14 +02:00
David Spickett	3a34194606	[ARM] Fix Asm/Disasm of TBB/TBH instructions Summary: This fixes Bugzilla #46616 in which it was reported that "tbb [pc, r0]" was marked as SoftFail (aka unpredictable) incorrectly. Expected behaviour is: * ARMv8 is required to use sp as rn or rm (tbb/tbh only have a Thumb encoding so using Arm mode is not an option) * If rm is the pc then the instruction is always unpredictable Some of this was implemented already and this fixes the rest. Added tests cover the new and pre-existing handling. Reviewers: ostannard Reviewed By: ostannard Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D84227	2020-07-22 09:31:56 +01:00
Kai Luo	c3f9697f1f	[PowerPC] Fix wrong codegen when stack pointer has to realign performing dynalloc Current powerpc backend generates wrong code sequence if stack pointer has to realign if `-fstack-clash-protection` enabled. When probing dynamic stack allocation, current `PREPARE_PROBED_ALLOCA` takes `NegSizeReg` as input and returns `FinalStackPtr`. `FinalStackPtr=StackPtr+ActualNegSize` is calculated correctly, however code following `PREPARE_PROBED_ALLOCA` still uses value of `NegSizeReg`, which does not contain `ActualNegSize` if `MaxAlign > TargetAlign`, to calculate loop trip count and residual number of bytes. This patch is part of fix of https://bugs.llvm.org/show_bug.cgi?id=46759. Differential Revision: https://reviews.llvm.org/D84152	2020-07-22 06:35:12 +00:00
Kai Luo	8912252252	[PowerPC] Fix wrong codegen when stack pointer has to realign in prologue Current powerpc backend generates wrong code sequence if stack pointer has to realign if -fstack-clash-protection enabled. When probing in prologue, backend should generate a subtraction instruction rather than a `stux` instruction to realign the stack pointer. This patch is part of fix of https://bugs.llvm.org/show_bug.cgi?id=46759. Differential Revision: https://reviews.llvm.org/D84218	2020-07-22 06:35:12 +00:00
Kang Zhang	9bbf0ecff3	[PowerPC] Fix the implicit operands in PredicateInstruction() Summary: In the function `PPCInstrInfo::PredicateInstruction()`, we will replace non-Predicate Instructions to Predicate Instruction. But we forget add the new implicit operands the new Predicate Instruction needed. This patch is to fix this. Reviewed By: jsji, efriedma Differential Revision: https://reviews.llvm.org/D82390	2020-07-22 05:51:03 +00:00
Chen Zheng	e8425b27fe	[PowerPC] add store (load float) pattern to isProfitableToHoist store (load float) can be optimized to store(load i32) in InstCombine pass. Add store (load float) to isProfitableToHoist to make sure we don't break the opt in InstCombine pass. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D82341	2020-07-21 20:55:13 -04:00
Amy Kwan	1eb279d2a8	[PowerPC][Power10] Add Vector Multiply/Mod/Divide Instruction Definitions and MC Tests This patch adds the td definitions and asm/disasm tests for the following instructions: - Vector Multiply Low Doubleword: vmulld - Vector Modulus Word/Doubleword: vmodsw, vmoduw, vmodsd, vmodud - Vector Divide Word/Doubleword: vdivsw, vdivuw, vdivsd, vdivud - Vector Multiply High Word/Doubleword: vmulhsw, vmulhsd, vmulhuw, vmulhud - Vector Divide Extended Word/Doubleword: vdivesw, vdiveuw, vdivesd, vdiveud Differential Revision: https://reviews.llvm.org/D82929	2020-07-21 18:05:35 -05:00
Amara Emerson	791544422a	Revert "[AArch64][GlobalISel] Add post-legalize combine for sext_inreg(trunc(sextload)) -> copy" This reverts commit `64eb3a4915`. It caused miscompiles with optimizations enabled. Reverting while I investigate.	2020-07-21 16:01:18 -07:00
Amara Emerson	f1ae96d9bf	[AArch64][GlobalISel] Fix TLS accesses clobbering registers incorrectly. This was happening because the BLR didn't have a use of the X0 arg register, which would end up being re-used in high reg pressure situations. The change also avoids hard coding the use of X0 for the sequence except to copy the value for the call. ld64 should still be able to optimize it. rdar://65438258	2020-07-21 16:01:17 -07:00
Matt Arsenault	b258920095	AMDGPU/GlobalISel: Fix not erasing inst when lowering G_FRINT	2020-07-21 18:29:41 -04:00
Matt Arsenault	7cd8a0256d	GlobalISel: Legalize G_FPOWI	2020-07-21 18:13:04 -04:00
Matt Arsenault	1168119c2f	AMDGPU: Start interpreting byref on kernel arguments These are treated identically to value aggregates placed in the kernel argument list. A %struct.foo or %struct.foo addrspace(4)* byref(sizeof(%struct.foo)) align(alignof(%struct.foo)) argument should produce the same offsets and argument metadata. This handles all 3 kernel ABI implementations, and the two HSA metadata emission paths.	2020-07-21 18:11:22 -04:00
Simon Pilgrim	5b5dc2442a	[X86][AVX] getTargetShuffleMask - don't decode VBROADCAST(EXTRACT_SUBVECTOR(X,0)) patterns. getTargetShuffleMask is used by the various "SimplifyDemanded" folds so we can't assume that the bypassed extract_subvector can be safely simplified - getFauxShuffleMask performs a more general decode that allows us to more safely catch many of these cases so the impact is minimal.	2020-07-21 21:55:44 +01:00
Matt Arsenault	2fe0ea8261	DAG: Handle expanding strict_fsub into fneg and strict_fadd The AMDGPU handling of f16 vectors is terrible still since it gets scalarized even when the vector operation is legal. The code is is essentially duplicated between the non-strict and strict case. Apparently no other expansions are currently trying to do this. This is mostly because I found the behavior of getStrictFPOperationAction to be confusing. In the ARM case, it would expand strict_fsub even though it shouldn't due to the later check. At that point, the logic required to check for legality was more complex than just duplicating the 2 instruction expansion.	2020-07-21 16:17:10 -04:00
diggerlin	11546898e2	[AIX][XCOFF]emit extern linkage for the llvm intrinsic symbol SUMMARY: when we call memset, memcopy,memmove etc(this are llvm intrinsic function) in the c source code. the llvm will generate IR like call call void @llvm.memset.p0i8.i32(i8* align 4 bitcast (%struct.S* @s to i8), i8 %1, i32 %2, i1 false) for c source code bash> cat test_memset.call struct S{ int a; int b; }; extern struct S s; void bar() { memset(&s, s.b, s.b); } like %struct.S = type { i32, i32 } @s = external global %struct.S, align 4 ; Function Attrs: noinline nounwind optnone define void @bar() #0 { entry: %0 = load i32, i32 getelementptr inbounds (%struct.S, %struct.S* @s, i32 0, i32 1), align 4 %1 = trunc i32 %0 to i8 %2 = load i32, i32* getelementptr inbounds (%struct.S, %struct.S* @s, i32 0, i32 1), align 4 call void @llvm.memset.p0i8.i32(i8* align 4 bitcast (%struct.S* @s to i8), i8 %1, i32 %2, i1 false) ret void } declare void @llvm.memset.p0i8.i32(i8 nocapture writeonly, i8, i32, i1 immarg) #1 If we want to let the aix as assembly compile pass without -u it need to has following assembly code. .extern .memset (we do not output extern linkage for llvm instrinsic function. even if we output the extern linkage for llvm intrinsic function, we should not out .extern llvm.memset.p0i8.i32, instead of we should emit .extern memset) for other llvm buildin function floatdidf . even if we do not call these function floatdidf in the c source code(the generated IR also do not the call __floatdidf . the function call was generated in the LLVM optimized. the function is not in the functions list of Module, but we still need to emit extern .__floatdidf The solution for it as : We record all the lllvm intrinsic extern symbol when transformCallee(), and emit all these symbol in the AsmPrinter::doFinalization(Module &M) Reviewers: jasonliu, Sean Fertile, hubert.reinterpretcast, Differential Revision: https://reviews.llvm.org/D78929	2020-07-21 16:03:04 -04:00
David Green	1030e82598	[ARM] Add MVE_TwoOpPattern. NFC This commons out a chunk of the different two operand MVE patterns into a single helper multidef. Or technically two multidef patterns so that the Dup qr patterns can also get the same treatment. This is most of the two address instructions that we have some codegen pattern for (not ones that we select purely from intrinsics). It does not include shifts, which are more spread out and will need some extra work to be given the same treatment. Differential Revision: https://reviews.llvm.org/D83219	2020-07-21 19:51:37 +01:00
Sander de Smalen	9bacf15885	[AArch64][SVE] Fix PCS for functions taking/returning scalable types. The default calling convention needs to save/restore the SVE callee saves according to the SVE PCS when the function takes or returns scalable types, even when the `aarch64_sve_vector_pcs` CC is not specified for the function. Reviewers: efriedma, paulwalker-arm, david-arm, rengolin Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D84041	2020-07-21 15:55:39 +01:00
Petre-Ionut Tudor	1af9fc8213	[ARM] Generate [SU]HADD from ((a + b) >> 1) Summary: Teach LLVM to recognize the above pattern, where the operands are either signed or unsigned types. Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83777	2020-07-21 13:22:07 +01:00
David Green	30371df85f	[ARM] More unpredictable VCVT instructions. These extra vcvt instructions were missed from `74ca67c109` because they live in a different Domain, but should be treated in the same way. Differential Revision: https://reviews.llvm.org/D83204	2020-07-21 07:24:37 +01:00
Matt Arsenault	107c954c13	AMDGPU/GlobalISel: Remove unnecessary parameter	2020-07-20 20:53:01 -04:00
Artem Belevich	bf66003a4f	[MC,NVPTX] Add MCAsmPrinter support for unsigned-only data directives. PTX does not support negative values in .bNN data directives and we must typecast such values to unsigned before printing them. MCAsmInfo can now specify whether such casting is necessary for particular target. Differential Revision: https://reviews.llvm.org/D83423	2020-07-20 16:24:41 -07:00
Eli Friedman	b8f765a1e1	[AArch64][SVE] Add support for trunc to <vscale x N x i1>. This isn't a natively supported operation, so convert it to a mask+compare. In addition to the operation itself, fix up some surrounding stuff to make the testcase work: we need concat_vectors on i1 vectors, we need legalization of i1 vector truncates, and we need to fix up all the relevant uses of getVectorNumElements(). Differential Revision: https://reviews.llvm.org/D83811	2020-07-20 13:11:02 -07:00
Yuanfang Chen	ca1e69a675	[NFC] remove unused includes of SelectionDAGISel.h	2020-07-20 10:43:29 -07:00
Yuanfang Chen	589c646a7e	[llc] (almost) remove `--print-machineinstrs` Its effect could be achieved by `-stop-after`,`-print-after`,`-print-after-all`. But a few tests need to print MIR after ISel which could not be done with `-print-after`/`-stop-after` since isel pass does not have commandline name. That's the reason `--print-machineinstrs` is downgraded to `--print-after-isel` in this patch. `--print-after-isel` could be removed after we switch to new pass manager since isel pass would have a commandline text name to use `print-after` or equivalent switches. The motivation of this patch is to reduce tests dependency on would-be-deprecated feature. Reviewed By: arsenm, dsanders Differential Revision: https://reviews.llvm.org/D83275	2020-07-20 10:43:28 -07:00
Matt Arsenault	ce76d15a70	AMDGPU: Use MCRegister for preloaded arguments Attempt to fix build error with ancient GCC	2020-07-20 13:34:28 -04:00
Matt Arsenault	21ef01b7e3	AMDGPU: Remove outdated fixme	2020-07-20 11:41:41 -04:00
Matt Arsenault	84704d989b	AMDGPU: Fix not accounting for constantexpr uses of LDS globals This was failing to add the size of LDS globals that weren't directly used by an instruction. They could be used by constant expressions which are transitively used by the function. This requires a better search, but just abort on this for now for correctness.	2020-07-20 11:41:41 -04:00
Matt Arsenault	61f1f2a204	AMDGPU/GlobalISel: Initial Implementation of calls Return values, and tail calls are not yet handled.	2020-07-20 11:13:22 -04:00
Matt Arsenault	ad8e900cb3	Verifier: Disallow byval and similar for AMDGPU calling conventions These imply stack-like semantics, which doesn't make any sense for entry points.	2020-07-20 10:58:57 -04:00
Simon Pilgrim	017e5c949b	MCFixup.h - remove unnecessary MCExpr.h include. NFCI. Move the include down to files that actually depend on MCExpr definitions. Also exposes an implicit dependency on MCContext in AVRAsmBackend.h	2020-07-20 15:17:19 +01:00
Petar Avramovic	6a1030aa0e	AMDGPU/GlobalISel: Legalize s16->s64 G_FPEXT Legalize using narrowScalar as s16->s32 G_FPEXT followed by s32->s64 G_FPEXT. Differential Revision: https://reviews.llvm.org/D84030	2020-07-20 16:12:19 +02:00
Matt Arsenault	100564bdf8	AMDGPU/GlobalISel: Remove outdated comment	2020-07-20 10:06:18 -04:00
Matt Arsenault	93311a9812	AMDGPU/GlobalISel: Fix custom lowering of llvm.trunc.f64 for SI This was missing an operand from BFE and not erasing the original instruction.	2020-07-20 10:06:18 -04:00
Paul Walker	6384ec4099	[SVE] Add lowering for fixed length vector fdiv, fma, fmul and fsub operations. Differential Revision: https://reviews.llvm.org/D84034	2020-07-20 11:57:34 +00:00
Tim Northover	88464a55b4	AArch64: emit @llvm.debugtrap as `brk #0xf000` on all platforms It's useful for a debugger to be able to distinguish an @llvm.debugtrap from a (noreturn) @llvm.trap, so this extends the existing Windows behaviour to other platforms.	2020-07-20 10:31:26 +01:00
Petar Avramovic	ba938f6388	AMDGPU/GlobalISel: Legalize s16->s64 G_FPTOSI/G_FPTOUI Add narrowScalarFor action. Add narrow scalar for typeIndex == 0 for G_FPTOSI/G_FPTOUI. Legalize using narrowScalarFor as s16->s32 G_FPTOSI/G_FPTOUI followed by s32->s64 G_SEXT/G_ZEXT. Differential Revision: https://reviews.llvm.org/D84010	2020-07-20 11:06:11 +02:00
Sanjay Patel	50afa18772	[x86] split FMA with fast-math-flags to avoid libcall fma reassoc A, B, C --> fadd (fmul A, B), C (when target has no FMA hardware) C/C++ code may use explicit fma() calls (which become LLVM fma intrinsics in IR) but then gets compiled with -ffast-math or similar. For targets that do not have FMA hardware, we don't want to go out to the math library for a precise but slow FMA result. I tried this as a generic DAGCombine, but it caused infinite looping on more than 1 other target, so there's likely some over-reaching fma formation happening. There's also a potential intersection of strict FP with fast-math here. Deferring to current behavior for that case (assuming that strict-ness overrides fast-ness). Differential Revision: https://reviews.llvm.org/D83981	2020-07-19 10:03:55 -04:00
David Green	3504acc33e	[ARM] Don't mark vctp as having sideeffects As far as I can tell, it should not be necessary for VCTP to be unpredictable in tail predicated loops. Either it has a a valid loop counter as a operand which will naturally keep it in the right loop, or it doesn't and it won't be converted to a tail predicated loop. Not marking it as having side effects allows it to be scheduled more cleanly for cases where it is not expected to become a tail predicate loop. Differential Revision: https://reviews.llvm.org/D83907	2020-07-19 09:28:09 +01:00
Kang Zhang	d37befdfe5	[PowerPC] Remove the redundant implicit operands in ppc-early-ret pass Summary: In the `ppc-early-ret` pass, we have use `BuildMI` and `copyImplicitOps` when the branch instructions can do the early return. But the two functions will add implicit operands twice, this is not correct. This patch is to remove the redundant implicit operands in `ppc-early-ret pass`. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D76042	2020-07-19 07:01:45 +00:00
Dmitry Preobrazhensky	2e87acac9b	[AMDGPU] Removed s_mov_regrd and mov_fed opcodes These opcodes are not intended for public use. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D81659	2020-07-17 19:52:54 +03:00
Yonghong Song	0e347c0ff0	BPF: generate .rodata BTF datasec for certain initialized local var's Currently, BTF datasec type for .rodata is generated only if there are user-defined readonly global variables which have debuginfo generated. Certain readonly global variables may be generated from initialized local variables. For example, void foo(const void *); int test() { const struct { unsigned a[4]; char b; } val = { .a = {2, 3, 4, 5}, .b = 6 }; foo(&val); return 0; } The clang will create a private linkage const global to store the initialized value: @__const.test.val = private unnamed_addr constant %struct.anon { [4 x i32] [i32 2, i32 3, i32 4, i32 5], i8 6 }, align 4 This global variable eventually is put in .rodata ELF section. If there is .rodata ELF section, libbpf expects a BTF .rodata datasec as well even though it may be empty meaning there are no global readonly variables with proper debuginfo. Martin reported a bug where without this empty BTF .rodata datasec, the bpftool gen will exit with an error. This patch fixed the issue by generating .rodata BTF datasec if there exists local var intial data which will result in .rodata ELF section. Differential Revision: https://reviews.llvm.org/D84002	2020-07-17 09:45:57 -07:00
Matt Arsenault	994fb86bc2	AMDGPU: Fix promoting f16 fpowi with legal f16	2020-07-17 11:29:05 -04:00
Jay Foad	760af7a074	[AMDGPU] Avoid splitting FLAT offsets in unsafe ways As explained in the comment: // For a FLAT instruction the hardware decides whether to access // global/scratch/shared memory based on the high bits of vaddr, // ignoring the offset field, so we have to ensure that when we add // remainder to vaddr it still points into the same underlying object. // The easiest way to do that is to make sure that we split the offset // into two pieces that are both >= 0 or both <= 0. In particular FLAT (as opposed to SCRATCH and GLOBAL) instructions have an unsigned immediate offset field, so we can't use it to help split a negative offset. Differential Revision: https://reviews.llvm.org/D83394	2020-07-17 11:44:10 +01:00
Simon Wallis	3e0ccf9a90	[ARM] halfword store hits llvm_unreachable with big-endian Summary: [ARM] halfword store hits llvm_unreachable with big-endian Provide missing case in getFixupKindContainerSizeBytes(). This stops execution reaching llvm_unreachable("Unknown fixup kind!") D83947 Reviewers: olista01, ostannard Reviewed By: ostannard Subscribers: ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83947 Change-Id: I598aa1fb51fd1c6f424c557c85d6df6d1958bc62	2020-07-17 08:56:44 +01:00
hsmahesha	4905536086	Revert "[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size" This reverts commit `cc9d693856`.	2020-07-17 12:20:37 +05:30
Craig Topper	6bba95831e	[X86] Change the scheduler model for 'pentium4' to SandyBridgeModel. I meant to do this in D83913, but missed it while updating the feature list. Interestingly I think this is disabling the postRA scheduler. But it does match our default 64-bit behavior. Reviewed By: echristo Differential Revision: https://reviews.llvm.org/D83996	2020-07-16 22:04:29 -07:00
Craig Topper	addbf732c8	[X86] Reorder how the subtarget map key is created. We use a SmallString<512> and attempted to reserve enough space for CPU plus Features, but that doesn't account for all the things that get added to the string. Reorder the string so the shortest things go first which shouldn't exceed the small size. Finally add the feature string at the end which might be long. This should ensure at most one heap allocation without needing to use reserve. I don't know if this matters much in practice, but I was looking into something else that will require more code here and noticed the odd reserve call.	2020-07-16 21:41:45 -07:00
Carl Ritson	3a18665748	[AMDGPU] Translate s_and/s_andn2 to s_mov in vcc optimisation When SCC is dead, but VCC is required then replace s_and / s_andn2 with s_mov into VCC when mask value is 0 or -1. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D83850	2020-07-17 11:48:57 +09:00
Albion Fung	c273563552	[PowerPC][Power10] Add 128-bit Binary Integer Operation instruction definitions and MC Tests This patch adds the instruction definitions and MC tests for the 128-bit Binary Integer Operation instructions introduced in Power10. Differential Revision: https://reviews.llvm.org/D83516	2020-07-16 17:16:43 -05:00
Wouter van Oortmerssen	cc1b9b680f	[WebAssembly] 64-bit (function) pointer fixes. Accounting for the fact that Wasm function indices are 32-bit, but in wasm64 we want uniform 64-bit pointers. Includes reloc types for 64-bit table indices. Differential Revision: https://reviews.llvm.org/D83729	2020-07-16 14:10:22 -07:00
Craig Topper	5408024fa8	[X86] Move integer hadd/hsub formation into a helper function shared by combineAdd and combineSub. There was a lot of duplicate code here for checking the VT and subtarget. Moving it into a helper avoids that. It also fixes a bug that combineAdd reused Op0/Op1 after a call to isHorizontalBinOp may have changed it. The new helper function has its own local version of Op0/Op1 that aren't shared by other code. Fixes PR46455. Reviewed By: spatel, bkramer Differential Revision: https://reviews.llvm.org/D83971	2020-07-16 13:27:27 -07:00
Craig Topper	ad171d24b9	[X86] Change the tuning settings for pentium4 to be more modern since its the default 32-bit cpu in clang Alternative to D83897. I believe the big change here is that I removed slow unaligned memory 16 Down side that it may adversely effect tuning if someone explicitly targets -march=pentium4 and expects pentium4 tuned code. Of course pentium4 is so old our default behavior with the previous settings may not have been the best either. Reviewed By: echristo, RKSimon Differential Revision: https://reviews.llvm.org/D83913	2020-07-16 12:51:25 -07:00
Zakk Chen	294d1eae75	[RISCV] Add support for -mcpu option. Summary: 1. gcc uses `-march` and `-mtune` flag to chose arch and pipeline model, but clang does not have `-mtune` flag, we uses `-mcpu` to chose both infos. 2. Add SiFive e31 and u54 cpu which have default march and pipeline model. 3. Specific `-mcpu` with rocket-rv[32\|64] would select pipeline model only, and use the driver's arch choosing logic to get default arch. Reviewers: lenary, asb, evandro, HsiangKai Reviewed By: lenary, asb, evandro Tags: #llvm, #clang Differential Revision: https://reviews.llvm.org/D71124	2020-07-16 11:46:22 -07:00
Thomas Lively	ecb2e5bcd7	[WebAssembly] Implement v128.select Although the SIMD spec proposal does not specifically include a select instruction, the select instruction in MVP WebAssembly is polymorphic over the selected types, so it is able to work on v128 values when they are enabled. This patch introduces a new variant of the select instruction for each legal vector type. Additional ISel patterns are adapted from the SELECT_I32 and SELECT_I64 patterns. Depends on D83736. Differential Revision: https://reviews.llvm.org/D83737	2020-07-16 11:37:25 -07:00
Matt Arsenault	1912ace968	AMDGPU: Move handling of AGPR copies to a separate function This is in preparation for fixing multiple problems with the way AGPR copies are handled, but this change is NFC itself. First, it's relying on recursively calling copyPhysReg, which is losing information necessary to get correct super register handling. Second, it's constructing a new RegScavenger and doing a O(N^2) walk on every single sub-spill for every AGPR tuple copy. Third, it's using the forward form of the scavenger, and not using the preferred backwards scan.	2020-07-16 14:32:24 -04:00
Thomas Lively	f0f9787646	[WebAssembly] Lower vselect to v128.bitselect We were previously expanding vselect and matching on the expansion to generate bitselects, but in some cases the expansion would be further combined and a bitselect would not get generated. This patch improves codegen in those cases by legalizing vselect and lowering it to v128.bitselect. The old pattern that matches the expansion is still useful for lowering IR that already uses the expansion rather than a select operation. Differential Revision: https://reviews.llvm.org/D83734	2020-07-16 11:11:19 -07:00
Craig Topper	9adf7461f7	[X86] Add test case for PR46455.	2020-07-16 11:06:55 -07:00
Matt Arsenault	023883a834	IR: Rename Argument::hasPassPointeeByValueAttr to prepare for byref When the byref attribute is added, there will need to be two similar functions for the existing cases which have an associate value copy, and byref which does not. Most, but not all of the existing uses will use the existing version. The associated size function added by D82679 also needs to contextually differ, and will help eliminate a few places still relying on pointee element types.	2020-07-16 13:50:49 -04:00
Matt Arsenault	219a9fea14	AMDGPU: Rename gfx9 version of v_add_i32/v_sub_i32 The carry-out opcode is renamed, so eliminate the deceptive _gfx9, which looked like the encoded instruction. The real encoded version was named _gfx9_gfx9. Move it into the VI encoding namespace. The gfx9 namespace is just to deal with the renamed instructions that reinterpret the opcode. When codegened, it would fail to find the real instruction since it wasn't in the right namespace.	2020-07-16 13:32:05 -04:00
Matt Arsenault	79f67cae91	AMDGPU: Rename add/sub with carry out instructions The hardware has created a real mess in the naming for add/sub, which have been renamed basically every generation. Switch the carry out pseudos to have the gfx9/gfx10 names. We were using the original SI/CI v_add_i32/v_sub_i32 names. Later targets reintroduced these names as carryless instructions with a saturating clamp bit, which we do not define. Do this rename so we can unambiguously add these missing instructions. The carry-in versions should also be renamed, but at least those had a consistent _u32 name to begin with. The 16-bit instructions were also renamed, but aren't ambiguous. This does regress assembler error message quality in some cases. In mismatched wave32/wave64 situations, this will switch from "unsupported instruction" to "invalid operand", with the error pointing at the wrong position. I couldn't quite follow how the assembler selects these, but the previous behavior seemed accidental to me. It looked like there was a partial attempt to handle this which was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it isn't used for anything).	2020-07-16 13:16:30 -04:00
Petar Avramovic	6850033ca6	AMDGPU/GlobalISel: Legalize s64->s16 G_SITOFP/G_UITOFP Add widenScalar for TypeIdx == 0 for G_SITOFP/G_UITOFP. Legailize, using widenScalar, as s64->s32 G_SITOFP/G_UITOFP followed by s32->s16 G_FPTRUNC. Differential Revision: https://reviews.llvm.org/D83880	2020-07-16 16:31:57 +02:00
David Green	7bbde17e62	[ARM] Add a PreferNoCSEL option. NFC This disables CSEL, falling back to the old predicated move behaviour for cases where that is useful for debugging.	2020-07-16 12:42:07 +01:00
Paul Walker	509351d768	[SVE] Add lowering for scalable vector fadd, fdiv, fmul and fsub operations. Lower the operations to predicated variants. This is prep work required for fixed length code generation but also fixes a bug whereby these operations fail selection when "unpacked" vector types (e.g. MVT::nxv2f32) are used. This patch also adds the missing "unpacked" patterns for FMA. Differential Revision: https://reviews.llvm.org/D83765	2020-07-16 11:31:35 +00:00
Roman Lebedev	4028409d77	Reland "[NFC] SimplifyCFGOptions: drop multi-parameter ctor, use default member-init" This reverts commit `5831e86190`, which reverted commit `90c1b0442a` in preparation for reverting commit `b2018198c3` in commit `1067d3e176` due to the introducton of a dependency cycle. Now that the other revert is reverted with a fix, this can be relanded.	2020-07-16 13:40:01 +03:00
Roman Lebedev	fb432a51f4	Reland "[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions" This reverts commit `1067d3e176`, which reverted commit `b2018198c3`, because it introduced a Dependency Cycle between Transforms/Scalar and Transforms/Utils. So let's just move SimplifyCFGOptions.h into Utils/, thus avoiding the cycle.	2020-07-16 13:40:01 +03:00
Pavel Iliin	b9a6fb6428	[ARM] VBIT/VBIF support added. Vector bitwise selects are matched by pseudo VBSP instruction and expanded to VBSL/VBIT/VBIF after register allocation depend on operands registers to minimize extra copies.	2020-07-16 11:25:53 +01:00
David Green	146d35b6ee	[ARM] CSEL generation This adds a peephole optimisation to turn a t2MOVccr that could not be folded into any other instruction into a CSEL on 8.1-m. The t2MOVccr would usually be expanded into a conditional mov, that becomes an IT; MOV pair. We can instead generate a CSEL instruction, which can potentially be smaller and allows better register allocation freedom, which can help reduce codesize. Performance is more variable and may depend on the micrarchitecture details, but initial results look good. If we need to control this per-cpu, we can add a subtarget feature as we need it. Original patch by David Penry. Differential Revision: https://reviews.llvm.org/D83566	2020-07-16 11:10:53 +01:00
Kerry McLaughlin	2762da0a16	[SVE][CodeGen] Legalisation of masked loads and stores Summary: This patch modifies IncrementMemoryAddress to use a vscale when calculating the new address if the data type is scalable. Also adds tablegen patterns which match an extract_subvector of a legal predicate type with zip1/zip2 instructions Reviewers: sdesmalen, efriedma, david-arm Reviewed By: efriedma, david-arm Subscribers: tschuett, hiraditya, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83137	2020-07-16 10:55:45 +01:00
Petar Avramovic	5658002b80	AMDGPU/GlobalISel: Select G_FREEZE Select G_FREEZE in the same way that COPY is selected. Differential Revision: https://reviews.llvm.org/D83031	2020-07-16 11:10:48 +02:00
Adrian Kuegel	1067d3e176	Revert "[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions" This reverts commit `b2018198c3`. This commit introduced a Dependency Cycle between Transforms/Scalar and Transforms/Utils. Transforms/Scalar already depends on Transforms/Utils, so if SimplifyCFGOptions.h is moved to Scalar, and Utils/Local.h still depends on it, we have a cycle.	2020-07-16 10:54:10 +02:00
Adrian Kuegel	5831e86190	Revert "[NFC] SimplifyCFGOptions: drop multi-parameter ctor, use default member-init" This reverts commit `90c1b0442a`. This is based on another commit which also needs to be reverted. The other commit introduced a Dependency Cycle between Transforms/Scalar and TransformUtils. Scalar already depends (in many ways) on TransformUtils, so making TransformUtils depend on Scalar should be avoided.	2020-07-16 10:32:50 +02:00
Craig Topper	71b49aa438	[X86] Allow lsl/lar to be parsed with a GR16, GR32, or GR64 as source register. This matches GNU assembler behavior. Operand size is determined only from the destination register.	2020-07-15 23:51:37 -07:00
Amy Kwan	fc55308628	[PowerPC][Power10] Fix VINS* (vector insert byte/half/word) instructions to have i32 arguments. Previously, the vins* intrinsic was incorrectly defined to have its second and third argument arguments as an i64. This patch fixes the second and third argument of the vins* instruction and intrinsic to have i32s instead. Differential Revision: https://reviews.llvm.org/D83497	2020-07-16 00:30:24 -05:00
Carl Ritson	5bf2a9dd40	[AMDGPU] Update VMEM scalar write hazard mitigation sequence Using s_waitcnt_depctr 0xffe3 is potentially faster than v_nop. Reviewed By: rampitec, foad Differential Revision: https://reviews.llvm.org/D83872	2020-07-16 11:37:45 +09:00
dfukalov	76a0c0ee6f	[AMDGPU][CostModel] Improve cost estimation for fused {fadd\|fsub}(a,fmul(b,c)) Summary: If result of fmul(b,c) has one use, in almost all cases (except denormals are IEEE) the pair of operations will be fused in one fma/mad/mac/etc. Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits, kerbowa Tags: #llvm Differential Revision: https://reviews.llvm.org/D83919	2020-07-16 03:06:38 +03:00
Roman Lebedev	90c1b0442a	[NFC] SimplifyCFGOptions: drop multi-parameter ctor, use default member-init Likewise, just use the builder pattern. Taking multiple params is unmaintainable.	2020-07-16 01:48:34 +03:00
Roman Lebedev	b2018198c3	[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions Taking so many parameters is simply unmaintainable. We don't want to include the entire llvm/Transforms/Utils/Local.h into llvm/Transforms/Scalar.h so i've split SimplifyCFGOptions into it's own header.	2020-07-16 01:27:54 +03:00
Craig Topper	3c2a56a857	[X86] Teach assembler parser to accept lsl and lar with a 64 or 32 source register when the destination is a 64 register. Previously we only accepted a 32-bit source with a 64-bit dest. Accepting 64-bit as well is more consistent with gas behavior. I think maybe we should accept 16 bit register as well, but I'm not sure.	2020-07-15 15:17:06 -07:00
Dmitry Preobrazhensky	e122eba185	[AMDGPU][MC] Corrected MTBUF parsing and decoding MTBUF implementation has many issues and this change addresses most of these: - refactored duplicated code; - hardcoded constants moved out of high-level code; - fixed a decoding error when nfmt or dfmt are zero (bug 36932); - corrected parsing of operand separators (bug 46403); - corrected handling of missing operands (bug 46404); - corrected handling of out-of-range modifiers (bug 46421); - corrected default value (bug 46467). Reviewers: arsenm, rampitec, vpykhtin, artem.tamazov, kzhuravl Differential Revision: https://reviews.llvm.org/D83760	2020-07-15 19:46:00 +03:00
YunQiang Su	3a6c2a61c6	[mips] Rename FeatureMadd4 to FeatureNoMadd4. NFC `FeatureMadd4` is used to disable `madd4`, and the corresponding feature option is `(+-)nomadd4`. Renaming to the `FeatureNoMadd4` makes its purpose clear. Patch by YunQiang Su. Differential Revision: https://reviews.llvm.org/D83780	2020-07-15 14:39:38 +03:00
lewis-revill	c9c955ada8	[RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbt asm instructions This patch provides optimization of bit manipulation operations by enabling the +experimental-b target feature. It adds matching of single block patterns of instructions to specific bit-manip instructions from the ternary subset (zbt subextension) of the experimental B extension of RISC-V. It adds also the correspondent codegen tests. This patch is based on Claire Wolf's proposal for the bit manipulation extension of RISCV: https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf Differential Revision: https://reviews.llvm.org/D79875	2020-07-15 12:19:34 +01:00
lewis-revill	d4be33374c	[RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbs asm instructions This patch provides optimization of bit manipulation operations by enabling the +experimental-b target feature. It adds matching of single block patterns of instructions to specific bit-manip instructions from the single-bit subset (zbs subextension) of the experimental B extension of RISC-V. It adds also the correspondent codegen tests. This patch is based on Claire Wolf's proposal for the bit manipulation extension of RISCV: https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf Differential Revision: https://reviews.llvm.org/D79874	2020-07-15 12:19:34 +01:00
lewis-revill	6144f0a1e5	[RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbbp asm instructions This patch provides optimization of bit manipulation operations by enabling the +experimental-b target feature. It adds matching of single block patterns of instructions to specific bit-manip instructions belonging to both the permutation and the base subsets of the experimental B extension of RISC-V. It adds also the correspondent codegen tests. This patch is based on Claire Wolf's proposal for the bit manipulation extension of RISCV: https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf Differential Revision: https://reviews.llvm.org/D79873	2020-07-15 12:19:34 +01:00
lewis-revill	31b52b4345	[RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbp asm instructions This patch provides optimization of bit manipulation operations by enabling the +experimental-b target feature. It adds matching of single block patterns of instructions to specific bit-manip instructions from the permutation subset (zbp subextension) of the experimental B extension of RISC-V. It adds also the correspondent codegen tests. This patch is based on Claire Wolf's proposal for the bit manipulation extension of RISCV: https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf Differential Revision: https://reviews.llvm.org/D79871	2020-07-15 12:19:34 +01:00
lewis-revill	e2692f0ee7	[RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbb asm instructions This patch provides optimization of bit manipulation operations by enabling the +experimental-b target feature. It adds matching of single block patterns of instructions to specific bit-manip instructions from the base subset (zbb subextension) of the experimental B extension of RISC-V. It adds also the correspondent codegen tests. This patch is based on Claire Wolf's proposal for the bit manipulation extension of RISCV: https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf Differential Revision: https://reviews.llvm.org/D79870	2020-07-15 12:19:34 +01:00
Jessica Clarke	2dc16fbdf0	[RISCV] Duplicate pseudo expansion comment to RISCVMCCodeEmitter Follow-on from D77443. Although we're not fixing any of these pseudo-instructions, the potential for them to be out of sync still exists.	2020-07-15 10:52:42 +01:00
Jessica Clarke	3382c243ba	[RISCV] Fix RISCVInstrInfo::getInstSizeInBytes for atomics pseudos Summary: Without these, the generic branch relaxation pass will underestimate the range required for branches spanning these and we can end up with "fixup value out of range" errors rather than relaxing the branches. Some of the instructions in the expansion may end up being compressed but exactly determining that is awkward, and these conservative values should be safe, if slightly suboptimal in rare cases. Reviewers: asb, lenary, luismarques, lewis-revill Reviewed By: asb, luismarques Subscribers: hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, jfb, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, evandro, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77443	2020-07-15 10:50:55 +01:00
Carl Ritson	674226126d	[AMDGPU] Apply pre-emit s_cbranch_vcc optimation to more patterns Add handling of s_andn2 and mask of 0. This eliminates redundant instructions from uniform control flow. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D83641	2020-07-15 11:02:35 +09:00
Krzysztof Pszeniczny	c3e6555616	Call Frame Information (CFI) Handling for Basic Block Sections This patch handles CFI with basic block sections, which unlike DebugInfo does not support ranges. The DWARF standard explicitly requires emitting separate CFI Frame Descriptor Entries for each contiguous fragment of a function. Thus, the CFI information for all callee-saved registers (possibly including the frame pointer, if necessary) have to be emitted along with redefining the Call Frame Address (CFA), viz. where the current frame starts. CFI directives are emitted in FDE’s in the object file with a low_pc, high_pc specification. So, a single FDE must point to a contiguous code region unlike debug info which has the support for ranges. This is what complicates CFI for basic block sections. Now, what happens when we start placing individual basic blocks in unique sections: * Basic block sections allow the linker to randomly reorder basic blocks in the address space such that a given basic block can become non-contiguous with the original function. * The different basic block sections can no longer share the cfi_startproc and cfi_endproc directives. So, each basic block section should emit this independently. * Each (cfi_startproc, cfi_endproc) directive will result in a new FDE that caters to that basic block section. * Now, this basic block section needs to duplicate the information from the entry block to compute the CFA as it is an independent entity. It cannot refer to the FDE of the original function and hence must duplicate all the stuff that is needed to compute the CFA on its own. * We are working on a de-duplication patch that can share common information in FDEs in a CIE (Common Information Entry) and we will present this as a follow up patch. This can significantly reduce the duplication overhead and is particularly useful when several basic block sections are created. * The CFI directives are emitted similarly for registers that are pushed onto the stack, like callee saved registers in the prologue. There are cfi directives that emit how to retrieve the value of the register at that point when the push happened. This has to be duplicated too in a basic block that is floated as a separate section. Differential Revision: https://reviews.llvm.org/D79978	2020-07-14 12:54:12 -07:00
Logan Smith	a19461d9e1	[NFC] Add 'override' keyword where missing in include/ and lib/. This fixes warnings raised by Clang's new -Wsuggest-override, in preparation for enabling that warning in the LLVM build. This patch also removes the virtual keyword where redundant, but only in places where doing so improves consistency within a given file. It also removes a couple unnecessary virtual destructor declarations in derived classes where the destructor inherited from the base class is already virtual. Differential Revision: https://reviews.llvm.org/D83709	2020-07-14 09:47:29 -07:00
Jay Foad	8a24208977	[AMDGPU] Simplify AMDGPUSubtarget::getWavesPerEU. NFC.	2020-07-14 14:20:02 +01:00
Roger Ferrer Ibanez	0cbdd2a82a	[RISCV] Fix isStoreToStackSlot Because of the layout of stores (that don't have a destination operand) this check is exactly the same as the one in RISCVInstrInfo::isLoadFromStackSlot. Differential Revision: https://reviews.llvm.org/D81805	2020-07-14 12:36:42 +00:00
Sam Elliott	1d15bbb9d9	Revert "[RISCV] Avoid Splitting MBB in RISCVExpandPseudo" This reverts commit `97106f9d80`. This is based on feedback from https://reviews.llvm.org/D82988#2147105	2020-07-14 11:15:01 +01:00
Victor Campos	dad1868772	[AArch64][AsmParser] Add rcpc support in .arch_extension AArch64 does not support enabling rcpc via .arch_extension in assembly. GCC, on the other hand, does. This patch adds 'rcpc' as a valid value to .arch_extension handling. Differential Revision: https://reviews.llvm.org/D83685	2020-07-14 10:57:11 +01:00
Jay Foad	5ab2e14d31	[AMDGPU] Fix typos in performCtlz_CttzCombine() Fix two obvious errors in the code and also update the test check. Also add one test to catch the failure. Patch by Ruiling Song! Differential Revision: https://reviews.llvm.org/D83280	2020-07-14 10:18:18 +01:00
Sjoerd Meijer	959eaa50d6	[ARM][MVE] Only tail-fold integer add reductions If a vector body has live-out values, it is probably a reduction, which needs a final reduction step after the loop. MVE has a VADDV instruction to reduce integer vectors, but doesn't have an equivalent one for float vectors. A live-out value that is not recognised as reduction later in the optimisation pipeline will result in the tail-predicated loop to be reverted to a non-predicated loop and this is very expensive, i.e. it has a significant performance impact, which is what we hope to avoid with fine tuning the ARM TTI hook preferPredicateOverEpilogue implementation. Differential Revision: https://reviews.llvm.org/D82953	2020-07-14 10:15:07 +01:00
Sander de Smalen	a8f4f85d84	[AArch64][SVE] Remove erroneous assert in resolveFrameOffsetReference The code already supports addressing a fixed-size stack object from the frame-pointer, by first subtracting sizeof(SVE area) from FP. Reviewers: efriedma, cameron.mcinally, david-arm, rengolin Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D83125	2020-07-14 09:22:45 +01:00
Jay Foad	1658b8d7dd	[AMDGPU] Avoid using s_cmpk when src0 is not register The hardware spec require src0 of s_cmpk should be a register. So, we should not optimize s_cmp to s_cmpk if src0 is not register. Patch by Ruiling Song!	2020-07-14 09:05:53 +01:00
Carl Ritson	74c14202d9	[AMDGPU] Propagate dead flag during pre-RA exec mask optimizations Preserve SCC dead flags in SIOptimizeExecMaskingPreRA. This helps with removing redundant s_andn2 instructions later. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D83637	2020-07-14 12:53:43 +09:00
Amy Kwan	62f5ba624b	[PowerPC][Power10] Implement Test LSB by Byte Builtins in LLVM/Clang This patch implements builtins for the Test LSB by Byte instruction introduced in Power10. Differential Revision: https://reviews.llvm.org/D82431	2020-07-13 22:47:47 -05:00
Amara Emerson	64eb3a4915	[AArch64][GlobalISel] Add post-legalize combine for sext_inreg(trunc(sextload)) -> copy On AArch64 we generate redundant G_SEXTs or G_SEXT_INREGs because of this. Differential Revision: https://reviews.llvm.org/D81993	2020-07-13 20:27:45 -07:00
Kai Luo	d4e7d126b0	[PowerPC] Generate CFI directives when probing in prologue Add missing CFI directives when probing in prologue if `stack-clash-protection` is enabled. Differential Revision: https://reviews.llvm.org/D83276	2020-07-14 02:56:12 +00:00
Fangrui Song	eafe7c14ea	[PowerPC] Fix combineVectorShuffle regression after D77448 Commit `1fed131660` assumed that NewShuffle (shuffle vector canonicalization result) will always be ShuffleVectorSDNode, which may be false (it may be a BITCAST node): ``` ... t12: v4i32 = scalar_to_vector t2 t15: v16i8 = bitcast t12 # LHS t17: v16i8 = vector_shuffle<u,u,u,u,u,u,u,u,0,1,2,3,u,u,u,u> t15, undef:v16i8 # SVN ``` Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D83617	2020-07-13 16:57:27 -07:00
Eric Christopher	e958379581	Fold the opt size check into the assert to silence an unused variable warning.	2020-07-13 16:05:24 -07:00
Matt Arsenault	6a8c11a11f	GlobalISel: Implement widenScalar for saturating add/sub Add a placeholder legality rule for AMDGPU until the rest of the actions are handled.	2020-07-13 14:46:40 -04:00
serge-sans-paille	62881fda58	Fix HexagonGenExtract return status Differential Revision: https://reviews.llvm.org/D83460	2020-07-13 20:41:59 +02:00
Matt Arsenault	db091e12b2	RISCV: Avoid GlobalISel build break in a future patch The GlobalISelEmitter is stricter about matching timm instruction outputs to timm inputs (although in an accidental sort of way that doesn't hit a proper import failure error). Also, apparently no intrinsic patterns were importing since the ID enum declaration was missing.	2020-07-13 14:01:57 -04:00
Hiroshi Yamauchi	fb558ccae7	[PGO][PGSO] Add profile guided size optimization to X86ISelDAGToDAG. Differential Revision: https://reviews.llvm.org/D83331	2020-07-13 10:28:09 -07:00
Hiroshi Yamauchi	153a0b8906	[PGO][PGSO] Add profile guided size optimization to the X86 LEA fixup. Differential Revision: https://reviews.llvm.org/D83330	2020-07-13 09:46:22 -07:00
Eric Astor	3aabfa2808	[ms] [llvm-ml] Restore omitted changes requested by reviewer	2020-07-13 10:49:19 -04:00
Eric Astor	f08e8b6d7c	[ms] [llvm-ml] Add support for MASM STRUCT casting field accessors: (<TYPE> PTR <value>).<field> Summary: Add support for MASM STRUCT casting field accessors: (<TYPE> PTR <value>).<field> Since these are operands, we add them to X86AsmParser. If/when we extend MASM support to other architectures (e.g., ARM), we will need similar changes there as well. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D83346	2020-07-13 10:40:47 -04:00
Eric Astor	4cdea5faf9	[ms] [llvm-ml] Improve MASM STRUCT field accessor support Summary: Adds support for several accessors: - `[<identifier>.<struct name>].<field>` - `[<identifier>.<struct name>.<field>].<subfield>` (where `field` has already-defined STRUCT type) - `[<variable>.<field>].<subfield>` (where `field` has already-defined STRUCT type) Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D83344	2020-07-13 10:34:30 -04:00
Sjoerd Meijer	595270ae39	[ARM][MVE] Refactor option -disable-mve-tail-predication This refactors option -disable-mve-tail-predication to take different arguments so that we have 1 option to control tail-predication rather than several different ones. This is also a prep step for D82953, in which we want to reject reductions unless that is requested with this option. Differential Revision: https://reviews.llvm.org/D83133	2020-07-13 13:40:33 +01:00
Mirko Brkusanin	38998cfa9c	[AMDGPU][GlobalISel] Fix subregister index for EXEC register in selectBallot. Temporarily remove subregister for EXEC in selectBallot added in https://reviews.llvm.org/D83214 to fix failures on expensive checks buildbot.	2020-07-13 13:35:34 +02:00
Paul Walker	319a97b5e2	[SVE] Ensure fixed length vector fptrunc operations bigger than NEON are not considered legal. Differential Revision: https://reviews.llvm.org/D83568	2020-07-13 11:16:30 +00:00
Mirko Brkusanin	ce23e54162	[AMDGPU][GlobalISel] Select llvm.amdgcn.ballot Select ballot intrinsic for GlobalISel. Differential Revision: https://reviews.llvm.org/D83214	2020-07-13 12:14:43 +02:00
Qiu Chaofan	b6912c879e	[PowerPC] Support constrained conversion in SPE target This patch adds support for constrained int/fp conversion between signed/unsigned i32 and f32/f64. Reviewed By: jhibbits Differential Revision: https://reviews.llvm.org/D82747	2020-07-13 12:18:36 +08:00
Fangrui Song	4d5fd0ee5e	[MC][RISCV] Set UseIntegratedAssembler to true to align with most other targets. Also, -fintegrated-as is the default for clang -target riscv*.	2020-07-12 21:04:48 -07:00
Craig Topper	f8f007e378	[X86] Consistently use 128 as the PSHUFB/VPPERM index for zero Bit 7 of the index controls zeroing, the other bits are ignored when bit 7 is set. Shuffle lowering was using 128 and shuffle combining was using 255. Seems like we should be consistent. This patch changes shuffle combining to use 128 to match lowering. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D83587	2020-07-12 10:52:43 -07:00
Craig Topper	04013a07ac	[X86] Fix two places that appear to misuse peekThroughOneUseBitcasts peekThroughOneUseBitcasts checks the use count of the operand of the bitcast. Not the bitcast itself. So I think that means we need to do any outside haseOneUse checks before calling the function not after. I was working on another patch where I misused the function and did a very quick audit to see if I there were other similar mistakes. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D83598	2020-07-12 10:52:43 -07:00
Yonghong Song	152a9fef1b	BPF: permit .maps section variables with typedef type Currently, llvm when see a global variable in .maps section, it ensures its type must be a struct type. Then pointee will be further evaluated for the structure members. In normal cases, the pointee type will be skipped. Although this is what current all bpf programs are doing, but it is a little bit restrictive. For example, it is legitimate for users to have: typedef struct { int key_size; int value_size; } __map_t; __map_t map __attribute__((section(".maps"))); This patch lifts this restriction and typedef of a struct type is also allowed for .maps section variables. To avoid create unnecessary fixup entries when traversal started with typedef/struct type, the new implementation first traverse all map struct members and then traverse the typedef/struct type. This way, in internal BTFDebug implementation, no fixup entries are generated. Two new unit tests are added for typedef and const struct in .maps section. Also tested with kernel bpf selftests. Differential Revision: https://reviews.llvm.org/D83638	2020-07-12 09:42:25 -07:00
Fangrui Song	be9f363704	[AVRInstPrinter] printOperand: support llvm-objdump --print-imm-hex Differential Revision: https://reviews.llvm.org/D83634	2020-07-12 08:14:52 -07:00
Christudasan Devadasan	d7a05698ef	[AMDGPU] Move LowerSwitch pass to CodeGenPrepare. It is possible that LowerSwitch pass leaves certain blocks unreachable from the entry. If not removed, these dead blocks can cause undefined behavior in the subsequent passes. It caused a crash in the AMDGPU backend after the instruction selection when a PHI node has its incoming values coming from these unreachable blocks. In the AMDGPU pass flow, the last invocation of UnreachableBlockElim precedes where LowerSwitch is currently placed and eventually missed out on the opportunity to get these blocks eliminated. This patch ensures that LowerSwitch pass get inserted earlier to make use of the existing unreachable block elimination pass. Reviewed By: sameerds, arsenm Differential Revision: https://reviews.llvm.org/D83584	2020-07-11 16:33:38 +05:30
Wang, Pengfei	e628092524	[X86][MMX] Optimize MMX shift intrinsics. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D83534	2020-07-11 11:16:23 +08:00
Jinsong Ji	3e3acc1cc7	[PowerPC][MachinePipeliner] Enable pipeliner if hasInstrSchedModel P9 is the only one with InstrSchedModel, but we may have more in the future, we should not hardcoded it to P9, check hasInstrSchedModel instead. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D83590	2020-07-11 02:24:12 +00:00
Thomas Lively	b59c6fcaf3	[WebAssembly] Prefer v128.const for constant splats In BUILD_VECTOR lowering, we used to generally prefer using splats over v128.const instructions because v128.const has a very large encoding. However, in `d5b7a4e2e8` we switched to preferring consts because they are expected to be more efficient in engines. This patch updates the ISel patterns to match this current preference. Differential Revision: https://reviews.llvm.org/D83581	2020-07-10 18:27:52 -07:00
Matt Arsenault	31f4e43f3f	AMDGPU: Remove .value_type from kernel metadata This doesn't appear used for anything, and is emitted incorrectly based on the description. This also depends on the IR type, and pointee element type.	2020-07-10 18:16:31 -04:00
Craig Topper	122a45fbac	[X86] Add isel patterns for matching broadcast vpternlog if the ternlog and the broadcast have different types.	2020-07-10 15:15:02 -07:00
Matt Arsenault	9ff310d5bf	AArch64: Fix unused variables	2020-07-10 15:12:25 -04:00
Sidharth Baveja	e541e1b757	[NFC] Separate Peeling Properties into its own struct (re-land after minor fix) Summary: This patch separates the peeling specific parameters from the UnrollingPreferences, and creates a new struct called PeelingPreferences. Functions which used the UnrollingPreferences struct for peeling have been updated to use the PeelingPreferences struct. Author: sidbav (Sidharth Baveja) Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel), anhtuyen (Anh Tuyen Tran), nikic (Nikita Popov) Reviewed By: Meinersbur (Michael Kruse) Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D80580	2020-07-10 18:39:30 +00:00
Lei Huang	90b1a710ae	[PowerPC] Enable default support of quad precision operations Summary: Remove option guarding support of quad precision operations. Reviewers: nemanjai, #powerpc, steven.zhang Reviewed By: nemanjai, #powerpc, steven.zhang Subscribers: qiucf, wuzish, nemanjai, hiraditya, kbarton, shchenz, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83437	2020-07-10 13:27:48 -05:00
Luke Geeson	954db63cd1	[ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM This patch upstreams support for the Arm-v8 Cortex-A78 and Cortex-X1 processors for AArch64 and ARM. In detail: - Adding cortex-a78 and cortex-x1 as cpu options for aarch64 and arm targets in clang - Adding Cortex-A78 and Cortex-X1 CPU names and ProcessorModels in llvm details of the CPU can be found here: https://www.arm.com/products/cortex-x https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78 The following people contributed to this patch: - Luke Geeson - Mikhail Maltsev Reviewers: t.p.northover, dmgreen Reviewed By: dmgreen Subscribers: dmgreen, kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits, miyuki Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D83206	2020-07-10 18:24:11 +01:00
Simon Pilgrim	4cc26a44ca	[X86][SSE] Use shouldUseHorizontalOp helper to determine whether to use (F)HADD. NFCI.	2020-07-10 12:13:34 +01:00
dstuttar	69a89b54c6	[NFC] Change isFPPredicate comparison to ignore lower bound Summary: Since changing the Predicate to be an unsigned enum, the lower bound check for isFPPredicate no longer needs to check the lower bound, since it will always evaluate to true. Also fixed a similar issue in SIISelLowering.cpp by removing the need for comparing to FIRST and LAST predicates Added an assert to the isFPPredicate comparison to flag if the FIRST_FCMP_PREDICATE is ever changed to anything other than 0, in which case the logic will break. Without this change warnings are generated in VS. Change-Id: I358f0daf28c0628c7bda8ad4cab4e1757b761bab Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83540	2020-07-10 11:57:20 +01:00
Paul Walker	f78e6a3095	[SVE] Code generation for fixed length vector truncates. Lower fixed length vector truncates to a sequence of SVE UZP1 instructions. Differential Revision: https://reviews.llvm.org/D83395	2020-07-10 10:37:19 +00:00
Mirko Brkusanin	cf40db21af	[AMDGPU][GlobalISel] Fix G_AMDGPU_TBUFFER_STORE_FORMAT mapping Add missing mappings and tablegen definitions for TBUFFER_STORE_FORMAT. Differential Revision: https://reviews.llvm.org/D83240	2020-07-10 11:32:32 +02:00
Simon Pilgrim	77133cc1e2	[X86][AVX] Attempt to fold PACK(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE(PACK(X,Y)). Truncations lowered as shuffles of multiple (concatenated) vectors often leave us with lane-crossing shuffles that feed a PACKSS/PACKUS, if both shuffles are fed from the same 2 vector sources, then we can PACK the sources directly and shuffle the result instead. This is currently limited to whole i128 lanes in a 256-bit vector, but we can extend this if the need arises (but I'm not seeing many examples in real world code).	2020-07-10 09:33:27 +01:00
Thomas Lively	043eaa9a4a	[WebAssembly][NFC] Simplify vector shift lowering and add tests This patch builds on `0d7286a652` by simplifying the code for detecting splat values and adding new tests demonstrating the lowering of splatted absolute value shift amounts, which are common in code generated by Halide. The lowering is very bad right now, but subsequent patches will improve it considerably. The tests will be useful for evaluating the improvements in those patches. Reviewed By: aheejin Differential Revision: https://reviews.llvm.org/D83493	2020-07-10 00:18:59 -07:00
Zakk Chen	04b9a46c84	[RISCV] Refactor FeatureRVCHints to make ProcessorModel more intuitive Reviewers: luismarques, asb, evandro Reviewed By: asb, evandro Tags: #llvm Differential Revision: https://reviews.llvm.org/D77030	2020-07-09 23:07:39 -07:00
Eli Friedman	56ae2cebcd	[AArch64][SVE] Add lowering for llvm.fma. This is currently bare-bones; we aren't taking advantage of any of the FMA variant instructions. But it's enough to at least generate code. Differential Revision: https://reviews.llvm.org/D83444	2020-07-09 16:12:41 -07:00
Albion Fung	5ffec46720	[PowerPC][Power10] Add Instruction definition/MC Tests for Load/Store Rightmost VSX Vector This patch adds the instruction definitions and the assembly/disassembly tests for the Load/Store VSX Vector Rightmose instructions. Differential Revision: https://reviews.llvm.org/D83364	2020-07-09 17:06:03 -05:00
Stanislav Mekhanoshin	77f8f813a9	[AMDGPU] Return restricted number of regs from TTI This is practically NFC at the moment because nothing really asks the real number or does anything useful with it. Differential Revision: https://reviews.llvm.org/D82202	2020-07-09 14:31:28 -07:00
Eric Christopher	ce1e4853b5	Temporarily Revert "[PowerPC] Split s34imm into two types" as it was failing in Release+Asserts mode with an assert. This reverts commit `bd20680311`.	2020-07-09 13:36:32 -07:00
Kyungwoo Lee	7af27b65b3	[NFC][AArch64] Refactor getArgumentPopSize Differential Revision: https://reviews.llvm.org/D83456	2020-07-09 11:58:15 -07:00
Craig Topper	918e653186	[X86] Immediately call LowerShift from lowerBuildVectorToBitOp. If we don't immediately lower the vector shift, the splat constant vector we created may get turned into a constant pool load before we get around to lowering the shift. This makes it a lot more difficult to create a shift by constant. Sometimes we fail to see through the constant pool at all and end up trying to lower as if it was a variable shift. This requires custom handling and may create an unsupported vselect on pre-sse-4.1 targets. Since we're after LegalizeVectorOps we are unable to legalize the unsupported vselect as that code is in LegalizeVectorOps rather than LegalizeDAG. So calling LowerShift immediately ensures that we get see the splat constant. Fixes PR46527. Differential Revision: https://reviews.llvm.org/D83455	2020-07-09 10:51:29 -07:00
Hiroshi Yamauchi	2c1a9006dd	[PGO][PGSO] Add profile guided size optimization to X86 ISel Lowering.	2020-07-09 10:43:45 -07:00
Craig Topper	3e75912005	[X86] Directly emit X86ISD::BLENDV instead of VSELECT in a few places that were emitting sign bit tests. Technically a VSELECT expects a vector of all 1s or 0s elements for its condition. But we aren't guaranteeing that the sign bit and the non sign bits match in these locations. So we should use BLENDV which is more relaxed. Differential Revision: https://reviews.llvm.org/D83447	2020-07-09 10:40:09 -07:00
Stefan Pintilie	bd20680311	[PowerPC] Split s34imm into two types Currently the instruction paddi always takes s34imm as the type for the 34 bit immediate. However, the PC Relative form of the instruction should not produce the same fixup as the non PC Relative form. This patch splits the s34imm type into s34imm and s34imm_pcrel so that two different fixups can be emitted. Reviewed By: kamaub, nemanjai Differential Revision: https://reviews.llvm.org/D83255	2020-07-09 11:28:32 -05:00
Simon Pilgrim	f54402b63a	[X86][AVX] Attempt to fold extract_subvector(shuffle(X)) -> extract_subvector(X) If we're extracting a subvector from a shuffle that is shuffling entire subvectors we can peek through and extract the subvector from the shuffle source instead. This helps remove some cases where concat_vectors(extract_subvector(),extract_subvector()) legalizations has resulted in BLEND/VPERM2F128 shuffles of the subvectors.	2020-07-09 14:09:24 +01:00
Sam Elliott	97106f9d80	[RISCV] Avoid Splitting MBB in RISCVExpandPseudo Since the `RISCVExpandPseudo` pass has been split from `RISCVExpandAtomicPseudo` pass, it would be nice to run the former as early as possible (The latter has to be run as late as possible to ensure correctness). Running earlier means we can reschedule these pairs as we see fit. Running earlier in the machine pass pipeline is good, but would mean teaching many more passes about `hasLabelMustBeEmitted`. Splitting the basic blocks also pessimises possible optimisations because some optimisations are MBB-local, and others are disabled if the block has its address taken (which is notionally what `hasLabelMustBeEmitted` means). This patch uses a new approach of setting the pre-instruction symbol on the AUIPC instruction to a temporary symbol and referencing that. This avoids splitting the basic block, but allows us to reference exactly the instruction that we need to. Notionally, this approach seems more correct because we do actually want to address a specific instruction. This then allows the pass to be moved much earlier in the pass pipeline, before both scheduling and register allocation. However, to do so we must leave the MIR in SSA form (by not redefining registers), and so use a virtual register for the intermediate value. By using this virtual register, this pass now has to come before register allocation. Reviewed By: luismarques, asb Differential Revision: https://reviews.llvm.org/D82988	2020-07-09 13:54:13 +01:00
Benjamin Kramer	b44470547e	Make helpers static. NFC.	2020-07-09 13:48:56 +02:00
Paul Walker	6b403319f8	[SVE] Scalarize fixed length masked loads and stores. When adding support for scalable vector masked loads and stores we accidently opened up likewise for fixed length vectors. This patch restricts support to scalable vectors only, thus ensuring fixed length vectors are treated the same regardless of SVE support. Differential Revision: https://reviews.llvm.org/D83341	2020-07-09 10:47:04 +00:00
Paul Walker	614fb09645	[SVE] Disable some BUILD_VECTOR related code generator features. Fixed length vector code generation for SVE does not yet custom lower BUILD_VECTOR and instead relies on expansion. At the same time custom lowering for VECTOR_SHUFFLE is also not available so this patch updates isShuffleMaskLegal to reject vector types that require SVE. Related to this it also prevents the merging of stores after legalisation because this only works when BUILD_VECTOR is either legal or can be elminated. When this is not the case the code generator enters an infinite legalisation loop. Differential Revision: https://reviews.llvm.org/D83408	2020-07-09 10:47:04 +00:00
serge-sans-paille	e4ec6d0afe	Correctly update return status for MVEGatherScatterLowering `Changed` should reflect all possible changes. Differential Revision: https://reviews.llvm.org/D83459	2020-07-09 11:18:54 +02:00
Kai Luo	e2b93185b8	[PowerPC] Only make copies of registers on stack in variadic function when va_start is called On PPC64, for a variadic function, if va_start is not called, it won't access any variadic argument on stack, thus we can save stores of registers used to pass arguments. Differential Revision: https://reviews.llvm.org/D82361	2020-07-09 07:18:17 +00:00
Nikita Popov	0b39d2d752	Revert "[NFC] Separate Peeling Properties into its own struct" This reverts commit `0369dc98f9`. Many failing tests.	2020-07-08 21:43:32 +02:00
Sidharth Baveja	0369dc98f9	[NFC] Separate Peeling Properties into its own struct Summary: This patch makes the peeling properties of the loop accessible by other loop transformations. Author: sidbav (Sidharth Baveja) Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel) Reviewed By: Meinersbur (Michael Kruse) Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D80580	2020-07-08 18:59:59 +00:00
Anh Tuyen Tran	6965af43e6	Revert "[NFC] Separate Peeling Properties into its own struct" This reverts commit `fead250b43`.	2020-07-08 18:58:05 +00:00
Anh Tuyen Tran	fead250b43	[NFC] Separate Peeling Properties into its own struct Summary: This patch makes the peeling properties of the loop accessible by other loop transformations. Author: sidbav (Sidharth Baveja) Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel) Reviewed By: Meinersbur (Michael Kruse) Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D80580	2020-07-08 18:56:03 +00:00
Jay Foad	47788b97a9	SILoadStoreOptimizer: add support for GFX10 image instructions GFX10 image instructions use one or more address operands starting at vaddr0, instead of a single vaddr operand, to allow for NSA forms. Differential Revision: https://reviews.llvm.org/D81675	2020-07-08 19:15:46 +01:00
Jay Foad	a8816ebee0	[AMDGPU] Fix and simplify AMDGPULegalizerInfo::legalizeUDIV_UREM32Impl Use the algorithm from AMDGPUCodeGenPrepare::expandDivRem32. Differential Revision: https://reviews.llvm.org/D83383	2020-07-08 19:14:49 +01:00
Jay Foad	ecac951be9	[AMDGPU] Fix and simplify AMDGPUTargetLowering::LowerUDIVREM Use the algorithm from AMDGPUCodeGenPrepare::expandDivRem32. Differential Revision: https://reviews.llvm.org/D83382	2020-07-08 19:14:49 +01:00
Jay Foad	f4bd01c191	[AMDGPU] Fix and simplify AMDGPUCodeGenPrepare::expandDivRem32 Fix the division/remainder algorithm by adding a second quotient refinement step, which is required in some cases like 0xFFFFFFFFu / 0x11111111u (https://bugs.llvm.org/show_bug.cgi?id=46212). Also document, rewrite and simplify it by ensuring that we always have a lower bound on inv(y), which simplifies the UNR step and the quotient refinement steps. Differential Revision: https://reviews.llvm.org/D83381	2020-07-08 19:14:48 +01:00
Simon Pilgrim	800fb68420	[X86][SSE] Pull out PACK(SHUFFLE(),SHUFFLE()) folds into its own function. NFC. Future patches will extend this so declutter combineVectorPack before we start.	2020-07-08 17:42:42 +01:00
Simon Pilgrim	08a2c9ce5c	[X86] Fix copy+paste typo in combineVectorPack assert message. NFC.	2020-07-08 17:42:42 +01:00
Ulrich Weigand	cca8578efa	[SystemZ] Allow specifying integer registers as part of the address calculation Revision `e1de2773a5` provided support for accepting integer registers in inline asm i.e. __asm("lhi %r0, 5") -> lhi %r0, 5 __asm("lhi 0, 5") -> lhi 0,5 This patch aims to extend this support to instructions which compute addresses as well. (i.e instructions of type BDMem and BD[X\|R\|V\|L]Mem) Author: anirudhp Differential Revision: https://reviews.llvm.org/D83251	2020-07-08 18:20:24 +02:00
Nicolai Hähnle	3fa989d4fd	DomTree: remove explicit use of DomTreeNodeBase::iterator Summary: Almost all uses of these iterators, including implicit ones, really only need the const variant (as it should be). The only exception is in NewGVN, which changes the order of dominator tree child nodes. Change-Id: I4b5bd71e32d71b0c67b03d4927d93fe9413726d4 Reviewers: arsenm, RKSimon, mehdi_amini, courbet, rriddle, aartbik Subscribers: wdng, Prazek, hiraditya, kuhar, rogfer01, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, vkmr, Kayjukh, jurahul, msifontes, cfe-commits, llvm-commits Tags: #clang, #mlir, #llvm Differential Revision: https://reviews.llvm.org/D83087	2020-07-08 18:18:49 +02:00
Sanjay Patel	9114900287	[x86] improve codegen for non-splat bit-masked vector compare and select (PR46531) vselect ((X & Pow2C) == 0), LHS, RHS --> vselect ((shl X, C') < 0), RHS, LHS Follow-up to D83073 - the non-splat mask cases where we actually see an improvement are quite limited from what I can tell. AVX1 needs multiply and blend capabilities and AVX2 needs vector shift and blend capabilities. The intersection of those 2 constraints is only vectors with 32-bit or 64-bit elements. XOP is/was better. Differential Revision: https://reviews.llvm.org/D83181	2020-07-08 08:20:49 -04:00
Simon Pilgrim	9dc250db9d	[X86][AVX] SimplifyDemandedVectorEltsForTargetShuffle - ensure mask is same size as constant size Fixes test regression reported on D81791	2020-07-08 11:47:59 +01:00
Paul Walker	fb75451775	[SVE] Custom ISel for fixed length extract/insert_subvector. We use extact_subvector and insert_subvector to "cast" between fixed length and scalable vectors. This patch adds custom c++ based ISel for the following cases: fixed_vector = ISD::EXTRACT_SUBVECTOR scalable_vector, 0 scalable_vector = ISD::INSERT_SUBVECTOR undef(scalable_vector), fixed_vector, 0 Which result in either EXTRACT_SUBREG/INSERT_SUBREG for NEON sized vectors or COPY_TO_REGCLASS otherwise. Differential Revision: https://reviews.llvm.org/D82871	2020-07-08 09:49:28 +00:00
Simon Pilgrim	c00a27752e	[X86][AVX] Remove redundant EXTRACT_VECTOR_ELT(VBROADCAST(SCALAR())) fold Noticed while looking for similar cases to rG931ec74f7a29 - SimplifyDemandedVectorElts and shuffle combining both should handle this now.	2020-07-08 10:18:36 +01:00
Ben Shi	1e9d0811c9	[RISCV] optimize addition with a pair of (addi imm) For an addition with an immediate in specific ranges, a pair of addi-addi can be generated instead of the ordinary lui-addi-add serial. Reviewed By: MaskRay, luismarques Differential Revision: https://reviews.llvm.org/D82262	2020-07-07 18:57:28 -07:00
Ben Shi	cb82de2960	[RISCV] Optimize multiplication by constant ... to shift/add or shift/sub. Do not enable it on riscv32 with the M extension where decomposeMulByConstant may not be an optimization. Reviewed By: luismarques, MaskRay Differential Revision: https://reviews.llvm.org/D82660	2020-07-07 18:50:24 -07:00
Eric Astor	bc8e262afe	[ms] [llvm-ml] Add initial MASM STRUCT/UNION support Summary: Add support for user-defined types to MasmParser, including initialization and field access. Known issues: - Omitted entry initializers (e.g., <,0>) do not work consistently for nested structs/arrays. - Size checking/inference for values with known types is not yet implemented. - Some ml64.exe syntaxes for accessing STRUCT fields are not recognized. - `[<register>.<struct name>].<field>` - `[<register>[<struct name>.<field>]]` - `(<struct name> PTR [<register>]).<field>` - `[<variable>.<struct name>].<field>` - `(<struct name> PTR <variable>).<field>` Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D75306	2020-07-07 17:02:10 -04:00
Matt Arsenault	23157f3bdb	GlobalISel: Handle EVT argument lowering correctly handleAssignments was assuming every argument type is an MVT, and assignArg would always fail. This fixes one of the hacks in the current AMDGPU calling convention code that pre-processes the arguments.	2020-07-07 16:36:14 -04:00
Matt Arsenault	42bb481442	AMDGPU/GlobalISel: Fix skipping unused kernel arguments The tests in `a5b9ad7e9a` actually failed the verifier, which for some reason is not the default. Also add tests for 0-sized function arguments, which do not add entries to the expected register lists.	2020-07-07 16:36:13 -04:00
Zola Bridges	9d9e499840	[x86][seses] Add clang flag; Use lvi-cfi with seses This patch creates a clang flag to enable SESES. This flag also ensures that lvi-cfi is on when using seses via clang. SESES should use lvi-cfi to mitigate returns and indirect branches. The flag to enable the SESES functionality only without lvi-cfi is now -x86-seses-enable-without-lvi-cfi to warn users part of the mitigation is not enabled if they use this flag. This is useful in case folks want to see the cost of SESES separate from the LVI-CFI. Reviewed By: sconstab Differential Revision: https://reviews.llvm.org/D79910	2020-07-07 13:20:13 -07:00
Simon Pilgrim	931ec74f7a	[X86][AVX] Don't fold PEXTR(VBROADCAST_LOAD(X)) -> LOAD(X). We were checking the VBROADCAST_LOAD element size against the extraction destination size instead of the extracted vector element size - PEXTRW/PEXTB have implicit zext'ing so have i32 destination sizes for v8i16/v16i8 vectors, resulting in us extracting from the wrong part of a load. This patch bails from the fold if the vector element sizes don't match, and we now use the target constant extraction code later on like the pre-AVX2 targets, fixing the test case. Found by internal fuzzing tests.	2020-07-07 19:10:03 +01:00
Zola Bridges	dfabffb195	[x86][lvi][seses] Use SESES at O0 for LVI mitigation Use SESES as the fallback at O0 where the optimized LVI pass isn't desired due to its effect on build times at O0. I updated the LVI tests since this changes the code gen for the tests touched in the parent revision. This is a follow up to the comments I made here: https://reviews.llvm.org/D80964 Hopefully we can continue the discussion here. Also updated SESES to handle LFENCE instructions properly instead of adding redundant LFENCEs. In particular, 1) no longer add LFENCE if the current instruction being processed is an LFENCE and 2) no longer add LFENCE if the instruction right before the instruction being processed is an LFENCE Reviewed By: sconstab Differential Revision: https://reviews.llvm.org/D82037	2020-07-07 11:05:09 -07:00
Thomas Lively	0d7286a652	[WebAssembly] Avoid scalarizing vector shifts in more cases Since WebAssembly's vector shift instructions take a scalar shift amount rather than a vector shift amount, we have to check in ISel that the vector shift amount is a splat. Previously, we were checking explicitly for splat BUILD_VECTOR nodes, but this change uses the standard utilities for detecting splat values that can handle more complex splat patterns. Since the C++ ISel lowering is now more general than the ISel patterns, this change also simplifies shift lowering by using the C++ lowering for all SIMD shifts rather than mixing C++ and normal pattern-based lowering. This change improves ISel for shifts to the point that the simd-shift-unroll.ll regression test no longer tests the code path it was originally meant to test. The bug corresponding to that regression test is no longer reproducible with its original reported reproducer, so rather than try to fix the regression test, this change just removes it. Differential Revision: https://reviews.llvm.org/D83278	2020-07-07 10:45:26 -07:00
Biplob Mishra	62ba48b45f	[PowerPC] Implement Vector Replace Builtins in LLVM Provide the LLVM intrinsics needed to implement vector replace element builtins in altivec.h which will be added in a subsequent patch. Differential Revision: https://reviews.llvm.org/D83308	2020-07-07 12:22:52 -05:00
Sanjay Patel	642eed3713	[x86] fix miscompile in buildvector v16i8 lowering In the test based on PR46586: https://bugs.llvm.org/show_bug.cgi?id=46586 ...we are inserting 16-bits into the high element of the vector, shuffling it to element 0, and extracting 32-bits. But xmm1 was never initialized, so the top 16-bits of the extract are undef without this patch. (It seems like we could do better than this by recognizing that we only demand a subsection of the build vector, but I want to make sure we fix the miscompile 1st.) This path is only used for pre-SSE4.1, and simpler patterns get squashed somewhere along the way, so the test still includes a 'urem' as it did in the original test from the bug report. Differential Revision: https://reviews.llvm.org/D83319	2020-07-07 13:02:31 -04:00
Liu, Chen3	ea85ff82c8	[X86] Fix a bug that when lowering byval argument When an argument has 'byval' attribute and should be passed on the stack according calling convention, a stack copy would be emitted twice. This will cause the real value will be put into stack where the pointer should be passed. Differential Revision: https://reviews.llvm.org/D83175	2020-07-07 21:49:31 +08:00
David Sherwood	79d34a5a1b	[SVE][CodeGen] Fix bug when falling back to DAG ISel In an earlier commit `584d0d5c17` I added functionality to allow AArch64 CodeGen support for falling back to DAG ISel when Global ISel encounters scalable vector types. However, it seems that we were not falling back early enough as llvm::getLLTForType was still being invoked for scalable vector types. I've added a new fallback function to the call lowering class in order to catch this problem early enough, rather than wait for lowerFormalArguments to reject scalable vector types. Differential Revision: https://reviews.llvm.org/D82524	2020-07-07 09:23:04 +01:00
Carl Ritson	560292fa99	[AMDGPU] Update isFMAFasterThanFMulAndFAdd assumptions MAD/MAC is no longer always available. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D83207	2020-07-07 15:40:44 +09:00
Nemanja Ivanovic	1b1539712e	[PowerPC] Do not RAUW combined nodes in VECTOR_SHUFFLE legalization When legalizing shuffles, we make an attempt to combine it into a PPC specific canonical form that avoids a need for a swap. If the combine is successful, we RAUW the node and the custom legalization replaces the now dead node instead of the one it should replace. Remove that erroneous call to RAUW.	2020-07-06 22:09:28 -05:00
Xiang1 Zhang	939d8309db	[X86-64] Support Intel AMX Intrinsic INTEL ADVANCED MATRIX EXTENSIONS (AMX). AMX is a new programming paradigm, it has a set of 2-dimensional registers (TILES) representing sub-arrays from a larger 2-dimensional memory image and operate on TILES. These intrinsics use direct TMM register number as its params. Spec can be found in Chapter 3 here https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D83111	2020-07-07 10:13:40 +08:00
Amy Kwan	c13e3e2c2e	[PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE. This patch aims to exploit the xxsplti32dx XT, IX, IMM32 instruction when lowering VECTOR_SHUFFLEs. We implement lowerToXXSPLTI32DX when lowering vector shuffles to check if: - Element size is 4 bytes - The RHS is a constant vector (and constant splat of 4-bytes) - The shuffle mask is a suitable mask for the XXSPLTI32DX instruction where it is one of the 32 masks: <0, 4-7, 2, 4-7> <4-7, 1, 4-7, 3> Differential Revision: https://reviews.llvm.org/D83245	2020-07-06 20:28:38 -05:00
Wolfgang Pieb	129387497e	Correct 3 spelling errors in headers and doc strings.	2020-07-06 17:27:51 -07:00
Stanislav Mekhanoshin	f7a7efbf88	[AMDGPU] Tweak getTypeLegalizationCost() Even though wide vectors are legal they still cost more as we will have to eventually split them. Not all operations can be uniformly done on vector types. Conservatively add the cost of splitting at least to 8 dwords, which is our widest possible load. We are more or less lying to cost mode with this change but this can prevent vectorizer from creation of wide vectors which results in RA problems for us. Differential Revision: https://reviews.llvm.org/D83078	2020-07-06 14:07:48 -07:00
Matt Arsenault	f25d020c2e	AMDGPU/GlobalISel: Add types to special inputs When passing special ABI inputs, we have no existing context for the type to use.	2020-07-06 17:00:55 -04:00
Nicolai Hähnle	dfcc68c528	DomTree: Remove getRoots() accessor Summary: Avoid exposing details about how roots are stored. This enables subsequent type-erasure changes. v5: - cleanup a unit test by using EXPECT_EQ instead of EXPECT_TRUE Change-Id: I532b774cc71f2224e543bc7d79131d97f63f093d Reviewers: arsenm, RKSimon, mehdi_amini, courbet Subscribers: jvesely, wdng, hiraditya, kuhar, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83085	2020-07-06 21:58:11 +02:00
Nicolai Hähnle	76c5cb05a3	DomTree: Remove getChildren() accessor Summary: Avoid exposing details about how children are stored. This will enable subsequent type-erasure changes. New methods are introduced to cover common access patterns. Change-Id: Idb5f4b1b9c84e4cc71ddb39bb52a388682f5674f Reviewers: arsenm, RKSimon, mehdi_amini, courbet Subscribers: qcolombet, sdardis, wdng, hiraditya, jrtc27, zzheng, atanasyan, asbirlea, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83083	2020-07-06 21:58:11 +02:00
Wouter van Oortmerssen	16d83c395a	[WebAssembly] Added 64-bit memory.grow/size/copy/fill This covers both the existing memory functions as well as the new bulk memory proposal. Added new test files since changes where also required in the inputs. Also removes unused init/drop intrinsics rather than trying to make them work for 64-bit. Differential Revision: https://reviews.llvm.org/D82821	2020-07-06 12:49:50 -07:00
Kazushi (Jam) Marukawa	fa1fecc73d	[VE] Support symbol with offset in assembly Summary: Change MCExpr to support Aurora VE's modifiers. Change asmparser to use existing MCExpr parser (parseExpression) to parse an expression contining symbols with modifiers and offsets. Also add several regression tests of MC layer. Reviewers: simoll, k-ishizaka Reviewed By: simoll Subscribers: hiraditya, llvm-commits Tags: #llvm, #ve Differential Revision: https://reviews.llvm.org/D83170	2020-07-07 04:16:51 +09:00
Kazushi (Jam) Marukawa	af8389e131	[VE] Change to use isa Summary: Change to use isa instead of dyn_cast to avoid a warning. Reviewers: simoll, k-ishizaka Reviewed By: simoll Subscribers: hiraditya, llvm-commits Tags: #llvm, #ve Differential Revision: https://reviews.llvm.org/D83200	2020-07-07 03:48:49 +09:00
Matt Arsenault	c19c153e74	AMDGPU: Don't ignore carry out user when expanding add_co_pseudo This was resulting in a missing vreg def in the use select instruction. The output of the pseudo doesn't make sense, since it really shouldn't have the vreg output in the first place, and instead an implicit scc def to match the real scalar behavior. We could have easier to understand tests if we selected scalar versions of the [us]{add\|sub}.with.overflow intrinsics. This does still end up producing vector code in the end, since it gets moved later.	2020-07-06 14:28:01 -04:00
Luís Marques	61c2a0bb82	[RISCV] Fold ADDIs into load/stores with nonzero offsets We can often fold an ADDI into the offset of load/store instructions: (load (addi base, off1), off2) -> (load base, off1+off2) (store val, (addi base, off1), off2) -> (store val, base, off1+off2) This is possible when the off1+off2 continues to fit the 12-bit immediate. We remove the previous restriction where we would never fold the ADDIs if the load/stores had nonzero offsets. We now do the fold the the resulting constant still fits a 12-bit immediate, or if off1 is a variable's address and we know based on that variable's alignment that off1+offs2 won't overflow. Differential Revision: https://reviews.llvm.org/D79690	2020-07-06 17:32:57 +01:00
jasonliu	6d3ae365bd	[XCOFF][AIX] Give symbol an internal name when desired symbol name contains invalid character(s) Summary: When a desired symbol name contains invalid character that the system assembler could not process, we need to emit .rename directive in assembly path in order for that desired symbol name to appear in the symbol table. Reviewed By: hubert.reinterpretcast, DiggerLin, daltenty, Xiangling_L Differential Revision: https://reviews.llvm.org/D82481	2020-07-06 15:49:15 +00:00
David Green	146dad0077	[ARM] MVE FP16 cost adjustments This adjusts the MVE fp16 cost model, similar to how we already do for integer casts. It uses the base cost of 1 per cvt for most fp extend / truncates, but adjusts it for loads and stores where we know that a extending load has been used to get the load into the correct lane, and only an MVE VCVTB is then needed. Differential Revision: https://reviews.llvm.org/D81813	2020-07-06 15:57:51 +01:00
David Green	afdb2ef2ed	[ARM] Adjust default fp extend and trunc costs This adds some default costs for fp extends and truncates, generally costing them as 1 per lane. If the type is not legal then the cost will include a call to an __aeabi_ function. Some NEON code is also adjusted to make sure it applies to the expected types, now that fp16 is a more common thing. Differential Revision: https://reviews.llvm.org/D82458	2020-07-06 14:23:17 +01:00
Matt Arsenault	a5b9ad7e9a	AMDGPU/GlobalISel: Don't emit code for unused kernel arguments	2020-07-06 09:04:06 -04:00
Matt Arsenault	7b76a5c8a2	AMDGPU: Fix fixed ABI SGPR arguments The default constructor wasn't setting isSet o the ArgDescriptor, so while these had the value set, they were treated as missing. This only ended up mattering in the indirect call case (and for regular calls in GlobalISel, which current doesn't have a way to support the variable ABI).	2020-07-06 09:01:18 -04:00
Esme-Yi	0607c8df7f	[PowerPC] Legalize SREM/UREM directly on P9. Summary: As Bugzilla-35090 reported, the rationale for using custom lowering SREM/UREM should no longer be true. At the IR level, the div-rem-pairs pass performs the transformation where the remainder is computed from the result of the division when both a required. We should now be able to lower these directly on P9. And the pass also fixed the problem that divide is in a different block than the remainder. This is a patch to remove redundant code and make SREM/UREM legal directly on P9. Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D82145	2020-07-06 11:47:31 +00:00
David Green	60b8b2beea	[ARM] Add extra extend and trunc costs for cast instructions This expands the existing extend costs with a few extras for larger types than legal, which will usually be split under MVE. It also adds trunk support for the same thing. These should not have a large effect on many things, but makes the costs explicit and keeps a certain balance between the trunks and extends. Differential Revision: https://reviews.llvm.org/D82457	2020-07-06 11:33:05 +01:00
David Green	55227f85d0	[ARM] Use BaseT::getMemoryOpCost for getMemoryOpCost This alters getMemoryOpCost to use the Base TargetTransformInfo version that includes some additional checks for whether extending loads are legal. This will generally have the effect of making <2 x ..> and some <4 x ..> loads/stores more expensive, which in turn should help favour larger vector factors. Notably it alters the cost of a <4 x half>, which with the current codegen will be expensive if it is not extended. Differential Revision: https://reviews.llvm.org/D82456	2020-07-06 10:58:40 +01:00
Kazushi (Jam) Marukawa	df3bda047d	[VE] Correct stack alignment Summary: Change stack alignment from 64 bits to 128 bits to follow ABI correctly. And add a regression test for datalayout. Reviewers: simoll, k-ishizaka Reviewed By: simoll Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #llvm, #ve, #clang Differential Revision: https://reviews.llvm.org/D83173	2020-07-06 17:25:29 +09:00
David Green	74ca67c109	[ARM] Remove hasSideEffects from FP converts Whether an instruction is deemed to have side effects in determined by whether it has a tblgen pattern that emits a single instruction. Because of the way a lot of the the vcvt instructions are specified either in dagtodag code or with patterns that emit multiple instructions, they don't get marked as not having side effects. This just marks them as not having side effects manually. It can help especially with instruction scheduling, to not create artificial barriers, but one of these tests also managed to produce fewer instructions. Differential Revision: https://reviews.llvm.org/D81639	2020-07-05 16:23:24 +01:00
Alexander Belyaev	2247f7218a	[llvm] Cast to (void) the unused variable.	2020-07-05 12:33:58 +02:00
Thomas Lively	65330f394b	[WebAssembly] Do not assume br_table range checks will be gt_u OSS-Fuzz and the Emscripten test suite uncovered some edge cases in which the range check instruction seemed to be an (i32.const 0) or other unexpected instruction, triggering an assertion. Unfortunately the reproducers are rather complicated, so they don't make good unit tests. This commit removes the bad assertion and conservatively optimizes range checks only when the range check instruction is i32.gt_u. Differential Revision: https://reviews.llvm.org/D83169	2020-07-04 18:11:24 -07:00
Craig Topper	e652c0f8f3	[X86] Teach lowerShuffleAsBlend to use bit blend for v16i8/v32i8/v16i16 when avx512vl is enabled but not avx512bw. Probably not super important since there are no real CPUs with avx512vl and not avx512bw. But vpternlog should be better than vblendvb. I do wonder if we should use vpternlog even with BWI. We currently use vblendmb or vpblendmw by putting the mask into a GPR and moving it to a k-register. But I don't think we hoist the GPR to k-register copy in machine LICM. Using VPTERNLOG would use a constant pool load, but has the advantage that we're pretty good at hoisting and rematerializing those. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D83156	2020-07-04 10:26:56 -07:00
Craig Topper	b4eb415a99	[X86] Disable VPBLENDVB formation in combineLogicBlendIntoPBLENDV if VPTERNLOG is supported. VPBLENDVB is multiple uops while VPTERNLOG is a single uop. So we should use that instead. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D83155	2020-07-04 10:12:19 -07:00
Simon Pilgrim	71f342d6c3	[X86][AVX] Fold PACK(LOSUBVECTOR(SHUFFLE(X)),HISUBVECTOR(SHUFFLE(X))) -> SHUFFLE(PACK(LOSUBVECTOR(X),HISUBVECTOR(X))) Using PACK for truncations leaves us with intermediate shuffles that can be tricky to remove while the truncation tree is being formed. This fold helps pull out the PERMQ case which is one of the most common, avoiding some costly lane-crossing shuffles. A future patch will begin adding more general shuffle folding, which we should be able to use for HADD/HSUB as well.	2020-07-04 13:54:30 +01:00
Paul Walker	7356b4243a	[SVE] Fix invalid assert in expand_DestructiveOp. AArch64ExpandPseudo::expand_DestructiveOp contains an assert to ensure the destructive operand's register is unique. However, this is only required when psuedo expansion emits a movprfx. A simple example when a movprfx is not required is Z0 = FADD_ZPZZ_UNDEF_S P0, Z0, Z0 which expands to an unprefixed FADD_ZPmZ_S instruction. This patch moves the assert to the places where a movprfx is emitted. Differential Revision: https://reviews.llvm.org/D83029	2020-07-04 09:21:40 +00:00
Craig Topper	fed432523e	[X86] Directly emit VPTERNLOG from canonicalizeBitSelect when possible. Seems to produce better results on some rotate tests. And is neutral for other tests.	2020-07-03 22:08:28 -07:00
Kai Luo	c352e0885a	[PowerPC] Implement probing for prologue This patch is part of supporting `-fstack-clash-protection`. Implemented probing when emitting prologue. Differential Revision: https://reviews.llvm.org/D81460	2020-07-04 03:07:08 +00:00
Craig Topper	e75f2d5a8c	[X86] Add matching support for X86ISD::ANDNP to X86DAGToDAGISel::tryVPTERNLOG.	2020-07-03 17:50:35 -07:00
Thomas Lively	8df30d988e	[WebAssembly] Do not omit range checks for i64 switches Summary: Since the br_table instruction takes an i32, switches over i64s (and larger integers) must use the i32.wrap_i64 instruction to truncate the table index. This truncation makes numbers just over 2^32 indistinguishable from small numbers, so it was a miscompilation to omit the range check preceding these br_tables. This change fixes the problem by skipping the "fixing" of the br_table when the range check is an i64 instruction. Fixes PR46447. Reviewers: aheejin, dschuff, kripken Reviewed By: kripken Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83017	2020-07-03 17:15:39 -07:00
Lei Huang	e359ab1eca	[PowerPC][NFC] Fix indentation	2020-07-03 16:47:24 -05:00
Sanjay Patel	26543f1c0c	[x86] improve codegen for bit-masked vector compare and select (PR46531) We canonicalize patterns like: %s = lshr i32 %a0, 1 %t = trunc i32 %s to i1 to: %a = and i32 %a0, 2 %c = icmp ne i32 %a, 0 ...in IR, but the bit-shifting original sequence may be better for x86 vector codegen. I tried several variants of the transform, and it's tricky to not induce regressions. In particular, I did not find a way to cleanly handle non-splat constants, so I've left that as a TODO item here (currently negative tests for those are included). AVX512 resulted in some diffs, but didn't look meaningful, so I left that out too. Some of the 256-bit AVX1 diffs are questionable, but close enough that they are probably insignificant. Differential Revision: https://reviews.llvm.org/D83073.	2020-07-03 17:31:57 -04:00
Biplob Mishra	0939e04e41	[PowerPC] Implement Vector Insert Builtins in LLVM/Clang Implements vec_insertl() and vec_inserth(). Differential Revision: https://reviews.llvm.org/D82365	2020-07-03 15:30:41 -05:00
Sean Fertile	484a36b97d	Enable basepointer for AIX. Differential Revision: https://reviews.llvm.org/D82030	2020-07-03 11:55:49 -04:00
Petre-Ionut Tudor	af80a4353e	[ARM] Generate [SU]RHADD from (b - (~a)) >> 1 Summary: Teach LLVM to recognize the above pattern, which is usually a transformation of (a + b + 1) >> 1, where the operands are either signed or unsigned types. Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82669	2020-07-03 16:00:06 +01:00
vpykhtin	bb69ca822a	[AMDGPU] Don't combine DPP if DPP register is used more than once per instruction Reviewers: arsenm, rampitec, foad Reviewed By: rampitec, foad Subscribers: wuzish, kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82551	2020-07-03 15:08:26 +03:00
Luke Geeson	8bf99f1e6f	[ARM] Add Cortex-A77 Support for Clang and LLVM This patch upstreams support for the Arm-v8 Cortex-A77 processor for AArch64 and ARM. In detail: - Adding cortex-a77 as a cpu option for aarch64 and arm targets in clang - Cortex-A77 CPU name and ProcessorModel in llvm details of the CPU can be found here: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a77 and a similar submission to GCC can be found here: `e0664b7a63` The following people contributed to this patch: - Luke Geeson - Mikhail Maltsev Reviewers: t.p.northover, dmgreen, ostannard, SjoerdMeijer Reviewed By: dmgreen Subscribers: dmgreen, kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits, miyuki Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D82887	2020-07-03 13:00:54 +01:00
Guillaume Chatelet	87e2751cf0	[Alignment][NFC] Use proper getter to retrieve alignment from ConstantInt and ConstantSDNode This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D83082	2020-07-03 08:06:43 +00:00
serge-sans-paille	c8ef3d5a2f	Fix stack-clash probing for large static alloca Differential Revision: https://reviews.llvm.org/D82867	2020-07-03 09:22:03 +02:00
Guillaume Chatelet	3587c9c427	[NFC] Use ADT/Bitfields in Instructions This is an example patch for D81580. Differential Revision: https://reviews.llvm.org/D81662	2020-07-03 07:20:22 +00:00
Craig Topper	b94e9b7f05	[X86] Remove MODRM_SPLITREGM from the disassembler tables. This offers a very minor table size reduction due to only being used for one AMX opcode.	2020-07-03 00:16:20 -07:00
Kai Luo	03828e38c3	[PowerPC] Implement probing for dynamic stack allocation This patch is part of supporting `-fstack-clash-protection`. Mainly do such things compared to existing `lowerDynamicAlloc` - Added a new pseudo instruction PPC::PREPARE_PROBED_ALLOC to get actual frame pointer and final stack pointer. - Synthesize a loop to probe by blocks. - Use DYNAREAOFFSET to get MaxCallFrameSize which is calculated in prologepilog. Differential Revision: https://reviews.llvm.org/D81358	2020-07-03 05:36:40 +00:00
Craig Topper	52855ed099	[X86] Add back support for matching VPTERNLOG from back to back logic ops. I think this mostly looks ok. The only weird thing I noticed was a couple rotate vXi8 tests picked up an extra logic op where we have (and (or (and), (andn)), X). Previously we matched the (or (and), (andn)) to vpternlog, but now we match the (and (or), X) and leave the and/andn unmatched.	2020-07-02 22:11:52 -07:00
Carl Ritson	42ca2070d7	[AMDGPU] Insert PS early exit at end of control flow Exit early if the exec mask is zero at the end of control flow. Mark the ends of control flow during control flow lowering and convert these to exits during the insert skips pass. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D82737	2020-07-03 14:04:34 +09:00
Kai Luo	d8921a8005	[PowerPC][NFC] Prevent unused error when assertion is disabled.	2020-07-03 04:23:19 +00:00
Carl Ritson	7ec6927bad	Revert "[AMDGPU] Insert PS early exit at end of control flow" This reverts commit `2bfcacf0ad`. There appears to be an issue to analysis preservation.	2020-07-03 13:03:33 +09:00
Kai Luo	40e9e0826b	[PowerPC][NFC] Refactor lowerDynamicAlloc When performing dynamic stack allocation, calculation of frame pointer and actual negsize can be separated. This patch refactors `lowerDynamicAlloc` in preparation of supporting `-fstack-clash-protection` which also has to calculate actual frame pointer and negsize. Differential Revision: https://reviews.llvm.org/D81354	2020-07-03 03:33:24 +00:00
Carl Ritson	2bfcacf0ad	[AMDGPU] Insert PS early exit at end of control flow Exit early if the exec mask is zero at the end of control flow. Mark the ends of control flow during control flow lowering and convert these to exits during the insert skips pass. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D82737	2020-07-03 12:26:28 +09:00
Carl Ritson	a3daa3f75a	[AMDGPU] Unify early PS termination blocks Generate a single early exit block out-of-line and branch to this if all lanes are killed. This avoids branching if lanes are active. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D82641	2020-07-03 09:58:05 +09:00
Craig Topper	acf6c94a38	[X86] Teach lower512BitShuffle to try bitmask and bitblend before splitting v32i16/v64i8 on av512f only targets. We consider v32i16/v64i8 to be legal types on avx512f, but we don't have most operations until avx512bw. But we can use and/or/xor operations. So try those before splitting. This is especially helpful since we turn some ands with constant masks into shuffles in early DAG combines. So we should make sure we recover those back to AND.	2020-07-02 15:35:48 -07:00
Biplob Mishra	ca464639a1	[PowerPC] Implement Vector Blend Builtins in LLVM/Clang Implements vec_blendv() Differential Revision: https://reviews.llvm.org/D82774	2020-07-02 16:52:52 -05:00
Amy Kwan	6076fc698d	[PowerPC]Add Vector Insert Instruction Definitions and MC Test Adds td definitions and asm/disasm tests for the following instructions: VINSBVLX VINSBVRX VINSHVLX VINSHVRX VINSWVLX VINSWVRX VINSBLX VINSBRX VINSHLX VINSHRX VINSWLX VINSWRX VINSDLX VINSDRX VINSW VINSD Differential Revision: https://reviews.llvm.org/D83052	2020-07-02 15:49:16 -05:00
Craig Topper	912cd8a37f	[X86] Add vpternlog to the broadcast unfolding table.	2020-07-02 13:43:44 -07:00
Craig Topper	204a21317a	[X86] Modify the conditions for when we stop making v16i8/v32i8 rotate Custom based on having avx512 features. The comments here indicate that we prefer to promote the shifts instead of allowing rotate to be pattern matched. But we weren't taking into account whether 512-bit registers are enabled or whethever we have vpsllvw/vpsrlvw instructions. splatvar_rotate_v32i8 is a slight regrssion, but the other cases are neutral or improved.	2020-07-02 13:07:51 -07:00
Biplob Mishra	286073484f	[PowerPC]Implement Vector Permute Extended Builtin Implements vector permute builtin: vec_permx() Differential Revision: https://reviews.llvm.org/D82869	2020-07-02 14:53:18 -05:00
Nemanja Ivanovic	a701dc5510	[PowerPC] Remove undefs from splat input when changing shuffle mask As of `1fed131660`, we have code that changes shuffle masks so that we can put the shuffle in a canonical form that can be matched to a single instruction. However, it does not properly account for undef elements in the BUILD_VECTOR that is the RHS splat so we can end up with undefs where they shouldn't be. This patch converts the splat input with undefs to one without.	2020-07-02 12:26:56 -05:00
Sander de Smalen	8b7b0ad24c	[AArch64][SVE] NFC: Rename isOrig -> isReverseInstr This is a non-functional to clarify some of the terminology in the AArch64SVEInstrInfo/SVEInstrFormats.td files around the tables for mapping an instruction to it's reverse instruction counter part, and vice versa. e.g. DIV -> DIVR and DIVR -> DIV. Reviewers: paulwalker-arm, cameron.mcinally, rengolin, efriedma Reviewed By: paulwalker-arm, efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D82979	2020-07-02 17:01:15 +01:00
Dmitry Preobrazhensky	1c9d681092	[AMDGPU][CODEGEN] Added support of new inline assembler constraints Added support for constraints 'I', 'J', 'B', 'C', 'DA', 'DB'. See https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D81651	2020-07-02 17:20:15 +03:00
Sander de Smalen	075c440f7b	[AArch64][SVE] Put zeroing pseudos and patterns under flag. This patch puts the _ZERO pseudos and corresponding patterns under the predicate 'UseExperimentalZeroingPseudos', so that they can be enabled/disabled through compile flags. This is done because the zeroing pseudos use MOVPRFX to do merging of the inactive lanes, but it depends on the uarch whether this operation is actually merged with the destructive operation. If not, it may be more profitable to use a SELECT and to give the compiler the freedom to schedule these instructions as normal, rather than keeping them bundled together. Additionally, this feature is not yet fully implemented and there are still known bugs (see D80410) that need to be resolved before the 'experimental' can be dropped from the name. Reviewers: paulwalker-arm, cameron.mcinally, efriedma Reviewed By: paulwalker-arm Tags: #llvm Differential Revision: https://reviews.llvm.org/D82780	2020-07-02 14:24:33 +01:00
Guillaume Chatelet	8dbafd24d6	[Alignment][NFC] Transition and simplify calls to DL::getABITypeAlignment This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82977	2020-07-02 11:28:02 +00:00
Kerry McLaughlin	fd6193d5ea	[AArch64][SVE] Add reg+imm addressing mode for unpredicated stores Reviewers: sdesmalen, efriedma, david-arm Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82985	2020-07-02 12:00:01 +01:00
Sander de Smalen	143e324e75	[CodeGen][SVE] Don't drop scalable flag in DAGCombiner::visitEXTRACT_SUBVECTOR There was a rogue 'assert' in AArch64ISelLowering for the tuple.get intrinsics, that shouldn't really have been there (I suspect this was a remnant from when we expected the wider vector always to have come from a vector CONCAT). When I tried to create a more minimal reproducer, I found a bug in DAGCombiner where it drops the scalable flag when trying to fold: extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index') This patch fixes both issues. Reviewers: david-arm, efriedma, spatel Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D82910	2020-07-02 10:16:43 +01:00
Sander de Smalen	07bda98b6a	[AArch64][SVE] Add unpred load/store patterns for bf16 types Reviewers: kmclaughlin, c-rhodes, efriedma Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D82909	2020-07-02 10:01:24 +01:00
Nicholas Guy	dc8e4d8566	[ARM] Rearrange SizeReduction when using -Oz Move the Thumb2SizeReduce pass to before IfConversion when optimising for minimal code size. Running the Thumb2SizeReduction pass before IfConversionallows T1 instructions to propagate to the final output, rather than the ifConverter modifying T2 instructions and preventing them from being reduced later. This change does introduce a regression regarding execution time, so it's only applied when optimising for size. Running the LLVM Test Suite with this change produces a geomean difference of -0.1% for the size..text metric. Differential Revision: https://reviews.llvm.org/D82439	2020-07-02 09:19:38 +01:00
Craig Topper	0aad82943a	[X86] Enable multibyte NOPs in 64-bit mode for padding/alignment. The default CPU used by llvm-mc doesn't have the NOPL feature, but if we know we're compiling in 64-bit mode we should be able to use nopl.	2020-07-01 23:59:01 -07:00
Pushpinder Singh	e1a31f52cd	[AMDGPU] Control num waves per EU for implicit work-group size Summary: If amdgpu-flat-work-group-size is not specified in LLVM IR, the backend uses default value of 1024. For this, minimum waves per EU should be 4. However, backend is still setting minimum value to 1 instead of calculated value. This is not observed normally as frontend always provide amdgpu-flat-work-group-size attribute. Reviewers: rampitec, b-sumner, sameerds, msearles Reviewed By: rampitec Subscribers: qcolombet, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81991	2020-07-01 22:53:52 -04:00

... 10 11 12 13 14 ...

59397 Commits