llvm-project

Commit Graph

Author	SHA1	Message	Date
Cullen Rhodes	1f44dfb640	[AArch64][AsmParser] Fix bug in operand printer The switch in AArch64Operand::print was changed in D45688 so the shift can be printed after printing the register. This is implemented with LLVM_FALLTHROUGH and was broken in D52485 when BTIHint was put between the register and shift operands. Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D86535	2020-08-26 09:31:36 +00:00
Sander de Smalen	5f47d4456d	[AArch64][SVE] Fix calculation restore point for SVE callee saves. This fixes an issue where the restore point of callee-saves in the function epilogues was incorrectly calculated when the basic block consisted of only a RET instruction. This caused dealloc instructions to be inserted in between the block of callee-save restore instructions, rather than before it. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D86099	2020-08-26 10:02:31 +01:00
Craig Topper	1d1515a9e2	[X86] Add an isel pattern for (i8 (trunc (i16 (bitconvert (v16i1 X))))) to avoid an extra EXTRACT_SUBREG Since we can only copy to GR32 we had to EXTRACT from GR32, but we would first go to GR16 and then the truncate would extract again to GR8. This adds a special case to go directly from GR32 to GR8. This would eventually get cleaned up, but thought maybe we should avoid doing it in the first place. Our k-register handling is weird and we could probably stand to have some more special ISD nodes for the conversions so the i32 type would be explicit.	2020-08-25 18:20:43 -07:00
Craig Topper	b8ec8f5776	[X86] Remove extra getOperand(0) call from recently introduced store(extract_element(vtrunc)) to truncated store combine. The IsExtractedElement already called getOperand(0) so Extract here is the source vector. We shouldn't call getOperand(0). This worked for the original test cases because the result was a bitcast so the getOperand(0) accidentally peeked through the bitcast which is what we wanted. In the failing case here, the operand turns out to be undef so the getOperand(0) asserts because undef has no operands. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=25184 Differential Revision: https://reviews.llvm.org/D86428	2020-08-25 16:16:54 -07:00
Craig Topper	ba319ac47e	[X86] Remove a redundant COPY_TO_REGCLASS for VK16 after a KMOVWkr in an isel output pattern. KMOVWkr produces VK16, there's no reason to copy it to VK16 again. Test changes are presumably because we were scheduling based on the COPY that is no longer there.	2020-08-25 15:19:27 -07:00
Stanislav Mekhanoshin	b7760c3e5d	[AMDGPU] Remove unsound dependency on ISA version in waitcnt Differential Revision: https://reviews.llvm.org/D86566	2020-08-25 14:01:42 -07:00
Stanislav Mekhanoshin	817c831f02	[AMDGPU] Switch to named simm16 in vscnt insertion Differential Revision: https://reviews.llvm.org/D86568	2020-08-25 13:05:27 -07:00
Ankit Aggarwal	2da1eefb58	[Hexagon] Check if EVT is simple type in HVX lowering	2020-08-25 15:02:44 -05:00
Krzysztof Parzyszek	dcef5e0c37	[Hexagon] Remove (redundant) HexagonISelLowering::isHvxOperation(SDValue) Use isHvxOperation(SDNode*) instead.	2020-08-25 11:45:08 -05:00
Matt Arsenault	0d2fe90063	AMDGPU/GlobalISel: Use more accurate legality rules for merge/unmerge Most notably, we were incorrectly reporting <3 x s16> as a legal type for these. Make sure these aren't legal to help make progress on fixing the artifact combiner and vector legalizer rules. Unfortunately, this means spreading the -global-isel-abort=0 hack, although this doesn't change the legalizer result in any situation.	2020-08-25 09:40:20 -04:00
Sjoerd Meijer	c352e7fbda	[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks This adapts tail-predication to the new semantics of get.active.lane.mask as defined in D86147. This means that: - we can remove the BTC + 1 overflow checks because now the loop tripcount is passed in to the intrinsic, - we can immediately use that value to setup a counter for the number of elements processed by the loop and don't need to materialize BTC + 1. Differential Revision: https://reviews.llvm.org/D86303	2020-08-25 14:38:03 +01:00
Matt Arsenault	ef8f3b5a78	AMDGPU/GlobalISel: Apply bitcast load/store hack to pointer vectors The selection patterns will currently fail on these.	2020-08-25 09:37:41 -04:00
Mikael Holmen	59e1fbe557	[PowerPC] Fix gcc warning [NFC] Without the fix gcc 7.4 warns with ../lib/Target/PowerPC/PPCAsmPrinter.cpp: In member function 'void {anonymous}::PPCAsmPrinter::EmitTlsCall(const llvm::MachineInstr*, llvm::MCSymbolRefExpr::VariantKind)': ../lib/Target/PowerPC/PPCAsmPrinter.cpp:525:53: warning: enumeral and non-enumeral type in conditional expression [-Wextra] MCInstBuilder(Subtarget->isPPC64() ? Opcode : PPC::BL_TLS) ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~	2020-08-25 12:58:38 +02:00
Paul Walker	73ac3c0ede	[SVE] Lower scalable vector ISD::FNEG operations. Also updates isConstOrConstSplatFP to allow the mul(A,-1) -> neg(A) transformation when -1 is expressed as an ISD::SPLAT_VECTOR. Differential Revision: https://reviews.llvm.org/D86415	2020-08-25 11:22:28 +01:00
Freddy Ye	e02d081f2b	[X86] Support -march=sapphirerapids Support -march=sapphirerapids for x86. Compared with Icelake Server, it includes 14 more features. They are amxtile, amxint8, amxbf16, avx512bf16, avx512vp2intersect, cldemote, enqcmd, movdir64b, movdiri, ptwrite, serialize, shstk, tsxldtrk, waitpkg. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D86503	2020-08-25 14:21:21 +08:00
Matt Arsenault	77e5a195f8	AMDGPU/GlobalISel: Handle AGPRs used for SGPR operands. We would still need to waterfall if the value were somehow an AGPR, and also need to explicitly copy to a VGPR.	2020-08-24 17:54:34 -04:00
Nemanja Ivanovic	075a92dea1	[PowerPC] Do not use FISel for calls and TOC-based accesses with PC-Rel PC-Relative addressing introduces a fair bit of complexity for correctly eliminating TOC accesses. FastISel does not include any of that handling so we miscompile code with -mcpu=pwr10 -O0 if it includes an external call that FastISel does not handle followed by any of the following: floating point constant materialization; materialization of a GlobalValue; a call that FastISel does handle. This patch switches to SDISel for any of the above. Differential revision: https://reviews.llvm.org/D86343	2020-08-24 16:51:44 -05:00
Craig Topper	f7c87b7e37	[X86] Copy the tuning features and scheduler model from pentium4/x86-64 to generic This is preparation for making clang default to -mtune=generic when no -march is specified. This will allow the default tuning to be "generic" even though our default march is "pentium4" or "x86-64". To avoid llc lit test regressions, if no mcpu is specified, I've defaulted tune to use i586 to match the old tuning settings of no CPU. Some tests explicitly used -mcpu=generic which I've removed so they instead get this default of architecture features from generic and tune from i586. I updated one llvm-mca test to check a different CPU since generic has a scheduler model now Differential Revision: https://reviews.llvm.org/D86312	2020-08-24 14:47:10 -07:00
Nemanja Ivanovic	c485343c83	[PowerPC] Handle SUBFIC in reg+reg -> reg+imm transformation We initially missed the subtract-immediate in this transformation. This patch just adds that. Differential revision: https://reviews.llvm.org/D84659	2020-08-24 16:22:59 -05:00
Roland Froese	b6d7ed469f	[PowerPC] Extend custom lower of vector truncate to handle wider input Current custom lowering of truncate vector handles a source of up to 128 bits, but that only uses one of the two shuffle vector operands. Extend it to use both operands to handle 256 bit sources. Differential Revision: https://reviews.llvm.org/D68035	2020-08-24 15:33:43 -04:00
Matt Arsenault	75e6f0b3d4	AMDGPU: Add flag to disable promotion of uniform i16 ops This interferes with GlobalISel's much better handling of the situation. This should really be disabled for GlobalISel. However, the fallback only re-runs the selection passes, and doesn't go back and rerun any codegen IR passes. I haven't come up with a good solution to this problem.	2020-08-24 14:39:27 -04:00
Matt Arsenault	62d1fb828f	AMDGPU/GlobalISel: Use unmerge instead of extract in addrspace queries This is a bit more consistent with regular operation legalization.	2020-08-24 11:07:51 -04:00
Baptiste Saleil	512e256c0d	[PowerPC] Add clang options to control MMA support This patch adds frontend and backend options to enable and disable the PowerPC MMA operations added in ISA 3.1. Instructions using these options will be added in subsequent patches. Differential Revision: https://reviews.llvm.org/D81442	2020-08-24 09:35:55 -05:00
Matt Arsenault	70cd9f5b77	AMDGPU/GlobalISel: Start implementing computeKnownBitsForTargetInstr Handle workitem intrinsics. There isn't really a way to adequately test this right now, since none of the known bits users are fine grained enough to test the edge conditions. This triggers a number of instances of the new 64-bit to 32-bit shift combine in the existing tests.	2020-08-24 09:53:27 -04:00
Matt Arsenault	e1644a3779	GlobalISel: Reduce G_SHL width if source is extension shl ([sza]ext x, y) => zext (shl x, y). Turns expensive 64 bit shifts into 32 bit if it does not overflow the source type. This is a port of an AMDGPU DAG combine added in `5fa289f0d8`. InstCombine does this already, but we need to do it again here to apply it to shifts introduced for lowered getelementptrs. This will help matching addressing modes that use 32-bit offsets in a future patch. TableGen annoyingly assumes only a single match data operand, so introduce a reusable struct. However, this still requires defining a separate GIMatchData for every combine which is still annoying. Adds a morally equivalent function to the existing getShiftAmountTy. Without this, we would have to repeatedly query the legalizer info and guess at what type to use for the shift.	2020-08-24 09:42:40 -04:00
Anna Welker	8048068c3e	[ARM][MVE] Allow tail predication for strides !=1 with gather/scatters If gather/scatters are enabled, ARMTargetTransformInfo now allows tail predication for loops with a much wider range of strides, up to anything that is loop invariant. Differential Revision: https://reviews.llvm.org/D85410	2020-08-24 13:54:47 +01:00
Jonas Paulsson	8ac70694b9	[SystemZ] Preserve the MachineMemOperand in emitCondStore() in all cases. Review: Ulrich Weigand	2020-08-24 14:07:30 +02:00
Julien Etienne	0f0be3fb8d	Add support for AVR attiny441 and attiny841 Reviewed By: dylanmckay Differential Revision: https://reviews.llvm.org/D85589 Patch by Julien Etienne	2020-08-24 20:28:32 +12:00
Qiu Chaofan	fed6107dcb	[PowerPC] Allow constrained FP intrinsics in mightUseCTR We may meet Invalid CTR loop crash when there's constrained ops inside. This patch adds constrained FP intrinsics to the list so that CTR loop verification doesn't complain about it. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D81924	2020-08-24 11:09:58 +08:00
Qiu Chaofan	41ba9d7723	[PowerPC] Support constrained vector fp/int conversion This patch makes these operations legal, and add necessary codegen patterns. There's still some issue similar to D77033 for conversion from v1i128 type. But normal type tests synced in vector-constrained-fp-intrinsic are passed successfully. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D83654	2020-08-24 10:10:27 +08:00
Fangrui Song	bef684154d	[X86][FastISel] Support materializing floating-point constants for large code model & PIC The following program miscompiles because rL216012 added static relocation model support but not for PIC. ``` // clang -fpic -mcmodel=large -O0 a.cc double foo() { return 42.0; } ``` This patch adds PIC support. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D86024	2020-08-23 08:36:18 -07:00
Roman Lebedev	503deec218	Temporairly revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline" As discussed in post-commit review starting with https://reviews.llvm.org/D84108#2227365 while this appears to be mostly a win overall, especially code-size-wise, this appears to shake //certain// code patterns in a way that is extremely unfavorable for performance (+30% runtime regression) on certain CPUs (I personally can't reproduce). So until the behaviour is better understood, and a path forward is mapped, let's back this out for now. This reverts commit `1d51dc38d8`.	2020-08-22 00:33:22 +03:00
Stanislav Mekhanoshin	9a9a092e61	[AMDGPU] Avoid sorting stalls in regbank-reassign This is the slowest operation in the already slow pass. Instead of sorting just put a stall list into an ordered map. Differential Revision: https://reviews.llvm.org/D86253	2020-08-21 11:49:41 -07:00
Qiu Chaofan	a5b7b8cce0	[PowerPC] Support constrained scalar sitofp/uitofp This patch adds support for constrained scalar int to fp operations on PowerPC. Besides, this also fixes the FP exception bit of FCFID* instructions. Reviewed By: steven.zhang, uweigand Differential Revision: https://reviews.llvm.org/D81669	2020-08-22 02:10:29 +08:00
Kamau Bridgeman	365f861c45	[PowerPC][PCRelative] Thread Local Storage Support for Initial Exec This patch is the initial support for the Initial Exec Thread Local Storage model to produce code sequence and relocations correct to the ABI for the model when using PC relative memory operations. Reviewed By: stefanp Differential Revision: https://reviews.llvm.org/D81947	2020-08-21 10:13:11 -05:00
Cameron McInally	36dbb8fc97	[SVE] Lower fixed length UDIV to scalable Pretty much just a copy of the SDIV patches (D86114 and D85982) with string replacement. Differential Revision: https://reviews.llvm.org/D86316	2020-08-21 09:01:25 -05:00
lewis-revill	9e6c09c0d9	[RISCV] Fix inaccurate annotations on PseudoBRIND PseudoBRIND had seemingly inherited incorrect annotations denoting it as a call instruction and that it defines X1/ra. This caused excess save/restore code to be emitted for ra. Differential Revision: https://reviews.llvm.org/D86286	2020-08-21 11:38:42 +01:00
Mirko Brkusanin	0654ff703d	[AMDGPU] Use ds_read/write_b96/b128 when possible for SDag Do not break down local loads and stores so ds_read/write_b96/b128 in ISelLowering can be selected on subtargets that support them and if align requirements allow them. Differential Revision: https://reviews.llvm.org/D84403	2020-08-21 12:26:31 +02:00
Mirko Brkusanin	d17ea67b92	[AMDGPU][GlobalISel] Fix 96 and 128 local loads and stores Fix local ds_read/write_b96/b128 so they can be selected if the alignment allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break them down. Differential Revision: https://reviews.llvm.org/D81638	2020-08-21 12:26:31 +02:00
Mirko Brkusanin	f5cd7ec9f3	[AMDGPU] Reorganize GCN subtarget features for unaligned access Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine whether hardware supports such access. UnalignedAccessMode should be used to enable them. hasUnalignedBufferAccessEnabled() and hasUnalignedDSAccessEnabled() can be now used to quickly check both. Differential Revision: https://reviews.llvm.org/D84522	2020-08-21 12:26:31 +02:00
Mirko Brkusanin	5bd1febe21	[AMDGPU] Fix alignment requirements for 96bit and 128bit local loads and stores Adjust alignment requirements for ds_read/write_b96/b128. GFX9 and onwards allow misaligned access for reads and writes but only if SH_MEM_CONFIG.alignment_mode allows it. UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we can relax alignment requirements. UnalignedAccessMode acts similarly to UnalignedBufferAccess for DS instructions but only from GFX9 onward and is supposed to match alignment_mode. By default alignment of 4 is required. Differential Revision: https://reviews.llvm.org/D82788	2020-08-21 12:26:31 +02:00
Jay Foad	0819a6416f	[SelectionDAG] Better legalization for FSHL and FSHR In SelectionDAGBuilder always translate the fshl and fshr intrinsics to FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and ORs. Improve the legalization of FSHL and FSHR to avoid code quality regressions. Differential Revision: https://reviews.llvm.org/D77152	2020-08-21 10:32:49 +01:00
Jay Foad	98de0d22f5	[AMDGPU] Apply llvm-prefer-register-over-unsigned from clang-tidy	2020-08-21 10:14:35 +01:00
Sam Parker	acf0bb41e4	[ARM][CostModel] Select instruction costs. Modify the ARM getCmpSelInstrCost implementation for the code size costs of selects. Now consider the legalization cost and increase the cost of i1 because those values wouldn't live in a general purpose register. We also make selects +1 more expensive to account for the IT instruction. Differential Revision: https://reviews.llvm.org/D82091	2020-08-21 08:49:56 +01:00
David Green	2b69efded0	[ARM][LV] Add a preferPredicatedReductionSelect target hook As part of D84741, this adds a target hook for the preferPredicatedReductionSelect option and makes use of it under MVE, allowing us to tail predicate most reduction loops. Differential Revision: https://reviews.llvm.org/D85980	2020-08-21 08:48:12 +01:00
Michael Liao	5257a60ee0	[amdgpu] Add codegen support for HIP dynamic shared memory. Summary: - HIP uses an unsized extern array `extern __shared__ T s[]` to declare the dynamic shared memory, whose size is not known at compile time. Reviewers: arsenm, yaxunl, kpyzhov, b-sumner Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82496	2020-08-20 21:29:18 -04:00
Kang Zhang	95e18b2d9d	[PowerPC] Fix a typo for InstAlias of mfsprg D77531 has a typo for mfsprg, it should be mtsprg. This patch is to fix this typo.	2020-08-21 01:10:52 +00:00
Matt Arsenault	79ce9bb380	CodeGen: Don't drop AA metadata when splitting MachineMemOperands Assuming this is used to split a memory access into smaller pieces, the new access should still have the same aliasing properties as the original memory access. As far as I can tell, this wasn't intentionally dropped. It may be necessary to drop this if you are moving the operand outside of the bounds of the original object in such a way that it may alias another IR object, but I don't think any of the existing users are doing this. Some of the uses widen into unused alignment padding, which I think is OK.	2020-08-20 16:17:30 -04:00
Matt Arsenault	18b218007d	AMDGPU/GlobalISel: Legalize odd sized loads with widening Custom lower and widen odd sized loads up to the alignment. The default set of legalization actions doesn't have a way to represent this. This fixes naturally aligned <3 x s8> and <3 x s16> loads. This also starts moving towards eliminating the buggy and overcomplicated legalization rules for narrowing. All the memory size changes should be done in the lower or custom action, not NarrowScalar / FewerElements. These currently have redundant and ambiguous code with the lower action.	2020-08-20 16:15:53 -04:00
vnalamot	54d8ded4b1	allSGPRSpillsAreDead() should use actual FP/BP frame indices The SGPR spills happen in SILowerSGPRSpills() and allSGPRSpillsAreDead() make sure there are no SGPR spills pending during PEI. But the FP/BP spills happen during PEI and are exceptions. Use actual frame indices of FP/BP in allSGPRSpillsAreDead() to accommodate the exceptions. Differential Revision: https://reviews.llvm.org/D86291	2020-08-20 16:15:53 -04:00
Kamau Bridgeman	b74b80bb2d	[PowerPC][PCRelative] Thread Local Storage Support for General Dynamic This patch is the initial support for the General Dynamic Thread Local Storage model to produce code sequence and relocations correct to the ABI for the model when using PC relative memory operations. Patch by: NeHuang Reviewed By: stefanp Differential Revision: https://reviews.llvm.org/D82315	2020-08-20 15:08:13 -05:00
Cameron McInally	ac63959460	[SVE] Lower fixed length vXi8/vXi16 SDIV to scalable There are no nxv16i8/nxv8i16 SDIV instructions, so these fixed width operations must be promoted to nxv4i32. Differential Revision: https://reviews.llvm.org/D86114	2020-08-20 13:47:01 -05:00
Jessica Clarke	3149ec07c0	[RISCV] Enable MCCodeEmitter instruction predicate verifier This ensures that we never encode an instruction which is unavailable, such as if we explicitly insert a forbidden instruction when lowering. This is particularly important on RISC-V given its high degree of modularity, and will become increasingly important as new standard extensions appear. Reviewed By: asb, lenary Differential Revision: https://reviews.llvm.org/D85015	2020-08-20 18:36:54 +01:00
Jay Foad	3497860203	[AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister ... in favour of the isPhysical/isVirtual methods.	2020-08-20 17:59:11 +01:00
Bjorn Pettersson	ff107eed15	[AArch64] Update a code comment incorrectly referring to zero_reg. NFC The getSrcFromCopy helper nowadays returns a MachineOperand pointer, so talking about zero_reg was incorrect as it nowadays returns a nullptr when not finding a copy-like instruction.	2020-08-20 14:36:59 +02:00
Paul Walker	0015b8db8e	[SVE] Add ISEL patterns for predicated shifts by an immediate. For scalable vector shifts the predicate is typically all active, which gets selected to an unpredicated shift by immediate. When code generating for fixed length vectors the predicate is based on the vector length and so additional patterns are required to make use of SVE's predicated shift by immediate instructions. Differential Revision: https://reviews.llvm.org/D86204	2020-08-20 11:47:20 +01:00
Sebastian Neubauer	b8d1994778	[AMDGPU] Add A16/G16 to InstCombine When sampling from images with coordinates that only have 16 bit accuracy, convert the image intrinsic call to use a16 or g16. This does only happen if the target hardware supports it. An alternative would be to always apply this combination, independent of the target hardware and extend 16 bit arguments to 32 bit arguments during legalization. To me, this sounds like an unnecessary roundtrip that could prevent some further InstCombine optimizations. Differential Revision: https://reviews.llvm.org/D85887	2020-08-20 10:51:49 +02:00
dfukalov	33e2f69a24	[AMDGPU][LoopUnroll] Increase BB size to analyze for complete unroll. The `UnrollMaxBlockToAnalyze` parameter is used at the stage when we have no information about a loop body BB cost. In some cases, e.g. for a simple loop ``` for(int i=0; i<32; ++i){ D = Arr2[i*8 + C1]; Arr1[i*64 + C2] += C3 * D; Arr1[i*64 + C2 + 2048] += C4 * D; } ``` the current default parameter value is not enough to run a deeper cost analysis, so the loop is not completely unrolled. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D86248	2020-08-20 10:41:47 +03:00
Yvan Roux	0459f29e8b	[ARM][MachineOutliner] Add default mode. Use the stack to save and restore the link register when there is no available register to do it. Differential Revision: https://reviews.llvm.org/D76069	2020-08-20 09:25:33 +02:00
Qiu Chaofan	131b3b9ed4	[PowerPC] Support constrained scalar fptosi/fptoui This patch adds support for constrained scalar fp to int operations on PowerPC. Besides, this fixes the FP exception bit of quad-precision convert & truncate instructions. Reviewed By: steven.zhang, uweigand Differential Revision: https://reviews.llvm.org/D81537	2020-08-20 13:29:43 +08:00
Matt Arsenault	31adc28d24	GlobalISel: Implement fewerElementsVector for G_CONCAT_VECTORS sources This fixes <6 x s16> = G_CONCAT_VECTORS from <3 x s16> handling.	2020-08-19 18:53:24 -04:00
Hiroshi Yamauchi	28ccc52c40	[X86] Add feature for Fast Short REP MOV (FSRM) for Icelake or newer. Differential Revision: https://reviews.llvm.org/D85989	2020-08-19 13:39:42 -07:00
Raul Tambre	e887d0e89b	[AArch64][GlobalISel] Handle rtcGPR64RegClassID in AArch64RegisterBankInfo::getRegBankFromRegClass() TargetRegisterInfo::getMinimalPhysRegClass() returns rtcGPR64RegClassID for X16 and X17, as it's the last matching class. This in turn gets passed to AArch64RegisterBankInfo::getRegBankFromRegClass(), which hits an unreachable. It seems sensible to handle this case, so copies from X16 and X17 work. Copying from X17 is used in inline assembly in libunwind for pointer authentication. Differential Revision: https://reviews.llvm.org/D85720	2020-08-19 12:52:30 -07:00
Matt Arsenault	9e8d59a9b8	AMDGPU/GlobalISel: Remove hack for combines forming illegal extloads Previously we weren't adding the LegalizerInfo to the post-legalizer combiner. Since that's fixed, we don't need to try to filter out the one case that was breaking.	2020-08-19 14:15:38 -04:00
Mehdi Amini	a407ec9b6d	Revert "Revert "[NFC][llvm] Make the contructors of `ElementCount` private."" Was reverted because MLIR/Flang builds were broken, these APIs have been fixed in the meantime.	2020-08-19 17:26:36 +00:00
Mehdi Amini	4fc56d70aa	Revert "[NFC][llvm] Make the contructors of `ElementCount` private." This reverts commit `264afb9e6a`. (and dependent `6b742cc48` and `fc53bd610f`) MLIR/Flang are broken.	2020-08-19 17:21:37 +00:00
Jessica Paquette	d25b12bdc3	[GlobalISel] Add combine for (x & mask) -> x when (x & mask) == x If we have a mask, and a value x, where (x & mask) == x, we can drop the AND and just use x. This is about a 0.4% geomean code size improvement on CTMark at -O3 for AArch64. In AArch64, this is most useful post-legalization. Patterns like this often show up when legalizing s1s, which must be extended to larger types. e.g. ``` %cmp:_(s32) = G_ICMP ... %and:_(s32) = G_AND %cmp, 1 ``` Since G_ICMP only produces a single bit, there's no reason to mask it with the G_AND. Differential Revision: https://reviews.llvm.org/D85463	2020-08-19 10:20:57 -07:00
Francesco Petrogalli	264afb9e6a	[NFC][llvm] Make the contructors of `ElementCount` private. Differential Revision: https://reviews.llvm.org/D86120	2020-08-19 16:26:44 +00:00
Simon Pilgrim	057bdd63a4	[X86][AVX] lowerShuffleWithVPMOV - minor refactor to more closely match lowerShuffleAsVTRUNC Replace isBuildVectorAllZeros check by using the Zeroable bitmask instead.	2020-08-19 14:34:32 +01:00
Simon Pilgrim	9fee2bad6d	[X86] lowerShuffleWithVPMOV - remove unnecessary shuffle commutation. NFCI. canonicalizeShuffleMaskWithCommute should have already ensured the lower elements are from V1, we do have test coverage for this already.	2020-08-19 13:28:59 +01:00
Simon Pilgrim	b61cef3a92	[X86][AVX] getAVX512TruncNode - don't truncate from illegal vector widths. Thanks to @fhahn for the test case.	2020-08-19 13:00:26 +01:00
Simon Pilgrim	80a0dc59b7	[X86][AVX] computeKnownBitsForTargetNode - add VTRUNC/VTRUNCS/VTRUNCUS known zero upper elements handling. Like many of the AVX512 conversion ops, the VTRUNC ops guarantee the upper destination elements are zero.	2020-08-19 11:39:27 +01:00
Simon Pilgrim	46fc9a0dfc	[X86][AVX] Fold store(extract_element(vtrunc)) to truncated store Add handling for storing the extracted lower (truncated bits) element from a X86ISD::VTRUNC node - this can be lowered to a generic truncated store directly. Differential Revision: https://reviews.llvm.org/D86158	2020-08-19 11:10:20 +01:00
Meera Nakrani	545de56f87	[ARM] Enabled VMLAV and Add instructions to use VMLAVA Used InstCombine to enable VMLAV and Add instructions to generate VMLAVA instead with tests.	2020-08-19 08:36:49 +00:00
luxufan	6c5039a10f	[RISCV] add the assemble and disassemble support of Zvlsseg instructions This implements the assemble and disassemble support of RISCV Vector extension Zvlsseg instructions, based on the 0.9 spec version. Reviewed by HsiangKai Differential Revision: https://reviews.llvm.org/D84416	2020-08-19 16:22:25 +08:00
Ronak Chauhan	fdf71d486c	Revert "[AMDGPU] Support disassembly for AMDGPU kernel descriptors" This reverts commit `cacfb02d28`. Reverting due to buildbot failures.	2020-08-19 13:12:29 +05:30
Ronak Chauhan	cacfb02d28	[AMDGPU] Support disassembly for AMDGPU kernel descriptors Decode AMDGPU Kernel descriptors as assembler directives. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D80713	2020-08-19 08:49:07 +05:30
Changpeng Fang	e7081d117a	AMDGPU: Implement waterfall loop for MIMG instructions with 256-bit SRsrc Summary: When the resource descriptor is of vgpr, we need a waterfall loop to read into a sgpr. In this patch, we generalized the implementation to work for any register class sizes, and extended the work to MIMG instructions. Fixes: SWDEV-223405 Reviewers: arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D82603	2020-08-18 16:27:36 -07:00
Craig Topper	9028c03ce6	[X86] Fix the Predicates on MMX_PSHUFWri/PSHUFWmi to include SSE1 in addition to MMX. These instructions weren't in the initial version of MMX, but were added when SSE1 was introduced. We already have the intrinsic named correctly to include sse and the frontend header enforces sse. We have one place in the backend where we DAG combine to this intrinsic, but that's also qualified. So don't know of anything currently broken unless someone writes their own IR and doesn't set the sse feature.	2020-08-18 14:28:26 -07:00
Eli Friedman	be944c85f3	[AArch64][SVE] Add patterns for integer mla/mls. We probably want to introduce pseudo-instructions at some point, like we have for binary operations, but this seems okay for now. One thing I'm not sure about is whether we should be doing this as a DAGCombine instead of directly pattern-matching it. I don't see any big downside to doing it this way, though. Differential Revision: https://reviews.llvm.org/D85681	2020-08-18 12:51:16 -07:00
Eli Friedman	bb18532399	[AArch64][SVE] Allow llvm.aarch64.sve.st2/3/4 with vectors of pointers. This isn't necessary for ACLE, but could be useful in other situations. And the change is simple. Differential Revision: https://reviews.llvm.org/D85251	2020-08-18 12:51:16 -07:00
Matt Arsenault	5a15f6628e	GlobalISel: Implement fewerElementsVector for G_INSERT_VECTOR_ELT Add unit tests since AMDGPU will only trigger this for gigantic vectors, and won't use the annoying odd sized breakdown case.	2020-08-18 13:51:19 -04:00
Simon Pilgrim	11ff5176c4	[X86][AVX] lowerShuffleWithVPMOV - add non-VLX support. We can efficiently handle non-VLX cases now that we have the getAVX512TruncNode helper.	2020-08-18 17:51:14 +01:00
Fangrui Song	c466c5fa7e	[ARM] Fix build after D86087	2020-08-18 09:20:32 -07:00
David Green	3471520b1f	[ARM] Allow tail predication of VLDn VLD2/4 instructions cannot be predicated, so we cannot tail predicate them from autovec. From intrinsics though, they should be valid as they will just end up loading extra values into off vector lanes, not affecting the on lanes. The same is true for loads in general where so long as we are not using the other vector lanes, an unpredicated load can be converted to a predicated one. This marks VLD2 and VLD4 instructions as validForTailPredication and allows any unpredicated load in tail predication loop, which seems to be valid given the other checks we have. Differential Revision: https://reviews.llvm.org/D86022	2020-08-18 17:15:45 +01:00
Sam Tebbs	31f02ac60a	[ARM] Use mov operand if the mov cannot be moved while tail predicating There are some cases where the instruction that sets up the iteration count for a tail predicated loop cannot be moved before the dlstp, stopping tail predication entirely. This patch checks if the mov operand can be used and if so, uses that instead. Differential Revision: https://reviews.llvm.org/D86087	2020-08-18 17:10:29 +01:00
jasonliu	f48eced390	[XCOFF] emit .rename for .lcomm when necessary Summary: This is a follow up for D82481. For .lcomm directive, although it's not necessary to have .rename emitted, it's still desirable to do it so that we do not see internal 'Rename..' get printed out in the symbol table. And we could have consistent naming between TC entry and .lcomm. And also have consistent naming between IR and final object file. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D86075	2020-08-18 15:32:45 +00:00
Simon Pilgrim	abd33bf5ef	[X86][AVX] lowerShuffleWithPERMV - pad 128/256-bit shuffles on non-VLX targets Allow non-VLX targets to use 512-bits VPERMV/VPERMV3 for 128/256-bit shuffles. TBH I'm not sure these targets actually exist in the wild, but we're testing for them and its good test coverage for shuffle lowering/combines across different subvector widths.	2020-08-18 15:46:02 +01:00
Simon Pilgrim	011bf4fd96	[X86][AVX] lowerShuffleWithVTRUNC - extend to support v16i16/v32i8 binary shuffles. This requires a few additional SrcVT vs DstVT padding cases in getAVX512TruncNode.	2020-08-18 15:30:02 +01:00
Simon Pilgrim	d5621b83a5	[X86][AVX] lowerShuffleWithVTRUNC - pull out TRUNCATE/VTRUNC creation into helper code. NFCI. Prep work toward adding v16i16/v32i8 support for lowerShuffleWithVTRUNC and improving lowerShuffleWithVPMOV.	2020-08-18 14:52:42 +01:00
Matt Arsenault	2f5f5febf3	AMDGPU/GlobalISel: Select llvm.amdgcn.groupstaticsize Previously, it would successfully select and assert if not HSA or PAL when expanding the pseudoinstruction. We don't need the pseudoinstruction anymore since we know the total size after legalization.	2020-08-18 09:28:01 -04:00
Matt Arsenault	3ba7777b94	AMDGPU/GlobalISel: Fix selection of s1/s16 G_[F]CONSTANT The code to determine the value size was overcomplicated and only correct in the case where the result register already had a register class assigned. We can always take the size directly from the register's type.	2020-08-18 09:28:01 -04:00
Simon Pilgrim	7db5124736	[X86][AVX] lowerShuffleWithVTRUNC - avoid unnecessary division in element counts. NFCI. (256 / SrcEltBits) == ((2 * EltSizeInBits * NumElts) / (EltSizeInBits * Scale)) == (2 * (NumElts / Scale)) == NumSrcElts	2020-08-18 13:48:22 +01:00
Paul Walker	9f63dc3265	[SVE] Fix shift-by-imm patterns used by asr, lsl & lsr intrinsics. Right shift patterns will no longer incorrectly accept a shift amount of zero. At the same time they will allow larger shift amounts that are now saturated to their upper bound. Patterns have been extended to enable immediate forms for shifts taking an arbitrary predicate. This patch also unifies the code path for immediate parsing so the i64 based shifts are no longer treated specially. Differential Revision: https://reviews.llvm.org/D86084	2020-08-18 11:41:26 +01:00
Paul Walker	cb5cc47a65	[SVE] Lower fixed length vector ISD::SPLAT_VECTOR operations. Also strengthens the CHECK lines for scalable vector splat tests. Differential Revision: https://reviews.llvm.org/D86070	2020-08-18 11:19:43 +01:00
Simon Pilgrim	d2057a8015	[X86][AVX] Lower v16i8/v8i16 binary shuffles using VTRUNC/TRUNCATE This patch adds lowerShuffleWithVTRUNC to handle basic binary shuffles that can be lowered either as a pure ISD::TRUNCATE or an X86ISD::VTRUNC (with undef/zero values in the remaining upper elements). We concat the binary sources together into a single 256-bit source vector. To avoid regressions we perform this after we've tried to lower with PACKS/PACKUS which typically does a cleaner job than a concat. For non-AVX512VL cases we have to canonicalize VTRUNC cases to use 512-bit source vectors (inserting undefs/zeros in the upper elements as necessary), truncate and then (possibly) extract the 128-bit result. This should address the last regressions in D66004 Differential Revision: https://reviews.llvm.org/D86093	2020-08-18 11:11:58 +01:00
Amy Kwan	c7ec3a7e33	[PowerPC] Implement Vector Extract Mask builtins in LLVM/Clang This patch implements the vec_extractm function prototypes in altivec.h in order to utilize the vector extract with mask instructions introduced in Power10. Differential Revision: https://reviews.llvm.org/D82675	2020-08-17 21:14:17 -05:00
Craig Topper	b673dfbb9a	[X86] When manually creating intrinsic nodes in X86ISelLowering, make sure we use getTargetConstant and pointer type for the intrinsic ID. Doesn't really matter in practice but that's how the nodes are normally created by SelectionDAGBuilder. So we should match. Found by temporarily hacking type checks into isel table.	2020-08-17 17:25:53 -07:00
Craig Topper	2ffa5d218f	[X86] Rename INTR_TYPE_4OP to INTR_TYPE_4OP_IMM8 and truncate immediates to MVT::i8 This makes sure VPTERNLOG is generated with MVT::i8 immediate as its SDNode declaration in X86InstrFragmentsSIMD.td declares.	2020-08-17 17:25:52 -07:00
Craig Topper	bc244f08cf	[X86] Truncate immediate to i8 for INTR_TYPE_3OP_IMM8 This is used for DBPSADBW which has a i32 immediate for its intrinsic and an i8 immediate in tablegen isel patterns.	2020-08-17 17:25:51 -07:00
Craig Topper	ab7151f1cf	[X86] Make PreprocessISelDAG create X86ISD::VRNDSCALE nodes with i32 constants instead of i8. This is the type declared in X86InstrFragmentsSIMD.td. ISel pattern matching doesn't check so it doesn't matter in practice. Maybe for SelectionDAG CSE it would matter.	2020-08-17 17:25:51 -07:00
Hongtao Yu	819b2d9c79	[llvm-objdump] Symbolize binary addresses for low-noise asm diff. When diffing the disassembly dumps of two binaries, I see a lot of noise from mismatched jump target addresses and global data references, which unnecessarily causes diffs in every function, making the diff impractical. I'm trying to symbolize the raw binary addresses to minimize the diff noise. In this change, a local branch target is modeled as a label and the branch target operand will simply be printed as a label. Local labels are collected by a separate pre-decoding pass beforehand. A global data memory operand will be printed as a global symbol instead of the raw data address. Unfortunately, due to the way the disassembler is set up and to be less intrusive, a global symbol is always printed as the last operand of a memory access instruction. This is less than ideal but is probably acceptable from a code-quality-checking point of view since on most targets an instruction can have at most one memory operand. So far only the X86 disassemblers are supported. Test Plan: llvm-objdump -d --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr : ``` Disassembly of section .text: <_start>: push rax mov dword ptr [rsp + 4], 0 mov dword ptr [rsp], 0 mov eax, dword ptr [rsp] cmp eax, dword ptr [rip + 4112] # 202182 <g> jge 0x20117e <_start+0x25> call 0x201158 <foo> inc dword ptr [rsp] jmp 0x201169 <_start+0x10> xor eax, eax pop rcx ret ``` llvm-objdump -d --symbolize-operands --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr : ``` Disassembly of section .text: <_start>: push rax mov dword ptr [rsp + 4], 0 mov dword ptr [rsp], 0 <L1>: mov eax, dword ptr [rsp] cmp eax, dword ptr <g> jge <L0> call <foo> inc dword ptr [rsp] jmp <L1> <L0>: xor eax, eax pop rcx ret ``` Note that without this work, jump instructions like `jge 0x20117e <_start+0x25>` are printed with a real target address and an offset from the leading symbol.
With a change in the optimizer that adds/deletes an instruction, the address and offset may shift for targets placed after the instruction. This will be a problem when diffing the disassembly from two optimizers where there are unnecessary false positives due to such branch target address changes. With `--symbolize-operands`, a label is printed for a branch target instead to reduce the false positives. Similarly, the disassembly of PC-relative global variable references is also prone to instruction insertion/deletion. Reviewed By: jhenderson, MaskRay Differential Revision: https://reviews.llvm.org/D84191	2020-08-17 16:55:12 -07:00
Kazushi (Jam) Marukawa	68cb29eff1	[VE] Modify ISelLowering following clang-tidy Modify case style of function names following clang-tidy. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D86076	2020-08-18 07:43:19 +09:00
Matt Arsenault	a9ee0589a8	AMDGPU/GlobalISel: Match global saddr addressing mode	2020-08-17 15:48:06 -04:00
Matt Arsenault	e1a2f4713c	AMDGPU: Match global saddr addressing mode The previous implementation was incorrect, and based off incorrect instruction definitions. Unfortunately we can't match natural addressing in a lot of cases due to the shift/scale applied in getelementptrs. This relies on reducing the 64-bit shift to 32-bits.	2020-08-17 15:28:14 -04:00
Stanislav Mekhanoshin	24182f14b6	[AMDGPU] Define spill opcodes for all AGPR sizes Since we have defined all these sizes I believe we shall be able to spill these as well. Differential Revision: https://reviews.llvm.org/D86098	2020-08-17 12:17:23 -07:00
Matt Arsenault	c8a9872259	AMDGPU/GlobalISel: Look through copies in getPtrBaseWithConstantOffset We may have an SGPR->VGPR copy if a totally uniform pointer calculation is used for a VGPR pointer operand. Also hack around a bug in MUBUF matching which would incorrectly use MUBUF for global when flat was requested. This should really be a predicate on the parent pattern, but the DAG always checked this manually inside the complex pattern.	2020-08-17 12:31:38 -04:00
Steven Perron	eed6476a87	Reset PAL metadata when AMDGPU target stream finishes If the same stream object is used for multiple compiles, the PAL metadata from earlier compilations will leak into later ones. See https://github.com/GPUOpen-Drivers/llpc/issues/882 for how this is happening in LLPC. No tests were added because multiple compiles will have to happen using the same pass manager, and I do not see a setup for that on the LLVM side. Let me know if there is a good way to test this. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D85667	2020-08-17 10:56:11 -04:00
Matt Arsenault	c7b9cd31bf	AMDGPU/GlobalISel: Fix missing 256-bit AGPR mapping	2020-08-17 09:53:26 -04:00
Matt Arsenault	af162ac785	AMDGPU/GlobalISel: Fix using readfirstlane with ballot intrinsics This should use the default mapping and insert a copy to the vcc bank, and not try to insert a readfirstlane.	2020-08-17 09:53:25 -04:00
Matt Arsenault	da3f357de6	AMDGPU: Don't look at dbg users for foldable operands These would have always failed to fold, so checking them or adding them to the fold candidates is useless.	2020-08-17 09:53:25 -04:00
Matt Arsenault	66ffa0e91f	AMDGPU/GlobalISel: Fix using post-legal combiner without LegalizerInfo	2020-08-17 09:19:22 -04:00
Matt Arsenault	e0375dbcb3	AMDGPU: Fix using wrong offsets for global atomic fadd intrinsics Global instructions have the signed offsets.	2020-08-17 09:19:15 -04:00
Sam Elliott	3f7068ad98	[RISCV] Enable the use of the old mucounteren name The RISC-V Privileged Specification 1.11 defines `mcountinhibit`, which has the same numeric CSR value as `mucounteren` from 1.09.1. This patch enables the use of the old `mucounteren` name. Patch by Yuichi Sugiyama. Reviewed By: lenary, jrtc27, pzheng Differential Revision: https://reviews.llvm.org/D85067	2020-08-17 13:11:49 +01:00
Sam Elliott	5f9ecc5d85	[RISCV] Indirect branch generation in position independent code This fixes the "Unable to insert indirect branch" fatal error sometimes seen when generating position-independent code. Patch by msizanoen1 Reviewed By: jrtc27 Differential Revision: https://reviews.llvm.org/D84833	2020-08-17 13:09:26 +01:00
Simon Pilgrim	1d2ede87ea	[X86][AVX] Move lowerShuffleWithVPMOV inside explicit shuffle lowering cases Perform lowerShuffleWithVPMOV as part of the v16i8/v8i16 shuffle lowering stages, which are the only types that are currently supported. We need to expand support for lowering shuffles as truncations to fix the remaining regressions in D66004	2020-08-17 11:58:51 +01:00
Kazushi (Jam) Marukawa	40f1e7e804	[VE] Support f128 Support f128 using VE instructions. Update regression tests. I've noticed there is no load or store i128 test, so I add them too. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D86035	2020-08-17 17:26:52 +09:00
Craig Topper	a206f85091	[X86] Reject dirflag in inline asm constraints other than clobber. Fixes the crash from PR47195.	2020-08-16 23:33:45 -07:00
Chen Zheng	4d52ebb9b9	[PowerPC] Make StartMI ignore COPY like instructions. Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D85659	2020-08-17 02:12:30 -04:00
Simon Pilgrim	f25d47b7ed	[X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for float types We can now enable this for AVX1 targets, where it can now assist with canonicalizeShuffleMaskWithHorizOp cleanup. There are still a few missed opportunities for merging subvector insert/extracts into shuffles, but they shouldn't cause any regressions now.	2020-08-16 15:00:41 +01:00
Simon Pilgrim	dca7eb7d60	[X86][SSE] Replace combineShuffleWithHorizOp with canonicalizeShuffleMaskWithHorizOp Instead of just attempting to fold shuffle(HOP,HOP) for a specific target shuffle, make this part of combineX86ShufflesRecursively so we can perform this on the combined shuffle chain, which is particularly useful for recognising more cases of where we're performing multiple HOPs that can be merged and pre-AVX where we don't have good blend/unary target shuffle support.	2020-08-16 12:26:27 +01:00
Simon Pilgrim	c27baa54b7	[X86] isRepeatedTargetShuffleMask - don't require specific MVT type. NFC. Split the isRepeatedTargetShuffleMask into a wrapper variant that takes a MVT describing the mask width, and an internal version that just needs the raw mask element bit size. This will be necessary for an upcoming change where the horizontal ops element width might not match the shuffle mask element width.	2020-08-16 11:51:44 +01:00
Amara Emerson	7006bb69ef	[GlobalISel] Enable copy-propagation in post-legalizer combiner. This cleans up copies that the legalizer or other combines leave around. They can occasionally end up escaping as moves. Differential Revision: https://reviews.llvm.org/D85964	2020-08-15 13:44:30 -07:00
Matt Arsenault	f0af434b79	AMDGPU: Remove register class params from flat memory patterns	2020-08-15 12:12:33 -04:00
Matt Arsenault	a7455652c0	AMDGPU: Fix global atomic saddr operand class	2020-08-15 12:12:28 -04:00
Matt Arsenault	625db2fe5b	AMDGPU: Remove slc from flat offset complex patterns This was always set to 0. Use a default value of 0 in this context to satisfy the instruction definition patterns. We can't unconditionally use SLC with a default value of 0 due to limitations in TableGen's handling of defaulted operands when followed by non-default operands.	2020-08-15 12:12:24 -04:00
Matt Arsenault	e5077b5c2a	AMDGPU: Fix matching wrong offsets for global atomic loads These used signed offsets with a different size.	2020-08-15 12:12:17 -04:00
Matt Arsenault	8cb022982a	AMDGPU: Remove redundant FLAT complex patterns These were identical to the non-atomic cases. I'm not sure why these were ever separated.	2020-08-15 12:12:01 -04:00
Matt Arsenault	47af1ac69a	AMDGPU: Correct definitions for global saddr instructions The VGPR component is a 32-bit offset, not 64-bits. I'm not sure what the correct syntax is for this. This maintains the vaddr position and leaves saddr in the end "off" position. This is particularly terrible for stores, since the operand order is now <vgpr offset>, <data>, <sgpr base>, splitting the pointer operands. I suppose this is a logical consequence from the mistake of not putting the data operand first. I'm not sure what sp3 does.	2020-08-15 12:11:57 -04:00
Matt Arsenault	79298a5067	AMDGPU: Remove SIFixupVectorISel pass This was only used for matching the saddr addressing mode of global instructions, but this was not implemented correctly. The instruction definitions aren't even correct, and are defined as using a 64-bit VGPR component. Eliminate this pass to enable correcting the instruction definitions. A new matching implementation can work in GlobalISel or relying on DAG divergence information for the base address.	2020-08-15 12:11:51 -04:00
Stanislav Mekhanoshin	43a38dc251	[AMDGPU] Fix MAI ld/st hazard handling It did not process hazard for ds_permute because it does not load or store even though it is DS. Differential Revision: https://reviews.llvm.org/D86003	2020-08-14 17:07:37 -07:00
Cameron McInally	92593f9e77	[SVE] Lower fixed length vXi32/vXi64 SDIV to scalable vectors. Differential Revision: https://reviews.llvm.org/D85982	2020-08-14 18:47:22 -05:00
Fangrui Song	58f5966d5b	Fix TargetSubtargetInfo derivatives after D85165	2020-08-14 15:50:53 -07:00
Craig Topper	c7a0b2684f	[X86][MC][Target] Initial backend support for a tune CPU to support -mtune This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from the target-cpu attribute or command line. This patch adds MC layer support for a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables. These feature lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all targets as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned. One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU. I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning. Differential Revision: https://reviews.llvm.org/D85165	2020-08-14 15:31:50 -07:00
Xiangling Liao	f759b4e43b	[AIX] Generate unique module id based on Pid and timestamp The module id, which is part of the sinit and sterm function names, must be unique. However, `getUniqueModuleId` will fail if there is no strong external symbol within a module. We fall back to using the Pid and timestamp when this happens. Differential Revision: https://reviews.llvm.org/D85527	2020-08-14 16:22:50 -04:00
Simon Pilgrim	e9eb2dc332	[X86][SSE] Fold HOP(SHUFFLE(X),SHUFFLE(Y)) --> SHUFFLE(HOP(X,Y)) This is beginning to look like a canonicalization stage that could be performed as part of shuffle combining Another step towards PR41813 Recommit of rG9bd97d036398 with fixed offset adjustments	2020-08-14 18:43:19 +01:00
Matt Arsenault	40a142fa57	AMDGPU/GlobalISel: Match andn2/orn2 for more types Unfortunately this ends up not working as expected on targets with 16-bit operations due to AMDGPUCodeGenPrepare's promotion of uniform 16-bit ops to i32. The vector case annoyingly requires switching the checked opcode, since constants for vectors aren't directly handled. I also need to think more carefully about whether this is valid for i1.	2020-08-14 13:18:03 -04:00
Kazushi (Jam) Marukawa	2f01af764b	[VE] Remove obsolete I8/I16 register classes Remove the I8/I16 register classes, which were previously prepared to implement the VE ABI. However, it is possible to implement the VE ABI correctly without them, so remove them now. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D85905	2020-08-14 21:52:22 +09:00
Sam Parker	eb82d58f83	[NFC][ARM] Port MaybeCall into ARMTTImpl method Renamed to maybeLoweredToCall.	2020-08-14 10:23:20 +01:00
Sebastian Neubauer	9aa0ff77bd	[AMDGPU] Enable .rodata for amdpal os PAL recently got support for multiple ELF sections and relocations, therefore we can now use .rodata sections instead of forcing constants into .text. Differential Revision: https://reviews.llvm.org/D85895	2020-08-14 09:05:48 +02:00
David Sherwood	6c7957c990	[SVE] Fix bug in SVEIntrinsicOpts::optimizePTest The code wasn't taking into account that the two operands passed to ptest could be identical and was trying to erase them twice. Differential Revision: https://reviews.llvm.org/D85892	2020-08-14 07:57:21 +01:00
David Green	0c390c22a5	Revert "[ARM] Fix IT block generation after Thumb2SizeReduce with -Oz" This reverts commit `18279a54b5` as it is causing some chromium android test problems.	2020-08-13 22:40:36 +01:00
Austin Kerbow	7d1cb187fb	[AMDGPU] Fix FP/BP spills when MUBUF constant offset exceeded If we need a scratch register for the spill, don't use the same scratch register that is being used for the MUBUF offset. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D85772	2020-08-13 14:12:00 -07:00
Thomas Lively	d53d952810	[WebAssembly] Allow inlining functions with different features Allow inlining only when the Callee has a subset of the Caller's features. In principle, we should be able to inline regardless of any features because WebAssembly supports features at module granularity, not function granularity, but without this restriction it would be possible for a module to "forget" about features if all the functions that used them were inlined. Requested in PR46812. Differential Revision: https://reviews.llvm.org/D85494	2020-08-13 13:57:43 -07:00
Cameron McInally	21810b0e14	[SVE] Lower fixed length vector integer UMIN/UMAX Differential Revision: https://reviews.llvm.org/D85926	2020-08-13 14:48:36 -05:00
Stanislav Mekhanoshin	0462aef5f3	[AMDGPU] Inhibit SDWA if target instruction has FI Differential Revision: https://reviews.llvm.org/D85918	2020-08-13 11:34:28 -07:00
Stanislav Mekhanoshin	d25cb5a8a2	[AMDGPU] Fix misleading SDWA verifier error. NFC. The old error from GFX9 shall be updated to GFX9+.	2020-08-13 11:32:17 -07:00
David Green	2632c625ed	[ARM] Mark VMINNMA/VMAXNMA as commutative These operations take Qda and Rn register operands, which are commutative so long as the instruction is not predicated. Differential Revision: https://reviews.llvm.org/D85813	2020-08-13 18:01:11 +01:00
Cameron McInally	e1a87f0a9b	[SVE] Lower fixed length vector integer SMIN/SMAX Differential Revision: https://reviews.llvm.org/D85855	2020-08-13 11:41:20 -05:00
Simon Pilgrim	cd3b850a4c	rG9bd97d0363987b582 - Revert "[X86][SSE] Fold HOP(SHUFFLE(X),SHUFFLE(Y)) --> SHUFFLE(HOP(X,Y))" This reverts commit `9bd97d0363`. Seeing some codegen issues in internal testing.	2020-08-13 15:21:15 +01:00
Carl Ritson	d538c5837a	[AMDGPU] Fix missed SI_RETURN_TO_EPILOG in pre-emit peephole SIPreEmitPeephole does not process all terminators, which means it can fail to handle SI_RETURN_TO_EPILOG if immediately preceded by a branch to the early exit block. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D85872	2020-08-13 21:52:41 +09:00
Simon Pilgrim	a31d20e67e	[X86][SSE] IsElementEquivalent - add HOP(X,X) support For HADD/HSUB/PACKS ops with repeated operands the lower/upper half element of each lane are known to be equivalent	2020-08-13 12:42:59 +01:00
Paul Walker	e63cc8105a	[SVE] Lower fixed length vector integer shifts. Differential Revision: https://reviews.llvm.org/D85724	2020-08-13 12:35:47 +01:00
Anna Welker	9eb9ba076a	[ARM][MVE] Fix for tail predication for loops containing MVE gather/scatters Fix to include the non-predicated version of the write-back gather in the special case treatment for deducing the instruction type. (This fixes https://reviews.llvm.org/D85138 for corner cases) Differential Revision: https://reviews.llvm.org/D85889	2020-08-13 12:24:19 +01:00
Paul Walker	130098228d	[SVE] Lower fixed length vector integer ISD::SETCC operations. Differential Revision: https://reviews.llvm.org/D85831	2020-08-13 12:01:56 +01:00
Paul Walker	9e04895258	[SVE] Lower fixed length integer extend operations. Differential Revision: https://reviews.llvm.org/D85640	2020-08-13 11:54:53 +01:00
Ruiling Song	18b1e67523	[AMDGPU] Fix crash when dag-combining bitcast From the code after the 'break', they are processing 64bit scalar and vector bitcast. So I think the break-condition should be (cond1 || cond2) This means we only execute following code if (64bit and dest-is-vector). Also remove a previous fix which is not needed with this new fix. (introduced in: `1349a04ef5`) Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D85804	2020-08-13 10:23:13 +08:00
Albion Fung	3136cbe29e	[PowerPC] Implement Vector Shift Builtins This patch implements the builtins for the vector shifts (shl, srl, sra), and adds the appropriate test cases for these builtins. The builtins utilize the vector shift instructions introduced within ISA 3.1. Differential Revision: https://reviews.llvm.org/D83338	2020-08-12 18:26:58 -05:00
Francesco Petrogalli	c561f4d2ec	[SVE][VLS] Don't combine logical AND. Testing is performed when targeting 128, 256 and 512-bit wide vectors. For 128-bit vectors, the original behavior of using NEON instructions is preserved. Differential Revision: https://reviews.llvm.org/D85479	2020-08-12 20:00:07 +01:00
Simon Pilgrim	39de63aef9	Fix signed/unsigned comparison warnings. NFC.	2020-08-12 19:22:13 +01:00
David Green	1bb3488685	[ARM] Predicated VFMA patterns Similar to the Two op + select patterns that were added recently, this adds some patterns for select + fma to turn them into predicated operations. Differential Revision: https://reviews.llvm.org/D85824	2020-08-12 18:35:01 +01:00
Simon Pilgrim	13d6cf0951	[X86][SSE] Pull out BUILD_VECTOR operand equivalence tests. NFC. Pull out element equivalence code from isShuffleEquivalent/isTargetShuffleEquivalent, I've also removed many of the index modulos where possible. First step toward simply adding some additional equivalence tests.	2020-08-12 18:20:18 +01:00
Craig Topper	5f7cdb2eff	[X86][GlobalISel] Legalize G_ICMP results to s8. We need to produce a setcc instruction which has an 8-bit result. This gets rid of a bunch of cases that were using the s1->s8/s16/s32/s64 handling in selectZExt. I'm not very familiar with GlobalISel yet so I'm not yet sure the best way to do things. I'd especially like feedback on the best way to handle the currently split 32-bit and 64-bit mode handling. Differential Revision: https://reviews.llvm.org/D85814	2020-08-12 10:13:59 -07:00
Cameron McInally	ce2c991061	[SVE] Lower fixed length FP minnum/maxnum Lower fixed length MINNUM/MAXNUM to scalable vectors. Cherry-picked from D71767 with added tests. Differential Revision: https://reviews.llvm.org/D85744	2020-08-12 12:02:52 -05:00
Krzysztof Parzyszek	a2dc19b81b	[Hexagon] Return scalar size in getMinVectorRegisterBitWidth() when no HVX This fixes https://llvm.org/PR47128.	2020-08-12 10:13:58 -05:00
Anna Welker	4fe5615eab	[ARM][MVE] Enable tail predication for loops containing MVE gather/scatters Widen the scope of memory operations that are allowed to be tail predicated to include gathers and scatters, such that loops that are auto-vectorized with the option -enable-arm-maskedgatscat (and actually end up containing an MVE gather or scatter) can be tail predicated. Differential Revision: https://reviews.llvm.org/D85138	2020-08-12 15:32:37 +01:00
Matt Arsenault	e14474a39a	AMDGPU/GlobalISel: Select llvm.amdgcn.global.atomic.fadd Remove the intermediate transform in the DAG path. I believe this is the last non-deprecated intrinsic that needs handling.	2020-08-12 10:04:53 -04:00
Matt Arsenault	701228c411	AMDGPU: Handle intrinsics in performMemSDNodeCombine This avoids a possible regression in a future patch	2020-08-12 10:04:53 -04:00
Sam Parker	ea8448e361	[LoopUnroll] Adjust CostKind query When TTI was updated to use an explicit cost, TCK_CodeSize was used although the default implicit cost would have been the hand-wavey cost of size and latency. So, revert back to this behaviour. This is not expected to have (much) impact on targets since most (all?) of them return the same value for SizeAndLatency and CodeSize. When optimising for size, the logic has been changed to query CodeSize costs instead of SizeAndLatency. This patch also adds a testing option in the unroller so that OptSize thresholds can be specified. Differential Revision: https://reviews.llvm.org/D85723	2020-08-12 12:56:09 +01:00
Simon Pilgrim	9bd97d0363	[X86][SSE] Fold HOP(SHUFFLE(X),SHUFFLE(Y)) --> SHUFFLE(HOP(X,Y)) This is beginning to look like a canonicalization stage that could be performed as part of shuffle combining Another step towards PR41813	2020-08-12 12:16:36 +01:00
Simon Pilgrim	a0c2c6aa42	[X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for float types Only do this for AVX2+ targets as we still get some regressions on AVX1 without PERMPD/PERMQ	2020-08-12 11:31:05 +01:00
Sjoerd Meijer	6716e7868e	[ARM][MVE] tail-predication: overflow checks for backedge taken count. This picks up the work on the overflow checks for get.active.lane.mask, which ensure that it is safe to insert the VCTP intrinsic that enables tail-predication. For a 2d auto-correlation kernel and its inner loop j: M = Size - i; for (j = 0; j < M; j++) Sum += Input[j] * Input[j+i]; For this inner loop, the SCEV backedge taken count (BTC) expression is: {(-1 + (sext i16 %Size to i32)),+,-1}<nw><%for.body> and LoopUtil cannotBeMaxInLoop couldn't calculate a bound on this, thus "BTC cannot be max" could not be determined. So overflow behaviour had to be assumed in the loop tripcount expression that uses the BTC. As a result tail-predication had to be forced (with an option) for this case. This change solves that by using ScalarEvolution's helper getConstantMaxBackedgeTakenCount which is able to determine the range of BTC, thus can determine it is safe, so that we no longer need to force tail-predication as reflected in the changed test cases. Differential Revision: https://reviews.llvm.org/D85737	2020-08-12 09:32:26 +01:00
David Sherwood	88bbd30736	[SVE][CodeGen] Fix issues with EXTRACT_SUBVECTOR when using scalable FP vectors In this patch I have fixed two issues: 1. Our SVE tuple get/set intrinsics were using the wrong constant type for the index passed to EXTRACT_SUBVECTOR. I have fixed this by using the function SelectionDAG::getVectorIdxConstant to create the value. Also, I have updated the documentation for EXTRACT_SUBVECTOR describing what type the constant index should be and we now enforce this when creating the node. 2. The AArch64 backend was missing the appropriate patterns for extracting certain subvectors (nxv4f16 and nxv2f32) from legal SVE types. I have added them as part of this patch. The only way that I could find to test the new patterns was to use the SVE tuple get intrinsics, although I realise it looks a bit unusual. Tests added here: test/CodeGen/AArch64/sve-extract-subvector.ll Differential Revision: https://reviews.llvm.org/D85516	2020-08-12 08:35:46 +01:00
Kazushi (Jam) Marukawa	5d549219df	[VE] Change to promote i32 AND/OR/XOR operations VE has only 64-bit AND/OR/XOR instructions. We pretended that VE has 32-bit instructions also, but doing so increases the number of generated instructions. Therefore, we decided to promote 32-bit operations and use only 64-bit instructions in the back end. We also avoid pretending that VE has a 32-bit LEA instruction. Update regression tests also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D85726	2020-08-12 16:23:50 +09:00
Craig Topper	6b3dc96e59	[X86][GlobalISel] Replace a misuse of SUBREG_TO_REG with INSERT_SUBREG. SUBREG_TO_REG is supposed to be used when we know the producing instruction already zeroed the bits we're extending. But that's not the case here. So INSERT_SUBREG with an IMPLICIT_DEF is the correct thing to use.	2020-08-11 23:51:02 -07:00
Jordan Rupprecht	1a67522d3e	[NFC] Inline variable only used in debug builds	2020-08-11 19:38:01 -07:00
Thomas Lively	2985c02f79	[WebAssembly][AsmParser] Name missing features in error message Rather than just saying that some feature is missing, report the exact features to make the error message more useful and actionable. Differential Revision: https://reviews.llvm.org/D85795	2020-08-11 17:26:14 -07:00
Jian Cai	277873ce0f	[AARCH64] [MC] add memtag as an alias of mte architecture extension Add memtag as an alias of the mte architecture extension to be consistent with GNU as. LINK:https://sourceware.org/bugzilla/show_bug.cgi?id=26339 Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D85620	2020-08-11 13:28:47 -07:00
Thomas Lively	1a69f02397	[WebAssembly][NFC] Replace WASM with standard Wasm The officially specified abbreviation for WebAssembly is Wasm and the spec explicitly calls out WASM as being an incorrect spelling. This patch fixes a few comments and error messages to use the spec-compliant abbreviation. Differential Revision: https://reviews.llvm.org/D85764	2020-08-11 12:27:59 -07:00
diggerlin	e9ac1495e2	[AIX][XCOFF] change the operand of branch instruction from symbol name to qualified symbol name for function declarations SUMMARY: 1. This patch removes the setting of the storage class in the function getXCOFFSection and in the constructor of class MCSectionXCOFF. The MCSectionXCOFF class has the members XCOFF::StorageMappingClass MappingClass; XCOFF::SymbolType Type; XCOFF::StorageClass StorageClass; these attributes are only used in the XCOFFObjectWriter (the asm path does not need the StorageClass), and we need to get the values of StorageClass, Type and MappingClass every time before we invoke getXCOFFSection. Actually, we can get the StorageClass of the MCSectionXCOFF from its delegated symbol. 2. We also change the operand of the branch instruction from the symbol name to the qualified symbol name. For example, change bl .foo extern .foo to bl .foo[PR] extern .foo[PR]. 3. If a function bar is referenced through an indirect call, we also add extern .bar[PR]. Reviewers: Jason liu, Xiangling Liao Differential Revision: https://reviews.llvm.org/D84765	2020-08-11 15:26:19 -04:00
Jessica Paquette	bebe6a6449	[GlobalISel] Combine (logic_op (op x...), (op y...)) -> (op (logic_op x, y)) This implements ``` (logic_op (op x...), (op y...)) -> (op (logic_op x, y)) ``` when `op` is an extend, a shift, or an and. This is similar to `DAGCombiner::hoistLogicOpWithSameOpcodeHands` (with a bunch of missing cases, e.g. G_TRUNC, G_BITCAST, etc.) This is implemented so it works both pre and post-legalization. This also adds a general way to add a series of instructions in a combine. (`applyBuildInstructionSteps`). Differential Revision: https://reviews.llvm.org/D85050	2020-08-11 10:40:06 -07:00
Simon Pilgrim	2655bd51d6	[X86][SSE] combineShuffleWithHorizOp - canonicalize SHUFFLE(HOP(X,Y),HOP(Y,X)) -> SHUFFLE(HOP(X,Y)) Attempt to canonicalize binary shuffles of HOPs with commuted operands to a unary shuffle.	2020-08-11 18:13:03 +01:00
Eric Christopher	8155cb27a2	Fold Opcode into assert uses to fix an unused variable warning without asserts.	2020-08-11 09:30:51 -07:00
Simon Pilgrim	fe1f36986b	[X86][SSE] combineShuffleWithHorizOp - avoid unnecessary subtraction. NFCI. We can safely replace ((M - NumElts) % NumEltsPerLane) with (M % NumEltsPerLane) as the modulo result will be the same.	2020-08-11 17:07:32 +01:00
Matt Arsenault	0dc4c36d3a	AMDGPU/GlobalISel: Manually select llvm.amdgcn.writelane Fixup the special case constant bus handling pre-gfx10.	2020-08-11 11:56:16 -04:00
Simon Pilgrim	91d59cbf1b	[X86][SSE] Add HADD/SUB support to combineHorizOpWithShuffle Handles some HOP(SHUFFLE,SHUFFLE) patterns and sets us up to improve some of the cases mentioned in PR41813.	2020-08-11 16:14:14 +01:00
Matt Arsenault	076305568c	AMDGPU/GlobalISel: Prepare for more custom load lowerings Slight restructuring of the code to avoid formatting changes when more cases are handled here.	2020-08-11 11:09:05 -04:00
Matt Arsenault	e2f1b48f86	GlobalISel: Implement bitcast action for G_INSERT_VECTOR_ELT This mirrors the support for the equivalent extracts. This also creates a huge mess that would be greatly improved if we had any bit operation combines.	2020-08-11 10:39:14 -04:00
Matt Arsenault	53f21e0fb7	TableGen/GlobalISel: Hack the operand order for atomic_store ISD::ATOMIC_STORE arbitrarily has the operands in the opposite order from regular ISD::STORE, which always introduced an annoying duplication of patterns to handle both cases. Since in GlobalISel there's just the one G_STORE, we need to swap the operands to correctly emit the type check for the pointer operand. Some work started in `20aafa3156` to migrate SelectionDAG to use ISD::STORE for atomics, but that work seems to have stalled. Since this is pretty much the last operation that matters and isn't supported for AMDGPU, use this compatibility hack to unblock declaring it functionally complete. Not sure what's going on with the pending_phis AArch64 test. It seems it didn't always use atomics, and I'm not sure what it was originally testing matters anymore.	2020-08-11 10:22:44 -04:00
Kerry McLaughlin	85c7e89f3b	[CodeGen] Refactor getMemBasePlusOffset & getObjectPtrOffset to accept a TypeSize Changes the Offset arguments to both functions from int64_t to TypeSize & updates all uses of the functions to create the offset using TypeSize::Fixed() Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85220	2020-08-11 12:17:10 +01:00
Kazushi (Jam) Marukawa	59703f1736	[VE] Update bit operations Change bitreverse/bswap/ctlz/ctpop/cttz regression tests to support i128 and signext/zeroext i32 types. This patch also changes the way i32 types are supported, using 64-bit VE instructions. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D85712	2020-08-11 19:42:12 +09:00
Paul Walker	b6c7b7fa31	[SVE] Add ISD nodes for predicated integer extend inreg operations. These are useful instructions when lowering fixed length vector extends, so I've broken this patch out as kind of NFC like work. Differential Revision: https://reviews.llvm.org/D85546	2020-08-11 11:39:26 +01:00
Simon Pilgrim	49016eeab6	[X86] Rename combineVectorPackWithShuffle -> combineHorizOpWithShuffle. NFC. The plan is to use this for (F)HADD/SUB opcodes as well as PACKs - similar to how we use combineShuffleWithHorizOp	2020-08-11 11:38:43 +01:00
Paul Walker	d542feb8e4	[SVE] Lower fixed length vector integer subtract operations. Differential Revision: https://reviews.llvm.org/D85665	2020-08-11 11:32:12 +01:00
Kai Nacke	d6f710fd46	[NFC] Fix typo in comment. Twelvth -> Twelfth	2020-08-11 05:27:56 -04:00
Craig Topper	9201efb3b9	[X86] Custom match X86ISD::VPTERNLOG in X86ISelDAGToDAG in order to reduce isel patterns. By factoring out the end of tryVPTERNLOG, we can use the same code to directly match X86ISD::VPTERNLOG. This allows us to remove around 3-4K worth of X86GenDAGISel.inc.	2020-08-10 23:15:58 -07:00
Wang, Pengfei	9512525947	[X86][FPEnv] Teach X86 mask compare intrinsics to respect strict FP semantics. When we use mask compare intrinsics under the strict FP option, the masked elements shouldn't raise any exception. So, we can't replace the intrinsic with a full compare + "and" operation. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D85385	2020-08-11 10:28:41 +08:00
jasonliu	20abff0481	[XCOFF][AIX] Use TE storage mapping class when large code model is enabled Summary: Use TE SMC instead of TC SMC in large code model mode, so that large code model TOC entries could get placed after all the small code model TOC entries, which reduces the chance of TOC overflow. Reviewed By: Xiangling_L Differential Revision: https://reviews.llvm.org/D85455	2020-08-10 19:52:10 +00:00
Puyan Lotfi	7bc03f5553	[MachineOutliner][AArch64] WA for multiple stack fixup cases in MachineOutliner. In cases where MachineOutliner candidates either: * are noreturn * have calls with no available LR or free regs * don't use SP we can end up hitting stack fixup code for the caller and the callee for a FrameID of MachineOutlinerDefault. This triggers the assert: `assert(OF.FrameConstructionID != MachineOutlinerDefault && "Can only fix up stack references once");` in AArch64InstrInfo.cpp. This assert exists for now because a lot of the fixup code is not tested to handle fixing up more than once and needs some better checks and enhancements to avoid potentially generating illegal code. I've filed a Bugzilla report to track this until these cases are handled by the AArch64 MachineOutliner: https://bugs.llvm.org/show_bug.cgi?id=46767 This diff detects cases that will cause these multiple stack fixups and prunes the Candidates from `RepeatedSequenceLocs`. Differential Revision: https://reviews.llvm.org/D83923	2020-08-10 15:43:30 -04:00
Matt Arsenault	6fe6b29c29	AMDGPU: Fix assertion in performSHLPtrCombine for 64-bit pointers	2020-08-10 13:46:52 -04:00
Matt Arsenault	68fab44acf	AMDGPU: Fix visiting physreg dest users when folding immediate copies This can fold the immediate into the physical destination, but this should not look for further users of the register. Fixes regression introduced by `766cb615a3`.	2020-08-10 13:46:51 -04:00
Craig Topper	96dfc783b2	[BreakFalseDeps][X86] Move operand loop out of X86's getUndefRegClearance and put in the pass. X86 is the only user of this interface in tree. Previously the X86 pass would loop over operands looking for one undef operand for the pass to fix. But there could theoretically be multiple operands to fix. So it makes more sense for the pass to do the looping and ask the target if an operand needs to be fixed.	2020-08-10 10:32:29 -07:00
Wouter van Oortmerssen	582fd474dd	[WebAssembly] wasm64: fix memory.init operand types I had assumed they would all become i64, but this is not necessary as long as data segments stay 32-bit, see: https://github.com/WebAssembly/memory64/blob/master/proposals/memory64/Overview.md Differential Revision: https://reviews.llvm.org/D85552	2020-08-10 10:15:20 -07:00
Simon Pilgrim	9a368d2b00	[X86][SSE] shuffle(hop,hop) - canonicalize unary hop(x,x) shuffle masks If a shuffle is referring to both the lower and upper half lanes of a unary horizontal op, then canonicalize the mask to only refer to the lower half.	2020-08-10 16:09:27 +01:00
jasonliu	7866442b3f	[XCOFF] Adjust .rename emission sequence Summary: The AIX assembler does not generate the correct relocation when .rename appears between the tc entry label and the .tc directive. So only emit .rename after .tc/.comm or other linkage is emitted. Reviewed By: daltenty, hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D85317	2020-08-10 14:48:24 +00:00
Xiangling Liao	6ef801aa6b	[AIX] Static init frontend recovery and backend support On the frontend side, this patch recovers AIX static init implementation to use the linkage type and function names Clang chooses for sinit related function. On the backend side, this patch sets correct linkage and function names on aliases created for sinit/sterm functions. Differential Revision: https://reviews.llvm.org/D84534	2020-08-10 10:10:49 -04:00
Simon Pilgrim	07e673a02b	[X86][SSE] Pull out shuffle(hop,hop) combine into combineShuffleWithHorizOp helper. NFC.	2020-08-10 15:08:57 +01:00
Stefan Pintilie	81883ca074	[PowerPC] Add option to control PCRel GOT indirect linker optimization Add a hidden option to the compiler to control the PC Relative GOT indirect linker optimization. If this option is set to false the compiler will no longer produce the relocations required by the linker to perform the optimization. Reviewed By: nemanjai, NeHuang, #powerpc Differential Revision: https://reviews.llvm.org/D85377	2020-08-10 09:07:17 -05:00
Sam Parker	4f9f4b21e0	[ARM] Unrestrict Armv8-a IT when at minsize IT blocks with more than one instruction were performance deprecated in Armv8 but that doesn't mean we should follow that advice when optimising for size. Differential Revision: https://reviews.llvm.org/D85638	2020-08-10 14:59:53 +01:00
Simon Pilgrim	e6dc2c8ce7	[X86][SSE] combineTargetShuffle - rearrange shuffle(hop,hop) matching to delay shuffle mask manipulation. NFC. Check that we're shuffling hadd/pack ops first before altering shuffle masks. First step towards adding extra functionality, plus it avoids costly shuffle mask manipulation if not necessary.	2020-08-10 14:13:19 +01:00
Matt Arsenault	40188f807d	AMDGPU/GlobalISel: Don't try to handle undef source operand This is now illegal MIR	2020-08-10 08:49:43 -04:00
Matt Arsenault	f9c279b057	PeepholeOptimizer: Use Register	2020-08-10 08:49:36 -04:00
Matt Arsenault	a0ec81f70d	AMDGPU/GlobalISel: Merge load/store select cases	2020-08-10 08:46:26 -04:00
Matt Arsenault	c8b17874e5	AMDGPU/GlobalISel: Fix typo	2020-08-10 08:41:17 -04:00
Matt Arsenault	9533f0ea68	AMDGPU/GlobalISel: Use nicer form of buildInstr	2020-08-10 08:41:07 -04:00
Qiu Chaofan	dbcfbffc7a	[PowerPC] Add intrinsic to read or set FPSCR register This patch introduces two intrinsics: llvm.ppc.setflm and llvm.ppc.readflm. They read from or write to FPSCR register (floating-point status & control) which contains rounding mode and exception status. To ensure correctness of program, we need to prevent FP operations from being moved across these intrinsics (mffs/mtfsf instruction), so here I set them as scheduling boundaries. We can relax such restriction if FPSCR is modeled well in the future. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D84914	2020-08-10 18:27:45 +08:00
Petar Avramovic	0d58d9e8fb	AMDGPU/GlobalISel: Lower G_FREM Add custom lower for G_FREM. Differential Revision: https://reviews.llvm.org/D84324	2020-08-10 10:10:46 +02:00
Piotr Sobczak	62d8b8a225	Fix 64-bit copy to SCC Fix 64-bit copy to SCC by restricting the pattern resulting in such a copy to subtargets supporting 64-bit scalar compare, and mapping the copy to S_CMP_LG_U64. Before introducing the S_CSELECT pattern with explicit SCC (`0045786f14`), there was no need for handling 64-bit copy to SCC ($scc = COPY sreg_64). The proposed handling to read only the low bits was however based on a false premise that it is only one bit that matters, while in fact the copy source might be a vector of booleans and all bits need to be considered. The practical problem of mapping the 64-bit copy to SCC is that the natural instruction to use (S_CMP_LG_U64) is not available on old hardware. Fix it by restricting the problematic pattern to subtargets supporting the instruction (hasScalarCompareEq64). Differential Revision: https://reviews.llvm.org/D85207	2020-08-09 20:50:30 +02:00
David Green	186a7f81e8	[ARM] Add VADDV and VMLAV patterns for v16i16 This adds patterns for v16i16's vecreduce, using all the existing code to go via an i32 VADDV/VMLAV and truncating the result. Differential Revision: https://reviews.llvm.org/D85452	2020-08-09 11:09:49 +01:00
David Green	8590e5abad	[ARM] Allow vecreduce_add in tail predicated loops This allows vecreduce_add in loops so that we can tailpredicate them. Differential Revision: https://reviews.llvm.org/D85454	2020-08-09 10:57:17 +01:00
David Green	296faa91ed	[ARM] Some formatting and predicate VRHADD patterns. NFC This formats some of the MVE patterns, and adds a missing Predicates = [HasMVEInt] to some VRHADD patterns I noticed while going through. Although I don't believe NEON would ever use the patterns (as it would use ADDL and VSHRN instead) they should ideally be predicated on having MVE instructions.	2020-08-09 10:07:52 +01:00
Craig Topper	bc8be30540	[X86][GlobalISel] Remove unneeded code for handling zext i8->16, i8->i64, i16->i64, i32->i64. These all seem to be handled by tablegen pattern imports.	2020-08-09 00:26:15 -07:00
Thomas Lively	cc612c2908	[WebAssembly] Fix FastISel address calculation bug Fixes PR47040, in which an assertion was improperly triggered during FastISel's address computation. The issue was that an `Address` set to be relative to the FrameIndex with offset zero was incorrectly considered to have an unset base. When the left hand side of an add set the Address to be 0 off the FrameIndex, the right side would not detect that the Address base had already been set and could try to set the Address to be relative to a register instead, triggering an assertion. This patch fixes the issue by explicitly tracking whether an `Address` has been set rather than interpreting an offset of zero to mean the `Address` has not been set. Differential Revision: https://reviews.llvm.org/D85581	2020-08-08 15:23:11 -07:00
Craig Topper	d3153b5ca2	[X86] Remove a DCI.isBeforeLegalize() call from combineVSelectWithAllOnesOrZeros. This was blocking the isTypeLegal call so that we could do a particular transform on illegal types before type legalization. But then we create a target specific node using that type. We shouldn't do that if the type isn't legal. So I think we should just always make sure the type is legal. I suspect that in order to get the condition VT to not be a vector of i1 we already completed type legalization anyway so this probably doesn't matter much in practice.	2020-08-08 14:19:13 -07:00
Craig Topper	966a58e329	[X86] Support matching VPTERNLOG when the root node is X86ISD::ANDNP.	2020-08-08 13:11:47 -07:00
Dávid Bolvanský	c814eca3e4	[AArch64RegisterInfo] Suppress new warning	2020-08-08 21:47:01 +02:00
Craig Topper	815a9b256b	[X86] Remove isSafeToClobberEFLAGS helper and just inline it into the call sites. This is just a thin wrapper around computeRegisterLiveness which we can just call directly. The only real difference is that isSafeToClobberEFLAGS returns a bool and computeRegisterLiveness returns an enum. So we need to check for the specific enum value that isSafeToClobberEFLAGS was hiding. I've also adjusted which sites pass an explicit value for Neighborhood since the default for computeRegisterLiveness is 10.	2020-08-08 12:31:58 -07:00
Craig Topper	8d3ae64b04	Recommit "[X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places" I messed up the bug numbers in the commit message before. Previously this function searched 4 instructions forwards or backwards to determine if it was ok to clobber eflags. This is called in 3 places: rematerialization, turning 2 operand leas into adds or splitting 3 ops leas into an lea and add on some CPU targets. This patch increases the search limit to 10 instructions for rematerialization and 2 operand lea to add. I've left the old threshold for 3 ops lea splitting as that increases code size. Fixes PR47024 and PR46315.	2020-08-08 11:53:14 -07:00
Craig Topper	761f568420	Revert "[X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places" This reverts commit `44b260cb0a`. I messed up the bug number in the commit message so I'm reverting to fix it.	2020-08-08 11:53:14 -07:00
Simon Pilgrim	cc15380f10	[X86][SSE] combineTargetShuffle - use scaleShuffleMask helper to widen shuffle mask. NFCI. Use scaleShuffleMask helper for the shuffle(hadd,hadd) canonicalization.	2020-08-08 19:36:18 +01:00
Craig Topper	44b260cb0a	[X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places Previously this function searched 4 instructions forwards or backwards to determine if it was ok to clobber eflags. This is called in 3 places: rematerialization, turning 2 operand leas into adds or splitting 3 ops leas into an lea and add on some CPU targets. This patch increases the search limit to 10 instructions for rematerialization and 2 operand lea to add. I've left the old threshold for 3 ops lea splitting as that increases code size. Fixes PR47024 and PR43014	2020-08-08 11:29:41 -07:00
Craig Topper	514b00c439	[X86] Limit the scope of the min/max canonicalization in combineSelect Previously the transform was doing these two canonicalizations (x > y) ? x : y -> (x >= y) ? x : y (x < y) ? x : y -> (x <= y) ? x : y But those don't seem to be useful generally. And they actively pessimize the cases in PR47049. This patch limits it to (x > 0) ? x : 0 -> (x >= 0) ? x : 0 (x < -1) ? x : -1 -> (x <= -1) ? x : -1 These are the cases mentioned in the comments as the motivation for the canonicalization. These allow the CMOV to use the S flag from the compare thus improving opportunities to use a TEST or the flags from an arithmetic instruction.	2020-08-07 22:51:49 -07:00
Keno Fischer	c58674df14	[X86] Don't produce bad x86andp nodes for i1 vectors In D85499, I attempted to fix this same issue by canonicalizing andnp for i1 vectors, but since there was some opposition to such a change, this commit just fixes the bug by using two different forms depending on which kind of vector type is in use. We can then always decide to switch the canonical forms later. Description of the original bug: We have a DAG combine that tries to fold (vselect cond, 0000..., X) -> (andnp cond, x). However, it does so by attempting to create an i64 vector with the number of elements obtained by truncating division by 64 from the bitwidth. This is bad for mask vectors like v8i1, since that division is just zero. Besides, we don't want i64 vectors anyway. For i1 vectors, switch the pattern to (andnp (not cond), x), which is the canonical form for `kandn` on mask registers. Fixes https://github.com/JuliaLang/julia/issues/36955. Differential Revision: https://reviews.llvm.org/D85553	2020-08-07 20:05:47 -04:00
Matt Arsenault	3c0597a9e4	AMDGPU: Avoid explicitly listing all the memory nodes	2020-08-07 19:22:46 -04:00
Arthur Eubanks	1bf4629f11	[PPC] Rename bool-ret-to-int -> ppc-bool-ret-to-int Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D85391	2020-08-07 11:27:05 -07:00
Vang Thao	04bd5b5286	[AMDGPU] Fix not rescheduling without clustering Regions are sometimes skipped which should be rescheduled without memory op clustering. RegionIdx is not incremented when iterating over regions that are flagged to be skipped, causing the index to be incorrect. Thanks to Vang Thao for discovering this bug! Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D85498	2020-08-07 11:15:58 -07:00
Amy Kwan	98eccec3ae	[PowerPC] Add Vector Extract/Expand/Count with Mask, Move to VSR Mask Instruction Definitions and MC Tests This patch adds the instruction definitions and assembly/disassembly tests for the following set of instructions: Vector Extract [byte | half | word | doubleword | quad] with mask Vector Expand [byte | half | word | doubleword | quad] with mask Move to VSR [byte | byte immediate | half | word | doubleword | quad] with mask Vector Count Mask Bits [byte | half | word | doubleword] Differential Revision: https://reviews.llvm.org/D83724	2020-08-07 11:02:08 -05:00
Kamau Bridgeman	d8c6d083c9	[PowerPC][PCRelative] Set TLS unsupported with PC relative memops Introduce a fatal error if any thread local storage code is compiled using pc relative memory operations as well as a hidden override option `-enable-ppc-pcrel-tls` so that this support can be incrementally added if possible. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D85448	2020-08-07 10:56:24 -05:00
Bevin Hansson	5de6c56f7e	[Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. Summary: This patch adds two intrinsics, llvm.sshl.sat and llvm.ushl.sat, which perform signed and unsigned saturating left shift, respectively. These are useful for implementing the Embedded-C fixed point support in Clang, originally discussed in http://lists.llvm.org/pipermail/llvm-dev/2018-August/125433.html and http://lists.llvm.org/pipermail/cfe-dev/2018-May/058019.html Reviewers: leonardchan, craig.topper, bjope, jdoerfert Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83216	2020-08-07 15:09:24 +02:00
Kazushi (Jam) Marukawa	63bc5d7863	[VE] Change to expand multiply related instructions Change to expand MULHU/MULHS/UMUL_LOHI/SMUL_LOHI for i32 and i64 since those instructions are not available on Aurora SX VE. Some of them are used in expansion of i128 multiply, so need to modify them to support i128. Then, update basic arithmetic regression tests of i128 and signed/unsigned i32 typed integer values. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D85490	2020-08-07 18:22:25 +09:00
David Sherwood	0905d9f31e	[SVE][CodeGen] Fix bug with store of unpacked FP scalable vectors Fixed an incorrect pattern in lib/Target/AArch64/AArch64SVEInstrInfo.td for storing out <vscale x 2 x f32> unpacked scalable vectors. Added a couple of tests to test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll Differential Revision: https://reviews.llvm.org/D85441	2020-08-07 07:19:09 +01:00
biplmish	cce1b0e891	[PowerPC] Implement Vector Extract Low/High Order Builtins in LLVM/Clang This patch implements the function prototypes vec_extractl and vec_extracth in altivec.h to utilize the vector extract double element instructions introduced in Power10. Differential Revision: https://reviews.llvm.org/D84622	2020-08-07 01:02:29 -05:00
QingShan Zhang	55de46f3b2	[PowerPC] Support constrained fp operation for setcc The constrained fp operation fcmp was added by https://reviews.llvm.org/D69281. This patch is trying to add the support for PowerPC backend. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D81727	2020-08-07 05:16:36 +00:00
Kazushi (Jam) Marukawa	f92e0d9384	[VE] Optimize trunc related instructions Change to not generate truncate instructions if none of the uses of a truncate operation care about the higher bits. For example, an i32 add instruction doesn't care about the higher 32 bits in 64-bit registers. Updates regression tests also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D85418	2020-08-07 09:21:05 +09:00
Yonghong Song	c50f5dece9	BPF: fix libLLVMBPFCodeGen.so build failure Buildbot reported a build failure when building shared library libLLVMBPFCodeGen.so with unknown reference to "createCFGSimplificationPass". Commit `87cba43402` ("BPF: add a SimplifyCFG IR pass during generic Scalar/IPO optimization") added an IR pass SimplifyCFG by BPF target. The commit called function createCFGSimplificationPass() defined in "Scalar" library. Add this library in Target/BPF/LLVMBuild.txt so shared library build can succeed.	2020-08-06 15:27:15 -07:00
Matt Arsenault	87b2af8140	AMDGPU/GlobalISel: Enable s_{and|or}n2_{b32|b64} patterns	2020-08-06 18:00:38 -04:00
Yonghong Song	87cba43402	BPF: add a SimplifyCFG IR pass during generic Scalar/IPO optimization The following bpf linux kernel selftest failed with latest llvm: $ ./test_progs -n 7/10 ... The sequence of 8193 jumps is too complex. verification time 126272 usec stack depth 320 processed 114799 insns (limit 1000000) ... libbpf: failed to load object 'pyperf600_nounroll.o' test_bpf_verif_scale:FAIL:110 #7/10 pyperf600_nounroll.o:FAIL #7 bpf_verif_scale:FAIL After some investigation, I found the following llvm patch https://reviews.llvm.org/D84108 is responsible. The patch disabled hoisting common instructions in SimplifyCFG by default. Later on, the code changes and a SimplifyCFG phase with hoisting on cannot do the work any more. A test is provided to demonstrate the problem. The IR before simplifyCFG looks like: for.cond: %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ] %cmp = icmp ult i32 %i.0, 6 br i1 %cmp, label %for.body, label %for.cond.cleanup for.cond.cleanup: %2 = load i8, i8* %frame_ptr, align 8, !tbaa !2 %cmp2 = icmp eq i8* %2, null %conv = zext i1 %cmp2 to i32 call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %1) #3 call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %0) #3 ret i32 %conv for.body: %3 = load i8, i8* %frame_ptr, align 8, !tbaa !2 %tobool.not = icmp eq i8* %3, null br i1 %tobool.not, label %for.inc, label %land.lhs.true The first two insns of `for.cond.cleanup` and `for.body`, load and icmp, can be hoisted to `for.cond` block. With Patch D84108, the optimization is delayed. But unfortunately, later on loop rotation added additional phi nodes to `for.body` and hoisting cannot be done any more. Note such a hoisting is beneficial to bpf programs as bpf verifier does path sensitive analysis and verification. The hoisting prevents reloading from stack which will assume conservative value and increase exploited insns. In this case, it caused verifier failure. 
To fix this problem, I added an IR pass from the bpf target to perform an additional simplifycfg with hoisting of common insts enabled. Differential Revision: https://reviews.llvm.org/D85434	2020-08-06 13:16:00 -07:00
dfukalov	4ccc38813e	[AMDGPU][CostModel] Add f16, f64 and contract cases to fused costs estimation. Add cases of fused fmul+fadd/fsub with f16 and f64 operands to cost model. Also added operations with contract attribute. Fixed line endings in test. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D84995	2020-08-06 21:43:27 +03:00
Matt Arsenault	e00201539f	GlobalISel: Implement fewerElementsVector for G_EXTRACT_VECTOR_ELT Use the same basic strategy as LegalizeVectorTypes. Try to index into smaller pieces if there's a constant index, and otherwise fall back to a stack temporary.	2020-08-06 14:33:16 -04:00
Matt Arsenault	1a0c0944c6	AMDGPU: Define raw/struct variants of buffer atomic fadd Somehow the new FP atomic buffer intrinsics ended up using the legacy style for buffer intrinsics.	2020-08-06 13:36:19 -04:00
Matt Arsenault	eae9c54148	AArch64/GlobalISel: Fix verifier error after selecting returnaddress This was caching the wrong register to re-use later.	2020-08-06 13:18:05 -04:00
Matt Arsenault	90eb7d5283	AMDGPU: Fix spilling of 96-bit AGPRs	2020-08-06 12:42:07 -04:00
Matt Arsenault	56270d1d42	AMDGPU/GlobalISel: Start trying to handle AGPR bank Try to use AGPR banks for the various merge/unmerge type operations. Previously these would introduce copies to VGPR.	2020-08-06 12:39:50 -04:00
Matt Arsenault	34040a4f61	GlobalISel: Define InvalidRegBankID enum value	2020-08-06 12:39:49 -04:00
Matt Arsenault	63cdc9a49f	AMDGPU/GlobalISel: Handle llvm.amdgcn.ds.{fadd|fmin|fmax} These intrinsics are missing mangling for both the pointer and data type.	2020-08-06 11:09:08 -04:00
Matt Arsenault	63c4be53cf	AMDGPU/GlobalISel: Try to promote to use packed saturating add/sub This produces worse results right now for i8 vectors, but that should be addressed when we actually try to optimize packed vectors.	2020-08-06 11:08:45 -04:00
Matt Arsenault	dcf3ffb0a8	AMDGPU/GlobalISel: Move frame index selection to patterns Doesn't really save any code until global value is handled too.	2020-08-06 10:42:15 -04:00
Matt Arsenault	d188a608bd	AMDGPU: Fix code duplication between the selectors Not sure this is the right place for this helper.	2020-08-06 10:42:15 -04:00
Matt Arsenault	5a503521e7	AMDGPU/GlobalISel: Implement expansion for rsq.clamp Not sure why we handle this removed instruction on newer subtargets for this one and no others, but maintain compatibility with the DAG.	2020-08-06 10:23:25 -04:00
Matt Arsenault	c015cbc68b	AMDGPU/GlobalISel: Fix trying to widen <3 x s1> boolean ops	2020-08-06 10:07:22 -04:00
Matt Arsenault	28124a0a63	AMDGPU/GlobalISel: Stop using G_EXTRACT in argument lowering We really need to put this undef padding stuff into a helper somewhere, but leave that for when this is moved to generic code.	2020-08-06 09:55:35 -04:00
Matt Arsenault	6c7f640bf7	AMDGPU/GlobalISel: Implement LLT version of allowsMisalignedMemoryAccesses	2020-08-06 09:50:36 -04:00
Matt Arsenault	37894ba661	AMDGPU/GlobalISel: Make s16 phi legal If we were to have an operation with an s16 def that needs to be executed in a waterfall loop, not having s16 legal would place an avoidable burden on RegBankSelect to widen it.	2020-08-06 09:41:14 -04:00
Matt Arsenault	5316256709	AMDGPU/GlobalISel: Fix assert on copy to vcc This was trying to constrain a physical register. By the verifier's understanding, it's impossible to have a 1-bit copy to vcc/vcc_lo so don't try to handle physregs.	2020-08-06 09:41:14 -04:00
Paul Walker	0d33a8ef5b	[SVE] Lower scalable vector mul operations. This allows us to remove extra patterns from AArch64SVEInstrInfo.td because we can reuse those required for fixed length vectors. Differential Revision: https://reviews.llvm.org/D85328	2020-08-06 11:15:35 +01:00
Paul Walker	3ed59b775d	[SVE] Implement lowering for fixed length vector multiplication. NOTE: Also uses SVE code generation for NEON size vectors, instead of expanding i64 based vector multiplications. Differential Revision: https://reviews.llvm.org/D85327	2020-08-06 11:01:39 +01:00
Martin Storsjö	f5e6fbac24	[AArch64] [Windows] Error out on unsupported symbol locations These might occur in seemingly generic assembly. Previously when targeting COFF, they were silently ignored, which certainly won't give the right result. Instead clearly error out, to make it clear that the assembly needs to be adjusted for this target. Also change a preexisting report_fatal_error into a proper error message, pointing out the offending source instruction. This isn't strictly an internal error, as it can be triggered by user input. Differential Revision: https://reviews.llvm.org/D85242	2020-08-06 09:23:46 +03:00
Martin Storsjö	5eedc01a82	[ARM, AArch64] Fix a comment typo. NFC.	2020-08-06 09:23:45 +03:00
Craig Topper	0215ae9735	[X86] Remove incomplete custom handling of i128 sdivrem/udivrem on Windows. We need to have special handling of i128 div/rem on Windows due to a weird calling convention needed for the libcall. There was also some code that made it look like we do the same for sdivrem/udiv, but the code didn't account for multiple return values of those functions so couldn't possibly work. I think this code never triggers because we don't have libcall names defined for those functions by default so DAGCombine never creates DIVREM nodes.	2020-08-05 23:01:07 -07:00
Matt Arsenault	0ee1eba581	AMDGPU: Remove ATOMIC_PK_FADD The f32 and v2f16 cases should be handled the same way.	2020-08-05 22:00:52 -04:00
Ruiling Song	5ddc8b49ba	[AMDGPU] add buffer_atomic_swap for float The functionality is used when calling imageAtomicExchange() on float type imageBuffer in Graphics shaders. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D85187	2020-08-06 09:45:48 +08:00
Craig Topper	08b2d0a963	[X86] Disable copy elision in LowerMemArgument for scalarized vectors when the loc VT is a different size than the original element. For example a v4f16 argument is scalarized to 4 i32 values. So the values are spread out instead of being packed tightly like in the original vector. Fixes PR47000.	2020-08-05 15:44:54 -07:00
Stanislav Mekhanoshin	0bcda1a261	[AMDGPU] Scavenge temp reg for AGPR spill Differential Revision: https://reviews.llvm.org/D85234	2020-08-05 13:29:19 -07:00
Matt Arsenault	ec8c172d01	AMDGPU: Correct prolog SP initialization logic Having callees that will read SP is not the only reason we need to reference the stack pointer.	2020-08-05 15:47:53 -04:00
Stanislav Mekhanoshin	ea7d0e2996	[AMDGPU] gfx1031 target Differential Revision: https://reviews.llvm.org/D85337	2020-08-05 12:36:26 -07:00
Matt Arsenault	83eaf5d55d	AMDGPU: Eliminate BUFFER_ATOMIC_PK_ADD_F16 node This is redundant with the other no return buffer atomic node, and we don't really need a separate type profile for it.	2020-08-05 15:16:51 -04:00
Matt Arsenault	43c0c9252a	AMDGPU: Refactor buffer atomic intrinsic lowering Move raw/struct buffer atomic lowering to separate functions. This avoids a long nested switch, and simplifies a future patch.	2020-08-05 14:44:55 -04:00
Matt Arsenault	3e52667433	AMDGPU: Fix verifier error with undef source producing s_bitset* This needs to preserve the undef flag.	2020-08-05 14:42:20 -04:00
Simon Pilgrim	b60f998859	[X86][SSE] Fold 128-bit PACK(EXTEND(X),EXTEND(Y)) -> CONCAT(X,Y) subvectors This is seen in the sub-128-bit vector trunc(ext()) of comparison results Fixes pr46585.ll regression in D66004	2020-08-05 18:27:40 +01:00
Simon Pilgrim	6a06c7a0a7	[X86] isHorizontalBinOp - only update LHS/RHS references on success We've had issues in the past where isHorizontalBinOp calls would affect later combines as the LHS/RHS references had been commuted but still failed to match.	2020-08-05 15:09:52 +01:00
Simon Pilgrim	a57bfb44bc	[X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for integer types	2020-08-05 15:09:51 +01:00
Sam Parker	f2675ab45f	[ARM][CostModel] Implement getCFInstrCost As with other targets, set the throughput cost of control-flow instructions to free so that we don't miss out on vectorization opportunities. Differential Revision: https://reviews.llvm.org/D85283	2020-08-05 12:44:51 +01:00
Paul Walker	927fc536ca	[SVE] Add lowering for fixed length vector and, or & xor operations. Since there are no ill effects when performing these operations with undefined elements, they are lowered to the already supported unpredicated scalable vector equivalents. Differential Revision: https://reviews.llvm.org/D85117	2020-08-05 11:28:34 +01:00
Sander de Smalen	f2916636f8	[AArch64][SVE] Disable tail calls if callee does not preserve SVE regs. This fixes an issue triggered by the following code, where emitEpilogue got confused when trying to restore the SVE registers after the call, whereas the call to bar() is implemented as a TCReturn: int non_sve(); int sve(svint32_t x) { return non_sve(); } Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84869	2020-08-05 09:38:54 +01:00
Jay Foad	8cbf4a17ac	[AMDGPU] Propagate fast math flags in frem lowering Differential Revision: https://reviews.llvm.org/D84518	2020-08-05 09:09:38 +01:00
Jay Foad	04cf4a5a65	[AMDGPU] Lower frem f16 Without this it would fail to select on subtargets that have 16-bit instructions. Differential Revision: https://reviews.llvm.org/D84517	2020-08-05 09:08:40 +01:00
Yonghong Song	00602ee7ef	BPF: simplify IR generation for __builtin_btf_type_id() This patch simplifies IR generation for __builtin_btf_type_id(). For __builtin_btf_type_id(obj, flag), previously the IR builtin looked like if (obj is an lvalue) llvm.bpf.btf.type.id(obj.ptr, 1, flag) !type else llvm.bpf.btf.type.id(obj, 0, flag) !type The purpose of the 2nd argument is to differentiate __builtin_btf_type_id(obj, flag) where obj is an lvalue vs. __builtin_btf_type_id(obj.ptr, flag) Note that obj or obj.ptr is never used by the backend and the `obj` argument is only used to derive the type. This code sequence is subject to potential llvm CSE when - obj is the same, e.g., nullptr - flag is the same - metadata type is different, e.g., typedef of struct "s" and struct "s". In the above, we don't want CSE since their metadata is different. This patch changes the IR builtin to llvm.bpf.btf.type.id(seq_num, flag) !type and seq_num is always increasing. This will prevent potential llvm CSE. Also report an error if the type name is empty for remote relocation since remote relocation needs non-empty type name to do relocation against vmlinux. Differential Revision: https://reviews.llvm.org/D85174	2020-08-04 16:29:42 -07:00
Arthur Eubanks	f50b3ff02e	[Hexagon] Use InstSimplify instead of ConstantProp This is the last remaining use of ConstantProp, migrate it to InstSimplify in the goal of removing ConstantProp. Add -hexagon-instsimplify option to enable skipping of instsimplify in tests that can't handle the extra optimization. Differential Revision: https://reviews.llvm.org/D85047	2020-08-04 15:42:39 -07:00
Krzysztof Parzyszek	09897b146a	[RDF] Remove uses of RDFRegisters::normalize (deprecate) This function has been reduced to an identity function for some time.	2020-08-04 17:02:12 -05:00
Matt Arsenault	486e84dfa4	AMDGPU/GlobalISel: Use live in helper function for returnaddress	2020-08-04 17:36:01 -04:00
Matt Arsenault	89011fc3c9	AMDGPU/GlobalISel: Select llvm.returnaddress	2020-08-04 17:14:38 -04:00
Matt Arsenault	f8fb7835d6	GlobalISel: Add utility for getting function argument live ins Get the argument register and ensure there's a copy to the virtual register. AMDGPU and AArch64 have similarish code to get the livein value, and I also want to use this in multiple places. This is a bit more aggressive about setting the register class than the original function, but that's probably OK. I think we're missing a few verifier checks for function live ins. I noticed AArch64's calling convention code is not actually adding liveins to functions, only the entry block (which apparently might not matter that much?). There should probably be a verifier check that entry block live ins are also live into the function. We also might need a verifier check that the copy to the livein virtual register is in the entry block.	2020-08-04 16:55:55 -04:00
Eli Friedman	95efea4b93	[AArch64][SVE] Widen narrow sdiv/udiv operations. The SVE instruction set only supports sdiv/udiv for 32-bit and 64-bit integers. If we see an 8-bit or 16-bit divide, widen the operands to 32 bits, and narrow the result. Differential Revision: https://reviews.llvm.org/D85170	2020-08-04 13:22:15 -07:00
Yonghong Song	6d218b4adb	BPF: support type exist/size and enum exist/value relocations Four new CO-RE relocations are introduced: - TYPE_EXISTENCE: whether a typedef/record/enum type exists - TYPE_SIZE: the size of a typedef/record/enum type - ENUM_VALUE_EXISTENCE: whether an enum value of an enum type exists - ENUM_VALUE: the enum value of an enum type These additional relocations will make CO-RE bpf programs more adaptive for potential kernel internal data structure changes. Differential Revision: https://reviews.llvm.org/D83878	2020-08-04 12:35:39 -07:00
David Blaikie	e31cfc4cd3	Fix -Wconstant-conversion warning with explicit cast Introduced by `fd6584a220` Following similar use of casts in AsmParser.cpp, for instance - ideally this type would use unsigned chars as they're more representative of raw data and don't get confused around implementation defined choices of char's signedness, but this is what it is & the signed/unsigned conversions are (so far as I understand) safe/bit preserving in this usage and what's intended, given the API design here.	2020-08-04 10:41:27 -07:00
Matt Arsenault	0de547ed4a	AMDGPU/GlobalISel: Ensure subreg is valid when selecting G_UNMERGE_VALUES Fixes verifier error with SGPR unmerges with 96-bit result types.	2020-08-04 12:27:34 -04:00
Nemanja Ivanovic	14d726acd6	[PowerPC] Don't remove single swap between the load and store The swap removal pass looks to remove swaps when a loaded value is swapped, some number of lane-insensitive operations are performed and then the value is swapped again and stored. However, in a situation where we load the value, swap it and then store it without swapping again, the pass erroneously removes the single swap. The reason is that both checks in the same equivalence class pass: - load feeds a swap - swap feeds a store. However, there is no check that the two swaps are actually a single swap. This patch just fixes that. Differential revision: https://reviews.llvm.org/D84785	2020-08-04 10:38:15 -05:00
Jay Foad	28e322ea93	[PowerPC] Custom lowering for funnel shifts The custom lowering saves an instruction over the generic expansion, by taking advantage of the fact that PowerPC shift instructions are well defined in the shift-by-bitwidth case. Differential Revision: https://reviews.llvm.org/D83948	2020-08-04 16:30:49 +01:00
Jay Foad	8ec8ad868d	[AMDGPU] Use fma for lowering frem This gives shorter f64 code and perhaps better accuracy. Differential Revision: https://reviews.llvm.org/D84516	2020-08-04 16:18:23 +01:00
Simon Pilgrim	6f0da46d53	[X86] getFauxShuffleMask - drop unnecessary computeKnownBits OR(X,Y) shuffle decoding. Now that rG47cea9e82dda941e lets us aggressively decode multi-use shuffles for the OR(SHUFFLE(),SHUFFLE()) case we don't need the computeKnownBits variant any more.	2020-08-04 15:57:47 +01:00
Simon Pilgrim	051f293b78	[X86] Remove unused canScaleShuffleElements helper The only use was removed at rG36750ba5bd0e9e72 Thanks to @nemanjai for the heads up	2020-08-04 14:51:23 +01:00
Simon Pilgrim	36750ba5bd	[X86][AVX] isHorizontalBinOp - relax lane-crossing limits for AVX1-only targets. Permit lane-crossing post shuffles on AVX1 targets as long as every element comes from the same source lane, which for v8f32/v4f64 cases can be efficiently lowered with the LowerShuffleAsLanePermuteAnd* style methods.	2020-08-04 14:27:01 +01:00
Sander de Smalen	bb3344c7d8	[AArch64][SVE] Add missing unwind info for SVE registers. This patch adds a CFI entry for each SVE callee saved register that needs unwind info at an offset from the CFA. The offset is a DWARF expression because the offset is partly scalable. The CFI entries only cover a subset of the SVE callee-saves and only encodes the lower 64-bits, thus implementing the lowest common denominator ABI. Existing unwinders may support VG but only restore the lower 64-bits. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84044	2020-08-04 11:47:06 +01:00
Sander de Smalen	fd6584a220	[AArch64][SVE] Fix CFA calculation in presence of SVE objects. The CFA is calculated as (SP/FP + offset), but when there are SVE objects on the stack the SP offset is partly scalable and should instead be expressed as the DWARF expression: SP + offset + scalable_offset * VG where VG is the Vector Granule register, containing the number of 64-bit 'granules' in a scalable vector. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84043	2020-08-04 11:47:06 +01:00
Paul Walker	4be13b15d6	[SVE] Replace remaining _MERGE_OP1 nodes with _PRED variants. This is the final bit of work to relax the register allocation requirements when code generating normal LLVM IR, which rarely care about the result of inactive lanes. By using _PRED nodes we can make better use of SVE's reversed instructions. Also removes a redundant parameter from the min/max tests. Differential Revision: https://reviews.llvm.org/D85142	2020-08-04 11:19:17 +01:00
Meera Nakrani	20283ff491	[ARM] Generated SSAT and USAT instructions with shift Added patterns so that both SSAT and USAT instructions are generated with shifts. Added corresponding regression tests. Differential Review: https://reviews.llvm.org/D85120	2020-08-04 09:38:17 +00:00
Simon Pilgrim	47cea9e82d	Revert rG66e7dce714fab "Revert "[X86][SSE] Shuffle combine blends to OR(X,Y) if the relevant elements are known zero."" [X86][SSE] Shuffle combine blends to OR(X,Y) if the relevant elements are known zero (REAPPLIED) This allows us to remove the (depth violating) code in getFauxShuffleMask where we were combining the OR(SHUFFLE,SHUFFLE) shuffle inputs as well, and not just the OR(). This is a minor step toward being able to shuffle combine from/to SELECT/BLENDV as a faux shuffle. Reapplied with fixed signed/unsigned comparisons.	2020-08-04 10:32:39 +01:00
Florian Hahn	f7658241cb	[AArch64] Consider instruction-level contract FMFs in combiner patterns. Currently, instruction level fast math flags are not considered when generating patterns for the machine combiner. This currently leads to some missed opportunities to generate FMAs in combination with `#pragma clang fp contract (fast)`. For example, when building the example below with -O3 for AArch64, no FMADD is generated. If built with -O2 and the DAGCombiner is used instead of the MachineCombiner for FMAs, an FMADD is generated. With this patch, the same code is generated in both cases. float madd_contract(float a, float b, float c) { #pragma clang fp contract (fast) return (a * b) + c; } Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D84930	2020-08-04 10:25:16 +01:00
Qiu Chaofan	6a78a8dd37	[NFC] [PowerPC] Refactor fp/int conversion lowering For FP_TO_INT and INT_TO_FP lowering, we have direct-move and non-direct-move methods. But they share some conversion logic, so we can reduce redundant code by introducing new methods. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D81818	2020-08-04 15:48:16 +08:00
Wang, Pengfei	6bc7ea2d8d	[X86][AVX512] Fix build fail after D81548 Test function mask_cmp_128 failed during ISEL LLVM ERROR: Cannot select: t37: v8i1 = X86ISD::KSHIFTL t48, TargetConstant:i8<4> due to v8i1 only available under AVX512DQ. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D84922	2020-08-04 12:31:04 +08:00
Chen Zheng	45c46d180e	[PowerPC] mark r+i as legal address mode for vector type after pwr9 Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D84735	2020-08-04 00:02:37 -04:00
Carl Ritson	57899934ea	[AMDGPU] Make GCNRegBankReassign assign based on subreg banks When scavenging consider the sub-register of the source operand to determine the bank of a candidate register (not just sub0). Without this it is possible to introduce an infinite loop, e.g. $sgpr15_sgpr16_sgpr17 can be assigned for a conflict between $sgpr0 and SGPR_96:sub1. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D84910	2020-08-04 12:54:44 +09:00
Chen Zheng	ba955397ac	[SCEVExpander][PowerPC]clear scev rewriter before deleting instructions. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D85130	2020-08-03 20:36:08 -04:00
Christopher Tetreault	c9e6887f83	[SVE] Remove bad calls to VectorType::getNumElements() from X86 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D85156	2020-08-03 16:34:10 -07:00
hgreving	509f5c4ec2	[MC] Fix memory leak when allocating MCInst with bump allocator Adds the function createMCInst() to MCContext that creates a MCInst using a typed bump allocator. MCInst contains a SmallVector<MCOperand, 8>. The SmallVector is POD only for <= 8 operands. The default untyped bump pointer allocator of MCContext does not delete the MCInst, so if the SmallVector grows, it's a leak. This fixes https://bugs.llvm.org/show_bug.cgi?id=46900.	2020-08-03 16:08:26 -07:00
Christopher Tetreault	3b92db4c84	[SVE] Remove bad call to VectorType::getNumElements() from AMDGPU Differential Revision: https://reviews.llvm.org/D85151	2020-08-03 15:56:10 -07:00
Christopher Tetreault	b5059b7140	[SVE] Remove bad call to VectorType::getNumElements() from ARM Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D85152	2020-08-03 15:41:14 -07:00
Jordan Rupprecht	af3ec731d5	[NFC][ARM] Silence unused variable in release builds	2020-08-03 15:21:44 -07:00
Christopher Tetreault	b43791e701	[SVE] Remove bad calls to VectorType::getNumElements() from PowerPC Differential Revision: https://reviews.llvm.org/D85154	2020-08-03 15:15:20 -07:00
Mitch Phillips	9a05fa10bd	[HWASan] [GlobalISel] Add +tagged-globals backend feature for GlobalISel GlobalISel is the default ISel for aarch64 at -O0. Prior to D78465, GlobalISel didn't have support for dealing with address-of-global lowerings, so it fell back to SelectionDAGISel. HWASan Globals require special handling, as they contain the pointer tag in the top 16-bits, and are thus outside the code model. We need to generate a `movk` in the instruction sequence with a G3 relocation to ensure the bits are relocated properly. This is implemented in SelectionDAGISel, this patch does the same for GlobalISel. GlobalISel and SelectionDAGISel differ in their lowering sequence, so there are differences in the final instruction sequence, explained in `tagged-globals.ll`. Both of these implementations are correct, but GlobalISel is slightly larger code size / slightly slower (by a couple of arithmetic instructions). I don't see this as a problem for now as GlobalISel is only on by default at `-O0`. Reviewed By: aemerson, arsenm Differential Revision: https://reviews.llvm.org/D82615	2020-08-03 14:28:44 -07:00
David Green	22916481c1	[ARM] Convert VPSEL to VMOV in tail predicated loops VPSEL has slightly different semantics under tail predication (it can end up selecting from Qn, Qm and Qd). We do not model that at the moment so they block tail predicated loops from being formed. This just converts them into a predicated VMOV instead (via a VORR), allowing tail predication to happen whilst still modelling the original behaviour of the input. Differential Revision: https://reviews.llvm.org/D85110	2020-08-03 22:03:14 +01:00
Thomas Lively	cb32792210	[WebAssembly] Implement prototype v128.load{32,64}_zero instructions Specified in https://github.com/WebAssembly/simd/pull/237, these instructions load the first vector lane from memory and zero the other lanes. Since these instructions are not officially part of the SIMD proposal, they are only available on an opt-in basis via LLVM intrinsics and clang builtin functions. If these instructions are merged to the proposal, this implementation will change so that the instructions will be generated from normal IR. At that point the intrinsics and builtin functions would be removed. This PR also changes the opcodes for the experimental f32x4.qfm{a,s} instructions because their opcodes conflicted with those of the v128.load{32,64}_zero instructions. The new opcodes were chosen to match those used in V8. Differential Revision: https://reviews.llvm.org/D84820	2020-08-03 13:54:00 -07:00
Mitch Phillips	66e7dce714	Revert "[X86][SSE] Shuffle combine blends to OR(X,Y) if the relevant elements are known zero." This reverts commit `219f32f4b6`. Commit contains unsigned comparisons that break bots that build with -Wsign-compare.	2020-08-03 13:48:30 -07:00
Eli Friedman	dca23ed895	[AArch64] Add missing isel patterns for fcvtzs/u intrinsic on v1f64. Fixes test-suite compile failure caused by `8dfb5d7`. While I'm in the area, add some more test coverage to related operations, to make sure we aren't missing any other patterns.	2020-08-03 13:04:59 -07:00
Jian Cai	c6334db577	[X86] support .nops directive Add support of .nops on X86. This addresses llvm.org/PR45788. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D82826	2020-08-03 11:50:56 -07:00
Joao Moreira	f208c659fb	[X86] Make ENDBR instruction a scheduling boundary Instructions should not be scheduled across ENDBR instructions, as this would result in the ENDBR being displaced, breaking the parity needed for the Indirect Branch Tracking feature of CET. Currently, the X86IndirectBranchTracking pass is later than the instruction scheduling in the pipeline, which causes the bug to be unnoticeable and very hard (if not unfeasible) to be triggered while compiling C files with the standard LLVM setup. Yet, for correctness and to prevent issues in future changes, the compiler should prevent such scheduling. Differential Revision: https://reviews.llvm.org/D84862	2020-08-03 10:47:23 -07:00
Simon Pilgrim	219f32f4b6	[X86][SSE] Shuffle combine blends to OR(X,Y) if the relevant elements are known zero. This allows us to remove the (depth violating) code in getFauxShuffleMask where we were combining the OR(SHUFFLE,SHUFFLE) shuffle inputs as well, and not just the OR(). This is a minor step toward being able to shuffle combine from/to SELECT/BLENDV as a faux shuffle.	2020-08-03 18:32:47 +01:00
Craig Topper	ac82b918c7	[X86] Use h-register for final XOR of __builtin_parity on 64-bit targets. This adds an isel pattern and special XOR8rr_NOREX instruction to enable the use of h-registers for __builtin_parity. This avoids a copy and a shift instruction. The NOREX instruction is in case register allocation doesn't use the matching l-register for some reason. If a R8-R15 register gets picked instead, we won't be able to encode the instruction since an h-register can't be used with a REX prefix. Fixes PR46954	2020-08-03 10:10:17 -07:00
Cameron McInally	31c7a2fd5c	[FPEnv] Don't transform FSUB(-0,X)->FNEG(X) in SelectionDAGBuilder. This patch stops unconditionally transforming FSUB(-0,X) into an FNEG(X) while building the DAG. There is also one small change to handle the new FSUB(-0,X) similarly to FNEG(X) in the AMDGPU backend. Differential Revision: https://reviews.llvm.org/D84056	2020-08-03 10:22:25 -05:00
Matt Arsenault	2414bab5d7	AMDGPU/GlobalISel: Remove old hacks for boolean selection There were various hacks used to try to avoid making s1 SGPR vs. s1 VCC ambiguous after constraining the register before we had a strategy to deal with this. This also attempted to handle undef operands, which are now illegal gMIR.	2020-08-03 09:04:14 -04:00
Matt Arsenault	fd63e46941	AMDGPU/GlobalISel: Apply load bitcast to s.buffer.load intrinsic Should also apply this to the non-scalar buffer loads.	2020-08-03 08:54:29 -04:00
Simon Pilgrim	99a971cadf	[X86][SSE] Start shuffle combining from ANY_EXTEND_VECTOR_INREG on SSE targets We already do this on AVX (+ for ZERO_EXTEND_VECTOR_INREG), but this enables it for all SSE targets - we attempted something similar back at rL357057 but hit issues with the ZERO_EXTEND_VECTOR_INREG handling (PR41249). I'm still looking at the vector-mul.ll regression - which is due to 32-bit targets performing the load as a f64, resulting in the shuffle combiner thinking it has to create a shuffle in the float domain.	2020-08-03 13:41:48 +01:00
Matt Arsenault	d8ef1d1251	AMDGPU/GlobalISel: Fix selecting broken copies for s32->s64 anyext These should probably not be legal in the first place, but that might also be a pain.	2020-08-03 08:36:41 -04:00
Nicholas Guy	18279a54b5	[ARM] Fix IT block generation after Thumb2SizeReduce with -Oz Fixes a regression caused by D82439, in which IT blocks were no longer being generated when -Oz is present. This was due to the CPSR register being marked as dead, while this case was not accounted for. Differential Revision: https://reviews.llvm.org/D83667	2020-08-03 13:20:32 +01:00
Fangrui Song	40da58a04b	[MC] Default MCAsmBackend::mayNeedRelaxation() to false	2020-08-02 22:13:59 -07:00
QingShan Zhang	62e4644616	[NFC][PowerPC] Add a multiclass for fsetcc to define them in a uniform way This is a refactor patch to prepare for adding the support for strict-fsetcc in PowerPC backend. We want to move their definition into a uniform way so that, we could add the strict node easier. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D81712	2020-08-03 03:28:03 +00:00
StephenFan	a96921afa7	[RISCV] Eliminate the repeated declaration of SDLoc DL Differential revision: https://reviews.llvm.org/D85002	2020-08-03 10:24:30 +08:00
Craig Topper	64516ec7c1	[X86] Use parity flag from byte test/cmp instruction for __builtin_parity when input fits in 8 bits. If the upper bits of the __builtin_parity idiom are known to be 0 we were previously emitting an xor with 0 to get the parity flag. But we can use cmp/test instead which may expose opportunities for load folding or combining an AND.	2020-08-02 10:45:04 -07:00
Matt Arsenault	212570abcf	GlobalISel: Implement bitcast action for G_EXTRACT_VECTOR_ELEMENT For AMDGPU, vectors with elements < 32 bits should be indexed in 32-bit elements and the desired bits extracted from there. For elements > 64-bits, these should be reduced to 64/32 elements to enable the normal dynamic indexing paths. In the dynamic index cases, this produces shorter code most of the time. This does immediately regress the constant index cases, but this should be fixed once we have the most basic of shift combines. The element size > 64 case is pretty much ported from the existing DAG implementation for extract element promote. The increasing element size case is new.	2020-08-02 10:42:07 -04:00
Simon Pilgrim	00d0f354f2	X86InstrInfo.cpp - fix include ordering. NFCI.	2020-08-02 15:34:18 +01:00
Simon Pilgrim	7dd4f03595	Merge null and isa<> tests into isa_and_nonnull<>. NFCI.	2020-08-02 15:34:18 +01:00
Simon Pilgrim	d14a22da5e	[DAG] TargetLowering::LowerAsmOutputForConstraint - pass SDLoc as const& Try to be more consistent with the SDLoc param in the TargetLowering methods.	2020-08-02 15:12:02 +01:00
Simon Pilgrim	20fbbbc583	[X86] Use const APInt& in for-range loop to avoid unnecessary copies. NFCI. Fixes clang-tidy warning.	2020-08-02 14:32:23 +01:00
Simon Pilgrim	d7e2616741	[X86] Pass SDLoc by const reference. NFCI.	2020-08-02 14:32:22 +01:00
Simon Pilgrim	3f276840b6	[X86] Use const APInt& in for-range loop to avoid unnecessary copies. NFCI. Fixes clang-tidy warning.	2020-08-02 14:32:22 +01:00
Simon Pilgrim	2700311cce	[X86] combineX86ShuffleChain - pull out repeated RootVT.getSizeInBits() calls. NFCI.	2020-08-02 14:32:22 +01:00
Craig Topper	56166a3a52	[X86] Improve parity idiom recognition to handle (and (truncate (ctpop X)), 1). Fixes part of PR46954	2020-08-01 22:59:43 -07:00
Kazu Hirata	60434989e5	Use llvm::is_contained where appropriate (NFC) Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D85083	2020-08-01 21:51:06 -07:00
Craig Topper	e297d928dc	[X86] Add assembler support for {disp8} and {disp32} to control the size of displacement used for memory operands. These prefixes should override the default behavior and force a larger immediate size. I don't believe gas issues any warning if you use {disp8} when a 32-bit displacement is already required. And this patch doesn't either. This completes the {disp8} and {disp32} support from PR46650. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D84793	2020-08-01 13:26:35 -07:00
Simon Pilgrim	82a5c848e7	[X86][AVX512] Fold concat(and(x,y),and(z,w)) -> and(concat(x,z),concat(y,w)) for 512-bit vectors Helps vpternlog folding on non-AVX512BW targets	2020-08-01 20:34:39 +01:00
Simon Pilgrim	bb13c34c3a	[X86][AVX] Ensure we only combine to PSHUFLW/PSHUFHW on supporting targets Noticed while investigating combining from concatenated shuffle vectors, we weren't checking that PSHUFLW/PSHUFHW was legal - we were depending on lowering splitting to subvectors.	2020-08-01 19:18:11 +01:00
David Green	fd69df62ed	[ARM] Distribute post-inc for Thumb2 sign/zero extending loads/stores This adds sign/zero extending scalar loads/stores to the MVE instructions added in D77813, allowing us to create more post-inc instructions. These are comparatively simple, compared to LDR/STR (which may be better turned into an LDRD/LDM), but still require some additions over MVE instructions. Because there are i12 and i8 variants of the offset loads/stores dealing with different signs, we may need to convert an i12 address to an i8 negative instruction. t2LDRBi12 can also be shrunk to a tLDRi under the right conditions, so we need to be careful with codesize too. Differential Revision: https://reviews.llvm.org/D78625	2020-08-01 14:01:18 +01:00
Simon Pilgrim	1b1901536a	[X86][AVX] Extend v2f64 BROADCAST(LOAD) -> BROADCAST_LOAD to v2i64/v4f32/v4i32 Minor precursor fix for D66004, but helps the SSE41 tests as well as they run with -disable-peephole	2020-08-01 12:28:29 +01:00
Craig Topper	75f134eec1	[X86] Refactor the broadcast and load folding in tryVPTESTM to reduce some code. Now we try to load and broadcast together for operand 1. Followed by load and broadcast for operand 1. Previously we tried load operand 1, load operand 1, broadcast operand 0, broadcast operand 1. Now we have a single helper that tries load and broadcast for one operand that we can just call twice.	2020-07-31 23:57:13 -07:00
Craig Topper	1bd7046e4c	[X86] Use TargetLowering::getRegClassFor to simplify some code in tryVPTESTM. NFCI	2020-07-31 21:39:10 -07:00
Justin Hibbits	7e9153e940	PowerPC: Don't lower SELECT_CC to PPCISD::FSEL on SPE SPE doesn't have a fsel instruction, so don't try to lower to it. This fixes a "Cannot select: tN: f64 = PPCISD::FSEL tX, tY, tZ" error. Reviewed By: #powerpc, lkail Differential Revision: https://reviews.llvm.org/D77773	2020-07-31 22:52:47 -05:00
Justin Hibbits	914dbf4808	PowerPC: Fix SPE extloadf32 handling. The patterns were incorrect copies from the FPU code, and are unnecessary, since there's no extended load for SPE. Just let LLVM itself do the work by marking it expand. Reviewed By: #powerpc, lkail Differential Revision: https://reviews.llvm.org/D78670	2020-07-31 22:42:57 -05:00
Kazushi (Jam) Marukawa	605fd4d77c	[VE] Change calling convention to follow ABI Change to expand all arguments and return values to i64 to follow ABI. Update regression tests also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D84581	2020-08-01 10:08:54 +09:00
Huihui Zhang	01bfe2e494	[AArch64][SVE] Allow vector of pointers as legal type for masked load/store. Refer to LangRef http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics 'llvm.masked.load/store.*' intrinsics are overloaded intrinsics, which allow the load/store data to be a vector of any integer, floating-point or pointer data type. Therefore, allow pointer data type when checking 'isLegalMaskedLoadStore()'. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D85045	2020-07-31 17:30:23 -07:00
Craig Topper	93c678a79b	[X86] Simplify vpternlog immediate selection. Rather than hardcoding immediate values for 12 different combinations in a nested pair of switches, we can perform the matched logic operation on 3 magic constants to calculate the immediate. Special thanks to this tweet https://twitter.com/rygorous/status/1187034321992871936 for making me realize I could do this.	2020-07-31 17:16:27 -07:00
Hsiangkai Wang	47a4a27f47	Upgrade MC to v0.9. Differential revision: https://reviews.llvm.org/D80802	2020-08-01 07:42:06 +08:00
Sidharth Baveja	b7cfa6ca92	[Loop Peeling] Separate the Loop Peeling Utilities from the Loop Unrolling Utilities Summary: This patch separates the Loop Peeling Utilities from Loop Unrolling. The reason for this change is that Loop Peeling is no longer only being used by loop unrolling; Patch D82927 introduces loop peeling with fusion, such that loops can be modified to have the same trip count, making them legal to be peeled. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D83056	2020-07-31 18:31:58 +00:00
Albion Fung	93fd8dbdc2	[PowerPC] Add Vector String Isolate instruction definitions and MC Tests This patch implements the instruction definition and MC tests for the vector string isolate instructions. Differential Revision: https://reviews.llvm.org/D84197	2020-07-31 12:32:29 -05:00
Benjamin Kramer	c6f08b14d4	Hide some internal symbols. NFC.	2020-07-31 17:28:02 +02:00
Matt Arsenault	57bd64ff84	Support addrspacecast initializers with isNoopAddrSpaceCast Moves isNoopAddrSpaceCast to the TargetMachine. It logically belongs with the DataLayout.	2020-07-31 10:42:43 -04:00
Vitaly Buka	b0eb40ca39	[NFC] Remove unused GetUnderlyingObject parameter Depends on D84617. Differential Revision: https://reviews.llvm.org/D84621	2020-07-31 02:10:03 -07:00
QingShan Zhang	9b04fec002	[PowerPC] Retrieve the offset from load/store if it stores to stack slots Scheduler will try to retrieve the offset and base addr to determine if two loads/stores are disjoint memory access. PowerPC failed to handle this for frame index which will bring extra memory dependency for loads/stores. Reviewed By: jji Differential Revision: https://reviews.llvm.org/D84308	2020-07-31 07:08:20 +00:00
Craig Topper	30a0dbb70d	[X86] Remove x86_sse42_crc32_64_64 from X86TTIImpl::simplifyDemandedUseBitsIntrinsic It doesn't do any simplifying. It just computes known bits. We can just let InstCombine call computeKnownBits which will handle this just as well.	2020-07-30 21:51:23 -07:00
Vitaly Buka	89051ebace	[NFC] GetUnderlyingObject -> getUnderlyingObject I am going to touch them in the next patch anyway	2020-07-30 21:08:24 -07:00
Craig Topper	916d9e1877	[X86] Pass the OperandVector by reference to ParseIntelOperand and ParseRoundingMode. NFCI Similar to what was recently done to ParseATTOperand. Make ParseIntelOperand directly responsible for adding to the operand vector instead of returning the operand. Return a bool for error. Remove ErrorOperand since it is no longer used.	2020-07-30 19:52:38 -07:00
Vitaly Buka	b256cb88a7	[ValueTracking] Remove AllocaForValue parameter findAllocaForValue uses AllocaForValue to cache resolved values. The function is used only to resolve arguments of lifetime intrinsic which usually are not far from allocas. So result reuse is likely unnoticeable. In followup patches I'd like to replace the function with GetUnderlyingObjects. Depends on D84616. Differential Revision: https://reviews.llvm.org/D84617	2020-07-30 18:48:34 -07:00
Scott Constable	ec1445c5af	[X86] Fix for ballooning compile times due to Load Value Injection (LVI) mitigations Fix for the issue raised in https://github.com/rust-lang/rust/issues/74632. The current heuristic for inserting LFENCEs uses a quadratic-time algorithm. This can apparently cause substantial compilation slowdowns for building Rust projects, where functions > 5000 LoC are apparently common. The updated heuristic in this patch implements a linear-time algorithm. On a set of benchmarks, the slowdown factor for the generated code was comparable (2.55x geo mean for the quadratic-time heuristic, vs. 2.58x for the linear-time heuristic). Both heuristics offer the same security properties, namely, mitigating LVI. This patch also includes some formatting fixes. Differential Revision: https://reviews.llvm.org/D84471	2020-07-30 17:22:33 -07:00
Craig Topper	3ad09fd03c	[X86] Separate CPU Feature lists in X86.td between architecture features and tuning features After the recent change to the tuning settings for pentium4 to improve our default 32-bit behavior, I've decided to see about implementing -mtune support. This way we could have a default architecture CPU of "pentium4" or "x86-64" and a default tuning cpu of "generic". And we could change our "pentium4" tuning settings back to what they were before. As a step to supporting this, this patch separates all of the features lists for the CPUs into 2 lists. I'm using the Proc class and a new ProcModel class to concat the 2 lists before passing to the target independent ProcessorModel. Future work to truly support mtune would change ProcessorModel to take 2 lists separately. I've diffed the X86GenSubtargetInfo.inc file before and after this patch to ensure that the final feature list for the CPUs isn't changed. Differential Revision: https://reviews.llvm.org/D84879	2020-07-30 17:19:19 -07:00
Amara Emerson	09f9f7dd1b	[AArch64][GlobalISel] Add legalization & selection support for G_INTRINSIC_LRINT. Differential Revision: https://reviews.llvm.org/D84552	2020-07-30 16:14:56 -07:00
Matt Arsenault	e56e9022bc	AMDGPU: Fix liveness errors when copying AGPR tuples Avoid recursively calling copyPhysReg for AGPR handling. This was dropping the necessary super register implicit defs to avoid liveness verifier errors.	2020-07-30 18:13:04 -04:00
Changpeng Fang	243376cdc7	AMDGPU: Put inexpensive ops first in AMDGPUAnnotateUniformValues::visitLoadInst Summary: This is in response to the review of https://reviews.llvm.org/D84873: The expensive check should be reordered last Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D84890	2020-07-30 14:37:06 -07:00
Wouter van Oortmerssen	ce1eb7af9d	[WebAssembly] Fixed 64-bit indices in br_table LLVM selection dag assumes "switch" indices are pointer sized, which causes problems for our 32-bit br_table. The new function ensures 32-bit operands don't get unnecessarily extended, and 64-bit operands get truncated. Note that the changes to the existing test test exactly that: the addition of -NEXT in 2 places ensures no extension is inserted (which the test previously ignored) and that the wrap is present (previously omitted in wasm64 mode). Differential Revision: https://reviews.llvm.org/D84705	2020-07-30 10:52:16 -07:00
Stanislav Mekhanoshin	5b32518f96	[AMDGPU] Do not use undef on indirect source We are using undef on the indirect move source subreg and then using implicit super-reg. This creates a problem in RA when Greedy decides to split the register. It reassigns the implicit super-reg but does not bother to change undef source because it really does not matter. The fix is to stop lying to RA and drop undef flag. This has also hit a problem in SIFoldOperands as it can fold immediate into an indirect move since there is no undef flag anymore. That results in multiple test failures, so added the check for this case. Differential Revision: https://reviews.llvm.org/D84899	2020-07-30 10:41:59 -07:00
Craig Topper	3632f765dc	[WebAssembly] Fix GCC 5 build. Hans' speculative fix in `b7292f2db0` didn't work for me. This seems to.	2020-07-30 10:00:28 -07:00
hsmahesha	33fd4a18e7	[AMDGPU/MemOpsCluster] Clean-up fixme's around mem ops clustering logic Get rid of all fixmes and base heuristic on `num-clustered-dwords`. The main intuition behind this is as follows. The existing heuristic roughly summarizes as below: * Assume, all the mem ops instructions participating in the clustering process, loads/stores same num bytes * If num bytes loaded by each mem op is 4 bytes, then cluster at max 5 mem ops, that is at max 20 bytes * If num bytes loaded by each mem op is 8 bytes, then cluster at max 3 mem ops, that is at max 24 bytes * If num bytes loaded by each mem op is 16 bytes, then cluster at max 2 mem ops, that is at max 32 bytes So, we need to make sure that the new heuristic does not completely deviate away from the above one, and it properly handles both the sub-word loads and the wide loads. Reviewed By: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D84354	2020-07-30 21:41:13 +05:30
Fangrui Song	d2c2248722	[X86] Parse and ignore .arch directives We parse .arch so that some `.arch i386; .code32` code can assemble. It seems that X86AsmParser does not do a good job tracking what features are needed to assemble instructions. GNU as's x86 port supports a very wide range of .arch operands. Ignore the operand for now. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D84900	2020-07-30 08:30:06 -07:00
Momchil Velikov	ef4e665435	[AArch64] Fix operand definitions of XPACI/XPACD The operand to these instructions is both input and output. These are not yet emitted by the compiler and the assembler already works fine, so can't test in this patch. But D75044 will use XPACI and provide test coverage for this patch as well. Differential Revision: https://reviews.llvm.org/D84298	2020-07-30 15:31:44 +01:00
Hans Wennborg	b7292f2db0	Speculative GCC 5 build fix It's complaining about specializing the template in a different namespace.	2020-07-30 16:12:52 +02:00
jasonliu	04dc9691eb	[XCOFF][AIX] Enable -ffunction-sections Summary: This patch implements -ffunction-sections on AIX. This patch focuses on assembly generation. Follow-on patch needs to handle: 1. -ffunction-sections implication for jump table. 2. Object file generation path and associated testing. Differential Revision: https://reviews.llvm.org/D83875	2020-07-30 13:30:01 +00:00
Simon Pilgrim	cc529285fd	VectorUtils.h - reduce unnecessary includes. NFC. Replace TargetLibraryInfo.h include with forward declaration and fix implicit dependencies. Reduce SmallSet.h include to SmallVector.h include.	2020-07-30 12:27:49 +01:00
Simon Pilgrim	2dec72ba5c	[X86][SSE] combineExtractWithShuffle - extend extract(truncate(x),0) for any source vector size As long as we can extract the lowest 128-bit subvector from the pre-truncated source vector, then we don't care what size it is. The next stage will be to support non-zero extraction indices, as long as it's still coming from the lowest 128-bit subvector.	2020-07-30 12:27:49 +01:00
David Sherwood	23ad660b5d	[SVE][CodeGen] At -O0 fallback to DAG ISel when translating alloca with scalable types When building code at -O0 we weren't falling back to DAG ISel correctly when encountering alloca instructions with scalable vector types. This is because the alloca has no operands that are scalable. I've fixed this by adding a check in AArch64ISelLowering::fallBackToDAGISel for alloca instructions with scalable types. Differential Revision: https://reviews.llvm.org/D84746	2020-07-30 08:40:53 +01:00
Craig Topper	07bb8240a0	[X86] Pass the OperandVector to ParseMemOperand instead of returning the operand. NFCI Continue the change made to ParseATTOperand to take the vector by reference. Let ParseMemOperand add its memory operand to the vector and just return true/false to indicate error.	2020-07-29 23:44:56 -07:00
Craig Topper	17597442db	[X86] Don't pass so many parameters to ParseMemOperand by reference. Pointers and SMLocs are cheap to copy. Even though the function modifies some of these the caller doesn't use them after the call.	2020-07-29 23:44:56 -07:00
Craig Topper	9611ee5f40	[X86] Teach the assembler parser to handle a '*' between segment register and base/index/displacement part of an address A '*' after the segment is equivalent to a '*' before the segment register. To make the AsmMatcher table work we need to place the '*' token into the operand vector before the full memory operand. To accomplish this I've modified some portions of operand parsing to expose the operand vector to ParseATTOperand so that the token can be pushed to the vector after parsing the segment register and before creating the memory operand using that segment register. Fixes PR46879 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D84895	2020-07-29 21:15:04 -07:00
Kang Zhang	a18953c1c0	[PowerPC] Fix RM operands for some instructions Summary: Some instructions have set the wrong [RM] flag, this patch is to fix it. Instructions x(v|s)r(d|s)pi[zmp]? and fri[npzm] use fixed rounding directions without referencing current rounding mode. Also, the SETRNDi, SETRND, BCLRn, MTFSFI, MTFSB0, MTFSB1, MTFSFb, MTFSFI, MTFSFI_rec, MTFSF, MTFSF_rec should also fix the RM flag. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D81360	2020-07-30 02:10:49 +00:00
Matt Arsenault	0da582d9b6	GlobalISel: Handle llvm.roundeven I still think it's highly questionable that we have two intrinsics with identical behavior and only vary by the name of the libcall used if it happens to be lowered that way, but try to reduce the feature delta between SDAG and GlobalISel for recently added intrinsics. I'm not sure which opcode should be considered the canonical one, but lower roundeven back to round.	2020-07-29 20:01:12 -04:00
Craig Topper	b1c1825b99	[X86] Remove unused argument from HandleAVX512Operand in the assembly parser.	2020-07-29 14:23:01 -07:00
Simon Pilgrim	a1c9529e60	[X86][AVX] isHorizontalBinOp - relax no-lane-crossing limit for AVX1-only targets. Instead of never accepting v8f32/v4f64 FHADD/FHSUB if the input shuffle masks cross lanes, perform the matching and determine if the post shuffle mask simplifies to a 'whole lane shuffle' mask - in which case we are guaranteed to cheaply perform this as a VPERM2F128 shuffle.	2020-07-29 20:49:10 +01:00
Stanislav Mekhanoshin	decfdb8ce3	[AMDGPU] Fixed formatting in GCNHazardRecognizer.cpp. NFC.	2020-07-29 12:21:28 -07:00
Stanislav Mekhanoshin	13b63be472	[AMDGPU] prefer non-mfma in post-RA schedule MFMA instructions shall not be scheduled back to back to avoid MAI SIMD stall. Tell post-RA schedule we would prefer some other instruction instead. Differential Revision: https://reviews.llvm.org/D84883	2020-07-29 12:17:50 -07:00
Baptiste Saleil	7aaa85627b	[PowerPC] Add options to control paired vector memops support Adds frontend and backend options to enable and disable the PowerPC paired vector memory operations added in ISA 3.1. Instructions using these options will be added in subsequent patches. Differential Revision: https://reviews.llvm.org/D83722	2020-07-29 14:00:53 -05:00
Amara Emerson	d8ba622209	[AArch64][GlobalISel] Selection support for vector DUP[X]lane instructions. In future, we'd like to use the perfect-shuffle mechanism to deal with these shuffle permutations. For now, this improves performance by avoiding the super-expensive const-pool load + tbl instruction. Differential Revision: https://reviews.llvm.org/D84866	2020-07-29 11:41:37 -07:00
Matt Arsenault	59fac51ff2	AMDGPU/GlobalISel: Handle llvm.amdgcn.reloc.constant	2020-07-29 14:24:21 -04:00
Matt Arsenault	0b7de7966f	GlobalISel: Implement lower for G_EXTRACT_VECTOR_ELT Use the basic store to stack and reload.	2020-07-29 14:16:28 -04:00

59397 Commits