llvm-project

Commit Graph

Author	SHA1	Message	Date
Jessica Paquette	d0ba6c4002	[AArch64][GlobalISel] Select CSINC and CSINV for G_SELECT with constants Select the following: - G_SELECT cc, 0, 1 -> CSINC zreg, zreg, cc - G_SELECT cc, 0, -1 -> CSINV zreg, zreg, cc - G_SELECT cc, 1, f -> CSINC f, zreg, inv_cc - G_SELECT cc, -1, f -> CSINV f, zreg, inv_cc - G_SELECT cc, t, 1 -> CSINC t, zreg, cc - G_SELECT cc, t, -1 -> CSINV t, zreg, cc (IR example: https://godbolt.org/z/YfPna9) These correspond to a bunch of the AArch64csel patterns in AArch64InstrInfo.td. Unfortunately, it doesn't seem like we can import patterns that use NZCV like those ones do. E.g. ``` def : Pat<(AArch64csel GPR32:$tval, (i32 1), (i32 imm:$cc), NZCV), (CSINCWr GPR32:$tval, WZR, (i32 imm:$cc))>; ``` So we have to manually select these for now. This replaces `selectSelectOpc` with an `emitSelect` function, which performs these optimizations. Differential Revision: https://reviews.llvm.org/D90701	2020-11-12 14:44:01 -08:00
Kazushi (Jam) Marukawa	410626c9b5	[VE] Support vld intrinsics Add intrinsics for vector load instructions. Add a regression test also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D91332	2020-11-13 07:34:42 +09:00
Stanislav Mekhanoshin	cf6565f6d0	[AMDGPU] Enable multi-dword flat scratch load/stores Differential Revision: https://reviews.llvm.org/D91384	2020-11-12 13:38:56 -08:00
Jay Foad	6881a82e8c	[AMDGPU] Fix scheduling of exp pos4 Also fix a similar issue in SIInsertWaitcnts, but I don't think that fix has any effect in practice. Differential Revision: https://reviews.llvm.org/D91290	2020-11-12 19:57:14 +00:00
Jay Foad	d7d6ac5624	[AMDGPU] Define and use names for export targets. NFC. Differential Revision: https://reviews.llvm.org/D91289	2020-11-12 19:57:14 +00:00
Craig Topper	4cdf1d2110	[MSP430] Remove unused MVT::Glue output from MSP430ISD::SELECT_CC nodes. Follow up from a similar patch on RISCV `637f19c36b` Nothing reads this Glue value that I could see. The SDNode def in the td file does not have the SDNPOutGlue flag so I don't think this glue would get properly propagated to MachineSDNodes if it was used.	2020-11-12 10:34:01 -08:00
Craig Topper	0add5f9122	[RISCV] Don't include CodeGen layer files in MC layer -Use MCRegister instead of Register in MC layer. -Move some enums from RISCVInstrInfo.h to RISCVBaseInfo.h to be with other TSFlags bits. Differential Revision: https://reviews.llvm.org/D91114	2020-11-12 07:45:38 -08:00
Craig Topper	9ca02d6fe1	[RISCV] Add an ANDI to shift amount of FSL/FSR instructions The fshl and fshr intrinsics are defined to modulo their shift amount by the bitwidth of one of their inputs. The FSR/FSL instructions read one extra bit from the shift amount. If that bit is set the inputs are swapped. In order to preserve the semantics of the llvm intrinsics we need to make sure that the extra bit isn't set. DAG combine or instcombine may have removed any mask that was originally present. We could be smarter here and try to use computeKnownBits to check if the bit is known zero, but wanted to start with correctness. Differential Revision: https://reviews.llvm.org/D90905	2020-11-12 07:33:40 -08:00
David Green	11dee2eae2	[ARM] Ensure CountReg definition dominates InsertPt when creating t2DoLoopStartTP Of course there was something missing, in this case a check that the def of the count register we are adding to a t2DoLoopStartTP would dominate the insertion point. In the future, when we remove some of these COPY's in between, the t2DoLoopStartTP will always become the last instruction in the block, preventing this from happening. In the meantime we need to check they are created in a sensible order. Differential Revision: https://reviews.llvm.org/D91287	2020-11-12 13:47:46 +00:00
Kazushi (Jam) Marukawa	a72d384249	[VE] Change the default type of v64 register class Change the default type of v64 register class from v512i32 to v256f64. Add a regression test also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D91301	2020-11-12 19:07:07 +09:00
David Sherwood	3225fcf11e	[SVE] Deal with SVE tuple call arguments correctly when running out of registers When passing SVE types as arguments to function calls we can run out of hardware SVE registers. This is normally fine, since we switch to an indirect mode where we pass a pointer to a SVE stack object in a GPR. However, if we switch over part-way through processing a SVE tuple then part of it will be in registers and the other part will be on the stack. I've fixed this by ensuring that: 1. When we don't have enough registers to allocate the whole block we mark any remaining SVE registers temporarily as allocated. 2. We temporarily remove the InConsecutiveRegs flags from the last tuple part argument and reinvoke the autogenerated calling convention handler. Doing this prevents the code from entering an infinite recursion and, in combination with 1), ensures we switch over to the Indirect mode. 3. After allocating a GPR register for the pointer to the tuple we then deallocate any SVE registers we marked as allocated in 1). We also set the InConsecutiveRegs flags back how they were before. 4. I've changed the AArch64ISelLowering LowerCALL and LowerFormalArguments functions to detect the start of a tuple, which involves allocating a single stack object and doing the correct numbers of legal loads and stores. Differential Revision: https://reviews.llvm.org/D90219	2020-11-12 08:41:50 +00:00
Amara Emerson	ad376657c1	[AArch64][GlobalISel] Optimize G_PTR_ADD with a negated offset to be a G_SUB.	2020-11-11 22:46:53 -08:00
Baptiste Saleil	37c4ac8545	[PowerPC] Accumulator/Unprimed Accumulator register copy, spill and restore This patch adds support for accumulator/unprimed accumulator register copy, spill and restore for MMA. Authored By: Baptiste Saleil Reviewed By: #powerpc, bsaleil, amyk Differential Revision: https://reviews.llvm.org/D90616	2020-11-11 16:23:45 -06:00
Jessica Paquette	7a70a2f04d	[AArch64][GlobalISel] Mark G_FCONSTANT as legal when there is full fp16 support When there is full fp16 support, there is no reason to widen 16-bit G_FCONSTANTs to 32 bits. Mark them as legal in this case. Also, we currently import a pattern for materializing a 16-bit 0.0. Add a testcase showing we select it. (All other 16-bit G_FCONSTANTS are not yet selected.) Differential Revision: https://reviews.llvm.org/D89164	2020-11-11 13:25:11 -08:00
Craig Topper	637f19c36b	[RISCV] Remove traces of Glue from RISCVISD::SELECT_CC We were creating RISCVISD::SELECT_CC nodes with Glue output that was never being used, and the tablegen SDNode had the SDNPInGlue flag instead of the SDNPOutGlue flag. Since we don't seem to need the Glue just get rid of it from both places. Differential Revision: https://reviews.llvm.org/D91199	2020-11-11 09:30:48 -08:00
Jessica Paquette	c42053f79b	[AArch64][GlobalISel] Select arith extended add/sub in manual selection code The manual selection code for add/sub was not checking if it was possible to fold in shifts + extends (the *rx opcode variants). As a result, we could never select things like ``` cmp x1, w0, uxtw #2 ``` Because we don't import any patterns for compares. This adds support for the arithmetic shifted register forms and updates tests for instructions selected using `emitADD`, `emitADDS`, and `emitSUBS`. This is a 0.1% geomean code size improvement on SPECINT2000 at -Os. Differential Revision: https://reviews.llvm.org/D91207	2020-11-11 09:26:03 -08:00
Jessica Paquette	f0580c73bb	[AArch64][GlobalISel] Select negative arithmetic immediates in manual selector Previously, we only handled negative arithmetic immediates in the imported selector code. Since we don't import code for, say, compares, we were missing opportunities for things like ``` %cst:gpr(s64) = G_CONSTANT i64 -10 %cmp:gpr(s32) = G_ICMP intpred(eq), %reg0(s64), %cst -> %adds = ADDSXri %reg0, 10, 0, implicit-def $nzcv %cmp = CSINCWr $wzr, $wzr, 1, implicit $nzcv ``` Instead, we would have to materialize the constant and emit a SUBS. This adds support for selection like above for SUB, SUBS, ADD, and ADDS. This is a 0.1% geomean code size improvement on SPECINT2000 at -Os. Differential Revision: https://reviews.llvm.org/D91108	2020-11-11 09:20:05 -08:00
Jay Foad	f23c4c6f8a	[AMDGPU] Separate out real exp instructions by subtarget. NFC. Differential Revision: https://reviews.llvm.org/D91247	2020-11-11 17:13:40 +00:00
Jay Foad	2b33ea6935	[AMDGPU] Split exp instructions out into their own tablegen file. NFC. Differential Revision: https://reviews.llvm.org/D91246	2020-11-11 17:13:40 +00:00
Jay Foad	f94fd1c8ca	[AMDGPU] Make use of SIInstrInfo::isEXP. NFC.	2020-11-11 17:01:20 +00:00
Jay Foad	830ed64ccd	Revert "Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access"" This reverts commit `8b08fa0103`. The underlying problems were fixed by D90607.	2020-11-11 14:40:14 +00:00
Caroline Concatto	37f4ccb275	[AArch64]Add memory op cost model for SVE This patch adds/fixes memory op cost model for SVE with fixed-width vector. Differential Revision: https://reviews.llvm.org/D90950	2020-11-11 12:49:19 +00:00
Simon Pilgrim	1a62ca65c1	[KnownBits] Add KnownBits::commonBits helper. NFCI. We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........	2020-11-11 12:15:54 +00:00
Kerry McLaughlin	170947a5de	[SVE][CodeGen] Lower scalable masked scatters Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only) Changes included in this patch: - Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use. Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts. - Added the getCanonicalIndexType function to convert redundant addressing modes (e.g. scaling is redundant when accessing bytes) - Tests with 32 & 64-bit scaled & unscaled offsets Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90941	2020-11-11 11:50:22 +00:00
Kerry McLaughlin	ffbbfc76ca	[SVE][CodeGen] Add the isTruncatingStore flag to MSCATTER This patch adds the IsTruncatingStore flag to MaskedScatterSDNode, set by getMaskedScatter(). Updated SelectionDAGDumper::print_details for MaskedScatterSDNode to print the details of masked scatters (is truncating, signed or scaled). This is the first in a series of patches which adds support for scalable masked scatters Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90939	2020-11-11 10:58:24 +00:00
Sam Parker	898a81dfc5	[NFC][ARM] Replace lambda with any_of	2020-11-11 10:02:55 +00:00
Amara Emerson	2262393090	[AArch64][GlobalISel] Port some AArch64 target specific MUL combines from SDAG. These do things like turn a multiply of a pow-2+1 into a shift and an add, which is a common pattern that pops up, and is universally better than expensive madd instructions with a constant. I've added check lines to an existing codegen test since the code being ported is almost identical, however the mul by negative pow2 constant tests don't generate the same code because we're missing some generic G_MUL combines still. Differential Revision: https://reviews.llvm.org/D91125	2020-11-10 22:21:13 -08:00
Gaurav Jain	3726b14428	[NFC] Use [MC]Register for x86 target Differential Revision: https://reviews.llvm.org/D91161	2020-11-10 15:49:39 -08:00
Kazushi (Jam) Marukawa	dd6f607ea8	[VE] Implement FoldImmediate Implement FoldImmediate for only integer arithmetic operations. Add regression tests also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D91150	2020-11-11 08:08:32 +09:00
Pirama Arumuga Nainar	8262e94a6d	[ARM] Fix PR 47980: Use constrainRegClass during foldImmediate opt. Previously we used setRegClass to rgpr, which may expand the register domain if the result was already in a constrained class (tcgpr in the above PR). Differential Revision: https://reviews.llvm.org/D91192	2020-11-10 13:38:11 -08:00
Stanislav Mekhanoshin	544ef42e40	[AMDGPU] Set default op_sel_hi on accvgpr read/write These are opsel opcodes with op_sel actually being ignored. As such, op_sel_hi needs to be set to default 1 even though these bits are ignored. This is a compatibility change. Differential Revision: https://reviews.llvm.org/D91202	2020-11-10 13:07:29 -08:00
Benjamin Kramer	92c61a045f	[ARM] Silence unused variable warning in Release builds. NFC.	2020-11-10 20:35:28 +01:00
Craig Topper	70b481e8db	[RISCV] Add missing copyright header to RISCVBaseInfo.cpp. NFC	2020-11-10 11:33:08 -08:00
David Green	08d1c2d470	[ARM] Introduce t2DoLoopStartTP This introduces a new pseudo instruction, almost identical to a t2DoLoopStart but taking 2 parameters - the original loop iteration count needed for a low overhead loop, plus the VCTP element count needed for a DLSTP instruction setting up a tail predicated loop. The idea is that the instruction holds both values and the backend ARMLowOverheadLoops pass can pick between the two, depending on whether it creates a tail predicated loop or falls back to a low overhead loop. To do that there needs to be something that converts a t2DoLoopStart to a t2DoLoopStartTP, for which this patch repurposes the MVEVPTOptimisationsPass as a "tail predication and vpt optimisation" pass. The extra operand for the t2DoLoopStartTP is chosen based on the operands of VCTP's in the loop, and the instruction is moved as late in the block as possible to attempt to increase the likelihood of making tail predicated loops. Differential Revision: https://reviews.llvm.org/D90591	2020-11-10 18:08:12 +00:00
Jay Foad	bb8d1437a6	[AMDGPU] Simplify multiclass EXP_m. NFC.	2020-11-10 17:28:36 +00:00
David Green	dbe1bf63aa	[ARM] Cleanup for ARMLowOverheadLoops. NFC	2020-11-10 17:28:07 +00:00
David Green	c7e275388e	[ARM] Don't aggressively unroll vector remainder loops We already do not unroll loops with vector instructions under MVE, but that does not include the remainder loops that the vectorizer produces. These remainder loops will be rarely executed and are not worth unrolling, as the trip count is likely to be low if they get executed at all. Luckily they get llvm.loop.isvectorized to make recognizing them simpler. We have wanted to do this for a while but hit issues with low overhead loops being reverted due to difficult register allocation. With recent changes that seems to be less of an issue now. Differential Revision: https://reviews.llvm.org/D90055	2020-11-10 17:01:31 +00:00
David Green	73a6cd4b6b	[ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR This hints the operand of a t2DoLoopStart towards using LR, which can help make it more likely to become t2DLS lr, lr. This makes it easier to move if needed (as the input is the same as the output), or potentially remove entirely. The hint is added after others (from COPY's etc) which still take precedence. It needed to find a place to add the hint, which currently uses the post isel custom inserter. Differential Revision: https://reviews.llvm.org/D89883	2020-11-10 16:28:57 +00:00
David Green	b2ac9681a7	[ARM] Alter t2DoLoopStart to define lr This changes the definition of t2DoLoopStart from t2DoLoopStart rGPR to GPRlr = t2DoLoopStart rGPR This will hopefully mean that low overhead loops are more tied together, and we can more reliably generate loops without reverting or being at the whims of the register allocator. This is a fairly simple change in itself, but leads to a number of other required alterations. - The hardware loop pass, if UsePhi is set, now generates loops of the form: %start = llvm.start.loop.iterations(%N) loop: %p = phi [%start], [%dec] %dec = llvm.loop.decrement.reg(%p, 1) %c = icmp ne %dec, 0 br %c, loop, exit - For this a new llvm.start.loop.iterations intrinsic was added, identical to llvm.set.loop.iterations but produces a value as seen above, gluing the loop together more through def-use chains. - This new intrinsic conceptually produces the same output as input, which is taught to SCEV so that the checks in MVETailPredication are not affected. - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has been left mostly as before. We should now more reliably be able to tell that the t2DoLoopStart is correct without having to prove it, but t2WhileLoopStart and tail-predicated loops will remain the same. - And all the tests have been updated. There are a lot of them! This patch on its own might cause more trouble than it helps, with more tail-predicated loops being reverted, but some additional patches can hopefully improve upon that to get to something that is better overall. Differential Revision: https://reviews.llvm.org/D89881	2020-11-10 15:57:58 +00:00
Kazushi (Jam) Marukawa	543b30db06	[VE][NFC] Change cast to dyn_cast We used cast where we should use dyn_cast. So, change it this time. The old code causes problems if I implement the brind instruction and compile OpenMP using the new compiler. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D91151	2020-11-10 21:49:16 +09:00
Pablo Barrio	642b21beba	[AArch64] Enable RAS 1.1 system registers in all AArch64 Some use cases (e.g. kernel devs) have strict requirements to only enable features available with -march=armv8-a, e.g. no armv8.1-a. Enabling RAS 1.1 in all AArch64 means they can consider to support it. Bear in mind that the first versions of the Armv8 architecture still do not support RAS 1.1. This patch only lets devs write code with the user-friendly register mnemonic instead of the ugly generic S<op0>_<op1>_<Cn>_<Cm>_<op2>. They still need to place runtime checks to make sure that the CPU to run on supports RAS 1.1. Differential Revision: https://reviews.llvm.org/D90594	2020-11-10 12:13:33 +00:00
Kazushi (Jam) Marukawa	c84b2c49be	[VE] Support inline assembly with vector registers Support inline assembly with vector registers. Add a regression test also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D91146	2020-11-10 20:55:38 +09:00
Mirko Brkusanin	a75d6178b8	[GlobalISel] Add combine for (x | mask) -> x when (x | mask) == x If we have a mask, and a value x, where (x | mask) == x, we can drop the OR and just use x. Differential Revision: https://reviews.llvm.org/D90952	2020-11-10 11:32:13 +01:00
Mirko Brkusanin	fb36ab0a42	[GlobalISel] Expand combine for (x & mask) -> x when (x & mask) == x We can use KnownBitsAnalysis to cover cases when mask is not trivial. It can also help with cases when mask is not constant but can still be folded into one. Since 'and' is commutative we should treat both operands as possible replacements. Differential Revision: https://reviews.llvm.org/D90674	2020-11-10 11:32:13 +01:00
Kazushi (Jam) Marukawa	b65ef65b22	[VE] Support inline assembly Support inline assembly with scalar registers. Add a regression test also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D91119	2020-11-10 18:56:22 +09:00
Jay Foad	0ad4d04002	[AMDGPU] Remove an unused return value. NFC. Differential Revision: https://reviews.llvm.org/D91063	2020-11-10 09:15:14 +00:00
Esme-Yi	6e0ad5bc8c	[PowerPC] Add an ISEL pattern for Mul with Imm. Summary: This patch tries to do the following transformation if the multiplier doesn't fit in int16: (mul X, c1 << c2) -> (rldicr (mulli X, c1) c2) Reviewed By: jsji, steven.zhang Differential Revision: https://reviews.llvm.org/D87384	2020-11-10 06:52:39 +00:00
Carl Ritson	fde8351743	[AMDGPU] Fix lowering of S_MOV_{B32,B64}_term If the source of S_MOV_{B32,B64}_term is an immediate then it cannot be lowered to a COPY. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D90451	2020-11-10 12:16:31 +09:00
Eric Astor	d657f7cd30	[ms] [llvm-ml] Support MASM's relational operators (EQ, LT, etc.) Support the named relational operators (EQ, LT, etc.). Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D89733	2020-11-09 14:01:36 -05:00
Francesco Petrogalli	9f61931e07	[llvm][AArch64] Allow TB(N)Z to drop signext for sign bit tests. For example if the sign extension is only used for TBZ, and the value is used elsewhere with a zero extension, this can eliminate a sign extension. Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D90606	2020-11-09 18:27:48 +00:00
David Green	c8cd7e2bbf	[ARM] Remove MI variable aliasing. NFC This was accidentally using the same name for two different variables in the same line. Whilst it seems to work for some compilers, others have trouble and it is probably not a fantastic idea.	2020-11-09 18:18:43 +00:00
Craig Topper	5d3fd3df94	[RISCV] Make ctlz/cttz cheap to speculatively execute so CodeGenPrepare won't insert a zero check. Add additional isel patterns for ctzw/clzw instructions. Differential Revision: https://reviews.llvm.org/D91040	2020-11-09 10:13:45 -08:00
Craig Topper	a59076006b	[RISCV] Add isel patterns for using PACK for zext.h and zext.w. Differential Revision: https://reviews.llvm.org/D91024	2020-11-09 10:13:45 -08:00
Craig Topper	4265cbaa34	[RISCV] Make SIGN_EXTEND_INREG from i8/i16 legal when Zbb extension is enabled. This produces better code for sign extend to i64 on RV32 target. Differential Revision: https://reviews.llvm.org/D91023	2020-11-09 10:13:45 -08:00
Craig Topper	c0dd22e44a	[RISCV] Add isel patterns to match sbset/sbclr/sbinv/sbext even if the shift amount isn't masked. This uses the shiftop PatFrags to handle the masked shift amount and unmasked shift amount cases. That also checks XLen as part of the masked amount check so we don't need separate RV32 and RV64 patterns. Differential Revision: https://reviews.llvm.org/D91016	2020-11-09 09:55:26 -08:00
Mircea Trofin	2ac3a7d0c4	[NFC] Use [MC]Register Differential Revision: https://reviews.llvm.org/D90795	2020-11-09 08:37:14 -08:00
jasonliu	42d2109380	[XCOFF] Enable explicit sections on AIX Implement mechanism to allow explicit sections to be generated on AIX. Reviewed By: DiggerLin Differential Revision: https://reviews.llvm.org/D88615	2020-11-09 16:27:38 +00:00
Stanislav Mekhanoshin	d5a465866e	[AMDGPU] Omit buffer resource with flat scratch. Differential Revision: https://reviews.llvm.org/D90979	2020-11-09 08:05:20 -08:00
Paul C. Anagnostopoulos	91d2e5c81a	[TableGen] Add the !filter bang operator. Add a test. Update the Programmer's Reference. Use it in some TableGen files. Differential Revision: https://reviews.llvm.org/D91008	2020-11-09 10:56:55 -05:00
Sebastian Neubauer	a022b1ccd8	[AMDGPU] Add amdgpu_gfx calling convention Add a calling convention called amdgpu_gfx for real function calls within graphics shaders. For the moment, this uses the same calling convention as other calls in amdgpu, with registers excluded for return address, stack pointer and stack buffer descriptor. Differential Revision: https://reviews.llvm.org/D88540	2020-11-09 16:51:44 +01:00
Momchil Velikov	937ab6a785	[ARM][MachineOutliner] Emit more CFI instructions This patch makes the outliner emit CFI instructions in a few more places: * after LR is restored, but before the return in an outlined function * around save/restore of LR to/from a register at calls to outlined functions * around save/restore of LR to/from the stack at calls to outlined functions The latter two only when the function does NOT spill LR. If the function spills LR, then outliner generated saves/restores around calls are not considered interesting for unwinding the frame. Differential Revision: https://reviews.llvm.org/D89483	2020-11-09 15:26:18 +00:00
Sam Tebbs	40a3f7e48d	[ARM][LowOverheadLoops] Merge a VCMP and the new VPST into a VPT There were cases where a VCMP and a VPST were merged even if the VCMP didn't have the same defs of its operands as the VPST. This is fixed by adding RDA checks for the defs. This however gave rise to cases where the new VPST created would precede the un-merged VCMP and so would fail a predicate mask assertion since the VCMP wasn't predicated. This was solved by converting the VCMP to a VPT instead of inserting the new VPST. Differential Revision: https://reviews.llvm.org/D90461	2020-11-09 15:03:48 +00:00
Jay Foad	55ea017759	[AMDGPU] Remove unused DisableDecoder machinery. NFC. This has been unused since D24738.	2020-11-09 13:53:27 +00:00
David Green	a0a9e1c798	[ARM] Remove kill flags between VCMP and insertion point When we fold a VCMP into a VPST instruction any kill flags between the old VCMP position and the new insertion point need to be removed, in order to keep the verifier happy. Differential Revision: https://reviews.llvm.org/D90964	2020-11-09 13:17:53 +00:00
Lucas Prates	c2c2cc1360	[ARM][AArch64] Adding Neoverse V1 CPU support Add support for the Neoverse V1 CPU to the ARM and AArch64 backends. This is based on patches from Mark Murray and Victor Campos. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D90765	2020-11-09 13:15:40 +00:00
Craig Topper	f40925aa8b	[X86] Improve lowering of fptoui Invert the select condition when masking in the sign bit of a fptoui operation. Also, rather than lowering the sign mask to select/xor and expecting the select to get cleaned up later, directly lower to shift/xor. Patch by Layton Kifer! Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D90658	2020-11-07 23:50:03 -08:00
Craig Topper	19313ed580	[RISCV] Remove assertsexti32 from a couple B extension isel patterns that don't demand the sign extended bits.	2020-11-07 22:43:16 -08:00
Carl Ritson	8e8a54c7e9	[AMDGPU] SIWholeQuadMode fix mode insertion when SCC always defined Fix a crash when SCC is defined until end of block and mode change must be inserted in SCC live region. Reviewed By: mceier Differential Revision: https://reviews.llvm.org/D90997	2020-11-08 11:14:57 +09:00
Craig Topper	c72358b77f	[RISCV] Use (not X) instead of (xor X, -1) in isel patterns to improve readability. NFC	2020-11-07 11:50:52 -08:00
Elvina Yakubova	93b99728b1	[AArch64] Add pipeline model for HiSilicon's TSV110 This patch adds the scheduling and cost model for TSV110. Reviewed by: SjoerdMeijer, bryanpkc Differential Revision: https://reviews.llvm.org/D89972	2020-11-07 01:23:00 +03:00
Eric Astor	5afb360808	[ms] [llvm-ml] Allow arbitrary strings as integer constants MASM interprets strings in expression contexts as integers expressed in big-endian base-256, treating each character as its ASCII representation. This completely eliminates the need to special-case single-character strings. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D90788	2020-11-06 17:15:49 -05:00
Jay Foad	d61f2cfb9f	[AMDGPU] Simplify exp target parsing Treat any identifier as a potential exp target and diagnose them all the same way as "invalid exp target"s. Differential Revision: https://reviews.llvm.org/D90947	2020-11-06 16:09:34 +00:00
Paul C. Anagnostopoulos	eed768b700	[NVPTX] [TableGen] Use new features of TableGen to simplify and clarify. Differential Revision: https://reviews.llvm.org/D90861	2020-11-06 09:20:19 -05:00
Simon Moll	7914e4f0fa	[VE] Add v(m)regs to preserve_all reg mask V(m)regs were defined before CSR_preserve_all was, add them now. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D90912	2020-11-06 15:16:11 +01:00
Simon Moll	adc69743d2	[VE][NFC] Refactor to support more than one calling conv Prepare for supporting different calling conventions by factoring out things into CC-dependent selection functions (getParamCC, getReturnCC). Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D90911	2020-11-06 14:25:25 +01:00
Kazushi (Jam) Marukawa	43df29e206	[VE] Optimize address calculation Optimize address calculations using LEA/LEASL instructions. Update comments in VEISelLowering.cpp also. Update an existing regression test optimized by this modification. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90878	2020-11-06 19:46:59 +09:00
Simon Moll	d3b33a7810	[VE][TTI] don't advertise vregs/vops Claim to not have any vector support to dissuade SLP, LV and friends from generating SIMD IR for the VE target. We will take this back once vector isel is stable. Reviewed By: kaz7, fhahn Differential Revision: https://reviews.llvm.org/D90462	2020-11-06 11:12:10 +01:00
Craig Topper	741b04b0b7	[RISCV] Only enable GPR<->FPR32 bitconvert isel patterns on RV32. NFCI Bitconvert requires the bitwidth to match on both sides. On RV64 the GPR size is i64 so bitconvert between f32 isn't possible. The node should never be generated so the pattern won't ever match, but moving the patterns under IsRV32 makes it more obviously impossible. It also moves it to a similar location to the patterns for the custom nodes we use for RV64.	2020-11-05 16:15:25 -08:00
Konstantin Pyzhov	41e74e400d	[AMDGPU] Corrected declaration of VOPC instructions with SDWA addressing mode. Removed "implicit def VCC" from declarations of AMDGPU VOPC instructions since they do not implicitly write to VCC in SDWA mode. Differential Revision: https://reviews.llvm.org/D89168	2020-11-05 11:15:50 -05:00
Michael Liao	23c6d1501d	[amdgpu] Add `llvm.amdgcn.endpgm` support. - `llvm.amdgcn.endpgm` is added to enable "abort" support. Differential Revision: https://reviews.llvm.org/D90809	2020-11-05 19:06:50 -05:00
Yuriy Chernyshov	99e64623ec	Do not construct std::string from nullptr While I am trying to forbid such usages systematically in https://reviews.llvm.org/D79427 / P2166R0 to C++ standard, this PR fixes this (definitely incorrect) usage in llvm. This code is unreachable, so it could not cause any harm. Reviewed By: nikic, dblaikie Differential Revision: https://reviews.llvm.org/D87697	2020-11-05 15:23:26 -08:00
Craig Topper	defe11866a	[RISCV] Add isel patterns for fnmadd/fnmsub with an fneg on the second operand instead of the first. The multiply part of FMA is commutable, but TargetSelectionDAG.td doesn't have it marked as commutable so tablegen won't automatically create the additional patterns. So manually add commuted patterns.	2020-11-05 14:00:25 -08:00
Kazushi (Jam) Marukawa	f0e585d585	[VE] Add isReMaterializable and isAsCheapAsAMove flags Add isReMaterializable and isAsCheapAsAMove flags to integer instructions that are cheap. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90833	2020-11-06 06:09:10 +09:00
Sanjay Patel	264a6df353	[ARM] remove cost-kind predicate for cmp/sel costs This is the cmp/sel sibling to D90692. Again, the reasoning is: the throughput cost is number of instructions/uops, so size/blended costs are identical except in special cases (for example, fdiv or other known-expensive machine instructions or things like MVE that may require cracking into >1 uops). We need to check for a valid (non-null) condition type parameter because SimplifyCFG may pass nullptr for that (and so we will crash multiple regression tests without that check). I'm not sure if passing nullptr makes sense, but other code in the cost model does appear to check if that param is set or not. Differential Revision: https://reviews.llvm.org/D90781	2020-11-05 14:52:25 -05:00
Amara Emerson	f347d78cca	[AArch64][GlobalISel] Add AArch64::G_DUPLANE[X] opcodes for lane duplicates. These were previously handled by pattern matching shuffles in the selector, but adding a new opcode and making it equivalent to the AArch64duplane SDAG node allows us to select more patterns, like lane indexed FMLAs (patch adding a test for that will be committed later). The pattern matching code has been simply moved to postlegalize lowering. Differential Revision: https://reviews.llvm.org/D90820	2020-11-05 11:18:11 -08:00
Craig Topper	ce5f4f22e9	[RISCV] Use the 'si' lib call for (double (fp_to_sint/uint i32 X)) when F extension is enabled. D80526 added custom lowering to pick the si lib call on RV64, but this custom handling is only enabled when the F and D extension are both disabled. This prevents the si library call from being used for double when F is enabled but D is not. This patch changes the behavior so we always enable the Custom hook on RV64 and decide in ReplaceNodeResults if we should emit a libcall based on whether the FP type should be softened or not. Differential Revision: https://reviews.llvm.org/D90817	2020-11-05 10:46:45 -08:00
Stanislav Mekhanoshin	f738aee0bb	[AMDGPU] Add default 1 glc operand to rtn atomics This change adds a real glc operand to the return atomic instead of just string " glc" in the middle of the asm string. Improves asm parser diagnostics. Differential Revision: https://reviews.llvm.org/D90730	2020-11-05 10:41:59 -08:00
Craig Topper	ce1270fc7e	[RISCV] Remove shadow register list passed to AllocateReg when allocating FP registers for calling convention The _F and _D registers are already sub/super registers. When one gets allocated all its aliases are already marked as allocated. We don't need to explicitly shadow it too. I believe shadow is for calling conventions like 64-bit Windows on X86 where we have rules like this CCIfType<[i32], CCAssignToRegWithShadow<[ECX , EDX , R8D , R9D ], [XMM0, XMM1, XMM2, XMM3]>> For that calling convention the argument number determines which register is used regardless of how many scalars or vectors came before it. Removing this removes a question I had in D90738. Differential Revision: https://reviews.llvm.org/D90801	2020-11-05 09:49:42 -08:00
Craig Topper	c623584b6f	[RISCV] Add isel patterns for fshl with immediate to select FSRI/FSRIW There is no FSLI instruction, but we can emulate it using FSRI by swapping operands and subtracting the immediate from the bitwidth. Differential Revision: https://reviews.llvm.org/D90826	2020-11-05 09:37:43 -08:00
Sander de Smalen	d57bba7cf8	[SVE] Return StackOffset for TargetFrameLowering::getFrameIndexReference. To accommodate frame layouts that have both fixed and scalable objects on the stack, describing a stack location or offset using a pointer + uint64_t is not sufficient. For this reason, we've introduced the StackOffset class, which models both the fixed- and scalable sized offsets. The TargetFrameLowering::getFrameIndexReference is made to return a StackOffset, so that this can be used in other interfaces, such as to eliminate frame indices in PEI or to emit Debug locations for variables on the stack. This patch is purely mechanical and doesn't change the behaviour of how the result of this function is used for fixed-sized offsets. The patch adds various checks to assert that the offset has no scalable component, as frame offsets with a scalable component are not yet supported in various places. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D90018	2020-11-05 11:02:18 +00:00
Fangrui Song	96b0b9a5e3	[X86] Enable shrink-wrapping for no-frame-pointer non-nounwind functions on platforms not using compact unwind The current compact unwind scheme does not work when the prologue is not at the start (the instructions before the prologue cannot be described). (Technically this is fixable, but it requires multiple compact unwind descriptors for one function.) rL255175 chose to not perform shrink-wrapping for no-frame-pointer functions not marked as nounwind to work around PR25614. This is overly limited, as platforms not supporting compact unwind (all non-Darwin) do not need the workaround. This patch restricts the limitation to compact unwind platforms. Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D89930	2020-11-04 16:51:48 -08:00
Arthur Eubanks	ab0ddbc38a	Reland [NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback This allows targets to skip optional optimization passes at -O0. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D90777	2020-11-04 13:11:40 -08:00
Arthur Eubanks	9173b5a99d	Revert "[NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback" This reverts commit `7a83aa0520`. Causing buildbot failures.	2020-11-04 12:57:32 -08:00
Arthur Eubanks	7a83aa0520	[NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback This allows targets to skip optional optimization passes at -O0. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D90777	2020-11-04 12:53:30 -08:00
Eric Astor	07c4f1d10b	[ms] [llvm-ml] Lex MASM strings, including escaping Allow single-quoted strings and double-quoted character values, as well as doubled-quote escaping. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D89731	2020-11-04 15:28:43 -05:00
Cameron McInally	c126eb7529	[SelectionDAG] Add legalizations for VECREDUCE_SEQ_FMUL Hook up legalizations for VECREDUCE_SEQ_FMUL. This is following up on the VECREDUCE_SEQ_FADD work from D90247. Differential Revision: https://reviews.llvm.org/D90644	2020-11-04 14:20:31 -06:00
Mircea Trofin	5dc47541f9	[NFC] Use Register/MCRegister Differential Revision: https://reviews.llvm.org/D90724	2020-11-04 12:20:17 -08:00
Craig Topper	cc3bf27077	[RISCV] Remove assertsexti32 from fslw/fsrw isel patterns. The operations in these patterns shouldn't be affected by sign bits. And the pattern is starting from a sign_extend_inreg so we aren't expecting sign bits to be passed through either. Differential Revision: https://reviews.llvm.org/D90739	2020-11-04 11:37:58 -08:00
Craig Topper	d47300f503	[RISCV] Correct the operand order for fshl/fshr to fsl/fsr instructions. fsl/fsr take their shift amount in $rs2 or an immediate. The sources are $rs1 and $rs3. fshl/fshr ISD opcodes both concatenate operand 0 in the high bits and operand 1 in the lower bits. fshl returns the high bits after shifting and fshr returns the low bits. So a shift amount of 0 returns operand 0 for fshl and operand 1 for fshr. fsl/fsr concatenate their operands in different orders such that $rs1 will be returned for a shift amount of 0. So $rs1 needs to come from operand 0 of fshl and operand 1 of fshr. Differential Revision: https://reviews.llvm.org/D90735	2020-11-04 11:13:25 -08:00
Craig Topper	0122a4ea66	[RISCV] Remove assertsexti32 from inputs to riscv_sllw/srlw nodes in B extension isel patterns. riscv_sllw/srlw only reads the lower 32 bits of the first operand. And the lower 5 bits of the second operands. Whether the upper 32 bits of the input are sign bits or not doesn't matter. Also use ineg and not to shorten the patterns. Differential Revision: https://reviews.llvm.org/D90668	2020-11-04 10:35:05 -08:00
Craig Topper	857563eaf0	[RISCV] Check all 64-bits of the mask in SelectRORIW. We need to ensure the upper 32 bits of the mask are zero. So that the srl shifts zeroes into the lower 32 bits. Differential Revision: https://reviews.llvm.org/D90585	2020-11-04 10:15:30 -08:00
Christopher Tetreault	900ec97bbe	[UBSan] Cannot negate smallest negative signed integer Silence Undefined Behavior Sanitizer warning: runtime error: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long'); cast to an unsigned type to negate this value to itself Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D90710	2020-11-04 10:07:52 -08:00
Craig Topper	3701e33a22	[RISCV] Remove custom isel for (srl (shl val, 32), imm). Use pattern instead. NFCI We don't need custom matching, we just need a predicate to check the immediate is greater than 32. We can use the existing ImmSub32 to adjust the immediate. I've also used the new predicate in the other location that used ImmSub32. I tried to create a test case where we would break without the greater than 32 check on that pattern, but DAG combine defeated me. Still seemed safer to have it. Differential Revision: https://reviews.llvm.org/D90546	2020-11-04 09:59:14 -08:00
Joe Nash	58adab34c4	[AMDGPU] Resolve pseudo registers at encoding uses Pseudo-registers allow different register encodings between gpu generations. Make sure we resolve the pseudo regs to real regs whenever we get their hardware encoding. Using the correct encodings revealed a register bank conflict and an unnecessary write dependency. Tests have been updated to match. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D90721 Change-Id: I73c154cd24aecc820993b50bebaf4df97a5710ca	2020-11-04 12:52:32 -05:00
Sebastian Neubauer	31a0b2834f	[AMDGPU] Fix iterating in SIFixSGPRCopies The insertion of waterfall loops splits the current basic block into three blocks. So the basic block that we iterate over must be updated. This failed assert(!NodePtr->isKnownSentinel()) in ilist_iterator for divergent calls in branches before. Differential Revision: https://reviews.llvm.org/D90596	2020-11-04 18:43:19 +01:00
Paul C. Anagnostopoulos	d56cd4291e	[TableGen] Add !interleave operator to concatenate a list of values with delimiters Add a test. Use it in some TableGen files. Differential Revision: https://reviews.llvm.org/D90469	2020-11-04 09:23:54 -05:00
Simon Moll	351c10cc72	[VE] Add +vpu attribute `+vpu` controls whether VEISelLowering adds any vregs. This defaults to `-vpu` to have scalar code generation out of the box. We bring up vector isel under the `+vpu` flag. Once vector isel is stable we switch to `+vpu` and advertise vregs and vops in TTI. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D90465	2020-11-04 12:42:00 +01:00
Kerry McLaughlin	f2412d372d	[SVE][CodeGen] Lower scalable integer vector reductions This patch uses the existing LowerFixedLengthReductionToSVE function to also lower scalable vector reductions. A separate function has been added to lower VECREDUCE_AND & VECREDUCE_OR operations with predicate types using ptest. Lowering scalable floating-point reductions will be addressed in a follow up patch, for now these will hit the assertion added to expandVecReduce() in TargetLowering. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D89382	2020-11-04 11:38:49 +00:00
Sebastian Neubauer	1124bf4ab7	[AMDGPU] Set rsrc1 flags for graphics shaders Before they were only set for compute kernels and compute shaders but not for other shaders. Differential Revision: https://reviews.llvm.org/D89399	2020-11-04 12:25:41 +01:00
Sebastian Neubauer	76313288cd	[AMDGPU] Fix ieee mode default value Previously, the default value for ieee mode was - on for compute kernels and compute shaders, - off for all shaders except compute shaders. This commit changes the default to be - on for compute kernels, - off for shaders. This aligns the default value with the settings that are actually in use. To my knowledge, all users of shader calling conventions (mesa and llpc) disable the ieee mode by default. Differential Revision: https://reviews.llvm.org/D89388	2020-11-04 12:25:38 +01:00
David Green	eb611930b6	[ARM] Remove unused variable. NFC	2020-11-04 09:00:03 +00:00
Sander de Smalen	73b6cb67dc	[NFCI] Replace AArch64StackOffset by StackOffset. This patch replaces the AArch64StackOffset class by the generic one defined in TypeSize.h. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D88983	2020-11-04 08:49:00 +00:00
Amara Emerson	393b55380a	[AArch64][GlobalISel] Add combine for G_EXTRACT_VECTOR_ELT to allow selection of pairwise FADD. For the <2 x float> case, instead of adding another combine or legalization to get it into a <4 x float> form, I'm just adding a GISel specific selection pattern to cover it. Differential Revision: https://reviews.llvm.org/D90699	2020-11-03 17:25:14 -08:00
Julien Jorge	0fca651711	[WebAssembly] Don't fold frame offset for global addresses When machine instructions are in the form of ``` %0 = CONST_I32 @str %1 = ADD_I32 %stack.0, %0 %2 = LOAD 0, 0, %1 ``` In the `ADD_I32` instruction, it is possible to fold it if `%0` is a `CONST_I32` from an immediate number. But in this case it is a global address, so we shouldn't do that. But we haven't checked if the operand of `ADD` is an immediate so far. This fixes the problem. (The case applies the same for `ADD_I64` and `CONST_I64` instructions.) Fixes https://bugs.llvm.org/show_bug.cgi?id=47944. Patch by Julien Jorge (jjorge@quarkslab.com) Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D90577	2020-11-03 14:56:25 -08:00
Sanjay Patel	c40126e740	[ARM] remove cost-kind predicate for most math op costs This is based on the same idea that I am using for the basic model implementation and what I have partly already done for x86: throughput cost is number of instructions/uops, so size/blended costs are identical except in special cases (for example, fdiv or other known-expensive machine instructions or things like MVE that may require cracking into >1 uop). Differential Revision: https://reviews.llvm.org/D90692	2020-11-03 17:23:46 -05:00
Jordan Rupprecht	980bf1d5d1	[NFC] Inline wasm assertion-only variable	2020-11-03 13:06:59 -08:00
Andy Wingo	107c3a12d6	[WebAssembly] Implement ref.null This patch adds a new "heap type" operand kind to the WebAssembly MC layer, used by ref.null. Currently the possible values are "extern" and "func"; when typed function references come, though, this operand may be a type index. Note that the "heap type" production is still known as "refedtype" in the draft proposal; changing its name in the spec is ongoing (https://github.com/WebAssembly/reference-types/issues/123). The register form of ref.null is still untested. Differential Revision: https://reviews.llvm.org/D90608	2020-11-03 10:46:23 -08:00
Craig Topper	00eff96e1d	[RISCV] Add missing patterns for rotr with immediate for Zbb/Zbp extensions. DAGCombine doesn't canonicalize rotl/rotr with immediate so we need patterns for both. Remove the custom matcher for rotl to RORI and just use a SDNodeXForm to convert the immediate instead. Doing this gives priority to the rev32/rev16 versions of grevi over rori since an explicit immediate is more precise than any immediate. I also added rotr patterns for rev32/rev16. And removed the (or (shl), (shr)) patterns that should be combined to rotl by DAG combine. There is at least one other grev pattern that probably needs another rotr pattern, but we need more test coverage first. Differential Revision: https://reviews.llvm.org/D90575	2020-11-03 10:04:52 -08:00
Esme-Yi	5053eab890	Revert "[PowerPC] Extend folding RLWINM + RLWINM to post-RA." This reverts commit `119ab2181e`.	2020-11-03 16:34:02 +00:00
Tim Renouf	89d41f3a2b	[AMDGPU] Add gfx1033 target Differential Revision: https://reviews.llvm.org/D90447 Change-Id: If2650fc7f31bbdd49c76e74a9ca8e3734d769761	2020-11-03 16:27:48 +00:00
Tim Renouf	ee3e642627	[AMDGPU] Add gfx90c target This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were previously included in gfx909. Differential Revision: https://reviews.llvm.org/D90419 Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d	2020-11-03 16:27:43 +00:00
Jay Foad	040c50278c	[AMDGPU] Fix ds_read2/write2 with unaligned offsets These instructions use a scaled offset. We were wrongly selecting them even when the required offset was not a multiple of the scale factor. Differential Revision: https://reviews.llvm.org/D90607	2020-11-03 15:16:10 +00:00
Jameson Nash	a0ad066ce4	make the AsmPrinterHandler array public This lets external consumers customize the output, similar to how AssemblyAnnotationWriter lets the caller define callbacks when printing IR. The array of handlers already existed, this just cleans up the code so that it can be exposed publicly. Replaces https://reviews.llvm.org/D74158 Differential Revision: https://reviews.llvm.org/D89613	2020-11-03 10:02:09 -05:00
Sanjay Patel	9af561ec99	[x86] update cost table comments for maxnum; NFC Follow-up suggested in D90613.	2020-11-03 08:09:59 -05:00
David Green	bd32386410	[ARM] Remove unused variable. NFC	2020-11-03 12:58:10 +00:00
David Green	e474499402	[ARM] Treat memcpy/memset/memmove as call instructions for low overhead loops If an instruction will be lowered to a call there is no advantage of using a low overhead loop as the LR register will need to be spilled and reloaded around the call, and the low overhead will end up being reverted. This teaches our hardware loop lowering that these memory intrinsics will be calls under certain situations. Differential Revision: https://reviews.llvm.org/D90439	2020-11-03 11:53:09 +00:00
Nicholas Guy	54d8627852	[AArch64] Redundant masks in downcast long multiply Adds patterns to catch masks preceding a long multiply, and generating a single umull/smull instruction instead. Differential revision: https://reviews.llvm.org/D89956	2020-11-03 10:12:28 +00:00
Petar Avramovic	0031418dce	AMDGPU/GlobalISel: Use same builder/observer in post-legalizer-combiner Change match/apply functions into methods of new target specific combiner helper class. Use reference to MachineIRBuilder from helper instead of constructing new MachineIRBuilder each time a new instruction needs to be made. Allows correct tracking of newly created instructions. Differential Revision: https://reviews.llvm.org/D90623	2020-11-03 09:24:50 +01:00
Esme-Yi	119ab2181e	[PowerPC] Extend folding RLWINM + RLWINM to post-RA. Summary: This patch depends on D89846. We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINM will be generated after RA, for example rGc4690b007743. If the RLWINM generated after RA followed by another RLWINM, we expect to perform the optimization after RA, too. Reviewed By: shchenz, steven.zhang Differential Revision: https://reviews.llvm.org/D89855	2020-11-03 07:44:11 +00:00
Craig Topper	46e91f6701	[RISCV] Remove isel patterns for fshl/fshr with same inputs. NFC These were being selected to ROL/ROR, but DAG combine should canonicalize fshl/fshr with same inputs to rotl/rotr which we also have patterns for.	2020-11-02 23:12:18 -08:00
Esme-Yi	b969dfe26f	[NFC][PowerPC] Move the folding RLWINMs from ppc-mi-peephole to PPCInstrInfo. Summary: We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINM will be generated after RA, for example D88274. If the RLWINM generated after RA followed by another RLWINM, we expect to perform the optimization after RA, too. This is a NFC patch to move the folding patterns to PPCInstrInfo, and the follow-up works will be calling it in pre-emit-peephole and expand the patterns to handle more cases. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D89846	2020-11-03 06:28:56 +00:00
Jessica Clarke	7601a21738	[RISCV] Only return DestSourcePair from isCopyInstrImpl for registers ADDI often has a frameindex in operand 1, but consumers of this interface, such as MachineSink, tend to call getReg() on the Destination and Source operands, leading to the following crash when building FreeBSD after this implementation was added in 8cf6778d30: ``` clang: llvm/include/llvm/CodeGen/MachineOperand.h:359: llvm::Register llvm::MachineOperand::getReg() const: Assertion `isReg() && "This is not a register operand!"' failed. PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: #0 0x00007f4286f9b4d0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) llvm/lib/Support/Unix/Signals.inc:563:0 #1 0x00007f4286f9b587 PrintStackTraceSignalHandler(void) llvm/lib/Support/Unix/Signals.inc:630:0 #2 0x00007f4286f9926b llvm::sys::RunSignalHandlers() llvm/lib/Support/Signals.cpp:71:0 #3 0x00007f4286f9ae52 SignalHandler(int) llvm/lib/Support/Unix/Signals.inc:405:0 #4 0x00007f428646ffd0 (/lib/x86_64-linux-gnu/libc.so.6+0x3efd0) #5 0x00007f428646ff47 raise /build/glibc-2ORdQG/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0 #6 0x00007f42864718b1 abort /build/glibc-2ORdQG/glibc-2.27/stdlib/abort.c:81:0 #7 0x00007f428646142a __assert_fail_base /build/glibc-2ORdQG/glibc-2.27/assert/assert.c:89:0 #8 0x00007f42864614a2 (/lib/x86_64-linux-gnu/libc.so.6+0x304a2) #9 0x00007f428d4078e2 llvm::MachineOperand::getReg() const llvm/include/llvm/CodeGen/MachineOperand.h:359:0 #10 0x00007f428d8260e7 attemptDebugCopyProp(llvm::MachineInstr&, llvm::MachineInstr&) llvm/lib/CodeGen/MachineSink.cpp:862:0 #11 0x00007f428d826442 performSink(llvm::MachineInstr&, llvm::MachineBasicBlock&, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, llvm::SmallVectorImpl<llvm::MachineInstr>&) llvm/lib/CodeGen/MachineSink.cpp:918:0 #12 0x00007f428d826e27 (anonymous 
namespace)::MachineSinking::SinkInstruction(llvm::MachineInstr&, bool&, std::map<llvm::MachineBasicBlock, llvm::SmallVector<llvm::MachineBasicBlock, 4u>, std::less<llvm::MachineBasicBlock>, std::allocator<std::pair<llvm::MachineBasicBlock const, llvm::SmallVector<llvm::MachineBasicBlock*, 4u> > > >&) llvm/lib/CodeGen/MachineSink.cpp:1073:0 #13 0x00007f428d824a2c (anonymous namespace)::MachineSinking::ProcessBlock(llvm::MachineBasicBlock&) llvm/lib/CodeGen/MachineSink.cpp:410:0 #14 0x00007f428d824513 (anonymous namespace)::MachineSinking::runOnMachineFunction(llvm::MachineFunction&) llvm/lib/CodeGen/MachineSink.cpp:340:0 ``` Thus, check that operand 1 is also a register in the condition. Reviewed By: arichardson, luismarques Differential Revision: https://reviews.llvm.org/D89090	2020-11-03 03:55:47 +00:00
Qiu Chaofan	d14e51806b	[PowerPC] Skip IEEE 128-bit FP type in FastISel Vector types, quadword integers and f128 currently cannot be handled in FastISel. We did not skip f128 type in lowering arguments, which causes a crash. This patch will fix it. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D90206	2020-11-03 11:17:11 +08:00
Qiu Chaofan	3204ffeade	[PowerPC] [NFC] Rename VCMPo to VCMP_rec Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D90581	2020-11-03 11:10:59 +08:00
Fangrui Song	ca01a6b3ac	[PowerPC] Parse and ignore .machine ppc64 In the wild, kexec-tools purgatory/arch/ppc64/v2wrap.S and hvcall.S use this directive.	2020-11-02 16:49:57 -08:00
Krzysztof Parzyszek	b26a2755dc	[Hexagon] Move isTypeForHVX from Hexagon TTI to HexagonSubtarget, NFC It's useful outside of Hexagon TTI, and with how TTI is implemented, it is not accessible outside of TTI.	2020-11-02 14:00:45 -06:00
Stanislav Mekhanoshin	c9d6fe6f7d	[AMDGPU] Improve FLAT scratch detection We were using too broad a check for isFLATScratch() which also includes FLAT global. Differential Revision: https://reviews.llvm.org/D90505	2020-11-02 11:37:33 -08:00
Craig Topper	9ac2910093	[RISCV] Make SelectRORIW handle the commutability of OR. The SHL and SRL could be in opposite order so account for that. Differential Revision: https://reviews.llvm.org/D90586	2020-11-02 09:32:54 -08:00
Sanjay Patel	35fa3c474f	[x86] add AVX2 cost model entries for maxnum of 256-bit vectors As noticed in D90554 , the AVX2 costs for 256-bit vectors did not include FMAXNUM entries, so we fell back to AVX1 which assumes those ops will be split into 128-bit halves or something close to that. Differential Revision: https://reviews.llvm.org/D90613	2020-11-02 12:20:17 -05:00
Craig Topper	7142ec3aaf	[RISCV] When matching RORIW, make sure the same input is given to both shifts. The code is looking for (sext_inreg (or (shl X, C2), (shr (and Y, C3), C1))). We need to ensure X and Y are the same. Differential Revision: https://reviews.llvm.org/D90580	2020-11-02 09:12:40 -08:00
Momchil Velikov	7360d6d921	[ARM][MachineOutliner] Do not overestimate LR liveness in return block The `LiveRegUnits` utility (as well as `LivePhysRegs`) considers callee-saved registers to be alive at the point after the return instruction in a block. In the ARM backend, the `LR` register is classified as callee-saved, which is not really correct (from an ARM eABI or just common sense point of view). These two conditions cause the `MachineOutliner` to overestimate the liveness of `LR`, which results in unnecessary saves/restores of `LR` around calls to outlined sequences. It also causes the `MachineVerifier` to crash in some cases, because the save instruction reads a dead `LR`, for example when the following program: int h(int, int); int f(int a, int b, int c, int d) { a = h(a + 1, b - 1); b = b + c; return 1 + (2 * a + b) * (c - d) / (a - b) * (c + d); } int g(int a, int b, int c, int d) { a = h(a - 1, b + 1); b = b + c; return 2 + (2 * a + b) * (c - d) / (a - b) * (c + d); } is compiled with `-target arm-eabi -march=armv7-m -Oz`. This patch computes the liveness of `LR` in return blocks only, while taking into account the few ARM instructions, which read `LR`, but nevertheless the register is not mentioned (explicitly or implicitly) in the instruction operands. Differential Revision: https://reviews.llvm.org/D89189	2020-11-02 16:47:22 +00:00
Florian Hahn	b3b993a7ad	Reland "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts the revert commit `408c4408fa`. This version of the patch includes a fix for a crash caused by treating ICmp/FCmp constant expressions as instructions. Original message: On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV.	2020-11-02 15:39:29 +00:00
Matt Arsenault	86b8f6919b	AMDGPU: Reorder checks	2020-11-02 10:21:48 -05:00
Evgeny Leviant	cc96a82291	[TableGen][SchedModels] Fix read/write variant substitution Patch fixes case when sched class has write and read variants belonging to different processor models. Differential revision: https://reviews.llvm.org/D89777	2020-11-02 17:39:04 +03:00
Jay Foad	0892d2a311	Revert "Fix ds_read2/write2 unaligned offsets" This reverts commit `2e7e898c8f`. It was committed by mistake.	2020-11-02 14:01:33 +00:00
Jay Foad	2e7e898c8f	Fix ds_read2/write2 unaligned offsets	2020-11-02 13:57:13 +00:00
Simon Pilgrim	36920d5f9d	[RISCV] Avoid std::pair<> in FPReg StringSwitch to avoid MSVC compile failures. NFCI. As discussed on D90322, some MSVC builds are failing with is_trivially_copyable static asserts (see D86126) - we can avoid this by not using the std::pair<unsigned,unsigned> which held both the FP+DP Registers, just handle the FP register and convert to DP on the fly.	2020-11-02 11:30:57 +00:00
Caroline Concatto	71038788ce	Revert "[AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register." This reverts commit `8b281bfaf3`.	2020-11-02 08:15:50 +00:00
Caroline Concatto	8b281bfaf3	[AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register. Only the aliases 'xzr' and 'sp' exist for the physical register x31. The reason for wanting to remove the alias 'x31' is because it allows users to write invalid asm that is not accepted by the GNU assembler. Is there any objection to removing this alias? Or do we want to keep this for compatibility with existing code that uses w31/x31? Differential Revision: https://reviews.llvm.org/D90153	2020-11-02 07:57:05 +00:00
Qiu Chaofan	2762e6734f	[PowerPC] Fix a crash in POWER 9 setb peephole Variable InnerIsSel references FalseRes, while FalseRes might be zext/sext. So InnerIsSel should reference SetOrSelCC, otherwise a crash will happen. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D90142	2020-11-02 14:29:43 +08:00
Craig Topper	e57237f198	Recommit "[RISCV] Remove include of RISCVRegisterInfo.h from RISCVBaseInfo.h. NFCI" This reverts `781917254d` and recommits `781917254d`. I've changed getRegForInlineAsmConstraint to not use a std::pair of Register in a previous commit. Hopefully that fixes the reported issue with expensive checks on Windows. I'm still not sure exactly why this commit removing an include affected a different file. Original message: RISCVRegisterInfo.h is part of the CodeGen layer. The Utils library is intended to be shared with the MC layer so shouldn't use files from the CodeGen layer. The register enum names are already available from RISCVMCTargetDesc.h. It appears what was coming from this include was a transitive include of the Register class which I've replaced with MCRegister. Register has a constructor from MCRegister so it should be convertible.	2020-11-01 10:35:37 -08:00
Craig Topper	a76cd10fcd	[RISCV] Use 'unsigned' instead of Register in getRegForInlineAsmConstraint. NFC The return value of this interface still uses an 'unsigned' on all targets. So we convert Register back to unsigned at the end. I'm hoping this will prevent the issue that caused the revert of D90322.	2020-11-01 10:16:52 -08:00
Christudasan Devadasan	d6aa4aa29a	[AMDGPU] Some refactoring after D90404. NFC.	2020-11-01 13:18:53 +05:30
Christudasan Devadasan	9bb2b4f0aa	[AMDGPU] Add alignment check for v3 to v4 load type promotion It should be enabled only when the load alignment is at least 8-byte. Fixes: SWDEV-256824 Reviewed By: foad Differential Revision: https://reviews.llvm.org/D90404	2020-11-01 12:05:34 +05:30
Ayke van Laethem	e03ba2198d	[AVR] Improve inline rotate/shift expansions These expansions were rather inefficient and were done with more code than necessary. This change optimizes them to use expansions more similar to GCC. The code size is the same (when optimizing for code size) but somehow LLVM reorders blocks in a non-optimal way. Still, this should be an improvement with a reduction in code size of around 0.12% (when building compiler-rt). Differential Revision: https://reviews.llvm.org/D86418	2020-10-31 23:15:49 +01:00
Paul C. Anagnostopoulos	ef6f6d1c1a	[TableGen] Eliminate uses of true and false in .td files. They occurred in one NVPTX file and some test files. Differential Revision: https://reviews.llvm.org/D90513	2020-10-31 10:54:33 -04:00
David Green	30ad742644	[ARM] Fix crash for gather of pointer costs. If the elt size is unknown due to it being a pointer, a comparison against 0 will cause an assert. Make sure the elt size is large enough before comparing and for the moment just return the scalar cost.	2020-10-31 13:10:14 +00:00
Simon Pilgrim	9e406ee808	[X86] Make some basic VarArgsLoweringHelper helper methods const. NFCI. Fixes a number of cppcheck remarks.	2020-10-31 12:16:49 +00:00
Simon Pilgrim	e0cbcf96ce	[X86] Make the X86FrameSortingComparator operator const. NFCI. Fixes a cppcheck remark.	2020-10-31 12:16:49 +00:00
Simon Pilgrim	55dbb7d823	[X86] X86MCTargetDesc - ensure the declaration/definition variable names match. NFCI. Silences cppcheck mismatch warnings.	2020-10-31 11:50:00 +00:00
Simon Pilgrim	30a1d91127	[X86] Reduce scope of DestReg and use specific Register type not unsigned. NFCI.	2020-10-31 11:46:07 +00:00
Simon Pilgrim	ae80ac6db2	[X86] printAsmMRegister - make the X86AsmPrinter arg a const reference. NFC. Fixes cppcheck warning.	2020-10-31 11:41:14 +00:00
Simon Pilgrim	39f77b3224	[X86] assignValueToReg - fix Wshadow warning. NFCI. X86OutgoingValueHandler already has a MIB member	2020-10-31 11:39:26 +00:00
Simon Pilgrim	33e20008d1	[X86] printAsmVRegister - remove unused argument. NFC.	2020-10-31 11:34:28 +00:00
Simon Pilgrim	ec547a7517	[X86] X86AsmPrinter - ensure the declaration/definition variable names match. NFCI. Silences cppcheck mismatch warnings.	2020-10-31 11:31:46 +00:00
Simon Pilgrim	5eec049689	[X86] No need to determine pointer when the type is already a MachineInstr. NFCI. Caught by cppcheck - appears to be a copy+paste typo as the other var is an iterator that does need the & pointer operation.	2020-10-31 11:26:25 +00:00
Liu, Chen3	756f597841	[X86] Support Intel avxvnni This patch mainly made the following changes: 1. Support AVX-VNNI instructions; 2. Introduce ExplicitVEXPrefix flag so that vpdpbusd/vpdpbusds/vpdpwssd/vpdpwssds instructions only use vex-encoding when user explicitly add {vex} prefix. Differential Revision: https://reviews.llvm.org/D89105	2020-10-31 12:39:51 +08:00
Thomas Lively	a787e09779	[WebAssembly] Prototype i64x2.bitmask As proposed in https://github.com/WebAssembly/simd/pull/368. Differential Revision: https://reviews.llvm.org/D90514	2020-10-30 17:23:30 -07:00
Wouter van Oortmerssen	86cd2332ce	[WebAssembly] Fixed DWARF DW_AT_low_pc encoded as 64-bit in wasm64 Also added general wasm64 DWARF test Also added asserts for unsupported reloc combinations that triggered this bug. Differential Revision: https://reviews.llvm.org/D90503	2020-10-30 16:42:48 -07:00
Thomas Lively	0a512a555a	[WebAssembly] Prototype i64x2.eq As proposed in https://github.com/WebAssembly/simd/pull/381. Since it is still in the prototyping phase, it is only accessible via a target builtin function and a target intrinsic. Depends on D90504. Differential Revision: https://reviews.llvm.org/D90508	2020-10-30 16:38:15 -07:00
Thomas Lively	1cb0b56607	[WebAssembly] Prototype i64x2.widen_{low,high}_i32x4_{s,u} As proposed in https://github.com/WebAssembly/simd/pull/290. As usual, these instructions are available only via builtin functions and intrinsics while they are in the prototyping stage. Differential Revision: https://reviews.llvm.org/D90504	2020-10-30 15:44:04 -07:00
Florian Hahn	408c4408fa	Revert "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts commit `73f01e3df5`. This appears to break http://lab.llvm.org:8011/#/builders/85/builds/383.	2020-10-30 21:26:14 +00:00
Peter Collingbourne	3d049bce98	hwasan: Support for outlined checks in the Linux kernel. Add support for match-all tags and GOT-free runtime calls, which are both required for the kernel to be able to support outlined checks. This requires extending the access info to let the backend know when to enable these features. To make the code easier to maintain introduce an enum with the bit field positions for the access info. Allow outlined checks to be enabled with -mllvm -hwasan-inline-all-checks=0. Kernels that contain runtime support for outlined checks may pass this flag. Kernels lacking runtime support will continue to link because they do not pass the flag. Old versions of LLVM will ignore the flag and continue to use inline checks. With a separate kernel patch [1] I measured the code size of defconfig + tag-based KASAN, as well as boot time (i.e. time to init launch) on a DragonBoard 845c with an Android arm64 GKI kernel. The results are below: before: code size 92824064, boot time 6.18s; after: code size 38822400, boot time 6.65s. [1] https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76 Depends on D90425 Differential Revision: https://reviews.llvm.org/D90426	2020-10-30 14:25:40 -07:00
Cameron McInally	dda1e74b58	[Legalize] Add legalizations for VECREDUCE_SEQ_FADD Add Legalization support for VECREDUCE_SEQ_FADD, so that we don't need to depend on ExpandReductionsPass. Differential Revision: https://reviews.llvm.org/D90247	2020-10-30 16:02:55 -05:00
Peter Collingbourne	c9b1a2b41d	AArch64: Use SBFX instead of UBFX to extract address granule in outlined HWASan checks. In a kernel (or in general in environments where bit 55 of the address is set) the shadow base needs to point to the end of the shadow region, not the beginning. Bit 55 needs to be sign extended into bits 52-63 of the shadow base offset, otherwise we end up loading from an invalid address. We can do this by using SBFX instead of UBFX. Using SBFX should have no effect in the userspace case where bit 55 of the address is clear so we do so unconditionally. I don't think we need an ABI version bump for this (but one will come anyway when we switch to x20 for the shadow base register). Differential Revision: https://reviews.llvm.org/D90424	2020-10-30 12:53:15 -07:00
Peter Collingbourne	3859fc653f	AArch64: Switch to x20 as the shadow base register for outlined HWASan checks. From a code size perspective it turns out to be better to use a callee-saved register to pass the shadow base. For non-leaf functions it avoids the need to reload the shadow base into x9 after each function call, at the cost of an additional stack slot to save the caller's x20. But with x9 there is also a stack size cost, either as a result of copying x9 to a callee-saved register across calls or by spilling it to stack, so for the non-leaf functions the change to stack usage is largely neutral. It is also code size (and stack size) neutral for many leaf functions. Although they now need to save/restore x20 this can typically be combined via LDP/STP into the x30 save/restore. In the case where the function needs callee-saved registers or stack spills we end up needing, on average, 8 more bytes of stack and 1 more instruction but given the improvements to other functions this seems like the right tradeoff. Unfortunately we cannot change the register for the v1 (non short granules) check because the runtime assumes that the shadow base register is stored in x9, so the v1 check still uses x9. Aside from that there is no change to the ABI because the choice of shadow base register is a contract between the caller and the outlined check function, both of which are compiler generated. We do need to rename the v2 check functions though because the functions are deduplicated based on their names, not on their contents, and we need to make sure that when object files from old and new compilers are linked together we don't end up with a function that uses x9 calling an outlined check that uses x20 or vice versa. With this change code size of /system/lib64/*.so in an Android build with HWASan goes from 200066976 bytes to 194085912 bytes, or a 3% decrease. Differential Revision: https://reviews.llvm.org/D90422	2020-10-30 12:51:30 -07:00
Craig Topper	6915c76e10	[RISCV] Don't use DCI.CombineTo to replace a single result. NFCI Just return the new node, which is the standard practice. I also noticed what appeared to be an unnecessary attempt at creating an ANY_EXTEND where the type should already be correct. I replace with an assert to verify the type. Differential Revision: https://reviews.llvm.org/D90444	2020-10-30 10:46:32 -07:00
Sanjay Patel	251dd7c0f9	[x86] add cost overrides for mul with overflow I'm assuming the standard size integer instructions for this end up as something like: `mulq %rsi` followed by `seto %al`. And the 'mul' generally has reciprocal throughput of 1 on typical implementations (higher latency, but that's not handled here). The default costs may end up much higher than that, and that's what we see in the test diffs. Vector types are left as a 'TODO'. Differential Revision: https://reviews.llvm.org/D90431	2020-10-30 12:38:16 -04:00
Simon Moll	4474d4d49c	[VE][NFC] Split up lowering init Split up the monolithic VETargetLowering ctor into three initialization phases: 1. initRegisterClasses() 2. initSPUActions() 3. // TODO initVPUActions() Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D90463	2020-10-30 16:18:27 +01:00
Matt Arsenault	790f5771fd	AMDGPU: Fix missing writelane cases to skip with exec=0	2020-10-30 11:15:11 -04:00
serge-sans-paille	0f60bcc36c	[stack-clash] Fix probing of dynamic alloca - Perform the probing in the correct direction. Related to https://github.com/rust-lang/rust/pull/77885#issuecomment-711062924 - The first touch on a dynamic alloca cannot use a mov because it clobbers existing space. Use a xor 0 instead Differential Revision: https://reviews.llvm.org/D90216	2020-10-30 15:34:00 +01:00
Simon Pilgrim	0ff1ab42f2	Use cast<> instead of dyn_cast<> as we dereference the pointer immediately. NFCI. Fix clang static analyzer warning - we know that the arg should be ConstantInt and we're better off relying on cast<> asserting on failure rather than a null dereference crash.	2020-10-30 14:33:20 +00:00
Florian Hahn	73f01e3df5	[TTI] Add VecPred argument to getCmpSelInstrCost. On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV. Reviewed By: dmgreen, RKSimon Differential Revision: https://reviews.llvm.org/D90070	2020-10-30 13:49:08 +00:00
David Sherwood	cea69fa4dc	[SVE] Add fatal error for unnamed SVE variadic arguments We don't currently support passing unnamed variadic SVE arguments so I've added a fatal error if we hit such cases to prevent any silent ABI issues in future. Differential Revision: https://reviews.llvm.org/D90230	2020-10-30 13:35:47 +00:00
David Green	d14db8c8dc	[ARM] Match MVE vqdmulh This adds ISel matching for a form of VQDMULH. There are several ir patterns that we could match to that instruction, this one is for: min(ashr(mul(sext(a), sext(b)), 7), 127) Which is what llvm will optimize to once it has removed the max that usually makes up the min/max saturate pattern, as in this case the compare will always be false. The additional complication to match i32 patterns (which extend into an i64) is that the min will be a vselect/setcc, as vmin is not supported for i64 vectors. Tablegen patterns have also been updated to attempt to reuse the MVE_TwoOpPattern patterns. Differential Revision: https://reviews.llvm.org/D90096	2020-10-30 13:34:27 +00:00
Simon Pilgrim	781917254d	Revert rG22c383763456 "[RISCV] Remove include of RISCVRegisterInfo.h from RISCVBaseInfo.h" This reverts commit `22c3837634`. This is causing a build failure with MSVC - reported on D90322	2020-10-30 11:59:37 +00:00
alex-t	a4f7e4264c	[AMDGPU] SILowerControlFlow::removeMBBifRedundant. Refactoring plus fix for the null MBB pointer in MF->splice Detailed description: This change addresses the refactoring advised by foad. It also contains the fix for the case when getNextNode is null if the successor block is the last in MachineFunction. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D90314	2020-10-30 14:46:08 +03:00
Michael Roe	fc0892c1f9	[mips] Implement add.ps, mul.ps and sub.ps Differential revision: https://reviews.llvm.org/D90321	2020-10-30 10:59:15 +03:00
Krzysztof Parzyszek	db60e64036	[Hexagon] Handle additional shuffles that can be made perfect	2020-10-29 19:09:00 -05:00
Craig Topper	74b078294f	[RISCV] Improve worklist management in the DAG combine for SLLW/SRLW/SRAW This combine makes two calls to SimplifyDemandedBits, one for the LHS and one for the RHS. If the LHS call returns true, we don't make the RHS call. When SimplifyDemandedBits makes a change, it will add the nodes around the change to the DAG combiner worklist. If the simplification happens on the first recursion step, then N will get added to the worklist. But if the simplification happens deeper in the recursion, then N will not be revisited until the next time the DAG combiner runs. This patch explicitly adds N to the worklist any time a simplification is made. Without this we might miss additional simplifications on the LHS or never simplify the RHS. Special care also needs to be taken to not add N if it has been CSEd by the simplification. There are similar examples in DAGCombiner and the X86 target, but I don't have a test for it for RISC-V. I've also returned SDValue(N, 0) instead of SDValue() so DAGCombiner knows a change was made and will update its Statistic variable. The test here was constructed so that 2 simplifications happen to the LHS. Without this fix one happens in the post type legalization DAG combine and the other happens after LegalizeDAG. This prevents the RHS from ever being simplified causing the left and right shift to clear the upper 32 bits of the RHS to be left behind. Differential Revision: https://reviews.llvm.org/D90339	2020-10-29 14:52:53 -07:00
Craig Topper	22c3837634	[RISCV] Remove include of RISCVRegisterInfo.h from RISCVBaseInfo.h RISCVRegisterInfo.h is part of the CodeGen layer. The Utils library is intended to be shared with the MC layer so shouldn't use files from the CodeGen layer. The register enum names are already available from RISCVMCTargetDesc.h. It appears what was coming from this include was a transitive include of the Register class which I've replaced with MCRegister. Register has a constructor from MCRegister so it should be convertible.	2020-10-29 11:39:19 -07:00
Thomas Lively	be6f50798e	[WebAssembly] Implement SIMD signselect instructions As proposed in https://github.com/WebAssembly/simd/pull/124, using the opcodes adopted by V8 in https://chromium-review.googlesource.com/c/v8/v8/+/2486235/2/src/wasm/wasm-opcodes.h. Uses new builtin functions and a new target intrinsic exclusively to ensure that the new instructions are only emitted when a user explicitly opts in to using them since they are still in the prototyping and evaluation phase. Differential Revision: https://reviews.llvm.org/D90357	2020-10-29 11:06:20 -07:00
Jay Foad	9cee87d72a	[AMDGPU] Fix double space in disassembly of ds_gws_sema_* with gds By setting up the AsmStrings correctly we can remove some special cases from AMDGPUInstPrinter::printOffset. Differential Revision: https://reviews.llvm.org/D90307	2020-10-29 17:31:59 +00:00
Jay Foad	58de4b2053	[AMDGPU] Use pseudo instructions for readlane/writelane This reverts r227987 "R600/SI: Determine target-specific encoding of READLANE and WRITELANE early v2". All the codegen changes are caused by the post-RA scheduler no longer treating readlane/writelane as scheduling barriers due to having unmodelled side effects. (The pseudos are hasSideEffects = 0, but the real instructions are hasSideEffects = ? which TableGen conservatively treats as 1.) Differential Revision: https://reviews.llvm.org/D90401	2020-10-29 16:00:53 +00:00
Nicholas Guy	eb9fe24eaf	[ARM] Fix IT block generation after Thumb2SizeReduce with -Oz Fixes a regression caused by D82439, in which IT blocks were no longer being generated when -Oz is present. Differential Revision: https://reviews.llvm.org/D88496	2020-10-29 15:17:31 +00:00
Jay Foad	7a79921edd	[AMDGPU] Remove gds operand from ds_gws_* MachineInstrs The operand value was always 1 (except in some bad MIR tests) so it was redundant. Differential Revision: https://reviews.llvm.org/D90378	2020-10-29 15:04:23 +00:00
Jay Foad	a442fad911	[AMDGPU] Fix double space in disassembly of s_set_gpr_idx_mode Differential Revision: https://reviews.llvm.org/D90374	2020-10-29 14:54:33 +00:00
Jay Foad	e9dd2c4fe2	[AMDGPU] Fix double space in disassembly of some DPP instructions Differential Revision: https://reviews.llvm.org/D90373	2020-10-29 14:54:33 +00:00
Kazushi (Jam) Marukawa	58a6b7bcde	[VE] Add missing BCR format Add missing "BCR %sy, 0, target" format instruction and a regression test for this format. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90387	2020-10-29 23:30:49 +09:00
Kazushi (Jam) Marukawa	07d1996601	[VE] Support register aliases in llvm-mc Support register aliases in MC layer to compile existing assembly files with clang and integrated assembler. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90383	2020-10-29 23:28:32 +09:00
Jay Foad	69f5105f5c	[AMDGPU] Simplify insertNoops functions. NFC.	2020-10-29 10:55:20 +00:00
Kazushi (Jam) Marukawa	9c82944b2d	[VE] Add vector control instructions Add LVL/SVL/SMVL/LVIX instructions. Add regression tests too. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90355	2020-10-29 19:24:31 +09:00
Ben Shi	076a8d915b	[NFC][AVR] Improve device list Reviewed By: dylanmckay https://reviews.llvm.org/D87968	2020-10-29 10:54:17 +08:00
Kazushi (Jam) Marukawa	7942960199	[VE] Add vector mask operation instructions Add VFMK/VFMS/VFMF/ANDM/ORM/XORM/EQVM/NNDM/NEGM/PCVM/LZVM/TOVM instructions. Add regression tests too. Also add new patterns to parse VFMK/VFMS/VFMF mnemonics. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90297	2020-10-29 08:42:41 +09:00
Austin Kerbow	de51867343	[AMDGPU] Add Reset function to GCNHazardRecognizer Reset the tracked emitted instructions when starting scheduling on a new region. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D90347	2020-10-28 16:32:32 -07:00
Jay Foad	5b91a6a88b	[AMDGPU] Allow some modifiers on VOP3B instructions V_DIV_SCALE_F32/F64 are VOP3B encoded so they can't use the ABS src modifier, but they can still use NEG and the usual output modifiers. This partially reverts `3b99f12a4e` "AMDGPU: Remove modifiers from v_div_scale_*". Differential Revision: https://reviews.llvm.org/D90296	2020-10-28 21:54:14 +00:00
Jay Foad	50ee22d791	[AMDGPU] Fix double space in disassembly of SDWA instructions with vcc Differential Revision: https://reviews.llvm.org/D90317	2020-10-28 21:39:39 +00:00
Florian Hahn	772aaa6023	[AArch64] Improve lowering of insert_vector_elt with 0.0 consts. When moving +0.0 into a float vector, we can use the vi*gpr variants of INS. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D90176	2020-10-28 21:35:33 +00:00
Austin Kerbow	8b127a8661	[AMDGPU] Fix inserting combined s_nop in bundles Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D90334	2020-10-28 14:34:04 -07:00
Florian Hahn	ba78cae20f	[AArch64] Use DUP for BUILD_VECTOR with few different elements. If most elements of BUILD_VECTOR are the same, with a few different elements, it is better to use DUP for the common elements and INSERT_VECTOR_ELT for the different elements. Currently this transform is guarded quite restrictively to only trigger in clearly beneficial cases. With D90176, the lowering for patterns originating from code like ` float32x4_t y = {a,a,a,0};` (common in 3D apps) are lowered even better (unnecessary fmov is removed). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D90233	2020-10-28 19:48:20 +00:00
Sanjay Patel	7c395f31a6	[CostModel][x86] remove cost-kind predicate for intrinsic costs We model cost as number of instructions / uops, so it does not make sense to treat size/blended costs any differently than throughput.	2020-10-28 14:33:37 -04:00
Thomas Lively	31e944556f	[WebAssembly] Prototype extending multiplication SIMD instructions As proposed in https://github.com/WebAssembly/simd/pull/376. This commit implements new builtin functions and intrinsics for these instructions, but does not yet add them to wasm_simd128.h because they have not yet been merged to the proposal. These are the first instructions with opcodes greater than 0xff, so this commit updates the MC layer and disassembler to handle that correctly. Differential Revision: https://reviews.llvm.org/D90253	2020-10-28 09:38:59 -07:00
Paul C. Anagnostopoulos	9d72065cf6	[TableGen] [AMDGPU] Add !sub operator for subtraction Use it in the AMDGPU target to eliminate !add(value1, !mul(value2, -1)) Differential Revision: https://reviews.llvm.org/D90107	2020-10-28 12:27:53 -04:00
Jay Foad	9e634bc22f	[AMDGPU] Omit needless string concatenations. NFC.	2020-10-28 12:56:52 +00:00
Kazushi (Jam) Marukawa	cbdee7df06	[VE] Add vector merger operation instructions Add VMRG/VSHF/VCP/VEX instructions. Add regression tests too. Also add new patterns to parse new UImm4 operand. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90292	2020-10-28 19:57:10 +09:00
Kazushi (Jam) Marukawa	7ce2b93cbe	[VE] Add vector iterative operation instructions Add VFIA/VFIS/VFIM/VFIAM/VFISM/VFIMA/VFIMS instructions. Add regression tests too. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90252	2020-10-28 19:06:46 +09:00
Kazushi (Jam) Marukawa	15f6250bed	[VE][NFC] Fix typo in comment	2020-10-28 18:51:07 +09:00
Kazushi (Jam) Marukawa	b22e32a9c8	[VE] Specify to expand BRIND and BR_JT BRIND and BR_JT are not implemented yet, so expand them for now. Add regression tests too. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90283	2020-10-28 18:50:20 +09:00
David Green	066737fdbc	[AArch64] Remove AArch64ISD::NOT, use vnot instead vnot (xor -1) should be equivalent to the AArch64 specific AArch64ISD::NOT node, but allow more folding thanks to all the target independent optimizations. Specifically this allows select(icmp ne, x, y) to become "cmeq; bsl y, x" as opposed to needing to convert the predicate with "cmeq; mvn; bsl x, y" Unfortunately there is a regression in a cmtst test, but the code it selected from was already non-canonical, with instcombine preferring to use an eq predicate instead. Plus the more common case of icmp ne is improved. Differential Revision: https://reviews.llvm.org/D90126	2020-10-28 08:15:37 +00:00
Carl Ritson	057934a6d7	[AMDGPU] Fix insert of SIPreAllocateWWMRegs in FastRegAlloc SIPreAllocateWWMRegs was being inserted after RegisterCoalescer but this pass does not exist during FastAlloc so pre-allocation pass was never being run. Insert pre-allocation after TwoAddressInstructionPass instead. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D90236	2020-10-28 12:15:15 +09:00
Nemanja Ivanovic	5459d08795	[PowerPC] Fix single-use check and update chain users for ld-splat When converting a BUILD_VECTOR or VECTOR_SHUFFLE to a splatting load as of `1461fb6e78`, we inaccurately check for a single user of the load and neglect to update the users of the output chain of the original load. As a result, we can emit a new load when the original load is kept and the new load can be reordered after a dependent store. This patch fixes those two issues. Fixes https://bugs.llvm.org/show_bug.cgi?id=47891	2020-10-27 16:49:38 -05:00
Stanislav Mekhanoshin	78ae1f6c90	[AMDGPU] Change predicate for fma/fmac legacy I do not exactly like the use of a negative predicate to enable instructions' support. Replace HasNoMadMacF32Insts with HasFmaLegacy32. Differential Revision: https://reviews.llvm.org/D90250	2020-10-27 12:03:52 -07:00
Victor Huang	2e1a737f46	[PowerPC][PCRelative] Turn on TLS support for PCRel by default Turn on TLS support for PCRel by default and update the test cases. Differential Revision: https://reviews.llvm.org/D88738 Reviewed by: stefanp, kamaub	2020-10-27 13:58:44 -05:00
Michael Liao	46c3d5cb05	[amdgpu] Add the late codegen preparation pass. Summary: - Teach that pass to widen naturally aligned but not DWORD aligned sub-DWORD loads. Reviewers: rampitec, arsenm Subscribers: Tags: #llvm Differential Revision: https://reviews.llvm.org/D80364	2020-10-27 14:07:59 -04:00
Kazushi (Jam) Marukawa	a65883a78a	[VE] Add vector reduction instructions Add VSUMS/VSUMX/VFSUM/VMAXS/VMAXX/VFMAX/VRAND/VROR/VRXOR instructions. Add regression tests too. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90227	2020-10-28 02:33:21 +09:00
Michael Liao	0d092303b4	[amdgpu] Enable use of AA during codegen. - Add an internal option `-amdgpu-use-aa-in-codegen` to enable or disable this feature. By default, it's enabled. Differential Revision: https://reviews.llvm.org/D89320	2020-10-27 09:46:23 -04:00
Benjamin Kramer	35f7cbf9df	[X86] Don't crash on CVTPS2PH with wide vector inputs.	2020-10-27 14:42:02 +01:00
Kazushi (Jam) Marukawa	c5fa6bae12	[VE] Add vector float instructions Add VFAD/VFSB/VFMP/VFDV/VFSQRT/VFCP/VFCM/VFMAD/VFMSB/VFNMAD/VFNMSB/ VRCP/VRSQRT/VRSQRTNEX/VFIX/VFIXX/VFLT/VFLTX/VCVS/VCVD instructions. Add regression tests too. Also add additional AsmParser for VFIX and VFIXX instructions to parse their mnemonic. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90166	2020-10-27 20:42:24 +09:00
Jay Foad	6539ebe97d	[AMDGPU] Use DPP instead of Ext in a couple of class names. NFC.	2020-10-27 10:22:30 +00:00
Craig Topper	f385823e04	[X86] Alternate implementation of D88194. This uses PreprocessISelDAG to replace the constant before instruction selection instead of matching opcodes after. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D89178	2020-10-27 00:20:03 -07:00
Wei Wang	d602e79a81	[X86] Encode global address in small code model In small code model, program and its symbols are linked in the lower 2 GB of the address space. Try encoding global address even when the range is unknown in such case. Differential Revision: https://reviews.llvm.org/D89341	2020-10-26 23:14:06 -07:00
Bing1 Yu	2c08f1b4b6	[CostModel][X86] Teach TTI to calculate the cost of a chain of vector inserts/extracts more precisely and correctly. In each 128-bit lane, if at least one index is demanded but not all indices are demanded, and this lane is not the first 128-bit lane of the legalized vector, then this lane needs an extracti128. If at least one index in a 128-bit lane is demanded, this lane needs an inserti128. The following cases illustrate this. Assume we insert several elements into a v8i32 vector in AVX2. Case#1: inserting into index 1 needs vpinsrd + inserti128. Case#2: inserting into index 5 needs extracti128 + vpinsrd + inserti128. Case#3: inserting into indices 4, 5, 6, 7 needs 4*vpinsrd + inserti128. Reviewed By: pengfei, RKSimon Differential Revision: https://reviews.llvm.org/D89767	2020-10-27 11:21:13 +08:00
Chen Zheng	00e573cadb	[LSR] fix typo in comments and rename for a new added hook.	2020-10-26 22:29:22 -04:00
Carl Ritson	7a880ab388	[AMDGPU] Move WQM Pass after MI Scheduler Exec mask manipulation inserted by SIWholeQuadMode acts as a barrier to instruction scheduling. Move the entire pass after the machine instruction scheduler and make changes so the pass is correct for non-SSA operation. These changes should leave the pass still usable pre-scheduler, although tests have been updated to reflect post-scheduler results. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D88081	2020-10-27 10:25:53 +09:00
Amy Kwan	803cc3aff2	[PowerPC] Implement Set Boolean Condition Instructions This patch implements the set boolean condition instructions introduced in POWER10. The set boolean condition instructions (set[n]bc[r]) are used during the following situations: - sign/zero/any extending i1 to an i32 or i64, - reg+reg, reg+imm or floating point comparisons being sign/zero extended to i32 or i64, - spilling CR bits (using the setnbc instruction) Differential Revision: https://reviews.llvm.org/D87705	2020-10-26 18:42:51 -05:00
Stanislav Mekhanoshin	d176e13ca5	Fixed release build after D89170	2020-10-26 16:00:57 -07:00
Stanislav Mekhanoshin	038d884a50	[AMDGPU] Use flat scratch instructions where available The support is disabled by default. So far there is instruction selection, spilling, and frame elimination. It also changes SP from unswizzled to swizzled as used by flat scratch instructions, so it cannot be mixed with MUBUF stack access. At the very least missing: - GlobalISel; - Some optimizations in frame elimination in between vector and scalar ALU; - It shall finally allow to always materialize frame index as an SGPR, but that is not implemented and frame elimination cannot handle it yet; - Unaligned and/or multidword flat scratch shall work, but it is legalized now for MUBUF; - Operand folding cannot optimize FI like with MUBUF yet; - It will need scaling the value of the SP/FP in the DWARF expression to recover the unswizzled scratch address; Differential Revision: https://reviews.llvm.org/D89170	2020-10-26 14:40:42 -07:00
Evgeny Leviant	a28388f95b	[ARM][SchedModels] Move IsLDMBaseRegInListPred to ARMSchedule.td. NFC This predicate is not specific to cortex-a57 and can be used in other processor models as well.	2020-10-26 22:31:41 +03:00
Stanislav Mekhanoshin	ad8131bb03	[AMDGPU] Fix VC warning about signed/unsigned comparison. NFC. This is the warning reported in https://reviews.llvm.org/D89599	2020-10-26 11:55:57 -07:00
Evgeny Leviant	e74f66125e	[ARM][SchedModels] Convert IsLdstsoScaledNotOptimalPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90150	2020-10-26 20:22:41 +03:00
Evgeny Leviant	a877bda397	Fix issue in cortex-a57 sched model Differential revision: https://reviews.llvm.org/D90152	2020-10-26 20:16:40 +03:00
Benjamin Kramer	b777d30496	[AMDGPU] Avoid unused variable warning in Release builds. NFC. SIRegisterInfo.cpp:480:19: error: unused variable 'SOffset'	2020-10-26 18:11:57 +01:00
Kazushi (Jam) Marukawa	9d0db405b5	[VE] Add vector shift instructions Add VSLL/VSLD/VSRL/VSLA/VSLAX/VSRA/VSRAX/VSFA instructions. Add additional AsmParser for VSLD special operand. Also add regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90143	2020-10-27 00:30:27 +09:00
Kazushi (Jam) Marukawa	83cb423c6e	[VE] Add vector logical instructions Add VAND/VOR/VXOR/VEQV/VLDZ/VPCNT/VBRV/VSEQ instructions and regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90141	2020-10-27 00:29:33 +09:00
Kazushi (Jam) Marukawa	cfefef50c1	[VE] Support atomic store Support atomic store instructions and add a regression test. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90137	2020-10-27 00:28:11 +09:00
Jay Foad	0ca4124798	[AMDGPU] Make more use of printNamedBit in AMDGPUInstPrinter. NFC.	2020-10-26 14:03:35 +00:00
Kazushi (Jam) Marukawa	8aa60f67dc	[VE] Add vector comparison and min/max Add VCMP/VCPS/VCPX/VCMS/VCMX vector instructions. Also add regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89643	2020-10-26 18:32:04 +09:00
Kazushi (Jam) Marukawa	0acf700243	[VE] Add integer arithmetic vector instructions Add VADD/VADS/VADX/VSUB/VSBS/VSBX/VMPY/VMPS/VMPX/VMPD/VDIV/VDVS/VDVX instructions. Also add regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89642	2020-10-26 18:30:11 +09:00
Sebastian Neubauer	a094b4fa4b	[AMDGPU] Emit new pal metadata by default If no pal metadata is given, default to the msgpack format instead of the legacy metadata. This makes tests more readable. Differential Revision: https://reviews.llvm.org/D90035	2020-10-26 10:16:17 +01:00
Evgeny Leviant	a95ce5f65f	[ARM][SchedModels] Rename and generalize predicate. NFC	2020-10-26 12:14:55 +03:00

60269 Commits