llvm-project

Author	SHA1	Message	Date
Craig Topper	3f48dbf72e	[X86] Allow LowerTRUNCATE to use PACKUS/PACKSS for v16i16->v16i8 truncate when -mprefer-vector-width-256 is in effect and BWI is not available. llvm-svn: 350473	2019-01-05 18:48:11 +00:00
Craig Topper	45ec002e25	[X86] Require second operand of X86vshiftuniform to be an integer. NFC We don't need to require the first operand to be an integer because we already said it was the same type as the result which we also constrained to an integer. llvm-svn: 350455	2019-01-05 01:40:29 +00:00
Nikita Popov	c35b4a37ba	[X86] Fix warning; NFC llvm-svn: 350437	2019-01-04 21:41:35 +00:00
Vyacheslav Zakharin	0a6f86c54b	Update the pr_datasz of .note.gnu.property section. Patch by Xiang Zhang. Differential Revision: https://reviews.llvm.org/D56080 llvm-svn: 350436	2019-01-04 21:25:01 +00:00
Evandro Menezes	9f53bea536	[AArch64] Adjust the cost model for Exynos M3 Improve the modeling of ASIMD loads and stores. llvm-svn: 350434	2019-01-04 21:02:25 +00:00
Sanjay Patel	6153565511	[x86] lower extracted fadd/fsub to horizontal vector math; 2nd try The 1st try for this was at rL350369, but it caused IR-level diffs because our cost models differentiate custom vs. legal/promote lowering. So that was reverted at rL350373. The cost models were fixed independently at rL350403, so this is effectively the same patch as last time. Original commit message: This would show up if we fix horizontal reductions to narrow as they go along, but it's an improvement for size and/or Jaguar (fast-hops) independent of that. We need to do this late to not interfere with other pattern matching of larger horizontal sequences. We can extend this to integer ops in a follow-up patch. Differential Revision: https://reviews.llvm.org/D56011 llvm-svn: 350421	2019-01-04 17:48:13 +00:00
Nirav Dave	1468d6e1c5	Undo r350355 "[X86] Remove terrible DX Register parsing hack in parse operand. NFCI." Add missing test case and update comments. llvm-svn: 350406	2019-01-04 17:11:15 +00:00
Simon Pilgrim	c2054144ee	[CostModel][X86] Fix SSE1 FADD/FSUB costs Noticed in D56011 - handle the case that scalar fp ops are quicker on P3 than P4 Add the other costs so that we're not relying on the default "is legal/custom" cost logic. llvm-svn: 350403	2019-01-04 16:55:57 +00:00
Simon Pilgrim	9f4dea8c06	[X86] Add VPSLLI/VPSRLI ((X >>u C1) << C2) SimplifyDemandedBits combine Repeat of the generic SimplifyDemandedBits shift combine llvm-svn: 350399	2019-01-04 15:43:43 +00:00
Richard Trieu	e1fef949ae	[WebAssembly] Split the checking from the sorting logic. Move the check for -1 and identical values outside the vector sorting code. Compare functions need to be able to compare identical elements to be conforming. llvm-svn: 350379	2019-01-04 06:49:24 +00:00
Craig Topper	6265a15f2e	[X86] Add post-isel peephole to fold KAND+KORTEST into KTEST if only the zero flag is used. Doing this late so we will prefer to fold the AND into a masked comparison first. That can be better for the live range of the mask register. Differential Revision: https://reviews.llvm.org/D56246 llvm-svn: 350374	2019-01-04 00:10:58 +00:00
Sanjay Patel	26ce9c38a7	revert r350369: [x86] lower extracted fadd/fsub to horizontal vector math There are non-codegen tests that need to be updated with this code change. llvm-svn: 350373	2019-01-04 00:02:02 +00:00
Sanjay Patel	ef4afca2ad	[x86] lower extracted fadd/fsub to horizontal vector math This would show up if we fix horizontal reductions to narrow as they go along, but it's an improvement for size and/or Jaguar (fast-hops) independent of that. We need to do this late to not interfere with other pattern matching of larger horizontal sequences. We can extend this to integer ops in a follow-up patch. Differential Revision: https://reviews.llvm.org/D56011 llvm-svn: 350369	2019-01-03 23:16:19 +00:00
Heejin Ahn	777d01c756	[WebAssembly] Optimize Irreducible Control Flow Summary: Irreducible control flow is not that rare, e.g. it happens in malloc and 3 other places in the libc portions linked in to a hello world program. This patch improves how we handle that code: it emits a br_table to dispatch to only the minimal necessary number of blocks. This reduces the size of malloc by 33%, and makes it comparable in size to asm2wasm's malloc output. Added some tests, and verified this passes the emscripten-wasm tests run on the waterfall (binaryen2, wasmobj2, other). Reviewers: aheejin, sunfish Subscribers: mgrang, jgravelle-google, sbc100, dschuff, llvm-commits Differential Revision: https://reviews.llvm.org/D55467 Patch by Alon Zakai (kripken) llvm-svn: 350367	2019-01-03 23:10:11 +00:00
Wouter van Oortmerssen	820c6263d9	[WebAssembly] Fixed disassembler not knowing about new brlist operand Summary: The previously introduced new operand type for br_table didn't have a disassembler implementation, causing an assert. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D56227 llvm-svn: 350366	2019-01-03 23:01:30 +00:00
Wouter van Oortmerssen	9843295608	[WebAssembly] Made InstPrinter more robust Summary: Instead of asserting on certain kinds of malformed instructions, it now still prints, but adds an annotation indicating the problem, and/or indicates invalid_type etc. We're using the InstPrinter from many contexts that can't always guarantee values are within range (e.g. the disassembler), where having output is more valuable than asserting. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D56223 llvm-svn: 350365	2019-01-03 22:59:59 +00:00
Nirav Dave	8de916d1a4	[X86] Remove terrible DX Register parsing hack in parse operand. NFCI. Fold hack special casing of (%dx) operand parsing into the related hack for out/in instruction parsing. llvm-svn: 350355	2019-01-03 21:46:30 +00:00
Sanjay Patel	9633d76a40	[DAGCombiner][x86] scalarize binop followed by extractelement As noted in PR39973 and D55558: https://bugs.llvm.org/show_bug.cgi?id=39973 ...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine: // extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index) We want to have this in the DAG too because as we can see in some of the test diffs (reductions), the pattern may not be visible in IR. Given that this is already an IR canonicalization, any backend that would prefer a vector op over a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's a realistic expectation though). The transform is limited with a TLI hook because there's an existing transform in CodeGenPrepare that tries to do the opposite transform. Differential Revision: https://reviews.llvm.org/D55722 llvm-svn: 350354	2019-01-03 21:31:16 +00:00
Alexander Timofeev	993e2798fd	[AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression. Detailed description: SIFoldOperands::foldInstOperand iterates over the operand uses, calling a function that changes the def-use iterators along the way. As a result, the loop exits immediately when the def-use iterator is changed. Hence, the operand is folded into only the very first use instruction. This keeps the VGPR live across the whole basic block and increases register pressure significantly. The performance drop observed in the SHOC DeviceMemory test is caused by this bug. Proposed fix: collect the uses into a separate container for further processing in another loop. Testing: make check-llvm SHOC performance test. Reviewers: rampitec, ronlieb Differential Revision: https://reviews.llvm.org/D56161 llvm-svn: 350350	2019-01-03 19:55:32 +00:00
Evandro Menezes	0f67746c92	[AArch64] Add new scheduling predicates Add new scheduling predicates to identify the ASIMD loads and stores using the post indexed addressing mode. llvm-svn: 350332	2019-01-03 17:28:09 +00:00
Alex Bradbury	2ba76be882	[RISCV][MC] Accept %lo and %pcrel_lo on operands to li This matches GNU assembler behaviour. llvm-svn: 350321	2019-01-03 14:41:41 +00:00
Diogo N. Sampaio	8786a946d8	[ARM] Add command-line option for SB SB (Speculative Barrier) is only mandatory from 8.5 onwards but is optional from Armv8.0-A. This patch adds a command line option to enable SB, as it was previously only possible to enable by selecting -march=armv8.5-a. This patch also renames FeatureSpecRestrict to FeatureSB. Reviewed By: olista01, LukeCheeseman Differential Revision: https://reviews.llvm.org/D55990 llvm-svn: 350299	2019-01-03 12:09:12 +00:00
Simon Pilgrim	d824f99a6c	[X86] Add ADD/SUB SSAT/USAT vector costs (PR40123) Costs for real SSE2 instructions llvm-svn: 350295	2019-01-03 11:38:42 +00:00
Piotr Sobczak	3abef8f9ea	[AMDGPU] Change section name with metadata access Summary: The commit rL348922 introduced a means to set Metadata section kind for a global variable, if its explicit section name was prefixed with ".AMDGPU.metadata.". This patch changes that prefix to ".AMDGPU.comment.", as "metadata" in the section name might lead to ambiguity with metadata used by AMD PAL runtime. Change-Id: Idd4748800d6fe801441d91595fc21e5a4171e668 Reviewers: kzhuravl Reviewed By: kzhuravl Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D56197 llvm-svn: 350292	2019-01-03 11:22:58 +00:00
QingShan Zhang	f24ec7bdd0	[Power9] Enable the Out-of-Order scheduling model for P9 hw When switched to the MI scheduler for P9, the hardware is modeled as out of order. However, inside the MI Scheduler algorithm, we still use the in-order scheduling model as the MicroOpBufferSize isn't set. The MI scheduler takes this to mean that the hw cannot buffer ops, so a pending instruction can only be scheduled once all the available instructions have issued. That is not true for our P9 hw in fact. This patch is trying to enable the Out-of-Order scheduling model. The buffer size 44 is picked from the P9 hw spec, and the perf tests indicate that this value won't hurt cpu2017. With this patch, 3 specs improve by over 3% and 1 spec degrades by over 3%. The detail is as follows: x264_r: +6.95% cactuBSSN_r: +6.94% lbm_r: +4.11% xz_r: -3.85% And the GEOMEAN for all the C/C++ spec in spec2017 is about 0.18% improved. Reviewer: Nemanjai Differential Revision: https://reviews.llvm.org/D55810 llvm-svn: 350285	2019-01-03 05:04:18 +00:00
Robert Widmann	7882b283cd	[LLVM-C] Expand LLVMRelocMode Summary: Add read[only\|write] PIC relocation models to the C API and teach the TargetMachine API about it. Reviewers: whitequark, deadalnix Reviewed By: whitequark Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56187 llvm-svn: 350279	2019-01-03 00:33:44 +00:00
Craig Topper	df5304d8de	[X86] Add load folding support to the custom isel we do for X86ISD::UMUL/SMUL. The peephole pass isn't always able to fold the load because it can't commute the implicit usage of AL/AX/EAX/RAX. llvm-svn: 350272	2019-01-02 23:24:08 +00:00
Wouter van Oortmerssen	ad72f68501	[WebAssembly] made assembler parse block_type Summary: This was previously ignored and an incorrect value generated. Also fixed Disassembler's handling of block_type. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D56092 llvm-svn: 350270	2019-01-02 23:23:51 +00:00
Craig Topper	9d4860ec4e	[X86] Remove X86ISD::INC/DEC. Just select them from X86ISD::ADD/SUB at isel time INC/DEC are pretty much the same as ADD/SUB except that they don't update the C flag. This patch removes the special nodes and just pattern matches from ADD/SUB during isel if the C flag isn't being used. I had to avoid selecting DEC if the result isn't used. This will become a SUB immediate which will be turned into a CMP later by optimizeCompareInstr. This led to the one test change where we use a CMP instead of a DEC for an overflow intrinsic since we only checked the flag. This also exposed a hole in our RMW flag matching use of hasNoCarryFlagUses. Our root node for the match is a store and there's no guarantee that all the flag users have been selected yet. So hasNoCarryFlagUses needs to check copyToReg and machine opcodes, but it also needs to check for the pre-match SETCC, SETCC_CARRY, BRCOND, and CMOV opcodes. Differential Revision: https://reviews.llvm.org/D55975 llvm-svn: 350245	2019-01-02 19:01:05 +00:00
Wei Mi	ecc89b76cb	[PowerPC] Remove SeenUse check when optimizing conditional branch in PPCPreEmitPeephole pass. PPCPreEmitPeephole will convert a BC to B when the conditional branch is based on a constant CR by CRSET or CRUNSET. This is added in https://reviews.llvm.org/rL343100. When the conditional branch is known to be always taken, all branches will be removed and a new unconditional branch will be inserted. However, when SeenUse is false the original patch will not remove the branches, but still insert the new unconditional branch, update the successors and create inconsistent IR. Compiling the synthetic testcase included can show the problem we run into. The patch simply removes the SeenUse condition when adding branches into InstrsToErase set. Differential Revision: https://reviews.llvm.org/D56041 llvm-svn: 350223	2019-01-02 17:07:23 +00:00
Simon Pilgrim	d8125726d5	[X86] Support SHLD/SHRD masked shift-counts (PR34641) Peek through shift modulo masks while matching double shift patterns. I was hoping to delay this until I could remove the X86 code with generic funnel shift matching (PR40081) but this will do for now. Differential Revision: https://reviews.llvm.org/D56199 llvm-svn: 350222	2019-01-02 17:05:37 +00:00
Piotr Sobczak	378131bae0	[AMDGPU] Handle OR as operand of raw load/store Summary: Use isBaseWithConstantOffset() which handles OR as an operand to llvm.amdgcn.raw.buffer.load and llvm.amdgcn.raw.buffer.store. Change-Id: Ifefb9dc5ded8710d333df07ab1900b230e33539a Reviewers: nhaehnle, mareko, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55999 llvm-svn: 350208	2019-01-02 09:47:41 +00:00
Craig Topper	f7cc7e3201	[X86] Remove the separate SMUL8/UMUL8 X86ISD opcodes by merging with SMUL/UMUL. Remove the second result from X86ISD::UMUL. All of these use custom isel so we can pretty easily detect the differences in the custom code in X86ISelDAGToDAG. The ISD opcodes just need to express the desired semantics not the details of how they would be selected by isel. So unifying them lets us remove the special casing from lowering. llvm-svn: 350206	2019-01-02 06:40:11 +00:00
Craig Topper	d4db122483	[X86] Allow LowerSELECT and LowerBRCOND to directly lower i8 UMULO/SMULO. These require a different X86ISD node to be created than i16/i32/i64. I guess no one wanted to add the special code for that except in LowerXALUO. But now LowerXALUO, LowerSELECT, and LowerBRCOND all use a common helper function so they all share the special code. Unfortunately, there are no test changes because we seem to correct the miss in a DAG combine later. I did verify it manually using test cases from xmulo.ll llvm-svn: 350205	2019-01-02 05:46:03 +00:00
Craig Topper	00b390a000	[X86] Factor the core code out of LowerXALUO into a helper function. Use it in LowerBRCOND and LowerSELECT to avoid some duplicated code. This makes it easier to keep the LowerBRCOND and LowerSELECT code in sync with LowerXALUO so they always pick the same operation for overflowing instructions. This is inspired by the helper functions used by ARM and AArch64 for the same purpose. The test change is because LowerSELECT was not in sync with LowerXALUO with regard to INC/DEC for SADDO/SSUBO. llvm-svn: 350198	2019-01-01 19:34:11 +00:00
Sanjay Patel	738a863648	[x86] move/rename helper for horizontal op codegen; NFC Preliminary commit as suggested in D56011. llvm-svn: 350193	2019-01-01 16:08:36 +00:00
Craig Topper	bb0873cf46	[X86] Add X86ISD::VSRAI to computeKnownBitsForTargetNode. Differential Revision: https://reviews.llvm.org/D56169 llvm-svn: 350178	2018-12-31 19:09:27 +00:00
Simon Pilgrim	f2b9d10477	Keep tablegen commands in alphabetical order. NFCI. Mentioned on D56167. llvm-svn: 350176	2018-12-31 14:51:53 +00:00
Martin Storsjo	74d93f9b24	[AArch64] Accept "sve" as arch feature in assembler Differential Revision: https://reviews.llvm.org/D56128 llvm-svn: 350174	2018-12-31 10:22:04 +00:00
Martin Storsjo	2018777836	[AArch64] Implement the .arch_extension directive Differential Revision: https://reviews.llvm.org/D56131 llvm-svn: 350169	2018-12-30 21:06:32 +00:00
Kang Zhang	9d78c60bf4	[PowerPC] Fix machine verify pass error for PATCHPOINT pseudo instruction that bad machine code Summary: For SDAG, we pretend patchpoints aren't special at all until we emit the code for the pseudo. Then the verifier runs and it seems like we have a use of an undefined register (the register will be reserved later, but the verifier doesn't know that). So this patch calls setUsesTOCBasePtr before emitting the code for the pseudo, so the verifier knows X2 is a reserved register. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D56148 llvm-svn: 350165	2018-12-30 15:13:51 +00:00
Craig Topper	a32e353afa	[X86] Don't mark SEXTLOAD from v4i8/v4i16/v8i8 as Custom on pre-sse4.1. This seems to be getting in the way more than its helping. This does mean we stop scalarizing some cases, but I'm not convinced the scalarization was really better. Some of the changes to vsel-cmp-load.ll are a regression but D56156 should fix it. llvm-svn: 350159	2018-12-30 03:05:07 +00:00
Craig Topper	f237ce159e	[X86] Add custom type legalization for SIGN_EXTEND_VECTOR_INREG from 16i16/v32i8 to v4i64 when v4i64 needs splitting. This allows us to sign extend to v4i32 first. And then share that extension to implement the final steps to v4i64 using a pcmpgt and punpckl and punpckh. We already do something similar for SIGN_EXTEND with -x86-experimental-vector-widening-legalization. llvm-svn: 350158	2018-12-30 02:30:34 +00:00
Nemanja Ivanovic	0dad994a10	[PowerPC][NFC] Macro for register set defs for the Asm Parser We have some unfortunate code in the back end that defines a bunch of register sets for the Asm Parser. Every time another class is needed in the parser, we have to add another one of those definitions with explicit lists of registers. This NFC patch simply provides macros to use to condense that code a little bit. Differential revision: https://reviews.llvm.org/D54433 llvm-svn: 350156	2018-12-29 16:13:11 +00:00
Nemanja Ivanovic	0f7715afe1	[PowerPC] Complete the custom legalization of vector int to fp conversion A recent patch has added custom legalization of vector conversions of v2i16 -> v2f64. This just rounds it out for other types where the input vector has an illegal (narrower) type than the result vector. Specifically, this will handle the following conversions: v2i8 -> v2f64 v4i8 -> v4f32 v4i16 -> v4f32 Differential revision: https://reviews.llvm.org/D54663 llvm-svn: 350155	2018-12-29 13:40:48 +00:00
Nemanja Ivanovic	3c7ac649ec	[PowerPC] Fix CR Bit spill pseudo expansion The current CRBIT spill pseudo-op expansion creates a KILL instruction that kills the CRBIT and defines the enclosing CR field. However, this paints a false picture to the register allocator that all bits in the CR field are killed so copies of other bits out of the field become dead and removable. This changes the expansion to preserve the KILL flag on the CRBIT as an implicit use and to treat the CR field as an undef input. Thanks to Hal Finkel for the review and Uli Weigand for implementation input. Differential revision: https://reviews.llvm.org/D55996 llvm-svn: 350153	2018-12-29 11:43:54 +00:00
Simon Atanasyan	a6424e7c4e	[mips] Show an error on attempt to use 64-bit PC-relative relocation The following code requests 64-bit PC-relative relocations unsupported by MIPS ABI. Now it triggers an assertion. It's better to show an error message. ``` foo: .quad bar - foo ``` llvm-svn: 350152	2018-12-29 10:10:02 +00:00
Simon Atanasyan	b243d8d42a	[mips] Show a regular error message on attempt to use one byte relocation llvm-svn: 350151	2018-12-29 10:09:55 +00:00
Heejin Ahn	4d98dfb67d	[WebAssembly] Fix comments in ExplicitLocals (NFC) llvm-svn: 350144	2018-12-29 02:42:04 +00:00
Craig Topper	0a6cec6f9f	[X86] Don't mark SEXTLOAD v4i8->v4i64 and v8i8->v8i64 as custom under vector widening legalization. This was tricking us into making these operations and then letting them get scalarized later. But I can't prove that the scalarized version is actually better. llvm-svn: 350141	2018-12-29 01:17:11 +00:00
Craig Topper	f814d28eb3	[X86] Directly emit X86ISD::PMULUDQ from the ReplaceNodeResults handling of v2i8/v2i16/v2i32 multiply. Previously we emitted a multiply and some masking that was supposed to be matched to PMULUDQ, but the masking could sometimes be removed before we got a chance to match it. So instead just emit the PMULUDQ directly. Remove the DAG combine that was added when the ReplaceNodeResults code was originally added. Add a new DAG combine to avoid regressions in shrink_vmul.ll Some of the shrink_vmul.ll test cases now pick PMULUDQ instead of PMADDWD/PMULLD, but I think this should be an improvement on most CPUs. I think all of this can go away if/when we switch to -x86-experimental-vector-widening-legalization llvm-svn: 350134	2018-12-28 19:19:39 +00:00
Diogo N. Sampaio	9123f82cc4	[AArch64] Add command-line option for SB SB (Speculative Barrier) is only mandatory from 8.5 onwards but is optional from Armv8.0-A. This patch adds a command line option to enable SB, as it was previously only possible to enable by selecting -march=armv8.5-a. This patch also moves to FeatureSB the old FeatureSpecRestrict. Reviewers: pbarrio, olista01, t.p.northover, LukeCheeseman Differential Revision: https://reviews.llvm.org/D55921 llvm-svn: 350126	2018-12-28 17:14:58 +00:00
Hiroshi Inoue	1ea98f040e	[PowerPC] handle ISD::TRUNCATE in BitPermutationSelector This is the last one in a series of patches to support better code generation for bitfield insert. BitPermutationSelector already supports ISD::ZERO_EXTEND but not TRUNCATE. This patch adds support for ISD::TRUNCATE in BitPermutationSelector. For example, for this test case, struct s64b { int a:4; int b:16; int c:24; }; void bitfieldinsert64b(struct s64b *p, unsigned char v) { p->b = v; } the selection DAG looks like: t14: i32,ch = load<(load 4 from %ir.0)> t0, t2, undef:i64 t18: i32 = and t14, Constant:i32<-1048561> t4: i64,ch = CopyFromReg t0, Register:i64 %1 t22: i64 = AssertZext t4, ValueType:ch:i8 t23: i32 = truncate t22 t16: i32 = shl nuw nsw t23, Constant:i32<4> t19: i32 = or t18, t16 t20: ch = store<(store 4 into %ir.0)> t14:1, t19, t2, undef:i64 By handling truncate in the BitPermutationSelector, we can use information from AssertZext when selecting t19 and skip the mask operation corresponding to t18. So the generated sequences with and without this patch are without this patch rlwinm 5, 5, 0, 28, 11 # corresponding to t18 rlwimi 5, 4, 4, 20, 27 with this patch rlwimi 5, 4, 4, 12, 27 Differential Revision: https://reviews.llvm.org/D49076 llvm-svn: 350118	2018-12-28 08:00:39 +00:00
QingShan Zhang	f2d9df61c7	[PowerPC] Remove the implicit use of the register if it is replaced by Imm If we are changing the MI operand from Reg to Imm, we also need to handle its implicit use, if any. Differential Revision: https://reviews.llvm.org/D56078 llvm-svn: 350115	2018-12-28 03:38:09 +00:00
Zi Xuan Wu	5187444345	[NFC] clang-format functions related to r350113 llvm-svn: 350114	2018-12-28 02:45:17 +00:00
Zi Xuan Wu	a02a3feecf	[PowerPC] Fix assert from machine verify pass that atomic pseudo expanding causes mismatched register class Atomic value operands that are smaller than 4 bytes need to be masked, and the related operation to calculate the new value can be done in a 32-bit gprc. So just use gprc for mask and value calculation. Differential Revision: https://reviews.llvm.org/D56077 llvm-svn: 350113	2018-12-28 02:12:55 +00:00
Chen Zheng	5ede950df9	[PowerPC] fix register class after converting X-FORM instruction to D-FORM instruction Differential Revision: https://reviews.llvm.org/D55806 llvm-svn: 350111	2018-12-28 01:02:35 +00:00
Craig Topper	787ad92bf6	[X86] Remove check that avoids creating PMULDQ with illegal types. Rely on SplitOpsAndApply to legalize it. Create PMULDQ/PMULUDQ as long as the number of elements is a power of 2. This seems to give some improvements in our ability to use SimplifyDemandedBits. llvm-svn: 350084	2018-12-27 03:37:04 +00:00
Craig Topper	a8f07e51f9	[X86] Factor the core code out of LowerSETCC into a helper that can create CMP/BT/PTEST/KORTEST etc. without making an X86ISD::SETCC node. NFCI Make each of the helper functions only return their comparison node and the condition code. Leave X86ISD::SETCC creation to the LowerSETCC function itself. Looking into whether we can use this code directly in BRCOND and SELECT lowering instead of going through LowerSETCC which creates an X86ISD::SETCC node we need to look through. llvm-svn: 350082	2018-12-27 01:50:40 +00:00
Craig Topper	4f1ef9fc0f	[X86] Merge getBitTestCondition into LowerAndToBT. Don't create X86ISD::SETCC node in the merged function. NFCI Only one of the 3 callers of LowerAndToBT need the SETCC node. Two of them have to look through it to find the operands they really need. Instead create it after the one call that needs it. LowerAndToBT now returns both the BT node and the X86 specific condition code separately. llvm-svn: 350081	2018-12-27 01:50:38 +00:00
Wouter van Oortmerssen	f227621036	[WebAssembly] Added basic support for if/else/end_if in MC layer. Summary: These instructions are currently unused in our backend, but for completeness it is good to support them, so they can be used with the assembler in hand-written code. Tests are very basic, signature support missing much like other blocks. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55973 llvm-svn: 350079	2018-12-26 22:55:26 +00:00
Wouter van Oortmerssen	29c6ce5879	[WebAssembly] Make assembler check for proper nesting of control flow. Summary: It does so using a simple nesting stack, and gives clear errors upon violation. This is unique to wasm, since most CPUs do not have any nested constructs. Had to add an end of file check to the general assembler for this. Note: if/else/end instructions are not currently supported in our tablegen defs, so these tests will be enabled in a follow-up. They already pass the nesting check. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55797 llvm-svn: 350078	2018-12-26 22:46:18 +00:00
Heejin Ahn	ce1d50f9d7	[WebAssembly] Delete an unnecessary line in RegStackify `OneUseInst` is set outside of the loop before and `OneUse` does not change throughout the loop, so this line is not necessary. llvm-svn: 350076	2018-12-26 22:33:35 +00:00
Heejin Ahn	99d3946398	[WebAssembly] Fix typos in comments in RegStackify (NFC) llvm-svn: 350075	2018-12-26 22:27:46 +00:00
Justin Lebar	49fac56ea3	[NVPTX] Allow libcalls that are defined in the current module. The patch adds a possibility to make library calls on NVPTX. An important thing about library functions - they must be defined within the current module. This basically should guarantee that we produce a valid PTX assembly (without calls to undefined functions). Anyone who wants to use the libcalls will probably have to link against compiler-rt or any other implementation. Currently, it's completely impossible to make library calls because of error LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we can lower ExternalSymbol to TargetExternalSymbol and verify if the function definition is available. Also, there was an issue with a DAG during legalisation. When we expand an instruction into a libcall, the inner call-chain isn't being "integrated" into the outer chain. Since the last "data-flow" (call retval load) node is located in the call-chain earlier than the CALLSEQ_END node, the latter becomes a leaf and therefore a dead node (and is being removed quite fast). The solution proposed here relies on additional data-flow pseudo nodes (ProxyReg) whose only purpose is to keep CALLSEQ_END through the legalisation and instruction selection phases - we remove the pseudo instructions before the register scheduling phase. Patch by Denys Zariaiev! Differential Revision: https://reviews.llvm.org/D34708 llvm-svn: 350069	2018-12-26 19:12:31 +00:00
Petar Avramovic	09dff33349	[MIPS GlobalISel] Select G_SELECT Add widen scalar for type index 1 (i1 condition) for G_SELECT. Select G_SELECT for pointer, s32(integer) and smaller low level types on MIPS32. Differential Revision: https://reviews.llvm.org/D56001 llvm-svn: 350063	2018-12-25 14:42:30 +00:00
Kang Zhang	d501a1e596	[PowerPC] Fix the bug of ISD::ADDE to set its second return type to glue Summary: This patch fixes the bug introduced by rL341634. In that commit, the return type of ISD::ADDE is 14224: SDVTList VTs = DAG.getVTList(MVT::i64, MVT::i64), but in fact the second return type of ISD::ADDE should be MVT::Glue, not MVT::i64. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D55977 llvm-svn: 350061	2018-12-25 03:29:51 +00:00
Craig Topper	0229da8f07	[X86] Use GetDemandedBits to simplify the operands of PMULDQ/PMULUDQ. This is an alternative to what I attempted in D56057. GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users. I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits. Based on a patch that Simon Pilgrim sent me in email. Fixes PR40142. llvm-svn: 350059	2018-12-24 19:40:20 +00:00
George Burgess IV	7e12875c89	[LoopIdioms] More LocationSize::precise annotations; NFC Both of these places reference memset-like loops. Memset is precise. Trying to keep these patches super small so they're easily post-commit verifiable, as requested in D44748. llvm-svn: 350044	2018-12-24 05:55:50 +00:00
Craig Topper	0adc3fe9e7	[X86] Remove unused variables left after r350041. NFC llvm-svn: 350043	2018-12-24 05:45:45 +00:00
Craig Topper	d8217b23ff	[X86] Move the optimization that turns 'CMP (AND+IMM64), 0' into SRL/SHL+TEST to X86ISelDAGToDAG. This cleans more code out of EmitTest. llvm-svn: 350041	2018-12-24 05:27:13 +00:00
Craig Topper	e8c50fc6af	[X86] Remove the ANDN check from EmitTest. Remove the TESTmr isel patterns and add another postprocessing combine for TESTrr+ANDrm->TESTmr. We already have a postprocessing combine for TESTrr+ANDrr->TESTrr. With this we can give ANDN a chance to match first. And clean it up during post processing if we ended up with just a regular AND. This is another step towards my plan to gut EmitTest and do more flag handling during isel matching or by using optimizeCompare. llvm-svn: 350038	2018-12-24 01:10:13 +00:00
Craig Topper	006bac6880	[X86] Return false from hasAndNotCompare if the comparison value is a constant. We won't end up using an ANDN instruction in this case so we should generate the same code we do for pre-BMI targets. llvm-svn: 350018	2018-12-23 05:52:55 +00:00
Craig Topper	3cc92a28ce	[X86] Fix an old FIXME about folding the zero constant into the OR instruction we use for sequentially consistent fence in 32-bit mode without SSE2. llvm-svn: 350013	2018-12-23 01:54:43 +00:00
Sanjay Patel	52c02d70e2	[x86] add load fold patterns for movddup with vzext_load The missed load folding noticed in D55898 is visible independent of that change either with an adjusted IR pattern to start or with AVX2/AVX512 (where the build vector becomes a broadcast first; movddup is not produced until we get into isel via tablegen patterns). Differential Revision: https://reviews.llvm.org/D55936 llvm-svn: 350005	2018-12-22 16:59:02 +00:00
Craig Topper	1f02ac3451	[X86] FixupLEAs, reduce number of calls to getOperand and use X86::AddrBaseReg/AddrIndexReg, etc. instead of hardcoded constants. Makes the code a little more readable. llvm-svn: 349983	2018-12-22 01:34:47 +00:00
Justin Lebar	7f41fe3a58	[NVPTX] Reduce stack size in NVPTXAsmPrinter::doInitialization(). NVPTXAsmPrinter::doInitialization() was creating an NVPTXSubtarget on the stack. This object is huge, about 80kb. Also it's slow to create. And it's all redundant; we have one in NVPTXTargetMachine anyway! llvm-svn: 349982	2018-12-22 01:30:37 +00:00
Mircea Trofin	499a66ecc0	Silence warning in assert introduced in rL349973. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56030 llvm-svn: 349975	2018-12-21 23:02:10 +00:00
Mircea Trofin	b53eeb6f4c	[llvm] API for encoding/decoding DWARF discriminators. Summary: Added a pair of APIs for encoding/decoding the 3 components of a DWARF discriminator described in http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html: the base discriminator, the duplication factor (useful in profile-guided optimization) and the copy index (used to identify copies of code in cases like loop unrolling) The encoding packs 3 unsigned values in 32 bits. This CL addresses 2 issues: - communicates overflow back to the user - supports encoding all 3 components together. Current APIs assume a sequencing of events. For example, creating a new discriminator based on an existing one by changing the base discriminator was not supported. Reviewers: davidxl, danielcdh, wmi, dblaikie Reviewed By: dblaikie Subscribers: zzheng, dmgreen, aprantl, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D55681 llvm-svn: 349973	2018-12-21 22:48:50 +00:00
Craig Topper	e58cd9cbc6	[X86] Add isel patterns to match BMI/TBMI instructions when lowering has turned the root nodes into one of the flag producing binops. This fixes the patterns that have or/xor as a root. 'and' is handled differently since they usually have a CMP wrapped around them. I had to look for uses of the CF flag because all these nodes have non-standard CF flag behavior. A real or/xor would always clear CF. In practice we shouldn't be using the CF flag from these nodes as far as I know. Differential Revision: https://reviews.llvm.org/D55813 llvm-svn: 349962	2018-12-21 21:42:43 +00:00
Craig Topper	62ec024d3b	[X86] Don't allow optimizeCompareInstr to replace a CMP with BEXTR if the sign flag is used. The BEXTR instruction documents the SF bit as undefined. The TBM BEXTR instruction has the same issue, but I'm not sure how to test it. With the control being an immediate we can determine the sign bit is 0 or the BEXTR would have been removed. Fixes PR40060 Differential Revision: https://reviews.llvm.org/D55807 llvm-svn: 349956	2018-12-21 21:16:26 +00:00
Changpeng Fang	6f539294b5	AMDGPU: Don't peel off the offset if the resulting base could possibly be negative in Indirect addressing. Summary: Don't peel off the offset if the resulting base could possibly be negative in Indirect addressing. This is because the M0 field is unsigned. This patch achieves a similar goal to https://reviews.llvm.org/D55241, but keeps the optimization if the base is known unsigned. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D55568 llvm-svn: 349951	2018-12-21 20:57:34 +00:00
Sanjay Patel	80187b8a17	[x86] add movddup specialization for build vector lowering (PR37502) This is admittedly a narrow fix for the problem: https://bugs.llvm.org/show_bug.cgi?id=37502 ...but as the XOP restriction shows, it's a maze to get this right. In the motivating example, note that we have movddup before SSE4.1 and again with AVX2. That's because insertps isn't available pre-SSE41 and vbroadcast is (more generally) available with AVX2 (and the splat is reduced to movddup via isel pattern). Differential Revision: https://reviews.llvm.org/D55898 llvm-svn: 349937	2018-12-21 18:48:32 +00:00
Florian Hahn	8c9f865e3d	[ARM] Set Defs = [CPSR] for COPY_STRUCT_BYVAL, as it clobbers CPSR. Fixes PR35023. Reviewers: MatzeB, t.p.northover, sunfish, qcolombet, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D55909 llvm-svn: 349935	2018-12-21 18:07:10 +00:00
Jessica Paquette	453ab1db5b	[GlobalISel][AArch64] Add support for widening G_FCEIL This adds support for widening G_FCEIL in LegalizerHelper and AArch64LegalizerInfo. More specifically, it teaches the AArch64 legalizer to widen G_FCEIL from a 16-bit float to a 32-bit float when the subtarget doesn't support full FP 16. This also updates AArch64/f16-instructions.ll to show that we perform the correct transformation. llvm-svn: 349927	2018-12-21 17:05:26 +00:00
Evandro Menezes	96c11eceb2	[AArch64] Refactor Exynos predicate (NFC) Change order of conditions in predicate. llvm-svn: 349918	2018-12-21 15:51:34 +00:00
Simon Pilgrim	aa53fc15b7	[XCore] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version. llvm-svn: 349915	2018-12-21 15:35:32 +00:00
Simon Pilgrim	7787a02b23	[Sparc] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version. llvm-svn: 349914	2018-12-21 15:32:36 +00:00
Simon Pilgrim	3c157d3fa3	[AMDGPU] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version. llvm-svn: 349912	2018-12-21 15:29:47 +00:00
Simon Pilgrim	ca8bca2ad3	[WebAssembly] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version. llvm-svn: 349911	2018-12-21 15:25:37 +00:00
Simon Pilgrim	d800ee4861	[ARM] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version. llvm-svn: 349909	2018-12-21 15:15:38 +00:00
Simon Pilgrim	148957f336	[AArch64] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version. llvm-svn: 349908	2018-12-21 15:05:10 +00:00
Simon Pilgrim	2482c51e99	[SystemZ] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version. llvm-svn: 349906	2018-12-21 14:50:54 +00:00
Simon Pilgrim	d43bdc715c	[Lanai] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version. llvm-svn: 349905	2018-12-21 14:48:35 +00:00
Simon Pilgrim	af1ab22a76	[PPC] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version. llvm-svn: 349903	2018-12-21 14:32:39 +00:00
Simon Pilgrim	57733507fe	[X86] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the old KnownBits output parameter version. llvm-svn: 349902	2018-12-21 14:25:14 +00:00
Luke Cheeseman	41a9e53500	[Dwarf/AArch64] Return address signing B key dwarf support - When signing return addresses with -msign-return-address=<scope>{+<key>}, either the A key instructions or the B key instructions can be used. To correctly authenticate the return address, the unwinder/debugger must know which key was used to sign the return address. - When an exception is thrown or a break point reached, it may be necessary to unwind the stack. To accomplish this, the unwinder/debugger must be able to first authenticate the return address if it has been signed. - To enable this, the augmentation string of CIEs has been extended to allow inclusion of a 'B' character. Functions that are signed using the B key variant of the instructions should have an FDE whose associated CIE has a 'B' in the augmentation string. - One must also be able to preserve these semantics when first stepping from a high level language into assembly and then, as a second step, into an object file. To achieve this, I have introduced a new assembly directive '.cfi_b_key_frame', that tells the assembler the current frame uses return address signing with the B key. - This ensures that the FDE is associated with a CIE that has 'B' in the augmentation string. Differential Revision: https://reviews.llvm.org/D51798 llvm-svn: 349895	2018-12-21 10:45:08 +00:00
Simon Pilgrim	5d403f6bf8	[X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm) This auto upgrades the signed SSE saturated math intrinsics to SADD_SAT/SSUB_SAT generic intrinsics. Clang counterpart: https://reviews.llvm.org/D55890 Differential Revision: https://reviews.llvm.org/D55894 llvm-svn: 349892	2018-12-21 09:04:14 +00:00
Thomas Lively	b6dac89c87	[WebAssembly] Fix invalid machine instrs in -O0, verify in tests Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55956 llvm-svn: 349889	2018-12-21 06:58:15 +00:00
Matt Arsenault	3eae3c4590	AMDGPU/GlobalISel: RegBankSelect for amdgcn.wqm.vote llvm-svn: 349882	2018-12-21 03:20:54 +00:00
Matt Arsenault	f4c21c575a	AMDGPU/GlobalISel: RegBankSelect for some fp ops llvm-svn: 349880	2018-12-21 03:14:45 +00:00
Matt Arsenault	bee2ad7185	AMDGPU/GlobalISel: Redo legality for build_vector It seems better to avoid using the callback if possible since there are coverage assertions which are disabled if this is used. Also fix missing tests. Only test the legal cases since it seems legalization for build_vector is quite lacking. llvm-svn: 349878	2018-12-21 03:03:11 +00:00
Craig Topper	54f1a7be13	[X86] Refactor hasNoCarryFlagUses and hasNoSignFlagUses in X86ISelDAGToDAG.cpp to translate opcode to condition code using the helpers in X86InstrInfo.cpp. This shortens the switches in X86ISelDAGToDAG.cpp to only need to check condition code instead of a list of opcodes. This also fixes a bug where the memory forms of SETcc were missing from hasNoCarryFlagUses. llvm-svn: 349868	2018-12-21 01:14:25 +00:00
Craig Topper	e0cff10289	[X86] Add memory forms of some SETCC instructions to hasNoCarryFlagUses. Found while working on another patch llvm-svn: 349867	2018-12-21 01:14:23 +00:00
Eli Friedman	b1bbd5dca3	[ARM] Complete the Thumb1 shift+and->shift+shift transforms. This saves materializing the immediate. The additional forms are less common (they don't usually show up for bitfield insert/extract), but they're still relevant. I had to add a new target hook to prevent DAGCombine from reversing the transform. That isn't the only possible way to solve the conflict, but it seems straightforward enough. Differential Revision: https://reviews.llvm.org/D55630 llvm-svn: 349857	2018-12-20 23:39:54 +00:00
Jessica Paquette	a6b9c68a85	[GlobalISel][AArch64] Add G_FCEIL to isPreISelGenericFloatingPointOpcode If you don't do this, then if you hit a G_LOAD in getInstrMapping, you'll end up with GPRs on the G_FCEIL instead of FPRs. This causes a fallback. Add it to the switch, and add a test verifying that this happens. llvm-svn: 349822	2018-12-20 21:14:15 +00:00
Eli Friedman	48397102d0	[MC] [AArch64] Correctly resolve ":abs_g1:3" etc. We have to treat constructs like this as if they were "symbolic", to use the correct codepath to resolve them. This mostly only affects movz etc. because the other uses of classifySymbolRef conservatively treat everything that isn't a constant as if it were a symbol. Differential Revision: https://reviews.llvm.org/D55906 llvm-svn: 349800	2018-12-20 19:46:14 +00:00
Eli Friedman	4648209e16	[MC] [AArch64] Support resolving fixups for abs_g0 etc. This requires a bit more code than other fixups, to distinguish between abs_g0/abs_g1/etc. Actually, I think some of the other fixups are missing some checks, but I won't try to address that here. I haven't seen any real-world code that uses a construct like this, but it clearly should work, and we're considering using it in the implementation of localescape/localrecover on Windows (see https://reviews.llvm.org/D53540). I've verified that binutils produces the same code as llvm-mc for the testcase. This currently doesn't include support for the *_s variants (that requires a bit more work to set the opcode). Differential Revision: https://reviews.llvm.org/D55896 llvm-svn: 349799	2018-12-20 19:38:07 +00:00
Simon Pilgrim	2a25360ae3	[X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (llvm) This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics. Clang counterpart: https://reviews.llvm.org/D55937 Differential Revision: https://reviews.llvm.org/D55938 llvm-svn: 349795	2018-12-20 19:01:07 +00:00
Yonghong Song	821c93d556	[BPF] Disable relocation for .BTF.ext section Build llvm with assertions on, and then build bcc against this llvm. Run any bcc tool with debug=8 (turning on -g for clang compilation), and you will get the following assertion error: /home/yhs/work/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:888: void llvm::RuntimeDyldELF::resolveBPFRelocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion `Value <= (4294967295U)' failed. The .BTF.ext ELF section uses Fixups to get the instruction offsets. The data width of the Fixup is 4 bytes since we only need the insn offset within the section. This caused the above error, though, since R_BPF_64_32 expects a 4-byte value and the Runtime Dyld tried to resolve the actual insn address, which is 8 bytes. Actually, the offset within the section is all we need. Therefore, there is no need to perform any kind of relocation for the .BTF.ext section, and such relocation will actually cause an incorrect result. This patch changed BPFELFObjectWriter::getRelocType() such that for Fixup Kind FK_Data_4, if the relocation Target is a temporary symbol, the relocation is skipped (ELF::R_BPF_NONE). Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 349778	2018-12-20 17:40:23 +00:00
Krzysztof Parzyszek	30c42e2ab6	[Hexagon] Add patterns for funnel shifts llvm-svn: 349770	2018-12-20 16:39:20 +00:00
Alex Bradbury	eb3a64a4da	[RISCV] Properly evaluate fixup_riscv_pcrel_lo12 This is an update to D43157 to correctly handle fixup_riscv_pcrel_lo12. Notable changes: Rebased onto trunk Handle and test S-type Test case pcrel-hilo.s is merged into relocations.s D43157 description: VK_RISCV_PCREL_LO has to be handled specially. The MCExpr inside is actually the location of an auipc instruction with a VK_RISCV_PCREL_HI fixup pointing to the real target. Differential Revision: https://reviews.llvm.org/D54029 Patch by Chih-Mao Chen and Michael Spencer. llvm-svn: 349764	2018-12-20 14:52:15 +00:00
Simon Pilgrim	09c081176a	[X86][AVX512] Don't custom lower v16i8 rotations. As discussed on D55747, the expansion to (wider) shifts is better on all AVX512 cases, not just BWI. llvm-svn: 349763	2018-12-20 14:38:35 +00:00
Ulrich Weigand	380bece7af	[SystemZ] "Generic" vector assembler instructions should clobber CC There are several vector instructions which may or may not set the condition code register, depending on the value of an argument. For codegen, we use two versions of the instruction, one that sets CC and one that doesn't, which hard-code appropriate values of that argument. But we also have a "generic" version of the instruction that is used for the assembler/disassembler. These generic versions should always be considered to clobber CC just to be safe. llvm-svn: 349761	2018-12-20 14:24:17 +00:00
Ulrich Weigand	44d37ae38c	[SystemZ] Make better use of VLLEZ This patch fixes two deficiencies in current code that recognizes the VLLEZ idiom: - For the floating-point versions, we have ISel patterns that match on a bitconvert as the top node. In more complex cases, that bitconvert may already have been merged into something else. Fix the patterns to match the inner nodes instead. - For the 64-bit integer versions, depending on the surrounding code, we may get either a DAG tree based on JOIN_DWORDS or one based on INSERT_VECTOR_ELT. Use a PatFrags to simply match both variants. llvm-svn: 349749	2018-12-20 13:05:03 +00:00
Ulrich Weigand	8bb46b0f01	[SystemZ] Make better use of VGEF/VGEG Current code in SystemZDAGToDAGISel::tryGather refuses to perform any transformation if the Load SDNode has more than one use. This (erroneously) counts uses of the chain result, which prevents the optimization in many cases unnecessarily. Fixed by this patch. llvm-svn: 349748	2018-12-20 13:01:20 +00:00
Clement Courbet	36a3480385	Re-land r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads." Update PPC IR following the GEP->bitcast to bitcast->GEP->bitcast change. llvm-svn: 349747	2018-12-20 13:01:04 +00:00
Ulrich Weigand	f43b510015	[SystemZ] Make better use of VLDEB We already have special code (DAG combine support for FP_ROUND) to recognize cases where we can use a vector version of VLEDB to perform two floating-point truncates in parallel, but equivalent support for VLDEB (vector floating-point extends) has been missing so far. This patch adds corresponding DAG combine support for FP_EXTEND. llvm-svn: 349746	2018-12-20 12:59:05 +00:00
Clement Courbet	e22cf4d7cb	Revert r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads." Forgot to update PowerPC tests for the GEP->bitcast change. llvm-svn: 349733	2018-12-20 09:58:33 +00:00
Clement Courbet	1bb6e1b0f2	[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads. Summary: This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp in just two loads on X86. These were previously calling memcmp. Reviewers: spatel, gchatelet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55263 llvm-svn: 349731	2018-12-20 09:13:47 +00:00
Kang Zhang	ca8db48974	[PowerPC] Implement the isSelectSupported() target hook Summary: PowerPC has scalar selects (isel) and vector mask selects (xxsel). But PowerPC does not have vector CR selects, PowerPC does not support scalar condition selects on vectors. In addition to implementing this hook, isSelectSupported() should return false when the SelectSupportKind is ScalarCondVectorVal, so that predictable selects are converted into branch sequences. Reviewed By: steven.zhang, hfinkel Differential Revision: https://reviews.llvm.org/D55754 llvm-svn: 349727	2018-12-20 06:19:59 +00:00
Thomas Lively	feb18fe927	[WebAssembly] Emit a splat for v128 IMPLICIT_DEF Summary: This is a code size savings and is also important to get runnable code while engines do not support v128.const. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55910 llvm-svn: 349724	2018-12-20 04:20:32 +00:00
Amara Emerson	321bfb210a	Fix build errors introduced by r349712 on aarch64 bots. llvm-svn: 349723	2018-12-20 03:27:42 +00:00
Thomas Lively	8dbf29af95	[WebAssembly] Gate unimplemented SIMD ops on flag Summary: Gates v128.const, f32x4.sqrt, f32x4.div, i8x16.extract_lane_u, and i16x8.extract_lane_u on the --wasm-enable-unimplemented-simd flag, since these ops are not implemented yet in V8. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55904 llvm-svn: 349720	2018-12-20 02:10:22 +00:00
Matt Arsenault	4339883710	AMDGPU: Make i1/i64/v2i32 and/or/xor legal The 64-bit types do depend on the register bank, but that's another issue to deal with later. llvm-svn: 349716	2018-12-20 01:35:49 +00:00
Matt Arsenault	8cc98bee8a	AMDGPU/GlobalISel: Fix ValueMapping tables for i1 This was incorrectly selecting SGPR for any i1 values, e.g. G_TRUNC to i1 from a VGPR was still an SGPR. llvm-svn: 349715	2018-12-20 01:33:43 +00:00
Craig Topper	9ca2f5605e	[X86] Disable custom widening of signed/unsigned add/sub saturation intrinsics under -x86-experimental-vector-widening-legalization. Generic legalization should take care of this. llvm-svn: 349714	2018-12-20 01:32:06 +00:00
Amara Emerson	8cb186ce17	[AArch64][GlobalISel] Implement selection of G_MERGE of two s32s into s64. This code pattern is an unfortunate side effect of the way some types get split at call lowering. Ideally we'd either not generate it at all or combine it away in the legalizer artifact combiner. Until then, add selection support anyway which is a significant proportion of our current fallbacks on CTMark. rdar://46491420 llvm-svn: 349712	2018-12-20 01:11:04 +00:00
Matt Arsenault	dff33c38e1	AMDGPU/GlobalISel: RegBankSelect for fp conversions llvm-svn: 349709	2018-12-20 00:37:02 +00:00
Matt Arsenault	36d4092173	AMDGPU/GlobalISel: Legality/regbankselect for atomicrmw/atomic_cmpxchg llvm-svn: 349708	2018-12-20 00:33:49 +00:00
Craig Topper	217b3b20d8	[X86] Remove TLI variable from ReplaceNodeResults. NFC We're already in X86TargetLowering which is a derived class of TargetLowering. We can just call methods directly. llvm-svn: 349695	2018-12-19 23:13:03 +00:00
Rhys Perry	3931ad38b9	AMDGPU: Add patterns for v4i16/v4f16 -> v4i16/v4f16 bitcasts Reviewers: arsenm, tstellar Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55058 llvm-svn: 349694	2018-12-19 22:53:33 +00:00
Evandro Menezes	374ccf6768	[AArch64] Improve Exynos predicates Expand the predicate `ExynosResetPred` to include all forms of immediate moves. llvm-svn: 349686	2018-12-19 22:24:36 +00:00
Evandro Menezes	ff827d737a	[AArch64] Use canonical copy idiom Use only the canonical form of the alias for register transfers in the `IsCopyIdiomPred` predicate. llvm-svn: 349685	2018-12-19 22:24:31 +00:00
Craig Topper	d16da2b479	[X86] Remove a bunch of 'else' after returns in reduceVMULWidth. NFC This reduces indentation and makes it obvious this function always returns something. llvm-svn: 349671	2018-12-19 19:39:34 +00:00
Jessica Paquette	3560e93dc1	[GlobalISel][AArch64] Add support for @llvm.ceil This adds a G_FCEIL generic instruction and uses it in AArch64. This adds selection for floating point ceil where it has a supported, dedicated instruction. Other cases aren't handled here. It updates the relevant gisel tests and adds a select-ceil test. It also adds a check to arm64-vcvt.ll which ensures that we don't fall back when we run into one of the relevant cases. llvm-svn: 349664	2018-12-19 19:01:36 +00:00
Craig Topper	84a00bd98a	[X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing The (cmp (and X, Y) 0) pattern is greedy and ends up forming a TESTrr and consuming the and when it might be better to use one of the BMI/TBM instructions like BLSR or BLSI. This patch removes the pattern from isel and adds a post processing check to combine TESTrr+ANDrr into just a TESTrr. With this patch we are able to select the BMI/TBM instructions, but we'll also emit a TESTrr when the result is compared to 0. In many cases the peephole pass will be able to use optimizeCompareInstr to remove the TEST, but it's probably not perfect. Differential Revision: https://reviews.llvm.org/D55870 llvm-svn: 349661	2018-12-19 18:49:13 +00:00
Craig Topper	291470347a	[X86] Fix assert fails in pass X86AvoidSFBPass Fixes https://bugs.llvm.org/show_bug.cgi?id=38743 The function removeRedundantBlockingStores is supposed to remove any blocking stores contained in each other in lockingStoresDispSizeMap. But it currently looks only at the previous one, which will miss some cases that result in an assert. This patch refines the function to check all previous layouts until it finds an uncontained one, so all redundant stores will be removed. Patch by Pengfei Wang Differential Revision: https://reviews.llvm.org/D55642 llvm-svn: 349660	2018-12-19 18:45:57 +00:00
Evandro Menezes	5d409b2278	[AArch64] Improve the Exynos M3 pipeline model llvm-svn: 349652	2018-12-19 17:37:51 +00:00
Yonghong Song	7b410ac352	[BPF] Generate BTF DebugInfo under BPF target This patch implements BTF (BPF Type Format). The BTF is the debug info format for BPF, introduced in the below linux patch: `69b693f0ae (diff-06fb1c8825f653d7e539058b72c83332)` and further extended several times, e.g., https://www.spinics.net/lists/netdev/msg534640.html https://www.spinics.net/lists/netdev/msg538464.html https://www.spinics.net/lists/netdev/msg540246.html The main advantage of implementing in LLVM is: . better integration/deployment as no extra tools are needed. . bpf JIT based compilation (like bcc, bpftrace, etc.) can get BTF without much extra effort. . BTF line_info needs selective source codes, which can be easily retrieved when inside the compiler. This patch implemented BTF generation by registering a BPF specific DebugHandler in BPFAsmPrinter. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D55752 llvm-svn: 349640	2018-12-19 16:40:25 +00:00
Nicolai Haehnle	8d5e974076	AMDGPU: Use an ABS32_LO relocation for SCRATCH_RSRC_DWORD1 Summary: Using HI here makes no logical sense, since the dword is only 32 bits to begin with. Current Mesa master does not look at the relocation type at all, so this change is fine. Future Mesa will rely on this, however. Change-Id: I91085707834c4ac0370926602b93c94b90e44cb1 Reviewers: arsenm, rampitec, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55369 llvm-svn: 349620	2018-12-19 11:55:03 +00:00
Carl Ritson	c521ac3a44	AMDGPU/InsertWaitcnts: Update VGPR/SGPR bounds when brackets are merged Summary: Fix an issue where VGPR/SGPR bounds are not properly extended when brackets are merged. This manifests as missing waitcnt insertions when multiple brackets are forwarded to a successor block and the first forward has lower VGPR/SGPR bounds. Irreducible loop test has been extended based on a CTS failure detected for GFX9. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D55602 llvm-svn: 349611	2018-12-19 10:17:49 +00:00
Diana Picus	6c35a1e5af	[ARM GlobalISel] Support G_CONSTANT for Thumb2 All we have to do is mark it as legal. This allows us to select a lot of new patterns handled by TableGen. This patch adds tests for them and splits up the existing test file for binary operators into 2 files, one for arithmetic ops and one for logical ones. llvm-svn: 349610	2018-12-19 09:55:10 +00:00
Matt Arsenault	b110e2277c	AMDGPU/GlobalISel: Regbankselect for fsub llvm-svn: 349608	2018-12-19 09:07:58 +00:00
Kewen Lin	a6247e7cf4	[PowerPC]Exploit P9 vabsdu for unsigned vselect patterns For type v4i32/v8i16/v16i8, do following transforms: (vselect (setcc a, b, setugt), (sub a, b), (sub b, a)) -> (vabsd a, b) (vselect (setcc a, b, setuge), (sub a, b), (sub b, a)) -> (vabsd a, b) (vselect (setcc a, b, setult), (sub b, a), (sub a, b)) -> (vabsd a, b) (vselect (setcc a, b, setule), (sub b, a), (sub a, b)) -> (vabsd a, b) Differential Revision: https://reviews.llvm.org/D55812 llvm-svn: 349599	2018-12-19 03:04:07 +00:00
Evandro Menezes	f03c45d582	[AArch64] Simplify the Exynos M3 pipeline model llvm-svn: 349569	2018-12-18 23:19:57 +00:00
Evandro Menezes	4e39fa4474	[AArch64] Fix instructions order (NFC) llvm-svn: 349568	2018-12-18 23:19:55 +00:00
Craig Topper	18a9d545e1	[X86] Add BSR to isUseDefConvertible. We already had BSF here as part of __builtin_ffs improvements and I was just wondering yesterday whether we should have BSR there. This addresses one issue from PR40090. llvm-svn: 349531	2018-12-18 20:03:54 +00:00
Farhana Aleen	59ee2c5362	[AMDGPU] Removed the unnecessary operand size-check-assert from processBaseWithConstOffset(). Summary: 32bit operand sizes are guaranteed by the opcode check AMDGPU::V_ADD_I32_e64 and AMDGPU::V_ADDC_U32_e64. Therefore, we don't need any additional operand size-check-assert. Author: FarhanaAleen llvm-svn: 349529	2018-12-18 19:58:39 +00:00
Craig Topper	8434ef7d1e	[X86] Don't use SplitOpsAndApply to create ISD::UADDSAT/ISD::USUBSAT nodes. Let type legalization and op legalization deal with it. Now that we've switched to target independent nodes we can rely on generic infrastructure to do the legalization for us. llvm-svn: 349526	2018-12-18 19:29:08 +00:00
Nikita Popov	f6058ff140	[X86] Use SADDSAT/SSUBSAT instead of ADDS/SUBS Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic ISD opcodes SADDSAT and SSUBSAT. This also improves codegen for @llvm.sadd.sat() and @llvm.ssub.sat() intrinsics. This is a followup to D55787 and part of PR40056. Differential Revision: https://reviews.llvm.org/D55833 llvm-svn: 349520	2018-12-18 18:28:22 +00:00
Craig Topper	20a6db5a84	[X86] Create PSUBUS from (add (umax X, C), -C) InstCombine seems to canonicalize our PSUB pattern into a max with the constant and an add with an inverse of the constant. This patch recognizes this pattern and turns it into PSUBUS. Future work could improve undef element handling. Fixes some of PR40053 Differential Revision: https://reviews.llvm.org/D55780 llvm-svn: 349519	2018-12-18 18:26:25 +00:00
Simon Pilgrim	1411917431	[X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for constant rotation amounts Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion. llvm-svn: 349510	2018-12-18 17:31:11 +00:00
Simon Pilgrim	e9effe9744	[X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for splat rotation amounts Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion. llvm-svn: 349500	2018-12-18 16:02:23 +00:00
Petar Avramovic	0a5e4eb776	[MIPS GlobalISel] Select G_SDIV, G_UDIV, G_SREM and G_UREM Add support for s64 libcalls for G_SDIV, G_UDIV, G_SREM and G_UREM and use integer type of correct size when creating arguments for CLI.lowerCall. Select G_SDIV, G_UDIV, G_SREM and G_UREM for types s8, s16, s32 and s64 on MIPS32. Differential Revision: https://reviews.llvm.org/D55651 llvm-svn: 349499	2018-12-18 15:59:51 +00:00
Nikita Popov	665ab08178	[X86] Use UADDSAT/USUBSAT instead of ADDUS/SUBUS Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes UADDSAT and USUBSAT. As a side-effect, this also makes codegen for the @llvm.uadd.sat and @llvm.usub.sat intrinsics reasonable. This only replaces use in the X86 backend, and does not move any of the ADDUS/SUBUS X86 specific combines into generic codegen. Differential Revision: https://reviews.llvm.org/D55787 llvm-svn: 349481	2018-12-18 13:23:03 +00:00
Petar Avramovic	150fd430f6	[MIPS GlobalISel] ClampScalar G_AND G_OR and G_XOR Add narrowScalar for G_AND and G_XOR. Legalize G_AND G_OR and G_XOR for types other than s32 with clampScalar on MIPS32. Differential Revision: https://reviews.llvm.org/D55362 llvm-svn: 349475	2018-12-18 11:36:14 +00:00
Luke Cheeseman	f57d7d8237	[AArch64] - Return address signing dwarf support - Reapply changes initially introduced in r343089 - The architecture info is no longer loaded whenever a DWARFContext is created - The runtime libraries (sanitizers) make use of the dwarf context classes but do not initialise the target info - The architecture of the object can be obtained without loading the target info - Adding a method to the dwarf context to get this information and multiplex the string printing later on Differential Revision: https://reviews.llvm.org/D55774 llvm-svn: 349472	2018-12-18 10:37:42 +00:00
Matt Arsenault	c94e26c71d	AMDGPU: Legalize/regbankselect frame_index llvm-svn: 349468	2018-12-18 09:46:13 +00:00
Matt Arsenault	c0ea221068	AMDGPU: Legalize/regbankselect fma llvm-svn: 349467	2018-12-18 09:39:56 +00:00
Matt Arsenault	e01e7c81f2	AMDGPU/GlobalISel: Legalize/regbankselect fneg/fabs/fsub llvm-svn: 349463	2018-12-18 09:19:03 +00:00
Simon Pilgrim	8488a44c34	[X86][SSE] Move VSRAI sign extend in reg fold into SimplifyDemandedBits (VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1 This works better as part of SimplifyDemandedBits than part of the general combine. llvm-svn: 349462	2018-12-18 09:11:34 +00:00
Simon Pilgrim	26c630f416	[X86][SSE] Replace (VSRLI (VSRAI X, Y), 31) -> (VSRLI X, 31) fold. This fold was incredibly specific - replace with a SimplifyDemandedBits fold to remove a VSRAI if only the original sign bit is demanded (it's guaranteed to stay the same). Test change is merely a rescheduling. llvm-svn: 349459	2018-12-18 08:55:47 +00:00
Kristof Beyls	e66bc1f756	Introduce control flow speculation tracking pass for AArch64 The pass implements tracking of control flow miss-speculation into a "taint" register. That taint register can then be used to mask off registers with sensitive data when executing under miss-speculation, a.k.a. "transient execution". This pass is aimed at mitigating against SpectreV1-style vulnerabilities. At the moment, it implements the tracking of miss-speculation of control flow into a taint register, but doesn't implement a mechanism yet to then use that taint register to mask off vulnerable data in registers (something for a follow-on improvement). Possible strategies to mask out vulnerable data that can be implemented on top of this are: - speculative load hardening to automatically mask off data loaded in registers. - using intrinsics to mask off data in registers as indicated by the programmer (see https://lwn.net/Articles/759423/). For AArch64, the following implementation choices are made. Some of these are different than the implementation choices made in the similar pass implemented in X86SpeculativeLoadHardening.cpp, as the instruction set characteristics result in different trade-offs. - The speculation hardening is done after register allocation. With a relative abundance of registers, one register is reserved (X16) to be the taint register. X16 is expected to not clash with other register reservation mechanisms with very high probability because: . The AArch64 ABI doesn't guarantee X16 to be retained across any call. . The only way to request X16 to be used as a programmer is through inline assembly. In the rare case a function explicitly demands to use X16/W16, this pass falls back to hardening against speculation by inserting a DSB SYS/ISB barrier pair which will prevent control flow speculation. - It is easy to insert mask operations at this late stage as we have mask operations available that don't set flags.
- The taint variable contains all-ones when no miss-speculation is detected, and contains all-zeros when miss-speculation is detected. Therefore, when masking, an AND instruction (which only changes the register to be masked, no other side effects) can easily be inserted anywhere that's needed. - The tracking of miss-speculation is done by using a data-flow conditional select instruction (CSEL) to evaluate the flags that were also used to make conditional branch direction decisions. Speculation of the CSEL instruction can be limited with a CSDB instruction - so the combination of CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL aren't speculated. When conditional branch direction gets miss-speculated, the semantics of the inserted CSEL instruction is such that the taint register will contain all zero bits. One key requirement for this to work is that the conditional branch is followed by an execution of the CSEL instruction, where the CSEL instruction needs to use the same flags status as the conditional branch. This means that the conditional branches must not be implemented as one of the AArch64 conditional branches that do not use the flags as input (CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction selectors to not produce these instructions when speculation hardening is enabled. This pass will assert if it does encounter such an instruction. - On function call boundaries, the miss-speculation state is transferred from the taint register X16 to be encoded in the SP register as value 0. Future extensions/improvements could be: - Implement this functionality using full speculation barriers, akin to the x86-slh-lfence option. This may be more useful for the intrinsics-based approach than for the SLH approach to masking. Note that this pass already inserts the full speculation barriers if the function for some niche reason makes use of X16/W16. 
- no indirect branch misprediction gets protected/instrumented; but this could be done for some indirect branches, such as switch jump tables. Differential Revision: https://reviews.llvm.org/D54896 llvm-svn: 349456	2018-12-18 08:50:02 +00:00
Martin Storsjo	8f0cb9c3a8	[AArch64] [MinGW] Allow enabling SEH exceptions The default still is dwarf, but SEH exceptions can now be enabled optionally for the MinGW target. Differential Revision: https://reviews.llvm.org/D55748 llvm-svn: 349451	2018-12-18 08:32:37 +00:00
Kewen Lin	44ace92596	[PowerPC] Exploit power9 new instruction setb Check the expected patterns feeding to SELECT_CC like: (select_cc lhs, rhs, 1, (sext (setcc [lr]hs, [lr]hs, cc2)), cc1) (select_cc lhs, rhs, -1, (zext (setcc [lr]hs, [lr]hs, cc2)), cc1) (select_cc lhs, rhs, 0, (select_cc [lr]hs, [lr]hs, 1, -1, cc2), seteq) (select_cc lhs, rhs, 0, (select_cc [lr]hs, [lr]hs, -1, 1, cc2), seteq) Further transform the sequence to comparison + setb if hits. Differential Revision: https://reviews.llvm.org/D53275 llvm-svn: 349445	2018-12-18 07:53:26 +00:00
Craig Topper	1ff7356f96	[X86] Const correct some helper functions X86InstrInfo.cpp. NFC llvm-svn: 349440	2018-12-18 04:58:05 +00:00
Kewen Lin	3dac1252da	[PowerPC] Improve vec_abs on P9 Improve the current vec_abs support on P9, generate ISD::ABS node for vector types, combine ABS node to VABSD node for some special cases to make use of P9 VABSD* insns, do custom lowering to vsub(vneg later)+vmax if it has no combination opportunity. Differential Revision: https://reviews.llvm.org/D54783 llvm-svn: 349437	2018-12-18 03:16:43 +00:00
Simon Pilgrim	7e2975a44c	[X86][SSE] Improve immediate vector shift known bits handling. Convert VSRAI to VSRLI if the sign bit is known zero and improve KnownBits output for all shift instructions. Fixes the poor codegen comments in D55768. llvm-svn: 349407	2018-12-17 22:09:47 +00:00
Wouter van Oortmerssen	d3c544aa6e	[WebAssembly] Fix assembler parsing of br_table. Summary: We use `variable_ops` in the tablegen defs to denote the list of branch targets in `br_table`, but unlike other uses of `variable_ops` (e.g. call) these branch targets need to actually be encoded in the instruction. The existing tables for `variable_ops` cause no operands to be accepted by the assembly matcher. Following the example of ARM: `2cc0a7da87/lib/Target/ARM/ARMInstrInfo.td (L550-L555)` we introduce a new operand type to capture this list, and we use the same {} syntax as ARM as well to differentiate them from regular integer operands. Also removed definition and use of TSFlags in tablegen defs, since `br_table` now has a non-variable_ops immediate operand, so the previous logic of only the variable_ops arguments being labels didn't make sense anymore. Reviewers: dschuff, aheejin, sunfish Subscribers: javed.absar, sbc100, jgravelle-google, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D55401 llvm-svn: 349405	2018-12-17 22:04:44 +00:00
Craig Topper	8c9d772991	[X86] Add T1MSKC and TZMSK to isDefConvertible used by optimizeCompareInstr. These seem to have been missed when the other TBM instructions were added. llvm-svn: 349404	2018-12-17 21:50:06 +00:00
Simon Pilgrim	6b5e0b7b2b	[X86][SSE] Split SimplifyDemandedBitsForTargetNode X86ISD::VSRLI/VSRAI handling. First step towards adding more capable combines to fix comments in D55768. llvm-svn: 349400	2018-12-17 21:36:17 +00:00
Craig Topper	728cbc0378	Convert (CMP (srl/shl X, C), 0) to (CMP (and X, C'), 0) when only the zero flag is used. This allows a TEST to be used and can be combined with any AND that may already exist as an input to the shift. This was already done in EmitTest, but was easily tricked by multiple uses because the setcc might be used by multiple instructions. Once the SETCC and users are legalized then we can look for the shift to be used by a single CMP, but the CMP itself can have multiple users. This appears to fix the case in PR39968. llvm-svn: 349385	2018-12-17 20:02:16 +00:00
Simon Pilgrim	9274f17a5e	[TargetLowering] Add DemandedElts mask to SimplifyDemandedBits (PR40000) This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen. I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage, at the same time the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well. Differential Revision: https://reviews.llvm.org/D55768 llvm-svn: 349374	2018-12-17 18:43:43 +00:00
Tim Northover	256a16d031	FastIsel: take care to update iterators when removing instructions. We keep a few iterators into the basic block we're selecting while performing FastISel. Usually this is fine, but occasionally code wants to remove already-emitted instructions. When this happens we have to be careful to update those iterators so they're not pointing at dangling memory. llvm-svn: 349365	2018-12-17 17:25:53 +00:00
Petar Avramovic	f9c9bc09ab	[MIPS GlobalISel] Remove switch statement (fix r349346 for MSVC) Temporarily remove switch statement without any case labels in function legalizeCustom in order to fix r349346 for MSVC. llvm-svn: 349356	2018-12-17 15:12:53 +00:00
Tim Northover	ae3b66b7b0	ARM: use acquire/release instruction variants when available. These features (fairly) recently got split out into their own feature, so we should make CodeGen use them when available. The main change here is that the check used to be based on the triple, but now it's based on CPU features. llvm-svn: 349355	2018-12-17 15:05:32 +00:00
Petar Avramovic	b8276f2280	[MIPS GlobalISel] Lower G_UADDE and narrowScalar G_ADD Lower G_UADDE and legalize G_ADD using narrowScalar on MIPS32. Differential Revision: https://reviews.llvm.org/D54580 llvm-svn: 349346	2018-12-17 12:31:07 +00:00
Alexandros Lamprineas	490ae11717	[AArch64] Re-run load/store optimizer after aggressive tail duplication The Load/Store Optimizer runs before Machine Block Placement. At O3 the Tail Duplication Threshold is set to 4 instructions and this can create new opportunities for the Load/Store Optimizer. It seems worthwhile to run it once again. llvm-svn: 349338	2018-12-17 10:45:43 +00:00
Craig Topper	fa4907d671	[X86] Fix bad operand lookup for cmov introduced in r349315 The CC is operand 2 not operand 3. llvm-svn: 349330	2018-12-17 06:40:35 +00:00
Simon Pilgrim	d0c9e43b1c	[X86] Pull out constant splat rotation detection. We had 3 different approaches - consistently use getTargetConstantBitsFromNode and allow undef elts. llvm-svn: 349319	2018-12-16 19:46:04 +00:00
Craig Topper	10f8892837	[X86] Remove truncation handling from EmitTest. Replace it with a DAG combine. I'd like to try to move a lot of the flag matching out of EmitTest and push it to isel or isel preprocessing. This is a step towards that. The test-shrink-bug.ll change is an improvement because we are no longer interfering with test shrink handling in isel. The pr34137.ll change is a regression, but the IR came from -O0 and was not reduced by InstCombine. So it contains a lot of redundancies like duplicate loads that made it combine poorly. llvm-svn: 349315	2018-12-16 18:35:55 +00:00
Sanjay Patel	13ac2f15b0	[x86] increment/decrement constant vector with min/max in vsetcc lowering (PR39859) This is part of fixing PR39859: https://bugs.llvm.org/show_bug.cgi?id=39859 We have a crippled vector ISA, so we have to invert a typical fold and create min/max here. As discussed in the bug report, we can probably do better by using saturating subtract when it's available, but we should have this improvement for the min/max patterns regardless. Alive proofs: https://rise4fun.com/Alive/zsf https://rise4fun.com/Alive/Qrl Differential Revision: https://reviews.llvm.org/D55515 llvm-svn: 349304	2018-12-16 15:05:48 +00:00
Simon Pilgrim	52c982406e	[X86] Begin cleaning up combineOr -> SHLD/SHRD. NFCI. In preparation for converting to funnel shifts. llvm-svn: 349286	2018-12-15 21:11:49 +00:00
Simon Pilgrim	ef7b5949e5	[X86] Lower to SHLD/SHRD on slow machines for optsize Use consistent rules for when to lower to SHLD/SHRD for slow machines - fixes a weird issue where funnel shift gets expanded but then X86ISelLowering's combineOr sees the optsize and combines to SHLD/SHRD, but now with the modulo amount guard...... llvm-svn: 349285	2018-12-15 19:43:44 +00:00
Simon Pilgrim	9831d4058c	Fix -Wunused-variable warning. NFCI. llvm-svn: 349265	2018-12-15 12:25:22 +00:00
Florian Hahn	abe32c9125	[SILoadStoreOptimizer] Use std::abs to avoid truncation. Using regular abs() causes the following warning error: absolute value function 'abs' given an argument of type 'int64_t' (aka 'long') but has parameter of type 'int' which may cause truncation of value [-Werror,-Wabsolute-value] (uint32_t)abs(Dist) > MaxDist) { ^ lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:1369:19: note: use function 'std::abs' instead which causes a bot to fail: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/18284/steps/bootstrap%20clang/logs/stdio llvm-svn: 349224	2018-12-15 01:32:58 +00:00
Craig Topper	1fc257d97f	[X86] Rename hasNoSignedComparisonUses to hasNoSignFlagUses. Add the instructions that only modify the O flag to the waiver list. The only caller of this turns CMP with 0 into TEST. CMP with 0 and TEST both set OF to 0 so we should have no issues with instructions that only use OF. Though I don't think there's any reason we would read just OF after a compare with 0 anyway. So this probably isn't an observable change. llvm-svn: 349223	2018-12-15 01:07:19 +00:00
Craig Topper	5c304eac41	[X86] Make hasNoCarryFlagUses/hasNoSignedComparisonUses take an SDValue that indicates which result is the flag result. NFCI hasNoCarryFlagUses hardcoded that the flag result is 1 and used that to filter which uses were of interest. hasNoSignedComparisonUses just assumes the only result is flags and checks whether any user of the node is a CopyToReg instruction. After this patch we now do a result number check in both and rely on the caller to provide the result number. This shouldn't change behavior it was just an odd difference between the two functions that I noticed. llvm-svn: 349222	2018-12-15 01:07:16 +00:00
Artem Belevich	6d74bd638a	[NVPTX] Lower instructions that expand into libcalls. The change is an effort to split and refactor abandoned D34708 into smaller parts. Here the behaviour of unsupported instructions is changed to match the behaviour of explicit intrinsics calls. Currently LLVM crashes with: > Assertion getInstruction() && "Not a call or invoke instruction!" failed. With this patch LLVM produces a more sensible error message: > Cannot select: ... i32 = ExternalSymbol'__foobar' Author: Denys Zariaiev <denys.zariaiev@gmail.com> Differential Revision: https://reviews.llvm.org/D55145 llvm-svn: 349213	2018-12-14 23:53:06 +00:00
Krzysztof Parzyszek	26d994f56e	[Hexagon] Add patterns for shifts of v2i16 This fixes https://llvm.org/PR39983. llvm-svn: 349202	2018-12-14 22:33:48 +00:00
Krzysztof Parzyszek	c0fc0a9775	[Hexagon] Use IMPLICIT_DEF to any-extend 32-bit values to 64 bits llvm-svn: 349199	2018-12-14 22:05:44 +00:00
Farhana Aleen	ce095c564a	[AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. Summary: Promote constant offset to immediate by recomputing the relative 13bit offset from nearby instructions. E.g. s_movk_i32 s0, 0x1800 v_add_co_u32_e32 v0, vcc, s0, v2 v_addc_co_u32_e32 v1, vcc, 0, v6, vcc s_movk_i32 s0, 0x1000 v_add_co_u32_e32 v5, vcc, s0, v2 v_addc_co_u32_e32 v6, vcc, 0, v6, vcc global_load_dwordx2 v[5:6], v[5:6], off global_load_dwordx2 v[0:1], v[0:1], off => s_movk_i32 s0, 0x1000 v_add_co_u32_e32 v5, vcc, s0, v2 v_addc_co_u32_e32 v6, vcc, 0, v6, vcc global_load_dwordx2 v[5:6], v[5:6], off global_load_dwordx2 v[0:1], v[5:6], off offset:2048 Author: FarhanaAleen Reviewed By: arsenm, rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D55539 llvm-svn: 349196	2018-12-14 21:13:14 +00:00
Evandro Menezes	ea9d90083f	[AArch64] Simplify the scheduling predicates (NFC) The instruction encodings make it unnecessary to distinguish extended W-form from X-form instructions. llvm-svn: 349185	2018-12-14 20:04:58 +00:00
Ehsan Amiri	de1742c284	NFC. Adding an empty line to test the updated commit credentials. llvm-svn: 349158	2018-12-14 16:19:02 +00:00
Diana Picus	02c8343c75	[ARM GlobalISel] Thumb2: casts between int and ptr Mark as legal and add tests. Nothing special to do. llvm-svn: 349147	2018-12-14 13:45:38 +00:00
Diana Picus	813af0d283	[ARM GlobalISel] Minor refactoring. NFCI Refactor the ARMInstructionSelector to cache some opcodes in the constructor instead of checking all the time if we're in ARM or Thumb mode. llvm-svn: 349143	2018-12-14 12:37:24 +00:00
Diana Picus	14dc3b2959	[ARM GlobalISel] Allow simple binary ops in Thumb2 Mark G_ADD, G_SUB, G_MUL, G_AND, G_OR and G_XOR as legal for both ARM and Thumb2. Extract the legalizer tests for these opcodes into another file. Add tests for the instruction selector. llvm-svn: 349142	2018-12-14 11:58:14 +00:00
Craig Topper	257ce3871e	[DAGCombiner][X86] Prevent visitSIGN_EXTEND from returning N when (sext (setcc)) already has the target desired type for the setcc Summary: If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes VsetCC to be CSEd to N0 and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node. To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term. Reviewers: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55459 llvm-svn: 349137	2018-12-14 08:28:24 +00:00
Craig Topper	178abc59ac	[X86] Demote EmitTest to a helper function of EmitCmp. Route all callers except EmitCmp through EmitCmp. This requires the two callers to manifest a 0 to make EmitCmp call EmitTest. I'm looking into changing how we combine TEST and flag setting instructions to not be part of lowering. And instead be part of DAG combine or isel. Which will mean EmitTest will probably become gutted and maybe disappear entirely. llvm-svn: 349094	2018-12-13 23:55:30 +00:00
Evandro Menezes	6fe51ac973	[AArch64] Fix Exynos predicates (NFC) Fix the logic in the definition of the `ExynosShiftExPred` as a more specific version of `ExynosShiftPred`. But, since `ExynosShiftExPred` is not used yet, this change has NFC. llvm-svn: 349091	2018-12-13 23:19:46 +00:00
Aakanksha Patil	bc568766b2	Revert r348971: [AMDGPU] Support for "uniform-work-group-size" attribute This patch breaks RADV (and probably RadeonSI as well) llvm-svn: 349084	2018-12-13 21:23:12 +00:00
Matt Arsenault	934e534c47	AMDGPU/GlobalISel: Legalize/regbankselect block_addr llvm-svn: 349081	2018-12-13 20:34:15 +00:00
Mircea Trofin	41c729e78e	[llvm] Address base discriminator overflow in X86DiscriminateMemOps Summary: Macros are expanded on a single line. In case of large expansions, with sufficiently many instructions with memory operands (and when -fdebug-info-for-profiling is requested), we may be unable to generate new base discriminator values - new values overflow (base discriminators may not be larger than 2^12). This CL warns instead of asserting in such a case. A subsequent CL will add APIs to check for overflow before creating new debug info. See https://bugs.llvm.org/show_bug.cgi?id=39890 Reviewers: davidxl, wmi, gbedwell Reviewed By: davidxl Subscribers: aprantl, llvm-commits Differential Revision: https://reviews.llvm.org/D55643 llvm-svn: 349075	2018-12-13 19:40:59 +00:00
Simon Pilgrim	b5aaa673c6	[X86][SSE] Add SSE vector imm/var shift support to SimplifyDemandedVectorEltsForTargetNode llvm-svn: 349057	2018-12-13 16:39:29 +00:00
Simon Pilgrim	b0b2f1503a	[X86][SSE] Fix all remaining modulo vector rotation amounts (PR38243) There's still a couple of minor SimplifyDemandedElts regressions in some of the shift amount splats that will be fixed in future patches. llvm-svn: 349052	2018-12-13 15:50:31 +00:00
Daniel Cederman	77611426e1	[Sparc] Add membar assembler tags Summary: The Sparc V9 membar instruction can enforce different types of memory orderings depending on the value in its immediate field. In the architectural manual the type is selected by combining different assembler tags into a mask. This patch adds support for these tags. Reviewers: jyknight, venkatra, brad Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D53491 llvm-svn: 349048	2018-12-13 15:29:12 +00:00
Simon Pilgrim	ba91ff4a86	[X86][SSE] Fix modulo rotation amounts for v8i16/v16i16/v4i32 (PR38243) llvm-svn: 349047	2018-12-13 15:23:09 +00:00
Daniel Cederman	b5d284408e	[Sparc] Use float register for integer constrained with "f" in inline asm Summary: Constraining an integer value to a floating point register using "f" causes an llvm_unreachable to trigger. This patch allows i32 integers to be placed in a single precision float register and i64 integers to be placed in a double precision float register. This matches the behavior of GCC. For other types the llvm_unreachable is removed to instead trigger an error message that points out the offending line. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D51614 llvm-svn: 349045	2018-12-13 15:13:29 +00:00
Jinsong Ji	c7b43b94ce	[PowerPC][NFC] Sorting out Pseudo related classes to avoid confusion There are several Pseudo in PowerPC backend. eg: * ISel Pseudo-instructions , which has let usesCustomInserter=1 in td ExpandISelPseudos -> EmitInstrWithCustomInserter will deal with them. * Post-RA pseudo instruction, which has let isPseudo = 1 in td, or Standard pseudo (SUBREG_TO_REG,COPY etc.) ExpandPostRAPseudos -> expandPostRAPseudo will expand them * Multi-instruction pseudo operations will expand them PPCAsmPrinter::EmitInstruction * Pseudo instruction in CodeEmitter, which has encoding of 0. Currently, in td files, especially PPCInstrVSX.td, we did not distinguish Post-RA pseudo instruction and Pseudo instruction in CodeEmitter very clearly. This patch is to * Rename Pseudo<> class to PPCEmitTimePseudo, which means encoding of 0 in CodeEmitter * Introduce new class PPCPostRAExpPseudo <> for previous PostRA Pseudo * Introduce new class PPCCustomInserterPseudo <> for previous Isel Pseudo Differential Revision: https://reviews.llvm.org/D55143 llvm-svn: 349044	2018-12-13 15:12:57 +00:00
Simon Pilgrim	7c84f7ae3a	[X86][SSE] Merge the vXi16/vXi32 vector rotation expansion cases. NFCI. Merged the repeated code into a single if(). llvm-svn: 349040	2018-12-13 14:51:28 +00:00
Jonas Paulsson	e79b1b986d	[SystemZ] Pass copy-hinted regs first from getRegAllocationHints(). When computing register allocation hints for a GRX32Bit register, make sure that any of the hinted registers that are also copy hints are returned first in the list. Review: Ulrich Weigand. llvm-svn: 349037	2018-12-13 14:37:05 +00:00
Simon Pilgrim	320fd7383f	[X86][BWI] Don't custom lower vXi8 rotations. We always expand to shifts anyhow - test changes are just different scheduling only. llvm-svn: 349034	2018-12-13 13:44:33 +00:00
Chen Zheng	9c6fa536e0	[PowerPC] intrinsic llvm.eh.sjlj.setjmp should not have flag isBarrier. Differential Revision: https://reviews.llvm.org/D55499 llvm-svn: 349029	2018-12-13 12:25:20 +00:00
Simon Pilgrim	ab973a45b9	[DAGCombine] Moved X86 rotate_amount % bitwidth == 0 early out to DAGCombiner Remove common code from custom lowering (code is still safe if somehow a zero value gets used). llvm-svn: 349028	2018-12-13 12:23:32 +00:00
Diana Picus	99cd644b6c	[ARM GlobalISel] Support exts and truncs for Thumb2 Mark G_SEXT, G_ZEXT and G_ANYEXT to 32 bits as legal and add support for them in the instruction selector. This uses handwritten code again because the patterns that are generated with TableGen are tuned for what the DAG combiner would produce and not for simple sext/zext nodes. Luckily, we only need to update the opcodes to use the Thumb2 variants, everything else can be reused from ARM. llvm-svn: 349026	2018-12-13 12:06:54 +00:00
Simon Pilgrim	77fc551d1a	[TargetLowering] Add ISD::ROTL/ROTR vector expansion Move existing rotation expansion code into TargetLowering and set it up for vectors as well. Ideally this would share more of the funnel shift expansion, but we handle the shift amount modulo quite differently at the moment. Begun removing x86 vector rotate custom lowering to use the expansion. llvm-svn: 349025	2018-12-13 11:20:48 +00:00
Alex Bradbury	919f5fb8ca	[RISCV] Add support for the various RISC-V FMA instruction variants Adds support for the various RISC-V FMA instructions (fmadd, fmsub, fnmsub, fnmadd). The criteria for choosing whether a fused add or subtract is used, as well as whether the product is negated or not, is whether some of the arguments to the llvm.fma.* intrinsic are negated or not. In the tests, extraneous fadd instructions were added to avoid the negation being performed using a xor trick, which prevented the proper FMA forms from being selected and thus tested. The FMA instruction patterns might seem incorrect (e.g., fnmadd: -rs1 * rs2 - rs3), but they should be correct. The misleading names were inherited from MIPS, where the negation happens after computing the sum. The llvm.fmuladd.* intrinsics still do not generate RISC-V FMA instructions, as that depends on TargetLowering::isFMAFasterthanFMulAndFAdd. Some comments in the test files about what type of instructions are there tested were updated, to better reflect the current content of those test files. Differential Revision: https://reviews.llvm.org/D54205 Patch by Luís Marques. llvm-svn: 349023	2018-12-13 10:49:05 +00:00
Arnaud A. de Grandmaison	dfe861087d	[AArch64] Catch some more CMN opportunities. Fixes https://bugs.llvm.org/show_bug.cgi?id=33486 llvm-svn: 349022	2018-12-13 10:31:32 +00:00
Matt Arsenault	577b9fc543	AMDGPU/GlobalISel: Legalize f64 fadd/fmul llvm-svn: 349014	2018-12-13 08:27:48 +00:00
Matt Arsenault	f38f483bef	AMDGPU/GlobalISel: RegBankSelect some simple operations llvm-svn: 349012	2018-12-13 08:23:51 +00:00
Craig Topper	a048d58de7	[X86] Remove assert leftover from when i1 was a legal type. Add more accurate assert. NFC llvm-svn: 349007	2018-12-13 06:14:25 +00:00
Stanislav Mekhanoshin	d933c2ced7	[AMDGPU] Fix build failure, second attempt Some compilers complain that variable is captured and some complain when it is not. Switch to [&]. llvm-svn: 349006	2018-12-13 05:52:11 +00:00
Stanislav Mekhanoshin	5225746e03	[AMDGPU] Fix build failure Fixed error 'lambda capture 'CondReg' is not required to be captured for this use'. llvm-svn: 349005	2018-12-13 05:21:25 +00:00
Stanislav Mekhanoshin	6071e1aa58	[AMDGPU] Simplify negated condition Optimize sequence: %sel = V_CNDMASK_B32_e64 0, 1, %cc %cmp = V_CMP_NE_U32 1, %1 $vcc = S_AND_B64 $exec, %cmp S_CBRANCH_VCC[N]Z => $vcc = S_ANDN2_B64 $exec, %cc S_CBRANCH_VCC[N]Z It is the negation pattern inserted by DAGCombiner::visitBRCOND() in the rebuildSetCC(). Differential Revision: https://reviews.llvm.org/D55402 llvm-svn: 349003	2018-12-13 03:17:40 +00:00
Craig Topper	d1c61861dd	[X86] Don't emit MULX by default with BMI2 MULX has somewhat improved register allocation constraints compared to the legacy MUL instruction. Both output registers are encoded instead of fixed to EAX/EDX, but EDX is used as input. It also doesn't touch flags. Unfortunately, the encoding is longer. Preferring it whenever BMI2 is enabled is probably not optimal. Choosing it should somehow be a function of register allocation constraints like converting adds to three address. gcc and icc definitely don't pick MULX by default. Not sure what if any rules they have for using it. Differential Revision: https://reviews.llvm.org/D55565 llvm-svn: 348975	2018-12-12 21:21:31 +00:00
Aakanksha Patil	729309cc89	[AMDGPU] Support for "uniform-work-group-size" attribute Updated the annotate-kernel-features pass to support the propagation of uniform-work-group attribute from the kernel to the called functions. Once this pass is run, all kernels, even the ones which initially did not have the attribute, will be able to indicate whether or not they have uniform work group size depending on the value of the attribute. Differential Revision: https://reviews.llvm.org/D50200 llvm-svn: 348971	2018-12-12 20:49:17 +00:00
Scott Linder	f5b36e56fb	[AMDGPU] Emit MessagePack HSA Metadata for v3 code object Continue to present HSA metadata as YAML in ASM and when output by tools (e.g. llvm-readobj), but encode it in Messagepack in the code object. Differential Revision: https://reviews.llvm.org/D48179 llvm-svn: 348963	2018-12-12 19:39:27 +00:00
Craig Topper	4937adf75f	[X86] Emit SBB instead of SETCC_CARRY from LowerSELECT. Break false dependency on the SBB input. I'm hoping we can just replace SETCC_CARRY with SBB. This is another step towards that. I've explicitly used zero as the input to the setcc to avoid a false dependency that we've had with the SETCC_CARRY. I changed one of the patterns that used NEG to instead use an explicit compare with 0 on the LHS. We needed the zero anyway to avoid the false dependency. The negate would clobber its input register. By using a CMP we can avoid that which could be useful. Differential Revision: https://reviews.llvm.org/D55414 llvm-svn: 348959	2018-12-12 19:20:21 +00:00
Simon Pilgrim	eb508f8ccb	[SelectionDAG] Add a generic isSplatValue function This patch introduces a generic function to determine whether a given vector type is known to be a splat value for the specified demanded elements, recursing up the DAG looking for BUILD_VECTOR or VECTOR_SHUFFLE splat patterns. It also keeps track of the elements that are known to be UNDEF - it returns true if all the demanded elements are UNDEF (as this may be useful under some circumstances), so this needs to be handled by the caller. A wrapper variant is also provided that doesn't take the DemandedElts or UndefElts arguments for cases where we just want to know if the SDValue is a splat or not (with/without UNDEFS). I had hoped to completely remove the X86 local version of this function, but I'm seeing some regressions in shift/rotate codegen that will take a little longer to fix and I hope to get this in sooner so I can continue work on PR38243 which needs more capable splat detection. Differential Revision: https://reviews.llvm.org/D55426 llvm-svn: 348953	2018-12-12 18:32:29 +00:00
Artem Belevich	f802b9324a	[NVPTX] do not rely on cached subtarget info. If a module has function references, but no functions themselves, we may end up never calling runOnMachineFunction and therefore would never initialize nvptxSubtarget field which would eventually cause a crash. Instead of relying on nvptxSubtarget being initialized by one of the methods, retrieve subtarget info directly. Differential Revision: https://reviews.llvm.org/D55580 llvm-svn: 348952	2018-12-12 18:31:04 +00:00
Sanjay Patel	44eaa492b8	[x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA This extends the code that handles 16-bit add promotion to form LEA to also allow 8-bit adds. That allows us to combine add ops with register moves and save some instructions. This is another step towards allowing add truncation in generic DAGCombiner (see D54640). Differential Revision: https://reviews.llvm.org/D55494 llvm-svn: 348946	2018-12-12 17:58:27 +00:00
Neil Henning	76504a4c5e	[AMDGPU] Extend the SI Load/Store optimizer to combine more things. I've extended the load/store optimizer to be able to produce dwordx3 loads and stores. This change allows many more load/stores to be combined, and results in much more optimal code for our hardware. Differential Revision: https://reviews.llvm.org/D54042 llvm-svn: 348937	2018-12-12 16:15:21 +00:00
Simon Atanasyan	fa020082e4	[mips] Enable using of integrated assembler in all cases. llvm-svn: 348934	2018-12-12 15:32:03 +00:00
Piotr Sobczak	3732b4ce25	[AMDGPU] Set metadata access for explicit section Summary: This patch provides a means to set Metadata section kind for a global variable, if its explicit section name is prefixed with ".AMDGPU.metadata." This could be useful to make the global variable go to an ELF section without any section flags set. Reviewers: dstuttard, tpr, kzhuravl, nhaehnle, t-tye Reviewed By: dstuttard, kzhuravl Subscribers: llvm-commits, arsenm, jvesely, wdng, yaxunl, t-tye Differential Revision: https://reviews.llvm.org/D55267 llvm-svn: 348922	2018-12-12 11:20:04 +00:00
Diana Picus	59720b422a	[ARM GlobalISel] Select load/store for Thumb2 Unfortunately we can't use TableGen for this because it doesn't yet support predicates on the source pattern root. Therefore, add a bit of handwritten code to the instruction selector to handle the most basic cases. Also mark them as legal and extract their legalizer test cases to a new test file. llvm-svn: 348920	2018-12-12 10:32:15 +00:00
Jonas Paulsson	896775c2d3	[SystemZ] Minor cleanup of SchedModels Some fixes of a few InstRWs for z13 and z14. Review: Ulrich Weigand llvm-svn: 348917	2018-12-12 08:26:24 +00:00
Craig Topper	1fe466689b	[X86] Combine vpmovdw+vpacksswb into vpmovdb. This is similar to the combine we already have for vpmovdw+vpackuswb. llvm-svn: 348910	2018-12-12 05:56:01 +00:00
Mandeep Singh Grang	802dc40f41	[COFF, ARM64] Emit COFF function header Summary: Emit COFF header when printing out the function. This is important as the header contains two important pieces of information: the storage class for the symbol and the symbol type information. This bit of information is required for the linker to correctly identify the type of symbol that it is dealing with. This patch mimics X86 and ARM COFF behavior for function header emission. Reviewers: rnk, mstorsjo, compnerd, TomTan, ssijaric Reviewed By: mstorsjo Subscribers: dmajor, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D55535 llvm-svn: 348875	2018-12-11 18:36:14 +00:00
Craig Topper	b51283bfd7	Fix incorrect imm operand assertion for SUB32ri in X86CondBrFolding::analyzeCompare Summary: When X86CondBrFolding::analyzeCompare runs, it can meet a SUB32ri instruction like the one below, which uses a global address as its operand: %733:gr32 = SUB32ri %62:gr32(tied-def 0), @img2buf_normal, implicit-def $eflags JNE_1 %bb.41, implicit $eflags So the assertion "assert(MI.getOperand(ValueIndex).isImm() && "Expecting Imm operand")" is not correct. Change the assert to an if, so that X86CondBrFolding::analyzeCompare returns false (compare not found) in this case. Patch by Jianping Chen Reviewers: smaslov, LuoYuanke, liutianle, Jianping Reviewed By: Jianping Subscribers: lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D54250 llvm-svn: 348853	2018-12-11 15:32:14 +00:00
Sanjay Patel	05e36982dd	[x86] clean up code for converting 16-bit ops to LEA; NFC As discussed in D55494, we want to extend this to handle 8-bit ops too, but that could be extended further to enable this on 32-bit systems too. llvm-svn: 348851	2018-12-11 15:29:40 +00:00
Sanjay Patel	9765ba5f86	[x86] remove dead code for 16-bit LEA formation; NFC As discussed in: D55494 ...this code has been disabled/dead for a long time (the code references Athlon and Pentium 4), and there's almost no chance that it will be used given the last decade of uarch evolution. Also, in SDAG we promote 16-bit ops to 32-bit, so there's almost no way to test this code any more. llvm-svn: 348845	2018-12-11 14:05:03 +00:00
Martell Malone	0b3ddec7ed	[PPC][NFC] store operands are dst not src Differential Revision: https://reviews.llvm.org/D55502 llvm-svn: 348826	2018-12-11 03:14:56 +00:00
Heejin Ahn	be5e5874f6	[WebAssembly] Add '.eventtype' directive support Summary: This patch supports `.eventtype` directive printing and parsing in the same syntax with `.functype`. Reviewers: aardappel, sbc100 Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55353 llvm-svn: 348818	2018-12-11 01:11:04 +00:00
Heejin Ahn	21d45a2c98	[WebAssembly] TargetStreamer cleanup (NFC) Summary: - Unify mixed argument names (`Symbol` and `Sym`) to `Sym` - Changed `MCSymbolWasm` argument of `emit*` functions to `const MCSymbolWasm`. It seems not very intuitive that emit function in the streamer modifies symbol contents. - Moved empty function bodies to the header - clang-format Reviewers: aardappel, dschuff, sbc100 Subscribers: jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55347 llvm-svn: 348816	2018-12-11 00:53:59 +00:00
Aditya Nandakumar	cef44a2342	[GISel]: Refactor MachineIRBuilder to allow passing additional parameters to build Instrs https://reviews.llvm.org/D55294 Previously MachineIRBuilder::buildInstr used to accept variadic arguments for sources (which were either unsigned or MachineInstrBuilder). While this worked well in common cases, it doesn't allow us to build instructions that have multiple destinations. Additionally, passing in other optional parameters at the end (such as flags) is not trivially possible. Also, a trivial call such as B.buildInstr(Opc, Reg1, Reg2, Reg3) can be interpreted differently based on the opcode (2 defs + 1 src for unmerge vs 1 def + 2 srcs). This patch refactors buildInstr to buildInstr(Opc, ArrayRef<DstOps>, ArrayRef<SrcOps>) where DstOps and SrcOps are typed unions that know how to add themselves to MachineInstrBuilder. After this patch, most invocations would look like B.buildInstr(Opc, {s32, DstReg}, {SrcRegs..., SrcMIBs..}); Now all the other calls (such as buildAdd, buildSub etc) forward to buildInstr. It also makes it possible to build instructions with multiple defs. Additionally, in a subsequent patch, we should make it possible to add flags directly while building instructions. Additionally, the main buildInstr method is now virtual, so other builders only have to override buildInstr (for, say, constant folding/CSEing). Also attached here (https://reviews.llvm.org/F7675680) is a clang-tidy patch that should upgrade the API calls if necessary. llvm-svn: 348815	2018-12-11 00:48:50 +00:00
Krzysztof Parzyszek	9f003f9262	[Hexagon] Couple of fixes in optimize addressing mode - Check if an operand is an immediate before calling getImm. Some operands that take constant values can actually have global symbols or other constant expressions. - When a load-constant instruction can be folded into users, make sure to only delete it when all users have been successfully converted. llvm-svn: 348802	2018-12-10 21:56:04 +00:00
Krzysztof Parzyszek	c1b2d5905a	Revert "[Hexagon] Check if operand is an immediate before getImm" This reverts r348787. The patch wasn't quite correct. llvm-svn: 348792	2018-12-10 19:30:08 +00:00
Amara Emerson	5ec146046c	[GlobalISel] Restrict G_MERGE_VALUES capability and replace with new opcodes. This patch restricts the capability of G_MERGE_VALUES, and uses the new G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places. This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32> and <2 x s64> vectors. Differential Revisions: https://reviews.llvm.org/D53629 llvm-svn: 348788	2018-12-10 18:44:58 +00:00
Krzysztof Parzyszek	c6e9380a56	[Hexagon] Check if operand is an immediate before getImm llvm-svn: 348787	2018-12-10 18:39:47 +00:00
Krzysztof Parzyszek	914f2d1c46	[Hexagon] Add patterns for any_extend from i1 and short vectors of i1 llvm-svn: 348785	2018-12-10 18:36:06 +00:00
Sanjay Patel	134f56e702	[x86] fix formatting; NFC This should really be generalized to allow increment and/or we should replace it by using ISD::matchUnaryPredicate(). See D55515 for context. llvm-svn: 348776	2018-12-10 17:23:44 +00:00
Evandro Menezes	53f0d41dc4	[AArch64] Refactor the Exynos scheduling predicates Refactor the scheduling predicates based on `MCInstPredicate`. In this case, for the Exynos processors. Differential revision: https://reviews.llvm.org/D55345 llvm-svn: 348774	2018-12-10 17:17:26 +00:00
Neil Henning	e448351b77	[AMDGPU] Change the l1 flush instruction for AMDPAL/MESA3D. This commit changes which l1 flush instruction is used for AMDPAL and MESA3d workloads to flush the entire l1 cache instead of just the volatile lines. Differential Revision: https://reviews.llvm.org/D55367 llvm-svn: 348771	2018-12-10 16:35:53 +00:00
Evandro Menezes	1ec1a0d342	[AArch64] Refactor the scheduling predicates Refactor the scheduling predicates based on `MCInstPredicate`. Augment the number of helper predicates used by processor specific predicates. Differential revision: https://reviews.llvm.org/D55375 llvm-svn: 348768	2018-12-10 16:24:30 +00:00
Tim Corringham	2faadb15f4	[AMDGPU] Add new Mode Register pass - minor fix Trivial change to add parentheses to an expression to avoid a sanitizer error in SIModeRegister.cpp, which was committed earlier. llvm-svn: 348767	2018-12-10 16:23:30 +00:00
Cameron McInally	872ed41a1e	[AVX512] Update typo in comment Should be "Sae" for "Suppress All Exceptions". NFC llvm-svn: 348763	2018-12-10 15:21:35 +00:00
Vladimir Stefanovic	4433f93afe	[mips][mc] Emit R_{MICRO}MIPS_JALR when expanding jal to jalr When replacing jal with jalr, also emit '.reloc R_MIPS_JALR' (R_MICROMIPS_JALR for micromips). The linker might then be able to turn jalr into a direct call. Add '-mips-jalr-reloc' to enable/disable this feature (default is true). Differential revision: https://reviews.llvm.org/D55292 llvm-svn: 348760	2018-12-10 15:07:36 +00:00
Tim Corringham	4c4d2fe280	[AMDGPU] Add new Mode Register pass A new pass to manage the Mode register. Currently this just manages the floating point double precision rounding requirements, but is intended to be easily extended to encompass all Mode register settings. The immediate motivation comes from the requirement to use the round-to-zero rounding mode for the 16 bit interpolation instructions, where the rounding mode setting is shared between 16 and 64 bit operations. llvm-svn: 348754	2018-12-10 12:06:10 +00:00
Nikita Popov	e79477895e	[X86] Fix AvoidStoreForwardingBlocks pass for negative displacements Fixes https://bugs.llvm.org/show_bug.cgi?id=39926. The size of the first copy was computed as std::abs(std::abs(LdDisp2) - std::abs(LdDisp1)), which results in skipped bytes if the signs of LdDisp2 and LdDisp1 differ. As far as I can see, this should just be LdDisp2 - LdDisp1. The case where LdDisp1 > LdDisp2 is already handled in the code above, in which case LdDisp2 is set to LdDisp1 and this subtraction will evaluate to Size1 = 0, which is the correct value to skip an overlapping copy. Differential Revision: https://reviews.llvm.org/D55485 llvm-svn: 348750	2018-12-10 10:16:50 +00:00
Craig Topper	02b614abc8	[X86] Merge addcarryx/addcarry intrinsic into a single addcarry intrinsic. Both intrinsics do the exact same thing so we really only need one. Earlier in the 8.0 cycle we changed the signature of this intrinsic without renaming it. But it looks difficult to get the autoupgrade code to allow me to merge the intrinsics and change the signature at the same time. So I've renamed the intrinsic slightly for the new merged intrinsic. I'm skipping autoupgrading from the previous new to 8.0 signature. I've also renamed the subborrow for consistency. llvm-svn: 348737	2018-12-10 06:07:50 +00:00
Brian Gesiak	b963c5150d	[AMDGPU] Fix discarded result of addAttribute Summary: `llvm::AttributeList` and `llvm::AttributeSet` are immutable, and so methods defined on these classes, such as `addAttribute`, return a new immutable object with the attribute added. In https://reviews.llvm.org/D55217 I attempted to annotate methods such as `addAttribute` with `LLVM_NODISCARD`, since calling these methods has no side-effects, and so ignoring the result that is returned is almost certainly a programmer error. However, committing the change resulted in new warnings in the AMDGPU target. The AMDGPU simplify libcalls pass added in https://reviews.llvm.org/D36436 attempts to add the readonly and nounwind attributes to simplified library functions, but instead calls the `addAttribute` methods and ignores the result. Modify the simplify libcalls pass to actually add the nounwind and readonly attributes. Also update the simplify libcalls test to assert that these attributes are actually being set. Reviewers: rampitec, vpykhtin, rnk Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55435 llvm-svn: 348732	2018-12-09 21:56:50 +00:00
Craig Topper	2b09d17d93	[X86] If the carry input to an addcarry/subborrow intrinsic is known to be 0, emit a flag setting ADD/SUB instead of ADC/SBB. Previously we had to take the carry in and add -1 to it to set the carry flag so we could use it with ADC/SBB. But if we know it's 0 then we don't need to bother. This should go a long way towards fixing PR24545. llvm-svn: 348727	2018-12-09 18:02:37 +00:00
Nico Weber	b961661977	Remove unneeded dependency from lib/Target/X86/Utils/ to lib/IR (aka Core). The dependency was added in r213995 in response to r213986 which did make X86/Utils depend on IR, but r256680 later removed that dependency again. llvm-svn: 348724	2018-12-09 15:15:13 +00:00
Sanjay Patel	19bc850220	[x86] don't try to convert add with undef operands to LEA The existing code tries to handle an undef operand while transforming an add to an LEA, but it's incomplete because we will crash on the i16 test with the debug output shown below. It's better to just give up instead. Really, GlobalIsel should have folded these before we could get into trouble. # Machine code for function add_undef_i16: NoPHIs, TracksLiveness, Legalized, RegBankSelected, Selected bb.0 (%ir-block.0): liveins: $edi %1:gr32 = COPY killed $edi %0:gr16 = COPY %1.sub_16bit:gr32 %5:gr64_nosp = IMPLICIT_DEF %5.sub_16bit:gr64_nosp = COPY %0:gr16 %6:gr64_nosp = IMPLICIT_DEF %6.sub_16bit:gr64_nosp = COPY %2:gr16 %4:gr32 = LEA64_32r killed %5:gr64_nosp, 1, killed %6:gr64_nosp, 0, $noreg %3:gr16 = COPY killed %4.sub_16bit:gr32 $ax = COPY killed %3:gr16 RET 0, implicit killed $ax # End machine code for function add_undef_i16. * Bad machine code: Reading virtual register without a def * - function: add_undef_i16 - basic block: %bb.0 (0x7fe6cd83d940) - instruction: %6.sub_16bit:gr64_nosp = COPY %2:gr16 - operand 1: %2:gr16 LLVM ERROR: Found 1 machine code errors. Differential Revision: https://reviews.llvm.org/D54710 llvm-svn: 348722	2018-12-09 14:40:37 +00:00
Simon Pilgrim	e9d8275e43	[X86] Extend pfm counter coverage for llvm-exegesis Extension to rL348617, turns out llvm-exegesis doesn't need to match the perf counter name against a scheduler model resource name - so I've added a few more counters that I could find in the libpfm4 source code (and fix a typo in the knl/knm retired_uops counter - which uses 'all' instead of 'any'). llvm-svn: 348721	2018-12-09 13:45:15 +00:00
Matt Arsenault	b5613ecf17	AMDGPU: Fix offsets for < 4-byte aggregate kernel arguments We were still using the rounded down offset and alignment even though they aren't handled because you can't trivially bitcast the loaded value. llvm-svn: 348658	2018-12-07 22:12:17 +00:00
Krzysztof Parzyszek	b754f7a2e0	[Hexagon] Fix post-ra expansion of PS_wselect llvm-svn: 348655	2018-12-07 22:00:53 +00:00
Simon Pilgrim	44dfd81d01	Fix unused variable warning. NFCI. llvm-svn: 348649	2018-12-07 21:44:25 +00:00
Heejin Ahn	7ce5edf1ea	[WebAssembly] clang-format/clang-tidy AsmParser (NFC) Summary: - LLVM clang-format style doesn't allow one-line ifs. - LLVM clang-tidy style says method names should start with a lowercase letter. But currently WebAssemblyAsmParser's parent class MCTargetAsmParser is mixing lowercase and uppercase method names itself so overridden methods cannot be renamed now. - Changed else ifs after returns to ifs. - Added some newlines for readability. Reviewers: aardappel, sbc100 Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55350 llvm-svn: 348648	2018-12-07 21:35:37 +00:00
Heejin Ahn	d2fd70991d	Delete registerScope function `unregisterScope()` is not currently used, so removing it. llvm-svn: 348647	2018-12-07 21:31:14 +00:00
Simon Pilgrim	9b8fdab26c	[X86] Replace instregex with instrs list. NFCI. llvm-svn: 348626	2018-12-07 18:47:05 +00:00
Matt Arsenault	ce2e053134	AMDGPU: Allow f32 types for llvm.amdgcn.s.buffer.load llvm-svn: 348625	2018-12-07 18:41:39 +00:00
Craig Topper	ba3ab78291	[X86] Initialize and Register X86CondBrFoldingPass This allows X86CondBrFoldingPass to be run with the --run-pass option, which makes it possible to test an incorrect assertion in the analyzeCompare function for SUB32ri when its operand is not an imm. Patch by Jianping Chen Differential Revision: https://reviews.llvm.org/D55412 llvm-svn: 348620	2018-12-07 18:10:34 +00:00
Matt Arsenault	ca8eb0b672	AMDGPU: Remove llvm.SI.tbuffer.store llvm-svn: 348619	2018-12-07 18:03:47 +00:00
Simon Pilgrim	6155b32250	[X86] Improve pfm counter coverage for llvm-exegesis This patch attempts to improve pfm perf counter coverage for all the x86 CPUs that libpfm4 supports. Intel/AMD CPU families tend to share names for cycle/uops counters so even if they don't have a scheduler model yet they can at least use the default values (checked against the libpfm4 source code). The remaining CPUs (where their port/pipe resource counters are known) I've tried to add to the existing model mappings. These are untested but don't represent a regression to current llvm-exegesis behaviour for these CPUs. Differential Revision: https://reviews.llvm.org/D55432 llvm-svn: 348617	2018-12-07 17:48:40 +00:00
Matt Arsenault	3ff764a944	AMDGPU: Remove llvm.SI.buffer.load.dword llvm-svn: 348616	2018-12-07 17:46:20 +00:00
Matt Arsenault	aa9bcd56b1	AMDGPU: Remove llvm.AMDGPU.kill This is the last of the old AMDGPU intrinsics. llvm-svn: 348615	2018-12-07 17:46:16 +00:00
Graham Sellers	b297379ef0	[AMDGPU] Shrink scalar AND, OR, XOR instructions This change attempts to shrink scalar AND, OR and XOR instructions which take an immediate that isn't inlineable. It performs: AND s0, s0, ~(1 << n) -> BITSET0 s0, n OR s0, s0, (1 << n) -> BITSET1 s0, n AND s0, s1, x -> ANDN2 s0, s1, ~x OR s0, s1, x -> ORN2 s0, s1, ~x XOR s0, s1, x -> XNOR s0, s1, ~x In particular, this catches setting and clearing the sign bit for fabs (and x, 0x7fffffff -> bitset0 x, 31 and or x, 0x80000000 -> bitset1 x, 31). llvm-svn: 348601	2018-12-07 15:33:21 +00:00
Tim Northover	4bf394be3a	ARM: use correct offset from base pointer (r6) in call frame regions. When we had dynamic call frames (i.e. sp adjustment around each call) we were including that adjustment into offsets calculated based on r6, even though it's only sp that changes. This led to incorrect stack slot accesses. llvm-svn: 348591	2018-12-07 13:43:55 +00:00
David Green	ca29c271d2	[Targets] Add errors for tiny and kernel codemodel on targets that don't support them Adds fatal errors for any target that does not support the Tiny or Kernel codemodels by rejigging the getEffectiveCodeModel calls. Differential Revision: https://reviews.llvm.org/D50141 llvm-svn: 348585	2018-12-07 12:10:23 +00:00
Simon Pilgrim	74c371da7b	Fix gcc7.3 -Wparentheses warning. NFCI. llvm-svn: 348581	2018-12-07 11:10:03 +00:00
Simon Pilgrim	9c7d85bc62	[X86] Add ivybridge to llvm-exegesis PFM counter mappings llvm-svn: 348575	2018-12-07 09:27:35 +00:00
Zi Xuan Wu	cf4d477b0b	[PowerPC] Fix machine verifier assert caused by a missing undef register flag Fix an assert about using an undefined physical register in the machine instruction verify pass. The reason is that the undef register flag is missing after the transformation done by the If Conversion pass. ``` Bad machine code: Using an undefined physical register - function: func_65 - basic block: %bb.0 entry (0x10024740738) - instruction: BCLR killed $cr5lt, implicit $lr8, implicit $rm, implicit undef $x3 - operand 0: killed $cr5lt LLVM ERROR: Found 1 machine code errors. ``` There are also other existing testcases with the same issue, so I added the -verify-machineinstrs option to enable verification. Differential Revision: https://reviews.llvm.org/D55408 llvm-svn: 348566	2018-12-07 05:25:16 +00:00
Craig Topper	2c7a9476e0	[X86] Directly create ADC/SBB nodes instead of using ADD/SUB with (and SETCC_CARRY, 1) This addresses a FIXME and avoids depending on an isel pattern match I think. I've removed the isel patterns too since we have no lit tests left that cover them. Hopefully that really means they are unused. I'm trying to decide if we need SETCC_CARRY. This removes one of its usages. Differential Revision: https://reviews.llvm.org/D55355 llvm-svn: 348536	2018-12-06 22:26:59 +00:00
Evandro Menezes	799b76eae2	[AArch64] Fix Exynos predicate Fix predicate for arithmetic instructions with shift and/or extend. llvm-svn: 348510	2018-12-06 18:25:37 +00:00
Simon Pilgrim	bb650daeaf	[X86] Refactored IsSplatVector to use switch. NFCI. Initial step towards making the function more generic (and probably move into SelectionDAG). This is necessary to avoid massive codegen bloat for PR38243 (Add modulo rotate support to LowerRotate). llvm-svn: 348498	2018-12-06 16:29:14 +00:00
Alexey Bataev	2e1a782189	[DEBUGINFO, NVPTX] Disable emission of ',debug' option if only debug directives are allowed. Summary: If the output of debug directives only is requested, we should drop emission of the ',debug' option from the target directive. Required to support the nvprof profiler. Reviewers: echristo Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D46061 llvm-svn: 348497	2018-12-06 16:25:35 +00:00
Alexey Bataev	64ad0ad5ed	[DEBUGINFO, NVPTX]Emit last debugging directives. Summary: We may end up with unemitted debug directives at the end of module emission. This patch fixes the problem by emitting those last directives at the end of module emission. Reviewers: echristo Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D54320 llvm-svn: 348495	2018-12-06 16:02:09 +00:00
Diogo N. Sampaio	9c9067316b	[NFC][AArch64] Split out backend features This patch splits backend features currently hidden behind architecture versions. For example, currently the only way to activate the complex numbers extension is to target a v8.3 architecture, whereas after the patch this extension can be added separately. This refactoring is required by the new command lines proposal: http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html Reviewers: DavidSpickett, olista01, t.p.northover Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio Differential revision: https://reviews.llvm.org/D54633 -- It was reverted in rL348249 due to a build bot failure in one of the regression tests: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/14386 The problem seems to be that FileCheck behaves differently on Windows and Linux. This new patch splits the test file into multiple files, and does more exact pattern matching in an attempt to circumvent the issue. llvm-svn: 348493	2018-12-06 15:39:17 +00:00
Nicolai Haehnle	ca4a32945f	AMDGPU: Generate VALU ThreeOp Integer instructions Summary: Original patch by: Fabian Wahlster <razor@singul4rity.com> Change-Id: I148f692a88432541fad468963f58da9ddf79fac5 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, b-sumner, llvm-commits Differential Revision: https://reviews.llvm.org/D51995 llvm-svn: 348488	2018-12-06 14:33:40 +00:00
Valery Pykhtin	f479fbba5f	[AMDGPU] Partial revert of rL348371: Turn on the DPP combiner by default Turn the combiner back off as there're failures until the issue is fixed. Differential revision: https://reviews.llvm.org/D55314 llvm-svn: 348487	2018-12-06 14:20:02 +00:00
Diana Picus	1027249ec9	[ARM GlobalISel] Nothing is legal for Thumb ...yet! A lot of the current code should be shared for arm and thumb mode, but until we add tests and work out some of the details (e.g. checking the correct subtarget feature for G_SDIV) it's safer to bail out as early as possible for thumb targets. This should have arguably been part of r348347, which allowed Thumb functions to be handled by the IR Translator. llvm-svn: 348472	2018-12-06 09:26:14 +00:00
Craig Topper	6a6d77b851	[X86] Remove some leftover code for handling an i1 setcc type. NFC We should only need to handle i8 now. llvm-svn: 348460	2018-12-06 07:00:02 +00:00
Matthias Braun	d041212c07	AArch64: Fix invalid CCMP emission The code emitting AND-subtrees used to check whether any of the operands was an OR in order to figure out if the result needs to be negated. However the OR could be hidden in further subtrees and not immediately visible. Change the code so that canEmitConjunction() determines whether the result of the generated subtree needs to be negated. Cleanup emission logic to use this. I also changed the code a bit to make all negation decisions early before we actually emit the subtrees. This fixes http://llvm.org/PR39550 Differential Revision: https://reviews.llvm.org/D54137 llvm-svn: 348444	2018-12-06 01:40:23 +00:00
Krzysztof Parzyszek	8eb394d764	[Hexagon] Add intrinsics for Hexagon V66 llvm-svn: 348413	2018-12-05 21:14:51 +00:00
Krzysztof Parzyszek	545a68ca4b	[Hexagon] Add instruction definitions for Hexagon V66 llvm-svn: 348411	2018-12-05 21:01:07 +00:00
Krzysztof Parzyszek	13a9cf28a1	[Hexagon] Foundation of support for Hexagon V66 llvm-svn: 348407	2018-12-05 20:18:09 +00:00
Aditya Nandakumar	f75d4f329c	[GISel]: Provide standard interface to observe changes in GISel passes https://reviews.llvm.org/D54980 This provides a standard API across GISel passes to observe and notify passes about changes (insertions/deletions/mutations) to MachineInstrs. This patch also removes the recordInsertion method in MachineIRBuilder and instead provides method to setObserver. Reviewed by: vkeles. llvm-svn: 348406	2018-12-05 20:14:52 +00:00
Evandro Menezes	4b5707550c	[AArch64] Reword description of feature (NFC) Reword the description of the feature that enables custom handling of cheap instructions. llvm-svn: 348398	2018-12-05 18:42:57 +00:00
Chandler Carruth	71c14a36a2	[SLH] Fix a nasty bug in SLH. Whenever we effectively take the address of a basic block we need to manually update that basic block to reflect that fact or later passes such as tail duplication and tail merging can break the invariants of the code. =/ Sadly, there doesn't appear to be any good way of automating this or even writing a reasonable assert to catch it early. The change seems trivially and obviously correct, but sadly the only really good test case I have is 1000s of basic blocks. I've tried directly writing a test case that happens to make tail duplication do something that crashes later on, but this appears to require an amazingly complex set of conditions that I've not yet reproduced. The change is technically covered by the tests because we mark the blocks as having their address taken, but that doesn't really count as properly testing the functionality. llvm-svn: 348374	2018-12-05 15:42:11 +00:00
Valery Pykhtin	5b4db77b13	[AMDGPU]: Turn on the DPP combiner by default Differential revision: https://reviews.llvm.org/D55314 llvm-svn: 348371	2018-12-05 15:21:17 +00:00
Simon Pilgrim	32483668d7	[X86][SSE] Begun adding modulo rotate support to LowerRotate Prep work for PR38243 - mainly adding comments on where we need to add modulo support (doing so at the moment causes massive codegen regressions). I've also consistently added support for modulo folding for uniform constants (although at the moment we have no way to trigger this) and removed the old assertions. llvm-svn: 348366	2018-12-05 14:46:37 +00:00
Simon Pilgrim	180639afe5	[SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467) This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions. Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code. Differential Revision: https://reviews.llvm.org/D54698 llvm-svn: 348353	2018-12-05 11:12:12 +00:00
Diana Picus	8a1b4f57c9	[ARM GlobalISel] Implement call lowering for Thumb2 The only things that are different from arm are: * different opcodes for calls and returns * Thumb calls take predicate operands llvm-svn: 348347	2018-12-05 10:35:28 +00:00
Saleem Abdulrasool	efd2cb8a0d	AArch64: support funclets in fastcall and swift_call Functions annotated with `__fastcall` or `__attribute__((__fastcall__))` or `__attribute__((__swiftcall__))` may contain SEH handlers even on Win64. This matches the behaviour of cl which allows for `__try`/`__except` inside a `__fastcall` function. This was detected while trying to self-host clang on Windows ARM64. llvm-svn: 348337	2018-12-05 07:09:20 +00:00
Amara Emerson	8547f4fb7f	[AArch64][GlobalISel] Re-enable selection of volatile loads. We previously disabled this in r323371 because of a bug where we selected an extending load, but didn't delete the old G_LOAD, resulting in two loads being generated for volatile loads. Since we now have dedicated G_SEXTLOAD/G_ZEXTLOAD operations, and that the tablegen patterns should no longer be able to select (ext(load x)) patterns, it should be safe to re-enable it. The old test case should still work as expected. llvm-svn: 348320	2018-12-05 00:03:09 +00:00
Saleem Abdulrasool	a9248fdfab	AArch64: clean up some whitespace in Windows CC (NFC) Drive by clean up for Windows ARM64 variadic CC (NFC). llvm-svn: 348310	2018-12-04 22:19:29 +00:00
Nirav Dave	9c593e8676	[AVR] Silence fallthrough warning. NFC. llvm-svn: 348304	2018-12-04 21:41:52 +00:00
Stefan Pintilie	46f840f286	[PowerPC] Make no-PIC default to match GCC - LLVM Change the default for PowerPC LE to -fno-PIC. Differential Revision: https://reviews.llvm.org/D53383 llvm-svn: 348298	2018-12-04 20:14:57 +00:00
Nirav Dave	ce26c27b2a	[SelectionDAG] Redefine isGAPlusOffset in terms of unwrapAddress. NFCI. llvm-svn: 348288	2018-12-04 17:59:43 +00:00
Matt Arsenault	0f414b81cc	AMDGPU: Add f32 vectors to SGPR register classes llvm-svn: 348286	2018-12-04 17:51:36 +00:00
Simon Pilgrim	07843640d5	[X86][SSE] Add SimplifyDemandedBitsForTargetNode handling for MOVMSK Moves existing SimplifyDemandedBits call out of combineMOVMSK and add SimplifyDemandedVectorElts call based on the sign bits we need. llvm-svn: 348282	2018-12-04 16:52:32 +00:00
Krzysztof Parzyszek	9fc0a2fe30	[Hexagon] Remove unused checker functions from asm parser llvm-svn: 348269	2018-12-04 14:58:14 +00:00
Simon Pilgrim	6a088b2ce5	Fix MSVC "unknown pragma" warning. NFCI. llvm-svn: 348256	2018-12-04 12:31:52 +00:00
Simon Pilgrim	58d44235e5	Fix -Wparentheses warning. NFCI. llvm-svn: 348254	2018-12-04 12:24:10 +00:00
Simon Pilgrim	b1d6db7693	[X86] Remove unnecessary peekThroughEXTRACT_SUBVECTORs call. The GetSplatValue/IsSplatVector call will call this anyhow and the later code is just for a v2i64 type so doesn't need it. llvm-svn: 348253	2018-12-04 12:21:43 +00:00
Simon Pilgrim	0add090e24	[TargetLowering] expandFP_TO_UINT - avoid FPE due to out of range conversion (PR17686) PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction. The existing code converts both the inrange and outofrange cases using FP_TO_SINT and then selects the result, this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment. The X87 cases don't need the strict flag but generates much nicer code with it.... Differential Revision: https://reviews.llvm.org/D53794 llvm-svn: 348251	2018-12-04 11:21:30 +00:00
Simon Pilgrim	1a2e0200ac	Revert rL348121 from llvm/trunk: [NFC][AArch64] Split out backend features This patch splits backend features currently hidden behind architecture versions. For example, currently the only way to activate the complex numbers extension is targeting a v8.3 architecture, whereas after the patch this extension can be added separately. This refactoring is required by the new command lines proposal: http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html Reviewers: DavidSpickett, olista01, t.p.northover Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio Differential revision: https://reviews.llvm.org/D54633 ........ This has been causing buildbot failures for the past 24 hours: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/14386 llvm-svn: 348249	2018-12-04 10:55:48 +00:00
Craig Topper	35585aff34	[X86] Remove custom DAG combine for SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG. We only needed this because it provided really aggressive constant folding even through constant pool entries created from build_vectors. The main case was for vXi8 MULH legalization which was happening as part of legalize DAG instead of as part of legalize vector ops. Now it's part of vector op legalization and we've added special handling for build vectors of all constants there. This has removed the need for this code on the lit tests we have. llvm-svn: 348237	2018-12-04 04:51:07 +00:00
Sanjin Sijaric	dc6403d133	[ARM64][Windows] Fix local stack size for funclets The comment was misplaced, and the code didn't do what the comment indicated, namely ignoring the varargs portion when computing the local stack size of a funclet in emitEpilogue. This results in incorrect offset computations within funclets that are contained in vararg functions. Differential Revision: https://reviews.llvm.org/D55096 llvm-svn: 348222	2018-12-04 00:54:52 +00:00
Jessica Paquette	bce2086ad1	[MachineOutliner] Move stack instr check logic to getOutliningCandidateInfo This moves the stack check logic into a lambda within getOutliningCandidateInfo. This allows us to be less conservative with stack checks. Whether or not a stack instruction is safe to outline is dependent on the frame variant and call variant of the outlined function; only in cases where we modify the stack can these be unsafe. So, if we move that logic later, when we're looking at an individual candidate, we can make better decisions here. This gives some code size savings as a result. llvm-svn: 348220	2018-12-04 00:31:55 +00:00
Jessica Paquette	2f5833ecd9	[MachineOutliner][AArch64][NFC] Add early exit to candidate discarding logic If we dropped too many candidates to be beneficial when dropping candidates that modify the stack, there's no reason to check for other cost model qualities. llvm-svn: 348219	2018-12-04 00:31:47 +00:00
Krzysztof Parzyszek	44c1f81b27	[Hexagon] Switch to auto-generated intrinsic definitions and patterns llvm-svn: 348206	2018-12-03 22:40:36 +00:00
Krzysztof Parzyszek	9dafa8a2c6	[Hexagon] Extract operand decoders into a separate file, NFC These decoders are automatically generated. Keeping them separated makes updating architectures easier. llvm-svn: 348196	2018-12-03 21:59:21 +00:00
Sanjay Patel	d24f63477d	[DAGCombiner] narrow truncated vector binops when legal This is the smallest vector enhancement I could find to D54640. Here, we're allowing narrowing to only legal vector ops because we'll see regressions without that. All of the test diffs are wins from what I can tell. With AVX/AVX512, we can shrink ymm/zmm ops to xmm. x86 vector multiplies are the problem case that we're avoiding due to the patchwork ISA, and it's not clear to me if we can dance around those regressions using TLI hooks or if we need preliminary patches to plug those holes. Differential Revision: https://reviews.llvm.org/D55126 llvm-svn: 348195	2018-12-03 21:57:35 +00:00
Simon Atanasyan	f76884b0d3	[mips] Fix TestDWARF32Version5Addr8AllForms test failure on MIPS hosts The `DIEExpr` is used in debug information entries for either TLS variables or call sites. For now the latter case is unsupported for targets with delay slots, MIPS in particular. The `DIEExpr::EmitValue` method calls a virtual `EmitDebugThreadLocal` routine which, in the case of MIPS, always emits either `.dtprelword` or `.dtpreldword` directives. That is okay for "main" code, but in unit tests `DIEExpr` instances can be created for things other than TLS variables, even on MIPS hosts. That is the reason for the `TestDWARF32Version5Addr8AllForms` failure: handling of the `R_MIPS_TLS_DTPREL` relocation writes an incorrect value into the DWARF structures. In any case, unconditionally emitting `.dtprelword` directives will be incorrect when/if debug information entries for call sites become supported on MIPS. The patch solves the problem by wrapping the expression created in the `MipsTargetObjectFile::getDebugThreadLocalSymbol` method into a `MipsMCExpr` expression with a new `MEK_DTPREL` tag. This tag is recognized in the `MipsAsmPrinter::EmitDebugThreadLocal` method, and `.dtprelword` directives are created in this case only. In other cases the expression is saved as regular data. Differential Revision: http://reviews.llvm.org/D54937 llvm-svn: 348194	2018-12-03 21:54:43 +00:00
Krzysztof Parzyszek	a45a55fc67	[Hexagon] Remove unused encodings, NFC llvm-svn: 348193	2018-12-03 21:49:12 +00:00
Wouter van Oortmerssen	c7b89f0f62	[WebAssembly] Enforce assembler emits to streamer in order. Summary: The assembler processes directives and instructions in whatever order they are in the file, then directly emits them to the streamer. This could cause badly written (or generated) .s files to produce incorrect binaries. It now has state that tracks what it has most recently seen, to enforce they are emitted in a given order that always produces correct wasm binaries. Also added a new test that compares obj2yaml output from llc (the backend) to that going via .s and the assembler to ensure both paths generate the same binaries. The features this test covers could be extended. Passes all wasm Lit tests. Fixes: https://bugs.llvm.org/show_bug.cgi?id=39557 Reviewers: sbc100, dschuff, aheejin Subscribers: jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55149 llvm-svn: 348185	2018-12-03 20:30:28 +00:00
Krzysztof Parzyszek	6290a73f29	[Hexagon] Update timing classes llvm-svn: 348183	2018-12-03 20:13:18 +00:00
Krzysztof Parzyszek	1cbc5cd364	[Hexagon] Change instruction type field in TSFlags to 7 bits llvm-svn: 348171	2018-12-03 19:34:04 +00:00
Jessica Paquette	2accb31690	[MachineOutliner] Drop candidates that require fixups if it's beneficial If it's a bigger code size win to drop candidates that require stack fixups than to demote every candidate to that variant, the outliner should do that. This happens if the number of bytes taken by calls to functions that don't require fixups, plus the number of bytes that'd be left is less than the number of bytes that it'd take to emit a save + restore for all candidates. Also add tests for each possible new behaviour. - machine-outliner-compatible-candidates shows that when we have candidates that don't use the stack, we can use the default call variant along with the no save/regsave variant. - machine-outliner-all-stack shows that when it's better to fix up the stack, we still will demote all candidates to that case - machine-outliner-drop-stack shows that we can discard candidates that require stack fixups when it would be beneficial to do so. llvm-svn: 348168	2018-12-03 19:11:27 +00:00
Krzysztof Parzyszek	71a7f447f6	[Hexagon] Add HasV5 predicate for compatibility with auto-generated files llvm-svn: 348167	2018-12-03 19:05:42 +00:00
Krzysztof Parzyszek	a55515f9a6	[Hexagon] Remove unused operand definitions, NFC llvm-svn: 348163	2018-12-03 18:54:24 +00:00
Krzysztof Parzyszek	7ecc277ef9	[Hexagon] Some formatting changes, NFC llvm-svn: 348162	2018-12-03 18:40:15 +00:00
Craig Topper	5440b63fa8	[X86] Teach LowerMUL/LowerMULH for vXi8 to unpack constant RHS. Summary: We need to unpackl and unpackh the operands to use two vXi16 multiplies. Previously it looks like the low unpack would get constant folded at least in the 128-bit case after shuffle lowering turned the unpackl into ZERO_EXTEND_VECTOR_INREG and X86 custom DAG combined it. The same doesn't happen for the high half. So we'd load a constant and then shuffle it. But the low half would just be loaded and used by the multiply directly. After this patch we now end up with a constant pool entry for the low and high unpacks separately with no shuffle operations. This is a step towards removing custom constant folding for ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG in the X86 backend. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55165 llvm-svn: 348159	2018-12-03 18:26:27 +00:00
Craig Topper	e35b01f8ea	[X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8. Summary: Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalized to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction. The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54836 llvm-svn: 348158	2018-12-03 18:26:24 +00:00
Jonas Paulsson	8ae0f88b13	[SystemZ::TTI] Return zero cost for ICmp that becomes Load And Test. A loaded value with multiple users compared with 0 will become a load and test single instruction. The load is not folded in this case (multiple users), but the compare instruction is eliminated. This patch returns 0 cost for the icmp in these cases. Review: Ulrich Weigand https://reviews.llvm.org/D55111 llvm-svn: 348141	2018-12-03 14:30:18 +00:00
Pablo Barrio	a17f855698	[AArch64] Add command-line option for SSBS Summary: SSBS (Speculative Store Bypass Safe) is only mandatory from 8.5 onwards but is optional from Armv8.0-A. This patch adds a command line option to enable SSBS, as it was previously only possible to enable by selecting -march=armv8.5-a. Similar patch upstream in GNU binutils: https://sourceware.org/ml/binutils/2018-09/msg00274.html Reviewers: olista01, samparker, aemerson Reviewed By: samparker Subscribers: javed.absar, kristof.beyls, kristina, llvm-commits Differential Revision: https://reviews.llvm.org/D54629 llvm-svn: 348137	2018-12-03 14:00:47 +00:00
Ron Lieberman	16de4fd2eb	[AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos The introduction of S_{ADD|SUB}_U64_PSEUDO instructions which are decomposed into VOP3 instruction pairs for S_ADD_U64_PSEUDO: V_ADD_I32_e64 V_ADDC_U32_e64 and for S_SUB_U64_PSEUDO V_SUB_I32_e64 V_SUBB_U32_e64 preclude the use of SDWA to encode a constant. SDWA: Sub-Dword addressing is supported on VOP1 and VOP2 instructions, but not on VOP3 instructions. We desire to fold the bit-and operand into the instruction encoding for the V_ADD_I32 instruction. This requires that we transform the VOP3 into a VOP2 form of the instruction (_e32). %19:vgpr_32 = V_AND_B32_e32 255, killed %16:vgpr_32, implicit $exec %47:vgpr_32, %49:sreg_64_xexec = V_ADD_I32_e64 %26.sub0:vreg_64, %19:vgpr_32, implicit $exec %48:vgpr_32, dead %50:sreg_64_xexec = V_ADDC_U32_e64 %26.sub1:vreg_64, %54:vgpr_32, killed %49:sreg_64_xexec, implicit $exec which then allows the SDWA encoding and becomes %47:vgpr_32 = V_ADD_I32_sdwa 0, %26.sub0:vreg_64, 0, killed %16:vgpr_32, 0, 6, 0, 6, 0, implicit-def $vcc, implicit $exec %48:vgpr_32 = V_ADDC_U32_e32 0, %26.sub1:vreg_64, implicit-def $vcc, implicit $vcc, implicit $exec Differential Revision: https://reviews.llvm.org/D54882 llvm-svn: 348132	2018-12-03 13:04:54 +00:00
Tim Northover	5745b6ac3b	ARM: use target-specific SUBS node when combining cmp with cmov. This has two positive effects. First, using a custom node prevents recombination leading to an infinite loop since the output DAG is notionally a little more complex than the input one. Using a flag-setting instruction also allows the subtraction to be folded with the related comparison more easily. https://reviews.llvm.org/D53190 llvm-svn: 348122	2018-12-03 11:16:21 +00:00
Diogo N. Sampaio	3c7d062b6b	[NFC][AArch64] Split out backend features This patch splits backend features currently hidden behind architecture versions. For example, currently the only way to activate the complex numbers extension is targeting a v8.3 architecture, whereas after the patch this extension can be added separately. This refactoring is required by the new command lines proposal: http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html Reviewers: DavidSpickett, olista01, t.p.northover Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio Differential revision: https://reviews.llvm.org/D54633 llvm-svn: 348121	2018-12-03 11:08:13 +00:00
Oliver Stannard	4cf35b4ab0	[ARM][MC] Move information about variadic register defs into tablegen Currently, variadic operands on an MCInst are assumed to be uses, because they come after the defs. However, this is not always the case, for example the Arm/Thumb LDM instructions write to a variable number of registers. This adds a property of instruction definitions which can be used to mark variadic operands as defs. This only affects MCInst, because MachineInstruction already tracks use/def per operand in each instance of the instruction, so can already represent this. This property can then be checked in MCInstrDesc, allowing us to remove some special cases in ARMAsmParser::isITBlockTerminator. Differential revision: https://reviews.llvm.org/D54853 llvm-svn: 348114	2018-12-03 10:32:42 +00:00
Oliver Stannard	c588110f13	[ARM][Asm] Debug trace for the processInstruction loop In the Arm assembly parser, we first match an instruction, then call processInstruction to possibly change it to a different encoding, to match rules in the architecture manual which can't be expressed by the table-generated matcher. This adds debug printing so that this process is visible when using the -debug option. To support this, I've added a new overload of MCInst::dump_pretty which takes the opcode name as a StringRef, since we don't have an InstPrinter instance in the assembly parser. Instead, we can get the same information directly from the MCInstrInfo. Differential revision: https://reviews.llvm.org/D54852 llvm-svn: 348113	2018-12-03 10:21:28 +00:00
Sjoerd Meijer	5afc957eba	[ARM] FP16: support vld1.16 for vector loads with post-increment Differential Revision: https://reviews.llvm.org/D55112 llvm-svn: 348110	2018-12-03 08:26:34 +00:00
Kang Zhang	51986417f9	[PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction Summary: Four instructions have an inconsistent ImmMustBeMultipleOf in the function PPCInstrInfo::instrHasImmForm: LFS, LFD, STFS, and STFD. These four instructions should set ImmMustBeMultipleOf to 1 instead of 4. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D54738 llvm-svn: 348109	2018-12-03 03:32:57 +00:00
QingShan Zhang	8b7653db72	[NFC] [PowerPC] add a routine in PPCTargetLowering to determine if a global is accessed as got-indirect or not. In theory, we should let the PPC target determine how to lower the TOC Entry for globals. And the PPCTargetLowering requires this query to do some optimization for TOC_Entry. Differential Revision: https://reviews.llvm.org/D54925 llvm-svn: 348108	2018-12-03 03:32:16 +00:00
Craig Topper	959b415e2f	[X86] Add a DAG combine to turn stores of vXi1 on pre-avx512 targets into a bitcast and a store of a iX scalar. llvm-svn: 348104	2018-12-02 19:47:14 +00:00
Craig Topper	6f54ff57fd	[X86] Fix bad comment. NFC llvm-svn: 348103	2018-12-02 19:47:13 +00:00
Craig Topper	204e4110e0	[X86] Simplify LowerBITCAST code for v2i32/v4i16/v8i8/i64->mmx/i64/f64 bitcast. Previously this code generated its own extracts and build_vector. But we can use a simpler concat_vectors or scalar_to_vector operation and let type legalization do additional legalization of those operations. llvm-svn: 348087	2018-12-02 07:52:39 +00:00
Craig Topper	4bb077910a	[X86] Add custom type legalization for v2i32/v4i16/v8i8->mmx bitcasts to avoid a store/load to/from the stack. Widen the input to a 128 bit vector by padding with undef elements. Then use a movdq2q to convert from xmm register to mmx register. llvm-svn: 348086	2018-12-02 05:46:50 +00:00
Craig Topper	ec096a1dae	[X86] Custom type legalize v2i32/v4i16/v8i8->i64 bitcasts in 64-bit mode similar to what's done when the destination is f64. The generic legalizer will fall back to a stack spill that uses a truncating store. That store will get expanded into a shuffle and non-truncating store on pre-avx512 targets. Once that happens the stack store/load pair will be combined away leaving behind the shuffle and bitcasts. On avx512 targets the truncating store is legal so doesn't get folded away. By custom legalizing it we can avoid this churn and maybe produce better code. llvm-svn: 348085	2018-12-02 05:46:48 +00:00
Jessica Paquette	9a7103b0f8	[MachineOutliner][AArch64] Improve checks for stack instructions If we know that we'll definitely save LR to a register, there's no reason to pre-check whether or not a stack instruction is unsafe to fix up. This makes it so that we check for that condition before mapping instructions. This allows us to outline more, since we don't pessimise as many instructions. Also update some tests, since we outline more. llvm-svn: 348081	2018-12-01 21:24:06 +00:00
Craig Topper	f4b13927e7	[X86] Don't use zero_extend_vector_inreg for mulhu lowering with sse 4.1 Summary: With sse4.1 we use two zero_extend_vector_inreg and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's 3 shuffles to extend one operand. The other operand is usually constant as this is mostly used by division by constant optimization. Pre sse4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles and an xor and a copy due to tied register constraints. That seems maybe better than the 3 shuffles. With AVX we avoid the copy so that's obviously better. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55138 llvm-svn: 348079	2018-12-01 19:26:31 +00:00
Graham Sellers	ba559ac058	[AMDGPU] Split 64-Bit XNOR to 64-Bit NOT/XOR The identity ~(x ^ y) == (~x ^ y) == (x ^ ~y) allows XNOR (XOR/NOT) to turn into NOT/XOR. Handling this case with its own split means we can make the NOT remain in the scalar unit. Previously, we split 64-bit XNOR into two 32-bit XNOR, then lowered. Now, we get three instructions (s_not, v_xor, v_xor) rather than four in the case where either of the sources is a scalar 64-bit. Add test cases to xnor.ll to attempt XNOR Vx, Sy and XNOR Sx, Vy. Also adding test that uses the opposite identity such that (~x ^ y) on the scalar unit (or vector for gfx906) can generate XNOR. This already worked, but I didn't see a test for it. Differential: https://reviews.llvm.org/D55071 llvm-svn: 348075	2018-12-01 12:27:53 +00:00
Alex Bradbury	757d296222	[RISCV] Remove RV64I SLLW/SRLW/SRAW patterns and add new test cases As noted by Eli Friedman <https://reviews.llvm.org/D52977?id=168629#1315291>, the RV64I shift patterns for SLLW/SRLW/SRAW make some incorrect assumptions. SRAW assumed that (sext_inreg foo, i32) could only be produced when sign-extending an i32. However, it can be produced by input such as: define i64 @tricky_ashr(i64 %a, i64 %b) { %1 = shl i64 %a, 32 %2 = ashr i64 %1, 32 %3 = ashr i64 %2, %b ret i64 %3 } It's important not to select sraw in the above case, because sraw only uses the lower 5 bits of the shift amount, while a shift of 32-63 would be valid. Similarly, the patterns for srlw assumed (and foo, 0xffffffff) would only be produced when zero-extending a value that was originally i32 in LLVM IR. This is obviously incorrect. This patch removes the SLLW/SRLW/SRAW shift patterns for the time being and adds test cases that would demonstrate a miscompile if the incorrect patterns were re-added. llvm-svn: 348067	2018-12-01 05:00:00 +00:00
Artem Belevich	e5664b1559	[NVPTX] Add lowering of i128 numbers as struct fields Addition to D34555 - override VTs computation with ComputePTXValueVTs for struct fields. Author: Denys Zariaiev<denys.zariaiev@gmail.com> Differential Revision: https://reviews.llvm.org/D55144 llvm-svn: 348057	2018-12-01 00:21:52 +00:00
Nicolai Haehnle	a7b00058e0	AMDGPU: Divergence-driven selection of scalar buffer load intrinsics Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53283 llvm-svn: 348050	2018-11-30 22:55:38 +00:00
Nicolai Haehnle	a9cc92c247	AMDGPU: Fix various issues around the VirtReg2Value mapping Summary: The VirtReg2Value mapping is crucial for getting consistently reliable divergence information into the SelectionDAG. This patch fixes a bunch of issues that lead to incorrect divergence info and introduces tight assertions to ensure we don't regress: 1. VirtReg2Value is generated lazily; there were some cases where a lookup was performed before all relevant virtual registers were created, leading to an out-of-sync mapping. Those cases were: - Complex code to lower formal arguments that generated CopyFromReg nodes from live-in registers (fixed by never querying the mapping for live-in registers). - Code that generates CopyToReg for formal arguments that are used outside the entry basic block (fixed by never querying the mapping for Register nodes, which don't need the divergence info anyway). 2. For complex values that are lowered to a sequence of registers, all registers must be reflected in the VirtReg2Value mapping. I am not adding any new tests, since I'm not actually aware of any bugs that these problems are causing with trunk as-is. However, I recently added a test case (in r346423) which fails when D53283 is applied without this change. Also, the new assertions should provide most of the effective test coverage. There is one test change in sdwa-peephole.ll. The underlying issue is that since the divergence info is now correct, the DAGISel will select V_OR_B32 directly instead of S_OR_B32. This leads to an extra COPY which affects the behavior of MachineLICM in a way that ends up with the S_MOV_B32 with the constant in a different basic block than the V_OR_B32, which is presumably what defeats the peephole. Reviewers: alex-t, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D54340 llvm-svn: 348049	2018-11-30 22:55:29 +00:00
Jessica Paquette	1cb18ec4ec	[MachineOutliner] Outline both register save calls + no LR save calls together Instead of treating the outlined functions for these as distinct frames, they should be combined into one case. Neither allows for stack fixups, and both generate the same frame. Thus, they ought to be considered one case. This makes the code far easier to understand, for one thing. It also offers some small code size improvements. It's fairly rare to see a class of outlined functions that doesn't fall entirely into one variant (on CTMark anyway). It does happen from time to time though. This mostly offers some serious simplification. Also update the test to show the added functionality. llvm-svn: 348036	2018-11-30 21:14:58 +00:00
Peter Collingbourne	35fcc294ab	AArch64: Don't emit CFI for SCS register in nounwind functions. All that you can legitimately do with the CFI for a nounwind function is get a backtrace, and adjusting the SCS register is not (currently) required for this purpose. Differential Revision: https://reviews.llvm.org/D54988 llvm-svn: 348035	2018-11-30 21:04:25 +00:00
Craig Topper	4d80f199e8	[X86] Change vXi8 MULHU lowering to unpack high and low half of lanes instead of extracting and concating low and high half registers. This reduces the number of shuffle operations that need to be done. The splitting strategy requires the shuffle unit for the extraction and the extension. With the unpack strategy the unpacks accomplish a splitting and extending in one operation. llvm-svn: 348019	2018-11-30 18:43:18 +00:00
Craig Topper	8191307d09	[X86] Prefer lowerVectorShuffleAsBitMask over using an avx512 masked operation when avx512bw/avx512vl is enabled. This does require a constant pool load instead of loading an immediate into a gpr, moving to a k register and masking. But it's fewer instructions and more consistent with previous ISAs. It probably opens up more combine opportunities as one of the test cases demonstrates. llvm-svn: 348018	2018-11-30 18:43:15 +00:00
Ron Lieberman	f48e43bbf7	[AMDGPU] Disable SReg Global LD/ST, perf regression Differential Revision: https://reviews.llvm.org/D55093 llvm-svn: 348014	2018-11-30 18:29:17 +00:00
Valery Pykhtin	3d9afa273f	[AMDGPU] Combine DPP mov with use instructions (VOP1/2/3) Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses. Differential revision: https://reviews.llvm.org/D53762 llvm-svn: 347993	2018-11-30 14:21:56 +00:00
Alex Bradbury	4830fdd21a	[RISCV] Add additional CSR instruction aliases (imm. operands) This patch adds CSR instructions aliases for the cases where the instruction takes an immediate operand but the alias doesn't have the i suffix. This is necessary for gas/gcc compatibility. gas doesn't do a similar conversion for fsflags or fsrm, so this should be complete. Differential Revision: https://reviews.llvm.org/D55008 Patch by Luís Marques. llvm-svn: 347991	2018-11-30 14:10:52 +00:00
Alex Bradbury	26403def69	[RISCV] Add UNIMP instruction (32- and 16-bit forms) This patch adds support for UNIMP in both 32- and 16-bit forms. The 32-bit form can be seen as a variant of the ECALL/EBREAK/etc. family of instructions. The 16-bit form is just all zeroes, which isn't a valid RISC-V instruction, but still follows the 16-bit instruction form (i.e. bits 0-1 != 11). Until recently unimp was undocumented and supported just by binutils, which printed unimp for either the 16 or 32-bit form. Both forms are now documented <https://github.com/riscv/riscv-asm-manual/pull/20> and binutils now supports c.unimp <https://sourceware.org/ml/binutils-cvs/2018-11/msg00179.html>. Differential Revision: https://reviews.llvm.org/D54316 Patch by Luís Marques. llvm-svn: 347988	2018-11-30 13:39:17 +00:00
Alex Bradbury	e0e62e97df	[TargetLowering][RISCV] Introduce isSExtCheaperThanZExt hook and implement for RISC-V DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend operands when it is able to do so. For some targets this is more expensive than a sign-extension, which is also a valid choice. Introduce the isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy == MVT::i64, as it can be performed using a single instruction. Differential Revision: https://reviews.llvm.org/D52978 llvm-svn: 347977	2018-11-30 09:56:54 +00:00
Alex Bradbury	bc96a98ed0	[RISCV] Introduce codegen patterns for instructions introduced in RV64I As discussed in the RFC <http://lists.llvm.org/pipermail/llvm-dev/2018-October/126690.html>, 64-bit RISC-V has i64 as the only legal integer type. This patch introduces patterns to support codegen of the new instructions introduced in RV64I: addiw, addw, subw, sllw, slliw, srlw, srliw, sraw, sraiw, ld, sd. Custom selection code is needed for srliw as SimplifyDemandedBits will remove lower bits from the mask, meaning the obvious pattern won't work: def : Pat<(sext_inreg (srl (and GPR:$rs1, 0xffffffff), uimm5:$shamt), i32), (SRLIW GPR:$rs1, uimm5:$shamt)>; This is sufficient to compile and execute all of the GCC torture suite for RV64I other than those files using frameaddr or returnaddr intrinsics (LegalizeDAG doesn't know how to promote the operands - a future patch addresses this). When promoting i32 sltu/sltiu operands, it would be more efficient to use sign-extension rather than zero-extension for RV64. A future patch adds a hook to allow this. Differential Revision: https://reviews.llvm.org/D52977 llvm-svn: 347973	2018-11-30 09:38:44 +00:00
Craig Topper	a2133061c0	[X86] Emit PACKUS directly from the v16i8 LowerMULH code instead of using a shuffle. llvm-svn: 347967	2018-11-30 08:32:05 +00:00
Craig Topper	6e4b266a0d	[X86] Change the pre-sse4.1 code in the v16i8 MULHU lowering to be what we get after DAG combine cleans it up. Previously we emitted a punpcklbw/punpckhbw to move the byte elements into the upper half of 16 bit elements then shifted right by 8 to zero the upper bits. After DAG combine we end up with punpcklbw/punpckhbw into the lower bits with zeros in the upper bits and no shifts. So just emit that directly. llvm-svn: 347966	2018-11-30 08:32:01 +00:00
Sjoerd Meijer	ecc7dcb879	[ARM] Don't expand sdiv when optimising for minsize Don't expand SDIV with an immediate that is a power of 2 if we optimise for minimum code size. For example: sdiv %1, i32 4 gets expanded to a sequence of 3 instructions, but this is suboptimal for minimum code size so instead we just generate a MOV and a SDIV if integer division is supported. Differential Revision: https://reviews.llvm.org/D54546 llvm-svn: 347965	2018-11-30 08:14:28 +00:00
Jonas Paulsson	b1d014883c	[SystemZ::TTI] i8/i16 operands extension costs revisited Three minor changes to these extra costs: * For ICmp instructions, instead of adding 2 all the time for extending each operand, this is only done if that operand is neither a load nor an immediate. * The operands extension costs for divides removed, because we now use a high cost already for the divide (20). * The lshr/ashr extra costs removed as this did not seem useful. Review: Ulrich Weigand https://reviews.llvm.org/D55053 llvm-svn: 347961	2018-11-30 07:09:34 +00:00
Craig Topper	0850e8a6b6	[X86] Fix a couple types in SimplifyDemandedVectorEltsForTargetNode. NFCI We had a EVT variable capturing the result of getSimpleValueType which returns an MVT. Another place using EVT that could have been MVT. And an 'int' that should be 'unsigned'. llvm-svn: 347959	2018-11-30 06:23:55 +00:00
Mircea Trofin	5e0b21fb45	Fix build warnings introduced in rL347938 Summary: Suppressed warnings in release builds due to variable used only in assert statement. Subscribers: llvm-commits, eraman, mgorny Differential Revision: https://reviews.llvm.org/D55100 llvm-svn: 347939	2018-11-30 01:53:17 +00:00
Mircea Trofin	f1a49e8525	Revert "Revert r347596 "Support for inserting profile-directed cache prefetches"" Summary: This reverts commit d8517b96dfbd42e6a8db33c50d1fa1e58e63fbb9. Fix: correct the use of DenseMap. Reviewers: davidxl, hans, wmi Reviewed By: wmi Subscribers: mgorny, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D55088 llvm-svn: 347938	2018-11-30 01:01:52 +00:00
Thomas Lively	66ea30c7bc	[WebAssembly] Expand unavailable integer operations for vectors Summary: Expands for vector types all of the integer operations that are expanded for scalars because they are not supported at all by WebAssembly. This CL has no tests because such tests would really be testing the target-independent expansion, but I'm happy to add tests if reviewers think it would be helpful. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55010 llvm-svn: 347923	2018-11-29 22:01:01 +00:00
Jonas Devlieghere	ccf7d4b4aa	Produce an error on non-encodable offsets for darwin ARM scattered relocations. Scattered ARM relocations for Mach-O's only have 24 bits available to encode the offset. This is not checked but just truncated and can result in corrupt binaries after linking because the relocations are applied to the wrong offset. This patch will check and error out in those situations instead of emitting a wrong relocation. Patch by: Sander Bogaert (dzn) Differential revision: https://reviews.llvm.org/D54776 llvm-svn: 347922	2018-11-29 21:58:23 +00:00
Alex Bradbury	66d9a752b9	[RISCV] Implement codegen for cmpxchg on RV32IA Utilise a similar ('late') lowering strategy to D47882. The changes to AtomicExpandPass allow this strategy to be utilised by other targets which implement shouldExpandAtomicCmpXchgInIR. All cmpxchg are lowered as 'strong' currently and failure ordering is ignored. This is conservative but correct. Differential Revision: https://reviews.llvm.org/D48131 llvm-svn: 347914	2018-11-29 20:43:42 +00:00
Craig Topper	73c1d75d58	[X86] Change the pre-type legalization DAG combine added in r347898 into a custom type legalization operation instead. This seems to produce the same results on the tests we have. llvm-svn: 347912	2018-11-29 20:18:58 +00:00
David Stuttard	c6603861d8	Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic" Also revert fix r347876 One of the buildbots was reporting a failure in some relevant tests that I can't repro or explain at present, so reverting until I can isolate. llvm-svn: 347911	2018-11-29 20:14:17 +00:00
Francis Visoiu Mistrih	0b8dd4488e	[MachineScheduler] Order FI-based memops based on stack direction It makes more sense to order FI-based memops in descending order when the stack goes down. This allows offsets to stay "consecutive" and allow easier pattern matching. llvm-svn: 347906	2018-11-29 20:03:19 +00:00
Craig Topper	129d529ab3	[SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG to run on the custom code before constant build_vectors are lowered in LegalizeDAG. I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse. Differential Revision: https://reviews.llvm.org/D54276 llvm-svn: 347902	2018-11-29 19:36:17 +00:00
Craig Topper	6cd0b17078	[X86] Add a DAG combine pre type legalization to widen division by constant splat on narrow vectors to avoid scalarization This is another patch for -x86-experimental-vector-widening. This pre widens narrow division by constants so that we can get past the legal type check in the generic DAG combiner. Otherwise we end up scalarizing. I've restricted this to splats for now because it was easy to just call DAG.getConstant. Not sure what we should do for non-splat? Increase the element size? Widen the constant vector by padding with 1? Differential Revision: https://reviews.llvm.org/D54919 llvm-svn: 347898	2018-11-29 19:13:38 +00:00
Graham Sellers	04f7a4d2d2	[AMDGPU] Add and update scalar instructions This patch adds support for S_ANDN2, S_ORN2 32-bit and 64-bit instructions and adds splits to move them to the vector unit (for which there is no equivalent instruction). It modifies the way that the more complex scalar instructions are lowered to vector instructions by first breaking them down to sequences of simpler scalar instructions which are then lowered through the existing code paths. The pattern for S_XNOR has also been updated to apply inversion to one input rather than the output of the XOR as the result is equivalent and may allow leaving the NOT instruction on the scalar unit. New tests for NAND, NOR, ANDN2 and ORN2 have been added, and existing tests now hit the new instructions (and have been modified accordingly). Differential: https://reviews.llvm.org/D54714 llvm-svn: 347877	2018-11-29 16:05:38 +00:00
David Stuttard	535c1af0bf	Fix: Add support for TFE/LWE in image intrinsic My change svn-id: 347871 caused a buildbot failure due to an unused variable def (used in an assert). Change-Id: Ia882d18bb6fa79b4d7bbfda422b9ea5d23eab336 llvm-svn: 347876	2018-11-29 15:56:36 +00:00
David Stuttard	de02e4b1cc	Add support for TFE/LWE in image intrinsics TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accommodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0. For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one.
The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda llvm-svn: 347871	2018-11-29 15:21:13 +00:00
Hans Wennborg	6e3be9d12e	Revert r347596 "Support for inserting profile-directed cache prefetches" It causes asserts building BoringSSL. See https://crbug.com/91009#c3 for repro. This also reverts the follow-ups: Revert r347724 "Do not insert prefetches with unsupported memory operands." Revert r347606 "[X86] Add dependency from X86 to ProfileData after rL347596" Revert r347607 "Add new passes to X86 pipeline tests" llvm-svn: 347864	2018-11-29 13:58:02 +00:00
Petr Pavlu	e6406d568c	[GlobalISel] Make EnableGlobalISel always set when GISel is enabled Change meaning of TargetOptions::EnableGlobalISel. The flag was previously set only when a target switched on GlobalISel but it is now always set when the GlobalISel pipeline is enabled. This makes the flag consistent with TargetOptions::EnableFastISel and allows its use in other parts of the compiler to determine when GlobalISel is enabled. The EnableGlobalISel flag had previously only one use in TargetPassConfig::isGlobalISelAbortEnabled(). The method used its value to determine if GlobalISel was enabled by a target and returned false in such a case. To preserve the current behaviour, a new flag TargetOptions::GlobalISelAbort is introduced to separately record the abort behaviour. Differential Revision: https://reviews.llvm.org/D54518 llvm-svn: 347861	2018-11-29 12:56:32 +00:00
Andrea Di Biagio	373a4ccf6c	[llvm-mca][MC] Add the ability to declare which processor resources model load/store queues (PR36666). This patch adds the ability to specify via tablegen which processor resources are load/store queue resources. A new tablegen class named MemoryQueue can be optionally used to mark resources that model load/store queues. Information about the load/store queue is collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and `StoreQueueID`. Those two fields are identifiers for buffered resources used to describe the load queue and the store queue. Field `BufferSize` is interpreted as the number of entries in the queue, while the number of units is a throughput indicator (i.e. number of available pickers for loads/stores). At construction time, LSUnit in llvm-mca checks for the presence of extra processor information (i.e. MCExtraProcessorInfo) in the scheduling model. If that information is available, and fields LoadQueueID and StoreQueueID are set to a value different than zero (i.e. the invalid processor resource index), then LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value declared by the two processor resources. With this patch, we more accurately track dynamic dispatch stalls caused by the lack of LS tokens (i.e. load/store queue full). This is also shown by the differences in two BdVer2 tests. Stalls that were previously classified as generic SCHEDULER FULL stalls, are now correctly classified either as "load queue full" or "store queue full". About the differences in the -scheduler-stats view: those differences are expected, because entries in the load/store queue are not released at instruction issue stage. Instead, those are released at instruction executed stage. This is the main reason why for the modified tests, the load/store queues get full before PdEx is full.
Differential Revision: https://reviews.llvm.org/D54957 llvm-svn: 347857	2018-11-29 12:15:56 +00:00
Nicolai Haehnle	7bed696915	AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo Summary: MachineLoopInfo cannot be relied on for correctness, because it cannot properly recognize loops in irreducible control flow which can be introduced by late machine basic block optimization passes. See the new test case for the reduced form of an example that occurred in practice. Use a simple fixpoint iteration instead. In order to facilitate this change, refactor WaitcntBrackets so that it only tracks pending events and registers, rather than also maintaining state that is relevant for the high-level algorithm. Various accessor methods can be removed or made private as a consequence. Affects (in radv): - dEQP-VK.glsl.loops.special.{for,while}_uniform_iterations.select_iteration_count_{fragment,vertex} Fixes: r345719 ("AMDGPU: Rewrite SILowerI1Copies to always stay on SALU") Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54231 llvm-svn: 347853	2018-11-29 11:06:26 +00:00
Nicolai Haehnle	ab43bf60fe	AMDGPU/InsertWaitcnt: Consistently use uint32_t for scores / time points Summary: There is one obsolete reference to using -1 as an indication of "unknown", but this isn't actually used anywhere. Using unsigned makes robust wrapping checks easier. Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, llvm-commits, tpr, t-tye, hakzsam Differential Revision: https://reviews.llvm.org/D54230 llvm-svn: 347852	2018-11-29 11:06:21 +00:00
Nicolai Haehnle	f96456c611	AMDGPU/InsertWaitcnt: Remove unused WaitAtBeginning Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54229 llvm-svn: 347851	2018-11-29 11:06:18 +00:00
Nicolai Haehnle	d1f45dad84	AMDGPU/InsertWaitcnts: Simplify pending events tracking Summary: Instead of storing the "score" (last time point) of the various relevant events, only store whether an event is pending or not. This is sufficient, because whenever only one event of a count type is pending, its last time point is naturally the upper bound of all time points of this count type, and when multiple event types are pending, the count type has gone out of order and an s_waitcnt to 0 is required to clear any pending event type (and will then clear all pending event types for that count type). This also removes the special handling of GDS_GPR_LOCK and EXP_GPR_LOCK. I do not understand what this special handling ever attempted to achieve. It has existed ever since the original port from an internal code base, so my best guess is that it solved a problem related to EXEC handling in that internal code base. Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54228 llvm-svn: 347850	2018-11-29 11:06:14 +00:00
Nicolai Haehnle	ae369d70c3	AMDGPU/InsertWaitcnts: Use foreach loops for inst and wait event types Summary: It hides the type casting ugliness, and I happened to have to add a new such loop (in a later patch). Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54227 llvm-svn: 347849	2018-11-29 11:06:11 +00:00
Nicolai Haehnle	1a94cbb3f5	AMDGPU/InsertWaitcnts: Untangle some semi-global state Summary: Reduce the statefulness of the algorithm in two ways: 1. More clearly split generateWaitcntInstBefore into two phases: the first one which determines the required wait, if any, without changing the ScoreBrackets, and the second one which actually inserts the wait and updates the brackets. 2. Communicate pre-existing s_waitcnt instructions using an argument to generateWaitcntInstBefore instead of through the ScoreBrackets. To simplify these changes, a Waitcnt structure is introduced which carries the counts of an s_waitcnt instruction in decoded form. There are some functional changes: 1. The FIXME for the VCCZ bug workaround was implemented: we only wait for SMEM instructions as required instead of waiting on all counters. 2. We now properly track pre-existing waitcnt's in all cases, which leads to less conservative waitcnts being emitted in some cases. s_load_dword ... s_waitcnt lgkmcnt(0) <-- pre-existing wait count ds_read_b32 v0, ... ds_read_b32 v1, ... s_waitcnt lgkmcnt(0) <-- this is too conservative use(v0) more code use(v1) This increases code size a bit, but the reduced latency should still be a win in basically all cases. The worst code size regressions in my shader-db are: WORST REGRESSIONS - Code Size Before After Delta Percentage 1724 1736 12 0.70 % shaders/private/f1-2015/1334.shader_test [0] 2276 2284 8 0.35 % shaders/private/f1-2015/1306.shader_test [0] 4632 4640 8 0.17 % shaders/private/ue4_elemental/62.shader_test [0] 2376 2384 8 0.34 % shaders/private/f1-2015/1308.shader_test [0] 3284 3292 8 0.24 % shaders/private/talos_principle/1955.shader_test [0] Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54226 llvm-svn: 347848	2018-11-29 11:06:06 +00:00
Craig Topper	c2540995ed	[X86] Correct comment. NFC llvm-svn: 347835	2018-11-29 05:56:03 +00:00
Li Jia He	bcae407a3c	[PowerPC] Fix a conversion that is not considered when selecting instructions for the ISD::BR_CC node Summary: A signed comparison of i1 values produces the opposite result to an unsigned one if the condition code includes less-than or greater-than. This is so because 1 is the most negative signed i1 number and the most positive unsigned i1 number. The CR-logical operations used for such comparisons are non-commutative so for signed comparisons vs. unsigned ones, the input operands just need to be swapped. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D54825 llvm-svn: 347831	2018-11-29 03:04:39 +00:00
Sanjay Patel	2de209313e	[x86] try select simplification for target-specific nodes This failed to select (which might be a separate bug) in X86ISelDAGToDAG because we try to create a select node that can be simplified away after rL347227. This change avoids the problem by simplifying the SHRUNKBLEND node sooner. In the test case, we manage to realize that the true/false values of the select (SHRUNKBLEND) are the same thing, so it simplifies away completely. llvm-svn: 347818	2018-11-28 22:51:04 +00:00
Craig Topper	81f1b4a361	[X86] Make X86TTIImpl::getCastInstrCost properly handle the case where AVX512 is enabled, but 512-bit vectors aren't legal. Unlike most cost model functions this code makes a lot of table lookups without using the results from getTypeLegalizationCost. This means 512-bit vectors can be looked up even when the type isn't legal. This patch adds a check around the two tables that contain 512-bit types to make sure that neither of the types would be split by type legalization. Meaning 512 bit types are illegal. I wanted to write this in a somewhat generic way that uses type legalization query hooks. But if preferred, I can switch to just using is512BitVector and the subtarget feature. Differential Revision: https://reviews.llvm.org/D54984 llvm-svn: 347786	2018-11-28 18:11:42 +00:00
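
The operand-swap rule stated in the PowerPC BR_CC entry above (r347831) can be checked exhaustively, since an i1 has only two values. The following is a minimal sketch, not code from the patch: the helper names (`as_signed_i1`, `slt`, `ult`, `check_swap_rule`) are hypothetical, and it models an i1 as a Python int restricted to 0 or 1.

```python
from itertools import product

def as_signed_i1(bit):
    # Two's-complement view of a 1-bit value: the bit pattern 1 means -1.
    return -bit

# Signed predicates compare the two's-complement interpretation.
def slt(a, b): return as_signed_i1(a) < as_signed_i1(b)
def sgt(a, b): return as_signed_i1(a) > as_signed_i1(b)
def sle(a, b): return as_signed_i1(a) <= as_signed_i1(b)
def sge(a, b): return as_signed_i1(a) >= as_signed_i1(b)

# Unsigned predicates compare the raw bit patterns.
def ult(a, b): return a < b
def ugt(a, b): return a > b
def ule(a, b): return a <= b
def uge(a, b): return a >= b

def check_swap_rule():
    # For every pair of i1 values, a signed predicate on (a, b) must
    # agree with the matching unsigned predicate on (b, a).
    pairs = [(slt, ult), (sgt, ugt), (sle, ule), (sge, uge)]
    for a, b in product((0, 1), repeat=2):
        for signed_pred, unsigned_pred in pairs:
            assert signed_pred(a, b) == unsigned_pred(b, a)
    return True

if __name__ == "__main__":
    print(check_swap_rule())
```

The sketch only illustrates the arithmetic fact behind the fix; how the swap is applied to the CR-logical operations during instruction selection is described in D54825 itself.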

50667 Commits