llvm-project

Commit Graph

Author	SHA1	Message	Date
Stanislav Mekhanoshin	79b2828b3f	[AMDGPU] Reorder includes per coding standard. NFC. llvm-svn: 360609	2019-05-13 18:05:10 +00:00
Stanislav Mekhanoshin	21088639ae	[AMDGPU] Remove now unused V2FP16_ONE constant def. NFC. llvm-svn: 360608	2019-05-13 17:52:57 +00:00
Robert Lougher	91a9d4ef4b	Revert [X86] Avoid SFB - Fix inconsistent codegen with/without debug info Revert r360436 as it is causing clang-x64-windows-msvc buildbot to fail. llvm-svn: 360606	2019-05-13 17:36:46 +00:00
Nick Desaulniers	c33f754e74	[TargetLowering] Handle multi depth GEPs w/ inline asm constraints Summary: X86TargetLowering::LowerAsmOperandForConstraint had better support than TargetLowering::LowerAsmOperandForConstraint for arbitrary depth getelementpointers for "i", "n", and "s" extended inline assembly constraints. Hoist its support from the derived class into the base class. Link: https://github.com/ClangBuiltLinux/linux/issues/469 Reviewers: echristo, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D61560 llvm-svn: 360604	2019-05-13 17:27:44 +00:00
Simon Pilgrim	73aee29095	[X86][SSE] LowerBuildVectorv4x32 - don't insert MOVQ for undef elts Fixes the regression noted in D61782 where a VZEXT_MOVL was being inserted because we weren't discriminating between 'zeroable' and 'all undef' for the upper elts. Differential Revision: https://reviews.llvm.org/D61782 llvm-svn: 360596	2019-05-13 16:10:11 +00:00
Simon Pilgrim	cf5a8eb7cd	[X86][SSE] Relax use limits for lowerAddSubToHorizontalOp (PR32433) Now that we can use HADD/SUB for scalar additions from any pair of extracted elements (D61263), we can relax the one use limit as we will be able to merge multiple uses into using the same HADD/SUB op. This exposes a couple of missed opportunities in LowerBuildVectorv4x32 which will be committed separately. Differential Revision: https://reviews.llvm.org/D61782 llvm-svn: 360594	2019-05-13 16:02:45 +00:00
Simon Pilgrim	d9aa928603	[X86] Add SimplifyDemandedBits support for PEXTRB/PEXTRW (PR39709) Test case will be included in a followup - it's being used but it's tricky to show a case that isn't caught at a later stage anyway. llvm-svn: 360588	2019-05-13 15:31:27 +00:00
Cullen Rhodes	6dcef8fc0c	[AArch64][SVE2] Add SVE2 target features to backend and TargetParser Summary: This patch adds the following features defined by Arm SVE2 architecture extension: sve2, sve2-aes, sve2-sm4, sve2-sha3, bitperm For existing CPUs these features are declared as unsupported to prevent scheduler errors. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewers: SjoerdMeijer, sdesmalen, ostannard, rovka Reviewed By: SjoerdMeijer, rovka Subscribers: rovka, javed.absar, tschuett, kristof.beyls, kristina, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61513 llvm-svn: 360573	2019-05-13 10:10:24 +00:00
Ulrich Weigand	8e42f6ddc8	[SystemZ] Model floating-point control register This adds the FPC (floating-point control register) as a reserved physical register and models its use by SystemZ instructions. Note that only the current rounding modes and the IEEE exception masks are modeled. Changes of the FPC due to exceptions (in particular the IEEE exception flags and the DXC) are not modeled. At this point, this patch is mostly NFC, but it will prevent scheduling of floating-point instructions across SPFC/LFPC etc. llvm-svn: 360570	2019-05-13 09:47:26 +00:00
Sam Parker	a33e311a3b	[ARM][ParallelDSP] Relax alias checks When deciding the safety of generating smlad, we checked for any writes within the block that may alias with any of the loads that need to be widened. This is overly conservative because it only matters when there's a potential aliasing write to a location accessed by a pair of loads. Now we check for aliasing writes only once, during setup. If two loads are found to have an aliasing write between them, we don't add these loads to LoadPairs. This means that later during the transform, we can safely widen a pair without worrying about aliasing. However, to maintain correctness, we also need to change the way that wide loads are inserted because the order is now important. The MatchSMLAD method has also been changed, absorbing MatchReductions and AddMACCandidate to hopefully improve readability. Differential Revision: https://reviews.llvm.org/D6102 llvm-svn: 360567	2019-05-13 09:23:32 +00:00
Fangrui Song	f3be557159	[WebAssembly] Add dependency on WebAssemblyDesc to fix BUILD_SHARED_LIBS=on builds after rL360550 This fixes the link error ld.lld: error: undefined symbol: llvm::WebAssembly::anyTypeToString(unsigned int) >>> referenced by WebAssemblyDisassembler.cpp llvm-svn: 360558	2019-05-13 05:51:39 +00:00
Yonghong Song	98fe9c9869	[BPF] emit BTF sections only if debuginfo available Currently, without -g, BTF sections may still be emitted with data sections, e.g., for linux kernel bpf selftest test_tcp_check_syncookie_kern.c issue discovered by Martin as shown below. -bash-4.4$ bpftool btf dump file test_tcp_check_syncookie_kern.o [1] VAR 'results' type_id=0, linkage=global-alloc [2] VAR '_license' type_id=0, linkage=global-alloc [3] DATASEC 'license' size=0 vlen=1 type_id=2 offset=0 size=4 [4] DATASEC 'maps' size=0 vlen=1 type_id=1 offset=0 size=28 Let's disable BTF generation if no debuginfo, which is the original design. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61826 llvm-svn: 360556	2019-05-13 05:00:23 +00:00
Craig Topper	61e556d2bd	Recommit r358887 "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling" I've included a new fix in X86RegisterInfo to prevent PR41619 without reintroducing r359392. We might be able to improve that in the base class implementation of shouldRewriteCopySrc somehow. But this hopefully enables forward progress on SimplifyDemandedBits improvements for now. Original commit message: This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly. The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGComb but it caused a lot of noise on other targets - some improvements, some regressions. The X86 changes are all definite wins. llvm-svn: 360552	2019-05-13 04:03:35 +00:00
David L. Jones	a263aa25e1	[WebAssembly] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360550	2019-05-13 03:32:41 +00:00
Simon Pilgrim	a7fc763082	[X86][AVX] Split VZEXT_MOVL ymm/zmm if the upper elements are not demanded. Removes unnecessary vzeroupper noted in D61806 llvm-svn: 360543	2019-05-12 15:16:29 +00:00
Simon Pilgrim	fda6bffd3b	[X86][SSE] SimplifyDemandedBits - call PEXTRB/PEXTRW SimplifyDemandedVectorElts as well. See if we can simplify the demanded vector elts from the extraction before trying to simplify the demanded bits. This helps us with target shuffles and hops in particular. llvm-svn: 360535	2019-05-11 21:35:50 +00:00
Simon Pilgrim	6b10fde69b	[CostModel][X86] Add min/max reduction costs for all SSE targets The original costs stopped at SSE42, I've added conservative estimates for everything down to SSE1/SSE2 and moved some of the SSE42 costs to SSE41 (really only the addition of PCMPGT makes any difference). I've also added missing vXi8 costs (we use PHMINPOSUW for i8/i16 for scarily quick results) and 256-bit vector costs for AVX1. llvm-svn: 360528	2019-05-11 17:12:52 +00:00
Simon Pilgrim	e4c5b6d9bd	[X86][SSE] Add SimplifyDemandedVectorElts HADD/HSUB handling. Still missing PHADDW/PHSUBW tests because PEXTRW doesn't call SimplifyDemandedVectorElts llvm-svn: 360526	2019-05-11 16:07:12 +00:00
Simon Pilgrim	5e0f92acad	FixupLEAPass::fixupIncDec - non-LEA opcodes should not happen here. NFCI. Matches what we do in other functions and fixes scan-build warning about uninitialized NewOpcode variable. llvm-svn: 360525	2019-05-11 16:02:34 +00:00
Craig Topper	c9d7484aa3	[X86] Add CMOV_FR32X/CMOV_FR64X pseudo instructions. Use them in fast isel to fix a machine verifier error after adding test cases. Fast isel picks the FR32X/FR64X register classes when lowering pseudo select, but it didn't have the right opcode to go with it. llvm-svn: 360524	2019-05-11 16:00:28 +00:00
Craig Topper	74a436596d	[X86] Sink some fast isel code into the only if that uses it. NFC llvm-svn: 360523	2019-05-11 16:00:19 +00:00
Craig Topper	26f2b13a65	[X86] Use TLI.getRegClassFor to simplify some more fast isel code. NFCI llvm-svn: 360522	2019-05-11 16:00:13 +00:00
Simon Pilgrim	e7c51137aa	HexagonConstEvaluator::evaluateHexExt - check incoming opcodes. NFCI. Only certain extension opcodes are supported - fixes scan build warning. llvm-svn: 360520	2019-05-11 15:24:34 +00:00
Craig Topper	682cc09675	[X86] Use getRegClassFor to simplify some code in fast isel. NFCI No need to select the register class based on type and features. It should already be setup by X86ISelLowering. llvm-svn: 360513	2019-05-11 05:18:58 +00:00
Craig Topper	31f7adb94f	[X86] Don't emit MOVNTDQA loads from fast-isel without SSE4.1. We were checking for SSE4.1 for FP types, but not integer 128-bit types. Fixes PR41837. llvm-svn: 360512	2019-05-11 04:19:33 +00:00
Craig Topper	bdef12df8d	[X86] Add a test case for idempotent atomic operations with speculative load hardening. Fix an additional issue found by the test. This test covers the fix from r360475 as well. llvm-svn: 360511	2019-05-11 04:00:27 +00:00
Richard Trieu	d0124bd762	[SystemZ] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360510	2019-05-11 03:36:16 +00:00
Richard Trieu	03fe9d82c4	[Sparc] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360506	2019-05-11 02:59:02 +00:00
Richard Trieu	00ecf67045	[RISCV] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360505	2019-05-11 02:43:58 +00:00
Richard Trieu	4bdb136b0f	[PowerPC] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360502	2019-05-11 02:33:18 +00:00
Richard Trieu	4b620fcf0f	[NVPTX] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360500	2019-05-11 02:09:13 +00:00
Richard Trieu	61fb6700a5	[MSP430] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360498	2019-05-11 01:58:52 +00:00
Richard Trieu	fa29bee9d0	[Mips] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360497	2019-05-11 01:38:56 +00:00
Richard Trieu	4c3890ddbf	[Lanai] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360496	2019-05-11 01:25:58 +00:00
Richard Trieu	48803aa65c	[BPF] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360494	2019-05-11 01:13:21 +00:00
Richard Trieu	bf9e67b5b9	[AVR] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360493	2019-05-11 01:03:03 +00:00
Richard Trieu	5e3ee4b84e	[ARM] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360490	2019-05-11 00:34:07 +00:00
Richard Trieu	dcf1ea08e5	[ARC] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360488	2019-05-11 00:13:01 +00:00
Richard Trieu	c0bd7bd481	[AMDGPU] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360487	2019-05-11 00:03:35 +00:00
Richard Trieu	7ba0605511	[AArch64] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360486	2019-05-10 23:50:01 +00:00
Richard Trieu	f48ef2f2ba	[XCore] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360485	2019-05-10 23:36:49 +00:00
Richard Trieu	b28b8b7724	[X86] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360484	2019-05-10 23:24:38 +00:00
Philip Reames	849ef823df	Factor out redzone ABI checks [NFCI] As requested in D58632, cleanup our red zone detection logic in the X86 backend. The existing X86MachineFunctionInfo flag is used to track whether we use the redzone (via a particular optimization?), but there's no common way to check whether the function has a red zone. I'd appreciate careful review of the uses being updated. I think they are NFC, but a careful eye from someone else would be appreciated. Differential Revision: https://reviews.llvm.org/D61799 llvm-svn: 360479	2019-05-10 22:55:42 +00:00
Craig Topper	df10cc6068	[X86] Disable speculative load hardening for operations with an explicit RSP base. After D58632, we can create idempotent atomic operations to the top of stack. This confused speculative load hardening because it thinks accesses should have virtual register base except for the cases it already excluded. This commit adds a new exclusion for this case. I'll try to reduce a test case for this, but this fix was verified to work by the reporter. This should avoid needing to revert D58632. llvm-svn: 360475	2019-05-10 22:03:33 +00:00
Mircea Trofin	ff3bed0e61	Skip over prefetches Summary: Skip over prefetches when assigning debug info to instructions with memory operands. This way, the debug info is stable after instrumenting a binary with prefetches, allowing for iterative profiling and instrumentation. Reviewers: davidxl Reviewed By: davidxl Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61789 llvm-svn: 360471	2019-05-10 21:27:55 +00:00
Robert Lougher	986b6b86bb	[X86] Avoid SFB - Fix inconsistent codegen with/without debug info Fixes https://bugs.llvm.org/show_bug.cgi?id=40969 The functions findPotentiallyBlockedCopies and buildCopy are currently not accounting for the presence of debug instructions. In the former this results in the optimization not being triggered, and in the latter results in inconsistent codegen. This patch enables the optimization to be performed in a debug build and ensures the codegen is consistent with non-debug builds. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D61680 llvm-svn: 360436	2019-05-10 15:55:06 +00:00
Simon Pilgrim	a0b1518a4a	[X86][SSE] Add getHopForBuildVector vector splitting If we only use the lower xmm of a ymm hop, then extract the xmm's (for free), perform the xmm hop and then insert back into a ymm (for free). Fixes some of the regressions noted in D61782 llvm-svn: 360435	2019-05-10 15:46:04 +00:00
Lei Huang	1ac6e9636c	[PowerPC] custom lower `v2f64 fpext v2f32` Reduces scalarization overhead via custom lowering of v2f64 fpext v2f32. eg. For the following IR %0 = load <2 x float>, <2 x float>* %Ptr, align 8 %1 = fpext <2 x float> %0 to <2 x double> ret <2 x double> %1 Pre custom lowering: ld r3, 0(r3) mtvsrd f0, r3 xxswapd vs34, vs0 xscvspdpn f0, vs0 xxsldwi vs1, vs34, vs34, 3 xscvspdpn f1, vs1 xxmrghd vs34, vs0, vs1 After custom lowering: lfd f0, 0(r3) xxmrghw vs0, vs0, vs0 xvcvspdp vs34, vs0 Differential Revision: https://reviews.llvm.org/D57857 llvm-svn: 360429	2019-05-10 14:04:06 +00:00
Sam Clegg	ea38ac5ba3	[WebAssembly] Don't assume that strongly defined symbols are DSO-local The current PIC model for WebAssembly is more like ELF in that it allows symbol interposition. This means that more functions end up being addressed via the GOT and fewer directly added to the wasm table. One effect is a reduction in the number of wasm table entries similar to the previous attempt in https://reviews.llvm.org/D61539 which was reverted. Differential Revision: https://reviews.llvm.org/D61772 llvm-svn: 360402	2019-05-10 01:52:08 +00:00
Sam Clegg	2147365484	[WebAssembly] Remove friend18.C from list of known gcc torture test failures. NFC. Differential Revision: https://reviews.llvm.org/D61775 llvm-svn: 360401	2019-05-10 01:45:34 +00:00
Mircea Trofin	5c31c05fbd	[llvm] X86DiscriminateMemOps: insert debug info when missing Reviewers: davidxl Reviewed By: davidxl Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61735 llvm-svn: 360396	2019-05-10 00:12:51 +00:00
Stanislav Mekhanoshin	64196850f0	[AMDGPU] Pattern for v_xor3_b32 This also allows three op patterns to use increased constant bus limit of GFX10. Differential Revision: https://reviews.llvm.org/D61763 llvm-svn: 360395	2019-05-10 00:09:01 +00:00
Philip Reames	bd588dfd59	[X86] Improve lowering of idempotent RMW operations The current lowering uses an mfence. mfences are substantially higher latency than the locked operations originally requested, but we do want to avoid contention on the original cache line. As such, use a locked instruction on a cache line assumed to be thread local. Differential Revision: https://reviews.llvm.org/D58632 llvm-svn: 360393	2019-05-09 23:23:42 +00:00
Bill Wendling	6ee7f31484	Add ".dword" directive Summary: The ".dword" directive is a synonym for ".xword" and is used by klibc, a minimalistic libc subset for initramfs. Reviewers: t.p.northover, nickdesaulniers Reviewed By: nickdesaulniers Subscribers: nickdesaulniers, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61719 llvm-svn: 360381	2019-05-09 21:57:44 +00:00
Stanislav Mekhanoshin	a76da34b1d	[AMDGPU] gfx1010 v_interp_* instructions Differential Revision: https://reviews.llvm.org/D61703 llvm-svn: 360364	2019-05-09 18:38:55 +00:00
Simon Pilgrim	93bfa5af48	[X86][SSE] Fold add(shuffle(),shuffle()) to hadd on 'slow' targets (PR39920) As reported on PR39920, "slow horizontal ops" targets tend to internally expand to 2shuffle+add/sub - so if we can reduce 2shuffle+add/sub to a hadd/sub then we should do it - similar port usage but reduced instruction count. This works out in most cases, although the "PR22377" regression in vector-shuffle-combining.ll is annoying - going from 2shuffle+add+shuffle to hadd+2shuffle - I've opened PR41813 to cover this. Differential Revision: https://reviews.llvm.org/D61308 llvm-svn: 360360	2019-05-09 17:45:01 +00:00
Stanislav Mekhanoshin	4d4c9e0757	[AMDGPU] gfx1010 changes for PAL metadata Differential Revision: https://reviews.llvm.org/D61704 llvm-svn: 360353	2019-05-09 16:34:13 +00:00
Roman Lebedev	9db0e72570	[X86] AMD Piledriver (BdVer2): major cleanup (mainly inverse throughput) I've started this cleanup several times now, but got sidetracked elsewhere, e.g. by llvm-exegesis problems. Not this time, finally! This is mainly cleaning up the inverse throughput values, and a few latencies/uops, based on the llvm-exegesis measured values. Though this is not complete by any means, there's certainly more cleanup to be done. The performance numbers (I've only checked by RawSpeed benchmark) aren't really surprising - overall this slightly (< -1%) improves perf. llvm-svn: 360341	2019-05-09 13:54:51 +00:00
Sam Parker	d7b650cc72	[ARM][CGP] Guard against signext args and sitofp Add an Argument that has the SExtAttr attached, as well as SIToFP instructions, as values that generate sign bits. SIToFP doesn't strictly do this and could be treated as a sink to be sign-extended. Differential Revision: https://reviews.llvm.org/D61381 llvm-svn: 360331	2019-05-09 11:56:16 +00:00
Diana Picus	3531453371	[ARM GlobalISel] Map DBG_VALUE for types != s32 ...and make sure we fail elegantly for unsupported values. s64 goes into DPR, anything <= 32 into GPR. llvm-svn: 360321	2019-05-09 09:49:36 +00:00
Hans Wennborg	b1b09e5b55	X86WinAllocaExpander: Drop code looking through register copies (PR41786) This code was never covered by tests, in PR41786 it was pointed out that the deletion part doesn't work, and in a full Chrome build I was never able to hit the code path that looks through copies. It seems the situation it's supposed to handle doesn't actually come up in practice. Delete it to simplify the code. Differential revision: https://reviews.llvm.org/D61671 llvm-svn: 360320	2019-05-09 09:22:56 +00:00
Matt Arsenault	462403a5c8	AMDGPU: Mark scheduler classes as final llvm-svn: 360294	2019-05-08 22:10:04 +00:00
Matt Arsenault	01434f9377	AMDGPU: Select VOP3 form of add The VOP3 form should always be the preferred selection, to be shrunk later. This should only be an optimization issue, but this partially works around a problem from clobbering VCC when SIFixSGPRCopies rewrites an SCC defining operation directly to VCC. 3 of the testcases are regressions from failing to fold the immediate in cases it should. These can be avoided by improving the VCC liveness handling in SIFoldOperands. Simply increasing the threshold to computeRegisterLiveness works, although this is common enough that VCC liveness should probably be tracked throughout the pass. The hack of leaving behind an implicit_def instruction to avoid breaking iterator wastes instruction count, which inhibits finding the VCC def in long chains of adds. Doing this however exposes different, worse looking regressions from poor scheduling behavior. This could probably be avoided by forcing the shrink of the addc here, but the scheduler should probably be fixed. The r600 add test needs to be split out because it asserts on the arguments in the new test during the calling convention lowering. llvm-svn: 360293	2019-05-08 22:09:57 +00:00
Stanislav Mekhanoshin	1dbf721315	[AMDGPU] gfx1010 exp modifications Differential Revision: https://reviews.llvm.org/D61701 llvm-svn: 360287	2019-05-08 21:23:37 +00:00
Changpeng Fang	73b7272e7a	AMDGPU: Fix a mis-placed bracket Differential Revision: https://reviews.llvm.org/D61430 llvm-svn: 360283	2019-05-08 19:46:04 +00:00
Alina Sbirlea	f31eba6494	[MemorySSA] Teach LoopSimplify to preserve MemorySSA. Summary: Preserve MemorySSA in LoopSimplify, in the old pass manager, if the analysis is available. Do not preserve it in the new pass manager. Update tests. Subscribers: nemanjai, jlebar, javed.absar, Prazek, kbarton, zzheng, jsji, llvm-commits, george.burgess.iv, chandlerc Tags: #llvm Differential Revision: https://reviews.llvm.org/D60833 llvm-svn: 360270	2019-05-08 17:05:36 +00:00
Simon Pilgrim	e461e9a77d	[AArch64] Remove scan-build "Value stored during its initialization is never read" warnings. NFCI. llvm-svn: 360268	2019-05-08 16:29:39 +00:00
Simon Pilgrim	12521b2d43	[AArch64] Fix scan-build null/uninitialized pointer warnings. NFCI. llvm-svn: 360267	2019-05-08 16:27:24 +00:00
Simon Pilgrim	e3eec06dde	[AMDGPU] Reapplied BFE canonicalization from D60462 This was committed in rL358887 but reverted in rL360066 due to a x86 regression, really it should have been pre-committed instead of being part of the SimplifyDemandedBits bitcast patch. llvm-svn: 360263	2019-05-08 15:49:10 +00:00
Simon Pilgrim	ec58090491	[Hexagon] Fix cppcheck reduce variable scope warnings. NFCI. Also fixes a static analyzer "Value stored to 'S2' during its initialization is never read" warning. llvm-svn: 360244	2019-05-08 11:02:46 +00:00
Tim Northover	18adcf331b	ARM: disallow SP as Rn for Thumb2 TST & TEQ instructions Using SP in this position is unpredictable in ARMv7. CMP and CMN are not affected, and of course v8 relaxes this requirement, but that's handled elsewhere. llvm-svn: 360242	2019-05-08 10:59:08 +00:00
Simon Pilgrim	02937dad69	R600InstrInfo.cpp - Add getTransSwizzle assert for the swizzle op index. NFCI. Fixes static analyzer undefined value warning. llvm-svn: 360239	2019-05-08 10:39:56 +00:00
Simon Pilgrim	be9ade93d1	[SIMode] Fix typo in Status constructor As noted in https://www.viva64.com/en/b/0629/ (Snippet No. 36) and the scan-build CI reports (https://llvm.org/reports/scan-build/report-SIModeRegister.cpp-Status-1-1.html#EndPath), rL348754 introduced a typo in the Status constructor due to argument variable names shadowing the member variable names. Differential Revision: https://reviews.llvm.org/D61595 llvm-svn: 360236	2019-05-08 10:24:22 +00:00
Reid Kleckner	6bf108d77a	[COFF] Use COFF stubs for extern_weak functions Summary: A COFF stub indirects the reference to a symbol through memory. A .refptr.$sym global variable pointer is created to refer to $sym. Typically mingw uses these for external global variable declarations, but we can use them for weak function declarations as well. Updates the dso_local classification to add a special case for extern_weak symbols on COFF in both clang and LLVM. Fixes PR37598 Reviewers: smeenai, mstorsjo Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D61615 llvm-svn: 360207	2019-05-07 23:06:21 +00:00
Austin Kerbow	8a3d3a9af6	[AMDGPU] Check MI bundles for hazards Summary: GCNHazardRecognizer fails to identify hazards that are in and around bundles. This patch allows the hazard recognizer to consider bundled instructions in both scheduler and hazard recognizer mode. We ignore “bundledness” for the purpose of detecting hazards and examine the instructions individually. Reviewers: arsenm, msearles, rampitec Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61564 llvm-svn: 360199	2019-05-07 22:12:15 +00:00
Eric Christopher	4727221734	Make sure that the DAG combiner doesn't merge stores that we explicitly asked not be greater than preferred vector width for the vectorizer. Test for both 128 and 256 with a skylake architecture. llvm-svn: 360183	2019-05-07 19:25:34 +00:00
Simon Pilgrim	debb2b2a1e	Fix local shadow variable warning. NFCI. llvm-svn: 360157	2019-05-07 14:56:34 +00:00
Nemanja Ivanovic	b4f028f0f3	[PowerPC] Use the two-constant NR algorithm for refining estimates The single-constant algorithm produces infinities on a lot of denormal values. The precision of the two-constant algorithm is actually sufficient across the range of denormals. We will switch to that algorithm for now to avoid the infinities on denormals. In the future, we will re-evaluate the algorithm to find the optimal one for PowerPC. Differential revision: https://reviews.llvm.org/D60037 llvm-svn: 360144	2019-05-07 13:48:03 +00:00
Diana Picus	0a47fb8884	[ARM GlobalISel] Widen G_SELECT operands ...except for the condition operand. llvm-svn: 360135	2019-05-07 11:39:30 +00:00
Simon Pilgrim	b0f51266b8	[X86][AVX] Fold concat(packus(),packus()) -> packus(concat(),concat()) (PR34773) Basic "revectorization" combine, we can probably do more opcodes here but it can be a tricky cost-benefit depending on where the subvectors came from - but this case helps shuffle combining. llvm-svn: 360134	2019-05-07 11:17:39 +00:00
Simon Pilgrim	a80abeea88	Fixed "Value stored to 'Opc' is never read" warning. NFCI. llvm-svn: 360133	2019-05-07 11:09:16 +00:00
Simon Pilgrim	3c975a0ab5	[X86] Reduce scope of variables where possible. NFCI. Fixes cppcheck warnings. llvm-svn: 360131	2019-05-07 10:50:11 +00:00
Diana Picus	d6d3808fa4	[ARM GlobalISel] Widen G_INTTOPTR/G_PTRTOINT We actually have a couple of G_PTRTOINT to s8 when building clang, so we should do something about them. llvm-svn: 360130	2019-05-07 10:48:01 +00:00
Simon Pilgrim	c5ac14eef8	Fix uninitialized variable warning. NFCI. This also fixes a scan-build "array subscript is undefined" warning. llvm-svn: 360128	2019-05-07 10:30:22 +00:00
Diana Picus	d18bac5d19	[ARM GlobalISel] Widen G_GEP index operand llvm-svn: 360127	2019-05-07 10:11:57 +00:00
Nicolai Haehnle	79ea85c6af	AMDGPU: Verify that SOP2/SOPC instructions have at most one immediate operand Summary: No test case because I don't know of a way to trigger this, but I accidentally caused this to fail while working on a different change. Change-Id: I8015aa447fe27163cc4e4902205a203bd44bf7e3 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61490 llvm-svn: 360123	2019-05-07 09:19:09 +00:00
Fangrui Song	da82ce99b7	[DebugInfo] Delete TypedDINodeRef TypedDINodeRef<T> is a redundant wrapper of Metadata * that is actually a T . Accordingly, change DI{Node,Scope,Type}Ref uses to DI{Node,Scope,Type} or their const variants. This allows us to delete many resolve() calls that clutter the code. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D61369 llvm-svn: 360108	2019-05-07 02:06:37 +00:00
Craig Topper	a75630302d	[X86] Use extended vector register classes in getRegForInlineAsmConstraint to support x/y/zmm16-31 when the type is mismatched. The FR32/FR64/VR128/VR256 register classes don't contain the upper 16 registers. For most cases we use the default implementation which will find any register class that contains the register in question if the VT is legal for the register class. But if the VT is i32 or i64, we won't find a matching register class and will instead end up in the code modified in this patch. If the requested register is x/y/zmm16-31 we weren't returning a register class that contains those registers and will hit an assertion in the caller. To fix this, I've changed to use the extended register class instead. I don't believe we need a subtarget check to see if avx512 is enabled. The default implementation just picks whatever register class it finds first. I checked and we currently pick FR32X for XMM0 with an f32 type using the default implementation regardless of whether avx512 is enabled. So I assume it is ok to do the same for i32. Differential Revision: https://reviews.llvm.org/D61457 llvm-svn: 360102	2019-05-06 23:57:42 +00:00
Eli Friedman	2ea088173d	[ARM] Glue register copies to tail calls. This generally follows what other targets do. I don't completely understand why the special case for tail calls existed in the first place; even when the code was committed in r105413, call lowering didn't work in the way described in the comments. Stack protector lowering breaks if the register copies are not glued to a tail call: we have to insert the stack protector check before the tail call, and we choose the location based on the assumption that all physical register dependencies of a tail call are adjacent to the tail call. (See FindSplitPointForStackProtector.) This is sort of fragile, but I don't see any reason to break that assumption. I'm guessing nobody has seen this before just because it's hard to convince the scheduler to actually schedule the code in a way that breaks; even without the glue, the only computation that could actually be scheduled after the register copies is the computation of the call address, and the scheduler usually prefers to schedule that before the copies anyway. Fixes https://bugs.llvm.org/show_bug.cgi?id=41417 Differential Revision: https://reviews.llvm.org/D60427 llvm-svn: 360099	2019-05-06 23:21:59 +00:00
Stanislav Mekhanoshin	491746a584	[AMDGPU] gfx1010 verifier changes Differential Revision: https://reviews.llvm.org/D61521 llvm-svn: 360095	2019-05-06 22:49:45 +00:00
Stanislav Mekhanoshin	971cb8b633	[AMDGPU] gfx1010: prefer V_MUL_LO_U32 over V_MUL_LO_I32 GFX10 deprecates v_mul_lo_i32 instruction, so choose u32 form for all targets. Differential Revision: https://reviews.llvm.org/D61525 llvm-svn: 360094	2019-05-06 22:27:05 +00:00
Stanislav Mekhanoshin	1bc001dec4	[AMDGPU] gfx1010 memory legalizer Differential Revision: https://reviews.llvm.org/D61535 llvm-svn: 360087	2019-05-06 21:57:02 +00:00
Craig Topper	d10a200ceb	[X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly printing. We require d/q suffixes on the memory form of these instructions to disambiguate the memory size. We don't require it on the register forms, but need to support parsing both with and without it. Previously we always printed the d/q suffix on the register forms, but it's redundant and inconsistent with gcc and objdump. After this patch we should support the d/q for parsing, but not print it when it's unneeded. llvm-svn: 360085	2019-05-06 21:39:51 +00:00
Martin Storsjo	899f3cd581	[AArch64] Default to SEH exception handling on MinGW The SEH implementation is pretty mature at this point. Differential Revision: https://reviews.llvm.org/D61590 llvm-svn: 360080	2019-05-06 21:18:15 +00:00
Amara Emerson	3d1128cc9e	[GlobalISel] Handle <1 x T> vector return types properly. After support for dealing with types that need to be extended in some way was added in r358032 we didn't correctly handle <1 x T> return types. These types don't have a GISel direct representation, instead we just see them as scalars. When we need to pad them into <2 x T> types however we need to use a G_BUILD_VECTOR instead of trying to do a G_CONCAT_VECTOR. This fixes PR41738. llvm-svn: 360068	2019-05-06 19:41:01 +00:00
Craig Topper	55a71b575c	Revert r359392 and r358887 Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead" Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling" Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list. Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead of moving to the integer domain (in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work there. I'll file a separate PR for that and add test cases. Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised if that bug can still be hit independent of that. This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again. llvm-svn: 360066	2019-05-06 19:29:24 +00:00
Guillaume Chatelet	edd69fca3e	Modernize repmovsb implementation of x86 memcpy and allow runtime sizes. Summary: This is a prerequisite to RFC http://lists.llvm.org/pipermail/llvm-dev/2019-April/131973.html Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61593 Fix typo. Turn this patch into an NFC. Addressing comments llvm-svn: 360050	2019-05-06 15:10:19 +00:00
Simon Pilgrim	2a0ef0530b	[X86] Fix uninitialized members in constructor warnings. NFCI. Initialize all member variables in X86ATTInstPrinter and X86DAGToDAGISel constructors to fix cppcheck warning. llvm-svn: 360047	2019-05-06 14:48:02 +00:00
Alexandre Ganea	799d96ec39	Fix compilation warnings when compiling with GCC 7.3 Differential Revision: https://reviews.llvm.org/D61046 llvm-svn: 360044	2019-05-06 13:41:54 +00:00
Nemanja Ivanovic	70afe4f7e1	[PowerPC] Fix erroneous condition for converting uint-to-fp vector conversion A condition for exiting the legalization of v4i32 conversion to v2f64 through extract/convert/build erroneously checks for the extract having type i32. This is not adequate as smaller extracts are actually legalized to i32 as well. Furthermore, an early exit is missing which means that we only check that both extracts are from the same vector if that check fails. As a result, both cases in the included test case fail - the first gets a select error and the second generates incorrect code. The culprit commit is r274535. llvm-svn: 360043	2019-05-06 13:35:49 +00:00
Simon Pilgrim	d672d0e246	X86DAGToDAGISel::tryVPTESTM - fix uninitialized variable warning. NFCI. findBroadcastedOp should always initialize the value if it returns true but static-analyzer isn't great at recognising this. llvm-svn: 360037	2019-05-06 11:52:16 +00:00
Simon Pilgrim	04dad8f66d	[X86] X86InstrInfo::findThreeSrcCommutedOpIndices - fix unread variable warning. scan-build was reporting that CommutableOpIdx1 never used its original initialized value - move it down to where it's first used to make the real initialization more obvious (and match the comment that's there). llvm-svn: 360028	2019-05-06 10:15:34 +00:00
Simon Pilgrim	07d91cd98a	[X86] lowerVectorShuffle - use any_of to detect out of bounds shuffle indices. NFCI. Fixes cppcheck local shadow warning as well. llvm-svn: 360027	2019-05-06 10:11:24 +00:00
Luo, Yuanke	beec41c656	Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake Summary: 1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake; 2. Enable VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision. VCVTNE2PS2BF16: Convert Two Packed Single Data to One Packed BF16 Data. VCVTNEPS2BF16: Convert Packed Single Data to Packed BF16 Data. VDPBF16PS: Dot Product of BF16 Pairs Accumulated into Packed Single Precision. For more details about BF16 isa, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference Author: LiuTianle Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, RKSimon, spatel Reviewed By: craig.topper Subscribers: kristina, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60550 llvm-svn: 360017	2019-05-06 08:22:37 +00:00
Simon Pilgrim	8462cc3c74	[X86] Pull out repeated Subtarget feature tests. NFCI. Avoids a scan-build "uninitialized value" warning in X86FastISel::X86SelectFPExtOrFPTrunc llvm-svn: 360001	2019-05-05 20:45:20 +00:00
Simon Pilgrim	addc90e4e8	[TTI][X86] Make getAddressComputationCost cost value const. NFCI. llvm-svn: 359999	2019-05-05 20:03:51 +00:00
Simon Pilgrim	5170c0e5fe	Move getOpcode() call into if statement. NFCI. Avoids a cppcheck "Local variable name shadows outer variable" warning. llvm-svn: 359991	2019-05-05 18:34:38 +00:00
Simon Pilgrim	70ee2def90	[X86] Make X86RegisterInfo(const Triple &TT) constructor explicit. Fixes cppcheck warning. llvm-svn: 359981	2019-05-05 12:51:47 +00:00
Simon Pilgrim	cbcd9b1b92	[X86] Fix some cppcheck "Local variable name shadows outer variable" warnings. NFCI. llvm-svn: 359976	2019-05-05 12:00:14 +00:00
Stanislav Mekhanoshin	5ddd564e19	[AMDGPU] Fixed asan error after D61536 llvm-svn: 359963	2019-05-04 06:40:20 +00:00
Stanislav Mekhanoshin	51d1415a16	[AMDGPU] gfx1010 hazard recognizer Differential Revision: https://reviews.llvm.org/D61536 llvm-svn: 359961	2019-05-04 04:30:57 +00:00
Stanislav Mekhanoshin	28a1936f6d	[AMDGPU] gfx1010: use fmac instructions Differential Revision: https://reviews.llvm.org/D61527 llvm-svn: 359959	2019-05-04 04:20:37 +00:00
Jessica Paquette	910630c1e4	[AArch64][GlobalISel] Use fcsel instead of csel for G_SELECT on FPRs This saves us some unnecessary copies. If the inputs to a G_SELECT are floating point, we should use fcsel rather than csel. Changes here are... - Teach selectCopy about s1-to-s1 copies across register banks. - AArch64RegisterBankInfo about G_SELECT in general. - Teach the instruction selector about the FCSEL instructions. Also add two tests: - select-select.mir to show that we get the expected FCSEL - regbank-select.mir (unfortunately named) to show the register banks on G_SELECT are properly preserved And update fast-isel-select.ll to show that we do the same thing as other instruction selectors in these cases. llvm-svn: 359940	2019-05-03 22:37:46 +00:00
Stanislav Mekhanoshin	d9dcf392c7	[AMDGPU] gfx1010 wait count insertion Differential Revision: https://reviews.llvm.org/D61534 llvm-svn: 359938	2019-05-03 21:53:53 +00:00
Stanislav Mekhanoshin	41bbe101a2	[AMDGPU] gfx1010 s_code_end generation Also add some missing metadata in the streamer. Differential Revision: https://reviews.llvm.org/D61531 llvm-svn: 359937	2019-05-03 21:26:39 +00:00
Stanislav Mekhanoshin	93f15c922f	[AMDGPU] gfx1010 loop alignment Differential Revision: https://reviews.llvm.org/D61529 llvm-svn: 359935	2019-05-03 21:17:29 +00:00
Mandeep Singh Grang	5dc8aeb26d	[COFF, ARM64] Fix ABI implementation of struct returns Summary: Refer the ABI doc at: https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#return-values Related clang patch: D60349 Reviewers: rnk, efriedma, TomTan, ssijaric Reviewed By: rnk, efriedma Subscribers: mstorsjo, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60348 llvm-svn: 359934	2019-05-03 21:12:36 +00:00
Brian Cain	3428c9daef	[hexagon] change AsmParser assertion to error For immediates that can't be evaluated in assembler-mapped instructions, we should return 'invalid operand' instead of assert. llvm-svn: 359905	2019-05-03 16:50:38 +00:00
Craig Topper	a8f3840c62	[X86] Allow assembly parser to accept x/y/z suffixes on non-memory vfpclassps/pd and on memory forms in intel syntax The x/y/z suffix is needed to disambiguate the memory form in at&t syntax since no xmm/ymm/zmm register is mentioned. But for consistency we should also allow it for the register and broadcast forms, where it's not needed. This matches gas. The printing code will still only use the suffix for the memory form where it is needed. llvm-svn: 359903	2019-05-03 16:15:15 +00:00
Simon Pilgrim	b323d5ec7c	[X86] LowerToHorizontalOp - Tidyup calls to getHopForBuildVector. NFCI. Merge the if() tests for the various HADD/SUB + Subtarget tests llvm-svn: 359901	2019-05-03 15:56:06 +00:00
Matt Arsenault	657ef48a88	AMDGPU: Select VOP3 form of sub The VOP3 form should always be the preferred selection form to be shrunk later. The r600 sub test needs to be split out because it asserts on the arguments in the new test during the calling convention lowering. llvm-svn: 359899	2019-05-03 15:37:07 +00:00
Matt Arsenault	cfd0ca38b0	AMDGPU: Support shrinking add with FI in SIFoldOperands Avoids test regression in a future patch llvm-svn: 359898	2019-05-03 15:21:53 +00:00
Matt Arsenault	344d68d3c9	AMDGPU: Remove redundant patterns for shifts llvm-svn: 359895	2019-05-03 15:08:36 +00:00
Matt Arsenault	ada33314a2	AMDGPU: Remove redundant patterns for sub There were 2 patterns for sub, one selecting to sub and one to subrev. Only one of these will succeed, so remove the reversed one. llvm-svn: 359894	2019-05-03 15:08:35 +00:00
Matt Arsenault	0446fbe45e	AMDGPU: Replace shrunk instruction with dummy implicit_def This was broken if the original operand was killed. The kill flag would appear on both instructions, and fail the verifier. Keep the kill flag, but remove the operands from the old instruction. This has an added benefit of really reducing the use count for future folds. Ideally the pass would be structured more like what PeepholeOptimizer does to avoid this hack to avoid breaking instruction iterators. llvm-svn: 359891	2019-05-03 14:40:10 +00:00
Simon Pilgrim	bfdd0f75a8	[X86] Remove repeated variables. NFCI. llvm-svn: 359889	2019-05-03 14:37:00 +00:00
Simon Pilgrim	aa49be4926	Avoid cppcheck operator precedence warnings. NFCI. Prefer ((X & Y) ? A : B) to (X & Y ? A : B) llvm-svn: 359884	2019-05-03 13:50:38 +00:00
Matt Arsenault	2c8936fd26	AMDGPU: Fix incorrect commute with sub when folding immediates When a fold of an immediate into a sub/subrev required shrinking the instruction, the wrong VOP2 opcode was used. This was using the VOP2 equivalent of the original instruction, not the commuted instruction with the inverted opcode. llvm-svn: 359883	2019-05-03 13:42:56 +00:00
Simon Pilgrim	a359ef192b	[X86] LowerMULH - remove unused Lo/Hi vector indices. NFCI. Leftover from before we had the extract128BitVector helpers. llvm-svn: 359871	2019-05-03 10:32:07 +00:00
Simon Pilgrim	88f9117168	Reduce variable scope to just the if() block it's actually used in. NFCI. llvm-svn: 359869	2019-05-03 10:13:41 +00:00
Craig Topper	d724360695	[X86] Add more one checks to masked compare patterns that were missed in r358358. This covers the patterns we use for widening 128/256 comparisons to 512-bit when AVX512VL isn't supported. llvm-svn: 359863	2019-05-03 07:14:05 +00:00
Eli Friedman	7238353848	[AArch64][MC] Reject "add x0, x1, w2, lsl #1" etc. Looks like just a minor oversight in the parsing code. Fixes https://bugs.llvm.org/show_bug.cgi?id=41504. Differential Revision: https://reviews.llvm.org/D60840 llvm-svn: 359855	2019-05-03 00:59:52 +00:00
Craig Topper	bf29238e1a	[X86] Remove LEA16r references from X86FixupLEAs. NFCI As far as I know, we never emit LEA16r llvm-svn: 359840	2019-05-02 22:46:23 +00:00
Craig Topper	e1e38d4248	[X86] Correct the register class for specific mask register constraints in getRegForInlineAsmConstraint when the VT is a scalar type The default implementation in the base class for TargetLowering::getRegForInlineAsmConstraint doesn't work for mask registers when the VT is a scalar integer type since the only legal mask types are vXi1. So we end up just getting whatever the first register class is that contains the register. Currently this appears to be VK1, but it's really dependent on the order tablegen outputs the register classes. Some code in the caller ends up looking up the type for this register class and finds v1i1, then generates a copyfromreg from the physical k-register with the v1i1 type. Then it generates an any_extend from v1i1 to the scalar VT which isn't legal. This bad any_extend sticks around until isel where it selects a MOVZX32rr8 with a v1i1 input or maybe an i8 input. Not sure but eventually we pick up a copy from VK1 to GR8 in MachineIR which isn't supported. This leads to a failure in physical register copying. This patch uses the scalar type to find a VK class of the right size. In the attached test case this will be VK16. This causes a bitcast from vk16 to i16 to be generated instead of an any_extend. This will be properly iseled to a VK16 to GR32 copy and a GR32->GR16 extract_subreg. Fixes PR41678 Differential Revision: https://reviews.llvm.org/D61453 llvm-svn: 359837	2019-05-02 22:26:40 +00:00
Evandro Menezes	111df108e6	[AArch64] Update for Exynos Fix the forwarding of multiplication results for Exynos M4. llvm-svn: 359834	2019-05-02 22:01:39 +00:00
Craig Topper	47d8865a38	[X86] Remove string literal from an if. NFC This if used to be an assert that got refactored into an if, but left the string literal behind. Fixes PR41718 llvm-svn: 359833	2019-05-02 21:57:18 +00:00
Sanjay Patel	284472be6d	[SelectionDAG] remove constant folding limitations based on FP exceptions We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops), so it does not make sense to have them here in the DAG either. Nothing else in the backend tries to preserve exceptions (again outside of strict ops), so I don't see how this could have ever worked for real code that cares about FP exceptions. There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least partially) to preserve exceptions without even asking if the target supports FP exceptions. Those should be corrected in subsequent patches. Real support for FP exceptions requires several changes to handle the constrained/strict FP ops. Differential Revision: https://reviews.llvm.org/D61331 llvm-svn: 359791	2019-05-02 14:47:59 +00:00
Simon Pilgrim	df8daf0ef4	[X86][SSE] lowerAddSubToHorizontalOp - enable ymm extraction+fold Limiting scalar hadd/hsub generation to the lowest xmm looks to be unnecessary - we will be extracting one upper xmm whatever, and we can remove a shuffle by using the hop which is inline with what shouldUseHorizontalOp expects to happen anyway. Testing on btver2 (the main target for fast-hops) shows this is beneficial even for float ops where we have a 'shuffle' to extract the float result: https://godbolt.org/z/0R-U-K Differential Revision: https://reviews.llvm.org/D61426 llvm-svn: 359786	2019-05-02 14:00:55 +00:00
Simon Pilgrim	9fa56f7829	[X86][SSE] Move shouldUseHorizontalOp inside isHorizontalBinOp. NFCI. Matches what we do for lowerAddSubToHorizontalOp and will make it easier to peek through subvectors to help fix PR39921 llvm-svn: 359782	2019-05-02 12:18:24 +00:00
Diana Picus	1136ea2d44	[ARM GlobalISel] Fixup r359768 Get rid of local variable used only in assertion. llvm-svn: 359772	2019-05-02 10:08:29 +00:00
Diana Picus	06a61ccc42	[ARM GlobalISel] Select extensions to < 32 bits Select G_SEXT and G_ZEXT with destination types smaller than 32 bits in the exact same way as 32 bits. This overwrites the higher bits, but that should be ok since all legal users of types smaller than 32 bits ignore those bits anyway. llvm-svn: 359768	2019-05-02 09:28:00 +00:00
Diana Picus	53bcf6f2e7	[ARM GlobalISel] Legalize extensions to < 32 bits Make it legal to extend from e.g. s1 to s8 or s16. llvm-svn: 359766	2019-05-02 09:21:46 +00:00
Kang Zhang	1a0d6d6899	[NFC][PowerPC] Return early if the element type is not byte-sized in combineBVOfConsecutiveLoads Summary: Based on the Eli Friedman's comments in https://reviews.llvm.org/D60811 , we'd better return early if the element type is not byte-sized in `combineBVOfConsecutiveLoads`. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D61076 llvm-svn: 359764	2019-05-02 08:15:13 +00:00
Stanislav Mekhanoshin	64399da8b8	[AMDGPU] gfx1010 lost VOP2 forms of some add/sub Add legalization of V_ADD_I32, V_SUB_I32, V_SUBREV_I32. Differential Revision: llvm-svn: 359757	2019-05-02 04:26:35 +00:00
Stanislav Mekhanoshin	5cf8167735	[AMDGPU] gfx1010 allows VOP3 to have a literal Differential Revision: https://reviews.llvm.org/D61413 llvm-svn: 359756	2019-05-02 04:01:39 +00:00
Stanislav Mekhanoshin	f2baae0abb	[AMDGPU] gfx1010 constant bus limit Constant bus limit has increased to 2 with GFX10. Differential Revision: https://reviews.llvm.org/D61404 llvm-svn: 359754	2019-05-02 03:47:23 +00:00
Craig Topper	b929a0062e	[X86] Remove the redundant suffix in vfpclassp[d,s]'s broadcasting variant The broadcasting variant for instruction vfpclassp[d,s] shouldn't use suffix q/l. So remove them from the template. Patch by Pengfei Wang Differential Revision: https://reviews.llvm.org/D61295 llvm-svn: 359753	2019-05-02 03:25:50 +00:00
Jessica Paquette	a3843fe6f4	[GlobalISel][AArch64] Use fmov for G_FCONSTANT when possible This adds support for using fmov rather than a standard mov to materialize G_FCONSTANT when it's safe to do so. Update arm64-fast-isel-materialize.ll and select-constant.mir to show that the selection is correct. llvm-svn: 359734	2019-05-01 22:39:43 +00:00
Simon Pilgrim	9f04d97cd7	[X86][SSE] Fold scalar horizontal add/sub for non-0/1 element extractions We already perform horizontal add/sub if we extract from elements 0 and 1, this patch extends it to non-0/1 element extraction indices (as long as they are from the lowest 128-bit vector). Differential Revision: https://reviews.llvm.org/D61263 llvm-svn: 359707	2019-05-01 17:13:35 +00:00
Stanislav Mekhanoshin	3b7925f035	[AMDGPU] gfx1010 GCNRegBankReassign pass Reassign registers to reduce register bank conflicts. Differential Revision: https://reviews.llvm.org/D61344 llvm-svn: 359704	2019-05-01 16:49:31 +00:00


52024 Commits