llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	cb6c38612e	[X86] Make FeatureAVX512 imply FeatureFMA. Previously our VEX patterns were checking Subtarget.hasFMA() which checked FMA || AVX512. So we were behaving as if AVX512 implied it anyway. Which means we'd allow VEX encoded 128/256 FMA when AVX512F was enabled but AVX512VL is off. Regardless of the FMA flag. EVEX to VEX also transforms scalar EVEX FMA instructions to their VEX versions even without the FMA flag. Similarly for 128/256 under AVX512VL. So this makes AVX512 imply FeatureFMA to make our current behavior explicit. All known CPUs that support AVX512 have VEX FMA instructions. llvm-svn: 317520	2017-11-06 22:49:01 +00:00
Graham Yiu	52a52a6cab	Fix buildbot breakages from r317503. Add parentheses to assignment when using result as a condition. llvm-svn: 317508	2017-11-06 21:04:19 +00:00
Graham Yiu	030621bbcb	Adds code to PPC ISEL lowering to recognize byte inserts from vector_shuffles, and use P9 shift and vector insert byte instructions instead of vperm. Extends tests from vector insert half-word. Differential Revision: https://reviews.llvm.org/D34497 llvm-svn: 317503	2017-11-06 20:18:30 +00:00
Guozhi Wei	e3b8d9a312	[PPC] Use xxbrd to speed up bswap64 Power doesn't have bswap instructions, so llvm generates following code sequence for bswap64. rotldi 5, 3, 16 rotldi 4, 3, 8 rotldi 9, 3, 24 rotldi 10, 3, 32 rotldi 11, 3, 48 rotldi 12, 3, 56 rldimi 4, 5, 8, 48 rldimi 4, 9, 16, 40 rldimi 4, 10, 24, 32 rldimi 4, 11, 40, 16 rldimi 4, 12, 48, 8 rldimi 4, 3, 56, 0 But Power9 has vector bswap instructions, they can also be used to speed up scalar bswap intrinsic. With this patch, bswap64 can be translated to: mtvsrdd 34, 3, 3 xxbrd 34, 34 mfvsrld 3, 34 Differential Revision: https://reviews.llvm.org/D39510 llvm-svn: 317499	2017-11-06 19:09:38 +00:00
Matt Arsenault	4f6318fe1b	AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32 llvm-svn: 317492	2017-11-06 17:04:37 +00:00
Sanjay Patel	629c411538	[IR] redefine 'UnsafeAlgebra' / 'reassoc' fast-math-flags and add 'trans' fast-math-flag As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html and again more recently: http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html ...this is a step in cleaning up our fast-math-flags implementation in IR to better match the capabilities of both clang's user-visible flags and the backend's flags for SDNode. As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the 'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic reassociation - 'AllowReassoc'. We're also adding a bit to allow approximations for library functions called 'ApproxFunc' (this was initially proposed as 'libm' or similar). ...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits), but that's apparently already used for other purposes. Also, I don't think we can just add a field to FPMathOperator because Operator is not intended to be instantiated. We'll defer movement of FMF to another day. We keep the 'fast' keyword. I thought about removing that, but seeing IR like this: %f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2 ...made me think we want to keep the shortcut synonym. Finally, this change is binary incompatible with existing IR as seen in the compatibility tests. This statement: "Newer releases can ignore features from older releases, but they cannot miscompile them. For example, if nsw is ever replaced with something else, dropping it would be a valid way to upgrade the IR." ( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility ) ...provides the flexibility we want to make this change without requiring a new IR version. Ie, we're not loosening the FP strictness of existing IR. At worst, we will fail to optimize some previously 'fast' code because it's no longer recognized as 'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'. Note: an inter-dependent clang commit to use the new API name should closely follow commit. Differential Revision: https://reviews.llvm.org/D39304 llvm-svn: 317488	2017-11-06 16:27:15 +00:00
Simon Pilgrim	ad9b9720e8	[X86][SSE] Merge combineExtractVectorElt_SSE into combineExtractVectorElt. NFCI. We still early-out for X86ISD::PEXTRW/X86ISD::PEXTRB so no actual change in behaviour, but it'll make it easier to add support in a future patch. llvm-svn: 317485	2017-11-06 15:28:25 +00:00
Simon Pilgrim	14450720e6	[X86][SSE] Combine EXTRACT_VECTOR_ELT with combineExtractWithShuffle before XFormVExtractWithShuffleIntoLoad combineExtractWithShuffle can handle more complex shuffles/bitcasts than we can with the equivalent code in XFormVExtractWithShuffleIntoLoad. Mainly a compile time improvement now (combineExtractWithShuffle combines will have always failed late on inside XFormVExtractWithShuffleIntoLoad), and will let us merge combineExtractVectorElt_SSE in a future commit. llvm-svn: 317481	2017-11-06 14:34:19 +00:00
Yaxun Liu	cc56a8b108	[AMDGPU] Change alloca addr space of r600 to 5 for amdgiz environment Differential Revision: https://reviews.llvm.org/D39657 llvm-svn: 317479	2017-11-06 14:32:33 +00:00
Jonas Paulsson	e54cc1a436	[SystemZ] implement hasDivRemOp() SystemZ can do division and remainder in a single instruction for scalar integer types, which are now reflected by returning true in this hook for those cases. Review: Ulrich Weigand llvm-svn: 317477	2017-11-06 13:10:31 +00:00
Yaxun Liu	1ac16619d2	[AMDGPU] Fix assertion due to assuming pointer in default addr space is 32 bit The backend assumes pointer in default addr space is 32 bit, which is not true for the new addr space mapping and causes assertion for unresolved functions. This patch fixes that. Differential Revision: https://reviews.llvm.org/D39643 llvm-svn: 317476	2017-11-06 13:01:33 +00:00
Simon Dardis	169df4e24b	[mips] Add movep for microMIPS32R6 and fix microMIPS32r3 version Previously, the 'movep' instruction was defined for microMIPS32r3 and shared that definition with microMIPS32R6. 'movep' was re-encoded for microMIPS32r6, so this patch provides the correct encoding. Secondly, correct the encoding of the 'rs' and 'rt' operands which have an instruction specific encoding for the registers those operands accept. Finally, correct the decoding of the 'dst_regs' operand which was extracting the relevant field from the instruction, but was actually extracting the field from the already extracted field. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39495 llvm-svn: 317475	2017-11-06 12:59:53 +00:00
Mohammed Agabaria	6691758364	[LV][X86] update the cost of interleaving mem. access of floats Recommit: This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. fixed the location of the lit test it works with make check-all. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317471	2017-11-06 10:56:20 +00:00
Simon Dardis	e57795384c	[mips] Fix PR35140 Mark all symbols involved with TLS relocations as being TLS symbols. This resolves PR35140. Thanks to Alex Crichton for reporting the issue! Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39591 llvm-svn: 317470	2017-11-06 10:50:04 +00:00
Uriel Korach	bb86686a8b	[X86][AVX512] Improve lowering of AVX512 test intrinsics Added TESTM and TESTNM to the list of instructions that already zero the unused upper bits and do not need the redundant shift left and shift right instructions afterwards. Added a pattern for TESTM and TESTNM in iselLowering, so now icmp(neq,and(X,Y), 0) folds into TESTM and icmp(eq,and(X,Y), 0) folds into TESTNM. This commit is a preparation for lowering the test and testn X86 intrinsics to IR. Differential Revision: https://reviews.llvm.org/D38732 llvm-svn: 317465	2017-11-06 09:22:38 +00:00
Uriel Korach	eb47d95d52	[X86] Replace duplicate function call with variable. NFC Change from: if (N->getOperand(0).getValueType() == MVT::v8i32 || N->getOperand(0).getValueType() == MVT::v8f32) to: EVT OpVT = N->getOperand(0).getValueType(); if (OpVT == MVT::v8i32 || OpVT == MVT::v8f32) Change-Id: I5a105f8710b73a828e6cfcd55fac2eae6153ce25 llvm-svn: 317464	2017-11-06 08:32:45 +00:00
Zvi Rackover	3122698040	X86 ISel: Basic support for variable-index vector permutations Summary: Try to lower a BUILD_VECTOR composed of extract-extract chains that can be reasoned to be a permutation of a vector by indices in a non-constant vector. We saw this pattern created by ISPC, which resorts to creating it due to the requirement that shufflevector's mask operand be a constant vector. I didn't check this but we could possibly use this pattern for lowering the X86 permute C-intrinsics instead of llvm.x86 intrinsics. This change can be followed by more improvements: 1. Handle vectors with undef elements. 2. Utilize pshufb and zero-mask-blending to support more efficient construction of vectors with constant-0 elements. 3. Use smaller-element vectors of same width, and "interpolate" the indices, when no native operation is available. Reviewers: RKSimon, craig.topper Reviewed By: RKSimon Subscribers: chandlerc, DavidKreitzer Differential Revision: https://reviews.llvm.org/D39126 llvm-svn: 317463	2017-11-06 08:25:46 +00:00
Jina Nahias	3844f1ad5c	Revert "adding a pattern for broadcastm" This reverts commit r317457. Change-Id: If07f1fca1e3453d16c1dac906e87768661384e91 llvm-svn: 317462	2017-11-06 07:48:58 +00:00
Jina Nahias	7b705f1f91	[x86][AVX512] Lowering Broadcastm intrinsics to LLVM IR This patch, together with a matching clang patch (https://reviews.llvm.org/D38683), implements the lowering of X86 broadcastm intrinsics to IR. Differential Revision: https://reviews.llvm.org/D38684 Change-Id: I709ac0b34641095397e994c8ff7e15d1315b3540 llvm-svn: 317458	2017-11-06 07:09:24 +00:00
Jina Nahias	9c6561b648	adding a pattern for broadcastm Change-Id: I6551fb13879e098aed74de410e29815cf37d9ab5 llvm-svn: 317457	2017-11-06 07:09:09 +00:00
Craig Topper	70eaeae7f0	[X86] Use EVEX encoded intrinsics for legacy FMA intrinsics when possible. llvm-svn: 317454	2017-11-06 05:48:26 +00:00
Craig Topper	07dac55d95	[X86] Add scalar FMA ISD nodes without rounding mode. NFC Next step is to use them for the legacy FMA scalar intrinsics as well. This will enable the legacy intrinsics to use EVEX encoded opcodes and the extended registers. llvm-svn: 317453	2017-11-06 05:48:25 +00:00
Craig Topper	eff606cc0e	[X86] Use EVEX encoded instructions for legacy scalar sqrt intrinsics. Fixes PR35161. llvm-svn: 317445	2017-11-06 04:04:01 +00:00
Craig Topper	d6471cb934	[X86] Add missing predicate to a pattern. NFC Other patterns had higher priority so this wasn't noticed. But we shouldn't be dependent on pattern order. llvm-svn: 317442	2017-11-05 21:14:06 +00:00
Craig Topper	4e2f53511a	[X86] Remove some more RCP and RSQRT patterns from InstrAVX512.td that I missed in r317413. llvm-svn: 317441	2017-11-05 21:14:05 +00:00
Craig Topper	948c39c480	[X86] Fix outdated comment. NFC llvm-svn: 317440	2017-11-05 21:14:04 +00:00
Mohammed Agabaria	acd69dbc7c	[REVERT][LV][X86] update the cost of interleaving mem. access of floats reverted my changes will be committed later after fixing the failure This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317433	2017-11-05 09:36:54 +00:00
Mohammed Agabaria	f74c767de6	[LV][X86] update the cost of interleaving mem. access of floats This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317432	2017-11-05 09:06:23 +00:00
Craig Topper	692c8efe30	[X86] Don't use RCP14 and RSQRT14 for reciprocal estimations or for legacy SSE rcp/rsqrt intrinsics when AVX512 features are enabled. Summary: AVX512 added RCP14 and RSQRT instructions which improve accuracy over the legacy RCP and RSQRT instruction, but not enough accuracy to remove the need for a Newton Raphson refinement. Currently we use these new instructions for the legacy packed SSE intrinsics, but not the scalar intrinsics. And we use it for fast math optimization of division and reciprocal sqrt. I think switching the legacy intrinsics may be surprising to the user since it changes the answer based on which processor you're using regardless of any fastmath settings. It's also weird that we did something different between scalar and packed. As far as the reciprocal estimation, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc and icc and godbolt suggests they don't change which instructions they use here. This patch adds new X86ISD nodes for the RCP14/RSQRT14 and uses those for the new intrinsics. Leaving the old intrinsics to use the old instructions. Going forward I think our focus should be on -Supporting 512-bit vectors, which will have to use the RCP14/RSQRT14. -Using RSQRT28/RCP28 to remove the Newton Raphson step on processors with AVX512ER -Supporting double precision. Reviewers: zvi, DavidKreitzer, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39583 llvm-svn: 317413	2017-11-04 18:26:41 +00:00
Craig Topper	e5d44cefea	[X86] Teach EVEX->VEX pass to turn SHUFI32X4/SHUFF32X4/SHUFI64X/SHUFF64X2 into VPERM2F128/VPERM2I128. This recovers some of the tests that were changed by r317403. llvm-svn: 317410	2017-11-04 18:10:03 +00:00
Yaxun Liu	0d9673cff2	[AMDGPU] Remove hardcoded address space value from AMDGPULibFunc AMDGPULibFunc hardcodes address space values of the old address space mapping, which causes invalid addrspacecast instructions and undefined functions in APPSDK sample MonteCarloAsianDP. This patch fixes that. Differential Revision: https://reviews.llvm.org/D39616 llvm-svn: 317409	2017-11-04 17:37:43 +00:00
Craig Topper	a96d62b360	[X86] Teach shuffle lowering to use 256-bit SHUF128 when possible. This allows masked operations to be used and allows the register allocator to use YMM16-31 if necessary. As a follow up I'll look into teaching EVEX->VEX how to turn this back into PERM2X128 if any of the additional features don't work out. llvm-svn: 317403	2017-11-04 06:44:47 +00:00
Craig Topper	d21a53f246	[X86] Give unary PERMI priority over SHUF128 in lowerV8I64VectorShuffle to make it possible to fold a load. llvm-svn: 317382	2017-11-03 22:48:13 +00:00
David Blaikie	1be62f0327	Move TargetFrameLowering.h to CodeGen where it's implemented This header already includes a CodeGen header and is implemented in lib/CodeGen, so move the header there to match. This fixes a link error with modular codegeneration builds - where a header and its implementation are circularly dependent and so need to be in the same library, not split between two like this. llvm-svn: 317379	2017-11-03 22:32:11 +00:00
Aaron Ballman	ecf0e95267	Add llvm::for_each as a range-based extensions to <algorithm> and make use of it in some cases where it is a more clear alternative to std::for_each. llvm-svn: 317356	2017-11-03 20:01:25 +00:00
Evandro Menezes	9dcf099944	[AArch64] Fix the number of iterations for the Newton series The number of iterations was incorrectly determined for DP FP vector types and the tests were insufficient to flag this issue. Differential revision: https://reviews.llvm.org/D39507 llvm-svn: 317349	2017-11-03 18:56:36 +00:00
Simon Dardis	d3b9f61c52	[mips] Match 'ins' and its variants with C++ code Change the ISel matching of 'ins', 'dins[mu]' from tablegen code to C++ code. This resolves an issue where ISel would select 'dins' instead of 'dinsm' when the instruction's size and position were individually in range but their sum was out of range according to the ISA specification. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39117 llvm-svn: 317331	2017-11-03 15:35:13 +00:00
Andrew V. Tischenko	0916c6b654	Fix for Bug 34475 - LOCK/REP/REPNE prefixes emitted as instruction on their own. Differential Revision: https://reviews.llvm.org/D39546 llvm-svn: 317330	2017-11-03 15:25:13 +00:00
Simon Pilgrim	ae1f013495	[X86][SSE] Add PACKUS support to combineVectorTruncation Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value. We have to account for pre-SSE41 targets not supporting PACKUSDW llvm-svn: 317315	2017-11-03 11:33:48 +00:00
Diana Picus	acf4bf21ab	[ARM GlobalISel] Move the check for Thumb higher up We're currently bailing out for Thumb targets while lowering formal parameters, but there used to be some other checks before it, which could've caused some functions (e.g. those without formal parameters) to sneak through unnoticed. llvm-svn: 317312	2017-11-03 10:30:12 +00:00
Martin Storsjo	9befcd7d8d	[AArch64] Use dwarf exception handling on MinGW Ideally we should probably produce WinEH here as well, but until then, we can use dwarf exceptions, without any further changes required in clang, libunwind or libcxxabi. Differential Revision: https://reviews.llvm.org/D39535 llvm-svn: 317304	2017-11-03 07:33:20 +00:00
Craig Topper	333897ec31	[X86] Remove PALIGNR/VALIGN handling from combineBitcastForMaskedOp and move to isel patterns instead. Prefer 128-bit VALIGND/VALIGNQ over PALIGNR during lowering when possible. llvm-svn: 317299	2017-11-03 06:48:02 +00:00
Sriraman Tallam	7cdb10f1aa	Avoid PLT for external calls when attribute nonlazybind is used. Differential Revision: https://reviews.llvm.org/D39065 llvm-svn: 317292	2017-11-03 00:10:19 +00:00
Quentin Colombet	b6afac1f9a	[AArch64][RegisterBankInfo] Add mapping for G_FPEXT. This fixes http://llvm.org/PR32560. We were missing a description for half floating point type and as a result were using the FPR 32 mapping. Because of the size mismatch the generic code was complaining that the default mapping is not appropriate. Fix the mapping description so that the default mapping can be properly applied. llvm-svn: 317287	2017-11-02 23:38:19 +00:00
Quentin Colombet	619d649878	[AArch64][RegisterBankInfo] Add FPR16 support in value mapping. NFC. llvm-svn: 317286	2017-11-02 23:38:13 +00:00
Craig Topper	086c04c8a7	[X86] Give AVX512VL instructions priority over their AVX equivalents. I thought we had gotten all these priority bugs worked out, but I guess not. llvm-svn: 317283	2017-11-02 23:23:37 +00:00
Konstantin Zhuravlyov	275a4f76c4	AMDGPU: Fix warning discovered by r317266 [-Wunused-private-field] llvm-svn: 317280	2017-11-02 22:35:22 +00:00
Krzysztof Parzyszek	058014fca5	[Hexagon] Prefer L2_loadrub_io over L4_loadrub_rr If the offset is an immediate, avoid putting it in a register to get Rs+Rt<<#0. llvm-svn: 317275	2017-11-02 21:56:59 +00:00
Konstantin Zhuravlyov	b695cd41b3	AMDGPU: Remove outdated fixme (it was already fixed) llvm-svn: 317266	2017-11-02 20:48:06 +00:00
Simon Dardis	725acb2d91	[mips] Use register scavenging with MSA. MSA stores and loads to the stack are more likely to require an emergency GPR spill slot due to the smaller offsets available with those instructions. Handle this by overestimating the size of the stack by determining the largest offset presuming that all callee save registers are spilled and accounting of incoming arguments when determining whether an emergency spill slot is required. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39056 llvm-svn: 317204	2017-11-02 12:47:22 +00:00
Sam Parker	242052c6b4	[ARM] and, or, xor and add with shl combine The generic dag combiner will fold: (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2) This can create constants which are too large to use as an immediate. Many ALU operations are also capable of performing the shl, so we can unfold the transformation to prevent a mov imm instruction from being generated. Other patterns, such as b + ((a << 1) | 510), can also be simplified in the same manner. Differential Revision: https://reviews.llvm.org/D38084 llvm-svn: 317197	2017-11-02 10:43:10 +00:00
Andrew V. Tischenko	3c8bf5ec37	The patch updates sched numbers for YMM AVX instrs such as VMOVx, VORx, VXOR, VPERMILx, VBROADCASTx, etc. PR32857 should be closed. Differential Revision: https://reviews.llvm.org/D39227 llvm-svn: 317196	2017-11-02 10:33:41 +00:00
Petar Jovanovic	bb5c84fb57	Revert "Correct dwarf unwind information in function epilogue for X86" This reverts r317100 as it introduced sanitizer-x86_64-linux-autoconf buildbot failure (build #15606). llvm-svn: 317136	2017-11-01 23:05:52 +00:00
Craig Topper	3837322a6b	[X86] Use foreach in X86.td to combine some of the CPU names that are obviously aliases. NFC llvm-svn: 317134	2017-11-01 22:15:49 +00:00
Craig Topper	7a754c4622	[X86] Add CMOV feature to 'i686' processor, making it a proper alias of pentiumpro which I believe it should be. This is consistent with current gcc behavior. llvm-svn: 317133	2017-11-01 22:15:40 +00:00
Simon Pilgrim	e152c2c447	[X86][SSE] Add PACKUS support to LowerTruncate Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value. We have to account for pre-SSE41 targets not supporting PACKUSDW llvm-svn: 317128	2017-11-01 21:52:29 +00:00
Craig Topper	4e56ba271e	[X86] Add custom code to EVEX to VEX pass to turn unmasked 128-bit VPALIGND/Q into VPALIGNR if the extended registers aren't being used. This will enable us to prefer VALIGND/Q during shuffle lowering in order to get the extended register encoding space when BWI isn't available. But if we end up not using the extended registers we can switch VPALIGNR for the shorter VEX encoding. Differential Revision: https://reviews.llvm.org/D39401 llvm-svn: 317122	2017-11-01 21:00:59 +00:00
Konstantin Zhuravlyov	435151ad75	AMDGPU: Fix set but not used warnings related to AMDGPUAS Differential Revision: https://reviews.llvm.org/D39499 llvm-svn: 317114	2017-11-01 19:12:38 +00:00
Craig Topper	ca1aa83cbe	[X86] Prevent fast isel from folding loads into the instructions listed in hasPartialRegUpdate. This patch moves the check for opt size and hasPartialRegUpdate into the lower level implementation of foldMemoryOperandImpl to catch the entry point that fast isel uses. We're still folding undef register instructions in AVX that we should also probably disable, but that's a problem for another patch. Unfortunately, this requires reordering a bunch of functions which is why the diff is so large. I can do the function reordering separately if we want. Differential Revision: https://reviews.llvm.org/D39402 llvm-svn: 317112	2017-11-01 18:10:06 +00:00
Graham Yiu	671526148c	Adds code to PPC ISEL lowering to recognize half-word inserts from vector_shuffles, and use P9 shift and vector insert instructions instead of vperm. Differential Revision: https://reviews.llvm.org/D34160 llvm-svn: 317111	2017-11-01 18:06:56 +00:00
Craig Topper	5ae677e102	[X86] Add 64-bit int to float/double conversion with AVX to X86FastISel::X86SelectSIToFP Summary: [X86] Teach fast isel to handle i64 sitofp with AVX. For some reason we only handled i32 sitofp with AVX. But with SSE only we support i64 so we should do the same with AVX. Also add i686 command lines for the 32-bit tests. 64-bit tests are in a separate file to avoid a fast-isel abort failure in 32-bit mode. Reviewers: RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39450 llvm-svn: 317102	2017-11-01 16:23:06 +00:00
Andrew V. Tischenko	3d971e39f8	Update VCVTx, VMOVNTPx and VROUNDYPx instructions scheduling on btver2. Differential Revision: https://reviews.llvm.org/D39059 llvm-svn: 317101	2017-11-01 16:10:20 +00:00
Petar Jovanovic	f2faee92aa	Correct dwarf unwind information in function epilogue for X86 This patch aims to provide correct dwarf unwind information in function epilogue for X86. It consists of two parts. The first part inserts CFI instructions that set appropriate cfa offset and cfa register in emitEpilogue() in X86FrameLowering. This part is X86 specific. The second part is platform independent and ensures that: - CFI instructions do not affect code generation - Unwind information remains correct when a function is modified by different passes. This is done in a late pass by analyzing information about cfa offset and cfa register in BBs and inserting additional CFI directives where necessary. Changed CFI instructions so that they: - are duplicable - are not counted as instructions when tail duplicating or tail merging - can be compared as equal Added CFIInstrInserter pass: - analyzes each basic block to determine cfa offset and register valid at its entry and exit - verifies that outgoing cfa offset and register of predecessor blocks match incoming values of their successors - inserts additional CFI directives at basic block beginning to correct the rule for calculating CFA Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them. CFIInstrInserter is currently run only on X86, but can be used by any target that implements support for adding CFI instructions in epilogue. Patch by Violeta Vukobrat. Differential Revision: https://reviews.llvm.org/D35844 llvm-svn: 317100	2017-11-01 16:04:11 +00:00
Simon Pilgrim	778810eb42	[X86][SSE] Begun generalizing truncateVectorWithPACKSS to work with PACKSS/PACKUS functions Renamed to truncateVectorWithPACK llvm-svn: 317098	2017-11-01 15:31:51 +00:00
Roger Ferrer Ibanez	9dfbc10522	Revert r313618 "[ARM] Use ADDCARRY / SUBCARRY" That change causes PR35103, so reverting until I figure it out. llvm-svn: 317092	2017-11-01 14:06:57 +00:00
NAKAMURA Takumi	1657f2ad99	Fix warnings discovered by rL317076. [-Wunused-private-field] llvm-svn: 317091	2017-11-01 13:47:55 +00:00
NAKAMURA Takumi	f7d7a59b9e	Suppress a warning discovered by rL317076. [-Wunused-private-field] llvm-svn: 317090	2017-11-01 13:47:51 +00:00
Simon Pilgrim	f657ba0cb6	[X86][SSE] Truncate with PACKSS any input with sufficient sign-bits So far we've only been using PACKSS truncations with 'all-bits or zero-bits' patterns (vector comparison results etc.). When really we can safely use it for any case as long as the number of sign bits reach down to the last 16-bits (or 8-bits if we're truncating to bytes). The next steps after this is add the equivalent support for PACKUS and to support packing to sub-128 bit vectors for truncating stores etc. Differential Revision: https://reviews.llvm.org/D39476 llvm-svn: 317086	2017-11-01 11:47:44 +00:00
Craig Topper	688f0ca6a7	[X86] Add more type qualifiers to INSERT_SUBREG operations in rotate patterns so they don't get created with a v64i8 type. Not sure why tablegen didn't error on this. Fixes PR35158. llvm-svn: 317079	2017-11-01 07:11:32 +00:00
Craig Topper	a827f84dcc	[X86] Add AVX512 support to X86FastISel::fastMaterializeFloatZero. llvm-svn: 317059	2017-11-01 00:47:45 +00:00
Benjamin Kramer	f9ab3ddb8f	[AMDGPU] Clean up symbols in the global namespace. llvm-svn: 317051	2017-10-31 23:21:30 +00:00
Marek Olsak	5914ece6aa	AMDGPU: Select s_buffer_load_dword with a non-constant SGPR offset Summary: Apps that benefit: - alien isolation - bioshock infinite - civilization: beyond earth - company of heroes 2 - dirt showdown - dota 2 - F1 2015 - grid autosport - hitman - legend of grimrock - serious sam 3: bfe - shadow warrior - talos principle - total war: warhammer - UE4 demos: effects cave, elemental, sun temple Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38914 llvm-svn: 317038	2017-10-31 21:06:42 +00:00
Reid Kleckner	39970069b1	[X86][AsmParser] Treat '%' as the modulo operator under Intel syntax It can't be a register prefix, anyway. This is consistent with the masm docs on MSDN: https://msdn.microsoft.com/en-us/library/t4ax90d2.aspx This is a straight-forward extension of our support for "MOD" implemented in https://reviews.llvm.org/D33876 / r306425 llvm-svn: 317011	2017-10-31 16:47:38 +00:00
Simon Pilgrim	f3c33ca83e	[X86][SSE] Add VSRLI/VSRAI/VSLLI demanded elts support to computeKnownBits/ComputeNumSignBits Mainly a perf improvement as most combines will have occurred before we lower to these instructions llvm-svn: 317005	2017-10-31 16:06:21 +00:00
Michael Zuckerman	9e58831cb8	[AVX512] Adding new patterns for extract_subvector of vXi1 extract subvector of vXi1 from vYi1 is poorly supported by LLVM and most of the time end with an assertion. This patch fixes this issue by adding new patterns to the TD file. Reviewers: 1. guyblank 2. igorb 3. zvi 4. ayman 5. craig.topper Differential Revision: https://reviews.llvm.org/D39292 Change-Id: Ideb4d7e946c8d40cfce2920891f2d89fe64c58f8 llvm-svn: 316981	2017-10-31 10:00:19 +00:00
Craig Topper	beed653135	[X86] Make AVX512_512_SET0 XMM16-31 lower to 128-bit XOR when AVX512VL is enabled. Use 128-bit VLX instruction when VLX is enabled. Unfortunately, this weakens our ability to do domain fixing when AVX512DQ is not enabled, but it is consistent with our 256-bit behavior. Maybe we should add custom handling to domain fixing to allow EVEX integer XOR/AND/OR/ANDN to switch to VEX encoded fp instructions if the high registers aren't being used? llvm-svn: 316978	2017-10-31 06:01:04 +00:00
Craig Topper	668b1ab6f1	[X86] Clang-format some code. NFC llvm-svn: 316973	2017-10-31 02:34:29 +00:00
Javed Absar	d13d419d4a	[AArch64]: range loopify frame-lowering llvm-svn: 316960	2017-10-30 22:00:06 +00:00
Craig Topper	9f01f6093c	[X86] Add AVX512 support to fast isel's X86ChooseCmpOpcode. llvm-svn: 316955	2017-10-30 21:09:19 +00:00
Stefan Pintilie	6262fd4b0a	Revert "[PowerPC] Try to simplify a Swap if it feeds a Splat" Revert r316478. A test case has failed. Will recommit this change once we find and fix the failure. This reverts commit 7c330fabaedaba3d02c58bc3cc1198896c895f34. llvm-svn: 316952	2017-10-30 19:55:38 +00:00
Jina Nahias	5bf6620b15	[X86][AVX512] Adding a pattern for broadcastm intrinsic. Differential Revision: https://reviews.llvm.org/D38312 Change-Id: I71c8605a8e4c98013ef25289694afc5cfd46bb0b llvm-svn: 316921	2017-10-30 16:37:28 +00:00
Rafael Espindola	6f36637be0	Move isDSOLocal check and add a comment. llvm-svn: 316920	2017-10-30 16:32:31 +00:00
Fangrui Song	2696db90d1	[PPC CodeGen] Fix the bitreverse.i64 intrinsic. Summary: The two 32-bit words were swapped. Update a test omitted in reverted r316270. Reviewers: jtony, aaron.ballman Subscribers: nemanjai, kbarton Differential Revision: https://reviews.llvm.org/D39163 llvm-svn: 316916	2017-10-30 16:03:44 +00:00
Craig Topper	4e13d4de52	[X86] Make sure we don't create locked inc/dec instructions when the carry flag is being used. Summary: INC/DEC don't update the carry flag so we need to make sure we don't try to use it. This patch introduces new X86ISD opcodes for locked INC/DEC. Teaches lowerAtomicArithWithLOCK to emit these nodes if INC/DEC is not slow or the function is being optimized for size. An additional flag is added that allows the INC/DEC to be disabled if the caller determines that the carry flag is being requested. The test_sub_1_cmp_1_setcc_ugt test is currently showing this bug. The other test case changes are recovering cases that were regressed in r316860. This should fully fix PR35068 finishing the fix started in r316860. Reviewers: RKSimon, zvi, spatel Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39411 llvm-svn: 316913	2017-10-30 14:51:37 +00:00
Craig Topper	367cc12fa9	[X86] Remove AVX512 early out from X86FastISel::X86SelectCmp. This shouldn't be needed anymore since i1 isn't a legal type. llvm-svn: 316912	2017-10-30 14:50:11 +00:00
Yaxun Liu	c928f2a6d4	[AMDGPU] Emit metadata for hidden arguments for kernel enqueue Identifies kernels which perform device side kernel enqueues and emits metadata for the associated hidden kernel arguments. Such kernels are marked with calls-enqueue-kernel function attribute by AMDGPUOpenCLEnqueueKernelLowering pass and later on hidden kernel arguments metadata HiddenDefaultQueue and HiddenCompletionAction are emitted for them. Differential Revision: https://reviews.llvm.org/D39255 llvm-svn: 316907	2017-10-30 14:30:28 +00:00
Clement Courbet	b2c3eb8cf1	[CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2). - Targets that want to support memcmp expansions now return the list of supported load sizes. - Expansion codegen does not assume that all power-of-two load sizes smaller than the max load size are valid. For example, this is not the case for x86(32bit)+sse2. Fixes PR34887. llvm-svn: 316905	2017-10-30 14:19:33 +00:00
Krzysztof Parzyszek	bef1c56724	[Hexagon] Allow the RDF optimizations to be run in .mir testcases llvm-svn: 316904	2017-10-30 14:11:52 +00:00
Javed Absar	5cde1ccb29	[GlobalISel\|ARM] : Allow legalizing G_FSUB Adding support for VSUB. Reviewed by: @rovka Differential Revision: https://reviews.llvm.org/D39261 llvm-svn: 316902	2017-10-30 13:51:56 +00:00
Andrew V. Tischenko	f94da596a7	Invalid use of 'w' suffix on push and pop using 64-bit register. Differential Revision: https://reviews.llvm.org/D38626 llvm-svn: 316898	2017-10-30 12:02:06 +00:00
Jina Nahias	e63db55c67	Revert "[X86][AVX512] Adding a pattern for broadcastm intrinsic." This reverts commit r316890. Change-Id: I683cceee9848ef309b452293086b1f26a941950d llvm-svn: 316894	2017-10-30 10:35:53 +00:00
Jina Nahias	70280f9a0d	[X86][AVX512] Adding a pattern for broadcastm intrinsic. Differential Revision: https://reviews.llvm.org/D38312 Change-Id: I6551fb13879e098aed74de410e29815cf37d9ab5 llvm-svn: 316890	2017-10-30 09:59:52 +00:00
Craig Topper	85bcc297c3	[X86] Rearrange code in X86InstrInfo.cpp to put all the foldMemoryOperandImpl methods together without partial/undef register handling in the middle. NFC I have a future patch that wants to make use of one of the partial functions in one of the earlier memory folding methods and the current ordering prevents that. llvm-svn: 316883	2017-10-30 04:39:18 +00:00
Craig Topper	c848355335	[X86] Simplify code by removing an unnecessary temporary variable. NFC llvm-svn: 316882	2017-10-30 03:35:44 +00:00
Craig Topper	730414b0ca	[X86] Move some EVEX->VEX code to a helper function to prepare for a future patch. NFC llvm-svn: 316881	2017-10-30 03:35:43 +00:00
Craig Topper	495a1bc893	[X86] Remove combine that turns X86ISD::LSUB into X86ISD::LADD. Update patterns that depended on this. If the carry flag is being used, this transformation isn't safe. This does prevent some test cases from using DEC now, but I'll try to look into that separately. Fixes PR35068. llvm-svn: 316860	2017-10-29 06:51:04 +00:00
Craig Topper	7a60e29185	[X86] Fix typo in comment. NFC llvm-svn: 316859	2017-10-29 06:51:02 +00:00
Craig Topper	912f3b8e4b	[X86] Use the extended vector register classes in fast isel with AVX512F/VL. llvm-svn: 316857	2017-10-29 05:14:26 +00:00
Craig Topper	5f2289a13c	[X86] Add AVX512 support to X86FastISel::X86SelectFPExt and X86FastISel::X86SelectFPTrunc. llvm-svn: 316856	2017-10-29 02:50:31 +00:00
Craig Topper	1e30d783dd	[X86] Add AVX512 support to X86FastISel::X86MaterializeFP llvm-svn: 316853	2017-10-29 02:18:41 +00:00
Craig Topper	0692ca4bd2	[X86] Remove invalid code from LowerVSELECT. This code attempted to say that v8i16/v16i16 VSELECT is legal if BWI and VLX are enabled, but the only way we could reach this point is if the condition was not a vXi1 type. Which means it really wasn't legal. We don't have any tests that exercise this code. So I'm hoping it wasn't really reachable. llvm-svn: 316851	2017-10-28 23:10:13 +00:00
Simon Pilgrim	294f88dfa0	[X86][SSE] Combine 128-bit target shuffles to PACKSS/PACKUS. llvm-svn: 316845	2017-10-28 20:51:27 +00:00
Simon Pilgrim	bd3852aa5e	[X86][SSE] Split off matchVectorShuffleWithPACK. NFCI. Split matchVectorShuffleWithPACK from lowerVectorShuffleWithPACK so that we can reuse it for target shuffle combines llvm-svn: 316844	2017-10-28 20:27:22 +00:00
Craig Topper	40f0584f08	[X86] Fix a mistake in the X86ISelDAGToDAG.cpp code for MUL8r/IMUL8r. I think this code is unreachable due to some promotions that occur elsewhere. I'll look into that to be sure, but for now I thought I should at least fix the obvious typo. llvm-svn: 316840	2017-10-28 19:56:57 +00:00
Craig Topper	202b559ae0	[X86] Replace some default cases in X86SelectShift with llvm_unreachable. llvm-svn: 316839	2017-10-28 19:56:56 +00:00
Sanjay Patel	b049173157	[SimplifyCFG] use pass options and remove the latesimplifycfg pass This is no-functional-change-intended. This is repackaging the functionality of D30333 (defer switch-to-lookup-tables) and D35411 (defer folding unconditional branches) with pass parameters rather than a named "latesimplifycfg" pass. Now that we have individual options to control the functionality, we could decouple when these fire (but that's an independent patch if desired). The next planned step would be to add another option bit to disable the sinking transform mentioned in D38566. This should also make it clear that the new pass manager needs to be updated to limit simplifycfg in the same way as the old pass manager. Differential Revision: https://reviews.llvm.org/D38631 llvm-svn: 316835	2017-10-28 18:43:07 +00:00
Simon Pilgrim	25808c303f	[X86][SSE] Rename truncateVectorCompareWithPACKSS to truncateVectorWithPACKSS. NFC. We no longer rely on the vector source being a comparison result, just have sufficient sign bits. llvm-svn: 316834	2017-10-28 17:59:56 +00:00
Craig Topper	f8b92661b8	[X86] Remove unneeded MVT::i1 related code from fast isel. llvm-svn: 316825	2017-10-28 05:52:23 +00:00
Tom Stellard	d0c6cf2e8c	AMDGPU/GlobalISel: Mark 32-bit G_FADD as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38439 llvm-svn: 316815	2017-10-27 23:57:41 +00:00
Krzysztof Parzyszek	4dc04e6a70	[Hexagon] Adjust patterns to reflect instruction selection preferences llvm-svn: 316804	2017-10-27 22:24:49 +00:00
David Blaikie	8699f71310	Add a few missing headers for modularization/IWYU/etc Several cases where class definitions are required for DenseMap pointer traits handling. llvm-svn: 316803	2017-10-27 22:12:46 +00:00
Rafael Espindola	2393c3b4e1	Handle undefined weak hidden symbols on all architectures. We were handling the non-hidden case in lib/Target/TargetMachine.cpp, but the hidden case was handled in architecture dependent code and only X86_64 and AArch64 were covered. While it is true that some code sequences in some ABIs might be able to produce the correct value at runtime, that doesn't seem to be the common case. I left the AArch64 code in place since it also forces a got access for non-pic code. It is not clear if that is needed, but it is probably better to change that in another commit. llvm-svn: 316799	2017-10-27 21:18:48 +00:00
Craig Topper	d69453290e	[X86] Remove fast-isel code for handling i8 shifts. This is handled by auto generated code. llvm-svn: 316797	2017-10-27 21:00:59 +00:00
Craig Topper	728fa7b4e2	[X86] Teach fastisel to use VLX VMOVNTDQA for v4f64 and 256-bit integers when available. This looks to have been missed from r280682. llvm-svn: 316790	2017-10-27 20:13:10 +00:00
Krzysztof Parzyszek	92a2635bbd	[Hexagon] Fix an incorrect assertion in HexagonConstExtenders.cpp Making sure that an instruction has fewer operands than required, then attempting to access one out of range is going to fail. llvm-svn: 316785	2017-10-27 18:52:28 +00:00
Simon Pilgrim	5e3808afa2	[X86][F16C] Fix btver2 AGU pipe scheduling Use the store AGU for stores, and the load AGU needs to be the first pipe for loads llvm-svn: 316771	2017-10-27 16:34:58 +00:00
David Blaikie	6265130054	InstructionSelectorImpl.h: Modularize/remove ODR violations by using a static member function to expose the debug name llvm-svn: 316715	2017-10-26 23:39:54 +00:00
Eli Friedman	d5dfb62de7	[ARM] Honor -mfloat-abi for libcall calling convention As far as I can tell, this matches gcc: -mfloat-abi determines the calling convention for all functions except those explicitly defined as soft-float in the ARM RTABI. This change only affects cases where the user specifies -mfloat-abi to override the default calling convention derived from the target triple. Fixes https://bugs.llvm.org//show_bug.cgi?id=34530. Differential Revision: https://reviews.llvm.org/D38299 llvm-svn: 316708	2017-10-26 21:42:32 +00:00
Craig Topper	b8d7d4d683	[X86] Improve handling of UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG to support 64-bit extensions. If the extend type is 64-bits, emit a 32-bit -> 64-bit extend after the UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG operation. This gives a shorter encoding for the second extend in the sext case, and allows us to completely remove the second extend in the zext case. This also adds known bit and num sign bits support for UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG. Differential Revision: https://reviews.llvm.org/D38275 llvm-svn: 316702	2017-10-26 21:12:03 +00:00
Craig Topper	8a2a104129	[X86] Teach the assembly parser to warn on duplicate registers in gather instructions. Fixes PR32238. Differential Revision: https://reviews.llvm.org/D39077 llvm-svn: 316700	2017-10-26 21:03:54 +00:00
Sanjay Patel	ac50f3e907	[x86] use an insert op to put one variable element into a constant of vectors Instead of loading (a potential ton of) scalar constants, load those as a vector and then insert into it. Differential Revision: https://reviews.llvm.org/D38756 llvm-svn: 316685	2017-10-26 18:27:55 +00:00
Yichao Yu	221dae31a5	Clear LastMappingSymbols and LastEMS(Info) when resetting the ARM(AArch64)ELFStreamer Summary: This causes a segfault on ARM when (I think) the pass manager is used multiple times. Reset sets the (last) current section to NULL without saving the corresponding LastEMSInfo back into the map. The next use of the streamer then saves the LastEMSInfo for the NULL section, leaving the LastEMSInfo mapping for the last current section (the one that was there before the reset) NULL, which causes the LastEMSInfo to be set to NULL when the section is being used again. The reuse of the section (pointer) might mean that the map was holding dangling pointers previously, which is why I went for clearing the map and resetting the info, making it as similar to the state right after the constructor runs as possible. The AArch64 one doesn't segfault (since LastEMS isn't a pointer) but it seems to have the same issue. The segfault is likely caused by https://reviews.llvm.org/D30724 which turns LastEMSInfo into a pointer. As mentioned above, it seems that the actual issue was older though. No test is included since the test is believed to be too complicated for such an obvious fix and not worth doing. Reviewers: llvm-commits, shankare, t.p.northover, peter.smith, rengolin Reviewed By: rengolin Subscribers: mgorny, aemerson, rengolin, javed.absar, kristof.beyls Differential Revision: https://reviews.llvm.org/D38588 llvm-svn: 316679	2017-10-26 17:36:43 +00:00
Sean Fertile	c70d28bff5	Represent runtime preemption in the IR. Currently we do not represent runtime preemption in the IR, which has several drawbacks: 1) The semantics of GlobalValues differ depending on the object file format you are targeting (as well as the relocation-model and -fPIE value). 2) We have no way of disabling inlining of run time interposable functions, since in the IR we only know if a function is link-time interposable. Because of this llvm cannot support elf-interposition semantics. 3) In LTO builds of executables we will have extra knowledge that a symbol resolved to a local definition and can't be preemptable, but have no way to propagate that knowledge through the compiler. This patch adds preemptability specifiers to the IR with the following meaning: dso_local --> means the compiler may assume the symbol will resolve to a definition within the current linkage unit and the symbol may be accessed directly even if the definition is not within this compilation unit. dso_preemptable --> means that the compiler must assume the GlobalValue may be replaced with a definition from outside the current linkage unit at runtime. To ease transitioning dso_preemptable is treated as a 'default' in that low-level codegen will still do the same checks it did previously to see if a symbol should be accessed indirectly. Eventually when IR producers emit the specifiers on all Globalvalues we can change dso_preemptable to mean 'always access indirectly', and remove the current logic. Differential Revision: https://reviews.llvm.org/D20217 llvm-svn: 316668	2017-10-26 15:00:26 +00:00
Marek Olsak	2232243863	AMDGPU: Handle s_buffer_load_dword hazard on SI Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D39171 llvm-svn: 316666	2017-10-26 14:43:02 +00:00
Simon Dardis	b633acac9f	[mips] Fix (dis)assembly of abs.fmt for micromips These instructions were previously marked as codegen only preventing them from being assembled as microMIPS or disassembled. Reviewers: atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D39123 llvm-svn: 316656	2017-10-26 11:36:54 +00:00
Simon Dardis	13452383cd	[mips] Fix PR35071 PR35071 exposed the fact that MipsInstrInfo::removeBranch did not walk past debug instructions when removing branches for the control flow optimizer, which led to duplicated conditional branches. If the target of the branch was a removable block, only the conditional branch in the terminating position would have its MBB operands updated, leaving the first branch with a dangling MBB operand. The MIPS long branch pass would then trigger an assertion when attempting to examine the instruction with dangling MBB operand. This resolves PR35071. Thanks to Alex Richardson for reporting the issue! Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39288 llvm-svn: 316654	2017-10-26 10:58:36 +00:00
Hiroshi Inoue	b72b1fb0de	[PowerPC] Use record-form instruction for Less-or-Equal -1 and Greater-or-Equal 1 Currently a record-form instruction is used for comparison of "greater than -1" and "less than 1" by modifying the predicate (e.g. LT 1 into LE 0) in addition to the naive case of comparison against 0. This patch also enables emitting a record-form instruction for "less than or equal to -1" (i.e. "less than 0") and "greater than or equal to 1" (i.e. "greater than 0") to increase the optimization opportunities. Differential Revision: https://reviews.llvm.org/D38941 llvm-svn: 316647	2017-10-26 09:01:51 +00:00
Craig Topper	0551556ed2	[AsmParser][TableGen] Add VariantID argument to the generated mnemonic spell check function so it can use the correct table based on variant. I'm considering implementing the mnemonic spell checker for x86, and that would require the separate intel and att variants. llvm-svn: 316641	2017-10-26 06:46:41 +00:00
Craig Topper	2a06028c0a	[AsmParser][TableGen] Make the generated mnemonic spell checker function a file local static function. Also only emit in targets that specifically request it. This is required so we don't get an unused static function error. llvm-svn: 316640	2017-10-26 06:46:40 +00:00
Craig Topper	619b15283d	[X86] Use correct type for return value of ComputeAvailableFeatures in the AsmParser. NFC There aren't enough used bits to make this a functional change, but we should fix it for consistency. llvm-svn: 316639	2017-10-26 06:46:38 +00:00
David Blaikie	cc7763ba92	Hexagon: Fold a single-use textual header into its use llvm-svn: 316604	2017-10-25 19:52:21 +00:00
Krzysztof Parzyszek	27056da9a8	[Hexagon] Account for negative offset when limiting max deviation In getOffsetRange, Max can be set to 0 to force the extender replacement to be at or below the original value. This would cause the new offset to be non-negative, which is preferred for memory instructions (to reduce the likelihood of it getting constant-extended due to predication). The problem happens when the range is shifted by an offset (present in the instruction being examined) and the offset is negative. The entire range for the allowable deviation will then be strictly negative. This creates a problem, since 0 is assumed to be a valid deviation. llvm-svn: 316601	2017-10-25 18:46:40 +00:00
Craig Topper	6fae2eedf3	[X86] Add avx512vpopcntdq to Knights Mill As indicated by Table 1-1 in Intel Architecture Instruction Set Extensions and Future Features Programming Reference from October 2017. llvm-svn: 316592	2017-10-25 17:10:32 +00:00
Simon Dardis	7af3edc4f4	[mips] Clean up some whitespace (NFC). Also test that my email address was updated. llvm-svn: 316575	2017-10-25 13:35:53 +00:00
Diana Picus	b35022121d	[ARM GlobalISel] Fix call opcodes We were generating BLX for all the calls, which was incorrect in most cases. Update ARMCallLowering to generate BL for direct calls, and BLX, BX_CALL or BMOVPCRX_CALL for indirect calls. llvm-svn: 316570	2017-10-25 11:42:40 +00:00
Sam Parker	1f742117bd	[ARM] OrCombineToBFI function Extract the functionality to combine OR to BFI into its own function. Differential Revision: https://reviews.llvm.org/D39001 llvm-svn: 316563	2017-10-25 08:37:33 +00:00
Sam Parker	ccb209bb97	[ARM] Swap cmp operands for automatic shifts Swap the compare operands if the lhs is a shift and the rhs isn't, as in arm and T2 the shift can be performed by the compare for its second operand. Differential Revision: https://reviews.llvm.org/D39004 llvm-svn: 316562	2017-10-25 08:33:06 +00:00
Martin Storsjo	373c8efa1e	[AArch64] Add support for dllimport of values and functions Previously, the dllimport attribute did the right thing in terms of treating it as a pointer to a value, but this makes sure the names get mangled properly, and calls to such functions load the function from the __imp_ pointer. This is based on SVN r212431 and r212430 where the same was implemented for ARM. Differential Revision: https://reviews.llvm.org/D38530 llvm-svn: 316555	2017-10-25 07:25:18 +00:00
Matt Arsenault	28f52e51f1	AMDGPU: Add max-mix-insts subtarget feature llvm-svn: 316553	2017-10-25 07:00:51 +00:00
Yonghong Song	9af998e86e	bpf: fix an uninitialized variable issue Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 316519	2017-10-24 21:36:33 +00:00
David Blaikie	c70b392e49	ARMAddressingModes.h: Don't mark header functions as file local llvm-svn: 316517	2017-10-24 21:29:21 +00:00
David Blaikie	4016da602e	HexagonDepTimingClasses.h: Don't mark header functions as file local llvm-svn: 316508	2017-10-24 21:29:16 +00:00
David Blaikie	75bda3006b	WebAssemblyAsmPrinter.h: Include WebAssemblyMachineFunctionInfo for use with MachineFunction::getInfo llvm-svn: 316507	2017-10-24 21:29:15 +00:00
David Blaikie	1032b51aa0	X86Operand.h: Include X86MCTargetDesc.h for SSE register enum/names llvm-svn: 316506	2017-10-24 21:29:15 +00:00
David Blaikie	6a2b124248	X86AsmPrinter.h: Add missing header for complete type needed for MCCodeEmitter dtor. llvm-svn: 316505	2017-10-24 21:29:14 +00:00
Artem Belevich	cb8f6328dc	[NVPTX] allow address space inference for volatile loads/stores. If particular target supports volatile memory access operations, we can avoid AS casting to generic AS. Currently it's only enabled in NVPTX for loads and stores that access global & shared AS. Differential Revision: https://reviews.llvm.org/D39026 llvm-svn: 316495	2017-10-24 20:31:44 +00:00
Gadi Haber	323f2e1715	[X86][Broadwell] Added the instruction scheduling information for the Broadwell CPU. Adding the scheduling information for the Broadwell (BDW) CPU target. This patch adds the instruction scheduling information for the Broadwell (BDW) architecture target by adding the file X86SchedBroadwell.td located under the X86 Target. We used the scheduling information retrieved from the Broadwell architects in order to create the file. The scheduling information includes latency, number of micro-Ops and used ports by each BDW instruction. The patch continues the scheduling replacement and insertion effort started with the SandyBridge (SNB) target in r310792, the Haswell (HSW) target in r311879, the SkylakeClient (SKL) target in rL313613 + rL315978 and the SkylakeServer (SKX) in rL315175. Performance fluctuations may be expected due to code alignment effects. Reviewers: zvi, RKSimon, craig.topper Differential Revision: https://reviews.llvm.org/D39054 Change-Id: If6f799e5ff60e1091c8d43b05ea78c53581bae01 llvm-svn: 316492	2017-10-24 20:19:47 +00:00
Yonghong Song	ee68d8e41f	bpf: fix a bug in trunc-op optimization Previous implementation for per-function scope is incorrect and too conservative. Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 316481	2017-10-24 18:21:10 +00:00
Stefan Pintilie	8f0c783095	[PowerPC] Try to simplify a Swap if it feeds a Splat If we have the situation where a Swap feeds a Splat we can sometimes change the index on the Splat and then remove the Swap instruction. Fixed the test case that was failing and recommit after pulling the original commit. Original revision is here: https://reviews.llvm.org/D39009 llvm-svn: 316478	2017-10-24 17:44:27 +00:00
Yonghong Song	0f836d5dc5	bpf: fix a bug in bpf-isel trunc-op optimization In BPF backend, we try to optimize away redundant trunc operations so that kernel verifier rewrite remains valid. Previous implementation only works for a single function. This patch fixed the issue for multiple functions. It clears internal map data structure before performing optimization for each function. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 316469	2017-10-24 17:29:03 +00:00
Simon Pilgrim	5e8c3f328f	[X86][AVX] ComputeNumSignBitsForTargetNode - add support for X86ISD::VTRUNC llvm-svn: 316462	2017-10-24 17:04:57 +00:00
Saleem Abdulrasool	fb490a0bcc	PowerPC: support the separator character in the IAS PowerPC uses ; as a comment leader and the @ as a separator character. Support this properly. llvm-svn: 316454	2017-10-24 16:19:56 +00:00
Simon Pilgrim	0a12c239b6	[X86] truncateVectorCompareWithPACKSS - use PACKSSDW/PACKSSWB instead of just PACKSSWB. By using the widest type possible for PACKSS truncation we have a better chance of being able to peek through bitcasts and improves other combines driven by ComputeNumSignBits. llvm-svn: 316448	2017-10-24 15:38:16 +00:00
Oliver Stannard	03ded27bbc	[ARM] Error for invalid shift in memory operand Report a diagnostic when we fail to parse a shift in a memory operand because the shift type is not an identifier. Without this, we were silently ignoring the whole instruction. Differential revision: https://reviews.llvm.org/D39237 llvm-svn: 316441	2017-10-24 14:19:08 +00:00
Simon Pilgrim	c36dd6ae9c	[X86] truncateVectorCompareWithPACKSS - remove duplicate variables. NFCI. llvm-svn: 316440	2017-10-24 14:18:32 +00:00
Andrew V. Tischenko	f4fbe4a51b	Update f16c instruction scheduling on btver2. Differential Revision: https://reviews.llvm.org/D39051 llvm-svn: 316435	2017-10-24 13:38:30 +00:00
Zvi Rackover	bf31bf78e7	X86CallFrameOptimization: Update comments and variable names. NFCI. Following up on D38738. llvm-svn: 316434	2017-10-24 13:24:26 +00:00
Zvi Rackover	31b101a186	X86CallFrameOptimization: Recognize 'store 0/-1 using and/or' idioms Summary: r264440 added or/and patterns for storing -1 or 0 with the intention of decreasing code size. However, X86CallFrameOptimization does not recognize these memory accesses so it will not replace them with push's when profitable. This patch fixes this problem by teaching X86CallFrameOptimization these store 0/-1 idioms. An alternative fix would be to prevent the 'store 0/1 idioms' patterns from firing when accessing the stack. This would save the need to teach the pass about these idioms. However, because X86CallFrameOptimization does not always fire we may result in cases where neither X86CallFrameOptimization nor the patterns for 'store 0/1 idioms' fire. Fixes pr34863 Reviewers: DavidKreitzer, guyblank, aymanmus Reviewed By: aymanmus Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38738 llvm-svn: 316431	2017-10-24 12:13:05 +00:00
Marek Olsak	ce76ea0394	AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1) Summary: Kill the thread if operand 0 == false. llvm.amdgcn.wqm.vote can be applied to the operand. Also allow kill in all shader stages. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38544 llvm-svn: 316427	2017-10-24 10:27:13 +00:00
Marek Olsak	2114fc3bcb	AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D38543 llvm-svn: 316426	2017-10-24 10:26:59 +00:00
Oliver Stannard	ce256a3a01	[ARM] Replace development diagnostics with normal DEBUG macro * Remove the -arm-asm-parser-dev-diags option. * Use normal DEBUG(dbgs()) printing for the extra development information about missing diagnostics. Differential Revision: https://reviews.llvm.org/D39194 llvm-svn: 316423	2017-10-24 09:46:56 +00:00
Oliver Stannard	6d5a5b98ab	[ARM] tSETEND needs IsThumb This is the Thumb encoding, so the Requires list must include IsThumb. No test because we happen to select the ARM one first, but that's just luck. Differential Revision: https://reviews.llvm.org/D39190 llvm-svn: 316421	2017-10-24 09:03:33 +00:00
Oliver Stannard	c507b370a1	[ARM] Remove tCPS alias which just crashed This alias caused a crash when trying to print the "cps #0" instruction in a diagnostic for thumbv6 (which doesn't have that instruction). The comment was incorrect, this instruction is UNPREDICTABLE if no flag bits are set, so I don't think it's worth keeping. Differential Revision: https://reviews.llvm.org/D39191 llvm-svn: 316420	2017-10-24 08:55:36 +00:00
Zvi Rackover	3c0d385598	X86: Fix X86CallFrameOptimization to search for the COPY StackPointer SelectionDAG inserts a copy of ESP into a virtual register. X86CallFrameOptimization assumed that the COPY, if present, is always right after the call-frame setup instruction (ADJCALLSTACKDOWN). This was a wrong assumption as the COPY can be located anywhere between the call-frame setup instruction and its first use. If the COPY happened to be located in a different location than what X86CallFrameOptimization assumed, visiting it while processing the call chain would lead to a conservative bail-out. The fix is quite straightforward, scan ahead for the stack-pointer copy and make note of it so it can be ignored while processing the call chain. Fixes pr34903 Differential Revision: https://reviews.llvm.org/D38730 llvm-svn: 316416	2017-10-24 07:38:29 +00:00
Omer Paparo Bivas	2251c79aba	[MC] Adding code padding for performance stability - infrastructure. NFC. Infrastructure designed for padding code with nop instructions in key places such that performance improvement will be achieved. The infrastructure is implemented such that the padding is done in the Assembler after the layout is done and all IPs and alignments are known. This patch by itself is NFC. Future patches will make use of this infrastructure to implement required policies for code padding. Reviewers: aaboud zvi craig.topper gadi.haber Differential revision: https://reviews.llvm.org/D34393 Change-Id: I92110d0c0a757080a8405636914a93ef6f8ad00e llvm-svn: 316413	2017-10-24 06:16:03 +00:00
Zvi Rackover	c6d0b6c103	X86: Register the X86CallFrameOptimization pass Summary: The motivation of this change is to enable .mir testing for this pass. Added one test case to cover the functionality, this same case will be improved by a future patch. Reviewers: igorb, guyblank, DavidKreitzer Reviewed By: guyblank, DavidKreitzer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38729 llvm-svn: 316412	2017-10-24 05:47:07 +00:00
Konstantin Zhuravlyov	339e74440a	AMDGPU: Initialize WavefrontSize from TD files Differential Revision: https://reviews.llvm.org/D39205 llvm-svn: 316389	2017-10-23 23:02:39 +00:00
Simon Pilgrim	321e54f72d	[X86][SSE] combineBitcastvxi1 - use PACKSSWB directly to pack v8i16 to v16i8 Avoid difficulties determining the number of sign bits later on in shuffle lowering to lower to PACKSS llvm-svn: 316383	2017-10-23 22:05:02 +00:00
Stefan Pintilie	52bbd587ac	Revert "[PowerPC] Try to simplify a Swap if it feeds a Splat" Revert commit r316366. Previous commit causes p8-scalar_vector_conversions.ll to fail. This reverts commit 990e764ad8a2eec206ce5dda6aefab059ccd4e92. llvm-svn: 316371	2017-10-23 20:22:23 +00:00
Krzysztof Parzyszek	6f06b6edff	[Hexagon] Return the correct chain edge for i1 function calls In HexagonISelLowering, there is code to handle the case when a function returns an i1 type. In this case, we need to generate extra nodes to copy the result from R0 to a predicate register. The code was returning the wrong value for the chain edge which caused an assert "Wrong topological sorting" when converting the instructions to MIs. This patch fixes the problem by returning the chain for the final copy. Patch by Brendon Cahoon. llvm-svn: 316367	2017-10-23 19:35:25 +00:00
Stefan Pintilie	feafa1d7f0	[PowerPC] Try to simplify a Swap if it feeds a Splat If we have the situation where a Swap feeds a Splat we can sometimes change the index on the Splat and then remove the Swap instruction. Differential Revision: https://reviews.llvm.org/D39009 llvm-svn: 316366	2017-10-23 19:33:31 +00:00
Krzysztof Parzyszek	273678823b	[Hexagon] Add extra pattern for S4_addaddi One combination was missing: add(add(x,y),c). llvm-svn: 316363	2017-10-23 19:07:50 +00:00
Daniel Sanders	d66e0901ae	[globalisel][tablegen] Import stores and allow GISel to automatically substitute zero regs like WZR/XZR/$zero. This patch enables the import of stores. Unfortunately, doing so by itself, loses an optimization where storing 0 to memory makes use of WZR/XZR. To mitigate this, this patch also introduces a new feature that allows register operands to nominate a zero register. When this is done, GlobalISel will substitute (G_CONSTANT 0) with the nominated register automatically. This is currently configured to only apply to the stores. Applying it to GPR32/GPR64 register classes in general will be done after review see (https://reviews.llvm.org/D39150). llvm-svn: 316360	2017-10-23 18:19:24 +00:00
Matt Arsenault	a030e2688f	AMDGPU: Cleanup local atomic node names llvm-svn: 316349	2017-10-23 17:16:43 +00:00
Matt Arsenault	b791802aef	AMDGPU: Fix default range in non-kernel functions The range should be assumed to be the hardware maximum if a workitem intrinsic is used in a callable function which does not know the restricted limit of the calling kernel. llvm-svn: 316346	2017-10-23 17:09:35 +00:00
Craig Topper	8d5a246ebe	[X86] Change VMPTRST to use PS instead of TB to match VMPTRLD. llvm-svn: 316340	2017-10-23 16:22:40 +00:00
Craig Topper	1db2f0828e	[X86] Change RDRAND to use PS instead of TB. Should be no functional change for now. A future disassembler change will prevent disassembling with 0xf2/0xf3. llvm-svn: 316339	2017-10-23 16:22:38 +00:00
Craig Topper	4d93adfed5	[X86] Change XRSTOR to use PS instead of TB to match XSAVE. I don't think this changes anything functionally yet, but I plan to fix the disassembler to use this to disable matching certain instructions with 0xf3/0xf2/0x66 prefixes. llvm-svn: 316337	2017-10-23 16:11:33 +00:00
Simon Pilgrim	1dcb913be6	[X86][SSE] Remove AssertZext stage from PEXTRW/PEXTRB lowering. NFCI. Remove AssertZext and instead add PEXTRW/PEXTRB support to computeKnownBitsForTargetNode to simplify instruction selection. Differential Revision: https://reviews.llvm.org/D39169 llvm-svn: 316336	2017-10-23 16:00:57 +00:00
Andrew V. Tischenko	777308b548	Update DPPD/DPPS instruction scheduling on btver2. Differential Revision: https://reviews.llvm.org/D39046 llvm-svn: 316334	2017-10-23 15:53:30 +00:00
Craig Topper	8f182fdd8b	[X86] Add PTWRITE instruction for assembler and disassembler. llvm-svn: 316333	2017-10-23 15:53:21 +00:00
Craig Topper	5f0339d2f3	[X86] Add RDPID instruction for assembler and disassembler. llvm-svn: 316332	2017-10-23 15:53:16 +00:00
Andrew V. Tischenko	eff4fc0d41	Fix for Bug 30718 - Failure to disassemble certain MOV with rex.R. The issue was in illegal segment register index. Differential Revision: https://reviews.llvm.org/D38786 llvm-svn: 316319	2017-10-23 09:36:33 +00:00
Haojian Wu	1afddd4136	Fix a -Wpedantic warning. llvm-svn: 316315	2017-10-23 09:02:59 +00:00
Sam Parker	487ab86942	[ARM] Allow unrolling of multi-block loops. Before, loop unrolling was only enabled for loops with a single block. This restriction has been removed and replaced by: - allow a maximum of two exiting blocks, - a four basic block limit for cores with a branch predictor. Differential Revision: https://reviews.llvm.org/D38952 llvm-svn: 316313	2017-10-23 08:05:14 +00:00
Craig Topper	326008c615	[X86] Fix disassembly of EVEX rounding control and SAE instructions. Fixes PR31955. llvm-svn: 316308	2017-10-23 02:26:24 +00:00
Benjamin Kramer	a7c822a238	[X86] Add missing override. NFC. llvm-svn: 316299	2017-10-22 19:16:31 +00:00
Simon Pilgrim	ce55eab936	Strip trailing whitespace. NFCI. llvm-svn: 316296	2017-10-22 18:38:57 +00:00
Marina Yatsina	f9371d821f	Add logic to greedy reg alloc to avoid bad eviction chains This fixes bugzilla 26810 https://bugs.llvm.org/show_bug.cgi?id=26810 This is intended to prevent sequences like: movl %ebp, 8(%esp) # 4-byte Spill movl %ecx, %ebp movl %ebx, %ecx movl %edi, %ebx movl %edx, %edi cltd idivl %esi movl %edi, %edx movl %ebx, %edi movl %ecx, %ebx movl %ebp, %ecx movl 16(%esp), %ebp # 4 - byte Reload Such sequences are created in 2 scenarios: Scenario #1: vreg0 is evicted from physreg0 by vreg1 Evictee vreg0 is intended for region splitting with split candidate physreg0 (the reg vreg0 was evicted from) Region splitting creates a local interval because of interference with the evictor vreg1 (normally region splitting creates 2 intervals, the "by reg" and "by stack" intervals. A local interval is created when interference occurs.) one of the split intervals ends up evicting vreg2 from physreg1 Evictee vreg2 is intended for region splitting with split candidate physreg1 one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills Scenario #2 vreg0 is evicted from physreg0 by vreg1 vreg2 is evicted from physreg2 by vreg3 etc Evictee vreg0 is intended for region splitting with split candidate physreg1 Region splitting creates a local interval because of interference with the evictor vreg1 one of the split intervals ends up evicting back original evictor vreg1 from physreg0 (the reg vreg0 was evicted from) Another evictee vreg2 is intended for region splitting with split candidate physreg1 one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills As compile time was a concern, I've added a flag to control whether we do cost calculations for local intervals we expect to be created (it's on by default for X86 target, off for the rest). Differential Revision: https://reviews.llvm.org/D35816 Change-Id: Id9411ff7bbb845463d289ba2ae97737a1ee7cc39 llvm-svn: 316295	2017-10-22 17:59:38 +00:00
Momchil Velikov	d6a4ab3d49	[ARM] Dynamic stack alignment for 16-bit Thumb This patch implements dynamic stack (re-)alignment for 16-bit Thumb. When targeting processors which support only the 16-bit Thumb instruction set, the compiler ignores the alignment attributes of automatic variables and may silently generate incorrect code. Differential revision: https://reviews.llvm.org/D38143 llvm-svn: 316289	2017-10-22 11:56:35 +00:00
Guy Blank	92d5ce3bd4	[X86] Add a pass to convert instruction chains between domains. The pass scans the function to find instruction chains that define registers in the same domain (closures). It then calculates the cost of converting the closure to another domain. If found profitable, the instructions are converted to instructions in the other domain and the register classes are changed accordingly. This commit adds the pass infrastructure and a simple conversion from the GPR domain to the Mask domain. Differential Revision: https://reviews.llvm.org/D37251 Change-Id: Ic2cf1d76598110401168326d411128ae2580a604 llvm-svn: 316288	2017-10-22 11:43:08 +00:00
Craig Topper	a33846aca6	[X86] Add VEX_WIG to applicable AVX512 instructions. This should be NFC. Will be used in future patches to fix disassembler bugs. llvm-svn: 316284	2017-10-22 06:18:23 +00:00
Craig Topper	1bcb0d8a7f	[X86] Add VEX_WIG to VROUNDSSrr/VROUNDSSrm/VROUNDSDrr/VROUNDSDrm llvm-svn: 316283	2017-10-22 06:18:20 +00:00
Craig Topper	158bc6474a	[X86] Don't allow gather/scatter to disassembler if memory operand does not use a SIB byte. Fixes PR34998. llvm-svn: 316282	2017-10-22 04:32:30 +00:00
Simon Pilgrim	ab6dbe2b29	Strip trailing whitespace. NFCI. llvm-svn: 316277	2017-10-21 20:40:49 +00:00
Aaron Ballman	fc02869c96	Reverting r316270 due to failing build bots. http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules-2/builds/12899 http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/7951 llvm-svn: 316276	2017-10-21 20:38:15 +00:00
Simon Pilgrim	3cb024490a	[X86][SSE] Add extractps/pextrd equivalence to domain tables Differential Revision: https://reviews.llvm.org/D39135 llvm-svn: 316274	2017-10-21 20:19:48 +00:00
Craig Topper	ca2382d809	[X86] Fix disassembling of EVEX instructions to stop accidentally decoding the SIB index register as an XMM/YMM/ZMM register. This introduces a new operand type to encode whether the index register should be XMM/YMM/ZMM. And new code to fix up the results created by readSIB. This has the nice effect of removing a bunch of code that hard coded the name of every GATHER and SCATTER instruction to map the index type. This fixes PR32807. llvm-svn: 316273	2017-10-21 20:03:20 +00:00
Simon Pilgrim	cb028c7321	Fix MSVC 'result of 32-bit shift implicitly converted to 64 bits' warning. NFCI. llvm-svn: 316271	2017-10-21 17:23:04 +00:00
Fangrui Song	c7b749bd06	[PPC CodeGen] Fix the bitreverse.i64 intrinsic. Summary: The two 32-bit words were swapped. Subscribers: nemanjai, kbarton Differential Revision: https://reviews.llvm.org/D38705 llvm-svn: 316270	2017-10-21 16:59:40 +00:00
Craig Topper	fcf27188d7	[X86] Do not generate __multi3 for mul i128 on X86 Summary: __multi3 is not available on x86 (32-bit). Setting lib call name for MULI_128 to nullptr forces DAGTypeLegalizer::ExpandIntRes_MUL to generate instructions for 128-bit multiply instead of a call to an undefined function. This fixes PR20871 though it may be worth looking at why licm and indvars combine to generate 65-bit multiplies in that test. Patch by Riyaz V Puthiyapurayil Reviewers: craig.topper, schweitz Reviewed By: craig.topper, schweitz Subscribers: RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D38668 llvm-svn: 316254	2017-10-21 02:26:00 +00:00
Krzysztof Parzyszek	9d19c8cac9	[Packetizer] Add function to check for aliasing between instructions llvm-svn: 316243	2017-10-20 22:08:40 +00:00
Sam Clegg	12fd3da9d1	[WebAssembly] MC: Fix crash when -g specified. At this point we don't output any debug sections or their relocations. Differential Revision: https://reviews.llvm.org/D39076 llvm-svn: 316240	2017-10-20 21:28:38 +00:00
Daniel Sanders	1e4569fdc1	[globalisel][tablegen] Fix small spelling nits. NFC ComplexRendererFn -> ComplexRendererFns Corrected a couple lingering references to tied operands that were missed. llvm-svn: 316237	2017-10-20 20:55:29 +00:00
Krzysztof Parzyszek	022922b31a	[Hexagon] Report error instead of crashing on wrong inline-asm constraints llvm-svn: 316236	2017-10-20 20:24:44 +00:00
Krzysztof Parzyszek	64e5d7d3ae	[Hexagon] Reorganize and update instruction patterns llvm-svn: 316228	2017-10-20 19:33:12 +00:00
Simon Pilgrim	29b32472b4	[X86][SSE] getTargetShuffleMask - check shuffle input value types. NFCI. To help identify shuffle combine issues llvm-svn: 316222	2017-10-20 18:07:50 +00:00
Dave Lee	f9b72327b0	Make x86 __ehhandler comdat if parent function is Summary: This change comes from using lld for i686-windows-msvc. Before this change, lld emits an error of: error: relocation against symbol in discarded section: .xdata It's possible that this could be addressed in lld, but I think this change is reasonable on its own. At a high level, this is being generated: A (.text comdat) -> B (.text) -> C (.xdata comdat) Where A is a C++ inline function, which references B, an exception handler thunk, which references C, the exception handling info. With this structure, lld will error when applying relocations to B if the C it references has been discarded (some other C has been selected). This change checks if A is comdat, and if so places the exception registration thunk (B) in the comdat group of A (and B). It appears that MSVC makes the __ehhandler function comdat. Is it possible that duplicate thunks are being emitted into the final binary with other linkers, or are they stripping the unused thunks? Reviewers: rnk, majnemer, compnerd, smeenai Reviewed By: rnk, compnerd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38940 llvm-svn: 316219	2017-10-20 17:04:43 +00:00
Krzysztof Parzyszek	3818aeaeb9	[Hexagon] Allow redefinition with immediates for hw loop conversion Normally, if the registers holding the induction variable's bounds are redefined inside of the loop's body, the loop cannot be converted to a hardware loop. However, if the redefining instruction is actually loading an immediate value into the register, this conversion is both possible and legal (since the immediate itself will be used in the loop setup in the preheader). llvm-svn: 316218	2017-10-20 16:56:33 +00:00
Aleksandar Beserminji	143572984d	Revert "[mips] Reordering callseq* nodes to be linear" This reverts commit r314507, because the original patch is causing test failures. llvm-svn: 316215	2017-10-20 14:35:41 +00:00
Eugene Leviant	27b226fb65	[ARM] Use post-RA MI scheduler when +use-misched is set Differential revision: https://reviews.llvm.org/D39100 llvm-svn: 316214	2017-10-20 14:29:17 +00:00
Nemanja Ivanovic	0026c06e11	Disabling the transformation introduced in r315888 The commit at https://reviews.llvm.org/rL315888 is causing some failures with internal testing. Disabling this code until we can resolve the issues. llvm-svn: 316199	2017-10-20 00:36:46 +00:00
Alex Bradbury	c6c4e8bd5a	[RISCV] Add missing hunk from r316188 r316188 didn't set guessInstructionProperties=1 as it should have done. llvm-svn: 316189	2017-10-19 21:43:29 +00:00
Alex Bradbury	8971842f43	[RISCV] Initial codegen support for ALU operations This adds the minimum necessary to support codegen for simple ALU operations on RV32. Prolog and epilog insertion, support for memory operations etc etc follow in future patches. Leave guessInstructionProperties=1 until https://reviews.llvm.org/D37065 is reviewed and lands. Differential Revision: https://reviews.llvm.org/D29933 llvm-svn: 316188	2017-10-19 21:37:38 +00:00
Craig Topper	7bce79a539	[X86] Remove LowerEXTRACT_SUBVECTOR handler. All EXTRACT_SUBVECTORs are marked as legal. llvm-svn: 316182	2017-10-19 20:59:40 +00:00
Graham Yiu	488782efa3	The cost of splitting a large vector instruction is not being taken into account by the getUserCost function. This was leading to some loops being over unrolled. The cost of a vector instruction is now being multiplied by the cost of the type legalization. This will return a more accurate cost. Committing on behalf of Brad Nemanich (brad.nemanich@ibm.com) Differential Revision: https://reviews.llvm.org/D38961 llvm-svn: 316174	2017-10-19 18:16:31 +00:00
Krzysztof Parzyszek	e4d0e199bf	[Hexagon] Fix store conversion from rr to io in optimize addressing modes llvm-svn: 316170	2017-10-19 16:59:22 +00:00
Alex Bradbury	3c941e7ed9	[RISCV] RISCVAsmParser: early exit if RISCVOperand isn't immediate as expected This is necessary to avoid an assertion in the included test case and similar assembler inputs. llvm-svn: 316168	2017-10-19 16:22:51 +00:00
Alex Bradbury	baa54d4ac8	[RISCV][NFC] Drop unused parameter from createImm helper in RISCVAsmParser llvm-svn: 316167	2017-10-19 16:09:20 +00:00
Simon Pilgrim	fdd63d1535	[X86] Replace custom scalar integer absolute matching with ISD::ABS lowering. x86 has its own copy of integer absolute pattern matching to combine directly to a SUB+CMOV. This patch removes the x86 combine and adds custom lowering support for ISD::ABS instead, allowing us to use the DAGCombiner version. Additional test cases are already covered by iabs.ll (rL315706 and rL315711). Differential Revision: https://reviews.llvm.org/D38895 llvm-svn: 316162	2017-10-19 15:02:24 +00:00
Alex Bradbury	ee7c7ecd03	[RISCV] Prepare for the use of variable-sized register classes While parameterising by XLen, also take the opportunity to clean up the formatting of the RISCV .td files. This commit unifies the in-tree code with my patchset at <https://github.com/lowrisc/riscv-llvm>. llvm-svn: 316159	2017-10-19 14:29:03 +00:00
Sumanth Gundapaneni	e1983bcf55	[Hexagon] New HVX target features. This patch lets the llvm tools handle the new HVX target features that are added by frontend (clang). The target-features are of the form "hvx-length64b" for 64 Byte HVX mode, "hvx-length128b" for 128 Byte mode HVX. "hvx-double" is an alias to "hvx-length128b" and will soon be deprecated. The hvx version target feature is updated from "+hvx" to "+hvxv{version_number}", e.g. "+hvxv62". For the correct HVX code generation, the user must use the following target features. For 64B mode: "+hvxv62" "+hvx-length64b" For 128B mode: "+hvxv62" "+hvx-length128b" Clang picks a default length if none is specified. If for some reason, no hvx-length is specified to llvm, the compilation will bail out. There is a corresponding clang patch. Differential Revision: https://reviews.llvm.org/D38851 llvm-svn: 316101	2017-10-18 18:07:07 +00:00
Sumanth Gundapaneni	9d954c4169	[Hexagon] Update Hexagon ArchEnum and sync some downstream changes(NFC) Differential Revision: https://reviews.llvm.org/D38850 llvm-svn: 316099	2017-10-18 17:45:22 +00:00
Krzysztof Parzyszek	8c53c95137	[Hexagon] Mark vector loads as predicable, update instruction mappings All loads of form V6_vL32b_{,cur,nt,tmp,nt_cur,nt_tmp}_{ai,pi,ppu} are predicable on v62 (but not on v60). Mark them all as predicable in the instruction definitions, and handle the v60 case in HII::isPredicable. llvm-svn: 316098	2017-10-18 17:36:46 +00:00
Konstantin Zhuravlyov	8d5e9e110c	AMDGPU: Rename MaxFlatWorkgroupSize to MaxFlatWorkGroupSize for consistency Differential Revision: https://reviews.llvm.org/D38957 llvm-svn: 316097	2017-10-18 17:31:09 +00:00
Alex Bradbury	13ce95b77f	[RISCV] Bugfix createRISCVELFObjectWriter r315275 set the IsLittleEndian parameter incorrectly. This patch corrects this, and adds a test to ensure such mistakes will be caught in the future. llvm-svn: 316091	2017-10-18 16:11:31 +00:00
Andre Vieira	d4a25707f0	[ARM] Fix disassembly for conditional VMRS and VMSR instructions in ARM mode Differential Revision: https://reviews.llvm.org/D38347 llvm-svn: 316085	2017-10-18 14:47:37 +00:00
Simon Dardis	03c2c65b2d	[mips] Fix analyzeBranch to handle debug data In the case where there was a conditional branch followed by an unconditional branch with a debug instruction separating them, MipsInstrInfo::analyzeBranch would not skip past the debug instruction when searching for the second branch, which gives erroneous results about the control flow of the block. This could lead the branch folder to merge the non-fall through case into its predecessor, leaving the conditional branch with a dangling basic block operand. This resolves PR34975. Thanks to Alexander Richardson for reporting the issue! Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39003 llvm-svn: 316084	2017-10-18 14:35:29 +00:00
NAKAMURA Takumi	6f43bd4bde	Untabify. llvm-svn: 316079	2017-10-18 13:31:28 +00:00
Dylan McKay	bebde41ec5	[AVR] Update to current LLVM API r315410 broke a number of things in the AVR backend, which are now fixed. llvm-svn: 316076	2017-10-18 12:35:15 +00:00
Michael Zuckerman	49293264cc	[AVX512][AVX2]Cost calculation for interleave load/store patterns {v8i8,v16i8,v32i8,v64i8} This patch adds accurate instructions cost. The formula presents two cases(stride 3 and stride 4) and calculates the cost according to the VF and stride. Reviewers: 1. delena 2. Farhana 3. zvi 4. dorit 5. Ayal Differential Revision: https://reviews.llvm.org/D38762 Change-Id: If4cfbd4ac0e63694e8144cb78c7fa34850647ff7 llvm-svn: 316072	2017-10-18 11:41:55 +00:00
Hiroshi Inoue	5388e66d3a	[PowerPC] Use helper functions to check sign-/zero-extended value Helper functions to identify sign- and zero-extending machine instructions were introduced in rL315888. This patch makes PPCInstrInfo::optimizeCompareInstr use the helper functions. It simplifies the code and also makes more optimizations possible since the helper can do more analysis than the original check code; I observed about 5000 more compare instructions are eliminated while building LLVM. Also, this patch fixes a bug in helpers on ANDIo instruction handling due to the order of checks. This bug causes a failure in an existing test case for optimizeCompareInstr. Differential Revision: https://reviews.llvm.org/D38988 llvm-svn: 316071	2017-10-18 10:31:19 +00:00
Michael Zuckerman	72a6f893cb	Fixing bug issue https://bugs.llvm.org/show_bug.cgi?id=34978 Change-Id: I7f13d5bcb181be2860377df7b40e1579a8ad4add llvm-svn: 316067	2017-10-18 08:04:31 +00:00
Daniel Sanders	30247fd1d9	[aarch64][globalisel] Register banks and classes should have distinct names. Otherwise they are ambiguous in MIR. llvm-svn: 316047	2017-10-18 00:12:43 +00:00
Wei Ding	7ab1f7a421	AMDGPU : Fix an error for the llvm.cttz implementation. Differential Revision: http://reviews.llvm.org/D39014 llvm-svn: 316037	2017-10-17 21:49:52 +00:00
Matthias Braun	a2f96b5bde	AArch64: Enable AES instruction fusion on Cyclone. Note that cyclone itself doesn't fuse, but newer apple chips do and we are using cyclone as the default when targeting apple OSes. The current code also does not capture all fusion patterns of apple CPUs yet; I am still looking for ways to refactor the code nicely to extend it. llvm-svn: 316036	2017-10-17 21:46:15 +00:00
Tim Northover	350a87eaf1	AArch64: account for possible frame index operand in compares. If the address of a local is used in a comparison, AArch64 can fold the address-calculation into the comparison via "adds". Unfortunately, a couple of places (both hit in this one test) are not ready to deal with that yet and just assume the first source operand is a register. llvm-svn: 316035	2017-10-17 21:43:52 +00:00
Eugene Zelenko	6cadde7f40	[Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 316034	2017-10-17 21:27:42 +00:00
Konstantin Zhuravlyov	7dabe9ced7	AMDGPU: Start generating metadata for MaxFlatWorkGroupSize Differential Revision: https://reviews.llvm.org/D38958 llvm-svn: 316024	2017-10-17 20:03:21 +00:00
Yichao Yu	a46eb8e649	Fix `FaultMaps` crash when the out streamer is reused Summary: Make sure the map is cleared before processing a new module. Similar to what is done on `StackMaps`. This issue is similar to D38588, though this time for FaultMaps (on x86) rather than ARM/AArch64. Other than possible mixing of information between modules, the crash is caused by the pointer values in the map that were allocated by the bump pointer allocator that is unwound when emitting the next file. This issue has been around since 3.8. This issue is likely much harder to write a test for since AFAICT it requires emitting something much more complicated (and possibly real code) instead of just some random bytes. Reviewers: skatkov, sanjoy Reviewed By: skatkov, sanjoy Subscribers: sanjoy, aemerson, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D38924 llvm-svn: 315990	2017-10-17 11:44:34 +00:00
Gadi Haber	1e0f1f476a	[X86][SKL] Updated scheduling information for the SkylakeClient target Updated the scheduling information for the SkylakeClient target with the following changes: 1. regrouped the instructions after adding load and store latencies. 2. regrouped the instructions after adding identified missing ports in several groups. The changes were made after revisiting the latencies impact of all the load and store uOps. Reviewers: zvi, RKSimon, craig.topper Differential Revision: https://reviews.llvm.org/D38727 Change-Id: I778a308cc11e490e8fa5e27e2047412a1dca029f llvm-svn: 315978	2017-10-17 06:47:04 +00:00
Craig Topper	fbb1985c14	[X86] Fix typo in comment. NFC llvm-svn: 315969	2017-10-17 04:17:54 +00:00
Mark Searles	4e3d6160db	Use the return value of UpdateNodeOperands(); in some cases, UpdateNodeOperands() modifies the node in-place and using the return value isn’t strictly necessary. However, it does not necessarily modify the node, but may return a resultant node if it already exists in the DAG. See comments in UpdateNodeOperands(). In that case, the return value must be used to avoid such scenarios as an infinite loop (node is assumed to have been updated, so added back to the worklist, and re-processed; however, node hasn’t changed so it is once again passed to UpdateNodeOperands(), assumed modified, added back to worklist; cycle infinitely repeats). Differential Revision: https://reviews.llvm.org/D38466 llvm-svn: 315957	2017-10-16 23:38:53 +00:00
Quentin Colombet	0bd2825517	Re-apply [AArch64][RegisterBankInfo] Use the statically computed mappings for COPY This reverts commit r315823, thus re-applying r315781. Also make sure we don't use G_BITCAST mapping for non-generic registers. Non-generic registers don't have a type but do have a reg bank. This is something the COPY mapping knows how to deal with but the G_BITCAST mapping doesn't. -- Original Commit Message -- We used to resort to the generic implementation to get the mappings for COPYs. The generic implementation resorts to table lookup and dynamically allocated objects to get the valid mappings. Given we already know how to map G_BITCAST and have the static mappings for them, use that code path for COPY as well. This is much more efficient. Improve the compile time of RegBankSelect by up to 20%. Note: When we eventually generate all the mappings via TableGen, we wouldn't have to do that dance to shave compile time. The intent of this change was to make sure that moving to static structure really pays off. NFC. llvm-svn: 315947	2017-10-16 22:28:40 +00:00
Quentin Colombet	9f20af6135	[AArch64][RegisterBankInfo] Add mapping support for G_BITCAST of s128 Anything bigger than 64-bit just maps to FPR. llvm-svn: 315946	2017-10-16 22:28:38 +00:00
Quentin Colombet	7c114d3d70	[AArch64][LegalizerInfo] Mark s128 G_BITCAST legal We used to mark all G_BITCAST of 128-bit legal but only for vector types. Scalars of this size are just fine as well. llvm-svn: 315945	2017-10-16 22:28:27 +00:00
Krzysztof Parzyszek	72518eaa6f	Add iterator range MachineRegisterInfo::liveins(), adopt users, NFC llvm-svn: 315927	2017-10-16 19:08:41 +00:00
Krzysztof Parzyszek	02893de4ef	[Hexagon] Rangify some loops, NFC Recommit r315763 with a fix. llvm-svn: 315925	2017-10-16 18:43:08 +00:00
Simon Dardis	0d378a9eed	[mips][micromips] Fix (dis)assembly of bc1(t\|f) Previously these instructions were marked codegen only and had an under-specified instruction description that did not record the fcc register. Reviewers: atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D38847 llvm-svn: 315905	2017-10-16 14:20:22 +00:00
Simon Pilgrim	73bd5aa049	Fix or vs \|\| typo. llvm-svn: 315903	2017-10-16 14:01:59 +00:00
Stefan Maksimovic	ee6b5a79dc	[mips] Provide alternate predicates for constant synthesis Ordering of patterns should not be of importance anymore since the predicates used are mutually exclusive now. llvm-svn: 315901	2017-10-16 13:18:21 +00:00
Hiroshi Inoue	a7eb78b47f	[PowerPC] fix up in sign-/zero-extension elimination This patch fixes a potential problem in my previous commit (https://reviews.llvm.org/rL315888) by adding a null check. llvm-svn: 315900	2017-10-16 12:11:15 +00:00
Andrew V. Tischenko	bfc9061593	This patch is a result of D37262: The issues with X86 prefixes. It closes PR7709, PR17697, PR19251, PR32809 and PR21640. There could be other bugs closed by this patch. llvm-svn: 315899	2017-10-16 11:14:29 +00:00
Daniel Sanders	01805b6747	[aarch64][globalisel] Fix a crash in selectAddrModeIndexed() caused by incorrect G_FRAME_INDEX handling The wrong operand was being rendered to the result instruction. The crash was detected by Bitcode/simd_ops/AArch64_halide_runtime.bc llvm-svn: 315890	2017-10-16 05:39:30 +00:00
Yonghong Song	6621cf67cf	bpf: fix bug on silently truncating 64-bit immediate We came across an llvm bug when compiling some testcases that 64-bit immediates are silently truncated into 32-bit and then packed into BPF_JMP \| BPF_K encoding. This caused comparison with wrong value. This bug looks to be introduced by r308080. The Select_Ri pattern is supposed to be lowered into J_Ri while the latter only supports 32-bit immediate encoding, therefore Select_Ri should have a similar immediate predicate check to the one J_Ri uses. Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 315889	2017-10-16 04:14:53 +00:00
Hiroshi Inoue	e3a3e3c9e9	[PowerPC] Eliminate sign- and zero-extensions if already sign- or zero-extended This patch enables redundant sign- and zero-extension elimination in PowerPC MI Peephole pass. If the input value of a sign- or zero-extension is known to be already sign- or zero-extended, the operation is redundant and can be eliminated. One common case is sign-extensions for a method parameter or for a method return value; they must be sign- or zero-extended as defined in PPC ELF ABI. For example, for the following simple code, two extsw instructions are generated before the invocation of int_func and before the return. With this patch, both extsw are eliminated. void int_func(int); void ii_test(int a) { if (a & 1) return int_func(a); } Such redundant sign- or zero-extensions are quite common in many programs; e.g. I observed about 60,000 occurrences of the elimination while compiling LLVM+CLANG. Differential Revision: https://reviews.llvm.org/D31319 llvm-svn: 315888	2017-10-16 04:12:57 +00:00
Daniel Sanders	ea8711b88e	Re-commit r315885: [globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed* Summary: iPTR is a pointer of subtarget-specific size to any address space. Therefore type checks on this size derive the SizeInBits from a subtarget hook. At this point, we can import the simplest G_LOAD rules and select load instructions using them. Further patches will add support for the predicates to enable additional loads as well as the stores. The previous commit failed on MSVC due to a failure to convert an initializer_list to a std::vector. Hopefully, MSVC will accept this version. Depends on D37457 Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Reviewed By: qcolombet Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb Differential Revision: https://reviews.llvm.org/D37458 llvm-svn: 315887	2017-10-16 03:36:29 +00:00
Daniel Sanders	ce72d611af	Revert r315885: [globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed* MSVC doesn't like one of the constructors. llvm-svn: 315886	2017-10-16 02:15:39 +00:00
Daniel Sanders	6735ea86cd	[globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed* Summary: iPTR is a pointer of subtarget-specific size to any address space. Therefore type checks on this size derive the SizeInBits from a subtarget hook. At this point, we can import the simplest G_LOAD rules and select load instructions using them. Further patches will add support for the predicates to enable additional loads as well as the stores. Depends on D37457 Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Reviewed By: qcolombet Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb Differential Revision: https://reviews.llvm.org/D37458 llvm-svn: 315885	2017-10-16 01:16:35 +00:00
Krzysztof Parzyszek	7467119149	[Hexagon] Add LLVM_ATTRIBUTE_UNUSED to operator<<, NFC This should silence "unused function" warnings. llvm-svn: 315883	2017-10-16 00:29:47 +00:00
Daniel Sanders	df39cbae2f	Re-commit r315863: [globalisel][tablegen] Import ComplexPattern when used as an operator Summary: It's possible for a ComplexPattern to be used as an operator in a match pattern. This is used by the load/store patterns in AArch64 to name the suboperands returned by ComplexPattern predicate so that they can be broken apart and referenced independently in the result pattern. This patch adds support for this in order to enable the import of load/store patterns. Depends on D37445 Hopefully fixed the ambiguous constructor that a large number of bots reported. Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Reviewed By: qcolombet Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D37456 llvm-svn: 315869	2017-10-15 18:22:54 +00:00
Daniel Sanders	bb082a36d3	Revert r315863: [globalisel][tablegen] Import ComplexPattern when used as an operator A large number of bots are failing on an ambiguous constructor call. llvm-svn: 315866	2017-10-15 17:51:07 +00:00
Daniel Sanders	b95b867dd8	[globalisel][tablegen] Import ComplexPattern when used as an operator Summary: It's possible for a ComplexPattern to be used as an operator in a match pattern. This is used by the load/store patterns in AArch64 to name the suboperands returned by ComplexPattern predicate so that they can be broken apart and referenced independently in the result pattern. This patch adds support for this in order to enable the import of load/store patterns. Depends on D37445 Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Reviewed By: qcolombet Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D37456 llvm-svn: 315863	2017-10-15 17:03:36 +00:00
Craig Topper	2738117326	[X86] Remove the SlowBTMem feature flag entirely Turns out we have no patterns on the instructions that were using this feature flag for other reasons. These instructions are slow on all modern CPUs so it seems unlikely that we will spend any effort supporting these instructions going forward. So we might as well just kill off the feature flag and just fix up the comments. llvm-svn: 315862	2017-10-15 16:57:33 +00:00
Craig Topper	a5af4a64d0	[AVX512] Don't mark EXTLOAD as legal with AVX512. Continue using custom lowering. Summary: This was impeding our ability to combine the extending shuffles with other shuffles as you can see from the test changes. There's one special case that needed to be added to use VZEXT directly for v8i8->v8i64 since the custom lowering requires v64i8. Reviewers: RKSimon, zvi, delena Reviewed By: delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38714 llvm-svn: 315860	2017-10-15 16:41:17 +00:00
Craig Topper	a1f9c9dd8b	[X86] Add FeatureSlowBTMem to Haswell, Broadwell, Skylake, Cannonlake, and Knights Landing CPUs. Summary: I see nothing in Agner Fog's tables to indicate that this improved between Ivy Bridge and Haswell. It's also set for all Atom CPUs so I assume KNL should have it too. Reviewers: RKSimon, zvi, gadi.haber Reviewed By: gadi.haber Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38890 llvm-svn: 315859	2017-10-15 16:41:15 +00:00
Aaron Ballman	615eb47035	Reverting r315590; it did not include changes for llvm-tblgen, which is causing link errors for several people. Error LNK2019 unresolved external symbol "public: void __cdecl `anonymous namespace'::MatchableInfo::dump(void)const " (?dump@MatchableInfo@?A0xf4f1c304@@QEBAXXZ) referenced in function "public: void __cdecl `anonymous namespace'::AsmMatcherEmitter::run(class llvm::raw_ostream &)" (?run@AsmMatcherEmitter@?A0xf4f1c304@@QEAAXAEAVraw_ostream@llvm@@@Z) llvm-tblgen D:\llvm\2017\utils\TableGen\AsmMatcherEmitter.obj 1 llvm-svn: 315854	2017-10-15 14:32:27 +00:00
Amjad Aboud	c8d67979c0	[X86] Ignore DBG instructions in X86CmovConversion optimization to resolve PR34565 Differential Revision: https://reviews.llvm.org/D38359 llvm-svn: 315851	2017-10-15 11:00:56 +00:00
Craig Topper	a9cd59fb5d	[X86] Lower vselect with constant condition to vector_shuffle even with AVX512 instructions. Summary: It's better to use our shuffle lowering code to handle these than loading an immediate into a k-register. It really feels like this should be a DAG combine optimization rather than a lowering operation, but that's a problem for another day. Reviewers: RKSimon, delena, zvi Reviewed By: delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38932 llvm-svn: 315849	2017-10-15 06:39:07 +00:00
Vitaly Buka	7450398e01	Remove unused variables llvm-svn: 315847	2017-10-15 05:35:02 +00:00
Davide Italiano	76067588dc	[Hexagon] Mark RangeTree::dump() with LLVM_DUMP_METHOD. GCC otherwise emits a "defined but not used" warning on the member function. llvm-svn: 315838	2017-10-14 23:46:01 +00:00
Konstantin Zhuravlyov	8c18f5b3d4	AMDGPU: Don't use TargetStreamer if it has not been initialized Fixes cfe/trunk/test/Misc/backend-resource-limit-diagnostics.cl test after r315808 We may hit a few other similar issues, but I want to discuss a good solution offline. llvm-svn: 315830	2017-10-14 22:16:26 +00:00
Simon Pilgrim	36fe00ee17	[X86][SSE] Don't attempt to reduce the imul vector width of odd sized vectors (PR34947) llvm-svn: 315825	2017-10-14 19:57:19 +00:00
Bruno Cardoso Lopes	caac2fbd19	Revert "[AArch64][RegisterBankInfo] Use the statically computed mappings for COPY" This reverts commit r315781, breaks: http://green.lab.llvm.org/green/job/Compiler_Verifiers_GlobalISEL/9882 llvm-svn: 315823	2017-10-14 19:31:03 +00:00
Konstantin Zhuravlyov	a01d8b0b63	AMDGPU: Bring HSA metadata on par with the specification Differential Revision: https://reviews.llvm.org/D38753 llvm-svn: 315821	2017-10-14 19:03:51 +00:00
Simon Pilgrim	f5b9f353c3	Pull out repeated calls to VT.getVectorNumElements(). NFCI. llvm-svn: 315818	2017-10-14 17:37:42 +00:00
Simon Pilgrim	cded82837d	Use DAG::getBitcast() helper. NFCI. llvm-svn: 315815	2017-10-14 17:14:42 +00:00
Konstantin Zhuravlyov	219066bab8	AMDGPU: Improve note directive verification in assembler - Do not allow amd_amdgpu_isa directives on non-amdgcn architectures - Do not allow amd_amdgpu_hsa_metadata on non-amdhsa OSes - Do not allow amd_amdgpu_pal_metadata on non-amdpal OSes Differential Revision: https://reviews.llvm.org/D38750 llvm-svn: 315812	2017-10-14 16:15:28 +00:00
Konstantin Zhuravlyov	eda425edd4	AMDGPU: Do not emit deprecated notes for code object v3 Differential Revision: https://reviews.llvm.org/D38749 llvm-svn: 315810	2017-10-14 15:59:07 +00:00
Konstantin Zhuravlyov	9c05b2bc3b	AMDGPU: Add support for isa version note - Emit NT_AMD_AMDGPU_ISA - Add assembler parsing for isa version directive - If isa version directive does not match command line arguments, then return error Differential Revision: https://reviews.llvm.org/D38748 llvm-svn: 315808	2017-10-14 15:40:33 +00:00
Simon Pilgrim	f367c27d2d	[X86][SSE] Support combining AND(EXTRACT(SHUF(X)), C) -> EXTRACT(SHUF(X)) If we are applying a byte mask to a value extracted from a shuffle, see if we can combine the mask into shuffle. Fixes the last issue with PR22415 llvm-svn: 315807	2017-10-14 15:01:36 +00:00
Craig Topper	f7e777763d	[X86] Add patterns for vzmovl+cvtpd2dq/cvttpd2dq with a load. llvm-svn: 315802	2017-10-14 07:04:48 +00:00
Craig Topper	61010a85b8	[X86] Add AVX512 versions of VCVTPD2PS to load folding tables. llvm-svn: 315801	2017-10-14 05:55:43 +00:00
Craig Topper	ee277e190c	[X86] Add patterns for vzmovl+cvtpd2ps with a load. llvm-svn: 315800	2017-10-14 05:55:42 +00:00
Craig Topper	aec05a9303	[X86] Remove some patterns for bitcasted alignednonedtemporalloads. These select the same instruction as the non-bitcasted pattern. So this provides no additional value. llvm-svn: 315799	2017-10-14 04:18:11 +00:00
Craig Topper	009f0aaeb0	[X86] Remove unnecessary bitconverts as the root of patterns for zero extended VCVTPD2UDQZ128rr and VCVTTPD2UDQZ128rr. We don't need a bitconvert as a root pattern in these cases. The types in the other parts of the pattern are sufficient to express the behavior of these instructions. llvm-svn: 315798	2017-10-14 04:18:10 +00:00
Craig Topper	d746747d03	[X86] Add additional patterns for folding loads with 128-bit VCVTDQ2PD and VCVTUDQ2PD. This matches the patterns we have for the SSE/AVX version. This is a prerequisite for D38714. llvm-svn: 315797	2017-10-14 04:18:09 +00:00
Craig Topper	134241e4af	[X86] Add AVX512 flavors of VCVTDQ2PD plus VCVTUDQ2PD to the load folding tables. llvm-svn: 315796	2017-10-14 04:18:08 +00:00
Craig Topper	0b64e67b0d	[X86] Remove TB_NO_REVERSE from VCVTDQ2PDYrr and VCVTPS2PDYrr in the load folding tables. I believe these were added incorrectly under the belief that the load size was smaller than the input register size, but that's not true. llvm-svn: 315795	2017-10-14 04:18:07 +00:00
Craig Topper	53b0cb7fa9	[X86] Add an additional isel pattern to CVTDQ2PDrm/VCVTDQ2PDrm to enable load folding without the peephole pass. This pattern is already used in AVX512VL version of these instructions. Though AVX512VL version is missing other patterns. llvm-svn: 315794	2017-10-14 04:18:06 +00:00
Quentin Colombet	dc2da06c55	[AArch64][RegisterBankInfo] Use the statically computed mappings for COPY We used to rely on the generic implementation to get the mappings for COPYs. The generic implementation resorts to table lookup and dynamically allocated objects to get the valid mappings. Given we already know how to map G_BITCAST and have the static mappings for them, use that code path for COPY as well. This is much more efficient. Improve the compile time of RegBankSelect by up to 20%. Note: When we eventually generate all the mappings via TableGen, we wouldn't have to do that dance to shave compile time. The intent of this change was to make sure that moving to static structure really pays off. NFC. llvm-svn: 315781	2017-10-14 00:43:48 +00:00
Krzysztof Parzyszek	a7e5c84590	Revert r315763: "[Hexagon] Rangify some loops, NFC" Broke some builds (using libstdc++). llvm-svn: 315769	2017-10-13 21:57:11 +00:00
Craig Topper	f6c69564e7	[X86] Use X86ISD::VBROADCAST in place of v2f64 X86ISD::MOVDDUP when AVX2 is available This is particularly important for AVX512VL where we are better able to recognize the VBROADCAST loads to fold with other operations. For AVX512VL we now use X86ISD::VBROADCAST for all of the patterns and remove the 128-bit X86ISD::VMOVDDUP. We may be able to use this for AVX1 as well which would allow us to remove more isel patterns. I also had to add X86ISD::VBROADCAST as a node to call combineShuffle for so that we treat it similar to X86ISD::MOVDDUP. Differential Revision: https://reviews.llvm.org/D38836 llvm-svn: 315768	2017-10-13 21:56:48 +00:00
Krzysztof Parzyszek	63ca5d6196	[Hexagon] Rangify some loops, NFC llvm-svn: 315763	2017-10-13 21:43:00 +00:00
Daniel Sanders	11300cead8	[globalisel][tablegen] Add support for fpimm and import of APInt/APFloat based ImmLeaf. Summary: There's only a tablegen testcase for IntImmLeaf and not a CodeGen one because the relevant rules are rejected for other reasons at the moment. On AArch64, it's because there's an SDNodeXForm attached to the operand. On X86, it's because the rule either emits multiple instructions or has another predicate using PatFrag which cannot easily be supported at the same time. Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar Reviewed By: qcolombet Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D36569 llvm-svn: 315761	2017-10-13 21:28:03 +00:00
Matt Arsenault	e11d8aca77	AMDGPU: Implement hasBitPreservingFPLogic llvm-svn: 315754	2017-10-13 21:10:22 +00:00
Benjamin Kramer	9f21ca6361	[Hexagon] Avoid unused variable warnings in release builds. No functionality change intended. llvm-svn: 315749	2017-10-13 20:46:14 +00:00
Matt Arsenault	550c66d10f	AMDGPU: Look for src mods before fp_extend When selecting modifiers for mad_mix instructions, look at fneg/fabs that occur before the conversion. llvm-svn: 315748	2017-10-13 20:45:49 +00:00
Daniel Sanders	649c585710	[aarch64] Support APInt and APFloat in ImmLeaf subclasses and make AArch64 use them. Summary: The purpose of this patch is to expose more information about ImmLeaf-like PatLeaf's so that GlobalISel can learn to import them. Previously, ImmLeaf could only be used to test int64_t's produced by sign-extending an APInt. Other tests on immediates had to use the generic PatLeaf and extract the constant using C++. With this patch, tablegen will know how to generate predicates for APInt, and APFloat. This will allow it to 'do the right thing' for both SelectionDAG and GlobalISel which require different methods of extracting the immediate from the IR. This is NFC for SelectionDAG since the new code is equivalent to the previous code. It's also NFC for FastISel because FastIselShouldIgnore is 1 for the ImmLeaf subclasses. Enabling FastIselShouldIgnore == 0 for these new subclasses will require a significant re-factor of FastISel. For GlobalISel, it's currently NFC because the relevant code to import the affected rules is not yet present. This will be added in a later patch. Depends on D36086 Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar Reviewed By: qcolombet Subscribers: bjope, aemerson, rengolin, javed.absar, igorb, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D36534 llvm-svn: 315747	2017-10-13 20:42:18 +00:00
Matt Arsenault	4d70754e3c	AMDGPU: Implement isFPExtFoldable This helps match v_mad_mix* in some cases. llvm-svn: 315744	2017-10-13 20:18:59 +00:00
Matt Arsenault	f2db97d8fa	DAG: Add opcode and source type to isFPExtFree This is only currently used for mad/fma transforms. This is the only case where it should be used for AMDGPU, so add an opcode to be sure. llvm-svn: 315740	2017-10-13 19:55:45 +00:00
Krzysztof Parzyszek	7c9c05888c	[Hexagon] Minimize number of repeated constant extenders Each constant extender requires an extra instruction, which adds to the code size and also reduces the number of available slots in an instruction packet. In most cases, the value of a repeated constant extender could be loaded into a register, and the instructions using the extender could be replaced with their counterparts that use that register instead. This patch adds a pass that tries to reduce the number of constant extenders, including extenders which differ only in an immediate offset known at compile time, e.g. @global and @global+12. llvm-svn: 315735	2017-10-13 19:02:59 +00:00
Craig Topper	5d692917f4	[X86] Add initial skeleton support for knm cpu This adds Intel's Knights Mill CPU to valid CPU names for the backend. For now it's an alias of "knl", but ultimately we need to support AVX5124FMAPS and AVX5124VNNIW instruction sets for it. Differential Revision: https://reviews.llvm.org/D38811 llvm-svn: 315722	2017-10-13 18:10:17 +00:00
Craig Topper	5805fb3dfc	[X86] Fix some inconsistent formatting in the processor feature lists. llvm-svn: 315696	2017-10-13 16:06:06 +00:00
Craig Topper	54541c4675	[X86] Add ProcIntelBDW to BroadwellProc class not BDWFeatures class. This isn't a property we want inherited. llvm-svn: 315695	2017-10-13 16:04:08 +00:00
Krzysztof Parzyszek	a0f2f7c413	[Hexagon] Add patterns for cmpb/cmph with immediate arguments Patch by Sumanth Gundapaneni. llvm-svn: 315692	2017-10-13 15:43:12 +00:00
Craig Topper	0817346aef	[X86] Stop creating CMOV nodes with a second MVT::Glue result Summary: We seem to inconsistently create CMOV nodes some with a Glue result and some without. But I can't find any cases that use the Glue result. So I've tried to remove all the places that did this. Reviewers: RKSimon, spatel, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38664 llvm-svn: 315686	2017-10-13 15:28:35 +00:00
Craig Topper	bf0de9d3b6	[X86] Remove patterns that select unmasked vbroadcastf2x32/vbroadcasti2x32. Prefer vbroadcastsd/vpbroadcastq instead. There's no advantage to using these instructions when they aren't masked. This enables some additional execution domain switching without needing to update the table. llvm-svn: 315674	2017-10-13 06:07:10 +00:00
Matthias Braun	bb8507e63c	Revert "TargetMachine: Merge TargetMachine and LLVMTargetMachine" Reverting to investigate layering effects of MCJIT not linking libCodeGen but using TargetMachine::getNameWithPrefix() breaking the lldb bots. This reverts commit r315633. llvm-svn: 315637	2017-10-12 22:57:28 +00:00
Matthias Braun	3a9c114b24	TargetMachine: Merge TargetMachine and LLVMTargetMachine Merge LLVMTargetMachine into TargetMachine. - There is no in-tree target anymore that just implements TargetMachine but not LLVMTargetMachine. - It should still be possible to stub out all the various functions in case a target does not want to use lib/CodeGen - This simplifies the code and avoids methods ending up in the wrong interface. Differential Revision: https://reviews.llvm.org/D38489 llvm-svn: 315633	2017-10-12 22:28:54 +00:00
Craig Topper	060cb43721	[X86] Add CLWB intrinsic. llvm part llvm-svn: 315613	2017-10-12 20:08:31 +00:00
Wei Ding	5676acad9e	Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ. Differential Revision: http://reviews.llvm.org/D37348 llvm-svn: 315610	2017-10-12 19:37:14 +00:00
Konstantin Zhuravlyov	70303c011f	AMDGPU/NFC: Move AMDGPU specific note types to ELF.h Differential Revision: https://reviews.llvm.org/D38747 llvm-svn: 315608	2017-10-12 18:59:54 +00:00
Artem Belevich	3bafc2f0d9	[NVPTX] Implemented wmma intrinsics and instructions. WMMA = "Warp Level Matrix Multiply-Accumulate". These are the new instructions introduced in PTX6.0 and available on sm_70 GPUs. Differential Revision: https://reviews.llvm.org/D38645 llvm-svn: 315601	2017-10-12 18:27:55 +00:00
Reid Kleckner	1a7e387849	[codeview] Don't emit FPO data in funclet prologues Attempt 3 to work around bugs in FPO data with funclets. llvm-svn: 315600	2017-10-12 18:20:35 +00:00
Konstantin Zhuravlyov	63e87f5a02	AMDGPU: Fix warnings introduced in r315526 llvm-svn: 315596	2017-10-12 17:34:05 +00:00
Lei Huang	0724fea2da	[PowerPC] Add profitability check for conversion to mtctr loops Add profitability checks for modifying counted loops to use the mtctr instruction. The latency of mtctr is only justified if there are more than 4 comparisons that will be removed as a result. Usually counted loops are formed relatively early and before unrolling, so most low trip count loops often don't survive. However we want to ensure that if they do, we do not mistakenly update them to mtctr loops. Use CodeMetrics to ensure we are only doing this for small loops with small trip counts. Differential Revision: https://reviews.llvm.org/D38212 llvm-svn: 315592	2017-10-12 16:43:33 +00:00
Tim Renouf	c8ffffe462	[AMDGPU] For amdpal, widen interpolation mode workaround Summary: The interpolation mode workaround ensures that at least one interpolation mode is enabled in PSInputAddr. It does not also check PSInputEna on the basis that the user might enable bits in that depending on run-time state. However, for amdpal os type, the user does not enable some bits after compilation based on run-time states; the register values being generated here are the final ones set in the hardware. Therefore, apply the workaround to PSInputAddr and PSInputEnable together. (The case where a bit is set in PSInputAddr but not in PSInputEnable is where the frontend set up an input arg for a particular interpolation mode, but nothing uses that input arg. Really we should have an earlier pass that removes such an arg.) Reviewers: arsenm, nhaehnle, dstuttard Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37758 llvm-svn: 315591	2017-10-12 16:16:41 +00:00
Don Hinton	3e0199f7eb	[dump] Remove NDEBUG from test to enable dump methods [NFC] Summary: Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP. Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods. Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so it'll be picked up by public headers. Differential Revision: https://reviews.llvm.org/D38406 llvm-svn: 315590	2017-10-12 16:16:06 +00:00
Sanjay Patel	3a72909b7e	[x86] replace isEqualTo with == for efficiency This is a follow-up suggested in D37534. Patch by Yulia Koval. llvm-svn: 315589	2017-10-12 16:15:38 +00:00
Simon Pilgrim	0903085ec3	[X86][SSE] Pull out repeated INSERT_VECTOR_ELT code from LowerBUILD_VECTOR v16i8/v8i16 insertion. NFCI. llvm-svn: 315587	2017-10-12 15:52:01 +00:00
Reid Kleckner	d925f98375	Speculative build fix 2 llvm-svn: 315542	2017-10-12 00:28:28 +00:00
Wei Mi	1736efd16a	Revert r307036 because of PR34919. llvm-svn: 315540	2017-10-12 00:24:52 +00:00
Reid Kleckner	9c0126ec0b	Speculative build fix, apparently I built llc without my patch applied to test it llvm-svn: 315539	2017-10-12 00:20:50 +00:00
Reid Kleckner	29cfa6f11f	[codeview] Disable FPO in functions using EH funclets Funclets are emitted by WinException which doesn't have access to X86TargetStreamer so it's hard to make a quick fix for this. llvm-svn: 315538	2017-10-12 00:06:57 +00:00
Reid Kleckner	c18c12e385	Fix AMDGPU build issue llvm-svn: 315535	2017-10-11 23:53:36 +00:00
Reid Kleckner	ec4ff24f79	[X86] Sink X86AsmPrinter ctor into .cpp file, NFC I keep adding and removing code here, so let's sink it. llvm-svn: 315534	2017-10-11 23:53:12 +00:00
Lang Hames	2241ffa43c	[MC] Have MCObjectStreamer take its MCAsmBackend argument via unique_ptr. MCObjectStreamer owns its MCCodeEmitter -- this fixes the types to reflect that, and allows us to remove the last instance of MCObjectStreamer's weird "holding ownership via someone else's reference" trick. llvm-svn: 315531	2017-10-11 23:34:47 +00:00
Konstantin Zhuravlyov	516651b154	AMDGPU/NFC: Minor clean ups in HSA metadata - Use HSA metadata streamer directly from AMDGPUAsmPrinter - Make naming consistent with PAL metadata Differential Revision: https://reviews.llvm.org/D38746 llvm-svn: 315526	2017-10-11 22:59:35 +00:00
Konstantin Zhuravlyov	c3beb6a075	AMDGPU/NFC: Minor clean ups in PAL metadata - Move PAL metadata definitions to AMDGPUMetadata - Make naming consistent with HSA metadata Differential Revision: https://reviews.llvm.org/D38745 llvm-svn: 315523	2017-10-11 22:41:09 +00:00
Konstantin Zhuravlyov	a63b0f9d20	AMDGPU/NFC: Rename code object metadata as HSA metadata - Rename AMDGPUCodeObjectMetadata to AMDGPUMetadata (PAL metadata will be included in this file in the follow up change) - Rename AMDGPUCodeObjectMetadataStreamer to AMDGPUHSAMetadataStreamer - Introduce HSAMD namespace - Other minor name changes in function and test names llvm-svn: 315522	2017-10-11 22:18:53 +00:00
Reid Kleckner	9cdd4df81a	[codeview] Implement FPO data assembler directives Summary: This adds a set of new directives that describe 32-bit x86 prologues. The directives are limited and do not expose the full complexity of codeview FPO data. They are merely a convenience for the compiler to generate more readable assembly so we don't need to generate tons of labels in CodeGen. If our prologue emission changes in the future, we can change the set of available directives to suit our needs. These are modelled after the .seh_ directives, which use a different format that interacts with exception handling. The directives are: .cv_fpo_proc _foo .cv_fpo_pushreg ebp/ebx/etc .cv_fpo_setframe ebp/esi/etc .cv_fpo_stackalloc 200 .cv_fpo_endprologue .cv_fpo_endproc .cv_fpo_data _foo I tried to follow the implementation of ARM EHABI CFI directives by sinking most directives out of MCStreamer and into X86TargetStreamer. This helps avoid polluting non-X86 code with WinCOFF specific logic. I used cdb to confirm that this can show locals in parent CSRs in a few cases, most importantly the one where we use ESI as a frame pointer, i.e. the one in http://crbug.com/756153#c28 Once we have cdb integration in debuginfo-tests, we can add integration tests there. Reviewers: majnemer, hans Subscribers: aemerson, mgorny, kristof.beyls, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D38776 llvm-svn: 315513	2017-10-11 21:24:33 +00:00
Krzysztof Parzyszek	c4a9a8d8e0	[Hexagon] Make sure that new-value jump is packetized with producer llvm-svn: 315510	2017-10-11 21:20:43 +00:00
Lei Huang	263dc4ef3a	[PowerPC] Utilize DQ-Form instructions for spill/restore and fix FrameIndex elimination to only use `lis/addi` if necessary. Currently we produce a bunch of unnecessary code when emitting the prologue/epilogue for spills/restores. Namely, if the load from stack slot/store to stack slot instruction is an X-Form instruction, we will always produce an LIS/ORI sequence for the stack offset. Furthermore, we have not exploited the P9 vector D-Form loads/stores for this purpose. This patch addresses both issues. Specifying the D-Form load as the instruction to use for stack spills/reloads should be safe because: 1. The stack should be aligned according to the ABI 2. If the stack isn't aligned, PPCRegisterInfo::eliminateFrameIndex() will check for the offset being a multiple of 16 and will convert it to an X-Form instruction if it isn't. Differential Revision: https://reviews.llvm.org/D38758 llvm-svn: 315500	2017-10-11 20:20:58 +00:00
Sanjay Patel	6c0aef77aa	[x86] avoid infinite loop from SoftenFloatOperand (PR34866) Legalization of fp128 assumes things that we should have asserts for, so that's another potential improvement. Differential Revision: https://reviews.llvm.org/D38771 llvm-svn: 315485	2017-10-11 18:24:21 +00:00
Krzysztof Parzyszek	bf626195df	[Hexagon] Handle non-immediate operands to A2_addi in getIncrementValue llvm-svn: 315472	2017-10-11 16:15:31 +00:00
Simon Pilgrim	7db366630c	Spelling mistake in comment. NFCI. llvm-svn: 315471	2017-10-11 16:10:05 +00:00
Craig Topper	3dc22bba47	[X86] Remove MVT::i1 handling code from LowerTRUNCATE Summary: I don't think this is necessary with i1 being illegal now. Reviewers: RKSimon, zvi, guyblank Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38784 llvm-svn: 315469	2017-10-11 16:05:05 +00:00
Krzysztof Parzyszek	12bdcab59c	[Pipeliner] Fix offset value for instrs dependent on post-inc load/stores The software pipeliner and the packetizer try to break dependence between the post-increment instruction and the dependent memory instructions by changing the base register and the offset value. However, in some cases, the existing logic didn't work properly and created incorrect offset value. Patch by Jyotsna Verma. llvm-svn: 315468	2017-10-11 15:59:51 +00:00
Krzysztof Parzyszek	8f174dde92	[Pipeliner] Improve serialization order for post-increments The pipeliner is generating a serial sequence that causes poor register allocation when a post-increment instruction appears prior to the use of the post-increment register. This occurs when there is a circular set of dependences involved with a sequence of instructions in the same cycle. In this case, there is no serialization of the parallel semantics that will not cause an additional register to be allocated. This patch fixes the problem by changing the instructions so that the post-increment instruction is used by the subsequent instruction, which enables the register allocator to make a better decision and not require another register. Patch by Brendon Cahoon. llvm-svn: 315466	2017-10-11 15:51:44 +00:00
Alex Bradbury	5c1eef4618	[RISCV] Fix build after r315327 Differential Revision: https://reviews.llvm.org/D38779 Patch by Chih-Mao Chen. llvm-svn: 315455	2017-10-11 12:09:06 +00:00
Simon Dardis	41851e3546	[mips] Add support for parsing target specific flags for MIR Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D38620 llvm-svn: 315451	2017-10-11 11:11:35 +00:00
Oliver Stannard	4191b9eaea	[Asm] Add debug tracing in table-generated assembly matcher This adds debug tracing to the table-generated assembly instruction matcher, enabled by the -debug-only=asm-matcher option. The changes in the target AsmParsers are to add an MCInstrInfo reference under a consistent name, so that we can use it from table-generated code. This was already being used this way for targets that use deprecation warnings, but 5 targets did not have it, and Hexagon had it under a different name to the other backends. llvm-svn: 315445	2017-10-11 09:17:43 +00:00
Lang Hames	02d330548d	[MC] Have MCObjectStreamer take its MCAsmBackend argument via unique_ptr. MCObjectStreamer owns its MCAsmBackend -- this fixes the types to reflect that, and allows us to remove another instance of MCObjectStreamer's weird "holding ownership via someone else's reference" trick. llvm-svn: 315410	2017-10-11 01:57:21 +00:00
Craig Topper	85b1da1dc4	[X86] Remove temporary std::string creation from shuffle comment printing. We can just write directly to the raw_ostream. llvm-svn: 315399	2017-10-11 00:46:09 +00:00
Craig Topper	6ce20bd184	[X86] Add 128-bit version of vbroadcasti32x2 to shuffle comment decoding. llvm-svn: 315395	2017-10-11 00:11:53 +00:00
Craig Topper	bb0e316dc7	[X86] Add broadcast patterns that allow a scalar_to_vector between the broadcast and the load. We already have these patterns for AVX512VL, but not AVX1 or 2. llvm-svn: 315382	2017-10-10 22:40:31 +00:00
Craig Topper	ad3d03193a	[X86] Fix some patterns that select VLX instructions, but were incorrectly also checking presence of BWI instructions. The EVEX->VEX pass probably obscures this. llvm-svn: 315365	2017-10-10 21:07:14 +00:00
Simon Dardis	b994128d14	[mips] Correct the instruction predicates for microMIPSr3 Rather than using the AdditionalPredicates mechanism to guard the microMIPS instructions, use the existing predicates to properly guard those instructions. This also resolves a case where an instruction pattern was incorrectly available for microMIPS32R6, which caused a register allocation failure as the registers specified in the pattern were not available. Reviewers: nitesh.jain, atanasyan Differential Revision: https://reviews.llvm.org/D38451 llvm-svn: 315362	2017-10-10 20:52:53 +00:00
Matt Arsenault	f42074b699	AMDGPU: Fix missing skipFunction calls llvm-svn: 315361	2017-10-10 20:48:36 +00:00
Matt Arsenault	d674e0ac0d	AMDGPU: Fix failure to select branch with optnone opt-bisect/optnone disable the AMDGPUUniformAnnotateValues pass. The heuristic in the custom selector for brcond deferred the branch uniformity check to the pattern, which would fail. llvm-svn: 315360	2017-10-10 20:34:49 +00:00
Matt Arsenault	cc85223f87	AMDGPU: Fix incorrect selection of pseudo-branches These should only be used if the machine structurizer is enabled. llvm-svn: 315357	2017-10-10 20:22:07 +00:00
Yaxun Liu	de4b88d9a1	[AMDGPU] Lower enqueued blocks and generate runtime metadata This patch adds a post-linking pass which replaces the function pointer of enqueued block kernel with a global variable (runtime handle) and adds runtime-handle attribute to the enqueued block kernel. In LLVM CodeGen the runtime-handle metadata will be translated to RuntimeHandle metadata in code object. Runtime allocates a global buffer for each kernel with RuntimeHandle metadata and saves the kernel address required for the AQL packet into the buffer. __enqueue_kernel function in device library knows that the invoke function pointer in the block literal is actually runtime handle and loads the kernel address from it and puts it into AQL packet for dispatching. This cannot be done in FE since FE cannot create a unique global variable with external linkage across LLVM modules. The global variable with internal linkage does not work since optimization passes will try to replace loads of the global variable with its initialization value. Differential Revision: https://reviews.llvm.org/D38610 llvm-svn: 315352	2017-10-10 19:39:48 +00:00
Derek Schuff	669300db9c	[WebAssembly] Update MCObjectWriter and associated interfaces after r315327 llvm-svn: 315335	2017-10-10 17:31:43 +00:00
Lang Hames	232cdb48fc	[MC] Add another missing <memory> include left out of r315327. llvm-svn: 315332	2017-10-10 16:59:01 +00:00
Lang Hames	3a67075a3a	[MC] Add a missing <memory> include left out of r315327. llvm-svn: 315331	2017-10-10 16:58:26 +00:00
Lang Hames	60fbc7cc38	[MC] Thread unique_ptr<MCObjectWriter> through the create.*ObjectWriter functions. This makes the ownership of the resulting MCObjectWriter clear, and allows us to remove one instance of MCObjectStreamer's bizarre "holding ownership via someone else's reference" trick. llvm-svn: 315327	2017-10-10 16:28:07 +00:00
Jacob Gravelle	37af00e7d0	[WebAssembly] Narrow the scope of WebAssemblyFixFunctionBitcasts Summary: The pass to fix function bitcasts generates thunks for functions that are called directly with a mismatching signature. It was also generating thunks in cases where the function was address-taken, causing aliasing problems in otherwise valid cases. This patch tightens the restrictions for when the pass runs. Reviewers: sunfish, dschuff Subscribers: jfb, sbc100, llvm-commits, aheejin Differential Revision: https://reviews.llvm.org/D38640 llvm-svn: 315326	2017-10-10 16:20:18 +00:00
Simon Dardis	96d35fe06a	[mips] Duplicate the reciprocal instruction definitions for FP32 Add instruction definitions for FP32 mode for recip.d and rsqrt.d. Previously these instructions were only defined when targeting the full 64-bit FPU model but were not guarded properly. Reviewers: nitesh.jain, atanasyan Differential Revision: https://reviews.llvm.org/D38400 llvm-svn: 315318	2017-10-10 14:41:11 +00:00
Stefan Pintilie	cc330daa5b	[PowerPC] Add missing record form instructions to the P9 Scheduling Model A number of record form instructions were missing from the P9 scheduling model. Added those instructions and marked the P9 model as complete. Differential Revision: https://reviews.llvm.org/D38560 llvm-svn: 315313	2017-10-10 13:45:35 +00:00
Uriel Korach	059e211aa1	after fixing the i386 case Change-Id: If6fe0b6ec01f111115fb734fe31c0e152dbc165f llvm-svn: 315311	2017-10-10 13:43:09 +00:00
Simon Dardis	a17a7b619a	[mips] Partially fix PR34391 Previously, the parsing of the 'subu $reg, ($reg,) imm' relied on a parser which also rendered the operand to the instruction. In some cases the general parser could construct an MCExpr which was not a MCConstantExpr which MipsAsmParser was expecting. Address this by altering the special handling to cope with unexpected inputs and fine-tune the handling of cases where a register name that is not available in the current ABI is regarded as not a match for the custom parser but also not as an outright error. Also enforces the binutils restriction that only constants are accepted. This partially resolves PR34391. Thanks to Ed Maste for reporting the issue! Reviewers: nitesh.jain, arichardson Differential Revision: https://reviews.llvm.org/D37476 llvm-svn: 315310	2017-10-10 13:34:45 +00:00
Oliver Stannard	30b732c942	[ARM, Asm] Harden GNU LDRD/STRD aliases against invalid inputs Previously, the code that implemented the GNU assembler aliases for the LDRD and STRD instructions (where the second register is omitted) assumed that the input was a valid instruction. This caused assertion failures for every example in ldrd-strd-gnu-bad-inst.s. This improves the code so that, if the instruction is not in the expected format, the check bails out and the asm parser is run on the unmodified instruction. It also relaxes the alias on thumb targets, so that unaligned pairs of registers can be used. The restriction that Rt must be even-numbered only applies to the ARM versions of these instructions. Differential revision: https://reviews.llvm.org/D36732 llvm-svn: 315305	2017-10-10 12:38:22 +00:00
Oliver Stannard	cd3306f62f	[ARM, Asm] Add diagnostics for floating-point register operands This adds diagnostic strings for the ARM floating-point register classes, which will be used when these classes are expected by the assembler, but the provided operand is not valid. One of these, DPR, requires C++ code to select the correct error message, as that class contains different registers depending on the FPU. The rest can all have their diagnostic strings stored in the tablegen decription of them. Differential revision: https://reviews.llvm.org/D36693 llvm-svn: 315304	2017-10-10 12:35:09 +00:00
Oliver Stannard	bbad419e94	[ARM, Asm] Add diagnostics for general-purpose register operands This adds diagnostic strings for the ARM general-purpose register classes, which will be used when these classes are expected by the assembler, but the provided operand is not valid. One of these, rGPR, requires C++ code to select the correct error message, as that class contains different registers in pre-v8 and v8 targets. The rest can all have their diagnostic strings stored in the tablegen description of them. Differential revision: https://reviews.llvm.org/D36692 llvm-svn: 315303	2017-10-10 12:31:53 +00:00
Nicolai Haehnle	312b64f4d7	AMDGPU: Split MUBUF offset into aligned components Summary: Atomic buffer operations do not work (and trap on gfx9) when the components are unaligned, even if their sum is aligned. Previously, we generated an offset of 4156 without an SGPR by splitting it as 4095 + 61 (immediate + inline constant). The highest offset for which we can do this correctly is 4156 = 4092 + 64. Fixes dEQP-GLES31.functional.ssbo.atomic.* Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37850 llvm-svn: 315302	2017-10-10 12:22:23 +00:00
Nemanja Ivanovic	7bf866eb10	Fix for PR34888. The issue is that we assume operand zero of the input to the add instruction is a register. In this case, the input comes from inline assembly and operand zero is not a register thereby causing a crash. The code will bail anyway if the input instruction doesn't have the right opcode. So do that check first and let short-circuiting prevent the crash. llvm-svn: 315285	2017-10-10 08:46:10 +00:00
NAKAMURA Takumi	aba2b3d1f3	SILoadStoreOptimizer.cpp: Fix build; Clang doesn't like "using anonymous struct" since rL315256. llvm-svn: 315283	2017-10-10 08:30:53 +00:00
Alex Bradbury	8cc99f1887	[RISCV] Fix build after r315254 createELFObjectWriter now takes a std::unique_ptr<MCELFObjectTargetWriter> rather than a MCELFObjectTargetWriter*. llvm-svn: 315275	2017-10-10 07:19:18 +00:00
Craig Topper	a88306e6fb	[AVX512] Add patterns to commute integer comparison instructions during isel. This enables broadcast loads to be commuted and allows normal loads to be folded without the peephole pass. llvm-svn: 315274	2017-10-10 06:36:46 +00:00
Reid Kleckner	e52d1e6787	[SEH] Use reportError instead of report_fatal_error for bad directives This makes the .seh_ directives slightly more usable from standalone assembly files. This removes a large number of report_fatal_errors and recovers from the error by ignoring the directive. llvm-svn: 315262	2017-10-10 01:26:25 +00:00
Lang Hames	1301a878f1	[MC] Plumb unique_ptr<MCWasmObjectTargetWriter> through createWasmObjectWriter to WasmObjectWriter's constructor. Fixes the same ownership issue for Wasm that r315245 did for MachO: WasmObjectWriter takes ownership of its MCWasmObjectTargetWriter, so we want to pass this through to the constructor via a unique_ptr, rather than a raw ptr. llvm-svn: 315260	2017-10-10 01:15:10 +00:00
Reid Kleckner	a11b983e11	Fix Wasm build after r315254 llvm-svn: 315258	2017-10-10 00:52:40 +00:00
Lang Hames	77dff39cb4	[MC] Plumb unique_ptr<MCWinCOFFObjectTargetWriter> through createWinCOFFObjectWriter to WinCOFFObjectWriter's constructor. Fixes the same ownership issue for COFF that r315245 did for MachO: WinCOFFObjectWriter takes ownership of its MCWinCOFFObjectTargetWriter, so we want to pass this through to the constructor via a unique_ptr, rather than a raw ptr. llvm-svn: 315257	2017-10-10 00:50:29 +00:00
Lang Hames	dcb312bdb9	[MC] Plumb unique_ptr<MCELFObjectTargetWriter> through createELFObjectWriter to ELFObjectWriter's constructor. Fixes the same ownership issue for ELF that r315245 did for MachO: ELFObjectWriter takes ownership of its MCELFObjectTargetWriter, so we want to pass this through to the constructor via a unique_ptr, rather than a raw ptr. llvm-svn: 315254	2017-10-09 23:53:15 +00:00
Lang Hames	9b206a7d60	[MC] Plumb unique_ptr<MCMachObjectTargetWriter> through createMachObjectWriter to MCObjectWriter's constructor. MCObjectWriter takes ownership of its MCMachObjectTargetWriter argument -- this patch plumbs that ownership relationship through the constructor (which previously took raw MCMachObjectTargetWriter*) and the createMachObjectWriter function. llvm-svn: 315245	2017-10-09 22:38:13 +00:00
Aditya Nandakumar	c3bfc81a1f	[GISel]: Fix generation of illegal COPYs during CallLowering We end up creating COPY's that are either truncating/extending and this should be illegal. https://reviews.llvm.org/D37640 Patch for X86 and ARM by igorb, rovka llvm-svn: 315240	2017-10-09 20:07:43 +00:00
Zvi Rackover	c1d5955684	[X86] Unsigned saturation subtraction canonicalization [the backend part] Summary: On behalf of julia.koval@intel.com The patch transforms canonical version of unsigned saturation, which is sub(max(a,b),a) or sub(a,min(a,b)) to special psubus instruction on targets which support it (8bit and 16bit uints). umax(a,b) - b -> subus(a,b) a - umin(a,b) -> subus(a,b) There is also extra case handled, when right part of sub is 32 bit and can be truncated, using UMIN (this transformation was discussed in https://reviews.llvm.org/D25987). The example of special case code: ``` void foo(unsigned short p, int max, int n) { int i; unsigned m; for (i = 0; i < n; i++) { m = --p; p = (unsigned short)(m >= max ? m-max : 0); } } ``` Max in this example is truncated to max_short value, if it is greater than m, or just truncated to 16 bit, if it is not. It is a valid transformation, because if max > max_short, result of the expression will be zero. Here is the table of types, I try to support, special case items are bold: \| Size \| 128 \| 256 \| 512 \| ----- \| ----- \| ----- \| ----- \| i8 \| v16i8 \| v32i8 \| v64i8 \| i16 \| v8i16 \| v16i16 \| v32i16 \| i32 \| \| v8i32* \| v16i32 \| i64 \| \| \| v8i64 Reviewers: zvi, spatel, DavidKreitzer, RKSimon Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37534 llvm-svn: 315237	2017-10-09 20:01:10 +00:00
Amara Emerson	24ca39ce71	[AArch64] Improve codegen for inverted overflow checking intrinsics E.g. if we have a (xor(overflow-bit), 1) where overflow-bit comes from an intrinsic like llvm.sadd.with.overflow then we can kill the xor and use the inverted condition code for the CSEL. rdar://28495949 Reviewed By: kristof.beyls Differential Revision: https://reviews.llvm.org/D38160 llvm-svn: 315205	2017-10-09 15:15:09 +00:00
Craig Topper	c88883b07d	[X86] Remove a setLoadExtAction from the AVX512 section that uses an AVX512BW type and is already present in the AVX512BW section. llvm-svn: 315202	2017-10-09 01:05:16 +00:00
Craig Topper	4f8656a7af	[X86] Enable extended comparison predicate support for SETUEQ/SETONE when targeting AVX instructions. We believe that, despite AMD's documentation, they really do support all 32 comparison predicates under AVX. Differential Revision: https://reviews.llvm.org/D38609 llvm-svn: 315201	2017-10-09 01:05:15 +00:00
Simon Pilgrim	2c742f919a	[X86][SSE] Don't call combineTo inside combineX86ShufflesRecursively. NFCI. Return the combined shuffle from combineX86ShufflesRecursively and perform the combineTo in the caller. Makes it easier for future patches to use this in functions that aren't actually shuffles themselves. llvm-svn: 315195	2017-10-08 20:58:14 +00:00
Simon Pilgrim	6abbd33ec0	Tidyup with clang-format. NFCI. llvm-svn: 315187	2017-10-08 19:24:30 +00:00
Benjamin Kramer	16610028ea	Remove unused variables. No functionality change. llvm-svn: 315185	2017-10-08 19:11:02 +00:00
Simon Pilgrim	dc32c844f9	[X86] getTargetConstantBitsFromNode - add support for decoding scalar constants llvm-svn: 315182	2017-10-08 17:21:18 +00:00
Craig Topper	c97775c03c	[X86] Prefer MOVSS/SD over BLENDI during legalization. Remove BLENDI versions of scalar arithmetic patterns Summary: We currently disable some converting of shuffles to MOVSS/MOVSD during legalization if SSE41 is enabled. But later during shuffle combining we go back to preferring MOVSS/MOVSD. Additionally we have patterns that look for BLENDIs to detect scalar arithmetic operations. I believe due to the combining using MOVSS/MOVSD these are unnecessary. Interestingly, we still codegen blend instructions even though lowering/isel emit movss/movsd instructions. Turns out machine CSE commutes them to blend, and then commuting those blends back into blends that are equivalent to the original movss/movsd. This patch fixes the inconsistency in legalization to prefer MOVSS/MOVSD. The one test change was caused by this change. The problem is that we have integer types and are mostly selecting integer instructions except for the shufps. This shufps forced the execution domain, but the vpblendw couldn't have its domain changed with a naive instruction swap. We could fix this by special casing VPBLENDW based on the immediate to widen the element type. The rest of the patch is removing all the excess scalar patterns. Long term we should probably add isel patterns to make MOVSS/MOVSD emit blends directly instead of relying on the double commute. We may also want to consider emitting movss/movsd for optsize. I also wonder if we should still use the VEX encoded blendi instructions even with AVX512. Blends have better throughput, and that may outweigh the register constraint. Reviewers: RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38023 llvm-svn: 315181	2017-10-08 16:57:23 +00:00
Amara Emerson	1cd89ca669	[AArch64][GlobalISel] Make G_PHI of p0 types legal. Differential Revision: https://reviews.llvm.org/D38621 llvm-svn: 315177	2017-10-08 15:29:11 +00:00
Gadi Haber	684944b822	[X86][SKX] Adding the scheduling information for the SKX target. Adding the scheduling information for the SkylakeServer (SKX) target. This patch adds the instruction scheduling information for the SkylakeServer (SKX) architecture target by adding the file X86SchedSkylakeServer.td located under the X86 Target. We used the scheduling information retrieved from the Skylake architects in order to create the file. The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction. The patch continues the scheduling replacement and insertion effort started with the SNB target in r310792, the HSW target in r311879 and the SkylakeClient (SKL) target in rL313613. Please expect some performance fluctuations due to code alignment effects. Reviewers: zvi, RKSimon, craig.topper, chandlerc, aymanmu Differential Revision: https://reviews.llvm.org/D38443 Change-Id: I5c228fcc09e9e5a99b6116e62b356c4f9b971185 llvm-svn: 315175	2017-10-08 12:52:54 +00:00
Ayman Musa	1170deb9c8	[X86] Add missing entries in 'MemoryFoldTable2Addr' to get complete form of the table. Get the folding table 'MemoryFoldTable2Addr' to a complete state as part of the process explained in https://reviews.llvm.org/D38028 Differential Revision: https://reviews.llvm.org/D38500 llvm-svn: 315174	2017-10-08 09:46:50 +00:00
Ayman Musa	993339b941	[X86][TableGen] Recommitting the X86 memory folding tables TableGen backend while disabling it by default. After the original commit ([[ https://reviews.llvm.org/rL304088 \| rL304088 ]]) was reverted, a discussion in llvm-dev was opened on 'how to accomplish this task'. In the discussion we concluded that the best way to achieve our goal (which is to automate the folding tables and remove the manually maintained tables) is: # Commit the tablegen backend disabled by default. # Proceed with an incremental updating of the manual tables - while checking the validity of each added entry. # Repeat previous step until we reach a state where the generated and the manual tables are identical. Then we can safely remove the manual tables and include the generated tables instead. # Schedule periodical (1 week/2 weeks/1 month) runs of the pass: - if changes appear (new entries): - make sure the entries are legal - If they are not, mark them as illegal to folding - Commit the changes (if there are any). CMake flag added for this purpose is "X86_GEN_FOLD_TABLES". Building with this flag will run the pass and emit the X86GenFoldTables.inc file under build/lib/Target/X86/ directory which is a good reference for any developer who wants to take part in the effort of completing the current folding tables. Differential Revision: https://reviews.llvm.org/D38028 llvm-svn: 315173	2017-10-08 09:20:32 +00:00
Craig Topper	bbca2f2978	[X86] Stop LowerSIGN_EXTEND_AVX512 from creating v8i16/v16i16/v16i8 vselects with a v8i1/v16i1 condition when BWI is not available. Some of the tests in vector-shuffle-v1.ll would get into an infinite loop without this. llvm-svn: 315172	2017-10-08 08:50:59 +00:00
Ayman Musa	5fc6dc58d7	[X86] Add new attribute to X86 instructions to enable marking them as "not memory foldable" This attribute will be used in a tablegen backend that generated the X86 memory folding tables which will be added in a future pass. Instructions with this attribute unset will be excluded from the full set of X86 instructions available for the pass. Differential Revision: https://reviews.llvm.org/D38027 llvm-svn: 315171	2017-10-08 08:32:56 +00:00
Craig Topper	9563cab961	[X86] Simplify some code in getInsertVINSERTImmediate and getExtractVEXTRACTImmediate. NFC Replace one of the divides with a multiply. llvm-svn: 315162	2017-10-08 01:33:42 +00:00
Craig Topper	27170fee8d	[X86] If we see an insert of a bitcast into zero vector, canonicalize it to move the bitcast to the other side of the insert. This improves detection of zeroing of upper bits during isel. llvm-svn: 315161	2017-10-08 01:33:41 +00:00
Craig Topper	f7a19db649	[X86] Remove ISD::INSERT_SUBVECTOR handling from combineBitcastForMaskedOp. Add isel patterns to make up for it. This will allow for some flexibility in canonicalizing bitcasts around insert_subvector. llvm-svn: 315160	2017-10-08 01:33:40 +00:00
Craig Topper	16f2044fa8	[X86] Use getConstantOperandVal to simplify some code. NFC llvm-svn: 315159	2017-10-08 01:33:38 +00:00
Simon Pilgrim	9508fe7924	[X86][SSE] Match bitcasted BUILD_VECTOR of constants for v2i64 shifts on 64-bit targets (PR34855) Extension to rL315155, generate constant shifts on 64-bits as well as 32-bits. llvm-svn: 315156	2017-10-07 17:57:22 +00:00
Simon Pilgrim	70e1db78db	[X86][SSE] Match bitcasted v4i32 BUILD_VECTORS for v2i64 shifts on 64-bit targets (PR34855) We were already doing this for 32-bit targets, but we can generate these on 64-bits as well. llvm-svn: 315155	2017-10-07 17:42:17 +00:00
Craig Topper	2f60295364	[X86] Add X86ISD::CMOV to computeKnownBitsForTargetNode and ComputeNumSignBitsForTargetNode. Summary: Implementations based on ISD::SELECT. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38663 llvm-svn: 315153	2017-10-07 16:51:19 +00:00
Simon Pilgrim	73f143e774	[X86][SSE] Improve shuffling combining with horizontal operations Recognise cases when we can merge the shuffles with their horizontal (HADD/HSUB/PACK) instruction inputs. Replaces an older implementation which performed some of this during lowering, expanding an existing target shuffle combine stage instead. Differential Revision: https://reviews.llvm.org/D38506 llvm-svn: 315150	2017-10-07 12:42:23 +00:00
Martin Storsjo	5e9d482b0a	[X86] Update an outdated comment about SjLj The SjLj intrinsics in the X86 backend are intended for use with SjLj exception handling as well, since SVN r271244. Differential Revision: https://reviews.llvm.org/D38532 llvm-svn: 315146	2017-10-07 06:00:32 +00:00
Craig Topper	e79eff3bb5	[X86] Correct result type for the flag result of RDSEED and RDRAND nodes. Correct the CC type for the CMOV used with RDSEED/RDRAND. The flag result was MVT::Glue, but should be MVT::i32. The CC type was MVT::i8, but should be MVT::i32. llvm-svn: 315145	2017-10-07 05:11:59 +00:00
Jessica Paquette	13593843f6	[MachineOutliner] Disable outlining from LinkOnceODRs by default Say you have two identical linkonceodr functions, one in M1 and one in M2. Say that the outliner outlines A,B,C from one function, and D,E,F from another function (where letters are instructions). Now those functions are not identical, and cannot be deduped. Locally to M1 and M2, these outlining choices would be good-- to the whole program, however, this might not be true! To mitigate this, this commit makes it so that the outliner sees linkonceodr functions as unsafe to outline from. It also adds a flag, -enable-linkonceodr-outlining, which allows the user to specify that they want to outline from such functions when they know what they're doing. Changing this handles most code size regressions in the test suite caused by competing with linker dedupe. It also doesn't have a huge impact on the code size improvements from the outliner. There are 6 tests that regress > 5% from outlining WITH linkonceodrs to outlining WITHOUT linkonceodrs. Overall, most tests either improve or are not impacted. Not outlined vs outlined without linkonceodrs: https://hastebin.com/raw/qeguxavuda Not outlined vs outlined with linkonceodrs: https://hastebin.com/raw/edepoqoqic Outlined with linkonceodrs vs outlined without linkonceodrs: https://hastebin.com/raw/awiqifiheb Numbers generated using compare.py with -m size.__text. Tests run for AArch64 with -Oz -mllvm -enable-machine-outliner -mno-red-zone. llvm-svn: 315136	2017-10-07 00:16:34 +00:00
Cameron McInally	9d64101fe8	[AVX512] Fix TERNLOG when folding broadcast Patch to fix ternlog instructions with a folded broadcast. The broadcast decorator, e.g. {1toX}, was missing. Differential Revision: https://reviews.llvm.org/D38649 llvm-svn: 315122	2017-10-06 22:31:29 +00:00
Stanislav Mekhanoshin	de42c29a68	[AMDGPU] New 64 bit div/rem expansion Old expansion was 20 VGPRs, 78 SGPRs and ~380 instructions. This expansion is 11 VGPRs, 12 SGPRs and ~120 instructions. Passes OpenCL conformance test_integer_ops quick_[u]long_math Differential Revision: https://reviews.llvm.org/D38607 llvm-svn: 315081	2017-10-06 17:24:45 +00:00
Diana Picus	e393bc72ee	[ARM] GlobalISel: Select shifts Unfortunately TableGen doesn't handle this yet: Unable to deduce gMIR opcode to handle Src (which is a leaf). Just add some temporary hand-written code to generate the proper MOVsr. llvm-svn: 315071	2017-10-06 15:39:16 +00:00
Diana Picus	a81a4b17e5	[ARM] GlobalISel: Map shift operands to GPRs llvm-svn: 315067	2017-10-06 14:52:43 +00:00
Diana Picus	2c95730450	[ARM] GlobalISel: Mark shifts as legal for s32 The new legalize combiner introduces shifts all over the place, so we should support them sooner rather than later. llvm-svn: 315064	2017-10-06 14:30:05 +00:00
Jonas Paulsson	c63ed222b8	[SystemZ] Enable machine scheduler. The machine scheduler (before register allocation) is enabled by default for SystemZ. The SelectionDAG scheduling preference now becomes source order scheduling (was regpressure). Review: Ulrich Weigand https://reviews.llvm.org/D37977 llvm-svn: 315063	2017-10-06 13:59:28 +00:00
Reid Kleckner	676941909d	[X86] Extract CATCHRET handling from emitEpilogue, NFC llvm-svn: 315023	2017-10-05 21:37:39 +00:00
Derek Schuff	885dc59297	[WebAssembly] Add the rest of the atomic loads Add extending loads and constant offset patterns A bit more refactoring of the tablegen to make the patterns fairly nice and uniform between the regular and atomic loads. Differential Revision: https://reviews.llvm.org/D38523 llvm-svn: 315022	2017-10-05 21:18:42 +00:00
Krzysztof Parzyszek	a114941fa8	[Hexagon] Make PS_fi and PS_fia extendable (they both expand to A2_addi) llvm-svn: 315019	2017-10-05 20:20:06 +00:00
Krzysztof Parzyszek	7ae3ae9ef4	[Hexagon] Give uniform names to functions changing addressing modes, NFC The new format is changeAddrMode_xx_yy, where xx is the current mode, and yy is the new one. Old name: New name: getBaseWithImmOffset changeAddrMode_abs_io getAbsoluteForm changeAddrMode_io_abs getBaseWithRegOffset changeAddrMode_io_rr xformRegToImmOffset changeAddrMode_rr_io getBaseWithLongOffset changeAddrMode_rr_ur getRegShlForm changeAddrMode_ur_rr llvm-svn: 315013	2017-10-05 20:01:38 +00:00
Reid Kleckner	7344282c36	[X86] Simplify X86 epilogue frame size calculation, NFC Sink the insertion of "pop ebp" out of the frame size calculation branches. They all check for HasFP. Our handling of CLEANUPRET and CATCHRET was equivalent, both are funclets and use the same frame size. We can eliminate the CLEANUPRET case. Hoist the hasFP(MF) query into a local bool. Rename TargetMBB to CatchRetTarget to be more descriptive. Eliminate the Optional<unsigned> RetOpcode local, now that it has one use. It's only a net savings of 10 lines, but hopefully it's slightly more readable. llvm-svn: 315000	2017-10-05 18:27:08 +00:00
Petar Jovanovic	65f10246bb	[mips] implement .set dspr2 directive Implement .set dspr2 directive with appropriate feature bits. This directive is a counterpart of -mattr=dspr2 command line option with the exception that it does not influence elf header flags. Patch by Milos Stojanovic. Differential Revision: https://reviews.llvm.org/D38537 llvm-svn: 314994	2017-10-05 17:40:32 +00:00
Matt Arsenault	2d3f8f333d	AMDGPU: Set v2i32 any_extend to expand llvm-svn: 314993	2017-10-05 17:38:30 +00:00
Krzysztof Parzyszek	9f3e88ae64	[RDF] Simplify construction of maximal registers The old algorithm was not correct, although it worked most of the time. Avoid the complex reachability analysis and simply calculate the maximal registers out of the set of all referenced registers. llvm-svn: 314991	2017-10-05 17:12:49 +00:00
Artur Pilipenko	7b15254c8f	[X86] Fix chains update when lowering BUILD_VECTOR to a vector load The code which lowers BUILD_VECTOR of consecutive loads into a single vector load doesn't update chains properly. As a result the vector load can be reordered with the store to the same location. The current code in EltsFromConsecutiveLoads only updates the chain following the first load. The fix is to update the chains following all the loads comprising the vector. This is a fix for PR10114. Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D38547 llvm-svn: 314988	2017-10-05 16:28:21 +00:00
Konstantin Zhuravlyov	aa0835a7ab	AMDGPU: Add and set AMDGPU-specific e_flags Differential Revision: https://reviews.llvm.org/D38556 llvm-svn: 314987	2017-10-05 16:19:18 +00:00
Simon Dardis	51a7ae2a29	[mips] Place certain 64 bit FPU instructions in their own decoder namespace Previously, instructions that were defined to use the FGR64 register class were associated with the Mips64 table which was incorrect. Reviewers: nitesh.jain, atanasyan Differential Revision: https://reviews.llvm.org/D38454 llvm-svn: 314976	2017-10-05 10:27:37 +00:00
Eugene Zelenko	60433b682f	[X86] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 314953	2017-10-05 00:33:50 +00:00
Matt Arsenault	f48e5c9ce5	AMDGPU: Add comment about clamps llvm-svn: 314952	2017-10-05 00:13:20 +00:00
Matt Arsenault	aafff87dda	AMDGPU: Do not fold clamp instructions when sources are different Patch by hakzsam (Samuel Pitoiset) llvm-svn: 314951	2017-10-05 00:13:17 +00:00
Matt Arsenault	9ab1fa6803	AMDGPU: Fix not accounting for instruction size in bundles These were counted as 0. Fixes branch limit exceeded errors in some large programs. llvm-svn: 314944	2017-10-04 22:59:12 +00:00
Konstantin Zhuravlyov	8684f7b4f9	AMDGPU: Correctly set EI_OSABI based on the os Differential Revision: https://reviews.llvm.org/D38555 llvm-svn: 314943	2017-10-04 22:44:13 +00:00
Sanjay Patel	4c33d5213b	[SimplifyCFG] put the optional assumption cache pointer in the options struct; NFCI This is a follow-up to https://reviews.llvm.org/D38138. I fixed the capitalization of some functions because we're changing those lines anyway and that helped verify that we weren't accidentally dropping any options by using default param values. llvm-svn: 314930	2017-10-04 20:26:25 +00:00
Simon Pilgrim	9edbe110e8	[X86][AVX] Improve (i8 bitcast (v8i1 x)) handling for v8i64/v8f64 512-bit vector compare results. AVX1/AVX2 targets were missing a chance to use vmovmskps for v8f32/v8i32 results for bool vector bitcasts llvm-svn: 314921	2017-10-04 18:00:42 +00:00
Krzysztof Parzyszek	4697ddeea4	[Hexagon] Add a member Subtarget to HexagonInstrInfo, NFC llvm-svn: 314920	2017-10-04 18:00:15 +00:00
Hans Wennborg	2a6c9adb2f	Revert r314886 "[X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.)" It broke the Chromium / SQLite build; see PR34830. > Summary: > 1/ Operand folding during complex pattern matching for LEAs has been > extended, such that it promotes Scale to accommodate similar operand > appearing in the DAG. > e.g. > T1 = A + B > T2 = T1 + 10 > T3 = T2 + A > For above DAG rooted at T3, X86AddressMode will now look like > Base = B , Index = A , Scale = 2 , Disp = 10 > > 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs > so that if there is an opportunity then complex LEAs (having 3 operands) > could be factored out. > e.g. > leal 1(%rax,%rcx,1), %rdx > leal 1(%rax,%rcx,2), %rcx > will be factored as following > leal 1(%rax,%rcx,1), %rdx > leal (%rdx,%rcx) , %edx > > 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, > thus avoiding creation of any complex LEAs within a loop. > > Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy > > Reviewed By: lsaba > > Subscribers: jmolloy, spatel, igorb, llvm-commits > > Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 314919	2017-10-04 17:54:06 +00:00
Simon Pilgrim	b47b3f2564	[X86][SSE] Add support for lowering v8i16 binary shuffles to PACKSS/PACKUS Missed in D38472 llvm-svn: 314916	2017-10-04 17:31:28 +00:00
Craig Topper	6fb55716e9	[X86] Redefine MOVSS/MOVSD instructions to take VR128 regclass as input instead of FR32/FR64 This patch redefines the MOVSS/MOVSD instructions to take VR128 as its second input. This allows the MOVSS/SD->BLEND commute to work without requiring a COPY to be inserted. This should fix PR33079 Overall this looks to be an improvement in the generated code. I haven't checked the EXPENSIVE_CHECKS build but I'll do that and update with results. Differential Revision: https://reviews.llvm.org/D38449 llvm-svn: 314914	2017-10-04 17:20:12 +00:00
Yonghong Song	09b01b3555	bpf: fix an insn encoding issue for neg insn Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 314911	2017-10-04 16:11:52 +00:00
Simon Pilgrim	46a366ccb7	[X86][SSE] Early out from ComputeNumSignBitsForTargetNode. NFCI. Early out from vector shift by immediates that will exceed eltsize - don't bother making an unnecessary ComputeNumSignBits recursive call. llvm-svn: 314903	2017-10-04 13:41:26 +00:00
Simon Pilgrim	bd5d2f0284	[X86][SSE] Add support for lowering unary shuffles to PACKSS/PACKUS Extension to D38472 llvm-svn: 314901	2017-10-04 13:12:08 +00:00
Dylan McKay	8dd702c1cd	[AVR] Implement LPMWRdZ pseudo-instruction's expansion. FIXME: implementation is mostly copy-pasted from LDWRdPtr, so we should refactor a bit and unify the two Patch by Gergo Erdi. llvm-svn: 314898	2017-10-04 10:37:22 +00:00
Dylan McKay	3f71f1c91e	[AVR] Factor out mayLoad in tablegen patterns Patch by Gergo Erdi. llvm-svn: 314897	2017-10-04 10:36:07 +00:00
Dylan McKay	d00f9c1ef1	[AVR] Elaborate LDWRdPtr into `ld r, X++; ld r+1, X` Patch by Gergo Erdi. llvm-svn: 314896	2017-10-04 10:33:36 +00:00
Dylan McKay	39069208d5	[AVR] Insert JMP for long branches Previously, on long branches (relative jumps of >4 kB), an assertion failure was hit, as AVRInstrInfo::insertIndirectBranch was not implemented. Despite its name, it is called by the branch relaxator for all unconditional jumps. Patch by Thomas Backman. llvm-svn: 314891	2017-10-04 09:51:28 +00:00
Dylan McKay	c4b002bf5a	[AVR] Fix displacement overflow for LDDW/STDW In some cases, the code generator attempts to generate instructions such as: lddw r24, Y+63 which expands to: ldd r24, Y+63 ldd r25, Y+64 # Oops! This is actually ld r25, Y in the binary This commit limits the first offset to 62, and thus the second to 63. It also updates some asserts in AVRExpandPseudoInsts.cpp, including for INW and OUTW, which appear to be unused. Patch by Thomas Backman. llvm-svn: 314890	2017-10-04 09:51:21 +00:00
Oliver Stannard	878216dd05	[ARM] Add diag string for movw/movt immediates in assembly This adds diagnostics for invalid immediate operands to the MOVW and MOVT instructions (ARM and Thumb). Differential revision: https://reviews.llvm.org/D31879 llvm-svn: 314888	2017-10-04 09:24:54 +00:00
Oliver Stannard	5a7aae3a80	[ARM, Asm] Change grammar of immediate operand diagnostics Currently, our diagnostics for assembly operands are not consistent. Some start with (for example) "immediate operand must be ...", and some with "operand must be an immediate ...". I think the latter form is preferable for a few reasons: * It's unambiguous that it is referring to the expected type of operand, not the type the user provided. For example, the user could provide a register operand, and get a message talking about an operand as if it is already an immediate, just not in the accepted range. * It allows us to have a consistent style once we add diagnostics for operands that could take two forms, for example a label or pc-relative memory operand. Differential revision: https://reviews.llvm.org/D36689 llvm-svn: 314887	2017-10-04 09:18:07 +00:00
Jatin Bhateja	3c29bacd43	[X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.) Summary: 1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to accommodate similar operand appearing in the DAG. e.g. T1 = A + B T2 = T1 + 10 T3 = T2 + A For above DAG rooted at T3, X86AddressMode will now look like Base = B , Index = A , Scale = 2 , Disp = 10 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity then complex LEAs (having 3 operands) could be factored out. e.g. leal 1(%rax,%rcx,1), %rdx leal 1(%rax,%rcx,2), %rcx will be factored as following leal 1(%rax,%rcx,1), %rdx leal (%rdx,%rcx) , %edx 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop. Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy Reviewed By: lsaba Subscribers: jmolloy, spatel, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 314886	2017-10-04 09:02:10 +00:00
Martin Storsjo	e14145dcb0	[X86] Fix using the SJLJ jump table on x86_64 The previous version didn't work if the jump table base address didn't fit in 32 bit, since it was encoded as an immediate offset. And in case the jump table is encoded as 32 bit label differences, we need to load and add them to the table base first. This solves the first half of the issues mentioned in PR34720. Also fix some of the errors pointed out by -verify-machineinstrs, by using GR32_NOSPRegClass. Differential Revision: https://reviews.llvm.org/D38333 llvm-svn: 314876	2017-10-04 05:12:10 +00:00
Balaram Makam	e0c43152b5	[AArch64] Use LateSimplifyCFG after expanding atomic operations. Summary: After r308422 we defer optimizations that can destroy loop canonical forms to LateSimplifyCFG. Running LateSimplifyCFG after expanding atomic operations can exploit more control-flow opportunities. Reviewers: mcrosier, t.p.northover, efriedma Reviewed By: efriedma Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D38262 llvm-svn: 314857	2017-10-03 22:39:24 +00:00
Konstantin Zhuravlyov	22bc039c89	AMDGPU: Expand setcc for v2f32 and v4f32 llvm-svn: 314853	2017-10-03 21:45:01 +00:00
Konstantin Zhuravlyov	908fa90b51	AMDGPU: Expand setcc for v2i32 and v4i32 llvm-svn: 314852	2017-10-03 21:31:24 +00:00
Reid Kleckner	33cbbbc62f	[X86] Remove dead declaration convertArgMovsToPushes, NFC This was dead when it landed in r252578. We have this functionality, if not for stack probe calls, but for regular calls in X86CallFrameOptimization.cpp. llvm-svn: 314845	2017-10-03 21:12:18 +00:00
Stefan Pintilie	e1d7547237	[PowerPC] Revert P9 scheduling model to incomplete Partially revert a previous change from commit: https://llvm.org/svn/llvm-project/llvm/trunk@314026 The previous change caused regressions on Power 9. llvm-svn: 314835	2017-10-03 20:27:30 +00:00
Tim Renouf	72800f0436	[AMDGPU] implemented pal metadata Summary: For the amdpal OS type: We write an AMDGPU_PAL_METADATA record in the .note section in the ELF (or as an assembler directive). It contains key=value pairs of 32 bit ints. It is a merge of metadata from codegen of the shaders, and metadata provided by the frontend as _amdgpu_pal_metadata IR metadata. Where both sources have a key=value with the same key, the two values are ORed together. This .note record is part of the amdpal ABI and will be documented in docs/AMDGPUUsage.rst in a future commit. Eventually the amdpal OS type will stop generating the .AMDGPU.config section once the frontend has safely moved over to using the .note records above instead of .AMDGPU.config. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37753 llvm-svn: 314829	2017-10-03 19:03:52 +00:00
Alexander Timofeev	4651396584	[AMDGPU] Avoid predicated execution of the basic blocks containing scalar instructions. Differential revision: https://reviews.llvm.org/D38293 llvm-svn: 314828	2017-10-03 18:55:36 +00:00
Hans Wennborg	660531085a	CodeView: Provide a .def file with the register ids The list of register ids was previously written out in a couple of different places. This puts it in a .def file and also adds a few more registers (e.g. the x87 regs) which should lead to more readable dumps, but I didn't include the whole list since that seems unnecessary. X86_MC::initLLVMToSEHAndCVRegMapping is pretty ugly, but at least it's not relying on magic constants anymore. The TODO of using tablegen still stands. Differential revision: https://reviews.llvm.org/D38480 llvm-svn: 314821	2017-10-03 18:27:22 +00:00
Oliver Stannard	0d5c792223	[ARM] Use table-gen'd assembly operand diags in ARM asm parser This switches the ARM AsmParser to use assembly operand diagnostics from tablegen, rather than a switch statement on the ARMMatchResultTy. It moves the existing diagnostic strings to tablegen, but adds no new ones, so this is NFC except for one diagnostic string that had an off-by-1 error in the hand-written switch statement. Differential revision: https://reviews.llvm.org/D31607 llvm-svn: 314804	2017-10-03 14:38:52 +00:00
Oliver Stannard	55114fd9f0	[ARM, Asm] Use correct source location for register tokens tryParseRegister advances the lexer, so we need to take copies of the start and end locations of the register operand before calling it. Previously, the caret in the diagnostic pointed to the comma after the r0 operand in the test, rather than the start of the operand. Differential revision: https://reviews.llvm.org/D31537 llvm-svn: 314799	2017-10-03 14:30:58 +00:00
Simon Dardis	055192ccd3	[mips] Enable spilling and reloading of the dsp register set. The dsp register class is an alias of the gpr register class, so we have to define instructions for spilling and reloading. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D38038 llvm-svn: 314798	2017-10-03 13:45:49 +00:00
Oliver Stannard	68aa7de517	[ARM, Asm] Fix ubsan failure caused by out-of-range enum value In this code, we use ~0U as a sentinel value for any operand class that doesn't have a user-friendly error message, but this value isn't in range of the MatchClassKind enum, so we need to ensure it does not get passed to isSubclass. llvm-svn: 314793	2017-10-03 12:45:18 +00:00
Simon Pilgrim	cf99d069c3	[X86][SSE] Add support for decoding PACKSS/PACKUS shuffles masks with UNDEF llvm-svn: 314792	2017-10-03 12:41:39 +00:00
Oliver Stannard	5daee987fd	[ARM, Asm] Remove dead code causing MSan failure. r314779 caused ErrorInfo to be read uninitialised, but also made this code dead, so it can just be removed. llvm-svn: 314791	2017-10-03 12:28:28 +00:00
Simon Pilgrim	f5f291d129	[X86][SSE] Add support for lowering shuffles to PACKSS/PACKUS If the upper bits of a truncation shuffle patterns have at least the minimum number of sign/zero bits on their inputs then we can safely use PACKSS/PACKUS as shuffles. Partial fix for https://bugs.llvm.org/show_bug.cgi?id=34773 Differential Revision: https://reviews.llvm.org/D38472 llvm-svn: 314788	2017-10-03 12:01:31 +00:00
Oliver Stannard	e093bad472	[ARM] Use new assembler diags for ARM This converts the ARM AsmParser to use the new assembly matcher error reporting mechanism, which allows errors to be reported for multiple instruction encodings when it is ambiguous which one the user intended to use. By itself this doesn't improve many error messages, because we don't have diagnostic text for most operand types, but as we add that then this will allow more of those diagnostic strings to be used when they are relevant. Differential revision: https://reviews.llvm.org/D31530 llvm-svn: 314779	2017-10-03 10:26:11 +00:00
Simon Pilgrim	d87af9a1c0	Remove unused variable. NFCI. llvm-svn: 314778	2017-10-03 10:01:02 +00:00
Simon Pilgrim	640fbf5132	[X86][SSE] Add support for shuffle combining from PACKSS/PACKUS Mentioned in D38472 llvm-svn: 314777	2017-10-03 09:54:03 +00:00
Simon Pilgrim	19d535e75b	[X86][SSE] Add support for PACKSS/PACKUS constant folding Pulled out of D38472 llvm-svn: 314776	2017-10-03 09:41:00 +00:00
Sjoerd Meijer	7a22a4948f	ISel type legalization: add debug messages. NFCI. This adds some more debug messages to the type legalizer and functions like PromoteNode, ExpandNode, ExpandLibCall in an attempt to make the debug messages a little bit more informative and useful. Differential Revision: https://reviews.llvm.org/D38450 llvm-svn: 314773	2017-10-03 08:54:15 +00:00
Hiroshi Inoue	224661d94b	[trivial] fix format, NFC llvm-svn: 314769	2017-10-03 07:28:58 +00:00
Martin Storsjo	1e54738676	[X86] Provide the LSDA pointer with RIP relative addressing if necessary This makes sure the LSDA pointer isn't truncated to 32 bit. Make LowerINTRINSIC_WO_CHAIN a member function instead of a static function, so that it can use the getGlobalWrapperKind method. This solves the second half of the issues mentioned in PR34720. Differential Revision: https://reviews.llvm.org/D38343 llvm-svn: 314767	2017-10-03 06:29:58 +00:00
Matt Arsenault	90c7593a75	AMDGPU: Remove global isGCN predicates These are problematic because they apply to everything, and can easily clobber whatever more specific predicate you are trying to add to a function. Currently instructions use SubtargetPredicate/PredicateControl to apply this to patterns applied to an instruction definition, but not to free standing Pats. Add a wrapper around Pat so the special PredicateControls requirements can be appended to the final predicate list like how Mips does it. llvm-svn: 314742	2017-10-03 00:06:41 +00:00
Amjad Aboud	8ef85a088e	[X86][NFC] Add X86CmovConverterPass to the pass registry. Differential Revision: https://reviews.llvm.org/D38355 llvm-svn: 314726	2017-10-02 21:46:37 +00:00
Michael Liao	c6004d0371	Remove dead file. llvm-svn: 314720	2017-10-02 21:00:52 +00:00
Matt Arsenault	c6baa85fc6	AMDGPU: Fix typos llvm-svn: 314715	2017-10-02 20:31:18 +00:00
Walter Lee	35b09cbd42	Add support for Myriad ma2x8x series of CPUs Summary: Also add support for some older Myriad CPUs that were missing. Reviewers: jyknight Subscribers: fedor.sergeev Differential Revision: https://reviews.llvm.org/D37552 llvm-svn: 314705	2017-10-02 18:50:48 +00:00
Bjorn Pettersson	8e978c0151	[X86][SSE] Fix -Wsign-compare problems introduced in r314658 The refactoring in "[X86][SSE] Add createPackShuffleMask helper function. NFCI." resulted in warning when compiling the code (seen in build bots). This patch restores some types from int to unsigned to avoid those warnings. llvm-svn: 314667	2017-10-02 12:46:38 +00:00
Simon Pilgrim	e2e27aff9b	[X86][SSE] Add createPackShuffleMask helper function. NFCI. llvm-svn: 314658	2017-10-02 10:12:51 +00:00
Simon Pilgrim	c04c7443ea	[X86][SSE] matchBinaryVectorShuffle - add support for different src/dst value shuffle types Preparation for support for combining to PACKSS/PACKUS llvm-svn: 314656	2017-10-02 09:45:08 +00:00
Hiroshi Inoue	dcedd66b00	[PowerPC] support ZERO_EXTEND in tryBitPermutation This patch adds support for ISD::ZERO_EXTEND in PPCDAGToDAGISel::tryBitPermutation to increase the opportunity to use rotate-and-mask by reordering ZEXT and ANDI. Since tryBitPermutation stops analyzing nodes if it hits a ZEXT node while traversing SDNodes, we want to avoid ZEXT between two nodes that can be folded into a rotate-and-mask instruction. For example, we allow these nodes t9: i32 = add t7, Constant:i32<1> t11: i32 = and t9, Constant:i32<255> t12: i64 = zero_extend t11 t14: i64 = shl t12, Constant:i64<2> to be folded into a rotate-and-mask instruction. Such cases often happen in array accesses with a logical AND operation in the index, e.g. array[i & 0xFF]; Differential Revision: https://reviews.llvm.org/D37514 llvm-svn: 314655	2017-10-02 09:24:00 +00:00
Simon Pilgrim	3bbbf31590	Fix typo in comment. NFCI. llvm-svn: 314653	2017-10-02 09:10:50 +00:00
Simon Pilgrim	e575651370	[X86] Cleanup uses of computeKnownBits by using MaskedValueIsZero helper instead. NFCI. llvm-svn: 314652	2017-10-02 09:08:45 +00:00
Michael Zuckerman	e4084f6bdb	[X86][LLVM]Expanding Supports lowerInterleaved{store\|load}() in X86InterleavedAccess (VF64 stride 3-4) I continue to support different VF interleaved and in this pass for this patch, I added the vf64 stride3 support for both load and store. I also added support for the stride4 store. Reviewers: 1. zvi 2. dorit 3. igorb 4. guyblank Differential Revision: https://reviews.llvm.org/D37687 Change-Id: I3d238efedf217d1768b348d710de1efa2f19d27b llvm-svn: 314651	2017-10-02 07:35:25 +00:00
Craig Topper	d37625859a	[X86] Fix copy pasto in X86FastISel::fastEmitInst_rrrr. The 4th operand was not being constrained and the third operand was being constrained twice. llvm-svn: 314648	2017-10-02 05:46:53 +00:00
Craig Topper	bb7866162c	[X86] Use a bool flag instead of assigning an unsigned to two different values that we only use in an equality comparison. llvm-svn: 314647	2017-10-02 05:46:52 +00:00
Craig Topper	c05c390a7c	[X86] Use _NOREX MOVZX instructions for some patterns even in 32-bit mode. This unifies the patterns between both modes. This should be effectively NFC since all the available registers in 32-bit mode satisfy this constraint. llvm-svn: 314643	2017-10-02 00:44:50 +00:00
Ron Lieberman	9bcdd80b66	[Hexagon] Check vector elements for equivalence in the HexagonVectorLoopCarriedReuse pass If the two instructions being compared for equivalence have corresponding operands that are integer constants, then check their values to determine equivalence. Patch by Suyog Sarda! llvm-svn: 314642	2017-10-02 00:34:07 +00:00
Ron Lieberman	f90493d220	[Hexagon] Patch to Extract i1 element from vector of i1 This patch extracts 1 element from vector consisting of elements of size 1 bit at given index. llvm-svn: 314641	2017-10-02 00:16:15 +00:00
Craig Topper	c20b46da2f	[X86] Change register&memory TEST instructions from MRMSrcMem to MRMDstMem Summary: Intel documentation shows the memory operand as the first operand. But we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order and the isel matching is commutable. For the register&register form order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register&register form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem. I believe this supersedes D38025 which was trying to switch the register&register form back to pre-PR22995. Reviewers: aymanmus, RKSimon, zvi Reviewed By: aymanmus Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38120 llvm-svn: 314639	2017-10-01 23:53:53 +00:00
Craig Topper	00230604d3	[X86] Remove a couple unnecessary COPY_TO_REGCLASS from some output patterns where the instruction already produces the correct register class. llvm-svn: 314638	2017-10-01 23:53:50 +00:00
Simon Pilgrim	df23a2700d	[X86][SSE] Add faux shuffle combining support for PACKUS llvm-svn: 314631	2017-10-01 18:43:48 +00:00
Simon Pilgrim	836fa6dcfd	[X86][SSE] Improve shuffle combining of PACKSS instructions. Support unary packing and fix the faux shuffle mask for vectors larger than 128 bits. llvm-svn: 314629	2017-10-01 17:54:55 +00:00
Sanjay Patel	c7076a3ba9	[x86] formatting; NFC llvm-svn: 314627	2017-10-01 14:39:10 +00:00
Simon Pilgrim	a8dd6f4f30	[X86][SSE] Fold (VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1 Remove sign extend in register style pattern if the sign is already extended enough llvm-svn: 314599	2017-09-30 17:57:34 +00:00
Craig Topper	619569841a	[AVX-512] Add patterns to make fp compare instructions commutable during isel. llvm-svn: 314598	2017-09-30 17:02:39 +00:00
Michael Zuckerman	b92b6d424f	Code refactoring for the interleaved code <NFC> Change-Id: I7831c9febad8e14278a5bc87584a0053dc837be1 llvm-svn: 314596	2017-09-30 14:55:03 +00:00
Craig Topper	d92ade96f4	[X86] Support v64i8 mulhu/mulhs Implemented by splitting into two v32i8 mulhu/mulhs and concatenating the results. Differential Revision: https://reviews.llvm.org/D38307 llvm-svn: 314584	2017-09-30 04:21:46 +00:00
Stanislav Mekhanoshin	1d8cf2be89	[AMDGPU] Set fast-math flags on functions given the options We have a single library build without relaxation options. When inlined library functions remove fast math attributes from the functions they are integrated into. This patch sets relaxation attributes on the functions after linking provided corresponding relaxation options are given. Math instructions inside the inlined functions remain to have no fast flags, but inlining does not prevent fast math transformations of a surrounding caller code anymore. Differential Revision: https://reviews.llvm.org/D38325 llvm-svn: 314568	2017-09-29 23:40:19 +00:00
Nicolai Haehnle	ce4ddd06da	AMDGPU: VALU carry-in and v_cndmask condition cannot be EXEC The hardware will only forward EXEC_LO; the high 32 bits will be zero. Additionally, inline constants do not work. At least, v_addc_u32_e64 v0, vcc, v0, v1, -1 which could conceivably be used to combine (v0 + v1 + 1) into a single instruction, acts as if all carry-in bits are zero. The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine s_mov_b64 s[0:1], exec v_cndmask_b32_e64 v0, v1, v2, s[0:1] into v_mov_b32 v0, v3 but it's not particularly high priority. Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.* llvm-svn: 314522	2017-09-29 15:37:31 +00:00
Jonas Paulsson	c9e363ac69	[SystemZ] implement shouldCoalesce() Implement shouldCoalesce() to help regalloc avoid running out of GR128 registers. If a COPY involving a subreg of a GR128 is coalesced, the live range of the GR128 virtual register will be extended. If this happens where there are enough phys-reg clobbers present, regalloc will run out of registers (if there is not a single GR128 allocatable register available). This patch tries to allow coalescing only when it can prove that this will be safe by checking the (local) interval in question. Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D37899 https://bugs.llvm.org/show_bug.cgi?id=34610 llvm-svn: 314516	2017-09-29 14:31:39 +00:00
Amara Emerson	7d6c55f8aa	[X86] Improve codegen for inverted overflow checking intrinsics. Adds a new combine for: xor(setcc cc, val), 1 --> setcc (invert(cc), val) Differential Revision: https://reviews.llvm.org/D38161 llvm-svn: 314514	2017-09-29 13:53:44 +00:00
Sam Parker	963da5b119	[ARM] v8.3-a complex number support New instructions are added to AArch32 and AArch64 to aid floating-point multiplication and addition of complex numbers, where the complex numbers are packed in a vector register as a pair of elements. The Imaginary part of the number is placed in the more significant element, and the Real part of the number is placed in the less significant element. This patch adds assembler for the ARM target. Differential Revision: https://reviews.llvm.org/D36789 llvm-svn: 314511	2017-09-29 13:11:33 +00:00
Michael Zuckerman	0b5db55b96	Small modification <NFC> Change-Id: I360abccee12cae29bd2ac4f8399c9ecc92eb7f13 llvm-svn: 314510	2017-09-29 12:45:54 +00:00
Aleksandar Beserminji	29341b88ac	[mips] Reordering callseq* nodes to be linear Fix nested callseq* nodes by moving callseq_start after the arguments calculation to temporary registers, so that callseq* nodes in resulting DAG are linear. Recommitting r314497. This version does not contain the test which fails when the compiler is not built in debug mode. Differential Revision: https://reviews.llvm.org/D37328 llvm-svn: 314507	2017-09-29 11:05:02 +00:00
Aleksandar Beserminji	a0a01e7172	Revert "[mips] Reordering callseq* nodes to be linear" Added test relies on the compiler being built in debug mode, which may not be the case. This reverts commit r314497. llvm-svn: 314506	2017-09-29 10:52:03 +00:00
Simon Dardis	f21d8d6ad5	[mips] Add missing license info, formatting changes. NFCI Add missing license information to MicroMipsInstrFPU.td and fix most of the formatting errors present. Others will be addressed in follow-up commits. llvm-svn: 314505	2017-09-29 10:08:06 +00:00
Tim Renouf	ef1ae8ffac	[AMDGPU] calling conventions for AMDPAL OS type Summary: This commit adds comments on how the AMDPAL OS type overloads the existing AMDGPU_ calling conventions used by Mesa, and adds a couple of new ones. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37752 llvm-svn: 314502	2017-09-29 09:51:22 +00:00
Tim Renouf	132291589f	[AMDGPU] AMDPAL scratch buffer support Summary: Added support for scratch (including spilling) for OS type amdpal: generates code to set up the scratch descriptor if it is needed. With amdpal, the scratch resource descriptor is loaded from offset 0 of the global information table. The low 32 bits of the address of the global information table is passed in s0. Added amdgpu-git-ptr-high function attribute to hard-wire the high 32 bits of the address of the global information table. If the function attribute is not specified, or is 0xffffffff, then the backend generates code to use the high 32 bits of pc. The documentation for the AMDPAL ABI will be added in a later commit. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye Differential Revision: https://reviews.llvm.org/D37483 llvm-svn: 314501	2017-09-29 09:49:35 +00:00
Tim Renouf	9f7ead3334	[Triple] Add AMDPAL operating system type Summary: This operating system type represents the AMDGPU PAL runtime, and will be required by the AMDGPU backend in order to generate correct code for this runtime. Currently it generates the same code as not specifying an OS at all. That will change in future commits. Patch from Tim Corringham. Subscribers: arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D37380 llvm-svn: 314500	2017-09-29 09:48:12 +00:00
Aleksandar Beserminji	502dcb035a	[mips] Reordering callseq* nodes to be linear Fix nested callseq* nodes by moving callseq_start after the arguments calculation to temporary registers, so that callseq* nodes in resulting DAG are linear. Differential Revision: https://reviews.llvm.org/D37328 llvm-svn: 314497	2017-09-29 09:32:14 +00:00
Coby Tayree	c3d24118e8	[X86][MS-InlineAsm] Extended support for variables / identifiers on memory / immediate expressions Allow the proper recognition of Enum values and global variables inside ms inline-asm memory / immediate expressions, as they require some additional overhead and are treated incorrectly if not recognized early. Supersedes D33278, D35774 Differential Revision: https://reviews.llvm.org/D37412 llvm-svn: 314493	2017-09-29 07:02:46 +00:00
Craig Topper	6255c7b675	[X86] Don't select (cmp (and, imm), 0) to testw Summary: X86ISelDAGToDAG tries to analyze ANDs compared with 0 to optimize to narrower immediates using subregisters. I don't think we should be optimizing to 16-bit test instructions. It goes against our normal behavior of promoting i16 operations to i32. It only saves one byte due to the need to add a 0x66 prefix. I think it would also be subject to a length changing prefix penalty in the decoders on Intel CPUs. Reviewers: RKSimon, zvi, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38273 llvm-svn: 314474	2017-09-28 23:35:36 +00:00
Matthias Braun	51687912a4	ARM: Fix cases where CSI Restored bit is not cleared LR is an untypical callee saved register in that it is restored into a different register (PC) and thus does not live-out of the return block. This case requires the `Restored` flag in CalleeSavedInfo to be cleared. This fixes a number of cases where this wasn't handled correctly yet. llvm-svn: 314471	2017-09-28 23:12:06 +00:00
Yonghong Song	ef29a84d48	bpf: fix a bug for disassembling ld_pseudo inst Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 314469	2017-09-28 22:47:34 +00:00
Eugene Zelenko	3b87336a0c	[Hexagon] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 314467	2017-09-28 22:27:31 +00:00
Ulrich Weigand	df86855f61	[SystemZ] Fix fall-out from r314428 The expensive-checks build bot found a problem with the r314428 commit: if CC is live after an ATOMIC_CMP_SWAPW instruction, it needs to be marked as live-in to the block after the loop the pseudo gets expanded to. This actually fixes a code-gen bug as well, since if the CC isn't live, the CR and JLH are merged to a CRJLH which doesn't actually set the condition code any more. llvm-svn: 314465	2017-09-28 22:08:25 +00:00
Craig Topper	ed19350293	[X86] Make use of vpmovwb when possible in LowerMULH If we have BWI, we can truncate in a much simpler way by using vpmovwb. This even works without VLX by using the wider zmm->ymm truncate with a subvector extract. Differential Revision: https://reviews.llvm.org/D38375 llvm-svn: 314457	2017-09-28 20:10:34 +00:00
Martin Storsjo	d6218cc385	[ARM] Restore the right frame pointer register in Int_eh_sjlj_longjmp In setupEntryBlockAndCallSites in CodeGen/SjLjEHPrepare.cpp, we fetch and store the actual frame pointer, but on return via the longjmp intrinsic, it always was restored into the r7 variable. On windows, the frame pointer should be restored into r11 instead of r7. On Darwin (where sjlj exception handling is used by default), the frame pointer is always r7, both in arm and thumb mode, and likewise, on windows, the frame pointer always is r11. On linux however, if sjlj exception handling is enabled (which it isn't by default), libcxxabi and the user code can be built in differing modes using different registers as frame pointer. Therefore, when restoring registers on a platform where we don't always use the same register depending on code mode, restore both r7 and r11. Differential Revision: https://reviews.llvm.org/D38253 llvm-svn: 314451	2017-09-28 19:04:30 +00:00
Martin Storsjo	adceba59a2	[ARM] Fix SJLJ exception handling when manually chosen on a platform where it isn't default Differential Revision: https://reviews.llvm.org/D38252 llvm-svn: 314450	2017-09-28 19:04:14 +00:00
Craig Topper	3819be6cf6	[X86] Use target independent ZERO_EXTEND/SIGN_EXTEND nodes where possible in LowerMULH We aren't doing any in-register extends here, so we should be able to just use the target independent nodes directly and allow them to be lowered as necessary. llvm-svn: 314447	2017-09-28 18:45:28 +00:00
Craig Topper	fc104bfbc0	[X86] Move a setOperation action for ISD::TRUNCATE near another one in the same if. Remove one that is redundant with another subtarget features. llvm-svn: 314446	2017-09-28 18:45:27 +00:00
Craig Topper	ceff6da6e9	[X86] Use BWI instructions to improve lowering of v32i8 MULHU/S Summary: If we have BWI instructions we can widen to v32i16 to do the multiply instead of splitting. Reviewers: RKSimon, spatel, zvi Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38305 llvm-svn: 314432	2017-09-28 17:00:21 +00:00
Craig Topper	fd6b8a67fb	[X86] Remove dead code from X86ISelDAGToDAG.cpp multiply handling Summary: Lowering never creates X86ISD::UMUL for 8-bit types. X86ISD::UMUL8 is used instead. If X86ISD::UMUL 8-bit were ever used it would crash. DAGCombiner replaces UMUL_LOHI/SMUL_LOHI with a wider MUL and a shift if the type twice as wide is legal. So we should never see i8 UMUL_LOHI/SMUL_LOHI. In fact I think there was a bug in part of the i8 code. Similar is true for i16 though without the bug. Reviewers: RKSimon, spatel, zvi Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38276 llvm-svn: 314430	2017-09-28 16:56:36 +00:00
Craig Topper	71a8cf9f99	[X86] Use correct subvector index when combining two insert subvectors featuring zero vectors. Previously we were using one of the subvector indices twice. The included test case causes an assert without this change. Thanks to Simon Pilgrim for catching this. llvm-svn: 314429	2017-09-28 16:53:16 +00:00
Ulrich Weigand	0f1de04979	[SystemZ] Custom-expand ATOMIC_CMP_AND_SWAP_WITH_SUCCESS The SystemZ compare-and-swap instructions already provide the "success" indication via a condition-code value, so the default expansion of those operations generates an unnecessary extra comparison. llvm-svn: 314428	2017-09-28 16:22:54 +00:00
Simon Pilgrim	2ff339303e	Use SDValue::getConstantOperandVal helper. NFCI. llvm-svn: 314425	2017-09-28 15:53:27 +00:00
Simon Dardis	c8e33c5ca1	[mips] Remove codegen support for branch likely instructions. This patch disables codegen support for branch likely instructions to address a potential bug. These branches were unselectable as they had the same patterns as the normal branches but came after them when ISel was concerned. The branch likely instructions were marked as having no delay slots when they have annulling delay slots. The delay slot filler does not currently handle annulling delay slot branches, so this would lead to wrong codegen if these branches were generated. Reviewers: atanasyan, nitesh.jain Differential Revision: https://reviews.llvm.org/D38169 llvm-svn: 314421	2017-09-28 15:24:07 +00:00
Coby Tayree	566348f2a0	[x86][AsmParser] Allow some more MS size directives MS allows the following size directives: float/double and long as synonymous to dword/qword and dword, respectively. Differential Revision: https://reviews.llvm.org/D37190 llvm-svn: 314410	2017-09-28 11:04:08 +00:00
Alex Bradbury	5518cbfc41	Teach TargetInstrInfo::getInlineAsmLength to parse .space directives with integer arguments It's currently quite difficult to test passes like branch relaxation, which requires branches with large displacement to be generated. The .space assembler directive makes it easy to create arbitrarily large basic blocks, but getInlineAsmLength is not able to parse it and so the size of the block is not correctly estimated. Other backends (AArch64, AMDGPU) introduce options just for testing that artificially restrict the ranges of branch instructions (e.g. aarch64-tbz-offset-bits). Although parsing a single form of the .space directive feels inelegant, it does allow a more direct testing approach. This patch adapts the .space parsing code from Mips16InstrInfo::getInlineAsmLength and removes it now the extra functionality is provided by the base implementation. I want to move this functionality to the generic getInlineAsmLength as 1) I need the same for RISC-V, and 2) I feel other backends will benefit from more direct testing of large branch displacements. Differential Revision: https://reviews.llvm.org/D37798 llvm-svn: 314393	2017-09-28 09:31:46 +00:00
Hiroshi Inoue	79c0bec06e	[PowerPC] eliminate partially redundant compare instruction This is a follow-on of D37211. D37211 eliminates a compare instruction if two conditional branches can be made based on the one compare instruction, e.g. if (a == 0) { ... } else if (a < 0) { ... } This patch extends this optimization to support partially redundant cases, which often happen in while loops. For example, one compare instruction is moved from the loop body into the preheader by this optimization in the following example. do { if (a == 0) dummy1(); a = func(a); } while (a > 0); Differential Revision: https://reviews.llvm.org/D38236 llvm-svn: 314390	2017-09-28 08:38:19 +00:00
Alex Bradbury	9d3f12501a	[RISCV] Add common fixups and relocations %lo(), %hi(), and %pcrel_hi() are supported and test cases have been added to ensure the appropriate fixups and relocations are generated. I've added an instruction format field which is used in RISCVMCCodeEmitter to, for instance, tell whether it should emit a lo12_i fixup or a lo12_s fixup (RISC-V has two 12-bit immediate encodings depending on the instruction type). Differential Revision: https://reviews.llvm.org/D23568 llvm-svn: 314389	2017-09-28 08:26:24 +00:00
Yonghong Song	e9165f8720	bpf: add new insns for bswap_to_le and negation This patch adds new insn, "reg = be16/be32/be64 reg", for bswap to little endian for big-endian target (bpfeb). It also adds new insn for negation "reg = -reg". Currently, for source code, e.g., b = -a LLVM still prefers to generate: b = 0 - a But "reg = -reg" format can be used in assembly code. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 314376	2017-09-28 02:46:11 +00:00
Galina Kistanova	1c6f0bb63e	Reverted r313993. This patch produces a crash and hexagon_vector_loop_carried_reuse_constant.ll test fails on Windows (llvm-clang-x86_64-expensive-checks-win build bot). llvm-svn: 314361	2017-09-27 23:09:14 +00:00
Jessica Paquette	4cf187b5b4	[MachineOutliner] AArch64: Avoid saving + restoring LR if possible This commit allows the outliner to avoid saving and restoring the link register on AArch64 when it is dead within an entire class of candidates. This introduces changes to the way the outliner interfaces with the target. For example, the target now interfaces with the outliner using a MachineOutlinerInfo struct rather than by using getOutliningCallOverhead and getOutliningFrameOverhead. This also improves several comments on the outliner's cost model. https://reviews.llvm.org/D36721 llvm-svn: 314341	2017-09-27 20:47:39 +00:00
Craig Topper	c16a472966	Revert r314249 "Recommit r314151 "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST.""" This caused PR34751 llvm-svn: 314339	2017-09-27 20:34:17 +00:00
Craig Topper	e0d8290094	Revert r314248 "[X86] Don't emit X86::MOV8rr_NOREX from X86InstrInfo::copyPhysReg." This contributed to PR34751 llvm-svn: 314338	2017-09-27 20:34:13 +00:00
Simon Pilgrim	870007b4f8	[X86][SSE] Pull out variable shuffle mask combine logic. NFCI. Hopefully this will make it easier to vary the combine depth threshold per-target. llvm-svn: 314337	2017-09-27 20:19:53 +00:00
Craig Topper	7b1d503d7f	[X86] Rewrite the zero vector checks in lowerV2X128VectorShuffle to use the Zeroable APInt We already have zeroable bits in an APInt. We might as well use that instead of checking for an all zero BUILD_VECTOR. Differential Revision: https://reviews.llvm.org/D37950 llvm-svn: 314332	2017-09-27 18:56:20 +00:00
Craig Topper	05f71dd036	[X86] In combineLoopSADPattern, pad result with zeros and use full size add instead of using a smaller add and inserting. In some cases the result psadbw is smaller than the type of the add that started the match. Currently in these cases we are using a smaller add and inserting the result. If we instead combine the psadbw with zeros and use the full size add we can take advantage of implicit zeroing we get if we emit a narrower move before the add. In a future patch, I want to make isel aware that the psadbw itself already zeroed the upper bits and remove the move entirely. Differential Revision: https://reviews.llvm.org/D37453 llvm-svn: 314331	2017-09-27 18:36:45 +00:00
Geoff Berry	c032b2beb0	[AArch64][Falkor] Ignore SP based loads in HW prefetch fixups. Reviewers: mcrosier Subscribers: aemerson, rengolin, javed.absar, kristof.beyls Differential Revision: https://reviews.llvm.org/D38301 llvm-svn: 314319	2017-09-27 17:14:10 +00:00
Sanjay Patel	0f9b4773c1	[SimplifyCFG] add a struct to house optional folds (PR34603) This was intended to be no-functional-change, but it's not - there's a test diff. So I thought I should stop here and post it as-is to see if this looks like what was expected based on the discussion in PR34603: https://bugs.llvm.org/show_bug.cgi?id=34603 Notes: 1. The test improvement occurs because the existing 'LateSimplifyCFG' marker is not carried through the recursive calls to 'SimplifyCFG()->SimplifyCFGOpt().run()->SimplifyCFG()'. The parameter isn't passed down, so we pick up the default value from the function signature after the first level. I assumed that was a bug, so I've passed 'Options' down in all of the 'SimplifyCFG' calls. 2. I split 'LateSimplifyCFG' into 2 bits: ConvertSwitchToLookupTable and KeepCanonicalLoops. This would theoretically allow us to differentiate the transforms controlled by those params independently. 3. We could stash the optional AssumptionCache pointer and 'LoopHeaders' pointer in the struct too. I just stopped here to minimize the diffs. 4. Similarly, I stopped short of messing with the pass manager layer. I have another question that could wait for the follow-up: why is the new pass manager creating the pass with LateSimplifyCFG set to true no matter where in the pipeline it's creating SimplifyCFG passes? // Create an early function pass manager to cleanup the output of the // frontend. EarlyFPM.addPass(SimplifyCFGPass()); --> /// \brief Construct a pass with the default thresholds /// and switch optimizations. SimplifyCFGPass::SimplifyCFGPass() : BonusInstThreshold(UserBonusInstThreshold), LateSimplifyCFG(true) {} <-- switches get converted to lookup tables and loops may not be in canonical form If this is unintended, then it's possible that the current behavior of dropping the 'LateSimplifyCFG' setting via recursion was masking this bug. Differential Revision: https://reviews.llvm.org/D38138 llvm-svn: 314308	2017-09-27 14:54:16 +00:00
Hiroshi Inoue	ed1ffa49a4	[PowerPC] eliminate unconditional branch to the next instruction This patch makes analyzeBranch eliminate unconditional branch to the next instruction. After basic blocks are re-organized by optimizers, such as machine block placement, a BB may end with an unconditional branch to the next (fallthrough) BB. This patch removes such redundant branch instruction. Differential Revision: https://reviews.llvm.org/D37730 llvm-svn: 314297	2017-09-27 10:33:02 +00:00
Coby Tayree	836c50cc2f	[X86][AsmParser] fix PR32035 Differential Revision: https://reviews.llvm.org/D37473 llvm-svn: 314295	2017-09-27 10:29:29 +00:00
Simon Pilgrim	3b0d9e789e	[X86][AVX] Improve (i4 bitcast (v4i1 x)) handling for 256-bit vector compare results. As commented on D37849 and rL313547, AVX1 targets were missing a chance to use vmovmskpd for v4f64/v4i64 results for bool vector bitcasts llvm-svn: 314293	2017-09-27 10:10:17 +00:00
Sam Parker	211f47aa37	[ARM] isTruncateFree fix I implemented isTruncateFree in rL313533, this patch fixes the logic to match my comment, as the previous logic was too general. Now the only truncates that are free are i64 -> i32. Differential Revision: https://reviews.llvm.org/D38234 llvm-svn: 314280	2017-09-27 08:30:45 +00:00
Martin Storsjo	aa1533bf9b	[X86] Fix SJLJ struct offsets for x86_64 This is necessary, but not sufficient, for having working SJLJ exception handling on x86_64. Differential Revision: https://reviews.llvm.org/D38254 llvm-svn: 314277	2017-09-27 06:08:23 +00:00
Martin Storsjo	eccaf04e40	[X86] Remove erroneous callsite offsetting in SJLJ landing pads The callsite value is already stored indexed from 0 in the _Unwind_Context struct. When accessed via the functions _Unwind_GetIP and _Unwind_SetIP, the value is indexed from 1, but those functions handle the offsetting. When reading directly from the struct here, we shouldn't subtract 1. This matches the code generated by the ARM target, where SJLJ exception handling is used by default on iOS. This makes clang-built object files for 32 bit x86 mingw work when linked with libgcc/libstdc++. Differential Revision: https://reviews.llvm.org/D38251 llvm-svn: 314276	2017-09-27 06:08:16 +00:00
Craig Topper	177a3923ce	[X86] Use extract128BitVector in LowerMULH so we can extract from constant build vectors. llvm-svn: 314274	2017-09-27 06:04:55 +00:00
Geoff Berry	bbfa246ad3	[AArch64][Falkor] Fix bug in falkor prefetcher fix pass. Summary: In rare cases, loads that don't get prefetched that were marked as strided loads could cause a crash if they occurred in a loop with other colliding loads. Reviewers: mcrosier Subscribers: aemerson, rengolin, javed.absar, kristof.beyls Differential Revision: https://reviews.llvm.org/D38261 llvm-svn: 314252	2017-09-26 21:40:46 +00:00
Geoff Berry	a4b2f5df5e	[AArch64][Falkor] Fix correctness bug in falkor prefetcher fix pass and correct some opcode tag computations. Summary: This addresses a correctness bug for LD[1234]*_POST opcodes that have the prefetcher fix applied to them: the base register was not being written back from the temp after being incremented, so it would appear to never be incremented. Also, fix some opcode tag computations based on some updated HW details to get better tag avoidance and thus better prefetcher performance. Reviewers: mcrosier Subscribers: aemerson, rengolin, javed.absar, kristof.beyls Differential Revision: https://reviews.llvm.org/D38256 llvm-svn: 314251	2017-09-26 21:40:41 +00:00
Craig Topper	b7e4c94c6c	[X86] Fix register class name in a comment. NFC llvm-svn: 314250	2017-09-26 21:35:11 +00:00
Craig Topper	7f0eeb428b	Recommit r314151 "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST."" The late MOV8rr_NOREX that caused the crash has been removed. llvm-svn: 314249	2017-09-26 21:35:09 +00:00
Craig Topper	ab3c0075b8	[X86] Don't emit X86::MOV8rr_NOREX from X86InstrInfo::copyPhysReg. This hook is called after register allocation with two physical registers. We don't need a separate instruction at that time to force register class constraints. I left in the assert though. We also have a fatal error in X86MCCodeEmitter if we ever encode an H-reg and a REX prefix. llvm-svn: 314248	2017-09-26 21:35:06 +00:00
Craig Topper	0768bced39	[X86] Fix typo in comment. NFC llvm-svn: 314247	2017-09-26 21:35:04 +00:00
Nemanja Ivanovic	e22ebeab1a	[PowerPC] Reverting sequence of patches for elimination of comparison instructions In the past while, I've committed a number of patches in the PowerPC back end aimed at eliminating comparison instructions. However, this causes some failures in proprietary source and these issues are not observed in SPEC or any open source packages I've been able to run. As a result, I'm pulling the entire series and will refactor it to: - Have a single entry point for easy control - Have fine-grained control over which patterns we transform A side-effect of this is that test cases for these patches (and modified by them) are XFAIL-ed. This is a temporary measure as it is counter-productive to remove/modify these test cases and then have to modify them again when the refactored patch is recommitted. The failure will be investigated in parallel to the refactoring effort and the recommit will either have a fix for it or will leave this transformation off by default until the problem is resolved. llvm-svn: 314244	2017-09-26 20:42:47 +00:00
Michael Zuckerman	645f777e40	[X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF{8\|16\|32} stride 3) This patch expands the support of lowerInterleavedStore to {8\|16\|32}x8i stride 3. LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (Stride=3 VF={8\|16\|32}). This patch is part two of two patches and it covers the store (interleaved) side. The patch goal is to optimize the following sequence: a0 a1 a2 a3 a4 a5 a6 a7 b0 b1 b2 b3 b4 b5 b6 b7 c0 c1 c2 c3 c4 c5 c6 c7 into a0 b0 c0 a1 b1 c1 a2 b2 c2 a3 b3 c3 a4 b4 c4 a5 b5 c5 a6 b6 c6 a7 b7 c7 Reviewers: zvi guyblank dorit Ayal Differential Revision: https://reviews.llvm.org/D37117 Change-Id: I56ced8bcbea809a37654060771911ade20246ccc llvm-svn: 314234	2017-09-26 18:49:11 +00:00
Artem Belevich	bab95c7087	[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins. Differential Revision: https://reviews.llvm.org/D38191 llvm-svn: 314223	2017-09-26 17:07:23 +00:00
Craig Topper	f51913155c	[X86] Add support for v16i32 UMUL_LOHI/SMUL_LOHI Summary: This patch extends the v8i32/v4i32 custom lowering to support v16i32 Reviewers: zvi, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38274 llvm-svn: 314221	2017-09-26 16:43:57 +00:00
Krzysztof Parzyszek	9801d7fd9f	[Hexagon] Fix a typo: #ifndef DEBUG -> #ifndef NDEBUG llvm-svn: 314216	2017-09-26 15:31:15 +00:00
Krzysztof Parzyszek	1665b3db40	[Hexagon] Fix initialization of HexagonSubtarget Make sure that "initializeSubtargetDependencies" sets all members that InstrInfo and the like may depend on. llvm-svn: 314214	2017-09-26 15:06:37 +00:00
Simon Pilgrim	dac6fd4170	[X86][XOP] Merge rotation opcodes with AVX512 equivalents. NFCI. The XOP rotations act as ROTL with +ve values and ROTR with -ve values, which means that we can treat them all as ROTL with unsigned modulo. We already check that we're only trying to lower as ROTL for XOP rotations. Differential Revision: https://reviews.llvm.org/D37949 llvm-svn: 314207	2017-09-26 14:12:50 +00:00
Coby Tayree	f191fdc3fb	[x86] fix pr29061 https://bugs.llvm.org//show_bug.cgi?id=29061 Don't try referencing REX-needed regs when not in 64-bit mode. Aligns with GCC. Differential Revision: https://reviews.llvm.org/D37801 llvm-svn: 314203	2017-09-26 13:28:05 +00:00
Benjamin Kramer	4b2113a303	Revert "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST." Makes llc crash. This reverts commit r314151. llvm-svn: 314199	2017-09-26 10:25:27 +00:00
Uriel Korach	0ecc984b1b	[X86] Finishing broadcastf32x2 and broadcasti32x2 intrinsics lowering to IR. llvm side. Removing X86 broadcast(f/i)32x2 intrinsics from llvm. Adding autoUpgrade support. Moving matching tests from avx512dq-intrinsics.ll to avx512dq-intrinsics-upgrade.ll and from avx512dqvl-intrinsics.ll to avx512dqvl-intrinsics-upgrade.ll. Differential Revision: https://reviews.llvm.org/D38220 llvm-svn: 314195	2017-09-26 07:39:39 +00:00
Dylan McKay	1446eedbc2	[AVR] Prefer BasicBlock::getIterator over Function::begin() Thanks to Eli Friedman for the suggestion. llvm-svn: 314182	2017-09-26 01:37:53 +00:00
Dylan McKay	dada014781	[AVR] When lowering shifts into loops, put newly generated MBBs in the same spot as the original MBB Discovered in avr-rust/rust#62 https://github.com/avr-rust/rust/issues/62 Patch by Gergo Erdi. llvm-svn: 314180	2017-09-26 00:51:03 +00:00
Dylan McKay	832c4a65c0	[AVR] Use 1-byte alignment for all data types This was an oversight in the original backend data layout. The AVR architecture does not have the concept of unaligned loads - all loads/stores from all addresses are aligned to one byte. Discovered in avr-rust issue #64 https://github.com/avr-rust/rust/issues/64 Patch By Gergo Erdi. llvm-svn: 314179	2017-09-26 00:45:27 +00:00
Eli Friedman	edee9999c4	Revert r312724 ("[ARM] Remove redundant vcvt patterns."). It leads to some improvements, but also a regression for the simple case, so it's not clearly a good idea. test/CodeGen/ARM/vcvt.ll now has test coverage to show the difference. Ultimately, the right solution is probably to custom-lower fp-to-int conversions, to something like ARMISD::VCVT_F32_S32 plus a bitcast. It's hard to do the right thing when the implicit bitcast isn't visible to DAG transforms. llvm-svn: 314169	2017-09-25 22:07:33 +00:00
Saleem Abdulrasool	2e0d72311b	X86: remove R12 from CSR on Windows x64 SwiftCC R12 is used for the SwiftError parameter. It is no longer a CSR as it is used to transfer the SwiftError, and the caller must preserve it if they need to. llvm-svn: 314165	2017-09-25 22:00:17 +00:00
Craig Topper	5124a14d9c	[X86] Don't select anyext GR32->GR64 to SUBREG_TO_REG. Use INSERT_SUBREG instead. As far as I know SUBREG_TO_REG is stating that the upper bits are 0. But if we are just converting the GR32 with no checks, then we have no reason to say the upper bits are 0. I don't really know how to test this today since I can't find anything that looks that closely at SUBREG_TO_REG. The test changes here seem to be some perturbation of register allocation. Differential Revision: https://reviews.llvm.org/D38001 llvm-svn: 314152	2017-09-25 21:14:59 +00:00
Craig Topper	d830f276c1	[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST. llvm-svn: 314151	2017-09-25 21:14:55 +00:00
Benjamin Kramer	82b7103a69	[Hexagon] Avoid unused variable warnings in Release builds. No functionality change intended. llvm-svn: 314143	2017-09-25 19:42:20 +00:00
Justin Lebar	d31d5e6aa2	Revert "[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.", rL314135. Causing assertion failures on macos: > Assertion failed: (Num < NumOperands && "Invalid child # of SDNode!"), > function getOperand, file > /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/include/llvm/CodeGen/SelectionDAGNodes.h, > line 835. http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/42739/testReport/LLVM/CodeGen_NVPTX/surf_read_cuda_ll/ llvm-svn: 314142	2017-09-25 19:41:56 +00:00
Konstantin Belochapka	741099bc0f	[X86] [ASM INTEL SYNTAX] fix for incorrect assembler code generation when x86-asm-syntax=intel (PR34617). Fix for incorrect code generation when x86-asm-syntax=intel. Differential Revision: https://reviews.llvm.org/D37945 llvm-svn: 314140	2017-09-25 19:26:48 +00:00
Craig Topper	5bc10ede53	[SelectionDAG] Teach simplifyDemandedBits to handle shifts by constant splat vectors This teaches simplifyDemandedBits to handle constant splat vector shifts. This required changing some uses of getZExtValue to getLimitedValue since we can't rely on legalization using getShiftAmountTy for the shift amount. I believe there may have been a bug in the ((X << C1) >>u ShAmt) handling where we didn't check if the inner shift was too large. I've fixed that here. I had to add new patterns to ARM because the zext/sext the patterns were trying to look for got turned into an any_extend with this patch. Happy to split that out too, but not sure how to test without this change. Differential Revision: https://reviews.llvm.org/D37665 llvm-svn: 314139	2017-09-25 19:26:08 +00:00
Krzysztof Parzyszek	7e604deca9	[Hexagon] Better determination of register classes in bit tracker Add two callbacks to MachineEvaluator, so that specific implementations can specify more details about register classes: - composeWithSubRegIndex(RC,Idx), to provide the register class for a register from RC used in conjunction with a subregister index Idx. - getPhysRegBitWidth(Reg), to provide the size in bits of the given physical register. llvm-svn: 314136	2017-09-25 19:12:55 +00:00
Artem Belevich	9941ee9529	[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins. Differential Revision: https://reviews.llvm.org/D38191 llvm-svn: 314135	2017-09-25 18:53:57 +00:00
Krzysztof Parzyszek	d72bd83479	[Hexagon] Make getHexagonSubRegIndex take reference instead of pointer llvm-svn: 314134	2017-09-25 18:49:42 +00:00
Craig Topper	ba3cc2e0da	[AVX-512] Replace large number of explicit patterns that check for insert_subvector with zero after masked compares with fewer patterns with predicate This replaces the large number of patterns that handle every possible case of zeroing after a masked compare with a few simpler patterns that use a predicate to check for a masked compare producer. This is similar to what we do for detecting free GR32->GR64 zero extends and free xmm->ymm/zmm zero extends. This shrinks the isel table from ~590k to ~531k. This is a roughly 10% reduction in size. Differential Revision: https://reviews.llvm.org/D38217 llvm-svn: 314133	2017-09-25 18:43:13 +00:00
Arnold Schwaighofer	b45717adda	ARM: One more fix for swifterror CSR set We use a differently ordered CSR set if the frame pointer is pushed. Add a matching ..._SwiftError version. llvm-svn: 314128	2017-09-25 17:51:33 +00:00
Benjamin Kramer	a23c1a37d0	[ARM] Fix -Wdangling-else warning. A ternary is clearer here. No functionality change. llvm-svn: 314123	2017-09-25 17:35:38 +00:00
Arnold Schwaighofer	ae4de58a5b	ARM: Use the proper swifterror CSR list on platforms other than darwin Noticed by inspection llvm-svn: 314121	2017-09-25 17:19:50 +00:00
Michael Zuckerman	4a97df01c4	[X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF8 stride 4): This patch expands the support of lowerInterleavedStore to 8x8i stride 4. LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (Stride=4 VF=8) and we plan to include more patterns in the future. The patch goal is to optimize the following sequence: At the end of the computation, we have xmm2, xmm0, xmm12 and xmm3 holding each 8 chars: c0, c1, ..., c7 m0, m1, ..., m7 y0, y1, ..., y7 k0, k1, ..., k7 And these need to be transposed/interleaved and stored like so: c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 .... Reviewers: DavidKreitzer Farhana zvi igorb guyblank RKSimon Ayal Differential Revision: https://reviews.llvm.org/D36058 Change-Id: I3cc5c2ca5d6318901c192a4428493b99ef424c32 llvm-svn: 314109	2017-09-25 14:50:38 +00:00
Nemanja Ivanovic	f7bc9ce378	[PowerPC] Eliminate compares - add i64 sext/zext handling for SETLT/SETGT As mentioned in https://reviews.llvm.org/D33718, this simply adds another pattern to the compare elimination sequence and is committed without a differential review. llvm-svn: 314106	2017-09-25 14:05:46 +00:00
Chad Rosier	71070856e6	[AArch64] Add basic support for Qualcomm's Saphira CPU. llvm-svn: 314105	2017-09-25 14:05:00 +00:00
Michael Zuckerman	ac1d20dea7	Adding missing feature to goldmont. Change-Id: I1ddc619169fae6a56308deef8dae5db3da702cf4 llvm-svn: 314103	2017-09-25 13:45:31 +00:00
Clement Courbet	2807c0a442	[CodeGenPrepare][NFC] Rename TargetTransformInfo::expandMemCmp -> TargetTransformInfo::enableMemCmpExpansion. Summary: Right now there are two functions with the same name, one does the work and the other one returns true if expansion is needed. Rename TargetTransformInfo::expandMemCmp to make it more consistent with other members of TargetTransformInfo. Remove the unused Instruction* parameter. Differential Revision: https://reviews.llvm.org/D38165 llvm-svn: 314096	2017-09-25 06:35:16 +00:00
Craig Topper	47e14ead54	[X86] Make IFMA instructions during isel so we can fold broadcast loads. This required changing the ISD opcode for these instructions to have the commutable operands first and the addend last. This way tablegen can autogenerate the additional patterns for us. llvm-svn: 314083	2017-09-24 19:30:55 +00:00
Craig Topper	23f1830748	[X86] Add IFMA instructions to the load folding tables and make them commutable for the multiply operands. llvm-svn: 314080	2017-09-24 17:28:14 +00:00
Simon Pilgrim	6ef8a7ed74	Fix signed/unsigned warning llvm-svn: 314078	2017-09-24 14:00:52 +00:00
Simon Pilgrim	a705db9a9e	[X86][SSE] Add support for extending bool vectors bitcasted from scalars This patch acts as a reverse to combineBitcastvxi1 - bitcasting a scalar integer to a boolean vector and extending it 'in place' to the requested legal type. Currently this doesn't handle AVX512 at all - but the current mask register approach is lacking for some cases. Differential Revision: https://reviews.llvm.org/D35320 llvm-svn: 314076	2017-09-24 13:42:31 +00:00
Nemanja Ivanovic	f894ce35d0	[PowerPC] Eliminate compares - add i64 sext/zext handling for SETLE/SETGE As mentioned in https://reviews.llvm.org/D33718, this simply adds another pattern to the compare elimination sequence and is committed without a differential review. llvm-svn: 314073	2017-09-24 05:48:11 +00:00
Craig Topper	eb5c411218	[AVX-512] Add pattern for selecting masked version of v8i32/v8f32 compare instructions when VLX isn't available. We use a v16i32/v16f32 compare instead and truncate the result. We already did this for the unmasked version, but were missing the version with 'and'. llvm-svn: 314072	2017-09-24 05:24:52 +00:00
Craig Topper	675bdd30c6	[X86] Make sure we still mark the full register as implicitly defined when we shrink 256/512 bit zeroing xors to 128-bit. Not sure if anything really cares, but this seems like the right thing to do. llvm-svn: 314071	2017-09-24 05:24:51 +00:00
Dylan McKay	f9e291a2f6	[AVR] Implement getCmpLibcallReturnType(). This fixes the avr-rust issue (#75) with floating-point comparisons generating broken code. By default, LLVM assumes these comparisons return 32-bit values, but ours are 8-bit. Patch By Thomas Backman. llvm-svn: 314070	2017-09-24 01:07:26 +00:00
Sanjay Patel	fa8bad8a0f	[x86] reduce 64-bit mask constant to 32-bits by right shifting This is a follow-up from D38181 (r314023). We have to put 64-bit constants into a register using a separate instruction, so we should try harder to avoid that. From what I see, we're not likely to encounter this pattern in the DAG because the upstream setcc combines from this don't (usually?) produce this pattern. If we fix that, then this will become more relevant. Since the cost of handling this case is just loosening the predicate of the existing fold, we might as well do it now. llvm-svn: 314064	2017-09-23 14:32:07 +00:00
Nemanja Ivanovic	35db4f956a	[PowerPC] Eliminate compares - add i32 sext/zext handling for SETULT/SETUGT As mentioned in https://reviews.llvm.org/D33718, this simply adds another pattern to the compare elimination sequence and is committed without a differential revision. llvm-svn: 314062	2017-09-23 12:53:03 +00:00
Nemanja Ivanovic	c4980799ab	[PowerPC] Eliminate compares - add i32 sext/zext handling for SETULE/SETUGE As mentioned in https://reviews.llvm.org/D33718, this simply adds another pattern to the compare elimination sequence and is committed without a differential revision. llvm-svn: 314060	2017-09-23 09:50:12 +00:00
Craig Topper	092c2f4357	[X86] Move the getInsertVINSERTImmediate and getExtractVEXTRACTImmediate helper functions over to X86ISelDAGToDAG.cpp Redefine them to call getI8Imm and return that directly. llvm-svn: 314059	2017-09-23 05:34:07 +00:00
Craig Topper	492282d4e2	[X86] Remove the isVINSERTIndex/isVEXTRACTIndex predicates from isel. The only insert_subvector/extract_subvector nodes that make it to isel are guaranteed to match. llvm-svn: 314058	2017-09-23 05:34:06 +00:00
Nemanja Ivanovic	41c4a109d8	[PowerPC] Eliminate compares - add i32 sext/zext handling for SETLT/SETGT As mentioned in https://reviews.llvm.org/D33718, this simply adds another pattern to the compare elimination sequence and is committed without a differential revision. llvm-svn: 314055	2017-09-23 04:41:34 +00:00
Konstantin Belochapka	3477711ec7	[X86] [MC] fixed non optimal encoding of instruction memory operand (PR24038). Fixed suboptimal encoding of instruction memory operand when assembler is used to select 32 bit fixup rather than 8 bit immediate for encoding memory offset value. Differential Revision: https://reviews.llvm.org/D38117 llvm-svn: 314044	2017-09-22 23:37:48 +00:00
Stefan Pintilie	590eb2755d	[PowerPC] Mark P9 scheduling model complete This patch just adds the missing information to the P9 scheduling model to allow the model to be marked as complete. The model has been verified against P9 documentation. The model was verified with utils/schedcover.py. Differential Revision: https://reviews.llvm.org/D35695 llvm-svn: 314026	2017-09-22 20:17:25 +00:00
Sanjay Patel	0c723bb017	[x86] shiftRightAlgebraic -> shiftRightArithmetic; NFC x86 re-education camp is in session. The LLVM LangRef agrees with x86 too. The DAG nodes are undocumented and ambiguous as always. :) llvm-svn: 314024	2017-09-22 19:49:37 +00:00
Sanjay Patel	3339954fa3	[x86] swap order of srl (and X, C1), C2 when it saves size The (non-)obvious win comes from saving 3 bytes by using the 0x83 'and' opcode variant instead of 0x81. There are also better improvements based on known-bits that allow us to eliminate the mask entirely. As noted, this could be extended. There are potentially other wins from always shifting first, but doing that reveals a tangle of problems in other pattern matching. We do this transform generically in instcombine, but we often have icmp IR that doesn't match that pattern, so we must account for this in the backend. Differential Revision: https://reviews.llvm.org/D38181 llvm-svn: 314023	2017-09-22 19:37:21 +00:00
Tim Shen	cee7536188	[XRay] support conditional return on PPC. Summary: Conditional returns were not taken into consideration at all. Implement them by turning them into jumps and normal returns. This means there is a slightly higher performance penalty for conditional returns, but this is the best we can do, and it still disturbs little of the rest. Reviewers: dberris, echristo Subscribers: sanjoy, nemanjai, hiraditya, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D38102 llvm-svn: 314005	2017-09-22 18:30:02 +00:00
Pranav Bhandarkar	09273239d1	Check vector elements for equivalence in the HexagonVectorLoopCarriedReuse pass If the two instructions being compared for equivalence have corresponding operands that are integer constants, then check their values to determine equivalence. Patch by Suyog Sarda! llvm-svn: 313993	2017-09-22 16:43:31 +00:00
Alexander Ivchenko	34498ba052	[X86] Combining CMOVs with [ANY,SIGN,ZERO]_EXTEND for cases where CMOV has constant arguments Combine CMOV[i16]<-[SIGN,ZERO,ANY]_EXTEND to [i32,i64] into CMOV[i32,i64]. One example of where it is useful is: before (20 bytes) <foo>: test $0x1,%dil mov $0x307e,%ax mov $0xffff,%cx cmovne %ax,%cx movzwl %cx,%eax retq after (18 bytes) <foo>: test $0x1,%dil mov $0x307e,%ecx mov $0xffff,%eax cmovne %ecx,%eax retq Reviewers: craig.topper, aaboud, spatel, RKSimon, zvi Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36711 llvm-svn: 313982	2017-09-22 13:21:39 +00:00
Nemanja Ivanovic	cea42b7fff	Remove the default clause from a fully-covering switch to appease bots that use a compiler that warns about this and use -Werror. llvm-svn: 313980	2017-09-22 12:26:00 +00:00
Andre Vieira	640527f7f1	[ARM] Fix assembly and disassembly for VMRS/VMSR Reviewed by: t.p.northover Differential Revision: https://reviews.llvm.org/D36306 llvm-svn: 313979	2017-09-22 12:17:42 +00:00
Nemanja Ivanovic	d6f93f5143	Recommit r310809 with a fix for the spill problem This patch re-commits the patch that was pulled out due to a problem it caused, but with a fix for the problem. The fix was reviewed separately by Eric Christopher and Hal Finkel. Differential Revision: https://reviews.llvm.org/D38054 llvm-svn: 313978	2017-09-22 11:50:25 +00:00
Simon Pilgrim	2b1c3bb25d	[ARM] Add missing selection patterns for vnmla For the following function: double fn1(double d0, double d1, double d2) { double a = -d0 - d1 * d2; return a; } on ARM, LLVM generates code along the lines of vneg.f64 d0, d0 vmls.f64 d0, d1, d2 i.e., a negate and a multiply-subtract. The attached patch adds instruction selection patterns to allow it to generate the single instruction vnmla.f64 d0, d1, d2 (multiply-add with negation) instead, like GCC does. Committed on behalf of @gergo- (Gergö Barany) Differential Revision: https://reviews.llvm.org/D35911 llvm-svn: 313972	2017-09-22 09:50:52 +00:00
Alexander Richardson	eb5ce8b92a	[mips] clang-format MipsTargetMachine.cpp This is my test commit as it only changes two lines llvm-svn: 313968	2017-09-22 08:52:03 +00:00
Dylan McKay	b7926ba50a	[AVR] Remove the 'IsN64' argument to 'MCELFObjectWriter' This has since been removed. llvm-svn: 313965	2017-09-22 06:32:23 +00:00
Yonghong Song	d2e0d1fa11	bpf: initial 32-bit ALU encoding support in assembler This patch adds instruction patterns for operations in BPF_ALU. After this, the assembler can recognize some 32-bit ALU statements, for example those listed in the unit test file. Separate MOV patterns are unnecessary, as MOV is an ALU operation that can reuse the ALU encoding infrastructure, so this patch removes those redundant patterns. Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 313961	2017-09-22 04:36:36 +00:00
Yonghong Song	3c63b101de	bpf: add 32bit register set Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 313960	2017-09-22 04:36:35 +00:00
Yonghong Song	d03fef970b	bpf: refactor inst patterns with better inheritance Arithmetic and jump instructions, and load and store instructions, share the same 8-bit code field encoding. A better instruction pattern implementation could use the following inheritance relationships, with each layer encoding only those fields which start to diverge at that layer. This avoids some redundant code. InstBPF -> TYPE_ALU_JMP -> ALU/JMP InstBPF -> TYPE_LD_ST -> Load/Store Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 313959	2017-09-22 04:36:34 +00:00
Yonghong Song	3bf1a8d04e	bpf: refactor inst patterns with more mnemonics Currently, the eBPF backend uses some constants directly in instruction patterns. This patch replaces them with mnemonics and removes some unnecessary temporary variables. Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 313958	2017-09-22 04:36:32 +00:00
Saleem Abdulrasool	ba7a75c7b2	AArch64: support SwiftCC properly on AAPCS64 The previous SwiftCC support for AAPCS64 was partially correct. It set up swiftself parameters in the proper register but failed to set up swifterror in the correct register. This would break compilation of swift code for non-Darwin AAPCS64 conforming environments. llvm-svn: 313956	2017-09-22 04:31:44 +00:00
NAKAMURA Takumi	fec5e10890	HexagonVectorLoopCarriedReuse.cpp: Apply LLVM_ATTRIBUTE_UNUSED. [-Wunused-function] llvm-svn: 313947	2017-09-22 01:01:33 +00:00
NAKAMURA Takumi	05f6015fbd	Reformat. llvm-svn: 313946	2017-09-22 01:01:31 +00:00
Richard Trieu	cc10e633d9	Fix unused variable warning. Move function call into debug macro to suppress unused variable warning in non-debug builds. llvm-svn: 313942	2017-09-21 23:48:01 +00:00
Pranav Bhandarkar	931d0b7aff	Enable the reuse of values computed in a previous loop iteration. This patch adds a pass that removes the computation of provably redundant expressions that have been computed earlier in a previous iteration. It relies on the use of PHIs to identify loop carried dependences. This is scalar replacement for vector types. llvm-svn: 313925	2017-09-21 21:48:23 +00:00
Geoff Berry	bb23df92b5	[AArch64] Fix bug in store of vector 0 DAGCombine. Summary: Avoid using XZR/WZR directly as operands to split stores of zero vectors. Doing so can lead to the XZR/WZR being used by an instruction that doesn't allow it (e.g. add). Fixes bug 34674. Reviewers: t.p.northover, efriedma, MatzeB Subscribers: aemerson, rengolin, javed.absar, mcrosier, eraman, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D38146 llvm-svn: 313916	2017-09-21 21:10:06 +00:00
Artem Belevich	42960b4188	[NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync} instructions/intrinsics/builtins. Differential Revision: https://reviews.llvm.org/D38148 llvm-svn: 313898	2017-09-21 18:44:49 +00:00
Zaara Syeda	fcd9697d72	[Power9] Spill gprs to vector registers rather than stack This patch updates register allocation to enable spilling gprs to volatile vector registers rather than the stack. It can be enabled for Power9 with option -ppc-enable-gpr-to-vsr-spills. Differential Revision: https://reviews.llvm.org/D34815 llvm-svn: 313886	2017-09-21 16:12:33 +00:00
Simon Atanasyan	9f676a7798	[mips] Do not pass redundant IsN64 flag to MCELFObjectTargetWriter. NFC Now we pass the 'Is64_' flag to the MCELFObjectTargetWriter ctor only when we deal with the N64 ABI. So it is redundant to pass an additional 'IsN64' flag. llvm-svn: 313878	2017-09-21 14:04:47 +00:00
Jonas Paulsson	b0e8a2e623	[SystemZ] Improve optimizeCompareZero() More conversions to load-and-test can be made with this patch by adding a forward search in optimizeCompareZero(). Review: Ulrich Weigand https://reviews.llvm.org/D38076 llvm-svn: 313877	2017-09-21 13:52:24 +00:00
Simon Atanasyan	11766558d7	[mips] Fix relocation record format and ELF header for N32 ABI The N32 ABI uses the RELA relocation format, does not use the 3-in-1 relocation encoding, and uses ELFCLASS32. This change passes the `IsN32` flag to the `MCAsmBackend` to distinguish usage of N32 ABI. We still do not handle some cases like providing the `-target-abi=o32` command line option with the `mips64` target triple. That's why elf_header.s contains some "FIXME" strings. This case will be fixed in a separate patch. Differential revision: https://reviews.llvm.org/D37960 llvm-svn: 313873	2017-09-21 10:44:26 +00:00
Matt Arsenault	1390af2dd2	AMDGPU: Add option to stress calls This inverts the behavior of the AlwaysInline pass to mark every function not already marked alwaysinline as noinline. llvm-svn: 313865	2017-09-21 07:00:48 +00:00
Craig Topper	1b9d24ca57	[X86] Remove execute permissions from a couple files. llvm-svn: 313863	2017-09-21 04:55:08 +00:00
Craig Topper	8b6b8cc5b1	[X86] Remove windows line endings. llvm-svn: 313862	2017-09-21 04:55:07 +00:00
Craig Topper	d1252692a4	[X86] Remove unused tablegen class. llvm-svn: 313861	2017-09-21 04:55:06 +00:00
Matt Arsenault	fdcdd88d57	AMDGPU: Fix crash on immediate operand We can have a v_mac with an immediate src0. We can still fold if it's an inline immediate, otherwise it already uses the constant bus. llvm-svn: 313852	2017-09-21 00:45:59 +00:00
Craig Topper	e33755860d	[X86] Replace a condition that can never be true with an assert. llvm-svn: 313848	2017-09-21 00:18:48 +00:00
Eugene Zelenko	076468c0d0	[ARM] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 313823	2017-09-20 21:35:51 +00:00
Artem Belevich	4654dc89be	[NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins. Differential Revision: https://reviews.llvm.org/D38090 llvm-svn: 313820	2017-09-20 21:23:07 +00:00
Simon Atanasyan	72982e6913	[mips] Fix calculation of a branch instruction offset to escape left shift of negative value llvm-svn: 313815	2017-09-20 21:01:30 +00:00
Matt Arsenault	8cbb4884a5	AMDGPU: Start selecting v_mad_mixhi_f16 llvm-svn: 313814	2017-09-20 21:01:24 +00:00
Saleem Abdulrasool	aff96d907b	X86: treat SwiftCC as Win64_CC on Win64 The Swift CC is identical to Win64 CC with the exception of swift error being passed in r12 which is a CSR. However, since this calling convention is only used in swift -> swift code, it does not impact interoperability and can be treated entirely as Win64 CC. We would previously incorrectly lower the frame setup as we did not treat the frame as conforming to Win64 specifications. llvm-svn: 313813	2017-09-20 21:00:40 +00:00
Matt Arsenault	e135c4c6a6	AMDGPU: Add tied operands to v_mad_mix{lo\|hi}_f16 These write to the low and high half of the destination register and leave the other 16-bits unchanged. This is true for most 16-bit instructions on gfx9, but we don't use that now. llvm-svn: 313812	2017-09-20 20:53:49 +00:00
Eric Christopher	adc4bc64ad	Remove the default subtarget from the new Nios2 port. It's unused and deprecated. llvm-svn: 313808	2017-09-20 20:32:23 +00:00
Matt Arsenault	76935122cc	AMDGPU: Start selecting v_mad_mixlo_f16 Also add some tests that should be able to use v_mad_mixhi_f16, but do not yet. This is trickier because we don't really model the partial update of the register done by 16-bit instructions. llvm-svn: 313806	2017-09-20 20:28:39 +00:00
Matt Arsenault	644883ff07	AMDGPU: Fix encoding of op_sel for mad_mix* opcodes llvm-svn: 313797	2017-09-20 19:09:28 +00:00
Saleem Abdulrasool	432b88e5f4	CodeGen: support SwiftError SwiftCC on Windows x64 Add support for passing SwiftError through a register on the Windows x64 calling convention. This allows the use of swifterror attributes on parameters which is used by the swift front end for the `Error` parameter. This partially enables building the swift standard library for Windows x86_64. llvm-svn: 313791	2017-09-20 18:40:59 +00:00
Simon Pilgrim	33ec43d653	[X86][SSE] Remove unnecessary NonceMasks from combineX86ShufflesRecursively calls (NFCI) llvm-svn: 313743	2017-09-20 09:36:11 +00:00
Andrew V. Tischenko	92980ce6aa	'into' instruction should not be decoded as a valid instr in 64-bit mode llvm-svn: 313735	2017-09-20 08:17:17 +00:00
Craig Topper	5c7cd25f82	[X86] Remove isel checks for immediate size on floating point compare and xop compare instructions. NFCI If these checks fail we end up not selecting an instruction at all. So we are already relying on the immediate being checked upstream of isel. So doing the check in isel is just bloat to the isel table. Interestingly, we didn't check on the AVX512 version of the instructions anyway. llvm-svn: 313724	2017-09-20 06:38:41 +00:00
Stanislav Mekhanoshin	2e3bf37ec4	[AMDGPU] Fixed memory leak with inliner replaced Delete inliner before replacing it. llvm-svn: 313723	2017-09-20 06:34:28 +00:00
Matt Arsenault	c8aea66627	AMDGPU: Move r600 only code into r600 only td file llvm-svn: 313719	2017-09-20 06:11:25 +00:00
Stanislav Mekhanoshin	5641820141	[AMDGPU] Fix regression in test clang/test/CodeGen/backend-unsupported-error.ll llvm-svn: 313718	2017-09-20 06:10:15 +00:00
Matt Arsenault	b81495dccb	AMDGPU: Match load d16 hi instructions Also starts selecting global loads for constant address in some cases. Some end up selecting to mubuf still, which requires investigation. We still get sub-optimal regalloc and extra waitcnts inserted due to not really tracking the liveness of the separate register halves. llvm-svn: 313716	2017-09-20 05:01:53 +00:00
Stanislav Mekhanoshin	5670e6d482	[AMDGPU] Port of HSAIL inliner Differential Revision: https://reviews.llvm.org/D36849 llvm-svn: 313714	2017-09-20 04:25:58 +00:00
Matt Arsenault	bc68383166	AMDGPU: Cleanup load/store PatFrags Try to use a consistent naming scheme. llvm-svn: 313713	2017-09-20 03:43:35 +00:00
Matt Arsenault	fcc213fab7	AMDGPU: Match store d16_hi instructions llvm-svn: 313712	2017-09-20 03:20:09 +00:00
Jonathan Roelofs	85908aa84b	[ARM] Relax 'cpsie'/'cpsid' flag parsing. The ARM docs suggest in examples that the flags can have either case, and there are applications in the wild (libopencm3, for example) that expect to be able to use the uppercase spelling. https://reviews.llvm.org/D37953 llvm-svn: 313680	2017-09-19 21:23:19 +00:00
Vadzim Dambrouski	8cc8b63b06	[MSP430] Align functions on 2-byte boundary instead of 4. Summary: There is no benefit in having the 4-byte alignment, and removing this restriction can save a lot of space for some applications. Reviewers: asl, awygle Reviewed By: awygle Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36165 llvm-svn: 313676	2017-09-19 21:05:20 +00:00
Stanislav Mekhanoshin	d4ae470d2e	[AMDGPU] Prevent post-RA scheduler from breaking memory clauses The pre-RA scheduler does load/store clustering, but post-RA scheduler undoes it. Add mutation to prevent it. Differential Revision: https://reviews.llvm.org/D38014 llvm-svn: 313670	2017-09-19 20:54:38 +00:00
Ulrich Weigand	59a01a958a	[SystemZ] Fix truncstore + bswap codegen bug SystemZTargetLowering::combineSTORE contains code to transform a combination of STORE + BSWAP into a STRV type instruction. This transformation is correct for regular stores, but not for truncating stores. The routine neglected to check for that case. Fixes a miscompilation of llvm-objcopy with clang, which caused test suite failures in the SystemZ multistage build bot. llvm-svn: 313669	2017-09-19 20:50:05 +00:00
Craig Topper	75370b9b49	[X86] Convert X86ISD::SELECT to ISD::VSELECT just before instruction selection to avoid duplicate patterns Similar to what we do for X86ISD::SHRUNKBLEND just turn X86ISD::SELECT into ISD::VSELECT. This allows us to remove the duplicated TRUNC patterns. Differential Revision: https://reviews.llvm.org/D38022 llvm-svn: 313644	2017-09-19 17:19:45 +00:00
Tony Jiang	2d9c5f3b8b	[PowerPC Peephole] Constants into a join add, use ADDI over LI/ADD. Two blocks prior to the join each perform an li and the join block has an add using the initialized register. Optimize each predecessor block to instead use addi and delete the li's and add. Differential Revision: https://reviews.llvm.org/D36734 llvm-svn: 313639	2017-09-19 16:14:37 +00:00
Tony Jiang	425071eff3	[Power9] Add missing Power9 instructions. The following 8 instructions are implemented in this patch. addpcis(subpcis, lnia), darn, maddhd, maddhdu, maddld, setb llvm-svn: 313636	2017-09-19 15:22:36 +00:00
Daniel Sanders	83e23d1398	[globalisel] Add a G_BSWAP instruction and support bswap using it. llvm-svn: 313633	2017-09-19 14:25:15 +00:00
Nikolai Bozhenov	ebbde1409f	[Nios2] Subtarget, basic infrastructure for frame, instructions and registers This is the second minimal patch keeping Nios2 target buildable. I'm adding subtarget here and other stuff for frame lowering, instruction, register information methods. I do not add any test cases, as still there are missing parts like DAG selector and assembly printing. I plan to include them into the next patch. Patch by Andrei Grischenko <andrei.l.grischenko@intel.com> Differential Revision: https://reviews.llvm.org/D37256 llvm-svn: 313626	2017-09-19 11:54:29 +00:00
Jina Nahias	ccfb8d4fe8	[x86] Lowering Mask Set1 intrinsics to LLVM IR This patch, together with a matching clang patch (https://reviews.llvm.org/D37668), implements the lowering of X86 mask set1 intrinsics to IR. Differential Revision: https://reviews.llvm.org/D37669 llvm-svn: 313625	2017-09-19 11:03:06 +00:00
Roger Ferrer Ibanez	8d0180c955	[ARM] Use ADDCARRY / SUBCARRY This is a preparatory step for D34515. This change: - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 - lowering is done by first converting the boolean value into the carry flag using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two operations does the actual addition. - for subtraction, given that ISD::SUBCARRY second result is actually a borrow, we need to invert the value of the second operand and result before and after using ARMISD::SUBE. We need to invert the carry result of ARMISD::SUBE to preserve the semantics. - given that the generic combiner may lower ISD::ADDCARRY and ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering as well otherwise i64 operations now would require branches. This implies updating the corresponding test for unsigned. - add new combiner to remove the redundant conversions from/to carry flags to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C - fixes PR34045 - fixes PR34564 Differential Revision: https://reviews.llvm.org/D35192 llvm-svn: 313618	2017-09-19 09:05:39 +00:00
Matt Arsenault	e745d9963e	AMDGPU: Run internalize symbols at -O0 The relocations used for externally visible functions aren't supported, so the direct call emitted ends up hitting a linker error. llvm-svn: 313616	2017-09-19 07:40:11 +00:00
Gadi Haber	6f8fbf4b86	[X86][Skylake] Adding the scheduling information for the SkylakeClient target This patch adds the instruction scheduling information for the SkylakeClient (SKL) architecture target by adding the file X86SchedSkylakeClient.td located under the X86 Target. We used the scheduling information retrieved from the Skylake architects in order to create the file. The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction. The patch continues the scheduling replacement and insertion effort started with the SNB target in r307529 and r310792 and for HSW in r311879. Please expect some performance fluctuations due to code alignment effects. Reviewers: craig.topper, zvi, chandlerc, igorb, aymanmus, RKSimon, delena Differential Revision: https://reviews.llvm.org/D37294 llvm-svn: 313613	2017-09-19 06:19:27 +00:00
Craig Topper	c38371492f	[X86] Remove some unnecessary patterns for truncate with X86ISD::SELECT and undef preserved source. We canonicalize undef preserved sources to zero during intrinsic lowering. llvm-svn: 313612	2017-09-19 05:30:24 +00:00
Craig Topper	a80949feb5	[X86] Add VPERMPD/VPERMQ and VPERMPS/VPERMD to the execution domain fixing table. llvm-svn: 313610	2017-09-19 04:39:55 +00:00
Yonghong Song	9ef85f0677	bpf: add inline-asm support Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 313593	2017-09-18 23:29:36 +00:00
Sanjay Patel	f31b1a00ea	[DAGCombiner] fold assertzexts separated by trunc If we have an AssertZext of a truncated value that has already been AssertZext'ed, we can assert on the wider source op to improve the zext-y knowledge: assert (trunc (assert X, i8) to iN), i1 --> trunc (assert X, i1) to iN This moves a fold from being Mips-specific to general combining, and x86 shows improvements. Differential Revision: https://reviews.llvm.org/D37017 llvm-svn: 313577	2017-09-18 22:05:35 +00:00
Konstantin Zhuravlyov	ca8946a376	AMDGPU: Start selecting s_xnor_{b32, b64} Differential Revision: https://reviews.llvm.org/D37981 llvm-svn: 313565	2017-09-18 21:22:45 +00:00
Craig Topper	39cdb84560	[X86] Make sure we still emit zext for GR32 to GR64 when the source of the zext is AssertZext The AssertZext we might see in this case is only giving information about the lower 32 bits. It isn't providing information about the upper 32 bits. So we should emit a zext. This fixes PR28540. Differential Revision: https://reviews.llvm.org/D37729 llvm-svn: 313563	2017-09-18 20:49:13 +00:00
Craig Topper	e92327e236	[X86] Don't emit COPY_TO_REG to ABCD registers before EXTRACT_SUBREG of sub_8bit This is similar to D37843, but for sub_8bit. This fixes all of the patterns except for the 2 that emit only an EXTRACT_SUBREG. That causes a verifier error with global isel because global isel doesn't know to issue the ABCD when doing this extract on 32-bits targets. Differential Revision: https://reviews.llvm.org/D37890 llvm-svn: 313558	2017-09-18 19:21:21 +00:00
Craig Topper	b2155159a8	[X86] Don't emit COPY_TO_REG to ABCD registers before EXTRACT_SUBREG of sub_8bit_hi I'm pretty sure that InstrEmitter::EmitSubregNode will take care of this itself by calling ConstrainForSubReg which in turn calls TRI->getSubClassWithSubReg. I think Jakob Stoklund Olesen alluded to this in his commit message for r141207 which added the code to EmitSubregNode. Differential Revision: https://reviews.llvm.org/D37843 llvm-svn: 313557	2017-09-18 19:21:19 +00:00
Evandro Menezes	307e039d8c	[AArch64] Adjust the cost model for Exynos M1 and M2 Refine the model of FP loads and stores. llvm-svn: 313555	2017-09-18 19:00:38 +00:00
Evandro Menezes	91650ef061	[AArch64] Adjust the cost model for Exynos M1 and M2 Refine the model of loads and stores using the register offset addressing modes. llvm-svn: 313554	2017-09-18 19:00:36 +00:00
Evandro Menezes	9cd1bd7a83	[AArch64] Adjust the cost model for Exynos M1 and M2 Fix formatting in the predicate function AArch64InstrInfo::isExynosShiftLeftFast(). llvm-svn: 313553	2017-09-18 19:00:31 +00:00
Simon Pilgrim	4aa28b9730	[X86][AVX] Improve (i8 bitcast (v8i1 x)) handling for 256-bit vector compare results. As commented on D37849, AVX1 targets were missing a chance to use vmovmskps for v8f32/v8i32 results for bool vector bitcasts llvm-svn: 313547	2017-09-18 17:58:31 +00:00
Craig Topper	77d7f331dd	[X86] Fix two more places to prefer VPERMQ/PD over VPERM2X128 when AVX2 is enabled The shuffle combining and lowerVectorShuffleAsLanePermuteAndBlend were both still trying to use VPERM2X128 for unary shuffles when AVX2 is enabled. VPERM2X128 takes two inputs meaning when we use it for a unary shuffle one of those inputs is left undefined creating a false dependency on whatever register gets allocated there. If we have VPERMQ/PD we should prefer those since they only have a single input. Differential Revision: https://reviews.llvm.org/D37947 llvm-svn: 313542	2017-09-18 16:39:49 +00:00
Sam Parker	3fa0ccffc6	[AArch64] Add V8_2aOps feature to Cortex-A55 and 75 Add the missing hardware features to the ProcA55 and ProcA75 features. These are already enabled via the target parser, but I had missed them in the backend. Differential Revision: https://reviews.llvm.org/D37974 llvm-svn: 313535	2017-09-18 14:46:14 +00:00
Sam Parker	71efbe4c68	[ARM] Implement isTruncateFree Implement the isTruncateFree hooks, lifted from AArch64, that are used by TargetTransformInfo. This allows simplifycfg to reduce the test case into a single basic block. Differential Revision: https://reviews.llvm.org/D37516 llvm-svn: 313533	2017-09-18 14:28:51 +00:00
Simon Pilgrim	00161c9961	[X86][SSE] Improve support for vselect(Cond, 0, X) -> ANDN(Cond, X) As discussed on PR28925 and D37849. Differential Revision: https://reviews.llvm.org/D37975 llvm-svn: 313532	2017-09-18 14:23:23 +00:00
Sjoerd Meijer	4e6df15962	[ARM] Fix for indexed dot product instruction descriptions The indexed dot product instructions only accept the lower 16 D-registers as the indexed register, but we were e.g. incorrectly accepting: vudot.u8 d16,d16,d18[0] Differential Revision: https://reviews.llvm.org/D37968 llvm-svn: 313531	2017-09-18 14:17:57 +00:00
Simon Pilgrim	f133c50a42	[X86] combineVSelectWithAllOnesOrZeros - cleanup variable names. NFCI. We were reusing the 'false' select value 'is zero' variable name for the 'true' select value 'is zero' variable name. llvm-svn: 313528	2017-09-18 12:55:54 +00:00
Nikolai Bozhenov	84af99b3b1	[X86FixupBWInsts] More precise register liveness if no <imp-use> on MOVs. Summary: Subregister liveness tracking is not implemented for X86 backend, so sometimes the whole super register is said to be live, when only a subregister is really live. That might happen if the def and the use are located in different MBBs, see added fixup-bw-isnt.mir test. However, using knowledge of the specific instructions handled by the bw-fixup-pass we can get more precise liveness information which this change does. Reviewers: MatzeB, DavidKreitzer, ab, andrew.w.kaylor, craig.topper Reviewed By: craig.topper Subscribers: n.bozhenov, myatsina, llvm-commits, hiraditya Patch by Andrei Elovikov <andrei.elovikov@intel.com> Differential Revision: https://reviews.llvm.org/D37559 llvm-svn: 313524	2017-09-18 10:17:59 +00:00
Craig Topper	fc52eb37af	[X86] Strengthen some of the SD type constraints in X86InstrFragmentsSIMD.td This affects the vector shifts and rotates as well as some of the vector compares. The changes to the shifts by immediates allow a few hundred bytes to be removed by removing type checks for the size of the immediate containing the shift/rotate amount. llvm-svn: 313512	2017-09-18 05:50:54 +00:00
Craig Topper	a6054328e8	[X86] Teach the execution domain fixing tables to use movlhps in place of unpcklpd for the packed single domain. MOVLHPS has a smaller encoding than UNPCKLPD in the legacy encodings. With VEX and EVEX encodings it doesn't matter. llvm-svn: 313509	2017-09-18 04:40:58 +00:00
Craig Topper	87f7381edf	[X86] Teach execution domain fixing to convert between FP and int unpack instructions. llvm-svn: 313508	2017-09-18 03:29:54 +00:00
Craig Topper	d4341920d5	[X86] Teach execution domain fixing to convert between VPERMILPS and VPSHUFD. llvm-svn: 313507	2017-09-18 03:29:47 +00:00
Craig Topper	3b11fca73e	[X86] Remove the X86ISD::MOVLHPD. Lowering doesn't use it and it's not a real instruction. It was used in patterns, but we had the exact same patterns with Unpckl as well. So now just use Unpckl in the instruction patterns. llvm-svn: 313506	2017-09-18 00:20:53 +00:00
Craig Topper	ee6646d7de	[X86] Teach shuffle lowering to use MOVLHPS/MOVHLPS for lowering v4f32 unary shuffles with SSE1 only. llvm-svn: 313504	2017-09-17 22:36:41 +00:00
Craig Topper	0a197df6ce	[X86] Synchronize a pattern between SSE1 and AVX/AVX512. For some reason the SSE1 pattern expected a X86Movlhps pattern to have a v4f32 type, but AVX and AVX512 expected it to have a v4i32 type. I'm not even sure this pattern is even reachable post SSE1, but I'm starting with fixing this obvious bug. llvm-svn: 313495	2017-09-17 18:59:32 +00:00
Craig Topper	9689fc6dc8	[X86] Colocate all of the X86VBroadcast patterns for v2i64 and v2f64. NFC The memory patterns were near the MOVDDUP definition, but the non-memory patterns were near the broadcast instructions. llvm-svn: 313494	2017-09-17 18:59:30 +00:00
Craig Topper	9c0bf2c70a	[X86] Remove patterns for X86Movddup with v4i64 type. Lowering doesn't emit these. llvm-svn: 313493	2017-09-17 18:59:28 +00:00
Craig Topper	5831e2c872	[X86] Remove isel patterns for X86Movhlps and X86Movlhps with integer types. Lowering doesn't emit these. llvm-svn: 313492	2017-09-17 18:59:26 +00:00
Craig Topper	e305c5ab5e	[X86] Remove isel patterns for movlpd/movlps with integer types. Lowering doesn't emit these. llvm-svn: 313491	2017-09-17 18:59:24 +00:00
Alex Bradbury	8ab4a9696a	[RISCV] Add support for disassembly This Disassembly support allows for 'round-trip' testing, and rv32i-valid.s has been updated appropriately. Differential Revision: https://reviews.llvm.org/D23567 llvm-svn: 313486	2017-09-17 14:36:28 +00:00
Alex Bradbury	6758ecb98c	[RISCV] Add support for all RV32I instructions This patch supports all RV32I instructions as described in the RISC-V manual. A future patch will add support for pseudoinstructions and other instruction expansions (e.g. 0-arg fence -> fence iorw, iorw). Differential Revision: https://reviews.llvm.org/D23566 llvm-svn: 313485	2017-09-17 14:27:35 +00:00
Igor Breger	06335bbd2f	[GlobalISel][X86] refactoring X86InstructionSelector.cpp .NFC. llvm-svn: 313484	2017-09-17 14:02:19 +00:00
Igor Breger	f1d388a5c5	[GlobalISel][X86] Legalize i1 G_ADD/G_SUB/G_MUL/G_XOR/G_OR/G_AND instructions. llvm-svn: 313483	2017-09-17 11:34:17 +00:00
Igor Breger	21200ed7af	[GlobalISel][X86] G_FCONSTANT support. Summary: G_FCONSTANT support, port the implementation from X86FastIsel. Reviewers: zvi, delena, guyblank Reviewed By: delena Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D37734 llvm-svn: 313478	2017-09-17 08:08:13 +00:00
Craig Topper	bef5d24449	[X86] Remove integer X86ISD::SHUFP patterns. Lowering doesn't emit these. llvm-svn: 313477	2017-09-17 06:09:32 +00:00
Craig Topper	7c0de01082	[X86] Add patterns to make blends with immediate control commutable during isel for load folding. llvm-svn: 313476	2017-09-17 05:06:05 +00:00
Craig Topper	e09907fcd4	[X86] Remove some unused defaults from some multiclass parameters. llvm-svn: 313475	2017-09-17 05:06:03 +00:00
Craig Topper	ca05e9fd8d	[X86] Make PCLMULQDQ instructions commutable during isel to fold loads. This adds new patterns and SDNodeXForm to enable the immediate to be commuted. llvm-svn: 313472	2017-09-16 23:18:50 +00:00
Craig Topper	ffca0ff9bf	[X86] Add NoAVX predicates to the patterns for the legacy encoded PCLMUL and AES instructions. Previously we were just relying on pattern order to define precedence. Which works, but isn't the best way. llvm-svn: 313471	2017-09-16 23:18:48 +00:00
Craig Topper	b150deac6e	[X86] Remove some extra code that snuck into r313450. The same code appears earlier in the function. This represents an earlier version of what became r313373 that I still had sitting in my local repo. llvm-svn: 313465	2017-09-16 17:51:55 +00:00
Sanjay Patel	65d6780703	[x86] enable storeOfVectorConstantIsCheap() target hook This allows vector-sized store merging of constants in DAGCombiner using the existing code in MergeConsecutiveStores(). All of the twisted logic that decides exactly what vector operations are legal and fast for each particular CPU are handled separately in there using the appropriate hooks. For the motivating tests in merge-store-constants.ll, we already produce the same vector code in IR via the SLP vectorizer. So this is just providing a backend backstop for code that doesn't go through that pass (-O1). More details in PR24449: https://bugs.llvm.org/show_bug.cgi?id=24449 (this change should be the last step to resolve that bug) Differential Revision: https://reviews.llvm.org/D37451 llvm-svn: 313458	2017-09-16 13:29:12 +00:00
Craig Topper	23f78c1662	[X86] Add isel patterns to be able to fold loads into VPERM2F128 even when the load is on the first input to the SDNode. We just need to toggle bits 1 and 5 of the immediate and swap the sources. The peephole pass could trigger commuting/folding for this later, but it's easy enough to fix in isel. Disable the peephole pass on the main vperm2x128 test so we know we're doing this through isel. llvm-svn: 313455	2017-09-16 09:16:48 +00:00
Craig Topper	833788a05c	[X86] Remove VPERM2X128 isel patterns with 32-bit elements. Now that the intrinsics are gone we only need 64-bit elements since that's what shuffle lowering uses. llvm-svn: 313453	2017-09-16 08:15:52 +00:00
Craig Topper	f264fcc704	[X86] Remove VPERM2F128/VPERM2I128 intrinsics and autoupgrade to native shuffles. I've moved the test cases from the InstCombine optimizations to the backend to keep the coverage we had there. It covered every possible immediate so I've preserved the resulting shuffle mask for each of those immediates. llvm-svn: 313450	2017-09-16 07:36:14 +00:00
Sam Clegg	66a99e41cd	Change encodeU/SLEB128 to pad to certain number of bytes Previously the 'Padding' argument was the number of padding bytes to add. However most callers that use 'Padding' know how many overall bytes they need to write. With the previous code this would mean encoding the LEB once to find out how many bytes it would occupy and then using this to calculate the 'Padding' value. See: https://reviews.llvm.org/D36595 Differential Revision: https://reviews.llvm.org/D37494 llvm-svn: 313393	2017-09-15 20:34:47 +00:00
Mandeep Singh Grang	1be19e6f5b	[llvm] Fix some typos. NFC. Reviewers: mcrosier Reviewed By: mcrosier Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D37922 llvm-svn: 313388	2017-09-15 20:01:43 +00:00
Hans Wennborg	534bfbd3ba	Revert r313343 "[X86] PR32755 : Improvement in CodeGen instruction selection for LEAs." This caused PR34629: asserts firing when building Chromium. It also broke some buildbots building test-suite as reported on the commit thread. > Summary: > 1/ Operand folding during complex pattern matching for LEAs has been > extended, such that it promotes Scale to accommodate similar operand > appearing in the DAG. > e.g. > T1 = A + B > T2 = T1 + 10 > T3 = T2 + A > For above DAG rooted at T3, X86AddressMode will now look like > Base = B , Index = A , Scale = 2 , Disp = 10 > > 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs > so that if there is an opportunity then complex LEAs (having 3 operands) > could be factored out. > e.g. > leal 1(%rax,%rcx,1), %rdx > leal 1(%rax,%rcx,2), %rcx > will be factored as following > leal 1(%rax,%rcx,1), %rdx > leal (%rdx,%rcx) , %edx > > 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, > thus avoiding creation of any complex LEAs within a loop. > > Reviewers: lsaba, RKSimon, craig.topper, qcolombet > > Reviewed By: lsaba > > Subscribers: spatel, igorb, llvm-commits > > Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 313376	2017-09-15 18:40:26 +00:00
Craig Topper	7a183e2760	[X86] Prefer VPERMQ over VPERM2F128 for any unary shuffle, not just the ones that can be done with an insertf128 The early out for AVX2 in lowerV2X128VectorShuffle is positioned in a weird spot below some shuffle mask equivalency checks. But I think we want to allow VPERMQ for any unary shuffle. Differential Revision: https://reviews.llvm.org/D37893 llvm-svn: 313373	2017-09-15 18:11:13 +00:00
Craig Topper	f1620b2555	[X86] Use SDNode::ops() instead of makeArrayRef and op_begin(). NFCI llvm-svn: 313367	2017-09-15 17:09:05 +00:00
Craig Topper	e0d724cf51	[X86] Don't create i64 constants on 32-bit targets when lowering v64i1 constant build vectors When handling a v64i1 build vector of constants on 32-bit targets we were creating an illegal i64 constant that we then bitcasted back to v64i1. We need to instead create two 32-bit constants, bitcast them to v32i1 and concat the result. We should also take care to handle the halves being all zeros/ones after the split. This patch splits the build vector and then recursively lowers the two pieces. This allows us to handle the all ones and all zeros cases with minimal effort. Ideally we'd just do the split and concat, and let lowering get called again on the new nodes, but getNode has special handling for CONCAT_VECTORS that reassembles the pieces back into a single BUILD_VECTOR. Hopefully the two temporary BUILD_VECTORS we had to create to do this that don't get returned don't cause any issues. Fixes PR34605. Differential Revision: https://reviews.llvm.org/D37858 llvm-svn: 313366	2017-09-15 17:09:03 +00:00
Craig Topper	143797eb89	[X86] Add isel pattern infrastructure to begin recognizing when we're inserting 0s into the upper portions of a vector register and the producing instruction has already produced the zeros. Currently if we're inserting 0s into the upper elements of a vector register we insert an explicit move of the smaller register to implicitly zero the upper bits. But if we can prove that they are already zero we can skip that. This is based on a similar idea of what we do to avoid emitting explicit zero extends for GR32->GR64. Unfortunately, this is harder for vector registers because there are several opcodes that don't have VEX equivalent instructions, but can write to XMM registers. Among these are SHA instructions and a MMX->XMM move. Bitcasts can also get in the way. So for now I'm starting with explicitly allowing only VPMADDWD because we emit zeros in combineLoopMAddPattern. So that is placing an extra instruction into the reduction loop. I'd like to allow PSADBW as well after D37453, but that's currently blocked by a bitcast. We either need to peek through bitcasts or canonicalize insert_subvectors with zeros to remove bitcasts on the value being inserted. Longer term we should probably have a cleanup pass that removes superfluous zeroing moves even when the producer is in another basic block which is something these isel tricks can't do. See PR32544. Differential Revision: https://reviews.llvm.org/D37653 llvm-svn: 313365	2017-09-15 17:09:00 +00:00
Krzysztof Parzyszek	557729761c	[Hexagon] Switch to parameterized register classes for HVX This removes the duplicate HVX instruction set for the 128-byte mode. Single instruction set now works for both modes (64- and 128-byte). llvm-svn: 313362	2017-09-15 15:46:05 +00:00
Sjoerd Meijer	0c5ba21cbf	[AArch64] allow v8f16 types when FullFP16 is supported This adds support for allowing v8f16 vector types, thus avoiding conversions from/to single precision for these types. This is a follow up patch of commits r311154 and r312104, which added support for scalars and v4f16 types, respectively. Differential Revision: https://reviews.llvm.org/D37802 llvm-svn: 313351	2017-09-15 09:24:48 +00:00
Jatin Bhateja	908c8b37c2	[X86] PR32755 : Improvement in CodeGen instruction selection for LEAs. Summary: 1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to accommodate similar operand appearing in the DAG. e.g. T1 = A + B T2 = T1 + 10 T3 = T2 + A For above DAG rooted at T3, X86AddressMode will now look like Base = B , Index = A , Scale = 2 , Disp = 10 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity then complex LEAs (having 3 operands) could be factored out. e.g. leal 1(%rax,%rcx,1), %rdx leal 1(%rax,%rcx,2), %rcx will be factored as following leal 1(%rax,%rcx,1), %rdx leal (%rdx,%rcx) , %edx 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop. Reviewers: lsaba, RKSimon, craig.topper, qcolombet Reviewed By: lsaba Subscribers: spatel, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 313343	2017-09-15 05:29:51 +00:00
Craig Topper	c2311f476d	[X86] Remove an unnecessary SmallVector from LowerBUILD_VECTOR. I think this may have existed to convert from SDUse to SDValue, but it doesn't look like it's needed now. llvm-svn: 313311	2017-09-14 22:47:59 +00:00
Jan Sjodin	1f2f57a7ea	Fix warnings in r313297. llvm-svn: 313302	2017-09-14 21:49:52 +00:00
Matt Arsenault	c317287fde	AMDGPU: Fix violating constant bus restriction You can't use madak/madmk if it already uses an SGPR input. llvm-svn: 313298	2017-09-14 20:54:29 +00:00
Jan Sjodin	312ccf761c	Add AddressSpace to PseudoSourceValue. Differential Revision: https://reviews.llvm.org/D35089 llvm-svn: 313297	2017-09-14 20:53:51 +00:00
Matt Arsenault	37ab4cf8b8	AMDGPU: Fix assert on alloca of array of struct llvm-svn: 313282	2017-09-14 18:02:29 +00:00
Matt Arsenault	defe371771	AMDGPU: Stop modifying SP in call sequences Because the stack growth direction and addressing is done in the same direction, modifying SP at the beginning of the call sequence was incorrect. If we had a stack passed argument, we would end up skipping that number of bytes before pushing arguments, leaving unused/inconsistent space. The callee creates fixed stack objects in its frame, so the space necessary for these is already logically allocated in the callee, so we just let the callee increment SP if it really requires it. llvm-svn: 313279	2017-09-14 17:37:40 +00:00
Simon Dardis	55e446737f	[mips] Implement the 'dext' aliases and its disassembly alias. The other members of the dext family of instructions (dextm, dextu) are traditionally handled by the assembler selecting the right variant of 'dext' depending on the values of the position and size operands. When these instructions are disassembled, rather than reporting the actual instruction, an equivalent aliased form of 'dext' is generated and is reported. This is to mimic the behaviour of binutils. Reviewers: slthakur, nitesh.jain, atanasyan Differential Revision: https://reviews.llvm.org/D34887 llvm-svn: 313276	2017-09-14 17:27:53 +00:00
Matt Arsenault	6efd082c01	AMDGPU: Make frame register caller preserved Using SplitCSR for the frame register was very broken. Often the copies in the prolog and epilog were optimized out, in addition to them being inserted after the true prolog where the FP was clobbered. I have a hacky solution which works that continues to use split CSR, but for now this is simpler and will get to working programs. llvm-svn: 313274	2017-09-14 17:14:57 +00:00
Simon Dardis	6f83ae38a3	[mips] Implement the 'dins' aliases. Traditionally GAS has provided automatic selection between dins, dinsm and dinsu. Binutils also disassembles all instructions in that family as 'dins' rather than the actual instruction. Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D34877 llvm-svn: 313267	2017-09-14 15:17:50 +00:00
Aleksandar Beserminji	7d610f4d06	Test commit. llvm-svn: 313262	2017-09-14 14:34:04 +00:00
Krzysztof Parzyszek	473d02dbac	[Hexagon] Make getMemAccessSize return size in bytes It used to return the actual field value from the instruction descriptor. There is no reason for that, that value is not interesting in any way and the specifics of its encoding in the descriptor should not be exposed. llvm-svn: 313257	2017-09-14 12:06:40 +00:00
Ayman Musa	ab68449c53	[X86] When applying the shuffle-to-zero-extend transformation on floating point, bitcast to integer first. Fix issue described in PR34577. Differential Revision: https://reviews.llvm.org/D37803 llvm-svn: 313256	2017-09-14 12:06:38 +00:00
Simon Dardis	28365b33ad	[mips] Pick the right variant of DINS upfront and enable target instruction verification This patch complements D16810 "[mips] Make isel select the correct DEXT variant up front.". Now ISel picks the right variant of DINS, so now there is no need to replace DINS with the appropriate variant during MipsMCCodeEmitter::encodeInstruction(). This patch also enables target specific instruction verification for ins, dins, dinsm, dinsu, ext, dext, dextm, dextu. These instructions have constraints that are checked when generating MipsISD::Ins and MipsISD::Ext nodes, but these constraints are not checked during instruction selection. Adding machine verification should catch outstanding cases. Finally, correct a bug that instruction verification uncovered, where the position operand of a DINSU generated during lowering was being silently and accidentally corrected to the correct value. Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D34809 llvm-svn: 313254	2017-09-14 10:58:00 +00:00
Matt Arsenault	ecb43ef1bc	AMDGPU: Don't spill SP reg like a normal CSR llvm-svn: 313217	2017-09-13 23:47:01 +00:00
Stanislav Mekhanoshin	7fe9a5d9b4	Allow target to decide when to cluster loads/stores in misched MachineScheduler when clustering loads or stores checks if base pointers point to the same memory. This check is done through comparison of base registers of two memory instructions. This works fine when instructions have separate offset operand. If they require a full calculated pointer such instructions can never be clustered according to such logic. Changed shouldClusterMemOps to accept base registers as well and let it decide what to do about it. Differential Revision: https://reviews.llvm.org/D37698 llvm-svn: 313208	2017-09-13 22:20:47 +00:00
Matt Arsenault	fb017ae155	AMDGPU: Handle coldcc in more places Missed in r312936 llvm-svn: 313205	2017-09-13 21:55:52 +00:00
Michael Zuckerman	80d3649f23	Refactoring the stride 4 code in the X86interleavedaccess NFC llvm-svn: 313166	2017-09-13 18:28:09 +00:00
Petar Jovanovic	50e068158b	[mips] correct operand range for DINSM instruction This patch corrects the definition of the DINSM instruction. Specification for DINSM instruction for Mips64 says that size operand should be 2 <= size <= 64, but it is defined as uimm5_inssize_plus1 which gives range of 1 .. 32. Patch by Aleksandar Beserminji. Differential Revision: https://reviews.llvm.org/D37683 llvm-svn: 313149	2017-09-13 14:09:13 +00:00
Stefan Pintilie	dff606ec3e	[Power9] Add missing instructions: extswsli, popcntb Added the following P9 instructions: extswsli, extswsli., popcntb Differential Revision: https://reviews.llvm.org/D37342 llvm-svn: 313147	2017-09-13 14:05:27 +00:00
Igor Breger	5c721199dd	[GlobalISel][X86] support G_FPEXT operation. Summary: Support G_FPEXT operation. Selection done via TableGen'erated code. Reviewers: zvi, guyblank, aymanmus, m_zuckerman Reviewed By: zvi Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34816 llvm-svn: 313135	2017-09-13 09:05:23 +00:00
Uriel Korach	5d5da5f531	[X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (llvm) This patch, together with a matching clang patch (https://reviews.llvm.org/D37694), implements the lowering of X86 ABS intrinsics to IR. differential revision: https://reviews.llvm.org/D37693. llvm-svn: 313134	2017-09-13 09:02:36 +00:00
Mohammed Agabaria	e9aebf26af	[X86] Adding X86 Processor Families Adding x86 Processor families to initialize several uArch properties (based on the family) This patch shows how gather cost can be initialized based on the proc. family Differential Revision: https://reviews.llvm.org/D35348 llvm-svn: 313132	2017-09-13 09:00:27 +00:00
Craig Topper	2b6bfda561	[X86] Make sure we emit a SUBREG_TO_REG after the MOV32ri when creating a BEXTR64rr instruction from a shift/and pair. Fixes PR34589. llvm-svn: 313126	2017-09-13 07:53:21 +00:00
Elena Demikhovsky	6cab129464	[X86 CodeGen] Optimization of ZeroExtendLoad for v2i8 vector Load with zero-extend and sign-extend from v2i8 to v2i32 is "Legal" since SSE4.1 and may be performed using PMOVZXBD , PMOVSXBD instructions. llvm-svn: 313121	2017-09-13 06:40:26 +00:00
Craig Topper	0a3bcebcc2	[X86] Use isUInt<32> to simplify some code. NFC llvm-svn: 313112	2017-09-13 02:29:59 +00:00
Petr Hosek	c35fe2b70b	[Fuchsia] Magenta -> Zircon Fuchsia's lowest API layer has been renamed from Magenta to Zircon. In LLVM proper, this is only mentioned in comments. Patch by Roland McGrath Differential Revision: https://reviews.llvm.org/D37763 llvm-svn: 313105	2017-09-13 01:18:06 +00:00
Derek Schuff	a519fe5a37	[WebAssembly] Add sign extend instructions from atomics proposal Select them from ISD::SIGN_EXTEND_INREG Differential Revision: https://reviews.llvm.org/D37603 remove spurious change llvm-svn: 313101	2017-09-13 00:29:06 +00:00
Sanjay Patel	659279450e	[x86] eliminate unnecessary vector compare for AVX masked store The masked store instruction only cares about the sign-bit of each mask element, so the compare s<0 isn't needed. As noted in PR11210: https://bugs.llvm.org/show_bug.cgi?id=11210 ...fixing this should allow us to eliminate x86-specific masked store intrinsics in IR. (Although more testing will be needed to confirm that.) I filed a bug to track improvements for AVX512: https://bugs.llvm.org/show_bug.cgi?id=34584 Differential Revision: https://reviews.llvm.org/D37446 llvm-svn: 313089	2017-09-12 23:24:05 +00:00
Petar Jovanovic	e4dacb750d	[mips] handle UImm16_AltRelaxed match type Currently, UImm16_AltRelaxed match type is not handled in MatchAndEmitInstruction() function, which may result in llvm_unreachable() behavior. This patch adds necessary case for this match type. Patch by Aleksandar Beserminji. Differential Revision: https://reviews.llvm.org/D37682 llvm-svn: 313077	2017-09-12 21:43:33 +00:00
Ahmed Bougacha	106dd035a8	[AArch64][GlobalISel] Select all fpexts. Tablegen already can select these: mark them as legal, remove the c++ code, and add tests for all types. llvm-svn: 313074	2017-09-12 21:04:11 +00:00
Ahmed Bougacha	a7aa2a9fb1	[AArch64][GlobalISel] Select all fptruncs. We already support these in tablegen, but we're matching the wrong operator (libm ftrunc). Fix that. While there, drop the c++ code, support COPYs of FPR16, and add tests for the other types. llvm-svn: 313073	2017-09-12 21:04:10 +00:00
Lei Huang	34e6621724	Update branch coalescing to be a PowerPC specific pass Implementing this pass as a PowerPC specific pass. Branch coalescing utilizes the analyzeBranch method which currently does not include any implicit operands. This is not an issue on PPC but must be handled on other targets. Pass is currently off by default. Enabled via -enable-ppc-branch-coalesce. Differential Revision: https://reviews.llvm.org/D32776 llvm-svn: 313061	2017-09-12 18:39:11 +00:00
Yonghong Song	06ff655e59	bpf: Add BPF AsmParser support in LLVM Reviewed-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 313055	2017-09-12 17:55:23 +00:00
Craig Topper	958106d0f1	[X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI instruction to custom isel Recognizing this pattern during DAG combine hides information about the 'and' and the shift from other combines. I think it should be recognized at isel so it's as late as possible. But it can't be done with table based isel because you need to be able to look at both immediates. This patch moves it to custom isel in X86ISelDAGToDAG.cpp. This does break a couple tests in tbm_patterns because we are now emitting an and_flag node or (cmp and, 0) that we don't recognize yet. We already had this problem for several other TBM patterns so I think this is fine and we can address all of them together. I've also fixed a bug where the combine to BEXTR was preventing us from using a trick of zero extending AH to handle extracts of bits 15:8. We might still want to use BEXTR if it enables load folding. But honestly I hope we narrowed the load instead before we got to isel. I think we should probably also support matching BEXTR from (srl/srl (and mask << C), C). But that should be a different patch. Differential Revision: https://reviews.llvm.org/D37592 llvm-svn: 313054	2017-09-12 17:40:25 +00:00
Hans Wennborg	8c1eb106bd	Revert r313009 "[ARM] Use ADDCARRY / SUBCARRY" This was causing PR34045 to fire again. > This is a preparatory step for D34515 and also is being recommitted as its > first version caused PR34045. > > This change: > - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 > - lowering is done by first converting the boolean value into the carry flag > using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value > using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two > operations does the actual addition. > - for subtraction, given that ISD::SUBCARRY second result is actually a > borrow, we need to invert the value of the second operand and result before > and after using ARMISD::SUBE. We need to invert the carry result of > ARMISD::SUBE to preserve the semantics. > - given that the generic combiner may lower ISD::ADDCARRY and > ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering > as well otherwise i64 operations now would require branches. This implies > updating the corresponding test for unsigned. > - add new combiner to remove the redundant conversions from/to carry flags > to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C > - fixes PR34045 > > Differential Revision: https://reviews.llvm.org/D35192 Also revert follow-up r313010: > [ARM] Fix typo when creating ISD::SUB nodes > > In D35192, I accidentally introduced a typo when creating ISD::SUB nodes, > giving them two values instead of one. > > This fails when the merge_values combiner finds one of these nodes. > > This change fixes PR34564. > > Differential Revision: https://reviews.llvm.org/D37690 llvm-svn: 313044	2017-09-12 16:24:17 +00:00
Jonas Paulsson	fc4f323ac1	[SystemZ] Add the CoveredBySubRegs bit to GPR64, GPR128 and FPR128 registers. This bit is needed in order for the CalleeSavedRegs list to automatically include the super registers if all of their subregs are present. Thanks to Wei Mi for initially indicating this deficiency in the SystemZ backend. Review: Ulrich Weigand. https://bugs.llvm.org/show_bug.cgi?id=34550 llvm-svn: 313023	2017-09-12 12:11:29 +00:00
Sjoerd Meijer	bafde8f3e3	[AArch64] ISel: Add some debug messages to LowerBUILDVECTOR. NFC. Differential Revision: https://reviews.llvm.org/D37676 llvm-svn: 313017	2017-09-12 10:24:12 +00:00
Yael Tsafrir	47668b5e03	[X86] Lower _mm[256\|512]_[mask[z]]_avg_epu[8\|16] intrinsics to native llvm IR Differential Revision: https://reviews.llvm.org/D37560 llvm-svn: 313013	2017-09-12 07:50:35 +00:00
Roger Ferrer Ibanez	9df2527b0b	[ARM] Fix typo when creating ISD::SUB nodes In D35192, I accidentally introduced a typo when creating ISD::SUB nodes, giving them two values instead of one. This fails when the merge_values combiner finds one of these nodes. This change fixes PR34564. Differential Revision: https://reviews.llvm.org/D37690 llvm-svn: 313010	2017-09-12 07:42:28 +00:00
Roger Ferrer Ibanez	4f92b4162f	[ARM] Use ADDCARRY / SUBCARRY This is a preparatory step for D34515 and also is being recommitted as its first version caused PR34045. This change: - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 - lowering is done by first converting the boolean value into the carry flag using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two operations does the actual addition. - for subtraction, given that ISD::SUBCARRY second result is actually a borrow, we need to invert the value of the second operand and result before and after using ARMISD::SUBE. We need to invert the carry result of ARMISD::SUBE to preserve the semantics. - given that the generic combiner may lower ISD::ADDCARRY and ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering as well otherwise i64 operations now would require branches. This implies updating the corresponding test for unsigned. - add new combiner to remove the redundant conversions from/to carry flags to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C - fixes PR34045 Differential Revision: https://reviews.llvm.org/D35192 llvm-svn: 313009	2017-09-12 07:40:09 +00:00
Craig Topper	fd6be2868e	[X86] Fix typo in comment. NFC llvm-svn: 312990	2017-09-12 01:30:09 +00:00
Hans Wennborg	075e5a2e2b	Revert r312898 "[ARM] Use ADDCARRY / SUBCARRY" It caused PR34564. > This is a preparatory step for D34515 and also is being recommitted as its > first version caused PR34045. > > This change: > - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 > - lowering is done by first converting the boolean value into the carry flag > using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value > using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two > operations does the actual addition. > - for subtraction, given that ISD::SUBCARRY second result is actually a > borrow, we need to invert the value of the second operand and result before > and after using ARMISD::SUBE. We need to invert the carry result of > ARMISD::SUBE to preserve the semantics. > - given that the generic combiner may lower ISD::ADDCARRY and > ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering > as well otherwise i64 operations now would require branches. This implies > updating the corresponding test for unsigned. > - add new combiner to remove the redundant conversions from/to carry flags > to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C > - fixes PR34045 > > Differential Revision: https://reviews.llvm.org/D35192 llvm-svn: 312980	2017-09-11 23:52:02 +00:00
Yonghong Song	be9c00347f	bpf: add " ll" in the LD_IMM64 asmstring This partially reverts the previous fix in commit f5858045aa0b ("bpf: proper print imm64 expression in inst printer"). In that commit, the original suffix "ll" is removed from LD_IMM64 asmstring. In the custom print method, the "ll" suffix is printed if the rhs is an immediate. For example, "r2 = 5ll" => "r2 = 5ll", and "r3 = varll" => "r3 = var". This has an issue though for the assembler. Since the assembler relies on asmstring to do pattern matching, it will not be able to distinguish between "mov r2, 5" and "ld_imm64 r2, 5" since both asmstrings are "r2 = 5". In such cases, the assembler uses 64bit load for all "r = <val>" asm insts. This patch adds back " ll" suffix for ld_imm64 with one additional space for "#reg = #global_var" case. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 312978	2017-09-11 23:43:35 +00:00
Matt Arsenault	537bd3b906	AMDGPU: Allow coldcc calls llvm-svn: 312936	2017-09-11 18:54:20 +00:00
Petar Jovanovic	d4f3723c56	[mips][microMIPS] add lapc instruction Implement LAPC instruction for mips32r6, mips64r6 and micromips32r6. Patch by Milos Stojanovic. Differential Revision: https://reviews.llvm.org/D35984 llvm-svn: 312934	2017-09-11 18:34:04 +00:00
Stanislav Mekhanoshin	710da42b86	[AMDGPU] Produce madak and madmk from the two-address pass These two instructions are normally selected, but when the two address pass converts mac into mad we end up with the mad where we could have one of these. Differential Revision: https://reviews.llvm.org/D37389 llvm-svn: 312928	2017-09-11 17:13:57 +00:00
Craig Topper	7b02020c7f	[X86] Remove portions of r275950 that are no longer needed with i1 not being a legal type Summary: r275950 added support for turning (trunc (X >> N) to i1) into BT(X, N). But that's no longer necessary now that i1 isn't legal. This patch removes the support for that, but preserves some of the refactorings done in that commit. Reviewers: guyblank, RKSimon, spatel, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37673 llvm-svn: 312925	2017-09-11 16:16:48 +00:00
Simon Pilgrim	b092bd321a	[X86][SSE] Add support for X86ISD::PACKSS to ComputeNumSignBitsForTargetNode Helps improve combineLogicBlendIntoPBLENDV support by allowing us to peek into through PACKSS truncations of vector comparison results. Differential Revision: https://reviews.llvm.org/D37680 llvm-svn: 312916	2017-09-11 14:03:47 +00:00
Tim Renouf	660ba2b8af	[AMDGPU] exp should not be in WQM mode A mrt exp with vm=1 must be in exact (non-WQM) mode, as it also exports the exec mask as the valid mask to determine which pixels to render. This commit marks any exp as needing to be in exact mode. Actually, if there are multiple mrt exps, only one needs to have vm=1, and only that one needs to be in exact mode. But that is an optimization for another day. Differential Revision: https://reviews.llvm.org/D36305 llvm-svn: 312915	2017-09-11 13:55:39 +00:00
Andre Vieira	c429aabb91	[ARM] Enable the use of SVC anywhere in an IT block Differential Revision: https://reviews.llvm.org/D37374 llvm-svn: 312908	2017-09-11 11:11:17 +00:00
Dylan McKay	0fc5fe0a58	[AVR] Enable the '__do_copy_data' function Also enables '__do_clear_bss'. These functions are automatically called by the CRT if they are declared. We need these to be called otherwise RAM will start completely uninitialised, even though we need to copy RAM variables from progmem to RAM. llvm-svn: 312905	2017-09-11 10:32:51 +00:00
Igor Breger	1f14364d64	[GlobalISel][X86] G_ANYEXT support. Summary: G_ANYEXT support Reviewers: zvi, delena Reviewed By: delena Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D37675 llvm-svn: 312903	2017-09-11 09:41:13 +00:00
Tim Renouf	6cb007fc72	AMDGPU: trivial comment change ... to check commit access for new committer. llvm-svn: 312900	2017-09-11 08:31:32 +00:00
Roger Ferrer Ibanez	12b20f2307	[ARM] Use ADDCARRY / SUBCARRY This is a preparatory step for D34515 and also is being recommitted as its first version caused PR34045. This change: - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 - lowering is done by first converting the boolean value into the carry flag using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two operations does the actual addition. - for subtraction, given that ISD::SUBCARRY second result is actually a borrow, we need to invert the value of the second operand and result before and after using ARMISD::SUBE. We need to invert the carry result of ARMISD::SUBE to preserve the semantics. - given that the generic combiner may lower ISD::ADDCARRY and ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering as well otherwise i64 operations now would require branches. This implies updating the corresponding test for unsigned. - add new combiner to remove the redundant conversions from/to carry flags to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C - fixes PR34045 Differential Revision: https://reviews.llvm.org/D35192 llvm-svn: 312898	2017-09-11 07:38:05 +00:00
Simon Pilgrim	5e2ed8beb1	[X86][SSE] Tidyup + clang-format combineX86ShuffleChain call. NFCI. llvm-svn: 312887	2017-09-10 18:18:45 +00:00
Simon Pilgrim	ff347d3ea4	[X86][SSE] Move combineTo call out of combineX86ShufflesConstants. NFCI. Move towards making it possible to use the shuffle combines for cases where we don't want to call DCI.CombineTo() with the result. llvm-svn: 312886	2017-09-10 18:10:49 +00:00
Simon Pilgrim	9a95e1afd0	[X86][SSE] Move combineTo call out of combineX86ShuffleChain. NFCI. First step towards making it possible to use the shuffle combines for cases where we don't want to call DCI.CombineTo() with the result. llvm-svn: 312884	2017-09-10 14:06:41 +00:00
Coby Tayree	ef66b3bbab	[X86][X86AsmParser] adding const on InlineAsmIdentifierInfo in CreateMemForInlineAsm. NFC. llvm-svn: 312881	2017-09-10 12:21:24 +00:00
Uriel Korach	01dfd3d1e3	Revert "adding autoUpgrade support to broadcast[f\|i]32x2 intrinsics" This reverts commit r312879 - An accidental partial commit. llvm-svn: 312880	2017-09-10 09:07:21 +00:00
Uriel Korach	3eb10a79e5	adding autoUpgrade support to broadcast[f\|i]32x2 intrinsics llvm-svn: 312879	2017-09-10 08:40:13 +00:00
Craig Topper	3be1db82b6	[X86] Don't disable slow INC/DEC if optimizing for size Summary: Just because INC/DEC is a little slow on some processors doesn't mean we shouldn't prefer it when optimizing for size. This appears to match gcc behavior. Reviewers: chandlerc, zvi, RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37177 llvm-svn: 312866	2017-09-09 17:11:59 +00:00
Sanjay Patel	6fd4391ddd	[DivRempairs] add a pass to optimize div/rem pairs (PR31028) This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented as an independent pass, so there's no stretching of scope and feature creep for an existing pass. I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost this same functionality as an addition to CGP in the motivating example of PR31028: https://bugs.llvm.org/show_bug.cgi?id=31028 The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and undo the hoisting that is done here. Decomposing remainder may allow removing some code from the backend (PPC and possibly others). Differential Revision: https://reviews.llvm.org/D37121 llvm-svn: 312862	2017-09-09 13:38:18 +00:00
Craig Topper	6bed9de3d5	[X86] Call removeDeadNode when we're done doing custom isel for mul, div and test Summary: Once we've done our custom isel for these nodes, I think we should be calling removeDeadNode to prune them out of the DAG. Table driven isel ultimately either calls morphNodeTo which modifies a node and doesn't leave dead nodes. Or it emits new nodes and then calls removeDeadNode as part of Opc_CompleteMatch. If you run a simple multiply test case like this through llc with -debug you'll see a umul_lohi node get printed as part of the dump for Instruction Selection ends. ``` define i64 @foo(i64 %a, i64 %b) local_unnamed_addr #0 { entry: %conv = zext i64 %a to i128 %conv1 = zext i64 %b to i128 %mul = mul nuw nsw i128 %conv1, %conv %shr = lshr i128 %mul, 64 %conv2 = trunc i128 %shr to i64 ret i64 %conv2 } ``` Reviewers: RKSimon, spatel, zvi, guyblank, niravd Reviewed By: niravd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37547 llvm-svn: 312857	2017-09-09 05:57:20 +00:00
Craig Topper	63c5047a4e	[X86] Use ReplaceNode instead of ReplaceUses when converting X86ISD::SHRUNKBLEND to ISD::VSELECT during isel. This ensures that the SHRUNKBLEND node gets erased immediately. llvm-svn: 312856	2017-09-09 05:57:19 +00:00
Kyle Butt	8c0314c3ed	PPC: Don't select lxv/stxv for insufficiently aligned stack slots. The lxv/stxv instructions require an offset that is 0 % 16. Previously we were selecting lxv/stxv for loads and stores to the stack where the offset from the slot was a multiple of 16, but the stack slot was not 16 or more byte aligned. When the frame gets lowered these transform to r(1\|31) + slot + offset. If slot is not aligned, slot + offset may not be 0 % 16. Now we require 16 byte or more alignment to select lxv/stxv to stack slots. Includes a testcase that shows both sufficiently and insufficiently aligned stack slots. llvm-svn: 312843	2017-09-09 00:37:56 +00:00
Davide Italiano	0731a4f52a	[AMDGPU] Remove unused function. NFCI. llvm-svn: 312836	2017-09-08 23:54:11 +00:00
Yonghong Song	093420f929	bpf: proper print imm64 expression in inst printer Fixed an issue in printImm64Operand where if the value is an expression, print out the expression properly. Currently, it will print r1 = <MCOperand Expr:(tx_port)>ll With the patch, the printout will be r1 = tx_port Suggested-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 312833	2017-09-08 23:32:38 +00:00
Matt Arsenault	461ed08fbd	AMDGPU: Start using !con operator We have a lot of operand definition work essentially producing every valid permutation of operands to work around building operand lists based on the instruction features. Apparently tablegen already has a mostly undocumented operator to concat dags which simplifies this. Convert one simple place to use this. The BUF instruction definitions have much more complicated logic that can be totally rewritten now. llvm-svn: 312822	2017-09-08 19:09:13 +00:00
Matt Arsenault	2f4df7ec41	AMDGPU: Recompute scc liveness The various scalar bit operations set SCC, so when one is erased or moved it needs to be recomputed. Not sure why the existing tests don't fail on this. llvm-svn: 312819	2017-09-08 18:51:26 +00:00
Chandler Carruth	38e2b506db	[x86] Fix GCC pedantic warnings about default arguments for lambdas. llvm-svn: 312809	2017-09-08 18:23:42 +00:00
Alexey Bataev	6dd29fccb8	[SLP] Support for horizontal min/max reduction. SLP vectorizer supports horizontal reductions for Add/FAdd binary operations. Patch adds support for horizontal min/max reductions. Function getReductionCost() is split to getArithmeticReductionCost() for binary operation reductions and getMinMaxReductionCost() for min/max reductions. Patch fixes PR26956. Differential revision: https://reviews.llvm.org/D27846 llvm-svn: 312791	2017-09-08 13:49:36 +00:00
Dean Michael Berris	711dec260f	[XRay][CodeGen][PowerPC] Fix tail exit codegen for XRay in PPC Summary: This fixes code-gen for XRay in PPC. The regression wasn't caught by codegen tests which we add in this change. What happened was the following: - For tail exits, we used to unconditionally prepend the returns/exits with a pseudo-instruction that gets lowered to the instrumentation sled (and leave the actual return/exit instruction as-is). - Changes to the XRay instrumentation pass caused the tail exits to suddenly also emit the tail exit pseudo-instruction, since the check for whether a return instruction was also a call instruction meant it was a tail exit instruction. - None of the tests caught the regression either due to non-existent tests, or the tests being disabled/removed for continuous breakage. This change re-introduces some of the basic tests and verifies that we're back to a state that allows the back-end to generate appropriate XRay instrumented binaries for PPC in the presence of tail exits. Reviewers: echristo, timshen Subscribers: nemanjai, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D37570 llvm-svn: 312772	2017-09-08 01:47:56 +00:00
Chandler Carruth	acbcf06f03	[x86] Flesh out the custom ISel for RMW arithmetic ops with used flags to cover the bitwise operators. Nothing really exciting here, this just stamps out the rest of the core operations that can RMW memory and set flags. Still not implemented here: ADC, SBB. Those will require more interesting logic to channel the flags in, and I'm not currently planning to try to tackle that. It might be interesting for someone who wants to improve our code generation for bignum implementations. Differential Revision: https://reviews.llvm.org/D37141 llvm-svn: 312768	2017-09-08 00:17:12 +00:00
Chandler Carruth	52a31bf268	[x86] Extend the manual ISel of `add` and `sub` with both RMW memory operands and used flags to support matching immediate operands. This is a bit trickier than register operands, and we still want to fall back on register operands even for things that appear to be "immediates" when they won't actually select into the operation's immediate operand. This also requires us to handle things like selecting `sub` vs. `add` to minimize the number of bits needed to represent the immediate, and picking the shortest immediate encoding. In order to do that, we in turn need to scan to make sure that CF isn't used as it will get inverted. The end result seems very nice though, and we're now generating optimal instruction sequences for these patterns IMO. A follow-up patch will further expand this to other operations with RMW memory operands. But handling `add` and `sub` is a useful starting point to flesh out the machinery and make sure interesting and complex cases can be handled. Thanks to Craig Topper who provided a few fixes and improvements to this patch in addition to the review! Differential Revision: https://reviews.llvm.org/D37139 llvm-svn: 312764	2017-09-07 23:54:24 +00:00
Reid Kleckner	0e8c4bb055	Sink some IntrinsicInst.h and Intrinsics.h out of llvm/include Many of these uses can get by with forward declarations. Hopefully this speeds up compilation after adding a single intrinsic. llvm-svn: 312759	2017-09-07 23:27:44 +00:00
Artem Belevich	8af4e23d1e	[CUDA] Added rudimentary support for CUDA-9 and sm_70. For now CUDA-9 is not included in the list of CUDA versions clang searches for, so the path to CUDA-9 must be explicitly passed via --cuda-path=. On LLVM side NVPTX added sm_70 GPU type which bumps required PTX version to 6.0, but otherwise is equivalent to sm_62 at the moment. Differential Revision: https://reviews.llvm.org/D37576 llvm-svn: 312734	2017-09-07 18:14:32 +00:00
Matt Arsenault	d7e2303df2	AMDGPU: Start selecting v_mad_mix_f32 llvm-svn: 312732	2017-09-07 18:05:07 +00:00
Konstantin Zhuravlyov	5f5b586c99	AMDGPU: Handle non-temporal loads and stores Differential Revision: https://reviews.llvm.org/D36862 llvm-svn: 312729	2017-09-07 17:14:54 +00:00
Konstantin Zhuravlyov	c8c9d4a0a6	AMDGPU: Handle more than one memory operand in SIMemoryLegalizer Differential Revision: https://reviews.llvm.org/D37397 llvm-svn: 312725	2017-09-07 16:14:21 +00:00
Benjamin Kramer	6ef976d5e1	[ARM] Remove redundant vcvt patterns. These don't add any value as they're just compositions of existing patterns. However, they can confuse the cost logic in ISel, leading to duplicated vcvt instructions like in PR33199. llvm-svn: 312724	2017-09-07 14:52:26 +00:00
Michael Zuckerman	5a385940d3	[X86][LLVM]Expanding Supports lowerInterleavedLoad() in X86InterleavedAccess (VF{8\|16\|32} stride 3). This patch expands the support of lowerInterleavedLoad to {8\|16\|32}x8i stride 3. LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (Stride=3 VF={8\|16\|32}) and we plan to include the store (deinterleaved side). The patch goal is to optimize the following sequence: a0 b0 c0 a1 b1 c1 a2 b2 c2 a3 b3 c3 a4 b4 c4 a5 b5 c5 a6 b6 c6 a7 b7 c7 into a0 a1 a2 a3 a4 a5 a6 a7 b0 b1 b2 b3 b4 b5 b6 b7 c0 c1 c2 c3 c4 c5 c6 c7 Reviewers 1. zvi 2. igor 3. guyblank 4. dorit 5. Ayal llvm-svn: 312722	2017-09-07 14:02:13 +00:00
Simon Atanasyan	6d7958684b	[mips] Use RegisterMCAsmBackend to register all MIPS asm backends. NFC This change converts the `MipsAsmBackend` constructor to the "standard" form. It makes it possible to use `RegisterMCAsmBackend` for the backend registrations. Now we pass `Triple` instance to the `MipsAsmBackend` ctor and deduce all required options like endianness and bitness from the triple. We still need to implement explicit ABI checking for providing correct options to backends. Differential revision: https://reviews.llvm.org/D37519 llvm-svn: 312720	2017-09-07 12:54:26 +00:00
Alex Bradbury	c09d5611c4	[Sparc][NFC] Clean up SelectCC lowering The ARM, BPF, MSP430, Sparc and Mips backends all use a similar code sequence for lowering SelectCC. As pointed out by @reames in D29937, this code isn't particularly clear and in most of these backends doesn't actually match the comments. This patch makes the code sequence clearer for the Sparc backend through better variable naming and more accurate comments (e.g. we are inserting triangle control flow, _not_ diamond). There is no functional change. Differential Revision: https://reviews.llvm.org/D37194 llvm-svn: 312713	2017-09-07 11:30:55 +00:00
Zvi Rackover	25799d93f0	X86: Improve AVX512 fptoui lowering Summary: Add patterns for fptoui <16 x float> to <16 x i8> fptoui <16 x float> to <16 x i16> Reviewers: igorb, delena, craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37505 llvm-svn: 312704	2017-09-07 07:40:34 +00:00
Craig Topper	7bc65e220c	[X86] Force shuffle lowering to only create X86ISD::VPERM2X128 with 64-bit element types so we can remove some patterns from isel. Intrinsic handling is still creating these nodes with 32-bit elements as well. But at least this gets rid of 8 and 16. Ideally, someday we'll convert the intrinsics to generic vector shuffles and remove the intrinsics. llvm-svn: 312702	2017-09-07 06:11:10 +00:00
Matt Arsenault	65ca292a8d	AMDGPU: Don't legalize i16 extloads to i32 with legal i16 Keeping non-i16 extloads makes it easier to match some new gfx9 load instructions. llvm-svn: 312699	2017-09-07 05:37:34 +00:00
Craig Topper	9228aee711	[X86] Remove patterns for selecting a v8f32 X86ISD::MOVSS or v4f64 X86ISD::MOVSD. I don't think we ever generate these. If we did, I would expect we would also be able to generate v16f32 and v8f64, but we don't have those patterns. llvm-svn: 312694	2017-09-07 05:08:16 +00:00
Saleem Abdulrasool	5fba8ba9cc	ARM: track globals promoted to coalesced const pool entries Globals that are promoted to an ARM constant pool may alias with another existing constant pool entry. We need to keep a reference to all globals that were promoted to each constant pool value so that we can emit a distinct label for each promoted global. These labels are necessary so that debug info can refer to the promoted global without an undefined reference during linking. Patch by Stephen Crane! llvm-svn: 312692	2017-09-07 04:00:13 +00:00
Stanislav Mekhanoshin	442e28dd42	[AMDGPU] Use v_pk_max_f16 for fcanonicalize Differential Revision: https://reviews.llvm.org/D37325 llvm-svn: 312676	2017-09-06 22:27:29 +00:00
Matthias Braun	c9056b834d	Insert IMPLICIT_DEFS for undef uses in tail merging Tail merging can convert an undef use into a normal one when creating a common tail. Doing so can make the register live out from a block which previously contained the undef use. To keep the liveness up-to-date, insert IMPLICIT_DEFs in such blocks when necessary. To enable this patch the computeLiveIns() function which used to compute live-ins for a block and set them immediately is split into new functions: - computeLiveIns() just computes the live-ins in a LivePhysRegs set. - addLiveIns() applies the live-ins to a block live-in list. - computeAndAddLiveIns() is a convenience function combining the other two functions and behaving like computeLiveIns() before this patch. Based on a patch by Krzysztof Parzyszek <kparzysz@codeaurora.org> Differential Revision: https://reviews.llvm.org/D37034 llvm-svn: 312668	2017-09-06 20:45:24 +00:00
Craig Topper	7391786175	[X86] Move more isel patterns to X86InstrVecCompiler.td. NFC This moves more of our subvector insert/extract tricks to X86InstrVecCompiler.td and refactors them into multiclasses. llvm-svn: 312661	2017-09-06 19:03:55 +00:00
Stanislav Mekhanoshin	ea134bcb13	[AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize Differential Revision: https://reviews.llvm.org/D37522 llvm-svn: 312660	2017-09-06 18:29:51 +00:00
Craig Topper	d548bb9d37	[X86] Actually add the new file that was supposed to go with r312649. llvm-svn: 312650	2017-09-06 17:06:40 +00:00
Craig Topper	cf1d8a55f2	[X86] Introduce a new td file to hold some of the non-instruction patterns from SSE and AVX512 This patch moves some of the similar non-instruction patterns from X86InstrSSE.td and X86InstrAVX512.td to a common file. This is intended as a starting point. There are many other optimization patterns that exist in both files that we could move here. Differential Revision: https://reviews.llvm.org/D37455 llvm-svn: 312649	2017-09-06 16:56:52 +00:00
Krzysztof Parzyszek	daf1a5f94e	[Hexagon] Add option to generate calls to "abort" for "unreachable" llvm-svn: 312644	2017-09-06 16:22:55 +00:00
Stanislav Mekhanoshin	949fac9e40	[AMDGPU] Fix shouldClusterMemOps to process flat loads Flat loads do not have vdata operand but have vdst instead. Differential Revision: https://reviews.llvm.org/D37502 llvm-svn: 312640	2017-09-06 15:31:30 +00:00
Nicolai Haehnle	523827145b	AMDGPU: Make worst-case assumption about the wait states in inline assembly Summary: Mesa still uses a hack where empty inline assembly is used as a kind of optimization barrier. This exposed a problem where not enough wait states were inserted, because the hazard recognizer implicitly assumed that each inline assembly "instruction" has at least one wait state. Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37205 llvm-svn: 312635	2017-09-06 13:50:13 +00:00
Simon Pilgrim	05710a8b4f	[X86][X87] Ensure x87 instructions are tagged as altering the FPSW reg As noted in PR34080, a lot of x87 instructions alter the FPSW status register (or leave it in an undefined state) but aren't tagged as such in the tablegen. This patch tags the control word, stack, wait and math instructions as altering FPSW, which matches what the AMD APMs suggest happens. Differential Revision: https://reviews.llvm.org/D36414 llvm-svn: 312629	2017-09-06 10:23:12 +00:00
Alex Bradbury	4f7f0da574	[RISCV][NFC] Fix sorting of includes in lib/Target/RISCV llvm-svn: 312624	2017-09-06 09:21:21 +00:00
Chandler Carruth	585bfc8443	[x86] Fix PR34377 by disabling cmov conversion when we relied on it performing a zext of a register. On the PR there is discussion of how to more effectively handle this, but this patch prevents us from miscompiling code. Differential Revision: https://reviews.llvm.org/D37504 llvm-svn: 312620	2017-09-06 06:28:08 +00:00
Craig Topper	eec768b5c4	[X86] Add more FMA3 patterns to cover a load in all 3 possible positions. This matches what we already do for AVX512. The peephole pass makes up for this in most if not all cases. But this makes isel behavior for these consistent with every other instruction. llvm-svn: 312613	2017-09-06 03:35:58 +00:00
Hal Finkel	112a6bac72	[PowerPC] Don't use xscvdpspn on the P7 xscvdpspn was not introduced until the P8, so don't use it on the P7. Fixes a regression introduced in r288152. llvm-svn: 312612	2017-09-06 03:08:26 +00:00
Jatin Bhateja	2c139f77c7	[X86] Allow cross-lane permutations for sub targets supporting AVX2. Summary: Most instructions in AVX work “in-lane”, that is, each source element is applied only to other elements of the same lane, thus a cross lane permutation is costly and needs more than one instruction. AVX2 includes instructions to perform any-to-any permutation of words over a 256-bit register and vectorized table lookup. This should also fix PR34369 Differential Revision: https://reviews.llvm.org/D37388 llvm-svn: 312608	2017-09-06 02:58:47 +00:00
Yaxun Liu	fc5121a722	[AMDGPU] Transform __read_pipe_* and __write_pipe_* When packet size equals packet align and is power of 2, transform __read_pipe* and __write_pipe* to specialized library function. Differential Revision: https://reviews.llvm.org/D36831 llvm-svn: 312598	2017-09-06 00:30:27 +00:00
Eli Friedman	c22c699882	[ARM] Make ARMExpandPseudo add implicit uses for predicated instructions Missing these could potentially screw up post-ra scheduling. Issue found by inspection, so I don't have a real testcase. Included test just verifies the expected operands after expansion. Differential Revision: https://reviews.llvm.org/D35156 llvm-svn: 312589	2017-09-05 22:54:06 +00:00
Eli Friedman	06d0ee734a	[ARM] Register ARMExpandPseudo pass. This allows -run-pass etc. to refer to it. (Split off from D35156.) llvm-svn: 312587	2017-09-05 22:45:23 +00:00
Craig Topper	784fa8a4e3	[X86] Remove unnecessary (v4f32 (X86vzmovl (v4f32 (scalar_to_vector FR32X)))) patterns We had already disabled the pattern for SSE4.1 and SSE4.2. But it got re-enabled for AVX and AVX512. With SSE41 we rely on a separate (v4f32 (X86vzmovl VR128)) pattern to select blendps with a xorps to create zeroes. And a separate (v4f32 (scalar_to_vector FR32X)) to select a COPY_TO_REG_CLASS to move FR32 to VR128. The same thing can happen for AVX with vblendps and those separate patterns already exist. For AVX512, (v4f32 (X86vzmov VR128)) will select a VMOVSS instruction instead of VBLENDPS due to there not being an EVEX VBLENDPS. This is what we were getting out of the larger pattern anyway. So the larger pattern is unneeded for AVX512 too. For SSE1-SSSE3 we can rely on (v4f32 (X86vzmov VR128)) selecting a MOVSS similar to AVX512. Again this is what the larger pattern did too. So the only real change here is that AVX1/2 now properly outputs a VBLENDPS during isel instead of a VMOVSS to match SSE41. Most tests didn't notice because the two address instruction pass knows how to turn VMOVSS into VBLENDPS to get an independent destination register. llvm-svn: 312564	2017-09-05 19:09:02 +00:00
Konstantin Zhuravlyov	80528702c9	AMDGPU: Cleanup/refactor SIMemoryLegalizer [3]: - Refactor SIMemOpInfo's constructors - Allow construction of NotAtomic SIMemOpInfo Differential Revision: https://reviews.llvm.org/D37396 llvm-svn: 312563	2017-09-05 19:01:10 +00:00
Matt Arsenault	22cdb61a78	AMDGPU: Fix not accounting for tail call resource usage If the only call in a function is a tail call, the function isn't considered to have a call since it's a type of return. llvm-svn: 312561	2017-09-05 18:36:36 +00:00
Tony Jiang	61ef1c540c	[PPC][NFC] Renaming things with 'xxinsert' moniker to 'vecinsert' to make it more general. Commit on behalf of Graham Yiu (gyiu@ca.ibm.com) llvm-svn: 312547	2017-09-05 18:08:02 +00:00
Craig Topper	33caeadd90	[AVX512] Remove patterns for (v8f32 (X86vzmovl (insert_subvector undef, (v4f32 (scalar_to_vector FR32X:)), (iPTR 0)))) and the same for v4f64. We don't have this same pattern for AVX2 so I don't believe we should have it for AVX512. We also didn't have it for v16f32. llvm-svn: 312543	2017-09-05 17:33:58 +00:00
Konstantin Zhuravlyov	1aa667fe64	AMDGPU/NFC: Cleanup/refactor SIMemoryLegalizer [2]: - Make SIMemOpInfo a class - Add accessor methods to SIMemOpInfo - Move get*Info methods to SIMemOpInfo Differential Revision: https://reviews.llvm.org/D37395 llvm-svn: 312541	2017-09-05 16:41:25 +00:00
Konstantin Zhuravlyov	844845ae06	AMDGPU/NFC: Cleanup/refactor SIMemoryLegalizer [1]: - Rename MemOpInfo -> SIMemOpInfo - Move SIMemOpInfo class out of SIMemoryLegalizer class Differential Revision: https://reviews.llvm.org/D37394 llvm-svn: 312540	2017-09-05 16:18:05 +00:00
Simon Pilgrim	49f9ba37d8	[X86] Limit store merge size when implicitfloat is enabled (PR34421) As suggested by @niravd : https://bugs.llvm.org/show_bug.cgi?id=34421#c2 Differential Revision: https://reviews.llvm.org/D37464 llvm-svn: 312534	2017-09-05 13:40:29 +00:00
Simon Pilgrim	60ea09eaca	Strip trailing whitespace. NFCI. llvm-svn: 312531	2017-09-05 12:32:16 +00:00
Diana Picus	ac15473cdd	[ARM] GlobalISel: Minor cleanups in inst selector Use the STI member of ARMInstructionSelector instead of TII.getSubtarget() and also make use of STI's methods instead of checking the object format manually. llvm-svn: 312522	2017-09-05 08:22:47 +00:00
Diana Picus	abb088691b	[ARM] GlobalISel: Support global variables for RWPI In RWPI code, globals that are not read-only are accessed relative to the SB register (R9). This is achieved by explicitly generating an ADD instruction between SB and an offset that we either load from a constant pool or movw + movt into a register. llvm-svn: 312521	2017-09-05 07:57:41 +00:00
Craig Topper	c228d790af	[X86] Add hasSideEffects=0 and mayLoad=1 to some instructions that recently had their patterns removed. llvm-svn: 312520	2017-09-05 05:49:44 +00:00
Hiroshi Inoue	614453b797	[PowerPC] eliminate redundant compare instruction If multiple conditional branches are executed based on the same comparison, we can execute multiple conditional branches based on the result of one comparison on PPC. For example, if (a == 0) { ... } else if (a < 0) { ... } can be executed by one compare and two conditional branches instead of two pairs of a compare and a conditional branch. This patch identifies a code sequence of the two pairs of a compare and a conditional branch and merges the compares if possible. To maximize the opportunity, we do canonicalization of code sequence before merging compares. For the above example, the input for this pass looks like: cmplwi r3, 0 beq 0, .LBB0_3 cmpwi r3, -1 bgt 0, .LBB0_4 So, before merging two compares, we canonicalize it as cmpwi r3, 0 ; cmplwi and cmpwi yield same result for beq beq 0, .LBB0_3 cmpwi r3, 0 ; greater than -1 means greater or equal to 0 bge 0, .LBB0_4 The generated code should be cmpwi r3, 0 beq 0, .LBB0_3 bge 0, .LBB0_4 Differential Revision: https://reviews.llvm.org/D37211 llvm-svn: 312514	2017-09-05 04:15:17 +00:00
Simon Pilgrim	91751b42f6	[X86][AVX512] Add support for VPERMILPS v16f32 shuffle lowering (PR34382) Avoid use of VPERMPS where we don't need it by instead using the variable mask version of VPERMILPS for unary shuffles. llvm-svn: 312486	2017-09-04 13:51:57 +00:00
Igor Breger	2661ae48c7	[GlobalISel][X86] G_PHI support. llvm-svn: 312473	2017-09-04 09:06:45 +00:00
Craig Topper	69e22789e1	[X86] Remove duplicate FMA patterns from the isel table. This reorders some patterns to get tablegen to detect them as duplicates. Tablegen only detects duplicates when creating variants for commutable operations. It does not detect duplicates between the patterns as written in the td file. So we need to ensure all the FMA patterns in the td file are unique. This also uses null_frag to remove some other unneeded patterns. llvm-svn: 312470	2017-09-04 07:35:05 +00:00
Craig Topper	af0b992b04	[X86] Mark the FMA nodes as commutable so tablegen will auto generate the patterns. This uses the capability introduced in r312464 to make SDNode patterns commutable on the first two operands. This allows us to remove some of the extra FMA patterns that have to put loads and mask operands in different places to cover all cases. This even includes patterns that were missing to support matching a load in the first operand with FMA4. Non-broadcast loads with masking for AVX512. I believe this is causing us to generate some duplicate patterns because tablegen's isomorphism checks don't catch isomorphism between the patterns as written in the td. It only detects isomorphism in the commuted variants it tries to create. The unmasked 231 and 132 memory forms are isomorphic as written in the td file so we end up keeping both. I think we precommute the 132 pattern to fix this. We also need a follow up patch to go back to the legacy FMA3 instructions and add patterns to the 231 and 132 forms which we currently don't have. llvm-svn: 312469	2017-09-04 06:59:50 +00:00
Dean Michael Berris	ebc1659016	[XRay][CodeGen] Use PIC-friendly code in XRay sleds and remove synthetic references in .text Summary: This is a re-roll of D36615 which uses PLT relocations in the back-end to the call to __xray_CustomEvent() when building in -fPIC and -fxray-instrument mode. Reviewers: pcc, djasper, bkramer Subscribers: sdardis, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D37373 llvm-svn: 312466	2017-09-04 05:34:58 +00:00
Craig Topper	76f44015e7	[X86] Add a combine to recognize when we have two insert subvectors that together write the whole vector, but the starting vector isn't undef. In this case we should replace the starting vector with undef. llvm-svn: 312462	2017-09-04 01:13:36 +00:00
Craig Topper	959fc08f3a	[X86] Remove some unnecessary curly braces and blank line. NFC llvm-svn: 312461	2017-09-04 01:13:34 +00:00
Craig Topper	bc13af84f2	[X86] Add a combine to turn (insert_subvector zero, (insert_subvector zero, X, Idx), Idx) into an insert of X into the larger zero vector. llvm-svn: 312460	2017-09-03 22:25:52 +00:00
Craig Topper	fcf6bc5503	[X86] Add more patterns to use moves to zero the upper portions of a vector register that I missed in r312450. llvm-svn: 312459	2017-09-03 22:25:50 +00:00
Craig Topper	788fbe08db	[X86] Combine inserting a vector of zeros into a vector of zeros into just the larger vector. llvm-svn: 312458	2017-09-03 22:25:49 +00:00
Craig Topper	8ee36ffb54	[X86] Add patterns to turn an insert into lower subvector of a zero vector into a move instruction which will implicitly zero the upper elements. Ideally we'd be able to emit the SUBREG_TO_REG without the explicit register->register move, but we'd need to be sure the producing operation would select something that guaranteed the upper bits were already zeroed. llvm-svn: 312450	2017-09-03 17:52:25 +00:00
Craig Topper	fa82efb50a	[X86] Add VBLENDPS/VPBLENDD to the execution domain fixing tables. llvm-svn: 312449	2017-09-03 17:52:23 +00:00
Craig Topper	bb6506d251	[X86] Canonicalize (concat_vectors X, zero) -> (insert_subvector zero, X, 0). In a future patch, I plan to teach isel to use a small vector move with implicit zeroing of the upper elements when it sees the (insert_subvector zero, X, 0) pattern. llvm-svn: 312448	2017-09-03 17:52:19 +00:00
Craig Topper	fe96ff7398	[X86] Add output register to BTC/BTR/BTS instructions. llvm-svn: 312432	2017-09-03 01:46:26 +00:00

45412 Commits