llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	bdb1729c83	[X86] Teach EltsFromConsecutiveLoads that it's ok to form a v4f32 VZEXT_LOAD with a 64 bit memory size on SSE1 targets. We can use MOVLPS which will load 64 bits, but we need a v4f32 result type. We already have isel patterns for this. The code here is a little hacky. We can probably improve it with more isel patterns.	2020-02-22 18:50:52 -08:00
Craig Topper	e7a184fc7c	[X86] Use movlps for i64 atomic stores on 32-bit targets with sse1. This is similar to using movd which we do for sse2 targets. I've added a DAG combine for VEXTRACT_STORE to use SimplifyDemandedVectorElts to clean up some artifacts from type legalization.	2020-02-22 18:22:47 -08:00
Simon Moll	635034f193	[VE][fix] missing include	2020-02-22 11:00:59 +01:00
Craig Topper	228a2bc9b7	[X86] Teach combineCVTPH2PS to shrink v8i16 loads when the output type is v4f32. Remove extra isel patterns. Similar to what we do for other operations that use a subset of bits. Allows us to remove a pattern that shrinks a load, which was incorrect if the load was volatile.	2020-02-21 18:11:07 -08:00
Heejin Ahn	3648370a79	[WebAssembly] Fix a non-determinism problem in FixIrreducibleControlFlow Summary: We already sorted the blocks when fixing up a set of mutual loop entries, however, there can be multiple sets of such mutual loop entries, and the order we encounter them should not be random, so sort them too. Fixes https://bugs.llvm.org/show_bug.cgi?id=44982 Patch by Alon Zakai (kripken) Reviewers: aheejin, sbc100, dschuff Subscribers: mgrang, sunfish, hiraditya, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74999	2020-02-21 17:05:46 -08:00
Matt Arsenault	bf4933b4ea	AMDGPU/GlobalISel: Remove dead code	2020-02-21 19:19:32 -05:00
Mark Searles	d3e170c438	Revert "[AMDGPU] Don’t mark the .note section as ALLOC" This reverts commit `977cd661cf`. It breaks OpenCL testing. OpenCL Runtime is using PT_LOAD information to calculate memory for global variables. This commit should be relanded once the OpenCL runtime stops relying on PT_LOAD information for calculating global variable memory size. Differential Revision: https://reviews.llvm.org/D74995	2020-02-21 16:08:30 -08:00
Francis Visoiu Mistrih	a32d539798	[Target] Remove libObject dependency in lib/Target This removes a couple useless includes and the dependency of X86Desc on Object, which was useless as well.	2020-02-21 14:52:31 -08:00
Fangrui Song	fddbff1473	[AArch64] Delete an unneeded dependency on Object after `1874dee566` `1874dee566` moved CPU_(SUB_)TYPE logic to BinaryFormat. Object is not directly referenced.	2020-02-21 14:02:54 -08:00
Fangrui Song	fad1c750f1	[AArch64][SVE] Fix -DBUILD_SHARED_LIBS=on builds after -D74808/1874dee5662603c9251228c71b66de72cec0c979	2020-02-21 13:59:47 -08:00
Fangrui Song	5c33a81b7a	[AArch64][SVE] Fix -Wimplicit-fallthrough after D73711	2020-02-21 13:46:33 -08:00
Cameron McInally	a5b22b768f	[AArch64][SVE] Add support for DestructiveBinary and DestructiveBinaryComm DestructiveInstTypes Add support for DestructiveBinaryComm DestructiveInstType, as well as the lowering code to expand the new Pseudos into the final movprfx+instruction pairs. Differential Revision: https://reviews.llvm.org/D73711	2020-02-21 15:19:54 -06:00
Jay Foad	b72f1448ce	AMDGPU/GlobalISel: Better code for one case of G_SHUFFLE_VECTOR on v2i16 Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74987	2020-02-21 21:16:39 +00:00
Francis Visoiu Mistrih	1874dee566	[macho][NFC] Extract all CPU_(SUB_)TYPE logic to BinaryFormat This moves all the logic of converting LLVM Triples to MachO::CPU_(SUB_)TYPE from the specific target (Target)AsmBackend to more convenient functions in lib/BinaryFormat. This also gets rid of the separate two X86AsmBackend classes. The previous attempt was to add it to libObject, but that adds an unnecessary dependency to libObject from all the targets. Differential Revision: https://reviews.llvm.org/D74808	2020-02-21 12:43:29 -08:00
Craig Topper	8875ee18d7	[X86] Add a new format type for instructions that represent named prefix bytes like data16 and rep. Use it to make a simpler version of isPrefix. isPrefix was added to support the patches to align branches. It relies on a switch over instruction names. This moves those opcodes to a new format so the information is in tablegen and we can just check for a specific value in some bits in TSFlags instead. I've left the other function in place for now so that the existing patches in phabricator will still work. I'll work with the owner to get them migrated.	2020-02-21 12:34:59 -08:00
Francesco Petrogalli	33bf119647	[llvm][CodeGen][aarch64] Add contiguous prefetch intrinsics for SVE. Summary: The patch covers both register/register and register/immediate addressing modes. Reviewers: efriedma, andwar, sdesmalen Reviewed By: sdesmalen Subscribers: sdesmalen, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74581	2020-02-21 20:22:25 +00:00
Francesco Petrogalli	e2ed1d14d6	[llvm][aarch64] SVE addressing modes. Summary: Added register + immediate and register + register addressing modes for the following intrinsics: 1. Masked load and stores: * Sign and zero extended load and truncated stores. * No extension or truncation. 2. Masked non-temporal load and store. Reviewers: andwar, efriedma Subscribers: cameron.mcinally, sdesmalen, tschuett, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74254	2020-02-21 20:02:34 +00:00
Cameron McInally	266959c0f7	[AArch64][SVE] Add backend support for splats of immediates This patch adds backend support for splats of both Int and FP immediates. Differential Revision: https://reviews.llvm.org/D74856	2020-02-21 13:21:47 -06:00
Matt Arsenault	00955a62e4	AMDGPU/GlobalISel: Fix SALU mapping for v2s16 min/max The legalizer helper functions are unusably awkward to perform the 3-5 part legalization. This needs to be widened, scalarized, lowered, and we should avoid creating vector extends and truncates. Manually do all of this and expand.	2020-02-21 14:02:16 -05:00
Matt Arsenault	db06870dbd	AMDGPU: Move dot intrinsic patterns to instruction def I tried to use some of the new tablegen features to avoid creating different operand list permutations, but I still don't see a way to programmatically build a source pattern dag. Also add GlobalISel tests, which now all import successfully. Some of the fneg fold tests are incorrect, which need to be fixed in a future commit	2020-02-21 13:35:40 -05:00
Matt Arsenault	4c1c9422a3	AMDGPU/GlobalISel: Select llvm.amdgcn.fdot2 I'm slightly worried about the generated checks, since they won't catch incorrect modifiers being added at the end of the line.	2020-02-21 13:35:40 -05:00
Matt Arsenault	dfce5fd50a	AMDGPU/GlobalISel: Select VOP3P instructions This only handles the basic cases. More work is needed to make better use of op_sel.	2020-02-21 13:35:40 -05:00
Matt Arsenault	72eef820d5	AMDGPU/GlobalISel: Select G_SHUFFLE_VECTOR G_SHUFFLE_VECTOR is legal since it theoretically may help match op_sel for VOP3P instructions. Expand it in some other way in case it doesn't fold into the use instructions.	2020-02-21 13:35:40 -05:00
Nikita Popov	c90ea87cfd	[X86] Fix SDLoc initialization Fixes -Wparentheses warning, in this case indicating a genuine bug.	2020-02-21 18:26:05 +01:00
Jonas Paulsson	41bd9ead35	[SystemZ] Return scalarized costs for vector instructions on older archs. A cost query for a vector instruction should return a cost even without target vector support, and not trigger an assert. VectorCombine does this with an input containing source code vectors. Review: Ulrich Weigand	2020-02-21 09:17:37 -08:00
Matt Arsenault	60023e3471	AMDGPU: Use default operand for VOP3P clamp We don't use this, and matching from the def doesn't make much sense. There are multiple tablegen bugs with default operand handling. undef_tied_input should work to handle the vdst_in correctly, but this breaks the operand register class constraint which it should be able to infer.	2020-02-21 12:14:18 -05:00
Danilo Carvalho Grael	db9c40f562	[AArch64][SVE] Add intrinsics for SVE2 bitwise ternary operations Summary: Add intrinsics for the following operations: - eor3, bcax - bsl, bsl1n, bsl2n, nbsl Fix MC tests for bsl instructions. Reviewers: kmclaughlin, c-rhodes, sdesmalen, efriedma, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74785	2020-02-21 12:15:51 -05:00
Matt Arsenault	043ed2e22a	AMDGPU/GlobalISel: Fix xnor matching We should try the generated matchers before the manual selection. This means the patterns are now handling the common cases, but the manual selection code is not yet dead. It's still handling the non-s32/s64 cases (like v2s16 and v2s32). Currently tablegen doesn't have a nice way to have a single pattern that covers multiple types.	2020-02-21 11:42:49 -05:00
David Green	83012cb217	[ARM] Correct Formatting. NFC Also removed an unnecessary TODO that I don't believe is relevant for the instruction in question.	2020-02-21 16:08:56 +00:00
Matt Arsenault	ac7abe0ba9	AMDGPU/GlobalISel: Manually select G_BUILD_VECTOR_TRUNC We have patterns for s_pack* selection, but they assume the inputs are a build_vector with 16-bit inputs, not a truncating build vector. Since there's still outstanding work for how to handle mismatched result and source element vector operations, and since I'm trying a different packed vector strategy than SelectionDAG, just manually select this for now.	2020-02-21 10:34:11 -05:00
Matt Arsenault	79ff188add	AMDGPU/GlobalISel: Legalize G_FPOW There are a few differences from the DAG handling. First, the DAG handling uses a primitive selection pattern instead of custom legalizing it. Because of this, this makes use of source modifiers while the DAG does not. Also instead of promoting f16, try to use the f16 log/exp. There's no f16 fmul_legacy, so widen just for the multiply, although I'm not sure that's the best solution.	2020-02-21 10:31:13 -05:00
Matt Arsenault	fab4cdea39	AMDGPU/GlobalISel: Select llvm.amdgcn.fmul.legacy	2020-02-21 10:30:26 -05:00
Matt Arsenault	b64aa8c715	AMDGPU/GlobalISel: Fix constant bus violation with source modifiers This looked through copies to find the source modifiers, which may have been SGPR->VGPR copies added to avoid potential constant bus violations. Re-insert a copy to a VGPR if this happens.	2020-02-21 10:30:23 -05:00
Sean Fertile	4fdaac0e1e	[PowerPC][NFC] Remove Darwin specific logic in frame finalization. Remove some cumbersome Darwin specific logic for updating the frame offsets of the condition-register spill slots. The containing function has an early return if the subtarget is not ELF based which makes the Darwin logic dead.	2020-02-21 09:32:24 -05:00
Krzysztof Parzyszek	c51b0bede8	[Hexagon] Introduce noop intrinsic to cast between vector predicate types The (overloaded) intrinsic is llvm.hexagon.V6.pred.typecast[.128B]. The types of the operand and the return value are HVX boolean vector types. For each cast, there needs to be a corresponding intrinsic declared, with different suffixes appended to the name, e.g. ; cast <128 x i1> to <32 x i1> declare <32 x i1> @llvm.hexagon.V6.pred.typecast.128B.s1(<128 x i1>) ; cast <32 x i1> to <64 x i1> declare <64 x i1> @llvm.hexagon.V6.pred.typecast.128B.s2(<32 x i1>) etc.	2020-02-21 07:37:59 -06:00
Swiftfuchs	a24d46318f	[NFC] Corrected a minor typo in a comment	2020-02-21 13:56:44 +01:00
Craig Topper	97f11600e0	[X86] Don't bother avoiding illegal FCMOVs if we don't have the cmov subtarget feature. We'll be forced to emit branches so we might as well use the most direct condition.	2020-02-21 00:34:15 -08:00
Craig Topper	263bef2bbc	[X86] Make combineCMov not create unsupported FCMOVs when f32/f64 are using X87. This makes the behavior consistent with what's in LowerSELECT.	2020-02-21 00:34:15 -08:00
Craig Topper	4576606831	[X86] Remove unnecessary isNullConstant in LowerSelect. NFC At this point in the code we know that Op1 or Op2 is all ones. Y points to the other operand. In the case that Op2 is zero, Op1 must be all ones and Y is Op2. The OR ORs Y into Res. But if Y is 0 the OR will be folded away by getNode so we don't need to check for it.	2020-02-20 21:41:13 -08:00
Craig Topper	78be618717	[X86] Add CMOV_VR64 pseudo instruction for MMX. Remove mmx handling from combineSelect. The combineSelect code was casting to i64 without any check that i64 was legal. This can break after type legalization. It also required splitting the mmx register on 32-bit targets. It's not clear that this makes sense. Instead switch to using a cmov pseudo like we do for XMM/YMM/ZMM.	2020-02-20 20:30:56 -08:00
Jim Lin	e27b61c1ea	[XCore] Add instruction pattern for bitrev Summary: Add support for lowering bitreverse to the bitrev instruction. Fix https://bugs.llvm.org/show_bug.cgi?id=34628. Reviewers: RKSimon, rtrieu, robertlytton Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74748	2020-02-21 09:28:49 +08:00
Craig Topper	e5782377f3	[X86] Add CMOV_VK1 pseudo so we don't crash on v1i1 ISD::SELECT	2020-02-20 15:13:48 -08:00
Craig Topper	7e92769862	[X86] Expand vselect of v1i1 under avx512. We already do this for v2i1, v4i1, etc.	2020-02-20 15:13:47 -08:00
Craig Topper	b00ef8951b	[X86] Custom legalize v1i1 UADDSAT/USUBSAT/SADDSAT/SSUBSAT to match v2i1/v4i1/v8i1 etc.	2020-02-20 15:13:46 -08:00
Craig Topper	5228a5544b	[X86] Fix a couple copy mistakes in v4i1 or/and/xor isel patterns. VK1 was being used as the output of the copy to regclass, but it should be VK2/VK4. Shouldn't matter in practice though since VK1/VK2/VK4/VK8/VK16 are all identical and just have different VTs.	2020-02-20 15:13:45 -08:00
Craig Topper	d95a10a7f9	[X86] Custom legalize v1i1 add/sub/mul to xor/xor/and with avx512. We already did this for v2i1, v4i1, v8i1, etc.	2020-02-20 15:13:44 -08:00
Craig Topper	c7b54a196e	Recommit "[X86] Replace a bad use of MVT::getVectorVT with EVT::getVectorVT" With the correct author this time	2020-02-20 12:28:54 -08:00
Craig Topper	1d8860f90b	Revert `714265dabb` "[X86] Replace a bad use of MVT::getVectorVT with EVT::getVectorVT" I accidentally messed up the author on the previous commit somehow.	2020-02-20 12:28:33 -08:00
Quentin Colombet	714265dabb	[X86] Replace a bad use of MVT::getVectorVT with EVT::getVectorVT The type here isn't guaranteed to be a simple type. Fixes PR44976	2020-02-20 12:25:37 -08:00
Nico Weber	6f4d9d1029	Revert "[AArch64][SVE] Add intrinsics for SVE2 bitwise ternary operations" This reverts commit `ce70e28998`. It broke MC/AArch64/SVE2/bsl-diagnostics.s everywhere.	2020-02-20 15:11:13 -05:00
Francesco Petrogalli	0c8fa6db90	[llvm][build] Fix shared lib builds. [NFC] The code at https://reviews.llvm.org/D74808 has broken builds that are configured with -DBUILD_SHARED_LIBS=On. This patch adds the correct library dependencies.	2020-02-20 19:42:53 +00:00
Sanjay Patel	064cd2ecdb	[x86] allow peeking through an extract_subvector to find a splatted operand The motivating case is seen in "splat4_v8f32_load_store" and based on code in PR42024: https://bugs.llvm.org/show_bug.cgi?id=42024 (I haven't stepped through the v8i32 sibling test yet to see why that diverged.) There are other potential improvements visible like allowing scalarization or vector narrowing. Differential Revision: https://reviews.llvm.org/D74909	2020-02-20 13:59:59 -05:00
Sean Fertile	da181d4ba0	[PowerPC][NFC] Cleanup some of the Darwin mentions in the README.txt.	2020-02-20 13:57:13 -05:00
Francis Visoiu Mistrih	3f785212e9	Revert "[macho][NFC] Extract all CPU_(SUB_)TYPE logic to libObject" This reverts commit `726c342ce2`. This breaks the windows bots with linker errors.	2020-02-20 10:51:25 -08:00
Francis Visoiu Mistrih	726c342ce2	[macho][NFC] Extract all CPU_(SUB_)TYPE logic to libObject This moves all the logic of converting LLVM Triples to MachO::CPU_(SUB_)TYPE from the specific target (Target)AsmBackend to more convenient functions in libObject. This also gets rid of the separate two X86AsmBackend classes. Differential Revision: https://reviews.llvm.org/D74808	2020-02-20 10:28:07 -08:00
Craig Topper	0ed7a61543	[X86] Fix a -Wparentheses warning. NFC	2020-02-20 09:32:03 -08:00
Craig Topper	3543ac9ab5	[X86] Rewrite LowerBRCOND to remove dead code and handle ISD::SETCC and overflow ops directly. There's a lot of old leftover code in LowerBRCOND. Especially the detecting of AND or OR of X86ISD::SETCC nodes. Those were needed before LegalizeDAG was changed to visit nodes before their operands. It also relied on reversing the output of LowerSETCC to find the flags producing node to use for the X86ISD::BRCOND node. Rather than using LowerSETCC this patch uses emitFlagsForSetcc to handle the integer ISD::SETCC case. This gives the flag producer and the comparison code to use directly. I've removed the addTest flag and just produce a X86ISD::BRCOND and return immediately. Floating point ISD::SETCC case is just an X86ISD::FCMP with special care for OEQ and UNE derived from the previous code. I've left f128 out so it will emit a test. And LowerSETCC will be called later to produce a libcall and X86ISD::SETCC. We have combines that can merge the test and X86ISD::SETCC. We need to handle two cases for overflow ops. Either they are used directly or they have a seteq 0 or setne 1 to invert the overflow. The old code did not handle the setne 1 case, but I think some other combines were making up for it. If we fail to find a condition, we'll wrap an AND with 1 on the original condition and tell emitFlagsForSetcc to emit a compare with 0. This will pick up the LowerAndToBT and/or the EmitTest case. I kept the isTruncWithZeroHighBitsInput call, but we might be able to fold that in to emitFlagsForSetcc. Differential Revision: https://reviews.llvm.org/D74750	2020-02-20 08:50:18 -08:00
Craig Topper	9bbf271fc9	[AArch64] Move isOverflowIntrOpRes helper function to the ISD namespace in SelectionDAG.h. NFC Enables sharing with an upcoming X86 change.	2020-02-20 08:50:17 -08:00
Danilo Carvalho Grael	ce70e28998	[AArch64][SVE] Add intrinsics for SVE2 bitwise ternary operations Summary: Add intrinsics for the following operations: - eor3, bcax - bsl, bsl1n, bsl2n, nbsl Reviewers: kmclaughlin, c-rhodes, sdesmalen, efriedma, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74785	2020-02-20 11:36:48 -05:00
Craig Topper	12cc105f80	[X86] Add DAG combines to form CVTPH2PS/CVTPS2PH from vXf16->vXf32/vXf64 fp_extends and vXf32->vXf16 fp_round. Only handle power of 2 element count for simplicity. Not sure what to do with vXf64->vXf16 fp_round to avoid double rounding Differential Revision: https://reviews.llvm.org/D74886	2020-02-20 08:26:17 -08:00
Matt Arsenault	083717cf49	AMDGPU: Fix v2i64<->v4f32 bitcast I'm not sure how to test the v2i64->v4f32 case since I can't think of any v2i64 cases that won't legalize to v4i32.	2020-02-20 09:49:09 -05:00
Sebastian Neubauer	977cd661cf	[AMDGPU] Don’t mark the .note section as ALLOC Marking a section as ALLOC tells the ELF loader to load the section into memory. As we do not want to load the notes into VRAM, the flag should not be there. Differential Revision: https://reviews.llvm.org/D74600	2020-02-20 15:14:48 +01:00
Djordje Todorovic	2f215cf36a	Revert "Reland "[DebugInfo] Enable the debug entry values feature by default"" This reverts commit rGfaff707db82d. A failure was found on an ARM 2-stage buildbot. Investigation is needed.	2020-02-20 14:41:39 +01:00
Andrzej Warzynski	0e417b034a	[AArch64][SVE] Re-arrange definitions in AArch64SVEInstrInfo.td (NFC) Re-arrange definitions related to loads and stores so that they are grouped together. This patch implements only non-functional changes.	2020-02-20 12:41:16 +00:00
Simon Pilgrim	6085593c12	[AMDGPU] simplifyI24 - replace GetDemandedBits with SimplifyMultipleUseDemandedBits GetDemandedBits mostly just calls SimplifyMultipleUseDemandedBits now, but it does a very blunt constant simplification that SimplifyMultipleUseDemandedBits avoids. If we need to demand bits from constants we should handle this through ShrinkDemandedConstant/targetShrinkDemandedConstant. @arsenm confirmed that the sign extended immediates are better for code size. Differential Revision: https://reviews.llvm.org/D74857	2020-02-20 12:03:08 +00:00
Mikhail Maltsev	f4fd7dbf85	[ARM,MVE] Add vqdmull[b,t]q intrinsic families Summary: This patch adds two families of ACLE intrinsics: vqdmullbq and vqdmulltq (including vector-vector and vector-scalar variants) and the corresponding LLVM IR intrinsics llvm.arm.mve.vqdmull and llvm.arm.mve.vqdmull.predicated. Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74845	2020-02-20 10:51:19 +00:00
Thomas Lively	16aabc86e0	[WebAssembly] Fix memory bug introduced in `5286180999` Summary: The instruction at `DefI` can sometimes be destroyed by `rematerializeCheapDef`, so it should not be used after calling that function. The fix is to use `Insert` instead when examining additional multivalue stackifications. `Insert` is the address of the new defining instruction after all moves and rematerializations have taken place. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74875	2020-02-19 15:07:45 -08:00
Matt Arsenault	4bb0c8f91c	AMDGPU: Enable integer division bypass We probably want this, and I've meant to turn this on for a long time. SC actually emits a special case to early-out for a 1 denominator, which perhaps should also be considered.	2020-02-19 17:50:19 -05:00
Matt Arsenault	cbc3b3046f	AMDGPU/GlobalISel: Remove outdated comment	2020-02-19 17:32:25 -05:00
Stanislav Mekhanoshin	03954a12ae	[AMDGPU] Fix DS_WRITE_B32 patterns It uses VGPR_32.RegTypes which includes 16 bit types. As a result DS_WRITE_B32 may be generated for "store i16" which is a bug. The only reason we do not hit it now is relative patterns complexity and sorting. Should the DS_WRITE_B16 pattern complexity become higher, the bug would appear. Differential Revision: https://reviews.llvm.org/D74868	2020-02-19 13:42:16 -08:00
Krzysztof Parzyszek	b1d47467e2	[Hexagon] Change HVX vector predicate types from v512/1024i1 to v64/128i1 This commit removes the artificial types <512 x i1> and <1024 x i1> from HVX intrinsics, and makes v512i1 and v1024i1 no longer legal on Hexagon. It may cause existing bitcode files to become invalid. * Converting between vector predicates and vector registers must be done explicitly via vandvrt/vandqrt instructions (their intrinsics), i.e. (for 64-byte mode): %Q = call <64 x i1> @llvm.hexagon.V6.vandvrt(<16 x i32> %V, i32 -1) %V = call <16 x i32> @llvm.hexagon.V6.vandqrt(<64 x i1> %Q, i32 -1) The conversion intrinsics are: declare <64 x i1> @llvm.hexagon.V6.vandvrt(<16 x i32>, i32) declare <128 x i1> @llvm.hexagon.V6.vandvrt.128B(<32 x i32>, i32) declare <16 x i32> @llvm.hexagon.V6.vandqrt(<64 x i1>, i32) declare <32 x i32> @llvm.hexagon.V6.vandqrt.128B(<128 x i1>, i32) They are all pure. * Vector predicate values cannot be loaded/stored directly. This directly reflects the architecture restriction. Loading and storing of vector predicates must be done indirectly via vector registers and explicit conversions via vandvrt/vandqrt instructions.	2020-02-19 14:14:56 -06:00
Craig Topper	f559cecc3e	[X86] Add DCI.isBeforeLegalize() check to the v64i1 constant splitting code in combineStore. We only need to split after type legalization. If we're before we can just use a wide store and type legalization will split it. Add a v128i1 test to exercise it post type legalization.	2020-02-19 09:18:16 -08:00
Stanislav Mekhanoshin	ada205e91e	[AMDGPU] Fix assumption about LaneBitmask content Yet another assumption about an actual LaneBitmask content is fixed. Differential Revision: https://reviews.llvm.org/D74805	2020-02-19 09:07:11 -08:00
Mikhail Maltsev	461fd94f00	[ARM,MVE] Fix predicate types of some intrinsics Summary: Some predicated MVE intrinsics return a vector with element size different from the input vector element size. In this case the predicate type must correspond to the output vector type. The following intrinsics use the incorrect predicate type: * llvm.arm.mve.mull.int.predicated * llvm.arm.mve.mull.poly.predicated * llvm.arm.mve.vshll.imm.predicated This patch fixes the issue. Reviewers: simon_tatham, dmgreen, ostannard, MarkMurrayARM Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74838	2020-02-19 16:24:54 +00:00
Cameron McInally	3931734990	[AArch64][SVE] Add initial backend support for FP splat_vector Differential Revision: https://reviews.llvm.org/D74632	2020-02-19 10:19:11 -06:00
Stefan Pintilie	440ca29ea2	[Hexagon][NFC] Rename VK_Hexagon_PCREL to VK_PCREL On PowerPC we will soon need to use pcrel to indicate PC Relative addressing. Renamed the Hexagon specific variant kind to a non target specific VK so that it can be used on both Hexagon and PowerPC. Differential Revision: https://reviews.llvm.org/D74788	2020-02-19 09:52:58 -06:00
Matt Arsenault	ff4639f060	AMDGPU/GlobalISel: Select MUBUF path for global atomic cmpxchg I'm not sure why this isn't a pattern, but the DAG manually selects this.	2020-02-19 06:19:22 -08:00
Pierre-vh	39cecabece	[AArch64][ASMParser] Refuse equal source/destination for LDRAA/LDRAB Differential Revision: https://reviews.llvm.org/D74822	2020-02-19 14:15:17 +00:00
Sam Parker	de3e65e60c	[ARM][LowOverheadLoops] Check loop liveouts Check that no Q-regs are live out of the loop, unless the instruction within the loop is predicated on the vctp. Differential Revision: https://reviews.llvm.org/D72713	2020-02-19 12:59:01 +00:00
David Green	33aa5dfe9c	[ARM] VMLAVA reduction patterns Similar to VADDV and VADDLV that have been added recently, this adds lowering and patterns for VMLAV, VMLAVA, VMLALV and VMLALVA. They perform the same roles as the add's, just folding a mul into the same instruction (and so taking two inputs). As such, they need to be lowered in the same way as the types are often not legal. Differential Revision: https://reviews.llvm.org/D74390	2020-02-19 12:39:58 +00:00
Simon Pilgrim	4af8db317d	[AMDGPU] performCvtF32UByteNCombine - add SHL and SimplifyMultipleUseDemandedBits support This is part of the work to remove SelectionDAG::GetDemandedBits and just use SimplifyMultipleUseDemandedBits. Recent experiments raised some v_cvt_f32_ubyte*_e32 regressions, so I've added some additional abilities to performCvtF32UByteNCombine to help unpack byte data more aggressively. We still don't remove all OR(SHL,SRL) patterns as some of the regenerated nodes don't get combined again, but we are getting closer. Differential Revision: https://reviews.llvm.org/D74786	2020-02-19 11:45:57 +00:00
David Green	fceb3e3b4a	[ARM] MVE VADDLV lowering Following on from the extra VADDV lowering, this extends things to handle VADDLV which allows summing values into a pair of i32 registers, together treated as a i64. This needs to be done in DAGCombine too as the types are otherwise illegal, which is a fairly simple addition on top of the existing code. There is also a VADDLVA instruction handled here, that adds the incoming values from the two general purpose registers. As opposed to the non-long version where we could just add patterns for add(x, VADDV), the long version needs to handle this early before the i64 has been split into too many pieces. Differential Revision: https://reviews.llvm.org/D74224	2020-02-19 11:07:20 +00:00
Petar Avramovic	5e32e7981b	[MIPS GlobalISel] Legalize non-power-of-2 and unaligned load and store Custom legalize non-power-of-2 and unaligned load and store for MIPS32r5 and older, custom legalize non-power-of-2 load and store for MIPS32r6. Don't attempt to combine non power of 2 loads or unaligned loads when subtarget doesn't support them (MIPS32r5 and older). Differential Revision: https://reviews.llvm.org/D74625	2020-02-19 12:02:27 +01:00
Petar Avramovic	5171d1523d	[MIPS GlobalISel] Select 4 byte unaligned load and store Improve legality checks for load and store, 4 byte scalar load and store are now legal for all subtargets. During regbank selection 4 byte unaligned loads and stores for MIPS32r5 and older get mapped to gprb. Select 4 byte unaligned loads and stores for MIPS32r5. Fix tests that unintentionally had unaligned load or store. Differential Revision: https://reviews.llvm.org/D74624	2020-02-19 11:57:06 +01:00
Florian Hahn	216afd3301	[TargetLower] Update shouldFormOverflowOp check if math is used. On some targets, like SPARC, forming overflow ops is only profitable if the math result is used: https://godbolt.org/z/DxSmdB This patch adds a new MathUsed parameter to allow the targets to make the decision and defaults to only allowing it if the math result is used. That is the conservative choice. This patch also updates AArch64ISelLowering, X86ISelLowering, ARMISelLowering.h, SystemZISelLowering.h to allow forming overflow ops if the math result is not used. On those targets using the overflow intrinsic for the overflow check only generates better code. Reviewers: nikic, RKSimon, lebedev.ri, spatel Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D74722	2020-02-19 11:28:33 +01:00
Kerry McLaughlin	63236078d2	[AArch64][SVE] Add SVE2 intrinsics for polynomial arithmetic Summary: Implements the following intrinsics: - @llvm.aarch64.sve.eorbt - @llvm.aarch64.sve.eortb - @llvm.aarch64.sve.pmullb.pair - @llvm.aarch64.sve.pmullt.pair Reviewers: sdesmalen, c-rhodes, dancgr, cameron.mcinally, efriedma, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74769	2020-02-19 10:12:50 +00:00
Djordje Todorovic	faff707db8	Reland "[DebugInfo] Enable the debug entry values feature by default" Differential Revision: https://reviews.llvm.org/D73534	2020-02-19 11:12:26 +01:00
David Green	51c6e9445c	[ARM] Extra MVE VADDV reduction patterns We already make use of the VADDV vector reduction instruction for cases where the input and the output start out at the same type. The MVE instruction however will sum into an i32, so if we are summing a v16i8 into an i32, we can still use the same instructions. In terms of IR, this looks like a sext of a legal type (v16i8) into a very illegal type (v16i32) and a vecreduce.add of that into the result. This means we have to catch the pattern early in a DAG combine, producing a target VADDVs/u node, where the signedness is now important. This is the first part, handling VADDV and VADDVA. There are also VADDLV/VADDLVA instructions, which are interesting because they sum into a 64bit value. And VMLAV and VMLALV, which are interesting because they also do a multiply of two values. It may look a little odd in places as a result. On its own this will probably not do very much, as the vectorizer will not produce this IR yet. Differential Revision: https://reviews.llvm.org/D74218	2020-02-19 09:45:35 +00:00
Petar Avramovic	92c80529dd	[MIPS GlobalISel] RegBankSelect G_MERGE_VALUES and G_UNMERGE_VALUES Consider large operands in G_MERGE_VALUES and G_UNMERGE_VALUES as Ambiguous during regbank selection. Introducing new InstType AmbiguousWithMergeOrUnmerge which will allow us to recognize whether to narrow scalar or use s64:fprb. This change exposed a bug when reusing data from TypeInfoForMF. Thus when Instr is about to get destroyed (using narrow scalar) clear its data in TypeInfoForMF. Internal data is saved based on Instr's address, and it will no longer be valid. Add detailed asserts for InstType and operand size. Generate generic instructions instead of MIPS target instructions during argument lowering and custom legalizer. Select G_UNMERGE_VALUES and G_MERGE_VALUES when proper banks are selected: {s32:gprb, s32:gprb, s64:fprb} for G_UNMERGE_VALUES and {s64:fprb, s32:gprb, s32:gprb} for G_MERGE_VALUES. Update tests. One improvement is when floating point argument in gpr(or two gprs) gets passed to another function through gpr unnecessary fpr-to-gpr moves are no longer generated. Differential Revision: https://reviews.llvm.org/D74623	2020-02-19 10:09:52 +01:00
Craig Topper	f69a29da5a	[X86] Remove vXi1 select optimization from LowerSELECT. Move it to DAG combine.	2020-02-19 00:00:55 -08:00
Craig Topper	0dbc4658d8	[X86] Handle splats in LowerBUILD_VECTORvXi1 by directly emitting scalar selects instead of deferring that to LowerSELECT. LowerSELECT will detect the constant inputs and convert to scalar selects, but we can do it directly here. I might remove some of the code from LowerSELECT and move it to DAG combine so doing this explicitly will make us less dependent on it happening in lowering.	2020-02-18 22:39:30 -08:00
Thomas Lively	ca9ba76481	[WebAssembly] Replace all calls with generalized multivalue calls Summary: Extends the multivalue call infrastructure to tail calls, removes all legacy calls specialized for particular result types, and removes the CallIndirectFixup pass, since all indirect call arguments are now fixed up directly in the post-insertion hook. In order to keep supporting pretty-printed defs and uses in test expectations, MCInstLower now inserts an immediate containing the number of defs for each call and call_indirect. The InstPrinter is updated to query this immediate if it is present and determine which MCOperands are defs and uses accordingly. Depends on D72902. Reviewers: aheejin Subscribers: dschuff, mgorny, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74192	2020-02-18 15:55:20 -08:00
Thomas Lively	5286180999	[WebAssembly] Fix RegStackify and ExplicitLocals to handle multivalue Summary: There is still room for improvement in the handling of multivalue nodes in both passes, but the current algorithm is at least correct and optimizes some simpler cases. In order to make future optimizations of these passes easier and build confidence that the current algorithms are correct, this CL also adds a script that automatically and exhaustively generates interesting multivalue test cases. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72902	2020-02-18 14:56:09 -08:00
Reid Kleckner	0c2b09a9b6	[IR] Lazily number instructions for local dominance queries Essentially, fold OrderedBasicBlock into BasicBlock, and make it auto-invalidate the instruction ordering when new instructions are added. Notably, we don't need to invalidate it when removing instructions, which is helpful when a pass mostly deletes dead instructions rather than transforming them. The downside is that Instruction grows from 56 bytes to 64 bytes. The resulting LLVM code is substantially simpler and automatically handles invalidation, which makes me think that this is the right speed and size tradeoff. The important change is in SymbolTableTraitsImpl.h, where the numbering is invalidated. Everything else should be straightforward. We probably want to implement a fancier re-numbering scheme so that local updates don't invalidate the ordering, but I plan for that to be future work, maybe for someone else. Reviewed By: lattner, vsk, fhahn, dexonsmith Differential Revision: https://reviews.llvm.org/D51664	2020-02-18 14:44:24 -08:00
Thomas Lively	9d37f5afac	[WebAssembly] Implement multivalue call_indirects Summary: Unlike normal calls, call_indirects have immediate arguments that caused a MachineVerifier failure without a small tweak to loosen the verifier's requirements for variadicOpsAreDefs instructions. One nice thing about the new call_indirects is that they do not need to participate in the PCALL_INDIRECT mechanism because their post-isel hook handles moving the function pointer argument and adding the flags and typeindex arguments itself. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74191	2020-02-18 13:49:46 -08:00
Thomas Lively	d51910967f	Reland "[WebAssembly] Split and recombine multivalue calls for ISel" This reverts commit `8acedb595d` and relands a prerequisite for the patch series culminating in https://reviews.llvm.org/D74192.	2020-02-18 13:49:46 -08:00
Thomas Lively	7b64a59060	Reland "[WebAssembly][InstrEmitter] Foundation for multivalue call lowering" This reverts commit `649aba93a2`, now that the approach started there has been shown to be workable in the patch series culminating in https://reviews.llvm.org/D74192.	2020-02-18 13:49:46 -08:00
Simon Pilgrim	d6eef0614f	[TargetLowering] Add SimplifyMultipleUseDemandedBits 'all elements' helper wrapper. NFC.	2020-02-18 19:53:50 +00:00
Craig Topper	89ab5c69c8	[X86] Add a helper function to pull some repeated code out of combineGatherScatter. NFC	2020-02-18 11:10:40 -08:00
Huihui Zhang	8ee0e1dc02	[NFC] Silence compiler warning [-Wmissing-braces].	2020-02-18 10:37:12 -08:00
Stanislav Mekhanoshin	dd4766451e	[AMDGPU] Use generated RegisterPressureSets enum Differential Revision: https://reviews.llvm.org/D74671	2020-02-18 10:34:03 -08:00
Matt Arsenault	f4d3765fd9	CodeGen: Move undef_tied_input declaration This doesn't belong in ARM specific code since it's generally recognized by tablegen.	2020-02-18 10:33:10 -08:00
Mikhail Maltsev	63809d365e	[ARM,MVE] Add vbrsrq intrinsics family Summary: This patch adds a new MVE intrinsics family, `vbrsrq`: vector bit reverse and shift right. The intrinsics are compiled into the VBRSR instruction. Two new LLVM IR intrinsics were also added: arm.mve.vbrsr and arm.mve.vbrsr.predicated. Reviewers: simon_tatham, dmgreen, ostannard, MarkMurrayARM Reviewed By: simon_tatham Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74721	2020-02-18 17:31:21 +00:00
Sean Fertile	3126b556d1	[PowerPC][NFC] Add defines to help creating the SpillSlot arrays. Create preprocessor defines for callee saved floating-point register spill slots, vector register spill slots, and both 32-bit and 64-bit general purpose register spill slots. This is an NFC refactor to prepare for adding ABI compliant callee saves and restores for AIX.	2020-02-18 11:52:04 -05:00
Andrew Wei	4ca753f4e3	[RISCV] Implement mayBeEmittedAsTailCall for tail call optimization Implement TargetLowering callback mayBeEmittedAsTailCall for riscv in CodeGenPrepare, which will duplicate return instructions to enable tailcall optimization. Differential Revision: https://reviews.llvm.org/D73699	2020-02-18 23:56:42 +08:00
Sander de Smalen	8fbc925807	Add OffsetIsScalable to getMemOperandWithOffset Summary: Making `Scale` a `TypeSize` in AArch64InstrInfo::getMemOpInfo, has the effect that all places where this information is used (notably, TargetInstrInfo::getMemOperandWithOffset) will need to consider Scale - and derived, Offset - possibly being scalable. This patch adds a new operand `bool &OffsetIsScalable` to TargetInstrInfo::getMemOperandWithOffset and fixes up all the places where this function is used, to consider the offset possibly being scalable. In most cases, this means bailing out because the algorithm does not (or cannot) support scalable offsets in places where it does some form of alias checking for example. Reviewers: rovka, efriedma, kristof.beyls Reviewed By: efriedma Subscribers: wuzish, kerbowa, MatzeB, arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, javed.absar, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72758	2020-02-18 15:53:29 +00:00
Djordje Todorovic	2bf44d11cb	Revert "Reland "[DebugInfo] Enable the debug entry values feature by default"" This reverts commit rGa82d3e8a6e67.	2020-02-18 16:38:11 +01:00
Kazushi (Jam) Marukawa	5526786a56	[VE] TLS codegen Summary: Codegen and tests for thread-local storage. This implements only the general dynamic model due to limitations in nld 2.26. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D74718	2020-02-18 16:09:12 +01:00
Luke Geeson	4518aab289	[AArch64] Add Cortex-A34 Support for clang and llvm This patch upstreams support for the AArch64 Armv8-A cpu Cortex-A34. In detail adding support for: - mcpu option in clang - AArch64 Target Features in clang - llvm AArch64 TargetParser definitions details of the cpu can be found here: https://developer.arm.com/ip-products/processors/cortex-a/cortex-a34 Reviewers: SjoerdMeijer Reviewed By: SjoerdMeijer Subscribers: SjoerdMeijer, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74483 Change-Id: Ida101fc544ca183a0a0e61a1277c8957855fde0b	2020-02-18 14:56:16 +00:00
Matt Arsenault	37c452a289	AMDGPU/GlobalISel: Adjust branch target when lowering loop intrinsic This needs to steal the branch target like the other control flow intrinsics.	2020-02-18 06:35:40 -08:00
Djordje Todorovic	a82d3e8a6e	Reland "[DebugInfo] Enable the debug entry values feature by default" This patch enables the debug entry values feature. - Remove the (CC1) experimental -femit-debug-entry-values option - Enable it for x86, arm and aarch64 targets - Resolve the test failures - Leave the llc experimental option for targets that do not support the CallSiteInfo yet Differential Revision: https://reviews.llvm.org/D73534	2020-02-18 14:41:08 +01:00
Kerry McLaughlin	d4576080da	[AArch64][SVE] Add remaining SVE2 intrinsics for widening DSP operations Summary: Implements the following intrinsics: - llvm.aarch64.sve.[s\|u]mullb_lane - llvm.aarch64.sve.[s\|u]mullt_lane - llvm.aarch64.sve.sqdmullb_lane - llvm.aarch64.sve.sqdmullt_lane - llvm.aarch64.sve.[s\|u]addwb - llvm.aarch64.sve.[s\|u]addwt - llvm.aarch64.sve.[s\|u]shllb - llvm.aarch64.sve.[s\|u]shllt - llvm.aarch64.sve.[s\|u]subwb - llvm.aarch64.sve.[s\|u]subwt Reviewers: sdesmalen, dancgr, efriedma, c-rhodes, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73903	2020-02-18 10:28:00 +00:00
Mikhail Maltsev	58f66f8af0	[ARM,CDE] Cosmetic changes, additional driver tests Summary: This is a follow-up patch addressing post-commit comments in https://reviews.llvm.org/D74044: * Add more Clang driver tests (-march=armv8.1m.main and -march=armv8.1m.main+mve.fp) * Clang-format a chunk in ARMAsmParser.cpp * Add a missing copyright header to ARMInstrCDE.td Reviewers: SjoerdMeijer, simon_tatham, dmgreen Reviewed By: SjoerdMeijer Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74732	2020-02-18 10:23:09 +00:00
Simon Tatham	c32af4447f	[ARM,MVE] Add the vmovnbq,vmovntq intrinsic family. Summary: These are in some sense the inverse of vmovl[bt]q: they take a vector of n wide elements and truncate each to half its width. So they only write half a vector's worth of output data, and therefore they also take an 'inactive' parameter to provide the other half of the data in the output vector. So vmovnb overwrites the even lanes of 'inactive' with the narrowed values from the main input, and vmovnt overwrites the odd lanes. LLVM had existing codegen which generates these MVE instructions in response to IR that takes two vectors of wide elements, or two vectors of narrow ones. But in this case, we have one vector of each. So my clang codegen strategy is to narrow the input vector of wide elements by simply reinterpreting it as the output type, and then we have two narrow vectors and can represent the operation as a vector shuffle that interleaves lanes from both of them. Even so, not all the cases I needed ended up being selected as a single MVE instruction, so I've added a couple more patterns that spot combinations of the 'MVEvmovn' and 'ARMvrev32' SDNodes which can be generated as a VMOVN instruction with operands swapped. This commit adds the unpredicated forms only. Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74337	2020-02-18 09:34:50 +00:00
Simon Tatham	5e97940cd2	[ARM,MVE] Add the vmovlbq,vmovltq intrinsic family. Summary: These intrinsics take a vector of 2n elements, and return a vector of n wider elements obtained by sign- or zero-extending every other element of the input vector. They're represented in IR as a shufflevector that extracts the odd or even elements of the input, followed by a sext or zext. Existing LLVM codegen already matches this pattern and generates the VMOVLB instruction (which widens the even-index input lanes). But no existing isel rule was generating VMOVLT, so I've added some. However, the new rules currently only work in little-endian MVE, because the pattern they expect from isel lowering includes a bitconvert which doesn't have the right semantics in big-endian. The output of one existing codegen test is improved by those new rules. This commit adds the unpredicated forms only. Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74336	2020-02-18 09:34:50 +00:00
Simon Tatham	9dcc1667ab	[ARM] Allow `ARMVectorRegCast` to match bitconverts too. (NFC) Summary: When we start putting instances of `ARMVectorRegCast` in complex isel patterns, it will be awkward that they're often turned into the more standard `bitconvert` in little-endian mode. We'd rather not have to write separate isel patterns for the two endiannesses, matching different but equivalent cast operations. This change aims to fix that awkwardness in advance, by turning the Tablegen record `ARMVectorRegCast` from a simple `SDNode` instance into a `PatFrags` that can match either kind of cast – with a predicate that prevents it matching a bitconvert in the big-endian case, where bitconvert isn't semantically identical. No existing code generation should be affected by this change, but it will enable the patterns introduced by D74336 to work in both endiannesses. Reviewers: dmgreen Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74716	2020-02-18 09:34:50 +00:00
Simon Tatham	68b49f7ef4	[ARM,MVE] Add intrinsics vclzq and vclsq. Summary: vclzq maps nicely to the existing target-independent @llvm.ctlz IR intrinsic. But vclsq ('count leading sign bits') has no corresponding target-independent intrinsic, so I've made up @llvm.arm.mve.vcls. This commit adds the unpredicated forms only. Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74335	2020-02-18 09:34:50 +00:00
Simon Tatham	c8b3196e54	[ARM,MVE] Add intrinsics for FP rounding operations. Summary: This adds the unpredicated forms of six different MVE intrinsics which all round a vector of floating-point numbers to integer values, leaving them still in FP format, differing only in rounding mode and exception settings. Five of them map to existing target-independent intrinsics in LLVM IR, such as @llvm.trunc and @llvm.rint. The sixth, mapping to the `vrintn` instruction, is done by inventing a target-specific intrinsic. (`vrintn` behaves the same as `vrintx` in terms of the output value: the side effects on the FPSCR flags are the only difference between the two. But ACLE specifies separate user-callable intrinsics for the two, so the side effects matter enough to make sure we generate the right one of the two instructions in each case.) Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74333	2020-02-18 09:34:50 +00:00
Craig Topper	e90dc7c48b	[X86] Move avx512 code that forces zeros to the false side of vselects above a check for legal types. This helps this transform occur earlier so we can fold the not with setcc. If we delay it until after type legalization we might have introduced instructions to widen the mask if the vselect was widened. This can prevent the not from making it to the setcc. We could of course add more DAG combines to handle that, but moving this earlier is easier.	2020-02-17 22:24:21 -08:00
Craig Topper	b0840934a7	[X86] Use isScalarFPTypeInSSEReg to simplify code in LowerSELECT. NFC	2020-02-17 19:43:57 -08:00
Jim Lin	fa75bffbbb	[XCore][NFC] Remove trailing space	2020-02-18 10:32:58 +08:00
Craig Topper	3f4490d384	[X86] Add one use check to '0-x == y --> x+y == 0' in EmitCmp. I failed to copy it when I moved this in `b62de210cf`	2020-02-17 18:16:42 -08:00
Stanislav Mekhanoshin	8e760e1018	[TBLGEN] Inhibit generation of unneeded psets Differential Revision: https://reviews.llvm.org/D74744	2020-02-17 15:38:08 -08:00
Craig Topper	68400a2308	[X86] Add missing isel pattern for BLCFILL producing flags.	2020-02-17 13:20:13 -08:00
Matt Arsenault	5e8792453d	AMDGPU/GlobalISel: Fix RegBankSelect for G_SHUFFLE_VECTOR	2020-02-17 15:11:25 -05:00
Matt Arsenault	f742a28ae3	AMDGPU/GlobalISel: Custom lower 32-bit G_SDIV/G_SREM	2020-02-17 15:09:51 -05:00
Matt Arsenault	e240b27d6d	AMDGPU/GlobalISel: Allow arbitrary global values Treat unknown address spaces as global	2020-02-17 11:32:28 -08:00
Craig Topper	43e948c4b7	[X86] Change how the alignment for the stack object is created in LowerFLT_ROUNDS_. We don't need FrameInfo's concept of the stack alignment. We just need to tell it the desired alignment. Which in this case is 2.	2020-02-17 11:27:34 -08:00
Craig Topper	b62de210cf	[X86] Move '0-x == y --> x+y == 0' and similar combines to EmitCmp. AArch64 handles this pattern in their lowering code. By emitting CMN. ARM handles it as an isel pattern.	2020-02-17 11:27:34 -08:00
Matt Arsenault	54137bbaaf	GlobalISel: Allow running localizer earlier This required legal and regbankselected MIR for seemingly no reason. For AMDGPU this wouldn't see legalized G_GLOBAL_VALUEs.	2020-02-17 11:24:06 -08:00
Matt Arsenault	96db12d507	AMDGPU/GlobalISel: Custom lower 32-bit G_UDIV/G_UREM AMDGPUCodeGenPrepare expands this most of the time, but not always. We will always at least need a fallback option here. This is the 3rd implementation of the same expansion in the backend. Eventually I would like to eliminate the IR expansion (and the DAG version obviously). Currently the new legalizer path produces a better result, since the IR expansion results in extra operations which need to be combined out. Notably, the IR expansion results in multiplies by 0.	2020-02-17 11:05:50 -08:00
Matt Arsenault	0e2eb357e0	GlobalISel: Extend narrowing to G_ASHR	2020-02-17 10:42:59 -08:00
John Brawn	594a89f727	[FPEnv][ARM] Don't call mutateStrictFPToFP when lowering mutateStrictFPToFP can delete the node and replace it with another with the same value which can later cause problems, and returning the result of mutateStrictFPToFP doesn't work because SelectionDAGLegalize expects that the returned value has the same number of results as the original. Instead handle things by doing the mutation manually. Differential Revision: https://reviews.llvm.org/D74726	2020-02-17 18:19:25 +00:00
Mikhail Maltsev	489f62e801	[ARM,MVE] Add vector-scalar intrinsics Summary: This patch adds vector-scalar variants to the following families of MVE intrinsics: * vaddq * vsubq * vmulq * vqaddq * vqsubq * vhaddq * vhsubq * vqdmulhq * vqrdmulhq The vector-scalar variants perform a splat operation on the scalar operand and then perform the same operations as their vector-vector counterparts. Code generation is done accordingly (using LLVM IR 'insert' and 'shuffle' operations which are later converted into an ARMvdup SDNode). Reviewers: simon_tatham, dmgreen, MarkMurrayARM, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74620	2020-02-17 17:47:05 +00:00
Nikita Popov	80397d2d12	[IRBuilder] Delete copy constructor D73835 will make IRBuilder no longer trivially copyable. This patch deletes the copy constructor in advance, to separate out the breakage. Currently, the IRBuilder copy constructor is usually used by accident, not by intention. In rG7c362b25d7a9 I've fixed a number of cases where functions accepted IRBuilder rather than IRBuilder &, thus performing an unnecessary copy. In rG5f7b92b1b4d6 I've fixed cases where an IRBuilder was copied, while an InsertPointGuard should have been used instead. The only non-trivial use of the copy constructor is the getIRBForDbgInsertion() helper, for which I separated construction and setting of the insertion point in this patch. Differential Revision: https://reviews.llvm.org/D74693	2020-02-17 18:14:48 +01:00
Nikita Popov	98ed613ccc	[IRBuilder] Avoid passing IRBuilder by value; NFC I've fixed most of these before, but missed some occurrences in targets I don't usually build.	2020-02-17 18:14:47 +01:00
Matt Arsenault	8550859535	GlobalISel: Extend shift narrowing to G_SHL	2020-02-17 09:13:37 -08:00
Matt Arsenault	d9e8b2cbcc	AMDGPU/GlobalISel: Skip DAG hack passes on selected functions The way fallback to SelectionDAG works is somewhat surprising to me. When the fallback path is enabled, the entire set of SelectionDAG selector passes is added to the pass pipeline, and each one needs to check if the function was selected. This results in the surprising behavior of running SIFixSGPRCopies for example, but only if -global-isel-abort=2 is used. SIAddIMGInitPass is also added in addInstSelector, but I'm not sure why we have this pass or if it should be added somewhere else for GlobalISel.	2020-02-17 08:33:17 -08:00
Matt Arsenault	78d455adf0	GlobalISel: Add combine to narrow G_LSHR Produce an unmerge to a narrower type and introduce a narrower shift if needed. I wasn't sure if there was a better way to parameterize the target's preferred shift type for the GICombineRule, so manually call the combine helper.	2020-02-17 08:04:52 -08:00
Matt Arsenault	86813e2768	AMDGPU/GlobalISel: Select llvm.amdgcn.s.buffer.load Doesn't try to fail on the dlc bit pre-gfx10 like the DAG lowering does.	2020-02-17 08:02:40 -08:00
Mikhail Maltsev	dd4d093762	[ARM] Add initial support for Custom Datapath Extension (CDE) Summary: This patch adds assembly-level support for a new Arm M-profile architecture extension, Custom Datapath Extension (CDE). A brief description of the extension is available at https://developer.arm.com/architectures/instruction-sets/custom-instructions The latest specification for CDE is currently a beta release and is available at https://static.docs.arm.com/ddi0607/aa/DDI0607A_a_armv8m_arm_supplement_cde.pdf CDE allows chip vendors to add custom CPU instructions. The CDE instructions re-use the same encoding space as existing coprocessor instructions (such as MRC, MCR, CDP etc.). Each coprocessor in range cp0-cp7 can be configured as either general purpose (GCP) or custom datapath (CDEv1). This configuration is defined by the CPU vendor and is provided to LLVM using 8 subtarget features: cdecp0 ... cdecp7. The semantics of CDE instructions are implementation-defined, but the instructions are guaranteed to be pure (that is, they are stateless, they do not access memory or any registers except their explicit inputs/outputs). CDE requires the CPU to support at least Armv8.0-M mainline architecture. CDE includes 3 sets of instructions: * Instructions that operate on general purpose registers and NZCV flags * Instructions that operate on the S or D register file (require either FP or MVE extension) * Instructions that operate on the Q register file, require MVE The user-facing names that can be specified on the command line are the same as the 8 subtarget feature names. For example: $ clang -target arm-none-none-eabi -march=armv8m.main+cdecp0+cdecp3 tells the compiler that the coprocessors 0 and 3 are configured as CDEv1 and the remaining coprocessors are configured as GCP (which is the default). Reviewers: simon_tatham, ostannard, dmgreen, eli.friedman Reviewed By: simon_tatham Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74044	2020-02-17 15:39:16 +00:00
Matt Arsenault	5fdc9851d0	AMDGPU/GlobalISel: Run the localizer pass While looking at the output on real sized programs, there is a lot of extra SGPR spilling compared to the DAG path. This seems to largely be from all constants being SGPRs in the entry block.	2020-02-17 07:38:12 -08:00
Sander de Smalen	a7a96c726e	[AArch64] Implement passing SVE vectors by ref for AAPCS. Summary: This patch implements the part of the calling convention where SVE Vectors are passed by reference. This means the caller must allocate stack space for these objects and pass the address to the callee. Reviewers: efriedma, rovka, cameron.mcinally, c-rhodes, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71216	2020-02-17 15:20:28 +00:00
Benjamin Kramer	f4c59c0f97	[wasm] Unbreak after `5fc5c7db38`. NFCI.	2020-02-17 15:49:49 +01:00
Benjamin Kramer	5fc5c7db38	Strength reduce vectors into arrays. NFCI.	2020-02-17 15:37:35 +01:00
Matt Arsenault	e5805529bf	AMDGPU/GlobalISel: Select v2s32->v2s16 G_TRUNC It would be nice if there was a way to avoid the tied operand, but as far as I can tell there isn't a way to use or with op_sel to achieve this	2020-02-17 09:20:13 -05:00
Matt Arsenault	361f2a7818	AMDGPU/GlobalISel: Handle sbfe/ubfe intrinsic Try to handle arbitrary scalar BFEs by packing the operands. The DAG gives up on non-constant arguments. We're still missing any constant folding, so we end up with pretty ugly code most of the time. Also handle the 64-bit scalar case, which the DAG doesn't try to do.	2020-02-17 09:20:13 -05:00
Kerry McLaughlin	633db60f3e	[AArch64][SVE] Add SVE index intrinsic Summary: Implements the @llvm.aarch64.sve.index intrinsic, which takes a scalar base and step value. This patch also adds the printSImm function to AArch64InstPrinter to ensure that immediates of type i8 & i16 are printed correctly. Reviewers: sdesmalen, andwar, efriedma, dancgr, cameron.mcinally, rengolin Reviewed By: cameron.mcinally Subscribers: tatyana-krasnukha, tschuett, kristof.beyls, hiraditya, rkruppe, arphaman, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74550	2020-02-17 10:30:11 +00:00
Sjoerd Meijer	e5043cd3c2	[AArch64] Fix small typos in the target description. NFC. Patch by Tamas Petz. Differential Revision: https://reviews.llvm.org/D74603	2020-02-17 10:13:47 +00:00
QingShan Zhang	113df90388	[PowerPC] Add the missing InstrAliasing for 64-bit rotate instructions We have the InstAlias rules for 32-bit rotate but missing the 64-bit one. Rotate left immediate rotlwi ra,rs,n rlwinm ra,rs,n,0,31 Rotate left rotlw ra,rs,rb rlwnm ra,rs,rb,0,31 Differential Revision: https://reviews.llvm.org/D72676	2020-02-17 05:42:49 +00:00
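
The 32-bit aliases quoted in commit 113df90388 (rotlwi ra,rs,n as rlwinm ra,rs,n,0,31) can be illustrated with a small model. This is only a sketch of the instruction semantics as described in the PowerPC ISA, not code from the patch; the helper names `rotl32`, `ppc_mask`, `rlwinm` and `rotlwi` are hypothetical and exist only for this illustration.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value, n):
    """Rotate a 32-bit value left by n bits (n taken modulo 32)."""
    n %= 32
    value &= MASK32
    return ((value << n) | (value >> (32 - n))) & MASK32


def ppc_mask(mb, me):
    """Build the rlwinm mask: ones from bit mb through bit me.

    Bits use PowerPC numbering, where bit 0 is the most significant bit
    of the 32-bit word. If mb > me the mask wraps around the word.
    """
    def bit(i):
        return 1 << (31 - i)

    mask = 0
    i = mb
    while True:
        mask |= bit(i)
        if i == me:
            break
        i = (i + 1) % 32
    return mask


def rlwinm(rs, sh, mb, me):
    """Model of 'rotate left word immediate then AND with mask'."""
    return rotl32(rs, sh) & ppc_mask(mb, me)


def rotlwi(rs, n):
    """The alias: with mask bits 0..31 nothing is cleared, so only the rotate remains."""
    return rlwinm(rs, n, 0, 31)
```

With the full mask 0..31 the AND is a no-op, which is why the alias can be expressed purely as an assembler-level rewrite. The 64-bit aliases that the commit adds are not modeled here.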


56350 Commits