llvm-project

Commit Graph

Author	SHA1	Message	Date
Sam Parker	414dd1c946	[NFC][ARM][ParallelDSP] Cleanup of BinOpChain - Remove some unused typedefs. - Rename BinOpChain struct to MulCandidate. - Remove the size method of MulCandidate. - Store only the first input of the ValueList provided to MulCandidate, as it's the only value we care about. This means we don't have to perform any ugly (and unnecessary) iterations of the list later on. llvm-svn: 367208	2019-07-29 08:41:51 +00:00
David Stuttard	20235ef3e7	[AMDGPU] Enable v4f16 and above for v_pk_fma instructions Summary: If isel is presented with <2 x half> vectors then it will correctly select v_pk_fma style instructions. If isel is presented with e.g. <4 x half> vectors it will scalarize, unlike for other instruction types (such as fadd, fmul etc.) Added extra support to enable this. Updated one of the tests to include a test for this (as well as extending the test to GFX9) Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65325 Change-Id: I50a4577a3f8223fb53992af3b7d26121f65b71ee llvm-svn: 367206	2019-07-29 08:15:10 +00:00
Sam Parker	8538060103	[NFC][ARM][ParallelDSP] Remove AreSymmetrical We explicitly search for a parallel mac and we only care about its inputs; checking for symmetry doesn't add anything here. llvm-svn: 367205	2019-07-29 08:12:24 +00:00
Sam Parker	11ad33ede6	[NFC][ARM][ParallelDSP] Remove PopulateLoads We no longer have to check what loads are used; all this is performed at the start of the transform, so it's not doing anything now. llvm-svn: 367204	2019-07-29 08:07:23 +00:00
Craig Topper	eb1beabad9	[X86] Don't use PMADDWD for vector add reductions of multiplies if the mul inputs have an additional user. The pmaddwd inserts a truncate; if that truncate would end up creating additional instructions instead of making a zext narrower, then we shouldn't do it. I've restricted this to only sse4.1 targets since on prior targets the zext will be done in stages. So the truncate will probably not create additional instructions. Might need some more investigation of mul shrinking and the other pmaddwd transform to be sure this is the right decision. There might be a slight regression on AVX1 targets due to add splitting. Hard to say for sure. Maybe we need to look into using the vector reduction flag to use 2 narrow loads and a blend instead of extracting and inserting. llvm-svn: 367198	2019-07-29 01:36:58 +00:00
Craig Topper	894916cac9	[X86] In combineLoopMAddPattern and combineLoopSADPattern, preserve the vector reduction flag on the final add. Handle unrolled loops by letting DAG combine revisit. This reverts r340478 and r340631 and replaces them with a simpler method of just letting DAG combine revisit the nodes to handle the other operand. llvm-svn: 367195	2019-07-28 18:45:42 +00:00
David Green	b8b8b46a51	[ARM] MVE VPNOT This adds the patterns required to transform xor P0, -1 to a VPNOT. The instruction operands have to change a little for this, adding an in and an out VCCR reg and using a custom DecodeMVEVPNOT for the decode. Differential Revision: https://reviews.llvm.org/D65133 llvm-svn: 367192	2019-07-28 14:07:48 +00:00
David Green	9cf344e739	[ARM] Better patterns for fp <> predicate vectors These are some better patterns for converting between predicates and floating points. Much like the extends, we select "1"/"-1" or "0" depending on the predicate value. Or we perform a compare against 0 to convert to a predicate. Differential Revision: https://reviews.llvm.org/D65103 llvm-svn: 367191	2019-07-28 13:53:39 +00:00
Simon Pilgrim	353a848473	[X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler (Reapplied) Recommit rL367100 which was reverted at rL367141. Until PR42777 is fixed, we no longer get the benefits of peeking through bitcasts but it does still remove a GetDemandedBits user and gives us the equivalent combines. llvm-svn: 367172	2019-07-27 13:30:29 +00:00
Amara Emerson	7bc4fad0fb	[AArch64][GlobalISel] Implement narrowing of G_SEXT. We need this to narrow a sext to s128. Differential Revision: https://reviews.llvm.org/D65357 llvm-svn: 367164	2019-07-26 23:46:38 +00:00
Jessica Paquette	aa8b9993c2	[AArch64][GlobalISel] Select @llvm.aarch64.stlxr for 32-bit pointers Add partial instruction selection for intrinsics like this: ``` declare i32 @llvm.aarch64.stlxr(i64, i32*) ``` (This only handles the case where a G_ZEXT is feeding the intrinsic.) Also make sure that the added store instruction actually has the memory op from the original G_STORE. Update select-stlxr-intrin.mir and arm64-ldxr-stxr.ll. Differential Revision: https://reviews.llvm.org/D65355 llvm-svn: 367163	2019-07-26 23:28:53 +00:00
Vlad Tsyrklevich	485b8789de	Revert "[X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler." This reverts r367100, it appears to be causing test failures after Nico's revert of r367091. llvm-svn: 367141	2019-07-26 18:14:21 +00:00
Sean Fertile	9df6177d38	[PowerPC][AIX]Add lowering of MCSymbol MachineOperand. Adds machine operand lowering for MCSymbolSDNodes to the PowerPC backend. This is needed to produce call instructions in assembly for AIX because the callee operand is a MCSymbolSDNode. The test is XFAIL'ed for asserts due to a (valid) assertion in PEI that the AIX ABI isn't supported yet. Differential Revision: https://reviews.llvm.org/D63738 llvm-svn: 367133	2019-07-26 17:25:27 +00:00
Michael Liao	711556e6a8	[AMDGPU] Fix typo. llvm-svn: 367131	2019-07-26 17:13:59 +00:00
Cullen Rhodes	2cde8b5db6	[AArch64][SVE2] Rename bitperm feature to sve2-bitperm Summary: The bitperm feature flag is now prefixed with SVE2, as it is for all other SVE2 extensions Patch by Maciej Gabka. Reviewers: sdesmalen, rovka, chill, SjoerdMeijer, rengolin Reviewed By: SjoerdMeijer, rengolin Differential Revision: https://reviews.llvm.org/D65327 llvm-svn: 367124	2019-07-26 15:57:50 +00:00
Sam Parker	3da59e5513	[ARM][ParallelDSP] Combine structs Combine OpChain and BinOpChain structs as OpChain is a base class to BinOpChain that is never used. llvm-svn: 367114	2019-07-26 14:11:40 +00:00
Sean Fertile	9bd22fec0d	[PowerPC] Add getCRSaveOffset to improve readability. [NFC] In preparation for AIX support in FrameLowering: replace a number of literal '8' that represent the stack offset of the condition register save area with a member in PPCFrameLowering. Patch by Chris Bowler. llvm-svn: 367111	2019-07-26 14:02:17 +00:00
Petar Avramovic	cf21794566	[MIPS GlobalISel] Fix check for void return during lowerCall A void return used to be represented by an unsigned virtual register with value 0, but with the addition of the Register class and the changes to lowerCall's arguments this is no longer valid. Check for void return by inspecting the Ty field in OrigRet. Differential Revision: https://reviews.llvm.org/D65321 llvm-svn: 367107	2019-07-26 13:19:37 +00:00
Carl Ritson	0b28357053	[AMDGPU] Move WQM/WWM intrinsic instruction selection to AMDGPUISelDAGToDAG Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65328 llvm-svn: 367105	2019-07-26 13:11:44 +00:00
Petar Avramovic	b1fc6f6130	[MIPS GlobalISel] Select inttoptr and ptrtoint Select G_INTTOPTR and G_PTRTOINT for MIPS32. Differential Revision: https://reviews.llvm.org/D65217 llvm-svn: 367104	2019-07-26 13:08:06 +00:00
Simon Pilgrim	d93e8ece7b	[X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler. This removes a GetDemandedBits user and allows us to benefit from the DemandedElts propagated through SimplifyDemandedBits. llvm-svn: 367100	2019-07-26 11:10:20 +00:00
Sam Parker	7440065bd8	[NFC][ARM][ParallelDSP] Cleanup isNarrowSequence Remove unused logic. llvm-svn: 367099	2019-07-26 10:57:42 +00:00
Carl Ritson	00e89b428b	[AMDGPU] Add llvm.amdgcn.softwqm intrinsic Add llvm.amdgcn.softwqm intrinsic which behaves like llvm.amdgcn.wqm only if there is other WQM computation in the shader. Reviewers: nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64935 llvm-svn: 367097	2019-07-26 09:54:12 +00:00
Momchil Velikov	898d953693	[AArch64] Define ETE and TRBE system registers Embedded Trace Extension and Trace Buffer Extension are optional future architecture extensions. (cf. https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools) Their system registers are documented here: https://developer.arm.com/docs/ddi0601/a ETE shares register names with ETM. One exception is the ETE TRCEXTINSELR0 register, which has the same encoding as the ETM TRCEXTINSELR register (but different semantics). This patch treats them as aliases: the assembler will accept both names, emitting identical encoding, and the disassembler will keep disassembling to TRCEXTINSELR. Differential Revision: https://reviews.llvm.org/D63707 llvm-svn: 367093	2019-07-26 09:19:08 +00:00
Sam Parker	c760b5da11	[ARM][LowOverheadLoops] Add CPSR defs Both WhileLoopStart and LoopEnd may get turned into a cmp and br pair, so add an implicit def to these pseudo instructions in case that WLS and LE aren't generated. Differential Revision: https://reviews.llvm.org/D65275 llvm-svn: 367089	2019-07-26 08:15:01 +00:00
Pengfei Wang	9ad565f70e	[WinEH] Allocate space in funclets stack to save XMM CSRs Summary: This is an alternate approach to D57970. Currently funclets reuse the same stack slots that are used in the parent function for saving callee-saved xmm registers. If the parent function modifies a callee-saved xmm register before an exception is thrown, the catch handler will overwrite the original saved value. This patch allocates space in the funclet's stack for saving callee-saved xmm registers and uses RSP instead of RBP to access memory. Reviewers: andrew.w.kaylor, LuoYuanke, annita.zhang, craig.topper, RKSimon Subscribers: rnk, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63396 Signed-off-by: pengfei <pengfei.wang@intel.com> llvm-svn: 367088	2019-07-26 07:33:15 +00:00
Matt Arsenault	a9ea8a9aae	AMDGPU/GlobalISel: Handle most function return types handleAssignments gives up pretty easily on structs, and i8 values for some reason. The other case that doesn't work is when an implicit sret needs to be inserted if the return size exceeds the number of return registers. llvm-svn: 367082	2019-07-26 02:36:05 +00:00
Amara Emerson	c07fe307b4	[AArch64][GlobalISel] Simplify zext/sext selection, use MachineIRBuilder. NFC. llvm-svn: 367075	2019-07-26 00:01:09 +00:00
Yonghong Song	329abf2939	[BPF] fix typedef issue for offset relocation Currently, the CO-RE offset relocation does not work if any struct/union member or array element is a typedef. For example, typedef const int arr_t[7]; struct input { arr_t a; }; func(...) { struct input *in = ...; ... __builtin_preserve_access_index(&in->a[1]) ... } The BPF backend calculated default offset is 0 while 4 is the correct answer. Similar issues exist for struct/union typedef's. When getting struct/union member or array element type, we should trace down to the type by skipping typedef and qualifiers const/volatile as this is what clang did to generate getelementptr instructions. (const/volatile member type qualifiers are already ignored by clang.) This patch fixed this issue, for each access index, skipping typedef and const/volatile/restrict BTF types. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D65259 llvm-svn: 367062	2019-07-25 21:47:27 +00:00
Amara Emerson	e54dc6b8b5	[AArch64][GlobalISel] Fix G_SELECT legalization fallback after r366943. Changes the order of legalization of G_ICMP suggested by Petar in D65079. llvm-svn: 367060	2019-07-25 21:44:52 +00:00
Yonghong Song	d8efec97be	[BPF] fix CO-RE incorrect index access string Currently, we expect the CO-RE offset relocation to record a string encoding the original getelementptr access index, so the kernel bpf loader can decode it correctly. For example, struct s { int a; int b; }; struct t { int c; int d; }; #define _(x) (__builtin_preserve_access_index(x)) int get_value(const void *addr1, const void *addr2); int test(struct s *arg1, struct t *arg2) { return get_value(_(&arg1->b), _(&arg2->d)); } We expect two offset relocations: reloc 1: type s, access index 0, 1 reloc 2: type t, access index 0, 1 Two globals are created to retain access indexes for the above two relocations with global variable names. The first global has a name "0:1:". Unfortunately, the second global has the name "0:1:.1" as the llvm internals automatically add suffix ".1" to a global with the same name. Later on, the BPF backend peels the last character and records "0:1" and "0:1:." in the relocation table. This is not desirable. BPF backend could use the global variable suffix knowledge to generate correct access str. This patch rather took an approach not relying on that knowledge. It generates "s:0:1:" and "t:0:1:" to avoid global variable suffixes and later on generates correct index access string "0:1" for both records. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D65258 llvm-svn: 367030	2019-07-25 16:01:26 +00:00
Michael Liao	53f967f2bd	[AMDGPU] Run `unreachable-mbb-elimination` after isel to clean up PHIs. Summary: - As LCSSA is turned on just before isel, it may create PHI of the flow, which is consumed by pseudo structurized CFG instructions. When those PHIs are eliminated at O0, a COPY may be placed incorrectly, as these pseudo structurized CFG instructions are treated as part of the MBB prologue. - Run extra `unreachable-mbb-elimination` at the end of isel to clean up PHIs. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64353 llvm-svn: 367023	2019-07-25 14:50:18 +00:00
Momchil Velikov	a655f476b0	[AArch64][SVE] Allow explicit size specifier for predicate operand ... for the vector forms of `{SQ,UQ,}{INC,DEC}P` instructions. Also continue supporting the existing behaviour of not requiring an explicit size specifier. The preferred disassembly is with the specifier. This is implemented by redefining instruction forms to require vector predicates with explicit size and adding aliases, which allow a predicate with no size. Differential Revision: https://reviews.llvm.org/D65145 llvm-svn: 367019	2019-07-25 13:56:04 +00:00
Matt Arsenault	a85af76c72	AMDGPU: Don't assert on v4f16 arguments to shader calling conventions llvm-svn: 367018	2019-07-25 13:55:07 +00:00
Simon Pilgrim	447fe31964	[X86] concatSubVectors - remove unnecessary args. NFCI. All these args can be cheaply recomputed and it makes it much easier to use the function as a quick helper. llvm-svn: 367014	2019-07-25 13:05:46 +00:00
Pablo Barrio	275954539d	[ARM][AArch64] Support for Cortex-A65 & A65AE, Neoverse E1 & N1 Summary: Add support for Cortex-A65, Cortex-A65AE, Neoverse E1 and Neoverse N1. Neoverse E1 and Cortex-A65(&AE) only implement the AArch64 state of the Arm architecture. Neoverse N1 implements both AArch32 and AArch64. Cortex-A65: https://developer.arm.com/ip-products/processors/cortex-a/cortex-a65 Cortex-A65AE: https://developer.arm.com/ip-products/processors/cortex-a/cortex-a65ae Neoverse E1: https://developer.arm.com/ip-products/processors/neoverse/neoverse-e1 Neoverse N1: https://developer.arm.com/ip-products/processors/neoverse/neoverse-n1 Patch by Diogo Sampaio and Pablo Barrio Reviewers: samparker, LukeCheeseman, sbaranga, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64406 llvm-svn: 367007	2019-07-25 10:59:45 +00:00
Kai Luo	985e52a4c1	[PowerPC][NFC] Make `getDefMIPostRA` public llvm-svn: 366995	2019-07-25 08:36:44 +00:00
Kai Luo	5c8af53806	[PowerPC][NFC] Added `getDefMIPostRA` method Summary: In PostRA phase, we often have to find out the most recent definition of a register. This patch adds getDefMIPostRA so that other methods can use it rather than implementing it repeatedly. Differential Revision: https://reviews.llvm.org/D65131 llvm-svn: 366990	2019-07-25 07:47:52 +00:00
Seiya Nuta	21277e3ec2	[MC] Add MCInstrAnalysis::evaluateMemoryOperandAddress Summary: Add a new method which tries to compute the target address referenced by an operand. This patch supports x86_64 RIP-relative addressing for now. It is necessary to print referenced symbol names in llvm-objdump. Reviewers: andreadb, MaskRay, grosbach, jgalenson, craig.topper Reviewed By: MaskRay, craig.topper Subscribers: bcain, rupprecht, jhenderson, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63847 llvm-svn: 366987	2019-07-25 06:57:09 +00:00
Eli Friedman	82e109279d	[ARM] Remove dead code from ARMConstantIslands. tLDRHi is not a pc-relative load; it can't directly refer to a constant pool or jump table. llvm-svn: 366963	2019-07-24 23:36:14 +00:00
Jessica Paquette	728b18f29f	[AArch64][GlobalISel] Select immediate modes for ADD when selecting G_GEP Before, we weren't able to select things like this for G_GEP: add x0, x8, #8 And instead we'd materialize the 8. This teaches GISel to do that. It gives some considerable code size savings on 252.eon-- about 4%! Differential Revision: https://reviews.llvm.org/D65248 llvm-svn: 366959	2019-07-24 23:11:01 +00:00
Amara Emerson	de81bd0faa	[AArch64][GlobalISel] Don't try to use GISel if subtarget doesn't have neon or fp. Throughout the legalizerinfo we currently make the assumption that the target has neon and FP target features available. Fixing it will require a refactor of the whole thing, so until then make sure we fall back. Works around PR42734 Differential Revision: https://reviews.llvm.org/D65244 llvm-svn: 366957	2019-07-24 23:00:04 +00:00
Roman Lebedev	017e272c3a	[Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold Summary: This was originally reported in D62818. https://rise4fun.com/Alive/oPH InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression will be hoisted out of a loop if `Y` is invariant and `X` is not. But as it is seen from the diffs here, if it didn't get hoisted, the produced assembly is almost universally worse. Much like with my recent "hoist add/sub by/from const" patches, we should get almost universal win if we hoist constant, there is almost always an "and/test by imm" instruction, but "shift of imm" not so much, so we may avoid having to materialize the immediate, and thus need one less register. And since we now shift not by constant, but by something else, the live-range of that something else may reduce. Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit` instruction pattern. And to not get into endless combine loop. Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm Reviewed By: spatel Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62871 llvm-svn: 366955	2019-07-24 22:57:22 +00:00
Jessica Paquette	68499112cf	[AArch64][GlobalISel] Fold G_MUL into XRO load addressing mode when possible If we have a G_MUL, and either the LHS or the RHS of that mul is the legal shift value for a load addressing mode, we can fold it into the load. This gives some code size savings on some SPEC tests. The best are around 2% on 300.twolf and 3% on 254.gap. Differential Revision: https://reviews.llvm.org/D65173 llvm-svn: 366954	2019-07-24 22:49:42 +00:00
Amara Emerson	13af1ed8e3	[GlobalISel] Support for inlining memcpy, memset and memmove calls. This introduces a new family of combiner helper routines that re-use the target specific cost model from SelectionDAG, and generate inline implementations of the memcpy family of intrinsics. The combines are only enabled at optimization levels higher than -O0, and give very substantial performance improvements. Differential Revision: https://reviews.llvm.org/D65167 llvm-svn: 366951	2019-07-24 22:17:31 +00:00
Stanislav Mekhanoshin	c43784ff26	[AMDGPU] Increase kernel padding To support prefetch mode 3 we need to pad current cacheline and fill 3 cachelines after. Current padding is only sufficient for mode 2. Differential Revision: https://reviews.llvm.org/D65236 llvm-svn: 366938	2019-07-24 19:40:13 +00:00
David Green	cd7a6fa314	[ARM] Rewrite how VCMP are lowered, using a single node This removes the VCEQ/VCNE/VCGE/VCEQZ/etc nodes, just using two called VCMP and VCMPZ with an extra operand as the condition code. I believe this will make some combines simpler, allowing us to just look at these codes and not the operands. It also helps fill in a missing VCGTUZ MVE selection without adding extra nodes for it. Differential Revision: https://reviews.llvm.org/D65072 llvm-svn: 366934	2019-07-24 17:36:47 +00:00
Simon Pilgrim	7d318b2bb1	[DAGCombine] matchBinOpReduction - add partial reduction matching This patch adds support for recognizing cases where a larger vector type is being used to reduce just the elements in the lower subvector: e.g. <8 x i32> reduction pattern in a <16 x i32> vector: <4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u> <2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u> <1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u> matchBinOpReduction returns the lower extracted subvector in such cases, assuming isExtractSubvectorCheap accepts the extraction. I've only enabled it for X86 reduction sums so far. I intend to enable it for the bitop/minmax cases in future patches, and eventually I think it's worth turning it on all the time. This is mainly just a case of ensuring calls to matchBinOpReduction don't make assumptions on the vector width based on the original vector extraction. Fixes the x86 partial reduction sum cases in PR33758 and PR42023. Differential Revision: https://reviews.llvm.org/D65047 llvm-svn: 366933	2019-07-24 17:29:56 +00:00
David Green	047a0b6575	[ARM] Disable MVE fptosi and friends This prevents us from trying to convert an i1 predicate vector to a float, or vice-versa. Better patterns are possible, which will follow in a subsequent commit. For now we just expand them. Differential Revision: https://reviews.llvm.org/D65066 llvm-svn: 366931	2019-07-24 17:26:26 +00:00
Jessica Paquette	c19c30776a	[AArch64][GlobalISel] Make vector dup optimization look at last elt of ZeroVec Fix an off-by-one error which made us not look at the last element of the zero vector. This caused a miscompile in 188.ammp. Differential Revision: https://reviews.llvm.org/D65168 llvm-svn: 366930	2019-07-24 17:18:51 +00:00
David Green	b342bddbe2	[ARM] More MVE compare vector splat combines for ANDs Adds some extra r register compare combines, this time for ANDs. Differential Revision: https://reviews.llvm.org/D65062 llvm-svn: 366928	2019-07-24 17:08:09 +00:00
David Green	93b5f61295	[ARM] MVE compare vector splat combine MVE VCMP instructions can use a general purpose register as the second operand. This adds the combines for it, selecting from a compare of a vdup. Differential Revision: https://reviews.llvm.org/D65061 llvm-svn: 366924	2019-07-24 16:58:41 +00:00
Dmitry Preobrazhensky	5e1dd02c90	[AMDGPU][MC][GFX10] Enabled GFX10 assembly with arbitrary wavesize assumed by the code Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D65216 llvm-svn: 366921	2019-07-24 16:50:17 +00:00
David Green	bab4d8ac5a	[ARM] Better OR's for MVE compares This adds a DeMorgan combine for OR's of compares to turn them into AND's, helping prevent them from going into and out of gpr registers. It also fills in the VCLE and VCLT nodes that MVE can select, allowing it to invert more compares. Differential Revision: https://reviews.llvm.org/D65059 llvm-svn: 366920	2019-07-24 16:42:09 +00:00
Stanislav Mekhanoshin	5cdacea297	[AMDGPU] Add all vgpr classes to asm parser Differential Revision: https://reviews.llvm.org/D65158 llvm-svn: 366917	2019-07-24 16:21:18 +00:00
Matt Arsenault	0e7d8698b5	AMDGPU/GlobalISel: Don't assume instruction can be erased when selecting exts The G_ANYEXT handling can end up reaching selectCOPY, which mutates the instruction in place. llvm-svn: 366915	2019-07-24 16:05:53 +00:00
David Green	69fba7434e	[ARM] Better AND's for MVE compares Add a number of folds to convert and(vcmp, vcmp) into a single VPT block, where the second vcmp becomes predicated on the first. The VCMP; VPST; VCMP will eventually be converted to VPT; VCMP in the VPTBlockPass. Differential Revision: https://reviews.llvm.org/D65058 llvm-svn: 366910	2019-07-24 14:42:05 +00:00
David Green	4fc78c496e	[ARM] MVE floating point compares and selects Much like integers, this adds MVE floating point compares and select. It requires a lot more buildvector/shuffle code because we may need to expand the compares without mve.fp, and requires support for and/or because of the way we lower llvm condition codes. Some original code by David Sherwood Differential Revision: https://reviews.llvm.org/D65054 llvm-svn: 366909	2019-07-24 14:28:22 +00:00
David Green	a4a4698c16	[ARM] Basic And/Or/Xor handling for MVE predicates This adds some basic, "worst case" handling for MVE predicate Or/And/Xor. It does this by going into and out of GPRs, doing the operation on scalars. Code by David Sherwood. Differential Revision: https://reviews.llvm.org/D65053 llvm-svn: 366907	2019-07-24 14:17:54 +00:00
Simi Pallipurath	724888af45	[ARM] Make sure that the constant pool is not placed in the middle of an IT block. This change makes sure that llvm does not emit an invalid IT block by putting the constant pool in the middle of an IT block. We have code to try to avoid putting a constant island in the middle of an IT block, but it only works if we see an IT between the one currently referencing CPE and possible insertion point. If the first instruction we look at is the VLDRD after the IT, we never see the IT and do not realize that the instruction doing the load could be in an IT block itself. Differential Revision: https://reviews.llvm.org/D64621 Change-Id: I24cecb37cded75e8992870bd997f6226853bd920 llvm-svn: 366905	2019-07-24 13:54:14 +00:00
Sjoerd Meijer	a19f5a76e6	Test commit. NFC. Removed 2 trailing whitespaces in 2 files that used to be in different repos to test my new github monorepo workflow. llvm-svn: 366904	2019-07-24 13:30:36 +00:00
David Green	c7e55d4f52	[ARM] MVE predicate register support This adds support code for building and shuffling i1 predicate registers. It generally uses two basic principles, either converting the predicate into a scalar (through a PREDICATE_CAST) and doing scalar operations on it there, or by converting the register to a full vector register and back. Some of the code here is not super efficient but will hopefully cover most cases of moving i1 vectors around and can be improved in subsequent patches. Some code by David Sherwood. Differential Revision: https://reviews.llvm.org/D65052 llvm-svn: 366890	2019-07-24 11:51:36 +00:00
David Green	b9d96ceca0	[ARM] MVE integer compares and selects This adds the very basics for MVE vector predication, adding integer VCMP and VSEL instruction support. This is done through predicate registers (MVT::v16i1, MVT::v8i1, MVT::v4i1), but otherwise using the same mechanics as NEON to custom lower setcc's through ARMISD::VCXX nodes (VCEQ, VCGT, VCEQZ, etc). An extra VCNE was added, as this can be handled sensibly by MVE's expanded number of VCMP condition codes. (There are also VCLE and VCLT which are added later). VPSEL is also added here, simply selecting on the vselect. Original code by David Sherwood. Differential Revision: https://reviews.llvm.org/D65051 llvm-svn: 366885	2019-07-24 11:08:14 +00:00
Sam Parker	aeb21b96a0	[ARM][ParallelDSP] Fix pointer operand reordering While combining two loads into a single load, we often need to reorder the pointer operands for the new load. This reordering was broken in the cases where there was a chain of values that built up the pointer. Differential Revision: https://reviews.llvm.org/D65193 llvm-svn: 366881	2019-07-24 09:38:39 +00:00
Chen Zheng	8b7e82be12	[PowerPC][NFC] use opcode instead of MachineInstr for instrHasImmForm(). llvm-svn: 366867	2019-07-24 04:50:23 +00:00
Fangrui Song	305ace7cc8	[AArch64] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after r366857 llvm-svn: 366866	2019-07-24 01:59:44 +00:00
Amara Emerson	511f7f5785	[AArch64][GlobalISel] Add support for s128 loads, stores, extracts, truncs. We need to be able to load and store s128 for memcpy inlining, where we want to generate Q register mem ops. Making these legal also requires that we add some support in other instructions. Regbankselect should also know about these since they have no GPR register class that can hold them, so need special handling to live on the FPR bank. Differential Revision: https://reviews.llvm.org/D65166 llvm-svn: 366857	2019-07-23 22:05:13 +00:00
Jessica Paquette	a2fae1e3e9	[GlobalISel][AArch64] Save a copy on G_SELECT by fixing condition to GPR The condition can never be fed by FPRs, so it should always be on a GPR. Differential Revision: https://reviews.llvm.org/D65157 llvm-svn: 366854	2019-07-23 21:39:50 +00:00
Eli Friedman	b27fc95e89	[ARM] Add opt-bisect support to ARMParallelDSP. llvm-svn: 366851	2019-07-23 20:48:46 +00:00
Yi-Hong Lyu	41a010a4ef	[PowerPC] Remove redundant load immediate instructions Currently PowerPC backend emits code like this: r3 = li 0 std r3, 264(r1) r3 = li 0 std r3, 272(r1) This patch fixes that and other cases where a register already contains a value that is loaded so we will get: r3 = li 0 std r3, 264(r1) std r3, 272(r1) Differential Revision: https://reviews.llvm.org/D64220 llvm-svn: 366840	2019-07-23 19:11:07 +00:00
Craig Topper	76bc3d6e07	[X86] In lowerVectorShuffle, instead of creating a new node to canonicalize the shuffle mask by commuting, just commute the mask and swap V1/V2. LegalizeDAG tries to legalize the DAG by legalizing nodes before their operands. If we create a new node, we end up legalizing it after its operands. This prevents some of the optimizations that can be done when the operand is a build_vector since the build_vector will have been legalized to something else. Differential Revision: https://reviews.llvm.org/D65132 llvm-svn: 366835	2019-07-23 18:46:15 +00:00
Jessica Paquette	2b404d01e8	[GlobalISel][AArch64] Teach GISel to handle shifts in load addressing modes When we select the XRO variants of loads, we can pull in very specific shifts (of the size of an element). E.g. ``` ldr x1, [x2, x3, lsl #3] ``` This teaches GISel to handle these when they're coming from shifts specifically. This adds a new addressing mode function, `selectAddrModeShiftedExtendXReg` which recognizes this pattern. This also packs this up with `selectAddrModeRegisterOffset` into `selectAddrModeXRO`. This is intended to be equivalent to `selectAddrModeXRO` in AArch64ISelDAGtoDAG. Also update load-addressing-modes to show that all of the cases here work. Differential Revision: https://reviews.llvm.org/D65119 llvm-svn: 366819	2019-07-23 16:09:42 +00:00
Sam Parker	57e87dd81b	[ARM][LowOverheadLoops] Fix branch target codegen While lowering test.set.loop.iterations, it wasn't checked how the brcond was using the result and so the wls could branch to the loop preheader instead of not entering it. The same was true for loop.decrement.reg. So brcond and br_cc are now lowered manually when using the hwloop intrinsics. During this we now check whether the result has been negated and whether we're using SETEQ or SETNE and 0 or 1. We can then figure out which basic block the WLS and LE should be targeting. Differential Revision: https://reviews.llvm.org/D64616 llvm-svn: 366809	2019-07-23 14:08:46 +00:00
Simon Pilgrim	c60c12fb10	Fix MSVC warning about extending a uint32_t shift result to uint64_t. NFCI. llvm-svn: 366808	2019-07-23 14:04:54 +00:00
David Green	fdedf240f8	[ARM] Rename NEONModImm to VMOVModImm. NFC Rename NEONModImm to VMOVModImm as it is used in both NEON and MVE. llvm-svn: 366790	2019-07-23 09:19:24 +00:00
Zi Xuan Wu	57d17ec2e1	[PowerPC] Replace float load/store pair with integer load/store pair when it's only used in load/store Replace float load/store pair with integer load/store pair when it's only used in load/store, because float load/store instructions cost more cycles than integer load/store. A typical scenario is a call passing more than 13 float arguments: we need to pass them on the stack, so we need a load/store pair to do such a memory operation if the variable is a global variable. Differential Revision: https://reviews.llvm.org/D64195 llvm-svn: 366775	2019-07-23 03:34:40 +00:00
Matt Arsenault	827427f65b	AMDGPU: Don't use SDNodeXForm for DS offset output The xform has no real value when it's used on the output of a complex pattern. The complex pattern was already creating TargetConstants with i16, so this was just unnecessary machinery. This allows global isel to import the simple cases once the complex pattern is implemented. llvm-svn: 366743	2019-07-22 21:38:11 +00:00
Craig Topper	510e6fadaa	[X86] When using AND+PACKUS in lowerV16I8Shuffle, generate the build vector directly in v16i8 with the correct 0x00 or 0xFF elements rather than using another VT and bitcasting it. The build_vector will become a constant pool load. By using the desired type initially, it ensures we don't generate a bitcast of the constant pool load which will need to be folded with the load. While experimenting with another patch, I noticed that when the load type and the constant pool type don't match, then SimplifyDemandedBits can't handle it. While we should probably fix that, this was a simple way to fix the issue I saw. llvm-svn: 366732	2019-07-22 19:58:49 +00:00
Jason Liu	8dd563ef4b	[NFC][PowerPC]Change ADDIStocHA to ADDIStocHA8 to follow 64-bit naming convention Summary: Since we are planning to add ADDIStocHA for 32bit in later patch, we decided to change 64bit one first to follow naming convention with 8 behind opcode. Patch by: Xiangling_L Differential Revision: https://reviews.llvm.org/D64814 llvm-svn: 366731	2019-07-22 19:55:33 +00:00
Sean Fertile	942537d9fa	Stubs out TLOF for AIX and add support for common vars in assembly output. Stubs out a TargetLoweringObjectFileXCOFF class, implementing only SelectSectionForGlobal for common symbols. Also adds an override of EmitGlobalVariable in PPCAIXAsmPrinter which adds a number of defensive errors and adds support for emitting common globals. llvm-svn: 366727	2019-07-22 19:15:29 +00:00
Sean Fertile	324d33dd4e	[PowerPC] Fix comment on MO_PLT Target Operand Flag. [NFC] Patch by Xiangling Liao. llvm-svn: 366724	2019-07-22 18:47:59 +00:00
Sam Parker	4379a40088	[ARM][LowOverheadLoops] Revert remaining pseudos ARMLowOverheadLoops would assert a failure if it did not find all the pseudo instructions that comprise the hardware loop. Instead of doing this, iterate through all the instructions of the function and revert any remaining pseudo instructions that haven't been converted. Differential Revision: https://reviews.llvm.org/D65080 llvm-svn: 366691	2019-07-22 14:16:40 +00:00
Matt Arsenault	937d0ee5d8	AMDGPU/GlobalISel: Remove unnecessary code The minnum/maxnum case are dead, and the cvt is handled by the default. llvm-svn: 366685	2019-07-22 13:05:25 +00:00
David Green	8876a312a8	[ARM] Fix for MVE VPT block pass We need to ensure that the number of T's is correct when adding multiple instructions into the same VPT block. Differential revision: https://reviews.llvm.org/D65049 llvm-svn: 366684	2019-07-22 12:51:38 +00:00
Simon Pilgrim	b3d719e1cf	[X86] EltsFromConsecutiveLoads - support common source loads (REAPPLIED) This patch enables us to find the source loads for each element, splitting them into a Load and ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load. A helper function, findEltLoadSrc, recurses to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, byte-offset loads then attempt to be matched against a previous load that has already been confirmed to be a consecutive match. Next step towards PR16739 - after this we just need to account for shuffling/repeated elements to create a vector load + shuffle. Fixed out of bounds load assert identified in rL366501 Differential Revision: https://reviews.llvm.org/D64551 llvm-svn: 366681	2019-07-22 12:44:10 +00:00
Christudasan Devadasan	006cf8c03d	Added address-space mangling for stack related intrinsics Modified the following 3 intrinsics: int_addressofreturnaddress, int_frameaddress & int_sponentry. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D64561 llvm-svn: 366679	2019-07-22 12:42:48 +00:00
Oliver Stannard	6771a89fa0	[IPRA][ARM] Make use of the "returned" parameter attribute ARM has code to recognise uses of the "returned" function parameter attribute which guarantee that the value passed to the function in r0 will be returned in r0 unmodified. IPRA replaces the regmask on call instructions, so needs to be told about this to avoid reverting the optimisation. Differential revision: https://reviews.llvm.org/D64986 llvm-svn: 366669	2019-07-22 08:44:36 +00:00
Jay Foad	298500ae33	[AMDGPU] Save some work when an atomic op has no uses Summary: In the atomic optimizer, save doing a bunch of work and generating a bunch of dead IR in the fairly common case where the result of an atomic op (i.e. the value that was in memory before the atomic op was performed) is not used. NFC. Reviewers: arsenm, dstuttard, tpr Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64981 llvm-svn: 366667	2019-07-22 07:19:44 +00:00
Simon Pilgrim	86fa3270ef	[X86] SimplifyDemandedVectorEltsForTargetNode - Move SUBV_BROADCAST narrowing handling. NFCI. Move the narrowing of SUBV_BROADCAST to where we handle all the other opcodes. llvm-svn: 366660	2019-07-21 19:04:44 +00:00
Simon Pilgrim	adec0f2252	[X86][SSE] Use PSADBW to improve vXi8 sum reduction (PR42674) As detailed on PR42674, we can reduce a vXi8 down until we have the final <8 x i8>, and then use PSADBW with zero, to sum those values. We then extract the bottom i8, discarding any overflow from the upper bits of the i16 result. llvm-svn: 366636	2019-07-20 15:20:11 +00:00
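The PSADBW trick in the commit above relies on PSADBW against an all-zero vector producing a plain sum of eight unsigned bytes. A minimal Python model of the `<16 x i8>` case, assuming wrapping byte-wise adds for the halving step; `psadbw_lane` and `reduce_add_v16i8` are illustrative names, not LLVM functions.

```python
def psadbw_lane(a: list[int], b: list[int]) -> int:
    """Model one 64-bit lane of PSADBW: sum of |a[i] - b[i]| over 8 unsigned bytes."""
    assert len(a) == len(b) == 8
    return sum(abs(x - y) for x, y in zip(a, b))  # at most 8 * 255, fits in 16 bits


def reduce_add_v16i8(v: list[int]) -> int:
    """Sum 16 unsigned bytes modulo 256 via halving plus PSADBW with zero."""
    assert len(v) == 16 and all(0 <= x <= 255 for x in v)
    # Halve the vector with wrapping byte adds until 8 bytes remain.
    v8 = [(lo + hi) & 0xFF for lo, hi in zip(v[:8], v[8:])]
    # PSADBW against zero sums the 8 bytes into a 16-bit result.
    total16 = psadbw_lane(v8, [0] * 8)
    # Keep only the bottom i8, discarding overflow in the upper bits.
    return total16 & 0xFF


print(reduce_add_v16i8(list(range(16))))
```

Discarding the high bits at the end is safe because the halving adds already wrap modulo 256, so the final low byte equals the true sum modulo 256.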
Jessica Paquette	41affad967	[GlobalISel][AArch64] Contract trivial same-size cross-bank copies into G_STOREs Sometimes, you can end up with cross-bank copies between same-sized GPRs and FPRs, which feed into G_STOREs. When these copies feed only into stores, they aren't necessary; we can just store using the original register bank. This provides some minor code size savings for some floating point SPEC benchmarks. (Around 0.2% for 453.povray and 450.soplex) This issue doesn't seem to show up due to regbankselect or anything similar. So, this patch introduces an early select function, `contractCrossBankCopyIntoStore` which performs the contraction when possible. The selector then continues normally and selects the correct store opcode, eliminating needless copies along the way. Differential Revision: https://reviews.llvm.org/D65024 llvm-svn: 366625	2019-07-20 01:55:35 +00:00
Guanzhong Chen	5204f7611f	[WebAssembly] Compute and export TLS block alignment Summary: Add immutable WASM global `__tls_align` which stores the alignment requirements of the TLS segment. Add `__builtin_wasm_tls_align()` intrinsic to get this alignment in Clang. The expected usage has now changed to: __wasm_init_tls(memalign(__builtin_wasm_tls_align(), __builtin_wasm_tls_size())); Reviewers: tlively, aheejin, sbc100, sunfish, alexcrichton Reviewed By: tlively Subscribers: dschuff, jgravelle-google, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D65028 llvm-svn: 366624	2019-07-19 23:34:16 +00:00
Matt Arsenault	f3bfb85bce	AMDGPU/GlobalISel: Legalize GEP for other 32-bit address spaces llvm-svn: 366621	2019-07-19 22:28:44 +00:00
Stanislav Mekhanoshin	05d9e6a2a3	[AMDGPU] Autogenerate register sequences in tuples Differential Revision: https://reviews.llvm.org/D65007 llvm-svn: 366619	2019-07-19 21:43:42 +00:00
Stanislav Mekhanoshin	7b5a54e369	[AMDGPU] Fixed occupancy calculation for gfx10 Differential Revision: https://reviews.llvm.org/D65010 llvm-svn: 366616	2019-07-19 21:29:51 +00:00
Matt Arsenault	5e23f42820	AMDGPU: Avoid custom predicates for stores with glue llvm-svn: 366613	2019-07-19 21:01:30 +00:00
Matt Arsenault	e3401a9b86	AMDGPU: Redefine setcc condition PatLeafs Avoid using custom code predicates. llvm-svn: 366609	2019-07-19 20:24:40 +00:00
Matt Arsenault	48c0df5d46	AMDGPU: Don't rely on m0 being -1 for GWS offsets This only works if the high bits of m0 are also 0, so m0 would have to be set to 0xffff. llvm-svn: 366608	2019-07-19 20:01:24 +00:00
Matt Arsenault	85f3890126	AMDGPU: Force s_waitcnt after GWS instructions This is apparently required to be the immediately following instruction, so force it into a bundle with a waitcnt. llvm-svn: 366607	2019-07-19 19:47:30 +00:00
Stanislav Mekhanoshin	01fcf9238f	[AMDGPU] Allow register tuples to set asm names This change reverts most of the previous register name generation. The real problem is that RegisterTuple does not generate asm names. Added optional operand to RegisterTuple. This way we can simplify register name access and dramatically reduce the size of static tables for the backend. Differential Revision: https://reviews.llvm.org/D64967 llvm-svn: 366598	2019-07-19 18:05:01 +00:00
Matt Arsenault	7df225dfc2	AMDGPU/GlobalISel: Fix MMO flags for kernel argument loads The DAG lowering sets dereferencable and invariant, not nontemporal. llvm-svn: 366597	2019-07-19 17:52:56 +00:00
Matt Arsenault	08494f6231	AMDGPU/GlobalISel: Selection for fminnum/fmaxnum v2f16 case doesn't work yet because the VOP3P complex patterns haven't been ported yet. llvm-svn: 366585	2019-07-19 14:42:40 +00:00
Matt Arsenault	b60a2ae40e	AMDGPU/GlobalISel: Support arguments with multiple registers Handles structs used directly in argument lists. llvm-svn: 366584	2019-07-19 14:29:30 +00:00
Matt Arsenault	fecf43eba3	AMDGPU/GlobalISel: Rewrite lowerFormalArguments This should now handle everything except structs passed as multiple registers. I think most of the packing logic should be handled by handleAssignments, but I'm unclear on what the contract is for multiple registers. This is copying how x86 handles this. This does change the behavior of the test_sgpr_alignment0 amdgpu_vs test. I don't think shader arguments should try to follow the alignment, and registers need to be repacked. I also don't think it matters, since I think the pointers are packed to the beginning of the argument list anyway. llvm-svn: 366582	2019-07-19 14:15:18 +00:00
Matt Arsenault	1022c0dfde	AMDGPU: Decompose all values to 32-bit pieces for calling conventions This is the more natural lowering, and presents more opportunities to reduce 64-bit ops to 32-bit. This should also help avoid issues graphics shaders have had with 64-bit values, and simplify argument lowering in globalisel. llvm-svn: 366578	2019-07-19 13:57:44 +00:00
Dmitry Preobrazhensky	4ccb7f8c45	[AMDGPU][MC] Corrected parsing of branch offsets See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D64629 llvm-svn: 366571	2019-07-19 13:12:47 +00:00
Than McIntosh	e238a4c757	[X86] for split stack, not save/restore nested arg if unused Summary: For split-stack, if the nested argument (i.e. R10) is not used, no need to save/restore it in the prologue. Reviewers: thanm Reviewed By: thanm Subscribers: mstorsjo, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64673 llvm-svn: 366569	2019-07-19 12:54:44 +00:00
Oliver Stannard	8780c0dda2	Don't update NoTrappingFPMath and FPDenormalMode in resetTargetOptions We'd like to remove this whole function, because these are properties of functions, not the target as a whole. These two are easy to remove because they are only used for emitting ARM build attributes, which expects them to represent the defaults for the whole module, not just the last function generated. This is needed to get correct build attributes when using IPRA on ARM, because IPRA causes resetTargetOptions to get called before ARMAsmPrinter::emitAttributes. Differential revision: https://reviews.llvm.org/D64929 llvm-svn: 366562	2019-07-19 10:37:37 +00:00
Mikhail Maltsev	0b001f94a5	[ARM] Add <saturate> operand to SQRSHRL and UQRSHLL Summary: According to the new Armv8-M specification https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf the instructions SQRSHRL and UQRSHLL now have an additional immediate operand <saturate>. The new assembly syntax is: SQRSHRL<c> RdaLo, RdaHi, #<saturate>, Rm UQRSHLL<c> RdaLo, RdaHi, #<saturate>, Rm where <saturate> can be either 64 (the existing behavior) or 48, in which case the result is saturated to 48 bits. The new operand is encoded as follows: #64 Encoded as sat = 0 #48 Encoded as sat = 1 sat is bit 7 of the instruction bit pattern. This patch adds a new assembler operand class MveSaturateOperand which implements parsing and encoding. Decoding is implemented in DecodeMVEOverlappingLongShift. Reviewers: ostannard, simon_tatham, t.p.northover, samparker, dmgreen, SjoerdMeijer Reviewed By: simon_tatham Subscribers: javed.absar, kristof.beyls, hiraditya, pbarrio, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64810 llvm-svn: 366555	2019-07-19 09:46:28 +00:00
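The `<saturate>` encoding described above (#64 as sat = 0, #48 as sat = 1, stored in bit 7) round-trips through a single bit. A minimal Python sketch of encoding and decoding that bit; `encode_saturate` and `decode_saturate` are hypothetical names and do not reproduce `MveSaturateOperand` or `DecodeMVEOverlappingLongShift`.

```python
SAT_BIT = 7  # bit position of `sat`, as stated in the commit message


def encode_saturate(insn: int, saturate: int) -> int:
    """Set the sat bit for the #<saturate> operand: #64 -> 0, #48 -> 1."""
    if saturate == 64:
        sat = 0
    elif saturate == 48:
        sat = 1
    else:
        raise ValueError("saturate must be #64 or #48")
    # Clear bit 7, then write the new sat value into it.
    return (insn & ~(1 << SAT_BIT)) | (sat << SAT_BIT)


def decode_saturate(insn: int) -> int:
    """Recover the #<saturate> operand from the sat bit."""
    return 48 if (insn >> SAT_BIT) & 1 else 64


print(hex(encode_saturate(0, 48)), decode_saturate(0x80))
```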
Jay Foad	7d06ffff46	[AMDGPU] Simplify the exclusive scan used for optimized atomics Summary: Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8, 16, 32) instead of starting off shifting by 1, 2 and 3 and then doing a 3-way ADD, because: 1. It simplifies the compiler a little. 2. It minimizes vgpr pressure because each instruction is now of the form vn = vn + vn << c. 3. It is more friendly to the DPP combiner, which currently can't combine into an ADD3 instruction. Because of #2 and #3 the end result is improved from this: v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf v_mov_b32_dpp v1, v3 row_shr:3 row_mask:0xf bank_mask:0xf v_add3_u32 v1, v4, v5, v1 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf To this: v_add_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf I.e. two fewer computational instructions, one extra nop where we could schedule something else. Reviewers: arsenm, sheredom, critson, rampitec, vpykhtin Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64411 llvm-svn: 366543	2019-07-19 08:40:37 +00:00
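The power-of-two structure in the commit above is the classic Hillis-Steele scan: after steps with shifts 1, 2, 4, 8, 16 and 32, every lane of a 64-wide wave holds its inclusive prefix sum, and shifting the result by one lane gives the exclusive scan. A plain Python sketch of that idea under those assumptions; it does not model the DPP `row_shr`/`row_bcast` details, bank masks, or the `s_nop` padding shown in the listings.

```python
WAVE_SIZE = 64  # assumed wave width, giving shifts 1, 2, 4, 8, 16, 32


def inclusive_scan(values: list[int]) -> list[int]:
    """Hillis-Steele inclusive prefix sum using only power-of-two shifts."""
    v = list(values)
    shift = 1
    while shift < len(v):
        # Each step has the form v = v + (v shifted by `shift` lanes);
        # lanes with no source lane add 0.
        v = [v[i] + (v[i - shift] if i >= shift else 0) for i in range(len(v))]
        shift *= 2
    return v


def exclusive_scan(values: list[int]) -> list[int]:
    """Exclusive scan: lane i receives the sum of lanes 0..i-1."""
    inc = inclusive_scan(values)
    return [0] + inc[:-1]


print(exclusive_scan([1] * WAVE_SIZE)[:5])
```

Because every step reads and writes the same vector, each step needs only one source register, which is the register-pressure point the commit makes.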
Hsiangkai Wang	18ccfadd46	[DebugInfo] Generate fixups as emitting DWARF .debug_frame/.eh_frame. It is necessary to generate fixups in .debug_frame or .eh_frame when relaxation is enabled, because the address delta may change after relaxation. There is an opcode with 6-bit data in the debug frame encoding, so we also need 6-bit fixup types. Differential Revision: https://reviews.llvm.org/D58335 llvm-svn: 366524	2019-07-19 02:03:34 +00:00
Amara Emerson	cf12c7815f	[GlobalISel] Translate calls to memcpy et al to G_INTRINSIC_W_SIDE_EFFECTs and legalize later. I plan on adding memcpy optimizations in the GlobalISel pipeline, but we can't do that unless we delay lowering to actual function calls. This patch changes the translator to generate G_INTRINSIC_W_SIDE_EFFECTS for these functions, and has each target specify, using the new custom legalizer hook for intrinsics, that it wants them expanded to a libcall. Differential Revision: https://reviews.llvm.org/D64895 llvm-svn: 366516	2019-07-19 00:24:45 +00:00
Stanislav Mekhanoshin	a9c71e01e7	[AMDGPU] Drop Reg32 and use regular AsmName This reduces the size of the generated AMDGPUGenAsmWriter.inc by ~100Kb. Differential Revision: https://reviews.llvm.org/D64952 llvm-svn: 366505	2019-07-18 22:18:33 +00:00
Jessica Paquette	7a1dcc5ff1	[GlobalISel][AArch64] Add support for base register + offset register loads Add support for folding G_GEPs into loads of the form ``` ldr reg, [base, off] ``` when possible. This can save an add before the load. Currently, this is only supported for loads of 64 bits into 64 bit registers. Add a new addressing mode function, `selectAddrModeRegisterOffset` which performs this folding when it is profitable. Also add a test for addressing modes for G_LOAD. Differential Revision: https://reviews.llvm.org/D64944 llvm-svn: 366503	2019-07-18 21:50:11 +00:00
Reid Kleckner	ba9c9e62cb	Revert [X86] EltsFromConsecutiveLoads - support common source loads This reverts r366441 (git commit `48104ef7c9`) This causes clang to fail to compile some file in Skia. Reduction soon. llvm-svn: 366501	2019-07-18 21:26:41 +00:00
Guanzhong Chen	df4479200b	[WebAssembly] Fix __builtin_wasm_tls_base intrinsic Summary: Properly generate the outchain for the `__builtin_wasm_tls_base` intrinsic. Also marked the intrinsic pure, per @sunfish's suggestion. Reviewers: tlively, aheejin, sbc100, sunfish Reviewed By: tlively Subscribers: dschuff, jgravelle-google, hiraditya, cfe-commits, llvm-commits, sunfish Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D64949 llvm-svn: 366499	2019-07-18 21:17:52 +00:00
Guanzhong Chen	801fa8e6b9	[WebAssembly] Implement __builtin_wasm_tls_base intrinsic Summary: Add `__builtin_wasm_tls_base` so that LeakSanitizer can find the thread-local block and scan through it for memory leaks. Reviewers: tlively, aheejin, sbc100 Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D64900 llvm-svn: 366475	2019-07-18 17:53:22 +00:00
Peter Collingbourne	aa6a7df64a	MC: AArch64: Add support for prel_g* relocation specifiers. Differential Revision: https://reviews.llvm.org/D64683 llvm-svn: 366462	2019-07-18 16:54:33 +00:00
Peter Collingbourne	76427f849f	AArch64: Unify relocation restrictions between MOVK/MOVN/MOVZ. There doesn't seem to be a practical reason for these instructions to have different restrictions on the types of relocations that they may be used with, notwithstanding the language in the ELF AArch64 spec that implies that specific relocations are meant to be used with specific instructions. For example, we currently forbid the first instruction in the following sequence, despite it currently being used by clang to generate a global reference under -mcmodel=large: movz x0, #:abs_g0_nc:foo movk x0, #:abs_g1_nc:foo movk x0, #:abs_g2_nc:foo movk x0, #:abs_g3:foo Therefore, allow MOVK/MOVN/MOVZ to accept the union of the set of relocations that they currently accept individually. Differential Revision: https://reviews.llvm.org/D64466 llvm-svn: 366461	2019-07-18 16:51:53 +00:00
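The `-mcmodel=large` sequence quoted above builds a 64-bit address from four 16-bit groups: `abs_g0_nc` through `abs_g3` select bits 0-15, 16-31, 32-47 and 48-63. A minimal Python simulation of that materialization, assuming MOVZ zeroes the other bits and MOVK replaces only its own 16-bit field; the helper names are illustrative.

```python
def movz_movk_chunks(addr: int) -> list[tuple[str, int, int]]:
    """Split a 64-bit value into (opcode, imm16, shift) steps: g0 via movz, g1..g3 via movk."""
    assert 0 <= addr < 1 << 64
    seq = []
    for g in range(4):
        chunk = (addr >> (16 * g)) & 0xFFFF
        seq.append(("movz" if g == 0 else "movk", chunk, 16 * g))
    return seq


def execute(seq: list[tuple[str, int, int]]) -> int:
    """Simulate the sequence on a single register x0."""
    x0 = 0
    for op, imm, shift in seq:
        if op == "movz":
            # MOVZ writes the field and zeroes every other bit.
            x0 = imm << shift
        else:
            # MOVK keeps the other bits and replaces only the 16-bit field at `shift`.
            x0 = (x0 & ~(0xFFFF << shift)) | (imm << shift)
    return x0


print(hex(execute(movz_movk_chunks(0x0123456789ABCDEF))))
```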
Hsiangkai Wang	657277e0f1	Revert "[DebugInfo] Generate fixups as emitting DWARF .debug_frame/.eh_frame." This reverts commit 17e3cbf5fe656483d9016d0ba9e1d0cd8629379e. llvm-svn: 366444	2019-07-18 15:06:50 +00:00
Hsiangkai Wang	e43ce1a958	[DebugInfo] Generate fixups as emitting DWARF .debug_frame/.eh_frame. It is necessary to generate fixups in .debug_frame or .eh_frame when relaxation is enabled, because the address delta may change after relaxation. There is an opcode with 6-bit data in the debug frame encoding, so we also need 6-bit fixup types. Differential Revision: https://reviews.llvm.org/D58335 llvm-svn: 366442	2019-07-18 14:47:34 +00:00
Simon Pilgrim	48104ef7c9	[X86] EltsFromConsecutiveLoads - support common source loads This patch enables us to find the source loads for each element, splitting them into a Load and ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load. A helper function, findEltLoadSrc, recurses to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, byte-offset loads then attempt to be matched against a previous load that has already been confirmed to be a consecutive match. Next step towards PR16739 - after this we just need to account for shuffling/repeated elements to create a vector load + shuffle. Differential Revision: https://reviews.llvm.org/D64551 llvm-svn: 366441	2019-07-18 14:33:25 +00:00
Sanjay Patel	e654785912	[x86] try harder to form LEA from ADD to avoid flag conflicts (PR40483) LEA doesn't affect flags, so use it more liberally to replace an ADD when we know that the ADD operands affect flags. In the motivating example from PR40483: https://bugs.llvm.org/show_bug.cgi?id=40483 ...this lets us avoid duplicating a math op just to avoid flag conflict. As mentioned in the TODO comments, this heuristic can be extended to fire more often if that leads to more improvements. Differential Revision: https://reviews.llvm.org/D64707 llvm-svn: 366431	2019-07-18 12:48:01 +00:00
Diogo N. Sampaio	11512e742b	[ARM][DAGCOMBINE][FIX] PerformVMOVRRDCombine Summary: PerformVMOVRRDCombine omits adding an offset of 4 to the PointerInfo when converting a f64 = load[M] to {i32, i32} = {load[M], load[M + 4]}, which would allow the machine scheduler to break dependencies with the second load. - pr42638 Reviewers: eli.friedman, dmgreen, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64870 llvm-svn: 366423	2019-07-18 10:05:56 +00:00
Alex Bradbury	b8d352a08b	[RISCV] Reset NoPHIS MachineFunctionProperty in emitSelectPseudo We inserted PHIs where there were none before, so the property must be reset. This error was found on an EXPENSIVE_CHECKS build. llvm-svn: 366412	2019-07-18 07:52:41 +00:00
Craig Topper	8da0402210	[X86] Disable combineConcatVectors for vXi1 vectors. I'm not convinced the code this calls is properly vetted for vXi1 vectors. Experimental vector widening legalization testing for D55251 is now hitting an assertion failure inside EltsFromConsecutiveLoads. This is occurring from a v2i1 load having a store size different than its VT size. Hopefully this commit will keep such issues from happening. llvm-svn: 366405	2019-07-18 06:18:06 +00:00
Alex Bradbury	8aba95d64c	[RISCV] Avoid signed integer overflow UB in RISCVMatInt::generateInstSeq Found by UBSan. llvm-svn: 366398	2019-07-18 04:02:58 +00:00
Alex Bradbury	ad73a436dc	[RISCV] Don't access an invalidated iterator in RISCVInstrInfo::removeBranch Issue found by ASan. llvm-svn: 366397	2019-07-18 03:23:47 +00:00
Fangrui Song	f358cf8de2	[AArch64] Add dependency from AArch64CodeGen to TransformUtils to fix -DBUILD_SHARED_LIBS=on link error after D64173/r366361 This fixes: ld.lld: error: undefined symbol: llvm::findAllocaForValue(llvm::Value*, llvm::DenseMap<llvm::Value*, llvm::AllocaInst*, llvm::DenseMapInfo<llvm::Value*>, llvm::detail::DenseMapPair<llvm::Value*, llvm::AllocaInst*> >&) >>> referenced by AArch64StackTagging.cpp llvm-svn: 366396	2019-07-18 01:53:08 +00:00
Stanislav Mekhanoshin	7872d76a16	[AMDGPU] Simplify AMDGPUInstPrinter::printRegOperand() Differential Revision: https://reviews.llvm.org/D64892 llvm-svn: 366385	2019-07-17 22:58:43 +00:00
Craig Topper	61fff7a337	[X86] Make sure we mark 128/256 MLOAD as Legal with VLX when min-legal-vector-width=256 is in effect. This started triggering an assertion after r364718 when we made these Custom under AVX2. llvm-svn: 366382	2019-07-17 22:26:00 +00:00
Stanislav Mekhanoshin	9c7f4264d3	[AMDGPU] Stop special casing flat_scratch for register name Differential Revision: https://reviews.llvm.org/D64885 llvm-svn: 366376	2019-07-17 21:35:11 +00:00
Evgeniy Stepanov	f45fd429b7	Speculative fix for stack-tagging.ll failure. Depending on the evaluation order of function call arguments, the current code may insert a use before def. llvm-svn: 366375	2019-07-17 21:27:44 +00:00
Evgeniy Stepanov	851339fb29	Basic MTE stack tagging instrumentation. Summary: Use MTE intrinsics to tag stack variables in functions with sanitize_memtag attribute. Reviewers: pcc, vitalybuka, hctim, ostannard Subscribers: srhines, mgorny, javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64173 llvm-svn: 366361	2019-07-17 19:24:12 +00:00
Evgeniy Stepanov	d752f5e953	Basic codegen for MTE stack tagging. Implement IR intrinsics for stack tagging. Generated code is very unoptimized for now. Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are used to implement a tagged stack frame pointer in a virtual register. Differential Revision: https://reviews.llvm.org/D64172 llvm-svn: 366360	2019-07-17 19:24:02 +00:00
Momchil Velikov	0e2b74a2b0	Revert [AArch64] Add support for Transactional Memory Extension (TME) This reverts r366322 (git commit `4b8da3a503`) llvm-svn: 366355	2019-07-17 17:43:32 +00:00
Daniil Fukalov	d912a9ba9b	[AMDGPU] Tune inlining parameters for AMDGPU target Summary: Since the target gains no significant advantage from vectorization, the vector instruction threshold bonus should be optional. The default value of the amdgpu-inline-arg-alloca-cost parameter and the target InliningThresholdMultiplier value were then tuned accordingly. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64642 llvm-svn: 366348	2019-07-17 16:51:29 +00:00
Matt Arsenault	06eed42213	AMDGPU: Use getTargetConstant Avoids creating an extra intermediate mov. llvm-svn: 366340	2019-07-17 15:35:36 +00:00
Alex Bradbury	ab009a602e	[AsmPrinter] Make the encoding of call sites in .gcc_except_table configurable and use for RISC-V The original behavior was to always emit the offsets to each call site in the call site table as uleb128 values; however, on some architectures (e.g. RISC-V) these uleb128 offsets into the code cannot always be resolved until link time (because relaxation will invalidate any calculated offsets), and there are no appropriate relocations for uleb128 values. As a consequence it needs to be possible to specify an alternative. This also switches RISCV to use DW_EH_PE_udata4 for call site encodings in .gcc_except_table Differential Revision: https://reviews.llvm.org/D63415 Patch by Edward Jones. llvm-svn: 366329	2019-07-17 14:00:35 +00:00
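To see why uleb128 call-site offsets are a problem under linker relaxation, note that the encoded length depends on the value: an offset that shrinks from 128 to 127 bytes also shrinks its encoding from two bytes to one, whereas `DW_EH_PE_udata4` is always four bytes. A minimal Python sketch of both encodings, assuming little-endian byte order for `udata4` (as on RISC-V); the function names are illustrative.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as ULEB128: 7 bits per byte, high bit means 'more'."""
    if value < 0:
        raise ValueError("ULEB128 encodes unsigned values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte
            return bytes(out)


def encode_udata4(value: int) -> bytes:
    """Fixed-size 4-byte encoding (little-endian assumed for illustration)."""
    return value.to_bytes(4, "little")


# The uleb128 size changes with the value; udata4 does not.
print(len(encode_uleb128(127)), len(encode_uleb128(128)), len(encode_udata4(128)))
```

A fixed-width field can be patched in place once the final offset is known, which is why a 4-byte encoding sidesteps the missing uleb128 relocations.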
Jay Foad	70235c642e	[AMDGPU] Optimize atomic AND/OR/XOR Summary: Extend the atomic optimizer to handle AND, OR and XOR. Reviewers: arsenm, sheredom Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64809 llvm-svn: 366323	2019-07-17 13:40:03 +00:00
Momchil Velikov	4b8da3a503	[AArch64] Add support for Transactional Memory Extension (TME) TME is a future architecture technology, documented in https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools https://developer.arm.com/docs/ddi0601/a More about the future architectures: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture This patch adds support for the TME instructions TSTART, TTEST, TCOMMIT, and TCANCEL and the target feature/arch extension "tme". It also implements TME builtin functions, defined in ACLE Q2 2019 (https://developer.arm.com/docs/101028/latest) Patch by Javed Absar and Momchil Velikov Differential Revision: https://reviews.llvm.org/D64416 llvm-svn: 366322	2019-07-17 13:23:27 +00:00
Justin Hibbits	0257c6b659	PowerPC: Fix register spilling for SPE registers Summary: Missed in the original commit, use the correct callee-saved register list for spilling, instead of the standard SVR432 list. This avoids needlessly spilling the SPE non-volatile registers when they're not used. As part of this, also add where missing, and sort, the spill opcode checks for SPE and SPE4 register classes. Reviewers: nemanjai, hfinkel, joerg Subscribers: kbarton, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D56703 llvm-svn: 366319	2019-07-17 12:30:48 +00:00
Justin Hibbits	5214956eaa	PowerPC/SPE: Fix load/store handling for SPE Summary: Pointed out in a comment for D49754, register spilling will currently spill SPE registers at almost any offset. However, the instructions `evstdd` and `evldd` require a) 8-byte alignment, and b) a limit of 256 (unsigned) bytes from the base register, as the offset must fit into a 5-bit offset, which ranges from 0-31 (indexed in double-words). The update to the register spill test is taken partially from the test case shown in D49754. Additionally, pointed out by Kei Thomsen, globals will currently use evldd/evstdd, though the offset isn't known at compile time, so may exceed the 8-bit (unsigned) offset permitted. This fixes that as well, by forcing it to always use evlddx/evstddx when accessing globals. Part of the patch contributed by Kei Thomsen. Reviewers: nemanjai, hfinkel, joerg Subscribers: kbarton, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D54409 llvm-svn: 366318	2019-07-17 12:30:04 +00:00
Petar Avramovic	1e62635d05	[MIPS GlobalISel] ClampScalar and select pointer G_ICMP Add narrowScalar to half of original size for G_ICMP. ClampScalar G_ICMP's operands 2 and 3 to s32. Select G_ICMP for pointers for MIPS32. Pointer compare is the same as for integers, so it is enough to declare them as a legal type. Differential Revision: https://reviews.llvm.org/D64856 llvm-svn: 366317	2019-07-17 12:08:01 +00:00
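The evldd/evstdd constraint described above reduces to a small check: the offset must be a multiple of 8 and, once divided by 8, fit the 5-bit unsigned field (0-31 doublewords, i.e. 0-248 bytes). A minimal Python sketch; `evldd_offset_field` is a hypothetical name, and returning `None` stands in for falling back to the indexed evlddx/evstddx forms.

```python
def evldd_offset_field(offset: int):
    """Return the 5-bit offset field for evldd/evstdd, or None if the
    access must use the indexed evlddx/evstddx form instead."""
    if offset % 8 != 0:       # must be doubleword (8-byte) aligned
        return None
    field = offset // 8       # the field counts doublewords, not bytes
    if not 0 <= field <= 31:  # 5-bit unsigned field
        return None
    return field


print(evldd_offset_field(248), evldd_offset_field(256))
```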
Nicolai Haehnle	8b7041a5c6	AMDGPU/GFX10: Apply the VMEM-to-scalar-write hazard also to writes to EXEC Summary: Change-Id: I854fbf7d48e937bef9f8f3f5d0c8aeb970652630 Reviewers: rampitec, mareko Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64807 Change-Id: I4405b3a7f84186acea5a78d291bff71056e745fc llvm-svn: 366314	2019-07-17 11:22:57 +00:00
Nicolai Haehnle	a256b8b7d7	AMDGPU: Improve alias analysis for GDS Summary: GDS cannot alias anything else. Original patch by: Marek Olšák Reviewers: arsenm, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64114 Change-Id: I07bfbd96f5d5c37a6dfba7997df12f291dd794b0 llvm-svn: 366313	2019-07-17 11:22:19 +00:00
Diana Picus	37e403d18c	[ARM GlobalISel] Cleanup CallLowering. NFC Migrate CallLowering::lowerReturnVal to use the same infrastructure as lowerCall/FormalArguments and remove the now obsolete code path from splitToValueTypes. Forgot to push this earlier. llvm-svn: 366308	2019-07-17 10:01:27 +00:00
Simon Atanasyan	4c1e440892	[mips] Use mult/mflo pattern on 64-bit targets prior to MIPS64 The `MUL` instruction is available starting from the MIPS32/MIPS64 targets. llvm-svn: 366301	2019-07-17 08:11:40 +00:00
Simon Atanasyan	a884afb6f8	[mips] Implement .cplocal directive This directive forces the use of an alternate register for the context pointer. For example, this code: .cplocal $4 jal foo expands to: ld $25, %call16(foo)($4) jalr $25 Differential Revision: https://reviews.llvm.org/D64743 llvm-svn: 366300	2019-07-17 08:11:31 +00:00
Simon Atanasyan	7f308af5ee	[mips] Support the "o" inline asm constraint Like other LLVM targets, we do not handle "offsettable" memory addresses in any special way. In other words, the "o" constraint is an exact equivalent of the "m" one. But some existing code requires the "o" constraint support. This fixes PR42589. Differential Revision: https://reviews.llvm.org/D64792 llvm-svn: 366299	2019-07-17 08:11:15 +00:00
Stanislav Mekhanoshin	e5012ab308	[AMDGPU] Autogenerate register asm names Differential Revision: https://reviews.llvm.org/D64839 llvm-svn: 366283	2019-07-16 23:44:21 +00:00
Guanzhong Chen	0a8d4df799	[WebAssembly] Compile all TLS on Emscripten as local-exec Summary: Currently, on Emscripten, dynamic linking is not supported with threads. This means that if thread-local storage is used, it must be used in a statically-linked executable. Hence, local-exec is the only possible model. This diff compiles all TLS variables to use local-exec on Emscripten as a temporary measure until dynamic linking is supported with threads. The goal for this is to allow C++ types with constructors to be thread-local. Currently, when `clang` compiles a `thread_local` variable with a constructor, it generates `__tls_guard` variable: @__tls_guard = internal thread_local global i8 0, align 1 As no TLS model is specified, this is treated as general-dynamic, which we do not support (and cannot support without implementing dynamic linking support with threads in Emscripten). As a result, any C++ constructor in `thread_local` variables would not compile. By compiling all `thread_local` as local-exec, `__tls_guard` will compile and we can support C++ constructors with TLS without implementing dynamic linking with threads. Depends on D64537 Reviewers: tlively, aheejin, sbc100 Reviewed By: aheejin Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64776 llvm-svn: 366275	2019-07-16 22:22:08 +00:00
Guanzhong Chen	42bba4b852	[WebAssembly] Implement thread-local storage (local-exec model) Summary: Thread local variables are placed inside a `.tdata` segment. Their symbols are offsets from the start of the segment. The address of a thread local variable is computed as `__tls_base` + the offset from the start of the segment. `.tdata` segment is a passive segment and `memory.init` is used once per thread to initialize the thread local storage. `__tls_base` is a wasm global. Since each thread has its own wasm instance, it is effectively thread local. Currently, `__tls_base` must be initialized at thread startup, and so cannot be used with dynamic libraries. `__tls_base` is to be initialized with a new linker-synthesized function, `__wasm_init_tls`, which takes as an argument a block of memory to use as the storage for thread locals. It then initializes the block of memory and sets `__tls_base`. As `__wasm_init_tls` will handle the memory initialization, the memory does not have to be zeroed. To help allocate memory for thread-local storage, a new compiler intrinsic is introduced: `__builtin_wasm_tls_size()`. This intrinsic function returns the size of the thread-local storage for the current function. The expected usage is to run something like the following upon thread startup: __wasm_init_tls(malloc(__builtin_wasm_tls_size())); Reviewers: tlively, aheejin, kripken, sbc100 Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, jfb, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D64537 llvm-svn: 366272	2019-07-16 22:00:45 +00:00
Sanjay Patel	d746a210e1	[x86] use more phadd for reductions This is part of what is requested by PR42023: https://bugs.llvm.org/show_bug.cgi?id=42023 There's an extension needed for FP add, but exactly how we would specify that using flags is not clear to me, so I left that as a TODO. We're still missing patterns for partial reductions when the input vector is 256-bit or 512-bit, but I think that's a failure of vector narrowing. If we can reduce the widths, then this matching should work on those tests. Differential Revision: https://reviews.llvm.org/D64760 llvm-svn: 366268	2019-07-16 21:30:41 +00:00
Matt Arsenault	f8c8284455	AMDGPU/GlobalISel: Select G_ASHR llvm-svn: 366257	2019-07-16 20:31:25 +00:00
Matt Arsenault	e5b28b98e9	AMDGPU/GlobalISel: Select G_LSHR llvm-svn: 366256	2019-07-16 20:25:43 +00:00
Jinsong Ji	65e34a3143	[PowerPC][HTM] Fix impossible reg-to-reg copy assert with ttest builtin Summary: This is exposed by our internal testing. The reduced testcase will assert with "Impossible reg-to-reg copy" We can't use COPY to do 32-bit to 64-bit conversion. Reviewers: kbarton, hfinkel, nemanjai Reviewed By: hfinkel Subscribers: hiraditya, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64499 llvm-svn: 366255	2019-07-16 20:24:33 +00:00
Matt Arsenault	1b69fd275d	AMDGPU/GlobalISel: Select G_SHL I think this manages to not break the DAG handling with the divergent predicates because the standalone divergent patterns end up with a higher priority than the pattern on the instruction definition. The 16-bit versions don't work yet. llvm-svn: 366254	2019-07-16 20:15:30 +00:00
Stanislav Mekhanoshin	6e0fa292c2	[AMDGPU] Change register type for v32 vectors When it is AReg_1024 this results in unnecessary copying of 32-element vectors into AGPRs even though they are not intended for an mfma instruction. Differential Revision: https://reviews.llvm.org/D64815 llvm-svn: 366252	2019-07-16 20:06:00 +00:00
Matt Arsenault	2d10407719	AMDGPU/GlobalISel: Fix selection of private stores llvm-svn: 366249	2019-07-16 19:27:44 +00:00
Matt Arsenault	7161fb0be5	AMDGPU/GlobalISel: Select private loads llvm-svn: 366248	2019-07-16 19:22:21 +00:00
Matt Arsenault	dad1f89210	AMDGPU/GlobalISel: Select flat stores llvm-svn: 366246	2019-07-16 18:42:53 +00:00
Matt Arsenault	7eb1902cd5	AMDGPU: Add register classes to flat store patterns For some reason GlobalISelEmitter needs register classes to import these, although it works for the load patterns. llvm-svn: 366242	2019-07-16 18:26:42 +00:00
Matt Arsenault	8f8d07e93b	AMDGPU: Replace store PatFrags Convert the easy cases to formats understood for GlobalISel. llvm-svn: 366240	2019-07-16 18:21:25 +00:00
Matt Arsenault	35c96598b1	AMDGPU/GlobalISel: Select flat loads Now that the patterns use the new PatFrag address space support, the only blocker to importing most load patterns is the addressing mode complex patterns. llvm-svn: 366237	2019-07-16 18:05:29 +00:00
Jay Foad	17060f0a54	[AMDGPU] Optimize atomic max/min Summary: Extend the atomic optimizer to handle signed and unsigned max and min operations, as well as add and subtract. Reviewers: arsenm, sheredom, critson, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64328 llvm-svn: 366235	2019-07-16 17:44:54 +00:00
Matt Arsenault	c6fd5abecc	AMDGPU: Redefine load PatFrags Rewrite PatFrags using the new PatFrag address space matching in tablegen. These will now work with both SelectionDAG and GlobalISel. llvm-svn: 366234	2019-07-16 17:38:50 +00:00
Michael Liao	b3f967d411	[AMDGPU] Add the adjusted FP as a livein register. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64145 llvm-svn: 366223	2019-07-16 15:57:12 +00:00
Matt Arsenault	22c4a147a9	AMDGPU/GlobalISel: Fix test failures in release build Apparently the check for legal instructions during instruction select does not happen without an asserts build, so these would successfully select in release, and fail in debug. Make s16 and/or/xor legal. These can just be selected directly to the 32-bit operation, as is already done in SelectionDAG, so just make them legal. llvm-svn: 366210	2019-07-16 14:28:30 +00:00
Kyrylo Tkachov	eb72138340	[AArch64] Implement __jcvt intrinsic from Armv8.3-A The jcvt intrinsic defined in ACLE [1] is available when __ARM_FEATURE_JCVT is defined. This change introduces the AArch64 intrinsic, wires it up to the instruction and a new clang builtin function. The __ARM_FEATURE_JCVT macro is now defined when an Armv8.3-A or higher target is used. I've implemented the target detection logic in Clang so that this feature is enabled for architectures from armv8.3-a onwards (so -march=armv8.4-a also enables this, for example). make check-all didn't show any new failures. [1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics Differential Revision: https://reviews.llvm.org/D64495 llvm-svn: 366197	2019-07-16 09:27:39 +00:00
Kyrylo Tkachov	a3e26d1a6c	[NFC] Test commit: add full stop at end of comment llvm-svn: 366195	2019-07-16 09:15:01 +00:00
Craig Topper	c0b2ed664b	[X86] In combineStore, don't convert v2f32 load/store pairs to f64 loads/stores. Type legalization can take care of this. This gives DAG combine a little more time with the original types. llvm-svn: 366182	2019-07-16 05:52:27 +00:00
Alex Bradbury	1ffceaa543	[RISCV] Match GNU tools canonical JALR and add aliases The canonical GNU form of JALR resembles a load/store instruction rather than placing the immediate offset as a separate argument, so match this behaviour. Also add parser-only aliases for the three-operand form, and add other shorter aliases also emitted by GNU tools. Differential Revision: https://reviews.llvm.org/D55277 Patch by James Clarke. llvm-svn: 366179	2019-07-16 04:56:43 +00:00
Rui Ueyama	49a3ad21d6	Fix parameter name comments using clang-tidy. NFC. This patch applies clang-tidy's bugprone-argument-comment tool to LLVM, clang and lld source trees. Here is how I created this patch: $ git clone https://github.com/llvm/llvm-project.git $ cd llvm-project $ mkdir build $ cd build $ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \ -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \ -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \ -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm $ ninja $ parallel clang-tidy -checks='-*,bugprone-argument-comment' \ -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \ ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h} llvm-svn: 366177	2019-07-16 04:46:31 +00:00
Alex Bradbury	bb479ca311	[RISCV] Avoid overflow when determining number of nops for code align RISCVAsmBackend::shouldInsertExtraNopBytesForCodeAlign() assumed that the align specified would be greater than or equal to the minimum nop length, but that is not always the case - for example if a user specifies ".align 0" in assembly. Differential Revision: https://reviews.llvm.org/D63274 Patch by Edward Jones. llvm-svn: 366176	2019-07-16 04:40:25 +00:00
Alex Bradbury	e9ad0cf6cf	[RISCV] Fix a potential issue in shouldInsertFixupForCodeAlign() The bool result of shouldInsertExtraNopBytesForCodeAlign() is not checked but the returned nop count is unconditionally read even though it could be uninitialized. Differential Revision: https://reviews.llvm.org/D63285 Patch by Edward Jones. llvm-svn: 366175	2019-07-16 04:37:19 +00:00
Alex Bradbury	ef8577ef98	[RISCV][NFC] Split PseudoCALL pattern out from instruction Since PseudoCALL defines AsmString, it can be generated from assembly, and so code-gen patterns should be defined separately to be consistent with the style of the RISCV backend. Other pseudo-instructions exist that have code-gen patterns defined directly, but these instructions are purely for code-gen and cannot be written in assembly. Differential Revision: https://reviews.llvm.org/D64012 Patch by James Clarke. llvm-svn: 366174	2019-07-16 03:56:45 +00:00
Alex Bradbury	a3c7b27419	[RISCV][NFC] Fix HasStedExtA -> HasStdExtA typo in comment Differential Revision: https://reviews.llvm.org/D64011 Patch by James Clarke. llvm-svn: 366173	2019-07-16 03:54:08 +00:00
Alex Bradbury	4ac0b9be23	[RISCV] Make RISCVELFObjectWriter::getRelocType check IsPCRel Previously, this function didn't check the IsPCRel argument. But doing so is a useful check for errors, and also seemingly necessary for FK_Data_4 (which we produce a R_RISCV_32_PCREL relocation for if IsPCRel). Other than R_RISCV_32_PCREL, this should be NFC. Future exception handling related patches will include tests that capture this behaviour. llvm-svn: 366172	2019-07-16 03:47:34 +00:00
Matt Arsenault	1739b700b1	AMDGPU: Avoid code predicates for extload PatFrags Use the MemoryVT field. This will be necessary for tablegen to automatically handle patterns for GlobalISel. Doesn't handle the d16 lo/hi patterns. Those are a special case since they involve the custom node type. llvm-svn: 366168	2019-07-16 02:46:05 +00:00
Craig Topper	51193871da	[X86] Teach convertToThreeAddress to handle SUB with immediate We mostly avoid sub with immediate but there are a couple of cases that can create them. One is the add 128, %rax -> sub -128, %rax trick in isel. The other is when a SUB immediate gets created for a compare where both the flags and the subtract value are used. If we are unable to linearize the SelectionDAG to satisfy the flag user and the sub result user from the same instruction, we will clone the sub immediate for the two uses. The one that produces flags will eventually become a compare. The other will have its flag output dead, and could then be considered for LEA creation. I added additional test cases to add.ll to show that the sub -128 trick gets converted to LEA and a case where we don't need to convert it. This showed up in the current codegen for PR42571. Differential Revision: https://reviews.llvm.org/D64574 llvm-svn: 366151	2019-07-15 23:07:56 +00:00
Heejin Ahn	1cf6922660	[WebAssembly] Add missing utility methods for exnref type Summary: This adds missing utility methods and copy instruction handling for `exnref` type and also adds tests. `tee` instruction tests are missing because `isTee` is currently only used in ExplicitLocals pass and testing that pass in mir requires serialization of stackified registers in mir files, which is a bit nontrivial because `MachineFunctionInfo` only has info of vreg numbers (which are large integers) but not the mir's register numbers. But this change is quite trivial anyway. Reviewers: tlively Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64705 llvm-svn: 366149	2019-07-15 23:04:00 +00:00
Heejin Ahn	9f96a58ccc	[WebAssembly] Rename except_ref type to exnref Summary: We agreed to rename `except_ref` to `exnref` for consistency with other reference types in https://github.com/WebAssembly/exception-handling/issues/79. This also renames WebAssemblyInstrExceptRef.td to WebAssemblyInstrRef.td in order to use the file for other reference types in future. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64703 llvm-svn: 366145	2019-07-15 22:49:25 +00:00
Wouter van Oortmerssen	292e21d8bc	[WebAssembly] Assembler: support special floats: infinity / nan Summary: These are emitted as identifiers by the InstPrinter, so we should parse them as such. These could potentially clash with symbols of the same name, but that is out of our (the WebAssembly backend) control. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64770 llvm-svn: 366139	2019-07-15 22:13:39 +00:00
Austin Kerbow	423b4a18a4	[AMDGPU] Enable merging m0 initializations. Summary: Enable hoisting and merging m0 defs that are initialized with the same immediate value. Fixes bug where removed instructions are not considered to interfere with other inits, and make sure to not hoist inits before block prologues. Reviewers: rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64766 llvm-svn: 366135	2019-07-15 22:07:05 +00:00
Simon Atanasyan	becae2b232	[mips] Print BEQZL and BNEZL pseudo instructions One of the reasons - to be compatible with GNU tools. llvm-svn: 366133	2019-07-15 21:46:38 +00:00
Matt Arsenault	b082f1055b	AMDGPU: Use standalone MUBUF load patterns We already do this for the flat and DS instructions, although it is certainly uglier and more verbose. This will allow using separate pattern definitions for extload and zextload. Currently we get away with using a single PatFrag with custom predicate code to check if the extension type is a zextload or anyextload. The generic mechanism the global isel emitter understands treats these as mutually exclusive. I was considering making the pattern emitter accept zextload or sextload extensions for anyextload patterns, but in global isel, the different extending loads have distinct opcodes, and there is currently no mechanism for an opcode matcher to try multiple (and there probably is very little need for one beyond this case). llvm-svn: 366132	2019-07-15 21:41:44 +00:00
Matt Arsenault	66ee934440	AMDGPU/GlobalISel: Allow scalar s1 and/or/xor If a 1-bit value is in a 32-bit VGPR, the scalar opcodes set SCC to whether the result is 0. If the inputs are SCC, these can be copied to a 32-bit SGPR to produce an SCC result. llvm-svn: 366125	2019-07-15 20:20:18 +00:00
Matt Arsenault	c8291c94f8	AMDGPU/GlobalISel: Select G_AND/G_OR/G_XOR llvm-svn: 366121	2019-07-15 19:50:07 +00:00
Matt Arsenault	ad19b50c00	AMDGPU/GlobalISel: Don't constrain source register of VCC copies This is a hack until I come up with a better way of dealing with the pseudo-register banks used for boolean values. If the use instruction constrains the register, the selector for the def instruction won't see that the bank was VCC. A 1-bit SReg_32 could ambiguously have been SCCRegBank or VCCRegBank in wave32. This is necessary to successfully select branches with an and/or/xor condition. llvm-svn: 366120	2019-07-15 19:48:36 +00:00
Matt Arsenault	e1b52f4180	AMDGPU/GlobalISel: Fix selecting vcc->vcc bank copies The extra test change is correct, although how it arrives there is a bug that needs work. With wave32, the test for isVCC ambiguously reports true for an SCC or VCC source. A new allocatable pseudo register class for SCC may be necessary. llvm-svn: 366119	2019-07-15 19:46:48 +00:00
Matt Arsenault	3bfdb54d88	AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC llvm-svn: 366118	2019-07-15 19:45:49 +00:00
Matt Arsenault	18b7133843	AMDGPU/GlobalISel: Fix handling of sgpr (not scc bank) s1 to VCC This was emitting a copy from a 32-bit register to a 64-bit. llvm-svn: 366117	2019-07-15 19:44:07 +00:00
Matt Arsenault	6ed315f89b	AMDGPU/GlobalISel: Custom legalize G_INSERT_VECTOR_ELT llvm-svn: 366116	2019-07-15 19:43:04 +00:00
Matt Arsenault	b0e04c018c	AMDGPU/GlobalISel: Custom legalize G_EXTRACT_VECTOR_ELT Turn the constant cases into G_EXTRACTs. llvm-svn: 366115	2019-07-15 19:40:59 +00:00
Matt Arsenault	5dfd466032	AMDGPU/GlobalISel: Fix G_ICMP for wave32 llvm-svn: 366114	2019-07-15 19:39:31 +00:00
David Green	dc56995c57	[ARM] MVE vector for 64bit types We need to make sure that we are sensibly dealing with vectors of types v2i64 and v2f64, even if most of the time we cannot generate native operations for them. This mostly adds a lot of testing, plus fixes up a couple of the issues found. And, or and xor can be legal for v2i64, and shifts combining needs a slight fixup. Differential Revision: https://reviews.llvm.org/D64316 llvm-svn: 366106	2019-07-15 18:42:54 +00:00
Matt Arsenault	90bdfb3daf	AMDGPU/GlobalISel: Widen vector extracts llvm-svn: 366103	2019-07-15 18:31:10 +00:00
Matt Arsenault	53fa759ff5	AMDGPU/GlobalISel: Handle llvm.amdgcn.if.break llvm-svn: 366102	2019-07-15 18:25:24 +00:00
Matt Arsenault	b390121efb	AMDGPU/GlobalISel: Select llvm.amdgcn.end.cf llvm-svn: 366099	2019-07-15 18:18:46 +00:00
Sanjay Patel	eb99165b97	[x86] try to keep FP casted+truncated+extracted vector element out of GPRs inttofp (trunc (extelt X, 0)) --> inttofp (extelt (bitcast X), 0) We have pseudo-vectorization of scalar int to FP casts, so this tries to make that more likely by replacing a truncate with a bitcast. I didn't see any test diffs starting from 'uitofp', so I left that as a TODO. We can't only match the shorter trunc+extract pattern because there's an opposing transform somewhere, so we infinite loop. Waiting to try this during lowering is another possibility. A motivating case is shown in PR39975 and included in the test diffs here: https://bugs.llvm.org/show_bug.cgi?id=39975 Differential Revision: https://reviews.llvm.org/D64710 llvm-svn: 366098	2019-07-15 18:17:23 +00:00
Craig Topper	81971b2b79	[X86] Return UNDEF from LowerScalarImmediateShift when the shift amount is out of range. I think we only turn out-of-range shifts to undef when all elements are out of range or the shift amount is a splat out of range. I'm not sure which, I didn't check. During lowering we can split a shift where some elements are out of range into multiple shifts. This can create a new shift with a splat shift amount that is out of range. This patch returns undef for this case. Fixes PR42615. Differential Revision: https://reviews.llvm.org/D64699 llvm-svn: 366096	2019-07-15 17:56:57 +00:00
Matt Arsenault	49169a963e	AMDGPU: Add 24-bit mul intrinsics Insert these during codegenprepare. This works around a DAG issue where generic combines eliminate the and asserting the high bits are zero, which then exposes an unknown read source to the mul combine. It isn't worth the hassle of trying to insert an AssertZext or something to try to deal with it. llvm-svn: 366094	2019-07-15 17:50:31 +00:00
Stanislav Mekhanoshin	7938424eb9	[AMDGPU] Copy missing predicate from pseudo to real NFC at the moment, needed for future commit. Differential Revision: https://reviews.llvm.org/D64761 llvm-svn: 366092	2019-07-15 17:49:25 +00:00
David Green	8e7eee617a	[ARM] Minor formatting in ARMInstrMVE.td. NFC llvm-svn: 366089	2019-07-15 17:29:06 +00:00
Matt Arsenault	a65913e752	AMDGPU/GlobalISel: Select easy cases for G_BUILD_VECTOR llvm-svn: 366087	2019-07-15 17:26:43 +00:00
Matt Arsenault	cc02b17082	AMDGPU/GlobalISel: RegBankSelect for G_CONCAT_VECTORS llvm-svn: 366086	2019-07-15 17:20:40 +00:00
Stanislav Mekhanoshin	fd08dcb9db	[AMDGPU] fixed scheduler crash in gfx908 For some reason scheduler can send down an SUnit without an instruction. Differential Revision: https://reviews.llvm.org/D64709 llvm-svn: 366074	2019-07-15 15:34:05 +00:00
Dmitry Preobrazhensky	5153b1723a	[AMDGPU][MC][GFX9][GFX10] Added support of GET_DOORBELL message Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D64729 llvm-svn: 366071	2019-07-15 15:12:16 +00:00
Dmitry Preobrazhensky	8d879c8d95	[AMDGPU][MC] Corrected encoding of src0 for DS_GWS_* instructions See bug 42599: https://bugs.llvm.org/show_bug.cgi?id=42599 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D64716 llvm-svn: 366067	2019-07-15 14:37:57 +00:00
Simon Pilgrim	60fb5e97a0	[X86] isTargetShuffleEquivalent - assert the expected mask is correctly formed. NFCI. While we don't make any assumptions about the actual mask, assert that the expected mask only contains valid mask element values. llvm-svn: 366066	2019-07-15 14:29:14 +00:00
Simon Atanasyan	83ae0b5eb4	[mips] Remove "else-after-return". NFC llvm-svn: 366064	2019-07-15 13:12:36 +00:00
David Green	6e89887642	[ARM] MVE Vector Shifts This adds basic lowering for MVE shifts. There are many shifts in MVE, but the instructions handled here are: VSHL (imm) VSHRu (imm) VSHRs (imm) VSHL (vector) VSHL (register) MVE, like NEON before it, doesn't have shift right by a vector (or register). We instead have to negate the amount and shift in the opposite direction. This means we have to convert any SHR's into a form of SHL (that is still signed or unsigned) with a negated condition and selecting from there. MVE still does have shifting by an immediate for SHL, ASR and LSR. This adds lowering for these and for register forms, which work well for shift lefts but may require an extra fold of neg(vdup(x)) -> vdup(neg(x)) to potentially work optimally for right shifts. Differential Revision: https://reviews.llvm.org/D64212 llvm-svn: 366056	2019-07-15 11:35:39 +00:00
David Green	f059147a10	[ARM] Move Shifts after Bits. NFC This just moves the shift instruction definitions further down the ARMInstrMVE.td file, to make positioning patterns slightly more natural. llvm-svn: 366054	2019-07-15 11:22:05 +00:00
David Green	da750b1688	[ARM] Adjust how NEON shifts are lowered This adjusts the way that we lower NEON shifts to use a DAG target node, not via a neon intrinsic. This is useful for handling MVE shift operations in the same way. It also renames some of the immediate shift nodes for consistency, and moves some of the processing of immediate shifts into LowerShift allowing it to capture more cases. Differential Revision: https://reviews.llvm.org/D64426 llvm-svn: 366051	2019-07-15 10:44:50 +00:00
Bill Wendling	796ed134cc	Remove set but unused variable. llvm-svn: 366041	2019-07-15 06:35:28 +00:00
Craig Topper	635d103e0b	[X86] Separate the memory size of vzext_load/vextract_store from the element size of the result type. Use them to improve the codegen of v2f32 loads/stores with sse1 only. Summary: SSE1 only supports v4f32, but does have instructions like movlps/movhps that load/store 64 bits of memory. This patch breaks the connection between the node VT of the vzext_load/vextract_store patterns and the memory VT, enabling a v4f32 node with a 64-bit memory VT. I've used i64 as the memory VT here. I've written the PatFrag predicate to just check the store size not the specific VT. I think the VT will only matter for CSE purposes. We could use v2f32, but if we want to start using these operations in more places a simple integer type might make the most sense. I'd like to maybe use this same thing for SSE2 and later as well, but that will need more work to be supported by EltsFromConsecutiveLoads to avoid regressing lit tests. I'd maybe also like to combine bitcasts with these load/stores nodes now that the types are disconnected. And I'd also like to consider canonicalizing (scalar_to_vector + load) to vzext_load. If you want I can split the mechanical tablegen stuff where I added the 32/64 off from the sse1 change. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64528 llvm-svn: 366034	2019-07-15 02:02:31 +00:00
Craig Topper	9450b0084a	[X86] Remove offset of 8 from the call to FuseInst for UNPCKLPDrr folding added in r365287. This was copy/pasted from above and I forgot to change it. We just need the default offset of 0 here. Fixes PR42616. llvm-svn: 366011	2019-07-14 04:13:33 +00:00
David Green	458a720ec1	[ARM] Add sign and zero extend patterns for MVE The vmovlb instructions can be used to sign or zero extend vector registers between types. This adds some patterns for them and relevant testing. The VBICIMM generation is also put behind a hasNEON check (as is already done for VORRIMM). Code originally by David Sherwood. Differential Revision: https://reviews.llvm.org/D64069 llvm-svn: 366008	2019-07-13 15:43:00 +00:00
David Green	07a7ec2021	[ARM] MVE VNEG instruction patterns This selects integer VNEG instructions, which can be especially useful with shifts. Differential Revision: https://reviews.llvm.org/D64204 llvm-svn: 366006	2019-07-13 15:26:51 +00:00
David Green	4ce648b5e8	[ARM] MVE integer abs Similar to floating point abs, we also have instructions for integers. Differential Revision: https://reviews.llvm.org/D64027 llvm-svn: 366005	2019-07-13 14:58:32 +00:00
David Green	701bf714db	[ARM] MVE integer min and max This simply makes the MVE integer min and max instructions legal and adds the relevant patterns for them. Differential Revision: https://reviews.llvm.org/D64026 llvm-svn: 366004	2019-07-13 14:48:54 +00:00
David Green	ac5bcbeb9f	[ARM] MVE VRINT support This adds support for the floor/ceil/trunc/... series of instructions, converting to various forms of VRINT. They use the same suffixes as their floating point counterparts. There is no VRINTR, so nearbyint is expanded. Also added a copysign test, to show it is expanded. Differential Revision: https://reviews.llvm.org/D63985 llvm-svn: 366003	2019-07-13 14:38:53 +00:00
David Green	ec8af0db6c	[ARM] MVE minnm and maxnm instructions This adds the patterns for minnm and maxnm from the fminnum and fmaxnum nodes, similar to scalar types. Original patch by Simon Tatham Differential Revision: https://reviews.llvm.org/D63870 llvm-svn: 366002	2019-07-13 14:29:02 +00:00
Sanjay Patel	2097f75eab	[x86] simplify cmov with same true/false operands llvm-svn: 365998	2019-07-13 12:04:52 +00:00
Stanislav Mekhanoshin	1dfae6fe50	[AMDGPU] use v32f32 for 3 mfma intrinsics These should really use v32f32, but were defined as v32i32 due to the lack of the v32f32 type. Differential Revision: https://reviews.llvm.org/D64667 llvm-svn: 365972	2019-07-12 22:42:01 +00:00
Wouter van Oortmerssen	d8ddf83950	[WebAssembly] refactored utilities to not depend on MachineInstr Summary: Most of these functions can work for MachineInstr and MCInst equally now. Reviewers: dschuff Subscribers: MatzeB, sbc100, jgravelle-google, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64643 llvm-svn: 365965	2019-07-12 22:08:25 +00:00
Evgeniy Stepanov	32452487ae	Factor out resolveFrameOffsetReference (NFC). Split AArch64FrameLowering::resolveFrameIndexReference in two parts * Finding frame offset for the index. * Finding base register and offset to that register. The second part will be used to implement a virtual frame pointer in armv8.5 MTE stack instrumentation lowering. Reviewers: pcc, vitalybuka, hctim, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64171 llvm-svn: 365958	2019-07-12 21:13:55 +00:00
Matt Arsenault	51a05d72ae	AMDGPU: Drop remnants of byval support for shaders Before 2018, mesa used to use byval interchangeably with inreg, which didn't really make sense. Fix tests still using it to avoid breaking in a future commit. llvm-svn: 365953	2019-07-12 20:12:17 +00:00
David Tenty	ae79a2c390	Fix missing use of defined() in include guard Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64657 llvm-svn: 365952	2019-07-12 20:12:15 +00:00
Nikita Popov	411fa4c0df	[SystemZ] Fix addcarry of addcarry of const carry (PR42606) This fixes https://bugs.llvm.org/show_bug.cgi?id=42606 by extending D64213. Instead of only checking if the carry comes from a matching operation, we now check the full chain of carries. Otherwise we might custom lower the outermost addcarry, but then generically legalize an inner addcarry. Differential Revision: https://reviews.llvm.org/D64658 llvm-svn: 365949	2019-07-12 20:03:34 +00:00
Craig Topper	b828f0b90a	[X86] Use MachineInstr::findRegisterDefOperand to simplify some code in optimizeCompareInstr. NFCI llvm-svn: 365946	2019-07-12 19:26:35 +00:00
Ulrich Weigand	38ec89a670	[SystemZ] Fix build bot failure after r365932 Insert LLVM_FALLTHROUGH to avoid compiler warning. llvm-svn: 365942	2019-07-12 18:44:51 +00:00
Stanislav Mekhanoshin	495b0f5cc3	[AMDGPU] Extend MIMG opcode to 8 bits This is NFC, but required for future commit. Differential Revision: https://reviews.llvm.org/D64649 llvm-svn: 365940	2019-07-12 18:38:06 +00:00
Ulrich Weigand	0f0a8b7784	[SystemZ] Add support for new cpu architecture - arch13 This patch series adds support for the next-generation arch13 CPU architecture to the SystemZ backend. This includes: - Basic support for the new processor and its features. - Assembler/disassembler support for new instructions. - CodeGen for new instructions, including new LLVM intrinsics. - Scheduler description for the new processor. - Detection of arch13 as host processor. Note: No currently available Z system supports the arch13 architecture. Once new systems become available, the official system name will be added as supported -march name. llvm-svn: 365932	2019-07-12 18:13:16 +00:00
Craig Topper	98f931639b	[X86] Add NEG to isUseDefConvertible. We can use the C flag from NEG to detect that the input was zero. Really we could probably use the Z flag too. But C matches what we'd do for usubo 0, X. Haven't found a test case for this due to the usubo formation in CGP. But I verified if I comment out the CGP code this transformation catches some of the same cases. llvm-svn: 365929	2019-07-12 17:52:17 +00:00
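The flag equivalence behind the NEG change can be checked in plain C. A minimal sketch, assuming GCC/Clang's `__builtin_sub_overflow` as a stand-in for `usubo 0, X`; `neg_carry` and `input_was_zero` are hypothetical helper names, not LLVM APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models the carry produced by "usubo 0, X": an unsigned 0 - x borrows
   exactly when x is non-zero. x86 NEG sets CF under the same condition,
   which is why its C flag can stand in for usubo 0, X. */
static bool neg_carry(uint32_t x) {
    uint32_t diff; /* wrapped result, i.e. -x */
    bool borrow = __builtin_sub_overflow((uint32_t)0, x, &diff);
    assert(borrow == (x != 0)); /* the equivalence this sketch relies on */
    return borrow;
}

/* The input was zero exactly when the carry is clear. */
static bool input_was_zero(uint32_t x) {
    return !neg_carry(x);
}
```

With this model, `input_was_zero(0)` is true and every non-zero input yields false, matching the idea of reading the C flag to detect a zero input.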
Jay Foad	27ec195f39	[AMDGPU] Fix DPP combiner check for exec modification Summary: r363675 changed the exec modification helper function, now called execMayBeModifiedBeforeUse, so that if no UseMI is specified it checks all instructions in the basic block, even beyond the last use. That meant that the DPP combiner no longer worked in any basic block that ended with a control flow instruction, and in particular it didn't work on code sequences generated by the atomic optimizer. Fix it by reinstating the old behaviour but in a new helper function execMayBeModifiedBeforeAnyUse, and limiting the number of instructions scanned. Reviewers: arsenm, vpykhtin Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64393 llvm-svn: 365910	2019-07-12 15:59:40 +00:00
Jay Foad	7816ad918f	[AMDGPU] Restrict v_cndmask_b32 abs/neg modifiers to f32 Summary: D64497 allowed abs/neg source modifiers on v_cndmask_b32 but it doesn't make any sense to apply them to f16 operands; they would interpret the bits of the value as an f32, giving nonsensical results. This patch restricts them to f32 operands. Reviewers: arsenm, hakzsam Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64636 llvm-svn: 365904	2019-07-12 15:02:59 +00:00
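The problem D64636 addresses can be seen at the bit level. A minimal C sketch, assuming an f16 value held in the low 16 bits of a 32-bit register and modelling the f32 abs/neg modifiers as operations on bit 31 (the f32 sign bit); the helper names are hypothetical and this is not how the hardware or the selector is written:

```c
#include <assert.h>
#include <stdint.h>

/* Put an f16 bit pattern in the low half of a 32-bit register, apply
   an f32-style abs (clear bit 31) and read the f16 back. */
static uint16_t f16_after_f32_abs(uint16_t h) {
    uint32_t reg = h;                 /* upper 16 bits are zero */
    uint32_t out = reg & 0x7FFFFFFFu; /* f32 abs clears bit 31 only */
    assert((out & 0xFFFFu) == h);     /* the f16 half is untouched */
    return (uint16_t)out;
}

/* Same experiment with an f32-style neg (flip bit 31). */
static uint16_t f16_after_f32_neg(uint16_t h) {
    uint32_t reg = h;
    uint32_t out = reg ^ 0x80000000u; /* f32 neg flips bit 31 only */
    assert((out & 0xFFFFu) == h);     /* the f16 sign (bit 15) is unchanged */
    return (uint16_t)out;
}
```

For example, `f16_after_f32_abs(0xBC00)` (f16 -1.0) still returns `0xBC00`: an f32-style abs leaves the f16 value negative, which illustrates why the modifiers only make sense on f32 operands.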
Fangrui Song	b251cc0d91	Delete dead stores llvm-svn: 365903	2019-07-12 14:58:15 +00:00
Djordje Todorovic	0739ccd3b5	Revert "[DwarfDebug] Dump call site debug info" A build failure was found on the SystemZ platform. This reverts commit 9e7e73578e54cd22b3c7af4b54274d743b6607cc. llvm-svn: 365886	2019-07-12 09:45:12 +00:00
Sam Elliott	fafec5155e	[RISCV] Allow parsing dot '.' in assembly Summary: Useful for jumps, such as `j .`. I am not sure who should review this. Do not hesitate to change the reviewers if needed. Reviewers: asb, jrtc27, lenary Reviewed By: lenary Subscribers: MaskRay, lenary, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63669 Patch by John LLVM (JohnLLVM) llvm-svn: 365881	2019-07-12 08:36:07 +00:00
Bryant Wong	7ba838d29c	Test commit. NFC. Formatting fix. llvm-svn: 365878	2019-07-12 08:25:59 +00:00
Simon Atanasyan	ee5af50eb0	[mips] Fix JmpLink to texternalsym and tglobaladdr on microMIPS R6 There is no match for the `MipsJmpLink texternalsym` and `MipsJmpLink tglobaladdr` patterns for microMIPS R6. As a result LLVM incorrectly selects the `JALRC16` compact 2-byte instruction, which takes a target instruction address from a register only, and assigns an `R_MIPS_32` relocation to this instruction. This relocation completely overwrites `JALRC16` and nearby instructions. This patch adds the missing matching patterns, selects the `BALC` instruction and assigns the correct `R_MICROMIPS_PC26_S1` relocation. Differential Revision: https://reviews.llvm.org/D64552 llvm-svn: 365870	2019-07-12 04:58:45 +00:00
Michael Liao	16d3c1ac03	[AMDGPU] Skip calculating callee saved registers for entry function. Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64596 llvm-svn: 365846	2019-07-11 23:53:30 +00:00
Matt Arsenault	e5fb434d92	AMDGPU: s_waitcnt field should be treated as unsigned Also make it an ImmLeaf, so it should work with global isel as well, which was part of the point of moving it in the first place. llvm-svn: 365842	2019-07-11 23:42:57 +00:00
Stanislav Mekhanoshin	28550c8680	[AMDGPU] Fixed asan error with agpr spilling Instruction was used after it was erased. llvm-svn: 365837	2019-07-11 22:30:11 +00:00
Stanislav Mekhanoshin	937ff6e701	[AMDGPU] gfx908 agpr spilling Differential Revision: https://reviews.llvm.org/D64594 llvm-svn: 365833	2019-07-11 21:54:13 +00:00
Stanislav Mekhanoshin	7d2019bb96	[AMDGPU] gfx908 hazard recognizer Differential Revision: https://reviews.llvm.org/D64593 llvm-svn: 365829	2019-07-11 21:30:34 +00:00
Stanislav Mekhanoshin	b83e283e65	[AMDGPU] gfx908 scheduling Differential Revision: https://reviews.llvm.org/D64590 llvm-svn: 365826	2019-07-11 21:25:00 +00:00
Stanislav Mekhanoshin	e67cc380a8	[AMDGPU] gfx908 mfma support Differential Revision: https://reviews.llvm.org/D64584 llvm-svn: 365824	2019-07-11 21:19:33 +00:00
Wouter van Oortmerssen	a617967d68	[WebAssembly] Assembler: support negative float constants. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64367 llvm-svn: 365802	2019-07-11 18:18:07 +00:00
Benjamin Kramer	fa1a4e4de5	[NVPTX] Use atomicrmw fadd instead of intrinsics AutoUpgrade the old intrinsics to atomicrmw fadd. llvm-svn: 365796	2019-07-11 17:11:25 +00:00
Sanjay Patel	5cc7c9ab93	[X86] Merge negated ISD::SUB nodes into X86ISD::SUB equivalent (PR40483) Follow up to D58597, where it was noted that the commuted ISD::SUB variant was having problems with lack of combines. See also D63958 where we untangled setcc/sub pairs. Differential Revision: https://reviews.llvm.org/D58875 llvm-svn: 365791	2019-07-11 15:56:33 +00:00
Matt Arsenault	b725d27350	AMDGPU/GlobalISel: Move kernel argument handling to separate function llvm-svn: 365782	2019-07-11 14:18:25 +00:00
Tim Northover	67828edbbd	OpaquePtr: switch to GlobalValue::getValueType in a few places. NFC. llvm-svn: 365770	2019-07-11 13:13:02 +00:00
Fangrui Song	f9ca13cb5f	[X86] -fno-plt: use GOT __tls_get_addr only if GOTPCRELX is enabled Summary: As of binutils 2.32, ld has a bogus TLS relaxation error when the GD/LD code sequence using R_X86_64_GOTPCREL (instead of R_X86_64_GOTPCRELX) is attempted to be relaxed to IE/LE (binutils PR24784). gold and lld are good. In gcc/config/i386/i386.md, there is a configure-time check of as/ld support and the GOT relaxation will not be used if as/ld doesn't support it: if (flag_plt || !HAVE_AS_IX86_TLS_GET_ADDR_GOT) return "call\t%P2"; return "call\t{*%p2@GOT(%1)|[DWORD PTR %p2@GOT[%1]]}"; In clang, -DENABLE_X86_RELAX_RELOCATIONS=OFF is the default. The ld.bfd bogus error can be reproduced with: thread_local int a; int main() { return a; } clang -fno-plt -fpic a.cc -fuse-ld=bfd GOTPCRELX gained relatively good support in 2016, which is considered relatively new. It is even difficult to conditionally default to -DENABLE_X86_RELAX_RELOCATIONS=ON due to cross compilation reasons. So work around the ld.bfd bug by only using GOT when GOTPCRELX is enabled. Reviewers: dalias, hjl.tools, nikic, rnk Reviewed By: nikic Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64304 llvm-svn: 365752	2019-07-11 10:10:09 +00:00
Sam Parker	08b4a8da07	[ARM][LowOverheadLoops] Correct offset checking This patch addresses a couple of problems: 1) The maximum supported offset of LE is -4094. 2) The offset of WLS also needs to be checked, this uses a maximum positive offset of 4094. The use of BasicBlockUtils has been changed because the block offsets weren't being initialised, but the isBBInRange checks both positive and negative offsets. ARMISelLowering has been tweaked because the test case presented another pattern that we weren't supporting. llvm-svn: 365749	2019-07-11 09:56:15 +00:00
Simon Tatham	7916198a41	[ARM] Remove nonexistent unsigned forms of MVE VQDMLAH. The VQDMLAH.U8, VQDMLAH.U16 and VQDMLAH.U32 instructions don't actually exist: the Armv8.1-M architecture spec only lists signed forms of that instruction. The unsigned ones were added in error: they existed in an early draft of the spec, but they were removed before the public version, and we missed that particular spec change. Also affects the variant forms VQDMLASH, VQRDMLAH and VQRDMLASH. Reviewers: miyuki Subscribers: javed.absar, kristof.beyls, hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64502 llvm-svn: 365747	2019-07-11 09:52:15 +00:00
Petar Avramovic	962524070a	[MIPS GlobalISel] Skip copies in addUseDef and addDefUses Skip copies between virtual registers during search for UseDefs and DefUses. Since each operand has one def, the search for UseDefs is straightforward. But since an operand can have many uses, we have to check all uses of each copy we traverse during the search for DefUses. Differential Revision: https://reviews.llvm.org/D64486 llvm-svn: 365744	2019-07-11 09:28:34 +00:00
Petar Avramovic	e3bb0a72b6	[MIPS GlobalISel] RegBankSelect for chains of ambiguous instructions When one of the uses/defs of an ambiguous instruction is also ambiguous, visit it recursively and search its uses/defs for an instruction with only one mapping available. When all instructions in a chain are ambiguous, an arbitrary mapping can be selected. For s64 operands in an ambiguous chain fprb is selected, since it results in fewer instructions than having to narrow scalar s64 to s32. For s32 both gprb and fprb result in the same number of instructions, and gprb is selected as the general-purpose option. At the moment we always avoid cross register bank copies. TODO: Implement a model for cost calculations of different mappings on the same instruction and of cross bank copies. Allow cross bank copies when appropriate according to the cost model. Differential Revision: https://reviews.llvm.org/D64485 llvm-svn: 365743	2019-07-11 09:22:49 +00:00
Jay Foad	c1b7db9eda	Remove some redundant code from r290372 and improve a comment. llvm-svn: 365741	2019-07-11 08:49:52 +00:00
Sam Parker	85ad78b1cf	[ARM][ParallelDSP] Change the search for smlads Two functional changes have been made here: - Now search up from any add instruction to find the chains of operations that we may turn into a smlad. This allows the generation of a smlad which doesn't accumulate into a phi. - The search function has been corrected to stop it falsely searching up through an invalid path. The bulk of the changes have been making the Reduction struct a class and making it more C++y with getters and setters. Differential Revision: https://reviews.llvm.org/D61780 llvm-svn: 365740	2019-07-11 07:47:50 +00:00
Heejin Ahn	54c136bbdf	[WebAssembly] Print error message for llvm.clear_cache intrinsic Summary: Wasm does not currently support the `llvm.clear_cache` intrinsic, and this prints a proper error message instead of segfaulting. Reviewers: dschuff, sbc100, sunfish Subscribers: jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64322 llvm-svn: 365731	2019-07-11 05:55:47 +00:00
Craig Topper	88729e3dec	[X86] Don't convert 8 or 16 bit ADDs to LEAs on Atom in FixupLEAPass. We use the functions that convert to three address to do the conversion, but changing an 8 or 16 bit ADD will cause it to create a virtual register. This can't be done after register allocation, which is where this pass runs. I've switched the pass completely to a white list of instructions that can be converted to LEA instead of a blacklist that was incorrect. This will avoid surprises if we enhance the three address conversion function to include additional instructions in the future. Fixes PR42565. llvm-svn: 365720	2019-07-11 01:01:39 +00:00
Stanislav Mekhanoshin	e93279fd1b	[AMDGPU] gfx908 atomic fadd and atomic pk_fadd Differential Revision: https://reviews.llvm.org/D64435 llvm-svn: 365717	2019-07-11 00:10:17 +00:00
Stanislav Mekhanoshin	c0ae1be066	[AMDGPU] gfx908 dot instruction support Differential Revision: https://reviews.llvm.org/D64431 llvm-svn: 365715	2019-07-11 00:00:27 +00:00
Craig Topper	1c327c7e0a	[X86] Add patterns with and_flag_nocf for BLSI and TBM instructions. Fixes similar issues to r352306. llvm-svn: 365705	2019-07-10 22:44:32 +00:00
Craig Topper	d916f23b83	[X86] Add BLSR and BLSMSK to isUseDefConvertible. Unfortunately subo formation in CGP prevents obvious ways of testing this. But we already have BLSI in here and the flag behavior is well understood. Might become more useful if we improve PR42571. llvm-svn: 365702	2019-07-10 22:14:39 +00:00
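For reference, the results and carry flags of the three BMI1 instructions mentioned here can be modelled in C. This is a sketch based on the Intel SDM descriptions (BLSR and BLSMSK set CF when the source is zero, BLSI sets CF when it is non-zero); `bmi_result` and the function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Result value plus the carry flag the instruction leaves behind. */
typedef struct {
    uint32_t value;
    bool cf;
} bmi_result;

/* BLSR: reset the lowest set bit. CF = (source == 0). */
static bmi_result blsr(uint32_t x) {
    bmi_result r = { x & (x - 1u), x == 0 };
    return r;
}

/* BLSMSK: mask up to and including the lowest set bit. CF = (source == 0). */
static bmi_result blsmsk(uint32_t x) {
    bmi_result r = { x ^ (x - 1u), x == 0 };
    return r;
}

/* BLSI: isolate the lowest set bit. CF = (source != 0). */
static bmi_result blsi(uint32_t x) {
    bmi_result r = { x & (0u - x), x != 0 };
    return r;
}
```

In each case CF alone tells whether the source was zero, which is the kind of well-defined flag behaviour that lets a separate compare of the source against zero be folded away.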
David Tenty	a2681296e0	[NFC] Fix IR/MC dependency issue for function descriptor SDAG implementation Summary: llvm/IR/GlobalValue.h can't be included in MC, because that creates a circular dependency between the MC and IR libraries. This circular dependency is causing an issue for build systems that enforce layering. Author: Xiangling_L Reviewers: sfertile, jasonliu, hubert.reinterpretcast, gribozavr Reviewed By: gribozavr Subscribers: wuzish, nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64445 llvm-svn: 365701	2019-07-10 22:13:55 +00:00
Craig Topper	021ba49b31	[X86] Remove unused variable. NFC llvm-svn: 365697	2019-07-10 21:01:34 +00:00
Amara Emerson	7a4d2df04a	[AArch64][GlobalISel] Optimize compare and branch cases with G_INTTOPTR and unknown values. Since we have distinct types for pointers and scalars, G_INTTOPTRs can sometimes obstruct attempts to find constant source values. These usually come about when we try to do some kind of null pointer check. Teaching getConstantVRegValWithLookThrough about this operation allows the CBZ/CBNZ optimization to catch more cases. This change also improves the case where we can't find a constant source at all. Previously we would emit a cmp, cset and tbnz for that. Now we try to just emit a cmp and conditional branch, saving an instruction. The cumulative code size improvement of this change plus D64354 is 5.5% geomean on arm64 CTMark -O0. Differential Revision: https://reviews.llvm.org/D64377 llvm-svn: 365690	2019-07-10 19:21:43 +00:00
Jessica Paquette	7c95925b13	[GlobalISel][AArch64] Use getOpcodeDef instead of findMIFromReg Some minor cleanup. This function in Utils does the same thing as `findMIFromReg`. It also looks through copies, which `findMIFromReg` didn't. Delete `findMIFromReg` and use `getOpcodeDef` instead. This only happens in `tryOptVectorDup` right now. Update opt-shuffle-splat to show that we can look through the copies now, too. Differential Revision: https://reviews.llvm.org/D64520 llvm-svn: 365684	2019-07-10 18:46:56 +00:00
Jessica Paquette	3132968ae9	[GlobalISel][AArch64][NFC] Use getDefIgnoringCopies from Utils where we can There are a few places where we walk over copies throughout AArch64InstructionSelector.cpp. In Utils, there's a function that does exactly this which we can use instead. Note that the utility function works with the case where we run into a COPY from a physical register. We've run into bugs with this a couple times, so using it should defend us from similar future bugs. Also update opt-fold-compare.mir to show that we still handle physical registers properly. Differential Revision: https://reviews.llvm.org/D64513 llvm-svn: 365683	2019-07-10 18:44:57 +00:00
David Greene	d300a493df	Revert "[System Model] [TTI] Update cache and prefetch TTI interfaces" This broke some PPC prefetching tests. This reverts commit `9fdfb045ae`. llvm-svn: 365680	2019-07-10 18:25:58 +00:00
David Greene	9fdfb045ae	[System Model] [TTI] Update cache and prefetch TTI interfaces Rework the TTI cache and software prefetching APIs to prepare for the introduction of a general system model. Changes include: - Marking existing interfaces const and/or override as appropriate - Adding comments - Adding BasicTTIImpl interfaces that delegate to a subtarget implementation - Adding a default "no information" subtarget implementation Only a handful of targets use these interfaces currently: AArch64, Hexagon, PPC and SystemZ. AArch64 already has a custom subtarget implementation, so its custom TTI implementation is migrated to use the new facilities in BasicTTIImpl to invoke its custom subtarget implementation. The custom TTI implementations continue to exist for the other targets with this change. They are not moved over to subtarget-based implementations. The end goal is to have the default subtarget implementation defer to the system model defined by the target. With this change, the default subtarget implementation essentially returns "no information" for these interfaces. None of the existing users of TTI will hit that implementation because they define their own custom TTI implementations and won't use the BasicTTIImpl implementations. Once system models are in place for the targets that use these interfaces, their custom TTI implementations can be removed. Differential Revision: https://reviews.llvm.org/D63614 llvm-svn: 365676	2019-07-10 18:07:01 +00:00
Simon Pilgrim	5dd2af5248	[X86] EltsFromConsecutiveLoads - clean up element size calcs. NFCI. Determine the element/load size calculations earlier and assert that they are whole bytes in size. llvm-svn: 365674	2019-07-10 17:49:27 +00:00
Peter Collingbourne	893f8d719c	MC: AArch64: Add support for pg_hi21_nc relocation specifier. Differential Revision: https://reviews.llvm.org/D64455 llvm-svn: 365661	2019-07-10 16:36:46 +00:00
Matt Arsenault	6ce1b4fec5	GlobalISel: Legalization for G_FMINNUM/G_FMAXNUM llvm-svn: 365658	2019-07-10 16:31:19 +00:00
Simon Pilgrim	093f4aa72f	[X86] EltsFromConsecutiveLoads - remove duplicate check for element size. NFCI. We've already checked that each element is the correct contributory size for VT when we inspect the elements for Undef/Zero/Load. llvm-svn: 365656	2019-07-10 16:22:31 +00:00
Simon Pilgrim	893448a3e4	[X86] EltsFromConsecutiveLoads - ensure element reg/store sizes are the same size. NFCI. This renames the type so it doesn't sound like it's based off the load size - as we're moving towards supporting combining loads of different sizes. llvm-svn: 365655	2019-07-10 16:14:26 +00:00
Matt Arsenault	58426a3707	AMDGPU: Serialize mode from MachineFunctionInfo llvm-svn: 365653	2019-07-10 16:09:26 +00:00
Jay Foad	bba37e89a5	[AMDGPU] Allow abs/neg source modifiers on v_cndmask_b32 Summary: D59191 added support for these modifiers in the assembler and disassembler. This patch just teaches instruction selection that it can use them. Reviewers: arsenm, tstellar Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64497 llvm-svn: 365640	2019-07-10 14:53:47 +00:00
Simon Pilgrim	0a9479ef39	[X86] EltsFromConsecutiveLoads - cleanup Zero/Undef/Load element collection. NFCI. llvm-svn: 365628	2019-07-10 13:28:13 +00:00
Petar Avramovic	7d0778ea6b	[MIPS GlobalISel] Select float and double phi Select float and double phi for MIPS32. Differential Revision: https://reviews.llvm.org/D64420 llvm-svn: 365627	2019-07-10 13:18:13 +00:00
Petar Avramovic	7b31491ae2	[MIPS GlobalISel] Select float and double load and store Select float and double load and store for MIPS32. Differential Revision: https://reviews.llvm.org/D64419 llvm-svn: 365626	2019-07-10 12:55:21 +00:00
Sam Parker	775b2f598a	[NFC][ARM] Convert lambdas to static helpers Break up and convert some of the lambdas in ARMLowOverheadLoops into static functions. llvm-svn: 365623	2019-07-10 12:29:43 +00:00
Simon Pilgrim	ef1aac3191	[X86] EltsFromConsecutiveLoads - LDBase is non-null. NFCI. Don't bother checking for LDBase != null - it should be (and we assert that it is). llvm-svn: 365622	2019-07-10 12:22:59 +00:00
Simon Pilgrim	c972193583	[X86] EltsFromConsecutiveLoads - store Loads on a per-element basis. NFCI. Cache the LoadSDNode nodes so we can easily map to/from the element index instead of packing them together - this will be useful for future patches for PR16739 etc. llvm-svn: 365620	2019-07-10 11:26:57 +00:00
Simon Pilgrim	6a58583951	[X86][SSE] EltsFromConsecutiveLoads - add basic dereferenceable support This patch checks to see if the vector element loads are based off a dereferenceable pointer that covers the entire vector width, in which case we don't need to have element loads at both extremes of the vector width - just the start (base pointer) of it. Another step towards partial vector loads...... Differential Revision: https://reviews.llvm.org/D64205 llvm-svn: 365614	2019-07-10 10:46:36 +00:00
Simon Pilgrim	988925c127	Fix "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI. llvm-svn: 365612	2019-07-10 10:34:44 +00:00
Mikhail Maltsev	ed143c5d59	[ARM] Enable VPUSH/VPOP aliases when either MVE or VFP is present Summary: Use the same predicates as VSTMDB/VLDMIA since VPUSH/VPOP alias to these. Patch by Momchil Velikov. Reviewers: ostannard, simon_tatham, SjoerdMeijer, samparker, t.p.northover, dmgreen Reviewed By: dmgreen Subscribers: javed.absar, kristof.beyls, hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64413 llvm-svn: 365604	2019-07-10 08:59:17 +00:00
Craig Topper	50f70de557	[X86] Limit getTargetConstantFromNode to only work on NormalLoads not extending loads. This seems to fix a failure reported by Jordan Rupprecht, but we don't have a reduced test case yet. llvm-svn: 365589	2019-07-10 00:40:01 +00:00
Tom Stellard	d0ba79fe7b	AMDGPU/GlobalISel: Add support for wide loads >= 256-bits Summary: This adds support for the most commonly used wide load types: <8xi32>, <16xi32>, <4xi64>, and <8xi64> Reviewers: arsenm Reviewed By: arsenm Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57399 llvm-svn: 365586	2019-07-10 00:22:41 +00:00
Matt Arsenault	b1843e130a	GlobalISel: Implement lower for G_FCOPYSIGN In SelectionDAG AMDGPU treated these as legal, but this was mostly because the bitcasts required for FP types were painful. Theoretically the bitpattern should eventually match to bfi, so don't bother trying to get the patterns to import. llvm-svn: 365583	2019-07-09 23:34:29 +00:00
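The usual integer-only expansion of copysign, the bit pattern the message expects to end up as a bfi, can be sketched in C for f32: keep every bit of the magnitude except the sign, and take only the sign bit from the second operand. This is a sketch of the general technique, not the GlobalISel legalizer code; `copysign_bits` is a hypothetical name:

```c
#include <stdint.h>
#include <string.h>

/* copysign for f32 using only integer operations on the bit patterns. */
static float copysign_bits(float mag, float sgn) {
    uint32_t m, s;
    memcpy(&m, &mag, sizeof m); /* bitcast without aliasing problems */
    memcpy(&s, &sgn, sizeof s);
    /* Magnitude bits from mag, sign bit (bit 31) from sgn. */
    uint32_t r = (m & 0x7FFFFFFFu) | (s & 0x80000000u);
    float out;
    memcpy(&out, &r, sizeof out);
    return out;
}
```

Here `copysign_bits(3.0f, -1.0f)` evaluates to `-3.0f`. The `(m & 0x7FFFFFFF) | (s & 0x80000000)` shape is what a bitfield-insert instruction could later match.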
Craig Topper	1ae60797cd	[X86] Don't form extloads in combineExtInVec unless the load extension is legal. This should prevent doing this on pre-sse4.1 targets or for 256 bit vectors without avx2. I don't know of a failure from this. Op legalization will probably take care of it, but it seemed better to be safe. llvm-svn: 365577	2019-07-09 23:05:54 +00:00
Matt Arsenault	3f1a34546c	AMDGPU/GlobalISel: Fix legality for G_BUILD_VECTOR llvm-svn: 365575	2019-07-09 22:48:04 +00:00
Stanislav Mekhanoshin	1e9eae95af	[AMDGPU] gfx908 v_pk_fmac_f16 support Differential Revision: https://reviews.llvm.org/D64433 llvm-svn: 365573	2019-07-09 22:42:24 +00:00
Stanislav Mekhanoshin	50d7f46460	[AMDGPU] gfx908 mAI instructions, MC part Differential Revision: https://reviews.llvm.org/D64446 llvm-svn: 365563	2019-07-09 21:43:09 +00:00
Peter Collingbourne	1366262b74	hwasan: Improve precision of checks using short granule tags. A short granule is a granule of size between 1 and `TG-1` bytes. The size of a short granule is stored at the location in shadow memory where the granule's tag is normally stored, while the granule's actual tag is stored in the last byte of the granule. This means that in order to verify that a pointer tag matches a memory tag, HWASAN must check for two possibilities: * the pointer tag is equal to the memory tag in shadow memory, or * the shadow memory tag is actually a short granule size, the value being loaded is in bounds of the granule and the pointer tag is equal to the last byte of the granule. Pointer tags between 1 and `TG-1` are possible and are as likely as any other tag. This means that these tags in memory have two interpretations: the full tag interpretation (where the pointer tag is between 1 and `TG-1` and the last byte of the granule is ordinary data) and the short tag interpretation (where the pointer tag is stored in the granule). When HWASAN detects an error near a memory tag between 1 and `TG-1`, it will show both the memory tag and the last byte of the granule. Currently, it is up to the user to disambiguate the two possibilities. Because this functionality obsoletes the right aligned heap feature of the HWASAN memory allocator (and because we can no longer easily test it), the feature is removed. Also update the documentation to cover both short granule tags and outlined checks. Differential Revision: https://reviews.llvm.org/D63908 llvm-svn: 365551	2019-07-09 20:22:36 +00:00
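The two-way check described in D63908 can be written out as a plain C function. A minimal sketch, assuming a granule size `TG` of 16 bytes and an access that lies within a single granule; the function and parameter names are hypothetical and do not mirror the HWASAN runtime or the instrumentation code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TG 16 /* assumed granule size in bytes */

/* ptr_tag: tag carried in the pointer
   shadow:  byte stored in shadow memory for this granule
   granule: the TG bytes of the granule itself
   offset, access_size: position and width of the access in the granule */
static bool short_granule_check(uint8_t ptr_tag, uint8_t shadow,
                                const uint8_t granule[TG],
                                size_t offset, size_t access_size) {
    assert(access_size > 0 && offset + access_size <= TG);
    /* Possibility 1: the pointer tag equals the tag in shadow memory. */
    if (ptr_tag == shadow)
        return true;
    /* Possibility 2: the shadow byte is a short granule size (1..TG-1).
       The access must stay within the first `shadow` bytes, and the real
       tag lives in the last byte of the granule. */
    if (shadow >= 1 && shadow <= TG - 1)
        return offset + access_size <= shadow && ptr_tag == granule[TG - 1];
    return false;
}
```

For a granule whose shadow byte is 5 and whose last byte is `0xAB`, a 4-byte access at offset 0 with pointer tag `0xAB` passes, while the same access at offset 2 fails because it runs past the 5 valid bytes.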
Craig Topper	84a1f07363	[X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it Basically the problem is that X86 doesn't set the Fast flag from allowsMemoryAccess on certain CPUs due to slow unaligned memory subtarget features. This prevents bitcasts from being folded into loads and stores. But all vector loads and stores of the same width are the same cost on X86. This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it. Differential Revision: https://reviews.llvm.org/D64295 llvm-svn: 365549	2019-07-09 19:55:28 +00:00
Stanislav Mekhanoshin	9e77d0c6df	[AMDGPU] gfx908 register file changes Differential Revision: https://reviews.llvm.org/D64438 llvm-svn: 365546	2019-07-09 19:41:51 +00:00
Sean Fertile	f09d54ed2a	Boilerplate for producing XCOFF object files from the PowerPC backend. Stubs out a number of the classes needed to produce a new object file format (XCOFF) for the powerpc-aix target. For testing, the input is an empty module, which produces an object file with just a file header. Differential Revision: https://reviews.llvm.org/D61694 llvm-svn: 365541	2019-07-09 19:21:01 +00:00
Simon Pilgrim	294f37561a	[X86] LowerToHorizontalOp - use count_if to count non-UNDEF ops. NFCI. llvm-svn: 365540	2019-07-09 19:19:17 +00:00
Yonghong Song	a1b2a27a38	[BPF] Fix a typo in the file name Fixed the file name from BPFAbstrctMemberAccess.cpp to BPFAbstractMemberAccess.cpp. Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 365532	2019-07-09 18:35:46 +00:00
Stanislav Mekhanoshin	22b2c3d651	[AMDGPU] gfx908 target Differential Revision: https://reviews.llvm.org/D64429 llvm-svn: 365525	2019-07-09 18:10:06 +00:00
Christudasan Devadasan	b2d24bd540	[AMDGPU] Created a sub-register class for the return address operand in the return instruction. Function return instruction lowering currently uses the fixed register pair s[30:31] for holding the return address. It can be any SGPR pair other than the CSRs. Created an SGPR pair sub-register class exclusive of the CSRs, and used this regclass while lowering the return instruction. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D63924 llvm-svn: 365512	2019-07-09 16:48:42 +00:00
Sam Elliott	114d2db49b	[RISCV] Fix ICE in isDesirableToCommuteWithShift Summary: There was an error being thrown from isDesirableToCommuteWithShift in some tests. This was tracked down to the method being called before legalisation, with an extended value type, not a machine value type. In the case I diagnosed, the error was only hit with an instruction sequence involving `i24`s in the add and shift. `i24` is not a Machine ValueType, it is instead an Extended ValueType which was causing the issue. I have added a test to cover this case, and fixed the error in the callback. Reviewers: asb, luismarques Reviewed By: asb Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64425 llvm-svn: 365511	2019-07-09 16:24:16 +00:00
Amara Emerson	6616e269a6	[AArch64][GlobalISel] Optimize conditional branches followed by unconditional branches If we have an icmp->brcond->br sequence where the brcond just branches to the next block jumping over the br, while the br takes the false edge, then we can modify the conditional branch to jump to the br's target while inverting the condition of the incoming icmp. This means we can eliminate the br as an unconditional branch to the fallthrough block. Differential Revision: https://reviews.llvm.org/D64354 llvm-svn: 365510	2019-07-09 16:05:59 +00:00
Simon Atanasyan	e3892d84e0	[mips] Show error in case of using FP64 mode on pre MIPS32R2 CPU llvm-svn: 365508	2019-07-09 15:48:16 +00:00
Yonghong Song	d3d88d08b5	[BPF] Support for compile once and run everywhere Introduction ============ This patch added initial support for bpf program compile once and run everywhere (CO-RE). The main motivation is bpf programs which depend on kernel headers that may vary between different kernel versions. The initial discussion can be found at https://lwn.net/Articles/773198/. Currently, a bpf program accesses kernel internal data structures through the bpf_probe_read() helper. The idea is to capture the kernel data structure accesses made through bpf_probe_read() and relocate them on different kernel versions. On each host, right before the bpf program is loaded, the bpfloader will look at the types of the native linux through vmlinux BTF, calculate the proper access offset and patch the instruction. To accommodate this, three intrinsic functions preserve_{array,union,struct}_access_index are introduced, which in clang will preserve the base pointer, the struct/union/array access_index and the struct/union debuginfo type information. Later, the bpf IR pass can reconstruct the whole gep access chain without looking at the gep itself. This patch did the following: . An IR pass is added to convert preserve_*_access_index to a global variable whose name encodes the getelementptr access pattern. The global variable has metadata attached to describe the corresponding struct/union debuginfo type. . A SimplifyPatchable MachineInstruction pass is added to remove unnecessary loads. . The BTF output pass is enhanced to generate relocation records located in the .BTF.ext section. Typical CO-RE also needs support for global variables which can be assigned different values on different hosts. For example, the kernel version can be used to guard different versions of code. This patch added the support for patchable externals as well. Example ======= The following is an example. 
struct pt_regs { long arg1; long arg2; }; struct sk_buff { int i; struct net_device *dev; }; #define _(x) (__builtin_preserve_access_index(x)) static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) = (void *) 4; extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version; int bpf_prog(struct pt_regs *ctx) { struct net_device *dev = 0; // ctx->arg* does not need bpf_probe_read if (__kernel_version >= 41608) bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev)); else bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev)); return dev != 0; } In the above, we want to translate the third argument of bpf_probe_read() into relocations. -bash-4.4$ clang -target bpf -O2 -g -S trace.c The compiler will generate two new subsections in .BTF.ext, OffsetReloc and ExternReloc. OffsetReloc is to record the structure member offset operations, and ExternReloc is to record the external globals where only u8, u16, u32 and u64 are supported. BPFOffsetReloc Size struct SecOffsetReloc for ELF section #1 A number of struct BPFOffsetReloc for ELF section #1 struct SecOffsetReloc for ELF section #2 A number of struct BPFOffsetReloc for ELF section #2 ... BPFExternReloc Size struct SecExternReloc for ELF section #1 A number of struct BPFExternReloc for ELF section #1 struct SecExternReloc for ELF section #2 A number of struct BPFExternReloc for ELF section #2 struct BPFOffsetReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t TypeID; ///< TypeID for the relocation uint32_t OffsetNameOff; ///< The string to traverse types }; struct BPFExternReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t ExternNameOff; ///< The string for external variable }; Note that only externs with attribute section ".BPF.patchable_externs" are considered for Extern Reloc, which will be patched by the bpf loader right before the load. 
For the above test case, two offset records and one extern record will be generated: OffsetReloc records: .long .Ltmp12 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String .long .Ltmp18 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String ExternReloc record: .long .Ltmp5 # Insn Offset .long 165 # External Variable In string table: .ascii "0:1" # string offset=242 .ascii "__kernel_version" # string offset=165 The default member offset can be calculated as the 2nd member offset (0 representing the 1st member) of struct "sk_buff". The asm code: .Ltmp5: .Ltmp6: r2 = 0 r3 = 41608 .Ltmp7: .Ltmp8: .loc 1 18 9 is_stmt 0 # t.c:18:9 .Ltmp9: if r3 > r2 goto LBB0_2 .Ltmp10: .Ltmp11: .loc 1 0 9 # t.c:0:9 .Ltmp12: r2 = 8 .Ltmp13: .loc 1 19 66 is_stmt 1 # t.c:19:66 .Ltmp14: .Ltmp15: r3 = *(u64 *)(r1 + 0) goto LBB0_3 .Ltmp16: .Ltmp17: LBB0_2: .loc 1 0 66 is_stmt 0 # t.c:0:66 .Ltmp18: r2 = 8 .loc 1 21 66 is_stmt 1 # t.c:21:66 .Ltmp19: r3 = *(u64 *)(r1 + 8) .Ltmp20: .Ltmp21: LBB0_3: .loc 1 0 66 is_stmt 0 # t.c:0:66 r3 += r2 r1 = r10 .Ltmp22: .Ltmp23: .Ltmp24: r1 += -8 r2 = 8 call 4 For instructions .Ltmp12 and .Ltmp18, "r2 = 8", the number 8 is the structure offset based on the current BTF. The loader needs to adjust it if the offset differs on the host. For instruction .Ltmp5, "r2 = 0", the external variable gets a default value of 0; the loader needs to supply the appropriate value for the particular host. 
Compiling to generate object code and disassemble: 0000000000000000 bpf_prog: 0: b7 02 00 00 00 00 00 00 r2 = 0 1: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2 2: b7 02 00 00 00 00 00 00 r2 = 0 3: b7 03 00 00 88 a2 00 00 r3 = 41608 4: 2d 23 03 00 00 00 00 00 if r3 > r2 goto +3 <LBB0_2> 5: b7 02 00 00 08 00 00 00 r2 = 8 6: 79 13 00 00 00 00 00 00 r3 = *(u64 *)(r1 + 0) 7: 05 00 02 00 00 00 00 00 goto +2 <LBB0_3> 0000000000000040 LBB0_2: 8: b7 02 00 00 08 00 00 00 r2 = 8 9: 79 13 08 00 00 00 00 00 r3 = *(u64 *)(r1 + 8) 0000000000000050 LBB0_3: 10: 0f 23 00 00 00 00 00 00 r3 += r2 11: bf a1 00 00 00 00 00 00 r1 = r10 12: 07 01 00 00 f8 ff ff ff r1 += -8 13: b7 02 00 00 08 00 00 00 r2 = 8 14: 85 00 00 00 04 00 00 00 call 4 Instructions #2, #5 and #8 need relocation resolutions from the loader. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61524 llvm-svn: 365503	2019-07-09 15:28:41 +00:00
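How a loader might turn an access string such as "0:1" into a byte offset can be sketched in C. The commit does not describe the loader's algorithm; this sketch assumes the leading index applies to the base pointer as an array index and the second index selects a member, ignores nested types, and uses invented layouts (including a "host" variant where `dev` moved) purely for illustration. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flattened layout of one struct: the byte offset of each
 * member, as a loader might compute it from the host's vmlinux BTF. */
struct struct_layout {
    uint32_t size;            /* sizeof the struct */
    const uint32_t *offsets;  /* offsets[i] = byte offset of member i */
    uint32_t nmembers;
};

/* Example layouts for `struct sk_buff { int i; struct net_device *dev; }`
 * on a 64-bit target, plus an invented "host" variant where dev moved. */
static const uint32_t skb_build_offsets[] = {0, 8};
static const struct struct_layout skb_build = {16, skb_build_offsets, 2};
static const uint32_t skb_host_offsets[] = {0, 16};
static const struct struct_layout skb_host = {24, skb_host_offsets, 2};

/* Resolve an access string such as "0:1": the first index is applied to the
 * base pointer as an array index (ptr[0]); an optional second index selects
 * a member. Nested access paths are not modelled in this sketch.
 * Returns the byte offset, or -1 for a malformed or out-of-range string. */
static long resolve_access(const char *spec, const struct struct_layout *l) {
    char *end;
    unsigned long idx = strtoul(spec, &end, 10);
    if (end == spec)
        return -1;
    long off = (long)(idx * l->size);
    if (*end == '\0')
        return off;
    if (*end != ':')
        return -1;
    const char *p = end + 1;
    unsigned long member = strtoul(p, &end, 10);
    if (end == p || *end != '\0' || member >= l->nmembers)
        return -1;
    return off + (long)l->offsets[member];
}
```

With the build-time layout, "0:1" resolves to 8, matching the `r2 = 8` immediates in the listing; against the invented host layout the same string resolves to 16, which is the kind of value a loader would patch in.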
Petar Avramovic	be20e36107	[MIPS GlobalISel] Register bank select for G_PHI. Select i64 phi Select gprb or fprb when def/use register operand of G_PHI is used/defined by either: copy to/from physical register or instruction with only one mapping available for that use/def operand. Integer s64 phi is handled with narrowScalar when mapping is applied, produced artifacts are combined away. Manually set gprb to all register operands of instructions created during narrowScalar. Differential Revision: https://reviews.llvm.org/D64351 llvm-svn: 365494	2019-07-09 14:36:17 +00:00
Petar Avramovic	dbb6d01d34	[MIPS GlobalISel] Regbanks for G_SELECT. Select i64, f32 and f64 select Select gprb or fprb when def/use register operand of G_SELECT is used/defined by either: copy to/from physical register or instruction with only one mapping available for that use/def operand. Integer s64 select is handled with narrowScalar when mapping is applied, produced artifacts are combined away. Manually set gprb to all register operands of instructions created during narrowScalar. For selection of floating point s32 or s64 select it is enough to set fprb of appropriate size and selectImpl will do the rest. Differential Revision: https://reviews.llvm.org/D64350 llvm-svn: 365492	2019-07-09 14:30:29 +00:00
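The G_SELECT commit says an integer s64 select is handled with narrowScalar on 32-bit MIPS. As a rough C model of what that means semantically (not the GlobalISel code itself), the 64-bit operands are split into 32-bit halves, each half is selected with the same condition, and the halves are merged back. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 64-bit select performed with 32-bit values only:
 * unmerge both operands into lo/hi halves, select each half with the
 * same condition, then merge the two 32-bit results. */
static uint64_t select64_via_32(int cond, uint64_t a, uint64_t b) {
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);
    uint32_t lo = cond ? a_lo : b_lo;   /* first s32 select */
    uint32_t hi = cond ? a_hi : b_hi;   /* second s32 select */
    return ((uint64_t)hi << 32) | lo;   /* merge */
}
```

Because both halves share one condition, the split is always equivalent to the 64-bit select; only integer registers are needed, which is consistent with the commit assigning gprb to the narrowed operands.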
Matt Arsenault	4dd5755d01	AMDGPU/GlobalISel: Legalize more concat_vectors llvm-svn: 365488	2019-07-09 14:17:31 +00:00
Matt Arsenault	6bdb92d833	AMDGPU/GlobalISel: Improve regbankselect for icmp s16 Account for 64-bit scalar eq/ne when available. llvm-svn: 365487	2019-07-09 14:13:09 +00:00
Matt Arsenault	8b8eee5904	AMDGPU/GlobalISel: Make s16 G_ICMP legal llvm-svn: 365486	2019-07-09 14:10:43 +00:00
Matt Arsenault	e6d10f97dd	AMDGPU/GlobalISel: Select G_SUB llvm-svn: 365484	2019-07-09 14:05:11 +00:00
Matt Arsenault	872f38be7e	AMDGPU/GlobalISel: Select G_UNMERGE_VALUES llvm-svn: 365483	2019-07-09 14:02:26 +00:00
Matt Arsenault	9b7ffc4e55	AMDGPU/GlobalISel: Select G_MERGE_VALUES llvm-svn: 365482	2019-07-09 14:02:20 +00:00
Simon Atanasyan	2fa6b54635	[mips] Implement sge/sgeu pseudo instructions The `sge/sgeu Dst, Src1, Src2/Imm` pseudo instructions set register `Dst` to 1 if register `Src1` is greater than or equal to `Src2/Imm` and to 0 otherwise. Differential Revision: https://reviews.llvm.org/D64314 llvm-svn: 365476	2019-07-09 12:55:55 +00:00
Simon Atanasyan	00df4d92ed	[mips] Implement sgt/sgtu pseudo instructions with immediate operand The `sgt/sgtu Dst, Src1, Src2/Imm` pseudo instructions set register `Dst` to 1 if register `Src1` is greater than `Src2/Imm` and to 0 otherwise. Differential Revision: https://reviews.llvm.org/D64313 llvm-svn: 365475	2019-07-09 12:55:42 +00:00
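The reference semantics of the sge/sgeu and sgt/sgtu pseudo instructions can be modelled in C on 32-bit register values. The slt-based expansions at the end are an assumption for illustration only; neither commit message shows the instruction sequence the assembler actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics: Dst = 1 if the comparison holds, 0 otherwise.
 * The 'u' variants compare the registers as unsigned values. */
static int32_t sge(int32_t s1, int32_t s2)    { return s1 >= s2; }
static int32_t sgeu(uint32_t s1, uint32_t s2) { return s1 >= s2; }
static int32_t sgt(int32_t s1, int32_t s2)    { return s1 > s2; }
static int32_t sgtu(uint32_t s1, uint32_t s2) { return s1 > s2; }

/* One plausible expansion using only set-on-less-than (assumed):
 *   sge Dst, A, B  ->  slt Dst, A, B ; xori Dst, Dst, 1
 *   sgt Dst, A, B  ->  slt Dst, B, A          (operands swapped) */
static int32_t slt(int32_t a, int32_t b) { return a < b; }
static int32_t sge_expanded(int32_t a, int32_t b) { return slt(a, b) ^ 1; }
static int32_t sgt_expanded(int32_t a, int32_t b) { return slt(b, a); }
```

The signed and unsigned forms differ only for operands whose top bit is set, e.g. `0x80000000` is the smallest signed value but a large unsigned one.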
Djordje Todorovic	01eaae6dd1	[DwarfDebug] Dump call site debug info Dump the DWARF information about call sites and call site parameters into debug info sections. The patch also provides an interface for the interpretation of instructions that could load values of a call site parameters in order to generate DWARF about the call site parameters. ([13/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D60716 llvm-svn: 365467	2019-07-09 11:33:56 +00:00
Alex Bradbury	e0831dac0c	[RISCV] Fix RISCVTTIImpl::getIntImmCost for immediates where getMinSignedBits() > 64 APInt::getSExtValue will assert if getMinSignedBits() > 64. This can happen, for instance, if examining an i128. Avoid this assertion by checking Imm.getMinSignedBits() <= 64 before doing getTLI()->isLegalAddImmediate(Imm.getSExtValue()). We could directly check getMinSignedBits() <= 12 but it seems better to reuse the isLegalAddImmediate helper for this. Differential Revision: https://reviews.llvm.org/D64390 llvm-svn: 365462	2019-07-09 10:56:18 +00:00
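The overflow guard can be sketched in C. This sketch assumes the GCC/Clang `__int128` extension as a stand-in for an i128 `APInt`, and models `isLegalAddImmediate` with the 12-bit signed immediate range of RISC-V `addi`; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* GCC/Clang extension used as a stand-in for an i128 APInt. */
typedef __int128 i128;

/* Bits needed to hold v in two's complement (what getMinSignedBits reports). */
static unsigned min_signed_bits(i128 v) {
    unsigned __int128 u = (unsigned __int128)(v < 0 ? ~v : v);
    unsigned active = 0;
    while (u) { ++active; u >>= 1; }
    return active + 1;  /* plus one sign bit */
}

/* RISC-V ADDI takes a 12-bit signed immediate: [-2048, 2047]. */
static bool is_legal_add_imm(long long imm) { return imm >= -2048 && imm <= 2047; }

/* Only narrow to 64 bits when the value fits; wider values are never
 * cheap add immediates, so no sign-extension to 64 bits is attempted. */
static bool cheap_add_imm(i128 imm) {
    if (min_signed_bits(imm) > 64)
        return false;
    return is_legal_add_imm((long long)imm);
}
```

For example, `(i128)1 << 100` needs 102 signed bits, so it is rejected before any conversion to a 64-bit value is attempted.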
Kai Luo	619e39bc72	[NFC][PowerPC] Fixed unused variable 'NewInstr'. llvm-svn: 365433	2019-07-09 03:33:04 +00:00
Stanislav Mekhanoshin	c776dc0b60	[AMDGPU] Added td definitions for HW regs Infrastructure work for future commit. NFC. Differential Revision: https://reviews.llvm.org/D64370 llvm-svn: 365432	2019-07-09 03:20:33 +00:00
Stanislav Mekhanoshin	818d748a45	[AMDGPU] Always use s_memtime for readcyclecounter Differential Revision: https://reviews.llvm.org/D64369 llvm-svn: 365431	2019-07-09 03:10:18 +00:00
Kai Luo	1931ed73c3	[PowerPC][Peephole] Combine extsw and sldi after instruction selection Summary: `extsw` and `sldi` are supposed to be combined if they are in the same BB in instruction selection phase. This patch handles the case where extsw and sldi are not in the same BB. Differential Revision: https://reviews.llvm.org/D63806 llvm-svn: 365430	2019-07-09 02:55:08 +00:00
Chen Zheng	25ab27e6ef	[PowerPC][NFC] remove redundant function isVFReg(). llvm-svn: 365429	2019-07-09 02:48:30 +00:00
Heejin Ahn	947bfe73fc	[WebAssembly] Make sret parameter work with AddMissingPrototypes Summary: Even with functions with `no-prototype` attribute, there can be an argument `sret` (structure return) attribute, which is an optimization when a function return type is a struct. Fixes PR42420. Reviewers: sbc100 Subscribers: dschuff, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64318 llvm-svn: 365426	2019-07-09 02:10:33 +00:00
Jessica Paquette	55d19247ef	[AArch64][GlobalISel] Use TST for comparisons when possible Porting over the part of `emitComparison` in AArch64ISelLowering where we use TST to represent a compare. - Rename `tryOptCMN` to `tryFoldIntegerCompare`, since it now also emits TSTs when possible. - Add a utility function for emitting a TST with register operands. - Rename opt-fold-cmn.mir to opt-fold-compare.mir, since it now also tests the TST fold as well. Differential Revision: https://reviews.llvm.org/D64371 llvm-svn: 365404	2019-07-08 22:58:36 +00:00
Matt Arsenault	9e7cbc0e7d	AMDGPU: Split extload/zextload local load patterns This will help removing the custom load predicates, allowing the global isel emitter to handle them. llvm-svn: 365398	2019-07-08 22:08:23 +00:00
Bill Wendling	c8933c4070	Add parentheses to silence warning. llvm-svn: 365394	2019-07-08 22:00:33 +00:00
Reid Kleckner	2f07c2e9d9	Standardize on MSVC behavior for triples with no environment Summary: This makes it so that IR files using triples without an environment work out of the box, without normalizing them. Typically, the MSVC behavior is more desirable. For example, it tends to enable things like constant merging, use of associative comdats, etc. Addresses PR42491 Reviewers: compnerd Subscribers: hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64109 llvm-svn: 365387	2019-07-08 21:05:20 +00:00
Matt Arsenault	8561844321	AMDGPU: Fix unused variable in release build llvm-svn: 365378	2019-07-08 19:47:42 +00:00
Matt Arsenault	acc9e1e4c2	AMDGPU: Fix stray typing llvm-svn: 365373	2019-07-08 19:05:19 +00:00
Matt Arsenault	71dfb7ec5c	AMDGPU: Make s34 the FP register Make the FP register callee saved. This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame register used throughout the rest of the function. I don't like how this bypasses the standard mechanism for CSR spills just to get the correct insert point. I may look for a better solution, since all CSR VGPRs may also need to have all lanes activated. Another option might be to make getFrameIndexReference change the base register if the frame index is a CSR, and then try to figure out the right insertion point in emitProlog. If there is a free VGPR lane available for SGPR spilling, try to use it for the FP. If that would require introducing a new VGPR spill, try to use a free call clobbered SGPR. Only fall back to introducing a new VGPR spill as a last resort. This also doesn't attempt to handle SGPR spilling with scalar stores. llvm-svn: 365372	2019-07-08 19:03:38 +00:00
Matt Arsenault	5e643036cb	AMDGPU: Move DEBUG_TYPE definition below includes llvm-svn: 365369	2019-07-08 18:48:39 +00:00
Wouter van Oortmerssen	81db9f543c	[WebAssembly] tablegen: distinguish float/int immediate operands. Summary: Before, they were one category of operands which could cause crashes in non-sensical combinations, e.g. "f32.const symbol". Now these are forced to be an error. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64039 llvm-svn: 365351	2019-07-08 16:58:37 +00:00
Matt Arsenault	224d8cd987	AMDGPU: Remove mubuf specific PatFrags These are identical to the *_global PatFrag, and will only create more work to get the GlobalISel importer to handle them. llvm-svn: 365350	2019-07-08 16:53:53 +00:00
Matt Arsenault	430b0497e7	AMDGPU: Move waitcnt intrinsic to instruction definition pattern llvm-svn: 365349	2019-07-08 16:53:48 +00:00
Matt Arsenault	079f77b590	GlobalISel: Convert some build functions to using SrcOp/DstOp llvm-svn: 365343	2019-07-08 16:27:47 +00:00
Simon Pilgrim	e1a9b49d6b	[X86] ISD::INSERT_SUBVECTOR - use uint64_t index. NFCI. Keep the uint64_t type from getConstantOperandVal to stop truncation/extension overflow warnings in MSVC in subvector index math. llvm-svn: 365328	2019-07-08 14:52:56 +00:00
Petar Avramovic	aa699b20a0	[MIPS GlobalISel] Register bank select for G_LOAD. Select i64 load Select gprb or fprb when loaded value is used by either: copy to physical register or instruction with only one mapping available for that use operand. Load of integer s64 is handled with narrowScalar when mapping is applied, produced artifacts are combined away. Manually set gprb to all register operands of instructions created during narrowScalar. Differential Revision: https://reviews.llvm.org/D64269 llvm-svn: 365323	2019-07-08 14:45:52 +00:00
Petar Avramovic	ec575f6e3e	[MIPS GlobalISel] Register bank select for G_STORE. Select i64 store Select gprb or fprb when stored value is defined by either: copy from physical register or instruction with only one mapping available for that def operand. Store of integer s64 is handled with narrowScalar when mapping is applied, produced artifacts are combined away. Manually set gprb to all register operands of instructions created during narrowScalar. Differential Revision: https://reviews.llvm.org/D64268 llvm-svn: 365322	2019-07-08 14:36:36 +00:00
Dmitry Preobrazhensky	2eff0318c6	[AMDGPU][MC] Corrected parsing of FLAT offset modifier Summary of changes: - simplified handling of FLAT offset: offset_s13 and offset_u12 have been replaced with flat_offset; - provided information about error position for pre-gfx9 targets; - improved errors handling. Reviewers: artem.tamazov, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D64244 llvm-svn: 365321	2019-07-08 14:27:37 +00:00
Mikhail Maltsev	ee81051fc9	[ARM] Relax constraints on operands of VQxDMLxDH instructions Summary: According to a recently updated Armv8-M spec (https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf) the 32-bit width versions of the following instructions: * VQDMLADH * VQDMLADHX * VQRDMLADH * VQRDMLADHX * VQDMLSDH * VQDMLSDHX * VQRDMLSDH * VQRDMLSDHX are no longer unpredictable when their output register is the same as one of the input registers. This patch updates the assembler parser and the corresponding tests and also removes @earlyclobber from the instruction constraints. Reviewers: simon_tatham, ostannard, dmgreen, SjoerdMeijer, samparker Reviewed By: simon_tatham Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64250 llvm-svn: 365306	2019-07-08 09:44:52 +00:00
Alex Bradbury	0b9addb8c0	[RISCV] Specify registers used in DWARF exception handling Defines RISCV registers for getExceptionPointerRegister() and getExceptionSelectorRegister(). Differential Revision: https://reviews.llvm.org/D63411 Patch by Edward Jones. Modified by Alex Bradbury to add CHECK lines to exception-pointer-register.ll. llvm-svn: 365301	2019-07-08 09:16:47 +00:00
Fangrui Song	7d63be09b6	[ARM] Fix null pointer dereference in CodeGen/ARM/Windows/stack-protector-msvc.ll.test after D64292/r365283 CLI.CS may not be set. llvm-svn: 365299	2019-07-08 08:43:31 +00:00
Jay Foad	38902350ef	[AMDGPU] Use a named predicate instead of a magic number. Reviewers: arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64201 llvm-svn: 365294	2019-07-08 07:04:58 +00:00
Craig Topper	1deca50ab1	[X86] Allow execution domain fixing to turn SHUFPD into SHUFPS. This can help with code size on SSE targets where SHUFPD requires a 0x66 prefix and SHUFPS doesn't. llvm-svn: 365293	2019-07-08 06:52:49 +00:00
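Because each 64-bit element is just two adjacent 32-bit lanes, every SHUFPD immediate has a SHUFPS equivalent. The C sketch below uses hypothetical helper names, covers register forms only, and makes no claim to match LLVM's execution-domain code; it derives the SHUFPS immediate and checks that both shuffles agree.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 64-bit element i of a vector lives in 32-bit lanes 2*i and 2*i+1. */
static void shufpd_model(uint32_t d[4], const uint32_t a[4], const uint32_t b[4], unsigned imm) {
    unsigned i = imm & 1, j = (imm >> 1) & 1;
    uint32_t r[4] = { a[2 * i], a[2 * i + 1], b[2 * j], b[2 * j + 1] };
    memcpy(d, r, sizeof r);
}

/* SHUFPS: two lanes from a, two lanes from b, 2 selector bits each. */
static void shufps_model(uint32_t d[4], const uint32_t a[4], const uint32_t b[4], unsigned imm) {
    uint32_t r[4] = { a[imm & 3], a[(imm >> 2) & 3], b[(imm >> 4) & 3], b[(imm >> 6) & 3] };
    memcpy(d, r, sizeof r);
}

/* Rewrite a SHUFPD immediate as the equivalent SHUFPS immediate. */
static unsigned shufpd_to_shufps_imm(unsigned imm) {
    unsigned i = imm & 1, j = (imm >> 1) & 1;
    return (2 * i) | ((2 * i + 1) << 2) | ((2 * j) << 4) | ((2 * j + 1) << 6);
}

/* Returns 1 if all four SHUFPD immediates produce identical results. */
static int shuffles_agree(void) {
    const uint32_t a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8};
    for (unsigned imm = 0; imm < 4; ++imm) {
        uint32_t x[4], y[4];
        shufpd_model(x, a, b, imm);
        shufps_model(y, a, b, shufpd_to_shufps_imm(imm));
        if (memcmp(x, y, sizeof x) != 0)
            return 0;
    }
    return 1;
}
```

The four SHUFPD immediates 0..3 map to SHUFPS immediates 0x44, 0x4E, 0xE4 and 0xEE.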
Craig Topper	d8261f0288	[X86] Make movsd commutable to shufpd with a 0x02 immediate on pre-SSE4.1 targets. This can help avoid a copy or enable load folding. On SSE4.1 targets we can commute it to blendi instead. I had to make shufpd with a 0x02 immediate commutable as well since we expect commuting to be reversible. llvm-svn: 365292	2019-07-08 06:52:43 +00:00
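A small element-level model shows why the register form of MOVSD can be commuted into SHUFPD with immediate 0x02, and why the rewrite is reversible. The types and function names are illustrative only; element payloads are treated as raw bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two 64-bit elements of an XMM register, treated as raw bits. */
typedef struct { uint64_t e[2]; } v2;

/* MOVSD xmm_dst, xmm_src (register form): low element from src,
 * high element of dst preserved. */
static v2 movsd_rr(v2 dst, v2 src) {
    v2 r = { { src.e[0], dst.e[1] } };
    return r;
}

/* SHUFPD a, b, imm: r[0] = a[imm bit 0], r[1] = b[imm bit 1]. */
static v2 shufpd(v2 a, v2 b, unsigned imm) {
    v2 r = { { a.e[imm & 1], b.e[(imm >> 1) & 1] } };
    return r;
}

/* MOVSD dst, src equals SHUFPD src, dst, 0x02: same result with the
 * operand order swapped, which is what makes the commute possible. */
static bool movsd_matches_commuted_shufpd(v2 dst, v2 src) {
    v2 x = movsd_rr(dst, src), y = shufpd(src, dst, 0x02);
    return x.e[0] == y.e[0] && x.e[1] == y.e[1];
}
```

Swapping which operand is tied to the destination is what can save a register copy or expose the other operand to load folding.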
Alex Bradbury	e1e036a33b	[RISCV] Support z and i operand modifiers Differential Revision: https://reviews.llvm.org/D57792 Patch by James Clarke. llvm-svn: 365291	2019-07-08 05:00:26 +00:00
Craig Topper	46f2b583a2	[X86] Add MOVSDrr->MOVLPDrm entry to load folding table. Add custom handling to turn UNPCKLPDrr->MOVHPDrm when load is under aligned. If the load is aligned we can turn UNPCKLPDrr into UNPCKLPDrm. llvm-svn: 365287	2019-07-08 02:10:20 +00:00
Martin Storsjo	8d9d290d4c	[ARM] Add support for MSVC stack cookie checking Heavily based on the same for AArch64, from SVN r346469. Differential Revision: https://reviews.llvm.org/D64292 llvm-svn: 365283	2019-07-07 18:57:31 +00:00
Craig Topper	ac744d5a86	[X86] Make sure load isn't volatile before shrinking it in MOVDDUP isel patterns. llvm-svn: 365275	2019-07-07 05:33:20 +00:00
Simon Pilgrim	a7145c45a7	[X86] SimplifyDemandedVectorEltsForTargetNode - fix shadow variable warning. NFCI. Fixes cppcheck warning. llvm-svn: 365271	2019-07-06 18:46:09 +00:00
Simon Pilgrim	01f1bad618	[X86] LowerBuildVectorv16i8 - pull out repeated getOperand() call. NFCI. llvm-svn: 365270	2019-07-06 18:33:29 +00:00
Craig Topper	317d6093df	[X86] Remove patterns from MOVLPSmr and MOVHPSmr instructions. These patterns are the same as the MOVLPDmr and MOVHPDmr patterns, but with a bitcast at the end. We can just select the PD instruction and let execution domain fixing switch to PS. llvm-svn: 365267	2019-07-06 17:59:51 +00:00
Craig Topper	913105ca42	[X86] Add patterns to select MOVLPDrm from MOVSD+load and MOVHPD from UNPCKL+load. These narrow the load so we can only do it if the load isn't volatile. There are also tests in vector-shuffle-128-v4.ll that this should support, but we don't seem to fold bitcast+load on pre-sse4.2 targets due to the slow unaligned mem 16 flag. llvm-svn: 365266	2019-07-06 17:59:45 +00:00
Matt Arsenault	5e9610a3f5	AMDGPU: Fix assert in clang test llvm-svn: 365245	2019-07-05 21:09:53 +00:00
Nikita Popov	a2a09cb606	[SystemZ] Fix addcarry of usubo (PR42512) Only custom lower uaddo+addcarry or usubo+subcarry chains and leave mixtures like usubo+addcarry or uaddo+subcarry to the generic legalizer. Otherwise we run into issues because SystemZ uses different CC values for carries and borrows. Fixes https://bugs.llvm.org/show_bug.cgi?id=42512. Differential Revision: https://reviews.llvm.org/D64213 llvm-svn: 365242	2019-07-05 20:35:11 +00:00
Matt Arsenault	e7e23e3e91	AMDGPU: Make AMDGPUPerfHintAnalysis an SCC pass Add a string attribute instead of directly setting MachineFunctionInfo. This avoids trying to get the analysis in the MachineFunctionInfo in a way that doesn't work with the new pass manager. This will also avoid re-visiting the call graph for every single function. llvm-svn: 365241	2019-07-05 20:26:13 +00:00
Michael Liao	8d6ea2d48c	[CodeGen] Enhance `MachineInstrSpan` to allow the end of MBB to be used. Summary: - Explicitly specify the parent MBB to allow the end iterator to be used. Reviewers: aprantl, MatzeB, craig.topper, qcolombet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64261 llvm-svn: 365240	2019-07-05 20:23:59 +00:00
Benjamin Kramer	05eebaa949	[PowerPC] Fold another unused variable into assertion. NFC. llvm-svn: 365237	2019-07-05 19:58:39 +00:00
Benjamin Kramer	31f6b13e83	[PowerPC] Fold variable into assert. NFC. Avoids a warning in Release builds. llvm-svn: 365236	2019-07-05 19:46:48 +00:00
Benjamin Kramer	049230b4d2	[PowerPC] Remove unused variable. NFC. llvm-svn: 365235	2019-07-05 19:28:02 +00:00
Craig Topper	d22b2d01ca	[X86] Correct the size check in foldMemoryOperandCustom. The Size either needs to be 0, meaning we aren't folding a stack reload, or the stack slot needs to be at least 16 bytes. I've also added a paranoia check to ensure the RCSize is at least 16 bytes as well. This avoids any FR32/FR64 surprises, but I think we already filtered those earlier. All of our test cases have Size as either 0 or 16 and RCSize == 16. So the Size <= 16 check worked for those cases. llvm-svn: 365234	2019-07-05 18:54:00 +00:00
Nemanja Ivanovic	6c9a392c8e	[PowerPC] Move TOC save to prologue when profitable The indirect call sequence on PPC requires that the TOC base register be saved prior to the indirect call and restored after the call since the indirect call may branch to a global entry point in another DSO which will update the TOC base. Over the last couple of years, we have improved this to: - be able to hoist TOC saves from loops (with changes to MachineLICM) - avoid multiple saves when one dominates the other[s] However, it is still possible to have multiple TOC saves dynamically in the execution path if there is no dominance relationship between them. This patch moves the TOC save to the prologue when one of the TOC saves is in a block that post-dominates entry (i.e. it cannot be avoided) or if it is in a block that is hotter than entry. Differential revision: https://reviews.llvm.org/D63803 llvm-svn: 365232	2019-07-05 18:38:09 +00:00
Craig Topper	6e6d229e5e	[X86] Update SSE1 MOVLPSrm and MOVHPSrm isel patterns to ensure loads are non-volatile before folding. These patterns use 128-bit loads, but the instructions only load 64 bits. We shouldn't narrow the load if it's volatile. Fixes another variant of PR42079 llvm-svn: 365225	2019-07-05 17:31:29 +00:00
Craig Topper	8a93952a5c	[X86] Remove unnecessary isel pattern for MOVLPSmr. This was identical to a pattern for MOVPQI2QImr with a bitcast as an input. But we should be able to turn MOVPQI2QImr into MOVLPSmr in the execution domain fixup pass so we shouldn't need this. llvm-svn: 365224	2019-07-05 17:31:25 +00:00
Christudasan Devadasan	652ad423bb	[NFC] A test commit to check the access permission. Removed a blank line. llvm-svn: 365223	2019-07-05 17:07:42 +00:00
Yaxun Liu	a62413526d	[AMDGPU] Added a new metadata for multi grid sync implicit argument Patch by Christudasan Devadasan. Differential Revision: https://reviews.llvm.org/D63886 llvm-svn: 365217	2019-07-05 16:05:17 +00:00
David Green	47afdaa487	[ARM] MVE patterns for VMVN, VORR and VBIC This add simple Q register forms of bitwise not instructions. Differential Revision: https://reviews.llvm.org/D63983 llvm-svn: 365214	2019-07-05 15:21:29 +00:00
Jay Foad	7e0c10b55f	[AMDGPU] DPP combiner: recognize identities for more opcodes Summary: This allows the DPP combiner to kick in more often. For example the exclusive scan generated by the atomic optimizer for a divergent atomic add used to look like this: v_mov_b32_e32 v3, v1 v_mov_b32_e32 v5, v1 v_mov_b32_e32 v6, v1 v_mov_b32_dpp v3, v2 wave_shr:1 row_mask:0xf bank_mask:0xf s_nop 1 v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf v_mov_b32_dpp v6, v3 row_shr:3 row_mask:0xf bank_mask:0xf v_add3_u32 v3, v4, v5, v6 v_mov_b32_e32 v4, v1 s_nop 1 v_mov_b32_dpp v4, v3 row_shr:4 row_mask:0xf bank_mask:0xe v_add_u32_e32 v3, v3, v4 v_mov_b32_e32 v4, v1 s_nop 1 v_mov_b32_dpp v4, v3 row_shr:8 row_mask:0xf bank_mask:0xc v_add_u32_e32 v3, v3, v4 v_mov_b32_e32 v4, v1 s_nop 1 v_mov_b32_dpp v4, v3 row_bcast:15 row_mask:0xa bank_mask:0xf v_add_u32_e32 v3, v3, v4 s_nop 1 v_mov_b32_dpp v1, v3 row_bcast:31 row_mask:0xc bank_mask:0xf v_add_u32_e32 v1, v3, v1 v_add_u32_e32 v1, v2, v1 v_readlane_b32 s0, v1, 63 But now most of the dpp movs are combined into adds: v_mov_b32_e32 v3, v1 v_mov_b32_e32 v5, v1 s_nop 0 v_mov_b32_dpp v3, v2 wave_shr:1 row_mask:0xf bank_mask:0xf s_nop 1 v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf v_mov_b32_dpp v1, v3 row_shr:3 row_mask:0xf bank_mask:0xf v_add3_u32 v1, v4, v5, v1 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf v_add_u32_e32 v1, v2, v1 v_readlane_b32 s0, v1, 63 Reviewers: arsenm, vpykhtin Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits Tags: #llvm 
Differential Revision: https://reviews.llvm.org/D64207 llvm-svn: 365211	2019-07-05 14:52:48 +00:00
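The commit does not list which opcodes gained identities. As a hedged illustration, the sketch below gives the standard identity element for a hypothetical set of 32-bit lane operations. In this simplified model, a lane that reads the identity in place of a missing DPP operand passes its other operand through unchanged, which is the property that lets a DPP mov be folded into the following op.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A hypothetical set of 32-bit lane operations. */
typedef enum { OP_ADD, OP_OR, OP_XOR, OP_AND, OP_MUL,
               OP_UMIN, OP_UMAX, OP_SMIN, OP_SMAX } op_kind;

static uint32_t apply(op_kind k, uint32_t a, uint32_t b) {
    int32_t sa = (int32_t)a, sb = (int32_t)b;
    switch (k) {
    case OP_ADD:  return a + b;
    case OP_OR:   return a | b;
    case OP_XOR:  return a ^ b;
    case OP_AND:  return a & b;
    case OP_MUL:  return a * b;
    case OP_UMIN: return a < b ? a : b;
    case OP_UMAX: return a > b ? a : b;
    case OP_SMIN: return (uint32_t)(sa < sb ? sa : sb);
    case OP_SMAX: return (uint32_t)(sa > sb ? sa : sb);
    }
    return 0;
}

/* Identity e such that apply(k, x, e) == x for every 32-bit x. */
static uint32_t identity(op_kind k) {
    switch (k) {
    case OP_AND: case OP_UMIN: return 0xFFFFFFFFu;
    case OP_MUL:  return 1;
    case OP_SMIN: return 0x7FFFFFFFu;  /* INT32_MAX */
    case OP_SMAX: return 0x80000000u;  /* INT32_MIN */
    default:      return 0;            /* ADD, OR, XOR, UMAX */
    }
}

/* True if combining x with the identity leaves x unchanged. */
static bool identity_holds(op_kind k, uint32_t x) {
    return apply(k, x, identity(k)) == x;
}
```

Note that 0 is the identity only for some of these operations; for unsigned min it would clobber the lane, which is why a per-opcode table is needed.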
Robert Lougher	9dcfbbae76	This reverts r365061 and r365062 (test update) Revision r365061 changed a skip of debug instructions for a skip of meta instructions. This is not safe, as IMPLICIT_DEF is classed as a meta instruction. llvm-svn: 365202	2019-07-05 12:42:06 +00:00
Sam Elliott	b2c9eed0d7	[RISCV] Support @llvm.readcyclecounter() Intrinsic On RISC-V, the `cycle` CSR holds a 64-bit count of the number of clock cycles executed by the core, from an arbitrary point in the past. This matches the intended semantics of `@llvm.readcyclecounter()`, which we currently leave to the default lowering (to the constant 0). With this patch, we will now correctly lower this intrinsic to the intended semantics, using the user-space instruction `rdcycle`. On 64-bit targets, we can directly lower to this instruction. On 32-bit targets, we need to do more, as `rdcycle` only returns the low 32-bits of the `cycle` CSR. In this case, we perform a custom lowering, based on the PowerPC lowering, using `rdcycleh` to obtain the high 32-bits of the `cycle` CSR. This custom lowering inserts a new basic block which detects overflow in the high 32-bits of the `cycle` CSR during reading (because multiple instructions are required to read). The emitted assembly matches the suggested assembly in the RISC-V specification. Differential Revision: https://reviews.llvm.org/D64125 llvm-svn: 365201	2019-07-05 12:35:21 +00:00
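The 32-bit read sequence (high word, low word, high word again, retry on mismatch) can be sketched in C. `read_cycle` and `read_cycleh` are hypothetical stand-ins for `rdcycle` and `rdcycleh`, backed by a simulated counter that ticks on every read so the retry path can be exercised.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated 64-bit cycle counter that advances by one on every read. */
static uint64_t sim_cycle;
static uint32_t read_cycleh(void) { uint32_t v = (uint32_t)(sim_cycle >> 32); sim_cycle += 1; return v; }
static uint32_t read_cycle(void)  { uint32_t v = (uint32_t)sim_cycle; sim_cycle += 1; return v; }

/* Read high, low, high again; if the high word changed, the low word
 * wrapped in between and the pair may be torn, so try again. */
static uint64_t read_cycle64(void) {
    uint32_t hi, lo, hi2;
    do {
        hi  = read_cycleh();
        lo  = read_cycle();
        hi2 = read_cycleh();
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}

/* Test helper: start the simulated counter at a given value. */
static uint64_t read_cycle64_from(uint64_t start) {
    sim_cycle = start;
    return read_cycle64();
}
```

Starting at 0xFFFFFFFF, the first attempt sees the high word change from 0 to 1 and retries; without the retry, the pair (hi = 0, lo = 0) would report a value that went backwards.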
Robert Lougher	2478b62098	Revert r365198 as this accidentally committed something that should not have been added. llvm-svn: 365199	2019-07-05 12:30:45 +00:00
Robert Lougher	3bea2b15f5	This reverts r365061 and r365062 (test update) Revision r365061 changed a skip of debug instructions for a skip of meta instructions. This is not safe, as IMPLICIT_DEF is classed as a meta instruction. llvm-svn: 365198	2019-07-05 12:20:21 +00:00
Sam Elliott	6884d5e040	[RISCV][NFC] Replace hard-coded CSR duplication with symbolic references Reviewers: asb, lenary Reviewed By: asb, lenary Subscribers: MaskRay, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64139 Patch by James Clarke (jrtc27) llvm-svn: 365195	2019-07-05 12:16:40 +00:00
Simon Pilgrim	8b25d9bf01	[X86][SSE] LowerINSERT_VECTOR_ELT - early out for out of range indices Fixes OSS-Fuzz #15662 llvm-svn: 365180	2019-07-05 10:34:53 +00:00
David Green	25cf705097	[ARM] MVE VMOV immediate handling This adds some handling for VMOVimm, using the same method that NEON uses. We create VMOVIMM/VMVNIMM/VMOVFPIMM nodes based on the immediate, and select them using the now renamed ARMvmovImm/etc. There is also an extra 64bit immediate mode that I have not yet added here. Code by David Sherwood Differential Revision: https://reviews.llvm.org/D63884 llvm-svn: 365178	2019-07-05 10:02:43 +00:00
David Green	bb7e97d783	[ARM] MVE fp to int conversions This adds the patterns needed for fptosi and sitofp. Differential Revision: https://reviews.llvm.org/D63729 llvm-svn: 365176	2019-07-05 09:34:30 +00:00
Fangrui Song	6fa850c4fe	[RISCV] Delete a ctor that is commented out. NFC llvm-svn: 365175	2019-07-05 08:25:14 +00:00
Craig Topper	171732aeb3	[X86] Add custom isel to select ADD/SUB/OR/XOR/AND to their non-immediate forms under optsize when the immediate has additional users. Summary: We attempt to prevent folding immediates with multiple users under optsize. But we only do this from store nodes and X86ISD::ADD/SUB/XOR/OR/AND patterns. We don't do it for ISD::ADD/SUB/XOR/OR/AND even though we count them as users when deciding whether to fold into other nodes. This leads to situations where we block folding to a compare for example, but still fold into an AND or OR as seen in PR27202. Unfortunately touching the isel patterns in tablegen for the ISD::ADD/SUB/XOR/OR/AND opcodes will cause the patterns to be unusable for fast isel. And we don't have a way to make a fast isel only pattern. To work around this, this patch adds custom isel in front of the isel table that will select the non-immediate forms if the immediate has additional users. This may create some issues for ANDN and NOT matching. And there's room for improvement with unsigned 32 immediates on 64-bit AND. This patch needs more thorough test cases, but I wanted to get feedback on the direction. Please send me any other test cases you've seen in the wild. I think we probably have the same issue with the immediate matching when we fold RMW from X86ISD::ADD/SUB/XOR/OR/AND. And our TEST immediate shrinking logic. Our cost modeling for immediates that can fit in a sign extended 8-bit immediate on a 16/32/64 bit operation is completely wrong. I also wonder if we should update the ConstantHoisting cost model and block folding for "opaque" constants. But of course constants can still be created by DAG combine and lowering optimizations. Fixes PR27202 Reviewers: spatel, RKSimon, andreadb Reviewed By: RKSimon Subscribers: jsji, hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59909 llvm-svn: 365163	2019-07-04 22:53:57 +00:00
Simon Atanasyan	1e9c00308b	[mips] Refactor expandSeq and expandSeqI methods. NFC llvm-svn: 365161	2019-07-04 22:45:07 +00:00
Tim Renouf	5816889c74	[AMDGPU] Custom lower INSERT_SUBVECTOR v3, v4, v5, v8 Summary: Since the changes to introduce vec3 and vec5, INSERT_VECTOR for these sizes has been marked "expand", which made LegalizeDAG lower it to loads and stores via a stack slot. The code got optimized a bit later, but the now-unused stack slot was never deleted. This commit avoids that problem by custom lowering INSERT_SUBVECTOR into an EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT for each element in the subvector to insert. V2: Addressed review comments re test. Differential Revision: https://reviews.llvm.org/D63160 Change-Id: I9e3c13e36f68cfa3431bb9814851cc1f673274e1 llvm-svn: 365148	2019-07-04 17:38:24 +00:00
Jay Foad	0cd50b2a95	Fix typos in comments and debug output. llvm-svn: 365146	2019-07-04 15:04:29 +00:00
Michael Liao	7a9ad430fe	[AMDGPU] Correct the setting of `FlatScratchInit`. Summary: - That flag setting should skip spilling stack slot. Reviewers: arsenm, rampitec Subscribers: qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64143 llvm-svn: 365137	2019-07-04 13:29:45 +00:00
Simon Pilgrim	fde766de4b	[X86][AVX1] Combine concat_vectors(pshufd(x,c),pshufd(y,c)) -> vpermilps(concat_vectors(x,y),c) Bitcast v4i32 to v8f32 and back again - it might be worth adding isel patterns for X86PShufd v8i32 on AVX1 targets like we did for X86Blendi to avoid the bitcasts? llvm-svn: 365125	2019-07-04 10:17:10 +00:00
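The combine relies on VPERMILPS (immediate form) applying the same 4-element pattern to each 128-bit half, exactly as PSHUFD does to a single 128-bit vector. A lane-level C model (no intrinsics; helper names are hypothetical) checks this for every immediate.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* PSHUFD on 4 x 32-bit lanes: r[k] = x[(imm >> 2k) & 3]. */
static void pshufd(uint32_t r[4], const uint32_t x[4], unsigned imm) {
    for (int k = 0; k < 4; ++k)
        r[k] = x[(imm >> (2 * k)) & 3];
}

/* VPERMILPS (immediate) on 8 x 32-bit lanes: the same 4-lane pattern is
 * applied independently within each 128-bit half. */
static void vpermilps(uint32_t r[8], const uint32_t x[8], unsigned imm) {
    for (int half = 0; half < 2; ++half)
        for (int k = 0; k < 4; ++k)
            r[4 * half + k] = x[4 * half + ((imm >> (2 * k)) & 3)];
}

/* concat(pshufd(x,c), pshufd(y,c)) versus vpermilps(concat(x,y), c). */
static int combine_is_equivalent(unsigned imm) {
    const uint32_t x[4] = {10, 11, 12, 13}, y[4] = {20, 21, 22, 23};
    uint32_t lo[4], hi[4], before[8], cat[8], after[8];
    pshufd(lo, x, imm);
    pshufd(hi, y, imm);
    memcpy(before, lo, sizeof lo);
    memcpy(before + 4, hi, sizeof hi);
    memcpy(cat, x, sizeof x);
    memcpy(cat + 4, y, sizeof y);
    vpermilps(after, cat, imm);
    return memcmp(before, after, sizeof before) == 0;
}
```

The equivalence only holds when both PSHUFDs use the same immediate, which is the pattern the commit matches.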
David Green	2b20ee4110	[ARM] Favour PL/MI over GE/LT when possible The arm condition codes for GE is N==V (and for LT is N!=V). If the source of flags cannot set V (overflow), such as a cmp against #0, then we can use the simpler PL and MI conditions that only check N. As these PL/MI conditions are simpler than GE/LT, other passes like the peephole optimiser can have a better time optimising away the redundant CMPs. The exception is the VSEL instruction, which cannot take the PL code, so there the transform favours GE. Differential Revision: https://reviews.llvm.org/D64160 llvm-svn: 365117	2019-07-04 08:58:58 +00:00
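The reasoning can be checked with a C model of the NZCV flags produced by CMP. The flag formulas follow the usual subtraction definitions; the helper names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* NZCV flags of CMP a, b, i.e. of the subtraction a - b. */
typedef struct { bool n, z, c, v; } flags;

static flags cmp(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b, res = ua - ub;
    flags f;
    f.n = (res >> 31) & 1;                            /* negative */
    f.z = res == 0;                                   /* zero */
    f.c = ua >= ub;                                   /* no borrow */
    f.v = (((ua ^ ub) & (ua ^ res)) >> 31) & 1;       /* signed overflow */
    return f;
}

static bool cond_ge(flags f) { return f.n == f.v; }
static bool cond_lt(flags f) { return f.n != f.v; }
static bool cond_pl(flags f) { return !f.n; }
static bool cond_mi(flags f) { return f.n; }

/* Against #0, V is always clear, so GE agrees with PL and LT with MI. */
static bool pl_mi_match_ge_lt_vs_zero(int32_t x) {
    flags f = cmp(x, 0);
    return cond_ge(f) == cond_pl(f) && cond_lt(f) == cond_mi(f);
}
```

The equivalence is specific to flag sources that cannot set V: comparing INT32_MIN with 1 overflows, and there GE and PL disagree.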
David Green	d2a9ec29d0	[ARM] MVE bitwise instruction patterns This adds patterns for the simpler VAND, VORR and VEOR bitwise vector instructions. It also adjusts the top16Zero PatLeaf to not match on vector instructions, which can otherwise cause problems. Code written by David Sherwood. Differential Revision: https://reviews.llvm.org/D63867 llvm-svn: 365113	2019-07-04 08:41:23 +00:00
QingShan Zhang	63e62006cf	[NFC][PowerPC] Make the PowerPC scheduling strategy feature only control the strategy instead of the scheduler. llvm-svn: 365110	2019-07-04 07:43:51 +00:00
Craig Topper	163b8bb3f5	[X86] Use pointer sized indices instead of i32 for EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT in a couple places. Most places already did this. llvm-svn: 365109	2019-07-04 06:21:54 +00:00
Fangrui Song	1f333562de	[PowerPC] Support constraint code "ww" Summary: "ww" and "ws" are both constraint codes for VSX vector registers that hold scalar double data. "ww" is preferred for float while "ws" is preferred for double. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D64119 llvm-svn: 365106	2019-07-04 04:44:42 +00:00
Derek Schuff	51d3c4dfcd	[WebAssembly] Update test failure explanations llvm-svn: 365100	2019-07-04 00:24:35 +00:00
Derek Schuff	ec4be57655	[WebAssembly] Enable IndirectBrExpandPass Wasm doesn't have a direct way to lower indirectbr, so hook up the IndirectBrExpandPass to lower indirectbr into a switch. Fixes PR42498 Reviewers: aheejin Differential Revision: https://reviews.llvm.org/D64161 llvm-svn: 365096	2019-07-03 23:54:06 +00:00
Matt Arsenault	5b0922fe1f	AMDGPU: Add pass to lower SGPR spills This is split out from my patches to split register allocation into a separate SGPR and VGPR phase, and has some parts that aren't yet used (like maintaining LiveIntervals). This simplifies making the frame pointer register callee saved. As it is now, the code to determine callee saves needs to predict all the possible SGPR spills and how many callee saved VGPRs are needed. By handling this before PrologEpilogInserter, it's possible to just check the spill objects that already exist. Change-Id: I29e6df4034afcf949e06f8ef44206acb94696f04 llvm-svn: 365095	2019-07-03 23:32:29 +00:00
Matt Arsenault	c96c174557	Revert "[AMDGPU] Kernel arg metadata: added support for "__hip_texture" type." This reverts commit r365073. This is crashing, and is improperly relying on IR type names. llvm-svn: 365087	2019-07-03 21:34:34 +00:00
Konstantin Pyzhov	6f419a3370	[AMDGPU] Kernel arg metadata: added support for "__hip_texture" type. Summary: Hip texture type is equivalent to OpenCL image. So, we need to set the Image type for kernel arguments with __hip_texture type. Differential revision: https://reviews.llvm.org/D63850 llvm-svn: 365073	2019-07-03 19:11:35 +00:00
Jessica Paquette	6584109389	Fix precedence in assert from r364961 Precedence was wrong in an assert added in r364961. Add braces around the assertion condition to make it right. See: https://reviews.llvm.org/D64084 llvm-svn: 365069	2019-07-03 18:30:01 +00:00
Jessica Paquette	a99cfeea44	[GlobalISel][AArch64] Use getConstantVRegValWithLookThrough for selectArithImmed Instead of just stopping to see if we have a G_CONSTANT, look through G_TRUNCs, G_SEXTs, and G_ZEXTs. This gives an average ~1.3% code size improvement on CINT2000 at -O3. Differential Revision: https://reviews.llvm.org/D64108 llvm-svn: 365063	2019-07-03 17:46:23 +00:00
Robert Lougher	720baf0416	[X86] Avoid SFB - Skip meta instructions This patch generalizes the fix in D61680 to ignore all meta instructions, not just debug info. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D62605 llvm-svn: 365061	2019-07-03 17:43:55 +00:00
Francis Visoiu Mistrih	83bbe2f418	[CodeGen] Make branch funnels pass the machine verifier We previously marked all the tests with branch funnels as `-verify-machineinstrs=0`. This is an attempt to fix it. 1) `ICALL_BRANCH_FUNNEL` has no defs. Mark it as `let OutOperandList = (outs)` 2) After that we hit an assert: ``` Assertion failed: (Op.getValueType() != MVT::Other && Op.getValueType() != MVT::Glue && "Chain and glue operands should occur at end of operand list!"), function AddOperand, file /Users/francisvm/llvm/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp, line 461. ``` The chain operand was added at the beginning of the operand list. Move that to the end. 3) After that we hit another verifier issue in the pseudo expansion where the registers used in the cmps and jmps are not added to the livein lists. Add the `EFLAGS` to all the new MBBs that we create. PR39436 Differential Review: https://reviews.llvm.org/D54155 llvm-svn: 365058	2019-07-03 17:16:45 +00:00
Simon Pilgrim	26812c7675	[X86] ComputeNumSignBitsForTargetNode - add target shuffle support. llvm-svn: 365057	2019-07-03 17:06:59 +00:00
Simon Pilgrim	783dbe402f	[X86][AVX] combineX86ShufflesRecursively - peek through extract_subvector If we have more than 2 shuffle ops to combine, try to use combineX86ShuffleChainWithExtract to see if some are from the same super vector. llvm-svn: 365050	2019-07-03 15:46:08 +00:00
Sam Parker	6005681ac6	[ARM] Fix for NDEBUG builds Fix unused variable warning as well as a nonsense assert. Differential Revision: https://reviews.llvm.org/D63816 llvm-svn: 365046	2019-07-03 14:39:23 +00:00
Simon Pilgrim	868d0b7fd9	[X86][AVX] Combine vpermi(bitcast(x)) -> bitcast(vpermi(x)) iff the number of elements doesn't change. This gets around an issue with combineX86ShuffleChain not being able to hint which domain is preferred for shuffles that can be done with either. Fixes regression introduced in rL365041 llvm-svn: 365044	2019-07-03 14:34:16 +00:00
Simon Pilgrim	0c230209fe	[X86][AVX] combineX86ShuffleChainWithExtract - add number of non-zero extract_subvectors to the combine depth This better accounts for the cost/benefit of removing extract_subvectors from the shuffle and will be more useful in future patches. The vpermq predicate regression will be fixed shortly. llvm-svn: 365041	2019-07-03 14:17:21 +00:00
Simon Atanasyan	a10bf0939d	[mips] Mark general scheduling model as complete llvm-svn: 365034	2019-07-03 12:28:05 +00:00
Simon Atanasyan	4d364659f9	[mips] Add missing atomic instructions to general scheduling definitions llvm-svn: 365033	2019-07-03 12:27:58 +00:00
Simon Atanasyan	3e4c7eb33e	[mips] Add missing microMIPS instructions to general scheduling definitions llvm-svn: 365032	2019-07-03 12:27:51 +00:00
Simon Pilgrim	8c099cbe7c	[X86][SSE] lowerUINT_TO_FP_v2i32 - explicitly cast half word to double Fixes MSVC analyzer extension->double warning. llvm-svn: 365027	2019-07-03 11:23:27 +00:00
Simon Pilgrim	8df90b843d	[X86][SSE] LowerINSERT_VECTOR_ELT - ensure insertion index correctness. NFCI. Assert that the insertion index is in range and use uint64_t for the index to fix MSVC/cppcheck truncation warning. llvm-svn: 365025	2019-07-03 10:59:52 +00:00
Simon Pilgrim	8853bd9592	[X86][SSE] LowerScalarImmediateShift - ensure shift amount correctness. NFCI. Assert that the shift amount is in range and create vXi8 shift masks in a way that doesn't cause MSVC/cppcheck shift result is truncated then extended warnings. llvm-svn: 365024	2019-07-03 10:47:33 +00:00
Simon Atanasyan	3e41b97f14	[mips] Add SIGRIE,GINVI,GINVT to general scheduling definitions llvm-svn: 365023	2019-07-03 10:33:16 +00:00
Simon Atanasyan	dc3c67bbe2	[mips] Add missing mips16 instructions to general scheduling definitions llvm-svn: 365022	2019-07-03 10:33:09 +00:00
Simon Atanasyan	b04f6a1a25	[mips] Add missing MSA and ASE instructions to general scheduling definitions llvm-svn: 365021	2019-07-03 10:33:01 +00:00
Simon Atanasyan	e5dfbe83b6	[mips] Replace some itineraries by instructions in the general scheduling definitions llvm-svn: 365020	2019-07-03 10:32:54 +00:00
Simon Pilgrim	64e3a51534	Fix uninitialized variable warnings. NFCI. Both MSVC and cppcheck don't like the fact that the variables are initialized via references. llvm-svn: 365018	2019-07-03 10:22:08 +00:00
Simon Pilgrim	7b7b9b78a2	[X86] LowerFunnelShift - use modulo constant shift amount. This avoids the use of getZExtValue and uses the modulo shift amount, which is what's expected for funnel shifts anyhow. llvm-svn: 365016	2019-07-03 10:04:16 +00:00
Oliver Stannard	830b20344b	[ARM] Thumb2: favor R4-R7 over R12/LR in allocation order when opt for minsize For Thumb2, we prefer low regs (costPerUse = 0) to allow narrow encoding. However, the current allocation order is: R0-R3, R12, LR, R4-R11. As a result, many instructions that use R12/LR will be wide instructions. This patch changes the allocation order to: R0-R7, R12, LR, R8-R11 for thumb2 and -Osize. In most cases, there are no extra push/pop instructions, as they will be folded into existing ones. There might be a slight performance impact due to more stack usage, so we only enable it when optimizing for minimum size. https://reviews.llvm.org/D30324 llvm-svn: 365014	2019-07-03 09:58:52 +00:00
Roman Lebedev	c4b83a6054	[Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457) Summary: This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]]. In the middle-end, we'd want to prefer the form with two adds - D63992, but as this diff shows, not every target will prefer that pattern. Of the 4 targets for which I added tests, all seem to be OK with inc-of-add for scalars, but only X86 prefers that same pattern for vectors. Here I'm adding a new TLI hook, always defaulting to the inc-of-add, but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars. Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel Reviewed By: efriedma Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64090 llvm-svn: 365010	2019-07-03 09:41:35 +00:00
Michael Liao	80177ca5a9	[AMDGPU] Enable serializing of argument info. Summary: - Support serialization of all arguments in machine function info. This enables fabricating MIR tests depending on argument info. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64096 llvm-svn: 364995	2019-07-03 02:00:21 +00:00
Amara Emerson	cac1151845	[AArch64][GlobalISel] Overhaul legalization & isel of shifts to select immediate forms. There are two main issues preventing us from generating immediate form shifts: 1) We have partial SelectionDAG imported support for G_ASHR and G_LSHR shift immediate forms, but they currently don't work because the amount type is expected to be an s64 constant, but we only legalize them to have homogeneous types. To deal with this, first we introduce a custom legalizer to only custom legalize s32 shifts which have a constant operand into an s64. There is also an additional artifact combiner to fold zexts(g_constant) to a larger G_CONSTANT if it's legal, a counterpart to the anyext version committed in an earlier patch. 2) For G_SHL the importer can't cope with the pattern. For this I introduced an early selection phase in the arm64 selector to select these forms manually before the tablegen selector pessimizes it to a register-register variant. Differential Revision: https://reviews.llvm.org/D63910 llvm-svn: 364994	2019-07-03 01:49:06 +00:00
Chen Zheng	dfdccbb26b	[PowerPC] exclude ICmpZero in LSR if icmp can be replaced in later hardware loop. Differential Revision: https://reviews.llvm.org/D63477 llvm-svn: 364993	2019-07-03 01:49:03 +00:00
Guanzhong Chen	b88ebe8cc9	[WebAssembly] Prevent inline assembly from being mangled by SjLj Summary: Before, inline assembly gets mangled by the SjLj transformation. For example, in a function with setjmp/longjmp, this LLVM IR code call void asm sideeffect "", ""() would be transformed into call void @__invoke_void(void ()* asm sideeffect "", "") This is invalid, and results in the error: Cannot take the address of an inline asm! In this diff, we skip the transformation for inline assembly. Reviewers: aheejin, tlively Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64115 llvm-svn: 364985	2019-07-03 00:37:49 +00:00
Matt Arsenault	c04aab9c06	AMDGPU: Look through bundles for existing waitcnts These aren't produced now, but will be in a future patch. llvm-svn: 364983	2019-07-03 00:30:44 +00:00
Craig Topper	b770d2c9d4	[X86] Add a DAG combine for turning *_extend_vector_inreg+load into an appropriate extload if the load isn't volatile. Remove the corresponding isel patterns that did the same thing without checking for volatile. This fixes another variation of PR42079 llvm-svn: 364977	2019-07-02 23:20:03 +00:00
Eli Friedman	e97aa961d3	[ARM] Fix unwind info for Thumb1 functions that save high registers. There were two issues here: one, some of the relevant instructions were missing the expected "FrameSetup" flag, and two, ARMAsmPrinter::EmitUnwindingInstruction wasn't expecting "mov" instructions in the prologue. I'm sticking the additional state into ARMFunctionInfo so it's obvious it only applies to the current function. I considered a few alternative approaches where we would compute the correct unwind information as part of the prologue/epilogue lowering, but it seems like a lot of work to introduce pseudo-instructions, and the current code seems to be reliable enough. Fixes https://bugs.llvm.org/show_bug.cgi?id=42408. Differential Revision: https://reviews.llvm.org/D63964 llvm-svn: 364970	2019-07-02 21:35:15 +00:00
Jessica Paquette	99316043bb	[AArch64][GlobalISel] Teach tryOptSelect to handle G_ICMP This teaches `tryOptSelect` to handle folding G_ICMP, and removes the requirement that the G_SELECT we're dealing with is floating point. Some refactoring to make this work nicely as well: - Factor out the scalar case from the selection code for G_ICMP into `emitIntegerCompare`. - Make `tryOptCMN` return a MachineInstr* instead of a bool. - Make `tryOptCMN` not modify the instruction being selected. - Factor out the CMN emission into `emitCMN` for readability. By doing it this way, we can get all of the compare selection optimizations in select emission. Differential Revision: https://reviews.llvm.org/D64084 llvm-svn: 364961	2019-07-02 19:44:16 +00:00
Matt Arsenault	5fe851b6cd	AMDGPU: Custom lower vector_shuffle for v4i16/v4f16 Ordinarily it is lowered as a build_vector of each extract_vector_elt, which in turn get lowered to bitcasts and bit shifts. Very little understands the lowered extract pattern, resulting in much worse code. We treat concat_vectors of v2i16 as legal, so prefer that. llvm-svn: 364959	2019-07-02 19:15:45 +00:00
Simon Pilgrim	5613874947	[X86] getTargetConstantBitsFromNode - remove unnecessary getZExtValue() (PR42486) Don't use APInt::getZExtValue() if you can avoid it - eventually someone will call it with i128 or something that doesn't fit into 64-bits. In this case it was completely superfluous as we'd moved the rest of the code to always use APInt. Fixes the <1 x i128> addition bug in PR42486 llvm-svn: 364953	2019-07-02 18:20:38 +00:00
Alexander Timofeev	66ac6b409d	[AMDGPU] LCSSA pass added in preISel. Fixing typo in previous commit llvm-svn: 364952	2019-07-02 18:16:42 +00:00
Alexander Timofeev	2ce560f029	[AMDGPU] LCSSA pass added in preISel. Uniform values defined in the divergent loop and used outside Differential Revision: https://reviews.llvm.org/D63953 Reviewers: rampitec, nhaehnle, arsenm llvm-svn: 364950	2019-07-02 17:59:44 +00:00
Craig Topper	cffbaa93b7	[X86] Add patterns to select (scalar_to_vector (loadf32)) as (V)MOVSSrm instead of COPY_TO_REGCLASS + (V)MOVSSrm_alt. Similar for (V)MOVSD. Ultimately, I'd like to see about folding scalar_to_vector+load to vzload. Which would select as (V)MOVSSrm so this is closer to that. llvm-svn: 364948	2019-07-02 17:51:02 +00:00
Matt Arsenault	50be3481d4	AMDGPU/GlobalISel: Try generated matcher with intrinsics llvm-svn: 364933	2019-07-02 14:52:16 +00:00
Matt Arsenault	a8bff4b963	AMDGPU/GlobalISel: Select mul llvm-svn: 364932	2019-07-02 14:52:14 +00:00
Matt Arsenault	70a4d3f67c	AMDGPU/GlobalISel: Fix G_GEP with mixed SGPR/VGPR operands The register bank for the destination of the sample argument copy was wrong. We shouldn't be constraining each source to the result register bank. Allow constraining the original register to the right size. llvm-svn: 364928	2019-07-02 14:40:22 +00:00
Matt Arsenault	ed63399244	AMDGPU/GlobalISel: Select G_FENCE Manually select to work around the tablegen emitter emitting checks for G_CONSTANT. llvm-svn: 364927	2019-07-02 14:17:38 +00:00
Simon Pilgrim	9304168103	[X86][AVX] combineX86ShuffleChain - pull out CombineShuffleWithExtract lambda. NFCI. Pull out CombineShuffleWithExtract lambda to new combineX86ShuffleChainWithExtract wrapper and refactored it to handle more than 2 shuffle inputs - this will allow combineX86ShufflesRecursively to call this in a future patch. llvm-svn: 364924	2019-07-02 13:30:04 +00:00
Simon Tatham	bffd099d15	[ARM] MVE: allow soft-float ABI to pass vector types. Passing a vector type over the soft-float ABI involves it being split into four GPRs, so the first thing that has to happen at the start of the function is to recombine those into a vector register. The ABI types all vectors as v2f64, so we need to support BUILD_VECTOR for that type, which I do in this patch by allowing it to be expanded in terms of INSERT_VECTOR_ELT, and writing an ISel pattern for that in turn. Similarly, I provide a rule for EXTRACT_VECTOR_ELT so that a returned vector can be marshalled back into GPRs. While I'm here, I've also added ISD::UNDEF to the list of operations we turn back on in `setAllExpand`, because I noticed that otherwise it gets expanded into a BUILD_VECTOR with explicit zero inputs, leading to pointless machine instructions to zero out a vector register that's about to have every lane overwritten in any case. Reviewers: dmgreen, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63937 llvm-svn: 364910	2019-07-02 11:26:11 +00:00
Simon Tatham	7b63a9533c	[ARM] Stop using scalar FP instructions in integer-only MVE mode. If you compile with `-mattr=+mve` (enabling integer MVE instructions but not floating-point ones), then the scalar FP //registers// exist and it's legal to move things in and out of them, load and store them, but it's not legal to do arithmetic on them. In D60708, the calls to `addRegisterClass` in ARMISelLowering that enable use of the scalar FP registers became conditionalised on `Subtarget->hasFPRegs()` instead of `Subtarget->hasVFP2Base()`, so that loads, stores and moves of those registers would work. But I didn't realise that that would also enable all the operations on those types by default. Now, if the target doesn't have basic VFP, we follow up those `addRegisterClass` calls by turning back off all the nontrivial operations you can perform on f32 and f64. That causes several knock-on failures, which are fixed by allowing the `VMOVDcc` and `VMOVScc` instructions to be selected even if all you have is `HasFPRegs`, and adjusting several checks for 'is this a double in a single-precision-only world?' to the more general 'is this any FP type we can't do arithmetic on?'. Between those, the whole of the `float-ops.ll` and `fp16-instructions.ll` tests can now run in MVE-without-FP mode and generate correct-looking code. One odd side effect is that I had to relax the check lines in that test so that they permit test functions like `add_f` to be generated as tailcalls to software FP library functions, instead of ordinary calls. Doing that is entirely legal, but the mystery is why this is the first RUN line that's needed the relaxation: on the usual kind of non-FP target, no tailcalls ever seem to be generated. Going by the llc messages, I think `SoftenFloatResult` must be perturbing the code generation in some way, but that's as much as I can guess. Reviewers: dmgreen, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63938 llvm-svn: 364909	2019-07-02 11:26:00 +00:00
Simon Pilgrim	d609ebb779	[X86] resolveTargetShuffleInputsAndMask - add repeated input handling. We were relying on combineX86ShufflesRecursively to handle this - this patch gets it done earlier which should make it easier for other code to use resolveTargetShuffleInputsAndMask. llvm-svn: 364906	2019-07-02 10:53:17 +00:00
Simon Atanasyan	1d7d0e2126	[mips] Mark P5600 scheduling model as complete llvm-svn: 364902	2019-07-02 10:22:14 +00:00
Simon Atanasyan	f2867518b3	[mips] Add missing schedinfo for FPU load/store/conv instructions llvm-svn: 364900	2019-07-02 10:22:06 +00:00
Simon Atanasyan	116cf95c00	[mips] Map SNOP, NOP to the P5600Nop scheduler resource llvm-svn: 364899	2019-07-02 10:21:59 +00:00
Craig Topper	2d306b2d57	[X86] Add PreprocessISelDAG support for turning ISD::FP_TO_SINT/UINT into X86ISD::CVTTP2SI/CVTTP2UI and to reduce the number of isel patterns. llvm-svn: 364887	2019-07-02 05:53:37 +00:00
QingShan Zhang	7fdb3a293b	[PowerPC] Implement the areMemAccessesTriviallyDisjoint hook After implementing this hook, we will model the memory dependency in the scheduling dependency graph more precisely, and will have more opportunities to reorder the loads/stores, as they don't have the dependency under some conditions. Differential Revision: https://reviews.llvm.org/D63804 llvm-svn: 364886	2019-07-02 03:28:52 +00:00
Jordan Rupprecht	351b7e7b24	Revert Recommit [PowerPC] Update P9 vector costs for insert/extract element This reverts r364557 (git commit `9f7f5858fe`) This crashes as reported on the commit thread. Repro instructions TBD. llvm-svn: 364876	2019-07-01 23:29:46 +00:00
Matt Arsenault	40c08052a5	AMDGPU: Correct properties for adjcallstack* pseudos These should be SALU writes, and these are lowered to instructions that def SCC. llvm-svn: 364859	2019-07-01 22:01:05 +00:00
Craig Topper	3f722d40c5	[X86] Use v4i32 vzloads instead of v2i64 for vpmovzx/vpmovsx patterns where only 32-bits are loaded. v2i64 vzload defines a 64-bit memory access. It doesn't look like we have any coverage for this either way. Also remove some vzload usages where the instruction loads only 16-bits. llvm-svn: 364851	2019-07-01 21:25:11 +00:00
Simon Atanasyan	fa27500676	[mips] Add missing schedinfo for MIPSeh_return[32|64] instructions llvm-svn: 364850	2019-07-01 21:25:04 +00:00
Simon Atanasyan	29801f7851	[mips] Add virtualization ASE to P5600 scheduling definitions llvm-svn: 364849	2019-07-01 21:24:58 +00:00
Simon Atanasyan	574d0a61bd	[mips] Add missing schedinfo for LONG_BRANCH_* instructions llvm-svn: 364848	2019-07-01 21:24:51 +00:00
Craig Topper	328b24150e	[X86] Remove several bad load folding isel patterns for VPMOVZX/VPMOVSX. These patterns all matched a v2i64 vzload which only loads 64-bits to instructions that load a full 128-bits. llvm-svn: 364847	2019-07-01 21:23:38 +00:00
Craig Topper	5e7815b695	[X86] Correct v4f32->v2i64 cvt(t)ps2(u)qq memory isel patterns These instructions only read 64-bits of memory so we shouldn't allow a full vector width load to be pattern matched in case it is marked volatile. Instead allow vzload or scalar_to_vector+load. Also add a DAG combine to turn full vector loads into vzload when used by one of these instructions if the load isn't volatile. This fixes another case for PR42079 llvm-svn: 364838	2019-07-01 19:01:37 +00:00
Matt Arsenault	bae3636f96	AMDGPU/GlobalISel: Handle more input argument intrinsics llvm-svn: 364836	2019-07-01 18:50:50 +00:00
Matt Arsenault	9e8e8c60fa	AMDGPU/GlobalISel: Lower kernarg segment ptr intrinsics llvm-svn: 364835	2019-07-01 18:49:01 +00:00
Matt Arsenault	756d81905f	AMDGPU/GlobalISel: Legalize workgroup ID intrinsics llvm-svn: 364834	2019-07-01 18:47:22 +00:00
Matt Arsenault	e2c86cce3a	AMDGPU/GlobalISel: Legalize workitem ID intrinsics Tests don't cover the masked input path since non-kernel arguments aren't lowered yet. Test is copied directly from the existing test, with 2 additions. llvm-svn: 364833	2019-07-01 18:45:36 +00:00
Matt Arsenault	e15770aec4	AMDGPU/GlobalISel: Custom lower control flow intrinsics Replace the brcond for the 2 cases that act as branches. For now follow how the current system works, although I think we can eventually get rid of the pseudos. llvm-svn: 364832	2019-07-01 18:40:23 +00:00
Matt Arsenault	4073b33786	AMDGPU/GlobalISel: Handle 16-bit SALU min/max This needs to be extended to s32, and expanded into cmp+select. This is relying on the fact that widenScalar happens to leave the instruction in place, but this isn't a guaranteed property of LegalizerHelper. llvm-svn: 364831	2019-07-01 18:33:37 +00:00
Matt Arsenault	5a7d5111e5	AMDGPU/GlobalISel: Lower SALU min/max to cmp+select Use a change observer to apply a register bank to the newly created intermediate result register. llvm-svn: 364830	2019-07-01 18:30:45 +00:00
Robert Lougher	e20030f612	[X86] Avoid SFB - Fix inconsistent codegen with/without debug info(2) The function findPotentialBlockers may consider debug info instructions as potential blockers and may stop searching for a store-load pair prematurely. This patch corrects this and tests the cases where the store is separated from the load by more than InspectionLimit debug instructions. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D62408 llvm-svn: 364829	2019-07-01 18:28:21 +00:00
Matt Arsenault	ef59cb6982	AMDGPU/GlobalISel: Legalize s16 add/sub/mul If this is scalar, promote to s32. Use a new observer class to assign the register bank of newly created registers. llvm-svn: 364827	2019-07-01 18:18:55 +00:00
Matt Arsenault	9470bb262b	AMDGPU/GlobalISel: Fix allowing non-boolean conditions for G_SELECT The condition register bank must be scc or vcc so that a copy will be inserted, which will be lowered to a compare. Currently greedy unnecessarily forces using a VCC select. llvm-svn: 364825	2019-07-01 18:13:12 +00:00
Matt Arsenault	b2ea20eedd	AMDGPU/GlobalISel: RegBankSelect for sendmsg/sendmsghalt llvm-svn: 364819	2019-07-01 17:40:18 +00:00
Matt Arsenault	40d1faf38f	AMDGPU/GlobalISel: Legalize s16 fcmp llvm-svn: 364817	2019-07-01 17:35:53 +00:00
Nicolai Haehnle	10c911db63	AMDGPU/GFX10: implement ds_ordered_count changes Summary: ds_ordered_count can now simultaneously operate on up to 4 dwords in a single instruction, which are taken from (and returned to) lanes 0..3 of a single VGPR. Change-Id: I19b6e7b0732b617c10a779a7f9c0303eec7dd276 Reviewers: mareko, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63716 llvm-svn: 364815	2019-07-01 17:17:52 +00:00
Nicolai Haehnle	4dc3b2bf95	AMDGPU: Support GDS atomics Summary: Original patch by Marek Olšák Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab Reviewers: mareko, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63452 llvm-svn: 364814	2019-07-01 17:17:45 +00:00
Matt Arsenault	1094e6a814	AMDGPU/GlobalISel: RegBankSelect for DS ordered add/swap llvm-svn: 364811	2019-07-01 17:04:57 +00:00
Matt Arsenault	732149b24e	AArch64/GlobalISel: Fix trying to select invalid MIR Physical registers are not allowed to be a phi operand. llvm-svn: 364810	2019-07-01 17:02:24 +00:00
Matt Arsenault	265059eaf6	AMDGPU/GlobalISel: RegBankSelect for amdgcn.writelane llvm-svn: 364808	2019-07-01 16:41:36 +00:00
Matt Arsenault	a310727830	AMDGPU/GlobalISel: Fail instead of assert when selecting loads llvm-svn: 364807	2019-07-01 16:36:39 +00:00
Matt Arsenault	0a52e9d026	AMDGPU/GlobalISel: Complete implementation of G_GEP Also works around tablegen defect in selecting add with unused carry, but if we have to manually select GEP, might as well handle add manually. llvm-svn: 364806	2019-07-01 16:34:48 +00:00
Matt Arsenault	e1006259d8	AMDGPU/GlobalISel: Select G_PHI llvm-svn: 364805	2019-07-01 16:32:47 +00:00
Matt Arsenault	d810ff2588	AMDGPU/GlobalISel: Try to select VOP3 form of add There are several things broken, but at least emit the right thing for gfx9. The import of the pattern with the unused carry out seems to not work. Needs a special class for clamp, because OperandWithDefaultOps doesn't really work. llvm-svn: 364804	2019-07-01 16:27:32 +00:00
Simon Pilgrim	e3e38cce4a	[X86] Add widenSubVector to size in bits helper. NFCI. We can already widenSubVector to a specific type (of the same scalar type) - this variant just specifies the target vector size. This will be useful when CombineShuffleWithExtract relaxes the need to have the same scalar type for all shuffle operand subvector sources. llvm-svn: 364803	2019-07-01 16:20:47 +00:00
Matt Arsenault	62d64b0c30	AMDGPU/GlobalISel: RegBankSelect for readlane/readfirstlane llvm-svn: 364801	2019-07-01 16:19:39 +00:00
Tom Stellard	9e9dd30de3	AMDGPU/GlobalISel: Implement select for 32-bit G_ADD Reviewers: arsenm Reviewed By: arsenm Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58804 llvm-svn: 364797	2019-07-01 16:09:33 +00:00
Mikhail Maltsev	8b2e304bc5	[ARM] Fix MVE_VQxDMLxDH instruction class Summary: According to the ARMARM, the VQDMLADH, VQRDMLADH, VQDMLSDH and VQRDMLSDH instructions handle their results as follows: "The base variant writes the results into the lower element of each pair of elements in the destination register, whereas the exchange variant writes to the upper element in each pair". I.e., the initial content of the output register affects the result; as usual, we model this with an additional input. Also, for 32-bit variants Qd is not allowed to be the same register as Qm and Qn; we use @earlyclobber to indicate this. This patch also changes vpred_r to vpred_n because the instructions don't have an explicit 'inactive' operand. Reviewers: dmgreen, ostannard, simon_tatham Reviewed By: simon_tatham Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64007 llvm-svn: 364796	2019-07-01 16:07:58 +00:00
Matt Arsenault	2ab25f9ceb	AMDGPU/GlobalISel: Select G_BRCOND for vcc llvm-svn: 364795	2019-07-01 16:06:02 +00:00
Mikhail Maltsev	4a9e3f15bb	[ARM] MVE: support QQPRRegClass and QQQQPRRegClass Summary: QQPRRegClass and QQQQPRRegClass are used by the interleaving/deinterleaving loads/stores to represent sequences of consecutive SIMD registers. Reviewers: ostannard, simon_tatham, dmgreen Reviewed By: simon_tatham Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64009 llvm-svn: 364794	2019-07-01 16:05:23 +00:00
Krzysztof Parzyszek	5abf80cdfa	[Hexagon] Custom-lower UADDO(x, 1) and USUBO(x, 1) llvm-svn: 364790	2019-07-01 15:50:09 +00:00
Matt Arsenault	cda82f0bb6	AMDGPU/GlobalISel: Select G_FRAME_INDEX llvm-svn: 364789	2019-07-01 15:48:18 +00:00
Nicolai Haehnle	7cfd99ab15	AMDGPU/GFX10: fix scratch resource descriptor Summary: The stride should depend on the wave size, not the hardware generation. Also, the 32_FLOAT format is 0x16, not 16; though that shouldn't be relevant. Change-Id: I088f93bf6708974d085d1c50967f119061da6dc6 Reviewers: arsenm, rampitec, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63808 llvm-svn: 364788	2019-07-01 15:43:00 +00:00
Matt Arsenault	fdf36729c7	AMDGPU/GlobalISel: Make s16 select legal This is easy to handle and avoids legalization artifacts which are likely to obscure combines. llvm-svn: 364787	2019-07-01 15:42:47 +00:00
Matt Arsenault	6464280eb0	AMDGPU/GlobalISel: Select G_BRCOND for scc conditions llvm-svn: 364786	2019-07-01 15:39:27 +00:00
Matt Arsenault	1daad91af6	AMDGPU/GlobalISel: Tolerate copies with no type set isVCC has the same bug, but isn't used in a context where it can cause a problem. llvm-svn: 364784	2019-07-01 15:23:04 +00:00
Matt Arsenault	4f64ade04c	AMDGPU/GlobalISel: Select src modifiers llvm-svn: 364782	2019-07-01 15:18:56 +00:00
Krzysztof Parzyszek	511ad50db4	[Hexagon] Rework VLCR algorithm Add code to catch pattern for commutative instructions for VLCR. Patch by Suyog Sarda. llvm-svn: 364770	2019-07-01 13:50:47 +00:00
Matt Arsenault	1b317685e9	AMDGPU: Convert some places to Register llvm-svn: 364769	2019-07-01 13:44:46 +00:00
Matt Arsenault	5bf850d52e	AMDGPU/GlobalISel: Fix RegBankSelect for G_FCANONICALIZE llvm-svn: 364768	2019-07-01 13:40:18 +00:00
Matt Arsenault	b5fc94f3e7	AMDGPU/GlobalISel: Fix RegBankSelect for G_BUILD_VECTOR llvm-svn: 364767	2019-07-01 13:40:17 +00:00
Matt Arsenault	89fc8bcdd6	AMDGPU/GlobalISel: Fail on store to 32-bit address space llvm-svn: 364766	2019-07-01 13:37:39 +00:00
Matt Arsenault	3b7668ae4b	AMDGPU/GlobalISel: Improve icmp selection coverage. Select s64 eq/ne scalar icmp. llvm-svn: 364765	2019-07-01 13:34:26 +00:00
Matt Arsenault	c23149f612	AMDGPU/GlobalISel: RegBankSelect for WWM/WQM llvm-svn: 364763	2019-07-01 13:30:12 +00:00
Matt Arsenault	facf69e844	AMDGPU/GlobalISel: Use vcc reg bank for amdgcn.wqm.vote llvm-svn: 364762	2019-07-01 13:30:09 +00:00
Matt Arsenault	9f992c238a	AMDGPU/GlobalISel: Fix scc->vcc copy handling This was checking the size of the register with the value of the size, which happens to be exec. Also fix assuming VCC is 64-bit to fix wave32. Also remove some untested handling for physical registers which is skipped. This doesn't insert the V_CNDMASK_B32 if SCC is the physical copy source. I'm not sure if this should be trying to handle this special case instead of dealing with this in copyPhysReg. llvm-svn: 364761	2019-07-01 13:22:07 +00:00
Matt Arsenault	5dafcb9b11	AMDGPU/GlobalISel: Use and instead of BFE with inline immediate Zext from s1 is the only case where this should do anything with the current legal extensions. llvm-svn: 364760	2019-07-01 13:22:06 +00:00
Simon Atanasyan	ceb9da5bc7	[mips] Add missing schedinfo for MSA and ASE instructions llvm-svn: 364757	2019-07-01 13:21:05 +00:00
Simon Atanasyan	c0121bf874	[mips] Add missing schedinfo for atomic instructions llvm-svn: 364756	2019-07-01 13:20:56 +00:00
Simon Atanasyan	3a10810b7a	[mips] Add missing schedinfo for ADJCALLSTACKDOWN, ADJCALLSTACKUP llvm-svn: 364755	2019-07-01 13:20:48 +00:00
Florian Hahn	33c8c0ea27	[AMDGPU] Call isLoopExiting for blocks in the loop. isLoopExiting should only be called for blocks in the loop. A follow up patch makes this requirement an assertion. I've updated the usage here, to only match for actual exit blocks. Previously, it would also match blocks not in the loop. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D63980 llvm-svn: 364750	2019-07-01 12:36:44 +00:00
Fangrui Song	92e78b7bed	[RISCV] Add break; to the last switch case As suggested by jrtc27 in the post-commit review of D60528. llvm-svn: 364746	2019-07-01 11:41:07 +00:00
Simon Pilgrim	172fe5dd19	[X86] CombineShuffleWithExtract - updated description comments. NFCI. CombineShuffleWithExtract no longer requires that both shuffle ops are extract_subvectors, from the same type or from the same size. llvm-svn: 364745	2019-07-01 11:33:45 +00:00
Sam Parker	98722691b0	[ARM] WLS/LE Code Generation Backend changes to enable WLS/LE low-overhead loops for armv8.1-m: 1) Use TTI to communicate to the HardwareLoop pass that we should try to generate intrinsics that guard the loop entry, as well as setting the loop trip count. 2) Lower the BRCOND that uses said intrinsic to an Arm specific node: ARMWLS. 3) ISelDAGToDAG the node to a new pseudo instruction: t2WhileLoopStart. 4) Add support in ArmLowOverheadLoops to handle the new pseudo instruction. Differential Revision: https://reviews.llvm.org/D63816 llvm-svn: 364733	2019-07-01 08:21:28 +00:00
Craig Topper	29fff0797b	[X86] Improve the type checking fast-isel handling of vector bitcasts. We had a bunch of vector size legality checks for the source type based on feature flags, but we didn't check the destination type at all beyond ensuring that it was a "simple" type. But this allowed the destination to be i128, which isn't legal. This commit changes the code to use TLI's isTypeLegal logic in place of all the subtarget checks, and additionally checks that the source and dest are vectors. Fixes 42452 llvm-svn: 364729	2019-07-01 07:09:34 +00:00
Craig Topper	4ca81a9b99	[X86] Add a DAG combine to replace vector loads feeding a v4i32->v2f64 CVTSI2FP/CVTUI2FP node with a vzload. But only when the load isn't volatile. This improves load folding during isel where we only have vzload and scalar_to_vector+load patterns. We can't have full vector load isel patterns for the same volatile load issue. Also add some missing masked cvtsi2fp/cvtui2fp with vzload patterns. llvm-svn: 364728	2019-07-01 07:09:31 +00:00
Craig Topper	d1728f8987	[X86] Add MOVHPDrm/MOVLPDrm patterns that use VZEXT_LOAD. We already had patterns that used scalar_to_vector+load. But we can also have a vzload. Found while investigating combining scalar_to_vector+load to vzload. llvm-svn: 364726	2019-07-01 07:09:23 +00:00
Fangrui Song	78ee2fbf98	Cleanup: llvm::bsearch -> llvm::partition_point after r364719 llvm-svn: 364720	2019-06-30 11:19:56 +00:00
Craig Topper	725a8a5dc4	[X86] Custom lower AVX masked loads to masked load and vselect instead of selecting a maskmov+vblend during isel. AVX masked loads only support 0 as the value for masked off elements. So we need an extra blend to support other values. Previously we expanded the masked load to two instructions with isel patterns. With this patch we now insert the vselect during lowering and it will be separately selected as a blend. llvm-svn: 364718	2019-06-30 06:46:37 +00:00
Matt Arsenault	0d45209757	AMDGPU/GlobalISel: RegBankSelect for update.dpp llvm-svn: 364701	2019-06-29 00:44:36 +00:00
Matt Arsenault	fd82cf4f4d	AMDGPU/GlobalISel: RegBankSelect for atomic.inc/atomic.dec llvm-svn: 364699	2019-06-29 00:39:20 +00:00
Matt Arsenault	adb1f21e52	AMDGPU/GlobalISel: RegBankSelect for some DS intrinsics llvm-svn: 364698	2019-06-29 00:33:13 +00:00
Matt Arsenault	b416d5fc8b	AMDGPU/GlobalISel: RegBankSelect for some easy intrinsics llvm-svn: 364697	2019-06-29 00:29:56 +00:00
Matt Arsenault	5ea3c9adb2	AMDGPU/GlobalISel: RegBankSelect for icmp/fcmp intrinsics llvm-svn: 364696	2019-06-29 00:28:52 +00:00
Matt Arsenault	6aafb3068f	AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.fmas llvm-svn: 364695	2019-06-29 00:25:53 +00:00
Matt Arsenault	ade5162432	AMDGPU/GlobalISel: RegBankSelect for some simple leaf intrinsics llvm-svn: 364694	2019-06-29 00:22:28 +00:00
Wouter van Oortmerssen	319c87d94f	[WebAssembly] Assembler: support .int16/32/64 directives. Reviewers: sbc100 Subscribers: dschuff, jgravelle-google, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63959 llvm-svn: 364689	2019-06-28 22:20:33 +00:00
Sanjay Patel	9126c84f50	[x86] remove stale comment about cmov; NFC The cmov node used to sometimes return a glue result (and that's what 'flag' meant in this context), but that was removed with D38664. llvm-svn: 364687	2019-06-28 21:45:55 +00:00
Wouter van Oortmerssen	fc222e23ca	[WebAssembly] Assembler: Allow offsets and p2align in symbol load. Reviewers: sbc100 Subscribers: dschuff, jgravelle-google, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63951 llvm-svn: 364682	2019-06-28 20:31:13 +00:00
Wouter van Oortmerssen	597ba18008	[WebAssembly] Assembler: Improve section parsing. Reviewers: sbc100 Subscribers: dschuff, jgravelle-google, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63947 llvm-svn: 364681	2019-06-28 20:29:16 +00:00
Brad Smith	4b733ca617	Default to Secure PLT on PPC for musl libc. This matches the default settings of clang. llvm-svn: 364675	2019-06-28 19:48:31 +00:00
Simon Pilgrim	978a08c885	[X86] CombineShuffleWithExtract - recurse through EXTRACT_SUBVECTOR chain llvm-svn: 364667	2019-06-28 17:57:32 +00:00
Dmitry Preobrazhensky	e1eb25ff3e	[AMDGPU][MC] Fix 2 for sanitizer failure in 364645 llvm-svn: 364656	2019-06-28 16:28:46 +00:00
Sam Tebbs	e39e958da3	[ARM] Add support for the MVE long shift instructions MVE adds the lsll, lsrl and asrl instructions, which perform a shift on a 64 bit value separated into two 32 bit registers. The Expand64BitShift function is modified to accept ISD::SHL, ISD::SRL and ISD::SRA and convert it into the appropriate opcode in ARMISD. An SHL is converted into an lsll, an SRL is converted into an lsrl for the immediate form and a negation and lsll for the register form, and SRA is converted into an asrl. test/CodeGen/ARM/shift_parts.ll is added to test the logic of emitting these instructions. Differential Revision: https://reviews.llvm.org/D63430 llvm-svn: 364654	2019-06-28 15:43:31 +00:00
Dmitry Preobrazhensky	d12966c088	[AMDGPU][MC] Fix for sanitizer failure in 364645 llvm-svn: 364651	2019-06-28 15:22:47 +00:00
Dmitry Preobrazhensky	1d572ce395	[AMDGPU][MC] Enabled constant expressions as operands of sendmsg See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D62735 llvm-svn: 364645	2019-06-28 14:14:02 +00:00
Simon Pilgrim	a54e1a0f01	[X86] CombineShuffleWithExtract - only require 1 source to be EXTRACT_SUBVECTOR We were requiring that both shuffle operands were EXTRACT_SUBVECTORs, but we can relax this to only require one of them to be. Also, we shouldn't bother attempting this if both operands are from the lowest subvector (or not EXTRACT_SUBVECTOR at all). llvm-svn: 364644	2019-06-28 12:24:49 +00:00
David Green	9dbdfe6b78	[ARM] Add MVE mul patterns This simply adds integer and floating point VMUL patterns for MVE, same as we have add and sub. Differential Revision: https://reviews.llvm.org/D63866 llvm-svn: 364643	2019-06-28 11:44:03 +00:00
David Green	2883944035	[ARM] Mark math routines as non-legal for MVE This adds handling and tests for a number of floating point math routines, which have no MVE instructions. Differential Revision: https://reviews.llvm.org/D63725 llvm-svn: 364641	2019-06-28 11:17:38 +00:00
David Green	ff70cbc895	[ARM] MVE patterns for VABS and VNEG This simply adds the required patterns for fp neg and abs. Differential Revision: https://reviews.llvm.org/D63861 llvm-svn: 364640	2019-06-28 10:25:35 +00:00
David Green	eb7080ac6e	[ARM] Widening loads and narrowing stores MVE has instructions to widen as it loads, and narrow as it stores. This adds the required patterns and legalisation to make them work including specifying that they are legal, patterns to select them and test changes. Patch by David Sherwood. Differential Revision: https://reviews.llvm.org/D63839 llvm-svn: 364636	2019-06-28 09:47:55 +00:00
Simon Tatham	29ff1b4f46	[ARM] Fix integer UB in MVE load/store immediate handling. llvm-svn: 364635	2019-06-28 09:28:39 +00:00
David Green	07e53fee14	[ARM] MVE loads and stores This fills in the gaps for basic MVE loads and stores, allowing unaligned access and adding far too many tests. These will become important as narrowing/expanding and pre/post inc are added. Big endian might still not be handled very well, because we have not yet added bitcasts (and I'm not sure how we want it to work yet). I've included the alignment code anyway which maps with our current patterns. We plan to return to that later. Code written by Simon Tatham, with additional tests from Me and Mikhail Maltsev. Differential Revision: https://reviews.llvm.org/D63838 llvm-svn: 364633	2019-06-28 08:41:40 +00:00
Dylan McKay	2bc48f503a	[AVR] Don't look for the TargetFrameLowering in the FrameLowering implementation c.f. r364349 llvm-svn: 364632	2019-06-28 08:35:21 +00:00
David Green	fc4102417b	[ARM] Mark div and rem as expand for MVE We don't have vector operations for these, so they need to be expanded for both integer and float. Differential Revision: https://reviews.llvm.org/D63595 llvm-svn: 364631	2019-06-28 08:18:55 +00:00
David Green	62889b0ea5	[ARM] Select MVE fp add and sub The same as integer arithmetic, we can add simple floating point MVE addition and subtraction patterns. Initial code by David Sherwood Differential Revision: https://reviews.llvm.org/D63257 llvm-svn: 364629	2019-06-28 07:41:09 +00:00
David Green	be05b85db9	[ARM] Select MVE add and sub This adds the first few patterns for MVE code generation, adding simple integer add and sub patterns. Initial code by David Sherwood Differential Revision: https://reviews.llvm.org/D63255 llvm-svn: 364627	2019-06-28 07:21:11 +00:00
David Green	8be372b190	[ARM] MVE vector shuffles This patch adds necessary shuffle vector and buildvector support for ARM MVE. It essentially adds support for VDUP, VREVs and some VMOVs, which are often required by other code (like upcoming patches). This mostly uses the same code from Neon that already generated NEONvdup/NEONvduplane/NEONvrev's. These have been renamed to ARMvdup/etc and moved to ARMInstrInfo as they are common to both architectures. Most of the selection code seems to be applicable to both, but NEON does have some more instructions making some parts specific. Most code originally by David Sherwood. Differential Revision: https://reviews.llvm.org/D63567 llvm-svn: 364626	2019-06-28 07:08:42 +00:00
Craig Topper	cbb88a5169	[X86] Connect the output chain properly when combining vzext_movl+load into vzext_load. llvm-svn: 364625	2019-06-28 06:58:50 +00:00
Craig Topper	e832adea0f	[X86] Remove some duplicate patterns that already exist as part of their instruction definition. NFC llvm-svn: 364623	2019-06-28 05:03:47 +00:00
Zi Xuan Wu	588a170970	[NFC][PowerPC] Move XSQP series instruction apart from XSQPO series in position of td file llvm-svn: 364620	2019-06-28 02:51:03 +00:00
Stanislav Mekhanoshin	07fd88d735	[AMDGPU] Packed thread ids in function call ABI Differential Revision: https://reviews.llvm.org/D63851 llvm-svn: 364619	2019-06-28 01:52:13 +00:00
Kai Luo	c6fe8436e8	[PowerPC][NFC] Use `|=` to update `Simplified` flag llvm-svn: 364617	2019-06-28 01:38:42 +00:00
Matt Arsenault	1178dc3d0b	AMDGPU/GlobalISel: Convert to using Register llvm-svn: 364616	2019-06-28 01:16:46 +00:00
Sanjay Patel	a95ca2b5ff	[x86] prevent crashing from select narrowing with AVX512 llvm-svn: 364585	2019-06-27 20:16:58 +00:00
Jinsong Ji	c627aa2fa9	[PowerPC][NFC] Remove unused (and unsupported) fusion feature bits. FeatureFusion bits were first introduced in https://reviews.llvm.org/rL253724 for add/load integer fusion for P8. The only use of `hasFusion` was https://reviews.llvm.org/rL255319. However, this was removed later in https://reviews.llvm.org/rL280440. So there are now no references to fusion in the code. Leaving it there is misleading and confusing, so remove it for now. We can always add it back if we ever support fusion in the future. llvm-svn: 364581	2019-06-27 19:35:11 +00:00
Wouter van Oortmerssen	bfd3f69480	[WebAssembly] AsmParser: better atomic inst detection Summary: Previously missed atomic.notify. Fixes https://bugs.llvm.org/show_bug.cgi?id=40728 Reviewers: aheejin Subscribers: sbc100, jgravelle-google, sunfish, jfb, llvm-commits, dschuff Tags: #llvm Differential Revision: https://reviews.llvm.org/D63747 llvm-svn: 364576	2019-06-27 18:58:26 +00:00
Wouter van Oortmerssen	6b3f56b65f	[WebAssembly] Fix p2align in assembler. Summary: - Match the syntax output by InstPrinter. - Fix it always emitting 0 for align. Had to work around fact that opcode is not available for GetDefaultP2Align while parsing. - Updated tests that were erroneously happy with a p2align=0 Fixes https://bugs.llvm.org/show_bug.cgi?id=40752 Reviewers: aheejin, sbc100 Subscribers: jgravelle-google, sunfish, jfb, llvm-commits, dschuff Tags: #llvm Differential Revision: https://reviews.llvm.org/D63633 llvm-svn: 364570	2019-06-27 18:11:15 +00:00
Simon Pilgrim	1fd1c60979	[X86] combineX86ShufflesRecursively - merge shuffles with more than 2 inputs We already had the infrastructure for this, but were waiting for the fix for a number of regressions which were handled by the recent shuffle(extract_subvector(),extract_subvector()) -> extract_subvector(shuffle()) shuffle combines llvm-svn: 364569	2019-06-27 17:30:51 +00:00
Nicolai Haehnle	32ef9292be	AMDGPU: Make fixing i1 copies robust against re-ordering Summary: The new test case led to incorrect code. Change-Id: Ief48b227e97aa662dd3535c9bafb27d4a184efca Reviewers: arsenm, david-salinas Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63871 llvm-svn: 364566	2019-06-27 16:56:44 +00:00
Simon Pilgrim	e9a2f4fe2c	Use getConstantOperandAPInt instead of getConstantOperandVal for comparisons. getConstantOperandAPInt avoids any large integer issues - these are unlikely but the fuzzers do like to mess around..... llvm-svn: 364564	2019-06-27 16:46:00 +00:00
Simon Pilgrim	74343eba37	[X86] getTargetVShiftByConstNode - reduce variable scope. NFCI. Fixes cppcheck warning. llvm-svn: 364561	2019-06-27 16:33:44 +00:00
Sam Tebbs	8747c5f482	[ARM] Fix formatting issue in ARMISelLowering.cpp Fix a formatting error in ARMISelLowering.cpp::Expand64BitShift. My test commit after receiving write access. llvm-svn: 364560	2019-06-27 16:28:28 +00:00
Roland Froese	9f7f5858fe	Recommit [PowerPC] Update P9 vector costs for insert/extract element Recommit patch D60160 after regression fix patch D63463. llvm-svn: 364557	2019-06-27 16:20:24 +00:00
Jinsong Ji	157b073fa5	[PowerPC][HTM] Fix disassembling buffer overflow for tabortdc and others This was reported in https://bugs.llvm.org/show_bug.cgi?id=41751: llvm-mc aborted when disassembling tabortdc. This patch tries to clean up TM-related DAGs. * Fixes the problem by removing the explicit output of cr0 and making it an implicit def. * Update the int_ppc_tbegin pattern to accommodate the implicit def of cr0. * Update the TCHECK operand and int_ppc_tcheck accordingly. * Add some builtin tests and disassembly tests. * Remove unused CRRC0/crrc0 Differential Revision: https://reviews.llvm.org/D61935 llvm-svn: 364544	2019-06-27 14:11:31 +00:00
Simon Atanasyan	e9ec0b6f09	[mips] Mark pseudo select instructions by the `hasNoSchedulingInfo` tag llvm-svn: 364540	2019-06-27 13:41:30 +00:00
Simon Atanasyan	7c83f0705a	[mips] Add new items to the list of features unsupported by P5600 llvm-svn: 364539	2019-06-27 13:41:23 +00:00
Djordje Todorovic	71d3869f60	[Backend] Keep call site info valid through the backend Handle call instruction replacements and deletions in order to preserve valid state of the call site info of the MachineFunction. NOTE: If the call site info is enabled for a new target, the assertion from the MachineFunction::DeleteMachineInstr() should help to locate places where the updateCallSiteInfo() should be called in order to preserve valid state of the call site info. ([10/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D61062 llvm-svn: 364536	2019-06-27 13:10:29 +00:00
Simon Tatham	1a3dc8f678	[ARM] Fix bogus assertions in copyPhysReg v8.1-M cases. The code to generate register move instructions in and out of VPR and FPSCR_NZCV had assertions checking that the other register involved was a GPR _pair_, instead of a single GPR as it should have been. Reviewers: miyuki, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63865 llvm-svn: 364534	2019-06-27 12:41:12 +00:00
Simon Tatham	ffb2b347ff	[ARM] Fix handling of zero offsets in LOB instructions. The BF and WLS/WLSTP instructions have various branch-offset fields occupying different positions and lengths in the instruction encoding, and all of them were decoded at disassembly time by the function DecodeBFLabelOffset() which returned SoftFail if the offset was zero. In fact, it's perfectly fine and not even a SoftFail for most of those offset fields to be zero. The only one that can't be zero is the 4-bit field labelled `boff` in the architecture spec, occupying bits {26-23} of the BF instruction family. If that one is zero, the encoding overlaps other instructions (WLS, DLS, LETP, VCTP), so it ought to be a full Fail. Fixed by adding an extra template parameter to DecodeBFLabelOffset which controls whether a zero offset is accepted or rejected. Adjusted existing tests (only in error messages for bad disassemblies); added extra tests to demonstrate zero offsets being accepted in all the right places, and a few demonstrating rejection of zero `boff`. Reviewers: DavidSpickett, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63864 llvm-svn: 364533	2019-06-27 12:41:07 +00:00
Simon Tatham	e5ce56fb95	[ARM] Make coprocessor number restrictions consistent. Different versions of the Arm architecture disallow the use of generic coprocessor instructions like MCR and CDP on different sets of coprocessors. This commit centralises the check of the coprocessor number so that it's consistent between assembly and disassembly, and also updates it for the new restrictions in Arm v8.1-M. New tests added that check all the coprocessor numbers; old tests updated, where they used a number that's now become illegal in the context in question. Reviewers: DavidSpickett, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63863 llvm-svn: 364532	2019-06-27 12:40:55 +00:00
Simon Tatham	02449f9c3c	[ARM] Tighten restrictions on use of SP in v8.1-M CSEL. In the `CSEL Rd,Rm,Rn` instruction family (also including CSINC, CSINV and CSNEG), the architecture lists it as CONSTRAINED UNPREDICTABLE (i.e. SoftFail) to use SP in the Rd or Rm slot, but outright illegal to use it in the Rn slot, not least because some encodings of that form are used by MVE instructions such as UQRSHLL. MC was treating all three slots the same, as SoftFail. So the only reason UQRSHLL was disassembled correctly at all was because the MVE decode table is separate from the Thumb2 one and takes priority; if you turned off MVE, then encodings such as `[0x5f,0xea,0x0d,0x83]` would disassemble as spurious CSELs. Fixed by inventing another version of the `GPRwithZR` register class, which disallows SP completely instead of just SoftFailing it. Reviewers: DavidSpickett, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63862 llvm-svn: 364531	2019-06-27 12:40:40 +00:00
Simon Pilgrim	c5cff5d3d1	[X86] getFauxShuffle - add DemandedElts as a filter This is currently benign but will be used in the future based on the elements referenced by the parent shuffle(s). llvm-svn: 364530	2019-06-27 12:35:52 +00:00
Simon Atanasyan	8c35c43816	[mips] Add GPR_64 predicate to some mov[zn] instructions llvm-svn: 364527	2019-06-27 12:08:17 +00:00
Simon Atanasyan	bf5fc620d9	[mips] Fix indentation and split long lines. NFC llvm-svn: 364526	2019-06-27 12:08:10 +00:00
Simon Atanasyan	3b184cf7e1	[mips] Reformat MSA instruction definitions. NFC llvm-svn: 364525	2019-06-27 12:08:03 +00:00
Simon Pilgrim	90e121fbe6	[X86][AVX] SimplifyDemandedVectorElts - combine PERMPD(x) -> EXTRACTF128(X) If we only use the bottom lane, see if we can simplify this to extract_subvector - which is always at least as quick as PERMPD/PERMQ. llvm-svn: 364518	2019-06-27 11:16:03 +00:00
Djordje Todorovic	7eeeb5947e	[ISEL][X86] Tracking of registers that forward call arguments While lowering calls, collect info about registers that forward arguments into following function frame. We store such info into the MachineFunction of the call. This is used very late when dumping DWARF info about call site parameters. ([9/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D60715 llvm-svn: 364516	2019-06-27 10:51:15 +00:00
Diana Picus	253b53b2ec	[AArch64 GlobalISel] Cleanup CallLowering. NFCI Now that lowerCall and lowerFormalArgs have been refactored, we can simplify splitToValueTypes. Differential Revision: https://reviews.llvm.org/D63552 llvm-svn: 364513	2019-06-27 09:24:30 +00:00
Diana Picus	43fb5ae50c	[GlobalISel] Accept multiple vregs for lowerCall's args Change the interface of CallLowering::lowerCall to accept several virtual registers for each argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660 and lowerFormalArguments in D63549. With this change, we no longer pack the virtual registers generated for aggregates into one big lump before delegating to the target. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. NFCI for AMDGPU, Mips and X86. Differential Revision: https://reviews.llvm.org/D63551 llvm-svn: 364512	2019-06-27 09:18:03 +00:00
Diana Picus	8138996128	[GlobalISel] Accept multiple vregs for lowerCall's result Change the interface of CallLowering::lowerCall to accept several virtual registers for the call result, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660 and lowerFormalArguments in D63549. With this change, we no longer pack the virtual registers generated for aggregates into one big lump before delegating to the target. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. NFCI for AMDGPU, Mips and X86. Differential Revision: https://reviews.llvm.org/D63550 llvm-svn: 364511	2019-06-27 09:15:53 +00:00
Diana Picus	c3dbe23977	[GlobalISel] Accept multiple vregs in lowerFormalArgs Change the interface of CallLowering::lowerFormalArguments to accept several virtual registers for each formal argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660. lowerCall will be refactored in the same way in follow-up patches. With this change, we forward the virtual registers generated for aggregates to CallLowering. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. We also copy the pack/unpackRegs helpers to CallLowering to facilitate this. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was put into a s64 instead of a p0. Added a test-case which illustrates the problem more clearly (it crashes without this patch) and fixed the existing test-case to expect p0. AMDGPU has been updated to unpack into the virtual registers for kernels. I think the other code paths fall back for aggregates, so this should be NFC. Mips doesn't support aggregates yet, so it's also NFC. x86 seems to have code for dealing with aggregates, but I couldn't find the tests for it, so I just added a fallback to DAGISel if we get more than one virtual register for an argument. Differential Revision: https://reviews.llvm.org/D63549 llvm-svn: 364510	2019-06-27 08:54:17 +00:00
Diana Picus	69ce1c1319	[GlobalISel] Allow multiple VRegs in ArgInfo. NFC Allow CallLowering::ArgInfo to contain more than one virtual register. This is useful when passes split aggregates into several virtual registers, but need to also provide information about the original type to the call lowering. Used in follow-up patches. Differential Revision: https://reviews.llvm.org/D63548 llvm-svn: 364509	2019-06-27 08:50:53 +00:00
Jay Foad	8479240b0a	[AMDGPU] Fix +DumpCode to print an entry label for the first function Summary: The +DumpCode attribute is a horrible hack in AMDGPU to embed the disassembly of the generated code into the elf file. It is used by LLPC to implement an extension that allows the application to read back the disassembly of the code. It tries to print an entry label at the start of every function, but that didn't work for the first function in the module because DumpCodeInstEmitter wasn't initialised until EmitFunctionBodyStart which is too late. Change-Id: I790d73ddf4f51fd02ab32529380c7cb7c607c4ee Reviewers: arsenm, tpr, kzhuravl Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63712 llvm-svn: 364508	2019-06-27 08:19:28 +00:00
Mikael Holmen	7b81b61368	Silence gcc warning after r364458 Without the fix gcc 7.4.0 complains with ../lib/Target/X86/X86ISelLowering.cpp: In function 'bool getFauxShuffleMask(llvm::SDValue, llvm::SmallVectorImpl<int>&, llvm::SmallVectorImpl<llvm::SDValue>&, llvm::SelectionDAG&)': ../lib/Target/X86/X86ISelLowering.cpp:6690:36: error: enumeral and non-enumeral type in conditional expression [-Werror=extra] int Idx = (ZeroMask[j] ? SM_SentinelZero : (i + j + Ofs)); ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1plus: all warnings being treated as errors llvm-svn: 364507	2019-06-27 08:16:18 +00:00
Craig Topper	9153501f07	[X86] Remove (vzext_movl (scalar_to_vector (load))) matching code from selectScalarSSELoad. I think this will be turning into vzext_load during DAG combine. llvm-svn: 364499	2019-06-27 05:52:00 +00:00
Craig Topper	9ea5a32251	[X86] Teach selectScalarSSELoad to not narrow volatile loads. llvm-svn: 364498	2019-06-27 05:51:56 +00:00
Kang Zhang	490bc46541	[NFC][PowerPC] Improve the for loop in Early Return Summary: In `PPCEarlyReturn.cpp` ``` for (MachineFunction::iterator I = MF.begin(); I != MF.end();) { MachineBasicBlock &B = *I++; if (processBlock(B)) Changed = true; } ``` The above code can be improved to: ``` for (MachineFunction::iterator I = MF.begin(), E = MF.end(); I != E;) { MachineBasicBlock &B = *I++; Changed |= processBlock(B); } ``` Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D63800 llvm-svn: 364496	2019-06-27 03:39:09 +00:00
Eli Friedman	ab1d73ee32	[ARM] Don't reserve R12 on Thumb1 as an emergency spill slot. The current implementation of ThumbRegisterInfo::saveScavengerRegister is bad for two reasons: one, it's buggy, and two, it blocks using R12 for other optimizations. So this patch gets rid of it, and adds the necessary support for using an ordinary emergency spill slot on Thumb1. (Specifically, I think saveScavengerRegister was broken by r305625, and nobody noticed for two years because the codepath is almost never used. The new code will also probably not be used much, but it now has better tests, and if we fail to emit a necessary emergency spill slot we get a reasonable error message instead of a miscompile.) A rough outline of the changes in the patch: 1. Gets rid of ThumbRegisterInfo::saveScavengerRegister. 2. Modifies ARMFrameLowering::determineCalleeSaves to allocate an emergency spill slot for Thumb1. 3. Implements useFPForScavengingIndex, so the emergency spill slot isn't placed at a negative offset from FP on Thumb1. 4. Modifies the heuristics for allocating an emergency spill slot to support Thumb1. This includes fixing ExtraCSSpill so we don't try to use "lr" as a substitute for allocating an emergency spill slot. 5. Allocates a base pointer in more cases, so the emergency spill slot is always accessible. 6. Modifies ARMFrameLowering::ResolveFrameIndexReference to compute the right offset in the new cases where we're forcing a base pointer. 7. Ensures we never generate a load or store with an offset outside of its frame object. This makes the heuristics more straightforward. 8. Changes Thumb1 prologue and epilogue emission so it never uses register scavenging. Some of the changes to the emergency spill slot heuristics in determineCalleeSaves affect ARM/Thumb2; hopefully, they should allow the compiler to avoid allocating an emergency spill slot in cases where it isn't necessary. The rest of the changes should only affect Thumb1. Differential Revision: https://reviews.llvm.org/D63677 llvm-svn: 364490	2019-06-26 23:46:51 +00:00
Matt Arsenault	c0cad98363	AMDGPU: Assert SPAdj is 0 llvm-svn: 364473	2019-06-26 20:56:18 +00:00
Matt Arsenault	6a87e0fc6a	[AMDGPU] Fix Livereg computation during epilogue insertion The LivePhysRegs calculated in order to find a scratch register in the epilogue code wrongly uses 'LiveIns'. Instead, it should use the 'Liveout' sets. For liveness, it also considers the operands of the terminator (return) instruction, which is the insertion point for the scratch-exec-copy instruction. Patch by Christudasan Devadasan llvm-svn: 364470	2019-06-26 20:35:18 +00:00
Craig Topper	3d12971e1c	[X86] Rework the logic in LowerBuildVectorv16i8 to make better use of any_extend and break false dependencies. Other improvements This patch rewrites the loop iteration to only visit every other element starting with element 0. And we work on the "even" element and "next" element at the same time. The "First" logic has been moved to the bottom of the loop and doesn't run on every element. I believe it could create dangling nodes previously since we didn't check if we were going to use SCALAR_TO_VECTOR for the first insertion. I got rid of the "First" variable and just do a null check on V which should be equivalent. We also no longer use undef as the starting V for vectors with no zeroes to avoid false dependencies. This matches v8i16. I've changed all the extends and OR operations to use MVT::i32 since that's what they'll be promoted to anyway. I've tried to use zero_extend only when necessary and use any_extend otherwise. This resulted in some improvements in tests where we are now able to promote aligned (i32 (extload i8)) to a 32-bit load. Differential Revision: https://reviews.llvm.org/D63702 llvm-svn: 364469	2019-06-26 20:16:19 +00:00
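The pairing scheme described in this commit (handle element `2i` and element `2i+1` together, widen to 32 bits, and OR) can be modelled on plain integers. This sketch assumes little-endian lane order and uses zero extension everywhere, since a scalar model cannot represent the undefined upper bits an any_extend would leave. `packBytePairs` is a hypothetical name, not LLVM code.

```cpp
#include <array>
#include <cstdint>

// Scalar model of building v16i8 as v8i16: visit every other element
// starting at 0, and combine element I (low byte) with element I + 1
// (high byte). Working in 32 bits mirrors the commit's use of MVT::i32
// for the extend and OR operations.
std::array<uint16_t, 8> packBytePairs(const std::array<uint8_t, 16> &Elts) {
  std::array<uint16_t, 8> Words{};
  for (unsigned I = 0; I < 16; I += 2) {
    uint32_t Lo = Elts[I];                                  // extend to i32
    uint32_t Hi = static_cast<uint32_t>(Elts[I + 1]) << 8;  // into high byte
    Words[I / 2] = static_cast<uint16_t>(Lo | Hi);
  }
  return Words;
}
```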
Craig Topper	afa58b6ba1	[X86] Remove isTypePromotionOfi1ZeroUpBits and its helpers. This was trying to optimize concat_vectors with zero of setcc or kand instructions. But I think it produced the same code we produce for a concat_vectors with 0 even if it doesn't come from one of those operations. llvm-svn: 364463	2019-06-26 19:45:48 +00:00
Simon Pilgrim	dfe079ffbf	[X86][SSE] getFauxShuffleMask - handle OR(x,y) where x and y have no overlapping bits Create a per-byte shuffle mask based on the computeKnownBits from each operand - if for each byte we have a known zero (or both) then it can be safely blended. Fixes PR41545 llvm-svn: 364458	2019-06-26 18:21:26 +00:00
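The per-byte reasoning behind treating OR(x, y) as a blend can be sketched with 64-bit known-zero masks standing in for LLVM's known-bits analysis. Everything here is an assumption for illustration: the function name `orAsBlendMask`, the 8-byte limit, and the use of -2 as the "known zero" mask entry. A byte can be taken from one operand only when the other operand's byte is known to be zero; if neither is, the OR cannot be expressed as a shuffle.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Mask entry for a byte known to be zero in both operands (illustrative).
constexpr int kZeroByte = -2;

// Bit I of KnownZeroX/KnownZeroY set means bit I of that operand is known 0.
// Result entry I is I (byte from X), NumBytes + I (byte from Y) or kZeroByte.
// Returns std::nullopt if some byte may be nonzero in both operands.
std::optional<std::vector<int>> orAsBlendMask(uint64_t KnownZeroX,
                                              uint64_t KnownZeroY,
                                              unsigned NumBytes) {
  if (NumBytes > 8)
    return std::nullopt;  // this sketch only handles 64-bit masks
  std::vector<int> Mask;
  for (unsigned I = 0; I < NumBytes; ++I) {
    uint64_t ByteMask = uint64_t(0xFF) << (8 * I);
    bool XZero = (KnownZeroX & ByteMask) == ByteMask;
    bool YZero = (KnownZeroY & ByteMask) == ByteMask;
    if (XZero && YZero)
      Mask.push_back(kZeroByte);
    else if (YZero)
      Mask.push_back(static_cast<int>(I));             // take byte from X
    else if (XZero)
      Mask.push_back(static_cast<int>(NumBytes + I));  // take byte from Y
    else
      return std::nullopt;  // bits may overlap: not a blend
  }
  return Mask;
}
```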
Ryan Taylor	9ab812d475	[AMDGPU] Fix for branch offset hardware workaround Summary: This fixes a hardware bug that makes a branch offset of 0x3f unsafe. This replaces the 32 bit branch with offset 0x3f with a 64 bit instruction that includes the same 32 bit branch followed by the encoding of an s_nop 0. The relaxer then modifies the offsets accordingly. Change-Id: I10b7aed99d651f8159401b01bb421f105fa6288e Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63494 llvm-svn: 364451	2019-06-26 17:34:57 +00:00
Ulrich Weigand	4c86dd9032	Allow matching extend-from-memory with strict FP nodes This implements a small enhancement to https://reviews.llvm.org/D55506 Specifically, while we were able to match strict FP nodes for floating-point extend operations with a register as source, this did not work for operations with memory as source. That is because for regular operations, this is represented as a combined "extload" node (which is a variant of a load SD node); but there is no equivalent using a strict FP operation. However, it turns out that even in the absence of an extload node, we can still just match the operations explicitly, e.g. (strict_fpextend (f32 (load node:$ptr))). This patch implements that method to match the LDEB/LXEB/LXDB SystemZ instructions even when the extend uses a strict-FP node. llvm-svn: 364450	2019-06-26 17:19:12 +00:00
Thomas Lively	7663e0cd7d	[WebAssembly] Omit wrap on i64x2.{shl,shr*} ISel when possible Summary: Since the WebAssembly SIMD shift instructions take i32 operands, we truncate the i64 shift-amount operand of <2 x i64> shifts during ISel. When the i64 operand is sign extended from i32, this CL drops the sign extension instead of adding a wrap instruction. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63615 llvm-svn: 364446	2019-06-26 16:19:59 +00:00
Thomas Lively	a1d97a960e	[WebAssembly] Implement tail calls and unify tablegen call classes Summary: Implements direct and indirect tail calls enabled by the 'tail-call' feature in both DAG ISel and FastISel. Updates existing call tests and adds new tests including a binary encoding test. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62877 llvm-svn: 364445	2019-06-26 16:17:15 +00:00
Simon Pilgrim	435ee9fb1f	[X86][SSE] X86TargetLowering::isCommutativeBinOp - add PMULDQ Allows narrowInsertExtractVectorBinOp to reduce vector size instead of the more restricted SimplifyDemandedVectorEltsForTargetNode llvm-svn: 364434	2019-06-26 14:58:11 +00:00
Simon Pilgrim	6b687bf681	[X86][SSE] X86TargetLowering::isCommutativeBinOp - add PCMPEQ Allows narrowInsertExtractVectorBinOp to reduce vector size llvm-svn: 364432	2019-06-26 14:40:49 +00:00
Simon Pilgrim	b13c6f1a9d	[X86][SSE] X86TargetLowering::isBinOp - add PCMPGT Allows narrowInsertExtractVectorBinOp to reduce vector size llvm-svn: 364431	2019-06-26 14:34:41 +00:00
Simon Pilgrim	24f96a0eee	[X86] shouldScalarizeBinop - never scalarize target opcodes. We have (almost) no target opcodes that have scalar/vector equivalents - for now assume we can't scalarize them (we can add exceptions if we need to). llvm-svn: 364429	2019-06-26 14:21:29 +00:00
Matt Arsenault	5f798f1346	AMDGPU: Fix unused variable llvm-svn: 364426	2019-06-26 13:48:04 +00:00
Matt Arsenault	e0b8443460	AMDGPU: Check MRI for callee saved regs instead of TRI This should be the same, but MRI does allow dynamically changing the CSR set, although this is currently not used. llvm-svn: 364425	2019-06-26 13:39:29 +00:00
Roman Lebedev	13889145f0	[X86][Codegen] X86DAGToDAGISel::matchBitExtract(): consistently capture lambdas by value llvm-svn: 364420	2019-06-26 12:19:52 +00:00
Roman Lebedev	fbb2e40d5c	[X86] X86DAGToDAGISel::matchBitExtract(): pattern c: truncation awareness Summary: The one thing of note here is that the 'bitwidth' constant (32/64) was previously pessimistic. Given `x & (-1 >> (C - z))`, we were taking `C` to be `bitwidth(x)`, but in reality we want `(-1 >> (C - z))` pattern to mean "low z bits must be all-ones". And for that, `C` should be `bitwidth(-1 >> (C - z))`, i.e. of the shift operation itself. Last pattern D does not seem to exhibit any of these truncation issues. Although it has the opposite problem - if we extract low bits (no shift) from i64, and then truncate to i32, then we fail to shrink this 64-bit extraction into 32-bit extraction. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62806 llvm-svn: 364419	2019-06-26 12:19:47 +00:00
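The width issue in pattern c can be checked numerically. In this sketch (hypothetical helper names, host integers rather than DAG nodes), the mask `-1 >> (C - z)` is computed in a 32-bit type, so `C` has to be 32, the width of the shift itself, for the result to mean "low z bits all ones".

```cpp
#include <cstdint>

// Low-bit mask built the "pattern c" way: (-1 >> (C - Z)). The shift is
// performed in a 32-bit type, so C must be that type's width (32), not the
// width of the value being masked. Valid for 1 <= Z <= 32; Z == 0 would
// shift by 32, which is undefined behavior in C++.
uint32_t lowMaskPatternC(unsigned Z) {
  const unsigned C = 32;  // bitwidth of the shift operation itself
  return UINT32_MAX >> (C - Z);
}

// Reference: a mask with the low Z bits set, computed in 64 bits.
uint64_t lowMaskReference(unsigned Z) {
  return Z == 64 ? UINT64_MAX : (uint64_t(1) << Z) - 1;
}
```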
Roman Lebedev	b0ecc1cc6b	[X86] X86DAGToDAGISel::matchBitExtract(): pattern b: truncation awareness Summary: (Not so) boringly identical to pattern a (D62786). Not yet sure how to deal with the last pattern c. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62793 llvm-svn: 364418	2019-06-26 12:19:39 +00:00
Roman Lebedev	8b9a03973a	[X86] X86DAGToDAGISel::matchBitExtract(): pattern a: truncation awareness Summary: Finally tying up loose ends here. The problem is quite simple: If we have pattern `(x >> start) & (1 << nbits) - 1`, and then truncate the result, that truncation will be propagated upwards, into the `and`. And that isn't currently handled. I'm only fixing pattern `a` here; the same fix will be needed for patterns `b`/`c` too. I think this isn't missing any extra legality checks, since we only look past truncations. Similarly, I don't think we can get any other truncation there other than i64->i32. Reviewers: craig.topper, RKSimon, spatel Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62786 llvm-svn: 364417	2019-06-26 12:19:11 +00:00
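Pattern a, and the reason a truncate can be pushed into the `and`, can be illustrated with host integers: truncation distributes over bitwise AND, so truncating the 64-bit extraction gives the same bits as truncating the shifted value and the mask separately. The helpers below are hypothetical and model only the arithmetic, not the instruction matching.

```cpp
#include <cstdint>

// Pattern a on a 64-bit value: (x >> start) & ((1 << nbits) - 1).
// Assumes Start < 64 and NBits < 64 so both shifts are well-defined.
uint64_t extractA64(uint64_t X, unsigned Start, unsigned NBits) {
  return (X >> Start) & ((uint64_t(1) << NBits) - 1);
}

// The same extraction after an i64->i32 truncation has been propagated
// into the `and`: the shifted value and the mask are each truncated.
uint32_t extractA32AfterTrunc(uint64_t X, unsigned Start, unsigned NBits) {
  uint32_t Shifted = static_cast<uint32_t>(X >> Start);
  uint32_t Mask = static_cast<uint32_t>((uint64_t(1) << NBits) - 1);
  return Shifted & Mask;
}
```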
Hans Wennborg	6876de90e8	Fix the build after r364401 It was failing with: /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.cpp:18772:66: error: call of overloaded 'makeArrayRef(<brace-enclosed initializer list>)' is ambiguous scaleShuffleMask<int>(Scale, makeArrayRef<int>({ 0, 2, 1, 3 }), Mask); ^ /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.cpp:18772:66: note: candidates are: In file included from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/CodeGen/MachineFunction.h:20:0, from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/CodeGen/CallingConvLower.h:19, from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.h:17, from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.cpp:14: /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/ADT/ArrayRef.h:480:15: note: llvm::ArrayRef<T> llvm::makeArrayRef(const std::vector<_RealType>&) [with T = int] ArrayRef<T> makeArrayRef(const std::vector<T> &Vec) { ^ /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/ADT/ArrayRef.h:485:37: note: llvm::ArrayRef<T> llvm::makeArrayRef(const llvm::ArrayRef<T>&) [with T = int] template <typename T> ArrayRef<T> makeArrayRef(const ArrayRef<T> &Vec) { ^ llvm-svn: 364414	2019-06-26 11:56:38 +00:00
Simon Pilgrim	c0711af7f9	[X86][AVX] combineExtractSubvector - 'little to big' extract_subvector(bitcast()) support Ideally this needs to be a generic combine in DAGCombiner::visitEXTRACT_SUBVECTOR but there's some nasty regressions in aarch64 due to neon shuffles not handling bitcasts at all..... llvm-svn: 364407	2019-06-26 11:21:09 +00:00
Mikhail Maltsev	6dcbb3161e	[ARM] Handle fixup_arm_pcrel_9 correctly on big-endian targets Summary: The getFixupKindContainerSizeBytes function returns the size of the instruction containing a given fixup. Currently fixup_arm_pcrel_9 is not handled in this function, this causes an assertion failure in the debug build and incorrect codegen in the release build. This patch fixes the problem. Reviewers: ostannard, simon_tatham Reviewed By: ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, pbarrio, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63778 llvm-svn: 364404	2019-06-26 10:48:40 +00:00
Lewis Revill	cf74881329	[RISCV] Add pseudo instruction for calls with explicit register This patch adds the PseudoCALLReg instruction which allows using an explicit register operand as the destination for the return address. GCC can successfully parse this form of the call instruction, which would be used for calls to functions which do not use ra as the return address register, such as the __riscv_save libcalls. This patch forms the first part of an implementation of -msave-restore for RISC-V. Differential Revision: https://reviews.llvm.org/D62685 llvm-svn: 364403	2019-06-26 10:35:58 +00:00
Simon Pilgrim	3845a4f849	[X86][AVX] truncateVectorWithPACK - avoid bitcasted shuffles truncateVectorWithPACK is often used in conjunction with ComputeNumSignBits which struggles when peeking through bitcasts. This fix tries to avoid bitcast(shuffle(bitcast())) patterns in the 256-bit 64-bit sublane shuffles so we can still see through at least until lowering when the shuffles will need to be bitcasted to widen the shuffle type. llvm-svn: 364401	2019-06-26 09:50:11 +00:00
Clement Courbet	be98e0ab78	[ExpandMemCmp] Honor prefer-vector-width. Reviewers: gchatelet, echristo, spatel, atdt Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63769 llvm-svn: 364384	2019-06-26 07:06:49 +00:00
Kai Luo	d6a8bc7a12	[PowerPC] Fixed missing change flag of emitRLDICWhenLoweringJumpTables PPCMIPeephole::emitRLDICWhenLoweringJumpTables should return a bool value to indicate optimization is conducted or not. Differential Revision: https://reviews.llvm.org/D63801 llvm-svn: 364383	2019-06-26 05:25:16 +00:00
Fangrui Song	6a4c68e187	[ARM] Fix -Wimplicit-fallthrough after D60709/r364331 llvm-svn: 364376	2019-06-26 02:34:10 +00:00
Nemanja Ivanovic	8265e8ff36	[PowerPC] Mark FCOPYSIGN legal for FP vectors This was just an omission in the back end. We have had the instructions for both single and double precision for a few HW generations, but never got around to legalizing these. Differential revision: https://reviews.llvm.org/D63634 llvm-svn: 364373	2019-06-26 01:48:57 +00:00
Kai Luo	174b4ff781	[PowerPC][NFC] Move peephole optimization of RLDICR into a method. llvm-svn: 364372	2019-06-26 01:34:37 +00:00
Heejin Ahn	65d8d6357b	[WebAssembly] Remove catch_all from AsmParser Summary: `catch_all` is from the first version of EH proposal and now has been removed. There were no tests covering this, and thus no tests to remove or fix. Reviewers: aardappel Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63737 llvm-svn: 364360	2019-06-25 23:04:12 +00:00
Matt Arsenault	8fcc70f141	Don't look for the TargetFrameLowering in the implementation The same oddity was apparently copy-pasted between multiple targets. llvm-svn: 364349	2019-06-25 20:53:35 +00:00
Diego Novillo	688afeb884	Update phis in AMDGPUUnifyDivergentExitNodes Original patch https://reviews.llvm.org/D63659 from Steven Perron <stevenperron@google.com> The pass AMDGPUUnifyDivergentExitNodes does not update the phi nodes in the successors of blocks that it splits. This is fixed by calling BasicBlock::splitBasicBlock to split the block instead of doing it manually. This does extra work because a new conditional branch is created in BB which is immediately replaced, but I think the simplicity is worth it. It also helps make the code more future proof in case other things need to be updated. llvm-svn: 364342	2019-06-25 18:55:16 +00:00
Stanislav Mekhanoshin	4be636ebb3	[AMDGPU] Removed dead SIMachineFunctionInfo::getWorkItemIDVGPR() Differential Revision: https://reviews.llvm.org/D63780 llvm-svn: 364339	2019-06-25 18:33:53 +00:00
Craig Topper	4577b8c17c	[X86] Remove isel patterns that look for (vzext_movl (scalar_to_vector (load))) I believe these all get canonicalized to vzext_movl. The only case where that wasn't true was when the load was loadi32 and the load was an extload aligned to 32 bits. But that was fixed in r364207. Differential Revision: https://reviews.llvm.org/D63701 llvm-svn: 364337	2019-06-25 17:31:52 +00:00
Craig Topper	14ea14ae85	[X86] Add a DAG combine to turn vzmovl+load into vzload if the load isn't volatile. Remove isel patterns for vzmovl+load We currently have some isel patterns for treating vzmovl+load the same as vzload, but that shrinks the load which we shouldn't do if the load is volatile. Rather than adding isel checks for volatile, this patch removes the patterns and teaches DAG combine to merge them into vzload when it's legal to do so. Differential Revision: https://reviews.llvm.org/D63665 llvm-svn: 364333	2019-06-25 17:08:26 +00:00
Simon Tatham	e8de8ba6a6	[ARM] Support inline assembler constraints for MVE. "To" selects an odd-numbered GPR, and "Te" an even one. There are some 8.1-M instructions that have one too few bits in their register fields and require registers of particular parity, without necessarily using a consecutive even/odd pair. Also, the constraint letter "t" should select an MVE q-register, when MVE is present. This didn't need any source changes, but some extra tests have been added. Reviewers: dmgreen, samparker, SjoerdMeijer Subscribers: javed.absar, eraman, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D60709 llvm-svn: 364331	2019-06-25 16:49:32 +00:00
Ayke van Laethem	88139c143c	[AVR] Adjust to Register class change A refactor in r364191 changed register types from an unsigned int to the llvm::Register class. Adjust the AVR backend to this change. This fixes build errors when building with the experimental AVR backend enabled. Differential Revision: https://reviews.llvm.org/D63776 llvm-svn: 364330	2019-06-25 16:49:22 +00:00
Simon Tatham	a4b415a683	[ARM] Code-generation infrastructure for MVE. This provides the low-level support to start using MVE vector types in LLVM IR, loading and storing them, passing them to __asm__ statements containing hand-written MVE vector instructions, and if you have the hard-float ABI turned on, using them as function parameters. (In the soft-float ABI, vector types are passed in integer registers, and combining all those 32-bit integers into a q-reg requires support for selection DAG nodes like insert_vector_elt and build_vector which aren't implemented yet for MVE. In fact I've also had to add `arm_aapcs_vfpcc` to a couple of existing tests to avoid that problem.) Specifically, this commit adds support for: * spills, reloads and register moves for MVE vector registers * ditto for the VPT predication mask that lives in VPR.P0 * make all the MVE vector types legal in ISel, and provide selection DAG patterns for BITCAST, LOAD and STORE * make loads and stores of scalar FP types conditional on `hasFPRegs()` rather than `hasVFP2Base()`. As a result a few existing tests needed their llc command lines updating to use `-mattr=-fpregs` as their method of turning off all hardware FP support. Reviewers: dmgreen, samparker, SjoerdMeijer Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60708 llvm-svn: 364329	2019-06-25 16:48:46 +00:00
Fangrui Song	96a192ea53	[PPC32] Support PLT calls for -msecure-plt -fpic Summary: In Secure PLT ABI, -fpic is similar to -fPIC. The differences are that: * -fpic stores the address of _GLOBAL_OFFSET_TABLE_ in r30, while -fPIC stores .got2+0x8000. * -fpic uses an addend of 0 for R_PPC_PLTREL24, while -fPIC uses 0x8000. Reviewers: hfinkel, jhibbits, joerg, nemanjai, spetrovic Reviewed By: jhibbits Subscribers: adalava, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63563 llvm-svn: 364324	2019-06-25 15:56:32 +00:00
Sam Parker	bcf0eb7a64	[ARM] Fix for DLS/LE CodeGen The expensive buildbots highlighted the mir tests were broken, which I've now updated and added --verify-machineinstrs to them. This also uncovered a couple of bugs in the backend pass, so these have also been fixed. llvm-svn: 364323	2019-06-25 15:11:17 +00:00
Michael Liao	f0a665afca	[AMDGPU] Null checking on TS to avoid crashing in clang tests. - `test/Misc/backend-resource-limit-diagnostics.cl` crashes as null streamer is used. llvm-svn: 364318	2019-06-25 14:06:34 +00:00
Simon Pilgrim	aae4b68703	[X86] lowerShuffleAsSpecificZeroOrAnyExtend - add ANY_EXTEND TODO. lowerShuffleAsSpecificZeroOrAnyExtend should be able to lower to ANY_EXTEND_VECTOR_INREG as well as ZERO_EXTEND_VECTOR_INREG. llvm-svn: 364313	2019-06-25 13:36:53 +00:00
Fangrui Song	807d2f442a	[ARM] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D60692 llvm-svn: 364312	2019-06-25 13:28:44 +00:00
Matt Arsenault	d7ffa2a948	AMDGPU: Select G_SEXT/G_ZEXT/G_ANYEXT llvm-svn: 364308	2019-06-25 13:18:11 +00:00
Simon Tatham	287f0403e3	[ARM] Fix buildbot failure due to -Werror. Including both 'case ARM_AM::uxtw' and 'default' in the getShiftOp switch caused a buildbot to fail with error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default] llvm-svn: 364300	2019-06-25 12:23:46 +00:00
Sjoerd Meijer	74ec25a197	[ARM] MVE VPT Blocks A minor iteration on the MVE VPT Block pass to enable more efficient VPT Block code generation: consecutive VPT predicated statements, predicated on the same condition, will be placed within the same VPT Block. This essentially is also an exercise to write some more tests for the next step, which should be more generic also merging instructions when they are not consecutive. Differential Revision: https://reviews.llvm.org/D63711 llvm-svn: 364298	2019-06-25 12:04:31 +00:00
Nicolai Haehnle	2710171a15	AMDGPU: Write LDS objects out as global symbols in code generation Summary: The symbols use the processor-specific SHN_AMDGPU_LDS section index introduced with a previous change. The linker is then expected to resolve relocations, which are also emitted. Initially disabled for HSA and PAL environments until they have caught up in terms of linker and runtime loader. Some notes: - The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered to a constant at compile times, which means some tests can no longer be applied. The current "solution" is a terrible hack, but the intrinsic isn't used by Mesa, so we can keep it for now. - We no longer know the full LDS size per kernel at compile time, which means that we can no longer generate a relevant error message at compile time. It would be possible to add a check for the size of individual variables, but ultimately the linker will have to perform the final check. Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275 Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61494 llvm-svn: 364297	2019-06-25 11:52:30 +00:00
Nicolai Haehnle	08e8cb5760	AMDGPU/MC: Add .amdgpu_lds directive Summary: The directive defines a symbol as a group/local memory (LDS) symbol. LDS symbols behave similarly to common symbols for the purposes of ELF, using the processor-specific SHN_AMDGPU_LDS as section index. It is the linker and/or runtime loader's job to "instantiate" LDS symbols and resolve relocations that reference them. It is not possible to initialize LDS memory (not even zero-initialize as for .bss). We want to be able to link together objects -- starting with relocatable objects, but possibly expanding to shared objects in the future -- that access LDS memory in a flexible way. LDS memory is in an address space that is entirely separate from the address space that contains the program image (code and normal data), so having program segments for it doesn't really make sense. Furthermore, we want to be able to compile multiple kernels in a compilation unit which have disjoint use of LDS memory. In that case, we may want to place LDS symbols differently for different kernels to save memory (LDS memory is very limited and physically private to each kernel invocation), so we can't simply place LDS symbols in a .lds section. Hence this solution where LDS symbols always stay undefined. Change-Id: I08cbc37a7c0c32f53f7b6123aa0afc91dbc1748f Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61493 llvm-svn: 364296	2019-06-25 11:51:35 +00:00
Simon Tatham	4cf18c2849	[ARM] Explicit lowering of half <-> double conversions. If an FP_EXTEND or FP_ROUND isel dag node converts directly between f16 and f64 when the target CPU has no instruction to do it in one go, it has to be done in two steps instead, going via f32. Previously, this was done implicitly, because all such CPUs had the storage-only implementation of f16 (i.e. the only thing you can do with one at all is to convert it to/from f32). So isel would legalize the f16 into an f32 as soon as it saw it, by inserting an fp16_to_fp node (or vice versa), and then the fp_extend would already be f32->f64 rather than f16->f64. But that technique can't support a target CPU which has full f16 support but _not_ f64, such as some variants of Arm v8.1-M. So now we provide custom lowering for FP_EXTEND and FP_ROUND, which checks support for f16 and f64 and decides on the best thing to do given the combination of flags it gets back. Reviewers: dmgreen, samparker, SjoerdMeijer Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60692 llvm-svn: 364294	2019-06-25 11:24:50 +00:00
Simon Tatham	86b7a1e660	[ARM] Add remaining miscellaneous MVE instructions. This final batch includes the tail-predicated versions of the low-overhead loop instructions (LETP); the VPSEL instruction to select between two vector registers based on the predicate mask without having to open a VPT block; and VPNOT which complements the predicate mask in place. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62681 llvm-svn: 364292	2019-06-25 11:24:33 +00:00
Simon Tatham	e6824160dd	[ARM] Add MVE vector load/store instructions. This adds the rest of the vector memory access instructions. It includes contiguous loads/stores, with an ordinary addressing mode such as [r0,#offset] (plus writeback variants); gather loads and scatter stores with a scalar base address register and a vector of offsets from it (written [r0,q1] or similar); and gather/scatters with a vector of base addresses (written [q0,#offset], again with writeback). Additionally, some of the loads can widen each loaded value into a larger vector lane, and the corresponding stores narrow them again. To implement these, we also have to add the addressing modes they need. Also, in AsmParser, the `isMem` query function now has subqueries `isGPRMem` and `isMVEMem`, according to which kind of base register is used by a given memory access operand. I've also had to add an extra check in `checkTargetMatchPredicate` in the AsmParser, without which our last-minute check of `rGPR` register operands against SP and PC was failing an assertion because Tablegen had inserted an immediate 0 in place of one of a pair of tied register operands. (This matches the way the corresponding check for `MCK_rGPR` in `validateTargetOperandClass` is guarded.) Apparently the MVE load instructions were the first to have ever triggered this assertion, but I think only because they were the first to have a combination of the usual Arm pre/post writeback system and the `rGPR` class in particular. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62680 llvm-svn: 364291	2019-06-25 11:24:18 +00:00
Nemanja Ivanovic	47b7d13459	[PowerPC] Emit XXSEL for vec_sel and code that has the same pattern As pointed out in https://bugs.llvm.org/show_bug.cgi?id=41777, we do not emit a vector select even when the code pretty much asks for one. This patch changes that. Differential revision: https://reviews.llvm.org/D61658 llvm-svn: 364289	2019-06-25 10:46:13 +00:00
Sam Parker	a6fd919cb3	[ARM] DLS/LE low-overhead loop code generation Introduce three pseudo instructions to be used during DAG ISel to represent v8.1-m low-overhead loops. One maps to set_loop_iterations while loop_decrement_reg is lowered to two, so that we can separate the decrement and branching operations. The pseudo instructions are expanded pre-emission, where we can still decide whether we actually want to generate a low-overhead loop, in a new pass: ARMLowOverheadLoops. The pass currently bails, reverting to a sub, icmp and br, in the cases where a call or stack spill/restore happens between the decrement and branching instructions, or if the loop is too large. Differential Revision: https://reviews.llvm.org/D63476 llvm-svn: 364288	2019-06-25 10:45:51 +00:00
Clement Courbet	3bc5ad551a	[ExpandMemCmp] Move all options to TargetTransformInfo. Split off from D60318. llvm-svn: 364281	2019-06-25 08:04:13 +00:00
Matt Arsenault	25bc27965a	AMDGPU/GlobalISel: Fix regbankselect for amdgcn.class llvm-svn: 364262	2019-06-25 01:07:22 +00:00
Matt Arsenault	dbb6c03175	AMDGPU/GlobalISel: Select G_TRUNC llvm-svn: 364215	2019-06-24 18:02:18 +00:00
Matt Arsenault	14d0b646b7	AMDGPU/GlobalISel: RegBankSelect for amdgcn.class llvm-svn: 364214	2019-06-24 18:00:47 +00:00
Matt Arsenault	8fcd5ade3e	AMDGPU/GlobalISel: Split VALU s64 G_ZEXT/G_SEXT in RegBankSelect Scalar extends to s64 can use S_BFE_{I64|U64}, but vector extends need to extend to the 32-bit half, and then to 64. I'm not sure what the line should be between what RegBankSelect handles, and what instruction select does, but for now I'm erring on the side of RegBankSelect for future post-RBS combines. llvm-svn: 364212	2019-06-24 17:54:12 +00:00
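The two-step vector extend (first to the 32-bit low half, then to 64 bits) can be modelled in plain C++. This is an illustration under assumptions: the function names are hypothetical, the source is given as the low `SrcBits` bits (1 to 32) of a `uint32_t`, and it computes values only; it does not reflect how RegBankSelect emits instructions.

```cpp
#include <cstdint>

// Step 1: extend the source to the 32-bit low half.
// Step 2: derive the high half from the low half.
uint64_t sext64ViaHalves(uint32_t Src, unsigned SrcBits) {
  uint32_t SignBit = uint32_t(1) << (SrcBits - 1);
  uint32_t Masked = SrcBits == 32 ? Src : Src & ((uint32_t(1) << SrcBits) - 1);
  uint32_t Lo = (Masked ^ SignBit) - SignBit;          // sign-extend to 32 bits
  uint32_t Hi = (Lo & 0x80000000u) ? 0xFFFFFFFFu : 0u; // replicate the sign bit
  return (uint64_t(Hi) << 32) | Lo;
}

uint64_t zext64ViaHalves(uint32_t Src, unsigned SrcBits) {
  uint32_t Lo = SrcBits == 32 ? Src : Src & ((uint32_t(1) << SrcBits) - 1);
  return uint64_t(Lo);  // the high half is simply zero
}
```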
Tim Renouf	d2fdb956e0	[AMDGPU] Allow any value in unused src0 field in v_nop Summary: The LLVM disassembler assumes that the unused src0 operand of v_nop is zero. Other tools can put another value in that field, which is still valid. This commit fixes the LLVM disassembler to recognize such an encoding as v_nop, in the same way as we already do for s_getpc. Differential Revision: https://reviews.llvm.org/D63724 Change-Id: Iaf0363eae26ff92fc4ebc716216476adbff37a6f llvm-svn: 364208	2019-06-24 17:35:20 +00:00
Craig Topper	7fccb2ac5e	[X86] Don't emit a vzext_movl in LowerBuildVectorv16i8/LowerBuildVectorv8i16 if there are no zeroes in the vector we're building. In LowerBuildVectorv16i8 we took care to use an any_extend if the first pair is in the lower 16-bits of the vector and no elements are 0. So bits [31:16] will be undefined. But we still emitted a vzext_movl to ensure that bits [127:32] are 0. If we don't need any zeroes we should be consistent and make all of [127:16] undefined. In LowerBuildVectorv8i16 we can just delete the vzext_movl code because we only use the scalar_to_vector when there are no zeroes. So the vzext_movl is always unnecessary. Found while investigating whether (vzext_movl (scalar_to_vector (loadi32))) patterns are necessary. At least one of the cases where they were necessary was where the loadi32 matched 32-bit aligned 16-bit extload. Seemed weird that we required vzext_movl for that case. Differential Revision: https://reviews.llvm.org/D63700 llvm-svn: 364207	2019-06-24 17:28:41 +00:00
Craig Topper	033774e144	[X86] Cleanups and safety checks around the isFNEG This patch does a few things to start cleaning up the isFNEG function. -Remove the Op0/Op1 peekThroughBitcast calls that seem unnecessary. getTargetConstantBitsFromNode has its own peekThroughBitcast inside. And we have a separate peekThroughBitcast on the return value. -Add a check of the scalar size after the first peekThroughBitcast to ensure we haven't changed the element size and just did something like f32->i32 or f64->i64. -Remove an unnecessary check that Op1's type is floating point after the peekThroughBitcast. We're just going to look for a bit pattern from a constant. We don't care about its type. -Add VT checks in several places that consume the return value of isFNEG. Due to the peekThroughBitcasts inside, the type of the return value isn't guaranteed. So it's not safe to use it to build other nodes without ensuring the type matches the type being used to build the node. We might be able to replace these checks with bitcasts instead, but I don't have a test case so a bail out check seemed better for now. Differential Revision: https://reviews.llvm.org/D63683 llvm-svn: 364206	2019-06-24 17:28:26 +00:00
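On x86 an fneg is typically represented as an XOR with a sign-bit constant, which is why `isFNEG` inspects a constant's bit pattern, and why the element size matters once bitcasts are peeked through. The sketch below is a hypothetical check, not the X86 code: the same bits `0x80000000` form a sign mask for a 32-bit element but not for a 64-bit one.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: is Bits exactly the sign mask (only the top bit set)
// of an EltBits-wide floating-point element?
bool isSignMaskPattern(uint64_t Bits, unsigned EltBits) {
  if (EltBits == 0 || EltBits > 64)
    return false;
  return Bits == (uint64_t(1) << (EltBits - 1));
}

// XOR-ing the sign bit negates an IEEE-754 single-precision value.
float negateViaXor(float F) {
  uint32_t U;
  std::memcpy(&U, &F, sizeof U);
  U ^= 0x80000000u;
  std::memcpy(&F, &U, sizeof F);
  return F;
}
```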
Matt Arsenault	f8a841b88e	AMDGPU/GlobalISel: Fix selecting G_IMPLICIT_DEF for s1 Try to fail for scc, since I don't think that should ever be produced. llvm-svn: 364199	2019-06-24 16:24:03 +00:00
Matt Arsenault	ae171f1e9f	Hexagon: Rename another copy of Register class For some reason clang is happy with the conflict, but MSVC is not. llvm-svn: 364196	2019-06-24 16:16:19 +00:00
Matt Arsenault	f8f1ace5bb	ARC: Fix -Wimplicit-fallthrough llvm-svn: 364195	2019-06-24 16:16:16 +00:00
Matt Arsenault	faeaedf8e9	GlobalISel: Remove unsigned variant of SrcOp Force using Register. One downside is the generated register enums require explicit conversion. llvm-svn: 364194	2019-06-24 16:16:12 +00:00
Matt Arsenault	e3a676e9ad	CodeGen: Introduce a class for registers Avoids using a plain unsigned for registers throughout codegen. Doesn't attempt to change every register use, just something a little more than the set needed to build after changing the return type of MachineOperand::getReg(). llvm-svn: 364191	2019-06-24 15:50:29 +00:00
Bjorn Pettersson	3260ef16bb	[AMDGPU] Remove unused variable AllSGPRSpilledToVGPRs. NFC Summary: Removing the unused variable AllSGPRSpilledToVGPRs in SIFrameLowering::processFunctionBeforeFrameFinalized to avoid error: variable 'AllSGPRSpilledToVGPRs' set but not used [-Werror=unused-but-set-variable] Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63721 llvm-svn: 364190	2019-06-24 15:50:18 +00:00
Matt Arsenault	2bc35b7938	Hexagon: Rename Register class This avoids a naming conflict in a future patch. llvm-svn: 364188	2019-06-24 15:27:29 +00:00
Matt Arsenault	5dbd9228c4	AMDGPU/GlobalISel: Fix RegBankSelect for s1 sext/zext/anyext This needs different handling if the source is known to be a valid condition or not. Handle turning it into shifts or a select during regbankselect. llvm-svn: 364186	2019-06-24 14:53:58 +00:00
Matt Arsenault	60957cb74c	AMDGPU: Fold frame index into MUBUF This matters for byval uses outside of the entry block, which appear as copies. Previously, the only folding done was during selection, which could not see the underlying frame index. For any uses outside the entry block, the frame index was materialized in the entry block relative to the global scratch wave offset. This may produce worse code in cases where the offset ends up not fitting in the MUBUF offset field. A better heuristic would be helpful for extreme frames. llvm-svn: 364185	2019-06-24 14:53:56 +00:00
Matt Arsenault	942404d01b	AMDGPU: Cleanup checking when spills need emergency slots Address fixme, which should no longer be a problem since r363757. llvm-svn: 364182	2019-06-24 14:34:40 +00:00
Simon Tatham	fe8017621e	[ARM] Add MVE interleaving load/store family. This adds the family of loads and stores with names like VLD20.8 and VST42.32, which load and store parts of multiple q-registers in such a way that executing both VLD20 and VLD21, or all four of VLD40..VLD43, will distribute 2 or 4 vectors' worth of memory data across the lanes of the same number of registers but in a transposed order. In addition to the Tablegen descriptions of the instructions themselves, this patch also adds encode and decode support for the QQPR and QQQQPR register classes (representing the range of loaded or stored vector registers), and tweaks to the parsing system for lists of vector registers to make it return the right format in this case (since, unlike NEON, MVE regards q-registers as primitive, and not just an alias for two d-registers). llvm-svn: 364172	2019-06-24 10:00:39 +00:00
Craig Topper	e8da65c698	[X86] Turn v16i16->v16i8 truncate+store into an any_extend+truncstore if we have avx512f, but not avx512bw. Ideally we'd be able to represent this truncate as an any_extend to v16i32 and a truncate, but SelectionDAG doesn't know how to not fold those together. We have isel patterns to use a vpmovzxwd+vpmovdb for the truncate, but we aren't able to simultaneously fold the load and the store from the isel pattern. By pulling the truncate into the store we can successfully hide it from the DAG combiner. Then we can isel pattern match the truncstore and load+any_extend separately. llvm-svn: 364163	2019-06-23 23:51:21 +00:00
Craig Topper	c8d94e7889	[X86] Fix isel pattern that was looking for a bitcasted load. Remove what appears to be a copy/paste mistake. DAG combine should ensure bitcasts of loads don't exist. Also remove 3 patterns that are identical to the block above them. llvm-svn: 364158	2019-06-23 19:17:50 +00:00
Craig Topper	cadd826d0a	[X86][SelectionDAG] Cleanup and simplify masked_load/masked_store in tablegen. Use more precise PatFrags for scalar masked load/store. Rename masked_load/masked_store to masked_ld/masked_st to discourage their direct use. We need to check truncating/extending and compressing/expanding before using them. This revealed that our scalar masked load/store patterns were misusing these. With those out of the way, renamed masked_load_unaligned and masked_store_unaligned to remove the "_unaligned". We didn't check the alignment anyway so the name was somewhat misleading. Make the aligned versions inherit from masked_load/store instead of from a separate identical version. Merge the 3 different alignment PatFrags into a single version that uses the VT from the SDNode to determine the size that the alignment needs to match. llvm-svn: 364150	2019-06-23 06:06:04 +00:00
Simon Pilgrim	a962c1bc0f	[X86][SSE] Fold extract_subvector(vselect(x,y,z),0) -> vselect(extract_subvector(x,0),extract_subvector(y,0),extract_subvector(z,0)) llvm-svn: 364136	2019-06-22 17:57:01 +00:00
Hubert Tong	6f3222ed94	[NFC] Fix indentation in PPCAsmPrinter.cpp After r248261, the indentation switches, inside a namespace definition, between indenting and not indenting one level in for that namespace; the abomination occurs in the middle of a class definition. Fix that. llvm-svn: 364133	2019-06-22 16:03:29 +00:00
Hubert Tong	d801cb1f54	[PowerPC][NFC] Move comment to the relevant function A comment that applies to a virtual destructor was placed on a class constructor. Move the comment to where it belongs. llvm-svn: 364132	2019-06-22 16:02:02 +00:00
Peter Collingbourne	8cd780b432	AArch64: Add support for reading pc using llvm.read_register. This is useful for allowing code to efficiently take an address that can be later mapped onto debug info. Currently the hwasan pass achieves this by taking the address of the current function: http://llvm-cs.pcc.me.uk/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp#921 but this costs two instructions (plus a GOT entry in PIC code) per function with stack variables. This will allow the cost to be reduced to a single instruction. Differential Revision: https://reviews.llvm.org/D63471 llvm-svn: 364126	2019-06-22 03:03:25 +00:00
Peter Collingbourne	4608868d2f	AArch64: Prefer FP-relative debug locations in HWASANified functions. To help produce better diagnostics for stack use-after-return, we'd like to be able to determine the addresses of each HWASANified function's local variables given a small amount of information recorded on entry to the function. Currently we require all HWASANified functions to use frame pointers and record (PC, FP) on function entry. This works better than recording SP because FP cannot change during the function, unlike SP which can change e.g. due to dynamic alloca. However, most variables currently end up using SP-relative locations in their debug info. This prevents us from recomputing the address of most variables because the distance between SP and FP isn't recorded in the debug info. To address this, make the AArch64 backend prefer FP-relative debug locations when producing debug info for HWASANified functions. Differential Revision: https://reviews.llvm.org/D63300 llvm-svn: 364117	2019-06-22 00:06:51 +00:00
Tom Tan	7ecb5145ba	[COFF, ARM64] Fix encoding of debugtrap for Windows On Windows ARM64, intrinsic __debugbreak is compiled into brk #0xF000 which is mapped to llvm.debugtrap in Clang. Instruction brk #0xF000 is the defined break point instruction on ARM64 which is recognized by Windows debugger and exception handling code, so llvm.debugtrap should map to it instead of redirecting to llvm.trap (brk #1) as the default implementation. Differential Revision: https://reviews.llvm.org/D63635 llvm-svn: 364115	2019-06-21 23:38:05 +00:00
Matt Arsenault	22e3dc60a0	AMDGPU: Fix not using s33 for scratch wave offset in kernels Fixes missing piece from r363990. llvm-svn: 364099	2019-06-21 20:04:02 +00:00
Craig Topper	4649a051bf	[X86] Add DAG combine to turn (vzmovl (insert_subvector undef, X, 0)) into (insert_subvector allzeros, (vzmovl X), 0) 128/256 bit scalar_to_vectors are canonicalized to (insert_subvector undef, (scalar_to_vector), 0). We have isel patterns that try to match this pattern being used by a vzmovl to use a 128-bit instruction and a subreg_to_reg. This patch detects the insert_subvector undef portion of this and pulls it through the vzmovl, creating a narrower vzmovl and an insert_subvector allzeros. We can then match the insert_subvector into a subreg_to_reg operation by itself. Then we can fall back on existing (vzmovl (scalar_to_vector)) patterns. Note, while the scalar_to_vector case is the motivating case I didn't restrict to just that case. I'm also wondering about shrinking any 256/512 vzmovl to an extract_subvector+vzmovl+insert_subvector(allzeros) but I fear that would have bad implications for shuffle combining. I also think there is more canonicalization we can do with vzmovl with loads or scalar_to_vector with loads to create vzload. Differential Revision: https://reviews.llvm.org/D63512 llvm-svn: 364095	2019-06-21 19:10:21 +00:00
Craig Topper	4569cdbcf5	[X86] Don't mark v64i8/v32i16 ISD::SELECT as custom unless they are legal types. We don't have any Custom handling during type legalization. Only operation legalization. Fixes PR42355 llvm-svn: 364093	2019-06-21 18:50:00 +00:00
Craig Topper	ce6c06dfdd	[X86] Add a debug print of the node in the default case for unhandled opcodes in ReplaceNodeResults. This should be unreachable, but bugs can make it reachable. This adds a debug print so we can see the bad node in the output when the llvm_unreachable triggers. llvm-svn: 364091	2019-06-21 18:49:21 +00:00
Simon Pilgrim	5dba4ed208	[X86][AVX] Combine INSERT_SUBVECTOR(SRC0, EXTRACT_SUBVECTOR(SRC1)) as shuffle Subvector shuffling often ends up as insert/extract subvector. llvm-svn: 364090	2019-06-21 18:35:04 +00:00
Amara Emerson	6e71b34fe6	[AArch64][GlobalISel] Implement selection support for the new G_JUMP_TABLE and G_BRJT ops. With this we can now fully code generate jump tables, which is important for code size. Differential Revision: https://reviews.llvm.org/D63223 llvm-svn: 364086	2019-06-21 18:10:41 +00:00
Craig Topper	6af1be9664	[X86] Use vmovq for v4i64/v4f64/v8i64/v8f64 vzmovl. We already use vmovq for v2i64/v2f64 vzmovl. But we were using a blendpd+xorpd for v4i64/v4f64/v8i64/v8f64 under opt speed. Or movsd+xorpd under optsize. I think the blend with 0 or movss/d is only needed for vXi32 where we don't have an instruction that can move 32 bits from one xmm to another while zeroing upper bits. movq is no worse than blendpd on any known CPUs. llvm-svn: 364079	2019-06-21 17:24:21 +00:00
Amara Emerson	8f25a021dd	[AArch64][GlobalISel] Make s8 and s16 G_CONSTANTs legal. We sometimes get poor code size because constants of types < 32b are legalized as 32 bit G_CONSTANTs with a truncate to fit. This works but means that the localizer can no longer sink them (although it's possible to extend it to do so). On AArch64 however s8 and s16 constants can be selected in the same way as s32 constants, with a mov pseudo into a W register. If we make s8 and s16 constants legal then we can avoid unnecessary truncates, they can be CSE'd, and the localizer can sink them as normal. There is a caveat: if the user of a smaller constant has to widen the sources, we end up with an anyext of the smaller typed G_CONSTANT. This can cause regressions because of the additional extend and missed pattern matching. To remedy this, there's a new artifact combiner to generate the wider G_CONSTANT if it's legal for the target. Differential Revision: https://reviews.llvm.org/D63587 llvm-svn: 364075	2019-06-21 16:43:50 +00:00
Stanislav Mekhanoshin	bdf7f81b89	[AMDGPU] hazard recognizer for fp atomic to s_denorm_mode This requires 3 wait states unless there is a wait or VALU in between. Differential Revision: https://reviews.llvm.org/D63619 llvm-svn: 364074	2019-06-21 16:30:14 +00:00
Simon Pilgrim	96e77ce626	[X86] isBinOp - move commutative ops to isCommutativeBinOp. NFCI. TargetLoweringBase::isBinOp checks isCommutativeBinOp as a fallback, so don't duplicate. llvm-svn: 364072	2019-06-21 16:23:28 +00:00
Simon Pilgrim	bdea88325f	Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI. llvm-svn: 364068	2019-06-21 16:11:18 +00:00
Sam Elliott	96c8bc7956	[RISCV] Add RISCV-specific TargetTransformInfo Summary: LLVM allows Targets to provide information that guides optimisations made to LLVM IR. This is done with callbacks on a TargetTransformInfo object. This patch adds a TargetTransformInfo class for RISC-V. This will allow us to implement RISC-V specific callbacks as they become necessary. This commit also adds the getIntImmCost callbacks, and tests them with a simple constant hoisting test. Our immediate costs are on the conservative side, for the moment, but we prevent hoisting in most circumstances anyway. Previous review was on D63007 Reviewers: asb, luismarques Reviewed By: asb Subscribers: ributzka, MaskRay, llvm-commits, Jim, benna, psnobl, jocewei, PkmX, rkruppe, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, jrtc27, shiva0217, kito-cheng, niosHD, sabuasal, apazos, simoncook, johnrusso, rbar, hiraditya, mgorny Tags: #llvm Differential Revision: https://reviews.llvm.org/D63433 llvm-svn: 364046	2019-06-21 13:36:09 +00:00
Simon Tatham	0c7af66450	[ARM] Add MVE 64-bit GPR <-> vector move instructions. These instructions let you load half a vector register at once from two general-purpose registers, or vice versa. The assembly syntax for these instructions mentions the vector register name twice. For the move _into_ a vector register, the MC operand list also has to mention the register name twice (once as the output, and once as an input to represent where the unchanged half of the output register comes from). So we can conveniently assign one of the two asm operands to be the output $Qd, and the other $QdSrc, which avoids confusing the auto-generated AsmMatcher too much. For the move _from_ a vector register, there's no way to get round the fact that both instances of that register name have to be inputs, so we need a custom AsmMatchConverter to avoid generating two separate output MC operands. (And even that wouldn't have worked if it hadn't been for D60695.) Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62679 llvm-svn: 364041	2019-06-21 13:17:23 +00:00
Simon Tatham	bafb105e96	[ARM] Add MVE vector instructions that take a scalar input. This adds the `MVE_qDest_rSrc` superclass and all its instances, plus a few other instructions that also take a scalar input register or two. I've also belatedly added custom diagnostic messages to the operand classes for odd- and even-numbered GPRs, which required matching changes in two of the existing MVE assembly test files. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62678 llvm-svn: 364040	2019-06-21 13:17:08 +00:00
Simon Pilgrim	36a999ffb8	[X86] X86ISD::ANDNP is a (non-commutative) binop The sat add/sub tests still have unnecessary extract_subvector((vandnps ymm, ymm), 0) uses that should be split to (vandnps (extract_subvector(ymm, 0), extract_subvector(ymm, 0)), but it's getting better. llvm-svn: 364038	2019-06-21 12:42:39 +00:00
Simon Tatham	a6b6a15701	[ARM] Add a batch of similarly encoded MVE instructions. Summary: This adds the `MVE_qDest_qSrc` superclass and all instructions that inherit from it. It's not the complete class of _everything_ with a q-register as both destination and source; it's a subset of them that all have similar encodings (but it would have been hopelessly unwieldy to call it anything like MVE_111x11100). This category includes add/sub with carry; long multiplies; halving multiplies; multiply and accumulate, and some more complex instructions. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62677 llvm-svn: 364037	2019-06-21 12:13:59 +00:00
Simon Pilgrim	9184b009cf	[X86] createMMXBuildVector - call with BuildVectorSDNode directly. NFCI. llvm-svn: 364030	2019-06-21 11:25:06 +00:00
Fangrui Song	d5cf95e41c	[ARM] Fix -Wimplicit-fallthrough after D62675 llvm-svn: 364028	2019-06-21 11:19:11 +00:00
Simon Tatham	7d76f8acf0	[ARM] Add MVE vector compare instructions. Summary: These take a pair of vector registers to compare, and a comparison type (written in the form of an Arm condition suffix); they output a vector of booleans in the VPR register, where predication can conveniently use them. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62676 llvm-svn: 364027	2019-06-21 11:14:51 +00:00
Simon Pilgrim	c26b8f2afc	[X86] combineAndnp - use isNOT instead of manually checking for (XOR x, -1) llvm-svn: 364026	2019-06-21 11:13:15 +00:00
Simon Pilgrim	b5733581c4	[X86] foldVectorXorShiftIntoCmp - use isConstOrConstSplat. NFCI. Use the isConstOrConstSplat helper instead of inspecting the build vector manually. llvm-svn: 364024	2019-06-21 10:54:30 +00:00
Simon Pilgrim	771c33e375	[X86][AVX] isNOT - handle concat_vectors(xor X, -1, xor Y, -1) pattern llvm-svn: 364022	2019-06-21 10:44:15 +00:00
Simon Tatham	c9b2cd4674	[ARM] Add a batch of MVE floating-point instructions. Summary: This includes floating-point basic arithmetic (add/sub/multiply), complex add/multiply, unary negation and absolute value, rounding to integer value, and conversion to/from integer formats. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62675 llvm-svn: 364013	2019-06-21 09:35:07 +00:00
Fangrui Song	dc8de6037c	Simplify std::lower_bound with llvm::{bsearch,lower_bound}. NFC llvm-svn: 364006	2019-06-21 05:40:31 +00:00
Fangrui Song	ddd056c984	[MIPS GlobalISel] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D63541 llvm-svn: 364003	2019-06-21 01:51:50 +00:00
Matt Arsenault	d88db6d7fc	AMDGPU: Always use s33 for global scratch wave offset Every called function could possibly need this to calculate the absolute address of stack objects, and this avoids inserting a copy around every call site in the kernel. It's also somewhat cleaner to keep this in a callee saved SGPR. llvm-svn: 363990	2019-06-20 21:58:24 +00:00
Eli Friedman	25f08a17c3	[ARM GlobalISel] Add support for s64 G_ADD and G_SUB. Teach RegisterBankInfo to use the correct register class, and tell the legalizer it's legal. Everything else just works. The one thing that's slightly weird about this compared to SelectionDAG isel is that legalization can't distinguish between i64 and <1 x i64>, so we might end up with more NEON instructions than the user expects. Differential Revision: https://reviews.llvm.org/D63585 llvm-svn: 363989	2019-06-20 21:56:47 +00:00
Jinsong Ji	8b1abe568e	[PowerPC][NFC] Fix comments for AltVSXFMARel mapping. llvm-svn: 363987	2019-06-20 21:36:06 +00:00
Matt Arsenault	740322f1eb	AMDGPU: Add intrinsics for DS GWS semaphore instructions llvm-svn: 363983	2019-06-20 21:11:42 +00:00
Matt Arsenault	8ad1decf45	AMDGPU: Insert mem_viol check loop around GWS pre-GFX9 It is necessary to emit this loop around GWS operations in case the wave is preempted pre-GFX9. llvm-svn: 363979	2019-06-20 20:54:32 +00:00
Craig Topper	9e1665f2d6	[X86] Add BLSI to isUseDefConvertible. Summary: BLSI sets the C flag if the input is not zero. So if it's followed by a TEST of the input where only the Z flag is consumed, we can replace it with the opposite check of the C flag. We should be able to do the same for BLSMSK and BLSR, but the naive test case for those is being optimized to a subo by CodeGenPrepare. Reviewers: spatel, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63589 llvm-svn: 363957	2019-06-20 17:52:53 +00:00
Matt Arsenault	5dc457cbe4	AMDGPU: Fix ignoring DisableFramePointerElim in leaf functions The attribute can specify elimination for leaf or non-leaf, so it should always be considered. I copied this bug from AArch64, which probably should also be fixed. llvm-svn: 363949	2019-06-20 17:03:23 +00:00
Matt Arsenault	b7f87c0ecf	AMDGPU: Treat undef as an inline immediate This should only matter in vectors with an undef component, since a full undef vector would have been folded out. llvm-svn: 363941	2019-06-20 16:01:09 +00:00
Simon Tatham	232db11020	[ARM] Add a batch of MVE integer instructions. This includes integer arithmetic of various kinds (add/sub/multiply, saturating and not), and the immediate forms of VMOV and VMVN that load an immediate into all lanes of a vector. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62674 llvm-svn: 363936	2019-06-20 15:16:56 +00:00
Stanislav Mekhanoshin	0846c125f9	[AMDGPU] gfx1010 core wave32 changes Differential Revision: https://reviews.llvm.org/D63204 llvm-svn: 363934	2019-06-20 15:08:34 +00:00
Simon Pilgrim	a4d705e0ef	[X86] LowerAVXExtend - handle ANY_EXTEND_VECTOR_INREG lowering as well. llvm-svn: 363922	2019-06-20 11:31:54 +00:00
Petar Avramovic	153bd24eda	[MIPS GlobalISel] Select integer to floating point conversions Select G_SITOFP and G_UITOFP for MIPS32. Differential Revision: https://reviews.llvm.org/D63542 llvm-svn: 363912	2019-06-20 09:05:02 +00:00
Petar Avramovic	4b4dae1c76	[MIPS GlobalISel] Select floating point to integer conversions Select G_FPTOSI and G_FPTOUI for MIPS32. Differential Revision: https://reviews.llvm.org/D63541 llvm-svn: 363911	2019-06-20 08:52:53 +00:00
Craig Topper	b4ea64570c	[X86] Remove memory instructions from isUseDefConvertible. The caller of this is looking for comparisons of the input to these instructions with 0. But the memory instructions' input is an address, not a value input in a register. llvm-svn: 363907	2019-06-20 04:58:40 +00:00
Craig Topper	451f7feb64	[X86] Add v64i8/v32i16 to several places in X86CallingConv.td where they seemed obviously missing. llvm-svn: 363906	2019-06-20 04:29:00 +00:00
Matt Arsenault	c67c484f36	AMDGPU: Don't clobber VCC in MUBUF addr64 emulation Introducing VCC defs during SIFixSGPRCopies is generally problematic. Avoid it by starting with the VOP3 form with the general condition register. This is the easiest to fix instance, but doesn't solve any specific problems I'm looking at. llvm-svn: 363904	2019-06-20 00:51:28 +00:00
Eli Friedman	d88e28d13e	[llvm-objdump] Switch between ARM/Thumb based on mapping symbols. The ARMDisassembler changes allow changing between ARM and Thumb mode based on the MCSubtargetInfo, rather than the Target, which simplifies the other changes a bit. I'm not really happy with adding more target-specific logic to tools/llvm-objdump/, but there isn't any easy way around it: the logic in question specifically applies to disassembling an object file, and that code simply isn't located in lib/Target, at least at the moment. Differential Revision: https://reviews.llvm.org/D60927 llvm-svn: 363903	2019-06-20 00:29:40 +00:00
Matt Arsenault	e4c2e9b016	AMDGPU: Consolidate some getGeneration checks This is incomplete, and ideally these would all be removed, but it's better to localize them to the subtarget first with comments about what they're for. llvm-svn: 363902	2019-06-19 23:54:58 +00:00
Matt Arsenault	e24b34e9c9	AMDGPU: Undo sub x, c canonicalization for v2i16 Should avoid regression from D62341 llvm-svn: 363899	2019-06-19 23:37:43 +00:00
Simon Atanasyan	f61c43c636	[mips] Mark the `lwupc` instruction as MIPS64 R6 only The "The MIPS64 Instruction Set Reference Manual" [1] states that the `lwupc` is MIPS64 Release 6 only. It should not be supported for 32-bit CPUs. [1] https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00087-2B-MIPS64BIS-AFP-6.06.pdf llvm-svn: 363886	2019-06-19 22:08:06 +00:00
Simon Atanasyan	0121432602	[mips] Add (GPR|PTR)_64 predicates to PseudoReturn64 and PseudoIndirectHazardBranch64 This patch is one of a series of patches. The goal is to make P5600 scheduler model complete and turn on the `CompleteModel` flag. llvm-svn: 363885	2019-06-19 22:07:46 +00:00
Matt Arsenault	4d000d2488	AMDGPU: Fix folding immediate into readfirstlane through reg_sequence The def instruction for the vreg may not match, because it may be folding through a reg_sequence. The assert was overly conservative and not necessary. It's not actually important if DefMI really defined the register, because the fold that will be done cares about the def of the value that will be folded. For some reason copies aren't making it through the reg_sequence, although they should. llvm-svn: 363876	2019-06-19 20:44:15 +00:00
Peter Collingbourne	2742eeb78e	hwasan: Shrink outlined checks by 1 instruction. Turns out that we can save an instruction by folding the right shift into the compare. Differential Revision: https://reviews.llvm.org/D63568 llvm-svn: 363874	2019-06-19 20:40:03 +00:00
Matt Arsenault	4d55d024be	Reapply "AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics" This reapplies r363678, using the correct chain for the CopyToReg for v0. glueCopyToM0 counterintuitively changes the operands of the original node. llvm-svn: 363870	2019-06-19 19:55:27 +00:00
Sanjay Patel	b5640b6fe8	[x86] avoid vector load narrowing with extracted store uses (PR42305) This is an exception to the rule that we should prefer xmm ops to ymm ops. As shown in PR42305: https://bugs.llvm.org/show_bug.cgi?id=42305 ...the store folding opportunity with vextractf128 may result in better perf by reducing the instruction count. Differential Revision: https://reviews.llvm.org/D63517 llvm-svn: 363853	2019-06-19 18:13:47 +00:00
Simon Pilgrim	0018b78ef6	[X86][SSE] combineToExtendVectorInReg - add ANY_EXTEND support TODO. NFCI. So I don't forget - there's a load of yak shaving to do first. llvm-svn: 363847	2019-06-19 17:42:37 +00:00
Simon Pilgrim	34279db355	[X86][SSE] Combine shuffles to ANY_EXTEND/ANY_EXTEND_VECTOR_INREG. We already do this for ZERO_EXTEND/ZERO_EXTEND_VECTOR_INREG - this just extends the pattern matcher to recognize cases where we don't need the zeros in the extension. llvm-svn: 363841	2019-06-19 17:21:15 +00:00
Simon Tatham	2f5188fd58	[ARM] Add MVE vector bit-operations (register inputs). This includes all the obvious bitwise operations (AND, OR, BIC, ORN, MVN) in register-to-register forms, and the immediate forms of AND/OR/BIC/ORN; byte-order reverse instructions; and the VMOVs that access a single lane of a vector. Some of those VMOVs (specifically, the ones that access a 32-bit lane) share an encoding with existing instructions that were disassembled as accessing half of a d-register (e.g. `vmov.32 r0, d1[0]`), but in 8.1-M they're now written as accessing a quarter of a q-register (e.g. `vmov.32 r0, q0[2]`). The older syntax is still accepted by the assembler. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62673 llvm-svn: 363838	2019-06-19 16:43:53 +00:00
Evandro Menezes	567f6c150d	[AVR] Change limit type to match the argument type (NFC) llvm-svn: 363832	2019-06-19 16:12:12 +00:00
Evandro Menezes	56c45e93ab	[Hexagon] Change limit type to match the argument type (NFC) llvm-svn: 363831	2019-06-19 16:12:01 +00:00
Simon Pilgrim	cdc0236e3a	[X86] getExtendInVec - take an ISD::*_EXTEND opcode instead of a IsSigned bool flag. NFCI. Prep work to support ANY_EXTEND/ANY_EXTEND_VECTOR_INREG without needing another flag. llvm-svn: 363818	2019-06-19 15:18:24 +00:00
Simon Pilgrim	d4754cac89	[X86] Add _EXTEND -> _EXTEND_VECTOR_INREG opcode conversion helper. NFCI. Given a _EXTEND or _EXTEND_VECTOR_INREG opcode, convert it to *_EXTEND_VECTOR_INREG. llvm-svn: 363812	2019-06-19 14:54:02 +00:00
Simon Pilgrim	2b309027ed	[X86] Merge extract_subvector(_EXTEND) and extract_subvector(_EXTEND_VECTOR_INREG) handling. NFCI. llvm-svn: 363808	2019-06-19 14:25:27 +00:00
Ulrich Weigand	3641b10f3d	[SystemZ] Support vector load/store alignment hints Vector load/store instructions support an optional alignment field that the compiler can use to provide known alignment info to the hardware. If the field is used (and the information is correct), the hardware may be able (on some models) to perform faster memory accesses than otherwise. This patch adds support for alignment hints in the assembler and disassembler, and fills in known alignment during codegen. llvm-svn: 363806	2019-06-19 14:20:00 +00:00
Simon Pilgrim	128ce93c60	Revert rL363678 : AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics There may or may not be additional work to handle this correctly on SI/CI. ........ Breaks EXPENSIVE_CHECKS buildbots - http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/78/ llvm-svn: 363797	2019-06-19 13:00:54 +00:00
Lewis Revill	18737e81eb	[RISCV] Allow parsing immediates that use tilde & exclaim This patch allows immediates (and CSR alias immediates) which start with a tilde token or an exclaim (!) token to be parsed as intended. Differential Revision: https://reviews.llvm.org/D57320 llvm-svn: 363783	2019-06-19 10:27:24 +00:00
Lewis Revill	218aa0edb1	[RISCV] Fix failure to parse parenthesized immediates Since the parser attempts to parse an operand as a register with parentheses before parsing it as an immediate, immediates in parentheses should not be parsed by parseRegister. However in the case where the immediate does not start with an identifier, the LParen is not unlexed and so the RParen causes an unexpected token error. This patch adds the missing UnLex, and modifies the existing UnLex to not use a buffered token, as it should always be unlexing an LParen. Differential Revision: https://reviews.llvm.org/D57319 llvm-svn: 363782	2019-06-19 10:11:13 +00:00
Clement Courbet	4ef7c2868a	[X86] Add missing properties on llvm.x86.sse.{st,ld}mxcsr Summary: llvm.x86.sse.stmxcsr only writes to memory. llvm.x86.sse.ldmxcsr only reads from memory, and might generate an FPE. Reviewers: craig.topper, RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62896 llvm-svn: 363773	2019-06-19 08:44:31 +00:00
Lewis Revill	39263ac5d1	[RISCV] Add lowering of global TLS addresses This patch adds lowering for global TLS addresses for the TLS models of InitialExec, GlobalDynamic, LocalExec and LocalDynamic. LocalExec support required using a 4-operand add instruction, which uses the fourth operand to express a relocation on the symbol. The necessary fixup is emitted when the instruction is emitted. Differential Revision: https://reviews.llvm.org/D55305 llvm-svn: 363771	2019-06-19 08:40:59 +00:00
Chen Zheng	c5b918de58	[NFC] move some hardware loop checking code to a common place for use by other code. Differential Revision: https://reviews.llvm.org/D63478 llvm-svn: 363758	2019-06-19 01:26:31 +00:00
Matt Arsenault	9cac4e6d14	Rename ExpandISelPseudo->FinalizeISel, delay register reservation This allows targets to make more decisions about reserved registers after isel. For example, now it should be certain there are calls or stack objects in the frame or not, which could have been introduced by legalization. Patch by Matthias Braun llvm-svn: 363757	2019-06-19 00:25:39 +00:00
Thomas Lively	1885747498	[WebAssembly] Optimize ISel for SIMD Boolean reductions Summary: Converting the result of *.{all,any}_true to a bool at the source level generates LLVM IR that compares the result to 0. This check is redundant since these instructions already return either 0 or 1 and therefore conform to the BooleanContents setting for WebAssembly. This CL adds patterns to detect and remove such redundant operations on the result of Boolean reductions. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63529 llvm-svn: 363756	2019-06-19 00:02:13 +00:00
Huihui Zhang	d16779a732	[ARM] Comply with rules on ARMv8-A thumb mode partial deprecation of IT. Summary: When identifying instructions that can be folded into a MOVCC instruction, checking for a predicate operand is not enough: for a thumb2 function with restrict-IT, we also need to check whether the machine instruction is eligible for ARMv8 IT. Notes in ARMv8-A Architecture Reference Manual, section "Partial deprecation of IT" https://usermanual.wiki/Pdf/ARM20Architecture20Reference20ManualARMv8.1667877052.pdf "ARMv8-A deprecates some uses of the T32 IT instruction. All uses of IT that apply to instructions other than a single subsequent 16-bit instruction from a restricted set are deprecated, as are explicit references to the PC within that single 16-bit instruction. This permits the non-deprecated forms of IT and subsequent instructions to be treated as a single 32-bit conditional instruction." Reviewers: efriedma, lebedev.ri, t.p.northover, jmolloy, aemerson, compnerd, stoklund, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63474 llvm-svn: 363739	2019-06-18 20:55:09 +00:00
Sam Elliott	9f155bc6e5	[RISCV] Prevent re-ordering some adds after shifts Summary: DAGCombine will normally turn a `(shl (add x, c1), c2)` into `(add (shl x, c2), c1 << c2)`, where `c1` and `c2` are constants. This can be prevented by a callback in TargetLowering. On RISC-V, materialising the constant `c1 << c2` can be more expensive than materialising `c1`, because materialising the former may take more instructions, and may use a register, where materialising the latter would not. This patch implements the hook in RISCVTargetLowering to prevent this transform, in the cases where: - `c1` fits into the immediate field in an `addi` instruction. - `c1` takes fewer instructions to materialise than `c1 << c2`. In future, DAGCombine could do the check to see whether `c1` fits into an add immediate, which might simplify more target hooks than just RISC-V. Reviewers: asb, luismarques, efriedma Reviewed By: asb Subscribers: xbolva00, lebedev.ri, craig.topper, lewis-revill, Jim, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62857 llvm-svn: 363736	2019-06-18 20:38:08 +00:00
Stanislav Mekhanoshin	bb1c8b6f5c	[AMDGPU] gfx10 wave32 patterns Differential Revision: https://reviews.llvm.org/D63511 llvm-svn: 363729	2019-06-18 20:00:24 +00:00
Stanislav Mekhanoshin	ab4f2ea793	[AMDGPU] gfx1010 disassembler changes for wave32 Differential Revision: https://reviews.llvm.org/D63506 llvm-svn: 363721	2019-06-18 19:10:59 +00:00
Craig Topper	10e6128c62	[X86] Remove unnecessary line that makes v4f32 FP_ROUND Legal. NFC FP_ROUND defaults to Legal for all MVT types and nothing changes the v4f32 entry away from this default. If we needed this line we'd also need one for v8f32 with AVX512, which we don't have. llvm-svn: 363719	2019-06-18 19:04:03 +00:00
Simon Atanasyan	796e7f8724	[mips] Add more strict predicates to the RSQRT_S_MM and TAILCALL_MM This patch is one of a series of patches. The goal is to make P5600 scheduler model complete and turn on the `CompleteModel` flag. llvm-svn: 363703	2019-06-18 17:00:08 +00:00
Simon Atanasyan	60a9d0c248	[mips] Add PTR_64 and GPR_64 predicates to some MIPS 64-bit instructions Add `IsGP64bit` and `IsPTR64bit` to the list of `UnsupportedFeatures` of the P5600 scheduling definitions. Also mark some MIPS 64-bit instructions with PTR_64 and GPR_64 predicates. This reduces the number of "No schedule information for" and "lacks information for" errors when marking this scheduler model as complete. This patch is one of a series of patches. The goal is to make P5600 scheduler model complete and turn on the `CompleteModel` flag. Differential Revision: https://reviews.llvm.org/D63237 llvm-svn: 363702	2019-06-18 16:59:57 +00:00
Simon Atanasyan	9086ba8763	[mips] Set the hasNoSchedulingInfo flag for the `MipsAsmPseudoInst` Set the hasNoSchedulingInfo flag for the `MipsAsmPseudoInst`. These pseudo-instructions are never used by codegen. This flag reduces the number of "No schedule information for" and "lacks information for" errors when marking a scheduler model as complete. This patch is one of a series of patches. The goal is to make P5600 scheduler model complete and turn on the `CompleteModel` flag. Differential Revision: https://reviews.llvm.org/D63236 llvm-svn: 363701	2019-06-18 16:59:47 +00:00
Simon Tatham	cfc70782d7	[ARM] Add MVE vector shift instructions. This includes saturating and non-saturating shifts, both with immediate shift count and with the shift counts given by another vector register; VSHLC (in which the bits shifted out of each active vector lane are shifted in to the next active lane); and also VMOVL, which is enough like an immediate shift that it didn't fit too badly in this category. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62672 llvm-svn: 363696	2019-06-18 16:19:59 +00:00
Simon Tatham	faaf1a5366	[ARM] Add MVE integer vector min/max instructions. Summary: These form a small family of their own, to go with the floating-point VMINNM/VMAXNM instructions added in a previous commit. They introduce the first of many special cases in the mnemonic recognition code, because VMIN with the E suffix used by the VPT predication system needs to avoid being interpreted as the nonexistent instruction 'VMI' with an ordinary 'NE' condition suffix. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62671 llvm-svn: 363695	2019-06-18 15:51:46 +00:00
Simon Pilgrim	9c8593934a	[X86][AVX] extract_subvector(any_extend(x)) -> any_extend_vector_inreg(x) Part of fixing the X86 regression noted in D63281 - I've split this into X86 and generic parts - the generic commit will be coming shortly and will fix the vector-reduce-mul-widen.ll regression introduced here. llvm-svn: 363693	2019-06-18 15:30:50 +00:00
Simon Tatham	ed4a602515	[ARM] Rename MVE instructions in Tablegen for consistency. Summary: Their names began with a mishmash of `MVE_`, `t2` and no prefix at all. Now they all start with `MVE_`, which seems like a reasonable choice on the grounds that (a) NEON is the thing they're most at risk of being confused with, and (b) MVE implies Thumb-2, so a prefix indicating MVE is strictly more specific than one indicating Thumb-2. Reviewers: ostannard, SjoerdMeijer, dmgreen Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63492 llvm-svn: 363690	2019-06-18 15:05:42 +00:00
Lewis Revill	74c8364954	[RISCV] Lower calls through PLT This patch adds support for generating calls through the procedure linkage table where required for a given ExternalSymbol or GlobalAddress callee. Differential Revision: https://reviews.llvm.org/D55304 llvm-svn: 363686	2019-06-18 14:29:45 +00:00
Matt Arsenault	8d35dcd703	AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics There may or may not be additional work to handle this correctly on SI/CI. llvm-svn: 363678	2019-06-18 13:19:57 +00:00
Matt Arsenault	f39f3bd056	AMDGPU: Change API for checking for exec modification Invert the name and return value to better reflect the imprecise nature. Force passing in the DefMI, since it's known in the 2 users and could possibly fail for an arbitrary vreg. Allow specifying a specific user instruction. Scan through use instructions, instead of use operands. Add scan thresholds instead of searching infinitely. Stop using a set to track seen uses. I didn't understand this usage, or why it would not check the last use. I don't think the use list has any particular order. llvm-svn: 363675	2019-06-18 12:48:36 +00:00
Matt Arsenault	bcb5ea0042	AMDGPU: Fold readlane from copy of SGPR or imm These may be inserted to assert uniformity somewhere. llvm-svn: 363670	2019-06-18 12:23:46 +00:00
Matt Arsenault	e75e197ad8	AMDGPU: Remove unnecessary check for virtual register The copy was found by searching the uses of a virtual register, so it's already known to be virtual. llvm-svn: 363669	2019-06-18 12:23:45 +00:00
Matt Arsenault	23f03f5059	AMDGPU: Fix iterator crash in AMDGPUPromoteAlloca The lifetime intrinsic was erased, which was the next iterator. llvm-svn: 363668	2019-06-18 12:23:44 +00:00
Matt Arsenault	d5ce8ec778	AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.scale llvm-svn: 363667	2019-06-18 12:23:42 +00:00
Sjoerd Meijer	7a7009f7c8	[ARM] Some Thumb2ITBlock clean ups. NFC Some more refactoring, like registering the IT Block pass, less cryptic variable names, and some simplification of loops. Differential Revision: https://reviews.llvm.org/D63419 llvm-svn: 363666	2019-06-18 12:13:11 +00:00
Jonas Paulsson	5c64a8c4c6	[SystemZ] Fix AHIMuxK pseudo expansion. Do not emit a copy if the source and destination registers are the same. Review: Ulrich Weigand llvm-svn: 363665	2019-06-18 12:10:02 +00:00
Valery Pykhtin	7e854e1cdd	[AMDGPU] Speed up live-in virtual register set computation in GCNScheduleDAGMILive. Differential revision: https://reviews.llvm.org/D62401 llvm-svn: 363661	2019-06-18 11:43:17 +00:00
Simon Pilgrim	7dd529e54d	[X86] Replace any_extend* vector extensions with zero_extend* equivalents First step toward addressing the vector-reduce-mul-widen.ll regression in D63281 - we should replace ANY_EXTEND/ANY_EXTEND_VECTOR_INREG in X86ISelDAGToDAG to avoid having to add duplicate patterns when treating any extensions as legal. In future patches this will also allow us to keep any extension nodes around a lot longer in the DAG, which should mean that we can keep better track of undef elements that otherwise become zeros that we think we have to keep...... Differential Revision: https://reviews.llvm.org/D63326 llvm-svn: 363655	2019-06-18 09:50:13 +00:00
Craig Topper	f4284f8a9d	[X86] Move code that shrinks immediates for ((x << C1) op C2) into a helper function. NFCI Preliminary step for D59909 llvm-svn: 363645	2019-06-18 04:23:58 +00:00
Craig Topper	587427716c	[X86] Remove MOVDI2SSrm/MOV64toSDrm/MOVSS2DImr/MOVSDto64mr CodeGenOnly instructions. The isel patterns for these use a bitcast and load/store, but DAG combine should have canonicalized those away. For the purposes of the memory folding table these opcodes can be replaced by the MOVSSrm_alt/MOVSDrm_alt and MOVSSmr/MOVSDmr opcodes. llvm-svn: 363644	2019-06-18 03:23:15 +00:00
Craig Topper	8582ecd8d9	[X86] Introduce new MOVSSrm/MOVSDrm opcodes that use VR128 register class. Rename the old versions that use FR32/FR64 to MOVSSrm_alt/MOVSDrm_alt. Use the new versions in patterns that previously used a COPY_TO_REGCLASS to VR128. These patterns expect the upper bits to be zero. The current set up appears to work, but I'm not sure we should be enforcing upper bits being zero through a COPY_TO_REGCLASS. I wanted to flip the arrangement and use a COPY_TO_REGCLASS to FR32/FR64 for the patterns that need an f32/f64 result, but that complicated fastisel and globalisel. I've been doing some experiments with reducing some isel patterns and ended up in a situation where I had a (SUBREG_TO_REG (COPY_TO_REGCLASS (VMOVSSrm), VR128)) and our post-isel peephole was unable to avoid using an instruction for the SUBREG_TO_REG due to the COPY_TO_REGCLASS. Having a VR128 instruction removes the COPY_TO_REGCLASS that was breaking this. llvm-svn: 363643	2019-06-18 03:23:11 +00:00
Amara Emerson	146882242f	[GlobalISel][Localizer] Rewrite localizer to run in 2 phases, inter & intra block. Inter-block localization is the same as what currently happens, except now it only runs on the entry block because that's where the problematic constants with long live ranges come from. The second phase is a new intra-block localization phase which attempts to re-sink the already localized instructions further right before one of the multiple uses. One additional change is to also localize G_GLOBAL_VALUE as they're constants too. However, on some targets like arm64 it takes multiple instructions to materialize the value, so some additional heuristics with a TTI hook have been introduced to attempt to prevent code size regressions when localizing these. Overall, these changes improve CTMark code size on arm64 by 1.2%. Full code size results: Program baseline new diff ------------------------------------------------------------------------------ test-suite...-typeset/consumer-typeset.test 1249984 1217216 -2.6% test-suite...:: CTMark/ClamAV/clamscan.test 1264928 1232152 -2.6% test-suite :: CTMark/SPASS/SPASS.test 1394092 1361316 -2.4% test-suite...Mark/mafft/pairlocalalign.test 731320 714928 -2.2% test-suite :: CTMark/lencod/lencod.test 1340592 1324200 -1.2% test-suite :: CTMark/kimwitu++/kc.test 3853512 3820420 -0.9% test-suite :: CTMark/Bullet/bullet.test 3406036 3389652 -0.5% test-suite...ark/tramp3d-v4/tramp3d-v4.test 8017000 8016992 -0.0% test-suite...TMark/7zip/7zip-benchmark.test 2856588 2856588 0.0% test-suite...:: CTMark/sqlite3/sqlite3.test 765704 765704 0.0% Geomean difference -1.2% Differential Revision: https://reviews.llvm.org/D63303 llvm-svn: 363632	2019-06-17 23:20:29 +00:00
Craig Topper	971ad74ba2	Use VR128X instead of FR32X/FR64X for the register class in VMOVSSZmrk/VMOVSDZmrk. Removes COPY_TO_REGCLASS from some patterns. llvm-svn: 363630	2019-06-17 23:08:29 +00:00
Craig Topper	0e18300802	[X86] Make an assert in LowerSCALAR_TO_VECTOR stricter to make it clear what types are allowed here. NFC Make it clear that only integer types with i32 or smaller elements should get to this part of the code. llvm-svn: 363629	2019-06-17 23:08:09 +00:00
Stanislav Mekhanoshin	121956108f	[AMDGPU] Use custom inserter for gfx10 VOP2b This is part of the approved D63204 pending parent revision. This small change is in fact a part of the VOP2b legalization which does not technically belong to wave32 support, so extracted separately. llvm-svn: 363625	2019-06-17 22:37:37 +00:00
Stanislav Mekhanoshin	3138278287	[AMDGPU] Propagate function attributes thru bitcasts AMDGPUPropagateAttributes will not work on function bitcasts, so move AMDGPUFixFunctionBitcasts before it. Differential Revision: https://reviews.llvm.org/D63455 llvm-svn: 363614	2019-06-17 20:42:48 +00:00
Nicolai Haehnle	ae4fcb97dd	AMDGPU/GFX10: Don't generate s_code_end padding in the asm-printer Summary: The purpose of the padding is to guard against stale code being fetched into the instruction cache by the lowest level prefetching. We're generating relocatable ELF here, and so the padding should arguably be added by the linker. This is in fact what Mesa does. This also fixes multi-part shaders for Mesa. Change-Id: I6bfede58f20e9f337762ccf39ef9e0e263e69e82 Reviewers: arsenm, rampitec, t-tye Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63427 llvm-svn: 363602	2019-06-17 19:28:43 +00:00
Jessica Paquette	49537bbf74	[GlobalISel][AArch64] Fold G_SUB into G_ICMP when it's safe to do so Basically porting over the behaviour in AArch64ISelLowering to GISel. See emitComparison for reference. When we have something like this: ``` lhs = G_SUB 0, y ... G_ICMP lhs, rhs ``` We can fold away the G_SUB and produce a cmn instead, given that we produce the same value in NZCV. Add a test showing that the transformation works, and also showing that we don't perform the transformation when it's unsafe. Also factor out the CSet emission into emitCSetForICMP. Differential Revision: https://reviews.llvm.org/D63163 llvm-svn: 363596	2019-06-17 18:40:06 +00:00
Craig Topper	f3f968adcd	[X86] Add TB_NO_REVERSE to some memory folding table entries where the register form requires 64-bit mode, but the memory form does not. We don't know if it's safe to unfold if we're in 32-bit mode. This is similar to what was done to some load opcodes in r363523. I think it's pretty unlikely we will try to unfold these anyway, so I don't think this is testable. llvm-svn: 363595	2019-06-17 18:38:07 +00:00
Simon Pilgrim	835999e48a	[X86][SSE] Scalarize under-aligned XMM vector nt-stores (PR42026) If an XMM non-temporal store has less than natural alignment, scalarize the vector - with SSE4A we can stay on the vector and use MOVNTSD(f64), else we must move to GPRs and use MOVNTI(i32/i64). llvm-svn: 363592	2019-06-17 18:20:04 +00:00
Stanislav Mekhanoshin	a9191c8492	[AMDGPU] gfx1010 wavefrontsize intrinsic folding Differential Revision: https://reviews.llvm.org/D63206 llvm-svn: 363588	2019-06-17 17:57:50 +00:00
Stanislav Mekhanoshin	ad04e7ad42	[AMDGPU] Pass to propagate ABI attributes from kernels to the functions The pass works in two modes: Mode 1: Just set attributes starting from kernels. This can work at the very beginning of opt and llc pipeline, but cannot clone functions because it must be a function pass. Mode 2: Actually clone functions for new attributes. This can only work after all function passes in the opt pipeline because it has to be a module pass. Differential Revision: https://reviews.llvm.org/D63208 llvm-svn: 363586	2019-06-17 17:47:28 +00:00
Simon Pilgrim	bb9adfdb4e	[X86][AVX] Split under-aligned vector nt-stores. If a YMM/ZMM non-temporal store has less than natural alignment, split the vector - either they will be satisfactorily aligned or will continue to be split until they are XMMs - at which point the legalizer will scalarize it. llvm-svn: 363582	2019-06-17 17:22:38 +00:00
Warren Ristow	6452bdd29b	[LV] Suppress vectorization in some nontemporal cases When considering a loop containing nontemporal stores or loads for vectorization, suppress the vectorization if the corresponding vectorized store or load with the alignment of the original scalar memory op is not supported with the nontemporal hint on the target. This adds two new functions: bool isLegalNTStore(Type *DataType, unsigned Alignment) const; bool isLegalNTLoad(Type *DataType, unsigned Alignment) const; to TTI, leaving the target-independent default implementation as returning true, but with overriding implementations for X86 that check the legality based on available Subtarget features. This fixes https://llvm.org/PR40759 Differential Revision: https://reviews.llvm.org/D61764 llvm-svn: 363581	2019-06-17 17:20:08 +00:00
Matt Arsenault	a7f09f3c9e	GlobalISel: Verify intrinsics I keep using the wrong instruction when manually writing tests. This really needs to check the number of operands, but I don't see an easy way to do that right now. llvm-svn: 363579	2019-06-17 17:01:32 +00:00
Matt Arsenault	fee1949b35	AMDGPU/GlobalISel: Account for multiple defs when finding intrinsic ID llvm-svn: 363578	2019-06-17 17:01:27 +00:00
Stanislav Mekhanoshin	5d00c3060e	[AMDGPU] gfx1010 wave32 metadata Differential Revision: https://reviews.llvm.org/D63207 llvm-svn: 363577	2019-06-17 16:48:56 +00:00
Tom Stellard	8b1c53b528	AMDGPU/GlobalISel: Implement select for G_ICMP and G_SELECT Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60640 llvm-svn: 363576	2019-06-17 16:27:43 +00:00
Simon Pilgrim	12cb792d7f	[X86] combineLoad - begin making the load split code more generic. NFCI. This is currently only used for ymm->xmm splitting but we shouldn't hardcode the offsets/alignment. This is necessary for an upcoming patch to split under-aligned non-temporal vector loads. llvm-svn: 363570	2019-06-17 15:54:36 +00:00
Simon Pilgrim	454e6b9010	[X86][SSE] Prevent misaligned non-temporal vector load/store combines For loads, pre-SSE41 we can't perform NT loads at all, and after that we can only perform vector aligned loads, so if the alignment is less than that of an xmm we'll just end up using the regular unaligned vector loads anyway. First step towards fixing PR42026 - the next step for stores will be to use SSE4A movntsd where possible and to avoid the stack spill on SSE2 targets. Differential Revision: https://reviews.llvm.org/D63246 llvm-svn: 363564	2019-06-17 14:26:10 +00:00
Matt Arsenault	b10f097833	AMDGPU: Ignore subtarget for InferAddressSpaces Even if the target doesn't have flat instructions, addrspace(0) is still flat. It just happens to not work. llvm-svn: 363561	2019-06-17 14:13:24 +00:00
Matt Arsenault	29e792659b	AMDGPU/GlobalISel: Fix default mapping for non-register operands Tests will be in future commits when new intrinsics are handled here. llvm-svn: 363559	2019-06-17 13:52:19 +00:00
Matt Arsenault	e683eba0ed	AMDGPU: Cleanup custom PseudoSourceValue definitions Use separate enums for each kind, avoid repeating overloads, and add missing classof implementation. llvm-svn: 363558	2019-06-17 13:52:15 +00:00
Sam Parker	1bd3d00e7e	[CodeGen] Check for HardwareLoop Latch ExitBlock The HardwareLoops pass finds exit blocks with a scevable exit count. If the target specifies to update the loop counter in a register, through a phi, we need to ensure that the exit block is a latch so that we can insert the phi with the correct value for the incoming edge. Differential Revision: https://reviews.llvm.org/D63336 llvm-svn: 363556	2019-06-17 13:39:28 +00:00
Luis Marques	2e46312ffd	[DAGCombiner] [CodeGenPrepare] More comprehensive GEP splitting Some GEPs were not being split, presumably because that split would just be undone by the DAGCombiner. Not performing those splits can prevent important optimizations, such as preventing the element indices / member offsets from being (partially) folded into load/store instruction immediates. This patch: - Makes the splits also occur in the cases where the base address and the GEP are in the same BB. - Ensures that the DAGCombiner doesn't reassociate them back again. Differential Revision: https://reviews.llvm.org/D60294 llvm-svn: 363544	2019-06-17 10:54:12 +00:00
Fangrui Song	5401c2db6e	Fix clang -Wcovered-switch-default after stack-id change by D60137 llvm-svn: 363543	2019-06-17 10:20:20 +00:00
Fangrui Song	4bde5d3c08	[ARM] Fix another -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D63265 llvm-svn: 363535	2019-06-17 09:29:50 +00:00
Fangrui Song	89d6905c59	[ARM] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D63265 llvm-svn: 363534	2019-06-17 09:26:50 +00:00
Sander de Smalen	5d6ee76c16	Describe stack-id as an enum This patch changes MIR stack-id from an integer to an enum, and adds printing/parsing support for this in MIR files. The default stack-id '0' is now renamed to 'default'. This should make MIR tests that have stack objects with different stack-ids more descriptive. It also clarifies code operating on StackID. Reviewers: arsenm, thegameg, qcolombet Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D60137 llvm-svn: 363533	2019-06-17 09:13:29 +00:00
Sam Parker	a059efa885	[ARM] Remove ARMComputeBlockSize Forgot to remove file! llvm-svn: 363532	2019-06-17 09:13:10 +00:00
Sam Parker	f7c0b3aeb2	[ARM] Add ARMBasicBlockInfo.cpp Forgot to add file! llvm-svn: 363531	2019-06-17 09:05:43 +00:00
Sam Parker	966f4e874e	[ARM] Extract some code from ARMConstantIslandPass Create the ARMBasicBlockUtils class for tracking and querying basic blocks sizes so we can use them when generating low-overhead loops. Differential Revision: https://reviews.llvm.org/D63265 llvm-svn: 363530	2019-06-17 08:49:09 +00:00
Justin Hibbits	1d1cf30b73	PowerPC: Optimize SPE double parameter calling setup Summary: SPE passes doubles the same as soft-float, in register pairs as i32 types. This is all handled by the target-independent layer. However, this is not optimal when splitting or reforming the doubles, as it pushes to the stack and loads from, on either side. For instance, to pass a double argument to a function, assuming the double value is in r5, the sequence currently looks like this: evstdd 5, X(1) lwz 3, X(1) lwz 4, X+4(1) Likewise, to form a double into r5 from args in r3 and r4: stw 3, X(1) stw 4, X+4(1) evldd 5, X(1) This optimizes the fence to use SPE instructions. Now, to pass a double to a function: mr 4, 5 evmergehi 3, 5, 5 And to form a double into r5 from args in r3 and r4: evmergelo 5, 3, 4 This is comparable to the way that gcc generates the double splits. This also fixes a bug with expanding builtins to libcalls, where the LowerCallTo() code path was generating intermediate illegal type nodes. Reviewers: nemanjai, hfinkel, joerg Subscribers: kbarton, jfb, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D54583 llvm-svn: 363526	2019-06-17 03:15:23 +00:00
Craig Topper	9f2f127009	[X86] Add TB_NO_REVERSE to some folding table entries where the register form uses the REX prefix, but the memory form does not. It would not be safe to unfold the memory form to the register form without checking that we are compiling for 64-bit mode. This probably isn't a real functional issue since we are unlikely to unfold any of these instructions since they don't have any tied registers, aren't commutable, and don't have any inputs other than the address. llvm-svn: 363523	2019-06-16 22:33:09 +00:00
Nicolai Haehnle	41abf2766e	AMDGPU: Prepare for explicit absolute relocations in code generation Summary: We will use absolute relocations for LDS symbols. Change-Id: I9a32795ed0ea835e433a787129cfe3c57ee9a325 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61492 llvm-svn: 363517	2019-06-16 17:43:37 +00:00
Nicolai Haehnle	6d71be4e67	AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0 Summary: Instead of encoding a high-word of 0 using a fake TargetGlobalAddress, just use a literal target constant. This simplifies some subsequent changes. The generated assembly is now more explicit about the kind of relocation that is to be used. Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61491 llvm-svn: 363516	2019-06-16 17:32:01 +00:00
Nicolai Haehnle	490e83cd43	AMDGPU/GFX10: Support DLC bit in llvm.amdgcn.s.buffer.load intrinsic Summary: Change-Id: Ie4c971462a7749740938c687144e77441dac2539 Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62486 Change-Id: Iae59523edd75c74918d2118df6571a7b671717a0 llvm-svn: 363514	2019-06-16 17:14:12 +00:00
Stanislav Mekhanoshin	5250021672	[AMDGPU] gfx10 conditional registers handling This is cpp source part of wave32 support, excluding overridden getRegClass(). Differential Revision: https://reviews.llvm.org/D63351 llvm-svn: 363513	2019-06-16 17:13:09 +00:00
Sanjay Patel	d14389c0a5	[x86] split 256-bit vector selects if operands are vector concats This is similar logic/motivation to the select splitting in D62969. In D63233, the pattern changes so that we no longer have an extract_subvector of vselect, but the operands of the select are still being concatenated. The closest case is represented in either the first or last test diffs here - we have an extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions. I think that's the right trade-off for most AVX1 targets. In the example based on PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 ...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx). Differential Revision: https://reviews.llvm.org/D63364 llvm-svn: 363508	2019-06-16 14:04:49 +00:00
Simon Pilgrim	fcffc2facc	[X86] CombineShuffleWithExtract - handle cases with different vector extract sources Insert the shorter vector source into an undef vector of the longer vector source's type. llvm-svn: 363507	2019-06-16 08:00:41 +00:00
Simon Pilgrim	456ca5d7f7	[X86] CombineShuffleWithExtract - assert all src ops types are multiples of rootsize. NFCI. llvm-svn: 363501	2019-06-15 19:12:44 +00:00
Simon Pilgrim	90e87af303	[X86][AVX] Handle lane-crossing shuffle(extract_subvector(x,c1),extract_subvector(y,c2),m1) shuffles Pull out the existing (non)lane-crossing fold into a helper lambda and use for lane-crossing unary shuffles as well. Fixes PR34380 llvm-svn: 363500	2019-06-15 18:30:43 +00:00
Simon Pilgrim	990f3ceb67	[X86][AVX] Decode constant bits from insert_subvector(c1, c2, c3) This mostly happens due to SimplifyDemandedVectorElts reducing a vector to insert_subvector(undef, c1, 0) llvm-svn: 363499	2019-06-15 17:05:24 +00:00
Kang Zhang	2d51adcb57	[PowerPC] Set the innermost hot loop to align 32 bytes Summary: If the nested loop is an innermost loop, prefer a 32-byte alignment, so that we can decrease cache misses and branch-prediction misses. Actual alignment of the loop will depend on the hotness check and other logic in alignBlocks. The old code only aligned a hot loop to 32 bytes when its LoopSize was larger than 16 bytes and smaller than 32 bytes; this patch aligns the innermost hot loop to 32 bytes, not only hot loops whose size is 16~32 bytes. Reviewed By: steven.zhang, jsji Differential Revision: https://reviews.llvm.org/D61228 llvm-svn: 363495	2019-06-15 15:10:24 +00:00
Fangrui Song	44cc4e9351	[RISCV] Simplify RISCVAsmBackend::writeNopData(). NFC llvm-svn: 363486	2019-06-15 06:14:15 +00:00
Matt Arsenault	9487278010	Reapply "GlobalISel: Avoid producing Illegal copies in RegBankSelect" This reapplies r363410, avoiding null dereference if there is no AltRegBank. llvm-svn: 363478	2019-06-15 00:33:26 +00:00

53908 Commits