Summary:
ARMAsmParser was incorrectly dropping a leading dollar sign character
from symbol names in targets of branch instructions. This was caused by
an incorrect assumption that the contents following the dollar sign
token should be handled as a constant immediate, similarly to the #
token.
This patch prevents the operand parser from consuming the dollar sign
token when it is followed by an identifier, making sure it is properly
parsed as part of the expression.
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: danielkiss, chill, carwil, vhscampos, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73176
Handle LinkOnceODRLinkage;
Handle AppendingLinkage type for llvm.global_ctors/dtors static init global arrays;
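For illustration (not taken from the patch itself), an appending-linkage static-init array looks like this in IR; the @init_foo name is made up:
@llvm.global_ctors = appending global [1 x { i32, void ()*, i8* }]
  [{ i32, void ()*, i8* } { i32 65535, void ()* @init_foo, i8* null }]
define internal void @init_foo() {
  ret void
}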
Differential Revision: https://reviews.llvm.org/D75305
Previously, for any copy from a register bigger than the destination, we:
- copied to a same-sized register in the destination register bank, then
- did a subregister copy of that to the destination.
This fails for copies from 128-bit FPRs to GPRs because the GPR register bank
can't accommodate 128-bit values.
Instead of special-casing such copies to perform the truncation beforehand in
the source register bank, generalize this:
a) Perform a subregister copy straight from source register whenever possible.
This results in shorter MIR and fixes the above problem.
b) Perform a full copy to target bank and then do a subregister copy only if
source bank can't support target's size. E.g. GPR to 8-bit FPR copy.
Patch by Raul Tambre (tambre)!
Differential Revision: https://reviews.llvm.org/D75421
Summary:
This seems like an obvious error - cut and paste issue?
The change does affect one of the lit tests - it stops an s_buffer_load
from being re-ordered past an MUBUF instruction (which is not surprising).
Change-Id: I80be99de5b62af4f42e91af2591b76a52ac9efa6
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75686
This is a follow up to the previous patch: [AIX] Implement caller
arguments passed in stack memory.
This corrects a defect in AIX 64-bit where an i32 is written to the
stack with stw (4 bytes) rather than the expected std (8 bytes). Integer
arguments are passed on the stack as images of their register representation.
I also took the opportunity to tidy up some of the calling convention
AIX tests I added in my last commit. This patch adds the missed assembly
expected output for the stack arg int case, which would have caught this
problem.
Differential Revision: https://reviews.llvm.org/D75126
Summary:
Hint instructions are printed as "hint\t#hintnum", except for ARM v8.3a
instructions, where only "hint #hintnum" is printed.
This patch changes all of them to the first format.
Reviewers: pbarrio, LukeCheeseman, vsk
Reviewed By: vsk
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75625
The original code could create a bitcast from f64 to i64 and back
on 32-bit targets. This was only working because getBitcast was
able to fold the casts away to avoid leaving the illegal i64 type.
Now we handle the scalar case directly by broadcasting using the
scalar type as the element type, then bitcasting to the final VT.
This works since we ensure the scalar type is the same size as
the final VT element type. No more casts to i64.
For the vector case, we cast to VT or a subvector of VT, and then
do the broadcast.
I think this all matches what we generated before, just in a more
readable way.
Summary: X86 can reduce the number of NOP bytes by padding instructions with prefixes to get better performance in some cases. So a private member function `determinePaddingPrefix` is added to determine which prefix is the most suitable.
Reviewers: annita.zhang, reames, MaskRay, craig.topper, LuoYuanke, jyknight
Reviewed By: reames
Subscribers: llvm-commits, dexonsmith, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75357
If we have an explicit align directive, we currently default to emitting nops to fill the space. As discussed in the context of the prefix padding work for branch alignment (D72225), we're allowed to play other tricks such as extending the size of previous instructions instead.
This patch will convert near jumps to far jumps if doing so decreases the number of bytes of nops needed for a following align. It does so as a post-pass after relaxation is complete. It intentionally works without moving any labels or doing anything which might require another round of relaxation.
The point of this patch is mainly to mock out the approach. The optimization implemented is real, and possibly useful, but the main point is to demonstrate an approach for implementing such "pad previous instruction" approaches. The key notion in this patch is to treat padding previous instructions as an optional optimization, not as a core part of relaxation. The benefit to this is that we avoid the potential concern about increasing the distance between two labels and thus causing further potentially non-local code grown due to relaxation. The downside is that we may miss some opportunities to avoid nops.
For the moment, this patch only implements a small set of existing relaxations. Assuming the approach is satisfactory, I plan to extend this to a broader set of instructions where there are obvious "relaxations" which are roughly performance equivalent.
Note that this patch *doesn't* change which instructions are relaxable. We may wish to explore that separately to increase optimization opportunity, but I figured that deserved its own separate discussion.
There are possible downsides to this optimization (and all "pad previous instruction" variants). The major two are potentially increasing instruction fetch and perturbing uop caching. (i.e. the usual alignment risks) Specifically:
* If we pad an instruction such that it crosses a fetch window (16 bytes on modern X86-64), we may cause the decoder to have to trigger a fetch it wouldn't have otherwise. This can affect both decode speed and icache pressure.
* Intel's uop cache has particular restrictions on which instruction combinations can fit in a particular way. By moving instructions around, we can both cause new misses and change misses into hits. Many of the most painful cases are around branch density, so I don't expect this to be too bad on the whole.
On the whole, I expect to see small swings (i.e. the typical alignment change problem), but nothing major or systematic in either direction.
Differential Revision: https://reviews.llvm.org/D75203
Previously we tried to promote these to xmm/ymm/zmm by promoting
in the X86CallingConv.td file. But this breaks when we run out
of xmm/ymm/zmm registers and need to fall back to memory. We end
up trying to create a nonsensical scalar-to-vector conversion. This led
to an assertion. The new tests in avx512-calling-conv.ll all
trigger this assertion.
Since we really want to treat these types like we do on avx2,
it seems better to promote them before the calling convention
code gets involved. Except when the calling convention is one
that passes the vXi1 type in a k register.
The changes in avx512-regcall-Mask.ll are because we indicated
that xmm/ymm/zmm types should be passed indirectly for the
Win64 ABI before we go to the common lines that promoted the
vXi1 types. This caused the promoted types to be picked up by
the default calling convention code. Now we promote them earlier
so they get passed indirectly as though they were xmm/ymm/zmm.
Differential Revision: https://reviews.llvm.org/D75154
I believe this is the correct fix for D75506 rather than disabling all commuting. We can still commute the remaining two sources.
Differential Revision: https://reviews.llvm.org/D75526
Create a wider source vector, and unmerge with dead defs like the
legalizer. The legalization handling for G_EXTRACT is incomplete, and
it's preferable to keep everything in 32-bit pieces.
We should probably start moving these functions into utils, since we
have a growing number of places that do almost the same thing.
SLH had two functions named isDataInvariant and isDataInvariantLoad that
checked whether the passed instruction was data invariant. For some instructions,
if the EFLAGS were dead then they were considered data invariant, otherwise
they were not considered data invariant.
In this patch, I extracted that EFLAGS liveness check and made it
explicit at every call to isDataInvariant and isDataInvariantLoad.
This makes the isDataInvariant function behave more generally
and preserves the liveness check behavior that SLH would like to have.
Tested via llvm-lit llvm/test/CodeGen/X86/speculative-load-hardening*
This is the first step in making these two data invariance checks
available for non-SLH passes. The second step is to move these functions from
SLH to X86InstrInfo.cpp. I'll follow up with a patch that does that.
Differential Revision: https://reviews.llvm.org/D70283
https://reviews.llvm.org/D42848 only handled CFA related cfi directives but
didn't handle CSR-related CFI. The patch adds the CSR part. Basically it reuses
the framework created in D42848. For each basic block, the patch tracks which
CSR set has been saved at its CFG predecessors' exits, and compares that CSR
set with the set at its previous basic block's exit (the previous block is the
block laid out before the current block). If the saved CSR set at the previous
basic block's exit is larger, .cfi_restore will be inserted.
The patch also generates proper .cfi_restore in epilogue to make sure the
saved CSR set is consistent for the incoming edges of each block.
Differential Revision: https://reviews.llvm.org/D74303
Spin-off from D75407. As described there, ConstantFoldConstant()
currently returns null for non-ConstantExpr/ConstantVector inputs,
but otherwise always returns non-null, independently of whether
any folding has happened or not.
This is confusing and makes consumer code more complicated.
I would expect ConstantFoldConstant() either to return non-null only if
it actually folded something, or to always return non-null.
I'm going with the latter here, which appears to be more
useful considering existing usage.
Differential Revision: https://reviews.llvm.org/D75543
If we would emit a VBROADCAST node, we can instead directly emit
a VBROADCAST_LOAD. This allows us to get rid of the special case
to use an f64 load on 32-bit targets for vXi64.
I believe there is more cleanup we can do later in this function,
but I'll do that in follow ups.
If SimplifyDemandedBits succeeds in simplifying the byte src, add the CVT_F32_UBYTE node back to the worklist as we might be able to simplify further.
Yet another step towards removing SelectionDAG::GetDemandedBits.
Summary:
The VSHLC instruction performs a left shift of a whole vector register
by an immediate shift count up to 32, shifting in new bits at the low
end from a GPR and delivering the shifted-out bits from the high end
back into the same GPR.
Since the instruction produces two outputs (the shifted vector
register and the output GPR of shifted-out bits), it has to be
instruction-selected in C++ rather than Tablegen.
Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard
Reviewed By: miyuki
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75445
Summary:
These are exactly parallel to the existing `vadciq` intrinsics, which
we implemented last year as part of the original MVE intrinsics
framework setup.
Just like VADC/VADCI, the MVE VSBC/VSBCI instructions deliver two
outputs, both of which the intrinsic exposes: a modified vector
register and a carry flag. So they have to be instruction-selected in
C++ rather than Tablegen. However, in this case, that's trivial: the
same C++ isel routine we already have for VADC works unchanged, and
all we have to do is to pass it a different instruction id.
Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard
Reviewed By: miyuki
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75444
The computation here didn't really make sense to me, and reported
wildly different results depending on the flat work group size
attribute.
I think this should really report a range derived from the possible
work group size bounds, and only allow an occupancy that is a multiple
of the group size.
uops.info says these should be 15-cycle instructions. Uops.info also shows the 512-bit form uses ports 0 and 5 for both register and memory. We had the memory form using ports 0 and 1.
Differential Revision: https://reviews.llvm.org/D75549
If we go with D75412, we no longer depend on the scalar type directly. So we don't need to avoid using i64. We already have AVX1 fallback patterns with i32 and i64 scalar types so we don't need to avoid using integer types on AVX1.
Differential Revision: https://reviews.llvm.org/D75413
Also add a DAG combine to combine different sized broadcasts from
constant pool to avoid a regression.
Differential Revision: https://reviews.llvm.org/D75412
The build_vector needs to be the only user of the data, but the
chain will likely have another use. So we can't make sure the
build_vector is the only user of the node.
On SystemZ there are a set of "access registers" that can be copied in and
out of 32-bit GPRs with special instructions. These instructions can only
perform the copy using low 32-bit parts of the 64-bit GPRs. However, the
default register class for 32-bit integers is GRX32, which also contains the
high 32-bit part registers.
In order to never end up with a case of such a COPY into a high reg, this
patch adds a new simple pre-RA pass that selects such COPYs into target
instructions.
This pass also handles COPYs from CC (Condition Code register), and COPYs to
CC can now also be emitted from a high reg in copyPhysReg().
Fixes: https://bugs.llvm.org/show_bug.cgi?id=44254
Review: Ulrich Weigand.
Differential Revision: https://reviews.llvm.org/D75014
Summary:
The argument that sets the prefetch type of a prefetch intrinsic must
be an immediate value.
Reviewers: andwar, sdesmalen, efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75482
Use MIOperand in collectLocalKilledOperands to make the search
global, as we already have to search for global uses too. This
allows us to delete more dead code when tail predicating.
Differential Revision: https://reviews.llvm.org/D75167
In RDA, check against the already decided dead instructions when
looking at users. This allows an instruction to be removed if it
has multiple users, but they're all dead.
This means that IT instructions can be considered killed once all
the itstate using instructions are dead.
Differential Revision: https://reviews.llvm.org/D75245
The incoming back chain slot was implicitly allocated whenever a GPR was
saved in SystemZFrameLowering::getRegSpillOffset(), but in cases where no
GPRs were saved/restored this did not take effect.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D75367
Forming subtract with overflow is beneficial on SystemZ, just like additions.
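For reference, a minimal IR example (my own, not taken from the patch's tests) of the pattern that can now form a subtract-with-overflow:
declare { i32, i1 } @llvm.ssub.with.overflow.i32(i32, i32)
define i1 @sub_overflows(i32 %a, i32 %b) {
  %res = call { i32, i1 } @llvm.ssub.with.overflow.i32(i32 %a, i32 %b)
  %ov = extractvalue { i32, i1 } %res, 1
  ret i1 %ov
}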
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D75290
Summary:
The LDRdPtr expanded from LDWRdPtr shouldn't define its second operand (SrcReg).
The second operand is its source register.
Adding -verify-machineinstrs to the command line of the test cases triggers this error.
Reviewers: dylanmckay
Reviewed By: dylanmckay
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75437
Summary:
It is not safe for ARMConstantIslands to undoLRSpillRestore. PrologEpilogInserter is
the one that ensures stack alignment, taking into consideration whether LR is spilled or not.
For a noreturn function with StackAlignment 8 (the function contains a call/alloca),
undoLRSpillRestore causes the stack to be misaligned. Fixing stack alignment in
ARMConstantIslands doesn't give us much benefit, as undoing the LR spill/restore only
occurs in large functions that contain only near branches and have no callee-saved LR spill.
Reviewers: t.p.northover, rengolin, efriedma, apazos, samparker, ostannard
Reviewed By: ostannard
Subscribers: dmgreen, ostannard, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75288
X86 has several instructions which are documented as enabling interrupts exactly one instruction *after* the one which changes the SS segment register. Inserting a nop between these two instructions allows an interrupt to arrive before the execution of the following instruction, which changes the semantic behaviour.
The list of instructions is documented in "Table 24-3. Format of Interruptibility State" in Volume 3c of the Intel manual. They basically all come down to different ways to write to the SS register.
Differential Revision: https://reviews.llvm.org/D75359
CFI instructions can only safely be outlined when the outlined call is a tail
call, or when the outlined frame is fixed up.
For the sake of correctness, disable outlining from CFI instructions.
Add machine-outliner-cfi.mir to test this.
- Remove unnecessary includes from the headers
- Fix cppcheck definition/declaration arg mismatch warnings
- Tidyup old comments (MVT usage was removed a long time ago)
- Use SmallVector::append for repeated mask entries
This patch upstreams support for the Arm Armv8.1-M CPU Cortex-M55.
In detail, it adds support for:
- mcpu option in clang
- Arm Target Features in clang
- llvm Arm TargetParser definitions
Details of the CPU can be found here:
https://developer.arm.com/ip-products/processors/cortex-m/cortex-m55
Reviewers: chill
Reviewed By: chill
Subscribers: dmgreen, kristof.beyls, hiraditya, cfe-commits,
llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74966
Summary:
This patch adds the following LLVM IR intrinsics for SVE:
1. non-temporal gather loads
* @llvm.aarch64.sve.ldnt1.gather
* @llvm.aarch64.sve.ldnt1.gather.uxtw
* @llvm.aarch64.sve.ldnt1.gather.scalar.offset
2. non-temporal scatter stores
* @llvm.aarch64.sve.stnt1.scatter
* @llvm.aarch64.sve.stnt1.scatter.uxtw
* @llvm.aarch64.sve.stnt1.scatter.scalar.offset
These intrinsic are mapped to the corresponding SVE instructions
(example for half-words, zero-extending):
* ldnt1h { z0.s }, p0/z, [z0.s, x0]
* stnt1h { z0.s }, p0/z, [z0.s, x0]
Note that for non-temporal gathers/scatters, the SVE spec defines only
one instruction type: "vector + scalar". For this reason, we swap the
arguments when processing intrinsics that implement the "scalar +
vector" addressing mode:
* @llvm.aarch64.sve.ldnt1.gather
* @llvm.aarch64.sve.ldnt1.gather.uxtw
* @llvm.aarch64.sve.stnt1.scatter
* @llvm.aarch64.sve.stnt1.scatter.uxtw
In other words, all intrinsics for gather-loads and scatter-stores
implemented in this patch are mapped to the same load and store
instruction, respectively.
The sve2_mem_gldnt_vs multiclass (and its counterpart for scatter
stores) from SVEInstrFormats.td was split into:
* sve2_mem_gldnt_vec_vs_32_ptrs (32-bit wide base addresses)
* sve2_mem_gldnt_vec_vs_64_ptrs (64-bit wide base addresses)
This is consistent with what we did for
@llvm.aarch64.sve.ld1.scalar.offset and highlights the actual split in
the spec and the implementation.
Reviewed by: sdesmalen
Differential Revision: https://reviews.llvm.org/D74858
Summary:
These instructions convert a vector of floats to a vector of integers
of the same size, with assorted non-default rounding modes.
Implemented in IR as target-specific intrinsics, because as far as I
can see there are no matches for that functionality in the standard IR
intrinsics list.
Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75255
Summary:
These instructions make a vector of `<4 x float>` by widening every
other lane of a vector of `<8 x half>`.
I wondered about representing these using standard IR, along the lines
of a shufflevector to extract elements of the input into a `<4 x half>`
followed by an `fpext` to turn that into `<4 x float>`. But it looks as
if that would take a lot of work in isel lowering to make it match any
pattern I could sensibly write in Tablegen, and also I haven't been
able to think of any other case where that pattern might be generated
in IR, so there wouldn't be any extra code generation win from doing
it that way.
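For reference, that considered standard-IR formulation would look roughly like the sketch below (the function name and the choice of even-numbered lanes are illustrative, not taken from the patch):
define <4 x float> @widen_even_lanes(<8 x half> %v) {
  %narrow = shufflevector <8 x half> %v, <8 x half> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %wide = fpext <4 x half> %narrow to <4 x float>
  ret <4 x float> %wide
}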
Therefore, I've just used another target-specific intrinsic. We can
always change it to the other way later if anyone thinks of a good
reason.
(In order to put the intrinsic definition near similar things in
`IntrinsicsARM.td`, I've also lifted the definition of the
`MVEMXPredicated` multiclass higher up the file, without changing it.)
Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard
Reviewed By: miyuki
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75254
Summary:
The two MVE instructions that convert between v4f32 and v8f16 were
implemented as instances of the same class, with the same MC operand
list.
But that's not really appropriate, because the narrowing conversion
only partially overwrites its output register (it only has 4 f16
values to write into a vector of 8), so even when unpredicated, it
needs a $Qd_src input, a constraint tying that to the $Qd output, and
a vpred_n.
The widening conversion is better represented like any other
instruction that completely replaces its output when unpredicated: it
should have no $Qd_src operand, and instead, a vpred_r containing a
$inactive parameter. That's a better match to other similar
instructions, such as its integer analogue, the VMOVL instruction that
makes a v4i32 by sign- or zero-extending every other lane of a v8i16.
This commit brings the widening VCVT.F32.F16 into line with the other
instructions that behave like it. That means you can write isel
patterns that use it unpredicated, without having to add a pointless
undefined $QdSrc operand.
No existing code generation uses that instruction yet, so there should
be no functional change from this fix.
Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75253
Summary:
These instructions work like VMOVN (narrowing a vector of wide values
to half size, and overwriting every other lane of an output register
with the result), except that the narrowing conversion is saturating.
They come in three signedness flavours: signed to signed, unsigned to
unsigned, and signed to unsigned. All are represented in IR by a
target-specific intrinsic that takes two separate 'unsigned' flags.
Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75252
The MVE gather instructions smaller than 32 bits zero-extend the values
in the offset register, as opposed to sign-extending them. We need to
make sure that the code that we select from is suitably extended, which
this patch attempts to fix by tightening up the offset checks.
Differential Revision: https://reviews.llvm.org/D75361
Summary:
It should be a normal constant instead of a target constant.
The CMPri pattern can be matched if the constant fits into the immediate field.
Otherwise, the CMPrr pattern will be matched.
This fixed bug https://bugs.llvm.org/show_bug.cgi?id=44091.
Reviewers: dcederman, jyknight
Reviewed By: jyknight
Subscribers: jonpa, hiraditya, fedor.sergeev, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75227
Summary:
Currently the boundary-align fragment caches its size during layout
and is then relaxed, updating that size in each iteration. This
behaviour is unnecessary and ugly.
Reviewers: annita.zhang, reames, MaskRay, craig.topper, LuoYuanke, jyknight
Reviewed By: MaskRay
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75404
These AddToWorklist calls were added in 84cd968f75.
It's possible the SimplifyDemandedBits/SimplifyDemandedVectorElts
triggered CSE that deleted N. Detect that and avoid adding N
to the worklist.
Fixes PR45067.
This removes everything but int_x86_avx512_mask_vcvtph2ps_512 which provides the SAE variant, but even this can use the fpext generic if the rounding control is the default.
Differential Revision: https://reviews.llvm.org/D75162
MCObjectStreamer is more suitable for creating fragments than
X86AsmBackend; for example, the function getOrCreateDataFragment is
defined in MCObjectStreamer.
Differential Revision: https://reviews.llvm.org/D75351
When bundling is enabled, the data fragment itself has space to emit NOPs
to bundle-align instructions. This behaviour makes it impossible for
us to determine whether macro fusion really happens when emitting
instructions. In addition, the boundary-align fragment is also used to
emit NOPs to align instructions, and currently using them together sometimes
makes the code crazy.
Differential Revision: https://reviews.llvm.org/D75346
We already combine non-extending loads with broadcasts in DAG
combine. All these patterns are picking up is the aligned extload
special case. But the only lit test we have that exercises it is
using a v8i1 load for which the datalayout reports an alignment of 8. That
seems generous. So without a realistic test case I don't think
there is much value in these patterns.
As a narrow stopgap for the assertion failure described in PR45025, add
a describeLoadedValue override to ARMBaseInstrInfo and use it to detect
copies in which the forwarding reg is a super/sub reg of the copy
destination. For the moment this is unsupported.
Several follow ups are possible:
1) Handle VORRq. At the moment, we do not, because isCopyInstrImpl
returns early when !MI.isMoveReg().
2) In the case where forwarding reg is a super-reg of the copy
destination, we should be able to describe the forwarding reg as a
subreg within the copy destination. I'm not 100% sure about this, but
it looks like that's what's done in AArch64InstrInfo.
3) In the case where the forwarding reg is a sub-reg of the copy
destination, maybe we could describe the forwarding reg using the
copy destination and a DW_OP_LLVM_fragment (I guess this should be
possible after D75036).
https://bugs.llvm.org/show_bug.cgi?id=45025
rdar://59772698
Differential Revision: https://reviews.llvm.org/D75273
Summary:
pickNodeBidirectional tried to compare the best top candidate and the
best bottom candidate by examining TopCand.Reason and BotCand.Reason.
This is unsound because, after calling pickNodeFromQueue, Cand.Reason
does not reflect the most important reason why Cand was chosen. Rather
it reflects the most recent reason why it beat some other potential
candidate, which could have been for some low priority tie breaker
reason.
I have seen this cause problems where TopCand is a good candidate, but
because TopCand.Reason is ORDER (which is very low priority) it is
repeatedly ignored in favour of a mediocre BotCand. This is not how
bidirectional scheduling is supposed to work.
To fix this I changed the code to always compare TopCand and BotCand
directly, like the generic implementation of pickNodeBidirectional does.
This removes some uncommented AMDGPU-specific logic; if this logic turns
out to be important then perhaps it could be moved into an override of
tryCandidate instead.
Graphics shader benchmarking on gfx10 shows a lot more positive than
negative effects from this change.
Reviewers: arsenm, tstellar, rampitec, kzhuravl, vpykhtin, dstuttard, tpr, atrick, MatzeB
Subscribers: jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68338
Summary:
Final patch in series to fix inlining between functions with different
nobuiltin attributes/options, which was specifically an issue in LTO.
See discussion on D61634 for background.
The prior patch in this series (D67923) enabled per-Function TLI
construction that identified the nobuiltin attributes.
Here I have allowed inlining to proceed if the callee's nobuiltins are a
subset of the caller's nobuiltins, but not in the reverse case, which
should be conservatively correct. This is controlled by a new option,
-inline-caller-superset-nobuiltin, which is enabled by default.
Reviewers: hfinkel, gchatelet, chandlerc, davidxl
Subscribers: arsenm, jvesely, nhaehnle, mehdi_amini, eraman, hiraditya, haicheng, dexonsmith, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74162
In some cases when HexagonTargetLowering::allowsMemoryAccess returned
true, it did not set the "Fast" argument, leaving it uninitialized.
[Hexagon] Improve casting of boolean HVX vectors to scalars
- Mark memory access for bool vectors as disallowed in target lowering.
This will prevent combining bitcasts of bool vectors with stores.
- Replace the actual bitcasting code with a faster version.
- Handle casting of v16i1 to i16.
This adds extra patterns for the VMLAS MVE instruction, which performs
Qda = Qda * Qn + Rm, a similar pattern to the existing VMLA. The sinking
of splat(Rm) into the loop is already performed, meaning we just need
extra Pat's in tablegen.
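Roughly, the IR shape being matched looks like this sketch (names are made up, and in practice the splat is sunk into a loop body):
define <4 x i32> @vmlas_shape(<4 x i32> %qda, <4 x i32> %qn, i32 %rm) {
  %ins = insertelement <4 x i32> undef, i32 %rm, i32 0
  %splat = shufflevector <4 x i32> %ins, <4 x i32> undef, <4 x i32> zeroinitializer
  %mul = mul <4 x i32> %qda, %qn
  %add = add <4 x i32> %mul, %splat
  ret <4 x i32> %add
}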
Differential Revision: https://reviews.llvm.org/D75115
When running under LTO, it is common to not specify the architecture
spec, which is used for setting up the target machine, and instead rely
on features specified in each function to generate the correct
instructions.
This works for the code generator, but the RISC-V backend uses the
AsmPrinter to do instruction compression, which does not see these
features but instead uses a MCSubtargetInfo object to see whether
compression is enabled. Since this is configured based on the
TargetMachine at startup, it will result in compressed instructions not
being emitted when it has not been given the 'c' TargetFeature, but the
function has it.
This changes the RISCVAsmPrinter to re-initialize the STI feature set
based on the current MachineFunction, such that compressed instructions
are now correctly emitted regardless of the method used to enable them.
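As an illustrative sketch of the situation (not taken from the patch's tests), under LTO the compression extension may only be visible as a per-function attribute rather than on the TargetMachine:
define void @uses_compression() #0 {
  ret void
}
attributes #0 = { "target-features"="+c" }
With this change, the AsmPrinter picks up "+c" from the function's features rather than from the startup-time subtarget.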
Differential revision: https://reviews.llvm.org/D73339
Add ELF relocations for the following fixups:
fixup_thumb_adr_pcrel_10 -> R_ARM_THM_PC8
fixup_thumb_cp -> R_ARM_THM_PC8
fixup_t2_adr_pcrel_12 -> R_ARM_THM_PREL_11_0
fixup_t2_ldst_pcrel_12 -> R_ARM_THM_PC12
While these relocations are short-ranged, there is support in the open
source ELF linkers in binutils and soon in LLD. MC will no longer
resolve pc-relative fixups to global symbols due to interpositioning
concerns. We can handle these at link time by implementing the relocations.
The R_ARM_THM_PC8 has some extra encoding rules for addends that llvm-mc
sidesteps by not supporting addends for these instructions, using the wide
Thumb 2 instruction if it is available. I think that this is a reasonable
compromise given that these are rare.
This partially reverts D72892; the Thumb fixups no longer need to be
evaluated at assembly time.
Differential Revision: https://reviews.llvm.org/D75039
Ensure that we're recording implicit defs, as well as visiting implicit
uses and implicit defs when we're walking through operands.
Differential Revision: https://reviews.llvm.org/D75185
Support the explicit wide assembler qualifier for the dmb/dsb/isb synchronization barrier instructions.
Differential revision: https://reviews.llvm.org/D75143
The code changes here are hopefully straightforward:
1. Use MachineInstruction flags to decide if FP ops can be reassociated
(use both "reassoc" and "nsz" to be consistent with IR transforms;
we probably don't need "nsz", but that's a safer interpretation of
the FMF).
2. Check that both nodes allow reassociation to change instructions.
This is a stronger requirement than we've usually implemented in
IR/DAG, but this is needed to solve the motivating bug (see below),
and it seems unlikely to impede optimization at this late stage.
3. Intersect/propagate MachineIR flags to enable further reassociation
in MachineCombiner.
We managed to make MachineCombiner flexible enough that no changes are
needed to that pass itself. So this patch should only affect x86
(assuming no other targets have implemented the hooks using MachineIR
flags yet).
The motivating example in PR43609 is another case of fast-math transforms
interacting badly with special FP ops created during lowering:
https://bugs.llvm.org/show_bug.cgi?id=43609
The special fadd ops used for converting int to FP assume that they will
not be altered, so those are created without FMF.
However, the MachineCombiner pass was being enabled for FP ops using the
global/function-level TargetOption for "UnsafeFPMath". We managed to run
instruction/node-level FMF all the way down to MachineIR sometime in the
last 1-2 years though, so we can do better now.
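As a minimal made-up example, reassociation is now keyed off per-instruction fast-math flags like these rather than a global unsafe-math option:
define float @reassoc_chain(float %a, float %b, float %c, float %d) {
  %t0 = fadd reassoc nsz float %a, %b
  %t1 = fadd reassoc nsz float %t0, %c
  %t2 = fadd reassoc nsz float %t1, %d
  ret float %t2
}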
The test diffs require some explanation:
1. llvm/test/CodeGen/X86/fmf-flags.ll - no target option for unsafe math was
specified here, so MachineCombiner kicks in where it did not previously;
to make it behave consistently, we need to specify a CPU schedule model,
so use the default model, and there are no code diffs.
2. llvm/test/CodeGen/X86/machine-combiner.ll - replace the target option for
unsafe math with the equivalent IR-level flags, and there are no code diffs;
we can't remove the NaN/nsz options because those are still used to drive
x86 fmin/fmax codegen (special SDAG opcodes).
3. llvm/test/CodeGen/X86/pow.ll - similar to #1
4. llvm/test/CodeGen/X86/sqrt-fastmath.ll - similar to #1, but MachineCombiner
does some reassociation of the estimate sequence ops; presumably these are
perf wins based on latency/throughput (and we get some reduction of move
instructions too); I'm not sure how it affects numerical accuracy, but the
test reflects reality better now because we would expect MachineCombiner to
be enabled if the IR was generated via something like "-ffast-math" with clang.
5. llvm/test/CodeGen/X86/vec_int_to_fp.ll - this is the test added to model PR43609;
the fadds are not reassociated now, so we should get the expected results.
6. llvm/test/CodeGen/X86/vector-reduce-fadd-fast.ll - similar to #1
7. llvm/test/CodeGen/X86/vector-reduce-fmul-fast.ll - similar to #1
Differential Revision: https://reviews.llvm.org/D74851
This tries to improve the accuracy of extract/insert element costs by accounting for subvector extraction/insertion for >128-bit vectors and the shuffling of elements to/from the 0'th index.
It also adds INSERTPS for f32 types and PINSR/PEXTR costs for integer types (at the moment we assume the same cost as MOVD/MOVQ - which isn't always true).
Differential Revision: https://reviews.llvm.org/D74976
This addresses the issues P8198 and P8199 (from D73534).
The methods were not handling bundles properly.
Differential Revision: https://reviews.llvm.org/D74904
Summary:
The following intrinsics are added:
* @llvm.aarch64.sve.ldff1.gather
* @llvm.aarch64.sve.ldff1.gather.index
* @llvm.aarch64.sve.ldff1.gather_sxtw
* @llvm.aarch64.sve.ldff1.gather.uxtw
* @llvm.aarch64.sve.ldff1.gather_sxtw.index
* @llvm.aarch64.sve.ldff1.gather.uxtw.index
* @llvm.aarch64.sve.ldff1.gather.scalar.offset
Although this patch is quite substantial, the vast majority of the
implementation is just a 'copy & paste' of the implementation of regular
gather loads, including tests. There's only a handful of new
definitions:
* AArch64ISD nodes defined in AArch64ISelLowering.h (e.g. GLDFF1)
* Selection DAG types in AArch64SVEInstrInfo.td (e.g.
AArch64ldff1_gather)
* intrinsics in IntrinsicsAArch64.td (e.g. aarch64_sve_ldff1_gather)
* Pseudo instructions in SVEInstrFormats.td to work around the issue of
use-before-def for the FFR register.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D75128
Under fp16 we optimise the bitcast between a VMOVhr and a CopyToReg via
custom lowering. This rewrites that to be a DAG combine instead, which
helps produce better code in the cases where the bitcast is actaully
legal.
Differential Revision: https://reviews.llvm.org/D72753
MC currently does not emit these relocation types, and lld does not
handle them. Add FKF_Constant as a work-around of some ARM code after
D72197. Eventually we probably should implement these relocation types.
By Fangrui Song!
Differential revision: https://reviews.llvm.org/D72892
In some cases Clang does not perform merging of instructions AND and TST (aka
ANDS xzr).
Example:
tst x2, x1
and x3, x2, x1
to:
ands x3, x2, x1
This patch adds such merging during instruction selection: when AND is replaced
with an ANDS instruction in LowerSELECT_CC, all users of the AND should also be
changed to use this ANDS instruction.
Short discussion on mailing list:
http://llvm.1065342.n5.nabble.com/llvm-dev-ARM-Peephole-optimization-instructions-tst-add-tp133109.html
Patch by Pavel Kosov.
Differential Revision: https://reviews.llvm.org/D71701
Previously this code was called in two ways: either a FrameIndexSDNode
was passed in StackSlot, or a load node was passed in the argument
called StackSlot. This was determined by a dyn_cast to FrameIndexSDNode.
In the case of a load, we had to go find the real pointer from
operand 0 and cast the node to MemSDNode to find the pointer info.
For the stack slot case, the code assumed that the stack slot
was perfectly aligned despite not being the creator of the slot.
This commit modifies the interface to make the caller responsible
for passing all of the required information to avoid all the
guess work and reverse engineering.
I'm not aware of any issues with the original code after an
earlier commit to fix the alignment of one of the stack objects.
This is just clean up to make the code less surprising.
Since all types <32b on gpr end up being assigned gpr32 regclasses, we can end
up with PHIs here which try to select between a gpr32 and an fpr16. Ideally RBS
shouldn't be selecting heterogeneous regbanks for operands if possible, but we
still need to be able to deal with it here.
To fix this, if we have a gpr-bank operand < 32b in size and at least one other
operand is on the fpr bank, then we add cross-bank copies to homogenize the
operand banks. For simplicity the bank that we choose to settle on is whatever
bank the def operand has. For example:
%endbb:
%dst:gpr(s16) = G_PHI %in1:gpr(s16), %bb1, %in2:fpr(s16), %bb2
=>
%bb2:
...
%in2_copy:gpr(s16) = COPY %in2:fpr(s16)
...
%endbb:
%dst:gpr(s16) = G_PHI %in1:gpr(s16), %bb1, %in2_copy:gpr(s16), %bb2
Differential Revision: https://reviews.llvm.org/D75086
Summary: ParsingInlineAsm was a misleading name. These values are only set for MS-style inline assembly.
Reviewed By: rnk
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75198
This is a small pet peeve from me. This change makes sure the AVR backend uses
the correct private label prefix (.L) so that private labels are hidden in
avr-objdump.
Example code:
define i8 @foo(i1 %cond) {
br i1 %cond, label %then, label %else
then:
ret i8 3
else:
ret i8 5
}
When compiling this:
llc -march=avr -filetype=obj -o test.o test.ll
and then dumping it:
avr-objdump -d test.o
You would previously get an ugly temporary label:
00000000 <foo>:
0: 81 70 andi r24, 0x01 ; 1
2: 80 30 cpi r24, 0x00 ; 0
4: f9 f3 breq .-2 ; 0x4 <foo+0x4>
6: 83 e0 ldi r24, 0x03 ; 3
8: 08 95 ret
0000000a <LBB0_2>:
a: 85 e0 ldi r24, 0x05 ; 5
c: 08 95 ret
This patch fixes that, the output is now:
00000000 <foo>:
0: 81 70 andi r24, 0x01 ; 1
2: 80 30 cpi r24, 0x00 ; 0
4: 01 f0 breq .+0 ; 0x6 <foo+0x6>
6: 83 e0 ldi r24, 0x03 ; 3
8: 08 95 ret
a: 85 e0 ldi r24, 0x05 ; 5
c: 08 95 ret
Note that as you can see the breq operand is different. However it is
still the same after linking:
4: 11 f0 breq .+4
Differential Revision: https://reviews.llvm.org/D75124
Adjusting by 2 breaks DWARF output. With this fix, programs start to
compile and produce valid DWARF output.
Differential Revision: https://reviews.llvm.org/D74213
- Mark memory access for bool vectors as disallowed in target lowering.
This will prevent combining bitcasts of bool vectors with stores.
- Replace the actual bitcasting code with a faster version.
- Handle casting of v16i1 to i16.
Summary:
This commit adds the predicated MVE intrinsics for the same set of
unary operations that I added in their unpredicated forms in
* D74333 (vrint)
* D74334 (vrev)
* D74335 (vclz, vcls)
* D74336 (vmovl)
* D74337 (vmovn)
but since the predicated versions are a lot more similar to each
other, I've kept them all together in a single big patch. Everything
here is done in the standard way we've been doing other predicated
operations: an IR intrinsic called `@llvm.arm.mve.foo.predicated` and
some isel rules that match that alongside whatever they accept for the
unpredicated version of the same instruction.
In order to write the isel rules conveniently, I've refactored the
existing isel rules for the affected instructions into multiclasses
parametrised by a vector-type class, in the usual way. All those
refactorings are intended to leave the existing isel rules unchanged:
the only difference should be that new ones for the predicated
intrinsics are introduced.
The only tiny infrastructure change I needed in this commit was to
change the implementation of `IntrinsicMX` in `arm_mve_defs.td` so
that the records it defines are anonymous rather than named (and use
`NameOverride` to set the output intrinsic name), which allows me to
call it twice in two multiclasses with the same `NAME` without a
tablegen-time error.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75165
Allow all ExternalSymbolSDNodes on AIX, and rely on linker errors to find
symbols for which we don't have definitions in any library/compiler-rt.
Differential Revision: https://reviews.llvm.org/D75075
Summary:
The old code made some incorrect assumptions about the order in which
basic blocks are laid out in a function. This could lead to incorrect
early-exits, especially when kills occurred inside of loops.
The new approach is to check whether the point where the conditional
kill occurs dominates all reachable code. If that is the case, there
cannot be any other threads in the wave that are waiting to rejoin
at a later point in the CFG, i.e. if exec=0 at that point, then all
threads really are dead and we can exit the wave.
Make some other minor cleanups to the pass while we're at it.
v2: preserve the dominator tree
Reviewers: arsenm, cdevadas, foad, critson
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74908
Change-Id: Ia0d2b113ac944ad642d1c622b6da1b20aa1aabcc
Summary:
Implements the following intrinsics:
- @llvm.aarch64.sve.bdep.x
- @llvm.aarch64.sve.bext.x
- @llvm.aarch64.sve.bgrp.x
- @llvm.aarch64.sve.tbl2
- @llvm.aarch64.sve.tbx
The SelectTableSVE2 function in this patch is used to select the TBL2
intrinsic & ensures that the vector registers allocated are consecutive.
Reviewers: sdesmalen, andwar, dancgr, cameron.mcinally, efriedma, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74912
Add getUniqueReachingMIDef to RDA which performs a global search for
a machine instruction that produces a unique definition of a given
register at a given point. Also add two helper functions
(getMIOperand) that wrap around this functionality to get the
incoming definition uses of a given instruction. These now replace
the uses of getReachingMIDef in ARMLowOverheadLoops. getReachingMIDef
has been renamed to getReachingLocalMIDef and has been made private
along with getInstFromId.
Differential Revision: https://reviews.llvm.org/D74605
Summary:
The patch D62993 (`[PowerPC] Emit scalar min/max instructions with unsafe fp math`)
modified the behaviour when `Subtarget.hasP9Vector() && (!HasNoInfs || !HasNoNaNs)`;
this modification was not intended.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D74701
Fixed an issue exposed by D74006.
In clang cc1as, MCContext::UseNamesOnTempLabels is true.
When parsing a .fnstart directive, FnStart gets redefined to a temporary symbol of a different name (.Ltmp0, .Ltmp1, ...).
MCContext::getELFSection() called by SwitchToEHSection() will create a different .ARM.exidx each time.
llvm-mc uses `Ctx.setUseNamesOnTempLabels(false);` and FnStart is unnamed.
MCContext::getELFSection() called by SwitchToEHSection() will reuse the same .ARM.exidx .
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75095
This node reads the rounding control which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order.
This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've updated all in-tree targets to connect their chain through their lowering code.
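For reference, the node in question is the one produced for the rounding-mode query intrinsic; a minimal IR example (mine, not from the patch):
declare i32 @llvm.flt.rounds()
define i32 @current_rounding_mode() {
  %mode = call i32 @llvm.flt.rounds()
  ret i32 %mode
}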
Differential Revision: https://reviews.llvm.org/D75132
This reverts commit eee22ec3c3.
This is not the correct fix, the root cause seems to be a bug in the
stage1 host clang compiler. See https://reviews.llvm.org/D75091 for more
discussion.
Summary:
Removes patterns that were not doing useful work, changes the
default extract instructions to be the unsigned versions now that
they are enabled by default, fixes PR44988, and adds tests for
sext_inreg lowering.
Reviewers: aheejin
Reviewed By: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75005
Summary:
Future patches will make use of TTI to perform cost-model-driven `SCEVExpander::isHighCostExpansionHelper()`
This is a fully NFC patch to make things reviewable.
Reviewers: reames, mkazantsev, wmi, sanjoy
Reviewed By: mkazantsev
Subscribers: hiraditya, zzheng, javed.absar, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73704
Summary:
Implement the DWARF register mapping described in
llvm/docs/AMDGPUUsage.rst
This is currently limited to wave64 VGPRs/AGPRs.
This also includes some minor changes in AMDGPUInstPrinter,
AMDGPUMCTargetDesc, and AMDGPUAsmParser to make generating CFI assembly
text and ELF sections possible to ease testing, although complete CFI
support is not yet implemented.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74915
Summary:
This patch renames functions and TableGen classes for SVE gathers and
scatters. The original names implied that the corresponding
methods/classes are only suited for regular gathers/scatters (i.e. LD1
and ST1), which is not the case. Indeed, we will be re-using them for
non-temporal and first-faulting gathers/scatters in the forthcoming
patches. The new names also highlight the split into Vector-Scalar (VS)
and Scalar-Vector (SV) cases.
List of changes:
* `performLD1GatherCombine` and `performST1ScatterCombine` are renamed
as `performGatherLoadCombine` and `performScatterStoreCombine`,
respectively.
* Selection DAG types for scatters and gathers from
AArch64SVEInstrInfo.td are renamed. For example, `SDT_AArch64_GLD1` is
renamed as `SDT_AArch64_GATHER_SV`. SV stands for Scalar-Vector, as
opposed to Vector-Scalar (VS).
* The intrinsic classes from IntrinsicsAArch64.td are renamed. For
example, `AdvSIMD_GatherLoad_64bitOffset_Intrinsic` is renamed as
`AdvSIMD_GatherLoad_SV_64b_Offsets_Intrinsic`.
* Updated comments in `performGatherLoadCombine` and
`performScatterStoreCombine`.
Reviewers: sdesmalen, rengolin, efriedma
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75035
Summary:
Implements the following intrinsics:
* llvm.aarch64.sve.convert.to.svbool
* llvm.aarch64.sve.convert.from.svbool
For converting the ACLE svbool_t type (<n x 16 x i1>) to and from the
other predicate types: <n x 8 x i1>, <n x 4 x i1> and <n x 2 x i1>.
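A minimal sketch of the expected use; the mangled intrinsic name below is my reading of the overload and should be treated as illustrative:
declare <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1>)
define <vscale x 16 x i1> @to_svbool(<vscale x 4 x i1> %p) {
  %r = call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1> %p)
  ret <vscale x 16 x i1> %r
}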
Reviewers: sdesmalen, kmclaughlin, efriedma, dancgr, rengolin
Reviewed By: sdesmalen, efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74471
Instead add it when we make the machine nodes during instruction
selections.
This makes this ISD node closer to ISD::MGATHER. Trying to see
if we can remove the X86 specific ones.
The current set of custom combines is only really useful after
legalization, so move them there. There is a lot of overlap in the
boilerplate here, but I think we do want a pretty different set of
combines before and after legalize. I think we will want a lot of
overlap between the post-legalize and a post-regbankselect combiner.
This is explicitly guaranteed in ARMARM. And it makes reasoning about
vectors easier: we can assume that if a vector operation is legal, the
corresponding scalar operation is also legal.
Differential Revision: https://reviews.llvm.org/D74993
Previously we emitted an fmadd and a fmadd+fneg and combined them with a shufflevector. But this doesn't follow the correct exception behavior for unselected elements so the backend can't merge them into the fmaddsub/fmsubadd instructions.
This patch restores the fmaddsub intrinsics so we don't have two arithmetic operations. We lose out on optimization opportunity in the non-strict FP case, but I don't think this is a big loss. If someone gives us a test case we can look into adding instcombine/dagcombine improvements. I'd rather not have the frontend do completely different things for strict and non-strict.
This still has problems because target specific intrinsics don't support strict semantics yet. We also still have all of the problems with masking. But we at least generate the right instruction in constrained mode now.
Differential Revision: https://reviews.llvm.org/D74268
GCC 9.2 seems to incorrectly issue a warning about an out-of-bounds
access. This situation should not happen in any way.
Differential Revision: https://reviews.llvm.org/D75071
Simply by implementing a few functions I was able to correctly
disassemble a much larger number of instructions.
Differential Revision: https://reviews.llvm.org/D74045
Not all operands are correctly disassembled at the moment. This means
that some machine instructions won't have all the necessary operands
set.
To avoid asserting, print an error instead until the necessary support
has been implemented.
Differential Revision: https://reviews.llvm.org/D73958
A number of multiplication instructions (muls, mulsu, fmul, fmuls,
fmulsu) had the wrong register class for an operand. This resulted in
the wrong register being used for the instruction.
Example:
target datalayout = "e-P1-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
target triple = "avr-atmel-none"
define i16 @sliceAppend(i16, i16, i16, i16, i16, i16) addrspace(1) {
%d = mul i16 %0, %5
ret i16 %d
}
The first instruction would be muls r24, r31 before this patch. The r31
should have been r15 if you look at the intermediate forms during
instruction selection / register allocation, but the generated
instruction uses r31. After this patch, an extra movw is inserted to get
%5 in range for muls.
To make sure this bug is fixed everywhere, I checked all instructions
and found that most multiplication instructions suffered from this bug,
which I have fixed with this patch. No other instructions appear to be
affected.
Differential Revision: https://reviews.llvm.org/D74281
Summary:
The function descriptor csect on AIX should be 4-byte aligned instead of 1-byte aligned.
Reviewer: daltenty
Differential Revision: https://reviews.llvm.org/D74974
I'm hoping to begin improving shuffle combining across different vector sizes, but before that we must ensure that all existing getTargetShuffleInputs calls bail if the inputs aren't the same size.
Extends the existing support for spilling and restoring the condition
register to the linkage area for 32-bit targets, and enables for AIX.
Differential Revision: https://reviews.llvm.org/D74349
D74976 will handle larger vector types, but since SLM doesn't support AVX+ then we will always be extracting from 128-bit vectors so don't need to scale the cost.
This adds infrastructure to print and parse MIR MachineOperand comments.
The motivation for the ARM backend is to print condition code names instead of
magic constants that are difficult to read (for human beings). For example,
instead of this:
dead renamable $r2, $cpsr = tEOR killed renamable $r2, renamable $r1, 14, $noreg
t2Bcc %bb.4, 0, killed $cpsr
we now print this:
dead renamable $r2, $cpsr = tEOR killed renamable $r2, renamable $r1, 14 /* CC::always */, $noreg
t2Bcc %bb.4, 0 /* CC:eq */, killed $cpsr
This shows that MachineOperand comments are enclosed between /* and */. In this
example, the EOR instruction is not conditionally executed (i.e. it is "always
executed"), which is encoded by the 14 immediate machine operand. Thus, now
this machine operand has /* CC::always */ as a comment. The 0 on the next
conditional branch instruction represents the equal condition code, thus now
this operand has /* CC:eq */ as a comment.
As it is a comment, the MI lexer/parser completely ignores it. The benefit is
that this keeps the change in the lexer extremely minimal and no target
specific parsing needs to be done. The changes on the MIPrinter side are also
minimal, as there is only one target hook that is used to create the machine
operand comments.
Differential Revision: https://reviews.llvm.org/D74306
Summary:
Implements the @llvm.aarch64.sve.dupq.lane intrinsic.
As specified in the ACLE, the behaviour of:
svdupq_lane_u64(data, index)
...is identical to:
svtbl(data, svadd_x(svptrue_b64(),
svand_x(svptrue_b64(), svindex_u64(0, 1), 1),
index * 2))
If the index is in the range [0,3], the operation is equivalent
to a single DUP (.q) instruction.
Reviewers: sdesmalen, c-rhodes, cameron.mcinally, efriedma, dancgr, rengolin
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74734
Change the way that we remove the redundant iteration count code in
the presence of IT blocks. collectLocalKilledOperands has been
introduced to scan an instruction's operands, collecting the killed
instructions and then visiting them too. This is used to delete the
code in the preheader which calculates the iteration count. We also
track any IT blocks within the preheader and, if we remove all the
instructions from the IT block, we also remove the IT instruction.
isSafeToRemove is used to remove any redundant uses of the iteration
count within the loop body.
Differential Revision: https://reviews.llvm.org/D74975
Summary:
The type used to represent functional units in MC is
'unsigned', which is 32 bits wide. This is currently
not a problem in any upstream target as no one seems
to have hit the limit on this yet, but in our
downstream one, we need to define more than 32
functional units.
Increasing the size does not seem to cause a huge
size increase in the binary (an llc debug build went
from 1366497672 to 1366523984, a difference of 26k),
so perhaps it would be acceptable to have this patch
applied upstream as well.
Subscribers: hiraditya, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71210
The gather intrinsics use a floating point mask when the result
type is FP. But we call DemandedBits on the mask assuming it's an
integer type. We also use integer types when we create it from
generic IR. So add a bitcast to the intrinsic path to guarantee
the integer type.
The type profile we use for the isel patterns lied about how
many operands the gather/scatter node has to skip the index
and scale operands. This allowed us to expand the baseptr
operand into base, displacement, and segment and then merge
the index and scale with them in the final instruction during
isel. This is kind of a hack that relies on isel not checking the
number of operands at all.
This commit switches to custom isel where we can manage this
directly without relying on holes in the isel checking.
Leave the gather/scatter subclasses, but make them inherit from
MemIntrinsicSDNode and delete their constructor and destructor.
This way we can still have the getIndex, getMask, etc. convenience
functions.
In order to build the Linux kernel, the back chain must be supported with
packed-stack. The back chain is then stored topmost in the register save
area.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D74506
If a simplification occurs, the operand will be added to the worklist.
But since the demanded mask was based on N, we need to make sure
we revisit N in case there are more simplifications to be done.
Returning SDValue(N, 0), as we do, only tells DAG combine that
something changed, but that won't make it add anything to the
worklist.
Found while playing around with using VEXTRACT_STORE in more cases.
But I guess this doesn't affect any of our existing tests.
We can use MOVLPS which will load 64 bits, but we need a v4f32
result type. We already have isel patterns for this.
The code here is a little hacky. We can probably improve it with
more isel patterns.
This is similar to using movd which we do for sse2 targets.
I've added a DAG combine for VEXTRACT_STORE to use SimplifyDemandedVectorElts
to clean up some artifacts from type legalization.
Similar to what we do for other operations that use a subset of bits.
Allows us to remove a pattern that shrinks a load. Which was
incorrect if the load was volatile.
Summary:
We already sorted the blocks when fixing up a set of mutual
loop entries, however, there can be multiple sets of such
mutual loop entries, and the order we encounter them
should not be random, so sort them too.
Fixes https://bugs.llvm.org/show_bug.cgi?id=44982
Patch by Alon Zakai (kripken)
Reviewers: aheejin, sbc100, dschuff
Subscribers: mgrang, sunfish, hiraditya, jgravelle-google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74999
This reverts commit 977cd661cf.
It breaks OpenCL testing. OpenCL Runtime is using PT_LOAD information
to calculate memory for global variables. This commit should be relanded once
the OpenCL runtime stops relying on PT_LOAD information for calculating global
variable memory size.
Differential Revision: https://reviews.llvm.org/D74995
Add support for DestructiveBinaryComm DestructiveInstType, as well as the lowering code to expand the new Pseudos into the final movprfx+instruction pairs.
Differential Revision: https://reviews.llvm.org/D73711
This moves all the logic of converting LLVM Triples to
MachO::CPU_(SUB_)TYPE from the specific target (Target)AsmBackend to
more convenient functions in lib/BinaryFormat.
This also gets rid of the separate two X86AsmBackend classes.
The previous attempt was to add it to libObject, but that adds an
unnecessary dependency to libObject from all the targets.
Differential Revision: https://reviews.llvm.org/D74808
isPrefix was added to support the patches to align branches.
It relies on a switch over instruction names.
This moves those opcodes to a new format so the information is
in tablegen and we can just check for a specific value in some
bits in TSFlags instead.
I've left the other function in place for now so that the
existing patches in phabricator will still work. I'll work with
the owner to get them migrated.
Summary:
Added register + immediate and register + register addressing modes for the following intrinsics:
1. Masked load and stores:
* Sign and zero extended load and truncated stores.
* No extension or truncation.
2. Masked non-temporal load and store.
Reviewers: andwar, efriedma
Subscribers: cameron.mcinally, sdesmalen, tschuett, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74254
The legalizer helper functions are unusably awkward to perform the 3-5
part legalization. This needs to be widened, scalarized, lowered, and
we should avoid creating vector extends and truncates. Manually do all
of this and expand.
I tried to use some of the new tablegen features to avoid creating
different operand list permutations, but I still don't see a way to
programmatically build a source pattern dag.
Also add GlobalISel tests, which now all import successfully.
Some of the fneg fold tests are incorrect, which need to be fixed in a
future commit
G_SHUFFLE_VECTOR is legal since it theoretically may help match op_sel
for VOP3P instructions. Expand it in some other way in case it doesn't
fold into the use instructions.
A cost query for a vector instruction should return a cost even without
target vector support, and not trigger an assert.
VectorCombine issues such queries for vectors that appear directly in the source code.
Review: Ulrich Weigand
We don't use this, and matching from the def doesn't make much sense.
There are multiple tablegen bugs with default operand
handling. undef_tied_input should work to handle the vdst_in
correctly, but this breaks the operand register class constraint which
it should be able to infer.
We should try the generated matchers before the manual selection. This
means the patterns are now handling the common cases, but the manual
selection code is not yet dead. It's still handling the non-s32/s64
cases (like v2s16 and v2s32). Currently tablegen doesn't have a nice
way to have a single pattern that covers multiple types.
We have patterns for s_pack* selection, but they assume the inputs are
a build_vector with 16-bit inputs, not a truncating build
vector. Since there's still outstanding work for how to handle
mismatched result and source element vector operations, and since I'm
trying a different packed vector strategy than SelectionDAG, just
manually select this for now.
There are a few differences from the DAG handling. First, the DAG
handling uses a primitive selection pattern instead of custom
legalizing it. Because of this, this makes use of source modifiers
while the DAG does not.
Also instead of promoting f16, try to use the f16 log/exp. There's no
f16 fmul_legacy, so widen just for the multiply, although I'm not sure
that's the best solution.
This looked through copies to find the source modifiers, which may
have been SGPR->VGPR copies added to avoid potential constant bus
violations. Re-insert a copy to a VGPR if this happens.
Remove some cumbersome Darwin specific logic for updating the frame
offsets of the condition-register spill slots. The containing function has an
early return if the subtarget is not ELF based which makes the Darwin logic
dead.
The (overloaded) intrinsic is llvm.hexagon.V6.pred.typecast[.128B]. The
types of the operand and the return value are HVX boolean vector types.
For each cast, there needs to be a corresponding intrinsic declared,
with different suffixes appended to the name, e.g.
; cast <128 x i1> to <32 x i1>
declare <32 x i1> @llvm.hexagon.V6.pred.typecast.128B.s1(<128 x i1>)
; cast <32 x i1> to <64 x i1>
declare <64 x i1> @llvm.hexagon.V6.pred.typecast.128B.s2(<32 x i1>)
etc.
At this point in the code we know that Op1 or Op2 is
all ones. Y points to the other operand. In the case that
Op2 is zero, Op1 must be all ones and Y is Op2. The OR
ORs Y into Res. But if Y is 0 the OR will be folded away by
getNode so we don't need to check for it.
The combineSelect code was casting to i64 without any check that
i64 was legal. This can break after type legalization.
It also required splitting the mmx register on 32-bit targets.
It's not clear that this makes sense. Instead switch to using
a cmov pseudo like we do for XMM/YMM/ZMM.
VK1 was being used as the output of the copy to regclass, but it
should be VK2/VK4. Shouldn't matter in practice though since
VK1/VK2/VK4/VK8/VK16 are all identical and just have different VTs.
The code at https://reviews.llvm.org/D74808 has broken builds that are
configured with -DBUILD_SHARED_LIBS=On.
This patch adds the correct library dependencies.
The motivating case is seen in "splat4_v8f32_load_store" and based on code in PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024
(I haven't stepped through the v8i32 sibling test yet to see why that diverged.)
There are other potential improvements visible like allowing scalarization or vector
narrowing.
Differential Revision: https://reviews.llvm.org/D74909
This moves all the logic of converting LLVM Triples to
MachO::CPU_(SUB_)TYPE from the specific target (Target)AsmBackend to
more convenient functions in libObject.
This also gets rid of the separate two X86AsmBackend classes.
Differential Revision: https://reviews.llvm.org/D74808
There's a lot of old leftover code in LowerBRCOND, especially
the detection of AND or OR of X86ISD::SETCC nodes. Those were
needed before LegalizeDAG was changed to visit nodes before
their operands.
It also relied on reversing the output of LowerSETCC to find the
flags producing node to use for the X86ISD::BRCOND node.
Rather than using LowerSETCC this patch uses emitFlagsForSetcc to
handle the integer ISD::SETCC case. This gives the flag producer
and the comparison code to use directly. I've removed the addTest
flag and just produce a X86ISD::BRCOND and return immediately.
Floating point ISD::SETCC case is just an X86ISD::FCMP with special
care for OEQ and UNE derived from the previous code. I've left
f128 out so it will emit a test. And LowerSETCC will be called
later to produce a libcall and X86ISD::SETCC. We have combines
that can merge the test and X86ISD::SETCC.
We need to handle two cases for overflow ops. Either they are used
directly or they have a seteq 0 or setne 1 to invert the overflow.
The old code did not handle the setne 1 case, but I think some
other combines were making up for it.
If we fail to find a condition, we'll wrap an AND with 1 on the
original condition and tell emitFlagsForSetcc to emit a compare
with 0. This will pick up the LowerAndToBT and/or the EmitTest case.
I kept the isTruncWithZeroHighBitsInput call, but we might be able
to fold that in to emitFlagsForSetcc.
Differential Revision: https://reviews.llvm.org/D74750
Only handle power of 2 element counts for simplicity. Not sure what to do with vXf64->vXf16 fp_round to avoid double rounding.
Differential Revision: https://reviews.llvm.org/D74886
Marking a section as ALLOC tells the ELF loader to load the section into memory.
As we do not want to load the notes into VRAM, the flag should not be there.
Differential Revision: https://reviews.llvm.org/D74600
GetDemandedBits mostly just calls SimplifyMultipleUseDemandedBits now, but it does a very blunt constant simplification that SimplifyMultipleUseDemandedBits avoids.
If we need to demand bits from constants we should handle this through ShrinkDemandedConstant/targetShrinkDemandedConstant.
@arsenm confirmed that the sign extended immediates are better for code size.
Differential Revision: https://reviews.llvm.org/D74857
Summary:
This patch adds two families of ACLE intrinsics: vqdmullbq and
vqdmulltq (including vector-vector and vector-scalar variants) and the
corresponding LLVM IR intrinsics llvm.arm.mve.vqdmull and
llvm.arm.mve.vqdmull.predicated.
Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74845
Summary:
The instruction at `DefI` can sometimes be destroyed by
`rematerializeCheapDef`, so it should not be used after calling that
function. The fix is to use `Insert` instead when examining additional
multivalue stackifications. `Insert` is the address of the new
defining instruction after all moves and rematerializations have taken
place.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74875
We probably want this, and I've meant to turn this on for a long
time. SC actually emits a special case to early-out for a 1
denominator, which perhaps should also be considered.
It uses VGPR_32.RegTypes, which includes 16-bit types. As a
result DS_WRITE_B32 may be generated for "store i16", which
is a bug. The only reason we do not hit it now is relative
pattern complexity and sorting; should the DS_WRITE_B16 pattern's
complexity become higher, the bug would appear.
Differential Revision: https://reviews.llvm.org/D74868
This commit removes the artificial types <512 x i1> and <1024 x i1>
from HVX intrinsics, and makes v512i1 and v1024i1 no longer legal on
Hexagon.
It may cause existing bitcode files to become invalid.
* Converting between vector predicates and vector registers must be
done explicitly via vandvrt/vandqrt instructions (their intrinsics),
i.e. (for 64-byte mode):
%Q = call <64 x i1> @llvm.hexagon.V6.vandvrt(<16 x i32> %V, i32 -1)
%V = call <16 x i32> @llvm.hexagon.V6.vandqrt(<64 x i1> %Q, i32 -1)
The conversion intrinsics are:
declare <64 x i1> @llvm.hexagon.V6.vandvrt(<16 x i32>, i32)
declare <128 x i1> @llvm.hexagon.V6.vandvrt.128B(<32 x i32>, i32)
declare <16 x i32> @llvm.hexagon.V6.vandqrt(<64 x i1>, i32)
declare <32 x i32> @llvm.hexagon.V6.vandqrt.128B(<128 x i1>, i32)
They are all pure.
* Vector predicate values cannot be loaded/stored directly. This directly
reflects the architecture restriction. Loading and storing of vector
predicates must be done indirectly via vector registers and explicit
conversions via vandvrt/vandqrt instructions.
We only need to split after type legalization. If we're before
we can just use a wide store and type legalization will split it.
Add a v128i1 test to exercise it post type legalization.
Summary:
Some predicated MVE intrinsics return a vector with element size
different from the input vector element size. In this case the
predicate type must correspond to the output vector type.
The following intrinsics use the incorrect predicate type:
* llvm.arm.mve.mull.int.predicated
* llvm.arm.mve.mull.poly.predicated
* llvm.arm.mve.vshll.imm.predicated
This patch fixes the issue.
Reviewers: simon_tatham, dmgreen, ostannard, MarkMurrayARM
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74838
On PowerPC we will soon need to use pcrel to indicate PC Relative addressing.
Renamed the Hexagon specific variant kind to a non target specific VK so that
it can be used on both Hexagon and PowerPC.
Differential Revision: https://reviews.llvm.org/D74788
Check that no Q-regs are live out of the loop, unless the instruction
within the loop is predicated on the vctp.
Differential Revision: https://reviews.llvm.org/D72713
Similar to VADDV and VADDLV that have been added recently, this adds
lowering and patterns for VMLAV, VMLAVA, VMLALV and VMLALVA. They
perform the same roles as the add's, just folding a mul into the same
instruction (and so taking two inputs). As such, they need to be lowered
in the same way as the types are often not legal.
Differential Revision: https://reviews.llvm.org/D74390
This is part of the work to remove SelectionDAG::GetDemandedBits and just use SimplifyMultipleUseDemandedBits.
Recent experiments raised some v_cvt_f32_ubyte*_e32 regressions, so I've added some additional abilities to performCvtF32UByteNCombine to help unpack byte data more aggressively.
We still don't remove all OR(SHL,SRL) patterns as some of the regenerated nodes don't get combined again, but we are getting closer.
Differential Revision: https://reviews.llvm.org/D74786
Following on from the extra VADDV lowering, this extends things to
handle VADDLV which allows summing values into a pair of i32 registers,
together treated as a i64. This needs to be done in DAGCombine too as
the types are otherwise illegal, which is a fairly simple addition on
top of the existing code.
There is also a VADDLVA instruction handled here, that adds the incoming
values from the two general purpose registers. As opposed to the
non-long version where we could just add patterns for add(x, VADDV), the
long version needs to handle this early before the i64 has being split
into too many pieces.
Differential Revision: https://reviews.llvm.org/D74224
Custom legalize non-power-of-2 and unaligned load and store for MIPS32r5
and older, custom legalize non-power-of-2 load and store for MIPS32r6.
Don't attempt to combine non power of 2 loads or unaligned loads when
subtarget doesn't support them (MIPS32r5 and older).
Differential Revision: https://reviews.llvm.org/D74625
Improve legality checks for load and store, 4 byte scalar
load and store are now legal for all subtargets.
During regbank selection 4 byte unaligned loads and stores
for MIPS32r5 and older get mapped to gprb.
Select 4 byte unaligned loads and stores for MIPS32r5.
Fix tests that unintentionally had unaligned load or store.
Differential Revision: https://reviews.llvm.org/D74624
On some targets, like SPARC, forming overflow ops is only profitable if
the math result is used: https://godbolt.org/z/DxSmdB
This patch adds a new MathUsed parameter to allow the targets
to make the decision and defaults to only allowing it
if the math result is used. That is the conservative choice.
This patch also updates AArch64ISelLowering, X86ISelLowering,
ARMISelLowering.h, SystemZISelLowering.h to allow forming overflow
ops if the math result is not used. On those targets using the
overflow intrinsic for the overflow check only generates better code.
Reviewers: nikic, RKSimon, lebedev.ri, spatel
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D74722
We already make use of the VADDV vector reduction instruction for cases
where the input and the output start out at the same type. The MVE
instruction however will sum into an i32, so if we are summing a v16i8
into an i32, we can still use the same instructions. In terms of IR,
this looks like a sext of a legal type (v16i8) into a very illegal type
(v16i32) and a vecreduce.add of that into the result. This means we have
to catch the pattern early in a DAG combine, producing a target VADDVs/u
node, where the signedness is now important.
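Roughly, the IR the combine looks for is the following sketch (the reduction intrinsic is shown with its current name; at the time it was the experimental.vector.reduce form):
  ; reduction intrinsic shown with its current name (was experimental.vector.reduce at the time)
  %ext = sext <16 x i8> %x to <16 x i32>
  %sum = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %ext)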
This is the first part, handling VADDV and VADDVA. There are also
VADDVL/VADDVLA instructions, which are interesting because they sum into
a 64bit value. And VMLAV and VMLALV, which are interesting because they
also do a multiply of two values. It may look a little odd in places as
a result.
On its own this will probably not do very much, as the vectorizer will
not produce this IR yet.
Differential Revision: https://reviews.llvm.org/D74218
Consider large operands in G_MERGE_VALUES and G_UNMERGE_VALUES as
Ambiguous during regbank selection.
Introducing new InstType AmbiguousWithMergeOrUnmerge which will
allow us to recognize whether to narrow scalar or use s64:fprb.
This change exposed a bug when reusing data from TypeInfoForMF.
Thus, when an Instr is about to get destroyed (during narrow scalar),
clear its data in TypeInfoForMF: internal data is keyed on the
Instr's address, which will no longer be valid.
Add detailed asserts for InstType and operand size.
Generate generic instructions instead of MIPS target instructions
during argument lowering and custom legalizer.
Select G_UNMERGE_VALUES and G_MERGE_VALUES when proper banks are
selected: {s32:gprb, s32:gprb, s64:fprb} for G_UNMERGE_VALUES and
{s64:fprb, s32:gprb, s32:gprb} for G_MERGE_VALUES.
Update tests. One improvement: when a floating point argument in a
gpr (or two gprs) gets passed to another function through gprs,
unnecessary fpr-to-gpr moves are no longer generated.
Differential Revision: https://reviews.llvm.org/D74623
LowerSELECT will detect the constant inputs and convert to scalar
selects, but we can do it directly here.
I might remove some of the code from LowerSELECT and move it to
DAG combine so doing this explicitly will make us less dependent
on it happening in lowering.
Summary:
Extends the multivalue call infrastructure to tail calls, removes all
legacy calls specialized for particular result types, and removes the
CallIndirectFixup pass, since all indirect call arguments are now
fixed up directly in the post-insertion hook.
In order to keep supporting pretty-printed defs and uses in test
expectations, MCInstLower now inserts an immediate containing the
number of defs for each call and call_indirect. The InstPrinter is
updated to query this immediate if it is present and determine which
MCOperands are defs and uses accordingly.
Depends on D72902.
Reviewers: aheejin
Subscribers: dschuff, mgorny, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74192
Summary:
There is still room for improvement in the handling of multivalue
nodes in both passes, but the current algorithm is at least correct
and optimizes some simpler cases. In order to make future
optimizations of these passes easier and build confidence that the
current algorithms are correct, this CL also adds a script that
automatically and exhaustively generates interesting multivalue test
cases.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72902
Essentially, fold OrderedBasicBlock into BasicBlock, and make it
auto-invalidate the instruction ordering when new instructions are
added. Notably, we don't need to invalidate it when removing
instructions, which is helpful when a pass mostly deletes dead
instructions rather than transforming them.
The downside is that Instruction grows from 56 bytes to 64 bytes. The
resulting LLVM code is substantially simpler and automatically handles
invalidation, which makes me think that this is the right speed and size
tradeoff.
The important change is in SymbolTableTraitsImpl.h, where the numbering
is invalidated. Everything else should be straightforward.
We probably want to implement a fancier re-numbering scheme so that
local updates don't invalidate the ordering, but I plan for that to be
future work, maybe for someone else.
Reviewed By: lattner, vsk, fhahn, dexonsmith
Differential Revision: https://reviews.llvm.org/D51664
Summary:
Unlike normal calls, call_indirects have immediate arguments that
caused a MachineVerifier failure without a small tweak to loosen the
verifier's requirements for variadicOpsAreDefs instructions.
One nice thing about the new call_indirects is that they do not need
to participate in the PCALL_INDIRECT mechanism because their post-isel
hook handles moving the function pointer argument and adding the flags
and typeindex arguments itself.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74191
This reverts commit 649aba93a2, now that
the approach started there has been shown to be workable in the patch
series culminating in https://reviews.llvm.org/D74192.
Summary:
This patch adds a new MVE intrinsics family, `vbrsrq`: vector bit
reverse and shift right. The intrinsics are compiled into the VBRSR
instruction. Two new LLVM IR intrinsics were also added: arm.mve.vbrsr
and arm.mve.vbrsr.predicated.
Reviewers: simon_tatham, dmgreen, ostannard, MarkMurrayARM
Reviewed By: simon_tatham
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74721
Create preprocessor defines for callee saved floating-point register spill
slots, vector register spill slots, and both 32-bit and 64-bit general
purpose register spill slots. This is an NFC refactor to prepare for
adding ABI compliant callee saves and restores for AIX.
Implement TargetLowering callback mayBeEmittedAsTailCall for riscv in CodeGenPrepare,
which will duplicate return instructions to enable tailcall optimization.
Differential Revision: https://reviews.llvm.org/D73699
Summary:
Making `Scale` a `TypeSize` in AArch64InstrInfo::getMemOpInfo,
has the effect that all places where this information is used
(notably, TargetInstrInfo::getMemOperandWithOffset) will need
to consider Scale (and, derived from it, Offset) possibly being scalable.
This patch adds a new operand `bool &OffsetIsScalable` to
TargetInstrInfo::getMemOperandWithOffset and fixes up all
the places where this function is used, to consider the
offset possibly being scalable.
In most cases, this means bailing out because the algorithm does not
(or cannot) support scalable offsets in places where it does some
form of alias checking for example.
Reviewers: rovka, efriedma, kristof.beyls
Reviewed By: efriedma
Subscribers: wuzish, kerbowa, MatzeB, arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, javed.absar, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72758
Summary:
Codegen and tests for thread-local storage.
This implements only the general dynamic model due to limitations in nld 2.26.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D74718
This patch upstreams support for the AArch64 Armv8-A cpu Cortex-A34.
In detail adding support for:
- mcpu option in clang
- AArch64 Target Features in clang
- llvm AArch64 TargetParser definitions
details of the cpu can be found here:
https://developer.arm.com/ip-products/processors/cortex-a/cortex-a34
Reviewers: SjoerdMeijer
Reviewed By: SjoerdMeijer
Subscribers: SjoerdMeijer, kristof.beyls, hiraditya, cfe-commits,
llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74483
Change-Id: Ida101fc544ca183a0a0e61a1277c8957855fde0b
This patch enables the debug entry values feature.
- Remove the (CC1) experimental -femit-debug-entry-values option
- Enable it for x86, arm and aarch64 targets
- Resolve the test failures
- Leave the llc experimental option for targets that do not
support the CallSiteInfo yet
Differential Revision: https://reviews.llvm.org/D73534
Summary:
These are in some sense the inverse of vmovl[bt]q: they take a vector
of n wide elements and truncate each to half its width. So they only
write half a vector's worth of output data, and therefore they also
take an 'inactive' parameter to provide the other half of the data in
the output vector. So vmovnb overwrites the even lanes of 'inactive'
with the narrowed values from the main input, and vmovnt overwrites
the odd lanes.
LLVM had existing codegen which generates these MVE instructions in
response to IR that takes two vectors of wide elements, or two vectors
of narrow ones. But in this case, we have one vector of each. So my
clang codegen strategy is to narrow the input vector of wide elements
by simply reinterpreting it as the output type, and then we have two
narrow vectors and can represent the operation as a vector shuffle
that interleaves lanes from both of them.
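As a rough little-endian sketch of that strategy for vmovnb (the reinterpret and the exact lane order are assumptions for illustration only):
  ; lane order is an assumption for little-endian vmovnb
  %narrow = bitcast <4 x i32> %wide to <8 x i16>
  %out = shufflevector <8 x i16> %inactive, <8 x i16> %narrow, <8 x i32> <i32 8, i32 1, i32 10, i32 3, i32 12, i32 5, i32 14, i32 7>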
Even so, not all the cases I needed ended up being selected as a
single MVE instruction, so I've added a couple more patterns that spot
combinations of the 'MVEvmovn' and 'ARMvrev32' SDNodes which can be
generated as a VMOVN instruction with operands swapped.
This commit adds the unpredicated forms only.
Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74337
Summary:
These intrinsics take a vector of 2n elements, and return a vector of
n wider elements obtained by sign- or zero-extending every other
element of the input vector. They're represented in IR as a
shufflevector that extracts the odd or even elements of the input,
followed by a sext or zext.
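A sketch of that IR pattern, for a signed widening of the odd-numbered lanes (the element types are picked just for illustration):
  ; widen the odd-numbered i16 lanes to i32 (types illustrative)
  %odd  = shufflevector <8 x i16> %x, <8 x i16> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
  %wide = sext <4 x i16> %odd to <4 x i32>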
Existing LLVM codegen already matches this pattern and generates the
VMOVLB instruction (which widens the even-index input lanes). But no
existing isel rule was generating VMOVLT, so I've added some. However,
the new rules currently only work in little-endian MVE, because the
pattern they expect from isel lowering includes a bitconvert which
doesn't have the right semantics in big-endian.
The output of one existing codegen test is improved by those new
rules.
This commit adds the unpredicated forms only.
Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74336
Summary:
When we start putting instances of `ARMVectorRegCast` in complex isel
patterns, it will be awkward that they're often turned into the more
standard `bitconvert` in little-endian mode. We'd rather not have to
write separate isel patterns for the two endiannesses, matching
different but equivalent cast operations.
This change aims to fix that awkwardness in advance, by turning the
Tablegen record `ARMVectorRegCast` from a simple `SDNode` instance
into a `PatFrags` that can match either kind of cast – with a
predicate that prevents it matching a bitconvert in the big-endian
case, where bitconvert isn't semantically identical.
No existing code generation should be affected by this change, but it
will enable the patterns introduced by D74336 to work in both
endiannesses.
Reviewers: dmgreen
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74716
Summary:
vclzq maps nicely to the existing target-independent @llvm.ctlz IR
intrinsic. But vclsq ('count leading sign bits') has no corresponding
target-independent intrinsic, so I've made up @llvm.arm.mve.vcls.
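For example, vclzq on 32-bit lanes can be expressed with the existing generic intrinsic (element type illustrative):
  ; vclzq on 32-bit lanes, expressed with the generic ctlz intrinsic
  %r = call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> %x, i1 false)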
This commit adds the unpredicated forms only.
Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard
Reviewed By: miyuki
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74335
Summary:
This adds the unpredicated forms of six different MVE intrinsics which
all round a vector of floating-point numbers to integer values,
leaving them still in FP format, differing only in rounding mode and
exception settings.
Five of them map to existing target-independent intrinsics in LLVM IR,
such as @llvm.trunc and @llvm.rint. The sixth, mapping to the `vrintn`
instruction, is done by inventing a target-specific intrinsic.
(`vrintn` behaves the same as `vrintx` in terms of the output value:
the side effects on the FPSCR flags are the only difference between
the two. But ACLE specifies separate user-callable intrinsics for the
two, so the side effects matter enough to make sure we generate the
right one of the two instructions in each case.)
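For instance, the trunc-flavoured rounding simply becomes a call to the generic intrinsic (element type illustrative):
  ; the trunc-flavoured rounding, element type illustrative
  %r = call <4 x float> @llvm.trunc.v4f32(<4 x float> %x)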
Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard
Reviewed By: miyuki
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74333
This helps this transform occur earlier so we can fold the not
with setcc. If we delay it until after type legalization we might
have introduced instructions to widen the mask if the vselect was
widened. This can prevent the not from making it to the setcc.
We could of course add more DAG combines to handle that, but
moving this earlier is easier.
AMDGPUCodeGenPrepare expands this most of the time, but not always. We
will always at least need a fallback option here. This is the 3rd
implementation of the same expansion in the backend. Eventually I
would like to eliminate the IR expansion (and the DAG version
obviously).
Currently the new legalizer path produces a better result, since the
IR expansion results in extra operations which need to be combined
out. Notably, the IR expansion results in multiplies by 0.
mutateStrictFPToFP can delete the node and replace it with another with the same
value which can later cause problems, and returning the result of
mutateStrictFPToFP doesn't work because SelectionDAGLegalize expects that the
returned value has the same number of results as the original. Instead handle
things by doing the mutation manually.
Differential Revision: https://reviews.llvm.org/D74726
Summary:
This patch adds vector-scalar variants to the following families of
MVE intrinsics:
* vaddq
* vsubq
* vmulq
* vqaddq
* vqsubq
* vhaddq
* vhsubq
* vqdmulhq
* vqrdmulhq
The vector-scalar variants perform a splat operation on the scalar
operand and then perform the same operations as their vector-vector
counterparts. Code generation is done accordingly (using LLVM IR 'insert'
and 'shuffle' operations which are later converted into an ARMvdup
SDNode).
Reviewers: simon_tatham, dmgreen, MarkMurrayARM, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74620
D73835 will make IRBuilder no longer trivially copyable. This patch
deletes the copy constructor in advance, to separate out the breakage.
Currently, the IRBuilder copy constructor is usually used by accident,
not by intention. In rG7c362b25d7a9 I've fixed a number of cases where
functions accepted IRBuilder rather than IRBuilder &, thus performing
an unnecessary copy. In rG5f7b92b1b4d6 I've fixed cases where an
IRBuilder was copied, while an InsertPointGuard should have been used
instead.
The only non-trivial use of the copy constructor is the
getIRBForDbgInsertion() helper, for which I separated construction and
setting of the insertion point in this patch.
Differential Revision: https://reviews.llvm.org/D74693
The way fallback to SelectionDAG works is somewhat surprising to
me. When the fallback path is enabled, the entire set of SelectionDAG
selector passes is added to the pass pipeline, and each one needs to
check if the function was selected. This results in the surprising
behavior of running SIFixSGPRCopies for example, but only if
-global-isel-abort=2 is used.
SIAddIMGInitPass is also added in addInstSelector, but I'm not sure
why we have this pass or if it should be added somewhere else for
GlobalISel.
Produce an unmerge to a narrower type and introduce a narrower shift
if needed. I wasn't sure if there was a better way to parameterize the
target's preferred shift type for the GICombineRule, so manually call
the combine helper.
Summary:
This patch adds assembly-level support for a new Arm M-profile
architecture extension, Custom Datapath Extension (CDE).
A brief description of the extension is available at
https://developer.arm.com/architectures/instruction-sets/custom-instructions
The latest specification for CDE is currently a beta release and is
available at
https://static.docs.arm.com/ddi0607/aa/DDI0607A_a_armv8m_arm_supplement_cde.pdf
CDE allows chip vendors to add custom CPU instructions. The CDE
instructions re-use the same encoding space as existing coprocessor
instructions (such as MRC, MCR, CDP etc.). Each coprocessor in range
cp0-cp7 can be configured as either general purpose (GCP) or custom
datapath (CDEv1). This configuration is defined by the CPU vendor and
is provided to LLVM using 8 subtarget features: cdecp0 ... cdecp7.
The semantics of CDE instructions are implementation-defined, but the
instructions are guaranteed to be pure (that is, they are stateless,
they do not access memory or any registers except their explicit
inputs/outputs).
CDE requires the CPU to support at least Armv8.0-M mainline
architecture. CDE includes 3 sets of instructions:
* Instructions that operate on general purpose registers and NZCV
flags
* Instructions that operate on the S or D register file (require
either FP or MVE extension)
* Instructions that operate on the Q register file, require MVE
The user-facing names that can be specified on the command line are
the same as the 8 subtarget feature names. For example:
$ clang -target arm-none-none-eabi -march=armv8m.main+cdecp0+cdecp3
tells the compiler that the coprocessors 0 and 3 are configured as
CDEv1 and the remaining coprocessors are configured as GCP (which is
the default).
Reviewers: simon_tatham, ostannard, dmgreen, eli.friedman
Reviewed By: simon_tatham
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74044
While looking at the output on real sized programs, there is a lot of
extra SGPR spilling compared to the DAG path. This seems to largely be
from all constants being SGPRs in the entry block.
Summary:
This patch implements the part of the calling convention
where SVE Vectors are passed by reference. This means the
caller must allocate stack space for these objects and
pass the address to the callee.
Reviewers: efriedma, rovka, cameron.mcinally, c-rhodes, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71216
Try to handle arbitrary scalar BFEs by packing the operands. The DAG
gives up on non-constant arguments. We're still missing any constant
folding, so we end up with pretty ugly code most of the time. Also
handle the 64-bit scalar case, which the DAG doesn't try to do.
Summary:
Implements the @llvm.aarch64.sve.index intrinsic, which
takes a scalar base and step value.
This patch also adds the printSImm function to AArch64InstPrinter
to ensure that immediates of type i8 & i16 are printed correctly.
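A minimal IR sketch of a call to the intrinsic (the type mangling is illustrative):
  ; type mangling illustrative
  %r = call <vscale x 4 x i32> @llvm.aarch64.sve.index.nxv4i32(i32 %base, i32 %step)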
Reviewers: sdesmalen, andwar, efriedma, dancgr, cameron.mcinally, rengolin
Reviewed By: cameron.mcinally
Subscribers: tatyana-krasnukha, tschuett, kristof.beyls, hiraditya, rkruppe, arphaman, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74550
We have the InstAlias rules for 32-bit rotate but missing the 64-bit one.
  Rotate left immediate:  rotlwi ra,rs,n   ->  rlwinm ra,rs,n,0,31
  Rotate left:            rotlw ra,rs,rb   ->  rlwnm ra,rs,rb,0,31
Differential Revision: https://reviews.llvm.org/D72676
On PowerPC, set instruction count as the first priority of LSR by default.
Add an option ppc-lsr-no-insns-cost to return to the default LSR cost model.
Reviewed By: steven.zhang, jsji
Differential Revision: https://reviews.llvm.org/D72683
Both of those functions only have a single caller starting
at LowerSETCC. Just handle floating point directly in LowerSETCC.
This removes the need to pass Chain and IsSignaling all the way
down.
Summary:
Many directives are unavailable, and support for others may be limited.
This first draft has preliminary support for:
- conditional directives (including errors),
- data allocation (unsigned types up to 8 bytes, and ALIGN),
- equates/variables (numeric and text),
- and procedure directives (without parameters),
as well as COMMENT, ECHO, INCLUDE, INCLUDELIB, PUBLIC, and EXTERN. Text variables (aka text macros) are expanded in-place wherever the identifier occurs.
We deliberately ignore all ml.exe processor directives.
Prominent features not yet supported:
- structs
- macros (both procedures and functions)
- procedures (with specified parameters)
- substitution & expansion operators
Conditional directives are complicated by the fact that "ifdef rax" is a valid way to check if a file is being assembled for a 64-bit x86 processor; we add support for "ifdef <register>" in general, which requires adding a tryParseRegister method to all MCTargetAsmParsers. (Some targets require backtracking in the non-register case.)
Reviewers: rnk, thakis
Reviewed By: thakis
Subscribers: kerbowa, merge_guards_bot, wuzish, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, mgorny, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72680
The unseen logic diff occurs because MayFoldLoad() is defined like this:
  static bool MayFoldLoad(SDValue Op) {
    return Op.hasOneUse() && ISD::isNormalLoad(Op.getNode());
  }
The test diffs here all seem ok to me on screen/paper, but it's hard to know
if that will lead to universally better perf for all targets. For example,
if a target implements broadcast from mem as multiple uops, we would have to
weigh the potential reduction of instructions and register pressure vs.
possible increase in number of uops. I don't know if we can make a truly
informed decision on this at compile-time.
The motivating case that I'm looking at in PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024
...resembles the diff in extract-concat.ll, but we're not going to change the
larger example there without at least 1 other fix.
Differential Revision: https://reviews.llvm.org/D74088
This allows it to work properly with masked inc/dec for avx512. Those
would have a vselect as the root node so didn't get a chance to call
combineIncDecVector.
This also simplifies the logic because we don't have to manage
the topological ordering.
CallPreservedMask is used to describe the register liveness after a
function call. A function call in an interrupt handler should use the same
CallPreservedMask as normal functions, so that only callee-saved registers
can live through the function call.
The division expansions in AMDGPUCodeGenPrepare can't be relied on for
correctness, since they punt to later optimization and possibly
legalization in some cases. We still need a way to be able to write
tests for the legalizer versions of the expansion. This is mostly for
GlobalISel, since the optimizations it is expecting aren't
implemented.
The interaction with the flag to expand 64-bit division in the IR is
pretty confusing, but these flags have different purposes.
I didn't realize we were already expanding 24/32-bit division
here. Use the available IntegerDivision utilities. This uses loops,
so produces significantly smaller code than the inline DAG expansion.
This now requires width reductions of 64-bit divisions before
introducing the expanded loops.
This helps work around missing legalization in GlobalISel for
division, which are the only remaining core instructions that didn't
work at all.
I think this is plausibly a better implementation than exists in the
DAG, although turning it on by default misses out on the constant
value optimizations and also needs benchmarking.
This is more or less directly ported from the AMDGPU custom lowering
for FP_TO_FP16. I made a few minor fixups (using G_UNMERGE_VALUES
instead of creating shift/trunc to extract the two halves, and zexting
an inverted compare instead of select_cc).
This also does not include the fast math expansion the DAG which
converts to f32 and then to f16. I think that belongs in a
pre-legalize combine instead.
Like COPY instructions explained in D70616, we don't check the constraints
when combining G_UNMERGE_VALUES. Use the same logic used in D70616 to check
if registers can be replaced, or a COPY instruction needs to be built.
https://reviews.llvm.org/D70564
Assembler now permits pairs like 'v0:1', which are encoded
differently from the odd-first pairs like 'v1:0'.
The compiler will require more work to leverage these new register
pairs.
This patch added generation of SIMD bitwise insert BIT/BIF instructions.
In the absence of GCC-like functionality for optimal constraint satisfaction
during register allocation, the bitwise insert and select patterns are matched
by a pseudo bitwise select BSP instruction whose def is not tied.
It is expanded later, after register allocation, with the def tied,
to BSL/BIT/BIF depending on the operand registers.
This allows redundant moves to be eliminated.
Reviewers: t.p.northover, samparker, dmgreen
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D74147
Without PSHUFB we are better using ROTL (expanding to OR(SHL,SRL)) than using the generic v16i8 shuffle lowering - but if we can widen to v8i16 or more then the existing shuffles are still the better option.
REAPPLIED: Original commit rG11c16e71598d was reverted at rGde1d90299b16 as it wasn't accounting for later lowering. This version emits ROTLI or the OR(VSHLI/VSRLI) directly to avoid the issue.
Summary: Support for PIC with tests for global variables and function calls.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D74536
Summary:
Also make return calls terminator instructions so epilogues are
inserted before them rather than after them. Together, these changes
make WebAssembly's tail call optimization more stack-safe.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73943
If we widen the compare we might trigger a spurious exception from
the garbage data.
We have two choices here. Explicitly force the upper bits to zero.
Or use a legacy VEX vcmpps/pd instruction and convert the XMM/YMM
result to mask register.
I've chosen to go with the second option. I'm not sure which is
really best. In some cases we could get rid of the zeroing since
the producing instruction probably already zeroed it. But we lose
the ability to fold a load. So which is best is dependent on
surrounding code.
Differential Revision: https://reviews.llvm.org/D74522
Summary:
Fixes a crash in the backend where optimizations produce calls to the
cbrt runtime functions. Fixes PR 44227.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74259
This allows it to recognize more loads as being consecutive when the loads' addresses are complex at the start.
Differential Revision: https://reviews.llvm.org/D74444
Also greatly improve i64 lowering. LegalizeIntegerTypes does the
correct narrowing if i64 isn't legal. Just workaround this for
SelectionDAG by making i64 legal and splitting in the patterns.
If the target has FP64 but not FP16 then we have custom lowering for FP_EXTEND
and STRICT_FP_EXTEND with type f64. However if the extend is from f32 to f64 the
current implementation will cause an infinite loop for STRICT_FP_EXTEND due to
emitting a merge_values of the original node which after replacement becomes a
merge_values of itself.
Fix this by not doing anything for f32 to f64 extend when we have FP64, though
for STRICT_FP_EXTEND we have to do the strict-to-nonstrict mutation as that
doesn't happen automatically for opcodes with custom lowering.
Differential Revision: https://reviews.llvm.org/D74559
Skip the loop over the CalleeSavedInfos in 'restoreCalleeSavedRegisters' when
the register is a CR field and we are not targeting 32-bit ELF. This is safe
because:
1) The helper function 'restoreCRs' returns if the target is not 32-bit ELF,
making all the code in the loop related to CR fields dead for every other
subtarget. This code is only called on ELF right now, but the patch
to extend it for AIX also needs to skip 'restoreCRs'.
2) The loop will not otherwise modify the iterator, so the iterator
manipulations at the bottom of the loop end up setting 'I' to its
current value.
This simplification allows us to remove one argument from 'restoreCRs'.
Also add a helper function to determine if a register is one of the
callee saved condition register fields.
Exploit the native VSX rounding instruction, x(v|s)r(d|s)pic, which does
rounding using the current rounding mode.
According to the C standard library, rint may raise the INEXACT exception
while nearbyint won't.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D72685
In some cases a BTI landing pad was inserted even though a compatible
instruction was already there. Meta instructions do not count in this case,
therefore skip them when checking the first instructions in the function.
Differential revision: https://reviews.llvm.org/D74492
Simon pointed out that this function is doing a bitcast, which can be
incorrect for big endian. That makes the lowering of VMOVN in MVE
wrong, but the function is shared between Neon and MVE so both can
be incorrect.
This attempts to fix things by using the newly added VECTOR_REG_CAST
instead of the BITCAST. As it may now be used on Neon, I've added the
relevant patterns for it there too. I've also added a quick dag combine
for it to remove them where possible.
Differential Revision: https://reviews.llvm.org/D74485
Currently, BPF does not support dynamic stack allocation.
For a program like below:
  extern void bar(int *);
  void foo(int n) {
    int a[n];
    bar(a);
  }
The current error message looks like:
unimplemented operand
UNREACHABLE executed at /.../llvm/lib/Target/BPF/BPFISelLowering.cpp:199!
Let us make error message explicit so it will be clear to the user
what is the problem. With this patch, the error message looks like:
fatal error: error in backend: Unsupported dynamic stack allocation
...
Differential Revision: https://reviews.llvm.org/D74521
When SI_IF is inserted, it constrains the source register with a
register class, which was quite likely a G_ICMP. This was incorrectly
treating it as a scalar, and then applyMappingImpl would end up
producing invalid MIR since this was unexpected.
Also fix not using all VGPR sources for vcc outputs.
Summary:
This was a very odd API, where you had to pass a flag into a zext
function to say whether the extended bits really were zero or not. All
callers passed in a literal true or false.
I think it's much clearer to make the function name reflect the
operation being performed on the value we're tracking (rather than on
the KnownBits Zero and One fields), so zext means the value is being
zero extended and new function anyext means the value is being extended
with unknown bits.
NFC.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74482
When we have to widen to a 64-bit register, we have to emit a SUBREG_TO_REG.
Add a general-purpose widening helper which emits the correct SUBREG_TO_REG
instruction based off of a desired size and add a testcase.
Also remove some asserts which are technically incorrect in `emitTestBit`.
- p0 doesn't count as a scalar type, so we need to check `!Ty.isVector()`
instead
- Whenever we have a s1, the Size/Bit checks are too conservative, so just
remove them
Replace these asserts with less conservative ones where applicable.
Differential Revision: https://reviews.llvm.org/D74427
This has a really interesting side effect in that it improves some UMAX/UMIN reduction code which had redundant XOR(SHUFFLE(XOR(X,SIGNMASK)),SIGNMASK) patterns - the getNegatibleCost recognises it as FNEG(SHUFFLE(FNEG(X))).... We have a lot of FNEG patterns bitcasted to the integer domain for XOR signbit twiddling which is similar to what we do to allow UMAX/UMIN to be lowered using SMAX/SMIN.
Differential Revision: https://reviews.llvm.org/D74231
An option is added for PowerPC to disable use of non-volatile CR
register fields and avoid CR spilling in the prologue.
Differential Revision: https://reviews.llvm.org/D69835
Added support for the intrinsics llvm.ppc.dcbfl and llvm.ppc.dcbflp.
These will be used for emitting the cache control instructions dcbfl and
dcbflp, which are actually mnemonics for the dcbf instruction with different
immediate arguments.
  dcbfl ra, rb  -> dcbf ra, rb, 1
  dcbflp ra, rb -> dcbf ra, rb, 3
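A call at the IR level would look roughly like the line below; the pointer-typed signature is an assumption based on the existing llvm.ppc.dcbf intrinsic:
  ; pointer-typed signature is an assumption
  call void @llvm.ppc.dcbfl(i8* %addr)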
Differential Revision: https://reviews.llvm.org/D68411
Load extra bits if suitably aligned. This allows using widened
3-vector loads on SI, and fixes legalization for <9 x s32> (which LSV
apparently forms frequently on lowered kernel argument lists).
Fix incorrectly treating these as legal on SI. This should emit a
64-bit store and a 32-bit store.
I think all of the load and store rules are just about complete, but
due for a rewrite.
The isNegatibleForFree/getNegatedExpression methods currently rely on a raw char value to indicate whether a negation is beneficial or not.
This patch replaces the char return value with an NegatibleCost enum to more clearly demonstrate what is implied.
It also renames isNegatibleForFree to getNegatibleCost to more accurately reflect what's going on.
Differential Revision: https://reviews.llvm.org/D74221
This patch enables the debug entry values feature.
- Remove the (CC1) experimental -femit-debug-entry-values option
- Enable it for x86, arm and aarch64 targets
- Resolve the test failures
- Leave the llc experimental option for targets that do not
support the CallSiteInfo yet
Differential Revision: https://reviews.llvm.org/D73534
Summary:
Consider:
%r = call i32 @llvm.amdgcn.writelane(i32 0, i32 1, i32 2)
This produces a value that is 0 on lane 1, and 2 everywhere else; i.e.,
it is divergent.
Reported-by: Marek Olsak <Marek.Olsak@amd.com>
Reviewers: arsenm, foad, mareko
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74400
These should require AVX512VL not AVX512F. The legacy VEX patterns
will match first unless AVX512VL is enabled so this doesn't cause
a functional issue.
This adds a strict version of FP16_TO_FP and FP_TO_FP16 and uses
them to implement soft promotion for the half type. This is
enough to provide basic support for __fp16 with strictfp.
Add the necessary X86 support to use VCVTPS2PH/VCVTPH2PS when F16C
is enabled.
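For context, a strict half-to-float extension at the IR level uses the constrained intrinsics, roughly as below; the exact mangled name is an assumption:
  ; mangled name is an assumption
  %e = call float @llvm.experimental.constrained.fpext.f32.f16(half %h, metadata !"fpexcept.strict")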
This was creating a select on true/false values, and then comparing
that later. This produced more work for later combines, which can be
avoided by just using the boolean values. This was copied from the
original DAG expansion, which also has the same problem. This doesn't
have an observable change using SelectionDAG, but since GlobalISel is
missing these optimizations, the final code was noticeably longer.
These have nicer expansions implemented in the DAG. Ideally we would
either directly implement all of these special expansions, or stop
expanding division in the IR.
This is apparently worse than 1-byte alignment. This does not attempt
to decompose 2-byte aligned wide stores, but will stop trying to
produce them.
Also fix bug in LoadStoreVectorizer which was decreasing the alignment
and vectorizing stack accesses. It was assuming a stack object was an
alloca that could have its base alignment changed, which is not true
if the pointer is derived from a function argument.
Since natural fdiv lowering is now more conservative even with
denormals disabled, we get a slower expansion from just a plain
1.0/fdiv. Directly emit the rcp intrinsic when using it to implement
integer division to avoid a pointlessly complex sequence.
We aren't doing a good job of optimizing AVX512 outside of this code. So remove the bail out for AVX512 and replace with a FIXME. This at least gets us the AVX2 codegen.
Differential Revision: https://reviews.llvm.org/D74431
This patch adds the support required for using the __riscv_save and
__riscv_restore libcalls to implement a size-optimization for prologue
and epilogue code, whereby the spill and restore code of callee-saved
registers is implemented by common functions to reduce code duplication.
Logic is also included to ensure that if both this optimization and
shrink wrapping are enabled then the prologue and epilogue code can be
safely inserted into the basic blocks chosen by shrink wrapping.
Differential Revision: https://reviews.llvm.org/D62686
Summary:
SIInstrInfo::expandPostRAPseudo converts ENTER_WWM in-place into an
S_OR_SAVEEXEC instruction that needs certain implicit operands. Without
this patch I get errors like this that make it harder to use -stop-after
to bisect the pass pipeline:
$ llc -march=amdgcn test/CodeGen/AMDGPU/wqm.ll -stop-after=postrapseudos -o - | sed -E 's/ (from|into) custom "TargetCustom[0-9]+"//' | llc -march=amdgcn -x=mir
error: <stdin>:1295:70: missing implicit register operand 'implicit-def $scc'
renamable $sgpr2_sgpr3 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec
^
Note that this error is currently only generated by MIParser but it
comes with a FIXME comment:
// FIXME: Move the implicit operand verification to the machine verifier.
Reviewers: critson, arsenm, rampitec, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74428
Based on uops.info these should have 5 cycle latency as they did on Haswell/Broadwell. I have no additional internal information from Intel.
This was also shown as a discrepancy in the spreadsheet that was sent with an early llvm-dev post about llvm-exegesis.
It also matches Agner Fog.
Differential Revision: https://reviews.llvm.org/D74357
Currently, isTruncateFree() and isZExtFree() callbacks return false
as they are not implemented in BPF backend. This may cause suboptimal
code generation. For example, if the load in the context of zero extension
has more than one use, the pattern zextload{i8,i16,i32} will
not be generated. Rather, the load will be matched first and
then the result is zero extended.
For example, in the test together with this commit, we have
I1: %0 = load i32, i32* %data_end1, align 4, !tbaa !2
I2: %conv = zext i32 %0 to i64
...
I3: %2 = load i32, i32* %data, align 4, !tbaa !7
I4: %conv2 = zext i32 %2 to i64
...
I5: %4 = trunc i64 %sub.ptr.lhs.cast to i32
I6: %conv13 = sub i32 %4, %2
...
The I1 and I2 will match to one zextloadi32 DAG node, where SUBREG_TO_REG is
used to convert a 32bit register to 64bit one. During code generation,
SUBREG_TO_REG is a noop.
The %2 in I3 is used in both I4 and I6. If isTruncateFree() is false,
the current implementation will generate a SLL_ri and SRL_ri
for the zext part during lowering.
This patch implements isTruncateFree() in the BPF backend, so for the
above example, I3 and I4 will generate a zextloadi32 DAG node, and a
SUBREG_TO_REG is generated during lowering to Machine IR.
isZExtFree() is also implemented as it should help code gen as well.
This patch also enables the change in https://reviews.llvm.org/D73985,
since it will no longer kick in and generate a MOV_32_64 machine instruction.
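To make the reasoning concrete, here is a standalone illustration (not LLVM code) of the identity the backend relies on: truncating a 64-bit value to 32 bits and zero-extending it back is just a mask of the low 32 bits, so no shift pair is needed when the hardware already zeroes the upper bits.
```
#include <cassert>
#include <cstdint>

int main() {
  uint64_t wide = 0x123456789abcdef0ULL;
  uint32_t truncated = static_cast<uint32_t>(wide);    // "free" truncate
  uint64_t rezext = static_cast<uint64_t>(truncated);  // SUBREG_TO_REG-style no-op
  assert(rezext == (wide & 0xffffffffULL));
  return 0;
}
```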
Differential Revision: https://reviews.llvm.org/D74101
Fix/workaround for https://bugs.llvm.org/show_bug.cgi?id=44539.
As discussed there, this pass makes some overly optimistic
assumptions, as it does not have access to actual branch weights.
This patch makes the computation of the depth of the optimized cmov
more conservative, by assuming a distribution of 75/25 rather than
50/50 and placing the weights to get the more conservative result
(larger depth). The fully conservative choice would be
std::max(TrueOpDepth, FalseOpDepth), but that would break at least
one existing test (which may or may not be an issue in practice).
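As a rough illustration of the heuristic described above (a hedged sketch with made-up names, not the actual X86 cmov-conversion code), the depth estimate weights the slower operand at 75% and the faster one at 25%, which lands between the old 50/50 average and the fully conservative max:
```
#include <algorithm>

// Hypothetical helper mirroring the description above.
unsigned estimateCmovDepth(unsigned TrueOpDepth, unsigned FalseOpDepth) {
  unsigned Slow = std::max(TrueOpDepth, FalseOpDepth);
  unsigned Fast = std::min(TrueOpDepth, FalseOpDepth);
  // 75/25 split biased toward the slower operand; previously (Slow + Fast) / 2.
  return (3 * Slow + Fast) / 4;
}
```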
Differential Revision: https://reviews.llvm.org/D74155
Summary:
Add a new method (tryParseRegister) that attempts to parse a register specification.
MASM allows the use of IFDEF <register>, as well as IFDEF <symbol>. To accommodate this, we make it possible to check whether a register specification can be parsed at the current location, without failing the entire parse if it can't.
Reviewers: thakis
Reviewed By: thakis
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73486
When more than one SelectPseudo instruction is handled, a new MBB is
returned. This must not be done if it would result in leaving an unhandled
isel pseudo behind in the original MBB.
Fixes https://bugs.llvm.org/show_bug.cgi?id=44849.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D74352
A small IR change in calculating the active lanes resulted in no longer
recognising tail-predication. Now recognise both an 'add' and 'or' in
the expression that calculates the active lanes.
Differential Revision: https://reviews.llvm.org/D74394
ADDI (C.ADDI) may achieve better code size than XORI, since XORI has no compressed (C extension) form.
This patch transforms two patterns and gets almost equivalent results.
Differential Revision: https://reviews.llvm.org/D71774
Without PSHUFB we are better using ROTL (expanding to OR(SHL,SRL)) than using the generic v16i8 shuffle lowering - but if we can widen to v8i16 or more then the existing shuffles are still the better option.
New intrinsics are implemented for when we need to port SIMD code from other
architectures and only load or store portions of MSA registers.
The following intrinsics are added, which only load/store element 0 of a vector:
v4i32 __builtin_msa_ldrq_w (const void *, imm_n2048_2044);
v2i64 __builtin_msa_ldr_d (const void *, imm_n4096_4088);
void __builtin_msa_strq_w (v4i32, void *, imm_n2048_2044);
void __builtin_msa_str_d (v2i64, void *, imm_n4096_4088);
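A hedged usage sketch of the new builtins listed above (assumes a MIPS target with MSA enabled, e.g. -mmsa; the vector typedef and function names here are only illustrative):
```
// Only element 0 of the vector register is loaded/stored by these builtins.
typedef long long v2i64 __attribute__((vector_size(16)));

v2i64 load_elem0(const void *p) {
  return __builtin_msa_ldr_d(p, 0);  // immediate offset must be in range
}

void store_elem0(v2i64 v, void *p) {
  __builtin_msa_str_d(v, p, 0);
}
```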
Differential Revision: https://reviews.llvm.org/D73644
Summary:
As far as I know this did not affect code generation, but it did affect
the order of -debug-only=si-wqm output and the naming of autonamed
values in -print-after=si-wqm output.
Reviewers: arsenm, rampitec, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, mgrang, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74317
We need to use vector instructions for these operations. Previously
we handled this with isel patterns that used extra instructions
and copies to handle the conversions.
Now we use custom lowering to emit the conversions. This allows
them to be pattern matched and optimized on their own. For
example we can now emit vpextrw to store the result if it's going
directly to memory.
I've forced the upper elements of the VCVTPH2PS input to zero to keep some
code similar. Zeroes will be needed for strictfp. I've added a
DAG combine for (fp16_to_fp (fp_to_fp16 X)) to avoid extra
instructions in between to be closer to the previous codegen.
This is a step towards strictfp support for f16 conversions.
This patch:
- enable frame pointer for AIX;
- update some of red zone comments;
- add/update testcases;
Differential Revision: https://reviews.llvm.org/D72454
Summary:
This patch enables support for Mergeable2ByteCString and Mergeable4ByteCString.
Reviewers: daltenty
Subscribers: wuzish, nemanjai, hiraditya
Differential Revision: https://reviews.llvm.org/D74164
With this change, each function is compiled with the SystemZSubtarget
initialized from the function's attributes.
Review: Ulrich Weigand.
Differential Revision: https://reviews.llvm.org/D74086
Non-AVX512BW targets failed to concatenate 256-bit shifts back to 512-bits (split during 512-bit shuffle lowering as they don't have v32i16/v64i8 types).
These are generated and do not need to have the same values.
We are defining separate subregs for R600 and GCN but then
using AMDGPU subregs on R600.
Differential Revision: https://reviews.llvm.org/D74248
As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets.
This patch lowers to uniform ISD::ROTL nodes - ISD::ROTR isn't supported by XOP and the two are interchangeable for constant rotation amounts anyway.
There might be cases where targets without ISD::ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch.
REAPPLIED rGe82e17d4d4ca after reversion at rG39eade73a567 - fixed offset matching in matchShuffleAsBitRotate.
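A standalone check of the equivalences relied on above: a constant rotate-left expands to OR(SHL, SRL), and a rotate-left by c is the same as a rotate-right by (width - c), which is why ROTL and ROTR are interchangeable for constant amounts.
```
#include <cassert>
#include <cstdint>

uint8_t rotl8(uint8_t x, unsigned c) { return uint8_t((x << c) | (x >> (8 - c))); }
uint8_t rotr8(uint8_t x, unsigned c) { return uint8_t((x >> c) | (x << (8 - c))); }

int main() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned c = 1; c < 8; ++c)
      assert(rotl8(uint8_t(x), c) == rotr8(uint8_t(x), 8 - c));
  return 0;
}
```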
This patch makes the following System Registers Read Only:
- CurrentEL
- ICH_MISR_EL2
- PMBIDR_EL1
- PMSIDR_EL1
as found in:
https://developer.arm.com/docs/ddi0595/e/aarch64-system-registers
Relative line numbers were also added to the tests so we get more
informative error messages on failure.
Change-Id: I963b4f01ca5737b58f9e8e7abe9ca1d99e328758
This change implements the llvm intrinsic llvm.read_register for
the SystemZ platform which returns the value of the specified
register
(http://llvm.org/docs/LangRef.html#llvm-read-register-and-llvm-write-register-intrinsics).
This implementation returns the value of the stack register, and
can be extended to return the value of other registers. The
implementation for this intrinsic exists on various other platforms
including Power, x86, ARM, etc. but missing on SystemZ.
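As a hedged source-level example of how this intrinsic is typically reached (assuming the frontend's named-register-variable support for the stack pointer applies on this target, and that r15 is the SystemZ stack pointer; this is illustrative, not part of the patch):
```
// Reads of a global named register variable are lowered to
// llvm.read_register("r15") when targeting s390x.
register unsigned long stack_pointer asm("r15");

unsigned long current_sp() { return stack_pointer; }
```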
Reviewers: uweigand
Differential Revision: https://reviews.llvm.org/D73378
As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets.
This patch lowers to uniform ISD::ROTL nodes - ISD::ROTR isn't supported by XOP and the two are interchangeable for constant rotation amounts anyway.
There might be cases where targets without ISD::ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch.
Also, non-AVX512BW targets fail to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).
---
Internal shuffle tests indicate there's a bug somewhere that I haven't been able to track down yet.
When specifying -march=arch[8|9|10], those CPU types do NOT support
the vector extension. In this case the vector ABI must be disabled.
The generated data layout should NOT contain 64-v128.
Reviewers: uweigand
Differential Revision: https://reviews.llvm.org/D74146
Use isCandidateForCallSiteEntry().
This should mostly be an NFC, but there are some parts ensuring that
moveCallSiteInfo() and copyCallSiteInfo() operate on call site
entry candidates (both Src and Dest should be call site entry
candidates).
Differential Revision: https://reviews.llvm.org/D74122
Based on D72931
This adds a new feature called A16 which is enabled for gfx10.
gfx9 keeps the R128A16 feature so it can share all the instruction encodings
with gfx7/8.
Differential Revision: https://reviews.llvm.org/D73956
Using sign extend forces the adjacent element to either all zeros
or all ones. But all ones is a NAN. So that doesn't seem like a
great idea.
Trying to work on supporting this with strict FP where NAN would
definitely be bad.
When the FP exists, the FP base CFI directive offset should take the size of variable arguments into account.
Differential Revision: https://reviews.llvm.org/D73862
Vector indexing with a constant index should be folded out in the
legalizer, but this was accidentally falling through. This would
produce the indexing operation with $noreg. Handle this case as a
dynamic index just in case a bug like this happens again in the
future.
We were failing to find constants that were casted. I feel like the
artifact combiner should have folded the constant in the trunc before
the custom lowering, but that doesn't happen.
Reverts part of 6524a7a2b9. Since that
commit, the expansion was ignoring the actual save exec register
produced by the instruction, and looking at other instructions. I do
not understand why it was looking at other instructions, but relying
on this scan was wrong.
Fixes verifier errors after SI_IF is tail duplicated, which should be
correct to do. The results were fed into a phi, which was lowered to
the S_MOV_B64_term instructions.
Fix issue mentioned on rGe82e17d4d4ca - non-AVX512BW targets failed to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).
The flag isn't used, but I believe this matches the MOV32r0 that
would be created by the table emitter. This should allow this node
to be CSEed with any others created by the table.
As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets.
This patch lowers to uniform ISD::ROTL nodes - ISD::ROTR isn't supported by XOP and the two are interchangeable for constant rotation amounts anyway.
There might be cases where targets without ISD::ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch.
Also, non-AVX512BW targets fail to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).
A vselect+strictfp node is not equivalent to a masked operation.
The exceptions of the strictfp node are not masked by a vselect
after it so we can't match it to a masked operation.
We already had a hack in IsLegalToFold to prevent these patterns from
matching. This patch removes that hack and removes the patterns.
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses a function call before stack allocation, while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
This is a recommit of 39f50da2a3 with proper LiveIn
declaration, better option handling, and more portable testing.
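A hedged user-level illustration (assuming the clang driver flag mentioned in the later revert note, -fstack-clash-protection, and a 4 KiB page size): a frame larger than a page must be touched at least once per page so a guard page cannot be skipped over.
```
// With inline probing enabled, the x86 prologue for this illustrative
// function touches the stack at page-sized intervals while setting up
// the 64 KiB frame.
void big_frame() {
  volatile char buf[1 << 16];  // spans many 4 KiB pages
  buf[0] = 1;
}
```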
Differential Revision: https://reviews.llvm.org/D68720
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses a function call before stack allocation, while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
This is a recommit of 39f50da2a3 with proper LiveIn
declaration, better option handling, and more portable testing.
Differential Revision: https://reviews.llvm.org/D68720
Making sure not to use them with patterns for masked instructions.
Also fix FMA patterns that were matching strict_fma+x86selects to
masked instructions.
Remove code from LegalizeTypes that allowed this to work.
We were already using BUILD_PAIR for this in some places so this
standardizes on a single way to do this.
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses a function call before stack allocation, while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
This is a recommit of 39f50da2a3 with better option
handling and more portable testing.
Differential Revision: https://reviews.llvm.org/D68720
This hasn't been used for years, its original implementation, D35700, had bugs that caused the reversion of most of the code, and since then x86 shuffle lowering/combining has handled most cases and can deal with the rest as well.
Replace the lambda function
static auto InitializeRegisterBankOnce = [this](const auto &TRI) {
with
static auto InitializeRegisterBankOnce = [&]() {
Capture by reference instead of passing an argument, as there are related
buildbot compile errors when passing the argument.
Replace the lambda function
static auto InitializeRegisterBankOnce = [this](const auto &TRI) {
with
static auto InitializeRegisterBankOnce = [&]() {
Capture by reference instead of passing an argument, as there are related
buildbot compile errors when passing the argument.
On little endian targets prior to Power9, we spill vector registers using a
swapping store (i.e. stxvd2x saves the vector with the two doublewords in
big endian order regardless of endianness). This is generally not a problem
since we restore them using the corresponding swapping load (lxvd2x). However
if the restore is done by the unwinder, the vector register contains data in
the incorrect order.
This patch fixes that by using Altivec loads/stores for vector saves and
restores in PEI (which keep the order correct) under those specific conditions:
- EH aware function
- Subtarget requires swaps for VSX memops (Little Endian prior to Power9)
Differential revision: https://reviews.llvm.org/D73692
Summary:
The accuracy limit to use rcp is adjusted to 1.0 ulp from 2.5 ulp.
Also, afn instead of arcp is used to allow inaccurate rcp to be used.
Reviewers:
arsenm
Differential Revision: https://reviews.llvm.org/D73588
The issue in the previous commits was that we swapped the LHS and RHS while
looking for the constant. In SLT/SGT, the constant must be on the RHS, or the
optimization is invalid.
Move the swapping logic after the check for the SLT/SGT case and update tests.
Original commits:
d78cefb160, a373841407
Summary:
Current implementation of matchSwap in SIShrinkInstructions searches the entire
use_nodbg_operands set to find the possible pattern to generate v_swap instruction.
This approach will lead to O(N^3) compile time for SIShrinkInstructions.
But in reality, the matching pattern only exists within nearby instructions in the
same basic block. This work limits the search to a maximum of 16 instructions, and has
linear compile-time consumption.
Reviewers:
rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D74180
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses a function call before stack allocation, while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
This is a recommit of 39f50da2a3 with correct option
flags set.
Differential Revision: https://reviews.llvm.org/D68720
hasReservedSpillSlot returns a dummy frame index of '0' on PPC64 for the
non-volatile condition registers, which leads to the CalleeSavedInfo
either referencing an unrelated stack object, or an invalid object if
there are no stack objects. The latter case causes the mir-printer to
crash due to an assertion that checks whether the frame index referenced by a
CalleeSavedInfo is valid.
To fix the problem, create an immutable FixedStack object at the correct offset
in the linkage area of the previous stack frame (i.e. SP + positive offset).
Differential Revision: https://reviews.llvm.org/D73709
Previously we took the restored flag in a GPR, extended it to 32 or 64 bits, and then used it as an input to a subtract from 0. This requires creating a zero extend and materializing a 0.
This patch changes this to just use an ADD with 255 to restore the carry flag and keep the SETB_C32r/SETB_C64r, exactly like we handle SBB, which is what SETB becomes.
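A standalone check of the trick above: if the carry flag was saved into a byte as 0 or 1, adding 255 to that byte produces a carry-out exactly when the saved value was non-zero, so CF is restored without materializing a zero or widening the value.
```
#include <cassert>
#include <cstdint>

bool carryAfterAdd255(uint8_t savedCF) {
  unsigned sum = unsigned(savedCF) + 255u;
  return sum > 0xFF;  // hardware CF for an 8-bit ADD
}

int main() {
  assert(!carryAfterAdd255(0));
  assert(carryAfterAdd255(1));
  return 0;
}
```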
Differential Revision: https://reviews.llvm.org/D74152
Allows more flexible use of buildMerge in places where
use operands are available as SrcOp since it does not
require explicit conversion to Register.
Simplify code with new buildMerge.
Differential Revision: https://reviews.llvm.org/D74223
We were executing this in a waterfall loop as a placeholder, but this
should really be converted to a MUBUF load. Also execute in a
waterfall loop if the resource isn't an SGPR. This is a case where the
DAG handling was wrong because doing the right thing was too hard.
Currently, this will mishandle 96-bit loads. There's currently no way
to track the original memory size with an MMO, so these loads will be
widened and the resulting memory size will be 128 bits.
The type passed to lower was invalid, so I'm not sure how this was
even working before. The source and destination type also do not have
to match, so make sure to use the right ones.
The registers TRCEXTINSELR and TRCEXTINSELR0 are distinct registers,
defined by separate extension specifications (ETM and ETE,
respectively), yet they use the same encoding in MSR/MRS.
When performing a system register lookup by encoding, we would
essentially return a random one, depending on the number, relative
position in the TableGen file, whether the TableGen records for system
registers are named or not, and, if they are named, depending on
record (not register!) name as well.
This patch works around the issue by explicitly checking for the
TRCEXTINSELR/TRCEXTINSELR0 encoding and always returning TRCEXTINSELR.
Differential Revision: https://reviews.llvm.org/D74074
This reverts commit 39f50da2a3.
The -fstack-clash-protection is being passed to the linker too, which
is not intended.
Reverting and fixing that in a later commit.
Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code.
Reviewers: courbet
Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73964
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
Differential Revision: https://reviews.llvm.org/D68720
We are using countPopulation on a LaneBitmask to determine
the number of registers it covers. This assumption does not
necessarily need to hold. It is not changed here, but it is
factored into a single call, SIRegisterInfo::getNumCoveredRegs().
Some other places are cleaned up with respect to assumptions
about subreg index values and tablegen behavior.
Differential Revision: https://reviews.llvm.org/D74177
Summary:
Current implementation of matchSwap in SIShrinkInstructions searches the entire
use_nodbg_operands set to find the possible pattern to generate v_swap instruction.
This approach will lead to O(N^3) compile time for SIShrinkInstructions.
But in reality, the matching pattern only exists within nearby instructions in the
same basic block. This work limits the search to a maximum of 16 instructions, and has
linear compile-time consumption.
Reviewers:
rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D74180
This reverts commit a373841407.
It looks like this broke set_shadow_test.c, so I'm reverting until I can fix it.
I also reverted the SGT change because it's probably also broken.
Update the lambda function argument "[this](const auto &TRI)" to
"[this](const TargetRegisterInfo &TRI)".
This looks like a bug in g++-6; there is no issue compiling with g++-9.
X86 uses i8 for shift amounts. This code can fail on a 32-bit target
if it runs after type legalization.
This code was copied from AArch64 and modified for X86, but the
shift amount wasn't changed to the correct type for X86.
Fixes PR44812
When we have a G_BRCOND fed by a sgt compare against -1, we can just emit a TBZ.
This is similar to the code in `AArch64TargetLowering::LowerBR_CC`.
Also while we're here, properly scope the commutative constant check in
`selectCompareBranch`, since it sometimes would call
`getConstantVRegValWithLookThrough` twice.
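A standalone check of the identity behind this fold: for a signed value, x > -1 holds exactly when the sign bit is clear, which is what a TBZ on the top bit tests.
```
#include <cassert>
#include <cstdint>

int main() {
  const int64_t vals[] = {INT64_MIN, -2, -1, 0, 1, INT64_MAX};
  for (int64_t x : vals) {
    bool sgtMinusOne = x > -1;
    bool signBitClear = ((static_cast<uint64_t>(x) >> 63) & 1) == 0;
    assert(sgtMinusOne == signBitClear);
  }
  return 0;
}
```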
Differential Revision: https://reviews.llvm.org/D74149
If we don't have cmov, X87 compares write to FPSW and we need to
move the bits to EFLAGS to use as JCC/SETCC/CMOV conditions.
Previously this was done by calling ConvertCmpIfNecessary in
multiple places which would emit the extra code for the FNSTSW,
a shift, a truncate, and a SAHF instructions. Isel would then
select trunc+X86ISD::CMP to a FUCOM instruction that produces FPSW.
This patch centralizes all of the handling into a single custom
isel handler. This allows us to remove ConvertCmpIfNecessary and
a couple target specific ISD opcodes.
Differential Revision: https://reviews.llvm.org/D73863
Only 32 and 64 bit SBB are dependency breaking instructions on some
CPUs. The 8 and 16 bit forms have to preserve upper bits of the GPR.
This patch removes the smaller forms and selects the wider form
instead. I had to do this with custom code as the tblgen generated
code glued the eflags copytoreg to the extract_subreg instead of
to the SETB pseudo.
Longer term I think we can remove X86ISD::SETCC_CARRY and use
(X86ISD::SBB zero, zero). We'll want to keep the pseudo and select
(X86ISD::SBB zero, zero) to either a MOV32r0+SBB for targets where
there is no dependency break and SETB_C32/SETB_C64 for targets
that have a dependency break. May want some way to avoid the MOV32r0
if the instruction that produced the carry flag happened to def a
register that we can use for the dependency.
I think the flag copy lowering should be using NEG instead of SUB to
handle SETB. That would avoid the MOV32r0 there. Or maybe it should
use a ADC with -1 to recreate the carry flag and keep the SETB?
That would avoid a MOVZX on the input of the SUB.
Differential Revision: https://reviews.llvm.org/D74024
When multiple instructions are moved into a waterfall loop, it's
possible some of them re-use the same operands. Avoid creating
multiple sequences of readfirstlanes for them. None of the current
uses will hit this, but will be used in a future patch.
This patch implements the caller side of placing function call arguments
in stack memory. This removes the current limitation where LLVM on AIX
will report fatal error when arguments can't be contained in registers.
There is a particular oddity that a float argument that passes in a
register and also in stack memory requires that the caller initialize
both. From what AIX "ABI" documentation I have it's not clear that this
needs to be done, however, it is necessary for compatibility with the
AIX XL compiler so I think it's best to implement it the same way.
Note a later patch will follow to address the callee side.
Differential Revision: https://reviews.llvm.org/D73209
When we have a G_ICMP which checks SLT, and the comparison is against 0, we
can emit a TBNZ instead of a CBZ.
This lets us fold in things into the branch, which can provide some code size
savings.
This is similar to the case in `AArch64TargetLowering::LowerBR_CC`.
https://reviews.llvm.org/D74090
Factor it out into `emitTestBit` and add some asserts to the new function.
This will be useful for implementing TB(N)Z emission for SLT/SGT compares.
Differential Revision: https://reviews.llvm.org/D74080
Add support for walking through G_LSHR in `getTestBitReg`. Equivalent to the
code in `getTestBitOperand` in AArch64ISelLowering.
```
(tbz (lshr x, c), b) -> (tbz x, b+c) when b + c is < # bits in x
```
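A standalone check of this fold (illustration only): bit b of (x >> c) is bit (b + c) of x, as long as b + c is still within the width of x.
```
#include <cassert>
#include <cstdint>

bool bit(uint64_t x, unsigned i) { return (x >> i) & 1; }

int main() {
  const uint64_t x = 0xdeadbeefcafef00dULL;
  for (unsigned c = 0; c < 64; ++c)
    for (unsigned b = 0; b + c < 64; ++b)
      assert(bit(x >> c, b) == bit(x, b + c));
  return 0;
}
```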
Differential Revision: https://reviews.llvm.org/D74077
The "{=v0}" constraint did not result in the expected error message in the
absence of the vector facility, because 'v0' matches as a string into the
AnyRegBitRegClass in common code.
This patch adds checks for vector support in case of "{v" and soft-float in
case of "{f" to remedy this.
Review: Ulrich Weigand.
The load ports need a cycle for each potentially loaded element just like Haswell and Skylake. Unlike Haswell and Broadwell, the number of uops does not scale with the number of elements. Instead the load uops run for multiple cycles.
I've taken the latency number from the uops.info. The port binding for the non-load uops is taken from the original IACA data I have.
Differential Revision: https://reviews.llvm.org/D74000
Really the intrinsic definition is wrong, but work around this
here. The DAG lowering introduces an MMO. We have to introduce a new
operation to avoid the verifier complaining about the missing mayLoad.
Use cmp ord instead of cmp_class compared to the DAG version for the
nan check, but mostly try to match the existing pattern.
I think the sign doesn't matter for fract, so we could do a little
better with the source modifier matching.
I think this is also still broken as in D22898, but I'm leaving it
as-is for now while I don't have an SI system to test on.
contractCrossBankCopyIntoStore() finds the instruction that defines the
source register and uses its output to replace the register. There are,
however, instructions that have multiple outputs, e.g. G_UNMERGE_VALUES.
Current implementation hardcodes to operand 0 and has no way of knowing
which output should be used.
This change adds another function that directly returns the source
register and uses that for folding.
This fixes https://bugs.llvm.org/show_bug.cgi?id=44783
Differential Revision: https://reviews.llvm.org/D74005
This implements walking over G_ASHR in the same way as `getTestBitOperand` in
AArch64ISelLowering.
```
(tbz (ashr x, c), b) -> (tbz x, b+c) or (tbz x, msb) if b+c is > # bits in x
```
Differential Revision: https://reviews.llvm.org/D73933
(1) The check needs to be on the 0th operand of whatever we're folding
(2) Checks for validity should happen before we change the bit
Fixes a bug which caused MultiSource/Applications/JM/lencod to fail at -O3.
Differential Revision: https://reviews.llvm.org/D74002
Rewrite the result register pair into the expected single register
format in the legalizer.
I'm also operating under the assumption that TFE doesn't apply to
stores or atomics, but don't know if this is true or not.
The mask results of these should be uniform. The trickier part is the
dummy booleans used as IR glue need to be treated as divergent. This
should make the divergence analysis results correct for the IR the DAG
is constructed from.
This should allow us to eliminate requiresUniformRegister, which has
an expensive, recursive scan over all users looking for control flow
intrinsics. This should avoid recent compile time regressions.
The 96-bit results need to be widened.
I find the interaction between LegalizerHelper and MIRBuilder somewhat
awkward. The custom legalization is called by the LegalizerHelper, but
then does not have access to the helper. You have to construct a new
helper, which then does not own the MachineIRBuilder, but does modify
it. Maybe custom legalization should be passed the helper?
The adjusted iterator range included the last instruction we just inserted,
which we don't want to process. Figure out the new iterator range before
inserting phis. This was a harmless problem, but added an unnecessary
complication for a future patch.
If we have s_pack_* instructions, legalize this to
G_BUILD_VECTOR_TRUNC from s32 elements. This is closer to how the
s_pack_* instructions really behave.
If we don't have s_pack_* instructions, expand this by creating a merge
to s32 and bitcasting. This expands to the expected bit operations. I
think this eventually should go in a new bitcast legalize action type
in LegalizerHelper.
We already directly emit the shift operations in RegBankSelect for the
vector case. This could possibly be cleaned up, but I also may want to
defer doing this expansion to selection anyway. I'll see about that
when I try to actually match VOP3P instructions.
This breaks the selection of the build_vector since tablegen doesn't
know how to match G_BUILD_VECTOR_TRUNC yet, so just xfail it for now.
Once we have created a tail-predicated hardware-loop, and thus know the number
of elements that are processed, we want to clean-up the iteration count
expression of that loop. In D73682, we bailed the analysis on conditionally
executed instructions. This adds support for IT-blocks, so that we can handle
these cases again. The restriction is that we only support IT blocks containing
1 statement, but that seems to cover most cases and forms of the iteration
count expression.
Differential Revision: https://reviews.llvm.org/D73947
Checking that the use-def chain that performs the loop count
isSafeToRemove is not sufficient, because it means that we can
remove register copies that we need in order to restore lr to its
correct value. This change now prevents the transform from kicking in
for the 'remove-elem-moves' test, which needs to be addressed later on.
Differential Revision: https://reviews.llvm.org/D74037
While validating each MVE instruction, check that all instructions
that touch memory are somehow predicated upon the VCTP.
Differential Revision: https://reviews.llvm.org/D73616
scalar_to_vector takes only one argument, not two.
The a16 tests now also check the packing of coordinates into registers
Differential Revision: https://reviews.llvm.org/D73482
This should lower the amount of used registers for gfx9.
I updated some of the changed tests with the update script because
changing them by hand is tedious.
Differential Revision: https://reviews.llvm.org/D73884
Same for any_extend though we don't have coverage for that.
The test changes are because isel didn't check one use of the
setcc_carry. So in isel we would end up with two different
sized setcc_carry instructions. And since it clobbers
the flags we would need to recreate the flags for the second
instruction.
This code handles additional uses by truncating the new wide
setcc_carry back to the original size for those uses.
The old version might be faster on EG (RECIP_IEEE is Trans only),
but it'd need extra corner case checks.
This gives correct corner case behaviour and saves a register.
Fixes OCL CTS sqrt test (1-thread, scalar) on Turks.
Reviewer: arsenm
Differential Revision: https://reviews.llvm.org/D74017
Summary:
This reverts commit 3ef169e586. The
purpose of this commit was to allow stack machines to perform
instruction selection for instructions with variadic defs. However,
MachineInstrs fundamentally cannot support variadic defs right now, so
this change does not turn out to be useful.
Depends on D73927.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73928
This was incorrectly rounding up to the next power of 2. v4f32 was
rounding up to v8f32, which was just wrong. There are also v3i16/v3f16
available in MVT, so we don't even need to round the f16 cases
anymore. Additionally, this field is really an EVT so we don't even
need to consider this.
Also switch some asserts to return invalid. We should have an IR
verifier for these intrinsic return types, but for now it's better to
not assert on IR that passes the verifier.
This should also probably be fixed to consider that dmask is really
eliminating some of the loaded components.
Summary:
This reverts commit 28857d14a8. This
commit worked toward a solution that did not turn out to be feasible
because MachineInstrs cannot contain an arbitrary number of defs.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73927
The compiler may transform the following code
ctx = ctx + reloc_offset
... (*(u32 *)ctx) & 0x8000 ...
to
ctx = ctx + reloc_offset
... (*(u8 *)(ctx + 1)) & 0x80 ...
where reloc_offset will be replaced with a constant during
AsmPrinter phase.
The above transformed code will be rejected by the kernel verifier,
as it does not allow the
*(type *)((ctx + non_zero_offset1) + non_zero_offset2)
style access pattern.
It is hard at the SelectionDAG phase to identify whether a load
is related to the context or not. Sometimes, interprocedural analysis
may be needed. So let us simply prevent such an optimization
from happening.
Differential Revision: https://reviews.llvm.org/D73997
Summary:
Moves a batch of instructions from unimplemented-simd128 to simd128
because they have recently become available in V8.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73926
lrint/llrint are defined as rounding using the current rounding
mode. Numbers that can't be converted raise FE_INVALID and an
implementation defined value is returned. They may also write to
errno.
I believe this means we can use cvtss2si/cvtsd2si or fist to
convert as long as -fno-math-errno is passed on the command line.
Clang will leave them as libcalls if errno is enabled so they
won't become ISD::LRINT/LLRINT in SelectionDAG.
For 64-bit results on a 32-bit target we can't use cvtss2si/cvtsd2si
but we can use fist since it can write to a 64-bit memory location.
Though maybe we could consider using vcvtps2qq/vcvtpd2qq on avx512dq
targets?
gcc also does this optimization.
I think we might be able to do this with STRICT_LRINT/LLRINT as
well, but I've left that for future work.
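A hedged source-level example of the pattern this affects (assuming -fno-math-errno so the calls become ISD::LRINT/ISD::LLRINT rather than libcalls):
```
#include <cmath>

long round_to_long(double x) {
  return std::lrint(x);    // rounds using the current rounding mode
}

long long round_to_llong(float x) {
  return std::llrint(x);   // 64-bit result; may use fist on 32-bit x86
}
```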
Differential Revision: https://reviews.llvm.org/D73859
The CATCHPAD node mostly existed to be selected into the EH_RESTORE
instruction, which sets the frame back up when 32-bit Windows exceptions
return to the parent function. However, creating this MachineInstr early
increases the risk that other passes will come along and insert
instructions that use the stack before ESP and EBP are restored. That
happened in PR44697.
Instead of representing these in the instruction stream early, delay it
until PEI. Mark the blocks where this needs to happen as EHPads, but not
funclet entry blocks. Passes after PEI have to be careful not to hoist
instructions that can use stack across frame setup instructions, so this
should be relatively reliable.
Fixes PR44697
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D73752
https://reviews.llvm.org/D72312 introduced an infinite loop which involves
DAGCombiner::visitFMA and AMDGPUTargetLowering::performFNegCombine.
fma( a, fneg(b), fneg(c) ) => fneg( fma (a, b, c) ) => fma( a, fneg(b), fneg(c) ) ...
This only breaks with types where 'isFNegFree' returns false, e.g. v4f32.
Reproducing the issue also needs the attribute 'no-signed-zeros-fp-math',
and no source mods allowed on one of the users of the Op.
This fix makes changes to indicate that it is not free to negate a fma if it
has users with source mods.
Differential Revision: https://reviews.llvm.org/D73939
This time with correct types for the data result from the SUB.
Original commit message:
Our normal lowering for ISD::SETCC uses X86ISD::SUB to enable
CSE unless the RHS is 0. optimizeCompareInstr called by the peephole
pass can turn subs with unused results into cmps to clean this up.
This commit makes other places that create X86ISD::CMP have the
same behavior.
Prepare to accurately track the future denormal-fp-math attribute
changes. The way to actually set these separately is not wired in yet.
This is just a mechanical change, and mostly still assumes the input
and output mode match. This should be refined for some cases. For
example, fcanonicalize lowering should use the flushing variant if
either input or output flushing is enabled
The usage of the Imm out argument from SelectSMRDOffset is pretty
confusing. Stop trying to reject CI immediates in the case where the
offset field can be used. It's not an illegal way to encode the
immediate, so just prefer the better encoding pattern with
AddedComplexity.
We probably don't even really need the different opcodes for the
different offset types anymore, but that will be more work to cleanup.
The SMRD non-buffer load patterns could also use a cleanup to be done
separately.
AMDGPU and x86 at least both have separate controls for whether
denormal results are flushed on output, and for whether denormals are
implicitly treated as 0 as an input. The current DAGCombiner use only
really cares about the input treatment of denormals.
Similar to D73680 (AArch64 BTI).
A local linkage function whose address is not taken does not need ENDBR32/ENDBR64. Placing the patch label after ENDBR32/ENDBR64 has the advantage that code does not need to differentiate whether the function has an initial ENDBR.
Also, add 32-bit tests and test that .cfi_startproc is at the function
entry. The line information has a general implementation and is tested
by AArch64/patchable-function-entry-empty.mir
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D73760
Linux commit
1cf5b23988 (diff-289313b9fec99c6f0acfea19d9cfd949)
uses "#pragma clang attribute push (__attribute__((preserve_access_index)),
apply_to = record)"
to apply CO-RE relocations to all records including the following pattern:
#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)
typedef struct {
int a;
} __t;
#pragma clang attribute pop
int test(__t *arg) { return arg->a; }
The current approach to use struct/union type in the relocation record will
result in an anonymous struct, which makes later type matching difficult
in the bpf loader. In fact, the current BPF backend will fail the above program
with assertion:
clang: ../lib/Target/BPF/BPFAbstractMemberAccess.cpp:796: ...
Assertion `TypeName.size()' failed.
clang will change to use the type of the base of the member access
which will preserve the typedef modifier for the
preserve_{struct,union}_access_index intrinsics in the above example.
Here we adjust the BPF backend to accept that the debuginfo
type metadata may be 'typedef' and handle them properly.
Differential Revision: https://reviews.llvm.org/D73902
Summary:
After following Simon's suggestion about additional testing posted at
https://reviews.llvm.org/D73906, I found several more places that
need to be updated.
Reviewers: simon_tatham, dmgreen, ostannard, eli.friedman
Reviewed By: simon_tatham
Subscribers: merge_guards_bot, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73963
Summary:
This patch changes the underlying type of the ARM::ArchExtKind
enumeration to uint64_t and adjusts the related code.
The goal of the patch is to prepare the code base for a new
architecture extension.
Reviewers: simon_tatham, eli.friedman, ostannard, dmgreen
Reviewed By: dmgreen
Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits, pbarrio
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73906
Under MVE, we do not have any lowering for fminimum, which a
vector_reduce_fmin without NoNan will be expanded into. As with the
other recent patches, force this to expand in the pre-isel pass. Note
that Neon lowering would be OK because the scalar fminimum uses the
vector VMIN instruction, but it is probably better to just rely on the
scalar operations, which is what is done here.
Also fixes what appears to be the reversal of INF vs -INF in the
vector_reduce_fmin widening code.
This code matches (zext (trunc (setcc_carry))) -> (and (setcc_carry), 1)
but the code never checks what type we're truncating to. An and
mask of 1 would only make sense if the trunc was to MVT::i1, but
we didn't check for that.
I believe this code is a leftover from when i1 was a legal type.
Our normal lowering for ISD::SETCC uses X86ISD::SUB to enable
CSE unless the RHS is 0. optimizeCompareInstr called by the peephole
pass can turn subs with unused results into cmps to clean this up.
This commit makes other places that create X86ISD::CMP have the
same behavior.
We were creating two with different operand orders, and then only
using one of them.
Instead just swap the operands when needed and create a single node.
Broadwell was missing half the gather instructions. Both models
had some mixups in the resource costs and number of uops.
I've updated here based on what I think the original IACA source
says with some cross checking against the microcode.
I'm not sure about latency as the IACA source I have doesn't have
that information. So I'm using the latency from uops.info.
I plan to update Skylake models as well, but I'll do that in a
separate patch.
Differential Revision: https://reviews.llvm.org/D73844
This ports the existing case for G_XOR from `getTestBitOperand` in
AArch64ISelLowering into GlobalISel.
The idea is to flip between TBZ and TBNZ while walking through G_XORs.
Let's say we have
```
tbz (xor x, c), b
```
Let's say the `b`-th bit in `c` is 1. Then
- If the `b`-th bit in `x` is 1, the `b`-th bit in `(xor x, c)` is 0.
- If the `b`-th bit in `x` is 0, then the `b`-th bit in `(xor x, c)` is 1.
So, then
```
tbz (xor x, c), b == tbnz x, b
```
Let's say the `b`-th bit in `c` is 0. Then
- If the `b`-th bit in `x` is 1, the `b`-th bit in `(xor x, c)` is 1.
- If the `b`-th bit in `x` is 0, then the `b`-th bit in `(xor x, c)` is 0.
So, then
```
tbz (xor x, c), b == tbz x, b
```
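A standalone check of the flip rule above: testing bit b of (x ^ c) is the same as testing bit b of x, inverted exactly when bit b of c is set.
```
#include <cassert>
#include <cstdint>

bool bit(uint64_t v, unsigned i) { return (v >> i) & 1; }

int main() {
  const uint64_t x = 0x0123456789abcdefULL;
  const uint64_t c = 0xf0f0f0f0f0f0f0f0ULL;
  for (unsigned b = 0; b < 64; ++b)
    assert(bit(x ^ c, b) == (bit(c, b) ? !bit(x, b) : bit(x, b)));
  return 0;
}
```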
Differential Revision: https://reviews.llvm.org/D73929
This implements the following optimization:
```
(tbz (shl x, c), b) -> (tbz x, b-c)
```
Which appears in `getTestBitOperand` in AArch64ISelLowering.cpp.
If we test bit `b` of `shl x, c`, we can fold away the `shl` by looking `c` bits
to the right of `b` in `x` when this fits in the type. So, we can just test the
`b-c`th bit.
Differential Revision: https://reviews.llvm.org/D73924
Summary:
The AIX assembler .space directive can't take a second non-zero argument to fill
with. But LLVM emitFill currently assumes it can. We add a flag to the AsmInfo
to check if non-zero fill is supported, and if we can't zerofill non-zero values
we just splat the .byte directives.
Reviewers: stevewan, sfertile, DiggerLin, jasonliu, Xiangling_L
Reviewed By: jasonliu
Subscribers: Xiangling_L, wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73554
Given
```
tb(n)z (and x, m), b
```
Where the `b`-th bit of `m` is 1,
```
tb(n)z (and x, m), b == tb(n)z x, b
```
So, we can walk past a `G_AND` in this case.
Also add test/CodeGen/AArch64/GlobalISel/opt-fold-and-tbz-tbnz.mir to test this.
Differential Revision: https://reviews.llvm.org/D73790
convertPtrAddToAdd improved overall code size and quality by a significant amount,
but on -O0 we generate some cross-class copies due to the fact that we emitted
G_PTRTOINT and G_INTTOPTR around the G_ADD. Unfortunately at -O0 we don't run any
register coalescing, so these cross class copies end up escaping as moves, and
we ended up regressing 3 benchmarks on CTMark (though still a winner overall).
This patch changes the lowering to instead directly emit the G_ADD into the
destination register, and then force changes the dest LLT to s64 from p0. This
should be ok, as all uses of the register should now be selected and therefore
the LLT doesn't matter for the users. It does however matter for the importer
patterns, which will fail to select a G_ADD if there's a p0 LLT.
I'm not able to get rid of the G_PTRTOINT on the source yet however. We can't
use the same trick of breaking the type system since that could break the
selection of the defining instruction. Thus with -O0 we still end up with a
cross class copy on source.
Code size improvements on -O0:
Program baseline new diff
test-suite :: CTMark/Bullet/bullet.test 965520 949164 -1.7%
test-suite...TMark/7zip/7zip-benchmark.test 1069456 1052600 -1.6%
test-suite...ark/tramp3d-v4/tramp3d-v4.test 1213692 1199804 -1.1%
test-suite...:: CTMark/sqlite3/sqlite3.test 421680 419736 -0.5%
test-suite...-typeset/consumer-typeset.test 837076 833380 -0.4%
test-suite :: CTMark/lencod/lencod.test 799712 796976 -0.3%
test-suite...:: CTMark/ClamAV/clamscan.test 688264 686132 -0.3%
test-suite :: CTMark/kimwitu++/kc.test 1002344 999648 -0.3%
test-suite...Mark/mafft/pairlocalalign.test 422296 421768 -0.1%
test-suite :: CTMark/SPASS/SPASS.test 656792 656532 -0.0%
Geomean difference -0.6%
Differential Revision: https://reviews.llvm.org/D73910
Start using a new strategy with a combination of merge and unmerges.
This allows scalarizing before lowering, which in cases like
<2 x s128> avoids producing giant illegal shifts.
Followup to D73135. If the target doesn't have hard float (default
for ARM), then we assert when trying to soften the result of vector
reduction intrinsics. This patch marks these for expansion as well.
(A bit odd to use vectors on a target without hard float ... but
that's where you end up if you expose target-independent vector types.)
Differential Revision: https://reviews.llvm.org/D73854
As detailed on PR43463, this fixes a static analyzer null dereference warning by sinking Changed = true into the if() blocks where the MIB is actually created.
I did a quick check that suggested that one of those if() blocks is always guaranteed to be hit (so we could change it to if-else), but this seems like a safer approach
Differential Revision: https://reviews.llvm.org/D73883
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73885
These instructions can set the exception in FPSW. But I
don't think they can change FPCW. So this looks like a typo.
Differential Revision: https://reviews.llvm.org/D73864
These can be lowered to code sequences using CMPFP and CMPFPE which then get
selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain
operand isn't handled correctly, but resolving that looks like it would involve
changes around FPSCR-handling instructions and how the FPSCR is modelled.
The fp-intrinsics test was already testing some of this but as the entire test
was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the
cases where we aren't generating the right instruction sequences as FIXME.
Differential Revision: https://reviews.llvm.org/D73194
Summary:
In big-endian MVE, the simple vector load/store instructions (i.e.
both contiguous and non-widening) don't all store the bytes of a
register to memory in the same order: it matters whether you did a
VSTRB.8, VSTRH.16 or VSTRW.32. Put another way, the in-register
formats of different vector types relate to each other in a different
way from the in-memory formats.
So, if you want to 'bitcast' or 'reinterpret' one vector type as
another, you have to carefully specify which you mean: did you want to
reinterpret the //register// format of one type as that of the other,
or the //memory// format?
The ACLE `vreinterpretq` intrinsics are specified to reinterpret the
register format. But I had implemented them as LLVM IR bitcast, which
is specified for all types as a reinterpretation of the memory format.
So a `vreinterpretq` intrinsic, applied to values already in registers,
would code-generate incorrectly if compiled big-endian: instead of
emitting no code, it would emit a `vrev`.
To fix this, I've introduced a new IR intrinsic to perform a
register-format reinterpretation: `@llvm.arm.mve.vreinterpretq`. It's
implemented by a trivial isel pattern that expects the input in an
MQPR register, and just returns it unchanged.
In the clang codegen, I only emit this new intrinsic where it's
actually needed: I prefer a bitcast wherever it will have the right
effect, because LLVM understands bitcasts better. So we still generate
bitcasts in little-endian mode, and even in big-endian when you're
casting between two vector types with the same lane size.
For testing, I've moved all the codegen tests of vreinterpretq out
into their own file, so that they can have a different set of RUN
lines to check both big- and little-endian.
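A hedged source-level example (assumes an MVE-enabled toolchain providing arm_mve.h; shown only to make the semantics concrete): the intrinsic reinterprets the register format, so even in big-endian mode it should emit no instructions, unlike a memory-format bitcast of differing lane sizes.
```
#include <arm_mve.h>

// Register-format reinterpretation; with this change, no vrev is emitted
// in big-endian mode.
uint8x16_t as_bytes(uint32x4_t v) {
  return vreinterpretq_u8_u32(v);
}
```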
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73786
Summary:
These instructions generate a vector of consecutive elements starting
from a given base value and incrementing by 1, 2, 4 or 8. The `wdup`
versions also wrap the values back to zero when they reach a given
limit value. The instruction updates the scalar base register so that
another use of the same instruction will continue the sequence from
where the previous one left off.
At the IR level, I've represented these instructions as a family of
target-specific intrinsics with two return values (the constructed
vector and the updated base). The user-facing ACLE API provides a set
of intrinsics that throw away the written-back base and another set
that receive it as a pointer so they can update it, plus the usual
predicated versions.
Because the intrinsics return two values (as do the underlying
instructions), the isel has to be done in C++.
This is the first family of MVE intrinsics that use the `imm_1248`
immediate type in the clang Tablegen framework, so naturally, I found
I'd given it the wrong C integer type. Also added some tests of the
check that the immediate has a legal value, because this is the first
time those particular checks have been exercised.
Finally, I also had to fix a bug in MveEmitter which failed an
assertion when I nested two `seq` nodes (the inner one used to extract
the two values from the pair returned by the IR intrinsic, and the
outer one put on by the predication multiclass).
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73357
Summary:
The unpredicated case of this is trivial: the clang codegen just makes
a vector splat of the input, and LLVM isel is already prepared to
handle that. For the predicated version, I've generated a `select`
between the same vector splat and the `inactive` input parameter, and
added new Tablegen isel rules to match that pattern into a predicated
`MVE_VDUP` instruction.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73356
Summary: There are no counters for individual ports, but this is already
enough to find a lot of issues in the current model (upcoming patch).
Reviewers: dblaikie, gchatelet
Subscribers: hiraditya, tschuett, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72032
Summary:
D68092 introduced a new SIRemoveShortExecBranches optimization pass and
broke some graphics shaders. The problem is that it was removing
branches over KILL pseudo instructions, and the fix is to explicitly
check for that in mustRetainExeczBranch.
Reviewers: critson, arsenm, nhaehnle, cdevadas, hakzsam
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73771
We only need to call this on floating point comparisons. In this
case these are known to be integer compares. One of them even
has a SUB opcode instead of CMP.
This reverts commit e34801c8e6 and the followup due to multiple
problems.
I've tried to keep the tests and RDA parts where possible, as those
still seem useful.
Summary:
Virtual registers that are undef have an empty LiveInterval at this
point, which means beginIndex() and endIndex() cannot be used. We
only need those indices to determine the range in which to scan for
affected other NSA instructions, and undef operands cannot contribute
to that range.
Reviewers: arsenm, rampitec, mareko
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73831
We were checking that the original Value * for the compare operands
were null. But that can never happen.
I believe we intended to check for 0 registers here instead.
Fixes PR44749.
This is an alternate fix for the issue D73606 was trying to
solve.
The main issue here is that we bailed out of
foldOffsetIntoAddress if Offset is 0. But if we just found a
symbolic displacement and AM.Disp became non-zero
earlier, we still need to validate that AM.Disp with the symbolic
displacement.
This is my second attempt at committing this after failing
build bots previously. One thing I realized about the previous
attempt is that it's possible that AM.Disp is already non-zero
and the new Offset changes it back to zero. In that case my
previous attempt failed to update AM.Disp to zero. So this patch
removes the early out for 0 and appropriately handles the 0 case
in each check so we still update AM.Disp at the end.
This is based on this llvm-dev thread http://lists.llvm.org/pipermail/llvm-dev/2019-December/137521.html
The current strategy for f16 is to promote the type to float everywhere except where the specific width is required, like loads, stores, and bitcasts. This results in rounding occurring in odd places instead of immediately after arithmetic operations. This interacts in weird ways with the __fp16 type in clang, which is a storage-only type where arithmetic is always promoted to float. InstCombine can remove some fpext/fptruncs around such arithmetic and turn it into arithmetic on half. This wouldn't be so bad if SelectionDAG were able to put those fpext/fpround back in when it promotes.
It is also not obvious how to handle to make the existing strategy work with STRICT fp. We need to use STRICT versions of the conversions which require chain operands. But if the conversions are created for a bitcast, there is no place to get an appropriate chain from.
This patch implements a different strategy where conversions are emitted directly around arithmetic operations, and otherwise the value is passed around as an i16, including in arguments and return values. This can result in more conversions between arithmetic operations, but is closer to matching the IR the frontend generates for __fp16. And it will allow us to use the chain from constrained arithmetic nodes to link the STRICT_FP_TO_FP16/STRICT_FP16_TO_FP that will need to be added. I've set it up so that each target can opt into the new behavior. Converting all the targets myself was more than I was able to handle.
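A hedged source-level illustration of the __fp16 interaction described above (assuming clang's storage-only __fp16 extension is available on the target):
```
// Arithmetic on __fp16 is performed in float and rounded back to half;
// where those conversions end up in the DAG depends on the f16
// legalization strategy discussed above.
__fp16 scale(__fp16 a, __fp16 b) {
  return a * b + a;
}
```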
Differential Revision: https://reviews.llvm.org/D73749
This fixes legalization of global stores wider than 128 bits. It seems more work
is needed on how this split actually occurs. For example, we get the right code
for s160, with an s128 and an s32 load, but get five s32 loads for <5 x s32>.
Summary:
Implements the jump pseudo-instruction, which is used in e.g. the Linux kernel.
Reviewers: asb, lenary
Reviewed By: lenary
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73178
When you encounter a G_TRUNC, you are moving from a larger type to a smaller
type.
Asking for the i-th bit on a larger value is the same as asking for the i-th
bit on a smaller value.
So, we should always be able to walk through G_TRUNC when computing the bit
for a TB(N)Z.
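A standalone illustration of that bit-level fact (plain C++, not GlobalISel code):
```
#include <cassert>
#include <cstdint>

int main() {
  uint64_t Wide = 0xDEADBEEFCAFEF00DULL;
  uint32_t Narrow = static_cast<uint32_t>(Wide); // models a G_TRUNC s64 -> s32
  // Every bit index that exists in the narrow type reads the same in both
  // values, so a single-bit test can walk through the truncate and look at
  // the wide source directly.
  for (unsigned I = 0; I != 32; ++I)
    assert(((Narrow >> I) & 1) == ((Wide >> I) & 1));
  return 0;
}
```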
Differential Revision: https://reviews.llvm.org/D73748
Summary: This is a first step before changing the types to llvm::Align and introducing functions to ease client code.
Reviewers: courbet
Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73785
Try out using combine definition rules.
This really should be a post-legalizer combine, but the combiner pass
is currently pre-legalize. Most of the target combines are really
post-legalize, so we should probably move the pass.
For an MC_GlobalAddress reference to a dso_local external GlobalValue with a definition, emit .Lfoo$local to avoid a relocation.
-fno-pic and -fpie can infer dso_local but -fpic cannot. In the future,
we can explore the possibility of inferring dso_local with -fpic. As the
description of D73228 says, LLVM's existing IPO optimization behaviors
(like -fno-semantic-interposition) and a previous assembly behavior give
us enough license to be aggressive here.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D73230
I believe this also fixes bugs with CI 32-bit handling, which was
incorrectly skipping offsets that look like signed 32-bit values. Also
validate the offsets are dword aligned before folding.
This is similar to the code in getTestBitOperand in AArch64ISelLowering. Instead
of implementing all of the TB(N)Z optimizations at once, this patch implements
the simplest case first. The way that this is set up should make it fairly easy
to add the rest as we go along.
The idea here is that after determining that we can use a TB(N)Z, we can
continue looking through instructions and perform further folding.
In this case, when we have a G_ZEXT or G_ANYEXT where the extended bits are not
used, we can fold it into the TB(N)Z.
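A rough illustration of why the fold is sound (plain C++, not GlobalISel code): when only a low bit is inspected, the extended bits never matter, so the bit test can be performed on the unextended source.
```
bool bit3IsSet(unsigned char V) {
  unsigned Wide = V; // models a G_ZEXT (or G_ANYEXT) s8 -> s32
  // Only bit 3 is ever inspected; the extended high bits are unused, so the
  // test could equally be done on V itself, which is what folding the
  // extend into the TB(N)Z amounts to.
  return (Wide & (1u << 3)) != 0;
}
```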
Differential Revision: https://reviews.llvm.org/D73673
Found by inspection, but there's no test for this yet because G_PTR_ADD is
currently illegal for vectors. I'll add the test at a later time when the
legalizer support has landed.
There's not much value in keeping this node separate from the intrinsic. Make
the operand structure the same as the intrinsic's, so we can reuse the
same pattern for GlobalISel.
Summary:
Added file headers for the files which implement iterative lightweight scheduling
strategies. This is basically an exercise which I undertook in order to get
used to the LLVM development process.
Reviewers: arsenm, vpykhtin, cdevadas
Reviewed By: vpykhtin
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73417
Summary:
For -fpatchable-function-entry=N,0 -mbranch-protection=bti, after
9a24488cb6, we place the NOP sled after
the initial BTI.
```
.Lfunc_begin0:
bti c
nop
nop
.section __patchable_function_entries,"awo",@progbits,f,unique,0
.p2align 3
.xword .Lfunc_begin0
```
This patch adds a label after the initial BTI and changes the __patchable_function_entries entry to reference the label:
```
.Lfunc_begin0:
bti c
.Lpatch0:
nop
nop
.section __patchable_function_entries,"awo",@progbits,f,unique,0
.p2align 3
.xword .Lpatch0
```
This placement is compatible with the resolution in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 .
A local linkage function whose address is not taken does not need a BTI.
Placing the patch label after BTI has the advantage that code does not
need to differentiate whether the function has an initial BTI.
Reviewers: mrutland, nickdesaulniers, nsz, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73680
fadd/fmul reductions without reassoc are lowered to
VECREDUCE_STRICT_FADD/FMUL nodes, which don't have legalization
support. Until that is in place, expand these intrinsics on
ARM and AArch64. Other targets always expand the vector reduction
intrinsics.
Additionally, expand fmax/fmin reductions without the nnan flag on
AArch64, as the backend asserts that the flag is present when
lowering VECREDUCE_FMIN/FMAX.
This fixes https://bugs.llvm.org/show_bug.cgi?id=44600.
Differential Revision: https://reviews.llvm.org/D73135
The recommended optimization level for BPF programs
is -O2, since (1) BPF programs run inside the kernel and
the Linux kernel won't work at the -O0 level, and (2) the verifier
is not able to handle -O0 code properly, e.g., due to potentially
large stack size and a lot of spills.
But we should at least keep -O0 compiling.
This patch fixes a bug in the BPFMISimplifyPatchable phase
where, with -O0, a segmentation fault happens for a
simple program like:
int test(int a, int b) { return a + b; }
A test case is added to capture such a case.
Differential Revision: https://reviews.llvm.org/D73681
Summary:
This patch intends to support the three most common relocation types
on AIX: R_POS, R_TOC, R_RBR.
These three relocation types will be needed for object file generation
on AIX for the small code model.
We will have follow-up patches to bring relocation support for
the large code model on AIX.
Reviewers: hubert.reinterpretcast, daltenty, DiggerLin
Differential Revision: https://reviews.llvm.org/D72027
With the addition of prefixed instructions, branch distances are no longer
computed correctly. Since a prefixed instruction cannot cross a 64-byte
boundary, we have to assume that a prefixed instruction may have a nop
prepended to it. This patch tries to take that nop into consideration
when computing the size of basic blocks.
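A simplified model of that conservative estimate (not the actual PowerPC branch-selection code; the 8-byte prefixed-instruction size and the possible 4-byte alignment nop are the assumptions being encoded):
```
#include <cstdint>

// Worst-case byte count for a single instruction when sizing basic blocks:
// a prefixed instruction is 8 bytes and, because it may not cross a 64-byte
// boundary, may additionally have a 4-byte alignment nop prepended.
uint64_t worstCaseSize(bool IsPrefixed) {
  uint64_t Size = IsPrefixed ? 8 : 4;
  if (IsPrefixed)
    Size += 4; // possible alignment nop
  return Size;
}
```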
Differential Revision: https://reviews.llvm.org/D72572
The commit https://reviews.llvm.org/rG14fc20ca6 added some options to the X86
back end that cause the help text for opt/llc to become much harder to read.
The issue is that the cl::value_desc is part of the option name and is used to
compute the indentation of the description text (i.e. the maximum length option
name is what everything aligns to). Since the commit puts a large number of
characters into that text, everything is aligned to that width.
This patch just reformats the option so that the descriptive text is contained in the
description and only the list of possible values appears within the angle brackets.
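A hedged sketch of the resulting shape (the option name and strings below are invented for illustration; cl::desc and cl::value_desc are the real llvm::cl fields involved):
```
#include "llvm/Support/CommandLine.h"
using namespace llvm;

// Keep the long prose in cl::desc. cl::value_desc is printed inside the
// angle brackets as part of the option name, and the widest option name
// determines the column that every description aligns to.
static cl::opt<std::string> WidgetStrategy( // hypothetical option
    "x86-widget-strategy",
    cl::desc("Choose the widget lowering strategy: fast, small or auto"),
    cl::value_desc("strategy"), cl::init("auto"), cl::Hidden);
```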
Note: the readability issue of the help text was fixed in commit
70cbf8c71c, but the reformatting wasn't
included in that commit, so I am still committing this.
Differential revision: https://reviews.llvm.org/D73267
Strict fp-to-int and int-to-fp conversions can be handled in the same way that
the non-strict versions are (by using the appropriate instruction or converting
to a function call when we have no instruction).
Differential Revision: https://reviews.llvm.org/D73625