llvm-project

Commit Graph

Author	SHA1	Message	Date
Krzysztof Parzyszek	0ce93194fe	[Hexagon] Fix type in HexagonTargetLowering::ReplaceNodeResults llvm-svn: 371083	2019-09-05 16:19:47 +00:00
David Candler	a59bffb576	[ARM] Add support for the s,j,x,N,O inline asm constraints A number of inline assembly constraints are currently supported by LLVM, but rejected as invalid by Clang: Target independent constraints: s: An integer constant, but allowing only relocatable values ARM specific constraints: j: An immediate integer between 0 and 65535 (valid for MOVW) x: A 32, 64, or 128-bit floating-point/SIMD register: s0-s15, d0-d7, or q0-q3 N: An immediate integer between 0 and 31 (Thumb1 only) O: An immediate integer which is a multiple of 4 between -508 and 508. (Thumb1 only) This patch adds support to Clang for the missing constraints along with some checks to ensure that the constraints are used with the correct target and Thumb mode, and that immediates are within valid ranges (at least where possible). The constraints are already implemented in LLVM, but just a couple of minor corrections to checks (V8M Baseline includes MOVW so should work with 'j', 'N' and 'O' shouldn't be valid in Thumb2) so that Clang and LLVM are in line with each other and the documentation. Differential Revision: https://reviews.llvm.org/D65863 Change-Id: I18076619e319bac35fbb60f590c069145c9d9a0a llvm-svn: 371079	2019-09-05 15:17:25 +00:00
Simon Pilgrim	29361c704d	[X86][SSE] EltsFromConsecutiveLoads - ignore non-zero offset base loads (PR43227) As discussed on D64551 and PR43227, we don't correctly handle cases where the base load has a non-zero byte offset. Until we can properly handle this, we must bail from EltsFromConsecutiveLoads. llvm-svn: 371078	2019-09-05 15:07:07 +00:00
David Green	83a3341246	[ARM] Fixup the creation of VPT blocks This attempts to just fix the creation of VPT blocks, fixing up the iterating, which instructions are considered in the bundle, and making sure that we do not overrun the end of the block. Differential Revision: https://reviews.llvm.org/D67219 llvm-svn: 371064	2019-09-05 13:37:04 +00:00
Guillaume Chatelet	33671ceffa	[LLVM][Alignment] Convert isLegalNTStore/isLegalNTLoad to llvm::Align Summary: This is patch is part of a serie to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67223 llvm-svn: 371063	2019-09-05 13:09:42 +00:00
Petar Avramovic	a4bfc8dfda	[MIPS GlobalISel] Select G_FENCE G_FENCE comes form fence instruction. For MIPS fence is generated in AtomicExpandPass when atomic instruction gets surrounded with fence instruction when needed. G_FENCE arguments don't have LLT, because of that there is no job for legalizer and regbankselect. Instruction select G_FENCE for MIPS32. Differential Revision: https://reviews.llvm.org/D67181 llvm-svn: 371056	2019-09-05 11:20:32 +00:00
Petar Avramovic	f5c7fe0795	[MIPS GlobalISel] Select llvm.trap intrinsic Select G_INTRINSIC_W_SIDE_EFFECTS for Intrinsic::trap for MIPS32 via legalizeIntrinsic. Differential Revision: https://reviews.llvm.org/D67180 llvm-svn: 371055	2019-09-05 11:16:37 +00:00
Petar Avramovic	d2574d79b6	[MIPS GlobalISel] Lower SRet pointer arguments Instead of returning structure by value clang usually adds pointer to that structure as an argument. Pointers don't require special handling no matter the SRet flag. Remove unsuccessful exit from lowerCall for arguments with SRet flag if they are pointers. Differential Revision: https://reviews.llvm.org/D67179 llvm-svn: 371054	2019-09-05 11:12:01 +00:00
Simon Pilgrim	071287c5a9	Revert rL370996 from llvm/trunk: [AArch64][GlobalISel] Teach AArch64CallLowering to handle basic sibling calls This adds support for basic sibling call lowering in AArch64. The intent here is to only handle tail calls which do not change the ABI (hence, sibling calls.) At this point, it is very restricted. It does not handle - Vararg calls. - Calls with outgoing arguments. - Calls whose calling conventions differ from the caller's calling convention. - Tail/sibling calls with BTI enabled. This patch adds - `AArch64CallLowering::isEligibleForTailCallOptimization`, which is equivalent to the same function in AArch64ISelLowering.cpp (albeit with the restrictions above.) - `mayTailCallThisCC` and `canGuaranteeTCO`, which are identical to those in AArch64ISelLowering.cpp. - `getCallOpcode`, which is exactly what it sounds like. Tail/sibling calls are lowered by checking if they pass target-independent tail call positioning checks, and checking if they satisfy `isEligibleForTailCallOptimization`. If they do, then a tail call instruction is emitted instead of a normal call. If we have a sibling call (which is always the case in this patch), then we do not emit any stack adjustment operations. When we go to lower a return, we check if we've already emitted a tail call. If so, then we skip the return lowering. For testing, this patch - Adds call-translator-tail-call.ll to test which tail calls we currently lower, which ones we don't, and which ones we shouldn't. - Updates branch-target-enforcement-indirect-calls.ll to show that we fall back as expected. Differential Revision: https://reviews.llvm.org/D67189 ........ This fails on EXPENSIVE_CHECKS builds due to a -verify-machineinstrs test failure in CodeGen/AArch64/dllimport.ll llvm-svn: 371051	2019-09-05 10:38:39 +00:00
Simon Pilgrim	082750fe68	[X86] X86SpeculativeLoadHardeningPass::canHardenRegister - fix out of bounds warning. Fixes clang static-analyzer warning. llvm-svn: 371050	2019-09-05 10:26:38 +00:00
Jonas Paulsson	821858780e	[SystemZ] Recognize INLINEASM_BR in backend Handle the remaining cases also by handling asm goto in SystemZInstrInfo::getBranchInfo(). Review: Ulrich Weigand https://reviews.llvm.org/D67151 llvm-svn: 371048	2019-09-05 10:20:05 +00:00
Simon Pilgrim	67991a59cb	[X86] X86InstrInfo::optimizeCompareInstr - fix potential null dereference. Fixes clang static-analyzer warning. Technically the MachineInstr *Sub might still be null if we're comparing zero (IsCmpZero == true), although this probably won't happen as SrcReg2 is probably == 0. llvm-svn: 371047	2019-09-05 10:18:24 +00:00
Guillaume Chatelet	aff45e4b23	[LLVM][Alignment] Make functions using log of alignment explicit Summary: This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align. The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment. A few renames uncovered dubious assignments: - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation. - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation, - `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation, Reviewers: lattner, thegameg, courbet Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet Tags: #llvm Differential Revision: https://reviews.llvm.org/D65945 llvm-svn: 371045	2019-09-05 10:00:22 +00:00
Matt Arsenault	f581d575ce	AMDGPU: Add intrinsics for address space identification The library currently uses ptrtoint and directly checks the queue ptr for this, which counts as a pointer capture. llvm-svn: 371009	2019-09-05 02:20:39 +00:00
Matt Arsenault	69b1a2ae65	AMDGPU/GlobalISel: Restore insert point when getting aperture Avoids SSA violations in a future patch. llvm-svn: 371008	2019-09-05 02:20:32 +00:00
Matt Arsenault	25156ae7ea	AMDGPU/GlobalISel: Fix placeholder value used for addrspacecast llvm-svn: 371007	2019-09-05 02:20:29 +00:00
Matt Arsenault	d51a3746d0	AMDGPU/GlobalISel: Fix assert on load from constant address llvm-svn: 371006	2019-09-05 02:20:25 +00:00
Jessica Paquette	b78324fc40	[AArch64][GlobalISel] Teach AArch64CallLowering to handle basic sibling calls This adds support for basic sibling call lowering in AArch64. The intent here is to only handle tail calls which do not change the ABI (hence, sibling calls.) At this point, it is very restricted. It does not handle - Vararg calls. - Calls with outgoing arguments. - Calls whose calling conventions differ from the caller's calling convention. - Tail/sibling calls with BTI enabled. This patch adds - `AArch64CallLowering::isEligibleForTailCallOptimization`, which is equivalent to the same function in AArch64ISelLowering.cpp (albeit with the restrictions above.) - `mayTailCallThisCC` and `canGuaranteeTCO`, which are identical to those in AArch64ISelLowering.cpp. - `getCallOpcode`, which is exactly what it sounds like. Tail/sibling calls are lowered by checking if they pass target-independent tail call positioning checks, and checking if they satisfy `isEligibleForTailCallOptimization`. If they do, then a tail call instruction is emitted instead of a normal call. If we have a sibling call (which is always the case in this patch), then we do not emit any stack adjustment operations. When we go to lower a return, we check if we've already emitted a tail call. If so, then we skip the return lowering. For testing, this patch - Adds call-translator-tail-call.ll to test which tail calls we currently lower, which ones we don't, and which ones we shouldn't. - Updates branch-target-enforcement-indirect-calls.ll to show that we fall back as expected. Differential Revision: https://reviews.llvm.org/D67189 llvm-svn: 370996	2019-09-04 22:54:52 +00:00
Matt Arsenault	2df41a8e38	AMDGPU/GlobalISel: Select G_BITREVERSE llvm-svn: 370980	2019-09-04 20:46:31 +00:00
Matt Arsenault	5ff310e298	GlobalISel: Add basic legalization for G_BITREVERSE llvm-svn: 370979	2019-09-04 20:46:15 +00:00
Matt Arsenault	84489b34f6	AMDGPU: Handle frame index expansion with no free SGPRs pre gfx9 Since an add instruction must produce an unused carry out, this requires additional SGPRs. This can be avoided by keeping the entire offset computation in SGPRs. If one SGPR is still available, this only costs one extra mov. If none are available, the entire computation can be done in place and reversed. This does assume the use is a VGPR operand. This was already assumed, and we currently only select frame indexes to VALU instructions. This should probably be fixed at some point to handle more possible MIR. llvm-svn: 370929	2019-09-04 17:12:57 +00:00
Matt Arsenault	d9af712da4	AMDGPU/GlobalISel: Make 16-bit constants legal This is mostly for the benefit of patterns which use 16-bit constants. llvm-svn: 370921	2019-09-04 16:19:45 +00:00
Krzysztof Parzyszek	08a09822a5	[Hexagon] Improve generated code for test-if-bit-clear, one more time Adjust isel patterns after recent commit. Fixes https://llvm.org/PR43194. llvm-svn: 370913	2019-09-04 15:22:36 +00:00
Sam Parker	fea532230b	[ARM][ParallelDSP] SExt mul for accumulation For any unpaired muls, we accumulate them as an input to the reduction. Check the type of the mul and perform a sext if the existing accumlator input type is not the same. Differential Revision: https://reviews.llvm.org/D66993 llvm-svn: 370851	2019-09-04 08:41:34 +00:00
Jim Lin	b77aa1d248	[RISCV] Enable tail call opt for variadic function Summary: Tail call opt can treat variadic function call the same as normal function call Reviewers: mgrang, asb, lenary, lewis-revill Reviewed By: lenary Subscribers: luismarques, pzheng, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66278 llvm-svn: 370835	2019-09-04 02:03:36 +00:00
Reid Kleckner	3fa07dee94	Revert [Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn This reverts r370525 (git commit `0bb1630685`) Also reverts r370543 (git commit `185ddc08ee`) The approach I took only works for functions marked `noreturn`. In general, a call that is not known to be noreturn may be followed by unreachable for other reasons. For example, there could be multiple call sites to a function that throws sometimes, and at some call sites, it is known to always throw, so it is followed by unreachable. We need to insert an `int3` in these cases to pacify the Windows unwinder. I think this probably deserves its own standalone, Win64-only fixup pass that runs after block placement. Implementing that will take some time, so let's revert to TrapUnreachable in the mean time. llvm-svn: 370829	2019-09-03 22:27:27 +00:00
Heejin Ahn	49e7ee4dd5	[WebAssembly] Compare functions by names in Emscripten Sjlj Summary: This removes all string constants for function names and compares functions by string directly when needed. Many of these constants are used only once or twice so the benefit of defining them separately is not very clear, and this actually fixes a bug. When we already have a `malloc` declaration which is an alias to something else within the module, ``` @malloc = weak hidden alias i8* (i32), i8* (i32)* @dlmalloc ``` (this happens compiling with emscripten with `-s WASM_OBJECT_FILES=0` because all bc files are merged before being fed into `wasm-ld` which runs the backend optimizations as LTO) `Module::getFunction("malloc")` in `canLongjmp` returns `nullptr` because `Module::getFunction` dyncasts pointer into `Function`, but the alias is a `GlobalValue` but not a `Function`. This makes `canLongjmp` return false for `malloc` in this case, and we end up adding a lot of longjmp handling code around malloc. This is not only a code size increase but actually a bug because `malloc` is used in the entry block when preparing for setjmp tables for emscripten sjlj handling, and this makes initial setjmp preparation, which has to happen in the entry block, move to another split block, and this interferes with SSA update later. This also adds two more functions, `getTempRet0` and `setTempRet0`, in the list of not longjmp-able functions. Fixes https://github.com/emscripten-core/emscripten/issues/8935. Reviewers: sbc100 Subscribers: mehdi_amini, jgravelle-google, hiraditya, sunfish, dexonsmith, dschuff, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67129 llvm-svn: 370828	2019-09-03 22:26:49 +00:00
Amara Emerson	2a2c25ba48	[AArch64][GlobalISel] Legalize 128 bit divisions to libcalls. Now that we have the infrastructure to support s128 types as parameters we can expand these to libcalls. Differential Revision: https://reviews.llvm.org/D66185 llvm-svn: 370823	2019-09-03 21:42:32 +00:00
Amara Emerson	fbaf425b79	[GlobalISel][CallLowering] Add support for splitting types according to calling conventions. On AArch64, s128 types have to be split into s64 GPRs when passed as arguments. This change adds the generic support in call lowering for dealing with multiple registers, for incoming and outgoing args. Support for splitting for return types not yet implemented. Differential Revision: https://reviews.llvm.org/D66180 llvm-svn: 370822	2019-09-03 21:42:28 +00:00
Reid Kleckner	b2d10cf22e	[MC] Pass through .code16/32/64 and .syntax unified for COFF These flags should simply be passed through to the target, which will do the right thing. Add an MC/X86 test that uses these directives with the three primary object file formats and shows that they disassemble the same everywhere. There is a missing test for .code32 on Windows ARM, since I'm not sure exactly how to construct one. Fixes PR43203 llvm-svn: 370805	2019-09-03 18:16:52 +00:00
Jessica Paquette	15036acb05	[AArch64][GlobalISel] Don't import i64imm_32bit pattern at -O0 This pattern, when imported at -O0 adds an extra copy via the SUBREG_TO_REG. This is because the SUBREG_TO_REG is not eliminated. At all other opt levels, it is eliminated. This is a 1% geomean code size savings at -O0 on CTMark. Differential Revision: https://reviews.llvm.org/D67027 llvm-svn: 370789	2019-09-03 17:21:12 +00:00
Kerry McLaughlin	7b5c6b8d86	[SVE][Inline-Asm] Fix -Wimplicit-fallthrough in AArch64ISelLowering.cpp Summary: Adds break to 'x' case in getRegForInlineAsmConstraint added by D66302, fixing the unintentional fallthrough. Reviewers: sdesmalen, rovka, cameron.mcinally, greened, gribozavr, ruiu Reviewed By: sdesmalen Subscribers: bjope, javed.absar, tschuett, kristof.beyls, rkruppe, psnobl, llvm-commits, cfe-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67095 llvm-svn: 370769	2019-09-03 15:45:42 +00:00
Simon Pilgrim	99525bbe49	[X86] Merge 2 consecutive HasInt256 branches. NFCI. llvm-svn: 370761	2019-09-03 14:39:06 +00:00
Jonas Paulsson	a0a811739d	[SystemZ] Recognize INLINEASM_BR in backend. SystemZInstrInfo::analyzeBranch() needs to check for INLINEASM_BR instructions, or it will crash. Review: Ulrich Weigand llvm-svn: 370753	2019-09-03 13:31:22 +00:00
David Green	2f3574c168	[ARM] Ignore Implicit CPSR regs when lowering from Machine to MC operands The code here seems to date back to r134705, when tablegen lowering was first being added. I don't believe that we need to include CPSR implicit operands on the MCInst. This now works more like other backends (like AArch64), where all implicit registers are skipped. This allows the AliasInst for CSEL's to match correctly, as can be seen in the test changes. Differential revision: https://reviews.llvm.org/D66703 llvm-svn: 370745	2019-09-03 11:30:54 +00:00
Jonas Paulsson	f12415812c	[SystemZ] Add support for fentry. SystemZAsmPrinter now properly emits function calls to __fentry__. Review: Ulrich Weigand llvm-svn: 370743	2019-09-03 11:21:12 +00:00
David Green	61973d978b	[ARM] Invert CSEL predicates if the opposite is a simpler constant to materialise This moves ConstantMaterializationCost into ARMBaseInstrInfo so that it can also be used in ISel Lowering, adding codesize values to the computed costs, to be able to compare either approximate instruction counts or codesize costs. It also adds a HasLowerConstantMaterializationCost, which compares the ConstantMaterializationCost of two values, returning true if the first is smaller either in instruction count/codesize, or falling back to the other in the case that they are equal. This is used in constant CSEL lowering to invert the predicate if the opposite is easier to materialise. Differential revision: https://reviews.llvm.org/D66701 llvm-svn: 370741	2019-09-03 11:06:24 +00:00
David Green	57cc65ff47	[ARM] Generate 8.1-m CSINC, CSNEG and CSINV instructions. Arm 8.1-M adds a number of related CSEL instructions, including CSINC, CSNEG and CSINV. These choose between two values given the content in CPSR and a condition, performing an increment, negation or inverse of the false value. This adds some selection for them, either from constant values or patterns. It does not include CSEL directly, which is currently not always making code better. It is still useful, but we will have to check more carefully where it should and shouldn't be used. Code by Ranjeet Singh and Simon Tatham, with some modifications from me. Differential revision: https://reviews.llvm.org/D66483 llvm-svn: 370739	2019-09-03 10:53:07 +00:00
Simon Atanasyan	25d5b54542	[mips] Switch to the `.text` section after emitting asm file preamble Now the last `.section` directive in the MIPS asm file preamble is the `.section .mdebug.abi`. If assembler code injected for example by the LLVM `module asm` or the C ` __asm` directives do not contain explicit switching to the `.text` section it goes to the `.mdebug.abi` section. It might be unexpected to the user and in fact for example breaks building some existing code like FreeBSD libc [1]. The patch forces switching to the `.text` section after emitting MIPS assembler file preamble. [1] https://bugs.llvm.org/show_bug.cgi?id=43119 Fix PR43119. Differential Revision: https://reviews.llvm.org/D67014 llvm-svn: 370735	2019-09-03 10:24:07 +00:00
David Green	3e8d5f335d	[ARM] Fix MVE ldst offset ranges We were using isShiftedInt<7, Shift>(RHSC) to detect the ranges of offsets to fold into MVE loads/stores. The instructions actually take a 7 bit unsigned integer which is either added or subtracted. So something more like isShiftedUInt<7, Shift>(abs(RHSC)). Instead I've changes this to use the isScaledConstantInRange method, same as in SelectT2AddrModeImm7Offset used by pre/post inc, which seemed to already be getting this correct. Differential revision: https://reviews.llvm.org/D66997 llvm-svn: 370731	2019-09-03 09:57:02 +00:00
Oliver Stannard	3be2df2418	[ARM][MVE] Decoding of VMSR doesn't diagnose some unpredictable encodings Decoding of VMSR doesn't diagnose some unpredictable encodings, as the unpredictable bits are not correctly set. Diff-reduce this instruction's internals WRT VMRS so I can see the differences better. Mostly this is s/src/Rt/g. Fill in the "should-be-(0)" bits. Designate the Unpredictable{} bits for both VMRS and VMSR. Patch by Mark Murray! Differential revision: https://reviews.llvm.org/D66938 llvm-svn: 370729	2019-09-03 09:55:30 +00:00
Oliver Stannard	39bf484d92	Bug fix on function epilog optimization (ARM backend) To save a 'add sp,#val' instruction by adding registers to the final pop instruction, the first register transferred by this pop instruction need to be found. If the function to be optimized has a non-void return value, the operand list contains r0 (implicit) which prevents the optimization to take place. Therefore implicit register references should be skipped in the search loop, because this registers are never popped from the stack. Patch by Rainer Herbertz (rOptimizer)! Differential revision: https://reviews.llvm.org/D66730 llvm-svn: 370728	2019-09-03 09:51:19 +00:00
Sam Tebbs	8b2df85d02	[ARM] Select vmla This patch adds vmla selection. Differential revision: https://reviews.llvm.org/D66297 llvm-svn: 370704	2019-09-03 08:17:46 +00:00
Craig Topper	b915109043	[X86] Simplify the setOperationAction handling for fp_to_uint by improving the Custom handler a bit. This merges the 32-bit and 64-bit mode code to just use Custom for both i32 and i64. We already had most of the handling in the custom handling due to the AVX512 having legal fp_to_uint. Just needed to add the i32->i64 promotion handling. Refactor the fp_to_uint code in the custom handler to simplify the number of times we check things. Tweak cost model tables to match the default handling we were getting due to Expand before. llvm-svn: 370700	2019-09-03 05:57:22 +00:00
Craig Topper	9dc8c448ed	[X86] Don't use Expand for i32 fp_to_uint on SSE1/2 targets on 32-bit target. Use Custom lowering instead. Fall back to default expansion only when the scalar FP type belongs in an XMM register. This improves lowering for i32 to fp80, and also i32 to double on SSE1 only. llvm-svn: 370699	2019-09-03 05:57:18 +00:00
Craig Topper	dcecc7ea46	[X86] Custom promote i32->f80 uint_to_fp on AVX512 64-bit targets. Reuse the same code to promote all i32 uint_to_fp on 64-bit targets to simplify the X86ISelLowering constructor. llvm-svn: 370693	2019-09-03 02:51:10 +00:00
Craig Topper	45cd185109	[X86] Enable fp128 as a legal type with SSE1 rather than with MMX. FP128 values are passed in xmm registers so should be asssociated with an SSE feature rather than MMX which uses a different set of registers. llc enables sse1 and sse2 by default with x86_64. But does not enable mmx. Clang enables all 3 features by default. I've tried to add command lines to test with -sse where possible, but any test that returns a value in an xmm register fails with a fatal error with -sse since we have no defined ABI for that scenario. llvm-svn: 370682	2019-09-02 20:16:30 +00:00
David Green	a95ec59fa5	[ARM] Use MQPR not QPR for MVE registers We should be using MQPR, and if we don't we can get COPYs and PHIs created for QPR. These get folded into instructions, failing verification checks. Differential revision: https://reviews.llvm.org/D66214 llvm-svn: 370676	2019-09-02 17:18:23 +00:00
Ulrich Weigand	b21e245711	[SystemZ] Support constrained fpto[su]i intrinsics Now that constrained fpto[su]i intrinsic are available, add codegen support to the SystemZ backend. In addition to pure back-end changes, I've also needed to add the strict_fp_to_[su]int and any_fp_to_[su]int pattern fragments in the obvious way. llvm-svn: 370674	2019-09-02 16:49:29 +00:00
Kerry McLaughlin	da4ef9b4c8	[SVE][Inline-Asm] Support for SVE asm operands Summary: Adds the following inline asm constraints for SVE: - w: SVE vector register with full range, Z0 to Z31 - x: Restricted to registers Z0 to Z15 inclusive. - y: Restricted to registers Z0 to Z7 inclusive. This change also adds the "z" modifier to interpret a register as an SVE register. Not all of the bitconvert patterns added by this patch are used, but they have been included here for completeness. Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, rengolin, cameron.mcinally, greened Reviewed By: sdesmalen Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66302 llvm-svn: 370673	2019-09-02 16:12:31 +00:00
Simon Pilgrim	fb5661a884	[X86] getPMOVMSKB - add MVT::v64i8 handling and remove from combineBitcastvxi1. NFCI. llvm-svn: 370670	2019-09-02 15:10:35 +00:00
Jay Foad	6e18266aa4	Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0" Summary: D61491 caused us to use relocs when they're not strictly necessary, to refer to symbols in the text section. This is a pessimization and it's a problem for some loaders that don't support relocs yet. Reviewers: nhaehnle, arsenm, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65813 llvm-svn: 370667	2019-09-02 14:40:57 +00:00
Dmitry Preobrazhensky	4aa90ea58e	[AMDGPU][MC][GFX10] Corrected constant bus checks to exclude null See AMD SWDEV-157286 Reviewers: atamazov, arsenm Differential Revision: https://reviews.llvm.org/D65229 llvm-svn: 370665	2019-09-02 14:19:52 +00:00
Dmitry Preobrazhensky	9c68eddbbe	[AMDGPU][MC][GFX10] Enabled null with 64-bit operands See Bug 42745: https://bugs.llvm.org/show_bug.cgi?id=42745 Reviewers: atamazov, arsenm https://reviews.llvm.org/D65231 llvm-svn: 370660	2019-09-02 13:42:25 +00:00
Dmitry Preobrazhensky	fe2ee4c46a	[AMDGPU][MC][GFX10] Corrected constant bus limit for 64-bit shift instructions See bug 42744: https://bugs.llvm.org/show_bug.cgi?id=42744 Reviewers: atamazov, arsenm Differential Revision: https://reviews.llvm.org/D65228 llvm-svn: 370652	2019-09-02 12:50:05 +00:00
Andrea Di Biagio	528f68144b	[X86][BtVer2] Fix latency and throughput of conditional SIMD store instructions. On BtVer2 conditional SIMD stores are heavily microcoded. The latency is directly proportional to the number of packed elements extracted from the input vector. Also, according to micro-benchmarks, most of the computation seems to be done in the integer unit. Only a minority of the uOPs is executed by the FPU. The observed behaviour on the FPU looks similar to this: - The input MASK value is moved to the Integer Unit -- [ a VMOVMSK-like uOP-executed on JFPU0]. - In parallel, each element of the input XMM/YMM is extracted and then sent to the IntegerUnit through JFPU1. As expected, a (conditional) store is executed for every extracted element. Interestingly, a (speculative) load is executed for every extracted element too. It is as-if a "LOAD - BIT_EXTRACT- CMOV" sequence of uOPs is repeated by the integer unit for every contionally stored element. VMASKMOVDQU is a special case: the number of speculative loads is always 2 (presumably, one load per quadword). That means, extra shifts and masking is performed on (one of) the loaded quadwords before each conditional store (that also explains the big number of non-FP uOPs retired). This patch replaces the existing writes for conditional SIMD stores (i.e. WriteFMaskedStore, and WriteFMaskedStoreY) with the following new writes: WriteFMaskedStore32 [ XMM Packed Single ] WriteFMaskedStore32Y [ YMM Packed Single ] WriteFMaskedStore64 [ XMM Packed Double ] WriteFMaskedStore64Y [ YMM Packed Double ] Added a wrapper class named X86SchedWriteMaskMove in X86Schedule.td to describe both RM and MR variants for conditional SIMD moves in a single tablegen definition. Instances of that class are then passed in input to multiclass avx_movmask_rm when constructing MASKMOVPS/PD definitions. Since this patch introduces new writes, I had to update all the X86 scheduling models. Differential Revision: https://reviews.llvm.org/D66801 llvm-svn: 370649	2019-09-02 12:32:28 +00:00
Simon Pilgrim	05a3a92751	[X86] combineHorizontalPredicateResult - pull out repeated getTargetLoweringInfo() calls. NFCI. llvm-svn: 370637	2019-09-02 10:42:48 +00:00
Craig Topper	3ab210862a	[X86] Add initial support for unfolding broadcast loads from arithmetic instructions to enable LICM hoisting of the load MachineLICM can hoist an invariant load, but if that load is folded it needs to be unfolded. On AVX512 sometimes this load is an broadcast load which we were previously unable to unfold. This patch adds initial support for that with a very basic list of supported instructions as a starting point. Differential Revision: https://reviews.llvm.org/D67017 llvm-svn: 370620	2019-09-01 22:14:36 +00:00
Simon Pilgrim	07de5292e5	[X86][AVX] Rename + cleanup lowerShuffleAsLanePermuteAndBlend. NFCI. Rename to lowerShuffleAsLanePermuteAndShuffle to make it clear that not just blends are performed. Cleanup the in-lane shuffle mask generation to make it more obvious what's going on. Some prep work noticed while investigating the poor shuffle code mentioned in D66004. llvm-svn: 370613	2019-09-01 16:04:28 +00:00
Simon Pilgrim	27cc2efaf2	Fix shadow variable warning. NFCI. llvm-svn: 370610	2019-09-01 13:10:18 +00:00
David Green	8469a39af3	[ARM] Remove MVE masked loads/stores These were never enabled correctly and are causing other problems. Taking them out for the moment, whilst we work on the issues. This reverts r370329. llvm-svn: 370607	2019-09-01 10:11:40 +00:00
Matt Arsenault	ede9a5293d	AMDGPU: Remove unused custom node definition llvm-svn: 370603	2019-09-01 02:00:08 +00:00
Craig Topper	1594605416	[X86] Replace some COPY_TO_REGCLASS from GR32/GR64 to VR128 in isel patterns with VMOVDI2PDIrr/VMOV64toPQIrr. This is what the copies will eventually be turned into. We don't use COPY_TO_REGCLASS for scalar_to_vector patterns. So we should use the real instruction here too. llvm-svn: 370601	2019-08-31 23:52:25 +00:00
Craig Topper	1329cc6e01	[X86] Compress the flag bits in the folding tables to make room for more bits in an upcoming patch. llvm-svn: 370600	2019-08-31 23:52:21 +00:00
David Bolvansky	8caa16ec13	[NFC] Fixed -Wdocumentation warning /srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.src/lib/Target/AMDGPU/AMDGPUGenRegisterBankInfo.def:98:1: warning: not a Doxygen trailing comment [-Wdocumentation] 1 warning generated. llvm-svn: 370596	2019-08-31 18:44:57 +00:00
Simon Pilgrim	f8d1d00190	[X86] EltsFromConsecutiveLoads - Don't confuse elt count with vector element count (PR43170) EltsFromConsecutiveLoads was assuming that the number of input elts was the same as the number of elements in the output vector type when creating a zeroing shuffle, causing an assert when subvectors were being combined instead of just scalars. llvm-svn: 370592	2019-08-31 16:21:31 +00:00
Simon Pilgrim	cffbec63d6	Fix shadow variable warning by making CondCodes names more explicit. NFCI. llvm-svn: 370589	2019-08-31 15:19:59 +00:00
Simon Pilgrim	ad020c0af1	Fix shadow variable warning. NFCI. llvm-svn: 370585	2019-08-31 15:01:03 +00:00
Simon Pilgrim	2d89007f61	[X86ISelLowering] combineCMov - cleanup CMOV->LEA codegen. NFCI. Only compute the diff once and we don't need the truncation code (assert the bitwidth is correct just to be safe). llvm-svn: 370583	2019-08-31 14:18:26 +00:00
Simon Pilgrim	7238353da2	[X86ISelLowering] LowerSELECT - remove duplicate value type. NFCI. VT of SELECT result and selection ops will be the same. llvm-svn: 370581	2019-08-31 13:14:52 +00:00
Thomas Lively	d0d9317061	[WebAssembly] Add SIMD QFMA/QFMS Summary: Adds clang builtins and LLVM intrinsics for these experimental instructions. They are not implemented in engines yet, but that is ok because the user must opt into using them by calling the builtins. Reviewers: aheejin, dschuff Reviewed By: aheejin Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D67020 llvm-svn: 370556	2019-08-31 00:12:29 +00:00
Reid Kleckner	185ddc08ee	Fix SEH_NoReturn machine verifier error llvm-svn: 370543	2019-08-30 22:40:51 +00:00
Reid Kleckner	657a06c619	[MC] Avoid crashes from improperly nested or wrong target .seh_handlerdata directives llvm-svn: 370540	2019-08-30 22:25:55 +00:00
Reid Kleckner	a33474d595	[X86] Print register names in .seh_* directives Also improve assembler parser register validation for .seh_ directives. This requires moving X86-specific seh directive handling into the x86 backend, which addresses some assembler FIXMEs. Differential Revision: https://reviews.llvm.org/D66625 llvm-svn: 370533	2019-08-30 21:23:05 +00:00
Reid Kleckner	0bb1630685	[Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn Users have complained llvm.trap produce two ud2 instructions on Win64, one for the trap, and one for unreachable. This change fixes that. TrapUnreachable was added and enabled for Win64 in r206684 (April 2014) to avoid poorly understood issues with the Windows unwinder. There seem to be two major things in play: - the unwinder - C++ EH, _CxxFrameHandler3 & co The unwinder disassembles forward from the return address to scan for epilogues. Inserting a ud2 had the effect of stopping the unwinder, and ensuring that it ran the EH personality function for the current frame. However, it's not clear what the unwinder does when the return address happens to be the last address of one function and the first address of the next function. The Visual C++ EH personality, _CxxFrameHandler3, needs to figure out what the current EH state number is. It does this by consulting the ip2state table, which maps from PC to state number. This seems to go wrong when the return address is the last PC of the function or catch funclet. I'm not sure precisely which system is involved here, but in order to address these real or hypothetical problems, I believe it is enough to insert int3 after a call site if it would otherwise be the last instruction in a function or funclet. I was able to reproduce some similar problems locally by arranging for a noreturn call to appear at the end of a catch block immediately before an unrelated function, and I confirmed that the problems go away when an extra trailing int3 instruction is added. MSVC inserts int3 after every noreturn function call, but I believe it's only necessary to do it if the call would be the last instruction. This change inserts a pseudo instruction that expands to int3 if it is in the last basic block of a function or funclet. I did what I could to run the Microsoft compiler EH tests, and the ones I was able to run showed no behavior difference before or after this change. Differential Revision: https://reviews.llvm.org/D66980 llvm-svn: 370525	2019-08-30 20:46:39 +00:00
Craig Topper	18e8d02e8c	[X86] Pass v32i16/v64i8 in zmm registers on KNL target. gcc and icc pass these types in zmm registers in zmm registers. This patch implements a quick hack to override the register type before calling convention handling to one that is legal. Longer term we might want to do something similar to 256-bit integer registers on AVX1 where we just split all the operations. Fixes PR42957 Differential Revision: https://reviews.llvm.org/D66708 llvm-svn: 370495	2019-08-30 17:35:08 +00:00
Evgeniy Stepanov	04647f5e22	MemTag: unchecked load/store optimization. Summary: MTE allows memory access to bypass tag check iff the address argument is [SP, #imm]. This change takes advantage of this to demote uses of tagged addresses to regular FrameIndex operands, reducing register pressure in large functions. MO_TAGGED target flag is used to signal that the FrameIndex operand refers to memory that might be tagged, and needs to be handled with care. Such operand must be lowered to [SP, #imm] directly, without a scratch register. The transformation pass attempts to predict when the offset will be out of range and disable the optimization. AArch64RegisterInfo::eliminateFrameIndex has an escape hatch in case this prediction has been wrong, but it is quite inefficient and should be avoided. Reviewers: pcc, vitalybuka, ostannard Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66457 llvm-svn: 370490	2019-08-30 17:23:02 +00:00
Craig Topper	66f03ba17d	[X86] Merge X86InstrInfo::loadRegFromAddr/storeRegToAddr into their only call site. I'm looking at unfolding broadcast loads on AVX512 which will require refactoring this code to select broadcast opcodes instead of regular load/stores in some cases. Merging them to avoid further complicating their interfaces. llvm-svn: 370484	2019-08-30 16:05:57 +00:00
Petar Avramovic	e96892a8aa	[MIPS GlobalISel] Lower uitofp Add custom lowering for G_UITOFP for MIPS32. Differential Revision: https://reviews.llvm.org/D66930 llvm-svn: 370432	2019-08-30 05:51:12 +00:00
Petar Avramovic	6412b56513	[MIPS GlobalISel] Lower fptoui Add lower for G_FPTOUI. Algorithm is similar to the SDAG version in TargetLowering::expandFP_TO_UINT. Lower G_FPTOUI for MIPS32. Differential Revision: https://reviews.llvm.org/D66929 llvm-svn: 370431	2019-08-30 05:44:02 +00:00
Jinsong Ji	a070f12e57	[PowerPC][NFC] Use inline Subtarget->isPPC64() To be consistent with all the other instances. llvm-svn: 370428	2019-08-30 03:16:41 +00:00
Fangrui Song	7704b54389	[PPC32] Emit R_PPC_GOT_TPREL16 instead R_PPC_GOT_TPREL16_LO Unlike ppc64, which has ADDISgotTprelHA+LDgotTprelL pairs, ppc32 just uses LDgotTprelL32, so it does not make lots of sense to use _LO without a paired _HA. Emit R_PPC_GOT_TPREL16 instead R_PPC_GOT_TPREL16_LO to match GCC, and get better linker relocation check. Note, R_PPC_GOT_TPREL16_{HA,LO} don't have good linker support: (a) lld does not support R_PPC_GOT_TPREL16_{HA,LO}. (b) Top of tree ld.bfd does not support R_PPC_GOT_REL16_HA Initial-Exec -> Local-Exec relaxation: // a.o addis 3, 3, tsd_tls@got@tprel@ha lwz 3, tsd_tls@got@tprel@l(3) add 3, 3, tsd_tls@tls // b.o .section .tdata,"awT"; .globl tsd_tls; tsd_tls: // ld/ld-new a.o b.o internal error, aborting at ../../bfd/elf32-ppc.c:7952 in ppc_elf_relocate_section Reviewed By: adalava Differential Revision: https://reviews.llvm.org/D66925 llvm-svn: 370426	2019-08-30 02:20:49 +00:00
Craig Topper	160ed4cab4	[X86] Explicitly list all the always trivially rematerializable instructions. Add a default with an llvm_unreachable for anything we don't expect. This seems safer that just blindly returning true for anything missing from the switch. llvm-svn: 370424	2019-08-30 00:54:36 +00:00
Dan Gohman	da84b688f9	[WebAssembly] Make __attribute__((used)) not imply export. Add an WASM_SYMBOL_NO_STRIP flag, so that __attribute__((used)) doesn't need to imply exporting. When targeting Emscripten, have WASM_SYMBOL_NO_STRIP imply exporting. Differential Revision: https://reviews.llvm.org/D62542 llvm-svn: 370415	2019-08-29 22:40:00 +00:00
Jinsong Ji	1ed7d2119e	[PowerPC] Support extended mnemonics mffprwz etc. Summary: Reported in https://github.com/opencv/opencv/issues/15413. We have serveral extended mnemonics for Move To/From Vector-Scalar Register Instructions eg: mffprd,mtfprd etc. We only support one of them, this patch add the others. Reviewers: nemanjai, steven.zhang, hfinkel, #powerpc Reviewed By: hfinkel Subscribers: wuzish, qcolombet, hiraditya, kbarton, MaskRay, shchenz, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66963 llvm-svn: 370411	2019-08-29 21:53:59 +00:00
Jessica Paquette	04e657be28	[AArch64][GlobalISel] Select arithmetic extended register patterns This teaches GISel to select patterns which fold an extend plus optional shift into the addressing mode. In particular, adds and subs. Factor out the arith extended register ComplexPatterns in AArch64InstrFormats.td and create GISel equivalents. Add some equivalent functions to the ones in AArch64ISelDAGToDAG: - `selectArithExtendedRegister` - `narrowExtendRegIfNeeded` - `getExtendTypeForInst` `getExtendTypeForInst` includes the checks for loads and stores. This will be used for WRO addressing modes in loads + stores. Teach selectCopy to properly handle subregister copies on the same bank in order to support `narrowExtendRegIfNeeded`. The extended register must be a GPR32, so we need to support same-bank subregister copies. Fix a bug in getSubRegForClass which would cause registers on things like GPR32common to end up getting ssub. Just change the check to look for FPR32 rather than GPR32. For tests: - Add select-arith-extended-reg.mir - Update addsub_ext.ll to include GlobalISel checks Differential Revision: https://reviews.llvm.org/D66835 llvm-svn: 370410	2019-08-29 21:53:58 +00:00
Reid Kleckner	5b79e603d3	[X86] Don't emit unreachable stack adjustments Summary: This is a minor improvement on our past attempts to do this. Fixes PR43155. Reviewers: hans Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66905 llvm-svn: 370409	2019-08-29 21:24:41 +00:00
Reid Kleckner	81e458d001	Allow '@' to appear in x86 mingw symbols Summary: There is no reason to differ in assembler behavior here between -msvc and -gnu targets. Without this setting, the text after the '@' is interpreted as a symbol variable, like foo@IMGREL. Reviewers: mstorsjo Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66974 llvm-svn: 370408	2019-08-29 21:15:02 +00:00
Reid Kleckner	fe47ed67fc	Fix the build for MSVC builds using M_PI llvm-svn: 370405	2019-08-29 20:32:53 +00:00
Simon Pilgrim	3d705a1fa4	[X86][SSE] combinePMULDQ - pmuldq(x, 0) -> zero vector (PR43159) ISD::isBuildVectorAllZeros permits undef elements to be present, which means we can't return it as a zero vector. PMULDQ/PMULUDQ is an extending multiply so a multiply by zero of the lower 32-bits should result in a zero 64-bit element. llvm-svn: 370404	2019-08-29 20:22:08 +00:00
Matt Arsenault	cbd1782c79	AMDGPU/GlobalISel: Legalize sin/cos llvm-svn: 370402	2019-08-29 20:06:48 +00:00
Craig Topper	5a43fdd313	[X86] Remove what little support we had for MPX -Deprecate -mmpx and -mno-mpx command line options -Remove CPUID detection of mpx for -march=native -Remove MPX from all CPUs -Remove MPX preprocessor define I've left the "mpx" string in the backend so we don't fail on old IR, but its not connected to anything. gcc has also deprecated these command line options. https://www.phoronix.com/scan.php?page=news_item&px=GCC-Patch-To-Drop-MPX Differential Revision: https://reviews.llvm.org/D66669 llvm-svn: 370393	2019-08-29 18:09:02 +00:00
Matt Arsenault	caff0a88dd	GlobalISel: Add known bits to InstructionSelector AMDGPU uses this for some addressing mode selection patterns. The analysis run itself doesn't do anything so it seems easier to just always require this than adding a way to opt in. llvm-svn: 370388	2019-08-29 17:24:32 +00:00
Jessica Paquette	ba04f5fac1	[GlobalISel][AArch64] Select llvm.aarch64.stxr* intrinsics. Add a GISelPredicateCode to the stxr_* PatFrags in AArch64InstrAtomics.td. This allows us to select these intrinsics. Differential Revision: https://reviews.llvm.org/D65779 llvm-svn: 370382	2019-08-29 16:55:55 +00:00
Jessica Paquette	b8b23a1648	[GlobalISel][AArch64] Use a GISelPredicateCode to select llvm.aarch64.stlxr.* Remove manual selection code for this intrinsic and use a GISelPredicateCode instead. This allows us to fully select this intrinsic without any tricky custom C++ matching. Differential Revision: https://reviews.llvm.org/D65780 llvm-svn: 370380	2019-08-29 16:45:19 +00:00
Jessica Paquette	87720ac8c8	[AArch64][GlobalISel] Select @llvm.aarch64.ldxr.* intrinsics Same thing as D66897, but for ldxr.* instead. Add a GISelPredicateCode to the ldxr_* definitions, which allows us to import them. Add select-ldxr-intrin.mir, and update arm64-ldxr-stxr.ll. Differential Revision: https://reviews.llvm.org/D66898 llvm-svn: 370378	2019-08-29 16:33:01 +00:00
Jessica Paquette	c327daeea5	[AArch64][GlobalISel] Select @llvm.aarch64.ldaxr.* intrinsics Add a GISelPredicateCode to ldaxr_. This allows us to import the patterns for @llvm.aarch64.ldaxr., and thus select them. Add `isLoadStoreOfNumBytes` for the GISelPredicateCode, since each of these intrinsics involves the same check. Add select-ldaxr-intrin.mir, and update arm64-ldxr-stxr.ll. Differential Revision: https://reviews.llvm.org/D66897 llvm-svn: 370377	2019-08-29 16:16:38 +00:00
Simon Atanasyan	b23857c149	[mips] Inline emitStoreWithSymOffset and emitLoadWithSymOffset methods. NFC Both methods `MipsTargetStreamer::emitStoreWithSymOffset` and `MipsTargetStreamer::emitLoadWithSymOffset` are almost the same and differ argument names only. These methods are used in the single place so it's better to inline their code and remove original methods. llvm-svn: 370354	2019-08-29 13:19:50 +00:00
Simon Atanasyan	3464b91ef7	[mips] Fix expanding `lw/sw $reg1, symbol($reg2)` instruction When a "base" in the `lw/sw $reg1, symbol($reg2)` instruction is a register and generated code is position independent, backend does not add the "base" value to the symbol address. ``` lw $reg1, %got(symbol)($gp) lw/sw $reg1, 0($reg1) ``` This patch fixes the bug and adds the missed `addu` instruction by passing `BaseReg` into the `loadAndAddSymbolAddress` routine and handles the case when the `BaseReg` is the zero register to escape redundant `move reg, reg` instruction: ``` lw $reg1, %got(symbol)($gp) addu $reg1, $reg1, $reg2 lw/sw $reg1, 0($reg1) ``` Differential Revision: https://reviews.llvm.org/D66894 llvm-svn: 370353	2019-08-29 13:19:38 +00:00
David Green	942c2e3795	[ARM] MVE Masked loads and stores Masked loads and store fit naturally with MVE, the instructions being easily predicated. This adds lowering for the simple cases of masked loads and stores. It does not yet deal with widening/narrowing or pre/post inc. The llvm masked load intrinsic will accept a "passthru" value, dictating the values used for the zero masked lanes. In MVE the instructions write 0 to the zero predicated lanes, so we need to match a passthru that isn't 0 (or undef) with a select instruction to pull in the correct data after the load. We also need to do something with unaligned loads/stores. Currently this uses a similar method used in big endian, using an VLDRB.8 (and potentially a VREV in BE). This does mean that the predicate mask is converted from, for example, a v4i1 to a v16i1. The VLDR instructions are defined as using the first bit of the relevant mask lane, so this could potentially load different results if the predicate is little odd. As the input is a v4i1 however, I believe this is OK and all the bits required should be set in the predicate, making the VLDRB.8 load the same data. Differential Revision: https://reviews.llvm.org/D66534 llvm-svn: 370329	2019-08-29 10:54:35 +00:00
Roman Lebedev	cc7495a355	[X86][CodeGen][NFC] Delay `combineIncDecVector()` from DAGCombine to X86DAGToDAGISel Summary: We were previously doing it in DAGCombine. But we also want to do `sub %x, C` -> `add %x, (sub 0, C)` for vectors in DAGCombine. So if we had `sub %x, -1`, we'll transform it to `add %x, 1`, which `combineIncDecVector()` will immediately transform back into `sub %x, -1`, and here we go again... I've marked this as NFC since not a single test changes, but since that 'changes' DAGCombine, probably this isn't fully NFC. Reviewers: RKSimon, craig.topper, spatel Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62327 llvm-svn: 370327	2019-08-29 10:50:09 +00:00
Craig Topper	c96284002e	[X86] Remove isel patterns with X86VBroadcast+scalar_to_vector+load. The DAG should have these as X86VBroadcast+load. llvm-svn: 370299	2019-08-29 06:36:16 +00:00
Craig Topper	231e628d69	[X86] Remove some unneeded X86VBroadcast isel patterns that have larger than 128 bit input types. We should always be shrinking the input to 128 bits or smaller when the node is created. llvm-svn: 370296	2019-08-29 06:02:11 +00:00
Craig Topper	1ec5c204b8	[X86] Add a DAG combine to combine INSERTPS and VBROADCAST of a scalar load. Remove corresponding isel patterns. We had an isel pattern to perform this, but its better to do it in DAG combine as a simplification. This also fixes the lack of patterns for AVX512 targets. llvm-svn: 370294	2019-08-29 05:48:48 +00:00
Craig Topper	1aadf6f39f	[X86] Make inline assembly 'x' and 'v' constraints work for f128. Including a type legalizer fix to make bitcast operand promotion work correctly when getSoftenedFloat returns f128 instead of i128. Fixes PR43157 llvm-svn: 370293	2019-08-29 05:13:56 +00:00
Matt Arsenault	216d8ff60b	AMDGPU: Don't use frame virtual registers SGPR spills aren't really handled after SILowerSGPRSpills. In order to directly control what happens if the scavenger needs to spill, the scavenger needs to be used directly. There is an alternative to spilling in these contexts anyway since the frame register can be increment and restored. This does present another possible issue if spilling is needed for the unused carry out if an add is needed. I think this can be avoided by using a scalar add (although that clobbers SCC, which happens anyway). llvm-svn: 370281	2019-08-29 01:13:47 +00:00
Matt Arsenault	8ec5c10042	GlobalISel/TableGen: Handle setcc patterns This is a special case because one node maps to two different G_ instructions, and the operand order is changed. This mostly enables G_FCMP for AMDPGPU. G_ICMP is still manually selected for now since it has the SALU and VALU complication to deal with. llvm-svn: 370280	2019-08-29 01:13:41 +00:00
Craig Topper	af364131af	[X86] Fix a couple isel patterns to not shrink a volatile load. Also add a FIXME because I'm not sure why these patterns exist. Looks like a missing combine. And another FIXME because the AVX512 equivalent one of the patterns is missing. llvm-svn: 370276	2019-08-28 23:45:10 +00:00
Shiva Chen	b39876d8cd	[RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall The patch fixed the issue that RV64 didn't clear the upper bits when return complex floating value with lp64 ABI. float _Complex complex_add(float _Complex a, float _Complex b) { return a + b; } RealResult = zero_extend(RealA + RealB) ImageResult = ImageA + ImageB Return (RealResult \| (ImageResult << 32)) The patch introduces shouldExtendTypeInLibCall target hook to suppress the AssertZext generation when lowering floating LibCall. Thanks to Eli's comments from the Bugzilla https://bugs.llvm.org/show_bug.cgi?id=42820 Differential Revision: https://reviews.llvm.org/D65497 llvm-svn: 370275	2019-08-28 23:40:37 +00:00
Heejin Ahn	d85fd5a3f4	[WebAssembly] Add atomic.fence instruction Summary: This adds `atomic.fence` instruction: https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md#fence-operator And we now emit the new `atomic.fence` instruction for multithread fences, rather than the prevous `atomic.rmw` hack. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, tlively, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66794 llvm-svn: 370272	2019-08-28 23:13:43 +00:00
Simon Atanasyan	027f1da010	[mips] Add an empty line to separate different patterns. NFC llvm-svn: 370269	2019-08-28 22:32:16 +00:00
Simon Atanasyan	59bb3609fa	[mips] Fix 64-bit address loading in case of applying 32-bit mask to the result If result of 64-bit address loading combines with 32-bit mask, LLVM tries to optimize the code and remove "redundant" loading of upper 32-bits of the address. It leads to incorrect code on MIPS64 targets. MIPS backend creates the following chain of commands to load 64-bit address in the `MipsTargetLowering::getAddrNonPICSym64` method: ``` (add (shl (add (shl (add %highest(sym), %higher(sym)), 16), %hi(sym)), 16), %lo(%sym)) ``` If the mask presents, LLVM decides to optimize the chain of commands. It really does not make sense to load upper 32-bits because the 0x0fffffff mask anyway clears them. After removing redundant commands we get this chain: ``` (add (shl (%hi(sym), 16), %lo(%sym)) ``` There is no patterns matched `(MipsHi (i64 symbol))`. Due a bug in `SYM_32` predicate definition, backend incorrectly selects a pattern for a 32-bit symbols and uses the `lui` instruction for loading `%hi(sym)`. As a result we get incorrect set of instructions with unnecessary 16-bit left shifting: ``` lui at,0x0 R_MIPS_HI16 foo dsll at,at,0x10 daddiu at,at,0 R_MIPS_LO16 foo ``` This patch resolves two problems: - Fix `SYM_32/SYM_64` predicates to prevent selection of patterns dedicated to 32-bit symbols in case of using N64 ABI. - Add missed patterns for 64-bit symbols for `%hi/%lo`. Fix PR42736. Differential Revision: https://reviews.llvm.org/D66228 llvm-svn: 370268	2019-08-28 22:32:10 +00:00
Scott Linder	04f6f25421	[AMDGPU] Fix bug when calculating user_spgr_count for Code Object V3 assembler Stop counting explicitly disabled user_spgr's in the user_sgpr_count field of the kernel descriptor. Differential Revision: https://reviews.llvm.org/D66900 llvm-svn: 370250	2019-08-28 19:38:15 +00:00
Jessica Paquette	af0bd41e06	[AArch64][GlobalISel] Fall back when translating musttail calls These are currently translated as normal functions calls in AArch64. Until we have proper tail call lowering, we shouldn't translate these. Differential Revision: https://reviews.llvm.org/D66842 llvm-svn: 370225	2019-08-28 16:19:01 +00:00
Ryan Taylor	3b1459ed7c	[AMDGPU] Adjust number of SGPRs available in Calling Convention This reduces the number of SGPRs due to some concerns about running out of SGPRs if you make all the SGPRs that aren't reserved available for the calling convention. Change-Id: Idb4ca4dc72f5b6808cb524ff7270915a8de5b4c1 llvm-svn: 370215	2019-08-28 15:00:45 +00:00
Hans Wennborg	cff90f07cb	[SelectionDAG] Don't generate libcalls for wide shifts on Windows (PR42711) Neither libgcc or compiler-rt are usually used on Windows, so these functions can't be called. Differential revision: https://reviews.llvm.org/D66880 llvm-svn: 370204	2019-08-28 13:55:10 +00:00
Simon Atanasyan	f46ba4f077	[mips] Use less registers to load address of TargetExternalSymbol There is no pattern matched `add hi, (MipsLo texternalsym)`. As a result, loading an address of 32-bit symbol requires two registers and one more additional instruction: ``` addiu $1, $zero, %lo(foo) lui $2, %hi(foo) addu $25, $2, $1 ``` This patch adds the missed pattern and enables generation more effective set of instructions: ``` lui $1, %hi(foo) addiu $25, $1, %lo(foo) ``` Differential Revision: https://reviews.llvm.org/D66771 llvm-svn: 370196	2019-08-28 12:35:53 +00:00
Amaury Sechet	4f4387dd12	[TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicit check for both option, but other places do not when they would benefit from doing it. This patches refactor the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66804 llvm-svn: 370190	2019-08-28 12:00:06 +00:00
David Green	379f6186dd	[ARM] Move MVEVPTBlockPass to a separate file. NFC This just pulls the MVEVPTBlockPass into a separate file, as opposed to being wrapped up in Thumb2ITBlockPass. Differential revision: https://reviews.llvm.org/D66579 llvm-svn: 370187	2019-08-28 11:37:31 +00:00
David Green	1c5b143c99	[MVE] VMOVX patterns This adds fp16 VMOVX patterns, using the same patterns as rL362482 with some adjustments for MVE. It allows us to move fp16 registers without going into and out of gprs. VMOVX is able to move the top bits from a fp16 in a fp reg into the bottom bits of another register, zeroing the rest. This can be used for odd MVE register lanes. The top bits are not read by fp16 instructions, so no move is required there if we are dealing with even lanes. Differential revision: https://reviews.llvm.org/D66793 llvm-svn: 370184	2019-08-28 10:13:23 +00:00
Sam Parker	a761ba0f2d	[ARM][ParallelDSP] Change search for muls rL369567 reverted a couple of recent changes made to ARMParallelDSP because of a miscompilation error: PR43073. The issue stemmed from an underlying bug that was caused by adding muls into a reduction before it was proved that they could be executed in parallel with another mul. Most of the changes here are from the previously reverted commits. The additional changes have been made area: 1) The Search function now doesn't insert any muls into the Reduction object. That now happens once the search has successfully finished. 2) For any muls added into the reduction but that weren't paired, we accumulate their values as an input into the smlad. Differential Revision: https://reviews.llvm.org/D66660 llvm-svn: 370171	2019-08-28 08:51:13 +00:00
Matt Arsenault	a8bbcbd006	AMDGPU/GlobalISel: Fix constraining scalar and/or/xor If the result register already had a register class assigned, the sources may not have been properly constrained. llvm-svn: 370150	2019-08-28 02:11:03 +00:00
Vlad Tsyrklevich	57076d3199	Revert "Change the X86 datalayout to add three address spaces for 32 bit signed," This reverts commit r370083 because it caused check-lld failures on sanitizer-x86_64-linux-fast. llvm-svn: 370142	2019-08-28 01:08:54 +00:00
Matt Arsenault	5c7e96dc26	AMDGPU/GlobalISel: Implement addrspacecast for 32-bit constant addrspace llvm-svn: 370140	2019-08-28 00:58:24 +00:00
Luis Marques	c894c6c983	[RISCV] Implement RISCVRegisterInfo::getPointerRegClass Fixes bug 43041 Differential Revision: https://reviews.llvm.org/D66752 llvm-svn: 370113	2019-08-27 21:37:57 +00:00
Amara Emerson	e20b91c265	[GlobalISel] Replace hard coded dynamic alloca handling with G_DYN_STACKALLOC. This change moves the actual stack pointer manipulation into the legalizer, available to targets via lower(). The codegen is slightly different because we're using explicit masks instead of G_PTRMASK, and using G_SUB rather than adding a negative amount via G_GEP. Differential Revision: https://reviews.llvm.org/D66678 llvm-svn: 370104	2019-08-27 19:54:27 +00:00
Matt Arsenault	ff07631b48	AMDGPU: Add amdgpu-32bit-address-high-bits to MIR serialization llvm-svn: 370089	2019-08-27 18:18:38 +00:00
Matt Arsenault	0c096da02f	AMDGPU: Fix crash from inconsistent register types for v3i16/v3f16 This is something of a workaround since computeRegisterProperties seems to be doing the wrong thing. llvm-svn: 370086	2019-08-27 17:51:56 +00:00
Amy Huang	1299945b81	Change the X86 datalayout to add three address spaces for 32 bit signed, 32 bit unsigned, and 64 bit pointers. llvm-svn: 370083	2019-08-27 17:46:53 +00:00
Craig Topper	fc1f08c2f2	[X86] Remove encoding information from the TAILJMP instructions that are lowered by MCInstLowering. Fix LowerPATCHABLE_TAIL_CALL to also convert them to regular JMP/JCC instructions There are 5 instructions here that are converted from TAILJMP opcodes to regular JMP/JCC opcodes during MCInstLowering. So normally there encoding information isn't used. The exception being when XRay wraps them in PATCHABLE_TAIL_CALL. For the ones that weren't already handled in MCInstLowering, add handling for those and remove their encoding information. This patch fixes PATCHABLE_TAIL_CALL to do the same opcode conversion as the regular lowering patch. Then removes the encoding information. Differential Revision: https://reviews.llvm.org/D66561 llvm-svn: 370079	2019-08-27 17:24:23 +00:00
Petar Avramovic	4a2a653288	[MIPS GlobalISel] ClampScalar G_SHL, G_ASHR and G_LSHR ClampScalar G_SHL, G_ASHR and G_LSHR to s32 for MIPS32. Differential Revision: https://reviews.llvm.org/D66533 llvm-svn: 370067	2019-08-27 14:41:44 +00:00
Simon Pilgrim	8912e2af39	[X86][AVX] Add SimplifyDemandedVectorElts support for KSHIFTL/KSHIFTR Differential Revision: https://reviews.llvm.org/D66527 llvm-svn: 370055	2019-08-27 13:13:17 +00:00
Tim Northover	a7f226f9db	AArch64: avoid creating cycle in DAG for post-increment NEON ops. Inserting a value into Visited has the effect of terminating a search for predecessors if that node is seen. This is legitimate for the base address, and acts as a slight performance optimization, but the vector-building node can be paert of a legitimate cycle so we shouldn't stop searching there. PR43056. llvm-svn: 370036	2019-08-27 10:21:11 +00:00
Pengfei Wang	564fb58a32	[WinEH] Allocate space in funclets stack to save XMM CSRs Summary: This is an alternate approach to D63396 Currently funclets reuse the same stack slots that are used in the parent function for saving callee-saved xmm registers. If the parent function modifies a callee-saved xmm register before an excpetion is thrown, the catch handler will overwrite the original saved value. This patch allocates space in funclets stack for saving callee-saved xmm registers and uses RSP instead RBP to access memory. Signed-off-by: Pengfei Wang <pengfei.wang@intel.com> Reviewers: rnk, RKSimon, craig.topper, annita.zhang, LuoYuanke, andrew.w.kaylor Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66596 Signed-off-by: Pengfei Wang <pengfei.wang@intel.com> llvm-svn: 370005	2019-08-27 01:53:24 +00:00
Matt Arsenault	0a6564980b	AMDGPU: Combine directly on mul24 intrinsics The problem these are supposed to work around can occur before the intrinsics are lowered into the nodes. Try to directly simplify them so they are matched before the bit assert operations can be optimized out. llvm-svn: 369994	2019-08-27 00:18:09 +00:00
Matt Arsenault	3b95986a32	AMDGPU: Run AMDGPUCodeGenPrepare after scalar opts The mul24 matching could interfere with SLSR and the other addressing mode related passes. This probably is not the optimal placement, but is an intermediate step. This should probably be moved after all the generic IR passes, particularly LSR. Moving this after LSR seems to help in some cases, and hurts others. As-is in this patch, in idiv-licm, it saves 1-2 instructions inside some of the loop bodies, but increases the number in others. Moving this later helps these loops. In the new lsr tests in mul24-pass-ordering, the intrinsic prevents introducing more instructions in the loop preheader, so moving this later ends up hurting them. This shouldn't be any worse than before the intrinsics were introduced in r366094, and LSR should probably be smarter. I think it's because it doesn't know the and inside the loop will be folded away. llvm-svn: 369991	2019-08-27 00:08:31 +00:00
Simon Atanasyan	d5918edf0d	[mips] Fix indentation. NFC llvm-svn: 369983	2019-08-26 22:40:34 +00:00
Simon Atanasyan	ac64924a55	[mips] clang-format the code. NFC llvm-svn: 369982	2019-08-26 22:40:28 +00:00
Craig Topper	6db7f492d9	[X86] Delay combineIncDecVector until after op legalization. Probably better to keep add over sub in early DAG combines. It might make sense to push this to lowering or delay it all the way to isel. But this was the simplest change. llvm-svn: 369981	2019-08-26 22:17:54 +00:00
Heejin Ahn	173a3a54bb	[WebAssembly] Fix SSA rebuilding in SjLj transformation Summary: Previously we skipped uses within the same BB as a def when rebuilding SSA after SjLj transformation. For example, before transformation, ``` for.cond: %0 = phi i32 [ %var, %for.inc ] ... %var = ... br label %for.inc for.inc: ; preds = %for.cond call i32 @setjmp(...) br %for.cond ``` In this BB, %var should be defined in all paths from %for.inc to make %0 valid. In the input it was true; %for.inc's only predecessor was %for.cond. But after SjLj transformation, it is possible that %for.inc has other predecessors that are reachable without reaching %for.cond. ``` entry.split: ... br i1 %a, label %bb.1, label %for.inc for.cond: %0 = phi i32 [ %var, %for.inc ] ... ; Not valid! %var = ... br label %for.inc for.inc: ; preds = %for.cond, %entry.split call i32 @setjmp(...) ... br %for.cond ``` In this case, we can't use %var in the `phi` instruction in %for.cond, because %var is not defined in all paths through %for.inc (If the control flow is %entry -> %entry.split -> %for.inc -> %for.cond, %var has not been defined until we reach the `phi`). But the previous code excluded users within the same BB, skipping instructions within the same BB so they are not rewritten properly. User instructions within the same BB also should be candidates for rewriting if they are _before_ the original definition. Fixes PR43097. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66729 llvm-svn: 369978	2019-08-26 21:51:35 +00:00
Roland Froese	18db4e9ae1	Recommit [PowerPC] Update P9 vector costs for insert/extract Now that the v1i128 smin regression has been fixed, recommit the P9 cost updates from D60160. llvm-svn: 369952	2019-08-26 19:26:08 +00:00
Krzysztof Parzyszek	9e0feaf562	[Hexagon] Improve generated code for test-if-bit-clear llvm-svn: 369947	2019-08-26 19:08:08 +00:00
Craig Topper	36d1588f01	[X86] Add a hack to combinePMULDQ to manually turn SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG inputs into an ANY_EXTEND_VECTOR_INREG style shuffle ANY_EXTEND_VECTOR_INREG isn't currently marked Legal which prevents SimplifyDemandedBits from turning SIGN/ZERO_EXTEND_VECTOR_INREG into it after op legalization. And even if we did make it Legal, combineExtInVec doesn't do shuffle combining on the VECTOR_INREG nodes until AVX1. This patch adds a quick hack to combinePMULDQ to directly emit a vector shuffle corresponding to an ANY_EXTEND_VECTOR_INREG operation. This avoids both of those issues without creating any other regressions on our tests. The xop-ifma.ll change here also showed up when I tried to resurrect D56306 and seemed to be the only improvement that patch creates now. This is a more direct way to get the benefit. Differential Revision: https://reviews.llvm.org/D66436 llvm-svn: 369942	2019-08-26 18:23:26 +00:00
Craig Topper	b8b90ac1c5	[X86][DAGCombiner] Teach narrowShuffle to use concat_vectors instead of inserting into undef Summary: Concat_vectors is more canonical during early DAG combine. For example, its what's used by SelectionDAGBuilder when converting IR shuffles into SelectionDAG shuffles when element counts between inputs and mask don't match. We also have combines in DAGCombiner than can pull concat_vectors through a shuffle. See partitionShuffleOfConcats. So it seems like concat_vectors is a better operation to use here. I had to teach DAGCombiner's SimplifyVBinOp to also handle concat_vectors with undef. I haven't checked yet if we can remove the INSERT_SUBVECTOR version in there or not. I didn't want to mess with the other caller of getShuffleHalfVectors that's used during shuffle lowering where insert_subvector probably is what we want to produce so I've enabled this via a boolean passed to the function. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66504 llvm-svn: 369872	2019-08-25 17:59:49 +00:00
Xing Xue	ef039a3ccd	[PowerPC][AIX] Adds support for writing the .data section in assembly files Summary: Adds support for generating the .data section in assembly files for global variables with a non-zero initialization. The support for writing the .data section in XCOFF object files will be added in a follow-on patch. Any relocations are not included in this patch. Reviewers: hubert.reinterpretcast, sfertile, jasonliu, daltenty, Xiangling_L Reviewed by: hubert.reinterpretcast Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, wuzish, shchenz, DiggerLin, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66154 llvm-svn: 369869	2019-08-25 15:17:25 +00:00
Benjamin Kramer	55e8c91dd5	[AMDGPU] Downgrade from StringLiteral to const char* in an attempt to make GCC 5 happy llvm-svn: 369867	2019-08-25 12:47:31 +00:00
Craig Topper	1abe162a9a	[X86] Teach -Os immediate sharing code to not count constant uses that will become INC/DEC. INC/DEC don't use an immediate so we don't need to count it. We also shouldn't use the custom isel for it. Fixes PR42998. llvm-svn: 369863	2019-08-25 05:22:40 +00:00
Craig Topper	cc4b0596b1	[X86] Add isel patterns to match vpdpwssd avx512vnni instruction from add+pmaddwd nodes. llvm-svn: 369859	2019-08-24 23:14:57 +00:00
Matt Arsenault	c6ab2b4fed	AMDGPU: Preserve value name when inserting mul24 intrinsic llvm-svn: 369857	2019-08-24 22:17:10 +00:00
Matt Arsenault	b3dd381a73	AMDGPU: Introduce a flag to disable mul24 intrinsic formation llvm-svn: 369856	2019-08-24 22:14:41 +00:00
Benjamin Kramer	7043477042	Fix some accidental global initializers by using StringLiteral instead of StringRef llvm-svn: 369850	2019-08-24 15:24:25 +00:00
Benjamin Kramer	16b322914a	Use a bit of relaxed constexpr to make FeatureBitset costant intializable This requires std::intializer_list to be a literal type, which it is starting with C++14. The downside is that std::bitset is still not constexpr-friendly so this change contains a re-implementation of most of it. Shrinks clang by ~60k. llvm-svn: 369847	2019-08-24 15:02:44 +00:00
Craig Topper	dd2cf78381	[X86] Add an assert to mark more code that needs to be removed when the vector widening legalization switch is removed again. llvm-svn: 369837	2019-08-24 05:59:46 +00:00
Guillaume Chatelet	0b6563e8a2	[LLVM][NFC] Removing unused functions Summary: Removes a not so useful function from DataLayout and cleans up Support/MathExtras.h Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66691 llvm-svn: 369824	2019-08-23 23:19:25 +00:00
Stanislav Mekhanoshin	b37d6a750a	[AMDGPU] Check for immediate SrcC in mfma in AsmParser Differential Revision: https://reviews.llvm.org/D66674 llvm-svn: 369819	2019-08-23 22:22:49 +00:00
Stanislav Mekhanoshin	e6e1c4eac0	[AMDGPU] w/a for gfx908 mfma SrcC literal HW bug gfx908 ignores an mfma if SrcC is a literal. Differential Revision: https://reviews.llvm.org/D66670 llvm-svn: 369818	2019-08-23 22:22:29 +00:00
Stanislav Mekhanoshin	8fe1245a0f	[AMDGPU] w/a for gfx908 mfma SrcC literal HW bug gfx908 ignores an mfma if SrcC is a literal. Differential Revision: https://reviews.llvm.org/D66670 llvm-svn: 369816	2019-08-23 22:09:58 +00:00
Jessica Paquette	83fe56b3b9	[AArch64][GlobalISel] Import XRO load/store patterns instead of custom selection Instead of using custom C++ in `earlySelect` for loads and stores, just import the patterns. Remove `earlySelectLoad`, since we can just import the work it's doing. Some minor changes to how `ComplexRendererFns` are returned for the XRO addressing modes. If you add immediates in two steps, sometimes they are not imported properly and you only end up with one immediate. I'm not sure if this is intentional. - Update load-addressing-modes.mir to include the instructions we can now import. - Add a similar test, store-addressing-modes.mir to show which store opcodes we currently import, and show that we can pull in shifts etc. - Update arm64-fastisel-gep-promote-before-add.ll to use FastISel instead of GISel. This test failed with GISel because GISel folds the gep into the load. The test checks that FastISel doesn't fold non-pointer-width adds into loads. GISel on the other hand, produces a G_CONSTANT of -128 for the add, and then a G_GEP, which must be pointer-width. Note that we don't get STRBRoX right now. It seems like the importer can't handle `FPR8Op:{ *:[Untyped] }:$Rt` source operands. So, those are not currently supported. Differential Revision: https://reviews.llvm.org/D66679 llvm-svn: 369806	2019-08-23 20:31:34 +00:00
Benjamin Kramer	dc5f805d31	Do a sweep of symbol internalization. NFC. llvm-svn: 369803	2019-08-23 19:59:23 +00:00
Craig Topper	bc173d4c51	[X86] Move a transform out of combineConcatVectorOps so we don't prematurely turn CONCAT_VECTORS into INSERT_SUBVECTORS. CONCAT_VECTORS and INSERT_SUBVECTORS can both call combineConcatVectorOps, but we shouldn't produce INSERT_SUBVECTORS from there. We should keep CONCAT_VECTORS until vector legalization. Noticed while looking at the madd_quad_reduction test from madd.ll llvm-svn: 369802	2019-08-23 19:52:24 +00:00
Roland Froese	b4051e57b1	[PowerPC] Expand v1i128 smin The smin opcode and friends for v1i128 are incorrectly marked as legal for PPC. Change them to expand. Differential Revision: https://reviews.llvm.org/D64960 llvm-svn: 369797	2019-08-23 19:04:47 +00:00
Craig Topper	bccd183217	[X86] Mark VPDPWSSD and VPDPWSSDS as commutable. Add stack folding tests. llvm-svn: 369792	2019-08-23 18:05:37 +00:00
Craig Topper	e7211bb567	[SelectionDAG][X86] Enable iX SimplifyDemandedBits to vXi1 SimplifyDemandedVectorElts simplification. Add a hack to X86 to avoid a regression Patch showing the effect of enabling bool vector oversimplification. Non-VLX builds can simplify a kshift shuffle, but VLX builds simplify: insert_subvector v8i zeroinitializer, v2i --> insert_subvector v8i undef, v2i Preventing the removal of the AND to clear the upper bits of result Differential Revision: https://reviews.llvm.org/D53022 llvm-svn: 369780	2019-08-23 17:14:58 +00:00
Simon Atanasyan	5f7d6ac7bf	[mips] Reduce number of instructions used for loading a global symbol's value Now `lw/sw $reg, sym+offset` pseudo instructions for global symbol `sym` are lowering into the following three instructions. ``` lw $reg, %got(symbol)($gp) addiu $reg, $reg, offset lw/sw $reg, 0($reg) ``` It's possible to reduce the number of instructions by taking the offset in account in the final `lw/sw` command. This patch implements that optimization. ``` lw $reg, %got(symbol)($gp) lw/sw $reg, offset($reg) ``` Differential Revision: https://reviews.llvm.org/D66553 llvm-svn: 369756	2019-08-23 13:36:24 +00:00
Simon Atanasyan	58492b1895	[mips] Do not include offset into `%got` expression for global symbols Now pseudo instruction `la $6, symbol+8($6)` is expanding into the following chain of commands: ``` lw $1, %got(symbol+8)($gp) addiu $1, $1, 8 addu $6, $1, $6 ``` This is incorrect. When a linker handles the `R_MIPS_GOT16` relocation, it does not expect to get any addend and breaks on assertion. Otherwise it has to create new GOT entry for each unique "sym + offset" pair. Offset for a global symbol should be added to result of loading GOT entry by a separate `add` command. The patch fixes the problem by stripping off an offset from the expression passed to the `%got`. That's interesting that even current code inserts a separate `add` command. Differential Revision: https://reviews.llvm.org/D66552 llvm-svn: 369755	2019-08-23 13:36:14 +00:00
Simon Pilgrim	c88408cf85	Use VT::getHalfNumVectorElementsVT helpers in a few places. NFCI. llvm-svn: 369751	2019-08-23 12:37:09 +00:00
Andrea Di Biagio	8e9af64da6	[X86][BtVer2] Add a read-advance to every implicit register use of CMPXCHG8B/16B. This is a follow up of r369642. This patch assigns a ReadAfterLd to every implicit register use of instruction CMPXCHG8B and instruction CMPXCHG16B. Perf micro-benchmarks show that implicit registers are read after 3cy from the start of execution. llvm-svn: 369750	2019-08-23 12:19:45 +00:00
Andrea Di Biagio	1630f64e2f	[X86][BtVer2] Fix latency of ALU RMW instructions. Excluding ADC/SBB and the bit-test instructions (BTR/BTS/BTC), the observed latency of all other RMW integer arithmetic/logic instructions is 6cy and not 5cy. Example (ADD): ``` addb $0, (%rsp) # Latency: 6cy addb $7, (%rsp) # Latency: 6cy addb %sil, (%rsp) # Latency: 6cy addw $0, (%rsp) # Latency: 6cy addw $511, (%rsp) # Latency: 6cy addw %si, (%rsp) # Latency: 6cy addl $0, (%rsp) # Latency: 6cy addl $511, (%rsp) # Latency: 6cy addl %esi, (%rsp) # Latency: 6cy addq $0, (%rsp) # Latency: 6cy addq $511, (%rsp) # Latency: 6cy addq %rsi, (%rsp) # Latency: 6cy ``` The same latency profile applies to SUB/AND/OR/XOR/INC/DEC. The observed latency of ADC/SBB is 7-8cy. So we need a different write to model those. Latency of BTS/BTR/BTC is not fixed by this patch (they are much slower than what the model for btver2 currently reports). Differential Revision: https://reviews.llvm.org/D66636 llvm-svn: 369748	2019-08-23 11:34:10 +00:00
Jay Foad	eac23862a8	[AMDGPU] gfx10 atomic optimizer changes. Summary: Add support for gfx10, where all DPP operations are confined to work within a single row of 16 lanes, and wave32. Reviewers: arsenm, sheredom, critson, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65644 llvm-svn: 369745	2019-08-23 10:07:43 +00:00
Craig Topper	4deb388bca	[X86] Make combineLoopSADPattern use CONCAT_VECTORS instead of INSERT_SUBVECTORS for widening with zeros. CONCAT_VECTORS is more canonical for the early DAG combine runs until we start getting into the op legalization phases. llvm-svn: 369734	2019-08-23 06:08:33 +00:00
Craig Topper	bdceb9fb14	[X86] Improve lowering of v2i32 SAD handling in combineLoopSADPattern. For v2i32 we only feed 2 i8 elements into the psadbw instructions with 0s in the other 14 bytes. The resulting psadbw instruction will produce zeros in bits [127:16] of the output. We need to take the result and feed it to a v2i32 add where the first element includes bits [15:0] of the sad result. The other element should be zero. Prior to this patch we were using a truncate to take 0 from bits 95:64 of the psadbw. This results in a pshufd to move those bits to 63:32. But since we also have zeroes in bits 63:32 of the psadbw output, we should just take those bits. The previous code probably worked better with promoting legalization, but now we use widening legalization. I've preserved the old behavior if -x86-experimental-vector-widening-legalization=false until we get that option removed. llvm-svn: 369733	2019-08-23 05:33:27 +00:00
Sam Clegg	90b6bb75e8	[MC] Minor cleanup to MCFixup::Kind handling. NFC. Prefer `MCFixupKind` where possible and add getTargetKind() to convert to `unsigned` when needed rather than scattering cast operators around the place. Differential Revision: https://reviews.llvm.org/D59890 llvm-svn: 369720	2019-08-23 01:00:55 +00:00
Peter Collingbourne	2452d7030b	IR. Change strip* family of functions to not look through aliases. I noticed another instance of the issue where references to aliases were being replaced with aliasees, this time in InstCombine. In the instance that I saw it turned out to be only a QoI issue (a symbol ended up being missing from the symbol table due to the last reference to the alias being removed, preventing HWASAN from symbolizing a global reference), but it could easily have manifested as incorrect behaviour. Since this is the third such issue encountered (previously: D65118, D65314) it seems to be time to address this common error/QoI issue once and for all and make the strip* family of functions not look through aliases. Includes a test for the specific issue that I saw, but no doubt there are other similar bugs fixed here. As with D65118 this has been tested to make sure that the optimization isn't load bearing. I built Clang, Chromium for Linux, Android and Windows as well as the test-suite and there were no size regressions. Differential Revision: https://reviews.llvm.org/D66606 llvm-svn: 369697	2019-08-22 19:56:14 +00:00
Francis Visoiu Mistrih	5b5ee61b5f	[MachO][TLOF] Use hasLocalLinkage to determine if indirect symbol is local Local symbols in the indirect symbol table contain the value `INDIRECT_SYMBOL_LOCAL` and the corresponding __pointers entry must contain the address of the target. In r349060, I added support for local symbols in the indirect symbol table, which was checking if the symbol `isDefined` && `!isExternal` to determine if the symbol is local or not. It turns out that `isDefined` will return false if the user of the symbol comes before its definition, and we'll again generate .long 0 which will be the symbol at the adress 0x0. Instead of doing that, use GlobalValue::hasLocalLinkage() to check if the symbol is local. Differential Revision: https://reviews.llvm.org/D66563 llvm-svn: 369671	2019-08-22 16:59:00 +00:00
Craig Topper	898a0e9b84	[X86] Remove MCInstLower code that drops operands from some CALL and TAILJMP instructions. Add asserts to verify operand count It appears the FIXME here was handled at some point. r159728 from 2012 seems to be at least aportion of fixing it. Differential Revision: https://reviews.llvm.org/D66570 llvm-svn: 369665	2019-08-22 16:23:35 +00:00
Andrea Di Biagio	c9649eb9da	[X86][BtVer2] Fix latency/throughput of scalar integer MUL instructions. Single operand MUL instructions that implicitly set EAX have the following latency/throughput profile (see below): imul %cl # latency: 3cy - uOPs: 1 - 1 JMul imul %cx # latency: 3cy - uOPs: 3 - 3 JMul imul %ecx # latency: 3cy - uOPs: 2 - 2 JMul imul %rcx # latency: 6cy - uOPs: 2 - 4 JMul mul %cl # latency: 3cy - uOPs: 1 - 1 JMul mul %cx # latency: 3cy - uOPs: 3 - 3 JMul mul %ecx # latency: 3cy - uOPs: 2 - 2 JMul mul %rcx # latency: 6cy - uOPs: 2 - 4 JMul Excluding the 64bit variant, which has a latency of 6cy, every other instruction has a latency of 3cy. However, the number of decoded macro-opcodes (as well as the resource cyles) depend on the MUL size. The two operand MULs have a more predictable profile (see below): imul %dx, %dx # latency: 3cy - uOPs: 1 - 1 JMul imul %edx, %edx # latency: 3cy - uOPs: 1 - 1 JMul imul %rdx, %rdx # latency: 6cy - uOPs: 1 - 4 JMul imul $3, %dx, %dx # latency: 4cy - uOPs: 2 - 2 JMul imul $3, %ecx, %ecx # latency: 3cy - uOPs: 1 - 1 JMul imul $3, %rdx, %rdx # latency: 6cy - uOPs: 1 - 4 JMul This patch updates the values in the Jaguar scheduling model and regenerates llvm-mca tests. Differential Revision: https://reviews.llvm.org/D66547 llvm-svn: 369661	2019-08-22 15:20:16 +00:00
Sean Fertile	5f85a7b1cf	[PowerPC] Add combined ELF ABI and 32/64 bit queries to the subtarget. [NFC] A lot of places in the code combine checks for both ABI (SVR4/Darwin/AIX) and addressing mode (64-bit vs 32-bit). In an attempt to make some of the code more readable I've added a couple functions that combine checking for the ELF abi and 64-bit/32-bit code at once. As we add more AIX support I intend to add similar functions for the AIX ABI. Differential Revision: https://reviews.llvm.org/D65814 llvm-svn: 369658	2019-08-22 15:11:28 +00:00
Sean Fertile	18fd1b0b49	[PowerPC][XCOFF][MC] Explicitly set containing csect on symbols. [NFC] Previously we would get the csect a symbol was contained in through its fragment. This works only if we are writing an object file, and only for defined symbols. To fix this we set the contating csect explicitly on the MCSymbolXCOFF object. Differential Revision: https://reviews.llvm.org/D66032 llvm-svn: 369657	2019-08-22 15:11:23 +00:00
Andrea Di Biagio	c6744055ad	[X86][BtVer2] Fix latency and throughput of XCHG and XADD. On Jaguar, XCHG has a latency of 1cy and decodes to 2 macro-opcodes. Maximum throughput for XCHG is 1 IPC. The byte exchange has worse latency and decodes to 1 extra uOP; maximum observed throughput is 0.5 IPC. ``` xchgb %cl, %dl # Latency: 2cy - uOPs: 3 - 2 ALU xchgw %cx, %dx # Latency: 1cy - uOPs: 2 - 2 ALU xchgl %ecx, %edx # Latency: 1cy - uOPs: 2 - 2 ALU xchgq %rcx, %rdx # Latency: 1cy - uOPs: 2 - 2 ALU ``` The reg-mem forms of XCHG are atomic operations with an observed latency of 16cy. The resource usage is similar to the XCHGrr variants. The biggest difference is obviously the bus-locking, which prevents the LS to issue other memory uOPs in parallel until the unlocking store uOP is executed. ``` xchgb %cl, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy xchgw %cx, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy xchgl %ecx, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy xchgq %rcx, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy ``` The exchanged in/out register operand becomes available after 11cy from the start of execution. Added test xchg.s to verify that we correctly see that register write committed in 11cy (and not 16cy). Reg-reg XADD instructions have the same latency/throughput than the byte exchange (register-register variant). ``` xaddb %cl, %dl # latency: 2cy - uOPs: 3 - 3 ALU xaddw %cx, %dx # latency: 2cy - uOPs: 3 - 3 ALU xaddl %ecx, %edx # latency: 2cy - uOPs: 3 - 3 ALU xaddq %rcx, %rdx # latency: 2cy - uOPs: 3 - 3 ALU ``` The non-atomic RM variants have a latency of 11cy, and decode to 4 macro-opcodes. They still consume 2 ALU pipes, and the exchange in/out register operand becomes available in 3cy (it matches the 'load-to-use latency'). ``` xaddb %cl, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU xaddw %cx, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU xaddl %ecx, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU xaddq %rcx, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU ``` The atomic XADD variants execute in 16cy. The in/out register operand is available after 11cy from the start of execution. ``` lock xaddb %cl, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy lock xaddw %cx, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy lock xaddl %ecx, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy lock xaddq %rcx, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy ``` Added test xadd.s to verify those latencies as well as read-advance values. Differential Revision: https://reviews.llvm.org/D66535 llvm-svn: 369642	2019-08-22 11:32:47 +00:00
Simon Pilgrim	6dd51c2f19	[MVT] Add MVT equivalent to EVT::getHalfNumVectorElementsVT() helper. NFCI. Allows for some cleanup in a lot of SSE/AVX vector splitting code llvm-svn: 369640	2019-08-22 11:14:30 +00:00
Sam Tebbs	a69d9d6156	Reapply: [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32 The CodeGen/Thumb2/mve-vaddv.ll test needed to be amended to reflect the changes from the above patch. This reverts commit `cd53ff6`, reapplying `7c6b229`. llvm-svn: 369638	2019-08-22 10:29:20 +00:00
Hans Wennborg	cd53ff6c0d	Revert r369626 "[ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32" It broke the bots, see e.g. http://lab.llvm.org:8011/builders/clang-cuda-build/builds/36275/ > This patch fixes shifts by a 128/256 bit shift amount. It also fixes > codegen for shifts of 32 by delegating to LLVM's default optimisation > instead of emitting a long shift. > > Tests that used to generate long shifts of 32 are updated to check for the > more optimised codegen. > > Differential revision: https://reviews.llvm.org/D66519 > > llvm-svn: 369626 llvm-svn: 369636	2019-08-22 09:16:53 +00:00
Craig Topper	d420616313	[X86] Lower the cost of v2i32->v2f64 sint_to_fp under vector widening legalization. I don't really understand the costs we're using for fp_to_sint, but prior to widening legalization we used 20 as the cost for this via the v2i64->v2f64 entry. That number seems better than the 40 we got with widening legalization. So now we need either a v2i32->v2f64 entry or a v4i32->v2f64 entry depending on whether AVX is enabled or not since we skip the first SSE2 table look up under AVX. llvm-svn: 369628	2019-08-22 08:18:45 +00:00
Sam Tebbs	7c6b229204	[ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32 This patch fixes shifts by a 128/256 bit shift amount. It also fixes codegen for shifts of 32 by delegating to LLVM's default optimisation instead of emitting a long shift. Tests that used to generate long shifts of 32 are updated to check for the more optimised codegen. Differential revision: https://reviews.llvm.org/D66519 llvm-svn: 369626	2019-08-22 08:12:06 +00:00
Shiva Chen	72a41e7b0d	[TargetLowering] Remove optional arguments passing to makeLibCall The patch introduces MakeLibCallOptions struct as suggested by @efriedma on D65497. The struct contain argument flags which will pass to makeLibCall function. The patch should not has any functionality changes. Differential Revision: https://reviews.llvm.org/D65795 llvm-svn: 369622	2019-08-22 04:59:43 +00:00
Pengfei Wang	7630e24492	[X86] Making X86OptimizeLEAs pass public. NFC Reviewers: wxiao3, LuoYuanke, andrew.w.kaylor, craig.topper, annita.zhang, liutianle, pengfei, xiangzhangllvm, RKSimon, spatel, andreadb Reviewed By: RKSimon Subscribers: andreadb, hiraditya, llvm-commits Tags: #llvm Patch by Gen Pei (gpei) Differential Revision: https://reviews.llvm.org/D65933 llvm-svn: 369612	2019-08-22 02:29:27 +00:00
Craig Topper	78e6507b0a	[X86] Correct the scheduler classes for TAILJMP and TCRETURN CodeGenOnly instructions. We had an odd combination of WriteJump applied to some memory instructions and WriteJumpLd applied to register and immediate instructions. Thsi should hopefully assign them all correctly. llvm-svn: 369599	2019-08-21 23:17:52 +00:00
Craig Topper	303bbc3be2	[X86] Replace a couple hardcoded '5's with X86::AddrNumOperands for readability. NFC llvm-svn: 369598	2019-08-21 22:40:07 +00:00
Luis Marques	f7cdff4ffd	[RISCV] Remove fix introduced by r369573, superseded by r369580 llvm-svn: 369590	2019-08-21 22:02:56 +00:00
Luis Marques	4f488b594a	[RISCV] Fix use of side-effects in asserts in decoder functions llvm-svn: 369580	2019-08-21 21:11:37 +00:00
Richard Smith	b73cd33625	Fix -Werror=unused-variable error after r369528. llvm-svn: 369573	2019-08-21 20:42:37 +00:00
Nico Weber	ed18e70c86	Revert r367389 (and follow-up r368404); it caused PR43073. llvm-svn: 369567	2019-08-21 19:53:42 +00:00
Sam Clegg	dde8a25a4b	[WebAssembly] Handle aliases in WebAssemblyFixFunctionBitcasts Fixes: https://github.com/emscripten-core/emscripten/issues/8770 Differential Revision: https://reviews.llvm.org/D66508 llvm-svn: 369566	2019-08-21 19:52:33 +00:00
Craig Topper	3f59bfd5be	[MVT] Add v16f16 and v32f16 vectors. I might look at improving PR43065 which will require being able to mark a 256 and 512 bit vector of f16 as Legal. Differential Revision: https://reviews.llvm.org/D66515 llvm-svn: 369565	2019-08-21 19:14:48 +00:00
Simon Atanasyan	159f621c5c	[mips] Replace call `expandLoadAddress` by `loadAndAddSymbolAddress`. NFC In case of expanding `lw/sw $reg, symbol($reg)` instruction for PIC it's enough to call the `loadAndAddSymbolAddress` method. Additional work performed by the `expandLoadAddress` is not required here. llvm-svn: 369563	2019-08-21 18:54:51 +00:00
Simon Atanasyan	bb2f857247	[mips] Remove duplicated case from the `StringSwitch`. NFC llvm-svn: 369562	2019-08-21 18:54:41 +00:00
Matt Arsenault	954a012b4c	GlobalISel: Implement moreElementsVector for G_UNMERGE_VALUES sources This is necessary for handling <3 x s16> on AMDGPU, assuming this should be handled as 2 separate legalization actions. The alternative would be for fewerElementsVector to handle 3->2. llvm-svn: 369547	2019-08-21 16:59:10 +00:00
David Green	717feabdf0	[ARM] Formatting for ARMInstrMVE.td. NFC This is just some formatting cleanup, prior to the masked load and store patch in D66534. llvm-svn: 369545	2019-08-21 16:20:35 +00:00
Alexander Timofeev	78347c979e	[AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions Differential Revision: https://reviews.llvm.org/D63731 Reviewers: qcolombet, rampitec llvm-svn: 369532	2019-08-21 15:15:04 +00:00
Guillaume Chatelet	1c18a9cb9e	[LLVM][Alignment] Introduce Alignment In MachineFrameInfo Summary: This is patch is part of a serie to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: jfb Subscribers: hiraditya, dexonsmith, llvm-commits, courbet Tags: #llvm Differential Revision: https://reviews.llvm.org/D65800 llvm-svn: 369531	2019-08-21 14:29:30 +00:00
Luis Marques	c3bf3d14ea	[RISCV] Add support for RVC HINT instructions The hint instructions are enabled by default (if the standard C extension is enabled). To disable them pass -mattr=-rvc-hints. Differential Revision: https://reviews.llvm.org/D62592 llvm-svn: 369528	2019-08-21 14:00:58 +00:00
Petar Avramovic	7f581df649	[MIPS GlobalISel] NarrowScalar G_ZEXTLOAD and G_SEXTLOAD NarrowScalar G_ZEXTLOAD and G_SEXTLOAD to s32 for MIPS32. Differential Revision: https://reviews.llvm.org/D66205 llvm-svn: 369512	2019-08-21 09:43:20 +00:00
Petar Avramovic	e406aa791c	[MIPS GlobalISel] NarrowScalar G_ZEXT and G_SEXT NarrowScalar G_ZEXT and G_SEXT to s32 for MIPS32. Differential Revision: https://reviews.llvm.org/D66204 llvm-svn: 369511	2019-08-21 09:35:02 +00:00
Petar Avramovic	61bf2675b9	[MIPS GlobalISel] Consider type1 when legalizing shifts after r351882 r351882 allows different type for shift amount then result and value being shifted. Fix MIPS Legalizer rules to take r351882 into account. Differential Revision: https://reviews.llvm.org/D66203 llvm-svn: 369510	2019-08-21 09:31:29 +00:00
Petar Avramovic	5b4c5c2c54	[MIPS GlobalISel] NarrowScalar G_TRUNC Add NarrowScalar for G_TRUNC when NarrowTy is half the size of source. NarrowScalar G_TRUNC to s32 for MIPS32. Differential Revision: https://reviews.llvm.org/D66202 llvm-svn: 369509	2019-08-21 09:26:39 +00:00
Luke Cheeseman	71d38b3c62	[AArch64] Update MTE system register encodings The encodings for the system registers TFSRE0_EL1, TFSR_EL1 TFSR_EL2, TFSR_EL3 and TFSR_EL12 have been changed so that they consistently have CRn=5 and CRm=6 as per https://developer.arm.com/docs/ddi0487/latest. Differential Revision: https://reviews.llvm.org/D65442 llvm-svn: 369505	2019-08-21 09:09:56 +00:00
Amara Emerson	56606a4db3	[AArch64][GlobalISel] Add support for narrowScalar of G_ZEXT We do this by merging the source with the high bits set to 0. Differential Revision: https://reviews.llvm.org/D66181 llvm-svn: 369480	2019-08-21 00:12:37 +00:00
Daniel Sanders	a16bd4f9f2	[RISCV GlobalISel] Adding initial GlobalISel infrastructure Summary: Add an initial GlobalISel skeleton for RISCV. It can only run ir translator for `ret void`. Patch by Andrew Wei Reviewers: asb, sabuasal, apazos, lenary, simoncook, lewis-revill, edward-jones, rogfer01, xiangzhai, rovka, Petar.Avramovic, mgorny, dsanders Reviewed By: dsanders Subscribers: pzheng, s.egerton, dsanders, hiraditya, rbar, johnrusso, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, psnobl, benna, Jim, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65219 llvm-svn: 369467	2019-08-20 22:53:24 +00:00
Jessica Paquette	e6c299b983	[AArch64][GlobalISel] Select logical_imm32 and logical_imm64 patterns Add a GlobalISel equivalent for the logical_imm32_XFORM and logical_imm64_XFORM SDNodeXForms in AArch64InstrFormats.td. - Add select-logical-imm.mir, which contains tests for each imported pattern. - Update select-pr32733.mir and select-scalar-shift-imm.mir, since they now select instructions of this form. Differential Revision: https://reviews.llvm.org/D66162 llvm-svn: 369465	2019-08-20 22:31:25 +00:00
Jessica Paquette	9a95e79b1b	[AArch64][GlobalISel] Select patterns which use shifted register operands This adds GlobalISel equivalents for the following from AArch64InstrFormats: - arith_shifted_reg32 - arith_shifted_reg64 And partial support for - logical_shifted_reg32 - logical_shifted_reg32 The only thing missing for the logical cases is support for rotates. Other than the missing support, the transformation is identical for the arithmetic shifted register and the logical shifted register. Lots of tests here: - Add select-arith-shifted-reg.mir to show that we correctly select add and sub instructions which use this pattern. - Add select-logical-shifted-reg.mir to cover patterns which are not shared between the arithmetic and logical cases. - Update addsub-shifted.ll to show that we correctly fold shifts into adds/subs. - Update eon.ll to show that we can select the eon instruction by folding xors. Differential Revision: https://reviews.llvm.org/D66163 llvm-svn: 369460	2019-08-20 22:18:06 +00:00
Craig Topper	ba375263e8	[DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors (concat_vectors X, Y), undef)) -> (concat_vectors X, Y, undef, undef) I also had to add a new combine to X86's combineExtractSubvector to prevent a regression. This helps our vXi1 code see the full concat operation and allow it optimize undef to a zero if there is already a zero in the concat. This helped us use a movzx instead of an AND in some of the tests. In those tests, one concat comes from SelectionDAGBuilder and the second comes from type legalization of v4i1->i4 bitcasts which uses an additional concat. Though these changes weren't my original motivation. I'm looking at making X86ISelLowering's narrowShuffle emit a concat_vectors instead of an insert_subvector since concat_vectors is more canonical during early DAG combine. This patch helps prevent a regression from my experiments with that. Differential Revision: https://reviews.llvm.org/D66456 llvm-svn: 369459	2019-08-20 22:12:50 +00:00
Reid Kleckner	22fb734907	Revert [WinEH] Allocate space in funclets stack to save XMM CSRs This reverts r367088 (git commit `9ad565f70e`) And the follow up fix r368631 / `e9865b9b31` llvm-svn: 369457	2019-08-20 22:08:57 +00:00
Sean Fertile	1e46d4cec5	Adds support for writing the .bss section for XCOFF object files. Adds Wrapper classes for MCSymbol and MCSection into the XCOFF target object writer. Also adds a class to represent the top-level sections, which we materialize in the ObjectWriter. executePostLayoutBinding will map all csects into the appropriate container depending on its storage mapping class, and map all symbols into their containing csect. Once all symbols have been processed we - Assign addresses and symbol table indices. - Calaculte section sizes. - Build the section header table. - Assign the sections raw-pointer value for non-virtual sections. Since the .bss section is virtual, writing the header table is enough to add support. Writing of a sections raw data, or of any relocations is not included in this patch. Testing is done by dumping the section header table, but it needs to be extended to include dumping the symbol table once readobj support for dumping auxiallary entries lands. Differential Revision: https://reviews.llvm.org/D65159 llvm-svn: 369454	2019-08-20 22:03:18 +00:00
Craig Topper	3a2b08e6c9	[X86] Add a DAG combine to transform (i8 (bitcast (v8i1 (extract_subvector (v16i1 X), 0)))) -> (i8 (trunc (i16 (bitcast (v16i1 X))))) on KNL target Without AVX512DQ we don't have KMOVB so we can't really copy 8-bits of a k-register to a GPR. We have to copy 16 bits instead. We do this even if the DAG copy is from v8i1->v16i1. If we detect the (i8 (bitcast (v8i1 (extract_subvector (v16i1 X), 0)))) we should rewrite the types to match the copy we do support. By doing this, we can help known bits to propagate without losing the upper 8 bits of the input to the extract_subvector. This allows some zero extends to be removed since we have an isel pattern to use kmovw for (zero_extend (i16 (bitcast (v16i1 X))). Differential Revision: https://reviews.llvm.org/D66489 llvm-svn: 369434	2019-08-20 20:20:04 +00:00
Craig Topper	250951abf5	[X86] Add isel patterns for (i64 (zext (i8 (bitcast (v16i1 X))))) to use a KMOVW and a SUBREG_TO_REG. Similar for i8 and anyextend. We already had patterns for extending to i32 to take advantage of the impliciting zeroing of the upper bits of a 32-bit GPR that is done by KMOVW/KMOVB. But the extend might be all the way to i64, in which case the existing patterns would fail and we'd get a KMOVW/B followed by a MOVZX. By adding patterns for i64 we can use the fact that KMOVW/B zero the upper bits of the 32-bit GPR and the normal property that 32-bit GPR writes implicitly zero the upper 32-bits of the full 64-bit GPR. The anyextend patterns are slightly different since we don't care about the upper zeros. For the i8->i64 I think this avoids selecting the anyextend as a MOVZX to prevent a partial register issue that doesn't exist. For i16->i64 I think we would have just emitted an insert_subreg on top of the extract_subreg that the vXi16->i16 bitcast pattern emits. The register coalescer or peephole pass should combine those, but this saves that work and makes i8/16 consistent. llvm-svn: 369431	2019-08-20 19:43:48 +00:00
Martin Storsjo	514f3a122d	[TargetMachine] Don't try to create COFFSTUB references on windows on non-COFF This avoids spurious relocation types for windows/elf targets. Differential Revision: https://reviews.llvm.org/D66401 llvm-svn: 369426	2019-08-20 18:58:05 +00:00
Matt Arsenault	4b7fc85c0b	Revert "AMDGPU: Fix iterator error when lowering SI_END_CF" This reverts r367500 and r369203. This is causing various test failures. llvm-svn: 369417	2019-08-20 17:45:25 +00:00
Andrea Di Biagio	2e897a94f5	[X86][BtVer2] Use ReadAfterLd entries for the register operands of CMPXCHG. This is a follow-up of r369365. llvm-svn: 369412	2019-08-20 17:05:56 +00:00
Craig Topper	22ac9f396f	[X86] Use isNullConstant instead of getConstantOperandVal == 0. NFC llvm-svn: 369410	2019-08-20 16:55:12 +00:00
Sam Tebbs	dcfc2d40d3	[ARM] Select vaddva This patch adds vaddva selection. Differential revision: https://reviews.llvm.org/D66410 llvm-svn: 369404	2019-08-20 16:33:34 +00:00
Andrea Di Biagio	16111d3795	[X86][BtVer2] Fix latency and throughput of atomic INC/DEC/NEG/NOT. Latency and throughput of LOCK INC/DEC/NEG/NOT is always 19cy. Number of uOPs is still 1. Differential Revision: https://reviews.llvm.org/D66469 llvm-svn: 369388	2019-08-20 14:31:27 +00:00
Alex Bradbury	7cb3cd34e8	[RISCV] Implement getExprForFDESymbol to ensure RISCV_32_PCREL is used for the FDE location Follow binutils in using RISCV_32_PCREL for the FDE initial location. As explained in the relevant binutils commit <`a6cbf936e3`>, the ADD/SUB pair of relocations is problematic in the presence of linker relaxation. This patch has the same end goal as D64715 but includes test changes and avoids adding a new global VariantKind to MCExpr.h (preferring RISCVMCExpr VKs like the rest of the RISC-V backend). Differential Revision: https://reviews.llvm.org/D66419 llvm-svn: 369375	2019-08-20 12:32:31 +00:00
Andrea Di Biagio	b1bdd97a26	[X86][Btver2] Fix latency and throughput of CMPXCHG instructions. On Jaguar, CMPXCHG has a latency of 11cy, and a maximum throughput of 0.33 IPC. Throughput is superiorly limited to 0.33 because of the implicit in/out dependency on register EAX. In the case of repeated non-atomic CMPXCHG with the same memory location, store-to-load forwarding occurs and values for sequent loads are quickly forwarded from the store buffer. Interestingly, the functionality in LLVM that computes the reciprocal throughput doesn't seem to know about RMW instructions. That functionality only looks at the "consumed resource cycles" for the throughput computation. It should be fixed/improved by a future patch. In particular, for RMW instructions, that logic should also take into account for the write latency of in/out register operands. An atomic CMPXCHG has a latency of ~17cy. Throughput is also limited to ~17cy/inst due to cache locking, which prevents other memory uOPs to start executing before the "lock releasing" store uOP. CMPXCHG8rr and CMPXCHG8rm are treated specially because they decode to one less macro opcode. Their latency tend to be the same as the other RR/RM variants. RR variants are relatively fast 3cy (but still microcoded - 5 macro opcodes). CMPXCHG8B is 11cy and unfortunately doesn't seem to benefit from store-to-load forwarding. That means, throughput is clearly limited by the in/out dependency on GPR registers. The uOP composition is sadly unknown (due to the lack of PMCs for the Integer pipes). I have reused the same mix of consumed resource from the other CMPXCHG instructions for CMPXCHG8B too. LOCK CMPXCHG8B is instead 18cycles. CMPXCHG16B is 32cycles. Up to 38cycles when the LOCK prefix is specified. Due to the in/out dependencies, throughput is limited to 1 instruction every 32 (or 38) cycles dependeing on whether the LOCK prefix is specified or not. I wouldn't be surprised if the microcode for CMPXCHG16B is similar to 2x microcode from CMPXCHG8B. So, I have speculatively set the JALU01 consumption to 2x the resource cycles used for CMPXCHG8B. The two new hasLockPrefix() functions are used by the btver2 scheduling model check if a MCInst/MachineInst has a LOCK prefix. Calls to hasLockPrefix() have been encoded in predicates of variant scheduling classes that describe lat/thr of CMPXCHG. Differential Revision: https://reviews.llvm.org/D66424 llvm-svn: 369365	2019-08-20 10:23:55 +00:00
Craig Topper	1ada137854	[X86] Add back the -x86-experimental-vector-widening-legalization comand line flag and all associated code, but leave it enabled by default Google is reporting performance issues with the new default behavior and have asked for a way to switch back to the old behavior while we investigate and make fixes. I've restored all of the code that had since been removed and added additional checks of the command flag onto code paths that are not otherwise guarded by a check of getTypeAction. I've also modified the cost model tables to hopefully get us back to the previous costs. Hopefully we won't need to support this for very long since we have no test coverage of the old behavior so we can very easily break it. llvm-svn: 369332	2019-08-20 06:58:00 +00:00
Thomas Raoux	a08e139d50	[NFC] Test commit, fix some comment spelling. llvm-svn: 369326	2019-08-20 05:21:27 +00:00
Karl-Johan Karlsson	40da6be2bd	[AsmPrinter] Remove const qualifier from EmitBasicBlockStart. Overriders may want to modify state in it. AMDGPU wants to, but has to make its members mutable in order to do so. Besides, EmitBasicBlockEnd is not const, so why should Start be? Patch by Bevin Hansson. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D66341 llvm-svn: 369325	2019-08-20 05:13:57 +00:00
Evgeniy Stepanov	55ccd16354	Refactor isPointerOffset (NFC). Summary: Simplify the API using Optional<> and address comments in https://reviews.llvm.org/D66165 Reviewers: vitalybuka Subscribers: hiraditya, llvm-commits, ostannard, pcc Tags: #llvm Differential Revision: https://reviews.llvm.org/D66317 llvm-svn: 369300	2019-08-19 21:08:04 +00:00
Evgeniy Stepanov	50affbe47f	MemTag: stack initializer merging. Summary: MTE provides instructions to update memory tags and data at the same time. This change makes use of those to generate more compact code for stack variable tagging + initialization. We collect memory store and memset instructions following an alloca or a lifetime.start call, and replace them with the corresponding MTE intrinsics. Since the intrinsics work on 16-byte aligned chunks, the stored values are combined as necessary. Reviewers: pcc, vitalybuka, ostannard Subscribers: srhines, javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66167 llvm-svn: 369297	2019-08-19 20:47:09 +00:00
Sam Clegg	19bf637eb1	[WebAssembly][MC] Allow empty assembly functions Differential Revision: https://reviews.llvm.org/D66434 llvm-svn: 369292	2019-08-19 19:04:54 +00:00
Craig Topper	a0d92c7262	[X86] Teach lowerV4I32Shuffle to only use broadcasts if the mask has more than one undef element. Prioritize shifts over broadcast in lowerV8I16Shuffle. The motivating case are the changes in vector-reduce-add.ll where we were doing extra work in the scalar domain instead of shuffling. There may be some one use check that needs to be looked into there, but this patch sidesteps the issue by avoiding broadcasts that aren't really broadcasting. Differential Revision: https://reviews.llvm.org/D66071 llvm-svn: 369287	2019-08-19 18:15:50 +00:00
Serge Guelton	a023d6b7de	[nfc] Silent gcc warning llvm-svn: 369266	2019-08-19 14:40:33 +00:00
Jinsong Ji	0776da5236	[PeepholeOptimizer] Don't assume bitcast def always has input Summary: If we have a MI marked with bitcast bits, but without input operands, PeepholeOptimizer might crash with assert. eg: If we apply the changes in PPCInstrVSX.td as in this patch: [(set v4i32:$XT, (bitconvert (v16i8 immAllOnesV)))]>; We will get assert in PeepholeOptimizer. ``` llvm-lit llvm-project/llvm/test/CodeGen/PowerPC/build-vector-tests.ll -v llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:417: const llvm::MachineOperand &llvm::MachineInstr::getOperand(unsigned int) const: Assertion `i < getNumOperands() && "getOperand() out of range!"' failed. ``` The fix is to abort if we found out of bound access. Reviewers: qcolombet, MatzeB, hfinkel, arsenm Reviewed By: qcolombet Subscribers: wdng, arsenm, steven.zhang, wuzish, nemanjai, hiraditya, kbarton, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65542 llvm-svn: 369261	2019-08-19 14:19:04 +00:00
Alex Bradbury	1c1f8f215d	[RISCV] Don't force absolute FK_Data_X fixups to relocs The current behavior of shouldForceRelocation forces relocations for the majority of fixups when relaxation is enabled. This makes sense for fixups which incorporate symbols but is unnecessary for simple data fixups where the fixup target is already resolved to an absolute value. Differential Revision: https://reviews.llvm.org/D63404 Patch by Edward Jones. llvm-svn: 369257	2019-08-19 13:23:02 +00:00
Sam Tebbs	f312c1ecf4	[ARM] Add support for MVE vaddv This patch adds vecreduce_add and the relevant instruction selection for vaddv. Differential revision: https://reviews.llvm.org/D66085 llvm-svn: 369245	2019-08-19 09:38:28 +00:00
David Green	2bfc13fde1	[ARM] MVE sext costs This adds some sext costs for MVE, taken from the length of assembly sequences that we currently generate. Differential Revision: https://reviews.llvm.org/D66010 llvm-svn: 369244	2019-08-19 09:13:22 +00:00
Craig Topper	ebb7ddc633	[X86] Teach lower1BitShuffle to match right shifts with upper zero elements on types that don't natively support KSHIFT. We can support these by widening to a supported type, then shifting all the way to the left and then back to the right to ensure that we shift in zeroes. llvm-svn: 369232	2019-08-19 05:45:39 +00:00
Craig Topper	e47437a6ef	[X86] Fix the lower1BitShuffle code added in r369215 to correctly pass the widened vector to the KSHIFT node. Not sure how to test this as we have tests that exercise this code, but nothing failed for the types not matching. Since all the k-registers use equivalent register classes everything just ends up working. llvm-svn: 369228	2019-08-19 04:08:44 +00:00
Craig Topper	269c6b1c15	[X86] Teach lower1BitShuffle to match KSHIFTR that doesn't use Zeroable and only relies on undef. This allows us to widen the type when the KSHIFTR instruction doesn't exist for the type. If we need to shift in zeroes into the upper elements we would need more work to guarantee zeroes when widening. llvm-svn: 369227	2019-08-19 04:08:40 +00:00
Craig Topper	2eb7951da3	[X86] Teach lower1BitShuffle to recognize padding a subvector with zeros with V2 as the source and V1 as the zero vector. Shuffle canonicalization can swap the sources so the zero vector might be V1 and the subvector that's being padded can be V2. llvm-svn: 369226	2019-08-19 00:39:22 +00:00
Craig Topper	2ee46c7c4b	[X86] Add a special case to LowerCONCAT_VECTORSvXi1 to handle concatenating zero vectors followed by one non-zero vector followed by undef vectors. For such a case we should only need a KSHIFTL, but we were previously generating a KSHIFTL followed by a KSHIFTR because we mistakenly believed we need to zero the undef elements. llvm-svn: 369224	2019-08-18 23:30:11 +00:00
Craig Topper	388b8dd94a	[X86] Replace uses of getZeroVector for vXi1 vectors with DAG.getConstant. vXi1 vectors don't need special handling. llvm-svn: 369222	2019-08-18 23:30:03 +00:00
Craig Topper	9e074c06fe	[X86] Improve lower1BitShuffle handling for KSHIFTL on narrow vectors. We can insert the value into a larger legal type and shift that by the desired amount. llvm-svn: 369215	2019-08-18 18:52:46 +00:00
Simon Pilgrim	63b3c56fca	Fix signed/unsigned comparison warning. NFCI. llvm-svn: 369213	2019-08-18 17:26:30 +00:00
Simon Pilgrim	fee2546f3f	[X86] isTargetShuffleEquivalent - add BUILD_VECTOR matching Add similar functionality to isShuffleEquivalent - if the mask elements don't match, try matching the BUILD_VECTOR scalars instead. As target shuffles need to handle SM_Sentinel values, this can get a bit tricky, so commit just adds actual mask element index handling - full SM_SentinelZero support will be added when the need arises. Also, enables support in matchVectorShuffleWithPACK llvm-svn: 369212	2019-08-18 17:15:26 +00:00
Simon Pilgrim	a66edd86e2	[X86] isTargetShuffleEquivalent - early out on illegal shuffle masks. NFCI. Simplifies shuffle mask comparisons by just bailing out if the shuffle mask has any out of range values - will make an upcoming patch much simpler. llvm-svn: 369211	2019-08-18 16:37:58 +00:00
Matt Arsenault	479f3bdb2c	AMDGPU: Fix iterator error when lowering SI_END_CF If the instruction is the last in the block, there is no next instruction but the iteration still needs to look at the new block. llvm-svn: 369203	2019-08-18 00:20:44 +00:00
Matt Arsenault	cfdc2b9bd9	AMDGPU: Disambiguate v3f16 format in load/store tables Currently the searchable tables report the number of dwords. These round to the same number for 3 and 4 component d16 instructions. Change this to report the number of elements so this isn't ambiguous. llvm-svn: 369202	2019-08-18 00:20:43 +00:00
Craig Topper	31f829f0cd	[X86] Add a one use check to the combineStore code that handles v16i16->v16i8 truncate+store by extending to v16i32 and then emitting a v16i32->v16i8 truncstore. This prevent us from emitting a separate truncate and a truncating store instruction. llvm-svn: 369200	2019-08-17 22:46:15 +00:00
Paul Walker	26295676a4	Revert Revert [AArch64InstrInfo] Stop getInstSizeInBytes returning non-zero for meta instructions. This reverts r369132 (git commit `19301d75f0`) llvm-svn: 369186	2019-08-17 09:22:36 +00:00
Paul Walker	93c7a4a47c	Revert [AArch64InstrInfo] Stop getInstSizeInBytes returning non-zero for meta instructions. This reverts r369133 (git commit `2632c677f8`) llvm-svn: 369185	2019-08-17 09:22:28 +00:00

... 3 4 5 6 7 ...

53908 Commits