llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	27a5896fe8	[X86] Correct 256 vpmovzx/vpmovsx isel patterns to check HasAVX2 instead of HasAVX to prevent fast-isel from using them incorrectly. These are AVX2 instructions, but have been incorrectly marked in tablegen for a while. This wasn't a problem until r346784 switched the patterns to use target independent ISD opcodes. This made the patterns visible to fast isel. Fixes PR39733 llvm-svn: 347375	2018-11-21 01:39:38 +00:00
Craig Topper	aa52ee2770	[X86] Emit a PACKUS instead of a VECTOR_SHUFFLE from LowerTRUNCATE for v16i16->v16i8. We can't guarantee that demanded bits passing through the vector shuffle won't cause the AND in front of this to be removed. This would prevent the PACKUS from being matched during shuffle lowering. Unfortunately, this adds a packuswb to one of the vector-reduce-mul.ll tests since we were removing the shuffle via SimplifyDemandedVectorElts. We appear to have similar issues with vpmovwb on the same test case on other targets. llvm-svn: 347361	2018-11-20 22:57:48 +00:00
Craig Topper	24b346da42	[X86] Emit a single shuffle for the v16i8->v4i32 step of a SIGN_EXTEND_VECTOR_INREG lowering on pre-sse4.1 targets. Previously we emitted two separate shuffles, one for unpcklbw and one for unpcklwd. Instead emit a single shuffle equivalent to both of the original shuffles. Shuffle lowering seems able to handle it. This avoids a bitcast between the two shuffles which seems helpful to DAG combine. Remove the custom type legalization for v8i8->v8i32. I had put that in to avoid some almost duplicate punpcklbw instructions I was seeing, but this lowering change seems to fix that. It also fixes some duplicate shuffles seen in vector-sext.ll llvm-svn: 347348	2018-11-20 21:21:52 +00:00
Sam Clegg	4791a668f5	[WebAssembly] WebAssemblyLowerEmscriptenEHSjLj: use getter/setter for accessing tempRet0 Rather than assuming that `tempRet0` exists in linear memory only assume the getter/setter functions exist. This avoids conflicting with binaryen which declares a wasm global for this purpose and defines its own getter and setter for that. The other advantage of doing things this way is that it leaves it up to the linker/finalizer to decide how to actually store this temporary. As it happens binaryen uses a wasm global which is more appropriate since it is thread safe. This also allows us to change the way this is stored in the future (memory, TLS memory, wasm global) without modifying LLVM. This is part of a 4 part change: LLVM: https://reviews.llvm.org/D53240 fastcomp: https://github.com/kripken/emscripten-fastcomp/pull/237 emscripten: https://github.com/kripken/emscripten/pull/7358 binaryen: https://github.com/WebAssembly/binaryen/pull/1709 Differential Revision: https://reviews.llvm.org/D53240 llvm-svn: 347340	2018-11-20 19:25:07 +00:00
Jinsong Ji	9a0ed20072	[PowerPC] Add Itineraries for STWU/STWUX etc When doing some instruction scheduling work, we noticed some missing itineraries. Before we switch to the machine scheduler, those missing itineraries might not have an impact on actual scheduling, because we can still get the same latency due to default values. With the machine scheduler, however, itineraries will have an impact on scheduling. E.g. NumMicroOps will default to 0 if there are no itineraries for a specific instruction class, and most instruction classes with itineraries will have NumMicroOps default to 1. This will have an impact on the count of RetiredMOps and affect the Pending/Available Queue, causing different or suboptimal scheduling. This patch is for STWU/STWUX (IIC_LdStStoreUpd) for P8. Since there are already multiple IICs for store update, this patch also merges IIC_LdStSTDU/IIC_LdStStoreUpd to IIC_LdStSTU and IIC_LdStSTDUX to IIC_LdStSTUX, and we add a new testcase in https://reviews.llvm.org/D54699 to show the difference. Differential Revision: https://reviews.llvm.org/D54700 llvm-svn: 347311	2018-11-20 15:11:42 +00:00
Simon Pilgrim	c9cc6cca42	Fix MSVC 'truncation of constant value' warning. NFCI. llvm-svn: 347308	2018-11-20 14:29:40 +00:00
Simon Pilgrim	ee8b96f253	[X86][SSE] Add computeKnownBits/ComputeNumSignBits support for PACKSS/PACKUS instructions. Pull out getPackDemandedElts demanded elts remapping helper from computeKnownBitsForTargetNode and use in computeKnownBits/ComputeNumSignBits. llvm-svn: 347303	2018-11-20 13:23:37 +00:00
Simon Pilgrim	ed7e2fda18	[X86][SSE] XFormVExtractWithShuffleIntoLoad - getVectorShuffle won't accept SM_SentinelZero Noticed while working on improving demanded elts target shuffle combining llvm-svn: 347302	2018-11-20 12:17:50 +00:00
Simon Pilgrim	a6fb85ffa7	[X86][SSE] Lower immediately to PACKUS instead of VECTOR_SHUFFLE. As discussed on rL347240, this avoids some regressions on D54679 and also helps some combines to kick in a bit earlier. llvm-svn: 347300	2018-11-20 11:46:37 +00:00
Simon Pilgrim	7198506ba8	[X86][SSE] Add SimplifyDemandedVectorElts support for PACKSS/PACKUS instructions. As discussed on rL347240. llvm-svn: 347299	2018-11-20 11:09:46 +00:00
Craig Topper	17fa42a69b	[X86] Preserve undef information when creating a punpckl/hbw from a v16i8 where all the even or odd elements are undef. Previously if V2 was unused we ended up using V1 for both inputs as part of the code that follows the new code. By using lowerVectorShuffleWithUNPCK we keep the undef nature of V2 in the output. As near as I can tell this makes v16i8 behavior consistent with every other VT now. This does mean that we give the register allocator freedom to fill in random registers now and create false dependencies. But like I said we're already doing that for other types. llvm-svn: 347296	2018-11-20 09:04:01 +00:00
Craig Topper	b06d1aa3a1	[X86] Add custom type legalization for v8i8->v8i32 sign extend pre-SSE4.1 This helps with a future patch and makes us less reliant on DAG combine merging shuffles. llvm-svn: 347295	2018-11-20 09:03:58 +00:00
Craig Topper	c733c7bf94	[X86] Replace more calls to getZeroVector with regular getConstant. getZeroVector produces a specifically canonicalized zero vector, but we can just let DAG legalization take care of it. The test changes are because MULH lowering happens later than it should and this change gave us the opportunity to constant fold away a multiply during a DAG combine before the build_vector got legalized with a bitcast. llvm-svn: 347290	2018-11-20 06:54:01 +00:00
Nemanja Ivanovic	9b393909e2	[PowerPC] Don't combine to bswap store on 1-byte truncating store Turns out that there was no check for a store that truncates down to a single byte when combining a (store (bswap...)) into a byte-swapping store. This patch just adds that check. Fixes https://bugs.llvm.org/show_bug.cgi?id=39478. llvm-svn: 347288	2018-11-20 04:42:31 +00:00
Craig Topper	808d0dd689	[X86] Rename combineVSZext->combineExtendVectorInreg. NFC Now that we no longer have target specific vector extend nodes let's make the function name match the nodes we do use. llvm-svn: 347268	2018-11-19 22:18:47 +00:00
Konstantin Zhuravlyov	700b1ef54d	AMDGPU: Fix V_FMA_F16 selection on GFX9 GFX9 should select opsel version. Differential Revision: https://reviews.llvm.org/D54545 llvm-svn: 347265	2018-11-19 21:10:16 +00:00
Stanislav Mekhanoshin	8bafbae889	[AMDGPU] Restored selection of scalar_to_vector (v2x16) This works if DAG combiner is enabled, but without combining we cannot select scalar_to_vector of <2 x half> and <2 x i16>. Differential Revision: https://reviews.llvm.org/D54718 llvm-svn: 347259	2018-11-19 19:58:13 +00:00
Craig Topper	a5e0380c30	[X86][CostModel] Don't lookup intrinsic cost tables if the intrinsic isn't one we care about We're seeing some issues internally where we sent some intrinsics into the cost model that the getTypeLegalizationCost call fails on, but X86 specific tables don't care about. Our base class implementation takes care of them. We'd just like X86 backend to ignore them. This patch makes sure the switch returned something X86 cares about and skips the table lookups and type legalization call if not. Probably more efficient too since we don't go scanning the tables for every intrinsic we could possibly see. Differential Revision: https://reviews.llvm.org/D54711 llvm-svn: 347248	2018-11-19 18:57:31 +00:00
Simon Pilgrim	c4861ab170	[X86][SSE] Remove unnecessary bit-and in pshufb vector ctlz (PR39703) SSE PSHUFB vector ctlz lowering works at the i4 nibble level. As detailed in PR39703, we were masking the lower nibble off but we only actually use it in the case where the upper nibble is known to be zero, making it safe to remove the mask and save an instruction. Differential Revision: https://reviews.llvm.org/D54707 llvm-svn: 347242	2018-11-19 18:40:59 +00:00
Craig Topper	311bbcd535	[X86] Attempt to improve v32i8/v64i8 multiply lowering by applying the v16i8 non-avx2 algorithm to each 128-bit lane. Previously we split the vectors in half to allow the two halves to be any-extended, then concatenated the results back together. This patch instead extends the v16i8 SSE algorithm to extend half of each 128-bit lane using punpcklbw/punpckhbw, multiplies all the low half lanes and high half lanes together in separate operations, then merges the half lane results back together using packuswb. Unfortunately, some of the cases in vector-reduce-mul.ll regress because we aren't narrowing the vector width of the multiplies as we reduce. The splitting was somewhat making up for that before by causing halves to be discarded after the split. Differential Revision: https://reviews.llvm.org/D54668 llvm-svn: 347240	2018-11-19 18:32:53 +00:00
Fangrui Song	d83a5526d5	[AMDGPU] Fix -Wunused-variable llvm-svn: 347234	2018-11-19 17:54:27 +00:00
Stanislav Mekhanoshin	054f8101f1	[AMDGPU] Convert insert_vector_elt into set of selects This avoids scratch use or indirect VGPR addressing for small vectors. Differential Revision: https://reviews.llvm.org/D54606 llvm-svn: 347231	2018-11-19 17:39:20 +00:00
Wouter van Oortmerssen	49482f824a	[WebAssembly] replaced .param/.result by .functype Summary: This makes it easier/cleaner to generate a single signature from this directive. Also: - Adds the symbol name, such that we don't depend on the location of this directive anymore. - Actually constructs the signature in the assembler, and make the assembler own it. - Refactor the use of MVT vs ValType in the streamer and assembler to require less conversions overall. - Changed 700 or so tests to use it. Reviewers: sbc100, dschuff Subscribers: jgravelle-google, eraman, aheejin, sunfish, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D54652 llvm-svn: 347228	2018-11-19 17:10:36 +00:00
David Stuttard	be3d7ba9fb	[AMDGPU] Derive GCNSubtarget from MF to get overridden target features Summary: AMDGPUAsmPrinter has a getSTI function that derives a GCNSubtarget from the TM. However, this means that overridden target features are not detected and can result in incorrect behaviour. Switch to using STM which is a GCNSubtarget derived from the MF (used elsewhere in the same function). Change-Id: Ib6328ad667b7fcdc87e9c06344e59859207db9b0 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D54301 llvm-svn: 347221	2018-11-19 15:44:20 +00:00
Martin Elshuber	fef3036d37	Subject: [PATCH] [CodeGen] Add pass to combine interleaved loads. This patch defines an interleaved-load-combine pass. The pass searches for ShuffleVector instructions that represent interleaved loads. Matches are converted such that they will be captured by the InterleavedAccessPass. The pass extends LLVM's capabilities to use target specific instruction selection of interleaved load patterns (e.g.: ld4 on AArch64 architectures). Differential Revision: https://reviews.llvm.org/D52653 llvm-svn: 347208	2018-11-19 14:26:10 +00:00
Nicolai Haehnle	c548d91419	AMDGPU/InsertWaitcnts: Some more const-correctness Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54225 llvm-svn: 347192	2018-11-19 12:03:11 +00:00
Sam Parker	e7c42dd7e2	[ARM] Remove trunc sinks in ARM CGP Truncs are treated as sources if they produce a value of the same type as the one we are currently trying to promote. Truncs used to be considered as a sink if their operand was the same value type. We now allow smaller types in the search, so we should search through truncs that produce a smaller value. These truncs can then be converted to an AND mask. This leaves sinks as being: - points where the value in the register is being observed, such as an icmp, switch or store. - points where value types have to match, such as calls and returns. - zext are included to ease the transformation and are generally removed later on. During this change, it also became apparent that truncating sinks was broken: if a sink used a source, its type information had already been lost by the time the truncation happens. So I've changed the method of caching the type information. Differential Revision: https://reviews.llvm.org/D54515 llvm-svn: 347191	2018-11-19 11:34:40 +00:00
Anton Korobeynikov	4df19b75c0	[MSP430] Optimize srl/sra in case of A >> (8 + N) There are no variable-length shifts on MSP430. Therefore "eat" 8 bits of shift via bswap & ext. Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D54623 llvm-svn: 347187	2018-11-19 10:43:02 +00:00
Craig Topper	8b22bcd39f	[X86] Use a pcmpgt with 0 instead of psrad 31, to fill elements with the sign bit in v4i32 MULH lowering. The shift requires a copy to avoid clobbering a register. Comparing with 0 uses an xor to produce 0 that will be overwritten with the compare results. So still requires 2 instructions, but should be one byte shorter since it doesn't need to encode an immediate. llvm-svn: 347185	2018-11-19 07:22:26 +00:00
Craig Topper	3616891046	[X86] Use compare with 0 to fill an element with sign bits when sign extending to v2i64 pre-sse4.1 Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it since the comparison will overwrite the register that contains the zeros. This should be one byte shorter. llvm-svn: 347181	2018-11-19 04:33:20 +00:00
Craig Topper	053f1eea96	[X86] Remove most of the SEXTLOAD Custom setOperationAction calls under -x86-experimental-vector-widening-legalization. Leave just the v4i8->v4i64 and v8i8->v8i64, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple step sign extend using unpacks and shifts. llvm-svn: 347180	2018-11-19 00:33:16 +00:00
Simon Pilgrim	7f92efa5a9	[X86][SSE] Add SimplifyDemandedVectorElts support for SSE packed i2fp conversions. llvm-svn: 347177	2018-11-18 22:13:31 +00:00
Craig Topper	0468c860b7	[X86] Add custom type legalization for extending v4i8/v4i16->v4i64. Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing two sign extends but only using half of the v4i32 intermediate result. When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result. Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec producing a full v4i32 result. Create the sign bit vector as a v4i32 then split and interleave with the sign bits using a punpckldq and punpckhdq. llvm-svn: 347176	2018-11-18 21:28:50 +00:00
Simon Pilgrim	b31bdbd2e9	[X86][SSE] Add SimplifyDemandedVectorElts support for SSE splat-vector-shifts. SSE vector shifts only use the bottom 64-bits of the shift amount vector. llvm-svn: 347173	2018-11-18 20:21:52 +00:00
Craig Topper	11d50948e2	[X86] Disable combineToExtendVectorInReg under -x86-experimental-vector-widening-legalization. Add custom type legalization for extends. If we widen illegal types instead of promoting, we should be able to rely on the type legalizer to create the vector_inreg operations for us with some caveats. This patch disables combineToExtendVectorInReg when we are using widening. I've enabled custom legalization for v8i8->v8i64 extends under avx512f since the type legalizer would want to create a vector_inreg with a v64i8 input type which isn't legal without avx512bw. So we go to v16i8 with custom code using the relaxation of rules we get from D54346. I've also enabled custom legalization of v8i64 and v16i32 operations with AVX. When the input type is 128 bits, the default splitting legalization would extend first 128->256, then do a split to two 128 pieces. Extend each half to 256 and then concat the result. The custom legalization I've added instead uses a 128->256 bit vector_inreg extend that only reads the lower 64-bits for the low half of the split. Then shuffles the high 64-bits to the low 64-bits and does another vector_inreg extend. llvm-svn: 347172	2018-11-18 18:11:25 +00:00
Craig Topper	bc8148f7b0	[X86] Lower v16i16->v16i8 truncate using an 'and' with 255, an extract_subvector, and a packuswb instruction. Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54671 llvm-svn: 347171	2018-11-18 17:59:28 +00:00
Simon Pilgrim	ec808cf541	Remove unused variable. NFCI. llvm-svn: 347169	2018-11-18 17:24:59 +00:00
Simon Pilgrim	50828c75d0	[X86][SSE] Split IsSplatValue into GetSplatValue and IsSplatVector Refactor towards making this recursive (necessary for PR38243 rotation splat detection). IsSplatVector returns the original vector source of the splat and the splat index. GetSplatValue returns the scalar splatted value as an extraction from IsSplatVector. llvm-svn: 347168	2018-11-18 17:15:06 +00:00
Simon Pilgrim	fec9f8657b	[X86][SSE] Relax IsSplatValue - remove the 'variable shift' limit on subtracts. Means we don't use the per-lane-shifts as much when we can cheaply use the older splat-variable-shifts. llvm-svn: 347162	2018-11-18 15:52:08 +00:00
Simon Pilgrim	cc1f5d2407	[X86][SSE] Use raw shuffle mask decode in SimplifyDemandedVectorEltsForTargetNode (PR39549) We were using the 'normalized' shuffle mask from resolveTargetShuffleInputs, which replaces zero/undef inputs with sentinel values. For SimplifyDemandedVectorElts we need the raw mask so we can correctly demand those 'zero' inputs that got normalized away, this requires an extra bit of logic to locally normalize undef inputs. llvm-svn: 347158	2018-11-18 13:34:53 +00:00
Heejin Ahn	e0f8b9bfc6	[WebAssembly] Add null streamer support Summary: Now `llc -filetype=null` works. Reviewers: eush Subscribers: dschuff, jgravelle-google, sbc100, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54660 llvm-svn: 347155	2018-11-18 11:58:47 +00:00
Craig Topper	cd94a7c227	[X86] Add -x86-experimental-vector-widening-legalization check to combineSelect and combineSetCC to cover vXi16/vXi8 promotion without BWI. I don't yet have any test cases for this, but it's the right thing to do based on log file inspection. llvm-svn: 347151	2018-11-18 08:30:09 +00:00
Craig Topper	b03f80a21c	[X86] Rename WidenMaskArithmetic->PromoteMaskArithmetic since we usually use widen to refer to adding elements not making elements larger. NFC llvm-svn: 347150	2018-11-18 07:35:08 +00:00
Craig Topper	f56a57518d	[X86] Don't use a pmaddwd for vXi32 multiply if the inputs are zero extends from i8 or smaller without SSE4.1. Prefer to shrink the mul instead. The zero extend will require two stages of unpacks to implement. So it's better to shrink the multiply using pmullw and then extend that result back to v4i32 using a single unpack. llvm-svn: 347149	2018-11-18 05:53:21 +00:00
Craig Topper	0438d791fa	[X86] Add support for matching PACKUSWB from a v64i8 shuffle. llvm-svn: 347143	2018-11-17 18:54:43 +00:00
Craig Topper	dd61f11642	[X86] Don't extend v32i8 multiplies to v32i16 with avx512bw and prefer-vector-width=256. llvm-svn: 347131	2018-11-17 02:36:07 +00:00
Craig Topper	b05ea28f1f	[X86] Use getUnpackl/getUnpackh instead of hardcoding a shuffle mask. llvm-svn: 347127	2018-11-17 02:18:12 +00:00
Fangrui Song	7570932977	Use llvm::copy. NFC llvm-svn: 347126	2018-11-17 01:44:25 +00:00
Craig Topper	ee0333b4a9	[X86] Add custom promotion of narrow fp_to_uint/fp_to_sint operations under -x86-experimental-vector-widening-legalization. This tries to force the result type to vXi32 followed by a truncate. This can help avoid scalarization that would otherwise occur. There are some annoying examples of an avx512 truncate instruction followed by a packus where we should really be able to just use one truncate. But overall this is still a net improvement. llvm-svn: 347105	2018-11-16 22:53:00 +00:00
Craig Topper	87bc07b3dd	[X86] Qualify part of the masked gather handling in ReplaceNodeResults with a getTypeAction call to know if we can use default legalization. If we managed to switch to -x86-experimental-vector-widening-legalization this block can be removed. llvm-svn: 347100	2018-11-16 22:04:29 +00:00
Craig Topper	567aaeb40d	[X86] Remove a branch on SSE4.1 from LowerLoad We should be able to use getExtendInVec with or without sse4.1 to produce a SIGN_EXTEND_VECTOR_INREG. llvm-svn: 347095	2018-11-16 21:05:00 +00:00
Craig Topper	7fff9a9aef	[X86] In LowerLoad, fix assert messages and rename a variable that uses Zize instead of Size. NFC llvm-svn: 347093	2018-11-16 21:04:56 +00:00
Peter Collingbourne	527024469a	AArch64: Emit a call frame instruction for the shadow call stack register. When unwinding past a function that uses shadow call stack, we must subtract 8 from the value of the x18 register. This patch causes us to emit a call frame instruction that causes that to happen. Differential Revision: https://reviews.llvm.org/D54609 llvm-svn: 347089	2018-11-16 20:08:54 +00:00
Anton Korobeynikov	e5cb1c35b4	[MSP430] Add RTLIB::[SRL/SRA/SHL]_I32 lowering to EABI lib calls Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D54626 llvm-svn: 347080	2018-11-16 19:36:15 +00:00
Rong Xu	3a38175723	[X86] Disable Condbr_merge pass Disable Condbr_merge pass for now due to PR39658. Will reenable the pass once the bug is fixed. llvm-svn: 347079	2018-11-16 19:35:00 +00:00
Stefan Pintilie	9004444d81	Revert "[PowerPC] Make no-PIC default to match GCC - LLVM" This reverts commit r347069 llvm-svn: 347076	2018-11-16 19:24:23 +00:00
Anton Korobeynikov	883c70959d	[MSP430] Use R_MSP430_16_BYTE type for FK_Data_2 fixup Linker fails to link example like this (simplified case from newlib sources): $ cat test.c extern const char _ctype_b[]; struct _t { char *ptr; }; struct _t T = { ((char *) _ctype_b + 3) }; $ cat ctype.c char _ctype_b[4] = { 0, 0, 0, 0 }; LD: test.o:(.data+0x0): warning: internal error: unsupported relocation error We also follow the GNU toolchain here, where a 2-byte relocation is mapped to R_MSP430_16_BYTE instead of R_MSP430_16. Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D54620 llvm-svn: 347074	2018-11-16 19:20:51 +00:00
Sam Clegg	74f5fd4e32	[WebAssembly] Default to static reloc model Differential Revision: https://reviews.llvm.org/D54637 llvm-svn: 347073	2018-11-16 18:59:51 +00:00
Stefan Pintilie	046eff502f	[PowerPC] Make no-PIC default to match GCC - LLVM Set -fno-PIC as the default option. Differential Revision: https://reviews.llvm.org/D53383 llvm-svn: 347069	2018-11-16 18:36:21 +00:00
Simon Pilgrim	66f42ea6e1	[SelectionDAG] Move (repeated) SDTIntShiftDOp double shift node def to common code. NFCI. Prep work for PR39467. llvm-svn: 347067	2018-11-16 17:50:59 +00:00
Simon Pilgrim	bcd6631a2a	[X86][SSE] Move number of input limit out of resolveTargetShuffleInputs. Only combineX86ShufflesRecursively needs this limit. llvm-svn: 347054	2018-11-16 15:01:05 +00:00
Roman Lebedev	90c5b3f78e	[X86] X86DAGToDAGISel::matchBitExtract(): extract 'lshr' from `X` Summary: As discussed in previous review, and noted in the FIXME, if `X` is actually an `lshr Y, Z` (logical!), we can fold the `Z` into `control`, and let the `BEXTR` do this too. We could just insert those 8 bits of shift amount into control, but it is better to instead zero-extend them, and 'or' them in place. We can only do this for `lshr`, not `ashr`, because we do not know that the mask covers only the bits of `Y`, and not any of the sign-extended bits. The obvious question is, is this actually legal to do? I believe it is. Relevant quotes, from `Intel® 64 and IA-32 Architectures Software Developer’s Manual`, `BEXTR — Bit Field Extract`: * `Bit 7:0 of the second source operand specifies the starting bit position of bit extraction.` * `A START value exceeding the operand size will not extract any bits from the second source operand.` * `Only bit positions up to (OperandSize -1) of the first source operand are extracted.` * `All higher order bits in the destination operand (starting at bit position LENGTH) are zeroed.` * `The destination register is cleared if no bits are extracted.` FIXME: if we can do this, I wonder if we should prefer `BEXTR` over `BZHI` in such cases. Reviewers: RKSimon, craig.topper, spatel, andreadb Reviewed By: RKSimon, craig.topper, andreadb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54095 llvm-svn: 347048	2018-11-16 13:04:54 +00:00
Alex Bradbury	b4a64cede8	[RISCV][NFC] Define and use the new CA instruction format The RISC-V ISA manual was updated on 2018-11-07 (commit 00557c3) to define a new compressed instruction format, RVC format CA (no actual instruction encodings were changed). This patch updates the RISC-V backend to define the new format, and to use it in the relevant instructions. Differential Revision: https://reviews.llvm.org/D54302 Patch by Luís Marques. llvm-svn: 347043	2018-11-16 10:33:23 +00:00
Alex Bradbury	2146e8fb1e	[RISCV] Constant materialisation for RV64I This commit introduces support for materialising 64-bit constants for RV64I, making use of the RISCVMatInt::generateInstSeq helper in order to share logic for immediate materialisation with the MC layer (where it's used for the li pseudoinstruction). test/CodeGen/RISCV/imm.ll is updated to test RV64, and gains new 64-bit constant tests. It would be preferable if anyext constant returns were sign rather than zero extended (see PR39092). This patch simply adds an explicit signext to the returns in imm.ll. Further optimisations for constant materialisation are possible, most notably for mask-like values which can be generated by loading -1 and shifting right. A future patch will standardise on the C++ codepath for immediate selection on RV32 as well as RV64, and then add further such optimisations to RISCVMatInt::generateInstSeq in order to benefit both RV32 and RV64 for codegen and li expansion. Differential Revision: https://reviews.llvm.org/D52962 llvm-svn: 347042	2018-11-16 10:14:16 +00:00
Anton Korobeynikov	411773d227	[MSP430] Add support for .refsym directive Introduces support for '.refsym' assembler directive. From GCC docs (for MSP430): '.refsym' - This directive instructs assembler to add an undefined reference to the symbol following the directive. No relocation is created for this symbol; it will exist purely for pulling in object files from archives. Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D54618 llvm-svn: 347041	2018-11-16 09:50:24 +00:00
Craig Topper	079c37da58	[X86] Add custom type legalization for v2i8/v4i8/v8i8 mul under -x86-experimental-vector-widening. By early promoting the multiply to use an i16 element type we can avoid having op legalization emit a second multiply for the 8 upper elements of the v16i8 type we would otherwise get. llvm-svn: 347032	2018-11-16 06:15:21 +00:00
Matt Arsenault	eabb8dd015	AMDGPU: Fix analyzeBranch failing with pseudoterminators If a block had one of the _term instructions used for gluing exec modifying instructions to the end of the block, analyzeBranch would fail, preventing the verifier from catching a broken successor list. llvm-svn: 347027	2018-11-16 05:03:02 +00:00
Craig Topper	5802b82b40	[X86] Use ANY_EXTEND instead of SIGN_EXTEND in the AVX2 and later path for legalizing vXi8 multiply. We aren't going to use the upper bits of the multiply result that the extend would affect. So we don't need a specific type of extend. This makes some reduction test cases shorter because we were previously trying to sign_extend a truncate which we can't eliminate. llvm-svn: 347011	2018-11-16 01:16:59 +00:00
Craig Topper	1acafd863f	[X86] Update a couple comments to remove a mention of a sign extending that no longer happens. NFC llvm-svn: 347010	2018-11-16 01:16:51 +00:00
Ron Lieberman	cac749ac88	[AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST Add a pass to fixup various vector ISel issues. Currently we handle converting GLOBAL_{LOAD\|STORE}_* and GLOBAL_Atomic_* instructions into their _SADDR variants. This involves feeding the sreg into the saddr field of the new instruction. llvm-svn: 347008	2018-11-16 01:13:34 +00:00
Heejin Ahn	095796a391	[WebAssembly] Split BBs after throw instructions Summary: `throw` instruction is a terminator in wasm, but BBs were not split after `throw` instructions, causing machine instruction verifier to fail. This patch - Splits BBs after `throw` instructions in WasmEHPrepare and adds an unreachable instruction after `throw`, which will be deleted in LateEHPrepare pass - Refactors WasmEHPrepare into two member functions - Changes the semantics of `eraseBBsAndChildren` in LateEHPrepare pass to match that of WasmEHPrepare pass, which is newly added. Now `eraseBBsAndChildren` does not delete BBs with remaining predecessors. - Fixes style nits, making static function names conform to clang-tidy - Re-enables the test temporarily disabled by rL346840 && rL346845 Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54571 llvm-svn: 347003	2018-11-16 00:47:18 +00:00
Ron Lieberman	2f5683e6b0	[AMDGPU] NFC Test commit llvm-svn: 347002	2018-11-16 00:46:51 +00:00
Konstantin Zhuravlyov	af7b5d7092	AMDHSA: More code object v3 fixes: - Make sure IsaInfo::hasCodeObjectV3 returns true only for AMDHSA - Update assembler metadata tests to use v2 by default llvm-svn: 347001	2018-11-15 23:14:23 +00:00
Craig Topper	22bfa99448	[X86] Remove ANY_EXTEND special case from canReduceVMulWidth Removing this code doesn't affect any lit tests so it doesn't appear to be tested anymore. I assume it was when it was added, but I guess something else changed? Code coverage report also says it's unused. I mostly didn't like that it seemed to count the sign bits as if it was a sign_extend, but then set isPositive as if it was a zero_extend. It feels like we should have picked one interpretation? Differential Revision: https://reviews.llvm.org/D54596 llvm-svn: 346995	2018-11-15 21:19:32 +00:00
Craig Topper	b144c7a6fb	[X86] Minor cleanup to getExtendInVec. NFCI Use unsigned to calculate the subvector index to avoid a cast. Remove an unnecessary condition and replace it with a stronger assert. Use the InVT variable we updated when we extracted instead of grabbing it from the In SDValue. llvm-svn: 346983	2018-11-15 19:20:22 +00:00
Craig Topper	73bb04ab6f	[X86] Add -x86-experimental-vector-widening support to reduceVMULWidth and combineMulToPMADDWD In reduceVMULWidth, we no longer need to worry about extending the vector to 128 bits first. Regular widening of extends, muls and shuffles will take care of that for us. In combineMulToPMADDWD, we can handle v2i32 multiplies and allow the VPMADDWD to be widened to v4i32 during type legalization by adding custom widening like we already have for AVG/ADDUS/SUBUS. I had to modify that code a little to allow different input and output VTs. Differential Revision: https://reviews.llvm.org/D54512 llvm-svn: 346980	2018-11-15 18:59:31 +00:00
Thomas Lively	fc3163b67a	[WebAssembly] Fix return type of nextByte Summary: The old return type did not allow for correct error reporting and was causing a compiler warning. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54586 llvm-svn: 346979	2018-11-15 18:56:49 +00:00
Simon Pilgrim	0db8cb0147	[X86] Fix MCNullStreamer support for modules with a CodeView flag This fixes -filetype=null support when compiling for a Win32 target and the module has a CodeView flag. The only places changed are the uses of getTargetStreamer function - this patch guards both of them with null checks. Committed on behalf of @eush (Eugene Sharygin) Differential Revision: https://reviews.llvm.org/D54008 llvm-svn: 346962	2018-11-15 15:17:15 +00:00
Alex Bradbury	f809d89980	[RISCV] Mark C.EBREAK instruction as having side effects C.EBREAK was defined with hasSideEffects = 0, which is incorrect and inconsistent with the non-compressed instruction form. This patch corrects this oversight. This wouldn't cause codegen issues, as compressed instructions are only ever generated by converting the non-compressed form as an MCInst. But having correct flags is still worthwhile. Differential Revision: https://reviews.llvm.org/D54256 Patch by Luís Marques. llvm-svn: 346959	2018-11-15 14:52:24 +00:00
Alex Bradbury	7727240438	[RISCV] Mark FREM as Expand Mark the FREM SelectionDAG node as Expand, which is necessary in order to support the frem IR instruction on RISC-V. This is expanded into a library call. Adds the corresponding test. Previously, this would have triggered an assertion at instruction selection time. Differential Revision: https://reviews.llvm.org/D54159 Patch by Luís Marques. llvm-svn: 346958	2018-11-15 14:46:11 +00:00
Anton Korobeynikov	f0001f4186	Add missed files from prev. commit llvm-svn: 346949	2018-11-15 12:35:04 +00:00
Anton Korobeynikov	49045c6a0d	[MSP430] Add MC layer Reapply r346374 with the fixes for modules build. Original summary: This change implements assembler parser, code emitter, ELF object writer and disassembler for the MSP430 ISA. Also, more instruction forms are added to the target description. Patch by Michael Skvortsov! llvm-svn: 346948	2018-11-15 12:29:43 +00:00
Alex Bradbury	22c091fc3c	[RISCV] Introduce the RISCVMatInt::generateInstSeq helper Logic to load 32-bit and 64-bit immediates is currently present in RISCVAsmParser::emitLoadImm in order to support the li pseudoinstruction. With the introduction of RV64 codegen, there is a greater benefit of sharing immediate materialisation logic between the MC layer and codegen. The generateInstSeq helper allows this by producing a vector of simple structs representing the chosen instructions. This can then be consumed in the MC layer to produce MCInsts or at instruction selection time to produce appropriate SelectionDAG node. Sharing this logic means that both the li pseudoinstruction and codegen can benefit from future optimisations, and that this logic can be used for materialising constants during RV64 codegen. This patch does contain a behaviour change: addi will now be produced on RV64 when no lui is necessary to materialise the constant. In that case addiw takes x0 as the source register, so is semantically identical to addi. Differential Revision: https://reviews.llvm.org/D52961 llvm-svn: 346937	2018-11-15 10:11:31 +00:00
Craig Topper	553ac560aa	[X86] Add some custom type legalization rules for truncate with -x86-experimental-vector-widening-legalization. This avoids some nasty shuffles when we have avx512. It will also prevent using zmm truncate instructions when a ymm instruction that zeroes part of an xmm register will do. Also avoid using avx512 truncate instructions when the input is 128 bits or less. These instructions are 2 uops on skx so we can probably find a better single uop shuffle like pshufb. llvm-svn: 346936	2018-11-15 08:23:40 +00:00
Thomas Lively	77b33c86f5	[WebAssembly] Renumber SIMD bitwise instructions Summary: Changed to match https://github.com/WebAssembly/simd/pull/54. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54561 llvm-svn: 346931	2018-11-15 03:38:59 +00:00
Konstantin Zhuravlyov	a25e0524c0	AMDGPU: Enable code object v3 for AMDHSA only Differential Revision: https://reviews.llvm.org/D54186 llvm-svn: 346923	2018-11-15 02:32:43 +00:00
Craig Topper	ea6ced9d1a	[X86] Don't mark SEXTLOADS with narrow types as Custom with -x86-experimental-vector-widening-legalization. The narrow types end up requesting widening, but generic legalization will end up scalarizing and using a build_vector to do the widening. llvm-svn: 346916	2018-11-15 00:21:41 +00:00
Benjamin Kramer	6b7d6fe079	[X86] Remove unused variable llvm-svn: 346909	2018-11-14 23:13:27 +00:00
Craig Topper	0b2089da4b	[X86] Support v2i32/v4i16/v8i8 load/store using f64 on 32-bit targets under -x86-experimental-vector-widening-legalization. On 64-bit targets the type legalizer will use i64 to legalize these. But when i64 isn't legal, the type legalizer won't try an FP type. So do it manually instead. There are a few regressions in here due to some v2i32 operations like mul and div now being reassembled into a full vector just to store instead of storing the pieces. But this was already occurring in 64-bit mode so it's not a new issue. llvm-svn: 346908	2018-11-14 23:02:09 +00:00
Jessica Paquette	27e1754fc9	[MachineOutliner][NFC] Don't compute liveness if X16/X17/NZCV are unused Using the MBB flags, we can tell if X16/X17/NZCV are unused in a block, and also not live out. If this holds for all MBBs, then we can avoid checking for liveness on that candidate. Furthermore, if it holds for an individual candidate's MBB, then we can avoid checking for liveness on that candidate. llvm-svn: 346901	2018-11-14 22:23:38 +00:00
Nirav Dave	1241dcb3cf	Bias physical register immediate assignments The machine scheduler currently biases register copies to/from physical registers to be closer to their point of use / def to minimize their live ranges. This change also extends this to physical register assignments from immediate values. This causes a reduction in overall register pressure and a minor reduction in spills, and indirectly fixes an out-of-registers assertion (PR39391). Most test changes are from minor instruction reorderings and register name selection changes and direct consequences of that. Reviewers: MatzeB, qcolombet, myatsina, pcc Subscribers: nemanjai, jvesely, nhaehnle, eraman, hiraditya, javed.absar, arphaman, jfb, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D54218 llvm-svn: 346894	2018-11-14 21:11:53 +00:00
Aakanksha Patil	1a60116b5c	AMDGPU: Additional pattern for i16 median3 matching min(max(a, b), max(min(a, b), c)) Differential Revision: https://reviews.llvm.org/D54494 llvm-svn: 346886	2018-11-14 20:10:41 +00:00
Craig Topper	6c94264b1f	[X86] Allow pmulh to be formed from narrow vXi16 vectors under -x86-experimental-vector-widening-legalization Narrower vectors will be widened to 128 bits without changing the element size. And generic type legalization can already handle widening mulhu/mulhs. Differential Revision: https://reviews.llvm.org/D54513 llvm-svn: 346879	2018-11-14 18:16:21 +00:00
Simon Pilgrim	cdb170794b	[CostModel] Add generic expansion funnel shift cost support Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost. This also required us to move the X86 fshl/fshr costs to the same place as the rotates to avoid expansion and get correct scalarization vs vectorization costs. llvm-svn: 346854	2018-11-14 12:24:50 +00:00
Simon Pilgrim	7501780ec6	[X86][AVX512] Remove constant pool shuffle decoding from SelectionDAG This patch removes the last use of the constant pool shuffle decode helper and consistently uses the 'getTargetShuffleMaskIndices' versions instead. The constant pool versions are now purely used for assembly comments. The avx512vbmi intrinsic upgrades had to be altered as they were being decoded as broadcasts, similar to what I fixed in rL346032. I don't think the change is critical, although it's annoying that we lose the {k}{z} instruction test coverage as they are tricky to generate. Differential Revision: https://reviews.llvm.org/D54083 llvm-svn: 346850	2018-11-14 11:26:35 +00:00
Heejin Ahn	da419bdb5e	[WebAssembly] Add support for the event section Summary: This adds support for the 'event section' specified in the exception handling proposal. (This was named 'exception section' first, but later renamed to 'event section' to take possibilities of other kinds of events into consideration. But currently we only store exception info in this section.) The event section is added between the global section and the export section. This is for ease of validation per request of the V8 team. This patch: - Creates the event symbol type, which is a weak symbol - Makes 'throw' instruction take the event symbol '__cpp_exception' - Adds relocation support for events - Adds WasmObjectWriter / WasmObjectFile (Reader) support - Adds obj2yaml / yaml2obj support - Adds '.eventtype' printing support Reviewers: dschuff, sbc100, aardappel Subscribers: jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54096 llvm-svn: 346825	2018-11-14 02:46:21 +00:00
Zi Xuan Wu	6a3c279d1c	[PowerPC] Enhance the selection(ISD::VSELECT) of vector type Make ISD::VSELECT legal as long as AltiVec instructions are available; otherwise its default behavior is Expand, which is handled during the type-legalization phase. Use xxsel to match vselect if VSX is enabled; otherwise use vsel. Differential Revision: https://reviews.llvm.org/D49531 llvm-svn: 346824	2018-11-14 02:34:45 +00:00
Jessica Paquette	4e97ec94d9	[MachineOutliner][NFC] Use flags set in all candidates to check for calls If we keep track of if the ContainsCalls bit is set in the MBB flags for each candidate, then we have a better chance of not checking the candidate for calls at all. This saves quite a few checks in some CTMark tests (~200 in Bullet, for example.) llvm-svn: 346816	2018-11-13 23:41:31 +00:00
Jessica Paquette	cad864d49e	[MachineOutliner][NFC] Use MBB flags to avoid call checks in getOutliningInfo We already determine a bunch of information about an MBB in getMachineOutlinerMBBFlags. We can reuse that information to avoid calculating things that must be false/true. The first thing we can easily check is if an outlined sequence could ever contain calls. There's no reason to walk over the outlined range, checking for calls, if we already know that there are no calls in the block containing the sequence. llvm-svn: 346809	2018-11-13 23:01:34 +00:00
Jessica Paquette	b2d53c5d7d	[MachineOutliner][NFC] Exit getOutliningType if there are < 2 candidates Since we never outline anything with fewer than 2 occurrences, there's no reason to compute cost model information if there's less than that. llvm-svn: 346803	2018-11-13 22:16:27 +00:00
Stanislav Mekhanoshin	bcb34ac2ea	[AMDGPU] combine extractelement into several selects An extractelement with non-constant index will be lowered either to scratch or movrel loop in most cases. This patch converts such instruction into a set of selects if vector size is not too big. Differential Revision: https://reviews.llvm.org/D54351 llvm-svn: 346800	2018-11-13 21:18:21 +00:00
Craig Topper	aca8390216	[SelectionDAG][X86] Relax restriction on the width of an input to _EXTEND_VECTOR_INREG. Use them and regular _EXTEND to replace the X86 specific VSEXT/VZEXT opcodes Previously, the extend_vector_inreg opcode required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output. This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes. X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code. I believe DAG combine works correctly with the released restriction because it doesn't check the number of input elements. The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits. Differential Revision: https://reviews.llvm.org/D54346 llvm-svn: 346784	2018-11-13 19:45:21 +00:00
Sam Clegg	f98ba05f3d	[WebAssembly] Fix broken assumption that all bitcasts are to functions types Specifically, we can bitcast to void. Fixes PR39591 Differential Revision: https://reviews.llvm.org/D54447 llvm-svn: 346778	2018-11-13 19:14:02 +00:00
Simon Pilgrim	e827fe09b3	[CostModel][X86] Fix constant vector XOP right shifts We'll constant fold these cases so they are as cheap as vector left shift cases. Noticed while improving funnel shift costs. llvm-svn: 346760	2018-11-13 16:40:10 +00:00
Simon Pilgrim	72a7fbc1a3	Fix comment for XOP rotates. NFCI. llvm-svn: 346753	2018-11-13 12:09:27 +00:00
Alexander Richardson	4eb93907f7	Fix modules build of AVRAsmParser.cpp Summary: Without this change I get the following error: lib/Target/AVR/AVRGenAsmMatcher.inc:1135:1: error: redundant #include of module 'LLVM_Utils.Support.Format' appears within namespace 'llvm' [-Wmodules-import-nested-redundant] Reviewers: dylanmckay Reviewed By: dylanmckay Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53425 llvm-svn: 346750	2018-11-13 10:54:44 +00:00
Jonas Paulsson	f9b2b5e67e	[SystemZ] Increase the number of VLREPs If a loaded value is replicated it is best to combine these two operations into a VLREP (load and replicate), but isel will not produce this if the load has other users as well. This patch handles this by putting the other users of the load to use the REPLICATE 0-element instead of the load. This way the load has only the REPLICATE node as user, and we get a VLREP. Review: Ulrich Weigand https://reviews.llvm.org/D54264 llvm-svn: 346746	2018-11-13 08:37:09 +00:00
Jessica Paquette	106946329d	[MachineOutliner][NFC] Simplify isMBBSafeToOutlineFrom check in AArch64 outliner Turns out it's way simpler to do this check with one LRU. Instead of maintaining two, just keep one. Check if each of the registers is available, and then check if it's a live out from the block. If it's a live out, but available in the block, we know we're in an unsafe case. llvm-svn: 346721	2018-11-13 00:32:09 +00:00
Jessica Paquette	82d9c0a3fa	[MachineOutliner][NFC] Change getMachineOutlinerMBBFlags to isMBBSafeToOutlineFrom Instead of returning Flags, return true if the MBB is safe to outline from. This lets us check for unsafe situations, like say, in AArch64, X17 is live across a MBB without being defined in that MBB. In that case, there's no point in performing an instruction mapping. llvm-svn: 346718	2018-11-12 23:51:32 +00:00
Simon Pilgrim	e565e5a962	[X86][SSE] Add lowerVectorShuffleAsByteRotateAndPermute (PR39387) This patch adds the ability to use a PALIGNR to rotate a pair of inputs to select a range containing all the referenced elements, followed by a single input permute to put them in the right location. Differential Revision: https://reviews.llvm.org/D54267 llvm-svn: 346706	2018-11-12 21:12:38 +00:00
Aakanksha Patil	a992c694c6	AMDGPU: Adding more median3 patterns min(max(a, b), max(min(a, b), c)) -> med3 a, b, c Differential Revision: https://reviews.llvm.org/D54331 llvm-svn: 346704	2018-11-12 21:04:06 +00:00
Wouter van Oortmerssen	cc75e77df5	[WebAssembly] Added WasmAsmParser. Summary: This is to replace the ELFAsmParser that WebAssembly was using, which so far was a stub that didn't do anything, and couldn't work correctly with wasm. This new class is there to implement generic directives related to wasm as a binary format. Wasm target specific directives are still parsed in WebAssemblyAsmParser as before. The two classes now cooperate more correctly too. Also implemented .result which was missing. Any unknown directives will now result in errors. Reviewers: dschuff, sbc100 Subscribers: mgorny, jgravelle-google, eraman, aheejin, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54360 llvm-svn: 346700	2018-11-12 20:15:01 +00:00
Craig Topper	c48712b341	[X86] In LowerMULH, use generic truncate and vector shuffle nodes instead of directly emitting PACKUS. Truncate and shuffle lowering are already capable of matching to PACKUS using known bits analysis. This features one test change where we now prefer to extend v16i16->v16i32 then trunc v16i32->v16i8 over extract_subvector+packus when avx512f is available, but avx512bw is not. llvm-svn: 346697	2018-11-12 19:37:29 +00:00
Stanislav Mekhanoshin	e86c8d33b1	[AMDGPU] Optimize S_CBRANCH_VCC[N]Z -> S_CBRANCH_EXEC[N]Z Sometimes after basic block placement we end up with a code like: sreg = s_mov_b64 -1 vcc = s_and_b64 exec, sreg s_cbranch_vccz This happens as a join of a block assigning -1 to a saved mask and another block which consumes that saved mask with s_and_b64 and a branch. This is essentially a single s_cbranch_execz instruction when moved into a single new basic block. Differential Revision: https://reviews.llvm.org/D54164 llvm-svn: 346690	2018-11-12 18:48:17 +00:00
Simon Pilgrim	93c64e5c76	[CostModel][X86] Add funnel shift rotation special case costs When we repeat the 2 shifting operands then this is a bit rotation - annoyingly this has to be done in the other getIntrinsicInstrCost than most intrinsics as we need to check the operands are the same. llvm-svn: 346688	2018-11-12 18:27:54 +00:00
Simon Pilgrim	49e93d2f0e	[CostModel][X86] Add SHLD/SHRD scalar funnel shift costs The costs match the typical reg-reg cases - the RMW case can be a lot slower but we don't model that at this level llvm-svn: 346683	2018-11-12 17:56:59 +00:00
Simon Pilgrim	f4cd292ba2	[CostModel][X86] SK_ExtractSubvector is cheap if the (legal) subvector is aligned within the source vector llvm-svn: 346664	2018-11-12 15:48:06 +00:00
Jonas Paulsson	5cea85dd59	[SystemZ::TTI] Improve accuracy of costs for vector fp <-> int conversions Improve getCastInstrCost() by respecting the different types of Src and Dst for vector integer <-> fp conversions. This means that extracting from integer becomes more expensive (by the extraction penalty), and the extraction from fp becomes cheaper (no longer has a false extraction penalty). Review: Ulrich Weigand https://reviews.llvm.org/D54423 llvm-svn: 346663	2018-11-12 15:32:27 +00:00
Alex Bradbury	9c03e4cacd	[RISCV] Support .option relax and .option norelax This extends the .option support from D45864 to enable/disable the relax feature flag from D44886 During parsing of the relax/norelax directives, the RISCV::FeatureRelax feature bits of the SubtargetInfo stored in the AsmParser are updated appropriately to reflect whether relaxation is currently enabled in the parser. When an instruction is parsed, the parser checks if relaxation is currently enabled and if so, gets a handle to the AsmBackend and sets the ForceRelocs flag. The AsmBackend uses a combination of the original RISCV::FeatureRelax feature bits set by e.g -mattr=+/-relax and the ForceRelocs flag to determine whether to emit relocations for symbol and branch diffs. Diff relocations should therefore only not be emitted if the relax flag was not set on the command line and no instruction was ever parsed in a section with relaxation enabled to ensure correct diffs are emitted. Differential Revision: https://reviews.llvm.org/D46423 Patch by Lewis Revill. llvm-svn: 346655	2018-11-12 14:25:07 +00:00
Jonas Paulsson	c0ee028dc3	[SystemZ] Replicate the load with most uses in buildVector() Iterate over all elements and count the number of uses among them for each used load. Then make sure to REPLICATE the load which has the most uses in order to minimize the number of needed element insertions. Review: Ulrich Weigand https://reviews.llvm.org/D54322 llvm-svn: 346637	2018-11-12 08:12:20 +00:00
Craig Topper	2eab39f77b	[X86] Use DAG.getConstant instead of getZeroVector. llvm-svn: 346605	2018-11-11 07:24:36 +00:00
Craig Topper	ef33a190bc	[X86] Replace calls to getOnesVector/getZeroVector with getConstant. getConstant will create a BUILD_VECTOR for us and use a legal type if necessary. So just create the simple node and let BUILD_VECTOR legalization do the canonicalization. llvm-svn: 346603	2018-11-11 01:40:04 +00:00
Sanjay Patel	0a515595a7	[x86] allow vector load narrowing with multi-use values This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins. I've structured this to be no-functional-change-intended for any target except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs. The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided. Differential Revision: https://reviews.llvm.org/D54073 llvm-svn: 346595	2018-11-10 20:05:31 +00:00
Benjamin Kramer	37c691e867	[X86] Remove unused variable llvm-svn: 346592	2018-11-10 18:11:11 +00:00
Craig Topper	7956a256e9	[X86] Remove apparently unneeded code from combineVSZext. No lit tests fail with this code removed. This is a pre-commit for D54346. llvm-svn: 346590	2018-11-10 17:44:28 +00:00
Simon Pilgrim	d3ca710ec9	[CostModel][X86] SK_ExtractSubvector costs must only be tested for vector types (PR39615) llvm-svn: 346589	2018-11-10 17:37:52 +00:00
Roman Lebedev	b428b8b214	[X86][BdVer2] Fix loads/stores throughput for Piledriver (PR39465) There are two AGU units, and per 1cy, there can be either two loads, or a load and a store; but not two stores, or two loads and a store. Additionally, loads shouldn't affect the store scheduler and vice versa. (but should affect the PdEX scheduler.) Required rL346545. Fixes https://bugs.llvm.org/show_bug.cgi?id=39465 llvm-svn: 346587	2018-11-10 14:31:43 +00:00
Craig Topper	a1b6667c6a	[X86] Use a MOVSX instruction instead of a MOVZX instruction in isel for an any_extend of the remainder from an 8-bit sdivrem. The sdivrem will emit its own MOVSX to move %ah to the low byte of a register. By using a MOVSX for an any_extend this allows a post-isel peephole to merge them. llvm-svn: 346581	2018-11-10 06:04:33 +00:00
Craig Topper	0364085281	[X86] In LowerHorizontalByteSum, emit vector_shuffle nodes instead of directly using X86ISD::UNPCKL/X86ISD::UNPCKH. This gives shuffle lowering the freedom to use zero_extend_vector_inreg for the unpckl shuffle. Shuffle combining usually makes this swap later, but not when AVX512 is enabled, it seems. While there, also use DAG.getConstant to create a 0 vector instead of using the helper that forces a specific BUILD_VECTOR. I don't think that helper is usually needed. We're basically free to create a constant build_vector anytime and it will be legalized on its own. llvm-svn: 346574	2018-11-10 00:26:42 +00:00
Thomas Lively	936734b777	[WebAssembly] Update bleeding-edge cpu features Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D54362 llvm-svn: 346570	2018-11-10 00:11:14 +00:00
Eli Friedman	ad1151cf6a	[ARM64] [Windows] Handle funclets This patch adds support for funclets in frame lowering and ISel lowering. Together with D50288 and D50166, it enables C++ exception handling. Patch by Sanjin Sijaric, with some fixes by me. Differential Revision: https://reviews.llvm.org/D51524 llvm-svn: 346568	2018-11-09 23:33:30 +00:00
Eli Friedman	0bbb0d0720	[ARM] Add MemOperand to LDRcp to enable DCE. LDRcp should be deleted when the dest register is dead in register coalescing. Without MemOp, dead LDRcp will cause dead constant pool value which references to non-existing label. Patch by Yin Ma. Differential Revision: https://reviews.llvm.org/D54173 llvm-svn: 346563	2018-11-09 23:09:17 +00:00
Craig Topper	17d64c71c5	[X86] Move the promotion of v16i16->v16i8 for avx512f but not avx512bw from lowering to isel. Change to use vpmovzx instead of vpmovsx. With avx512f but not avx512bw we need to extend to v16i32 then truncate that to v16i8. Previously we emitted both nodes during lowering, but I'm trying to switch to using target independent nodes and with that switched the extend+truncate wou This patch changes the implementation to what will be necessary with that patch which helps minimize test diffs. llvm-svn: 346552	2018-11-09 20:09:53 +00:00
Bryan Chan	123553921f	[AArch64] Support HiSilicon's TSV110 processor Reviewers: t.p.northover, SjoerdMeijer, kristof.beyls Reviewed By: kristof.beyls Subscribers: olista01, javed.absar, kristof.beyls, kristina, llvm-commits Differential Revision: https://reviews.llvm.org/D53908 llvm-svn: 346546	2018-11-09 19:32:08 +00:00
Fangrui Song	60b7fb46e1	[Hexagon] Fix some -Wunused-function with LLVM_DUMP_METHOD and -Wunused-variable llvm-svn: 346543	2018-11-09 19:24:48 +00:00
Craig Topper	731ea7dbc1	[X86] Turn X86ISD::VSEXT into X86ISD::VZEXT if the upper bits aren't demanded. This makes X86ISD::VSEXT more similar to ISD::SIGN_EXTEND and ISD::ZERO_EXTEND. I'm hoping to replace X86ISD::VSEXT/VZEXT with target independent nodes. Making the target specific nodes similar to the target independent nodes helps minimize test diffs in that patch. llvm-svn: 346539	2018-11-09 19:05:51 +00:00
Simon Pilgrim	fc8f1d7da7	[CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector llvm-svn: 346538	2018-11-09 19:04:27 +00:00
Jordan Rupprecht	c1741a5a8a	[Hexagon] Fix unused variable warning in release builds llvm-svn: 346537	2018-11-09 18:54:27 +00:00
Fangrui Song	4955066366	[WebAssembly] Hotfix of WebAssemblyInstructionTableSize after rL346465 llvm-svn: 346535	2018-11-09 18:32:20 +00:00
Brendon Cahoon	ac8fed68d5	[Hexagon] Implement noreturn optimization Eliminate the stack frame in functions with the noreturn nounwind attributes, and when the noreturn-stack-elim target feature is enabled. This reduces the code and stack space needed for noreturn functions. Differential Revision: https://reviews.llvm.org/D54210 llvm-svn: 346532	2018-11-09 18:16:24 +00:00
Stanislav Mekhanoshin	13d3371e68	[AMDGPU] Always pass TRI into findRegister[Use/Def]OperandIdx This only covers AMDGPU BE, hopefully all occurrences. Differential Revision: https://reviews.llvm.org/D54235 llvm-svn: 346528	2018-11-09 17:58:59 +00:00
Krzysztof Parzyszek	8567de0871	[Hexagon] Place globals with explicit .sdata section in small data Both -fPIC and -G0 disable placement of globals in small data section, but if a global has an explicit section assignment placing it in small data, it should go there anyway. llvm-svn: 346523	2018-11-09 17:31:22 +00:00
Zaara Syeda	5c179bf14b	[Power9] Allow gpr callee saved spills in prologue to vectors registers Currently in llvm, CalleeSavedInfo can only assign a callee saved register to stack frame index to be spilled in the prologue. We would like to enable spilling gprs to vector registers. This patch adds the capability to spill to other registers aside from just the stack. It also adds the changes for power9 to spill gprs to volatile vector registers when they are available. This happens only for leaf functions when using the option -ppc-enable-pe-vector-spills. Differential Revision: https://reviews.llvm.org/D39386 llvm-svn: 346512	2018-11-09 16:36:24 +00:00
Alexey Bataev	93d018a916	Revert "[DEBUGINFO, NVPTX]DO not emit ',debug' option if no debug info or only debug directives are requested." This reverts commit r345972. Need to update the description + possibly to update the patch itself after discussion with Eric Christopher. llvm-svn: 346508	2018-11-09 16:22:35 +00:00
Jonas Paulsson	458b7c0b39	[SystemZ] Avoid inserting same value after replication A minor improvement of buildVector() that skips creating an INSERT_VECTOR_ELT for a Value which has already been used for the REPLICATE. Review: Ulrich Weigand https://reviews.llvm.org/D54315 llvm-svn: 346504	2018-11-09 15:44:28 +00:00
Sam Parker	2804f32ec4	[ARM] Don't promote i1 types in ARM CGP Now that we have mixed type sizes, i1 values need to be explicitly handled as we want to avoid promoting these values. Differential Revision: https://reviews.llvm.org/D54308 llvm-svn: 346499	2018-11-09 15:06:33 +00:00
Sanjay Patel	fa1c0fe478	[x86] try to form broadcast before widening shuffle elements I noticed that we weren't generating broadcasts as much as I thought we would with D54271, and this is part of the problem. Widening the shuffle elements means adding bitcasts and hiding the relationship between a splatted scalar and the vector. If we can form a broadcast, do that before going through the rest of the shuffle lowering because broadcasts should be cheap and can often be load-folded. Differential Revision: https://reviews.llvm.org/D54280 llvm-svn: 346498	2018-11-09 14:54:58 +00:00
Alex Bradbury	1cc2d0b9fb	[RISCV] Avoid unnecessary XOR for seteq/setne 0 Differential Revision: https://reviews.llvm.org/D53492 Patch by James Clarke. llvm-svn: 346497	2018-11-09 14:47:36 +00:00
Petar Avramovic	2cefaa2747	[MIPS GlobalISel] narrowScalar G_CONSTANT Legalize s64 G_CONSTANT using narrowScalar on MIPS 32. Differential Revision: https://reviews.llvm.org/D54255 llvm-svn: 346495	2018-11-09 14:21:16 +00:00
Simon Pilgrim	ea51f98b9b	[X86] Add Subtarget to more lowerVectorShuffle functions. NFCI. This will be necessary for an update to D54267 llvm-svn: 346490	2018-11-09 13:19:03 +00:00
Clement Courbet	eee2e06e2a	[llvm-exegesis][NFC] Add a way to declare the default counter binding for unbound CPUs for a target. Summary: This simplifies the code and moves everything to tablegen for consistency. This also prepares the ground for adding issue counters. Reviewers: gchatelet, john.brawn, jsji Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54297 llvm-svn: 346489	2018-11-09 13:15:32 +00:00
Clement Courbet	e6b727e552	[X86] Fix VZEROUPPER scheduling info on SNB,HSW,BDW,SXL,SKX. Summary: Starting from SNB, VZEROUPPER is handled by the renamer and uses no proc resources. After HSW, it also has zero latency. This fixes PR35606. To reproduce: Uops: llvm-exegesis -mode=uops -opcode-name=VZEROUPPER Latency: echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper' | /tmp/llvm-exegesis -mode=latency -snippets-file=- echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper\naddps %xmm0, %xmm1' | /tmp/llvm-exegesis -mode=latency -snippets-file=- Reviewers: RKSimon, craig.topper, andreadb Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D54107 llvm-svn: 346482	2018-11-09 09:49:06 +00:00
Sam Parker	08979cd125	[ARM] Enable mixed types in ARM CGP Previously, during the search, all values had to have the same 'TypeSize', which is equal to number of bits of the integer type of the icmp operand. All values in the tree had to match this size; meaning that, if we searched from i16, we wouldn't accept i8s. A change in type size requires zext and truncs to perform the casts so, to allow mixed narrow types, the handling of these instructions is now slightly different: - we allow casts if their result or operand is <= TypeSize. - zexts are sinks if their result > TypeSize. - truncs are still sinks if their operand == TypeSize. - truncs are still sources if their result == TypeSize. The transformation bails on finding an icmp that operates on data smaller than the current TypeSize. Differential Revision: https://reviews.llvm.org/D54108 llvm-svn: 346480	2018-11-09 09:28:27 +00:00
Sam Parker	453ba916a0	[ARM] Small reorganisation in ARMParallelDSP A few code movement things: - AreSymmetrical is now a method of BinOpChain. - Created a lambda in CreateParallelMACPairs to reduce loop nesting. - A Reduction object now gets passed in a couple of places instead, including CreateParallelMACPairs so it doesn't need to return a value. I've also added RecordSequentialLoads, which is run before the transformation begins, and caches the interesting loads. This can then be queried later instead of cross checking many load values. Differential Revision: https://reviews.llvm.org/D54254 llvm-svn: 346479	2018-11-09 09:18:00 +00:00
Mandeep Singh Grang	397765bc51	[COFF, ARM64] Add support for MSVC buffer security check Reviewers: rnk, mstorsjo, compnerd, efriedma, TomTan Reviewed By: rnk Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D54248 llvm-svn: 346469	2018-11-09 02:48:36 +00:00
Thomas Lively	2faf079494	[WebAssembly] Read prefixed opcodes as ULEB128s Summary: Depends on D54126. Reviewers: aheejin, dschuff, aardappel Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54138 llvm-svn: 346465	2018-11-09 01:57:00 +00:00
Thomas Lively	4ddd22581e	[WebAssembly][NFC] Reorder SIMD section Summary: Reorders the sections in the SIMD tablegen file to roughly match the new opcode ordering. Depends on D54126. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54134 llvm-svn: 346464	2018-11-09 01:49:19 +00:00
Thomas Lively	299d214aba	[WebAssembly] Renumber and LEB128-encode SIMD opcodes Reviewers: aheejin, dschuff, aardappel Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54126 llvm-svn: 346463	2018-11-09 01:45:56 +00:00
Thomas Lively	38c902bc2e	[WebAssembly] Lower select for vectors Summary: Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53675 llvm-svn: 346462	2018-11-09 01:38:44 +00:00
Heejin Ahn	0c68a875fa	[WebAssembly] Fix LowerEmscriptenEHSjLj when there's only longjmp Summary: The pass incorrectly assumed if there's a longjmp declaration in the module, there is also a setjmp function declaration. Fixed it, and now the pass only converts longjmp and does not do any other transformation when there's no setjmp declaration in the module. Fixes PR39562. Reviewers: jgravelle-google, sbc100 Subscribers: dschuff, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54273 llvm-svn: 346445	2018-11-08 22:56:26 +00:00
Sanjay Patel	b5535dc7b3	[x86] use shuffles for scalar insertion into high elements of a constant vector As discussed in D54073, we have a potential regression from more aggressive vector narrowing here, so let's try to avoid that by changing build-vector lowering slightly. Insert-vector-element lowering always does this since there's no "pinsr" for ymm/zmm: // If the vector is wider than 128 bits, extract the 128-bit subvector, insert // into that, and then insert the subvector back into the result. ...but we can sometimes do better for insert-into-constant-vector by using shuffle lowering. Differential Revision: https://reviews.llvm.org/D54271 llvm-svn: 346433	2018-11-08 19:16:27 +00:00
Davide Italiano	ac8279ab8b	Revert "[MSP430] Add MC layer" This commit broke the module buildbots. Error: lib/Target/MSP430/MSP430GenAsmMatcher.inc:1027:1: error: redundant namespace 'llvm' [-Wmodules-import-nested-redundant] ^ llvm-svn: 346410	2018-11-08 16:21:29 +00:00
Jonas Paulsson	1993894c03	[SystemZ] Bugfix in shouldCoalesce() It was discovered in randomized testing that the SystemZ implementation of shouldCoalesce() could be caused to crash when subreg liveness was enabled. This was because an undef use of the virtual register was copied outside current MBB at the point of shouldCoalesce() being called. For more details, see https://bugs.llvm.org/show_bug.cgi?id=39276. This patch changes the check for MBB locality from livein/liveout checks to do checks for all instructions of both intervals being inside MBB. This avoids the cases with dead defs / undef uses outside MBB, which are not affecting liveness in/out of MBB. The original test case included as a reduced .mir test case. Review: Ulrich Weigand https://reviews.llvm.org/D54197 llvm-svn: 346406	2018-11-08 15:29:48 +00:00
Petr Pavlu	7c84b2e3ab	[ARM] Enable spilling of the hGPR register class in Thumb2 Generalize code in Thumb2InstrInfo::storeRegToStackSlot() and loadRegToStackSlot() to allow the GPR class or any of its sub-classes (including hGPR) to be stored/loaded by ARM::t2STRi12/ARM::t2LDRi12. Differential Revision: https://reviews.llvm.org/D51927 llvm-svn: 346401	2018-11-08 13:02:10 +00:00
Anton Korobeynikov	5eb3d339d3	[MSP430] Fix encodeInstruction() for big endian hosts Reviewers: asl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54251 llvm-svn: 346391	2018-11-08 10:17:52 +00:00
Thomas Lively	897171902b	[WebAssembly] Add V128 to WebAssemblyInstrInfo::copyPhysReg Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53872 llvm-svn: 346384	2018-11-08 02:35:28 +00:00
Stanislav Mekhanoshin	6cc8b2fc65	[AMDGPU] Extend promote alloca vectorization Promote alloca can vectorize a small array by bitcasting it to a vector type. Extend vectorization for the case when alloca is already a vector type. We still want to replace GEPs with an insert/extract element instructions in this case. Differential Revision: https://reviews.llvm.org/D54219 llvm-svn: 346376	2018-11-08 00:16:23 +00:00
Anton Korobeynikov	09dff53840	[MSP430] Add MC layer Summary: This change implements assembler parser, code emitter, ELF object writer and disassembler for the MSP430 ISA. Also, more instruction forms are added to the target description. Reviewers: asl Reviewed By: asl Subscribers: pftbest, krisb, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D53661 llvm-svn: 346374	2018-11-08 00:03:45 +00:00
Eli Friedman	0917d0c80c	[AArch64] [Windows] Address post-commit review comment on r346358. In this context, usesWindowsCFI() is basically the same thing as isOSWindows(), but it makes the relevant property of the target more explicit. llvm-svn: 346366	2018-11-07 22:30:56 +00:00
Nicolai Haehnle	bc233f5523	Revert "AMDGPU: Divergence-driven selection of scalar buffer load intrinsics" This reverts commit r344696 for now (except for some test additions). See https://bugs.freedesktop.org/show_bug.cgi?id=108611. llvm-svn: 346364	2018-11-07 21:53:43 +00:00
Nicolai Haehnle	61396ff67c	AMDGPU/InsertWaitcnts: Cleanup some old cruft (NFCI) Summary: Remove redundant logic and simplify control flow. Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D54086 llvm-svn: 346363	2018-11-07 21:53:36 +00:00
Nicolai Haehnle	0ab31c9c44	AMDGPU/InsertWaitcnts: Remove kill-related logic Summary: This is not needed, because we don't actually insert relevant branches for KILLs that late in the compilation flow. Besides, this was always checking for the wrong kill opcode anyway... Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D54085 llvm-svn: 346362	2018-11-07 21:53:29 +00:00
Konstantin Zhuravlyov	15e90e331c	AMDGPU/NFC: Split FLAT_Global_Atomic_Pseudo into RTN/NO_RTN multiclasses llvm-svn: 346361	2018-11-07 21:42:13 +00:00
Eli Friedman	d00fb2e0a8	[AArch64] [Windows] Trap after noreturn calls. Like the comment says, this isn't the most efficient fix in terms of codesize, but it works. Differential Revision: https://reviews.llvm.org/D54129 llvm-svn: 346358	2018-11-07 21:31:14 +00:00
Konstantin Zhuravlyov	7f1959ebb3	AMDGPU/NFC: Split MUBUF_Pseudo_Atomics into RTN/NO_RTN multiclasses llvm-svn: 346357	2018-11-07 21:21:32 +00:00
Eli Friedman	7d7d41debc	[ARM] Fix CPSR liveness in tMOVCCr_pseudo lowering. The lowering was missing live-ins in certain cases, like a sequence of multiple tMOVCCr_pseudo instructions. This would lead to a verifier failure, and on pre-v6 Thumb CPSR would be incorrectly clobbered. For reasons I don't completely understand, it's hard to get a sequence of multiple tMOVCCr_pseudo instructions; the issue only seems to show up with 64-bit comparisons where the result is zero-extended. I added some extra testcases in case that changes in the future. Probably some optimization opportunities here if anyone is interested. (@test_slt_not is the case that was getting miscompiled.) The code to check the liveness of CPSR was stolen from X86ISelLowering.cpp; maybe it could be refactored into common helper, but I have no idea where to put it. Differential Revision: https://reviews.llvm.org/D54192 llvm-svn: 346355	2018-11-07 21:08:13 +00:00
Matt Arsenault	8ba740a5a8	Allow subclassing ExternalAA This allows testing AMDGPU alias analysis like any other alias analysis pass. This fixes the existing test pointlessly running opt -O3 when it really just wants to run the one analysis. Before there was no way to test this using -aa-eval with opt, since the default constructed pass is run. The wrapper subclass allows the default constructor to pass the necessary callback. llvm-svn: 346353	2018-11-07 20:26:42 +00:00
Than McIntosh	5bcdea5118	[X86] improve split-stack machine BB placement Summary: The conditional branch created to support -fsplit-stack for X86 is left unbiased/unhinted, resulting in less than ideal block placement: the __morestack call block is kept on the main hot path. Bias the branch to ensure that the stack allocation block is treated as a "cold" block during machine basic block placement. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54123 llvm-svn: 346336	2018-11-07 17:41:57 +00:00
Sanjay Patel	de58e93666	fix typos aggressively; NFC llvm-svn: 346316	2018-11-07 14:35:36 +00:00
Andrea Di Biagio	4ae974e745	[X86][FixupLEA] Avoid checking target features for every single processed instruction. NFCI llvm-svn: 346309	2018-11-07 12:26:00 +00:00
Petar Avramovic	2624c8db68	[MIPS GlobalISel] Set operand order for G_MERGE and G_UNMERGE Set operands order for G_MERGE_VALUES and G_UNMERGE_VALUES so that least significant bits always go first, regardless of endianness. Differential Revision: https://reviews.llvm.org/D54098 llvm-svn: 346305	2018-11-07 11:45:43 +00:00
Evandro Menezes	f1a0d93b1d	[PATCH] [AArch64] Refactor helper functions (NFC) Refactor helper functions in AArch64InstrInfo to be static methods. llvm-svn: 346273	2018-11-06 22:17:14 +00:00
Yaxun Liu	73bf0af32f	AMDGPU: Add an option -disable-promote-alloca-to-lds Add this option for debugging and providing workaround. By default it is off so no behavior change in backend. Differential Revision: https://reviews.llvm.org/D54158 llvm-svn: 346267	2018-11-06 21:28:17 +00:00
Craig Topper	6428a2cd9a	[X86] Add custom promotion of v2i8/v2i16 fp_to_sint to avoid over promotion to v2i64 which would force scalarization. llvm-svn: 346259	2018-11-06 19:24:21 +00:00
Matthias Braun	c6613879ce	LivePhysRegs/IfConversion: Change some types from unsigned to MCPhysReg; NFC Change the type in a couple of lists and sets that only store physical registers from unsigned to MCPhysReg. The latter is only 16 bits and saves us a bit of memory. llvm-svn: 346254	2018-11-06 19:00:11 +00:00
Simon Atanasyan	bb36aea1d5	[mips] Support sigrie instruction The `sigrie` instruction signals a Reserved Instruction Exception. This patch adds support for assembling / disassembling the instruction. Differential Revision: http://reviews.llvm.org/D53861 llvm-svn: 346230	2018-11-06 14:37:24 +00:00
Clement Courbet	54a1184fff	[X86][NFC] Fix comment. llvm-svn: 346226	2018-11-06 13:48:56 +00:00
Matthias Braun	96d12513a1	AArch64: Cleanup CCMP code; NFC Cleanup CCMP pattern matching code in preparation for review/bugfix: - Rename `isConjunctionDisjunctionTree()` to `canEmitConjunction()` (it won't accept arbitrary disjunctions and is really about whether we can transform the subtree into a conjunction that we can emit). - Rename `emitConjunctionDisjunctionTree()` to `emitConjunction()` llvm-svn: 346203	2018-11-06 03:15:22 +00:00
Sam Clegg	5292d17ec8	Revert "[WebAssembly] Fixup `main` signature by default" This reverts rL345880. It caused some test failures on the webassembly waterfall. e.g. binaryen2.test_mainenv fails due the fact that `envp` ends up being undef rather than 0. Differential Revision: https://reviews.llvm.org/D54117 llvm-svn: 346187	2018-11-06 00:31:02 +00:00
Matthias Braun	7a75a91b5b	MachineFunction: Store more specific reference to LLVMTargetMachine; NFC MachineFunction can only be used in code using lib/CodeGen, hence we can keep a more specific reference to LLVMTargetMachine rather than just TargetMachine around. Do the same for references in ScheduleDAG and RegUsageInfoCollector. llvm-svn: 346183	2018-11-05 23:49:14 +00:00
Craig Topper	0b5f8169b0	[TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take an MVT instead of an EVT. NFC The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit. llvm-svn: 346180	2018-11-05 23:26:13 +00:00
Konstantin Zhuravlyov	108927b944	AMDGPU: Add sram-ecc feature Differential Revision: https://reviews.llvm.org/D53222 llvm-svn: 346177	2018-11-05 22:44:19 +00:00
Craig Topper	def82a81af	[X86] Don't turn any_extend from a mask register into a sign_extend during lowering. Add patterns to match any_extend during isel instead. SimplifyDemandedBits can turn a sign_extend back into an any_extend and trigger an infinite loop. So instead legalize it the same way as a sign_extend, but preserve the opcode. Then just pattern match it the same as sign_extend during isel. I don't have a reduced test case for such an infinite loop yet. llvm-svn: 346170	2018-11-05 22:08:17 +00:00
Zaara Syeda	7509880b54	[Power9] Add support for stxvw4x.be and stxvd2x.be intrinsics On Power9, we don't have patterns to select the following intrinsics: llvm.ppc.vsx.stxvw4x.be llvm.ppc.vsx.stxvd2x.be This patch adds support for these. Differential Revision: https://reviews.llvm.org/D53581 llvm-svn: 346148	2018-11-05 17:31:26 +00:00
Stefan Maksimovic	8d7c351799	[Mips] Supplement long branch pseudo instructions Expand on LONG_BRANCH_LUi and LONG_BRANCH_(D)ADDiu pseudo instructions by creating variants which support fewer operands/accept GPR64Opnds as their operand in order to appease the machine verifier pass. Differential Revision: https://reviews.llvm.org/D53977 llvm-svn: 346133	2018-11-05 14:37:41 +00:00
Neil Henning	233a02d0ed	[AMDGPU] Fix the new atomic optimizer in pixel shaders. The new atomic optimizer I previously added in D51969 did not work correctly when a pixel shader was using derivatives, and had helper lanes active. To fix this we add an llvm.amdgcn.ps.live call that guards a branch around the entire atomic operation - ensuring that all helper lanes are inactive within the wavefront when we compute our atomic results. I've added a test case that can cause derivatives, and exposes the problem. Differential Revision: https://reviews.llvm.org/D53930 llvm-svn: 346128	2018-11-05 12:04:48 +00:00
Sam Parker	fec793c98f	[ARM] Turn assert into condition in ARMCGP Turn the assert in PrepareConstants into a condition so that we can handle mul instructions with negative immediates. Differential Revision: https://reviews.llvm.org/D54094 llvm-svn: 346126	2018-11-05 11:26:04 +00:00
Sam Parker	fcd8adab30	[ARM][ARMCGP] Remove unnecessary zexts and truncs r345840 slightly changed the way promotion happens which could result in zext and truncs having the same source and destination types. This fixes that issue. We can now also remove the zext and trunc in the following case: (zext (trunc (promoted op)), i32) This means that we can no longer treat a value, that is only used by a sink, to be safe to promote. I've also added in some extra asserts and replaced a cast with a dyn_cast. Differential Revision: https://reviews.llvm.org/D54032 llvm-svn: 346125	2018-11-05 10:58:37 +00:00
Dylan McKay	4c5a5c8db6	[AVR] Fix a backend bug that left extraneous operands after expansion This patch fixes a bug in the AVR FRMIDX expansion logic. The expansion would leave a leftover operand from the original FRMIDX, but now attached to a MOVWRdRr instruction. The MOVWRdRr instruction did not expect this operand and so LLVM rejected the machine instruction. This would trigger an assertion: Assertion failed: ((isImpReg || Op.isRegMask() || MCID->isVariadic() || OpNo < MCID->getNumOperands() || isMetaDataOp) && "Trying to add an operand to a machine instr that is already done!"), function addOperand, file llvm/lib/CodeGen/MachineInstr.cpp Tim fixed this so that now the FRMIDX is expanded correctly into a well-formed MOVWRdRr. Patch by Tim Neumann llvm-svn: 346117	2018-11-05 05:49:04 +00:00
Craig Topper	30b627e5c9	[X86] Custom type legalize v2i8/v2i16/v2i32 mul to use pmuludq. v2i8/v2i16/v2i32 are promoted to v2i64. pmuludq takes a v2i64 input and produces a v2i64 output. Since we don't care about the upper bits of the type legalized multiply we can use the pmuludq to produce the multiply result for the bits we do care about. llvm-svn: 346115	2018-11-05 05:02:12 +00:00
Dylan McKay	9a9ae99b30	[AVR] Disallow the LDDWRdPtrQ instruction with Z as the destination This is an AVR-specific workaround for a limitation of the register allocator that only exposes itself on targets with high register contention like AVR, which only has three pointer registers. The three pointer registers are X, Y, and Z. In most nontrivial functions, Y is reserved for the frame pointer, as per the calling convention. This leaves X and Z. Some instructions, such as LPM ("load program memory"), are only defined for the Z register. Sometimes this just leaves X. When the backend generates a LDDWRdPtrQ instruction with Z as the destination pointer, it usually trips up the register allocator with this error message: LLVM ERROR: ran out of registers during register allocation This patch is a hacky workaround. We ban the LDDWRdPtrQ instruction from ever using the Z register as an operand. This gives the register allocator a bit more space to allocate, fixing the regalloc exhaustion error. Here is a description from the patch author, Peter Nimmervoll: As far as I understand the problem occurs when LDDWRdPtrQ uses the ptrdispregs register class as target register. This should work, but the allocator can't deal with this for some reason. So from my testing, it seems like (and I might be totally wrong on this) the allocator reserves the Z register for the ICALL instruction and then the register class ptrdispregs only has 1 register left and we can't use Y for source and destination. Removing the Z register from DREGS fixes the problem but removing Y register does not. More information about the bug can be found on the avr-rust issue tracker at https://github.com/avr-rust/rust/issues/37. A bug has been raised to track the removal of this workaround and a proper fix; PR39553 at https://bugs.llvm.org/show_bug.cgi?id=39553. Patch by Peter Nimmervoll llvm-svn: 346114	2018-11-05 05:00:44 +00:00
Craig Topper	ed6a0a817f	[X86] Add vector shift by immediate to SimplifyDemandedBitsForTargetNode. Summary: This also enables some constant folding from KnownBits propagation. This helps in some vXi64 cases in 32-bit mode where constant vectors appear as vXi32 and a bitcast. This can prevent getNode from constant folding sra/shl/srl. Reviewers: RKSimon, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54069 llvm-svn: 346102	2018-11-04 17:31:27 +00:00
Craig Topper	1ba86188cf	[SelectionDAG] Remove special methods for creating *_EXTEND_VECTOR_INREG nodes. Move asserts into getNode. These methods were just wrappers around getNode with additional asserts (identical and repeated 3 times). But getNode already has a switch that can be used to hold these asserts that allows them to be shared for all 3 opcodes. This also enables checking on the places that create these nodes without using the wrappers. The rest of the patch is just changing all callers to use getNode directly. llvm-svn: 346087	2018-11-04 02:10:18 +00:00
Craig Topper	7aed9e600b	[X86] Update comment I forgot to change in r346043. NFC llvm-svn: 346073	2018-11-03 19:49:13 +00:00
Reid Kleckner	2bcb288ade	[codeview] Let the X86 backend tell us the VFRAME offset adjustment Use MachineFrameInfo's OffsetAdjustment field to pass this information from the target to CodeViewDebug.cpp. The X86 backend doesn't use it for any other purpose. This fixes PR38857 in the case where there is a non-aligned quantity of CSRs and a non-aligned quantity of locals. llvm-svn: 346062	2018-11-03 00:41:52 +00:00
Craig Topper	f7108aef14	[X86] In LowerEXTEND_VECTOR_INREG, emit a vector shuffle instead of directly using X86ISD::UNPCKL The majority of the changes are because the rest of shuffle lowering/combining prefers to replace the undef input with the other operand. Using UNPCKL directly seemed to avoid this and just grabbed a randomish register for the undef which can create false dependencies. llvm-svn: 346050	2018-11-02 22:48:02 +00:00
Wouter van Oortmerssen	de28b5d17f	[WebAssembly] Parsing missing directives to produce valid .o Summary: The assembler was able to assemble and then dump back to .s, but was failing to parse certain directives necessary for valid .o output: - .type directives are now recognized to distinguish function symbols and others. - .size is now parsed to provide function size. - .globaltype (introduced in https://reviews.llvm.org/D54012) is now recognized to ensure symbols like __stack_pointer have a proper type set for both .s and .o output. Also added tests for the above. Reviewers: sbc100, dschuff Subscribers: jgravelle-google, aheejin, dexonsmith, kristina, llvm-commits, sunfish Differential Revision: https://reviews.llvm.org/D53842 llvm-svn: 346047	2018-11-02 22:04:33 +00:00
Craig Topper	60c202a494	[X86] Don't emit *_extend_vector_inreg nodes when both the input and output types are legal with AVX1 We already have custom lowering for the AVX case in LegalizeVectorOps. So it's better to keep the regular extend op around as long as possible. I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests which is why it's included here. I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector. For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size. Differential Revision: https://reviews.llvm.org/D54024 llvm-svn: 346043	2018-11-02 21:09:49 +00:00
Alex Bradbury	52c27785ce	[RISCV] Add some missing expansions for floating-point intrinsics A number of intrinsics, such as llvm.sin.f32, would result in a failure to select. This patch adds expansions for the relevant selection DAG nodes, as well as exhaustive testing for all f32 and f64 intrinsics. The codegen for FMA remains a TODO item, pending support for the various RISC-V FMA instruction variants. The llvm.minimum.f32.* and llvm.maximum.* tests are commented-out, pending upstream support for target-independent expansion, as discussed in http://lists.llvm.org/pipermail/llvm-dev/2018-November/127408.html. Differential Revision: https://reviews.llvm.org/D54034 Patch by Luís Marques. llvm-svn: 346034	2018-11-02 19:50:38 +00:00
Heejin Ahn	5b023e07ea	[WebAssembly] Fix bugs in rethrow depth counting and InstPrinter Summary: EH stack depth is incremented at `try` and decremented at `catch`. When there are more than two catch instructions for a try instruction, we shouldn't count non-first catches when calculating EH stack depths. This patch fixes two bugs: - CFGStackify: Exclude `catch_all` in the terminate catch pad when calculating EH pad stack, because when we have multiple catches for a try we should count only the first catch instruction when calculating EH pad stack. - InstPrinter: The initial intention was also to exclude non-first catches, but it didn't account for nested try-catches, so it failed on this case: ``` try try catch end catch <-- (1) end ``` In the example, when we are at the catch (1), the last seen EH instruction is not `try` but `end_try`, violating the wrong assumption. We don't need these after we switch to the second proposal because there is gonna be only one `catch` instruction. But anyway before then these bugfixes are necessary to keep trunk in a working state. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53819 llvm-svn: 346029	2018-11-02 18:38:52 +00:00
Matthias Braun	5f7cb79e94	ARMExpandPseudoInsts: Fix CMP_SWAP expansion adding a kill flag to a def llvm-svn: 346026	2018-11-02 18:22:15 +00:00
Jonas Paulsson	cced2a2775	[SystemZ::TTI] Improve cost handling of uint/sint to fp conversions. Let i8/i16 uint/sint to fp conversions cost 1 if operand is a load. Since the load already does the extension, there is no extra cost (previously returned 2). Review: Ulrich Weigand https://reviews.llvm.org/D54028 llvm-svn: 346009	2018-11-02 17:53:31 +00:00
Sylvestre Ledru	df92dabaef	Fixed inclusion of M_PI for MinGW-w64 Patch by KOLANICH llvm-svn: 346000	2018-11-02 17:25:40 +00:00
Jonas Paulsson	79f2441eee	[SystemZ] Rework getInterleavedMemoryOpCost() Model this function more closely after the BasicTTIImpl version, with separate handling of loads and stores. For loads, the set of actually loaded vectors is checked. This makes it more readable and just slightly more accurate generally. Review: Ulrich Weigand https://reviews.llvm.org/D53071 llvm-svn: 345998	2018-11-02 17:15:36 +00:00
Krzysztof Parzyszek	f070544f8e	[Hexagon] Do not reduce load size for globals in small-data Small-data (i.e. GP-relative) loads and stores allow 16-bit scaled offset. For a load of a value of type T, the small-data area is equivalent to an array "T sdata[65536]". This implies that objects of smaller sizes need to be closer to the beginning of sdata, while larger objects may be farther away, or otherwise the offset may be insufficient to reach it. Similarly, an object of a larger size should not be accessed via a load of a smaller size. llvm-svn: 345975	2018-11-02 14:17:47 +00:00
Alexey Bataev	8831ef7a16	[DEBUGINFO, NVPTX] Do not emit ',debug' option if no debug info or only debug directives are requested. Summary: If the output of debug directives only is requested, we should drop emission of ',debug' option from the target directive. Required for supporting the nvprof profiler. Reviewers: probinson, echristo, dblaikie Subscribers: Hahnfeld, jholewinski, llvm-commits, JDevlieghere, aprantl Differential Revision: https://reviews.llvm.org/D46061 llvm-svn: 345972	2018-11-02 13:47:47 +00:00
Neil Henning	7d1b77df57	[AMDGPU] UBSan bug fix for r345710 UBSan detected an error in our ISelLowering that is exposed only when you have a dmask == 0x1. Fix this by adding in an explicit check to ensure we don't do the UBSan detected shl << 32. llvm-svn: 345962	2018-11-02 10:24:57 +00:00
Matt Arsenault	8e0269ba0b	AMDGPU: Fix assertion with bitcast from i64 constant to v4i16 llvm-svn: 345922	2018-11-02 02:43:55 +00:00
Wouter van Oortmerssen	3231e518a3	[WebAssembly] Added a .globaltype directive to .s output. Summary: Assembly output can use globals like __stack_pointer implicitly, but has no way of indicating the type of such a global, which makes it hard for tools processing it (such as the MC Assembler) to reconstruct this information. The improved assembler directives parsing (in progress in https://reviews.llvm.org/D53842) will make use of this information. Also deleted code for the .import_global directive which was unused. New test case in userstack.ll Reviewers: dschuff, sbc100 Subscribers: jgravelle-google, aheejin, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54012 llvm-svn: 345917	2018-11-02 00:45:00 +00:00
Thomas Lively	b2382c8bf7	[WebAssembly] General vector shift lowering Summary: Adds support for lowering non-splat shifts. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53625 llvm-svn: 345916	2018-11-02 00:39:57 +00:00
Thomas Lively	fb84fd7c8e	[WebAssembly] Expand inserts and extracts with variable indices Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53964 llvm-svn: 345913	2018-11-02 00:06:56 +00:00
Mandeep Singh Grang	547a0d765a	[COFF, ARM64] Implement Intrinsic.sponentry for AArch64 Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform. Patch by: Yin Ma (yinma@codeaurora.org) Reviewers: mgrang, ssijaric, eli.friedman, TomTan, mstorsjo, rnk, compnerd, efriedma Reviewed By: efriedma Subscribers: efriedma, javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53996 llvm-svn: 345909	2018-11-01 23:22:25 +00:00
Farhana Aleen	5853762e5a	[AMDGPU] Handle the idot8 pattern generated by FE. Summary: Different variants of idot8 codegen dag patterns are not generated by llvm-tablegen due to a huge increase in the compile time. Support the pattern that clang FE generates after reordering the additions in integer-dot8 source language pattern. Author: FarhanaAleen Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D53937 llvm-svn: 345902	2018-11-01 22:48:19 +00:00
Mandeep Singh Grang	df19e57a1c	[COFF, ARM64] Implement llvm.addressofreturnaddress intrinsic Reviewers: rnk, mstorsjo, efriedma, TomTan Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53962 llvm-svn: 345892	2018-11-01 21:23:47 +00:00
Heejin Ahn	2e398976ba	[WebAssembly] Fix signature parsing for 'try' in AsmParser Summary: Like `block` or `loop`, `try` can take an optional signature which can be omitted. This patch allows `try`'s signature to be omitted. Also added some tests for EH instructions. Reviewers: aardappel Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53873 llvm-svn: 345888	2018-11-01 20:32:15 +00:00
Reid Kleckner	4af6025f09	[Hexagon] Remove unintended fallthrough from MC duplex code I added these annotations in r345878 because I wasn't sure if the fallthrough was intended. Krzysztof Parzyszek confirmed that they should be breaks, so that's what this patch does. Reviewers: kparzysz Differential Revision: https://reviews.llvm.org/D53991 llvm-svn: 345883	2018-11-01 19:59:27 +00:00
Reid Kleckner	4dc0b1ac60	Fix clang -Wimplicit-fallthrough warnings across llvm, NFC This patch should not introduce any behavior changes. It consists of mostly one of two changes: 1. Replacing fall through comments with the LLVM_FALLTHROUGH macro 2. Inserting 'break' before falling through into a case block consisting of only 'break'. We were already using this warning with GCC, but its warning behaves slightly differently. In this patch, the following differences are relevant: 1. GCC recognizes comments that say "fall through" as annotations, clang doesn't 2. GCC doesn't warn on "case N: foo(); default: break;", clang does 3. GCC doesn't warn when the case contains a switch, but falls through the outer case. I will enable the warning separately in a follow-up patch so that it can be cleanly reverted if necessary. Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu Differential Revision: https://reviews.llvm.org/D53950 llvm-svn: 345882	2018-11-01 19:54:45 +00:00
Sam Clegg	ddf049869a	[WebAssembly] Fixup `main` signature by default Differential Revision: https://reviews.llvm.org/D53396 llvm-svn: 345880	2018-11-01 19:38:44 +00:00
Reid Kleckner	bebc53f838	Annotate possibly unintended fallthroughs in Hexagon MC code, NFC Clang's -Wimplicit-fallthrough check fires on these switch cases. GCC does not warn when a case body that ends in a switch falls through to a case label of an outer switch. It's not clear if these fall throughs are truly intended. The Hexagon tests pass regardless of whether these case blocks fall through or break. For now, I have applied the intended fallthrough annotation macro with a FIXME comment to unblock enabling the warning. I will send a follow-up patch that converts them to breaks to the Hexagon maintainers. llvm-svn: 345878	2018-11-01 19:32:04 +00:00
Volkan Keles	0a8dc9eb0f	[GlobalISel] Fix a bug in LegalizeRuleSet::clampMaxNumElements Summary: This function was causing a crash when `MaxElements == 1` because it was trying to create a single element vector type. Reviewers: dsanders, aemerson, aditya_nandakumar Reviewed By: dsanders Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53734 llvm-svn: 345875	2018-11-01 19:01:53 +00:00
Simon Pilgrim	b34a052852	[LegalizeDAG] Add generic vector CTPOP expansion (PR32655) This patch adds support for expanding vector CTPOP instructions and removes the x86 'bitmath' lowering which replicates the same expansion. Differential Revision: https://reviews.llvm.org/D53258 llvm-svn: 345869	2018-11-01 18:22:11 +00:00
Reid Kleckner	ba982b5f8f	[Hexagon] Fix MO_JumpTable const extender conversion Previously this case fell through to unreachable, so it is clearly not covered by any test case in LLVM. It may be dynamically unreachable, in fact. However, if it were to run, this is what it would logically do. The assert suggests that the intended behavior was not to allow folding offsets from jump table indices, which makes sense. llvm-svn: 345868	2018-11-01 18:14:45 +00:00
Reid Kleckner	eb56894a4b	[AArch64] Fix unintended fallthrough and strengthen cast This was added in r330630. GCC's -Wimplicit-fallthrough seems to not fire when the previous case contains a switch itself. This fallthrough was benign because the helper function implementing the case used dyn_cast to re-check the type of the node in question. After fixing the fallthrough, we can strengthen the cast. llvm-svn: 345864	2018-11-01 18:02:27 +00:00
Mandeep Singh Grang	b0cdf56dd7	Revert "[COFF, ARM64] Implement Intrinsic.sponentry for AArch64" This reverts commit 585b6667b4712e3c7f32401e929855b3313b4ff2. llvm-svn: 345863	2018-11-01 17:53:57 +00:00
Sam Parker	48fbf752b0	[ARM] Attempt to fix ppc64be buildbot llvm-svn: 345850	2018-11-01 16:44:45 +00:00
Sam Parker	84a2f8b364	[ARM][CGP] Negative constant operand handling While mutating instructions, we sign extended negative constant operands for binary operators that can safely overflow. This was to allow instructions, such as add nuw i8 %a, -2, to still be able to perform a subtraction. However, the code to handle constants doesn't take into consideration that instructions, such as sub nuw i8 -2, %a, require the i8 -2 to be converted into i32 254. This is a relatively simple fix, but I've taken the time to reorganise the code a bit - mainly that instructions that can be promoted are cached and splitting up the Mutate function. Differential Revision: https://reviews.llvm.org/D53972 llvm-svn: 345840	2018-11-01 15:23:42 +00:00
Simon Pilgrim	d5d7224355	[X86][X86FixupLEA] Rename processInstructionForSLM to processInstructionForSlowLEA (NFCI) The function isn't SLM specific (it's driven by the FeatureSlowLEA flag). Minor tidy-up prior to PR38225. llvm-svn: 345836	2018-11-01 14:57:07 +00:00
Aleksandar Beserminji	b9c840c9f0	[mips][micromips] Fix JmpLink to TargetExternalSymbol When matching MipsISD::JmpLink t9, TargetExternalSymbol:i32'...', wrong JALR16_MM is selected. This patch adds missing pattern for JmpLink, so that JAL instruction is selected. Differential Revision: https://reviews.llvm.org/D53366 llvm-svn: 345830	2018-11-01 13:57:54 +00:00
Chad Rosier	1546efd4a7	[AArch64] Add support for ARMv8.4 in Saphira. llvm-svn: 345827	2018-11-01 13:45:16 +00:00
Simon Pilgrim	1f0a8421ad	[X86][SSE] Move 2-input limit up from getFauxShuffleMask to resolveTargetShuffleInputs (reapplied) Reapplying an updated version of rL345395 (reverted in rL345451), now the issues noticed in PR39483 have been fixed. This patch allows resolveTargetShuffleInputs to remove UNDEF inputs from cases where we have more than 2 inputs. llvm-svn: 345824	2018-11-01 11:52:09 +00:00
Stefan Maksimovic	cd0c50e3d2	[Mips] Conditionally remove successor block In MipsBranchExpansion::splitMBB, upon splitting a block with two direct branches, remove the successor of the newly created block (which inherits successors from the original block) which is pointed to by the last branch in the original block only if the targets of the two branches differ. This is to fix the failing test when run with -verify-machineinstrs enabled. Differential Revision: https://reviews.llvm.org/D53756 llvm-svn: 345821	2018-11-01 10:10:42 +00:00
Jonas Paulsson	6749c24f40	[SystemZ::TTI] Recognize the higher cost of scalar i1 -> fp conversion Scalar i1 to fp conversions are done with a branch sequence, so it should have a higher cost. Review: Ulrich Weigand https://reviews.llvm.org/D53924 llvm-svn: 345818	2018-11-01 09:05:32 +00:00
Jonas Paulsson	f15a53bc81	[SystemZ::TTI] Accurate costs for i1->double vector conversions This factors out a new method getBoolVecToIntConversionCost() containing the code for vector sext/zext of i1, in order to reuse it for i1 to double vector conversions. Review: Ulrich Weigand https://reviews.llvm.org/D53923 llvm-svn: 345817	2018-11-01 09:01:51 +00:00
Li Jia He	03170a904f	[PowerPC] Support constraint 'wi' in asm From the gcc manual, we can see that the specific limit of wi inline asm is “FP or VSX register to hold 64-bit integers for VSX insns or NO_REGS”. The link is https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Machine-Constraints.html#Machine-Constraints. We should accept this constraint. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D53265 llvm-svn: 345810	2018-11-01 02:35:17 +00:00
Matthias Braun	a9f900561e	X86: Consistently declare pass initializers in X86.h; NFC This avoids declaring them twice: in X86TargetMachine.cpp and the file implementing the pass. llvm-svn: 345801	2018-11-01 00:38:01 +00:00
Thomas Lively	d4891a1b7a	[WebAssembly] Lower vselect Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53630 llvm-svn: 345797	2018-11-01 00:01:02 +00:00
Thomas Lively	b61232eacd	[WebAssembly] Process p2align operands for SIMD loads and stores Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53886 llvm-svn: 345795	2018-10-31 23:58:20 +00:00
Thomas Lively	6ff31fe34d	[WebAssembly] Handle vector IMPLICIT_DEFs. Summary: Also reduce the test case for implicit defs and test it with all register classes. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53855 llvm-svn: 345794	2018-10-31 23:50:53 +00:00
Mandeep Singh Grang	88ad9ac720	[COFF, ARM64] Implement Intrinsic.sponentry for AArch64 Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform. Reviewers: mgrang, TomTan, rnk, compnerd, mstorsjo, efriedma Reviewed By: efriedma Subscribers: majnemer, chrib, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D53673 llvm-svn: 345791	2018-10-31 23:16:20 +00:00
Evandro Menezes	3a06c46470	[AArch64] Sort switch cases (NFC) llvm-svn: 345786	2018-10-31 21:56:49 +00:00
Craig Topper	6c3f1692c8	Revert r345165 "[X86] Bring back the MOV64r0 pseudo instruction" Google is reporting regressions on some benchmarks. llvm-svn: 345785	2018-10-31 21:53:24 +00:00
Eli Friedman	063fd98bcc	[ARM] Add missing pseudo-instruction for Thumb1 RSBS. Shows up rarely for 64-bit arithmetic, more frequently for the compare patterns added in r325323. Differential Revision: https://reviews.llvm.org/D53848 llvm-svn: 345782	2018-10-31 21:45:48 +00:00
Stanislav Mekhanoshin	222e9c11f7	Check shouldReduceLoadWidth from SimplifySetCC SimplifySetCC could shrink a load without checking with the target whether such a shrink is profitable or legal. Added checks to prevent shrinking of aligned scalar loads in AMDGPU below dword, as the scalar engine does not support it. Differential Revision: https://reviews.llvm.org/D53846 llvm-svn: 345778	2018-10-31 21:24:30 +00:00
Scott Linder	c6c627253d	[AMDGPU] Remove FeatureVGPRSpilling This feature is only relevant to shaders, and is no longer used. When disabled, lowering of reserved registers for shaders causes a compiler crash. Remove the feature and add a test for compilation of shaders at OptNone. Differential Revision: https://reviews.llvm.org/D53829 llvm-svn: 345763	2018-10-31 18:54:06 +00:00
Krzysztof Parzyszek	977a1fe507	[Hexagon] Make sure not to use GP-relative addressing with PIC Make sure that -relocation-model=pic prevents use of GP-relative addressing modes. llvm-svn: 345731	2018-10-31 15:54:31 +00:00
Nicolai Haehnle	814abb59df	AMDGPU: Rewrite SILowerI1Copies to always stay on SALU Summary: Instead of writing boolean values temporarily into 32-bit VGPRs if they are involved in PHIs or are observed from outside a loop, we use bitwise masking operations to combine lane masks in a way that is consistent with wave control flow. Move SIFixSGPRCopies to before this pass, since that pass incorrectly attempts to move SGPR phis to VGPRs. This should recover most of the code quality that was lost with the bug fix in "AMDGPU: Remove PHI loop condition optimization". There are still some relevant cases where code quality could be improved, in particular: - We often introduce redundant masks with EXEC. Ideally, we'd have a generic computeKnownBits-like analysis to determine whether masks are already masked by EXEC, so we can avoid this masking both here and when lowering uniform control flow. - The criterion we use to determine whether a def is observed from outside a loop is conservative: it doesn't check whether (loop) branch conditions are uniform. Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b Reviewers: arsenm, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D53496 llvm-svn: 345719	2018-10-31 13:27:08 +00:00
Nicolai Haehnle	28212cc689	AMDGPU: Remove PHI loop condition optimization Summary: The optimization to early break out of loops if all threads are dead was never fully implemented. But the PHI node analyzing is actually causing a number of problems, so remove all the extra code for it. (This does actually regress code quality in a few places because it ends up relying more heavily on phi's of i1, which we don't do a great job with. However, since it fixes real bugs in the wild, we should take this change. I have some prototype changes to improve i1 lowering in general -- not just for control flow -- which should help recover the code quality, I just need to make those changes fit for general consumption. -- Nicolai) Change-Id: I6fc6c6c8961857ac6009fcfb9f7e5e48dc23fbb1 Patch-by: Christian König <christian.koenig@amd.com> Reviewers: arsenm, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53359 llvm-svn: 345718	2018-10-31 13:26:48 +00:00
Andrea Di Biagio	3d2b7176fc	[tblgen][PredicateExpander] Add the ability to describe more complex constraints on instruction operands. Before this patch, class PredicateExpander only knew how to expand simple predicates that performed checks on instruction operands. In particular, the new scheduling predicate syntax was not rich enough to express checks like this one: Foo(MI->getOperand(0).getImm()) == ExpectedVal; Here, the immediate operand value at index zero is passed in input to function Foo, and ExpectedVal is compared against the value returned by function Foo. While this predicate pattern doesn't show up in any X86 model, it shows up in other upstream targets. So, being able to support those predicates is fundamental if we want to be able to modernize all the scheduling models upstream. With this patch, we allow users to specify if a register/immediate operand value needs to be passed in input to a function as part of the predicate check. Now, register/immediate operand checks all derive from base class CheckOperandBase. This patch also changes where TIIPredicate definitions are expanded by the instruction info emitter. Before, definitions were expanded in class XXXGenInstrInfo (where XXX is a target name). With the introduction of this new syntax, we may want to have TIIPredicates expanded directly in XXXInstrInfo. That is because functions used by the new operand predicates may only exist in the derived class (i.e. XXXInstrInfo). This patch is a non functional change for the existing scheduling models. In future, we will be able to use this richer syntax to better describe complex scheduling predicates, and expose them to llvm-mca. Differential Revision: https://reviews.llvm.org/D53880 llvm-svn: 345714	2018-10-31 12:28:05 +00:00
Neil Henning	63718b214a	[AMDGPU] support image load/store a16 Our a16 support was only enabled for sample/gather and buffer load/store, but not for image load/store operations (which take an i16 as the pixel index rather than a half). Fix our isel lowering and add test cases to prove it out. Differential Revision: https://reviews.llvm.org/D53750 llvm-svn: 345710	2018-10-31 10:34:48 +00:00
Dorit Nuzman	34da6dd696	[LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53668 llvm-svn: 345705	2018-10-31 09:57:56 +00:00
Sanjin Sijaric	fadebc8aae	[ARM64] [Windows] Exception handling support in frame lowering Emit pseudo instructions indicating unwind codes corresponding to each instruction inside the prologue/epilogue. These are used by the MCLayer to populate the .xdata section. Differential Revision: https://reviews.llvm.org/D50288 llvm-svn: 345701	2018-10-31 09:27:01 +00:00
Martin Storsjo	315357faca	[AArch64] Mark condition flags and x16/x17 as clobbered when calling __chkstk This is similar to SVN r311061 for ARM. Differential Revision: https://reviews.llvm.org/D53878 llvm-svn: 345698	2018-10-31 08:14:09 +00:00
Konstantin Zhuravlyov	2d22d24ac4	Revert r345542: AMDGPU: Enable code object v3 by default It breaks mesa. llvm-svn: 345662	2018-10-30 22:02:40 +00:00
Mandeep Singh Grang	71e0cc2a0b	[COFF, ARM64] Make sure to forward arguments from vararg to musttail vararg Summary: Thunk functions in Windows are vararg functions that call a musttail function to pass the arguments after the fixup is done. We need to make sure that we forward the arguments from the caller vararg to the callee vararg function. This is the same mechanism that is used for Windows on X86. Reviewers: ssijaric, eli.friedman, TomTan, mgrang, mstorsjo, rnk, compnerd, efriedma Reviewed By: efriedma Subscribers: efriedma, kristof.beyls, chrib, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53843 llvm-svn: 345641	2018-10-30 20:46:10 +00:00
Eli Friedman	93d0129b78	[AArch64] [Windows] SEH opcodes should be scheduling boundaries. Prevents the post-RA scheduler from modifying the prologue sequences emitted by frame lowering. This is roughly similar to what we do for other targets: TargetInstrInfo::isSchedulingBoundary checks isPosition(), which checks for CFI_INSTRUCTION. isSEHInstruction is taken from D50288; it'll land with whatever patch lands first. Differential Revision: https://reviews.llvm.org/D53851 llvm-svn: 345634	2018-10-30 19:24:51 +00:00
David Greene	3e89fa8e08	[AArch64] Create proper memoperand for multi-vector stores Re-apply r345315 with testcase fixes. Include all of the store's source vector operands when creating the MachineMemOperand. Previously, we were missing the first operand, making the store size seem smaller than it really is. Differential Revision: https://reviews.llvm.org/D52816 llvm-svn: 345631	2018-10-30 19:17:51 +00:00
Craig Topper	6958b5ffa9	[X86] In lowerVectorShuffleAsBroadcast, make peeking through CONCAT_VECTORS work correctly if we already walked through a bitcast that changed the element size. The CONCAT_VECTORS case was using the original mask element count to determine how to adjust the broadcast index. But if we looked through a bitcast, the original mask size doesn't tell us anything about the concat_vectors. This patch switches to using the concat_vectors input element count directly instead. Differential Revision: https://reviews.llvm.org/D53823 llvm-svn: 345626	2018-10-30 18:48:42 +00:00
Ulrich Weigand	c5854b0adb	[SystemZ] Simplify LRV/STRV ISD nodes The LRV and STRV nodes carry an extra operand to indicate the type of the memory access. This is redundant, since the nodes are actually of class MemIntrinsicNode and therefore hold that same information already as MemoryVT. NFC intended. llvm-svn: 345618	2018-10-30 18:20:59 +00:00
Jonas Paulsson	af8e036c29	[SystemZ] Improve isFoldableLoad() for Sub, SDiv and UDiv. Sub, SDiv and UDiv are not commutative, so only the RHS operand can fold a load. This patch adds a check for this. Review: Ulrich Weigand https://reviews.llvm.org/D53791 llvm-svn: 345596	2018-10-30 13:41:03 +00:00
Francis Visoiu Mistrih	0e237d357e	[X86] Re-enable the machine verifier after fixing more tests Was disabled again in r345528. Hopefully this fixes the bots. llvm-svn: 345593	2018-10-30 12:20:17 +00:00
Roman Lebedev	b3a14208ac	[X86][BMI1] X86DAGToDAGISel: select BEXTR from x & (-1 >> (32 - y)) pattern Summary: The final pattern. There are no test changes: * We are looking for the pattern with one-use of its mask, * If the mask is one-use, D48768 will unfold it into pattern d. * Thus, the tests have extra-use on the mask. * Thus, only the BMI2 BZHI can be tested, and it already worked. * So there is no BMI1 test coverage, we just assume it works since it uses the same codepath. Reviewers: craig.topper, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53575 llvm-svn: 345584	2018-10-30 11:12:34 +00:00
Diogo N. Sampaio	a3783b2ac2	[AArch64] Add support for UDF instruction Summary: Add support for AArch64 UDF instruction. UDF - Permanently Undefined generates an Undefined Instruction exception (ESR_ELx.EC = 0b000000). Reviewers: DavidSpickett, javed.absar, t.p.northover Reviewed By: javed.absar Subscribers: nhaehnle, kristof.beyls Differential Revision: https://reviews.llvm.org/D53319 llvm-svn: 345581	2018-10-30 11:06:50 +00:00
Simon Pilgrim	858303b827	[SelectionDAG] Add FoldBUILD_VECTOR to simplify new BUILD_VECTOR nodes Similar to FoldCONCAT_VECTORS, this patch adds FoldBUILD_VECTOR to simplify cases that can avoid the creation of the BUILD_VECTOR - if all the operands are UNDEF or if the BUILD_VECTOR simplifies to a copy. This exposed an assumption in some AMDGPU code that getBuildVector was guaranteed to be a BUILD_VECTOR node that I've tried to handle. Differential Revision: https://reviews.llvm.org/D53760 llvm-svn: 345578	2018-10-30 10:32:11 +00:00
Craig Topper	b293322cee	[LegalizeTypes] Teach PromoteIntRes_BITCAST to better handle a bitcast with vector output type and a vector input type that needs to be widened Summary: Previously if we had a bitcast vector output type that needs promotion and a vector input type that needs widening we would just do a stack store and load to handle the conversion. We can do a little better if we can widen the bitcast to a legal vector type the same size as the widened input type. Then we can do the bitcast between this widened type and the widened input type. Afterwards we can extract_subvector back to the original output and any_extend that. Type legalization will then circle back and handle promotion of the extract_subvector and the any_extend will just be removed. This will avoid going through the stack and allows us to remove a custom version of this legalization from X86. Reviewers: efriedma, RKSimon Reviewed By: efriedma Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53229 llvm-svn: 345567	2018-10-30 03:27:15 +00:00
Craig Topper	67c2878501	[X86] Cleanup the code in LowerFABSorFNEG and LowerFCOPYSIGN a little. NFC Use SelectionDAG::EVTToAPFloatSemantics. Make the LogicVT calculation in LowerFABSorFNEG similar to LowerFCOPYSIGN. Use APInt::getSignedMaxValue instead of ~APInt::getSignMask. llvm-svn: 345565	2018-10-30 03:27:12 +00:00
Craig Topper	676d7a7a43	[X86] Stop changing f128 fand/for/fxor to v2i64. The additional patterns don't cost us much and it seems better than changing element widths. llvm-svn: 345564	2018-10-30 03:27:11 +00:00
Matt Arsenault	abc4f29f9c	AMDGPU: Remove custom BUILD_VECTOR combine This was looping in a testcase and removing it now slightly improves a test. llvm-svn: 345560	2018-10-30 01:37:59 +00:00
Matt Arsenault	b0b741efb8	AMDGPU: Use scavengeRegisterBackwards llvm-svn: 345559	2018-10-30 01:33:14 +00:00
Reid Kleckner	23c9efc071	Remove unneeded friend declarations that clang-cl warns on llvm-svn: 345549	2018-10-29 22:38:13 +00:00
Konstantin Zhuravlyov	5cb950200c	AMDGPU: Enable code object v3 by default Differential Revision: https://reviews.llvm.org/D53525 llvm-svn: 345542	2018-10-29 21:07:27 +00:00
Simon Pilgrim	090a444cb7	[X86] Set isMachineVerifierClean() back to false (PR27481) Put back the isMachineVerifierClean() override removed at rL345513 to fix Windows ThinLTO tests llvm-svn: 345528	2018-10-29 19:51:52 +00:00
Thomas Lively	eb15d00193	[WebAssembly] Lower away condition truncations for scalar selects Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53676 llvm-svn: 345521	2018-10-29 18:38:12 +00:00
Simon Pilgrim	3a2f3c2c0a	[X86][SSE] getFauxShuffleMask - Fix shuffle mask adjustment for multiple inserted subvectors Part of the issue discovered in PR39483, although it's not fully exposed until I reapply rL345395 (by reverting rL345451) llvm-svn: 345520	2018-10-29 18:25:48 +00:00
Craig Topper	220fd33522	[X86] Add AES to KNL CPUs to match clang. I believe this was lost from KNL when AES was pushed from Westmere to Skylake recently. KNL used to inherit from IVB. llvm-svn: 345519	2018-10-29 18:17:01 +00:00
Stanislav Mekhanoshin	6b1c6548bd	[AMDGPU] Fixed return value causing warning and regression llvm-svn: 345518	2018-10-29 17:53:23 +00:00
Bryan Chan	bfd32d4377	[AArch64] Rename FP16FML instruction format (NFC) Rename SIMDThreeSameMult (etc.) to SIMDThreeSameVectorFML (etc.) to follow usual naming convention, and add some comments in the .td files. llvm-svn: 345515	2018-10-29 17:27:34 +00:00
Stanislav Mekhanoshin	79080ecd82	[AMDGPU] Match v_swap_b32 Differential Revision: https://reviews.llvm.org/D52677 llvm-svn: 345514	2018-10-29 17:26:01 +00:00
Francis Visoiu Mistrih	61c9de7565	[X86] Enable the MachineVerifier by default The machine verifier was disabled for x86 by default. There are now only 9 tests failing, compared to what previously was between 20 and 30. This is a good opportunity to file bugs for all the remaining issues, then explicitly disable the failing tests and enabling the machine verifier by default. This allows us to avoid adding new tests that break the verifier. PR27481 llvm-svn: 345513	2018-10-29 16:57:43 +00:00
Luke Cheeseman	71c989ae1f	[AArch64] Return address signing B key support - Add support to generate AUTIBSP, PACIBSP, RETAB instructions for return address signing - The key used to sign the function is controlled by the function attribute "sign-return-address-key" Differential Revision: https://reviews.llvm.org/D51427 llvm-svn: 345511	2018-10-29 16:26:58 +00:00
Craig Topper	aa5eb2fbaa	[X86] Force floating point values in constant pool decoding to print in scientific notation so they can't be confused with integers. When the floating point constants are whole numbers they have no decimal point so look like integers, but mean something very different in something like an 'and' instruction. Ideally we would just print a decimal point and a 0, but I couldn't see how to make APFloat::toString do that. llvm-svn: 345488	2018-10-29 04:52:04 +00:00
Craig Topper	42aa87143d	[X86] Recognize constant splats in LowerFCOPYSIGN. llvm-svn: 345484	2018-10-28 23:51:35 +00:00
Simon Pilgrim	9b77f0c291	[VectorLegalizer] Enable TargetLowering::expandFP_TO_UINT support. Add vector support to TargetLowering::expandFP_TO_UINT. This exposes an issue in X86TargetLowering::LowerVSELECT which was assuming that the select mask was the same width as the LHS/RHS ops - as long as the result is a sign splat we can easily sext/trunc this. llvm-svn: 345473	2018-10-28 13:07:25 +00:00
Roman Lebedev	a5baf86744	AMD BdVer2 (Piledriver) Initial Scheduler model Summary: # Overview This is somewhat partial. * Latencies are good {F7371125} * All of these remaining inconsistencies //appear// to be noise/noisy/flaky. * NumMicroOps are somewhat good {F7371158} * Most of the remaining inconsistencies are from `Ld` / `Ld_ReadAfterLd` classes * Actual unit occupation (pipes, `ResourceCycles`) are undiscovered lands, i did not really look there. They are basically verbatum copy from `btver2` * Many `InstRW`. And there are still inconsistencies left... To be noted: I think this is the first new schedule profile produced with the new next-gen tools like llvm-exegesis! # Benchmark I realize that isn't what was suggested, but i'll start with some "internal" public real-world benchmark i understand - [[ https://github.com/darktable-org/rawspeed \| RawSpeed raw image decoding library ]]. Diff (the exact clang from trunk without/with this patch): ``` Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench Benchmark Time CPU Time Old Time New CPU Old CPU New ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_mean -0.0607 -0.0604 234 219 233 219 Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_median -0.0630 -0.0626 233 219 233 219 Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_stddev +0.2581 +0.2587 1 2 1 2 Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_mean -0.0770 -0.0767 144 133 144 133 Canon/EOS 5D Mark 
II/10.canon.sraw2.cr2/threads:8/real_time_median -0.0767 -0.0763 144 133 144 133 Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_stddev -0.4170 -0.4156 1 0 1 0 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_mean -0.0271 -0.0270 463 450 463 450 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_median -0.0093 -0.0093 453 449 453 449 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_stddev -0.7280 -0.7280 13 4 13 4 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_pvalue 0.0004 0.0004 U Test, Repetitions: 25 vs 25 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_mean -0.0065 -0.0065 569 565 569 565 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_median -0.0077 -0.0077 569 564 569 564 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_stddev +1.0077 +1.0068 2 5 2 5 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_pvalue 0.0220 0.0199 U Test, Repetitions: 25 vs 25 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_mean +0.0006 +0.0007 312 312 312 312 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_median +0.0031 +0.0032 311 312 311 312 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_stddev -0.7069 -0.7072 4 1 4 1 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_pvalue 0.0004 0.0004 U Test, Repetitions: 25 vs 25 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_mean -0.0015 -0.0015 141 141 141 141 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_median -0.0010 -0.0011 141 141 141 141 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_stddev -0.1486 -0.1456 0 0 0 0 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_pvalue 0.6139 0.8766 U Test, Repetitions: 25 vs 25 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_mean -0.0008 -0.0005 60 60 60 60 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_median -0.0006 -0.0002 60 60 60 60 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_stddev -0.1467 -0.1390 0 0 0 0 Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_pvalue 0.0137 0.0137 U Test, Repetitions: 25 vs 25 
Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_mean +0.0002 +0.0002 275 275 275 275 Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_median -0.0015 -0.0014 275 275 275 275 Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_stddev +3.3687 +3.3587 0 2 0 2 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_pvalue 0.4041 0.3933 U Test, Repetitions: 25 vs 25 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_mean +0.0004 +0.0004 67 67 67 67 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_median -0.0000 -0.0000 67 67 67 67 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_stddev +0.1947 +0.1995 0 0 0 0 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_pvalue 0.0074 0.0001 U Test, Repetitions: 25 vs 25 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_mean -0.0092 +0.0074 547 542 25 25 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_median -0.0054 +0.0115 544 541 25 25 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_stddev -0.4086 -0.3486 8 5 0 0 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_pvalue 0.3320 0.0000 U Test, Repetitions: 25 vs 25 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_mean +0.0015 +0.0204 218 218 12 12 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_median +0.0001 +0.0203 218 218 12 12 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_stddev +0.2259 +0.2023 1 1 0 0 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_pvalue 0.0000 0.0001 U Test, Repetitions: 25 vs 25 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_mean -0.0209 -0.0179 96 94 90 88 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_median -0.0182 -0.0155 95 93 90 88 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_stddev -0.6164 -0.2703 2 1 2 1 Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_mean -0.0098 -0.0098 176 175 176 175 Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_median -0.0126 -0.0126 176 174 176 174 Kodak/DCS Pro 
14nx/D7465857.DCR/threads:8/real_time_stddev +6.9789 +6.9157 0 2 0 2 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_mean -0.0237 -0.0238 474 463 474 463 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_median -0.0267 -0.0267 473 461 473 461 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_stddev +0.7179 +0.7178 3 5 3 5 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_pvalue 0.6837 0.6554 U Test, Repetitions: 25 vs 25 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_mean -0.0014 -0.0013 1375 1373 1375 1373 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_median +0.0018 +0.0019 1371 1374 1371 1374 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_stddev -0.7457 -0.7382 11 3 10 3 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_mean -0.0080 -0.0289 22 22 10 10 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_median -0.0070 -0.0287 22 22 10 10 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_stddev +1.0977 +0.6614 0 0 0 0 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_mean +0.0132 +0.0967 35 36 10 11 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_median +0.0132 +0.0956 35 36 10 11 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_stddev -0.0407 -0.1695 0 0 0 0 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_mean +0.0331 +0.1307 13 13 6 6 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_median +0.0430 +0.1373 12 13 6 6 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_stddev -0.9006 
-0.8847 1 0 0 0 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_pvalue 0.0016 0.0010 U Test, Repetitions: 25 vs 25 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_mean -0.0023 -0.0024 395 394 395 394 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_median -0.0029 -0.0030 395 394 395 393 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_stddev -0.0275 -0.0375 1 1 1 1 Phase One/P65/CF027310.IIQ/threads:8/real_time_pvalue 0.0232 0.0000 U Test, Repetitions: 25 vs 25 Phase One/P65/CF027310.IIQ/threads:8/real_time_mean -0.0047 +0.0039 114 113 28 28 Phase One/P65/CF027310.IIQ/threads:8/real_time_median -0.0050 +0.0037 114 113 28 28 Phase One/P65/CF027310.IIQ/threads:8/real_time_stddev -0.0599 -0.2683 1 1 0 0 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_mean +0.0206 +0.0207 405 414 405 414 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_median +0.0204 +0.0205 405 414 405 414 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_stddev +0.2155 +0.2212 1 1 1 1 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_mean -0.0109 -0.0108 147 145 147 145 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_median -0.0104 -0.0103 147 145 147 145 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_stddev -0.4919 -0.4800 0 0 0 0 Samsung/NX3000/_3184416.SRW/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Samsung/NX3000/_3184416.SRW/threads:8/real_time_mean -0.0149 -0.0147 220 217 220 217 Samsung/NX3000/_3184416.SRW/threads:8/real_time_median -0.0173 -0.0169 221 217 220 217 Samsung/NX3000/_3184416.SRW/threads:8/real_time_stddev +1.0337 +1.0341 1 3 1 3 Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_pvalue 0.0001 0.0001 U Test, Repetitions: 25 vs 25 
Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_mean -0.0019 -0.0019 194 193 194 193 Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_median -0.0021 -0.0021 194 193 194 193 Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_stddev -0.4441 -0.4282 0 0 0 0 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_pvalue 0.0000 0.4263 U Test, Repetitions: 25 vs 25 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_mean +0.0258 -0.0006 81 83 19 19 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_median +0.0235 -0.0011 81 82 19 19 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_stddev +0.1634 +0.1070 1 1 0 0 ``` {F7443905} If we look at the `_mean`s, the time column, the biggest win is `-7.7%` (`Canon/EOS 5D Mark II/10.canon.sraw2.cr2`), and the biggest loss is `+3.3%` (`Panasonic/DC-GH5S/P1022085.RW2`); Overall: mean `-0.7436%`, median `-0.23%`, `cbrt(sum(time^3))` = `-8.73%` Looks good so far I'd say. llvm-exegesis details: {F7371117} {F7371125} {F7371128} {F7371144} {F7371158} Reviewers: craig.topper, RKSimon, andreadb, courbet, avt77, spatel, GGanesh Reviewed By: andreadb Subscribers: javed.absar, gbedwell, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D52779 llvm-svn: 345463	2018-10-27 20:46:30 +00:00
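The summary figures quoted in the entry above can be recomputed from the `_mean` rows of the Time column. The sketch below is an illustration only and not part of the commit. It assumes that `time` in `cbrt(sum(time^3))` means each file's relative change expressed in percent; the names `mean_time_delta` and `signed_cbrt` are hypothetical.

```python
import math
import statistics

# Relative change of the `_mean` Time column, one value per raw file,
# copied from the benchmark diff (negative means the new build is faster).
mean_time_delta = [
    -0.0607, -0.0770, -0.0271, -0.0065, +0.0006,  # Canon 5D Mark II (x2), 5DS (x3)
    -0.0015, -0.0008, +0.0002, +0.0004,           # Canon 10D, 40D, 77D, PowerShot G1
    -0.0092, +0.0015, -0.0209, -0.0098,           # Fujifilm (x2), GoPro, Kodak
    -0.0237, -0.0014, -0.0080, +0.0132, +0.0331,  # Nikon, Olympus, Panasonic (x3)
    -0.0023, -0.0047, +0.0206, -0.0109, -0.0149,  # Pentax, Phase One, Samsung (x3)
    -0.0019, +0.0258,                             # Sony (x2)
]


def signed_cbrt(value):
    """Real cube root that keeps the sign of its argument."""
    return math.copysign(abs(value) ** (1.0 / 3.0), value)


# Assumption: the aggregate is taken over percentages, not raw fractions.
percent = [100.0 * d for d in mean_time_delta]
mean_pct = statistics.mean(percent)
median_pct = statistics.median(percent)
cubic_pct = signed_cbrt(sum(p ** 3 for p in percent))

print(f"mean {mean_pct:.4f}%  median {median_pct:.2f}%  cbrt(sum^3) {cubic_pct:.2f}%")
```

Under that assumption the three values should land on the figures quoted in the message. The cube-based aggregate weights large swings far more heavily than small ones, so it is dominated by the two Canon sRAW files.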
Simon Pilgrim	a365719a24	[X86][SSE] LowerVSELECT - pull out repeated getOperand(). NFCI. llvm-svn: 345458	2018-10-27 18:37:59 +00:00
Simon Pilgrim	88116e905e	Revert rL345395: [X86][SSE] Move 2-input limit up from getFauxShuffleMask to resolveTargetShuffleInputs Makes no difference to actual shuffle decoding yet, but merges all the existing limits in one place for when proper support is fixed. ........ It's been reported that this is causing out-of-trunk failures. llvm-svn: 345451	2018-10-27 07:10:48 +00:00
Sanjin Sijaric	96f2ea3dd4	[ARM64][Windows] MCLayer support for exception handling Add ARM64 unwind codes to MCLayer, as well as SEH directives that will be emitted by the frame lowering patch to follow. We only emit unwind codes into object files for now. Differential Revision: https://reviews.llvm.org/D50166 llvm-svn: 345450	2018-10-27 06:13:06 +00:00
Craig Topper	4b89647b79	[X86] Add some isel patterns for scalar_to_vector/extract_vector_element that use the avx512 extended register classes when they are available. llvm-svn: 345448	2018-10-27 05:35:20 +00:00
Alina Sbirlea	bdb16f0519	Revert r345169 [along with its llvm counterpart r345170] as it makes Halide builds timeout. llvm-svn: 345447	2018-10-27 04:51:12 +00:00
Brendon Cahoon	aa783dfd6e	[Hexagon] Add missing assignment to Itinerary in Call_nr The class definition for Call_nr has the itinerary as a parameter, but the value is never assigned to the Itinerary field for the instruction. This means the compiler is unable to schedule and packetize the instruction correctly because these instructions will not have any resource descriptions. I don't have a specific test case, but the ps_call_nr.ll test failed with a proposed patch. llvm-svn: 345442	2018-10-27 00:50:29 +00:00
Reid Kleckner	98d880fbd7	[Spectre] Fix MIR verifier errors in retpoline thunks Summary: The main challenge here is that X86InstrInfo::AnalyzeBranch doesn't understand the way we're using a CALL instruction as a branch, so we can't list the CallTarget MBB as a successor of the entry block. If we don't list it as a successor, then the AsmPrinter doesn't print a label for the MBB. Fix the issue by inserting our own label at the beginning of the call target block. We can rely on the AsmPrinter to always emit it, even though the block appears to be unreachable, but address-taken. Fixes PR38391. Reviewers: thegameg, chandlerc, echristo Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D53653 llvm-svn: 345426	2018-10-26 20:26:36 +00:00
Eli Friedman	2ac1162917	[ARM] Make InstrEmitter mark CPSR defs dead for Thumb1. The "dead" markings allow existing target-independent optimizations, like MachineSink, to trigger more frequently. The CPSR defs would have eventually been marked dead by LiveVariables, so this only affects optimizations before regalloc. The ARMBaseInstrInfo.cpp change is fixing a bug which is only visible with this change: the transform adds a use to an otherwise dead def of CPSR. This is covered by existing regression tests. thumb2-tbh.ll breaks for Thumb1 due to MachineLICM changing the generated code; I'll fix it in D53452. Differential Revision: https://reviews.llvm.org/D53453 llvm-svn: 345420	2018-10-26 19:32:24 +00:00
Lei Huang	de20843f6f	[PowerPC] Improve BUILD_VECTOR of 4 i32s Currently, for this node: vector int test(int a, int b, int c, int d) { return (vector int) { a, b, c, d }; } we get this on Power9: mtvsrdd 34, 5, 3 mtvsrdd 35, 6, 4 vmrgow 2, 3, 2 and this on Power8: mtvsrwz 0, 3 mtvsrwz 1, 5 mtvsrwz 2, 4 mtvsrwz 3, 6 xxmrghd 34, 1, 0 xxmrghd 35, 3, 2 vmrgow 2, 3, 2 This can be improved to this on LE Power9: rldimi 3, 4, 32, 0 rldimi 5, 6, 32, 0 mtvsrdd 34, 5, 3 and this on LE Power8 rldimi 3, 4, 32, 0 rldimi 5, 6, 32, 0 mtvsrd 34, 3 mtvsrd 35, 5 xxpermdi 34, 35, 34, 0 This patch updates the TD pattern to generate the optimized sequence for both Power8 and Power9 on LE and BE. Differential Revision: https://reviews.llvm.org/D53494 llvm-svn: 345414	2018-10-26 18:09:36 +00:00
Craig Topper	8315d9990c	[X86] Stop promoting vector and/or/xor/andn to vXi64. These promotions add additional bitcasts to the SelectionDAG that can pessimize computeKnownBits/computeNumSignBits. It also seems to interfere with broadcast formation. This patch removes the promotion and adds isel patterns instead. The increased table size is more than I would like, but hopefully we can find some canonicalizations or other tricks to start pruning out patterns going forward. Differential Revision: https://reviews.llvm.org/D53268 llvm-svn: 345408	2018-10-26 17:21:26 +00:00
Simon Pilgrim	5d1be4f8d4	[X86][SSE] Move 2-input limit up from getFauxShuffleMask to resolveTargetShuffleInputs Makes no difference to actual shuffle decoding yet, but merges all the existing limits in one place for when proper support is fixed. llvm-svn: 345395	2018-10-26 15:19:02 +00:00
Sanjay Patel	6b40768f5a	[x86] commute blendvb with constant condition op to allow load folding This is a narrow fix for 1 of the problems mentioned in PR27780: https://bugs.llvm.org/show_bug.cgi?id=27780 I looked at more general solutions, but it's a mess. We canonicalize shuffle masks based on the number of elements accessed from each operand, and that's not optional. If you remove that, we'll crash because we fail to match isel patterns. So I'm waiting until we're sure that we have blendvb with constant condition and then commuting based on the load potential. Other cases like blend-with-immediate are already handled elsewhere, so this is probably not a common problem anyway. I didn't use "MayFoldLoad" because that checks for one-use and in these cases, we've screwed that up by creating a temporary PSHUFB using these operands that we're counting on to be killed later. Undoing that didn't look like a simple task because it's intertwined with determining if we actually use both operands of the shuffle or not. Differential Revision: https://reviews.llvm.org/D53737 llvm-svn: 345390	2018-10-26 14:58:13 +00:00
Simon Pilgrim	7575c6d01b	[X86] Use existing pulled out VT variables. NFCI. llvm-svn: 345388	2018-10-26 14:39:28 +00:00
Scott Linder	11ef7984b0	[AMDGPU] Add a pass to promote bitcast calls AMDGPU currently only supports direct calls, but at lower optimisation levels it fails to lower statically direct calls which appear indirect due to a bitcast. Add a pass to visit all CallSites and use CallPromotionUtils to "devirtualize" calls. Differential Revision: https://reviews.llvm.org/D52741 llvm-svn: 345382	2018-10-26 13:18:36 +00:00
Fangrui Song	065c3610ad	[SystemZ] Fix -Wcovered-switch-default as coding standard regulates llvm-svn: 345369	2018-10-26 06:59:08 +00:00
Li Jia He	f6fb752fe8	[PowerPC] Fix some missed optimization opportunities in combineSetCC When both operands are bool, short, int, long, or long long, add the following optimizations: 1. 0-x == y --> x+y == 0 2. 0-x != y --> x+y != 0 Review: nemanjai Differential Revision: https://reviews.llvm.org/D53360 llvm-svn: 345366	2018-10-26 06:48:53 +00:00
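The two rewrites in the entry above hold under two's-complement wraparound, because `0-x == y` and `x+y == 0` both state that `x + y` is a multiple of 2^N. A minimal exhaustive check for an 8-bit width follows, as an illustration only; the helper names are hypothetical and this is not the PowerPC combine itself.

```python
BITS = 8
MASK = (1 << BITS) - 1


def wrap(value):
    """Truncate to BITS bits, mimicking fixed-width integer wraparound."""
    return value & MASK


def original_eq(x, y):
    """The comparison as written in the source: 0 - x == y."""
    return wrap(0 - x) == y


def combined_eq(x, y):
    """The rewritten comparison: x + y == 0."""
    return wrap(x + y) == 0


# Compare both forms for every pair of 8-bit operands.
mismatches = [
    (x, y)
    for x in range(1 << BITS)
    for y in range(1 << BITS)
    if original_eq(x, y) != combined_eq(x, y)
]
print("mismatches:", len(mismatches))
```

The `!=` form follows by negating both sides of the same equivalence.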
Nemanja Ivanovic	6a74bfba20	[PowerPC] Keep vector int to fp conversions in vector domain At present a v2i16 -> v2f64 convert is implemented by extracts to scalar, scalar converts, and merge back into a vector. Use vector converts instead, with the int data permuted into the proper position and extended if necessary. Patch by RolandF. Differential revision: https://reviews.llvm.org/D53346 llvm-svn: 345361	2018-10-26 03:19:13 +00:00
Fangrui Song	61ea8dae2e	Add dependency from SystemZAsmParser to SystemZAsmPrinter after rL345349 This fixes -DBUILD_SHARED_LIBS=on build. The dependency is similar to that of X86's. llvm-svn: 345358	2018-10-26 03:04:54 +00:00
Vlad Tsyrklevich	21beeb29ea	Revert "[AArch64] Create proper memoperand for multi-vector stores" This reverts commit r345315, it was causing test failures on sanitizer-x86_64-linux-fast. llvm-svn: 345356	2018-10-26 02:00:14 +00:00
Jonas Paulsson	dda46307c2	[SystemZ] Implement SystemZOperand::print() SystemZAsmParser can now handle -debug by printing the operands neatly to the output stream. Before this patch this led to an llvm_unreachable(). It seems that now '-mllvm -debug' does not cause any crashes anywhere (at least not on SPEC). Review: Ulrich Weigand https://reviews.llvm.org/D53328 llvm-svn: 345349	2018-10-26 00:36:00 +00:00
Jonas Paulsson	e2c5cbc164	[SystemZ] Pass the DAG pointer from SystemZAddressingMode::dump(). In order to print the IR slot number for the memory operand, the DAG pointer must be passed to SDNode::dump(). The isel-debug.ll test updated to also check for the IR Value reference being printed correctly. Review: Ulrich Weigand https://reviews.llvm.org/D53333 llvm-svn: 345347	2018-10-26 00:02:33 +00:00
Heejin Ahn	24faf859e5	Reland "[WebAssembly] LSDA info generation" Summary: This adds support for LSDA (exception table) generation for wasm EH. Wasm EH mostly follows the structure of Itanium-style exception tables, with one exception: a call site table entry in wasm EH corresponds to not a call site but a landing pad. In wasm EH, the VM is responsible for stack unwinding. After an exception occurs and the stack is unwound, the control flow is transferred to wasm 'catch' instruction by the VM, after which the personality function is called from the compiler-generated code. (Refer to WasmEHPrepare pass for more information on this part.) This patch: - Changes wasm.landingpad.index intrinsic to take a token argument, to make this 1:1 match with a catchpad instruction - Stores landingpad index info and catch type info in MachineFunction before instruction selection - Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an exception table - Adds WasmException class with overridden methods for table generation - Adds support for LSDA section in Wasm object writer Reviewers: dschuff, sbc100, rnk Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52748 llvm-svn: 345345	2018-10-25 23:55:10 +00:00
Heejin Ahn	3103d3dcd1	[WebAssembly] Support EH instructions in InstPrinter Summary: This adds support for exception handling instructions to InstPrinter. Reviewers: dschuff, aardappel Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53634 llvm-svn: 345343	2018-10-25 23:45:48 +00:00
Bryan Chan	f0923f16f8	[AArch64] Implement FP16FML intrinsics Add LLVM intrinsics for the ARMv8.2-A FP16FML vector-form instructions. Add a DAG pattern to define the indexed-form intrinsics in terms of the vector-form ones, similarly to how the Dot Product intrinsics were implemented. Based on a patch by Gao Yiling. Differential Revision: https://reviews.llvm.org/D53632 llvm-svn: 345337	2018-10-25 23:36:41 +00:00
Heejin Ahn	1d13e6be37	Address comments - Add llvm-mc test case (and delete the old one) - Change report_fatal_error to assertions llvm-svn: 345334	2018-10-25 23:35:14 +00:00
Heejin Ahn	1147d91402	[WebAssembly] Error out when block/loop markers mismatch Summary: Currently InstPrinter ignores if there are mismatches between block/loop and end markers by skipping the case if ControlFlowStack is empty. I guess it is better to explicitly error out in this case, because this signals invalid input. Reviewers: aardappel Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53620 llvm-svn: 345333	2018-10-25 23:35:13 +00:00
Jonas Paulsson	2b280ea604	[SystemZ] NFC reformatting in SystemZTargetTransformInfo.cpp Some lines more than 80 characters long reformatted. llvm-svn: 345331	2018-10-25 22:53:27 +00:00
Jonas Paulsson	b7caa809e1	[SystemZ] Improve getMemoryOpCost() to find foldable loads that are converted. The SystemZ backend can do arithmetic of memory by loading and then extending one of the operands. Similarly, a load + truncate can be folded into an operand. This patch improves the SystemZ TTI cost function to recognize this. Review: Ulrich Weigand https://reviews.llvm.org/D52692 llvm-svn: 345327	2018-10-25 22:28:25 +00:00
Jonas Paulsson	4645711a8d	[SystemZ] Improve handling and cost estimates of vector integer div/rem Enable the DAG optimization that converts vector div/rem with constants into multiply+shifts sequences by expanding them early. This is needed since ISD::SMUL_LOHI is 'Custom' lowered on SystemZ, and will therefore not be available to BuildSDIV after legalization. Better cost values for these instructions based on how they will be implemented (a constant divisor is cheaper). Review: Ulrich Weigand https://reviews.llvm.org/D53196 llvm-svn: 345321	2018-10-25 21:47:22 +00:00
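The multiply+shift replacement for division by a constant mentioned in the entry above can be illustrated with unsigned 8-bit values. This sketch uses the textbook round-up magic-number construction, which is an assumption here and not necessarily the exact sequence BuildSDIV/BuildUDIV emit; `magic_for` and `div_by_const` are hypothetical names.

```python
BITS = 8


def magic_for(divisor):
    """Return (multiplier, shift) such that (x * multiplier) >> shift == x // divisor
    for every unsigned BITS-bit x."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    # shift = BITS + ceil(log2(divisor))
    shift = BITS + (divisor - 1).bit_length()
    # multiplier = ceil(2**shift / divisor)
    multiplier = ((1 << shift) + divisor - 1) // divisor
    return multiplier, shift


def div_by_const(x, divisor):
    """Divide without a division instruction: one multiply and one shift."""
    multiplier, shift = magic_for(divisor)
    return (x * multiplier) >> shift


print(magic_for(3), div_by_const(200, 3))
```

The multiplier may need one bit more than the element width, which Python's unbounded integers hide; a fixed-width lowering has to account for that with extra fixup operations.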
Craig Topper	813064bf4d	[X86] Change X86 backend to look for 'min-legal-vector-width' attribute instead of 'required-vector-width' when determining whether 512-bit vectors should be legal. The required-vector-width attribute was only used for backend testing and has never been generated by clang. I believe clang is now generating min-legal-vector-width for vector uses in user code. With this I believe passing -mprefer-vector-width=256 to clang should prevent use of zmm registers in the generated assembly unless the user used a 512-bit intrinsic in their source code. llvm-svn: 345317	2018-10-25 21:16:06 +00:00
David Greene	53e869da7d	[AArch64] Create proper memoperand for multi-vector stores Include all of the store's source vector operands when creating the MachineMemOperand. Previously, we were missing the first operand, making the store size seem smaller than it really is. Differential Revision: https://reviews.llvm.org/D52816 llvm-svn: 345315	2018-10-25 21:10:39 +00:00
Thomas Lively	0aad98fd07	[WebAssembly] Use target-independent saturating add Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53721 llvm-svn: 345299	2018-10-25 19:06:13 +00:00
Volkan Keles	f87473fe1c	[GISel] LegalizerInfo: Rename MemDesc::Size to SizeInBits to make the value clearer Requested in D53679. llvm-svn: 345288	2018-10-25 17:37:07 +00:00
Craig Topper	c10de9a37a	[X86] Remove ProcIntelKNL and replace with a SlowPMADDWD flag to use in the one place it was checked. llvm-svn: 345286	2018-10-25 17:29:00 +00:00
Craig Topper	5d787ac4be	[X86] Remove some uarch tuning flags from KNL that look to have been inherited from SNB/IVB incorrectly KNL is based on a modified Silvermont core so I don't think these features apply. I think the LEA flag is probably also wrong, but I'm less sure as I barely understand the 3 LEA flags we have currently. Differential Revision: https://reviews.llvm.org/D53671 llvm-svn: 345285	2018-10-25 17:28:57 +00:00
Volkan Keles	3a103b1d25	[AArch64][GlobalISel] Fix the LegalityPredicate for lowerIf for G_LOAD/G_STORE Summary: Currently, Legalizer is trying to lower G_LOAD with a vector type that has more than two elements due to the incorrect LegalityPredicate. This patch fixes the issue by removing the multiplication by 8 as `MemDesc.Size` already contains the size in bits. Reviewers: dsanders, aemerson Reviewed By: dsanders Subscribers: rovka, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D53679 llvm-svn: 345282	2018-10-25 17:23:25 +00:00
Evandro Menezes	b53cf99388	[AArch64] Refactor Exynos feature sets (NFC) llvm-svn: 345279	2018-10-25 16:45:46 +00:00
John Brawn	958865202d	[AArch64] Add EXT patterns for 64-bit EXT of a subvector of a 128-bit vector If we have a 64-bit EXT where one of the operands is a subvector of a 128-bit vector then in some cases we can eliminate an extract_subvector by converting to a 128-bit EXT of the 128-bit vector. Differential Revision: https://reviews.llvm.org/D53582 llvm-svn: 345275	2018-10-25 15:31:51 +00:00
Sam Parker	a16667e79b	[ARM] Use Cortex-A57 sched model for Cortex-A72 This mirrors what we already do for AArch64 as the cores are similar. As discussed in the review, enabling the machine scheduler causes more variations in performance changes so it is not enabled for now. This patch improves LNT scores by a geomean of 1.57% at -O3. Differential Revision: https://reviews.llvm.org/D53562 llvm-svn: 345272	2018-10-25 15:08:29 +00:00
John Brawn	b8e7887f33	[AArch64] Refactor definition of EXT patterns to use a multiclass Using a multiclass reduces duplication, and makes it easier to add new patterns later. This refactoring does add some new patterns, but as far as I can tell there's no IR that will end up triggering them so this is effectively NFC. Differential Revision: https://reviews.llvm.org/D53580 llvm-svn: 345271	2018-10-25 15:00:10 +00:00
John Brawn	49e61d90ca	[AArch64] Do 64-bit vector move of 0 and -1 by extracting from the 128-bit move Currently a vector move of 0 or -1 will use different instructions depending on the size of the vector. Using a single instruction (the 128-bit one) for both gives more opportunity for Machine CSE to eliminate instructions. Differential Revision: https://reviews.llvm.org/D53579 llvm-svn: 345270	2018-10-25 14:56:48 +00:00
Alexey Bataev	0f2fe4f135	[DEBUG_INFO][NVPTX]Fix processing of DBG_VALUES. Summary: If the instruction in the eliminateFrameIndex function is a DBG_VALUE instruction, it requires special processing. The frame register is set to VRFrame and the offset is based on the object offset. The code is similar to the code used in lib/CodeGen/PrologEpilogInserter.cpp. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D53657 llvm-svn: 345269	2018-10-25 14:27:27 +00:00
Amara Emerson	cbd86d8429	[GlobalISel] Use the target preferred type for G_EXTRACT_VECTOR_ELT index. Allows for better imported pattern re-use. llvm-svn: 345265	2018-10-25 14:04:54 +00:00
Simon Pilgrim	53e8e145e9	[CostModel][X86] Add realistic vXi64 uitofp vXf64 costs Match codegen improvements from D53649/rL345256 llvm-svn: 345263	2018-10-25 13:06:20 +00:00
Alex Bradbury	74d4931da2	[RISCV] Use PatFrags for variable shift patterns This follows SystemZ and I think is cleaner vs the multiclass. llvm-svn: 345262	2018-10-25 12:45:20 +00:00
Simon Pilgrim	0573b8d8b6	[CostModel][X86] Add realistic i64 uitofp f64 scalar costs llvm-svn: 345261	2018-10-25 12:42:10 +00:00
Simon Pilgrim	071e82218f	[TTI] Add generic SK_Broadcast shuffle costs I noticed while fixing PR39368 that we don't have generic shuffle costs for broadcast style shuffles. This patch adds SK_BROADCAST handling, but exposes ARM/AARCH64 lack of handling of this type, which I've added a fix for at the same time. Differential Revision: https://reviews.llvm.org/D53570 llvm-svn: 345253	2018-10-25 10:52:36 +00:00
Clement Courbet	41c8af3924	[MCSched] Bind PFM Counters to the CPUs instead of the SchedModel. Summary: The pfm counters are now in the ExegesisTarget rather than the MCSchedModel (PR39165). This also compresses the pfm counter tables (PR37068). Reviewers: RKSimon, gchatelet Subscribers: mgrang, llvm-commits Differential Revision: https://reviews.llvm.org/D52932 llvm-svn: 345243	2018-10-25 07:44:01 +00:00
Craig Topper	7ae43cad65	[X86] Don't use the OriginalDemandedBits to calculate the DemandedMask for PMULUDQ/PMULDQ inputs. Multiply is a complex operation so just because some bit of the output isn't used doesn't mean that bit of the input isn't used. We might be able to bound it, but it will require some more thought. llvm-svn: 345241	2018-10-25 07:00:09 +00:00
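A small model of why demanded output bits cannot be mapped one-to-one onto input bits for a multiply. It assumes the usual PMULUDQ lane semantics (the low 32 bits of each 64-bit lane are multiplied as unsigned values to give a 64-bit product); `pmuludq_lane` is a hypothetical helper, not an LLVM function.

```python
LOW32 = 0xFFFFFFFF


def pmuludq_lane(a, b):
    """Model one 64-bit lane: unsigned product of the low 32 bits of a and b."""
    return (a & LOW32) * (b & LOW32)


demanded = 1 << 4  # suppose only bit 4 of the result is demanded
b = 16

with_bit0 = pmuludq_lane(0b1, b) & demanded
without_bit0 = pmuludq_lane(0b0, b) & demanded

# Bit 0 of `a` is not at the demanded bit position, yet clearing it
# changes the demanded output bit.
print(bool(with_bit0), bool(without_bit0))
```

Reusing the output's demanded mask for the inputs would treat bit 0 of `a` as dead here, which is the kind of miscompile the commit avoids.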
Craig Topper	eaa1cf5b57	[X86] Fix typo in comment. NFC llvm-svn: 345236	2018-10-25 05:00:20 +00:00
Thomas Lively	325c9c5e84	[WebAssembly] Set LoadExt and TruncStore actions for SIMD types Summary: Fixes part of the problem reported in bug 39275. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits, alexcrichton Differential Revision: https://reviews.llvm.org/D53542 llvm-svn: 345230	2018-10-25 01:46:07 +00:00
Heejin Ahn	ac764aa88e	[WebAssembly] Fix immediate of rethrow when throwing to caller Summary: Currently when assigning depths 'rethrow' does not take the whole control flow stack into account but only considers EH pad stacks. When assigning depth immediates to rethrows, in normal cases it is done correctly but when a rethrow instruction throws up to a caller, i.e., we convert a pseudo RETHROW_TO_CALLER instruction to a rethrow, it mistakenly computes the whole stack depth. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53619 llvm-svn: 345223	2018-10-24 23:31:24 +00:00
Thomas Lively	ed9513472c	[WebAssembly] Retain shuffle types during custom lowering Summary: Changing the node type in lowering was violating assumptions made in the DAG combiner, so don't change the node type any more. This fixes one of the issues reported in bug 39275. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits, alexcrichton Differential Revision: https://reviews.llvm.org/D53537 llvm-svn: 345221	2018-10-24 23:27:40 +00:00
Reid Kleckner	49a24278ba	[ELF] Fix large code model MIR verifier errors Instead of using the MOVGOT64r pseudo, use the existing MO_PIC_BASE_OFFSET support on symbol operands. Now I don't have to create a "scratch register operand" for the pseudo to use, and the register allocator can make better decisions. Fixes some X86 verifier errors tracked in PR27481. llvm-svn: 345219	2018-10-24 22:57:28 +00:00
Thomas Lively	30f1d69115	[NFC] Rename minnan and maxnan to minimum and maximum Summary: Changes all uses of minnan/maxnan to minimum/maximum globally. These names emphasize that the semantic difference between these operations is more than just NaN-propagation. Reviewers: arsenm, aheejin, dschuff, javed.absar Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D53112 llvm-svn: 345218	2018-10-24 22:49:55 +00:00
Evandro Menezes	096e2497b5	[AArch64] Refactor Exynos machine model Effectively, NFC. llvm-svn: 345201	2018-10-24 21:40:43 +00:00
Reid Kleckner	9c5bda652c	[X86] Add SP to tailcall register class to fix verifier error It's possible to do a tail call to a stack argument. LLVM already calculates the right stack offset to call through. Fixes the sibcall and musttail* verifier failures tracked at PR27481. llvm-svn: 345197	2018-10-24 21:09:34 +00:00
Reid Kleckner	953bdce68d	[MC] Separate masm integer literal lexer support from inline asm Summary: This renames the IsParsingMSInlineAsm member variable of AsmLexer to LexMasmIntegers and moves it up to MCAsmLexer. This is the only behavior controlled by that variable. I added a public setter, so that it can be set from outside or from the llvm-mc command line. We may need to arrange things so that users can get this behavior from clang, but that's future work. I also put additional hex literal lexing functionality under this flag to fix PR32973. It appears that this hex literal parsing wasn't intended to be enabled in non-masm-style blocks. Now, masm integers (0b1101 and 0ABCh) work in __asm blocks from clang, but 0b label references work when using .intel_syntax in standalone .s files. However, 0b label references will not work from __asm blocks in clang. They will work from GCC inline asm blocks, which it sounds like is important for Crypto++ as mentioned in PR36144. Essentially, we only lex masm literals for inline asm blobs that use intel syntax. If the .intel_syntax directive is used inside a gnu-style inline asm statement, masm literals will not be lexed, which is compatible with gas and llvm-mc standalone .s assembly. This fixes PR36144 and PR32973. Reviewers: Gerolf, avt77 Subscribers: eraman, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D53535 llvm-svn: 345189	2018-10-24 20:23:57 +00:00
Tim Northover	1c353419ab	AArch64: add a pass to compress jump-table entries when possible. llvm-svn: 345188	2018-10-24 20:19:09 +00:00
Evandro Menezes	769d4cebad	[AArch64] Refactor Exynos machine model (NFC) llvm-svn: 345187	2018-10-24 20:03:24 +00:00
Evandro Menezes	80bc136732	[AArch64] Fix overlapping instructions Fix overlapping instruction descriptions in the machine model for Exynos M3. Effectively, NFC. llvm-svn: 345186	2018-10-24 20:03:20 +00:00
Craig Topper	7bb8c2e6e5	[X86] Explicitly list all KNL features instead of inheriting from IVB. NFC I'm not sure all the microarchitectural tuning flags that have been added to IVBFeatures are relevant for KNL. Separating will allow us to see and audit them. There might even be some simplification opportunities in the Sandy Bridge through Icelake inheritance line without KNL using the same chain. llvm-svn: 345183	2018-10-24 19:24:44 +00:00
Simon Pilgrim	c5bb362b13	[X86][SSE] Add SimplifyDemandedBitsForTargetNode PMULDQ/PMULUDQ handling Add X86 SimplifyDemandedBitsForTargetNode and use it to simplify PMULDQ/PMULUDQ target nodes. This enables us to repeatedly simplify the node's arguments after the previous approach had to be reverted due to PR39398. Differential Revision: https://reviews.llvm.org/D53643 llvm-svn: 345182	2018-10-24 19:11:28 +00:00
Simon Pilgrim	ac84005841	[CostModel][X86] Add vXi8 vector division by constants costs. ISD::MULHS/ISD::MULHU lowering of vXi8 types means we expand these in TargetLowering BuildSDIV/BuildUDIV. llvm-svn: 345175	2018-10-24 18:44:12 +00:00
Peter Collingbourne	4bb928c110	ARM: Use BKPT instead of TRAP to implement llvm.debugtrap. The BKPT instruction is specified to cause a software breakpoint, and at least on Linux results in a SIGTRAP. This makes it more suitable for implementing debugtrap than TRAP (aka UDF #254), which is specified to cause an undefined instruction exception and results in a SIGILL on Linux. Moreover, BKPT is not marked as a terminator, which is not only consistent with the IR instruction but allows the analyzeBlock function to correctly analyze a basic block containing the instruction, which fixes an assertion failure in the machine block placement pass previously triggered by the included test case. Because BKPT is only supported starting with ARMv5T, we continue to use UDF #254 when targeting v4T. Differential Revision: https://reviews.llvm.org/D53614 llvm-svn: 345171	2018-10-24 18:10:38 +00:00
Krzysztof Parzyszek	57b5ac1431	[Hexagon] Flip hexagon-autohvx to be true by default This will allow other generators of LLVM IR to use the auto-vectorizer without having to change that flag. Note: on its own, this patch will enable auto-vectorization on Hexagon in all cases, regardless of the -fvectorize flag. There is a companion clang patch that together with this one forms an NFC for clang users. llvm-svn: 345169	2018-10-24 17:55:13 +00:00
Craig Topper	2417273255	[X86] Bring back the MOV64r0 pseudo instruction This patch brings back the MOV64r0 pseudo instruction for zeroing a 64-bit register. This replaces the SUBREG_TO_REG MOV32r0 sequence we use today. Post register allocation we will rewrite the MOV64r0 to a 32-bit xor with an implicit def of the 64-bit register similar to what we do for the various XMM/YMM/ZMM zeroing pseudos. My main motivation is to enable the spill optimization in foldMemoryOperandImpl, as we were seeing some code that repeatedly did "xor eax, eax; store eax;" to spill several registers with a new xor for each store. With this optimization enabled we get a store of a 0 immediate instead of an xor. Though I admit the ideal solution would be one xor where there are multiple spills. I don't believe we have a test case that shows this optimization in here. I'll see if I can try to reduce one from the code we're looking at. There are definitely some other machine CSE (and maybe other passes) behavior changes exposed by this patch. So it seems like there might be some other deficiencies in SUBREG_TO_REG handling. Differential Revision: https://reviews.llvm.org/D52757 llvm-svn: 345165	2018-10-24 17:32:09 +00:00
Simon Pilgrim	2cce074e8c	[CostModel][X86] Enable non-uniform vector division by constants costs. Non-uniform division/remainder handling was added back at D49248/D50765 - so share the 'mul+sub' costs that already exist for uniform cases. llvm-svn: 345164	2018-10-24 17:30:29 +00:00
Alexey Bataev	c15c853c3a	[DEBUGINFO, NVPTX] Try to pack bytes data into a single string. Summary: If the target does not support `.asciz` and `.ascii` directives, the strings are represented as bytes and each byte is placed on a new line as a separate byte directive `.b8 <data>`. The NVPTX target allows data of the same type to be represented as a vector, with values separated by the `,` symbol: `.b8 <data1>,<data2>,...`. This reduces the size of the final PTX file. The ptxas tool includes PTX files into the resulting binary object, so reducing the size of the PTX file is important. Reviewers: tra, jlebar, echristo Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D45822 llvm-svn: 345142	2018-10-24 14:04:00 +00:00
Tim Renouf	2a1b1d94b6	[AMDGPU] Defined gfx909 Raven Ridge 2 Differential Revision: https://reviews.llvm.org/D53418 Change-Id: Ie3d054f2e956c2768988c0f4c0ffd29a47294eef llvm-svn: 345120	2018-10-24 08:14:07 +00:00
Craig Topper	da54bbf52a	[X86] Correct a bad isel predicate. Though I don't think it can be exposed. The B/W VPTEST instructions are only available with AVX512BW. But lowering should prevent any byte or word elements from getting to isel so this can't be exposed. llvm-svn: 345112	2018-10-24 06:13:36 +00:00
Saleem Abdulrasool	4005f9a860	ARM: handle checking aliases with out-of-bounds GEPs A global alias may use indices which are not considered in bounds. In such a case, accessing the base object will fail as it only peers through inbounds accesses. This pattern is used by the swift compiler to create references to preceding members in the type metadata. This would cause the code generation to fail when targeting a platform that used ELF as the object file format. Be conservative and fail the read-only check if we run into an alias that we cannot peer through. llvm-svn: 345107	2018-10-24 00:00:52 +00:00
Matthias Braun	4f82406c46	SelectionDAG: Reuse bigger sized constants in memset expansion. When implementing memset's today we often see this pattern: $x0 = MOV 0xXYXYXYXYXYXYXYXY store $x0, ... $w1 = MOV 0xXYXYXYXY store $w1, ... We first create a 64bit constant in a 64bit register with all bytes the same and then create a 32bit constant with all bytes the same in a 32bit register. In many targets we could just access the lower byte of the 64bit register instead. - Ideally this would be handled by the ConstantHoist pass but it runs too early when memset isn't expanded yet. - The memset expansion code already had this optimization implemented, however SelectionDAG constant folding would fold the "trunc(bigconstant)" pattern to "smallconstant". - This patch makes the memset expansion mark the constant as Opaque and stop DAGCombiner from constant folding in this situation. (Similar to how ConstantHoisting marks things as Opaque to avoid folding ADD/SUB/etc.) Differential Revision: https://reviews.llvm.org/D53181 llvm-svn: 345102	2018-10-23 23:19:23 +00:00
Simon Pilgrim	b6c57075c0	[X86][SSE] Revert rL343922 combinePMULDQ AddToWorklist (PR39398) We can't add the MULDQ node back to the worklist after the demanded bits change has been committed in case the node has been removed entirely. This will have to wait until we have SimplifyDemandedBitsForTargetNode. llvm-svn: 345070	2018-10-23 19:07:53 +00:00
Roman Lebedev	2fae985793	X86DAGToDAGISel::matchBitExtract(): lambdas can't have default arguments. As reported by ctopper. That is a gcc-only warning at the moment. llvm-svn: 345065	2018-10-23 18:27:10 +00:00
Stefan Pintilie	927e8bf316	[Power9] Add __float128 support in the backend for bitcast to a i128 Add support to allow bit-casting from f128 to i128 and then extracting 64 bits from the result. Differential Revision: https://reviews.llvm.org/D49507 llvm-svn: 345053	2018-10-23 17:11:36 +00:00
Simon Pilgrim	f04a04c2b6	[TTI][X86] Treat SK_Transpose shuffles as SK_PermuteTwoSrc - there's no difference in lowering. llvm-svn: 345048	2018-10-23 16:45:26 +00:00
Sanjay Patel	47a52a0521	[WebAssembly] use 'match' to simplify code; NFC Vector types are not possible here because this code explicitly checks for a scalar type, but this is another step towards completely removing the fake binop queries for not/neg/fneg. llvm-svn: 345043	2018-10-23 16:05:09 +00:00
Roman Lebedev	06e4db07af	Experimental re-land of [X86][BMI1] X86DAGToDAGISel: select BEXTR from x << (32 - y) >> (32 - y) pattern This initially landed in rL345014, but was reverted in rL345017 due to sanitizer-x86_64-linux-fast buildbot failure in check-lld (ELF/relocatable-versioned.s) test. While i'm not yet quite sure what is the problem, one obvious thing here is that extra truncation roundtrip. Maybe that's it? If not, will re-revert. Differential Revision: https://reviews.llvm.org/D53521 llvm-svn: 345027	2018-10-23 13:19:31 +00:00
Simon Pilgrim	f85ee9f8b4	[X86][SSE] Update raw mask shuffle decoders to handle UNDEF mask elts Matches the approach taken in the constant pool shuffle decoders, and uses an UndefElts mask instead of uint64_t(-1) raw mask values, which doesn't work safely for i32/i64 shuffle mask sizes (as the -1 value is legal). This allows us to remove the constant pool shuffle decoders from most of the getTargetShuffleMask variable shuffle cases (X86ISD::VPERMV3 will be handled in a future commit). llvm-svn: 345018	2018-10-23 11:33:38 +00:00
Roman Lebedev	c29dbbdb10	Revert "[X86][BMI1] X86DAGToDAGISel: select BEXTR from x << (32 - y) >> (32 - y) pattern" Seems to be breaking sanitizer-x86_64-linux-fast buildbot, the ELF/relocatable-versioned.s test: ==17758==MemorySanitizer CHECK failed: /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cc:191 "((kBlockMagic)) == ((((u64*)addr)[0]))" (0x6a6cb03abcebc041, 0x0) #0 0x59716b in MsanCheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/msan/msan.cc:393 #1 0x586635 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_termination.cc:79 #2 0x57d5ff in __sanitizer::InternalFree(void*, __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<__sanitizer::AP32> >*) /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cc:191 #3 0x7fc21b24193f (/lib/x86_64-linux-gnu/libc.so.6+0x3593f) #4 0x7fc21b241999 in exit (/lib/x86_64-linux-gnu/libc.so.6+0x35999) #5 0x7fc21b22c2e7 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e7) #6 0x57c039 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/lld+0x57c039) This reverts commit r345014. llvm-svn: 345017	2018-10-23 10:34:57 +00:00
Simon Pilgrim	816e57be35	[TTI] Add generic cost handling of SK_Reverse shuffles These can be treated as a general permute. This required a fix for missing reverse patterns on ARM llvm-svn: 345015	2018-10-23 09:42:10 +00:00
Roman Lebedev	1c95b2f779	[X86][BMI1] X86DAGToDAGISel: select BEXTR from x << (32 - y) >> (32 - y) pattern Summary: Continuation of D52348. We also get the `c) x & (-1 >> (32 - y))` pattern here, because of the D48768. I will add extra-uses into those tests and follow-up with a patch to handle those patterns too. Reviewers: RKSimon, craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53521 llvm-svn: 345014	2018-10-23 09:08:44 +00:00
Heejin Ahn	a40303aa03	[WebAssembly] Fix assembly printing of br_table Summary: In `br_table`'s stack version asm string, `\t` was missing. Reviewers: aardappel Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53516 llvm-svn: 344981	2018-10-23 00:28:14 +00:00
Saleem Abdulrasool	96cd3cc312	X86: fix a comment copy-paste issue (NFC) The comment was copy-pasted but not updated. NFC. llvm-svn: 344973	2018-10-22 23:34:24 +00:00
Craig Topper	96889b8b96	[X86] Remove unused entries from the X86ProcFamily enum. Add a note to discourage creation of new enum entries. As we've learned multiple times, a coarse grained enum like this is not scalable and we should be migrating away from it. llvm-svn: 344972	2018-10-22 23:14:55 +00:00
Matthias Braun	a0beeffeed	X86: Do not optimize branches with undef eflags inputs analyzeBranch()/insertBranch() etc. do not properly deal with an undef flag on the eflags input and used to produce invalid MIR. I don't see this ever affecting real world inputs (I don't think it is possible to produce undef flags with llvm IR), so I simply changed the code to bail out in this case. rdar://42122367 llvm-svn: 344970	2018-10-22 22:52:23 +00:00
Craig Topper	c8e183f9ee	Recommit r344877 "[X86] Stop promoting integer loads to vXi64" I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile. Original commit message: Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast. I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping. I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size. I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D53306 llvm-svn: 344965	2018-10-22 22:14:05 +00:00
Thomas Lively	c63b5fcb2a	[WebAssembly][NFC] Remove WebAssemblyStackifier TableGen backend Summary: Replace its functionality with a TableGen InstrInfo relational instruction mapping. Although arguably more complex than the TableGen backend, the relational mapping is a smaller maintenance burden than a TableGen backend. Reviewers: aardappel, aheejin, dschuff Subscribers: mgorny, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53307 llvm-svn: 344962	2018-10-22 21:55:26 +00:00
Tim Northover	a23c12a627	X86: add alias for pushfw/popfw in Intel mode A while ago we changed pushf and popf in Intel mode to generate pushfq and popfq. Unfortunately that left us with no way to get the 16-bit encoding in Intel mode so this patch adds pushfw and popfw as aliases there. llvm-svn: 344949	2018-10-22 20:38:13 +00:00
Simon Pilgrim	3b91e9676b	Revert rL344931 from llvm/trunk: [X86][SSE] getTargetShuffleMaskIndices - allow opt-in support for whole undef shuffle mask elements We can't safely assume that certain RawMask entries are UNDEF as most variable shuffles ignore non-index bits - PSHUFB only works on i8 elts so it'd be safe to use but I'm intending to come up with an alternative approach that works for all. ........ Enable this for PSHUFB constant mask decoding and remove the ConstantPool DecodePSHUFBMask llvm-svn: 344937	2018-10-22 19:01:25 +00:00
Simon Pilgrim	794f85cd93	Revert rL344933 from llvm/trunk: [X86][SSE] Tidyup DecodeVPERMILPMask shuffle mask decoding We can't safely assume that certain RawMask entries are UNDEF as most variable shuffles ignore non-index bits. ........ Add support for UNDEF raw mask elements and remove the ConstantPool DecodeVPERMILPMask usage in X86ISelLowering.cpp llvm-svn: 344936	2018-10-22 18:58:32 +00:00
Simon Pilgrim	476c9f42fc	[X86][SSE] Tidyup DecodeVPERMILPMask shuffle mask decoding Add support for UNDEF raw mask elements and remove the ConstantPool DecodeVPERMILPMask usage in X86ISelLowering.cpp llvm-svn: 344933	2018-10-22 18:35:13 +00:00
Simon Pilgrim	3521367ff3	[X86][SSE] getTargetShuffleMaskIndices - allow opt-in support for whole undef shuffle mask elements Enable this for PSHUFB constant mask decoding and remove the ConstantPool DecodePSHUFBMask llvm-svn: 344931	2018-10-22 18:09:02 +00:00
Simon Pilgrim	5dff767c25	[X86] getTargetConstantBitsFromNode - handle extraction from larger constant pool entries First step towards removing X86ShuffleDecodeConstantPool usage from X86ISelLowering.cpp llvm-svn: 344924	2018-10-22 17:43:33 +00:00
Craig Topper	8d8dcfe690	Revert r344877 "[X86] Stop promoting integer loads to vXi64" Sam McCall reported miscompiles in some tensorflow code. Reverting while I try to figure out. llvm-svn: 344921	2018-10-22 16:59:24 +00:00
Matt Arsenault	687ec75d10	DAG: Change behavior of fminnum/fmaxnum nodes Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit. llvm-svn: 344914	2018-10-22 16:27:27 +00:00
Simon Pilgrim	6f5cd7c67f	[X86][SSE] getTargetShuffleMask - pull out repeated shuffle mask element size. NFCI. llvm-svn: 344910	2018-10-22 15:33:30 +00:00
Roman Lebedev	898808504d	[X86] X86DAGToDAGISel: handle BZHI selection too, not just BEXTR. Summary: As discussed in D52304 / IRC, we now have pattern matching for 'bit extract' in two places - tablegen and `X86DAGToDAGISel`. There are 4 patterns. And we will have a problem with `x & (-1 >> (32 - y))` pattern. * If the mask is one-use, then it is always unfolded into `x << (32 - y) >> (32 - y)` first. Thus, the existing test coverage is already broken. * If it is not one-use, then it is not unfolded, and is matched as BZHI. * If it is not one-use, we will not match it as BEXTR. And if it is one-use, it will have been unfolded already. So we will either not handle that pattern for BEXTR, or not have test coverage for it. This is bad. As discussed with @craig.topper, let's unify this matching, and do everything in `X86DAGToDAGISel`. Then we will not have code duplication, and will have proper test coverage. This indeed does not affect any tests, and this is great. It means that for these two patterns, the `X86DAGToDAGISel` is identical to the tablegen version. Please review carefully, i'm not fully sure about that intrinsic change, and introduction of the new `X86ISD` opcode. Reviewers: craig.topper, RKSimon, spatel Reviewed By: craig.topper Subscribers: llvm-commits, craig.topper Differential Revision: https://reviews.llvm.org/D53164 llvm-svn: 344904	2018-10-22 14:12:44 +00:00
Roman Lebedev	13c5ab2e27	[X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ((1 << nbits) + (-1)) pattern Summary: Trivial continuation of D52304. While this pattern is not canonical, we do select it in the BZHI case, so this should not be any different. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52348 llvm-svn: 344902	2018-10-22 13:54:17 +00:00
Petar Avramovic	e72a743740	Test commit: change comment. llvm-svn: 344900	2018-10-22 13:27:50 +00:00
Nemanja Ivanovic	674581afbb	[PowerPC][NFC] Fix bugs in r+r to r+i conversion The D-Form VSX loads introduced in ISA 3.0 are not direct D-Form equivalent of the corresponding X-Forms since they only target the Altivec registers. Namely LXSSPX can load into any of the 64 VSX registers whereas LXSSP can only load into the upper 32 VSX registers. Similarly with the remaining affected instructions. There is currently no way that I can see to trigger the bug, but as we add other ways of exploiting these instructions, there may very well be instances that do. This is an NFC patch in practical terms since the changes it introduces can not be triggered without an MIR test. Differential revision: https://reviews.llvm.org/D53323 llvm-svn: 344894	2018-10-22 11:22:59 +00:00
Craig Topper	290c081d91	[X86] Add patterns for vector and/or/xor/andn with other types than vXi64. This makes fast isel treat all legal vector types the same way. Previously only vXi64 was in the fast-isel tables. This unfortunately prevents matching of andn by fast-isel for these types since that requires SelectionDAG. But we already had this issue for vXi64. So at least we're consistent now. Interestingly it looks like fast-isel can't handle instructions with constant vector arguments so the not part of the andn patterns is selected with SelectionDAG. This explains why VPTERNLOG shows up in some of the tests. This is a subset of D53268. As I make progress on that, I will try to reduce the number of lines in the tablegen files. llvm-svn: 344884	2018-10-22 06:30:22 +00:00
Craig Topper	321df5b0d4	[X86] Stop promoting integer loads to vXi64 Summary: Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast. I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping. I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size. I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D53306 llvm-svn: 344877	2018-10-21 21:30:26 +00:00
Craig Topper	8de07b4db1	Revert r344873 "foo" Rebase gone wrong left this in my tree. llvm-svn: 344875	2018-10-21 21:08:37 +00:00
Craig Topper	5eea94edd4	[X86] Remove SDIVREM8_SEXT_HREG/UDIVREM8_ZEXT_HREG and their associated DAG combine and target bits support. Use a post isel peephole instead. Summary: These nodes exist to overcome an isel problem where we can generate a zero extend of an AH register followed by an extract subreg, and another zero extend. The first zero extend exists to avoid a partial register update copying the AH register into the low 8-bits. The second zero extend exists if the user wanted the remainder zero extended. To make this work we had a DAG combine to morph the DIVREM opcode to a special opcode that included the extend. But then we had to add the new node to computeKnownBits and computeNumSignBits to process the extension portion. This patch instead removes all of that and adds a late peephole to detect the two extends. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53449 llvm-svn: 344874	2018-10-21 21:07:27 +00:00
Craig Topper	e367039fe5	foo llvm-svn: 344873	2018-10-21 21:07:25 +00:00
Simon Pilgrim	eb806d5f30	[X86][AVX] Enable lowerVectorShuffleAsLanePermuteAndPermute v16i16/v32i8 unary shuffle lowering llvm-svn: 344868	2018-10-21 17:07:50 +00:00
Simon Pilgrim	abc24fdb94	[X86] Only extract constant pool shuffle mask data with zero offsets D53306 exposes an issue where we sometimes use constant pool data from bigger vectors than the target shuffle mask. This should be safe to do, but we have to be certain that we're using the bottom most part of the vector as the shuffle mask decoders have no way to peek into subvectors with non-zero offsets. llvm-svn: 344867	2018-10-21 11:55:56 +00:00
Thomas Lively	5ea17d450e	[WebAssembly] Implement vector sext_inreg and tests with comparisons Summary: Depends on D53251. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53252 llvm-svn: 344826	2018-10-20 01:35:23 +00:00
Thomas Lively	55735d522d	[WebAssembly] Custom lower i64x2 constant shifts to avoid wrap Summary: Depends on D53057. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53251 llvm-svn: 344825	2018-10-20 01:31:18 +00:00
Changpeng Fang	f95f763ea5	AMDGPU: Add support pattern for SUB of one bit Summary: Add selection patterns to support one bit Sub. Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D52946 llvm-svn: 344815	2018-10-19 21:09:21 +00:00
Craig Topper	5ed1099962	[X86] Remove some left over code from when MVT:i1 was a legal type for AVX512. llvm-svn: 344813	2018-10-19 20:44:33 +00:00
Craig Topper	5c81c68385	[X86] In PostprocessISelDAG, start from allnodes_end, not the root. There is no guarantee the root is at the end if isel created any nodes without morphing them. This includes the nodes created by manual isel from C++ code in X86ISelDAGToDAG. This is similar to r333415 from PowerPC which is where I originally stole the peephole loop from. I don't have a test case, but without this a future patch doesn't work which is how I found it. llvm-svn: 344808	2018-10-19 19:24:42 +00:00
Thomas Lively	11a332d08d	[WebAssembly] Handle undefined lane indices in SIMD patterns Summary: Undefined indices in shuffles can be used when not all lanes of the output vector will be used. This happens for example in the expansion of vector reduce operations. Regardless, undefs are legal as lane indices in IR and should be supported. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53057 llvm-svn: 344803	2018-10-19 19:08:06 +00:00
Krzysztof Parzyszek	6bfc6577f2	[Hexagon] Remove support for V4 llvm-svn: 344791	2018-10-19 17:31:11 +00:00
Fangrui Song	2e83b2e9ee	Use llvm::{all,any,none}_of instead std::{all,any,none}_of. NFC llvm-svn: 344774	2018-10-19 06:12:02 +00:00
Eli Friedman	b09c778715	Revert r344693 ("[ARM] bottom-top mul support in ARMParallelDSP") Still causing failures on the polly-aosp buildbot; I'll follow up with a reduced testcase. llvm-svn: 344752	2018-10-18 19:34:30 +00:00
Kristina Brooks	312fcc116b	[X86] Support for the mno-tls-direct-seg-refs flag Allows to disable direct TLS segment access (%fs or %gs). GCC supports a similar flag, it can be useful in some circumstances, e.g. when a thread context block needs to be updated directly from user space. More info and specific use cases: https://bugs.llvm.org/show_bug.cgi?id=16145 There is another revision for clang as well. Related: D53102 All X86 CodeGen tests appear to pass: ``` [46/47] Running lit suite /SourceCache/llvm-trunk-8.0/test/CodeGen Testing Time: 23.17s Expected Passes : 3801 Expected Failures : 15 Unsupported Tests : 8021 ``` Reviewed by: Craig Topper. Patch by nruslan (Ruslan Nikolaev). Differential Revision: https://reviews.llvm.org/D53103 llvm-svn: 344723	2018-10-18 03:14:37 +00:00
Nicolai Haehnle	4821937d2e	AMDGPU: Avoid selecting ds_{read,write}2_b32 on SI Summary: To workaround a hardware issue in the (base + offset) calculation when base is negative. The impact on code quality should be limited since SILoadStoreOptimizer still runs afterwards and is able to combine loads/stores based on known sign information. This fixes visible corruption in Hitman on SI (easily reproducible by running benchmark mode). Change-Id: Ia178d207a5e2ac38ae7cd98b532ea2ae74704e5f Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99923 Reviewers: arsenm, mareko Subscribers: jholewinski, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53160 llvm-svn: 344698	2018-10-17 15:37:48 +00:00
Nicolai Haehnle	c4a2ff0950	AMDGPU: Divergence-driven selection of scalar buffer load intrinsics Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53283 llvm-svn: 344696	2018-10-17 15:37:30 +00:00
Sam Parker	2ef3c0dad6	[ARM] bottom-top mul support in ARMParallelDSP Previously reverted in rL343082. Original commit message: On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load. Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 344693	2018-10-17 13:02:48 +00:00
Nicolai Haehnle	e9b134aa31	AMDGPU: Remove dead TableGen code Summary: Change-Id: Ic1f2c1d0cf9e90a0baa9fc6bacd0d3c386069fb0 Reviewers: tpr Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53318 Change-Id: Ib4d143c898801e5cf6cb9999a495d62c91ae77fb llvm-svn: 344691	2018-10-17 12:14:26 +00:00
Petar Jovanovic	8a08412533	[MIPS GlobalISel] Legalize constants Legalize s1, s8, s16 and s64 G_CONSTANT for MIPS32. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D53077 llvm-svn: 344684	2018-10-17 10:30:03 +00:00
Sjoerd Meijer	64cfb74a61	[ARM] Do not fuse VADD and VMUL, continued (2/2) This is patch 2/2, following up on D53314, and is the functional change to prevent fusing mul + add sequences into VFMAs. Differential revision: https://reviews.llvm.org/D53315 llvm-svn: 344683	2018-10-17 10:05:44 +00:00
Sjoerd Meijer	1a213a42d6	[ARM] Follow up of rL344671, attempt to pacify a buildbot It was rightfully complaining about an unpretty logical expression. llvm-svn: 344677	2018-10-17 07:51:24 +00:00
Sjoerd Meijer	ff3ab33ec8	[ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2) This is a follow up of rL342874, which stopped fusing muls and adds into VMLAs for performance reasons on the Cortex-M4 and Cortex-M33. This is a series of 2 patches that tries to achieve the same for VFMA. For each optimization level, the following lists what we were generating before rL342874 (< rL342874), what changed with rL342874 (>= rL342874), and what we want to achieve with these 2 patches: -O3: vmla; vmul + vadd; vmul + vadd. -Ofast: vfma; vfma; vmul + vadd. -Oz: vmla; vmla; vmla. This patch, 1/2, is a cleanup of the spaghetti predicate logic on the different VMLA and VFMA codegen rules, so that we can make the final functional change in patch 2/2. This also fixes a typo in the regression test added in rL342874. Differential revision: https://reviews.llvm.org/D53314 llvm-svn: 344671	2018-10-17 07:26:35 +00:00
Craig Topper	e0a992918b	[X86] Match (cmp (and (shr X, C), mask), 0) to BEXTR+TEST. Without this we match the CMP+AND to a TEST and then match the SHR separately. I'm trusting analyzeCompare to remove the TEST during the peephole pass. Otherwise we need to check the flag users to see if they only use the Z flag. This recovers a case lost by r344270. Differential Revision: https://reviews.llvm.org/D53310 llvm-svn: 344649	2018-10-16 22:29:36 +00:00
Krasimir Georgiev	547d824da6	Revert "[WebAssembly] LSDA info generation" This reverts commit r344575. Newly introduced test eh-lsda.ll.test fails with use-after-free under ASAN build. llvm-svn: 344639	2018-10-16 18:50:09 +00:00
Evandro Menezes	c98decf864	[PATCH] [NFC][AArch64] Fix refactoring of macro fusion Fix compiler error. llvm-svn: 344632	2018-10-16 17:41:45 +00:00
Evandro Menezes	46eadcff9c	[NFC][ARM] Refactor macro fusion Simplify code for wildcards. llvm-svn: 344625	2018-10-16 17:19:51 +00:00
Evandro Menezes	de655c6d3a	[NFC][AArch64] Refactor macro fusion Simplify API of checking functions. llvm-svn: 344624	2018-10-16 17:19:28 +00:00
Simon Pilgrim	7d27cfdcb2	[X86] Fix Skylake ReadAfterLd for PADDrm etc. Missed in rL343868 as due to their custom InstrRW. llvm-svn: 344600	2018-10-16 09:50:16 +00:00
Aleksandar Beserminji	a5949439ca	[mips][micromips] Fix how values in .gcc_except_table are calculated When a landing pad is calculated in a program that is compiled for micromips, it will point to an even address. Such an error will cause a segmentation fault, as the instructions in micromips are aligned on odd addresses. This patch sets the last bit of the offset where a landing pad is, to 1, which will effectively be an odd address and point to the instruction exactly. Differential Revision: https://reviews.llvm.org/D52985 llvm-svn: 344591	2018-10-16 08:27:28 +00:00
Heejin Ahn	0981eaab47	[WebAssembly] LSDA info generation Summary: This adds support for LSDA (exception table) generation for wasm EH. Wasm EH mostly follows the structure of Itanium-style exception tables, with one exception: a call site table entry in wasm EH corresponds to not a call site but a landing pad. In wasm EH, the VM is responsible for stack unwinding. After an exception occurs and the stack is unwound, the control flow is transferred to wasm 'catch' instruction by the VM, after which the personality function is called from the compiler-generated code. (Refer to WasmEHPrepare pass for more information on this part.) This patch: - Changes wasm.landingpad.index intrinsic to take a token argument, to make this 1:1 match with a catchpad instruction - Stores landingpad index info and catch type info MachineFunction in before instruction selection - Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an exception table - Adds WasmException class with overridden methods for table generation - Adds support for LSDA section in Wasm object writer Reviewers: dschuff, sbc100, rnk Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52748 llvm-svn: 344575	2018-10-16 00:09:12 +00:00
Craig Topper	e70c560b6d	[X86] Remove some isel patterns that shouldn't be possible. These included a bitcast of a load from v4f32 to v2f64, but DAG combine should have already changed the type of the load to remove the cast. llvm-svn: 344573	2018-10-15 23:34:58 +00:00
Craig Topper	2909a3d9d0	[X86] Fix a bad bitcast in the load form of vXi16 uniform shift patterns for EVEX encoded instructions. llvm-svn: 344563	2018-10-15 21:51:32 +00:00
Simon Pilgrim	095a7fe635	[AARCH64] Improve vector popcnt lowering with ADDLP AARCH64 equivalent to D53257 - uses widening pairwise adds on vXi8 CTPOP to support i16/i32/i64 vectors. This is a blocker for generic vector CTPOP expansion (P32655) - this will remove the aarch64 diff from D53258. Differential Revision: https://reviews.llvm.org/D53259 llvm-svn: 344554	2018-10-15 21:15:58 +00:00
Konstantin Zhuravlyov	94dfcc2eb2	AMDGPU: Generate .amdgcn_target for object code v3 Differential Revision: https://reviews.llvm.org/D53221 llvm-svn: 344552	2018-10-15 20:37:47 +00:00
Aleksandar Beserminji	81eb440772	[mips][micromips] Fix overlaping FDEs error When compiling static executable for micromips, CFI symbols are incorrectly labeled as MICROMIPS, which cause ".eh_frame_hdr refers to overlapping FDEs." error. This patch does not label CFI symbols as MICROMIPS, and FDEs do not overlap anymore. This patch also exposes another bug, which is fixed here: https://reviews.llvm.org/D52985 Differential Revision: https://reviews.llvm.org/D52987 llvm-svn: 344516	2018-10-15 14:39:12 +00:00
Aleksandar Beserminji	585f55bb8b	[mips][micromips] Revert "Fix overlaping FDEs error" This reverts r344511. llvm-svn: 344515	2018-10-15 14:36:48 +00:00
Simon Pilgrim	5abb607ebe	[ARM][NEON] Improve vector popcnt lowering with PADDL (PR39281) As I suggested on PR39281, this patch uses PADDL pairwise addition to widen from the vXi8 CTPOP result to the target vector type. This is a blocker for moving more x86 code to generic vector CTPOP expansion (P32655 + D53258) - ARM's vXi64 CTPOP currently expands, which would generate a vXi64 MUL but ARM's custom lowering expands the general MUL case and vectors aren't well handled in LegalizeDAG - improving the CTPOP lowering was a lot easier than fixing the MUL lowering for this one case...... Differential Revision: https://reviews.llvm.org/D53257 llvm-svn: 344512	2018-10-15 13:20:41 +00:00
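The widening step described in the PADDL commit above can be sketched as follows. This is only a scalar model of the idea (per-byte popcounts, then repeated pairwise adds that halve the lane count and double the lane width); the function name `vector_ctpop` is hypothetical and none of this is the actual ARM lowering code.

```python
def vector_ctpop(elements, elt_bits):
    """Model of vXi8 CTPOP followed by widening pairwise adds.

    elements: list of unsigned integers, each elt_bits wide.
    Returns the per-element population counts.
    """
    nbytes = elt_bits // 8
    # Step 1: popcount of every byte lane (the vXi8 CTPOP result).
    lanes = []
    for e in elements:
        for i in range(nbytes):
            lanes.append(bin((e >> (8 * i)) & 0xFF).count("1"))
    # Step 2: pairwise add adjacent lanes, widening i8 -> i16 -> i32 -> i64
    # until each lane covers one full element.
    width = 8
    while width < elt_bits:
        lanes = [lanes[i] + lanes[i + 1] for i in range(0, len(lanes), 2)]
        width *= 2
    return lanes


print(vector_ctpop([0xFFFF, 0x0101], 16))
print(vector_ctpop([0xFFFFFFFFFFFFFFFF, 1], 64))
```

A v2i64 input needs three pairwise-add rounds (i8 to i16, i16 to i32, i32 to i64), which is why a multiply-based expansion is avoided in this model.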
Aleksandar Beserminji	10ec5c8c28	[mips][micromips] Fix overlaping FDEs error When compiling static executable for micromips, CFI symbols are incorrectly labeled as MICROMIPS, which cause ".eh_frame_hdr refers to overlapping FDEs." error. This patch does not label CFI symbols as MICROMIPS, and FDEs do not overlap anymore. This patch also exposes another bug, which is fixed here: https://reviews.llvm.org/D52985 Differential Revision: https://reviews.llvm.org/D52987 llvm-svn: 344511	2018-10-15 12:59:17 +00:00
Chandler Carruth	edb12a838a	[TI removal] Make variables declared as `TerminatorInst` and initialized by `getTerminator()` calls instead be declared as `Instruction`. This is the biggest remaining chunk of the usage of `getTerminator()` that insists on the narrow type and so is an easy batch of updates. Several files saw more extensive updates where this would cascade to requiring API updates within the file to use `Instruction` instead of `TerminatorInst`. All of these were trivial in nature (pervasively using `Instruction` instead just worked). llvm-svn: 344502	2018-10-15 10:04:59 +00:00
Craig Topper	06aea1720a	[X86] Move promotion of vector and/or/xor from legalization to DAG combine Summary: I've noticed that the bitcasts we introduce for these make computeKnownBits and computeNumSignBits not work well in LegalizeVectorOps. LegalizeVectorOps legalizes bottom up while LegalizeDAG legalizes top down. The bottom up strategy for LegalizeVectorOps means operands are legalized before their uses. So we promote and/or/xor before we legalize the operands that use them making computeKnownBits/computeNumSignBits in places like LowerTruncate suboptimal. I looked at changing LegalizeVectorOps to be top down as well, but that was more disruptive and caused some regressions. I also looked at just moving promotion of binops to LegalizeDAG, but that had a few issues one around matching AND,ANDN,OR into VSELECT because I had to create ANDN as vXi64, but the other nodes hadn't legalized yet, I didn't look too hard at fixing that. This patch seems to produce better results overall than my other attempts. We now form broadcasts of constants better in some cases. For at least some of them the AND was being introduced in LegalizeDAG, promoted to vXi64, and the BUILD_VECTOR was also legalized there. I think we got bad ordering of that. Now the promotion is out of the legalizer so we handle this better. In the longer term I think we really should evaluate whether we should be doing this promotion at all. It's really there to reduce isel pattern count, but I'm wondering if we'd be better served just eating the pattern cost or doing C++ based isel for vector and/or/xor in X86ISelDAGToDAG. The masked and/or/xor will definitely be difficult in patterns if a bitcast gets between the vselect and the and/or/xor node. That becomes a lot of permutations to cover. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53107 llvm-svn: 344487	2018-10-15 01:51:58 +00:00
Craig Topper	671779456a	[X86] Add 128 MOVDDUP to the constant pool printing in X86AsmPrinter::EmitInstruction. We use this instruction to broadcast a single 64-bit value to a v2i64/v2f64 vector. llvm-svn: 344486	2018-10-15 01:51:53 +00:00
Simon Pilgrim	861cd0ba44	[X86][AVX] Enable lowerVectorShuffleAsLanePermuteAndPermute v16i16/v32i8 shuffle lowering Extends D53148 from v4f64 now that we have test coverage for v16i16/v32i8 shuffles. llvm-svn: 344481	2018-10-14 17:34:20 +00:00
Dorit Nuzman	38bbf81ade	recommit 344472 after fixing build failure on ARM and PPC. llvm-svn: 344475	2018-10-14 08:50:06 +00:00
Dorit Nuzman	5118c68cde	revert 344472 due to failures. llvm-svn: 344473	2018-10-14 07:21:20 +00:00
Dorit Nuzman	8174368955	[IAI,LV] Add support for vectorizing predicated strided accesses using masked interleave-group The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization. This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop-Vectorizer's analysis, and adds the proper support for masked-interleave- groups to the Loop-Vectorizer's planning and transformation stages. The patch also extends the TTI API to allow querying the cost of masked interleave groups (which each target can control); Targets that support masked vector loads/ stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles. Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53011 llvm-svn: 344472	2018-10-14 07:06:16 +00:00
Craig Topper	20fa085d74	[X86] Fix bad indentation. NFC llvm-svn: 344471	2018-10-14 04:01:40 +00:00
Craig Topper	ec4b75f47a	[X86] Type legalize v2f32 stores by widening to v4f32, casting to v2f64, extracting f64 and storing. Summary: This is similar to what D52528 did for loads. It should match what generic type legalization does in 64-bit mode where it uses a v2i64 cast and an i64 store. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53173 llvm-svn: 344470	2018-10-14 03:36:27 +00:00
Benjamin Kramer	c55e997556	Move some helpers from the global namespace into anonymous ones. llvm-svn: 344468	2018-10-13 22:18:22 +00:00
Thomas Lively	ffde98de21	[WebAssembly][NFC] Fix signed/unsigned comparison warning llvm-svn: 344459	2018-10-13 16:58:03 +00:00
Simon Pilgrim	c5d7c6e5f6	[X86][SSE] Remove most of vector CTTZ custom lowering and use LegalizeDAG instead. There is one remnant - AVX1 custom splitting of 256-bit vectors - which is due to a regression where the X86ISD::ANDNP is still performed as a YMM. I've also tightened the CTLZ or CTPOP lowering in SelectionDAGLegalize::ExpandBitCount to require a legal CTLZ - it doesn't affect existing users and fixes an issue with AVX512 codegen. llvm-svn: 344457	2018-10-13 16:11:15 +00:00
Simon Pilgrim	1c2051ead7	[X86][SSE] Begin removing vector CTTZ custom lowering and use LegalizeDAG instead. Adds CTTZ vector legalization support and begins the removal of the X86/SSE custom lowering. llvm-svn: 344453	2018-10-13 15:16:55 +00:00
Simon Pilgrim	1c6d320351	[X86][SSE] combineIncDecVector - use isConstantSplat Use isConstantSplat instead of ISD::isConstantSplatVector to let us peek through to illegal types (in this case for i686 targets to recognise i64 constants) llvm-svn: 344452	2018-10-13 14:45:44 +00:00
Simon Pilgrim	a03379527a	[X86] Pull out target constant splat helper function. NFCI. The code in LowerScalarImmediateShift is just a more powerful version of ISD::isConstantSplatVector. llvm-svn: 344451	2018-10-13 14:28:40 +00:00
Simon Pilgrim	10434cbae1	Pull out repeated getOperand(). NFCI. llvm-svn: 344450	2018-10-13 13:33:32 +00:00
Simon Pilgrim	bc141724c0	Remove unused variable. NFCI. llvm-svn: 344449	2018-10-13 13:30:10 +00:00
Simon Pilgrim	f64e654d62	[X86][SSE] Improve CTTZ lowering when CTLZ is legal If we have better CTLZ support than CTPOP, then use cttz(x) = width - ctlz(~x & (x - 1)) - and remove the CTTZ_ZERO_UNDEF handling as it no longer gives better codegen. Similar to rL344447, this is also closer to LegalizeDAG's approach llvm-svn: 344448	2018-10-13 13:05:19 +00:00
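The identity quoted in the commit above, cttz(x) = width - ctlz(~x & (x - 1)), can be checked with a small scalar model. This is a sketch under the assumption of fixed-width unsigned elements; the helper names are hypothetical and this is not the X86 lowering code.

```python
def ctlz(x, width=32):
    """Count leading zeros of an unsigned value already reduced to `width` bits."""
    return width - x.bit_length()


def cttz_via_ctlz(x, width=32):
    """cttz(x) = width - ctlz(~x & (x - 1)), evaluated modulo 2**width."""
    mask = (1 << width) - 1
    x &= mask
    # ~x & (x - 1) keeps exactly the bits below the lowest set bit of x.
    low = ~x & (x - 1) & mask
    return width - ctlz(low, width)


print(cttz_via_ctlz(8))   # three trailing zeros
print(cttz_via_ctlz(0))   # all `width` bits are trailing zeros
```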
Simon Pilgrim	afead139cf	[X86][SSE] Change CTTZ vector lowering to cttz(x) = ctpop(~x & (x - 1)) This patch changes the vector CTTZ lowering from: cttz(x) = ctpop((x & -x) - 1) to: cttz(x) = ctpop(~x & (x - 1)) Not only does this make better use of the PANDN instruction, but it also matches the LegalizeDAG method which should allow us to remove the x86 specific code at some point in the future (we need to fix some issues with the bitcasted logic ops and CTPOP lowering first). Differential Revision: https://reviews.llvm.org/D53214 llvm-svn: 344447	2018-10-13 12:12:06 +00:00
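Both CTTZ formulas named in the commit above give the same result; the sketch below compares them on fixed-width unsigned values. The function names are hypothetical and the code only models the arithmetic, not the vector lowering itself.

```python
def ctpop(x):
    """Population count of a non-negative integer."""
    return bin(x).count("1")


def cttz_old(x, width=32):
    """Previous formula: ctpop((x & -x) - 1)."""
    mask = (1 << width) - 1
    x &= mask
    return ctpop(((x & -x) - 1) & mask)


def cttz_new(x, width=32):
    """New formula: ctpop(~x & (x - 1)), an and-not of x with x - 1."""
    mask = (1 << width) - 1
    x &= mask
    return ctpop(~x & (x - 1) & mask)


for value in (0, 1, 8, 12, 0x80000000):
    print(value, cttz_old(value), cttz_new(value))
```

The new form is an and-not of `x` with `x - 1`, which is the shape the commit says maps onto PANDN.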
Simon Pilgrim	f3952413f7	[X86][AVX] Add lowerVectorShuffleAsLanePermuteAndPermute for v4f64 shuffles (PR39161) Add shuffle lowering for the case where we can shuffle the lanes into place followed by an in-lane permute. This is mainly for cases where we can have non-repeating permutes in each lane, but for now I've just enabled it for v4f64 unary shuffles to fix PR39161 - there is no test coverage for other shuffles that might benefit yet. We now have several cross-lane shuffle lowering methods that all do something similar - I've looked at merging some of these (notably by making the repeated mask mechanism in lowerVectorShuffleByMerging128BitLanes optional), but there is a lot of assertions/assumptions in the way that makes this tricky - I ended up going for adding yet another relatively simple method instead. Differential Revision: https://reviews.llvm.org/D53148 llvm-svn: 344446	2018-10-13 11:38:10 +00:00
Arnaud A. de Grandmaison	162435e7b5	[AArch64] Swap comparison operands if that enables some folding. Summary: AArch64 can fold some shift+extend operations on the RHS operand of comparisons, so swap the operands if that makes sense. This provides a fix for https://bugs.llvm.org/show_bug.cgi?id=38751 Reviewers: efriedma, t.p.northover, javed.absar Subscribers: mcrosier, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D53067 llvm-svn: 344439	2018-10-13 07:43:56 +00:00
Thomas Lively	3afc346dd0	[WebAssembly] SIMD min and max Summary: Depends on D52324 and D52764. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52325 llvm-svn: 344438	2018-10-13 07:26:10 +00:00
Thomas Lively	0ff82ac154	[WebAssembly][NFC] Unify ARGUMENT classes Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53172 llvm-svn: 344436	2018-10-13 07:09:10 +00:00
Alex Bradbury	748d080e62	[RISCV] Eliminate unnecessary masking of promoted shift amounts SelectionDAGBuilder::visitShift will always zero-extend a shift amount when it is promoted to the ShiftAmountTy. This results in zero-extension (masking) which is unnecessary for RISC-V as the shift operations only read the lower 5 or 6 bits (RV32 or RV64). I initially proposed adding a getExtendForShiftAmount hook so the shift amount can be any-extended (D52975). @efriedma explained this was unsafe, so I have instead eliminated the unnecessary AND operations at instruction selection time in a manner similar to X86InstrCompiler.td. Differential Revision: https://reviews.llvm.org/D53224 llvm-svn: 344432	2018-10-12 23:18:52 +00:00
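Why the masking is redundant can be illustrated with a scalar model of the RV32 shift. This is only a sketch: `sll_rv32` is a hypothetical helper that models a shift reading the low 5 bits of its amount operand, and the 0xFF mask stands in for a zero-extension from i8.

```python
def sll_rv32(value, shamt_reg):
    """Model of an RV32 logical left shift: only the low 5 bits of the
    shift-amount register are read."""
    return (value << (shamt_reg & 0x1F)) & 0xFFFFFFFF


def sll_with_zext_mask(value, shamt_reg):
    """Same shift, preceded by the AND that zero-extension from i8 introduces."""
    return sll_rv32(value, shamt_reg & 0xFF)


# The AND never changes the low 5 bits, so both forms agree for every amount.
print(all(sll_rv32(0x12345678, s) == sll_with_zext_mask(0x12345678, s)
          for s in range(1024)))
```

Any mask whose low 5 bits (6 on RV64) are all ones leaves the bits the shift reads untouched, so the AND can be dropped.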
Craig Topper	3e76b2d736	[X86] Improve type legalization of (v2i32/v4i16/v8i16 (bitcast (v2f32))) to avoid a stack temporary. llvm-svn: 344425	2018-10-12 22:00:04 +00:00
Craig Topper	c693a23025	[X86] Simplify the end of custom type legalization for (v2i32/v4i16/v8i8 (bitcast (f64))) by just emitting an EXTRACT_SUBVECTOR instead of a BUILD_VECTOR. Generic legalization should be able to finish legalizing the EXTRACT_SUBVECTOR probably by turning it into a BUILD_VECTOR. But we should emit the simplest sequence. llvm-svn: 344424	2018-10-12 22:00:00 +00:00
Craig Topper	a8a44f1bec	[X86] Skip (v2i32/v4i16/v8i8 (bitcast (f64))) handling in ReplaceNodeResults if the dest type can be widened by generic legalization. NFCI The algorithm we would do previously was identical to generic legalization. If we ever switch to legalizing integer vectors via widening we'll be able to kill off the code since it now only runs for promotion. llvm-svn: 344423	2018-10-12 21:59:58 +00:00
Sanjay Patel	e28c8ecd72	[x86] add and use fast horizontal vector math subtarget feature This is the planned follow-up to D52997. Here we are reducing horizontal vector math codegen by default. AMD Jaguar (btver2) should have no difference with this patch because it has fast-hops. (If we want to set that bit for other CPUs, let me know.) The code changes are small, but there are many test diffs. For files that are specifically testing for hops, I added RUNs to distinguish fast/slow, so we can see the consequences side-by-side. For files that are primarily concerned with codegen other than hops, I just updated the CHECK lines to reflect the new default codegen. To recap the recent horizontal op story: 1. Before rL343727, we were producing hops for all subtargets for a variety of patterns. Hops were likely not optimal for all targets though. 2. The IR improvement in r343727 exposed a hole in the backend hop pattern matching, so we reduced hop codegen for all subtargets. That was bad for Jaguar (PR39195). 3. We restored the hop codegen for all targets with rL344141. Good for Jaguar, but probably bad for other CPUs. 4. This patch allows us to distinguish when we want to produce hops, so everyone can be happy. I'm not sure if we have the best predicate here, but the intent is to undo the extra hop-iness that was enabled by r344141. Differential Revision: https://reviews.llvm.org/D53095 llvm-svn: 344361	2018-10-12 16:41:02 +00:00
Eric Liu	55ab86b72b	Fix unused variable warning after r344348 llvm-svn: 344350	2018-10-12 15:01:11 +00:00
Simon Pilgrim	78b5a3c3ef	[X86][SSE] LowerVectorCTPOP - pull out repeated byte sum stage. Pull out repeated byte sum stage for popcount of vector elements > 8bits. This allows us to simplify the LUT/BITMATH popcnt code to always assume vXi8 vectors, and also improves avx512bitalg codegen which only has access to vpopcntb/vpopcntw. llvm-svn: 344348	2018-10-12 14:18:47 +00:00
Hiroshi Inoue	9552dd187a	[PowerPC] avoid masking already-zero bits in BitPermutationSelector The current BitPermutationSelector generates code to build a value by tracking two types of bits: ConstZero and Variable. ConstZero means a bit we need to mask off and Variable is a bit we copy from an input value. This patch adds a third type of bit, VariableKnownToBeZero, caused by an AssertZext node or a zero-extending load node. VariableKnownToBeZero means a bit comes from an input value, but it is known to be already zero, so we do not need to mask it. VariableKnownToBeZero enhances flexibility to group bits, since we can avoid redundant masking for these bits. This patch also renames "HasZero" to "NeedMask" since now we may skip masking even when we have zeros (of type VariableKnownToBeZero). Differential Revision: https://reviews.llvm.org/D48025 llvm-svn: 344347	2018-10-12 14:02:20 +00:00
Simon Pilgrim	29279f29c8	[X86][SSE] Add extract_subvector(PSHUFB) -> PSHUFB(extract_subvector()) combine Fixes PR32160 by reducing the size of PSHUFB if we only use one of the lanes. This approach can probably be generalized to handle any target shuffle (and any subvector index) but we have no test coverage at the moment. llvm-svn: 344336	2018-10-12 12:10:34 +00:00
Andrea Di Biagio	6eebbe0a97	[tblgen][llvm-mca] Add the ability to describe move elimination candidates via tablegen. This patch adds the ability to identify instructions that are "move elimination candidates". It also allows scheduling models to describe processor register files that allow move elimination. A move elimination candidate is an instruction that can be eliminated at register renaming stage. Each subtarget can specify which instructions are move elimination candidates with the help of tablegen class "IsOptimizableRegisterMove" (see llvm/Target/TargetInstrPredicate.td). For example, on X86, BtVer2 allows both GPR and MMX/SSE moves to be eliminated. The definition of 'IsOptimizableRegisterMove' for BtVer2 looks like this: ``` def : IsOptimizableRegisterMove<[ InstructionEquivalenceClass<[ // GPR variants. MOV32rr, MOV64rr, // MMX variants. MMX_MOVQ64rr, // SSE variants. MOVAPSrr, MOVUPSrr, MOVAPDrr, MOVUPDrr, MOVDQArr, MOVDQUrr, // AVX variants. VMOVAPSrr, VMOVUPSrr, VMOVAPDrr, VMOVUPDrr, VMOVDQArr, VMOVDQUrr ], CheckNot<CheckSameRegOperand<0, 1>> > ]>; ``` Definitions of IsOptimizableRegisterMove from processor models of the same Target are processed by the SubtargetEmitter to auto-generate a target-specific override for each of the following predicate methods: ``` bool TargetSubtargetInfo::isOptimizableRegisterMove(const MachineInstr *MI) const; bool MCInstrAnalysis::isOptimizableRegisterMove(const MCInst &MI, unsigned CPUID) const; ``` By default, those methods return false (i.e. conservatively assume that there are no move elimination candidates). Tablegen class RegisterFile has been extended with the following information: - The set of register classes that allow move elimination. - Maximum number of moves that can be eliminated every cycle. - Whether move elimination is restricted to moves from registers that are known to be zero.
This patch is structured in three parts: A first part (which is mostly boilerplate) adds the new 'isOptimizableRegisterMove' target hooks, and extends existing register file descriptors in MC by introducing new fields to describe properties related to move elimination. A second part uses the new tablegen constructs to describe move elimination in the BtVer2 scheduling model. A third part teaches llvm-mca how to query the new 'isOptimizableRegisterMove' hook to mark instructions that are candidates for move elimination. It also teaches class RegisterFile how to describe constraints on move elimination at PRF granularity. llvm-mca tests for btver2 show differences before/after this patch. Differential Revision: https://reviews.llvm.org/D53134 llvm-svn: 344334	2018-10-12 11:23:04 +00:00
Simon Pilgrim	c844bc84dd	[X86] Ignore float/double non-temporal loads (PR39256) Scalar non-temporal loads were asserting instead of just being ignored. Reduced from https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10895 llvm-svn: 344331	2018-10-12 10:20:16 +00:00
Stefan Maksimovic	285c0f4fdc	[mips] Mark fmaxl as a long double emulation routine Failure was discovered upon running projects/compiler-rt/test/builtins/Unit/divtc3_test.c in a stage2 compiler build. When compiling projects/compiler-rt/lib/builtins/divtc3.c, a call to fmaxl within the divtc3 implementation had its return values read from registers $2 and $3 instead of $f0 and $f2. Include fmaxl in the list of long double emulation routines to have its return value correctly interpreted as f128. Almost exact issue here: https://reviews.llvm.org/D17760 Differential Revision: https://reviews.llvm.org/D52649 llvm-svn: 344326	2018-10-12 08:18:38 +00:00
Tom Stellard	a894043910	Revert "AMDGPU/GlobalISel: Implement select for G_INSERT" This reverts commit r344310. The test case was failing on some bots. llvm-svn: 344317	2018-10-11 23:36:46 +00:00
Matthias Braun	d6131c9633	X86/TargetTransformInfo: Report div/rem constant immediate costs as TCC_Free DIV/REM by constants should always be expanded into mul/shift/etc. patterns. Unfortunately the ConstantHoisting pass runs too early at a point where the pattern isn't expanded yet. However after ConstantHoisting hoisted some immediate the result may not expand anymore. Also the hoisting typically doesn't make sense because it operates on immediates that will change completely during the expansion. Report DIV/REM as TCC_Free so ConstantHoisting will not touch them. Differential Revision: https://reviews.llvm.org/D53174 llvm-svn: 344315	2018-10-11 23:14:35 +00:00
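The "mul/shift" expansion mentioned in the commit above depends on the divisor being a known constant. A minimal sketch for unsigned 32-bit division by 3 follows; the magic constant 0xAAAAAAAB is ceil(2**33 / 3), a standard choice for this divisor, and is shown only as an illustration rather than as the exact sequence LLVM emits.

```python
MAGIC_DIV3 = 0xAAAAAAAB  # ceil(2**33 / 3), valid for 32-bit unsigned inputs


def udiv3_mul_shift(x):
    """Unsigned 32-bit x / 3 computed as a widening multiply and a shift."""
    assert 0 <= x < 2**32
    return (x * MAGIC_DIV3) >> 33


for value in (0, 1, 2, 3, 100, 0xFFFFFFFF):
    print(value, udiv3_mul_shift(value), value // 3)
```

The divisor 3 never appears in the expanded form. If the immediate had been hoisted into a register first, the backend would see a division by a variable and could not pick a magic constant.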
Tom Stellard	4733be6e7b	AMDGPU/GlobalISel: Implement select for G_INSERT Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53116 llvm-svn: 344310	2018-10-11 22:49:54 +00:00
Ana Pazos	0a5fcefa31	[RISCV] Fix disassembling of fence instruction with invalid field Summary: Instruction with 0 in fence field being disassembled as fence , iorw. Printing "unknown" to match GAS behavior. This bug was uncovered by a LLVM MC Disassembler Protocol Buffer Fuzzer for the RISC-V assembly language. Reviewers: asb Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, jfb, PkmX, jocewei, asb Differential Revision: https://reviews.llvm.org/D51828 llvm-svn: 344309	2018-10-11 22:49:13 +00:00
Richard Trieu	dfd1760b5f	Inline variable into assert to avoid unused variable warning. llvm-svn: 344308	2018-10-11 22:42:41 +00:00
Craig Topper	35d513c7e4	[X86] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector. On 64-bit targets the generic legalize will use an i64 load and a scalar_to_vector for us. But on 32-bit targets i64 isn't legal and the generic legalizer will end up emitting two 32-bit loads. We have DAG combines that try to put those two loads back together with pretty good success. This patch instead uses f64 to avoid the splitting entirely. I've made it do the same for 64-bit mode for consistency and to keep the load in the fp domain. There are a few things in here that look like regressions in 32-bit mode, but I believe they bring us closer to the 64-bit mode codegen. And that the 64-bit mode code could be better. I think those issues should be looked at separately. Differential Revision: https://reviews.llvm.org/D52528 llvm-svn: 344291	2018-10-11 20:36:06 +00:00
Thomas Lively	f04bed8e79	[WebAssembly][NFC] Remove repetition of Defs = [ARGUMENTS] (fixed) llvm-svn: 344287	2018-10-11 20:21:22 +00:00
Sumanth Gundapaneni	a4a9155e4f	[Hexagon] Restrict compound instructions with constant value. Having a constant value operand in the compound instruction is not always profitable. This patch improves coremark by ~4% on Hexagon. Differential Revision: https://reviews.llvm.org/D53152 llvm-svn: 344284	2018-10-11 19:48:15 +00:00
Thomas Lively	ab37189f7e	[WebAssembly] Revert rL344180, which was breaking expensive checks llvm-svn: 344280	2018-10-11 18:45:48 +00:00
Krzysztof Parzyszek	5d3a6f76a8	[Hexagon] Eliminate potential sources of non-determinism in HCE Also, avoid comparing GUIDs when ordering global addresses, because source file location can cause different GUID to be calculated. As a result, a pair of symbols can compare "less" in one directory, but "greater" in another. llvm-svn: 344271	2018-10-11 18:26:02 +00:00
Craig Topper	fb2ac8969e	[X86] Restore X86ISelDAGToDAG::matchBEXTRFromAnd. Teach address matching to create a BEXTR pattern from a (shl (and X, mask >> C1) if C1 can be folded into addressing mode. This is an alternative to D53080 since I think using a BEXTR for a shifted mask is definitely an improvement when the shl can be absorbed into addressing mode. The other cases I'm less sure about. We already have several tricks for handling an and of a shift in address matching. This adds a new case for BEXTR. I've moved the BEXTR matching code back to X86ISelDAGToDAG to allow it to match. I suppose alternatively we could directly emit a X86ISD::BEXTR node that isel could pattern match. But I'm trying to view BEXTR matching as an isel concern so DAG combine can see 'and' and 'shift' operations that are well understood. We did lose a couple cases from tbm_patterns.ll, but I think there are ways to recover that. I've also put back the manual load folding code in matchBEXTRFromAnd that I removed a few months ago in r324939. This gives us some more freedom to make decisions based on the ability to fold a load. I haven't done anything with that yet. Differential Revision: https://reviews.llvm.org/D53126 llvm-svn: 344270	2018-10-11 18:06:07 +00:00
Diogo N. Sampaio	352a2fa1e7	[AARCH64][FIX] Emit data symbol for constant pool data The ARM64 elf emitter would omit printing data symbol for zero filled constant data. This patch overrides the emitFill method as to enforce that the symbol is correctly printed. Differential revision: https://reviews.llvm.org/D53132 llvm-svn: 344248	2018-10-11 14:10:32 +00:00
Roman Lebedev	4225f4adff	[X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ~(-1 << nbits) pattern Summary: As discussed in D48491, we can't really do this in the TableGen, since we need to produce two instructions. This only implements one single pattern. The other 3 patterns will be in follow-ups. I'm not sure yet if we want to also fuse shift into here (i.e `(x >> start) & ...`) Reviewers: RKSimon, craig.topper, spatel Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D52304 llvm-svn: 344224	2018-10-11 07:51:13 +00:00
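The equivalence behind the commit above can be sketched with a scalar model of BEXTR, whose control operand holds the start bit in bits 7:0 and the length in bits 15:8. The helper names are hypothetical; this models the instruction's semantics for 32-bit operands and is not the isel code. Building the control value is a separate step from the BEXTR itself, which may be why the message speaks of two instructions.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def low_bits_pattern(x, nbits):
    """The matched IR pattern: x & ~(-1 << nbits)."""
    return x & ~(-1 << nbits) & MASK


def bextr(src, control):
    """Scalar model of 32-bit BEXTR: start = control[7:0], length = control[15:8]."""
    start = control & 0xFF
    length = (control >> 8) & 0xFF
    src &= MASK
    if start >= WIDTH:
        return 0
    shifted = src >> start
    if length >= WIDTH:
        return shifted
    return shifted & ((1 << length) - 1)


def select_bextr(x, nbits):
    """Two steps: build the control value (start = 0), then extract."""
    control = nbits << 8
    return bextr(x, control)


print(hex(low_bits_pattern(0xDEADBEEF, 12)), hex(select_bextr(0xDEADBEEF, 12)))
```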
Thomas Lively	7fa7e6a284	[WebAssembly][NFC] Use intrinsic dag nodes directly Summary: Instead of custom lowering to WebAssemblyISD nodes first. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53119 llvm-svn: 344211	2018-10-11 00:49:24 +00:00
Thomas Lively	2ebacb107b	[WebAssembly] Saturating float to int intrinsics Summary: Although the saturating float to int instructions are already emitted from normal IR, the fpto{s,u}i instructions produce poison values if the argument cannot fit in the result type. These intrinsics are therefore necessary to get guaranteed defined saturating behavior. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53004 llvm-svn: 344204	2018-10-11 00:01:25 +00:00
Craig Topper	b5421c498d	[X86] Prevent non-temporal loads from folding into instructions by blocking them in X86DAGToDAGISel::IsProfitableToFold rather than with a predicate. Remove tryFoldVecLoad since tryFoldLoad would call IsProfitableToFold and pick up the new check. This saves about 5K out of ~600K on the generated isel table. llvm-svn: 344189	2018-10-10 21:48:34 +00:00
George Burgess IV	6ef8002c2c	Replace most users of UnknownSize with LocationSize::unknown(); NFC Moving away from UnknownSize is part of the effort to migrate us to LocationSizes (e.g. the cleanup promised in D44748). This doesn't entirely remove all of the uses of UnknownSize; some uses require tweaks to assume that UnknownSize isn't just some kind of int. This patch is intended to just be a trivial replacement for all places where LocationSize::unknown() will Just Work. llvm-svn: 344186	2018-10-10 21:28:44 +00:00
Thomas Lively	eff0542c56	[WebAssembly][NFC] Remove repetition of Defs = [ARGUMENTS] Summary: By moving that line into the `I` multiclass. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53093 llvm-svn: 344180	2018-10-10 20:40:54 +00:00
Roman Lebedev	33d84c6dac	[X86] Move X86DAGToDAGISel::matchBEXTRFromAnd() into X86ISelLowering Summary: As discussed in [[ https://bugs.llvm.org/show_bug.cgi?id=38938 \| PR38938 ]], we fail to emit `BEXTR` if the mask is shifted. We can't deal with that in `X86DAGToDAGISel` `before the address mode for the inc is selected`, and we can't really do it in the normal DAGCombine, because we don't have generic `ISD::BitFieldExtract` node, and if we simply turn the shifted mask into a normal mask + shift-left, it will be folded back. So it would seem X86ISelLowering is the place to handle this. This patch only moves the matchBEXTRFromAnd() from X86DAGToDAGISel to X86ISelLowering. It does not add support for the 'shifted mask' pattern. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52426 llvm-svn: 344179	2018-10-10 20:40:12 +00:00
Thomas Lively	103f0161b3	[WebAssembly][NFC] Use vnot patfrag to simplify v128.not Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53097 llvm-svn: 344175	2018-10-10 19:09:16 +00:00
Sanjay Patel	6cca8af227	[x86] allow single source horizontal op matching (PR39195) This is intended to restore horizontal codegen to what it looked like before IR demanded elements improved in: rL343727 As noted in PR39195: https://bugs.llvm.org/show_bug.cgi?id=39195 ...horizontal ops can be worse for performance than a shuffle+regular binop, so I've added a TODO. Ideally, we'd solve that in a machine instruction pass, but a quicker solution will be adding a 'HasFastHorizontalOp' feature bit to deal with it here in the DAG. Differential Revision: https://reviews.llvm.org/D52997 llvm-svn: 344141	2018-10-10 13:39:59 +00:00
Simon Pilgrim	5cb3a82892	[TargetLowering] Add root node back to work list after successful SimplifyDemandedBits/SimplifyDemandedVectorElts Similar to what already happens in the DAGCombiner wrappers, this patch adds the root nodes back onto the worklist if the DCI wrappers' SimplifyDemandedBits/SimplifyDemandedVectorElts were successful. Differential Revision: https://reviews.llvm.org/D53026 llvm-svn: 344132	2018-10-10 10:44:15 +00:00
Jonas Paulsson	bf66f38705	[SystemZ] Temporarily disable high VFs with integer div/rem. Until mischeduler is clever enough to avoid spilling in a vectorized loop with many (scalar) DLRs it is better to avoid high vectorization factors (8 and above). llvm-svn: 344129	2018-10-10 09:30:29 +00:00
Craig Topper	02c62aa58a	[X86] Remove FeatureRTM from Skylake processor list Summary: There are a LOT of Skylakes and later without TSX-NI. Examples: - SKL: https://ark.intel.com/products/136863/Intel-Core-i3-8121U-Processor-4M-Cache-up-to-3-20-GHz- - KBL: https://ark.intel.com/products/97540/Intel-Core-i7-7560U-Processor-4M-Cache-up-to-3-80-GHz- - KBL-R: https://ark.intel.com/products/149091/Intel-Core-i7-8565U-Processor-8M-Cache-up-to-4-60-GHz- - CNL: https://ark.intel.com/products/136863/Intel-Core-i3-8121U-Processor-4M-Cache-up-to-3_20-GHz This feature seems to be present only on high-end desktop and server chips (I can't find any SKX without). This commit leaves it disabled for all processors, but can be re-enabled for specific builds with -mrtm. Patch by Thiago Macieira Reviewers: erichkeane, craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53041 llvm-svn: 344116	2018-10-10 07:43:35 +00:00
Jonas Paulsson	2c8b33770c	[SystemZ] Take better care when computing needed vector registers in TTI. A new function getNumVectorRegs() is better to use for the number of needed vector registers instead of getNumberOfParts(). This is to make sure that the number of vector registers (and typically operations) required for a vector type is accurate. getNumberOfParts(), which was previously used, works by splitting the vector type until it is legal, and gives incorrect results for types with a non power of two number of elements (rare). A new static function getScalarSizeInBits() also checks for a pointer type and returns 64U for it (since otherwise it gets a value of 0). It is used in a few places where Ty may be a pointer. Review: Ulrich Weigand llvm-svn: 344115	2018-10-10 07:36:27 +00:00
QingShan Zhang	bc1586352e	[PowerPC] Fix the assert of ISD::SIGN_EXTEND_INREG when type is v2i16 and v2i8 An ISD::SIGN_EXTEND_INREG operation on v2i16 and v2i8 types causes an assert because they are registered as custom operations. The type legalization phase therefore enters the custom hook, which does not handle ISD::SIGN_EXTEND_INREG and falls through into an unreachable assert. Patch By: wuzish (Zixuan Wu) Differential Revision: https://reviews.llvm.org/D52449 llvm-svn: 344109	2018-10-10 02:33:48 +00:00
Thomas Lively	108e98ec32	[WebAssembly] Fix fneg lowering Summary: Subtraction from zero and floating point negation do not have the same semantics, so fix lowering. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52948 llvm-svn: 344107	2018-10-10 01:09:09 +00:00
Heejin Ahn	5d900954bd	[WebAssembly] Improve comments for SIMD instruction definitions llvm-svn: 344106	2018-10-10 01:04:02 +00:00
Thomas Lively	409f5840a7	[WebAssembly] Handle V128 register class in explicit locals pass Summary: Also add tests to catch crashes in passes that are not normally run in tests. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52959 llvm-svn: 344094	2018-10-09 23:33:16 +00:00
Rong Xu	5c7bf1a756	[X86] Fix sanitizer bot failure from 344085 Fix the memory issue exposed by sanitizer. llvm-svn: 344092	2018-10-09 23:10:56 +00:00
Heejin Ahn	d9a6de3c38	[WebAssembly] Improve readability of SIMD instructions (NFC) Summary: - Categorize instructions into the categories as in the SIMD spec - Move SIMD-related definition to WebAssemblyInstrSIMD.td - Put definition and use of patterns together - Add newlines here and there Reviewers: tlively Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53045 llvm-svn: 344086	2018-10-09 22:23:39 +00:00
Rong Xu	3d2efdfdea	Recommit r343993: [X86] condition branches folding for three-way conditional codes Fix the memory issue exposed by sanitizer. llvm-svn: 344085	2018-10-09 22:03:40 +00:00
Nemanja Ivanovic	87873d04c3	[PowerPC] Implement hasBitPreservingFPLogic for types that can be supported This is the PPC-specific non-controversial part of https://reviews.llvm.org/D44548 that simply enables this combine for PPC since PPC has these instructions. This commit will allow the target-independent portion to be truly target independent. llvm-svn: 344077	2018-10-09 20:35:15 +00:00
Craig Topper	f6d8400869	[X86] When lowering unsigned v2i64 setcc without SSE42, flip the sign bits in the v2i64 type then bitcast to v4i32. This may give slightly better opportunities for DAG combine to simplify with the operations before the setcc. It also matches the type the xors will eventually be promoted to anyway so it saves a legalization step. Almost all of the test changes are because our constant pool entry is now v2i64 instead of v4i32 on 64-bit targets. On 32-bit targets getConstant should be emitting a v4i32 build_vector and a v4i32->v2i64 bitcast. There are a couple test cases where it appears we now combine a bitwise not with one of these xors which caused a new constant vector to be generated. This prevented a constant pool entry from being shared. But if that's an issue we're concerned about, it seems we need to address it another way than just relying on a bitcast to hide it. This came about from experiments I've been trying with pushing the promotion of and/or/xor to vXi64 later than LegalizeVectorOps where it is today. We run LegalizeVectorOps in a bottom up order. So the and/or/xor are promoted before their users are legalized. The bitcasts added for the promotion act as a barrier to computeKnownBits if we try to use it during vector legalization of a later operation. So by moving the promotion out we can hopefully get better results from computeKnownBits/computeNumSignBits like in LowerTruncate on AVX512. I've also looked at running LegalizeVectorOps in a top down order like LegalizeDAG, but that's showing some other issues. llvm-svn: 344071	2018-10-09 19:05:50 +00:00
Sanjay Patel	f5fac1826a	[x86] use demanded bits to simplify masked store codegen As noted in D52747, if we prefer IR to use trunc for bool vectors rather than and+icmp, we can expose codegen shortcomings as seen here with masked store. Replace a hard-coded PCMPGT simplification with the more general demanded bits call to improve things. Differential Revision: https://reviews.llvm.org/D52964 llvm-svn: 344048	2018-10-09 14:04:14 +00:00
Simon Atanasyan	d465318c6d	[mips] Set pointer size to 4 bytes for N32 ABI CodePointerSize and CalleeSaveStackSlotSize values are used in DWARF generation. In case of MIPS it's incorrect to check for Triple::isMIPS64() only, because this function returns true for N32 ABI too. Now we do not have a method to recognize N32 if it's specified by a command line option and is not a part of a target triple. So we check for Triple::GNUABIN32 only. It's better than nothing. Differential revision: https://reviews.llvm.org/D52874 llvm-svn: 344039	2018-10-09 11:29:45 +00:00
Nemanja Ivanovic	4c0b110e3e	[PowerPC] Remove self-copies in pre-emit peephole There are occasionally instances where AADB rewrites registers in such a way that a reg-reg copy becomes a self-copy. Such an instruction is obviously redundant and can be removed. This patch does precisely that. Note that this will not remove various nop's that we insert (which are themselves just self-copies). The reason those are left alone is that all of them have their own opcodes (that just encode to a self-copy). What prompted this patch is the fact that these self-copies sometimes end up using registers that make the instruction a priority-setting nop, thereby having a significant effect on performance. Differential revision: https://reviews.llvm.org/D52432 llvm-svn: 344036	2018-10-09 10:54:04 +00:00
Simon Pilgrim	720db8ed7b	[X86][AVX1] Enable _EXTEND_VECTOR_INREG lowering of 256-bit vectors As discussed on D52964, this adds 256-bit _EXTEND_VECTOR_INREG lowering support for AVX1 targets to help improve SimplifyDemandedBits handling. Differential Revision: https://reviews.llvm.org/D52980 llvm-svn: 344019	2018-10-09 07:42:01 +00:00
Petar Jovanovic	aa97890d66	[MIPS GlobalISel] Legalize i64 add Custom legalize s64 G_ADD for MIPS32. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D52652 llvm-svn: 344007	2018-10-08 23:59:37 +00:00
Rong Xu	47fd015163	[X86] Revert r343993 condition branches folding for three-way conditional codes Some buildbots failed. llvm-svn: 343998	2018-10-08 22:08:43 +00:00
Craig Topper	ff9f02580d	[X86] Prefer isTypeLegal over checking isSimple in a DAG combine. Simple types are a superset of the types that any in-tree target in LLVM could possibly have as a legal type. This means the behavior of using isSimple to check for a supported type for X86 could change over time. For example, this would change if a v256i1 type was added to MVT in the future. llvm-svn: 343995	2018-10-08 20:02:59 +00:00
Rong Xu	67b1b328f7	[X86] condition branches folding for three-way conditional codes This patch implements a pass that optimizes condition branches on x86 by taking advantage of the three-way conditional code generated by compare instructions. Currently, it tries to hoist EQ and NE conditional branches to a dominant conditional branch condition where the same EQ/NE conditional code is computed. An example: bb_0: cmp %0, 19 jg bb_1 jmp bb_2 bb_1: cmp %0, 40 jg bb_3 jmp bb_4 bb_4: cmp %0, 20 je bb_5 jmp bb_6 Here we could combine the two compares in bb_0 and bb_4 and have the following code: bb_0: cmp %0, 20 jg bb_1 jl bb_2 jmp bb_5 bb_1: cmp %0, 40 jg bb_3 jmp bb_6 For the case of %0 == 20 (bb_5), we eliminate two jumps, and the control height for bb_6 is also reduced. bb_4 is gone after the optimization. This optimization is motivated by the branch pattern generated by the switch lowering: we always have pivot-1 compare for the inner nodes and we do a pivot compare against the leaf (like above pattern). This pass currently is enabled on Intel's Sandybridge and later arches. Some reviewers pointed out that on some arches (like AMD Jaguar), this pass may increase branch density to the point where it hurts the performance of the branch predictor. Differential Revision: https://reviews.llvm.org/D46662 llvm-svn: 343993	2018-10-08 18:52:39 +00:00
Scott Linder	823549a6ec	[AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructions Emit a waterfall loop in the general case for a potentially-divergent Rsrc operand. When practical, avoid this by using Addr64 instructions. Recommits r341413 with changes to update the MachineDominatorTree when present. Differential Revision: https://reviews.llvm.org/D51742 llvm-svn: 343992	2018-10-08 18:47:01 +00:00
Simon Pilgrim	6fc8d05565	[X86][AVX2] Enable ZERO_EXTEND_VECTOR_INREG lowering of 256-bit vectors Some necessary yak shaving before lowering *_EXTEND_VECTOR_INREG 256-bit vectors on AVX1 targets as suggested by D52964. Differential Revision: https://reviews.llvm.org/D52970 llvm-svn: 343991	2018-10-08 18:40:50 +00:00
Sanjay Patel	43bf9917cc	[x86] make horizontal binop matching clearer; NFCI The instructions are complicated, so this code will probably never be very obvious, but hopefully this makes it better. As shown in PR39195: https://bugs.llvm.org/show_bug.cgi?id=39195 ...we need to improve the matching to not miss cases where we're h-opping on 1 source vector, and that should be a small patch after this rearranging. llvm-svn: 343989	2018-10-08 18:08:02 +00:00
Tom Stellard	14d8807d9a	AMDGPU/GlobalISel: Select amdgcn.cvt.pkrtz to 64-bit instructions Summary: The 32-bit variants do not exist on VI+. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52958 llvm-svn: 343985	2018-10-08 17:49:29 +00:00
Neil Henning	6641657453	[AMDGPU] Add an AMDGPU specific atomic optimizer. This commit adds a new IR level pass to the AMDGPU backend to perform atomic optimizations. It works by: - Running through a function and finding atomicrmw add/sub or uses of the atomic buffer intrinsics for add/sub. - If all arguments except the value to be added/subtracted are uniform, record the value to be optimized. - Run through the atomic operations we can optimize and, depending on whether the value is uniform/divergent use wavefront wide operations (DPP in the divergent case) to calculate the total amount to be atomically added/subtracted. - Then let only a single lane of each wavefront perform the atomic operation, reducing the total number of atomic operations in flight. - Lastly we recombine the result from the single lane to each lane of the wavefront, and calculate our individual lanes offset into the final result. Differential Revision: https://reviews.llvm.org/D51969 llvm-svn: 343973	2018-10-08 15:49:19 +00:00
Oliver Stannard	367b4741f4	[AArch64][v8.5A] Don't create BR instructions in outliner when BTI enabled When branch target identification is enabled, we can only do indirect tail-calls through x16 or x17. This means that the outliner can't transform a BLR instruction at the end of an outlined region into a BR. Differential revision: https://reviews.llvm.org/D52869 llvm-svn: 343969	2018-10-08 14:12:08 +00:00
Oliver Stannard	c922116a51	[AArch64][v8.5A] Restrict indirect tail calls to use x16/17 only when using BTI When branch target identification is enabled, all indirectly-callable functions start with a BTI C instruction. This instruction can only be the target of certain indirect branches (direct branches and fall-through are not affected): - A BLR instruction, in either a protected or unprotected page. - A BR instruction in a protected page, using x16 or x17. - A BR instruction in an unprotected page, using any register. Without BTI, we can use any non call-preserved register to hold the address for an indirect tail call. However, when BTI is enabled, then the code being compiled might be loaded into a BTI-protected page, where only x16 and x17 can be used for indirect tail calls. Legacy code without this restriction can still indirectly tail-call BTI-protected functions, because they will be loaded into an unprotected page, so any register is allowed. Differential revision: https://reviews.llvm.org/D52868 llvm-svn: 343968	2018-10-08 14:09:15 +00:00
Oliver Stannard	250e5a5b65	[AArch64][v8.5A] Branch Target Identification code-generation pass The Branch Target Identification extension, introduced to AArch64 in Armv8.5-A, adds the BTI instruction, which is used to mark valid targets of indirect branches. When enabled, the processor will trap if an instruction in a protected page tries to perform an indirect branch to any instruction other than a BTI. The BTI instruction uses encodings which were NOPs in earlier versions of the architecture, so BTI-enabled code will still run on earlier hardware, just without the extra protection. There are 3 variants of the BTI instruction, which are valid targets for different kinds of branches: - BTI C can be targeted by call instructions, and is intended to be used at function entry points. These are the BLR instruction, as well as BR with x16 or x17. These BR instructions are allowed for use in PLT entries, and we can also use them to allow indirect tail-calls. - BTI J can be targeted by BR only, and is intended to be used by jump tables. - BTI JC acts as both a BTI C and a BTI J instruction, and can be targeted by any BLR or BR instruction. Note that RET instructions are not restricted by branch target identification, the reason for this is that return addresses can be protected more effectively using return address signing. Direct branches and calls are also unaffected, as it is assumed that an attacker cannot modify executable pages (if they could, they wouldn't need to do a ROP/JOP attack). This patch adds a MachineFunctionPass which: - Adds a BTI C at the start of every function which could be indirectly called (either because it is address-taken, or externally visible so could be address-taken in another translation unit). - Adds a BTI J at the start of every basic block which could be indirectly branched to. This could be either done by a jump table, or by taking the address of the block (e.g. using the GCC label values extension). 
We only need to use BTI JC when a function is indirectly-callable, and takes the address of the entry block. I've not been able to trigger this from C or IR, but I've included a MIR test just in case. Using BTI C at function entries relies on the fact that no other code in BTI-protected pages uses indirect tail-calls, unless they use x16 or x17 to hold the address. I'll add that code-generation restriction as a separate patch. Differential revision: https://reviews.llvm.org/D52867 llvm-svn: 343967	2018-10-08 14:04:24 +00:00
Alexander Ivchenko	1aedf203dd	[GlobalIsel][X86] Support G_UDIV/G_UREM/G_SREM Support G_UDIV/G_UREM/G_SREM. The instruction selection code is taken from FastISel with only minor tweaks to adapt for GlobalISel. Differential Revision: https://reviews.llvm.org/D49781 llvm-svn: 343966	2018-10-08 13:40:34 +00:00
Neil Henning	57f5d0a885	[IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle. The IRBuilder CreateIntrinsic method wouldn't allow you to specify the types that you wanted the intrinsic to be mangled with. To fix this I've: - Added an ArrayRef<Type > member to both CreateIntrinsic overloads. - Used that array to pass into the Intrinsic::getDeclaration call. - Added a CreateUnaryIntrinsic to replace the most common use of CreateIntrinsic where the type was auto-deduced from operand 0. - Added a bunch more unit tests to test CreateIntrinsic calls that weren't being tested (including the FMF flag that wasn't checked). This was suggested as part of the AMDGPU specific atomic optimizer review (https://reviews.llvm.org/D51969). Differential Revision: https://reviews.llvm.org/D52087 llvm-svn: 343962	2018-10-08 10:32:33 +00:00
Peter Smith	6f36cd4d76	[ARM] Account for implicit IT when calculating inline asm size When deciding if it is safe to optimize a conditional branch to a CBZ or CBNZ the offsets of the BasicBlocks from the start of the function are estimated. For inline assembly the generic getInlineAsmLength() function is used to get a worst case estimate of the inline assembly by multiplying the number of instructions by the max instruction size of 4 bytes. This unfortunately doesn't take into account the generation of Thumb implicit IT instructions. In edge cases such as when all the instructions in the block are 4-bytes in size and there is an implicit IT then the size is underestimated. This can cause an out of range CBZ or CBNZ to be generated. The patch takes a conservative approach and assumes that every instruction in the inline assembly block may have an implicit IT. Fixes pr31805 Differential Revision: https://reviews.llvm.org/D52834 llvm-svn: 343960	2018-10-08 09:38:28 +00:00
Oliver Stannard	9ecdac8ee0	[AArch64] Fix verifier error when outlining indirect calls The MachineOutliner for AArch64 transforms indirect calls into indirect tail calls, replacing the call with the TCRETURNri pseudo-instruction. This pseudo lowers to a BR, but has the isCall and isReturn flags set. The problem is that TCRETURNri takes a tcGPR64 as the register argument, to prevent indirect tail-calls from using caller-saved registers. The indirect calls transformed by the outliner could use caller-saved registers. This is fine, because the outliner ensures that the register is available at all call sites. However, this causes a verifier failure when the register is not in tcGPR64. The fix is to add a new pseudo-instruction like TCRETURNri, but which accepts any GPR. Differential revision: https://reviews.llvm.org/D52829 llvm-svn: 343959	2018-10-08 09:18:48 +00:00
Simon Pilgrim	9fa1c66421	[X86] getFauxShuffleMask - Handle undef + sentinel values in subvector insertion llvm-svn: 343926	2018-10-06 22:13:44 +00:00
Simon Pilgrim	a30e8d23e2	[X86][AVX] Ensure resolveTargetShuffleInputs shuffle masks are the correct width Don't handle ZERO_EXTEND style shuffles until we support bitcasts. Found by inspection. llvm-svn: 343924	2018-10-06 17:18:41 +00:00
Simon Pilgrim	62d199f4e5	[X86] combinePMULDQ - add op back to worklist if SimplifyDemandedBits succeeds on either operand Prevents missing other simplifications that may occur deep in the operand chain where CommitTargetLoweringOpt won't add the PMULDQ back to the worklist itself llvm-svn: 343922	2018-10-06 14:51:14 +00:00
Simon Pilgrim	0cc0a24b55	[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - simplify PSHUFB masks Attempt to simplify PSHUFB masks (even non-constant ones) - we should probably be able to simplify other variable shuffles as well as the need arises. llvm-svn: 343919	2018-10-06 13:49:31 +00:00
Simon Pilgrim	ae78d709b4	[X86] Use the SimplifyDemandedBits wrappers where possible. NFCI. Leave the wrapper to handle TargetLowering::TargetLoweringOpt and CommitTargetLoweringOpt. llvm-svn: 343918	2018-10-06 13:29:08 +00:00
Alex Bradbury	639df9e4c0	[RISCV] Compress addiw rd, x0, simm6 to c.li rd, simm6 A pattern was present for addi rd, x0, simm6 but not addiw which is semantically identical when the source register is x0. This patch addresses that, and the benefit can be seen in rv64c-aliases-valid.s. llvm-svn: 343911	2018-10-06 06:09:46 +00:00
Tom Stellard	251ee083a3	AMDGPU: Consolidate SMRD TableGen patterns Summary: Merge the SMRD patterns for CI into the same multiclass as the patterns for other sub-targets. This removes some duplicate code and will make it easier for some future GlobalISel changes I would like to do. Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52557 llvm-svn: 343909	2018-10-06 03:32:43 +00:00
Matthias Braun	81578e9f77	X86, AArch64, ARM: Do not attach debug location to spill/reload instructions This rebases and recommits r343520. hwasan should be fixed now and this shouldn't break the tests anymore. Spill/reload instructions are artificially generated by the compiler and have no relation to the original source code. So the best thing to do is not attach any debug location to them (instead of just taking the next debug location we find on following instructions). Differential Revision: https://reviews.llvm.org/D52125 llvm-svn: 343895	2018-10-05 22:00:13 +00:00
Simon Pilgrim	dc97118efe	[X86][AVX] Limit getFauxShuffleMask INSERT_SUBVECTOR support to 2 inputs rL343853 didn't limit the number of subinputs, but we don't currently support faux shuffles with more than 2 total inputs, so put a limiter in place until this is fixed. Found by Artem Dergachev. llvm-svn: 343891	2018-10-05 21:44:19 +00:00
Craig Topper	0ed892da70	[X86] Don't promote i16 compares to i32 if the immediate will fit in 8 bits. The comments in this code say we were trying to avoid 16-bit immediates, but if the immediate fits in 8-bits this isn't an issue. This avoids creating a zero extend that probably won't go away. The movmskb related changes are interesting. The movmskb instruction writes a 32-bit result, but fills the upper bits with 0. So the zero_extend we were previously emitting was free, but we turned a -1 immediate that would fit in 8-bits into a 32-bit immediate so it was still bad. llvm-svn: 343871	2018-10-05 18:13:36 +00:00
Simon Pilgrim	f09fc3bc12	[X86] Move ReadAfterLd functionality into X86FoldableSchedWrite (PR36957) Currently we hardcode instructions with ReadAfterLd if the register operands don't need to be available until the folded load has completed. This doesn't take into account the different load latencies of different memory operands (PR36957). This patch adds a ReadAfterFold def into X86FoldableSchedWrite to replace ReadAfterLd, allowing us to specify the load latency at a scheduler class level. I've added ReadAfterVec*Ld classes that match the XMM/Scl, XMM and YMM/ZMM WriteVecLoad classes that we currently use, we can tweak these values in future patches once this infrastructure is in place. Differential Revision: https://reviews.llvm.org/D52886 llvm-svn: 343868	2018-10-05 17:57:29 +00:00
Simon Pilgrim	6c5ab48fe7	[X86][AVX] getFauxShuffleMask - add support for INSERT_SUBVECTOR subvector shuffles Decode subvector shuffles from INSERT_SUBVECTOR(SRC0, SHUFFLE(EXTRACT_SUBVECTOR(SRC1)) This was found necessary while investigating PR39161 llvm-svn: 343853	2018-10-05 14:41:00 +00:00
Jonas Paulsson	faad1b3056	[TargetRegisterInfo] Remove temporary hook enableMultipleCopyHints() Finally all targets are enabling multiple regalloc hints, so the hook to disable this can now be removed. NFC. Review: Simon Pilgrim https://reviews.llvm.org/D52316 llvm-svn: 343851	2018-10-05 14:23:11 +00:00
Tom Stellard	7c65078f04	AMDGPU/GlobalISel: Add support for G_INTTOPTR Summary: This is a no-op. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52916 llvm-svn: 343839	2018-10-05 04:34:09 +00:00
Thomas Lively	4b47d08e52	[WebAssembly] Saturating arithmetic intrinsics Summary: Depends on D52805. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52813 llvm-svn: 343833	2018-10-05 00:45:20 +00:00
Yury Delendik	409b439152	[WebAssembly] Ignore DBG_VALUE in WebAssemblyCFGStackify pass when looking for block start Summary: Fixes https://bugs.llvm.org/show_bug.cgi?id=39158 and regression caused by D49034, though it is possible the problem existed before and was exposed by additional DBG_VALUEs. Reviewers: sunfish, dschuff, aheejin Reviewed By: aheejin Subscribers: sbc100, aheejin, llvm-commits, alexcrichton, jgravelle-google Differential Revision: https://reviews.llvm.org/D52837 llvm-svn: 343827	2018-10-04 23:31:00 +00:00
Ana Pazos	9d6c55323f	[RISCV] Support named operands for CSR instructions. Reviewers: asb, mgrang Reviewed By: asb Subscribers: jocewei, mgorny, jfb, PkmX, MartinMosbeck, brucehoult, the_o, rkruppe, rogfer01, rbar, johnrusso, simoncook, jordy.potman.lists, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones Differential Revision: https://reviews.llvm.org/D46759 llvm-svn: 343822	2018-10-04 21:50:54 +00:00
Craig Topper	7d2155e3f9	[X86][LegalizeVectorOps] Use MERGE_VALUES to return two results from LowerLoad. Remove special case code in LegalizeVectorOps that allowed us to only return one result. Previously we replaced the chain use ourself and return the data result. LegalizeVectorOps then detected that we'd done this and assumed the chain had already been handled. This commit instead returns a MERGE_VALUES node with two results joined from nodes. This allows LegalizeVectorOps to do all the replacements for us without any special casing. The MERGE_VALUES will be removed by DAG combine. llvm-svn: 343817	2018-10-04 21:24:24 +00:00
Heejin Ahn	b68d591475	[WebAssembly] Don't modify preds/succs iterators while erasing from them Summary: This caused out-of-bound bugs. Found by `-DLLVM_ENABLE_EXPENSIVE_CHECKS=ON`. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52902 llvm-svn: 343814	2018-10-04 21:03:35 +00:00
Konstantin Zhuravlyov	aa067cb9fb	AMDGPU: Rename isAmdCodeObjectV2 -> isAmdHsaOrMesa The isAmdCodeObjectV2 is a misleading name which actually checks whether the os is amdhsa or mesa. Also add a test to make sure we do not generate old kernel header for code object v3. Differential Revision: https://reviews.llvm.org/D52897 llvm-svn: 343813	2018-10-04 21:02:16 +00:00
Martin Storsjo	37b742e208	[COFF] [X86] Don't use llvm_unreachable for unsupported relocation types This can happen if assembling a reference to _GLOBAL_OFFSET_TABLE_. While it doesn't make sense to try to assemble that for COFF, the fact that we previously used llvm_unreachable meant that the code had undefined behaviour if something tried to assemble that. The configure script of libgmp would try to assemble such a snippet (which should signal a failure). If llvm is built without assertions, the undefined behaviour meant a (near) infinite loop. Differential Revision: https://reviews.llvm.org/D52903 llvm-svn: 343811	2018-10-04 20:43:38 +00:00
Matthias Braun	0c67a4e958	AArch64: Fix XSeqPairs/WSeqPairs problems - Fix spill/reloads of XSeqPairs failing with vregs (only physregs worked correctly) - Add missing spill/reload code for WSeqPairs class Differential Revision: https://reviews.llvm.org/D52761 llvm-svn: 343799	2018-10-04 17:02:53 +00:00
Farhana Aleen	4bc597bff5	[AMDGPU] Match signed dot4/8 pattern. Summary: This patch matches signed dot4 and dot8 pattern. Author: FarhanaAleen Reviewed By: msearles Differential Revision: https://reviews.llvm.org/D52520 llvm-svn: 343798	2018-10-04 16:57:37 +00:00
Alex Bradbury	5bf3b20e99	[RISCV] Remove overzealous is64Bit checks lowerGlobalAddress, lowerBlockAddress, and insertIndirectBranch contain overzealous checks for is64Bit. These functions are all safe as-implemented for RV64. llvm-svn: 343781	2018-10-04 14:30:03 +00:00
David Greene	4f916df29e	[X86] Set correct MMO offset on scalarized load pieces When scalarizing a load, be sure to update the offset in the MachineMemOperand for each scalar load. llvm-svn: 343776	2018-10-04 14:07:59 +00:00
Simon Pilgrim	991b0d24ff	Fix MSVC "not all control paths return a value" warning. NFCI. llvm-svn: 343765	2018-10-04 10:25:52 +00:00
Alex Bradbury	e96b7c88a3	[RISCV] Bugfix for floats passed on the stack with the ILP32 ABI on RV32F f32 values passed on the stack would previously cause an assertion in unpackFromMemLoc. This would only trigger in the presence of the F extension making f32 a legal type. Otherwise the f32 would be legalized. This patch fixes that by keeping LocVT=f32 when a float is passed on the stack. It also adds test coverage for this case, and tests that also demonstrate lw/sw/flw/fsw will be selected when most profitable, i.e. there is no unnecessary i32<->f32 conversion in registers. llvm-svn: 343756	2018-10-04 07:28:49 +00:00
Craig Topper	8b3c46f0a8	[X86] Merge matchANDXORWithAllOnesAsANDNP into combineANDXORWithAllOnesIntoANDNP. NFCI It's the only caller and the logic is pretty easy to combine. llvm-svn: 343754	2018-10-04 06:13:27 +00:00
Alex Bradbury	0e16766b76	[RISCV][NFC] Fix naming of RISCVISelLowering::{LowerRETURNADDR,LowerFRAMEADDR} Rename to lowerRETURNADDR, lowerFRAMEADDR in order to be consistent with the LLVM coding style and the other functions in this file. llvm-svn: 343752	2018-10-04 05:27:50 +00:00
Alex Bradbury	5ac0a2fc48	[RISCV] Handle redundant SplitF64+BuildPairF64 pairs in a DAGCombine r343712 performed this optimisation during instruction selection. As Eli Friedman pointed out in post-commit review, implementing this as a DAGCombine might allow opportunities for further optimisations. llvm-svn: 343741	2018-10-03 23:30:16 +00:00
Thomas Lively	5d461c96bd	[WebAssembly] Bitselect intrinsic and instruction Summary: Depends on D52755. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52805 llvm-svn: 343739	2018-10-03 23:02:23 +00:00
Alex Bradbury	1dbfdeb6e5	[RISCV][NFC] Refactor LocVT<->ValVT conversion in RISCVISelLowering There was some duplicated logic for using the LocInfo of a CCValAssign in order to convert from the ValVT to LocVT or vice versa. Resolve this by factoring out convertLocVTFromValVT from unpackFromRegLoc. Also rename packIntoRegLoc to the more appropriate convertValVTToLocVT and call these helper functions consistently. llvm-svn: 343737	2018-10-03 22:53:25 +00:00
Derek Schuff	77a7a38006	[WebAssembly] Refactor WasmSignature and use it for MCSymbolWasm MCContext does not destroy MCSymbols on shutdown. So, rather than putting SmallVectors (which may heap-allocate) inside MCSymbolWasm, use unowned pointer to a WasmSignature instead. The signatures are now owned by the AsmPrinter. Also uses WasmSignature instead of param and result vectors in TargetStreamer, and leaves some TODOs for further simplification. Differential Revision: https://reviews.llvm.org/D52580 llvm-svn: 343733	2018-10-03 22:22:48 +00:00
Craig Topper	a65c2dbfd6	[X86] Stop promoting vector ISD::SELECT to vXi64. The additional patterns needed for this aren't overwhelming and introducing extra bitcasts during lowering limits our ability to do computeNumSignBits. Not that I have a good example of that for select. I'm just becoming increasingly grumpy about promotion of AND/OR/XOR. SELECT was just a lot easier to fix. llvm-svn: 343723	2018-10-03 21:10:29 +00:00
Craig Topper	c39dc41b63	[X86] Add CMOV_VK2/VK4 pseudos and remove lowering code that turned v2i1/v4i1 SELECT into v8i1. llvm-svn: 343713	2018-10-03 20:28:43 +00:00
Alex Bradbury	ce9049952f	[RISCV][NFCI] Handle redundant splitf64+buildpairf64 pairs during instruction selection Although we can't write a tablegen pattern to remove redundant splitf64+buildpairf64 pairs due to the multiple return values, we can handle it with some C++ selection code. This is simpler than removing them after instruction selection through RISCVDAGToDAGISel::PostprocessISelDAG, as was done previously. llvm-svn: 343712	2018-10-03 20:12:10 +00:00
Craig Topper	703fbde3cb	[X86] Add CMOV pseudos for VR128X and VR256X register classes. Use them when AVX512VL is enabled. This allows the phi nodes to be generated with the correct register class when expanded. llvm-svn: 343710	2018-10-03 19:48:26 +00:00
Craig Topper	4b62c2dbda	[X86] Don't break CMOV pseudo instructions down by type. Just by register class. The register class is all that's important for the pseudo instructions. We can use patterns to handle the different types. llvm-svn: 343709	2018-10-03 19:48:23 +00:00
Simon Pilgrim	aabd99c27a	[X86] PUSH/POP 'mem-mem' instructions are not RMW - these are 2 different addresses This patch adds a 'WriteCopy' [WriteLoad, WriteStore] schedule sequence instead to better model the behaviour Found by @andreadb during llvm-mca testing on btver2 which was crashing on "zero uop" WriteRMW only instructions llvm-svn: 343708	2018-10-03 19:02:38 +00:00
Simon Pilgrim	b80d27a916	[X86] Move Atomic binops to use WriteALURMW schedule class These were being tagged as <WriteALULd, WriteRMW> instead of properly using the RMW sequence llvm-svn: 343705	2018-10-03 18:38:28 +00:00
Simon Pilgrim	0b451a2983	[X86][Btver2] Fix MMX PSHUFB schedule Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343701	2018-10-03 18:18:50 +00:00
Simon Pilgrim	a400612aed	[X86] Move Atomic CMPXCHG to WriteCMPXCHGRMW schedule class llvm-svn: 343700	2018-10-03 18:05:01 +00:00
Simon Pilgrim	2c59475c06	[X86] Add SkylakeClient uops counter - same as the other Intel models. llvm-svn: 343697	2018-10-03 16:45:26 +00:00
Nirav Dave	925b64be64	[X86] Correctly use SSE registers if no-x87 is selected. Fix use of SSE1 registers for f32 ops in no-x87 mode. Notably, allow use of SSE instructions for f32 operations in 64-bit mode (but not 32-bit which is disallowed by calling convention). Also avoid translating memset/memcopy/memmove into SSE registers without X87 for 32-bit mode. This fixes PR38738. Reviewers: nickdesaulniers, craig.topper Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D52555 llvm-svn: 343689	2018-10-03 14:13:30 +00:00
Alex Bradbury	d33ffe9bb1	[RISCV][NFC] Refactor RISCVDAGToDAGISel::Select Introduce and use a switch on the opcode. llvm-svn: 343688	2018-10-03 13:13:13 +00:00
Alex Bradbury	d934032e48	[RISCV] Gate float<->int and double<->int conversion patterns on IsRV32 The patterns as defined are correct only when XLen==32. This is another preparatory patch for a set of patches that flesh out RV64 codegen. llvm-svn: 343679	2018-10-03 11:35:22 +00:00
Alex Bradbury	d464ed8c2e	[RISCV] Remove XLenVT==i32 assumptions from RISCVInstrInfo td 1. brcond operates on a condition. 2. atomic_fence and the pseudo AMO instructions should all take xlen immediates This allows the same definitions and patterns to work for RV64 (XLenVT==i64). llvm-svn: 343678	2018-10-03 11:14:26 +00:00
Alex Bradbury	a9ac5994b1	[RISCV] Gate simm32 materialisation pattern and SW pattern on IsRV32 These patterns are not correct for RV64. llvm-svn: 343677	2018-10-03 11:04:59 +00:00
Tim Renouf	a37679d67b	[AMDGPU] Fix for negative offsets in buffer/tbuffer intrinsics Summary: The new buffer/tbuffer intrinsics handle an out-of-range immediate offset by moving/adding offset&-4096 to a vgpr, leaving an in-range immediate offset, with a chance of the move/add being CSEd for similar loads/stores. However it turns out that a negative offset in a vgpr is illegal, even if adding the immediate offset makes it legal again. Therefore, this commit disables the offset&-4096 thing if the offset is negative. Differential Revision: https://reviews.llvm.org/D52683 Change-Id: Ie02f0a74f240a138dc2a29d17cfbd9e350e4ed13 llvm-svn: 343672	2018-10-03 10:29:43 +00:00
Simon Pilgrim	c68cc4efbe	[X86][Btver2] Most RMW instructions don't require an additional uop Remove uop on WriteRMW and move it into the few instructions that need it. Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343671	2018-10-03 10:28:43 +00:00
Simon Pilgrim	d11015861c	[X86] ALU/ADC RMW instructions should use the WriteRMW sequence class I was expecting this to be a nfc but Silvermont seems to be setup a little differently: // A folded store needs a cycle on MEC_RSV for the store data, but it does not need an extra port cycle to recompute the address. def : WriteRes<WriteRMW, [SLM_MEC_RSV]>; So moving from WriteStore to WriteRMW reduces predicted port pressure, confirmed by @craig.topper that this is correct. Differential Revision: https://reviews.llvm.org/D52740 llvm-svn: 343670	2018-10-03 10:01:13 +00:00
Fangrui Song	3d76d36059	[AMDGPU] Rename pass "isel" to "amdgpu-isel" Summary: The AMDGPU target specific pass "isel" is a misleading name. Reviewers: tstellar, echristo, javed.absar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D52759 llvm-svn: 343659	2018-10-03 03:38:22 +00:00
Matt Arsenault	635d479322	AMDGPU: Always run AMDGPUAlwaysInline Even if calls are enabled, it still needs to be run for forcing inline of functions that use LDS. llvm-svn: 343657	2018-10-03 02:47:25 +00:00
Daniel Sanders	34eac35a60	Add the missing new files from r343654 llvm-svn: 343655	2018-10-03 02:21:30 +00:00
Daniel Sanders	c973ad1878	Re-commit: [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64 Summary: Depends on D45541 Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45543 The previous commit failed portions of the test-suite on GreenDragon due to duplicate COPY instructions and iterator invalidation. Both issues have now been fixed. To assist with this, a helper (cloneVirtualRegister) has been added to MachineRegisterInfo that can be used to get another register that has the same type and class/bank as an existing one. llvm-svn: 343654	2018-10-03 02:12:17 +00:00
Thomas Lively	9075cd607d	[WebAssembly] any_true and all_true intrinsics and instructions Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52755 llvm-svn: 343649	2018-10-03 00:19:39 +00:00
Stanislav Mekhanoshin	1821513e2f	[AMDGPU] Assert in getOpSize() there are no sub-dword subregs Differential Revision: https://reviews.llvm.org/D52769 llvm-svn: 343648	2018-10-03 00:00:41 +00:00
Sam Clegg	b2486f118d	[WebAssembly] Stop generating helper functions in WebAssemblyLowerEmscriptenEHSjLj Previously we were creating weakly defined helper functions in each translation unit: - setThrew - setTempRet0 Instead we now assume these will be provided at link time. In emscripten they are provided in compiler-rt: https://github.com/kripken/emscripten/pull/7203 Additionally we previously created three global variables which are also now required to exist at link time instead. - __THREW__ - _threwValue - __tempRet0 Differential Revision: https://reviews.llvm.org/D49208 llvm-svn: 343640	2018-10-02 22:12:15 +00:00
Matt Morehouse	4b1ec17fb0	Revert "X86, AArch64, ARM: Do not attach debug location to spill/reload instructions" This reverts r343520 due to breakage of HWASan tests on Android. llvm-svn: 343616	2018-10-02 18:35:44 +00:00
Craig Topper	49225d0915	[X86][Disassembler] Add bizarro versions of the MOVSXD instruction that sign extend from a GR32 to GR32 or GR16. The 0x63 opcodes in 64-bit mode have a fixed source size of 32-bits, but the destination size is controlled by REX.W and the 0x66 opsize prefix. This instruction is normally used with a REX.W prefix which provides desired behavior. The other encodings are interpreted as valid by the processor, but aren't useful. This patch makes us recognize them for the disassembler to match objdump. llvm-svn: 343614	2018-10-02 18:16:19 +00:00
Reid Kleckner	d5e4ec74e3	[codeview] Fix 32-bit x86 variable locations in realigned stack frames Add the .cv_fpo_stackalign directive so that we can define $T0, or the VFRAME virtual register, with it. This was overlooked in the initial implementation because unlike MSVC, we push CSRs before allocating stack space, so this value is only needed to describe local variable locations. Variables that the compiler now addresses via ESP are instead described as being stored at offsets from VFRAME, which for us is ESP after alignment in the prologue. This adds tests that show that we use the VFRAME register properly in our S_DEFRANGE records, and that we emit the correct FPO data to define it. Fixes PR38857 llvm-svn: 343603	2018-10-02 16:43:52 +00:00
Simon Pilgrim	860cb5c071	[X86][Btver2] Fix BLENDV and AESDEC schedules Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343597	2018-10-02 15:13:18 +00:00
Krzysztof Parzyszek	528aff3372	[Hexagon] Fix extracting subvectors of non-HVX vNi1 Patch by Brendon Cahoon. llvm-svn: 343596	2018-10-02 15:05:43 +00:00
Diogo N. Sampaio	eb9ca5ab18	[ARM] Emit data symbol for constant pool data The ARM elf emitter would omit printing data symbol when constant data. This patch overrides the emitFill method as to enforce that the symbol is correctly printed. Differential revision: https://reviews.llvm.org/D52737 llvm-svn: 343594	2018-10-02 14:55:48 +00:00
Simon Pilgrim	201bbe3993	[X86] Remove unnecessary BT(C/R/S)m(i/r) scheduler overrides Some SchedAlias remain due to some badly setup RMW tags - but at least the overrides are all removed llvm-svn: 343586	2018-10-02 13:11:59 +00:00
Simon Pilgrim	271bcb9397	[X86] Add APInt constant assembly printer helper llvm-svn: 343577	2018-10-02 11:32:33 +00:00
Oliver Stannard	c41902807e	[AArch64][v8.5A] Add Memory Tagging instructions This adds new instructions to manipulate tagged pointers, and to load and store the tags associated with memory. Patch by Pablo Barrio, David Spickett and Oliver Stannard! Differential revision: https://reviews.llvm.org/D52490 llvm-svn: 343572	2018-10-02 10:04:39 +00:00
Oliver Stannard	2a5fcba94b	[AArch64][v8.5A] Add Memory Tagging system registers This adds new system registers introduced by the Memory Tagging extension. Patch by Pablo Barrio! Differential revision: https://reviews.llvm.org/D52488 llvm-svn: 343571	2018-10-02 09:54:35 +00:00
Oliver Stannard	4493f421ac	[AArch64][v8.5A] Add MTE system instructions The Memory Tagging Extension adds system instructions for data cache maintenance, implemented as new operands to the DC instruction. Patch by Pablo Barrio! Differential revision: https://reviews.llvm.org/D52487 llvm-svn: 343570	2018-10-02 09:48:43 +00:00
Oliver Stannard	85de54090e	[AArch64][v8.5A] Add MTE as an optional AArch64 extension This adds the memory tagging extension, which is an optional extension introduced in v8.5A. The new instructions and registers will be added by subsequent patches. Patch by Pablo Barrio! Differential revision: https://reviews.llvm.org/D52486 llvm-svn: 343563	2018-10-02 09:36:28 +00:00
Simon Pilgrim	ad23f270db	[X86] Standardize floating point assembly comments Consistently try to use APFloat::toString for floating point constant comments to get rid of differences between Constant / ConstantDataSequential values - it should help stop some of the linux-windows buildbot failures matching NaN/INF etc. as well. Differential Revision: https://reviews.llvm.org/D52702 llvm-svn: 343562	2018-10-02 09:08:51 +00:00
Matt Arsenault	ab41193312	AMDGPU: Expand atomicrmw nand in IR llvm-svn: 343559	2018-10-02 03:50:56 +00:00
Thomas Lively	6f77811a21	[WebAssembly] Restore slashes in SIMD conversion names Summary: Depends on D52372 and D52442. Reviewers: aheejin, dschuff, aardappel Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52512 llvm-svn: 343558	2018-10-02 01:52:21 +00:00
Daniel Sanders	33f42f97af	Revert: r343521 and r343541: [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64 There's a strange assertion on two of the Green Dragon bots that goes away when this is reverted. The assertion is in RegBankAlloc and if it is this commit then -verify-machine-instrs should have caught it earlier in the pipeline. llvm-svn: 343546	2018-10-01 22:32:08 +00:00
Reid Kleckner	9ea2c01264	[codeview] Emit S_FRAMEPROC and use S_DEFRANGE_FRAMEPOINTER_REL Summary: Before this change, LLVM would always describe locals on the stack as being relative to some specific register, RSP, ESP, EBP, ESI, etc. Variables in stack memory are pretty common, so there is a special S_DEFRANGE_FRAMEPOINTER_REL symbol for them. This change uses it to reduce the size of our debug info. On top of the size savings, there are cases on 32-bit x86 where local variables are addressed from ESP, but ESP changes across the function. Unlike in DWARF, there is no FPO data to describe the stack adjustments made to push arguments onto the stack and pop them off after the call, which makes it hard for the debugger to find the local variables in frames further up the stack. To handle this, CodeView has a special VFRAME register, which corresponds to the $T0 variable set by our FPO data in 32-bit. Offsets to local variables are instead relative to this value. This is part of PR38857. Reviewers: hans, zturner, javed.absar Subscribers: aprantl, hiraditya, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D52217 llvm-svn: 343543	2018-10-01 21:59:45 +00:00
Craig Topper	42cd8cd862	Recommit r343499 "[X86] Enable load folding in the test shrinking code" Original message: This patch adds load folding support to the test shrinking code. This was noticed missing in the review for D52669 llvm-svn: 343540	2018-10-01 21:35:28 +00:00
Craig Topper	f06a57fc89	Recommit r343498 "[X86] Improve test instruction shrinking when the sign flag is used and the output of the and is truncated." This includes a fix to prevent i16 compares with i32/i64 ands from being shrunk if bit 15 of the and is set and the sign bit is used. Original commit message: Currently we skip looking through truncates if the sign flag is used. But that's overly restrictive. It's safe to look through the truncate as long as we ensure one of the 3 things when we shrink. Either the MSB of the mask at the shrunken size isn't set. If the mask bit is set then either the shrunk size needs to be equal to the compare size or the sign flag needs to be unused. There are still missed opportunities to shrink a load and fold it in here. This will be fixed in a future patch. llvm-svn: 343539	2018-10-01 21:35:26 +00:00
Stefan Pintilie	5d32a86f44	[PowerPC] Folding XForm to DForm loads requires alignment for some DForm loads. Going from XForm Load to DSForm Load requires that the immediate be 4 byte aligned. If we are not aligned we must leave the load as LDX (XForm). This bug is causing a compile-time failure in the benchmark h264ref. Differential Revision: https://reviews.llvm.org/D51988 llvm-svn: 343525	2018-10-01 20:16:27 +00:00
Daniel Sanders	9659bfda5a	[globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64 Summary: Depends on D45541 Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45543 llvm-svn: 343521	2018-10-01 18:56:47 +00:00
Matthias Braun	3e081703c3	X86, AArch64, ARM: Do not attach debug location to spill/reload instructions Spill/reload instructions are artificially generated by the compiler and have no relation to the original source code. So the best thing to do is not attach any debug location to them (instead of just taking the next debug location we find on following instructions). Differential Revision: https://reviews.llvm.org/D52125 llvm-svn: 343520	2018-10-01 18:56:39 +00:00
Craig Topper	e072934d28	Revert r343499 and r343498. X86 test improvements There's a subtle bug in the handling of truncate from i32/i64 to i32 without minsize. I'll be adding more test cases and trying to find a fix. llvm-svn: 343516	2018-10-01 18:40:44 +00:00
Krzysztof Parzyszek	6d569a2cc4	[Hexagon] Remove incorrect pattern for swiz The pattern had a couple of problems: - It was checking for loads of bytes in the reverse order to what it should have been looking for. - It would replace loads of bytes with a load of a word without making sure that the alignment was correct. Thanks to Eli Friedman for pointing it out. llvm-svn: 343514	2018-10-01 18:24:40 +00:00
Stanislav Mekhanoshin	ae8bd6d9b5	[AMDGPU] Fixed SIInstrInfo::getOpSize to handle subregs Currently it returns incorrect operand size for a target independent node such as COPY if operand is a register with subreg. Instead of correct subreg size it returns a size of the whole superreg. Differential Revision: https://reviews.llvm.org/D52736 llvm-svn: 343508	2018-10-01 18:00:02 +00:00
Wouter van Oortmerssen	0c83c3ff38	[WebAssembly] Fixed AsmParser not allowing instructions with / Summary: The AsmParser Lexer regards these as a separate token. Here we expand the instruction name with them if they are adjacent (no whitespace). Tested: the basic-assembly.s test case has one case with a / in it. There are currently also instructions with : in them, which we intend to rename rather than fix them here. Reviewers: tlively, dschuff Subscribers: sbc100, jgravelle-google, aheejin, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52442 llvm-svn: 343501	2018-10-01 17:20:31 +00:00
Craig Topper	aa84e1bba2	[X86] Enable load folding in the test shrinking code This patch adds load folding support to the test shrinking code. This was noticed missing in the review for D52669 Differential Revision: https://reviews.llvm.org/D52699 llvm-svn: 343499	2018-10-01 17:10:50 +00:00
Craig Topper	2b587ad071	[X86] Improve test instruction shrinking when the sign flag is used and the output of the and is truncated Currently we skip looking through truncates if the sign flag is used. But that's overly restrictive. It's safe to look through the truncate as long as we ensure one of the 3 things when we shrink. Either the MSB of the mask at the shrunken size isn't set. If the mask bit is set then either the shrunk size needs to be equal to the compare size or the sign flag needs to be unused. There are still missed opportunities to shrink a load and fold it in here. This will be fixed in a future patch. Differential Revision: https://reviews.llvm.org/D52669 llvm-svn: 343498	2018-10-01 17:10:45 +00:00
Simon Pilgrim	e0d2019052	[X86][Btver2] Fix BT(C|R|S)mr & BT(C|R|S)mi schedule latency + uop counts Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343494	2018-10-01 16:31:30 +00:00
Simon Pilgrim	683e35527b	[X86] Create schedule classes for BT(C|R|S)mi and BT(C|R|S)mr instructions llvm-svn: 343490	2018-10-01 16:12:44 +00:00
Evandro Menezes	55b9a5395b	[AArch64] Refactor cheap cost model Refactor the order in `TII::isAsCheapAsAMove()` to ease future development and maintenance. Practically NFC. llvm-svn: 343489	2018-10-01 16:11:19 +00:00
Simon Pilgrim	4334912c1c	[X86] Remove unnecessary BTmi/BTmr scheduler overrides llvm-svn: 343487	2018-10-01 15:01:00 +00:00
Simon Pilgrim	6ddc4e821c	[X86][Btver2] Fix BTmr schedule uop counts Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343484	2018-10-01 14:42:16 +00:00
Simon Pilgrim	43737a3df4	[X86] Create schedule classes for BTmi and BTmr instructions llvm-svn: 343478	2018-10-01 14:23:37 +00:00
Simon Pilgrim	a982236e59	[X86][Btver2] Fix masked load schedule JFPU01 resource usage should match JFPX Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343468	2018-10-01 13:12:05 +00:00
Alexander Timofeev	b048fa3344	[AMDGPU] Divergence driven instruction selection. Shift operations. Summary: This change enables VOP3 shifts to be explicitly selected dependent on the divergence. Differential Revision: https://reviews.llvm.org/D52559 Reviewers: rampitec llvm-svn: 343455	2018-10-01 11:06:35 +00:00
Andrea Di Biagio	24ea163007	[X86][BtVer2] Teach how to identify zero-idiom VPERM2F128rr instructions. This patch adds another variant class to identify zero-idiom VPERM2F128rr instructions. On Jaguar, a VPERM with bit 3 and 7 of the mask set, is a zero-idiom. Differential Revision: https://reviews.llvm.org/D52663 llvm-svn: 343452	2018-10-01 10:35:13 +00:00
Clement Courbet	a933fb237e	[X86][Sched] Update scheduling information for VZEROALL on HWS, BDW, SKX, SNB. Summary: While looking at PR35606, I found out that the scheduling info is incorrect. One can check that it's really a P5+P6 and not a 2*P56 with: echo -e 'vzeroall\nvandps %xmm1, %xmm2, %xmm3' | ./bin/llvm-exegesis -mode=uops -snippets-file=- (vandps executes on P5 only) Reviewers: craig.topper, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52541 llvm-svn: 343447	2018-10-01 08:37:48 +00:00
Clement Courbet	dac60b9837	[X86][Sched] Add pfm uop counter definitions for SNB,BDW,SKX. llvm-svn: 343446	2018-10-01 08:37:37 +00:00
Craig Topper	67d9dbdbdd	[X86] Stop X86DomainReassignment from creating copies between GR8/GR16 physical registers and k-registers. We can only copy between a k-register and a GR32/GR64 register. This patch detects that the copy will be illegal and prevents the domain reassignment from happening for that closure. This probably isn't the best fix, and we should probably figure out how to handle this correctly. Fixes PR38803. llvm-svn: 343443	2018-10-01 07:08:41 +00:00
Craig Topper	1d1dca6a6f	[X86] Change an llvm_unreachable to a report_fatal_error so the optimizer will stop making us reach the other report_fatal_error in this function. There's a conditional report_fatal_error just above this llvm_unreachable. The optimizer when seeing the unreachable removes the conditional and just makes any other error trigger the existing report_fatal_error. llvm-svn: 343428	2018-09-30 23:43:30 +00:00
Simon Pilgrim	f21083870d	[X86] Fix scheduler class for BTmi instructions This wasn't treated as a folded load instruction llvm-svn: 343424	2018-09-30 20:19:16 +00:00
Craig Topper	99ad2a5723	[X86] Copy memrefs when folding a load for division instruction selection. llvm-svn: 343419	2018-09-30 17:47:18 +00:00
Simon Pilgrim	4f5693ac8d	[X86][Btver2] Fix PCmpIStrI/PCmpIStrM schedules Missing JFPU0 pipe and double JFPU1 pipe (to match JVALU1) resources Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343413	2018-09-30 16:38:38 +00:00
Simon Pilgrim	9cec221a1c	[X86][BtVer2] Add the ability to add additional uops for folded instructions Some instructions take an extra load uop - but not consistently..... llvm-svn: 343410	2018-09-30 15:58:56 +00:00
Craig Topper	1709829fed	[X86] Disable BMI BEXTR in X86DAGToDAGISel::matchBEXTRFromAnd unless we're compiling for a CPU with single uop BEXTR Summary: This function turns (X >> C1) & C2 into a BMI BEXTR or TBM BEXTRI instruction. For BMI BEXTR we have to materialize an immediate into a register to feed to the BEXTR instruction. The BMI BEXTR instruction is 2 uops on Intel CPUs. It looks like on SKL it's one port 0/6 uop and one port 1/5 uop. Despite what Agner's tables say. I know one of the uops is a regular shift uop so it would have to go through the port 0/6 shifter unit. So that's the same or worse execution wise than the shift+and which is one 0/6 uop and one 0/1/5/6 uop. The move immediate into register is an additional 0/1/5/6 uop. For now I've limited this transform to AMD CPUs which have a single uop BEXTR. It might also make sense if we can fold a load or if the and immediate is larger than 32-bits and can't be encoded as a sign extended 32-bit value or if LICM or CSE can hoist the move immediate and share it. But we'd need to look more carefully at that. In the regression I looked at it doesn't look like load folding or large immediates were occurring so the regression isn't caused by the loss of those. So we could try to be smarter here if we find a compelling case. Reviewers: RKSimon, spatel, lebedev.ri, andreadb Reviewed By: RKSimon Subscribers: llvm-commits, andreadb, RKSimon Differential Revision: https://reviews.llvm.org/D52570 llvm-svn: 343399	2018-09-30 03:01:46 +00:00
Simon Pilgrim	a2efe82b81	[X86] SimplifyDemandedVectorEltsForTargetNode - remove identity target shuffles before simplifying inputs By removing demanded target shuffles that simplify to zero/undef/identity before simplifying its inputs we improve chances of further simplification, as only the immediate parent user of the combined is added back to the work list - this still doesn't help us if it's passed through other ops though (bitcasts....). llvm-svn: 343390	2018-09-29 18:15:26 +00:00
Simon Pilgrim	a93407fadf	[X86][SSE] LowerScalarImmediateShift - remove 32-bit vXi64 special case handling. This is all handled generally by getTargetConstantBitsFromNode now llvm-svn: 343387	2018-09-29 17:36:22 +00:00
Simon Pilgrim	b5737007cd	Fix signed/unsigned mismatch warning. NFCI. llvm-svn: 343385	2018-09-29 17:11:19 +00:00
Simon Pilgrim	d633e290c8	[X86] getTargetConstantBitsFromNode - add support for rearranging constant bits via shuffles Exposed an issue that recursive calls to getTargetConstantBitsFromNode don't handle changes to EltSizeInBits yet. llvm-svn: 343384	2018-09-29 17:01:55 +00:00
Simon Pilgrim	ae34ae12ef	[X86][SSE] LowerScalarImmediateShift - use getTargetConstantBitsFromNode to get immediate data Don't just attempt to find a splat build vector. First step towards getting rid of all the 32-bit special case code. llvm-svn: 343383	2018-09-29 16:40:35 +00:00
Simon Pilgrim	a731940c60	[X86] getTargetConstantBitsFromNode - fix self-move assertions from gcc builds due to rL343375 llvm-svn: 343377	2018-09-29 14:51:09 +00:00
Simon Pilgrim	22d51014af	[X86] getTargetConstantBitsFromNode - add support for peeking through ISD::EXTRACT_SUBVECTOR llvm-svn: 343375	2018-09-29 14:17:32 +00:00
Simon Pilgrim	aa77033a6b	[X86][SSE] Fixed issue with v2i64 variable shifts on 32-bit targets The shift amount might have peeked through an extract_subvector, altering the number of vector elements in the 'Amt' variable - so we were incorrectly calculating the ratio when peeking through bitcasts, resulting in incorrectly detecting splats. llvm-svn: 343373	2018-09-29 13:25:22 +00:00
Vitaly Buka	0509070811	[cxx2a] Fix warning triggered by r343285 llvm-svn: 343369	2018-09-29 02:17:12 +00:00
Eli Friedman	5ab09a684f	[ARM] Fix correctness checks in promoteToConstantPool. Correctly check for relocations in the constant to promote. And don't allow promoting a constant multiple times. This partially fixes https://bugs.llvm.org//show_bug.cgi?id=32780 ; it's not a complete fix because we also need to prevent ARMConstantIslands from cloning the constant. (-arm-promote-constant is currently off by default, and it stays off with this patch. I'll look into turning it on again when all the known issues are fixed.) Differential Revision: https://reviews.llvm.org/D51472 llvm-svn: 343361	2018-09-28 20:27:31 +00:00
Eli Friedman	bb993be56b	[ARM] Use preferred alignment for constants in promoteToConstantPool. This mostly affects IR generated by non-clang frontends because clang generally sets the alignment of globals explicitly. Fixes https://bugs.llvm.org//show_bug.cgi?id=32394 . (-arm-promote-constant is currently off by default, and it stays off with this patch. I'll look into turning it on again when all the known issues are fixed.) Differential Revision: https://reviews.llvm.org/D51469 llvm-svn: 343359	2018-09-28 20:21:51 +00:00
Evandro Menezes	fc1852ff1c	[AArch64] Split zero cycle feature more granularly Split the `zcz` feature into specific ones for GP and FP registers, `zcz-gp` and `zcz-fp`, respectively, while retaining the original feature option to mean both. Differential revision: https://reviews.llvm.org/D52621 llvm-svn: 343354	2018-09-28 19:05:09 +00:00
Luke Cheeseman	10981cc884	Revert r343317 - asan buildbots are breaking and I need to investigate the issue llvm-svn: 343341	2018-09-28 17:01:50 +00:00
Simon Pilgrim	428c1196d8	[X86][Btver2] PSUBS/PSUBUS instructions are zero-idioms Noticed during llvm-exegesis tests, the PSUBS/PSUBUS instructions have the same zero-idiom behaviour to PSUB llvm-svn: 343321	2018-09-28 14:20:42 +00:00
Luke Cheeseman	21f2955bb2	Reapply changes reverted by r343235 - Add fix so that all code paths that create DWARFContext with an ObjectFile initialise the target architecture in the context - Add an assert that the Arch is known in the Dwarf CallFrameString method llvm-svn: 343317	2018-09-28 13:37:27 +00:00
Petar Jovanovic	ff1bc621a0	[MIPS GlobalISel] Lower i64 arguments Lower integer arguments larger than 32 bits for MIPS32. setMostSignificantFirst is used in order for G_UNMERGE_VALUES and G_MERGE_VALUES to always hold registers in same order, regardless of endianness. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D52409 llvm-svn: 343315	2018-09-28 13:28:47 +00:00
Simon Pilgrim	66da1ed29d	[X86][Btver2] CVTSS2I/CVTSD2I - add missing JFPU0 pipe We issue JFPU1->JSTC then JFPU0->JFPA then -> JALU0 (integer pipe) Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343314	2018-09-28 13:19:22 +00:00
Simon Pilgrim	17e5981ebf	[X86][Btver2] Fix BSF/BSR schedule Double throughput to account for 2 pipes + fix BSF's latency/uop counts Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343311	2018-09-28 10:26:48 +00:00
David Spickett	ea605913be	[ARM] Allow execute only code on Cortex-m23 The NoMovt feature prevents the use of MOVW/MOVT instructions on Cortex-M23 for performance reasons. These instructions are required for execute only code so NoMovt should be disabled when that option is enabled. Differential Revision: https://reviews.llvm.org/D52551 llvm-svn: 343302	2018-09-28 08:55:19 +00:00
David Spickett	a799fe40dc	Remove extra whitespace. NFC. (test commit) llvm-svn: 343301	2018-09-28 08:45:28 +00:00
Oliver Stannard	5f34e9e265	[ARM][v8.5A] Add speculation barriers SSBB and PSSBB This adds two new barrier instructions which can be used to restrict speculative execution of load instructions. Patch by Pablo Barrio! Differential revision: https://reviews.llvm.org/D52484 llvm-svn: 343300	2018-09-28 08:27:56 +00:00
Simon Pilgrim	280af1c7f0	[X86][BtVer2] Fix PHMINPOS schedule resources typo PHMINPOS can run on either JFPU pipe llvm-svn: 343299	2018-09-28 08:21:39 +00:00
Derek Schuff	70ce1af9fa	WebAssembly: Rename GetSignature to GetLibcallSignature [NFC] llvm-svn: 343275	2018-09-27 22:20:33 +00:00
Konstantin Zhuravlyov	5f1b8181ad	AMDGPU: Split HasExt into HasExtDPP/SDWA/SDWA9 llvm-svn: 343264	2018-09-27 20:49:00 +00:00
Konstantin Zhuravlyov	9da26b20da	AMDGPU: Split VOP2Inst into VOP2Inst_e32/e64/sdwa llvm-svn: 343259	2018-09-27 19:46:41 +00:00
Konstantin Zhuravlyov	7d424aae13	AMDGPU/NFC: Simplify VOP_MAC_F16/F32 llvm-svn: 343254	2018-09-27 19:24:05 +00:00
Stanislav Mekhanoshin	b080adfc0c	[AMDGPU] Fold copy (copy vgpr) This allows reducing the number of used VGPRs in some cases. Differential Revision: https://reviews.llvm.org/D52577 llvm-svn: 343249	2018-09-27 18:55:20 +00:00
Simon Pilgrim	2a64d393ea	[X86] Remove BT/BTC/BTR/BTS rr/ri overrides llvm-svn: 343241	2018-09-27 17:29:13 +00:00
Simon Pilgrim	86c7b07ecd	[X86][Btver2] (V)MPSADBW instructions take 3uops not 1 llvm-svn: 343238	2018-09-27 17:13:57 +00:00
Luke Cheeseman	8e5676b1aa	Revert r343192 as an ubsan build is currently failing llvm-svn: 343235	2018-09-27 16:47:30 +00:00
Simon Pilgrim	dd744f158a	[X86][Btver2] BTC/BTR/BTS instructions take 2uops not 1 llvm-svn: 343234	2018-09-27 16:39:52 +00:00
Simon Pilgrim	29cf499bca	[X86] Split BT and BTC/BTR/BTS scheduler classes llvm-svn: 343233	2018-09-27 16:24:42 +00:00
Simon Pilgrim	06ccc9d998	[Sparc] EXPENSIVE_CHECKS now passes all machine verifier errors (PR27461) Now that D51487 has landed, the last machine verifier tests that failed EXPENSIVE_CHECKS builds have now been fixed/removed, so we can remove @MatzeB 's isMachineVerifierClean() hack for sparc targets. Differential Revision: https://reviews.llvm.org/D52612 llvm-svn: 343232	2018-09-27 16:21:35 +00:00
Oliver Stannard	2721e6f0ed	[AArch64] Refactor immediate details out of add/sub tblgen class (NFCI) Bits [23-22] are used in Add and Sub to specify the shift. The value of the shift field must be 0x; values of 1x are unallocated. MTE adds some instructions that use such encodings, and this patch refactors the Add/Sub class so that another class could derive from this one to implement other encodings and other formats of bitfields. Patch by Pablo Barrio! Differential revision: https://reviews.llvm.org/D52489 llvm-svn: 343231	2018-09-27 16:19:04 +00:00
Oliver Stannard	a4f68bf4ad	[AArch64][v8.5A] Add speculation barriers SSBB and PSSBB This adds two new barrier instructions which can be used to restrict speculative execution of load instructions. Patch by Pablo Barrio! Differential revision: https://reviews.llvm.org/D52483 llvm-svn: 343229	2018-09-27 16:09:05 +00:00
Simon Pilgrim	c2a88ea64e	[X86][Btver2] BLSI/BLSMSK/BLSR instructions take 2uops not 1 (same as TZCNT) llvm-svn: 343227	2018-09-27 14:57:57 +00:00
Oliver Stannard	a9a5eee169	[AArch64][v8.5A] Add Branch Target Identification instructions This adds new instructions used by the Branch Target Identification feature. When this is enabled, these are the only instructions which can be targeted by indirect branch instructions. Patch by Pablo Barrio! Differential revision: https://reviews.llvm.org/D52485 llvm-svn: 343225	2018-09-27 14:54:33 +00:00
Oliver Stannard	8459d34e82	[AArch64][v8.5A] Add speculation restriction system registers This adds some new system registers which can be used to restrict certain types of speculative execution. Patch by Pablo Barrio and David Spickett! Differential revision: https://reviews.llvm.org/D52482 llvm-svn: 343218	2018-09-27 14:05:46 +00:00
Oliver Stannard	dc837e3f1f	[AArch64][v8.5A] Add Armv8.5-A random number instructions This adds two new system registers, used to generate random numbers. This is an optional extension to v8.5-A, and will be controlled by the "+rng" modifier of the -march= and -mcpu= options. Patch by Pablo Barrio! Differential revision: https://reviews.llvm.org/D52481 llvm-svn: 343217	2018-09-27 14:01:40 +00:00
Oliver Stannard	6930b12d53	[AArch64][v8.5A] Add Armv8.5-A "DC CVADP" instruction This adds a new variant of the DC system instruction for persistent memory. Patch by Pablo Barrio! Differential revision: https://reviews.llvm.org/D52480 llvm-svn: 343216	2018-09-27 13:53:35 +00:00
Oliver Stannard	224428c06a	[AArch64][v8.5A] Add prediction invalidation instructions to AArch64 This adds new system instructions which act as barriers to speculative execution based on earlier execution within a particular execution context. Patch by Pablo Barrio! Differential revision: https://reviews.llvm.org/D52479 llvm-svn: 343214	2018-09-27 13:47:40 +00:00
Oliver Stannard	382c935c42	[ARM][v8.5A] Add speculation barrier to ARM & Thumb instruction sets This is a new barrier which limits speculative execution of the instructions following it. Patch by Pablo Barrio! Differential revision: https://reviews.llvm.org/D52477 llvm-svn: 343213	2018-09-27 13:41:14 +00:00
Oliver Stannard	e481f1d95a	[AArch64][v8.5A] Add speculation barrier to AArch64 instruction set This is a new barrier which limits speculative execution of the instructions following it. Patch by Pablo Barrio! Differential revision: https://reviews.llvm.org/D52476 llvm-svn: 343211	2018-09-27 13:39:06 +00:00
Daniel Cederman	0c05bdea2b	[Sparc] Remove the support for builtin setjmp/longjmp Summary: It is currently broken and for Sparc there is not much benefit in using a builtin version compared to a library version. Both versions need to store the same four values in setjmp and flush the register windows in longjmp. If the need for a builtin setjmp/longjmp arises there is an improved implementation available at https://reviews.llvm.org/D50969. Reviewers: jyknight, joerg, venkatra Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D51487 llvm-svn: 343210	2018-09-27 13:32:54 +00:00
Oliver Stannard	ddb7d46aa5	[AArch64][v8.5A] Add FRINT[32,64][Z,X] instructions These are some new variants of the "Floating-point Round to Integral" family of instructions, which round to the nearest floating-point value which fits in a 32- or 64-bit integer. Patch by Pablo Barrio! Differential revision: https://reviews.llvm.org/D52475 llvm-svn: 343209	2018-09-27 13:32:06 +00:00
Daniel Cederman	b35d3a2733	[Sparc] Add unimp alias Summary: Use 0 as the default immediate for the UNIMP instruction. This matches the behavior in gas. Reviewers: jyknight, venkatra Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D51526 llvm-svn: 343203	2018-09-27 12:34:53 +00:00
Daniel Cederman	c1968ba5d3	[Sparc] Add support for the partial write PSR instruction Summary: Partial write %PSR (WRPSR) is a SPARC V8e option that allows WRPSR instructions to only affect the %PSR.ET field. It is supported by the GR740 and GR716. Reviewers: jyknight, venkatra Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48644 llvm-svn: 343202	2018-09-27 12:34:48 +00:00
Simon Pilgrim	98f503a326	[X86][Btver2] TZCNT instructions take 2uops not 1 llvm-svn: 343200	2018-09-27 12:28:47 +00:00
Nemanja Ivanovic	a59096759d	[PowerPC] [NFC] Refactor code for printing register operands We have an unfortunate situation in our back end where we have to keep pairs of functions synchronized. Needless to say that this is not an ideal situation as it is very difficult to enforce. Even without bugs, it's annoying to have to do the same thing in two places. This patch just refactors the code so that the two pairs of those functions that pertain to printing register operands are unified: - stripRegisterPrefix() - this just removes the letter prefixes from registers for the InstrPrinter and AsmPrinter. This patch provides this as a static member of PPCRegisterInfo - Handling of PPCII::UseVSXReg - there are 3 places where we do something special for instructions with that flag set. Each of those places does its own checking of this flag and implements code customization. Any changes to how we print/encode VSX/VMX registers require modifying all 3 places. This patch unifies this into a static function in PPCInstrInfo that returns the register number adjusted as needed. Differential revision: https://reviews.llvm.org/D52467 llvm-svn: 343195	2018-09-27 11:49:47 +00:00
Simon Pilgrim	7e4f154e79	[X86][Btver2] Add uops counter for exegesis reports llvm-svn: 343194	2018-09-27 11:40:26 +00:00
Luke Cheeseman	f6844b307a	Reapply changes reverted in r343114, lldb patch to follow shortly llvm-svn: 343192	2018-09-27 10:39:20 +00:00
Oliver Stannard	31af178f4a	[AArch64][v8.5A] Add PSTATE manipulation instructions XAFlag and AXFlag These new instructions manipulate the NZCV bits, to convert between the regular Arm floating-point compare format and an alternative format. Patch by Pablo Barrio! Differential revision: https://reviews.llvm.org/D52473 llvm-svn: 343187	2018-09-27 09:11:27 +00:00
Simon Atanasyan	e58c45a695	[mips] Add support for MIPS r6 Debian triples Debian uses different triples for MIPS r6 and paths. Here we use SubArch to determine whether it is r6, if we found `r6' in CPU section of triple. These new triples include: mipsisa32r6-linux-gnu mipsisa32r6el-linux-gnu mipsisa64r6-linux-gnuabi64 mipsisa64r6el-linux-gnuabi64 mipsisa64r6-linux-gnuabin32 mipsisa64r6el-linux-gnuabin32 Patch by YunQiang Su. Differential revision: https://reviews.llvm.org/D50857 llvm-svn: 343185	2018-09-27 08:51:18 +00:00
Fangrui Song	0cac726a00	llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...) Summary: The convenience wrapper in STLExtras is available since rL342102. Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D52573 llvm-svn: 343163	2018-09-27 02:13:45 +00:00
Yury Delendik	b3857e4d35	[WebAssembly] Fix MRI.hasOneNonDBGUse assert in WebAssemblyRegStackify pass Summary: The OneUseDominatesOtherUses in the WebAssemblyRegStackify does not properly validate register use using hasOneUse. Since we added/modified DBG_VALUE the assert started catching valid cases. See also https://reviews.llvm.org/D49034#1247200 Fix verified by running the wasm waterfall. Reviewed By: dschuff Tags: #debug-info Differential Revision: https://reviews.llvm.org/D49034 llvm-svn: 343154	2018-09-26 23:49:21 +00:00
Tom Stellard	344475fce5	AMDGPU/SI: Change predicate to isCIOnly for 32-bit imm s_buffer_load* patterns Summary: This is essentially NFC, because the complex pattern used for these patterns will fail on non-CI, but this makes the pattern consistent with other CI smrd patterns. It is also a performance improvement, because the pattern will now fail earlier on non-CI. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52469 llvm-svn: 343125	2018-09-26 16:53:36 +00:00
Oliver Stannard	2905937435	[AArch64] Extend single-operand FP insns to match Arm ARM (NFCI) The Armv8.3-A reference manual defines floating-point data-processing instructions with one source operand to have an opcode of 6 bits [20:15]. The current class in tablegen, BaseSingleOperandFPData, only allows [18:15]. This was ok because [20:19] could only be '00', with other encodings unallocated. Armv8.5-A brings in the FRINT group of instructions which use other values for these bits. This patch refactors the existing class a bit to allow using the full 6 bits of the opcode, as defined in the Arm ARM. Patch by Pablo Barrio! Differential revision: https://reviews.llvm.org/D52474 llvm-svn: 343120	2018-09-26 15:42:47 +00:00
Luke Cheeseman	77aaa22081	Revert r343112 as CallFrameString API change has broken lldb builds llvm-svn: 343114	2018-09-26 14:48:03 +00:00
Oliver Stannard	c5d192b611	[AArch64] Refactor instructions that write PSTATE (NFCI) Reuse some code in preparation for the v8.5A XAFlag/AXFlag instructions, which shares part of the encoding of the MSR-immediate. Patch by Pablo Barrio! Differential revision: https://reviews.llvm.org/D52472 llvm-svn: 343113	2018-09-26 14:42:59 +00:00
Luke Cheeseman	03ad8812f5	[AArch64] - Return address signing dwarf support - Reapply r343089 with a fix for DebugInfo/Sparc/gnu-window-save.ll llvm-svn: 343112	2018-09-26 14:30:29 +00:00
Oliver Stannard	89b1604935	[AArch64][AsmParser] Show name of missing feature for system instructions Parsing of the system instructions (IC, DC, AT and TLBI) uses this function to show the required architecture when the operand is valid, but the architecture is not enabled. Armv8.5A adds a few different system instructions as part of optional features, so we need to extend it to show individual features, not just base architectures. This is NFC for now, but will be used by three different features added in v8.5A, and will be tested by them. Patch by David Spickett! Differential revision: https://reviews.llvm.org/D52478 llvm-svn: 343109	2018-09-26 13:52:27 +00:00
Hans Wennborg	00b88bbcaf	Revert r343089 "[AArch64] - Return address signing dwarf support" This caused the DebugInfo/Sparc/gnu-window-save.ll test to fail. > Functions that have signed return addresses need additional dwarf support: > - After signing the LR, and before authenticating it, the LR register is in a > state that is unusable by a debugger or unwinder > - To account for this a new directive, .cfi_negate_ra_state, is added > - This directive says the signed state of the LR register has now changed, > i.e. unsigned -> signed or signed -> unsigned > - This directive has the same CFA code as the SPARC directive GNU_window_save > (0x2d), adding a macro to account for multiply defined codes > - This patch matches the gcc implementation of this support: > https://patchwork.ozlabs.org/patch/800271/ > > Differential Revision: https://reviews.llvm.org/D50136 llvm-svn: 343103	2018-09-26 12:57:45 +00:00
Oliver Stannard	7c3c4baa3f	[ARM/AArch64][v8.5A] Add Armv8.5-A target This patch allows targeting Armv8.5-A, adding the architecture to tablegen and setting the options to be identical to Armv8.4-A for the time being. Subsequent patches will add support for the different features included in the Armv8.5-A Reference Manual. Patch by Pablo Barrio! Differential revision: https://reviews.llvm.org/D52470 llvm-svn: 343102	2018-09-26 12:48:21 +00:00
Hiroshi Inoue	20982f0995	[PowerPC] optimize conditional branch on CRSET/CRUNSET This patch adds a check to optimize conditional branch (BC and BCn) based on a constant set by CRSET or CRUNSET. Other optimizers, such as block placement, may generate such code and hence I do this at the very end of the optimization in pre-emit peephole pass. A conditional branch based on a constant is eliminated or converted into unconditional branch. Also CRSET/CRUNSET is eliminated if the condition code register is not used by instruction other than the branch to be optimized. Differential Revision: https://reviews.llvm.org/D52345 llvm-svn: 343100	2018-09-26 12:32:45 +00:00
Simon Pilgrim	ebabd79f43	[X86][SSE] canReduceVMulWidth - use ComputeNumSignBits/SignBitIsZero directly Don't reinvent the wheel for BUILD_VECTOR/ZERO_EXTEND - it's only the ANY_EXTEND special case that needs handling. llvm-svn: 343096	2018-09-26 11:48:52 +00:00
Clement Courbet	596c56ff9c	[llvm-exegesis] Add support for measuring NumMicroOps. Summary: Example output for vzeroall: --- mode: uops key: instructions: - 'VZEROALL' config: '' register_initial_values: cpu_name: haswell llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { debug_string: HWPort0, value: 0.0006, per_snippet_value: 0.0006, key: '3' } - { debug_string: HWPort1, value: 0.0011, per_snippet_value: 0.0011, key: '4' } - { debug_string: HWPort2, value: 0.0004, per_snippet_value: 0.0004, key: '5' } - { debug_string: HWPort3, value: 0.0018, per_snippet_value: 0.0018, key: '6' } - { debug_string: HWPort4, value: 0.0002, per_snippet_value: 0.0002, key: '7' } - { debug_string: HWPort5, value: 1.0019, per_snippet_value: 1.0019, key: '8' } - { debug_string: HWPort6, value: 1.0033, per_snippet_value: 1.0033, key: '9' } - { debug_string: HWPort7, value: 0.0001, per_snippet_value: 0.0001, key: '10' } - { debug_string: NumMicroOps, value: 20.0069, per_snippet_value: 20.0069, key: NumMicroOps } error: '' info: '' assembled_snippet: C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C3 ... Reviewers: gchatelet Subscribers: tschuett, RKSimon, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D52539 llvm-svn: 343094	2018-09-26 11:22:56 +00:00
Simon Pilgrim	5beaac433d	[X86][SSE] Use ISD::MULHS for constant vXi16 ISD::SRA lowering (PR38151) Similar to the existing ISD::SRL constant vector shifts from D49562, this patch adds ISD::SRA support with ISD::MULHS. As we're dealing with signed values, we have to handle shift by zero and shift by one special cases, so XOP+AVX2/AVX512 splitting/extension is still a better solution - really we should still use ISD::MULHS if one of the special cases are used but for now I've just left a TODO and filtered by isKnownNeverZero. Differential Revision: https://reviews.llvm.org/D52171 llvm-svn: 343093	2018-09-26 10:57:05 +00:00
Sam Parker	75aca94093	[ARM] Fix for PR39060 When calculating whether a value can safely overflow for use by an icmp, we weren't checking that the value couldn't wrap around. To do this we need the icmp to be using a constant, as well as the incoming add or sub. bugzilla report: https://bugs.llvm.org/show_bug.cgi?id=39060 Differential Revision: https://reviews.llvm.org/D52463 llvm-svn: 343092	2018-09-26 10:56:00 +00:00
Luke Cheeseman	f755e687fc	[AArch64] - Return address signing dwarf support Functions that have signed return addresses need additional dwarf support: - After signing the LR, and before authenticating it, the LR register is in a state that is unusable by a debugger or unwinder - To account for this a new directive, .cfi_negate_ra_state, is added - This directive says the signed state of the LR register has now changed, i.e. unsigned -> signed or signed -> unsigned - This directive has the same CFA code as the SPARC directive GNU_window_save (0x2d), adding a macro to account for multiply defined codes - This patch matches the gcc implementation of this support: https://patchwork.ozlabs.org/patch/800271/ Differential Revision: https://reviews.llvm.org/D50136 llvm-svn: 343089	2018-09-26 10:14:15 +00:00
Hans Wennborg	4b2e7daa7e	Revert r342870 "[ARM] bottom-top mul support ARMParallelDSP" This broke Chromium's Android build (https://crbug.com/889390) and the polly-aosp buildbot (http://lab.llvm.org:8011/builders/aosp-O3-polly-before-vectorizer-unprofitable). > Originally committed in rL342210 but was reverted in rL342260 because > it was causing issues in vectorized code, because I had forgotten to > ensure that we're operating on scalar values. > > Original commit message: > > On failing to find sequences that can be converted into dual macs, > try to find sequential 16-bit loads that are used by muls which we > can then use smultb, smulbt, smultt with a wide load. > > Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 343082	2018-09-26 08:41:50 +00:00
Thomas Lively	c949857a7f	[WebAssembly] SIMD conversions Summary: Lowers (s|u)itofp and fpto(s|u)i instructions for vectors. The fp to int conversions produce poison values if their arguments are out of the convertible range, so a future CL will have to add an LLVM intrinsic to make the saturating behavior of this conversion usable. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52372 llvm-svn: 343052	2018-09-26 00:34:36 +00:00
Stanislav Mekhanoshin	8dfcd83371	[AMDGPU] Fix ds combine with subregs Differential Revision: https://reviews.llvm.org/D52522 llvm-svn: 343047	2018-09-25 23:33:18 +00:00
Craig Topper	12c18840fa	[X86] Allow movmskpd/ps ISD nodes to be created and selected with integer input types. This removes an int->fp bitcast between the surrounding code and the movmsk. I had already added a hack to combineMOVMSK to try to look through this bitcast to improve the SimplifyDemandedBits there. But I found an additional issue where the bitcast was preventing combineMOVMSK from being called again after earlier nodes in the DAG are optimized. The bitcast gets revisited, but not the user of the bitcast. By using integer types throughout, the bitcast doesn't get in the way. llvm-svn: 343046	2018-09-25 23:28:27 +00:00
Changpeng Fang	6f4922ccc9	AMDGPU: Add Selection patterns to support add of one bit. Summary: We generate s_xor to lower add of i1s in general cases, and s_not to lower add with a one-bit imm of -1 (true). Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D52518 llvm-svn: 343030	2018-09-25 21:21:18 +00:00
Simon Pilgrim	96335dd1ec	[X86] combineUIntToFP - Fix UINT_TO_FP(vXi1) comment (PR39078) llvm-svn: 343026	2018-09-25 20:52:08 +00:00
Sanjay Patel	10c11b867a	[x86] avoid 256-bit andnp that requires insert/extract with AVX1 (PR37749) This is the final (I hope!) problem pattern mentioned in PR37749: https://bugs.llvm.org/show_bug.cgi?id=37749 We are trying to avoid an AVX1 sinkhole caused by having 256-bit bitwise logic ops but no other 256-bit integer ops. We've already solved the simple logic ops, but 'andn' is an x86 special. I looked at alternative solutions like extending the generic DAG combine or trying to wait until the ANDNP node is created, but those are bigger patches that can over-reach. Ie, splitting to 128-bit does not look like a win in most cases with >1 256-bit op. The pattern matching is cluttered with bitcasts because of our i64 element canonicalization. For the affected test, we have this vector-type-legalized sequence: t29: v8i32 = concat_vectors t27, t28 t30: v4i64 = bitcast t29 t18: v8i32 = BUILD_VECTOR Constant:i32<-1>, Constant:i32<-1>, ... t31: v4i64 = bitcast t18 t32: v4i64 = xor t30, t31 t9: v8i32 = BUILD_VECTOR Constant:i32<255>, Constant:i32<255>, ... t34: v4i64 = bitcast t9 t35: v4i64 = and t32, t34 t36: v8i32 = bitcast t35 t37: v4i32 = extract_subvector t36, Constant:i64<0> t38: v4i32 = extract_subvector t36, Constant:i64<4> Differential Revision: https://reviews.llvm.org/D52318 llvm-svn: 343008	2018-09-25 19:09:34 +00:00
Yury Delendik	7c18d6083a	[WebAssembly] Move/clone DBG_VALUE during WebAssemblyRegStackify pass Summary: The MoveForSingleUse or MoveAndTeeForMultiUse functions move wasm instructions, however DBG_VALUE stay unchanged -- moving or cloning these. Reviewers: dschuff Reviewed By: dschuff Subscribers: mattd, MatzeB, dschuff, sbc100, jgravelle-google, aheejin, sunfish, llvm-commits, aardappel Tags: #debug-info Differential Revision: https://reviews.llvm.org/D49034 llvm-svn: 343007	2018-09-25 18:59:34 +00:00
Craig Topper	6fb1358a98	[X86] Add AVX512 support to combineVectorSizedSetCCEquality. Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52424 llvm-svn: 342989	2018-09-25 16:27:12 +00:00
Nirav Dave	0a0c2e6dd9	[ARM] Share predecessor bookkeeping in CombineBaseUpdate. NFCI. llvm-svn: 342987	2018-09-25 15:30:47 +00:00
Nirav Dave	e40e2bbd37	[AArch64] Share search bookkeeping in combines. NFCI. Share predecessor search bookkeeping in both performPostLD1Combine and performNEONPostLDSTCombine. This should be approximately a 4x and 2x performance improvement. llvm-svn: 342986	2018-09-25 15:30:22 +00:00
Simon Pilgrim	b56be79e0c	Revert rL342916: [X86] Remove shift/rotate by CL memory (RMW) overrides As suggested by Craig Topper - I'm going to look at cleaning up the RMW sequences instead. The uops are slightly different to the register variant, so requires a +1uop tweak llvm-svn: 342969	2018-09-25 13:01:26 +00:00
Sameer Sahasrabuddhe	b4f2d1cb68	[AMDGPU] restore r342722 which was reverted with r342743 [AMDGPU] lower-switch in preISel as a workaround for legacy DA Summary: The default target of the switch instruction may sometimes be an "unreachable" block, when it is guaranteed that one of the cases is always taken. The dominator tree concludes that such a switch instruction does not have an immediate post dominator. This confuses divergence analysis, which is unable to propagate sync dependence to the targets of the switch instruction. As a workaround, the AMDGPU target now invokes lower-switch as a preISel pass. LowerSwitch is designed to handle the unreachable default target correctly, allowing the divergence analysis to locate the correct immediate dominator of the now-lowered switch. llvm-svn: 342956	2018-09-25 09:39:21 +00:00
Stefan Maksimovic	90e7ff8045	[mips] Correct MUL pattern for mips64 Guard existing pattern with a predicate, introduce a new one for revision 6. Differential Revision: https://reviews.llvm.org/D51684 llvm-svn: 342946	2018-09-25 06:27:49 +00:00
Fangrui Song	10a2162588	Use unique_ptr to hold AsmInfo,MRI,MII,STI Reviewers: pcc, dblaikie Reviewed By: dblaikie Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52389 llvm-svn: 342945	2018-09-25 06:19:31 +00:00
Thomas Lively	12da0f9c3d	[WebAssembly] SIMD sqrt Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52387 llvm-svn: 342937	2018-09-25 03:39:28 +00:00
Craig Topper	9ce5da7b62	[X86] Don't create FILD ISD nodes when X87 is disabled. The included test case previously asserted because the type legalizer tried to soften the FILD ISD node. Fixes PR38819. llvm-svn: 342934	2018-09-25 00:16:57 +00:00
Craig Topper	aeb4930b47	[X86] Remove superfluous curly braces. NFC llvm-svn: 342933	2018-09-25 00:16:54 +00:00
Craig Topper	b7e2499e80	[X86] Update comment. Use 'glued' instead of 'flagged' NFC llvm-svn: 342932	2018-09-25 00:16:52 +00:00
Artem Belevich	44ecb0e3c2	[CUDA] Added basic support for compiling with CUDA-10.0 llvm-svn: 342924	2018-09-24 23:10:44 +00:00
Simon Pilgrim	0b4ad7596f	[X86] Remove shift/rotate by CL memory (RMW) overrides The uops are slightly different to the register variant, so requires a +1uop tweak llvm-svn: 342916	2018-09-24 20:11:50 +00:00
Stefan Pintilie	b5305771fb	[Power9] [LLVM] Add __float128 exponent GET and SET builtins Added __builtin_vsx_scalar_extract_expq __builtin_vsx_scalar_insert_exp_qp Builtins should behave the same way as in GCC. Differential Revision: https://reviews.llvm.org/D48185 llvm-svn: 342910	2018-09-24 18:14:13 +00:00
Simon Pilgrim	a8b4e27760	[X86] Remove WriteDiv/WriteIDiv schedule overrides - use classes directly. NFCI. We're missing quite a bit of data for these instructions, removing the overrides makes this obvious - inconsistent reg/mem variants is a concern as well. Also, we have Divider resources (HWDivider etc.) but they aren't actually used consistently. llvm-svn: 342904	2018-09-24 16:58:26 +00:00
Evandro Menezes	0600c365a8	[ARM] Adjust the cost model for Exynos Tune `MaxInterleaveFactor` and `LdStMultipleTiming` and remove `PartialUpdateClearance` for the Exynos processors. llvm-svn: 342900	2018-09-24 16:35:14 +00:00
Evandro Menezes	814c68729d	[ARM] Adjust the feature set for Exynos Enable crypto and literals fusion for the Exynos processors. llvm-svn: 342899	2018-09-24 16:35:09 +00:00
Zhaoshi Zheng	05b46dc300	[Thumb1] Any imm8 should have cost of 1 A simple MOVS rd, imm8 can materialize [-128, 127] in signed i8 type or [0, 255] in unsigned i8 type on Thumb1. Differential Revision: https://reviews.llvm.org/D52257 llvm-svn: 342898	2018-09-24 16:15:23 +00:00
Simon Pilgrim	00865a48d1	[X86] Split WriteIMul into 8/16/32/64 implementations (PR36931) Split WriteIMul by size and also by IMUL multiply-by-imm and multiply-by-reg cases. This removes all the scheduler overrides for gpr multiplies and stops WriteMULH being ignored for BMI2 MULX instructions. llvm-svn: 342892	2018-09-24 15:21:57 +00:00
Luke Cheeseman	ab7f9b170d	[Arm][AsmParser] Restrict register list size for VSTM/VLDM - The assembler accepts VSTM/VLDM with register lists (specifically double registers lists) with more than 16 registers specified - The Arm architecture reference manual says this instruction must not contain more than 16 registers when the registers are doubleword registers - This addresses one of the concerns in https://bugs.llvm.org/show_bug.cgi?id=38389 Differential Revision: https://reviews.llvm.org/D52082 llvm-svn: 342891	2018-09-24 15:13:48 +00:00
Petar Jovanovic	f9808c5f09	[Mips][FastISel] Fix selectBranch on icmp i1 The r337288 tried to fix result of icmp i1 when its input is not sanitized by falling back to DagISel. While it now produces the correct result for bit 0, the other bits can still hold arbitrary value which is not supported by MipsFastISel branch lowering. This patch fixes the issue by falling back to DagISel in this case. Patch by Dragan Mladjenovic. Differential Revision: https://reviews.llvm.org/D52045 llvm-svn: 342884	2018-09-24 14:14:19 +00:00
Zaara Syeda	edefda48d2	[PowerPC] Support operand modifier 'x' in inline asm gcc uses operand modifier 'x' in inline asm for VSX registers. Without this modifier, instructions which use VSX numbering for their operands are printed as VMX registers. This patch adds support for the operand modifier 'x'. Differential Revision: https://reviews.llvm.org/D52244 llvm-svn: 342882	2018-09-24 14:01:16 +00:00
Matt Arsenault	f432011d33	AMDGPU: Fix private handling for allowsMisalignedMemoryAccesses If the alignment is at least 4, this should report true. Something still seems off with how < 4-byte types are handled here though. Fixing this seems to change how some combines get to where they get, but somehow isn't changing the net result. llvm-svn: 342879	2018-09-24 13:18:15 +00:00
Sjoerd Meijer	d986ede313	[ARM] Do not fuse VADD and VMUL on the Cortex-M4 and Cortex-M33 A sequence of VMUL and VADD instructions always gives the same or better performance than a fused VMLA instruction on the Cortex-M4 and Cortex-M33. Executing the VMUL and VADD back-to-back requires the same cycles, but having separate instructions allows scheduling to avoid the hazard between these 2 instructions. Differential Revision: https://reviews.llvm.org/D52289 llvm-svn: 342874	2018-09-24 12:02:50 +00:00
Hans Wennborg	5555c00902	Revert r341932 "[ARM] Enable ARMCodeGenPrepare by default" This caused miscompilation of WebRTC for Android: PR39060. > We've had the pass enabled downstream for a couple of weeks and it > seems to be okay, so enable it by default. > > Differential Revision: https://reviews.llvm.org/D51920 llvm-svn: 342873	2018-09-24 11:40:07 +00:00
Luke Cheeseman	bda54bca39	[ARM][ARMLoadStoreOptimizer] - The load store optimizer is currently merging multiple loads/stores into VLDM/VSTM with more than 16 doubleword registers - This is an UNPREDICTABLE instruction and shouldn't be done - It looks like the Limit for how many registers included in a merge got dropped at some point so I am reintroducing it in this patch - This fixes https://bugs.llvm.org/show_bug.cgi?id=38389 Differential Revision: https://reviews.llvm.org/D52085 llvm-svn: 342872	2018-09-24 10:42:22 +00:00
Sam Parker	a7b2405b06	[ARM] bottom-top mul support ARMParallelDSP Originally committed in rL342210 but was reverted in rL342260 because it was causing issues in vectorized code, because I had forgotten to ensure that we're operating on scalar values. Original commit message: On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load. Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 342870	2018-09-24 09:34:06 +00:00
Simon Pilgrim	f3f3dd584a	[X86] Split WriteShift/WriteRotate schedule classes by CL usage. Variable Shifts/Rotates using the CL register have different behaviours to the immediate instructions - split accordingly to help remove yet more repeated overrides from the schedule models. llvm-svn: 342852	2018-09-23 21:19:15 +00:00
Simon Pilgrim	6d95a8521f	[X86] Remove unnecessary WriteRotate override. NFCI. SNB was the last override for ROT(L|R)r(1|i) - they now all use WriteRotate correctly. llvm-svn: 342848	2018-09-23 19:33:58 +00:00
Simon Pilgrim	e7938423b2	Fix line ending mismatches. NFCI. llvm-svn: 342847	2018-09-23 19:16:32 +00:00
Simon Pilgrim	9202c9fb47	[X86] RORmCL instruction models should match ROLmCL etc. Confirmed with Craig Topper - fix a typo that was missing a Port4 uop for ROR*mCL instructions on some Intel models. Yet another step on the scheduler model cleanup marathon...... llvm-svn: 342846	2018-09-23 19:16:01 +00:00
Benjamin Kramer	b3478fcf0e	[Aarch64] Fix memcpy that was copying 4x too many bytes Found by asan. llvm-svn: 342845	2018-09-23 18:43:28 +00:00
Sanjay Patel	0027946915	[DAGCombiner][x86] extend decompose of integer multiply into shift/add with negation This is an alternative to https://reviews.llvm.org/D37896. We can't decompose multiplies generically without a target hook to tell us when it's profitable. ARM and AArch64 may be able to remove some existing code that overlaps with this transform. This extends D52195 and may resolve PR34474: https://bugs.llvm.org/show_bug.cgi?id=34474 (still an open question about transforming legal vector multiplies, but we could open another bug report for those) llvm-svn: 342844	2018-09-23 18:41:38 +00:00
Simon Pilgrim	19952add7c	[X86] Added missing RCL/RCR schedule overrides to the generic SNB model The SandyBridge model was missing schedule values for the RCL/RCR instructions - instead using the (incredibly optimistic) WriteShift (now WriteRotate) defaults. I've added overrides with more realistic (slow) values, based on a mixture of Agner/instlatx64 numbers and what later Intel models do as well. This is necessary to allow WriteRotate to be updated to remove other rotate overrides. It'd probably be a good idea to investigate a WriteRotateCarry class at some point but it's not a high priority given the unusualness of these instructions. llvm-svn: 342842	2018-09-23 17:40:24 +00:00
Simon Pilgrim	22d31c5e0f	[X86] Remove unnecessary WriteRotate overrides. NFCI. llvm-svn: 342841	2018-09-23 16:53:02 +00:00
Simon Pilgrim	4b50086013	[X86] Move RORX instructions back to WriteShift schedule class Despite being rotates, these more modern instructions avoid many of the quirks of the regular x86 rotate instructions and consistently have a schedule closer to shifts. llvm-svn: 342839	2018-09-23 16:17:13 +00:00
Simon Pilgrim	5f9d912095	[X86] Add WriteRotate schedule class, splitting off from WriteShift. NFCI for now, but it should make it easier to remove a lot of unnecessary overrides in a future commit. Now that funnel shift intrinsics are coming online we need to get this cleaned up to make vectorization costs from scalar rotate patterns more straightforward. llvm-svn: 342837	2018-09-23 15:12:10 +00:00
Craig Topper	c296436a30	[X86] Add isel pattern for (v8i16 (sext (v8i1))) with DQI and no BWI. Our lowering that tries to avoid this sign extend can be defeated by the DAG combine folding it with a truncate. The pattern needs to extend to a v8i32 then truncate back down to v8i16. llvm-svn: 342830	2018-09-23 06:49:48 +00:00
Craig Topper	3e0b4b0eb7	[X86] Fix a few typos in comments. llvm-svn: 342829	2018-09-23 06:49:47 +00:00
Tri Vo	6c47c62588	[AArch64] Support adding X[8-15,18] registers as CSRs. Summary: Specifying X[8-15,18] registers as callee-saved is used to support CONFIG_ARM64_LSE_ATOMICS in Linux kernel. As part of this patch we: - use custom CSR list/mask when user specifies custom CSRs - update Machine Register Info's list of CSRs with additional custom CSRs in LowerCall and LowerFormalArguments. Reviewers: srhines, nickdesaulniers, efriedma, javed.absar Reviewed By: nickdesaulniers Subscribers: kristof.beyls, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D52216 llvm-svn: 342824	2018-09-22 22:17:50 +00:00
Simon Atanasyan	7c9648ff89	[mips] Provide more detailed description for MIPS targets. NFC llvm-svn: 342799	2018-09-22 06:04:32 +00:00
Simon Atanasyan	1ba42ab73a	[mips] Remove obsoleted "experimental" tag from MIPS 64-bit targets. NFC llvm-svn: 342798	2018-09-22 06:04:26 +00:00
Craig Topper	082e04c61d	[X86] Fix inline expansion for memset in x32 Summary: Similar to D51893 which was for memcpy Reviewers: efriedma Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52063 llvm-svn: 342796	2018-09-22 05:16:35 +00:00
Craig Topper	9995760df4	[X86] Fold (movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C)) for vXi8 vectors. We don't have a vXi8 shift left so we need to bitcast to a vXi16 vector to perform the shift. If we let lowering legalize the vXi8 shift we get an extra and that we don't need and fail to remove. llvm-svn: 342795	2018-09-22 05:08:38 +00:00
Craig Topper	ecdab03d10	[X86] Teach fast isel to use MOV32ri64 for loading an unsigned 32-bit immediate into a 64-bit register. Previously we used SUBREG_TO_REG+MOV32ri. But regular isel was changed recently to use the MOV32ri64 pseudo. Fast isel now does the same. llvm-svn: 342788	2018-09-21 23:14:05 +00:00
Wouter van Oortmerssen	e0403f13c4	[WebAssembly] Simplified selecting asmmatcher stack instructions. Summary: By using the existing isCodeGenOnly bit in the tablegen defs, as suggested by tlively in https://reviews.llvm.org/D51662 Tested: llvm-lit -v `find test -name WebAssembly` Reviewers: tlively Subscribers: dschuff, sbc100, jgravelle-google, aheejin, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52373 llvm-svn: 342772	2018-09-21 20:53:55 +00:00
Krzysztof Parzyszek	5805def9c8	[Hexagon] Avoid functions with exception handling in HexagonConstExtenders The constant-extender optimization does a form of code motion, which is complicated in the presence of exception handling. llvm-svn: 342751	2018-09-21 17:40:35 +00:00
Sameer Sahasrabuddhe	0807e94951	revert changes from r342722 "[AMDGPU] lower-switch in preISel as a workaround for legacy DA" This broke regression tests. The first breakage was noticed here: http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/23549 llvm-svn: 342743	2018-09-21 16:31:51 +00:00
Matthias Braun	c0ef786004	AArch64FastISel: Abort if we failed to select operand of intrinsic rdar://44642447 Differential Revision: https://reviews.llvm.org/D52335 llvm-svn: 342742	2018-09-21 15:47:41 +00:00
Clement Courbet	8171bd8e0f	[X86][Sched] Add zero idiom sched data to the SNB model. Summary: On SNB, renamer-based zeroing does not work for: - 16 and 8-bit GPRs[1]. - MMX [2]. - ANDN variants [3] [1] echo 'sub %ax, %ax' | /tmp/llvm-exegesis -mode=uops -snippets-file=- [2] echo 'pxor %mm0, %mm0' | /tmp/llvm-exegesis -mode=uops -snippets-file=- [3] echo 'andnps %xmm0, %xmm0' | /tmp/llvm-exegesis -mode=uops -snippets-file=- Reviewers: RKSimon, andreadb Subscribers: gbedwell, craig.topper, llvm-commits Differential Revision: https://reviews.llvm.org/D52358 llvm-svn: 342736	2018-09-21 14:07:20 +00:00
Andrea Di Biagio	4cd5cf9fc8	[X86][BtVer2] Fix latency and resource cycles of AVX 256-bit zero-idioms. This patch introduces a SchedWriteVariant to describe zero-idiom VXORP(S|D)Yrr and VANDNP(S|D)Yrr. This is a follow-up of r342555. On Jaguar, a VXORPSYrr is 2 macro opcodes. Only one opcode is eliminated at register-renaming stage. The other opcode has to be executed to set the upper half of the destination YMM. Same for VANDNP(S|D)Yrr. Differential Revision: https://reviews.llvm.org/D52347 llvm-svn: 342728	2018-09-21 12:43:07 +00:00
Sameer Sahasrabuddhe	2de7653fd5	[AMDGPU] lower-switch in preISel as a workaround for legacy DA Summary: The default target of the switch instruction may sometimes be an "unreachable" block, when it is guaranteed that one of the cases is always taken. The dominator tree concludes that such a switch instruction does not have an immediate post dominator. This confuses divergence analysis, which is unable to propagate sync dependence to the targets of the switch instruction. As a workaround, the AMDGPU target now invokes lower-switch as a preISel pass. LowerSwitch is designed to handle the unreachable default target correctly, allowing the divergence analysis to locate the correct immediate dominator of the now-lowered switch. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, simoll Differential Revision: https://reviews.llvm.org/D52221 llvm-svn: 342722	2018-09-21 11:26:55 +00:00
Alexander Timofeev	36617f0160	[AMDGPU] Divergence driven instruction selection. Part 1. Summary: This change is the first part of the AMDGPU target description change. Its aim is to split the vector and scalar flows effectively at the selection stage. Selection uses predicate functions based on the framework implemented earlier - https://reviews.llvm.org/D35267 Differential revision: https://reviews.llvm.org/D52019 Reviewers: rampitec llvm-svn: 342719	2018-09-21 10:31:22 +00:00
Yonghong Song	150ca5143b	bpf: check illegal usage of XADD insn return value Currently, BPF has XADD (locked add) insn support and the asm looks like: lock *(u32 *)(r1 + 0) += r2 lock *(u64 *)(r1 + 0) += r2 The instruction itself does not have a return value. At the source code level, users often use __sync_fetch_and_add() which eventually translates to XADD. The return value of __sync_fetch_and_add() is supposed to be the old value in the xadd memory location. Since BPF::XADD insn does not support such a return value, this patch added a PreEmit phase to check such a usage. If such an illegal usage pattern is detected, a fatal error will be reported like line 4: Invalid usage of the XADD return value if compiled with -g, or Invalid usage of the XADD return value if compiled without -g. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 342692	2018-09-20 22:24:27 +00:00
Thomas Lively	6f21a13675	[WebAssembly] Add V128 value type to binary format Summary: Adds the necessary support to lib/ObjectYAML and fixes SIMD calls to allow the tests to work. Also removes some dead code that would otherwise have to have been updated. Reviewers: aheejin, dschuff, sbc100 Subscribers: jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52105 llvm-svn: 342689	2018-09-20 22:04:44 +00:00
Sanjay Patel	8a1227ccc8	[SelectionDAG] replace duplicated peekThroughBitcast helper functions; NFCI x86 had 2 versions of peekThroughBitcast. DAGCombiner had 1. Plus, it had a 1-off implementation for the one-use variant. Move the x86 versions of the code to SelectionDAG, so we don't have different copies of the code. No functional change intended. I'm putting this next to isBitwiseNot() because I am planning to use it in there. Another option is next to the helpers in the ISD namespace (eg, ISD::isConstantSplatVector()). But if there's no good reason for those to be there, I'd prefer to pull other helpers over to SelectionDAG in follow-up steps. Differential Revision: https://reviews.llvm.org/D52285 llvm-svn: 342669	2018-09-20 17:34:08 +00:00
Simon Pilgrim	3e2de767f6	[X86][SSE] Remove UNPCKL(SHUFFLE)->UNPCKH custom combine This can be achieved more generally by combineX86ShufflesRecursively. llvm-svn: 342645	2018-09-20 13:10:22 +00:00
Simon Pilgrim	46c1dcb1af	[X86][SSE] Remove PSHUFLW/PSHUFHW combineRedundantHalfShuffle combine This can be achieved more generally by combineX86ShufflesRecursively and was causing a fuzz test failure found by Mikael Holmén. llvm-svn: 342642	2018-09-20 12:11:38 +00:00
Alex Bradbury	96ed75d066	[RISCV][MC] Modify evaluateConstantImm interface to allow reuse from addExpr This is a trivial refactoring that I'm committing now as it makes a patch I'm about to post for review easier to follow. There is some overlap between evaluateConstantImm and addExpr in RISCVAsmParser. This patch allows evaluateConstantImm to be reused from addExpr to remove this overlap. The benefit will be greater when a future patch adds extra code to allow immediates to be evaluated from constant symbols (e.g. `.equ CONST, 0x1234`). No functional change intended. llvm-svn: 342641	2018-09-20 11:40:43 +00:00
Alex Bradbury	226f3ef5a5	[RISCV][MC] Improve parsing of jal/j operands Examples such as `jal a3`, `j a3` and `jal a3, a3` are accepted by gas but rejected by LLVM MC. This patch rectifies this. I introduce RISCVAsmParser::parseJALOffset to ensure that symbol names that coincide with register names can safely be parsed. This is made somewhat fiddly by the single-operand alias form (see the comment in parseJALOffset for more info). Differential Revision: https://reviews.llvm.org/D52029 llvm-svn: 342629	2018-09-20 08:10:35 +00:00
Maya Madhavan	ec1efe4ee3	Fix for bug 34002 - label generated before IT block is finalized. Differential Revision: https://reviews.llvm.org/D52258 llvm-svn: 342615	2018-09-20 05:11:42 +00:00
QingShan Zhang	cae9425a3c	[PowerPC] Fix the assert of combineBVOfConsecutiveLoads when element num is 1 Building a vector out of multiple loads can be converted to a load of the vector type if the loads are consecutive. But the special condition is that the element number is 1, such as <1 x i128>. So just early exit to fix the assert. Patch By: wuzish (Zixuan Wu) Differential Revision: https://reviews.llvm.org/D52072 llvm-svn: 342611	2018-09-20 03:09:15 +00:00
Thomas Lively	f45de47c59	[WebAssembly] Renumber SIMD ops Summary: This change leaves holes in the opcode space where missing instructions could logically be added later if they were found to be useful. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52282 llvm-svn: 342610	2018-09-20 02:55:28 +00:00
Matthias Braun	28d6a4ac9a	AArch64: Add FuseCryptoEOR fusion rules There's some additional rules available on newer apple CPUs. rdar://41235346 llvm-svn: 342590	2018-09-19 20:50:51 +00:00
Evandro Menezes	8a6973d6ff	[ARM] Adjust the feature set for Exynos Fine tune the cost model for all Exynos processors. llvm-svn: 342585	2018-09-19 19:51:29 +00:00
Evandro Menezes	c62ab61173	[ARM] Refactor Exynos feature set (NFC) Since all Exynos processors share the same feature set, fold them in the implied features list for the subtarget. llvm-svn: 342583	2018-09-19 19:43:23 +00:00
Simon Pilgrim	2d0f20cc04	[X86] Handle COPYs of physregs better (regalloc hints) Enable enableMultipleCopyHints() on X86. Original Patch by @jonpa: While enabling the mischeduler for SystemZ, it was discovered that for some reason a test needed one extra seemingly needless COPY (test/CodeGen/SystemZ/call-03.ll). Handling that resulted in this patch, which improves the register coalescing by providing not just one copy hint, but a sorted list of copy hints. On SystemZ, this gives ~12500 fewer register moves on SPEC, as well as marginally less spilling. Instead of improving just the SystemZ backend, the improvement has been implemented in common-code (calculateSpillWeightAndHint()). This gives a lot of test failures, but since this should be a general improvement I hope that the involved targets will help and review the test updates. Differential Revision: https://reviews.llvm.org/D38128 llvm-svn: 342578	2018-09-19 18:59:08 +00:00
Sanjay Patel	1a1c0ee599	[x86] change names of vector splitting helper functions; NFC As the code comments suggest, these are about splitting, and they are not necessarily limited to lowering, so that misled me. There's nothing that's actually x86-specific in these either, so they might be better placed in a common header so any target can use them. llvm-svn: 342575	2018-09-19 18:52:00 +00:00
Simon Atanasyan	a9e8765e3e	[mips][microMIPS] Extending size reduction pass with MOVEP The patch extends size reduction pass for MicroMIPS. Two MOVE instructions are transformed into one MOVEP instruction. Patch by Milena Vujosevic Janicic. Differential revision: https://reviews.llvm.org/D52037 llvm-svn: 342572	2018-09-19 18:46:29 +00:00
Simon Atanasyan	852dd83be8	[mips][microMIPS] Fix the definition of MOVEP instruction The patch fixes definition of MOVEP instruction. Two registers are used instead of register pairs. This is necessary as machine verifier cannot handle register pairs. Patch by Milena Vujosevic Janicic. Differential revision: https://reviews.llvm.org/D52035 llvm-svn: 342571	2018-09-19 18:46:21 +00:00
Simon Pilgrim	8191d63c3b	[X86] Add initial SimplifyDemandedVectorEltsForTargetNode support This patch adds an initial x86 SimplifyDemandedVectorEltsForTargetNode implementation to handle target shuffles. Currently the patch only decodes a target shuffle, calls SimplifyDemandedVectorElts on its input operands and removes any shuffle that reduces to undef/zero/identity. Future work will need to integrate this with combineX86ShufflesRecursively, add support for other x86 ops, etc. NOTE: There is a minor regression that appears to be affecting further (extractelement?) combines which I haven't been able to solve yet - possibly something to do with how nodes are added to the worklist after simplification. Differential Revision: https://reviews.llvm.org/D52140 llvm-svn: 342564	2018-09-19 18:11:34 +00:00
Carl Ritson	6b8d75425e	[AMDGPU] Add instruction selection for i1 to f16 conversion Summary: This is required for GPUs with 16 bit instructions where f16 is a legal register type and hence int_to_fp i1 to f16 is not lowered by legalizing. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52018 Change-Id: Ie4c0fd6ced7cf10ad612023c6879724d9ded5851 llvm-svn: 342558	2018-09-19 16:32:12 +00:00
Yonghong Song	5b476c5a9f	[bpf] Symbol sizes and types in object file Clang-compiled object files currently don't include the symbol sizes and types. Some tools however need that information. For example, ctfconvert uses that information to generate FreeBSD's CTF representation from ELF files. With this patch, symbol sizes and types are included in object files. Signed-off-by: Paul Chaignon <paul.chaignon@orange.com> Reported-by: Yutaro Hayakawa <yhayakawa3720@gmail.com> llvm-svn: 342556	2018-09-19 16:04:13 +00:00
Andrea Di Biagio	8b6c314be1	[TableGen][SubtargetEmitter] Add the ability for processor models to describe dependency breaking instructions. This patch adds the ability for processor models to describe dependency breaking instructions. Different processors may specify a different set of dependency-breaking instructions. That means, we cannot assume that all processors of the same target would use the same rules to classify dependency breaking instructions. The main goal of this patch is to provide the means to describe dependency breaking instructions directly via tablegen, and have the following TargetSubtargetInfo hooks redefined in overrides by tablegen'd XXXGenSubtargetInfo classes (here, XXX is a Target name). ``` virtual bool isZeroIdiom(const MachineInstr *MI, APInt &Mask) const { return false; } virtual bool isDependencyBreaking(const MachineInstr *MI, APInt &Mask) const { return isZeroIdiom(MI); } ``` An instruction MI is a dependency-breaking instruction if a call to method isDependencyBreaking(MI) on the STI (TargetSubtargetInfo object) evaluates to true. Similarly, an instruction MI is a special case of zero-idiom dependency breaking instruction if a call to STI.isZeroIdiom(MI) returns true. The extra APInt is used for those targets that may want to select which machine operands have their dependency broken (see comments in code). Note that by default, subtargets don't know about the existence of dependency-breaking. In the absence of external information, those method calls would always return false. A new tablegen class named STIPredicate has been added by this patch to let processor models classify instructions that have properties in common. The idea is that an MCInstrPredicate definition can be used to "generate" an instruction equivalence class, with the idea that instructions of a same class all have a property in common. STIPredicate definitions are essentially a collection of instruction equivalence classes. 
Also, different processor models can specify a different variant of the same STIPredicate with different rules (i.e. predicates) to classify instructions. Tablegen backends (in this particular case, the SubtargetEmitter) will be able to process STIPredicate definitions, and automatically generate functions in XXXGenSubtargetInfo. This patch introduces two special kinds of STIPredicate classes named IsZeroIdiomFunction and IsDepBreakingFunction in tablegen. It also adds a definition for those in the BtVer2 scheduling model only. This patch supersedes the one committed at r338372 (phabricator review: D49310). The main advantages are: - We can describe subtarget predicates via tablegen using STIPredicates. - We can describe zero-idioms / dep-breaking instructions directly via tablegen in the scheduling models. In future, the STIPredicates framework can be used for solving other problems. Examples of future developments are: - Teach how to identify optimizable register-register moves - Teach how to identify slow LEA instructions (each subtarget defining its own concept of "slow" LEA). - Teach how to identify instructions that have undocumented false dependencies on the output registers on some processors only. It is also (in my opinion) an elegant way to expose knowledge to both external tools like llvm-mca, and codegen passes. For example, machine schedulers in LLVM could reuse that information when internally constructing the data dependency graph for a code region. This new design feature is also an "opt-in" feature. Processor models don't have to use the new STIPredicates. It has all been designed to be as unintrusive as possible. Differential Revision: https://reviews.llvm.org/D52174 llvm-svn: 342555	2018-09-19 15:57:45 +00:00
Sanjay Patel	4fd2e2a498	[DAGCombiner][x86] add transform/hook to decompose integer multiply into shift/add This is an alternative to D37896. I don't see a way to decompose multiplies generically without a target hook to tell us when it's profitable. ARM and AArch64 may be able to remove some duplicate code that overlaps with this transform. As a first step, we're only getting the most clear wins on the vector examples requested in PR34474: https://bugs.llvm.org/show_bug.cgi?id=34474 As noted in the code comment, it's likely that the x86 constraints are tighter than necessary, but it may not always be a win to replace a pmullw/pmulld. Differential Revision: https://reviews.llvm.org/D52195 llvm-svn: 342554	2018-09-19 15:57:40 +00:00
Alex Bradbury	79518b02cd	[AtomicExpandPass]: Add a hook for custom cmpxchg expansion in IR This involves changing the shouldExpandAtomicCmpXchgInIR interface, but I have updated the in-tree backends using this hook (ARM, AArch64, Hexagon) so they will see no functional change. Previously this hook returned bool, but it now returns AtomicExpansionKind. This hook allows targets to select how a given cmpxchg is to be expanded. D48131 uses this to expand part-word cmpxchg to a target-specific intrinsic. See my associated RFC for more info on the motivation for this change <http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html>. Differential Revision: https://reviews.llvm.org/D48130 llvm-svn: 342550	2018-09-19 14:51:42 +00:00
Oliver Stannard	0b835be7bb	[ARM] Fix unwind information for floating point registers Fixes the unwind information generated for floating-point registers. Previously, all padding registers were assumed to be four bytes wide. Now, the width of the register is used to specify the amount of padding. Patch by Jackson Woodruff! Differential revision: https://reviews.llvm.org/D51494 llvm-svn: 342545	2018-09-19 13:25:31 +00:00
Calixte Denizet	7413a43886	Verify commit access in fixing typo llvm-svn: 342538	2018-09-19 11:26:20 +00:00
Alex Bradbury	21aea51e71	[RISCV] Codegen for i8, i16, and i32 atomicrmw with RV32A Introduce a new RISCVExpandPseudoInsts pass to expand atomic pseudo-instructions after register allocation. This is necessary in order to ensure that register spills aren't introduced between LL and SC, thus breaking the forward progress guarantee for the operation. AArch64 does something similar for CmpXchg (though only at O0), and Mips is moving towards this approach (see D31287). See also [this mailing list post](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099490.html) from James Knight, which summarises the issues with lowering to ll/sc in IR or pre-RA. See the [accompanying RFC thread](http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html) for an overview of the lowering strategy. Differential Revision: https://reviews.llvm.org/D47882 llvm-svn: 342534	2018-09-19 10:54:22 +00:00
Hans Wennborg	4195eb1068	[COFF] Emit @feat.00 on 64-bit and set the CFG bit when emitting guardcf tables The 0x800 bit in @feat.00 needs to be set in order to make LLD pick up the .gfid$y table. I believe this is fine to set even if we don't emit the instrumentation. We haven't emitted @feat.00 on 64-bit before. I see that MSVC does emit it, but I'm not entirely sure what the default value should be. I went with zero since that seems as safe as not emitting the symbol in the first place. Differential Revision: https://reviews.llvm.org/D52235 llvm-svn: 342532	2018-09-19 09:58:30 +00:00
Thomas Lively	ad7e9e9f60	[WebAssembly][NFC] Remove extra space in WebAssemblyInstrSIMD.td llvm-svn: 342522	2018-09-19 00:54:20 +00:00
Matthias Braun	934be5fecf	AArch64MacroFusion: Factor out some opcode handling code; NFC llvm-svn: 342521	2018-09-19 00:23:37 +00:00
Matthias Braun	726e12cf0c	ScheduleDAG: Cleanup dumping code; NFC - Instead of having both `SUnit::dump(ScheduleDAG)` and `ScheduleDAG::dumpNode(ScheduleDAG)`, just keep the latter around. - Add `ScheduleDAG::dump()` and avoid code duplication in several places. Implement it for different ScheduleDAG variants. - Add `ScheduleDAG::dumpNodeName()` in favor of the `SUnit::print()` functions. They were only ever used for debug dumping and putting the function into ScheduleDAG is consistent with the `dumpNode()` change. llvm-svn: 342520	2018-09-19 00:23:35 +00:00
Thomas Lively	aaf4e2cbba	[WebAssembly] v4f32.abs and v2f64.abs Summary: implement lowering of @llvm.fabs for vector types. Reviewers: aheejin, dschuff Subscribers: llvm-svn: 342513	2018-09-18 21:45:12 +00:00
Farhana Aleen	f5a2848376	[AMDGPU] Match udot8 pattern Summary: D.u32 = S0.u4[0] * S1.u4[0] + S0.u4[1] * S1.u4[1] + S0.u4[2] * S1.u4[2] + S0.u4[3] * S1.u4[3] + S0.u4[4] * S1.u4[4] + S0.u4[5] * S1.u4[5] + S0.u4[6] * S1.u4[6] + S0.u4[7] * S1.u4[7] + S2.u32 Author: FarhanaAleen Reviewed By: arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D51947 llvm-svn: 342497	2018-09-18 16:59:48 +00:00
Alex Bradbury	68f73c1206	[RISCV][MC] Use a custom ParserMethod for the bare_symbol operand type This allows the hard-coded shouldForceImmediate logic to be removed because the generated MatchOperandParserImpl makes use of the current context (i.e. the current mnemonic) to determine parsing behaviour, and so won't first try to parse a register before parsing a symbol name. No functional change is intended. gas accepts immediate arguments for call, tail and lla. This patch doesn't address this discrepancy. Differential Revision: https://reviews.llvm.org/D51733 llvm-svn: 342488	2018-09-18 15:18:16 +00:00
Alex Bradbury	7d0e18d0dd	[RISCV][MC] Reject bare symbols for the simm12 operand type addi a0, a0, foo and lw a0, foo(a0) and similar are now rejected. An explicit %lo and %pcrel_lo modifier is required. This matches gas behaviour. llvm-svn: 342487	2018-09-18 15:13:29 +00:00
Alex Bradbury	74340f1805	[RISCV][MC] Tighten up checking of symbol operands to lui and auipc Reject bare symbols and accept only %pcrel_hi(sym) for auipc and %hi(sym) for lui. Also test valid operand modifiers in rv32i-valid.s. Note this is slightly stricter than gas, which will accept either %pcrel_hi or %hi for both lui and auipc. Differential Revision: https://reviews.llvm.org/D51731 llvm-svn: 342486	2018-09-18 15:08:35 +00:00
Nemanja Ivanovic	87c31a6113	[PowerPC] Do not emit record-form rotates when record-form andi/andis suffices This is a follow-up to the previous patch that eliminated some of the rotates. With this addition, we will also emit the record-form andis. This patch increases the number of record-form rotates we eliminate by more than 70%. Differential revision: https://reviews.llvm.org/D44897 llvm-svn: 342478	2018-09-18 13:43:16 +00:00
Nemanja Ivanovic	6a39d32e66	[PowerPC] Optimize compares fed by ANDISo Both ANDIo and ANDISo (and the 64-bit versions) are record-form instructions. When optimizing compares, we handle the former in order to eliminate the compare instruction but not the latter. This patch just adds the latter to the set of instructions we optimize. The reason these instructions need to be handled separately is that they are not part of the RecFormRel map (since they don't have a non-record-form). The missing "and-immediate-shifted" is just an oversight in the initial implementation. Differential revision: https://reviews.llvm.org/D51353 llvm-svn: 342472	2018-09-18 13:21:58 +00:00
Simon Pilgrim	e9bf71e761	[X86][SSE] LowerShift - pull out repeated getTargetVShiftUniformOpcode calls. NFCI. llvm-svn: 342462	2018-09-18 10:44:44 +00:00
David Green	85d6a55995	[AArch64] Attempt to parse more operands as expressions This tries to make use of evaluateAsRelocatable in AArch64AsmParser::classifySymbolRef to parse more complex expressions as relocatable operands. It is hopefully better than the existing code which only handles Symbol +- Constant. This allows us to parse more complex adr/adrp, mov, ldr/str and add operands. It also loosens the requirements on parsing addends in ld/st and mov's and adds a number of tests. Differential Revision: https://reviews.llvm.org/D51792 llvm-svn: 342455	2018-09-18 09:44:53 +00:00
Matt Arsenault	ebf46143ea	AMDGPU: Don't form fmed3 if it will require materialization If there is a single use constant, it can be folded into the min/max, but not into med3. llvm-svn: 342443	2018-09-18 02:34:54 +00:00
QingShan Zhang	f1b0b47b2d	[PowerPC] Add Itineraries of IIC_IntMulHD for P7/P8 When doing some instruction scheduling work, we noticed some missing itineraries. Before we switch to machine scheduler, those missing itineraries might not have an impact on actual scheduling, because we can still get the same latency due to default values. With machine scheduler, however, itineraries will have an impact on scheduling. e.g.: NumMicroOps will default to 0 if there are NO itineraries for a specific instruction class. And most of the instruction classes with itineraries will have NumMicroOps default to 1. This will have an impact on the count of RetiredMOps, which affects the Pending/Available Queue, in turn causing different or suboptimal scheduling. Patch By: jsji (Jinsong Ji) Differential Revision: https://reviews.llvm.org/D52040 llvm-svn: 342441	2018-09-18 02:05:18 +00:00
Matt Arsenault	9d49c449ec	AMDGPU: Expand vector canonicalizes llvm-svn: 342439	2018-09-18 01:51:33 +00:00
Volodymyr Sapsai	703ab84cf5	Revert "[ARM] Cleanup ARM CGP isSupportedValue" This reverts r342395 as it caused error > Argument value type does not match pointer operand type! > %0 = atomicrmw volatile xchg i8* %_Value1, i32 1 monotonic, !dbg !25 > i8in function atomic_flag_test_and_set > fatal error: error in backend: Broken function found, compilation aborted! on bot http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/ More details are available at https://reviews.llvm.org/D52080 llvm-svn: 342431	2018-09-18 00:11:55 +00:00
Simon Atanasyan	9265dca8b5	[mips] Fix MIPS N32 ABI triples support Add support for mips64(el)-linux-gnuabin32 triples, and set them to N32. The Debian architecture names mipsn32/mipsn32el are also added. Set UseIntegratedAssembler for N32 if we can detect it. Patch by YunQiang Su. Differential revision: https://reviews.llvm.org/D51408 llvm-svn: 342416	2018-09-17 21:21:57 +00:00
Keno Fischer	c8ccaed325	[X86ISel] Implement byval lowering for Win64 calling convention Summary: The IR reference for the `byval` attribute states: ``` This indicates that the pointer parameter should really be passed by value to the function. The attribute implies that a hidden copy of the pointee is made between the caller and the callee, so the callee is unable to modify the value in the caller. This attribute is only valid on LLVM pointer arguments. ``` However, on Win64, this attribute is unimplemented and the raw pointer is passed to the callee instead. This is problematic, because frontend authors relying on the implicit hidden copy (as happens for every other calling convention) will see the passed value silently (if mutable memory) or loudly (by means of a crash) modified because the callee treats the location as scratch memory space it is allowed to mutate. At this point, it's worth taking a step back to understand the context. In most calling conventions, aggregates that are too large to be passed in registers instead get copied to the stack at a fixed (computable from the signature) offset of the stack pointer. At the LLVM level, we hide this hidden copy behind the byval attribute. The caller passes a pointer to the desired data and the callee receives a pointer, but these pointers are not the same. In particular, the pointer that the callee receives points to temporary stack memory allocated as part of the call lowering. In most calling conventions, this pointer is never realized in registers or memory. The temporary memory is simply defined by an implicit offset from the stack pointer at function entry. Win64, uniquely, works differently. The structure is still passed in memory, but instead of being stored at an implicit memory offset, the caller computes a pointer to the temporary memory and passes it to the callee as a regular pointer (taking up a register, or if all registers are taken up, an additional stack slot). 
Presumably, this was done to allow eliding the copy when passing aggregates through several functions on the stack. This explains why ignoring the `byval` attribute mostly works on Win64. The argument simply gets passed as a pointer and as long as we're ok with the callee trampling all over that memory, there are no ill effects. However, it does contradict the documentation of the `byval` attribute which specifies that there is to be an implicit copy. Frontends can of course work around this by never emitting the `byval` attribute for Win64 and creating `alloca`s for the requisite temporary stack slots (and that does appear to be what frontends are doing). However, the presence of the `byval` attribute is a trap for frontend authors, since it seems to work, but silently modifies the passed memory contrary to documentation. I see two solutions: - Disallow the `byval` attribute in the verifier if using the Win64 calling convention. - Make it work by simply emitting a temporary stack copy as we would with any other calling convention (frontends can of course always not use the attribute if they want to elide the copy). This patch implements the second option (make it work), though I would be fine with the first also. Ref: https://github.com/JuliaLang/julia/issues/28338 Reviewers: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51842 llvm-svn: 342402	2018-09-17 17:37:14 +00:00
Stanislav Mekhanoshin	06d3b4139e	[AMDGPU] Initialize instruction itinerary from GCNSubtarget I need to use it in the GCN codegen. Differential Revision: https://reviews.llvm.org/D52123 llvm-svn: 342400	2018-09-17 16:04:32 +00:00
Sam Parker	481cdab919	[ARM] Cleanup ARM CGP isSupportedValue isSupportedValue explicitly checked and accepted many types of value, primarily for debugging reasons. Remove most of these checks and do a bit of refactoring now that the pass is more stable. This also enables ZExts to be sources, but this has very little practical benefit at the moment, as extend instructions will still be introduced. Differential Revision: https://reviews.llvm.org/D52080 llvm-svn: 342395	2018-09-17 13:57:39 +00:00
Sam Parker	76d25d7f55	[ARM] Disallow icmp with negative imm and overflow We allow overflowing instructions if they're decreasing and only used by an unsigned compare. Add the extra condition that the icmp cannot be using a negative immediate. Differential Revision: https://reviews.llvm.org/D52102 llvm-svn: 342392	2018-09-17 13:48:25 +00:00
Strahinja Petrovic	488fd4e625	[PowerPC] Fix label address calculation for ppc64 This patch fixes calculating address of label for non-pic ppc64. Differential Revision: https://reviews.llvm.org/D50965 llvm-svn: 342368	2018-09-17 11:03:40 +00:00
Simon Pilgrim	cffa206423	[X86][SSE] Always enable ISD::SRL -> ISD::MULHU for v8i16 For constant non-uniform cases we'll never introduce more and/andn/or selects than already occur in generic pre-SSE41 ISD::SRL lowering. llvm-svn: 342352	2018-09-16 20:28:38 +00:00
Simon Pilgrim	ea069ffd44	[X86][AVX] Enable ISD::SRL -> ISD::MULHU for v16i16 Now that rL340913 has landed with improved v16i16 selects as shuffles. llvm-svn: 342349	2018-09-16 19:20:47 +00:00
Sanjay Patel	bfee5a9b42	[x86] fix uses check in broadcast transform (PR38949) https://bugs.llvm.org/show_bug.cgi?id=38949 It's not clear to me that we even need a one-use check in this fold. Ie, 2 independent loads might be better than a load+dependent shuffle. Note that the existing re-use tests are not affected. We actually do form a broadcast node in those tests now because there's no extra use of the insert_subvector node in those cases. But something later in isel pattern matching decides that it is not worth using a broadcast for the full load in those tests: Legalized selection DAG: %bb.0 'test_broadcast_2f64_4f64_reuse:' t7: v2f64,ch = load<(load 16 from %ir.p0)> t0, t2, undef:i64 t4: i64,ch = CopyFromReg t0, Register:i64 %1 t10: ch = store<(store 16 into %ir.p1)> t7:1, t7, t4, undef:i64 t18: v4f64 = insert_subvector undef:v4f64, t7, Constant:i64<0> t20: v4f64 = insert_subvector t18, t7, Constant:i64<2> Becomes: t7: v2f64,ch = load<(load 16 from %ir.p0)> t0, t2, undef:i64 t4: i64,ch = CopyFromReg t0, Register:i64 %1 t10: ch = store<(store 16 into %ir.p1)> t7:1, t7, t4, undef:i64 t21: v4f64 = X86ISD::SUBV_BROADCAST t7 ISEL: Starting selection on root node: t21: v4f64 = X86ISD::SUBV_BROADCAST t7 ... Created node: t27: v4f64 = INSERT_SUBREG IMPLICIT_DEF:v4f64, t7, TargetConstant:i32<7> Morphed node: t21: v4f64 = VINSERTF128rr t27, t7, TargetConstant:i8<1> llvm-svn: 342347	2018-09-16 15:41:56 +00:00
Craig Topper	fe0b973fbf	[X86] Remove an fp->int->fp domain crossing in LowerUINT_TO_FP_i64. Summary: This unfortunately adds a move, but isn't that better than going to the int domain and back? Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52134 llvm-svn: 342327	2018-09-15 16:23:35 +00:00
Craig Topper	273f755da3	[X86] Fold (movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C)) Summary: MOVMSK only cares about the sign bit so we don't need the setcc to fill the whole element with 0s/1s. We can just shift the bit we're looking for into the sign bit. This saves a constant pool load. Inspired by PR38840. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D52121 llvm-svn: 342326	2018-09-15 16:23:33 +00:00
Thomas Lively	f2550e0c44	[WebAssembly] SIMD shifts Summary: Implement shifts of vectors by i32. Since LLVM defines shifts as binary operations between two vectors, this involves pattern matching on splatted shift operands. For v2i64 shifts any i32 shift operands have to be zero extended in the input and any i64 shift operands have to be wrapped in the output. Depends on D52007. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51906 llvm-svn: 342302	2018-09-15 00:45:31 +00:00
Thomas Lively	88b7443f94	[WebAssembly] SIMD neg Summary: Depends on D52007. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52009 llvm-svn: 342296	2018-09-14 22:35:12 +00:00
Lion Yang	c68f78d5d8	[PowerPC] Fix the calling convention for i1 arguments on PPC32 Summary: Integer types smaller than i32 must be extended to i32 by default. The feature "crbits" introduced at r202451 handles i1 as a special case, but it did not extend properly. The caller was, therefore, passing i1 stack arguments by writing 0/1 to the first byte of the 4-byte stack object and callee was reading the first byte for the value. "crbits" is enabled if the optimization level is greater than 1, which is very common in "release builds". Such discrepancies with the ABI specification also introduce potential incompatibility with programs or libraries built with other compilers e.g. GCC. Fixes PR38661 Reviewers: hfinkel, cuviper Subscribers: sylvestre.ledru, glaubitz, nagisa, nemanjai, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D51108 llvm-svn: 342288	2018-09-14 21:26:05 +00:00
Konstantin Zhuravlyov	e721b11c12	AMDGPU: Clear the bits before they are being set in program resource registers Change by Tony Tye llvm-svn: 342270	2018-09-14 20:00:36 +00:00
Reid Kleckner	00f0ee718f	Revert r342210 "[ARM] bottom-top mul support in ARMParallelDSP" It causes assertion failures while building Skia for Android in Chromium: https://ci.chromium.org/buildbot/chromium.clang/ToTAndroid/4550 Reduction forthcoming. llvm-svn: 342260	2018-09-14 18:44:37 +00:00
Simon Pilgrim	32857c54d2	[X86][SSE] Lower shuffles to permute(unpack(x,y)) (PR31151) Attempt to lower a shuffle as an unpack of elements from two inputs followed by a single-input (wider) permutation. As long as the permutation is wider this is a win - there may be some circumstances where same size permutations would also be useful but I've left that for future work. Differential Revision: https://reviews.llvm.org/D52043 llvm-svn: 342257	2018-09-14 18:33:31 +00:00
Simon Pilgrim	1c1335a10d	[X86][BMI1] Fix BLSI/BLSMSK/BLSR BMI1 scheduling on btver2 These have the same behaviour as tzcnt on btver2 - confirmed with AMD 16h SOG, Agner and instlatx64. llvm-svn: 342235	2018-09-14 13:31:14 +00:00
Simon Pilgrim	6a47cdbdec	[X86][BMI1] Add scheduler class for BLSI/BLSMSK/BLSR BMI1 instructions llvm-svn: 342234	2018-09-14 13:09:56 +00:00
David Stuttard	20de3e99b5	[AMDGPU] Ensure trig range reduction only used for subtargets that require it Summary: GFX9 and above support sin/cos instructions with a greater range and thus don't require a fract instruction prior to invocation. Added a subtarget feature to reflect this and added code to take advantage of expanded range on GFX9+ Also updated the tests to check correct behaviour Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51933 Change-Id: I1c1f1d3726a5ae32116646ca5cfa1ab4ef69e5b0 llvm-svn: 342222	2018-09-14 10:27:19 +00:00
Sam Parker	7b84fd7847	[ARM] bottom-top mul support in ARMParallelDSP On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load. Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 342210	2018-09-14 08:09:09 +00:00
Jonas Paulsson	77df2f2f38	[SystemZ] Adjust cost functions for subtargets that use LI + LOC instead of IPM After recent improvements which make better use of LOC instead of IPM, the TTI cost functions also need to be updated to reflect this. This involves sext, zext and xor of i1. The tests were updated so that for z13 the new costs are expected, while the old costs are still checked for on zEC12. Review: Ulrich Weigand https://reviews.llvm.org/D51339 llvm-svn: 342207	2018-09-14 06:46:55 +00:00
Tim Renouf	c8af6a46fa	[AMDGPU] Removed unused method Summary: I accidentally left this behind in D50306, and it causes a build warning when I build with gcc7. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52022 Change-Id: I30f7a47047e9d9d841f652da66d2fea19e74842c llvm-svn: 342189	2018-09-13 21:56:25 +00:00
Nirav Dave	59ad1c8457	[X86] Fix register resizings for inline assembly register operands. When replacing a named register input to the appropriately sized sub/super-register. In the case of a 64-bit value being assigned to a register in 32-bit mode, match GCC's assignment. Reviewers: eli.friedman, craig.topper Subscribers: nickdesaulniers, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D51502 llvm-svn: 342175	2018-09-13 20:33:56 +00:00
Nirav Dave	2060a16dfd	[X86] Cleanup pair returns. NFCI. llvm-svn: 342174	2018-09-13 20:33:27 +00:00
Ana Pazos	065b088759	[RISCV][MC] Reject bare symbols for the simm6 and simm6nonzero operand types Summary: Fixed assertions due to invalid fixup when encoding compressed instructions (c.addi, c.addiw, c.li, c.andi) with bare symbols with/without modifiers. This matches GAS behavior as well. This bug was uncovered by a LLVM MC Disassembler Protocol Buffer Fuzzer for the RISC-V assembly language. Reviewers: asb Reviewed By: asb Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, asb Differential Revision: https://reviews.llvm.org/D52005 llvm-svn: 342160	2018-09-13 18:37:23 +00:00
Ana Pazos	b0799dda77	[RISCV] Fix decoding of invalid instruction with C extension enabled. Summary: The illegal instruction 0x00 0x00 is being wrongly decoded as c.addi4spn with 0 immediate. The invalid instruction 0x01 0x61 is being wrongly decoded as c.addi16sp with 0 immediate. This bug was uncovered by a LLVM MC Disassembler Protocol Buffer Fuzzer for the RISC-V assembly language. Reviewers: asb Reviewed By: asb Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, asb Differential Revision: https://reviews.llvm.org/D51815 llvm-svn: 342159	2018-09-13 18:21:19 +00:00
Sam Clegg	79c054f6b8	[WebAssembly] Fix signature of `main` in FixFunctionBitcasts Also, add a check to ensure that when main has the expected signature we do not create a wrapper. Differential Revision: https://reviews.llvm.org/D51562 llvm-svn: 342157	2018-09-13 17:13:10 +00:00
Sam Parker	aaec3c6260	[ARM] Allow truncs as sources in ARM CGP We previously only allowed truncs as sinks, but now allow them as sources too. We do this by checking that the result type is the narrow type that we're trying to optimise for. Differential Revision: https://reviews.llvm.org/D51978 llvm-svn: 342141	2018-09-13 15:14:12 +00:00
Sam Parker	96f77f142b	[ARM] Fix FixConst for ARMCodeGenPrepare Part of FixConsts wrongly assumes either a 8- or 16-bit constant which can result in the wrong constants being generated during promotion. Differential Revision: https://reviews.llvm.org/D52032 llvm-svn: 342140	2018-09-13 14:48:10 +00:00
Matt Arsenault	ff987ac6ea	AMDGPU: Fix not preserving alignment in call setups If an argument was passed on the stack, this was using the default alignment. I'm not sure there's an observable change from this. This was observable due to bugs in expansion of unaligned loads and stores, but since that is fixed I don't think this matters much. llvm-svn: 342133	2018-09-13 12:14:31 +00:00
Tim Northover	c15d47bb01	ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4. The Technical Reference Manuals for these two CPUs state that branching to an unaligned 32-bit instruction incurs an extra pipeline reload penalty. That's bad. This also enables the optimization at -Os since it costs on average one byte per loop in return for 1 cycle per iteration, which is pretty good going. llvm-svn: 342127	2018-09-13 10:28:05 +00:00
Alexander Timofeev	4d302f6911	[AMDGPU] Load divergence predicate refactoring Differential revision: https://reviews.llvm.org/D51931 Reviewers: rampitec llvm-svn: 342120	2018-09-13 09:06:56 +00:00
Simon Atanasyan	c49da2e4ed	[mips] Enable the mnemonic spell corrector This implements suggesting alternative mnemonics when an invalid one is specified. For example `addru $9, $6, 17767` leads to the following error message: error: unknown instruction, did you mean: add, addiu, addu, maddu? Differential revision: https://reviews.llvm.org/D40646 llvm-svn: 342119	2018-09-13 08:38:03 +00:00
Alexander Timofeev	2fb44808b1	[AMDGPU] Preliminary patch for divergence driven instruction selection. Load offset inlining pattern changed. Differential revision: https://reviews.llvm.org/D51975 Reviewers: rampitec llvm-svn: 342115	2018-09-13 06:34:56 +00:00
Craig Topper	f107123a88	[X86] Type legalize v2i32 div/rem by scalarizing rather than promoting Summary: Previously we type legalized v2i32 div/rem by promoting to v2i64. But we don't support div/rem of vectors so op legalization would then scalarize it using i64 scalar ops since it doesn't know about the original promotion. 64-bit scalar divides on Intel hardware are known to be slow and in 32-bit mode they require a libcall. This patch switches type legalization to do the scalarizing itself using i32. It looks like the division by power of 2 optimization is still kicking in and leaving the code as a vector. The division by other constant optimization doesn't kick in pre type legalization since it ignores illegal types. And previously, after type legalization we scalarized the v2i64 since we don't have v2i64 MULHS/MULHU support. Another option might be to widen v2i32 to v4i32 so we could do division by constant optimizations, but we'd have to be careful to only do that for constant divisors or we risk scalarizing to 4 scalar divides. Reviewers: RKSimon, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51325 llvm-svn: 342114	2018-09-13 06:13:37 +00:00
Saleem Abdulrasool	aaa72c547b	ARM: correct the relocation type for `bl` on WoA The `IMAGE_REL_ARM_BRANCH20T` applies only to a `b.w` instruction. A thumb-2 `bl` should be relocated using a `IMAGE_REL_ARM_BRANCH24T`. Correct the relocation that we emit in such a case. Resolves PR38620! Based on the patch by Jordan Rhee! llvm-svn: 342109	2018-09-13 04:55:08 +00:00
Thomas Lively	65825cd7c5	Remove isAsCheapAsAMove from v128.const llvm-svn: 342106	2018-09-13 02:50:57 +00:00
Thomas Lively	17ba6becaa	Remove isAsCheapAsAMove from mem ops llvm-svn: 342105	2018-09-13 02:50:57 +00:00
Thomas Lively	56b34f6c51	[WebAssembly] Add missing SIMD instruction attributes Summary: These attributes are copied from equivalent instructions in WebAssemblyInstrInfo.td. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51518 llvm-svn: 342104	2018-09-13 02:50:56 +00:00
Krzysztof Parzyszek	a6d4fc0e29	[Hexagon] Use shuffles when lowering "gather" shufflevectors Shufflevector instructions in LLVM IR that extract a subset of elements of a longer input into a shorter vector can be done using VECTOR_SHUFFLEs. This will avoid expanding them into costly extracts and inserts. llvm-svn: 342091	2018-09-12 22:14:52 +00:00
Krzysztof Parzyszek	f853741142	[Hexagon] Improve the selection algorithm in scalarizeShuffle Use topological ordering for newly generated nodes. llvm-svn: 342090	2018-09-12 22:10:58 +00:00
Heejin Ahn	300f42fbce	[WebAssembly] Make tied inline asm operands work again Summary: rL341389 broke code with tied register operands in inline assembly. For example, `asm("" : "=r"(var) : "0"(var));` The code above specifies the input operand to be in the same register with the output operand, tying the two registers. This patch makes this kind of code work again. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, eraman, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51991 llvm-svn: 342084	2018-09-12 21:34:39 +00:00
Krzysztof Parzyszek	cd95e03cf0	[Hexagon] Use legalized type for extracted elements in scalarizeShuffle Scalarization of a shuffle will break up the source vectors into individual elements, and use them to assemble the resulting vector. An element type of a legal vector type may not necessarily be a legal scalar type, so make sure that the extracted values are extended to a legal scalar type. llvm-svn: 342079	2018-09-12 20:58:48 +00:00
Konstantin Zhuravlyov	6e551e0e49	AMDGPU: Print all kernel descriptor directives (including the ones with default values) Change by Tony Tye Differential Revision: https://reviews.llvm.org/D51954 llvm-svn: 342077	2018-09-12 20:25:39 +00:00
Konstantin Zhuravlyov	71e43ee47d	AMDGPU: Re-apply r341982 after fixing the layering issue Move isa version determination into TargetParser. Also switch away from target features to CPU string when determining isa version. This fixes an issue when we output wrong isa version in the object code when features of a particular CPU are altered (i.e. gfx902 w/o xnack used to result in gfx900). llvm-svn: 342069	2018-09-12 18:50:47 +00:00
Thomas Lively	ebd4c906d8	[WebAssembly] SIMD comparisons Summary: Match the ordering semantics of non-vector comparisons. For floating point comparisons that do not correspond to instructions, the tests check that some vector comparison instruction was emitted but do not care about the full implementation. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51765 llvm-svn: 342064	2018-09-12 17:56:00 +00:00
Diogo N. Sampaio	01b916e188	[ARM] Tighten f64<->f16 conversion requirements Fix missing Requires fields. Patch by Bernard Ogden (bogden) Reviewers: SjoerdMeijer, javed.absar, t.p.northover Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D51631 llvm-svn: 342061	2018-09-12 16:24:43 +00:00
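
Note on 273f755da3 (MOVMSK fold): the equivalence claimed in that message can be checked with a small lane-level model. This is only a sketch in Python, not LLVM code; the function names are hypothetical, lanes are modelled as unsigned 8-bit integers, and the shift amount `bits - 1 - c` is an assumption about what "shift the bit we're looking for into the sign bit" means for bit index `c` (the subject line's `X << C` is read here as shorthand).

```python
def movmsk(lanes, bits=8):
    """Model of MOVMSK: gather each lane's sign bit into a mask (lane 0 -> bit 0)."""
    sign = 1 << (bits - 1)
    return sum(1 << i for i, v in enumerate(lanes) if v & sign)


def mask_via_setcc(lanes, c, bits=8):
    """Original form: setne(and(X, 1 << c), 0) yields all-ones or zero lanes."""
    all_ones = (1 << bits) - 1
    compared = [all_ones if v & (1 << c) else 0 for v in lanes]
    return movmsk(compared, bits)


def mask_via_shift(lanes, c, bits=8):
    """Folded form: move bit c into the sign position, truncating to lane width."""
    all_ones = (1 << bits) - 1
    shift = bits - 1 - c  # assumption: amount needed to land bit c on the sign bit
    return movmsk([(v << shift) & all_ones for v in lanes], bits)


if __name__ == "__main__":
    lanes = [0x00, 0x04, 0xFF, 0x7B]
    for c in range(8):
        assert mask_via_setcc(lanes, c) == mask_via_shift(lanes, c)
    print(bin(mask_via_shift(lanes, 2)))
```

Both forms agree because MOVMSK discards everything except the top bit of each lane, so the all-ones fill produced by the compare (and the constant it needs) is never observed.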


50667 Commits