llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	266f43964e	[TargetLowering] Add allowsMemoryAccess(MachineMemOperand) helper wrapper. NFCI. As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075. llvm-svn: 363048	2019-06-11 11:00:23 +00:00
Simon Tatham	14241378d3	[ARM] Fix unused-variable warning in rL363039. The variable `OffsetMask` is currently only used in an assertion, so if assertions are compiled out and -Werror is enabled, it becomes a build failure. llvm-svn: 363043	2019-06-11 10:09:12 +00:00
Simon Tatham	8c865cacda	[ARM] Add the non-MVE instructions in Arm v8.1-M. This adds support for the new family of conditional selection / increment / negation instructions; the low-overhead branch instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole list of registers at once; the new VMRS/VMSR and VLDR/VSTR instructions to get data in and out of 8.1-M system registers, particularly including the new VPR register used by MVE vector predication. To support this, we also add a register name 'zr' (used by the CSEL family to force one of the inputs to the constant 0), and operand types for lists of registers that are also allowed to include APSR or VPR (used by CLRM). The VLDR/VSTR instructions also need a new addressing mode. The low-overhead branch instructions exist in their own separate architecture extension, which we treat as enabled by default, but you can say -mattr=-lob or equivalent to turn it off. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Reviewed By: samparker Subscribers: miyuki, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62667 llvm-svn: 363039	2019-06-11 09:29:18 +00:00
Craig Topper	627d8168e7	[X86] Add load folding isel patterns to scalar_math_patterns and AVX512_scalar_math_fp_patterns. Also add a FIXME for the peephole pass not being able to handle this. llvm-svn: 363032	2019-06-11 04:30:53 +00:00
Tom Stellard	4b0b26199b	Revert CMake: Make most target symbols hidden by default This reverts r362990 (git commit `374571301d`) This was causing linker warnings on Darwin: ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)' from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol 'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&), std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings. llvm-svn: 363028	2019-06-11 03:21:13 +00:00
Matt Arsenault	c5830f5f05	AtomicExpand: Don't crash on non-0 alloca This now produces garbage on AMDGPU with a call to an nonexistent, anonymous libcall but won't assert. llvm-svn: 363022	2019-06-11 01:35:07 +00:00
Matt Arsenault	383e72fcfe	AMDGPU: Expand < 32-bit atomics Also fix AtomicExpand asserting on atomicrmw fadd/fsub. llvm-svn: 363021	2019-06-11 01:35:00 +00:00
Tom Stellard	374571301d	CMake: Make most target symbols hidden by default Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff require less memory when analyzing libLLVM.so One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 36221 nm after/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 26278 Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439 llvm-svn: 362990	2019-06-10 22:12:56 +00:00
Jinsong Ji	9c7f93e914	[PowerPC][HTM]Fix $zero is not a GPRC register for builtin_ttest This was found during HTM cleanup. Adding a test for builtin_ttest would expose following issue. * Bad machine code: Illegal physical register for instruction * - function: test10 - basic block: %bb.0 entry (0xf0e57497b58) - instruction: %5:crrc0 = TABORTWCI 0, $zero, 0 - operand 2: $zero $zero is not a GPRC register. LLVM ERROR: Found 1 machine code errors. Differential Revision: https://reviews.llvm.org/D63079 llvm-svn: 362974	2019-06-10 19:04:14 +00:00
Piotr Sobczak	9b11e93d90	[AMDGPU] Optimize image_[load\|store]_mip Summary: Replace image_load_mip/image_store_mip with image_load/image_store if lod is 0. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63073 llvm-svn: 362957	2019-06-10 15:58:51 +00:00
Simon Tatham	67065c5c70	Revert rL362953 and its followup rL362955. These caused a build failure because I managed not to notice they depended on a later unpushed commit in my current stack. Sorry about that. llvm-svn: 362956	2019-06-10 15:58:19 +00:00
Simon Tatham	42078d41d5	[ARM] Add the non-MVE instructions in Arm v8.1-M. This should have been part of r362953, but I had a finger-trouble incident and committed the old rather than new version of the patch. Sorry. llvm-svn: 362955	2019-06-10 15:41:58 +00:00
Simon Tatham	baeea91933	[ARM] Add the non-MVE instructions in Arm v8.1-M. This adds support for the new family of conditional selection / increment / negation instructions; the low-overhead branch instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole list of registers at once; the new VMRS/VMSR and VLDR/VSTR instructions to get data in and out of 8.1-M system registers, particularly including the new VPR register used by MVE vector predication. To support this, we also add a register name 'zr' (used by the CSEL family to force one of the inputs to the constant 0), and operand types for lists of registers that are also allowed to include APSR or VPR (used by CLRM). The VLDR/VSTR instructions also need some new addressing modes. The low-overhead branch instructions exist in their own separate architecture extension, which we treat as enabled by default, but you can say -mattr=-lob or equivalent to turn it off. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Reviewed By: samparker Subscribers: miyuki, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62667 llvm-svn: 362953	2019-06-10 15:36:34 +00:00
Simon Tatham	b87669f166	[ARM] Disallow PC, and optionally SP, in VMOVRH and VMOVHR. Arm v8.1-M supports the VMOV instructions that move a half-precision value to and from a GPR, but not if the GPR is SP or PC. To fix this, I've changed those instructions to use the rGPR register class instead of GPR. rGPR always excludes PC, and it excludes SP except in the presence of the HasV8Ops target feature (i.e. Arm v8-A). So the effect is that VMOV.F16 to and from PC is now illegal everywhere, but VMOV.F16 to and from SP is illegal only on non-v8-A cores (which I believe is all as it should be). Reviewers: dmgreen, samparker, SjoerdMeijer, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60704 llvm-svn: 362942	2019-06-10 14:43:55 +00:00
David Green	d847aa573b	[ARM] Enable Unroll UpperBound This option allows loops with small max trip counts to be fully unrolled. This can help with code like the remainder loops from manually unrolled loops like those that appear in the cmsis dsp library. We would apparently previously runtime unroll them with the default unroll count (4). Differential Revision: https://reviews.llvm.org/D63064 llvm-svn: 362928	2019-06-10 10:22:14 +00:00
Craig Topper	9000a72a4b	[X86] When promoting i16 compare with immediate to i32, try to use sign_extend for eq/ne if the input is truncated from a type with enough sign its. Summary: Our default behavior is to use sign_extend for signed comparisons and zero_extend for everything else. But for equality we have the freedom to use either extension. If we can prove the input has been truncated from something with enough sign bits, we can use sign_extend instead and let DAG combine optimize it out. A similar rule is used by type legalization in LegalizeIntegerTypes. This gets rid of the movzx in PR42189. The immediate will still take 4 bytes instead of the 2 bytes plus 0x66 prefix a cmp di, 32767 would get, but it avoids a length changing prefix. Reviewers: RKSimon, spatel, xbolva00 Reviewed By: xbolva00 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63032 llvm-svn: 362920	2019-06-10 04:50:12 +00:00
Craig Topper	ceb807bbbc	[X86] Disable f32->f64 extload when sse2 is enabled Summary: We can only use the memory form of cvtss2sd under optsize due to a partial register update. So previously we were emitting 2 instructions for extload when optimizing for speed. Also due to a late optimization in preprocessiseldag we had to handle (fpextend (loadf32)) under optsize. This patch forces extload to expand so that it will always be in the (fpextend (loadf32)) form during isel. And when optimizing for speed we can just let each of those pieces select an instruction independently. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62710 llvm-svn: 362919	2019-06-10 04:37:16 +00:00
Craig Topper	dd10099d5c	[X86] Use EVEX instructions for f128 FAND/FOR/FXOR when avx512vl is enabled. llvm-svn: 362915	2019-06-10 01:18:55 +00:00
Craig Topper	f7ba8b808a	[X86] Convert f32/f64 FANDN/FAND/FOR/FXOR to vector logic ops and scalar_to_vector/extract_vector_elts to reduce isel patterns. Previously we did the equivalent operation in isel patterns with COPY_TO_REGCLASS operations to transition. By inserting scalar_to_vetors and extract_vector_elts before isel we can allow each piece to be selected individually and accomplish the same final result. I ideally we'd use vector operations earlier in lowering/combine, but that looks to be more difficult. The scalar-fp-to-i64.ll changes are because we have a pattern for using movlpd for store+extract_vector_elt. While an f64 store uses movsd. The encoding sizes are the same. llvm-svn: 362914	2019-06-10 00:41:07 +00:00
Jatin Bhateja	2a30aeb010	[X86] NFCI : Comment updation for EVEX to VEX translation. Reviewers: llvm-commits, jbhateja Reviewed By: jbhateja Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63055 llvm-svn: 362898	2019-06-09 09:59:26 +00:00
Amara Emerson	0d20969dea	[AArch64][GlobalISel] Select immediate forms of cmp instructions. A simple re-use of the immediate operand matcher and renderer functions. rdar://43795178 llvm-svn: 362896	2019-06-09 07:31:25 +00:00
Craig Topper	2ba0e2518b	[X86] Remove (store (f32 (extractelt (v4f32))) isel patterns which is redundant. We emit a MOVSSmr and a COPY_TO_REGCLASS, but that's what we would get from selecting the store and extractelt independently. llvm-svn: 362895	2019-06-09 03:21:33 +00:00
Craig Topper	7d8494c41c	[X86] Mutate scalar fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE during PreProcessIselDAG to cut down on number of isel patterns. Similar was done for vectors in r362535. Removes about 1200 bytes from the isel table. llvm-svn: 362894	2019-06-08 23:53:31 +00:00
David Green	c5471c2a57	[ARM] Adjust isLegalT1AddressImmediate for non-legal types Types such as float and i64's do not have legal loads in Thumb1, but will still be loaded with a LDR (or potentially multiple LDR's). As such we can treat the cost of addressing mode calculations the same as an i32 and get some optimisation benefits. Differential Revision: https://reviews.llvm.org/D62968 llvm-svn: 362874	2019-06-08 10:32:53 +00:00
David Green	342d1b81a3	[ARM] Add MVE addressing to isLegalT2AddressImmediate Now with MVE being added, we can add the vector addressing mode costs for it. These are generally imm7 multiplied by the size of the type being loaded / stored. Differential Revision: https://reviews.llvm.org/D62967 llvm-svn: 362873	2019-06-08 10:18:23 +00:00
David Green	4ecce205d5	[ARM] Add fp16 addressing to isLegalT2AddressImmediate The fp16 version of VLDR takes a imm8 multiplied by 2. This updates the costs to account for those, and adds extra testing. It is dependant upon hasFPRegs16 as this is what the load/store instructions require. Differential Revision: https://reviews.llvm.org/D62966 llvm-svn: 362872	2019-06-08 10:09:02 +00:00
David Green	10fbaa96c5	[ARM] Add HasNEON for all Neon patterns in ARMInstrNEON.td. NFCI We are starting to add an entirely separate vector architecture to the ARM backend. To do that we need at least some separation between the existing NEON and the new MVE code. This patch just goes through the Neon patterns and ensures that they are predicated on HasNEON, giving MVE a stable place to start from. No tests yet as this is largely an NFC, and we don't have the other target that will treat any of these intructions as legal. Differential Revision: https://reviews.llvm.org/D62945 llvm-svn: 362870	2019-06-08 09:36:49 +00:00
Jonas Paulsson	bca56ab073	[SystemZ] Fix CMakeLists.txt for alphabetical order (NFC). llvm-svn: 362869	2019-06-08 06:42:02 +00:00
Jonas Paulsson	fdc4ea34e3	[SystemZ, RegAlloc] Favor 3-address instructions during instruction selection. This patch aims to reduce spilling and register moves by using the 3-address versions of instructions per default instead of the 2-address equivalent ones. It seems that both spilling and register moves are improved noticeably generally. Regalloc hints are passed to increase conversions to 2-address instructions which are done in SystemZShortenInst.cpp (after regalloc). Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are the same), foldMemoryOperandImpl() can no longer trivially fold a spilled source register since the reg/reg instruction is now 3-address. In order to remedy this, new 3-address pseudo memory instructions are used to perform the folding only when the dst and lhs virtual registers are known to be allocated to the same physreg. In order to not let MachineCopyPropagation run and change registers on these transformed instructions (making it 3-address), a new target pass called SystemZPostRewrite.cpp is run just after VirtRegRewriter, that immediately lowers the pseudo to a target instruction. If it would have been possibe to insert a COPY instruction and change a register operand (convert to 2-address) in foldMemoryOperandImpl() while trusting that the caller (e.g. InlineSpiller) would update/repair the involved LiveIntervals, the solution involving pseudo instructions would not have been needed. This is perhaps a potential improvement (see Phabricator post). Common code changes: * A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a target pass immediately before MachineCopyPropagation. * VirtRegMap is passed as an argument to foldMemoryOperand(). Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D60888 llvm-svn: 362868	2019-06-08 06:19:15 +00:00
Matt Arsenault	ddd2c9ac86	AMDGPU: Force skips around traps llvm-svn: 362852	2019-06-07 23:02:52 +00:00
Craig Topper	bd03230cb0	[X86] Remove unnecessary new line escape from the end of a macro. NFC llvm-svn: 362837	2019-06-07 20:30:40 +00:00
Sanjay Patel	6880bceda2	[x86] narrow extract subvector of vector select This is a potentially large perf win for AVX1 targets because of the way we auto-vectorize to 256-bit but then expect the backend to legalize/optimize for the half-implemented AVX1 ISA. On the motivating example from PR37428 (even though this patch doesn't solve the vector shift issue): https://bugs.llvm.org/show_bug.cgi?id=37428 ...there's a 16% speedup when compiling with "-mavx" (perf tested on Haswell) because we eliminate the remaining 256-bit vblendv ops. I added comments on a couple of tests that require further work. If we have 256-bit logic ops separating the vselect and extract, we should probably narrow everything to 128-bit, but that requires a larger pattern match. Differential Revision: https://reviews.llvm.org/D62969 llvm-svn: 362797	2019-06-07 13:17:46 +00:00
Sam Elliott	f720647ddd	[RISCV] Support Bit-Preserving FP in F/D Extensions Summary: This allows some integer bitwise operations to instead be performed by hardware fp instructions. This is correct because the RISC-V spec requires the F and D extensions to use the IEEE-754 standard representation, and fp register loads and stores to be bit-preserving. This is tested against the soft-float ABI, but with hardware float extensions enabled, so that the tests also ensure the optimisation also fires in this case. Reviewers: asb, luismarques Reviewed By: asb Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62900 llvm-svn: 362790	2019-06-07 12:20:14 +00:00
Valery Pykhtin	cb8de55f47	[AMDGPU] Constrain the AMDGPU inliner on maximum number of basic blocks in a caller function (compile time performance) Differential revision: https://reviews.llvm.org/D62917 llvm-svn: 362789	2019-06-07 12:16:46 +00:00
Cullen Rhodes	1f0d251244	[AArch64][AsmParser] error on unexpected SVE predicate type suffix Summary: This patch fixes a bug in the assembler that permitted a type suffix on predicate registers when not expected. For instance, the following was previously valid: faddv h0, p0.q, z1.h This bug was present in all SVE instructions containing predicates with no type suffix and no predication form qualifier, i.e. /z or /m. The latter instructions are already caught with an appropiate error message by the assembler, e.g.: .text <stdin>:1:13: error: not expecting size suffix cmpne p1.s, p0.b/z, z2.s, 0 ^ A similar issue for SVE vector registers was fixed in: https://reviews.llvm.org/D59636 Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62942 llvm-svn: 362780	2019-06-07 08:46:56 +00:00
Cullen Rhodes	f730548484	[AArch64][AsmParser] Provide better diagnostics for SVE predicates Patch by Sander de Smalen (sdesmalen) Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62941 llvm-svn: 362779	2019-06-07 08:37:00 +00:00
Pengfei Wang	f8b28931a7	[X86] -march=cooperlake (llvm) Support intel -march=cooperlake in llvm Patch by Shengchen Kan (skan) Differential Revision: https://reviews.llvm.org/D62836 llvm-svn: 362776	2019-06-07 08:31:35 +00:00
Sam Parker	c5ef502ee8	[CodeGen] Generic Hardware Loop Support Patch which introduces a target-independent framework for generating hardware loops at the IR level. Most of the code has been taken from PowerPC CTRLoops and PowerPC has been ported over to use this generic pass. The target dependent parts have been moved into TargetTransformInfo, via isHardwareLoopProfitable, with HardwareLoopInfo introduced to transfer information from the backend. Three generic intrinsics have been introduced: - void @llvm.set_loop_iterations Takes as a single operand, the number of iterations to be executed. - i1 @llvm.loop_decrement(anyint) Takes the maximum number of elements processed in an iteration of the loop body and subtracts this from the total count. Returns false when the loop should exit. - anyint @llvm.loop_decrement_reg(anyint, anyint) Takes the number of elements remaining to be processed as well as the maximum numbe of elements processed in an iteration of the loop body. Returns the updated number of elements remaining. llvm-svn: 362774	2019-06-07 07:35:30 +00:00
Dylan McKay	04b418f246	[AVR] Expand 16-bit rotations during the legalization stage In r356860, the legalization logic for BSWAP was modified to ISD::ROTL, rather than the old ISD::{SHL, SRL, OR} nodes. This works fine on AVR for 8-bit rotations, but 16-bit rotations are currently unimplemented - they always trigger an assertion error in the AVRExpandPseudoInsts pass ("RORW unimplemented"). This patch instructions the legalizer to expand 16-bit rotations into the previous SHL, SRL, OR pattern it did previously. This fixes the 'issue-cannot-select-bswap.ll' test. Interestingly, this test failure seems flaky - it passes successfully on the avr-build-01 buildbot, but fails locally on my Arch Linux install. llvm-svn: 362773	2019-06-07 06:55:00 +00:00
Matt Arsenault	c0edb8f5cf	AMDGPU: Don't count mask branch pseudo towards skip threshold llvm-svn: 362761	2019-06-07 00:14:55 +00:00
Matt Arsenault	99ee81b183	AMDGPU: Insert skips for blocks with FLAT This already forced a skip for VMEM, so it should also be done for flat. I'm somewhat skeptical about the benefit of this though. llvm-svn: 362760	2019-06-07 00:14:45 +00:00
Nemanja Ivanovic	ef4a3aa549	[PowerPC] Exploit the vector min/max instructions Use the PPC vector min/max instructions for computing the corresponding operation as these should be faster than the compare/select sequences we currently emit. Differential revision: https://reviews.llvm.org/D47332 llvm-svn: 362759	2019-06-06 23:49:01 +00:00
Matt Arsenault	b6cfa129cc	AMDGPU: Insert skip branches over return blocks SIInsertSkips really doesn't understand the control flow, and makes very stupid assumptions about the block layout. This was able to get away with not skipping return blocks, since usually after structurization there is only one placed at the end of the function. Tail duplication can break this assumption. llvm-svn: 362754	2019-06-06 22:51:51 +00:00
Alexander Timofeev	37bd9bd137	[AMDGPU] Partial revert for the `ba447bae74` "Divergence driven ISel. Assign register class for cross block values according to the divergence." that discovered the design flaw leading to several issues that required to be solved before. This change reverts AMDGPU specific changes and keeps common part unaffected. llvm-svn: 362749	2019-06-06 21:13:02 +00:00
Craig Topper	f320f26716	[X86] Make a bunch of merge masked binops commutable for loading folding. This primarily affects add/fadd/mul/fmul/and/or/xor/pmuludq/pmuldq/max/min/fmaxc/fminc/pmaddwd/pavg. We already commuted the unmasked and zero masked versions. I've added 512-bit stack folding tests for most of the instructions affected. I've tested needing commuting and not commuting across unmasked, merged masked, and zero masked. The 128/256 bit instructions should behave similarly. llvm-svn: 362746	2019-06-06 21:00:04 +00:00
Jason Liu	60ec248148	[AIX] Implement function descriptor on SDAG Summary: (1) Function descriptor on AIX On AIX, a called routine may have 2 distinct symbols associated with it: * A function descriptor (Name) * A function entry point (.Name) The descriptor structure on AIX is the same as those in the ELF V1 ABI: * The address of the entry point of the function. * The TOC base address for the function. * The environment pointer. The descriptor symbol uses the same name as the source level function in C. The function entry point is analogous to the symbol we would generate for a function in a non-descriptor-based ABI, except that it is renamed by prepending a ".". Which symbol gets referenced depends on the context: * Taking the address of the function references the descriptor symbol. * Calling the function references the entry point symbol. (2) Speaking of implementation on AIX, for direct function call target, we create proper MCSymbol SDNode(e.g . ".foo") while constructing SDAG to replace original TargetGlobalAddress SDNode. Then down the path, we can take advantage of this MCSymbol. Patch by: Xiangling_L Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara Differential Revision: https://reviews.llvm.org/D62532 llvm-svn: 362735	2019-06-06 19:13:36 +00:00
Dmitri Gribenko	5438cc6910	Remove unused PPC.h includes under llvm/lib/Target/PowerPC. llvm-svn: 362718	2019-06-06 16:47:06 +00:00
Craig Topper	6b67dfa54c	[X86] Make masked floating point equality/ordered compares commutable for load folding purposes. Same as what is supported for the unmasked form. llvm-svn: 362717	2019-06-06 16:39:04 +00:00
Jason Liu	0338b88861	[AIX] Implement call lowering with parameters could pass onto GPRs Summary: This patch implements SDAG call lowering on AIX for functions which only have parameters that could fit into GPRs. Reviewers: hubert.reinterpretcast, syzaara Differential Revision: https://reviews.llvm.org/D62823 llvm-svn: 362708	2019-06-06 14:36:43 +00:00
Adhemerval Zanella	559e69a821	AArch64] Handle ISD::LRINT and ISD::LLRINT for float16 This patch is a follow up for D62018 to add lrint/llrint support for float16. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62863 llvm-svn: 362700	2019-06-06 12:38:11 +00:00
Adhemerval Zanella	bce9e11a7b	[AArch64] Handle ISD::LROUND and ISD::LLROUND for float16 This patch is a follow up for D61391 to add lround/llround support for float16. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62861 llvm-svn: 362698	2019-06-06 11:53:26 +00:00
Dmitri Gribenko	8c2c072582	Include what you use in LanaiAsmParser.cpp llvm-svn: 362696	2019-06-06 10:37:06 +00:00
Petar Avramovic	81132ce0e9	[MIPS GlobalISel] Select sqrt Select G_FSQRT for MIPS32. Differential Revision: https://reviews.llvm.org/D62905 llvm-svn: 362692	2019-06-06 10:00:41 +00:00
Petar Avramovic	0a1fd355b2	[MIPS GlobalISel] Select fabs Select G_FABS for MIPS32. Differential Revision: https://reviews.llvm.org/D62903 llvm-svn: 362690	2019-06-06 09:22:37 +00:00
Petar Avramovic	a7d0006447	[MIPS GlobalISel] Select fpext and fptrunc Select G_FPEXT and G_FPTRUNC for MIPS32. Differential Revision: https://reviews.llvm.org/D62902 llvm-svn: 362689	2019-06-06 09:16:58 +00:00
Petar Avramovic	faaa2b5d21	[MIPS GlobalISel] Select floor and ceil Select G_FFLOOR and G_FCEIL for MIPS32. Differential Revision: https://reviews.llvm.org/D62901 llvm-svn: 362688	2019-06-06 09:02:24 +00:00
Amara Emerson	d3144a4abc	[AArch64][GlobalISel] Add manual selection support for G_ZEXTLOADs to s64. We already get support for G_ZEXTLOAD to s32 from the importer, but it can't deal with the SUBREG_TO_REG in the pattern. Tweaking the existing manual selection code for G_LOAD to handle an additional SUBREG_TO_REG when dealing with G_ZEXTLOAD isn't much work. Also add tests to check the imported pattern selections to s32 work. llvm-svn: 362681	2019-06-06 07:58:37 +00:00
Amara Emerson	d940e20051	[AArch64][GlobalISel] Add the new changes to fix PR42129 that were supposed to go into r362666. The changes weren't staged so ended up just re-commiting the unmodified reverted change. llvm-svn: 362677	2019-06-06 07:33:47 +00:00
Craig Topper	9226ba6b37	[X86] Don't turn avx masked.load with constant mask into masked.load+vselect when passthru value is all zeroes. This is intended to enable the use of an immediate blend or more optimal instruction. But if the passthru is zero we don't need any additional instructions. llvm-svn: 362675	2019-06-06 05:41:27 +00:00
Amara Emerson	c37ff0d138	Revert "Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp"" When looking through copies, make sure to not try to find the vreg def of a physreg. Normally getVRegDef will return nullptr in this case, but if there happens to be multiple defs then it will assert. This fixes PR42129. llvm-svn: 362666	2019-06-05 23:46:16 +00:00
Matt Arsenault	34c8b835b1	AMDGPU: Don't fix emergency stack slot at offset 0 This forced the caller to be aware of this, which is an ugly ABI feature. Partially reverts r295877. The original reasons for doing this are mostly fixed. Alloca is now in a non-0 address space, so it should be OK to have 0 as a valid pointer. Since we treat the absolute address as the pointer value, this part only really needed to apply to kernels. Since r357093, we avoid the need to increment/decrement the offset register in more cases, and since r354816 the scavenger can fail without spilling, so it's less critical that we try to avoid an offset that fits in the MUBUF offset. Restrict to callable functions for now to split this into 2 steps to limit thte number of test updates and in case anything breaks. llvm-svn: 362665	2019-06-05 22:37:50 +00:00
Ulrich Weigand	6c5d5ce551	Allow target to handle STRICT floating-point nodes The ISD::STRICT_ nodes used to implement the constrained floating-point intrinsics are currently never passed to the target back-end, which makes it impossible to handle them correctly (e.g. mark instructions are depending on a floating-point status and control register, or mark instructions as possibly trapping). This patch allows the target to use setOperationAction to switch the action on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code will stop converting the STRICT nodes to regular floating-point nodes, but instead pass the STRICT nodes to the target using normal SelectionDAG matching rules. To avoid having the back-end duplicate all the floating-point instruction patterns to handle both strict and non-strict variants, we make the MI codegen explicitly aware of the floating-point exceptions by introducing two new concepts: - A new MCID flag "mayRaiseFPException" that the target should set on any instruction that possibly can raise FP exception according to the architecture definition. - A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI instruction resulting from expansion of any constrained FP intrinsic. Any MI instruction that is both marked as mayRaiseFPException and FPExcept then needs to be considered as raising exceptions by MI-level codegen (e.g. scheduling). Setting those two new flags is straightforward. The mayRaiseFPException flag is simply set via TableGen by marking all relevant instruction patterns in the .td files. The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes in the SelectionDAG, and gets inherited in the MachineSDNode nodes created from it during instruction selection. The flag is then transfered to an MIFlag when creating the MI from the MachineSDNode. This is handled just like fast-math flags like no-nans are handled today. This patch includes both common code changes required to implement the new features, and the SystemZ implementation. Reviewed By: andrew.w.kaylor Differential Revision: https://reviews.llvm.org/D55506 llvm-svn: 362663	2019-06-05 22:33:10 +00:00
Petr Hosek	2f94203e23	Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp" This reverts commit r362435 as this triggers ICE, see PR42129 for details. llvm-svn: 362662	2019-06-05 22:27:31 +00:00
Matt Arsenault	b812b7a45e	AMDGPU: Invert frame index offset interpretation Since the beginning, the offset of a frame index has been consistently interpreted backwards. It was treating it as an offset from the scratch wave offset register as a frame register. The correct interpretation is the offset from the SP on entry to the function, before the prolog. Frame index elimination then should select either SP or another register as an FP. Treat the scratch wave offset on kernel entry as the pre-incremented SP. Rely more heavily on the standard hasFP and frame pointer elimination logic, and clean up the private reservation code. This saves a copy in most callee functions. The kernel prolog emission code is still kind of a mess relying on checking the uses of physical registers, which I would prefer to eliminate. Currently selection directly emits MUBUF instructions, which require using a reference to some register. Use the register chosen for SP, and then ignore this later. This should probably be cleaned up to use pseudos that don't refer to any specific base register until frame index elimination. Add a workaround for shaders using large numbers of SGPRs. I'm not sure these cases were ever working correctly, since as far as I can tell the logic for figuring out which SGPR is the scratch wave offset doesn't match up with the shader input initialization in the shader programming guide. llvm-svn: 362661	2019-06-05 22:20:47 +00:00
Craig Topper	3975b15dba	[X86] Fix mistake that marked VADDSSrrb_Int/VADDSDrrb_Int/VMULSSrrb_Int/VMULSDrrb_Int as commutable. One of the sources controls the pass through value for the upper bits of the result so we can't really commute it. In practice this problem isn't a functional issue because we would only try to commute this instruction in order to fold a load. But we can't do embedded rounding and fold a load at the same time. So the load fold would never succeed so I don't think we would ever commute or at least keep the version after commuting. llvm-svn: 362647	2019-06-05 21:00:31 +00:00
Matt Arsenault	4fb580c314	AMDGPU: Remove amdgpu-max-work-group-size attribute This has been deprecated for a long time, and mesa recently switched to amdgpu-flat-work-group-size. llvm-svn: 362641	2019-06-05 20:32:32 +00:00
Matt Arsenault	0f8a764e8f	AMDGPU: Fix using 2 different enums for same operand flags These enums are really for the same namespace of flags set on arbitrary MachineOperands, so merge them to avoid value collisions. llvm-svn: 362640	2019-06-05 20:32:25 +00:00
Dan Gohman	53572d0470	[WebAssembly] Limit PIC support to the Emscripten target The current PIC support currently only works with Emscripten, so disable it for other targets. This is the PIC portion of https://reviews.llvm.org/D62542. Reviewed By: dschuff, sbc100 llvm-svn: 362638	2019-06-05 20:01:01 +00:00
Craig Topper	d0fff89b81	[X86] Add the vector integer min/max instructions to isAssociativeAndCommutative. As far as I know these should be freely reassociatable just like the floating point MAXC/MINC instructions. The reduce test changes are largely regressions and caused by the "generic" CPU we default to not having a scheduler model. The machine-combiner-int-vec.ll test shows the positive benefits of this change. Differential Revision: https://reviews.llvm.org/D62787 llvm-svn: 362629	2019-06-05 18:25:09 +00:00
Sanjay Patel	2bf82879bd	[x86] split more 256-bit stores of concatenated vectors As suggested in D62498 - collectConcatOps() matches both concat_vectors and insert_subvector patterns, and we see more test improvements by using the more general match. llvm-svn: 362620	2019-06-05 16:40:57 +00:00
Simon Pilgrim	de586bd1fd	[X86][AVX] Generalize split256BitStore to splitVectorStore. NFCI. Enables us to use this to split 512-bit vectors in future patches. llvm-svn: 362617	2019-06-05 16:14:14 +00:00
Petar Avramovic	22e99c434f	[MIPS GlobalISel] Select fcmp Select floating point compare for MIPS32. Differential Revision: https://reviews.llvm.org/D62721 llvm-svn: 362603	2019-06-05 14:03:13 +00:00
Simon Pilgrim	886a55eaa0	[X86][AVX] combineX86ShuffleChain - combine shuffle(extractsubvector(x),extractsubvector(y)) We already handle the case where we combine shuffle(extractsubvector(x),extractsubvector(x)), this relaxes the requirement to permit different sources as long as they have the same value type. This causes a couple of cases where the VPERMV3 binary shuffles occur at a wider width than before, which I intend to improve in future commits - but as only the subvector's mask indices are defined, these will broadcast so we don't see any increase in constant size. llvm-svn: 362599	2019-06-05 12:56:53 +00:00
Dmitri Gribenko	6fc4c1cc54	Include what you use in PPCFrameLowering.h llvm-svn: 362590	2019-06-05 08:58:00 +00:00
Nemanja Ivanovic	7c842fadf1	[PowerPC] Collapse RLDICL/RLDICR into RLDIC when possible Generally speaking, we lower to an optimal rotate sequence for nodes visible in the SDAG. However, there are instances where the two rotates are not visible at ISEL time - most notably those in a very common sequence when lowering switch statements to jump tables. A common situation is a switch on a 32-bit integer. This value has to have the upper 32 bits cleared and because jump table offsets are word offsets, the value needs to be shifted left by 2 bits. We currently emit the clear and the left shift as two separate instructions, but this is not needed as we can lower it to a single RLDIC. This patch just cleans that up. Differential revision: https://reviews.llvm.org/D60402 llvm-svn: 362576	2019-06-05 02:36:40 +00:00
Craig Topper	78fdce25a1	[X86] Cleanup convertIntLogicToFPLogic a little. NFCI -Use early returns to reduce indentation -Replace multipe ifs with a switch. -Replace an assert with an llvm_unreachable default in the switch. -Check that the FP type we're going to use for the X86ISD::FAND/FOR/FXOR is legal rather than checking that the integer type matches the width of a legal scalar fp type. This all runs after legalization so it shouldn't really matter, but making sure we're using a valid type in the X86ISD node is really whats important. llvm-svn: 362565	2019-06-05 01:00:34 +00:00
Amara Emerson	2d37cb82f0	[AArch64][GlobalISel] Make extloads to i64 legal. Although we had the support in the prelegalizer combiner to generate the G_SEXTLOAD or G_ZEXTLOAD ops, the legalizer definitions for arm64 had them as lowering back to separate ops. llvm-svn: 362553	2019-06-04 21:51:34 +00:00
Thomas Lively	3d9ca00e74	[WebAssembly] Fix ISel crash on sext_inreg/extract type mismatch Summary: Adjusts the index and adds a bitcast around the vector operand of EXTRACT_VECTOR_ELT so that its lane type matches the source type of its parent sext_inreg. Without this bitcast the ISel patterns do not match and ISel fails. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62646 llvm-svn: 362547	2019-06-04 21:08:20 +00:00
Craig Topper	137de38009	[X86] Mutate fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE during PreProcessIselDAG to cut down on pattern permutations We already need to have patterns for X86ISD::RNDSCALE to support software intrinsics. But we currently have 5 sets of patterns for the 5 rounding operations. For of these 6 patterns we have to support 3 vectors widths, 2 element sizes, sse/vex/evex encodings, load folding, and broadcast load folding. This results in a fair amount of bytes in the isel table. This patch adds code to PreProcessIselDAG to morph the fceil/ffloor/ftrunc/fnearbyint/frint to X86ISD::RNDSCALE. This way we can remove everything, but the intrinsic pattern while still allowing the operations to be considered Legal for DAGCombine and Legalization. This shrinks the DAGISel by somewhere between 9K and 10K. There is one complication to this, the STRICT versions of these nodes are currently mutated to their none strict equivalents at isel time when the node is visited. This won't be true in the future since that loses the chain ordering information. For now I've also added support for the non-STRICT nodes to Select so we can change the STRICT versions there after they've been mutated to their non-STRICT versions. We'll probably need a STRICT version of RNDSCALE or something to handle this in the future. Which will take us back to needing 2 sets of patterns for strict and non-strict, but that's still better than the 11 or 12 sets of patterns we'd need. We can probably do something similar for scalar, but I haven't looked at it yet. Differential Revision: https://reviews.llvm.org/D62757 llvm-svn: 362535	2019-06-04 18:03:07 +00:00
Benjamin Kramer	03ff1b3c30	[X86] Fold single-use variable into assert. NFC. Avoids an unused variable warning in Release builds. llvm-svn: 362534	2019-06-04 18:01:07 +00:00
Sanjay Patel	606eb2367f	[x86] split 256-bit store of concatenated vectors This shows up as a side issue to the main problem for the AVX target example from PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3 But as we can see in the pile of existing test diffs, it's actually a widespread problem that affects any AVX or later target. Apart from a couple of oddballs, I think these are all improvements for the reasons stated in the code comment: we do not want to enable YMM unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit stores anyway. We could say that MergeConsecutiveStores() is going overboard on some of these examples, but that won't solve the problem completely. But that is a reason I'm proposing this as a lowering rather than a combine: we will infinite loop fighting the merge code if we try this earlier. Differential Revision: https://reviews.llvm.org/D62498 llvm-svn: 362524	2019-06-04 16:40:04 +00:00
Peter Smith	f15e3d856f	[AArch64][ELF] Add support for PLT decoding with BTI instructions present Arm Architecture v8.5a introduces Branch Target Identification (BTI). When enabled all indirect branches must target a bti instruction of the appropriate form. As PLT sequences may sometimes be the target of an indirect branch and PLT[0] always is, a static linker may need to generate PLT sequences that contain "bti c" as the first instruction. In effect: bti c adrp x16, page offset to .got.plt ... Instead of: adrp x16, page offset to .got.plt ... At present the PLT decoding assumes the adrp will always be the first instruction. This patch adds support for a single "bti c" to prefix it. A test binary has been uploaded with such a PLT sequence. A forthcoming LLD patch will make heavy use of the PLT decoding code. Differential Revision: https://reviews.llvm.org/D62598 llvm-svn: 362523	2019-06-04 16:35:40 +00:00
Jinsong Ji	3144d7a2da	[PowerPC] P9 Scheduling Model: dispatching rule fixes This is to address some of the problems in existing P9 resource modeling, especially about the dispatching rules. Instead of using a hypothetical DISPATCHER , we try to use the number of actual dispatch slots, and define SchedWriteRes to model dispatch rules, then update instruction classes according to dispatch rules. All the dispatch rules and instruction classes update are made according to POWER9 User Manual. Differential Revision: https://reviews.llvm.org/D61873 llvm-svn: 362509	2019-06-04 15:22:23 +00:00
Sanjay Patel	1e63dd0b44	[SelectionDAG][x86] limit post-legalization store merging by type The proposal in D62498 showed that x86 would benefit from vector store splitting, but that may conflict with the generic DAG combiner's store merging transforms. Add memory type to the existing TLI hook that enables the merging transforms, so we can limit those changes to scalars only for x86. llvm-svn: 362507	2019-06-04 15:15:59 +00:00
Simon Pilgrim	a6e289e9f8	[X86][SSE] Pulled out (sub (xor X, M), M) 'ConditionalNegate' out pattern match code. NFCI. As discussed on D62777 - we should be able to use this in more SSE41+ cases as well but that requires us to separate it from the OR(AND(),ANDN()) matcher. llvm-svn: 362504	2019-06-04 15:02:33 +00:00
Shawn Landden	669775f9db	[Support] make countLeadingZeros() countTrailingZeros() countLeadingOnes() and countTrailingOnes() return unsigned This matches APInt's versions of these functions, and there is no need for these to be size_t. (as well as __builtin_clzll()) Differential Revision: https://reviews.llvm.org/D60823 llvm-svn: 362503	2019-06-04 14:51:15 +00:00
Dmitri Gribenko	454fc77872	Include what you use in PPCRegisterInfo.cpp llvm-svn: 362495	2019-06-04 12:55:00 +00:00
Mikhail Maltsev	08da01b496	[ARM] Add FP16 vector insert/extract patterns This change adds two FP16 extraction and two insertion patterns (one per possible vector length). Extractions are handled by copying a Q/D register into one of VFP2 class registers, where single FP32 sub-registers can be accessed. Then the extraction of even lanes are simple sub-register extractions (because we don't care about the top parts of registers for FP16 operations). Odd lanes need an additional VMOVX instruction. Unfortunately, insertions cannot be handled in the same way, because: * There is no instruction to insert FP16 into an even lane (VINS only works with odd lanes) * The patterns for odd lanes will have a form of a DAG (not a tree), and will not be implementable in pure tablegen Because of this insertions are handled in the same way as 16-bit integer insertions (with conversions between FP registers and GPRs using VMOVHR instructions). Without these patterns the ARM backend would sometimes fail during instruction selection. This patch also adds patterns which combine: * an FP16 element extraction and a store into a single VST1 instruction * an FP16 load and insertion into a single VLD1 instruction Differential Revision: https://reviews.llvm.org/D62651 llvm-svn: 362482	2019-06-04 09:39:55 +00:00
Dmitri Gribenko	73a15d4b78	Include what you use in PPC.h llvm-svn: 362477	2019-06-04 09:16:35 +00:00
Dmitri Gribenko	067a17b51d	Include what you use in PPCMachineScheduler.cpp llvm-svn: 362476	2019-06-04 09:16:31 +00:00
Dmitri Gribenko	9d1c5ea165	Include what you use in PPCRegisterInfo.h llvm-svn: 362475	2019-06-04 09:13:08 +00:00
Simon Tatham	ac02445524	[ARM] Turn some undefined encoding bits into 0s. The family of 32-bit Thumb instruction encodings that include t2ORR, t2AND and t2EOR are all listed in the ArmARM as having (0) in bit 15. The Tablegen descriptions of those instructions listed them as ?. This change tightens that up by making them into 0 + Unpredictable. In the specific case of t2ORR, we tighten it up still further by making the zero bit mandatory. This change comes from Arm v8.1-M, in which encodings with that bit equal to 1 will now be used for different instructions. Reviewers: dmgreen, samparker, SjoerdMeijer, efriedma Reviewed By: dmgreen, efriedma Subscribers: efriedma, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60705 llvm-svn: 362470	2019-06-04 08:28:48 +00:00
Matt Arsenault	0ceda9fb5c	AMDGPU: Disable stack realignment for kernels This is something of a workaround, and the state of stack realignment controls is kind of a mess. Ideally, we would be able to specify the stack is infinitely aligned on entry to a kernel. TargetFrameLowering provides multiple controls which apply at different points. The StackRealignable field is used during SelectionDAG, and for some reason distinct from this hook. StackAlignment is a single field not dependent on the function. It would probably be better to make that dependent on the calling convention, and the maximum value for kernels. Currently this doesn't really change anything, since the frame lowering mostly does its own thing. This helps avoid regressions in a future change which will rely more heavily on hasFP. llvm-svn: 362447	2019-06-03 21:33:22 +00:00
Jessica Paquette	7500c97ce4	[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp Instead of emitting all of the test stuff for a compare when it's only used by a select, instead, just emit the compare + select. The select will use the value of NZCV correctly, so we don't need to emit all of the test instructions etc. For now, only support fp selects which use G_FCMP. Also only support condition codes which will only require one select to represent. Also add a test. Differential Revision: https://reviews.llvm.org/D62695 llvm-svn: 362446	2019-06-03 20:47:20 +00:00
Craig Topper	dcf865f0ca	[X86] Fix the pattern for merge masked vcvtps2pd. r362199 fixed it for zero masking, but not zero masking. The load folding in the peephole pass hid the bug. This patch turns off the peephole pass on the relevant test to ensure coverage. llvm-svn: 362440	2019-06-03 19:29:14 +00:00
Nemanja Ivanovic	bad43d8f49	[PowerPC] Look through copies for compare elimination We currently miss the opportunities for optmizing comparisons in the peephole optimizer if the input is the result of a COPY since we look for record-form versions of the producing instruction. This patch simply lets the optimization peek through copies. Differential revision: https://reviews.llvm.org/D59633 llvm-svn: 362438	2019-06-03 19:09:15 +00:00
Matt Arsenault	8dbeb9256c	TTI: Improve default costs for addrspacecast For some reason multiple places need to do this, and the variant the loop unroller and inliner use was not handling it. Also, introduce a new wrapper to be slightly more precise, since on AMDGPU some addrspacecasts are free, but not no-ops. llvm-svn: 362436	2019-06-03 18:41:34 +00:00
Dmitri Gribenko	26c43d0ef8	Include what you use in Lanai.h Other files were not relying on these transitive includes, so I'm submitting this change separately. llvm-svn: 362423	2019-06-03 17:02:15 +00:00
Dmitri Gribenko	b8aeaf882e	Include what you use in LanaiAsmPrinter.cpp llvm-svn: 362422	2019-06-03 17:02:07 +00:00
Dmitri Gribenko	dc136847e3	Include what you use in LanaiMemAluCombiner.cpp llvm-svn: 362421	2019-06-03 17:02:02 +00:00
Dmitri Gribenko	f4d22bd0b4	Include what you use in LanaiISelDAGToDAG.cpp llvm-svn: 362420	2019-06-03 17:01:57 +00:00
Dmitri Gribenko	179154f6b9	Include what you use in LanaiFrameLowering.{cpp,h} llvm-svn: 362419	2019-06-03 17:01:52 +00:00
Dmitri Gribenko	8e317e29da	Include what you use in LanaiRegisterInfo.cpp llvm-svn: 362416	2019-06-03 16:31:37 +00:00
Dmitri Gribenko	bedcaea99a	Include what you use in LanaiInstrInfo.cpp llvm-svn: 362408	2019-06-03 15:26:25 +00:00
Dmitri Gribenko	b3bd866c7f	Include what you use in PPCInstrInfo.h llvm-svn: 362405	2019-06-03 15:04:05 +00:00
Dmitri Gribenko	2b369f83c5	Include what you use in NVPTX.h Other files were not relying on these transitive includes, so I'm submitting this change separately. llvm-svn: 362403	2019-06-03 14:37:26 +00:00
Dmitri Gribenko	14c69fefe6	Include what you use in NVPTX.h I also fixed all other files that were including NVPTX.h and were relying on transitive includes. llvm-svn: 362402	2019-06-03 14:26:50 +00:00
Dmitry Preobrazhensky	9111f35f02	[AMDGPU][MC] Added support of SCC, VCCZ and EXECZ operands See bug 39292: https://bugs.llvm.org/show_bug.cgi?id=39292 Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D62660 llvm-svn: 362400	2019-06-03 13:51:24 +00:00
Dmitri Gribenko	7a3e4ab286	Include what you use in LanaiInstPrinter.cpp llvm-svn: 362395	2019-06-03 12:53:05 +00:00
Dmitri Gribenko	2f66316c96	Include what you use in LanaiMCCodeEmitter.cpp LanaiMCCodeEmitter.cpp was not using any APIs from Lanai.h, and was only including it for transitive dependencies. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Lanai target library and the MCTargetDesc library). llvm-svn: 362394	2019-06-03 12:42:48 +00:00
Dmitri Gribenko	c69ee63cb9	Include what you use in LanaiDisassembler.cpp llvm-svn: 362392	2019-06-03 12:37:11 +00:00
Nicolai Haehnle	edfa756f3f	AMDGPU/GFX10: V_CMPX_xxx instructions still have an omod operand Summary: Change-Id: If6ee98e4a723b643bc37254fc6ef8b3812db16da Reviewers: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62720 Change-Id: Id547ef152b2f92b24dc1c0efbf7e4467c4fb4b6e llvm-svn: 362390	2019-06-03 12:07:41 +00:00
Dmitri Gribenko	8668fc0102	Include what you use in HexagonInstPrinter.cpp HexagonInstPrinter.cpp was not using any APIs from HexagonAsmPrinter.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362389	2019-06-03 11:41:22 +00:00
Dmitri Gribenko	61b49ccb77	Include what you use in HexagonAsmPrinter.h llvm-svn: 362388	2019-06-03 11:41:18 +00:00
Dmitri Gribenko	03d1b33041	Include what you use in HexagonMCInstrInfo.cpp HexagonMCInstrInfo.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362387	2019-06-03 11:25:37 +00:00
Dmitri Gribenko	970b9f961f	Include what you use in HexagonMCCodeEmitter.cpp HexagonMCCodeEmitter.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362386	2019-06-03 11:20:53 +00:00
Dmitri Gribenko	ebe360edfa	Include what you use in HexagonMCCompound.cpp HexagonMCCompound.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362385	2019-06-03 11:20:48 +00:00
Dmitri Gribenko	6e076a081a	Include what you use in HexagonShuffler.cpp HexagonShuffler.cpp was not using any APIs from Hexagon.h, and was only including it for transitive dependencies. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362384	2019-06-03 11:14:20 +00:00
Dmitri Gribenko	6214b577b7	Include what you use in HexagonMCChecker.cpp HexagonMCChecker.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362383	2019-06-03 11:14:15 +00:00
Dmitri Gribenko	bf2a356ec0	Include what you use in HexagonMCTargetDesc.cpp HexagonMCTargetDesc.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362382	2019-06-03 11:14:10 +00:00
Dmitri Gribenko	beb7f48a29	Include what you use in HexagonMCShuffler.cpp HexagonMCShuffler.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362381	2019-06-03 11:14:05 +00:00
Dmitri Gribenko	7ebfbebfe1	Include what you use in HexagonELFObjectWriter.cpp HexagonELFObjectWriter.cpp was not using any APIs from Hexagon.h, and was only including it for transitive dependencies. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362376	2019-06-03 09:56:40 +00:00
Dmitri Gribenko	0aa374a306	Include what you use in HexagonAsmBackend.cpp HexagonAsmBackend.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362372	2019-06-03 09:43:05 +00:00
Dmitri Gribenko	301f8fd632	Include what you use in HexagonAsmParser.cpp HexagonAsmParser.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the AsmParser library). llvm-svn: 362370	2019-06-03 09:38:48 +00:00
Dmitri Gribenko	c5327ab71d	Include what you use in HexagonShuffler.h HexagonShuffler.h was not using any APIs from Hexagon.h, and was only including it for transitive dependencies. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362369	2019-06-03 09:33:48 +00:00
Dmitri Gribenko	3c837201e0	Include what you use in BPFMCTargetDesc.cpp BPFMCTargetDesc.cpp was not using any APIs from BPF.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary BPF target library and the MCTargetDesc library). llvm-svn: 362368	2019-06-03 09:29:51 +00:00
Diogo N. Sampaio	df92f84110	[ARM][FIX] Ran out of registers due tail recursion Summary: - pr42062 When compiling for MinSize, ARMTargetLowering::LowerCall decides to indirect multiple calls to a same function. However, it disconsiders the limitation that thumb1 indirect calls require the callee to be in a register from r0 to r3 (llvm limiation). If all those registers are used by arguments, the compiler dies with "error: run out of registers during register allocation". This patch tells the function IsEligibleForTailCallOptimization if we intend to perform indirect calls, as to avoid tail call optimization. Reviewers: dmgreen, efriedma Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62683 llvm-svn: 362366	2019-06-03 08:58:05 +00:00
Sam Parker	a0bd6f8a1a	[AArch64] Check for simple type in FPToUInt DAGCombiner was hitting a SimpleType assertion when trying to combine a v3f32 before type legalization. bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41916 Differential Revision: https://reviews.llvm.org/D62734 llvm-svn: 362365	2019-06-03 08:49:17 +00:00
Jim Lin	20b14dacbb	[AVR] Fix incorrect source regclass of LDWRdPtr Summary: LDWRdPtr would be expanded to ld+ldd. ldd only accepts the pointer register is Y or Z. So the register class of pointer of LDWRdPtr should be PTRDISPREGS instead of PTRREGS. Reviewers: dylanmckay Reviewed By: dylanmckay Subscribers: dylanmckay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62300 llvm-svn: 362351	2019-06-03 02:31:07 +00:00
Simon Pilgrim	8a32ca381d	[CostModel][X86] Improve masked load/store AVX1/AVX2 costs A mixture of internal tests and review of the scheduler models indicates we're overestimating the cost of a masked load, which we're estimating at 4x regular memory ops - more realistic values indicates that its closer to 2x. Masked stores costs are a lot more diverse but 8x is roughly in the middle of the range. e.g. SandyBridge defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>; defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>; defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; e.g. Btver2 defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>; defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>; defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>; defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>; Differential Revision: https://reviews.llvm.org/D61257 llvm-svn: 362338	2019-06-02 20:37:02 +00:00
Simon Pilgrim	59a8db628b	[TTI][X86] Cleanup getMaskedMemoryOpCost. NFCI. Prep work before resurrecting D61257. llvm-svn: 362335	2019-06-02 18:06:42 +00:00
Simon Pilgrim	71a39bcf68	[X86] isHorizontalBinOp - add extract_subvector(shuffle(x)) handling (PR39921) Let's us match horizontal op patterns on fast-variable-shuffle targets (Haswell etc.) llvm-svn: 362327	2019-06-02 15:47:49 +00:00
Simon Pilgrim	7a869e7036	[DAGCombine] Fold insert_subvector(bitcast(x),bitcast(y),c1) -> bitcast(insert_subvector(x,y),c2) Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize. Differential Revision: https://reviews.llvm.org/D59188 llvm-svn: 362324	2019-06-02 14:42:11 +00:00
Craig Topper	396a915c26	[X86] Add the SSE versions of PMULLW and PMULLD to isAssociativeAndCommutative. llvm-svn: 362309	2019-06-02 00:42:58 +00:00
Simon Atanasyan	25694e0084	[mips] Extend range of register indexes accepted by cfcmsa/ctcmsa The `cfcmsa` and `ctcmsa` instructions accept index of MSA control register. The MIPS64 SIMD Architecture define eight MSA control registers. But register index for `cfcmsa` and `ctcmsa` instructions might be any number in 0..31 range. If the index is greater then 7, `cfcmsa` writes zero to the destination registers and `ctcmsa` does nothing [1]. [1] MIPS Architecture for Programmers Volume IV-j: The MIPS64 SIMD Architecture Module https://www.mips.com/?do-download=the-mips64-simd-architecture-module Differential Revision: https://reviews.llvm.org/D62597 llvm-svn: 362299	2019-06-01 13:55:18 +00:00
Dylan McKay	45eb4c7e55	[AVR] Disable register coalescing to the PTRDISPREGS class If we would allow register coalescing on PTRDISPREGS class then register allocator can lock Z register to some virtual register. Larger instructions requiring a memory acces then fail during the register allocation phase since there is no available register to hold a pointer if Y register was already taken for a stack frame. This patch prevents it by keeping Z register spillable. It does it by not allowing coalescer to lock it. Original discussion on https://github.com/avr-rust/rust/issues/128. llvm-svn: 362298	2019-06-01 12:38:56 +00:00
Craig Topper	c288a19bb7	[X86] Add AVX512BF16 and AVX512VP2INTERSECT instructions to the loading folding tables. llvm-svn: 362288	2019-06-01 06:20:59 +00:00
Craig Topper	48fdb61766	[X86] Make the X86FoldTablesEmitter functional again. Fix the spacing in the output to make it easier to diff. Fix a few other formatting issues in the manual table. And remove some old FIXMEs. llvm-svn: 362287	2019-06-01 06:20:55 +00:00
Tom Tan	eb4d6142dc	[COFF, ARM64] Add CodeView register mapping CodeView has its own register map which is defined in cvconst.h. Missing this mapping before saving register to CodeView causes debugger to show incorrect value for all register based variables, like variables in register and local variables addressed by register (stack pointer + offset). This change added mapping between LLVM register and CodeView register so the correct register number will be stored to CodeView/PDB, it aso fixed the mapping from CodeView register number to register name based on current CPUType but print PDB to yaml still assumes X86 CPU and needs to be fixed. Differential Revision: https://reviews.llvm.org/D62608 llvm-svn: 362280	2019-05-31 23:43:31 +00:00
Nick Desaulniers	7fcad2f171	[PowerPC] check for INLINEASM_BR along w/ INLINEASM Summary: It looks like since INLINEASM_BR was created off of INLINEASM (r353563), a few checks for INLINEASM needed to be updated to check for either case. pr/41999 Reviewers: hfinkel Reviewed By: hfinkel Subscribers: nemanjai, hiraditya, kbarton, jsji, llvm-commits, craig.topper, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D62403 llvm-svn: 362278	2019-05-31 23:02:13 +00:00
Matt Arsenault	302eedcbfa	AMDGPU: Fix not adding ImplicitBufferPtr as a live-in Fixes missing test from r293000. llvm-svn: 362275	2019-05-31 22:47:36 +00:00
Stanislav Mekhanoshin	fbbe5230f4	[AMDGPU] Use InliningThresholdMultiplier for inline hint AMDGPU uses multiplier 9 for the inline cost. It is taken into account everywhere except for inline hint threshold. As a result we are penalizing functions with the inline hint making them less probable to be inlined than those without the hint. Defaults are 225 for a normal function and 325 for a function with an inline hint. Currently we have effective threshold 225 * 9 = 2025 for normal functions and just 325 for those with the hint. That is fixed by this patch. Differential Revision: https://reviews.llvm.org/D62707 llvm-svn: 362239	2019-05-31 16:19:26 +00:00
Guozhi Wei	c3a24e93d5	[PPC] Correctly adjust branch probability in PPCReduceCRLogicals In PPCReduceCRLogicals after splitting the original MBB into 2, the 2 impacted branches still use original branch probability. This is unreasonable. Suppose we have following code, and the probability of each successor is 50%. condc = conda \|\| condb br condc, label %target, label %fallthrough It can be transformed to following, br conda, label %target, label %newbb newbb: br condb, label %target, label %fallthrough Since each branch has a probability of 50% to each successor, the total probability to %fallthrough is 25% now, and the total probability to %target is 75%. This actually changed the original profiling data. A more reasonable probability can be set to 70% to the false side for each branch instruction, so the total probability to %fallthrough is close to 50%. This patch assumes the branch target with two incoming edges have same edge frequency and computes new probability fore each target, and keep the total probability to original targets unchanged. Differential Revision: https://reviews.llvm.org/D62430 llvm-svn: 362237	2019-05-31 16:11:17 +00:00
Cullen Rhodes	0fc3a07398	[AArch64][SVE2] Asm: support WHILE instructions Summary: Patch adds support for the following instructions: * WHILEGE, WHILEGT, WHILEHS, WHILEHI, WHILEWR, WHILERW The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62601 llvm-svn: 362215	2019-05-31 09:13:55 +00:00
Cullen Rhodes	087d1337f8	[AArch64][SVE2] Asm: support TBL/TBX instructions Summary: A three sources variant of the TBL instruction is added to the existing SVE instruction in SVE2. This is implemented with minor changes to the existing TableGen class. TBX is a new instruction with its own definition. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62600 llvm-svn: 362214	2019-05-31 09:06:53 +00:00
Cullen Rhodes	2e870011b6	[AArch64][SVE2] Asm: support SVE2 store instructions Summary: Patch adds support for the following instructions: * STNT1B, STNT1H, STNT1S, STNT1D The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62599 llvm-svn: 362213	2019-05-31 08:59:40 +00:00
Petar Avramovic	efcd3c0009	[MIPS GlobalISel] Handle position independent code Handle position independent code for MIPS32. When callee is global address, lower call will emit callee as G_GLOBAL_VALUE and add target flag if needed. Support $gp in getRegBankFromRegClass(). Select G_GLOBAL_VALUE, specially handle case when there are target flags attached by lowerCall. Differential Revision: https://reviews.llvm.org/D62589 llvm-svn: 362210	2019-05-31 08:27:06 +00:00
Petar Avramovic	9058b50fb2	[mips] Move initGlobalBaseReg to MipsFunctionInfo. NFC Move initGlobalBaseReg from MipsSEDAGToDAGISel to MipsFunctionInfo. This way functions used for handling position independent code during instruction selection, getGlobalBaseReg and initGlobalBaseReg, end up in same class. Differential Revision: https://reviews.llvm.org/D62586 llvm-svn: 362206	2019-05-31 08:15:28 +00:00
Petar Avramovic	f4a6dd28b6	[MIPS GlobalISel] Lower call for callee that is register Lower call for callee that is register for MIPS32. Register should contain callee function address. Differential Revision: https://reviews.llvm.org/D62585 llvm-svn: 362204	2019-05-31 08:06:17 +00:00
Craig Topper	31d00d80a2	[X86] Remove patterns for X86VSintToFP/X86VUintToFP+loadv4f32 to v2f64. These patterns can incorrectly narrow a volatile load from 128-bits to 64-bits. Similar to PR42079. Switch to using (v4i32 (bitcast (v2i64 (scalar_to_vector (loadi64))))) as the load pattern used in the instructions. This probably still has issues in 32-bit mode where loadi64 isn't legal. Maybe we should use VZMOVL for widened loads even when we don't need the upper bits as zeroes? llvm-svn: 362203	2019-05-31 07:38:26 +00:00
Craig Topper	b79cc5f802	[X86] Remove avx512 isel patterns for fpextend+load. Prefer to only match fp extloads instead. DAG combine will usually fold fpextend+load to an fp extload anyway. So the 256 and 512 patterns were probably unnecessary. The 128 bit pattern was special in that it looked for a v4f32 load, but then used it in an instruction that only loads 64-bits. This is bad if the load happens to be volatile. We could probably make the patterns volatile aware, but that's more work for something that's probably rare. The peephole pass might kick in and save us anyway. We might also be able to fix this with some additional DAG combines. This also adds patterns for vselect+extload to enabled masked vcvtps2pd to be used. Previously we looked for the unlikely vselect+fpextend+load. llvm-svn: 362199	2019-05-31 06:21:53 +00:00
Craig Topper	23066033a1	[X86] Correct the ins operand order for MASKPAIR16STORE to match other store instructions. This makes the 5 address operands come first. And the data operand comes last. This matches the operand order the instruction is created with. It's also the expected order in X86MCInstLower. So everything appeared to work, but the operands didn't match their declared type. Fixes a -verify-machineinstrs failure. Also remove the isel patterns from these instructions since they should only be used for stack spills and reloads. I'm not even sure what types the patterns were looking for to match. llvm-svn: 362193	2019-05-31 05:20:27 +00:00
Pengfei Wang	2e67d0c842	[X86] Add VP2INTERSECT instructions Support Intel AVX512 VP2INTERSECT instructions in llvm Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D62366 llvm-svn: 362188	2019-05-31 02:50:41 +00:00
Craig Topper	70dc2200a2	[X86] Remove result type constraints from the extloadv2f32/extloadv4f32/extloadv8f32 PatFrags. NFC The result types aren't mentioned in the pattern name so really shouldn't be in the PatFrags. The users of these either have their own type constraint or rely on the type constranit system to realize the only legal extend would be to f64. llvm-svn: 362175	2019-05-30 23:35:24 +00:00
Craig Topper	d6b74cc859	[X86] Remove code that unnecessarily sets EXTLOAD with src type of v2f32/v4f32/v8f32 as Legal for SSE2/AVX/AVX512 respectively. NFC The LoadExt table defaults to all combinations being Legal. For vector types, only src VTs with an i1 element type were ever changed. So we don't need to mark them legal manually. llvm-svn: 362170	2019-05-30 22:29:06 +00:00
Matt Arsenault	e0a4da8c0a	AMDGPU/GlobalISel: Add wave scratch offset argument Avoids crashing in PEI in a future change. llvm-svn: 362136	2019-05-30 19:33:18 +00:00
Tim Renouf	7fecdf36cc	[AMDGPU] Added target-specific attribute amdgpu-max-memory-clause With LLPC, previous investigation has suggested that si-scheduler interacts badly with SiFormMemoryClauses on an XNACK target in some games. That needs further investigation in the future. In the meantime, this commit adds a target-specific attribute to allow us to disable SIFormMemoryClauses by setting it to 1 on a per-function basis for LLPC to use. Differential Revision: https://reviews.llvm.org/D62572 Change-Id: Ia0ca12ce79093cbbe86caded723ffb13384ede92 llvm-svn: 362127	2019-05-30 18:46:34 +00:00
Sam Parker	913604a637	[NFC][ARM][ParallelDSP] Refactor narrow sequence Most of the code used for finding a 'narrow' sequence is not used, so I've removed it and simplified the calls from the smlad matcher. llvm-svn: 362104	2019-05-30 15:26:37 +00:00
Sjoerd Meijer	eb072b5a6a	[ARM] Change the MC names for VMAXNM/VMINNM Now the NEON ones have a prefix "NEON_", and the VFP ones have a prefix "VFP_". This is so that the regex in ARMScheduleA57.td can be made to match both of _those_ classes of VMAXNM without also matching the MVE ones that are going to be introduced soon. NFCI. Patch by Simon Tatham. Differential Revision: https://reviews.llvm.org/D60700 llvm-svn: 362097	2019-05-30 14:34:29 +00:00
Simon Pilgrim	5359bb4d31	[ARM] LowerVECTOR_SHUFFLE - fix uninitialized variable warnings. NFCI. llvm-svn: 362094	2019-05-30 14:01:24 +00:00
Sjoerd Meijer	930dee2c0b	[ARM] add target arch definitions for 8.1-M and MVE This adds: - LLVM subtarget features to make all the new instructions conditional on, - CPU and FPU names for use on clang's command line, with default FPUs set so that "armv8.1-m.main+fp" and "armv8.1-m.main+fp.dp" will select the right FPU features, - architecture extension names "mve" and "mve.fp", - ABI build attribute support for v8.1-M (a new value for Tag_CPU_arch) and MVE (a new actual tag). Patch mostly by Simon Tatham. Differential Revision: https://reviews.llvm.org/D60698 llvm-svn: 362090	2019-05-30 12:57:04 +00:00
Sjoerd Meijer	7eb95d672d	[ARM] Introduce separate features for FP registers The MVE extension in Arm v8.1-M permits the use of some move, load and store isntructions which access the FP registers, even if there's no actual FP support in the processor (in particular, if you have the integer-only version of MVE). Therefore, we need separate subtarget features to condition those instructions on, which are implied by both FP and MVE but are not part of either. Patch mostly by Simon Tatham. Differential Revision: https://reviews.llvm.org/D60694 llvm-svn: 362088	2019-05-30 12:37:05 +00:00
Simon Pilgrim	32aac1727a	[X86][SSE] Improve bool vector extload (PR26091) We already have good codegen for (vXiY *ext(vXi1 bitcast(iX))) cases, this patch uses it for loads of vXi1 types as well - changing the load into a iX integer load, and bitcasting so that combineToExtendBoolVectorInReg can then use it. Differential Revision: https://reviews.llvm.org/D62449 llvm-svn: 362081	2019-05-30 10:25:20 +00:00
Cullen Rhodes	7fad428931	[AArch64][SVE2] Asm: support SVE2 vector splice (constructive) Summary: The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62530 llvm-svn: 362073	2019-05-30 08:51:39 +00:00
Cullen Rhodes	ebe23041f0	[AArch64][SVE2] Asm: support SVE2 load instructions Summary: Patch adds support for the following instructions: * LDNT1SB, LDNT1B, LDNT1SH, LDNT1H, LDNT1SW, LDNT1W, LDNT1D The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62528 llvm-svn: 362072	2019-05-30 08:44:27 +00:00
Cullen Rhodes	455c529f77	[AArch64][SVE2] Asm: support FCVTX/FLOGB instructions Summary: Patch completes SVE2 support for: SVE Floating Point Unary Operations - Predicated Group The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62526 llvm-svn: 362071	2019-05-30 08:35:12 +00:00
Cullen Rhodes	028413f5ae	[AArch64][SVE2] Asm: add ext (immediate offset, constructive) instruction Summary: The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62518 llvm-svn: 362070	2019-05-30 08:25:17 +00:00
Sjoerd Meijer	5857bf5d1e	[ARM] Add an MVE execution domain MVE architecturally specifies a 'beat' system in which a vector instruction executed now will complete its actual operation over the next four cycles, so it can overlap with the execution of the previous and next MVE instruction. This makes it generally an advantage to avoid moving values back and forth between MVE registers and anywhere else, if there's any sensible way to do the same processing in whatever register type the values already occupied. That's just what the 'execution domain' system is supposed to achieve. So here we add a new execution domain which will contain all the MVE vector instructions when they are added. Patch by: Simon Tatham Differential Revision: https://reviews.llvm.org/D60703 llvm-svn: 362068	2019-05-30 08:07:06 +00:00
Pengfei Wang	1f67d94279	[X86] Add ENQCMD instructions For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference. Patch by Tianqing Wang (tianqing) Differential Revision: https://reviews.llvm.org/D62281 llvm-svn: 362053	2019-05-30 03:59:16 +00:00
Pete Couperus	d80024c687	[ARC] Cleanup ARCAsmPrinter. Summary: Remove unused getTargetStreamer. Remove unused headers. Reviewers: dantrushin Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62549 llvm-svn: 362021	2019-05-29 20:07:35 +00:00
Aakanksha Patil	d5443f8c21	AMDGPU: Return address lowering The patch computes the return address for the current function. Differential revision: https://reviews.llvm.org/D59666 llvm-svn: 362001	2019-05-29 18:20:11 +00:00
Simon Atanasyan	188162118f	[mips] Iterate over MSACtrlRegClass to reserve all MSA control registers. NFC llvm-svn: 361965	2019-05-29 14:58:56 +00:00
Simon Atanasyan	af7bf2f687	[mips] Use range-based for loops. NFC llvm-svn: 361964	2019-05-29 14:58:50 +00:00
Sjoerd Meijer	24c5629625	[ARM] Split predicates out into their own .td file The new ARMPredicates.td is included from ARM.td, early enough that the predicate definitions are already in scope when ARMSchedule.td is included. This will make it possible to refer to them in UnsupportedFeatures fields of scheduling models. NFC: the chunk of Tablegen being moved here is copied and pasted verbatim. Patch by: Simon Tatham Differential Revision: https://reviews.llvm.org/D60693 llvm-svn: 361958	2019-05-29 13:41:57 +00:00
Cullen Rhodes	6c04ef3d48	[AArch64][SVE2] Asm: support SVE Bitwise Logical - Unpredicated Group Summary: Patch adds support for the following instructions: * EOR3, BSL, BCAX, BSL1N, BSL2N, NBSL, XAR Aliases for types .B/.H/.S for EOR3 and BCAX have been added, the preferred disassembly is .D. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62387 llvm-svn: 361936	2019-05-29 09:03:27 +00:00
Cullen Rhodes	75dfbdc2da	[AArch64][SVE2] Asm: support Floating Point Widening Multiply-Add Summary: Patch adds support for the indexed and unpredicated vectors forms of the FMLALB, FMLALT, FMLSLB and FMLSLT instructions. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62386 llvm-svn: 361935	2019-05-29 08:53:06 +00:00
Cullen Rhodes	4f58ad4e72	[AArch64][SVE2] Asm: support SVE2 Floating Point Pairwise Group Summary: Patch adds support for the following instructions: SVE2 floating-point pairwise operations: * FADDP, FMAXNMP, FMINNMP, FMAXP, FMINP The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62383 llvm-svn: 361933	2019-05-29 08:40:33 +00:00
Pengfei Wang	72e3f9662b	Revert "[X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to" This reverts commit c1b3716614bc0a107e6f41a7d3d503baefad8a5b. llvm-svn: 361918	2019-05-29 02:49:59 +00:00
Pengfei Wang	818c652643	[X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to avoid static check fail RegClassOrBank is an object of RegClassOrRegBank, which is defined as using llvm::RegClassOrRegBank = typedef PointerUnion<const TargetRegisterClass , const RegisterBank > so control flow can not get here. Use ""llvm_unreachable" here to avoid "null pointer" confusion. Patch by Shengchen Kan (skan) Differential Revision: https://reviews.llvm.org/D62006 Signed-off-by: pengfei <pengfei.wang@intel.com> llvm-svn: 361912	2019-05-29 02:20:37 +00:00
Fangrui Song	656afe370d	[X86] Fix x86-64 call foo@tlsdesc(%rax) and support R_386_TLSGOTDESC R_386_TLS_DESC_CALL D18885 emitted 5 bytes for call foo@tlsdesc(%rax). It should use the 2-byte form instead and let R_X86_64_TLSDESC_CALL apply to the beginning of the call instruction. The 2-byte form was deliberately chosen to make ->LE and ->IE relaxation work: 0: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # 7 <.text+0x7> 3: R_X86_64_GOTPC32_TLSDESC a-0x4 7: ff 10 callq *(%rax) 7: R_X86_64_TLSDESC_CALL a => 0: 48 c7 c0 fc ff ff ff mov $0xfffffffffffffffc,%rax 7: 66 90 xchg %ax,%ax Also change the symbol type to STT_TLS when VK_TLSCALL or VK_TLSDESC is seen. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D62512 llvm-svn: 361910	2019-05-29 02:02:59 +00:00
Thomas Lively	26d711be6e	[WebAssembly] Add signatures for RINT builtins Reviewers: azakai, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62564 llvm-svn: 361904	2019-05-29 01:06:00 +00:00
Jessica Paquette	b73ea75b38	[AArch64][GlobalISel] Select FCMPSri/FCMPDri when comparing against 0.0 Add support for selecting FCMPSri and FCMPDri when comparing against 0.0, and factor out opcode selection for G_FCMP into its own function. Add a test to show that we don't do this with other immediates. Differential Revision: https://reviews.llvm.org/D62539 llvm-svn: 361888	2019-05-28 22:52:49 +00:00
Heejin Ahn	5514658591	[WebAssembly] Support for atomic fences Summary: This adds support for translation of LLVM IR fence instruction. We convert a singlethread fence to a pseudo compiler barrier which becomes 0 instructions in final binary, and a thread fence to an idempotent atomicrmw instruction to a memory address. Reviewers: dschuff, jfb, sunfish, tlively Subscribers: sbc100, jgravelle-google, llvm-commits Differential Revision: https://reviews.llvm.org/D50277 llvm-svn: 361884	2019-05-28 22:09:12 +00:00
Konstantin Zhuravlyov	fe23ed2c68	AMDGPU: Temporary drop s_mul_hi_i/u32 patterns It introduces performance regressions in several applications. This has already been submitted downstream. llvm-svn: 361879	2019-05-28 21:18:34 +00:00
Adhemerval Zanella	34d8daae53	[AArch64] Handle ISD::LRINT and ISD::LLRINT This patch optimizes ISD::LRINT and ISD::LLRINT to frintx plus fcvtzs. It currently only handles the scalar version. Reviewed By: SjoerdMeijer, mstorsjo Differential Revision: https://reviews.llvm.org/D62018 llvm-svn: 361877	2019-05-28 21:04:29 +00:00
Adhemerval Zanella	6d7bf5e8df	[CodeGen] Add lrint/llrint builtins This patch add the ISD::LRINT and ISD::LLRINT along with new intrinsics. The changes are straightforward as for other floating-point rounding functions, with just some adjustments required to handle the return value being an interger. The idea is to optimize lrint/llrint generation for AArch64 in a subsequent patch. Current semantic is just route it to libm symbol. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D62017 llvm-svn: 361875	2019-05-28 20:47:44 +00:00
Michael Liao	5fc1dfa784	[AMDGPU] Correct the handling of inlineasm output registers. Summary: - There's a regression due to the cross-block RC assignment. Use the proper way to derive the output register RC in inline asm. Reviewers: rampitec, alex-t Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, eraman, hiraditya, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D62537 llvm-svn: 361868	2019-05-28 19:37:09 +00:00
Sanjay Patel	f7980e727f	Revert "[x86] split 256-bit store of concatenated vectors" This reverts commit `d5a8637072`. Most likely suspect for this bot failure: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9684 llvm-svn: 361850	2019-05-28 17:37:58 +00:00
Matt Arsenault	24e80b8d04	AMDGPU: Don't enable all lanes with non-CSR VGPR spills If the only VGPRs used for SGPR spilling were not CSRs, this was enabling all laness and immediately restoring exec. This is the usual situation in leaf functions. llvm-svn: 361848	2019-05-28 16:46:02 +00:00
Michael Liao	7166843f1e	[AMDGPU] Fix the mis-handling of `vreg_1` copied from scalar register. Summary: - Don't treat the use of a scalar register as `vreg_1` an VGPR usage. Otherwise, that promotes that scalar register into vector one, which breaks the assumption that scalar register holds the lane mask. - The issue is triggered in a complicated case, where if the uses of that (lane mask) scalar register is legalized firstly before its definition, e.g., due to the mismatch block placement and its topological order or loop. In that cases, the legalization of PHI introduces the use of that scalar register as `vreg_1`. Reviewers: rampitec, nhaehnle, arsenm, alex-t Subscribers: kzhuravl, jvesely, wdng, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D62492 llvm-svn: 361847	2019-05-28 16:29:39 +00:00
Simon Tatham	760df47b77	[ARM] Replace fp-only-sp and d16 with fp64 and d32. Those two subtarget features were awkward because their semantics are reversed: each one indicates the _lack_ of support for something in the architecture, rather than the presence. As a consequence, you don't get the behavior you want if you combine two sets of feature bits. Each SubtargetFeature for an FP architecture version now comes in four versions, one for each combination of those options. So you can still say (for example) '+vfp2' in a feature string and it will mean what it's always meant, but there's a new string '+vfp2d16sp' meaning the version without those extra options. A lot of this change is just mechanically replacing positive checks for the old features with negative checks for the new ones. But one more interesting change is that I've rearranged getFPUFeatures() so that the main FPU feature is appended to the output list before rather than after the features derived from the Restriction field, so that -fp64 and -d32 can override defaults added by the main feature. Reviewers: dmgreen, samparker, SjoerdMeijer Subscribers: srhines, javed.absar, eraman, kristof.beyls, hiraditya, zzheng, Petar.Avramovic, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D60691 llvm-svn: 361845	2019-05-28 16:13:20 +00:00
Fangrui Song	448a79d123	[AArch64] Delete unused VariantKind in AArch64MCExpr llvm-svn: 361844	2019-05-28 16:11:56 +00:00
David Greene	561fcc0d63	[X86-64] Fix 256-bit SET0 lowering for non-VLX targets If we don't have VLX then 256-bit SET0 should be lowered to VPXOR with ZMM registers. This restores functionality accidentally removed by r309926. Differential Revision: https://reviews.llvm.org/D62415 llvm-svn: 361843	2019-05-28 15:37:01 +00:00
Sanjay Patel	d5a8637072	[x86] split 256-bit store of concatenated vectors This shows up as a side issue to the main problem for the AVX target example from PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3 But as we can see in the pile of existing test diffs, it's actually a widespread problem that affects any AVX or later target. Apart from a couple of oddballs, I think these are all improvements for the reasons stated in the code comment: we do not want to enable YMM unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit stores anyway. We could say that MergeConsecutiveStores() is going overboard on some of these examples, but that won't solve the problem completely. But that is the reason I'm proposing this as a lowering rather than a combine: we will infinite loop fighting the merge code if we try this earlier. Differential Revision: https://reviews.llvm.org/D62498 llvm-svn: 361822	2019-05-28 13:54:17 +00:00
Sanjay Patel	6bf4ca9d2e	[x86] fix 256-bit vector store splitting to honor 'volatile' Forking this out of the discussion in D62498 (and assuming that will be committed later, so adding the helper function here). The LangRef says: "the backend should never split or merge target-legal volatile load/store instructions." Differential Revision: https://reviews.llvm.org/D62506 llvm-svn: 361815	2019-05-28 12:58:07 +00:00
Benjamin Kramer	57e267a2e9	[X86] Custom lower CONCAT_VECTORS of v2i1 The generic legalizer cannot handle this. Add an assert instead of silently miscompiling vectors with elements smaller than 8 bits. llvm-svn: 361814	2019-05-28 12:52:57 +00:00
Graham Hunter	19e91253c0	[NFC] Test commit, delete trailing whitespace llvm-svn: 361813	2019-05-28 12:36:39 +00:00
Simon Pilgrim	4b48aa0e30	[X86] X86CmovConverterPass::collectCmovCandidates - fix uninitialized variable warnings. NFCI. llvm-svn: 361804	2019-05-28 10:53:23 +00:00
Cullen Rhodes	f57bd6bd23	[AArch64][SVE2] Asm: support SVE2 Floating Point Convert Group Summary: Patch adds support for the following intructions: SVE2 floating-point convert precision: * FCVTXNT, FCVTNT, FCVTLT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62382 llvm-svn: 361801	2019-05-28 09:36:52 +00:00
Cullen Rhodes	8e91dd7934	[AArch64][SVE2] Asm: support SVE2 Crypto Extensions Group Summary: Patch adds support for the following instructions: SVE2 crypto constructive binary operations: * SM4EKEY, RAX1 SVE2 crypto destructive binary operations: * AESE, AESD, SM4E SVE2 crypto unary operations: * AESMC, AESIMC AESE, AESD, AESMC and AESIMC are enabled with +sve2-aes. SM4E and SM4EKEY are enabled with +sve2-sm4. RAX1 is enabled with +sve2-sha3. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62307 llvm-svn: 361797	2019-05-28 09:13:17 +00:00
Cullen Rhodes	c4ed601bd9	[AArch64][SVE2] Asm: support SVE2 Histogram Computation Groups Summary: Patch adds support for the following instructions: SVE2 histogram generation (segment): * HISTSEG SVE2 histogram generation (vector): * HISTCNT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62306 llvm-svn: 361796	2019-05-28 08:51:59 +00:00
Cullen Rhodes	7d9cac5bba	[AArch64][SVE2] Asm: support SVE2 Misc Group Summary: Patch adds support for the following instructions: SVE2 bitwise exclusive-or interleaved: * EORBT, EORTB SVE2 bitwise permute: * BEXT, BDEP, BGRP SVE2 bitwise shift left long: * SSHLLB, SSHLLT, USHLLB, USHLLT SVE2 integer add/subtract interleaved long: * SADDLBT, SSUBLBT, SSUBLTB BDEP, BEXT and BGRP are enabled with SVE2 feature +bitperm, all other instructions in this group are enabled with +sve2. Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62304 llvm-svn: 361795	2019-05-28 08:42:22 +00:00
Alexander Timofeev	f4040a0dd8	[AMDGPU] Fix for the address sanitizer failure. Fixing typo llvm-svn: 361776	2019-05-27 18:17:21 +00:00
Dmitri Gribenko	5379f1a6c5	Include what you use in AArch64AsmBackend.cpp AArch64AsmBackend.cpp was not using any APIs from AArch64.h, and was only including it for transitive dependencies. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary AArch64 target library and the MCTargetDesc library). llvm-svn: 361774	2019-05-27 17:03:57 +00:00
Alexander Timofeev	4a7c4069ae	[AMDGPU] Fix for the address sanitizer failure caused by the ifollowing commit: 1a8b2ea611cf4ca7cb09562e0238cfefa27c05b5 Divergence driven ISel. Assign register class for cross block values according to the divergence. llvm-svn: 361770	2019-05-27 15:03:29 +00:00
Dmitry Preobrazhensky	b79af7930c	[AMDGPU][MC] Enabled constant expressions as operands of s_waitcnt See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D61017 llvm-svn: 361763	2019-05-27 14:08:43 +00:00
Diana Picus	68b20c589c	[ARM GlobalISel] Cleanup CallLowering a bit We never actually use the Offsets produced by ComputeValueVTs, so remove them until we need them. llvm-svn: 361755	2019-05-27 10:30:33 +00:00
Yonghong Song	e698958ad8	[BPF] generate R_BPF_NONE relocation for BTF DataSec variables The variables in BTF DataSec type encode in-section offset. R_BPF_NONE should be generated instead of R_BPF_64_32. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D62460 llvm-svn: 361742	2019-05-26 21:26:06 +00:00
Alexander Timofeev	ba447bae74	[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 This commit was reverted because of the build failure. The reason was mlformed patch. Build failure fixed. llvm-svn: 361741	2019-05-26 20:33:26 +00:00
Shawn Landden	343578759e	[SimplifyCFG] back out all SwitchInst commits They caused the sanitizer builds to fail. My suspicion is the change the countLeadingZeros(). llvm-svn: 361736	2019-05-26 18:15:51 +00:00
Simon Pilgrim	a044410f37	[X86][SSE] Add shuffle combining support for ISD::ANY_EXTEND_VECTOR_INREG Reuses what we already have in place for ISD::ZERO_EXTEND_VECTOR_INREG just with a different sentinel llvm-svn: 361734	2019-05-26 16:00:35 +00:00
Shawn Landden	b7cc093db2	[Support] make countLeadingZeros() and countTrailingZeros() return unsigned This matches countLeadingOnes() and countTrailingOnes(), and APInt's countLeadingZeros() and countTrailingZeros(). (as well as __builtin_clzll()) llvm-svn: 361724	2019-05-26 13:49:58 +00:00
David Green	0dbafe191e	[ARM] Select fp16 fma This adds a pattern for fma, similar to the float and double patterns. Differential Revision: https://reviews.llvm.org/D62330 llvm-svn: 361719	2019-05-26 11:34:30 +00:00
David Green	21542cd6f4	[ARM] Select a number of fp16 rounding functions This add patterns for fp16 round and ceil etc. Same as the float and double patterns. Differential Revision: https://reviews.llvm.org/D62326 llvm-svn: 361718	2019-05-26 11:13:00 +00:00
David Green	c9f4b7d201	[ARM] Promote various fp16 math intrinsics Promote a number of fp16 math intrinsics to float, so that the relevant float math routines can be used. Copysign is expanded so as to be handled in-place. Differential Revision: https://reviews.llvm.org/D62325 llvm-svn: 361717	2019-05-26 10:59:21 +00:00
Simon Pilgrim	58a8541dcc	[X86][AVX] combineBitcastvxi1 - peek through bitops to determine size of original vector We were only testing for direct SETCC results - this allows us to peek through AND/OR/XOR combinations of the comparison results as well. There's a missing SEXT(PACKSS) fold that I need to investigate for v8i1 cases before I can enable it there as well. llvm-svn: 361716	2019-05-26 10:54:23 +00:00
David Green	2881325b17	[ARM] Select fp16 fabs This adds a pattern for the fabs intrinsic, the same as float and double. Differential Revision: https://reviews.llvm.org/D62324 llvm-svn: 361715	2019-05-26 10:51:58 +00:00
David Green	aeade651f3	[ARM] Select fp16 fsqrt This adds a pattern for the sqrt intrinsic, the same as float and double. Differential Revision: https://reviews.llvm.org/D62322 llvm-svn: 361714	2019-05-26 10:42:24 +00:00
David Green	caf8a11b65	[ARM] Promote fp16 frem Promote fp16 frem operations on ARM to floats so they call fmodf. Differential Revision: https://reviews.llvm.org/D62321 llvm-svn: 361713	2019-05-26 10:30:22 +00:00
Simon Pilgrim	40fa52b174	[X86] lowerBuildVectorToBitOp - support build_vector(shift()) -> shift(build_vector(),C) Commonly occurs in sign-extension cases llvm-svn: 361706	2019-05-25 18:02:17 +00:00
Nikita Popov	d87eceda0e	[X86] Combine fminnum/fmaxnum with non-nan operand to fmin/fmax If we have a known non-nan operand, place it in the second operand of fmin/fmax that is returned if either operand is nan. Differential Revision: https://reviews.llvm.org/D62448 llvm-svn: 361704	2019-05-25 16:44:29 +00:00
Craig Topper	46e5052b8e	[X86FixupLEAs] Turn optIncDec into a generic two address LEA optimizer. Support LEA64_32r properly. INC/DEC is really a special case of a more generic issue. We should also turn leas into add reg/reg or add reg/imm regardless of the slow lea flags. This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg. One thing to note, the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve it in the add reg/reg case if we think its important. Not sure which operand its supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed since they were probably just created by an INSERT_SUBREG from a 32-bit input. Differential Revision: https://reviews.llvm.org/D61472 llvm-svn: 361691	2019-05-25 06:17:47 +00:00
Craig Topper	4b08fcdeb1	[X86] Add zero idioms to the haswell, broadwell, and skylake schedule models. Add 256-bit fp xor to sandybridge zero idioms This copies the Sandy Bridge zero idiom support to later CPUs. Adding the AVX2 and AVX512F/VL instructions as appropriate. Differential Revision: https://reviews.llvm.org/D62360 llvm-svn: 361690	2019-05-25 04:47:49 +00:00
Peter Collingbourne	3b93737446	Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence." Broke sanitizer bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio llvm-svn: 361688	2019-05-25 01:52:38 +00:00
Jessica Paquette	97d668d70f	[GlobalISel][AArch64] Make FP constraint checks consider possible use/def banks In a few places in getInstrMapping, we check if use/def instructions for the instruction we're mapping have floating point constraints. We can improve this check and reduce the number of copies in GISel-compiled code if we make a couple observations: - For a def instruction, it only matters if the def instruction must always output a value stored on a FPR - For a use instruction, it only matters if the use instruction must always only take in values stored in FPRs This adds two new functions: - onlyUsesFP - onlyDefinesFP Then we can use those when we're checking the uses/defs instead. Without this patch, the load, unmerge, store, and select in the added test would have unnecessary copies. Differential Revision: https://reviews.llvm.org/D62426 llvm-svn: 361679	2019-05-24 23:08:45 +00:00
Jessica Paquette	bede937b16	[GlobalISel][AArch64] NFC: Factor out HasFPConstraints into a proper function Factor it out into a function, and replace places where we had the same check with the new function. Differential Revision: https://reviews.llvm.org/D62421 llvm-svn: 361677	2019-05-24 22:12:21 +00:00
Jason Liu	8e1d921bb3	Implement call lowering without parameters on AIX Summary:dd This patch implements call lowering for calls without parameters on AIX as initial support. Reviewers: sfertile, hubert.reinterpretcast, aheejin, efriedma Differential Revision: https://reviews.llvm.org/D61948 llvm-svn: 361669	2019-05-24 20:54:35 +00:00
Jessica Paquette	56503865ed	[GlobalISel][AArch64] Improve register bank mappings for G_SELECT The fcsel and csel instructions differ in only the register banks they work on. So, they're entirely interchangeable otherwise. With this in mind, this does two things: - Teach AArch64RegisterBankInfo to consider the inputs to G_SELECT as well as the outputs. - Teach it to choose the best register bank mapping based off the constraints of the inputs and outputs. The "best" in this case means the one that requires the smallest number of copies to properly emit a fcsel/csel. For example, if the inputs are all already going to be on FPRs, we should emit a fcsel, even if the output is a GPR. This costs one copy to produce the result, but saves us from copying the inputs into GPRs. Also update the regbank-select.mir to check that we end up with the right select instruction. Differential Revision: https://reviews.llvm.org/D62267 llvm-svn: 361665	2019-05-24 19:35:25 +00:00
Nick Desaulniers	33bc64202b	[AArch64] check for INLINEASM_BR along w/ INLINEASM Summary: It looks like since INLINEASM_BR was created off of INLINEASM, a few checks for INLINEASM needed to be updated to check for either case. pr/41999 Reviewers: t.p.northover, peter.smith Reviewed By: peter.smith Subscribers: craig.topper, javed.absar, kristof.beyls, hiraditya, llvm-commits, peter.smith, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D62402 llvm-svn: 361661	2019-05-24 19:00:13 +00:00
Nick Desaulniers	9f7bd71cf5	[ARM] additionally check for ARM::INLINEASM_BR w/ ARM::INLINEASM Summary: We were observing failures for arm32 allyesconfigs of the Linux kernel with the asm goto Clang patch, where ldr's were being generated to offsets too far away to encode in imm12. It looks like since INLINEASM_BR was created off of INLINEASM, a few checks for INLINEASM needed to be updated to check for either case. pr/41999 Link: https://github.com/ClangBuiltLinux/linux/issues/490 Reviewers: peter.smith, kristof.beyls, ostannard, rengolin, t.p.northover Reviewed By: peter.smith Subscribers: jyu2, javed.absar, hiraditya, llvm-commits, nathanchance, craig.topper, kees, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D62400 llvm-svn: 361659	2019-05-24 18:58:21 +00:00
Matt Arsenault	3d59e388ca	AMDGPU: Activate all lanes when spilling CSR VGPR for SGPR spills If some lanes weren't active on entry to the function, this could clobber their VGPR values. llvm-svn: 361655	2019-05-24 18:18:51 +00:00
Matt Arsenault	0ff901fba0	AMDGPU: Boost inline threshold with addrspacecasted alloca arguments This was skipping GetUnderlyingObject for nonprivate addresses, but an alloca could also be found through an addrspacecast if it's flat. llvm-svn: 361649	2019-05-24 16:52:35 +00:00
Alexander Timofeev	dffedea014	[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 llvm-svn: 361644	2019-05-24 15:32:18 +00:00
Stefan Pintilie	522307fa40	[PowerPC] Remove CRBits Copy Of Unset/set CBit For the situation, where we generate the following code: crxor 8, 8, 8 < Some instructions> .LBB0_1: < Some instructions> cror 1, 8, 8 cror (COPY of CRbit) depends on the result of the crxor instruction. CR8 is known to be zero as crxor is equivalent to CRUNSET. We can simply use crxor 1, 1, 1 instead to zero out CR1, which does not have any dependency on any previous instruction. This patch will optimize it to: < Some instructions> .LBB0_1: < Some instructions> cror 1, 1, 1 Patch By: Victor Huang (NeHuang) Differential Revision: https://reviews.llvm.org/D62044 llvm-svn: 361632	2019-05-24 12:05:37 +00:00
Cullen Rhodes	b3e58df80c	[AArch64][SVE2] Asm: support SVE2 String Processing Group Summary: Patch adds support for the SVE2 character match instructions MATCH and NMATCH. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62206 llvm-svn: 361627	2019-05-24 10:32:01 +00:00
Cullen Rhodes	adb1d74bf9	[AArch64][SVE2] Asm: support SVE2 Narrowing Group Summary: Patch adds support for the following instructions: SVE2 bitwise shift right narrow: * SQSHRUNB, SQSHRUNT, SQRSHRUNB, SQRSHRUNT, SHRNB, SHRNT, RSHRNB, RSHRNT, SQSHRNB, SQSHRNT, SQRSHRNB, SQRSHRNT, UQSHRNB, UQSHRNT, UQRSHRNB, UQRSHRNT SVE2 integer add/subtract narrow high part: * ADDHNB, ADDHNT, RADDHNB, RADDHNT, SUBHNB, SUBHNT, RSUBHNB, RSUBHNT SVE2 saturating extract narrow: * SQXTNB, SQXTNT, UQXTNB, UQXTNT, SQXTUNB, SQXTUNT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62205 llvm-svn: 361624	2019-05-24 10:22:30 +00:00
Cullen Rhodes	5f04f00282	[AArch64][SVE2] Asm: support SVE2 Accumulate Group Summary: Patch adds support for the following instructions: SVE2 bitwise shift and insert: * SRI, SLI SVE2 bitwise shift right and accumulate: * SSRA, USRA, SRSRA, URSRA SVE2 complex integer add: * CADD, SQCADD SVE2 integer absolute difference and accumulate: * SABA, UABA SVE2 integer absolute difference and accumulate long: * SABALB, SABALT, UABALB, UABALT SVE2 integer add/subtract long with carry: * ADCLB, ADCLT, SBCLB, SBCLT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62204 llvm-svn: 361622	2019-05-24 10:10:34 +00:00
Simon Pilgrim	95b8d9bbf8	[SelectionDAG] computeKnownBits - support constant pool values from target This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets. computeKnownBits then uses this function to improve codegen, notably vector code after legalization. A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit. This required a couple of fixes: * SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR). * Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets. Differential Revision: https://reviews.llvm.org/D61887 llvm-svn: 361620	2019-05-24 10:03:11 +00:00
Cullen Rhodes	980f760515	[AArch64][SVE2] Asm: add PMULLB/PMULLT instructions Summary: This patch adds support for the polynomial multiplication instructions PMULLB/PMULLT. The 64-bit source and 128-bit destination element variants are enabled with crypto extensions (+sve2-aes), similar to the NEON PMULL2 instruction. All other variants are enabled with +sve2. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62145 llvm-svn: 361619	2019-05-24 09:56:23 +00:00
Cullen Rhodes	8bcea9daaa	[AArch64][SVE2] Asm: add integer add/sub long/wide instructions Summary: Patch adds support for the following instructions: SVE2 integer add/subtract long: * SADDLB, SADDLT, UADDLB, UADDLT, SSUBLB, SSUBLT, USUBLB, USUBLT, SABDLB, SABDLT, UABDLB, UABDLT SVE2 integer add/subtract wide: * SADDWB, SADDWT, UADDWB, UADDWT, SSUBWB, SSUBWT, USUBWB, USUBWT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62142 llvm-svn: 361615	2019-05-24 09:28:27 +00:00
Cullen Rhodes	968cb0e049	[AArch64][SVE2] Asm: add various bitwise shift instructions Summary: This patch adds support for the SVE2 saturating/rounding bitwise shift left (predicated) group of instructions: * SRSHL, URSHL, SRSHLR, URSHLR, SQSHL, UQSHL, SQRSHL, UQRSHL, SQSHLR, UQSHLR, SQRSHLR, UQRSHLR Immediate forms of the SQSHL and UQSHL instructions are also added to the existing SVE bitwise shift by immediate (predicated) group, as well as three new instructions SRSHR/URSHR/SQSHLU. The new instructions in this group are encoded similarly and are implemented using the same TableGen class with a minimal change (1 bit in encoding). The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62140 llvm-svn: 361612	2019-05-24 09:17:23 +00:00
Cullen Rhodes	6bca64fe5e	[AArch64][SVE2] Asm: add saturating add/sub instructions Summary: Patch adds support for the following instructions: * SQADD, UQADD, SUQADD, USQADD * SQSUB, UQSUB, SQSUBR, UQSUBR The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62130 llvm-svn: 361611	2019-05-24 09:06:37 +00:00
Cullen Rhodes	d9bb7b69ab	[AArch64][SVE2] Asm: fix overlapping bit Summary: Bit 20 in sve2_int_arith_pred TableGen class was overlapping. The encodings are not affected as bit 20 is defined by the opc bits and this was overwriting the earlier error of setting bit 20 to 0. Raised by Momchil: https://reviews.llvm.org/D62130 Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62292 llvm-svn: 361609	2019-05-24 08:45:37 +00:00
Tim Northover	3b2157aeed	GlobalISel: support swifterror attribute on AArch64. swifterror marks an argument as a register pretending to be a pointer, so we need a guaranteed mem2reg-like analysis of its uses. Fortunately most of the infrastructure can be reused from the DAG world. llvm-svn: 361608	2019-05-24 08:40:13 +00:00
Simon Atanasyan	c1b482f2a5	[mips] Always check that `shift and add` optimization is efficient. The D45316 introduced the `shouldTransformMulToShiftsAddsSubs` function to check that breaking down constant multiplications into a series of shifts, adds, and subs is efficient. Unfortunately, this function does not check maximum number of steps on all paths of the algorithm. This patch fixes this bug. Fix for PR41929. Differential Revision: https://reviews.llvm.org/D62166 llvm-svn: 361606	2019-05-24 08:39:40 +00:00
Sjoerd Meijer	937af54666	[ARM] ARMExpandPseudoInsts: add debug messages This pass wasn't printing any messages at all, which I find really inconvenient while debugging/tracing things. It now dumps the before and after of expanded instructions. It doesn't do this yet for all instructions, but this is a good start I guess. Differential Revision: https://reviews.llvm.org/D62297 llvm-svn: 361604	2019-05-24 08:25:02 +00:00
QingShan Zhang	449bfdd1b0	[Power9] Add a specific heuristic to schedule the addi before the load When we are scheduling the load and addi, if all other heuristic didn't take effect, we will try to schedule the addi before the load, to hide the latency, and avoid the true dependency added by RA. And this only take effects for Power9. Differential Revision: https://reviews.llvm.org/D61930 llvm-svn: 361600	2019-05-24 05:30:09 +00:00
Reid Kleckner	b7a78c7dff	[AArch64] Preserve X8 for thunks ending in variadic musttail calls Summary: On Windows, X8 may be used to pass in the address of an aggregate that is returned indirectly. Therefore, it should be forwarded to variadic musttail calls and preserved in thunks. Fixes PR41997 Reviewers: mgrang, efriedma Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62344 llvm-svn: 361585	2019-05-24 01:27:20 +00:00
Serge Pavlov	ed595e8627	[AArch64] Add nvcast patterns for v2f32 -> v1f64 Summary: Constant stores of f32 values can create such NvCast nodes. Reviewers: t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62285 llvm-svn: 361584	2019-05-24 01:20:34 +00:00
Thomas Lively	55229f6b10	[WebAssembly] Expand more SIMD float ops Summary: These were previously causing ISel failures. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62354 llvm-svn: 361577	2019-05-24 00:15:04 +00:00
Matt Arsenault	5c714cbdd8	AMDGPU: Correct maximum possible private allocation size We were assuming a much larger possible per-wave visible stack allocation than is possible: `faa3ae5138/src/core/runtime/amd_gpu_agent.cpp (L70)` Based on this, we can assume the high 15 bits of a frame index or sret are 0. The frame index value is the per-lane offset, so the maximum frame index value is MAX_WAVE_SCRATCH / wavesize. Remove the corresponding subtarget feature and option that made this configurable. llvm-svn: 361541	2019-05-23 19:38:14 +00:00
Robert Lougher	170dfeb2ff	Resubmit r360436 "[X86] Avoid SFB - Fix inconsistent codegen with/without debug info" Fixes https://bugs.llvm.org/show_bug.cgi?id=40969 The functions findPotentiallyBlockedCopies and buildCopy are currently not accounting for the presence of debug instructions. In the former this results in the optimization not being trigerred, and in the latter results in inconsistent codegen. This patch enables the optimization to be performed in a debug build and ensures the codegen is consistent with non-debug builds. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D61680 llvm-svn: 361527	2019-05-23 18:15:12 +00:00
Thomas Lively	e18b5c6237	[WebAssembly] Implement ReplaceNodeResults to fix a SIMD crash Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61037 llvm-svn: 361526	2019-05-23 18:09:26 +00:00
Matt Arsenault	0f3ba44b57	AMDGPU/GlobalISel: Legality for integer min/max llvm-svn: 361519	2019-05-23 17:58:48 +00:00
Thomas Lively	eafe8ef6f2	[WebAssembly] Add multivalue and tail-call target features Summary: These features will both be implemented soon, so I thought I would save time by adding the boilerplate for both of them at the same time. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D62047 llvm-svn: 361516	2019-05-23 17:26:47 +00:00
Lewis Revill	74927554e2	[RISCV] Support assembling TLS LA pseudo instructions This patch adds the pseudo instructions la.tls.ie and la.tls.gd, used in the initial-exec and global-dynamic TLS models respectively when addressing a global. The pseudo instructions are expanded in the assembly parser. llvm-svn: 361499	2019-05-23 14:46:27 +00:00
Sam Parker	617cdc5a6d	[ARM][CGP] Clear SafeWrap before each search The previous patch added a member set to store instructions that we could allow to wrap. But this wasn't cleared between searches meaning that they could get promoted, incorrectly, during the promotion of a separate valid chain. Differential Revision: https://reviews.llvm.org/D62254 llvm-svn: 361462	2019-05-23 07:46:39 +00:00
Thomas Lively	1a3cbe720c	[WebAssembly] Implement __builtin_return_address for emscripten Summary: In this patch, `ISD::RETURNADDR` is lowered on the emscripten target to the new Emscripten runtime function `emscripten_return_address`, which implements the functionality. Patch by Guanzhong Chen Reviewers: tlively, aheejin Reviewed By: tlively Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62210 llvm-svn: 361454	2019-05-23 01:24:01 +00:00
Fangrui Song	86c9ca48c3	[X86] Support -fno-plt __tls_get_addr calls In general dynamic/local dynamic TLS models, with -fno-plt, * x86: emit `calll ___tls_get_addr@GOT(%ebx)` instead of `calll ___tls_get_addr@PLT` Note, on x86, if we can get rid of %ebx as the PIC register, it may be better to use a register not preserved across function calls. x86_64: emit `callq *__tls_get_addr@GOTPCREL(%rip)` instead of `callq __tls_get_addr@PLT` Reorganize the code by separating 32-bit and 64-bit. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D62106 llvm-svn: 361453	2019-05-23 01:05:13 +00:00
Craig Topper	93f38e1f1a	[X86] Explcitly disable VEXTRACT instruction matching for an immediate of 0. Remove a bunch of isel patterns that become unnecessary. We effectively had a second set of isel patterns that tried to use a regular store instruction and an extract_subreg instruction. Or a masked move and an extract_subreg. These patterns were intended to override the matching of VEXTRACT instructions by taking advantage of the priority of the explicit immediate 0 for the index. This patch instaed just disables the immediate 0 matchin the VEXTRACT patterns. This each of the component pieces of the larger patterns will match by themselves. This found a bug of sorts were we didn't use 128-bit store for 512->128 extract on KNL. Its unclear what the right thing here should be. Using the vextract avoids constraining the register allocator to use xmm0-15. But it always results in a longer encoding if the register allocator ends up choosing xmm0-15 anyway. llvm-svn: 361431	2019-05-22 21:00:18 +00:00
Galina Kistanova	ed49f6d8e6	Reverted r361134 because of a failing test left unattended for a long time. http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/17792/steps/test-check-all/logs/stdio Failing Tests (1): LLVM :: CodeGen/AMDGPU/regbank-reassign.mir llvm-svn: 361430	2019-05-22 20:42:56 +00:00
Craig Topper	9816d55776	[X86][InstCombine] Remove InstCombine code that turns X86 round intrinsics into llvm.ceil/floor. Remove some isel patterns that existed because that was happening. We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into llvm.floor/ceil. The llvm.ceil/floor intrinsics are supposed to correspond to the libm functions. For the libm functions we need to disable the precision exception so the llvm.floor/ceil functions should always map to encodings 0x9 and 0xA. We had a mix of isel patterns where some used 0x9 and 0xA and others used 0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA. Since we have no way in isel of knowing where the llvm.ceil/floor came from, we can't map X86 specific intrinsics with encodings 1 or 2 to it. We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like to see a use case and optimization advantage first. I've left the backend test cases to show the blend we now emit without the extra isel patterns. But I've removed the InstCombine tests completely. llvm-svn: 361425	2019-05-22 20:04:55 +00:00
Alexey Lapshin	53726588f6	[DebugInfo][AArch64] Recognise target specific instruction as mov instr This fix is for the problem from https://bugs.llvm.org/show_bug.cgi?id=38714. Specifically, Simple Register Coalescing creates following conversion : undef %0.sub_32:gpr64 = ORRWrs $wzr, %3:gpr32common, 0, debug-location !24; It copies 32-bit value from gpr32 into gpr64. But Live DEBUG_VALUE analysis is not able to create debug location record for that instruction. So the problem is in that debug info for argc variable is incorrect. The fix is to write custom isCopyInstrImpl() which would recognize the ORRWrs instr. llvm-svn: 361417	2019-05-22 18:48:58 +00:00
Matt Arsenault	418e23e33c	AMDGPU: Move disassembler support check to constructor Don't check for unsupported targets for every instruction. llvm-svn: 361406	2019-05-22 16:28:48 +00:00
Matt Arsenault	ca64ef2043	MC: Allow getMaxInstLength to depend on the subtarget Keep it optional in cases this is ever needed in some global context. Currently it's only used for getting an upper bound inline asm code size. For AMDGPU, gfx10 increases the maximum instruction size to 20-bytes. This avoids penalizing older subtargets when estimating code size, and making some annoying branch relaxation test adjustments. llvm-svn: 361405	2019-05-22 16:28:41 +00:00
Dmitry Preobrazhensky	7773fc478d	[AMDGPU][MC] Corrected parsing of op_sel* and neg_* modifiers See bug 41361: https://bugs.llvm.org/show_bug.cgi?id=41361 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D61012 llvm-svn: 361386	2019-05-22 13:59:01 +00:00
Simon Pilgrim	9b40dd6318	[Hexagon] assert getRegisterBitWidth returns non-zero value. NFCI. Fixes scan-build warning. llvm-svn: 361375	2019-05-22 12:25:46 +00:00
Sjoerd Meijer	aa4f1ffca4	[TargetMachine] error message unsupported code model When the tiny code model is requested for a target machine that does not support this, we get an error message (which is nice) but also this diagnostic and request to submit a bug report: fatal error: error in backend: Target does not support the tiny CodeModel [Inferior 2 (process 31509) exited with code 0106] clang-9: error: clang frontend command failed with exit code 70 (use -v to see invocation) (gdb) clang version 9.0.0 (http://llvm.org/git/clang.git 29994b0c63a40f9c97c664170244a7bba5ecc15e) (http://llvm.org/git/llvm.git 95606fdf91c2d63a931e865f4b78b2e9828ddc74) Target: arm-arm-none-eabi Thread model: posix clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script. clang-9: note: diagnostic msg: ******************** PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT: Preprocessed source(s) and associated run script(s) are located at: clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.c clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.sh clang-9: note: diagnostic msg: But this is not a bug, this is a feature. :-) Not only is this not a bug, this is also pretty confusing. This patch causes just to print the fatal error and not the diagnostic: fatal error: error in backend: Target does not support the tiny CodeModel Differential Revision: https://reviews.llvm.org/D62236 llvm-svn: 361370	2019-05-22 10:40:26 +00:00
Fangrui Song	1c61471ab1	[PPC64] Parse -elfv1 -elfv2 when specified on target triple Summary: For big-endian powerpc64, the default ABI is ELFv1. OpenPower ABI ELFv2 is supported when -mabi=elfv2 is specified. FreeBSD support for PowerPC64 ELFv2 ABI with LLVM is in progress[1]. This patch adds an alternative way to specify ELFv2 ABI on target triple [2]. The following results are expected: ELFv1 when using: -target powerpc64-unknown-freebsd12.0 -target powerpc64-unknown-freebsd12.0 -mabi=elfv1 -target powerpc64-unknown-freebsd12.0-elfv1 ELFv2 when using: -target powerpc64-unknown-freebsd12.0 -mabi=elfv2 -target powerpc64-unknown-freebsd12.0-elfv2 [1] https://wiki.freebsd.org/powerpc/llvm-elfv2 [2] https://clang.llvm.org/docs/CrossCompilation.html Patch by Alfredo Dal'Ava Júnior! Differential Revision: https://reviews.llvm.org/D61950 llvm-svn: 361355	2019-05-22 07:29:59 +00:00
Sjoerd Meijer	eec021658b	[AArch64] Subtarget crypto extension defaults The Armv8.2-A crypto extensions all defaulted to true, but should default to false, like all the other extensions. Differential Revision: https://reviews.llvm.org/D62180 llvm-svn: 361354	2019-05-22 07:10:27 +00:00
Nikita Popov	15df05152d	[X86] Don't compare i128 through vector if construction not cheap (PR41971) Fix for https://bugs.llvm.org/show_bug.cgi?id=41971. Make the combineVectorSizedSetCCEquality() transform more conservative by checking that the bitcast to the vector type will be cheap/free for both operands. I'm considering it cheap if it's a constant, a load or already a vector. I've dropped the explicit check for f128 because it should fall out naturally (in the cases where it'd be detrimental). Differential Revision: https://reviews.llvm.org/D62220 llvm-svn: 361352	2019-05-22 06:47:06 +00:00
Chen Zheng	b727b0483c	[PowerPC] use meaningful name for displacement form aligned with x-form - NFC llvm-svn: 361347	2019-05-22 03:17:39 +00:00
Chen Zheng	9970665f60	[PowerPC] [ISEL] select x-form instruction for unaligned offset Differential Revision: https://reviews.llvm.org/D62173 llvm-svn: 361346	2019-05-22 02:57:31 +00:00
Pengfei Wang	6a0d432e9e	[X86] [CET] Deal with return-twice function such as vfork, setjmp when CET-IBT enabled Return-twice functions will indirectly jump after the caller's position. So when CET-IBT is enable, we should make sure these is endbr* instructions follow these Return-twice function caller. Like GCC does. Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D61881 llvm-svn: 361342	2019-05-22 00:50:21 +00:00
Matt Arsenault	2cba91b8db	AMDGPU: Assume calls read exec llvm-svn: 361333	2019-05-21 23:23:16 +00:00
Matt Arsenault	dd1ffa00a5	AMDGPU: Assume call pseudos are convergent There should probably be nonconvergent versions, but my guess is it doesn't matter in practice. llvm-svn: 361331	2019-05-21 23:23:10 +00:00
Matt Arsenault	60ba03e210	AMDGPU: Fix not marking new gfx10 SGPRs as CSRs llvm-svn: 361330	2019-05-21 23:23:05 +00:00
Dan Gohman	a49496fb2a	[WebAssembly] Add the signature for the new llround builtin function r360889 added new llround builtin functions. This patch adds their signatures for the WebAssembly backend. It also adds wasm32 support to utils/update_llc_test_checks.py, since that's the script other targets are using for their testcases for this feature. Differential Revision: https://reviews.llvm.org/D62207 llvm-svn: 361327	2019-05-21 23:06:34 +00:00
Craig Topper	ed6df47bae	[X86] Remove an unneeded ZERO_EXTEND creation from LowerINTRINSIC_W_CHAIN. NFC We were trying to ZERO_EXTEND from an i8 X86ISD::SETCC to i8 again. llvm-svn: 361288	2019-05-21 19:03:45 +00:00
Simon Pilgrim	4b82e50315	[X86][SSE] computeKnownBitsForTargetNode - add X86ISD::ANDNP support Fixes PACKSS-PSHUFB shuffle regressions mentioned on D61692 llvm-svn: 361270	2019-05-21 15:20:24 +00:00
Fangrui Song	cd36a2857e	[PPC64] Update LocalEntry from assigned symbols On PowerPC64 ELFv2 ABI, functions may have 2 entry points: global and local. The local entry point location of a function is stored in the st_other field of the symbol, as an offset relative to the global entry point. In order to make symbol assignments (e.g. .equ/.set) work properly with this, PPCTargetELFStreamer already copies the local entry bits from the source symbol to the destination one, on emitAssignment(). The problem is that this copy is performed only at the assignment location, where the source symbol may not yet have processed the .localentry directive, that sets the local entry. This may cause the destination symbol to end up with wrong local entry information. Other symbol info is not affected by this because, in this case, the destination symbol value is actually a symbol reference. This change keeps track of these assignments, and update all needed st_other fields when finish() is called. Patch by Leandro Lupori! Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D56586 llvm-svn: 361237	2019-05-21 10:41:25 +00:00
Florian Hahn	4a8835c655	[AArch64] Skip mask checks for masks with an odd number of elements. Some checks in isShuffleMaskLegal expect an even number of elements, e.g. isTRN_v_undef_Mask or isUZP_v_undef_Mask, otherwise they access invalid elements and crash. This patch adds checks to the impacted functions. Fixes PR41951 Reviewers: t.p.northover, dmgreen, samparker Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D60690 llvm-svn: 361235	2019-05-21 10:05:26 +00:00
Cullen Rhodes	7f47b75d18	[AArch64][SVE2] Asm: add integer unary instructions (predicated) Summary: Patch adds support for the following instructions: * URECPE, URSQRTE, SQABS, SQNEG The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62129 llvm-svn: 361230	2019-05-21 09:06:51 +00:00
Cullen Rhodes	e798e8d9d2	[AArch64][SVE2] Asm: add integer pairwise arithmetic instructions Summary: Patch adds support for the following instructions: ADDP, SMAXP, UMAXP, SMINP, UMINP The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62128 llvm-svn: 361229	2019-05-21 08:59:00 +00:00
Sam Parker	3141bbd52d	[ARM][CGP] Skip nuw in PrepareConstants PrepareConstants step converts add/sub with 'negative' immediates to sub/add with a 'positive' imm to make promotion more simple. nuw already states that the add shouldn't cause an unsigned wrap, so it shouldn't need any tweaking. Plus, we also don't allow a sub with a 'negative' immediate to be safe wrap, so this functionality has been removed. The PrepareConstants step now just handles the add instructions that we've determined would be safe if they wrap around zero. Differential Revision: https://reviews.llvm.org/D62057 llvm-svn: 361227	2019-05-21 07:56:47 +00:00
Dylan McKay	e967308da4	Add TargetLoweringInfo hook for explicitly setting the ABI calling convention endianess Summary: The endianess used in the calling convention does not always match the endianess of the target on all architectures, namely AVR. When an argument is too large to be legalised by the architecture and is split for the ABI, a new hook TargetLoweringInfo::shouldSplitFunctionArgumentsAsLittleEndian is queried to find the endianess that function arguments must be laid out in. This approach was recommended by Eli Friedman. Originally reported in https://github.com/avr-rust/rust/issues/129. Patch by Carl Peto. Reviewers: bogner, t.p.northover, RKSimon, niravd, efriedma Reviewed By: efriedma Subscribers: JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62003 llvm-svn: 361222	2019-05-21 06:38:02 +00:00
Chen Zheng	c4c407a0eb	[PowerPC] use more meaningful name - NFC llvm-svn: 361218	2019-05-21 03:54:42 +00:00
Matt Arsenault	6dd08e335f	AMDGPU: Force skip branches over calls Unfortunately the way SIInsertSkips works is backwards, and is required for correctness. r338235 added handling of some special cases where skipping is mandatory to avoid side effects if no lanes are active. It conservatively handled asm correctly, but the same logic needs to apply to calls. Usually the call sequence code is larger than the skip threshold, although the way the count is computed is really broken, so I'm not sure if anything was likely to really hit this. llvm-svn: 361202	2019-05-20 22:04:42 +00:00
Martin Storsjo	4ed18e5ef5	[AArch64] Handle lowering lround on windows, where long is 32 bit Differential Revision: https://reviews.llvm.org/D62108 llvm-svn: 361192	2019-05-20 19:53:28 +00:00
Bjorn Pettersson	eee0f2330d	[AMDGPU] Fix std::array initializers to avoid warnings with older tool chains. NFC A std::array is implemented as a template with an array inside a struct. Older versions of clang, like 3.6, require an extra set of curly braces around std::array initializations to avoid warnings. The C++ language was changed regarding this by CWG 1270. So more modern tool chains does not complaing even if leaving out one level of braces. llvm-svn: 361171	2019-05-20 16:41:08 +00:00
Matt Arsenault	5239298b0d	R600: Fix unconditional return in loop llvm-svn: 361167	2019-05-20 16:22:11 +00:00
Cullen Rhodes	523789fa6b	[AArch64][SVE2] Asm: add SADALP and UADALP instructions Summary: This patch adds support for the integer pairwise add and accumulate long instructions SADALP/UADALP. These instructions are predicated. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62001 llvm-svn: 361154	2019-05-20 13:50:15 +00:00
Petar Jovanovic	e85bbf564d	[DebugInfoMetadata] Refactor DIExpression::prepend constants (NFC) Refactor DIExpression::With* into a flag enum in order to be less error-prone to use (as discussed on D60866). Patch by Djordje Todorovic. Differential Revision: https://reviews.llvm.org/D61943 llvm-svn: 361137	2019-05-20 10:35:57 +00:00
Cullen Rhodes	96c5929926	[AArch64][SVE2] Asm: add int halving add/sub (predicated) instructions Summary: This patch adds support for the predicated integer halving add/sub instructions: * SHADD, UHADD, SRHADD, URHADD * SHSUB, UHSUB, SHSUBR, UHSUBR The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D62000 llvm-svn: 361136	2019-05-20 10:35:23 +00:00
Cullen Rhodes	0fc6347b35	[AArch64][SVE2] Asm: add saturating multiply-add interleaved long instructions Summary: Patch adds support for SQDMLALBT and SQDMLSLBT instructions. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D61998 llvm-svn: 361135	2019-05-20 10:29:48 +00:00
Fangrui Song	68774edcd6	Use llvm::sort. NFC llvm-svn: 361134	2019-05-20 10:18:35 +00:00
Carl Ritson	34e95ce259	[AMDGPU] gfx1010 Avoid SMEM WAR hazard for some s_waitcnt values Summary: Avoid introducing hazard mitigation when lgkmcnt is reduced to 0. Clarify code comments to explain assumptions made for this hazard mitigation. Expand and correct test cases to cover variants of s_waitcnt. Reviewers: nhaehnle, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62058 llvm-svn: 361124	2019-05-20 07:20:12 +00:00
Craig Topper	3164b50af7	[X86] Remove combineShift function. Just dispatch directly to the handler for each flavor from the main switch. NFC llvm-svn: 361108	2019-05-19 01:01:46 +00:00
Matt Arsenault	2f29220d6d	AMDGPU/GlobalISel: Implement s64->s64 [SU]ITOFP llvm-svn: 361082	2019-05-17 23:05:18 +00:00
Matt Arsenault	02b5ca8cd1	GlobalISel: Implement lower for S64->S32 [SU]ITOFP This is ported from the custom AMDGPU DAG implementation. I think this is a better default expansion than what the DAG currently uses, at least if the target has CTLZ. This implements the signed version in terms of the unsigned conversion, which is implemented with bit operations. SelectionDAG has several other implementations that should eventually be ported depending on what instructions are legal. llvm-svn: 361081	2019-05-17 23:05:13 +00:00
Sam Clegg	13717bd54b	[WebAssembly] Remove expected failure of builtin-location.C test This seems to have been fixed by https://reviews.llvm.org/D61956 Yay Differential Revision: https://reviews.llvm.org/D62075 llvm-svn: 361071	2019-05-17 19:55:17 +00:00
Simon Pilgrim	065431c82b	[X86][SSE] Fold movmsk(not(x)) -> not(movmsk) Helps to improve folding of comparisons with movmsk results. llvm-svn: 361056	2019-05-17 17:56:25 +00:00
Simon Pilgrim	2c2f8e74b9	[X86][SSE] Match all-of bool scalar reductions into a bitcast/movmsk + cmp. Same as what we do for vector reductions in combineHorizontalPredicateResult, use movmsk+cmp for scalar (and(extract(x,0),extract(x,1)) reduction patterns. llvm-svn: 361052	2019-05-17 17:25:55 +00:00
Dmitry Preobrazhensky	198611b0ff	[AMDGPU][MC] Corrected parsing of NAME:VALUE modifiers See bug 41298: https://bugs.llvm.org/show_bug.cgi?id=41298 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D61009 llvm-svn: 361045	2019-05-17 16:04:17 +00:00
Dmitry Preobrazhensky	5ae3113969	[AMDGPU][MC] Enabled labels with s_call_b64 and s_cbranch_i_fork See https://bugs.llvm.org/show_bug.cgi?id=41888 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D62016 llvm-svn: 361040	2019-05-17 14:57:04 +00:00
Simon Pilgrim	279314e81b	[X86][AVX] Remove LowerCTTZ's AVX1 custom vector handling. We can now rely on generic expansion to handle this. llvm-svn: 361038	2019-05-17 14:37:19 +00:00
Simon Pilgrim	62c7032c18	[X86][AVX] isNOT - add extract_subvector(xor X, -1) -> extract_subvector(X) fold. Prep work for the removal of the remaining x86 CTTZ vector lowering. llvm-svn: 361035	2019-05-17 14:04:56 +00:00
Dmitry Preobrazhensky	43fcc79837	[AMDGPU][MC] Enabled expressions for most operands which accept integer values See bug 40873: https://bugs.llvm.org/show_bug.cgi?id=40873 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60768 llvm-svn: 361031	2019-05-17 13:17:48 +00:00
Matt Arsenault	1a02d30c87	AMDGPU: Fix unused variable warnings in release builds llvm-svn: 361030	2019-05-17 12:59:27 +00:00
Matt Arsenault	a510b570c2	AMDGPU/GlobalISel: Legalize G_FCEIL llvm-svn: 361028	2019-05-17 12:20:05 +00:00
Matt Arsenault	6aebcd5499	AMDGPU/GlobalISel: Legalize G_INTRINSIC_TRUNC llvm-svn: 361027	2019-05-17 12:20:01 +00:00
Matt Arsenault	6aafc5e19d	AMDGPU/GlobalISel: Legalize G_FRINT llvm-svn: 361026	2019-05-17 12:19:57 +00:00
Matt Arsenault	1448f5689e	AMDGPU/GlobalISel: Legalize G_FCOPYSIGN llvm-svn: 361025	2019-05-17 12:19:52 +00:00
Matt Arsenault	568f193847	AMDGPU/GlobalISel: RegBankSelect for llvm.amdgcn.s.buffer.load llvm-svn: 361023	2019-05-17 12:02:34 +00:00
Matt Arsenault	a3b5a386fa	AMDGPU/GlobalISel: Use subreg index instead of extra unmerge This saves instructions and extra steps, but I'm not sure about introducing subregister indexes at this point. llvm-svn: 361022	2019-05-17 12:02:31 +00:00
Matt Arsenault	b3dc73634c	AMDGPU/GlobalISel: Use waterfall loop for buffer_load This adds support for more complex waterfall loops that need to handle operands > 32-bits, and multiple operands. llvm-svn: 361021	2019-05-17 12:02:27 +00:00
Simon Pilgrim	a6d3bd486b	[X86] Pull out IsNOT helper. NFCI. Return the input value for the NOT pattern: (xor X, -1) -> X llvm-svn: 361012	2019-05-17 10:37:08 +00:00
Rhys Perry	c4bc61bad7	[AMDGPU] detect WaW hazards when moving/merging load/store instructions Summary: In order to combine memory operations efficiently, the load/store optimizer might move some instructions around. It's usually safe to move instructions down past the merged instruction because the pass checks if memory operations can be re-ordered. Though, the current logic doesn't handle Write-after-Write hazards. This fixes a reflection issue with Monster Hunter World and DXVK. v2: - rebased on top of master - clean up the test case - handle WaW hazards correctly Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=40130 Original patch by Samuel Pitoiset. Reviewers: tpr, arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: ronlieb, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D61313 llvm-svn: 361008	2019-05-17 09:32:23 +00:00
Cullen Rhodes	7f605c3550	[AArch64][SVE2] Asm: add saturating multiply-add long instructions Summary: Patch adds support for indexed and unpredicated vectors forms of the following instructions: * SQDMLALB, SQDMLALT, SQDMLSLB, SQDMLSLT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D61997 llvm-svn: 361005	2019-05-17 09:29:43 +00:00
Cullen Rhodes	334130a199	[AArch64][SVE2] Asm: add integer multiply-add long instructions Summary: Patch adds support for indexed and unpredicated vectors forms of the following instructions: * SMLALB, SMLALT, UMLALB, UMLALT, SMLSLB, SMLSLT, UMLSLB, UMLSLT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D61951 llvm-svn: 361003	2019-05-17 09:19:41 +00:00
Cullen Rhodes	0d47f00821	[AArch64][SVE2] Asm: add integer multiply long instructions Summary: Patch adds support for indexed and unpredicated vectors forms of the following instructions: * SMULLB, SMULLT, UMULLB, UMULLT, SQDMULLB, SQDMULLT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D61936 llvm-svn: 361002	2019-05-17 09:04:44 +00:00
Craig Topper	ae1597d360	[X86] Add FeatureFastScalarShiftMasks and FeatureFastVectorShiftMasks to the ignore list for inlining compatibility. These are tuning flags and won't cause any codegen issue if we inline a function with a different value. llvm-svn: 360992	2019-05-17 06:40:21 +00:00
Fangrui Song	ad7199f3e6	[PowerPC] Support .reloc , R_PPC{,64}_NONE, This can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained. llvm-svn: 360990	2019-05-17 06:04:11 +00:00
Fangrui Song	e18a6ad0b8	[MC][PowerPC] Clean up PPCAsmBackend Replace the member variable Target with Triple Use Triple instead of TheTarget.getName() to dispatch on 32-bit/64-bit. Delete redundant parameters llvm-svn: 360986	2019-05-17 05:44:26 +00:00
Fangrui Song	2463239777	[X86] Support .reloc , R_{386,X86_64}_NONE, This can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained. See R_MIPS_NONE (D13659), R_ARM_NONE (D61992), R_AARCH64_NONE (D61973) for similar changes. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D62014 llvm-svn: 360983	2019-05-17 03:25:39 +00:00
Fangrui Song	aa6102ad8e	[AArch64] Support .reloc , R_AARCH64_NONE, Summary: This can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D61973 llvm-svn: 360981	2019-05-17 03:05:07 +00:00
Fangrui Song	43ca0e9eb8	[ARM] Support .reloc , R_ARM_NONE, R_ARM_NONE can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained. Add a generic MCFixupKind FK_NONE as this kind of no-op relocation is ubiquitous on ELF and COFF, and probably available on many other binary formats. See D62014. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D61992 llvm-svn: 360980	2019-05-17 02:51:54 +00:00
Jonas Paulsson	9427961c89	[SystemZ] Bugfix in SystemZTargetLowering::combineIntDIVREM() Make sure to not unroll a vector division/remainder (with a constant splat divisor) after type legalization, since the scalar type may then be illegal. Review: Ulrich Weigand https://reviews.llvm.org/D62036 llvm-svn: 360965	2019-05-17 00:50:35 +00:00
David L. Jones	4a5e01faa4	[X86][AsmParser] Add mnemonics missed in r360954. These are valid Jcc, but aren't based on the EFLAGS condition codes (Intel 64 and IA-32 Architetcures Software Developer's Manual Vol. 1, Appendix B). These are covered in clang/test, but not llvm/test. llvm-svn: 360960	2019-05-17 00:19:20 +00:00
David L. Jones	add7ed2281	[X86][AsmParser] Ignore "short" even harder in Intel syntax ASM. In Intel syntax, it's not uncommon to see a "short" modifier on Jcc conditional jumps, which indicates the offset should be a "short jump" (8-bit immediate offset from EIP, -128 to +127). This patch expands to all recognized Jcc condition codes, and removes the inline restriction. Clang already ignores "jmp short" in inline assembly. However, only "jmp" and a couple of Jcc are actually checked, and only inline (i.e., not when using the integrated assembler for asm sources). A quick search through asm-containing libraries at hand shows a pretty broad range of Jcc conditions spelled with "short." GAS ignores the "short" modifier, and instead uses an encoding based on the given immediate. MS inline seems to do the same, and I suspect MASM does, too. NASM will yield an error if presented with an out-of-range immediate value. Example of GCC 9.1 and MSVC v19.20, "jmp short" with offsets that do and do not fit within 8 bits: https://gcc.godbolt.org/z/aFZmjY Differential Revision: https://reviews.llvm.org/D61990 llvm-svn: 360954	2019-05-16 23:27:07 +00:00
David L. Jones	11305984d0	[X86][AsmParser] Rename "ConditionCode" variable to "ConditionPredicate". This better matches the verbiage in Intel documentation, and should help avoid confusion between these two different kinds of values, both of which are parsed from mnemonics. llvm-svn: 360953	2019-05-16 23:27:05 +00:00
Reid Kleckner	08c15df29f	[X86] Deduplicate symbol lowering logic, NFC Summary: This refactors four pieces of code that create SDNodes for references to symbols: - normal global address lowering (LEA, MOV, etc) - callee global address lowering (CALL) - external symbol address lowering (LEA, MOV, etc) - external symbol address lowering (CALL) Each of these pieces of code need to: - classify the reference - lower the symbol - emit a RIP wrapper if needed - emit a load if needed - add offsets if needed I think handling them all in one place will make the code easier to maintain in the future. Reviewers: craig.topper, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61690 llvm-svn: 360952	2019-05-16 23:15:26 +00:00
Craig Topper	f09b9d419f	[X86] Use 0x9 instead of 0x1 as the immediate in some masked floor pattern. Similarly change 0x2 to 0xA for ceil. This suppresses exceptions which is what we should be doing for ceil and floor. We already use the correct immediate in patterns without masking. llvm-svn: 360915	2019-05-16 16:53:50 +00:00
Matt Arsenault	99e6f4d11a	AMDGPU: Introduce TokenFactor for ABI register copies in call sequence The call was missing chain dependencies on the pre-call copies. I don't think this was causing any real issues however. llvm-svn: 360906	2019-05-16 15:10:27 +00:00
Matt Arsenault	df24c92c0f	AMDGPU: Assume xnack is enabled by default This is the conservatively correct default. It is always safe to assume xnack is enabled, but not the converse. Introduce a feature to blacklist targets where xnack can never be meaningfully enabled. I'm not sure the targets this is applied to is 100% correct. llvm-svn: 360903	2019-05-16 14:48:34 +00:00
Adhemerval Zanella	2d28db6b9f	[AArch64] Handle ISD::LROUND and ISD::LLROUND This patch optimizes ISD::LROUND and ISD::LLROUND to fcvtas instruction. It currently only handles the scalar version. llvm-svn: 360894	2019-05-16 13:30:18 +00:00
Adhemerval Zanella	73643b5041	[CodeGen] Add lround/llround builtins This patch add the ISD::LROUND and ISD::LLROUND along with new intrinsics. The changes are straightforward as for other floating-point rounding functions, with just some adjustments required to handle the return value being an interger. The idea is to optimize lround/llround generation for AArch64 in a subsequent patch. Current semantic is just route it to libm symbol. llvm-svn: 360889	2019-05-16 13:15:27 +00:00
Matt Arsenault	a8f88c388f	AMDGPU/GlobalISel: Correct regbank for 1-bit and/or/xor Bool values should use the scc/vcc regbank since r350611. llvm-svn: 360877	2019-05-16 12:06:41 +00:00
Cullen Rhodes	472c6ef8b0	[AArch64][SVE2] Asm: implement CMLA/SQRDCMLAH instructions Summary: This patch adds support for the indexed and unpredicated vectors forms of the CMLA and SQRDCMLAH instructions. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D61906 llvm-svn: 360871	2019-05-16 09:42:22 +00:00
Cullen Rhodes	07eba98dd7	[AArch64][SVE2] Asm: implement CDOT instruction Summary: The complex DOT instructions perform a dot-product on quadtuplets from two source vectors and the resuling wide real or wide imaginary is accumulated into the destination register. The instructions come in two forms: Vector form, e.g. cdot z0.s, z1.b, z2.b, #90 - complex dot product on four 8-bit quad-tuplets, accumulating results in 32-bit elements. The complex numbers in the second source vector are rotated by 90 degrees. cdot z0.d, z1.h, z2.h, #180 - complex dot product on four 16-bit quad-tuplets, accumulating results in 64-bit elements. The complex numbers in the second source vector are rotated by 180 degrees. Indexed form, e.g. cdot z0.s, z1.b, z2.b[3], #0 - complex dot product on four 8-bit quad-tuplets, with specified quadtuplet from second source vector, accumulating results in 32-bit elements. cdot z0.d, z1.h, z2.h[1], #0 - complex dot product on four 16-bit quad-tuplets, with specified quadtuplet from second source vector, accumulating results in 64-bit elements. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer, rovka Differential Revision: https://reviews.llvm.org/D61903 llvm-svn: 360870	2019-05-16 09:33:44 +00:00
Cullen Rhodes	064f6ab556	[AArch64][SVE2] Asm: add unpredicated integer multiply instructions Summary: Add support for the following instructions: * MUL (indexed and unpredicated vectors forms) * SQDMULH (indexed and unpredicated vectors forms) * SQRDMULH (indexed and unpredicated vectors forms) * SMULH (unpredicated, predicated form added in SVE) * UMULH (unpredicated, predicated form added in SVE) * PMUL (unpredicated) The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer, rovka Differential Revision: https://reviews.llvm.org/D61902 llvm-svn: 360867	2019-05-16 09:07:26 +00:00
Fangrui Song	3e92df3e39	Add Triple::isPPC64() llvm-svn: 360864	2019-05-16 08:31:22 +00:00
Craig Topper	e43bdf144c	[X86] Delay creating index register negations during address matching until after we know for sure the match will succeed If we're trying to match an LEA, its possible the LEA match will be deemed unprofitable. In which case the negation we created in matchAddress would be left dangling in the SelectionDAG. This could artificially increase use counts for other nodes in the DAG. Though I don't have an example of that. But it just seems like bad form to have dangling nodes in isel. Differential Revision: https://reviews.llvm.org/D61047 llvm-svn: 360823	2019-05-15 21:59:53 +00:00
Simon Atanasyan	0b0cc23fb6	[mips] Use range-based `for` loops. NFC llvm-svn: 360817	2019-05-15 21:26:25 +00:00
Mandeep Singh Grang	814435fe87	[AArch64] only indicate CFI on Windows if we emitted CFI Summary: Otherwise, we emit directives for CFI without any actual CFI opcodes to go with them, which causes tools to malfunction. The technique is similar to what the x86 backend already does. Fixes https://bugs.llvm.org/show_bug.cgi?id=40876 Patch by: froydnj (Nathan Froyd) Reviewers: mstorsjo, eli.friedman, rnk, mgrang, ssijaric Reviewed By: rnk Subscribers: javed.absar, kristof.beyls, llvm-commits, dmajor Tags: #llvm Differential Revision: https://reviews.llvm.org/D61960 llvm-svn: 360816	2019-05-15 21:23:41 +00:00
Craig Topper	439228727a	[X86] Strengthen type constraints on some specialized X86 ISD opcodes that don't have any flexibility. NFC These particular instructions only operate on 128-bit vectors and have no wider equivalents. And the element size is always known. One could argue that MOVSS/MOVSD could be merged, but that's probably disruptive to code in X86ISelLowering and probably low value. llvm-svn: 360815	2019-05-15 21:16:28 +00:00
Pete Couperus	1ca049959f	Uncomment LLVM_FALLTHROUGH. llvm-svn: 360798	2019-05-15 19:46:17 +00:00
Ryan Taylor	29257eb76c	[AMDGPU] Increases available SGPR for Calling Convention Summary: SGPR in CC can be either hw initialized or set by other chained shaders and so this increases the SGPR count availalbe to CC to 105. Change-Id: I3dfadc750fe4a3e2bd07117a2899fd13f3e2fef3 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61261 llvm-svn: 360778	2019-05-15 14:43:55 +00:00
David Green	0582b22f10	[ARM] Don't use the Machine Scheduler for cortex-m at minsize The new cortex-m schedule in rL360768 helps performance, but can increase the amount of high-registers used. This, on average, ends up increasing the codesize by a fair amount (because less instructions are converted from T2 to T1). On cortex-m at -Oz, where we are quite size-paranoid, it is better to use the existing DAG scheduler with the RegPressure scheduling preference (at least until the issues around T2 vs T1 instructions can be improved). I have also made sure that the Sched::RegPressure dag scheduler is always chosen for MinSize. The test shows one case where we increase the number of registers used. Differential Revision: https://reviews.llvm.org/D61882 llvm-svn: 360769	2019-05-15 12:58:02 +00:00
David Green	d2d0f46cd2	[ARM] Cortex-M4 schedule This patch adds a simple Cortex-M4 schedule, renaming the existing M3 schedule to M4 and filling in the latencies as-per the Cortex-M4 TRM: https://developer.arm.com/docs/ddi0439/latest Most of these are 1, with the important exception being loads taking 2 cycles. A few others are also higher, but I don't believe they make a large difference. I've repurposed the M3 schedule as the latencies are mostly the same between the two cores, with the M4 having more FP and DSP instructions. We also turn on MISched and UseAA for the cores that now use this. It also adds some schedule Write's to various instruction to make things simpler. Differential Revision: https://reviews.llvm.org/D54142 llvm-svn: 360768	2019-05-15 12:41:58 +00:00
Simon Atanasyan	4c68c5ae71	[mips] LLVM and GAS now use same instructions for CFA Definition. NFCI LLVM previously used `DW_CFA_def_cfa` instruction in .eh_frame to set the register and offset for current CFA rule. We change it to `DW_CFA_def_cfa_register` which is the same one used by GAS that only changes the register but keeping the old offset. Patch by Mirko Brkusanin. Differential Revision: https://reviews.llvm.org/D61899 llvm-svn: 360765	2019-05-15 12:05:27 +00:00
Craig Topper	384d46c0d5	[X86] Use OR32mi8Locked instead of LOCK_OR32mi8 in emitLockedStackOp. They encode the same way, but OR32mi8Locked sets hasUnmodeledSideEffects set which should be stronger than the mayLoad/mayStore on LOCK_OR32mi8. I think this makes sense since we are using it as a fence. This also seems to hide the operation from the speculative load hardening pass so I've reverted r360511. llvm-svn: 360747	2019-05-15 04:15:46 +00:00
Philip Reames	658cad1287	[NFC] Reuse a helper function to eliminate duplicate code llvm-svn: 360740	2019-05-15 01:39:07 +00:00
Richard Trieu	5f7d4ab5f9	[XCore] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360738	2019-05-15 01:28:30 +00:00
Richard Trieu	0116385452	[X86] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360736	2019-05-15 01:17:58 +00:00
Richard Trieu	c6c421379d	[WebAssembly] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360735	2019-05-15 01:03:00 +00:00
Richard Trieu	1e6f98b89d	[SystemZ] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360734	2019-05-15 00:46:18 +00:00
Richard Trieu	cf82d4a483	[Sparc] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360733	2019-05-15 00:35:37 +00:00
Richard Trieu	51fc56d603	[RISCV] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360732	2019-05-15 00:24:15 +00:00
Richard Trieu	ee6ced196d	[PowerPC] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360731	2019-05-15 00:09:58 +00:00
Richard Trieu	e8f83befd5	[NVPTX] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360729	2019-05-14 23:56:18 +00:00
Richard Trieu	a57ce32eff	[MSP430] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360728	2019-05-14 23:45:18 +00:00
Richard Trieu	313b78150c	[Mips] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360727	2019-05-14 23:34:37 +00:00
Richard Trieu	2e50dc78c5	[Lanai] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360726	2019-05-14 23:17:18 +00:00
Richard Trieu	7ef172998b	[Hexagon] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360724	2019-05-14 23:04:55 +00:00
Richard Trieu	a68ee931e6	[BPF] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360722	2019-05-14 22:54:06 +00:00
Richard Trieu	e982b42003	[AVR] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360721	2019-05-14 22:41:58 +00:00
Philip Reames	445f942fc4	Use an offset from TOS for idempotent rmw locked op lowering This was the portion split off D58632 so that it could follow the redzone API cleanup. Note that I changed the offset preferred from -8 to -64. The difference should be very minor, but I thought it might help address one concern which had been previously raised. Differential Revision: https://reviews.llvm.org/D61862 llvm-svn: 360719	2019-05-14 22:32:42 +00:00
Richard Trieu	f3011b9b10	[ARM] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360718	2019-05-14 22:29:50 +00:00
Richard Trieu	7f9a008a2d	[ARC] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360716	2019-05-14 22:06:04 +00:00
Richard Trieu	8ce2ee9d56	[AMDGPU] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360713	2019-05-14 21:54:37 +00:00
Richard Trieu	b26592e04d	[AArch64] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360709	2019-05-14 21:33:53 +00:00
Dmitry Preobrazhensky	ee51d851ea	[AMDGPU][GFX8][GFX9] Corrected predicate of v_*_co_u32 aliases Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D61905 llvm-svn: 360702	2019-05-14 19:16:24 +00:00
Stanislav Mekhanoshin	05791d90c9	[AMDGPU] Fixed handling of imemdiate i1 literals This bug was exposed by the rL360395. Differential Revision: https://reviews.llvm.org/D61812 llvm-svn: 360689	2019-05-14 16:18:00 +00:00
Tim Renouf	33cb8f5b54	[AMDGPU] Fixed +DumpCode The +DumpCode attribute is a horrible hack in AMDGPU to embed the disassembly of the generated code into the elf file. It is used by LLPC to implement an extension that allows the application to read back the disassembly of the code. Longer term, we should re-implement that by using the LLVM disassembler from the Vulkan driver. Recent LLVM changes broke +DumpCode. With -filetype=asm it crashed, and with -filetype=obj I think it did not include any instructions, only the labels. Fixed with this commit: now it has no effect with -filetype=asm, and works as intended with -filetype=obj. Differential Revision: https://reviews.llvm.org/D60682 Change-Id: I6436d86fe2ea220d74a643a85e64753747c9366b llvm-svn: 360688	2019-05-14 16:17:14 +00:00
Simon Pilgrim	c2d9cfd925	[X86] Disable shouldFoldConstantShiftPairToMask for scalar shifts on AMD targets (PR40758) D61068 handled vector shifts, this patch does the same for scalars where there are similar number of pipes for shifts as bit ops - this is true almost entirely for AMD targets where the scalar ALUs are well balanced. This combine avoids AND immediate mask which usually means we reduce encoding size. Some tests show use of (slow, scaled) LEA instead of SHL in some cases, but thats due to particular shift immediates - shift+mask generate these just as easily. Differential Revision: https://reviews.llvm.org/D61830 llvm-svn: 360684	2019-05-14 15:21:28 +00:00
Cullen Rhodes	3b917019a5	[AArch64][SVE2] Asm: add SQRDMLAH/SQRDMLSH instructions Summary: This patch adds support for the indexed and unpredicated vectors forms of the SQRDMLAH and SQRDMLSH instructions. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D61515 llvm-svn: 360683	2019-05-14 15:10:16 +00:00
Cullen Rhodes	e029da46e6	[AArch64][SVE2] Asm: add integer multiply-add/subtract (indexed) instructions Summary: This patch adds support for the following instructions: MLA mul-add, writing addend (Zda = Zda + Zn * Zm[idx]) MLS mul-sub, writing addend (Zda = Zda + -Zn * Zm[idx]) Predicated forms of these instructions were added in SVE. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D61514 llvm-svn: 360682	2019-05-14 15:01:00 +00:00
Lei Huang	22561972af	[PowerPC] Custom lower known CR bit spills For known CRBit spills, CRSET/CRUNSET, it is more efficient to load and spill the known value instead of extracting the bit. eg. This sequence is currently used to spill a CRUNSET: crclr 4*cr5+lt mfocrf r3,4 rlwinm r3,r3,20,0,0 stw r3,132(r1) This patch custom lower it to: li r3,0 stw r3,132(r1) Differential Revision: https://reviews.llvm.org/D61754 llvm-svn: 360677	2019-05-14 14:27:06 +00:00
Simon Pilgrim	2747ee2c83	[X86] X86TargetLowering::LowerINTRINSIC_WO_CHAIN - ensure rounding control is initialized. NFCI. Fixes scan-build warnings llvm-svn: 360664	2019-05-14 11:30:39 +00:00
Tim Northover	ff6875acd9	AArch64: support binutils-like things on arm64_32. This adds support for the arm64_32 watchOS ABI to LLVM's low level tools, teaching them about the specific MachO choices and constants needed to disassemble things. llvm-svn: 360663	2019-05-14 11:25:44 +00:00
Philip Reames	3098e44daa	[X86] Prefer locked stack op over mfence for seq_cst 64-bit stores on 32-bit targets This is a follow on to D58632, with the same logic. Given a memory operation which needs ordering, but doesn't need to modify any particular address, prefer to use a locked stack op over an mfence. Differential Revision: https://reviews.llvm.org/D61863 llvm-svn: 360649	2019-05-14 04:43:37 +00:00
Sanjay Patel	3a13d970aa	[SDAG, x86] allow targets to override test for binop opcodes This follows the pattern of the existing isCommutativeBinOp(). x86 shows improvements from vector narrowing for the min/max opcodes. llvm-svn: 360639	2019-05-14 00:39:40 +00:00
Craig Topper	e2966473dd	[X86] Use ISD::MERGE_VALUES to return from lowerAtomicArith instead of calling ReplaceAllUsesOfValueWith and returning SDValue(). Returning SDValue() makes the caller think that nothing happened and it will end up executing the Expand path. This generates extra nodes that will need to be pruned as dead code. Returning an ISD::MERGE_VALUES will tell the caller that we'd like to make a change and it will take care of replacing uses. This will prevent falling into the Expand path. llvm-svn: 360627	2019-05-13 22:17:13 +00:00
Craig Topper	5f999c2bea	[X86] Various type corrections to the code that creates LOCK_OR32mi8/OR32mi8Locked to the stack for idempotent atomic rmw and atomic fence. These are updates to match how isel table would emit a LOCK_OR32mi8 node. -Use i32 for the immediate zero even though only 8 bits are encoded. -Use i16 for segment register. -Use LOCK_OR32mi8 for idempotent atomic operations in 32-bit mode to match 64-bit mode. I'm not sure why OR32mi8Locked and LOCK_OR32mi8 both exist. The only difference seems to be that OR32mi8Locked is marked as UnmodeledSideEffects=1. -Emit an extra i32 result for the flags output. I don't know if the types here really matter just noticed it was inconsistent with normal behavior. llvm-svn: 360619	2019-05-13 21:01:24 +00:00
Nikita Popov	323dc634b9	[WebAssembly] Don't assume that zext/sext result is i32/i64 in fast isel (PR41841) Usually this will abort fast-isel at the instruction using the non-legal result, but if the only use is in a different basic block, we'll incorrectly assume that the zext/sext is to i32 (rather than i128 in this case). Differential Revision: https://reviews.llvm.org/D61823 llvm-svn: 360616	2019-05-13 19:40:18 +00:00
Stanislav Mekhanoshin	79b2828b3f	[AMDGPU] Reorder includes per coding standard. NFC. llvm-svn: 360609	2019-05-13 18:05:10 +00:00
Stanislav Mekhanoshin	21088639ae	[AMDGPU] Remove now unused V2FP16_ONE constant def. NFC. llvm-svn: 360608	2019-05-13 17:52:57 +00:00
Robert Lougher	91a9d4ef4b	Revert [X86] Avoid SFB - Fix inconsistent codegen with/without debug info Revert r360436 as it is causing clang-x64-windows-msvc buildbot to fail. llvm-svn: 360606	2019-05-13 17:36:46 +00:00
Nick Desaulniers	c33f754e74	[TargetLowering] Handle multi depth GEPs w/ inline asm constraints Summary: X86TargetLowering::LowerAsmOperandForConstraint had better support than TargetLowering::LowerAsmOperandForConstraint for arbitrary depth getelementpointers for "i", "n", and "s" extended inline assembly constraints. Hoist its support from the derived class into the base class. Link: https://github.com/ClangBuiltLinux/linux/issues/469 Reviewers: echristo, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D61560 llvm-svn: 360604	2019-05-13 17:27:44 +00:00
Simon Pilgrim	73aee29095	[X86][SSE] LowerBuildVectorv4x32 - don't insert MOVQ for undef elts Fixes the regression noted in D61782 where a VZEXT_MOVL was being inserted because we weren't discriminating between 'zeroable' and 'all undef' for the upper elts. Differential Revision: https://reviews.llvm.org/D61782 llvm-svn: 360596	2019-05-13 16:10:11 +00:00
Simon Pilgrim	cf5a8eb7cd	[X86][SSE] Relax use limits for lowerAddSubToHorizontalOp (PR32433) Now that we can use HADD/SUB for scalar additions from any pair of extracted elements (D61263), we can relax the one use limit as we will be able to merge multiple uses into using the same HADD/SUB op. This exposes a couple of missed opportunities in LowerBuildVectorv4x32 which will be committed separately. Differential Revision: https://reviews.llvm.org/D61782 llvm-svn: 360594	2019-05-13 16:02:45 +00:00
Simon Pilgrim	d9aa928603	[X86] Add SimplifyDemandedBits support for PEXTRB/PEXTRW (PR39709) Test case will be included in a followup - its being used but its tricky to show a case that isn't caught at a later stage anyway. llvm-svn: 360588	2019-05-13 15:31:27 +00:00
Cullen Rhodes	6dcef8fc0c	[AArch64][SVE2] Add SVE2 target features to backend and TargetParser Summary: This patch adds the following features defined by Arm SVE2 architecture extension: sve2, sve2-aes, sve2-sm4, sve2-sha3, bitperm For existing CPUs these features are declared as unsupported to prevent scheduler errors. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewers: SjoerdMeijer, sdesmalen, ostannard, rovka Reviewed By: SjoerdMeijer, rovka Subscribers: rovka, javed.absar, tschuett, kristof.beyls, kristina, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61513 llvm-svn: 360573	2019-05-13 10:10:24 +00:00
Ulrich Weigand	8e42f6ddc8	[SystemZ] Model floating-point control register This adds the FPC (floating-point control register) as a reserved physical register and models its use by SystemZ instructions. Note that only the current rounding modes and the IEEE exception masks are modeled. Changes of the FPC due to exceptions (in particular the IEEE exception flags and the DXC) are not modeled. At this point, this patch is mostly NFC, but it will prevent scheduling of floating-point instructions across SPFC/LFPC etc. llvm-svn: 360570	2019-05-13 09:47:26 +00:00
Sam Parker	a33e311a3b	[ARM][ParallelDSP] Relax alias checks When deciding the safety of generating smlad, we checked for any writes within the block that may alias with any of the loads that need to be widened. This is overly conservative because it only matters when there's a potential aliasing write to a location accessed by a pair of loads. Now we check for aliasing writes only once, during setup. If two loads are found to have an aliasing write between them, we don't add these loads to LoadPairs. This means that later during the transform, we can safely widened a pair without worrying about aliasing. However, to maintain correctness, we also need to change the way that wide loads are inserted because the order is now important. The MatchSMLAD method has also been changed, absorbing MatchReductions and AddMACCandidate to hopefully improve readability. Differential Revision: https://reviews.llvm.org/D6102 llvm-svn: 360567	2019-05-13 09:23:32 +00:00
Fangrui Song	f3be557159	[WebAssembly] Add dependency on WebAssemblyDesc to fix BUILD_SHARED_LIBS=on builds after rL360550 This fixes the link error ld.lld: error: undefined symbol: llvm::WebAssembly::anyTypeToString(unsigned int) >>> referenced by WebAssemblyDisassembler.cpp llvm-svn: 360558	2019-05-13 05:51:39 +00:00
Yonghong Song	98fe9c9869	[BPF] emit BTF sections only if debuginfo available Currently, without -g, BTF sections may still be emitted with data sections, e.g., for linux kernel bpf selftest test_tcp_check_syncookie_kern.c issue discovered by Martin as shown below. -bash-4.4$ bpftool btf dump file test_tcp_check_syncookie_kern.o [1] VAR 'results' type_id=0, linkage=global-alloc [2] VAR '_license' type_id=0, linkage=global-alloc [3] DATASEC 'license' size=0 vlen=1 type_id=2 offset=0 size=4 [4] DATASEC 'maps' size=0 vlen=1 type_id=1 offset=0 size=28 Let disable BTF generation if no debuginfo, which is the original design. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61826 llvm-svn: 360556	2019-05-13 05:00:23 +00:00
Craig Topper	61e556d2bd	Recommit r358887 "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling" I've included a new fix in X86RegisterInfo to prevent PR41619 without reintroducing r359392. We might be able to improve that in the base class implementation of shouldRewriteCopySrc somehow. But this hopefully enables forward progress on SimplifyDemandedBits improvements for now. Original commit message: This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly. The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGComb but it caused a lot of noise on other targets - some improvements, some regressions. The X86 changes are all definite wins. llvm-svn: 360552	2019-05-13 04:03:35 +00:00
David L. Jones	a263aa25e1	[WebAssembly] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360550	2019-05-13 03:32:41 +00:00
Simon Pilgrim	a7fc763082	[X86][AVX] Split VZEXT_MOVL ymm/zmm if the upper elements are not demanded. Removes unnecessary vzeroupper noted in D61806 llvm-svn: 360543	2019-05-12 15:16:29 +00:00
Simon Pilgrim	fda6bffd3b	[X86][SSE] SimplifyDemandedBits - call PEXTRB/PEXTRW SimplifyDemandedVectorElts as well. See if we can simplify the demanded vector elts from the extraction before trying to simplify the demanded bits. This helps us with target shuffles and hops in particular. llvm-svn: 360535	2019-05-11 21:35:50 +00:00
Simon Pilgrim	6b10fde69b	[CostModel][X86] Add min/max reduction costs for all SSE targets The original costs stopped at SSE42, I've added conservative estimates for everything down to SSE1/SSE2 and moved some of the SSE42 costs to SSE41 (really only the addition of PCMPGT makes any difference). I've also added missing vXi8 costs (we use PHMINPOSUW for i8/i16 for scarily quick results) and 256-bit vector costs for AVX1. llvm-svn: 360528	2019-05-11 17:12:52 +00:00
Simon Pilgrim	e4c5b6d9bd	[X86][SSE] Add SimplifyDemandedVectorElts HADD/HSUB handling. Still missing PHADDW/PHSUBW tests because PEXTRW doesn't call SimplifyDemandedVectorElts llvm-svn: 360526	2019-05-11 16:07:12 +00:00
Simon Pilgrim	5e0f92acad	FixupLEAPass::fixupIncDec - non-LEA opcodes should not happen here. NFCI. Matches what we do in other functions and fixes scan-build warning about uninitialized NewOpcode variable. llvm-svn: 360525	2019-05-11 16:02:34 +00:00
Craig Topper	c9d7484aa3	[X86] Add CMOV_FR32X/CMOV_FR64X pseudo instructions. Use them in fast isel to fix a machine verifier error after adding test cases. Fast isel picks the FR32X/FR64X register classes when lowering pseudo select, but it didn't have the right opcode to go with it. llvm-svn: 360524	2019-05-11 16:00:28 +00:00
Craig Topper	74a436596d	[X86] Sink some fast isel code into the only if that uses it. NFC llvm-svn: 360523	2019-05-11 16:00:19 +00:00
Craig Topper	26f2b13a65	[X86] Use TLI.getRegClassFor to simplify some more fast isel code. NFCI llvm-svn: 360522	2019-05-11 16:00:13 +00:00
Simon Pilgrim	e7c51137aa	HexagonConstEvaluator::evaluateHexExt - check incoming opcodes. NFCI. Only certain extension opcodes are supported - fixes scan build warning. llvm-svn: 360520	2019-05-11 15:24:34 +00:00
Craig Topper	682cc09675	[X86] Use getRegClassFor to simplify some code in fast isel. NFCI No need to select the register class based on type and features. It should already be setup by X86ISelLowering. llvm-svn: 360513	2019-05-11 05:18:58 +00:00
Craig Topper	31f7adb94f	[X86] Don't emit MOVNTDQA loads from fast-isel without SSE4.1. We were checking for SSE4.1 for FP types, but not integer 128-bit types. Fixes PR41837. llvm-svn: 360512	2019-05-11 04:19:33 +00:00
Craig Topper	bdef12df8d	[X86] Add a test case for idempotent atomic operations with speculative load hardening. Fix an additional issue found by the test. This test covers the fix from r360475 as well. llvm-svn: 360511	2019-05-11 04:00:27 +00:00
Richard Trieu	d0124bd762	[SystemZ] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360510	2019-05-11 03:36:16 +00:00
Richard Trieu	03fe9d82c4	[Sparc] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360506	2019-05-11 02:59:02 +00:00
Richard Trieu	00ecf67045	[RISCV] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure llvm-svn: 360505	2019-05-11 02:43:58 +00:00
Richard Trieu	4bdb136b0f	[PowerPC] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360502	2019-05-11 02:33:18 +00:00
Richard Trieu	4b620fcf0f	[NVPTX] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360500	2019-05-11 02:09:13 +00:00
Richard Trieu	61fb6700a5	[MSP430] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360498	2019-05-11 01:58:52 +00:00
Richard Trieu	fa29bee9d0	[Mips] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360497	2019-05-11 01:38:56 +00:00
Richard Trieu	4c3890ddbf	[Lanai] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360496	2019-05-11 01:25:58 +00:00
Richard Trieu	48803aa65c	[BPF] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360494	2019-05-11 01:13:21 +00:00
Richard Trieu	bf9e67b5b9	[AVR] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360493	2019-05-11 01:03:03 +00:00
Richard Trieu	5e3ee4b84e	[ARM] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360490	2019-05-11 00:34:07 +00:00
Richard Trieu	dcf1ea08e5	[ARC] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360488	2019-05-11 00:13:01 +00:00
Richard Trieu	c0bd7bd481	[AMDGPU] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360487	2019-05-11 00:03:35 +00:00
Richard Trieu	7ba0605511	[AArch64] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360486	2019-05-10 23:50:01 +00:00
Richard Trieu	f48ef2f2ba	[XCore] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360485	2019-05-10 23:36:49 +00:00
Richard Trieu	b28b8b7724	[X86] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360484	2019-05-10 23:24:38 +00:00
Philip Reames	849ef823df	Factor out redzone ABI checks [NFCI] As requested in D58632, cleanup our red zone detection logic in the X86 backend. The existing X86MachineFunctionInfo flag is used to track whether we use the redzone (via a particularly optimization?), but there's no common way to check whether the function has a red zone. I'd appreciate careful review of the uses being updated. I think they are NFC, but a careful eye from someone else would be appreciated. Differential Revision: https://reviews.llvm.org/D61799 llvm-svn: 360479	2019-05-10 22:55:42 +00:00
Craig Topper	df10cc6068	[X86] Disable speculative load hardening for operations with an explicit RSP base. After D58632, we can create idempotent atomic operations to the top of stack. This confused speculative load hardening because it thinks accesses should have virtual register base except for the cases it already excluded. This commit adds a new exclusion for this case. I'll try to reduce a test case for this, but this fix was verified to work by the reporter. This should avoid needing to revert D58632. llvm-svn: 360475	2019-05-10 22:03:33 +00:00
Mircea Trofin	ff3bed0e61	Skip over prefetches Summary: Skip over prefetches when assigning debug info to instructions with memory operands. This way, the debug info is stable after instrumenting a binary with prefetches, allowing for iterative profiling and instrumentation. Reviewers: davidxl Reviewed By: davidxl Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61789 llvm-svn: 360471	2019-05-10 21:27:55 +00:00
Robert Lougher	986b6b86bb	[X86] Avoid SFB - Fix inconsistent codegen with/without debug info Fixes https://bugs.llvm.org/show_bug.cgi?id=40969 The functions findPotentiallyBlockedCopies and buildCopy are currently not accounting for the presence of debug instructions. In the former this results in the optimization not being trigerred, and in the latter results in inconsistent codegen. This patch enables the optimization to be performed in a debug build and ensures the codegen is consistent with non-debug builds. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D61680 llvm-svn: 360436	2019-05-10 15:55:06 +00:00
Simon Pilgrim	a0b1518a4a	[X86][SSE] Add getHopForBuildVector vector splitting If we only use the lower xmm of a ymm hop, then extract the xmm's (for free), perform the xmm hop and then insert back into a ymm (for free). Fixes some of the regressions noted in D61782 llvm-svn: 360435	2019-05-10 15:46:04 +00:00
Lei Huang	1ac6e9636c	[PowerPC] custom lower `v2f64 fpext v2f32` Reduces scalarization overhead via custom lowering of v2f64 fpext v2f32. eg. For the following IR %0 = load <2 x float>, <2 x float>* %Ptr, align 8 %1 = fpext <2 x float> %0 to <2 x double> ret <2 x double> %1 Pre custom lowering: ld r3, 0(r3) mtvsrd f0, r3 xxswapd vs34, vs0 xscvspdpn f0, vs0 xxsldwi vs1, vs34, vs34, 3 xscvspdpn f1, vs1 xxmrghd vs34, vs0, vs1 After custom lowering: lfd f0, 0(r3) xxmrghw vs0, vs0, vs0 xvcvspdp vs34, vs0 Differential Revision: https://reviews.llvm.org/D57857 llvm-svn: 360429	2019-05-10 14:04:06 +00:00
Sam Clegg	ea38ac5ba3	[WebAssembly] Don't assume that strongly defined symbols are DSO-local The current PIC model for WebAssembly is more like ELF in that it allows symbol interposition. This means that more functions end up being addressed via the GOT and fewer directly added to the wasm table. One effect is a reduction in the number of wasm table entries similar to the previous attempt in https://reviews.llvm.org/D61539 which was reverted. Differential Revision: https://reviews.llvm.org/D61772 llvm-svn: 360402	2019-05-10 01:52:08 +00:00
Sam Clegg	2147365484	[WebAssembly] Remove friend18.C from list of known gcc torture test failures. NFC. Differential Revision: https://reviews.llvm.org/D61775 llvm-svn: 360401	2019-05-10 01:45:34 +00:00
Mircea Trofin	5c31c05fbd	[llvm] X86DiscriminateMemOps: insert debug info when missing Reviewers: davidxl Reviewed By: davidxl Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61735 llvm-svn: 360396	2019-05-10 00:12:51 +00:00
Stanislav Mekhanoshin	64196850f0	[AMDGPU] Pattern for v_xor3_b32 This also allows three op patterns to use increased constant bus limit of GFX10. Differential Revision: https://reviews.llvm.org/D61763 llvm-svn: 360395	2019-05-10 00:09:01 +00:00
Philip Reames	bd588dfd59	[X86] Improve lowering of idemptotent RMW operations The current lowering uses an mfence. mfences are substaintially higher latency than the locked operations originally requested, but we do want to avoid contention on the original cache line. As such, use a locked instruction on a cache line assumed to be thread local. Differential Revision: https://reviews.llvm.org/D58632 llvm-svn: 360393	2019-05-09 23:23:42 +00:00
Bill Wendling	6ee7f31484	Add ".dword" directive Summary: The ".dword" directive is a synonym for ".xword" and is used used by klibc, a minimalistic libc subset for initramfs. Reviewers: t.p.northover, nickdesaulniers Reviewed By: nickdesaulniers Subscribers: nickdesaulniers, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61719 llvm-svn: 360381	2019-05-09 21:57:44 +00:00
Stanislav Mekhanoshin	a76da34b1d	[AMDGPU] gfx1010 v_interp_* instructions Differential Revision: https://reviews.llvm.org/D61703 llvm-svn: 360364	2019-05-09 18:38:55 +00:00
Simon Pilgrim	93bfa5af48	[X86][SSE] Fold add(shuffle(),shuffle()) to hadd on 'slow' targets (PR39920) As reported on PR39920, "slow horizontal ops" targets tend to internally expand to 2shuffle+add/sub - so if we can reduce 2shuffle+add/sub to a hadd/sub then we should do it - similar port usage but reduced instruction count. This works out in most cases, although the "PR22377" regression in vector-shuffle-combining.ll is annoying - going from 2shuffle+add+shuffle to hadd+2shuffle - I've opened PR41813 to cover this. Differential Revision: https://reviews.llvm.org/D61308 llvm-svn: 360360	2019-05-09 17:45:01 +00:00
Stanislav Mekhanoshin	4d4c9e0757	[AMDGPU] gfx1010 changes for PAL metadata Differential Revision: https://reviews.llvm.org/D61704 llvm-svn: 360353	2019-05-09 16:34:13 +00:00
Roman Lebedev	9db0e72570	[X86] AMD Piledriver (BdVer2): major cleanup (mainly inverse throughput) I've started this cleanup more several times now, but got sidetracked elsewhere, e.g. by llvm-exegesis problems. Not this time, finally! This is mainly cleaning up the inverse throughput values, and a few latencies/uops, based on the llvm-exegesis measured values. Though this is not complete by any means, there's certainly more cleanup to be done. The performance numbers (i've only checked by RawSpeed benchmark) aren't really surprising - overall this slightly (< -1%) improves perf. llvm-svn: 360341	2019-05-09 13:54:51 +00:00
Sam Parker	d7b650cc72	[ARM][CGP] Guard against signext args and sitofp Add an Argument that has the SExtAttr attached, as well as SIToFP instructions, as values that generate sign bits. SIToFP doesn't strictly do this and could be treated as a sink to be sign-extended. Differential Revision: https://reviews.llvm.org/D61381 llvm-svn: 360331	2019-05-09 11:56:16 +00:00
Diana Picus	3531453371	[ARM GlobalISel] Map DBG_VALUE for types != s32 ...and make sure we fail elegantly for unsupported values. s64 goes into DPR, anything <= 32 into GPR. llvm-svn: 360321	2019-05-09 09:49:36 +00:00
Hans Wennborg	b1b09e5b55	X86WinAllocaExpander: Drop code looking through register copies (PR41786) This code was never covered by tests, in PR41786 it was pointed out that the deletion part doesn't work, and in a full Chrome build I was never able to hit the code path that looks through copies. It seems the situation it's supposed to handle doesn't actually come up in practice. Delete it to simplify the code. Differential revision: https://reviews.llvm.org/D61671 llvm-svn: 360320	2019-05-09 09:22:56 +00:00
Matt Arsenault	462403a5c8	AMDGPU: Mark scheduler classes as final llvm-svn: 360294	2019-05-08 22:10:04 +00:00
Matt Arsenault	01434f9377	AMDGPU: Select VOP3 form of add The VOP3 form should always be the preferred selection, to be shrunk later. This should only be an optimization issue, but this partially works around a problem from clobbering VCC when SIFixSGPRCopies rewrites an SCC defining operation directly to VCC. 3 of the testcases are regressions from failing to fold the immediate in cases it should. These can be avoided by improving the VCC liveness handling in SIFoldOperands. Simply increasing the threshold to computeRegisterLiveness works, although this is common enough that VCC liveness should probably be tracked throughout the pass. The hack of leaving behind an implicit_def instruction to avoid breaking iterator wastes instruction count, which inhibits finding the VCC def in long chains of adds. Doing this however exposes different, worse looking regressions from poor scheduling behavior. This could probably be avoided around by forcing the shrink of the addc here, but the scheduler should probably be fixed. The r600 add test needs to be split out because it asserts on the arguments in the new test during the calling convention lowering. llvm-svn: 360293	2019-05-08 22:09:57 +00:00
Stanislav Mekhanoshin	1dbf721315	[AMDGPU] gfx1010 exp modifications Differential Revision: https://reviews.llvm.org/D61701 llvm-svn: 360287	2019-05-08 21:23:37 +00:00

... 7 8 9 10 11 ...

52710 Commits