llvm-project

Commit Graph

Author	SHA1	Message	Date
Krzysztof Parzyszek	2c4231d888	[Hexagon] Fix division by zero in machine scheduler llvm-svn: 327980	2018-03-20 13:28:46 +00:00
Alex Bradbury	80c8eb7696	[RISCV] Add codegen for RV32F floating point load/store As part of this, add support for load/store from the constant pool. This is used to materialise f32 constants. llvm-svn: 327979	2018-03-20 13:26:12 +00:00
Alex Bradbury	76c29ee815	[RISCV] Add codegen for RV32F arithmetic and conversion operations Currently, only a soft floating point ABI is supported. llvm-svn: 327976	2018-03-20 12:45:35 +00:00
Krzysztof Parzyszek	dca383123f	[Hexagon] Improve scheduling based on register pressure Patch by Brendon Cahoon. llvm-svn: 327975	2018-03-20 12:28:43 +00:00
Simon Pilgrim	4a83f802cc	[X86][SandyBridge] Merge multiple InstrRW entries that map to the same SchedWriteRes group (NFCI) (PR35955) I've also merged some VEX/non-VEX instregex strings with a (V?) prefix - there are still a lot more of these to do. llvm-svn: 327974	2018-03-20 12:26:55 +00:00
Martin Storsjo	802b434156	[X86] Properly implement the calling convention for f80 for mingw/x86_64 In these cases, both parameters and return values are passed as a pointer to a stack allocation. MSVC doesn't use the f80 data type at all, while it is used for long doubles on mingw. Normally, this part of the calling convention is handled within clang, but for intrinsics that are lowered to libcalls, it may need to be handled within llvm as well. Differential Revision: https://reviews.llvm.org/D44592 llvm-svn: 327957	2018-03-20 06:19:38 +00:00
Craig Topper	ad7c685791	[X86] Rename MOVSX32_NOREXrr8 to MOVSX32rr8_NOREX so that the scheduler model regular expressions will pick it up with the regular version. Do the same for MOVSX32_NOREXrm8, MOVZX32_NOREXrr8, and MOVZX32_NOREXrm8 llvm-svn: 327948	2018-03-20 05:00:20 +00:00
Craig Topper	4778fa7e8a	[X86] Fix the SchedRW for memory forms of CMP and TEST. They were incorrectly marked as RMW operations. Some of the CMP instrucions worked, but the ones that use a similar encoding as RMW form of ADD ended up marked as RMW. TEST used the same tablegen class as some of the CMPs. llvm-svn: 327947	2018-03-20 03:55:17 +00:00
Craig Topper	3e9462607e	[X86] Add TEST16mi/TEST32mi/TEST64mi32 to the Sandybridge/Haswell/Broadwell/Skylake scheduler models. Move it from a load+store group on SNB to a load only group, the same group as CMP. llvm-svn: 327944	2018-03-20 03:02:03 +00:00
Craig Topper	7c90e29cf8	[X86] Add ROR/ROL/SHR/SAR by 1 instructions to the Sandy Bridge scheduler model. I assume these match the generic immediate version like they do in the other models. llvm-svn: 327943	2018-03-20 03:01:59 +00:00
Shiva Chen	cbd498ac10	[RISCV] Preserve stack space for outgoing arguments when the function contain variable size objects E.g. bar (int x) { char p[x]; push outgoing variables for foo. call foo } We need to generate stack adjustment instructions for outgoing arguments by eliminateCallFramePseudoInstr when the function contains variable size objects to avoid outgoing variables corrupt the variable size object. Default hasReservedCallFrame will return !hasFP(). We don't want to generate extra sp adjustment instructions when hasFP() return true, So We override hasReservedCallFrame as !hasVarSizedObjects(). Differential Revision: https://reviews.llvm.org/D43752 llvm-svn: 327938	2018-03-20 01:39:17 +00:00
Craig Topper	2330d6cd55	[X86] Fix the SNB scheduler for BLENDVB. PBLENDVBrr0 was with the memory version of VBLENDVB and PBLENDVBrm0 was missing. llvm-svn: 327937	2018-03-20 01:30:21 +00:00
Jessica Paquette	563548d8f3	[MachineOutliner] AArch64: Emit CFI instructions when outlining calls When outlining calls, the outliner needs to update CFI to ensure that, say, exception handling works. This commit adds that functionality and adds a test just for call outlining. Call outlining stuff in machine-outliner.mir should be moved into machine-outliner-calls.mir in a later commit. llvm-svn: 327917	2018-03-19 22:48:40 +00:00
Craig Topper	ab6076514d	[X86] Simplify the AVX512 code in LowerTruncate a little. We don't need to create an ISD::TRUNCATE node to return, we started with one and can return it. Also remove the call to getExtendInVec, the result is just going to be a getNode of that value passed in. llvm-svn: 327914	2018-03-19 21:58:02 +00:00
Craig Topper	3b967466d5	[X86] Replace a couple calls to getExtendInVec with getNode and the appropriate target independent EXTEND_VECTOR_INREG opcode. llvm-svn: 327899	2018-03-19 20:20:22 +00:00
Nirav Dave	3264c1bdf6	[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172" Reland ISel cycle checking improvements after simplifying node id invariant traversal and correcting typo. llvm-svn: 327898	2018-03-19 20:19:46 +00:00
Martin Storsjo	9a55c1b0dc	[ARM, AArch64] Check the no-stack-arg-probe attribute for dynamic stack probes This extends the use of this attribute on ARM and AArch64 from SVN r325900 (where it was only checked for fixed stack allocations on ARM/AArch64, but for all stack allocations on X86). This also adds a testcase for the existing use of disabling the fixed stack probe with the attribute on ARM and AArch64. Differential Revision: https://reviews.llvm.org/D44291 llvm-svn: 327897	2018-03-19 20:06:50 +00:00
Lei Huang	ecfede94a7	[Power9]Legalize and emit code for quad-precision copySign/abs/nabs/neg/sqrt Legalize and emit code for quad-precision floating point operations: * xscpsgnqp * xsabsqp * xsnabsqp * xsnegqp * xssqrtqp Differential Revision: https://reviews.llvm.org/D44530 llvm-svn: 327889	2018-03-19 19:22:52 +00:00
Craig Topper	9770107b5f	[X86] Add JMP16r and JMP32r to Sandybridge scheduler model. Fixes PR36010 llvm-svn: 327883	2018-03-19 19:00:37 +00:00
Craig Topper	5e65996fac	[X86] Remove OUT32rr/OUT8rr/OUT32ri/OUT8ri from Sandybridge scheduler model. PR35590 was already filed for this information being wrong. It's probably better to default to WriteSystem behavior instead of using something completely wrong. llvm-svn: 327882	2018-03-19 19:00:35 +00:00
Craig Topper	b4c7873f8c	[X86] Add JCXZ/JECXZ to Sandybridge/Haswell/Broadwell/Skylake scheduler models. JRCXZ was already present, but not the others. We never codegen this instruction so this doesn't affect much just trying to get them all into a single generated scheduler class in the output. llvm-svn: 327881	2018-03-19 19:00:32 +00:00
Craig Topper	afabf36505	[X86] Correct regular expression in Zen scheduler model that was excluding JECXZ instruction. The regex was looking for JECXZ_32 or JECXZ_64, but their is just one instruction called JECXZ. They used to exist as separate instructions, but were merged over 3 years ago. llvm-svn: 327880	2018-03-19 19:00:29 +00:00
Craig Topper	591f44df54	[X86] Correct the SchedRW on (V)MOVAPSrr_REV and similar to match their non _REV counterparts. llvm-svn: 327879	2018-03-19 19:00:26 +00:00
Lei Huang	6d1596a98c	[PowerPC][Power9]Legalize and emit code for quad-precision add/div/mul/sub Legalize and emit code for quad-precision floating point operations: * xsaddqp * xssubqp * xsdivqp * xsmulqp Differential Revision: https://reviews.llvm.org/D44506 llvm-svn: 327878	2018-03-19 18:52:20 +00:00
Nemanja Ivanovic	d9d5bd3067	[PowerPC] Make AddrSpaceCast noop PowerPC targets do not use address spaces. As a result, we can get selection failures with address space casts. This patch makes those casts noops. Patch by Valentin Churavy. Differential revision: https://reviews.llvm.org/D43781 llvm-svn: 327877	2018-03-19 18:50:02 +00:00
Craig Topper	836cfb3a4c	[X86] Add the rest of the TEST with immediate instructions to the scheduler models to match their 8-bit counterpart. llvm-svn: 327874	2018-03-19 17:58:41 +00:00
Craig Topper	645e531a69	[X86] Add MOV16ri/MOV32ri/MOV64ri* to scheduler models to match MOV8ri. Correct SchedRW and itinerary for MOV32ri64. llvm-svn: 327872	2018-03-19 17:46:59 +00:00
Craig Topper	259eaa6e7c	[X86] Remove sse41 specific code from lowering v16i8 multiply With the SRAs removed from the SSE2 code in D44267, then there doesn't appear to be any advantage to the sse41 code. The punpcklbw instruction and pmovsx seem to have the same latency and throughput on most CPUs. And the SSE41 code requires moving the upper 64-bits into the lower 64-bit before the sign extend can be done. The unpckhbw in sse2 code can do better than that. llvm-svn: 327869	2018-03-19 17:31:41 +00:00
Craig Topper	5ccd87233f	[X86] Make the multiply and divide itineraries more consistent. Sometimes we used the same itinerary for MEM and REG forms, but that seems inconsistent with our usual usage. We also used the MUL8 itinerary for MULX32/64 which was also weird. The test changes are because we were using IIC_IMUL32_RR and IIC_IMUL64_RR instead of IIC_IMUL32_REG/IIC_IMUL64_REG for the 32 and 64 bit multiplies that produce double width result. llvm-svn: 327866	2018-03-19 16:38:33 +00:00
Zaara Syeda	01f414baaa	Revert [MachineLICM] This reverts commit rL327856 Failing build bots. Revert the commit now. llvm-svn: 327864	2018-03-19 16:19:44 +00:00
Zaara Syeda	ff05e2b0e6	[MachineLICM] Add functions to MachineLICM to hoist invariant stores This patch adds functions to allow MachineLICM to hoist invariant stores. Currently, MachineLICM does not hoist any store instructions, however when storing the same value to a constant spot on the stack, the store instruction should be considered invariant and be hoisted. The function isInvariantStore iterates each operand of the store instruction and checks that each register operand satisfies isCallerPreservedPhysReg. The store may be fed by a copy, which is hoisted by isCopyFeedingInvariantStore. This patch also adds the PowerPC changes needed to consider the stack register as caller preserved. Differential Revision: https://reviews.llvm.org/D40196 llvm-svn: 327856	2018-03-19 14:52:25 +00:00
Simon Pilgrim	30c38c3849	[X86] Generalize schedule classes to support multiple stages Currently the WriteResPair style multi-classes take a single pipeline stage and latency, this patch generalizes this to make it easier to create complex schedules with ResourceCycles and NumMicroOps be overriden from their defaults. This has already been done for the Jaguar scheduler to remove a number of custom schedule classes and adding it to the other x86 targets will make it much tidier as we add additional classes in the future to try and replace so many custom cases. I've converted some instructions but a lot of the models need a bit of cleanup after the patch has been committed - memory latencies not being consistent, the class not actually being used when we could remove some/all customs, etc. I'd prefer to keep this as NFC as possible so later patches can be smaller and target specific. Differential Revision: https://reviews.llvm.org/D44612 llvm-svn: 327855	2018-03-19 14:46:07 +00:00
Sanjay Patel	05daae75ad	[x86] put nops into the WriteNop class and customize for Jaguar 1. Given that we already have a classification bucket with 'nop' in the name, that's where 'nop' belongs. Right now, it's only used for prefix bytes and 'pause'. 2. Make the latency of this class '1' for Jaguar to tell the scheduler (and presumably llvm-mca) how to model the resource requirements better even though a nop has no dependencies. Differential Revision: https://reviews.llvm.org/D44608 llvm-svn: 327853	2018-03-19 14:26:50 +00:00
Nicolai Haehnle	4186cc7c08	TableGen: Check the dynamic type of !cast<Rec>(string) Summary: The docs already claim that this happens, but so far it hasn't. As a consequence, existing TableGen files get this wrong a lot, but luckily the fixes are all reasonably straightforward. To make this work with all the existing forms of self-references (since the true type of a record is only built up over time), the lookup of self-references in !cast is delayed until the final resolving step. Change-Id: If5923a72a252ba2fbc81a889d59775df0ef31164 Reviewers: arsenm, craig.topper, tra, MartinO Subscribers: wdng, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D44475 llvm-svn: 327849	2018-03-19 14:14:20 +00:00
Nicolai Haehnle	18f1998a00	TableGen: Explicitly test some cases of self-references and !cast errors Summary: These are cases of self-references that exist today in practice. Let's add tests for them to avoid regressions. The self-references in PPCInstrInfo.td can be expressed in a simpler way. Allowing this type of self-reference while at the same time consistently doing late-resolve even for self-references is problematic because there are references to fields that aren't in any class. Since there's no need for this type of self-reference anyway, let's just remove it. Change-Id: I914e0b3e1ae7adae33855fac409b536879bc3f62 Reviewers: arsenm, craig.topper, tra, MartinO Subscribers: nemanjai, wdng, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D44474 llvm-svn: 327848	2018-03-19 14:14:10 +00:00
Matt Arsenault	fed0a45036	AMDGPU/GlobalISel: RegBankSelect for basic int ops llvm-svn: 327843	2018-03-19 14:07:23 +00:00
Matt Arsenault	69932e4d69	AMDGPU: Don't leave dead illegal VGPR->SGPR copies Normally DCE kills these, but at -O0 these get left behind leaving suspicious looking illegal copies. Replace with IMPLICIT_DEF to avoid iterator issues. llvm-svn: 327842	2018-03-19 14:07:15 +00:00
Sjoerd Meijer	d16037d9bb	[ARM] Support for v4f16 and v8f16 vectors This is the groundwork for adding the Armv8.2-A FP16 vector intrinsics, which uses v4f16 and v8f16 vector operands and return values. All the moving parts are tested with two intrinsics, a 1-operand v8f16 and a 2-operand v4f16 intrinsic. In a follow-up patch the rest of the intrinsics and tests will be added. Differential Revision: https://reviews.llvm.org/D44538 llvm-svn: 327839	2018-03-19 13:35:25 +00:00
Jonas Paulsson	a6216ec4cc	[SystemZ] Bugfix of CC liveness in emitMemMemWrapper (CLC). If DoneMBB becomes empty it must have CC added to its live-in list, since it will fall-through into EndMBB. This happens when the CLC loop does the complete range. Review: Ulrich Weigand llvm-svn: 327834	2018-03-19 13:05:22 +00:00
Hans Wennborg	13e8a85820	HexagonISelLowering.cpp: fix 'enum in bool context' warning llvm-svn: 327832	2018-03-19 12:55:58 +00:00
Alex Bradbury	0171a9f4ec	[RISCV] Peephole optimisation for load/store of global values or constant addresses (load (add base, off), 0) -> (load base, off) (store val, (add base, off)) -> (store val, base, off) This is similar to an equivalent peephole optimisation in PPCISelDAGToDAG. llvm-svn: 327831	2018-03-19 11:54:28 +00:00
Mikhail Maltsev	f07278ec31	[ARM] Fix warnings about missing parentheses in ARMAsmParser llvm-svn: 327827	2018-03-19 09:48:58 +00:00
Craig Topper	e18fbab988	[X86] Merge XADD8rr regular expression with XADD16rr/XADD32rr/XADD64rr in a couple scheduler models. llvm-svn: 327821	2018-03-19 04:21:42 +00:00
Craig Topper	d10ceffa5f	[X86] Add ADD16i16/ADD32i32/ADD64i32 and similar to the scheduler models to match ADD8i8. Also move ADC8i8 and SBB8i8 in the Sandy Bridge model to the same class as ADC8ri and SBB8ri. That seems more accurate since its the 8i8 is just the register forced to AL instead of coming from modrm. llvm-svn: 327820	2018-03-19 04:21:40 +00:00
Craig Topper	e9c99d32b3	[X6] Remove two unused InstrItinClass llvm-svn: 327819	2018-03-19 02:07:32 +00:00
Craig Topper	793733a6c8	[X86] Use IIC_CMOV64_RR/RM on 64-bit cmov instructions. llvm-svn: 327817	2018-03-19 00:56:12 +00:00
Craig Topper	9b60dcb29b	[X86] Merge 32 and 64-bit RORX/SHLX/SARX/SHRX into single regular expressions in scheduler models. llvm-svn: 327816	2018-03-19 00:56:11 +00:00
Craig Topper	13a1650d8a	[X86] Merge 8-bit instructions into instregex with 16/32/64 instructions in the scheduler models as much as possible. NFCI This reduces the total number of generated scheduler classes from 5404 to 5316. llvm-svn: 327815	2018-03-19 00:56:09 +00:00
Dylan McKay	a35ee70641	[AVR] Lower i128 divisions to runtime library calls This patch adds i128 division support by instruction LLVM to lower 128-bit divisions to the __udivmodti4 and __divmodti4 rtlib functions. This also adds test for 64-bit division and 128-bit division. Patch by Peter Nimmervoll. llvm-svn: 327814	2018-03-19 00:55:50 +00:00
Craig Topper	f545cfee52	[Mips] Remove duplicate lines from MipsScheduleP5600.td and enable FullInstRWOverlapCheck. This fixes the errors found by the new check added in r327808. llvm-svn: 327813	2018-03-18 22:16:54 +00:00
Craig Topper	75aeb62eb4	[AArch64] Fix a few InstRWs in the A53 scheduler model and enable FullInstRWOverlapCheck. This fixes the errors found by the new check added in r327808. llvm-svn: 327812	2018-03-18 22:16:53 +00:00
Craig Topper	e1d6a4df1c	[TableGen] When trying to reuse a scheduler class for instructions from an InstRW, make sure we haven't already seen another InstRW containing this instruction on this CPU. This is similar to the check later when we remap some of the instructions from one class to a new one. But if we reuse the class we don't get to do that check. So many CPUs have violations of this check that I had to add a flag to the SchedMachineModel to allow it to be disabled. Hopefully we can get those cleaned up quickly and remove this flag. A lot of the violations are due to overlapping regular expressions, but that's not the only kind of issue it found. llvm-svn: 327808	2018-03-18 19:56:15 +00:00
Simon Pilgrim	203876f104	[X86][Btver2] Fix crc32 schedule costs The default is currently FAdd for some reason llvm-svn: 327807	2018-03-18 19:54:42 +00:00
Simon Pilgrim	c3db8c7cda	[X86][Btver2] FADD/FHADD ymm instructions are double pumped on the JFPA functional pipe llvm-svn: 327804	2018-03-18 18:45:57 +00:00
Simon Pilgrim	036cc82622	[X86][Btver2] Float bitwise ymm instructions are double pumped on the JFPX (JFPA/JFPM) functional pipes llvm-svn: 327803	2018-03-18 17:10:12 +00:00
Simon Pilgrim	87d2f7463f	[X86][Btver2] F16C instructions are performed on the JSTC functional pipe llvm-svn: 327801	2018-03-18 15:59:51 +00:00
Simon Pilgrim	541992203d	[X86][Btver2] Strip default latency/resource values. NFCI. llvm-svn: 327795	2018-03-18 13:16:11 +00:00
Simon Pilgrim	40f6d6ad0b	[X86][Btver2] SSE4A EXTRQ/INSERTQ instructions are performed on the JVALU0/JVALU1 functional pipes llvm-svn: 327794	2018-03-18 13:05:09 +00:00
Simon Pilgrim	e16790b133	[X86][Btver2] Modelled float bitwise instructions as being performed on the float cluster (FPA/FPM) not the integer. llvm-svn: 327793	2018-03-18 12:37:35 +00:00
Simon Pilgrim	e409f84e7e	[X86][Btver2] Correctly distinguish between scheduling pipe and functional unit for JWriteResFpuPair defs Jaguar's FPU has 2 scheduler pipes (JFPU0/JFPU1) which forward to multiple functional sub-units each. We need to model that an micro-op will both consume the scheduler pipe and a functional unit. This patch just handles the ops defined through JWriteResFpuPair, I'll go through the custom cases later. llvm-svn: 327791	2018-03-18 12:09:17 +00:00
Simon Pilgrim	f86d48b3ae	[X86][Btver2] Merge equivalent VBLENDVY + VPERMILY schedule groups Thanks to Craig Topper for noticing this. llvm-svn: 327789	2018-03-18 10:22:35 +00:00
Craig Topper	2d451e73f9	[X86] Fix a bunch of overlapping regular expressions in the scheduler models. llvm-svn: 327787	2018-03-18 08:38:06 +00:00
Craig Topper	86b02cf076	[X86] Fix a couple typos in the Zen scheduler model. llvm-svn: 327786	2018-03-18 08:38:04 +00:00
Craig Topper	89dcda3e90	[X86] Remove MMX_MASKMOVQ64 and VMASKMOVDQU from scheduler models. The information was so wildly inaccurate and incomplete its better to just remove it. MMX_MASKMOVQ64 showed up twice in several scheduler models. In Haswell and Broadwell they were on adjacent lines. On Skylake the copies had different information. MMX_MASKMOVQ and MASKMOVDQU were completely missing. MMX_MASKMOVQ64 was listed on Haswell/Broadwell as 1 cycle on port 1 despite it being a store instruction. Filed PR36780 to track fixing this right. llvm-svn: 327783	2018-03-18 03:24:42 +00:00
Martin Storsjo	36d6419cc5	[AArch64] Skip an unnecessary getCopyToReg in DYNAMIC_STACKALLOC Differential Revision: https://reviews.llvm.org/D44586 llvm-svn: 327779	2018-03-17 20:08:48 +00:00
Nirav Dave	5f0ab71b62	Revert "[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"" as it times out building test-suite on PPC. llvm-svn: 327778	2018-03-17 19:24:54 +00:00
Nirav Dave	982d3a56ea	[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172" Reland ISel cycle checking improvements after simplifying and reducing node id invariant traversal. llvm-svn: 327777	2018-03-17 17:42:10 +00:00
Matt Arsenault	abdc4f2dc7	AMDGPU/GlobalISel: Cleanup constant legality llvm-svn: 327774	2018-03-17 15:17:48 +00:00
Matt Arsenault	685d1e8157	AMDGPU/GlobalISel: Basic G_GEP legality llvm-svn: 327773	2018-03-17 15:17:45 +00:00
Matt Arsenault	85803366d6	AMDGPU/GlobalISel: Basic legality for load/store llvm-svn: 327772	2018-03-17 15:17:41 +00:00
Oren Ben Simhon	fdd72fd522	[X86] Added support for nocf_check attribute for indirect Branch Tracking X86 Supports Indirect Branch Tracking (IBT) as part of Control-Flow Enforcement Technology (CET). IBT instruments ENDBR instructions used to specify valid targets of indirect call / jmp. The `nocf_check` attribute has two roles in the context of X86 IBT technology: 1. Appertains to a function - do not add ENDBR instruction at the beginning of the function. 2. Appertains to a function pointer - do not track the target function of this pointer by adding nocf_check prefix to the indirect-call instruction. This patch implements `nocf_check` context for Indirect Branch Tracking. It also auto generates `nocf_check` prefixes before indirect branchs to jump tables that are guarded by range checks. Differential Revision: https://reviews.llvm.org/D41879 llvm-svn: 327767	2018-03-17 13:29:46 +00:00
Jonas Paulsson	138960770c	[SystemZ] computeKnownBitsForTargetNode() / ComputeNumSignBitsForTargetNode() Improve/implement these methods to improve DAG combining. This mainly concerns intrinsics. Some constant operands to SystemZISD nodes have been marked Opaque to avoid transforming back and forth between generic and target nodes infinitely. Review: Ulrich Weigand llvm-svn: 327765	2018-03-17 08:32:12 +00:00
Jessica Paquette	b3e7dc9144	[MachineOutliner] Make KILLs invisible At the point the outliner runs, KILLs don't impact anything, but they're still considered unique instructions. This commit makes them invisible like DebugValues so that they can still be outlined without impacting outlining decisions. llvm-svn: 327760	2018-03-16 22:53:34 +00:00
Craig Topper	25007c4f32	[X86] Pass SelectionDAG into X86ISelAddressMode::dump and on to SDNode::dump. This prevents a crash in SelectionDAGDumper with -debug when trying to print mem operands if one of the registers in the addressing mode comes from a load. llvm-svn: 327744	2018-03-16 21:10:07 +00:00
Krzysztof Parzyszek	f81a8d03c1	[Hexagon] Avoid bank conflicts in post-RA scheduler Avoid scheduling two loads in such a way that they would end up in the same packet. If there is a load in a packet, try to schedule a non-load next. Patch by Brendon Cahoon. llvm-svn: 327742	2018-03-16 20:55:49 +00:00
Craig Topper	f0815e01d8	[X86] Merge ADDSUB/SUBADD detection into single methods that can detect either and indicate what they found. Previously, we called the same functions twice with a bool flag determining whether we should look for ADDSUB or SUBADD. It would be more efficient to run the code once and detect either pattern with a flag to tell which type it found. Differential Revision: https://reviews.llvm.org/D44540 llvm-svn: 327730	2018-03-16 18:25:59 +00:00
Farhana Aleen	c6c9dc8773	[AMDGPU] Supported ds_write_b128 generation. Summary: This is a follow-on patch of https://reviews.llvm.org/D44210 Author: FarhanaAleen Reviewed By: msearles Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44319 llvm-svn: 327726	2018-03-16 18:12:00 +00:00
Craig Topper	e6913ec340	[X86] Post process the DAG after isel to remove vector moves that were added to zero upper bits. We previously avoided inserting these moves during isel in a few cases which is implemented using a whitelist of opcodes. But it's too difficult to generate a perfect list of opcodes to whitelist. Especially with AVX512F without AVX512VL using 512 bit vectors to implement some 128/256 bit operations. Since isel is done bottoms up, we'd have to check the VT and opcode and subtarget in order to determine whether an EXTRACT_SUBREG would be generated for some operations. So instead of doing that, this patch adds a post processing step that detects when the moves are unnecesssary after isel. At that point any EXTRACT_SUBREGs would have already been created and appear in the DAG. So then we just need to ensure the input to the move isn't one. Differential Revision: https://reviews.llvm.org/D44289 llvm-svn: 327724	2018-03-16 17:13:42 +00:00
Dmitry Preobrazhensky	4c8f4234b6	[AMDGPU][MC][GFX8][GFX9][DISASSEMBLER] Added "_e32" suffix to 32-bit VINTRP opcodes See bug 36751: https://bugs.llvm.org/show_bug.cgi?id=36751 Differential Revision: https://reviews.llvm.org/D44529 Reviewers: artem.tamazov, arsenm llvm-svn: 327723	2018-03-16 16:38:04 +00:00
Dmitry Preobrazhensky	9c1a6e7e24	[AMDGPU][MC] Corrected default values for unused SDWA operands See bug 36355: https://bugs.llvm.org/show_bug.cgi?id=36355 Differential Revision: https://reviews.llvm.org/D44481 Reviewers: artem.tamazov, arsenm llvm-svn: 327720	2018-03-16 15:40:27 +00:00
Jonas Paulsson	a9f05a9d50	[SystemZ] Make AnyRegBitRegClass unallocatable. AnyReg is just for the assembler and it is better to have it as not allocatable in order to simplify (make more intuitive) the RegPressureSets. Review: Ulrich Weigand llvm-svn: 327715	2018-03-16 15:21:26 +00:00
Krzysztof Parzyszek	9915291ab8	[Hexagon] Fix zero-extending non-HVX bool vectors llvm-svn: 327712	2018-03-16 15:03:37 +00:00
Mikhail Maltsev	ed1c8bfec2	[ARM] Convert more invalid NEON immediate loads Summary: Currently the LLVM MC assembler is able to convert e.g. vmov.i32 d0, #0xabababab (which is technically invalid) into a valid instruction vmov.i8 d0, #0xab this patch adds support for vmov.i64 and for cases with the resulting load types other than i8, e.g.: vmov.i32 d0, #0xab00ab00 -> vmov.i16 d0, #0xab00 Reviewers: olista01, rengolin Reviewed By: rengolin Subscribers: rengolin, javed.absar, kristof.beyls, rogfer01, llvm-commits Differential Revision: https://reviews.llvm.org/D44467 llvm-svn: 327709	2018-03-16 14:10:56 +00:00
Simon Pilgrim	23578e7d3c	[X86][Btver2] Add correct mul/imul schedule costs Integer multiply is performed on the JMul function unit and i64 requires double pumping llvm-svn: 327707	2018-03-16 14:01:01 +00:00
Simon Pilgrim	8d28ae6aec	[X86][Btver2] Add correct lzcnt/tzcnt/popcnt schedule costs Don't use WriteIMul defaults llvm-svn: 327706	2018-03-16 13:43:55 +00:00
Mikhail Maltsev	8dcf6fa308	[ARM] Fix a check in vmov/vmvn immediate parsing Summary: Currently the check is incorrect and the following invalid instruction is accepted and incorrectly assembled: vmov.i32 d2, #0x00a500a6 This patch fixes the issue. Reviewers: olista01, rengolin Reviewed By: rengolin Subscribers: SjoerdMeijer, javed.absar, rogfer01, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D44460 llvm-svn: 327704	2018-03-16 12:46:49 +00:00
Matthew Simpson	eacfefd056	[AArch64] Implement getArithmeticReductionCost This patch provides an implementation of getArithmeticReductionCost for AArch64. We can specialize the cost of add reductions since they are computed using the 'addv' instruction. Differential Revision: https://reviews.llvm.org/D44490 llvm-svn: 327702	2018-03-16 11:34:15 +00:00
Sjoerd Meijer	d391a1a985	[ARM] FP16 codegen support for VSEL This implements lowering of SELECT_CC for f16s, which enables codegen of VSEL with f16 types. Differential Revision: https://reviews.llvm.org/D44518 llvm-svn: 327695	2018-03-16 08:06:25 +00:00
Simon Pilgrim	14e5a1b05b	[X86][Btver2] Add support for multiple pipelines stages for x86 scalar schedules. NFCI. This allows us to use JWriteResIntPair for complex schedule classes (like WriteIDiv) as well as single pipe instructions. llvm-svn: 327686	2018-03-15 23:46:12 +00:00
Craig Topper	1b8cf49704	[SelectionDAG][ARM][X86] Teach PromoteIntRes_SETCC to do a better job picking the result type for the setcc. Previously if getSetccResultType returned an illegal type we just fell back to using the default promoted type. This appears to have been to handle the case where for vectors getSetccResultType returns the input type, but the input type itself isn't legal and will need to be promoted. Without the legality check we would never reach a legal type. But just picking the promoted type to be the setcc type can create strange setccs where the result type is 128 bits and the operand type is 256 bits. If for example the result type was promoted to v8i16 from v8i1, but the input type was promoted from v8i23 to v8i32. We currently handle this with custom lowering code in X86. This legality check also caused us reject the getSetccResultType when the input type needed to be widened or split. Even though that result wouldn't have caused legalization to get stuck. This patch tries to fix this by detecting the getSetccResultType needs to be promoted. If its input type also needs to be promoted we'll try a ask for a new setcc result type based on its eventual promoted value. Otherwise we fall back to default type to promote to. For any other illegal values we might get back from the initial call to getSetccResultType we just keep and allow it to be re-legalized later via splitting or widening or scalarizing. llvm-svn: 327683	2018-03-15 23:04:11 +00:00
Simon Pilgrim	3894809997	[X86][Btver2] Fix ymm div/sqrt to use fmul unit YMM FDiv/FSqrt are dispatched on pipe JFPU1 but should be performed on the JFPM unit - that is where most of the cycles are spent. This matches the pipes for WriteFSqrt/WriteFDiv definitions. llvm-svn: 327682	2018-03-15 23:00:47 +00:00
Derek Schuff	10b313581f	[WebAssembly] Add DebugLoc information to WebAssembly block and loop. Patch by Yury Delendik Differential Revision: https://reviews.llvm.org/D44448 llvm-svn: 327673	2018-03-15 22:06:51 +00:00
Artem Belevich	7b14e7f041	[NVPTX] TblGen-ized lowering of WMMA intrinsics. NFC. Differential Revision: https://reviews.llvm.org/D43151 llvm-svn: 327672	2018-03-15 21:40:56 +00:00
Evandro Menezes	d4254ac1b9	[AArch64] Adjust the cost model for Exynos M3 Fix typo. llvm-svn: 327663	2018-03-15 20:37:32 +00:00
Evandro Menezes	5303f897d4	[AArch64] Adjust the cost model for Exynos M3 Add special case for rotate right. llvm-svn: 327662	2018-03-15 20:31:25 +00:00
Evandro Menezes	1515e859c6	[AArch64] Adjust the cost model for Exynos M3 Increase the number of cheap as move cases of register reset. llvm-svn: 327661	2018-03-15 20:31:13 +00:00
Craig Topper	c3983c34cd	[X86] Make sure we use FSUB instruction as the reference for operand order in isAddSubOrSubAdd when recognizing subadd The FADD part of the addsub/subadd pattern can have its operands commuted, but when checking for fsubadd we were using the fadd as reference and commuting the fsub node. llvm-svn: 327660	2018-03-15 20:30:54 +00:00
David Blaikie	a46b8a7677	Remove empty file I should've deleted this in r320768 but accidentally just deleted its contents instead. llvm-svn: 327658	2018-03-15 20:29:14 +00:00
Guozhi Wei	9c916584ba	[PPC] Avoid non-simple MVT in STBRX optimization PR35402 triggered this case. It bswap and stores a 48bit value, current STBRX optimization transforms it into STBRX. Unfortunately 48bit is not a simple MVT, there is no PPC instruction to support it, and it can't be automatically expanded by llvm, so caused a crash. This patch detects the non-simple MVT and returns early. Differential Revision: https://reviews.llvm.org/D44500 llvm-svn: 327651	2018-03-15 17:49:12 +00:00
Simon Pilgrim	48b758e8ad	[X86][Btver2] Attach AES/CLMUL instructions to a scheduler pipe llvm-svn: 327650	2018-03-15 17:45:10 +00:00
Craig Topper	5a0251fe67	[X86] Simplify the type legality checking for (FM)ADDSUB/SUBADD matching. NFCI Rather than enumerating all specific types, for the DAG combine we can just use TLI::isTypeLegal and an SSE3 check. For the BUILD_VECTOR version we already know the type is legal so we just need to check SSE3. llvm-svn: 327649	2018-03-15 17:38:59 +00:00
Craig Topper	627e001fad	[X86] Fix 80 column violations. llvm-svn: 327648	2018-03-15 17:38:55 +00:00
Zaara Syeda	1110c4d336	[PowerPC] Optimize TLS initial-exec sequence to use X-Form loads/stores This patch adds new load/store instructions for integer scalar types which can be used for X-Form when fed by add with an @tls relocation. Differential Revision: https://reviews.llvm.org/D43315 llvm-svn: 327635	2018-03-15 15:34:41 +00:00
Simon Pilgrim	d30df5769e	[X86][Btver2] Remove JAny resource, and map system/microcoded instructions to JALU pipes Simplifies throughput to the issue width (1/2) instead of permitting any pipe (1/6) llvm-svn: 327632	2018-03-15 15:12:12 +00:00
Simon Pilgrim	fb7aa57bf1	[X86][SSE] Introduce Float/Vector WriteMove, WriteLoad and Writetore scheduler classes As discussed on D44428 and PR36726, this patch splits off WriteFMove/WriteVecMove, WriteFLoad/WriteVecLoad and WriteFStore/WriteVecStore scheduler classes to permit vectors to be handled separately from gpr/scalar types. I've minimised the diff here by only moving various basic SSE/AVX vector instructions across - we can fix the rest when called for. This does fix the MOVDQA vs MOVAPS/MOVAPD discrepancies mentioned on D44428. Differential Revision: https://reviews.llvm.org/D44471 llvm-svn: 327630	2018-03-15 14:45:30 +00:00
Craig Topper	26a3a80c87	[X86] Add support for matching FMSUBADD from build_vector. llvm-svn: 327604	2018-03-15 06:14:55 +00:00
Craig Topper	a5e712f402	[X86] Remove old TODO. We have coverage for this now. Coverage was added in r320950. llvm-svn: 327603	2018-03-15 06:14:53 +00:00
Craig Topper	b9526e9fdb	[X86] Use MVT in a couple places where we know the type is legal. llvm-svn: 327602	2018-03-15 06:14:51 +00:00
Lei Huang	1f8da3ae19	[PowerPC][NFC] formatting-only fix llvm-svn: 327599	2018-03-15 03:06:44 +00:00
Simon Pilgrim	48fbf0c69a	[X86][Btver2] Add support for multiple pipelines stages for fpu schedules. NFCI. This allows us to use JWriteResFpuPair for complex schedule classes as well as single pipe instructions. llvm-svn: 327588	2018-03-14 23:12:09 +00:00
Mark Searles	c3c02bde73	[AMDGPU] Waitcnt pass: Modify the waitcnt pass to propagate info in the case of a single basic block loop. mergeInputScoreBrackets() does this for us; update it so that it processes the single bb's score bracket when processing the single bb's preds. It is, after all, a pred of itself, so it's score bracket is needed. Differential Revision: https://reviews.llvm.org/D44434 llvm-svn: 327583	2018-03-14 22:04:32 +00:00
Simon Pilgrim	dfeebdbed7	[X86][Btver2] Add ResourceCycles and NumMicroOps overrides to scalar instructions. NFCI. Currently still use default values - this is setup for a future patch. llvm-svn: 327582	2018-03-14 21:55:54 +00:00
Francis Visoiu Mistrih	164560bd74	[AArch64] Emit CSR loads in the same order as stores Optionally allow the order of restoring the callee-saved registers in the epilogue to be reversed. The flag -reverse-csr-restore-seq generates the following code: ``` stp x26, x25, [sp, #-64]! stp x24, x23, [sp, #16] stp x22, x21, [sp, #32] stp x20, x19, [sp, #48] ; [..] ldp x24, x23, [sp, #16] ldp x22, x21, [sp, #32] ldp x20, x19, [sp, #48] ldp x26, x25, [sp], #64 ret ``` Note how the CSRs are restored in the same order as they are saved. One exception to this rule is the last `ldp`, which allows us to merge the stack adjustment and the ldp into a post-index ldp. This is done by first generating: ldp x26, x27, [sp] add sp, sp, #64 which gets merged by the arm64 load store optimizer into ldp x26, x25, [sp], #64 The flag is disabled by default. llvm-svn: 327569	2018-03-14 20:34:03 +00:00
Craig Topper	9c098ed819	[X86] Add back fast-isel code for handling i8 shifts. I removed this in r316797 because the coverage report showed no coverage and I thought it should have been handled by the auto generated table. I now see that there is code that bypasses the table if the shift amount is out of bounds. This adds back the code. We'll codegen out of bounds i8 shifts to effectively (amount & 0x1f). The 0x1f is a strange quirk of x86 that shift amounts are always masked to 5-bits(except 64-bits). So if the masked value is still out bounds the result will be 0. Fixes PR36731. llvm-svn: 327540	2018-03-14 17:57:19 +00:00
Francis Visoiu Mistrih	084e7d8770	[AArch64] Keep track of MIFlags in the LoadStoreOptimizer Merging: * $x26, $x25 = frame-setup LDPXi $sp, 0 * $sp = frame-destroy ADDXri $sp, 64, 0 into an LDPXpost should preserve the flags from both instructions as following: * frame-setup frame-destroy LDPXpost Differential Revision: https://reviews.llvm.org/D44446 llvm-svn: 327533	2018-03-14 17:10:58 +00:00
Craig Topper	b36cb20ef9	[X86] Teach X86TargetLowering::targetShrinkDemandedConstant to set non-demanded bits if it helps created an and mask that can be matched as a zero extend. I had to modify the bswap recognition to allow unshrunk masks to make this work. Fixes PR36689. Differential Revision: https://reviews.llvm.org/D44442 llvm-svn: 327530	2018-03-14 16:55:15 +00:00
Simon Pilgrim	d1c3c995c0	[X86][AVX] Use WriteFShuffleLd for broadcast reg-mem instructions They shouldn't be treated as pure loads. Found while investigating D44428 llvm-svn: 327524	2018-03-14 15:47:08 +00:00
Alexander Ivchenko	86ef9ab28f	[GlobalIsel][X86] Support for G_SDIV instruction Reviewed By: igorb Differential Revision: https://reviews.llvm.org/D44430 llvm-svn: 327520	2018-03-14 15:41:11 +00:00
Petar Jovanovic	3408caf686	[mips] Add support for CRC ASE This includes Instructions: crc32b, crc32h, crc32w, crc32d, crc32cb, crc32ch, crc32cw, crc32cd Assembler directives: .set crc, .set nocrc, .module crc, .module nocrc Attribute: crc .MIPS.abiflags: CRC (0x8000) Patch by Vladimir Stefanovic. Differential Revision: https://reviews.llvm.org/D44176 llvm-svn: 327511	2018-03-14 14:13:31 +00:00
Simon Pilgrim	d594942928	[X86][Btver2] Fix YMM shuffle, permute and permutevar scheduler costs Account for ymm double pumping and add proper pshufb/permutevar support llvm-svn: 327510	2018-03-14 14:05:19 +00:00
Simon Pilgrim	de995e6e37	[X86][SSE] Use WriteFShuffleLd for MOVDDUP/MOVSHDUP/MOVSLDUP reg-mem instructions They shouldn't be treated as pure loads. Found while investigating D44428 llvm-svn: 327505	2018-03-14 13:22:56 +00:00
Martin Storsjo	bde677289a	[AArch64] Don't produce R_AARCH64_TLSLE_LDST32_TPREL_LO12_NC Support for this relocation is missing in both LLD and GNU binutils at the moment. This reverts the ELF parts of SVN r327316. llvm-svn: 327503	2018-03-14 13:09:10 +00:00
Alexander Ivchenko	0bd4d8c901	[GlobalISel][X86] Support G_LSHR/G_ASHR/G_SHL Support G_LSHR/G_ASHR/G_SHL. We have 3 variance for shift instructions : shift gpr, shift imm, shift 1. Currently GlobalIsel TableGen generate patterns for shift imm and shift 1, but with shiftCount i8. In G_LSHR/G_ASHR/G_SHL like LLVM-IR both arguments has the same type, so for now only shift i8 can use auto generated TableGen patterns. The support of G_SHL/G_ASHR enables tryCombineSExt from LegalizationArtifactCombiner.h to hit, which results in different legalization for the following tests: LLVM :: CodeGen/X86/GlobalISel/ext-x86-64.ll LLVM :: CodeGen/X86/GlobalISel/gep.ll LLVM :: CodeGen/X86/GlobalISel/legalize-ext-x86-64.mir -; X64-NEXT: movsbl %dil, %eax +; X64-NEXT: movl $24, %ecx +; X64-NEXT: # kill: def $cl killed $ecx +; X64-NEXT: shll %cl, %edi +; X64-NEXT: movl $24, %ecx +; X64-NEXT: # kill: def $cl killed $ecx +; X64-NEXT: sarl %cl, %edi +; X64-NEXT: movl %edi, %eax ..which is not optimal and should be addressed later. Rework of the patch by igorb Reviewed By: igorb Differential Revision: https://reviews.llvm.org/D44395 llvm-svn: 327499	2018-03-14 11:23:57 +00:00
Alexander Ivchenko	327de80529	[GlobalIsel][X86] Support for G_ZEXT instruction Reviewed By: igorb Differential Revision: https://reviews.llvm.org/D44378 llvm-svn: 327482	2018-03-14 09:11:23 +00:00
Matt Arsenault	41e5ac4fa4	TargetMachine: Add address space to getPointerSize llvm-svn: 327467	2018-03-14 00:36:23 +00:00
Craig Topper	ec4881ad53	[X86] Simplify the LowerAVXCONCAT_VECTORS code a little by creating a single path for insert_subvector handling. We now only create recursive concats if we have more than two non-zero values. This keeps our subvector broadcast DAG combine functioning. llvm-svn: 327457	2018-03-13 22:36:07 +00:00
Craig Topper	cc060e921b	[X86] Rewrite LowerAVXCONCAT_VECTORS similar to how we handle vXi1 concats. This better able to detect undef and zeros pieces in the concat. Or cases when only one subvector is non-zero. This allows us to avoid silly things like double inserts into progressively larger undefs. This still builds 512 bit concats of 128 bits by building up through 256 bits first. But I don't know if that's best. We probably want to merge this with the vXi1 concat code since they are very similar. llvm-svn: 327454	2018-03-13 22:05:25 +00:00
Simon Dardis	e5f72dd5e1	Revert "[mips] Guard traps for microMIPS correctly" This appears to have broken the expensive checks bot in a strange fashion. Reverting until I can investigate. This reverts r327409. llvm-svn: 327427	2018-03-13 17:31:11 +00:00
Craig Topper	7e711a6822	[X86] Remove SplitBinaryOpsAndApply and use SplitOpsAndApply by adding curly braces around the ops. Summary: Unless you were intentionally avoiding this syntax? I saw you mentioned makeArrayRef in your commit that added SplitOpsAndApply. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44403 llvm-svn: 327418	2018-03-13 16:23:27 +00:00
Zaara Syeda	df28fb6ac2	test commit: fix formatting of a comment This is a simple change to do the test commit. llvm-svn: 327412	2018-03-13 15:49:05 +00:00
Simon Dardis	d5ae61d49d	[mips] Guard traps for microMIPS correctly This is part of fixing the instruction predicates for MIPS. Reviewers: atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D44212 llvm-svn: 327409	2018-03-13 15:46:58 +00:00
Simon Pilgrim	3d4c86d399	[X86][Btver2] Split i8/i16/i32/i64 div/idiv costs We were assuming a mixture of 32/64 division costs. llvm-svn: 327407	2018-03-13 15:22:24 +00:00
Simon Dardis	476ed8f26e	[mips] Fix the definitions of the EVA instructions Correct their availability to their respective ISAs. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D44209 llvm-svn: 327403	2018-03-13 14:39:44 +00:00
Simon Dardis	9d7e9032f1	[mips] Don't create nested CALLSEQ_START..CALLSEQ_END nodes. For the MIPS O32 ABI, the current call lowering logic naively lowers each call, creating the reserved argument area to hold the argument spill areas for $a0..$a3 and the outgoing parameter area if one is required at each call site. In the case of a sufficently large byval argument, a call to memcpy is used to write the start+16..end of the argument into the outgoing parameter area. This is done within the CALLSEQ_START..CALLSEQ_END of the callee. The CALLSEQ nodes are responsible for performing the necessary stack adjustments. Since the O32/N32/N64 MIPS ABIs do not have a red-zone and writing below the stack pointer and reading the values back is unpredictable, the call to memcpy cannot be hoisted out of the callee's CALLSEQ nodes. However, for the O32 ABI requires the reserved argument area for functions which have parameters. The naive lowering of calls will then create nested CALLSEQ sequences. For N32 and N64 these nodes are also created, but with zero stack adjustments as those ABIs do not have a reserved argument area. This patch addresses the correctness issue by recognizing the special case of lowering a byval argument that uses memcpy. By recognizing that the incoming chain already has a CALLSEQ_START node on it when calling memcpy, the CALLSEQ nodes are not created. For the N32 and N64 ABIs, this is not an issue, as no stack adjustment has to be performed. For the O32 ABI, the correctness reasoning is different. In the case of a sufficently large byval argument, registers a0..a3 are going to be used for the callee's arguments, mandating the creation of the reserved argument area. The call to memcpy in the naive case will also create its own reserved argument area. However, since the reserved argument area consists of undefined values, both calls can use the same reserved argument area. Reviewers: abeserminji, atanasyan Differential Revision: https://reviews.llvm.org/D44296 llvm-svn: 327388	2018-03-13 12:50:03 +00:00
Simon Pilgrim	93bd7187f4	[X86][SSE41] createVariablePermute v2X64 - PCMPEQQ can test for index 0/1 and select between them. llvm-svn: 327385	2018-03-13 12:22:58 +00:00
Yonghong Song	82bf8bcb4f	bpf: Enhance debug information for peephole optimization passes Add more debug information for peephole optimization passes. These would only be enabled for debug version binary and could help analyzing why some optimization opportunities were missed. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327371	2018-03-13 06:47:07 +00:00
Yonghong Song	e91802f336	bpf: New post-RA peephole optimization pass to eliminate bad RA codegen This new pass eliminate identical move: MOV rA, rA This is particularly possible to happen when sub-register support enabled. The special type cast insn MOV_32_64 involves different register class on src (i32) and dst (i64), RA could generate useless instruction due to this. This pass also could serve as the bast for further post-RA optimization. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327370	2018-03-13 06:47:06 +00:00
Yonghong Song	80b882ecc5	bpf: Don't expand BSWAP on i32, promote it Currently, there is no ALU32 bswap support in eBPF ISA. BSWAP on i32 was set to EXPAND which would need about eight instructions for single BSWAP. It would be more efficient to promote it to i64, then doing BSWAP on i64. For eBPF programs, most of the promotion are zero extensions which are likely be elimiated later by peephole optimizations. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327369	2018-03-13 06:47:05 +00:00
Yonghong Song	1d28a759d9	bpf: Support subregister definition check on PHI node This patch relax the subregister definition check on Phi node. Previously, we just cancel the optimizatoin when the definition is Phi node while actually we could further check the definitions of incoming parameters of PHI node. This helps catch more elimination opportunities. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327368	2018-03-13 06:47:04 +00:00
Yonghong Song	c88bcdec43	bpf: Extends zero extension elimination beyond comparison instructions The current zero extension elimination was restricted to operands of comparison. It actually could be extended to more cases. For example: int inc_p (int p, unsigned a) { return p + a; } 'a' will be promoted to i64 during addition, and the zero extension could be eliminated as well. For the elimination optimization, it should be much better to start recognizing the candidate sequence from the SRL instruction instead of J* instructions. This patch makes it an generic zero extension elimination pass instead of one restricted with comparison. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327367	2018-03-13 06:47:03 +00:00
Yonghong Song	905d13c123	bpf: J_RR should check both operands There is a mistake in current code that we "break" out the optimization when the first operand of J_RR doesn't qualify the elimination. This caused some elimination opportunities missed, for example the one in the testcase. The code should just fall through to handle the second operand. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327366	2018-03-13 06:47:02 +00:00
Yonghong Song	89e47ac671	bpf: Tighten subregister definition check The current subregister definition check stops after the MOV_32_64 instruction. This means we are thinking all the following instruction sequences are safe to be eliminated: MOV_32_64 rB, wA SLL_ri rB, rB, 32 SRL_ri rB, rB, 32 However, this is not true. The source subregister wA of MOV_32_64 could come from a implicit truncation of 64-bit register in which case the high bits of the 64-bit register is not zeroed, therefore we can't eliminate above sequence. For example, for i32_val, we shouldn't do the elimination: long long bar (); int foo (int b, int c) { unsigned int i32_val = (unsigned int) bar(); if (i32_val < 10) return b; else return c; } Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327365	2018-03-13 06:47:00 +00:00
Simon Pilgrim	7f1b9196cb	[X86][Btver2] Clean up formatting/comments in scheduler model. NFCI. Moved 'special cases' to be closer to other system classes. llvm-svn: 327332	2018-03-12 21:35:12 +00:00
Lei Huang	cd4f385795	[PowerPC][NFC] Explicitly state types on FP SDAG patterns in anticipation of adding the f128 type llvm-svn: 327319	2018-03-12 19:26:18 +00:00
Martin Storsjo	7bc64bd889	[AArch64] Fold adds with tprel_lo12_nc and secrel_lo12 into a following ldr/str Differential Revision: https://reviews.llvm.org/D44355 llvm-svn: 327316	2018-03-12 18:47:43 +00:00
Krzysztof Parzyszek	2d08f2ebf8	[Hexagon] Counting leading/trailing bits is cheap llvm-svn: 327308	2018-03-12 18:18:23 +00:00
Simon Pilgrim	f0a9b25394	[X86][Btver2] FSqrt/FDiv reg-reg instructions don't use the AGU. I love you llvm-mca. llvm-svn: 327306	2018-03-12 18:12:46 +00:00
Krzysztof Parzyszek	5d41cc19bd	[Hexagon] Subtarget feature to emit one instruction per packet This adds two features: "packets", and "nvj". Enabling "packets" allows the compiler to generate instruction packets, while disabling it will prevent it and disable all optimizations that generate them. This feature is enabled by default on all subtargets. The feature "nvj" allows the compiler to generate new-value jumps and it implies "packets". It is enabled on all subtargets. The exception is made for packets with endloop instructions, since they require a certain minimum number of instructions in the packets to which they apply. Disabling "packets" will not prevent hardware loops from being generated. llvm-svn: 327302	2018-03-12 17:47:46 +00:00
Simon Pilgrim	a0536c17b9	[X86] Deleting README-MMX.txt now that all tasks have been completed. MMX buildvectors were improved at rL327247 - new MMX bugs should be raised on bugzilla llvm-svn: 327300	2018-03-12 17:29:54 +00:00
Dmitry Preobrazhensky	d98c97b4f9	[AMDGPU][MC][GFX8] Added BUFFER_STORE_LDS_DWORD Instruction See bug 36558: https://bugs.llvm.org/show_bug.cgi?id=36558 Differential Revision: https://reviews.llvm.org/D43950 Reviewers: artem.tamazov, arsenm llvm-svn: 327299	2018-03-12 17:29:24 +00:00
Simon Pilgrim	deface9c73	[X86][Btver2] Prefix all scheduler defs. NFCI. These are all global, so prefix with 'J' to help prevent accidental name clashes with other models. llvm-svn: 327296	2018-03-12 17:07:08 +00:00
Craig Topper	acaba3b402	[X86] Remove use of MVT class from the ShuffleDecode library. MVT belongs to the CodeGen layer, but ShuffleDecode is used by the X86 InstPrinter which is part of the MC layer. This only worked because MVT is completely implemented in a header file with no other library dependencies. Differential Revision: https://reviews.llvm.org/D44353 llvm-svn: 327292	2018-03-12 16:43:11 +00:00
Yaxun Liu	a99e7d8e44	[AMDGPU] Fix lowering enqueue kernel when kernel has no name Since the enqueued kernels have internal linkage, their names may be dropped. In this case, give them unique names __amdgpu_enqueued_kernel or __amdgpu_enqueued_kernel.n where n is a sequential number starting from 1. Differential Revision: https://reviews.llvm.org/D44322 llvm-svn: 327291	2018-03-12 16:34:06 +00:00
Simon Pilgrim	6f01e654b4	[X86][Btver2] Extend JWriteResFpuPair to accept resource/uop counts. NFCI. This allows the single resource classes (VarBlend, MPSAD, VarVecShift) to use the JWriteResFpuPair macro. llvm-svn: 327289	2018-03-12 16:02:56 +00:00
Simon Pilgrim	bc216b440f	[X86][Btver2] Use JWriteResFpuPair wrapper for AES/CLMUL/HADD scheduler cases. NFCI. These are single pipe and have the default resource/uop counts like JWriteResFpuPair so there's no need to handle them separately. llvm-svn: 327283	2018-03-12 15:29:00 +00:00
Dmitry Preobrazhensky	da4a7c01bf	[AMDGPU][MC] Corrected GATHER4 opcodes See bug 36252: https://bugs.llvm.org/show_bug.cgi?id=36252 Differential Revision: https://reviews.llvm.org/D43874 Reviewers: artem.tamazov, arsenm llvm-svn: 327278	2018-03-12 15:03:34 +00:00
Sam McCall	bbfe434185	[Hexagon] fix 'must explicitly initialize the const member' error which clang 3.8 emits llvm-svn: 327273	2018-03-12 14:40:48 +00:00
Matt Arsenault	7b9ed89dcf	AMDGPU/GlobalISel: Legality and RegBankInfo for G_{INSERT\|EXTRACT}_VECTOR_ELT llvm-svn: 327269	2018-03-12 13:35:53 +00:00
Matt Arsenault	c0aefd561e	AMDGPU/GlobalISel: InstrMapping for G_MERGE_VALUES llvm-svn: 327268	2018-03-12 13:35:49 +00:00
Matt Arsenault	503afda95f	AMDGPU/GlobalISel: Make some G_MERGE_VALUEs legal llvm-svn: 327267	2018-03-12 13:35:43 +00:00
Simon Dardis	1f0fe56460	[mips] Split out ASEPredicate from InsnPredicates (NFC) This simplifies tagging instructions with the correct ISA and ASE, albeit making instruction definitions a bit more verbose. Reviewers: atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D44299 llvm-svn: 327265	2018-03-12 13:16:12 +00:00
Nico Weber	73a699e592	MC intel asm parser: Allow @ at the start of function names. Ports parts of r193000 to the intel parser. Fixes part of PR36676. https://reviews.llvm.org/D44359 llvm-svn: 327262	2018-03-12 12:47:27 +00:00
Simon Pilgrim	6618e2a09c	[X86][SSE] createVariablePermute - PSHUFB requires SSSE3 not just SSE3 llvm-svn: 327259	2018-03-12 12:30:04 +00:00
Craig Topper	7cc1b1fc84	[X86] Don't compute known bits twice for the same SDValue in LowerMUL. We called MaskedValueIsZero with two different masks, but underneath that calls computeKnownBits before applying the mask. This means we compute the same known bits twice due to the two calls. Instead just call computeKnownBits directly and apply the two masks ourselves. llvm-svn: 327251	2018-03-12 05:35:02 +00:00
Simon Pilgrim	d09cc9c62c	[X86][MMX] Support MMX build vectors to avoid SSE usage (PR29222) 64-bit MMX vector generation usually ends up lowering into SSE instructions before being spilled/reloaded as a MMX type. This patch creates a MMX vector from MMX source values, taking the lowest element from each source and constructing broadcasts/build_vectors with direct calls to the MMX PUNPCKL/PSHUFW intrinsics. We're missing a few consecutive load combines that could be handled in a future patch if that would be useful - my main interest here is just avoiding a lot of the MMX/SSE crossover. Differential Revision: https://reviews.llvm.org/D43618 llvm-svn: 327247	2018-03-11 19:22:13 +00:00
Simon Pilgrim	30f74c14ff	[X86][AVX] createVariablePermute - scale v16i16 variable permutes to use v32i8 codegen XOP was already doing this, and now AVX performs v32i8 variable permutes as well. llvm-svn: 327245	2018-03-11 17:23:54 +00:00
Simon Pilgrim	b306501796	[X86][AVX] createVariablePermute - widen permutes for cases where the source vector is wider than the destination type llvm-svn: 327244	2018-03-11 17:00:46 +00:00
Simon Pilgrim	9a5d0c7540	[X86][AVX] createVariablePermute - use PSHUFB+PCMPGT+SELECT for v32i8 variable permutes Same as the VPERMILPS/VPERMILPD approach for v8f32/v4f64 cases, rely on PSHUFB using bits[3:0] for indexing - we can ignore the sign bit (zero element) as those index vector values are considered undefined. The select between the lo/hi permute results based on the index size. llvm-svn: 327242	2018-03-11 16:28:11 +00:00
Simon Pilgrim	d2fbd87ce8	Fix for buildbots which didn't like makeArrayRef with initializer lists. llvm-svn: 327241	2018-03-11 14:31:55 +00:00
Simon Pilgrim	e60afdf9eb	[X86][SSE] Generalized SplitBinaryOpsAndApply to SplitOpsAndApply to support any number of ops. I've kept SplitBinaryOpsAndApply as a wrapper to avoid a lot of makeArrayRef code. llvm-svn: 327240	2018-03-11 14:04:53 +00:00
Simon Pilgrim	f9cc80d218	[X86][AVX] createVariablePermute - use 2xVPERMIL+PCMPGT+SELECT for v8i32/v8f32 and v4i64/v4f64 variable permutes As VPERMILPS/VPERMILPD only selects elements based on the bits[1:0]/bit[1] then we can permute both the (repeated) lo/hi 128-bit vectors in each case and then select between these results based on whether the index was for for lo/hi. For v4i64/v4f64 this avoids some rather nasty v4i64 multiples on the AVX2 implementation, which seems to be worse than the extra port5 pressure from the additional shuffles/blends. llvm-svn: 327239	2018-03-11 11:52:26 +00:00
Simon Pilgrim	2565bd421e	[X86][AVX512] createVariablePermute - Non-VLX targets can widen v4i64/v8f64 variable permutes to v8i64/v8f64 Permutes in the upper elements will be undefined, but they will be discarded anyway. llvm-svn: 327238	2018-03-11 11:19:19 +00:00
Simon Pilgrim	64b899f0f3	[x86][SSE] Add widenSubVector helper. NFCI. Helper function to insert a subvector into the bottom elements of a larger zero/undef vector with the same scalar type. I've converted a couple of INSERT_SUBVECTOR calls to use it, there are plenty more although in some cases I was worried it might make the code more ambiguous. llvm-svn: 327236	2018-03-11 10:50:48 +00:00
Michael Bedy	80cf9ff564	Test commit - change comment slightly. llvm-svn: 327234	2018-03-11 03:27:50 +00:00
Craig Topper	d88204fe1b	[X86] Add comments to the end of FMA3 instructions to make the operation clear Summary: There are 3 different operand orders for FMA instructions so figuring out the exact operation being performed requires a lot of thought. This patch adds a comment to the end of the assembly line to print the exact operation. I think I've got all the instructions in here except the ones with builtin rounding. I didn't update all tests, but I assume we can get them as we regenerate tests in the future. Reviewers: spatel, v_klochkov, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44345 llvm-svn: 327225	2018-03-10 21:30:46 +00:00
Simon Pilgrim	de7f3f0f91	[X86][XOP] createVariablePermute - use VPERMIL2 for v8i32/v4i64 variable permutes llvm-svn: 327222	2018-03-10 19:49:59 +00:00
Martin Storsjo	cc24096d4d	[AArch64] Implement native TLS for Windows Differential Revision: https://reviews.llvm.org/D43971 llvm-svn: 327220	2018-03-10 19:05:21 +00:00
Simon Pilgrim	ff1248f82f	[X86][XOP] createVariablePermute - use VPPERM for v16i16 variable permutes llvm-svn: 327218	2018-03-10 18:33:29 +00:00
Simon Pilgrim	d9dc114e2f	[X86][SSE] createVariablePermute - create index scaling helper. NFCI. This will help in some future changes for custom lowering. llvm-svn: 327217	2018-03-10 18:12:35 +00:00
Simon Pilgrim	8224241f75	[X86][XOP] createVariablePermute - use VPPERM for v32i8 variable permutes llvm-svn: 327213	2018-03-10 16:51:45 +00:00
Matt Arsenault	cbda7ff4ae	AMDGPU: Fix crash when constant folding with physreg operand llvm-svn: 327209	2018-03-10 16:05:35 +00:00
Craig Topper	b76ed82b58	[X86] Add a missing EVEX instruction to EmitAnyX86InstComments. The equivalent SSE and VEX instruction are already there. llvm-svn: 327205	2018-03-10 06:05:13 +00:00
Craig Topper	f27016f3de	[X86] Move the AC_EVEX_2_VEX AsmComments enum to X86InstrInfo.h from X86InstComments.h. X86InstComments.h is used by tools that only have the MC layer. We shouldn't be importing a file from CodeGen into this. X86InstrInfo.h isn't a great place, but I couldn't find a better one. llvm-svn: 327202	2018-03-10 05:15:22 +00:00
Craig Topper	9804c67d21	[X86] Rewrite printMasking code in X86InstComments to use TSFlags to determine whether the instruction is masked. This should have been NFC, but it looks like we were missing PUNPCKLHQDQ/PUNPCKLQDQ instructions in there. llvm-svn: 327200	2018-03-10 03:12:00 +00:00
Rafael Espindola	63c378d343	Go back to sometimes assuming intristics are local. This fixes pr36674. While it is valid for shouldAssumeDSOLocal to return false anytime, always returning false for intrinsics is not optimal on i386 and also hits a bug in the backend. To use a plt, the caller must first setup ebx to handle the case of that file being linked into a PIE executable or shared library. In those cases the generated PLT uses ebx. Currently we can produce "calll expf@plt" without setting ebx. We could fix that by correctly setting ebx, but this would produce worse code for the case where the runtime library is statically linked. It would also required other tools to handle R_386_PLT32. llvm-svn: 327198	2018-03-10 02:42:14 +00:00
Nirav Dave	042678bd55	Revert: r327172 "Correct load-op-store cycle detection analysis" r327171 "Improve Dependency analysis when doing multi-node Instruction Selection" r328170 "[DAG] Enforce stricter NodeId invariant during Instruction selection" Reverting patch as NodeId invariant change is causing pathological increases in compile time on PPC llvm-svn: 327197	2018-03-10 02:16:15 +00:00
Stefan Pintilie	735817aa1a	[Power9] Code Cleaup and adding Comments for Power 9 Scheduler Did some code cleanup up removing ItinRW that are not needed and resource types that are no longer used. Also added more comments to the td files related to the Power 9 sheduler model. llvm-svn: 327174	2018-03-09 21:08:35 +00:00
Nirav Dave	0fab41782d	Correct load-op-store cycle detection analysis Add missing cycle dependency checks in load-op-store fusion. Fixes PR36274. Reviewers: craig.topper, bogner Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D43154 llvm-svn: 327172	2018-03-09 20:58:07 +00:00
Nirav Dave	d668f69ee7	Improve Dependency analysis when doing multi-node Instruction Selection Relanding after fixing NodeId Invariant. Cleanup cycle/validity checks in ISel (IsLegalToFold, HandleMergeInputChains) and X86 (isFusableLoadOpStore). Now do a full search for cycles / dependencies pruning the search when topological property of NodeId allows. As part of this propogate the NodeId-based cutoffs to narrow hasPreprocessorHelper searches. Reviewers: craig.topper, bogner Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D41293 llvm-svn: 327171	2018-03-09 20:57:42 +00:00
Nirav Dave	071699bf82	[DAG] Enforce stricter NodeId invariant during Instruction selection Instruction Selection makes use of the topological ordering of nodes by node id (a node's operands have smaller node id than it) when doing cycle detection. During selection we may violate this property as a selection of multiple nodes may induce a use dependence (and thus a node id restriction) between two unrelated nodes. If a selected node has an unselected successor this may allow us to miss a cycle in detection an invalid selection. This patch fixes this by marking all unselected successors of a selected node have negated node id. We avoid pruning on such negative ids but still can reconstruct the original id for pruning. In-tree targets have been updated to replace DAG-level replacements with ISel-level ones which enforce this property. This preemptively fixes PR36312 before triggering commit r324359 relands Reviewers: craig.topper, bogner, jyknight Subscribers: arsenm, nhaehnle, javed.absar, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D43198 llvm-svn: 327170	2018-03-09 20:57:15 +00:00
Peter Collingbourne	2974856ad4	Use branch funnels for virtual calls when retpoline mitigation is enabled. The retpoline mitigation for variant 2 of CVE-2017-5715 inhibits the branch predictor, and as a result it can lead to a measurable loss of performance. We can reduce the performance impact of retpolined virtual calls by replacing them with a special construct known as a branch funnel, which is an instruction sequence that implements virtual calls to a set of known targets using a binary tree of direct branches. This allows the processor to speculately execute valid implementations of the virtual function without allowing for speculative execution of of calls to arbitrary addresses. This patch extends the whole-program devirtualization pass to replace certain virtual calls with calls to branch funnels, which are represented using a new llvm.icall.jumptable intrinsic. It also extends the LowerTypeTests pass to recognize the new intrinsic, generate code for the branch funnels (x86_64 only for now) and lay out virtual tables as required for each branch funnel. The implementation supports full LTO as well as ThinLTO, and extends the ThinLTO summary format used for whole-program devirtualization to support branch funnels. For more details see RFC: http://lists.llvm.org/pipermail/llvm-dev/2018-January/120672.html Differential Revision: https://reviews.llvm.org/D42453 llvm-svn: 327163	2018-03-09 19:11:44 +00:00
Simon Pilgrim	2cd489feb2	[X86][AVX] createVariablePermute - fix v2i64/v2f64 VPERMILPD index creation. The input indices vector will put the index in bit0, but VPERMILPD actually selects off bit1 - so we need to scale accordingly. llvm-svn: 327159	2018-03-09 18:37:56 +00:00
Simon Pilgrim	230d38b559	[X86][SSE] createVariablePermute - move source vector canonicalization to top of function. NFCI. This is to make it easier to return early from the switch statement with custom lowering. llvm-svn: 327157	2018-03-09 18:08:08 +00:00
Farhana Aleen	a7cb31123c	[AMDGPU] Supported ds_read_b128 generation; Widened vector length for local address-space. Summary: Starting from GCN 2nd generation, ISA supports ds_read_b128 on top of ds_read_b64. This patch supports ds_read_b128 instruction pattern and generation of this instruction. In the vectorizer, this patch also widen the vector length so that vectorizer generates 128 bit loads for local address-space which gets translated to ds_read_b128. Since the performance benefit is not clear; compiler generates ds_read_b128 under -amdgpu-ds128. Author: FarhanaAleen Reviewed By: rampitec, arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44210 llvm-svn: 327153	2018-03-09 17:41:39 +00:00
Stefan Pintilie	ef7c4976bb	Revert "[PowerPC] LSR tunings for PowerPC" Revert the rest of the LST tune commit. It seems that the LSR tune commit breaks internal tests. Reverting the commit. llvm-svn: 327143	2018-03-09 16:08:55 +00:00
Simon Pilgrim	033a4167d2	Tidyup comment that was destroyed by clang-format. NFCI. llvm-svn: 327141	2018-03-09 15:50:09 +00:00
Simon Pilgrim	322c521ed7	[X86][SSE] createVariablePermute - move index vector canonicalization to top of function. NFCI. This is to make it easier to return early from the switch statement with custom lowering. llvm-svn: 327140	2018-03-09 15:48:56 +00:00
Sebastian Pop	b4bd0a404f	[x86][aarch64] ask the backend whether it has a vector blend instruction The code to match and produce more x86 vector blends was enabled for all architectures even though the transform may pessimize the code for other architectures that do not provide a vector blend instruction. Added an aarch64 testcase to check that a VZIP instruction is generated instead of byte movs. Differential Revision: https://reviews.llvm.org/D44118 llvm-svn: 327132	2018-03-09 14:29:21 +00:00
Stanislav Mekhanoshin	c8127fc674	[AMDGPU] Fixed V_DIV_FIXUP_F16 selection on GFX9 GFX9 should select opsel version. Differential Revision: https://reviews.llvm.org/D44279 llvm-svn: 327106	2018-03-09 07:21:43 +00:00
Craig Topper	e7060b2040	[X86] Remove duplicate isel pattern. NFC llvm-svn: 327104	2018-03-09 05:42:44 +00:00
Craig Topper	784f1bbf5e	[X86] Remove SRAs from v16i8 multiply lowering on sse2 targets Previously we unpacked the even bytes of each input into the high byte of 16-bit elements then did an v8i16 arithmetic shift right by 8 bits to fill the upper bits of each word with sign bits. Then we did the v8i16 multiply and then masked to zero the upper 8-bits of each result. The similar was done for all the odd bytes. The results are then packed together with packuswb Since we are masking each multiply result element to 8-bits, and those 8-bits are determined only by the lower 8-bits of each of the inputs, we don't need to fill the upper bits with sign bits. So we can just unpack into the low byte of each element and treat the upper bits as garbage. This is what gcc also does. Differential Revision: https://reviews.llvm.org/D44267 llvm-svn: 327093	2018-03-09 01:22:31 +00:00
Simon Pilgrim	c286680032	[X86][AVX] Pull out variable permute creation from LowerBUILD_VECTORAsVariablePermute. NFCI. This will make it easier to handle more complex cases than basic scaling or index masks. llvm-svn: 327054	2018-03-08 20:07:06 +00:00
Krzysztof Parzyszek	480ab2bbc4	[Hexagon] Ignore indexed loads when handling unaligned loads llvm-svn: 327037	2018-03-08 18:15:13 +00:00
Stefan Pintilie	235fb927b0	[Power9] Add more missing instructions to the Power 9 scheduler With this patch we should be able to mark the Power 9 model as complete. llvm-svn: 327021	2018-03-08 16:24:33 +00:00
Matt Arsenault	c3fe46bbcf	AMDGPU/GlobalISel: Pass subtarget + TM to LegalizerInfo These are the parameters x86 already uses. llvm-svn: 327020	2018-03-08 16:24:16 +00:00
Craig Topper	a406796f5f	[X86] Change X86::PMULDQ/PMULUDQ opcodes to take vXi64 type as input instead of vXi32. This instruction can be thought of as reading either the even elements of a vXi32 input or the lower half of each element of a vXi64 input. We currently use the vXi32 interpretation, but vXi64 matches better with its broadcast behavior in EVEX. I'm looking at moving MULDQ/MULUDQ creation to a DAG combine so we can do it when AVX512DQ is enabled without having to go through Custom lowering. But in some of the test cases we failed to use a broadcast load due to the size difference. This should help with that. I'm also wondering if we can model these instructions in native IR and remove the intrinsics and I think using a vXi64 type will work better with that. llvm-svn: 326991	2018-03-08 08:02:52 +00:00
Heejin Ahn	0de587296e	[WebAssembly] Add except_ref as a first-class type Summary: Add except_ref as a first-class type, according to the [[https://github.com/WebAssembly/exception-handling/blob/master/proposals/Level-1.md \| Level 1 exception handling proposal ]]. Reviewers: dschuff Subscribers: jfb, sbc100, llvm-commits Differential Revision: https://reviews.llvm.org/D43706 llvm-svn: 326985	2018-03-08 04:05:37 +00:00
Weiming Zhao	a4259cd3a6	[AArch64] Fix UB about shift amount exceeds data bit-width Summary: Fixes an UB caught by sanitizer. The shift amount might be larger than 32 so the operand should be 1ULL. In this patch, we replace the original expression with existing API with uint64_t type. Reviewers: eli.friedman, rengolin Reviewed By: rengolin Subscribers: rengolin, javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D44234 llvm-svn: 326969	2018-03-08 00:28:25 +00:00
Craig Topper	7ff9779768	[X86] Fix some isel patterns that used aligned vector load instructions with unaligned predicates. These patterns weren't checking the alignment of the load, but were using the aligned instructions. This will cause a GP fault if the data isn't aligned. I believe these were introduced in r312450. llvm-svn: 326967	2018-03-08 00:21:17 +00:00
Rafael Espindola	06c064824e	Delete code that is probably dead since r249303. With r249303 the expression evaluation should expand variables that are not in sections (and so don't have an atom). llvm-svn: 326966	2018-03-08 00:17:13 +00:00
Simon Pilgrim	68594ee24a	[X86][SSE] LowerBUILD_VECTORAsVariablePermute - reorder permute types. NFCI. Reorder into 128/256/512 bit vector size groupings. NFCI commit before some new features. llvm-svn: 326963	2018-03-07 23:56:42 +00:00
Evandro Menezes	f9bd871d32	[AArch64] Adjust the cost of integer vector division Since there is no instruction for integer vector division, factor in the cost of singling out each element to be used with the scalar division instruction. Differential revision: https://reviews.llvm.org/D43974 llvm-svn: 326955	2018-03-07 22:35:32 +00:00
Sebastian Pop	33bdb3e0e6	[AArch64] add missing pattern for insert_subvector undef The attached testcase started failing after the patch to define isExtractSubvectorCheap with the following pattern mismatch: ISEL: Starting pattern match Initial Opcode index to 85068 Match failed at index 85076 LLVM ERROR: Cannot select: t47: v8i16 = insert_subvector undef:v8i16, t43, Constant:i64<0> The code generated from llvm/lib/Target/AArch64/AArch64InstrInfo.td def : Pat<(insert_subvector undef, (v4i16 FPR64:$src), (i32 0)), (INSERT_SUBREG (v8i16 (IMPLICIT_DEF)), FPR64:$src, dsub)>; is in ninja/lib/Target/AArch64/AArch64GenDAGISel.inc At the location of the error it is: /* 85076*/ OPC_CheckChild2Type, MVT::i32, And it failed to match the type of operand 2. Adding another def-pat for i64 fixes the failed def-pat error: def : Pat<(insert_subvector undef, (v4i16 FPR64:$src), (i64 0)), (INSERT_SUBREG (v8i16 (IMPLICIT_DEF)), FPR64:$src, dsub)>; llvm-svn: 326949	2018-03-07 22:07:13 +00:00
Craig Topper	80ec0c3106	[X86] Remove unused function argument. NFC llvm-svn: 326939	2018-03-07 19:45:45 +00:00
Craig Topper	c3c15dd640	[X86] Make the MUL->VPMADDWD work before op legalization on AVX1 targets. Simplify feature checks by using isTypeLegal. The v8i32 conversion on AVX1 targets was only working after LowerMUL splits 256-bit vectors. While I was there I've also made it so we don't have to check for AVX2 and BWI directly and instead just ask if the type is legal. Differential Revision: https://reviews.llvm.org/D44190 llvm-svn: 326917	2018-03-07 17:53:18 +00:00
Krzysztof Parzyszek	2c3edf0567	[Hexagon] Rewrite non-HVX unaligned loads as pairs of aligned ones This is a follow-up to r325169, this time for all types, not just HVX vector types. Disable this by default, since it's not always safe. llvm-svn: 326915	2018-03-07 17:27:18 +00:00
Farhana Aleen	89196642f7	[AMDGPU] Increased vector length for global/constant loads. Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords. Author: FarhanaAleen Reviewed By: rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44179 llvm-svn: 326910	2018-03-07 17:09:18 +00:00
Farhana Aleen	347d12b4ce	Revert "[AMDGPU] Widened vector length for global/constant address space." This reverts commit ce988cc100dc65e7c6c727aff31ceb99231cab03. llvm-svn: 326907	2018-03-07 16:55:27 +00:00
Stefan Pintilie	f8438e8e59	[PowerPC] LSR tunings for PowerPC The purpose of this patch is to have LSR generate better code on Power. This is done by overriding isLSRCostLess. Differential Revision: https://reviews.llvm.org/D40855 llvm-svn: 326906	2018-03-07 16:53:09 +00:00
Farhana Aleen	0d03d0588d	[AMDGPU] Widened vector length for global/constant address space. llvm-svn: 326904	2018-03-07 16:29:05 +00:00
Simon Dardis	52ae4f078e	[mips] Correct the definition of m(f\|t)c(0\|2) These instructions are defined as taking a GPR register and a coprocessor register for ISAs up to MIPS32. MIPS32 extended the definition to allow a selector--a value from 0 to 32--to access another register. These instructions are now internally defined as being MIPS-I instructions, but are rejected for pre-MIPS32 ISA's if they have an explicit selector which is non-zero. This deviates slightly from GAS's behaviour which rejects assembly instructions with an explicit selector for pre-MIPS32 ISAs. E.g: mfc0 $4, $5, 0 is rejected by GAS for MIPS-I to MIPS-V but will be accepted with this patch for MIPS-I to MIPS-V. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D41662 llvm-svn: 326890	2018-03-07 11:39:48 +00:00
Sjoerd Meijer	af30f06d5c	[ARM] Fix for PR36577 Don't PerformSHLSimplify if the given node is used by a node that also uses a constant because we may get stuck in an infinite combine loop. bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36577 Patch by Sam Parker. Differential Revision: https://reviews.llvm.org/D44097 llvm-svn: 326882	2018-03-07 09:10:44 +00:00
Jonas Paulsson	91c853a79d	[SystemZ] NFC refactoring in SystemZHazardRecognizer. Use Reset() after emitting a call. Review: Ulrich Weigand llvm-svn: 326881	2018-03-07 08:57:09 +00:00
Jonas Paulsson	9b0f28f009	[SystemZ] Improve getCurrCycleIdx() in SystemZHazardRecognizer. getCurrCycleIdx() returns the decoder cycle index which the next candidate SU will be placed on. This patch improves this method by passing the candidate SU to it so that if SU will begin a new group, the index of that group is returned instead. Review: Ulrich Weigand llvm-svn: 326880	2018-03-07 08:54:32 +00:00
Jonas Paulsson	e18dbeb24f	[SystemZ] NFC refactoring in SystemZHazardRecognizer. Handle the not-taken branch in emitInstruction() where the TakenBranch argument is available. This is cleaner than relying on EmitInstruction(). Review: Ulrich Weigand llvm-svn: 326879	2018-03-07 08:45:09 +00:00
Jonas Paulsson	61fbcf5825	[SystemZ] Improved debug dumping during post-RA scheduling. Review: Ulrich Weigand llvm-svn: 326878	2018-03-07 08:39:00 +00:00
Clement Courbet	327fac4d75	[X86] Add IMUL scheduling info on sandybridge, fix it on >=haswell. Summary: Only IMUL16rri uses an extra P0156. IMUL32* and IMUL16rr only use P1. This was computed using https://github.com/google/EXEgesis/blob/master/exegesis/tools/compute_itineraries.cc This can easily be validated by running perf on the following code: ``` int main(int argc, char**argv) { int a = argc; int b = argc; int c = argc; int d = argc; for (int i = 0; i < LOOP_ITERATIONS; ++i) { asm volatile( R"( .rept 10000 imull $0x2, %%edx, %%eax imull $0x2, %%ecx, %%ebx imull $0x2, %%eax, %%edx imull $0x2, %%ebx, %%ecx .endr )" : "+a"(a), "+b"(b), "+c"(c), "+d"(d) : :); } return a+b+c+d; } ``` -> test.cc perf stat -x, -e cycles --pfm-events=uops_executed_port:port_0:u,uops_executed_port:port_1:u,uops_executed_port:port_2:u,uops_executed_port:port_3:u,uops_executed_port:port_4:u,uops_executed_port:port_5:u,uops_executed_port:port_6:u,uops_executed_port:port_7:u test Reviewers: craig.topper, RKSimon, gadi.haber Subscribers: llvm-commits, gchatelet, chandlerc Differential Revision: https://reviews.llvm.org/D43460 llvm-svn: 326877	2018-03-07 08:14:02 +00:00
Craig Topper	80d3bb3b4b	[TargetLowering] Rename DAGCombinerInfo::isAfterLegalizeVectorOps to DAGCombiner::isAfterLegalizeDAG since that's what it checks. NFC The code checks Level == AfterLegalizeDAG which is the fourth and last of the possible DAG combine stages that we have. There is a Level called AfterLegalVectorOps, but that's the third DAG combine and it doesn't always run. A function called isAfterLegalVectorOps should imply it returns true in either of the DAG combines that runs after the legalize vector ops stage, but that's not what this function does. llvm-svn: 326832	2018-03-06 19:44:52 +00:00
Craig Topper	274e08dd81	[X86] Reject registers that require a REX prefix in inline asm constraints in 32-bit mode We don't currently reject r8-r15 or xmm8-32 or bpl/spl/sil/dil in 32-bit mode. Differential Revision: https://reviews.llvm.org/D44031 llvm-svn: 326826	2018-03-06 18:56:33 +00:00
Stanislav Mekhanoshin	0f72225433	[AMDGPU] Add default ISA version targets In case if -mattr used to modify feature set bits in llvm-mc call getIsaVersion can fail to identify specific ISA due to test mismatch. Adding default fallback tests which will always correctly report at least major version. Differential Revision: https://reviews.llvm.org/D44163 llvm-svn: 326825	2018-03-06 18:33:55 +00:00
Sebastian Pop	41073e8046	[AArch64] define isExtractSubvectorCheap Following the ARM-neon backend, define isExtractSubvectorCheap to return true when extracting low and high part of a neon register. The patch disables a test in llvm/test/CodeGen/AArch64/arm64-ext.ll This testcase is fragile in the sense that it requires a BUILD_VECTOR to "survive" all DAG transforms until ISelLowering. The testcase is supposed to check that AArch64TargetLowering::ReconstructShuffle() works, and for that we need a BUILD_VECTOR in ISelLowering. As we now transform the BUILD_VECTOR earlier into an VEXT + vector_shuffle, we don't have the BUILD_VECTOR pattern when we get to ISelLowering. As there is no way to disable the combiner to only exercise the code in ISelLowering, the patch disables the testcase. Differential revision: https://reviews.llvm.org/D43973 llvm-svn: 326811	2018-03-06 16:54:55 +00:00
Yaxun Liu	46439e8d4a	[AMDGPU] Fix lowering OpenCL enqueue_kernel One addrspacecast disappeared in clang emitted IR for block invoke function due to adoption of the new addr space mapping. Differential Revision: https://reviews.llvm.org/D43785 llvm-svn: 326806	2018-03-06 16:04:39 +00:00
Simi Pallipurath	75c6bfeac9	[ARM]Decoding MSR with unpredictable destination register causes an assert This patch handling: Enable parsing of raw encodings of system registers . Allows UNPREDICTABLE sysregs to be decoded to a raw number in the same way that disasslib does, rather than llvm crashing. Disassemble msr/mrs with unpredictable sysregs as SoftFail. Fix regression due to SoftFailing some encodings. Patch by Chris Ryder Differential revision:https://reviews.llvm.org/D43374 llvm-svn: 326803	2018-03-06 15:21:19 +00:00
Dylan McKay	8f46486c65	[AVR] Remove the earlyclobber flag from LDDWRdYQ Before I started maintaining the AVR backend, this instruction never originally used to have an earlyclobber flag. Some time afterwards (years ago), I must've added it back in, not realising that it was left out for a reason. This pseudo instrction exists solely to work around a long standing bug in the register allocator. Before this commit, the LDDWRdYQ pseudo was not actually working around any bug. With the earlyclobber flag removed again, the LDDWRdYQ pseudo now correctly works around PR13375 again. llvm-svn: 326774	2018-03-06 11:20:25 +00:00
Martin Storsjo	a7adc3185b	[X86] Handle EAX being live when calling chkstk for x86_64 EAX can turn out to be alive here, when shrink wrapping is done (which is allowed when using dwarf exceptions, contrary to the normal case with WinCFI). This fixes PR36487. Differential Revision: https://reviews.llvm.org/D43968 llvm-svn: 326764	2018-03-06 06:00:13 +00:00
Simon Pilgrim	53ff5ae8a1	[X86] cvttpd2dq lowering has been supported for some time Tests in vec_fp_to_int.ll llvm-svn: 326751	2018-03-05 23:00:39 +00:00
Nemanja Ivanovic	6cc31ca814	[PowerPC] Do not emit record-form rotates when record-form andi suffices Up until Power9, the performance profile for rlwinm., rldicl. and andi. looked more or less equivalent. However with Power9, the rotates are still 2-way cracked whereas the and-immediate is not. This patch just ensures that we don't emit record-form rotates when an andi. is adequate. As first pointed out by Carrot in https://bugs.llvm.org/show_bug.cgi?id=30833 (this patch is a fix for that PR). Differential Revision: https://reviews.llvm.org/D43977 llvm-svn: 326736	2018-03-05 19:27:16 +00:00
Sebastian Pop	ac0bfb5938	fix PR36582 The error occurs when reading i16 elements (as in the testcase) from a v8i8 with a pattern of <0,2,4,6>. As all the data in the vector is accessed, the operation is not a VUZP. The patch stops the pattern recognition of VUZP when EXTRACT_VECTOR_ELT has a different element type than BUILD_VECTOR. llvm-svn: 326722	2018-03-05 17:35:49 +00:00
Evandro Menezes	cd855f70c5	[AArch64] Improve code generation of constant vectors Use the whole gammut of constant immediates available to set up a vector. Instead of using, for example, `mov w0, #0xffff; dup v0.4s, w0`, which transfers between register files, use the more efficient `movi v0.4s, #-1` instead. Not limited to just a few values, but any immediate value that can be encoded by all the variants of `FMOV`, `MOVI`, `MVNI`, thus eliminating the need to there be patterns to optimize special cases. Differential revision: https://reviews.llvm.org/D42133 llvm-svn: 326718	2018-03-05 17:02:47 +00:00
Matt Arsenault	e31ab94e97	AMDGPU/GlobalISel: Add InstrMapping for G_EXTRACT llvm-svn: 326715	2018-03-05 16:25:18 +00:00
Matt Arsenault	71272e6d4e	AMDGPU/GlobalISel: Make some G_EXTRACTs legal As far as I can tell legalization of weird sizes for the output type isn't implemented. llvm-svn: 326714	2018-03-05 16:25:15 +00:00
Matt Arsenault	4cc0b85276	AMDGPU: Fix build warning about override llvm-svn: 326713	2018-03-05 16:25:10 +00:00
Alexander Timofeev	2e5eeceeb7	Pass Divergence Analysis data to Selection DAG to drive divergence dependent instruction selection. Differential revision: https://reviews.llvm.org/D35267 llvm-svn: 326703	2018-03-05 15:12:21 +00:00
Stefan Pintilie	d45db612c6	[Power9] Add more missing instructions to the Power 9 scheduler Adding more instructions using InstRW so that we can move away from ItinRW and ultimately have a complete Power 9 scheduler. llvm-svn: 326701	2018-03-05 14:34:59 +00:00
Oliver Stannard	f20222a83c	[ARM][Asm] VMOVSRR and VMOVRRS need sequential S registers These instructions require that the two S registers are adjacent (but not the R registers), because only the first register is included in the encoding, but we were not checking this in the assembler. Differential revision: https://reviews.llvm.org/D44084 llvm-svn: 326696	2018-03-05 13:27:26 +00:00
Thomas Preud'homme	c699eaa311	Fix location of comment in EmitPopInst Comment about folding return in LDM was not moved along with the corresponding code in r242714. This commit fixes that. llvm-svn: 326690	2018-03-05 11:49:00 +00:00
Craig Topper	f546b2c06f	[X86] Replace usages of X86Subtarget::hasFp256 with hasAVX. Remove hasFP256. Almost none of these usages were FP specific. And we had no clear guideliness on when to use hasAVX vs hasFP256. I might also remove hasInt256 too since its an alias for hasAVX2. llvm-svn: 326682	2018-03-05 00:13:35 +00:00
Craig Topper	f2aae62228	[X86] Add a DAG combine to turn stores of vXi1 constants into scalar stores. llvm-svn: 326679	2018-03-04 19:33:15 +00:00
Simon Pilgrim	3a76bc29d7	[X86][MMX] Remove completed _mm_cvtsi32_si64 todo rL322525 - mmx zero constant support rL322553 - mmx i32 zero extended value rL326497 - mmx i64 general constant handling Not all constants are folded, we generate some on the GPRs (similar to SSE build vector) where appropriate llvm-svn: 326673	2018-03-04 14:57:26 +00:00
Craig Topper	12c35e1940	[X86] Fix unused variable in release builds. llvm-svn: 326672	2018-03-04 02:14:16 +00:00
Craig Topper	a476026f70	[X86] Combine (store (v1i1 (scalar_to_vector (i8 X)))) -> (store (i8 X)). llvm-svn: 326670	2018-03-04 01:48:02 +00:00
Craig Topper	be31585be8	[X86] Lower v1i1/v2i1/v4i1/v8i1 load/stores to i8 load/store during op legalization if AVX512DQ is not supported. We were previously doing this with isel patterns. Moving it to op legalization gives us chance to see the required bitcast earlier. And it lets us remove some isel patterns. llvm-svn: 326669	2018-03-04 01:48:00 +00:00
Simon Pilgrim	ad403a483a	[X86] This bit-test TODO has been moved in PR36551 llvm-svn: 326658	2018-03-03 16:31:17 +00:00
Craig Topper	d4b6601662	[X86] Remove 'else' after return. NFC llvm-svn: 326642	2018-03-03 05:18:21 +00:00
Krzysztof Parzyszek	e3e963236a	[Hexagon] Generate valignb for shifting shuffles (instead of vdelta) llvm-svn: 326627	2018-03-02 22:22:19 +00:00
Sameer AbuAsal	2646a41e54	[RISCV] Implement MC relaxations for compressed instructions. Summary: This patch implements relaxation for RISCV in the MC layer. The following relaxations are currently handled: 1) Relax C_BEQZ to BEQ and C_BNEZ to BNEZ in RISCV. 2) Relax and C_J $imm to JAL x0, $imm and CJAL to JAL ra, $imm. Reviewers: asb, llvm-commits, efriedma Reviewed By: asb Subscribers: shiva0217 Differential Revision: https://reviews.llvm.org/D43055 llvm-svn: 326626	2018-03-02 22:04:12 +00:00
Sam Clegg	bd1716aed1	[WebAssembly] Avoid cast ExprType to wasm::ValType This cast was causing invalid signatures to be written for libcall functions. Add an MC test which includes a call to builtin memcpy. Differential Revision: https://reviews.llvm.org/D44037 llvm-svn: 326618	2018-03-02 21:33:14 +00:00
Heejin Ahn	0c69a3e3d9	Reland "[WebAssembly] More uses of uint8_t for single byte values" Summary: Original change was D43991 (rL326541) and was reverted by rL326571 and rL326572. This adds also the necessary MCCodeEmitter patch. Reviewers: sbc100 Subscribers: jfb, dschuff, sbc100, jgravelle-google, sunfish, llvm-commits, ncw Differential Revision: https://reviews.llvm.org/D44034 llvm-svn: 326614	2018-03-02 20:52:59 +00:00
Ulrich Weigand	db16beed8a	[SystemZ] Allow LRV/STRV with volatile memory accesses The byte-swapping loads and stores do not actually perform multiple accesses to their memory operand, so they are OK to use with volatile memory operands as well. Remove overly cautious check. llvm-svn: 326613	2018-03-02 20:51:59 +00:00
Ulrich Weigand	8b19be46c7	[SystemZ] Add support for anyregcc calling convention This adds back-end support for the anyregcc calling convention for use with patchpoints. Since all registers are considered call-saved with anyregcc (except for 0 and 1 which may still be clobbered by PLT stubs and the like), this required adding support for saving and restoring vector registers in prologue/epilogue code for the first time. This is not used by any other calling convention. llvm-svn: 326612	2018-03-02 20:40:11 +00:00
Ulrich Weigand	5eb64110d2	[SystemZ] Support stackmaps and patchpoints This adds back-end support for the @llvm.experimental.stackmap and @llvm.experimental.patchpoint intrinsics. llvm-svn: 326611	2018-03-02 20:39:30 +00:00
Ulrich Weigand	3206388870	[SystemZ] Fix common-code users of stack size On SystemZ we need to provide a register save area of 160 bytes to any called function. This size needs to be added when allocating stack in the function prologue. However, it was not accounted for as part of MachineFrameInfo::getStackSize(); instead the back-end used a private routine getAllocatedStackSize(). This is OK for code-gen purposes, but it breaks other users of the getStackSize() routine, in particular it breaks the recently- added -stack-size-section feature. Fix this by updating the main stack size tracked by common code (in emitPrologue) instead of using the private routine. No change in code generation intended. llvm-svn: 326610	2018-03-02 20:38:41 +00:00
Ulrich Weigand	18f6930fef	[SystemZ] Support vector registers in inline asm This adds support for specifying vector registers for use with inline asm statements, either via the 'v' constraint or by explicit register names (v0 ... v31). llvm-svn: 326609	2018-03-02 20:36:34 +00:00
Krzysztof Parzyszek	f608812bde	[Hexagon] Handle VACOPY in isel lowering llvm-svn: 326599	2018-03-02 18:35:57 +00:00
Simon Pilgrim	8cbc1d232b	[X86][BTVER2] Fix throughput of YMM bitwise instructions These instructions are double-pumped, split into 2 128-bit ops and then passing through either FPU pipe. Found while testing llvm-mca (D43951) llvm-svn: 326597	2018-03-02 18:20:35 +00:00
Craig Topper	6b1419b547	[X86] Reject xmm16-31 in inline asm constraints when AVX512 is disabled Fixes PR36532 Differential Revision: https://reviews.llvm.org/D43960 llvm-svn: 326596	2018-03-02 18:19:40 +00:00
Derek Schuff	57feeed307	[X86][x32] Save callee-save register used as base pointer for x32 ABI For the x32 ABI, since the base pointer register (EBX) is a callee save register it should be saved before use. This fixes https://bugs.llvm.org/show_bug.cgi?id=36011 Differential Revision: https://reviews.llvm.org/D42358 Patch by Pratik Bhatu llvm-svn: 326593	2018-03-02 17:46:39 +00:00
Benjamin Kramer	4925653555	[ARM] Fold variable into assert. Avoids unused variable warnings in Release mode. llvm-svn: 326592	2018-03-02 17:39:20 +00:00
Matt Arsenault	b9699c009d	AMDGPU/GlobalISel: InstrMapping for G_ZEXT llvm-svn: 326589	2018-03-02 16:55:37 +00:00
Matt Arsenault	1c1aab99ae	AMDGPU/GlobalISel: InstrMapping for G_TRUNC llvm-svn: 326588	2018-03-02 16:55:33 +00:00
Matt Arsenault	ef8db767d7	AMDGPU/GlobalISel: Define InstrMappings for G_FCMP Patch by Tom Stellard llvm-svn: 326587	2018-03-02 16:53:15 +00:00
Matt Arsenault	2607dc60de	AMDGPU/GlobalISel: Define instruction mapping for @llvm.minnum Patch by Tom Stellard llvm-svn: 326586	2018-03-02 16:40:17 +00:00
Momchil Velikov	505614bb4f	[ARM] Fix access to stack arguments when re-aligning SP in Armv6m When an Armv6m function dynamically re-aligns the stack, access to incoming stack arguments (and to stack area, allocated for register varargs) is done via SP, which is incorrect, as the SP is offset by an unknown amount relative to the value of SP upon function entry. This patch fixes it, by making access to "fixed" frame objects be done via FP when the function needs stack re-alignment. It also changes the access to "fixed" frame objects be done via FP (instead of using R6/BP) also for the case when the stack frame contains variable sized objects. This should allow more objects to fit within the immediate offset of the load instruction. All of the above via a small refactoring to reuse the existing `ARMFrameLowering::ResolveFrameIndexReference.` Differential Revision: https://reviews.llvm.org/D43566 llvm-svn: 326584	2018-03-02 15:47:14 +00:00
Stefan Pintilie	b5a9440a80	[Power9] Add missing instructions to the Power 9 scheduler Adding more instructions using InstRW so that we can move away from ItinRW and ultimately have a complete Power 9 scheduler. llvm-svn: 326578	2018-03-02 14:41:38 +00:00
David Stenberg	3fb8c324b3	Test commit: Remove an extraneous space. NFC Test commit access. llvm-svn: 326573	2018-03-02 14:28:56 +00:00
Nicholas Wilson	be28e61a03	Revert "[WebAssembly] More uses of uint8_t" and "[WebAssembly] Update tests" This reverts commits r326541 and r326571. The tests were correct, and were updated with incorrect expectations. The original commit was broken and should be reverted to get things back to a working state. llvm-svn: 326572	2018-03-02 14:07:39 +00:00
Florian Hahn	9deef20b6c	[ARM] Fix codegen for VLD3/VLD4/VST3/VST4 with WB Code generation of VLD3, VLD4, VST3 and VST4 with register writeback is broken due to 2 separate bugs: 1) VLD1d64TPseudoWB_register and VLD1d64QPseudoWB_register are missing rules to expand them to non pseudo MIR. These are selected for ARMISD::VLD3_UPD/VLD4_UPD with v1i64 vectors in SelectVLD. 2) Selection of the right VLD/VST instruction is broken for load and store of 3 and 4 v1i64 vectors. SelectVLD and SelectVST are called with MIR opcode for fixed writeback (ie increment is access size) and call getVLDSTRegisterUpdateOpcode() to select an opcode with register writeback if base register update is of a different size. Since getVLDSTRegisterUpdateOpcode() only knows about VLD1/VLD2/VST1/VST2 the call is currently conditional on the number of element in the vector. However, VLD1/VST1 is selected by SelectVLD/SelectVST's caller for load and stores of 3 or 4 v1i64 vectors. Therefore the opcode is not updated which later lead to a fixed writeback instruction being constructed with an extra operand for the register writeback. This patch addresses the two issues as follows: - it adds the necessary mapping from VLD1d64TPseudoWB_register and VLD1d64QPseudoWB_register to VLD1d64Twb_register and VLD1d64Qwb_register respectively. Like for the existing _fixed variants, the cost of these is bumped for unaligned access. - it changes the logic in SelectVLD and SelectVSD to call isVLDfixed and isVSTfixed respectively to decide whether the opcode should be updated. It also reworks the logic and comments for pushing the writeback offset operand and r0 operand to clarify the logic: writeback offset needs to be pushed if it's a register writeback, r0 needs to be pushed if not and the instruction is a VLD1/VLD2/VST1/VST2. Reviewers: rengolin, t.p.northover, samparker Reviewed By: samparker Patch by Thomas Preud'homme <thomas.preudhomme@arm.com> Differential Revision: https://reviews.llvm.org/D42970 llvm-svn: 326570	2018-03-02 13:02:55 +00:00
Matt Arsenault	b46c191c49	AMDGPU/GlobalISel: Define instruction mapping for @llvm.maxnum Patch by Tom Stellard llvm-svn: 326567	2018-03-02 12:23:00 +00:00
Simon Pilgrim	c879aa7eab	[X86] Remove old UNIMPLEMENTED list All of these are implemented and have appropriate test coverage llvm-svn: 326553	2018-03-02 11:59:37 +00:00
Heejin Ahn	d684cb57f4	[WebAssembly] More uses of uint8_t for single byte values Summary: It looks like this was missing from D43921. Reviewers: sbc100 Subscribers: jfb, dschuff, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D43991 llvm-svn: 326541	2018-03-02 06:51:35 +00:00
Jan Vesely	b283ea0f0f	AMDGPU/GCN: Promote i16 ctpop i16 capable ASICs do not support i16 operands for this instruction. Add tablegen pattern to merge chained i16 additions. Differential Revision: https://reviews.llvm.org/D43985 llvm-svn: 326535	2018-03-02 02:50:22 +00:00
Matt Arsenault	41d2e3d98e	AMDGPU/GlobalISel: Define instruction mapping for G_FPTOSI Patch by Tom Stellard llvm-svn: 326534	2018-03-02 02:19:16 +00:00
Matt Arsenault	b23041ad4d	AMDGPU/GlobalISel: Define instruction mapping for G_FPTOUI Patch by Tom Stellard llvm-svn: 326533	2018-03-02 02:19:11 +00:00
Matt Arsenault	327d5fb2e5	AMDGPU/GlobalISel: Define instruction mapping for G_FMUL llvm-svn: 326532	2018-03-02 02:17:01 +00:00
Matt Arsenault	5a9e834eac	AMDGPU/GlobalISel: Define instruction mapping for G_FADD Patch by Tom Stellard llvm-svn: 326526	2018-03-02 01:22:13 +00:00
Matt Arsenault	d99317f1b3	AMDGPU/GlobalISel: Define instruction mapping for G_SHL Patch by Tom Stellard llvm-svn: 326525	2018-03-02 01:22:10 +00:00
Matt Arsenault	3c7a123ccc	AMDGPU/GlobalISel: Define instruction mapping for G_XOR llvm-svn: 326524	2018-03-02 01:22:06 +00:00
Matt Arsenault	c0f34c9e36	AMDGPU/GlobalISel: Define instruction mapping for G_AND Patch by Tom Stellard llvm-svn: 326523	2018-03-02 01:22:01 +00:00
Heejin Ahn	e4a8deea84	[WebAssembly] Gather EH instructions in one place. NFC. Summary: - Gather EH instructions in one place for easy tracking (more will be added later) - Variable name change Reviewers: dschuff Subscribers: jfb, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D43742 llvm-svn: 326522	2018-03-02 01:03:40 +00:00
Yonghong Song	03e1c8b8f9	bpf: introduce -mattr=dwarfris to disable DwarfUsesRelocationsAcrossSections Commit e4507fb8c94b ("bpf: disable DwarfUsesRelocationsAcrossSections") disables MCAsmInfo DwarfUsesRelocationsAcrossSections unconditionally so that dwarf will not use cross section (between dwarf and symbol table) relocations. This new debug format enables pahole to dump structures correctly as libdwarves.so does not have BPF backend support yet. This new debug format, however, breaks bcc (https://github.com/iovisor/bcc) source debug output as llvm in-memory Dwarf support has some issues to handle it. More specifically, with DwarfUsesRelocationsAcrossSections disabled, JIT compiler does not generate .debug_abbrev and Dwarf DIE (debug info entry) processing is not happy about this. This patch introduces a new flag -mattr=dwarfris (dwarf relocation in section) to disable DwarfUsesRelocationsAcrossSections. DwarfUsesRelocationsAcrossSections is true by default. Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 326505	2018-03-01 23:04:59 +00:00
Simon Pilgrim	90fd0622b6	[X86][MMX] Improve handling of 64-bit MMX constants 64-bit MMX constant generation usually ends up lowering into SSE instructions before being spilled/reloaded as a MMX type. This patch bitcasts the constant to a double value to allow correct loading directly to the MMX register. I've added MMX constant asm comment support to improve testing, it's better to always print the double values as hex constants as MMX is mainly an integer unit (and even with 3DNow! its just floats). Differential Revision: https://reviews.llvm.org/D43616 llvm-svn: 326497	2018-03-01 22:22:31 +00:00
Krzysztof Parzyszek	c5e0ed109d	[Hexagon] Add trap1 instruction llvm-svn: 326492	2018-03-01 21:54:08 +00:00
Matt Arsenault	364f12e8f9	AMDGPU/GlobalISel: Define instruction mapping for @llvm.amdgcn.cvt.pkrtz Patch by Tom Stellard llvm-svn: 326490	2018-03-01 21:25:30 +00:00
Matt Arsenault	5320ee4a05	AMDGPU/GlobalISel: Define instruction mapping for G_OR Patch by Tom Stellard llvm-svn: 326489	2018-03-01 21:25:25 +00:00
Matt Arsenault	e65404f5c5	AMDGPU/GlobalISel: Remove default register mapping This crashes for some opcodes, which prevents the SelectionDAG fallback from working. Patch by Tom Stellard llvm-svn: 326487	2018-03-01 21:20:44 +00:00
Evandro Menezes	2bbb4a7c93	[AArch64] Clean up code (NFC) Clean up a couple of functions in `AArch64TargetLowering` by removing redundant statements. llvm-svn: 326486	2018-03-01 21:17:36 +00:00
Matt Arsenault	1422a19a88	AMDGPU/GlobalISel: Use a more correct getValueMapping This was finding the wrong size registers for anything with more than 2 components. Patch by Tom Stellard llvm-svn: 326483	2018-03-01 21:08:51 +00:00
Matt Arsenault	62669ede94	AMDGPU/GlobalISel: Define instruction mapping for G_BITCAST Patch by Tom Stellard llvm-svn: 326482	2018-03-01 20:59:44 +00:00
Matt Arsenault	0529a8e2de	AMDGPU/GlobalISel: Mark i32->i64 zext as legal llvm-svn: 326481	2018-03-01 20:56:21 +00:00
Martin Storsjo	c61ff3bef1	[AArch64] Add support for secrel add/load/store relocations for COFF Differential Revision: https://reviews.llvm.org/D43288 llvm-svn: 326480	2018-03-01 20:42:28 +00:00
Matt Arsenault	36b99e1937	AMDGPU/GlobalISel: InstrMapping for llvm.amdgcn.exp.compr Patch by Tom Stellard llvm-svn: 326479	2018-03-01 20:40:55 +00:00
Matt Arsenault	8931bbf8df	AMDGPU/GlobalISel: Define instruction mapping for @llvm.amdgcn.exp Patch by Tom Stellard llvm-svn: 326477	2018-03-01 20:24:37 +00:00
Matt Arsenault	50721ab325	AMDGPU/GlobalISel: Define InstrMappings for G_ICMP Patch by Tom Stellard llvm-svn: 326472	2018-03-01 19:27:10 +00:00
Matt Arsenault	dc14ec05d4	AMDGPU/GlobalISel: Make i32 mul legal llvm-svn: 326471	2018-03-01 19:22:05 +00:00
Matt Arsenault	06cbb27a79	AMDGPU/GlobalISel: Define instruction mapping for G_IMPLICIT_DEF Patch by Tom Stellard llvm-svn: 326470	2018-03-01 19:16:52 +00:00
Matt Arsenault	e3d9ecf2b9	AMDGPU/GlobalISel: Define instruction mapping for G_FCONSTANT Patch by Tom Stellard llvm-svn: 326468	2018-03-01 19:13:30 +00:00
Matt Arsenault	51b0b20023	AMDGPU/GlobalISel: Add copyCost for VGPR->SGPR copies Patch by Tom Stellard llvm-svn: 326467	2018-03-01 19:09:25 +00:00
Matt Arsenault	3f6a204eaa	AMDGPU/GlobalISel: Make i32 xor legal llvm-svn: 326466	2018-03-01 19:09:21 +00:00
Matt Arsenault	8e80a5fbca	AMDGPU/GlobalISel: Mark 32/64-bit G_FCMP as legal Patch by Tom Stellard llvm-svn: 326465	2018-03-01 19:09:16 +00:00
Matt Arsenault	dd022ce064	AMDGPU/GlobalISel: Mark 32-bit G_FPTOSI as legal Patch by Tom Stellard llvm-svn: 326464	2018-03-01 19:04:25 +00:00
Sam Clegg	503fdea3cb	[WebAssembly] Fix broken gcc build after rL326454 The gcc builders were broken by rL326454 See: https://reviews.llvm.org/D43921 llvm-svn: 326460	2018-03-01 18:48:08 +00:00
Artem Belevich	8c9749b1dc	[NVPTX] use pattern matching to lower int_nvvm_match_all_sync*. Now that patterns can handle intrinsics returning multiple results, use tablegen'ed pattern matching instead of custom lowering. Differential Revision: https://reviews.llvm.org/D43890 llvm-svn: 326457	2018-03-01 18:28:45 +00:00
Sam Clegg	03e101f1b0	[WebAssembly] Use uint8_t for single byte values to match the spec The original BinaryEncoding.md document used to specify that these values were `varint7`, but the official spec lists them explicitly as single byte values and not LEB. A similar change for wabt is in flight: https://github.com/WebAssembly/wabt/pull/782 Differential Revision: https://reviews.llvm.org/D43921 llvm-svn: 326454	2018-03-01 18:06:21 +00:00
Alexander Timofeev	0081d23fd8	[AMDGPU] : fix for the crash in SIRegisterInfo when the regiser class not found Differential revision: https://reviews.llvm.org./D43334 llvm-svn: 326451	2018-03-01 17:36:43 +00:00
Krzysztof Parzyszek	22a21d4c5d	[Hexagon] Add guest registers llvm-svn: 326450	2018-03-01 17:03:26 +00:00
Stefan Pintilie	e894e0ff6f	[Power9] Add missing instructions to the Power 9 scheduler Adding more instructions using InstRW so that we can move away from ItinRW and ultimately have a complete Power 9 scheduler. Differential Revision: https://reviews.llvm.org/D43899 llvm-svn: 326447	2018-03-01 16:16:08 +00:00
Sebastian Pop	c33af715d7	[AArch64] generate vuzp instead of mov when a BUILD_VECTOR is created out of a sequence of EXTRACT_VECTOR_ELT with a specific pattern sequence, either <0, 2, 4, ...> or <1, 3, 5, ...>, replace the BUILD_VECTOR with either vuzp1 or vuzp2. With this patch LLVM generates the following code for the first function fun1 in the testcase: adrp x8, .LCPI0_0 ldr q0, [x8, :lo12:.LCPI0_0] tbl v0.16b, { v0.16b }, v0.16b ext v1.16b, v0.16b, v0.16b, #8 uzp1 v0.8b, v0.8b, v1.8b str d0, [x8] ret Without this patch LLVM currently generates this code: adrp x8, .LCPI0_0 ldr q0, [x8, :lo12:.LCPI0_0] tbl v0.16b, { v0.16b }, v0.16b mov v1.16b, v0.16b mov v1.b[1], v0.b[2] mov v1.b[2], v0.b[4] mov v1.b[3], v0.b[6] mov v1.b[4], v0.b[8] mov v1.b[5], v0.b[10] mov v1.b[6], v0.b[12] mov v1.b[7], v0.b[14] str d1, [x8] ret llvm-svn: 326443	2018-03-01 15:47:39 +00:00
Craig Topper	cb7881c649	[X86] Stop passing two arguments by reference. NFC I think these used to be out parameters, but they haven't been for a while. llvm-svn: 326417	2018-03-01 06:25:13 +00:00
Craig Topper	ccfa5257a6	[X86] Make sure we don't combine (fneg (fma X, Y, Z)) to a target specific node when there are no FMA instructions. This would cause a 'cannot select' error at isel when we should have emitted a lib call and an xor. Fixes PR36553. llvm-svn: 326393	2018-03-01 00:08:38 +00:00
Justin Lebar	faaf2d298e	[NVPTX] Lower loads from global constants using ld.global.nc (aka LDG). Summary: After D43914, loads from global variables in addrspace(1) happen with ld.global. But since they're constants, even better would be to use ld.global.nc, aka ldg. Reviewers: tra Subscribers: jholewinski, sanjoy, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D43915 llvm-svn: 326390	2018-02-28 23:58:05 +00:00
Justin Lebar	5a7de898d2	[NVPTX] Use addrspacecast instead of target-specific intrinsics in NVPTXGenericToNVVM. Summary: NVPTXGenericToNVVM was using target-specific intrinsics to do address space casts. Using the addrspacecast instruction is (a lot) simpler. But it also has the advantage of being understandable to other passes. In particular, InferAddrSpaces is able to understand these address space casts and remove them in most cases. Reviewers: tra Subscribers: jholewinski, sanjoy, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D43914 llvm-svn: 326389	2018-02-28 23:57:48 +00:00
Craig Topper	e31b9d1e5f	[X86] Lower extract_element from k-registers by bitcasting from v16i1 to i16 and extending/truncating. This is equivalent to what isel was doing anyway but by canonicalizing earlier we can remove some patterns. llvm-svn: 326375	2018-02-28 22:23:55 +00:00
Simon Pilgrim	72b86586b0	[X86][AVX512] Improve support for signed saturation truncation stores Matches what we already manage for unsigned saturation truncation stores Differential Revision: https://reviews.llvm.org/D43629 llvm-svn: 326372	2018-02-28 21:42:19 +00:00
Krzysztof Parzyszek	b1cdb60e75	[Hexagon] Implement target feature +reserved-r19 llvm-svn: 326364	2018-02-28 20:29:36 +00:00
Tim Renouf	2a99fa2c08	[AMDGPU] added writelane intrinsic Summary: For use by LLPC SPV_AMD_shader_ballot extension. The v_writelane instruction was already implemented for use by SGPR spilling, but I had to add an extra dummy operand tied to the destination, to represent that all lanes except the selected one keep the old value of the destination register. .ll test changes were due to schedule changes caused by that new operand. Differential Revision: https://reviews.llvm.org/D42838 llvm-svn: 326353	2018-02-28 19:10:32 +00:00
Artem Belevich	18a7c51520	[NVPTX] Removed always-true predicates in NVPTX. NVPTX stopped supporting GPUs older than sm_20 (Fermi) quite a while back. Removal of support of pre-Fermi GPUs made a lot of predicates in the NVPTX backend pointless as they can't ever be false any more. It's time to retire them. NFC intended. Differential Revision: https://reviews.llvm.org/D43843 llvm-svn: 326349	2018-02-28 18:51:22 +00:00
Chih-Hung Hsieh	9f9e4681ac	[TLS] use emulated TLS if the target supports only this mode Emulated TLS is enabled by llc flag -emulated-tls, which is passed by clang driver. When llc is called explicitly or from other drivers like LTO, missing -emulated-tls flag would generate wrong TLS code for targets that supports only this mode. Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether emulated TLS code should be generated. Unit tests are modified to run with and without the -emulated-tls flag. Differential Revision: https://reviews.llvm.org/D42999 llvm-svn: 326341	2018-02-28 17:48:55 +00:00
Pablo Barrio	512f7ee315	[ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations Summary: Expressions of the form x < 0 ? 0 : x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or or while in ARM/thumb2 mode that has flexible second operand the shift can be folded into a single bic/or instructions. In most cases this results in smaller code and possibly less branches, and in no case larger than before. Patch by Martin Svanfeldt Reviewers: fhahn, pbarrio, rogfer01 Reviewed By: pbarrio, rogfer01 Subscribers: chrib, yroux, eugenis, efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42574 llvm-svn: 326333	2018-02-28 17:13:07 +00:00
Simon Dardis	4529aac2de	[mips] Begin reworking instruction predicates for ISAs/encodings (1/N) The MIPS backend has inconsistent usage of instruction predicates for assembly and code generation. The issue arises from supporting three encodings, two (MIPS and microMIPS) of which have a near 1:1 instruction mapping across ISA revisions and a third encoding with a more restricted set of instructions (MIPS16e). To enforce consistent usage, each of the ISA_* adjectives has (or will have) the relevant encoding attached to it along the relevant ISA revision where the instruction is defined. Each instruction, pattern or alias will then have the correct ISA adjective attached to it, and the base instruction description classes will have any predicates relating to ISA encoding or revision removed. Pseudo instructions will also be guarded for the encoding or ABI that they are supported in. Finally, the hasStandardEncoding() / inMicroMipsMode() / inMips16Mode() methods of MipsSubtarget will be changed such that only one can be true at any one time. The result of this is that code generation and assembly will produce the correct encoding up front, while code generated from pseudo instructions and other inserted sequences of instructions will be able to rely on the mapping tables to produce the correct encoding. This should fix numerous bugs where the result 'happens' to be correct but has edge cases where microMIPS and MIPS have subtle differences (e.g. microMIPSR6 using 'j', 'jal' instructions.) This patch starts the process by changing most of the ISA adjectives to make use of the EncodingPredicate member of PredicateControl. Follow on patches will annotate instructions with their correct ISA adjective and eliminate the usage of "let Predicates = [..]", "let AdditionalPredicates = [..]" and "isCodeGenOnly = 1" in the cases where it was used to control instruction availability. Contributions from Nitesh Jain. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D41434 llvm-svn: 326322	2018-02-28 13:02:44 +00:00
Alexander Ivchenko	c01f750480	[GlobalIsel][X86] Support G_INTTOPTR instruction. Add legalization/selection for x86/x86_64 and corresponding tests. Reviewed By: igorb Differential Revision: https://reviews.llvm.org/D43622 llvm-svn: 326320	2018-02-28 12:11:53 +00:00
Alexander Ivchenko	46e07e3623	[GlobalIsel][X86] Support G_PTRTOINT instruction. Add legalization/selection for x86/x86_64 and corresponding tests. Reviewed By: igorb Differential Revision: https://reviews.llvm.org/D43617 llvm-svn: 326311	2018-02-28 09:18:47 +00:00
Craig Topper	48d5ed265c	[X86] Don't use EXTRACT_ELEMENT from v1i1 with i8/i32 result type when we need to guarantee zeroes in the upper bits of return. An extract_element where the result type is larger than the scalar element type is semantically an any_extend of from the scalar element type to the result type. If we expect zeroes in the upper bits of the i8/i32 we need to mae sure those zeroes are explicit in the DAG. For these cases the best way to accomplish this is use an insert_subvector to pad zeroes to the upper bits of the v1i1 first. We extend to either v16i1(for i32) or v8i1(for i8). Then bitcast that to a scalar and finish with a zero_extend up to i32 if necessary. We can't extend past v16i1 because that's the largest mask size on KNL. But isel is smarter enough to know that a zext of a bitcast from v16i1 to i16 can use a KMOVW instruction. The insert_subvectors will be dropped during isel because we can determine that the producing instruction already zeroed the upper bits of the k-register. llvm-svn: 326308	2018-02-28 08:14:28 +00:00
Craig Topper	ac799b05d4	[X86] Change the masked FPCLASS implementation to use AND instead of OR to combine the mask results. While the description for the instruction does mention OR, its talking about how the individual classification test results are ORed together. The incoming mask is used as a zeroing write mask. If the bit is 1 the classification is written to the output. The bit is 0 the output is 0. This equivalent to an AND. Here is pseudocode from the intrinsics guide FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0]) ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0 llvm-svn: 326306	2018-02-28 06:19:55 +00:00
Andrew Zhogin	f8e88af11d	[ARM] Cortex-A57 scheduler fix for ARM backend (missed 16-bit, v8.1/v8.2/v8.3, thumb and pseudo instructions) Added missed scheduling info for ARM Cortex A57 (AArch32) to have CompleteModel with this checkCompleteness fix: https://reviews.llvm.org/D43235. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D43808 llvm-svn: 326304	2018-02-28 05:53:18 +00:00
Krzysztof Parzyszek	2373f8fcf3	[Hexagon] Recognize more sign-extensions as inputs to 32x32-bit multiply llvm-svn: 326263	2018-02-27 22:44:41 +00:00
Konstantin Zhuravlyov	40b09e86b9	AMDGPU: Add fast fmaf feature to gfx702 Differential Revision: https://reviews.llvm.org/D43790 llvm-svn: 326252	2018-02-27 21:46:15 +00:00
Sjoerd Meijer	fc0d02cbbf	[ARM] Another f16 litpool fix We were always setting the block alignment to 2 bytes in Thumb mode and 4-bytes in ARM mode (r325754, and r325012), but this could cause reducing the block alignment when it already had been aligned (e.g. in Thumb mode when the block is a CPE that was already 4-byte aligned). Patch by Momchil Velikov, I've only added a test. Differential Revision: https://reviews.llvm.org/D43777 llvm-svn: 326232	2018-02-27 19:26:02 +00:00
Craig Topper	688d1eb919	Revert r326225 "[X86] Move the load folding tables to a separate .inc file" The bots don't seem to like the .inc file. I must be missing some cmake incantation. llvm-svn: 326228	2018-02-27 19:15:40 +00:00
Peter Collingbourne	e8436e8631	ARM: Don't rewrite add reg, $sp, 0 -> mov reg, $sp if the add defines CPSR. Differential Revision: https://reviews.llvm.org/D43807 llvm-svn: 326226	2018-02-27 19:00:59 +00:00
Craig Topper	c0a1291478	[X86] Move the load folding tables to a separate .inc file These tables add 3000 lines to X86InstrInfo.cpp. And if we ever manage to auto generate them they'll be a separate file anyway. Differential Revision: https://reviews.llvm.org/D43806 llvm-svn: 326225	2018-02-27 18:46:11 +00:00
Krzysztof Parzyszek	d70f5a0eb4	[Hexagon] Add patterns for compares of i1 values llvm-svn: 326220	2018-02-27 18:31:46 +00:00
Simon Pilgrim	ba43ec8702	[X86][AVX] combineLoopMAddPattern - support 256-bit cases on AVX1 via SplitBinaryOpsAndApply llvm-svn: 326189	2018-02-27 12:20:37 +00:00
Jonas Paulsson	f268cd0aad	[SystemZ] Make sure SelectCode() is not called on a target opcode. Since getNode() might not always return the requsted opcode, for instance if called with (ISD::AND, -1) arguments, there should be a check so that SelectCode() is only called when appropriate. Review: Ulrich Weigand llvm-svn: 326178	2018-02-27 07:53:23 +00:00
Craig Topper	264707bae4	[X86] Simplify if condition. NFC SSE2 implies SSE1 and we already covered f32 in the SSE1 check so we don't need to check f32 in the SSE2 check. llvm-svn: 326170	2018-02-27 06:00:38 +00:00
Craig Topper	fcaa0323ec	[X86] Replace an impossible if condition with an assert. llvm-svn: 326167	2018-02-27 03:50:00 +00:00
Aditya Nandakumar	599990530e	[GISel]: Don't assert when constraining RegisterOperands which are uses. Currently we assert that only non target specific opcodes can have missing RegisterClass constraints in the MCDesc. The backend can have instructions with register operands but don't have RegisterClass constraints (say using unknown_class) in which case the instruction defining the register will constrain it. Change the assert to only fire if a def has no regclass. https://reviews.llvm.org/D43409 llvm-svn: 326142	2018-02-26 22:56:21 +00:00
Simon Pilgrim	9929f90740	[X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280) Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark. Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch. Differential Revision: https://reviews.llvm.org/D43733 llvm-svn: 326133	2018-02-26 22:10:17 +00:00
Craig Topper	e5d39e42b9	[X86] Add constant folding to combineMOVMSK. There's still some shortcoming in our ability to combine binops of constants with different sizes separated by an extend. I'll try to look at that next. llvm-svn: 326128	2018-02-26 21:17:33 +00:00
Craig Topper	5e0ceb8865	[X86] Add a custom legalization for (i16 (bitcast v16i1)) and (i32 (bitcast v32i1)) without AVX512 to prevent scalarization Summary: We have an early DAG combine to turn these patterns into MOVMSK, but that combine doesn't work if the vXi1 type has more elements than the widest legal vXi8 type. Type legalization will eventually split it down to v16i1 or v32i1 and then the bitcast gets legalized to a truncstore and a scalar load. The truncstore will get lowered to a series of extracts and bit math. This patch adds a custom legalization to use a sign extend and MOVMSK instead. This prevents the eventual scalarization. Reviewers: spatel, RKSimon, zvi Reviewed By: RKSimon Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D43593 llvm-svn: 326119	2018-02-26 20:32:27 +00:00
Simon Pilgrim	db0ed7d724	[X86][AVX] createPSADBW - support 256-bit cases on AVX1 via SplitBinaryOpsAndApply llvm-svn: 326104	2018-02-26 18:17:25 +00:00
Matt Arsenault	2a26a286db	AMDGPU/GlobalISel: Make f64 constants legal llvm-svn: 326101	2018-02-26 17:20:43 +00:00
Tim Renouf	832f90fa0c	[AMDGPU] Scratch setup fix on AMDPAL gfx9+ merge shader Summary: With OS type AMDPAL, the scratch descriptor is hardwired to be loaded from offset 0 of the global information table, whose low pointer is passed in s0. For a merge shader on gfx9+, it needs to be s8 instead, as the hardware reserves s0-s7. Reviewers: kzhuravl Subscribers: arsenm, nhaehnle, dstuttard, llvm-commits, t-tye, yaxunl, wdng, kzhuravl Differential Revision: https://reviews.llvm.org/D42203 llvm-svn: 326088	2018-02-26 14:46:43 +00:00
Benjamin Kramer	b84e158df7	[WebAssembly] Relax constexpr for old standard libraries. This will still be constexpr when the standard library supports it, but doesn't force constexpr. Old libraries will get a global constructor, which is not too bad. llvm-svn: 326080	2018-02-26 11:07:25 +00:00
Jonas Paulsson	b1e81479e9	[XCore] Return true in enableMultipleCopyHints(). Enable multiple COPY hints to eliminate more COPYs during register allocation. Note that this is something all targets should do, see https://reviews.llvm.org/D38128. Review: Robert Lytton llvm-svn: 326069	2018-02-26 08:03:32 +00:00
Craig Topper	5c980eba47	[X86] Don't use getZExtValue when we have no idea how large the input elements are. llvm-svn: 326066	2018-02-26 04:43:24 +00:00
Craig Topper	2286058f46	[X86] Use SelectionDAG::SplitVectorOperand to simplify some code. NFC llvm-svn: 326065	2018-02-26 02:16:34 +00:00
Craig Topper	2bf8e3e0e1	[X86] Simplify the ReplaceNodeResults code for X86ISD::AVG. This code seemed to try to widen to 128, 256, or 512 bit vectors, but we only create X86ISD::AVG with a power of 2 number of elements. This means the only nodes that need to be legalized are less than 128-bits and need to be widened up to 128 bits. llvm-svn: 326064	2018-02-26 02:16:33 +00:00
Craig Topper	79d189f597	[X86] Remove VT.isSimple() check from detectAVGPattern. Which types are considered 'simple' is a function of the requirements of all targets that LLVM supports. That shouldn't directly affect what types we are able to handle. The remainder of this code checks that the number of elements is a power of 2 and takes care of splitting down to a legal size. llvm-svn: 326063	2018-02-26 02:16:31 +00:00
Craig Topper	6694df14e6	[X86] Use SDNode instead of SDPatternOperator. NFC llvm-svn: 326048	2018-02-25 06:21:04 +00:00
Craig Topper	81c0eaf4c8	[X86] Allow int_x86_sse2_cvtps2dq and int_x86_avx_cvt_ps2dq_256 to select EVEX encoded instructions. llvm-svn: 326041	2018-02-24 18:58:07 +00:00
Simon Pilgrim	a4fb569483	[X86][SSE] combineSubToSubus - support v8i64 handling from SSSE3 Our UMIN/UMAX, vector truncation and shuffle combining is good enough to efficiently handle v8i64 with the number of leading zeros that are necessary for PSUBUS. llvm-svn: 326034	2018-02-24 14:06:39 +00:00
Simon Pilgrim	8ad91261e8	[X86][SSE] combineSubToSubus - support v8i32 handling from SSSE3 (not SSE41) Now that UMIN etc are Legal/Custom for SSE2+, we can efficiently match SUBUS v8i32 cases from SSSE3 which can perform efficient truncation with PSHUFB. llvm-svn: 326033	2018-02-24 13:39:13 +00:00
Simon Pilgrim	744f008a75	[X86][SSE] combineSubToSubus - begun generalizing to work with any type sizes with SplitBinaryOpsAndApply llvm-svn: 326030	2018-02-24 12:44:12 +00:00
Simon Pilgrim	51ce2ed367	Fix spelling in comment. NFCI. llvm-svn: 326029	2018-02-24 12:27:02 +00:00
Jonas Paulsson	8ff0773b13	[Sparc] Return true in enableMultipleCopyHints(). Enable multiple COPY hints to eliminate more COPYs during register allocation. Note that this is something all targets should do, see https://reviews.llvm.org/D38128. Review: James Y Knight llvm-svn: 326028	2018-02-24 08:24:31 +00:00
Craig Topper	161c805da4	[X86] Use SelectionDAG::getNot instead of implementing manually. NFC llvm-svn: 326020	2018-02-24 03:15:54 +00:00
Stanislav Mekhanoshin	fa48c496e2	[AMDGPU] Shrinking V_SUBBREV_U32 V_SUBBREV_U32 is a commute opcode for V_SUBB_U32. However, when we try to commute V_SUBB_U32 in order to shrink it we do not then process V_SUBBREV_U32 and it stay VOP3. This is fixed. Differential Revision: https://reviews.llvm.org/D43699 llvm-svn: 326011	2018-02-24 01:32:32 +00:00
Heejin Ahn	9386bde11b	[WebAssembly] Add exception handling option and feature Summary: Add a llc command line option and WebAssembly architecture feature for exception handling. Reviewers: dschuff Subscribers: jfb, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D43683 llvm-svn: 326004	2018-02-24 00:40:50 +00:00
Craig Topper	7bcac492d4	[X86] Remove checks for '(scalar_to_vector (i8 (trunc GR32:)))' from scalar masked move patterns. This portion can be matched by other patterns. We don't need it to make the larger pattern valid. It's sufficient to have a v1i1 mask input without caring where it came from. llvm-svn: 325999	2018-02-24 00:15:05 +00:00
Yonghong Song	60fed1fef0	bpf: New optimization pass for eliminating unnecessary i32 promotions This pass performs peephole optimizations to cleanup ugly code sequences at MachineInstruction layer. Currently, the only optimization in this pass is to eliminate type promotion sequences for zero extending 32-bit subregisters to 64-bit registers. If the compiler could prove the zero extended source come from 32-bit subregistere then it is safe to erase those promotion sequece, because the upper half of the underlying 64-bit registers were zeroed implicitly already. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325991	2018-02-23 23:49:32 +00:00
Yonghong Song	ae961bb061	bpf: New decoder namespace for 32-bit subregister load/store When -mattr=+alu32 passed to the disassembler, use decoder namespace for 32-bit subregister. This is to disassemble load and store instructions in preferred B format as described in previous commit: w = (u8 ) (r + off) // BPF_LDX \| BPF_B w = (u16 )(r + off) // BPF_LDX \| BPF_H w = (u32 )(r + off) // BPF_LDX \| BPF_W (u8 ) (r + off) = w // BPF_STX \| BPF_B (u16 )(r + off) = w // BPF_STX \| BPF_H (u32 )(r + off) = w // BPF_STX \| BPF_W NOTE: all other instructions should still use the default decoder namespace. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325990	2018-02-23 23:49:31 +00:00
Yonghong Song	ca31c3bb3f	bpf: Enable 32-bit subregister support for -mattr=+alu32 After all those preparation patches, now we could enable 32-bit subregister support once -mattr=+alu32 specified. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325989	2018-02-23 23:49:30 +00:00
Yonghong Song	fcd1e0f625	bpf: Support 32-bit subregister in various InstrInfo hooks This patch support 32-bit subregister in three InstrInfo hooks, i.e. copyPhysReg, loadRegFromStackSlot and storeRegToStackSlot, Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325988	2018-02-23 23:49:29 +00:00
Yonghong Song	b1a52bd756	bpf: New instruction patterns for 32-bit subregister load and store The instruction mapping between eBPF/arm64/x86_64 are: eBPF arm64 x86_64 LD1 BPF_LDX \| BPF_B ldrb movzbl LD2 BPF_LDX \| BPF_H ldrh movzwl LD4 BPF_LDX \| BPF_W ldr movl movzbl/movzwl/movl on x86_64 accept 32-bit sub-register, for example %eax, the same for ldrb/ldrh on arm64 which accept 32-bit "w" register. And actually these instructions only accept sub-registers. There is no point to have LD1/2/4 (unsigned) for 64-bit register, because on these arches, upper 32-bits are guaranteed to be zeroed by hardware or VM, so load into the smallest available register class is the best choice for maintaining type information. For eBPF we should adopt the same philosophy, to change current format (A): r = (u8 ) (r + off) // BPF_LDX \| BPF_B r = (u16 )(r + off) // BPF_LDX \| BPF_H r = (u32 )(r + off) // BPF_LDX \| BPF_W (u8 ) (r + off) = r // BPF_STX \| BPF_B (u16 )(r + off) = r // BPF_STX \| BPF_H (u32 )(r + off) = r // BPF_STX \| BPF_W into B: w = (u8 ) (r + off) // BPF_LDX \| BPF_B w = (u16 )(r + off) // BPF_LDX \| BPF_H w = (u32 )(r + off) // BPF_LDX \| BPF_W (u8 ) (r + off) = w // BPF_STX \| BPF_B (u16 )(r + off) = w // BPF_STX \| BPF_H (u32 )(r + off) = w // BPF_STX \| BPF_W There is no change on encoding nor how should they be interpreted, everything is as it is, load the specified length, write into low bits of the register then zeroing all remaining high bits. The only change is their associated register class and how compiler view them. Format A still need to be kept, because eBPF LLVM backend doesn't support sub-registers at default, but once 32-bit subregister is enabled, it should use format B. This patch implemented this together with all those necessary extended load and truncated store patterns. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325987	2018-02-23 23:49:28 +00:00
Yonghong Song	63cf273f55	bpf: Support i32 in getScalarShiftAmountTy method getScalarShiftAmount method should be implemented for eBPF backend to make sure shift amount could still get correct type once 32-bit subregisters support are enabled. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325986	2018-02-23 23:49:26 +00:00
Yonghong Song	59fc805c7e	bpf: Support condition comparison on i32 We need to support condition comparison on i32. All these comparisons are supposed to be combined into BPF_J* instructions which only support i64. For ISD::BR_CC we need to promote it to i64 first, then do custom lowering. For ISD::SET_CC, just expand to SELECT_CC like what's been done for i64. For ISD::SELECT_CC, we also want to do custom lower for i32. However, after 32-bit subregister support enabled, it is possible the comparison operands are i32 while the selected value are i64, or the comparison operands are i64 while the selected value are i32. We need to define extra instruction pattern and support them in custom instruction inserter. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325985	2018-02-23 23:49:25 +00:00
Yonghong Song	219156cff0	bpf: Handle i32 for ALU operations without ISA support There is no eBPF ISA support for BSWAP, ROTR, ROTL, SREM, SDIVREM, MULHU, ADDC, ADDE etc on i32. They could be emulated by other basic BPF_ALU operations, we'd set their lowering action the same as i64. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325984	2018-02-23 23:49:24 +00:00
Yonghong Song	07a7a41753	bpf: New calling convention for 32-bit subregisters This patch add new calling conventions to allow GPR32RegClass as valid register class for arguments and return types. New calling convention will only be choosen when -mattr=+alu32 specified. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325983	2018-02-23 23:49:23 +00:00
Yonghong Song	42389377d8	bpf: New target attribute "alu32" for 32-bit subregister support This new attribute aims to control the enablement of 32-bit subregister support on eBPF backend. Name the interface as "alu32" is because we in particular want to enable the generation of BPF_ALU32 instructions by enable subregister support. This attribute could be used in the following format with llc: llc -mtriple=bpf -mattr=[+\|-]alu32 It is disabled at default. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325982	2018-02-23 23:49:22 +00:00
Yonghong Song	0252f35362	bpf: Define instruction patterns for extensions and truncations between i32 to i64 For transformations between i32 and i64, if it is explicit signed extension: - first cast the operand to i64 - then use SLL + SRA to finish the extension. if it is explicit zero extension: - first cast the operand to i64 - then use SLL + SRL to finish the extension. if it is explicit any extension: - just refer to 64-bit register. if it is explicit truncation: - just refer to 32-bit subregister. NOTE: Some of the zero extension sequences might be unnecessary, they will be removed by an peephole pass on MachineInstruction layer. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325981	2018-02-23 23:49:21 +00:00
Yonghong Song	3a564a8f6e	bpf: Tighten the immediate predication for 32-bit alu instructions These 32-bit ALU insn patterns which takes immediate as one operand were initially added to enable AsmParser support, and the AsmMatcher uses "ins" and "outs" fields to deduct the operand constraint. However, the instruction selector doesn't work the same as AsmMatcher. The selector will use the "pattern" field for which we are not setting the predication for immediate operands correctly. Without this patch, i32 would eventually means all i32 operands are valid, both imm and gpr, while these patterns should allow imm only. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325980	2018-02-23 23:49:19 +00:00
Yonghong Song	ec84e2f1b0	bpf: Use markSuperRegs to mark reserved registers markSuperRegs is the canonical helper function used to mark reserved registers. It could mark any overlapping sub-registers automatically. Reviewed-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 325979	2018-02-23 23:49:18 +00:00
Nemanja Ivanovic	bcc82c9a78	[PowerPC] Disable shrink-wrapping when getting PC address through the LR The instruction sequence used to get the address of the PC into a GPR requires that we clobber the link register. Doing so without having first saved it in the prologue leaves the function unable to return. Currently, this sequence is emitted into the entry block. To ensure the prologue is inserted before this sequence, disable shrink-wrapping. This fixes PR33547. Differential Revision: https://reviews.llvm.org/D43677 llvm-svn: 325972	2018-02-23 23:08:34 +00:00
Eric Christopher	a70ec1308a	Sink the verification code around the assert where it's handled and wrap in NDEBUG. This has the advantage of making release only builds more warning free and there's no need to make this routine a class function if it isn't using class members anyhow. llvm-svn: 325967	2018-02-23 22:32:05 +00:00
Sriraman Tallam	609f8c013c	Intrinsics calls should avoid the PLT when "RtLibUseGOT" metadata is present. Differential Revision: https://reviews.llvm.org/D42216 llvm-svn: 325962	2018-02-23 21:32:06 +00:00
Craig Topper	16b20245ba	[X86] Add assembler/disassembler support for blendm with zero masking and broacast. Fixes PR31617 llvm-svn: 325957	2018-02-23 20:48:44 +00:00
Stefan Pintilie	626b651016	[Power9] Add missing instructions to the Power 9 scheduler This is the first in a series of patches that will define more instructions using InstRW so that we can move away from ItinRW and ultimately have a complete Power 9 scheduler. Differential Revision: https://reviews.llvm.org/D43635 llvm-svn: 325956	2018-02-23 20:37:10 +00:00
Krzysztof Parzyszek	96690ceceb	[Hexagon] Recognize non-immediate constants in HexagonConstPropagation llvm-svn: 325954	2018-02-23 20:33:26 +00:00
Simon Pilgrim	69b8fa8391	Fixed unused variable warning. NFCI. llvm-svn: 325950	2018-02-23 20:16:18 +00:00
Craig Topper	61d6ddbf0a	[X86] Add DAG combine to remove (and X, 1) from in front of a v1i1 scalar to vector. These can be created by type legalization promoting the inputs to select to match scalar boolean contents. We were trying to pattern match them away during isel, but its better to just remove them from the DAG. I've cleaned up some patterns to not check for this 'and' anymore. But I suspect this has also opened up opportunities for pattern removal. llvm-svn: 325949	2018-02-23 20:13:42 +00:00
Benjamin Kramer	ae87f86ec4	[WebAssembly] Fix macro metaprogram to not duplicate code as much. No functionality change intended. llvm-svn: 325947	2018-02-23 20:13:03 +00:00
Simon Pilgrim	425965be0f	[X86][SSE] Generalize x > C-1 ? x+-C : 0 --> subus x, C combine for non-uniform constants llvm-svn: 325944	2018-02-23 19:58:44 +00:00
Evandro Menezes	1afffac05b	[PATCH] [AArch64] Add new target feature to fuse conditional select This feature enables the fusion of the comparison and the conditional select instructions together. Differential revision: https://reviews.llvm.org/D42392 llvm-svn: 325939	2018-02-23 19:27:43 +00:00
Geoff Berry	d6ba3dbbbd	Fix compiler warning introduced in r325931. NFC. llvm-svn: 325938	2018-02-23 19:11:33 +00:00
Craig Topper	11704dcc72	[X86] Custom split v32i16/v64i8 bitcasts when AVX512F is available, but BWI is not. The test changes you can see are related to the changes in ReplaceNodeResults. Though shuffle-vs-trunc-512.ll does have a test that exercises the code in LowerBITCAST. Looks like the test output didn't change because DAG combining is able to clean up the resulting type legalization. Adding the custom hook just makes type legalization work less hard. Differential Revision: https://reviews.llvm.org/D43447 llvm-svn: 325933	2018-02-23 18:43:36 +00:00
Geoff Berry	f8bf2ec0a8	[MachineOperand][Target] MachineOperand::isRenamable semantics changes Summary: Add a target option AllowRegisterRenaming that is used to opt in to post-register-allocation renaming of registers. This is set to 0 by default, which causes the hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq fields of all opcodes to be set to 1, causing MachineOperand::isRenamable to always return false. Set the AllowRegisterRenaming flag to 1 for all in-tree targets that have lit tests that were effected by enabling COPY forwarding in MachineCopyPropagation (AArch64, AMDGPU, ARM, Hexagon, Mips, PowerPC, RISCV, Sparc, SystemZ and X86). Add some more comments describing the semantics of the MachineOperand::isRenamable function and how it is set and maintained. Change isRenamable to check the operand's opcode hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq bit directly instead of relying on it being consistently reflected in the IsRenamable bit setting. Clear the IsRenamable bit when changing an operand's register value. Remove target code that was clearing the IsRenamable bit when changing registers/opcodes now that this is done conservatively by default. Change setting of hasExtraSrcRegAllocReq in AMDGPU target to be done in one place covering all opcodes that have constant pipe read limit restrictions. Reviewers: qcolombet, MatzeB Subscribers: aemerson, arsenm, jyknight, mcrosier, sdardis, nhaehnle, javed.absar, tpr, arichardson, kristof.beyls, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, escha, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D43042 llvm-svn: 325931	2018-02-23 18:25:08 +00:00
Stefan Pintilie	15e6b10ee0	[PowerPC] Code cleanup. Remove instructions that were withdrawn from Power 9. The following set of instructions was originally planned to be added for Power 9 and so code was added to support them. However, a decision was made later on to withdraw support for these instructions in the hardware. xscmpnedp xvcmpnesp xvcmpnedp This patch removes support for the instructions that were not added. Differential Revision: https://reviews.llvm.org/D43641 llvm-svn: 325918	2018-02-23 15:55:16 +00:00
Petar Jovanovic	a7bd36e63e	[mips] finish removal of unused fields in MipsInstructionSelector r325916 missed to remove calls in constructor. llvm-svn: 325917	2018-02-23 15:47:05 +00:00
Petar Jovanovic	f49c5ce3a6	[mips] remove unused fields in MipsInstructionSelector Unused fields cause buildbreak if -Werror,-Wunused-private-field is passed. llvm-svn: 325916	2018-02-23 15:34:02 +00:00
Hans Wennborg	89c35fc44d	Support for the mno-stack-arg-probe flag Adds support for this flag. There is also another piece for clang (separate review). More info: https://bugs.llvm.org/show_bug.cgi?id=36221 By Ruslan Nikolaev! Differential Revision: https://reviews.llvm.org/D43107 llvm-svn: 325900	2018-02-23 13:46:25 +00:00
Amaury Sechet	893a6b89ff	[DAGCOmbine] Ensure that (brcond (setcc ...)) is handled in a canonical manner. Summary: There are transformation that change setcc into other constructs, and transform that try to reconstruct a setcc from the brcond condition. Depending on what order these transform are done, the end result differs. Most of the time, it is preferable to get a setcc as a brcond argument (and this is why brcond try to recreate the setcc in the first place) so we ensure this is done every time by also doing it at the setcc level when the only user is a brcond. Reviewers: spatel, hfinkel, niravd, craig.topper Subscribers: nhaehnle, llvm-commits Differential Revision: https://reviews.llvm.org/D41235 llvm-svn: 325892	2018-02-23 11:50:42 +00:00
Petar Jovanovic	fac93e28f0	[MIPS GlobalISel] Adding GlobalISel Add GlobalISel infrastructure up to the point where we can select a ret void. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D43583 llvm-svn: 325888	2018-02-23 11:06:40 +00:00
Nicolai Haehnle	6cf306deca	AMDGPU: Track physreg uses in SILoadStoreOptimizer Summary: This handles def-after-use of physregs, and allows us to merge loads and stores even across some physreg defs (typically M0 defs). Change-Id: I076484b2bda27c2cf46013c845a0380c5b89b67b Reviewers: arsenm, mareko, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D42647 llvm-svn: 325882	2018-02-23 10:45:56 +00:00
Jonas Paulsson	07d6aea61a	[Mips] Return true in enableMultipleCopyHints(). Enable multiple COPY hints to eliminate more COPYs during register allocation. Note that this is something all targets should do, see https://reviews.llvm.org/D38128. Review: Simon Dardis llvm-svn: 325870	2018-02-23 08:30:15 +00:00
Sam Clegg	6c899ba6de	[WebAssembly] Add first claass symbol table to wasm objects This is combination of two patches by Nicholas Wilson: 1. https://reviews.llvm.org/D41954 2. https://reviews.llvm.org/D42495 Along with a few local modifications: - One change I made was to add the UNDEFINED bit to the binary format to avoid the extra byte used when writing data symbols. Although this bit is redundant for other symbols types (i.e. undefined can be implied if a function or global is a wasm import) - I prefer to be explicit and consistent and not have derived flags. - Some field renaming. - Some reverting of unrelated minor changes. - No test output differences. Differential Revision: https://reviews.llvm.org/D43147 llvm-svn: 325860	2018-02-23 05:08:34 +00:00
Richard Smith	ade53736b0	Revert r325128 ("[X86] Reduce Store Forward Block issues in HW"). This is causing miscompiles in some situations. See the llvm-commits thread for the commit for details. llvm-svn: 325852	2018-02-23 01:43:46 +00:00
Craig Topper	0dcc88a500	[X86] Turn setne X, signedmax into setgt signedmax, X in LowerVSETCC to avoid an invert We won't be able to fold the constant pool load, but its still better than materialing ones and xoring for the invert if we used PCMPEQ. This will fix another regression from D42948. llvm-svn: 325845	2018-02-23 00:21:39 +00:00
Evandro Menezes	5c986b010b	[AArch64] Refactor macro fusion (NFC) Move checks for each fusion case into separate functions for better legibility and maintainability. Differential revision: https://reviews.llvm.org/D43649 llvm-svn: 325844	2018-02-23 00:14:39 +00:00
Rafael Espindola	ba02f3f242	Fix grammar. NFC. Thank to Eric Christopher for noticing. llvm-svn: 325842	2018-02-22 23:59:46 +00:00
Craig Topper	d2fab30827	[X86] Turn setne X, signedmin into setgt X, signedmin in LowerVSETCC to avoid an invert This will fix one of the regressions from D42948. Differential Revision: https://reviews.llvm.org/D43531 llvm-svn: 325840	2018-02-22 23:46:28 +00:00
Benjamin Kramer	a01e97d748	Fix the build of the wasm backend. toString conflicts with llvm::toString here. Yay for overly generic function names. llvm-svn: 325833	2018-02-22 22:29:27 +00:00
Craig Topper	1aed540ea2	[X86] Make the subus special case in LowerVSETCC self contained Previously this code overrode the flags and opcode used by the later code in LowerVSETCC. This makes the code difficult to read and follow. This patch moves all the SUBUS code into its own function and makes it responsible for creating its own SDNodes on success. Differential Revision: https://reviews.llvm.org/D43530 llvm-svn: 325827	2018-02-22 20:24:18 +00:00
Nicolai Haehnle	40b140fef1	AMDGPU: Stop using .NAME in .td files Summary: .NAME is a bit of an odd duck, in that we should really treat it like a template argument, but we currently don't, and so when and where NAME is initialized and how is pretty inconsistent. Best to just avoid using it as a field of already instantiated records, and use cast to string instead. Change-Id: I5a0c202401cede3d5c3827ab9c7858ea48b29108 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D43551 llvm-svn: 325794	2018-02-22 15:25:11 +00:00
Shiva Chen	7c17242b92	[RISCV] Implement c.lui immediate operand constraint Implement c.lui immediate constraint to [1, 31] and [0xfffe0, 0xfffff]. The RISC-V ISA describes the constraint as [1, 63], with that value being loaded in to bits 17-12 of the destination register and sign extended from bit 17. Therefore, this 6-bit immediate can represent values in the ranges [1, 31] and [0xfffe0, 0xfffff]. Differential Revision: https://reviews.llvm.org/D42834 llvm-svn: 325792	2018-02-22 15:02:28 +00:00
Stefan Maksimovic	ed797a3049	[mips] Generate memory dependencies for byVal arguments There were no memory dependencies made between stores generated when lowering formal arguments and loads generated when call lowering byVal arguments which made the Post-RA scheduler place a load before a matching store. Make the fixed object stored to mutable so that the load instructions can have their memory dependencies added Set the frame object as isAliased which clears the underlying objects vector in ScheduleDAGInstrs::buildSchedGraph(). This results in addition of all stores as dependenies for loads. This problem appeared when passing a byVal parameter coupled with a fastcc function call. Differential Revision: https://reviews.llvm.org/D37515 llvm-svn: 325782	2018-02-22 13:40:42 +00:00
Alex Bradbury	8d8d0a733f	[RISCV][NFC] Make logic in RISCVMCCodeEmitter::getImmOpValue more defensive As pointed out by @sabuasal in a comment on D23568, the logic in RISCVMCCodeEmitter::getImmOpValue could be more defensive. Although with the current instruction definitions it is always the case that `VK_RISCV_LO` is always used with either an I- or S-format instruction, this may not always be the case in the future. Add a check to ensure we will get an assertion in debug builds if that changes. llvm-svn: 325775	2018-02-22 13:24:25 +00:00
Sjoerd Meijer	d31a8c0595	Recommit: [ARM] f16 constant pool fix This recommits r325754; the modified and failing test case actually didn't need any modifications. llvm-svn: 325765	2018-02-22 10:43:57 +00:00
David Green	01e0f25a9f	[ARM] Fix issue with large xor constants. Fixup to rL325573 for large xor constants. Thanks to Eli Friedman for the catch. Differential revision: https://reviews.llvm.org/D43549 llvm-svn: 325761	2018-02-22 09:38:57 +00:00
Sjoerd Meijer	9a25247f80	Revert r325754 and r325755 (f16 literal pool) because buildbots were unhappy. llvm-svn: 325756	2018-02-22 08:41:55 +00:00
Sjoerd Meijer	7d5909eb0f	[ARM] f16 constant pool fix This is a follow up of r325012, that allowed half types in constant pools. Proper alignment was enforced when a big basic block was split up, but not when a CPE was placed before/after a block; the successor block had the wrong alignment. Differential Revision: https://reviews.llvm.org/D43580 llvm-svn: 325754	2018-02-22 08:16:05 +00:00
Hiroshi Inoue	7f9f92f8b6	[NFC] fix trivial typos in comments "a a" -> "a" llvm-svn: 325752	2018-02-22 07:48:29 +00:00
Nemanja Ivanovic	e54a9ee8ac	[PowerPC] Do not produce invalid CTR loop with an FRem An FRem instruction inside a loop should prevent the loop from being converted into a CTR loop since this is not an operation that is legal on any PPC subtarget. This will always be a call to a library function which means the loop will be invalid if this instruction is in the body. Fixes PR36292. llvm-svn: 325739	2018-02-22 03:02:41 +00:00
Simon Pilgrim	55b7e01116	[X86][MMX] Generlize MMX_MOVD64rr combines to accept v4i16/v8i8 build vectors as well as v2i32 Also handle both cases where the lower 32-bits of the MMX is undef or zero extended. llvm-svn: 325736	2018-02-21 23:07:30 +00:00
Yonghong Song	9fdd139b41	bpf: disable DwarfUsesRelocationsAcrossSections The pahole does not work with BPF backend properly: -bash-4.2$ cat test.c struct test_t { int a; int b; }; int test(struct test_t s) { return s->a; } -bash-4.2$ clang -g -O2 -target bpf -c test.c -bash-4.2$ pahole test.o struct clang version 7.0.0 (trunk 325446) (llvm/trunk 325464) { clang version 7.0.0 (trunk 325446) (llvm/trunk 325464) clang version 7.0.0 (trunk 325446) (llvm/trunk 325464); / 0 4 / clang version 7.0.0 (trunk 325446) (llvm/trunk 325464) clang version 7.0.0 (trunk 325446) (llvm/trunk 325464); / 4 4 / / size: 8, cachelines: 1, members: 2 / / last cacheline: 8 bytes / }; -bash-4.2$ The reason is that BPF backend is not yet implemented in elfutils backend https://github.com/threatstack/elfutils/tree/master/backends and pahole depends on elfutils for dwarf parsing and resolving relocation. More specifically, the unsupported relocation in .debug_info for type/member name against symbol table caused the incorrect result above. The following is the raw .rel.debug_info for the above example, Hex dump of section '.rel.debug_info': 0x00000000 06000000 00000000 0a000000 0b000000 ................ 0x00000010 0c000000 00000000 0a000000 01000000 ................ 0x00000020 12000000 00000000 0a000000 02000000 ................ 0x00000030 16000000 00000000 0a000000 0e000000 ................ 0x00000040 1a000000 00000000 0a000000 03000000 ................ ----------------- -------- -------- reloc location type symtab index Hex dump of section '.debug_info': 0x00000000 7b000000 04000000 00000801 00000000 {............... 0x00000010 0c000000 00000000 00000000 00000000 ................ 0x00000020 00000000 00001000 00000200 00000000 ................ Based on "type", the proper value will be extracted from symbol table and filled in .debug_info so later on .debug_info can be properly resolved against debug strings. There are two ways to fix this problem. One is to fix elfutils by adding BPF support which is desirable. This could take a long time and won't work with already deployed pahole. For a short term workaround, we can disable dwarf cross-section relation which specifically avoids debug_info and symbol table cross relocation. This should help any dwarf-related tool which has not implement BPF specific relocations yet. Now .rel.debug_info does not have any relocation for symbol table and .debug_info itself contains necessary relocation information by itself. Hex dump of section '.debug_info': 0x00000000 7b000000 04000000 00000801 00000000 {............... 0x00000010 0c003700 00000000 00003e00 00000000 ..7.......>..... 0x00000020 00000000 00001000 00000200 00000000 ................ location 0xc has 0, 0x12 has 0x37, 0x1a has 0x3e in place which will be used in relocation resolution. Here, the values of 0, 0x37 and 0x3e are offset in .debug_str section. Please note the difference between two above .debug_info dumps. With the fix, pahole works properly with BPF backend: -bash-4.2$ clang -O2 -g -target bpf -c test.c -bash-4.2$ pahole test.o struct test_t { int a; / 0 4 / int b; / 4 4 / / size: 8, cachelines: 1, members: 2 / / last cacheline: 8 bytes */ }; Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 325735	2018-02-21 22:59:14 +00:00
Tobias Edler von Koch	ba7a1f08da	[Hexagon] Add TargetRegisterInfo::getPointerRegClass() override llvm-svn: 325731	2018-02-21 22:27:07 +00:00
Jonas Paulsson	77cdf3881c	[Hexagon] Return true in enableMultipleCopyHints(). Enable multiple COPY hints to eliminate more COPYs during register allocation. Note that this is something all targets should do, see https://reviews.llvm.org/D38128. Review: Krzysztof Parzyszek llvm-svn: 325697	2018-02-21 16:37:45 +00:00
Simon Pilgrim	82d33b7c44	[X86] LowerBITCAST - pull out repeated calls to getOperand(0). NFCI. llvm-svn: 325695	2018-02-21 16:35:40 +00:00
Jonas Devlieghere	e0af7c390d	[Sparc] Include __tls_get_addr in symbol table for TLS calls to it Global Dynamic and Local Dynamic call relocations only implicitly reference __tls_get_addr; there is no connection in the ELF file between the relocations and the symbol other than the specification for the relocations' semantics. However, it still needs to be in the symbol table despite the lack of explicit references to the symbol table entry, since it needs to be bound at link time for these relocations, otherwise any objects will fail to link. For details, see https://sourceware.org/bugzilla/show_bug.cgi?id=22832. Path by: James Clarke (jrtc27) Differential revision: https://reviews.llvm.org/D43271 llvm-svn: 325688	2018-02-21 15:25:26 +00:00
Nicolai Haehnle	770397f4cd	AMDGPU: Do not combine loads/store across physreg defs Summary: Since this pass operates on machine SSA form, this should only really affect M0 in practice. Fixes various piglit variable-indexing/vs-varying-array-mat4-index-* Change-Id: Ib2a1dc3a8d7b08225a8da49a86f533faa0986aa8 Fixes: r317751 ("AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4") Reviewers: arsenm, mareko, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D40343 llvm-svn: 325677	2018-02-21 13:31:35 +00:00
Dmitry Preobrazhensky	d6e1a9404d	[AMDGPU][MC] Added lds support for MUBUF instructions See bug 28234: https://bugs.llvm.org/show_bug.cgi?id=28234 Differential Revision: https://reviews.llvm.org/D43472 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 325676	2018-02-21 13:13:48 +00:00
Craig Topper	d710adac2d	[X86] Disable CLWB for Cannon Lake Cannon Lake does not support CLWB, therefore it does not include all features listed under SKX anymore. Instead, enumerate all SKX features with the exception of CLWB. Patch by Gabor Buella Differential Revision: https://reviews.llvm.org/D43380 llvm-svn: 325654	2018-02-21 00:15:48 +00:00
Simon Dardis	7bc8ad5849	[mips] Spectre variant two mitigation for MIPSR2 This patch provides mitigation for CVE-2017-5715, Spectre variant two, which affects the P5600 and P6600. It implements the LLVM part of -mindirect-jump=hazard. It is _not_ enabled by default for the P5600. The migitation strategy suggested by MIPS for these processors is to use hazard barrier instructions. 'jalr.hb' and 'jr.hb' are hazard barrier variants of the 'jalr' and 'jr' instructions respectively. These instructions impede the execution of instruction stream until architecturally defined hazards (changes to the instruction stream, privileged registers which may affect execution) are cleared. These instructions in MIPS' designs are not speculated past. These instructions are used with the attribute +use-indirect-jump-hazard when branching indirectly and for indirect function calls. These instructions are defined by the MIPS32R2 ISA, so this mitigation method is not compatible with processors which implement an earlier revision of the MIPS ISA. Performance benchmarking of this option with -fpic and lld using -z hazardplt shows a difference of overall 10%~ time increase for the LLVM testsuite. Certain benchmarks such as methcall show a substantially larger increase in time due to their nature. Reviewers: atanasyan, zoran.jovanovic Differential Revision: https://reviews.llvm.org/D43486 llvm-svn: 325653	2018-02-21 00:06:53 +00:00
Konstantin Zhuravlyov	5c1237a1fd	Revert "[AMDGPU] Increased vector length for global/constant loads." https://reviews.llvm.org/rL325518 It breaks following OpenCL conformance tests: - Basic - parameter_types - Basic - vload_private llvm-svn: 325643	2018-02-20 23:30:21 +00:00
Evandro Menezes	72f3983633	[AArch64] Refactor instructions using SIMD immediates Get rid of icky goto loops and make the code easier to maintain. Otherwise, NFC. Restore r324903 and fix PR36369. Differentail revision: https://reviews.llvm.org/D43364 llvm-svn: 325621	2018-02-20 20:31:45 +00:00
Sjoerd Meijer	4d5c40492a	[ARM] Lower BR_CC for f16 This case wasn't handled yet. Differential Revision: https://reviews.llvm.org/D43508 llvm-svn: 325616	2018-02-20 19:28:05 +00:00
Krzysztof Parzyszek	f9f2005f94	[Hexagon] Handle *Low8 register classes in early if-conversion llvm-svn: 325606	2018-02-20 18:19:17 +00:00
Craig Topper	df0c22fcd3	[X86] Correct SHRUNKBLEND creation to work correctly when there are multiple uses of the condition. SimplifyDemandedBits forces the demanded mask to all 1s if the node has multiple uses, unless the AssumeSingleUse flag is set. So previously we were only really likely to simplify something if the condition had a single use. And on the off chance we did simplify with multiple uses the demanded mask being used was all ones so there was no reason to create a shrunkblend. This patch now checks that the condition is only used by selects first, and then sets the AssumeSingleUse flag for the simplifcation. Then we convert the selects to shrunkblend, and finally replace condition. Differential Revision: https://reviews.llvm.org/D43446 llvm-svn: 325604	2018-02-20 17:58:17 +00:00
Craig Topper	010ae8dcbb	[X86] Promote 16-bit cmovs to 32-bits This allows us to avoid an opsize prefix. And forcing some move immediates to i32 avoids a length changing prefix on those instructions. This mostly replaces the existing combine we had for zext/sext+cmov of constants. I left in a case for sign extending a 32 bit cmov of constants to 64 bits. Differential Revision: https://reviews.llvm.org/D43327 llvm-svn: 325601	2018-02-20 17:41:00 +00:00
Simon Dardis	d3860e6670	[mips] Correct the definition of cvt.d.w An upcoming patch D41434, changes the ordering of the matcher table for assembly. This patch corrects the definition of the normal MIPS cvt.d.w not to be available in microMIPS. llvm-svn: 325589	2018-02-20 15:55:17 +00:00
Lei Huang	dfd41552f4	[PowerPC] Reduce stack frame for fastcc functions by only allocating parameter save area when needed Current implementation always allocates the parameter save area conservatively for fastcc functions. There is no reason to allocate the parameter save area if all the parameters can be passed via registers. Differential Revision: https://reviews.llvm.org/D42602 llvm-svn: 325581	2018-02-20 15:09:45 +00:00
Krzysztof Parzyszek	b404fae9e3	[Hexagon] Fix alignment calculation of stack objects in Hexagon bit tracker llvm-svn: 325580	2018-02-20 14:29:43 +00:00
David Green	056476497e	[ARM] Mark -1 as cheap in xor's for thumb1 We can always convert xor %a, -1 into MVN, even in thumb 1 where the -1 would not otherwise be considered a cheap constant. This prevents the -1's from being pulled out into constants and potentially hoisted. Differential Revision: https://reviews.llvm.org/D43451 llvm-svn: 325573	2018-02-20 11:07:35 +00:00
George Rimar	da4f43a4b4	[llvm-mc] - Produce R_X86_64_PLT32 for "call/jmp foo". For instructions like call foo and jmp foo patch changes relocation produced from R_X86_64_PC32 to R_X86_64_PLT32. Relocation can be used as a marker for 32-bit PC-relative branches. Linker will reduce PLT32 relocation to PC32 if function is defined locally. Differential revision: https://reviews.llvm.org/D43383 llvm-svn: 325569	2018-02-20 10:17:57 +00:00
Tim Renouf	8234b4893a	[AMDGPU] stop buffer_store being moved illegally Summary: The machine instruction scheduler was illegally moving a buffer store past a buffer load with the same descriptor and offset. Fixed by marking buffer ops as mayAlias and isAliased. This may be overly conservative, and we may need to revisit. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D43332 Change-Id: Iff3173d9e0653e830474546276ab9d30318b8ef7 llvm-svn: 325567	2018-02-20 10:03:38 +00:00
Craig Topper	9256ac1a58	[X86] Add 512-bit unmasked pmulhrsw/pmulhw/pmulhuw intrinsics. Remove and auto upgrade 128/256/512 bit masked pmulhrsw/pmulhw/pmulhuw intrinsics. The 128 and 256 bit versions were already not used by clang. This adds an equivalent unmasked 512 bit version. Then autoupgrades all sizes to use unmasked intrinsics plus select. llvm-svn: 325559	2018-02-20 07:28:14 +00:00
Amara Emerson	db211892ed	[AArch64][GlobalISel] When copying from a gpr32 to an fpr16 reg, convert to fpr32 first. This is a follow on commit to r[x] where we fix the other direction of copy. For this case, after converting the source from gpr32 -> fpr32, we use a subregister copy, which is essentially what EXTRACT_SUBREG does in SDAG land. https://reviews.llvm.org/D43444 llvm-svn: 325550	2018-02-20 05:11:57 +00:00
Craig Topper	a05ed17316	[X86] Make XOP VPCOM instructions commutable to fold loads during isel. llvm-svn: 325547	2018-02-20 03:58:13 +00:00
Craig Topper	9b64bf54b9	[X86] Make a helper function for commuting AVX512 VPCMP immediates since we do it in two places. llvm-svn: 325546	2018-02-20 03:58:11 +00:00
Craig Topper	b195ed8ce3	[X86] Use vpmovq2m/vpmovd2m for truncate to vXi1 when possible. Previously we used vptestmd, but the scheduling data for SKX says vpmovq2m/vpmovd2m is lower latency. We already used vpmovb2m/vpmovw2m for byte/word truncates. So this is more consistent anyway. llvm-svn: 325534	2018-02-19 22:07:31 +00:00
Craig Topper	e60f1472f1	[X86] Stop swapping the operands of AVX512 setge. We swapped the operands and used setle, but I don't see any reason to do that. I think this is a holdover from SSE where we swap and the invert to use pcmpgt. But with AVX512 we don't want an invert so we won't use pcmpgt. So there's no need to swap. llvm-svn: 325527	2018-02-19 19:23:35 +00:00
Craig Topper	9471a7c898	[X86] Reduce the number of isel pattern variations needed for VPTESTM/VPTESTNM matching. Canonicalize EQ/NE PCMPM to have build vector all zeros on the RHS so we don't have to pattern match it in both locations. This significantly reduces the number of isel patterns needed since we also had to multiply it out with loads being in either operand of the 'and' input node and in the 'and' masking node. This removes over 24000 bytes from the isel table. llvm-svn: 325526	2018-02-19 19:23:31 +00:00
Mark Searles	65207923f6	[AMDGPU] Make note of existing waitcnt instrs; this is add-on work related to suppression of redundant waitcnt instrs. It is necessary to make note of these existing waitcnt instrs so that we do not fall into an infinite loop when handling loops. Also, [NFC] some minor code clean-up. llvm-svn: 325524	2018-02-19 19:19:59 +00:00
Mark Searles	419bdab759	[AMDGPU] Increased vector length for global/constant loads. Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords. Author: FarhanaAleen Reviewed By: rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D43275 llvm-svn: 325518	2018-02-19 16:42:49 +00:00
Rafael Espindola	c7e51805ff	Bring back r323297. It was reverted because it broke the grub build. The reason the grub build broke is because grub does its own relocation processing and was not handing R_386_PLT32. Since grub has no dynamic linker, the fix is trivial: handle R_386_PLT32 exactly like R_386_PC32. On the report it was noted that they are using -fno-integrated-assembler. The upstream GAS (starting with 451875b4f976a527395e9303224c7881b65e12ed) will already be producing a R_386_PLT32 anyway, so they have to update their code one way or the other Original message: Don't assume a null GV is local for ELF and MachO. This is already a simplification, and should help with avoiding a plt reference when calling an intrinsic with -fno-plt. With this change we return false for null GVs, so the caller only needs to check the new metadata to decide if it should use foo@plt or *foo@got. llvm-svn: 325514	2018-02-19 16:02:38 +00:00
Francis Visoiu Mistrih	68ced40a23	Revert "[CodeGen] Move printing '\n' from MachineInstr::print to MachineBasicBlock::print" This reverts commit r324681. llvm-svn: 325505	2018-02-19 15:08:49 +00:00
Simon Pilgrim	c302a581a0	[X86][SSE] combineTruncateWithSat - use truncateVectorWithPACK down to 64-bit subvectors Add support for chaining PACKSS/PACKUS down to 64-bit vectors by using only a single 128-bit input. llvm-svn: 325494	2018-02-19 13:29:20 +00:00
Dylan McKay	9a2a996c1c	[AVR] Set the program address space in the data layout This adds the program memory address space setting to the AVR data layout. This setting was very recently added under r325479. At the moment, there are no uses of this setting. In the future, things such as switch lookup tables should reside there. llvm-svn: 325481	2018-02-19 10:40:59 +00:00
Dylan McKay	05d3e41076	[AVR] Fix a lowering bug in AVRISelLowering.cpp The parseFunctionArgs() method was directly reading the arguments from a Function object, but is should have used the arguments supplied by the SelectionDAGBuilder. This was causing the lowering code to only lower one argument, not two in some cases. Thanks to @brainlag on GitHub for coming up with the working fix! Patch-by: @brainlag on GitHub llvm-svn: 325474	2018-02-19 08:28:38 +00:00
Eric Christopher	8fad26e5f3	Add LanaiMCTargetDesc.h to LanaiInstrInfo.h to make it self contained with instruction enum definitions. llvm-svn: 325473	2018-02-19 05:26:49 +00:00
Craig Topper	9cf812e1ed	[X86] Correct a typo I made in combineToExtendCMOV recently. We're accidentally checking that the same node is a constant twice instead of checking the other node. This isn't a functional problem since we didn't do anything below that explicitly requires constants. It just means we may have introduced a sign_extend or zero_extend that won't fold out. llvm-svn: 325469	2018-02-18 20:41:25 +00:00
Amara Emerson	242efdb54b	Fix unused assertion variable warning. llvm-svn: 325464	2018-02-18 17:28:34 +00:00
Amara Emerson	7e9f348b2d	[AArch64][GlobalISel] Fix an assert fail/miscompile when fp16 types are copied to gpr register banks. PR36345. rdar://36478867 Differential Revision: https://reviews.llvm.org/D43310 llvm-svn: 325463	2018-02-18 17:10:49 +00:00
Amara Emerson	bc03baef77	[AArch64][GlobalISel] Support G_INSERT/G_EXTRACT of types < s32 bits. These are needed for operations on fp16 types in a later patch. llvm-svn: 325462	2018-02-18 17:03:02 +00:00
Haicheng Wu	aed6e52b3c	[AArch64] Coalesce Copy Zero during instruction selection Add special case for copy of zero to avoid a double copy. Differential Revision: https://reviews.llvm.org/D36104 llvm-svn: 325459	2018-02-18 13:51:33 +00:00
Jonas Paulsson	891789c299	[BPF] Return true in enableMultipleCopyHints(). Enable multiple COPY hints to eliminate more COPYs during register allocation. Note that this is something all targets should do, see https://reviews.llvm.org/D38128. Review: Yonghong Song llvm-svn: 325457	2018-02-18 10:09:54 +00:00
Craig Topper	1040f236a3	[X86] Make masked pcmpeq commutable during isel so we can fold loads in other operand to the shorter encoding. Previously we used the immediate encoding if the load was in operand 0 and the short encoding if the load was in operand 1. This added an insane number of bytes to the size of the isel table. I'm wondering if we should always use the immediate form during isel and change to the short form during emission. This would remove the need to pattern match every combination for both the immediate form and the short form during isel. We could do the same with vpcmpgt llvm-svn: 325456	2018-02-18 02:37:33 +00:00
Simon Pilgrim	386b8ddd5f	[MIPS][MSA] Convert vector integer min/max opcodes to use generic implementation Found while investigating D43338 Simon^3 - the LLVM project needs more Simons. Differential Revision: https://reviews.llvm.org/D43433 llvm-svn: 325447	2018-02-17 21:29:45 +00:00
Martin Storsjo	a63a5b993e	[AArch64] Implement dynamic stack probing for windows This makes sure that alloca() function calls properly probe the stack as needed. Differential Revision: https://reviews.llvm.org/D42356 llvm-svn: 325433	2018-02-17 14:26:32 +00:00
Simon Pilgrim	63db669013	Fix unused variable warning. NFCI. We were casting to AArch64InstrInfo but only using it for static methods which some compilers complain about. llvm-svn: 325432	2018-02-17 13:48:23 +00:00
Jonas Paulsson	b51a9bc358	[AMDGPU] Return true in enableMultipleCopyHints(). Enable multiple COPY hints to eliminate more COPYs during register allocation. Note that this is something all targets should do, see https://reviews.llvm.org/D38128. Review: Stanislav Mekhanoshin, Tom Stellard. llvm-svn: 325425	2018-02-17 10:00:28 +00:00
Chandler Carruth	a1d6107b14	[DAG, X86] Revert r324797, r324491, and r324359. Sadly, r324359 caused at least PR36312. There is a patch out for review but it seems to be taking a bit and we've already had these crashers in tree for too long. We're hitting this PR in real code now and are blocked on shipping new compilers as a consequence so I'm reverting us back to green. Sorry for the churn due to the stacked changes that I had to revert. =/ llvm-svn: 325420	2018-02-17 02:26:25 +00:00
Craig Topper	0bcdd399e7	[X86] Turn selects with constant condition into vector shuffles during DAG combine Summary: Currently we convert to shuffles during lowering. This moves it to DAG combine so hopefully we can get it done before type legalization has to extend the condition. I believe in some cases we're creating SHRUNKBLENDs that end up with constant conditions because we see the extended on the condition and think its a dynamic selelect before DAG combine gets a chance to constant fold the extend. We could add combines to turn SHRUNKBLENDs with constant condition back to vselect. But it seemed like it might be better to just send them to shuffles as early as possible so they never get a chance to become SHRUNKBLENDs. This the reason some tests went from blends controlled by a constant pool load to just move. Some of the constant pool entries changed because the sign_extend introduced by type legalization turned undef elements in select condition into 0s. While the select->shuffle used -1 in the shuffle mask. So now the shuffle lowering can do what it wants with them. I'll remove the lowering code as a follow up. We might be able to simplify some of the pre-checks for SHRUNKBLEND as the FIXME there says. Reviewers: spatel, RKSimon, efriedma, zvi, andreadb Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43367 llvm-svn: 325417	2018-02-17 00:30:30 +00:00
Konstantin Zhuravlyov	ef9aafcddc	AMDGPU: Remove unused private member of AMDGPUTargetELFStreamer llvm-svn: 325408	2018-02-16 23:04:11 +00:00
Eric Christopher	8ceddb0ecd	Remove an unused function. llvm-svn: 325403	2018-02-16 22:46:47 +00:00
Konstantin Zhuravlyov	9122a63143	AMDGPU: Bring elf flags in sync with the spec - Add MACH flags - Add XNACK flag - Add reserved flags - Minor cleanups in docs Differential Revision: https://reviews.llvm.org/D43356 llvm-svn: 325399	2018-02-16 22:33:59 +00:00
Craig Topper	27b9ac2372	[X86] In lowerVSELECTtoVectorShuffle, don't map undef select condition to undef in shuffle mask. Undef in select condition means we should pick the element from one side or the other. An undef in a shuffle mask means pick any element from either source or worse. I suspect by the time we get here most of the undefs in a constant vector have been removed by other things, but doing this for safety. llvm-svn: 325394	2018-02-16 21:36:29 +00:00
Konstantin Zhuravlyov	331f97e171	AMDGPU: Bring processors and features in sync with the spec - Remove gfx800 - Make iceland gfx802 - Add xnack to gfx902 Differential Revision: https://reviews.llvm.org/D43355 llvm-svn: 325393	2018-02-16 21:26:25 +00:00
Evandro Menezes	10ae20d80c	[AArch64] Fix BITCAST lowering crash The data type is assumed to be a vector, but sometimes it is not, leading to an assertion. Add simple test-case to verify this. Differential revision: https://reviews.llvm.org/D42599 llvm-svn: 325378	2018-02-16 20:00:57 +00:00
Changpeng Fang	ba92059ca9	AMDGPU/SI: Extend promoting alloca to vector to arrays of up to 16 elements Summary: This patch extends the promotion of alloca to vector to the arrays of up to 16 elements. Also we introduce an option, -disable-promote-alloca-to-vector, to switch promotion to vector off, if needed. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D33559 llvm-svn: 325372	2018-02-16 19:14:17 +00:00
Craig Topper	de565fc73e	[X86] Only reorder srl/and on last DAG combiner run This seems to interfere with a target independent brcond combine that looks for the (srl (and X, C1), C2) pattern to enable TEST instructions. Once we flip, that combine doesn't fire and we end up exposing it to the X86 specific BT combine which causes us to emit a BT instruction. BT has lower throughput than TEST. We could try to make the brcond combine aware of the alternate pattern, but since the flip was just a code size reduction and not likely to enable other combines, it seemed easier to just delay it until after lowering. Differential Revision: https://reviews.llvm.org/D43201 llvm-svn: 325371	2018-02-16 18:51:09 +00:00
Craig Topper	79bd39db80	[X86] Remove call to ShrinkDemandedCosntant from the SHRUNKBLEND creation code. We only run this code if know the condition isn't a constant vector. ShrinkDemandedConstant isn't going to find any different. llvm-svn: 325368	2018-02-16 18:34:46 +00:00
Sam Clegg	b7a5469c7e	[WebAssembly] MC: Make explicit our current lack of support for relocations against unnamed temporary symbols. Add an explicit check before looking up symbol in SymbolIndices. This was previously silently succeeding and returning zero for such unnamed temporaries. Differential Revision: https://reviews.llvm.org/D43365 llvm-svn: 325367	2018-02-16 18:06:05 +00:00
Changpeng Fang	da38b5fd49	AMDGPU/SI: Turn off GPR Indexing Mode immediately after the interested instruction. Summary: In the current implementation of GPR Indexing Mode when the index is of non-uniform, the s_set_gpr_idx_off instruction is incorrectly inserted after the loop. This will lead the instructions with vgpr operands (v_readfirstlane for example) to read incorrect vgpr. In this patch, we fix the issue by inserting s_set_gpr_idx_on/off immediately around the interested instruction. Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D43297 llvm-svn: 325355	2018-02-16 16:31:30 +00:00
Simon Pilgrim	4e2f757dc1	[X86][SSE] Allow float domain crossing if we are merging 2 or more shuffles and the root started as a float domain shuffle llvm-svn: 325349	2018-02-16 14:57:25 +00:00
Nemanja Ivanovic	6cf41b028d	[PowerPC] Fix transform in table gen file causing UB Running a bootstrap build with UBSan produces a number of instances where we have signed integer overflow due to this transform. Change the type to long to prevent this UB on 64-bit build machines. llvm-svn: 325347	2018-02-16 14:49:01 +00:00
Simon Dardis	b8ae30ecec	[mips] Remove codegen support from some 16 bit instructions These instructions conflict with their full length variants for the purposes of FastISel as they cannot be distingushed based on the number and type of operands and predicates. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D41285 llvm-svn: 325341	2018-02-16 13:34:23 +00:00
Jonas Paulsson	995ba6e42c	[ARM] Return true in enableMultipleCopyHints(). Enable multiple COPY hints to eliminate more COPYs during register allocation. Note that this is something all targets should do, see https://reviews.llvm.org/D38128. Review: Eli Friedman llvm-svn: 325327	2018-02-16 09:51:01 +00:00
Roger Ferrer Ibanez	d41059a9f6	[ARM] Materialise some boolean values to avoid a branch This patch combines some cases of ARMISD::CMOV for integers that arise in comparisons of the form a != b ? x : 0 a == b ? 0 : x and that currently (e.g. in Thumb1) are emitted as branches. Differential Revision: https://reviews.llvm.org/D34515 llvm-svn: 325323	2018-02-16 09:23:59 +00:00
Craig Topper	2e4b838c06	[X86] Allow CMOVs of constants to be sign extended from i32. Sign extending i32 constants only requires a REX prefix as does widening the CMOV. This is cheaper than the explicit sign extend op. llvm-svn: 325318	2018-02-16 07:16:15 +00:00
Craig Topper	5d9e301042	[X86] Don't zero_extend cmov up to i64, stop at i32. Zero extend from i32 to i64 is free. So extend from i16 to i32, and then use a free zero extend to finish. llvm-svn: 325317	2018-02-16 06:52:43 +00:00
Stanislav Mekhanoshin	ff2763a658	[AMDGPU] Combine adjacent waitcounts in a single strongest wait Differential Revision: https://reviews.llvm.org/D43350 llvm-svn: 325299	2018-02-15 22:03:55 +00:00
Rafael Auler	de9ad4ba84	[X86][3DNOW] Teach decoder about AMD 3DNow! instrs Summary: This patch makes the decoder understand old AMD 3DNow! instructions that have never been properly supported in the X86 disassembler, despite being supported in other subsystems. Hopefully this should make the X86 decoder more complete with respect to binaries containing legacy code. Reviewers: craig.topper Reviewed By: craig.topper Subscribers: llvm-commits, maksfb, bruno Differential Revision: https://reviews.llvm.org/D43311 llvm-svn: 325295	2018-02-15 21:20:31 +00:00
Craig Topper	f3f35efe5c	[X86] Enable BT to be used in place of TEST for single bit checks under optsize We already do this for 64-bit when it won't fit into a 64-bit AND/TEST's immediate field. This adds an additional qualifier to do it for any single bit constant larger than 8-bits under optsize Differential Revision: https://reviews.llvm.org/D43346 llvm-svn: 325290	2018-02-15 20:27:30 +00:00
Craig Topper	2b2d8c5eb2	[X86] Use btc/btr/bts to implement xor/and/or that affects a single bit in the upper 32-bits of a 64-bit operation. We can't fold a large immediate into a 64-bit operation. But if we know we're only operating on a single bit we can use the bit instructions. For now only do this for optsize. Differential Revision: https://reviews.llvm.org/D37418 llvm-svn: 325287	2018-02-15 19:57:35 +00:00
Krzysztof Parzyszek	e0d7de7d7b	Recommit [Hexagon] Make the vararg handling a bit more robust Use the FunctionType of the callee when it's available. It may not be available for synthetic calls to functions specified by external symbols. llvm-svn: 325269	2018-02-15 17:20:07 +00:00
Yonghong Song	920df52a93	bpf: fix a bug in dag2dag optimization for loads from readonly section The reference '&' is missing in the function parameter. If there are back-to-back optimizations in terms of dag node list like below: t29: i64,ch = load<LD4[bitcast (%struct.test_t* @test.t to i8)+12](dereferenceable), zext from i32> t3, t43, undef:i64 t34: i64,ch = load<LD4[bitcast (%struct.test_t @test.t to i8*)](dereferenceable), zext from i32> t3, t41, undef:i64 The bug will trigger a segfault for the added test case remove_truncate_5.ll: LLVMSymbolizer: error reading file: No such file or directory #0 0x000000000241c4d9 (llc+0x241c4d9) #1 0x000000000241c56a (llc+0x241c56a) #2 0x000000000241aa50 (llc+0x241aa50) ... #22 0x0000000000fd5edf (llc+0xfd5edf) #23 0x00007f0fe03bec05 __libc_start_main (/lib64/libc.so.6+0x21c05) #24 0x0000000000fd3e69 (llc+0xfd3e69) ... Segmentation fault Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 325267	2018-02-15 17:06:45 +00:00
Krzysztof Parzyszek	8a9eff6b87	Revert "[Hexagon] Make the vararg handling a bit more robust" This is breaking lit tests. llvm-svn: 325266	2018-02-15 16:57:44 +00:00
Krzysztof Parzyszek	568107275d	[Hexagon] Make the vararg handling a bit more robust The FunctionType of the callee is always available, even if the Function of the callee is not. Use that to get the number of fixed parameters. llvm-svn: 325259	2018-02-15 16:24:30 +00:00
Krzysztof Parzyszek	18e0d2a1f8	[Hexagon] Fix lowering of formal arguments after r324737 Lowering of formal arguments needs to be aware of vararg functions. llvm-svn: 325255	2018-02-15 15:47:53 +00:00
Pablo Barrio	e28cb8399a	[ARM] Allow 64- and 128-bit types with 't' inline asm constraint Summary: In LLVM, 't' selects a floating-point/SIMD register and only supports 32-bit values. This is appropriately documented in the LLVM Language Reference Manual. However, this behaviour diverges from that of GCC, where 't' selects the s0-s31 registers and its qX and dX variants depending on additional operand modifiers (q/P). For example, the following C code: #include <arm_neon.h> float32x4_t a, b, x; asm("vadd.f32 %0, %1, %2" : "=t" (x) : "t" (a), "t" (b)) results in the following assembly if compiled with GCC: vadd.f32 s0, s0, s1 whereas LLVM will show "error: couldn't allocate output register for constraint 't'", since a, b, x are 128-bit variables, not 32-bit. This patch extends the use of 't' to mean that of GCC, thus allowing selection of the lower Q vector regs and their D/S variants. For example, the earlier code will now compile as: vadd.f32 q0, q0, q1 This behaviour still differs from that of GCC but I think it is actually more correct, since LLVM picks up the right register type based on the datatype of x, while GCC would need an extra operand modifier to achieve the same result, as follows: asm("vadd.f32 %q0, %q1, %q2" : "=t" (x) : "t" (a), "t" (b)) Since this is only an extension of functionality, existing code should not be affected by this change. Note that operand modifiers q/P are already supported by LLVM, so this patch should suffice to support inline assembly with constraint 't' originally built for GCC. Reviewers: grosbach, rengolin Reviewed By: rengolin Subscribers: rogfer01, efriedma, olista01, aemerson, javed.absar, eraman, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42962 llvm-svn: 325244	2018-02-15 14:44:22 +00:00
Simon Pilgrim	17bb6f0755	[X86][SSE] combineTruncateWithSat - use truncateVectorWithPACK to chain PACKUS vXi32-vXi8 saturated truncation We can use PACKSS/PACKUS to saturate each stage of the chain: PACKSSDW down to [-32768,32767] and then PACKUSWB to [0,255]. llvm-svn: 325243	2018-02-15 14:37:59 +00:00
Simon Pilgrim	908f833e57	[X86][SSE] combineTruncateWithSat - use truncateVectorWithPACK to chain PACKSS vXi32-vXi8 saturated truncation We can use PACKSS to saturate each stage of the chain: PACKSSDW down to [-32768,32767] and then PACKSSWB to [-128,127]. PACKUS is a little trickier and will be handled in a separate patch. llvm-svn: 325235	2018-02-15 13:33:15 +00:00
Sjoerd Meijer	9430c8cd1c	[ARM] f16 vcmp fixes This adds f16 VCMP match rules and fixes the test cases. Differential Revision: https://reviews.llvm.org/D43291 llvm-svn: 325228	2018-02-15 10:33:07 +00:00
Craig Topper	386cfa08a8	[X86] Change 32 and 64 bit versions of LSL instruction have a 16-bit memory operand. This matches the Intel and AMD documentation and is consistent with the LAR instruction. llvm-svn: 325197	2018-02-15 01:21:53 +00:00
Craig Topper	bab7b0a466	[X86] Dont' allow 'outs' and 'ins' in at&t syntax without suffixes. The match would be ambiguous, but at&t asm parsing doesn't support ambiguous matches and will just return the first. llvm-svn: 325192	2018-02-14 23:53:26 +00:00
Craig Topper	a08f83bac3	[X86] Reverse the operand order of invlpga in at&t syntax to match gas. llvm-svn: 325190	2018-02-14 23:53:21 +00:00
Craig Topper	675752166d	[X86] Don't swap argument on BOUND instruction in at&t syntax. The bound instruction does not have reversed operands in gas. Fixes PR27653. Patch by Maya Madhavan. Differential Revision: https://reviews.llvm.org/D43243 llvm-svn: 325178	2018-02-14 21:54:58 +00:00
Krzysztof Parzyszek	ad83ce4cb4	[Hexagon] Split HVX vector pair loads/stores, expand unaligned loads llvm-svn: 325169	2018-02-14 20:46:06 +00:00
Simon Pilgrim	2ec8373633	[X86][SSE] truncateVectorWithPACK - Use src type instead of dst to select between PACKSDW/PACKSWB Try to keep PACKSDW/PACKSWB as wide as possible, this helps ComputeNumSignBits as it can only peek through bitcasts to wider types, pre-AVX2 codegen was already doing this as it could peek through bitcasts/subvectors more easily than AVX2 could through shuffles. This shouldn't affect existing results as calls to truncateVectorWithPACK ensure we have enough sign bits to pack to the same value, but it should make it possible to use truncateVectorWithPACK chains to perform saturation in combineTruncateWithSat with a future patch. llvm-svn: 325149	2018-02-14 18:23:58 +00:00
Stanislav Mekhanoshin	c078ca92eb	[AMDGPU] Remove non-temporal flag from argument loads Kernel arguments likely read by all workitems and should not bypass cache. Fixes performance hit in sub-dword argument loads. Differential Revision: https://reviews.llvm.org/D43249 llvm-svn: 325146	2018-02-14 18:05:14 +00:00
Paul Robinson	ee88ed6753	[DWARF] Fix incorrect prologue end line record. The prologue-end line record must be emitted after the last instruction that is part of the function frame setup code and before the instruction that marks the beginning of the function body. Patch by Carlos Alberto Enciso! Differential Revision: https://reviews.llvm.org/D41762 llvm-svn: 325143	2018-02-14 17:35:52 +00:00
Sjoerd Meijer	3b4294edd2	[ARM] f16 stack spill/reloads This adds support for handling f16 stack spills/reloads. Differential Revision: https://reviews.llvm.org/D43280 llvm-svn: 325130	2018-02-14 15:09:09 +00:00
Simon Pilgrim	ded6e7a263	Fix GCC -Wlogical-op-parentheses warning. NFCI. llvm-svn: 325129	2018-02-14 15:07:36 +00:00
Lama Saba	fe1016c485	[X86] Reduce Store Forward Block issues in HW - Recommit after fixing Bug 36346 If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory. A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load. The estimated penalty for a store forward block is ~13 cycles. This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence of a load and a store. The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies. breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM. Change-Id: Ic41aa9ade6512e0478db66e07e2fde41b4fb35f9 llvm-svn: 325128	2018-02-14 14:58:53 +00:00
Simon Pilgrim	86d15bff68	[X86][SSE] Relax type legality for combineTruncateWithSat PACKSS/PACKUS truncation While the AVX512 VTRUNCS/VTRUNCUS instructions require legal types, truncateVectorWithPACK handles cases with multiples of legal types through splitting/concatenation. So we just need to ensure that the src/dst scalar types are correct and leave truncateVectorWithPACK to handle the rest of it. llvm-svn: 325127	2018-02-14 14:14:29 +00:00
Reid Kleckner	1631ac1696	[X86] Remove dead code from retpoline thunk generation Follow-up to r325049 llvm-svn: 325085	2018-02-14 00:24:29 +00:00
Daniel Sanders	7fc87360e9	[globalisel][legalizerinfo] Follow up on post-commit review comments after r323681 * Document most API's * Delete a useless function call * Fix a discrepancy between the single and multi-opcode variants of getActionDefinitions(). The multi-opcode variant now requires that more than one opcode is requested. Previously it acted much like the single-opcode form but unnecessarily enforced the requirements of the multi-opcode form. llvm-svn: 325067	2018-02-13 23:02:44 +00:00
Reid Kleckner	91e11a83fc	[X86] Use EDI for retpoline when no scratch regs are left Summary: Instead of solving the hard problem of how to pass the callee to the indirect jump thunk without a register, just use a CSR. At a call boundary, there's nothing stopping us from using a CSR to hold the callee as long as we save and restore it in the prologue. Also, add tests for this mregparm=3 case. I wrote execution tests for __llvm_retpoline_push, but they never got committed as lit tests, either because I never rewrote them or because they got lost in merge conflicts. Reviewers: chandlerc, dwmw2 Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D43214 llvm-svn: 325049	2018-02-13 20:47:49 +00:00
Hans Wennborg	f381e94ac8	Revert r324903 "[AArch64] Refactor identification of SIMD immediates" It caused "Cannot select: t33: f64 = AArch64ISD::FMOV Constant:i32<0>" in Chromium builds. See PR36369. > Get rid of icky goto loops and make the code easier to maintain (NFC). > > Differential revision: https://reviews.llvm.org/D42723 llvm-svn: 325034	2018-02-13 18:14:38 +00:00
Yaxun Liu	0124b5484c	[AMDGPU] Change constant addr space to 4 Differential Revision: https://reviews.llvm.org/D43170 llvm-svn: 325030	2018-02-13 18:00:25 +00:00
Craig Topper	036789a7e8	[X86] Add combine to shrink 64-bit ands when one input is an any_extend and the other input guarantees upper 32 bits are 0. Summary: This gets the shift case from PR35792. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43222 llvm-svn: 325018	2018-02-13 16:25:25 +00:00
Krzysztof Parzyszek	cfbe6ba20c	[Hexagon] Simplify some code, NFC llvm-svn: 325014	2018-02-13 15:35:07 +00:00
Krzysztof Parzyszek	080bf219c2	[Hexagon] Remove unnecessary check llvm-svn: 325013	2018-02-13 15:34:29 +00:00
Sjoerd Meijer	f4a7fa7bbe	[ARM] Allow half types in ConstantPool Change ARMConstantIslandPass to: - accept f16 literals as litpool entries, - if the litpool needs to be inserted in the middle of a big block, then we need to 4-byte align the next instruction in ARM mode. Differential Revision: https://reviews.llvm.org/D42784 llvm-svn: 325012	2018-02-13 15:34:09 +00:00
Andre Vieira	f00234c0bf	[ARM] Don't print "Requires NEON" error message for M-profile Differential Revision: https://reviews.llvm.org/D43125 llvm-svn: 325000	2018-02-13 11:46:38 +00:00
Sjoerd Meijer	101ee43072	[Thumb] Handle addressing mode AddrMode5FP16 This addressing mode wasn't checked, so we were running in an assert. Differential Revision: https://reviews.llvm.org/D43179 llvm-svn: 324996	2018-02-13 10:29:03 +00:00
Craig Topper	df99baa4df	[X86] Teach EVEX->VEX pass to turn VRNDSCALE into VROUND when bits 7:4 of the immediate are 0 and the regular EVEX->VEX checks pass. Bits 7:4 control the scale part of the operation. If the scale is 0 the behavior is equivalent to VROUND. Fixes PR36246 llvm-svn: 324985	2018-02-13 04:19:26 +00:00
Craig Topper	5ce6db93c1	[X86] Use getTypeAction in most places that were checking ExperimentalVectorWideningLegalization. This will allow more flexibility in what types we legalize via widening or not. This should help with a couple lines in D41062. llvm-svn: 324980	2018-02-13 01:49:58 +00:00
Jacob Gravelle	ca358da5e7	[WebAssembly] Fix casting MCSymbol to MCSymbolWasm on ELF Summary: wasm32-unknown-unknown-elf has MCSymbols that are not MCSymbolWasms, so we need a non-asserting cast here. Reviewers: dschuff, sunfish Subscribers: jfb, sbc100, aheejin, llvm-commits Differential Revision: https://reviews.llvm.org/D43205 llvm-svn: 324942	2018-02-12 21:41:12 +00:00
Craig Topper	88939fefe8	[X86] Simplify X86DAGToDAGISel::matchBEXTRFromAnd by creating an X86ISD::BEXTR node and calling Select. Add isel patterns to recognize this node. This removes a bunch of special case code for selecting the immediate and folding loads. llvm-svn: 324939	2018-02-12 21:18:11 +00:00
Craig Topper	efe3923514	[X86] Remove unused multiclass argument. NFC llvm-svn: 324938	2018-02-12 21:18:09 +00:00
Abderrazek Zaafrani	e72d99261f	[AArch64] Fixes for ARMv8.2-A FP16 scalar intrinsic - llvm portion https://reviews.llvm.org/D42993 llvm-svn: 324912	2018-02-12 17:35:42 +00:00
Simon Pilgrim	0efe9bc953	[X86] Add missing scheduling class tag for i64 absolute address moves Expand existing SchedRW to encompass these like it did for the other memory offset movs - added comments to closing braces to keep track of def scopes. We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639). llvm-svn: 324910	2018-02-12 17:21:28 +00:00
Oliver Stannard	02f08c9d1f	[AArch64] Improve v8.1-A code-gen for atomic load-and Armv8.1-A added an atomic load-clear instruction (which performs bitwise and with the complement of it's operand), but not a load-and instruction. Our current code-generation for atomic load-and always inserts an MVN instruction to invert its argument, even if it could be folded into a constant or another instruction. This adds lowering early in selection DAG to convert a load-and operation into an xor with -1 and a load-clear, allowing the normal DAG optimisations to work on it. To do this, I've had to add a new ISD opcode, ATOMIC_LOAD_CLR. I don't see any easy way to do this with an AArch64-specific ISD node, because the code-generation for atomic operations assumes the SDNodes are of type AtomicSDNode. I've left the old tablegen patterns in because they are still needed for global isel. Differential revision: https://reviews.llvm.org/D42478 llvm-svn: 324908	2018-02-12 17:03:11 +00:00
Simon Pilgrim	07e1337c2a	[X86][AVX512] Add missing scheduling class tag for KMOVB/KMOVW/KMOVD/KMOVQ moves/loads/stores. We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639). llvm-svn: 324905	2018-02-12 16:59:04 +00:00
Evandro Menezes	7dc0f1ec45	[AArch64] Refactor identification of SIMD immediates Get rid of icky goto loops and make the code easier to maintain (NFC). Differential revision: https://reviews.llvm.org/D42723 llvm-svn: 324903	2018-02-12 16:41:41 +00:00
Simon Pilgrim	369e59d4d1	[X86][AVX512] Add missing scheduling class tag for VMOVQ/VMOVHLPS/VMOVLHPS/VMOVHPD/VMOVHPS/VMOVLPD/VMOVLPS Tag AVX512 variants to match SSE/AVX originals. We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639). llvm-svn: 324901	2018-02-12 16:18:36 +00:00
Simon Pilgrim	b941f5dc5f	[X86] Tag CET-IBT instruction scheduler classes llvm-svn: 324898	2018-02-12 15:57:00 +00:00
Simon Pilgrim	d0693a6501	[X86][MMX] Add missing scheduling class tag for EMMS/FEMMS We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639). AMD targets can perform these a lot quicker than WriteMicrocoded so will need an override in the models. llvm-svn: 324897	2018-02-12 15:52:59 +00:00
Oliver Stannard	4269917304	[AArch64] Improve v8.1-A code-gen for atomic load-subtract Armv8.1-A added an atomic load-add instruction, but not a load-subtract instruction. Our current code-generation for atomic load-subtract always inserts a NEG instruction to negate it's argument, even if it could be folded into a constant or another instruction. This adds lowering early in selection DAG to convert a load-subtract operation into a subtract and a load-add, allowing the normal DAG optimisations to work on it. I've left the old tablegen patterns in because they are still needed for global isel. Some of the tests in this patch are copied from D35375 by Chad Rosier (which was abandoned). Differential revision: https://reviews.llvm.org/D42477 llvm-svn: 324892	2018-02-12 14:22:03 +00:00
Hans Wennborg	7e19dfc45f	Revert r324835 "[X86] Reduce Store Forward Block issues in HW" It asserts building Chromium; see PR36346. (This also reverts the follow-up r324836.) > If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory. > A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load. > The estimated penalty for a store forward block is ~13 cycles. > > This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence > of a load and a store. > > The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies. > breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM. llvm-svn: 324887	2018-02-12 12:43:39 +00:00
Simon Atanasyan	0874cf5e62	[mips] Fix 'l' constraint handling for types smaller than 32 bits In case of correct using of the 'l' constraint llvm now generates valid code; otherwise it shows an error message. Initially these triggers an assertion. This commit is the same as r324869 with fixed the test's file name. llvm-svn: 324885	2018-02-12 12:21:55 +00:00
Simon Atanasyan	dc4ed35ea6	[mips] Revert rL324869 This commit adds inlineasm-cnstrnt-bad-l.ll which is clashing with inlineasm-cnstrnt-bad-L.ll on case insensitive file systems. llvm-svn: 324882	2018-02-12 11:15:37 +00:00
David Green	6d9f8c9817	[CodeGen] Add a -trap-unreachable option for debugging Add a common -trap-unreachable option, similar to the target specific hexagon equivalent, which has been replaced. This turns unreachable instructions into traps, which is useful for debugging. Differential Revision: https://reviews.llvm.org/D42965 llvm-svn: 324880	2018-02-12 11:06:27 +00:00
Simon Atanasyan	e08f2a19d4	[mips] Fix 'l' constraint handling for types smaller than 32 bits In case of correct using of the 'l' constraint llvm now generates valid code; otherwise it shows an error message. Initially these triggers an assertion. llvm-svn: 324869	2018-02-12 07:51:21 +00:00
Craig Topper	b424fafa9f	[X86] Don't look for TEST instruction shrinking opportunities when the root node is a X86ISD::SUB. I don't believe we ever create an X86ISD::SUB with a 0 constant which is what the TEST handling needs. The ternary operator at the end of this code shows up as only going one way in the llvm-cov report from the bots. llvm-svn: 324865	2018-02-12 03:02:02 +00:00
Craig Topper	3ccbd3f32f	[X86] Remove check for X86ISD::AND with no flag users from the TEST instruction immediate shrinking code. We turn X86ISD::AND with no flag users back to ISD::AND in PreprocessISelDAG. llvm-svn: 324864	2018-02-12 03:02:01 +00:00
Craig Topper	98ae8f833f	[X86] Change some compare patterns to use loadi8/loadi16/loadi32/loadi64 helper fragments. This enables CMP8mi to fold zextloadi8i1 which in all tests allows us to avoid creating a TEST8rr that peephole can't fold. llvm-svn: 324863	2018-02-12 02:48:42 +00:00
Craig Topper	3ce035acf3	[X86] Add KADD X86ISD opcode instead of reusing ISD::ADD. ISD::ADD implies individual vector element addition with no carries between elements. But for a vXi1 type that would be the same as XOR. And we already turn ISD::ADD into ISD::XOR for all vXi1 types during lowering. So the ISD::ADD pattern would never be able to match anyway. KADD is different, it adds the elements but also propagates a carry between them. This just a way of doing an add in k-register without bitcasting to the scalar domain. There's still no way to match the pattern, but at least its not obviously wrong. llvm-svn: 324861	2018-02-12 01:33:38 +00:00
Craig Topper	dfc322ddf4	[X86] Allow zextload/extload i1->i8 to be folded into instructions during isel Previously we just emitted this as a MOV8rm which would likely get folded during the peephole pass anyway. This just makes it explicit earlier. The gpr-to-mask.ll test changed because the kaddb instruction has no memory form. llvm-svn: 324860	2018-02-12 01:33:36 +00:00
Craig Topper	363e099446	[X86] Remove MASK_BINOP intrinsic type. NFC llvm-svn: 324858	2018-02-11 22:32:30 +00:00
Craig Topper	38d61c38a2	[X86] Remove dead code from getMaskNode that looked for a i64 mask with a maskVT that wasn't v64i1. NFC llvm-svn: 324857	2018-02-11 22:32:29 +00:00
Craig Topper	a7ac028a6b	[X86] Remove LowerBoolVSETCC_AVX512, we get this with a target independent DAG combine now. NFC llvm-svn: 324856	2018-02-11 22:32:27 +00:00
Simon Pilgrim	0d8c4bfc2a	[X86][SSE] Use SplitBinaryOpsAndApply to recognise PSUBUS patterns before they're split on AVX1 This needs to be generalised further to support AVX512BW cases but I want to add non-uniform constants first. llvm-svn: 324844	2018-02-11 17:29:42 +00:00
Craig Topper	ca5a340171	[X86] Use min/max for vector ult/ugt compares if avoids a sign flip. Summary: Currently we only use min/max to help with ule/uge compares because it removes an invert of the result that would otherwise be needed. But we can also use it for ult/ugt compares if it will prevent the need for a sign bit flip needed to use pcmpgt at the cost of requiring an invert after the compare. I also refactored the code so that the max/min code is self contained and does its own return instead of setting up a flag to manipulate the rest of the function's behavior. Most of the test cases look ok with this. I did notice that we added instructions when one of the operands being sign flipped is a constant vector that we were able to constant fold the flip into. I also noticed that sometimes the SSE min/max clobbers a register that is needed after the compare. This resulted in an extra move being inserted before the min/max to preserve the register. We could try to detect this and switch from min to max and change the compare operands to use the operand that gets reused in the compare. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42935 llvm-svn: 324842	2018-02-11 17:11:40 +00:00
Simon Pilgrim	c2544c572a	[X86][SSE] Moved SplitBinaryOpsAndApply earlier so more methods can use it. NFCI. llvm-svn: 324841	2018-02-11 17:01:43 +00:00
Simon Pilgrim	0be5567a89	[X86][SSE] Enable SMIN/SMAX/UMIN/UMAX custom lowering for all legal types This allows us to recognise more saturation patterns and also simplify some MINMAX codegen that was failing to combine CMPGE comparisons to a legal CMPGT. Differential Revision: https://reviews.llvm.org/D43014 llvm-svn: 324837	2018-02-11 10:52:37 +00:00
Lama Saba	c2ba6c387e	[X86] Reduce Store Forward Block issues in HW If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory. A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load. The estimated penalty for a store forward block is ~13 cycles. This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence of a load and a store. The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies. breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM. Change-Id: I620b6dc91583ad9a1444591e3ddc00dd25d81748 llvm-svn: 324835	2018-02-11 09:34:12 +00:00
Craig Topper	24d3b28d93	[X86] Don't make 512-bit vectors legal when preferred vector width is 256 bits and 512 bits aren't required This patch adds a new function attribute "required-vector-width" that can be set by the frontend to indicate the maximum vector width present in the original source code. The idea is that this would be set based on ABI requirements, intrinsics or explicit vector types being used, maybe simd pragmas, etc. The backend will then use this information to determine if its save to make 512-bit vectors illegal when the preference is for 256-bit vectors. For code that has no vectors in it originally and only get vectors through the loop and slp vectorizers this allows us to generate code largely similar to our AVX2 only output while still enabling AVX512 features like mask registers and gather/scatter. The loop vectorizer doesn't always obey TTI and will create oversized vectors with the expectation the backend will legalize it. In order to avoid changing the vectorizer and potentially harm our AVX2 codegen this patch tries to make the legalizer behavior similar. This is restricted to CPUs that support AVX512F and AVX512VL so that we have good fallback options to use 128 and 256-bit vectors and still get masking. I've qualified every place I could find in X86ISelLowering.cpp and added tests cases for many of them with 2 different values for the attribute to see the codegen differences. We still need to do frontend work for the attribute and teach the inliner how to merge it, etc. But this gets the codegen layer ready for it. Differential Revision: https://reviews.llvm.org/D42724 llvm-svn: 324834	2018-02-11 08:06:27 +00:00
Craig Topper	a4bf9b8d51	[X86] Remove setOperationAction lines for promoting vXi1 SINT_TO_FP/UINT_TO_FP. We promote these via a DAG combine now before lowering gets the chance. Also remove the v2i1 custom handling since it will no longer be triggered. llvm-svn: 324833	2018-02-11 07:44:33 +00:00
Craig Topper	ba5ad55965	[X86] Remove some redundant qualifications from the setOperationAction blocks. NFC These were added as part of the refactoring for prefer vector width. At the time I thought the hasAVX512 here would be replaced with "allow 512 bit vectors" so that it would read "allow 512 bit vectors OR VLX". But now the plan is to only give the option of disabling 512 bit vectors when VLX is enabled. So we don't need this qualification at all llvm-svn: 324831	2018-02-11 03:07:19 +00:00
Craig Topper	4dccffc84a	[X86] Change signatures of avx512 packed fp compare intrinsics to return a vXi1 mask type to be closer to an fcmp. Summary: This patch changes the signature of the avx512 packed fp compare intrinsics to return a vXi1 vector and no longer take a mask as input. The casts to scalar type will now need to be explicit in the IR. The masking node will now be an explicit and in the IR. This makes the intrinsic look much more similar to an fcmp instruction that we wish we could use for these but can't. We already use icmp instructions for integer compares. Previously the lowering step of isel would turn the intrinsic into an X86 specific ISD node and a emit the masking nodes as well as some bitcasts. This means DAG combines can't see the vXi1 type until somewhat late, making it more difficult to combine out gpr<->mask transition sequences. By exposing the vXi1 type explicitly in the IR and initial SelectionDAG we give earlier DAG combines and even InstCombine the chance to see it and optimize it. This should make any issues with gpr<->mask sequences the same between integer and fp. Meaning we only have to fix them once. Reviewers: spatel, delena, RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43137 llvm-svn: 324827	2018-02-10 23:33:55 +00:00
Simon Pilgrim	cb9a02f60e	[X86][SSE] Increase PMULLD costs to better match hardware Until Skylake, most hardware could only issue a PMULLD op every other cycle llvm-svn: 324823	2018-02-10 19:27:10 +00:00
Craig Topper	9121eb575e	[X86] Custom legalize (v2i32 (setcc (v2f32))) so that we don't end up with a (v4i1 (setcc (v4f32))) Undef VLX, getSetCCResultType returns v2i1/v4i1 for v2f32/v4f32 so default type legalization will end up changing the setcc result type back to vXi1 if it had been extended. The resulting extend gets messed up further by type legalization and is difficult to recombine back to (v4i32 (setcc (v4f32))) after legalization. I went ahead and enabled this for SSE2 and later since its always the result we want and this helps type legalization get there in less steps. llvm-svn: 324822	2018-02-10 19:12:58 +00:00
Craig Topper	28d3a73c81	[X86] Extend inputs with elements smaller than i32 to sint_to_fp/uint_to_fp before type legalization. This prevents extends of masks being introduced during lowering where it become difficult to combine them out. There are a few oddities in here. We sometimes concatenate two k-registers produced by two compares, sign_extend the combined pair, then extract two halves. This worked better previously because the sign_extend wasn't created until after the fp_to_sint was split which led to a split sign_extend being created. We probably also need to custom type legalize (v2i32 (sext v2i1)) via widening. llvm-svn: 324820	2018-02-10 17:58:58 +00:00
Craig Topper	b8d7b1620b	[X86] Custom legalize (v2i1 (fp_to_uint/fp_to_sint v2f64)) without AVX512VL. Strangely the code was already present, just the setOperationAction wasn't being called without VLX. llvm-svn: 324806	2018-02-10 08:39:31 +00:00
Craig Topper	c3aab4bbe1	[X86] Legalize zero extends from vXi1 to vXi16/vXi32/vXi64 using a sign extend and a shift. This avoids a constant pool load to create 1. The int->float are showing converts to mask and back. We probably need to widen inputs to sint_to_fp/uint_to_fp before type legalization. llvm-svn: 324805	2018-02-10 08:06:52 +00:00
Craig Topper	d34af6f636	[X86] Teach combineExtSetcc to handle ZERO_EXTEND by widening the setcc and then masking. A later DAG combine will convert to a shift. This helps to avoid a constant pool load needed to zero extend from the mask. llvm-svn: 324804	2018-02-10 08:06:49 +00:00
Nirav Dave	c8c9d4fe35	[DAG] Make early exit hasPredecessorHelper return true. NFCI. All uses conservatively assume in early exit case that it will be a predecessor. Changing default removes checking code in all uses. llvm-svn: 324797	2018-02-10 02:41:22 +00:00
Craig Topper	fa6113b3d7	[X86] Teach combineInsertSubvector how to combine some k-register insert_subvectors and extract_subvector sequences to remove extra zeroing.wq llvm-svn: 324791	2018-02-10 01:00:41 +00:00
Daniel Neilson	f4fa26f5d8	[Hexagon] Update uses of deprecated IRBuilder CreateMemCpy/Move calls Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the Hexagon LoopIdiom pass to cease using the old IRBuilder createMemCpy/createMemMove single-alignment APIs in favour of the new API that allows setting source and destination alignments independently. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278, rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774, rL324781 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html llvm-svn: 324784	2018-02-09 23:33:35 +00:00
Craig Topper	99db883d55	[X86] Teach lower1BitVectorShuffle to recognize shuffles that are just filling upper elements with zero. Replace with insert_subvector. There's still some extra kshifts in one of the modified test cases here, but hopefully that's only a DAG combine away. llvm-svn: 324782	2018-02-09 23:32:27 +00:00
Daniel Neilson	7512c3e15f	[ARMFastISel] Replace deprecated calls to MemoryIntrinsic::getAlignment() (NFCI) This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes ARMFastISel to cease using the old getAlignment() API of MemoryIntrinsic in favour of getting source & dest specific alignments through the new API. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278, rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html llvm-svn: 324781	2018-02-09 23:31:37 +00:00
Dan Gohman	db1916a646	[WebAssembly] Add mechanisms for specifying an explicit import module name. This adds a wasm-import-module function attribute and a .import_module assembler directive, for specifying module import names for WebAssembly. Currently these may only be used for function symbols; global variables may be considered in the future. WebAssembly has a two-level namespace scheme for symbols, and it's normally the linker's job to assign the module name, which is the first-level name. The attributes here allow users to specify their own module names explicitly, which is useful for tools generating bindings to modules defined in other languages. This feature is not fully usable yet. It will evolve along with the ongoing symbol table and lld changes. Differential Revision: https://reviews.llvm.org/D42520 llvm-svn: 324778	2018-02-09 23:13:22 +00:00
Dan Gohman	861bec2b7c	[WebAssembly] Add an LLVM_FALLTHROUGH to address a warning. NFC. llvm-svn: 324777	2018-02-09 22:59:01 +00:00
Daniel Neilson	a60f4621ae	[AMDGPUPromoteAlloca] Replace deprecated memory intrinsic APIs (NFCI) Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the AMDGPUPromoteAlloca pass to cease using: 1) The old getAlignment() API of MemoryIntrinsic in favour of getting source & dest specific alignments through the new API. 2) The old IRBuilder createMemCpy/createMemMove single-alignment APIs in favour of the new API that allows setting source and destination alignments independently. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. ( rL323886, r323891, rL324148, rL324273, rL324278, rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html llvm-svn: 324774	2018-02-09 21:56:15 +00:00
Daniel Neilson	4a58b4b52c	[AArch64FastISel] Replace deprecated calls to MemoryIntrinsic::getAlignment() (NFCI) Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes AArch64FastISel to cease using the old getAlignment() API of MemoryIntrinsic in favour of getting source & dest specific alignments through the new API. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. ( rL323886, r323891, rL324148, rL324273, rL324278, rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html llvm-svn: 324773	2018-02-09 21:49:29 +00:00
Francis Visoiu Mistrih	e67ed4c039	[X86][MC] Fix assembling rip-relative addressing + immediate displacements In the rare case where the input contains rip-relative addressing with immediate displacements, and the instruction ends with an immediate, we encode the instruction in the wrong way: movl $12345678, 0x400(%rdi) // all good, no rip-relative addr movl %eax, 0x400(%rip) // all good, no immediate at the end of the instruction movl $12345678, 0x400(%rip) // fails, encodes address as 0x3fc(%rip) Offset is a label: movl $12345678, foo(%rip) we want to account for the size of the immediate (in this case, $12345678, 4 bytes). Offset is an immediate: movl $12345678, 0x400(%rip) we should not account for the size of the immediate, assuming the immediate offset is what the user wanted. Differential Revision: https://reviews.llvm.org/D43050 llvm-svn: 324772	2018-02-09 21:47:07 +00:00
Evandro Menezes	205c0e085e	[AArch64] Adjust the cost model for Exynos M3 Fix the modeling of transfers between a generic register and a partial ASIMD one. llvm-svn: 324766	2018-02-09 19:26:11 +00:00
Krzysztof Parzyszek	9b48e8d233	[Hexagon] Add code to select QTRUE and QFALSE Fixes http://llvm.org/PR36320. llvm-svn: 324763	2018-02-09 19:10:46 +00:00
Matt Arsenault	0063ce7201	AMDGPU: Remove tied operand from si_else llvm-svn: 324751	2018-02-09 17:18:38 +00:00
Matt Arsenault	923712b6b5	Reapply "AMDGPU: Add 32-bit constant address space" This reverts r324494 and reapplies r324487. llvm-svn: 324747	2018-02-09 16:57:57 +00:00
Matt Arsenault	bcf7bec4b8	AMDGPU: Fix layering issue Move utility function that depends on codegen. Fixes build with r324487 reapplied. llvm-svn: 324746	2018-02-09 16:57:48 +00:00
Evandro Menezes	b5f12090fc	[AArch64] Refactor stand alone methods (NFC) Make stand alone methods in AArch64InstrInfo static. llvm-svn: 324745	2018-02-09 16:14:41 +00:00
Krzysztof Parzyszek	7cfe7cbccc	[Hexagon] Express calling conventions via .td file instead of hand-coding Additionally, simplify the rest of the argument/parameter lowering code. llvm-svn: 324737	2018-02-09 15:30:02 +00:00
Jonas Paulsson	7850601fa3	[AArch64] Return true in enableMultipleCopyHints(). Enable multiple COPY hints to eliminate more COPYs during register allocation. Note that this is something all targets should do, see https://reviews.llvm.org/D38128. Review: Martin Storsjö llvm-svn: 324720	2018-02-09 09:22:20 +00:00
Stanislav Mekhanoshin	9c6cd0458b	[AMDGPU] More descriptive names in the memory legalizer NFC. Differential Revision: https://reviews.llvm.org/D43054 llvm-svn: 324712	2018-02-09 06:05:33 +00:00
Craig Topper	ca5841b4e4	[X86] Simplify some code in lowerV4X128VectorShuffle and lowerV2X128VectorShuffle Previously we extracted two subvectors and concatenate. But the concatenate will be lowered to two insert subvectors. Then DAG combine will merge once of the inserts and one of the extracts back into the original vector. We might as well just directly use one extract and one insert. llvm-svn: 324710	2018-02-09 05:54:36 +00:00
Craig Topper	28166a877d	[X86] Teach shuffle lowering to recognize 128/256 bit insertions into a zero vector. This regresses a couple cases in the shuffle combining test. But those cases use intrinsics that InstCombine knows how to turn into a generic shuffle earlier. This should give opportunities to fold this earlier in InstCombine or DAG combine. llvm-svn: 324709	2018-02-09 05:54:34 +00:00
Francis Visoiu Mistrih	fb7b14f70d	[CodeGen] Unify the syntax of MBB liveins in MIR and -debug output Instead of: Live Ins: %r0 %r1 print: liveins: %r0, %r1 llvm-svn: 324694	2018-02-09 01:14:44 +00:00
Francis Visoiu Mistrih	d65438d0ca	[CodeGen] Move printing '\n' from MachineInstr::print to MachineBasicBlock::print MBB.print wasn't printing it, but the MIRPrinter is printing it. The goal is to unify that as much as possible. llvm-svn: 324681	2018-02-08 23:42:27 +00:00
Jacques Pienaar	bd275c7ba7	[Lanai] Code model dictates section selection. Always use the small section when the small code model is specified. llvm-svn: 324679	2018-02-08 23:25:05 +00:00
Matt Arsenault	9c2f3c4852	AMDGPU: Process SDWA block at a time Right now this loops over the entire function every time there is a change, which is not very efficient. There's no practical reason to track this so globally, since the code motion optimization passes should be sinking instructions with single uses and the pass currently will not fold with multiple uses. llvm-svn: 324667	2018-02-08 22:46:41 +00:00
Matt Arsenault	c24d5e2819	AMDGPU: Minor cleanups Column limit, typo, unnecessary reference llvm-svn: 324666	2018-02-08 22:46:38 +00:00
Alexander Ivchenko	da9e81c462	[GlobalISel][X86] Fixing failures after https://reviews.llvm.org/D37775 The patch essentially makes sure that X86CallLowering adds proper G_COPY/G_TRUNC and G_ANYEXT/G_COPY when we are doing lowering of arguments/returns for floating point values passed on registers. Tests are updated accordingly Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D42287 llvm-svn: 324665	2018-02-08 22:41:47 +00:00
Alexander Ivchenko	a85c4fc029	[GlobalIsel][X86] Making {G_IMPLICIT_DEF, s128} legal The patch is a split from D42287 and is related to fixing failures after https://reviews.llvm.org/D37775 Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D42287 llvm-svn: 324664	2018-02-08 22:40:31 +00:00
Craig Topper	9e030c9e00	[X86] Improve combineCastedMaskArithmetic to fold (bitcast (vXi1 (and/or/xor X, C)))->(vXi1 (and/or/xor (bitcast X), (bitcast C)) where C is a constant build_vector. Most vxi1 constant build vectors have to be implemented in the scalar domain anyway so we'll probably end up with a cast there later. But by then its too late to do the combine to get rid of it. llvm-svn: 324662	2018-02-08 22:26:39 +00:00
Craig Topper	1b5b4ccb77	[X86] Add DAG combine to constant fold a bitcast of a vXi1 constant build_vector into a scalar integer. llvm-svn: 324661	2018-02-08 22:26:36 +00:00
Craig Topper	dccf72b583	[X86] Remove kortest intrinsics and replace with native IR. llvm-svn: 324646	2018-02-08 20:16:06 +00:00
David Woodhouse	76eb26aa92	[X86] Support 'V' register operand modifier This allows the register name to be printed without the leading '%'. This can be used for emitting calls to the retpoline thunks from inline asm. llvm-svn: 324645	2018-02-08 20:06:05 +00:00
Oliver Stannard	133b6085e8	[ARM] Re-commit r324600 with fixed LLVMBuild.txt ARMDisassembler now depends on the banked register tables in ARMUtils, so the LLVMBuild.txt needed updating to reflect this. Original commit mesage: [ARM] Fix disassembly of invalid banked register moves When disassembling banked register move instructions, we don't have an assembly syntax for the unallocated register numbers, so we have to return Fail rather than SoftFail. Previously we were returning SoftFail, then crashing in the InstPrinter as we have no way to represent these encodings in an assembly string. This also switches the decoder to use the table-generated list of banked registers, removing the duplicated list of encodings. Differential revision: https://reviews.llvm.org/D43066 llvm-svn: 324606	2018-02-08 14:31:22 +00:00
Oliver Stannard	3c11ecbbab	Revert r324600 as it breaks a buildbot The broken bot (clang-ppc64le-linux-multistage) is doign a shared-object build, so I guess using lookupBankedRegByEncoding in the disassembler is a layering violation? llvm-svn: 324604	2018-02-08 14:21:28 +00:00
Oliver Stannard	db982b25ff	[ARM] Fix disassembly of invalid banked register moves When disassembling banked register move instructions, we don't have an assembly syntax for the unallocated register numbers, so we have to return Fail rather than SoftFail. Previously we were returning SoftFail, then crashing in the InstPrinter as we have no way to represent these encodings in an assembly string. This also switches the decoder to use the table-generated list of banked registers, removing the duplicated list of encodings. Differential revision: https://reviews.llvm.org/D43066 llvm-svn: 324600	2018-02-08 13:06:08 +00:00
Clement Courbet	1b8c08b633	[X86] Fix compilation of r324580. @ctopper Can you check that the fix is correct ? llvm-svn: 324586	2018-02-08 09:41:50 +00:00
Stefan Maksimovic	8989940557	Revert accidental changes that snuck in r324584 llvm-svn: 324585	2018-02-08 09:31:48 +00:00
Stefan Maksimovic	b3e7ed3b94	[mips] Define certain instructions in microMIPS32r3 Instructions affected: mthc1, mfhc1, add.d, sub.d, mul.d, div.d, mov.d, neg.d, cvt.w.d, cvt.d.s, cvt.d.w, cvt.s.d These instructions are now defined for microMIPS32r3 + microMIPS32r6 in MicroMipsInstrFPU.td since they shared their encoding with those already defined in microMIPS32r6InstrInfo.td and have been therefore removed from the latter file. Some instructions present in MicroMipsInstrFPU.td which did not have both AFGR64 and FGR64 variants defined have been altered to do so. Differential revision: https://reviews.llvm.org/D42738 llvm-svn: 324584	2018-02-08 09:25:17 +00:00
Sjoerd Meijer	5ea465ded7	[AArch64] Don't materialize 0 with "fmov h0, .." when FullFP16 is not supported We were generating "fmov h0, wzr" instructions when FullFP16 is not enabled. I've not added any tests, because the problem was visible in: test/CodeGen/AArch64/arm64-zero-cycle-zeroing.ll, which I had to change: I don't think Cyclone has FullFP16 enabled by default, so it shouldn't be using this v8.2a instruction. I've also removed these rdar tags, please shout if there are any objections. Differential Revision: https://reviews.llvm.org/D43020 llvm-svn: 324581	2018-02-08 08:39:05 +00:00
Craig Topper	8d0c8c9be1	[X86] Support folding in a k-register OR when creating KORTEST from scalar compare of a bitcast from vXi1. This should allow us to remove the kortest intrinsic from IR and use compare+bitcast+or in IR instead. llvm-svn: 324580	2018-02-08 08:29:43 +00:00
Craig Topper	93505707b6	[X86] Allow KORTEST instruction to be used for testing if a mask is all ones The KTEST instruction sets the C flag if the result of anding both operands together is all 1s. We can use this to lower (icmp eq/ne (bitcast (vXi1 X), -1) Differential Revision: https://reviews.llvm.org/D42772 llvm-svn: 324577	2018-02-08 07:54:16 +00:00
Craig Topper	f5465f98d2	[X86] Don't emit KTEST instructions unless only the Z flag is being used Summary: KTEST has weird flag behavior. The Z flag is set for all bits in the AND of the k-registers being 0, and the C flag is set for all bits being 1. All other flags are cleared. We currently emit this instruction in EmitTEST and don't check the condition code. This can lead to strange things like using the S flag after a KTEST for a signed compare. The domain reassignment pass can also transform TEST instructions into KTEST and is not protected against the flag usage either. For now I've disabled this part of the domain reassignment pass. I tried to comment out the checks in the mir test so that we could recover them later, but I couldn't figure out how to get that to work. This patch moves the KTEST handling into LowerSETCC and now creates a ktest+x86setcc. I've chosen this approach because I'd like to add support for the C flag for all ones in a followup patch. To do that requires that I can rewrite the condition code going in the x86setcc to be different than the original SETCC condition code. This fixes PR36182. I'll file a PR to fix domain reassignment once this goes in. Should this be merged to 6.0? Reviewers: spatel, guyblank, RKSimon, zvi Reviewed By: guyblank Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42770 llvm-svn: 324576	2018-02-08 07:45:55 +00:00
Peter Collingbourne	559ff1fe03	ARM: Remove dead code. NFCI. llvm-svn: 324565	2018-02-08 05:28:39 +00:00
Yonghong Song	f2075aef68	bpf: Improve expanding logic in LowerSELECT_CC LowerSELECT_CC is not generating optimal Select_Ri pattern at the moment. It is not guaranteed to place ConstantNode at RHS which would miss matching Select_Ri. A new testcase added into the existing select_ri.ll, also there is an existing case in cmp.ll which would be improved to use Select_Ri after this patch, it is adjusted accordingly. Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Reviewed-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 324560	2018-02-08 04:37:49 +00:00
Matt Arsenault	b02cebf552	AMDGPU: Fix incorrect reordering when inline asm defines LDS address Defs of operands outside of the instruction's explicit defs need to be checked. llvm-svn: 324554	2018-02-08 01:56:14 +00:00
Matt Arsenault	c908e3f77a	AMDGPU: Don't crash when trying to fold implicit operands llvm-svn: 324550	2018-02-08 01:12:46 +00:00
Justin Lebar	321b443ef6	[NVPTX] When dying due to a bad address space value, print out the value. llvm-svn: 324549	2018-02-08 00:50:04 +00:00
Stanislav Mekhanoshin	db39b4b0b4	[AMDGPU] Fixed wait count reuse The code reusing existing wait counts is incorrect since it keeps adding new operands to an old instruction instead of replacing the immediate. It was also effectively switched off by the condition that wait count is not an AMDGPU::S_WAITCNT. Also switched to BuildMI instead of creating instructions directly. Differential Revision: https://reviews.llvm.org/D42997 llvm-svn: 324547	2018-02-08 00:18:35 +00:00
Chandler Carruth	0be0cfa65b	[x86] Fix nasty bug in the x86 backend that is essentially impossible to hit from IR but creates a minefield for MI passes. The x86 backend has fairly powerful logic to try and fold loads that feed register operands to instructions into a memory operand on the instruction. This is almost always a good thing, but there are specific relocated loads that are only allowed to appear in specific instructions. Notably, R_X86_64_GOTTPOFF is only allowed in `movq` and `addq`. This patch blocks folding of memory operands using this relocation unless the target is in fact `addq`. The particular relocation indicates why we simply don't hit this under normal circumstances. This relocation is only used for TLS, and it gets used in very specific ways in conjunction with %fs-relative addressing. The result is that loads using this relocation are essentially never eligible for folding into an instruction's memory operands. Unless, of course, you have an MI pass that inserts usage of such a load. I have exactly such an MI pass and was greeted by truly mysterious miscompiles where the linker replaced my instruction with a completely garbage byte sequence. Go team. This is the only such relocation I'm aware of in x86, but there may be others that need to be similarly restricted. Fixes PR36165. Differential Revision: https://reviews.llvm.org/D42732 llvm-svn: 324546	2018-02-07 23:59:14 +00:00
Craig Topper	37765ff326	[X86] Prune some unreachable 'return SDValue()' paths from LowerSIGN_EXTEND/LowerZERO_EXTEND/LowerANY_EXTEND. We were doing a lot of whitelisting of what we handle in these routines, but setOperationAction constrains what we can get here. So just add some asserts and prune the unreachable paths. llvm-svn: 324538	2018-02-07 22:45:38 +00:00
Craig Topper	1db5ebc016	[X86] Remove dead code from EmitTest that looked for an i1 type which should have already been type legalized away. NFC llvm-svn: 324536	2018-02-07 22:19:26 +00:00
Craig Topper	8baa9c77e3	[X86] When doing callee save/restore for k-registers make sure we don't use KMOVQ on non-BWI targets If we are saving/restoring k-registers, the default behavior of getMinimalRegisterClass will find the VK64 class with a spill size of 64 bits. This will cause the KMOVQ opcode to be used for save/restore. If we don't have have BWI instructions we need to constrain the class returned to give us VK16 with a 16-bit spill size. We can do this by passing the either v16i1 or v64i1 into getMinimalRegisterClass. Also add asserts to make sure BWI is enabled anytime we use KMOVD/KMOVQ. These are what caught this bug. Fixes PR36256 Differential Revision: https://reviews.llvm.org/D42989 llvm-svn: 324533	2018-02-07 21:41:50 +00:00
Rafael Espindola	f4e3f3e31c	Revert "AMDGPU: Add 32-bit constant address space" This reverts commit r324487. It broke clang tests. llvm-svn: 324494	2018-02-07 18:09:35 +00:00
Marek Olsak	871c30e540	AMDGPU: Add 32-bit constant address space Note: This is a candidate for LLVM 6.0, because it was planned to be in that release but was delayed due to a long review period. Merge conflict in release_60 - resolution: Add "-p6:32:32" into the second (non-amdgiz) string. Only scalar loads support 32-bit pointers. An address in a VGPR will fail to compile. That's OK because the results of loads will only be used in places where VGPRs are forbidden. Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC. The tests cover all uses cases we need for Mesa. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D41651 llvm-svn: 324487	2018-02-07 16:01:00 +00:00
Marek Olsak	b2cc77985b	AMDGPU: Remove the s_buffer workaround for GFX9 chips Summary: I checked the AMD closed source compiler and the workaround is only needed when x3 is emulated as x4, which we don't do in LLVM. SMEM x3 opcodes don't exist, and instead there is a possibility to use x4 with the last component being unused. If the last component is out of buffer bounds and falls on the next 4K page, the hw hangs. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D42756 llvm-svn: 324486	2018-02-07 16:00:40 +00:00
Simon Pilgrim	b4e789e8f6	[X86][AVX] Add PACKSSDW/PACKUSDW support for truncation of clamped values SSE and shorter vector sizes will have to wait until we can add support for general SMIN/SMAX matching. llvm-svn: 324485	2018-02-07 15:48:44 +00:00
Simon Atanasyan	70498f81de	[mips] Support 'y' operand code to print exact log2 of the operand llvm-svn: 324477	2018-02-07 12:36:39 +00:00
Simon Atanasyan	737bec38d0	[mips] Handle 'M' and 'L' operand codes for memory operands Both operand codes now work the same way in case of register or memory operands. It print high-order or low-order word in a double-word register or memory location. llvm-svn: 324476	2018-02-07 12:36:33 +00:00
Sjoerd Meijer	8c0739347c	[ARM] FP16 mov imm pattern This is a follow up of r324321, adding a match pattern for mov with a FP16 immediate (also fixing operand vfp_f16imm that wasn't even compiling). Differential Revision: https://reviews.llvm.org/D42973 llvm-svn: 324456	2018-02-07 08:37:17 +00:00
Chandler Carruth	282ae1632a	[x86/retpoline] Make the external thunk names exactly match the names that happened to end up in GCC. This is really unfortunate, as the names don't have much rhyme or reason to them. Originally in the discussions it seemed fine to rely on aliases to map different names to whatever external thunk code developers wished to use but there are practical problems with that in the kernel it turns out. And since we're discovering this practical problems late and since GCC has already shipped a release with one set of names, we are forced, yet again, to blindly match what is there. Somewhat rushing this patch out for the Linux kernel folks to test and so we can get it patched into our releases. Differential Revision: https://reviews.llvm.org/D42998 llvm-svn: 324449	2018-02-07 06:16:24 +00:00
Tom Stellard	33445765dd	AMDGPU/GlobalISel: Mark 32-bit G_FPTOUI as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D42152 llvm-svn: 324446	2018-02-07 04:47:59 +00:00
Mark Searles	24c92eeb83	[AMDGPU] Suppress redundant waitcnt instrs. 1. Run the memory legalizer prior to the waitcnt pass; keep the policy that the waitcnt pass does not remove any waitcnts within the incoming IR. 2. The waitcnt pass doesn't (yet) track waitcnts that exist prior to the waitcnt pass (it just skips over them); because the waitcnt pass is ignorant of them, it may insert a redundant waitcnt. To avoid this, check the prev instr. If it and the to-be-inserted waitcnt are the same, suppress the insertion. We keep the existing waitcnt under the assumption that whomever, e.g., the memory legalizer, inserted it knows what they were doing. 3. Follow-on work: teach the waitcnt pass to record the pre-existing waitcnts for better waitcnt production. Differential Revision: https://reviews.llvm.org/D42854 llvm-svn: 324440	2018-02-07 02:21:21 +00:00
Matt Arsenault	a18b3bcf51	AMDGPU: Select BFI patterns with 64-bit ints llvm-svn: 324431	2018-02-07 00:21:34 +00:00
Craig Topper	58ecffd857	[DAGCombiner][AMDGPU][X86] Turn cttz/ctlz into cttz_zero_undef/ctlz_zero_undef if we can prove the input is never zero X86 currently has a late DAG combine after cttz/ctlz are turned into BSR+BSF+CMOV to detect this and remove the CMOV. But we should be able to do this much earlier and avoid creating the cmov all together. For the changed AMDGPU test case it appears that previously the i8 cttz was type legalized to i16 which introduced an OR with 256 in order to limit the result to 8 on the widened type. At this point the result is known to never be zero, but nothing checked that. Then operation legalization is told to promote all i16 cttz to i32. This introduces an extend and a truncate and another OR with 65536 to limit the result to 16. With the DAG combiner change we are able to prevent the creation of the second OR since the opcode will have been changed to cttz_zero_undef after the first OR. I the lack of the OR caused the instruction to change to v_ffbl_b32_sdwa Differential Revision: https://reviews.llvm.org/D42985 llvm-svn: 324427	2018-02-06 23:54:37 +00:00
Eli Friedman	cd07a3e2f9	Place undefined globals in .bss instead of .data Following up on the discussion from http://lists.llvm.org/pipermail/llvm-dev/2017-April/112305.html, undef values are now placed in the .bss as well as null values. This prevents undef global values taking up potentially huge amounts of space in the .data section. The following two lines now both generate equivalent .bss data: @vals1 = internal unnamed_addr global [20000000 x i32] zeroinitializer, align 4 @vals2 = internal unnamed_addr global [20000000 x i32] undef, align 4 ; previously unaccounted for This is primarily motivated by the corresponding issue in the Rust compiler (https://github.com/rust-lang/rust/issues/41315). Differential Revision: https://reviews.llvm.org/D41705 Patch by varkor! llvm-svn: 324424	2018-02-06 23:22:14 +00:00
Evandro Menezes	cb7959fd78	[AArch64] Adjust the cost model for Exynos M3 Fix the modeling of long division and SIMD conversion from integer and horizontal minimum and maximum. llvm-svn: 324417	2018-02-06 22:35:47 +00:00
Krzysztof Parzyszek	8abaf8954a	[Hexagon] Extract HVX lowering and selection into HVX-specific files, NFC llvm-svn: 324392	2018-02-06 20:22:20 +00:00
Krzysztof Parzyszek	97a5095db6	[Hexagon] Lower concat of more than 2 vectors into build_vector llvm-svn: 324391	2018-02-06 20:18:58 +00:00
Stanislav Mekhanoshin	ce2d428a98	[AMDGPU] removed dead code handling rmw in memory legalizer It was always using cmpxchg path and in rmw and cmpxchg instructions are not distinguishable in the BE. Differential Revision: https://reviews.llvm.org/D42976 llvm-svn: 324383	2018-02-06 19:11:56 +00:00
Krzysztof Parzyszek	be253e797b	[Hexagon] Don't form new-value jumps from floating-point instructions Additionally, verify that the register defined by the producer is a 32-bit register. llvm-svn: 324381	2018-02-06 19:08:41 +00:00
Sjoerd Meijer	d2718ba95e	[ARM] f16 conversions This is a follow up of r324321, adding f16 <-> f32 and f16 <-> f64 conversion match patterns. Differential Revision: https://reviews.llvm.org/D42954 llvm-svn: 324360	2018-02-06 16:28:43 +00:00
Nirav Dave	27721e8617	[DAG, X86] Improve Dependency analysis when doing multi-node Instruction Selection Cleanup cycle/validity checks in ISel (IsLegalToFold, HandleMergeInputChains) and X86 (isFusableLoadOpStore). Now do a full search for cycles / dependencies pruning the search when topological property of NodeId allows. As part of this propogate the NodeId-based cutoffs to narrow hasPreprocessorHelper searches. Reviewers: craig.topper, bogner Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D41293 llvm-svn: 324359	2018-02-06 16:14:29 +00:00
Marek Olsak	7d92b7e23a	AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU Author: Bas Nieuwenhuizen https://reviews.llvm.org/D42881 llvm-svn: 324353	2018-02-06 15:17:55 +00:00
Krzysztof Parzyszek	1d52a850b3	[Hexagon] Remove leftover assert llvm-svn: 324352	2018-02-06 15:15:13 +00:00
Krzysztof Parzyszek	88f11003a0	[Hexagon] Split HVX operations on vector pairs Vector pairs are legal types, but not every operation can work on pairs. For those operations that are legal for single vectors, generate a concat of their results on pair halves. llvm-svn: 324350	2018-02-06 14:24:57 +00:00
Krzysztof Parzyszek	7b52cf1d7f	[Hexagon] Add helper functions to identify single/pair vector types, NFC llvm-svn: 324349	2018-02-06 14:21:31 +00:00
Krzysztof Parzyszek	69f1d7e370	[Hexagon] Handle lowering of SETCC via setCondCodeAction It was expanded directly into instructions earlier. That was to avoid loads from a constant pool for a vector negation: "xor x, splat(i1 -1)". Implement ISD opcodes QTRUE and QFALSE to denote logical vectors of all true and all false values, and handle setcc with negations through selection patterns. llvm-svn: 324348	2018-02-06 14:16:52 +00:00
Simon Pilgrim	ae00a71f55	[X86][SSE] Add PACKUS support for truncation of clamped values Followup to D42544 that matches PACKUSWB cases for non-AVX512, SSE and PACKUSDW cases will have to wait until we can add support for general SMIN/SMAX matching. llvm-svn: 324347	2018-02-06 14:07:46 +00:00
Tim Renouf	807ecc3d66	[AMDGPU] do not generate .AMDGPU.config for amdpal os type Summary: Now we generate PAL metadata for the amdpal os type, there is no need to generate the .AMDGPU.config section. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37760 Change-Id: I303c5fad66656ce97293da60621afac6595b4c18 llvm-svn: 324346	2018-02-06 13:39:38 +00:00
Sander de Smalen	81fcf865be	[AArch64][SVE] Asm: Add AND_ZI instructions and aliases Summary: Adds support for the SVE AND instruction with vector and logical-immediate operands, and their corresponding aliases. Reviewers: fhahn, rengolin, samparker, echristo, aadg, kristof.beyls Reviewed By: fhahn Subscribers: aemerson, javed.absar, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D42295 llvm-svn: 324343	2018-02-06 13:13:21 +00:00
Simon Pilgrim	90a237bf83	[X86][SSE] Add PACKSS support for truncation of clamped values Followup to D42544 that matches PACKSSWB cases for non-AVX512, SSE and PACKSSDW cases will have to wait until we can add support for general SMIN/SMAX matching. llvm-svn: 324339	2018-02-06 12:16:10 +00:00
Hiroshi Inoue	ad48d2fe61	[PowerPC] fix up in rL324229, NFC This patch fixes up my previous commit (add initialization of local variables). llvm-svn: 324336	2018-02-06 11:34:16 +00:00

... 11 12 13 14 15 ...

47199 Commits