Commit Graph

28659 Commits

Simon Pilgrim b237b54c2d [X86][AVX] Add X86ISD::VPERMV demandedelts test
llvm-svn: 358173
2019-04-11 14:26:32 +00:00
Sanjay Patel c0f4a35e68 [DAGCombiner][x86] scalarize inserted vector FP ops
// bo (build_vec ...undef, x, undef...), (build_vec ...undef, y, undef...) -->
// build_vec ...undef, (bo x, y), undef...

The lifetime of the nodes in these examples is different for variables versus constants,
but they are all build vectors briefly, so I'm proposing to catch them in this form to
handle all of the leading examples in the motivating test file.

Before we have build vectors, we might have insert_vector_element. After that, we might
have scalar_to_vector and constant pool loads.

It's going to take more work to ensure that FP vector operands are getting simplified
with undef elements, so this transform can apply more widely. In a non-loose FP environment,
we are likely simplifying FP elements to NaN values rather than undefs.

We also need to allow more opcodes down this path. Eg, we don't handle FP min/max flavors
yet.

Differential Revision: https://reviews.llvm.org/D60514

llvm-svn: 358172
2019-04-11 14:21:57 +00:00
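
As a rough illustration of the build_vector pattern described in the commit above, here is a hedged C sketch using Clang/GCC vector extensions; the function and lane choice are invented, not taken from the motivating test file.

  typedef float v4f __attribute__((vector_size(16)));

  float add_lane0(float x, float y) {
    v4f a = {x, 0.0f, 0.0f, 0.0f};  /* build_vector; in the DAG the unused lanes are often undef */
    v4f b = {y, 0.0f, 0.0f, 0.0f};
    v4f r = a + b;                  /* vector fadd whose only interesting lane is lane 0 */
    return r[0];                    /* candidate to collapse to a single scalar fadd */
  }
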
Diogo N. Sampaio 8ddfd46c61 [AArch64] Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64
Summary:  Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64

Reviewers: pbarrio, DavidSpickett, LukeGeeson

Reviewed By: LukeGeeson

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60259

llvm-svn: 358171
2019-04-11 14:19:43 +00:00
Simon Pilgrim 6f3866c6fb [X86] SimplifyDemandedVectorElts - add X86ISD::VPERMILPV mask support
llvm-svn: 358170
2019-04-11 14:15:01 +00:00
Simon Pilgrim 886e32e0f2 [X86][AVX] Add X86ISD::VPERMILPV demandedelts tests
llvm-svn: 358168
2019-04-11 14:09:35 +00:00
Simon Pilgrim cb5218ad48 [X86] SimplifyDemandedVectorElts - add X86ISD::VPERMIL2 mask support
llvm-svn: 358167
2019-04-11 14:04:19 +00:00
Simon Pilgrim 7021dec26e [X86][XOP] Add X86ISD::VPERMIL2 demandedelts test
llvm-svn: 358166
2019-04-11 13:52:43 +00:00
Simon Pilgrim e468cc7f14 [X86] SimplifyDemandedVectorElts - add VPPERM support
We need to add support for all variable shuffle mask ops, but VPPERM is the only one that already has test coverage.

llvm-svn: 358165
2019-04-11 13:30:38 +00:00
Shiva Chen 7cc03bd064 [RISCV] Put data smaller than eight bytes to small data section
Because gp = sdata_start_address + 0x800, gp with a signed twelve-bit offset
can cover most of the small data section. Linker relaxation can then rewrite
multi-instruction data accesses into a single instruction that uses gp as the
base with a signed twelve-bit offset.

Differential Revision: https://reviews.llvm.org/D57493

llvm-svn: 358150
2019-04-11 04:59:13 +00:00
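
A loose C illustration of the access pattern the commit above targets; the variable name and the relaxed instruction mentioned in the comment are assumptions, not taken from the patch.

  int small_counter;             /* 4 bytes: eligible for the small data section */

  int read_counter(void) {
    /* With the object placed in .sdata, linker relaxation can turn the usual
       two-instruction address materialization + load into a single
       gp-relative load with a signed twelve-bit offset. */
    return small_counter;
  }
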
Amara Emerson 213e0bde04 [AArch64][GlobalISel] Make <2 x p0> = G_BUILD_VECTOR legal.
The existing isel support already works for p0 once the legalizer accepts it.

llvm-svn: 358144
2019-04-10 23:06:14 +00:00
Amara Emerson a7ff111b04 [AArch64][GlobalISel] Add legalizer support for <8 x s16> and <16 x s8> G_ADD.
llvm-svn: 358143
2019-04-10 23:06:11 +00:00
Amara Emerson ae878dab03 [AArch64][GlobalISel] Scalarize vector SDIV.
llvm-svn: 358142
2019-04-10 23:06:08 +00:00
Craig Topper 10048060f6 [X86] Add SSE1 command line to atomic-fp.ll and atomic-non-integer.ll. NFC
llvm-svn: 358141
2019-04-10 22:35:32 +00:00
Craig Topper a3ee7e2b3e [X86] Autogenerate complete checks. NFC
llvm-svn: 358140
2019-04-10 22:35:24 +00:00
Craig Topper 61f31cbcb2 [X86] Teach foldMaskedShiftToScaledMask to look through an any_extend from i32 to i64 between the and & shl
foldMaskedShiftToScaledMask tries to reorder and & shl to enable the shl to fold into an LEA. But if there is an any_extend between them it doesn't work.

This patch modifies the code to look through any_extend from i32 to i64 when the and mask only uses bits that weren't from the extended part.

This will prevent a regression from D60358 caused by 64-bit SHL being narrowed to 32-bits when their upper bits aren't demanded.

Differential Revision: https://reviews.llvm.org/D60532

llvm-svn: 358139
2019-04-10 21:42:08 +00:00
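
For illustration of the (add X, (and (aext (shl Y, C1)), C2)) shape described in the commit above, a hedged C sketch (not the test added with the patch):

  #include <stdint.h>

  char *index_into(char *base, uint32_t y) {
    /* 32-bit shift, widened to 64 bits, then masked; once the mask shows the
       high bits are not demanded, the zero-extend can be relaxed to an
       any_extend, and the shift can fold into the LEA scale. */
    return base + (((uint64_t)(y << 2)) & 0xFF0);
  }
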
David Green deb3342018 [ARM] Add an extra test for constant hoist. NFC
llvm-svn: 358128
2019-04-10 19:18:58 +00:00
Craig Topper cacb70c94b [X86] Add test case for LEA formation regression seen with D60358. NFC
If we have an (add X, (and (aext (shl Y, C1)), C2)), we can pull the shift through the and+aext
and fold it into the scale of an LEA, assuming C1 is small enough and C2 masks off all of the extended bits.

This pattern showed up in D60358. And we need to handle it to prevent a regression.

llvm-svn: 358124
2019-04-10 19:09:06 +00:00
David Green 4e3fd7757a [ARM] Add an extra constant hoisting test. NFC
This adds a simple extra test for constant hoisting to show its
usefulness with constant addresses like those seen in memory-mapped
registers in embedded systems.

llvm-svn: 358114
2019-04-10 18:05:57 +00:00
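
A small C sketch of the memory-mapped-register style code the test above is aimed at; the peripheral name, addresses, and values are invented for illustration.

  #include <stdint.h>

  #define PERIPH_BASE 0x40001000u

  void periph_init(void) {
    /* Several large, nearby constant addresses: constant hoisting can keep one
       base constant in a register and form the others as base + small offset. */
    *(volatile uint32_t *)(PERIPH_BASE + 0x00) = 0x1u;
    *(volatile uint32_t *)(PERIPH_BASE + 0x04) = 0x80u;
    *(volatile uint32_t *)(PERIPH_BASE + 0x0c) = 0x3u;
  }
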
David Green 0861c87b06 Revert rL357745: [SelectionDAG] Compute known bits of CopyFromReg
Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not
seeing through to the constant in other blocks. Revert this patch while we come
up with a better way to handle that.

I will try to follow this up with some better tests.

llvm-svn: 358113
2019-04-10 18:00:41 +00:00
Matt Arsenault 7187272b2b GlobalISel: Support legalizing G_CONSTANT with irregular breakdown
llvm-svn: 358109
2019-04-10 17:27:53 +00:00
Craig Topper 35fe07916a [AArch64] Teach getTestBitOperand to look through ANY_EXTENDS
This patch teach getTestBitOperand to look through ANY_EXTENDs when the extended bits aren't used. The test case changed here is based what D60358 did to test16 in tbz-tbnz.ll. So this patch will avoid that regression.

Differential Revision: https://reviews.llvm.org/D60482

llvm-svn: 358108
2019-04-10 17:27:29 +00:00
Matt Arsenault 9e0eeba569 GlobalISel: Handle odd breakdowns for bit ops
llvm-svn: 358105
2019-04-10 17:07:56 +00:00
Simon Pilgrim 37d8d55823 [X86][AVX] getTargetConstantBitsFromNode - extract bits from X86ISD::SUBV_BROADCAST
llvm-svn: 358096
2019-04-10 16:24:47 +00:00
Diogo N. Sampaio aae424a2d2 [AArch64] Add lowering pattern for scalar fp16 facge and facgt
Summary: The fp16 scalar versions of facge and facgt require custom pattern matching, as the result type is not the same width as the operands.

Reviewers: olista01, javed.absar, pbarrio

Reviewed By: javed.absar

Subscribers: kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60212

llvm-svn: 358083
2019-04-10 13:34:18 +00:00
Diogo N. Sampaio 651463e4a8 [ARM] [FIX] Add missing f16 vector operations lowering
Summary:
Add the missing <8 x half> shufflevector pattern for the concat_vectors DAG node.
Also allow <8 x half> and <4 x half> vldup1 operations.

These instructions are required for v8.2a fp16 lowering of vmul_n_f16, vmulq_n_f16 and vmulq_lane_f16 intrinsics.

Reviewers: olista01, pbarrio, LukeGeeson, efriedma

Reviewed By: efriedma

Subscribers: efriedma, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60319

llvm-svn: 358081
2019-04-10 13:28:06 +00:00
Diana Picus b6e83b98f9 [ARM GlobalISel] Select G_FCONSTANT for VFP3
Make it possible to TableGen code for FCONSTS and FCONSTD.

We need to make two changes to the TableGen descriptions of vfp_f32imm
and vfp_f64imm respectively:
* add GISelPredicateCode to check that the immediate fits in 8 bits;
* extract the SDNodeXForms into separate definitions and create a
GISDNodeXFormEquiv and a custom renderer function for each of them.

There's a lot of boilerplate to get the actual value of the immediate,
but it basically just boils down to calling ARM_AM::getFP32Imm or
ARM_AM::getFP64Imm.

llvm-svn: 358063
2019-04-10 09:14:32 +00:00
Diana Picus 3533ad6801 [ARM GlobalISel] Select G_FCONSTANT into pools
Put all floating point constants into constant pools and load their
values from there.

llvm-svn: 358062
2019-04-10 09:14:24 +00:00
Diana Picus 165846b031 [ARM GlobalISel] Map G_FCONSTANT
llvm-svn: 358061
2019-04-10 09:14:16 +00:00
Jim Lin a49c95e02a [Sparc] Fix incorrect MI insertion position for spilling f128.
Summary:
Obviously, the newly built MIs (sethi+add or sethi+xor+add) for constructing the large offset
should be inserted before the newly created MI that stores the even register into memory.
So the insertion position should be *StMI instead of II.

before the fix:

std %f0, [%g1+80]
sethi 4, %g1        <<<
add %g1, %sp, %g1   <<< these two instructions should be placed before "std %f0, [%g1+80]".
sethi 4, %g1
add %g1, %sp, %g1
std %f2, [%g1+88]

after the fix:

sethi 4, %g1
add %g1, %sp, %g1
std %f0, [%g1+80]
sethi 4, %g1
add %g1, %sp, %g1
std %f2, [%g1+88]

Reviewers: venkatra, jyknight

Reviewed By: jyknight

Subscribers: jyknight, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60397

llvm-svn: 358042
2019-04-10 01:56:32 +00:00
Amara Emerson 9bf092d719 [AArch64][GlobalISel] Add isel support for vector G_ICMP and G_ASHR & G_SHL
The selection for G_ICMP is unfortunately not currently importable from SDAG
due to the use of custom SDNodes. To support this, this selection method has an
opcode table which has been generated by a script, indexed by various
instruction properties. Ideally, in the future we will have GISel-native selection
patterns that we can write in TableGen to improve on this.

For selection of some types we also need support for G_ASHR and G_SHL which are
generated as a result of legalization. This patch also adds support for them,
generating the same code as SelectionDAG currently does.

Differential Revision: https://reviews.llvm.org/D60436

llvm-svn: 358035
2019-04-09 21:22:43 +00:00
Amara Emerson 888dd5d198 [AArch64][GlobalISel] Legalize vector G_ICMP.
Selection support will be coming in a later patch.

Differential Revision: https://reviews.llvm.org/D60435

llvm-svn: 358034
2019-04-09 21:22:40 +00:00
Amara Emerson 92d74f19cf [AArch64][GlobalISel] Add legalization for some vector G_SHL and G_ASHR.
This is needed for some future support for vector ICMP.

Differential Revision: https://reviews.llvm.org/D60433

llvm-svn: 358033
2019-04-09 21:22:37 +00:00
Amara Emerson 2b523f8162 [GlobalISel][AArch64] Allow CallLowering to handle types which are normally
required to be passed as different register types. E.g. <2 x i16> may need to
be passed as a larger <2 x i32> type, so formal argument lowering needs to be able
to truncate it back. Likewise, when dealing with returns of these types, they need
to be widened back in the appropriate way.

Differential Revision: https://reviews.llvm.org/D60425

llvm-svn: 358032
2019-04-09 21:22:33 +00:00
Craig Topper ba55a40fd0 [AArch64] Add test case to show missed opportunity to remove a shift before tbnz when the shift has been zero extended from i32 to i64. NFC
This pattern showed up in D60358 and it was suggested I had a test and fix that separately.

llvm-svn: 358030
2019-04-09 19:23:37 +00:00
Craig Topper 61e77b11d1 [DAGCombiner][X86][SystemZ] Canonicalize SSUBO with immediate RHS to SADDO by negating the immediate.
This lines up with what we do for regular subtract and it matches up better with X86 assumptions in isel patterns that add with immediate is more canonical than sub with immediate.

Differential Revision: https://reviews.llvm.org/D60020

llvm-svn: 358027
2019-04-09 18:33:56 +00:00
Simon Pilgrim d7cc0ec581 [TargetLowering] SimplifyDemandedBits - add ISD::INSERT_SUBVECTOR support
llvm-svn: 358019
2019-04-09 16:52:21 +00:00
Stanislav Mekhanoshin 913ba8eeb4 Revert LIS handling in MachineDCE
One of the out-of-tree targets has regressed with this patch. Revert
it for now and let liveness be fully reconstructed, in case the pass
is used after LIS has been created, to resolve the regression.

Differential Revision: https://reviews.llvm.org/D60466

llvm-svn: 358015
2019-04-09 16:13:53 +00:00
Simon Pilgrim 345eacd555 [TargetLowering] SimplifyDemandedBits - call SimplifyDemandedBits in bitcast handling
When bitcasting from a source op to a larger bitwidth op, split the demanded bits and OR them on top of one another and demand those merged bits in the SimplifyDemandedBits call on the source op.

llvm-svn: 357992
2019-04-09 10:27:59 +00:00
Tom Stellard 206b9927f8 AMDGPU/GlobalISel: Implement call lowering for shaders returning values
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, llvm-commits

Differential Revision: https://reviews.llvm.org/D57166

llvm-svn: 357964
2019-04-09 02:26:03 +00:00
Chen Zheng 19ce6719bc [PowerPC] initialize SchedModel according to platform.
Differential Revision: https://reviews.llvm.org/D60177

llvm-svn: 357962
2019-04-09 01:25:25 +00:00
Simon Pilgrim 9f74df7d5b [TargetLowering] SimplifyDemandedBits - use DemandedElts in bitcast handling
Be more selective in the SimplifyDemandedBits -> SimplifyDemandedVectorElts bitcast call based on the demanded elts.

llvm-svn: 357942
2019-04-08 20:59:38 +00:00
Simon Pilgrim 86844a865e [X86][AVX] Add PR34380 shuffle test cases
llvm-svn: 357914
2019-04-08 14:05:42 +00:00
Sanjay Patel 50c3b290ed [x86] make 8-bit shl undesirable
I was looking at a potential DAGCombiner fix for 1 of the regressions in D60278, and it caused severe regression test pain because x86 TLI lies about the desirability of 8-bit shift ops.

We've hinted at making all 8-bit ops undesirable for the reason in the code comment:

// TODO: Almost no 8-bit ops are desirable because they have no actual
//       size/speed advantages vs. 32-bit ops, but they do have a major
//       potential disadvantage by causing partial register stalls.

...but that leads to massive diffs and exposes all kinds of optimization holes itself.

Differential Revision: https://reviews.llvm.org/D60286

llvm-svn: 357912
2019-04-08 13:58:50 +00:00
Craig Topper afb6b42691 [X86] Split floating point tests out of atomic-mi.ll into atomic-fp.ll. Add avx and avx512f command lines. NFC
llvm-svn: 357882
2019-04-08 01:54:27 +00:00
Craig Topper 8aeefe3149 [X86] Add avx and avx512f command lines to atomic-non-integer.ll. NFC
llvm-svn: 357881
2019-04-08 01:54:24 +00:00
Craig Topper 424417da79 [X86] Use (SUBREG_TO_REG (MOV32rm)) for extloadi64i8/extloadi64i16 when the load is 4 byte aligned or better and not volatile.
Summary:
Previously we would use MOVZXrm8/MOVZXrm16, but those are longer encodings.

This is similar to what we do in the loadi32 predicate.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60341

llvm-svn: 357875
2019-04-07 19:19:44 +00:00
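
A rough C sketch of an any-extend load of the kind affected by the commit above; the helper and the explicit alignment hint are assumptions, not code from the patch or its tests.

  #include <stdint.h>

  uint64_t low_nibble(const uint8_t *q, uint64_t m) {
    const uint8_t *p = __builtin_assume_aligned(q, 4);  /* alignment condition from the commit */
    uint64_t v = *p;       /* only the low bits are consumed, so this is an any-extend load */
    return (v & 0xF) ^ m;  /* candidate for a plain 32-bit mov instead of a longer movzx */
  }
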
Nikita Popov 3db93ac5d6 Reapply [ValueTracking] Support min/max selects in computeConstantRange()
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This fixes an infinite InstCombine loop, with the
test case taken from D59378.

Relative to the previous iteration, this contains some adjustments for
AMDGPU med3 tests: The AMDGPU target runs InstSimplify prior to codegen,
which ends up constant folding some existing med3 tests after this
change. To preserve these tests a hidden -amdgpu-scalar-ir-passes option
is added, which allows disabling scalar IR passes (that use InstSimplify)
for testing purposes.

Differential Revision: https://reviews.llvm.org/D59506

llvm-svn: 357870
2019-04-07 17:22:16 +00:00
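
A minimal C example (made up, not one of the adjusted tests) of the kind of fold the commit above enables:

  int clamp_check(int x) {
    int m = x < 100 ? x : 100;  /* min(x, 100): computeConstantRange now knows m <= 100 */
    return m > 200;             /* InstSimplify can fold this comparison to false */
  }
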
Simon Pilgrim 07adb6abda [X86][SSE] SimplifyDemandedBitsForTargetNode - Add initial PACKSS support
In the case where we only want the sign bit (e.g. when using PACKSS truncation of comparison results for MOVMSK) then we can just demand the sign bit of the source operands.

This makes use of the fact that PACKSS saturates out of range values to the min/max int values - so the sign bit is always preserved.

Differential Revision: https://reviews.llvm.org/D60333

llvm-svn: 357859
2019-04-07 10:40:01 +00:00
Craig Topper 399102b464 [X86] When converting (x << C1) AND C2 to (x AND (C2>>C1)) << C1 during isel, try using andl over andq by favoring 32-bit unsigned immediates.
llvm-svn: 357848
2019-04-06 19:00:11 +00:00
Craig Topper f9b9f8d2e4 [X86] Use a signed mask in foldMaskedShiftToScaledMask to enable a shorter immediate encoding.
This function reorders AND and SHL to enable the SHL to fold into an LEA. The
upper bits of the AND will be shifted out by the SHL so it doesn't matter what
mask value we use for these bits. By using sign bits from the original mask in
these upper bits we might enable a shorter immediate encoding to be used.

llvm-svn: 357846
2019-04-06 18:00:50 +00:00
Craig Topper 82448bc09e [X86] Add test cases to show missed opportunities to use a sign extended 8 or 32 bit immediate AND when reversing SHL+AND to form an LEA.
When we shift the AND mask over we should shift in sign bits instead of zero bits. The scale in the LEA will shift these bits out so it doesn't matter whether we mask the bits off or not. Using sign bits will potentially allow a sign extended immediate to be used.

Also add some other test cases for cases that are currently optimal.

llvm-svn: 357845
2019-04-06 18:00:45 +00:00
Craig Topper 9d7379c250 [X86] Autogenerate complete checks. NFC
llvm-svn: 357844
2019-04-06 18:00:41 +00:00
Simon Pilgrim ec28615f7f [X86] Add AVX-target expandload and compressstore tests
llvm-svn: 357842
2019-04-06 14:40:52 +00:00
Simon Pilgrim d23611f9ad [X86] Split expandload and compressstore tests
llvm-svn: 357840
2019-04-06 14:14:54 +00:00
Simon Pilgrim 18a8a64c9f [X86][SSE] Add more exhaustive masked load/store tests
Reordered/renamed some existing tests to match the cleaned up order

llvm-svn: 357839
2019-04-06 14:01:37 +00:00
Francis Visoiu Mistrih 9d9d1b6b2b [X86] Enable tail calls for CallingConv::Swift
It's currently only enabled on AArch64 (enabled in r281376).

llvm-svn: 357809
2019-04-05 20:18:25 +00:00
Francis Visoiu Mistrih ab051a378c [X86] Preserve operand flag when expanding TCRETURNri
The expansion of TCRETURNri(64) would not keep operand flags like
undef/renamable/etc. which can result in machine verifier issues.

Also add plumbing to be able to use `-run-pass=x86-pseudo`.

llvm-svn: 357808
2019-04-05 20:18:21 +00:00
Stanislav Mekhanoshin c8f78f8dd3 [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs
The DetectDeadLanes pass can create some dead defs. Then RenameIndependentSubregs
will break a REG_SEQUENCE which may use these dead defs. At this point
a dead instruction can be removed, but we do not run a DCE anymore.

MachineDCE previously only ran before live variable analysis. The patch
adds a means to preserve LiveIntervals and SlotIndexes in case the pass
runs past this point.

Differential Revision: https://reviews.llvm.org/D59626

llvm-svn: 357805
2019-04-05 20:11:32 +00:00
Craig Topper 80aa2290fb [X86] Merge the different Jcc instructions for each condition code into single instructions that store the condition code as an operand.
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between Jcc instructions and condition codes.

Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.

Reviewers: spatel, lebedev.ri, courbet, gchatelet, RKSimon

Reviewed By: RKSimon

Subscribers: MatzeB, qcolombet, eraman, hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60228

llvm-svn: 357802
2019-04-05 19:28:09 +00:00
Craig Topper 7323c2bf85 [X86] Merge the different SETcc instructions for each condition code into single instructions that store the condition code as an operand.
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between SETcc instructions and condition codes.

Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.

Reviewers: andreadb, courbet, RKSimon, spatel, lebedev.ri

Reviewed By: andreadb

Subscribers: hiraditya, lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60138

llvm-svn: 357801
2019-04-05 19:27:49 +00:00
Craig Topper e0bfeb5f24 [X86] Merge the different CMOV instructions for each condition code into single instructions that store the condition code as an immediate.
Summary:
Reorder the condition code enum to match their encodings. Move it to MC layer so it can be used by the scheduler models.

This avoids needing an isel pattern for each condition code. And it removes
translation switches for converting between CMOV instructions and condition
codes.

Now the printer, encoder and disassembler take care of converting the immediate.
We use InstAliases to handle the assembly matching. But we print using the
asm string in the instruction definition. The instruction itself is marked
IsCodeGenOnly=1 to hide it from the assembly parser.

This does complicate the scheduler models a little since we can't assign the
A and BE instructions to a separate class now.

I plan to make similar changes for SETcc and Jcc.

Reviewers: RKSimon, spatel, lebedev.ri, andreadb, courbet

Reviewed By: RKSimon

Subscribers: gchatelet, hiraditya, kristina, lebedev.ri, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60041

llvm-svn: 357800
2019-04-05 19:27:41 +00:00
Clement Courbet 1d8c9dfe03 [ExpandMemCmp][NFC] Add tests for `memcmp(p, q, n) < 0` case.
llvm-svn: 357767
2019-04-05 15:03:25 +00:00
Simon Pilgrim 17586cda4a [SelectionDAG] Add fcmp UNDEF handling to SelectionDAG::FoldSetCC
Second half of PR40800: this patch adds DAG undef handling to fcmp instructions to match the behavior in llvm::ConstantFoldCompareInstruction. This permits constant folding of vector comparisons where some elements have been reduced to UNDEF (by SimplifyDemandedVectorElts etc.).

This involves a lot of tweaking of reduced tests, as bugpoint loves to reduce fcmp arguments to undef.

Differential Revision: https://reviews.llvm.org/D60006

llvm-svn: 357765
2019-04-05 14:56:21 +00:00
Matt Arsenault 4ed6ccab9b AMDGPU/GlobalISel: Fix non-power-of-2 select
llvm-svn: 357762
2019-04-05 14:03:04 +00:00
Sanjay Patel 50a8652785 [DAGCombiner][x86] scalarize splatted vector FP ops
There are a variety of vector patterns that may be profitably reduced to a
scalar op when scalar ops are performed using a subset (typically, the
first lane) of the vector register file.

For x86, this is true for float/double ops and element 0 because
insert/extract is just a sub-register rename.

Other targets should likely enable the hook in a similar way.

Differential Revision: https://reviews.llvm.org/D60150

llvm-svn: 357760
2019-04-05 13:32:17 +00:00
Simon Pilgrim faa5b939f0 [X86][AVX] Add PR34584 masked store test cases
llvm-svn: 357757
2019-04-05 11:34:30 +00:00
Simon Pilgrim 329e63b915 [X86] Add SSE/AVX1/AVX2 masked trunc+store tests
llvm-svn: 357756
2019-04-05 11:22:28 +00:00
Roger Ferrer Ibanez e011e4f89c [RISCV] Implement adding a displacement to a BlockAddress
The recent change rL357393 uses MachineInstrBuilder::addDisp to add a displacement based on a
BlockAddress, but this case was not implemented.

This patch adds the missing case and a test for RISC-V that exercises the new
case.

Differential Revision: https://reviews.llvm.org/D60136

llvm-svn: 357752
2019-04-05 08:40:57 +00:00
Piotr Sobczak 0376ac1d94 [SelectionDAG] Compute known bits of CopyFromReg
Summary:
Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if
the virtual reg used has one def only.

This can be particularly useful when calling isBaseWithConstantOffset()
with the ISD::CopyFromReg argument, as more optimizations may get enabled
in the result.

Also add a missing truncation on X86, found by testing of this patch.

Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa

Reviewers: bogner, craig.topper, RKSimon

Reviewed By: RKSimon

Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59535

llvm-svn: 357745
2019-04-05 07:44:09 +00:00
Craig Topper 94f1772b1e [X86] Promote i16 SRA instructions to i32
We already promote SRL and SHL to i32.

This will introduce sign extends sometimes which might be harder to deal with than the zero we use for promoting SRL. I ran this through some of our internal benchmark lists and didn't see any major regressions.

I think there might be some DAG combine improvement opportunities in the test changes here.

Differential Revision: https://reviews.llvm.org/D60278

llvm-svn: 357743
2019-04-05 06:32:50 +00:00
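
For reference, a tiny C example of an i16 arithmetic shift of the kind affected by the commit above (a sketch, not taken from the changed tests):

  short shift_right3(short x) {
    /* i16 sra: after this change it is performed as a 32-bit shift
       (introducing a sign extend of the input), matching SRL and SHL. */
    return (short)(x >> 3);
  }
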
Serguei Katkov c39636cc2c [FastISel] Fix crash for gc.relocate lowring
Safepoint lowering checks that all gc.relocates observed in the safepoint
are lowered. However, FastISel is able to skip dead gc.relocates.

To resolve this issue we just ignore dead gc.relocate in the check.

Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D60184

llvm-svn: 357742
2019-04-05 05:41:08 +00:00
James Y Knight a040174418 Revert [X86] When using Win64 ABI, exit with error if SSE is disabled for varargs
It unnecessarily breaks previously-working code which used varargs,
but didn't pass any float/double arguments (such as EDK2).

Also revert the fixup on top of that:
Revert [X86] Fix a test from r357317

This reverts r357317 (git commit d413f41de6)
This reverts r357380 (git commit 7af32444b9)

llvm-svn: 357718
2019-04-04 19:05:48 +00:00
Sam Clegg 2a7cac932b [WebAssembly] Add new explicit relocation types for PIC relocations
See https://github.com/WebAssembly/tool-conventions/pull/106

Differential Revision: https://reviews.llvm.org/D59907

llvm-svn: 357710
2019-04-04 17:43:50 +00:00
Sanjay Patel 17648b848e [x86] eliminate unnecessary broadcast of horizontal op
This is another pattern that comes up if we more aggressively
scalarize FP ops.

llvm-svn: 357703
2019-04-04 14:46:13 +00:00
Jonas Paulsson c56ffed304 [SystemZ] Bugfix in isFusableLoadOpStorePattern()
This function is responsible for checking the legality of fusing an instance
of load -> op -> store into a single operation. In the SystemZ backend the
check was incomplete and a test case emerged with a cycle in the instruction
selection DAG as a result.

Instead of using the NodeIds to determine node relationships,
hasPredecessorHelper() is now used, just like in the X86 backend. This handles
the failing tests and also gives a few additional transformations on
benchmarks.

The SystemZ isFusableLoadOpStorePattern() is now a very near copy of the X86
function, and it seems this could be made a utility function in common code
instead.

Review: Ulrich Weigand
https://reviews.llvm.org/D60255

llvm-svn: 357688
2019-04-04 12:12:35 +00:00
Diana Picus 153c3887e4 [ARM GlobalISel] Support DBG_VALUE
Make sure we can map and select DBG_VALUE.

llvm-svn: 357681
2019-04-04 10:24:51 +00:00
Craig Topper 3649c20884 [X86] Use INSERT_SUBREG rather than SUBREG_TO_REG when creating LEA64_32 during isel.
SUBREG_TO_REG is supposed to be used to assert that we know the upper bits are
zero. But that isn't the case here. We've done no analysis of the inputs.

llvm-svn: 357673
2019-04-04 05:00:18 +00:00
Serguei Katkov fb44846e37 [FastISel] Fix the crash in gc.result lowering
FastISel has a fallback to SelectionDAGISel for instructions it cannot handle.
This works as follows:
Instructions are selected in reverse order using FastISel; if FastISel cannot handle an instruction and it is a call,
only that instruction falls back to SelectionDAGISel and fast instruction selection continues.

However, if the unhandled instruction is not a call or statepoint-related instruction, all remaining
instructions in the basic block fall back to SelectionDAGISel.

However, the gc.result instruction was missed, and as a result a gc.result could be processed earlier than its statepoint,
breaking the invariant that gc.results should be handled after the statepoint.

The test is updated because in its current form FastISel cannot handle the ret instruction (due to an i1 return type without an explicit ext),
so the test did not exercise FastISel at all.

Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D60182

llvm-svn: 357672
2019-04-04 04:19:56 +00:00
David L. Jones 8b8a02175a Revert r357452 - 'SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)'
This revision causes tests to fail under ASAN. Since the cause of the failures
is not clear (could be ASAN, could be a Clang bug, could be a bug in this
revision), the safest course of action seems to be to revert while investigating.

llvm-svn: 357667
2019-04-04 02:27:57 +00:00
Craig Topper 051bd16faf [X86] Remove CustomInserters for RDPKRU/WRPKRU. Use some custom lowering and new ISD opcodes instead.
These inserters inserted some instructions to zero some registers and copied from virtual registers to physical registers.

This change instead inserts the zeros directly into the DAG at lowering time using new ISD opcodes
that take the extra zeroes as inputs. The zeros will then go through isel on their own to select
the MOV32r0 pseudo. Then we just need to mention the physical registers directly
in the isel patterns and the isel table and InstrEmitter will take care of inserting the necessary
copies to/from physical registers.

llvm-svn: 357659
2019-04-04 00:28:49 +00:00
Craig Topper 52cac4b79f [X86] Remove CustomInserter pseudos for MONITOR/MONITORX/CLZERO. Use custom instruction selection instead.
This custom inserter existed so we could do a weird thing where we pretended that the instructions support
a full address mode instead of taking a pointer in EAX/RAX. I think this was largely so we could be
pointer-size agnostic in the isel pattern.

To make this work we would then use an LEA to put the address into EAX/RAX in front of the instruction
after isel. But the LEA is overkill when we just have a base pointer, so we end up using the LEA as a
slower MOV instruction.

With this change we now just do custom selection during isel instead and just assign the incoming address
of the intrinsic into EAX/RAX based on its size. After the intrinsic is selected, we can let isel take
care of selecting an LEA or other operation to do any address computation needed in this basic block.

I've also split the instruction into a 32-bit mode version and a 64-bit mode version so the implicit
use is properly sized based on the pointer. Without this we get comments in the assembly output about
killing eax and defing rax or vice versa depending on whether we define the instruction to use EAX/RAX.

llvm-svn: 357652
2019-04-03 23:28:30 +00:00
Craig Topper 477008bd50 [X86] Remove dead CHECK lines for a test. NFC
llvm-svn: 357651
2019-04-03 23:28:18 +00:00
Craig Topper 437b45a1f8 [X86] Autogenerate checks. NFC
llvm-svn: 357650
2019-04-03 23:28:11 +00:00
Sanjay Patel c9a012e4ea [x86] fold shuffles of h-ops that have an undef operand
If an operand is undef, we can assume it's the same as the
other operand.

llvm-svn: 357644
2019-04-03 22:40:35 +00:00
Sanjay Patel 61b5e3c6a9 [x86] eliminate movddup of horizontal op
This pattern would show up as a regression if we more
aggressively convert vector FP ops to scalar ops.

There's still a missed optimization for the v4f64 legal
case (AVX) because we create that h-op with an undef operand.
We should probably just duplicate the operands for that
pattern to avoid trouble.

llvm-svn: 357642
2019-04-03 22:15:29 +00:00
Sanjay Patel 0b874c7c60 [x86] add another test for disguised h-op; NFC
llvm-svn: 357636
2019-04-03 21:10:55 +00:00
Matt Arsenault 396653f8a1 AMDGPU: Split block for si_end_cf
Relying on no spill or other code being inserted before this was
precarious. It relied on code diligently checking isBasicBlockPrologue
which is likely to be forgotten.

Ideally this could be done earlier, but this doesn't work because of
phis: no other instruction can be placed before them, so we have to
accept the position being incorrect during SSA.

This avoids regressions in the fast register allocator rewrite from
inverting the direction.

llvm-svn: 357634
2019-04-03 20:53:20 +00:00
Sanjay Patel 8c9ceecdc6 [x86] add test for disguised horizontal op; NFC
llvm-svn: 357630
2019-04-03 20:34:22 +00:00
Krzysztof Parzyszek 4841643a1d [X86] Extend boolean arguments to inline-asm according to getBooleanType
Differential Revision: https://reviews.llvm.org/D60208

llvm-svn: 357615
2019-04-03 17:43:14 +00:00
Simon Pilgrim 15919ad306 [X86][AVX] combineHorizontalPredicateResult - split any/allof v16i16/v32i8 reduction on AVX1
Perform the 2 x 128-bit lo/hi OR/AND on the vectors before calling PMOVMSKB on the 128-bit result.

llvm-svn: 357611
2019-04-03 17:28:34 +00:00
Simon Pilgrim 9e28dddf55 [X86][AVX] combineHorizontalPredicateResult - support v16i16/v32i8 reduction on AVX1
Use getPMOVMSKB helper which splits v32i8 MOVMSK calls on pre-AVX2 targets.

llvm-svn: 357608
2019-04-03 17:17:13 +00:00
Jessica Paquette e794121cd0 [AArch64][GlobalISel] Legalize G_FEXP2
Same as G_EXP. Add a test, and update legalizer-info-validation.mir and
f16-instructions.ll.

Differential Revision: https://reviews.llvm.org/D60165

llvm-svn: 357605
2019-04-03 16:58:32 +00:00
Sanjay Patel 8055034666 [x86] make stack folding tests immune to unrelated transforms; NFC
llvm-svn: 357604
2019-04-03 16:33:24 +00:00
Ulrich Weigand 35dfd1b7df [SystemZ] Improve codegen for certain SADDO-immediate cases
When performing an add-with-overflow with an immediate in the
range -2G ... -4G, code currently loads the immediate into a
register, which generally takes two instructions.

In this particular case, it is preferable to load the negated
immediate into a register instead, which always only requires
one instruction, and then perform a subtract.

llvm-svn: 357597
2019-04-03 15:09:19 +00:00
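
A small C sketch of an add-with-overflow against an immediate in that range; the use of the overflow builtin is an assumption about how such IR typically arises, not taken from the patch.

  #include <stdbool.h>
  #include <stdint.h>

  bool add_big_negative(int64_t x, int64_t *r) {
    /* Immediate in the -2G .. -4G range: it is cheaper to load +3000000000
       with a single instruction and then perform a subtract than to load
       -3000000000 itself. */
    return __builtin_add_overflow(x, (int64_t)-3000000000LL, r);
  }
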
Sanjay Patel 281cf28329 [x86] remove duplicate tests
Accidentally double-committed these.

llvm-svn: 357593
2019-04-03 14:45:45 +00:00
Sanjay Patel 393458f3ed [x86] add negative tests for FP scalarization; NFC
These go with the proposal in D60150.

llvm-svn: 357592
2019-04-03 14:41:28 +00:00
Sanjay Patel 04848090cd [x86] add tests with constants for FP scalarization; NFC
llvm-svn: 357591
2019-04-03 14:41:24 +00:00
Sanjay Patel eb5ffc7842 [x86] add tests with constants for FP scalarization; NFC
llvm-svn: 357587
2019-04-03 14:36:47 +00:00
Petar Avramovic afa3afa384 [MIPS GlobalISel] Select floating point arithmetic operations
Select 32 and 64 bit floating point add, sub, mul and div for MIPS32.

Differential Revision: https://reviews.llvm.org/D60191

llvm-svn: 357584
2019-04-03 14:12:59 +00:00
Sanjay Patel 00dae6b22d [DAGCombiner] loosen restrictions for moving shuffles after vector binop
There are 3 changes to make this correspond to the same transform in instcombine:
1. Remove the legality check - we can't create anything less legal than we started with.
2. Ease the use restriction, so we only bail out if both operands have >1 use.
3. Ease the use restriction for binops with a repeated operand (eg, mul x, x).

As discussed in D60150, there's a scalarization opportunity that will be made
easier by allowing this transform more generally.

llvm-svn: 357580
2019-04-03 13:42:06 +00:00
Simon Pilgrim 143279e61f [X86] Regenerate LEA codegen tests
llvm-svn: 357573
2019-04-03 12:33:16 +00:00
Clement Courbet 26a8ed3ac9 [X86] Make the post machine scheduler macrofusion-aware.
Summary:
Given that X86 does not use this currently, this is an NFC. I'll
experiment with enabling and will report numbers.

Reviewers: andreadb, lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60185

llvm-svn: 357568
2019-04-03 09:37:30 +00:00
Clement Courbet 5bfa946d69 [X86][NFC] Add tests for misched macro-fusion.
llvm-svn: 357565
2019-04-03 08:21:54 +00:00
Hans Wennborg 94b867dc7c Revert r357256 "[DAGCombine] Improve Lifetime node chains."
As it caused a pathological compile-time regression in V8, see PR41352.

> Improve both start and end lifetime nodes chain dependencies.
>
> Reviewers: courbet
>
> Reviewed By: courbet
>
> Subscribers: hiraditya, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D59795

This also reverts the follow-up r357309:

> [DAGCombiner] Rewrite ImproveLifetimeNodeChain to avoid DAG loop.
>
> Avoid EXPENSIVE_CHECK failure. NFCI.

llvm-svn: 357563
2019-04-03 07:41:58 +00:00
Chen Zheng 4178c15330 [PowerPC] Add testcase for ppcctrloops pass shortloop check
llvm-svn: 357560
2019-04-03 03:11:34 +00:00
Matt Arsenault f426ddbfc7 AMDGPU: Assume ECC is enabled by default if supported
The test should really be checking for the property directly in the
code object headers, but there are problems with this. I don't see
this directly represented in the text form, and for the binary
emission this depends on a function-level subtarget feature to
emit a global flag.

llvm-svn: 357558
2019-04-03 01:58:57 +00:00
Craig Topper 16683a3ef8 [X86] Update the test case for v4i1 bitselect in combine-bitselect.ll to not have an infinite loop in IR.
In fact we don't even need a loop at all. I backed out the bug fix this was testing for and verified that this new case hit the same issue.

This should stop D59626 from deleting some of this code by realizing it was dead due to the loop.

llvm-svn: 357544
2019-04-03 00:05:03 +00:00
Craig Topper ca9eb68541 [X86] Autogenerate complete checks. NFC
llvm-svn: 357543
2019-04-03 00:04:57 +00:00
Matt Arsenault 2065680b47 AMDGPU: Don't use the default cpu in a few tests
Avoids unnecessary test changes in a future commit.

llvm-svn: 357539
2019-04-03 00:00:58 +00:00
Jessica Paquette ed23352379 [GlobalISel] Add IRTranslator support for llvm.stacksave and llvm.stackrestore
Also update arm64-irtranslator.ll.

Differential Revision: https://reviews.llvm.org/D60140

llvm-svn: 357538
2019-04-02 22:46:31 +00:00
Stanislav Mekhanoshin ea2e227926 X86: regenerate speculative-load-hardening-indirect.ll tests. NFC.
llvm-svn: 357537
2019-04-02 22:44:46 +00:00
Sanjay Patel 8e6d41aeb2 [x86] add more tests for FP scalarization; NFC
llvm-svn: 357523
2019-04-02 20:24:06 +00:00
Jessica Paquette 22c6215c7e [AArch64][GlobalISel] Select llvm.aarch64.stlxr(i64, i64*)
This adds partial instruction selection support for llvm.aarch64.stlxr. It also
factors out selection for G_INTRINSIC_W_SIDE_EFFECTS into its own function. The
new function removes the restriction that the intrinsic ID on the
G_INTRINSIC_W_SIDE_EFFECTS be on operand 0.

Also add a test, and add a GISel line to arm64-ldxr-stxr.ll.

Differential Revision: https://reviews.llvm.org/D60100

llvm-svn: 357518
2019-04-02 19:57:26 +00:00
Craig Topper 0d3a533270 [X86] Allow FixupLEAs to form INC/DEC under OptSize not just MinSize
This matches our usual INC/DEC heuristic used during isel.

llvm-svn: 357497
2019-04-02 17:13:03 +00:00
Jonas Paulsson f76fe45426 [SystemZ] Improve instruction selection of 64 bit shifts and rotates.
For shift and rotate instructions that only use the last 6 bits of the shift
amount, a shift amount of (x*64-s) can be substituted with (-s). This saves
one instruction and a register:

  lhi     %r1, 64
  sr      %r1, %r3
  sllg    %r2, %r2, 0(%r1)
  =>
  lcr     %r1, %r3
  sllg    %r2, %r2, 0(%r1)

Review: Ulrich Weigand
llvm-svn: 357481
2019-04-02 15:36:30 +00:00
Simon Atanasyan 2634a141fd [mips] Use AltOrders to prevent using odd FP-registers
To disable the use of odd floating-point registers (O32 ABI and the
-mno-odd-spreg command line option), such registers and their
super-registers are added to the set of reserved registers. In general,
this works. But there is at least one problem: when the machine
verifier pass is enabled, some floating-point tests fail because the
live ranges of reserved register units are not empty, and the verification
pass fails with a "Live segment doesn't end at a valid instruction" error
message.

There is patch D35985, which tries to solve the problem by explicitly
removing register units. This solution did not get approval.

I would like to use another approach to prevent the use of odd floating-point
registers: define `AltOrders` and `AltOrderSelect` for the MIPS
floating-point register classes. Such `AltOrders` contain a reduced set
of registers. At first glance, this solution does not break any test
cases and allows enabling machine instruction verification for all MIPS
test cases.

Differential Revision: http://reviews.llvm.org/D59799

llvm-svn: 357472
2019-04-02 13:57:32 +00:00
Simon Pilgrim 64bd87ad4b [X86][AVX] Add test case showing failure to fold broadcast load if its also used as a scalar
llvm-svn: 357465
2019-04-02 10:31:00 +00:00
Sander de Smalen 7f23e0a62f Enforce StackID definition in PEI
There are various places in LLVM where the definition of StackID is not
properly honoured, for example in PEI where objects with a StackID > 0 are
allocated on the default stack (StackID 0). This patch enforces that PEI
only considers allocating objects to StackID 0.

Reviewers: arsenm, thegameg, MatzeB

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D60062

llvm-svn: 357460
2019-04-02 09:46:52 +00:00
Hans Wennborg b669fea42f SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)
The code was previously checking that candidates for sinking had exactly
one use or were a store instruction (which can't have uses). This meant
we could sink call instructions only if they had a use.

That limitation seemed a bit arbitrary, so this patch changes it to
"instruction has zero or one use" which seems more natural and removes
the need to special-case stores.

Differential revision: https://reviews.llvm.org/D59936

llvm-svn: 357452
2019-04-02 08:01:38 +00:00
Craig Topper 536383a354 [X86] Add test cases to fixup-lea.ll for optsize and no size optimization. Add +/-slow-incdec command lines
We only form inc/dec in FixupLEAs under minsize today, but all other places in the compiler form inc/dec with optsize.

llvm-svn: 357446
2019-04-02 00:54:22 +00:00
Craig Topper c133015975 [X86] Autogenerate complete checks. NFC
llvm-svn: 357445
2019-04-02 00:54:15 +00:00
Michael Liao 9bef688bc2 [AMDGPU] Add more test cases of D59608.
Summary: - Add more test cases.

Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60071

llvm-svn: 357442
2019-04-02 00:36:37 +00:00
Eli Friedman 3813fe0bda [ARM] Optimize expressions like "return x != 0;" for Thumb1.
There's an existing optimization for x != C, but somehow it was missing
a special case for 0.

While I'm here, also cleaned up the code/comments a bit: the second
value produced by the MERGE_VALUES was actually dead, since a CMOV only
produces one result.

Differential Revision: https://reviews.llvm.org/D59616

llvm-svn: 357437
2019-04-02 00:01:23 +00:00
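
The source pattern addressed by the commit above is simply (a trivial sketch, not the test from the patch):

  int is_nonzero(int x) {
    return x != 0;  /* the zero special case added to the existing x != C optimization */
  }
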
Eli Friedman 73af6ef2e7 [ARM] Don't try to create "push {r12, lr}" in Thumb1 at -Oz.
It's a little tricky to make this issue show up because
prologue/epilogue emission normally likes to push at least two
registers... but it doesn't when lr is force-spilled due to function
length.  Not sure if that really makes sense, but I decided not to touch
it for now.

Differential Revision: https://reviews.llvm.org/D59385

llvm-svn: 357436
2019-04-01 23:55:57 +00:00
Jessica Paquette e44c20a68d [AArch64][GlobalISel] Select STRQui for stores into v2s64s instead of scalarizing
This improves selection for vector stores into v2s64s. Before we just
scalarized them, but we can just use a STRQui instead.

Differential Revision: https://reviews.llvm.org/D60083

llvm-svn: 357432
2019-04-01 22:19:13 +00:00
Bixia Zheng 6c21ccd245 [NVPTX] Fix the codegen for llvm.round.
Summary:
Previously, we translated llvm.round to PTX cvt.rni, which rounds to the
even integer when the source is equidistant between two integers. This
is not correct, as llvm.round should round away from zero. This change
replaces llvm.round with a round-away-from-zero implementation through
target-specific custom lowering.

Modify a few affected tests to not check for cvt.rni. Instead, we check
for the use of a few constants used in implementing round. We are also
adding runnable CUDA tests to test-suites/External/CUDA to check the
values produced by llvm.round.

Reviewers: tra

Subscribers: jholewinski, sanjoy, jlebar, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59947

llvm-svn: 357407
2019-04-01 16:10:26 +00:00
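
To make the rounding difference described above concrete, a small C sketch (not one of the added CUDA tests) of the halfway-case behaviour:

  #include <math.h>

  void halfway_cases(double *out) {
    out[0] = round(2.5);  /* 3.0: llvm.round/round() rounds halfway cases away from zero */
    out[1] = rint(2.5);   /* 2.0: round-to-nearest-even, which is what PTX cvt.rni does */
  }
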
Neil Henning 0a30f33ce2 [AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure.
This change incorporates an effort by Connor Abbot to change how we deal
with WWM operations potentially trashing valid values in inactive lanes.

Previously, the SIFixWWMLiveness pass would work out which registers
were being trashed within WWM regions, and ensure that the register
allocator did not have any values it was depending on resident in those
registers if the WWM section would trash them. This worked perfectly
well, but would sometimes cause severe register pressure when the WWM
section resided before divergent control flow (or at least that is where
I mostly observed it).

This fix instead runs through the WWM sections and pre-allocates some
registers for WWM. It then reserves these registers so that the register
allocator cannot use them. This results in a significant register
saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just
this change!).

Differential Revision: https://reviews.llvm.org/D59295

llvm-svn: 357400
2019-04-01 15:19:52 +00:00
Alex Bradbury da20f5ca74 [RISCV] Generate address sequences suitable for mcmodel=medium
This patch adds an implementation of a PC-relative addressing sequence to be
used when -mcmodel=medium is specified. With absolute addressing, a 'medium'
codemodel may cause addresses to be out of range. This is because while
'medium' implies a 2 GiB addressing range, this 2 GiB can be at any offset as
opposed to 'small', which implies the first 2 GiB only.

Note that LLVM/Clang currently specifies code models differently to GCC, where
small and medium imply the same functionality as GCC's medlow and medany
respectively.

Differential Revision: https://reviews.llvm.org/D54143
Patch by Lewis Revill.

llvm-svn: 357393
2019-04-01 14:42:56 +00:00
Clement Courbet 7e062c9b1f [X86] Make post-ra scheduling macrofusion-aware.
Subscribers: MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59688

llvm-svn: 357384
2019-04-01 13:48:50 +00:00
Clement Courbet d9f6ee1c3c [X86MacroFusion][NFC] Add more tests.
In preparation for D59688.

llvm-svn: 357381
2019-04-01 13:18:34 +00:00
Krasimir Georgiev 7af32444b9 [X86] Fix a test from r357317
Summary:
The missing `<` causes the lld command to overwrite the test file, which fails in
environments that mark the test files as readonly.

Reviewers: bkramer

Reviewed By: bkramer

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60060

llvm-svn: 357380
2019-04-01 11:42:54 +00:00
Simon Pilgrim e8c3136994 [X86][SSE] Add fcmp constant folding tests
Initial test coverage for D60006

llvm-svn: 357379
2019-04-01 10:54:04 +00:00
Luis Marques 3091884e25 [RISCV] Add seto pattern expansion
Adds a `seto` pattern expansion. Without it the lowerings of `fcmp one` and 
`fcmp ord` would be inefficient due to an unoptimized double negation.

Differential Revision: https://reviews.llvm.org/D59699

llvm-svn: 357378
2019-04-01 09:54:14 +00:00
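
An ordered comparison of the kind the commit above lowers can be written in C roughly as follows; the builtin is a standard Clang/GCC builtin, not something added by the patch.

  int both_ordered(double a, double b) {
    /* true when neither operand is NaN; this lowers to an fcmp ord / seto node */
    return !__builtin_isunordered(a, b);
  }
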
Sanjay Patel e1bc360fc6 [x86] allow movmsk with 2-element reductions
One motivation for making this change is that the lack of using movmsk is likely
a main source of perf difference between clang and gcc on the C-Ray benchmark as
shown here:
https://www.phoronix.com/scan.php?page=article&item=gcc-clang-2019&num=5
...but this change alone isn't enough to solve that problem.

The 'all-of' examples show what is likely the worst case trade-off: we end up with
an extra instruction (or 2 if we count the 'xor' register clearing). The 'any-of'
examples look clearly better using movmsk because we've traded 2 vector instructions
for 2 scalar instructions, and movmsk may have better timing than the generic 'movq'.

If we examine the llvm-mca output for these cases, it appears that even though the
'all-of' movmsk variant looks worse on paper, it would perform better on both
Haswell and Jaguar.

  $ llvm-mca -mcpu=haswell no_movmsk.s -timeline
  Iterations:        100
  Instructions:      400
  Total Cycles:      504
  Total uOps:        400

  Dispatch Width:    4
  uOps Per Cycle:    0.79
  IPC:               0.79
  Block RThroughput: 1.0

  $ llvm-mca -mcpu=haswell movmsk.s -timeline
  Iterations:        100
  Instructions:      600
  Total Cycles:      358
  Total uOps:        600

  Dispatch Width:    4
  uOps Per Cycle:    1.68
  IPC:               1.68
  Block RThroughput: 1.5

  $ llvm-mca -mcpu=btver2 no_movmsk.s -timeline
  Iterations:        100
  Instructions:      400
  Total Cycles:      407
  Total uOps:        400

  Dispatch Width:    2
  uOps Per Cycle:    0.98
  IPC:               0.98
  Block RThroughput: 2.0

  $ llvm-mca -mcpu=btver2 movmsk.s -timeline
  Iterations:        100
  Instructions:      600
  Total Cycles:      311
  Total uOps:        600

  Dispatch Width:    2
  uOps Per Cycle:    1.93
  IPC:               1.93
  Block RThroughput: 3.0

Finally, there may be CPUs where movmsk is horribly slow (old AMD small cores?), but if
that's true, then we're also almost certainly making the wrong transform already for
reductions with >2 elements, so that should be fixed independently.

Differential Revision: https://reviews.llvm.org/D59997

llvm-svn: 357367
2019-03-31 15:11:34 +00:00
Simon Pilgrim ec56621a5c [SystemZ] Remove fcmp undef from reduced test
Pre-commit for D60006 (Add fcmp UNDEF handling to SelectionDAG::FoldSetCC)

Approved by @uweigand (Ulrich Weigand)

llvm-svn: 357355
2019-03-30 20:24:26 +00:00
Simon Pilgrim 513e6b9d58 [MIPS] Remove fcmp undef from reduced test
Pre-commit for D60006 (Add fcmp UNDEF handling to SelectionDAG::FoldSetCC)

Approved by @atanasyan (Simon Atanasyan)

llvm-svn: 357354
2019-03-30 20:16:16 +00:00
Craig Topper e4a0fc7d75 [X86] Teach isel for RMW binops to handle negate
Negate updates flags like a subtract. We should be able to use the flags from the RMW form of negate when we have (store (X86ISD::SUB 0, load A), A)

Differential Revision: https://reviews.llvm.org/D60007

llvm-svn: 357353
2019-03-30 18:59:17 +00:00
Alex Bradbury 0b2803ee65 [RISCV] Add codegen support for ilp32f, ilp32d, lp64f, and lp64d ("hard float") ABIs
This patch adds support for the RISC-V hard float ABIs, building on top of
rL355771, which added basic target-abi parsing and MC layer support. It also
builds on some re-organisations and expansion of the upstream ABI and calling
convention tests which were recently committed directly upstream.

A number of aspects of the RISC-V hard float ABIs require frontend
support (e.g. flattening of structs and passing int+fp for fp+fp structs in a
pair of registers), and will be addressed in a Clang patch.

As can be seen from the tests, it would be worthwhile extending
RISCVMergeBaseOffsets to handle constant pool as well as global accesses.

Differential Revision: https://reviews.llvm.org/D59357

llvm-svn: 357352
2019-03-30 17:59:30 +00:00
Simon Pilgrim 10c9032c02 [X86][SSE] detectAVGPattern - Match zext(or(x,y)) 'add like' patterns (PR41316)
Fixes PR41316 where the expanded PAVG intrinsic had had one of its ADDs turned into an OR due to its operands having no conflicting bits.

llvm-svn: 357351
2019-03-30 17:12:29 +00:00
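
A generic rounding-average loop of the kind that expands to PAVG, shown as a hedged sketch (not the PR41316 reproducer):

  #include <stdint.h>

  void average(uint8_t *r, const uint8_t *a, const uint8_t *b, int n) {
    for (int i = 0; i < n; ++i) {
      /* (x + y + 1) >> 1 vectorizes to pavgb; DAGCombine may rewrite one of the
         adds as an OR when known bits prove there is no carry, which is why the
         matcher now also accepts zext(or(x, y)) as "add like". */
      r[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
    }
  }
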
Alex Bradbury b5498cbf64 [RISCV] Add RV64 CHECK lines to test/CodeGen/RISCV/vararg.ll and prepare for hard float tests
vararg.ll previously missed RV64 tests. This patch also prepares for using
vararg.ll to test handling of varargs for the ilp32f/ilp32d/lp64f/lp64d hard
float ABIs. In these ABIs, varargs are passed as in either the ilp32 or lp64
ABI. Due to some slight codegen differences, different check lines are needed
for when RV32D is enabled.

llvm-svn: 357350
2019-03-30 15:53:38 +00:00
Simon Pilgrim cfdf09ba7d [X86][SSE] Add PAVG test case from PR41316
llvm-svn: 357346
2019-03-30 13:53:11 +00:00
Heejin Ahn c4ac74fb49 [WebAssembly] Fix unwind destination mismatches in CFG stackify
Summary:
Linearizing the control flow by placing `try`/`end_try` markers can create
mismatches in unwind destinations. This patch resolves these mismatches
by wrapping those instructions with an incorrect unwind destination with
a nested `try`/`catch`/`end_try` and branching to the right destination
within the new catch block.

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, chrib, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D48345

llvm-svn: 357343
2019-03-30 11:04:48 +00:00
Heejin Ahn e9fd9073e4 [WebAssembly] Run ExplicitLocals pass after CFGStackify
Summary:
While this does not change any final output, this will greatly simplify
fixing unwind destination mismatches in CFGStackify (D48345), because we
have to create some new registers there.

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59652

llvm-svn: 357342
2019-03-30 09:29:57 +00:00
Alex Bradbury 9681b01c21 [RISCV] Add DAGCombine for (SplitF64 (ConstantFP x))
The SplitF64 node is used on RV32D to convert an f64 directly to a pair of i32
(necessary as bitcasting to i64 isn't legal). When performed on a ConstantFP,
this will result in a FP load from the constant pool followed by a store to
the stack and two integer loads from the stack (necessary as there is no way
to directly move between f64 FPRs and i32 GPRs on RV32D). It's always cheaper
to just materialise integers for the lo and hi parts of the FP constant, so do
that instead.

llvm-svn: 357341
2019-03-30 09:15:47 +00:00
Alex Bradbury 98b8ecde64 [RISCV][NFC] Remove floating point operations from test/CodeGen/RISCV/vararg.ll
This minimises differences in output when compiling with hardware floating
point support, which will be done in a future patch (to demonstrate the same
vararg calling convention is used).

llvm-svn: 357339
2019-03-30 05:24:42 +00:00
Heejin Ahn 7e7aad1510 [WebAssembly] Optimize the number of routing blocks in FixIrreducibleCFG
Summary:
Currently we create a routing block to the dispatch block for every
predecessor of every entry. So the total number of routing blocks
created will be (# of preds) * (# of entries). But we don't need to do
this: we need at most 2 routing blocks per loop entry, one for when the
predecessor is inside the loop and one for when it is outside the loop. (We
can't merge these into one because that would create another loop cycle
between blocks inside and blocks outside.) This patch fixes this and
creates at most 2 routing blocks per entry.

This also renames variable `Split` to `Routing`, which I think is a bit
clearer.

Reviewers: kripken

Subscribers: sunfish, dschuff, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59462

llvm-svn: 357337
2019-03-30 01:31:11 +00:00
Thomas Lively 5f0c4c67bb [WebAssembly] Add mutable globals feature
Summary:
This feature is not actually used for anything in the WebAssembly
backend, but adding it allows users to get it into the target features
sections of their objects, which makes these objects
future-compatible.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jdoerfert, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D60013

llvm-svn: 357321
2019-03-29 22:00:18 +00:00
Jessica Paquette d3ffd47df9 [GlobalISel][AArch64] Add isel support for G_INSERT_VECTOR_ELT on v2s32s
This adds support for v2s32 vector inserts, and updates the selection +
regbankselect tests for G_INSERT_VECTOR_ELT.

Differential Revision: https://reviews.llvm.org/D59910

llvm-svn: 357318
2019-03-29 21:39:36 +00:00
Amara Emerson d413f41de6 [X86] When using Win64 ABI, exit with error if SSE is disabled for varargs
We need XMM registers to handle varargs with the Win64 ABI. Before we would
silently generate bad code resulting in an assertion failure elsewhere in the
backend.

llvm-svn: 357317
2019-03-29 21:30:51 +00:00
Heejin Ahn 67f74aceab [WebAssembly] Handle END_LOOP in unreachable BB in CFGStackify
Summary:
This fixes crashes when a BB in which an END_LOOP is to be placed is
unreachable and does not have any predecessors. Fixes PR41307.

Reviewers: dschuff

Subscribers: yurydelendik, sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60004

llvm-svn: 357303
2019-03-29 19:36:51 +00:00
Matt Arsenault 055e4dce45 AMDGPU: Remove dx10-clamp from subtarget features
Since this can be set with s_setreg*, it should not be a subtarget
property. Set a default based on the calling convention, and introduce
a new amdgpu-dx10-clamp attribute to override this if desired.

Also introduce a new amdgpu-ieee attribute to match.

The values need to match to allow inlining. I think it is OK for the
caller's dx10-clamp attribute to override the callee, but there
doesn't appear to be the infrastructure to do this currently without
defining the attribute in the generic Attributes.td.

Eventually the calling convention lowering will need to insert a mode
switch somewhere for these.

llvm-svn: 357302
2019-03-29 19:14:54 +00:00
Simon Pilgrim d395bc1cc2 [Hexagon] Remove fcmp undef from reduced tests
Pre-commit for D60006 (Add fcmp UNDEF handling to SelectionDAG::FoldSetCC)

Approved by @kparzysz (Krzysztof Parzyszek)

llvm-svn: 357301
2019-03-29 19:14:52 +00:00
Craig Topper 103fbbbfca [X86] Add test cases showing failure to use RMW form of negate when only flags are used. NFC
llvm-svn: 357300
2019-03-29 19:09:37 +00:00
Simon Pilgrim 759cbee744 [SystemZ] Regenerate double constant comparison test
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357295
2019-03-29 18:23:08 +00:00
Simon Pilgrim 05e2621342 [MIPS] Regenerate double constant comparison test
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357294
2019-03-29 18:22:18 +00:00
Simon Pilgrim a3fb3d5583 [ARM] Regenerate execute-only float comparison tests
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357293
2019-03-29 18:21:19 +00:00
Simon Pilgrim dee8a14389 [AArch64] Regenerate half precision tests
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357286
2019-03-29 17:46:06 +00:00
Nirav Dave fe59e14031 [DAGCombine] Prune unused nodes.
Summary:
Nodes that have no uses are eventually pruned when they are selected
from the worklist. Record nodes newly added to the worklist or DAG and
perform pruning after every combine attempt.

Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight

Reviewed By: jyknight

Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58070

llvm-svn: 357283
2019-03-29 17:35:56 +00:00
Simon Pilgrim b4b98a528b [ARM] Regenerate vector comparison tests
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357281
2019-03-29 17:35:11 +00:00
Simon Pilgrim 4e00a93558 [X86] Fix some tests using fcmp with undef arguments
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357278
2019-03-29 17:20:27 +00:00
Simon Atanasyan f26f56d6d3 [mips] Fix lowering a signed immediate for *.d MSA instructions
The `lowerMSASplatImm` function zero-extends `i32` immediates while
building the constant. If the target type is `i64`, a negative immediate
loses its sign. As a result, for example, `__builtin_msa_ldi_d(-1)` was
lowered to a series of instructions loading the incorrect value 0xffffffff
into the `$w0` register instead of a single `ldi.d $w0, -1` instruction.

The fix zero-extends unsigned immediates and sign-extends signed
immediates.

Differential Revision: http://reviews.llvm.org/D59884

llvm-svn: 357264
2019-03-29 15:15:22 +00:00
Sanjay Patel 12685d0f7c [DAGCombiner] simplify shuffle of shuffle
After investigating the examples from D59777 targeting an SSE4.1 machine,
it looks like a very different problem due to how we map illegal types (256-bit in these cases).

We're missing a shuffle simplification that maps elements of a vector back to a shuffled operand.
We have a more general version of this transform in DAGCombiner::visitVECTOR_SHUFFLE(), but that
generality means it is limited to patterns with a one-use constraint, and the examples here have
2 uses. We don't need any uses or legality limitations for a simplification (no new value is
created).

It looks like we miss this pattern in IR too.

In one of the zext examples here, we have shuffle masks like this:

Shuf0 = vector_shuffle<0,u,3,7,0,u,3,7>
Shuf = vector_shuffle<4,u,6,7,u,u,u,u>

...so that's moving the high half of the 1st vector into the low half. But the high half of the
1st vector is already identical to the low half.

Differential Revision: https://reviews.llvm.org/D59961

llvm-svn: 357258
2019-03-29 14:20:38 +00:00
Nirav Dave 9259de217e [DAGCombine] Improve Lifetime node chains.
Improve the chain dependencies of both start and end lifetime nodes.

Reviewers: courbet

Reviewed By: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59795

llvm-svn: 357256
2019-03-29 14:09:47 +00:00
Sanjay Patel 665a385035 [DAGCombiner] fold sext into decrement
This is a sibling to rL357178 that I noticed we'd hit if we chose
an alternate transform in D59818.

  %z = zext i8 %x to i32
  %dec = add i32 %z, -1
  %r = sext i32 %dec to i64
  =>
  %z2 = zext i8 %x to i64
  %r = add i64 %z2, -1

https://rise4fun.com/Alive/kPP

The x86 vector diffs show a slight regression, so there's a chance
that we should limit this and the previous transform to scalars.

But given that we allowed vectors before, I'm matching that behavior
here. We should change both transforms together if that's the right
thing to do.

llvm-svn: 357254
2019-03-29 13:49:08 +00:00
Hans Wennborg 800b12f90a Switch lowering: exploit unreachable fall-through when lowering case range cluster
In the example below, we would previously emit two range checks, one for cases
1--3 and one for 4--6. This patch makes us exploit the fact that the
fall-through is unreachable and only one range check is necessary.

  switch i32 %i, label %default [
    i32 1,  label %bb1
    i32 2,  label %bb1
    i32 3,  label %bb1
    i32 4,  label %bb2
    i32 5,  label %bb2
    i32 6,  label %bb2
  ]
  default: unreachable

llvm-svn: 357252
2019-03-29 13:40:05 +00:00
Sanjay Patel 881bcbe094 [x86] add tests for decrement+sext; NFC
llvm-svn: 357251
2019-03-29 13:34:48 +00:00
Konstantin Zhuravlyov 2b766ed774 AMDGPU: Make sram-ecc off by default for Vega20
Differential Revision: https://reviews.llvm.org/D59718

llvm-svn: 357247
2019-03-29 12:04:18 +00:00
Simon Pilgrim aeaf7fcdde [X86] Add X86TargetLowering::isCommutativeBinOp override.
We currently just have test coverage for PMULUDQ - will add more in the future.

llvm-svn: 357244
2019-03-29 11:25:58 +00:00
Kang Zhang 05f78b35ae [PowerPC] Add the support for __builtin_setrnd()
Summary:
PowerPC64/PowerPC64le supports the builtin function __builtin_setrnd to set the floating point rounding mode. This function uses the least significant two bits of the integer argument to set the floating point rounding mode.
double __builtin_setrnd(int mode);
The effective values for mode are:
0 - round to nearest
1 - round to zero
2 - round to +infinity
3 - round to -infinity
Note that the mode argument is taken modulo 4, so if the int argument is greater than 3, only its least significant two bits are used. Namely, __builtin_setrnd(102) is equal to __builtin_setrnd(2).
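
A rough usage sketch (not part of the patch): this assumes a clang build targeting PowerPC64/PowerPC64le, where the builtin is available per the description above.

```
// Illustration only; requires clang targeting PowerPC64/PowerPC64le.
#include <cstdio>

int main() {
  __builtin_setrnd(2);                 // round to +infinity
  std::printf("%.20f\n", 1.0 / 3.0);   // result is rounded up under this mode
  __builtin_setrnd(102);               // 102 % 4 == 2, so the same mode as above
  return 0;
}
```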

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D59405

llvm-svn: 357241
2019-03-29 08:45:24 +00:00
Matt Arsenault 5fddf09187 AMDGPU/GlobalISel: Insert waterfall loop for vector indexing
The register index can only really be an SGPR. Lie that a VGPR index
is legal, and then rewrite the instruction in a waterfall loop to
handle the index.

llvm-svn: 357235
2019-03-29 03:54:56 +00:00
Zi Xuan Wu 1445b77e8c [PowerPC] Strength reduction of multiply by a constant by shift and add/sub in place
A shift and add/sub sequence is faster than a multiply by a constant.
Because the cycle count or latency of a multiply is not huge, we only consider
the following patterns worthwhile.

```
(mul x, 2^N + 1) => (add (shl x, N), x)
(mul x, -(2^N + 1)) => -(add (shl x, N), x)
(mul x, 2^N - 1) => (sub (shl x, N), x)
(mul x, -(2^N - 1)) => (sub x, (shl x, N))
```

The cycles or latency of a multiply are subtarget-dependent, so we consider the
subtarget to decide whether or not to do the transformation.
The data type is also considered, since the multiply latency differs between types.
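
As a scalar sanity check of the patterns above (illustration only, with N = 3):

```
// Scalar check of the strength-reduction identities with N = 3.
#include <cassert>
#include <cstdint>

int main() {
  int64_t x = 12345;
  assert(x * 9  == (x << 3) + x);      // mul x, 2^3 + 1
  assert(x * -9 == -((x << 3) + x));   // mul x, -(2^3 + 1)
  assert(x * 7  == (x << 3) - x);      // mul x, 2^3 - 1
  assert(x * -7 == x - (x << 3));      // mul x, -(2^3 - 1)
  return 0;
}
```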

Differential Revision: https://reviews.llvm.org/D58950

llvm-svn: 357233
2019-03-29 03:08:39 +00:00
Thomas Lively 3f34e1b883 [WebAssembly] Merge used feature sets, update atomics linkage policy
Summary:
It does not currently make sense to use WebAssembly features in some functions
but not others, so this CL adds an IR pass that takes the union of all used
feature sets and applies it to each function in the module. This allows us to
prevent atomics from being lowered away if some function has opted in to using
them. When atomics is not enabled anywhere, we detect whether there exist any
atomic operations or thread local storage that would be stripped and disallow
linking with objects that contain atomics if and only if atomics or tls are
stripped. When atomics is enabled, we mark it as used but do not require it of
other objects in the link.
to be built once and linked into both single-threaded and multithreaded
binaries.

Reviewers: aheejin, sbc100, dschuff

Subscribers: jgravelle-google, hiraditya, sunfish, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59625

llvm-svn: 357226
2019-03-29 00:14:01 +00:00
Yonghong Song 360a4e2ca6 [BPF] add proper multi-dimensional array support
For a multi-dimensional array like the one below
  int a[2][3];
the previous implementation generates a BTF_KIND_ARRAY type
like below:
  . element_type: int
  . index_type: unsigned int
  . number of elements: 6

This is not the best way to represent arrays, especially
when converting BTF back to headers, where users will see
  int a[6];
instead.

This patch adds proper support for multi-dimensional arrays.
For "int a[2][3]", two BTF_KIND_ARRAY types will be
generated:
  Type #n:
    . element_type: int
    . index_type: unsigned int
    . number of elements: 3
  Type #(n+1):
    . element_type: #n
    . index_type: unsigned int
    . number of elements: 2

The linux kernel already supports such a multi-dimensional
array representation properly.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D59943

llvm-svn: 357215
2019-03-28 21:59:49 +00:00
Craig Topper c25c9b4d16 [X86] Teach the isel optimization for (x << C1) op C2 to (x op (C2>>C1)) << C1 to consider cases where C2>>C1 can fit an unsigned 32-bit immediate
For 64-bit operations we should consider whether the immediate can be made to fit
in an unsigned 32-bit immediate. For OR/XOR this allows us to load the immediate
with MOV32ri instead of movabsq. For AND this allows us to fold the immediate.
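
A small self-check of the underlying identity for the OR case (illustration only; the constants are made up): the fold is valid when the low C1 bits of C2 are zero.

```
// (x << C1) | C2 == (x | (C2 >> C1)) << C1, given the low C1 bits of C2 are 0.
#include <cassert>
#include <cstdint>

int main() {
  const uint64_t x  = 0x0123456789abcdefULL;
  const unsigned C1 = 8;
  const uint64_t C2 = 0x00000000ffffff00ULL;  // C2 >> C1 fits in 32 bits
  assert(((x << C1) | C2) == ((x | (C2 >> C1)) << C1));
  return 0;
}
```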

Differential Revision: https://reviews.llvm.org/D59867

llvm-svn: 357196
2019-03-28 18:05:37 +00:00
Petar Avramovic 1af05df3de [MIPS GlobalISel] Select float constants
Select 32 and 64 bit float constants for MIPS32.

Differential Revision: https://reviews.llvm.org/D59933

llvm-svn: 357183
2019-03-28 16:58:12 +00:00
Sanjay Patel ffa8d3def7 [DAGCombiner] fold sext into negation
As noted in D59818:
  %z = zext i8 %x to i32
  %neg = sub i32 0, %z
  %r = sext i32 %neg to i64
  =>
  %z2 = zext i8 %x to i64
  %r = sub i64 0, %z2

https://rise4fun.com/Alive/KzSR

llvm-svn: 357178
2019-03-28 15:46:02 +00:00
Sanjay Patel e781528278 [x86] add vector test for sext of negate; NFC
llvm-svn: 357177
2019-03-28 15:30:09 +00:00
Sanjay Patel 5bbf6f0bd8 [x86] avoid cmov in movmsk reduction
This is probably the least important of our movmsk problems, but I'm starting
at the bottom to reduce distractions.

We were creating a select_cc which bypasses the select and bitmask codegen
optimizations that we have now. If we produce a compare+negate instead, we
allow things like neg/sbb carry bit hacks, and in all cases we avoid a cmov.
There's no partial register update danger in these sequences because we always
produce the zero-register xor ahead of the 'set' if needed.

There seems to be a missing fold for sext of a bool bit here:

negl %ecx
movslq %ecx, %rax

...but that's an independent transform.

Differential Revision: https://reviews.llvm.org/D59818

llvm-svn: 357172
2019-03-28 14:16:13 +00:00
Clement Courbet 699dc025a6 [X86MacroFusion] Handle branch fusion (AMD CPUs).
Summary:
This adds a BranchFusion feature to replace the usage of the MacroFusion
for AMD CPUs.

See D59688 for context.

Reviewers: andreadb, lebedev.ri

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59872

llvm-svn: 357171
2019-03-28 14:12:46 +00:00
Matt Arsenault a353fd572a AMDGPU: Make exec mask optimizations more resistant to block splits
Also improve the check for SALU instructions so that it additionally ignores
implicit_def and other fake instructions.

llvm-svn: 357170
2019-03-28 14:01:39 +00:00
Simon Pilgrim 38a0616c1d [DAGCombiner] Fold truncate(build_vector(x,y)) -> build_vector(truncate(x),truncate(y))
If scalar truncates are free, attempt to pre-truncate the build_vector's source operands.

Only attempt to do this before legalization as we often end up with truncations/extensions during build_vector lowering.

Differential Revision: https://reviews.llvm.org/D59654

llvm-svn: 357161
2019-03-28 11:34:21 +00:00
Diana Picus 13ef0c5309 [ARM GlobalISel] Run regbankselect test for Thumb. NFCI
This should just work, since ARM mode and Thumb2 mode are at the same
level of support now and should map the same to GPR and FPR.

llvm-svn: 357159
2019-03-28 10:57:29 +00:00
Simon Pilgrim 22be913ac0 [X86][AVX] Add missing vXi16 broadcast fold patterns
Now that D59484 has landed it's easier to add these.

Added missing AVX512BW v32i16 equivalents while I was at it.

llvm-svn: 357155
2019-03-28 10:25:13 +00:00
Diana Picus 52495c472f [ARM GlobalISel] Fix G_STORE with s1
G_STORE for 1-bit values uses a STRBi12, which stores the whole byte.
Zero out the undefined bits before writing.

llvm-svn: 357154
2019-03-28 09:09:36 +00:00
Diana Picus 4d512df300 [ARM GlobalISel] Fix selection of G_SELECT
G_SELECT uses a 1-bit scalar for the condition, and is currently
implemented with a plain CMPri against 0. This means that values such as
0x1110 are interpreted as true, when instead the higher bits should be
treated as undefined and therefore ignored. Replace the CMPri with a
TSTri against 0x1, which performs an implicit AND, yielding the expected
result.
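
As a rough scalar analogy (not the actual lowering code), this shows why comparing the whole register against zero differs from testing only bit 0 of a 1-bit condition:

```
// Scalar analogy only: a 1-bit condition whose upper bits are garbage.
#include <cassert>
#include <cstdint>

int main() {
  uint32_t cond = 0x1110;                  // bit 0 is 0, upper bits are garbage
  bool cmpAgainstZero = (cond != 0);       // what the old CMPri #0 effectively did
  bool testBitZero = ((cond & 0x1) != 0);  // what TSTri #1 checks
  assert(cmpAgainstZero == true && testBitZero == false);
  return 0;
}
```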

llvm-svn: 357153
2019-03-28 09:09:27 +00:00
Piotr Sobczak f896785cb7 [SelectionDAG] Add 2 tests for selection across basic blocks
Summary:
Add tests for selection across basic block boundary:
 * one test containing a buffer load, where part of the offset
   computation is placed in the predecessor of the load
 * similar test, but containing two buffer loads and shared
   computations

Please note that the behaviour being tested will be updated in
a subsequent commit.

This commit was extracted from https://reviews.llvm.org/D59535.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: jvesely, nhaehnle, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59690

llvm-svn: 357149
2019-03-28 07:06:26 +00:00
Craig Topper 929932954d [X86] Add test cases from PR27202.
llvm-svn: 357132
2019-03-27 23:12:19 +00:00
Sanjay Patel 1df0bb6264 [x86] improve AVX lowering of vector zext
If we know the 2 halves of an oversized zext-in-reg are the same,
don't create those halves independently.

I tried several different approaches to fold this, but it's difficult
to get right during legalization. In the default path, we are creating
a generic shuffle that looks like an unpack high, but it can get
transformed into a different mask (a blend), so it's not
straightforward to match that. If we try to fold after it actually
becomes an X86ISD::UNPCKH node, we can't be sure what the operand node
is - it might be a generic shuffle, or it could be some x86-specific op.

From the test output, we should be doing something like this for SSE4.1
as well, but I'd rather leave that as a follow-up since it involves
changing lowering actions.

Differential Revision: https://reviews.llvm.org/D59777

llvm-svn: 357129
2019-03-27 22:42:11 +00:00
Daniel Sanders 495156dc6a test/CodeGen/X86/codegen-prepare-replacephi.mir requires a default triple
llvm-svn: 357122
2019-03-27 20:43:47 +00:00
Nirav Dave 6b741a8038 [DAGCombiner] Teach TokenFactor pruning to peek through lifetime nodes
Summary: Lifetime nodes were inhibiting TokenFactor simplification, which in turn inhibited chain-based optimizations.

Reviewers: courbet, jyknight

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59897

llvm-svn: 357121
2019-03-27 20:37:08 +00:00
Justin Bogner b1650f0da9 [LegalizeVectorTypes] Allow single loads and stores for more short vectors
When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal
or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable.
(See https://reviews.llvm.org/rL236528 for reference.)

This applies that behaviour to vector types. If the vector type is
TypePromoteInteger, the element type is going to be TypePromoteInteger
as well, which will lead to a single promoting load rather than N
individual promoting loads. For instance, if we have a v3i1, we would
now have a load of v4i1 instead of 3 loads of i1.

Patch by Guillaume Marques. Thanks!

Differential Revision: https://reviews.llvm.org/D56201

llvm-svn: 357120
2019-03-27 20:35:56 +00:00
Nirav Dave c6dfaa0e83 Revert r356996 "[DAG] Avoid smart constructor-based dangling nodes."
This patch appears to trigger very large compile time increases in
halide builds.

llvm-svn: 357116
2019-03-27 19:54:41 +00:00
Eli Friedman c388bfa230 [ARM] Don't confuse the scheduler for very large VLDMDIA etc.
ARMBaseInstrInfo::getNumLDMAddresses is making bad assumptions about the
memory operands of load and store-multiple operations.  This doesn't
really fix the problem properly, but it's enough to prevent crashing,
at least.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41231 .

Differential Revision: https://reviews.llvm.org/D59834

llvm-svn: 357109
2019-03-27 18:33:30 +00:00
Matt Arsenault 2e9ddcc30e RegPressure: Fix crash on blocks with only dbg_value
If there were only dbg_values in the block, recede would hit the
beginning of the block and try to use the dbg_value as a real
instruction.

llvm-svn: 357105
2019-03-27 18:14:02 +00:00
Amara Emerson 381188f1f3 [GlobalISel] Fix legalizer artifact combiner from crashing with invalid dead instructions.
The artifact combiners push instructions which have been marked for deletion
onto an list for the legalizer to deal with on return. However, for trunc(ext)
combines the combiner routine recursively calls itself. When it does this the
dead instructions list may not be empty, and the other combiners don't expect
to be dealing with essentially invalid MIR (multiple vreg defs etc).

This change fixes it by ensuring that the dead instructions are processed on
entry into tryCombineInstruction.

As a result, this fix exposed a few places in tests where G_TRUNC instructions
were not being deleted even though they were dead.

Differential Revision: https://reviews.llvm.org/D59892

llvm-svn: 357101
2019-03-27 17:47:42 +00:00
Matt Arsenault 7b14b2425d Reapply "AMDGPU: Scavenge register instead of findUnusedReg"
This reapplies r356149, using the correct overload of findUnusedReg
which passes the current iterator.

This worked most of the time, because the scavenger iterator was moved
at the end of the frame index loop in PEI. This would fail if the
spill was the first instruction. This was further hidden by the fact
that the scavenger wasn't passed in for normal frame index
elimination.

llvm-svn: 357098
2019-03-27 17:31:29 +00:00
Matt Arsenault 86e4fc0504 AMDGPU: Add testcase I meant to merge into r357093
llvm-svn: 357097
2019-03-27 17:31:26 +00:00
Craig Topper 7c9afc35bc [X86] Add post-isel pseudos for rotate by immediate using SHLD/SHRD
Haswell CPUs have special support for SHLD/SHRD with the same register for both sources. Such an instruction will go to the rotate/shift unit on port 0 or 6. This gives it 1 cycle latency and 0.5 cycle reciprocal throughput. When the register is not the same, it becomes a 3 cycle operation on port 1. Sandybridge and Ivybridge always have 1 cyc latency and 0.5 cycle reciprocal throughput for any SHLD.

When FastSHLDRotate feature flag is set, we try to use SHLD for rotate by immediate unless BMI2 is enabled. But MachineCopyPropagation can look through a copy and change one of the sources to be different. This will break the hardware optimization.

This patch adds a pseudo instruction to hide the second source input until after register allocation and MachineCopyPropagation. I'm not sure if this is the best way to do this or if there's some other way we can make this work.

Fixes PR41055

Differential Revision: https://reviews.llvm.org/D59391

llvm-svn: 357096
2019-03-27 17:29:34 +00:00
Quentin Colombet 89daf49e5c [PeepholeOpt] Don't stop simplifying copies on sequence of subregs
This patch removes an overly conservative check that would prevent
simplifying copies when the value we were tracking would go through
several subregister indices.
Indeed, the intent of this check was to not track values whenever
we have to compose subregisters, but actually what the check was
doing was bailing anytime we see a second subreg, even if that
second subreg would actually be the new source of truth (as opposed
to a part of that subreg).

Differential Revision: https://reviews.llvm.org/D59891

llvm-svn: 357095
2019-03-27 17:27:56 +00:00
Matt Arsenault 17e39100a2 AMDGPU: Enable the scavenger for large frames
Another test is needed for the case where the scavenge fails, but
there's another issue with that which needs an additional fix.

llvm-svn: 357093
2019-03-27 17:14:32 +00:00
Matt Arsenault 4d47ac3b30 AMDGPU: Add additional MIR tests for exec mask optimizations
Also includes one example of how this transform is unsound. This isn't
verifying the copies are used in the control flow intrinsic patterns.

Also add option to disable exec mask opt pass. Since this pass is
unsound, it may be useful to turn it off until it is fixed.

llvm-svn: 357091
2019-03-27 16:58:30 +00:00
Matt Arsenault 4ab28b64b4 AMDGPU: Skip debug_instr when collapsing end_cf
Based on how these are inserted, I doubt this was causing a problem in
practice.

llvm-svn: 357090
2019-03-27 16:58:27 +00:00
Matt Arsenault a42b7247d3 AMDGPU: Fix missing scc implicit def on s_andn2_b64_term
Introduce new helper class to copy properties directly from the base
instruction.

llvm-svn: 357089
2019-03-27 16:58:22 +00:00
Matt Arsenault 733b8571b4 MIR: Freeze reserved regs after parsing everything
The AMDGPU implementation of getReservedRegs depends on
MachineFunctionInfo fields that are parsed from the YAML section. This
was reserving the wrong register since it was setting the reserved
regs before parsing the correct one.

Some tests were relying on the default reserved set for the assumed
default calling convention.

llvm-svn: 357083
2019-03-27 16:12:26 +00:00
Matt Arsenault e9ad7e9a71 AMDGPU: wave_barrier is not isBarrier
This is not a control flow instruction, so should not be marked as
isBarrier. This fixes a verifier error if followed by unreachable.

llvm-svn: 357081
2019-03-27 15:54:45 +00:00
Yonghong Song 6c56edfe42 [BPF] use std::map to ensure consistent output
The .BTF.ext FuncInfoTable and LineInfoTable contain
information organized per ELF section. Current definition
of FuncInfoTable/LineInfoTable is:
  std::unordered_map<uint32_t, std::vector<BTFFuncInfo>> FuncInfoTable
  std::unordered_map<uint32_t, std::vector<BTFLineInfo>> LineInfoTable
where the key is the section name offset in the string table.
The unordered_map may cause the section output order to
differ between platforms.

The same applies to the unordered_map definition of
  std::unordered_map<std::string, std::unique_ptr<BTFKindDataSec>>
    DataSecEntries
where BTF_KIND_DATASEC entries may have different ordering
for different platforms.

This patch fixes the issue by using std::map.
Test static-var-derived-type.ll is modified to generate two
DataSecs, which ensures the ordering is the same for all
supported platforms.
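
A small illustration of why the container choice matters for deterministic output (illustration only; the keys here are made up):

```
// std::map iterates in sorted key order, which is stable across platforms;
// unordered_map iteration order is implementation-defined.
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

int main() {
  std::map<uint32_t, std::string> Sections = {{7, ".bss"}, {3, ".rodata"}};
  std::vector<uint32_t> Keys;
  for (const auto &KV : Sections)
    Keys.push_back(KV.first);
  assert((Keys == std::vector<uint32_t>{3, 7}));  // always sorted: deterministic
  return 0;
}
```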

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 357077
2019-03-27 15:45:27 +00:00
Clement Courbet 678d128b5a [X86MacroFusion][NFC] Improve macrofusion testing.
Add negative tests.
Add arithmetic/inc/cmp/and macrofusion tests.

llvm-svn: 357076
2019-03-27 15:43:03 +00:00
Matt Arsenault bbc59d8d0d AMDGPU: Fix areLoadsFromSameBasePtr for DS atomics
The offset operand index is different for atomics.

llvm-svn: 357073
2019-03-27 15:41:00 +00:00
Hans Wennborg 5c0d7a24e8 Re-commit r355490 "[CodeGen] Omit range checks from jump tables when lowering switches with unreachable default"
Original commit by Ayonam Ray.

This commit adds a regression test for the issue discovered in the
previous commit: that the range check for the jump table can only be
omitted if the fall-through destination of the jump table is
unreachable, which isn't necessarily true just because the default of
the switch is unreachable.

This addresses the missing optimization in PR41242.

> During the lowering of a switch that would result in the generation of a
> jump table, a range check is performed before indexing into the jump
> table, for the switch value being outside the jump table range and a
> conditional branch is inserted to jump to the default block. In case the
> default block is unreachable, this conditional jump can be omitted. This
> patch implements omitting this conditional branch for unreachable
> defaults.
>
> Differential Revision: https://reviews.llvm.org/D52002
> Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev

llvm-svn: 357067
2019-03-27 14:10:11 +00:00
Simon Pilgrim d6f9baf74f [X86][SSE] Add shuffle test case for PR41249
llvm-svn: 357062
2019-03-27 11:21:09 +00:00
Simon Pilgrim ccb71b2985 Revert rL356864 : [X86][SSE41] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685)
Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure.

This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain.
........
Causes PR41249

llvm-svn: 357057
2019-03-27 10:25:02 +00:00
Jonas Paulsson 38342a5185 [DAGCombiner] Don't allow addcarry if the carry producer is illegal.
getAsCarry() checks that the input argument is a carry-producing node before
allowing a transformation to addcarry. This patch adds a check to make sure
that the carry-producing node is legal. If it is not, it may not remain in a
form that is manageable by the target backend. The test case caused a
compilation failure during instruction selection for this reason on SystemZ.

Patch by Ulrich Weigand.

Review: Sanjay Patel
https://reviews.llvm.org/D59822

llvm-svn: 357052
2019-03-27 08:41:46 +00:00
Craig Topper feadc2a1de [X86] Add test cases for missed opportunities in (x << C1) op C2 to (x op (C2>>C1)) << C1 transform.
We handle the case where C2 does not fit in a signed 32-bit immediate, but
(C2>>C1) does. But there are also some 64-bit opportunities when C2 is not an unsigned
32-bit immediate, but (C2>>C1) is. For OR/XOR this allows us to load the
immediate with MOV32ri instead of a movabsq. For AND it allows us to use a
32-bit AND and fold the immediate.

llvm-svn: 357050
2019-03-27 06:07:05 +00:00
Craig Topper 7da7b97487 [X86] When iselling (x << C1) and/or/xor C2 as (x and/or/xor (C2>>C1)) << C1, go through the isel table instead of manually selecting.
Previously we manually selected the AND/OR/XOR with immediate and the SHL (or ADD if the shift is 1). But this was missing out on the opportunity to use a 64-bit AND with a 32-bit immediate and possibly other isel tricks we have built into the tables.

Instead, insert the new nodes into the DAG using insertDAGNode and allow them each to be selected through the normal table.

llvm-svn: 357049
2019-03-27 04:45:58 +00:00
Craig Topper 06cdd7e488 [X86] Autogenerate complete checks. NFC
llvm-svn: 357046
2019-03-27 02:18:41 +00:00
Francis Visoiu Mistrih ee1a6e70fa [Remarks] Emit a section containing remark diagnostics metadata
A section containing metadata on remark diagnostics will be emitted if
the flag (-mllvm) -remarks-section is present.

For now, the metadata is:

* a magic number for remarks: "REMARKS\0"
* the version number: a little-endian uint64_t
* the absolute file path to the serialized remark diagnostics: a
  null-terminated string.

Differential Revision: https://reviews.llvm.org/D59571

llvm-svn: 357043
2019-03-27 01:13:59 +00:00
Quentin Colombet c74271c537 [LiveRange] Reset the VNIs when splitting subranges
When splitting a subrange we end up with two different subranges covering
two different, non overlapping, lanes.
As part of this splitting the VNIs of the original live-range need
to be dispatched to the subranges according to which lanes they are
actually defining.

Prior to this patch we were assuming that all values were defining
all lanes. This was wrong as demonstrated by llvm.org/PR40835.

Differential Revision: https://reviews.llvm.org/D59731

llvm-svn: 357032
2019-03-26 21:27:15 +00:00
Sanjay Patel bb5cba3cca [SDAG] add simplifications for FP at node creation time
We have the folds for fadd/fsub/fmul already in DAGCombiner,
so it may be possible to remove that code if we can guarantee that
these ops are zapped before they can exist.

llvm-svn: 357029
2019-03-26 20:54:15 +00:00
Stefan Pintilie e1d79a87c6 [PowerPC] Remove UseVSXReg
The UseVSXReg flag can be safely removed and the code cleaned up.

Patch By: Yi-Hong Liu

Differential Revision: https://reviews.llvm.org/D58685

llvm-svn: 357028
2019-03-26 20:28:21 +00:00
Sam Clegg 492f752969 [WebAssembly] Initial implementation of PIC code generation
This change implements lowering of references to global symbols in PIC
mode.

This change implements lowering of global references in PIC mode using a
new @GOT reference type. @GOT references can be used with function or
data symbol names combined with the get_global instruction. In this case
the linker will insert the wasm global that stores the address of the
symbol (either in memory for data symbols or in the wasm table for
function symbols).

For now I'm continuing to use the R_WASM_GLOBAL_INDEX_LEB relocation
type for this type of reference which means that this relocation type
can refer to either a global or a function or data symbol. We could
choose to introduce specific relocation types for GOT entries in the
future.  See the current dynamic linking proposal:

https://github.com/WebAssembly/tool-conventions/blob/master/DynamicLinking.md

Differential Revision: https://reviews.llvm.org/D54647

llvm-svn: 357022
2019-03-26 19:46:15 +00:00
Heejin Ahn 54551c1df7 [WebAssembly] Don't analyze branches after CFGStackify
Summary:
`WebAssembly::analyzeBranch` now does not analyze anything if the
function is CFG stackified. We were previously doing similar things by
checking if a branch's operand is whether an integer or an MBB, but this
failed to bail out when a BB did not have any terminators.

Consider this case:
```
bb0:
  try $label0
  call @foo    // unwinds to %ehpad
bb1:
  ...
  br $label0   // jumps to %cont. can be deleted
ehpad:
  catch
  ...
cont:
  end_try
```
Here `br $label0` will be deleted in CFGStackify's
`removeUnnecessaryInstrs` function, because we jump to the %cont block
even without the branch. But in this case, MachineVerifier fails to
verify this, because `ehpad` is not a successor of `bb1` even if `bb1`
does not have any terminators. MachineVerifier incorrectly thinks `bb1`
falls through to the next block.

This pass now consistently rejects all analysis after CFGStackify
whether a BB has terminators or not, also making the MachineVerifier
work. (MachineVerifier does not try to verify relationships between BBs
if `analyzeBranch` fails, the behavior we want after CFGStackify.)

This also adds a new option `-wasm-disable-ehpad-sort` for testing. This
option helps create the sorted order we want to test, and without the
fix in this patch, the tests in cfg-stackify-eh.ll fail at
MachineVerifier with `-wasm-disable-ehpad-sort`.

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59740

llvm-svn: 357015
2019-03-26 18:21:20 +00:00
Heejin Ahn 1aaa481fc1 [WebAssembly] Add CFGStackified field to WebAssemblyFunctionInfo
Summary:
This adds a `CFGStackified` field and its serialization to
WebAssemblyFunctionInfo.

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59747

llvm-svn: 357011
2019-03-26 17:46:14 +00:00
Heejin Ahn 52221d56bc [WebAssembly] Support WebAssemblyFunctionInfo serialization
Summary:
The framework for supporting target-specific MachineFunctionInfo was
added in r356215. This adds serialization support for
WebAssemblyFunctionInfo on top of that. This patch only adds the
framework and does not actually serialize anything at this point; we
have to add YAML mapping later for the fields in WebAssemblyFunctionInfo
we want to serialize if necessary.

Reviewers: dschuff, arsenm

Subscribers: sunfish, wdng, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59737

llvm-svn: 357009
2019-03-26 17:35:35 +00:00
Heejin Ahn 222718fdd2 [WebAssembly] Fix a bug when mixing TRY/LOOP markers
Summary:
When TRY and LOOP markers are in the same BB and END_TRY and END_LOOP
markers are in the same BB, END_TRY should be _before_ END_LOOP, because
LOOP is always before TRY if they are in the same BB. (TRY is placed in
the latest possible position, whereas LOOP is in the earliest possible
position.)

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59751

llvm-svn: 357008
2019-03-26 17:29:55 +00:00
Heejin Ahn 44a5a4b107 [WebAssembly] Fix bugs in BLOCK/TRY placement
Summary:
Previously, we placed all TRY/END_TRY markers before placing BLOCK/END_BLOCK
markers. This couldn't handle this case:
```
bb0:
  br bb2
bb1:          // nearest common dominator of bb3 and bb4
  br_if ... bb3
  br bb4
bb2:
  ...
bb3:
  call @foo   // unwinds to ehpad
bb4:
  call @bar   // unwinds to ehpad
ehpad:
  catch
  ...
```

When we placed TRY markers, we placed one in bb1 because it is the
nearest common dominator of bb3 and bb4. But because bb0 jumps to bb2,
when we placed block markers, we ended up with interleaved scopes like
```
block
try
end_block
catch
end_try
```
which was not correct.

This patch fixes the bug by placing BLOCK and TRY markers in one pass
while iterating BBs in a function. This also adds some more routines to
`placeTryMarkers`, because we now have to assume that there can be
previously placed BLOCK and END_BLOCK.

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59739

llvm-svn: 357007
2019-03-26 17:15:55 +00:00
Luis Marques 72734fc7b5 [RISCV] Update setcc-logic.ll codegen test
This should have been updated as part of D59753.

llvm-svn: 357002
2019-03-26 15:41:45 +00:00
Nirav Dave a28c514581 [DAG] Avoid smart constructor-based dangling nodes.
Various SelectionDAG non-combine operations (e.g. the getNode smart
constructor and legalization) may leave dangling nodes by applying
optimizations or not fully pruning unused result values. This can
result in nodes that are never added to the worklist and therefore can
not be pruned.

Add a node inserter as the current node deleter to make sure such
nodes have the chance of being pruned.

Many minor changes, mostly positive.

llvm-svn: 356996
2019-03-26 15:08:14 +00:00
Luis Marques 614fd9d830 [RISCV] Improve codegen for icmp {ne,eq} with a constant
Adds two patterns to improve the codegen of GPR value comparisons with small
constants. Instead of first loading the constant into another register and then
doing an XOR of those registers, these patterns directly use the constant as an
XORI immediate.

llvm-svn: 356990
2019-03-26 12:55:00 +00:00
Simon Pilgrim e24441aab0 [TargetLowering] Add SimplifyDemandedBits support for ISD::INSERT_VECTOR_ELT
This helps us relax the extension of a lot of scalar elements before they are inserted into a vector.

It exposes an issue in DAGCombiner::convertBuildVecZextToZext as some/all of the zero-extensions may be relaxed to ANY_EXTEND, so we need to handle that case to avoid a couple of AVX2 VPMOVZX test regressions.

Once this is in it should be easier to fix a number of remaining failures to fold loads into VBROADCAST nodes.

Differential Revision: https://reviews.llvm.org/D59484

llvm-svn: 356989
2019-03-26 12:32:01 +00:00
Yi Kong 74b874ac4c Fix nondeterminism introduced in r353954
DenseMap iteration order is not guaranteed, use MapVector instead.

Fix provided by srhines.

Differential Revision: https://reviews.llvm.org/D59807

llvm-svn: 356988
2019-03-26 12:18:08 +00:00
Eli Friedman 1e5d569c8c [ARM] Add missing memory operands to a bunch of instructions.
This should hopefully lead to minor improvements in code generation, and
more accurate spill/reload comments in assembly.

Also fix isLoadFromStackSlotPostFE/isStoreToStackSlotPostFE so they
don't lead to misleading assembly comments for merged memory operands;
this is technically orthogonal, but in practice the relevant memory
operand lists don't show up without this change.

Differential Revision: https://reviews.llvm.org/D59713

llvm-svn: 356963
2019-03-25 22:42:30 +00:00
Sanjay Patel 9bcb0766eb [x86] add tests for vector cmps; NFC
llvm-svn: 356959
2019-03-25 22:08:45 +00:00
Matt Arsenault b008b37b61 AMDGPU: Make collapse-endcf test more useful
Without a VALU instruction in the return block, these were mostly
testing the path to delete exec mask code before s_endpgm rather than
the end cf handling.

llvm-svn: 356955
2019-03-25 21:28:51 +00:00
Eli Friedman 92d0d13366 [AArch64] Prefer "mov" over "orr" to materialize constants.
This is generally more readable due to the way the assembler aliases
work.

(This causes a lot of test changes, but it's not really as scary as it
looks at first glance; it's just mechanically changing a bunch of checks
for orr to check for mov instead.)

Differential Revision: https://reviews.llvm.org/D59720

llvm-svn: 356954
2019-03-25 21:25:28 +00:00
Konstantin Zhuravlyov 51809cbc98 AMDGPU: Add support for cross address space synchronization scopes
Differential Revision: https://reviews.llvm.org/D59517

llvm-svn: 356946
2019-03-25 20:50:21 +00:00
Simon Pilgrim 167af1bafb [SelectionDAG] Add icmp UNDEF handling to SelectionDAG::FoldSetCC
First half of PR40800, this patch adds DAG undef handling to icmp instructions to match the behaviour in llvm::ConstantFoldCompareInstruction and SimplifyICmpInst. This permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.).

This involved a lot of tweaking of reduced tests, as bugpoint loves to reduce icmp arguments to undef.

Differential Revision: https://reviews.llvm.org/D59363

llvm-svn: 356938
2019-03-25 18:51:57 +00:00
Sanjay Patel f49e33e252 [x86] add another vector zext test; NFC
Goes with the proposal in D59777

llvm-svn: 356930
2019-03-25 17:53:56 +00:00
Matt Arsenault b27e4974d0 MISched: Don't schedule regions with 0 instructions
I think this is correct, but may not necessarily be the correct fix
for the assertion I'm really trying to solve. If a scheduling region
was found that only has dbg_value instructions, the RegPressure
tracker would end up in an inconsistent state because it would skip
over any debug instructions and point to an instruction outside of the
scheduling region. It may still be possible for this to happen if
there are some real schedulable instructions between dbg_values, but I
haven't managed to break this.

The testcase is extremely sensitive and I'm not sure how to make it
more resistant to future scheduler changes that would avoid stressing
this situation.

llvm-svn: 356926
2019-03-25 17:15:44 +00:00
Sanjay Patel 76c1ef3d07 [x86] add tests for vector zext; NFC
The AVX1 lowering is poor.

llvm-svn: 356914
2019-03-25 15:54:34 +00:00
Jonas Paulsson 0e75e21eb3 [RegAlloc] Simplify MIR test
Remove the IR part from test/CodeGen/X86/regalloc-copy-hints.mir (added by
r355854).

To make the test remain functional, the parts of the MBB names referring to
BB names have been removed, as well as all machine memory operands.

llvm-svn: 356899
2019-03-25 14:28:32 +00:00
Petar Avramovic a034a64f84 [MIPS GlobalISel] Select copy for arguments from FPRBRegBank
Move selectCopy into MipsInstructionSelector class.
Select copy for arguments from FPRBRegBank for MIPS32.

Differential Revision: https://reviews.llvm.org/D59644

llvm-svn: 356886
2019-03-25 11:38:06 +00:00
Petar Avramovic 3dfa368d5d [MIPS GlobalISel] Add floating point register bank
Add floating point register bank for MIPS32.
Implement getRegBankFromRegClass for float register classes.

Differential Revision: https://reviews.llvm.org/D59643

llvm-svn: 356883
2019-03-25 11:30:46 +00:00
Petar Avramovic 5a457e08f6 [MIPS GlobalISel] Lower float and double arguments in registers
Lower float and double arguments in registers for MIPS32.
When float/double argument is passed through gpr registers
select appropriate move instruction.

Differential Revision: https://reviews.llvm.org/D59642

llvm-svn: 356882
2019-03-25 11:23:41 +00:00
Diana Picus 254b11a0fd [ARM GlobalISel] 64-bit memops should be aligned
We currently use only VLDR/VSTR for all 64-bit loads/stores, so the
memory operands must be word-aligned. Mark aligned operations as legal
and narrow non-aligned ones to 32 bits.

While we're here, also mark non-power-of-2 loads/stores as unsupported.

llvm-svn: 356872
2019-03-25 08:54:29 +00:00
Craig Topper 7c2554dd92 Revert r356688 "[X86] Don't avoid folding multiple use sign extended 8-bit immediate into instructions under optsize."
Looking back over how the one use optimization works, I don't think this is the right way to fix this.

llvm-svn: 356866
2019-03-25 01:25:32 +00:00
Simon Pilgrim 87d4ab8b92 [X86][SSE41] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685)
Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure.

This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain.

llvm-svn: 356864
2019-03-24 19:06:35 +00:00
Simon Pilgrim 4465a765ee [X86] Remove icmp undef from reduced tests
Pre-commit for D59363 (Add icmp UNDEF handling to SelectionDAG::FoldSetCC)

Approved by @spatel (Sanjay Patel)

llvm-svn: 356859
2019-03-24 17:02:08 +00:00
Simon Pilgrim a71c0ed471 [X86][AVX] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685)
Just enable this for AVX for now as SSE41 introduces extra register moves for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern (but otherwise helps reduce port5 usage on Intel targets).

Only AVX support is required for PR40685 as the issue is due to 8i8->8i32 zext shuffle leftovers.

llvm-svn: 356858
2019-03-24 16:30:35 +00:00
Sanjay Patel 7d676dfd86 [x86] improve the default expansion of uaddsat/usubsat
This is yet another step towards solving PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613

uaddsat X, Y --> (X >u (X + Y)) ? -1 : X + Y
usubsat X, Y --> (X >u Y) ? X - Y : 0

We can't count on a sane vector ISA, so override the default (umin/umax)
expansion of unsigned add/sub saturate in cases where we do not have umin/umax.
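
A scalar sketch of those expansions (illustration only, assuming 32-bit lanes; the actual change is in the generic vector expansion):

```
// Scalar sketch of the uaddsat/usubsat expansions shown above.
#include <cassert>
#include <cstdint>

uint32_t uaddsat(uint32_t X, uint32_t Y) {
  uint32_t Sum = X + Y;
  return X > Sum ? UINT32_MAX : Sum;  // (X >u (X + Y)) ? -1 : X + Y
}

uint32_t usubsat(uint32_t X, uint32_t Y) {
  return X > Y ? X - Y : 0;           // (X >u Y) ? X - Y : 0
}

int main() {
  assert(uaddsat(0xFFFFFFF0u, 0x20u) == 0xFFFFFFFFu);  // overflow clamps to all-ones
  assert(usubsat(0x10u, 0x20u) == 0u);                 // underflow clamps to zero
  return 0;
}
```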

Differential Revision: https://reviews.llvm.org/D59006

llvm-svn: 356855
2019-03-24 13:55:54 +00:00
Eli Friedman b906bba576 [ARM] Don't form "ands" when it isn't scheduled correctly.
In r322972/r323136, the iteration here was changed to catch cases at the
beginning of a basic block... but we accidentally deleted an important
safety check.  Restore that check to the way it was.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41116

Differential Revision: https://reviews.llvm.org/D59680

llvm-svn: 356809
2019-03-22 20:49:15 +00:00
Craig Topper ce1ed55a4a [X86] Use xmm registers to implement 64-bit popcnt on 32-bit targets if possible if popcnt instruction is not available
On 32-bit targets without popcnt, we currently expand 64-bit popcnt to sequences of arithmetic and logic ops for each 32-bit half and then add the 32-bit halves together. If we have xmm registers we can use those to implement the operation instead. This results in fewer instructions than doing two separate 32-bit popcnt sequences.
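
For reference, a scalar illustration of the pre-existing shape of that expansion, counting each 32-bit half and adding (illustration only; the patch replaces this with an xmm-based sequence when SSE2 is available):

```
// Popcount a 64-bit value via its two 32-bit halves.
#include <cassert>
#include <cstdint>

static uint32_t popcnt32(uint32_t V) {
  V = V - ((V >> 1) & 0x55555555u);
  V = (V & 0x33333333u) + ((V >> 2) & 0x33333333u);
  V = (V + (V >> 4)) & 0x0F0F0F0Fu;
  return (V * 0x01010101u) >> 24;
}

static uint32_t popcnt64(uint64_t V) {
  return popcnt32(static_cast<uint32_t>(V)) + popcnt32(static_cast<uint32_t>(V >> 32));
}

int main() {
  assert(popcnt64(0xFFFF0000FFFF0001ULL) == 33);
  return 0;
}
```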

This mitigates some of PR41151 for the i64 on i686 case when we have SSE2.

Differential Revision: https://reviews.llvm.org/D59662

llvm-svn: 356808
2019-03-22 20:47:02 +00:00
Craig Topper 1ffd8e8114 [X86] Use movq for i64 atomic load on 32-bit targets when sse2 is enabled
We used a lock cmpxchg8b to do i64 atomic loads. But if we have SSE2 we can do better and use a plain movq to do the load instead.

I tried to just use an f64 atomic load and add isel patterns to MOVSD (which the domain fixing pass can turn into MOVQ), but the atomic_load SDNode in TargetSelectionDAG.td requires the type to be integer.

So I've emitted VZEXT_LOAD instead which should be selected by isel to a MOVQ. Hopefully we don't need a specific atomic flavor of this. I kept the memory operand from the original AtomicSDNode. I wasn't sure if I might need to set the MOVolatile flag?

I've left some FIXMEs for improvements we can do without SSE2.

Differential Revision: https://reviews.llvm.org/D59679

llvm-svn: 356807
2019-03-22 20:46:56 +00:00
Evandro Menezes 4a7739b681 [AArch64, ARM] Add support for Exynos M5
Add Exynos M5 support and test cases.

llvm-svn: 356793
2019-03-22 18:42:14 +00:00
James Y Knight c0e6b8ac3a IR: Support parsing numeric block ids, and emit them in textual output.
Just as as llvm IR supports explicitly specifying numeric value ids
for instructions, and emits them by default in textual output, now do
the same for blocks.

This is a slightly incompatible change in the textual IR format.

Previously, llvm would parse numeric labels as string names. E.g.
  define void @f() {
    br label %"55"
  55:
    ret void
  }
defined a label *named* "55", even without needing to be quoted, while
the reference required quoting. Now, if you intend a block label which
looks like a value number to be a name, you must quote it in the
definition too (e.g. `"55":`).

Previously, llvm would print nameless blocks only as a comment, and
would omit it if there was no predecessor. This could cause confusion
for readers of the IR, just as unnamed instructions did prior to the
addition of "%5 = " syntax, back in 2008 (PR2480).

Now, it will always print a label for an unnamed block, with the
exception of the entry block. (IMO it may be better to print it for
the entry-block as well. However, that requires updating many more
tests.)

Thus, the following is supported, and is the canonical printing:
  define i32 @f(i32, i32) {
    %3 = add i32 %0, %1
    br label %4

  4:
    ret i32 %3
  }

New test cases covering this behavior are added, and other tests
updated as required.

Differential Revision: https://reviews.llvm.org/D58548

llvm-svn: 356789
2019-03-22 18:27:13 +00:00
Simon Pilgrim aea9db9d40 [X86] Regenerate powi tests to include i686 x87/sse targets
llvm-svn: 356787
2019-03-22 18:04:28 +00:00
Simon Pilgrim 08380afaab [X86] Add PR13897 test case (i128 mul on i686)
llvm-svn: 356786
2019-03-22 17:52:21 +00:00
Simon Pilgrim 564392d752 [X86] lowerShuffleAsBitMask - ensure float bit masks are the correct width (PR41203)
llvm-svn: 356784
2019-03-22 17:23:55 +00:00
Sanjay Patel 221081e365 [x86] auto-generate complete test checks; NFC
llvm-svn: 356763
2019-03-22 15:33:59 +00:00
Sanjay Patel 0893351c1c [x86] auto-generate complete test checks; NFC
llvm-svn: 356762
2019-03-22 15:33:55 +00:00
Sanjay Patel 61e2333acb [x86] add 'nounwind' to tests to reduce noise; NFC
llvm-svn: 356761
2019-03-22 15:33:51 +00:00
Sanjay Patel f39494e795 [x86] auto-generate complete checks for test; NFC
llvm-svn: 356760
2019-03-22 15:33:47 +00:00
Tim Renouf 6f0191a55a [AMDGPU] Use three- and five-dword result type in image ops
Some image ops return three or five dwords.  Previously, we modeled that
with a 4 or 8 dword register class.  The register allocator could
cleverly spot that some subregs were dead and allocate something else
there, but that caused a de-optimization where waitcnt insertion would
think that the result was used immediately.

This commit allows such an image op to have a three- or
five-dword result, avoiding the above de-optimization.

Differential Revision: https://reviews.llvm.org/D58905

Change-Id: I3651211bbd7ed22721ee7b9fefd7bcc60a809d8b
llvm-svn: 356757
2019-03-22 15:21:11 +00:00
Tim Renouf 677387d8dc [AMDGPU] Implemented dwordx3 variants of buffer/tbuffer load/store intrinsics
Now that we have vec3 MVTs, this commit implements dwordx3 variants of the
buffer intrinsics.

On gfx6, a dwordx3 buffer load intrinsic is implemented as a dwordx4
instruction, and a dwordx3 buffer store intrinsic is not supported.
We need to support the dwordx3 load intrinsic because it is generated by
subtarget-unaware code in InstCombine.

Differential Revision: https://reviews.llvm.org/D58904

Change-Id: I016729d8557b98a52f529638ae97c340a5922a4e
llvm-svn: 356755
2019-03-22 14:58:02 +00:00
Alex Bradbury dab1f6fc4e [RISCV] Add basic RV32E definitions and MC layer support
The RISC-V ISA defines RV32E as an alternative "base" instruction set
encoding, that differs from RV32I by having only 16 rather than 32 registers.
This patch adds basic definitions for RV32E as well as MC layer support
(assembling, disassembling) and tests. The only supported ABI on RV32E is
ILP32E.

Add a new RISCVFeatures::validate() helper to RISCVUtils which can be called
from codegen or MC layer libraries to validate the combination of TargetTriple
and FeatureBitSet. Other targets have similar checks (e.g. erroring if SPE is
enabled on PPC64 or oddspreg + o32 ABI on Mips), but they either duplicate the
checks (Mips), or fail to check for both codegen and MC codepaths (PPC).

Codegen for the ILP32E ABI support and RV32E codegen are left for a future
patch/patches.

Differential Revision: https://reviews.llvm.org/D59470

llvm-svn: 356744
2019-03-22 11:21:40 +00:00
Alex Bradbury b9e78c3994 [RISCV] Optimize emission of SELECT sequences
This patch optimizes the emission of a sequence of SELECTs with the same
condition, avoiding the insertion of unnecessary control flow. Such a sequence
often occurs when a SELECT of values wider than XLEN is legalized into two
SELECTs with legal types. We have identified several use cases where the
SELECTs could be interleaved with other instructions. Therefore, we extend the
sequence to include non-SELECT instructions if we are able to detect that the
non-SELECT instructions do not impact the optimization.

This patch supersedes https://reviews.llvm.org/D59096, which attempted to
address this issue by introducing a new SelectionDAG node. Hat tip to Eli
Friedman for his feedback on how to best handle this issue.

Differential Revision: https://reviews.llvm.org/D59355
Patch by Luís Marques.

llvm-svn: 356741
2019-03-22 10:45:03 +00:00
Alex Bradbury 3369101158 [RISCV] Allow conversion of CC logic to bitwise logic
Indicates in the TargetLowering interface that conversions from CC logic to
bitwise logic are allowed. Adds tests that show the benefit when optimization
opportunities are detected. Also adds tests that show that when the optimization
is not applied, correct code is generated (but opportunities for other
optimizations remain).

Differential Revision: https://reviews.llvm.org/D59596
Patch by Luís Marques.

llvm-svn: 356740
2019-03-22 10:39:22 +00:00
Tim Renouf 033f99a2e5 [AMDGPU] Added v5i32 and v5f32 register classes
They are not used by anything yet, but a subsequent commit will start
using them for image ops that return 5 dwords.

Differential Revision: https://reviews.llvm.org/D58903

Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271
llvm-svn: 356735
2019-03-22 10:11:21 +00:00
Craig Topper b865084ef3 [X86] Add 32-bit command lines with and without SSE2 to atomic-non-integer.ll. NFC
llvm-svn: 356733
2019-03-22 04:28:40 +00:00
Yonghong Song a1ffe2fa49 [BPF] fix flaky btf unit test static-var-derived-type.ll
The DataSecEntries is defined as an unordered_map since
order does not really matter.
  std::unordered_map<std::string, std::unique_ptr<BTFKindDataSec>>
      DataSecEntries;
This seems to cause the test static-var-derived-type.ll to be flaky,
as the two sections ".bss" and ".readonly" have nondeterministic
ordering when iterating the map, which decides the
output assembly code sequence of BTF_KIND_DATASEC entries.

Fix the test to have only one data section to remove
flakiness.

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 356731
2019-03-22 02:54:47 +00:00
Yonghong Song ded9a440d0 [BPF] handle derived type properly for computing type id
Currently, the type id for a derived type is computed incorrectly.
For example,
  type #1: int
  type #2: ptr to #1

For a global variable "int *a", type #1 will be attributed to variable "a".
This is due to a bug which assigns the type id of the basetype of
that derived type as the derived type's type id. This happens
to "const", "volatile", "restrict", "typedef" and "pointer" types.

This patch fixed this bug, fixed existing test cases and added
a new one focusing on pointers plus other derived types.

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 356727
2019-03-22 01:30:50 +00:00
Craig Topper 056b9a995b [X86] Autogenerate complete checks. NFC
llvm-svn: 356723
2019-03-21 23:09:56 +00:00
Amara Emerson c10b24691a [AArch64] Split the neon.addp intrinsic into integer and fp variants.
This is the result of discussions on the list about how to deal with intrinsics
which require codegen to disambiguate them via only the integer/fp overloads.
It causes problems for GlobalISel as some of that information is lost during
translation, while with other operations like IR instructions the information is
encoded into the instruction opcode.

This patch changes clang to emit the new faddp intrinsic if the vector operands
to the builtin have FP element types. LLVM IR AutoUpgrade has been taught to
upgrade existing calls to aarch64.neon.addp with fp vector arguments, and
we remove the workarounds introduced for GlobalISel in r355865.

This is a more permanent solution to PR40968.

Differential Revision: https://reviews.llvm.org/D59655

llvm-svn: 356722
2019-03-21 22:31:37 +00:00
Matt Arsenault 9a1a1f7bb2 Mips: Don't create copy of nothing
This was creating a copy of the register the pseudo itself was
def'ing, leaving a copy of an undefined register. I'm not sure how
the verifier is not catching this, but this avoids asserting in a
future change to RegAllocFast

llvm-svn: 356716
2019-03-21 20:56:05 +00:00
Matt Arsenault b34afa311d GlobalISel: Fix RegBankSelect for REG_SEQUENCE
The AArch64 test was broken since the result register already had a
set register class, so this test was a no-op. The mapping verify call
would fail because the result size is not the same as the inputs like
in a copy or phi.

The AMDGPU testcases are half broken and introduce illegal VGPR->SGPR
copies, which need much more work to handle correctly (same for phis), but
they are added as a baseline.

llvm-svn: 356713
2019-03-21 20:45:36 +00:00
Simon Pilgrim c2e4405475 [X86] canonicalizeBitSelect - don't attempt to canonicalize mask registers
We don't use X86ISD::ANDNP for mask registers.

Test case from @craig.topper (Craig Topper)

llvm-svn: 356696
2019-03-21 18:32:38 +00:00
Sanjay Patel 0760758fed [x86] add tests with movmsk potential (PR39665); NFC
llvm-svn: 356691
2019-03-21 17:57:56 +00:00
Craig Topper c14f3e4222 [X86] Don't avoid folding multiple use sign extended 8-bit immediate into instructions under optsize.
Under optsize we try to avoid folding immediates into instructions. But if the immediate is 16 or 32 bits wide yet can be encoded as an 8-bit immediate, we don't save enough by disabling the folding unless the immediate has enough uses to make up for the size of the move, which is either 3 or 5 bytes since there is no sign-extended 8-bit move. We would also save something if the immediate were live out of the basic block and thus a move was unavoidable, but that would require a more advanced heuristic than just counting uses.

Note that we only avoid folding multiple-use immediates into the patterns that use X86ISD::ADD/SUB/XOR/OR/AND/CMP/ADC/SBB nodes, not the more common ISD::ADD/SUB/XOR/OR/AND nodes.

Differential Revision: https://reviews.llvm.org/D59522

llvm-svn: 356688
2019-03-21 17:38:58 +00:00
Craig Topper 9f0b17a248 [ScalarizeMaskedMemIntrin] Add support for scalarizing expandload and compressstore intrinsics.
This adds support for scalarizing these intrinsics as well the X86TargetTransformInfo support to avoid scalarizing them in the cases X86 can handle.

I've omitted handling special cases for constant masks in this first pass, though CodeGenPrepare can constant-fold the branch conditions and remove some of the control flow anyway.

Fixes PR40994 and covers most of PR3666. We might want to implement constant masks to close that one out.

Differential Revision: https://reviews.llvm.org/D59180

llvm-svn: 356687
2019-03-21 17:38:52 +00:00
Krzysztof Parzyszek 4719502941 Add more rotate tests, including ORs of rotates
This is a part of https://reviews.llvm.org/D47735.
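
For context, a rotate usually reaches the DAG as an OR of two shifts; a
typical source form (illustrative only, not one of the added tests) is:

  // Rotate-left written as an OR of shifts, UB-free for any n.
  unsigned rotl32(unsigned x, unsigned n) {
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
  }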

llvm-svn: 356683
2019-03-21 17:14:22 +00:00
Simon Pilgrim da4992bf8d [DAGCombine] SimplifySelectCC - call FoldSetCC with the setcc result type
We were calling FoldSetCC with the compare operand type instead of the result type.

Found by OSS-Fuzz #13838 (https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13838)

llvm-svn: 356667
2019-03-21 14:07:18 +00:00
Sanjay Patel d47eac59ef [CodeGenPrepare] limit formation of overflow intrinsics (PR41129)
This is probably a bigger limitation than necessary, but since we don't have any evidence yet
that this transform led to real-world perf improvements rather than regressions, I'm making a
quick, blunt fix.

In the motivating x86 example from:
https://bugs.llvm.org/show_bug.cgi?id=41129
...and shown in the regression test, we want to avoid an extra instruction in the dominating
block because that could be costly.
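
Illustrative only (not the exact test from the PR): the kind of source that
can be turned into a usub.with.overflow by CGP, pulling a 'sub' up into the
dominating block:

  // Saturating subtract: the compare and the subtract may be merged into
  // llvm.usub.with.overflow, which is the transform being limited here.
  unsigned saturating_sub(unsigned x, unsigned y) {
    return x < y ? 0 : x - y;
  }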

The x86 LSR test diff reverses the changes from D57789. There's no evidence
yet that one version is better than the other.

Differential Revision: https://reviews.llvm.org/D59602

llvm-svn: 356665
2019-03-21 13:57:07 +00:00
Simon Pilgrim 87d261bfd3 [Thumb] Fix infinite loop in ABS expansion (PR41160)
Don't expand the ISD::ABS node if it's legal.

llvm-svn: 356661
2019-03-21 12:41:18 +00:00
Tim Renouf 361b5b2193 [AMDGPU] Support for v3i32/v3f32
Added support for dwordx3 for most load/store types, but not DS, and not
intrinsics yet.

SI (gfx6) does not have dwordx3 instructions, so they are not enabled
there.

Some of this patch is from Matt Arsenault, also of AMD.

Differential Revision: https://reviews.llvm.org/D58902

Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9
llvm-svn: 356659
2019-03-21 12:01:21 +00:00
Oliver Stannard defdb1070f [AArch64] Allow -mattr=tpidr-el[1|2|3]
Added subtarget features for AArch64 to use TPIDR_EL[1|2|3] as the TLS base
register, rather than the default TPIDR_EL0.

Patch by Philip Derrin!

Differential revision: https://reviews.llvm.org/D54685

llvm-svn: 356657
2019-03-21 11:30:17 +00:00
Simon Pilgrim 54ed653870 [SelectionDAG] Add scalarization of ABS node (PR41149)
Patch by: @ikulagin (Ivan Kulagin)

Differential Revision: https://reviews.llvm.org/D59577

llvm-svn: 356656
2019-03-21 11:18:54 +00:00
Craig Topper 8d46403b8e [X86] Add CMPXCHG8B feature flag. Set it for all CPUs except i386/i486 including 'generic'. Disable use of CMPXCHG8B when this flag isn't set.
CMPXCHG8B was introduced with the i586/Pentium generation.

If it's not enabled, limit the atomic width to 32 bits so that AtomicExpandPass will expand wider atomics to lib calls. It's unclear whether we should be using a different limit for other configs. The default is 1024, and experimentation shows that using an i256 atomic will cause a crash in SelectionDAG.
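
A rough sketch of the idea in the X86 lowering setup (fragment; the feature
accessor name is an assumption, not taken from the patch):

  // Without cmpxchg8b, cap inline atomics at 32 bits so AtomicExpandPass
  // turns wider atomics into __atomic_* library calls.
  if (!Subtarget.hasCMPXCHG8B())   // assumed accessor for the new flag
    setMaxAtomicSizeInBitsSupported(32);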

Differential Revision: https://reviews.llvm.org/D59576

llvm-svn: 356631
2019-03-20 23:35:49 +00:00
Stanislav Mekhanoshin 0a11829ab2 Allow machine dce to remove uses in the same instruction
Machine DCE cannot remove a dead definition if there are non-dbg uses.
A use, however, can be in the same instruction:

  dead %0 = INST %0

Such instructions are sometimes created by the Detect Dead Lanes pass.

Allow this instruction to be deleted despite the use if the only use
belongs to the same instruction.
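
A sketch of the relaxed check (illustrative, with an assumed helper name):

  #include "llvm/CodeGen/MachineInstr.h"
  #include "llvm/CodeGen/MachineRegisterInfo.h"

  // A def is still removable if every non-dbg user of the register is the
  // defining instruction itself (e.g. "dead %0 = INST %0").
  static bool onlySelfUses(const llvm::MachineRegisterInfo &MRI, unsigned Reg,
                           const llvm::MachineInstr &Def) {
    for (const llvm::MachineInstr &UseMI : MRI.use_nodbg_instructions(Reg))
      if (&UseMI != &Def)
        return false;
    return true;
  }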

Differential Revision: https://reviews.llvm.org/D59565

llvm-svn: 356619
2019-03-20 21:42:05 +00:00
Craig Topper 0367553304 [X86] Call lowerShuffleAsBitMask for 512-bit vectors in lowerShuffleAsBlend.
This patch enables the use of lowerShuffleAsBitMask for 512-bit blends before
falling back to a move immediate, a GPR-to-k-register move, and a masked op.

I had to make some changes to support v8i64 when i64 is not a legal type,
and to support floating point types.

This trades a load for the move immediate and GPR move, at the cost of higher
latency. But it's probably better for register pressure not to have to hop
through other register classes. The load+and should also play better with
LICM and rematerialization, I think.

Differential Revision: https://reviews.llvm.org/D59479

llvm-svn: 356618
2019-03-20 21:30:20 +00:00
Matt Arsenault 2065206a9d AMDGPU: Don't look for constant in insert/extract_vector_elt regbankselect
The constantness shouldn't change the register bank choice. We also
don't need to restrict this to only indexing VGPRs, since it's
possible to index SGPRs (but SelectionDAG made using this
difficult). Allow directly indexing SGPRs when appropriate.

llvm-svn: 356611
2019-03-20 20:41:34 +00:00
Thomas Lively f6f4f84378 [WebAssembly] Target features section
Summary:
Implements a new target features section in assembly and object files
that records what features are used, required, and disallowed in
WebAssembly objects. The linker uses this information to ensure that
all objects participating in a link are feature-compatible and records
the set of used features in the output binary for use by optimizers
and other tools later in the toolchain.

The "atomics" feature is always required or disallowed to prevent
linking code with stripped atomics into multithreaded binaries. Other
features are marked used if they are enabled globally or on any
function in a module.

Future CLs will add linker flags for ignoring feature compatibility
checks and for specifying the set of allowed features, implement using
the presence of the "atomics" feature to control the type of memory
and segments in the linked binary, and add front-end flags for
relaxing the linkage policy for atomics.

Reviewers: aheejin, sbc100, dschuff

Subscribers: jgravelle-google, hiraditya, sunfish, mgrang, jfb, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59173

llvm-svn: 356610
2019-03-20 20:26:45 +00:00
Michael Liao eea5177d30 [AMDGPU] Fix clamp bit DAG operand
Summary:
- Should use `targetconstant` instead of `constant` for the clamp bit
  operand, which is expected to be an immediate. Under certain conditions,
  such as when a common `i1 false` constant is used elsewhere and gets
  selected before the instruction with the clamp bit, a register operand
  may be added instead of an immediate one. Use `targetconstant` to enforce
  that (see the sketch below).
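
A sketch of the difference during selection (fragment, illustrative only):

  // A TargetConstant can only ever be an immediate operand, so isel cannot
  // end up reusing/selecting it as a register the way a plain Constant
  // node can be.
  SDValue Clamp = CurDAG->getTargetConstant(0, DL, MVT::i1);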

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59608

llvm-svn: 356608
2019-03-20 20:18:56 +00:00
Pete Couperus b062239d63 [ARC] Add ARCOptAddrMode pass to generate postincrement loads/stores.
Build on newly introduced ARC postincrement loads/stores from r356200.
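
The access pattern these post-increment forms target looks like this
(illustrative only, not from the patch):

  // Load, then advance the pointer: a natural post-increment load.
  int sum(const int *p, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
      s += *p++;
    return s;
  }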

Patch By Denis Antrushin! <denis@synopsys.com>

Differential Revision: https://reviews.llvm.org/D59409

llvm-svn: 356606
2019-03-20 20:06:21 +00:00
Eli Friedman 638be660d7 [ARM] Eliminate redundant "mov rN, sp" instructions in Thumb1.
This takes sequences like "mov r4, sp; str r0, [r4]", and optimizes them
to something like "str r0, [sp]".

For regular stack variables, this optimization was already implemented:
we lower loads and stores using frame indexes, which are expanded later.
However, when constructing a call frame for a call with more than four
arguments, the existing optimization doesn't apply.  We need to use
stores which are actually relative to the current value of sp, and don't
have an associated frame index.
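
A small example of the case in question (illustrative): with more than four
integer arguments, the extra ones are stored to the outgoing call frame
relative to sp.

  void callee(int, int, int, int, int, int);
  void caller() {
    // r0-r3 carry the first four arguments; the fifth and sixth are
    // stored at [sp] and [sp, #4] when the call frame is set up.
    callee(0, 1, 2, 3, 4, 5);
  }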

This patch adds a special case to handle that construct.  At the DAG
level, this is an ISD::STORE where the address is a CopyFromReg from SP
(plus a small constant offset).

This applies only to Thumb1: in Thumb2 or ARM mode, a regular store
instruction can access SP directly, so the COPY gets eliminated by
existing code.

The change to ARMDAGToDAGISel::SelectThumbAddrModeSP is a related
cleanup: we shouldn't pretend that it can select anything other than
frame indexes.

Differential Revision: https://reviews.llvm.org/D59568

llvm-svn: 356601
2019-03-20 19:40:45 +00:00
Tim Renouf e7bd52f86e [AMDGPU] Added MsgPack format PAL metadata
Summary:
PAL metadata now supports both the old linear reg=val pairs format and
the new MsgPack format.

The MsgPack format uses YAML as its textual representation. On output to
YAML, a mnemonic name is provided for some hardware registers.

Differential Revision: https://reviews.llvm.org/D57028

Change-Id: I2bbaabaaca4b3574f7e03b80fbef7c7a69d06a94
llvm-svn: 356591
2019-03-20 18:47:21 +00:00
Tim Renouf d737b551e9 [AMDGPU] Factored PAL metadata handling out into its own class
Summary:
This commit introduces a new AMDGPUPALMetadata class that:
* is inside the AMDGPU target;
* keeps an in-memory representation of PAL metadata;
* provides a method to read the frontend-supplied metadata from LLVM IR;
* provides methods for the asm printer to set metadata items;
* provides methods to write the metadata as a binary blob to put in a
  .note record or as an asm directive;
* provides a method to read the metadata as a binary blob from a .note
  record.

Because llvm-readobj cannot call directly into a target, I had to remove
llvm-readobj's ability to dump PAL metadata, pending a resolution to
https://reviews.llvm.org/D52821

Differential Revision: https://reviews.llvm.org/D57027

Change-Id: I756dc830894fcb6850324cdcfa87c0120eb2cf64
llvm-svn: 356582
2019-03-20 17:42:00 +00:00
Sanjay Patel fb44f99b73 [CGP][x86] add tests for usubo regression (PR41129); NFC
llvm-svn: 356559
2019-03-20 15:02:35 +00:00
Clement Courbet 238af52ded [ExpandMemCmp] Trigger on bcmp too.
Summary: Fixes 41150.
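
For reference (illustrative, not one of the added tests): bcmp only reports
equal vs. not-equal, so an equality-only memcmp expansion applies to it as
well.

  #include <strings.h>
  bool same16(const void *a, const void *b) {
    return bcmp(a, b, 16) == 0;  // expandable just like memcmp(a, b, 16) == 0
  }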

Reviewers: gchatelet

Subscribers: hiraditya, llvm-commits, ckennelly, sbenza, jyknight

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59593

llvm-svn: 356550
2019-03-20 11:51:11 +00:00
David Stuttard fc2a747345 [AMDGPU] Allow MIMG with no uses in adjustWritemask in isel
Summary:
If an MIMG instruction has managed to get through to adjustWritemask in isel but
has no uses (and doesn't enable TFC) then prevent an assertion by not attempting
to adjust the writemask.

The instruction will be removed anyway.

Change-Id: I9a5dba6bafe1f35ac99c1b73df390936e2ac27a7

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58964

llvm-svn: 356540
2019-03-20 09:29:55 +00:00
Craig Topper fda1f96d28 [X86] Remove X32 check lines from a test that doesn't have an X32 FileCheck prefix. Regenerate the test using update_llc_test_checks. NFC
llvm-svn: 356535
2019-03-20 03:13:28 +00:00
Eli Friedman 2596e8b3e7 [ARM] Make sure to save/restore LR when we use tBfar.
This change does two things. One, it ensures compilation will abort
instead of miscompiling if ARMFrameLowering::determineCalleeSaves
chooses not to save LR in a case where it's necessary.  Two, it changes
the way we estimate the size of a function to be more conservative in
the presence of constant pool entries and jump tables.

EstimateFunctionSizeInBytes probably still isn't really conservative
enough, but I'm not sure how we can come up with a reliable estimate
before constant islands runs.

Differential Revision: https://reviews.llvm.org/D59439

llvm-svn: 356527
2019-03-19 21:48:08 +00:00