llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	41e5ac4fa4	TargetMachine: Add address space to getPointerSize llvm-svn: 327467	2018-03-14 00:36:23 +00:00
Craig Topper	ec4881ad53	[X86] Simplify the LowerAVXCONCAT_VECTORS code a little by creating a single path for insert_subvector handling. We now only create recursive concats if we have more than two non-zero values. This keeps our subvector broadcast DAG combine functioning. llvm-svn: 327457	2018-03-13 22:36:07 +00:00
Craig Topper	cc060e921b	[X86] Rewrite LowerAVXCONCAT_VECTORS similar to how we handle vXi1 concats. This is better able to detect undef and zero pieces in the concat, or cases when only one subvector is non-zero. This allows us to avoid silly things like double inserts into progressively larger undefs. This still builds 512 bit concats of 128 bits by building up through 256 bits first. But I don't know if that's best. We probably want to merge this with the vXi1 concat code since they are very similar. llvm-svn: 327454	2018-03-13 22:05:25 +00:00
Craig Topper	7e711a6822	[X86] Remove SplitBinaryOpsAndApply and use SplitOpsAndApply by adding curly braces around the ops. Summary: Unless you were intentionally avoiding this syntax? I saw you mentioned makeArrayRef in your commit that added SplitOpsAndApply. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44403 llvm-svn: 327418	2018-03-13 16:23:27 +00:00
Simon Pilgrim	93bd7187f4	[X86][SSE41] createVariablePermute v2X64 - PCMPEQQ can test for index 0/1 and select between them. llvm-svn: 327385	2018-03-13 12:22:58 +00:00
Craig Topper	acaba3b402	[X86] Remove use of MVT class from the ShuffleDecode library. MVT belongs to the CodeGen layer, but ShuffleDecode is used by the X86 InstPrinter which is part of the MC layer. This only worked because MVT is completely implemented in a header file with no other library dependencies. Differential Revision: https://reviews.llvm.org/D44353 llvm-svn: 327292	2018-03-12 16:43:11 +00:00
Simon Pilgrim	6618e2a09c	[X86][SSE] createVariablePermute - PSHUFB requires SSSE3 not just SSE3 llvm-svn: 327259	2018-03-12 12:30:04 +00:00
Craig Topper	7cc1b1fc84	[X86] Don't compute known bits twice for the same SDValue in LowerMUL. We called MaskedValueIsZero with two different masks, but underneath that calls computeKnownBits before applying the mask. This means we compute the same known bits twice due to the two calls. Instead just call computeKnownBits directly and apply the two masks ourselves. llvm-svn: 327251	2018-03-12 05:35:02 +00:00
Simon Pilgrim	d09cc9c62c	[X86][MMX] Support MMX build vectors to avoid SSE usage (PR29222) 64-bit MMX vector generation usually ends up lowering into SSE instructions before being spilled/reloaded as a MMX type. This patch creates a MMX vector from MMX source values, taking the lowest element from each source and constructing broadcasts/build_vectors with direct calls to the MMX PUNPCKL/PSHUFW intrinsics. We're missing a few consecutive load combines that could be handled in a future patch if that would be useful - my main interest here is just avoiding a lot of the MMX/SSE crossover. Differential Revision: https://reviews.llvm.org/D43618 llvm-svn: 327247	2018-03-11 19:22:13 +00:00
Simon Pilgrim	30f74c14ff	[X86][AVX] createVariablePermute - scale v16i16 variable permutes to use v32i8 codegen XOP was already doing this, and now AVX performs v32i8 variable permutes as well. llvm-svn: 327245	2018-03-11 17:23:54 +00:00
Simon Pilgrim	b306501796	[X86][AVX] createVariablePermute - widen permutes for cases where the source vector is wider than the destination type llvm-svn: 327244	2018-03-11 17:00:46 +00:00
Simon Pilgrim	9a5d0c7540	[X86][AVX] createVariablePermute - use PSHUFB+PCMPGT+SELECT for v32i8 variable permutes Same as the VPERMILPS/VPERMILPD approach for v8f32/v4f64 cases, rely on PSHUFB using bits[3:0] for indexing - we can ignore the sign bit (zero element) as those index vector values are considered undefined. The select between the lo/hi permute results based on the index size. llvm-svn: 327242	2018-03-11 16:28:11 +00:00
Simon Pilgrim	d2fbd87ce8	Fix for buildbots which didn't like makeArrayRef with initializer lists. llvm-svn: 327241	2018-03-11 14:31:55 +00:00
Simon Pilgrim	e60afdf9eb	[X86][SSE] Generalized SplitBinaryOpsAndApply to SplitOpsAndApply to support any number of ops. I've kept SplitBinaryOpsAndApply as a wrapper to avoid a lot of makeArrayRef code. llvm-svn: 327240	2018-03-11 14:04:53 +00:00
Simon Pilgrim	f9cc80d218	[X86][AVX] createVariablePermute - use 2xVPERMIL+PCMPGT+SELECT for v8i32/v8f32 and v4i64/v4f64 variable permutes As VPERMILPS/VPERMILPD only selects elements based on the bits[1:0]/bit[1] then we can permute both the (repeated) lo/hi 128-bit vectors in each case and then select between these results based on whether the index was for lo/hi. For v4i64/v4f64 this avoids some rather nasty v4i64 multiples on the AVX2 implementation, which seems to be worse than the extra port5 pressure from the additional shuffles/blends. llvm-svn: 327239	2018-03-11 11:52:26 +00:00
Simon Pilgrim	2565bd421e	[X86][AVX512] createVariablePermute - Non-VLX targets can widen v4i64/v8f64 variable permutes to v8i64/v8f64 Permutes in the upper elements will be undefined, but they will be discarded anyway. llvm-svn: 327238	2018-03-11 11:19:19 +00:00
Simon Pilgrim	64b899f0f3	[x86][SSE] Add widenSubVector helper. NFCI. Helper function to insert a subvector into the bottom elements of a larger zero/undef vector with the same scalar type. I've converted a couple of INSERT_SUBVECTOR calls to use it, there are plenty more although in some cases I was worried it might make the code more ambiguous. llvm-svn: 327236	2018-03-11 10:50:48 +00:00
Simon Pilgrim	de7f3f0f91	[X86][XOP] createVariablePermute - use VPERMIL2 for v8i32/v4i64 variable permutes llvm-svn: 327222	2018-03-10 19:49:59 +00:00
Simon Pilgrim	ff1248f82f	[X86][XOP] createVariablePermute - use VPPERM for v16i16 variable permutes llvm-svn: 327218	2018-03-10 18:33:29 +00:00
Simon Pilgrim	d9dc114e2f	[X86][SSE] createVariablePermute - create index scaling helper. NFCI. This will help in some future changes for custom lowering. llvm-svn: 327217	2018-03-10 18:12:35 +00:00
Simon Pilgrim	8224241f75	[X86][XOP] createVariablePermute - use VPPERM for v32i8 variable permutes llvm-svn: 327213	2018-03-10 16:51:45 +00:00
Simon Pilgrim	2cd489feb2	[X86][AVX] createVariablePermute - fix v2i64/v2f64 VPERMILPD index creation. The input indices vector will put the index in bit0, but VPERMILPD actually selects off bit1 - so we need to scale accordingly. llvm-svn: 327159	2018-03-09 18:37:56 +00:00
Simon Pilgrim	230d38b559	[X86][SSE] createVariablePermute - move source vector canonicalization to top of function. NFCI. This is to make it easier to return early from the switch statement with custom lowering. llvm-svn: 327157	2018-03-09 18:08:08 +00:00
Simon Pilgrim	033a4167d2	Tidyup comment that was destroyed by clang-format. NFCI. llvm-svn: 327141	2018-03-09 15:50:09 +00:00
Simon Pilgrim	322c521ed7	[X86][SSE] createVariablePermute - move index vector canonicalization to top of function. NFCI. This is to make it easier to return early from the switch statement with custom lowering. llvm-svn: 327140	2018-03-09 15:48:56 +00:00
Craig Topper	784f1bbf5e	[X86] Remove SRAs from v16i8 multiply lowering on sse2 targets Previously we unpacked the even bytes of each input into the high byte of 16-bit elements then did an v8i16 arithmetic shift right by 8 bits to fill the upper bits of each word with sign bits. Then we did the v8i16 multiply and then masked to zero the upper 8-bits of each result. The similar was done for all the odd bytes. The results are then packed together with packuswb Since we are masking each multiply result element to 8-bits, and those 8-bits are determined only by the lower 8-bits of each of the inputs, we don't need to fill the upper bits with sign bits. So we can just unpack into the low byte of each element and treat the upper bits as garbage. This is what gcc also does. Differential Revision: https://reviews.llvm.org/D44267 llvm-svn: 327093	2018-03-09 01:22:31 +00:00
Simon Pilgrim	c286680032	[X86][AVX] Pull out variable permute creation from LowerBUILD_VECTORAsVariablePermute. NFCI. This will make it easier to handle more complex cases than basic scaling or index masks. llvm-svn: 327054	2018-03-08 20:07:06 +00:00
Craig Topper	a406796f5f	[X86] Change X86::PMULDQ/PMULUDQ opcodes to take vXi64 type as input instead of vXi32. This instruction can be thought of as reading either the even elements of a vXi32 input or the lower half of each element of a vXi64 input. We currently use the vXi32 interpretation, but vXi64 matches better with its broadcast behavior in EVEX. I'm looking at moving MULDQ/MULUDQ creation to a DAG combine so we can do it when AVX512DQ is enabled without having to go through Custom lowering. But in some of the test cases we failed to use a broadcast load due to the size difference. This should help with that. I'm also wondering if we can model these instructions in native IR and remove the intrinsics and I think using a vXi64 type will work better with that. llvm-svn: 326991	2018-03-08 08:02:52 +00:00
Simon Pilgrim	68594ee24a	[X86][SSE] LowerBUILD_VECTORAsVariablePermute - reorder permute types. NFCI. Reorder into 128/256/512 bit vector size groupings. NFCI commit before some new features. llvm-svn: 326963	2018-03-07 23:56:42 +00:00
Craig Topper	80ec0c3106	[X86] Remove unused function argument. NFC llvm-svn: 326939	2018-03-07 19:45:45 +00:00
Craig Topper	c3c15dd640	[X86] Make the MUL->VPMADDWD work before op legalization on AVX1 targets. Simplify feature checks by using isTypeLegal. The v8i32 conversion on AVX1 targets was only working after LowerMUL splits 256-bit vectors. While I was there I've also made it so we don't have to check for AVX2 and BWI directly and instead just ask if the type is legal. Differential Revision: https://reviews.llvm.org/D44190 llvm-svn: 326917	2018-03-07 17:53:18 +00:00
Craig Topper	80d3bb3b4b	[TargetLowering] Rename DAGCombinerInfo::isAfterLegalizeVectorOps to DAGCombiner::isAfterLegalizeDAG since that's what it checks. NFC The code checks Level == AfterLegalizeDAG which is the fourth and last of the possible DAG combine stages that we have. There is a Level called AfterLegalVectorOps, but that's the third DAG combine and it doesn't always run. A function called isAfterLegalVectorOps should imply it returns true in either of the DAG combines that runs after the legalize vector ops stage, but that's not what this function does. llvm-svn: 326832	2018-03-06 19:44:52 +00:00
Craig Topper	274e08dd81	[X86] Reject registers that require a REX prefix in inline asm constraints in 32-bit mode We don't currently reject r8-r15 or xmm8-32 or bpl/spl/sil/dil in 32-bit mode. Differential Revision: https://reviews.llvm.org/D44031 llvm-svn: 326826	2018-03-06 18:56:33 +00:00
Craig Topper	f546b2c06f	[X86] Replace usages of X86Subtarget::hasFp256 with hasAVX. Remove hasFP256. Almost none of these usages were FP specific. And we had no clear guidelines on when to use hasAVX vs hasFP256. I might also remove hasInt256 too since it's an alias for hasAVX2. llvm-svn: 326682	2018-03-05 00:13:35 +00:00
Craig Topper	f2aae62228	[X86] Add a DAG combine to turn stores of vXi1 constants into scalar stores. llvm-svn: 326679	2018-03-04 19:33:15 +00:00
Craig Topper	12c35e1940	[X86] Fix unused variable in release builds. llvm-svn: 326672	2018-03-04 02:14:16 +00:00
Craig Topper	a476026f70	[X86] Combine (store (v1i1 (scalar_to_vector (i8 X)))) -> (store (i8 X)). llvm-svn: 326670	2018-03-04 01:48:02 +00:00
Craig Topper	be31585be8	[X86] Lower v1i1/v2i1/v4i1/v8i1 load/stores to i8 load/store during op legalization if AVX512DQ is not supported. We were previously doing this with isel patterns. Moving it to op legalization gives us chance to see the required bitcast earlier. And it lets us remove some isel patterns. llvm-svn: 326669	2018-03-04 01:48:00 +00:00
Craig Topper	d4b6601662	[X86] Remove 'else' after return. NFC llvm-svn: 326642	2018-03-03 05:18:21 +00:00
Craig Topper	6b1419b547	[X86] Reject xmm16-31 in inline asm constraints when AVX512 is disabled Fixes PR36532 Differential Revision: https://reviews.llvm.org/D43960 llvm-svn: 326596	2018-03-02 18:19:40 +00:00
Simon Pilgrim	90fd0622b6	[X86][MMX] Improve handling of 64-bit MMX constants 64-bit MMX constant generation usually ends up lowering into SSE instructions before being spilled/reloaded as a MMX type. This patch bitcasts the constant to a double value to allow correct loading directly to the MMX register. I've added MMX constant asm comment support to improve testing, it's better to always print the double values as hex constants as MMX is mainly an integer unit (and even with 3DNow! it's just floats). Differential Revision: https://reviews.llvm.org/D43616 llvm-svn: 326497	2018-03-01 22:22:31 +00:00
Craig Topper	ccfa5257a6	[X86] Make sure we don't combine (fneg (fma X, Y, Z)) to a target specific node when there are no FMA instructions. This would cause a 'cannot select' error at isel when we should have emitted a lib call and an xor. Fixes PR36553. llvm-svn: 326393	2018-03-01 00:08:38 +00:00
Craig Topper	e31b9d1e5f	[X86] Lower extract_element from k-registers by bitcasting from v16i1 to i16 and extending/truncating. This is equivalent to what isel was doing anyway but by canonicalizing earlier we can remove some patterns. llvm-svn: 326375	2018-02-28 22:23:55 +00:00
Simon Pilgrim	72b86586b0	[X86][AVX512] Improve support for signed saturation truncation stores Matches what we already manage for unsigned saturation truncation stores Differential Revision: https://reviews.llvm.org/D43629 llvm-svn: 326372	2018-02-28 21:42:19 +00:00
Chih-Hung Hsieh	9f9e4681ac	[TLS] use emulated TLS if the target supports only this mode Emulated TLS is enabled by llc flag -emulated-tls, which is passed by clang driver. When llc is called explicitly or from other drivers like LTO, missing -emulated-tls flag would generate wrong TLS code for targets that support only this mode. Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether emulated TLS code should be generated. Unit tests are modified to run with and without the -emulated-tls flag. Differential Revision: https://reviews.llvm.org/D42999 llvm-svn: 326341	2018-02-28 17:48:55 +00:00
Craig Topper	48d5ed265c	[X86] Don't use EXTRACT_ELEMENT from v1i1 with i8/i32 result type when we need to guarantee zeroes in the upper bits of return. An extract_element where the result type is larger than the scalar element type is semantically an any_extend from the scalar element type to the result type. If we expect zeroes in the upper bits of the i8/i32 we need to make sure those zeroes are explicit in the DAG. For these cases the best way to accomplish this is use an insert_subvector to pad zeroes to the upper bits of the v1i1 first. We extend to either v16i1(for i32) or v8i1(for i8). Then bitcast that to a scalar and finish with a zero_extend up to i32 if necessary. We can't extend past v16i1 because that's the largest mask size on KNL. But isel is smart enough to know that a zext of a bitcast from v16i1 to i16 can use a KMOVW instruction. The insert_subvectors will be dropped during isel because we can determine that the producing instruction already zeroed the upper bits of the k-register. llvm-svn: 326308	2018-02-28 08:14:28 +00:00
Craig Topper	ac799b05d4	[X86] Change the masked FPCLASS implementation to use AND instead of OR to combine the mask results. While the description for the instruction does mention OR, it's talking about how the individual classification test results are ORed together. The incoming mask is used as a zeroing write mask. If the bit is 1 the classification is written to the output. If the bit is 0 the output is 0. This is equivalent to an AND. Here is pseudocode from the intrinsics guide FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0]) ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0 llvm-svn: 326306	2018-02-28 06:19:55 +00:00
Simon Pilgrim	ba43ec8702	[X86][AVX] combineLoopMAddPattern - support 256-bit cases on AVX1 via SplitBinaryOpsAndApply llvm-svn: 326189	2018-02-27 12:20:37 +00:00
Craig Topper	264707bae4	[X86] Simplify if condition. NFC SSE2 implies SSE1 and we already covered f32 in the SSE1 check so we don't need to check f32 in the SSE2 check. llvm-svn: 326170	2018-02-27 06:00:38 +00:00
Craig Topper	fcaa0323ec	[X86] Replace an impossible if condition with an assert. llvm-svn: 326167	2018-02-27 03:50:00 +00:00


5332 Commits