Commit Graph

7511 Commits

Freddy Ye b3b0acdc6f [NFC] Refine some variables that were used while uninitialized.
These warnings were reported by the static code analysis tool Klocwork.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D95421
2021-01-26 16:51:05 +08:00
Simon Pilgrim 13f2aee783 [X86][AVX] Generalize vperm2f128/vperm2i128 patterns to support all legal 256-bit vector types
Remove bitcasts to/from v4x64 types through vperm2f128/vperm2i128 ops to help improve shuffle combining and demanded vector elts folding.
2021-01-25 15:35:36 +00:00
Simon Pilgrim 821a51a9ca [X86][AVX] combineX86ShuffleChainWithExtract - widen to at least original root size. NFCI.
We're relying on the source inputs for shuffle combining having already been widened to the root size (otherwise the offset logic falls over) - we're going to be supporting different sized shuffle inputs soon, so we need to explicitly make the minimum widened width the original root size.
2021-01-25 13:45:37 +00:00
Simon Pilgrim 1b780cf32e [X86][AVX] LowerTRUNCATE - avoid bitcasts around extract_subvectors.
We allow extract_subvector lowering of all legal types, so pre-bitcast the source type to try and reduce bitcast pollution.
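A minimal sketch of the idea, assuming `In`, `DAG`, and `DL` from the surrounding lowering code (the types here are illustrative, not from the actual patch):

```
// Extract the low half of a wide source for truncation, pre-bitcasting the
// source so the EXTRACT_SUBVECTOR itself isn't wrapped in bitcasts.
EVT SrcVT = MVT::v8i32, HalfVT = MVT::v4i32; // illustrative types
SDValue Src = DAG.getBitcast(SrcVT, In);     // normalize element width first
SDValue Lo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, HalfVT, Src,
                         DAG.getVectorIdxConstant(0, DL));
```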
2021-01-25 12:10:36 +00:00
Simon Pilgrim f461e35cba [X86][AVX] combineX86ShuffleChain - avoid bitcasts around insert_subvector() shuffle patterns.
We allow insert_subvector lowering of all legal types, so don't always cast to the vXi64/vXf64 shuffle types - this is only necessary for X86ISD::SHUF128/X86ISD::VPERM2X128 patterns later.
2021-01-25 11:35:45 +00:00
Fangrui Song d745b82de1 [XRay] Support DW_TAG_call_site and delete unneeded PATCHABLE_EVENT_CALL/PATCHABLE_TYPED_EVENT_CALL lowering 2021-01-25 00:49:18 -08:00
Simon Pilgrim bd122f6d21 [X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - handle vperm2x128(movddup(x),movddup(y)) cases
Fold vperm2x128(movddup(x),movddup(y)) -> movddup(vperm2x128(x,y))
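A hypothetical, simplified sketch of the fold, assuming `N` is the VPERM2X128 node and eliding the one-use and legality checks of the real combine:

```
// vperm2x128(movddup(x), movddup(y)) -> movddup(vperm2x128(x, y))
SDValue LHS = N->getOperand(0), RHS = N->getOperand(1);
if (LHS.getOpcode() == X86ISD::MOVDDUP && RHS.getOpcode() == X86ISD::MOVDDUP) {
  SDLoc DL(N);
  EVT VT = N->getValueType(0);
  SDValue Perm = DAG.getNode(X86ISD::VPERM2X128, DL, VT, LHS.getOperand(0),
                             RHS.getOperand(0), N->getOperand(2));
  return DAG.getNode(X86ISD::MOVDDUP, DL, VT, Perm);
}
```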
2021-01-22 16:05:19 +00:00
Simon Pilgrim c33d36e066 [X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - handle unary vperm2x128(permute/shift(x,c),undef) cases
Fold vperm2x128(permute/shift(x,c),undef) -> permute/shift(vperm2x128(x,undef),c)
2021-01-22 15:47:23 +00:00
Simon Pilgrim 4846f6ab81 [X86][AVX] combineTargetShuffle - simplify the X86ISD::VPERM2X128 subvector matching
Simplify vperm2x128(concat(X,Y),concat(Z,W)) folding.

Use collectConcatOps / ISD::INSERT_SUBVECTOR to find the source subvectors instead of hardcoded immediate matching.
2021-01-22 15:47:22 +00:00
Simon Pilgrim b1166e1317 [X86][AVX] combineX86ShufflesRecursively - attempt to constant fold before widening shuffle inputs
combineX86ShufflesConstants/canonicalizeShuffleMaskWithHorizOp can both handle (or early-out on) shuffles with inputs of different widths, so delay widening as late as possible to make it easier to match constant folds etc.

The plan is to eventually move the widening inside combineX86ShuffleChain so that we don't create any new nodes unless we successfully combine the shuffles.
2021-01-22 13:19:35 +00:00
Simon Pilgrim ffe72f987f [X86][SSE] Don't fold shuffle(binop(),binop()) -> binop(shuffle(),shuffle()) if the shuffles are splats
rGbe69e66b1cd8 added the fold, but DAGCombiner.visitVECTOR_SHUFFLE doesn't merge shuffles if the inner shuffle is a splat, so we need to bail.

The non-fast-horiz-ops paths see some minor regressions; we might be able to improve on this after lowering to target shuffles.

Fixes PR48823.
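A simplified sketch of the added guard, phrased in terms of a generic shuffle node `N0` (the real check works on the combined target-shuffle masks):

```
// Bail out if the inner shuffle is a splat: DAGCombiner won't merge the
// resulting shuffle pair back together, so the fold would be a pessimization.
if (auto *SVN0 = dyn_cast<ShuffleVectorSDNode>(N0))
  if (SVN0->isSplat())
    return SDValue();
```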
2021-01-22 11:31:38 +00:00
Simon Pilgrim 86021d98d3 [X86] Avoid a std::string copy by replacing auto with const auto&. NFC.
Fixes an MSVC analyzer warning.
2021-01-21 11:04:07 +00:00
Max Kazantsev d6bb96e677 [X86] Add experimental option to separately tune alignment of innermost loops
We already have an experimental option to tune loop alignment. Its impact
is very wide (and there is a suspicion that it's not always profitable). We want
something narrower to play with. This patch adds a similar option that
overrides the preferred alignment for innermost loops. This is for experimental
purposes; the default values do not change the existing behavior.
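A hedged sketch of what such an experimental tuning flag looks like; the flag name, default, and description here are illustrative (see D94895 for the real option):

```
#include "llvm/Support/CommandLine.h"
using namespace llvm;

// Illustrative flag: overrides the preferred (log2) alignment for innermost
// loops only, leaving the generic loop-alignment tuning untouched.
static cl::opt<unsigned> X86PrefInnermostLoopAlignment(
    "x86-experimental-pref-innermost-loop-alignment", cl::init(4), cl::Hidden,
    cl::desc("Override preferred log2 alignment of innermost loops"));
```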

Differential Revision: https://reviews.llvm.org/D94895
Reviewed By: pengfei
2021-01-21 11:15:16 +07:00
Simon Pilgrim b8b5e87e6b [X86][AVX] Handle vperm2x128 shuffling of a subvector splat.
We already handle "vperm2x128 (ins ?, X, C1), (ins ?, X, C1), 0x31" for shuffling of the upper subvectors, but we weren't handling the case where the upper subvector is splatted from a single source.
2021-01-20 18:16:33 +00:00
Simon Pilgrim 19d02842ee [X86][AVX] Fold extract_subvector(VSRLI/VSHLI(x,32)) -> VSRLI/VSHLI(extract_subvector(x),32)
As discussed on D56387, if we're shifting to extract the upper/lower half of a vXi64 vector then we're actually better off performing this at the subvector level, as it's very likely to fold into something.

combineConcatVectorOps can perform this in reverse if necessary.
2021-01-20 14:34:54 +00:00
Simon Pilgrim 5626adcd6b [X86][SSE] combineVectorSignBitsTruncation - fold trunc(srl(x,c)) -> packss(sra(x,c))
If an srl doesn't introduce any sign bits into the truncated result, then replace it with an sra to let us use a PACKSS truncation - fixes a regression noticed in D56387 on pre-SSE41 targets that don't have PACKUSDW.
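A scalar model of why the replacement is safe, for the illustrative case of a 16-bit truncate of a 32-bit value shifted right by 16 (a model, not the actual combine):

```
#include <cstdint>

// srl and sra agree on all bits the truncation keeps, and the sra result has
// enough sign bits for PACKSSDW to perform the truncation.
uint16_t via_srl(uint32_t x) { return uint16_t(x >> 16); }
uint16_t via_sra(uint32_t x) { return uint16_t(int32_t(x) >> 16); } // same bits
```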
2021-01-19 11:04:13 +00:00
Sanjay Patel d27bb5c375 [x86] add cast to avoid compile-time warning; NFC 2021-01-18 17:47:04 -05:00
Simon Pilgrim ce06475da9 [X86][AVX] IsElementEquivalent - add matchShuffleWithUNPCK + VBROADCAST/VBROADCAST_LOAD handling
Specify LHS/RHS operands in matchShuffleWithUNPCK's calls to isTargetShuffleEquivalent, and handle VBROADCAST/VBROADCAST_LOAD matching in IsElementEquivalent.
2021-01-18 15:55:00 +00:00
Simon Pilgrim 770d1e0a88 [X86][SSE] isHorizontalBinOp - reuse any existing horizontal ops.
If we already have similar horizontal ops using the same args, then match that, even if we are on a target with slow horizontal ops.
2021-01-18 10:14:45 +00:00
Kazu Hirata 2082b10d10 [llvm] Use *::empty (NFC) 2021-01-16 09:40:55 -08:00
Simon Pilgrim be69e66b1c [X86][SSE] Attempt to fold shuffle(binop(),binop()) -> binop(shuffle(),shuffle())
If this will help us fold shuffles together, then push the shuffle through the merged binops.

Ideally this would be performed in DAGCombiner::visitVECTOR_SHUFFLE, but getting an efficient+legal merged shuffle can be tricky - on SSE we can be confident that shuffles of 32/64-bit element vectors should easily fold.
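A hypothetical, simplified sketch of the fold for ISD::ADD, assuming `Shuf`, `DAG`, `DL`, and `VT` from the surrounding combine (the real code checks one-use constraints and element widths):

```
// shuffle(add(A,B), add(C,D), M) -> add(shuffle(A,C,M), shuffle(B,D,M))
SDValue N0 = Shuf->getOperand(0), N1 = Shuf->getOperand(1);
ArrayRef<int> Mask = Shuf->getMask();
if (N0.getOpcode() == ISD::ADD && N1.getOpcode() == ISD::ADD) {
  SDValue NewLHS =
      DAG.getVectorShuffle(VT, DL, N0.getOperand(0), N1.getOperand(0), Mask);
  SDValue NewRHS =
      DAG.getVectorShuffle(VT, DL, N0.getOperand(1), N1.getOperand(1), Mask);
  return DAG.getNode(ISD::ADD, DL, VT, NewLHS, NewRHS);
}
```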
2021-01-15 16:25:25 +00:00
Simon Pilgrim 1dfd5c9ad8 [X86][AVX] combineHorizOpWithShuffle - support target shuffles in HOP(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE(HOP(X,Y))
Be more aggressive on (AVX2+) folds of lane shuffles of 256-bit horizontal ops by working on target/faux shuffles as well.
2021-01-15 13:55:30 +00:00
Simon Pilgrim cbbfc82586 [X86][SSE] canonicalizeShuffleMaskWithHorizOp - simplify shuffle(HOP(HOP(X,Y),HOP(Z,W))) style chains.
See if we can remove the shuffle by re-sorting a HOP chain so that the HOP args are pre-shuffled.

This initial version just handles (the most common) v4i32/v4f32 hadd/hsub reduction patterns - future work can extend this to v8i16 types plus PACK chains (v2f64 HADD/HSUB should already be handled in the half-lane combine code later on).
2021-01-13 17:19:40 +00:00
Simon Pilgrim 0a0ee7f5a5 [X86] canonicalizeShuffleMaskWithHorizOp - minor refactor to support multiple src ops. NFCI.
canonicalizeShuffleMaskWithHorizOp currently only supports shuffles with 1 or 2 sources, but PR41813 will require us to support higher numbers of sources.

This patch just generalizes the initial setup stages to ensure all src ops have the same type and opcode, and still early-outs if we have more than 2 sources.
2021-01-13 13:59:56 +00:00
Simon Pilgrim 0f59d09957 [X86][AVX] combineVectorSignBitsTruncation - limit AVX512 truncations to 128-bits (PR48727)
rG73a44f437bf1 resulted in 256-bit packss/packus ops with additional shuffles that shuffle combining can sometimes try to convert back into a truncation.
2021-01-13 10:38:23 +00:00
Bevin Hansson 07605ea1f3 [X86] Improved lowering for saturating float to int.
Adapted from D54696 by @nikic.

This patch improves lowering of saturating float to
int conversions, FP_TO_[SU]INT_SAT, for X86.
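For reference, the saturation semantics being lowered, modeled as scalar C++ (a model of FP_TO_SINT_SAT for i32, not the X86 lowering itself):

```
#include <cmath>
#include <cstdint>
#include <limits>

// NaN -> 0; otherwise clamp to [INT32_MIN, INT32_MAX] and truncate.
int32_t fp_to_sint_sat_i32(double x) {
  if (std::isnan(x)) return 0;
  if (x <= (double)std::numeric_limits<int32_t>::min())
    return std::numeric_limits<int32_t>::min();
  if (x >= (double)std::numeric_limits<int32_t>::max())
    return std::numeric_limits<int32_t>::max();
  return (int32_t)x;
}
```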

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D86079
2021-01-12 15:44:41 +01:00
Simon Pilgrim 2ed914cb7e [X86][SSE] getFauxShuffleMask - handle PACKSS(SRAI(),SRAI()) shuffle patterns.
We can't easily treat ASHR as a faux shuffle, but if it was just feeding a PACKSS then it was likely being used as sign-extension for a truncation, so just peek through and adjust the mask accordingly.
2021-01-12 14:07:53 +00:00
Simon Pilgrim 7e44208115 [X86][SSE] combineSubToSubus - add v16i32 handling on pre-AVX512BW targets.
v16i32 -> v16i16/v8i16 truncation is now good enough using PACKSS/PACKUS + shuffle combining that it's no longer necessary to early-out on pre-AVX512BW targets.

This was noticed while looking at completing PR40111 and moving combineSubToSubus to DAGCombine entirely.
2021-01-12 13:44:11 +00:00
Simon Pilgrim a5212b5c91 [X86][SSE] combineSubToSubus - remove SSE2 early-out.
SSE2 truncation codegen has improved over the past few years (mainly due to better shuffle lowering/combining and computeKnownBits) - it's no longer necessary to early-out from v8i32/v8i64 truncations.

This was noticed while looking at completing PR40111 and moving combineSubToSubus to DAGCombine entirely.
2021-01-12 12:52:11 +00:00
Simon Pilgrim 4214ca9614 [X86][AVX] Attempt to fold vpermf128(op(x,i),op(y,i)) -> op(vpermf128(x,y),i)
If vpermf128/vpermi128 is acting on 2 similar 'in-lane' ops, then try to perform the vpermf128 first, which will allow us to merge the ops.

This will help us fix one of the regressions in D56387
2021-01-11 16:59:25 +00:00
Simon Pilgrim 41bf338dd1 Revert rGd43a264a5dd3 "Revert "[X86][SSE] Fold unpack(hop(),hop()) -> permute(hop())""
This reapplies commit rG80dee7965dffdfb866afa9d74f3a4a97453708b2.

[X86][SSE] Fold unpack(hop(),hop()) -> permute(hop())

UNPCKL/UNPCKH only uses one op from each hop, so we can merge the hops and then permute the result.

REAPPLIED with a fix for unary unpacks of HOP.
2021-01-11 11:29:04 +00:00
Nico Weber d43a264a5d Revert "[X86][SSE] Fold unpack(hop(),hop()) -> permute(hop())"
This reverts commit 80dee7965d.
Makes clang sometimes hang forever. See
https://bugs.chromium.org/p/chromium/issues/detail?id=1164786#c6 for a
stand-alone repro.
2021-01-10 20:22:53 -05:00
Simon Pilgrim 80dee7965d [X86][SSE] Fold unpack(hop(),hop()) -> permute(hop())
UNPCKL/UNPCKH only uses one op from each hop, so we can merge the hops and then permute the result.
2021-01-08 15:22:17 +00:00
Simon Pilgrim 73a44f437b [X86][AVX] combineVectorSignBitsTruncation - use PACKSS/PACKUS in more AVX cases
AVX512 has fast truncation ops, but if the truncation source is a concatenation of subvectors then it's likely that we can use PACK more efficiently.

This is only guaranteed to work for truncations to 128/256-bit vectors as the PACK works across 128-bit sub-lanes; for now I've just disabled 512-bit truncation cases, but we need to get them working eventually for D61129.
2021-01-05 15:01:45 +00:00
Kazu Hirata 985f899bf2 [Target] Use llvm::append_range (NFC) 2021-01-03 09:57:43 -08:00
Fangrui Song 6be0b9a8dd [X86] Don't fold negative offset into 32-bit absolute address (e.g. movl $foo-1, %eax)
When building abseil-cpp `bin/absl_hash_test` with Clang in -fno-pic
mode, an instruction like `movl $foo-2147483648, %eax` may be produced
(subtracting a number from the address of a static variable). If foo's
address is smaller than 2147483648, GNU ld/gold/LLD will error because
R_X86_64_32 cannot represent a negative value.

```
using absl::Hash;
struct NoOp {
  template < typename HashCode >
  friend HashCode AbslHashValue(HashCode , NoOp );
};
template <typename> class HashIntTest : public testing::Test {};
TYPED_TEST_SUITE_P(HashIntTest);
TYPED_TEST_P(HashIntTest, BasicUsage) {
  if (std::numeric_limits< TypeParam >::min )
    EXPECT_NE(Hash< NoOp >()({}),
              Hash< TypeParam >()(std::numeric_limits< TypeParam >::min()));
}
REGISTER_TYPED_TEST_CASE_P(HashIntTest, BasicUsage);
using IntTypes = testing::Types< int32_t>;
INSTANTIATE_TYPED_TEST_CASE_P(My, HashIntTest, IntTypes);

ld: error: hash_test.cc:(function (anonymous namespace)::gtest_suite_HashIntTest_::BasicUsage<int>::TestBody(): .text+0x4E472): relocation R_X86_64_32 out of range: 18446744071564237392 is not in [0, 4294967295]; references absl::hash_internal::HashState::kSeed
```

Actually any negative offset is not allowed because the symbol address
can be zero (e.g. set by `-Wl,--defsym=foo=0`). So disallow such folding.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D93931
2020-12-30 18:47:26 -08:00
Luo, Yuanke 981a0bd858 [X86] Add x86_amx type for Intel AMX.
The x86_amx type is used for AMX intrinsics. <256 x i32> is bitcast to x86_amx when
it is used by AMX intrinsics, and x86_amx is bitcast back to <256 x i32> when it
is used by load/store instructions, so AMX intrinsics only operate on the x86_amx type.
This helps separate AMX intrinsics from LLVM IR instructions (+-*/).
Thanks to Craig for the idea. This patch depends on https://reviews.llvm.org/D87981.
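A hedged IRBuilder sketch of the boundary this creates, assuming `Ctx`, `Builder`, and a `<256 x i32>` value `Vec256` from surrounding code (illustrative, not code from the patch):

```
// Loads/stores stay on <256 x i32>; only the AMX intrinsics see x86_amx.
Type *AMXTy = Type::getX86_AMXTy(Ctx);
Value *Tile = Builder.CreateBitCast(Vec256, AMXTy); // feed to AMX intrinsics
```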

Differential Revision: https://reviews.llvm.org/D91927
2020-12-30 13:52:13 +08:00
Simon Pilgrim 8767f3bb97 [X86][AVX] Remove X86ISD::SUBV_BROADCAST (PR38969)
Followup to D92645 - remove the remaining places where we create X86ISD::SUBV_BROADCAST, and fold splatted vector loads to X86ISD::SUBV_BROADCAST_LOAD instead.

Remove all the X86SubVBroadcast isel patterns, including all the fallbacks for when memory folding fails.
2020-12-18 15:49:53 +00:00
Simon Pilgrim 992fad03e2 [X86][AVX] Replace extract_subvector(broadcast(), 0) folds with generic SimplifyDemandedVectorEltsForTargetNode handling.
Simplifies a few more cases, notably shuffle demanded elts cases.
2020-12-18 11:51:10 +00:00
Simon Pilgrim 931e66bd89 [X86] Remove extract_subvector(subv_broadcast_load()) fold.
This was needed in an earlier version of D92645, but isn't now - and I've just noticed that it was potentially flawed depending on the relevant widths of the broadcasted and extracted subvectors.
2020-12-17 11:02:49 +00:00
Simon Pilgrim cdb692ee0c [X86] Add X86ISD::SUBV_BROADCAST_LOAD and begin removing X86ISD::SUBV_BROADCAST (PR38969)
Subvector broadcasts are only load instructions, yet X86ISD::SUBV_BROADCAST treats them more generally, requiring a lot of fallback tablegen patterns.

This initial patch replaces constant vector lowering inside lowerBuildVectorAsBroadcast with direct X86ISD::SUBV_BROADCAST_LOAD loads which helps us merge a number of equivalent loads/broadcasts.

As well as general plumbing/analysis additions for SUBV_BROADCAST_LOAD, I needed to wrap SelectionDAG::makeEquivalentMemoryOrdering so it can handle result chains from non-generic LoadSDNode nodes.

Later patches will continue to replace X86ISD::SUBV_BROADCAST usage.

Differential Revision: https://reviews.llvm.org/D92645
2020-12-17 10:25:25 +00:00
QingShan Zhang ebdd20f430 Expand fp_to_int/int_to_fp/fp_round/fp_extend as libcalls for fp128
X86 and AArch64 expand these as libcalls inside the target, and PowerPC also
wants to expand them as libcalls for P8. So, implement this in the
legalizer to share the logic and remove the code from X86/AArch64 to
avoid the duplication.
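As an example of the conversions affected, assuming an x86-64 host with the `__float128` extension: after this change the node is expanded to a runtime-library call (e.g. `__fixtfsi` from compiler-rt/libgcc) by the common legalizer code rather than by each target.

```
// fp128 -> i32: now lowered to a libcall by the common legalizer code.
int to_int(__float128 x) { return (int)x; }
```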

Reviewed By: Craig Topper

Differential Revision: https://reviews.llvm.org/D91331
2020-12-17 07:59:30 +00:00
Simon Pilgrim 553808d456 [X86] Rename reduction combiners to make it clearer what's happening. NFCI.
Since these are all working on reduction patterns, actually use that term in the function name to make them easier to search for.

At some point we're likely to start working with the ISD::VECREDUCE_* opcodes directly in the x86 backend, but that is still some way off.
2020-12-16 14:48:21 +00:00
Simon Pilgrim e55f7de946 [X86][SSE] combineReductionToHorizontal - don't rely on widenSubVector to handle illegal vector types.
Thanks to @asbirlea for reporting the bug.
2020-12-16 11:24:40 +00:00
Simon Pilgrim 712117338a [X86] Explicitly use SDValue instead of auto. NFCI.
Fix a static analyzer warning about not using an SDValue&.
2020-12-15 17:27:25 +00:00
Simon Pilgrim b0e5aea557 [X86] Remove unnecessary SUBV_BROADCAST combines. NFCI.
Noticed while dealing with D92645 - these are now handled by getFauxShuffleMask + shuffle combining code.
2020-12-15 16:54:34 +00:00
Simon Pilgrim bd07092669 [X86] Remove trailing whitespace. NFC. 2020-12-15 10:11:38 +00:00
Simon Pilgrim 15a31389b2 [X86][AVX] LowerBUILD_VECTOR - reduce 256/512-bit build vectors with zero/undef upper elements + pad.
As discussed on D92645, we don't do a good job of recognising when we don't require the full width of a ymm/zmm build vector because the upper elements are undef/zero.

This commit allows us to make use of the implicit zeroing of upper elements with AVX instructions, which we emulate in DAG with an INSERT_SUBVECTOR into the bottom of an undef/zero vector of the original type.

This exposed a limitation in getTargetConstantBitsFromNode, which didn't extract bits from INSERT_SUBVECTORs of different element widths; I've fixed that as well to prevent a couple of regressions.
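A conceptual sketch of the widening described above, assuming `DAG`, `DL`, and a 128-bit (v4i32) value `V128` from the surrounding lowering code (illustrative types):

```
// Model the implicit zeroing of the upper elements as an INSERT_SUBVECTOR
// into a zero vector of the original 256-bit type.
SDValue Zero = DAG.getConstant(0, DL, MVT::v8i32);
SDValue Widened = DAG.getNode(ISD::INSERT_SUBVECTOR, DL, MVT::v8i32, Zero,
                              V128, DAG.getVectorIdxConstant(0, DL));
```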
2020-12-15 10:11:38 +00:00
Harald van Dijk 9eac818370 [X86] Fix variadic argument handling for x32
The X86-64 ABI defines va_list as

  typedef struct {
    unsigned int gp_offset;
    unsigned int fp_offset;
    void *overflow_arg_area;
    void *reg_save_area;
  } va_list[1];

This means the size, alignment, and reg_save_area offset will depend on
whether we are in LP64 or in ILP32 mode, so this commit adds the checks.
Additionally, the VAARG_64 pseudo-instruction assumed 64-bit pointers, so
this commit adds a VAARG_X32 pseudo-instruction that behaves just like
VAARG_64, except for assuming 32-bit pointers.
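The layout consequences, spelled out (per the SysV x86-64 psABI):

```
// LP64 (-m64):   4 + 4 + 8 + 8 = 24 bytes; reg_save_area at offset 16.
// ILP32 (-mx32): 4 + 4 + 4 + 4 = 16 bytes; reg_save_area at offset 12.
```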

Some of these changes were originally done by
Michael Liao <michael.hliao@gmail.com>.

Fixes https://bugs.llvm.org/show_bug.cgi?id=48428.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D93160
2020-12-14 23:47:27 +00:00
Simon Pilgrim 5f5a2547c1 [X86] LowerBUILD_VECTOR - track zero/nonzero elements with APInt masks. NFCI.
Prep work for undef/zero 'upper elements' handling as proposed in D92645.
2020-12-14 16:28:45 +00:00