Commit Graph

11890 Commits

Author SHA1 Message Date
Craig Topper 1daa66d3fd [SelectionDAG] Add SPLAT_VECTOR to SelectionDAG::isConstantFPBuildVectorOrConstantFP.
Matches what is done for the int version.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D119793
2022-02-16 09:22:11 -08:00
Simon Pilgrim 30e9cdd1aa [DAG] computeKnownBits - add ISD::AVGCEILU handling
Expand the ISD::AVGCEILU to determine the known bits of the result.

First part of PR53622

Differential Revision: https://reviews.llvm.org/D119629
2022-02-16 13:00:15 +00:00
David Green 655d0d86f9 [DAGCombine] Move AVG combine to SimplifyDemandBits
This moves the matching of AVGFloor and AVGCeil into a place where
demand bit are available, so that it can detect more cases for more
folds. It changes the transform to start from a shift, not from a
truncate. We match the pattern shr(add(ext(A), ext(B)), 1), transforming
to ext(hadd(A, B)).

For signed values, because only the bottom bits are demanded llvm will
transform the above to use a lshr too, as opposed to ashr. In order to
correctly detect the hadd we need to know the demanded bits to turn it
back. Depending on whether the shift is signed (ashr) or logical (lshr),
and the extensions are signed or unsigned we can create different nodes.
If the shift is signed:
  Needs >= 2 sign bits. https://alive2.llvm.org/ce/z/h4gQAW generating signed rhadd.
  Needs >= 2 zero bits. https://alive2.llvm.org/ce/z/B64DUA generating unsigned rhadd.
If the shift is unsigned:
  Needs >= 1 zero bits. https://alive2.llvm.org/ce/z/ByD8sj generating unsigned rhadd.
  Needs 1 demanded bit zero and >= 2 sign bits https://alive2.llvm.org/ce/z/hvPGxX and
    https://alive2.llvm.org/ce/z/32P5n1 generating signed rhadd.

Differential Revision: https://reviews.llvm.org/D119072
2022-02-15 10:17:02 +00:00
David Green 03380c70ed [DAGCombine] Basic combines for AVG nodes.
This adds very basic combines for AVG nodes, mostly for constant folding
and handling degenerate (zero) cases. The code performs mostly the same
transforms as visitMULHS, adjusted for AVG nodes.

Constant folding extends to a higher bitwidth and drops the lowest bit.
For undef nodes, `avg undef, x` is transformed to x.  There is also a
transform for `avgfloor x, 0` transforming to `shr x, 1`.

Differential Revision: https://reviews.llvm.org/D119559
2022-02-14 11:18:35 +00:00
Nikita Popov ff040eca93 [FastISel] Reuse register for bitcast that does not change MVT
The current FastISel code reuses the register for a bitcast that
doesn't change the IR type, but uses a reg-to-reg copy if it
changes the IR type without changing the MVT. However, we can
simply reuse the register in that case as well.

In particular, this avoids unnecessary reg-to-reg copies for pointer
bitcasts. This was found while inspecting O0 codegen differences
between typed and opaque pointers.

Differential Revision: https://reviews.llvm.org/D119432
2022-02-14 09:13:17 +01:00
Craig Topper e72fe654b7 [DAGCombiner] Use getShiftAmountConstant in DAGCombiner::foldSelectOfConstants.
This enables fshl to be matched earlier on X86

  %6 = lshr i32 %3, 1
  %7 = select i1 %4, i32 -2147483648, i32 0
  %8 = or i32 %6, %7

X86 uses i8 for shift amounts. SelectionDAGBuilder creates the
ISD::SRL with an i8 shift type. DAGCombiner turns the select into
an ISD::SHL. Prior to this patch it would use i32 for the shift
amount. fshl matching failed because the shift amounts have different
types. LegalizeDAG fixes the ISD::SHL shift amount to i8. This
allowed fshl matching to succeed.

With this patch, the ISD::SHL will be created with an i8 shift
amount. This allows the fshl to match immediately.

No test case beause we still end up with a fshl either way.
2022-02-13 19:09:26 -08:00
Sanjay Patel 96b7e0b5a0 [SDAG] clean up scalarizing load transform
I have not found a way to expose a difference for this patch in a test
because it only triggers for a one-use load, but this is the code that
was adapted into D118376 and caused miscompiles. The new code pattern
is the same as what we do in narrowExtractedVectorLoad() (reduces load
width for a subvector extract).

This removes seemingly unnecessary manual worklist management and fixes
the chain updating via "SelectionDAG::makeEquivalentMemoryOrdering()".

Differential Revision: https://reviews.llvm.org/D119549
2022-02-12 11:41:19 -05:00
Sanjay Patel 429f10f5f2 [SDAG] reduce code duplication and fix formatting; NFC 2022-02-12 10:22:13 -05:00
Arthur Eubanks c0281c7607 [OpaquePtr][SPARC] Remove getPointerElementType() call in SparcISelLowering
Requires keeping better track of sret types.
2022-02-11 11:31:19 -08:00
David Green 4072e362c0 [ISel] Port AArch64 HADD and RHADD to ISel
This ports the aarch64 combines for HADD and RHADD over to DAG combine,
so that they can be used in more architectures (notably MVE in a
followup patch). They are renamed to AVGFLOOR and AVGCEIL in the
process, to avoid confusion with instructions such as X86 hadd. The code
was also rewritten slightly to remove the AArch64 idiosyncrasies.

The general pattern for a AVGFLOORS is
  %xe = sext i8 %x to i32
  %ye = sext i8 %y to i32
  %a = add i32 %xe, %ye
  %r = lshr i32 %a, 1
  %t = trunc i32 %r to i8

An AVGFLOORU is equivalent with zext. Because of the truncate
lshr==ashr, as the top bits are not demanded. An AVGCEIL also includes
an extra rounding, so includes an extra add of 1.

Differential Revision: https://reviews.llvm.org/D106237
2022-02-11 18:28:56 +00:00
Julien Pages dcb2da13f1 [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode
Add a new llvm.fptrunc.round intrinsic to precisely control
the rounding mode when converting from f32 to f16.

Differential Revision: https://reviews.llvm.org/D110579
2022-02-11 12:08:23 -05:00
Nikita Popov 6241f7dee0 [FastISel] Remove redundant reg class check (NFC)
SrcVT and DstVT are the same in this branch, as such their register
classes will also be the same.
2022-02-10 14:10:00 +01:00
Reid Kleckner b5a592a8e2 [DAG] Remove pointless std::function wrapper, NFC 2022-02-09 14:30:43 -08:00
Reid Kleckner f63c150187 Revert "[DagCombine] Increase depth by number of operands to avoid a pathological compile time."
Appears to be causing check-llvm to fail

This reverts commit 49ab760090.
2022-02-09 13:55:40 -08:00
Alina Sbirlea 49ab760090 [DagCombine] Increase depth by number of operands to avoid a pathological compile time.
We're hitting a pathological compile-time case, profiled to be in
DagCombiner::visitTokenFactor and many inserts into a SmallPtrSet.
It looks like one of the paths around findBetterNeighborChains is not
capped and leads to this.

This patch resolves the issue. Looking for feedback if this solution
looks reasonable.

Differential Revision: https://reviews.llvm.org/D118877
2022-02-09 13:31:28 -08:00
Sander de Smalen ec46232517 [DAGCombiner] Fold `ty1 extract_vector(ty2 splat(V)) -> ty1 splat(V)`
This seems like an obvious fold, which leads to a few improvements.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D118920
2022-02-09 14:30:01 +00:00
Sanjay Patel 905abc5b7d [SDAG] enable binop identity constant folds for fmul/fdiv
The test diffs are identical to D119111.

This only affects x86 currently because no other target
has an override for the TLI hook that controls this transform.
2022-02-08 10:52:28 -05:00
Roman Lebedev ae9414d562
[ValueTracking] Only check for non-undef/poison if already known to be a self-multiply
https://godbolt.org/z/js9fTTG9h
^ we don't care what `isGuaranteedNotToBeUndefOrPoison()` says
unless we already knew that the operands were equal.
2022-02-08 18:35:29 +03:00
Sanjay Patel a68e098024 [SDAG] move x86 select-with-identity-constant fold behind a target hook; NFC
This is no-functional-change-intended because only the
x86 target enables the TLI hook currently.

We can add fmul/fdiv opcodes to the switch similar to the
proposal D119111, but we don't need to make other changes
like enabling target-specific combines.

We can also add integer opcodes (add, or, shl, etc.) to
the switch because this function is called from all of the
generic binary opcodes.

The goal is to incrementally enable the profitable diffs
from D90113 while avoiding regressions.

Differential Revision: https://reviews.llvm.org/D119150
2022-02-08 09:55:05 -05:00
Simon Pilgrim fd2bb51f1e [ADT] Add APInt/MathExtras isShiftedMask variant returning mask offset/length
In many cases, calls to isShiftedMask are immediately followed with checks to determine the size and position of the bitmask.

This patch adds variants of APInt::isShiftedMask, isShiftedMask_32 and isShiftedMask_64 that return these values as additional arguments.

I've updated a number of cases that were either performing seperate size/position calculations or had created their own local wrapper versions of these.

Differential Revision: https://reviews.llvm.org/D119019
2022-02-08 12:04:13 +00:00
Sanjay Patel d1ecfaa097 [SDAG] try to fold one-demanded-bit-of-multiply
This is a translation of the transform added to InstCombine with:
D118539
2022-02-07 17:24:35 -05:00
Sanjay Patel fc6bee1c11 [SDAG] SimplifyDemandedBits - generalize fold for 2 LSB of X*X
This is translated from recent changes to the IR version of this function:
D119060
D119139
2022-02-07 15:38:50 -05:00
Simon Pilgrim 74555fd367 [DAG] visitINSERT_VECTOR_ELT - break if-else chain as they both return (style). NFC. 2022-02-07 09:58:47 +00:00
Craig Topper c35ccd2ac8 [DAGCombiner][RISCV] Allow rotates by non-constant to be matched for i32 on riscv64 with Zbb.
rv64izbb has a RORW/ROLW instructions that operate on the lower
32-bits of a 64-bit value and sign extend bit 31 of the result.

DAGCombiner won't match rotate idioms because the i32 type isn't Legal
on riscv64.

This patch teaches DAGCombiner to allow it if the type is going to
be promoted and the target has Custom type legalization for ISD::ROTL
or ISD::ROTR. I've restricted this to scalar types. It doesn't appear
any in tree targets other than riscv64 have custom type legalization
for rotates.

If this patch isn't acceptable, I guess I can match SRLW, SLLW, and OR
after type legalization, but I'd like to avoid that if possible.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D119062
2022-02-06 10:58:12 -08:00
Bjorn Pettersson cecf11c315 [DAGCombiner] Fold SSHLSAT/USHLSAT to SHL when no saturation will occur
When the shift amount is known and a known sign bit analysis of
the shiftee indicates that no saturation will occur, then we can
replace SSHLSAT/USHLSAT by SHL.

Differential Revision: https://reviews.llvm.org/D118765
2022-02-06 18:59:06 +01:00
Benjamin Kramer a40dc4eaf8 Simplify mask creation with llvm::seq. NFCI. 2022-02-05 23:35:41 +01:00
Sander de Smalen 6452549f30 [DAGCombiner] Fold vecreduce_or/and if operand is insert_subvector.
Fold:
  vecreduce_or(insert_subvec(zeroinitializer, vec))
  -> vecreduce_or(vec)

  vecreduce_and(insert_subvec(allones, vec))
  -> vecreduce_and(vec)

  vecreduce_and/or(insert_subvec(undef, vec))
  -> vecreduce_and/or(vec)

This is useful for SVE which uses insert/extract subvector
to convert fixed-width to/from scalable vectors.

Reviewed By: bsmith

Differential Revision: https://reviews.llvm.org/D118919
2022-02-05 14:35:53 +00:00
John Brawn 0d8092dd48 [AArch64] Fix legalization of v1f64 strict_fsetcc and strict_fsetccs
These operations are scalarized but the result type v1i1 isn't which
needs special handling (the same as is done for the non-strict
versions of these operations).

Differential Revision: https://reviews.llvm.org/D118258
2022-02-04 12:55:38 +00:00
serge-sans-paille ffe8720aa0 Reduce dependencies on llvm/BinaryFormat/Dwarf.h
This header is very large (3M Lines once expended) and was included in location
where dwarf-specific information were not needed.

More specifically, this commit suppresses the dependencies on
llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and
llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used,
this has a decent impact on number of preprocessed lines generated during
compilation of LLVM, as showcased below.

This is achieved by moving some definitions back to the .cpp file, no
performance impact implied[0].

As a consequence of that patch, downstream user may need to manually some extra
files:

llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h
llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h

In some situations, codes maybe relying on the fact that
llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h, this hidden
dependency now needs to be explicit.

$ clang++ -E  -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
after:   10978519
before:  11245451

Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
[0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions

Differential Revision: https://reviews.llvm.org/D118781
2022-02-04 11:44:03 +01:00
Bjorn Pettersson 3db39e7479 [DAGCombiner] Fix dependency analysis in checkMergeStoreCandidatesForDependencies
In the aftermath of D116895 a problem was found in the analysis of
dependencies between store merge candidates in
checkMergeStoreCandidatesForDependencies, that is needed to avoid
the cycles are introduced in the DAG.

In the past it has been enough (or assumed to be enough) to start
scanning from non-chain operands when analysing the store merge
candidates for dependencies, assuming that the analysis of chain
dependencies performed when finding the candidates would cover
up for potential dependencies that exist involving the chain operands.
It was however discovered that one could end up with scenarios such
as descibed in the aarch64-checkMergeStoreCandidatesForDependencies.ll
test case, when the dependency between two stores is given by a mix
of chain operand dependencies and non-chain operand dependencies.

The fix in this patch make sure that we also account for chain operand
dependencies when doing the more elaborate analysis in
checkMergeStoreCandidatesForDependencies, no longer relying on that
the earlier check involving chain operands is enough.

Differential Revision: https://reviews.llvm.org/D118943
2022-02-04 08:53:01 +01:00
Sander de Smalen 01bfe9729a [ISEL] Canonicalize STEP_VECTOR to LHS if RHS is a splat.
This helps recognise patterns where we're trying to match STEP_VECTOR
patterns to INDEX instructions that take a GPR for the Start/Step.

The reason for canonicalising this operation to the LHS is
because it will already be canonicalised to the LHS if the RHS
is a constant splat vector.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D118459
2022-02-03 09:31:46 +00:00
Simon Pilgrim 5aa2acc86b [DAG] SimplifyDemandedVectorElts - remove KnownZero/KnownUndef from DCI helper wrapper
None of the external users actual touch these (they're purely used internally down the recursive call) - its trivial to add another wrapper if anything ever does want to track known elements.
2022-02-02 12:04:49 +00:00
Simon Moll 7d926b7177 [VE] LEGALAVL and staged VVP legalization
The new LEGALAVL node annotates that the AVL refers to packs of 64bit.
We use a two-stage lowering approach with LEGALAVL:

First, standard SDNodes are translated into illegal VVP layer nodes.
Regardless of source (VP or standard), all VVP nodes have a mask and AVL
parameter. The AVL parameter refers to the element position (just as in
VP intrinsics).

Second, we legalize the AVL usage in VVP layer nodes. If the element
size is < 64bit, the EVL parameter has to be adjusted to refer to packs
of 64bits.  We wrap the legalized AVL in a LEGALAVL node to track this.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D118321
2022-02-02 09:11:41 +01:00
David Green c89cfbd4dd Revert "[DAG] Extend SearchForAndLoads with any_extend handling"
This reverts commit 100763a88f as it was
making incorrect assumptions about implicit zero_extends.
2022-02-01 20:18:40 +00:00
Simon Pilgrim 904395ab8f [DAG] SimplifyMultipleUseDemandedBits - add default Depth = 0 argument.
Simplifies an upcoming change.
2022-02-01 12:34:38 +00:00
Simon Pilgrim d83a96f59f [DAG] Make it clear mul(x,x) knownbits bit[1] == 0 check should be for x is undef only
As raised on rGffd0e464b4b9, if x is poison, this fold is still ok.
2022-02-01 11:32:14 +00:00
Bjorn Pettersson 3885879046 [DAGCombine] Add simple folds for SSHLSAT/USHLSAT
Do "simplifyShift" and "FoldConstantArithmetic" folds for the SSHLSAT
and USHLSAT DAG nodes.

This includes folds such as:
  (shlsat undef/poison, x) -> 0
  (shlsat x, undef/poison) -> undef
  (shlsat x, too_large_shamt) -> undef
  (shlsat 0, x) -> 0
  (shlsat x, 0) -> x
  (shlsat c1, c2) -> c3

Differential Revision: https://reviews.llvm.org/D118603
2022-02-01 10:51:35 +01:00
David Sherwood daa80339df [CodeGen] Support folds of not(cmp(cc, ...)) -> cmp(!cc, ...) for scalable vectors
I have updated TargetLowering::isConstTrueVal to also consider
SPLAT_VECTOR nodes with constant integer operands. This allows the
optimisation to also work for targets that support scalable vectors.

Differential Revision: https://reviews.llvm.org/D117210
2022-02-01 09:50:00 +00:00
Philip Reames 57cf29ac1b [Statepoint] Remove another use of getActualReturnType [NFC]
For the cross block gc.result projection case, we only care about the return type if there is a cross block gc.result, and if there is one, we can take the type from the gc.result.

At the moment, this makes little difference, but for opaque pointers we need a means to get result typing without relying on pointee types.
2022-01-31 09:57:46 -08:00
Philip Reames 6e4f7c0823 [Statepoints] Take result type from gc.result [NFC]
When lowering a gc.result, we can assume that the result type of the gc.result matches the type of the underlying call.  This is explicitly required in LangRef.

At the moment, this makes little difference, but for opaque pointers we need a means to get result typing without relying on pointee types.
2022-01-31 09:42:34 -08:00
Philip Reames 093b43f48d Sink getGCResultLocality to sole use [NFC] 2022-01-31 09:33:57 -08:00
Kerry McLaughlin 002b944dfa [SVE] Fix TypeSize->uint64_t implicit conversion in visitAlloca()
Fixes a crash ('Invalid size request on a scalable vector') in visitAlloca()
when we call this function for a scalable alloca instruction, caused
by the implicit conversion of TySize to uint64_t.
This patch changes TySize to a TypeSize as returned by getTypeAllocSize()
and ensures the allocation size is multiplied by vscale for scalable vectors.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D118372
2022-01-31 14:37:23 +00:00
Dávid Bolvanský ae990a3cbd [Analysis] Attribute noundef should not prevent tail call optimization
Very similar to https://reviews.llvm.org/D101230
Fixes https://github.com/llvm/llvm-project/issues/53501
2022-01-31 15:13:52 +01:00
Simon Pilgrim 7ec8fc2932 [X86] combineAnd() - per-element simplification - call SimplifyDemandedBits using mask demanded bits if SimplifyDemandedVectorElts fails
We already call SimplifyDemandedVectorElts using whether each vector mask element is zero/nonzero, this just extends this to also try SimplifyDemandedBits using the demanded bits mask generated from the nonzero elements.

This also requires an additional TargetLowering::SimplifyDemandedBits DemandedBits/DemandedElts wrapper.
2022-01-31 13:58:00 +00:00
Simon Pilgrim 2d1390efbe [DAG] SimplifyDemandedBits - mul(x,x) - if only demand bit[1] then fold to zero 2022-01-31 12:00:51 +00:00
Simon Pilgrim 48f45f6b25 [X86] Limit mul(x,x) knownbits tests with not undef/poison check
We can only assume bit[1] == zero if its the only demanded bit or the source is not undef/poison
2022-01-31 11:55:10 +00:00
Kazu Hirata 2bea207d26 [CodeGen] Use default member initialization (NFC)
Identified with modernize-use-default-member-init.
2022-01-30 12:32:51 -08:00
Cullen Rhodes 5d089d9a83 [DAGCombiner] Fix invalid size request in combineRepeatedFPDivisors
If we have a vector FP division with a splatted divisor, use
getVectorMinNumElements when scaling the num of uses by splat factor.

For AArch64 the combine kicks in for the <vscale x 4 x float> case since it's
above the fdiv threshold (3) when scaling num uses by splat factor, but the
codegen is worse (splat + vector fdiv + vector fmul) than the <vscale x 2 x
double> case (splat + vector fdiv).

If the combine could be converted into a scalar FP division by
scalarizeBinOpOfSplats it may be cheaper, but it looks like this is predicated
on the isExtractVecEltCheap TLI function which is implemented for x86 but not
AArch64. Perhaps for now combineRepeatedFPDivisors should only scale num uses
by splat if the division can be converted into scalar op.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D118343
2022-01-28 17:01:08 +00:00
Ellis Hoag 11d3074267 [InstrProf] Add single byte coverage mode
Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track functions coverage. This mode has significantly less size overhead in both code and data because
  * We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions
  * We use a single byte per function rather than 8 bytes per block

The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code.

When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%).

[0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4

Reviewed By: kyulee

Differential Revision: https://reviews.llvm.org/D116180
2022-01-27 17:38:55 -08:00
Simon Pilgrim fdd3e2c943 [DAG] SelectionDAG::getNode(N1,N2) - detect N2 constant vector splats as well as scalars
We already perform some basic folds (add/sub with zero etc.) on scalar types, this patch adds some basic support for constant splats as well in a few cases (we can add more with future test coverage).

In the cases I've enabled, we can handle buildvector implicit truncation as we're not creating new constant nodes from the vector types - we're just returning existing nodes. This allows us to get a number of extra cases in the aarch64 tests.

I haven't enabled support for undefs in buildvector splats, as we're often checking for zero/allones patterns that return the original constant and we shouldn't be returning undef elements in some of these cases - we can enable this later if we're OK with creating new constants.

Differential Revision: https://reviews.llvm.org/D118264
2022-01-27 10:59:08 +00:00