Commit Graph

1284 Commits

Author SHA1 Message Date
Craig Topper 30305d7948 [TargetLowering][RISCV][Sparc] Don't emit zero check in CTTZTableLookup for CTTZ_ZERO_UNDEF.
The code incorrectly checked for CTLZ_ZERO_UNDEF instead of
CTTZ_ZERO_UNDEF.

While I was there I flipped the condition into an early out.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D136010
2022-10-17 10:15:39 -07:00
Craig Topper 0121b1a4ac Revert "[TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant."
This reverts commit d4facda414.

This has been reported to cause failures. Reverting while I investigate.
2022-10-10 14:53:29 -07:00
Craig Topper d4facda414 [TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant.
If the divisor is even, we can first shift the dividend and divisor
right by the number of trailing zeros. Now the divisor is odd and we
can do the original algorithm to calculate a remainder. Then we shift
that remainder left by the number of trailing zeros and add the bits
that were shifted out of the dividend.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D135541
2022-10-10 11:02:22 -07:00
Paul Scoropan ce004fb4f2 [PowerPC] XCOFF exception section support on the direct assembler path
This feature implements support for making entries in the exception section
on XCOFF on the direct assembly path using the ".except" pseudo-op. It also
provides functionality to lower entries (comprised of language and reason
codes) into the exception section through the use of annotation metadata
attached to llvm.ppc.trap/trapd/tw/tdw intrinsics. Integrated assembler
support will be provided in another review. https://reviews.llvm.org/D133030
needs to merge first for LIT tests

Reviewed By: shchenz, RKSimon

Differential Revision: https://reviews.llvm.org/D132146
2022-09-26 22:24:20 -04:00
Simon Pilgrim 8206044183 [DAG] SimplifyDemandedVectorElts - add MULHS/MULHU handling to existing MUL/AND handling
Allows to determine known zero elements, which particularly helps simplification of DIV/REM by constant patterns
2022-09-19 12:44:43 +01:00
Craig Topper 38ffa2bb96 [LegalizeTypes] Improve splitting for urem/udiv by constant for some constants.
For remainder:
If (1 << (Bitwidth / 2)) % Divisor == 1, we can add the high and low halves
together and use a (Bitwidth / 2) urem. If (BitWidth /2) is a legal integer
type, this urem will be expand by DAGCombiner using multiply by magic
constant. We do have to take into account that adding high and low
together can produce a carry, making it a (BitWidth / 2)+1 bit number.
So we need to also add back in the carry from the first addition.

For division:
We can use the above trick to compute the remainder, subtract that
remainder from the dividend, then multiply by the multiplicative
inverse of the Divisor modulo (1 << BitWidth).

This is based on the section "Remainder by Summing Digits" in
Hacker's delight.

The remainder trick is similar to a trick you may have learned for
determining if a decimal number is divisible by 3. You can add all the
digits together and see if the sum is divisible by 3. If you're not sure
if the sum is divisible by 3, you can add its digits together. This
can be repeated until you have a single decimal digit. If that digit
is 3, 6, or 9, then the original number is divisible by 3. This works
because 10 % 3 == 1.

gcc already does this same trick. There are additional tricks gcc
does urem as well as srem, udiv, and sdiv that I plan to add in
future patches.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130862
2022-09-12 10:34:52 -07:00
Craig Topper e529c0a2a0 [TargetLowering] Use ComputeMaxSignificantBits instead of ComputeNumSignBits in expandMUL_LOHI. NFC
The way ComputeNumSignBits was being used was only correct if
OuterBitSize is exactly 2x InnerBitSize. Which is always true,
but not obviously so. Comparing ComputeMaxSignificantBits to
InnerBitSize feels more correct.
2022-09-04 22:35:16 -07:00
Craig Topper 45e2809f71 [TargetLowering] Use getShiftAmountConstant. NFC 2022-09-04 20:22:32 -07:00
Sanjay Patel f8dfbea324 [SDAG] expand more is-power-of-2 patterns that use popcount
(ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0)

Adjust the legality check to avoid the poor codegen on AArch64.
We probably only want to use popcount on this pattern when it
is a single instruction.

fixes #57225

Differential Revision: https://reviews.llvm.org/D132237
2022-08-23 17:53:53 -04:00
Nick Desaulniers 6b0e2fa6f0 [SelectionDAG] make INLINEASM_BR use MachineBasicBlocks instead of BlockAddresses
As part of re-architecting callbr to no longer use blockaddresses
(https://reviews.llvm.org/D129288), we don't really need them in MIR.
They make comparing MachineBasicBlocks of indirect targets during
MachineVerifier a PITA.

Suggested by @efriedma from the discussion:
https://reviews.llvm.org/D130290#3669531

Reviewed By: efriedma, void

Differential Revision: https://reviews.llvm.org/D130316
2022-08-17 09:34:31 -07:00
Simon Pilgrim 4de35f4bbf [DAG] Add TODO to remove creation of INSERT_SUBVECTOR nodes from SimplifyMultipleUseDemandedBits
SimplifyMultipleUseDemandedBits shouldn't be creating general nodes like this - although we allow bitcasts, even general constant folding is avoided.

Removing it causes a number of regressions that need addressing first, but I've added a TODO for now.
2022-08-12 10:45:30 +01:00
Simon Pilgrim ed162d455a [DAG] Avoid hasOneUse() calls if the cheaper !AssumeSingleUse test has already failed. NFC.
Very minor optimization, but every little helps..
2022-08-09 16:42:19 +01:00
Simon Pilgrim d79e7dc939 [DAG] SimplifyDemandedVectorElts - and/mul(x,y) - if a demanded element of y is known zero then we don't need to demand it in x
This fixes most of the remaining regressions from the fixes in rG293899c64b75
2022-08-09 16:24:08 +01:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Simon Pilgrim 9641a201a5 [DAG] Add initial SelectionDAG::canCreateUndefOrPoison support
This patch adds basic support for a DAG variant of the canCreateUndefOrPoison call and updates DAGCombiner::visitFREEZE to use it, further Opcodes (including target specific Opcodes) can be handled when we have test coverage.

So far, I've left visitFREEZE to just use this for unary nodes (which currently means the existing BITCAST/FREEZE cases) - later patches will add other unary opcodes (with test coverage) and we can also refactor visitFREEZE to support a general number of operands like we do in InstCombinerImpl::pushFreezeToPreventPoisonFromPropagating.

I'm not aware of any vector test freeze coverage so the DemandedElts (and the Depth) args are not being used yet - but they are in place. Similarly we will be able to handle poison generating SDNodeFlags as and when it becomes an issue.

Part of the work for D106675 / PR50468

Differential Revision: https://reviews.llvm.org/D130646
2022-08-08 15:16:06 +01:00
Shubham Narlawar ab4fc87a9d [DAG] Emit table lookup from TargetLowering::expandCTTZ()
This patch emits table lookup in expandCTTZ.

Context -
https://reviews.llvm.org/D113291 transforms set of IR instructions to
cttz intrinsic but there are some targets which does not support CTTZ or
CTLZ. Hence, I generate a table lookup in TargetLowering::expandCTTZ().

Differential Revision: https://reviews.llvm.org/D128911
2022-08-08 12:08:05 +01:00
Kazu Hirata a2d4501718 [llvm] Fix comment typos (NFC) 2022-08-07 00:16:14 -07:00
Dawid Jurczak 1bd31a6898 [NFC] Add SmallVector constructor to allow creation of SmallVector<T> from ArrayRef of items convertible to type T
Extracted from https://reviews.llvm.org/D129781 and address comment:
https://reviews.llvm.org/D129781#3655571

Differential Revision: https://reviews.llvm.org/D130268
2022-08-05 13:35:41 +02:00
Simon Pilgrim 9ad082eb5a [DAG] Pull out repeated getOperand() calls for shuffle ops. NFC. 2022-07-30 14:02:54 +01:00
Simon Pilgrim af1b7ebcdf [TargetLowering] Move a few hasOneUse() tests later to reduce unnecessary computations. NFC.
Many of these cases, an early-out on the much cheaper getOpcode() check will avoid us needing to call hasOneUse() entirely.
2022-07-29 14:20:35 +01:00
Simon Pilgrim 9082c13106 [Support] Add KnownBits::concat method
Add a method for the various cases where we need to concatenate 2 KnownBits together (BUILD_PAIR and SHIFT_PARTS in particular) - uses the existing APInt::concat 'HiBits.concat(LoBits)' convention

Differential Revision: https://reviews.llvm.org/D130557
2022-07-29 11:06:39 +01:00
Simon Pilgrim 8c99cef1e7 [DAG] Remove SelectionDAG::GetDemandedBits and use SimplifyMultipleUseDemandedBits directly.
GetDemandedBits is mainly a wrapper around SimplifyMultipleUseDemandedBits now, and is only used by DAGCombiner::visitSTORE so I've moved all remaining functionality there.

visitSTORE was making use of this to 'simplify' constants for a trunc-store. Just removing this code left to a mixture of regressions and gains - it came down to whether a target preferred a sign or zero extended constant for materialization/truncation. I've just moved the code over for now, but a next step would be to move this to targetShrinkDemandedConstant, but some targets that override the method expect a basic binop, and might react badly to a store node.....
2022-07-28 17:03:44 +01:00
Simon Pilgrim 69d5a038b9 [DAG] Enable ISD::SRL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the ISD::SRL source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.

This is another step towards removing SelectionDAG::GetDemandedBits and just using TargetLowering::SimplifyMultipleUseDemandedBits.

There a few cases where we end up with extra register moves which I think we can accept in exchange for the increased ILP.

Differential Revision: https://reviews.llvm.org/D77804
2022-07-28 14:10:44 +01:00
Simon Pilgrim c0b3f7a50f [DAG] SimplifyDemandedBits - ensure we clear known One bits that AssertZext asserts are really known Zero
Matches ComputeKnownBits behaviour

Thanks to @uabelho for the fuzz regression report on D129765
2022-07-27 13:57:47 +01:00
Simon Pilgrim 529bd4f352 [DAG] SimplifyDemandedBits - don't early-out for multiple use values
SimplifyDemandedBits currently early-outs for multi-use values beyond the root node (just returning the knownbits), which is missing a number of optimizations as there are plenty of cases where we can still simplify when initially demanding all elements/bits.

@lenary has confirmed that the test cases in aea-erratum-fix.ll need refactoring and the current increase codegen is not a major concern.

Differential Revision: https://reviews.llvm.org/D129765
2022-07-27 10:54:06 +01:00
Simon Pilgrim e82d49bfed [DAG] SimplifyMultipleUseDemandedBits - early-out for any scalable vector types
Noticed while working to remove SelectionDAG::GetDemandedBits - we were relying on the callers to have already bailed for scalable vectors
2022-07-24 12:59:43 +01:00
Simon Pilgrim a3e38b4a20 [DAG] SimplifyDemandedVectorElts - if every and/mul element-pair has a zero/undef then just constant fold to zero 2022-07-24 12:00:31 +01:00
Simon Pilgrim 5f89d2bae9 [DAG] Move OR(AND(X,C1),AND(OR(X,Y),C2)) -> OR(AND(X,OR(C1,C2)),AND(Y,C2)) fold to SimplifyDemandedBits
This will fix the SystemZ v3i31 memcpy regression in D77804 (with the help of D129765 as well....).

It should also allow us to /bend/ the oneuse limitation for cases where we can use demanded bits to safely peek though multiple uses of the AND ops.
2022-07-23 13:17:24 +01:00
Simon Pilgrim 6aff1b7b3c [DAG] SimplifyDemandedBits - pull out repeated getValueType() calls. NFC. 2022-07-23 12:01:54 +01:00
Simon Pilgrim 0f6b0461b0 [DAG] SimplifyDemandedBits - relax "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" to match only demanded bits
The "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold is currently limited to the XOR mask being a shifted all-bits mask, but we can relax this to only need to match under the demanded bits.

This helps expose more bit extraction/clearing patterns and fixes the PowerPC testCompares*.ll regressions from D127115

Alive2: https://alive2.llvm.org/ce/z/fl7T7K

Differential Revision: https://reviews.llvm.org/D129933
2022-07-19 10:59:07 +01:00
Craig Topper 7fa1c32634 [CodeGen] Remove unnecessary APInt copy. NFC 2022-07-17 23:41:53 -07:00
Craig Topper a55ff6aadd [Support][CodeGen] Fix spelling Divison->Division. NFC 2022-07-17 23:16:29 -07:00
Craig Topper 795602af0c [CodeGen] Don't compare bool with integer 0. NFC
The IsAdd field is a bool.
2022-07-17 23:16:14 -07:00
Simon Pilgrim 3c8bf29696 [DAG] Move "xor (X logical_shift ShiftC), XorC --> (not X) logical_shift ShiftC" fold into SimplifyDemandedBits
SimplifyDemandedBits is called slightly later which allows the not(sext(x)) -> sext(not(x)) fold to occur via foldLogicOfShifts

As mentioned on D127115, we should be able to further generalise this based off the demanded bits.
2022-07-15 13:10:15 +01:00
Nikita Popov 2a721374ae [IR] Don't use blockaddresses as callbr arguments
Following some recent discussions, this changes the representation
of callbrs in IR. The current blockaddress arguments are replaced
with `!` label constraints that refer directly to callbr indirect
destinations:

    ; Before:
    %res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo))
    to label %asm.fallthrough [label %foo]
    ; After:
    %res = callbr i8* asm "", "=r,r,!i"(i8* %x)
    to label %asm.fallthrough [label %foo]

The benefit of this is that we can easily update the successors of
a callbr, without having to worry about also updating blockaddress
references. This should allow us to remove some limitations:

* Allow unrolling/peeling/rotation of callbr, or any other
  clone-based optimizations
  (https://github.com/llvm/llvm-project/issues/41834)
* Allow duplicate successors
  (https://github.com/llvm/llvm-project/issues/45248)

This is just the IR representation change though, I will follow up
with patches to remove limtations in various transformation passes
that are no longer needed.

Differential Revision: https://reviews.llvm.org/D129288
2022-07-15 10:18:17 +02:00
Craig Topper dcfc1fd26f [SelectionDAG][RISCV][AMDGPU][ARM] Improve SimplifyDemandedBits for SHL with variable shift amount.
If we have a variable shift amount and the demanded mask has leading
zeros, we can propagate those leading zeros to not demand those bits
from operand 0. This can allow zero_extend/sign_extend to become
any_extend. This pattern can occur due to C integer promotion rules.

This transform is already done by InstCombineSimplifyDemanded.cpp where
sign_extend can be turned into zero_extend for example.

Reviewed By: spatel, foad

Differential Revision: https://reviews.llvm.org/D121833
2022-07-14 16:10:14 -07:00
Kazu Hirata 611ffcf4e4 [llvm] Use value instead of getValue (NFC) 2022-07-13 23:11:56 -07:00
Simon Pilgrim d172842b51 [DAG] SimplifyDemandedVectorElts - adjust demanded elements for selection mask for known zero results
If an element is known zero from both selections then it shouldn't matter what the selection mask element is.
2022-07-13 17:36:05 +01:00
Craig Topper 8eaf00e04d [TargetLowering][RISCV] Make expandCTLZ work for non-power of 2 types.
To convert CTLZ to popcount we do

x = x | (x >> 1);
x = x | (x >> 2);
...
x = x | (x >>16);
x = x | (x >>32); // for 64-bit input
return popcount(~x);

This smears the most significant set bit across all of the bits
below it then inverts the remaining 0s and does a population count.

To support non-power of 2 types, the last shift amount must be
more than half of the size of the type. For i15, the last shift
was previously a shift by 4, with this patch we add another shift
of 8.

Fixes PR56457.

Differential Revision: https://reviews.llvm.org/D129431
2022-07-12 11:36:37 -07:00
Simon Pilgrim ded62411f7 [DAG] SimplifyDemandedBits - AND/OR/XOR - attempt basic knownbits simplifications before calling SimplifyMultipleUseDemandedBits
Noticed while investigating the SystemZ regressions in D77804, prefer handling the knownbits analysis/simplification in the bitop nodes directly before falling back to SimplifyMultipleUseDemandedBits
2022-07-12 14:09:00 +01:00
Nikita Popov c64aba5d93 [SDAG] Don't duplicate ParseConstraints() implementation SDAGBuilder (NFCI)
visitInlineAsm() in SDAGBuilder was duplicating a lot of the code
in ParseConstraints(), in particular all the logic to determine the
operand value and constraint VT.

Rely on the data computed by ParseConstraints() instead, and update
its ConstraintVT implementation to match getCallOperandValEVT()
more precisely.
2022-07-12 10:42:02 +02:00
Craig Topper b05160dbdf [SelectionDAG] Simplify how we drop poison flags in SimplifyDemandedBits.
As far as I can tell what was happening in the original code is
that the getNode call receives the same operands as the original
node with different SDNodeFlags. The logic inside getNode detects
that the node already exists and intersects the flags into the
existing node and returns it. This results in Op and NewOp for the
TLO.CombineTo call always being the same node.

We may have already called CombineTo as part of the recursive handling.
A second call to CombineTo as we unwind the recursion overwrites
the previous CombineTo. I think this means any time we updated the
poison flags that was the only change that ends up getting made
and we relied on DAGCombiner to revisit and call SimplifyDemandedBits
again. The second time the poison flags wouldn't need to be dropped
and we would keep the CombineTo call from further down the recursion.

We can instead call setFlags to drop the poison flags and remove the
call to TLO.CombineTo. This way we keep the CombineTo from deeper in
the recursion which should be more efficient.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D129511
2022-07-11 13:42:33 -07:00
Simon Pilgrim b53046122f [DAG] SimplifyDemandedBits - fold AND(INSERT_SUBVECTOR(C,X,I),M) -> INSERT_SUBVECTOR(AND(C,M),X,I)
If all the demanded bits of the AND mask covering the inserted subvector 'X' are known to be one, then the mask isn't affecting the subvector at all.

In which case, if the base vector 'C' is undef/constant, then move the AND mask up to just (constant) fold it directly.

Addresses some of the regressions from D129150, particularly the cases where we're attempting to zero the upper elements of a widened vector.

Differential Revision: https://reviews.llvm.org/D129290
2022-07-08 16:08:31 +01:00
Kazu Hirata 3b7c3a654c Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3.
2022-06-25 11:56:50 -07:00
Kazu Hirata aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
David Green c0ecbfa4fd [AArch64] Known bits for AArch64ISD::DUP
An AArch64ISD::DUP is just a splat, where the known bits for each lane
are the same as the input. This teaches that to computeKnownBitsForTargetNode.

Problems arise for constants though, as a constant BUILD_VECTOR can be
lowered to an AArch64ISD::DUP, which SimplifyDemandedBits would then
turn back into a constant BUILD_VECTOR leading to an infinite cycle.
This has been prevented by adding a isTargetCanonicalConstantNode node
to prevent the conversion back into a BUILD_VECTOR.

Differential Revision: https://reviews.llvm.org/D128144
2022-06-20 19:11:57 +01:00
Simon Pilgrim 1ebe5cac46 [DAG] SimplifyDemandedBits - add DemandedElts handling to ISD::SIGN_EXTEND_INREG simplification 2022-06-19 15:35:29 +01:00
Simon Pilgrim db1be696c4 [DAG] SimplifyDemandedBits - add ISD::VSELECT handling 2022-06-19 15:18:25 +01:00
Ping Deng c06f77ec0d [SelectionDAG] fold 'Op0 - (X * MulC)' to 'Op0 + (X << log2(-MulC))'
Reviewed By: craig.topper, spatel

Differential Revision: https://reviews.llvm.org/D127474
2022-06-15 05:50:18 +00:00
Simon Pilgrim 1cf9b24da3 [DAG] Enable ISD::FSHL/R SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.

This helps with several of the regressions from D125836
2022-06-12 19:25:20 +01:00