Commit Graph

349 Commits

Author SHA1 Message Date
Sanjay Patel 2fa8fc3d0a [InstCombine] freeze operand in div+mul fold
As discussed in issue #37809, this transform is not safe
if the input is an undefined value.

This is similar to recent changes for urem and sdiv:
d428f09b2c
99ef341ce9

There is no difference in codegen on the basic examples,
but this could lead to regressions. We may need to
improve freeze analysis or lowering if that happens.

Presumably, in real cases that are similar to the tests
where a subsequent transform removes the rem, we
will also be able to remove the freeze by seeing that
the parameter has 'noundef'.
2022-05-12 13:49:29 -04:00
Sanjay Patel 99ef341ce9 [InstCombine] freeze operand in sdiv expansion
As discussed in issue #37809, this transform is not safe
if the input is an undefined value.

This is similar to a recent change for urem:
d428f09b2c

There is no difference in codegen on the basic examples,
but this could lead to regressions. We may need to
improve freeze analysis or lowering if that happens.

Presumably, in real cases that are similar to the tests
where a subsequent transform removes the select, we
will also be able to remove the freeze by seeing that
the parameter has 'noundef'.
2022-05-11 14:01:28 -04:00
Sanjay Patel d428f09b2c [InstCombine] freeze operand in urem expansion
As discussed in issue #37809, this transform is not safe
if the input is an undefined value.

There is no difference in codegen on the basic examples,
but this could lead to regressions. We may need to
improve freeze analysis or lowering if that happens.
2022-05-11 12:47:26 -04:00
Jonas Paulsson 304378fd09 Reapply "[BuildLibCalls] Introduce getOrInsertLibFunc() for use when building
libcalls." (was 0f8c626). This reverts commit 14d9390.

The patch previously failed to recognize cases where user had defined a
function alias with an identical name as that of the library
function. Module::getFunction() would then return nullptr which is what the
sanitizer discovered.

In this updated version a new function isLibFuncEmittable() has as well been
introduced which is now used instead of TLI->has() anytime a library function
is to be emitted . It additionally also makes sure there is e.g. no function
alias with the same name in the module.

Reviewed By: Eli Friedman

Differential Revision: https://reviews.llvm.org/D123198
2022-05-02 19:37:00 +02:00
Liqin Weng fa4b4f0fcb [InstCombine] fold more constant remainder to select-of-constants remainder
Reviewed By: xbolva00, spatel, Chenbing.Zheng

Differential Revision: https://reviews.llvm.org/D123486
2022-04-12 09:40:56 +08:00
Chenbing Zheng 467cbb6249 [InstCombine] fold more constant divisor to select-of-constants divisor
By adding a parameter to function FoldOpIntoSelect, we can fold more Ops to Select.
For this example, we tend to fold the division instruction,
so we no longer care whether SelectInst is one use.

This patch slove TODO left in InstCombine/div.ll.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122967
2022-04-08 10:19:24 +08:00
serge-sans-paille 59630917d6 Cleanup includes: Transform/Scalar
Estimated impact on preprocessor output line:
before: 1062981579
after:  1062494547

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120817
2022-03-03 07:56:34 +01:00
Nikita Popov 587c7ff15c [InstCombine] Support min/max intrinsics in udiv->lshr fold
This complements the existing fold for selects. This fold is a bit
more conservative, requiring one-use. The other folds here should
probably also be subjected to a one-use restriction.

https://alive2.llvm.org/ce/z/Q9eCDU
https://alive2.llvm.org/ce/z/8YK2CJ
2022-02-23 15:51:36 +01:00
Nikita Popov 03e6efb8c2 [InstCombine] Further simplify udiv -> lshr folding
Rather than queuing up actions, have one function that does the
log2() fold in the obvious way, but with a flag that allows us
to check whether the fold will succeed without actually performing
it.
2022-02-23 15:29:21 +01:00
Nikita Popov 5ccb0582c2 [InstCombine] Simplify udiv -> lshr folding
What we're really doing here is converting Op0 udiv Op1 into
Op0 lshr log2(Op1), so phrase it in that way. Actually pushing
the lshr into the log2(Op1) expression should be seen as a separate
transform.
2022-02-23 14:55:23 +01:00
Nikita Popov 5fb65557e3 [InstCombine] Remove unused visitUDivOperand() argument (NFC)
This function only works on the RHS operand.
2022-02-23 13:16:44 +01:00
Sanjay Patel 39e602b6c4 [InstCombine] try to fold binop with phi operands
This is an alternate version of D115914 that handles/tests all binary opcodes.

I suspect that we don't see these patterns too often because -simplifycfg
would convert the minimal cases into selects rather than leave them in phi form
(note: instcombine has logic holes for combining the select patterns too though,
so that's another potential patch).

We only create a new binop in a predecessor that unconditionally branches to
the final block.
https://alive2.llvm.org/ce/z/C57M2F
https://alive2.llvm.org/ce/z/WHwAoU (not safe to speculate an sdiv for example)
https://alive2.llvm.org/ce/z/rdVUvW (but it is ok on this path)

Differential Revision: https://reviews.llvm.org/D117110
2022-01-22 15:00:06 -05:00
Sanjay Patel a7a2860d0e [InstCombine] convert mul with sexted bool and constant to select
We already have the related folds for zext-of-bool, so it
should make things more consistent to have this transform
to select for sext-of-bool too:
https://alive2.llvm.org/ce/z/YikdfA

Fixes #53319
2022-01-20 15:57:01 -05:00
Sanjay Patel a7ed21aa1e [InstCombine] try to fold div with constant dividend and select-of-constants divisor
We avoid this fold in the more general cases where we use FoldOpIntoSelect.
That's because -- unlike most binary opcodes -- 'div' can't usually be
speculated with a variable divisor since it can have immediate UB. But in
the case where both arms of the select are constants, we can safely evaluate
both sides and eliminate 'div' completely.

This is a follow-up to the equivalent fold for 'rem' opcodes:
D115173 / f65be726ab
2021-12-08 10:27:50 -05:00
Sanjay Patel f65be726ab [InstCombine] try to fold rem with constant dividend and select-of-constants divisor
We avoid this fold in the more general cases where we use `FoldOpIntoSelect`.
That's because -- unlike most binary opcodes -- 'rem' can't usually be
speculated with a variable divisor since it can have immediate UB. But in
the case where both arms of the select are constants, we can safely evaluate
both sides and eliminate 'rem' completely.

This should fix:
https://llvm.org/PR52102

The same optimization for 'div' is planned as a follow-up patch.

Differential Revision: https://reviews.llvm.org/D115173
2021-12-07 15:48:45 -05:00
Sanjay Patel 7a2949647a [InstCombine] propagate no-wrap flag through select-of-mul fold
This may not be obvious, but Alive2 agrees:
https://alive2.llvm.org/ce/z/Ld9qNT

If the mul has "nsw", then -1 * INT_MIN is poison, so the
negate can also have "nsw" because 0 - INT_MIN is poison.

If the mul has "nuw", then that means the "OtherOp" can only
be 0 or 1 (anything else multiplied by 0xfff... would wrap).
So the replacement negate must be "nsw" because it is either
"0-0" or "0-1".

This is another regression noticed with a planned follow-up
to D111410.
2021-10-12 12:57:20 -04:00
Jay Foad a9bceb2b05 [APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.

Differential Revision: https://reviews.llvm.org/D110807
2021-10-04 08:57:44 +01:00
Simon Pilgrim 5a14edd8ed [InstCombine] Ensure shifts are in range for (X << C1) / C2 -> X fold.
We can get here before out of range shift amounts have been handled - limit to BW-2 for sdiv and BW-1 for udiv

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38078
2021-09-25 12:57:43 +01:00
Florian Hahn e08a5dc86f
[InstCombine] Move InstCombineWorklist to Utils to allow reuse (NFC).
InstCombine's worklist can be re-used by other passes like
VectorCombine. Move it to llvm/Transform/Utils and rename it to
InstructionWorklist.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D110181
2021-09-22 08:47:21 +01:00
Dávid Bolvanský c0fdfc9af2 [InstCombine] powi(x, y) * powi(x, z) -> powi(x, y + z)
We already have pow(x, y) * pow(x, z) -> pow(x, y + z) transformation, but we are missing same transformation for powi (power is integer).

Requires reassoc.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D109954
2021-09-21 18:20:46 +02:00
Daniil Seredkin 6643e51d79 [InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X)
InstCombine didn't perform (sext bool X) * (sext bool X) --> zext (and X, X) which can result in just (zext X). The patch adds regression tests to check this transformation and adds a check for equality of mul's operands for that case.

Differential Revision: https://reviews.llvm.org/D104193
2021-06-18 16:28:06 +07:00
Daniil Seredkin 6de741de08 Revert "[InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X)"
This reverts commit 31053338c9.
2021-06-18 14:21:02 +07:00
Daniil Seredkin 31053338c9 [InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X)
InstCombine didn't perform (sext bool X) * (sext bool X) --> zext (and X, X) which can result in just (zext X). The patch adds regression tests to check this transformation and adds a check for equality of mul's operands for that case.

Differential Revision: https://reviews.llvm.org/D104193
2021-06-18 14:12:00 +07:00
Bjorn Pettersson 4c7f820b2b Update @llvm.powi to handle different int sizes for the exponent
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.

The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.

One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.

Differential Revision: https://reviews.llvm.org/D99439
2021-06-17 09:38:28 +02:00
Daniil Seredkin 7736c1936a [InstCombine] Missed optimization for pow(x, y) * pow(x, z) with fast-math
If FP reassociation (fast-math) is allowed, then LLVM is free to do the
following transformation pow(x, y) * pow(x, z) -> pow(x, y + z).
This patch adds this transformation and tests for it.
See more https://bugs.llvm.org/show_bug.cgi?id=47205

It handles two cases

1. When operands of fmul are different instructions

%4 = call reassoc float @llvm.pow.f32(float %0, float %1)
%5 = call reassoc float @llvm.pow.f32(float %0, float %2)
%6 = fmul reassoc float %5, %4
-->
%3 = fadd reassoc float %1, %2
%4 = call reassoc float @llvm.pow.f32(float %0, float %3)

2. When operands of fmul are the same instruction

%4 = call reassoc float @llvm.pow.f32(float %0, float %1)
%5 = fmul reassoc float %4, %4
-->
%3 = fadd reassoc float %1, %1
%4 = call reassoc float @llvm.pow.f32(float %0, float %3)

Differential Revision: https://reviews.llvm.org/D102574
2021-06-07 08:08:05 -04:00
Daniil Seredkin 13140120dc [InstCombine] Relax constraints of uses for exp(X) * exp(Y) -> exp(X + Y)
InstCombine didn't perform the transformations when fmul's operands were
the same instruction because it required to have one use for each of them
which is false in the case. This patch fixes this + adds tests for them
and introduces a new function isOnlyUserOfAnyOperand to check these cases
in a single place.

This patch is a result of discussion in D102574.

Differential Revision: https://reviews.llvm.org/D102698
2021-06-01 08:33:23 -04:00
Sanjay Patel a7cee55762 [InstCombine] fold fdiv with powi divisor (PR49147)
This extends b40fde062c for the especially non-standard
powi pattern. We want to avoid being completely wrong
on the negation-of-int-min corner case, so I'm adding
an extra FMF check for 'ninf' assuming that gives us
the flexibility to handle that possibility.
https://llvm.org/PR49147
2021-02-24 16:44:36 -05:00
Sanjay Patel 868d43fbd6 [InstCombine] add helper for x/pow(); NFC
We at least want to add powi to this list, so
split it off into a switch to reduce code duplication.
2021-02-24 16:44:36 -05:00
Sanjay Patel e772618f1e [InstCombine] fold fdiv with exp/exp2 divisor (PR49147)
Follow-up to:
D96648 / b40fde062
...for the special-case base calls.

From the earlier commit:
This is unusual in the general (non-reciprocal) case because we need
an extra instruction, but that should be better for general FP
reassociation and codegen. We conservatively check for "arcp" FMF
here as we do with existing fdiv folds, but it is not strictly
necessary to have that.
2021-02-20 16:02:58 -05:00
Sanjay Patel b40fde062c [InstCombine] fold fdiv with pow divisor (PR49147)
This is unusual in the general (non-reciprocal) case because we need
an extra instruction, but that should be better for general FP
reassociation and codegen. We conservatively check for "arcp" FMF
here as we do with existing fdiv folds, but it is not strictly
necessary to have that.

This is part of solving:
https://llvm.org/PR49147
(The powi variant potentially has a different constraint.)

Differential Revision: https://reviews.llvm.org/D96648
2021-02-14 08:07:36 -05:00
Kazu Hirata 302313a264 [Transforms] Use range-based for loops (NFC) 2021-02-08 22:33:53 -08:00
Dávid Bolvanský ed396212da [InstCombine] Transform abs pattern using multiplication to abs intrinsic (PR45691)
```
unsigned r(int v)
{
    return (1 | -(v < 0)) * v;
}

`r` is equivalent to `abs(v)`.

```

```
define <4 x i8> @src(<4 x i8> %0) {
%1:
  %2 = ashr <4 x i8> %0, { 31, undef, 31, 31 }
  %3 = or <4 x i8> %2, { 1, 1, 1, undef }
  %4 = mul nsw <4 x i8> %3, %0
  ret <4 x i8> %4
}
=>
define <4 x i8> @tgt(<4 x i8> %0) {
%1:
  %2 = icmp slt <4 x i8> %0, { 0, 0, 0, 0 }
  %3 = sub nsw <4 x i8> { 0, 0, 0, 0 }, %0
  %4 = select <4 x i1> %2, <4 x i8> %3, <4 x i8> %0
  ret <4 x i8> %4
}
Transformation seems to be correct!
```

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D94874
2021-01-17 17:06:14 +01:00
Simon Pilgrim b752daa26b [InstCombine] Replace getLogBase2 internal helper with ConstantExpr::getExactLogBase2. NFCI.
This exposes the helper for other power-of-2 instcombine folds that I'm intending to add vector support to.

The helper only operated on power-of-2 constants so getExactLogBase2 is a more accurate name.
2020-10-11 10:31:17 +01:00
Simon Pilgrim 702ccb40e2 [InstCombine] getLogBase2(undef) -> 0.
Move the undef element handling into the getLogBase2 helper instead of pre-empting with replaceUndefsWith.
2020-10-10 20:29:03 +01:00
Simon Pilgrim 3aab3cbd4a [InstCombine] getLogBase2 - no need to specify Type. NFCI.
In all the getLogBase2 uses, the specified Type is always the same as the constant being folded.
2020-10-10 20:09:55 +01:00
Simon Pilgrim 567049f892 [InstCombine] Use m_FAbs matcher helper. NFCI. 2020-10-01 14:42:34 +01:00
Nikita Popov 58b28fa7a2 [InstCombine] Fold mul of abs intrinsic
Same as the existing SPF_ABS fold. We don't need to explicitly
handle NABS, as the negs will get folded away first.
2020-09-05 12:37:45 +02:00
Venkataramanan Kumar 626c3738cd [InstCombine] Transform 1.0/sqrt(X) * X to X/sqrt(X)
These transforms will now be performed irrespective of the number of uses for the expression "1.0/sqrt(X)":
1.0/sqrt(X) * X => X/sqrt(X)
X * 1.0/sqrt(X) => X/sqrt(X)

We already handle more general cases, and we are intentionally not creating extra (and likely expensive)
fdiv ops in IR. This pattern is the exception to the rule because we always expect the Backend to reduce
X/sqrt(X) to sqrt(X), if it has the necessary (reassoc) fast-math-flags.

Ref: DagCombiner optimizes the X/sqrt(X) to sqrt(X).

Differential Revision: https://reviews.llvm.org/D86726
2020-09-02 08:23:48 -04:00
Christopher Tetreault 640f20b0c7 [SVE] Remove calls to VectorType::getNumElements from InstCombine
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D82237
2020-08-31 12:59:10 -07:00
Sanjay Patel e6b6787d01 [InstCombine] fold abs(X)/X to cmp+select
The backend can convert the select-of-constants to
bit-hack shift+logic if desirable.

https://alive2.llvm.org/ce/z/pgJT6E

  define i8 @src(i8 %x) {
  %0:
    %a = abs i8 %x, 1
    %d = sdiv i8 %x, %a
    ret i8 %d
  }
  =>
  define i8 @tgt(i8 %x) {
  %0:
    %cond = icmp sgt i8 %x, 255
    %r = select i1 %cond, i8 1, i8 255
    ret i8 %r
  }
  Transformation seems to be correct!
2020-08-17 08:01:28 -04:00
Sanjay Patel 6cd4a6f6b2 [InstCombine] reduce code duplication; NFC 2020-08-17 08:01:27 -04:00
Roman Lebedev d6f0600c96
[NFC][InstCombine] Add FIXME's for getLogBase2() / visitUDivOperand()
These are not correctness issues.

In visitUDivOperand(), if the (potential) divisor is undef, then udiv is
already UB, so it is not incorrect to keep undef as shift amount.

But, that is suboptimal.
We could instead simply drop that select, picking the other operand.

Afterwards, getLogBase2() could assert that there is no undef in divisor.
2020-08-12 22:06:54 +03:00
Roman Lebedev 12d93a27e7
[InstCombine] Sanitize undef vector constant to 1 in X*(2^C) with X << C (PR47133)
While x*undef is undef, shift-by-undef is poison,
which we must avoid introducing.

Also log2(iN undef) is *NOT* iN undef, because log2(iN undef) u< N.

See https://bugs.llvm.org/show_bug.cgi?id=47133
2020-08-12 22:06:53 +03:00
Roman Lebedev be02adfad7
[InstCombine] Fold (x + C1) * (-1<<C2) --> (-C1 - x) * (1<<C2)
Negator knows how to do this, but the one-use reasoning is getting
a bit muddy here, we don't really want to increase instruction count,
so we need to both lie that "IsNegation" and have an one-use check
on the outermost LHS value.
2020-08-06 23:40:16 +03:00
Roman Lebedev 0c1c756a31
[InstCombine] Generalize %x * (-1<<C) --> (-%x) * (1<<C) fold
Multiplication is commutative, and either of operands can be negative,
so if the RHS is a negated power-of-two, we should try to make it
true power-of-two (which will allow us to turn it into a left-shift),
by trying to sink the negation down into LHS op.

But, we shouldn't re-invent the logic for sinking negation,
let's just use Negator for that.

Tests and original patch by: Simon Pilgrim @RKSimon!

Differential Revision: https://reviews.llvm.org/D85446
2020-08-06 23:39:53 +03:00
Roman Lebedev 7ce76b06ec
[InstCombine] Fold sdiv exact X, -1<<C --> -(ashr exact X, C)
While that does increases instruction count,
shift is obviously better than a division.

Name: base
Pre: (1<<C1) >= 0
%o0 = shl i8 1, C1
%r = sdiv exact i8 C0, %o0
  =>
%r = ashr exact i8 C0, C1

Name: neg
%o0 = shl i8 -1, C1
%r = sdiv exact i8 C0, %o0
  =>
%t0 = ashr exact i8 C0, C1
%r = sub i8 0, %t0

Name: reverse
Pre: C1 != 0 && C1 u< 8
%t0 = ashr exact i8 C0, C1
%r = sub i8 0, %t0
  =>
%o0 = shl i8 -1, C1
%r = sdiv exact i8 C0, %o0

https://rise4fun.com/Alive/MRplf
2020-08-06 23:37:16 +03:00
Roman Lebedev 442cb88f53
[InstCombine] Generalize sdiv exact X, 1<<C --> ashr exact X, C fold to handle non-splat vectors 2020-08-06 23:37:15 +03:00
Sebastian Neubauer 2a6c871596 [InstCombine] Move target-specific inst combining
For a long time, the InstCombine pass handled target specific
intrinsics. Having target specific code in general passes was noted as
an area for improvement for a long time.

D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).

This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic

A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.

This allows to move about 3000 lines out from InstCombine to the targets.

Differential Revision: https://reviews.llvm.org/D81728
2020-07-22 15:59:49 +02:00
Roman Lebedev 0fdcca07ad
[InstCombine] Fold X sdiv (-1 << C) -> -(X u>> Y) iff X is non-negative
This is the one i'm seeing as missed optimization,
although there are likely other possibilities, as usual.

There are 4 variants of a general sdiv->udiv fold:
https://rise4fun.com/Alive/VS6

Name: v0
Pre: C0 >= 0 && C1 >= 0
%r = sdiv i8 C0, C1
  =>
%r = udiv i8 C0, C1

Name: v1
Pre: C0 <= 0 && C1 >= 0
%r = sdiv i8 C0, C1
  =>
%t0 = udiv i8 -C0, C1
%r = sub i8 0, %t0

Name: v2
Pre: C0 >= 0 && C1 <= 0
%r = sdiv i8 C0, C1
  =>
%t0 = udiv i8 C0, -C1
%r = sub i8 0, %t0

Name: v3
Pre: C0 <= 0 && C1 <= 0
%r = sdiv i8 C0, C1
  =>
%r = udiv i8 -C0, -C1


If we really don't like sdiv (more than udiv that is),
and are okay with increasing instruction count (2 new negations),
and we ensure that we don't undo the fold,
then we could just implement these..
2020-07-17 22:50:09 +03:00
Sanjay Patel 4458973347 [InstCombine] fold mul of zext/sext bools to 'and'
Similar to rG40fcc42:
The base case only worked because we were relying on a
poison-unsafe select transform; if that is fixed, we
would regress on patterns like this.

The extra use tests show that the select transform can't
be applied consistently. So it may be a regression to have
an extra instruction on 1 test, but that result was not
created safely and does not happen reliably.
2020-07-12 15:56:26 -04:00