Commit Graph

694 Commits

Author SHA1 Message Date
Benjamin Kramer b639091c02 Change users of CreateShuffleVector to pass the masks as int instead of Constants
No functionality change intended.
2020-04-17 16:34:29 +02:00
Johannes Doerfert ea7f17ee38 [InstCombine] Simplify calls with casted `returned` attribute
The handling of the `returned` attribute in D75815 did miss the case
where the argument is (bit)casted to a different type. This is
explicitly allowed by the language reference and exposed by the
Attributor.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D77977
2020-04-16 00:56:00 -05:00
Benjamin Kramer 6f64daca8f Upgrade calls to CreateShuffleVector to use the preferred form of passing an array of ints
No functionality change intended.
2020-04-15 12:51:38 +02:00
Craig Topper f3d3cec648 [InstCombine] Avoid a call to deprecated version of CreateCall.
Passing a Value * to CreateCall has to call getPointerElementType
to find the type of the pointer.

In this case we can rely on the fact that Intrinsic::getDeclaration
returns a Function * and use that version of CreateCall.
2020-04-08 17:41:16 -07:00
Christopher Tetreault 155740cc33 Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: sdesmalen, rriddle, efriedma

Reviewed By: sdesmalen

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77263
2020-04-08 15:15:41 -07:00
Matt Arsenault 57a55313c3 InstCombine: Reduce minnum/maxnum if inputs are casted 2020-04-03 11:57:25 -04:00
Tyker c00cb76274 [NFC] Split Knowledge retention and place it more appropriatly
Summary:
Splitting Knowledge retention into Queries in Analysis and Builder into Transform/Utils
allows Queries and Transform/Utils to use Analysis.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77171
2020-04-02 15:01:41 +02:00
Uday Bondhugula dc817b2dea [InstCombine] Deduce attributes for aligned_alloc in InstCombine
Make InstCombine aware of the aligned_alloc library function.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Depends on D76970.

Differential Revision: https://reviews.llvm.org/D76971
2020-03-31 23:17:28 +05:30
Sebastian Neubauer 5d3a69feca [AMDGPU] New llvm.amdgcn.ballot intrinsic
Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function
in GLSL and other shader languages. It returns a bitfield containing the
result of its boolean argument in all active lanes, and zero in all
inactive lanes.

This is intended to replace the existing llvm.amdgcn.icmp and
llvm.amdgcn.fcmp intrinsics after a suitable transition period.

Use the new intrinsic in the atomic optimizer pass.

Differential Revision: https://reviews.llvm.org/D65088
2020-03-31 10:35:39 +02:00
Nikita Popov 53d209076a [InstCombine] Use replaceOperand() in demanded elements simplification
To make sure that dead operands get DCEd. This fixes the largest
source of leftover dead operands we see in tests.

NFC apart from worklist changes.
2020-03-29 20:43:19 +02:00
Nikita Popov 28f67bd5c5 [InstCombine] Fix worklist management in varargs transform
Add a replaceUse() helper to mirror replaceOperand() for the
rare cases where we're working directly on uses.

NFC apart from worklist order changes.
2020-03-29 18:04:12 +02:00
Sanjay Patel a1fe6beb1e [InstCombine] remove one-use check for ctpop -> cttz
Two one-use checks were added with rGfdcb27105537,
but only the first one is necessary to limit an
increase in instruction count. The second transform
only creates one instruction, so it is always a
reasonable canonicalization/optimization.
2020-03-23 13:59:57 -04:00
Simon Pilgrim fdcb271055 [InstCombine] Limit CTPOP -> CTTZ simplifications to one use
Tweak D76568 so we only combine if it will remove the bit-twiddling.

Suggested by @spatel
2020-03-23 14:33:41 +00:00
Simon Pilgrim 72d1419bfb [InstCombine] Add CTPOP -> CTTZ simplifications (PR43513)
As detailed on PR43513, we can simplify:

ctpop(x | -x) -> bitwidth - cttz(x, false)
Alive2: http://volta.cs.utah.edu:8080/z/caw49X

ctpop(~x & (x - 1)) -> cttz(x, false)
Alive2: http://volta.cs.utah.edu:8080/z/5zfVrx

I've tweaked the initial test cases I added at rG2d712fb75584 to increase commutativity testing.

Differential Revision: https://reviews.llvm.org/D76568
2020-03-23 11:04:33 +00:00
Simon Pilgrim f00a4b531a [InstCombine][X86] simplifyX86immShift - remove ConstantAggregateZero handling. NFC.
The llvm::computeKnownBits path now handles this.
2020-03-21 11:30:44 +00:00
Simon Pilgrim 34659de5fd [InstCombine][X86] simplifyX86immShift - convert variable in-range vector shift by scalar amounts to generic shifts (PR40391)
The sll/srl/sra scalar vector shifts can be replaced with generic shifts if the shift amount is known to be in range.

This also required public DemandedElts variants of llvm::computeKnownBits to be exposed (PR36319).
2020-03-20 15:48:06 +00:00
Nikita Popov 0372768776 [InstCombine] Simplify calls with "returned" attribute
If a call argument has the "returned" attribute, we can simplify
the call to the value of that argument. This was already partially
handled by InstSimplify/InstCombine for the case where the argument
is an integer constant, and the result is thus known via known bits.
The non-constant (or non-int) argument cases weren't handled though.

This previously landed as an InstSimplify transform, but was reverted
due to assertion failures when compiling the Linux kernel. The reason
is that simplifying a call to another call breaks assumptions in
call graph updating during inlining. As the code is not easy to fix,
and there is no particularly strong motivation for having this in
InstSimplify, the transform is only performed in InstCombine instead.

Differential Revision: https://reviews.llvm.org/D75815
2020-03-20 10:23:39 +01:00
Simon Pilgrim a11e5b32df [InstCombine][X86] simplifyX86immShift - handle variable out-of-range vector shift by immediate amounts (PR40391)
If we know the SSE shift amount is out of range then we can simplify to zero value (logical) or a 'signsplat' bitwidth-1 shift (arithmetic). This allows us to remove the equivalent ConstantInt constant folding path from simplifyX86immShift.
2020-03-19 18:27:31 +00:00
Simon Pilgrim 433897da4a [InstCombine][X86] simplifyX86immShift - convert variable in-range vector shift by immediate amounts to generic shifts (PR40391)
The slli/srli/srai 'immediate' vector shifts (although its not immediate anymore to match gcc) can be replaced with generic shifts if the shift amount is known to be in range.
2020-03-19 15:44:24 +00:00
Simon Pilgrim f4e495a18e [InstCombine][X86] simplifyX86varShift - convert variable in-range per-element shift amounts to generic shifts (PR40391)
AVX2/AVX512 per-element shifts can be replaced with generic shifts if the shift amounts are guaranteed to be in-range (upper bits are known zero).
2020-03-18 11:26:54 +00:00
Tyker e8ac825f5b [AssumeBundles] Detection of Empty bundles
Summary: Prevent InstCombine from removing llvm.assume for which the arguement is true when they have operand bundles with usefull information.

Reviewers: jdoerfert, nikic, lebedev.ri

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76147
2020-03-17 15:50:15 +01:00
Serguei Katkov 80c351cdb6 [InstCombine] Transform to undef incorrect atomic unordered mem intrinsics
According to LangRef:
If len is not a positive integer multiple of element_size, then the behaviour of the intrinsic is undefined.

Add InstCombine rule to transform intrinsic to undef operation.

This is a follow-up for D76116.

Reviewers: reames
Reviewed By: reames
Subscribers: hiraditya, jfb, dantrushin, llvm-commits
Differential Revision: https://reviews.llvm.org/D76215
2020-03-17 10:20:16 +07:00
Nikita Popov c3ca6876ed [InstCombine] Don't simplify calls without uses
When simplifying a call without uses, replaceInstUsesWith() is
going to do nothing, but we'll skip all following folds. We can
only run into this problem with calls that both simplify and are
not trivially dead if unused, which currently seems to happen only
with calls to undef, as the test diff shows. When extending
SimplifyCall() to handle "returned" attributes, this becomes a much
bigger problem, so I'm fixing this first.

Differential Revision: https://reviews.llvm.org/D75814
2020-03-09 18:47:46 +01:00
Simon Pilgrim 7e9747b50b [X86][F16C] Remove cvtph2ps intrinsics and use generic half2float conversion (PR37554)
This removes everything but int_x86_avx512_mask_vcvtph2ps_512 which provides the SAE variant, but even this can use the fpext generic if the rounding control is the default.

Differential Revision: https://reviews.llvm.org/D75162
2020-02-29 18:57:35 +00:00
Simon Moll ddd11273d9 Remove BinaryOperator::CreateFNeg
Use UnaryOperator::CreateFNeg instead.

Summary:
With the introduction of the native fneg instruction, the
fsub -0.0, %x idiom is obsolete. This patch makes LLVM
emit fneg instead of the idiom in all places.

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D75130
2020-02-27 09:06:03 -08:00
Nikita Popov 56f7de5baa [InstCombine] Remove trivially empty ranges from end
InstCombine removes pairs of start+end intrinsics that don't
have anything in between them. Currently this is done by starting
at the start intrinsic and scanning forwards. This patch changes
it to start at the end intrinsic and scan backwards.

The motivation here is as follows: When we process the start
intrinsic, we have not yet looked at the following instructions,
which may still get folded/removed. If they do, we will only be
able to remove the start/end pair on the next iteration. When we
process the end intrinsic, all the instructions before it have
already been visited, and we don't run into this problem.

Differential Revision: https://reviews.llvm.org/D75011
2020-02-26 20:04:11 +01:00
Krzysztof Parzyszek d2b7c09e79 [Hexagon] Simplify intrinsic (vandvrt (vandqrt q b) m) -> q if possible
When each byte in b&m is non-zero, this conversion Q->V->Q is a no-op.
2020-02-21 13:56:04 -06:00
Nikita Popov 656dff9af4 [InstCombine] Use replaceOperand() in more places
Followup to D73919 with another batch of replacements of
setOperand() -> replaceOperand(), to make sure the old
operand gets DCEd right away.

Differential Revision: https://reviews.llvm.org/D74932
2020-02-21 18:41:16 +01:00
Nikita Popov a8db806d52 [SimplifyLibCalls][IRBuilder] Accept any IRBuilder in SimplifyLibCalls
This changes the SimplifyLibCalls utility to accept an IRBuilderBase,
which allows us to pass through the IRBuilder used by InstCombine.
This will ensure that new instructions get added to the worklist.
The annotated test-case drops from 4 to 2 InstCombine iterations thanks
to this.

To achieve this, I'm adding an IRBuilderBase::OperandBundlesGuard,
which is basically the same as the existing InsertPointGuard and
FastMathFlagsGuard, but for operand bundles. Also add a
setDefaultOperandBundles() method so these can be set outside the
constructor.

Differential Revision: https://reviews.llvm.org/D74792
2020-02-21 18:26:05 +01:00
Matt Arsenault a3c814d234 Separately track input and output denormal mode
AMDGPU and x86 at least both have separate controls for whether
denormal results are flushed on output, and for whether denormals are
implicitly treated as 0 as an input. The current DAGCombiner use only
really cares about the input treatment of denormals.
2020-02-04 12:59:21 -05:00
Nikita Popov 878cb38a5c [InstCombine] Add replaceOperand() helper
Adds a replaceOperand() helper, which is like Instruction.setOperand()
but adds the old operand to the worklist. This reduces the amount of
missing or incorrect worklist management.

This only applies the helper to a relatively small subset of
setOperand() calls in InstCombine, namely those of the pattern
`I.setOperand(); return &I;`, where it is most obviously applicable.

Differential Revision: https://reviews.llvm.org/D73803
2020-02-03 19:00:17 +01:00
Nikita Popov e6c9ab4fb7 [InstCombine] Rename worklist methods; NFC
This renames Worklist.AddDeferred() to Worklist.add() and
Worklist.Add() to Worklist.push(). The intention here is that
Worklist.add() should be the go-to method for explicit worklist
management, while the raw Worklist.push() is mostly for
InstCombine internals. I will then migrate uses of Worklist.push()
to Worklist.add() in followup changes.

As suggested by spatel on D73411 I'm also changing the remaining
method names to lowercase first character, in line with current
coding standards.

Differential Revision: https://reviews.llvm.org/D73745
2020-02-03 18:56:51 +01:00
Guillaume Chatelet 805c157e8a [Alignment][NFC] Deprecate Align::None()
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case.

Reviewers: xbolva00, courbet, bollu

Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73099
2020-01-24 12:53:58 +01:00
Guillaume Chatelet 279fa8e006 [Alignement][NFC] Deprecate untyped CreateAlignedLoad
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73260
2020-01-23 13:34:32 +01:00
Guillaume Chatelet 0957233320 [Alignment][NFC] Use Align with CreateMaskedStore
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73106
2020-01-22 11:04:39 +01:00
Guillaume Chatelet bc8a1ab26f [Alignment][NFC] Use Align with CreateMaskedLoad
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73087
2020-01-21 14:13:22 +01:00
Matt Arsenault a4451d88ee Consolidate internal denormal flushing controls
Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.

- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute

AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).

Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.

Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.

-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fp-denormal-math-f32=ieee or preserve-sign rather than the old
attributes.

Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.

This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.

AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attribute are now emitted. A future change will
switch this and remove the subtarget features.
2020-01-17 20:09:53 -05:00
Matt Arsenault 3ef8cdf666 AMDGPU: Do permlane16 vdst_in discard optimization in InstCombine
There's more potential value to discarding the source value earlier,
since we always know the value of the fi/bc bits.
2020-01-16 17:27:53 -05:00
Nikita Popov 04e586151e [InstCombine] Fix worklist management when removing guard intrinsic
When multiple guard intrinsics are merged into one, currently the
result of eraseInstFromFunction() is returned -- however, this
should only be done if the current instruction is being removed.
In this case we're removing a different instruction and should
instead report that the current one has been modified by returning it.

For this test case, this reduces the number of instcombine iterations
from 5 to 2 (the minimum possible).

Differential Revision: https://reviews.llvm.org/D72558
2020-01-14 21:47:48 +01:00
Craig Topper 374e0299cf [X86][InstCombine] Add constant folding and simplification support for pdep and pext
The instructions use a mask to either pack disjoint bits together(pext) or spread bits to disjoint locations(pdep). If the mask is all 0s then no bits are extracted or deposited. If the mask is all ones, then the source value is written to the result since no compression or expansion happens. Otherwise if both the source and mask are constant we can walk the bits in the source/mask and calculate the result.

There other crazier things we could do like computeKnownBits or turning pext into shift/and if only a single contiguous range of bits is extracted.

Fixes PR44389

Differential Revision: https://reviews.llvm.org/D71952
2019-12-31 15:06:47 -08:00
Sanjay Patel 987eb8e26c [InstCombine] propagate sign argument through nested copysigns
This is another optimization suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
2019-12-30 11:06:02 -05:00
Fangrui Song 7a7334663c Delete llvm.{sig,}{setjmp,longjmp} remnant after r136821
Intrinsic has incorrect argument type!
  i32 (i32*)* @llvm.setjmp

*wipes tear*
2019-12-27 00:00:14 -08:00
Sanjay Patel 9cdcd81d3f [InstCombine] enhance fold for copysign with known sign arg
This is another optimization suggested in PRPR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
2019-12-22 10:07:01 -05:00
Reid Kleckner 5d986953c8 [IR] Split out target specific intrinsic enums into separate headers
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
  Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
  object file size.
- Incremental step towards decoupling target intrinsics.

The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.

Part of PR34259

Reviewers: efriedma, echristo, MaskRay

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D71320
2019-12-11 18:02:14 -08:00
Ehud Katz 2b6b8cb10c [APFloat] Prevent construction of APFloat with Semantics and FP value
Constructor invocations such as `APFloat(APFloat::IEEEdouble(), 0.0)`
may seem like they accept a FP (floating point) value, but the overload
they reach is actually the `integerPart` one, not a `float` or `double`
overload (which only exists when `fltSemantics` isn't passed).

This may lead to possible loss of data, by the conversion from `float`
or `double` to `integerPart`.

To prevent future mistakes, a new constructor overload, which accepts
any FP value and marked with `delete`, to prevent its usage.

Fixes PR34095.

Differential Revision: https://reviews.llvm.org/D70425
2019-12-04 12:02:04 +02:00
Simon Tatham 01aefae4a1 [ARM,MVE] Add an InstCombine rule permitting VPNOT.
Summary:
If a user writing C code using the ACLE MVE intrinsics generates a
predicate and then complements it, then the resulting IR will use the
`pred_v2i` IR intrinsic to turn some `<n x i1>` vector into a 16-bit
integer; complement that integer; and convert back. This will generate
machine code that moves the predicate out of the `P0` register,
complements it in an integer GPR, and moves it back in again.

This InstCombine rule replaces `i2v(~v2i(x))` with a direct complement
of the original predicate vector, which we can already instruction-
select as the VPNOT instruction which complements P0 in place.

Reviewers: ostannard, MarkMurrayARM, dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70484
2019-12-02 16:20:30 +00:00
Sanjay Patel af0babc90a [InstCombine] fold copysign with constant sign argument to (fneg+)fabs
If the sign of the sign argument is known (this could be extended to use ValueTracking),
then we can use fneg+fabs to clear/set the sign bit of the magnitude argument.
http://llvm.org/docs/LangRef.html#llvm-copysign-intrinsic

This transform is already done in DAGCombiner, but we can do it sooner in IR as
suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153

We have effectively no analysis for copysign in IR, so we are taking the unusual step
of increasing the number of IR instructions for the negative constant case.

Differential Revision: https://reviews.llvm.org/D70792
2019-12-02 09:23:12 -05:00
Davide Italiano c32f0ff92f [InstCombine] Fix call guard difference with dbg
Patch by Chris Ye!

Differential Revision: https://reviews.llvm.org/D68004
2019-11-22 13:35:53 -08:00
Simon Tatham f4f77aa53e [ARM,MVE] Add InstCombine rules for pred_i2v / pred_v2i.
If you're writing C code using the ACLE MVE intrinsics that passes the
result of a vcmp as input to a predicated intrinsic, e.g.

  mve_pred16_t pred = vcmpeqq(v1, v2);
  v_out = vaddq_m(v_inactive, v3, v4, pred);

then clang's codegen for the compare intrinsic will create calls to
`@llvm.arm.mve.pred.v2i` to convert the output of `icmp` into an
`mve_pred16_t` integer representation, and then the next intrinsic
will call `@llvm.arm.mve.pred.i2v` to convert it straight back again.
This will be visible in the generated code as a `vmrs`/`vmsr` pair
that move the predicate value pointlessly out of `p0` and back into it again.

To prevent that, I've added InstCombine rules to remove round trips of
the form `v2i(i2v(x))` and `i2v(v2i(x))`. Also I've taught InstCombine
about the known and demanded bits of those intrinsics. As a result,
you now get just the generated code you wanted:

  vpt.u16 eq, q1, q2
  vaddt.u16 q0, q3, q4

Reviewers: ostannard, MarkMurrayARM, dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70313
2019-11-18 10:39:30 +00:00
Benjamin Kramer 6f0bb77037 [InstCombine] Fold one-use variable into assert
Avoids warnings in Release builds. NFC.
2019-10-24 17:57:24 +02:00