Commit Graph

634 Commits

Author SHA1 Message Date
David Bolvansky be2487a2ba [InstCombine] Annotate strdup with deref_or_null
llvm-svn: 372098
2019-09-17 10:12:48 +00:00
Florian Hahn 51de22c8ee Revert [InstCombine] Use SimplifyFMulInst to simplify multiply in fma.
This introduces additional rounding error in some cases. See D67434.

This reverts r371518 (git commit 18a1f0818b)

llvm-svn: 371634
2019-09-11 16:17:03 +00:00
David Bolvansky 4dae283cd3 [InstCombine] Fixed handling of isOpNewLike (PR11748)
llvm-svn: 371602
2019-09-11 10:37:03 +00:00
Florian Hahn 18a1f0818b [InstCombine] Use SimplifyFMulInst to simplify multiply in fma.
This allows us to fold fma's that multiply with 0.0. Also, the
multiply by 1.0 case is handled there as well. The fneg/fabs cases
are not handled by SimplifyFMulInst, so we need to keep them.

Reviewers: spatel, anemet, lebedev.ri

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D67351

llvm-svn: 371518
2019-09-10 13:10:28 +00:00
David Bolvansky af118bb6d0 [NFC] Added a comment to avoid possible confusion
llvm-svn: 370217
2019-08-28 15:04:48 +00:00
David Bolvansky 05bda8b4e5 Annotate return values of allocation functions with dereferenceable_or_null
Summary:
Example
define dso_local noalias i8* @_Z6maixxnv() local_unnamed_addr #0 {
entry:
  %call = tail call noalias dereferenceable_or_null(64) i8* @malloc(i64 64) #6
  ret i8* %call
}


Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: aaron.ballman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66651

llvm-svn: 370168
2019-08-28 08:28:20 +00:00
Vitaly Buka aeca56964f msan, codegen, instcombine: Keep more lifetime markers used for msan
Reviewers: eugenis

Subscribers: hiraditya, cfe-commits, #sanitizers, llvm-commits

Tags: #clang, #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D66695

llvm-svn: 369979
2019-08-26 22:15:50 +00:00
Tim Northover 60afa49abe OpaquePtr: add Type parameter to Loads analysis API.
This makes the functions in Loads.h require a type to be specified
independently of the pointer Value so that when pointers have no structure
other than address-space, it can still do its job.

Most callers had an obvious memory operation handy to provide this type, but a
SROA and ArgumentPromotion were doing more complicated analysis. They get
updated to merge the properties of the various instructions they were
considering.

llvm-svn: 365468
2019-07-09 11:35:35 +00:00
Sanjay Patel 89efefb170 [InstCombine] reduce funnel-shift i16 X, X, 8 to bswap X
Prefer the more exact intrinsic to remove a use of the input value
and possibly make further transforms easier (we will still need
to match patterns with funnel-shift of wider types as pieces of
bswap, especially if we want to canonicalize to funnel-shift with
constant shift amount). Discussed in D46760.

llvm-svn: 364187
2019-06-24 15:20:49 +00:00
David Bolvansky 4b28478389 [InstCombine] cttz(abs(x)) -> cttz(x)
Summary: Signedness does not change number of trailing zeros.

Reviewers: spatel, lebedev.ri, nikic

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D63546

llvm-svn: 364064
2019-06-21 15:26:22 +00:00
David Bolvansky 01511192b2 [InstCombine] cttz(-x) -> cttz(x)
Summary: Signedness does not change number of trailing zeros.

Reviewers: spatel, lebedev.ri, nikic

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63534

llvm-svn: 363951
2019-06-20 17:04:14 +00:00
Matt Arsenault 6d741f29ec AMDGPU: Fold readlane/readfirstlane calls
llvm-svn: 363587
2019-06-17 17:52:35 +00:00
Matt Arsenault 492d71cc99 AMDGPU: Fold readlane intrinsics of constants
I'm not 100% sure about this, since I'm worried about IR transforms
that might end up introducing divergence downstream once replaced with
a constant, but I haven't come up with an example yet.

llvm-svn: 363406
2019-06-14 14:51:26 +00:00
Stanislav Mekhanoshin 68a2fef9ae [AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32
Differential Revision: https://reviews.llvm.org/D63301

llvm-svn: 363339
2019-06-13 23:47:36 +00:00
Tim Northover 8d7f118ab2 InstCombine: correctly change byval type attribute alongside call args.
When the byval attribute has a type, it must match the pointee type of
any parameter; but InstCombine was not updating the attribute when
folding casts of various kinds away.

llvm-svn: 362643
2019-06-05 20:38:17 +00:00
Nikita Popov 5382803b04 [InstCombine] Optimize always overflowing signed saturating add/sub
Based on the overflow direction information added in D62463, we can
now fold always overflowing signed saturating add/sub to signed min/max.

Differential Revision: https://reviews.llvm.org/D62544

llvm-svn: 362006
2019-05-29 18:37:13 +00:00
Nikita Popov c51cdacab9 [InstCombine] Clean up saturing math overflow optimizations; NFC
Reduce duplication and make it easier to handle signed
always-overflows conditions in the future.

llvm-svn: 361863
2019-05-28 18:59:21 +00:00
Nikita Popov 332c100562 [ValueTracking][ConstantRange] Distinguish low/high always overflow
In order to fold an always overflowing signed saturating add/sub,
we need to know in which direction the always overflow occurs.
This patch splits up AlwaysOverflows into AlwaysOverflowsLow and
AlwaysOverflowsHigh to pass through this information (but it is
not used yet).

Differential Revision: https://reviews.llvm.org/D62463

llvm-svn: 361858
2019-05-28 18:08:31 +00:00
Nikita Popov 352f598795 [InstCombine] Remove OverflowCheckFlavor; NFC
Instead pass binary op and signedness. The extra enum only makes
things more complicated in this case.

llvm-svn: 361720
2019-05-26 11:43:31 +00:00
Craig Topper 9816d55776 [X86][InstCombine] Remove InstCombine code that turns X86 round intrinsics into llvm.ceil/floor. Remove some isel patterns that existed because that was happening.
We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into
llvm.floor/ceil.  The llvm.ceil/floor intrinsics are supposed to correspond
to the libm functions.  For the libm functions we need to disable the
precision exception so the llvm.floor/ceil functions should always map to
encodings 0x9 and 0xA.

We had a mix of isel patterns where some used 0x9 and 0xA and others used
0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA.

Since we have no way in isel of knowing where the llvm.ceil/floor came
from, we can't map X86 specific intrinsics with encodings 1 or 2 to it.
We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like
to see a use case and optimization advantage first.

I've left the backend test cases to show the blend we now emit without
the extra isel patterns. But I've removed the InstCombine tests completely.

llvm-svn: 361425
2019-05-22 20:04:55 +00:00
Sanjay Patel 62f457b137 [InstCombine] reduce code duplication; NFCI
llvm-svn: 360051
2019-05-06 15:35:02 +00:00
Simon Pilgrim 48a3b54572 [InstCombine][X86] Tweak generic expansion of PACKSS/PACKUS to shuffle then truncate. NFCI.
This has no effect on constant folding but will be useful when we expand non-saturating PACKSS/PACKUS intrinsics.

llvm-svn: 359191
2019-04-25 13:51:57 +00:00
Simon Pilgrim 4b7d3c4831 Fix include order. NFCI.
llvm-svn: 359177
2019-04-25 09:49:37 +00:00
Philip Reames 88cd69b56f Consolidate existing utilities for interpreting vector predicate maskes [NFC]
llvm-svn: 359163
2019-04-25 02:30:17 +00:00
Philip Reames 7c8647b26f [InstCombine] Be consistent w/handling of masked intrinsics style wise [NFC]
llvm-svn: 359160
2019-04-25 01:18:56 +00:00
Simon Pilgrim 55f14dac74 [InstCombine][X86] Use generic expansion of PACKSS/PACKUS for constant folding. NFCI.
This patch rewrites the existing PACKSS/PACKUS constant folding code to expand as a generic expansion.

This is a first NFCI step toward expanding PACKSS/PACKUS intrinsics which are acting as non-saturating truncations (although technically the expansion could be used in all cases - but we'll probably want to be conservative).

llvm-svn: 359111
2019-04-24 16:53:17 +00:00
Philip Reames 2ce017026a [InstCombine] Convert a masked.load of a dereferenceable address to an unconditional load
If we have a masked.load from a location we know to be dereferenceable, we can simply issue a speculative unconditional load against that address. The key advantage is that it produces IR which is well understood by the optimizer. The select (cnd, load, passthrough) form produced should be pattern matchable back to hardware predication if profitable.

Differential Revision: https://reviews.llvm.org/D59703

llvm-svn: 359000
2019-04-23 15:25:14 +00:00
Philip Reames d748689c7f [InstCombine] Eliminate stores to constant memory
If we have a store to a piece of memory which is known constant, then we know the store must be storing back the same value. As a result, the store (or memset, or memmove) must either be down a dead path, or a noop. In either case, it is valid to simply remove the store.

The motivating case for this involves a memmove to a buffer which is constant down a path which is dynamically dead.

Note that I'm choosing to implement the less aggressive of two possible semantics here. We could simply say that the store *is undefined*, and prune the path. Consensus in the review was that the more aggressive form might be a good follow on change at a later date.

Differential Revision: https://reviews.llvm.org/D60659

llvm-svn: 358919
2019-04-22 20:28:19 +00:00
Philip Reames d8d9b7b20e [InstSimplify] Move masked.gather w/no active lanes handling to InstSimplify from InstCombine
In the process, use the existing masked.load combine which is slightly stronger, and handles a mix of zero and undef elements in the mask.  

llvm-svn: 358913
2019-04-22 19:30:01 +00:00
Philip Reames 88679717ce [InstCombine] Factor out unreachable inst idiom creation [NFC]
In InstCombine, we use an idiom of "store i1 true, i1 undef" to indicate we've found a path which we've proven unreachable.  We can't actually insert the unreachable instruction since that would require changing the CFG.  We leave that to simplifycfg later.

This just factors out that idiom creation so we don't duplicate the same mostly undocument idiom creation in multiple places.

llvm-svn: 358600
2019-04-17 17:37:58 +00:00
Nikita Popov 5ecd6a48b9 [InstCombine] Prune fshl/fshr with masked operands
If a constant shift amount is used, then only some of the LHS/RHS
operand bits are demanded and we may be able to simplify based on
that. InstCombineSimplifyDemanded already had the necessary support
for that, we just weren't calling it with fshl/fshr as root.

In particular, this allows us to relax some masked funnel shifts
into simple shifts, as shown in the tests.

Patch by Shawn Landden.

Differential Revision: https://reviews.llvm.org/D60660

llvm-svn: 358515
2019-04-16 19:05:49 +00:00
Hiroshi Yamauchi 09e539fcae [PGO] Profile guided code size optimization.
Summary:
Enable some of the existing size optimizations for cold code under PGO.

A ~5% code size saving in big internal app under PGO.

The way it gets BFI/PSI is discussed in the RFC thread

http://lists.llvm.org/pipermail/llvm-dev/2019-March/130894.html 

Note it doesn't currently touch loop passes.

Reviewers: davidxl, eraman

Reviewed By: eraman

Subscribers: mgorny, javed.absar, smeenai, mehdi_amini, eraman, zzheng, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59514

llvm-svn: 358422
2019-04-15 16:49:00 +00:00
Nikita Popov 7a543c3758 [InstCombine] ssubo X, C -> saddo X, -C
ssubo X, C is equivalent to saddo X, -C. Make the transformation in
InstCombine and allow the logic implemented for saddo to fold prior
usages of add nsw or sub nsw with constants.

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D60061

llvm-svn: 358099
2019-04-10 16:27:36 +00:00
Simon Pilgrim b4f1bfa659 [InstCombine][X86] Expand MOVMSK to generic IR (PR39927)
First step towards removing the MOVMSK intrinsics completely - this patch expands MOVMSK to the pattern:

e.g. PMOVMSKB(v16i8 x):
%cmp = icmp slt <16 x i8> %x, zeroinitializer
%int = bitcast <16 x i8> %cmp to i16
%res = zext i16 %int to i32

Which is correctly handled by ISel and FastIsel (give or take an annoying movzx move....): https://godbolt.org/z/rkrSFW

Differential Revision: https://reviews.llvm.org/D60256

llvm-svn: 357909
2019-04-08 13:17:51 +00:00
David Bolvansky 937720e75b [InstCombine] Simplify ctpop with bitreverse/bswap
Summary: Fixes PR41337

Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60148

llvm-svn: 357564
2019-04-03 08:08:44 +00:00
David Bolvansky 5ba60b22a4 [InstCombine] Simplify ctlz/cttz with bitreverse
Summary: Fixes PR41273

Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60096

llvm-svn: 357521
2019-04-02 20:13:28 +00:00
Philip Reames 60212be619 [instcombine] Add some todos, and arrange code for readibility
llvm-svn: 356642
2019-03-21 03:23:40 +00:00
Philip Reames e4588bbf80 Simplify operands of masked stores and scatters based on demanded elements
If we know we're not storing a lane, we don't need to compute the lane. This could be improved by using the undef element result to further prune the mask, but I want to separate that into its own change since it's relatively likely to expose other problems.

Differential Revision: https://reviews.llvm.org/D57247

llvm-svn: 356590
2019-03-20 18:44:58 +00:00
Nikita Popov 37cf25c3c6 [InstCombine] Fold add nuw + uadd.with.overflow
Fold add nuw and uadd.with.overflow with constants if the
addition does not overflow.

Part of https://bugs.llvm.org/show_bug.cgi?id=38146.

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D59471

llvm-svn: 356584
2019-03-20 18:00:27 +00:00
Philip Reames 484d07c828 [instcombine] Add todos describing missing transforms for masked.* intrinsics
llvm-svn: 356536
2019-03-20 03:36:05 +00:00
Sanjay Patel 6063393536 [InstCombine] allow general vector constants for funnel shift to shift transforms
Follow-up to:
rL356338
rL356369

We can calculate an arbitrary vector constant minus the bitwidth, so there's
no need to limit this transform to scalars and splats.

llvm-svn: 356372
2019-03-18 14:27:51 +00:00
Sanjay Patel 84de8a30a0 [InstCombine] extend rotate-left-by-constant canonicalization to funnel shift
Follow-up to:
rL356338

Rotates are a special case of funnel shift where the 2 input operands
are the same value, but that does not need to be a restriction for the
canonicalization when the shift amount is a constant.

llvm-svn: 356369
2019-03-18 14:10:11 +00:00
Sanjay Patel b3bcd95771 [InstCombine] canonicalize rotate right by constant to rotate left
This was noted as a backend problem:
https://bugs.llvm.org/show_bug.cgi?id=41057
...and subsequently fixed for x86:
rL356121
But we should canonicalize these in IR for the benefit of all targets
and improve IR analysis such as CSE.

llvm-svn: 356338
2019-03-17 19:08:00 +00:00
Philip Reames 68a2e4d48b [SimplifyDemandedVec] Strengthen handling all undef lanes (particularly GEPs)
A change of two parts:
1) A generic enhancement for all callers of SDVE to exploit the fact that if all lanes are undef, the result is undef.
2) A GEP specific piece to strengthen/fix the vector index undef element handling, and call into the generic infrastructure when visiting the GEP.

The result is that we replace a vector gep with at least one undef in each lane with a undef.  We can also do the same for vector intrinsics.  Once the masked.load patch (D57372) has landed, I'll update to include call tests as well.

Differential Revision: https://reviews.llvm.org/D57468

llvm-svn: 356293
2019-03-15 19:54:06 +00:00
Sanjay Patel de1d5d3675 [InstCombine] canonicalize funnel shift constant shift amount to be modulo bitwidth
The shift argument is defined to be modulo the bitwidth, so if that argument
is a constant, we can always reduce the constant to its minimal form to allow
better CSE and other follow-on transforms.

We need to be careful to ignore constant expressions here, or we will likely
infinite loop. I'm adding a general vector constant query for that case.

Differential Revision: https://reviews.llvm.org/D59374

llvm-svn: 356192
2019-03-14 19:22:08 +00:00
Matt Arsenault caf1316f71 IR: Add immarg attribute
This indicates an intrinsic parameter is required to be a constant,
and should not be replaced with a non-constant value.

Add the attribute to all AMDGPU and generic intrinsics that comments
indicate it should apply to. I scanned other target intrinsics, but I
don't see any obvious comments indicating which arguments are intended
to be only immediates.

This breaks one questionable testcase for the autoupgrade. I'm unclear
on whether the autoupgrade is supposed to really handle declarations
which were never valid. The verifier fails because the attributes now
refer to a parameter past the end of the argument list.

llvm-svn: 355981
2019-03-12 21:02:54 +00:00
Nikita Popov 884feb1b69 [InstCombine] Fold add nsw + sadd.with.overflow
Fold `add nsw` and `sadd.with.overflow` with constants if the addition
does not overflow.

Part of https://bugs.llvm.org/show_bug.cgi?id=38146.

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D58881

llvm-svn: 355530
2019-03-06 18:30:00 +00:00
Craig Topper a97857b5b5 [InstCombine] Fix an unused variable warning.
llvm-svn: 353630
2019-02-10 02:21:29 +00:00
Craig Topper 784929d045 Implementation of asm-goto support in LLVM
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html

This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.

This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.

There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.

Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii

Differential Revision: https://reviews.llvm.org/D53765

llvm-svn: 353563
2019-02-08 20:48:56 +00:00
James Y Knight 291f791ef1 [opaque pointer types] Pass function type for CallBase::setCalledFunction.
Differential Revision: https://reviews.llvm.org/D57174

llvm-svn: 352914
2019-02-01 20:44:54 +00:00