Commit Graph

605 Commits

Author SHA1 Message Date
Philip Reames 88679717ce [InstCombine] Factor out unreachable inst idiom creation [NFC]
In InstCombine, we use an idiom of "store i1 true, i1 undef" to indicate we've found a path which we've proven unreachable.  We can't actually insert the unreachable instruction since that would require changing the CFG.  We leave that to simplifycfg later.

This just factors out that idiom creation so we don't duplicate the same mostly undocument idiom creation in multiple places.

llvm-svn: 358600
2019-04-17 17:37:58 +00:00
Nikita Popov 5ecd6a48b9 [InstCombine] Prune fshl/fshr with masked operands
If a constant shift amount is used, then only some of the LHS/RHS
operand bits are demanded and we may be able to simplify based on
that. InstCombineSimplifyDemanded already had the necessary support
for that, we just weren't calling it with fshl/fshr as root.

In particular, this allows us to relax some masked funnel shifts
into simple shifts, as shown in the tests.

Patch by Shawn Landden.

Differential Revision: https://reviews.llvm.org/D60660

llvm-svn: 358515
2019-04-16 19:05:49 +00:00
Hiroshi Yamauchi 09e539fcae [PGO] Profile guided code size optimization.
Summary:
Enable some of the existing size optimizations for cold code under PGO.

A ~5% code size saving in big internal app under PGO.

The way it gets BFI/PSI is discussed in the RFC thread

http://lists.llvm.org/pipermail/llvm-dev/2019-March/130894.html 

Note it doesn't currently touch loop passes.

Reviewers: davidxl, eraman

Reviewed By: eraman

Subscribers: mgorny, javed.absar, smeenai, mehdi_amini, eraman, zzheng, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59514

llvm-svn: 358422
2019-04-15 16:49:00 +00:00
Nikita Popov 7a543c3758 [InstCombine] ssubo X, C -> saddo X, -C
ssubo X, C is equivalent to saddo X, -C. Make the transformation in
InstCombine and allow the logic implemented for saddo to fold prior
usages of add nsw or sub nsw with constants.

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D60061

llvm-svn: 358099
2019-04-10 16:27:36 +00:00
Simon Pilgrim b4f1bfa659 [InstCombine][X86] Expand MOVMSK to generic IR (PR39927)
First step towards removing the MOVMSK intrinsics completely - this patch expands MOVMSK to the pattern:

e.g. PMOVMSKB(v16i8 x):
%cmp = icmp slt <16 x i8> %x, zeroinitializer
%int = bitcast <16 x i8> %cmp to i16
%res = zext i16 %int to i32

Which is correctly handled by ISel and FastIsel (give or take an annoying movzx move....): https://godbolt.org/z/rkrSFW

Differential Revision: https://reviews.llvm.org/D60256

llvm-svn: 357909
2019-04-08 13:17:51 +00:00
David Bolvansky 937720e75b [InstCombine] Simplify ctpop with bitreverse/bswap
Summary: Fixes PR41337

Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60148

llvm-svn: 357564
2019-04-03 08:08:44 +00:00
David Bolvansky 5ba60b22a4 [InstCombine] Simplify ctlz/cttz with bitreverse
Summary: Fixes PR41273

Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60096

llvm-svn: 357521
2019-04-02 20:13:28 +00:00
Philip Reames 60212be619 [instcombine] Add some todos, and arrange code for readibility
llvm-svn: 356642
2019-03-21 03:23:40 +00:00
Philip Reames e4588bbf80 Simplify operands of masked stores and scatters based on demanded elements
If we know we're not storing a lane, we don't need to compute the lane. This could be improved by using the undef element result to further prune the mask, but I want to separate that into its own change since it's relatively likely to expose other problems.

Differential Revision: https://reviews.llvm.org/D57247

llvm-svn: 356590
2019-03-20 18:44:58 +00:00
Nikita Popov 37cf25c3c6 [InstCombine] Fold add nuw + uadd.with.overflow
Fold add nuw and uadd.with.overflow with constants if the
addition does not overflow.

Part of https://bugs.llvm.org/show_bug.cgi?id=38146.

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D59471

llvm-svn: 356584
2019-03-20 18:00:27 +00:00
Philip Reames 484d07c828 [instcombine] Add todos describing missing transforms for masked.* intrinsics
llvm-svn: 356536
2019-03-20 03:36:05 +00:00
Sanjay Patel 6063393536 [InstCombine] allow general vector constants for funnel shift to shift transforms
Follow-up to:
rL356338
rL356369

We can calculate an arbitrary vector constant minus the bitwidth, so there's
no need to limit this transform to scalars and splats.

llvm-svn: 356372
2019-03-18 14:27:51 +00:00
Sanjay Patel 84de8a30a0 [InstCombine] extend rotate-left-by-constant canonicalization to funnel shift
Follow-up to:
rL356338

Rotates are a special case of funnel shift where the 2 input operands
are the same value, but that does not need to be a restriction for the
canonicalization when the shift amount is a constant.

llvm-svn: 356369
2019-03-18 14:10:11 +00:00
Sanjay Patel b3bcd95771 [InstCombine] canonicalize rotate right by constant to rotate left
This was noted as a backend problem:
https://bugs.llvm.org/show_bug.cgi?id=41057
...and subsequently fixed for x86:
rL356121
But we should canonicalize these in IR for the benefit of all targets
and improve IR analysis such as CSE.

llvm-svn: 356338
2019-03-17 19:08:00 +00:00
Philip Reames 68a2e4d48b [SimplifyDemandedVec] Strengthen handling all undef lanes (particularly GEPs)
A change of two parts:
1) A generic enhancement for all callers of SDVE to exploit the fact that if all lanes are undef, the result is undef.
2) A GEP specific piece to strengthen/fix the vector index undef element handling, and call into the generic infrastructure when visiting the GEP.

The result is that we replace a vector gep with at least one undef in each lane with a undef.  We can also do the same for vector intrinsics.  Once the masked.load patch (D57372) has landed, I'll update to include call tests as well.

Differential Revision: https://reviews.llvm.org/D57468

llvm-svn: 356293
2019-03-15 19:54:06 +00:00
Sanjay Patel de1d5d3675 [InstCombine] canonicalize funnel shift constant shift amount to be modulo bitwidth
The shift argument is defined to be modulo the bitwidth, so if that argument
is a constant, we can always reduce the constant to its minimal form to allow
better CSE and other follow-on transforms.

We need to be careful to ignore constant expressions here, or we will likely
infinite loop. I'm adding a general vector constant query for that case.

Differential Revision: https://reviews.llvm.org/D59374

llvm-svn: 356192
2019-03-14 19:22:08 +00:00
Matt Arsenault caf1316f71 IR: Add immarg attribute
This indicates an intrinsic parameter is required to be a constant,
and should not be replaced with a non-constant value.

Add the attribute to all AMDGPU and generic intrinsics that comments
indicate it should apply to. I scanned other target intrinsics, but I
don't see any obvious comments indicating which arguments are intended
to be only immediates.

This breaks one questionable testcase for the autoupgrade. I'm unclear
on whether the autoupgrade is supposed to really handle declarations
which were never valid. The verifier fails because the attributes now
refer to a parameter past the end of the argument list.

llvm-svn: 355981
2019-03-12 21:02:54 +00:00
Nikita Popov 884feb1b69 [InstCombine] Fold add nsw + sadd.with.overflow
Fold `add nsw` and `sadd.with.overflow` with constants if the addition
does not overflow.

Part of https://bugs.llvm.org/show_bug.cgi?id=38146.

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D58881

llvm-svn: 355530
2019-03-06 18:30:00 +00:00
Craig Topper a97857b5b5 [InstCombine] Fix an unused variable warning.
llvm-svn: 353630
2019-02-10 02:21:29 +00:00
Craig Topper 784929d045 Implementation of asm-goto support in LLVM
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html

This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.

This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.

There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.

Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii

Differential Revision: https://reviews.llvm.org/D53765

llvm-svn: 353563
2019-02-08 20:48:56 +00:00
James Y Knight 291f791ef1 [opaque pointer types] Pass function type for CallBase::setCalledFunction.
Differential Revision: https://reviews.llvm.org/D57174

llvm-svn: 352914
2019-02-01 20:44:54 +00:00
James Y Knight 14359ef1b6 [opaque pointer types] Pass value type to LoadInst creation.
This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57172

llvm-svn: 352911
2019-02-01 20:44:24 +00:00
James Y Knight 7976eb5838 [opaque pointer types] Pass function types to CallInst creation.
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57170

llvm-svn: 352909
2019-02-01 20:43:25 +00:00
Sanjay Patel fbcbac7174 [InstCombine] reduce duplicate code; NFC
An unused variable problem was introduced with rL352870 
and stubbed out with rL352871, but we can make a better
fix by actually using the local variable in code rather 
than just the assert.  

llvm-svn: 352873
2019-02-01 14:37:49 +00:00
Fangrui Song 8495aabec2 [InstCombine] Fix -Wunused-variable when -DLLVM_ENABLE_ASSERTIONS=off
llvm-svn: 352871
2019-02-01 14:22:02 +00:00
Sanjay Patel be23a91fcd [InstCombine] try to reduce x86 addcarry to generic uaddo intrinsic
If we can reduce the x86-specific intrinsic to the generic op, it allows existing 
simplifications and value tracking folds. AFAICT, this always results in identical 
x86 codegen in the non-reduced case...which should be true because we semi-generically 
(too aggressively IMO) convert to llvm.uadd.with.overflow in CGP, so the DAG/isel must 
already combine/lower this intrinsic as expected.

This isn't quite what was requested in:
https://bugs.llvm.org/show_bug.cgi?id=40486
...but we want to have these kinds of folds early for efficiency and to enable greater 
simplifications. For the case in the bug report where we have:
_addcarry_u64(0, ahi, 0, &ahi)
...this gets completely simplified away in IR.

Differential Revision: https://reviews.llvm.org/D57453

llvm-svn: 352870
2019-02-01 14:14:47 +00:00
Craig Topper c1892ec15a [CallSite removal] Remove CallSite uses from InstCombine.
Reviewers: chandlerc

Reviewed By: chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D57494

llvm-svn: 352771
2019-01-31 17:23:29 +00:00
Erik Pilkington 600e9deacf Add a 'dynamic' parameter to the objectsize intrinsic
This is meant to be used with clang's __builtin_dynamic_object_size.
When 'true' is passed to this parameter, the intrinsic has the
potential to be folded into instructions that will be evaluated
at run time. When 'false', the objectsize intrinsic behaviour is
unchanged.

rdar://32212419

Differential revision: https://reviews.llvm.org/D56761

llvm-svn: 352664
2019-01-30 20:34:35 +00:00
Philip Reames c71e996aed SimplifyDemandedVectorElts for all intrinsics
The point is that this simplifies integration of new intrinsics into SimplifiedDemandedVectorElts, and ensures we don't miss any existing ones.

This is intended to be NFC-ish, but as seen from the diffs, can produce slightly different output.  This is due to order of transforms w/in instcombine resulting in two slightly different fixed points.  That's something we should fix, but isn't a problem w/this patch per se.

Differential Revision: https://reviews.llvm.org/D57398

llvm-svn: 352653
2019-01-30 19:21:11 +00:00
Simon Pilgrim e1143c1322 [X86] Auto upgrade VPCOM/VPCOMU intrinsics to generic integer comparisons
This causes a couple of changes in the upgrade tests as signed/unsigned eq/ne are equivalent and we constant fold true/false codes, these changes are the same as what we already do for avx512 cmp/ucmp.

Noticed while cleaning up vector integer comparison costs for PR40376.

llvm-svn: 351697
2019-01-20 19:27:40 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Serguei Katkov a5b0e5585b [InstCombine]Avoid introduction of unaligned mem access
InstCombine is able to transform mem transfer instrinsic to alone store or store/load pair.
It might result in generation of unaligned atomic load/store which later in backend
will be transformed to libcall. It is not an evident gain and it is better to keep intrinsic as is
and handle it at backend.

Reviewers: reames, anna, apilipenko, mkazantsev
Reviewed By: reames
Subscribers: t.p.northover, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D56582

llvm-svn: 351295
2019-01-16 04:36:26 +00:00
Marek Olsak 33eb4d947d AMDGPU: Add a fast path for icmp.i1(src, false, NE)
Summary:
This allows moving the condition from the intrinsic to the standard ICmp
opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic
is an identity for retrieving the SGPR mask.

And we can also get the mask from and i1, or i1, xor i1.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52060

llvm-svn: 351150
2019-01-15 02:13:18 +00:00
Simon Pilgrim 5d403f6bf8 [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm)
This auto upgrades the signed SSE saturated math intrinsics to SADD_SAT/SSUB_SAT generic intrinsics.

Clang counterpart: https://reviews.llvm.org/D55890

Differential Revision: https://reviews.llvm.org/D55894

llvm-svn: 349892
2018-12-21 09:04:14 +00:00
Michael Kruse 978ba61536 Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. LoopID unfortunately is not loop identifier. It is
neither unique (there's even a regression test assigning the some LoopID
to multiple loops; can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it just to mark that a loop should not be processed again as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.

This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on instruction is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding the problem to ever need to add/remove
operands), called "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses with all the access groups that are parallel
(no dependencies carries by this loop).

This intentionally avoid any kind of "ID". Loops that are clones/have
their attributes modifies retain the llvm.loop.parallel_accesses
attribute. Access instructions that a cloned point to the same access
group. It is not necessary for each access to have it's own "ID" MDNode,
but those memory access instructions with the same behavior can be
grouped together.

The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.

Differential Revision: https://reviews.llvm.org/D52116

llvm-svn: 349725
2018-12-20 04:58:07 +00:00
Sanjay Patel 2aa2dc76c2 [InstCombine] try to convert x86 movmsk intrinsic to generic IR (PR39927)
call iM movmsk(sext <N x i1> X) --> zext (bitcast <N x i1> X to iN) to iM

This has the potential to create less-than-8-bit scalar types as shown in 
some of the test diffs, but it looks like the backend knows how to deal 
with that in these patterns. This is the simple part of the fix suggested in:
https://bugs.llvm.org/show_bug.cgi?id=39927

Differential Revision: https://reviews.llvm.org/D55529

llvm-svn: 348862
2018-12-11 16:38:03 +00:00
Nikita Popov 0c5d6ccbfc [InstCombine] Support ssub.sat canonicalization for non-splats
Extend ssub.sat(X, C) -> sadd.sat(X, -C) canonicalization to also
support non-splat vector constants. This is done by generalizing
the implementation of the isNotMinSignedValue() helper to return
true for constants that are non-splat, but don't contain any
signed min elements.

Differential Revision: https://reviews.llvm.org/D55011

llvm-svn: 348072
2018-12-01 10:58:34 +00:00
Nikita Popov 8d63aed459 [InstCombine] Combine saturating add/sub with constant operands
Combine
  sat(sat(X + C1) + C2) -> sat(X + (C1+C2))
and
  sat(sat(X - C1) - C2) -> sat(X - (C1+C2))
if the sign of C1 and C2 matches.

In the unsigned case we can compute C1+C2 with saturating arithmetic,
and InstSimplify will reduce this just to the saturation value. For
the signed case, we cannot perform the simplification if the result
of the addition overflows.

This change is part of https://reviews.llvm.org/D54534.

llvm-svn: 347773
2018-11-28 16:37:15 +00:00
Nikita Popov 42f89989a1 [InstCombine] Canonicalize ssub.sat to sadd.sat
Canonicalize ssub.sat(X, C) to ssub.sat(X, -C) if C is constant and
not signed minimum. This will help further optimizations to apply.

This change is part of https://reviews.llvm.org/D54534.

llvm-svn: 347772
2018-11-28 16:37:09 +00:00
Nikita Popov 78a9295e15 [InstCombine] Use known overflow information for saturating add/sub
If ValueTracking can determine that the add/sub can newer overflow,
replace it with the corresponding nuw/nsw add/sub.

Additionally, for the unsigned case, if ValueTracking determines
that the add/sub always overflows, replace the result with the
saturation value.

This change is part of https://reviews.llvm.org/D54534.

llvm-svn: 347770
2018-11-28 16:36:59 +00:00
Nikita Popov 085d24a8b3 [InstCombine] Canonicalize const arg for saturating adds
If a saturating add intrinsic has one constant argument, make sure
it is on the RHS. This will simplify further transformations.

This change is part of https://reviews.llvm.org/D54534.

llvm-svn: 347769
2018-11-28 16:36:52 +00:00
Sanjay Patel 790af91803 [InstCombine] add helper function to reduce code duplication; NFC
llvm-svn: 347604
2018-11-26 22:00:41 +00:00
Nikita Popov 6e81d421e1 [InstCombine] Simplify funnel shift with zero/undef operand to shift
The following simplifications are implemented:

 * `fshl(X, 0, C) -> shl X, C%BW`
 * `fshl(X, undef, C) -> shl X, C%BW` (assuming undef = 0)
 * `fshl(0, X, C) -> lshr X, BW-C%BW`
 * `fshl(undef, X, C) -> lshr X, BW-C%BW` (assuming undef = 0)
 * `fshr(X, 0, C) -> shl X, (BW-C%BW)`
 * `fshr(X, undef, C) -> shl X, BW-C%BW` (assuming undef = 0)
 * `fshr(0, X, C) -> lshr X, C%BW`
 * `fshr(undef, X, C) -> lshr, X, C%BW` (assuming undef = 0)

The simplification is only performed if the shift amount C is constant,
because we can explicitly compute C%BW and BW-C%BW in this case.

Differential Revision: https://reviews.llvm.org/D54778

llvm-svn: 347505
2018-11-23 22:45:08 +00:00
Sanjay Patel a139564896 [InstCombine] fold funnel shift amount based on demanded bits
The shift amount of a funnel shift is modulo the scalar bitwidth:
http://llvm.org/docs/LangRef.html#llvm-fshl-intrinsic
...so we can use demanded bits analysis on that operand to simplify it
when we have a power-of-2 bitwidth.

This is another step towards canonicalizing {shift/shift/or} to the 
intrinsics in IR.

Differential Revision: https://reviews.llvm.org/D54478

llvm-svn: 346814
2018-11-13 23:27:23 +00:00
Philip Reames b8d8db30ea [GC][InstCombine] Fix a potential iteration issue
Noticed via inspection.  Appears to be largely innocious in practice, but slight code change could have resulted in either visit order dependent missed optimizations or infinite loops.  May be a minor compile time problem today.

llvm-svn: 346698
2018-11-12 20:00:53 +00:00
Tom Stellard 28d662164d InstCombine: Avoid introducing poison values when lowering llvm.amdgcn.[us]bfe
Summary:
When the 3rd argument to these intrinsics is zero, lowering them
to shift instructions produces poison values, since we end up with
shift amounts equal to the number of bits in the shifted value.  This
means we can only lower these intrinsics if we can prove that the
3rd argument is not zero.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: bnieuwenhuizen, jvesely, wdng, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D53739

llvm-svn: 346422
2018-11-08 17:57:57 +00:00
Volkan Keles 3ca146d083 [InstCombine] Combine nested min/max intrinsics with constants
Reviewers: arsenm, spatel

Reviewed By: spatel

Subscribers: lebedev.ri, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D53774

llvm-svn: 345751
2018-10-31 17:50:52 +00:00
Thomas Lively c339250e12 [InstCombine] InstCombine and InstSimplify for minimum and maximum
Summary: Depends on D52765

Reviewers: aheejin, dschuff

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52766

llvm-svn: 344799
2018-10-19 19:01:26 +00:00
Chandler Carruth edb12a838a [TI removal] Make variables declared as `TerminatorInst` and initialized
by `getTerminator()` calls instead be declared as `Instruction`.

This is the biggest remaining chunk of the usage of `getTerminator()`
that insists on the narrow type and so is an easy batch of updates.
Several files saw more extensive updates where this would cascade to
requiring API updates within the file to use `Instruction` instead of
`TerminatorInst`. All of these were trivial in nature (pervasively using
`Instruction` instead just worked).

llvm-svn: 344502
2018-10-15 10:04:59 +00:00
Amara Emerson 54f60255a2 [InstCombine] Fix SimplifyLibCalls erasing an instruction while IC still had references to it.
InstCombine keeps a worklist and assumes that optimizations don't
eraseFromParent() the instruction, which SimplifyLibCalls violates. This change
adds a new callback to SimplifyLibCalls to let clients specify their own hander
for erasing actions.

Differential Revision: https://reviews.llvm.org/D52729

llvm-svn: 344251
2018-10-11 14:51:11 +00:00