Commit Graph

572 Commits

Author SHA1 Message Date
Simon Pilgrim 5d403f6bf8 [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm)
This auto upgrades the signed SSE saturated math intrinsics to SADD_SAT/SSUB_SAT generic intrinsics.
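
For illustration, a minimal sketch of the upgrade for one variant (the v8i16 width is just one example of the affected intrinsics):

    %r = call <8 x i16> @llvm.x86.sse2.padds.w(<8 x i16> %a, <8 x i16> %b)

    ; is upgraded to the generic saturating intrinsic

    %r = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> %a, <8 x i16> %b)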

Clang counterpart: https://reviews.llvm.org/D55890

Differential Revision: https://reviews.llvm.org/D55894

llvm-svn: 349892
2018-12-21 09:04:14 +00:00
Michael Kruse 978ba61536 Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. A LoopID is unfortunately not a loop identifier: it is
neither unique (there's even a regression test assigning the same LoopID
to multiple loops; can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it is just to mark that a loop should not be processed again as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.

This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on an instruction is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding the problem of ever needing to add/remove
operands), called an "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses listing all the access groups that are parallel
(no dependencies carried by this loop).
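
A minimal sketch of the new encoding (the metadata numbering here is arbitrary):

    store i32 %v, i32* %p, !llvm.access.group !1
    br i1 %exitcond, label %exit, label %header, !llvm.loop !0

    !0 = distinct !{!0, !2}                          ; the LoopID
    !1 = distinct !{}                                ; an access group
    !2 = !{!"llvm.loop.parallel_accesses", !1}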

This intentionally avoids any kind of "ID". Loops that are cloned or have
their attributes modified retain the llvm.loop.parallel_accesses
attribute. Access instructions that are cloned point to the same access
group. It is not necessary for each access to have its own "ID" MDNode;
memory access instructions with the same behavior can be grouped
together.

The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.

Differential Revision: https://reviews.llvm.org/D52116

llvm-svn: 349725
2018-12-20 04:58:07 +00:00
Sanjay Patel 2aa2dc76c2 [InstCombine] try to convert x86 movmsk intrinsic to generic IR (PR39927)
call iM movmsk(sext <N x i1> X) --> zext (bitcast <N x i1> X to iN) to iM

This has the potential to create less-than-8-bit scalar types as shown in 
some of the test diffs, but it looks like the backend knows how to deal 
with that in these patterns. This is the simple part of the fix suggested in:
https://bugs.llvm.org/show_bug.cgi?id=39927
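
A sketch of the transform for one 128-bit variant (the pmovmskb case; other movmsk variants are analogous):

    %s = sext <16 x i1> %x to <16 x i8>
    %r = call i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8> %s)

    ; becomes

    %b = bitcast <16 x i1> %x to i16
    %r = zext i16 %b to i32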

Differential Revision: https://reviews.llvm.org/D55529

llvm-svn: 348862
2018-12-11 16:38:03 +00:00
Nikita Popov 0c5d6ccbfc [InstCombine] Support ssub.sat canonicalization for non-splats
Extend ssub.sat(X, C) -> sadd.sat(X, -C) canonicalization to also
support non-splat vector constants. This is done by generalizing
the implementation of the isNotMinSignedValue() helper to return
true for constants that are non-splat, but don't contain any
signed min elements.
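
For example (a sketch with an arbitrary non-splat constant containing no signed-min elements):

    %r = call <2 x i32> @llvm.ssub.sat.v2i32(<2 x i32> %x, <2 x i32> <i32 1, i32 -7>)

    ; becomes

    %r = call <2 x i32> @llvm.sadd.sat.v2i32(<2 x i32> %x, <2 x i32> <i32 -1, i32 7>)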

Differential Revision: https://reviews.llvm.org/D55011

llvm-svn: 348072
2018-12-01 10:58:34 +00:00
Nikita Popov 8d63aed459 [InstCombine] Combine saturating add/sub with constant operands
Combine
  sat(sat(X + C1) + C2) -> sat(X + (C1+C2))
and
  sat(sat(X - C1) - C2) -> sat(X - (C1+C2))
if the sign of C1 and C2 matches.

In the unsigned case we can compute C1+C2 with saturating arithmetic,
and InstSimplify will reduce this just to the saturation value. For
the signed case, we cannot perform the simplification if the result
of the addition overflows.
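
An unsigned instance of the fold, as a sketch (here C1+C2 does not saturate, so no further simplification is needed):

    %t = call i8 @llvm.uadd.sat.i8(i8 %x, i8 100)
    %r = call i8 @llvm.uadd.sat.i8(i8 %t, i8 27)

    ; becomes

    %r = call i8 @llvm.uadd.sat.i8(i8 %x, i8 127)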

This change is part of https://reviews.llvm.org/D54534.

llvm-svn: 347773
2018-11-28 16:37:15 +00:00
Nikita Popov 42f89989a1 [InstCombine] Canonicalize ssub.sat to sadd.sat
Canonicalize ssub.sat(X, C) to sadd.sat(X, -C) if C is constant and
not the signed minimum. This will help further optimizations to apply.
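
A scalar sketch of the canonicalization:

    %r = call i32 @llvm.ssub.sat.i32(i32 %x, i32 10)

    ; becomes

    %r = call i32 @llvm.sadd.sat.i32(i32 %x, i32 -10)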

This change is part of https://reviews.llvm.org/D54534.

llvm-svn: 347772
2018-11-28 16:37:09 +00:00
Nikita Popov 78a9295e15 [InstCombine] Use known overflow information for saturating add/sub
If ValueTracking can determine that the add/sub can never overflow,
replace it with the corresponding nuw/nsw add/sub.

Additionally, for the unsigned case, if ValueTracking determines
that the add/sub always overflows, replace the result with the
saturation value.
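
A sketch of the first case (the mask value is an arbitrary example that lets ValueTracking prove the add cannot overflow):

    %x = and i32 %a, 65535
    %r = call i32 @llvm.uadd.sat.i32(i32 %x, i32 1)

    ; becomes

    %r = add nuw i32 %x, 1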

This change is part of https://reviews.llvm.org/D54534.

llvm-svn: 347770
2018-11-28 16:36:59 +00:00
Nikita Popov 085d24a8b3 [InstCombine] Canonicalize const arg for saturating adds
If a saturating add intrinsic has one constant argument, make sure
it is on the RHS. This will simplify further transformations.

This change is part of https://reviews.llvm.org/D54534.

llvm-svn: 347769
2018-11-28 16:36:52 +00:00
Sanjay Patel 790af91803 [InstCombine] add helper function to reduce code duplication; NFC
llvm-svn: 347604
2018-11-26 22:00:41 +00:00
Nikita Popov 6e81d421e1 [InstCombine] Simplify funnel shift with zero/undef operand to shift
The following simplifications are implemented:

 * `fshl(X, 0, C) -> shl X, C%BW`
 * `fshl(X, undef, C) -> shl X, C%BW` (assuming undef = 0)
 * `fshl(0, X, C) -> lshr X, BW-C%BW`
 * `fshl(undef, X, C) -> lshr X, BW-C%BW` (assuming undef = 0)
 * `fshr(X, 0, C) -> shl X, (BW-C%BW)`
 * `fshr(X, undef, C) -> shl X, BW-C%BW` (assuming undef = 0)
 * `fshr(0, X, C) -> lshr X, C%BW`
 * `fshr(undef, X, C) -> lshr, X, C%BW` (assuming undef = 0)

The simplification is only performed if the shift amount C is constant,
because we can explicitly compute C%BW and BW-C%BW in this case.

Differential Revision: https://reviews.llvm.org/D54778

llvm-svn: 347505
2018-11-23 22:45:08 +00:00
Sanjay Patel a139564896 [InstCombine] fold funnel shift amount based on demanded bits
The shift amount of a funnel shift is modulo the scalar bitwidth:
http://llvm.org/docs/LangRef.html#llvm-fshl-intrinsic
...so we can use demanded bits analysis on that operand to simplify it
when we have a power-of-2 bitwidth.
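
A sketch of the idea for i32, where only the low 5 bits of the shift amount are demanded:

    %amt = or i32 %a, 32                 ; bit 5 lies outside the demanded low 5 bits
    %r   = call i32 @llvm.fshl.i32(i32 %x, i32 %y, i32 %amt)

    ; simplifies to

    %r   = call i32 @llvm.fshl.i32(i32 %x, i32 %y, i32 %a)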

This is another step towards canonicalizing {shift/shift/or} to the 
intrinsics in IR.

Differential Revision: https://reviews.llvm.org/D54478

llvm-svn: 346814
2018-11-13 23:27:23 +00:00
Philip Reames b8d8db30ea [GC][InstCombine] Fix a potential iteration issue
Noticed via inspection.  Appears to be largely innocuous in practice, but a slight code change could have resulted in either visit-order-dependent missed optimizations or infinite loops.  May be a minor compile-time problem today.

llvm-svn: 346698
2018-11-12 20:00:53 +00:00
Tom Stellard 28d662164d InstCombine: Avoid introducing poison values when lowering llvm.amdgcn.[us]bfe
Summary:
When the 3rd argument to these intrinsics is zero, lowering them
to shift instructions produces poison values, since we end up with
shift amounts equal to the number of bits in the shifted value.  This
means we can only lower these intrinsics if we can prove that the
3rd argument is not zero.
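
A sketch of the problematic case (the exact expansion is shown only schematically):

    %r = call i32 @llvm.amdgcn.ubfe.i32(i32 %x, i32 %off, i32 0)
    ; a shift-based lowering would end up with something like
    ;   %t = lshr i32 %tmp, 32
    ; i.e. a shift by the full bit width, which yields poison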

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: bnieuwenhuizen, jvesely, wdng, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D53739

llvm-svn: 346422
2018-11-08 17:57:57 +00:00
Volkan Keles 3ca146d083 [InstCombine] Combine nested min/max intrinsics with constants
Reviewers: arsenm, spatel

Reviewed By: spatel

Subscribers: lebedev.ri, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D53774

llvm-svn: 345751
2018-10-31 17:50:52 +00:00
Thomas Lively c339250e12 [InstCombine] InstCombine and InstSimplify for minimum and maximum
Summary: Depends on D52765

Reviewers: aheejin, dschuff

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52766

llvm-svn: 344799
2018-10-19 19:01:26 +00:00
Chandler Carruth edb12a838a [TI removal] Make variables declared as `TerminatorInst` and initialized
by `getTerminator()` calls instead be declared as `Instruction`.

This is the biggest remaining chunk of the usage of `getTerminator()`
that insists on the narrow type and so is an easy batch of updates.
Several files saw more extensive updates where this would cascade to
requiring API updates within the file to use `Instruction` instead of
`TerminatorInst`. All of these were trivial in nature (pervasively using
`Instruction` instead just worked).

llvm-svn: 344502
2018-10-15 10:04:59 +00:00
Amara Emerson 54f60255a2 [InstCombine] Fix SimplifyLibCalls erasing an instruction while IC still had references to it.
InstCombine keeps a worklist and assumes that optimizations don't
eraseFromParent() the instruction, which SimplifyLibCalls violates. This change
adds a new callback to SimplifyLibCalls to let clients specify their own handler
for erasing actions.

Differential Revision: https://reviews.llvm.org/D52729

llvm-svn: 344251
2018-10-11 14:51:11 +00:00
Neil Henning 57f5d0a885 [IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle.
The IRBuilder CreateIntrinsic method wouldn't allow you to specify the
types that you wanted the intrinsic to be mangled with. To fix this
I've:

- Added an ArrayRef<Type *> parameter to both CreateIntrinsic overloads.
- Used that array to pass into the Intrinsic::getDeclaration call.
- Added a CreateUnaryIntrinsic to replace the most common use of
  CreateIntrinsic where the type was auto-deduced from operand 0.
- Added a bunch more unit tests to test Create*Intrinsic calls that
  weren't being tested (including the FMF flag that wasn't checked).

This was suggested as part of the AMDGPU specific atomic optimizer
review (https://reviews.llvm.org/D51969).

Differential Revision: https://reviews.llvm.org/D52087

llvm-svn: 343962
2018-10-08 10:32:33 +00:00
Sanjay Patel 09e02fbf51 [InstCombine][x86] try even harder to convert blendv intrinsic to generic IR (PR38814)
Follow-up to rL342324 (D52059):

Missing optimizations with blendv are shown in:
https://bugs.llvm.org/show_bug.cgi?id=38814

This is an easier and more powerful solution than adding pattern matching for a few 
special cases in the backend. The potential danger with this transform in IR is that 
the condition value can get separated from the select, and the backend might not be 
able to make a blendv out of it again.
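
A sketch of the transform for the 128-bit float variant (the condition materialization is shown explicitly):

    %s = sext <4 x i1> %cond to <4 x i32>
    %m = bitcast <4 x i32> %s to <4 x float>
    %r = call <4 x float> @llvm.x86.sse41.blendvps(<4 x float> %a, <4 x float> %b, <4 x float> %m)

    ; becomes

    %r = select <4 x i1> %cond, <4 x float> %b, <4 x float> %a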

llvm-svn: 342806
2018-09-22 14:43:55 +00:00
Sanjay Patel 296d35a5e9 [InstCombine][x86] try harder to convert blendv intrinsic to generic IR (PR38814)
Missing optimizations with blendv are shown in:
https://bugs.llvm.org/show_bug.cgi?id=38814

If this works, it's an easier and more powerful solution than adding pattern matching 
for a few special cases in the backend. The potential danger with this transform in IR
is that the condition value can get separated from the select, and the backend might 
not be able to make a blendv out of it again. I don't think that's too likely, but 
I've kept this patch minimal with a 'TODO', so we can test that theory in the wild 
before expanding the transform.

Differential Revision: https://reviews.llvm.org/D52059

llvm-svn: 342324
2018-09-15 14:25:44 +00:00
Sanjay Patel 0f29e953b7 [InstCombine] canonicalize fneg with llvm.sin
This is a follow-up to rL339604 which did the same transform
for a sin libcall. The handling of intrinsics vs. libcalls
is unfortunately scattered, so I'm just adding this next to
the existing transform for llvm.cos for now.

This should resolve PR38458:
https://bugs.llvm.org/show_bug.cgi?id=38458
If the call was already negated, the negates will cancel
each other out.
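
A sketch of the double-precision case (fneg written as fsub, as IR expressed it at the time):

    %n = fsub double -0.000000e+00, %x
    %r = call double @llvm.sin.f64(double %n)

    ; becomes

    %s = call double @llvm.sin.f64(double %x)
    %r = fsub double -0.000000e+00, %s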

llvm-svn: 340952
2018-08-29 18:27:49 +00:00
Matt Arsenault 10de2775bd AMDGPU: Remove nan tests in class if src is nnan
llvm-svn: 340850
2018-08-28 18:10:02 +00:00
Craig Topper 9c1d9fdeaa [X86] Remove masking from the 512-bit padds and psubs intrinsics. Use select in IR instead.
llvm-svn: 339842
2018-08-16 06:20:24 +00:00
Matt Arsenault 9a389fbd79 AMDGPU: Stop producing icmp/fcmp intrinsics with invalid types
llvm-svn: 339815
2018-08-15 21:14:25 +00:00
Tomasz Krupa e766e5f636 [X86] Constant folding of adds/subs intrinsics
Summary: This adds constant folding of signed add/sub with saturation intrinsics.

Reviewers: craig.topper, spatel, RKSimon, chandlerc, efriedma

Reviewed By: craig.topper

Subscribers: rnk, llvm-commits

Differential Revision: https://reviews.llvm.org/D50499

llvm-svn: 339659
2018-08-14 09:04:01 +00:00
Matt Arsenault d35f46caf1 AMDGPU: Turn class x, p_zero|n_zero into fcmp oeq x, 0
The library does use this for some reason.
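
A sketch of the fold (the mask constant 96 assumes P_ZERO|N_ZERO occupy bits 5 and 6 of the class-test mask):

    %r = call i1 @llvm.amdgcn.class.f32(float %x, i32 96)

    ; becomes

    %r = fcmp oeq float %x, 0.000000e+00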

llvm-svn: 339461
2018-08-10 18:58:49 +00:00
Sanjay Patel c6944f795d [InstSimplify] move minnum/maxnum with Inf folds from instcombine
llvm-svn: 339396
2018-08-09 22:20:44 +00:00
Sanjay Patel 948ff87d7d [InstSimplify] move minnum/maxnum with common op fold from instcombine
llvm-svn: 339144
2018-08-07 14:36:27 +00:00
Sanjay Patel 3f6e9a71f7 [InstSimplify] move minnum/maxnum with undef fold from instcombine
llvm-svn: 338719
2018-08-02 14:33:40 +00:00
Sanjay Patel 28c7e41c09 [InstSimplify] move minnum/maxnum with same arg fold from instcombine
llvm-svn: 338652
2018-08-01 23:05:55 +00:00
Matt Arsenault d149650760 PatternMatch: Add wrappers for fabs and canonicalize
llvm-svn: 338111
2018-07-27 09:04:35 +00:00
Piotr Padlewski c63b492bcd Simplify recursive launder.invariant.group and strip
Summary:
This patch is crucial for proving equality of laundered/stripped
pointers, e.g.:

  bool foo(A *a) {
    return a == std::launder(a);
  }

Clang with -fstrict-vtable-pointers will emit something like:

    define dso_local zeroext i1 @_Z3fooP1A(%struct.A* %a) {
    entry:
      %c = bitcast %struct.A* %a to i8*
      %call = tail call i8* @llvm.launder.invariant.group.p0i8(i8* %c)
      %0 = bitcast %struct.A* %a to i8*
      %1 = tail call i8* @llvm.strip.invariant.group.p0i8(i8* %0)
      %2 = tail call i8* @llvm.strip.invariant.group.p0i8(i8* %call)
      %cmp = icmp eq i8* %1, %2
      ret i1 %cmp
    }

Because %2 can be replaced with @llvm.strip.invariant.group(%0),
and %2 and %1 will then produce the same value (because strip is readnone),
we can replace the compare with true.

Reviewers: rsmith, hfinkel, majnemer, amharc, kuhar

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D47423

llvm-svn: 336963
2018-07-12 23:55:20 +00:00
Craig Topper 034adf2683 [X86] Remove and autoupgrade the scalar fma intrinsics with masking.
This converts them to what clang is now using for codegen. Unfortunately, there seem to be a few kinks to work out still. I'll try to address with follow up patches.

llvm-svn: 336871
2018-07-12 00:29:56 +00:00
Manoj Gupta 77eeac3d9e llvm: Add support for "-fno-delete-null-pointer-checks"
Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.

More details : https://lkml.org/lkml/2018/4/4/601

GCC option description for -fdelete-null-pointer-checks:
Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.

-fno-delete-null-pointer-checks is the inverse of this, implying that
null pointer dereferencing is not undefined.

This feature is implemented in this CL as the LLVM IR function attribute
"null-pointer-is-valid"="true" (under review at D47894).
The CL updates several passes that assumed null pointer dereferencing is
undefined to not optimize when the "null-pointer-is-valid"="true"
attribute is present.
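
A minimal sketch of the attribute in IR (the function and types are placeholders):

    define i32 @f(i32* %p) "null-pointer-is-valid"="true" {
      ; with the attribute, passes must not treat this load of a
      ; possibly-null pointer as undefined behaviour
      %v = load i32, i32* %p
      ret i32 %v
    }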

Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv

Reviewed By: efriedma, george.burgess.iv

Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits

Differential Revision: https://reviews.llvm.org/D47895

llvm-svn: 336613
2018-07-09 22:27:23 +00:00
Matt Arsenault 24ce89b717 Fix asserts in AMDGCN fmed3 folding by handling more cases of NaN
Better NaN handling for AMDGCN fmed3.

All operands are checked for NaN now. The checks
were moved before the canonicalization to provide
a better mapping from fclamp. Changed the behaviour
of fmed3(x,y,NaN) to return max(x,y) instead of
min(x,y) in light of this. Updated tests as a result
and added some new cases to cover the fix.

Patch by Alan Baker

llvm-svn: 336375
2018-07-05 17:05:36 +00:00
Craig Topper 350c5f1881 [X86] Remove X86 specific scalar FMA intrinsics and upgrade to target independent FMA and extractelement/insertelement.
llvm-svn: 336315
2018-07-05 06:52:55 +00:00
Craig Topper 31cbe75b3b [X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name.
I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking built in. When we reach that goal, we should have no intrinsics named "avx512.mask".

llvm-svn: 335744
2018-06-27 15:57:53 +00:00
Sanjay Patel 825a4faa8d [InstCombine] ignore debuginfo when removing redundant assumes (PR37726)
This is similar to:
rL335083

Fixes:
https://bugs.llvm.org/show_bug.cgi?id=37726

llvm-svn: 335121
2018-06-20 13:22:26 +00:00
Vedant Kumar f01827f2d1 [IR] Introduce helpers to skip debug instructions (NFC)
This patch introduces two helpers to make it easier to ignore debug
intrinsics:

- Instruction::getNextNonDebugInstruction()

This is just like Instruction::getNextNode(), except that it skips debug
info.

- skipDebugInfo(BasicBlock::iterator)

A free function which advances a BasicBlock iterator past any debug
info. This is a no-op when the iterator already points to a non-debug
instruction.

Part of: llvm.org/PR37728
Related to: https://reviews.llvm.org/D47874

Differential Revision: https://reviews.llvm.org/D48305

llvm-svn: 335083
2018-06-19 23:42:17 +00:00
Mikhail Dvoretckii 8393f90717 [InstCombine] Replacing X86-specific rounding intrinsics with generic floor-ceil
This patch replaces calls to X86-specific intrinsics with floor-ceil semantics
with calls to target-independent @llvm.floor.* and @llvm.ceil.* intrinsics. This
doesn't affect the resulting machine code, as those intrinsics are lowered to
the same instructions, but exposes these specific rounding cases to generic
optimizations.
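
A sketch for the 128-bit single-precision case (the rounding immediate shown assumes that 1 selects round-down/floor):

    %r = call <4 x float> @llvm.x86.sse41.round.ps(<4 x float> %x, i32 1)

    ; becomes

    %r = call <4 x float> @llvm.floor.v4f32(<4 x float> %x)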

Differential Revision: https://reviews.llvm.org/D48067

llvm-svn: 335039
2018-06-19 10:49:12 +00:00
Craig Topper 98a79934af [X86] Remove masking from the 512-bit masked floating point add/sub/mul/div intrinsics. Use a select in IR instead.
llvm-svn: 334358
2018-06-10 06:01:36 +00:00
Davide Italiano 189c2cf114 [InstCombine] Skip dbg.value(s) when looking at stack{save,restore}.
Fixes PR37713.

llvm-svn: 334317
2018-06-08 20:42:36 +00:00
Tim Northover 9b80060d7b InstCombine: ignore debug instructions during fence combine
We should never get different CodeGen based on whether the code is being
compiled in debug mode so we must skip over @llvm.dbg.value (and similar)
calls.

Should fix at least the worst part of PR37690.

llvm-svn: 334090
2018-06-06 12:46:02 +00:00
David Blaikie 31b98d2e99 Move Analysis/Utils/Local.h back to Transforms
Review feedback from r328165. Split out just the one function from the
file that's used by Analysis. (As chandlerc pointed out, the original
change only moved the header and not the implementation anyway - which
was fine for the one function that was used (since it's a
template/inlined in the header) but not in general)

llvm-svn: 333954
2018-06-04 21:23:21 +00:00
Alexandros Lamprineas 61f0ba1fcc [InstCombine, ARM] Convert vld1 to llvm load
Convert a vector load intrinsic into an llvm load instruction.
This is beneficial when the underlying object being addressed
comes from a constant, since we get constant-folding for free.
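
A sketch of one variant (the intrinsic mangling and alignment are illustrative assumptions):

    %v = call <4 x i32> @llvm.arm.neon.vld1.v4i32.p0i8(i8* %p, i32 4)

    ; becomes

    %c = bitcast i8* %p to <4 x i32>*
    %v = load <4 x i32>, <4 x i32>* %c, align 4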

Differential Revision: https://reviews.llvm.org/D46273

llvm-svn: 333643
2018-05-31 12:19:18 +00:00
Alexandros Lamprineas 52457d33b2 [InstCombine, ARM, AArch64] Convert table lookup to shuffle vector
Turning a table lookup intrinsic into a shuffle vector instruction
can be beneficial. If the mask used for the lookup is the constant
vector {7,6,5,4,3,2,1,0}, then the back-end generates byte reverse
instructions instead.
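
A sketch of the vtbl1 case with the byte-reverse mask mentioned above:

    %r = call <8 x i8> @llvm.arm.neon.vtbl1(<8 x i8> %t, <8 x i8> <i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)

    ; becomes

    %r = shufflevector <8 x i8> %t, <8 x i8> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>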

Differential Revision: https://reviews.llvm.org/D46133

llvm-svn: 333550
2018-05-30 14:38:50 +00:00
Chad Rosier 274d72faad [InstCombine] Combine XOR and AES instructions on ARM/ARM64.
The ARM/ARM64 AESE and AESD instructions have a builtin XOR as the first step in
the instruction. Therefore, if the AES key is zero and the AES data was
previously XORed, it can be combined into a single instruction.
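
A sketch of the AArch64 aese case (the operand roles shown are an assumption for illustration):

    %d = xor <16 x i8> %data, %key
    %r = call <16 x i8> @llvm.aarch64.crypto.aese(<16 x i8> %d, <16 x i8> zeroinitializer)

    ; becomes

    %r = call <16 x i8> @llvm.aarch64.crypto.aese(<16 x i8> %data, <16 x i8> %key)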

Differential Revision: https://reviews.llvm.org/D47239
Patch by Michael Brase!

llvm-svn: 333193
2018-05-24 15:26:42 +00:00
Stanislav Mekhanoshin 0e132dca53 [AMDGPU] Optimize old value of v_mov_b32_dpp
We can eliminate the old value if bound_ctrl = 1 and row_mask = bank_mask = 0xf.
This is an alternative implementation working with the intrinsic in InstCombine.
Original review for the post-ISel optimization: D46570.

Differential Revision: https://reviews.llvm.org/D46596

llvm-svn: 332956
2018-05-22 08:04:33 +00:00
Craig Topper e4c045b7df [X86] Remove mask arguments from permvar builtins/intrinsics. Use a select in IR instead.
Someday maybe we'll use selects for all intrinsics.

llvm-svn: 332824
2018-05-20 23:34:04 +00:00
Nicola Zaghen d34e60ca85 Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually change the docs as the regex doesn't match them.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624

llvm-svn: 332240
2018-05-14 12:53:11 +00:00