Commit Graph

11958 Commits

Author SHA1 Message Date
Nikita Popov 25a02c12f1 [InstCombine] Improve cttz/ctlz + icmp tests; NFC
Change part of the tests to use vectors (I'm using scalar for ugt
and vector for ult), add multiuse variations, rename %lz to %tz
for the cttz tests.

llvm-svn: 350471
2019-01-05 17:36:05 +00:00
Nikita Popov b46680407d [InstCombine] Add cttz/ctlz + icmp ugt/ult tests; NFC
llvm-svn: 350468
2019-01-05 15:51:59 +00:00
Nikita Popov 65038515ee [InstCombine] Relax cttz/ctlz with select on zero
The cttz/ctlz intrinsics have a parameter specifying whether the
result is undefined for zero. cttz(x, false) can be relaxed to
cttz(x, true) if x is known non-zero, and in fact such an optimization
is already performed. However, this currently doesn't work if x is
non-zero as a result of a select rather than an explicit branch.
This patch adds handling for this case, thus allowing
x != 0 ? cttz(x, false) : y to simplify to x != 0 ? cttz(x, true) : y.

Differential Revision: https://reviews.llvm.org/D55786

llvm-svn: 350463
2019-01-05 09:48:16 +00:00
Nikita Popov 7bd4900ba0 [InstCombine] Add vector tests for select + ctlz/cttz; NFC
llvm-svn: 350462
2019-01-05 09:48:05 +00:00
Nikita Popov 6658fce4fc [BDCE] Remove dead uses of arguments
In addition to finding dead uses of instructions, also find dead uses
of function arguments, and replace them with zero as well.

I'm changing the way the known bits are computed here to remove the
coupling between the transfer function and the algorithm. It previously
relied on the first op being visited first and computing known bits --
unless the first op is not an instruction, in which case they're computed
on the second op. I could have adjusted this to check for "instruction
or argument", but I think it's better to avoid the repeated calculation
with an explicit flag.

Differential Revision: https://reviews.llvm.org/D56247

llvm-svn: 350435
2019-01-04 21:21:43 +00:00
Teresa Johnson 853b962416 [ThinLTO] Handle chains of aliases
At -O0, globalopt is not run during the compile step, and we can have a
chain of an alias having an immediate aliasee of another alias. The
summaries are constructed assuming aliases in a canonical form
(flattened chains), and as a result only the base object but no
intermediate aliases were preserved.

Fix by adding a pass that canonicalize aliases, which ensures each
alias is a direct alias of the base object.

Reviewers: pcc, davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54507

llvm-svn: 350423
2019-01-04 19:04:54 +00:00
Vedant Kumar a1778df474 [CodeExtractor] Do not extract unsafe lifetime markers
Lifetime markers which reference inputs to the extraction region are not
safe to extract. Example ('rhs' will be extracted):

```
               entry:
              +------------+
              | x = alloca |
              | y = alloca |
              +------------+
             /              \
   lhs:                      rhs:
  +-------------------+     +-------------------+
  | lifetime_start(x) |     | lifetime_start(x) |
  | use(x)            |     | lifetime_start(y) |
  | lifetime_end(x)   |     | use(x, y)         |
  | lifetime_start(y) |     | lifetime_end(y)   |
  | use(y)            |     | lifetime_end(x)   |
  | lifetime_end(y)   |     +-------------------+
  +-------------------+
```

Prior to extraction, the stack coloring pass sees that the slots for 'x'
and 'y' are in-use at the same time. After extraction, the coloring pass
infers that 'x' and 'y' are *not* in-use concurrently, because markers
from 'rhs' are no longer available to help decide otherwise.

This leads to a miscompile, because the stack slots actually are in-use
concurrently in the extracted function.

Fix this by moving lifetime start/end markers for memory regions defined
in the calling function around the call to the extracted function.

Fixes llvm.org/PR39671 (rdar://45939472).

Differential Revision: https://reviews.llvm.org/D55967

llvm-svn: 350420
2019-01-04 17:43:22 +00:00
Sanjay Patel 722466e1f1 [InstCombine] reduce raw IR narrowing rotate patterns to funnel shift
Similar to rL350199 - there are no known analysis/codegen holes for
funnel shift intrinsics now, so we can canonicalize the 6+ regular
instructions to funnel shift to improve vectorization, inlining,
unrolling, etc.

llvm-svn: 350419
2019-01-04 17:38:12 +00:00
John Brawn 39ac159c24 [LICM] Adjust how moving the re-hoist point works
In some cases the order that we hoist instructions in means that when rehoisting
(which uses the same order as hoisting) we can rehoist to a block A, then a
block B, then block A again. This currently causes an assertion failure as it
expects that when changing the hoist point it only ever moves to a block that
dominates the hoist point being moved from.

Fix this by moving the re-hoist point when it doesn't dominate the dominator of
hoisted instruction, or in other words when it wouldn't dominate the uses of
the instruction being rehoisted.

Differential Revision: https://reviews.llvm.org/D55266

llvm-svn: 350408
2019-01-04 17:12:09 +00:00
Simon Pilgrim c2aadfaaad [SLPVectorizer] Flag ADD/SUB SSAT/USAT intrinsics trivially vectorizable (PR40123)
Enables SLP vectorization for the SSE2 PADDS/PADDUS/PSUBS/PSUBUS style intrinsics

llvm-svn: 350300
2019-01-03 12:18:23 +00:00
Simon Pilgrim c5e22b29e4 [SLPVectorizer][X86] Add ADD/SUB SSAT/USAT tests (PR40123)
llvm-svn: 350297
2019-01-03 12:02:14 +00:00
Pete Cooper 697281df42 Teach ObjCARC optimizer about equivalent PHIs when eliminating autoreleaseRV/retainRV pairs
OptimizeAutoreleaseRVCall skips optimizing llvm.objc.autoreleaseReturnValue if it
sees a user which is llvm.objc.retainAutoreleasedReturnValue, and if they have
equivalent arguments (either identical or equivalent PHIs). It then assumes that
ObjCARCOpt::OptimizeRetainRVCall will optimize the pair instead.

Trouble is, ObjCARCOpt::OptimizeRetainRVCall doesn't know about equivalent PHIs
so optimizes in a different way and we are left with an unoptimized llvm.objc.autoreleaseReturnValue.

This teaches ObjCARCOpt::OptimizeRetainRVCall to also understand PHI equivalence.

rdar://problem/47005143

Reviewed By: ahatanak

Differential Revision: https://reviews.llvm.org/D56235

llvm-svn: 350284
2019-01-03 01:38:08 +00:00
Nikita Popov 41f5710328 [BDCE] Fix typo in test; NFC
shl by 32 is undefined. This was intended to be a shl by 31 as part
of a rotate sequence.

llvm-svn: 350265
2019-01-02 22:34:32 +00:00
Pete Cooper 8d58048024 Fix assert in ObjCARC optimizer when deleting retainBlock of null or undef.
The caller to EraseInstruction had this conditional:

    // ARC calls with null are no-ops. Delete them.
    if (IsNullOrUndef(Arg))

but the assert inside EraseInstruction only allowed ConstantPointerNull and not
undef or bitcasts.

This adds support for both of these cases.

rdar://problem/47003805

llvm-svn: 350261
2019-01-02 21:00:02 +00:00
Nikita Popov cc6ef7f153 [BDCE] Remove instructions without demanded bits
If an instruction has no demanded bits, remove it directly during BDCE,
instead of leaving it for something else to clean up.

Differential Revision: https://reviews.llvm.org/D56185

llvm-svn: 350257
2019-01-02 20:02:14 +00:00
Sanjay Patel 654e6aabb9 [InstCombine] canonicalize raw IR rotate patterns to funnel shift
The final piece of IR-level analysis to allow this was committed with:
rL350188

Using the intrinsics should improve transforms based on cost models
like vectorization and inlining.

The backend should be prepared too, so we can now canonicalize more
sequences of shift/logic to the intrinsics and know that the end
result should be equal or better to the original code even if the
target does not have an actual rotate instruction.

llvm-svn: 350199
2019-01-01 21:51:39 +00:00
Nikita Popov c5a023b624 [BDCE] Regenerate test checks; NFC
llvm-svn: 350190
2019-01-01 12:27:23 +00:00
Nikita Popov d4bf57be6b [BDCE] Remove -instsimplify from BDCE test; NFC
To make it more obvious which part of the transformation is carried
out by BDCE. Also drop the CHECK-IO lines which only run -instsimplify
as they don't really seem meaningful if the main check doesn't run
-instsimplify either.

llvm-svn: 350189
2019-01-01 10:17:35 +00:00
Nikita Popov bc9986e9ad Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.

The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.

The previous attempt to land this lead to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now and the added test case demanstrates such an
instance.

Differential Revision: https://reviews.llvm.org/D55563

llvm-svn: 350188
2019-01-01 10:05:26 +00:00
Chen Zheng 4952e668f8 [InstCombine] canonicalize MUL with NEG operand
-X * Y --> -(X * Y)
X * -Y --> -(X * Y)

Differential Revision: https://reviews.llvm.org/D55961

llvm-svn: 350185
2019-01-01 01:09:20 +00:00
Chen Zheng 763c8973bf [InstCombine] [NFC] update testcases for canonicalize MUL with NEG operand
llvm-svn: 350154
2018-12-29 12:18:15 +00:00
Anna Thomas bae11e7999 [UnrollRuntime] NFC: Updated exiting tests and added more tests
Added more tests for multiple exiting blocks to the LatchExit.
Today these cases are not supported. Patch to follow soon.

llvm-svn: 350135
2018-12-28 19:21:50 +00:00
Anna Thomas 98743fa77a [UnrollRuntime] NFC: Add comment and verify LCSSA
Added -verify-loop-lcssa to test cases.
Updated comments in ConnectProlog.

llvm-svn: 350131
2018-12-28 18:52:16 +00:00
Max Kazantsev 8f70c9bf49 [NFC] Add failing test on LCSSA form preservation of LoopSimplifyCFG
llvm-svn: 350119
2018-12-28 10:43:37 +00:00
Max Kazantsev 530ff8f3cc Temporarily disable term folding in LoopSimplifyCFG, add tests
llvm-svn: 350117
2018-12-28 06:22:39 +00:00
Craig Topper c9a6000755 [LoopIdiomRecognize] Add CTTZ support
Summary:
Existing LIR recognizes CTLZ where shifting input variable right until it is zero. (Shift-Until-Zero idiom)

This commit:
1. Augments Shift-Until-Zero idiom to recognize CTTZ where input variable is shifted left.
2. Prepare for BitScan idiom recognition.

Patch by Yuanfang Chen (tabloid.adroit)

Reviewers: craig.topper, evstupac

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55876

llvm-svn: 350074
2018-12-26 21:59:48 +00:00
Max Kazantsev edabb9ae56 [LoopSimplifyCFG] Delete dead exiting edges
This patch teaches LoopSimplifyCFG to remove dead exiting edges
from loops.

Differential Revision: https://reviews.llvm.org/D54025
Reviewed By: fedor.sergeev

llvm-svn: 350049
2018-12-24 07:41:33 +00:00
Max Kazantsev 347c583772 Return "[LoopSimplifyCFG] Delete dead in-loop blocks"
The underlying bug that caused the revert should be fixed by rL348567.

Differential Revision: https://reviews.llvm.org/D54023

llvm-svn: 350045
2018-12-24 06:06:17 +00:00
Reid Kleckner 84a2d29681 [BasicAA] Fix AA bug on dynamic allocas and stackrestore
Summary:
BasicAA has special logic for unescaped allocas, which normally applies
equally well to dynamic and static allocas. However, llvm.stackrestore
has the power to end the lifetime of dynamic allocas, without referring
to them directly.

stackrestore is already marked with the most conservative memory
modification attributes, but because the alloca is not escaped, the
normal logic produces incorrect results. I think BasicAA needs a special
case here to teach it about the relationship between dynamic allocas and
stackrestore.

Fixes PR40118

Reviewers: gbiv, efriedma, george.burgess.iv

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55969

llvm-svn: 349945
2018-12-21 19:59:03 +00:00
Chen Zheng ddfaf07526 [InstCombine] [NFC] testcases for canonicalize MUL with NEG operand
llvm-svn: 349847
2018-12-20 22:37:05 +00:00
Nikita Popov f17421e595 [ConstantFolding] Consolidate and extend bitcount intrinsic tests; NFC
Move constant folding tests into ConstantFolding/bitcount.ll and drop
various tests in other places. Add coverage for undefs.

llvm-svn: 349806
2018-12-20 19:46:52 +00:00
Nikita Popov 6d79e43208 [ConstantFolding] Add undef tests for overflow intrinsics; NFC
llvm-svn: 349805
2018-12-20 19:46:46 +00:00
Nikita Popov da1beca0ac [ConstantFolding] Regenerate test checks; NFC
Bring overflow-ops.ll into current format. Remove redundant entry
blocks.

llvm-svn: 349804
2018-12-20 19:46:43 +00:00
Florian Hahn ef307b8c26 [LAA] Avoid generating RT checks for known deps preventing vectorization.
If we found unsafe dependences other than 'unknown', we already know at
compile time that they are unsafe and the runtime checks should always
fail. So we can avoid generating them in those cases.

This should have no negative impact on performance as the runtime checks
that would be created previously should always fail. As a sanity check,
I measured the test-suite, spec2k and spec2k6 and there were no regressions.

Reviewers: Ayal, anemet, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D55798

llvm-svn: 349794
2018-12-20 18:49:09 +00:00
Michael Kruse 199427100b [InstCombine] Preserve access-group metadata.
Preserve llvm.access.group metadata when combining store instructions.
This was forgotten in r349725.

Fixes llvm.org/PR40117

llvm-svn: 349774
2018-12-20 17:11:02 +00:00
Simon Pilgrim 8c4abd20eb [InstCombine] Make x86 PADDS/PSUBS constant folding tests generic
As discussed on D55894, this replaces the existing PADDS/PSUBUS intrinsics with the the sadd/ssub.sat generic intrinsics and moves the tests out of the x86 subfolder.

PR40110 has been raised to fix the regression with constant folding vectors containing undef elements.

llvm-svn: 349759
2018-12-20 13:50:12 +00:00
Clement Courbet 36a3480385 Re-land r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.
Update PPC ir following GEP->bitcat to bitcat->GEP->bitcat change.

llvm-svn: 349747
2018-12-20 13:01:04 +00:00
Piotr Sobczak deaacc17fe [InstCombine][AMDGPU] Handle more buffer intrinsics
Summary:
Include the following intrinsics in the InsctCombine
simplification:

* amdgcn_raw_buffer_load
* amdgcn_raw_buffer_load_format
* amdgcn_struct_buffer_load
* amdgcn_struct_buffer_load_format

Change-Id: I14deceff74bcb21179baf6aa6e94bf39e7d63d5d

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D55882

llvm-svn: 349735
2018-12-20 10:08:18 +00:00
Clement Courbet e22cf4d7cb Revert r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads."
Forgot to update PowerPC tests for the GEP->bitcast change.

llvm-svn: 349733
2018-12-20 09:58:33 +00:00
Clement Courbet 1bb6e1b0f2 [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.
Summary:
This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp
in just two loads on X86. These were previously calling memcmp.

Reviewers: spatel, gchatelet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55263

llvm-svn: 349731
2018-12-20 09:13:47 +00:00
Michael Kruse 978ba61536 Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. LoopID unfortunately is not loop identifier. It is
neither unique (there's even a regression test assigning the some LoopID
to multiple loops; can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it just to mark that a loop should not be processed again as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.

This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on instruction is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding the problem to ever need to add/remove
operands), called "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses with all the access groups that are parallel
(no dependencies carries by this loop).

This intentionally avoid any kind of "ID". Loops that are clones/have
their attributes modifies retain the llvm.loop.parallel_accesses
attribute. Access instructions that a cloned point to the same access
group. It is not necessary for each access to have it's own "ID" MDNode,
but those memory access instructions with the same behavior can be
grouped together.

The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.

Differential Revision: https://reviews.llvm.org/D52116

llvm-svn: 349725
2018-12-20 04:58:07 +00:00
Eli Friedman a69084ffa8 [CodeGenPrepare] Fix bad IR created by large offset GEP splitting.
Creating the IR builder, then modifying the CFG, leads to an IRBuilder
where the BB and insertion point are inconsistent, so new instructions
have the wrong parent.

Modified an existing test because the test wasn't covering anything
useful (the "invoke" was not actually an invoke by the time we hit the
code in question).

Differential Revision: https://reviews.llvm.org/D55729

llvm-svn: 349693
2018-12-19 22:52:04 +00:00
Nikita Popov 3817ee7908 Revert "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This reverts commit r349674. It causes a failure in
test-suite enc-3des.execution_time.

llvm-svn: 349684
2018-12-19 22:09:02 +00:00
Nikita Popov 649e125451 [BDCE][DemandedBits] Detect dead uses of undead instructions
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.

The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.

The test case has a couple of cases that are not simplified yet. In
particular, we're only looking at uses of instructions right now. I think
it would make sense to also extend this to arguments. Furthermore
DemandedBits doesn't yet know some of the tricks that InstCombine does
for the demanded bits or bitwise or/and/xor in combination with known
bits information.

Differential Revision: https://reviews.llvm.org/D55563

llvm-svn: 349674
2018-12-19 19:56:21 +00:00
Simon Pilgrim 420d1e12b6 Regenerate test
llvm-svn: 349646
2018-12-19 17:24:34 +00:00
Sanjay Patel bb3d3a8f2e [InstCombine] add tests for extract of vector load; NFC
There's a mismatch internally about how we are handling these patterns. 
We count loads as cheapToScalarize(), but then we don't actually 
scalarize them, so that can leave extra instructions compared to where 
we started when scalarizing other ops. If it's cheapToScalarize, then 
we should be scalarizing.

llvm-svn: 349560
2018-12-18 22:51:06 +00:00
Pete Cooper a3e0be109c Preserve the linkage for objc* intrinsics as clang will set them to weak_external in some cases
Clang uses weak linkage for objc runtime functions when they are not available on the platform.

The intrinsic has this linkage so we just need to pass that on to the runtime call.

llvm-svn: 349559
2018-12-18 22:42:08 +00:00
Pete Cooper d0ffdf8782 Add nonlazybind to objc_retain/objc_release when converting from intrinsics.
For performance reasons, clang set nonlazybind on these functions.  Now that we
are using intrinsics instead of runtime calls, we should set this attribute when
creating the runtime functions.

llvm-svn: 349558
2018-12-18 22:31:34 +00:00
Florian Hahn 485f2826ba [LAA] Introduce enum for vectorization safety status (NFC).
This patch adds a VectorizationSafetyStatus enum, which will be extended
in a follow up patch to distinguish between 'safe with runtime checks'
and 'known unsafe' dependences.

Reviewers: anemet, anna, Ayal, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D54892

llvm-svn: 349556
2018-12-18 22:25:11 +00:00
Sanjay Patel 608d128c42 [LoopVectorize] auto-generate complete checks; NFC
The first test claims to show that the vectorizer will
generate a vector load/loop, but then this file runs
other passes which might scalarize that op. I'm removing 
instcombine from the RUN line here to break that dependency.
Also, I'm generating full checks to make it clear exactly 
what the vectorizer has done.

llvm-svn: 349554
2018-12-18 22:23:04 +00:00