Commit Graph

4297 Commits

Author SHA1 Message Date
Matt Arsenault fed3dc8dc6 Revert "Revert r206045, "Fix shift by constants for vector.""
Fix cases where the Value itself is used, and not the constant value.

llvm-svn: 206214
2014-04-14 21:50:37 +00:00
NAKAMURA Takumi 58ad0c87f8 Whitespace.
llvm-svn: 206154
2014-04-14 07:03:13 +00:00
NAKAMURA Takumi 26afa982ec Revert r206045, "Fix shift by constants for vector."
It broke some builders, at least, i686.

llvm-svn: 206153
2014-04-14 07:02:57 +00:00
Hal Finkel 0192cbac66 [PowerPC] [Constant Hoisting] Enable constant hoisting on PPC
Implements the various TTI functions to enable constant hoisting on PPC. The
only significant test-suite change is this:

MultiSource/Benchmarks/VersaBench/bmm/bmm - 20% speedup
(which essentially reverses the slowdown from r206120).

llvm-svn: 206141
2014-04-13 23:02:40 +00:00
Serge Pavlov 4bb54d51c8 Recognize test for overflow in integer multiplication.
If multiplication involves zero-extended arguments and the result is
compared as in the patterns:

    %mul32 = trunc i64 %mul64 to i32
    %zext = zext i32 %mul32 to i64
    %overflow = icmp ne i64 %mul64, %zext
or
    %overflow = icmp ugt i64 %mul64 , 0xffffffff

then the multiplication may be replaced by call to umul.with.overflow.
This change fixes PR4917 and PR4918.

Differential Revision: http://llvm-reviews.chandlerc.com/D2814

llvm-svn: 206137
2014-04-13 18:23:41 +00:00
Juergen Ributzka cf03068d91 [ARM64] Never hoist the shift value of a shift instruction.
There is no need to check if we want to hoist the immediate value of an
shift instruction. Simply return TCC_Free right away.

llvm-svn: 206101
2014-04-12 02:53:51 +00:00
Juergen Ributzka 6e17aa45a3 [ARM64] Fix the cost model for cheap large constants.
Originally the cost model would give up for large constants and just return the
maximum cost. This is not what we want for constant hoisting, because some of
these constants are large in bitwidth, but are still cheap to materialize.

This commit fixes the cost model to either return TCC_Free if the cost cannot be
determined, or accurately calculate the cost even for large constants
(bitwidth > 128).

This fixes <rdar://problem/16591573>.

llvm-svn: 206100
2014-04-12 02:36:28 +00:00
Hal Finkel c3998306f4 Add the ability to use GEPs for address sinking in CGP
The current memory-instruction optimization logic in CGP, which sinks parts of
the address computation that can be adsorbed by the addressing mode, does this
by explicitly converting the relevant part of the address computation into
IR-level integer operations (making use of ptrtoint and inttoptr). For most
targets this is currently not a problem, but for targets wishing to make use of
IR-level aliasing analysis during CodeGen, the use of ptrtoint/inttoptr is a
problem for two reasons:
  1. BasicAA becomes less powerful in the face of the ptrtoint/inttoptr
  2. In cases where type-punning was used, and BasicAA was used
     to override TBAA, BasicAA may no longer do so. (this had forced us to disable
     all use of TBAA in CodeGen; something which we can now enable again)

This (use of GEPs instead of ptrtoint/inttoptr) is not currently enabled by
default (except for those targets that use AA during CodeGen), and so aside
from some PowerPC subtargets and SystemZ, there should be no change in
behavior. We may be able to switch completely away from the ptrtoint/inttoptr
sinking on all targets, but further testing is required.

I've doubled-up on a number of existing tests that are sensitive to the
address sinking behavior (including some store-merging tests that are
sensitive to the order of the resulting ADD operations at the SDAG level).

llvm-svn: 206092
2014-04-12 00:59:48 +00:00
Matt Arsenault 173a1e577c Fix shift by constants for vector.
ashr <N x iM>, <N x iM> M -> undef

llvm-svn: 206045
2014-04-11 17:57:53 +00:00
Arnold Schwaighofer b373e01d87 Reapply "SLPVectorizer: Ignore users that are insertelements we can reschedule them"
This commit reapplies 205018. After 205855 we should correctly vectorize
intrinsics.

llvm-svn: 205965
2014-04-10 13:41:35 +00:00
Juergen Ributzka 48c8c07d0a [ARM64] Fix immediate cost calculation for types larger than i64.
The immediate cost calculation code was hitting an assertion in the included
test case, because APInt was still internally 128-bits. Truncating it to 64-bits
fixed the issue.

Fixes <rdar://problem/16572521>.

llvm-svn: 205947
2014-04-10 01:36:59 +00:00
Arnold Schwaighofer fd0bf5d6e5 SLPVectorizer: Only vectorize intrinsics whose operands are widened equally
The vectorizer only knows how to vectorize intrinics by widening all operands by
the same factor.

Patch by Tyler Nowicki!

llvm-svn: 205855
2014-04-09 14:20:47 +00:00
Juergen Ributzka c11e8b67bb [Constant Hoisting][ARM64] Enable constant hoisting for ARM64.
This implements the target-hooks for ARM64 to enable constant hoisting.

This fixes <rdar://problem/14774662> and <rdar://problem/16381500>.

llvm-svn: 205791
2014-04-08 20:39:59 +00:00
Eric Christopher beb2cd6b7c Handle vlas during inline cost computation if they'll be turned
into a constant size alloca by inlining.

Ran a run over the testsuite, no results out of the noise, fixes
the testcase in the PR.

PR19115.

llvm-svn: 205710
2014-04-07 13:36:21 +00:00
Juergen Ributzka 9dff139025 Update the test to use FileCheck.
llvm-svn: 205647
2014-04-04 19:57:01 +00:00
Saleem Abdulrasool 905b6d192c ARM: yet another round of ARM test clean ups
llvm-svn: 205586
2014-04-03 23:47:24 +00:00
Eli Bendersky 9966b26dac Fix PR19270 - type mismatch caused by invalid optimization.
Patch by Jingyue Wu.

llvm-svn: 205547
2014-04-03 17:51:58 +00:00
Juergen Ributzka b0abeb0984 Add test case for [Constant Hoisting] Erase dead cast instructions (r204538).
llvm-svn: 205484
2014-04-02 23:06:22 +00:00
Adrian Prantl fd43f6ba3b typo
llvm-svn: 205473
2014-04-02 22:17:30 +00:00
Juergen Ributzka 27435b3b8a Add comments and test case for [X86TTI] Make constant base pointers for GetElementPtr opaque (r204739).
llvm-svn: 205468
2014-04-02 21:45:36 +00:00
Juergen Ributzka ab6f44efaf Add test case for [Stackmaps][X86TTI] Fix think-o in getIntImmCost calculation (r204738).
llvm-svn: 205464
2014-04-02 21:15:36 +00:00
Tim Northover 670df3d937 SLPVectorizer: compare entire intrinsic for SLP compatibility.
Some Intrinsics are overloaded to the extent that return type equality (all
that's been checked up to now) does not guarantee that the arguments are the
same. In these cases SLP vectorizer should not recurse into the operands, which
can be achieved by comparing them as "Function *" rather than simply the ID.

llvm-svn: 205424
2014-04-02 14:39:02 +00:00
Hal Finkel b0ebdc0f43 [LoopVectorizer] Count dependencies of consecutive pointers as uniforms
For the purpose of calculating the cost of the loop at various vectorization
factors, we need to count dependencies of consecutive pointers as uniforms
(which means that the VF = 1 cost is used for all overall VF values).

For example, the TSVC benchmark function s173 has:
  ...
  %3 = add nsw i64 %indvars.iv, 16000
  %arrayidx8 = getelementptr inbounds %struct.GlobalData* @global_data, i64 0, i32 0, i64 %3
  ...
and we must realize that the add will be a scalar in order to correctly deduce
it to be profitable to vectorize this on PowerPC with VSX enabled. In fact, all
dependencies of a consecutive pointer must be a scalar (uniform), and so we
simply need to add all consecutive pointers to the worklist that currently
detects collects uniforms.

Fixes PR19296.

llvm-svn: 205387
2014-04-02 02:34:49 +00:00
Hal Finkel 2eed29f3c8 Implement X86TTI::getUnrollingPreferences
This provides an initial implementation of getUnrollingPreferences for x86.
getUnrollingPreferences is used by the generic (concatenation) unroller, which
is distinct from the unrolling done by the loop vectorizer. Many modern x86
cores have some kind of uop cache and loop-stream detector (LSD) used to
efficiently dispatch small loops, and taking full advantage of this requires
unrolling small loops (small here means 10s of uops).

These caches also have limits on the number of taken branches in the loop, and
so we also cap the loop unrolling factor based on the maximum "depth" of the
loop. This is currently calculated with a partial DFS traversal (partial
because it will stop early if the path length grows too much). This is still an
approximation, and one that is both conservative (because it does not account
for branches eliminated via block placement) and optimistic (because it is only
recording the maximum depth over minimum paths). Nevertheless, because the
loops that fit in these uop caches are so small, it is not clear how much the
details matter.

The original set of patches posted for review produced the following test-suite
performance results (from the TSVC benchmark) at that time:
  ControlLoops-dbl - 13% speedup
  ControlLoops-flt - 15% speedup
  Reductions-dbl - 7.5% speedup

llvm-svn: 205348
2014-04-01 18:50:34 +00:00
Hal Finkel 86b3064f2b Move partial/runtime unrolling late in the pipeline
The generic (concatenation) loop unroller is currently placed early in the
standard optimization pipeline. This is a good place to perform full unrolling,
but not the right place to perform partial/runtime unrolling. However, most
targets don't enable partial/runtime unrolling, so this never mattered.

However, even some x86 cores benefit from partial/runtime unrolling of very
small loops, and follow-up commits will enable this. First, we need to move
partial/runtime unrolling late in the optimization pipeline (importantly, this
is after SLP and loop vectorization, as vectorization can drastically change
the size of a loop), while keeping the full unrolling where it is now. This
change does just that.

llvm-svn: 205264
2014-03-31 23:23:51 +00:00
Arnold Schwaighofer 15262e6703 Revert "SLPVectorizer: Ignore users that are insertelements we can reschedule them"
This reverts commit r205018.

Conflicts:
	lib/Transforms/Vectorize/SLPVectorizer.cpp
	test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll

This is breaking libclc build.

llvm-svn: 205260
2014-03-31 23:05:56 +00:00
Adam Nemet 10c4ce2584 [X86] Adjust cost of FP_TO_UINT v4f64->v4i32 as well
Pretty obvious follow-on to r205159 to also handle conversion from double
besides float.

Fixes <rdar://problem/16373208>

llvm-svn: 205253
2014-03-31 21:54:48 +00:00
Adam Nemet 6dafe97271 [X86] Adjust cost of FP_TO_UINT v8f32->v8i32
There is no direct AVX instruction to convert to unsigned.  I have some ideas
how we may be able to do this with three vector instructions but the current
backend just bails on this to get it scalarized.

See the comment why we need to adjust the cost returned by BasicTTI.

The test is a bit roundabout (and checks assembly rather than bit code) because
I'd like it to work even if at some point we could vectorize this conversion.

Fixes <rdar://problem/16371920>

llvm-svn: 205159
2014-03-30 18:07:13 +00:00
NAKAMURA Takumi 4cf1a3be82 llvm/test/Transforms/LoopStrengthReduce/ARM64/lsr-*.ll: Add explicit triple arm64-unknown for targeting pecoff.
llvm-svn: 205125
2014-03-30 05:01:04 +00:00
Tim Northover 00ed9964c6 ARM64: initial backend import
This adds a second implementation of the AArch64 architecture to LLVM,
accessible in parallel via the "arm64" triple. The plan over the
coming weeks & months is to merge the two into a single backend,
during which time thorough code review should naturally occur.

Everything will be easier with the target in-tree though, hence this
commit.

llvm-svn: 205090
2014-03-29 10:18:08 +00:00
Arnold Schwaighofer c9d58e8d32 SLPVectorizer: Take credit for free extractelement instructions
Extract element instructions that will be removed when vectorzing lower the
cost.

Patch by Arch D. Robison!

llvm-svn: 205020
2014-03-28 17:21:32 +00:00
Arnold Schwaighofer b190cb30c3 SLPVectorizer: Ignore users that are insertelements we can reschedule them
Patch by Arch D. Robison!

llvm-svn: 205018
2014-03-28 17:21:22 +00:00
Erik Verbruggen 5e1bac3a38 Revert "InstCombine: merge constants in both operands of icmp."
This reverts commit r204912, and follow-up commit r204948.

This introduced a performance regression, and the fix is not completely
clear yet.

llvm-svn: 205010
2014-03-28 14:50:57 +00:00
Erik Verbruggen 2074ebd8af Revert "GVN: merge overflow intrinsics with non-overflow instructions."
This reverts commit r203553, and follow-up commits r203558 and r203574.

I will follow this up on the mailinglist to do it in a way that won't
cause subtle PRE bugs.

llvm-svn: 205009
2014-03-28 14:42:34 +00:00
Reid Kleckner 3bdf9bc48b InstCombine: Don't combine constants on unsigned icmps
Fixes a miscompile introduced in r204912.  It would miscompile code like
(unsigned)(a + -49) <= 5U.  The transform would turn this into
(unsigned)a < 55U, which would return true for values in [0, 49], when
it should not.

llvm-svn: 204948
2014-03-27 17:49:27 +00:00
Rafael Espindola 24a669d225 Prevent alias from pointing to weak aliases.
This adds back r204781.

Original message:

Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given

define void @my_func() {
  ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias

We produce without this patch:

        .weak   my_alias
my_alias = my_func
        .globl  my_alias2
my_alias2 = my_alias

That is, in the resulting ELF file my_alias, my_func and my_alias are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a

@my_alias = alias void ()* @other_func

would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.

There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.

llvm-svn: 204934
2014-03-27 15:26:56 +00:00
Erik Verbruggen 59a1219846 InstCombine: merge constants in both operands of icmp.
Transform:
    icmp X+Cst2, Cst
into:
    icmp X, Cst-Cst2
when Cst-Cst2 does not overflow, and the add has nsw.

llvm-svn: 204912
2014-03-27 11:16:05 +00:00
Quentin Colombet 3914bf516b [X86][Vectorizer Cost Model] Correct vectorization cost model for v2i64->v2f64
and v4i64->v4f64.

The new costs match what we did for SSE2 and reflect the reality of our codegen.

<rdar://problem/16381225>

llvm-svn: 204884
2014-03-27 00:52:16 +00:00
Jim Grosbach 6373e70f81 add 'requires asserts' to test that needs it
llvm-svn: 204882
2014-03-27 00:20:42 +00:00
Jim Grosbach 72fbde84b8 X86: Correct vectorization cost model for v8f32->v8i8.
Fix the cost model to reflect the reality of our codegen.

rdar://16370633

llvm-svn: 204880
2014-03-27 00:04:11 +00:00
Nick Lewycky 77d5fb40c8 Treat lifetime.start'd memory like we treat freshly alloca'd memory. Patch by Björn Steinbrink!
llvm-svn: 204876
2014-03-26 23:45:15 +00:00
Rafael Espindola 65481d7b97 Revert "Prevent alias from pointing to weak aliases."
This reverts commit r204781.

I will follow up to with msan folks to see what is what they
were trying to do with aliases to weak aliases.

llvm-svn: 204784
2014-03-26 06:14:40 +00:00
Rafael Espindola 3b712a84a9 Prevent alias from pointing to weak aliases.
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given

define void @my_func() {
  ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias

We produce without this patch:

        .weak   my_alias
my_alias = my_func
        .globl  my_alias2
my_alias2 = my_alias

That is, in the resulting ELF file my_alias, my_func and my_alias are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a

@my_alias = alias void ()* @other_func

would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.

There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.

llvm-svn: 204781
2014-03-26 04:48:47 +00:00
Richard Osborne 0af4aa9a19 [InstCombine] Don't fold bitcast into store if it would need addrspacecast
Summary:
Previously the code didn't check if the before and after types for the
store were pointers to different address spaces. This resulted in
instcombine using a bitcast to convert between pointers to different
address spaces, causing an assertion due to the invalid cast.

It is not be appropriate to use addrspacecast this case because it is
not guaranteed to be a no-op cast. Instead bail out and do not do the
transformation.

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D3117

llvm-svn: 204733
2014-03-25 17:21:41 +00:00
Karthik Bhat 195e9dd91b Allow constant folding of ceil function whenever feasible
llvm-svn: 204583
2014-03-24 04:36:06 +00:00
Lang Hames 459b5dc39e Revert r204076 for now - it caused significant regressions in a number of
benchmarks.

<rdar://problem/16368461>

llvm-svn: 204558
2014-03-23 04:22:31 +00:00
Juergen Ributzka e802d507b0 [Constant Hoisting] Fix multiple entries for the same basic block in PHI nodes.
A PHI node usually has only one value/basic block pair per incoming basic block.
In the case of a switch statement it is possible that a following PHI node may
have more than one such pair per incoming basic block. E.g.:
%0 = phi i64 [ 123456, %case2 ], [ 654321, %Entry ], [ 654321, %Entry ]
This is valid and the verfier doesn't complain, because both values are the
same.

Constant hoisting materializes the constant for each operand separately and the
value is still the same, but the variable names have changed. As a result the
verfier can't recognize anymore that they are the same value and complains.

This fix adds special update code for PHI node in constant hoisting to prevent
this corner case.

This fixes <rdar://problem/16394449>

llvm-svn: 204537
2014-03-22 01:49:27 +00:00
Tom Stellard edfd81d965 Sink: Don't sink static allocas from the entry block
CodeGen treats allocas outside the entry block as dynamically sized
stack objects.

llvm-svn: 204473
2014-03-21 15:51:51 +00:00
Juergen Ributzka f0dff49ad0 [Constant Hoisting] Make the constant materialization cost operand dependent
Extend the target hook to take also the operand index into account when
calculating the cost of the constant materialization.

Related to <rdar://problem/16381500>

llvm-svn: 204435
2014-03-21 06:04:45 +00:00
Juergen Ributzka 5429c06b90 [Constant Hoisting] Change the algorithm to only track constants for instructions.
Originally the algorithm would search for expensive constants and track their
users, which could be instructions and constant expressions. This change only
tracks the constants for instructions, but constant expressions are indirectly
covered too. If an operand is an constant expression, then we look through the
expression to find anny expensive constants.

The algorithm keep now track of the instruction and the operand index where the
constant is used. This allows more precise hoisting of constant materialization
code for PHI instructions, because we only hoist to the basic block of the
incoming operand. Before we had to find the idom of all PHI operands and hoist
the materialization code there.

This also makes updating of instructions easier. Before we had to keep track of
the original constant, find it in the instructions, and then replace it. Now we
can just simply update the operand.

Related to <rdar://problem/16381500>

llvm-svn: 204433
2014-03-21 06:04:36 +00:00