Commit Graph

27882 Commits

Author SHA1 Message Date
Frederic Riss 7c50047684 [dwarfdump] Handle relocations in Dwarf accelerator tables
ELF targets (and maybe COFF) use relocations when referring
to strings in the .debug_str section. Handle that in the
accelerator table dumper. This commit restores the
test/DebugInfo/cross-cu-inlining.ll test to its expected
platform independant form, validating that the fix works
(this test failed on linux boxes).

llvm-svn: 222029
2014-11-14 19:30:08 +00:00
Matt Arsenault 72858935f7 R600/SI: Fix verifier error from a branch on IMPLICIT_DEF
SIILowerI1Copies wasn't correctly handling this case.

llvm-svn: 222020
2014-11-14 18:43:41 +00:00
Matt Arsenault d28a7fde32 R600/SI: Match integer min / max instructions
llvm-svn: 222015
2014-11-14 18:30:06 +00:00
Matt Arsenault 94812216ef R600/SI: Use S_BFE_I64 for 64-bit sext_inreg
llvm-svn: 222012
2014-11-14 18:18:16 +00:00
Chad Rosier df8f2a23cb [Reassociate] Canonicalize the operands of all binary operators.
llvm-svn: 222008
2014-11-14 17:09:19 +00:00
Frederic Riss 39e1cda45b Tentatively appease the bots.
If this workaround gets the bots green, then we have to find out
why the -dwarf-accel-tables=Enable option doesn't work as
expected on non-darwin platforms.

llvm-svn: 222007
2014-11-14 17:08:18 +00:00
Chad Rosier d99df68e19 [Reassociate] Canonicalize operands of vector binary operators.
Prior to this commit fmul and fadd binary operators were being canonicalized for
both scalar and vector versions.  We now canonicalize add, mul, and, or, and xor
vector instructions.

llvm-svn: 222006
2014-11-14 17:08:15 +00:00
Chad Rosier f8b55f1bc5 [Reassociate] Canonicalize constants to RHS operand.
llvm-svn: 222005
2014-11-14 17:05:59 +00:00
Frederic Riss e837ec29c3 Reapply "[dwarfdump] Add support for dumping accelerator tables."
This reverts commit r221842 which was a revert of r221836 and of the
test parts of r221837.

This new version fixes an UB bug pointed out by David (along with
addressing some other review comments), makes some dumping more
resilient to broken input data and forces the accelerator tables
to be dumped in the tests where we use them (this decision is
platform specific otherwise).

llvm-svn: 222003
2014-11-14 16:15:53 +00:00
Cameron McInally 04400449c5 [AVX512] Add 512b masked integer shift by immediate patterns.
llvm-svn: 222002
2014-11-14 15:43:00 +00:00
Tom Stellard 9d7ddd516e R600/SI: Start implementing an assembler
This was done using the Sparc and PowerPC AsmParsers as guides.  So far it
is very simple and only supports sopp instructions.

llvm-svn: 221994
2014-11-14 14:08:00 +00:00
Bill Schmidt 7674692961 [PowerPC] Add VSX builtins for vec_div
This patch adds builtin support for xvdivdp and xvdivsp, along with a
test case.  Straightforward stuff.

There's a companion patch for Clang.

llvm-svn: 221983
2014-11-14 12:10:40 +00:00
Tim Northover d3be12a6c7 X86: use getConstant rather than getTargetConstant behind BUILD_VECTOR.
getTargetConstant should only be used when you can guarantee the instruction
selected will be able to cope with the raw value. BUILD_VECTOR is rather too
generic for this so we should use getConstant instead. In that case, an
instruction can still consume the constant, but if it doesn't it'll be
materialised through its own round of ISel.

Should fix PR21352.

llvm-svn: 221961
2014-11-14 01:30:14 +00:00
Reid Kleckner 283bc2ed28 Allow the use of functions as typeinfo in landingpad clauses
This is one step towards supporting SEH filter functions in LLVM.

llvm-svn: 221954
2014-11-14 00:35:50 +00:00
Reed Kotler d5c4196cb6 First stage of call lowering for Mips fast-isel
Summary:
This has most of what is needed for mips fast-isel call lowering for O32.
What is missing I will add on the next patch because this patch is already too large.
It should not be doing anything wrong but it will punt on some cases that it is basically
capable of doing.

The mechanism is there for parameters to be passed on the stack but I have not enabled it because it serves as a way for now to prevent some of the strange cases of O32 register passing that I have not fully checked yet and have some issues.

The Mips O32 abi rules are very complicated as far how data is passed in floating and integer registers.

However there is a way to think about this all very simply and this implementation reflects that.

Basically, the ABI rules are written as if everything is passed on the stack and aligned as such.
Once that is conceptually done, it is nearly trivial to reassign those locations to registers and
then all the complexity disappears.

So I have told tablegen that all the data is passed on the stack and during the lowering I fix
this by assigning to registers as per the ABI doc.

This has been my approach and you can line up what I did with the ABI document and see 1 to 1 what
is going on.



Test Plan: callabi.ll

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: jholewinski, echristo, ahatanak, llvm-commits, rfuhler

Differential Revision: http://reviews.llvm.org/D5714

llvm-svn: 221948
2014-11-13 23:37:45 +00:00
Reid Kleckner 9aeb04793a Fix symbol resolution of floating point libc builtins in MCJIT
Fix for LLI failure on Windows\X86: http://llvm.org/PR5053

LLI.exe crashes on Windows\X86 when single precession floating point
intrinsics like the following are used: acos, asin, atan, atan2, ceil,
copysign, cos, cosh, exp, floor, fmin, fmax, fmod, log, pow, sin, sinh,
sqrt, tan, tanh

The above intrinsics are defined as inline-expansions in math.h, and are
not exported by msvcr120.dll (Win32 API GetProcAddress returns null).

For an FREM instruction, the JIT compiler generates a call to a stub for
the fmodf() intrinsic, and adds a relocation to fixup at load time. The
loader searches the libraries for the function, but fails because the
symbol is not exported. So, the call target remains NULL and the
execution crashes.

Since the math functions are loaded at JIT/runtime, the JIT can patch
CALL instruction directly instead of the searching the libraries'
exported symbols.  However, this fix caused build failures due to
unresolved symbols like _fmodf at link time.

Therefore, the current fix defines helper functions in the Runtime
link/load library to perform the above operations.  The address of these
helper functions are used to patch up the CALL instruction at load time.

Reviewers: lhames, rnk

Reviewed By: rnk

Differential Revision: http://reviews.llvm.org/D5387

Patch by Swaroop Sridhar!

llvm-svn: 221947
2014-11-13 23:32:52 +00:00
Reid Kleckner 29d880bdc7 Relax the gcov version.ll test to check '.' instead of '\*'
The escaping of the '\*' doesn't work with my combination of testing
tools.

llvm-svn: 221944
2014-11-13 23:07:55 +00:00
Matt Arsenault da59f3de45 R600/SI: Fix fmin_legacy / fmax_legacy matching for SI
select_cc is expanded on SI, so this was never matched.

llvm-svn: 221941
2014-11-13 23:03:09 +00:00
Chad Rosier 8716b58583 Revert "[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE."
This reverts commit r221924.  It appears the commit was a bit premature and is causing
bot failures that need further investigation.

llvm-svn: 221939
2014-11-13 22:54:59 +00:00
Chandler Carruth 99b261ce6d [x86] Add some tests for specific patterns of lane-flips combined with
in-lane shuffles that aren't always handled well by the current vector
shuffle lowering.

No functionality change yet, that will follow in a subsequent commit.

llvm-svn: 221938
2014-11-13 22:49:44 +00:00
Chad Rosier dd526665fc [GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE.
Phabricator Revision: http://reviews.llvm.org/D6103
Patch by "Balaram Makam" <bmakam@codeaurora.org>!

llvm-svn: 221924
2014-11-13 21:17:58 +00:00
Juergen Ributzka 0af310d052 [FastISel][AArch64] Don't bail during simple GEP instruction selection.
The generic FastISel code would bail, because it can't emit a sign-extend for
AArch64. This copies the code over and uses AArch64 specific emit functions.

This is not ideal and 'computeAddress' should handles this, so it can fold the
address computation into the memory operation.

I plan to clean up 'computeAddress' anyways, so I will add that in a future
commit.

Related to rdar://problem/18962471.

llvm-svn: 221923
2014-11-13 20:50:44 +00:00
Matt Arsenault 7784992999 R600/SI: Use s_movk_i32
llvm-svn: 221922
2014-11-13 20:44:23 +00:00
Matt Arsenault 6ef66144f3 R600: Fix assert on empty function
If a function is just an unreachable, this would hit a
"this is not a MachO target" assertion because of setting
HasSubsectionViaSymbols.

llvm-svn: 221920
2014-11-13 20:07:40 +00:00
Matt Arsenault cc8d3b8774 R600: Error on initializer for LDS.
Also give a proper error for other address spaces.

llvm-svn: 221917
2014-11-13 19:56:13 +00:00
Matt Arsenault 1cffa4c191 R600/SI: Get rid of FCLAMP_SI pseudo
It's not necessary. Also use complex patterns to allow
src modifier usage.

llvm-svn: 221916
2014-11-13 19:49:04 +00:00
Matt Arsenault 581a7a6933 R600/SI: Allow commuting with src2_modifiers
llvm-svn: 221911
2014-11-13 19:26:50 +00:00
Matt Arsenault 95e48668b6 R600/SI: Allow commuting some 3 op instructions
e.g. v_mad_f32 a, b, c -> v_mad_f32 b, a, c

This simplifies matching v_madmk_f32.

This looks somewhat surprising, but it appears to be
OK to do this. We can commute src0 and src1 in all
of these instructions, and that's all that appears
to matter.

llvm-svn: 221910
2014-11-13 19:26:47 +00:00
Tim Northover 631cc9ce1a ARM: allow constpool entry to be moved to the user's block in all cases.
Normally entries can only move to a lower address, but when that wasn't viable,
the user's block was considered anyway. Unfortunately, it went via
createNewWater which wasn't designed to handle the case where there's already
an island after the block.

Unfortunately, the test we have is slow and fragile, and I couldn't reduce it
to anything sane even with the @llvm.arm.space intrinsic. The test change here
is recreating the previous one after the change.

rdar://problem/18545506

llvm-svn: 221905
2014-11-13 17:58:53 +00:00
Tim Northover ab85dcc7b8 ARM: avoid duplicating branches during constant islands.
We were using a naive heuristic to determine whether a basic block already had
an unconditional branch at the end. This mostly corresponded to reality
(assuming branches got optimised) because there's not much point in a branch to
the next block, but could go wrong.

llvm-svn: 221904
2014-11-13 17:58:51 +00:00
Tim Northover 650b0ee53b ARM: add @llvm.arm.space intrinsic for testing ConstantIslands.
Creating tests for the ConstantIslands pass is very difficult, since it depends
on precise layout details. Having the ability to precisely inject a number of
bytes into the stream helps greatly.

llvm-svn: 221903
2014-11-13 17:58:48 +00:00
Elena Demikhovsky d5e95b57e0 AVX-512: SINT_TO_FP cost model and some bugfixes
Checked some corner cases, for example translation
of <8 x i1> to <8 x double>

llvm-svn: 221883
2014-11-13 11:46:16 +00:00
Hal Finkel e353eba33b OCAMLFLAGS can contain =, don't use = with sed
Like HOST_LDFLAGS, etc. OCAMLFLAGS can contain =, so use ! as the substitution
separator instead of = (otherwise, sed might error).

llvm-svn: 221879
2014-11-13 09:29:30 +00:00
Hal Finkel 45ba2c10e4 Revert r219432 - "Revert "[BasicAA] Revert "Revert r218714 - Make better use of zext and sign information."""
Let's try this again...

This reverts r219432, plus a bug fix.

Description of the bug in r219432 (by Nick):

The bug was using AllPositive to break out of the loop; if the loop break
condition i != e is changed to i != e && AllPositive then the
test_modulo_analysis_with_global test I've added will fail as the Modulo will
be calculated incorrectly (as the last loop iteration is skipped, so Modulo
isn't updated with its Scale).

Nick also adds this comment:

ComputeSignBit is safe to use in loops as it takes into account phi nodes, and
the  == EK_ZeroEx check is safe in loops as, no matter how the variable changes
between iterations, zero-extensions will always guarantee a zero sign bit. The
isValueEqualInPotentialCycles check is therefore definitely not needed as all
the variable analysis holds no matter how the variables change between loop
iterations.

And this patch also adds another enhancement to GetLinearExpression - basically
to convert ConstantInts to Offsets (see test_const_eval and
test_const_eval_scaled for the situations this improves).

Original commit message:

This reverts r218944, which reverted r218714, plus a bug fix.

Description of the bug in r218714 (by Nick):

The original patch forgot to check if the Scale in VariableGEPIndex flipped the
sign of the variable. The BasicAA pass iterates over the instructions in the
order they appear in the function, and so BasicAliasAnalysis::aliasGEP is
called with the variable it first comes across as parameter GEP1. Adding a
%reorder label puts the definition of %a after %b so aliasGEP is called with %b
as the first parameter and %a as the second. aliasGEP later calculates that %a
== %b + 1 - %idxprom where %idxprom >= 0 (if %a was passed as the first
parameter it would calculate %b == %a - 1 + %idxprom where %idxprom >= 0) -
ignoring that %idxprom is scaled by -1 here lead the patch to incorrectly
conclude that %a > %b.

Revised patch by Nick White, thanks! Thanks to Lang to isolating the bug.
Slightly modified by me to add an early exit from the loop and avoid
unnecessary, but expensive, function calls.

Original commit message:

Two related things:

 1. Fixes a bug when calculating the offset in GetLinearExpression. The code
    previously used zext to extend the offset, so negative offsets were converted
    to large positive ones.

 2. Enhance aliasGEP to deduce that, if the difference between two GEP
    allocations is positive and all the variables that govern the offset are also
    positive (i.e. the offset is strictly after the higher base pointer), then
    locations that fit in the gap between the two base pointers are NoAlias.

Patch by Nick White!

llvm-svn: 221876
2014-11-13 09:16:54 +00:00
Chandler Carruth fee91883f4 [x86] Teach the vector shuffle lowering to make a more nuanced decision
between splitting a vector into 128-bit lanes and recombining them vs.
decomposing things into single-input shuffles and a final blend.

This handles a large number of cases in AVX1 where the cross-lane
shuffles would be much more expensive to represent even though we end up
with a fast blend at the root. Instead, we can do a better job of
shuffling in a single lane and then inserting it into the other lanes.

This fixes the remaining bits of Halide's regression captured in PR21281
for AVX1. However, the bug persists in AVX2 because I've made this
change reasonably conservative. The cases where it makes sense in AVX2
to split into 128-bit lanes are much more rare because we can often do
full permutations across all elements of the 256-bit vector. However,
the particular test case in PR21281 is an example of one of the rare
cases where it is *always* better to work in a single 128-bit lane. I'm
going to try to teach the logic to detect and form the good code even in
AVX2 next, but it will need to use a separate heuristic.

Finally, there is one pesky regression here where we previously would
craftily use vpermilps in AVX1 to shuffle both high and low halves at
the same time. We no longer pull that off, and not for any really good
reason. Ultimately, I think this is just another missing nuance to the
selection heuristic that I'll try to add in afterward, but this change
already seems strictly worth doing considering the magnitude of the
improvements in common matrix math shuffle patterns.

As always, please let me know if this causes a surprising regression for
you.

llvm-svn: 221861
2014-11-13 04:06:10 +00:00
Rui Ueyama ffa4cebe91 llvm-readobj: Print out address table when dumping COFF delay-import table
llvm-svn: 221855
2014-11-13 03:22:54 +00:00
Frederic Riss 0f7abef2cf Add an assert and a test that verify r221709's fix.
llvm-svn: 221854
2014-11-13 03:20:23 +00:00
Chandler Carruth 253dd39a9a [x86] Don't form overly fragmented blends when splitting and
re-combining shuffles because nothing was available in the wider vector
type.

The key observation (which I've put in the comments for future
maintainers) is that at this point, no further combining is really
possible. And so even though these shuffles trivially could be combined,
we need to actually do that as we produce them when producing them this
late in the lowering.

This fixes another (huge) part of the Halide vector shuffle regressions.
As it happens, this was already well covered by the tests, but I hadn't
noticed how bad some of these got. The specific patterns that turn
directly into unpckl/h patterns were occurring *many* times in common
vector processing code.

There are still more problems here sadly, but trying to incrementally
tease them apart and it looks like this is the core of the problem in
the splitting logic.

There is some chance of regression here, you can see it in the test
changes. Specifically, where we stop forming pshufb in some cases, it is
possible that pshufb was in fact faster. Intel "says" that pshufb is
slower than the instruction sequences replacing it.

llvm-svn: 221852
2014-11-13 02:42:08 +00:00
Quentin Colombet f5485bb008 [CodeGenPrepare] Handle zero extensions in the TypePromotionHelper.
Prior to this patch the TypePromotionHelper was promoting only sign extensions.
Supporting zero extensions changes:
- How constants are extended.
- How sign extensions, zero extensions, and truncate are composed together.
- How the type of the extended operation is recorded. Now we need to know the
  kind of the extension as well as its type.

Each change is fairly small, unlike the diff.
Most of the diff are comments/variable renaming to say "extension" instead of
"sign extension".

The performance improvements on the test suite are within the noise.

Related to <rdar://problem/18310086>.

llvm-svn: 221851
2014-11-13 01:44:51 +00:00
Juergen Ributzka 957a1454cc [FastISel][AArch64] Optimize select when one of the operands is a 'true' or 'false' value.
Optimize selects of i1 in the presence of 'true' and 'false' operands to simple
logic operations.

This fixes rdar://problem/18960150.

llvm-svn: 221848
2014-11-13 00:36:46 +00:00
Juergen Ributzka 424c5fd12f [FastISel][AArch64] Fold the cmp into the select when possible.
This folds the compare emission into the select emission when possible, so we
can directly use the flags and don't have to emit a separate compare.

Related to rdar://problem/18960150.

llvm-svn: 221847
2014-11-13 00:36:43 +00:00
Juergen Ributzka d1a042abd0 [FastISel][AArch64] Extend 'select' lowering to support also i1 to i16.
Related to rdar://problem/18960150.

llvm-svn: 221846
2014-11-13 00:36:38 +00:00
Frederic Riss e1f4958122 Revert "[dwarfdump] Add support for dumping accelerator tables."
This reverts commit r221836.

The tests are asserting on some buildbots. This also reverts the
test part of r221837 as it relies on dwarfdump dumping the
accelerator tables.

llvm-svn: 221842
2014-11-13 00:15:15 +00:00
Sanjoy Das c5676df3ec Teach ScalarEvolution to sharpen range information.
If x is known to have the range [a, b), in a loop predicated by (icmp
ne x, a) its range can be sharpened to [a + 1, b).  Get
ScalarEvolution and hence IndVars to exploit this fact.

This change triggers an optimization to widen-loop-comp.ll, so it had
to be edited to get it to pass.

This change was originally landed in r219834 but had a bug and broke
ASan. It was reverted in r219878, and is now being re-landed after
fixing the original bug.

phabricator: http://reviews.llvm.org/D5639
reviewed by: atrick

llvm-svn: 221839
2014-11-13 00:00:58 +00:00
Frederic Riss 3a6b354b3e Fix emission of Dwarf accelerator table when there are multiple CUs.
The DIE offset in the accel tables is an offset relative to the start
of the debug_info section, but we were encoding the offset to the
start of the containing CU.

llvm-svn: 221837
2014-11-12 23:48:14 +00:00
Frederic Riss 39467276d0 [dwarfdump] Add support for dumping accelerator tables.
The class used for the dump only allows to dump for the moment, but
it can (and will) be easily extended to support search also.

llvm-svn: 221836
2014-11-12 23:48:10 +00:00
Ahmed Bougacha 0788d49a40 [CodeGenPrepare][AArch64] Fix a TLI legality check on iPTR to use a lowered instead.
Fixes PR21548.  Related to PR20474.

llvm-svn: 221820
2014-11-12 22:16:55 +00:00
Sanjay Patel f6f7d5d1dd Expose the number of Newton-Raphson iterations applied to the hardware's reciprocal estimate as a parameter (x86).
This is a follow-on to r221706 and r221731 and discussed in more detail in PR21385.

This patch also loosens the testcase checking for btver2. We know that the "1.0" will be loaded, but
we can't tell exactly when, so replace the CHECK-NEXT specifiers with plain CHECKs. The CHECK-NEXT
sequence relied on a quirk of post-RA-scheduling that may change independently of anything in these tests.

llvm-svn: 221819
2014-11-12 21:39:01 +00:00
Timur Iskhodzhanov 0e76a16200 Temporary fix for PR21528 - use mangled C++ function names in COFF debug info to un-break ASan on Windows
llvm-svn: 221813
2014-11-12 20:21:20 +00:00
Timur Iskhodzhanov a11b32b7e5 [COFF] Make it clearer that the symbols subsection holds function display name rather than just name
llvm-svn: 221812
2014-11-12 20:10:09 +00:00
Cameron McInally 73a6bca32b [AVX512] Add integer shift by immediate intrinsics.
llvm-svn: 221811
2014-11-12 19:58:54 +00:00
Sanjay Patel 4c219fd248 CGSCC should not treat intrinsic calls like function calls (PR21403)
Make the handling of calls to intrinsics in CGSCC consistent: 
they are not treated like regular function calls because they
are never lowered to function calls.

Without this patch, we can get dangling pointer asserts from
the subsequent loop that processes callsites because it already
ignores intrinsics.

See http://llvm.org/bugs/show_bug.cgi?id=21403 for more details / discussion.

Differential Revision: http://reviews.llvm.org/D6124

llvm-svn: 221802
2014-11-12 18:25:47 +00:00
Jingyue Wu 8a12cea5f1 Disable indvar widening if arithmetics on the wider type are more expensive
Summary:
Reapply r221772. The old patch breaks the bot because the @indvar_32_bit test
was run whether NVPTX was enabled or not.

IndVarSimplify should not widen an indvar if arithmetics on the wider
indvar are more expensive than those on the narrower indvar. For
instance, although NVPTX64 treats i64 as a legal type, an ADD on i64 is
twice as expensive as that on i32, because the hardware needs to
simulate a 64-bit integer using two 32-bit integers.

Split from D6188, and based on D6195 which adds NVPTXTargetTransformInfo.

Fixes PR21148.

Test Plan:
Added @indvar_32_bit that verifies we do not widen an indvar if the arithmetics
on the wider type are more expensive. This test is run only when NVPTX is
enabled.

Reviewers: jholewinski, eliben, meheff, atrick

Reviewed By: atrick

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D6196

llvm-svn: 221799
2014-11-12 18:09:15 +00:00
Justin Hibbits 21c5353f54 Revert part of the PIC tests (TLS part)
This change actually wasn't warranted for -O0, and the new changes prove it and
break the build.

llvm-svn: 221793
2014-11-12 16:50:15 +00:00
Justin Hibbits b296c9735e Fix thet tests.
I seem to have missed the update I made for changing 'flag_pic' to "PIC Level".
Mea culpa.

llvm-svn: 221792
2014-11-12 16:40:00 +00:00
Justin Hibbits a88b605721 Add support for small-model PIC for PowerPC.
Summary:
Large-model was added first.  With the addition of support for multiple PIC
models in LLVM, now add small-model PIC for 32-bit PowerPC, SysV4 ABI.  This
generates more optimal code, for shared libraries with less than about 16380
data objects.

Test Plan: Test cases added or updated

Reviewers: joerg, hfinkel

Reviewed By: hfinkel

Subscribers: jholewinski, mcrosier, emaste, llvm-commits

Differential Revision: http://reviews.llvm.org/D5399

llvm-svn: 221791
2014-11-12 15:16:30 +00:00
Rafael Espindola 301396c911 Fix the test.
It was broken since r221708.

llvm-svn: 221783
2014-11-12 14:23:04 +00:00
Zoran Jovanovic fd888630b5 [mips][micromips] Add predicate 'InMicroMips' at CodeGen patterns for microMIPS instructions
Differential Revision: http://reviews.llvm.org/D6198

llvm-svn: 221780
2014-11-12 13:30:10 +00:00
Chandler Carruth 0c922fcec5 [x86] Start improving the matching of unpck instructions based on test
cases from Halide folks. This initial step was extracted from
a prototype change by Clay Wood to try and address regressions found
with Halide and the new vector shuffle lowering.

llvm-svn: 221779
2014-11-12 10:05:18 +00:00
Chandler Carruth ce6947d4cf [x86] Clean up a bunch of vector shuffle tests with my script. Notably,
removes windows line endings and other noise. This is in prelude to
making substantive changes to these tests.

llvm-svn: 221776
2014-11-12 09:17:15 +00:00
Elena Demikhovsky be8808dc3f AVX-512: Intrinsics for ERI
3 instructions: vrcp28, vrsqrt28, vexp2, only vector forms.
Intrinsics include SAE (Suppres All Exceptions) parameter.

http://reviews.llvm.org/D6214

llvm-svn: 221774
2014-11-12 07:31:03 +00:00
Jingyue Wu a48273390c Reverts r221772 which fails tests
llvm-svn: 221773
2014-11-12 07:19:25 +00:00
Jingyue Wu 635a9b14fa Disable indvar widening if arithmetics on the wider type are more expensive
Summary:
IndVarSimplify should not widen an indvar if arithmetics on the wider
indvar are more expensive than those on the narrower indvar. For
instance, although NVPTX64 treats i64 as a legal type, an ADD on i64 is
twice as expensive as that on i32, because the hardware needs to
simulate a 64-bit integer using two 32-bit integers.

Split from D6188, and based on D6195 which adds NVPTXTargetTransformInfo.

Fixes PR21148.

Test Plan:
Added @indvar_32_bit that verifies we do not widen an indvar if the arithmetics
on the wider type are more expensive.

Reviewers: jholewinski, eliben, meheff, atrick

Reviewed By: atrick

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D6196

llvm-svn: 221772
2014-11-12 06:58:45 +00:00
Bill Schmidt 729547847f [PowerPC] Add vec_vsx_ld and vec_vsx_st intrinsics
This patch enables the vec_vsx_ld and vec_vsx_st intrinsics for
PowerPC, which provide programmer access to the lxvd2x, lxvw4x,
stxvd2x, and stxvw4x instructions.

New LLVM intrinsics are provided to represent these four instructions
in IntrinsicsPowerPC.td.  These are patterned after the similar
intrinsics for lvx and stvx (Altivec).  In PPCInstrVSX.td, these
intrinsics are tied to the code gen patterns, with additional patterns
to allow plain vanilla loads and stores to still generate these
instructions.

At -O1 and higher the intrinsics are immediately converted to loads
and stores in InstCombineCalls.cpp.  This will open up more
optimization opportunities while still allowing the correct
instructions to be generated.  (Similar code exists for aligned
Altivec loads and stores.)

The new intrinsics are added to the code that checks for consecutive
loads and stores in PPCISelLowering.cpp, as well as to
PPCTargetLowering::getTgtMemIntrinsic().

There's a new test to verify the correct instructions are generated.
The loads and stores tend to be reordered, so the test just counts
their number.  It runs at -O2, as it's not very effective to test this
at -O0, when many unnecessary loads and stores are generated.

I ended up having to modify vsx-fma-m.ll.  It turns out this test case
is slightly unreliable, but I don't know a good way to prevent
problems with it.  The xvmaddmdp instructions read and write the same
register, which is one of the multiplicands.  Commutativity allows
either to be chosen.  If the FMAs are reordered differently than
expected by the test, the register assignment can be different as a
result.  Hopefully this doesn't change often.

There is a companion patch for Clang.

llvm-svn: 221767
2014-11-12 04:19:40 +00:00
Nick Kledzik f44dbda542 Object, support both mach-o archive t.o.c file names
For historical reasons archives on mach-o have two possible names for the 
file containing the table of contents for the archive: "__.SYMDEF SORTED" 
and "__.SYMDEF".  But the libObject archive reader only supported the former.

This patch fixes llvm::object::Archive to support both names.

llvm-svn: 221747
2014-11-12 01:37:45 +00:00
Chad Rosier f53f07046b [Reassociate] Canonicalize negative constants out of expressions.
Add support for FDiv, which was regressed by the previous commit.

llvm-svn: 221738
2014-11-11 23:36:42 +00:00
Philip Reames 66c6de61ee Canonicalize an assume(load != null) into !nonnull metadata
We currently have two ways of informing the optimizer that the result of a load is never null: metadata and assume. This change converts the second in to the former. This avoids a need to implement optimizations using both forms.

We should probably extend this basic idea to metadata of other forms; in particular, range metadata. We view is that assumes should be considered a "last resort" for when there isn't a more canonical way to represent something.

Reviewed by: Hal
Differential Revision: http://reviews.llvm.org/D5951

llvm-svn: 221737
2014-11-11 23:33:19 +00:00
Juergen Ributzka 89441b0dd8 [FastISel][AArch64] Add support for fabs intrinsic.
Lower the llvm.fabs intrinsic to the 'fabs' MI instruction.

This fixes rdar://problem/18946552.

llvm-svn: 221729
2014-11-11 23:10:44 +00:00
Chad Rosier 094ac7735b [Reassociate] Canonicalize negative constants out of expressions.
This is a reapplication of r221171, but we only perform the transformation
on expressions which include a multiplication.  We do not transform rem/div
operations as this doesn't appear to be safe in all cases.

llvm-svn: 221721
2014-11-11 22:58:35 +00:00
Kostya Serebryany 29a18dcbc5 Move asan-coverage into a separate phase.
Summary:
This change moves asan-coverage instrumentation
into a separate Module pass.
The other part of the change in clang introduces a new flag
-fsanitize-coverage=N.
Another small patch will update tests in compiler-rt.

With this patch no functionality change is expected except for the flag name.
The following changes will make the coverage instrumentation work with tsan/msan

Test Plan: Run regression tests, chromium.

Reviewers: nlewycky, samsonov

Reviewed By: nlewycky, samsonov

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6152

llvm-svn: 221718
2014-11-11 22:14:37 +00:00
Tom Roeder eb7a303d1b Add Forward Control-Flow Integrity.
This commit adds a new pass that can inject checks before indirect calls to
make sure that these calls target known locations. It supports three types of
checks and, at compile time, it can take the name of a custom function to call
when an indirect call check fails. The default failure function ignores the
error and continues.

This pass incidentally moves the function JumpInstrTables::transformType from
private to public and makes it static (with a new argument that specifies the
table type to use); this is so that the CFI code can transform function types
at call sites to determine which jump-instruction table to use for the check at
that site.

Also, this removes support for jumptables in ARM, pending further performance
analysis and discussion.

Review: http://reviews.llvm.org/D4167
llvm-svn: 221708
2014-11-11 21:08:02 +00:00
Colin LeMahieu eb4675fb29 [llvm-mc] Fixing case where if a file ended with non-newline whitespace or a comma it would access invalid memory.
Cleaned up parse loop.

llvm-svn: 221707
2014-11-11 21:03:09 +00:00
Sanjay Patel e2e589288f Use rcpss/rcpps (X86) to speed up reciprocal calcs (PR21385).
This is a first step for generating SSE rcp instructions for reciprocal
calcs when fast-math allows it. This is very similar to the rsqrt optimization
enabled in D5658 ( http://reviews.llvm.org/rL220570 ).

For now, be conservative and only enable this for AMD btver2 where performance
improves significantly both in terms of latency and throughput.

We may never enable this codegen for Intel Core* chips because the divider circuits
are just too fast. On SandyBridge, divss can be as fast as 10 cycles versus the 21
cycle critical path for the rcp + mul + sub + mul + add estimate.

Follow-on patches may allow configuration of the number of Newton-Raphson refinement
steps, add AVX512 support, and enable the optimization for more chips.

More background here: http://llvm.org/bugs/show_bug.cgi?id=21385

Differential Revision: http://reviews.llvm.org/D6175

llvm-svn: 221706
2014-11-11 20:51:00 +00:00
Rafael Espindola 07e694d293 Simplify testcase. NFC.
Thanks to Filipe Cabecinhas for the tip.

llvm-svn: 221705
2014-11-11 20:49:16 +00:00
Bill Schmidt 3d9674cfb1 [PowerPC] Replace foul hackery with real calls to __tls_get_addr
My original support for the general dynamic and local dynamic TLS
models contained some fairly obtuse hacks to generate calls to
__tls_get_addr when lowering a TargetGlobalAddress.  Rather than
generating real calls, special GET_TLS_ADDR nodes were used to wrap
the calls and only reveal them at assembly time.  I attempted to
provide correct parameter and return values by chaining CopyToReg and
CopyFromReg nodes onto the GET_TLS_ADDR nodes, but this was also not
fully correct.  Problems were seen with two back-to-back stores to TLS
variables, where the call sequences ended up overlapping with unhappy
results.  Additionally, since these weren't real calls, the proper
register side effects of a call were not recorded, so clobbered values
were kept live across the calls.

The proper thing to do is to lower these into calls in the first
place.  This is relatively straightforward; see the changes to
PPCTargetLowering::LowerGlobalTLSAddress() in PPCISelLowering.cpp.
The changes here are standard call lowering, except that we need to
track the fact that these calls will require a relocation.  This is
done by adding a machine operand flag of MO_TLSLD or MO_TLSGD to the
TargetGlobalAddress operand that appears earlier in the sequence.

The calls to LowerCallTo() eventually find their way to
LowerCall_64SVR4() or LowerCall_32SVR4(), which call FinishCall(),
which calls PrepareCall().  In PrepareCall(), we detect the calls to
__tls_get_addr and immediately snag the TargetGlobalTLSAddress with
the annotated relocation information.  This becomes an extra operand
on the call following the callee, which is expected for nodes of type
tlscall.  We change the call opcode to CALL_TLS for this case.  Back
in FinishCall(), we change it again to CALL_NOP_TLS for 64-bit only,
since we require a TOC-restore nop following the call for the 64-bit
ABIs.

During selection, patterns in PPCInstrInfo.td and PPCInstr64Bit.td
convert the CALL_TLS nodes into BL_TLS nodes, and convert the
CALL_NOP_TLS nodes into BL8_NOP_TLS nodes.  This replaces the code
removed from PPCAsmPrinter.cpp, as the BL_TLS or BL8_NOP_TLS
nodes can now be emitted normally using their patterns and the
associated printTLSCall print method.

Finally, as a result of these changes, all references to get-tls-addr
in its various guises are no longer used, so they have been removed.

There are existing TLS tests to verify the changes haven't messed
anything up).  I've added one new test that verifies that the problem
with the original code has been fixed.

llvm-svn: 221703
2014-11-11 20:44:09 +00:00
Rafael Espindola a9c28b68cd Use a 8 bit immediate when possible.
This fixes pr21529.

llvm-svn: 221700
2014-11-11 19:46:36 +00:00
Dario Domizioli e904e85faf [X86][ELF] Fix PR20243 - leaf frame pointer bug with TLS access
The ISel lowering for global TLS access in PIC mode was creating a pseudo 
instruction that is later expanded to a call, but the code was not 
setting the hasCalls flag in the MachineFrameInfo alongside the adjustsStack 
flag. This caused some functions to be mistakenly recognized as leaf functions,
and this in turn affected the decision to eliminate the frame pointer.

With the fix, hasCalls is properly set and the leaf frame pointer is correctly
preserved.

llvm-svn: 221695
2014-11-11 18:44:49 +00:00
Oliver Stannard 8c2c67e63c LLVM incorrectly folds xor into select
LLVM replaces the SelectionDAG pattern (xor (set_cc cc x y) 1) with
(set_cc !cc x y), which is only correct when the xor has type i1.
Instead, we should check that the constant operand to the xor is all
ones.

llvm-svn: 221693
2014-11-11 17:36:01 +00:00
Vasileios Kalintiris b2dd15f8c7 [mips] Add preliminary support for the MIPS II target.
Summary:
This patch enables code generation for the MIPS II target. Pre-Mips32
targets don't have the MUL instruction, so we add the correspondent
pattern that uses the MULT/MFLO combination in order to retrieve the
product.

This is WIP as we don't support code generation for select nodes due to
the lack of conditional-move instructions.

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6150

llvm-svn: 221686
2014-11-11 11:43:55 +00:00
Vasileios Kalintiris 8c1c95e95c [mips] Add hardware register name "hwr_ulr" ($29)
The canonical name when printing assembly is still $29. The reason is that
GAS does not accept "$hwr_ulr" at the moment.

This addresses the comments from r221307, which reverted the original
commit r221299.

llvm-svn: 221685
2014-11-11 11:22:39 +00:00
Andrea Di Biagio 5fa2e15453 [X86] Add missing check for 'isINSERTPSMask' in method 'isShuffleMaskLegal'.
This helps the DAGCombiner to identify more opportunities to fold shuffles.

llvm-svn: 221684
2014-11-11 11:20:31 +00:00
Vasileios Kalintiris 10b5ba3f6e Recommit "[mips] Add names and tests for the hardware registers"
The original commit r221299 was reverted in r221307.  I removed the name
"hrw_ulr" ($29) from the original commit because two tests were failing.

llvm-svn: 221681
2014-11-11 10:31:31 +00:00
David Majnemer 185b5b1d24 llvm-objdump: Skip empty sections when dumping contents
Empty sections are just noise when using objdump.
This is similar to what binutils does.

llvm-svn: 221680
2014-11-11 09:58:25 +00:00
Manuel Klimek bfd0039e73 Was convinced in commit comments that requiring a specific python version is the wrong approach; reverting.
llvm-svn: 221679
2014-11-11 08:53:18 +00:00
David Majnemer 2cc4bc77bf MC, COFF: Use relocations for function references inside the section
Referencing one symbol from another in the same section does not
generally require a relocation.  However, the MS linker has a feature
called /INCREMENTAL which enables incremental links.  It achieves this
by creating thunks to the actual function and redirecting all
relocations to point to the thunk.

This breaks down with the old scheme if you have a function which
references, say, itself.  On x86_64, we would use %rip relative
addressing to reference the start of the function from out current
position.  This would lead to miscompiles because other references might
reference the thunk instead, breaking function pointer equality.

This fixes PR21520.

llvm-svn: 221678
2014-11-11 08:43:57 +00:00
Suyog Sarda beb064bd94 Addition to r216371 (SLP and Loop Vectorization) and r218607 where
cost model for signed division by power of 2 was improved for AArch64.
The revision r218607 missed test case for Loop Vectorization.
Adding it in this revision.

Differential Revision: http://reviews.llvm.org/D6181

llvm-svn: 221674
2014-11-11 07:39:27 +00:00
Michael Kuperstein 3fe15e498f [X86] Fix pattern match for 32-to-64-bit zext in the presence of AssertSext
This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits. See PR20494 for details.
Recommitting - This time, with a hopefully working test.

Differential Revision: http://reviews.llvm.org/D6128

llvm-svn: 221672
2014-11-11 07:07:40 +00:00
Rafael Espindola f7b5ba0621 Only run the gold plugin tests if gold supports the targets we test with.
This fixes pr21345.

llvm-svn: 221669
2014-11-11 05:27:12 +00:00
Quentin Colombet 360460ba64 [X86] Custom lower UINT_TO_FP from v4f32 to v4i32, and for v8f32 to v8i32 if
AVX2 is available.
According to IACA, the new lowering has a throughput of 8 cycles instead of 13
with the previous one.

Althought this lowering kicks in some SPECs benchmarks, the performance
improvement was within the noise.

Correctness testing has been done for the whole range of uint32_t with the
following program:
    uint4 v = (uint4) {0,1,2,3};
    uint32_t i;
    
    //Check correctness over entire range for uint4 -> float4 conversion
    for( i = 0; i < 1U << (32-2); i++ )
    {
        float4 t = test(v);
        float4 c = correct(v);
        
        if( 0xf != _mm_movemask_ps( t == c ))
        {
            printf( "Error @ %vx: %vf vs. %vf\n", v, c, t);
            return -1;
        }
        
        v += 4;
    }
Where "correct" is the old lowering and "test" the new one.

The patch adds a test case for the two custom lowering instruction.
It also modifies the vector cost model, which is why cast.ll and uitofp.ll are
modified.
2009-02-26-MachineLICMBug.ll is also modified because we now hoist 7
instructions instead of 4 (3 more constant loads).

rdar://problem/18153096>

llvm-svn: 221657
2014-11-11 02:23:47 +00:00
Chad Rosier a9ae3e311c [yaml2obj] Support AArch64 relocations.
Patch by Daniel Stewart <stewartd@codeaurora.org>!
Phabricator Revision: http://reviews.llvm.org/D6192

llvm-svn: 221639
2014-11-10 23:02:03 +00:00
Michael Kuperstein 217e1eec0d Reverting r221626 due to a too-strict test.
llvm-svn: 221629
2014-11-10 21:07:41 +00:00
Juergen Ributzka ea5870a530 [AArch64][FastISel] Fix kill flags for integer extends.
In the case we optimize an integer extend away and replace it directly with the
source register, we also have to clear all kill flags at all its uses.
This is necessary, because the orignal IR instruction might be trivially dead,
but we replaced it with a nop at MI level.

llvm-svn: 221628
2014-11-10 21:05:31 +00:00
Juergen Ributzka d441725d3d [SwitchLowering] Fix the "fixPhis" function.
Switch statements may have more than one incoming edge into the same BB if they
all have the same value. When the switch statement is converted these incoming
edges are now coming from multiple BBs. Updating all incoming values to be from
a single BB is incorrect and would generate invalid LLVM IR.

The fix is to only update the first occurrence of an incoming value. Switch
lowering will perform subsequent calls to this helper function for each incoming
edge with a new basic block - updating all edges in the process.

This fixes rdar://problem/18916275.

llvm-svn: 221627
2014-11-10 21:05:27 +00:00
Michael Kuperstein 3218b942f4 [X86] Fix pattern match for 32-to-64-bit zext in the presence of AssertSext
This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits.
See PR20494 for details.

Differential Revision: http://reviews.llvm.org/D6128

llvm-svn: 221626
2014-11-10 20:40:21 +00:00
Rafael Espindola 75b809c9b6 Copy externally_initialized in GlobalVariable::copyAttributesFrom.
Patch by Kevin Frei!

llvm-svn: 221620
2014-11-10 18:41:59 +00:00
Zoran Jovanovic 37bca10148 [mips][microMIPS] Fix issue with delay slot filler and microMIPS
Differential Revision: http://reviews.llvm.org/D6193

llvm-svn: 221612
2014-11-10 17:27:56 +00:00
Daniel Sanders 87f9b88bfb [mips] Fix sret arguments for N32/N64 which were accidentally broken in r221534.
llvm-svn: 221604
2014-11-10 15:57:53 +00:00
Manuel Klimek fb9e32b4b2 Mark test using python as REQUIRES: python27.
llvm-svn: 221598
2014-11-10 15:29:29 +00:00
Matt Arsenault afbf21f15c R600/SI: Fix broken check prefixes in test
llvm-svn: 221565
2014-11-08 00:02:57 +00:00
Chad Rosier b3eb452e83 [Reassociate] Better preserve NSW/NUW flags.
Part of PR12985.

Phabricator Revision: http://reviews.llvm.org/D6172

llvm-svn: 221555
2014-11-07 22:12:57 +00:00
Saleem Abdulrasool 5898e09057 Transform: add SymbolRewriter pass
This introduces the symbol rewriter. This is an IR->IR transformation that is
implemented as a CodeGenPrepare pass. This allows for the transparent
adjustment of the symbols during compilation.

It provides a clean, simple, elegant solution for symbol inter-positioning. This
technique is often used, such as in the various sanitizers and performance
analysis.

The control of this is via a custom YAML syntax map file that indicates source
to destination mapping, so as to avoid having the compiler to know the exact
details of the source to destination transformations.

llvm-svn: 221548
2014-11-07 21:32:08 +00:00
Daniel Sanders c43cda84ff [mips] Promote i32 arguments to i64 for the N32/N64 ABI and fix <64-bit structs...
Summary:
... and after all that refactoring, it's possible to distinguish softfloat
floating point values from integers so this patch no longer breaks softfloat to
do it.

Remove direct handling of i32's in the N32/N64 ABI by promoting them to
i64. This more closely reflects the ABI documentation and also fixes
problems with stack arguments on big-endian targets.

We now rely on signext/zeroext annotations (already generated by clang) and
the Assert[SZ]ext nodes to avoid the introduction of unnecessary sign/zero
extends.

It was not possible to convert three tests to use signext/zeroext. These tests
are bswap.ll, ctlz-v.ll, ctlz-v.ll. It's not possible to put signext on a
vector type so we just accept the sign extends here for now. These tests don't
pass the vectors the same way clang does (clang puts multiple elements in the
same argument, these map 1 element to 1 argument) so we don't need to worry too
much about it.

With this patch, all known N32/N64 bugs should be fixed and we now pass the
first 10,000 tests generated by ABITest.py.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6117

llvm-svn: 221534
2014-11-07 16:54:21 +00:00
NAKAMURA Takumi f507465236 llvm/test/tools/llvm-symbolizer/ppc64.test: Avoid subshell.
llvm-svn: 221526
2014-11-07 14:50:10 +00:00
Jay Foad 52695da39c llvm-symbolizer: teach it about PowerPC64 ELF function descriptors
Summary:
Teach llvm-symbolizer about PowerPC64 ELF function descriptors. Symbols in the .opd section point to function descriptors, the first word of which is a pointer to the real function. For the purposes of symbolizing we pretend that the symbol points directly to the function.

This is enough to get decent function names in stack traces for unoptimized binaries, which fixes the sanitizer print-stack-trace test on PowerPC64 Linux.

Reviewers: kcc, willschm, samsonov

Reviewed By: samsonov

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6110

llvm-svn: 221514
2014-11-07 09:08:39 +00:00
David Majnemer 2098b86f64 SCCP: overdefined calls cannot become constant
We would attempt to fold away a call instruction which had been marked
overdefined.  However, it's not valid to transition to constant from
overdefined.

This fixes PR21512.

llvm-svn: 221513
2014-11-07 08:54:19 +00:00
Justin Hibbits 771c132e0f Add Position-independent Code model Module API.
Summary:
This makes PIC levels a Module flag attribute, which can be queried by the
backend.  The flag is named `PIC Level`, and can have a value of:

  0 - Backend-default
  1 - Small-model (-fpic)
  2 - Large-model (-fPIC)

These match the `-pic-level' command line argument for clang, and the value of the
preprocessor macro `__PIC__'.

Test Plan:
New flags tests specific for the 'PIC Level' module flag.
Tests to be added as part of a future commit for PowerPC, which will use this new API.

Reviewers: rafael, echristo

Reviewed By: rafael, echristo

Subscribers: rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D5882

llvm-svn: 221510
2014-11-07 04:46:10 +00:00
Ahmed Bougacha 72001cf287 [AArch64] Keep flags on condition vreg when instantiating a CB branch.
Reversing a CB* instruction used to drop the flags on the condition. On the
included testcase, this lead to a read from an undefined vreg.
Using addOperand keeps the flags, here <undef>.

Differential Revision: http://reviews.llvm.org/D6159

llvm-svn: 221507
2014-11-07 02:50:00 +00:00
David Majnemer bf93e7c7d3 LoopVectorize: Don't assume pointees are sized
A pointer's pointee might not be sized: the pointee could be a function.

Report this as IK_NoInduction when calculating isInductionVariable.

This fixes PR21508.

llvm-svn: 221501
2014-11-07 00:31:14 +00:00
David Majnemer c1eca5ad7c InstCombine: Rely on cmpxchg's return code when it's strong
Comparing the result of a cmpxchg instruction can be replaced with an
extractvalue of the cmpxchg success indicator.

llvm-svn: 221498
2014-11-06 23:23:30 +00:00
Simon Atanasyan 60e1a79242 [ELF][yaml2obj] Handle additional MIPS specific st_other field flags
The ELF symbol `st_other` field might contain additional flags besides
visibility ones. This patch implements support for some MIPS specific
flags.

llvm-svn: 221491
2014-11-06 22:46:24 +00:00
Simon Pilgrim 615ab8e721 [X86][SSE] Vector integer/float conversion memory folding (cvttps2dq / cvttpd2dq)
Fixed an issue with the (v)cvttps2dq and (v)cvttpd2dq instructions being incorrectly put in the 2 source operand folding tables instead of the 1 source operand and added the missing SSE/AVX versions.

Also added missing (v)cvtps2dq and (v)cvtpd2dq instructions to the folding tables.

Differential Revision: http://reviews.llvm.org/D6001

llvm-svn: 221489
2014-11-06 22:15:41 +00:00
Ahmed Bougacha b5367eeea3 [X86] Add VFMADDSUB cases for the 213->231 custom inserter.
Also add tests for vfmadd/vfmsub.

llvm-svn: 221488
2014-11-06 22:04:15 +00:00
Ahmed Bougacha 9152361d73 [X86] Add missing FMA3 VFMADDSUB in the emitter.
Also reuse the fma4 intrinsic test to cover fma3 instructions too.

llvm-svn: 221487
2014-11-06 21:58:11 +00:00
Ahmed Bougacha d1e71108d0 [X86] Split FMA4 RM tests into a separate file. NFC.
While there, remove useless comments.

llvm-svn: 221484
2014-11-06 21:46:23 +00:00
Rafael Espindola b7a4505a3f Base check on the section name, not the variable name.
The variable is private, so the name should not be relied on. Also, the
linker uses the sections, so asan should too when trying to avoid causing
the linker problems.

llvm-svn: 221480
2014-11-06 20:01:34 +00:00
Lang Hames bcdd304cbb [RegAlloc] Remove reference to the trivial spiller in test case.
This test case was never actually testing the trivial spiller: the -spiller
option has not been hooked up for a while now.

llvm-svn: 221475
2014-11-06 19:24:18 +00:00
Kevin Enderby 930fdc77dd Plumb in the ARM thumb symbolizer in llvm-objdump’s Mach-O disassembler and
add the code and test cases for 32-bit ARM symbolizer.

Also fixed the printing of data in code as it was not using the table correctly
and needed to fix one of the test cases too.

This will break lld’s test/mach-o/arm-interworking-movw.yaml till the tweak
for that is made. Which I’ll be committing immediately after this commit.

llvm-svn: 221470
2014-11-06 19:00:13 +00:00
Colin LeMahieu 2c769209a1 [Hexagon] Adding basic Hexagon ELF object emitter.
llvm-svn: 221465
2014-11-06 17:05:51 +00:00
Chad Rosier ac6a2f532c [Reassociate] Don't reassociate when mixing regular and fast-math FP
instructions.  Inlining might cause such cases and it's not valid to
reassociate floating-point instructions without the unsafe algebra flag.

Patch by Mehdi Amini <mehdi_amini@apple.com>!

llvm-svn: 221462
2014-11-06 16:46:37 +00:00
Rafael Espindola 1c99344156 Use FileCheck in a few tests.
llvm-svn: 221459
2014-11-06 15:05:51 +00:00
Rafael Espindola 26cfbea738 Compute the correct jump table entries on 32 bit windows.
On 32 bit windows we use label differences and .set does not suppress
rolocations, a combination that was not used before r220256.

This fixes PR21497.

llvm-svn: 221456
2014-11-06 14:39:49 +00:00
Andrea Di Biagio 7ecd22ca4a [X86] When commuting SSE immediate blend, make sure that the new blend mask is a valid imm8.
Example:
define <4 x i32> @test(<4 x i32> %a, <4 x i32> %b) {
  %shuffle = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 4, i32 5, i32 6, i32 3>
  ret <4 x i32> %shuffle
}

Before llc (-mattr=+sse4.1), produced the following assembly instruction:
  pblendw $4294967103, %xmm1, %xmm0

After
  pblendw $63, %xmm1, %xmm0

llvm-svn: 221455
2014-11-06 14:36:45 +00:00
Toma Tabacu 27cab751ca [mips] Tolerate the use of the %z inline asm operand modifier with non-immediates.
Summary:
Currently, we give an error if %z is used with non-immediates, instead of continuing as if the %z isn't there.

For example, you use the %z operand modifier along with the "Jr" constraints ("r" makes the operand a register, and "J" makes it an immediate, but only if its value is 0). 
In this case, you want the compiler to print "$0" if the inline asm input operand turns out to be an immediate zero and you want it to print the register containing the operand, if it's not.

We give an error in the latter case, and we shouldn't (GCC also doesn't).

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6023

llvm-svn: 221453
2014-11-06 14:25:42 +00:00
Sasa Stankovic b38db1eff8 [mips] Add the following MIPS options that control gp-relative addressing of
small data items: -mgpopt, -mlocal-sdata, -mextern-sdata. Implement gp-relative
addressing for constants.

Differential Revision: http://reviews.llvm.org/D4903

llvm-svn: 221450
2014-11-06 13:20:12 +00:00
Toma Tabacu dde4c464dd [mips] Improve error/warning messages and testing for the .cpload assembler directive.
Summary:
Improved warning message when using .cpload inside a reorder section and added an error message for using .cpload with Mips16 enabled.
Modified the tests to fit with the changes mentioned above, added a test-case for the N32 ABI in cpload.s and did some reformatting to make the tests easier to read.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5465

llvm-svn: 221447
2014-11-06 10:02:45 +00:00
David Majnemer 51ff559500 Object, COFF: Infer symbol sizes from adjacent symbols
Use the position of the subsequent symbol in the object file to infer
the size of it's predecessor.  I hope to eventually remove whatever COFF
specific details from this little algorithm so that we can unify this
logic with what Mach-O does.

llvm-svn: 221444
2014-11-06 08:10:41 +00:00
Justin Bogner 58e41344f9 GCOV: Make sure that function idents in the .gcda and .gcno match
When generating gcov compatible profiling, we sometimes skip emitting
data for functions for one reason or another. However, this was
emitting different function IDs in the .gcno and .gcda files, because
the .gcno case was using the loop index before skipping functions and
the .gcda the array index after. This resulted in completely invalid
gcov data.

This fixes the problem by making the .gcno loop track the ID
separately from the loop index.

llvm-svn: 221441
2014-11-06 06:55:02 +00:00
Rafael Espindola 83f0ea8982 Add three other sections when L symbols are allowed.
llvm-svn: 221436
2014-11-06 05:01:21 +00:00
Rafael Espindola bf77ed6826 Allow L symbols in no_dead_strip sections.
If a section cannot be dead stripped, it is safe to use L symbols, since
the linker will keep all of it in the end.

llvm-svn: 221431
2014-11-06 02:42:03 +00:00
Quentin Colombet dbe33e7aa4 [X86] Lower VSELECT into SHRUNKBLEND when we shrink the bits used into the
condition to match a blend.
This prevents optimizations that work on VSELECT to perform invalid
transformations. Indeed, the optimized condition does not match the vector
boolean content that is expected and bad things may happen.

This patch yields the exact same code on the whole test-suite + specs (-O3 and
-O3 -march=core-avx2), it improves one test case (vector-blend.ll) and fixes a
bug reduced in vselect-avx.ll.

<rdar://problem/18819506>

llvm-svn: 221429
2014-11-06 02:25:03 +00:00
Petar Jovanovic 35f05747f3 [mips64] Fix MIPS64 exception personality encoding
Remove dynamic relocations of __gxx_personality_v0 from the .eh_frame.
The MIPS64 follow-up of the MIPS32 fix (rL209907).

Patch by Vladimir Stefanovic.

Differential Revision: http://reviews.llvm.org/D6141

llvm-svn: 221408
2014-11-05 22:42:31 +00:00
Simon Pilgrim 1fc483d991 [X86][SSE] Vector integer to float conversion memory folding
Added missing memory folding for the (V)CVTDQ2PS instructions - we can safely fold these (but not the (V)CVTDQ2PD versions which have a register/memory size discrepancy in the source operand). I've added a test case demonstrating that stack folding now works.

Differential Revision: http://reviews.llvm.org/D5981

llvm-svn: 221407
2014-11-05 22:28:25 +00:00
Sean Silva e727f57d37 [Linker] Add some test coverage for llvm.ident merging
llvm-svn: 221403
2014-11-05 21:33:34 +00:00
Derek Schuff 059e525970 Fix test breakage from r221386
llvm-svn: 221389
2014-11-05 20:02:05 +00:00
Derek Schuff a54222045e [x86 fast-isel] Materialize allocas with the correct-sized lea for ILP32
Summary:
X86FastISel::fastMaterializeAlloca was incorrectly conditioning its
opcode selection on subtarget bitness rather than pointer size.

Differential Revision: http://reviews.llvm.org/D6136

llvm-svn: 221386
2014-11-05 19:27:21 +00:00
Matt Arsenault b6e51ff1e7 R600/SI: Add testcase I forgot to commit from months ago
llvm-svn: 221384
2014-11-05 19:01:22 +00:00
Justin Holewinski 3d140fcfd1 [NVPTX] Add NVPTXLowerStructArgs pass
This works around the limitation that PTX does not allow .param space
loads/stores with arbitrary pointers.

If a function has a by-val struct ptr arg, say foo(%struct.x *byval %d), then
add the following instructions to the first basic block :

%temp = alloca %struct.x, align 8
%tt1 = bitcast %struct.x * %d to i8 *
%tt2 = llvm.nvvm.cvt.gen.to.param %tt2
%tempd = bitcast i8 addrspace(101) * to %struct.x addrspace(101) *
%tv = load %struct.x addrspace(101) * %tempd
store %struct.x %tv, %struct.x * %temp, align 8

The above code allocates some space in the stack and copies the incoming
struct from param space to local space. Then replace all occurences of %d
by %temp.

Fixes PR21465.

llvm-svn: 221377
2014-11-05 18:19:30 +00:00
Zoran Jovanovic 06c9d55123 ps][microMIPS] Implement CodeGen support for ANDI16 instruction
llvm-svn: 221371
2014-11-05 17:43:00 +00:00
Zoran Jovanovic 9f99723d92 ps][microMIPS] Implement CodeGen support for SLL16 and SRL16 instructions
llvm-svn: 221369
2014-11-05 17:38:31 +00:00
Zoran Jovanovic 8853171b46 [mips][microMIPS] Implement ANDI16 instruction
llvm-svn: 221367
2014-11-05 17:31:00 +00:00
Peter Collingbourne a1099840ff [dfsan] Abort at runtime on indirect calls to uninstrumented vararg functions.
We currently have no infrastructure to support these correctly.

This is accomplished by generating a call to a runtime library function that
aborts at runtime in place of the regular wrapper for such functions. Direct
calls are rewritten in the usual way during traversal of the caller's IR.

We also remove the "split-stack" attribute from such wrappers, as the code
generator cannot currently handle split-stack vararg functions.

llvm-svn: 221360
2014-11-05 17:21:00 +00:00
Zoran Jovanovic 9c654830f7 [mips][microMIPS] Mark symbols as microMIPS if necessary
Differential Revision: http://reviews.llvm.org/D6039

llvm-svn: 221355
2014-11-05 16:35:20 +00:00
Zoran Jovanovic a87308c84c Reverted revisions 221351, 221352 and 221353.
llvm-svn: 221354
2014-11-05 16:19:59 +00:00
Zoran Jovanovic 3038500f3b [mips][microMIPS] Implement CodeGen support for ANDI16 instruction
Differential Revision: http://reviews.llvm.org/D5797

llvm-svn: 221353
2014-11-05 15:54:05 +00:00
Zoran Jovanovic f4f5f1e272 [mips][microMIPS] Implement CodeGen support for SLL16 and SRL16 instructions
Differential Revision: http://reviews.llvm.org/D5933

llvm-svn: 221352
2014-11-05 15:46:53 +00:00
Zoran Jovanovic e548bb0634 [mips][microMIPS] Implement ANDI16 instruction
Differential Revision: http://reviews.llvm.org/D5163

llvm-svn: 221351
2014-11-05 15:39:41 +00:00
Tom Stellard 326d6ece94 R600/SI: Change all instruction assembly names to lowercase.
This matches the format produced by the AMD proprietary driver.

//==================================================================//
// Shell script for converting .ll test cases: (Pass the .ll files
   you want to convert to this script as arguments).
//==================================================================//

; This was necessary on my system so that A-Z in sed would match only
; upper case.  I'm not sure why.
export LC_ALL='C'

TEST_FILES="$*"

MATCHES=`grep -v Patterns SIInstructions.td | grep -o '"[A-Z0-9_]\+["e]' | grep -o '[A-Z0-9_]\+' | sort -r`

for f in $TEST_FILES; do
  # Check that there are SI tests:
  grep -q -e 'verde' -e 'bonaire' -e 'SI' -e 'tahiti' $f
  if [ $? -eq 0 ]; then
    for match in $MATCHES; do
      sed -i -e "s/\([ :]$match\)/\L\1/" $f
    done

    # Try to get check lines with partial instruction names
    sed -i 's/\(;[ ]*SI[A-Z\\-]*: \)\([A-Z_0-9]\+\)/\1\L\2/' $f
  fi
done

sed -i -e 's/bb0_1/BB0_1/g' ../../../test/CodeGen/R600/infinite-loop.ll
sed -i -e 's/SI-NOT: bfe/SI-NOT: {{[^@]}}bfe/g'../../../test/CodeGen/R600/llvm.AMDGPU.bfe.*32.ll ../../../test/CodeGen/R600/sext-in-reg.ll
sed -i -e 's/exp_IEEE/EXP_IEEE/g' ../../../test/CodeGen/R600/llvm.exp2.ll
sed -i -e 's/numVgprs/NumVgprs/g' ../../../test/CodeGen/R600/register-count-comments.ll
sed -i 's/\(; CHECK[-NOT]*: \)\([A-Z_0-9]\+\)/\1\L\2/' ../../../test/CodeGen/R600/select64.ll ../../../test/CodeGen/R600/sgpr-copy.ll

//==================================================================//
// Shell script for converting .td files (run this last)
//==================================================================//

export LC_ALL='C'
sed -i -e '/Patterns/!s/\("[A-Z0-9_]\+[ "e]\)/\L\1/g' SIInstructions.td
sed -i -e 's/"EXP/"exp/g' SIInstrInfo.td

llvm-svn: 221350
2014-11-05 14:50:53 +00:00
Tom Stellard bd59920616 R600/SI: Add an extra check line to make test more strict
llvm-svn: 221349
2014-11-05 14:50:34 +00:00
Andrea Di Biagio ce46b97b48 [X86] Teach method 'isVectorClearMaskLegal' how to check for legal blend masks.
This patch improves the folding of vector AND nodes into blend operations for
targets that feature SSE4.1. A vector AND node where one of the operands is
a constant build_vector with elements that are either zero or all-ones can be
converted into a blend.

This allows for example to simplify the following code:

define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
  %1 = and <4 x i32> %A, <i32 0, i32 0, i32 0, i32 -1>
  %2 = and <4 x i32> %B, <i32 -1, i32 -1, i32 -1, i32 0>
  %3 = or <4 x i32> %1, %2
  ret <4 x i32> %3
}

Before this patch llc (-mcpu=corei7) generated:
        andps  LCPI1_0(%rip), %xmm0, %xmm0
        andps  LCPI1_1(%rip), %xmm1, %xmm1
        orps   %xmm1, %xmm0, %xmm0
        retq

With this patch we generate a single 'vpblendw'.

llvm-svn: 221343
2014-11-05 13:04:14 +00:00
Oliver Stannard c8d452eed8 Fix bashism in tests added by r221341
llvm-svn: 221342
2014-11-05 12:40:21 +00:00
Oliver Stannard 9e89d8cc5c [ARM] Honor FeatureD16 in the assembler and disassembler
Some ARM FPUs only have 16 double-precision registers, rather than the
normal 32. LLVM represents this with the D16 target feature. This is
currently used by CodeGen to avoid using high registers when they are
not available, but the assembler and disassembler do not.

I fix this in the assmebler and disassembler rather than the
InstrInfo.td files, as the latter would require a large number of
changes everywhere one of the floating-point instructions is referenced
in the backend. This solution is similar to the one used for
co-processor numbers and MSR masks.

llvm-svn: 221341
2014-11-05 12:06:39 +00:00
Craig Topper 12f0d9ef2c Improve logic that decides if its profitable to commute when some of the virtual registers involved have uses/defs chains connecting them to physical register. Fix up the tests that this change improves.
llvm-svn: 221336
2014-11-05 06:43:02 +00:00
NAKAMURA Takumi 61b2f48453 llvm/test/Transforms/GCOVProfiling: Avoid to parse backslashes in MDString. Use %/T instead of %T.
LLVM Parser decodes "\bb" as hex in "C:\bb-win7\buildername\build...", with MDString.

See also, http://llvm.org/docs/LangRef.html#metadata-nodes-and-metadata-strings

This reverts r221270, "Disable 3 tests in llvm/test/Transforms/GCOVProfiling/ for now. Investigating."

FIXME: Please check EC in GCOVProfiler::emitProfileNotes().
llvm-svn: 221334
2014-11-05 06:29:05 +00:00
David Majnemer 5026722287 llvm-readobj: Add support for dumping the DOS header in PE files
llvm-svn: 221333
2014-11-05 06:24:35 +00:00
David Majnemer bf7550e7ec InstSimplify: Exact shifts of X by Y are X if X has the lsb set
Exact shifts may not shift out any non-zero bits. Use computeKnownBits
to determine when this occurs and just return the left hand side.

This fixes PR21477.

llvm-svn: 221325
2014-11-05 00:59:59 +00:00
Tim Northover dc0d9e46a5 ARM: try to add extra CS-register whenever stack alignment >= 8.
We currently try to push an even number of registers to preserve 8-byte
alignment during a function's prologue, but only when the stack alignment is
prcisely 8. Many of the reasons for doing it apply also when that alignment > 8
(the extra store is often free, and can save another stack adjustment, though
less frequently for 16-byte stack alignment).

llvm-svn: 221321
2014-11-05 00:27:20 +00:00
Tim Northover 228c943f31 ARM/Dwarf: correctly align stack before callee-saved VPRs
We were making an attempt to do this by adding an extra callee-saved GPR (so
that there was an even number in the list), but when that failed we went ahead
and pushed anyway.

This had a couple of potential issues:
  + The .cfi directives we emit misplaced dN because they were based on
    PrologEpilogInserter's calculation.
  + Unaligned stores can be less efficient.
  + Unaligned stores can actually fault (likely only an issue in niche cases,
    but possible).

This adds a final explicit stack adjustment if all other options fail, so that
the actual locations of the registers match up with where they should be.

llvm-svn: 221320
2014-11-05 00:27:13 +00:00
David Majnemer f20d7c4c61 Analysis: Make isSafeToSpeculativelyExecute fire less for divides
Divides and remainder operations do not behave like other operations
when they are given poison: they turn into undefined behavior.

It's really hard to know if the operands going into a div are or are not
poison.  Because of this, we should only choose to speculate if there
are constant operands which we can easily reason about.

This fixes PR21412.

llvm-svn: 221318
2014-11-04 23:49:08 +00:00
Reid Kleckner 941e93e9a8 Revert "[Reassociate] Canonicalize negative constants out of expressions."
This reverts commit r221171.

It performs this invalid transformation:
-  %div.i = urem i64 -1, %add
-  %sub.i = sub i64 -2, %div.i
+  %div.i = urem i64 1, %add
+  %sub.i1 = add i64 %div.i, -2

llvm-svn: 221317
2014-11-04 23:42:45 +00:00
Simon Pilgrim c9a0779309 [X86][SSE] Enable commutation for SSE immediate blend instructions
Patch to allow (v)blendps, (v)blendpd, (v)pblendw and vpblendd instructions to be commuted - swaps the src registers and inverts the blend mask.

This is primarily to improve memory folding (see new tests), but it also improves the quality of shuffles (see modified tests).

Differential Revision: http://reviews.llvm.org/D6015

llvm-svn: 221313
2014-11-04 23:25:08 +00:00
Juergen Ributzka f9660f0712 [AArch64] Use the correct register class for ORR.
While fixing up the register classes in the machine combiner in a previous
commit I missed one.

This fixes the last one and adds a test case.

llvm-svn: 221308
2014-11-04 22:20:07 +00:00
Rafael Espindola d85260827c Revert "[mips] Add names and tests for the hardware registers"
This reverts commit r221299.

The tests

    LLVM :: MC/Disassembler/Mips/mips32.txt
    LLVM :: MC/Disassembler/Mips/mips32_le.txt

were failing.

llvm-svn: 221307
2014-11-04 22:15:05 +00:00
David Blaikie 3a443c29b9 Provide gmlt-like inline scope information in the skeleton CU to facilitate symbolication without needing the .dwo files
Clang -gsplit-dwarf self-host -O0, binary increases by 0.0005%, -O2,
binary increases by 25%.

A large binary inside Google, split-dwarf, -O0, and other internal flags
(GDB index, etc) increases by 1.8%, optimized build is 35%.

The size impact may be somewhat greater in .o files (I haven't measured
that much - since the linked executable -O0 numbers seemed low enough)
due to relocations. These relocations could be removed if we taught the
llvm-symbolizer to handle indexed addressing in the .o file (GDB can't
cope with this just yet, but GDB won't be reading this info anyway).
Also debug_ranges could be shared between .o and .dwo, though ideally
debug_ranges would get a schema that could used index(+offset)
addressing, and move to the .dwo file, then we'd be back to sharing
addresses in the address pool again.

But for now, these sizes seem small enough to go ahead with this.

Verified that no other DW_TAGs are produced into the .o file other than
subprograms and inlined_subroutines.

llvm-svn: 221306
2014-11-04 22:12:25 +00:00
Rafael Espindola bfd0f01dd7 Don't produce relocations for a difference in a section with no symbols.
We were producing a relocation for
----------------
.section foo,bar
La:
Lb:
 .long   La-Lb
--------------

but not for

---------------------
  .section foo,bar
zed:
La:
Lb:
 .long   La-Lb
----------------

This patch handles the case where both fragments are part of the first atom
in a section and there is no corresponding symbol to that atom.

This fixes pr21328.

llvm-svn: 221304
2014-11-04 22:10:33 +00:00
Vasileios Kalintiris df6e0d0371 [mips] Add names and tests for the hardware registers
Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5763

llvm-svn: 221299
2014-11-04 21:30:44 +00:00
Arnaud A. de Grandmaison a11cab3120 [PBQP] Callee saved regs should have a higher cost than scratch regs
Registers are not all equal. Some are not allocatable (infinite cost),
some have to be preserved but can be used, and some others are just free
to use.

Ensure there is a cost hierarchy reflecting this fact, so that the
allocator will favor scratch registers over callee-saved registers.

llvm-svn: 221293
2014-11-04 20:51:29 +00:00
Benjamin Kramer 185dc0da1f AArch64: Pattern match integer vector abs like we do on ARM.
This kind of pattern is emitted by the loop vectorizer.

llvm-svn: 221289
2014-11-04 20:10:06 +00:00
Toma Tabacu cc2502d8f3 [mips] Improve support for the .set mips16/nomips16 assembler directives.
Summary:
Appropriately set/clear the FeatureBit for Mips16 when these assembler directives are used and also emit ".set nomips16" (previously, only ".set mips16" was being emitted).

These improvements allow for better testing of the .cpload/.cprestore assembler directives (which are not supposed to work when Mips16 is enabled).

Test Plan: The test is bare-bones because there are no MC tests for Mips16 instructions (there's only one, which checks that the Mips16 ELF header flag gets set), and that suggests to me that it has not been implemented yet in the IAS.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5462

llvm-svn: 221277
2014-11-04 17:18:07 +00:00
Juergen Ributzka 7d434ce7e5 [Stackmaps] Make test less fragile. NFC.
llvm-svn: 221276
2014-11-04 17:11:00 +00:00
NAKAMURA Takumi 14e3fc6ad4 Disable 3 tests in llvm/test/Transforms/GCOVProfiling/ for now. Investigating.
llvm-svn: 221270
2014-11-04 14:41:53 +00:00
NAKAMURA Takumi 06ac98299f Remove "REQUIRES:shell" from tests. They work for me.
llvm-svn: 221269
2014-11-04 13:41:33 +00:00
Simon Atanasyan d2bfd00e71 [yaml2obj] Allow yaml2obj tool to recognize EF_MIPS_NAN2008 flag
llvm-svn: 221268
2014-11-04 13:33:36 +00:00
NAKAMURA Takumi f4de174a44 Re-enable tests in llvm/test/Object, corresponding to line_iterator's
change in r221153.

llvm-svn: 221265
2014-11-04 13:19:29 +00:00
NAKAMURA Takumi 217ee5bf92 llvm/test/Transforms/GCOVProfiling/linezero.ll: Use %/T instead of %T in regex. This works on win32.
llvm-svn: 221262
2014-11-04 13:00:48 +00:00
Charlie Turner 34ec84ec7f Add missing tests for build attribute encodings in object files.
test/MC/ARM/directive-eabi_attribute.s was missing several tests of object file
encodings relative to the existing tests for assembly file encodings. This
commit adds the missing tests.

Change-Id: Ie110ca02b65e8f4d4c77f437bd09d03607fa5c0d
llvm-svn: 221250
2014-11-04 09:07:40 +00:00
David Majnemer b925715c56 CodeGen: Enable DWARF emission for MS ABI targets
This is experimental, just barely enough to get things to not
immediately combust.

A note for those who are curious:
Only lld can successfully link the object files, other linkers truncate
the section names making the debug sections illegible to debuggers.

Even with this in mind, we believe we are having trouble with SECREL
relocations.

llvm-svn: 221245
2014-11-04 08:03:31 +00:00
David Majnemer 83d6857b72 test: Restore llvm-lit (at least for my machine)
r221137 feeds None into os.path.join which is not valid.

llvm-svn: 221242
2014-11-04 05:54:50 +00:00
David Majnemer d28edfea03 Minimize test case further
No functional change intended.

llvm-svn: 221237
2014-11-04 05:17:58 +00:00
Reid Kleckner dd3f3edafa Revert "Transforms: reapply SVN r219899"
This reverts commit r220811 and r220839. It made an incorrect change to
musttail handling.

llvm-svn: 221226
2014-11-04 02:02:14 +00:00
Sanjoy Das e839965faa The patchpoint lowering logic would crash with live constants equal to
the tombstone or empty keys of a DenseMap<int64_t, T>.  This patch
fixes the issue (and adds a tests case).

llvm-svn: 221214
2014-11-04 00:59:21 +00:00
Kevin Enderby 9907d0a3c2 Add the code and test cases for 32-bit Intel to llvm-objdump’s Mach-O symbolizer.
llvm-svn: 221211
2014-11-04 00:43:16 +00:00
Colin LeMahieu 5241881bbc [Hexagon] Reverting 220584 to address ASAN errors.
llvm-svn: 221210
2014-11-04 00:14:36 +00:00
Hal Finkel 840257a49c Use AA in LoadCombine
LoadCombine can be smarter about aborting when a writing instruction is
encountered, instead of aborting upon encountering any writing instruction, use
an AliasSetTracker, and only abort when encountering some write that might
alias with the loads that could potentially be combined.

This was originally motivated by comments made (and a test case provided) by
David Majnemer in response to PR21448. It turned out that LoadCombine was not
responsible for that PR, but LoadCombine should also be improved so that
unrelated stores (and @llvm.assume) don't interrupt load combining.

llvm-svn: 221203
2014-11-03 23:19:16 +00:00
Rafael Espindola 169758b50e Handle ASAN_OPTIONS and UBSAN_OPTIONS in TestingConfig.py
Currently they are passed to tests of llvm itself, but not, for example, lld.

With this patch the options are visible in every test.

llvm-svn: 221198
2014-11-03 23:04:56 +00:00
David Majnemer 7e2b9882b1 InstCombine: Remove infinite loop caused by FoldOpIntoPhi
FoldOpIntoPhi could create an infinite loop if the PHI could potentially
reach a BB it was considering inserting instructions into.  The
instructions it would insert would eventually lead to other combines
firing which would, again, lead to FoldOpIntoPhi firing.

The solution is to handicap FoldOpIntoPhi so that it doesn't attempt to
insert instructions that the PHI might reach.

This fixes PR21377.

llvm-svn: 221187
2014-11-03 21:55:12 +00:00
Akira Hatanaka b961534818 [ARM, inline-asm] Fix ARMTargetLowering::getRegForInlineAsmConstraint to return
register class tGPRRegClass if the target is thumb1.

This commit fixes a crash that occurs during register allocation which was
triggered when a virtual register defined by an inline-asm instruction had to
be spilled.
 
rdar://problem/18740489

llvm-svn: 221178
2014-11-03 20:37:04 +00:00
Ahmed Bougacha 12eb558bd9 [X86] 8bit divrem: Improve codegen for AH register extraction.
For 8-bit divrems where the remainder is used, we used to generate:
    divb  %sil
    shrw  $8, %ax
    movzbl  %al, %eax

That was to avoid an H-reg access, which is problematic mainly because
it isn't possible in REX-prefixed instructions.

This patch optimizes that to:
    divb  %sil
    movzbl  %ah, %eax

To do that, we explicitly extend AH, and extract the L-subreg in the
resulting register.  The extension is done using the NOREX variants of
MOVZX.  To support signed operations, MOVSX_NOREX is also added.
Further, this introduces a new SDNode type, [us]divrem_ext_hreg, which is
then lowered to a sequence containing a single zext (rather than 2).

Differential Revision: http://reviews.llvm.org/D6064

llvm-svn: 221176
2014-11-03 20:26:35 +00:00
Hal Finkel 1e16fa302e EarlyCSE should ignore calls to @llvm.assume
EarlyCSE uses a simple generation scheme for handling memory-based
dependencies, and calls to @llvm.assume (which are marked as writing to memory
to ensure the preservation of control dependencies) disturb that scheme
unnecessarily. Skipping calls to @llvm.assume is legal, and the alternative
(adding AA calls in EarlyCSE) is likely undesirable (we have GVN for that).

Fixes PR21448.

llvm-svn: 221175
2014-11-03 20:21:32 +00:00
Tom Stellard 5cbb53c41e Reapply: R600: Make sure to inline all internal functions
Function calls aren't supported yet.

This was reverted due to build breakages, which should be fixed now.

llvm-svn: 221173
2014-11-03 19:49:05 +00:00
Chad Rosier 005505b027 [Reassociate] Canonicalize negative constants out of expressions.
This gives CSE/GVN more options to eliminate duplicate expressions.
This is a follow up patch to http://reviews.llvm.org/D4904.

http://reviews.llvm.org/D5363

llvm-svn: 221171
2014-11-03 19:11:30 +00:00
Paul Robinson ad06e430ce Normally an 'optnone' function goes through fast-isel, which does not
call DAGCombiner. But we ran into a case (on Windows) where the
calling convention causes argument lowering to bail out of fast-isel,
and we end up in CodeGenAndEmitDAG() which does run DAGCombiner.
So, we need to make DAGCombiner check for 'optnone' after all.

Commit includes the test that found this, plus another one that got
missed in the original optnone work.

llvm-svn: 221168
2014-11-03 18:19:26 +00:00
Charlie Turner 1d8cc909cc Remove the cortex-a9-mp CPU.
This CPU definition is redundant. The Cortex-A9 is defined as
supporting multiprocessing extensions. Remove its definition and
update appropriate tests.

LLVM defines both a cortex-a9 CPU and a cortex-a9-mp CPU. The only
difference between the two CPU definitions in ARM.td is that
cortex-a9-mp contains the feature FeatureMP for multiprocessing
extensions.

This is redundant since the Cortex-A9 is defined as having
multiprocessing extensions in the TRMs. armcc also defines the
Cortex-A9 as having multiprocessing extensions by default.

Change-Id: Ifcadaa6c322be0a33d9d2a39cfdd7da1d75981a7
llvm-svn: 221166
2014-11-03 17:38:00 +00:00
Oliver Stannard 269a275cb4 [AArch64] Fix miscompile of comparison with 0xffffffffffffffff
Some literals in the AArch64 backend had 15 'f's rather than 16, causing
comparisons with a constant 0xffffffffffffffff to be miscompiled.

llvm-svn: 221157
2014-11-03 15:28:40 +00:00
Sid Manning 326f8af463 Handle ctor/init_array initialization.
Hexagon was not calling InitializeELF and could not select between
ctors and init_array.

Phabricator revision: http://reviews.llvm.org/D6061

llvm-svn: 221156
2014-11-03 14:56:05 +00:00
Charlie Turner abaec9da3a Merge the directive-eabi_attribute.s and directive-eabi_attribute-2.s tests.
test/MC/ARM/directive-eabi_attribute.s had gotten out-of-sync with
test/MC/ARM/directive-eabi_attribute-2.s. The former tests the encoding of
build attributes in object files, and the latter the encoding in assembly
files. Since both these tests need to be updated at the same time, it makes
sense to combine them into a single test. The object file encodings are being
checked against the ouput of -arm-attributes rather than by direct byte
comparisons which makes for easier reading.

Change-Id: I0075de506ae5626fb2fa235383fe5ce6a65a15a9
llvm-svn: 221155
2014-11-03 14:52:00 +00:00
Rafael Espindola 42bce8f69d Add CRLF support to LineIterator.
The MRI scripts have to work with CRLF, and in general it is probably
a good idea to support this in a core utility like LineIterator.

llvm-svn: 221153
2014-11-03 14:09:47 +00:00
Oliver Stannard cf6bfb1dd0 Revert r221150, as it broke sanitizer tests
llvm-svn: 221151
2014-11-03 12:19:03 +00:00
Oliver Stannard 652ec6ee89 Emit .eh_frame with relocations to functions, rather than sections
When LLVM emits DWARF call frame information, it currently creates a local,
section-relative symbol in the code section, which is pointed to by a
relocation on the .eh_frame section. However, for C++ we emit some functions in
section groups, and the SysV ABI has some rules to make it easier to remove
these sections
(http://www.sco.com/developers/gabi/latest/ch4.sheader.html#section_group_rules):

  A symbol table entry with STB_LOCAL binding that is defined relative to one
  of a group's sections, and that is contained in a symbol table section that is
  not part of the group, must be discarded if the group members are discarded.
  References to this symbol table entry from outside the group are not allowed.

This means that we need to use the function symbol for the relocation, not a
temporary symbol.

There was a comment in the code claiming that the local symbol was used to
avoid creating a relocation, but a relocation must be created anyway as the
code and CFI are in different sections.

llvm-svn: 221150
2014-11-03 12:02:51 +00:00
Peter Zotov aea393be26 Unbreak build.
A bug in lit.cfg was introduced in r221137.

llvm-svn: 221144
2014-11-03 09:58:41 +00:00
Peter Zotov f21b6c84f5 [OCaml] Avoid embedding absolute paths into executables.
Bindings built out-of-tree, e.g. via OPAM, should append
a line to META.llvm like the following:

linkopts = "-cclib -L$libdir -cclib -Wl,-rpath,$libdir"

where $libdir is the lib/ directory where LLVM libraries are
installed.

llvm-svn: 221139
2014-11-03 09:51:34 +00:00
Peter Zotov 2f4735a1ad [OCaml] Run tests twice, with ocamlc and ocamlopt (if available)
ocamlc and ocamlopt expose a distinct set of buildsystem bugs, e.g.
only ocamlc would detect -custom or -dllib-related bugs, and as all
buildbots will have ocamlopt, these bugs will stay hidden.

This change should add no more than 30 seconds of testing time.

llvm-svn: 221137
2014-11-03 09:50:53 +00:00
David Majnemer aafa6790bd Forgot to add input file for test added in r221133
llvm-svn: 221134
2014-11-03 07:58:16 +00:00
David Majnemer e268361bb3 llvm-vtabledump: Handle Itanium VTables
Add support in the vtable dumper for the Itanium ABI.

llvm-svn: 221133
2014-11-03 07:23:25 +00:00
David Majnemer 72a643dc8f InstCombine: Combine (X | Y) - X to (~X & Y)
This implements the transformation from (X | Y) - X to (~X & Y).

Differential Revision: http://reviews.llvm.org/D5791

llvm-svn: 221129
2014-11-03 05:53:55 +00:00
Rafael Espindola 778fcc770b Revert r221096 bringing back r221014 with a fix.
The issue was that linkAppendingVarProto does the full linking job, including
deleting the old dst variable. The fix is just to call it and return early
if we have a GV with appending linkage.

original message:

    Refactor duplicated code in liking GlobalValues.

    There is quiet a bit of logic that is common to any GlobalValue but was
    duplicated for Functions, GlobalVariables and GlobalAliases.

    While at it, merge visibility even when comdats are used, fixing pr21415.

llvm-svn: 221098
2014-11-02 13:28:57 +00:00
Chandler Carruth fd38af2d13 Revert r221014: "Refactor duplicated code in liking GlobalValues."
This commit introduces heap-use-after-free detected by ASan. Here is the output
for one of several tests that detect it:

******************** TEST 'LLVM :: Linker/AppendingLinkage.ll' FAILED ********************
Command Output (stderr):
--
=================================================================
==2122==ERROR: AddressSanitizer: heap-use-after-free on address 0x60c00000b9c8 at pc 0x0000005d05d1 bp 0x7fff64ed27c0 sp 0x7fff64ed27b8
READ of size 4 at 0x60c00000b9c8 thread T0
    #0 0x5d05d0 in llvm::GlobalValue::setUnnamedAddr(bool) /usr/local/google/home/chandlerc/src/llvm/build/../include/llvm/IR/GlobalValue.h:115:35
    #1 0x69fff1 in (anonymous namespace)::ModuleLinker::linkGlobalValueProto(llvm::GlobalValue*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1041:5
    #2 0x697229 in (anonymous namespace)::ModuleLinker::run() /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1485:9
    #3 0x696542 in llvm::Linker::linkInModule(llvm::Module*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1621:10
    #4 0x4a2db7 in main /usr/local/google/home/chandlerc/src/llvm/build/../tools/llvm-link/llvm-link.cpp:116:9
    #5 0x7f4ae61e5ec4 in __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287
    #6 0x41eb71 in _start (/usr/local/google/home/chandlerc/src/llvm/build/bin/llvm-link+0x41eb71)

0x60c00000b9c8 is located 72 bytes inside of 128-byte region [0x60c00000b980,0x60c00000ba00)
freed by thread T0 here:
    #0 0x4a1e6b in operator delete(void*) /usr/local/google/home/chandlerc/src/llvm/opt-build/../projects/compiler-rt/lib/asan/asan_new_delete.cc:94:3
    #1 0x5d1a7a in llvm::iplist<llvm::GlobalVariable, llvm::ilist_traits<llvm::GlobalVariable> >::erase(llvm::ilist_iterator<llvm::GlobalVariable>) /usr/local/google/home/chandlerc/src/llvm/build/../inclu
de/llvm/ADT/ilist.h:466:5
    #2 0x5d1980 in llvm::GlobalVariable::eraseFromParent() /usr/local/google/home/chandlerc/src/llvm/build/../lib/IR/Globals.cpp:204:3
    #3 0x6a8a4d in (anonymous namespace)::ModuleLinker::linkAppendingVarProto(llvm::GlobalVariable*, llvm::GlobalVariable const*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.
cpp:980:3
    #4 0x6a7403 in (anonymous namespace)::ModuleLinker::linkGlobalVariableProto(llvm::GlobalVariable const*, llvm::GlobalValue*, bool) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkMod
ules.cpp:1074:11
    #5 0x69ff4e in (anonymous namespace)::ModuleLinker::linkGlobalValueProto(llvm::GlobalValue*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1028:13
    #6 0x697229 in (anonymous namespace)::ModuleLinker::run() /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1485:9
    #7 0x696542 in llvm::Linker::linkInModule(llvm::Module*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1621:10
    #8 0x4a2db7 in main /usr/local/google/home/chandlerc/src/llvm/build/../tools/llvm-link/llvm-link.cpp:116:9
    #9 0x7f4ae61e5ec4 in __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287

previously allocated by thread T0 here:
    #0 0x4a192b in operator new(unsigned long) /usr/local/google/home/chandlerc/src/llvm/opt-build/../projects/compiler-rt/lib/asan/asan_new_delete.cc:62:35
    #1 0x61d85c in llvm::User::operator new(unsigned long, unsigned int) /usr/local/google/home/chandlerc/src/llvm/build/../lib/IR/User.cpp:57:19
    #2 0x6a7525 in (anonymous namespace)::ModuleLinker::linkGlobalVariableProto(llvm::GlobalVariable const*, llvm::GlobalValue*, bool) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkMod
ules.cpp:1100:3
    #3 0x69ff4e in (anonymous namespace)::ModuleLinker::linkGlobalValueProto(llvm::GlobalValue*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1028:13
    #4 0x697229 in (anonymous namespace)::ModuleLinker::run() /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1485:9
    #5 0x696542 in llvm::Linker::linkInModule(llvm::Module*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1621:10
    #6 0x4a2db7 in main /usr/local/google/home/chandlerc/src/llvm/build/../tools/llvm-link/llvm-link.cpp:116:9
    #7 0x7f4ae61e5ec4 in __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287

SUMMARY: AddressSanitizer: heap-use-after-free /usr/local/google/home/chandlerc/src/llvm/build/../include/llvm/IR/GlobalValue.h:115 llvm::GlobalValue::setUnnamedAddr(bool)
Shadow bytes around the buggy address:
  0x0c187fff96e0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c187fff96f0: 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa fa
  0x0c187fff9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
  0x0c187fff9710: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c187fff9720: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
=>0x0c187fff9730: fd fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd
  0x0c187fff9740: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c187fff9750: fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa fa
  0x0c187fff9760: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c187fff9770: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c187fff9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  ASan internal:           fe
==2122==ABORTING

llvm-svn: 221096
2014-11-02 09:10:31 +00:00
Elena Demikhovsky 27152aea88 Use Alias Analysis to hoist 2 loads from diamond to the common predecessor basic block.
Alias Analysis allows to detect real barriers for load hoisting.

Review in http://reviews.llvm.org/D5991

llvm-svn: 221091
2014-11-02 08:03:05 +00:00
Rafael Espindola e3cfb6c760 Update test to use llvm-readobj. NFC.
llvm-svn: 221074
2014-11-02 01:12:02 +00:00
David Blaikie 4729c78bc0 Test 221067 in a fixed-target test so as not to fail on targets with different DWARF encodings
llvm-svn: 221071
2014-11-01 23:50:59 +00:00
David Majnemer 634ca236dc InstCombine: Don't assume that m_ZExt matches an Instruction
m_ZExt might bind against a ConstantExpr instead of an Instruction.
Assuming this, using cast<Instruction>, results in InstCombine crashing.

Instead, introduce ZExtOperator to bridge both Instruction and
ConstantExpr ZExts.

This fixes PR21445.

llvm-svn: 221069
2014-11-01 23:46:05 +00:00
David Blaikie eed309da88 Remove test coverage added in 221067 due to it being non-portable.
Will try to find a portable way to test this (or a fixed-target test I
can add such coverage to) shortly.

llvm-svn: 221068
2014-11-01 23:42:30 +00:00
David Blaikie 983bfea0d0 Remove DwarfUnit::LabelEnd in favor of computing the length of the section directly
This was a compile-unit specific label (unused in type units) and seems
unnecessary anyway when we can more easily directly compute the size of
the compile unit.

llvm-svn: 221067
2014-11-01 23:07:14 +00:00
David Majnemer 549f4f2510 InstCombine: Combine (X+cst) < 0 --> X < -cst
This can happen pretty often in code that looks like:
int foo = bar - 1;
if (foo < 0)
  do stuff

In this case, bar < 1 is an equivalent condition.

This transform requires that the add instruction be annotated with nsw.

llvm-svn: 221045
2014-11-01 09:09:51 +00:00
Adrian Prantl a0852d2be3 Revert "Temporarily revert r220777 to sort out build bot breakage."
This reverts commit r221028. Later commits depend on this and
reverting just this one causes even more bots to fail.

llvm-svn: 221041
2014-11-01 03:19:45 +00:00
Diego Novillo d5336ae269 Add show and merge tools for sample PGO profiles.
Summary:
This patch extends the 'show' and 'merge' commands in llvm-profdata to handle
sample PGO formats. Using the 'merge' command it is now possible to convert
one sample PGO format to another.

The only format that is currently not working is 'gcc'. I still need to
implement support for it in lib/ProfileData.

The changes in the sample profile support classes are needed for the
merge operation.

Reviewers: bogner

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6065

llvm-svn: 221032
2014-11-01 00:56:55 +00:00
Adrian Prantl cd4872399a Temporarily revert r220777 to sort out build bot breakage.
"[x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros."

llvm-svn: 221028
2014-11-01 00:26:59 +00:00
Reid Kleckner 9abe268adb Revert "R600: Make sure to inline all internal functions"
This reverts commit r220996.

It introduced layering violations causing link errors in many
configurations.

llvm-svn: 221020
2014-10-31 23:35:26 +00:00
Rafael Espindola 4e27567a12 Refactor duplicated code in liking GlobalValues.
There is quiet a bit of logic that is common to any GlobalValue but was
duplicated for Functions, GlobalVariables and GlobalAliases.

While at it, merge visibility even when comdats are used, fixing pr21415.

llvm-svn: 221014
2014-10-31 23:10:07 +00:00
Michael Zolotukhin 9b9624de0c Correctly update dom-tree after loop vectorizer.
llvm-svn: 221009
2014-10-31 22:28:03 +00:00
Tom Stellard 5b2927fe83 R600: Don't promote allocas when one of the users is a ptrtoint instruction
We need to figure out how to track ptrtoint values all the
way until result is converted back to a pointer in order
to correctly rewrite the pointer type.

llvm-svn: 220997
2014-10-31 20:52:04 +00:00
Tom Stellard aa73831757 R600: Make sure to inline all internal functions
Function calls aren't supported yet.

llvm-svn: 220996
2014-10-31 20:52:02 +00:00
Bill Schmidt 1ca69fa64d [PowerPC] Initial VSX intrinsic support, with min/max for vector double
Now that we have initial support for VSX, we can begin adding
intrinsics for programmer access to VSX instructions.  This patch adds
basic support for VSX intrinsics in general, and tests it by
implementing intrinsics for minimum and maximum for the vector double
data type.

The LLVM portion of this is quite straightforward.  There is a
companion patch for Clang.

llvm-svn: 220988
2014-10-31 19:19:07 +00:00
Kostya Serebryany ea48bdc702 [asan] do not treat inline asm calls as indirect calls
llvm-svn: 220985
2014-10-31 18:38:23 +00:00
Quentin Colombet c32615dfef [CodeGenPrepare] Move extractelement close to store if they can be combined.
This patch adds an optimization in CodeGenPrepare to move an extractelement
right before a store when the target can combine them.
The optimization may promote any scalar operations to vector operations in the
way to make that possible.


** Context **

Some targets use different register files for both vector and scalar operations.
This means that transitioning from one domain to another may incur copy from one
register file to another. These copies are not coalescable and may be expensive.
For example, according to the scheduling model, on cortex-A8 a vector to GPR
move is 20 cycles.


** Motivating Example **

Let us consider an example:
define void @foo(<2 x i32>* %addr1, i32* %dest) {
 %in1 = load <2 x i32>* %addr1, align 8
 %extract = extractelement <2 x i32> %in1, i32 1
 %out = or i32 %extract, 1
 store i32 %out, i32* %dest, align 4
 ret void
}

As it is, this IR generates the following assembly on armv7:
  vldr  d16, [r0]            @vector load  
  vmov.32 r0, d16[1]  @ cross-register-file copy: 20 cycles
  orr r0, r0, #1           @ scalar bitwise or
  str r0, [r1]               @ scalar store
  bx  lr

Whereas we could generate much faster code:
  vldr  d16, [r0]               @ vector load
  vorr.i32  d16, #0x1     @ vector bitwise or
  vst1.32 {d16[1]}, [r1:32] @ vector extract + store
  bx  lr

Half of the computation made in the vector is useless, but this allows to get
rid of the expensive cross-register-file copy.


** Proposed Solution **

To avoid this cross-register-copy penalty, we promote the scalar operations to
vector operations. The penalty will be removed if we manage to promote the whole
chain of computation in the vector domain.
Currently, we do that only when the chain of computation ends by a store and the
target is able to combine an extract with a store.

Stores are the most likely candidates, because other instructions produce values
that would need to be promoted and so, extracted as some point[1]. Moreover,
this is customary that targets feature stores that perform a vector extract (see
AArch64 and X86 for instance).

The proposed implementation relies on the TargetTransformInfo to decide whether
or not it is beneficial to promote a chain of computation in the vector domain.
Unfortunately, this interface is rather inaccurate for this level of details and
although this optimization may be beneficial for X86 and AArch64, the inaccuracy
will lead to the optimization being too aggressive.
Basically in TargetTransformInfo, everything that is legal has a cost of 1,
whereas, even if a vector type is legal, usually a vector operation is slightly
more expensive than its scalar counterpart. That will lead to too many
promotions that may not be counter balanced by the saving of the
cross-register-file copy. For instance, on AArch64 this penalty is just 4
cycles.

For now, the optimization is just enabled for ARM prior than v8, since those
processors have a larger penalty on cross-register-file copies, and the scope is
limited to basic blocks. Because of these two factors, we limit the effects of
the inaccuracy. Indeed, I did not want to build up a fancy cost model with block
frequency and everything on top of that.

[1] We can imagine targets that can combine an extractelement with  other
instructions than just stores. If we want to go into that direction, the current
interfaces must be augmented and, moreover, I think this becomes a global isel
problem.

Differential Revision: http://reviews.llvm.org/D5921

<rdar://problem/14170854>

llvm-svn: 220978
2014-10-31 17:52:53 +00:00
Kostya Serebryany 001ea5fe15 [asan] fix caller-calee instrumentation to emit new cache for every call site
llvm-svn: 220973
2014-10-31 17:11:27 +00:00
Rafael Espindola 67926f10e8 Unify and update link-messages.ll and redefinition.ll. NFC.
llvm-svn: 220968
2014-10-31 16:52:30 +00:00
Chad Rosier a675e550ca [AArch64] CondOpt pass is missing FCMP instructions when searching backward for
a CMP which defines the flags used by B.CC.

http://reviews.llvm.org/D6047
Patch by Zhaoshi Zheng <zhaoshiz@codeaurora.org>!

llvm-svn: 220961
2014-10-31 15:17:36 +00:00
Bradley Smith 9992b167ae [SCEV] Improve Scalar Evolution's use of no {un,}signed wrap flags
In a case where we have a no {un,}signed wrap flag on the increment, if
RHS - Start is constant then we can avoid inserting a max operation bewteen
the two, since we can statically determine which is greater.

This allows us to unroll loops such as:

 void testcase3(int v) {
   for (int i=v; i<=v+1; ++i)
     f(i);
 }

llvm-svn: 220960
2014-10-31 11:40:32 +00:00
Ulrich Weigand c8c2ea2854 [PowerPC] Load BlockAddress values from the TOC in 64-bit SVR4 code
Since block address values can be larger than 2GB in 64-bit code, they
cannot be loaded simply using an @l / @ha pair, but instead must be
loaded from the TOC, just like GlobalAddress, ConstantPool, and
JumpTable values are.

The commit also fixes a bug in PPCLinuxAsmPrinter::doFinalization where
temporary labels could not be used as TOC values, since code would
attempt (and fail) to use GetOrCreateSymbol to create a symbol of the
same name as the temporary label.

llvm-svn: 220959
2014-10-31 10:33:14 +00:00
Peter Zotov e2b8b1431c [OCaml] Ensure consistent naming.
Specifically:
  * Directories match module names.
  * Test names match module names.
  * The language is called "OCaml", not "Ocaml".

llvm-svn: 220958
2014-10-31 09:19:03 +00:00
Peter Zotov b1f54ff42f [OCaml] Rework Llvm_executionengine using ctypes.
Since JIT->MCJIT migration, most of the ExecutionEngine interface
became deprecated and/or broken. This especially affected the OCaml
bindings, as runFunction is no longer available, and unlike in C,
it is not possible to coerce a pointer to a function and call it
in OCaml.

In practice, LLVM 3.5 shipped completely unusable
Llvm_executionengine.

The GenericValue interface and runFunction were essentially
a poor man's FFI. As such, this interface was removed and instead
a dependency on ctypes >=0.3 added, which handled platform-specific
aspects of accessing data and calling functions.

The new interface does not expose JIT (which is a shim around MCJIT),
as well as the interpreter (which can't handle a lot of valid IR).

Llvm_executionengine.add_global_mapping is currently unusable
due to PR20656.

llvm-svn: 220957
2014-10-31 09:05:36 +00:00
Rafael Espindola 5b24759dff Move an input file to Inputs instead of using RUN: true.
llvm-svn: 220953
2014-10-31 05:54:15 +00:00
David Majnemer c7d7c6fb3a Object, COFF: Cleanup symbol type code, improve binutils compatibility
Do a better job classifying symbols.  This increases the consistency
between the COFF handling code and the ELF side of things.

llvm-svn: 220952
2014-10-31 05:07:00 +00:00
Rafael Espindola e5204efeaf merge tests for constant linking.
llvm-svn: 220951
2014-10-31 05:04:16 +00:00
Hao Liu e02b1a068f PR20557: Fix the bug that bogus cpu parameter crashes llc on AArch64 backend.
Initial patch by Oleg Ranevskyy.

llvm-svn: 220945
2014-10-31 02:35:34 +00:00
Ahmed Bougacha 9f336c4ec5 [SelectionDAG] When scalarizing trunc, don't assert for legal operands.
r212242 introduced a legalizer hook, originally to let AArch64 widen
v1i{32,16,8} rather than scalarize, because the legalizer expected, when
scalarizing the result of a conversion operation, to already have
scalarized the operands.  On AArch64, v1i64 is legal, so that commit
ensured operations such as v1i32 = trunc v1i64 wouldn't assert.

It did that by choosing to widen v1 types whenever possible.  However,
v1i1 types, for which there's no legal widened type, would still trigger
the assert.

This commit fixes that, by only scalarizing a trunc's result when the
operand has already been scalarized, and introducing an extract_elt
otherwise.  
This is similar to r205625.

Fixes PR20777.

llvm-svn: 220937
2014-10-30 23:46:50 +00:00
NAKAMURA Takumi d9913e6d35 llvm/test/Transforms/SampleProfile/syntax.ll: Relax MISSING-FILE not to
check locale-aware message catalog.

llvm-svn: 220934
2014-10-30 22:28:46 +00:00
Louis Gerbarg e8f9c78247 Fix incorrect invariant check in DAG Combine
Earlier this summer I fixed an issue where we were incorrectly combining
multiple loads that had different constraints such alignment, invariance,
temporality, etc. Apparently in one case I made copt paste error and swapped
alignment and invariance.

Tests included.

rdar://18816719

llvm-svn: 220933
2014-10-30 22:21:03 +00:00
Rafael Espindola d1e64b1e93 Fix the merging of the constantness of declarations.
The langref says:

LLVM explicitly allows declarations of global variables to be marked
constant, even if the final definition of the global is not. This
capability can be used to enable slightly better optimization of the
program, but requires the language definition to guarantee that
optimizations based on the ‘constantness’ are valid for the
translation units that do not include the definition.

Given that definition, when merging two declarations, we have to drop
constantness if of of them is not marked contant, since the Module
without the constant marker might not have the necessary guarantees.

llvm-svn: 220927
2014-10-30 20:50:23 +00:00
Philip Reames 4cb4d3e048 Add handling for range metadata in ValueTracking isKnownNonZero
If we load from a location with range metadata, we can use information about the ranges of the loaded value for optimization purposes.  This helps to remove redundant checks and canonicalize checks for other optimization passes.  This particular patch checks whether a value is known to be non-zero from the range metadata.

Currently, these tests are against InstCombine.  In theory, all of these should be InstSimplify since we're not inserting any new instructions.  Moving the code may follow in a separate change.

Reviewed by: Hal
Differential Revision: http://reviews.llvm.org/D5947

llvm-svn: 220925
2014-10-30 20:25:19 +00:00
Rafael Espindola f9a94abd4e Update test to pass .ll to llvm-link and use Inputs.
llvm-svn: 220924
2014-10-30 20:23:59 +00:00
David Blaikie 76fd43c653 PR21408: Workaround the appearance of duplicate variables due to problems when inlining two calls to the same function from the same call site.
llvm-svn: 220923
2014-10-30 20:20:11 +00:00
Peter Zotov cf8f7a10b7 lit: PR21417: don't try to update OCAMLPATH if LibDir is empty.
llvm-svn: 220919
2014-10-30 19:26:42 +00:00
Diego Novillo c572e92c76 Add profile writing capabilities for sampling profiles.
Summary:
This patch finishes up support for handling sampling profiles in both
text and binary formats. The new binary format uses uleb128 encoding to
represent numeric values. This makes profiles files about 25% smaller.

The profile writer class can write profiles in the existing text and the
new binary format. In subsequent patches, I will add the capability to
read (and perhaps write) profiles in the gcov format used by GCC.

Additionally, I will be adding support in llvm-profdata to manipulate
sampling profiles.

There was a bit of refactoring needed to separate some code that was in
the reader files, but is actually common to both the reader and writer.

The new test checks that reading the same profile encoded as text or
raw, produces the same results.

Reviewers: bogner, dexonsmith

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6000

llvm-svn: 220915
2014-10-30 18:00:06 +00:00
Tim Northover a94cd25188 ARM: test default values for TAG_CPU_unaligned_access attribute.
It should be on for every target that supports unaligned accesses (e.g. not
v6m).

Patch by Charlie Turner.

llvm-svn: 220912
2014-10-30 17:05:44 +00:00
Robert Khasanov af318f7073 [AVX512] Added VBROADCAST{SS/SD} encoding for VL subset.
Refactored through AVX512_maskable
        

llvm-svn: 220908
2014-10-30 14:21:47 +00:00
Peter Collingbourne dd3486ece1 [dfsan] New calling convention for custom functions with variadic arguments.
Summary:
The previous calling convention prevented custom functions from being able
to access argument labels unless it knew how many variadic arguments there
were, and of which type. This restriction made it impossible to correctly
model functions in the printf family, as it is legal to pass more arguments
than required to those functions. We now pass arguments in the following order:

non-vararg arguments
labels for non-vararg arguments
[if vararg function, pointer to array of labels for vararg arguments]
[if non-void function, pointer to label for return value]
vararg arguments

Differential Revision: http://reviews.llvm.org/D6028

llvm-svn: 220906
2014-10-30 13:22:57 +00:00
Peter Zotov e75657b727 [OCaml] Expose LLVM{Get,Set}DLLStorageClass.
llvm-svn: 220902
2014-10-30 08:30:08 +00:00
Peter Zotov f481471495 [OCaml] Test code emission in Llvm_target.
Prior to this commit, the Llvm_target tests (ab)used
the Llvm_executionengine as a mechanism to initialize at least some
target. This needlessly restricted tests to builds which can emit
code for their host architecture.

llvm-svn: 220901
2014-10-30 08:30:01 +00:00
Peter Zotov d6d49fd0e2 [OCaml] Enable backtraces in tests.
llvm-svn: 220900
2014-10-30 08:29:57 +00:00
Peter Zotov 668f9670a6 [OCaml] [autoconf] Migrate to ocamlfind.
This commit updates the OCaml bindings and tests to use ocamlfind.
The bindings are migrated in order to use ctypes, which are now
required for MCJIT-backed Llvm_executionengine.
The tests are migrated in order to use OUnit and to verify that
the distributed META.llvm allows to build working executables.

Every OCaml toolchain invocation is now chained through ocamlfind,
which (in theory) allows to cross-compile the OCaml bindings.

The configure script now checks for ctypes (>= 0.2.3) and
OUnit (>= 2). The code depending on these libraries will be added
later. The configure script does not check the package versions
in order to keep changes less invasive.

Additionally, OCaml bindings will now be automatically enabled
if ocamlfind is detected on the system, rather than ocamlc, as it
was before.

llvm-svn: 220899
2014-10-30 08:29:45 +00:00
Rafael Espindola 919fb535a0 Enable the slp vectorizer in the gold plugin.
llvm-svn: 220887
2014-10-30 00:38:54 +00:00
Rafael Espindola 8391dbd1c6 Enable the loop vectorizer in the gold plugin.
llvm-svn: 220886
2014-10-30 00:11:24 +00:00
Rafael Espindola 4a3b6cf3c1 Replace also-emit-llvm with save-temps.
The also-emit-llvm option only supported getting the IR before optimizations.
This patch replaces it with a more generic save-temps option that saves the IR
both before and after optimizations.

llvm-svn: 220885
2014-10-29 23:54:45 +00:00
NAKAMURA Takumi 27b6d47f36 llvm/test/Transforms/LoopRotate/nosimplifylatch.ll: Fix possibly mis-repeatedly-pasted test.
llvm-svn: 220880
2014-10-29 23:05:12 +00:00
Yi Jiang 323d57336c Test Case for r220872:Do not simplifyLatch for loops where hoisting increments couldresult in extra live range interferance
llvm-svn: 220873
2014-10-29 20:20:33 +00:00
Yi Jiang ab19fff4d8 Do not simplifyLatch for loops where hoisting increments couldresult in extra live range interferance
llvm-svn: 220872
2014-10-29 20:19:47 +00:00
Jan Wen Voung ce2164f45c Fix getRelocationValueString to return the symbol name for EM_386.
Summary: This helps llvm-objdump -r to print out the symbol name along
with the relocation type on x86. Adjust existing tests from checking
for "Unknown" to check for the symbol now.

Test Plan: Adjusted test/Object tests.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5987

llvm-svn: 220866
2014-10-29 18:37:13 +00:00
Robert Khasanov 595e598869 [AVX512] Implemented AVX512VL FP bnary packed instructions (VADDP*, VSUBP*, VMULP*, VDIVP*, VMAXP*, VMINP*)
Refactored through AVX512_maskable
Added encoding tests for them.

llvm-svn: 220858
2014-10-29 15:43:02 +00:00
Peter Zotov f58626d5c9 [OCaml] Expose Llvm_target.TargetMachine.add_analysis_passes.
llvm-svn: 220846
2014-10-29 08:16:14 +00:00
Peter Zotov 5f28729c61 [OCaml] Expose Llvm_bitwriter.write_bitcode_to_memory_buffer.
llvm-svn: 220844
2014-10-29 08:16:01 +00:00
Peter Zotov 662538ac40 [OCaml] Drop support for 3.12.1 and earlier.
In practice this means:
  * Always using -g flag.
  * Embedding -cclib -lstdc++ into the corresponding cma/cmxa file.
    This also moves -lstdc++ in a single place.
  * Using caml_named_value instead of a homegrown mechanism.

llvm-svn: 220843
2014-10-29 08:15:54 +00:00
Peter Zotov e447b61c50 [OCaml] Synchronize transformations with LLVM-C.
Also, rearrange the functions in a way that allows to quickly
compare C headers and .mli/glue files.

llvm-svn: 220842
2014-10-29 08:15:21 +00:00
NAKAMURA Takumi 815d752b93 macho-symbolized-disassembly.test: Don't check C++ demangler unconditionally.
For example, MS PSDK is not expected to have <cxxabi.h>.
You should introduce the new feature in lit.cfg corresponding to HAVE_CXXABI_H if you would like to test demangler.

llvm-svn: 220840
2014-10-29 08:08:21 +00:00
Saleem Abdulrasool 56ade2991b test: tweak inlined-allocs test
Remove pointless checks for storage of uninteresting values.  Ensure that we
perform basic alias analysis to make the test more correct.  Finally, apply a
stylistic change to the test.

llvm-svn: 220839
2014-10-29 06:31:11 +00:00
Kevin Enderby 04bf6931cc Update llvm-objdump’s Mach-O symbolizer code to demangle C++ names.
llvm-svn: 220833
2014-10-28 23:39:46 +00:00
Peter Zotov 1f351ca15a [OCaml] PR5595: Pass LDFLAGS to tests via -cclib.
llvm-svn: 220832
2014-10-28 23:31:13 +00:00
Peter Zotov 796537119b [OCaml] Fix whitespace.
llvm-svn: 220829
2014-10-28 22:39:42 +00:00
Peter Zotov dacfc64b57 [OCaml] PR9719, PR14727: Make tests run without ocamlopt.
Previously, tests hardcoded ocamlopt and cmxa, which broke builds on
machines without ocamlopt. Instead, they now fall back to ocamlc.

As a side effect this fixes PR14727, which was caused by a crude hack
that replaced gcc with g++ everywhere in the ocamlopt native compiler
path and passes it back using -cc. Now the tests use the same
technique as META, i.e. -cclib -lstdc++. It might be more fragile
than using g++ explicitly, but it will break when the installed
package will also break, which is good.

llvm-svn: 220828
2014-10-28 22:39:36 +00:00
Peter Zotov 1afb7497c7 [OCaml] PR19859: Add functions to query and modify branches.
Patch by Gabriel Radanne <drupyog@zoho.com>.

llvm-svn: 220818
2014-10-28 19:47:02 +00:00
Peter Zotov 6074c344de [OCaml] PR19859: Add tests for reading the values of numeric constants.
Patch by Gabriel Radanne <drupyog@zoho.com>.

llvm-svn: 220816
2014-10-28 19:46:52 +00:00
Saleem Abdulrasool d178ada55e Transforms: reapply SVN r219899
This restores the commit from SVN r219899 with an additional change to ensure
that the CodeGen is correct for the case that was identified as being incorrect
(originally PR7272).

In the case that during inlining we need to synthesize a value on the stack
(i.e. for passing a value byval), then any function involving that alloca must
be stripped of its tailness as the restriction that it does not access the
parent's stack no longer holds.  Unfortunately, a single alloca can cause a
rippling effect through out the inlining as the value may be aliased or may be
mutated through an escaped external call.  As such, we simply track if an alloca
has been introduced in the frame during inlining, and strip any tail calls.

llvm-svn: 220811
2014-10-28 18:27:37 +00:00
Robert Khasanov eb12639375 [AVX512] Extended avx512_sqrt_packed (sqrt instructions) to VL subset.
Refactored through AVX512_maskable

llvm-svn: 220806
2014-10-28 18:15:20 +00:00
Robert Khasanov 3e534c93b9 [AVX-512] Expanded rsqrt/rcp instructions to VL subset.
Refactored multiclass through AVX512_maskable

llvm-svn: 220783
2014-10-28 16:37:13 +00:00
Robert Khasanov 4441c4d31b [x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros.
This transformation worked if selector is produced by SETCC, however SETCC is needed only if we consider to swap operands. So I replaced SETCC check for this case.
Added tests for vselect of <X x i1> values.

llvm-svn: 220777
2014-10-28 15:59:40 +00:00
Robert Khasanov dd09a8f320 [AVX512] Bring back vector-shuffle lowering support through broadcasts
Ffter commit at rev219046 512-bit broadcasts lowering become non-optimal. Most of tests on broadcasting and embedded broadcasting were changed and they doesn’t produce efficient code.

Example below is from commit changes (it’s the first test from test/CodeGen/X86/avx512-vbroadcast.ll):

 define   <16 x i32> @_inreg16xi32(i32 %a) {
 ; CHECK-LABEL: _inreg16xi32:
 ; CHECK:       ## BB#0:
-; CHECK-NEXT:    vpbroadcastd %edi, %zmm0
+; CHECK-NEXT:    vmovd %edi, %xmm0
+; CHECK-NEXT:    vpbroadcastd %xmm0, %ymm0
+; CHECK-NEXT:    vinserti64x4 $1, %ymm0, %zmm0, %zmm0
 ; CHECK-NEXT:    retq
 %b = insertelement <16 x i32> undef, i32 %a, i32 0
 %c = shufflevector <16 x i32> %b, <16 x i32> undef, <16 x i32> zeroinitializer
 ret <16 x i32> %c
}

Here, 256-bit broadcast was generated instead of 512-bit one.

In this patch
1) I added vector-shuffle lowering through broadcasts
2) Removed asserts and branches likes because this is incorrect
-  assert(Subtarget->hasDQI() && "We can only lower v8i64 with AVX-512-DQI");
3) Fixed lowering tests

llvm-svn: 220774
2014-10-28 12:28:51 +00:00
Reid Kleckner 9ccce99e1d X86: Implement the vectorcall calling convention
This is a Microsoft calling convention that supports both x86 and x86_64
subtargets. It passes vector and floating point arguments in XMM0-XMM5,
and passes them indirectly once they are consumed.

Homogenous vector aggregates of up to four elements can be passed in
sequential vector registers, but this part is not implemented in LLVM
and will be handled in Clang.

On 32-bit x86, it is similar to fastcall in that it uses ecx:edx as
integer register parameters and is callee cleanup. On x86_64, it
delegates to the normal win64 calling convention.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D5943

llvm-svn: 220745
2014-10-28 01:29:26 +00:00
Tim Northover 00917897b2 AArch64: enable Cortex-A57 FP balancing on Cortex-A53.
Benchmarks have shown that it's harmless to the performance there, and having a
unified set of passes between the two cores where possible helps big.LITTLE
deployment.

Patch by Z. Zheng.

llvm-svn: 220744
2014-10-28 01:24:32 +00:00
Adam Nemet cf7a4a2660 [AVX512] Add vpermil variable version
This is implemented via a multiclass that derives from the vperm imm
multiclass.

Fixes <rdar://problem/18426089>

llvm-svn: 220737
2014-10-27 23:08:40 +00:00
Pete Cooper 7c801dc90b Fix a stackmap bug introduced in r220710.
For a call to not return in to the stackmap shadow, the shadow must end with the call.

To do this, we must insert any required nops *before* the call, and not after it.

llvm-svn: 220728
2014-10-27 22:38:45 +00:00
Juergen Ributzka 7ccebec668 [FastISel][AArch64] Emit immediate version of icmp (subs) for null pointer check.
This is a minor change to use the immediate version when the operand is a null
value. This should get rid of an unnecessary 'mov' instruction in debug
builds and align the code more with the one generated by SelectionDAG.

This fixes rdar://problem/18785125.

llvm-svn: 220713
2014-10-27 19:58:36 +00:00
Juergen Ributzka 0190fea941 [FastISel][AArch64] Optimize compare-and-branch for i1 to use 'tbz'.
Minor enhancement to use 'tbz' for i1 compare-and-branch to get rid of an 'and'
instruction.

This fixes rdar://problem/18784953.

llvm-svn: 220712
2014-10-27 19:46:23 +00:00
Pete Cooper 3c0af35232 Stackmap shadows should consider call returns a branch target.
To avoid emitting too many nops, a stackmap shadow can include emitted instructions in the shadow, but these must not include branch targets.

A return from a call should count as a branch target as patching over the instructions after the call would lead to incorrect behaviour for threads currently making that call, when they return.

llvm-svn: 220710
2014-10-27 19:40:35 +00:00
Juergen Ributzka 90f741a2ce [FastISel][AArch64] Use 'cbz' also for null values (pointers).
The pattern matching for a 'ConstantInt' value was too restrictive. Checking for
a 'Constant' with a bull value is sufficient for using an 'cbz/cbnz' instruction.

This fixes rdar://problem/18784732.

llvm-svn: 220709
2014-10-27 19:38:05 +00:00
Juergen Ributzka eae91040d8 [FastISel][AArch64] Don't fold the 'and' instruction into the 'tbz/tbnz' instruction if it is in a different basic block.
This fixes a bug where the input register was not defined for the 'tbz/tbnz'
instruction. This happened, because we folded the 'and' instruction from a
different basic block.

This fixes rdar://problem/18784013.

llvm-svn: 220704
2014-10-27 19:16:48 +00:00
Juergen Ributzka 6de054a25a [FastISel][AArch64] Fix load/store with frame indices.
At higher optimization levels the LLVM IR may contain more complex patterns for
loads/stores from/to frame indices. The 'computeAddress' function wasn't able to
handle this and triggered an assertion.

This fix extends the possible addressing modes for frame indices.

This fixes rdar://problem/18783298.

llvm-svn: 220700
2014-10-27 18:21:58 +00:00
Kostya Serebryany 4f8f0c5aa2 [asan] experimental tracing for indirect calls, llvm part.
llvm-svn: 220699
2014-10-27 18:13:56 +00:00
Oliver Stannard 79efe41a0c [ARM] Select VMAXNM and VMINNM regardless of operand order
Currently, the ARM backend will select the VMAXNM and VMINNM for these C
expressions:
  (a < b) ? a : b
  (a > b) ? a : b
but not these expressions:
  (a > b) ? b : a
  (a < b) ? b : a

This patch allows all of these expressions to be matched.

llvm-svn: 220671
2014-10-27 09:23:02 +00:00
David Majnemer c8bdd23acf InstCombine: Fix a combine assuming that icmp operands were integers
An icmp may have pointer arguments, it isn't limited to integers or
vectors of integers.

This fixes PR21388.

llvm-svn: 220664
2014-10-27 05:47:49 +00:00
Elena Demikhovsky 4b01b7306c AVX-512: Fixed encoding of VPBROADCASTM and added SKX forms of this instruction
llvm-svn: 220638
2014-10-26 09:52:24 +00:00
Peter Zotov 8cdf0425a0 [OCaml] hexagon can't run MCJIT tests, XFAIL it.
llvm-svn: 220621
2014-10-25 19:01:14 +00:00
Peter Zotov 3944e6e223 [OCaml] Unbreak Llvm_executionengine.initialize_native_target.
First, return true on success, as it is the OCaml convention.
Second, also initialize the native assembly printer, which is,
despite the name, required for MCJIT operation.

Since this function did not initialize the assembly printer earlier
and no function to initialize native assembly printer was available
elsewhere, it is safe to break its interface: it means that it
simply could not be used successfully before.

llvm-svn: 220620
2014-10-25 18:50:02 +00:00
Peter Zotov d1531a2349 [OCaml] Expose Llvm_executionengine.ExecutionEngine.create_mcjit.
llvm-svn: 220619
2014-10-25 18:49:56 +00:00
Jingyue Wu fe72fcebf6 [SeparateConstOffsetFromGEP] Fixed a bug related to unsigned modulo
The dividend in "signed % unsigned" is treated as unsigned instead of signed,
causing unexpected behavior such as -64 % (uint64_t)24 == 0.

Added a regression test in split-gep.ll

Patched by Hao Liu.

llvm-svn: 220618
2014-10-25 18:34:03 +00:00
Jingyue Wu b723152379 [SeparateConstOffsetFromGEP] Fixed a bug in rebuilding OR expressions
The two operands of the new OR expression should be NextInChain and TheOther
instead of the two original operands.

Added a regression test in split-gep.ll.

Hao Liu reported this bug, and provded the test case and an initial patch.
Thanks! 

llvm-svn: 220615
2014-10-25 17:36:21 +00:00
Jingyue Wu ea51161a94 [NVPTX] aligned byte-buffers for vector return types
Summary:
Fixes PR21100 which is caused by inconsistency between the declared return type
and the expected return type at the call site. The new behavior is consistent
with nvcc and the NVPTXTargetLowering::getPrototype function.

Test Plan: test/Codegen/NVPTX/vector-return.ll

Reviewers: jholewinski

Reviewed By: jholewinski

Subscribers: llvm-commits, meheff, eliben, jholewinski

Differential Revision: http://reviews.llvm.org/D5612

llvm-svn: 220607
2014-10-25 03:46:16 +00:00
Rafael Espindola 6e0b559d4f Add a test for the -suppress-warnings option.
llvm-svn: 220603
2014-10-25 01:14:15 +00:00
Evgeniy Stepanov d337a59db5 [msan] Make -msan-check-constant-shadow a bit stronger.
Allow (under the experimental flag) non-Instructions to participate in MSan checks.

llvm-svn: 220601
2014-10-24 23:34:15 +00:00
Kevin Enderby 2813f496d9 Fix a Mach-O assembler segfault for a subtraction expression with an undefined symbol.
In a Mach-O object file a relocatable expression of the form
SymbolA - SymbolB + constant is allowed when both symbols are
defined in a section.  But when either symbol is undefined it
is an error.

The code was crashing when it had an undefined symbol in this case.
And should have printed a error message using the location information
in the relocation entry.

rdar://18678402

llvm-svn: 220599
2014-10-24 22:39:40 +00:00
Simon Pilgrim fd080af0c5 [X86][SSE] Bitcast assertion in XFormVExtractWithShuffleIntoLoad
Minor patch to fix an issue in XFormVExtractWithShuffleIntoLoad where a load is unary shuffled, then bitcast (to a type with the same number of elements) before extracting an element.

An undef was created for the second shuffle operand using the original (post-bitcasted) vector type instead of the pre-bitcasted type like the rest of the shuffle node - this was then causing an assertion on the different types later on inside SelectionDAG::getVectorShuffle.

Differential Revision: http://reviews.llvm.org/D5917

llvm-svn: 220592
2014-10-24 21:04:41 +00:00
Colin LeMahieu 838307b31f [Hexagon] Resubmission of 220427
Modified library structure to deal with circular dependency between HexagonInstPrinter and HexagonMCInst.
Adding encoding bits for add opcode.
Adding llvm-mc tests.
Removing unit tests.

http://reviews.llvm.org/D5624

llvm-svn: 220584
2014-10-24 19:00:32 +00:00
Sanjay Patel f924e11967 Allow AVX vrsqrtps generation.
This is a follow-on to r220570 that allows a 256-bit (v8f32)
version of vrsqrtps to be generated.

llvm-svn: 220579
2014-10-24 17:59:18 +00:00
Sanjay Patel 957efc23bb Use rsqrt (X86) to speed up reciprocal square root calcs
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.

For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).

This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..

See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900

Differential Revision: http://reviews.llvm.org/D5658

llvm-svn: 220570
2014-10-24 17:02:16 +00:00
Daniel Sanders 19f01658fe [mips] For N32/N64, structs must be passed in the upper bits of a register.
Summary:
Most structs were fixed by r218451 but those of between >32-bits and
<64-bits remained broken since they were not marked with [ASZ]ExtUpper.
This patch fixes the remaining cases by using
CCPromoteToUpperBitsInType<i64> on i64's in addition to i32 and smaller.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5963

llvm-svn: 220556
2014-10-24 13:09:19 +00:00
Oliver Stannard f7a5afc3f2 [AArch64] Fix fast-isel of cbz of i1, i8, i16
This fixes a miscompilation in the AArch64 fast-isel which was
triggered when a branch is based on an icmp with condition eq or ne,
and type i1, i8 or i16. The cbz instruction compares the whole 32-bit
register, so values with the bottom 1, 8 or 16 bits clear would cause
the wrong branch to be taken.

llvm-svn: 220553
2014-10-24 09:54:41 +00:00
Timur Iskhodzhanov b320527e39 Update test/MC/ARM/coff-debugging-secrel.ll expectations to fix breakage caused by r220544
llvm-svn: 220548
2014-10-24 06:24:07 +00:00
Timur Iskhodzhanov 2bc90fdbdc Fix PR21189 -- Emit symbol subsection required to debug LLVM-built binaries with VS2012+
Reviewed at http://reviews.llvm.org/D5772

llvm-svn: 220544
2014-10-24 01:27:45 +00:00
Ahmed Bougacha 26f63f77cd Make test for r220533 more robust by using GPR pattern.
llvm-svn: 220541
2014-10-24 00:03:46 +00:00
Adam Nemet 832ec5e911 [AVX512] FMA support for the 231 variants
This is asm/diasm-only support, similar to AVX.

For ISeling the register variant, they are no different from 213 other than
whether the multiplication or the addition operand is destructed.

For ISeling the memory variant, i.e. to fold a load, they are no different
than the 132 variant.  The addition operand (op3) in both cases can come from
memory.  Again the ony difference is which operand is destructed.

There could be a post-RA pass that would convert a 213 or 132 into a 231.

Part of <rdar://problem/17082571>

llvm-svn: 220540
2014-10-24 00:03:00 +00:00
Ahmed Bougacha 7daf3b89f9 [SelectionDAG] Teach the vector scalarizer about FP conversions.
This adds support for legalization of instructions of the form:

  [fp_conv] <1 x i1> %op to <1 x double>

where fp_conv is one of fpto[us]i, [us]itofp.  This used to assert
because they were simply missing from the vector operand scalarizer.

A similar problem arose in r190830, with trunc instead.

Fixes PR20778.

Differential Revision: http://reviews.llvm.org/D5810

llvm-svn: 220533
2014-10-23 22:49:25 +00:00
Tim Northover e4c7be56bf ScheduleDAG: record PhysReg dependencies represented by CopyFromReg nodes
x86's CMPXCHG -> EFLAGS consumer wasn't being recorded as a real EFLAGS
dependency because it was represented by a pair of CopyFromReg(EFLAGS) ->
CopyToReg(EFLAGS) nodes. ScheduleDAG was expecting the source to be an
implicit-def on the instruction, where the result numbers in the DAG and the
Uses list in TableGen matched up precisely.

The Copy notation seems much more robust, so this patch extends ScheduleDAG
rather than refactoring x86.

Should fix PR20376.

llvm-svn: 220529
2014-10-23 22:31:48 +00:00
David Blaikie 1dd573db45 DebugInfo: Remove DwarfDebug::CurrentFnArguments since we have to handle argument ordering of other arguments (abstract arguments) in the same way and already have code for that too.
While refactoring this code I was confused by both the name I had
introduced (addNonArgumentVariable... but it has all this logic to
handle argument numbering and keep things in order?) and by the
redundancy. Seems when I fixed the misordered inlined argument handling,
I didn't realize it was mostly redundant with the argument ordering code
(which I may've also written, I'm not sure). So let's just rely on the
more general case.

The only oddity in output this produces is that it means when we emit
all the variables for the current function, we don't track when we've
finished the argument variables and are about to start the local
variables and insert DW_AT_unspecified_parameters (for varargs
functions) there. Instead it ends up after the local variables, scopes,
etc. But this isn't invalid and doesn't cause DWARF consumers problems
that I know of... so we'll just go with that because it makes the code
nice & simple.

(though, let's see what the buildbots have to say about this - *crosses
fingers*)

There will be some cleanup commits to follow to remove the now trivial
wrappers, etc.

llvm-svn: 220527
2014-10-23 22:27:50 +00:00
Timur Iskhodzhanov 56af52f852 PR21189: Teach llvm-readobj to dump bits of COFF symbol subsections required to debug using VS2012+
Reviewed at http://reviews.llvm.org/D5755
Thanks to Andrey Guskov for his help investigating this!

llvm-svn: 220526
2014-10-23 22:25:31 +00:00
Ahmed Bougacha 5175bcf43a [X86] Improve mul w/ overflow codegen, to MUL8+SETO.
Currently, @llvm.smul.with.overflow.i8 expands to 9 instructions, where
3 are really needed.

This adds X86ISD::UMUL8/SMUL8 SD nodes, and custom lowers them to
MUL8/IMUL8 + SETO.

i8 is a special case because there is no two/three operand variants of
(I)MUL8, so the first operand and return value need to go in AL/AX.

Also, we can't write patterns for these instructions: TableGen refuses
patterns where output operands don't match SDNode results. In this case,
instructions where the output operand is an implicitly defined register.

A related special case (and FIXME) exists for MUL8 (X86InstrArith.td):

  // FIXME: Used for 8-bit mul, ignore result upper 8 bits.
  // This probably ought to be moved to a def : Pat<> if the
  // syntax can be accepted.
  [(set AL, (mul AL, GR8:$src)), (implicit EFLAGS)]

Ideally, these go away with UMUL8, but we still need to improve TableGen
support of implicit operands in patterns.

Before this change:
  movsbl  %sil, %eax
  movsbl  %dil, %ecx
  imull   %eax, %ecx
  movb    %cl, %al
  sarb    $7, %al
  movzbl  %al, %eax
  movzbl  %ch, %esi
  cmpl    %eax, %esi
  setne   %al

After:
  movb    %dil, %al
  imulb   %sil
  seto    %al

Also, remove a made-redundant testcase for PR19858, and enable more FastISel
ALU-overflow tests for SelectionDAG too.

Differential Revision: http://reviews.llvm.org/D5809

llvm-svn: 220516
2014-10-23 21:55:31 +00:00
Sanjay Patel 848309da7c Handle sqrt() shrinking in SimplifyLibCalls like any other call
This patch removes a chunk of special case logic for folding 
(float)sqrt((double)x) -> sqrtf(x)
in InstCombineCasts and handles it in the mainstream path of SimplifyLibCalls.

No functional change intended, but I loosened the restriction on the existing
sqrt testcases to allow for this optimization even without unsafe-fp-math because
that's the existing behavior.

I also added a missing test case for not shrinking the llvm.sqrt.f64 intrinsic
in case the result is used as a double.

Differential Revision: http://reviews.llvm.org/D5919

llvm-svn: 220514
2014-10-23 21:52:45 +00:00
Peter Collingbourne dfc98c2e61 Make llvm-go test dependency optional.
llvm-svn: 220503
2014-10-23 19:51:40 +00:00
Kevin Enderby 6f326ce75b Update llvm-objdump’s Mach-O symbolizer code for Objective-C references.
This prints disassembly comments for Objective-C references to CFStrings,
Selectors, Classes and method calls.

llvm-svn: 220500
2014-10-23 19:37:31 +00:00
Rafael Espindola 52b249b9f4 Cleanup this test a bit.
Use simpler names and remove unnecessary fields.

llvm-svn: 220499
2014-10-23 19:36:21 +00:00
Rafael Espindola 2c301f5c8c Cleanup this test a bit.
Use simpler names and remove unnecessary fields.

llvm-svn: 220498
2014-10-23 19:23:42 +00:00
David Blaikie 49cfc8ca8e DebugInfo: Simplify/tidy/correct global variable decl/def emission handling.
This fixes a bug (introduced by fixing the IR emitted from Clang where
the definition of a static member would be scoped within the class,
rather than within its lexical decl context) where the definition of a
static variable would be placed inside a class.

It also improves source fidelity by scoping static class member
definitions inside the lexical decl context in which tehy are written
(eg: namespace n { class foo { static int i; } int foo::i; } - the
definition of 'i' will be within the namespace 'n' in the DWARF output
now).

Lastly, and the original goal, this reduces debug info size slightly
(and makes debug info easier to read, etc) by placing the definitions of
non-member global variables within their namespace, rather than using a
separate namespace-scoped declaration along with a definition at global
scope.

Based on patches and discussion with Frédéric.

llvm-svn: 220497
2014-10-23 19:12:43 +00:00
Rafael Espindola 28d6b27bba Make this test a bit stricter.
This now:
* Forces the linker to include the internal definition.
* Checks the full output.

llvm-svn: 220495
2014-10-23 18:52:46 +00:00
Rafael Espindola c16ec3ed42 Make this test a bit stricter.
This now:
* Forces the linker to include the internal definition.
* Checks the full output.

llvm-svn: 220494
2014-10-23 18:44:07 +00:00
Reid Kleckner 5b2787dfb2 Revert "Don't count inreg params when mangling fastcall functions"
This reverts commit r214981.

I'm not sure what I was thinking when I wrote this. Testing with MSVC
shows that this function is mangled to '@f@8':
  int __fastcall f(int a, int b);

llvm-svn: 220492
2014-10-23 17:50:42 +00:00
Renato Golin 6fb9c2ea70 Do not emit intermediate register for zero FP immediate
This updates check for double precision zero floating point constant to allow
use of instruction with immediate value rather than temporary register.
Currently "a == 0.0", where "a" is of "double" type generates:

vmov.i32        d16, #0x0
vcmpe.f64       d0, d16

With this change it becomes:

vcmpe.f64        d0, #0

Patch by Sergey Dmitrouk.

llvm-svn: 220486
2014-10-23 15:31:50 +00:00
NAKAMURA Takumi 504bbf91cd Revert r220427, "[Hexagon] Adding encoding bits for add opcode."
It brought cyclic dependecy between HexagonAsmPrinter and HexagonDesc.

llvm-svn: 220478
2014-10-23 11:31:22 +00:00
Zoran Jovanovic 42b8444372 [mips][microMIPS] Implement ADDIUR1SP instruction
Differential Revision: http://reviews.llvm.org/D5153

llvm-svn: 220477
2014-10-23 11:13:59 +00:00
Zoran Jovanovic bac3619b29 ps][microMIPS] Implement ADDIUR2 instruction
Differential Revision: http://reviews.llvm.org/D5151

llvm-svn: 220476
2014-10-23 11:06:34 +00:00
Zoran Jovanovic 9bda2f1926 ps][microMIPS] Implement LI16 instruction
Differential Revision: http://reviews.llvm.org/D5149

llvm-svn: 220475
2014-10-23 10:59:24 +00:00
Zoran Jovanovic 4a00fdc2e3 [mips][microMIPS] Implement CodeGen support for SLL16 and SRL16 instructions
Differential Revision: http://reviews.llvm.org/D5774

llvm-svn: 220474
2014-10-23 10:42:01 +00:00
Oliver Stannard 39a85abddf [Thumb2] Improve disassembly of memory hints
Currently, the ARM disassembler will disassemble the Thumb2 memory hint
instructions (PLD, PLDW and PLI), even for targets which do not have
these instructions. This patch adds the required checks to the
disassmebler.

llvm-svn: 220472
2014-10-23 08:52:58 +00:00
Akira Hatanaka 2ee0e9e6ee [ARM, stack protector] If supported, use armv7 instructions.
This commit enables using movt/movw to load the stack guard address:

movw r0, :lower16:(L_g3$non_lazy_ptr-(LPC0_0+8))
movt r0, :upper16:(L_g3$non_lazy_ptr-(LPC0_0+8))
ldr r0, [pc, r0]

Previously a pc-relative load was emitted:

ldr r0, LCPI0_0
ldr r0, [pc, r0]

rdar://problem/18740489

llvm-svn: 220470
2014-10-23 04:17:05 +00:00
Frederic Riss e939b43aa4 [dwarfdump] Dump DW_AT_ranges values inline in the debug_info dump.
The output looks like that:
                      DW_AT_ranges [FORM_data4]    (0x00000000
                         [0x00000001000024a0 - 0x00000001000024c2)
                         [0x0000000100002505 - 0x000000010000268b))

Differential Revision: http://reviews.llvm.org/D5712

llvm-svn: 220466
2014-10-23 04:08:34 +00:00
Peter Collingbourne 244ecf55bd Add llvm-go tool.
This tool lets us build LLVM components within the tree by setting up a
$GOPATH that resembles a tree fetched in the normal way with "go get".

It is intended that components such as the Go frontend will be built in-tree
using this tool.

Differential Revision: http://reviews.llvm.org/D5902

llvm-svn: 220462
2014-10-23 02:33:23 +00:00
Derek Schuff 1fd051bfe8 Fix Mips nacl-mask test for new bundle-aligned label behavior
After r220439 the behavior of labels in bundle-align mode changed,
and I neglected to update this test.

llvm-svn: 220447
2014-10-22 23:32:00 +00:00
Derek Schuff 5f708e5ec8 [MC] Attach labels to existing fragments instead of using a separate fragment
Summary:
Currently when emitting a label, a new data fragment is created for it if the
current fragment isn't a data fragment.
This change instead enqueues the label and attaches it to the next fragment
(e.g. created for the next instruction) if possible.

When bundle alignment is not enabled, this has no functionality change (it
just results in fewer extra fragments being created). For bundle alignment,
previously labels would point to the beginning of the bundle padding instead
of the beginning of the emitted instruction. This was not only less efficient
(e.g. jumping to the nops instead of past them) but also led to miscalculation
of the address of the GOT (since MC uses a label difference rather than
emitting a "." symbol).

Fixes https://code.google.com/p/nativeclient/issues/detail?id=3982

Test Plan: regression test attached

Reviewers: jvoung, eliben

Subscribers: jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D5915

llvm-svn: 220439
2014-10-22 22:38:06 +00:00
Colin LeMahieu 73a51a1a68 [Hexagon] Adding encoding bits for add opcode.
Adding llvm-mc tests.
Removing unit tests.

http://reviews.llvm.org/D5624

llvm-svn: 220427
2014-10-22 20:58:35 +00:00
Chad Rosier dcd2a3014c [AArch64] Add support for the .inst directive.
This has been implement using the MCTargetStreamer interface as is done in the
ARM, Mips and PPC backends.

Phabricator: http://reviews.llvm.org/D5891
PR20964

llvm-svn: 220422
2014-10-22 20:35:57 +00:00
Justin Bogner 72d1f2b61b test: Make this test runnable in directories with @ in their names
Jenkins likes to use directories with names involving the '@'
character, which breaks the sed expression in this test. Switch to use
'|' on the assumption that it's less likely to show up in a path.

llvm-svn: 220401
2014-10-22 18:18:54 +00:00
Bill Schmidt 9c54bbd791 [PATCH] Support select-cc for VSFRC when VSX is enabled
A previous patch enabled SELECT_VSRC and SELECT_CC_VSRC for VSX to
handle <2 x double> cases.  This patch adds SELECT_VSFRC and
SELECT_CC_VSFRC to allow use of all 64 vector-scalar registers for the
f64 type when VSX is enabled.  The changes are analogous to those in
the previous patch.  I've added a new variant to vsx.ll to test the
code generation.

(I also cleaned up a little formatting in PPCInstrVSX.td from the
previous patch.)

llvm-svn: 220395
2014-10-22 16:58:20 +00:00
Sanjay Patel a92fa44740 Shrinkify libcalls: use float versions of double libm functions with fast-math (bug 17850)
When a call to a double-precision libm function has fast-math semantics 
(via function attribute for now because there is no IR-level FMF on calls), 
we can avoid fpext/fptrunc operations and use the float version of the call
if the input and output are both float.

We already do this optimization using a command-line option; this patch just
adds the ability for fast-math to use the existing functionality.

I moved the cl::opt from InstructionCombining into SimplifyLibCalls because
it's only ever used internally to that class.

Modified the existing test cases to use the unsafe-fp-math attribute rather
than repeating all tests.

This patch should solve: http://llvm.org/bugs/show_bug.cgi?id=17850

Differential Revision: http://reviews.llvm.org/D5893

llvm-svn: 220390
2014-10-22 15:29:23 +00:00
Diego Novillo a67c0b43e1 Change error to warning when a profile cannot be found.
When the profile for a function cannot be applied, we use to emit an
error. This seems extreme. The compiler can continue, it's just that the
optimization opportunities won't include profile information.

llvm-svn: 220386
2014-10-22 13:36:35 +00:00
Diego Novillo 8027b80b41 Support using sample profiles with partial debug info.
Summary:
When using a profile, we used to require the use -gmlt so that we could
get access to the line locations. This is used to match line numbers in
the input profile to the line numbers in the function's IR.

But this is actually not necessary. The driver can provide source
location tracking without the emission of debug information. In these
cases, the annotation 'llvm.dbg.cu' is missing from the IR, but the
actual line location annotations are still present.

This patch adds a new way of looking for the start of the current
function. Instead of looking through the compile units in llvm.dbg.cu,
we can walk up the scope for the first instruction in the function with
a debug loc. If that describes the function, we use it. Otherwise, we
keep looking until we find one.

If no such instruction is found, we then give up and produce an error.

Reviewers: echristo, dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5887

llvm-svn: 220382
2014-10-22 12:59:00 +00:00
Arnaud A. de Grandmaison 9b3330546b [AArch64] Cleanup A57PBQPConstraints
And add a long awaited testcase.

llvm-svn: 220381
2014-10-22 12:40:20 +00:00
Bruno Cardoso Lopes c29520c5b3 [InstSimplify] Support constant folding to vector of pointers
ConstantFolding crashes when trying to InstSimplify the following load:

@a = private unnamed_addr constant %mst {
     i8* inttoptr (i64 -1 to i8*),
     i8* inttoptr (i64 -1 to i8*)
}, align 8

%x = load <2 x i8*>* bitcast (%mst* @a to <2 x i8*>*), align 8

This patch fix this by adding support to this type of folding:

%x = load <2 x i8*>* bitcast (%mst* @a to <2 x i8*>*), align 8
==> gets folded to:
  %x = <2 x i8*> <i8* inttoptr (i64 -1 to i8*), i8* inttoptr (i64 -1 to i8*)>

llvm-svn: 220380
2014-10-22 12:18:48 +00:00
Jyoti Allur 3b68607eac [Thumb/Thumb2] Implement restrictions on SP in register list on LDM, STM variants in thumb mode
llvm-svn: 220379
2014-10-22 10:41:14 +00:00
Matt Arsenault 0cf39569bf R600/SI: Add another failing testcase for i1 copies
It's not handling phis.

llvm-svn: 220371
2014-10-22 05:30:42 +00:00
Matt Arsenault 59102d38fb R600/SI: Add failing testcase reduced from OpenCV
This fails the verifier with:
"Expected a VCSrc_32 register, but got a VReg_1 register"

llvm-svn: 220368
2014-10-22 04:26:10 +00:00
Rafael Espindola 68bae2c7f6 Handle spaces and quotes in file names in MRI scripts.
llvm-svn: 220364
2014-10-22 03:10:56 +00:00
Hans Wennborg 0b39fc0d16 Revert "Teach the load analysis to allow finding available values which require" (r220277)
This seems to have caused PR21330.

llvm-svn: 220349
2014-10-21 23:49:52 +00:00
Lang Hames 41d95947cf [MCJIT] Defer application of AArch64 MachO GOT relocations until resolve time.
On AArch64, GOT references are page relative (ADRP + LDR), so they can't be
applied until we know exactly where, within a page, the GOT entry will be in
the target address space.

Fixes <rdar://problem/18693976>.

llvm-svn: 220347
2014-10-21 23:41:15 +00:00
Rafael Espindola 915fbb3590 MRI scripts: Add addlib support.
llvm-svn: 220346
2014-10-21 23:18:51 +00:00
Matt Arsenault 7c93690be0 Add minnum / maxnum codegen
llvm-svn: 220342
2014-10-21 23:01:01 +00:00
Matt Arsenault d6511b49ac Add minnum / maxnum intrinsics
These are named following the IEEE-754 names for these
functions, rather than the libm fmin / fmax to avoid
possible ambiguities. Some languages may implement something
resembling fmin / fmax which return NaN if either operand is
to propagate errors. These implement the IEEE-754 semantics
of returning the other operand if either is a NaN representing
missing data.

llvm-svn: 220341
2014-10-21 23:00:20 +00:00
Matt Arsenault 75c658e2cc R600/SI: Add missing parameter to div_fmas intrinsic
llvm-svn: 220338
2014-10-21 22:20:55 +00:00
Rafael Espindola 8a4635224b Overwrite instead of adding to archives when creating them in mri scripts.
This matches the behavior of GNU ar and also makes it easier to implemnt
support for the addlib command.

llvm-svn: 220336
2014-10-21 21:56:47 +00:00
Matt Arsenault 8c4fb7cae0 R600: Use default GlobalDirective
The overridden one wasn't inserting a space,
so you would end up with .globalfoo

llvm-svn: 220329
2014-10-21 21:08:36 +00:00
Arnaud A. de Grandmaison a61262f989 [PBQP] Teach PassConfig to tell if the default register allocator is used.
This enables targets to adapt their pass pipeline to the register
allocator in use. For example, with the AArch64 backend, using PBQP
with the cortex-a57, the FPLoadBalancing pass is no longer necessary.

llvm-svn: 220321
2014-10-21 20:47:22 +00:00
Arnaud A. de Grandmaison ece7fe0e16 [PBQP] Add a testcase for r220302: Fix coalescing benefits
llvm-svn: 220316
2014-10-21 20:10:21 +00:00
David Majnemer d205602a0b InstCombine: Simplify FoldICmpCstShrCst
This function was complicated by the fact that it tried to perform
canonicalizations that were already preformed by InstSimplify.  Remove
this extra code and move the tests over to InstSimplify.  Add asserts to
make sure our preconditions hold before we make any assumptions.

llvm-svn: 220314
2014-10-21 19:51:55 +00:00
Rafael Espindola f03ae4efa7 Drop support for an old version of ld64 (from darwin 9).
llvm-svn: 220310
2014-10-21 18:31:09 +00:00
Rafael Espindola 4bbdeda8be Convert two tests to use llvm-readobj.
llvm-svn: 220308
2014-10-21 18:24:31 +00:00
Matt Arsenault e306a32325 R600/SI: Add pattern for bswap
llvm-svn: 220304
2014-10-21 16:25:08 +00:00
Rafael Espindola c9b33ff9ba Add support for addmod to mri scripts.
llvm-svn: 220294
2014-10-21 14:46:17 +00:00
Bill Schmidt 5c6cb813b6 [PowerPC] Avoid VSX FMA mutate when killed product reg = addend reg
With VSX enabled, test/CodeGen/PowerPC/recipest.ll exposes a bug in
the FMA mutation pass.  If we have a situation where a killed product
register is the same register as the FMA target, such as:

   %vreg5<def,tied1> = XSNMSUBADP %vreg5<tied0>, %vreg11, %vreg5,
                       %RM<imp-use>; VSFRC:%vreg5 F8RC:%vreg11 

then the substitution makes no sense.  We end up getting a crash when
we try to extend the interval associated with the killed product
register, as there is already a live range for %vreg5 there.  This
patch just disables the mutation under those circumstances.

Since recipest.ll generates different code with VMX enabled, I've
modified that test to use -mattr=-vsx.  I've borrowed the code from
that test that exposed the bug and placed it in fma-mutate.ll, where
it tests several mutation opportunities including the "bad" one.

llvm-svn: 220290
2014-10-21 13:02:37 +00:00
Oliver Stannard cdb8db8d3c [ARM] NEON 32-bit scalar moves are also available in VFPv2
The 32-bit variants of the NEON scalar<->GPR move instructions are
also available in VFPv2. The 8- and 16-bit variants do require NEON.

Note that the checks in the test file are all -DAG because they are
checking a mixture of stdout and stderr, and the ordering is not
guaranteed.

llvm-svn: 220288
2014-10-21 11:49:14 +00:00
Yuri Gorshenin 171eb8dbeb [asan-asm-instrumentation] Fixed memory accesses with rbp as a base or an index register.
Summary: Fixed memory accesses with rbp as a base or an index register.

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5819

llvm-svn: 220283
2014-10-21 10:22:27 +00:00
Oliver Stannard 38e6d45a46 [Thumb2] LDRS?[BH] cannot load to the PC
The Thumb2 LDRS?[BH] instructions are not valid when the destination
register is the PC (these encodings are used for preload hints).

llvm-svn: 220278
2014-10-21 09:14:15 +00:00
Chandler Carruth aa72a6dd3b Teach the load analysis to allow finding available values which require
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.

To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.

These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.

I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.

llvm-svn: 220277
2014-10-21 09:00:40 +00:00
Zoran Jovanovic 592239d498 [mips][microMIPS] Implement ADDU16 and SUBU16 instructions
Differential Revision: http://reviews.llvm.org/D5118

llvm-svn: 220276
2014-10-21 08:44:58 +00:00
Zoran Jovanovic 81ceebc56e [mips][microMIPS] Implement AND16, NOT16, OR16 and XOR16 instructions
Differential Revision: http://reviews.llvm.org/D5117

llvm-svn: 220275
2014-10-21 08:32:40 +00:00
Rafael Espindola c606bfe660 Fix a bit of confusion about .set and produce more readable assembly.
Every target we support has support for assembly that looks like

a = b - c
.long a

What is special about MachO is that the above combination suppresses the
production of a relocation.

With this change we avoid producing the intermediary labels when they don't
add any value.

llvm-svn: 220256
2014-10-21 01:17:30 +00:00
Paul Robinson f60e0a160f Do not attribute static allocas to the call site's DebugLoc.
When functions are inlined, instructions without debug information are
attributed to the call site's DebugLoc. After inlining, inlined static
allocas are moved to the caller's entry block, adjacent to the caller's
original static alloca instructions. By retaining the call site's
DebugLoc, these instructions could cause instructions that were
subsequently inserted at the entry block to pick up the same DebugLoc.

Patch by Wolfgang Pieb!

llvm-svn: 220255
2014-10-21 01:00:55 +00:00
Rafael Espindola f16a66973c Make this test a bit more strict.
llvm-svn: 220253
2014-10-21 00:47:49 +00:00
Chandler Carruth 97192421e1 Teach lit to filter the host LDFLAGS down from the build system and into
the CGO build environment. This lets things like -rpath propagate down
to the C++ code that is built along side the Go bindings when testing
them.

Patch by Peter Collingbourne, and verified that it works by me.

llvm-svn: 220252
2014-10-21 00:36:28 +00:00
Lang Hames 2d0d096bd1 [MCJIT] Temporarily revert r220245 - it broke several bots.
(See e.g. http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/17653)

llvm-svn: 220249
2014-10-21 00:24:02 +00:00
Philip Reames bf9676f7f0 Extend the verifier to validate range metadata on calls and invokes.
Range metadata applies to loads, call, and invokes.  We were validating that metadata applied to loads was correct according to the LangRef, but we were not validating metadata applied to calls or invokes.  This change extracts the checking functionality to a common location, reuses it for all valid locations, and adds a simple test to ensure a misused range on a call gets reported.

llvm-svn: 220246
2014-10-20 23:52:07 +00:00
Lang Hames 84801c217c [MCJIT] Make MCJIT honor symbol visibility settings when populating the global
symbol table.

Patch by Anthony Pesch. Thanks Anthony!

llvm-svn: 220245
2014-10-20 23:39:54 +00:00
Quentin Colombet 06355199f1 [X86] Fix a bug in the lowering of the mask of VSELECT.
X86 code to lower VSELECT messed a bit with the bits set in the mask of VSELECT
when it knows it can be lowered into BLEND. Indeed, only the high bits need to be
set for those and it optimizes those accordingly.
However, when the mask is a compile time constant, the lowering will be handled
by the generic optimizer and those modifications will generate bad code in the
generic optimizer.

This patch fixes that by preventing the optimization if the VSELECT will be
handled by the generic optimizer.

<rdar://problem/18675020>

llvm-svn: 220242
2014-10-20 23:13:30 +00:00
Philip Reames cdb72f369f Introduce a 'nonnull' metadata on Load instructions.
The newly introduced 'nonnull' metadata is analogous to existing 'nonnull' attributes, but applies to load instructions rather than call arguments or returns.  Long term, it would be nice to combine these into a single construct.   The value of the load is allowed to vary between successive loads, but null is not a valid value to be loaded by any load marked nonnull.

Reviewed by: Hal Finkel
Differential Revision:  http://reviews.llvm.org/D5220

llvm-svn: 220240
2014-10-20 22:40:55 +00:00
Simon Pilgrim 2f9548a3ef [X86] Memory folding for commutative instructions (updated)
This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning.

Updated version of r219584 (reverted in r219595) - the commutation attempt now explicitly ensures that neither of the commuted source operands are tied to the destination operand / register, which was the source of all the regressions that occurred with the original patch attempt.

Added additional regression test case provided by Joerg Sonnenberger.

Differential Revision: http://reviews.llvm.org/D5818

llvm-svn: 220239
2014-10-20 22:14:22 +00:00
Tim Northover 23075ccee7 ARM: rework Thumb1 frame index rewriting
The previous code had a few problems, motivating the choices here.

1. It could create instructions clobbering CPSR, but the incoming MachineInstr
   didn't reflect this. A potential source of corruption. This is why the patch
   has a new PseudoInst for before lowering.
2. Similarly, there was some code to handle the incoming instruction not being
   ARMCC::AL, but this would have caused massive problems if it was actually
   invoked when a complex offset needing more than one instruction was requested.
3. It wasn't designed to handle unaligned pointers (or offsets). These should
   probably be minimised anyway, but the code needs to deal with them properly
   regardless.
4. It had some rather dubious ad-hoc code to avoid calling
   emitThumbRegPlusImmediate, a function which should be designed to do precisely
   this job.

We seem to cover the common cases correctly now, and hopefully can enhance
emitThumbRegPlusImmediate to handle any extra optimisations we need to add in
future.

llvm-svn: 220236
2014-10-20 21:28:41 +00:00
Gerolf Hoflehner c47baed0b5 [AArch64] test case for compfail fixed by r219748
llvm-svn: 220206
2014-10-20 16:08:33 +00:00
Oliver Stannard 6672f37ed2 [Thumb2] RFE, SRS and "SUBS pc, lr" are undefined on v7M
These instructions are related to the v7[AR] exception model, and are
not defined on v7M.

llvm-svn: 220204
2014-10-20 15:37:35 +00:00
Oliver Stannard e8f63a54b4 [ARM] Do not select SMULW[BT] or SMLAW[BT]
The current instruction selection patterns for SMULW[BT] and SMLAW[BT]
are incorrect. These instructions multiply a 32-bit and a 16-bit value
(both signed) and return the top 32 bits of the 48-bit result. This
preserves the 16 bits of overflow, whereas the patterns they currently
match truncate the result to 16 bits then sign extend.

To select these instructions, we would need to match an ISD::SMUL_LOHI,
a sign extend, two shifts and an or. There is no way to match SMUL_LOHI
in an instruction pattern as it defines multiple values, so this would
have to be done in C++. I have raised
http://llvm.org/bugs/show_bug.cgi?id=21297 to cover allowing correct
selection of these instructions.

This fixes http://llvm.org/bugs/show_bug.cgi?id=19396

llvm-svn: 220196
2014-10-20 11:30:35 +00:00
Oliver Stannard fce039240a [Thumb] Fix crash in Thumb1RegisterInfo::rewriteFrameIndex
This function can, for some offsets from the SP, split one instruction
into two. Since it re-uses the original instruction as the first
instruction of the result, we need ensure its result register is not
marked as dead before we use it in the second instruction.

llvm-svn: 220194
2014-10-20 11:00:18 +00:00
Chandler Carruth a32038b006 Fix a miscompile introduced in r220178.
The original code had an implicit assumption that if the test for
allocas or globals was reached, the two pointers were not equal. With my
changes to make the pointer analysis more powerful here, I also had to
guard against circumstances where the results weren't useful. That in
turn violated the assumption and gave rise to a circumstance in which we
could have a store with both the queried pointer and stored pointer
rooted at *the same* alloca. Clearly, we cannot ignore such a store.
There are other things we might do in this code to better handle the
case of both pointers ending up at the same alloca or global, but it
seems best to at least make the test explicit in what it intends to
check.

I've added tests for both the alloca and global case here.

llvm-svn: 220190
2014-10-20 10:03:01 +00:00
Chandler Carruth 6665d62117 Fix a somewhat subtle pair of issues with JumpThreading I introduced in
r220178. First, the creation routine doesn't insert prior to the
terminator of the basic block provided, but really at the end of the
basic block. Instead, get the terminator and insert before that. The
next issue was that we need to ensure multiple PHI node entries for
a single predecessor re-use the same cast instruction rather than
creating new ones.

All of the logic here was without tests previously. I've reduced and
added a test case from the test suite that crashed without both of these
fixes.

llvm-svn: 220186
2014-10-20 05:34:36 +00:00
Chandler Carruth eeec35ae1c Teach the load analysis driving core instcombine logic and other bits of
logic to look through pointer casts, making them trivially stronger in
the face of loads and stores with intervening pointer casts.

I've included a few test cases that demonstrate the kind of folding
instcombine can do without pointer casts and then variations which
obfuscate the logic through bitcasts. Without this patch, the variations
all fail to optimize fully.

This is more important now than it has been in the past as I've started
moving the load canonicialization to more closely follow the value type
requirements rather than the pointer type requirements and thus this
needs to be prepared for more pointer casts. When I made the same change
to stores several test cases regressed without logic along these lines
so I wanted to systematically improve matters first.

llvm-svn: 220178
2014-10-20 00:24:14 +00:00
Chandler Carruth b5f4c32830 Add a datalayout string to this test so that it exercises the full gamut
of InstCombine rather than just the bits enabled when datalayout is
optional.

The primary fixes here are because now things are little endian.

In good news, silliness like this seems like it will be going away as
we've got pretty stong consensus on dropping optional datalayout
entirely.

llvm-svn: 220176
2014-10-20 00:11:31 +00:00
Bill Schmidt 941a3244ff [PowerPC] Clean up -mattr=+vsx tests to always specify -mcpu
We recently discovered an issue that reinforces what a good idea it is
to always specify -mcpu in our code generation tests, particularly for
-mattr=+vsx.  This patch ensures that all tests that specify
-mattr=+vsx also specify -mcpu=pwr7 or -mcpu=pwr8, as appropriate.

Some of the uses of -mattr=+vsx added recently don't make much sense
(when specified for -mtriple=powerpc-apple-darwin8 or -march=ppc32,
for example).  For cases like this I've just removed the extra VSX
test commands; there's enough coverage without them.

llvm-svn: 220173
2014-10-19 21:29:21 +00:00
Bill Schmidt 87982a1e9b [PowerPC] Temporarily disable VSX for PowerPC fast-isel tests
Patch by Bill Seurer; some comment formatting changes by me.

There are a few PowerPC test cases for FastISel support that currently
fail with VSX support enabled.  The temporary workaround under
discussion in http://reviews.llvm.org/D5362 helps, but the tests still
fail because they specify -fast-isel-abort, and the VSX workaround
punts back to SelectionDAG.  We have plans to fix FastISel permanently
for VSX, but until that's in place these tests are preventing us from
enabling VSX by default.  Therefore we are adding -mattr=-vsx to these
tests until the full support is ready.

llvm-svn: 220172
2014-10-19 20:48:47 +00:00
Bill Schmidt fff860979d [PowerPC] Re-enable VSX test line for fma.ll with -mcpu=pwr7
The VSX testing variant in test/CodeGen/PowerPC/fma.ll had to be
disabled because of unexpected behavior on many of the builders.  I
tracked this down to a situation that occurs when the VSX attribute is
enabled for a target that disables the MI early scheduling pass.  This
patch adds -mcpu=pwr7 to make this predictable.  The other issue will
be addressed separately.

llvm-svn: 220171
2014-10-19 20:27:56 +00:00
Chandler Carruth bc6378defb Do a better and more complete job of preserving metadata when combining
loads.

This handles many more cases than just the AA metadata, some of them
suggested by Hal in his review of the AA metadata handling patch. I've
tried to test this behavior where tractable to do so.

I'll point out that I have specifically *not* included a test for
debuginfo because it was going to require 2 or 3 times as much work to
craft some input which would survive the "helpful" stripping of debug
info metadata that doesn't match the desired schema. This is another
good example of why the current state of write-ability for our debug
info metadata is unacceptable. I spent over 30 minutes trying to conjure
some test case that would survive, even copying from other debug info
tests, but it always failed to survive with no explanation of why or how
I might fix it. =[

llvm-svn: 220165
2014-10-19 10:46:46 +00:00
Chandler Carruth 5b8cd2f73c Move previously dead code to handle computing the known bits of an alias
up to where it actually works as intended. The problem is that
a GlobalAlias isa GlobalValue and so the prior block handled all of the
cases.

This allows us to constant fold based on the actual constant expression
in the global alias. As an example, see the last function in the newly
added test case which explicitly aligns an unaligned pointer using
constant expression math. Without this change, we fail to see that and
fold an alignment test to zero.

llvm-svn: 220164
2014-10-19 09:06:56 +00:00
David Majnemer 312c3e5f39 InstCombine: (sub (or A B) (xor A B)) --> (and A B)
The following implements the transformation:
(sub (or A B) (xor A B)) --> (and A B).

Patch by Ankur Garg!

Differential Revision: http://reviews.llvm.org/D5719

llvm-svn: 220163
2014-10-19 08:32:32 +00:00
David Majnemer 59939acd26 InstCombine: Optimize icmp eq/ne (shl Const2, A), Const1
The following implements the optimization for sequences of the form:
icmp eq/ne (shl Const2, A), Const1

Such sequences can be transformed to:
icmp eq/ne A, (TrailingZeros(Const1) - TrailingZeros(Const2))

This handles only the equality operators for now. Other operators need
to be handled.

Patch by Ankur Garg!

llvm-svn: 220162
2014-10-19 08:23:08 +00:00
Chandler Carruth a801dd5799 Fix a long-standing miscompile in the load analysis that was uncovered
by my refactoring of this code.

The method isSafeToLoadUnconditionally assumes that the load will
proceed with the preferred type alignment. Given that, it has to ensure
that the alloca or global is at least that aligned. It has always done
this historically when a datalayout is present, but has never checked it
when the datalayout is absent. When I refactored the code in r220156,
I exposed this path when datalayout was present and that turned the
latent bug into a patent bug.

This fixes the issue by just removing the special case which allows
folding things without datalayout. This isn't worth the complexity of
trying to tease apart when it is or isn't safe without actually knowing
the preferred alignment.

llvm-svn: 220161
2014-10-19 08:17:50 +00:00
Chandler Carruth be9dccd64d Preserve AA metadata when combining (cast (load (...))) -> (load (cast
(...))).

llvm-svn: 220141
2014-10-18 11:00:12 +00:00
Chandler Carruth 2f75fcfef3 [InstCombine] Do an about-face on how LLVM canonicalizes (cast (load
...)) and (load (cast ...)): canonicalize toward the former.

Historically, we've tried to load using the type of the *pointer*, and
tried to match that type as closely as possible removing as many pointer
casts as we could and trading them for bitcasts of the loaded value.
This is deeply and fundamentally wrong.

Repeat after me: memory does not have a type! This was a hard lesson for
me to learn working on SROA.

There is only one thing that should actually drive the type used for
a pointer, and that is the type which we need to use to load from that
pointer. Matching up pointer types to the loaded value types is very
useful because it minimizes the physical size of the IR required for
no-op casts. Similarly, the only thing that should drive the type used
for a loaded value is *how that value is used*! Again, this minimizes
casts. And in fact, the *only* thing motivating types in any part of
LLVM's IR are the types used by the operations in the IR. We should
match them as closely as possible.

I've ended up removing some tests here as they were testing bugs or
behavior that is no longer present. Mostly though, this is just cleanup
to let the tests continue to function as intended.

The only fallout I've found so far from this change was SROA and I have
fixed it to not be impeded by the different type of load. If you find
more places where this change causes optimizations not to fire, those
too are likely bugs where we are assuming that the type of pointers is
"significant" for optimization purposes.

llvm-svn: 220138
2014-10-18 06:36:22 +00:00
Chandler Carruth 71009cad95 Remove a test that was ported from the old llvm-gcc frontend test suite.
This test is pretty awesome. It is claiming to test devirtualization.
However, the code in question is not in fact devirtualized by LLVM. If
you take the original C++ test case and run it through Clang at -O3 we
fail to devirtualize it completely. It also isn't a sufficiently focused
test case.

The *reason* we fail to devirtualize it isn't because of any missing
instcombine though. Instead, it is because we fail to emit an available
externally vtable and thus the vtable is just an external and completely
opaque. If I cause the vtable to be emitted, we successfully
devirtualize things.

Anyways, I'm just removing it because it is providing negative value at
this point: it isn't representative of the output of Clang really, LLVM
isn't doing the transform it claims to be testing, LLVM's failure to do
the transform isn't actually an LLVM bug at all and we shouldn't be
testing for it here, and finally the test is written in such a way that
it will trivially pass even when the point of the test is failing.

llvm-svn: 220137
2014-10-18 06:36:18 +00:00
Nick Kledzik 4e1ef9951d [llvm-objdump] don't test timestamp dump as that is time zone dependent
llvm-svn: 220123
2014-10-18 02:28:01 +00:00
Nick Kledzik 600f245b8a [llvm-objdump] enhance test case for mach-o -private-headers
llvm-svn: 220120
2014-10-18 01:50:55 +00:00
Nick Kledzik 3b2aa057e6 [llvm-objdump] Fix mach-o binding decompression error
llvm-svn: 220119
2014-10-18 01:21:02 +00:00
Chandler Carruth 2dc9682e59 [SROA] Change how SROA does vector-based promotion of allocas to handle
cases where the alloca type, the load types, and the store types used
all disagree.

Previously, the only way that vector-based promotion occured was if the
alloca type was a vector type. This was one of the *very* few remaining
uses of the alloca's type to guide SROA/mem2reg left in LLVM. It turns
out it was a bad idea.

The alloca type can change very easily based on the mixture of types
loaded and stored to that alloca. We shouldn't be relying on it as
a signal for very much. Instead, the source of truth should be loads and
stores. We should canonicalize the loads and stores as much as possible
and then rely on them exclusively in SROA.

When looking and loads and stores, we may find many different candidate
vector types. This change will let SROA try all of them to find a vector
type which is a viable way to promote the entire alloca to a vector
register.

With this change, it becomes possible to do better canonicalization and
optimization of loads and stores without breaking SROA in random ways,
and that should allow fixing a core source of performance loss in hot
numerical loops such as those in Eigen.

llvm-svn: 220116
2014-10-18 00:44:02 +00:00
Aaron Watry 8114437a8f R600/SI: Add global atomicrmw xchg
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220110
2014-10-17 23:33:03 +00:00
Aaron Watry d672ee2a47 R600/SI: Add global atomicrmw xor
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220109
2014-10-17 23:33:01 +00:00
Aaron Watry 8a911e6926 R600/SI: Add global atomicrmw or
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220108
2014-10-17 23:32:59 +00:00
Aaron Watry 58c9992f15 R600/SI: Add global atomicrmw min/umin
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220107
2014-10-17 23:32:57 +00:00
Aaron Watry 29f295d7a5 R600/SI: Add global atomicrmw max/umax
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220106
2014-10-17 23:32:56 +00:00
Aaron Watry 621278034c R600/SI: Add global atomicrmw and
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220105
2014-10-17 23:32:54 +00:00
Aaron Watry 328f1bae8e R600/SI: Add global atomicrmw sub
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220104
2014-10-17 23:32:52 +00:00
Aaron Watry 28682cf205 R600/SI: Fix/add tests for atomicrmw add
The previous tests claimed to test constant offsets in the function name,
but the tests weren't actually testing them.

Clone the tests, and do testing of all combinations of the following:
1) with/without constant pointer offset
2) 32/64-bit addressing modes
3) Usage and non-usage of the return value from the atomicrmw

Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220103
2014-10-17 23:32:50 +00:00
Aaron Watry 1d13d36520 R600: Rename atomic_load global tests to atomic_add
The function name now matches what it's actually testing.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220102
2014-10-17 23:32:49 +00:00
Evgeniy Stepanov e08633e900 [msan] Fix handling of byval arguments with large alignment.
MSan param-tls slots are 8-byte aligned. This change clips
alignment of memcpy into param-tls to 8.

llvm-svn: 220101
2014-10-17 23:29:44 +00:00
Pete Cooper 230332f4fe Check for dynamic alloca's when selecting lifetime intrinsics.
TL;DR: Indexing maps with [] creates missing entries.

The long version:

When selecting lifetime intrinsics, we index the *static* alloca map with the AllocaInst we find for that lifetime.  Trouble is, we don't first check to see if this is a dynamic alloca.

On the attached example, this causes a dynamic alloca to create an entry in the static map, and returns 0 (the default) as the frame index for that lifetime.  0 was used for the frame index of the stack protector, which given that it now has a lifetime, is coloured, and merged with other stack slots.

PEI would later trigger an assert because it expects the stack protector to not be dead.

This fix ensures that we only get frame indices for static allocas, ie, those in the map.  Dynamic ones are effectively dropped, which is suboptimal, but at least isn't completely broken.

rdar://problem/18672951

llvm-svn: 220099
2014-10-17 22:59:33 +00:00
Bill Schmidt 296bb31f64 [PowerPC] Disable +vsx RUN line for fma.ll due to inconsistency on other builders
llvm-svn: 220094
2014-10-17 21:32:22 +00:00
Rafael Espindola 7da1ea83a9 Revert "TRE: make TRE a bit more aggressive"
This reverts commit r219899.

This also updates byval-tail-call.ll to make it clear what was breaking.
Adding r219899 again will cause the load/store to disappear.

llvm-svn: 220093
2014-10-17 21:25:48 +00:00
Bill Schmidt a087d74250 [PowerPC] Change liveness testing in VSX FMA mutation pass
With VSX enabled, LLVM crashes when compiling
test/CodeGen/PowerPC/fma.ll.  I traced this to the liveness test
that's revised in this patch. The interval test is designed to only
work for virtual registers, but in this case the AddendSrcReg is
physical. Since there is already a walk of the MIs between the
AddendMI and the FMA, I added a check for def/kill of the AddendSrcReg
in that loop.  At Hal Finkel's request, I converted the liveness test
to an assert restricted to virtual registers.

I've changed the fma.ll test to have VSX and non-VSX variants so we
can test both kinds of multiply-adds.

llvm-svn: 220090
2014-10-17 21:02:44 +00:00
Peter Collingbourne ce80084446 Disable ccache for go tests.
Should fix llvm-clang-lld-x86_64-debian-fast bot.

llvm-svn: 220071
2014-10-17 18:32:36 +00:00
Matt Arsenault d282ada508 R600/SI: Allow commuting with source modifiers
llvm-svn: 220066
2014-10-17 18:00:48 +00:00
Matt Arsenault 6d3cd544bb R600/SI: Allow comuting fp immediates
llvm-svn: 220062
2014-10-17 18:00:39 +00:00
Peter Collingbourne fde87a0cdf We also need to catch OSError here.
llvm-svn: 220058
2014-10-17 17:46:46 +00:00
Matt Arsenault 83a535ff6b R600/SI: Remove SI_BUFFER_RSRC pseudo
Just use REG_SEQUENCE directly, so there are fewer
instructions to need to deal with later.

llvm-svn: 220056
2014-10-17 17:42:56 +00:00
Juergen Ributzka ad2363f9ee [Stackmaps] Enable invoking the patchpoint intrinsic.
Patch by Kevin Modzelewski
Reviewers: atrick, ributzka
Reviewed By: ributzka
Subscribers: llvm-commits, reames

Differential Revision: http://reviews.llvm.org/D5634

llvm-svn: 220055
2014-10-17 17:39:00 +00:00
Andrea Di Biagio c48cb86f05 [X86] Fix missed selection of non-temporal store of zero vector.
When the input to a store instruction was a zero vector, the backend
always selected a normal vector store regardless of the non-temporal
hint. This is fixed by this patch.

This fixes PR19370.

llvm-svn: 220054
2014-10-17 17:27:06 +00:00
James Molloy f497d5511d [AArch64] Fix a silent codegen fault in BUILD_VECTOR lowering.
We should be talking about the number of source elements, not the number of destination elements, given we know at this point that the source and dest element numbers are not the same.

While we're at it, avoid writing to std::vector::end()...

Bug found with random testing and a lot of coffee.

llvm-svn: 220051
2014-10-17 17:06:31 +00:00
Rafael Espindola 44c661b611 Don't crash if find_executable return None.
This was crashing when trying to run the tests on Windows.

llvm-svn: 220048
2014-10-17 16:07:43 +00:00
Bill Schmidt 2d1128acb2 [PowerPC] Enable use of lxvw4x/stxvw4x in VSX code generation
Currently the VSX support enables use of lxvd2x and stxvd2x for 2x64
types, but does not yet use lxvw4x and stxvw4x for 4x32 types.  This
patch adds that support.

As with lxvd2x/stxvd2x, this involves straightforward overriding of
the patterns normally recognized for lvx/stvx, with preference given
to the VSX patterns when VSX is enabled.

In addition, the logic for permitting misaligned memory accesses is
modified so that v4r32 and v4i32 are treated the same as v2f64 and
v2i64 when VSX is enabled.  Finally, the DAG generation for unaligned
loads is changed to just use a normal LOAD (which will become lxvw4x)
on P8 and later hardware, where unaligned loads are preferred over
lvsl/lvx/lvx/vperm.

A number of tests now generate the VSX loads/stores instead of
lvx/stvx, so this patch adds VSX variants to those tests.  I've also
added <4 x float> tests to the vsx.ll test case, and created a
vsx-p8.ll test case to be used for testing code generation for the
P8Vector feature.  For now, that simply tests the unaligned load/store
behavior.

This has been tested along with a temporary patch to enable the VSX
and P8Vector features, with no new regressions encountered with or
without the temporary patch applied.

llvm-svn: 220047
2014-10-17 15:13:38 +00:00
Jan Vesely b535c902e6 R600: Add EG to FMA test
Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 220045
2014-10-17 14:45:27 +00:00
Jan Vesely af62cf4db0 SelectionDAG: Add sext_inreg optimizations
v2: use dyn_cast
    fixup comments
v3: use cast

Reviewed-by: Matt Arsenault <arsenm2@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 220044
2014-10-17 14:45:25 +00:00
Vasileios Kalintiris 238692beb9 [mips] Add support for COP1's Branch-On-Cond-Likely instructions
Summary: Depends on D5782

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5802

llvm-svn: 220042
2014-10-17 14:08:28 +00:00
Vasileios Kalintiris 6d1e64896d [mips] Add support for COP0's Branch-On-Cond-Likely instructions
Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5782

llvm-svn: 220036
2014-10-17 12:38:35 +00:00
Hal Finkel dd38c0b876 [DSE] Remove no-data-layout-only type-based overlap checking
DSE's overlap checking contained special logic, used only when no DataLayout
was available, which inferred a complete overwrite when the pointee types were
equal. This logic seems fine for regular loads/stores, but does not work for
memcpy and friends. Instead of fixing this, I'm just removing it.
Philosophically, transformations should not contain enhanced behavior used only
when data layout is lacking (data layout should be strictly additive), and
maintaining these rarely-tested code paths seems not worthwhile at this stage.

Credit to Aliaksei Zasenka for the bug report and the diagnosis. The test case
(slightly reduced from that provided by Aliaksei) replaces the original
contents of test/Transforms/DeadStoreElimination/no-targetdata.ll -- a few
other tests have been updated to have a data layout.

llvm-svn: 220035
2014-10-17 11:56:00 +00:00
Rafael Espindola b66130209b Add back commits r219835 and a fixed version of r219829.
The only difference from r219829 is using

getOrCreateSectionSymbol(*ELFSec)

instead of

GetOrCreateSymbol(ELFSec->getSectionName())

in ELFObjectWriter which causes us to use the correct section symbol even if
we have multiple sections with the same name.

Original messages:

r219829:
Correctly handle references to section symbols.

When processing assembly like

.long .text

we were creating a new undefined symbol .text. GAS on the other hand would
handle that as a reference to the .text section.

This patch implements that by creating the section symbols earlier so that
they are visible during asm parsing.

The patch also updates llvm-readobj to print the symbol number in the relocation
dump so that the test can differentiate between two sections with the same name.

r219835:
Allow forward references to section symbols.

llvm-svn: 220021
2014-10-17 01:48:58 +00:00
Bill Schmidt 32e9c6465b [PPC] Adjust some PowerPC tests to account for presence/absence of VSX
Patch by Bill Seurer; committed on his behalf.

These test cases generate slightly different code sequences when VSX
is activated and thus fail. The update turns off VSX explicitly for
the existing checks and then adds a second set of checks for most of
them that test the VSX instruction output.

llvm-svn: 220019
2014-10-17 01:41:22 +00:00
Rafael Espindola 0eb2cec936 Add a test that would have found the bug in r219829.
llvm-svn: 220016
2014-10-17 01:34:23 +00:00
Akira Hatanaka 0d0c78180d ARM: Fix a bug which was causing convergence failure in constant-island pass.
The bug is in ARMConstantIslands::createNewWater where the upper bound of the
new water split point is computed:

// This could point off the end of the block if we've already got constant
// pool entries following this block; only the last one is in the water list.
// Back past any possible branches (allow for a conditional and a maximally
// long unconditional).
if (BaseInsertOffset + 8 >= UserBBI.postOffset()) {
  BaseInsertOffset = UserBBI.postOffset() - UPad - 8;
  DEBUG(dbgs() << format("Move inside block: %#x\n", BaseInsertOffset));
}

The split point is supposed to be somewhere between the machine instruction that
loads from the constant pool entry and the end of the basic block, before branch
instructions. The code above is fine if the basic block is large enough and
there are a sufficient number of instructions following the machine instruction.
However, if the machine instruction is near the end of the basic block,
BaseInsertOffset can point to the machine instruction or another instruction
that precedes it, and this can lead to convergence failure.

This commit fixes this bug by ensuring BaseInsertOffset is larger than the
offset of the instruction following the constant-loading instruction.

rdar://problem/18581150

llvm-svn: 220015
2014-10-17 01:31:47 +00:00
Rafael Espindola 4544a4062c Revert commit r219835 and r219829.
Revert "Correctly handle references to section symbols."
Revert "Allow forward references to section symbols."

Rui found a regression I am debugging.

llvm-svn: 220010
2014-10-17 01:06:02 +00:00
Peter Zotov 9e40430102 [OCaml] Add Llvm.instr_clone.
llvm-svn: 220008
2014-10-17 01:02:40 +00:00
Alexander Potapenko 7aaf514092 [llvm-symbolizer] Introduce the -dsym-hint option.
llvm-symbolizer will consult one of the .dSYM paths passed via -dsym-hint
if it fails to find the .dSYM bundle at the default location.

llvm-svn: 220004
2014-10-17 00:50:19 +00:00
Peter Collingbourne 659670d933 Add our own copy of the find_executable function to cope with installations
that do not have the distutils.spawn package. Should hopefully fix the
aarch64 buildbot.

llvm-svn: 219991
2014-10-16 23:43:20 +00:00
Peter Collingbourne 82e3e373b3 Initial version of Go bindings.
This code is based on the existing LLVM Go bindings project hosted at:
https://github.com/go-llvm/llvm

Note that all contributors to the gollvm project have agreed to relicense
their changes under the LLVM license and submit them to the LLVM project.

Differential Revision: http://reviews.llvm.org/D5684

llvm-svn: 219976
2014-10-16 22:48:02 +00:00
Robin Morisset e2de06bef6 Erase fence insertion from SelectionDAGBuilder.cpp (NFC)
Summary:
Backends can use setInsertFencesForAtomic to signal to the middle-end that
montonic is the only memory ordering they can accept for
stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger
ordering to fences + monotonic accesses is currently living in
SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it
for several reasons:
- There is lots of redundancy to avoid: extremely similar logic already
  exists in AtomicExpand.
- The current code in SelectionDAGBuilder does not use any target-hooks, it
  does the same transformation for every backend that requires it
- As a result it is plain *unsound*, as it was apparently designed for ARM.
  It happens to mostly work for the other targets because they are extremely
  conservative, but Power for example had to switch to AtomicExpand to be
  able to use lwsync safely (see r218331).
- Because it produces IR-level fences, it cannot be made sound ! This is noted
  in the C++11 standard (section 29.3, page 1140):
```
Fences cannot, in general, be used to restore sequential consistency for atomic
operations with weaker ordering semantics.
```
It can also be seen by the following example (called IRIW in the litterature):
```
atomic<int> x = y = 0;
int r1, r2, r3, r4;
Thread 0:
  x.store(1);
Thread 1:
  y.store(1);
Thread 2:
  r1 = x.load();
  r2 = y.load();
Thread 3:
  r3 = y.load();
  r4 = x.load();
```
r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst.
But if they are lowered to monotonic accesses, no amount of fences can prevent it..

This patch does three things (I could cut it into parts, but then some of them
would not be tested/testable, please tell me if you would prefer that):
- it provides a default implementation for emitLeadingFence/emitTrailingFence in
terms of IR-level fences, that mimic the original logic of SelectionDAGBuilder.
As we saw above, this is unsound, but the best that can be done without knowing
the targets well (and there is a comment warning about this risk).
- it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default
implementation (that exactly replicates the logic of SelectionDAGBuilder, so no
functional change)
- it finally erase this logic from SelectionDAGBuilder as it is dead-code.

Ideally, each target would define its own override for emitLeading/TrailingFence
using target-specific fences, but I do not know the Sparc/Mips/XCore memory model
well enough to do this, and they appear to be dealing fine with the ARM-inspired
default expansion for now (probably because they are overly conservative, as
Power was). If anyone wants to compile fences more agressively on these
platforms, the long comment should make it clear why he should first override
emitLeading/TrailingFence.

Test Plan: make check-all, no functional change

Reviewers: jfb, t.p.northover

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D5474

llvm-svn: 219957
2014-10-16 20:34:57 +00:00
Matt Arsenault a3fe7c62d1 R600: Fix nonsensical implementation of computeKnownBits for BFE
This was resulting in invalid simplifications of sdiv

llvm-svn: 219953
2014-10-16 20:07:40 +00:00
Rafael Espindola 11aaaeebe0 Delete -std-compile-opts.
These days -std-compile-opts was just a silly alias for -O3.

llvm-svn: 219951
2014-10-16 20:00:02 +00:00
Bjorn Steinbrink d20816fde9 Allow call-slop optzn for destinations with a suitable dereferenceable attribute
Summary:
Currently, call slot optimization requires that if the destination is an
argument, the argument has the sret attribute. This is to ensure that
the memory access won't trap. In addition to sret, we can also allow the
optimization to happen for arguments that have the new dereferenceable
attribute, which gives the same guarantee.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5832

llvm-svn: 219950
2014-10-16 19:43:08 +00:00
Sanjay Patel c699a6117b fold: sqrt(x * x * y) -> fabs(x) * sqrt(y)
If a square root call has an FP multiplication argument that can be reassociated,
then we can hoist a repeated factor out of the square root call and into a fabs().

In the simplest case, this:

   y = sqrt(x * x);

becomes this:

   y = fabs(x);

This patch relies on an earlier optimization in instcombine or reassociate to put the
multiplication tree into a canonical form, so we don't have to search over
every permutation of the multiplication tree.

Because there are no IR-level FastMathFlags for intrinsics (PR21290), we have to
use function-level attributes to do this optimization. This needs to be fixed
for both the intrinsics and in the backend.

Differential Revision: http://reviews.llvm.org/D5787

llvm-svn: 219944
2014-10-16 18:48:17 +00:00
Juergen Ributzka 03a0611061 [AArch64] Fix miscompile of sdiv-by-power-of-2.
When the constant divisor was larger than 32bits, then the optimized code
generated for the AArch64 backend would emit the wrong code, because the shift
was defined as a shift of a 32bit constant '(1<<Lg2(divisor))' and we would
loose the upper 32bits.

This fixes rdar://problem/18678801.

llvm-svn: 219934
2014-10-16 16:41:15 +00:00
Vasileios Kalintiris 167c372118 [mips] Account for endianess when expanding BuildPairF64/ExtractElementF64 nodes.
Summary:
In order to support big endian targets for the BuildPairF64 nodes we
just need to swap the low/high pair registers. Additionally, for the
ExtractElementF64 nodes we have to calculate the correct stack offset
with respect to the node's register/operand that we want to extract.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5753

llvm-svn: 219931
2014-10-16 15:41:51 +00:00
Vasileios Kalintiris 711028f718 [mips] Marked the DI/EI instruction aliases as MIPS32r2
Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5751

llvm-svn: 219927
2014-10-16 15:23:52 +00:00
Akira Hatanaka 5c221ef98f Reapply r219832 - InstCombine: Narrow switch instructions using known bits.
The code committed in r219832 asserted when it attempted to shrink a switch
statement whose type was larger than 64-bit.

llvm-svn: 219902
2014-10-16 06:00:46 +00:00
Saleem Abdulrasool 7f52921976 TRE: make TRE a bit more aggressive
Make tail recursion elimination a bit more aggressive.  This allows us to get
tail recursion on functions that are just branches to a different function.  The
fact that the function takes a byval argument does not restrict it from being
optimised into just a tail call.

llvm-svn: 219899
2014-10-16 03:27:30 +00:00
Akira Hatanaka 40c2cf4afc Revert r219832.
llvm-svn: 219884
2014-10-16 01:17:02 +00:00
Sanjoy Das 360b1ed5f2 Revert "r219834 - Teach ScalarEvolution to sharpen range information"
This change breaks the asan buildbots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13468

llvm-svn: 219878
2014-10-15 23:46:04 +00:00
Hal Finkel 68dc3c7ab2 Preserve non-byval pointer alignment attributes using @llvm.assume when inlining
For pointer-typed function arguments, enhanced alignment can be asserted using
the 'align' attribute. When inlining, if this enhanced alignment information is
not otherwise available, preserve it using @llvm.assume-based alignment
assumptions.

llvm-svn: 219876
2014-10-15 23:44:41 +00:00
Adam Nemet 4285c1f8cc [AVX512] Add DQ subvector inserts
In AVX512f we support 64x2 and 32x8 inserts via matching them to 32x4 and 64x4
respectively.  These are matched by "Alt" Pat<>'s (Alt stands for alternative
VTs).

Since DQ has native support for these intructions, I peeled off the non-"Alt"
part of the baseclass into vinsert_for_size_no_alt. The DQ instructions are
derived from this multiclass.  The "Alt" Pat<>'s are disabled with DQ.

Fixes <rdar://problem/18426089>

llvm-svn: 219874
2014-10-15 23:42:17 +00:00
Adam Nemet 2b71ca5d88 [AVX512] Add SKX testing to avx512-insert-extract.ll
This is in preparation to adding DQ subvector inserts to this testcase.

llvm-svn: 219873
2014-10-15 23:42:14 +00:00
Adam Nemet f3aba14dad [AVX512] Fix test to produce a defined value
We're inserting into a 8 wide vector, so the index should be < 8.

llvm-svn: 219872
2014-10-15 23:42:11 +00:00
Tom Stellard c8d7920ad9 R600/SI: Fix bug where immediates were being used in DS addr operands
The SelectDS1Addr1Offset complex pattern always tries to store constant
lds pointers in the offset operand and store a zero value in the addr operand.
Since the addr operand does not accept immediates, the zero value
needs to first be copied to a register.

This newly created zero value will not go through normal instruction
selection, so we need to manually insert a V_MOV_B32_e32 in the complex
pattern.

This bug was hidden by the fact that if there was another zero value
in the DAG that had not been selected yet, then the CSE done by the DAG
would use the unselected node for the addr operand rather than the one
that was just created.  This would lead to the zero value being selected
and the DAG automatically inserting a V_MOV_B32_e32 instruction.

llvm-svn: 219848
2014-10-15 21:08:59 +00:00
Rafael Espindola 78206c3576 Allow forward references to section symbols.
llvm-svn: 219835
2014-10-15 19:30:18 +00:00
Sanjoy Das 90c2f1455a Teach ScalarEvolution to sharpen range information.
If x is known to have the range [a, b) in a loop predicated by (icmp
ne x, a), its range can be sharpened to [a + 1, b).  Get
ScalarEvolution and hence IndVars to exploit this fact.
    
This change triggers an optimization to widen-loop-comp.ll, so it had
to be edited to get it to pass.

phabricator: http://reviews.llvm.org/D5639
llvm-svn: 219834
2014-10-15 19:25:28 +00:00
Akira Hatanaka 5bb9346a45 InstCombine: Narrow switch instructions using known bits.
Truncate the operands of a switch instruction to a narrower type if the upper
bits are known to be all ones or zeros.

rdar://problem/17720004

llvm-svn: 219832
2014-10-15 19:05:50 +00:00
Juergen Ributzka f82c987a5c Reapply "[FastISel][AArch64] Add custom lowering for GEPs."
This is mostly a copy of the existing FastISel GEP code, but we have to
duplicate it for AArch64, because otherwise we would bail out even for simple
cases. This is because the standard fastEmit functions don't cover MUL at all
and ADD is lowered very inefficientily.

The original commit had a bug in the add emit logic, which has been fixed.

llvm-svn: 219831
2014-10-15 18:58:07 +00:00
Rafael Espindola a74b5e6823 Correctly handle references to section symbols.
When processing assembly like

.long .text

we were creating a new undefined symbol .text. GAS on the other hand would
handle that as a reference to the .text section.

This patch implements that by creating the section symbols earlier so that
they are visible during asm parsing.

The patch also updates llvm-readobj to print the symbol number in the relocation
dump so that the test can differentiate between two sections with the same name.

llvm-svn: 219829
2014-10-15 18:55:30 +00:00
Matt Arsenault 1a74aff846 R600/SI: Also try to use 0 base for misaligned 8-byte DS loads.
llvm-svn: 219823
2014-10-15 18:06:43 +00:00
Matt Arsenault 7b68fdf3c0 R600: Fix miscompiles when BFE has multiple uses
SimplifyDemandedBits would break the other uses of the operand.

llvm-svn: 219819
2014-10-15 17:58:34 +00:00
Hal Finkel 3b7fc86677 [SLPVectorize] Basic ephemeral-value awareness
The SLP vectorizer should not vectorize ephemeral values. These are used to
express information to the optimizer, and vectorizing them does not lead to
faster code (because the ephemeral values are dropped prior to code generation,
vectorized or not), and obscures the information the instructions are
attempting to communicate (the logic that interprets the arguments to
@llvm.assume generically does not understand vectorized conditions).

Also, uses by ephemeral values are free (because they, and the necessary
extractelement instructions, will be dropped prior to code generation).

llvm-svn: 219816
2014-10-15 17:35:01 +00:00
Derek Schuff 05fb735f3a [MC] Make bundle alignment mode setting idempotent and support nested bundles
Summary:
Currently an error is thrown if bundle alignment mode is set more than once
per module (either via the API or the .bundle_align_mode directive). This
change allows setting it multiple times as long as the alignment doesn't
change.

Also nested bundle_lock groups are currently not allowed. This change allows
them, with the effect that the group stays open until all nests are exited,
and if any of the bundle_lock directives has the align_to_end flag, the
group becomes align_to_end.

These changes make the bundle aligment simpler to use in the compiler, and
also better match the corresponding support in GNU as.

Reviewers: jvoung, eliben

Differential Revision: http://reviews.llvm.org/D5801

llvm-svn: 219811
2014-10-15 17:10:04 +00:00
Juergen Ributzka 42379d4cf7 Revert "[FastISel][AArch64] Add custom lowering for GEPs."
This breaks our internal build bots. Reverting it to get the bots green again.

llvm-svn: 219776
2014-10-15 04:55:48 +00:00
Jingyue Wu 2954280f6a [MachineSink] Use the real post dominator tree
Summary:
Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in
isPostDominatedBy, use the real MachinePostDominatorTree and MachineLoopInfo.
The old heuristics caused instructions to sink unnecessarily, and might create
register pressure.

This is the second try of the fix. The first one (D4814) caused a performance
regression due to failing to sink instructions out of loops (PR21115). This
patch fixes PR21115 by sinking an instruction from a deeper loop to a shallower
one regardless of whether the target block post-dominates the source.

Thanks Alexey Volkov for reporting PR21115! 

Test Plan:
Added a NVPTX codegen test to verify that our change prevents the backend from
over-sinking. It also shows the unnecessary register pressure caused by
over-sinking.

Added an X86 test to verify we can sink instructions out of loops regardless of
the dominance relationship. This test is reduced from Alexey's test in PR21115.

Updated an affected test in X86.

Also ran SPEC CINT2006 and llvm-test-suite for compilation time and runtime
performance. Results are attached separately in the review thread.

Reviewers: Jiangning, resistor, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, bruno, volkalexey, llvm-commits, meheff, eliben, jholewinski

Differential Revision: http://reviews.llvm.org/D5633

llvm-svn: 219773
2014-10-15 03:27:43 +00:00
Nick Kledzik 51d2c2bf85 [llvm-objdump] Update error message and add test case for mach-o file with bad library ordinals
llvm-svn: 219746
2014-10-14 23:29:38 +00:00
Gerolf Hoflehner a4c96d02a2 [AAarch64] Optimize CSINC-branch sequence
Peephole optimization that generates a single conditional branch
for csinc-branch sequences like in the examples below. This is
possible when the csinc sets or clears a register based on a condition
code and the branch checks that register. Also the condition
code may not be modified between the csinc and the original branch.

Examples:

1. Convert csinc w9, wzr, wzr, <CC>;tbnz w9, #0, 0x44
   to b.<invCC>

2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44
   to b.<CC>


rdar://problem/18506500

llvm-svn: 219742
2014-10-14 23:07:53 +00:00
Hal Finkel 1a600faba0 [LoopVectorize] Ignore @llvm.assume for cost estimates and legality
A few minor changes to prevent @llvm.assume from interfering with loop
vectorization. First, treat @llvm.assume like the lifetime intrinsics, which
are scalarized (but don't otherwise interfere with the legality checking).
Second, ignore the cost of ephemeral instructions in the loop (these will go
away anyway during CodeGen).

Alignment assumptions and other uses of @llvm.assume can often end up inside of
loops that should be vectorized (this is not uncommon for assumptions generated
by __attribute__((align_value(n))), for example).

llvm-svn: 219741
2014-10-14 22:59:49 +00:00
David Majnemer 3416b48980 MC, COFF: Make bigobj test compatible with python3
No functionality change intended.

llvm-svn: 219739
2014-10-14 22:35:11 +00:00
Simon Pilgrim a798e9ffdf [X86][SSE] pslldq/psrldq shuffle mask decodes
Patch to provide shuffle decodes and asm comments for the sse pslldq/psrldq SSE2/AVX2 byte shift instructions.

Differential Revision: http://reviews.llvm.org/D5598

llvm-svn: 219738
2014-10-14 22:31:34 +00:00
David Majnemer 0b011b337f MC: Rewrite bigobj test in python
This makes the test easier to work with.  No functionality change
intended.

llvm-svn: 219737
2014-10-14 22:26:49 +00:00
Tim Northover cf6ce0c8f7 ARM: remove ARM/Thumb distinction for preferred alignment.
Thumb1 has legitimate reasons for preferring 32-bit alignment of types
i1/i8/i16, since the 16-bit encoding of "add rD, sp, #imm" requires #imm to be
a multiple of 4. However, this is a trade-off betweem code size and RAM usage;
the DataLayout string is not the best place to represent it even if desired.

So this patch removes the extra Thumb requirements, hopefully making ARM and
Thumb completely compatible in this respect.

llvm-svn: 219734
2014-10-14 22:12:17 +00:00
Tim Northover 9a4c043d67 ARM: allow misaligned local variables in Thumb1 mode.
There's no hard requirement on LLVM to align local variable to 32-bits, so the
Thumb1 frame handling needs to be able to deal with variables that are only
naturally aligned without falling over.

llvm-svn: 219733
2014-10-14 22:12:14 +00:00
David Majnemer 5fdfa70a1f Add a test for writing COFF BigObj
llvm-svn: 219729
2014-10-14 21:47:53 +00:00
Juergen Ributzka 4dfd590eaa [FastISel][AArch64] Add custom lowering for GEPs.
This is mostly a copy of the existing FastISel GEP code, but on AArch64 we bail
out even for simple cases, because the standard fastEmit functions don't cover
MUL and ADD is lowered inefficientily.

llvm-svn: 219726
2014-10-14 21:41:23 +00:00
Hans Wennborg f6aafeee60 [x86 asm] allow fwait alias in both At&t and Intel modes (PR21208)
Differential Revision: http://reviews.llvm.org/D5741

llvm-svn: 219725
2014-10-14 21:41:17 +00:00
Tim Northover aa09ac6e83 ARM: set preferred aggregate alignment to 32 universally.
Before, ARM and Thumb mode code had different preferred alignments, which could
lead to some rather unexpected results. There's justification for reducing it
from the default 64-bits (wasted space), but I don't think there is for going
below 32-bits.

There's no actual ABI change here, just to reassure people.

llvm-svn: 219719
2014-10-14 20:57:26 +00:00
Hal Finkel db5f86a9bf [CFL-AA] CFL-AA should not assert on an va_arg instruction
The CFL-AA implementation was missing a visit* routine for va_arg instructions,
causing it to assert when run on a function that had one. For now, handle these
in a conservative way.

Fixes PR20954.

llvm-svn: 219718
2014-10-14 20:51:26 +00:00
Sanjay Patel 0ca42bb5a8 Optimize away fabs() calls when input is squared (known positive).
Eliminate library calls and intrinsic calls to fabs when the input 
is a squared value.

Note that no unsafe-math / fast-math assumptions are needed for
this optimization.

Differential Revision: http://reviews.llvm.org/D5777

llvm-svn: 219717
2014-10-14 20:43:11 +00:00
Juergen Ributzka cd11a2806b [FastISel][AArch64] Fix sign-/zero-extend folding when SelectionDAG is involved.
Sign-/zero-extend folding depended on the load and the integer extend to be
both selected by FastISel. This cannot always be garantueed and SelectionDAG
might interfer. This commit adds additonal checks to load and integer extend
lowering to catch this.

Related to rdar://problem/18495928.

llvm-svn: 219716
2014-10-14 20:36:02 +00:00
David Majnemer dad2103801 InstCombine: Don't miscompile X % ((Pow2 << A) >>u B)
We assumed that A must be greater than B because the right hand side of
a remainder operator must be nonzero.

However, it is possible for A to be less than B if Pow2 is a power of
two greater than 1.

Take for example:
i32 %A = 0
i32 %B = 31
i32 Pow2 = 2147483648

((Pow2 << 0) >>u 31) is non-zero but A is less than B.

This fixes PR21274.

llvm-svn: 219713
2014-10-14 20:28:40 +00:00
Jan Vesely e5121f3c10 Reapply "R600: Add new intrinsic to read work dimensions"
This effectively reverts revert 219707. After fixing the test to work with
new function name format and renamed intrinsic.

Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219710
2014-10-14 20:05:26 +00:00
Hal Finkel 171c2ec008 Revert "r216914 - Revert: [APFloat] Fixed a bug in method 'fusedMultiplyAdd'"
Reapply r216913, a fix for PR20832 by Andrea Di Biagio. The commit was reverted
because of buildbot failures, and credit goes to Ulrich Weigand for isolating
the underlying issue (which can be confirmed by Valgrind, which does helpfully
light up like the fourth of July). Uli explained the problem with the original
patch as:

  It seems the problem is calling multiplySignificand with an addend of category
  fcZero; that is not expected by this routine.  Note that for fcZero, the
  significand parts are simply uninitialized, but the code in (or rather, called
  from) multiplySignificand will unconditionally access them -- in effect using
  uninitialized contents.

This version avoids using a category == fcZero addend within
multiplySignificand, which avoids this problem (the Valgrind output is also now
clean).

Original commit message:

[APFloat] Fixed a bug in method 'fusedMultiplyAdd'.

When folding a fused multiply-add builtin call, make sure that we propagate the
correct result in the case where the addend is zero, and the two other operands
are finite non-zero.

Example:
  define double @test() {
    %1 = call double @llvm.fma.f64(double 7.0, double 8.0, double 0.0)
    ret double %1
  }

Before this patch, the instruction simplifier wrongly folded the builtin call
in function @test to constant 'double 7.0'.
With this patch, method 'fusedMultiplyAdd' correctly evaluates the multiply and
propagates the expected result (i.e. 56.0).

Added test fold-builtin-fma.ll with the reproducible from PR20832 plus extra
test cases to verify the behavior of method 'fusedMultiplyAdd' in the presence
of NaN/Inf operands.

This fixes PR20832.

llvm-svn: 219708
2014-10-14 19:23:07 +00:00
Rafael Espindola db3f0a24ec Revert "R600: Add new intrinsic to read work dimensions"
This reverts commit r219705.

CodeGen/R600/work-item-intrinsics.ll was failing on linux.

llvm-svn: 219707
2014-10-14 18:58:04 +00:00
Jan Vesely 86187d231a R600: Add new intrinsic to read work dimensions
v2: Add SI lowering
    Add test

v3: Place work dimensions after the kernel arguments.
v4: Calculate offset while lowering arguments
v5: rebase
v6: change prefix to AMDGPU

Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219705
2014-10-14 18:52:07 +00:00
Matt Arsenault e775f5fe76 R600/SI: Use DS offsets for constant addresses
Use 0 as the base address for a constant address, so if
we have a constant address we can save moves and form
read2/write2s.

llvm-svn: 219698
2014-10-14 17:21:19 +00:00
Hal Finkel a3f23e3725 [LVI] Check for @llvm.assume dominating the edge branch
When LazyValueInfo uses @llvm.assume intrinsics to provide edge-value
constraints, we should check for intrinsics that dominate the edge's branch,
not just any potential context instructions. An assumption that dominates the
edge's branch represents a truth on that edge. This is specifically useful, for
example, if multiple predecessors assume a pointer to be nonnull, allowing us
to simplify a later null comparison.

The test case, and an initial patch, were provided by Philip Reames. Thanks!

llvm-svn: 219688
2014-10-14 16:04:49 +00:00
Robert Khasanov 1a77f6664e [AVX512] Extended avx512_binop_rm to DQ/VL subsets.
Added encoding tests.

llvm-svn: 219686
2014-10-14 15:13:56 +00:00
Robert Khasanov 545d1b7726 [AVX512] Extended avx512_binop_rm to BW/VL subsets.
Added encoding tests.

llvm-svn: 219685
2014-10-14 14:36:19 +00:00
Bradley Smith 698e08f4cf [AArch64] Fix crash with empty/pseudo-only blocks in A53 erratum (835769) workaround
llvm-svn: 219684
2014-10-14 14:02:41 +00:00
Hao Liu 3cb826ca10 [AArch64]Select wide immediate offset into [Base+XReg] addressing mode
e.g Currently we'll generate following instructions if the immediate is too wide:
    MOV  X0, WideImmediate
    ADD  X1, BaseReg, X0
    LDR  X2, [X1, 0]

    Using [Base+XReg] addressing mode can save one ADD as following:
    MOV  X0, WideImmediate
    LDR  X2, [BaseReg, X0]

    Differential Revision: http://reviews.llvm.org/D5477

llvm-svn: 219665
2014-10-14 06:50:36 +00:00
Marcello Maggioni 5bbe3df63f Switch to select optimization for two-case switches
This is the same optimization of r219233 with modifications to support PHIs with multiple incoming edges from the same block
and a test to check that this condition is handled.

llvm-svn: 219656
2014-10-14 01:58:26 +00:00
David Majnemer db0773089f InstCombine: Fix miscompile in X % -Y -> X % Y transform
We assumed that negation operations of the form (0 - %Z) resulted in a
negative number.  This isn't true if %Z was originally negative.
Substituting the negative number into the remainder operation may result
in undefined behavior because the dividend might be INT_MIN.

This fixes PR21256.

llvm-svn: 219639
2014-10-13 22:37:51 +00:00
David Majnemer a252138942 InstCombine: Don't miscompile (x lshr C1) udiv C2
We have a transform that changes:
  (x lshr C1) udiv C2
into:
  x udiv (C2 << C1)

However, it is unsafe to do so if C2 << C1 discards any of C2's bits.

This fixes PR21255.

llvm-svn: 219634
2014-10-13 21:48:30 +00:00
Timur Iskhodzhanov 1ee5ac87e2 Add VS2012-generated test inputs for test/tools/llvm-readobj/codeview-linetables.test
llvm-svn: 219621
2014-10-13 17:03:13 +00:00
Filipe Cabecinhas 9d7bd78ffa Fix a broadcast related regression on the vector shuffle lowering.
Summary: Test by Robert Lougher!

Reviewers: chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5745

llvm-svn: 219617
2014-10-13 16:16:16 +00:00
Renato Golin 16ea8ba3bc Adds support for the Cortex-A17 to the ARM backend
Patch by Matthew Wahab.

llvm-svn: 219606
2014-10-13 10:22:19 +00:00
Daniel Sanders 642daf0c0c [mips] Mark redundant instructions with a comment in test/CodeGen/Mips/Fast-ISel/icmpa.ll.
llvm-svn: 219605
2014-10-13 10:18:02 +00:00
Bradley Smith f2a801d8ac [AArch64] Add workaround for Cortex-A53 erratum (835769)
Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is
possible for a 64-bit multiply-accumulate instruction in AArch64 state to
generate an incorrect result.  The details are quite complex and hard to
determine statically, since branches in the code may exist in some
 circumstances, but all cases end with a memory (load, store, or prefetch)
instruction followed immediately by the multiply-accumulate operation.

The safest work-around for this issue is to make the compiler avoid emitting
multiply-accumulate instructions immediately after memory instructions and the
simplest way to do this is to insert a NOP.

This patch implements such work-around in the backend, enabled via the option
-aarch64-fix-cortex-a53-835769.

The work-around code generation is not enabled by default.

llvm-svn: 219603
2014-10-13 10:12:35 +00:00
Yuri Gorshenin 46853b55fa [asan-asm-instrumentation] Fixed memory references which includes %rsp as a base or an index register.
Summary: [asan-asm-instrumentation] Fixed memory references which includes %rsp as a base or an index register.

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5599

llvm-svn: 219602
2014-10-13 09:37:47 +00:00
NAKAMURA Takumi 75a0240056 Revert r219584, "[X86] Memory folding for commutative instructions."
It broke i686 selfhosting.

llvm-svn: 219595
2014-10-13 04:17:34 +00:00
Joerg Sonnenberger 5ca10d0edb Revert r219223, it creates invalid PHI nodes.
llvm-svn: 219587
2014-10-12 17:16:04 +00:00
Benjamin Kramer 240b85eec5 InstCombine: Turn (x != 0 & x <u C) into the canonical range check form (x-1 <u C-1)
llvm-svn: 219585
2014-10-12 14:02:34 +00:00
Simon Pilgrim 77ac26d279 [X86] Memory folding for commutative instructions.
This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning.

This mainly helps the stack inliner better fold reloads of 3 (or more) operand instructions (VEX encoded SSE etc.) but by performing this in the lowest foldMemoryOperandImpl implementation it also replaces the X86InstrInfo::optimizeLoadInstr version and is now used by FastISel too.

Differential Revision: http://reviews.llvm.org/D5701

llvm-svn: 219584
2014-10-12 10:52:55 +00:00
NAKAMURA Takumi 6fd86f1678 llvm/test/CodeGen: Some tests don't REQUIRE asserts any more. Remove them.
llvm-svn: 219581
2014-10-12 06:47:47 +00:00
NAKAMURA Takumi 7198a199aa Suppress llvm-ar's MRI tests for now on win32, since line_iterator is incompatible to CRLF.
llvm-svn: 219579
2014-10-11 22:21:27 +00:00
David Majnemer fe7fccff11 InstCombine: Don't fold (X <<s log(INT_MIN)) /s INT_MIN to X
Consider the case where X is 2.  (2 <<s 31)/s-2147483648 is zero but we
would fold to X.  Note that this is valid when we are in the unsigned
domain because we require NUW: 2 <<u 31 results in poison.

This fixes PR21245.

llvm-svn: 219568
2014-10-11 10:20:04 +00:00
David Majnemer cb9d596655 InstCombine, InstSimplify: (%X /s C1) /s C2 isn't always 0 when C1 * C2 overflow
consider:
C1 = INT_MIN
C2 = -1

C1 * C2 overflows without a doubt but consider the following:
%x = i32 INT_MIN

This means that (%X /s C1) is 1 and (%X /s C1) /s C2 is -1.

N. B.  Move the unsigned version of this transform to InstSimplify, it
doesn't create any new instructions.

This fixes PR21243.

llvm-svn: 219567
2014-10-11 10:20:01 +00:00
David Majnemer 3cac85e071 InstCombine: mul to shl shouldn't preserve nsw
consider:
mul i32 nsw %x, -2147483648

this instruction will not result in poison if %x is 1

however, if we transform this into:
shl i32 nsw %x, 31

then we will be generating poison because we just shifted into the sign
bit.

This fixes PR21242.

llvm-svn: 219566
2014-10-11 10:19:52 +00:00
Reed Kotler 62de6b96b5 Add basic conditional branches in mips fast-isel
Summary: Implement the most basic form of conditional branches in Mips fast-isel.

Test Plan:
br1.ll
run 4 flavors of test-suite. mips32 r1/r2 and at -O0/O2

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits, rfuhler

Differential Revision: http://reviews.llvm.org/D5583

llvm-svn: 219556
2014-10-11 00:55:18 +00:00
Sanjay Patel ad8b666624 Return undef on FP <-> Int conversions that overflow (PR21330).
The LLVM Lang Ref states for signed/unsigned int to float conversions:
"If the value cannot fit in the floating point value, the results are undefined."

And for FP to signed/unsigned int:
"If the value cannot fit in ty2, the results are undefined."

This matches the C definitions.

The existing behavior pins to infinity or a max int value, but that may just
lead to more confusion as seen in:
http://llvm.org/bugs/show_bug.cgi?id=21130

Returning undef will hopefully lead to a less silent failure.

Differential Revision: http://reviews.llvm.org/D5603

llvm-svn: 219542
2014-10-10 23:00:21 +00:00
Matt Arsenault 61cc9083d0 R600/SI: Change how DS offsets are printed
Match SC by using offset/offset0/offset1 and printing
in decimal.

llvm-svn: 219537
2014-10-10 22:16:07 +00:00
Matt Arsenault fe0a2e677b R600/SI: Match read2/write2 stride 64 versions
llvm-svn: 219536
2014-10-10 22:12:32 +00:00
Matt Arsenault 410332860d R600/SI: Add load / store machine optimizer pass.
Currently this only functions to match simple cases
where ds_read2_* / ds_write2_* instructions can be used.

In the future it might match some of the other weird
load patterns, such as direct to LDS loads.

Currently enabled only with a subtarget feature to enable
easier testing.

llvm-svn: 219533
2014-10-10 22:01:59 +00:00
Sanjoy Das 1f05c51e5e This patch teaches ScalarEvolution to pick and use !range metadata.
It also makes it more aggressive in querying range information by
adding a call to isKnownPredicateWithRanges to
isLoopBackedgeGuardedByCond and isLoopEntryGuardedByCond.

phabricator: http://reviews.llvm.org/D5638

Reviewed by: atrick, hfinkel

llvm-svn: 219532
2014-10-10 21:22:34 +00:00
Reed Kotler 1f64ecab79 Implement floating point compare for mips fast-isel
Summary: Expand SelectCmp to handle floating point compare

Test Plan:
fpcmpa.ll
run 4 flavors of test-suite, mips32 r1/r2 O0/O2

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits, rfuhler

Differential Revision: http://reviews.llvm.org/D5567

llvm-svn: 219530
2014-10-10 20:46:28 +00:00
Rafael Espindola b275797e31 llvm-ar: Start adding support for mri scripts.
I was quiet surprised to find this feature being used. Fortunately the uses
I found look fairly simple. In fact, they are just a very verbose version
of the regular ar commands.

Start implementing it then by parsing the script and setting the command
variables as if we had a regular command line.

This patch adds just enough support to create an empty archive and do a bit
of error checking. In followup patches I will implement at least addmod
and addlib.

From the description in the manual, even the more general case should not
be too hard to implement if needed. The features that don't map 1:1 to
the simple command line are

* Reading from multiple archives.
* Creating multiple archives.

llvm-svn: 219521
2014-10-10 18:33:51 +00:00
Reed Kotler 497311ab99 implement integer compare in mips fast-isel
Summary: implement SelectCmp (integer compare ) in mips fast-isel

Test Plan:
icmpa.ll
also ran 4 test-suite flavors mips32 r1/r2 O0/O2

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits, rfuhler, mcrosier

Differential Revision: http://reviews.llvm.org/D5566

llvm-svn: 219518
2014-10-10 17:39:51 +00:00
Mark Heffernan 2beab5f0b4 This patch de-pessimizes the calculation of loop trip counts in
ScalarEvolution in the presence of multiple exits. Previously all
loops exits had to have identical counts for a loop trip count to be
considered computable. This pessimization was implemented by calling
getBackedgeTakenCount(L) rather than getExitCount(L, ExitingBlock)
inside of ScalarEvolution::getSmallConstantTripCount() (see the FIXME
in the comments of that function). The pessimization was added to fix
a corner case involving undefined behavior (pr/16130). This patch more
precisely handles the undefined behavior case allowing the pessimization
to be removed.

ControlsExit replaces IsSubExpr to more precisely track the case where
undefined behavior is expected to occur. Because undefined behavior is
tracked more precisely we can remove MustExit from ExitLimit. MustExit
was used to track the case where the limit was computed potentially
assuming undefined behavior even if undefined behavior didn't necessarily
occur.

llvm-svn: 219517
2014-10-10 17:39:11 +00:00
Hal Finkel 7a87f8a670 [MiSched] Fix a logic error in tryPressure()
Fixes a logic error in the MachineScheduler found by Steve Montgomery (and
confirmed by Andy). This has gone unfixed for months because the fix has been
found to introduce some small performance regressions. However, Andy has
recommended that, at this point, we fix this to avoid further dependence on the
incorrect behavior (and then follow-up separately on any regressions), and I
agree.

Fixes PR18883.

llvm-svn: 219512
2014-10-10 17:06:20 +00:00
Reed Kotler 12f9488e33 Implement floating point to integer conversion in mips fast-isel
Summary: Add the ability to convert 64 or 32 bit floating point values to integer in mips fast-isel

Test Plan:
fpintconv.ll
ran 4 flavors of test-suite with no errors, misp32 r1/r2 O0/O2

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits, rfuhler, mcrosier

Differential Revision: http://reviews.llvm.org/D5562

llvm-svn: 219511
2014-10-10 17:00:46 +00:00
Frederic Riss b3c9912a45 [dwarfdump] Prettyprint DW_AT_APPLE_property_attribute bitfield values.
This change depends on the ApplePropertyString helper that I sent spearately.
Not sure how you want this tested: as a tool test by adding a binary to dump, or as an llvm test starting from an IR file?

Reviewers: dblaikie, samsonov

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5689

llvm-svn: 219507
2014-10-10 15:51:10 +00:00
Frederic Riss d4de180e19 [dwarfdump] Resolve also variable specifications/abstract_origins.
DW_AT_specification and DW_AT_abstract_origin resolving was only performed
on subroutine DIEs because it used the getSubroutineName method. Introduce
a more generic getName() and use it to dump the reference attributes.

Testcases have been updated to check the printed names instead of the offsets
except when the name could be ambiguous.

Reviewers: dblaikie, samsonov

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5625

llvm-svn: 219506
2014-10-10 15:51:02 +00:00
Zoran Jovanovic 98bd58ca33 [mips][microMIPS] Implement ADDIUSP instruction
Differential Revision: http://reviews.llvm.org/D5084

llvm-svn: 219500
2014-10-10 14:37:30 +00:00
Zoran Jovanovic 95e14e711d [mips][microMIPS] Implement JR16 instruction
Differential Revision: http://reviews.llvm.org/D5062

llvm-svn: 219498
2014-10-10 14:02:44 +00:00
Zoran Jovanovic b26f889afa [mips][microMIPS] Implement ADDIUS5 instruction
Differential Revision: http://reviews.llvm.org/D5049

llvm-svn: 219495
2014-10-10 13:45:34 +00:00
Zoran Jovanovic b39a174f11 ps][microMIPS] Implement JRC instruction
Differential Revision: http://reviews.llvm.org/D5045

llvm-svn: 219494
2014-10-10 13:31:18 +00:00
Zoran Jovanovic 6097bad3f8 [mips][microMIPS] Implement JALRS16 instruction
Differential Revision: http://reviews.llvm.org/D5027

llvm-svn: 219493
2014-10-10 13:22:28 +00:00
David Majnemer b30753d731 Add tests for r219479.
llvm-svn: 219480
2014-10-10 06:59:05 +00:00
Arnold Schwaighofer d7d010eb2a SimplifyCFG: Don't convert phis into selects if we could remove undef behavior
instead

We used to transform this:

  define void @test6(i1 %cond, i8* %ptr) {
  entry:
    br i1 %cond, label %bb1, label %bb2

  bb1:
    br label %bb2

  bb2:
    %ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ]
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

into this:

  define void @test6(i1 %cond, i8* %ptr) {
    %ptr.2 = select i1 %cond, i8* null, i8* %ptr
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

because the simplifycfg transformation into selects would happen to happen
before the simplifycfg transformation that removes unreachable control flow
(We have 'unreachable control flow' due to the store to null which is undefined
behavior).

The existing transformation that removes unreachable control flow in simplifycfg
is:

  /// If BB has an incoming value that will always trigger undefined behavior
  /// (eg. null pointer dereference), remove the branch leading here.
  static bool removeUndefIntroducingPredecessor(BasicBlock *BB)

Now we generate:

  define void @test6(i1 %cond, i8* %ptr) {
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

I did not see any impact on the test-suite + externals.

rdar://18596215

llvm-svn: 219462
2014-10-10 01:27:02 +00:00
David Majnemer efbe94890c obj2yaml, COFF: Handle long section names
Long section names are represented as a slash followed by a numeric
ASCII string.  This number is an offset into a string table.

Print the appropriate entry in the string table instead of the less
enlightening /4.

N.B.  yaml2obj already does the right thing, this test exercises both
sides of the (de-)serialization.

llvm-svn: 219458
2014-10-10 00:17:57 +00:00
Sanjay Patel 3d497cd778 Improve sqrt estimate algorithm (fast-math)
This patch changes the fast-math implementation for calculating sqrt(x) from:
y = 1 / (1 / sqrt(x))
to:
y = x * (1 / sqrt(x))

This has 2 benefits: less code / faster code and one less estimate instruction 
that may lose precision.

The only target that will be affected (until http://reviews.llvm.org/D5658 is approved)
is PPC. The difference in codegen for PPC is 2 less flops for a single-precision sqrtf
or vector sqrtf and 4 less flops for a double-precision sqrt. 
We also eliminate a constant load and extra register usage.

Differential Revision: http://reviews.llvm.org/D5682

llvm-svn: 219445
2014-10-09 21:26:35 +00:00
Samuel Antao 1194b8fd40 Fix bug in GPR to FPR moves in PPC64LE.
The current implementation of GPR->FPR register moves uses a stack slot. This mechanism writes a double word and reads a word. In big-endian the load address must be displaced by 4-bytes in order to get the right value. In little endian this is no longer required. This patch fixes the issue and adds LE regression tests to fast-isel-conversion which currently expose this problem.

llvm-svn: 219441
2014-10-09 20:42:56 +00:00
Chad Rosier bd64d46188 [Reassociate] Don't canonicalize X - undef to X + (-undef).
Phabricator Revision: http://reviews.llvm.org/D5674
PR21205

llvm-svn: 219434
2014-10-09 20:06:29 +00:00
Hal Finkel cbbd3df836 Revert "[BasicAA] Revert "Revert r218714 - Make better use of zext and sign information.""
This reverts commit r219135 -- still causing miscompiles in SPEC it seems...

llvm-svn: 219432
2014-10-09 19:48:12 +00:00
Tom Stellard 3457a8495a R600/SI: Legalize CopyToReg during instruction selection
The instruction emitter will crash if it encounters a CopyToReg
node with a non-register operand like FrameIndex.

llvm-svn: 219428
2014-10-09 19:06:00 +00:00
Tom Stellard 8dd392e135 R600/SI: Legalize INSERT_SUBREG instructions during PostISelFolding
LLVM assumes INSERT_SUBREG will always have register operands, so
we need to legalize non-register operands, like FrameIndexes, to
avoid random assertion failures.

llvm-svn: 219420
2014-10-09 18:09:15 +00:00
Bill Schmidt cb34fd09cd [PPC64] VSX indexed-form loads use wrong instruction format
The VSX instruction definitions for lxsdx, lxvd2x, lxvdsx, and lxvw4x
incorrectly use the XForm_1 instruction format, rather than the
XX1Form instruction format.  This is likely a pasto when creating
these instructions, which were based on lvx and so forth.  This patch
uses the correct format.

The existing reformatting test (test/MC/PowerPC/vsx.s) missed this
because the two formats differ only in that XX1Form has an extension
to the target register field in bit 31.  The tests for these
instructions used a target register of 7, so the default of 0 in bit
31 for XForm_1 didn't expose a problem.  For register numbers 32-63
this would be noticeable.  I've changed the test to use higher
register numbers to verify my change is effective.

llvm-svn: 219416
2014-10-09 17:51:35 +00:00
Andrea Di Biagio 458a669f49 [InstCombine] Fix wrong folding of constant comparisons involving ashr and negative values.
This patch fixes a bug in method InstCombiner::FoldCmpCstShrCst where we
wrongly computed the distance between the highest bits set of two negative
values.

This fixes PR21222.

Differential Revision: http://reviews.llvm.org/D5700

llvm-svn: 219406
2014-10-09 12:41:49 +00:00
Robert Khasanov d5b14f7994 [AVX512] Extended avx512_binop_rm for AVX512VL subsets.
Added avx512_binop_rm_vl multiclass for VL subset
Added encoding tests

llvm-svn: 219390
2014-10-09 08:38:48 +00:00
Adam Nemet 47b2d5f1e0 [AVX512] Intrinsics for vextract*x4
This adds the Pat<>'s for the intrinsics.  These are necessary because we
don't lower these intrinsics to SDNodes but match them directly.  See the
rational in the previous commit.

llvm-svn: 219362
2014-10-08 23:25:37 +00:00
Adam Nemet 2b5cdbb3de [AVX512] Add asm-only support for vextract*x4 masking variants
These derive from the new asm-only masking definitions.

Unfortunately I wasn't able to find a ISel pattern that we could legally
generate for the masking variants.  The problem is that since the destination
is v4* we would need VK4 register classes and v4i1 value types to express the
masking.  These are however not legal types/classes in AVX512f but only in VL,
so things get complicated pretty quickly.  We can revisit this question later
if we have a more pressing need to express something like this.

So the ISel patterns are empty for the masking instructions and the next patch
will add Pat<>s instead to match the intrinsics calls with instructions.

llvm-svn: 219361
2014-10-08 23:25:33 +00:00
Robin Morisset 6f3d04e4b6 [X86] Don't transform atomic-load-add into an inc/dec when inc/dec is slow
llvm-svn: 219357
2014-10-08 23:16:23 +00:00
Robin Morisset f9e8721564 [X86] Avoid generating inc/dec when slow for x.atomic_store(1 + x.atomic_load())
Summary:
I had forgotten to check for NotSlowIncDec in the patterns that can generate
inc/dec for the above pattern (added in D4796).
This currently applies to Atom Silvermont, KNL and SKX.

Test Plan: New checks on atomic_mi.ll

Reviewers: jfb, nadav

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5677

llvm-svn: 219336
2014-10-08 19:38:18 +00:00
David Majnemer ac07703842 Inliner: Non-local functions in COMDATs shouldn't be dropped
A function with discardable linkage cannot be discarded if its a member
of a COMDAT group without considering all the other COMDAT members as
well.  This sort of thing is already handled by GlobalOpt/GlobalDCE.

This fixes PR21206.

llvm-svn: 219335
2014-10-08 19:32:32 +00:00
Timur Iskhodzhanov 5fcaeebb72 Fix COFF section index relocation should be 16 bits, not 32
Original patch by Andrey Guskov!
http://reviews.llvm.org/D5651

llvm-svn: 219327
2014-10-08 18:01:49 +00:00
Rafael Espindola 8280fbbf9f Correctly compute the size of common symbols in COFF.
llvm-svn: 219324
2014-10-08 17:37:19 +00:00
Rafael Espindola 589b36aae7 Print symbol sizes in this test in preparation for fixing COFF common sizes.
llvm-svn: 219320
2014-10-08 17:19:42 +00:00
Justin Bogner 894eff7a9f Revert "[InstCombine] re-commit r218721 with fix for pr21199"
This seems to cause a miscompile when building clang, which causes a
bootstrapped clang to fail or crash in several of its tests.

See:
  http://lab.llvm.org:8013/builders/clang-x86_64-darwin11-RA/builds/1184
  http://bb.pgr.jp/builders/clang-3stage-x86_64-linux/builds/7813

This reverts commit r219282.

llvm-svn: 219317
2014-10-08 16:30:22 +00:00
Robert Khasanov b51bb22611 [AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VPCMP/VPCMPU{BWDQ}
Added CMP_MASK_CC intrinsic type.
Added tests for intrinsics.

Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com>

llvm-svn: 219316
2014-10-08 15:49:26 +00:00
Renato Golin 0595a26c25 Emit unaligned access build attribute for ARM
Patch by Charlie Turner.

llvm-svn: 219301
2014-10-08 12:26:22 +00:00
David Majnemer 1b3b70e371 GlobalOpt: Don't drop unused memberes of a Comdat
A linkonce_odr member of a COMDAT shouldn't be dropped if we need to
keep the entire COMDAT group.

This fixes PR21191.

llvm-svn: 219283
2014-10-08 07:23:31 +00:00
Gerolf Hoflehner e2ff5b9223 [InstCombine] re-commit r218721 with fix for pr21199
The icmp-select-icmp optimization targets select-icmp.eq
only. This is now ensured by testing the branch predicate
explictly. This commit also includes the test case for pr21199.

llvm-svn: 219282
2014-10-08 06:42:19 +00:00
David Majnemer d7586046ee COFF: Don't oversize COMMON symbols when targeting BFD ld
COFF normally doesn't allow us to describe the alignment of COMMON
symbols.

It turns out that most linkers use the symbol size as a hint as to how
aligned the symbol should be.

However the BFD folks have added a .drectve command, which we
now support as of r219229, that allows us to specify the alignment
precisely.  With this in mind, stop rounding sizes up.

llvm-svn: 219281
2014-10-08 06:38:53 +00:00
David Majnemer 5fbe324ff9 llvm-dwarfdump: Add support for some COFF relocations
DWARF in COFF utilizes several relocations.  Implement support for them
in RelocVisitor to support llvm-dwarfdump.

llvm-svn: 219280
2014-10-08 06:38:50 +00:00
Chad Rosier d9d0f86a79 [AArch64] Generate vector signed/unsigned mul and mla/mls long.
Phabricator Revision: http://reviews.llvm.org/D5589
Patch by Balaram Makam <bmakam@codeaurora.org>!!

llvm-svn: 219276
2014-10-08 02:31:24 +00:00
Rui Ueyama b67154d01b llvm-readobj: add test for r219228
llvm-svn: 219274
2014-10-08 02:06:11 +00:00
Hans Wennborg 1256198bbc Revert r219175 - [InstCombine] re-commit r218721 icmp-select-icmp optimization
This seems to have caused PR21199.

llvm-svn: 219264
2014-10-08 01:05:57 +00:00
Robin Morisset 880580b88f [X86] Fix a bug with fetch_add(INT32_MIN)
Summary:
Fix pr21099

The pseudocode of what we were doing (spread through two functions) was:
if (operand.doesNotFitIn32Bits())
  Opc.initializeWithFoo();
if (operand < 0)
  operand = -operand;
if (operand.doesFitIn8Bits())
  Opc.initializeWithBar();
else if (operand.doesFitIn32Bits())
  Opc.initializeWithBlah();
doStuff(Opc);

So for operand == INT32_MIN, Opc was never initialized because the operand changes
from fitting in 32 bits to not fitting, causing the various bugs/error messages
noted by pr21099.

This patch adds an extra test at the beginning for this case, and an
llvm_unreachable to have better error message if the operand ends up
not fitting in 32-bits at the end.

Test Plan: new test + make check

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5655

llvm-svn: 219257
2014-10-07 23:53:57 +00:00
David Blaikie c6c6c7b177 DebugInfo+DFSan: Ensure that debug info references to llvm::Functions remain pointing to the underlying function when wrappers are created
This is somewhat the inverse of how similar bugs in DAE and ArgPromo
manifested and were addressed. In those passes, individual call sites
were visited explicitly, and then the old function was deleted. This
left the debug info with a null llvm::Function* that needed to be
updated to point to the new function.

In the case of DFSan, it RAUWs the old function with the wrapper, which
includes debug info. So now the debug info refers to the wrapper, which
doesn't actually have any instructions with debug info in it, so it is
ignored entirely - resulting in a DW_TAG_subprogram with no high/low pc,
etc. Instead, fix up the debug info to refer to the original function
after the RAUW messed it up.

Reviewed/discussed with Peter Collingbourne on the llvm-dev mailing
list.

llvm-svn: 219249
2014-10-07 22:59:46 +00:00
Duncan P. N. Exon Smith c46cfcbbc6 LoopUnroll: Create sub-loops in LoopInfo
`LoopUnrollPass` says that it preserves `LoopInfo` -- make it so.  In
particular, tell `LoopInfo` about copies of inner loops when unrolling
the outer loop.

Conservatively, also tell `ScalarEvolution` to forget about the original
versions of these loops, since their inputs may have changed.

Fixes PR20987.

llvm-svn: 219241
2014-10-07 21:19:00 +00:00
Tom Stellard 20fa0be97f R600/SI: Remove assertion in SIInstrInfo::areLoadsFromSameBasePtr()
Added a FIXME coment instead, we need to handle the case where the
two DS instructions being compared have different numbers of operands.

llvm-svn: 219236
2014-10-07 21:09:20 +00:00
Saleem Abdulrasool 64d491e488 MC: add support for -aligncomm GNU extension
The GNU linker supports an -aligncomm directive that allows for power-of-2
alignment of common data.  Add support to emit this directive.

llvm-svn: 219229
2014-10-07 19:37:57 +00:00
Marcello Maggioni 963bc87dbd Two case switch to select optimization
This optimization tries to convert switch instructions that are used to select a value with only 2 unique cases + default block
to a select or a couple of selects (depending if the default block is reachable or not).

The typical case this optimization wants to be able to optimize is this one:

Example:
switch (a) {
  case 10:                %0 = icmp eq i32 %a, 10
    return 10;            %1 = select i1 %0, i32 10, i32 4
  case 20:        ---->   %2 = icmp eq i32 %a, 20
    return 2;             %3 = select i1 %2, i32 2, i32 %1
  default:
    return 4;
}

It also sets the base for further optimizations that are planned and being reviewed.

llvm-svn: 219223
2014-10-07 18:16:44 +00:00
David Blaikie 17364d4e05 DebugInfo+DeadArgElimination: Ensure llvm::Function*s from debug info are updated even when DAE removes both varargs and non-varargs arguments on the same function.
After some stellar (& inspired) help from Reid Kleckner providing a test
case for some rather unstable undefined behavior showing up as
assertions produced by r214761, I was able to fix this issue in DAE
involving the application of both varargs removal, followed by normal
argument removal.

Indeed I introduced this same bug into ArgumentPromotion (r212128) by
copying the code from DAE, and when I fixed the bug in ArgPromo
(r213805) and commented in that patch that I didn't need to address the
same issue in DAE because it was a single pass. Turns out it's two pass,
one for the varargs and one for the normal arguments, so the same fix is
needed (at least during varargs removal). So here it is.

(the observable/net effect of this bug, even when it didn't result in
assertion failure, is that debug info would describe the DAE'd function
in the abstract, but wouldn't provide high/low_pc, variable locations,
line table, etc (it would appear as though the function had been
entirely optimized away), see the original PR14016 for details of the
general problem)

I'm not recommitting the assertion just yet, as there's been another
regression of it since I last tried. It might just be a few test cases
weren't adequately updated after Adrian or Duncan's recent schema
changes.

llvm-svn: 219210
2014-10-07 15:10:23 +00:00
Suyog Sarda 181cc9a029 Remove Extra lines. NFC.
llvm-svn: 219201
2014-10-07 11:31:31 +00:00
Yuri Gorshenin e8c81fd25a [asan-asm-instrumentation] CFI directives are generated for .S files.
Summary: CFI directives are generated for .S files.

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5520

llvm-svn: 219199
2014-10-07 11:03:09 +00:00
Daniel Sanders f3fe49aac6 [mips] Return {f128} correctly for N32/N64.
Summary:
According to the ABI documentation, f128 and {f128} should both be returned
in $f0 and $f2. However, this doesn't match GCC's behaviour which is to
return f128 in $f0 and $f2, but {f128} in $f0 and $f1.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5578

llvm-svn: 219196
2014-10-07 09:29:59 +00:00
Craig Topper 0676b902ad [X86] Fix a bug where the disassembler was ignoring the VEX.W bit in 32-bit mode for certain instructions it shouldn't.
Unfortunately, this isn't easy to fix since there's no simple way to figure out from the disassembler tables whether the W-bit is being used to select a 64-bit GPR or if its a required part of the opcode. The fix implemented here just looks for "64" in the instruction name and ignores the W-bit in 32-bit mode if its present.

Fixes PR21169.

llvm-svn: 219194
2014-10-07 07:29:50 +00:00
David Majnemer e025321d36 GlobalDCE: Don't drop any COMDAT members
If we require a single member of a comdat, require all of the other
members as well.

This fixes PR20981.

llvm-svn: 219191
2014-10-07 07:07:19 +00:00
Rafael Espindola dfc7ed7ad0 gold plugin: Handle gold selecting a linkonce GV when a weak is present.
The plugin API doesn't have the notion of linkonce, only weak. It is up to the
plugin to figure out if a symbol used only for the symbol table can be dropped.
In particular, it has to avoid dropping a linkonce_odr selected by gold if there
is also a weak_odr.

llvm-svn: 219188
2014-10-07 04:06:13 +00:00
Juergen Ributzka ef3722d8e9 [FastISel][AArch64] Teach the address computation code to also fold sign-/zero-extends.
The code already folds sign-/zero-extends, but only if they are arguments to
mul and shift instructions. This extends the code to also fold them when they
are direct inputs.

llvm-svn: 219187
2014-10-07 03:40:06 +00:00
Juergen Ributzka 75b2f34069 [FastISel][AArch64] Teach the address computation to also fold sub instructions.
Tiny enhancement to the address computation code to also fold sub instructions
if the rhs is constant and can be folded into the offset.

llvm-svn: 219186
2014-10-07 03:40:03 +00:00
Juergen Ributzka 42bf665f2b [FastISel][AArch64] Fix "Fold sign-/zero-extends into the load instruction."
This commit fixes an issue with sign-/zero-extending loads that was discovered
by Richard Barton.

We use now the correct load instructions for sign-extending loads to 64bit. Also
updated and added more unit tests.

llvm-svn: 219185
2014-10-07 03:39:59 +00:00
Rafael Espindola cb69583c50 gold plugin: create internal replacement with original linkage first.
The call to copyAttributesFrom will copy the visibility, which might assert
if it were to produce something invalid like "internal hidden". We avoid it
by first creating the replacement with the original linkage and then setting
it to internal affter the call to copyAttributesFrom.

llvm-svn: 219184
2014-10-07 03:19:55 +00:00
Rafael Espindola 93aec71ba9 gold plugin: Remap function arguments when creating a replacement function.
When creating an internal function replacement for use in an alias we were
not remapping the argument uses in the instructions to point to the new
arguments.

llvm-svn: 219177
2014-10-07 00:47:38 +00:00
Gerolf Hoflehner c0b4c20e5e [InstCombine] re-commit r218721 icmp-select-icmp optimization
Takes care of the assert that caused build fails.
Rather than asserting the code checks now that the definition
and use are in the same block, and does not attempt
to optimize when that is not the case.

llvm-svn: 219175
2014-10-07 00:16:12 +00:00
NAKAMURA Takumi 3848e903e4 llvm/test/lit.cfg: Suppress dwarf stuff for targeting x86_64-mingw32 while investigating since r219108.
llvm-svn: 219171
2014-10-06 23:29:06 +00:00
Hal Finkel 9808595319 [DAGCombine] Remove SIGN_EXTEND-related inf-loop
The patch's author points out that, despite the function's documentation,
getSetCCResultType is only used to get the SETCC result type (with one
here-removed problematic exception). In one case, getSetCCResultType was being
used to get the predicate type to use for a SELECT node, and then
SIGN_EXTENDing (or truncating) to get the input predicate to match that type.
Unfortunately, this was happening inside visitSIGN_EXTEND, and creating new
SIGN_EXTEND nodes was causing an infinite loop. In addition, this behavior was
wrong if a target was not using ZeroOrNegativeOneBooleanContent. Lastly, the
extension/truncation seems unnecessary here: SELECT is defined as:

  Select(COND, TRUEVAL, FALSEVAL). If the type of the boolean COND is not i1
  then the high bits must conform to getBooleanContents.

So here we remove this use of getSetCCResultType and update
getSetCCResultType's documentation to reflect its actual uses.

Patch by deadal nix!

llvm-svn: 219141
2014-10-06 20:19:47 +00:00
Sanjay Patel 7bc9185ab5 Fast-math fold: x / (y * sqrt(z)) -> x * (rsqrt(z) / y)
The motivation is to recognize code such as this from /llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c:

float distance = sqrt(dx * dx + dy * dy + dz * dz);
float mag = dt / (distance * distance * distance);

Without this patch, we don't match the sqrt as a reciprocal sqrt, so for PPC the new testcase in this patch produces:

   addis 3, 2, .LCPI4_2@toc@ha
   lfs 4, .LCPI4_2@toc@l(3)
   addis 3, 2, .LCPI4_1@toc@ha
   lfs 0, .LCPI4_1@toc@l(3)
   fcmpu 0, 1, 4
   beq 0, .LBB4_2
# BB#1:
   frsqrtes 4, 1
   addis 3, 2, .LCPI4_0@toc@ha
   lfs 5, .LCPI4_0@toc@l(3)
   fnmsubs 13, 1, 5, 1
   fmuls 6, 4, 4
   fmadds 1, 13, 6, 5
   fmuls 1, 4, 1
   fres 4, 1                <--- reciprocal of reciprocal square root
   fnmsubs 1, 1, 4, 0
   fmadds 4, 4, 1, 4
.LBB4_2:
   fmuls 1, 4, 2
   fres 2, 1
   fnmsubs 0, 1, 2, 0
   fmadds 0, 2, 0, 2
   fmuls 1, 3, 0
   blr

After the patch, this simplifies to:

frsqrtes 0, 1
addis 3, 2, .LCPI4_1@toc@ha
fres 5, 2
lfs 4, .LCPI4_1@toc@l(3)
addis 3, 2, .LCPI4_0@toc@ha
lfs 7, .LCPI4_0@toc@l(3)
fnmsubs 13, 1, 4, 1
fmuls 6, 0, 0
fnmsubs 2, 2, 5, 7
fmadds 1, 13, 6, 4
fmadds 2, 5, 2, 5
fmuls 0, 0, 1
fmuls 0, 0, 2
fmuls 1, 3, 0
blr

Differential Revision: http://reviews.llvm.org/D5628

llvm-svn: 219139
2014-10-06 19:31:18 +00:00
Hal Finkel 43ce71f1b1 [BasicAA] Revert "Revert r218714 - Make better use of zext and sign information."
This reverts r218944, which reverted r218714, plus a bug fix.

Description of the bug in r218714 (by Nick)

The original patch forgot to check if the Scale in VariableGEPIndex flipped the
sign of the variable. The BasicAA pass iterates over the instructions in the
order they appear in the function, and so BasicAliasAnalysis::aliasGEP is
called with the variable it first comes across as parameter GEP1. Adding a
%reorder label puts the definition of %a after %b so aliasGEP is called with %b
as the first parameter and %a as the second. aliasGEP later calculates that %a
== %b + 1 - %idxprom where %idxprom >= 0 (if %a was passed as the first
parameter it would calculate %b == %a - 1 + %idxprom where %idxprom >= 0) -
ignoring that %idxprom is scaled by -1 here lead the patch to incorrectly
conclude that %a > %b.

Revised patch by Nick White, thanks! Thanks to Lang to isolating the bug.
Slightly modified by me to add an early exit from the loop and avoid
unnecessary, but expensive, function calls.

Original commit message:

Two related things:

 1. Fixes a bug when calculating the offset in GetLinearExpression. The code
    previously used zext to extend the offset, so negative offsets were converted
    to large positive ones.

 2. Enhance aliasGEP to deduce that, if the difference between two GEP
    allocations is positive and all the variables that govern the offset are also
    positive (i.e. the offset is strictly after the higher base pointer), then
    locations that fit in the gap between the two base pointers are NoAlias.

Patch by Nick White!

llvm-svn: 219135
2014-10-06 18:37:59 +00:00
Hans Wennborg 1b1a399489 MachObjectWriter: optimize the string table for common suffices
This is a follow-up to r207670 (ELF) and r218636 (COFF).

Differential Revision: http://reviews.llvm.org/D5622

llvm-svn: 219126
2014-10-06 17:05:19 +00:00
Timur Iskhodzhanov 116033303b Fix dumping codeview line tables when there are multiple debug sections
Codeview line tables for functions in different sections refer to a common
STRING_TABLE_SUBSECTION for filenames.
This happens when building with -Gy or with inline functions with MSVC.

Original patch by Jeff Muizelaar!

llvm-svn: 219125
2014-10-06 16:59:44 +00:00
Hal Finkel 8eae3ad2ff [CFL-AA] Update for handling of globals and more tests
We used to return PartialAlias if *either* variable being queried interacted
with arguments or globals. AFAICT, we can change this to only returning
MayAlias iff *both* variables being queried interacted with arguments or
globals.

Also, adding some basic functionality tests: some basic IPA tests, checking
that we give conservative responses with arguments/globals thrown in the mix,
and ensuring that we trace values through stores and loads.

Note that saying that 'x' interacted with arguments or globals means that the
Attributes of the StratifiedSet that 'x' belongs to has any bits set.

Patch by George Burgess IV, thanks!

llvm-svn: 219122
2014-10-06 14:42:56 +00:00
Eric Christopher 2dab835979 For biendian targets like ARM and AArch64, it is useful to have the
output of the llvm-dwarfdump and llvm-objdump report the endianness
used when the object files were generated.

Patch by Charlie Turner.

llvm-svn: 219110
2014-10-06 07:06:03 +00:00
Eric Christopher 9864519c8b Add support for ARM and AArch64 big endian objects to
RelocVisitor.

Patch by Charlie Turner.

llvm-svn: 219109
2014-10-06 07:02:58 +00:00
Eric Christopher 1452c8e640 Add some tests for RelocVisitor.
Patch by Charlie Turner.

llvm-svn: 219107
2014-10-06 06:52:52 +00:00
Frederic Riss d1cfc3c791 [dwarfdump] Print the name for referenced specification of abstract_origin DIEs.
Reviewers: dblaikie, samsonov, echristo, aprantl

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5466

llvm-svn: 219099
2014-10-06 03:36:31 +00:00
Owen Anderson 8373d338f6 Give the Reassociate pass a bit more flexibility and autonomy when optimizing expressions.
Particularly, it addresses cases where Reassociate breaks Subtracts but then fails to optimize combinations like I1 + -I2 where I1 and I2 have the same rank and are identical.

Patch by Dmitri Shtilman.

llvm-svn: 219092
2014-10-05 23:41:26 +00:00
Chandler Carruth 0927da4583 [x86] Remove the 2-addr-to-3-addr "optimization" from shufps to pshufd.
This trades a (register-renamer-friendly) movaps for a floating point
/ integer domain cross. That is a very bad trade, even on architectures
where domain crossing is relatively fast. On any chip where there is
even a cycle stall, this is a Very Bad Idea. It doesn't even seem likely
to cause a spill to be introduced because the reason for the copy is to
destructively shuffle in place.

Thanks to Ben Kramer for fixing a bug in this code that my new shuffle
lowering exposed and highlighting that perhaps it should just go away.
=]

llvm-svn: 219090
2014-10-05 22:57:31 +00:00
Chandler Carruth daa1ff985c [x86, dag] Teach the DAG combiner to prune inputs toa vector_shuffle
that are unused.

This allows the combiner to delete math feeding shuffles where the math
isn't actually necessary. This improves some of the vperm2x128 tests
that regressed when the vector shuffle lowering started actually
generating vperm instructions rather than forcibly decomposing them.

Sadly, this isn't enough to get this *really* right because we still
form a completely unnecessary permutation. To fix that, we also need to
fold shuffles which just rearrange concatenated or inserted subvectors.

llvm-svn: 219086
2014-10-05 19:14:34 +00:00
Benjamin Kramer 77b0e13aba X86: Don't drop half of the mask when converting 2-address shufps into 3-address pshufd.
It's debatable whether this transform is useful at all, but for now make sure
we don't generate invalid asm.

llvm-svn: 219084
2014-10-05 16:14:29 +00:00
Elena Demikhovsky 44bf0637d5 AVX-512-SKX: Added instruction VPMOVM2B/W/D/Q.
This instruction allows to broadacst mask vector to data vector.

llvm-svn: 219083
2014-10-05 14:11:08 +00:00
Chandler Carruth acecdc0211 [x86] Fix PR21139, one of the last remaining regressions found in the
new vector shuffle lowering.

This is loosely based on a patch by Marius Wachtler to the PR (thanks!).
I refactored it a bi to use std::count_if and a mutable array ref but
the core idea was exactly right. I also added some direct testing of
this case.

I believe PR21137 is now the only remaining regression.

llvm-svn: 219081
2014-10-05 12:07:34 +00:00
Chandler Carruth 9f4d9fa54e [x86] Teach the new vector shuffle lowering how to lower 128-bit
shuffles using AVX and AVX2 instructions. This fixes PR21138, one of the
few remaining regressions impacting benchmarks from the new vector
shuffle lowering.

You may note that it "regresses" many of the vperm2x128 test cases --
these were actually "improved" by the naive lowering that the new
shuffle lowering previously did. This regression gave me fits. I had
this patch ready-to-go about an hour after flipping the switch but
wasn't sure how to have the best of both worlds here and thought the
correct solution might be a completely different approach to lowering
these vector shuffles.

I'm now convinced this is the correct lowering and the missed
optimizations shown in vperm2x128 are actually due to missing
target-independent DAG combines. I've even written most of the needed
DAG combine and will submit it shortly, but this part is ready and
should help some real-world benchmarks out.

llvm-svn: 219079
2014-10-05 11:41:36 +00:00
Hal Finkel 04a156139e [InstCombine] Remove redundant @llvm.assume intrinsics
For any @llvm.assume intrinsic, if there is another which dominates it and uses
the same condition, then it is redundant and can be removed. While this does
not alter the semantics of the @llvm.assume intrinsics, it makes subsequent
handling more efficient (and the resulting IR easier to read).

llvm-svn: 219067
2014-10-04 21:27:06 +00:00
Chandler Carruth 808ec85ad0 [x86] Slap a triple on this test since it is poking around at the stack
and calling conventions. Otherwise its too hard to craft a usefully
generic set of assertions.

llvm-svn: 219047
2014-10-04 04:22:55 +00:00
Chandler Carruth 99627bfbff [x86] Enable the new vector shuffle lowering by default.
Update the entire regression test suite for the new shuffles. Remove
most of the old testing which was devoted to the old shuffle lowering
path and is no longer relevant really. Also remove a few other random
tests that only really exercised shuffles and only incidently or without
any interesting aspects to them.

Benchmarking that I have done shows a few small regressions with this on
LNT, zero measurable regressions on real, large applications, and for
several benchmarks where the loop vectorizer fires in the hot path it
shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy
Bridge machines. Running on AMD machines shows even more dramatic
improvements.

When using newer ISA vector extensions the gains are much more modest,
but the code is still better on the whole. There are a few regressions
being tracked (PR21137, PR21138, PR21139) but by and large this is
expected to be a win for x86 generated code performance.

It is also more correct than the code it replaces. I have fuzz tested
this extensively with ISA extensions up through AVX2 and found no
crashes or miscompiles (yet...). The old lowering had a few miscompiles
and crashers after a somewhat smaller amount of fuzz testing.

There is one significant area where the new code path lags behind and
that is in AVX-512 support. However, there was *extremely little*
support for that already and so this isn't a significant step backwards
and the new framework will probably make it easier to implement lowering
that uses the full power of AVX-512's table-based shuffle+blend (IMO).

Many thanks to Quentin, Andrea, Robert, and others for benchmarking
assistance. Thanks to Adam and others for help with AVX-512. Thanks to
Hal, Eric, and *many* others for answering my incessant questions about
how the backend actually works. =]

I will leave the old code path in the tree until the 3 PRs above are at
least resolved to folks' satisfaction. Then I will rip it (and 1000s of
lines of code) out. =] I don't expect this flag to stay around for very
long. It may not survive next week.

llvm-svn: 219046
2014-10-04 03:52:55 +00:00
Matt Arsenault c996175b57 R600/SI: Custom lower f64 -> i64 conversions
llvm-svn: 219038
2014-10-03 23:54:56 +00:00
Matt Arsenault f7c95e3eda R600: Custom lower [s|u]int_to_fp for i64 -> f64
llvm-svn: 219037
2014-10-03 23:54:41 +00:00
Matt Arsenault 6cda887776 R600/SI: Fix ftrunc f64 conformance failures.
Re-add the tests since they were deleted at some point

llvm-svn: 219036
2014-10-03 23:54:27 +00:00
Chandler Carruth f3e880697a [x86] Add a really preposterous number of patterns for matching all of
the various ways in which blends can be used to do vector element
insertion for lowering with the scalar math instruction forms that
effectively re-blend with the high elements after performing the
operation.

This then allows me to bail on the element insertion lowering path when
we have SSE4.1 and are going to be doing a normal blend, which in turn
restores the last of the blends lost from the new vector shuffle
lowering when I got it to prioritize insertion in other cases (for
example when we don't *have* a blend instruction).

Without the patterns, using blends here would have regressed
sse-scalar-fp-arith.ll *completely* with the new vector shuffle
lowering. For completeness, I've added RUN-lines with the new lowering
here. This is somewhat superfluous as I'm about to flip the default, but
hey, it shows that this actually significantly changed behavior.

The patterns I've added are just ridiculously repetative. Suggestions on
making them better very much welcome. In particular, handling the
commuted form of the v2f64 patterns is somewhat obnoxious.

llvm-svn: 219033
2014-10-03 22:43:17 +00:00
Chandler Carruth 0adda1e4d4 [x86] Adjust the patterns for lowering X86vzmovl nodes which don't
perform a load to use blendps rather than movss when it is available.

For non-loads, blendps is *much* faster. It can execute on two ports in
Sandy Bridge and Ivy Bridge, and *three* ports on Haswell. This fixes
one of the "regressions" from aggressively taking the "insertion" path
in the new vector shuffle lowering.

This does highlight one problem with blendps -- it isn't commuted as
heavily as it should be. That's future work though.

llvm-svn: 219022
2014-10-03 21:38:49 +00:00
Richard Smith 1ed4229f6f PR21145: Teach LLVM about C++14 sized deallocation functions.
C++14 adds new builtin signatures for 'operator delete'. This change allows
new/delete pairs to be removed in C++14 onwards, as they were in C++11 and
before.

llvm-svn: 219014
2014-10-03 20:17:06 +00:00
Duncan P. N. Exon Smith 176b691d32 Revert "Revert "DI: Fold constant arguments into a single MDString""
This reverts commit r218918, effectively reapplying r218914 after fixing
an Ocaml bindings test and an Asan crash.  The root cause of the latter
was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a
PR to investigate who requires the loose check (and why).

Original commit message follows.

--

This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString.  Integers are stringified and
a `\0` character is used as a separator.

Part of PR17891.

Note: I've attached my testcases upgrade scripts to the PR.  If I've
just broken your out-of-tree testcases, they might help.

llvm-svn: 219010
2014-10-03 20:01:09 +00:00
Adam Nemet ff63a2dc51 [ISel] Keep matching state consistent when folding during X86 address match
In the X86 backend, matching an address is initiated by the 'addr' complex
pattern and its friends.  During this process we may reassociate and-of-shift
into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the
shift into the scale of the address.

However as demonstrated by the testcase, this can trigger CSE of not only the
shift and the AND which the code is prepared for but also the underlying load
node.  In the testcase this node is sitting in the RecordedNode and MatchScope
data structures of the matcher and becomes a deleted node upon CSE.  Returning
from the complex pattern function, we try to access it again hitting an assert
because the node is no longer a load even though this was checked before.

Now obviously changing the DAG this late is bending the rules but I think it
makes sense somewhat.  Outside of addresses we prefer and-of-shift because it
may lead to smaller immediates (FoldMaskAndShiftToScale is an even better
example because it create a non-canonical node).  We currently don't recognize
addresses during DAGCombiner where arguably this canonicalization should be
performed.  On the other hand, having this in the matcher allows us to cover
all the cases where an address can be used in an instruction.

I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for
the new shift node in FoldMaskedShiftToScaledMask.  This RAUW is responsible
for initiating the recursive CSE on users
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it
is not strictly necessary since the shift is hooked into the visited user.  Of
course it's safer to keep the DAG consistent at all times (e.g. for accurate
number of uses, etc.).

So rather than changing the fundamentals, I've decided to continue along the
previous patches and detect the CSE.  This patch installs a very targeted
DAGUpdateListener for the duration of a complex-pattern match and updates the
matching state accordingly.  (Previous patches used HandleSDNode to detect the
CSE but that's not practical here).  The listener is only installed on X86.

I tested that there is no measurable overhead due to this while running
through the spec2k BC files with llc.  The only thing we pay for is the
creation of the listener.  The callback never ever triggers in spec2k since
this is a corner case.

Fixes rdar://problem/18206171

llvm-svn: 219009
2014-10-03 20:00:34 +00:00
Tom Stellard fae1dc8a12 R600: Align functions to 256 bytes
llvm-svn: 219002
2014-10-03 19:02:02 +00:00
Robin Morisset 028eabcdaa [Power] Delete redundant test Atomics-32.ll
The test Atomics-32.ll was both redundant (all operations are also checked by
atomics.ll at least) and not actually checking correctness (it was not using
FileCheck, just verifying that the compiler does not crash).

llvm-svn: 218997
2014-10-03 18:10:07 +00:00
Rui Ueyama 1af0865871 llvm-readobj: print out the fields of the COFF delay-import table
llvm-svn: 218996
2014-10-03 18:07:18 +00:00
Robin Morisset 9098fee690 [Power] Use lwsync for non-seq_cst fences
Summary:
hwsync is only required for seq_cst fences, acquire and release one can use
the cheaper lwsync.

Test Plan: Added some cases to atomics.ll + make check-all

Reviewers: jfb, wschmidt

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5317

llvm-svn: 218995
2014-10-03 18:04:36 +00:00
Daniel Sanders ef638fea2d [mips] Print warning when using register names not available in N32/64
Summary:
The register names t4-t7 are not available in the N32 and N64 ABIs.
This patch prints a warning, when those names are used in N32/64,
along with a fix-it with the correct register names.

Patch by Vasileios Kalintiris

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5272

llvm-svn: 218989
2014-10-03 15:37:37 +00:00
Chandler Carruth 1964078936 [x86] Teach the new vector shuffle lowering to aggressively form MOVSS
and MOVSD nodes for single element vector inserts.

This is particularly important because a number of patterns in the
backend detect these patterns and leverage them to simplify things. It
also fixes quite a few of the insertion bad code examples. However, it
regresses a specific area: when available, blendps and blendpd are
*dramatically* faster than movss and movsd respectively. But it doesn't
really work to form the blend logic first because the blends *aren't* as
crazy efficient when the data is coming from memory anyways, and thus
will have a movss or movsd regardless. Also, doing that would block
a bunch of the patterns that this is designed to hit.

So my plan is to go into the patterns for lowering MOVSS and MOVSD and
lower them via blends when available. However that's a pretty invasive
restructuring so it will need to be a follow-up patch.

I have already gone into the patterns to lower MOVSS and MOVSD from
memory using MOVLPD, etc. Without that, several of the test cases
I already have regress.

llvm-svn: 218985
2014-10-03 13:11:13 +00:00
Renato Golin 4e31ae1051 Revert 202433 - Provide a target override for the latest regalloc heuristic
That commit was introduced in order to help investigate a problem in ARM
codegen breaking from commit 202304 (Add a limit to the heuristic that register
allocates instructions in local order). Recent analisys indicated that the
problem no longer exists, so I'm reverting this change.

See PR18996.

llvm-svn: 218981
2014-10-03 12:20:53 +00:00
Chandler Carruth 7a4459fec1 [x86] Fix the RUN-lines of this test to make sense.
I got them quite wrong when updating it and had the SSE4.1 run checked
for SSE2 and the SSE2 run checked for SSE4.1. I think everything was
actually generic SSE, but this still seems good to fix. While here,
hoist the triple into the IR and make the flag set a bit more direct in
what it is trying to test.

llvm-svn: 218978
2014-10-03 11:30:02 +00:00
Chandler Carruth 971a560cb8 [x86] Significantly improve the ability of the new vector shuffle
lowering to match VZEXT_MOVL patterns.

I hadn't realized that these had sufficient pattern smarts in the
backend to lower zext-ing from the low element of a vector without it
being a scalar_to_vector node. They do, and this is how to match a bunch
of patterns for movq, movss, etc.

There is a weird propensity to end up using pshufd to place the element
afterward even though it means domain crossing (or rather, to use
xorps+movss to zext the element rather than movq) but that's an
orthogonal problem with VZEXT_MOVL that someone should probably look at.

llvm-svn: 218977
2014-10-03 11:25:58 +00:00
Chandler Carruth 080cab91e1 [x86] Add some important, missing test coverage for blending from one
vector to a zero vector for the v2 cases and fix the v4 integer cases to
actually blend from a vector.

There are already seprate tests for the case of inserting from a scalar.

These cases cover a lot of the regressions I've seen in the regression
test suite for the new vector shuffle lowering and specifically cover
the reported lack of using various zext-ing instruction patterns. My
next patch should fix a big chunk of this, but wanted to get a nice
baseline for these patterns in the test cases first.

llvm-svn: 218976
2014-10-03 11:16:45 +00:00
Chandler Carruth e91b316266 [x86] Unbreak SSE1 with the new vector shuffle lowering. We can't widen
element types to form illegal vector types.

I've added a special SSE1 test case here that makes sure we don't break
this going forward.

llvm-svn: 218974
2014-10-03 10:11:39 +00:00
Chandler Carruth ea026fbb7b [x86] Add two more triples to stabilize the precise assembly syntax
across platforms.

llvm-svn: 218973
2014-10-03 09:43:23 +00:00
Chandler Carruth b193d96bce [x86] Remove a couple of fairly pointless tests. These were merely
testing that we generated divps and divss but not in a very systematic
way. There are other tests for widening binary operations already that
make these unnecessary.

The second one seems mostly about testing Atom as well as normal X86,
but despite the comment claiming it is testing a different instruction
sequence, it then tests for exactly the same div instruction sequence!
(The sequence of instructions is actually quite different on Atom, but
not the sequence of div instructions....)

And then it has an "execution" test that simply isn't run? Very strange.
Anyways, none of this is really needed so clean this up.

llvm-svn: 218972
2014-10-03 09:43:19 +00:00
James Molloy cb7449d058 Revert r215343.
This was contentious and needs invesigation.

llvm-svn: 218971
2014-10-03 09:29:24 +00:00
Daniel Sanders 36a3e69bd4 [mips] Remove XFAIL from two XPASS'ing tests on the llvm-mips-linux builder
llvm-svn: 218967
2014-10-03 08:49:44 +00:00
Chandler Carruth 5bacb0aef0 [x86] Add another triple to a test to make the comment syntax stable.
Should fix darwin builders.

llvm-svn: 218956
2014-10-03 02:06:28 +00:00
Chandler Carruth bfde7c2da9 [x86] Add triples to these tests so that we see fewer calling convention
differences and they're a bit easier to maintain. This should fix the
tests on cygwin bots, etc.

llvm-svn: 218955
2014-10-03 02:00:09 +00:00
Chandler Carruth 3a0883f32e [x86] Regenerate precise FileCheck lines for the lats batch of test
cases.

llvm-svn: 218954
2014-10-03 01:57:38 +00:00
Chandler Carruth 9a24a33b14 [x86] Remove another low-value test still written using grep. We have
many tests for movss and friends.

llvm-svn: 218953
2014-10-03 01:57:35 +00:00
Chandler Carruth b56b18d8b3 [x86] Regenerate precise checks for a couple of test cases and remove
a test case that was just grepping the debug stats output rather than
actually checking the generated code for anything useful.

llvm-svn: 218951
2014-10-03 01:50:08 +00:00
Chandler Carruth be3a669440 [x86] Remove an over-reduced test case. This would need to be
intergrated much more fully into some logical part of the backend to
really understand what it is trying to accomplish and how to update it.
I suspect it no longer holds enough value to be worth having.

llvm-svn: 218950
2014-10-03 01:50:06 +00:00
Chandler Carruth fb924fdf38 [x86] Regenerate and clean up more tests is preparation for vector
shufle switch.

I nuked a win64 config from one test as it doesn't really make sense to
cover that ABI specially for generic v2f32 tests...

llvm-svn: 218948
2014-10-03 01:44:04 +00:00
Chandler Carruth c75abc162c [x86] Cleanup and generate precise FileCheck assertions for a bunch of
SSE tests.

llvm-svn: 218947
2014-10-03 01:37:58 +00:00
Chandler Carruth c1e2fc726c [x86] This is a terrible SSE1 test, but we should keep it. I've deleted
two functions that really didn't have any interesting assertions, and
generated more precise tests for one of the others.

llvm-svn: 218946
2014-10-03 01:37:56 +00:00
Chandler Carruth 23578e9055 [x86] Merge two very similar tests and regenerate FileCheck lines for
them.

llvm-svn: 218945
2014-10-03 01:37:53 +00:00
Lang Hames 89e9c17235 [BasicAA] Revert r218714 - Make better use of zext and sign information.
This patch broke 447.dealII on Darwin. I'm currently working on a reduced
test-case, but reverting for now to keep the bots happy.

<rdar://problem/18530107>

llvm-svn: 218944
2014-10-03 01:33:47 +00:00
Chandler Carruth b264e4a325 [x86] Regenerate a number of FileCheck assertions with my script for
test cases that will change with the new vector shuffle lowering. This
gives us a nice baseline for deltas against. I've checked and removed
the cases where there were weird register usage being pinned down, and
all of these are extremely pin-pointed tests so fully checking them
seems very appropriate.

llvm-svn: 218941
2014-10-03 01:06:32 +00:00
Chandler Carruth 9df29f3a6f [x86] Remove a couple of other overly isolated tests that are low-value
at this point. We have lots of tests of peephole optimizations with
insert and extract on vectors.

llvm-svn: 218940
2014-10-03 01:06:30 +00:00
Chandler Carruth d20c80ae79 [x86] Remove a test that provides little value. There are plenty of
tests for zext of a vector.

llvm-svn: 218939
2014-10-03 01:06:27 +00:00
Chandler Carruth 19c9f8cd50 [x86] Regenerate a bunch more avx512 test cases using my script to have
tighter, more strict FileCheck assertions. Some of these I really like
as they show case exactly what instruction sequences come out of these
microscopic functionality tests.

llvm-svn: 218936
2014-10-03 00:50:03 +00:00
Chandler Carruth 666ee3f7ab [x86] Regenerate an avx512 test with my script to provide a nice
baseline for updates from the new vector shuffle lowering.

I've inspected the results here, and I couldn't find any register
allocation decisions where there should be any realistic way to register
allocate things differently. The closest was the imul test case. If you
see something here you'd like register number variables on, just shout
and I'll add them.

llvm-svn: 218935
2014-10-03 00:44:46 +00:00
Rui Ueyama 15d993591c llvm-readobj: print COFF delay-load import table
This patch adds another iterator to access the delay-load import table
and use it from llvm-readobj.

http://reviews.llvm.org/D5594

llvm-svn: 218933
2014-10-03 00:41:58 +00:00
Chandler Carruth 3eda855c69 [x86] Remove some of the --show-mc-encoding flags from avx512 tests that
need to be updated for the new vector shuffle lowering.

After talking to Adam Nemet, Tim Northover, etc., it seems that testing
MC encodings in the same suite as the basic codegen isn't the right
approach. Instead, we're going to want dedicated MC tests for the
encodings. These encodings are starting to get in my way so I wanted to
cut them out early. The total set of instructions that should have
encoding tests added is:

  vpaddd
  vsqrtss
  vsqrtsd
  vmovlhps
  vmovhlps
  valignq
  vbroadcastss

Not too many parts of these tests were even using this. =]

llvm-svn: 218932
2014-10-03 00:36:29 +00:00
Rui Ueyama a3f58694b5 llvm-readobj: add a test for COFF import-by-ordinal symbols
llvm-svn: 218924
2014-10-02 22:40:55 +00:00
Hal Finkel fe3368cb57 [PowerPC] Modern Book-E cores support sync
Older Book-E cores, such as the PPC 440, support only msync (which has the same
encoding as sync 0), but not any of the other sync forms. Newer Book-E cores,
however, do support sync, and for performance reasons we should allow the use
of the more-general form.

This refactors msync use into its own feature group so that it applies by
default only to older Book-E cores (of the relevant cores, we only have
definitions for the PPC440/450 currently).

llvm-svn: 218923
2014-10-02 22:34:22 +00:00
Robin Morisset e1ca44bd4c [Power] Improve the expansion of atomic loads/stores
Summary:
Atomic loads and store of up to the native size (32 bits, or 64 for PPC64)
can be lowered to a simple load or store instruction (as the synchronization
is already handled by AtomicExpand, and the atomicity is guaranteed thanks to
the alignment requirements of atomic accesses). This is exactly what this patch
does. Previously, these were implemented by complex
load-linked/store-conditional loops.. an obvious performance problem.

For example, this patch turns
```
define void @store_i8_unordered(i8* %mem) {
  store atomic i8 42, i8* %mem unordered, align 1
  ret void
}
```
from
```
_store_i8_unordered:                    ; @store_i8_unordered
; BB#0:
    rlwinm r2, r3, 3, 27, 28
    li r4, 42
    xori r5, r2, 24
    rlwinm r2, r3, 0, 0, 29
    li r3, 255
    slw r4, r4, r5
    slw r3, r3, r5
    and r4, r4, r3
LBB4_1:                                 ; =>This Inner Loop Header: Depth=1
    lwarx r5, 0, r2
    andc r5, r5, r3
    or r5, r4, r5
    stwcx. r5, 0, r2
    bne cr0, LBB4_1
; BB#2:
    blr
```
into
```
_store_i8_unordered:                    ; @store_i8_unordered
; BB#0:
    li r2, 42
    stb r2, 0(r3)
    blr

```
which looks like a pretty clear win to me.

Test Plan:
fixed the tests + new test for indexed accesses + make check-all

Reviewers: jfb, wschmidt, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5587

llvm-svn: 218922
2014-10-02 22:27:07 +00:00
Juergen Ributzka 99bd3cba8b [Stackmaps] Make ithe frame-pointer required for stackmaps.
Do not eliminate the frame pointer if there is a stackmap or patchpoint in the
function. All stackmap references should be FP relative.

This fixes PR21107.

llvm-svn: 218920
2014-10-02 22:21:49 +00:00
Duncan P. N. Exon Smith 786cd049fc Revert "DI: Fold constant arguments into a single MDString"
This reverts commit r218914 while I investigate some bots.

llvm-svn: 218918
2014-10-02 22:15:31 +00:00
Rui Ueyama 861021f986 llvm-readobj: print COFF imported symbols
This patch defines a new iterator for the imported symbols.
Make a change to COFFDumper to use that iterator to print
out imported symbols and its ordinals.

llvm-svn: 218915
2014-10-02 22:05:29 +00:00
Duncan P. N. Exon Smith 571f97bd90 DI: Fold constant arguments into a single MDString
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString.  Integers are stringified and
a `\0` character is used as a separator.

Part of PR17891.

Note: I've attached my testcases upgrade scripts to the PR.  If I've
just broken your out-of-tree testcases, they might help.

llvm-svn: 218914
2014-10-02 21:56:57 +00:00
Chandler Carruth 75e182b414 [x86] Teach the new vector shuffle lowering to widen floating point
elements as well as integer elements in order to form simpler shuffle
patterns.

This is the primary reason why we were failing to match some of the
2-and-2 floating point shuffles such as PR21140. Even after fixing this
we need to support some extra patterns in the backend in order to match
the resulting X86ISD::UNPCKL nodes into the correct instructions. This
commit should fix PR21140 and includes more comprehensive testing of
insertion patterns in v4 shuffles.

Not all of the added tests are beautiful. For example, we don't have
clever instructions to insert-via-load in the integer domain. There are
also some places where we aren't sufficiently cunning with our use of
movq and movd, but that's future work.

llvm-svn: 218911
2014-10-02 21:37:14 +00:00
Sanjay Patel 13a657819b Remove unused function attribute params.
llvm-svn: 218909
2014-10-02 21:12:04 +00:00
Sanjay Patel 12d1ce5408 Optimize square root squared (PR21126).
When unsafe-fp-math is enabled, we can turn sqrt(X) * sqrt(X) into X.

This can happen in the real world when calculating x ** 3/2. This occurs
in test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c.

Differential Revision: http://reviews.llvm.org/D5584

llvm-svn: 218906
2014-10-02 21:10:54 +00:00
Chandler Carruth 41fdd61f64 [x86] Move the vperm2f128 test to be vperm2x128 and test both the
floating point and integer domains.

Merge the AVX2 test into it and add an extra RUN line. Generate clean
FileCheck statements with my script. Remove the now merged AVX2 tests.

llvm-svn: 218903
2014-10-02 20:11:11 +00:00
Rui Ueyama 1e152d5eec This patch adds a new flag "-coff-imports" to llvm-readobj.
When the flag is given, the command prints out the COFF import table.

Currently only the import table directory will be printed.
I'm going to make another patch to print out the imported symbols.

The implementation of import directory entry iterator in
COFFObjectFile.cpp was buggy. This patch fixes that too.

http://reviews.llvm.org/D5569

llvm-svn: 218891
2014-10-02 17:02:18 +00:00
Joerg Sonnenberger f148a6d498 Support padding unaligned data in .text.
llvm-svn: 218870
2014-10-02 13:41:42 +00:00
Zinovy Nis ccc3e3733b [BUG][INDVAR] Fix for PR21014: wrong SCEV operands commuting for non-commutative instructions
My commit rL216160 introduced a bug PR21014: IndVars widens code 'for (i = ; i < ...; i++) arr[ CONST - i]' into 'for (i = ; i < ...; i++) arr[ i - CONST]'
thus inverting index expression. This patch fixes it. 
Thanks to Jörg Sonnenberger for pointing.

Differential Revision: http://reviews.llvm.org/D5576

llvm-svn: 218867
2014-10-02 13:01:15 +00:00
Chandler Carruth 2570e48a30 [x86] Just delete the last combine test file.
This file isn't really doing anything useful. Many of the tests that
seem to be combined are also repeats from other test files. Many of the
other tests, despite the comment that they should be combined into
a single shuffle... well... aren't combined into a single shuffle.
=/

llvm-svn: 218862
2014-10-02 08:05:57 +00:00
Chandler Carruth 71f4187dbb [x86] Merge still more combine tests into the common file. These at
least seem *slightly* more interesting test wise, although given how
spotily we actually combine anything, I remain somewhat suspicious.

llvm-svn: 218861
2014-10-02 08:02:34 +00:00
Chandler Carruth 782b0a72ac [x86] Merge the third combining test into the generic one and add proper
checks for all the ISA variants.

If the SSE2 checks here terrify you, good. This is (in large part) the
kind of amazingly bad code that is holding LLVM back when vectorizing on
older ISAs.

At the same time, these tests seem increasingly dubious to me. There are
a very large number of tests and it isn't clear that they are
systematically covering a specific set of functionality. Anyways,
I don't want to reduce testing during the transition, I just want to
consolidate it to where it is easier to manage.

llvm-svn: 218860
2014-10-02 07:56:47 +00:00
Chandler Carruth b2941e2693 [x86] Merge the second set of vector combining tests into a common test
file.

Some of these really don't make sense to test -- we're testing for the
*lack* of combining two shuffles into one, presumably because the two
would generate better shuffles in the end. But if you look at the
generated code shown here, in many cases the generated code is, frankly,
terrible. Or we combine any two generated shuffles back into a single
instruction! I've left a FIXME to revisit these decisions.

llvm-svn: 218859
2014-10-02 07:42:58 +00:00
Chandler Carruth 2110501822 [x86] Merge the bitwise operation shuffle combining into the common test
file, adding assertions across the ISA variants for it.

llvm-svn: 218858
2014-10-02 07:30:24 +00:00
Chandler Carruth 3c7bf04c87 [x86] Update this test to run a full complement of the ISA extensions,
and use the new grouped FileCheck patterns to match them.

No interesting changes yet, but this test is now in proper form to have
the other shuffle combining tests merged into it.

llvm-svn: 218857
2014-10-02 07:22:26 +00:00
Chandler Carruth 11a5ae0880 [x86] Minimize the parameters to this test for clarity.
The test has to do with DAG combines, and so it doesn't need the new
vector shuffle lowering to be effective. Also, it has a nice in-IR
triple string which we should really be using rather than command line
flags (unless it varies form RUN-line to RUN-line). Finally, I much
prefer letting LLVM synthesize the correct datalayout string from the
triple rather than baking one in here that will just become stale.

llvm-svn: 218856
2014-10-02 07:17:15 +00:00
Chandler Carruth 7b2706726e [x86] Add a comment clarifying that this test should span all manners of
generic DAG combining of shuffles relevant to x86.

My plan is to fold a bunch of the other DAG combining test cases into
this one, while converting them to use the nice new FileCheck assertion
syntax.

llvm-svn: 218855
2014-10-02 07:13:25 +00:00
Chandler Carruth c1bb0e84bc [x86] Switch some of the new consolidated vector tests to use
a bare-metal triple and have nice BB labels, etc.

No significant change here, just tidying up to have a consistent set of
OS-agnostic vector functionality here.

llvm-svn: 218854
2014-10-02 06:52:19 +00:00
Eric Christopher 94ec17fac9 Remove test directories with no tests.
llvm-svn: 218843
2014-10-02 00:42:30 +00:00
Chandler Carruth 8a16802d46 [x86] Improve and correct how the new vector shuffle lowering was
matching and lowering 64-bit insertions.

The first problem was that we weren't looking through bitcasts to
discover that we *could* lower as insertions. Once fixed, we in turn
weren't looking through bitcasts to discover that we could fold a load
into the lowering. Once fixed, we weren't forming a SCALAR_TO_VECTOR
node around the inserted element and instead were passing a scalar to
a DAG node that expected a vector. It turns out there are some patterns
that will "lower" this into the correct asm, but the rest of the X86
backend is very unhappy with such antics.

This should fix a few more edge case regressions I've spotted going
through the regression test suite to enable the new vector shuffle
lowering.

llvm-svn: 218839
2014-10-01 23:14:28 +00:00
Sanjay Patel 9ebfbb969d Lower FNEG ( FABS (x) ) -> FNABS (x) [X86 codegen] PR20578
Negative FABS of either a scalar or vector should be handled the same way
on x86 with SSE/AVX: a single OR instruction of the FP operand with a
constant to light up the sign bit(s).

http://llvm.org/bugs/show_bug.cgi?id=20578

Differential Revision: http://reviews.llvm.org/D5201

llvm-svn: 218822
2014-10-01 21:20:06 +00:00
Chandler Carruth f795e4805e [x86] Merge the remaining test cases into vector-blend.ll and remove all
the ISA-specific test files.

llvm-svn: 218818
2014-10-01 21:07:07 +00:00
Chandler Carruth b21e033b2a [x86] Expand the ISA coverage of our blend test in preparation for
merging ISA-specific testing into this file.

llvm-svn: 218816
2014-10-01 21:03:21 +00:00
Chandler Carruth a406d6c386 [x86] Merge the interesting test cases from blend-msb.ll into
vector-blend.ll and remove the former.

llvm-svn: 218814
2014-10-01 20:56:57 +00:00
Chandler Carruth 19a5366481 [x86] Move the AVX blend test to a generic name. I'm going to fold other
blend tests into this one.

llvm-svn: 218813
2014-10-01 20:52:55 +00:00
Chandler Carruth b4b09c7a6c [x86] Remove a test that wasn't doing anything really. We have plenty of
better tests for zext of vectors at this point.

llvm-svn: 218811
2014-10-01 20:50:58 +00:00
Chandler Carruth ab5ddea2cb [x86] Add a 32-bit run to the sext test, and remove a sad vec_sext.ll
test file.

This old test had a bunch of functions that were never even checked. =/
The only thing it really did was to make sure that we did something
reasonable in 32-bit mode with SSE4.1. Adding another run line to the
main vector-sext.ll test seems a better way to do that.

llvm-svn: 218810
2014-10-01 20:49:54 +00:00
Chandler Carruth bbbdb9f0ee [x86] Teach both sext and zext vector tests to cover a nice wide range
of architectures: SSE2, SSSE3, SSE4.1, AVX, and AVX2.

Unfortunately, this exposses the absolute horror of the code we generate
for many of these patterns. Anyone wanting to familiarize themselves
with the x86 backend and improve performance could do a lot of good
sitting down and making these test cases not look so terrible. While the
new vector shuffle code I'm working on well help some, it won't fix all
of the crimes here.

llvm-svn: 218807
2014-10-01 20:41:36 +00:00
Sanjay Patel 7b2cd9ad86 Make the sqrt intrinsic return undef for a negative input.
As discussed here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140609/220598.html

And again here:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/077168.html

The sqrt of a negative number when using the llvm intrinsic is undefined. 
We should return undef rather than 0.0 to match the definition in the LLVM IR lang ref.

This change should not affect any code that isn't using "no-nans-fp-math"; 
ie, no-nans is a requirement for generating the llvm intrinsic in place of a sqrt function call.

Unfortunately, the behavior introduced by this patch will not match current gcc, xlc, icc, and 
possibly other compilers. The current clang/llvm behavior of returning 0.0 doesn't either. 
We knowingly approve of this difference with the other compilers in an attempt to flag code 
that is invoking undefined behavior.

A front-end warning should also try to convince the user that the program will fail:
http://llvm.org/bugs/show_bug.cgi?id=21093

Differential Revision: http://reviews.llvm.org/D5527

llvm-svn: 218803
2014-10-01 20:36:33 +00:00
Chandler Carruth b5c9e04b51 [x86] Sort the ISA-specific RUN lines for vector-sext.ll to go from
oldest to newest. This makes more sense to me and is more consistent
with other tests.

llvm-svn: 218802
2014-10-01 20:32:44 +00:00
Tim Northover 6a1ef73140 ARM: yes it can (as of r218789)
llvm-svn: 218801
2014-10-01 20:31:58 +00:00
Chandler Carruth c66ea0fc12 [x86] Rename avx-{s,z}ext.ll to vector-{s,z}ext.ll.
These tests are far and away the best sext and zext tests we have for
vectors. I'm going to merge the other similar tests into them and expand
the ISA coverage.

llvm-svn: 218800
2014-10-01 20:30:30 +00:00
Chandler Carruth 011088fc46 [x86] Cleanup and re-generate the checks for avx-zext.ll using the new
script.

llvm-svn: 218799
2014-10-01 20:27:16 +00:00
Chandler Carruth fbba2fa8d9 [x86] Generate the FileCheck assertions for avx-blend.ll with my new
script to make them nice and predictable. This will ease updating them
for the new vector shuffle lowering and seeing the delta if any.

llvm-svn: 218795
2014-10-01 20:19:45 +00:00
Chandler Carruth 1f569b05b6 [x86] Clean up and generate detailed FileCheck assertions for
avx-sext.ll using my new script.

Also add an AVX2 mode to this test.

Part of cleaning up the test suite before enabling the new vector
shuffle lowering. This also highlights some of the abysmal failures of
the old shuffle lowering. Check out those 'pinsrw' and 'pextrw'
sequences!

llvm-svn: 218794
2014-10-01 20:19:32 +00:00
Tim Northover 5d72c5de02 ARM: allow copying of CPSR when all else fails.
As with x86 and AArch64, certain situations can arise where we need to spill
CPSR in the middle of a calculation. These should be avoided where possible
(MRS/MSR is rather expensive), which ARM is actually better at than the other
two since it tries to Glue defs to uses, but as a last ditch effort, copying is
better than crashing.

rdar://problem/18011155

llvm-svn: 218789
2014-10-01 19:21:03 +00:00
Adrian Prantl 87b7eb9d0f Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787
2014-10-01 18:55:02 +00:00
Duncan P. N. Exon Smith 08a83be3ea LTO: Add missing target triple from r218784
llvm-svn: 218786
2014-10-01 18:49:58 +00:00
Reed Kotler b9dc248e9e Add fptrunc to mips fast-sel
Summary: Implement conversion of 64 to 32 bit floating point numbers (fptrunc) in mips fast-isel

Test Plan:
fptrunc.ll
checked also with 4 internal mips build bot flavors mip32r1/miprs32r2 and at -O0 and -O2

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: rfuhler

Differential Revision: http://reviews.llvm.org/D5553

llvm-svn: 218785
2014-10-01 18:47:02 +00:00
Duncan P. N. Exon Smith 30c9242caa LTO: Ignore disabled diagnostic remarks
r206400 and r209442 added remarks that are disabled by default.
However, if a diagnostic handler is registered, the remarks are sent
unfiltered to the handler.  This is the right behaviour for clang, since
it has its own filters.

However, the diagnostic handler exposed in the LTO API receives only the
severity and message.  It doesn't have the information to filter by pass
name.  For LTO, disabled remarks should be filtered by the producer.

I've changed `LLVMContext::setDiagnosticHandler()` to take a `bool`
argument indicating whether to respect the built-in filters.  This
defaults to `false`, so other consumers don't have a behaviour change,
but `LTOCodeGenerator::setDiagnosticHandler()` sets it to `true`.

To make this behaviour testable, I added a `-use-diagnostic-handler`
command-line option to `llvm-lto`.

This fixes PR21108.

llvm-svn: 218784
2014-10-01 18:36:03 +00:00
Adrian Prantl b458dc2eee Revert r218778 while investigating buldbot breakage.
"Move the complex address expression out of DIVariable and into an extra"

llvm-svn: 218782
2014-10-01 18:10:54 +00:00
Adrian Prantl 25a7174e7a Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

llvm-svn: 218778
2014-10-01 17:55:39 +00:00
Tom Stellard 79243d9664 R600: Call EmitFunctionHeader() in the AsmPrinter to populate the ELF symbol table
llvm-svn: 218776
2014-10-01 17:15:17 +00:00
Jingyue Wu fd47fb9976 Revert r216862 due to a performance regression
Reported by Alexey Volkov in PR21115

llvm-svn: 218771
2014-10-01 15:22:13 +00:00
Oliver Stannard d4e0a4fd2c [ARM] Allow selecting VRINT[APMXZR] and VCVT[BT] instructions for FPv5
Currently, we only codegen the VRINT[APMXZR] and VCVT[BT] instructions
when targeting ARMv8, but they are actually present on any target with
FP-ARMv8. Note that FP-ARMv8 is called FPv5 when is is part of an
M-profile core, but they have the same instructions so we model them
both as FPARMv8 in the ARM backend.

llvm-svn: 218763
2014-10-01 13:13:18 +00:00
Chandler Carruth 6c02c031b8 [x86] Fix a few more tiny patterns with the new vector shuffle lowering
that keep cropping up in the regression test suite.

This also addresses one of the issues raised on the mailing list with
failing to form 'movsd' in as many cases as we realistically should.
There will be corresponding patches forthcoming for v4f32 at least. This
was a lot of fuss for a relatively small gain, but all the fuss was on
my end trying different ways of holding the pieces of the x86 fragment
patterns *just right*. Now that it works, the code is reasonably simple.

In the new test cases I'm adding here, v2i64 sticks out as just plain
horrible. I've not come up with any great ideas here other than that it
would be nice to recognize when we're *going* to take a domain crossing
hit and cross earlier to get the decent instructions. At least with AVX
it is slightly less silly....

llvm-svn: 218756
2014-10-01 11:14:02 +00:00
Tom Coxon e493f177ee [AArch64] Allow access to all system registers with MRS/MSR instructions.
The A64 instruction set includes a generic register syntax for accessing
implementation-defined system registers. The syntax for these registers is:
    S<op0>_<op1>_<CRn>_<CRm>_<op2>

The encoding space permitted for implementation-defined system registers
is:
    op0 op1  CRn   CRm   op2
    11  xxx  1x11  xxxx  xxx

The full encoding space can now be accessed:
    op0 op1  CRn   CRm   op2
    xx  xxx  xxxx  xxxx  xxx

This is useful to anyone needing to write assembly code supporting new
system registers before the assembler has learned the official names for
them.

llvm-svn: 218753
2014-10-01 10:13:59 +00:00
Evgeniy Stepanov 815f2869ad Revert r218721, r218735.
Failing bootstrap on Linux (arm, x86).

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13139/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/470
http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/8518

llvm-svn: 218752
2014-10-01 10:07:28 +00:00
Asiri Rathnayake 530b3edab6 Add missing natual vector cast.
Summary: The natual vector cast node (similar to bitcast) AArch64ISD::NVCAST
was introduced in r217159 and r217138. This patch adds a missing cast from
v2f32 to v1i64 which is causing some compilation failures. Also added test
cases to cover various modimm types and BUILD_VECTORs with i64 elements.

llvm-svn: 218751
2014-10-01 09:59:45 +00:00
Oliver Stannard 37e4daab05 [ARM] Add support for Cortex-M7, FPv5-SP and FPv5-DP (LLVM)
The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and
FPv5-DP-D16. FPv5 has the same instructions as FP-ARMv8, so it can be
modelled using the same target feature, and all double-precision
operations are already disabled by the fp-only-sp target features.

llvm-svn: 218747
2014-10-01 09:02:17 +00:00
Daniel Sanders 92db6b78f7 [mips] Fix disassembly of [ls][wd]c[23], cache, and pref
Fixes PR21015, and PR20993.                                                       
                                                                                  
Patch by Jun Koi

llvm-svn: 218745
2014-10-01 08:26:55 +00:00
Sasa Stankovic 7072a7968f [mips] For indirect calls we don't need $gp to point to .got. Mips linker
doesn't generate lazy binding stub for a function whose address is taken in
the program.

Differential Revision: http://reviews.llvm.org/D5067

llvm-svn: 218744
2014-10-01 08:22:21 +00:00
Justin Bogner 6a107bad15 test: XFAIL the non-darwin gmlt test on darwin
r218702 disabled a -gmlt optimization for darwin, but this means the
non-darwin test isn't working there anymore.

llvm-svn: 218742
2014-10-01 05:45:45 +00:00
Chandler Carruth 26cb9b8d2d [x86] Teach the new vector shuffle lowering to be even more aggressive
in exposing the scalar value to the broadcast DAG fragment so that we
can catch even reloads and fold them into the broadcast.

This is somewhat magical I'm afraid but seems to work. It is also what
the old lowering did, and I've switched an old test to run both
lowerings demonstrating that we get the same result.

Unlike the old code, I'm not lowering f32 or f64 scalars through this
path when we only have AVX1. The target patterns include pretty heinous
code to re-cast those as shuffles when the scalar happens to not be
spilled because AVX1 provides no broadcast mechanism from registers
what-so-ever. This is terribly brittle. I'd much rather go through our
generic lowering code to get this. If needed, we can add a peephole to
get even more opportunities to broadcast-from-spill-slots that are
exposed post-RA, but my suspicion is this just doesn't matter that much.

llvm-svn: 218734
2014-10-01 03:19:43 +00:00
Chandler Carruth 846baf2ca1 [x86] Hoist the zext-lowering up in the v4i32 lowering routine -- it is
the same speed as pshufd but we can fold loads into the pmovzx
instructions.

This fixes some regressions that came up in the regression test suite
for the new vector shuffle lowering.

llvm-svn: 218733
2014-10-01 02:25:54 +00:00
David Blaikie 32b0f365a2 Implement DW_TAG_subrange_type with DW_AT_count rather than DW_AT_upper_bound
This allows proper disambiguation of unbounded arrays and arrays of zero
bound ("struct foo { int x[]; };" and "struct foo { int x[0]; }"). GCC
instead produces an upper bound of -1 in the latter situation, but count
seems tidier. This way lower_bound is provided if it's not the language
default and count is provided if the count is known, otherwise it's
omitted. Simple.

If someone wants to look at rdar://problem/12566646 and see if this
change is acceptable to that bug/fix, that might be helpful (see the
empty-and-one-elem-array.ll test case which cites that radar).

llvm-svn: 218726
2014-10-01 00:56:55 +00:00
Chandler Carruth b9d3fa1e65 [x86] Teach the new vector shuffle lowering about VBROADCAST and
VPBROADCAST.

This has the somewhat expected pervasive impact. I don't know why
I forgot about this. Everything seems good with lots of significant
improvements in the tests.

llvm-svn: 218724
2014-10-01 00:41:21 +00:00
NAKAMURA Takumi 614f1001ec llvm/test/DebugInfo/X86/gmlt.test: Get rid of %llc_dwarf. It should not be used with -mtriple.
Also, remove object-emission. test/DebugInfo/X86 doesn't require it.

llvm-svn: 218722
2014-10-01 00:29:16 +00:00
Gerolf Hoflehner 08cc4b950c [InstCombine] Optimize icmp-select-icmp
In special cases select instructions can be eliminated by
replacing them with a cheaper bitwise operation even when the
select result is used outside its home block. The instances implemented
are patterns like
    %x=icmp.eq
    %y=select %x,%r, null
    %z=icmp.eq|neq %y, null
    br %z,true, false
==> %x=icmp.ne
    %y=icmp.eq %r,null
    %z=or %x,%y
    br %z,true,false
The optimization is integrated into the instruction
combiner and performed only when all uses of the select result can
be replaced by the select operand proper. For this dominator information
is used and dominance is now a required analysis pass in the combiner.
The optimization itself is iterative. The critical step is to replace the
select result with the non-constant select operand. So the select becomes
local and the combiner iteratively works out simpler code pattern and
eventually eliminates the select.

rdar://17853760

llvm-svn: 218721
2014-10-01 00:13:22 +00:00
David Blaikie 6cca8109ab Omit DW_AT_inline under -gmlt to save a little more space.
llvm-svn: 218719
2014-09-30 23:29:16 +00:00
Hal Finkel fd86317989 [BasicAA] Make better use of zext and sign information
Two related things:

 1. Fixes a bug when calculating the offset in GetLinearExpression. The code
    previously used zext to extend the offset, so negative offsets were converted
    to large positive ones.

 2. Enhance aliasGEP to deduce that, if the difference between two GEP
    allocations is positive and all the variables that govern the offset are also
    positive (i.e. the offset is strictly after the higher base pointer), then
    locations that fit in the gap between the two base pointers are NoAlias.

Patch by Nick White!

llvm-svn: 218714
2014-09-30 22:43:40 +00:00
Jingyue Wu fc0296704c [SimplifyCFG] threshold for folding branches with common destination
Summary:
This patch adds a threshold that controls the number of bonus instructions
allowed for folding branches with common destination. The original code allows
at most one bonus instruction. With this patch, users can customize the
threshold to allow multiple bonus instructions. The default threshold is still
1, so that the code behaves the same as before when users do not specify this
threshold.

The motivation of this change is that tuning this threshold significantly (up
to 25%) improves the performance of some CUDA programs in our internal code
base. In general, branch instructions are very expensive for GPU programs.
Therefore, it is sometimes worth trading more arithmetic computation for a more
straightened control flow. Here's a reduced example:

  __global__ void foo(int a, int b, int c, int d, int e, int n,
                      const int *input, int *output) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
      sum += (((i ^ a) > b) && (((i | c ) ^ d) > e)) ? 0 : input[i];
    *output = sum;
  }

The select statement in the loop body translates to two branch instructions "if
((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination.
With the default threshold, SimplifyCFG is unable to fold them, because
computing the condition of the second branch "(i | c) ^ d > e" requires two
bonus instructions. With the threshold increased, SimplifyCFG can fold the two
branches so that the loop body contains only one branch, making the code
conceptually look like:

  sum += (((i ^ a) > b) & (((i | c ) ^ d) > e)) ? 0 : input[i];

Increasing the threshold significantly improves the performance of this
particular example. In the configuration where both conditions are guaranteed
to be true, increasing the threshold from 1 to 2 improves the performance by
18.24%. Even in the configuration where the first condition is false and the
second condition is true, which favors shortcuts, increasing the threshold from
1 to 2 still improves the performance by 4.35%.

We are still looking for a good threshold and maybe a better cost model than
just counting the number of bonus instructions. However, according to the above
numbers, we think it is at least worth adding a threshold to enable more
experiments and tuning. Let me know what you think. Thanks!

Test Plan: Added one test case to check the threshold is in effect

Reviewers: nadav, eliben, meheff, resistor, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D5529

llvm-svn: 218711
2014-09-30 22:23:38 +00:00
Chandler Carruth bebedbaf36 [x86] Add AVX1 and AVX2 testing to all of the 128-bit shuffle test
cases.

While clearly we don't need the AVX vector width, these ISA extensions
often cause us to select different instructions and we should cover them
even with the narrow vector width.

Also, while here, nuke the stress_test2 contents. There is no reason to
try to FileCheck this entire body when it is mostly a test for
successfully surviving the code generator.

llvm-svn: 218710
2014-09-30 22:16:23 +00:00
Chandler Carruth a41dceb39b [x86] Update the exact FileCheck syntax of the 256-bit and 512-bit
shuffle tests to match that used in the script I posted and now used
consistently in 128-bit tests.

Nothing interesting changing here, just using the label name as the
FileCheck label and a slightly more general comment marker consumption
strategy.

llvm-svn: 218709
2014-09-30 22:04:45 +00:00
David Blaikie 515387569a Adjust test case addition in r218702 so as not to fail when the X86 target isn't built.
llvm-svn: 218708
2014-09-30 22:02:27 +00:00
Chandler Carruth 6a62cd3538 [x86] Rework all of the 128-bit vector shuffle tests with my handy test
updating script so that they are more thorough and consistent.

Specific fixes here include:
- Actually test VEX-encoded AVX mnemonics.
- Actually use an SSE 4.1 run to test SSE 4.1 features!
- Correctly check instructions sequences from the start of the function.
- Elide the shuffle operands and comment designator in a consistent way.
- Test all of the architectures instead of just the ones I was motivated
  to manually author.

I've gone back through and fixed up any egregious issues I spotted. Let
me know if I missed something you really dislike.

One downside to this is that we're now not as diligently using FileCheck
variables for registers. I would be much more concerned with this if we
had larger register usage, but there just aren't that interesting of
register choices here and most of the registers are constrained by the
ABI. Ultimately, I don't think this is likely to be the maintenance
burden for these tests and updating them again should be staright
forward.

llvm-svn: 218707
2014-09-30 21:44:34 +00:00
David Blaikie e1c79749ca Disable the -gmlt optimization implemented in r218129 under Darwin due to issues with dsymutil.
r218129 omits DW_TAG_subprograms which have no inlined subroutines when
emitting -gmlt data. This makes -gmlt very low cost for -O0 builds.

Darwin's dsymutil reasonably considers a CU empty if it has no
subprograms (which occurs with the above optimization in -O0 programs
without any force_inline function calls) and drops the line table, CU,
and everything in this situation, making backtraces impossible.

Until dsymutil is modified to account for this, disable this
optimization on Darwin to preserve the desired functionality.
(see r218545, which should be reverted after this patch, for other
discussion/details)

Footnote:
In the long term, it doesn't look like this scheme (of simplified debug
info to describe inlining to enable backtracing) is tenable, it is far
too size inefficient for optimized code (the DW_TAG_inlined_subprograms,
even once compressed, are nearly twice as large as the line table
itself (also compressed)) and we'll be considering things like Cary's
two level line table proposal to encode all this information directly in
the line table.

llvm-svn: 218702
2014-09-30 21:28:32 +00:00
Juergen Ributzka c110c0b99a Recommit r218010 [FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ.
Note: This version fixed an issue with the TBZ/TBNZ instructions that were
generated in FastISel. The issue was that the 64bit version of TBZ (TBZX)
automagically sets the upper bit of the immediate field that is used to specify
the bit we want to test. To test for any of the lower 32bits we have to first
extract the subregister and use the 32bit version of the TBZ instruction (TBZW).

Original commit message:
Teach selectBranch to fold bit test and branch into a single instruction (TBZ or
TBNZ).

llvm-svn: 218693
2014-09-30 19:59:35 +00:00
Matt Arsenault 9706978077 R600/SI: Fix printing of clamp and omod
No tests for omod since nothing uses it yet, but
this should get rid of the remaining annoying trailing
zeros after some instructions.

llvm-svn: 218692
2014-09-30 19:49:48 +00:00
Bradley Smith 7a77075530 Extend C disassembler API to allow specifying target features
llvm-svn: 218682
2014-09-30 16:31:40 +00:00
Reed Kotler 3ebdcc9ea7 Add numeric extend, trunctate to mips fast-isel
Summary:
 Add numeric extend, trunctate to mips fast-isel

 Reactivates D4827



Test Plan:
fpext.ll
loadstoreconv.ll

Reviewers: dsanders

Subscribers: mcrosier

Differential Revision: http://reviews.llvm.org/D5251

llvm-svn: 218681
2014-09-30 16:30:13 +00:00
Alex Lorenz 597eaf2a43 Revert r218673 'llvm-cov: add test for report's function & file association.'
Test causes buildbot failures.

llvm-svn: 218676
2014-09-30 14:48:12 +00:00
Alex Lorenz a891e6d44a llvm-cov: add test for report's function & file association.
This commit adds a test which checks that the functions defined in header files will get associated with the header files rather than the source files in the reports.

Differential Revision: http://reviews.llvm.org/D5489

llvm-svn: 218673
2014-09-30 12:52:31 +00:00
Alex Lorenz cb1702d45a llvm-cov: Use the number of executed functions for the function coverage metric.
This commit fixes llvm-cov's function coverage metric by using the number of executed functions instead of the number of fully covered functions.

Differential Revision: http://reviews.llvm.org/D5196

llvm-svn: 218672
2014-09-30 12:45:13 +00:00
Lorenzo Martignoni 40d3deeb7d Introduce support for custom wrappers for vararg functions.
Differential Revision: http://reviews.llvm.org/D5412

llvm-svn: 218671
2014-09-30 12:33:16 +00:00
Robert Khasanov 28a7df0b5f [AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VCMPGT{BWDQ}.
Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com>

llvm-svn: 218670
2014-09-30 12:15:52 +00:00
Robert Khasanov 5aa4445bde [AVX512] Added intrinsics for 128- and 256-bit versions of VCMPEQ{BWDQ}
Fixed lowering of this intrinsics in case when mask is v2i1 and v4i1.
Now cmp intrinsics lower in the following way:
 (i8 (int_x86_avx512_mask_pcmpeq_q_128
             (v2i64 %a), (v2i64 %b), (i8 %mask))) ->
 (i8 (bitcast
   (v8i1 (insert_subvector undef,
           (v2i1 (and (PCMPEQM %a, %b),
                      (extract_subvector
                         (v8i1 (bitcast %mask)), 0))), 0))))

llvm-svn: 218669
2014-09-30 11:41:54 +00:00
Robert Khasanov b25e562d14 [AVX512] Added intrinsics for VPCMPEQB and VPCMPEQW.
Added new operand type for intrinsics (IIT_V64)

llvm-svn: 218668
2014-09-30 11:32:22 +00:00
Robert Khasanov a27c8e0fd9 [AVX512] Enabled intrinsics for VPCMPEQD and VPCMPEQQ.
Added CMP_MASK intrinsic type

llvm-svn: 218667
2014-09-30 11:19:50 +00:00
Chad Rosier aab5d7bd33 [IndVarSimplify] Widen loop unsigned compares.
This patch extends r217953 to handle unsigned comparison.
Phabricator revision: http://reviews.llvm.org/D5526

llvm-svn: 218659
2014-09-30 03:17:42 +00:00
Chandler Carruth aaf8e03d92 [x86] Revert r218588, r218589, and r218600. These patches were pursuing
a flawed direction and causing miscompiles. Read on for details.

Fundamentally, the premise of this patch series was to map
VECTOR_SHUFFLE DAG nodes into VSELECT DAG nodes for all blends because
we are going to *have* to lower to VSELECT nodes for some blends to
trigger the instruction selection patterns of variable blend
instructions. This doesn't actually work out so well.

In order to match performance with the existing VECTOR_SHUFFLE
lowering code, we would need to re-slice the blend in order to fit it
into either the integer or floating point blends available on the ISA.
When coming from VECTOR_SHUFFLE (or other vNi1 style VSELECT sources)
this works well because the X86 backend ensures that these types of
operands to VSELECT get sign extended into '-1' and '0' for true and
false, allowing us to re-slice the bits in whatever granularity without
changing semantics.

However, if the VSELECT condition comes from some other source, for
example code lowering vector comparisons, it will likely only have the
required bit set -- the high bit. We can't blindly slice up this style
of VSELECT. Reid found some code using Halide that triggers this and I'm
hopeful to eventually get a test case, but I don't need it to understand
why this is A Bad Idea.

There is another aspect that makes this approach flawed. When in
VECTOR_SHUFFLE form, we have very distilled information that represents
the *constant* blend mask. Converting back to a VSELECT form actually
can lose this information, and so I think now that it is better to treat
this as VECTOR_SHUFFLE until the very last moment and only use VSELECT
nodes for instruction selection purposes.

My plan is to:
1) Clean up and formalize the target pre-legalization DAG combine that
   converts a VSELECT with a constant condition operand into
   a VECTOR_SHUFFLE.
2) Remove any fancy lowering from VSELECT during *legalization* relying
   entirely on the DAG combine to catch cases where we can match to an
   immediate-controlled blend instruction.

One additional step that I'm not planning on but would be interested in
others' opinions on: we could add an X86ISD::VSELECT or X86ISD::BLENDV
which encodes a fully legalized VSELECT node. Then it would be easy to
write isel patterns only in terms of this to ensure VECTOR_SHUFFLE
legalization only ever forms the fully legalized construct and we can't
cycle between it and VSELECT combining.

llvm-svn: 218658
2014-09-30 02:52:28 +00:00
Chandler Carruth 964747adcf [x86] Add some vector-register broadcast operations to the 256-bit v4
tests which were missing them.

llvm-svn: 218657
2014-09-30 02:32:36 +00:00
Matt Arsenault 1c4571e0fd R600: Fix broken check lines, missing scalar case.
llvm-svn: 218655
2014-09-30 01:05:29 +00:00
Juergen Ributzka 6ac12439d0 [FastISel][AArch64] Fold sign-/zero-extends into the load instruction.
The sign-/zero-extension of the loaded value can be performed by the memory
instruction for free. If the result of the load has only one use and the use is
a sign-/zero-extend, then we emit the proper load instruction. The extend is
only a register copy and will be optimized away later on.

Other instructions that consume the sign-/zero-extended value are also made
aware of this fact, so they don't fold the extend too.

This fixes rdar://problem/18495928.

llvm-svn: 218653
2014-09-30 00:49:58 +00:00
Hans Wennborg f26bfc1671 WinCOFFObjectWriter: optimize the string table for common suffices
This is a follow-up from r207670 which did the same for ELF.

Differential Revision: http://reviews.llvm.org/D5530

llvm-svn: 218636
2014-09-29 22:43:20 +00:00
Eric Christopher 6a0551e43a Add soft-float to the key for the subtarget lookup in the TargetMachine
map, this makes sure that we can compile the same code for two different
ABIs (hard and soft float) in the same module.

Update one testcase accordingly (and fix some confusing naming) and
add a new testcase as well with the ordering swapped which would
highlight the problem.

llvm-svn: 218632
2014-09-29 21:57:54 +00:00
Matt Arsenault 3d4233fe48 R600/SI: Also fix fsub + fadd a, a to mad combines
llvm-svn: 218609
2014-09-29 14:59:38 +00:00
Matt Arsenault 02cb0ff7db R600/SI: Fix using mad with multiplies by 2
These turn into fadds, so combine them into the target
mad node.

fadd (fadd (a, a), b) -> mad 2.0, a, b

llvm-svn: 218608
2014-09-29 14:59:34 +00:00
Chad Rosier 70d54ac848 [AArch64] Improve cost model to handle sdiv by a pow-of-two.
This patch improves the target-specific cost model to better handle signed
division by a power of two. The immediate result is that this enables the SLP
vectorizer to do a better job.

http://reviews.llvm.org/D5469
PR20714

llvm-svn: 218607
2014-09-29 13:59:31 +00:00
Kevin Qin fc02e3c363 Use a loop to simplify the runtime unrolling prologue.
Runtime unrolling will create a prologue to execute the extra
iterations which is can't divided by the unroll factor. It
generates an if-then-else sequence to jump into a factor -1
times unrolled loop body, like

    extraiters = tripcount % loopfactor
    if (extraiters == 0) jump Loop:
    if (extraiters == loopfactor) jump L1
    if (extraiters == loopfactor-1) jump L2
    ...
    L1:  LoopBody;
    L2:  LoopBody;
    ...
    if tripcount < loopfactor jump End
    Loop:
    ...
    End:

It means if the unroll factor is 4, the loop body will be 7
times unrolled, 3 are in loop prologue, and 4 are in the loop.
This commit is to use a loop to execute the extra iterations
in prologue, like

        extraiters = tripcount % loopfactor
        if (extraiters == 0) jump Loop:
        else jump Prol
 Prol:  LoopBody;
        extraiters -= 1                 // Omitted if unroll factor is 2.
        if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2.
        if (tripcount < loopfactor) jump End
 Loop:
 ...
 End:

Then when unroll factor is 4, the loop body will be copied by
only 5 times, 1 in the prologue loop, 4 in the original loop.
And if the unroll factor is 2, new loop won't be created, just
as the original solution.

llvm-svn: 218604
2014-09-29 11:15:00 +00:00
Oliver Stannard a4eba5ad70 [Thumb2] ldrexd and strexd are not defined on v7M
The Thumb2 ldrexd and strexd instructions are not defined for
M-class architectures.

llvm-svn: 218603
2014-09-29 10:57:29 +00:00
Chandler Carruth 6cbf43167b [x86] Make the new vector shuffle lowering lower blends as VSELECT
nodes, and rely exclusively on its logic. This removes a ton of
duplication from the blend lowering and centralizes it in one place.

One downside is that it requires a bunch of hacks to make this work with
the current legalization framework. We have to manually speculate one
aspect of legalizing VSELECT nodes to get everything to work nicely
because the existing legalization framework isn't *actually* bottom-up.

The other grossness is that we somewhat duplicate the analysis of
constant blends. I'm on the fence here. If reviewers thing this would
look better with VSELECT when it has constant operands dumping over tho
VECTOR_SHUFFLE, we could go that way. But it would be a substantial
change because currently all of the actual blend instructions are
matched via patterns in the TD files based around VSELECT nodes (despite
them not being perfect fits for that). Suggestions welcome, but at least
this removes the rampant duplication in the backend.

llvm-svn: 218600
2014-09-29 09:57:07 +00:00
Chandler Carruth b1cc7a8542 [x86] Delete a bunch of really bad and totally unnecessary code in the
X86 target-specific DAG combining that tried to convert VSELECT nodes
into VECTOR_SHUFFLE nodes that it "knew" would lower into
immediate-controlled blend nodes.

Turns out, we have perfectly good lowering of all these VSELECT nodes,
and indeed that lowering already knows how to handle lowering through
BLENDI to immediate-controlled blend nodes. The code just wasn't getting
used much because this thing forced the world to go through the vector
shuffle lowering. Yuck.

This also exposes that I was too aggressive in avoiding domain crossing
in v218588 with that lowering -- when the other option is to expand into
two 128-bit vectors, it is worth domain crossing. Restore that behavior
now that we have nice tests covering it.

The test updates here fall into two camps. One is where previously we
ended up with an unsigned encoding of the blend operand and now we get
a signed encoding. In most of those places there were elaborate comments
explaining exactly what these operands really mean. Rather than that,
just switch these tests to use the nicely decoded comments that make it
obvious that the final shuffle matches.

The other updates are just removing pointless domain crossing by
blending integers with PBLENDW rather than BLENDPS.

llvm-svn: 218589
2014-09-29 02:01:20 +00:00
Chandler Carruth c7129276cd [x86] Add the dispatch skeleton to the new vector shuffle lowering for
AVX-512.

There is no interesting logic yet. Everything ends up eventually
delegating to the generic code to split the vector and shuffle the
halves. Interestingly, that logic does a significantly better job of
lowering all of these types than the generic vector expansion code does.
Mostly, it lets most of the cases fall back to nice AVX2 code rather
than all the way back to SSE code paths.

Step 2 of basic AVX-512 support in the new vector shuffle lowering. Next
up will be to incrementally add direct support for the basic instruction
set to each type (adding tests first).

llvm-svn: 218585
2014-09-29 00:37:27 +00:00
Chandler Carruth 24e3b69cbd [x86] Teach the new vector shuffle lowering to fall back on AVX-512
vectors.

Someone will need to build the AVX512 lowering, which should follow
AVX1 and AVX2 *very* closely for AVX512F and AVX512BW resp. I've added
a dummy test which is a port of the v8f32 and v8i32 tests from AVX and
AVX2 to v8f64 and v8i64 tests for AVX512F and AVX512BW. Hopefully this
is enough information for someone to implement proper lowering here. If
not, I'll be happy to help, but right now the AVX-512 support isn't
a priority for me.

llvm-svn: 218583
2014-09-28 23:53:10 +00:00
Chandler Carruth abe742e8fb [x86] Fix the new vector shuffle lowering's use of VSELECT for AVX2
lowerings.

This was hopelessly broken. First, the x86 backend wants '-1' to be the
element value representing true in a boolean vector, and second the
operand order for VSELECT is backwards from the actual x86 instructions.
To make matters worse, the backend is just using '-1' as the true value
to get the high bit to be set. It doesn't actually symbolically map the
'-1' to anything. But on x86 this isn't quite how it works: there *only*
the high bit is relevant. As a consequence weird non-'-1' values like
0x80 actually "work" once you flip the operands to be backwards.

Anyways, thanks to Hal for helping me sort out what these *should* be.

llvm-svn: 218582
2014-09-28 23:23:55 +00:00
Chandler Carruth 6578f9208b [x86] Fix a really silly bug that I introduced fixing another bug in the
new vector shuffle target DAG combines -- it helps to actually test for
the value you want rather than just using an integer in a boolean
context.

Have I mentioned that I loathe implicit conversions recently? :: sigh ::

llvm-svn: 218576
2014-09-28 06:11:04 +00:00
Chandler Carruth b10c6b8e9e [x86] Fix yet another bug in the new vector shuffle lowering's handling
of widening masks.

We can't widen a zeroing mask unless both elements that would be merged
are either zeroed or undef. This is the only way to widen a mask if it
has a zeroed element.

Also clean up the code here by ordering the checks in a more logical way
and by using the symoblic values for undef and zero. I'm actually torn
on using the symbolic values because the existing code is littered with
the assumption that -1 is undef, and moreover that entries '< 0' are the
special entries. While that works with the values given to these
constants, using the symbolic constants actually makes it a bit more
opaque why this is the case.

llvm-svn: 218575
2014-09-28 03:30:25 +00:00
James Molloy 463db9a77c [AArch64] Redundant store instructions should be removed as dead code
If there is a store followed by a store with the same value to the same location, then the store is dead/noop. It can be removed.

This problem is found in spec2006-197.parser.

For example,
  stur    w10, [x11, #-4]
  stur    w10, [x11, #-4]
Then one of the two stur instructions can be removed.

Patch by David Xu!

llvm-svn: 218569
2014-09-27 17:02:54 +00:00
Craig Topper 5ed88de99b Update test case to match minor formatting change introduced in r218563.
llvm-svn: 218564
2014-09-27 05:36:53 +00:00
Chandler Carruth 4d03be1717 [x86] Fix terrible bugs everywhere in the new vector shuffle lowering
and in the target shuffle combining when trying to widen vector
elements.

Previously only one of these was correct, and we didn't correctly
propagate zeroing target shuffle masks (which have a different sentinel
value from undef in non- target shuffle masks now). This isn't just
a missed optimization, this caused us to drop zeroing shuffles on the
floor and miscompile code. The added test case is one example of that.

There are other fixes to the test suite as a consequence of this as well
as restoring the undef elements in some of the masks that were lost when
I brought sanity to the actual *value* of the undef and zero sentinels.

I've also just cleaned up some of the PSHUFD and PSHUFLW and PSHUFHW
combining code, but that code really needs to go. It was a nice initial
attempt, but it isn't very principled and the recursive shuffle combiner
is much more powerful.

llvm-svn: 218562
2014-09-27 04:42:44 +00:00
Chandler Carruth 81e6b29f03 [x86] Flip the sentinel values used in the target shuffle mask decoding
to significantly more sane sentinels. Notably, everywhere else in the
backend's representation of shuffles uses '-1' to represent undef. The
target shuffle masks really shouldn't diverge from that, especially as
in a few places they are manipulated by shared code.

This causes us to lose some undef lanes in various test masks. I want to
get these back, but technically it isn't invalid and there are a *lot*
of bugs here so I want to try to establish a saner baseline for fixing
some of the bugs by aligning the specific senitnel values used.

llvm-svn: 218561
2014-09-27 04:42:39 +00:00
Craig Topper 5996da2032 Fix TableGen -gen-disassembler output for bit fields with an offset.
This fixes bit assignments like this
Inst{7-0} = Foo{9-2}

Patch by Steve King.

llvm-svn: 218560
2014-09-27 04:38:02 +00:00
Sanjay Patel bdf1e38856 Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2).
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

And:

z = y / x

into:

z = y * rcpe(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .

There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.

Differential Revision: http://reviews.llvm.org/D5484

llvm-svn: 218553
2014-09-26 23:01:47 +00:00
David Majnemer dac39857d6 Object: BSS/virtual sections don't have contents
Users of getSectionContents shouldn't try to pass in BSS or virtual
sections.  In all instances, this is a bug in the code calling this
routine.

N.B. Some COFF implementations (like CL) will mark their BSS sections as
taking space on disk.  This would confuse COFFObjectFile into thinking
the section is larger than the file.

llvm-svn: 218549
2014-09-26 22:32:16 +00:00
Kevin Enderby 8597488e5e Update llvm-objdump’s Mach-O symbolizer code to print the name of symbol stubs.
So in fully linked images when a call is made through a stub it now gets a
comment like the following in the disassembly:

    callq	0x100000f6c             ## symbol stub for: _printf

indicating the call is to a symbol stub and which symbol it is for.  This is
done for branch reference types and seeing if the branch target is in a stub
section and if so using the indirect symbol table entry for that stub and
using that symbol table entries symbol name.

llvm-svn: 218546
2014-09-26 22:20:44 +00:00
Chandler Carruth f572f3b2c0 [x86] Fix a moderately terrifying bug in the new 128-bit shuffle logic
that managed to elude all of my fuzz testing historically. =/

Something changed to allow this code path to actually be exercised and
it was doing bad things. It is especially heavily exercised by the
patterns that emerge when doing AVX shuffles that end up lowered through
the 128-bit code path.

llvm-svn: 218540
2014-09-26 20:41:45 +00:00
Chad Rosier 7b974b73ae [IndVar] Don't widen loop compare unless IV user is sign extended.
PR21030

llvm-svn: 218539
2014-09-26 20:05:35 +00:00
Matt Arsenault ed8a3e0a08 R600/SI: Add strict check lines to div_scale tests.
This has weird operand requirements so it's worthwhile
to have very strict checks for its operands.

Add different combinations of SGPR operands.

llvm-svn: 218535
2014-09-26 17:55:11 +00:00
Matt Arsenault 6a0919fb9b R600/SI Allow same SGPR to be used for multiple operands
Instead of moving the first SGPR that is different than the first,
legalize the operand that requires the fewest moves if one
SGPR is used for multiple operands.

This saves extra moves and is also required for some instructions
which require that the same operand be used for multiple operands.

llvm-svn: 218532
2014-09-26 17:55:03 +00:00
Matt Arsenault cb0ac3d1fb R600/SI: Partially move operand legalization to post-isel hook.
Disable the SGPR usage restriction parts of the DAG legalizeOperands.
It now should only be doing immediate folding until it can be replaced
later. The real legalization work is now done by the other
SIInstrInfo::legalizeOperands

llvm-svn: 218531
2014-09-26 17:54:59 +00:00
Matt Arsenault 5885bef6cf R600/SI: Don't move operands that are required to be SGPRs
e.g. v_cndmask_b32 requires the condition operand be an SGPR.
If one of the source operands were an SGPR, that would be considered
the one SGPR use and the condition operand would be illegally moved.

llvm-svn: 218529
2014-09-26 17:54:52 +00:00
Matt Arsenault aff65fbca5 R600/SI: Fix using wrong operand indices when commuting
No test since the current SIISelLowering::legalizeOperands
effectively hides this, and the general uses seem to only fire
on SALU instructions which don't have modifiers between
the operands.

When trying to use legalizeOperands immediately after
instruction selection, it now sees a lot more patterns
it did not see before which break on this.

llvm-svn: 218527
2014-09-26 17:54:43 +00:00
David Peixotto 472b05b36c Ignore annotation function calls in cost computation
The annotation instructions are dropped during codegen and have no
impact on size.  In some cases, the annotations were preventing the
unroller from unrolling a loop because the annotation calls were
pushing the cost over the unrolling threshold.

Differential Revision: http://reviews.llvm.org/D5335

llvm-svn: 218525
2014-09-26 17:48:40 +00:00
Chandler Carruth 0c9ee10d01 [x86] In the new vector shuffle lowering, when trying to do another
layer of tie-breaking sorting, it really helps to check that you're in
a tie first. =] Otherwise the whole thing cycles infinitely. Test case
added, another one found through fuzz testing.

llvm-svn: 218523
2014-09-26 17:24:26 +00:00
Chandler Carruth 5afd4c2603 [x86] Fix a large collection of bugs that crept in as I fleshed out the
AVX support.

New test cases included. Note that none of the existing test cases
covered these buggy code paths. =/ Also, it is clear from this that
SHUFPS and SHUFPD are the most bug prone shuffle instructions in x86. =[

These were all detected by fuzz-testing. (I <3 fuzz testing.)

llvm-svn: 218522
2014-09-26 17:11:02 +00:00
Renato Golin 36c626e33f Elide repeated register operand in Thumb1 instructions
This patch makes the ARM backend transform 3 operand instructions such as
'adds/subs' to the 2 operand version of the same instruction if the first
two register operands are the same.

Example: 'adds r0, r0, #1' will is transformed to 'adds r0, #1'.

Currently for some instructions such as 'adds' if you try to assemble
'adds r0, r0, #8' for thumb v6m the assembler would throw an error message
because the immediate cannot be encoded using 3 bits.

The backend should be smart enough to transform the instruction to
'adds r0, #8', which allows for larger immediate constants.

Patch by Ranjeet Singh.

llvm-svn: 218521
2014-09-26 16:14:29 +00:00
Robert Khasanov 6d62c0202b [AVX512] Added load/store from BW/VL subsets to Register2Memory opcode tables.
Added lowering tests for these instructions.

llvm-svn: 218508
2014-09-26 09:48:50 +00:00
David Majnemer 56167c3e95 llvm-vtabledump: strip trailing NUL bytes
llvm-svn: 218502
2014-09-26 05:50:45 +00:00
David Majnemer 1ac52ebfe2 llvm-vtabledump: Dump RTTI structures for the MS ABI
llvm-svn: 218498
2014-09-26 04:21:51 +00:00
David Xu beff8bf746 Revert patch of r218493, delete the test case
llvm-svn: 218495
2014-09-26 02:40:54 +00:00
David Xu 64f661ee0b Redundant store instructions should be removed as dead code
llvm-svn: 218493
2014-09-26 02:02:09 +00:00
Eric Christopher a9353d1798 Add the first backend support for on demand subtarget creation
based on the Function. This is currently used to implement
mips16 support in the mips backend via the existing module
pass resetting the subtarget.

Things to note:

a) This involved running resetTargetOptions before creating a
new subtarget so that code generation options like soft-float
could be recognized when creating the new subtarget. This is
to deal with initialization code in isel lowering that only
paid attention to the initial value.

b) Many of the existing testcases weren't using the soft-float
feature correctly. I've corrected these based on the check
values assuming that was the desired behavior.

c) The mips port now pays attention to the target-cpu and
target-features strings when generating code for a particular
function. I've removed these from one function where the
requested cpu and features didn't match the check lines in
the testcase.

llvm-svn: 218492
2014-09-26 01:44:08 +00:00
Matt Arsenault 0c652c3fbc R600: Avoid repeated check lines
llvm-svn: 218487
2014-09-26 01:12:36 +00:00
Matt Arsenault 3a99759498 R600/SI: Fix emitting trailing whitespace after s_waitcnt
llvm-svn: 218486
2014-09-26 01:09:46 +00:00
Adam Nemet 8d5354eaa2 [AVX512] Make vextract*x4/vinsert*x4 tests check for the index as well
Extend test so that it provides coverage for the next commit.

llvm-svn: 218479
2014-09-25 23:48:47 +00:00
Matt Arsenault 42d1565844 R600: Fix some missing conversion testcases
llvm-svn: 218474
2014-09-25 23:16:18 +00:00
Matt Arsenault c16fafb24d Remove duplicated RUN lines in middle of test
llvm-svn: 218473
2014-09-25 23:16:14 +00:00
Bruno Cardoso Lopes d04f7596e7 [MachineSink+PGO] Teach MachineSink to use BlockFrequencyInfo
Machine Sink uses loop depth information to select between successors BBs to
sink machine instructions into, where BBs within smaller loop depths are
preferable.  This patch adds support for choosing between successors by using
profile information from BlockFrequencyInfo instead, whenever the information
is available.

Tested it under SPEC2006 train (average of 30 runs for each program); ~1.5%
execution speedup in average on x86-64 darwin.

<rdar://problem/18021659>

llvm-svn: 218472
2014-09-25 23:14:26 +00:00
Tom Stellard 7980fc8562 R600/SI: Add support for global atomic add
llvm-svn: 218457
2014-09-25 18:30:26 +00:00
Robin Morisset 810739d174 Lower idempotent RMWs to fence+load
Summary:
I originally tried doing this specifically for X86 in the backend in D5091,
but it was rather brittle and generally running too late to be general.
Furthermore, other targets may want to implement similar optimizations.
So I reimplemented it at the IR-level, fitting it into AtomicExpandPass
as it interacts with that pass (which could not be cleanly done before
at the backend level).

This optimization relies on a new target hook, which is only used by X86
for now, as the correctness of the optimization on other targets remains
an open question. If it is found correct on other targets, it should be
trivial to enable for them.

Details of the optimization are discussed in D5091.

Test Plan: make check-all + a new test

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5422

llvm-svn: 218455
2014-09-25 17:27:43 +00:00
Sid Manning 31f7125562 Add missing attributes !cmp.[eq,gt,gtu] instructions.
These instructions do not indicate they are extendable or the
number of bits in the extendable operand.  Rename to match
architected names.  Add a testcase for the intrinsics.

llvm-svn: 218453
2014-09-25 13:09:54 +00:00
Daniel Sanders ae275e38a2 [mips] Add CCValAssign::[ASZ]ExtUpper and CCPromoteToUpperBitsInType and handle struct's correctly on big-endian N32/N64 return values.
Summary:
The N32/N64 ABI's require that structs passed in registers are laid out
such that spilling the register with 'sd' places the struct at the lowest
address. For little endian this is trivial but for big-endian it requires
that structs are shifted into the upper bits of the register.

We also require that structs passed in registers have the 'inreg'
attribute for big-endian N32/N64 to work correctly. This is because the
tablegen-erated calling convention implementation only has access to the
lowered form of struct arguments (one or more integers of up to 64-bits
each) and is unable to determine the original type.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5286

llvm-svn: 218451
2014-09-25 12:15:05 +00:00
Renato Golin f5dd1dacb6 Add aliases for VAND imm to VBIC ~imm
On ARM NEON, VAND with immediate (16/32 bits) is an alias to VBIC ~imm with
the same type size. Adding that logic to the parser, and generating VBIC
instructions from VAND asm files.

This patch also fixes the validation routines for NEON splat immediates which
were wrong.

Fixes PR20702.

llvm-svn: 218450
2014-09-25 11:31:24 +00:00
Chandler Carruth 0a6e961efd [x86] Teach the new vector shuffle lowering to use AVX2 instructions for
v4f64 and v8f32 shuffles when they are lane-crossing. We have fully
general lane-crossing permutation functions in AVX2 that make this easy.

Part of this also changes exactly when and how these vectors are split
up when we don't have AVX2. This isn't always a win but it usually is
a win, so on the balance I think its better. The primary regressions are
all things that just need to be fixed anyways such as modeling when
a blend can be completely accomplished via VINSERTF128, etc.

Also, this highlights one of the few remaining big features: we do
a really poor job of inserting elements into AVX registers efficiently.

This completes almost all of the big tricks I have in mind for AVX2. The
only things left that I plan to add:

1) element insertion smarts
2) palignr and other fairly specialized lowerings when they happen to
   apply

llvm-svn: 218449
2014-09-25 11:03:55 +00:00
Chandler Carruth e91d68c475 [x86] Teach the new vector shuffle lowering a fancier way to lower
256-bit vectors with lane-crossing.

Rather than immediately decomposing to 128-bit vectors, try flipping the
256-bit vector lanes, shuffling them and blending them together. This
reduces our worst case shuffle by a pretty significant margin across the
board.

llvm-svn: 218446
2014-09-25 10:21:15 +00:00
Oliver Stannard 3256b26ef2 [Thumb2] BXJ should be undefined for v7M, v8A
The Thumb2 BXJ instruction (Branch and Exchange Jazelle) is not
defined for v7M or v8A. It is defined for all other Thumb2-supporting
architectures (v6T2, v7A and v7R).

llvm-svn: 218445
2014-09-25 10:02:05 +00:00
Chandler Carruth 02387122e0 [x86] Fix an oversight in the v8i32 path of the new vector shuffle
lowering where it only used the mask of the low 128-bit lane rather than
the entire mask.

This allows the new lowering to correctly match the unpack patterns for
v8i32 vectors.

For reference, the reason that we check for the the entire mask rather
than checking the repeated mask is because the repeated masks don't
abide by all of the invariants of normal masks. As a consequence, it is
safer to use the full mask with functions like the generic equivalence
test.

llvm-svn: 218442
2014-09-25 04:10:27 +00:00
Chandler Carruth d8f528adb8 [x86] Implement AVX2 support for v32i8 in the new vector shuffle
lowering.

This completes the basic AVX2 feature support, but there are still some
improvements I'd like to do to really get the last mile of performance
here.

llvm-svn: 218440
2014-09-25 02:52:12 +00:00
Chandler Carruth 397d12c4b4 [x86] More tweaks to the v32i8 test cases.
I made a mistake in the previous commit and produced the wrong pattern.
Fix that. Also make one more shuffle pattern byte-based rather than
word-based, and add two more blend patterns.

llvm-svn: 218439
2014-09-25 02:44:39 +00:00
Chandler Carruth a03011ffae [x86] Re-work a bunch of the v32i8 test cases to actually involve byte
shuffles rather than word shuffles.

As you might guess, these were built starting from the word shuffle test
cases and I failed to properly port a bunch of them and left them as
widened word shuffle test cases. We still have a couple of tests that
check our ability to widen shuffles, but now we will test the actual
byte shuffle quite a bit better.

llvm-svn: 218438
2014-09-25 02:20:02 +00:00
Reid Kleckner 81782f0cb8 MC: Use @IMGREL instead of @IMGREL32, which we can't parse
Nico Rieck added support for this 32-bit COFF relocation some time ago
for Win64 stuff. It appears that as an oversight, the assembly output
used "foo"@IMGREL32 instead of "foo"@IMGREL, which is what we can parse.

Sadly, there were actually tests that took in IMGREL and put out
IMGREL32, and we didn't notice the inconsistency. Oh well. Now LLVM can
assemble it's own output with slightly more fidelity.

llvm-svn: 218437
2014-09-25 02:09:18 +00:00
Chandler Carruth a577bc26b6 [x86] Fix the v16i16 blend logic I added in the prior commit and add the
missing test cases for it.

Unsurprisingly, without test cases, there were bugs here. Surprisingly,
this bug wasn't caught at compile time. Yep, there is an X86ISD::BLENDV.
It isn't wired to anything. Oops. I'll fix than next.

llvm-svn: 218434
2014-09-25 01:13:38 +00:00
Justin Bogner b35a72ae9e llvm-cov: Combine segments that cover the same location
If we have multiple coverage counts for the same segment, we need to
add them up rather than arbitrarily choosing one. This fixes that and
adds a test with template instantiations to exercise it.

llvm-svn: 218432
2014-09-25 00:34:18 +00:00
Akira Hatanaka 8cc48bd159 [X86,AVX] Add an isel pattern for X86VBroadcast.
This fixes PR21050 and rdar://problem/18434607.

llvm-svn: 218431
2014-09-25 00:26:15 +00:00
Chandler Carruth 98443d89b9 [x86] Implement v16i16 support with AVX2 in the new vector shuffle
lowering.

This also implements the fancy blend lowering for v16i16 using AVX2 and
teaches the X86 backend to print shuffle masks for 256-bit PSHUFB
and PBLENDW instructions. It also makes the mask decoding correct for
PBLENDW instructions. The yaks, they are legion.

Tests are updated accordingly. There are some missing tests for the
VBLENDVB lowering, but I'll add those in a follow-up as this commit has
accumulated enough cruft already.

llvm-svn: 218430
2014-09-25 00:24:19 +00:00
Kevin Enderby bf246f5a9d Flush out enough of llvm-objdump’s SymbolizerSymbolLookUp() for Mach-O files to
get the literal string “Hello world” printed as a comment on the instruction
that loads the pointer to it. For now this is just for x86_64. So for object
files with relocation entries it produces things like:

	leaq	L_.str(%rip), %rax      ## literal pool for: "Hello world\n"

and similar for fully linked images like executables:

	leaq	0x4f(%rip), %rax        ## literal pool for: "Hello world\n"

Also to allow testing against darwin’s otool(1), I hooked up the existing 
-no-show-raw-insn option to the Mach-O parser code, added the new Mach-O
only -full-leading-addr option to match otool(1)'s printing of addresses and
also added the new -print-imm-hex option.

llvm-svn: 218423
2014-09-24 23:08:22 +00:00
Kostya Serebryany 34ddf8725c [asan] don't instrument module CTORs that may be run before asan.module_ctor. This fixes asan running together -coverage
llvm-svn: 218421
2014-09-24 22:41:55 +00:00
Renato Golin 9c4a6d87ec Removing empty ARM tests from failed revert
llvm-svn: 218419
2014-09-24 21:58:04 +00:00
Renato Golin a86bbc37f2 Removing empty tests from failed revert
llvm-svn: 218417
2014-09-24 21:45:26 +00:00
Renato Golin 4b5f91f513 Revert 218406 - Refactor the RelocVisitor::visit method
llvm-svn: 218416
2014-09-24 21:30:43 +00:00
Renato Golin ba89f068bf Revert 218407 - Add support for ARM and AArch64 BE object files
llvm-svn: 218415
2014-09-24 21:30:14 +00:00
Renato Golin d35e6f6aee Revert 218408 - Report endianness in output of {dwarf, obj}dump
llvm-svn: 218414
2014-09-24 21:29:45 +00:00
Renato Golin 2328747ede Revert 218411 - XFAIL reloc test on x86/hexagon
llvm-svn: 218413
2014-09-24 21:28:53 +00:00
Renato Golin 7aa836043f XFAIL reloc test on x86/hexagon
llvm-svn: 218411
2014-09-24 21:00:30 +00:00
Renato Golin 6f92c6b982 Report endianness in output of {dwarf, obj}dump
For biendian targets like ARM and AArch64, it is useful to have the
output of the llvm-dwarfdump and llvm-objdump report the endianness
used when the object files were generated.

Patch by Charlie Turner.

llvm-svn: 218408
2014-09-24 20:07:41 +00:00
Renato Golin ed654f5852 Add support for ARM and AArch64 BE object files
This change fixes the ARM and AArch64 relocation visitors in
RelocVisitor.  They were unconditionally assuming the object data are
little-endian.  Tests have been added to ensure that the
llvm-dwarfdump utility does not crash when processing big-endian
object files.

Patch by Charlie Turner.

llvm-svn: 218407
2014-09-24 20:07:30 +00:00
Renato Golin 2b25450061 Refactor the RelocVisitor::visit method
This change replaces the brittle if/else chain of string comparisons
with a switch statement on the detected target triple, removing the
need for testing arbitrary architecture names returned from
getFileFormatName, whose primary purpose seems to be for display
(user-interface) purposes. The visitor now takes a reference to the
object file, rather than its arbitrary file format name to figure out
whether the file is a 32 or 64-bit object file and what the detected
target triple is.

A set of tests have been added to help show that the refactoring processes
relocations for the same targets as the original code.

Patch by Charlie Turner.

llvm-svn: 218406
2014-09-24 20:07:22 +00:00
Scott Douglass ae671341c4 pass environment when invoking llvm-config from lit.cfg
Use the same environment when invoking llvm-config from lit.cfg as
will be used when running tests, so that ASAN_OPTIONS, INCLUDE, etc.
are present.

llvm-svn: 218403
2014-09-24 18:37:48 +00:00
Kaelyn Takata c4067328cf Revert "Add support for ARM and AArch64 BE object files"
This reverts commit r218389 as it depends on r218388.

llvm-svn: 218398
2014-09-24 18:00:20 +00:00
Kaelyn Takata e43d88e3f5 Revert "Report endianness in output of {dwarf, obj}dump"
This reverts commit r218391 as it depends on r218388 and r218389

llvm-svn: 218397
2014-09-24 18:00:17 +00:00
Kaelyn Takata f2fce14920 Revert "Refactor the RelocVisitor::visit method"
This reverts commit faac033f7364bb4226e22c8079c221c96af10d02.

The test depends on all targets to be enabled in llc in order to pass,
and needs to be rewritten/refactored to not have that dependency.

llvm-svn: 218393
2014-09-24 17:49:07 +00:00
Renato Golin 4edda28b8a Report endianness in output of {dwarf, obj}dump
For biendian targets like ARM and AArch64, it is useful to have the
output of the llvm-dwarfdump and llvm-objdump report the endianness
used when the object files were generated.

Patch by Charlie Turner.

llvm-svn: 218391
2014-09-24 17:01:33 +00:00
Renato Golin 0e92815e94 Add support for ARM and AArch64 BE object files
This change fixes the ARM and AArch64 relocation visitors in
RelocVisitor.  They were unconditionally assuming the object data are
little-endian.  Tests have been added to ensure that the
llvm-dwarfdump utility does not crash when processing big-endian
object files.

Patch by Charlie Turner.

llvm-svn: 218389
2014-09-24 17:01:06 +00:00
Renato Golin 53f6034f8e Refactor the RelocVisitor::visit method
This change replaces the brittle if/else chain of string comparisons
with a switch statement on the detected target triple, removing the
need for testing arbitrary architecture names returned from
getFileFormatName, whose primary purpose seems to be for display
(user-interface) purposes. The visitor now takes a reference to the
object file, rather than its arbitrary file format name to figure out
whether the file is a 32 or 64-bit object file and what the detected
target triple is.

A set of tests have been added to help show that the refactoring processes
relocations for the same targets as the original code.

Patch by Charlie Turner.

llvm-svn: 218388
2014-09-24 17:00:42 +00:00
David Peixotto 0d4d5e64ec Fix assertion in LICM doFinalization()
The doFinalization method checks that the LoopToAliasSetMap is
empty. LICM populates that map as it runs through the loop nest,
deleting the entries for child loops as it goes. However, if a child
loop is deleted by another pass (e.g. unrolling) then the loop will
never be deleted from the map because LICM walks the loop nest to
find entries it can delete.

The fix is to delete the loop from the map and free the alias set
when the loop is deleted from the loop nest.

Differential Revision: http://reviews.llvm.org/D5305

llvm-svn: 218387
2014-09-24 16:48:31 +00:00
Moritz Roth f5d0c7c2c0 [Thumb] Make load/store optimizer less conservative.
If it's safe to clobber the condition flags, we can do a few extra things:
it's then possible to reset the base register writeback using a SUBS, so
we can try to merge even if the base register isn't dead after the merged
instruction.

This is effectively a (heavily bug-fixed) rewrite of r208992.

llvm-svn: 218386
2014-09-24 16:35:50 +00:00
Oliver Stannard 1ae8b476f4 [Thumb] 32-bit encodings of 'cps' are not valid for v7M
v7M only allows the 16-bit encoding of the 'cps' (Change Processor
State) instruction, and does not have the 32-bit encoding which is
valid from v6T2 onwards.

llvm-svn: 218382
2014-09-24 14:20:01 +00:00
Chandler Carruth e7e9c04ddf [x86] Teach the instruction lowering to add comments describing constant
pool data being loaded into a vector register.

The comments take the form of:

  # ymm0 = [a,b,c,d,...]
  # xmm1 = <x,y,z...>

The []s are used for generic sequential data and the <>s are used for
specifically ConstantVector loads. Undef elements are printed as the
letter 'u', integers in decimal, and floating point values as floating
point values. Suggestions on improving the formatting or other aspects
of the display are very welcome.

My primary use case for this is to be able to FileCheck test masks
passed to vector shuffle instructions in-register. It isn't fantastic
for that (no decoding special zeroing semantics or other tricks), but it
at least puts the mask onto an instruction line that could reasonably be
checked. I've updated many of the new vector shuffle lowering tests to
leverage this in their test cases so that we're actually checking the
shuffle masks remain as expected.

Before implementing this, I tried a *bunch* of different approaches.
I looked into teaching the MCInstLower code to scan up the basic block
and find a definition of a register used in a shuffle instruction and
then decode that, but this seems incredibly brittle and complex.
I talked to Hal a lot about the "right" way to do this: attach the raw
shuffle mask to the instruction itself in some form of unencoded
operands, and then use that to emit the comments. I still think that's
the optimal solution here, but it proved to be beyond what I'm up for
here. In particular, it seems likely best done by completing the
plumbing of metadata through these layers and attaching the shuffle mask
in metadata which could have fully automatic dropping when encoding an
actual instruction.

llvm-svn: 218377
2014-09-24 09:39:41 +00:00
Matt Arsenault 3e0effa223 R600/SI: Fix weird CHECK-DAG usage
This prevents these from failing in a future commit.

llvm-svn: 218356
2014-09-24 02:14:26 +00:00
Tom Stellard 744b99b476 R600/SI: Enable selecting SALU inside branches
We can do this now that the FixSGPRLiveRanges pass is working.

llvm-svn: 218353
2014-09-24 01:33:28 +00:00
Chandler Carruth 9bd10e7492 [x86] Teach the new vector shuffle lowering to lower v8i32 shuffles with
the native AVX2 instructions.

Note that the test case is really frustrating here because VPERMD
requires the mask to be in the register input and we don't produce
a comment looking through that to the constant pool. I'm going to
attempt to improve this in a subsequent commit, but not sure if I will
succeed.

llvm-svn: 218347
2014-09-24 01:24:44 +00:00
Chandler Carruth fd11815a7d [x86] Fix a really terrible bug in the repeated 128-bin-lane shuffle
detection. It was incorrectly handling undef lanes by actually treating
an undef lane in the first 128-bit lane as a *numeric* shuffle value.

Fortunately, this almost always DTRT and disabled detecting repeated
patterns. But not always. =/ This patch introduces a much more
principled approach and fixes the miscompiles I spotted by inspection
previously.

llvm-svn: 218346
2014-09-24 01:03:57 +00:00
Robin Morisset dc1b248ccf Fix swift-atomics testcase
This testcase was not testing what it meant: because there were only two checks for
dmb {{ish}} in the second function, it could have missed a bug where one of the three
required dmb {{ish}} became dmb {{ishst}}. As I was fixing it, I also added
CHECK-LABELs to make it a bit less brittle.

llvm-svn: 218341
2014-09-23 23:18:01 +00:00
Chandler Carruth df2e421845 [x86] Teach the new vector shuffle lowering to lower v4i64 vector
shuffles using the AVX2 instructions. This is the first step of cutting
in real AVX2 support.

Note that I have spotted at least one bug in the test cases already, but
I suspect it was already present and just is getting surfaced. Will
investigate next.

llvm-svn: 218338
2014-09-23 22:39:02 +00:00
Reid Kleckner 78927e884b GlobalOpt: Preserve comdats of unoptimized initializers
Rather than slurping in and splatting out the whole ctor list, preserve
the existing array entries without trying to understand them.  Only
remove the entries that we know we can optimize away.  This way we don't
need to wire through priority and comdats or anything else we might add.

Fixes a linker issue where the .init_array or .ctors entry would point
to discarded initialization code if the comdat group from the TU with
the faulty global_ctors entry was dropped.

llvm-svn: 218337
2014-09-23 22:33:01 +00:00
Jim Grosbach 57fd2623c3 AArch64: allow constant expressions for shifted reg literals
e.g., add w1, w2, w3, lsl #(2 - 1)

This sort of thing comes up in pre-processed assembly playing macro games.
Still validate that it's an assembly time constant. The early exit error check
was just a bit overzealous and disallowed a left paren.

rdar://18430542

llvm-svn: 218336
2014-09-23 22:16:02 +00:00
Chandler Carruth 9a94bd6fa4 [x86] Teach the rest of the 'target shuffle' machinery about blends and
add VPBLENDD to the InstPrinter's comment generation so we get nice
comments everywhere.

Now that we have the nice comments, I can see the bug introduced by
a silly typo in the commit that enabled VPBLENDD, and have fixed it. Yay
tests that are easy to inspect.

llvm-svn: 218335
2014-09-23 22:14:14 +00:00
Robin Morisset 6dbbbc28b0 [X86] Make wide loads be managed by AtomicExpand
Summary:
AtomicExpand already had logic for expanding wide loads and stores on LL/SC
architectures, and for expanding wide stores on CmpXchg architectures, but
not for wide loads on CmpXchg architectures. This patch fills this hole,
and makes use of this new feature in the X86 backend.

Only one functionnal change: we now lose the SynchScope attribute.
It is regrettable, but I have another patch that I will submit soon that will
solve this for all of AtomicExpand (it seemed better to split it apart as it
is a different concern).

Test Plan: make check-all (lots of tests for this functionality already exist)

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5404

llvm-svn: 218332
2014-09-23 20:59:25 +00:00
Robin Morisset 2212996936 [Power] Use AtomicExpandPass for fence insertion, and use lwsync where appropriate
Summary:
This patch makes use of AtomicExpandPass in Power for inserting fences around
atomic as part of an effort to remove fence insertion from SelectionDAGBuilder.
As a big bonus, it lets us use sync 1 (lightweight sync, often used by the mnemonic
lwsync) instead of sync 0 (heavyweight sync) in many cases.

I also added a test, as there was no test for the barriers emitted by the Power
backend for atomic loads and stores.

Test Plan: new test + make check-all

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5180

llvm-svn: 218331
2014-09-23 20:46:49 +00:00
Chandler Carruth adcfec995c [x86] Teach the new shuffle lowering's blend functionality to use AVX2's
VPBLENDD where appropriate even on 128-bit vectors.

According to Agner's tables, this instruction is significantly higher
throughput (can execute on any port) on Haswell chips so we should
aggressively try to form it when available.

Sadly, this loses our delightful shuffle comments. I'll add those back
for VPBLENDD next.

llvm-svn: 218322
2014-09-23 18:16:12 +00:00
Oliver Stannard c546625c4f Fix segfault in AArch64 backend with -g and -mbig-endian
Fix a null pointer dereference when trying to swap the endianness of
fixups in the .eh_frame section in the AArch64 backend.

llvm-svn: 218311
2014-09-23 15:38:11 +00:00
Timur Iskhodzhanov f6b889126c Fix a small typo in the test comment
llvm-svn: 218306
2014-09-23 14:07:12 +00:00
Timur Iskhodzhanov d171153f81 Rebuild the inputs for the codeview-linetables.test with VS2013
Also provide reproducible instructions

llvm-svn: 218303
2014-09-23 13:49:51 +00:00
Chandler Carruth 40592d2dec [x86] Teach the vector comment parsing and printing to correctly handle
undef in the shuffle mask. This shows up when we're printing comments
during lowering and we still have an IR-level constant hanging around
that models undef.

A nice consequence of this is *much* prettier test cases where the undef
lanes actually show up as undef rather than as a particular set of
values. This also allows us to print shuffle comments in cases that use
undef such as the recently added variable VPERMILPS lowering. Now those
test cases have nice shuffle comments attached with their details.

The shuffle lowering for PSHUFB has been augmented to use undef, and the
shuffle combining has been augmented to comprehend it.

llvm-svn: 218301
2014-09-23 11:15:19 +00:00
Chandler Carruth 6d5916a2d7 [x86] Teach the AVX1 path of the new vector shuffle lowering one more
trick that I missed.

VPERMILPS has a non-immediate memory operand mode that allows it to do
asymetric shuffles in the two 128-bit lanes. Use this rather than two
shuffles and a blend.

However, it turns out the variable shuffle path to VPERMILPS (and
VPERMILPD, although that one offers no functional differenc from the
immediate operand other than variability) wasn't even plumbed through
codegen. Do such plumbing so that we can reasonably emit
a variable-masked VPERMILP instruction. Also plumb basic comment parsing
and printing through so that the tests are reasonable.

There are still a few tests which don't show the shuffle pattern. These
are tests with undef lanes. I'll teach the shuffle decoding and printing
to handle undef mask entries in a follow-up. I've looked at the masks
and they seem reasonable.

llvm-svn: 218300
2014-09-23 10:08:29 +00:00
Michael Kuperstein 946b3b2e16 Ensure bitcode encoding stays stable.
This includes constants, attributes, and some additional instructions not covered by previous tests.

Work was done by lama.saba@intel.com.

llvm-svn: 218297
2014-09-23 08:48:01 +00:00
Sanjay Patel 4bc685c206 tighten up checks
We manage to generate all of the matching instructions (and a lot more) via
the reciprocal optimization function - even if we completely remove the square
root optimization. With CHECK_NEXT, we assure that we're executing the
expected square root optimization paths and not generating extra insts.

llvm-svn: 218284
2014-09-22 22:46:44 +00:00
Sanjay Patel 5cf7561d21 remove unnecessary labels; NFC
llvm-svn: 218278
2014-09-22 21:52:53 +00:00
Juergen Ributzka 27e959d7b2 [FastISel][AArch64] Also allow folding of sign-/zero-extend and shift-left for booleans (i1).
Shift-left immediate with sign-/zero-extensions also works for boolean values.
Update the assert and the test cases to reflect that fact.

This should fix a bug found by Chad.

llvm-svn: 218275
2014-09-22 21:08:53 +00:00
David Majnemer 597be2ded6 MC: ReadOnlyWithRel section kinds should map to rdata in COFF
Don't consider ReadOnlyWithRel as a writable section in COFF, they
really belong in .rdata.

llvm-svn: 218268
2014-09-22 20:39:23 +00:00
Chandler Carruth 44deb8015c [x86] Introduce tests covering the gamut of 256-bit vector shuffling.
These are just test cases, no actual code yet. This establishes the
baseline fallback strategy we're starting from on AVX2 and the expected
lowering we use on AVX1.

Also, these test cases are very much generated. I've manually crafted
the specific pattern set that I'm hoping will be useful at exercising
the lowering code, but I've not (and could not) manually verify *all* of
these. I've spot checked and they seem legit to me.

As with the rest of vector shuffling, at a certain point the only really
useful way to check the correctness of this stuff is through fuzz
testing.

llvm-svn: 218267
2014-09-22 20:25:08 +00:00
Sanjay Patel 7939d7229d Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2).
We generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors.
This patch should preserve all existing behavior with regular optimization levels, 
but also use splats whenever possible when optimizing for *size* on any CPU with AVX or AVX2.

The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save
at least 8 bytes (up to 31 bytes) of constant pool data.

Differential Revision: http://reviews.llvm.org/D5347

llvm-svn: 218263
2014-09-22 18:54:01 +00:00
Akira Hatanaka f2a721a875 Fix test case commited in r218242 to appease buildbot.
llvm-svn: 218261
2014-09-22 18:07:20 +00:00
Tom Stellard 9f73851e39 Revert "R600/SI: Add support for global atomic add"
This reverts commit r218254.

The global_atomics.ll test fails with asserts disabled.  For some reason,
the compiler fails to produce the atomic no return variants.

llvm-svn: 218257
2014-09-22 16:44:04 +00:00
Frederic Riss 220fa48491 Fix a test introduced in r218246 to work also on Windows.
llvm-svn: 218255
2014-09-22 16:17:32 +00:00
Tom Stellard 2355a77e74 R600/SI: Add support for global atomic add
llvm-svn: 218254
2014-09-22 15:35:35 +00:00
Pavel Chupin be9f12102f [x32] Fix segmented stacks support
Summary:
Update segmented-stacks*.ll tests with x32 target case and make
corresponding changes to make them pass.

Test Plan: tests updated with x32 target

Reviewers: nadav, rafael, dschuff

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D5245

llvm-svn: 218247
2014-09-22 13:11:35 +00:00
Frederic Riss 955724e3f5 [dwarfdump] Dump full filenames as DW_AT_(decl|call)_file attribute values
Reviewers: dblaikie samsonov

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5192

llvm-svn: 218246
2014-09-22 12:36:04 +00:00
Frederic Riss 58ed53cfcd Allow DWARFDebugInfoEntryMinimal::getSubroutineName to resolve cross-unit references.
Summary: getSubroutineName is currently only used by llvm-symbolizer, thus add a binary test containing a cross-cu inlining example.

Reviewers: samsonov, dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5394

llvm-svn: 218245
2014-09-22 12:35:53 +00:00
Robert Lougher 6da8a243f9 Fix assert when decoding PSHUFB mask
The PSHUFB mask decode routine used to assert if the mask index was out of
range (<0 or greater than the size of the vector).  The problem is, we can
legitimately have a PSHUFB with a large index using intrinsics.  The
instruction only uses the least significant 4 bits.  This change removes the
assert and masks the index to match the instruction behaviour.

llvm-svn: 218242
2014-09-22 11:54:38 +00:00
Oliver Stannard 14f97d0017 Downgrade DWARF2 section limit error to a warning
We currently emit an error when trying to assemble a file with more
than one section using DWARF2 debug info. This should be a warning
instead, as the resulting file will still be usable, but with a
degraded debug illusion.

llvm-svn: 218241
2014-09-22 10:45:16 +00:00
Chandler Carruth 7158c95d65 [x86] Move the AVX v4i64 test cases down to group them together.
Increasingly I don't want to mix the integer and floating point tests,
especially with AVX where they are handled quite differently.

llvm-svn: 218233
2014-09-22 03:05:23 +00:00
Chandler Carruth 12bbf7d922 [x86] Back out a bad choice about lowering v4i64 and pave the way for
a more sane approach to AVX2 support.

Fundamentally, there is no useful way to lower integer vectors in AVX.
None. We always end up with a VINSERTF128 in the end, so we might as
well eagerly switch to the floating point domain and do everything
there. This cleans up lots of weird and unlikely to be correct
differences between integer and floating point shuffles when we only
have AVX1.

The other nice consequence is that by doing things this way we will make
it much easier to write the integer lowering routines as we won't need
to duplicate the logic to check for AVX vs. AVX2 in each one -- if we
actually try to lower a 256-bit vector as an integer vector, we have
AVX2 and can rely on it. I think this will make the code much simpler
and more comprehensible.

Currently, I've disabled *all* support for AVX2 so that we always fall
back to AVX. This keeps everything working rather than asserting. That
will go away with the subsequent series of patches that provide
a baseline AVX2 implementation.

Please note, I'm going to implement AVX2 *without access to hardware*.
That means I cannot correctness test this path. I will be relying on
those with access to AVX2 hardware to do correctness testing and fix
bugs here, but as a courtesy I'm trying to sketch out the framework for
the new-style vector shuffle lowering in the context of the AVX2 ISA.

llvm-svn: 218228
2014-09-22 00:32:15 +00:00
Chandler Carruth 5d45962b2c [x86] Teach the new vector shuffle lowering how to cleverly lower single
input v8f32 shuffles which are not 128-bit lane crossing but have
different shuffle patterns in the low and high lanes. This removes most
of the extract/insert traffic that was unnecessary and is particularly
good at lowering cases where only one of the two lanes is shuffled at
all.

I've also added a collection of test cases with undef lanes because this
lowering is somewhat more sensitive to undef lanes than others.

llvm-svn: 218226
2014-09-21 23:46:13 +00:00
Chandler Carruth b195e860f9 [x86] Add a bunch of test cases where we have different shuffle patterns
in the high and low 128-bit lanes of a v8f32 vector.

No functionality change yet, but wanted to set up the baseline for my
next patch which will make these quite a bit better. =]

llvm-svn: 218224
2014-09-21 23:32:42 +00:00
Chandler Carruth b3125c7522 [x86] Teach the new vector shuffle lowering to re-use the SHUFPS
lowering when it can use a symmetric SHUFPS across both 128-bit lanes.

This required making the SHUFPS lowering tolerant of other vector types,
and adjusting our canonicalization to canonicalize harder.

This is the last of the clever uses of symmetry I've thought of for
v8f32. The rest of the tricks I'm aware of here are to work around
assymetry in the mask.

llvm-svn: 218216
2014-09-21 13:35:14 +00:00
Chandler Carruth 33eda72802 [x86] Teach the new vector shuffle lowering the basics about insertion
of a single element into a zero vector for v4f64 and v4i64 in AVX.
Ironically, there is less to see here because xor+blend is so crazy fast
that we can't really beat that to zero the high 128-bit lane.

llvm-svn: 218214
2014-09-21 12:49:46 +00:00
Chandler Carruth 43f5974ea0 [x86] Teach the new vector shuffle lowering how to lower to UNPCKLPS and
UNPCKHPS with AVX vectors by recognizing those patterns when they are
repeated for both 128-bit lanes.

With this, we now generate the exact same (really nice) code for
Quentin's avx_test_case.ll which was the most significant regression
reported for the new shuffle lowering. In fact, I'm out of specific test
cases for AVX lowering, the rest were AVX2 I think. However, there are
a bunch of pretty obvious remaining things to improve with AVX...

llvm-svn: 218213
2014-09-21 12:20:44 +00:00
Chandler Carruth 78f4798913 [x86] Add test cases for UNPCK instructions with v8f32 AVX vectors in
preparation for enhancing their support in the new vector shuffle
lowering.

llvm-svn: 218212
2014-09-21 12:13:11 +00:00
Chandler Carruth 88404c4f9b [x86] Begin teaching the new vector shuffle lowering among the most
important bits of cleverness: to detect and lower repeated shuffle
patterns between the two 128-bit lanes with a single instruction.

This patch just teaches it how to lower single-input shuffles that fit
this model using VPERMILPS. =] There is more that needs to happen here.

llvm-svn: 218211
2014-09-21 12:01:19 +00:00
Chandler Carruth 83252ac8f4 [x86] Regenerate this test case now that I've improved my script for
generating the test cases to format things more consistently and
actually catch all the operand sequences that should be elided in favor
of the asm comments. No actual changes here.

llvm-svn: 218210
2014-09-21 11:51:33 +00:00
Chandler Carruth e81bfbada9 [x86] Teach the new vector shuffle lowering of v4f64 to prefer a direct
VBLENDPD over using VSHUFPD. While the 256-bit variant of VBLENDPD slows
down to the same speed as VSHUFPD on Sandy Bridge CPUs, it has twice the
reciprocal throughput on Ivy Bridge CPUs much like it does everywhere
for 128-bits. There isn't a downside, so just eagerly use this
instruction when it suffices.

llvm-svn: 218208
2014-09-21 11:17:55 +00:00
Chandler Carruth 6aea21df8e [x86] Add some more comprehensive tests for v4f64 blending.
llvm-svn: 218207
2014-09-21 11:12:19 +00:00
Chandler Carruth 908afb56c0 [x86] Re-generate a bunch of the v4f64 test cases with my new script.
This expands the integer cases to cover the fact that AVX2 moves their
lane-crossing shuffles into the integer domain. It also adds proper
support for AVX2 run lines and the "ALL" group when it doesn't matter.

llvm-svn: 218206
2014-09-21 11:07:41 +00:00
Chandler Carruth 293327ddcd [x86] Teach the new vector shuffle lowering the first step toward more
actual support for complex AVX shuffling tricks. We can do independent
blends of the low and high 128-bit lanes of an avx vector, so shuffle
the inputs into place and then do the blend at 256 bits. This will in
many cases remove one blend instruction.

The next step is to permute the low and high halves in-place rather than
extracting them and re-inserting them.

llvm-svn: 218202
2014-09-21 09:35:22 +00:00
David Majnemer 48227a3759 MC: Support aligned COMMON symbols for COFF
link.exe:
Fuzz testing has shown that COMMON symbols with size > 32 will always
have an alignment of at least 32 and all symbols with size < 32 will
have an alignment of at least the largest power of 2 less than the size
of the symbol.

binutils:
The BFD linker essentially work like the link.exe behavior but with
alignment 4 instead of 32.  The BFD linker also supports an extension to
COFF which adds an -aligncomm argument to the .drectve section which
permits specifying a precise alignment for a variable but MC currently
doesn't support editing .drectve in this way.

With all of this in mind, we decide to play a little trick: we can
ensure that the alignment will be respected by bumping the size of the
global to it's alignment.

llvm-svn: 218201
2014-09-21 09:18:07 +00:00
Chandler Carruth 8ff73c0170 [x86] Add some more test cases covering specific blend patterns.
llvm-svn: 218200
2014-09-21 09:01:26 +00:00
Chandler Carruth 7a6108d652 [x86] Add the beginnings of some tests for our v8f32 shuffle lowering
under AVX.

This really just documents the current state of the world. I'm going to
try to flesh it out to cover any test cases I plan to improve prior to
improving them so that the delta made by changes is actually visible to
code reviewers.

This is made easier by the fact that I now have a script to automate the
process of producing test cases including the check lines. =]

llvm-svn: 218199
2014-09-21 08:49:27 +00:00
Chandler Carruth a454812ac8 [x86] Teach the new vector shuffle lowering to use VPERMILPD for
single-input shuffles with doubles. This allows them to fold memory
operands into the shuffle, etc. This is just the analog to the v4f32
case in my prior commit.

llvm-svn: 218193
2014-09-20 22:09:27 +00:00
Chandler Carruth aa5b798ae7 [x86] Add an AVX run to the 128-bit v2 tests, teach them to have
a generic SSE and AVX mode in addition to a specific AVX1 test path, and
flesh out the AVX tests.

llvm-svn: 218192
2014-09-20 21:26:41 +00:00
David Majnemer fb83977538 Update tests which broke from r218189
llvm-svn: 218191
2014-09-20 21:18:43 +00:00
Chandler Carruth 6f80abac4e [x86] Teach the new vector shuffle lowering to use the AVX VPERMILPS
instruction for single-vector floating point shuffles. This in turn
allows the shuffles to fold a load into the instruction which is one of
the common regressions hit with the new shuffle lowering.

llvm-svn: 218190
2014-09-20 20:52:07 +00:00
David Majnemer 7d0dc3ef18 MC: Fix MCSectionCOFF::PrintSwitchToSection
We had a few bugs:
- We were considering the GVKind instead of just looking at the section
  characteristics
- We would never print out 'y' when a section was meant to be unreadable
- We would never print out 's' when a section was meant to be shared
- We translated IMAGE_SCN_MEM_DISCARDABLE to 'n' when it should've meant
  IMAGE_SCN_LNK_REMOVE

llvm-svn: 218189
2014-09-20 20:40:50 +00:00
Chandler Carruth 78a761ce8c [x86] Start moving to a fancier check syntax to reduce the need for
duplication of check lines. The idea is to have broad sets of
compilation modes that will frequently diverge without having to always
and immediately explode to the precise ISA feature set.

While this already helps due to VEX encoded differences, it will help
much more as I teach the new shuffle lowering about more of the new VEX
encoded instructions which can still be used to implement 128-bit
shuffles.

llvm-svn: 218188
2014-09-20 18:36:39 +00:00
David Majnemer b8dbebb31c MC: Treat ReadOnlyWithRel and ReadOnlyWithRelLocal as ReadOnly for COFF
A problem with our old behavior becomes observable under x86-64 COFF
when we need a read-only GV which has an initializer which is referenced
using a relocation: we would mark the section as writable.  Marking the
section as writable interferes with section merging.

This fixes PR21009.

llvm-svn: 218179
2014-09-20 07:31:46 +00:00
Chandler Carruth 8c4cccd4aa [x86] Teach the v4f32 path of the new shuffle lowering to handle the
tricky case of single-element insertion into the zero lane of a zero
vector.

We can't just use the same pattern here as we do in every other vector
type because the general insertion logic can handle insertion into the
non-zero lane of the vector. However, in SSE4.1 with v4f32 vectors we
have INSERTPS that is a much better choice than the generic one for such
lowerings. But INSERTPS can do lots of other lowerings as well so
factoring its logic into the general insertion logic doesn't work very
well. We also can't just extract the core common part of the general
insertion logic that is faster (forming VZEXT_MOVL synthetic nodes that
lower to MOVSS when they can) because VZEXT_MOVL is often *faster* than
a blend while INSERTPS is slower! So instead we do a restrictive
condition on attempting to use the generic insertion logic to narrow it
to those cases where VZEXT_MOVL won't need a shuffle afterward and thus
will do better than INSERTPS. Then we try blending. Then we go back to
INSERTPS.

This still doesn't generate perfect code for some silly reasons that can
be fixed by tweaking the td files for lowering VZEXT_MOVL to use
XORPS+BLENDPS when available rather than XORPS+MOVSS when the input ends
up in a register rather than a load from memory -- BLENDPSrr has twice
the reciprocal throughput of MOVSSrr. Don't you love this ISA?

llvm-svn: 218177
2014-09-20 04:15:22 +00:00
Chandler Carruth 00389f3ed9 [x86] Generalize the single-element insertion lowering to work with
floating point types and use it for both v2f64 and v2i64 single-element
insertion lowering.

This fixes the last non-AVX performance regression test case I've gotten
of for the new vector shuffle lowering. There is obvious analogous
lowering for v4f32 that I'll add in a follow-up patch (because with
INSERTPS, v4f32 requires special treatment). After that, its AVX stuff.

llvm-svn: 218175
2014-09-20 03:32:25 +00:00
David Majnemer f4dc456eef llvm-readobj: pretty-print special COFF section names
Print IMAGE_SYM_DEBUG and the like instead of (-2).

llvm-svn: 218172
2014-09-20 00:25:06 +00:00
Peter Collingbourne 975726345c Fix crash with an insertvalue that produces an empty object.
llvm-svn: 218171
2014-09-20 00:10:47 +00:00
Matt Arsenault de0253791c R600: Un-xfail a test which passes with pass disabled
llvm-svn: 218165
2014-09-19 23:02:20 +00:00
Matt Arsenault 5e5b242946 R600/SI: Un-xfail tests which work now
llvm-svn: 218164
2014-09-19 23:02:18 +00:00
Matt Arsenault a986554377 R600/SI: Un xfail a test that works now
llvm-svn: 218162
2014-09-19 22:42:40 +00:00
Juergen Ributzka 92e8978e40 [FastIsel][AArch64] Fix a think-o in address computation.
When looking through sign/zero-extensions the code would always assume there is
such an extension instruction and use the wrong operand for the address.

There was also a minor issue in the handling of 'AND' instructions. I
accidentially used a 'cast' instead of a 'dyn_cast'.

llvm-svn: 218161
2014-09-19 22:23:46 +00:00
Chandler Carruth 0fc0c22fa9 [x86] Fully generalize the zext lowering in the new vector shuffle
lowering to support both anyext and zext and to custom lower for many
different microarchitectures.

Using this allows us to get *exactly* the right code for zext and anyext
shuffles in all the vector sizes. For v16i8, the improvement is *huge*.
The new SSE2 test case added I refused to add before this because it was
sooooo muny instructions.

llvm-svn: 218143
2014-09-19 20:00:32 +00:00
Justin Bogner a829fde160 llvm-cov: Prevent a test from matching its own check lines
Since llvm-cov shows the source file in its output, be careful about
potentially matching the check lines themselves.

llvm-svn: 218138
2014-09-19 19:04:08 +00:00
David Blaikie db119544a2 Fix test case to be portable to different architectures.
llvm-svn: 218134
2014-09-19 18:31:25 +00:00
Matt Arsenault 4505f3a73d R600/SI: Fix test to prepare for scheduler
llvm-svn: 218131
2014-09-19 18:11:16 +00:00
David Blaikie 3a7ce252cc Omit DW_TAG_subprograms for subprograms without inlined subroutines when producing -gmlt data
To reduce the size of -gmlt data, skip the subprograms without any
inlined subroutines. Since we've now got the ability to make these
determinations in the backend (funnily enough - we added the flag so we
wouldn't produce ranges under -gmlt, but with this change we use the
flag, but go back to producing ranges under -gmlt).

Instead, just produce CU ranges to inform the consumer which parts of
the code are described by this CU's line table. Tools could inspect the
line table directly to compute the range, but the CU ranges only seem to
be about 0.5% of object/executable size, so I'm not too worried about
teaching llvm-symbolizer that trick just yet - it's certainly a possible
piece of future work.

Update an llvm-symbolizer test just to demonstrate that this schema is
acceptable there (if it wasn't, the compiler-rt tests would catch this,
but good to have an in-llvm-tree test for llvm-symbolizer's behavior
here)

Building the clang binary with -gmlt with this patch reduces the total
size of object files by 5.1% (5.56% without ranges) without compression
and the executable by 4.37% (4.75% without ranges).

llvm-svn: 218129
2014-09-19 17:03:16 +00:00
Hal Finkel 62ac736faa Optionally enable more-aggressive FMA formation in DAGCombine
The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one
use, but this is overly-conservative on some systems. Specifically, if the FMA
and the FADD have the same latency (and the FMA does not compete for resources
with the FMUL any more than the FADD does), there is no need for the
restriction, and furthermore, forming the FMA leaving the FMUL can still allow
for higher overall throughput and decreased critical-path length.

Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to
elide the hasOneUse check. This is enabled for PowerPC by default, as most
PowerPC systems will benefit.

Patch by Olivier Sallenave, thanks!

llvm-svn: 218120
2014-09-19 11:42:56 +00:00
Chandler Carruth 8a6536d4b2 [x86] Recognize that we can use duplication to widen v16i8 shuffles due
to undef lanes as well as defined widenable lanes. This dramatically
improves the lowering we use for undef-shuffles in a zext-ish pattern
for SSE2.

llvm-svn: 218115
2014-09-19 09:45:21 +00:00
Chandler Carruth 662b6d84e7 [x86] Actually test the SSE2 lowering for most of the zext-ish shuffles.
Not sure why I only did SSSE3 here. Also, I've left out some of the SSE2
ones because the shuffles are so absurd it's not worth transcribing
them. Will try to fix them to be sane and then check them.

llvm-svn: 218114
2014-09-19 08:51:06 +00:00
Chandler Carruth 2e275142cd [x86] Teach the new vector shuffle lowering to also use pmovzx for v4i32
shuffles that are zext-ing.

Not a lot to see here; the undef lane variant is better handled with
pshufd, but this improves the actual zext pattern.

llvm-svn: 218112
2014-09-19 08:37:44 +00:00
Justin Bogner 13ba23bb79 llvm-cov: Fix dropped lines when filters were applied
Uncovered lines in the middle of a covered region weren't being shown
when filtering to a particular function.

llvm-svn: 218109
2014-09-19 08:13:16 +00:00
Chandler Carruth 398ba9a018 [x86] Add a dedicated lowering path for zext-compatible vector shuffles
to the new vector shuffle lowering code.

This allows us to emit PMOVZX variants consistently for patterns where
it is a viable lowering. This instruction is both fast and allows us to
fold loads into it. This only hooks the new lowering up for i16 and i8
element widths, mostly so I could manage the change to the tests. I'll
add the i32 one next, although it is significantly less interesting.

One thing to note is that we already had some tests for these patterns
but those tests had far less horrible instructions. The problem is that
those tests weren't checking the strict start and end of the instruction
sequence. =[ As a consequence something changed in the lowering making
us generate *TERRIBLE* code for these patterns in SSE2 through SSSE3.
I've consolidated all of the tests and spelled out the madness that we
currently emit for these shuffles. I'm going to try to figure out what
has gone wrong here.

llvm-svn: 218102
2014-09-19 06:07:49 +00:00
Jiangning Liu ffbc690933 Optimize sext/zext insertion algorithm in back-end.
With this optimization, we will not always insert zext for values crossing
basic blocks, but insert sext if the users of a value crossing basic block
has preference of sign predicate.

llvm-svn: 218101
2014-09-19 05:30:35 +00:00
David Blaikie 03c3dbeb62 Omit DW_AT_frame_base under -gmlt for size
llvm-svn: 218100
2014-09-19 04:55:05 +00:00
David Blaikie 73b65d236c Omit all the extra static attributes on subprograms in -gmlt
This omission will be done in a fancier manner once we're dealing with
"put gmlt in the skeleton CUs under fission" - it'll have to be
conditional on the kind of CU we're emitting into (skeleton or gmlt).

llvm-svn: 218098
2014-09-19 04:30:36 +00:00
Hans Wennborg c0f0c511db Fix an it's vs. its typo.
llvm-svn: 218093
2014-09-19 01:14:56 +00:00
Matt Arsenault 46cbc4367b R600: Better fix for bug 20982
Just do the left shift as unsigned to avoid the UB.

llvm-svn: 218092
2014-09-19 00:42:06 +00:00
Chandler Carruth be58fd2f2d [x86] Extend this test to cover SSE4.1. Nothing interesting here, but
paves the way for subsequent changes.

llvm-svn: 218091
2014-09-19 00:30:24 +00:00