Commit Graph

32982 Commits

Author SHA1 Message Date
Charlie Turner 458e79b814 [ARM] Expand ROTL and ROTR of vector value types
Summary: After D13851 landed, we saw backend crashes when compiling the reduced test case included in this patch. The right fix seems to be to allow these vector types for expansion in instruction selection.

Reviewers: rengolin, t.p.northover

Subscribers: RKSimon, t.p.northover, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14082

llvm-svn: 251401
2015-10-27 10:25:20 +00:00
David Majnemer dd9a815746 [ScalarEvolutionExpander] Properly insert no-op casts + EH Pads
We want to insert no-op casts as close as possible to the def.  This is
tricky when the cast is of a PHI node and the BasicBlocks between the
def and the use cannot hold any instructions.  Iteratively walk EH pads
until we hit a non-EH pad.

This fixes PR25326.

llvm-svn: 251393
2015-10-27 07:36:42 +00:00
Michael Kuperstein e1194bdb4f [X86] Make elfiamcu an OS, not an environment.
GNU tools require elfiamcu to take up the entire OS field, so, e.g.
i?86-*-linux-elfiamcu is not considered a legal triple.
Make us compatible.

Differential Revision: http://reviews.llvm.org/D14081

llvm-svn: 251390
2015-10-27 07:23:59 +00:00
Sanjay Patel 309c4f93e5 [x86] replace integer logic ops with packed SSE FP logic ops
If we have an operand to a bitwise logic op that's already in
an XMM register and the result is going to be sent to an XMM
register, then use an SSE logic op to avoid moves between the
integer and vector register files.

Related commits:
http://reviews.llvm.org/rL248395
http://reviews.llvm.org/rL248399
http://reviews.llvm.org/rL248404
http://reviews.llvm.org/rL248409
http://reviews.llvm.org/rL248415

This should solve PR22428:
https://llvm.org/bugs/show_bug.cgi?id=22428

llvm-svn: 251378
2015-10-27 01:28:07 +00:00
Steve King fee370be72 Fix llc crash processing S/UREM for -Oz builds caused by rL250825.
When taking the remainder of a value divided by a constant, visitREM()
attempts to convert the REM to a longer but faster sequence of instructions.
This conversion calls combine() on a speculative DIV instruction. Commit
rL250825 may cause this combine() to return a DIVREM, corrupting nearby nodes.
Flow eventually hits unreachable().

This patch adds a test case and a check to prevent visitREM() from trying
to convert the REM instruction in cases where a DIVREM is possible.
See http://reviews.llvm.org/D14035

llvm-svn: 251373
2015-10-27 00:14:06 +00:00
Sanjay Patel 28d1598e5b add FP logic test cases to show current codegen (PR22428)
llvm-svn: 251370
2015-10-26 23:52:42 +00:00
Chandler Carruth 5a14186b6c [x86] Make the vselect-minmax test 2x to 3x faster by deleting all the
instructions that aren't relevant for instruction selection of vector
min and max.

llvm-svn: 251366
2015-10-26 22:54:53 +00:00
Tim Northover 939f089242 ARM: make sure VFP loads and stores are properly aligned.
Both VLDRS and VLDRD fault if the memory is not 4 byte aligned, which wasn't
really being checked before, leading to faults at runtime.

llvm-svn: 251352
2015-10-26 21:32:53 +00:00
Peter Collingbourne 56fff8d394 Fix tests.
llvm-svn: 251343
2015-10-26 20:49:49 +00:00
Alexey Samsonov f3ecfd3af4 [LLVMSymbolize] Use symbol table only if function linkage name was requested.
Now it's enough to just specify -functions=short without additionally
providing -use-symbol-table=false.

llvm-svn: 251339
2015-10-26 20:12:29 +00:00
Igor Laevsky 1ef06559f4 [RS4GC] Strip noalias attribute after statepoint rewrite
We should remove noalias along with dereference and dereference_or_null attributes 
because statepoint could potentially touch the entire heap including noalias objects.

Differential Revision: http://reviews.llvm.org/D14032

llvm-svn: 251333
2015-10-26 19:06:01 +00:00
Diego Novillo 7963ea1996 SamplePGO - Add optimization reports.
This adds a couple of optimization remarks to the SamplePGO
transformation. When it decides to inline a hot function (to mimic the
inline stack and repeat useful inline decisions in the original build).

It will also report branch destinations. For instance, given the code
fragment:

     6      if (i < 1000)
     7        sum -= i;
     8      else
     9        sum += -i * rand();

If the 'else' branch is taken most of the time, building this code with
-Rpass=sample-profile will produce:

a.cc:9:14: remark: most popular destination for conditional branches at small.cc:6:9 [-Rpass=sample-profile]
      sum += -i * rand();
             ^

llvm-svn: 251330
2015-10-26 18:52:53 +00:00
Mehdi Amini 5d303285b9 Add an (optional) identification block in the bitcode
Processing bitcode from a different LLVM version can lead to
unexpected behavior. The LLVM project guarantees autoupdating
bitcode from a previous minor revision for the same major, but
can't make any promise when reading bitcode generated from a
either a non-released LLVM, a vendor toolchain, or a "future"
LLVM release. This patch aims at being more user-friendly and
allows a bitcode produce to emit an optional block at the
beginning of the bitcode that will contains an opaque string
intended to describe the bitcode producer information. The
bitcode reader will dump this information alongside any error it
reports.

The optional block also includes an "epoch" number, monotonically
increasing when incompatible changes are made to the bitcode. The
reader will reject bitcode whose epoch is different from the one
expected.

Differential Revision: http://reviews.llvm.org/D13666

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 251325
2015-10-26 18:37:00 +00:00
Evgeniy Stepanov d1aad26589 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.

The previous iteration of this change was reverted in r250461. This
version leaves the generic, compiler-rt based implementation in
SafeStack.cpp instead of moving it to TargetLoweringBase in order to
allow testing without a TargetMachine.

llvm-svn: 251324
2015-10-26 18:28:25 +00:00
Peter Collingbourne 97aae40880 ARM/ELF: Better codegen for global variable addresses.
In PIC mode we were previously computing global variable addresses (or GOT
entry addresses) by adding the PC, the PC-relative GOT displacement and
the GOT-relative symbol/GOT entry displacement. Because the latter two
displacements are fixed, we ended up performing one more addition than
necessary.

This change causes us to compute addresses using a single PC-relative
displacement, resulting in a shorter code sequence. This reduces code size
by about 4% in a recent build of Chromium for Android.

As a result of this change we no longer need to compute the GOT base address
in the ARM backend, which allows us to remove the Global Base Reg pass and
SDAG lowering for the GOT.

We also now no longer use the GOT when addressing a symbol which is known
to be defined in the same linkage unit. Specifically, the symbol must have
either hidden visibility or a strong definition in the current module in
order to not use the the GOT.

This is a change from the previous behaviour where we would use the GOT to
address externally visible symbols defined in the same module. I think the
only cases where this could matter are cases involving symbol interposition,
but we don't really support that well anyway.

Differential Revision: http://reviews.llvm.org/D13650

llvm-svn: 251322
2015-10-26 18:23:16 +00:00
Diego Novillo 5eb5ad09d5 Cleanup test case debug info. NFC.
llvm-svn: 251320
2015-10-26 18:14:44 +00:00
Jonas Paulsson 83553d0cac [SystemZ] LTGFR use regclass should be GR32, not GR64.
Discovered by testing int-cmp-44.ll with -verify-machineinstrs (added to
test run).

llvm-svn: 251299
2015-10-26 15:03:49 +00:00
Jonas Paulsson 7da3820882 [SystemZ] Also clear kill flag for index reg in splitMove().
Discovered by running fp-move-05.ll with -verify-machineinstrs (added
to test case run).

llvm-svn: 251298
2015-10-26 15:03:41 +00:00
Jonas Paulsson 9525b2c0c8 [SystemZ] Don't forget the CC def op on LTEBRCompare pseudos
Discovered by running fp-cmp-02.ll with -verify-machineinstrs (now added
to test run).

llvm-svn: 251297
2015-10-26 15:03:32 +00:00
Jonas Paulsson dab7407258 [SystemZ] Tie operands in SystemZShorteInst if MI becomes 2-address.
Discovered by testing fp-add-02.ll with -verify-machineinstrs.

Test case updated to always run with -verify-machineinstrs.

llvm-svn: 251296
2015-10-26 15:03:07 +00:00
Vasileios Kalintiris 165121f326 [mips] Check for the correct error message in tests for interrupt attributes.
Instead of XFAIL-ing the tests with the wrong usage of the "interrupt"
attribute, we should check that we emit the correct error messages to
the user.

llvm-svn: 251295
2015-10-26 14:24:30 +00:00
James Molloy 493e57de01 [ValueTracking] Extend r251146 to catch a fairly common case
Even though we may not know the value of the shifter operand, it's possible we know the shifter operand is non-zero. This can allow us to infer more known bits - for example:

  %1 = load %p !range {1, 5}
  %2 = shl %q, %1

We don't know %1, but we do know that it is nonzero so %2[0] is known zero, and importantly %2 is known non-zero.

Calling isKnownNonZero is nontrivially expensive so use an Optional to run it lazily and cache its result.

llvm-svn: 251294
2015-10-26 14:10:46 +00:00
Elena Demikhovsky 7a77149391 Loop Vectorizer - skipping "bitcast" before GEP
Vectorization of memory instruction (Load/Store) is possible when the pointer is coming from GEP. The GEP analysis allows to estimate the profit.
In some cases we have a "bitcast" between GEP and memory instruction.
I added code that skips the "bitcast".

http://reviews.llvm.org/D13886

llvm-svn: 251291
2015-10-26 13:42:41 +00:00
Tim Northover dd6752279f Tests: be slightly more specific to avoid conflict with path.
llvm-svn: 251290
2015-10-26 13:40:03 +00:00
Igor Breger 2e919c89ce fix test errors (on windows) for commit r251287
llvm-svn: 251288
2015-10-26 13:31:41 +00:00
Igor Breger e4ddc3f4cd AVX512: Enabled VPBROADCASTB lowering for v64i8 vectors.
Differential Revision: http://reviews.llvm.org/D13896

llvm-svn: 251287
2015-10-26 13:01:02 +00:00
Vasileios Kalintiris 43dff0c033 [mips] Interrupt attribute support for mips32r2+.
Summary:
This patch adds support for using the "interrupt" attribute on Mips
for interrupt handling functions. At this time only mips32r2+ with the
o32 ABI with the static relocation model is supported. Unsupported
configurations will be rejected

Patch by Simon Dardis (+ clang-format & some trivial changes to follow the
LLVM coding standards by me).

Reviewers: mpf, dsanders

Subscribers: dsanders, vkalintiris, llvm-commits

Differential Revision: http://reviews.llvm.org/D10768

llvm-svn: 251286
2015-10-26 12:38:43 +00:00
Igor Breger 684af8156c AVX-512: Use correct extract vector length.
Bug https://llvm.org/bugs/show_bug.cgi?id=25318

Differential Revision: http://reviews.llvm.org/D14062

llvm-svn: 251285
2015-10-26 12:26:34 +00:00
Silviu Baranga b892e35520 [InstCombine] Teach instcombine not to create extra PHI nodes when folding GEPs
Summary:
InstCombine tries to transform GEP(PHI(GEP1, GEP2, ..)) into GEP(GEP(PHI(...))
when possible. However, this may leave the old PHI node around. Even if we
do end up folding the GEPs, having an extra PHI node might not be beneficial.

This change makes the transformation more conservative. We now only do this if
the PHI has only one use, and can therefore be removed after the transformation.

Reviewers: jmolloy, majnemer

Subscribers: mcrosier, mssimpso, llvm-commits

Differential Revision: http://reviews.llvm.org/D13887

llvm-svn: 251281
2015-10-26 10:25:05 +00:00
James Molloy 72222f5dca [ARM] Handle the inline asm constraint type 'o'
This means "memory with offset" and requires very little plumbing to get working. This fixes PR25317.

llvm-svn: 251280
2015-10-26 10:04:52 +00:00
Igor Breger f8e461f920 AVX512: Add AVX-512 not materializable instructions.
Otherwise value can be reused , despite its value could be changed - produces incorrect assembler.

https://llvm.org/bugs/show_bug.cgi?id=25270

Differential Revision: http://reviews.llvm.org/D14057

llvm-svn: 251275
2015-10-26 08:37:12 +00:00
David Majnemer 7bb30968e2 Update test to take into account for r251271.
llvm-svn: 251272
2015-10-26 03:34:29 +00:00
David Majnemer 0993e0b8a1 [MC] Add support for GNU as-compatible binary operator precedence
GNU as and Darwin give the various binary operators different
precedence.  LLVM's MC supported the Darwin semantics but not the GNU
semantics.

This fixes PR25311.

llvm-svn: 251271
2015-10-26 03:15:34 +00:00
David Majnemer a375b26144 [MC] Don't crash when .word is given bogus values
We didn't validate that the .word directive was given a sane value,
leading to crashes when we attempt to write out the object file.

Instead, perform some validation and issue a diagnostic pointing at the
start of the diagnostic.

llvm-svn: 251270
2015-10-26 02:45:50 +00:00
Simon Pilgrim 3e5e272fca [X86][AVX] Regenerate tests.
llvm-svn: 251263
2015-10-25 21:47:09 +00:00
Simon Pilgrim ec6db262e0 [X86][SSE4A] Fix for EXTRQI shuffle lowering.
Incorrect range test - found during fuzz testing.

llvm-svn: 251245
2015-10-25 17:40:54 +00:00
Simon Pilgrim b9ab397647 [X86][SSE] Refreshed tests (missing AVX512 patterns)
llvm-svn: 251238
2015-10-25 15:39:22 +00:00
Elena Demikhovsky 092858588a Scalarizer for masked.gather and masked.scatter intrinsics.
When the target does not support these intrinsics they should be converted to a chain of scalar load or store operations.
If the mask is not constant, the scalarizer will build a chain of conditional basic blocks.
I added isLegalMaskedGather() isLegalMaskedScatter() APIs.

Differential Revision: http://reviews.llvm.org/D13722

llvm-svn: 251237
2015-10-25 15:37:55 +00:00
Simon Pilgrim be187a0a1a [X86][SSE] Added tests for shuffling through bitcasts.
llvm-svn: 251236
2015-10-25 15:32:04 +00:00
Simon Pilgrim 40ada57b7e [X86][SSE] vector sext/zext tests - remove unnecessary mcpu arguments
llvm-svn: 251233
2015-10-25 12:15:00 +00:00
Simon Pilgrim b398da1d5c [X86][SSE] shift/rotate tests - remove unnecessary mcpu arguments and regenerate/cleanup
llvm-svn: 251232
2015-10-25 12:07:45 +00:00
Simon Pilgrim 5d4866b174 [X86] PMOV*X* tests - remove unnecessary mcpu arguments and regenerate
llvm-svn: 251230
2015-10-25 11:55:10 +00:00
Simon Pilgrim 5e79ea8281 [X86] Stack folding tests - just use mtriple - no need for mcpu in these tests
llvm-svn: 251229
2015-10-25 11:42:46 +00:00
Michael Kuperstein eaa16005af [X86] Use correct calling convention for MCU psABI libcalls
When using the MCU psABI, compiler-generated library calls should pass
some parameters in-register. However, since inreg marking for x86 is currently
done by the front end, it will not be applied to backend-generated calls.

This is a workaround for PR3997, which describes a similar issue for -mregparm.

Differential Revision: http://reviews.llvm.org/D13977

llvm-svn: 251223
2015-10-25 08:14:05 +00:00
Simon Pilgrim 53c2bff5fe [X86][SSE] Use lowerVectorShuffleWithUNPCK instead of custom matches.
Most 128-bit and 256-bit shuffles were manually matching UNPCK patterns - use lowerVectorShuffleWithUNPCK to be more thorough.

llvm-svn: 251211
2015-10-24 22:45:04 +00:00
Simon Pilgrim 64f772eac9 Removed old FIXME - we do generate movddup for SSE3 and higher
llvm-svn: 251205
2015-10-24 20:15:43 +00:00
Simon Pilgrim 7430804fe1 [DAGCombiner] Generalize masking of constant rotates.
We don't need a mask of a rotation result to be a constant splat - any constant scalar/vector can be usefully folded.

Followup to D13851.

llvm-svn: 251197
2015-10-24 18:44:52 +00:00
Hans Wennborg 34d40434a7 X86ISelLowering: Support tail calls to/from callee pop functions
This enables tail calls with thiscall, stdcall, vectorcall and
fastcall functions.

Differential Revision: http://reviews.llvm.org/D13999

llvm-svn: 251190
2015-10-24 16:47:10 +00:00
Simon Pilgrim d5ef318b5b [X86][XOP] Add support for lowering vector rotations
This patch adds support for lowering to the XOP VPROT / VPROTI vector bit rotation instructions.

This has required changes to the DAGCombiner rotation pattern matching to support vector types - so far I've only changed it to support splat vectors, but generalising this further is feasible in the future.

Differential Revision: http://reviews.llvm.org/D13851

llvm-svn: 251188
2015-10-24 13:17:26 +00:00
Chen Li 7009cd3554 Revert rL251061 [SimplifyCFG] Extend SimplifyResume to handle phi of trivial landing pad.
llvm-svn: 251149
2015-10-23 21:13:01 +00:00
Hal Finkel f2199b2178 Handle non-constant shifts in computeKnownBits, and use computeKnownBits for constant folding in InstCombine/Simplify
First, the motivation: LLVM currently does not realize that:

  ((2072 >> (L == 0)) >> 7) & 1 == 0

where L is some arbitrary value. Whether you right-shift 2072 by 7 or by 8, the
lowest-order bit is always zero. There are obviously several ways to go about
fixing this, but the generic solution pursued in this patch is to teach
computeKnownBits something about shifts by a non-constant amount. Previously,
we would give up completely on these. Instead, in cases where we know something
about the low-order bits of the shift-amount operand, we can combine (and
together) the associated restrictions for all shift amounts consistent with
that knowledge. As a further generalization, I refactored all of the logic for
all three kinds of shifts to have this capability. This works well in the above
case, for example, because the dynamic shift amount can only be 0 or 1, and
thus we can say a lot about the known bits of the result.

This brings us to the second part of this change: Even when we know all of the
bits of a value via computeKnownBits, nothing used to constant-fold the result.
This introduces the necessary code into InstCombine and InstSimplify. I've
added it into both because:

  1. InstCombine won't automatically pick up the associated logic in
     InstSimplify (InstCombine uses InstSimplify, but not via the API that
     passes in the original instruction).

  2. Putting the logic in InstCombine allows the resulting simplifications to become
     part of the iterative worklist

  3. Putting the logic in InstSimplify allows the resulting simplifications to be
     used by everywhere else that calls SimplifyInstruction (inlining, unrolling,
     and many others).

And this requires a small change to our definition of an ephemeral value so
that we don't break the rest case from r246696 (where the icmp feeding the
@llvm.assume, is also feeding a br). Under the old definition, the icmp would
not be considered ephemeral (because it is used by the br), but this causes the
assume to remove itself (in addition to simplifying the branch structure), and
it seems more-useful to prevent that from happening.

llvm-svn: 251146
2015-10-23 20:37:08 +00:00
Tim Northover d4f55c0b1b GVN: don't try to replace instruction with itself.
After some look-ahead PRE was added for GEPs, an instruction could end
up in the table of candidates before it was actually inspected. When
this happened the pass might decide it was the best candidate to
replace itself. This didn't go well.

Should fix PR25291

llvm-svn: 251145
2015-10-23 20:30:02 +00:00
Sanjoy Das 0a1bee8a80 [Inliner] Don't inline through callsites with operand bundles
Summary:
This change teaches the LLVM inliner to not inline through callsites
with unknown operand bundles.  Currently all operand bundles are
"unknown" operand bundles but in the near future we will add support for
inlining through some select kinds of operand bundles.

Reviewers: reames, chandlerc, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14001

llvm-svn: 251141
2015-10-23 20:09:55 +00:00
Reid Kleckner f02e33ce42 [X86] Clean up the tail call eligibility logic
Summary:
The logic here isn't straightforward because our support for
TargetOptions::GuaranteedTailCallOpt.

Also fix a bug where we were allowing tail calls to cdecl functions from
fastcall and vectorcall functions. We were special casing thiscall and
stdcall callers rather than checking for any convention that requires
clearing stack arguments before returning.

Reviewers: hans

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14024

llvm-svn: 251137
2015-10-23 19:35:38 +00:00
Lang Hames 3fef117ba5 [RuntimeDyld][COFF] Fix a think-o in the handling of the IMAGE_REL_AMD64_ADDR64
relocation that was introduced in r250733.

llvm-svn: 251135
2015-10-23 18:46:43 +00:00
Matt Arsenault 382557ec72 AMDGPU: Fix parsing of 32-bit literals with sign bit set
llvm-svn: 251132
2015-10-23 18:07:58 +00:00
Artyom Skrobov 5a6e39454e [ARM] Renaming +t2dsp feature into +dsp, as discussed on llvm-dev
llvm-svn: 251125
2015-10-23 17:19:19 +00:00
Oleg Ranevskyy 6389dd9fa2 [ARM CodeGen] @llvm.debugtrap call may be removed when restoring callee saved registers
Summary:
When ARMFrameLowering::emitPopInst generates a "pop" instruction to restore the callee saved registers, it checks if the LR register is among them. If so, the function may decide to remove the basic block's terminator and replace it with a "pop" to the PC register instead of LR.

This leads to a problem when the block's terminator is preceded by a "llvm.debugtrap" call. The MI iterator points to the trap in such a case, which is also a terminator. If the function decides to restore LR to PC, it erroneously removes the trap.

Reviewers: asl, rengolin

Subscribers: aemerson, jfb, rengolin, dschuff, llvm-commits

Differential Revision: http://reviews.llvm.org/D13672

llvm-svn: 251123
2015-10-23 17:17:59 +00:00
Joseph Tremoulet 3d0fbf1d74 [CodeGen] Mark setjmp/catchret MBBs address-taken
Summary:
This ensures that BranchFolding (and similar) won't remove these blocks.

Also allow AsmPrinter::EmitBasicBlockStart to process MBBs which are
address-taken but do not have BBs that are address-taken, since otherwise
its call to getAddrLabelSymbolTableToEmit would fail an assertion on such
blocks.  I audited the other callers of getAddrLabelSymbolTableToEmit
(and getAddrLabelSymbol); they all have BBs known to be address-taken
except for the call through getAddrLabelSymbol from
WinException::create32bitRef; that call is actually now unreachable, so
I've removed it and updated the signature of create32bitRef.

This fixes PR25168.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: pgavlin, llvm-commits

Differential Revision: http://reviews.llvm.org/D13774

llvm-svn: 251113
2015-10-23 15:06:05 +00:00
James Molloy 05a896a8d1 [BasicAA] Bugfix for r251016
If the loaded type sizes don't match the element type of the sequential type, all bets are off and the addresses may, indeed, overlap.

Surprisingly, this just got caught in one test, on one builder, out of the 30+ builders testing this change. Congratulations go to http://lab.llvm.org:8011/builders/clang-aarch64-lnt/builds/5205.

llvm-svn: 251112
2015-10-23 14:17:03 +00:00
James Molloy 5b18b4ce96 Revert "[AArch64]Merge halfword loads into a 32-bit load"
This reverts commit r250719. This introduced a codegen fault in SPEC2000.gcc, when compiled for Cortex-A53.

llvm-svn: 251108
2015-10-23 10:41:38 +00:00
Zlatko Buljan 2cf61020b8 [mips][microMIPS] Implement SHLL.PH, SHLL_S.PH, SHLL.QB, SHLLV.PH, SHLLV_S.PH, SHLLV.QB, SHLLV_S.W, SHLL_S.W, SHRA.QB and SHRA_R.QB instructions
Differential Revision: http://reviews.llvm.org/D13929

llvm-svn: 251098
2015-10-23 06:39:29 +00:00
Dylan McKay 57cee79f7c [AVR] Add ELF constants to headers
Also adds a 'trivial' ELF file. This was generated by assembling
and linking a file with the symbol main which contains a single
return instruction.

llvm-svn: 251096
2015-10-23 06:05:55 +00:00
Chen Li c6e28782d8 [SimplifyCFG] Extend SimplifyResume to handle phi of trivial landing pad.
Summary: Currently SimplifyResume can convert an invoke instruction to a call instruction if its landing pad is trivial. In practice we could have several invoke instructions with trivial landing pads and share a common rethrow block, and in the common rethrow block, all the landing pads join to a phi node. The patch extends SimplifyResume to check the phi of landing pad and their incoming blocks. If any of them is trivial, remove it from the phi node and convert the invoke instruction to a call instruction.  

Reviewers: hfinkel, reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13718

llvm-svn: 251061
2015-10-22 20:48:38 +00:00
Sanjoy Das 1740d72867 [SCEV] Remove a test case added in r249168
The test case wasn't testing what it was commented to be testing; and
when I tried to fix the test I noticed that SCEV does not support the
simplification that the test was supposed to test.

This change removes the test case to avoid confusion.

llvm-svn: 251053
2015-10-22 19:57:41 +00:00
Sanjoy Das eeca9f6fd4 [SCEV] Commute zero extends through <nuw> additions
llvm-svn: 251052
2015-10-22 19:57:38 +00:00
Sanjoy Das 6e78b17b43 [SCEV] Opportunistically interpret unsigned constraints as signed
Summary:
An unsigned comparision is equivalent to is corresponding signed version
if both the operands being compared are positive.  Teach SCEV to use
this fact when profitable.

Reviewers: atrick, hfinkel, reames, nlewycky

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13687

llvm-svn: 251051
2015-10-22 19:57:34 +00:00
Sanjoy Das 1123148d40 [SCEV] Teach SCEV some axioms about non-wrapping arithmetic
Summary:
 - A s<  (A + C)<nsw> if C >  0
 - A s<= (A + C)<nsw> if C >= 0
 - (A + C)<nsw> s<  A if C <  0
 - (A + C)<nsw> s<= A if C <= 0

Right now `C` needs to be a constant, but we can later generalize it to
be a non-constant if needed.

Reviewers: atrick, hfinkel, reames, nlewycky

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13686

llvm-svn: 251050
2015-10-22 19:57:29 +00:00
Sanjoy Das a060e602fd [SCEV] Commute sign extends through nsw additions
Summary: Depends on D13613.

Reviewers: atrick, hfinkel, reames, nlewycky

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13685

llvm-svn: 251049
2015-10-22 19:57:25 +00:00
Sanjoy Das 8f27415c05 [SCEV] Mark AddExprs as nsw or nuw if legal
Summary:
This uses `ScalarEvolution::getRange` and not potentially control
dependent `nsw` and `nuw` bits on the arithmetic instruction.

Reviewers: atrick, hfinkel, nlewycky

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D13613

llvm-svn: 251048
2015-10-22 19:57:19 +00:00
Alexey Samsonov 8daaf8b09b [ASan] Minor fixes to dynamic allocas handling:
* Don't instrument promotable dynamic allocas:
  We already have a test that checks that promotable dynamic allocas are
  ignored, as well as static promotable allocas. Make sure this test will
  still pass if/when we enable dynamic alloca instrumentation by default.

* Handle lifetime intrinsics before handling dynamic allocas:
  lifetime intrinsics may refer to dynamic allocas, so we need to emit
  instrumentation before these dynamic allocas would be replaced.

Differential Revision: http://reviews.llvm.org/D12704

llvm-svn: 251045
2015-10-22 19:51:59 +00:00
Zia Ansari 8f509a7044 [X86] - Catch extra combine opportunities for redundant imuls.
When we fold "mul ((add x, c1), c1)" -> "add ((mul x, c2), c1*c2)", we bail if (add x, c1) has multiple
users which would result in an extra add instruction.
In such cases, this patch adds a check to see if we can eliminate a multiply instruction in exchange for the extra add.

I also added the capability of doing the existing optimization with non-splatted vectors (splatted also works).

Differential Revision: http://reviews.llvm.org/D13740

llvm-svn: 251028
2015-10-22 16:14:45 +00:00
Bill Schmidt de1dc9c98f [PPC] Fix PR24686 by failing assembly for an invalid relocation
PR24686 identifies a problem where a relocation expression is invalid
when not all of the symbols in the expression can be locally
resolved.  This causes the compiler to request a PC-relative half16ds
relocation, which is nonsensical for PowerPC.  This patch recognizes
this situation and ensures we fail the assembly cleanly.

Test case provided by Anton Blanchard.

llvm-svn: 251027
2015-10-22 15:53:44 +00:00
Pirama Arumuga Nainar 5884a1f5a6 Fix incorrect target triple in fp16-promote.ll
Summary:
Hyphens were missing from the triple, causing it to be parsed
incorrectly.  This patch updates the triple and makes necessary
changes to the expected output.

Patch is from Vinicius Tinti.

Reviewers: ab, tinti

Subscribers: srhines, llvm-commits

Differential Revision: http://reviews.llvm.org/D13792

llvm-svn: 251020
2015-10-22 14:15:00 +00:00
Daniel Sanders 9e19817870 [mips][mips16] Fix typo in FileCheck directive.
llvm-svn: 251019
2015-10-22 14:01:52 +00:00
Asaf Badouh 7c52245660 [X86][AVX512] extend vcvtph2ps to support xmm/ymm and sae versions
Differential Revision: http://reviews.llvm.org/D13945

llvm-svn: 251018
2015-10-22 14:01:16 +00:00
James Molloy 5b2a732fac [GlobalsAA] Loosen an overly conservative bailout
Instead of bailing out when we see loads, analyze them. If we can prove that the loaded-from address must escape, then we can conclude that a load from that address must escape too and therefore cannot alias a non-addr-taken global.

When checking if a Value can alias a non-addr-taken global, if the Value is a LoadInst of a non-global, recurse instead of bailing.

If we can follow a trail of loads up to some base that is captured, we know by inference that all the loads we followed are also captured.

llvm-svn: 251017
2015-10-22 13:44:26 +00:00
James Molloy 5a4d8cd519 [BasicAA] Non-equal indices in a GEP of a SequentialType don't overlap
If the final indices of two GEPs can be proven to not be equal, and
the GEP is of a SequentialType (not a StructType), then the two GEPs
do not alias.

llvm-svn: 251016
2015-10-22 13:28:18 +00:00
James Molloy 1d88d6f289 [ValueTracking] Add a new predicate: isKnownNonEqual()
isKnownNonEqual(A, B) returns true if it can be determined that A != B.

At the moment it only knows two facts, that a non-wrapping add of nonzero to a value cannot be that value:

A + B != A [where B != 0, addition is nsw or nuw]

and that contradictory known bits imply two values are not equal.

This patch also hooks this up to InstSimplify; InstSimplify had a peephole for the first fact but not the second so this teaches InstSimplify a new trick too (alas no measured performance impact!)

llvm-svn: 251012
2015-10-22 13:18:42 +00:00
Elena Demikhovsky 5c97dfdc9c AVX-512: Fixed a bug in select_cc for i1 type
Fixed faiure:
LLVM ERROR: Cannot select: t33: i1 = select_cc t25, Constant:i32<0>, t45, t42, seteq:ch

added a test

Differential Revision: http://reviews.llvm.org/D13943

llvm-svn: 250996
2015-10-22 07:10:29 +00:00
Sanjoy Das 769e7e2f71 [OperandBundles] Teach AliasAnalysis about operand bundles
Summary:
If a `CallSite` has operand bundles, then do not peek into the called
function to get a more precise `ModRef` answer.

This is tested using `argmemonly`, `-basicaa` and `-gvn`; but the
functionality is not specific to any of these.

Depends on D13961

Reviewers: reames, chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13962

llvm-svn: 250974
2015-10-22 03:12:51 +00:00
Sanjoy Das 98a341bc0c [OperandBundles] Make function attributes conservatively correct
Summary:
This makes attribute accessors on `CallInst` and `InvokeInst` do the
(conservatively) right thing.  This essentially involves, in some
cases, *not* falling back querying the attributes on the called
`llvm::Function` when operand bundles are present.

Attributes locally present on the `CallInst` or `InvokeInst` will still
override operand bundle semantics.  The LangRef has been amended to
reflect this.  Note: this change does not do anything prevent
`-function-attrs` from inferring `CallSite` local attributes after
inspecting the called function -- that will be done as a separate
change.

I've used `-adce` and `-early-cse` to test these changes.  There is
nothing special about these passes (and they did not require any
changes) except that they seemed be the easiest way to write the tests.

This change does not add deal with `argmemonly`.  That's a later change
because alias analysis requires a related fix before `argmemonly` can be
tested.

Reviewers: reames, chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13961

llvm-svn: 250973
2015-10-22 03:12:22 +00:00
JF Bastien f2364bf129 WebAssembly: fix more syntax
br_if shouldn't start with a dot.
div and rem went from prefix u/s to suffix.

llvm-svn: 250972
2015-10-22 02:32:50 +00:00
Pete Cooper b70b956c80 Add missing load/store flags to thumb2 instructions.
These were the cause of a verifier error when building 7zip with
-verify-machineinstrs.  Running 'make check' with the verifier
triggered the same error on the test here so i've updated the test
to run the verifier on one of its runs instead of adding a new one.

While looking at this code, there was a stale comment that these
instructions were only used for disassembly.  This probably used to
be the case, but they are now used in the 'ARM load / store optimization pass' too.

This reapplies r242300 which was reverted in r242428 due to bot failures.

Ultimately those failures were spurious and completely unrelated to this commit.  I reverted this
at the time because it was thought to be at fault.

llvm-svn: 250969
2015-10-22 01:48:57 +00:00
Matt Arsenault e8c0891e42 AMDGPU: Fix verifier error in SIFoldOperands
There may be other use operands that also need their kill flags cleared.

This happens in a few tests when SIFoldOperands is moved after
PeepholeOptimizer.

PeepholeOptimizer rewrites cases that look like:
%vreg0 = ...
%vreg1 = COPY %vreg0
use %vreg1<kill>
%vreg2 = COPY %vreg0
use %vreg2<kill>

to use the earlier source to
%vreg0 = ...
use %vreg0
use %vreg0

Currently SIFoldOperands sees the copied registers, so there is
only one use. So far I haven't managed to come up with a test
that currently has multiple uses of a foldable VGPR -> VGPR copy.

llvm-svn: 250960
2015-10-21 22:37:50 +00:00
Michael Liao 446c714a76 [InstCombine] Revise the test case to match full sequene
llvm-svn: 250950
2015-10-21 21:50:58 +00:00
Keno Fischer ddad187ce7 [RuntimeDyld] Ignore ST_FILE symbols when constructing GlobalSymbolTable
Summary: ELF's STT_File symbols may overlap with regular globals in
other files, so we should ignore them here in order to avoid having
bogus entries in the symbol table that confuse us when resolving relocations.

Reviewers: lhames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13888

llvm-svn: 250942
2015-10-21 20:22:04 +00:00
Joerg Sonnenberger 7212809abc Drop assert that a call with struct return goes to a function with sret
attribute. Clang incorrectly misses it on __muldc3 and friends and the
type system doesn't include it properly either.

llvm-svn: 250938
2015-10-21 20:05:01 +00:00
Reid Kleckner cfaeb42f9c [WinEH] Add test for llvm.va.start in catchpad
It already works, but we should have a test for it.

This used to be PR23094 in the old model.

llvm-svn: 250936
2015-10-21 19:54:40 +00:00
David Majnemer dc3b67b4ca [SimplifyCFG] Don't use-after-free an SSA value
SimplifyTerminatorOnSelect didn't consider the possibility that the
condition might be related to one of PHI nodes.

This fixes PR25267.

llvm-svn: 250922
2015-10-21 18:22:24 +00:00
Craig Topper 896c267544 [X86] Add AMD mwaitx, monitorx, and clzero instructions to the assembly parser and disassembler.
llvm-svn: 250911
2015-10-21 17:26:45 +00:00
Sanjay Patel 557001d1c7 [x86] add test case that shows holes in LEA isel
llvm-svn: 250910
2015-10-21 17:24:00 +00:00
Kevin Enderby da9dd05011 Backing out commit r250906 as it broke lld.
llvm-svn: 250908
2015-10-21 17:13:20 +00:00
Kevin Enderby e3bf4fd546 This removes the eating of the error in Archive::Child::getSize() when the characters
in the size field in the archive header for the member is not a number.  To do this we
have all of the needed methods return ErrorOr to push them up until we get out of lib.
Then the tools and can handle the error in whatever way is appropriate for that tool.

So the solution is to plumb all the ErrorOr stuff through everything that touches archives.
This include its iterators as one can create an Archive object but the first or any other
Child object may fail to be created due to a bad size field in its header.

Thanks to Lang Hames on the changes making child_iterator contain an
ErrorOr<Child> instead of a Child and the needed changes to ErrorOr.h to add
operator overloading for * and -> .

We don’t want to use llvm_unreachable() as it calls abort() and is produces a “crash”
and using report_fatal_error() to move the error checking will cause the program to
stop, neither of which are really correct in library code. There are still some uses of
these that should be cleaned up in this library code for other than the size field.

Also corrected the code where the size gets us to the “at the end of the archive”
which is OK but past the end of the archive will return object_error::parse_failed now.

The test cases use archives with text files so one can see the non-digit character,
in this case a ‘%’, in the size field.

llvm-svn: 250906
2015-10-21 16:59:24 +00:00
Daniel Sanders d6cf3e05ef [mips][mips16] Re-work the inline assembly stubs to work with IAS. NFC.
Summary:
Previously, we were inserting an InlineAsm statement for each line of the
inline assembly. This works for GAS but it triggers prologue/epilogue
emission when IAS is in use. This caused:
    .set noreorder
    .cpload $25
to be emitted as:
    .set push
    .set reorder
    .set noreorder
    .set pop
    .set push
    .set reorder
    .cpload $25
    .set pop
which led to assembler errors and caused the test to fail.

The whitespace-after-comma changes included in this patch are necessary to
match the output when IAS is in use.

Reviewers: vkalintiris

Subscribers: rkotler, llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D13653

llvm-svn: 250895
2015-10-21 12:44:14 +00:00
Elena Demikhovsky 3ad76a1acd Masked Load/Store optimization for scalar code
When we have to convert the masked.load, masked.store to scalar code, we generate a chain of conditional basic blocks.
I added optimization for constant mask vector.

Differential Revision: http://reviews.llvm.org/D13855

llvm-svn: 250893
2015-10-21 11:50:54 +00:00
Daniel Sanders 0f596814e9 [mips][msa] Remove copy_u.d and move copy_u.w to MSA64.
Summary:
The forwards compatibility strategy employed by MIPS is to consider registers
to be infinitely sign-extended. Then on ISA's with a wider register, the result
of existing instructions are sign-extended to register width and zero-extended
counterparts are added. copy_u.w on MSA32 and copy_u.w on MSA64 violate this
strategy and we have therefore corrected the MSA specs to fix this.

We still keep track of sign/zero-extension during legalization but we now
match copy_s.[wd] where required.

No change required to clang since __builtin_msa_copy_u_[wd] will map to
copy_s.[wd] where appropriate for the target.

Reviewers: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13472

llvm-svn: 250887
2015-10-21 09:58:54 +00:00
Jonas Paulsson 17ad04535f Let MachineVerifier be aware of mem-to-mem instructions.
A mem-to-mem instruction (that both loads and stores), which store to an
FI, cannot pass the verifier since it thinks it is loading from the FI.

For the mem-to-mem instruction, do a looser check in visitMachineOperand()
and only check liveness at the reg-slot while analyzing a frame index operand.

Needed to make CodeGen/SystemZ/xor-01.ll pass with -verify-machineinstrs,
which now runs with this flag.

Reviewed by Evan Cheng and Quentin Colombet.

llvm-svn: 250885
2015-10-21 07:39:47 +00:00
Krzysztof Parzyszek fdb7b693a7 Tail duplication can mix incompatible registers in phi nodes
Do not tail duplicate blocks where the successor has a phi node,
and the corresponding value in that phi node uses a subregister.

http://reviews.llvm.org/D13922

llvm-svn: 250877
2015-10-21 02:40:06 +00:00
JF Bastien 1a59c6b2c9 WebAssembly: support imports
C/C++ code can declare an extern function, which will show up as an import in WebAssembly's output. It's expected that the linker will resolve these, and mark unresolved imports as call_import (I have a patch which does this in wasmate).

llvm-svn: 250875
2015-10-21 02:23:09 +00:00
Dehao Chen 100424124b Tolerate negative offset when matching sample profile.
In some cases (as illustrated in the unittest), lineno can be less than the heade_lineno because the function body are included from some other files. In this case, offset will be negative. This patch makes clang still able to match the profile to IR in this situation.

http://reviews.llvm.org/D13914

llvm-svn: 250873
2015-10-21 01:22:27 +00:00
Krzysztof Parzyszek ced9941cd4 [Hexagon] Bit-based instruction simplification
Analyze bit patterns of operands and values of instructions to perform
various simplifications, dead/redundant code elimination, etc.

llvm-svn: 250868
2015-10-20 22:57:13 +00:00
Simon Pilgrim bb17881731 [X86][SSE] Add 256-bit vector bit rotation tests.
llvm-svn: 250853
2015-10-20 20:27:23 +00:00
Jonas Paulsson 5558536a4e [SystemZ] Comment fix in test/CodeGen/SystemZ/fp-cmp-05.ll
llvm-svn: 250828
2015-10-20 15:05:54 +00:00
Artyom Skrobov 7fd67e25aa Adding support for TargetLoweringBase::LibCall
Summary:
TargetLoweringBase::Expand is defined as "Try to expand this to other ops,
otherwise use a libcall." For ISD::UDIV and ISD::SDIV, the choice between
the two possibilities was defined in a rather convoluted way:

- if DIVREM is legal, expand to DIVREM
- if DIVREM has a custom lowering, expand to DIVREM
- if DIVREM libcall is defined and a remainder from the same division is
  computed elsewhere, expand to a DIVREM libcall
- else, expand to a DIV libcall

This had the undesirable effect that if both DIV and DIVREM are implemented
as libcalls, then ISD::UDIV and ISD::SDIV are expanded to the heavier DIVREM
libcall, even when the remainder isn't used.

The new code adds a new LegalizeAction, TargetLoweringBase::LibCall, so that
backends can directly control whether they prefer an expansion or a conversion
to a libcall. This makes the generic lowering code even more generic,
allowing its reuse in a wider range of target-specific configurations.

The useful effect is that ARM backend will now generate a call
to __aeabi_{i,u}div rather than __aeabi_{i,u}divmod in cases where
it doesn't need the remainder. There's no functional change outside
the ARM backend.

Reviewers: t.p.northover, rengolin

Subscribers: t.p.northover, llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D13862

llvm-svn: 250826
2015-10-20 13:14:52 +00:00
Igor Breger 21296d230a AVX512: Implemented encoding and intrinsics for VPBROADCASTB/W/D/Q instructions.
Differential Revision: http://reviews.llvm.org/D13884

llvm-svn: 250819
2015-10-20 11:56:42 +00:00
Andrea Di Biagio 9a85b7abe0 [x86] Fix AVX maskload/store intrinsic prototypes.
The mask value type for maskload/maskstore GCC builtins is never a vector of
packed floats/doubles.

This patch fixes the following issues:
1. The mask argument for builtin_ia32_maskloadpd and builtin_ia32_maskstorepd
   should be of type llvm_v2i64_ty and not llvm_v2f64_ty.
2. The mask argument for builtin_ia32_maskloadpd256 and
   builtin_ia32_maskstorepd256 should be of type llvm_v4i64_ty and not
   llvm_v4f64_ty.
3. The mask argument for builtin_ia32_maskloadps and builtin_ia32_maskstoreps
   should be of type llvm_v4i32_ty and not llvm_v4f32_ty.
4. The mask argument for builtin_ia32_maskloadps256 and
   builtin_ia32_maskstoreps256 should be of type llvm_v8i32_ty and not
   llvm_v8f32_ty.

Differential Revision: http://reviews.llvm.org/D13776

llvm-svn: 250817
2015-10-20 11:20:13 +00:00
Matt Arsenault 8f18917a90 AMDGPU: Stop reserving v[254:255]
This wasn't doing anything useful. They weren't explicitly used
anywhere, and the RegScavenger ignores reserved registers.

This for some reason caused a random scheduling change in the test.
Getting the check lines to pass is too frustrating, and there's probably
not too much value in checking the vector case's operands N times.

llvm-svn: 250794
2015-10-20 03:59:58 +00:00
JF Bastien c8f89e86d5 WebAssembly: fix call/return syntax.
They are now typeless, unlike other operations.

llvm-svn: 250793
2015-10-20 01:26:54 +00:00
Sanjoy Das 7ad67640e9 [RS4GC] Re-purpose `normalizeForInvokeSafepoint`; NFC.
`normalizeForInvokeSafepoint` in RewriteStatepointsForGC.cpp, as it is
written today, deals with `gc.relocate` and `gc.result` uses of a
statepoint equally well.  This change documents this fact and adds a
test case.

There is no functional change here -- only documentation of existing
functionality.

llvm-svn: 250784
2015-10-20 01:06:24 +00:00
JF Bastien 3b0177c542 WebAssembly: fix syntax for br_if.
llvm-svn: 250777
2015-10-20 00:37:42 +00:00
Cong Hou 7745dbc5c4 Enhance loop rotation with existence of profile data in MachineBlockPlacement pass.
Currently, in MachineBlockPlacement pass the loop is rotated to let the best exit to be the last BB in the loop chain, to maximize the fall-through from the loop to outside. With profile data, we can determine the cost in terms of missed fall through opportunities when rotating a loop chain and select the best rotation. Basically, there are three kinds of cost to consider for each rotation:

1. The possibly missed fall through edge (if it exists) from BB out of the loop to the loop header.
2. The possibly missed fall through edges (if they exist) from the loop exits to BB out of the loop.
3. The missed fall through edge (if it exists) from the last BB to the first BB in the loop chain.

Therefore, the cost for a given rotation is the sum of costs listed above. We select the best rotation with the smallest cost. This is only for PGO mode when we have more precise edge frequencies.

Differential revision: http://reviews.llvm.org/D10717

llvm-svn: 250754
2015-10-19 23:16:40 +00:00
Michael Liao c65d386b81 [InstCombine] Optimize icmp of inc/dec at RHS
Allow LLVM to optimize the sequence like the following:

  %inc = add nsw i32 %i, 1
  %cmp = icmp slt %n, %inc

into:

  %cmp = icmp sle i32 %n, %i

The case is not handled previously due to the complexity of compuation of %n.
Hence, LLVM cannot swap operands of icmp accordingly.

llvm-svn: 250746
2015-10-19 22:08:14 +00:00
Sanjay Patel 69a50a1e17 [CGP] transform select instructions into branches and sink expensive operands
This was originally checked in at r250527, but reverted at r250570 because of PR25222.
There were at least 2 problems: 
1. The cost check was checking for an instruction with an exact cost of TCC_Expensive;
that should have been >=.
2. The cause of the clang stage 1 failures was illegally sinking 'call' instructions;
we can't sink instructions that may have side effects / are not safe to execute speculatively.

Fixed those conditions in sinkSelectOperand() and added test cases.

Original commit message:
This is a follow-up to the discussion in D12882.

Ideally, we would like SimplifyCFG to be able to form select instructions even when the operands
are expensive (as defined by the TTI cost model) because that may expose further optimizations.
However, we would then like a later pass like CodeGenPrepare to undo that transformation if the
target would likely benefit from not speculatively executing an expensive op (this patch).

Once we have this safety mechanism in place, we can adjust SimplifyCFG to restore its
select-formation behavior that changed with r248439.

Differential Revision: http://reviews.llvm.org/D13297

llvm-svn: 250743
2015-10-19 21:59:12 +00:00
Lang Hames f1381cb8d0 [RuntimeDyld][COFF] Fix some endianness issues, re-enable the regression test.
llvm-svn: 250733
2015-10-19 20:37:52 +00:00
Jun Bum Lim d3548303ec [AArch64]Merge halfword loads into a 32-bit load
Convert two halfword loads into a single 32-bit word load with bitfield extract
instructions. For example :
  ldrh w0, [x2]
  ldrh w1, [x2, #2]
becomes
  ldr w0, [x2]
  ubfx w1, w0, #16, #16
  and  w0, w0, #ffff

llvm-svn: 250719
2015-10-19 18:34:53 +00:00
Krzysztof Parzyszek db8677067c [Hexagon] Delay emission of CFI instructions
Emit the CFI instructions after all code transformation have been done.
This will avoid any interference between CFI instructions and packetization.

llvm-svn: 250714
2015-10-19 17:46:01 +00:00
Teresa Johnson bd715d4702 Fix windows bot failures from r250699 by removing "/" from expected path
in test output.

llvm-svn: 250701
2015-10-19 15:19:02 +00:00
Teresa Johnson 91a88bba0e llvm-lto support for generating combined function indexes
Summary:
This patch adds support to llvm-lto that mirrors the support added by
r249270 to the gold plugin. This enables better testing of combined
index generation for ThinLTO.

Added a new test, and this support will be used in the test in D13515.

Reviewers: joker.eph

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13847

llvm-svn: 250699
2015-10-19 14:30:44 +00:00
Asiri Rathnayake 1040a53be3 Fix mapping of @llvm.arm.ssat/usat intrinsics to ssat/usat instructions
The mapping of these two intrinsics in ARMInstrInfo.td had a small
omission which lead to their operands not being validated/transformed
before being lowered into usat and ssat instructions. This can cause
incorrect instructions to be emitted.

I've also added tests for the remaining two saturating arithmatic
intrinsics @llvm.arm.qadd and @llvm.arm.qsub as they are missing
codegen tests.

llvm-svn: 250697
2015-10-19 11:44:24 +00:00
Zlatko Buljan 5292083584 [mips][microMIPS] Implement ADDQ.PH, ADDQ_S.W, ADDQH.PH, ADDQH.W, ADDSC, ADDU.PH, ADDU_S.QB, ADDWC and ADDUH.QB instructions
Differential Revision: http://reviews.llvm.org/D13130

llvm-svn: 250685
2015-10-19 07:16:26 +00:00
Zlatko Buljan d0a7d6e4ee [mips][microMIPS] Implement ABSQ.QB, ABSQ_S.PH, ABSQ_S.W, ABSQ_S.QB, INSV, MADD, MADDU, MSUB, MSUBU, MULT and MULTU instructions
Differential Revision: http://reviews.llvm.org/D13721

llvm-svn: 250683
2015-10-19 06:34:44 +00:00
Xinliang David Li aa0592cc70 [PGO] Eliminate prof data register calls on FreeBSD platform
This is a follow up patch of r250199 after verifying the start/stop
section symbols work as spected on FreeBSD.

llvm-svn: 250679
2015-10-19 04:17:10 +00:00
Jakub Staszak f12821a43c Preserve CFG in MergedLoadStoreMotion. This fixes PR24426.
llvm-svn: 250660
2015-10-18 19:34:10 +00:00
Simon Pilgrim 4708060e94 [X86][SSE] Add vector bit rotation tests.
llvm-svn: 250656
2015-10-18 12:54:37 +00:00
Asaf Badouh 696e8e0bb7 [X86][AVX512DQ] add scalar fpclass
Differential Revision: http://reviews.llvm.org/D13769

llvm-svn: 250650
2015-10-18 11:04:38 +00:00
Igor Breger cbb9550537 AVX512: Lowering i8/i16 vector CTLZ using the dword LZCNT vector instruction
Differential Revision: http://reviews.llvm.org/D13632

llvm-svn: 250649
2015-10-18 09:56:39 +00:00
Lang Hames a32d71be4c [RuntimeDyld] Add support for absolute symbols.
llvm-svn: 250639
2015-10-18 01:41:37 +00:00
Simon Pilgrim 4773763bb0 [X86][XOP] Add VPROT rotate by immediate intrinsics tests
llvm-svn: 250618
2015-10-17 18:21:53 +00:00
Simon Pilgrim a18ae9bd70 [CostModel] Fixed AVX integer shift costs
Targets with AVX but without AVX2 were incorrectly reporting costs of 256-bit integer shifts.

llvm-svn: 250611
2015-10-17 13:23:38 +00:00
Simon Pilgrim 5b65f28fe7 [X86][FastISel] Teach how to select SSE4A nontemporal stores.
Add FastISel support for SSE4A scalar float / double non-temporal stores

Follow up to D13698

Differential Revision: http://reviews.llvm.org/D13773

llvm-svn: 250610
2015-10-17 13:04:42 +00:00
Simon Pilgrim 216b1bf5ed [InstCombine] SSE4A constant folding and conversion to shuffles.
This patch improves support for combining the SSE4A EXTRQ(I) and INSERTQ(I) intrinsics:

1 - Converts INSERTQ/EXTRQ calls to INSERTQI/EXTRQI if the 'bit index' and 'length' operands are constant
2 - Converts INSERTQI/EXTRQI calls to shufflevector if the bit index/length are both byte aligned (we can already lower shuffles to INSERTQI/EXTRQI if its useful)
3 - Constant folding support
4 - Add zeroinitializer handling

Differential Revision: http://reviews.llvm.org/D13348

llvm-svn: 250609
2015-10-17 11:40:05 +00:00
Colin LeMahieu 68d155be8e [Hexagon] Reverting test file change.
llvm-svn: 250601
2015-10-17 01:58:51 +00:00
Colin LeMahieu 7c9587136d [Hexagon] Adding skeleton of HVX extension instructions.
llvm-svn: 250600
2015-10-17 01:33:04 +00:00
JF Bastien 3428ed4f53 WebAssembly: don't omit dead vregs from locals
Summary:
This is a temporary hack until we get around to remapping the vreg
numbers to local numbers. Dead vregs cause bad numbering and make
consumers sad.

We could also just look at debug info an use named locals instead, but
vregs have to work properly anyways so there!

Reviewers: binji, sunfish

Subscribers: jfb, llvm-commits, dschuff

Differential Revision: http://reviews.llvm.org/D13839

llvm-svn: 250594
2015-10-17 00:25:38 +00:00
JF Bastien 4f43e80ece WebAssembly: fix the syntax for comparisons
Summary: It has also slightly changed.

Reviewers: binji

Subscribers: jfb, dschuff, llvm-commits, sunfish

Differential Revision: http://reviews.llvm.org/D13837

llvm-svn: 250591
2015-10-17 00:12:29 +00:00
Joseph Tremoulet 55b51e9dcc [WinEH] Fix eh.exceptionpointer intrinsic lowering
Summary:
Some shared code for handling eh.exceptionpointer and eh.exceptioncode
needs to not share the part that truncates to 32 bits, which is intended
just for exception codes.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13747

llvm-svn: 250588
2015-10-17 00:08:08 +00:00
Reid Kleckner ce10cec56a Disable a test relying on symbol demangling on non-Windows platforms
llvm-svn: 250587
2015-10-16 23:56:14 +00:00
Reid Kleckner 28e490342b [WinEH] Fix stack alignment in funclets and ParentFrameOffset calculation
Our previous value of "16 + 8 + MaxCallFrameSize" for ParentFrameOffset
is incorrect when CSRs are involved. We were supposed to have a test
case to catch this, but it wasn't very rigorous.

The main effect here is that calling _CxxThrowException inside a
catchpad doesn't immediately crash on MOVAPS when you have an odd number
of CSRs.

llvm-svn: 250583
2015-10-16 23:43:27 +00:00
Reid Kleckner 02b74368ce [llvm-symbolizer] Use the export table if no symbols are present
This lets us make guesses about symbols in third party DLLs without
debug info, like MSVCR120.dll or kernel32.dll. dbghelp does the same
thing.

llvm-svn: 250582
2015-10-16 23:43:22 +00:00
Davide Italiano 4f05f32bb7 [llvm-readobj] Teach ELFDumper about symbol versioning.
Differential Revision:	 http://reviews.llvm.org/D13824

llvm-svn: 250575
2015-10-16 23:19:01 +00:00
Benjamin Kramer b43d33bf0f Revert "This is a follow-up to the discussion in D12882."
Breaks clang selfhost, see PR25222. This reverts commits r250527 and r250528.

llvm-svn: 250570
2015-10-16 23:00:29 +00:00
Sanjay Patel bbd524496c [x86] promote 'add nsw' to a wider type to allow more combines
The motivation for this patch starts with PR20134:
https://llvm.org/bugs/show_bug.cgi?id=20134

void foo(int *a, int i) {
  a[i] = a[i+1] + a[i+2];
}

It seems better to produce this (14 bytes):

movslq	%esi, %rsi
movl	0x4(%rdi,%rsi,4), %eax
addl	0x8(%rdi,%rsi,4), %eax
movl	%eax, (%rdi,%rsi,4)

Rather than this (22 bytes):

leal	0x1(%rsi), %eax
cltq             
leal	0x2(%rsi), %ecx      
movslq	%ecx, %rcx     
movl	(%rdi,%rcx,4), %ecx
addl	(%rdi,%rax,4), %ecx
movslq	%esi, %rax       
movl	%ecx, (%rdi,%rax,4)

The most basic problem (the first test case in the patch combines constants) should also be fixed in InstCombine, 
but it gets more complicated after that because we need to consider architecture and micro-architecture. For
example, AArch64 may not see any benefit from the more general transform because the ISA solves the sexting in
hardware. Some x86 chips may not want to replace 2 ADD insts with 1 LEA, and there's an attribute for that: 
FeatureSlowLEA. But I suspect that doesn't go far enough or maybe it's not getting used when it should; I'm 
also not sure if FeatureSlowLEA should also mean "slow complex addressing mode".

I see no perf differences on test-suite with this change running on AMD Jaguar, but I see small code size
improvements when building clang and the LLVM tools with the patched compiler.

A more general solution to the sext(add nsw(x, C)) problem that works for multiple targets is available
in CodeGenPrepare, but it may take quite a bit more work to get that to fire on all of the test cases that
this patch takes care of.

Differential Revision: http://reviews.llvm.org/D13757

llvm-svn: 250560
2015-10-16 22:14:12 +00:00
Jim Grosbach 0fdd572763 MC: Don't crash after issuing a diagnostic.
Crashing is bad, m'kay? Fixing a 4 year old bug of my own creation.
Adding the testcase now which I should have added then which would have
long since caught this.

The problem is that printMessage() will display the diagnostic but not
set HadError to true, resulting in the assembler continuing on its way
and trying to create relocations for things that may not allow them or
otherwise get itself into trouble. Using the Error() helper function
here rather than calling printMessage() directly resolves this.

rdar://23133240

llvm-svn: 250557
2015-10-16 22:07:59 +00:00
Joseph Tremoulet d11a998e81 [WinEH] Fix CatchRetSuccessorColorMap accounting
Summary:
We now use the block for the catchpad itself, rather than its normal
successor, as the funclet entry.
Putting the normal successor in the map leads downstream funclet
membership computations to erroneous results.

Reviewers: majnemer, rnk

Subscribers: rnk, llvm-commits

Differential Revision: http://reviews.llvm.org/D13798

llvm-svn: 250552
2015-10-16 21:22:54 +00:00
Andrew Kaylor 09b39acc03 Fix assertion failure with fp128 to unsigned i64 conversion
Patch by Mitch Bodart

Differential Revision: http://reviews.llvm.org/D13780

llvm-svn: 250550
2015-10-16 20:39:20 +00:00
Krzysztof Parzyszek a7c5f0409c [Hexagon] Split double registers
llvm-svn: 250549
2015-10-16 20:38:54 +00:00
Krzysztof Parzyszek 5b7dd0cdf9 [Hexagon] Merge adjacent stores
llvm-svn: 250542
2015-10-16 19:43:56 +00:00
Diego Novillo b93483dbce Sample profiles - Re-arrange binary format to emit head samples only on top functions.
The number of samples collected at the head of a function only make
sense for top-level functions (i.e., those actually called as opposed to
being inlined inside another).

Head samples essentially count the time spent inside the function's
prologue.  This clearly doesn't make sense for inlined functions, so we
were always emitting 0 in those.

llvm-svn: 250539
2015-10-16 18:54:35 +00:00
JF Bastien 6126d2b883 WebAssembly: fix load/store syntax
Summary: The syntax has changed a bit recently.

Reviewers: binji

Subscribers: llvm-commits, jfb, sunfish, dschuff

Differential Revision: http://reviews.llvm.org/D13821

llvm-svn: 250535
2015-10-16 18:24:42 +00:00
Joseph Tremoulet 53e9cbd95a [WinEH] Fix endpad coloring/numbering
Summary:
When a cleanup's cleanupendpad or cleanupret targets a catchendpad, stop
trying to propagate the cleanup's parent's color to the catchendpad, since
what's needed is the cleanup's grandparent's color and the catchendpad
will get that color from the catchpad linkage already.  We already had
this exclusion for invokes, but were missing it for
cleanupendpad/cleanupret.

Also add a missing line that tags cleanupendpads' states in the
EHPadStateMap, without with lowering invokes that target cleanupendpads
which unwind to other handlers (and so don't have the -1 state) will fail.

This fixes the reduced IR repro in PR25163.


Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13797

llvm-svn: 250534
2015-10-16 18:08:16 +00:00
Sanjay Patel bb5d550c3d move test case to x86 directory because it specifies an x86 target
llvm-svn: 250528
2015-10-16 17:18:07 +00:00
Sanjay Patel 374dd8d88e This is a follow-up to the discussion in D12882.
Ideally, we would like SimplifyCFG to be able to form select instructions even when the operands
are expensive (as defined by the TTI cost model) because that may expose further optimizations. 
However, we would then like a later pass like CodeGenPrepare to undo that transformation if the
target would likely benefit from not speculatively executing an expensive op (this patch).

Once we have this safety mechanism in place, we can adjust SimplifyCFG to restore its 
select-formation behavior that changed with r248439.

Differential Revision: http://reviews.llvm.org/D13297

llvm-svn: 250527
2015-10-16 16:54:30 +00:00
Charlie Turner 434d4599d4 [AArch64] Implement vector splitting on UADDV.
Summary: Fixes PR25056.

Reviewers: mcrosier, junbuml, jmolloy

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D13466

llvm-svn: 250520
2015-10-16 15:38:25 +00:00
Zlatko Buljan 4c4f21b971 Commited two test files which are forgotten during commit of patch for http://reviews.llvm.org/D13376
llvm-svn: 250512
2015-10-16 13:03:10 +00:00
Hrvoje Varga 3c88fbd367 [mips][microMIPS] Implement LB, LBE, LBU and LBUE instructions
Differential Revision: http://reviews.llvm.org/D11633

llvm-svn: 250511
2015-10-16 12:24:58 +00:00
Craig Topper 09b6598572 [X86] Add fxsr feature flag for fxsave/fxrestore instructions.
llvm-svn: 250497
2015-10-16 06:03:09 +00:00
Sanjoy Das 58fae7cf6b [RS4GC] Dont' propagate call attrs related to patchable statepoints
The `"statepoint-id"` and `"statepoint-num-patch-bytes"` attributes are
used solely to determine properties of the `gc.statepoint` being
created.  Once the `gc.statepoint` is in place, these should be removed.

llvm-svn: 250491
2015-10-16 02:41:23 +00:00
Sanjoy Das 25ec1a3e60 [RS4GC] Use "deopt" operand bundles
Summary:
This is a step towards using operand bundles to carry deopt state till
RewriteStatepointsForGC.  The change adds a flag to
RewriteStatepointsForGC that teaches it to pick up deopt state from a
`"deopt"` operand bundle attached to the `call` or `invoke` it is
wrapping.

The command line flag added, `-rs4gc-use-deopt-bundles`, will only exist
for a short while.  Once we are able to pipe deopt bundle state through
the full optimization pipeline without problems, we will "constant fold"
`-rs4gc-use-deopt-bundles` to `true`.

Reviewers: swaroop.sridhar, reames

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D13372

llvm-svn: 250489
2015-10-16 02:41:00 +00:00
Sanjoy Das 37e87c2023 [IndVars] Have `cloneArithmeticIVUser` guess better
Summary:
`cloneArithmeticIVUser` currently trips over expression like `add %iv,
-1` when `%iv` is being zero extended -- it tries to construct the
widened use as `add %iv.zext, zext(-1)` and (correctly) fails to prove
equivalence to `zext(add %iv, -1)` (here the SCEV for `%iv` is
`{1,+,1}`).

This change teaches `IndVars` to try sign extending the non-IV operand
if that makes the newly constructed IV use equivalent to the widened
narrow IV use.

Reviewers: atrick, hfinkel, reames

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13717

llvm-svn: 250483
2015-10-16 01:00:47 +00:00
JF Bastien 1d20a5e9e8 WebAssembly: update syntax
Summary:
Follow the same syntax as for the spec repo. Both have evolved slightly
independently and need to converge again.

This, along with wasmate changes, allows me to do the following:

  echo "int add(int a, int b) { return a + b; }" > add.c
  ./out/bin/clang -O2 -S --target=wasm32-unknown-unknown add.c -o add.wack
  ./experimental/prototype-wasmate/wasmate.py add.wack > add.wast
  ./sexpr-wasm-prototype/out/sexpr-wasm add.wast -o add.wasm
  ./sexpr-wasm-prototype/third_party/v8-native-prototype/v8/v8/out/Release/d8 -e "print(WASM.instantiateModule(readbuffer('add.wasm'), {print:print}).add(42, 1337));"

As you'd expect, the d8 shell prints out the right value.

Reviewers: sunfish

Subscribers: jfb, llvm-commits, dschuff

Differential Revision: http://reviews.llvm.org/D13712

llvm-svn: 250480
2015-10-16 00:53:49 +00:00
Davide Italiano a0ec2f5f99 [llvm-readobj/ELF] Dump DT_VERDEF/DT_VERDEFNUM correctly.
llvm-svn: 250464
2015-10-15 22:04:55 +00:00
Evgeniy Stepanov 9addbc9fc1 Revert "[safestack] Fast access to the unsafe stack pointer on AArch64/Android."
Breaks the hexagon buildbot.

llvm-svn: 250461
2015-10-15 21:26:49 +00:00
Evgeniy Stepanov 142947e9f0 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.

llvm-svn: 250456
2015-10-15 20:50:16 +00:00
JF Bastien 2cdd5e4710 x86: preserve flags when folding atomic operations
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags. I fixed some of this issue in D13680 but had missed INC/DEC.

This patch adds the missing EFLAGS definition.

llvm-svn: 250438
2015-10-15 18:24:52 +00:00
Kevin B. Smith 89760f04f9 Change test to use FileCheck rather than grep.
Differential Revision: http://reviews.llvm.org/D13751

llvm-svn: 250431
2015-10-15 17:05:12 +00:00
Philip Reames a956cc7f08 Revert 250343 and 250344
Turns out this approach is buggy.  In discussion about follow on work, Sanjoy pointed out that we could be subject to circular logic problems.  

Consider:
 if (i u< L) leave()
 if ((i + 1) u< L) leave()
 print(a[i] + a[i+1]) 

If we know that L is less than UINT_MAX, we could possible prove (in a control dependent way) that i + 1 does not overflow.  This gives us:
 if (i u< L) leave()
 if ((i +nuw 1) u< L) leave()
 print(a[i] + a[i+1]) 

If we now do the transform this patch proposed, we end up with:
 if ((i +nuw 1) u< L) leave_appropriately()
 print(a[i] + a[i+1]) 

That would be a miscompile when i==-1.  The problem here is that the control dependent nuw bits got used to prove something about the first condition.  That's obviously invalid.

This won't happen today, but since I plan to enhance LVI/CVP with exactly that transform at some point in the not too distant future...

llvm-svn: 250430
2015-10-15 16:51:00 +00:00
JF Bastien 5b327712b0 x86 FP atomic codegen: don't drop globals, stack
Summary:
x86 codegen is clever about generating good code for relaxed
floating-point operations, but it was being silly when globals and
immediates were involved, forgetting where the global was and
loading/storing from/to the wrong place. The same applied to hard-coded
address immediates.

Don't let it forget about the displacement.

This fixes https://llvm.org/bugs/show_bug.cgi?id=25171

A very similar bug when doing floating-points atomics to the stack is
also fixed by this patch.

This fixes https://llvm.org/bugs/show_bug.cgi?id=25144

Reviewers: pete

Subscribers: llvm-commits, majnemer, rsmith

Differential Revision: http://reviews.llvm.org/D13749

llvm-svn: 250429
2015-10-15 16:46:29 +00:00
Manman Ren 72d44b1b09 Recommit r250345, it was reverted in r250366 to investigate a bot failure.
Our internal bot is still red after r250366.

llvm-svn: 250415
2015-10-15 14:59:40 +00:00
Daniel Sanders 6394ee598e [mips][ias] Implement ulh macro.
Summary:
This macro is needed to prevent test/CodeGen/Mips/2008-08-01-AsmInline.ll from
failing after the integrated assembler is enabled by default.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D13654

llvm-svn: 250414
2015-10-15 14:52:58 +00:00
Daniel Sanders 8008de5551 [mips][mips16] MIPS16 is not a CPU/Architecture but is an ASE.
Summary:
The -mcpu=mips16 option caused the Integrated Assembler to crash because
it couldn't figure out the architecture revision number to write to the
.MIPS.abiflags section. This CPU definition has been removed because, like
microMIPS, MIPS16 is an ASE to a base architecture.

Reviewers: vkalintiris

Subscribers: rkotler, llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D13656

llvm-svn: 250407
2015-10-15 14:34:23 +00:00
Igor Breger d7bae451de AVX512: Implemented DAG lowering for shuff62x2/shufi62x2 instructions ( shuffle packed values at 128-bit granularity )
Differential Revision: http://reviews.llvm.org/D13648

llvm-svn: 250400
2015-10-15 13:29:07 +00:00
NAKAMURA Takumi 59e569b66e [CMake] check-llvm requires llvm-pdbdump.
llvm-svn: 250399
2015-10-15 13:22:38 +00:00
Igor Breger b4bb190eed AVX512: Implemented encoding and intrinsics for vpternlogd/q.
Differential Revision: http://reviews.llvm.org/D13768

llvm-svn: 250396
2015-10-15 12:33:24 +00:00
Elena Demikhovsky ecff21b297 AVX-512: Fixed a bug in shuffle lowering 32-bit mode
AVX-512 bit shuffle fails on 32 bit since we create a vector of 64-bit constants.
I split 8x64-bit const vector to 16x32 on 32-bit mode.

Differential Revision: http://reviews.llvm.org/D13644

llvm-svn: 250390
2015-10-15 11:35:33 +00:00
Andrea Di Biagio 6a61cecef0 [x86] Merge test pr24562.ll into x86-fold-pshufb.ll. NFC.
llvm-svn: 250387
2015-10-15 09:54:25 +00:00
Zlatko Buljan 54b1eb4c73 [mips][microMIPS] Implement DPA.W.PH, DPAQ_S.W.PH, DPAQ_SA.L.W, DPAQX_S.W.PH, DPAQX_SA.W.PH, DPAU.H.QBL, DPAU.H.QBR and DPAX.W.PH instructions
Differential Revision: http://reviews.llvm.org/D13376

llvm-svn: 250382
2015-10-15 08:59:45 +00:00
Hrvoje Varga 3a3c4b8a39 [mips][microMIPS] Implement BREAK16, LI16, MOVE16, SDBBP16, SUBU16 and XOR16 instructions
Differential Revision: http://reviews.llvm.org/D11292#inline-103143

llvm-svn: 250381
2015-10-15 08:39:07 +00:00
Hrvoje Varga 3ef4dd7bc8 [mips][microMIPS] Implement LLE and SCE instructions
Differential Revision: http://reviews.llvm.org/D11630

llvm-svn: 250379
2015-10-15 08:11:50 +00:00
Hrvoje Varga a766eff5a0 [mips][microMIPS] Implement LWLE, LWRE, SWLE and SWRE instructions
Differential Revision: http://reviews.llvm.org/D11631

llvm-svn: 250377
2015-10-15 07:23:06 +00:00
Lang Hames 2ac52760b9 [RuntimeDyld] Drop the '.s' suffix off the COFF test case - the MIPS bot started
failing when the suffix was added.

I assume the lack of a '.s' suffix means that the test case just wasn't running
before, and it has never worked on MIPS. I'll investigate that tomorrow.

llvm-svn: 250376
2015-10-15 07:16:40 +00:00
Lang Hames 86a4593dd2 [RuntimeDyld] Don't try to get the contents of sections that don't have any
(e.g. bss sections).

MachO and ELF have been silently letting this pass, but COFFObjectFile contains
an assertion to catch this kind of (ab)use of the getSectionContents, and this
was causing the JIT to crash on COFF objects with BSS sections. This patch
should fix that.

llvm-svn: 250371
2015-10-15 06:41:45 +00:00
Akira Hatanaka 8ad7399f8e [MachO] Stop generating *coal* sections.
Recommit r250342: move coal-sections-powerpc.s to subdirectory for powerpc.

Some background on why we don't have to use *coal* sections anymore:
Long ago when C++ was new and "weak" had not been standardized, an attempt was
made in cctools to support C++ inlines that can be coalesced by putting them
into their own section (TEXT/textcoal_nt instead of TEXT/text).

The current macho linker supports the weak-def bit on any symbol to allow it to
be coalesced, but the compiler still puts weak-def functions/data into alternate
section names, which the linker must map back to the base section name.

This patch makes changes that are necessary to prevent the compiler from using
the "coal" sections and have it use the non-coal sections instead when the
target architecture is not powerpc:

TEXT/textcoal_nt instead use TEXT/text
TEXT/const_coal instead use TEXT/const
DATA/datacoal_nt instead use DATA/data

If the target is powerpc, we continue to use the *coal* sections since anyone
targeting powerpc is probably using an old linker that doesn't have support for
the weak-def bits.

Also, have the assembler issue a warning if it encounters a *coal* section in
the assembly file and inform the users to use the non-coal sections instead.

rdar://problem/14265330

Differential Revision: http://reviews.llvm.org/D13188

llvm-svn: 250370
2015-10-15 05:28:38 +00:00
Manman Ren f5499fd9d5 Temporarily revert r250345 to sort out bot failure.
With r250345 and r250343, we start to observe the following failure
when bootstrap clang with lto and pgo:
PHI node entries do not match predecessors!
  %.sroa.029.3.i = phi %"class.llvm::SDNode.13298"* [ null, %30953 ], [ null, %31017 ], [ null, %30998 ], [ null, %_ZN4llvm8dyn_castINS_14ConstantSDNodeENS_7SDValueEEENS_10cast_rettyIT_T0_E8ret_typeERS5_.exit.i.1804 ], [ null, %30975 ], [ null, %30991 ], [ null, %_ZNK4llvm3EVT13getScalarTypeEv.exit.i.1812 ], [ %..sroa.029.0.i, %_ZN4llvm11SmallVectorIiLj8EED1Ev.exit.i.1826 ], !dbg !451895
label %30998
label %_ZNK4llvm3EVTeqES0_.exit19.thread.i
LLVM ERROR: Broken function found, compilation aborted!

I will re-commit this if the bot does not recover.

llvm-svn: 250366
2015-10-15 04:58:24 +00:00
David Majnemer 0943419262 s/NumFiles/NumStreams/
llvm-svn: 250357
2015-10-15 01:39:00 +00:00
David Majnemer 6e08126f31 [llvm-pdbdump] Provide a mechanism to dump the raw contents of a PDB
A PDB can be thought of as a very simple file system.  It is
occasionally illuminating to see the contents of the underlying files.

Differential Revision: http://reviews.llvm.org/D13674

llvm-svn: 250356
2015-10-15 01:27:19 +00:00
Quentin Colombet 5084e44d71 [ARM] Make sure we do not dereference the end iterator when accessing debug
information.
Although the problem was always here, it would only be exposed when
shrink-wrapping is enable.

rdar://problem/23110493

llvm-svn: 250352
2015-10-15 00:41:26 +00:00
Akira Hatanaka 276332b47f Revert r250349.
Test case coal-sections-powerpc.s is still failing on some buildbots.

llvm-svn: 250351
2015-10-15 00:11:03 +00:00
Akira Hatanaka 1cea644114 [MachO] Stop generating *coal* sections.
Recommit r250342: add -arch=ppc32 to the RUN lines of powerpc tests.

Some background on why we don't have to use *coal* sections anymore:
Long ago when C++ was new and "weak" had not been standardized, an attempt was
made in cctools to support C++ inlines that can be coalesced by putting them
into their own section (TEXT/textcoal_nt instead of TEXT/text).

The current macho linker supports the weak-def bit on any symbol to allow it to
be coalesced, but the compiler still puts weak-def functions/data into alternate
section names, which the linker must map back to the base section name.

This patch makes changes that are necessary to prevent the compiler from using
the "coal" sections and have it use the non-coal sections instead when the
target architecture is not powerpc:

TEXT/textcoal_nt instead use TEXT/text
TEXT/const_coal instead use TEXT/const
DATA/datacoal_nt instead use DATA/data

If the target is powerpc, we continue to use the *coal* sections since anyone
targeting powerpc is probably using an old linker that doesn't have support for
the weak-def bits.

Also, have the assembler issue a warning if it encounters a *coal* section in
the assembly file and inform the users to use the non-coal sections instead.

rdar://problem/14265330

Differential Revision: http://reviews.llvm.org/D13188

llvm-svn: 250349
2015-10-14 23:48:10 +00:00
Akira Hatanaka d58d347e42 Revert r250342.
Investigate why coal-sections-powerpc.s is failing on some buildbots.

llvm-svn: 250346
2015-10-14 23:29:10 +00:00
Cong Hou b74d3b3b86 Update the branch weight metadata in JumpThreading pass.
Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).

This is the third attempt to submit this patch, while the first two led to failures in some FDO tests. After investigation, it is the edge weight normalization that caused those failures. In this patch the edge weight normalization is fixed so that there is no zero weight in the output and the sum of all weights can fit in 32-bit integer. Several unit tests are added.

Differential revision: http://reviews.llvm.org/D10979

llvm-svn: 250345
2015-10-14 23:14:17 +00:00
Philip Reames aae328d6eb Test case which should have been part of 250343
llvm-svn: 250344
2015-10-14 22:47:06 +00:00
Akira Hatanaka c078ae3e4f [MachO] Stop generating *coal* sections.
Some background on why we don't have to use *coal* sections anymore:
Long ago when C++ was new and "weak" had not been standardized, an attempt was
made in cctools to support C++ inlines that can be coalesced by putting them
into their own section (TEXT/textcoal_nt instead of TEXT/text).

The current macho linker supports the weak-def bit on any symbol to allow it to
be coalesced, but the compiler still puts weak-def functions/data into alternate
section names, which the linker must map back to the base section name.

This patch makes changes that are necessary to prevent the compiler from using
the "coal" sections and have it use the non-coal sections instead when the
target architecture is not powerpc:

TEXT/textcoal_nt instead use TEXT/text
TEXT/const_coal instead use TEXT/const
DATA/datacoal_nt instead use DATA/data

If the target is powerpc, we continue to use the *coal* sections since anyone
targeting powerpc is probably using an old linker that doesn't have support for
the weak-def bits.

Also, have the assembler issue a warning if it encounters a *coal* section in
the assembly file and inform the users to use the non-coal sections instead.

rdar://problem/14265330

Differential Revision: http://reviews.llvm.org/D13188

llvm-svn: 250342
2015-10-14 22:45:36 +00:00
Philip Reames ddcf6b35a2 Tighten known bits for ctpop based on zero input bits
This is a cleaned up patch from the one written by John Regehr based on the findings of the Souper superoptimizer.

The basic idea here is that input bits that are known zero reduce the maximum count that the intrinsic could return. We know that the number of bits required to represent a particular count is at most log2(N)+1.

Differential Revision: http://reviews.llvm.org/D13253

llvm-svn: 250338
2015-10-14 22:42:12 +00:00
Sanjay Patel e2b528074d add x86 codegen tests for 'add nsw' followed by 'sext'
llvm-svn: 250332
2015-10-14 21:47:03 +00:00
Bill Schmidt 048cc97fb1 [PowerPC] Fix invalid lxvdsx optimization (PR25157)
PR25157 identifies a bug where a load plus a vector shuffle is
incorrectly converted into an LXVDSX instruction.  That optimization
is only valid if the load is of a doubleword, and in the noted case,
it was not.  This corrects that problem.

Joint patch with Eric Schweitz, who provided the bugpoint-reduced test
case.

llvm-svn: 250324
2015-10-14 20:45:00 +00:00
Igor Kudrin bd716ab08c [llvm-readobj/ELF] fix: add correct test inputs
llvm-svn: 250292
2015-10-14 12:21:30 +00:00
Igor Kudrin 496fb2f040 [llvm-readobj/ELF] Print GNU Hash section
Add a new command line switch, -gnu-hash-table, to print the content of that section.

Differential Revision: http://reviews.llvm.org/D13696

llvm-svn: 250291
2015-10-14 12:11:50 +00:00
Andrea Di Biagio c47edbef4c [x86][FastISel] Teach how to select nontemporal stores.
This patch teaches x86 fast-isel how to select nontemporal stores.

On x86, we can use MOVNTI for nontemporal stores of doublewords/quadwords.
Instructions (V)MOVNTPS/PD/DQ can be used for SSE2/AVX aligned nontemporal
vector stores.

Before this patch, fast-isel always selected 'movd/movq' instead of 'movnti'
for doubleword/quadword nontemporal stores. In the case of nontemporal stores
of aligned vectors, fast-isel always selected movaps/movapd/movdqa instead of
movntps/movntpd/movntdq.

With this patch, if we use SSE2/AVX intrinsics for nontemporal stores we now
always get the expected (V)MOVNT instructions.
The lack of fast-isel support for nontemporal stores was spotted when analyzing
the -O0 codegen for nontemporal stores.

Differential Revision: http://reviews.llvm.org/D13698

llvm-svn: 250285
2015-10-14 10:03:13 +00:00
Xinliang David Li fb821f9c01 Add a instrumentation test for Linux
Make sure __llvm_profile_init is not emitted.

llvm-svn: 250274
2015-10-14 07:24:14 +00:00
Manman Ren 2c8e16d507 Revert r250204 and r250240 due to bot failure. We failed to build PGO-ed clang.
llvm-svn: 250264
2015-10-14 03:04:03 +00:00
Evgeniy Stepanov ebd3f44f93 [msan] Fix crash on multiplication by a non-integer constant.
Fixes PR25160.

llvm-svn: 250260
2015-10-14 00:21:13 +00:00
Kostya Serebryany 5cb86d5a40 [asan] Disabling speculative loads under asan. Patch by Mike Aizatsky
llvm-svn: 250259
2015-10-14 00:21:05 +00:00
Diego Novillo 760c5a8f45 Sample profiles - Add a name table to the binary encoding.
Binary encoded profiles used to encode all function names inline at
every reference.  This is clearly suboptimal in terms of space.  This
patch fixes this by adding a name table to the header of the file.

llvm-svn: 250241
2015-10-13 22:48:46 +00:00
Cong Hou 973e050639 Update MachineBranchProbabilityInfo::normalizeEdgeWeights to make sure there is no zero weight in the output, and also add a missing test for JumpThreading.
The test is for the patch in http://reviews.llvm.org/D10979 but was missing when committing that patch.

llvm-svn: 250240
2015-10-13 22:27:41 +00:00
David Majnemer eba62796cb [InlineFunction] Correctly inline TerminatePadInst
We forgot to append the terminatepad's arguments which resulted in us
treating the old terminatepad as an argument to the new terminatepad
causing us to crash immediately.  Instead, add the old terminatepad's
arguments to the new terminatepad.

This fixes PR25155.

llvm-svn: 250234
2015-10-13 22:08:17 +00:00
Kevin Enderby 1c1add44b6 Tweak to r250117 and change to use ErrorOr and drop isSizeValid for
ArchiveMemberHeader, suggestion by Rafael Espíndola.

Also The clang-x86-win2008-selfhost bot still does not like the
malformed-machos 00000031.a test, so removing it for now.  All
the other bots are fine with it however.

llvm-svn: 250222
2015-10-13 20:48:04 +00:00
Joseph Tremoulet 28c89bbb36 [WinEH] Add CoreCLR EH table emission
Summary:
Emit the handler and clause locations immediately after the standard
xdata.
Clauses are emitted in the same order and format used to communiate them
to the CLR Execution Engine.
Add a lit test to verify correct table generation on a small but
interesting example function.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: pgavlin, AndyAyers, llvm-commits

Differential Revision: http://reviews.llvm.org/D13451

llvm-svn: 250219
2015-10-13 20:18:27 +00:00
Akira Hatanaka 5a4e4f8d8a [AArch64] Check the size of the vector before accessing its elements.
This fixes an assert in AArch64AsmParser::MatchAndEmitInstruction.

rdar://problem/23081753

llvm-svn: 250207
2015-10-13 18:55:34 +00:00
Xinliang David Li 3dd8817d84 [PGO]: Eliminate calls to __llvm_profile_register_function for Linux.
On Linux, the profile runtime can use __start_SECTNAME and __stop_SECTNAME
symbols defined by the linker to locate the start and end location of
a named section (with C name). This eliminates the need for instrumented
binary to call __llvm_profile_register_function during start-up time.

llvm-svn: 250199
2015-10-13 18:39:48 +00:00
Kevin Enderby d1c66ddf00 The issue with the malformed-machos 00000031.a test is that it needed ‘-arch x86_64’
flag as it was a Mach-O universal file.

The default as to which architecture slice that is dumped without an -arch flag
depends on the host architecture and the contents of the universal file.  The
malformed archive 00000031.a file has both an x86_64 and i386 slice.  So for
for x86_64 hosts only that slice is dumped, for non-x86_64 hosts, which is many
of the bots both slices are dumped.

The test is intended to only check that the malformation of the x86_64 which
has a non-decimal characters in the size field of the archive header so it no
longer crashes.

The problem turned out that the i388 slice of the malformed archive had a
different malformation which was causing the non-x86_64 bots to get this error:

llvm-objdump -macho -disassemble -arch i386 00000031.a 
Archive : .00000031.a
00000031.a(c_start.o):
LLVM ERROR: Symbol name entry points before beginning or past end of file.

and causing the test as it was written to fail.  So by adding ‘-arch x86_64’ it
should correct the test and the malformation on the i388 slice will not be
dumped.

Also the removal of the malformed-machos mem-crup-0261.macho was not causing
the issue so that is put back in.

Sorry for the churn on these tests, Kev

llvm-svn: 250184
2015-10-13 17:06:34 +00:00
Simon Pilgrim 3c2b30f8ba [InstCombine][SSE4A] Remove broken INSERTQI range combining optimization
As discussed in D13348 - the INSERTQI range combining code is wrong in that it confuses the insertion bit index with an extraction bit index.

The remaining legal combines are very unlikely (especially once we've converted to shuffles in D13348) so I'm removing the optimization.

llvm-svn: 250160
2015-10-13 14:48:54 +00:00
James Molloy 860507f838 [GlobalsAA] Don't assume anything about functions that may be overridden
Weak linkage and friends allow a symbol to be overriden outside the
code generator's model, so GlobalsAA shouldn't assume that anything it
can compute about such a symbol is valid.

llvm-svn: 250156
2015-10-13 10:43:33 +00:00
Manman Ren 9f824dab1d Revert 250089 due to bot failure. It failed when building clang itself with PGO.
llvm-svn: 250145
2015-10-13 03:38:02 +00:00
Kevin Enderby 19e291aac0 Looks like malformed-machos 00000031.a test is just getting a different error
on some of the bots.  I’ll remove this test for now.

llvm-svn: 250141
2015-10-13 01:27:28 +00:00
Matt Arsenault e5d9515fb7 DAGCombiner: Don't stop finding better chain on 2 aliases
The comment says this was stopped because it was unlikely to be
profitable. This is not true if you want to combine vector loads
with multiple components.

For a simple case that looks like

t0 = load t0 ...
t1 = load t0 ...
t2 = load t0 ...
t3 = load t0 ...

t4 = store t0:1, t0:1
  t5 = store t4, t1:0
    t6 = store t5, t2:0
	  t7 = store t6, t3:0

We want to get all of these stores onto a chain
that is a TokenFactor of these N loads. This mostly
solves the AMDGPU merge-stores.ll regressions
with -combiner-alias-analysis for merging vector
stores of vector loads.

llvm-svn: 250138
2015-10-13 00:49:00 +00:00
JF Bastien 986ed68eed x86: preserve flags when folding atomic operations
Summary:
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags.

This patch adds the missing EFLAGS definition.

Floating point operations don't set flags, the subsequent fadd
optimization is therefore correct. The same applies for surrounding
load/store optimizations.

Reviewers: rsmith, rtrieu

Subscribers: llvm-commits, reames, morisset

Differential Revision: http://reviews.llvm.org/D13680

llvm-svn: 250135
2015-10-13 00:28:47 +00:00
Kevin Enderby 3c4927b723 Remove the correct unstable malformed-machos test mem-crup-0261.macho and
restore the malformed-machos 00000031.a test.  Hopefully this will get all the
build bots happy again.  I’ll again keep an eye on them.

llvm-svn: 250130
2015-10-13 00:05:17 +00:00
Matt Arsenault 61dc235f20 DAGCombiner: Combine extract_vector_elt from build_vector
This basic combine was surprisingly missing.
AMDGPU legalizes many operations in terms of 32-bit vector components,
so not doing this results in many extra copies and subregister extracts
that need to be cleaned up later.

InstCombine already does this for the hasOneUse case. The target hook
is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn
from a vector materialize repeated immediate instruction to a constant
vector load with more scalar copies from it.

llvm-svn: 250129
2015-10-12 23:59:50 +00:00
Simon Pilgrim aa0ec7f45c [InstCombine] Tidied up SSE4A tests.
First stage of bugfix discussed in D13348

llvm-svn: 250121
2015-10-12 23:07:06 +00:00
Kevin Enderby 0b3bfd15fe Temporarily remove the test added in r250117 while I investigate why two
of the build bots get a different error on that malformed file.

llvm-svn: 250120
2015-10-12 23:03:43 +00:00
Cong Hou bf22f5063a Assign correct edge weights to unwind destinations when lowering invoke statement.
When lowering invoke statement, all unwind destinations are directly added as successors of call site block, and the weight of those new edges are not assigned properly. Actually, default weight 16 are used for those edges. This patch calculates the proper edge weights for those edges when collecting all unwind destinations.

Differential revision: http://reviews.llvm.org/D13354

llvm-svn: 250119
2015-10-12 23:02:58 +00:00
Kevin Enderby 903955451e Fixed bugs in llvm-obdump while parsing Mach-O files from malformed archives
that caused aborts.  This was because of the characters of the ‘Size’ field in
the archive header did not contain decimal characters.

rdar://22983603

llvm-svn: 250117
2015-10-12 22:04:54 +00:00
Cong Hou 3320bcd815 Update the branch weight metadata in JumpThreading pass.
In JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB). 

Differential revision: http://reviews.llvm.org/D10979

llvm-svn: 250089
2015-10-12 19:44:08 +00:00
Reid Kleckner 4a5f35c0ae Make Win64 localescape offsets FP relative instead of SP relative
We made them SP relative back in March (r233137) because that's the
value the runtime passes to EH functions. With the new cleanuppad IR,
funclets adjust their frame argument from SP to FP, so our offsets
should now be FP-relative.

llvm-svn: 250088
2015-10-12 19:43:34 +00:00
Hemant Kulkarni 80f82fb2d4 [llvm-symbolizer] Add -print-address option
Differential Revision: http://reviews.llvm.org/D13518

llvm-svn: 250086
2015-10-12 19:26:44 +00:00
Andrea Di Biagio b0fe4eb199 [x86] Fix wrong lowering of vsetcc nodes (PR25080).
Function LowerVSETCC (in X86ISelLowering.cpp) worked under the wrong
assumption that for non-AVX512 targets, the source type and destination type
of a type-legalized setcc node were always the same type.

This assumption was unfortunately incorrect; the type legalizer is not always
able to promote the return type of a setcc to the same type as the first
operand of a setcc.

In the case of a vsetcc node, the legalizer firstly checks if the first input
operand has a legal type. If so, then it promotes the return type of the vsetcc
to that same type. Otherwise, the return type is promoted to the 'next legal
type', which, for vectors of MVT::i1 is always a 128-bit integer vector type.

Example (-mattr=+avx):

  %0 = trunc <8 x i32> %a to <8 x i23>
  %1 = icmp eq <8 x i23> %0, zeroinitializer

The initial selection dag for the code above is:

v8i1 = setcc t5, t7, seteq:ch
  t5: v8i23 = truncate t2
    t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %vreg1
    t7: v8i32 = build_vector of all zeroes.

The type legalizer would firstly check if 't5' has a legal type. If so, then it
would reuse that same type to promote the return type of the setcc node.
Unfortunately 't5' is of illegal type v8i23, and therefore it cannot be used to
promote the return type of the setcc node. Consequently, the setcc return type
is promoted to v8i16. Later on, 't5' is promoted to v8i32 thus leading to the
following dag node:
  v8i16 = setcc t32, t25, seteq:ch

  where t32 and t25 are now values of type v8i32.

Before this patch, function LowerVSETCC would have wrongly expanded the setcc
to a single X86ISD::PCMPEQ. Surprisingly, ISel was still able to match an
instruction. In our case, ISel would have matched a VPCMPEQWrr:
  t37: v8i16 = X86ISD::VPCMPEQWrr t36, t25

However, t36 and t25 are both VR256, while the result type is instead of class
VR128. This inconsistency ended up causing the insertion of COPY instructions
like this:
  %vreg7<def> = COPY %vreg3; VR128:%vreg7 VR256:%vreg3

Which is an invalid full copy (not a sub register copy).
Eventually, the backend would have hit an UNREACHABLE "Cannot emit physreg copy
instruction" in the attempt to expand the malformed pseudo COPY instructions.

This patch fixes the problem adding the missing logic in LowerVSETCC to handle
the corner case of a setcc with 128-bit return type and 256-bit operand type.

This problem was originally reported by Dimitry as PR25080. It has been latent
for a very long time. I have added the minimal reproducible from that bugzilla
as test setcc-lowering.ll.

Differential Revision: http://reviews.llvm.org/D13660

llvm-svn: 250085
2015-10-12 19:22:30 +00:00
Colin LeMahieu e901616bf6 [llvm-symbolizer] Reverting r250067
llvm-svn: 250072
2015-10-12 17:57:02 +00:00
Hemant Kulkarni c07c7eddad [llvm-symbolizer] Add -print-address option
Differential Revision  http://reviews.llvm.org/D13518

llvm-svn: 250067
2015-10-12 17:31:22 +00:00
Zoran Jovanovic 2e386d3d07 [mips][micromips] Initial support for micrmomips DSP instructions and addu.qb implementation
Differential Revision: http://reviews.llvm.org/D12798

llvm-svn: 250058
2015-10-12 16:07:25 +00:00
Oliver Stannard cca893ffac [Debug] Look through bitcasts to find argument registers
On targets where f32 is not legal, we have to look through a BITCAST SDNode to
find the register that an argument is stored in when emitting debug info, or we
will not be able to emit a DW_AT_location for it.

Differential Revision: http://reviews.llvm.org/D13005

llvm-svn: 250056
2015-10-12 15:52:36 +00:00
Jun Bum Lim 54f3ddfbe2 [AArch64]Fix bug in function names in test case
Functions in this test case need to be renamed as its names are the same
as the instructions we are comparing with.

llvm-svn: 250052
2015-10-12 15:34:52 +00:00
Daniel Sanders b1ef88c172 [mips][ias] Implement macro expansion when bcc has an immediate where a register belongs.
Summary: Fixes PR24915.

Reviewers: vkalintiris

Subscribers: emaste, seanbruno, llvm-commits

Differential Revision: http://reviews.llvm.org/D13533

llvm-svn: 250042
2015-10-12 14:24:05 +00:00
Daniel Sanders 332cef6c5f [mips] Whitespace cleanup in MIPS16 tests to reduce noise in following changes. NFC.
Mostly tabs -> spaces and double spacing.

llvm-svn: 250041
2015-10-12 14:16:52 +00:00
Daniel Sanders 2fb8564d99 [mips] Handle undef when extracting subregs from FP64 registers.
Summary:
This removes unnecessary instructions when extracting from an undefined register
and also fixes a crash for O32 when passing undef to a double argument in
held in integer registers.

Reviewers: vkalintiris

Subscribers: llvm-commits, zoran.jovanovic, petarj

Differential Revision: http://reviews.llvm.org/D13467

llvm-svn: 250039
2015-10-12 13:55:44 +00:00
Oliver Stannard 939724cd02 GlobalOpt does not treat externally_initialized globals correctly
GlobalOpt currently merges stores into the initialisers of internal,
externally_initialized globals, but should not do so as the value of the global
may change between the initialiser and any code in the module being run.

llvm-svn: 250035
2015-10-12 13:20:52 +00:00
James Molloy 55d633bd60 [LoopVectorize] Shrink integer operations into the smallest type possible
C semantics force sub-int-sized values (e.g. i8, i16) to be promoted to int
type (e.g. i32) whenever arithmetic is performed on them.

For targets with native i8 or i16 operations, usually InstCombine can shrink
the arithmetic type down again. However InstCombine refuses to create illegal
types, so for targets without i8 or i16 registers, the lengthening and
shrinking remains.

Most SIMD ISAs (e.g. NEON) however support vectors of i8 or i16 even when
their scalar equivalents do not, so during vectorization it is important to
remove these lengthens and truncates when deciding the profitability of
vectorization.

The algorithm this uses starts at truncs and icmps, trawling their use-def
chains until they terminate or instructions outside the loop are found (or
unsafe instructions like inttoptr casts are found). If the use-def chains
starting from different root instructions (truncs/icmps) meet, they are
unioned. The demanded bits of each node in the graph are ORed together to form
an overall mask of the demanded bits in the entire graph. The minimum bitwidth
that graph can be truncated to is the bitwidth minus the number of leading
zeroes in the overall mask.

The intention is that this algorithm should "first do no harm", so it will
never insert extra cast instructions. This is why the use-def graphs are
unioned, so that subgraphs with different minimum bitwidths do not need casts
inserted between them.

This algorithm works hard to reduce compile time impact. DemandedBits are only
queried if there are extends of illegal types and if a truncate to an illegal
type is seen. In the general case, this results in a simple linear scan of the
instructions in the loop.

No non-noise compile time impact was seen on a clang bootstrap build.

llvm-svn: 250032
2015-10-12 12:34:45 +00:00
Amjad Aboud 1db6d7af46 [X86] Add XSAVE intrinsic family
Add intrinsics for the
  XSAVE instructions (XSAVE/XSAVE64/XRSTOR/XRSTOR64)
  XSAVEOPT instructions (XSAVEOPT/XSAVEOPT64)
  XSAVEC instructions (XSAVEC/XSAVEC64)
  XSAVES instructions (XSAVES/XSAVES64/XRSTORS/XRSTORS64)

Differential Revision: http://reviews.llvm.org/D13012

llvm-svn: 250029
2015-10-12 11:47:46 +00:00
Andrea Di Biagio a0922ed8fe [x86] PR24562: fix incorrect folding of PSHUFB nodes with a mask where all indices have the most significant bit set.
This patch fixes a problem in function 'combineX86ShuffleChain' that causes a
chain of shuffles to be wrongly folded away when the combined shuffle mask has
only one element.

We may end up with a combined shuffle mask of one element as a result of
multiple calls to function 'canWidenShuffleElements()'.
Function canWidenShuffleElements attempts to simplify a shuffle mask by widening
the size of the elements being shuffled.
For every pair of shuffle indices, function canWidenShuffleElements checks if
indices refer to adjacent elements. If all pairs refer to "adjacent" elements
then the shuffle mask is safely widened. As a consequence of widening, we end up
with a new shuffle mask which is half the size of the original shuffle mask.

The byte shuffle (pshufb) from test pr24562.ll has a mask of all SM_SentinelZero
indices. Function canWidenShuffleElements would combine each pair of
SM_SentinelZero indices into a single SM_SentinelZero index. So, in a
logarithmic number of steps (4 in this case), the pshufb mask is simplified to
a mask with only one index which is equal to SM_SentinelZero.

Before this patch, function combineX86ShuffleChain wrongly assumed that a mask
of size one is always equivalent to an identity mask. So, the entire shuffle
chain was just folded away as the combined shuffle mask was treated as a no-op
mask.

With this patch we know check if the only element of a combined shuffle mask is
SM_SentinelZero. In case, we propagate a zero vector.

Differential Revision: http://reviews.llvm.org/D13364

llvm-svn: 250027
2015-10-12 11:25:41 +00:00
Jonas Paulsson 233b9ce8bf [SystemZ] testcase MC/SystemZ/insn-good-z13.s extended.
New instructions using floating point registers have been added, to check
that AsmParser can deal with fp regs in vector instructions.

This tests r249810.

llvm-svn: 250023
2015-10-12 10:13:57 +00:00
Tobias Grosser 374bce0c22 SCEV: Allow simple AddRec * Parameter products in delinearization
This patch also allows the -delinearize pass to delinearize expressions that do
not have an outermost SCEVAddRec expression. The SCEV::delinearize
infrastructure allowed this since r240952, but the -delinearize pass was not
updated yet.

llvm-svn: 250018
2015-10-12 08:02:00 +00:00
Craig Topper 5be914eda1 [X86] Change the immediate for IN/OUT instructions to u8imm so the assembly parser will check the size.
llvm-svn: 250012
2015-10-12 04:17:55 +00:00
Simon Pilgrim d45c88bbb5 [DAGCombiner] Improved FMA combine support for vectors
Enabled constant canonicalization for all constants.

Improved combining of constant vectors.

llvm-svn: 249993
2015-10-11 19:48:12 +00:00
Simon Pilgrim 18a048e1cd [X86] Completed SHL cost model tests
As discussed in D8690. 

llvm-svn: 249990
2015-10-11 18:33:48 +00:00
Craig Topper 87990ee4ec [X86] Remove special validation for INT immediate operand from AsmParser. Instead mark its operand type as u8imm which will cause it to fail to match. This is more consistent with other instruction behavior.
This also fixes a bug where negative immediates below -128 were not being reported as errors.

llvm-svn: 249989
2015-10-11 18:27:24 +00:00
Simon Pilgrim 3bcf5bb79e [X86] Renamed SHL cost model tests
Matches naming conventions for ASHR/LSHR cost tests

As discussed in D8690. 

llvm-svn: 249984
2015-10-11 17:34:32 +00:00
Simon Pilgrim acbf51ab60 [X86] Added LSHR cost model tests
There are several dodgy costings due to AVX1 legalizing 256-bit integer vectors that need fixing.

As discussed in D8690. 

llvm-svn: 249983
2015-10-11 17:29:26 +00:00
Simon Pilgrim 602b0e1f0b [X86] Added ASHR cost model tests
There are several dodgy costings due to AVX1 legalizing 256-bit integer vectors that need fixing.

As discussed in D8690. 

llvm-svn: 249981
2015-10-11 17:08:05 +00:00
Simon Pilgrim 1d1c56e2df [InstCombine][X86][XOP] Combine XOP integer vector comparisons to native IR
We now have lowering support for XOP PCOM/PCOMU instructions.

llvm-svn: 249977
2015-10-11 14:38:34 +00:00
Simon Pilgrim 52d47e5704 [X86][XOP] Added support for the lowering of 128-bit vector integer comparisons to XOP PCOM/PCOMU instructions.
The XOP vector integer comparisons can deal with all signed/unsigned comparison cases directly and can be easily commuted as well (D7646).

llvm-svn: 249976
2015-10-11 14:15:17 +00:00
Simon Pilgrim bdbf839a3b [X86][SSE] Vector signed/unsigned integer compare tests.
llvm-svn: 249954
2015-10-10 22:21:05 +00:00
Teresa Johnson 1493ad9c24 Fix PR25101 - Handle anonymous functions without VST entries
Summary:
The change to use the VST function entries for lazy deserialization did
not handle the case of anonymous functions without aliases. In that case
we must fall back to scanning the function blocks as there is no VST
entry.

Reviewers: dexonsmith, joker.eph, davidxl

Subscribers: tstellarAMD, llvm-commits

Differential Revision: http://reviews.llvm.org/D13596

llvm-svn: 249947
2015-10-10 14:18:36 +00:00
Jonas Paulsson 28fa48de32 [SystemZ] CodeGen/SystemZ/asm-18.ll run with -verify-machineinstrs
Relates to the fixes of r249811.

llvm-svn: 249946
2015-10-10 07:20:23 +00:00
Jonas Paulsson 63a2b6862e [SystemZ] Fixes in the backend I/R.
expandPostRAPseudo():
STX -> 2 * STD: The first STD should not have the kill flag set for the address.

SystemZElimCompare:
BRC -> BRCT conversion: Don't forget to remove the CC<use,kill> operand.

Needed to make SystemZ/asm-17.ll pass with -verify-machineinstrs, which
now runs with this flag.

Reviewed by Ulrich Weigand.

llvm-svn: 249945
2015-10-10 07:14:24 +00:00
NAKAMURA Takumi 2b0e1730a0 Suppress LLVM::tools/llvm-symbolizer/coff-dwarf.test for mingw, for now.
FIXME: Improve llvm-symbolizer, or rename the feature "system-windows".
llvm-svn: 249937
2015-10-10 02:57:02 +00:00
Kevin Enderby 78ab58077f Move llvm-objdump malformed Mach-O tests to X86 test directory.
rdar://22983603

llvm-svn: 249927
2015-10-10 01:06:20 +00:00
Kevin Enderby d90a4176ff Fix a bugs in the Mach-O disassembler when disassembling from a
malformed Mach-O file that caused a crash.  This was because of an
assert where the code was incorrectly attempting to parse relocation
entries off of the sections and the filetype was not an MH_OBJECT.

rdar://22983603

llvm-svn: 249921
2015-10-10 00:05:01 +00:00
Reid Kleckner 14e773500e [WinEH] Delete the old landingpad implementation of Windows EH
The new implementation works at least as well as the old implementation
did.

Also delete the associated preparation tests. They don't exercise
interesting corner cases of the new implementation. All the codegen
tests of the EH tables have already been ported.

llvm-svn: 249918
2015-10-09 23:34:53 +00:00
Reid Kleckner eb7cd6c889 [SEH] Update SEH codegen tests to use the new IR
Also Fix a buglet where SEH tables had ranges that spanned funclets.

The remaining tests using the old landingpad IR are preparation tests,
and will be deleted along with the old preparation.

llvm-svn: 249917
2015-10-09 23:05:54 +00:00
David Majnemer 35d27b21a1 [WinEH] Insert the catchpad return before CSR restoration
x64 catchpads use rax to inform the unwinder where control should go
next.  However, we must initialize rax before the epilogue sequence so
as to not perturb the unwinder.

llvm-svn: 249910
2015-10-09 22:18:45 +00:00
James Y Knight 692e037499 Fix assert when emitting llvm.pow.f86.
This occurred due to introducing the invalid i64 type after type
legalization had already finished, in an attempt to workaround bitcast
f64 -> v2i32 not doing constant folding.

The *right* thing is to actually fix bitcast, but that has other
complications. So, for now, just get rid of the broken workaround, and
check in a test-case showing that it doesn't crash, with TODOs for
emitting proper code.

llvm-svn: 249908
2015-10-09 21:36:19 +00:00
Reid Kleckner e1c8a7f9c7 [SEH] Fix _except_handler4 table base states
We got them right for the old IR, but not with funclets.  Port the old
test to the new IR and fix the code.

llvm-svn: 249906
2015-10-09 21:27:28 +00:00
Reid Kleckner d880dc7509 [SEH] Remember to emit the last invoke range for SEH
This wasn't very observable in execution tests, because usually there is
an invoke in the catchpad that unwinds the the catchendpad but never
actually throws.

llvm-svn: 249898
2015-10-09 20:39:39 +00:00
James Y Knight 5b8217bc05 Fix assert in X86 backend.
When running combine on an extract_vector_elt, it wants to look through
a bitcast to check if the argument to the bitcast was itself an
extract_vector_elt with particular operands.

However, it called getOperand() on the argument to the bitcast *before*
checking that the opcode was EXTRACT_VECTOR_ELT, assert-failing if there
were zero operands for the actual opcode.

Fix, and add trivial test.

llvm-svn: 249891
2015-10-09 20:10:14 +00:00
Owen Anderson 2c9978b12b Teach LoopUnswitch not to perform non-trivial unswitching on loops containing convergent operations.
Doing so could cause the post-unswitching convergent ops to be
control-dependent on the unswitch condition where they were not before.
This check could be refined to allow unswitching where the convergent
operation was already control-dependent on the unswitch condition.

llvm-svn: 249874
2015-10-09 18:40:20 +00:00
Diego Novillo a7f1e8ef83 Add inline stack streaming to binary sample profiles.
With this patch we can now read and write inline stacks in sample
profiles to the binary encoded profiles.

In a subsequent patch, I will add a string table to the binary encoding.
Right now function names are emitted as strings every time we find them.
This is too bloated and will produce large files in applications with
lots of inlining.

llvm-svn: 249861
2015-10-09 17:54:24 +00:00
Dan Gohman ee1588ce96 [WebAssembly] Rename floating-point operators to match their spec names.
llvm-svn: 249859
2015-10-09 17:50:00 +00:00
Artur Pilipenko cca800207a Add verification for align, dereferenceable, dereferenceable_or_null load metadata
Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D13428

llvm-svn: 249856
2015-10-09 17:41:29 +00:00
Reid Kleckner 848055ad16 Fix pdb.test when python is not on PATH
llvm-svn: 249847
2015-10-09 16:49:56 +00:00
Kevin Enderby af7c9d0123 Fixed two bugs in llvm-objdump’s printing of Objective-C meta data
from malformed Mach-O files that caused crashes.  The first because the
offset in a dyld bind table entry was out of range.  The second because their
was no image info section and the routine printing it did not have the
need check to see the section did not exist.

rdar://22983603

llvm-svn: 249845
2015-10-09 16:48:44 +00:00
Jun Bum Lim 0aace13d18 Improve ISel across lane float min/max reduction
In vectorized float min/max reduction code, the final "reduce" step
is sub-optimal. In AArch64, this change wll combine :

  svn0 = vector_shuffle t0, undef<2,3,u,u>
  fmin = fminnum t0,svn0
  svn1 = vector_shuffle fmin, undef<1,u,u,u>
  cc = setcc fmin, svn1, ole
  n0 = extract_vector_elt cc, #0
  n1 = extract_vector_elt fmin, #0
  n2 = extract_vector_elt fmin, #1
  result = select n0, n1,n2
into :
  result = llvm.aarch64.neon.fminnmv t0

This change extends r247575.

llvm-svn: 249834
2015-10-09 14:11:25 +00:00
Nemanja Ivanovic d389657399 Vector element extraction without stack operations on Power 8
This patch corresponds to review:
http://reviews.llvm.org/D12032

This patch builds onto the patch that provided scalar to vector conversions
without stack operations (D11471).
Included in this patch:

    - Vector element extraction for all vector types with constant element number
    - Vector element extraction for v16i8 and v8i16 with variable element number
    - Removal of some unnecessary COPY_TO_REGCLASS operations that ended up
      unnecessarily moving things around between registers

Not included in this patch (will be in upcoming patch):

    - Vector element extraction for v4i32, v4f32, v2i64 and v2f64 with
      variable element number
    - Vector element insertion for variable/constant element number

Testing is provided for all extractions. The extractions that are not
implemented yet are just placeholders.

llvm-svn: 249822
2015-10-09 11:12:18 +00:00
Andrea Di Biagio 99493df257 [MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset calls.
Pass MemCpyOpt doesn't check if a store instruction is nontemporal.
As a consequence, adjacent nontemporal stores are always merged into a
memset call.

Example:

;;;
define void @foo(<4 x float>* nocapture %p) {
entry:
  store <4 x float> zeroinitializer, <4 x float>* %p, align 16, !nontemporal !0
  %p1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
  store <4 x float> zeroinitializer, <4 x float>* %p1, align 16, !nontemporal !0
  ret void
}

!0 = !{i32 1}
;;;

In this example, the two nontemporal stores are combined to a memset of zero
which does not preserve the nontemporal hint. Later on the backend (tested on a
x86-64 corei7) expands that memset call into a sequence of two normal 16-byte
aligned vector stores.

opt -memcpyopt example.ll -S -o - | llc -mcpu=corei7 -o -

Before:
  xorps  %xmm0, %xmm0
  movaps  %xmm0, 16(%rdi)
  movaps  %xmm0, (%rdi)

With this patch, we no longer merge nontemporal stores into calls to memset.
In this example, llc correctly expands the two stores into two movntps:
  xorps  %xmm0, %xmm0
  movntps %xmm0, 16(%rdi)
  movntps  %xmm0, (%rdi)

In theory, we could extend the usage of !nontemporal metadata to memcpy/memset
calls. However a change like that would only have the effect of forcing the
backend to expand !nontemporal memsets back to sequences of store instructions.
A memset library call would not have exactly the same semantic of a builtin
!nontemporal memset call. So, SelectionDAG will have to conservatively expand
it back to a sequence of !nontemporal stores (effectively undoing the merging).

Differential Revision: http://reviews.llvm.org/D13519

llvm-svn: 249820
2015-10-09 10:53:41 +00:00
Saleem Abdulrasool 1825fac3c9 ARM: tweak WoA frame lowering
Accept r11 when targeting Windows on ARM rather than just low registers.
Because we are in a thumb-2 only mode, this may be slightly more expensive in
code size, but results in better code for the environment since it spills the
frame register, which is generally desired for fast stack walking as per the
ABI.

llvm-svn: 249804
2015-10-09 03:19:03 +00:00
Reid Kleckner ba77cd2737 Re-enable the coff-dwarf test on Windows
Apparently system-windows was only a clang lit suite feature.

llvm-svn: 249797
2015-10-09 01:18:27 +00:00
Reid Kleckner ae44e871cd Revert "Revert "Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"""
This reverts commit r249794.

Apparently my checkouts are full of unexpected surprises today.

llvm-svn: 249796
2015-10-09 01:13:17 +00:00
Reid Kleckner 37bb6810f2 Fix coff-dwarf test for non-Windows platforms that cannot demangle MS C++ names
llvm-svn: 249795
2015-10-09 01:11:40 +00:00
Reid Kleckner b510401785 Revert "Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64""
This reverts commit r249032.

TODO write commit msg

llvm-svn: 249794
2015-10-09 01:11:37 +00:00
Joseph Tremoulet 676e5cf07f [WinEH] Fix cleanup state numbering
Summary:
 - Recurse from cleanupendpads to their cleanuppads, to make sure the
   cleanuppad is visited if it has a cleanupendpad but no cleanupret.
 - Check for and avoid double-processing cleanuppads, to allow for them to
   have multiple cleanuprets (plus cleanupendpads).
 - Update Cxx state numbering to visit toplevel cleanupendpads and to
   recurse from cleanupendpads to their preds, to ensure we number any
   funclets in inlined cleanups.  SEH state numbering already did this.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13374

llvm-svn: 249792
2015-10-09 00:46:08 +00:00
Reid Kleckner ebef256269 [SEH] Fix llvm.eh.exceptioncode fast register allocation assertion
I called the wrong MachineBasicBlock::addLiveIn() overload.

llvm-svn: 249786
2015-10-09 00:15:13 +00:00
Reid Kleckner 21427ada3e Address review comments, remove error case and return 0 instead as required by tests
llvm-svn: 249785
2015-10-09 00:15:08 +00:00
Reid Kleckner e94fef7b3d [llvm-symbolizer] Make --relative-address work with DWARF contexts
Summary:
Previously the relative address flag only affected PDB debug info.  Now
both DIContext implementations always expect to be passed virtual
addresses. llvm-symbolizer is now responsible for adding ImageBase to
module offsets when --relative-offset is passed.

Reviewers: zturner

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12883

llvm-svn: 249784
2015-10-09 00:15:01 +00:00
Sanjoy Das 3c520a1272 [RS4GC] Refactoring to make a later change easier, NFCI
Summary:
These non-semantic changes will help make a later change adding
support for deopt operand bundles more streamlined.

Reviewers: reames, swaroop.sridhar

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13491

llvm-svn: 249779
2015-10-08 23:18:38 +00:00
Kevin Enderby 46e642f8c5 Fix a bug in llvm-objdump’s printing of Objective-C meta data
from malformed Mach-O files that caused a crash because of a
section header had a size that extended past the end of the file.

rdar://22983603

llvm-svn: 249768
2015-10-08 22:50:55 +00:00
Evgeniy Stepanov d12212bc8c New MSan mapping layout (llvm part).
This is an implementation of
https://github.com/google/sanitizers/issues/579

It has a number of advantages over the current mapping:
* Works for non-PIE executables.
* Does not require ASLR; as a consequence, debugging MSan programs in
  gdb no longer requires "set disable-randomization off".
* Supports linux kernels >=4.1.2.
* The code is marginally faster and smaller.

This is an ABI break. We never really promised ABI stability, but
this patch includes a courtesy escape hatch: a compile-time macro
that reverts back to the old mapping layout.

llvm-svn: 249753
2015-10-08 21:35:26 +00:00
Eric Christopher ab2241f1b8 Remove a '#' so that we can check either form for the various targets.
llvm-svn: 249734
2015-10-08 20:18:15 +00:00
Eric Christopher 11e5983658 Move the MMX subtarget feature out of the SSE set of features and into
its own variable.

This is needed so that we can explicitly turn off MMX without turning
off SSE and also so that we can diagnose feature set incompatibilities
that involve MMX without SSE.

Rationale:

// sse3
__m128d test_mm_addsub_pd(__m128d A, __m128d B) {
  return _mm_addsub_pd(A, B);
}

// mmx
void shift(__m64 a, __m64 b, int c) {
  _mm_slli_pi16(a, c);
  _mm_slli_pi32(a, c);
  _mm_slli_si64(a, c);
  _mm_srli_pi16(a, c);
  _mm_srli_pi32(a, c);
  _mm_srli_si64(a, c);
  _mm_srai_pi16(a, c);
  _mm_srai_pi32(a, c);
}

clang -msse3 -mno-mmx file.c -c

For this code we should be able to explicitly turn off MMX
without affecting the compilation of the SSE3 function and then
diagnose and error on compiling the MMX function.

This matches the existing gcc behavior and follows the spirit of
the SSE/MMX separation in llvm where we can (and do) turn off
MMX code generation except in the presence of intrinsics.

Updated a couple of tests, but primarily tested with a couple of tests
for turning on only mmx and only sse.

This is paired with a patch to clang to take advantage of this behavior.

llvm-svn: 249731
2015-10-08 20:10:06 +00:00
Diego Novillo aae1ed8e08 Re-apply r249644: Handle inline stacks in gcov-encoded sample profiles.
This fixes memory allocation problems by making the merge operation keep
the profile readers around until the merged profile has been emitted.
This is needed to prevent the inlined function names to disappear from
the function profiles. Since all the names are kept as references, once
the reader disappears, the names are also deallocated.

Additionally, XFAIL on big-endian architectures. The test case uses a
gcov file generated on a little-endian system.

llvm-svn: 249724
2015-10-08 19:40:37 +00:00
Alexei Starovoitov 87f83e6926 [bpf] Do not expand UNDEF SDNode during insn selection lowering
o Before this patch, BPF backend will expand UNDEF node
    to i64 constant 0.
  o For second pass of dag combiner, legalizer will run through
    each to-be-processed dag node.
  o If any new SDNode is generated and has an undef operand,
    dag combiner will put undef node, newly-generated constant-0 node,
    and any node which uses these nodes in the working list.
  o During this process, it is possible undef operand is
    generated again, and this will form an infinite loop
    for dag combiner pass2.
  o This patch allows UNDEF to be a legal type.

Signed-off-by: Yonghong Song <yhs@plumgrid.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
llvm-svn: 249718
2015-10-08 18:52:40 +00:00
Reid Kleckner b2244cb8f0 [WinEH] Relax assertion in the presence of stack realignment
The code is correct as is, but we should test it.

llvm-svn: 249715
2015-10-08 18:41:52 +00:00
Sanjoy Das dd70996a5c [SCEV] Pick backedge values for phi nodes correctly
Summary:
`getConstantEvolutionLoopExitValue` and `ComputeExitCountExhaustively`
assumed all phi nodes in the loop header have the same order of incoming
values.  This is not correct, and this commit changes
`getConstantEvolutionLoopExitValue` and `ComputeExitCountExhaustively`
to lookup the backedge value of a phi node using the loop's latch block.

Unfortunately, there is still some code duplication
`getConstantEvolutionLoopExitValue` and `ComputeExitCountExhaustively`.
At some point in the future we should extract out a helper class /
method that can evolve constant evolution phi nodes across iterations.

Fixes 25060.  Thanks to Mattias Eriksson for the spot-on analysis!

Depends on D13457.

Reviewers: atrick, hfinkel

Subscribers: materi, llvm-commits

Differential Revision: http://reviews.llvm.org/D13458

llvm-svn: 249712
2015-10-08 18:28:36 +00:00
Ulrich Weigand f4d14f781f [SystemZ] Fix another assertion failure in tryBuildVectorShuffle
This fixes yet another scenario where tryBuildVectorShuffle would
attempt to create a BUILD_VECTOR node with an invalid combination
of types.  This can happen if the incoming BUILD_VECTOR has elements
of a type different from the vector element type, which is allowed
in certain cases as long as they are all the same type.

When one of these elements is used in the residual vector, and
UNDEF elements are added to fill up the residual vector, those
UNDEFs then have to use the type of the original element, not
the vector element type, or else the resulting BUILD_VECTOR
will have an invalid type combination.

llvm-svn: 249706
2015-10-08 17:46:59 +00:00
Sanjay Patel f61a08fbf1 [InstCombine] transform masking off of an FP sign bit into a fabs() intrinsic call (PR24886)
This is a partial fix for PR24886:
https://llvm.org/bugs/show_bug.cgi?id=24886

Without this IR transform, the backend (x86 at least) was producing inefficient code.

This patch is making 2 assumptions:

    1. The canonical form of a fabs() operation is, in fact, the LLVM fabs() intrinsic.
    2. The high bit of an FP value is always the sign bit; as noted in the bug report, this isn't specified by the LangRef.

Differential Revision: http://reviews.llvm.org/D13076

llvm-svn: 249702
2015-10-08 17:09:31 +00:00
Sanjay Patel 9115cf8c9d [ValueTracking] teach computeKnownBits that a fabs() clears sign bits
This was requested in D13076: if we're going to canonicalize to fabs(), ValueTracking
should know that fabs() clears sign bits.

In this patch (as in D13076), we're not handling vectors yet even though computeKnownBits'
fabs() case itself should be vector-ready via the splat in this patch. 
Fixing this will require follow-on patches to correct other logic that uses 'getScalarType'.

Differential Revision: http://reviews.llvm.org/D13222

llvm-svn: 249701
2015-10-08 16:56:55 +00:00
Kevin Enderby aac7538216 Fix a bug in llvm-objdump’s printing of Objective-C meta data
from malformed Mach-O files that caused a crash because of loops
in the class meta data.

llvm-svn: 249700
2015-10-08 16:56:35 +00:00
Teresa Johnson b1cfcd4a53 Support for llvm-bcanalyzer dumping of record array strings.
Summary:
Adds support for automatically detecting and printing strings
represented by Array abbrev operands, analogous to the string dumping
performed for Blob abbrev operands.

Enhanced the ThinLTO combined index test to check for the appropriate
module and function strings.

Reviewers: dexonsmith, joker.eph, davidxl

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13553

llvm-svn: 249695
2015-10-08 15:56:24 +00:00
Frederic Riss 263b772bda [X86] Disable X86CallFrameOptimization on Darwin in presence of EH
We emit 1 compact unwind encoding per function, and this can’t represent
the varying stack pointer that will be generated by X86CallFrameOptimization.
Disable the optimization on Darwin.

(It might be possible to split the function into multiple ranges
and emit 1 compact unwind info per range. The compact unwind emission
code isn’t ready for that and this kind of info certainly isn’t
tested/used anywhere. It might be worth exploring this path if we want
to get the space savings at some point though)

llvm-svn: 249694
2015-10-08 15:45:08 +00:00
Igor Breger defab3c1ef AVX512: vpextrb/w/d/q and vpinsrb/w/d/q implementation.
This instructions doesn't have intrincis.
Added tests for lowering and encoding.

Differential Revision: http://reviews.llvm.org/D12317

llvm-svn: 249688
2015-10-08 12:55:01 +00:00
James Molloy e9d50dc9f7 Compute demanded bits for icmp instructions
Instead of bailing out when we see an icmp, we can instead at least
say that if the upper bits of both operands are known zero, they are
not demanded. This doesn't help with signed comparisons, but it's at
least better than bailing out.

llvm-svn: 249687
2015-10-08 12:40:06 +00:00
James Molloy bcd7f0ac98 Treat Mul just like Add and Subtract
Like adds and subtracts, muls ripple only to the left so we can use
the same logic.

While we're here, add a print method to DemandedBits so it can be used
with -analyze, which we'll use in the testcase.

llvm-svn: 249686
2015-10-08 12:39:59 +00:00
Michael Kuperstein 04e79329d0 [X86] Fix wrong treatment of multi-lane blends in BUILD_VECTORtoBlendMask()
This fixes two separate bugs:
1) The mask for the high lane was not set correctly. That fixes PR24532.
2) The transformation should bail out if it believes it involves more than
2 lanes, as it does not currently do anything sensible in this case.

Differential Revision: http://reviews.llvm.org/D13505

llvm-svn: 249669
2015-10-08 08:13:02 +00:00
Michael Kuperstein 2b3c16ca17 Do not assert on first non-prologue instruction being a CFI directive.
llvm-svn: 249668
2015-10-08 07:48:49 +00:00
Jonas Paulsson 5d3fbd3733 [SystemZ] SystemZElimCompare pass improved.
Compare elimination extended to recognize load-and-test instructions used
for comparison and eliminate them the same way as with compare instructions.

Test case fp-cmp-05.ll updated to expect optimized results now also for z13.

The order of instruction shortening and compare elimination passes have been
changed so that opcodes do not have to be handled in both passes.

Reviewed by Ulrich Weigand.

llvm-svn: 249666
2015-10-08 07:40:23 +00:00
Jonas Paulsson 7c5ce10a07 [SystemZ] Use load-and-test for fp compare with 0 if vector support is present.
Since the LTxBRCompare instructions can't be used with vector registers, a
normal load-and-test instruction (with a modelled def operand) is used instead.

Reviewed by Ulrich Weigand.

llvm-svn: 249664
2015-10-08 07:40:16 +00:00
Diego Novillo a082040ded Revert "Handle inline stacks in gcov-encoded sample profiles."
This reverts commit r249644.

The buildbots are failing the new test I added. Investigating.

llvm-svn: 249648
2015-10-08 01:17:26 +00:00
Diego Novillo b7fca57493 Handle inline stacks in gcov-encoded sample profiles.
This patch adds support for reading sample profiles with inline stacks.
Inline stacks in a profile are generated when the sampled binary has
samples in inlined functions.

For instance, if main() calls foo() and foo() calls bar(), and bar() is
inlined into foo() and foo() inlined into main(), the profile may look
something like:

main total:364084 head:0
  [ ... ]
  2.3: _Z3fool total:243786
    1: 60149
    1.2: 38568
    1.4: 46511
    1.7: _Z3bari total:98558
      1.1: 52672
      1.2: 45886

At line 2, discriminator 3, main() calls foo(). In turn, foo() calls
bar() at line 1, discriminator 7.

In the textual format, this stacking of inline calls is represented
with indentation.

With this change, LLVM can now read sample profile files generated by
the create_gcov tool from https://github.com/google/autofdo.

llvm-svn: 249644
2015-10-08 00:39:11 +00:00
Reid Kleckner 94fe836afa [WinEH] Add missing test case for llvm.eh.exceptioncode
llvm-svn: 249638
2015-10-07 23:55:06 +00:00
Reid Kleckner 97797419e6 [WinEH] Fix 32-bit funclet epilogues in the presence of dynamic allocas
In particular, passing non-trivially copyable objects by value on win32
uses a dynamic alloca (inalloca). We would clobber ESP in the epilogue
and end up returning to outer space.

llvm-svn: 249637
2015-10-07 23:55:01 +00:00
David Majnemer 6af5f82c20 [WinEH] Refer to filter funclets using their symbol-table symbol
The relocation for the filter funclet will be against a symbol table
entry for a function instead of the section, making it easier to
understand what is going on.

llvm-svn: 249621
2015-10-07 21:34:00 +00:00
Reid Kleckner 70bf6bb5e6 [WinEH] Undo the effect of r249578 for 32-bit
The __CxxFrameHandler3 tables for 32-bit are supposed to hold stack
offsets relative to EBP, not ESP. I blindly updated the win-catchpad.ll
test case, and immediately noticed that 32-bit catching stopped working.

While I'm at it, move the frame index to frame offset WinEH table logic
out of PEI.  PEI shouldn't have to know about WinEHFuncInfo. I realized
we can calculate frame index offsets just fine from the table printer.

llvm-svn: 249618
2015-10-07 21:13:15 +00:00
David Majnemer c289c9ff55 [WinEH] Remove unreachable blocks before preparation
We remove unreachable blocks because it is pointless to consider them
for coloring.  However, we still had stale pointers to these blocks in
some data structures after we removed them from the function.

Instead, remove the unreachable blocks before attempting to do anything
with the function.

This fixes PR25099.

llvm-svn: 249617
2015-10-07 21:08:25 +00:00
Joseph Tremoulet 39234fc67e [WinEH] Set NoModuleLevelChanges in clone flags
Summary:
This is necessary to keep the cloner from making bogus copies of debug
metadata attached to the IR it is cloning.
Also, avoid running RemapInstruction over all instructions in the common
case that no cloning was performed.

Reviewers: rnk, andrew.w.kaylor, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13514

llvm-svn: 249591
2015-10-07 19:29:56 +00:00
Kevin B. Smith 99e8c0fffb [X86]Update test to use FileCheck.
Updates this test to use FileCheck and a single llc invocation rather than
3 llc invocations and grep.

llvm-svn: 249583
2015-10-07 18:21:41 +00:00
Mehdi Amini 044cb34bdc Revert "Revert "This patch builds on top of D13378 to handle constant condition.""
This reverts commit r249528 and reapply r249431. The fix for the
fallout has been commited in r249575.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 249581
2015-10-07 18:14:25 +00:00
Chad Rosier 7c6ac2b8f9 [AArch64] Fold a floating-point divide by power of two into fp conversion.
Part of http://reviews.llvm.org/D13442

llvm-svn: 249579
2015-10-07 17:51:37 +00:00
Reid Kleckner 33bd2d99d8 [WinEH] Fix two minor issues in __CxxFrameHandler3 tables
There was an off-by-one bug in ip2state tables which manifested when one
call immediately preceded the try-range of the next. The return address
of the previous call would appear to be within the try range of the next
scope, resulting in extra destructors or catches running.

We also computed the wrong offset for catch parameter stack objects. The
offset should be from RSP, not from RBP.

llvm-svn: 249578
2015-10-07 17:49:32 +00:00
Chad Rosier fa30c9b436 [AArch64] Fold a floating-point multiply by power of two into fp conversion.
Part of http://reviews.llvm.org/D13442

llvm-svn: 249576
2015-10-07 17:39:18 +00:00
Sanjoy Das 0015e5a088 [IndVars] Preserve LCSSA in `eliminateIdentitySCEV`
Summary:
After r249211, SCEV can see through some LCSSA phis.  Add a
`replacementPreservesLCSSAForm` check before replacing uses of these phi
nodes with a simplified use of the induction variable to avoid breaking
LCSSA.

Fixes 25047.

Depends on D13460.

Reviewers: atrick, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13461

llvm-svn: 249575
2015-10-07 17:38:31 +00:00
Chad Rosier 169865ffda [ARM] Promote helper function to SelectionDAG.
I'll be using the function in a similar combine for AArch64.  The helper was
also improved to handle undef values.

Part of http://reviews.llvm.org/D13442

llvm-svn: 249572
2015-10-07 17:28:58 +00:00
Oliver Stannard d3d114ba54 [ARM] Use correct half-precision functions in EABI mode
The ARM RTABI defines the half- to single-precision float conversion functions
with an __aeabi prefix, but libgcc only has them with a __gnu prefix. Therefore
we need to emit the __aeabi version when compiling with an eabi or eabihf
triple, and the __gnu version with a gnueabi or gnueabihf triple.

llvm-svn: 249565
2015-10-07 16:58:49 +00:00
David Blaikie 30f07f9326 Move test back to Generic now it's fixed the right way (thanks Eric!)
I knee-jerk tried to fix this in completely the wrong way - it's not an
CPU limitation, but an OS/object file type one, so moving it
into a CPU-specific classification didn't help at all.

llvm-svn: 249562
2015-10-07 16:26:28 +00:00
Chad Rosier 17436bf64e [ARM] Prevent PerformVDIVCombine from combining a vcvt/vdiv with 8 lanes.
This would result in a crash since the vcvt used does not support v8i32 types.

llvm-svn: 249560
2015-10-07 16:15:40 +00:00
Artur Pilipenko d94903c9f8 Teach computeKnownBits to use new align attribute/metadata
Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D13470

llvm-svn: 249557
2015-10-07 16:01:18 +00:00
Jeroen Ketema aebca09543 [ARM][AArch64] Only lower to interleaved load/store if the target has NEON
Without an additional check for NEON, the compiler crashes during
legalization of NEON ldN/stN.

Differential Revision: http://reviews.llvm.org/D13508

llvm-svn: 249550
2015-10-07 14:53:29 +00:00
James Molloy 47efaeb36e Revert "This patch builds on top of D13378 to handle constant condition."
This reverts commit r249431. This caused failures in sqlite3: http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/14453

llvm-svn: 249528
2015-10-07 09:03:34 +00:00
Arnaud A. de Grandmaison a6178a179d [EarlyCSE] Fix handling of target memory intrinsics for CSE'ing loads.
Summary:
Some target intrinsics can access multiple elements, using the pointer as a
base address (e.g. AArch64 ld4). When trying to CSE such instructions,
it must be checked the available value comes from a compatible instruction
because the pointer is not enough to discriminate whether the value is
correct.

Reviewers: ssijaric

Subscribers: mcrosier, llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D13475

llvm-svn: 249523
2015-10-07 07:41:29 +00:00
Michael Kuperstein 259f1508f0 [X86] Emit .cfi_escape GNU_ARGS_SIZE when adjusting the stack before calls
When outgoing function arguments are passed using push instructions, and EH
is enabled, we may need to indicate to the stack unwinder that the stack
pointer was adjusted before the call.

This should fix the exception handling issues in PR24792.

Differential Revision: http://reviews.llvm.org/D13132

llvm-svn: 249522
2015-10-07 07:01:31 +00:00
Igor Breger 1a6fd1cc0f AVX512: Change encoding of vpshuflw and vpshufhw instructions. Implement WIG as W0 and not W1, like all other instruction have been implemented.
Add encoding tests.

Differential Revision: http://reviews.llvm.org/D13471

llvm-svn: 249521
2015-10-07 06:31:18 +00:00
Eric Christopher 97b9189e5e Remove the comdat-ness from the testcase as it won't lower properly
on darwin with it since darwin doesn't have comdat and it isn't
necessary for the testcase.

llvm-svn: 249504
2015-10-07 01:52:33 +00:00
Eric Christopher ab2802c58f Update test to use FileCheck and clean up run lines to match the
expected behavior.

llvm-svn: 249498
2015-10-07 01:21:49 +00:00
Matt Arsenault 284192730a AMDGPU: Use explicit register size indirect pseudos
This stops using an unknown reg class operand.

Currently build_vector selection has a broken looking check
where it tries to use a VGPR reg class and an SGPR one if it
sees an SGPR use.

With the source operand has an explicit VGPR class,
illegal copies will be inserted that SIFixSGPRCopies will take care
of normally later, which will allow removing the weird check
of build_vector users. Without this, when removed v_movrels_b32 would
still be emitted even though all of the values were only stored in
SGPRs.

llvm-svn: 249494
2015-10-07 00:42:51 +00:00
Reid Kleckner 72ba70418f [SEH] Add llvm.eh.exceptioncode intrinsic
This will support the Clang __exception_code intrinsic.

llvm-svn: 249492
2015-10-07 00:27:33 +00:00
Hans Wennborg f1f36517b7 InstCombine: Fold comparisons between unguessable allocas and other pointers
This will allow us to optimize code such as:

  int f(int *p) {
    int x;
    return p == &x;
  }

as well as:

  int *allocate(void);
  int f() {
    int x;
    int *p = allocate();
    return p == &x;
  }

The folding can only be done under certain circumstances. Even though p and &x
cannot alias, the comparison must still return true if the pointer
representations are equal. If a user successfully generates a p that's a
correct guess for &x, comparison should return true even though p is an invalid
pointer.

This patch argues that if the address of the alloca isn't observable outside the
function, the function can act as-if the address is impossible to guess from the
outside. The tricky part is keeping the act consistent: if we fold p == &x to
false in one place, we must make sure to fold any other comparisons based on
those pointers similarly. To ensure that, we only fold when &x is involved
exactly once in comparison instructions.

Differential Revision: http://reviews.llvm.org/D13358

llvm-svn: 249490
2015-10-07 00:20:07 +00:00
David Blaikie 534ff2caca Move test to X86-specific due to some IR invalid on other targets
llvm-svn: 249489
2015-10-07 00:17:31 +00:00
David Blaikie c9ad9191a7 DebugInfo: Include the decl_line/decl_file in subprogram definitions if they differ from those in the declaration
This is handy for some AutoFDO stuff, and seems like a minor improvement
to correctness (otherwise a debug info consumer might think the decl
line/file of the def was the same as that of the declaration - though
what a consumer might use that for, I'm not sure - maybe "list <func>"
would've misbehaved with the old behavior?) and at a minor cost (in my
experiment, with fission, without type units, without compression, 0.01%
growth in debug info in the executable/objects, 0.02% growth in the .dwo
files).

llvm-svn: 249487
2015-10-07 00:04:16 +00:00
David Majnemer 7735a6d07a [WinEH] Create a separate MBB for funclet prologues
Our current emission strategy is to emit the funclet prologue in the
CatchPad's normal destination.  This is problematic because
intra-funclet control flow to the normal destination is not erroneous
and results in us reevaluating the prologue if said control flow is
taken.

Instead, use the CatchPad's location for the funclet prologue.  This
correctly models our desire to have unwind edges evaluate the prologue
but edges to the normal destination result in typical control flow.

Differential Revision: http://reviews.llvm.org/D13424

llvm-svn: 249483
2015-10-06 23:31:59 +00:00
Lang Hames 44780acd91 [Orc] Teach the CompileOnDemand layer to clone aliases.
This allows modules containing aliases to be lazily jit'd. Previously these
failed with missing symbol errors because the aliases weren't cloned from the
original module.

llvm-svn: 249481
2015-10-06 22:55:05 +00:00
Kevin Enderby a59824a174 Fix two bugs in llvm-objdump’s printing of Objective-C meta data
from malformed Mach-O files that caused crashes.

We recently got about 700 malformed Mach-O files which we have
been using the improve the robustness of tools that deal with reading
data from object files.  These resulted in about 20 small bug fixes to
the darwin based tools.

The goal here is to also improve the robustness of llvm-objdump and
this is the first two fixes.  In talking with Tim Northover the approach
we thought might be best is to:

1) Only include tests for the malformed Mach-O files that cause crashes
(not all 700+ tests).
2) The test should only contain the command line option that caused the
crash and not all the others that don’t matter.
3) There should be only one line for the FileCheck that is past the point
of the crash if possible and if possible indicates the malformation.

Again the goal is to fix crashes and not so much care about how the
printing of malformed data comes out.

Tim also suggested if we really wanted to add test cases for all 700+
malformed Mach-O files putting them in the regression tests might be
an option.  But many of these do not cause crashes.

llvm-svn: 249479
2015-10-06 22:27:08 +00:00
Sanjoy Das 5c8bead46d [IndVars] Don't break dominance in `eliminateIdentitySCEV`
Summary:
After r249211, `getSCEV(X) == getSCEV(Y)` does not guarantee that X and
Y are related in the dominator tree, even if X is an operand to Y (I've
included a toy example in comments, and a real example as a test case).

This commit changes `SimplifyIndVar` to require a `DominatorTree`.  I
don't think this is a problem because `ScalarEvolution` requires it
anyway.

Fixes PR25051.

Depends on D13459.

Reviewers: atrick, hfinkel

Subscribers: joker.eph, llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D13460

llvm-svn: 249471
2015-10-06 21:44:49 +00:00
Tom Stellard 0fbf899c0f AMDGPU/SI: Remove calling convention assertion from LowerFormalArguments()
Summary:
We currently ignore the calling convention, so there is no real reason to
assert on the calling convention of functions.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13367

llvm-svn: 249468
2015-10-06 21:16:34 +00:00
Philip Reames 675418ebc0 Extend known bits to understand @llvm.bswap
This is a cleaned up patch from the one written by John Regehr based on the findings of the Souper superoptimizer.

When writing tests, I was surprised to find that instsimplify apparently doesn't know how to collapse bit test sequences based purely on known bits. This required me to split my tests across both instsimplify and instcombine.

Differential Revision: http://reviews.llvm.org/D13250

llvm-svn: 249453
2015-10-06 20:20:45 +00:00
Philip Reames 600a91580f Fix pr25040 - Handle vectors of i1s in recently added implication code
As mentioned in the bug, I'd missed the presence of a getScalarType in the caller of the new implies method. As a result, when we ended up with a implication over two vectors, we'd trip an assert and crash.

Differential Revision: http://reviews.llvm.org/D13441

llvm-svn: 249442
2015-10-06 19:00:02 +00:00
Chad Rosier cb14dd0265 [ARM] Simplify tests and make checks more rigid. NFC.
llvm-svn: 249432
2015-10-06 17:54:12 +00:00
Mehdi Amini cf2513b352 This patch builds on top of D13378 to handle constant condition.
With this patch, clang -O3 optimizes correctly providing > 1000x speedup on this artificial benchmark):

for (a=0; a<n; a++)
    for (b=0; b<n; b++)
        for (c=0; c<n; c++)
            for (d=0; d<n; d++)
                for (e=0; e<n; e++)
                    for (f=0; f<n; f++)
                        x++;
From test-suite/SingleSource/Benchmarks/Shootout/nestedloop.c

Reviewers: sanjoyd

Differential Revision: http://reviews.llvm.org/D13390

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 249431
2015-10-06 17:19:20 +00:00
Tom Stellard 88e0b25181 AMDGPU/SI: Add 64-bit versions of v_nop and v_clrexcp
Summary:
The assembly printing of these is still missing the encoding size
suffix, but this will be fixed in a later commit.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13436

llvm-svn: 249424
2015-10-06 15:57:53 +00:00
Krzysztof Parzyszek fb33824efd [Hexagon] Add an early if-conversion pass
llvm-svn: 249423
2015-10-06 15:49:14 +00:00
Daniel Sanders 1b3341724c [mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend
Summary:
This fixes 7 tests during fast LLVM test-suite run:
* MultiSource/Benchmarks/McCat/18-imp/imp
* MultiSource/Applications/oggenc/oggenc
* MultiSource/Benchmarks/MallocBench/gs/gs
* MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan
* MultiSource/Benchmarks/VersaBench/beamformer/beamformer
* MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame
* MultiSource/Benchmarks/Bullet/bullet

Error message was in the form of:
fatal error: error in backend: Cannot select: 0x95c3288: f32 = fsqrt 0x95c0190 [ORD=9] [ID=18]
  0x95c0190: f32 = fadd 0x95bef30, 0x95c4d00 [ORD=8] [ID=17]
    0x95bef30: f32 = fmul 0x95c4988, 0x95c4988 [ORD=5] [ID=16]
...

There was problem with selecting sqrt instruction in LLVM backend.

To fix the issue changes are made in TableGen definition for sqrt instruction in MipsInstrFPU.td and new test file sqrt.ll is added to LLVM regression tests.

Patch by Zlatko Buljan

Reviewers: zoran.jovanovic, hvarga, dsanders

Subscribers: llvm-commits, petarj

Differential Revision: http://reviews.llvm.org/D13235

llvm-svn: 249416
2015-10-06 15:17:25 +00:00
Daniel Sanders add9057fa7 Revert r249123 - [mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend
The author was not credited and most of the commit message is missing. Will re-commit with this fixed.

llvm-svn: 249415
2015-10-06 15:13:16 +00:00
Filipe Cabecinhas b70fd8719e Make sure the CastInst is valid before trying to create it
Bug found with afl-fuzz.

llvm-svn: 249396
2015-10-06 12:37:54 +00:00
Andrea Di Biagio 40f59e4466 [InstCombine] Teach SimplifyDemandedVectorElts how to handle ConstantVector select masks with ConstantExpr elements (PR24922)
If the mask of a select instruction is a ConstantVector, method
SimplifyDemandedVectorElts iterates over the mask elements to identify which
values are selected from the select inputs.

Before this patch, method SimplifyDemandedVectorElts always used method
Constant::isNullValue() to check if a value in the mask was zero. Unfortunately
that method always returns false when called on a ConstantExpr.

This patch fixes the problem in SimplifyDemandedVectorElts by adding an explicit
check for ConstantExpr values. Now, if a value in the mask is a ConstantExpr, we
avoid calling isNullValue() on it.

Fixes PR24922.

Differential Revision: http://reviews.llvm.org/D13219

llvm-svn: 249390
2015-10-06 10:34:53 +00:00
Daniel Sanders bb65d730bf [mips][disassembler] Changed CHECK-EB directives to CHECK so div/divu are tested.
llvm-svn: 249386
2015-10-06 10:08:14 +00:00
Daniel Sanders d245267be0 [mips][disassembler] Merged disassembler tests into the corresponding ISA/ASE subdirectories.
llvm-svn: 249384
2015-10-06 10:02:35 +00:00
Daniel Sanders 31bfdb5a82 [mips][disassembler] Moved DSP tests into proper place and corrected formatting.
llvm-svn: 249383
2015-10-06 09:28:48 +00:00
Craig Topper 2c4068f409 [TwoAddressInstructionPass] When looking for a 3 addr conversion after commuting, make sure regB has been updated to take into account the commute.
llvm-svn: 249378
2015-10-06 05:39:59 +00:00
Alexei Starovoitov 4e01a38da0 [bpf] Avoid extra pointer arithmetic for stack access
For the program like below
struct key_t {
  int pid;
  char name[16];
};
extern void test1(char *);
int test() {
  struct key_t key = {};
  test1(key.name);
  return 0;
}
For key.name, the llc/bpf may generate the below code:
  R1 = R10  // R10 is the frame pointer
  R1 += -24 // framepointer adjustment
  R1 |= 4   // R1 is then used as the first parameter of test1
OR operation is not recognized by in-kernel verifier.

This patch introduces an intermediate FI_ri instruction and
generates the following code that can be properly verified:
  R1 = R10
  R1 += -20

Patch by Yonghong Song <yhs@plumgrid.com>

llvm-svn: 249371
2015-10-06 04:00:53 +00:00
Craig Topper 79dd1bf094 [X86] Teach constant hoisting that ANDs with 64-bit immediates in the range 0x80000000-0xffffffff can be handled cheaply and don't need to be hoisted.
Most importantly, this keeps constant hoisting from preventing instruction selections ability to turn an AND with 0xffffffff into a move into a 32-bit subregister.

llvm-svn: 249370
2015-10-06 02:50:24 +00:00
Dan Gohman e51c058ecc [WebAssembly] Switch to a more traditional assembly syntax
This new syntax is built around putting each instruction on its own line
in a "mnemonic op, op, op" like syntax. It also uses conventional data
section directives like ".byte" and so on rather than requiring everything
to be in hierarchical S-expression format. This is a more natural syntax
for a ".s" file format from the perspective of LLVM MC and related tools,
while remaining easy to translate into other forms as needed.

llvm-svn: 249364
2015-10-06 00:27:55 +00:00
Adrian Prantl d2793a030b dsymutil: Don't prune forward declarations inside of an imported TAG_module
if there exists not definition for the type.
For this to work, we need to clone the imported modules before building
the decl context chains of the DIEs in the non-skeleton CUs.

llvm-svn: 249362
2015-10-05 23:11:20 +00:00
Arnold Schwaighofer 0591c5d719 MergeFunctions: Clear GlobalNumbers ValueMap
Otherwise, the map will observe changes as long as MergeFunctions is alive. This
is bad because follow-up passes could replace-all-uses-with on the key of an
entry in the map. The value handle callback of ValueMap however asserts that the
key type matches.

rdar://22971893

llvm-svn: 249327
2015-10-05 17:26:36 +00:00
Scott Douglass 953f908173 [ARM] Modify codegen for memcpy intrinsic to prefer LDM/STM.
We were previously codegen'ing memcpy as regular load/store operations and
hoping that the register allocator would allocate registers in ascending order
so that we could apply an LDM/STM combine after register allocation. According
to the commit that first introduced this code (r37179), we planned to teach the
register allocator to allocate the registers in ascending order. This never got
implemented, and up to now we've been stuck with very poor codegen.

A much simpler approach for achieving better codegen is to create MEMCPY pseudo
instructions, attach scratch virtual registers to them and then, post register
allocation, expand the MEMCPYs into LDM/STM pairs using the scratch registers.
The register allocator will have picked arbitrary registers which we sort when
expanding the MEMCPY. This approach also avoids the need to repeatedly calculate
offsets which ultimately ought to be eliminated pre-RA in order to decrease
register pressure.

Fixes PR9199 and PR23768.

[This is based on Peter Collingbourne's r238473 which was reverted.]

Differential Revision: http://reviews.llvm.org/D13239

Change-Id: I727543c2e94136e0f80b8e22d5642d7b9ee5b458
Author: Peter Collingbourne <peter@pcc.me.uk>
llvm-svn: 249322
2015-10-05 14:49:54 +00:00
Zoran Jovanovic 5a8dffc618 [mips][microMIPS] Implement JALRC16, JRCADDIUSP and JRC16 instructions
Differential Revision: http://reviews.llvm.org/D11219

llvm-svn: 249317
2015-10-05 14:00:09 +00:00
Alexandros Lamprineas 1bab191f25 [MC layer][AArch64] llvm-mc accepts 4-bit immediate values for
"msr pan, #imm", while only 1-bit immediate values should be valid.
Changed encoding and decoding for msr pstate instructions.

Differential Revision: http://reviews.llvm.org/D13011

llvm-svn: 249313
2015-10-05 13:42:31 +00:00
Daniel Sanders d5a89418c5 [mips] Changed the way symbols are handled in dla and la instructions to allow simple expressions.
Summary:
An instruction like "(d)la $5, symbol+8" previously would have crashed the
assembler as it contains an expression. This is now fixed.
A few tests cases have also been changed to reflect these changes, however
these should only be syntax changes. Some new test cases have also been
added.

Patch by Scott Egerton.

Reviewers: vkalintiris, dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12760

llvm-svn: 249311
2015-10-05 13:19:29 +00:00
Alexandros Lamprineas 057f0a68cc Added missing test for [ARM] AttributeParser. Check that build attribute
Tag_Advanced_SIMD_arch is set correctly when targeting v8.1-a NEON.

Differential Revision: http://reviews.llvm.org/D13281

llvm-svn: 249304
2015-10-05 12:13:29 +00:00
Rafael Espindola e3a20f57d9 Fix pr24486.
This extends the work done in r233995 so that now getFragment (in addition to
getSection) also works for variable symbols.

With that the existing logic to decide if a-b can be computed works even if
a or b are variables. Given that, the expression evaluation can avoid expanding
variables as aggressively and that in turn lets the relocation code see the
original variable.

In order for this to work with the asm streamer, there is now a dummy fragment
per section. It is used to assign a section to a symbol when no other fragment
exists.

This patch is a joint work by Maxim Ostapenko andy myself.

llvm-svn: 249303
2015-10-05 12:07:05 +00:00
Teresa Johnson 403a787e03 Support for function summary index bitcode sections and files.
Summary:
The bitcode format is described in this document:
  https://drive.google.com/file/d/0B036uwnWM6RWdnBLakxmeDdOeXc/view
For more info on ThinLTO see:
  https://sites.google.com/site/llvmthinlto

The first customer is ThinLTO, however the data structures are designed
and named more generally based on prior feedback. There are a few
comments regarding how certain interfaces are used by ThinLTO, and the
options added here to gold currently have ThinLTO-specific names as the
behavior they provoke is currently ThinLTO-specific.

This patch includes support for generating per-module function indexes,
the combined index file via the gold plugin, and several tests
(more are included with the associated clang patch D11908).

Reviewers: dexonsmith, davidxl, joker.eph

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13107

llvm-svn: 249270
2015-10-04 14:33:43 +00:00
Simon Pilgrim bb01c6fda2 [X86][SSE4A] Added shuffle decode tests for 'special case' SSE4A EXTRQI/INSERTQI ops.
llvm-svn: 249263
2015-10-04 10:12:53 +00:00
Joerg Sonnenberger 726e624c0c [SPARCv9] Add support for the rdpr/wrpr instructions.
llvm-svn: 249262
2015-10-04 09:11:22 +00:00
Igor Breger 78741a1b1e AVX512: Implemented encoding and intrinsics for VPERMILPS/PD instructions.
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D12690

llvm-svn: 249261
2015-10-04 07:20:41 +00:00
David Majnemer 161935520d [WinEH] Permit branch folding in the face of funclets
Track which basic blocks belong to which funclets.  Permit branch
folding to fire but only if it can prove that doing so will not cause
code in one funclet to be reused in another.

llvm-svn: 249257
2015-10-04 02:22:52 +00:00
Simon Pilgrim dde63374c5 [DAGCombiner] Generalize FADD constant combines to work with vectors
Updated the FADD combines to work with vectors as well as scalars.

Differential Revision: http://reviews.llvm.org/D13416

llvm-svn: 249251
2015-10-03 22:06:06 +00:00
Sanjay Patel 004ea240ad add test cases that demonstrate bad behavior
These are based on PR25016 and likely caused by a bug in
MachineCombiner's definition of improvesCriticalPathLen().

llvm-svn: 249249
2015-10-03 20:52:55 +00:00
Davide Italiano 4961936d1a [llvm-size] Attempt to fix a test failure on Windows.
llvm-svn: 249247
2015-10-03 20:20:28 +00:00
Davide Italiano f0acbbfd96 [llvm-size] Fix time to check if time of use bug.
This was the last tool relying on this pattern.

llvm-svn: 249244
2015-10-03 19:44:06 +00:00
Simon Pilgrim 93ea954e6d [X86][SSE] Add FADD combine tests.
llvm-svn: 249240
2015-10-03 18:17:43 +00:00
Dan Gohman dc51b96b7f [WebAssembly] Implement the remaining conversion operations.
This is a temporary assembly syntax that will likely evolve along with
broader upcoming syntax changes.

llvm-svn: 249225
2015-10-03 02:10:28 +00:00
Dan Gohman 6a050f30de [WebAssembly] Rename setlocal to set_local to match the spec.
llvm-svn: 249218
2015-10-03 00:01:53 +00:00
Dan Gohman eb440092c9 [WebAssembly] Update this test for the new loop scheme.
llvm-svn: 249217
2015-10-02 23:54:03 +00:00
Sanjoy Das 55015d210f [SCEV] Recognize simple br-phi patterns
Summary:
Teach SCEV to match patterns like

```
  br %cond, label %left, label %right
 left:
  br label %merge
 right:
  br label %merge
 merge:
  V = phi [ %x, %left ], [ %y, %right ]
```

as "select %cond, %x, %y".  Before this SCEV would match PHI nodes
exclusively to add recurrences.

This addresses PR25005.

Reviewers: joker.eph, joker-eph, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13378

llvm-svn: 249211
2015-10-02 23:09:44 +00:00
Piotr Padlewski dc9b2cfc50 inariant.group handling in GVN
The most important part required to make clang
devirtualization works ( ͡°͜ʖ ͡°).
The code is able to find non local dependencies, but unfortunatelly
because the caller can only handle local dependencies, I had to add
some restrictions to look for dependencies only in the same BB.

http://reviews.llvm.org/D12992

llvm-svn: 249196
2015-10-02 22:12:22 +00:00
Dan Gohman e3e4a5ff52 [WebAssembly] Fix CFG stackification of nested loops.
llvm-svn: 249187
2015-10-02 21:11:36 +00:00
Dan Gohman 9cc692b06e [WebAssembly] Support calls marked as "tail", fastcc, and coldcc.
llvm-svn: 249184
2015-10-02 20:54:23 +00:00
Richard Trieu e0129e474d Call the correct overload.
Call the correct overload so a string literal does not get converted to a bool.
Also fix the test case to match the names given.

llvm-svn: 249183
2015-10-02 20:52:14 +00:00
Dan Gohman baba8c648b [WebAssembly] Add a resize_memory intrinsic.
llvm-svn: 249178
2015-10-02 20:10:26 +00:00
Michael Zolotukhin d57f4b9011 [Tests] Add one more case to LoopUnroll/pr18861.ll for better coverage.
llvm-svn: 249174
2015-10-02 19:21:52 +00:00
Michael Zolotukhin 8df4bddd16 [Tests] Give meaningful names to blocks in LoopUnroll/pr18861.ll, add a description of what's going on.
llvm-svn: 249173
2015-10-02 19:21:49 +00:00
Michael Zolotukhin 47eef7a3c9 [Tests] Slightly reduce test LoopUnroll/pr18861.ll.
llvm-svn: 249172
2015-10-02 19:21:43 +00:00
Dan Gohman 72f1692a2c [WebAssembly] Add a memory_size intrinsic.
llvm-svn: 249171
2015-10-02 19:21:15 +00:00
Sanjoy Das 7d910f2b11 [SCEV] Try to prove predicates by splitting them
Summary:
This change teaches SCEV that to prove `A u< B` it is sufficient to
prove each of these facts individually:

 - B >= 0
 - A s< B
 - A >= 0

In practice, SCEV sometimes finds it easier to prove these facts
individually than to prove `A u< B` as one atomic step.

Reviewers: reames, atrick, nlewycky, hfinkel

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13042

llvm-svn: 249168
2015-10-02 18:50:30 +00:00
Roman Divacky 4b5507a037 Actually switch the arch when we see .arch. PR21695
llvm-svn: 249165
2015-10-02 18:25:25 +00:00
Tim Northover 8d67b8e053 ARM: diagnose invalid local fixups on Thumb1
We previously stopped producing Thumb2 relaxations when they weren't supported,
but only diagnosed the case where an actual relocation was produced. We should
also tell people if local symbols aren't going to work rather than silently
overflowing.

llvm-svn: 249164
2015-10-02 18:07:18 +00:00
Tim Northover 956b008db6 ARM: correctly align constant pool value on Thumb1 targets.
Since we're using tLDRpci to access it, the constant pool's address must be 0
(mod 4).

llvm-svn: 249163
2015-10-02 18:07:13 +00:00
Andrea Di Biagio 77f62652c1 Reapply r249121 : "[FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types."
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.

Example:

  %1 = bitcast <4 x i31> %V to <2 x i64>

Before (-fast-isel -fast-isel-abort=1):

  FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>

Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.

Originally reviewed here: http://reviews.llvm.org/D13347

llvm-svn: 249147
2015-10-02 16:08:05 +00:00
Andrea Di Biagio 45874e67a1 Revert: [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types.
r249121 caused a Clang test failure (avx2-buitins.c).
Revert r249121 while I keep investigating on the reason why that test failed.

llvm-svn: 249124
2015-10-02 13:06:19 +00:00
Zoran Jovanovic 9ffdfa5986 [mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend
Differential Revision: http://reviews.llvm.org/D13235

llvm-svn: 249123
2015-10-02 13:06:02 +00:00
Andrea Di Biagio cb33456122 [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types.
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.

Example:

  %1 = bitcast <4 x i31> %V to <2 x i64>

Before (-fast-isel -fast-isel-abort=1):

  FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>

Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.

Differential Revision: http://reviews.llvm.org/D13347

llvm-svn: 249121
2015-10-02 12:45:37 +00:00
Adrian Prantl 42562c38f5 dsymutil: Also ignore the ByteSize when building the DeclContext cache for
clang modules.

Forward decls of ObjC interfaces don't have a bytesize.

llvm-svn: 249110
2015-10-02 00:27:08 +00:00
Bruno Cardoso Lopes b491a2d641 [SimplifyLibCalls] Fix instruction misplacement in string/memory libcall optimization
When trying to optimize fortified library functions use the right
location to insert new instructions in order to preserve correct
def-use order.

This fixes an issue where a misplaced instruction definition would
happen to be *after* one of its use after a RAUW, forming invalid IR.
This behavior was introduced by r227250.

Differential Revision: http://reviews.llvm.org/D13301

rdar://problem/22802369

llvm-svn: 249092
2015-10-01 22:43:53 +00:00
Colin LeMahieu 665c9be489 [Hexagon] XFAILing test while diagnosing backend error.
llvm-svn: 249088
2015-10-01 22:14:05 +00:00
Joerg Sonnenberger c8d50d6347 Fix relocation used for GOT references in non-PIC mode. Fix relocations
for "set" pseudo op in PIC mode.

Differential Revision: http://reviews.llvm.org/D13173

llvm-svn: 249086
2015-10-01 22:08:20 +00:00
Davide Italiano f070688ecf [PATCH] D13360: [llvm-objdump] Teach -d about AArch64 mapping symbols
AArch64 uses $d* and $x* to interleave between text and data.
llvm-objdump didn't know about this so it ended up printing garbage.
This patch is a first step towards a solution of the problem.

Differential Revision:	 http://reviews.llvm.org/D13360

llvm-svn: 249083
2015-10-01 21:57:09 +00:00
Reid Kleckner fc64fae6e3 [WinEH] Emit __C_specific_handler tables for the new IR
We emit denormalized tables, where every range of invokes in the same
state gets a complete list of EH action entries. This is significantly
simpler than trying to infer the correct nested scoping structure from
the MI. Fortunately, for SEH, the nesting structure is really just a
size optimization.

With this, some basic __try / __except examples work.

llvm-svn: 249078
2015-10-01 21:38:24 +00:00
Colin LeMahieu f92c175bdd [Hexagon] XFAILing test while diagnosing backend error.
llvm-svn: 249075
2015-10-01 21:19:03 +00:00
Tom Stellard e9f8b24985 AMDGPU/SI: Remove assert from AMDGPUOpenCLImageTypeLowering pass
Summary:
Instead of asserting when the kernel metadata is different than we expect,
we should just skip lowering that function.  This fixes assertion
failures with OpenCL argument metadata from older LLVM releases.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13356

llvm-svn: 249073
2015-10-01 21:16:05 +00:00
David Majnemer 4600c06434 [WinEH] Stop BranchFolding from merging across funclets
BranchFolding would merge two funclets together, this is not OK.
Disable this and strengthen the assertion in FuncletLayout.

llvm-svn: 249069
2015-10-01 21:04:13 +00:00
David Majnemer f828a0ccc7 [WinEH] Make FuncletLayout more robust against catchret
Catchret transfers control from a catch funclet to an earlier funclet.
However, it is not completely clear which funclet the catchret target is
part of.  Make this clear by stapling the catchret target's funclet
membership onto the CATCHRET SDAG node.

llvm-svn: 249052
2015-10-01 18:44:59 +00:00
Jonas Paulsson 12629324a4 [SystemZ] Add some generic (floating point support) load instructions.
Add generic instructions for load complement, load negative and load positive
for fp32 and fp64, and let isel prefer them. They do not clobber CC, and so
give scheduler more freedom. SystemZElimCompare pass will convert them when it
can to the CC-setting variants.

Regression tests updated to expect the new opcodes in places where the old ones
where used. New test case SystemZ/fp-cmp-05.ll checks that
SystemZCompareElim.cpp can handle the new opcodes.

README.txt updated (bullet removed).

Note that fp128 is not yet handled, because it is relatively rare, and is a
bit trickier, because of the fact that l.dfr would operate on the sign bit of
one of the subregisters of a fp128, but we would not want to copy the other
sub-reg in case src and dst regs are not the same.

Reviewed by Ulrich Weigand.

llvm-svn: 249046
2015-10-01 18:12:28 +00:00
Rafael Espindola e883514736 Fix printing of 64 bit values and make test more strict.
llvm-svn: 249043
2015-10-01 17:57:31 +00:00
Tom Stellard e0e582c9aa AMDGPU: Add MEM_RAT STORE_TYPED.
v2: Add test (Matt).
    Fix capitalization of isEOP (Matt).
    Move pattern to class parameter (Matt).
    Make the instruction available to Cayman (Matt).
    Change name from MEM_RAT WRITE_TYPED to MEM_RAT STORE_TYPED.

Patch by: Zoltan Gilian

llvm-svn: 249042
2015-10-01 17:51:34 +00:00
NAKAMURA Takumi 1ed20db720 Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"
It broke; LLVM :: CodeGen__Generic__2009-11-16-BadKillsCrash.ll

llvm-svn: 249032
2015-10-01 17:00:56 +00:00
Arnaud A. de Grandmaison 849f3bf8c9 [InstCombine] Remove trivially empty lifetime start/end ranges.
Summary:
Some passes may open up opportunities for optimizations, leaving empty
lifetime start/end ranges. For example, with the following code:

    void foo(char *, char *);
    void bar(int Size, bool flag) {
      for (int i = 0; i < Size; ++i) {
        char text[1];
        char buff[1];
        if (flag)
          foo(text, buff); // BBFoo
      }
    }

the loop unswitch pass will create 2 versions of the loop, one with
flag==true, and the other one with flag==false, but always leaving
the BBFoo basic block, with lifetime ranges covering the scope of the for
loop. Simplify CFG will then remove BBFoo in the case where flag==false,
but will leave the lifetime markers.

This patch teaches InstCombine to remove trivially empty lifetime marker
ranges, that is ranges ending right after they were started (ignoring
debug info or other lifetime markers in the range).

This fixes PR24598: excessive compile time after r234581.

Reviewers: reames, chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13305

llvm-svn: 249018
2015-10-01 14:54:31 +00:00
Ulrich Weigand cf1670a095 [SystemZ] Add assembly instructions for obtaining clock values as well as CPU features
Provide assembler support for STCK, STCKF, STCKE, and STFLE.

Author: joncmu
Differential Revision: http://reviews.llvm.org/D13299

llvm-svn: 249015
2015-10-01 14:43:48 +00:00
Zoran Jovanovic 2960f3a346 [mips][microMIPS] Implement CACHEE, WRPGPR and WSBH instructions
Differential Revision: http://reviews.llvm.org/D10337

llvm-svn: 249004
2015-10-01 12:49:27 +00:00
Scott Douglass 290183d734 [ARM] More care with Thumb1 writeback in ARMLoadStoreOptimizer
Differential Revision: http://reviews.llvm.org/D13240

llvm-svn: 249002
2015-10-01 11:56:19 +00:00
Jingyue Wu df1a1b113b [NaryReassociate] SeenExprs records WeakVH
Summary:
The instructions SeenExprs records may be deleted during rewriting.
FindClosestMatchingDominator should ignore these deleted instructions.

Fixes PR24301.

Reviewers: grosser

Subscribers: grosser, llvm-commits

Differential Revision: http://reviews.llvm.org/D13315

llvm-svn: 248983
2015-10-01 03:51:44 +00:00
Dehao Chen 7c41dd6498 Update sample profile propagation algorithm.
http://reviews.llvm.org/D13218

llvm-svn: 248968
2015-10-01 00:26:56 +00:00
Ahmed Bougacha 23a0d1a1d6 [X86] Don't custom-lower vNi32 uint_to_fp when unsafe-fp-math.
The custom code produces incorrect results if later reassociated.

Since r221657, on x86, vNi32 uitofp is lowered using an optimized
sequence:

  movdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [65535, ...]
  pand %xmm0, %xmm1
  por LCPI0_1(%rip), %xmm1 ## [0x4b000000, ...]
  psrld $16, %xmm0
  por LCPI0_2(%rip), %xmm0 ## [0x53000000, ...]
  addps LCPI0_3(%rip), %xmm0 ## [float -5.497642e+11, ...]
  addps %xmm1, %xmm0

Since r240361, the machine combiner opportunistically reassociates
2-instruction sequences (with -ffast-math). In the new code sequence,
the ADDPS' are eligible. In isolation, for simple examples (without
reassociable users), this makes no performance difference (the goal
being to enable reassociation of longer chains).

In the trivial example (just one uitofp), the reassociation doesn't
happen, because (I think) it would require the emission of a separate
movaps for a constantpool load (instead of folding it into addps).

However, when we have multiple uitofp sequences, and the constantpool
loads are CSE'd earlier, the machine combiner can do the reassociation.

When the ADDPS' are reassociated, the resulting sequence isn't correct
anymore, as we'd be adding large (2**39) constants with comparatively
smaller values (~2**23). Given that two of the three inputs are powers
of 2 larger than 2**16, and that ulp(2**39) == 2**(39-24) == 2**15,
the reassociated chain will produce 0 for any input in [0, 2**14[.
In my testing, it also produces wrong results for 99.5% of [0, 2**32[.

Avoid this by disabling the new lowering when -ffast-math. It does
mean that we'll get slower code than without it, but at least we
won't get egregiously incorrect code.

One might argue that, considering -ffast-math is all but meaningless,
uitofp producing wrong results isn't a compiler bug. But it really is.

Fixes PR24512.

...though this is really more of a workaround.
Ideally, we'd have some sort of Machine FMF, but that's a problem
that's not worth tackling until we do more with machine IR.

llvm-svn: 248965
2015-10-01 00:11:07 +00:00
Reid Kleckner 6dec87a8a0 [WinEH] Emit int3 after noreturn calls on Win64
The Win64 unwinder disassembles forwards from each PC to try to
determine if this PC is in an epilogue. If so, it skips calling the EH
personality function for that frame. Typically, this means you cannot
catch an exception in the same frame that you threw it, because 'throw'
calls a noreturn runtime function.

Previously we avoided this problem with the TrapUnreachable
TargetOption, but that's a much bigger hammer than we need. All we need
is a 1 byte non-epilogue instruction right after the call.  Instead,
what we got was an unconditional branch to a shared block containing the
ud2, potentially 7 bytes instead of 1. So, this reverts r206684, which
added TrapUnreachable, and replaces it with something better.

The new code pattern matches for invoke/call followed by unreachable and
inserts an int3 into the DAG. To be 100% watertight, we would need to
insert SEH_Epilogue instructions into all basic blocks ending in a call
with no terminators or successors, but in practice this is unlikely to
come up.

llvm-svn: 248959
2015-09-30 23:09:23 +00:00
Sanjay Patel a114a10bbe [x86] enable machine combiner reassociations for 256-bit vector logical integer insts
llvm-svn: 248955
2015-09-30 22:25:55 +00:00
Chad Rosier 4c5a4646bf [AArch64] Remove an unnecessary run line and other cleanup. NFC.
Unscaled load/store combining has been enabled since the initial ARM64 port.  No
need for a redundance run.  Also, add CHECK-LABEL directives.

llvm-svn: 248945
2015-09-30 21:10:02 +00:00
Michael Zolotukhin fc783e91e0 [SLP] Don't vectorize loads of non-packed types (like i1, i2).
Summary:
Given an array of i2 elements, 4 consecutive scalar loads will be lowered to
i8-sized loads and thus will access 4 consecutive bytes in memory. If we
vectorize these loads into a single <4 x i2> load, it'll access only 1 byte in
memory. Hence, we should prohibit vectorization in such cases.

PS: Initial patch was proposed by Arnold.

Reviewers: aschwaighofer, nadav, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13277

llvm-svn: 248943
2015-09-30 21:05:43 +00:00
Evgeniy Stepanov 422a61306e Move dw_op_minus test to DebugInfo/X86.
The test requires X86 target support, and checks the actual debug
info contents, including register numbers which would be different on
other platforms.

llvm-svn: 248938
2015-09-30 20:23:24 +00:00
Evgeniy Stepanov f608111d1b Fix debug info with SafeStack.
llvm-svn: 248933
2015-09-30 19:55:43 +00:00
Chad Rosier 11c825f7db [AArch64] Remove an unnecessary restriction on pre-index instructions.
Previously, the index was constrained to the size of the memory operation for
no apparent reason.  This change removes that constraint so that we can form
pre-index instructions with any valid offset.

llvm-svn: 248931
2015-09-30 19:44:40 +00:00
Hal Finkel 4c45775880 [PowerPC] Disable shrink wrapping
Shrink wrapping is causing a self-hosting failure on PPC64/Linux. Disable for
now until the problem can be fixed.

llvm-svn: 248924
2015-09-30 17:29:03 +00:00
Erik Eckstein 91c49810f2 SLPVectorizer: add a test to check if the minimum region size works.
This is an addition to rL248917.

llvm-svn: 248923
2015-09-30 17:28:19 +00:00
Artyom Skrobov 72ca6b8f3f [ARM] Support for ARMv6-Z / ARMv6-ZK missing
As Richard Barton observed at http://reviews.llvm.org/D12937#inline-107121
TargetParser in LLVM has insufficient support for ARMv6Z and ARMv6ZK.

In particular, there were no tests for TrustZone being supported in these
architectures.

The patch clears a FIXME: left by Saleem Abdulrasool in r201471, and fixes
his test case which hadn't really been testing what it was claiming to test.

Differential Revision: http://reviews.llvm.org/D13236

llvm-svn: 248921
2015-09-30 17:25:52 +00:00
Erik Eckstein 848c1aa452 SLPVectorizer: limit the scheduling region size per basic block.
Usually large blocks are not a problem. But if a large block (> 10k instructions)
contains many (potential) chains of vector instructions, and those chains are
spread over a wide range of instructions, then scheduling becomes a compile time problem.
This change introduces a limit for the accumulate scheduling region size of a block.
For real-world functions this limit will never be exceeded (it's about 10x larger than
the maximum value seen in the test-suite and external test suite).

llvm-svn: 248917
2015-09-30 17:00:44 +00:00
Andrea Di Biagio 0594e2a1e9 [InstCombine] Teach how to convert SSSE3/AVX2 byte shuffles to builtin shuffles if the shuffle mask is constant.
This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a
builtin shuffle if the mask is constant.

Converting byte shuffle intrinsic calls to builtin shuffles can help finding
more opportunities for combining shuffles later on in selection dag.

We may end up with byte shuffles with constant masks as the result of inlining.

Differential Revision: http://reviews.llvm.org/D13252

llvm-svn: 248913
2015-09-30 16:44:39 +00:00
Jeroen Ketema ab99b59e8c [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,

<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

to

<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.

Differential Revision: http://reviews.llvm.org/D12985

llvm-svn: 248887
2015-09-30 10:56:37 +00:00
Simon Pilgrim 3d11c994f7 [X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions
The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes.

Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases.

Differential Revision: http://reviews.llvm.org/D8690

llvm-svn: 248878
2015-09-30 08:17:50 +00:00
Dehao Chen aae9e1f2bd Add unittest for new samle profile format.
http://reviews.llvm.org/D13145

llvm-svn: 248870
2015-09-30 01:05:37 +00:00
Dehao Chen 6722688eaa http://reviews.llvm.org/D13145
Support hierarachical sample profile format.

llvm-svn: 248865
2015-09-30 00:42:46 +00:00
Evgeniy Stepanov d3f544f271 [safestack] Fix a stupid mix-up in the direct-tls code path.
llvm-svn: 248863
2015-09-30 00:01:47 +00:00
Reid Kleckner a13dfd539b [WinEH] Setup RBP correctly in Win64 funclet prologues
Previously local variable captures just didn't work in 64-bit. Now we
can access local variables more or less correctly.

llvm-svn: 248857
2015-09-29 23:32:01 +00:00
David Majnemer 91b0ab9172 [WinEH] Ensure that funclets obey the x64 ABI
The x64 ABI requires that epilogues do not contain code other than stack
adjustments and some limited control flow.  However, we'd insert code to
initialize the return address after stack adjustments.  Instead, insert
EAX/RAX with the current value before we create the stack adjustments in
the epilogue.

llvm-svn: 248839
2015-09-29 22:33:36 +00:00
Maksim Panchenko cce239c45d HHVM calling conventions.
HHVM calling convention, hhvmcc, is used by HHVM JIT for
functions in translated cache. We currently support LLVM back end to
generate code for X86-64 and may support other architectures in the
future.

In HHVM calling convention any GP register could be used to pass and
return values, with the exception of R12 which is reserved for
thread-local area and is callee-saved. Other than R12, we always
pass RBX and RBP as args, which are our virtual machine's stack pointer
and frame pointer respectively.

When we enter translation cache via hhvmcc function, we expect
the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed
to standard ABI alignment. This affects stack object alignment and stack
adjustments for function calls.

One extra calling convention, hhvm_ccc, is used to call C++ helpers from
HHVM's translation cache. It is almost identical to standard C calling
convention with an exception of first argument which is passed in RBP
(before we use RDI, RSI, etc.)

Differential Revision: http://reviews.llvm.org/D12681

llvm-svn: 248832
2015-09-29 22:09:16 +00:00
Chad Rosier 1769d8505f Fix test from r248825.
llvm-svn: 248827
2015-09-29 20:50:15 +00:00
Chad Rosier 4315012769 [AArch64] Add support for pre- and post-index LDPSWs.
llvm-svn: 248825
2015-09-29 20:39:55 +00:00
David Majnemer a80c151286 [WinEH] Teach AsmPrinter about funclets
Summary:
Funclets have been turned into functions by the time they hit the object
file.  Make sure that they have decent names for the symbol table and
CFI directives explaining how to reason about their prologues.

Differential Revision: http://reviews.llvm.org/D13261

llvm-svn: 248824
2015-09-29 20:12:33 +00:00
Zachary Turner 4dddcc64d3 [llvm-pdbdump] Add include-only filters.
PDB files have a lot of noise in them, with hundreds (or thousands)
of symbols from system libraries and compiler generated types.  If
you're only looking for a specific type, this can be problematic.

This CL allows you to display *only* types, variables, or compilands
matching a particular pattern.  These filters can even be combined
with exclude filters.  Include-only filters are given priority, so
that first the set of items to display is limited only to those that
match the include filters, and then the set of exclude filters is
applied to those.  If there are no include filters specified, then
it means "display everything".

llvm-svn: 248822
2015-09-29 19:49:06 +00:00
Chad Rosier dabe2534ed [AArch64] Add integer pre- and post-index halfword/byte loads and stores.
llvm-svn: 248817
2015-09-29 18:26:15 +00:00
Dehao Chen 028e122ca9 Revert r248810 which breaks tests.
llvm-svn: 248814
2015-09-29 18:18:49 +00:00
Dehao Chen 410a25aa7a http://reviews.llvm.org/D13231
Change lookup functions to const functions.

llvm-svn: 248810
2015-09-29 17:59:58 +00:00
James Molloy 897048bee3 [ValueTracking] Teach isKnownNonZero about monotonically increasing PHIs
If a PHI starts at a non-negative constant, monotonically increases
(only adds of a constant are supported at the moment) and that add
does not wrap, then the PHI is known never to be zero.

llvm-svn: 248796
2015-09-29 14:08:45 +00:00
Jeroen Ketema 740f9d79ca Arguments spilled on the stack before a function call may have
alignment requirements, for example in the case of vectors.
These requirements are exploited by the code generator by using
move instructions that have similar alignment requirements, e.g.,
movaps on x86.

Although the code generator properly aligns the arguments with
respect to the displacement of the stack pointer it computes,
the displacement itself may cause misalignment. For example if
we have

%3 = load <16 x float>, <16 x float>* %1, align 64
call void @bar(<16 x float> %3, i32 0)

the x86 back-end emits:

movaps  32(%ecx), %xmm2
movaps  (%ecx), %xmm0
movaps  16(%ecx), %xmm1
movaps  48(%ecx), %xmm3
subl    $20, %esp       <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards 
movaps  %xmm3, (%esp)   <-- movaps requires 16-byte alignment, while %esp is not aligned as such.
movl    $0, 16(%esp)
calll   __bar

To solve this, we need to make sure that the computed value with which
the stack pointer is changed is a multiple af the maximal alignment seen
during its computation. With this change we get proper alignment:

subl    $32, %esp
movaps  %xmm3, (%esp)

Differential Revision: http://reviews.llvm.org/D12337

llvm-svn: 248786
2015-09-29 10:12:57 +00:00
Simon Pilgrim 43f5e0848e [InstCombine] Improve Vector Demanded Bits Through Bitcasts
Currently SimplifyDemandedVectorElts can only peek through bitcasts if the vectors have the same number of elements.

This patch fixes and enables some existing (disabled) code to support bitcasting to vectors with more/fewer elements. It currently only accepts cases when vectors alias cleanly (i.e. number of elements are an exact multiple of the other vector).

This was added to improve the demanded vector elements support for SSE vector shifts which require the __m128i (<2 x i64>) argument type to be bitcast to the vector type for the builtin shift. I've added extra tests for various additional bitcasts.

Differential Revision: http://reviews.llvm.org/D12935

llvm-svn: 248784
2015-09-29 08:19:11 +00:00
Dan Gohman 868e1c08d9 [WebAssembly] Rename test files to match platform naming conventions.
llvm-svn: 248783
2015-09-29 08:13:58 +00:00
Chen Li 9f27fc0599 [LoopUnswitch] Add block frequency analysis to recognize hot/cold regions
Summary: This patch adds block frequency analysis to LoopUnswitch pass to recognize hot/cold regions. For cold regions the pass only performs trivial unswitches since they do not increase code size, and for hot regions everything works as before. This helps to minimize code growth in cold regions and be more aggressive in hot regions. Currently the default cold regions are blocks with frequencies below 20% of function entry frequency, and it can be adjusted via -loop-unswitch-cold-block-frequency flag. The entire feature is controlled via -loop-unswitch-with-block-frequency flag and it is off by default.

Reviewers: broune, silvas, dnovillo, reames

Subscribers: davidxl, llvm-commits

Differential Revision: http://reviews.llvm.org/D11605

llvm-svn: 248777
2015-09-29 05:03:32 +00:00
Evgeniy Stepanov d8b86f7cdc Move dbg.declare intrinsics when merging and replacing allocas.
Place new and update dbg.declare calls immediately after the
corresponding alloca.

Current code in replaceDbgDeclareForAlloca puts the new dbg.declare
at the end of the basic block. LLVM codegen has problems emitting
debug info in a situation when dbg.declare appears after all uses of
the variable. This usually kinda works for inlining and ASan (two
users of this function) but not for SafeStack (see the pending change
in http://reviews.llvm.org/D13178).

llvm-svn: 248769
2015-09-29 00:30:19 +00:00