Commit Graph

44269 Commits

Author SHA1 Message Date
Tim Northover ff168c68dc ARM: TLS calling convention doesn't preserve r9 or r12 on Darwin.
llvm-svn: 300726
2017-04-19 18:07:54 +00:00
Sanjay Patel ded7d59f0e [DAG] add splat vector support for 'and' in SimplifyDemandedBits
The patch itself is simple: stop discriminating against vectors in visitAnd() and again in 
SimplifyDemandedBits().

Some notes for reference:

1. We're not consistent about calls to SimplifyDemandedBits in the various visitXXX functions. 
   Sometimes, we check if the RHS is a constant first. Other times (like here), we just dive in.
2. I'd like to break the vector shackles in steps for the sake of risk minimization, but we could
    make similar simultaneous changes in other places if we think that would be better.
3. I don't know what the intent of the changed tests in this patch was supposed to be, but since 
   they wiggled in a positive way, I'm just going with that. :)
4. In the rotate tests, note that we can see through non-splat constants. This is a result of D24253.
5. My motivation for being here now is to make D31944 look better, so this is step 1 of N towards 
   improving the vector codegen in that patch without writing any actual new code.

Differential Revision: https://reviews.llvm.org/D32230

llvm-svn: 300725
2017-04-19 18:05:06 +00:00
Matt Arsenault 6cb7b8a42f AMDGPU: Don't align callable functions to 256
llvm-svn: 300720
2017-04-19 17:42:39 +00:00
Matt Arsenault 4c1ecded63 AMDGPU: Change DivergenceAnalysis for function arguments
Stop assuming all functions are kernels.

llvm-svn: 300719
2017-04-19 17:42:34 +00:00
Sanjay Patel a3c297dba4 [InstSimplify] fold identity shuffles (recursing if needed)
This patch simplifies the examples from D31509 and D31927 (PR30630) and catches 
the basic identity shuffle tests that Zvi recently added.

I'm not sure if we have something like this in DAGCombiner, but we should?

It's worth noting that "MaxRecurse / RecursionLimit" is only 3 on entry at the moment. 
We might want to bump that up if there are longer shuffle chains like this in the wild.

For now, we're ignoring shuffles that have undef mask elements because it's not
clear how those should be handled.

Differential Revision: https://reviews.llvm.org/D31960

llvm-svn: 300714
2017-04-19 16:48:22 +00:00
Krzysztof Parzyszek 333b2bf2ed [Hexagon] Generate proper offset in opt-addr-mode
Also, make a few changes to allow using the pass in .mir testcases.
Among other things, change the abbreviation from opt-amode to amode-opt,
because otherwise lit would expand the "opt" part to the full path to
the opt binary.

llvm-svn: 300707
2017-04-19 15:15:51 +00:00
Sanjay Patel c9d36f181f [PowerPC] add test and auto-generate checks; NFC
llvm-svn: 300700
2017-04-19 14:58:09 +00:00
Sanjay Patel 5a2235bbd0 [ARM] add test and auto-generate checks; NFC
llvm-svn: 300698
2017-04-19 14:55:50 +00:00
Davide Italiano a9f047a594 [InstSimplify] Deduce correct type for vector GEP.
InstSimplify returned the wrong type when simplifying a vector GEP
and we ended up crashing when trying to replace all uses with the
new value. Fixes PR32697.

Differential Revision: https://reviews.llvm.org/D32180

llvm-svn: 300693
2017-04-19 14:23:42 +00:00
Dylan McKay da2d74642a [AVR] Remove the 'multibyte' asm test
It tests registers which are not actually used on AVR.

llvm-svn: 300684
2017-04-19 12:13:45 +00:00
Simon Pilgrim 5536ecd9f0 Regenerate test. NFCI.
llvm-svn: 300683
2017-04-19 12:06:40 +00:00
Dylan McKay 7838104382 [AVR] Fix the test suite
A bunch of tests failed because memory operations have been reordered.

I am unsure which commit changed this behaviour as the AVR build was
failing at that point with an unrelated error.

This commit just reoders some of the CHECK lines in some tests to suit
current llc output.

llvm-svn: 300682
2017-04-19 12:02:52 +00:00
Igor Breger 4fdf1e489c [GlobalIsel][X86] support G_TRUNC selection.
Summary:
[GlobalIsel][X86] support G_TRUNC selection.
Add regbank-select and legalizer tests. Currently legalization of trunc i64 on 32bit platform not supported.

Reviewers: ab, zvi, rovka

Reviewed By: zvi

Subscribers: dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D32115

llvm-svn: 300678
2017-04-19 11:34:59 +00:00
Simon Pilgrim 94d5dba8ae [X86] Add D32039/PR31357 tests to show current BSWAP codegen
llvm-svn: 300672
2017-04-19 11:06:22 +00:00
Simon Pilgrim b13ab63503 [X86][SSE] Add scheduling latency/throughput tests for (most) SSE2 instructions
llvm-svn: 300671
2017-04-19 10:52:09 +00:00
Renato Golin 742aed8683 Revert "ARMFrameLowering: Reserve emergency spill slot for large arguments"
This reverts commit r300639, as it broke self-hosting on ARM. PR32709.

llvm-svn: 300668
2017-04-19 09:02:52 +00:00
Igor Breger 746b3c336f [GlobalISel][X86] Split select tests. NFC.
llvm-svn: 300666
2017-04-19 08:40:44 +00:00
Diana Picus 49472ff1cf [ARM] GlobalISel: Add support for G_MUL
Support G_MUL, very similar to G_ADD and G_SUB. The only difference is
in the instruction selector, where we have to select either MUL or MULv5
depending on the target.

llvm-svn: 300665
2017-04-19 07:29:46 +00:00
Kristof Beyls 0f36e68f62 [GlobalISel] Support vector-of-pointers in LLT
This fixes PR32471.

As comment 10 on that bug report highlights
(https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a
few different defendable design tradeoffs that could be made, including
not representing pointers at all in LLT.

I decided to go for representing vector-of-pointer as a concept in LLT,
while keeping the size of the LLT type 64 bits (this is an increase from
48 bits before). My rationale for keeping pointers explicit is that on
some targets probably it's very handy to have the distinction between
pointer and non-pointer (e.g. 68K has a different register bank for
pointers IIRC). If we keep a scalar pointer, it probably is easiest to
also have a vector-of-pointers to keep LLT relatively conceptually clean
and orthogonal, while we don't have a very strong reason to break that
orthogonality.  Once we gain more experience on the use of LLT, we can
of course reconsider this direction.

Rejecting vector-of-pointer types in the IRTranslator is also an option
to avoid the crash reported in PR32471, but that is only a very
short-term solution; also needs quite a bit of code tweaks in places,
and is probably fragile. Therefore I didn't consider this the best
option.

llvm-svn: 300664
2017-04-19 07:23:57 +00:00
Chandler Carruth ae3386aa74 Revert r300657 due to crashes in stage2 of bootstraps:
http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/2476/steps/build-stage2-LLVMgold.so/logs/stdio
http://bb.pgr.jp/builders/clang-3stage-x86_64-linux/builds/15036/steps/build_llvmclang/logs/stdio

I've updated the commit thread, reverting to get the bots back to green.

Original commit summary:
[JumpThread] We want to fold (not thread) when all predecessor go to single BB's successor.

llvm-svn: 300662
2017-04-19 06:23:20 +00:00
Xin Tong 636a332906 [JumpThread] We want to fold (not thread) when all predecessor go to single BB's successor. .
Summary: In case all predecessor go to a single successor of current BB. We want to fold (not thread).

Reviewers: efriedma, sanjoy

Reviewed By: sanjoy

Subscribers: dberlin, majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D30869

llvm-svn: 300657
2017-04-19 05:15:57 +00:00
Matthias Braun 661d3d4b00 ARMFrameLowering: Reserve emergency spill slot for large arguments
We need to reserve an emergency spill slot in cases with large argument
types that could overflow immediate offsets for FP relative address
calculations.

rdar://31317893

Differential Revision: https://reviews.llvm.org/D31643

llvm-svn: 300639
2017-04-19 01:16:07 +00:00
Dean Michael Berris 97923db891 [XRay][tools] Fix yaml matching to be more permissive
Account for a potentially empty function name.

Follow-up to D32153.

llvm-svn: 300631
2017-04-19 00:10:09 +00:00
Dean Michael Berris 918802bed4 [XRay][tools] Add option to llvm-xray extract to symbolize functions
Summary:
This allows us to, if the symbol names are available in the binary, be
able to provide the function name in the YAML output.

Reviewers: dblaikie, pelikan

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32153

llvm-svn: 300624
2017-04-18 23:23:54 +00:00
Sanjay Patel ff981f9256 [x86] add tests for potential andn optimization; NFC
llvm-svn: 300617
2017-04-18 22:36:59 +00:00
Chih-Hung Hsieh 877923a87f [X86] Keep EXTRACT_VECTOR_ELT result type as f128 for Android x86_64.
Android x86_64 target uses f128 type and stores f128 values in %xmm* registers.
SoftenFloatRes_EXTRACT_VECTOR_ELT should not convert result value
from f128 to i128.

Differential Revision: http://reviews.llvm.org/D32102

llvm-svn: 300583
2017-04-18 20:15:18 +00:00
Simon Pilgrim 9398649fea [X86][SSE] Add scheduling latency/throughput tests for (most) SSE1 instructions
llvm-svn: 300576
2017-04-18 19:04:40 +00:00
Easwaran Raman 76aba5f6d7 [SLP vectorizer] Allow phi node reordering in tryToVectorizeList.
In tryToVectorizeList, under a very limited circumstance (when entered
from tryToVectorizePair), the values may be reordered (swapped) and the
SLP tree is built with the new order. This extends that to the case when
starting from phis in vectorizeChainsInBlock when there are exactly two
phis. The textual order of phi nodes shouldn't really matter. Without
this change, the loop body in the accompnaying test case is fully vectorized
when we swap the orde of the phis but not with this order. While this
doesn't solve the phi-ordering problem in a general way (for more than 2
phis), this is simple fix that piggybacks on an existing mechanism and
is useful in cases like multiplying two complex numbers.

Differential revision: https://reviews.llvm.org/D32065

llvm-svn: 300574
2017-04-18 18:16:57 +00:00
Nirav Dave 855ef45602 [DAG] Improve store merge candidate pruning.
Remove non-consecutive stores from store merge candidate search as
they cannot be merged and will prevent us from finding subsequent
mergeable store cases.

Reviewers: jyknight, bogner, javed.absar, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32086

llvm-svn: 300561
2017-04-18 15:36:34 +00:00
Nirav Dave e50544cdf2 Add base-index-based store merge test
llvm-svn: 300559
2017-04-18 15:12:13 +00:00
Nikolai Bozhenov 95fc644148 Make globalaa-retained.ll test catching more cases.
Summary:
* Add checks for store. That is needed because GlobalsAA is called
  twice in the current pipeline with different sets of Function passes
  following it. However, the loads are eliminated using instcombine
  which happens everywhere. On the other hand, DeadStoreElimination is
  performed only once so by checking for store we'll be able to catch
  more cases when GlobalsAA is invalidated unintentionally.
* Add empty function above/below the test so that we don't depend on
  the relative order of instcombine/dead-store-elimination and the
  pass that invalidates the analysis (inside the same
  FunctionPassManager).

Reviewers: kristof.beyls

Reviewed By: kristof.beyls

Subscribers: llvm-commits, n.bozhenov

Differential Revision: https://reviews.llvm.org/D32015
Patch by Andrei Elovikov <andrei.elovikov@intel.com>

llvm-svn: 300553
2017-04-18 13:29:26 +00:00
Nirav Dave b97768499f Add store Merge test.
llvm-svn: 300551
2017-04-18 13:25:19 +00:00
Oliver Stannard 7ad2e8aae1 [ARM] Add hardware build attributes in assembler
In the assembler, we should emit build attributes based on the target
selected with command-line options. This matches the GNU assembler's
behaviour. We only do this for build attributes which describe the
hardware that is expected to be available, not the ones that describe
ABI compatibility.

This is done by moving some of the attribute emission code to
ARMTargetStreamer, so that it can be shared between the assembly and
code-generation code paths. Since the assembler only creates a
MCSubtargetInfo, not an ARMSubtarget, the code had to be changed to
check raw features, and not use the convenience functions in
ARMSubtarget.

If different attributes are later specified using the .eabi_attribute
directive, then they will take precedence, as happens when the same
.eabi_attribute is specified twice.

This must be enabled by an option, because we don't want to do this when
parsing inline assembly. The attributes would match the ones emitted at
the start of the file, so wouldn't actually change the emitted object
file, but the extra directives would be added to every inline assembly
block when emitting assembly, which we'd like to avoid.

The majority of the changes in the build-attributes.ll test are just
re-ordering the directives, because the hardware attributes are now
emitted before the ABI ones. However, I did fix one bug which I spotted:
Tag_CPU_arch_profile was not being emitted for v6M.

Differential revision: https://reviews.llvm.org/D31812

llvm-svn: 300547
2017-04-18 12:52:35 +00:00
Diana Picus a3a0cccb2c [ARM] GlobalISel: Add support for G_SUB
Support G_SUB throughout the GlobalISel pipeline. It is exactly the same
as G_ADD, nothing fancy.

llvm-svn: 300546
2017-04-18 12:35:28 +00:00
Kristof Beyls a4e79cca77 Revert "[GlobalISel] Support vector-of-pointers in LLT"
This reverts r300535 and r300537.
The newly added tests in test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll
produces slightly different code between LLVM versions being built with different compilers.
E.g., dependent on the compiler LLVM is built with, either one of the following
can be produced:

remark: <unknown>:0:0: unable to legalize instruction: %vreg0<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg2; (in function: vector_of_pointers_extractelement)
remark: <unknown>:0:0: unable to legalize instruction: %vreg2<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg0; (in function: vector_of_pointers_extractelement)

Non-determinism like this is clearly a bad thing, so reverting this until
I can find and fix the root cause of the non-determinism.

llvm-svn: 300538
2017-04-18 09:26:36 +00:00
Diana Picus e2626bb7c2 [ARM] Check for correct HW div when lowering divmod
For subtargets that use the custom lowering for divmod, e.g. gnueabi,
we used to check if the subtarget has hardware divide and then lower to
a div-mul-sub sequence if true, or to a libcall if false.

However, judging by the usage of hasDivide vs hasDivideInARMMode, it
seems that hasDivide only refers to Thumb. For instance, in the
ARMTargetLowering constructor, the code that specifies whether to use
libcalls for (S|U)DIV looks like this:

bool hasDivide = Subtarget->isThumb() ? Subtarget->hasDivide()
                                      : Subtarget->hasDivideInARMMode();

In the case of divmod for arm-gnueabi, using only hasDivide() to
determine what to do means that instead of lowering to __aeabi_idivmod
to get the remainder, we lower to div-mul-sub and then further lower the
div to __aeabi_idiv. Even worse, if we have hardware divide in ARM but
not in Thumb, we generate a libcall instead of using it (this is not an
issue in practice since AFAICT none of the cores that we support have
hardware divide in ARM but not Thumb).

This patch fixes the code dealing with custom lowering to take into
account the mode (Thumb or ARM) when deciding whether or not hardware
division is available.

Differential Revision: https://reviews.llvm.org/D32005

llvm-svn: 300536
2017-04-18 08:32:27 +00:00
Kristof Beyls fb73eb0324 [GlobalISel] Support vector-of-pointers in LLT
This fixes PR32471.

As comment 10 on that bug report highlights
(https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a
few different defendable design tradeoffs that could be made, including
not representing pointers at all in LLT.

I decided to go for representing vector-of-pointer as a concept in LLT,
while keeping the size of the LLT type 64 bits (this is an increase from
48 bits before). My rationale for keeping pointers explicit is that on
some targets probably it's very handy to have the distinction between
pointer and non-pointer (e.g. 68K has a different register bank for
pointers IIRC). If we keep a scalar pointer, it probably is easiest to
also have a vector-of-pointers to keep LLT relatively conceptually clean
and orthogonal, while we don't have a very strong reason to break that
orthogonality. Once we gain more experience on the use of LLT, we can
of course reconsider this direction.

Rejecting vector-of-pointer types in the IRTranslator is also an option
to avoid the crash reported in PR32471, but that is only a very
short-term solution; also needs quite a bit of code tweaks in places,
and is probably fragile. Therefore I didn't consider this the best
option.

llvm-svn: 300535
2017-04-18 08:12:45 +00:00
Adrian Prantl 6825fb64e9 PR32382: Fix emitting complex DWARF expressions.
The DWARF specification knows 3 kinds of non-empty simple location
descriptions:
1. Register location descriptions
  - describe a variable in a register
  - consist of only a DW_OP_reg
2. Memory location descriptions
  - describe the address of a variable
3. Implicit location descriptions
  - describe the value of a variable
  - end with DW_OP_stack_value & friends

The existing DwarfExpression code is pretty much ignorant of these
restrictions. This used to not matter because we only emitted very
short expressions that we happened to get right by accident.  This
patch makes DwarfExpression aware of the rules defined by the DWARF
standard and now chooses the right kind of location description for
each expression being emitted.

This would have been an NFC commit (for the existing testsuite) if not
for the way that clang describes captured block variables. Based on
how the previous code in LLVM emitted locations, DW_OP_deref
operations that should have come at the end of the expression are put
at its beginning. Fixing this means changing the semantics of
DIExpression, so this patch bumps the version number of DIExpression
and implements a bitcode upgrade.

There are two major changes in this patch:

I had to fix the semantics of dbg.declare for describing function
arguments. After this patch a dbg.declare always takes the *address*
of a variable as the first argument, even if the argument is not an
alloca.

When lowering a DBG_VALUE, the decision of whether to emit a register
location description or a memory location description depends on the
MachineLocation — register machine locations may get promoted to
memory locations based on their DIExpression. (Future) optimization
passes that want to salvage implicit debug location for variables may
do so by appending a DW_OP_stack_value. For example:
  DBG_VALUE, [RBP-8]                        --> DW_OP_fbreg -8
  DBG_VALUE, RAX                            --> DW_OP_reg0 +0
  DBG_VALUE, RAX, DIExpression(DW_OP_deref) --> DW_OP_reg0 +0

All testcases that were modified were regenerated from clang. I also
added source-based testcases for each of these to the debuginfo-tests
repository over the last week to make sure that no synchronized bugs
slip in. The debuginfo-tests compile from source and run the debugger.

https://bugs.llvm.org/show_bug.cgi?id=32382
<rdar://problem/31205000>

Differential Revision: https://reviews.llvm.org/D31439

llvm-svn: 300522
2017-04-18 01:21:53 +00:00
Dehao Chen 1ea8bd8109 Build SymbolMap in SampleProfileLoader to help matchin function names with suffix.
Summary: If there is suffix added in the function name (e.g. module hash added by thinLTO), we will not be able to find a match in profile as the suffix does not exist in profile. This patch build a map from function name to Function *. The map includes the entry for the stripped function name so that inlineHotFunctions can find the corresponding function to promote/inline.

Reviewers: davidxl, dnovillo, tejohnson

Reviewed By: davidxl

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D31952

llvm-svn: 300507
2017-04-17 22:23:05 +00:00
Haicheng Wu 68f82a31d3 Change the testcase tail-merge-after-mbp.ll to tail-merge-after-mbp.mir
Differential Revision: https://reviews.llvm.org/D32037

llvm-svn: 300506
2017-04-17 22:22:38 +00:00
Jacob Gravelle 0bb7541233 [WebAssembly] Fix WebAssemblyOptimizeReturned after r300367
Summary:
Refactoring changed paramHasAttr(1 + i) to paramHasAttr(0), fix that to
paramHasAttr(i).
Add more tests to WebAssemblyOptimizeReturned that catch that
regression.

Reviewers: dschuff

Subscribers: jfb, sbc100, llvm-commits

Differential Revision: https://reviews.llvm.org/D32136

llvm-svn: 300502
2017-04-17 21:40:28 +00:00
Davide Italiano cdc937d0fc [InstCombine] Matchers work with both ConstExpr and Instructions.
So, `cast<Instruction>` is not guaranteed to succeed. Change the
code so that we create a new constant and use it in the newly
created instruction, as it's done in other places in InstCombine.

OK'ed by Sanjay/Craig. Fixes PR32686.

llvm-svn: 300495
2017-04-17 20:49:50 +00:00
Sanjay Patel 9083dc9680 [InstSimplify] add/move tests for (icmp X, C1 & icmp X, C2); NFC
We simplify based on range intersection, but we're missing folds.

llvm-svn: 300493
2017-04-17 20:38:33 +00:00
Dehao Chen 6afb3167e9 Update the test to fix the buildbot failure introduced by r300486 (NFC)
llvm-svn: 300492
2017-04-17 20:35:32 +00:00
Dehao Chen ef700d550e Add GNU_discriminator support for inline callsites in llvm-symbolizer.
Summary: LLVM symbolize cannot recognize GNU_discriminator for inline callsites. This patch adds support for it.

Reviewers: dblaikie

Reviewed By: dblaikie

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32134

llvm-svn: 300486
2017-04-17 20:10:39 +00:00
Matt Arsenault a3566f2149 AMDGPU: Use MachineRegisterInfo to find max used register
Avoid looping through program to determine register counts.
This avoids needing to look at regmask operands.

Also fixes some counting errors with flat_scr when there
are no stack objects.

llvm-svn: 300482
2017-04-17 19:48:30 +00:00
Brendon Cahoon 7769a0854e [CodeGenPrepare] Fix crash due to an invalid CFG
The splitIndirectCriticalEdges function generates and invalid CFG when the
'Target' basic block is a loop to itself. When this occurs, the code that
updates the predecessor terminator needs to update the terminator in the split
basic block.

This occurs when there is an edge from block D back to D. Since D is split in
to D0 and D1, the code needs to update the terminator in D1. But D1 is not in
the OtherPreds vector, so it was not getting updated.

Differential Revision: https://reviews.llvm.org/D32126

llvm-svn: 300480
2017-04-17 19:11:04 +00:00
Tim Northover 46e36f0953 AArch64: put nonlazybind special handling behind a flag for now.
It's basically a terrible idea anyway but objc_msgSend gets emitted like that.
We can decide on a better way to deal with it in the unlikely event that anyone
actually uses it.

llvm-svn: 300474
2017-04-17 18:18:47 +00:00
Konstantin Zhuravlyov dfe2ffe8c5 AMDGPU: Test handling of R_AMDGPU_ABS64 in RelocVisitor
llvm-svn: 300472
2017-04-17 18:12:45 +00:00
Konstantin Zhuravlyov 12096848fd AMDGPU: Set CodePointerSize to 8 for amdgcn
llvm-svn: 300470
2017-04-17 18:02:09 +00:00
Peter Collingbourne a0f371a106 Bitcode: Add a string table to the bitcode format.
Add a top-level STRTAB block containing a string table blob, and start storing
strings for module codes FUNCTION, GLOBALVAR, ALIAS, IFUNC and COMDAT in
the string table.

This change allows us to share names between globals and comdats as well
as between modules, and improves the efficiency of loading bitcode files by
no longer using a bit encoding for symbol names. Once we start writing the
irsymtab to the bitcode file we will also be able to share strings between
it and the module.

On my machine, link time for Chromium for Linux with ThinLTO decreases by
about 7% for no-op incremental builds or about 1% for full builds. Total
bitcode file size decreases by about 3%.

As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2017-April/111732.html

Differential Revision: https://reviews.llvm.org/D31838

llvm-svn: 300464
2017-04-17 17:51:36 +00:00
Tim Northover 879a0b2e1b AArch64: support nonlazybind
It's almost certainly not a good idea to actually use it in most cases (there's
a pretty large code size overhead on AArch64), but we can't do those
experiments until it's supported.

llvm-svn: 300462
2017-04-17 17:27:56 +00:00
Matt Arsenault 7205f3c2e4 AMDGPU: SimplifyDemandedElts for image intrinsics
Causes some VGPR usage improvements in shaderdb, but
introduces some SGPR spilling regressions due to random
scheduling changes later.

llvm-svn: 300453
2017-04-17 15:12:44 +00:00
Max Kazantsev 751579cac0 [LoopPeeling] Get rid of Phis that become invariant after N steps
This patch is a generalization of the improvement introduced in rL296898.
Previously, we were able to peel one iteration of a loop to get rid of a Phi that becomes
an invariant on the 2nd iteration. In more general case, if a Phi becomes invariant after
N iterations, we can peel N times and turn it into invariant.
In order to do this, we for every Phi in loop's header we define the Invariant Depth value
which is calculated as follows:

Given %x = phi <Inputs from above the loop>, ..., [%y, %back.edge].

If %y is a loop invariant, then Depth(%x) = 1.
If %y is a Phi from the loop header, Depth(%x) = Depth(%y) + 1.
Otherwise, Depth(%x) is infinite.
Notice that if we peel a loop, all Phis with Depth = 1 become invariants,
and all other Phis with finite depth decrease the depth by 1.
Thus, peeling N first iterations allows us to turn all Phis with Depth <= N
into invariants.

Reviewers: reames, apilipenko, mkuper, skatkov, anna, sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31613

llvm-svn: 300446
2017-04-17 09:52:02 +00:00
Max Kazantsev 8ed6b66d85 [LoopPeeling] Fix condition for phi-eliminating peeling
When peeling loops basing on phis becoming invariants, we make a wrong loop size check.
UP.Threshold should be compared against the total numbers of instructions after the transformation,
which is equal to 2 * LoopSize in case of peeling one iteration.
We should also check that the maximum allowed number of peeled iterations is not zero.

Reviewers: sanjoy, anna, reames, mkuper

Reviewed By: mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31753

llvm-svn: 300441
2017-04-17 05:38:28 +00:00
Serguei Katkov 2616bbb16d [BPI] Use metadata info before any other heuristics
Metadata potentially is more precise than any heuristics we use, so
it makes sense to use first metadata info if it is available. However it makes
sense to examine it against other strong heuristics like unreachable one.
If edge coming to unreachable block has higher probability then it is expected 
by unreachable heuristic then we use heuristic and remaining probability is
distributed among other reachable blocks equally.

An example where metadata might be more strong then unreachable heuristic is
as follows: it is possible that there are two branches and for the branch A
metadata says that its probability is (0, 2^25). For the branch B
the probability is (1, 2^25).
So the expectation is that first edge of B is hotter than first edge of A
because first edge of A did not executed at least once.
If first edge of A points to the unreachable block then using the unreachable
heuristics we'll set the probability for A to (1, 2^20) and now edge of A
becomes hotter than edge of B.
This is unexpected behavior.

This fixed the biggest part of https://bugs.llvm.org/show_bug.cgi?id=32214

Reviewers: sanjoy, junbuml, vsk, chandlerc

Reviewed By: chandlerc

Subscribers: llvm-commits, reames, davidxl

Differential Revision: https://reviews.llvm.org/D30631

llvm-svn: 300440
2017-04-17 04:33:04 +00:00
Craig Topper 218a359fbd [InstCombine] Simplify 1/X for vectors.
llvm-svn: 300439
2017-04-17 03:41:47 +00:00
Craig Topper eee53c030a [InstCombine] Add test cases for missing support for simplifying 1/X for vectors. NFC
llvm-svn: 300438
2017-04-17 03:41:44 +00:00
Craig Topper 1a18a7c51e [InstCombine] Add support for vector srem->urem.
llvm-svn: 300437
2017-04-17 01:51:24 +00:00
Craig Topper b60f300afb [InstCombine] Add missing testcases for srem->urem conversion. The vector version isn't currently supported. NFC
llvm-svn: 300436
2017-04-17 01:51:21 +00:00
Craig Topper f248468359 [InstCombine] Add support for turning vector sdiv into udiv.
llvm-svn: 300435
2017-04-17 01:51:19 +00:00
Craig Topper 43b012b1b3 [InstCombine] Add test cases for missing support for turning vector sdiv into udiv. NFC
llvm-svn: 300434
2017-04-17 01:51:16 +00:00
Benjamin Kramer f5f593b674 [X86] Remove special handling for 16 bit for A asm constraints.
Our 16 bit support is assembler-only + the terrible hack that is
.code16gcc. Simply using 32 bit registers does the right thing for the
latter.

Fixes PR32681.

llvm-svn: 300429
2017-04-16 20:13:08 +00:00
Michael Zuckerman 16b20d2fc5 [X86][X86 intrinsics]Folding cmp(sub(a,b),0) into cmp(a,b) optimization
This patch adds new optimization (Folding cmp(sub(a,b),0) into cmp(a,b))
to instCombineCall pass and was written specific for X86 CMP intrinsics.

Differential Revision: https://reviews.llvm.org/D31398

llvm-svn: 300422
2017-04-16 13:26:08 +00:00
Dimitry Andric 909b3376ba Use correct registers for "A" inline asm constraint
Summary:
In PR32594, inline assembly using the 'A' constraint on x86_64 causes
llvm to crash with a "Cannot select" stack trace.  This is because
`X86TargetLowering::getRegForInlineAsmConstraint` hardcodes that 'A'
means the EAX and EDX registers.

However, on x86_64 it means the RAX and RDX registers, and on 16-bit x86
(ia16?) it means the old AX and DX registers.

Add new register classes in `X86RegisterInfo.td` to support these cases,
and amend the logic in `getRegForInlineAsmConstraint` to cope with
different subtargets.  Also add a test case, derived from PR32594.

Reviewers: craig.topper, qcolombet, RKSimon, ab

Reviewed By: ab

Subscribers: ab, emaste, royger, llvm-commits

Differential Revision: https://reviews.llvm.org/D31902

llvm-svn: 300404
2017-04-15 22:15:01 +00:00
Sanjay Patel ef9f586bb2 [InstCombine] allow (X != C1 && X != C2) and similar patterns to match splat vector constants
llvm-svn: 300402
2017-04-15 17:55:06 +00:00
Sanjay Patel c8405b82a1 [InstCombine] add tests to show missing transforms for vectors; NFC
llvm-svn: 300401
2017-04-15 17:50:45 +00:00
Sam Clegg 135a4b8ea1 [WebAssembly] Improve readobj and nm support for wasm
Now that the libObect support for wasm is better we can
have readobj and nm produce more useful output too.

Differential Revision: https://reviews.llvm.org/D31514

llvm-svn: 300365
2017-04-14 19:50:44 +00:00
Sanjay Patel 7cfe41659c [InstCombine] (X != C1 && X != C2) --> (X | (C1 ^ C2)) != C2
...when C1 differs from C2 by one bit and C1 <u C2:
http://rise4fun.com/Alive/Vuo

And move related folds to a helper function. This reduces code duplication and
will make it easier to remove the scalar-only restriction as a follow-up step.

llvm-svn: 300364
2017-04-14 19:23:50 +00:00
Craig Topper fb71b7d3e0 [InstCombine] Support folding a subtract with a constant LHS into a phi node
We currently only support folding a subtract into a select but not a PHI. This fixes that.

I had to fix an assumption in FoldOpIntoPhi that assumed the PHI node was always in operand 0. Now we pass it in like we do for FoldOpIntoSelect. But we still require some dancing to find the Constant when we create the BinOp or ConstantExpr. This is based code is similar to what we do for selects.

Since I touched all call sites, this also renames FoldOpIntoPhi to foldOpIntoPhi to match coding standards.

Differential Revision: https://reviews.llvm.org/D31686

llvm-svn: 300363
2017-04-14 19:20:12 +00:00
Stanislav Mekhanoshin eff0bc7839 [AMDGPU] set read_only access qualifier for pointers
If a kernel's pointer argument is known to be readonly
set access qualifier accordingly. This allows RT not to
flush caches before dispatches.

Differential Revision: https://reviews.llvm.org/D32091

llvm-svn: 300362
2017-04-14 19:11:40 +00:00
Sam Clegg dd2d7bf100 [Test commit] Cleanup some whitespace in a test file
llvm-svn: 300361
2017-04-14 18:43:57 +00:00
Craig Topper d61ccd735e [InstCombine] Regenerate test checks using script. NFC
llvm-svn: 300360
2017-04-14 18:42:55 +00:00
Sanjay Patel 9d39a9d860 [InstCombine] add/move tests for and/or-of-icmps equality folds; NFC
llvm-svn: 300357
2017-04-14 18:19:27 +00:00
Alexey Bataev 9c27d79520 Update tests for the patch.
llvm-svn: 300351
2017-04-14 17:47:07 +00:00
Simon Pilgrim 5a22eaa2bf [X86][SSE] Update MOVNTDQA non-temporal loads to generic implementation (LLVM)
MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics.

Clang companion patch: D31766.

Differential Revision: https://reviews.llvm.org/D31767

llvm-svn: 300325
2017-04-14 15:05:35 +00:00
Dmitry Preobrazhensky e6ef099dcd [AMDGPU][MC] Corrected ds_write_src2_* to require one offset instead of two.
Fixed bug 32551: https://bugs.llvm.org//show_bug.cgi?id=32551

Reviewers: vpykhtin

Differential Revision: https://reviews.llvm.org/D31809

llvm-svn: 300319
2017-04-14 12:28:07 +00:00
Dmitry Preobrazhensky 5714860ee4 [AMDGPU][MC] Enabled constants for src operands of s_cbranch_g_fork
Fixed bug 32619: https://bugs.llvm.org//show_bug.cgi?id=32619

Reviewers: artem.tamazov, vpykhtin

Differential Revision: https://reviews.llvm.org/D31973

llvm-svn: 300318
2017-04-14 11:52:26 +00:00
Andrew V. Tischenko 4e7bcd5216 Fix for PR#30562: Selection DAG error: Detected cycle in SelectionDAG.
Patch by Dinar Temirbulatov

llvm-svn: 300314
2017-04-14 09:17:09 +00:00
Andrew V. Tischenko 75745d0c3e This patch closes PR#32216: Better testing of schedule model instruction latencies/throughputs.
The details are here: https://reviews.llvm.org/D30941

llvm-svn: 300311
2017-04-14 07:44:23 +00:00
Peter Collingbourne 8446f1fe6a Object, LTO: Add target triple to irsymtab and LTO API.
Start using it in LLD to avoid needing to read bitcode again just to get the
target triple, and in llvm-lto2 to avoid printing symbol table information
that is inappropriate for the target.

Differential Revision: https://reviews.llvm.org/D32038

llvm-svn: 300300
2017-04-14 02:55:06 +00:00
Daniel Berlin 2f72b19b05 NewGVN: Don't propagate over phi backedges where undef causes us to
have >1 value, unless we can prove the phi node is cycle free.

Fixes PR 32607.

llvm-svn: 300299
2017-04-14 02:53:37 +00:00
Stanislav Mekhanoshin 86b0a5465b [AMDGPU] added SIInstrInfo::getAddNoCarry() helper
Addressed rest of post submit comments from D31993.

Differential Revision: https://reviews.llvm.org/D32057

llvm-svn: 300288
2017-04-14 00:33:44 +00:00
Xinliang David Li 57dea2d359 [Profile] PE binary coverage bug fix
PR/32584

Differential Revision: https://reviews.llvm.org/D32023

llvm-svn: 300277
2017-04-13 23:37:12 +00:00
Adam Nemet c5779460f4 [AArch64] Avoid partial register writes on lane 0 of BUILD_VECTOR for i8/i16/f16
This further improves Ahmed's change in rL299482.  See the new comment for the
rationale.

The patch recovers most of the regression for bzip2 after D31965. We're down
to +2.68% from +6.97%.

Differential Revision: https://reviews.llvm.org/D32028

llvm-svn: 300276
2017-04-13 23:32:47 +00:00
Konstantin Zhuravlyov d24aeb20fc AMDGPU/GFX9: Do not use v_pack_b32_f16 when packing
Differential Revision: https://reviews.llvm.org/D31819

llvm-svn: 300275
2017-04-13 23:17:00 +00:00
Alexei Starovoitov 56db145164 [bpf] Fix memory offset check for loads and stores
If the offset cannot fit into the instruction, an addition to the
pointer is emitted before the actual access. However, BPF offsets are
16-bit but LLVM considers them to be, for the matter of this check,
to be 32-bit long.

This causes the following program:

int bpf_prog1(void *ign)
{

volatile unsigned long t = 0x8983984739ull;
return *(unsigned long *)((0xffffffff8fff0002ull) + t);

}

To generate the following (wrong) code:

0: 18 01 00 00 39 47 98 83 00 00 00 00 89 00 00 00

r1 = 590618314553ll

2: 7b 1a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r1
3: 79 a1 f8 ff 00 00 00 00 r1 = *(u64 *)(r10 - 8)
4: 79 10 02 00 00 00 00 00 r0 = *(u64 *)(r1 + 2)
5: 95 00 00 00 00 00 00 00 exit

Fix it by changing the offset check to 16-bit.

Patch by Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Differential Revision: https://reviews.llvm.org/D32055

llvm-svn: 300269
2017-04-13 22:24:13 +00:00
Zachary Turner 4dc4f01a86 [llvm-pdbdump] Recursively dump class layout.
llvm-svn: 300258
2017-04-13 21:11:00 +00:00
Richard Smith 6c2615177b Revert accidentally-committed files in r300252.
llvm-svn: 300253
2017-04-13 20:31:21 +00:00
Richard Smith 55bd375b69 Remove all allocation and divisions from GreatestCommonDivisor
Switch from Euclid's algorithm to Stein's algorithm for computing GCD. This
avoids the (expensive) APInt division operation in favour of bit operations.
Remove all memory allocation from within the GCD loop by tweaking our `lshr`
implementation so it can operate in-place.

Differential Revision: https://reviews.llvm.org/D31968

llvm-svn: 300252
2017-04-13 20:29:59 +00:00
Reid Kleckner 257cb4e099 [InstCombine] Fix !prof metadata preservation for invokes
Summary:
Bug noticed by inspection.

Extend the test to handle invokes as well as calls, and rewrite it to
not depend on the inliner and other passes.

Also simplify the call site replacement code with CallSite, similar to
what I did to dead arg elimination and arg promotion (rL300235 and
rL300229).

Reviewers: danielcdh, davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32041

llvm-svn: 300251
2017-04-13 20:26:38 +00:00
Dehao Chen 2c7ca9b5df SamplePGO: convert callsite samples map key from callsite_location to callsite_location+callee_name
Summary: For iterative SamplePGO, an indirect call can be speculatively promoted to multiple direct calls and get inlined. All these promoted direct calls will share the same callsite location (offset+discriminator). With the current implementation, we cannot distinguish between different promotion candidates and its inlined instance. This patch adds callee_name to the key of the callsite sample map. And added helper functions to get all inlined callee samples for a given callsite location. This helps the profile annotator promote correct targets and inline it before annotation, and ensures all indirect call targets to be annotated correctly.

Reviewers: davidxl, dnovillo

Reviewed By: davidxl

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D31950

llvm-svn: 300240
2017-04-13 19:52:10 +00:00
Anna Thomas dcdb325fee [LV] Fix the vector code generation for first order recurrence
Summary:
In first order recurrences where phi's are used outside the loop,
we should generate an additional vector.extract of the second last element from
the vectorized phi update.
This is because we require the phi itself (which is the value at the second last
iteration of the vector loop) and not the phi's update within the loop.
Also fix the code gen when we just unroll, but don't vectorize.
Fixes PR32396.

Reviewers: mssimpso, mkuper, anemet

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D31979

llvm-svn: 300238
2017-04-13 18:59:25 +00:00
Sanjay Patel 445d03bf00 [InstCombine] fold X == 0 || X == -1 to one compare (PR32524)
This is effectively a retry of:
https://reviews.llvm.org/rL299851
but now we have tests and an assert to make sure the bug
that was exposed with that attempt will not happen again.

I'll fix the code duplication and missing sibling fold next,
but I want to make this change as small as possible to reduce
risk since I messed it up last time.

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=32524

llvm-svn: 300236
2017-04-13 18:47:06 +00:00
Reid Kleckner 3a1150352d [ArgPromotion] Don't drop !prof metadata on promoted calls
Noticed by inspection while doing attribute work. DAE, InstCombineCalls,
and ArgPromotion have a fair amount of duplicated code for hacking on
call sites, and you can find bugs by comparing them.

Add a test case for this.

llvm-svn: 300229
2017-04-13 18:10:30 +00:00
Stanislav Mekhanoshin d026f79bd3 [AMDGPU] Combine DS operations with offsets bigger than byte
In many cases ds operations can be combined even if offsets do not
fit into 8 bit encoding. What it takes is to adjust base address.

Differential Revision: https://reviews.llvm.org/D31993

llvm-svn: 300227
2017-04-13 17:53:07 +00:00
Brian Gesiak 0a7894d99c [Analysis] Support bitreverse in -demanded-bits pass
Summary:
* Add a bitreverse case in the demanded bits analysis pass.
* Add tests for the bitreverse (and bswap) intrinsic in the
  demanded bits pass.
* Add a test case to the BDCE tests: that manipulations to
  high-order bits are eliminated once the bits are reversed
  and then right-shifted.

Reviewers: mkuper, jmolloy, hfinkel, trentxintong

Reviewed By: jmolloy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31857

llvm-svn: 300215
2017-04-13 16:44:25 +00:00
Tobias Edler von Koch 90df1f48d5 LTO: Pass SF_Executable flag through to InputFile::Symbol
Summary:
The linker needs to be able to determine whether a symbol is text or data to
handle the case of a common being overridden by a strong definition in an
archive. If the archive contains a text member of the same name as the common,
that function is discarded. However, if the archive contains a data member of
the same name, that strong definition overrides the common. This is a behavior
of ld.bfd, which the Qualcomm linker also supports in LTO.

Here's a test case to illustrate:

####

cat > 1.c << \!
int blah;
!

cat > 2.c << \!
int blah() {
  return 0;
}
!

cat > 3.c << \!
int blah = 20;
!

clang -c 1.c
clang -c 2.c
clang -c 3.c

ar cr lib.a 2.o 3.o
ld 1.o lib.a -t

####

The correct output is:

1.o
(lib.a)3.o

Thanks to Shankar Easwaran and Hemant Kulkarni for the test case!

Reviewers: mehdi_amini, rafael, pcc, davide

Reviewed By: pcc

Subscribers: davide, llvm-commits, inglorion

Differential Revision: https://reviews.llvm.org/D31901

llvm-svn: 300205
2017-04-13 16:24:14 +00:00
Krzysztof Parzyszek c87c48019c [Hexagon] Unxfail passing tests
r300198 fixed a problem that caused two tests to be xfailed. Unxfail
these tests now, since they are passing.

llvm-svn: 300203
2017-04-13 16:05:35 +00:00
Sanjay Patel 104e36a0e9 [InstCombine] add/move tests for or-of-icmps; NFC
If we had these tests, the bug caused by https://reviews.llvm.org/rL299851 would have been caught sooner.
There's also an assert in the code that should have caught that bug, but the assert line itself has a bug.

llvm-svn: 300201
2017-04-13 15:46:39 +00:00