Commit Graph

1279 Commits

Author SHA1 Message Date
Nemanja Ivanovic 44513e545f [PowerPC] - Legalize vector types by widening instead of integer promotion
This patch corresponds to review:
http://reviews.llvm.org/D20443

It changes the legalization strategy for illegal vector types from integer
promotion to widening. This only applies for vectors with elements of width
that is a multiple of a byte since we have hardware support for vectors with
1, 2, 3, 8 and 16 byte elements.
Integer promotion for vectors is quite expensive on PPC due to the sequence
of breaking apart the vector, extending the elements and reconstituting the
vector. Two of these operations are expensive.
This patch causes between minor and major improvements in performance on most
benchmarks. There are very few benchmarks whose performance regresses. These
regressions can be handled in a subsequent patch with a DAG combine (similar
to how this patch handles int -> fp conversions of illegal vector types).

llvm-svn: 274535
2016-07-05 09:22:29 +00:00
Rafael Espindola c4cabb8054 Update tests to use at least darwin9.
llvm-svn: 274129
2016-06-29 14:51:10 +00:00
Matthias Braun 6ad3d05b68 MachineScheduler: Fully compare top/bottom candidates
In bidirectional scheduling this gives more stable results than just
comparing the "reason" fields of the top/bottom node because the reason
field may be higher depending on what other nodes are in the queue.

Differential Revision: http://reviews.llvm.org/D19401

llvm-svn: 273755
2016-06-25 00:23:00 +00:00
Rafael Espindola f2898d73a5 Convert test to FileCheck.
llvm-svn: 273609
2016-06-23 20:37:49 +00:00
Artur Pilipenko 80771b9ad9 Upgrade other old memset/memcpy signatures in tests causing buildbot failures with rL273568.
llvm-svn: 273580
2016-06-23 16:34:52 +00:00
Rafael Espindola 928a95d0b0 Use shouldAssumeDSOLocal.
With this it handle -fPIE.

llvm-svn: 273499
2016-06-22 22:09:17 +00:00
Peter Collingbourne 96efdd6107 IR: Introduce local_unnamed_addr attribute.
If a local_unnamed_addr attribute is attached to a global, the address
is known to be insignificant within the module. It is distinct from the
existing unnamed_addr attribute in that it only describes a local property
of the module rather than a global property of the symbol.

This attribute is intended to be used by the code generator and LTO to allow
the linker to decide whether the global needs to be in the symbol table. It is
possible to exclude a global from the symbol table if three things are true:
- This attribute is present on every instance of the global (which means that
  the normal rule that the global must have a unique address can be broken without
  being observable by the program by performing comparisons against the global's
  address)
- The global has linkonce_odr linkage (which means that each linkage unit must have
  its own copy of the global if it requires one, and the copy in each linkage unit
  must be the same)
- It is a constant or a function (which means that the program cannot observe that
  the unique-address rule has been broken by writing to the global)

Although this attribute could in principle be computed from the module
contents, LTO clients (i.e. linkers) will normally need to be able to compute
this property as part of symbol resolution, and it would be inefficient to
materialize every module just to compute it.

See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html
for earlier discussion.

Part of the fix for PR27553.

Differential Revision: http://reviews.llvm.org/D20348

llvm-svn: 272709
2016-06-14 21:01:22 +00:00
Diana Picus bae1d89e45 [SelectionDAG] Remove exit-on-error flag from test (PR27765)
The exit-on-error flag in the ARM test is necessary in order to avoid an
unreachable in the DAGTypeLegalizer, when trying to expand a physical register.
We can also avoid this situation by introducing a bitcast early on, where the
invalid scalar-to-vector conversion is detected.

We also add a test for PowerPC, which goes through a similar code path in the
SelectionDAGBuilder.

Fixes PR27765.

Differential Revision: http://reviews.llvm.org/D21061

llvm-svn: 272644
2016-06-14 07:30:20 +00:00
Strahinja Petrovic f0980e4dc0 This patch fixes handling long double type when it is
constant in soft float mode on PowerPC 32 architecture.

llvm-svn: 272543
2016-06-13 10:29:29 +00:00
Eric Christopher 1dbb23e162 Add aliases for mfvrsave/mtvrsave.
Update a test as we're now going to emit it for easier reading of
generated assembly as well.

llvm-svn: 272339
2016-06-09 23:27:48 +00:00
Ulrich Weigand 6b0634b304 [PowerPC] Support multiple return values with fast isel
Using an LLVM IR aggregate return value type containing three
or more integer values causes an abort in the fast isel pass.

This patch adds two more registers to RetCC_PPC64_ELF_FIS to
allow returning up to four integers with fast isel, just the
same as is currently supported with regular isel (RetCC_PPC).

This is needed for Swift and (possibly) other non-clang frontends.

Fixes PR26190.

llvm-svn: 272005
2016-06-07 12:48:22 +00:00
Geoff Berry c932f533e1 [PowerPC] Run reg2mem on tests to simplify them.
Summary:
Also convert test/CodeGen/PowerPC/vsx-ldst-builtin-le.ll to use
FileCheck instead of two grep and count runs.

This change is needed to avoid spurious diffs in these tests when
EarlyCSE is improved to use MemorySSA and can do more load elimination.

Reviewers: hfinkel

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D20238

llvm-svn: 271553
2016-06-02 18:02:50 +00:00
Keno Fischer 5573483c5b [PPC64] Fix SUBFC8 Defs list
Fix PR27943 "Bad machine code: Using an undefined physical register".
SUBFC8 implicitly defines the CR0 register, but this was omitted in
the instruction definition.

Patch by Jameson Nash <jameson@juliacomputing.com>

Reviewers: hfinkel
Differential Revision: http://reviews.llvm.org/D20802

llvm-svn: 271425
2016-06-01 20:31:07 +00:00
Tim Shen fa57367ae5 Move and add comments to the top for tailcall-string-rvo.ll
Differential Revision: http://reviews.llvm.org/D20311

llvm-svn: 270722
2016-05-25 17:01:09 +00:00
Tim Shen 95e84c5123 [PowerPC] Add a testcase for TCO on string rvo function
Differential Revision: http://reviews.llvm.org/D20311

llvm-svn: 270287
2016-05-20 22:42:01 +00:00
Rafael Espindola 8571aa3d5d Simplify handling of hidden stubs on PowerPC.
We now handle them just like non hidden ones. This was already the case
on x86 (r207518) and arm (r207517).

llvm-svn: 270205
2016-05-20 12:00:52 +00:00
Rafael Espindola cdb2a15d9d Don't pass relocation-model= to tests that don't need it.
Very few things in MC itself use the option. Most of the code that that
uses it could be move to CodeGen.

llvm-svn: 269871
2016-05-18 00:27:17 +00:00
Renato Golin 38ed8021c7 Fix an assert in SelectionDAGBuilder when processing inline asm
When processing inline asm that contains errors, make sure we can recover
gracefully by creating an UNDEF SDValue for the inline asm statement before
returning from SelectionDAGBuilder::visitInlineAsm. This is necessary for
consumers that don't exit on the first error that is emitted (e.g. clang)
and that would assert later on.

Fixes PR24071.

Patch by Diana Picus.

llvm-svn: 269811
2016-05-17 19:52:01 +00:00
Rafael Espindola 9210be5431 Add a test showing how hidden stubs are handled on ppc.
llvm-svn: 269766
2016-05-17 14:24:33 +00:00
Renato Golin 4b9c0d4dcf [llc] New diagnostic handler
Without a diagnostic handler installed, llc's behaviour is to exit on the first
error that it encounters. This is very different from the behaviour of clang
and other front ends, which try to gather as many errors as possible before
exiting.

This commit adds a diagnostic handler to llc, allowing it to find and report
more than one error. The old behaviour is preserved under a flag (-exit-on-error).

Some of the tests fail with the new diagnostic handler, so they have to use the
new flag in order to run under the previous behaviour. Some of these are known
bugs, others need further investigation. Ideally, we should fix the tests and
remove the flag at some point in the future.

Reapplied after fixing the LLDB build that was broken due to the new
DiagnosticSeverity in LLVMContext.h, and fixed an UB in the new change.

Patch by Diana Picus.

llvm-svn: 269655
2016-05-16 14:28:02 +00:00
Renato Golin f4917d35c9 Revert "[llc] New diagnostic handler"
This reverts commit r269563. Even though now it passes all LLDB bots
after a local fix, there's a new buildbot it fails with tests that we
hadn't seen locally:

http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/15647

Adding those tests to the list to investigate.

llvm-svn: 269568
2016-05-14 14:37:11 +00:00
Renato Golin c001e67baf [llc] New diagnostic handler
Without a diagnostic handler installed, llc's behaviour is to exit on the first
error that it encounters. This is very different from the behaviour of clang
and other front ends, which try to gather as many errors as possible before
exiting.

This commit adds a diagnostic handler to llc, allowing it to find and report
more than one error. The old behaviour is preserved under a flag (-exit-on-error).

Some of the tests fail with the new diagnostic handler, so they have to use the
new flag in order to run under the previous behaviour. Some of these are known
bugs, others need further investigation. Ideally, we should fix the tests and
remove the flag at some point in the future.

Reapplied after fixing the LLDB build that was broken due to the new
DiagnosticSeverity in LLVMContext.h.

Patch by Diana Picus.

llvm-svn: 269563
2016-05-14 13:15:22 +00:00
Renato Golin e9fa3585c5 Revert "[llc] New diagnostic handler"
This reverts commit r269428, as it breaks the LLDB build. We need to
understand how to change LLDB in the same way as LLC before landing this
again.

llvm-svn: 269432
2016-05-13 16:02:44 +00:00
Renato Golin d7a64a5b23 [llc] New diagnostic handler
Without a diagnostic handler installed, llc's behaviour is to exit on the first
error that it encounters. This is very different from the behaviour of clang
and other front ends, which try to gather as many errors as possible before
exiting.

This commit adds a diagnostic handler to llc, allowing it to find and report
more than one error. The old behaviour is preserved under a flag (-exit-on-error).

Some of the tests fail with the new diagnostic handler, so they have to use the
new flag in order to run under the previous behaviour. Some of these are known
bugs, others need further investigation. Ideally, we should fix the tests and
remove the flag at some point in the future.

Patch by Diana Picus.

llvm-svn: 269428
2016-05-13 15:37:46 +00:00
Hal Finkel 1fb10e846a [PowerPC] Fix a DAG replacement bug in PPCTargetLowering::DAGCombineExtBoolTrunc
While promoting nodes in PPCTargetLowering::DAGCombineExtBoolTrunc, it is
possible for one of the nodes to be replaced by another. To make sure we do not
visit the deleted nodes, and to make sure we visit the replacement nodes, use a
list of HandleSDNodes to track the to-be-promoted nodes during the promotion
process.

The same fix has been applied to the analogous code in
PPCTargetLowering::DAGCombineTruncBoolExt.

Fixes PR26985.

llvm-svn: 269272
2016-05-12 04:00:56 +00:00
Rafael Espindola 32483a7641 Make "@name =" mandatory for globals in .ll files.
An oddity of the .ll syntax is that the "@var = " in

@var = global i32 42

is optional. Writing just

global i32 42

is equivalent to

@0 = global i32 42

This means that there is a pretty big First set at the top level. The
current implementation maintains it manually. I was trying to refactor
it, but then started wondering why keep it a all. I personally find the
above syntax confusing. It looks like something is missing.

This patch removes the feature and simplifies the parser.

llvm-svn: 269096
2016-05-10 18:22:45 +00:00
Nemanja Ivanovic 6e29baf7f5 [Power9] Add support for -mcpu=pwr9 in the back end
This patch corresponds to review:
http://reviews.llvm.org/D19683

Simply adds the bits for being able to specify -mcpu=pwr9 to the back end.

llvm-svn: 268950
2016-05-09 18:54:58 +00:00
Strahinja Petrovic e682b80b8b [PowerPC] fix register alignment for long double type
This patch fixes register alignment for long double type in
soft float mode. Before this patch alignment was 8 and this
patch changes it to 4.
Differential Revision: http://reviews.llvm.org/D18034

llvm-svn: 268909
2016-05-09 12:27:39 +00:00
Nemanja Ivanovic 1a2b2f03e7 [PowerPC] Generate VSX version of splat word
This patch corresponds to review:
http://reviews.llvm.org/D18592

It allows the PPC back end to generate the xxspltw instruction where we
previously only emitted vspltw.

llvm-svn: 268516
2016-05-04 16:04:02 +00:00
Haicheng Wu 4afe0425db [MBP] Use Function::optForSize() instead of checking OptimizeForSize directly.
Fix a FIXME.  Disable loop alignment if compiled with -Oz now.

llvm-svn: 268121
2016-04-29 22:01:10 +00:00
Guozhi Wei fa3e04298b [PPC] Enable shuffling of VSX vectors
This patch fixes PR27078 by enabling shuffling of vectors if VSX is available.

llvm-svn: 268064
2016-04-29 17:00:54 +00:00
Marcin Koscielnicki 7b32957852 [PowerPC] Fix the EH_SjLj_Setup pseudo.
This instruction is just a control flow marker - it should not
actually exist in the object file.  Unfortunately, nothing catches
it before it gets to AsmPrinter.  If integrated assembler is used,
it's considered to be a normal 4-byte instruction, and emitted as
an all-0 word, crashing the program.  With external assembler,
a comment is emitted.

Fixed by setting Size to 0 and handling it in MCCodeEmitter - this
means the comment will still be emitted if integrated assembler
is not used.

This broke an ASan test, which has been disabled for a long time
as a result (see the discussion on D19657).  We can reenable it
once this lands.

llvm-svn: 267943
2016-04-28 21:24:37 +00:00
Chuang-Yu Cheng 8676c3d599 [ppc64] fix bug in prologue that mfocrf's cr operand should be explict state instead of implicit
This fixes PR27414

Reviewers: kbarton mgrang tjablin

http://reviews.llvm.org/D19255

llvm-svn: 267660
2016-04-27 02:59:28 +00:00
Marcin Koscielnicki 0cfb612413 [PowerPC] Add support for llvm.thread.pointer
Differential Revision: http://reviews.llvm.org/D19304

llvm-svn: 267546
2016-04-26 10:37:22 +00:00
Marcin Koscielnicki 1c1af6ef77 [PR27390] [CodeGen] Reject indexed loads in CombinerDAG.
visitAND, when folding and (load) forgets to check which output of
an indexed load is involved, happily folding the updated address
output on the following testcase:

target datalayout = "e-m:e-i64:64-n32:64"
target triple = "powerpc64le-unknown-linux-gnu"

%typ = type { i32, i32 }

define signext i32 @_Z8access_pP1Tc(%typ* %p, i8 zeroext %type) {
  %b = getelementptr inbounds %typ, %typ* %p, i64 0, i32 1
  %1 = load i32, i32* %b, align 4
  %2 = ptrtoint i32* %b to i64
  %3 = and i64 %2, -35184372088833
  %4 = inttoptr i64 %3 to i32*
  %_msld = load i32, i32* %4, align 4
  %zzz = add i32 %1,  %_msld
  ret i32 %zzz
}

Fix this by checking ResNo.

I've found a few more places that currently neglect to check for
indexed load, and tightened them up as well, but I don't have test
cases for them.  In fact, they might not be triggerable at all,
at least with current targets.  Still, better safe than sorry.

Differential Revision: http://reviews.llvm.org/D19202

llvm-svn: 267420
2016-04-25 15:43:44 +00:00
Marcin Koscielnicki a44d44cb2e [PowerPC] [PR27387] Disallow r0 for ADD8TLS.
ADD8TLS, a variant of add instruction used for initial-exec TLS,
currently accepts r0 as a source register.  While add itself supports
r0 just fine, linker can relax it to a local-exec sequence, converting
it to addi - which doesn't support r0.

Differential Revision: http://reviews.llvm.org/D19193

llvm-svn: 267388
2016-04-25 09:24:34 +00:00
Sanjay Patel 95590f4b7b use FileCheck; add test for disguised fabs
llvm-svn: 267051
2016-04-21 20:58:58 +00:00
Marcin Koscielnicki 48d72342ff [PowerPC] [SSP] Fix stack guard load for 32-bit.
r266809 incorrectly used LD to load the stack guard, it should be LWZ.

Differential Revision: http://reviews.llvm.org/D19358

llvm-svn: 267017
2016-04-21 17:36:05 +00:00
Mandeep Singh Grang 029a0567fa [LLVM] Remove unwanted --check-prefix=CHECK from unit tests. NFC.
Summary: Removed unwanted --check-prefix=CHECK from numerous unit tests.

Reviewers: t.p.northover, dblaikie, uweigand, MatzeB, tstellarAMD, mcrosier

Subscribers: mcrosier, dsanders

Differential Revision: http://reviews.llvm.org/D19279

llvm-svn: 266834
2016-04-19 23:51:52 +00:00
Tim Shen a1d8bc5597 [PPC, SSP] Support PowerPC Linux stack protection.
llvm-svn: 266809
2016-04-19 20:14:52 +00:00
Strahinja Petrovic c91b45a31a [PowerPC] add comment to test
Added comment in test for soft-float operations on ppc architecture.
Test commit.

llvm-svn: 266600
2016-04-18 11:52:14 +00:00
Adrian Prantl 75819aedf6 [PR27284] Reverse the ownership between DICompileUnit and DISubprogram.
Currently each Function points to a DISubprogram and DISubprogram has a
scope field. For member functions the scope is a DICompositeType. DIScopes
point to the DICompileUnit to facilitate type uniquing.

Distinct DISubprograms (with isDefinition: true) are not part of the type
hierarchy and cannot be uniqued. This change removes the subprograms
list from DICompileUnit and instead adds a pointer to the owning compile
unit to distinct DISubprograms. This would make it easy for ThinLTO to
strip unneeded DISubprograms and their transitively referenced debug info.

Motivation
----------

Materializing DISubprograms is currently the most expensive operation when
doing a ThinLTO build of clang.

We want the DISubprogram to be stored in a separate Bitcode block (or the
same block as the function body) so we can avoid having to expensively
deserialize all DISubprograms together with the global metadata. If a
function has been inlined into another subprogram we need to store a
reference the block containing the inlined subprogram.

Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script
that updates LLVM IR testcases to the new format.

http://reviews.llvm.org/D19034
<rdar://problem/25256815>

llvm-svn: 266446
2016-04-15 15:57:41 +00:00
Nirav Dave 1f51c334ca Fix typing on generated LXV2DX/STXV2DX instructions
[PPC] Previously when casting generic loads to LXV2DX/ST instructions we
would leave the original load return type in place allowing for an
assertion failure when we merge two equivalent LXV2DX nodes with
different types.

This fixes PR27350.

Reviewers: nemanjai

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19133

llvm-svn: 266438
2016-04-15 15:01:38 +00:00
Sanjay Patel 748b06514a [ppc] add tests to show potential andc optimization
llvm-svn: 266261
2016-04-13 23:23:30 +00:00
Chuang-Yu Cheng 94f58e79ae [PPC64] Mark CR0 Live if PPCInstrInfo::optimizeCompareInstr Creates a Use of CR0
Resolve Bug 27046 (https://llvm.org/bugs/show_bug.cgi?id=27046).
The PPCInstrInfo::optimizeCompareInstr function could create a new use of
CR0, even if CR0 were previously dead. This patch marks CR0 live if a use of
CR0 is created.

Author: Tom Jablin (tjablin)
Reviewers: hfinkel kbarton cycheng

http://reviews.llvm.org/D18884

llvm-svn: 266040
2016-04-12 03:10:52 +00:00
Chuang-Yu Cheng 6efde2fb45 [PPC64] Use mfocrf in prologue when we only need to save 1 nonvolatile CR field
In the ELFv2 ABI, we are not required to save all CR fields. If only one
nonvolatile CR field is clobbered, use mfocrf instead of mfcr to
selectively save the field, because mfocrf has short latency compares to
mfcr.

Thanks Nemanja's invaluable hint!
Reviewers: nemanjai tjablin hfinkel kbarton

http://reviews.llvm.org/D17749

llvm-svn: 266038
2016-04-12 03:04:44 +00:00
Nirav Dave 66f485f4e2 Fix Load Control Dependence in MemCpy Generation
In Memcpy lowering we had missed a dependence from the load of the
operation to successor operations. This causes us to potentially
construct an in initial DAG with a memory dependence not fully
represented in the chain sub-DAG but rather require looking at the
entire DAG breaking alias analysis by allowing incorrect repositioning
of memory operations.

To work around this, r200033 changed DAGCombiner::GatherAllAliases to be
conservative if any possible issues to happen. Unfortunately this check
forbade many non-problematic situations as well. For example, it's
common for incoming argument lowering to add a non-aliasing load hanging
off of EntryNode. Then, if GatherAllAliases visited EntryNode, it would
find that other (unvisited) use of the EntryNode chain, and just give up
entirely. Furthermore, the check was incomplete: it would not actually
detect all such potentially problematic DAG constructions, because
GatherAllAliases did not guarantee to visit all chain nodes going up to
the root EntryNode. This is in general fine -- giving up early will just
miss a potential optimization, not generate incorrect results. But, for
this non-chain dependency detection code, it's possible that you could
have a load attached to a higher-up chain node than any which were
visited. If that load aliases your store, but the only dependency is
through the value operand of a non-aliasing store, it would've been
missed by this code, and potentially reordered.

With the dependence added, this check can be removed and Alias Analysis
can be much more aggressive. This fixes code quality regression in the
Consecutive Store Merge cleanup (D14834).

Test Change:

ppc64-align-long-double.ll now may see multiple serializations
of its stores

Differential Revision: http://reviews.llvm.org/D18062

llvm-svn: 265836
2016-04-08 19:44:40 +00:00
Chuang-Yu Cheng 98c1894755 CXX_FAST_TLS calling convention: performance improvement for PPC64
This is the same change on PPC64 as r255821 on AArch64. I have even borrowed
his commit message.

The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.

We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.

Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.

We add CSRsViaCopy, it will be explicitly handled during lowering.

1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
   supports it for the given machine function and the function has only return
   exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
   virtual registers at beginning of the entry block and copies from virtual
   registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.

Author: Tom Jablin (tjablin)
Reviewers: hfinkel kbarton cycheng

http://reviews.llvm.org/D17533

llvm-svn: 265781
2016-04-08 12:04:32 +00:00
Amaury Sechet c53ad4f3b2 Do not select EhPad BB in MachineBlockPlacement when there is regular BB to schedule
Summary:
EHPad BB are not entered the classic way and therefor do not need to be placed after their predecessors. This patch make sure EHPad BB are not chosen amongst successors to form chains, and are selected as last resort when selecting the best candidate.

EHPad are scheduled in reverse probability order in order to have them flow into each others naturally.

Reviewers: chandlerc, majnemer, rafael, MatzeB, escha, silvas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17625

llvm-svn: 265726
2016-04-07 21:29:39 +00:00
Ehsan Amiri 4701a91e59 [PPC] Enable transformations in PPCPassConfig::addIRPasses at O2
http://reviews.llvm.org/D18562

A large number of testcases has been modified so they pass after this test.
One testcase is deleted, because I realized even after undoing the original
change that was committed with this testcase, the testcase still passes. So
I removed it. The change to one other testcase (test/CodeGen/PowerPC/pr25802.ll)
is an arbitrary change to keep it passing. Given the original intention of the
testcase, and the fact that fixing it will require some time to change the testcase,
we concluded that this quick change will be enough.

llvm-svn: 265683
2016-04-07 15:30:55 +00:00
Ehsan Amiri 322eca3849 [PPC] Use VSX/FP Facility integer load when an integer load's only users are conversion to FP
http://reviews.llvm.org/D18405

When the integer value loaded is never used directly as integer we should use VSX 
or Floating Point Facility integer loads and avoid extra direct move

llvm-svn: 265593
2016-04-06 20:12:29 +00:00
Chuang-Yu Cheng 6e1408a891 [ppc64] Temporary disable sibling call optimization on ppc64 due to breaking test case
r265506 breaks print-stack-trace.cc test case of compiler-rt in bootstrap
test.

http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/1708

llvm-svn: 265528
2016-04-06 10:48:36 +00:00
Chuang-Yu Cheng 2e5973ef74 [ppc64] Enable sibling call optimization on ppc64 ELFv1/ELFv2 abi
This patch enable sibling call optimization on ppc64 ELFv1/ELFv2 abi, and
add a couple of test cases. This patch also passed llvm/clang bootstrap
test, and spec2006 build/run/result validation.

Original issue: https://llvm.org/bugs/show_bug.cgi?id=25617

Great thanks to Tom's (tjablin) help, he contributed a lot to this patch.
Thanks Hal and Kit's invaluable opinions!

Reviewers: hfinkel kbarton

http://reviews.llvm.org/D16315

llvm-svn: 265506
2016-04-06 02:04:38 +00:00
Matthias Braun 571e3481e7 test: Always treat .mir files as tests even outside of CodeGen/MIR
We missed a handful of .mir tests that existed outside the
test/CodeGen/MIR directory.

Also fix the three powerpc .mir tests that nobody noticed were broken.

llvm-svn: 265350
2016-04-04 21:23:44 +00:00
Chuang-Yu Cheng 35c6181982 Fix Sub-register Rewriting in Aggressive Anti-Dependence Breaker
Previously, HandleLastUse would delete RegRef information for sub-registers
if they were dead even if their corresponding super-register were still live.

If the super-register were later renamed, then the definitions of the
sub-register would not be updated appropriately. This patch alters the
behavior so that RegInfo information for sub-registers is only deleted when
the sub-register and super-register are both dead.

This resolves PR26775. This is the mirror image of Hal's r227311 commit.

Author: Tom Jablin (tjablin)
Reviewers: kbarton uweigand nemanjai hfinkel

http://reviews.llvm.org/D18448

llvm-svn: 265097
2016-04-01 02:05:29 +00:00
Adrian Prantl b8089516a5 testcase gardening: update the emissionKind enum to the new syntax. (NFC)
llvm-svn: 265081
2016-04-01 00:16:49 +00:00
Adrian Prantl b939a25707 Move the DebugEmissionKind enum from DIBuilder into DICompileUnit.
This mostly cosmetic patch moves the DebugEmissionKind enum from DIBuilder
into DICompileUnit. DIBuilder is not the right place for this enum to live
in — a metadata consumer should not have to include DIBuilder.h.
I also added a Verifier check that checks that the emission kind of a
DICompileUnit is actually legal.

http://reviews.llvm.org/D18612
<rdar://problem/25427165>

llvm-svn: 265077
2016-03-31 23:56:58 +00:00
Tim Shen 2e24d0c0c1 Move asm-printer-topological-order.ll to PowerPC backend
llvm-svn: 265067
2016-03-31 22:32:10 +00:00
Hal Finkel 0f02ccd955 [PowerPC] Cleanup test/CodeGen/PowerPC/qpx-load-splat.ll
Removing unnecessary attributes and metadata...

llvm-svn: 265049
2016-03-31 20:45:00 +00:00
Hal Finkel fc35391f2b [PowerPC] Add a late MI-level pass for QPX load/splat simplification
Chapter 3 of the QPX manual states that, "Scalar floating-point load
instructions, defined in the Power ISA, cause a replication of the source data
across all elements of the target register." Thus, if we have a load followed
by a QPX splat (from the first lane), the splat is redundant. This adds a late
MI-level pass to remove the redundant splats in some of these cases
(specifically when both occur in the same basic block).

This optimization is scheduled just prior to post-RA scheduling. It can't happen
before anything that might replace the load with some already-computed quantity
(i.e. store-to-load forwarding).

llvm-svn: 265047
2016-03-31 20:39:41 +00:00
Ulrich Weigand 6ad762d36f [PowerPC] Attempt to fix fast-isel-i64offset.ll failure
The test case added in r265023 is failing on ninja-x64-msvc-RA-centos6.
Update the test to make less specific assumptions on code generation.

llvm-svn: 265026
2016-03-31 16:38:57 +00:00
Ulrich Weigand 3707ba8030 [PowerPC] Correctly compute 64-bit offsets in fast isel
PPCSimplifyAddress contains this code:

  IntegerType *OffsetTy = ((VT == MVT::i32) ? Type::getInt32Ty(*Context)
                                            : Type::getInt64Ty(*Context));

to determine the type to be used for an index register, if one needs
to be created.  However, the "VT" here is the type of the data being
loaded or stored, *not* the type of an address.  This means that if
a data element of type i32 is accessed using an index that does not
not fit into 32 bits, a wrong address is computed here.

Note that PPCFastISel is only ever used on 64-bit currently, so the type
of an address is actually *always* MVT::i64.  Other parts of the code,
even in this same PPCSimplifyAddress routine, already rely on that fact.
Thus, this patch changes the code to simply unconditionally use
Type::getInt64Ty(*Context) as OffsetTy.

llvm-svn: 265023
2016-03-31 15:37:06 +00:00
Ulrich Weigand 1931b01a64 [PowerPC] Remove incorrect use of COPY_TO_REGCLASS in fast isel
The fast isel pass currently emits a COPY_TO_REGCLASS node to convert
from a F4RC to a F8RC register class during conversion of a
floating-point number to integer. There is actually no support in the
common code instruction printers to emit COPY_TO_REGCLASS nodes, so the
PowerPC back-end has special code there to simply ignore
COPY_TO_REGCLASS.

This is correct *if and only if* the source and destination registers of
COPY_TO_REGCLASS are the same (except for the different register class).
But nothing guarantees this to be the case, and if the register
allocator does end up allocating source and destination to different
registers after all, the back-end simply generates incorrect code. I've
included a test case that shows such incorrect code generation.

However, it seems that COPY_TO_REGCLASS is actually not intended to be
used at the MI layer at all. It is used during SelectionDAG, but always
lowered to a plain COPY before emitting MI. Other back-end's fast isel
passes never emit COPY_TO_REGCLASS at all. I suspect it is simply wrong
for the PowerPC back-end to emit it here.

This patch changes the PowerPC back-end to directly emit COPY instead of
COPY_TO_REGCLASS and removes the special handling in the instruction
printers.

Differential Revision: http://reviews.llvm.org/D18605

llvm-svn: 265020
2016-03-31 14:44:50 +00:00
Hal Finkel 851b33a0b1 [PowerPC] Load two floats directly instead of using one 64-bit integer load
When dealing with complex<float>, and similar structures with two
single-precision floating-point numbers, especially when such things are being
passed around by value, we'll sometimes end up loading both float values by
extracting them from one 64-bit integer load. It looks like this:

  t13: i64,ch = load<LD8[%ref.tmp]> t0, t6, undef:i64
      t16: i64 = srl t13, Constant:i32<32>
    t17: i32 = truncate t16
  t18: f32 = bitcast t17
    t19: i32 = truncate t13
  t20: f32 = bitcast t19

The problem, especially before the P8 where those bitcasts aren't legal (and
get expanded via the stack), is that it would have been better to use two
floating-point loads directly. Here we add a target-specific DAGCombine to do
just that. In short, we turn:

	ld 3, 0(5)
	stw 3, -8(1)
	rldicl 3, 3, 32, 32
	stw 3, -4(1)
	lfs 3, -4(1)
	lfs 0, -8(1)

into:

        lfs 3, 4(5)
        lfs 0, 0(5)

llvm-svn: 264988
2016-03-31 02:56:05 +00:00
Hal Finkel fa7057a415 [PowerPC] Refactor popcnt[dw] target features
Instead of using two feature bits, one to indicate the availability of the
popcnt[dw] instructions, and another to indicate whether or not they're fast,
use a single enum. This allows more consistent control via target attribute
strings, and via Clang's command line.

llvm-svn: 264690
2016-03-29 01:36:01 +00:00
Kyle Butt 5e241b11ed [Codegen] Decrease minimum jump table density.
Minimum density for both optsize and non optsize are now options
-sparse-jump-table-density (default 10) for non optsize functions
-dense-jump-table-density (default 40) for optsize functions, which
matches the current default. This improves several benchmarks at google
at the cost of a small codesize increase. For code compiled with -Os,
the old behavior continues

llvm-svn: 264689
2016-03-29 00:23:41 +00:00
Hal Finkel 7059d41622 [PowerPC] On the A2, popcnt[dw] are very slow
The A2 cores support the popcntw/popcntd instructions, but they're microcoded,
and slower than our default software emulation. Specifically, popcnt[dw] take
approximately 74 cycles, whereas our software emulation takes only 24-28
cycles.

I've added a new target feature to indicate a slow popcnt[dw], instead of just
removing the existing target feature from the a2/a2q processor models, because:
  1. This allows us to return more accurate information via the TTI interface
     (I recognize that this currently makes no practical difference)
  2. Is hopefully easier to understand (it allows the core's features to match
     its manual while still having the desired effect).

llvm-svn: 264600
2016-03-28 17:52:08 +00:00
Hal Finkel 0b37175ca6 [PowerPC] Map max/minnum intrinsics and fmax/fmin to ISD nodes for CTR-based loop legality
Intrinsic::maxnum and Intrinsic::minnum, along with the associated libc
function calls (fmax[f], etc.) generally map to function calls after lowering.
For some vector types with QPX at least, however, we can legally lower these,
and we don't need to prohibit CTR-based loops on their account.

It turned out, however, that the logic that checked the opcodes associated with
intrinsics was broken (it would set the Opcode variable, but that variable was
later checked only if set for some otherwise-external function call.

This fixes the latter problem and adds the FMAX/MINNUM mappings.

llvm-svn: 264532
2016-03-27 05:40:56 +00:00
David Majnemer b549ab02b4 [PowerPC] Disable the CTR optimization in the presence of {min,max}num
The minnum and maxnum intrinsics get lowered to libcalls which
invalidates the CTR optimization.

This fixes PR27083.

llvm-svn: 264508
2016-03-26 09:42:31 +00:00
Eric Christopher b979d51afa Finish the incomplete 'd' inline asm constraint support for PPC by
making sure we give it a register and mark it as a register constraint.

llvm-svn: 264340
2016-03-24 21:04:52 +00:00
Eric Christopher 8c95d53d45 Reorder check lines, comments in test and remove unnecessary IR.
llvm-svn: 264339
2016-03-24 21:04:47 +00:00
Nemanja Ivanovic 5ebc92dbe1 [PowerPC] Disable direct moves for extractelement and bitcast in 32-bit mode
This patch corresponds to review:
http://reviews.llvm.org/D17711

It disables direct moves on these operations in 32-bit mode since the patterns
assume 64-bit registers. The final patch is slightly different from the
Phabricator review as the bitcast operations needed to be disabled in 32-bit
mode as well. This fixes PR26617.

llvm-svn: 264282
2016-03-24 13:40:33 +00:00
Kyle Butt 613112826e Codegen: [PPC] Word Rotates are Zero Extending.
Add Word rotates to the list of instructions that are zero extending.
This allows them to be used in dot form to compare with zero.

llvm-svn: 264183
2016-03-23 19:51:22 +00:00
Tim Shen 5cdf75084a [PPC, FastISel] Fix ordered/unordered fcmp
For fcmp, major concern about the following 6 cases is NaN result. The
comparison result consists of 4 bits, indicating lt, eq, gt and un (unordered),
only one of which will be set. The result is generated by fcmpu
instruction. However, bc instruction only inspects one of the first 3
bits, so when un is set, bc instruction may jump to to an undesired
place.

More specifically, if we expect an unordered comparison and un is set, we
expect to always go to true branch; in such case UEQ, UGT and ULT still
give false, which are undesired; but UNE, UGE, ULE happen to give true,
since they are tested by inspecting !eq, !lt, !gt, respectively.

Similarly, for ordered comparison, when un is set, we always expect the
result to be false. In such case OGT, OLT and OEQ is good, since they are
actually testing GT, LT, and EQ respectively, which are false. OGE, OLE
and ONE are tested through !lt, !gt and !eq, and these are true.

llvm-svn: 263753
2016-03-17 22:27:58 +00:00
Petar Jovanovic 0b44f24033 [PowerPC] Disable CTR loops optimization for soft float operations
This patch prevents CTR loops optimization when using soft float operations
inside loop body. Soft float operations use function calls, but function
calls are not allowed inside CTR optimized loops.

Patch by Aleksandar Beserminji.

Differential Revision: http://reviews.llvm.org/D17600

llvm-svn: 263727
2016-03-17 17:11:33 +00:00
Nemanja Ivanovic bd56e4e25a Fix for PR 26378
This patch corresponds to review:
http://reviews.llvm.org/D17712

We were not clearing the TOC vector in PPCAsmPrinter when initializing it. This
caused duplicate definition asserts when the pass is reused on the module
(i.e. with -compile-twice or in JIT contexts).

llvm-svn: 263338
2016-03-12 10:23:07 +00:00
Kit Barton a1d6a6f1de [PPC] backend changes to generate xvabs[s,d]p and xvnabs[s,d]p instructions
This has to be committed before the FE changes

Phabricator: http://reviews.llvm.org/D17837
llvm-svn: 263035
2016-03-09 17:48:01 +00:00
Tim Shen 6e676a84ad [PPCVSXFMAMutate] Temporarily disable this pass
llvm-svn: 262573
2016-03-03 01:27:35 +00:00
Nemanja Ivanovic 1a5706ca1b Fix for PR26180
Corresponds to Phabricator review:
http://reviews.llvm.org/D16592

This fix includes both an update to how we handle the "generic" CPU on LE
systems as well as Anton's fix for the Fast Isel issue.

llvm-svn: 262233
2016-02-29 16:42:27 +00:00
Kit Barton 915c5ecee1 [PPC] Legalize FNEG on PPC when possible
Currently we always expand ISD::FNEG. For v4f32 and v2f64 vector types VSX has
native support for this opcode

Phabricator: http://reviews.llvm.org/D17647
llvm-svn: 262079
2016-02-26 21:59:44 +00:00
Paul Robinson 51fa0a87c3 Fix tests that used CHECK-NEXT-NOT and CHECK-DAG-NOT.
FileCheck actually doesn't support combo suffixes.

Differential Revision: http://reviews.llvm.org/D17588

llvm-svn: 262054
2016-02-26 19:40:34 +00:00
Nemanja Ivanovic a8ef3c9b86 Fix for PR26690 take 2
This is what was meant to be in the initial commit to fix this bug. The
parens were missing. This commit also adds a test case for the bug and
has undergone full testing on PPC and X86.

llvm-svn: 261546
2016-02-22 18:04:00 +00:00
Justin Lebar c75d566f56 When printing MIR, output to errs() rather than outs().
Summary:
Without this, this command

  $ llvm-run llc -stop-after machine-cp -o - <( echo '' )

outputs an error, because we close stdout twice -- once when closing the
file opened for "-o", and again when closing outs().

Also clarify in the outs() definition that you can't ever call it if you
want to open your own raw_fd_ostream on stdout.

Reviewers: jroelofs, tstellarAMD

Subscribers: jholewinski, qcolombet, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D17422

llvm-svn: 261286
2016-02-19 00:18:46 +00:00
Nemanja Ivanovic d05e072b74 Add the missing test case for PR26193
llvm-svn: 259888
2016-02-05 15:03:17 +00:00
Nemanja Ivanovic b6fdce4ca0 Fix for PR 26356
Using the load immediate only when the immediate (whether signed or unsigned)
can fit in a 16-bit signed field. Namely, from -32768 to 32767 for signed and
0 to 65535 for unsigned. This patch also ensures that we sign-extend under the
right conditions.

llvm-svn: 259840
2016-02-04 23:14:42 +00:00
Nemanja Ivanovic 220b4fe4a9 Provide a test case for rl259798
llvm-svn: 259835
2016-02-04 22:36:10 +00:00
Renato Golin c455e2f441 [PPC] Move PPC test to a PPC-specific dir
llvm-svn: 259797
2016-02-04 16:14:59 +00:00
Nemanja Ivanovic 155402c9c2 Test case for PR 26381
llvm-svn: 259740
2016-02-04 01:58:20 +00:00
Tim Shen f99f0d5a7e [SelectionDAG] Fix CombineToPreIndexedLoadStore O(n^2) behavior
This patch consists of two parts: a performance fix in DAGCombiner.cpp
and a correctness fix in SelectionDAG.cpp.

The test case tests the bug that's uncovered by the performance fix, and
fixed by the correctness fix.

The performance fix keeps the containers required by the
hasPredecessorHelper (which is a lazy DFS) and reuse them. Since
hasPredecessorHelper is called in a loop, the overall efficiency reduced
from O(n^2) to O(n), where n is the number of SDNodes.

The correctness fix keeps iterating the neighbor list even if it's time
to early return. It will return after finishing adding all neighbors to
Worklist, so that no neighbors are discarded due to the original early
return.

llvm-svn: 259691
2016-02-03 20:58:55 +00:00
Jonas Paulsson ac29f01788 [ScheduleDAGInstrs::buildSchedGraph()] Handling of memory dependecies rewritten.
Recommited, after some fixing with test cases.

Updated test cases:
test/CodeGen/AArch64/arm64-misched-memdep-bug.ll
test/CodeGen/AArch64/tailcall_misched_graph.ll

Temporarily disabled test cases:
test/CodeGen/AMDGPU/split-vector-memoperand-offsets.ll
test/CodeGen/PowerPC/ppc64-fastcc.ll (partially updated)
test/CodeGen/PowerPC/vsx-fma-m.ll
test/CodeGen/PowerPC/vsx-fma-sp.ll

http://reviews.llvm.org/D8705
Reviewers: Hal Finkel, Andy Trick.

llvm-svn: 259673
2016-02-03 17:52:29 +00:00
Kyle Butt d62d8b771d Codegen: [PPC] Fix PPCVSXFMAMutate to handle duplicates.
The purpose of PPCVSXFMAMutate is to elide copies by changing FMA forms
on PPC.

    %vreg6<def> = COPY %vreg96
    %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg7
    ;v6 = v6 + v5 * v7

is replaced by

    %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg7, %vreg96
    ;v5 = v5 * v7 + v96

This was broken in the case where the target register was also used as a
multiplicand. Fix this case by checking for it and replacing both uses
with the copied register.

    %vreg6<def> = COPY %vreg96
    %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg6
    ;v6 = v6 + v5 * v6

is replaced by

    %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg96, %vreg96
    ;v5 = v5 * v96 + v96

llvm-svn: 259617
2016-02-03 01:41:09 +00:00
Eric Christopher 5a2429e239 Since LI/LIS sign extend the constant passed into the instruction we should
check that the sign extended constant fits into 16-bits if we want a
zero extended value, otherwise go ahead and put it together piecemeal.

Fixes PR26356.

llvm-svn: 259177
2016-01-29 07:20:01 +00:00
David Majnemer bff6b581e2 Address buildbot fallout from r259065
llvm-svn: 259074
2016-01-28 18:59:04 +00:00
Dan Gohman 61d15ae4f5 [MC] Use .p2align instead of .align
For historic reasons, the behavior of .align differs between targets.
Fortunately, there are alternatives, .p2align and .balign, which make the
interpretation of the parameter explicit, and which behave consistently across
targets.

This patch teaches MC to use .p2align instead of .align, so that people reading
code for multiple architectures don't have to remember which way each platform
does its .align directive.

Differential Revision: http://reviews.llvm.org/D16549

llvm-svn: 258750
2016-01-26 00:03:25 +00:00
Ulrich Weigand 46ff7ec317 [PowerPC] Fix large code model with the ELFv2 ABI
The global entry point prologue currently assumes that the TOC
associated with a function is less than 2GB away from the function
entry point.  This is always true when using the medium or small
code model, but may not be the case when using the large code model.

This patch adds a new variant of the ELFv2 global entry point prologue
that lifts the 2GB restriction when building with -mcmodel=large.
This works by emitting a quadword containing the distance from the
function entry point to its associated TOC immediately before the
entry point, and then using a prologue like:

ld r2,-8(r12)
add r2,r2,r12

Since creation of the entry point prologue is now split across two
separate routines (PPCLinuxAsmPrinter::EmitFunctionEntryLabel emits
the data word, PPCLinuxAsmPrinter::EmitFunctionBodyStart the prolog
code), I've switched to using named labels instead of just temporaries
to indicate the locations of the global and local entry points and the
new TOC offset data word.

These names are provided by new routines in PPCFunctionInfo modeled
after the existing PPCFunctionInfo::getPICOffsetSymbol.

Note that a corresponding change was committed to GCC here:
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00355.html

Reviewers: hfinkel

Differential Revision: http://reviews.llvm.org/D15500

llvm-svn: 257597
2016-01-13 13:12:23 +00:00
Kyle Butt cec40806f1 Codegen: [PPC] Handle weighted comparisons when inserting selects.
Only non-weighted predicates were handled in PPCInstrInfo::insertSelect. Handle
the weighted predicates as well.

This latent bug was triggered by r255398, because it added use of the
branch-weighted predicates.

While here, switch over an enum instead of an int to get the compiler to enforce
totality in the future.

llvm-svn: 257518
2016-01-12 21:00:43 +00:00
Rafael Espindola 36a425b618 Remove a bugs assert.
There is no reason the value being printed has to be positive.
Fixes pr25802.

llvm-svn: 257412
2016-01-11 23:21:45 +00:00
Kyle Butt bfcff3856a Add call sequence start and end for __tls_get_addr
This is a fix for bug http://llvm.org/bugs/show_bug.cgi?id=25839.

For a PIC TLS variable access in a function, prologue (mflr followed by std and
stdu) gets scheduled after a tls_get_addr call. tls_get_addr messed up LR but
no one saves/restores it.

Also added a test for save/restore clobbered registers during calling __tls_get_addr.

Patch by Tim Shen

llvm-svn: 257137
2016-01-08 02:06:19 +00:00
Nemanja Ivanovic 8922476bcb Bitcasts between FP and INT values using direct moves
This patch corresponds to review:
http://reviews.llvm.org/D15286

This patch was meant to land in revision 255246, but I accidentally uploaded
the patch that corresponds to http://reviews.llvm.org/D15372 in that revision
accidentally.

Thereby, this patch is the actual Bitcasts using direct moves patch, whereas
http://reviews.llvm.org/rL255246 actually corresponds to
http://reviews.llvm.org/D15372.

llvm-svn: 255649
2015-12-15 14:50:34 +00:00
Petar Jovanovic 280f7101e8 [Power PC] llvm soft float support for ppc32
This is the second in a set of patches for soft float support for ppc32,
it enables soft float operations.

Patch by Strahinja Petrovic.

Differential Revision: http://reviews.llvm.org/D13700

llvm-svn: 255516
2015-12-14 17:57:33 +00:00