Commit Graph

4835 Commits

Ulrich Weigand 14bd521f4c [PowerPC] Constrain base register in PPCRegisterInfo::resolveFrameIndex
I've run into a bug where current LLVM at -O0 (with fast-isel)
generated invalid code like:

        ld 0, 20936(1)                  # 8-byte Folded Reload
        stw 12, 10348(0)
        stw 12, 10344(0)

The underlying vreg had been introduced as base register by the
Local Stack Slot Allocation pass.  That register was constrained
to G8RC by PPCRegisterInfo::materializeFrameBaseRegister to match
the ADDI instruction used to set it, but it was *not* constrained
to G8RC_NOX0 to fit the *use* of the register in an address.

That should have happened in PPCRegisterInfo::resolveFrameIndex.
This patch adds an appropriate constrainRegClass call.
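A minimal sketch of the kind of fix described (assumed names MF and BaseReg; not the literal patch): constrain the base vreg to the register class its address use requires, so X0 can no longer be allocated for it.

  // Illustrative only: the G8RC_NOX0 class excludes X0, which is not a valid
  // address base register.
  MachineRegisterInfo &MRI = MF.getRegInfo();
  MRI.constrainRegClass(BaseReg, &PPC::G8RC_NOX0RegClass);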

Reviewed by Hal Finkel.

llvm-svn: 211897
2014-06-27 13:04:12 +00:00
Eric Christopher 83e0723457 Remove extraneous includes from the target machines.
llvm-svn: 211800
2014-06-26 19:30:05 +00:00
Will Schmidt 970ff64dc5 add ppc64/pwr8 as target
includes handling DIR_PWR8 where appropriate
The P7Model Itinerary is currently tied in for use under the P8Model, and will be updated later.

llvm-svn: 211779
2014-06-26 13:36:19 +00:00
Rafael Espindola e2c6624475 Move expression visitation logic up to MCStreamer.
Remove the duplicate from MCRecordStreamer. No functionality change.

llvm-svn: 211714
2014-06-25 15:45:33 +00:00
Rafael Espindola 2be1281d43 Simplify the visitation of target expressions. No functionality change.
llvm-svn: 211707
2014-06-25 15:29:54 +00:00
Bill Schmidt 83973ef23b [PPC64] Fix PR20071 (fctiduz generated for targets lacking that instruction)
PR20071 identifies a problem in PowerPC's fast-isel implementation for
floating-point conversion to integer.  The fctiduz instruction was added in
Power ISA 2.06 (i.e., Power7 and later).  However, this instruction is being
generated regardless of which 64-bit PowerPC target is selected.

The intent is for fast-isel to punt to DAG selection when this instruction is
not available.  This patch implements that change.  For testing purposes, a
RUN line for -mcpu=970 has been added to the existing fast-isel-conversion.ll
test, with checks for the expected code generation.  Additionally, the existing test
fast-isel-conversion-p5.ll was found to be incorrectly expecting the
unavailable instruction to be generated.  I've removed these test variants
since we have adequate coverage in fast-isel-conversion.ll.
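A hedged sketch of the punt described above (variable names are assumed; hasFPCVT() is the subtarget query for the ISA 2.06 conversion instructions):

  // Illustrative: without fctiduz, let SelectionDAG handle the
  // float-to-unsigned-i64 conversion instead of fast-isel.
  if (DstVT == MVT::i64 && !IsSigned && !Subtarget.hasFPCVT())
    return false;  // punt to DAG selection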

llvm-svn: 211627
2014-06-24 20:05:18 +00:00
Ulrich Weigand 8ca988f31a [PowerPC] Refactor getMinCallFrameSize / getMinCallArgumentsSize
As of r211495, the only remaining users of getMinCallFrameSize are in
core ABI code (LowerFormalParameter / LowerCall).  This is actually a
good thing, since the details of the parameter save area are ABI specific.

With the new ELFv2 ABI in particular, the rules defining the size of the
save area will become significantly more complex, so it wouldn't make
sense to implement those outside ABI code that has all required
information.

In preparation, this patch eliminates the getMinCallFrameSize (and
associated getMinCallArgumentsSize) routines, and inlines them into all
callers.  Note that since nearly all call arguments are constant, this
allows simplifying the inlined copies to a single line everywhere.

No change in generated code expected.

llvm-svn: 211497
2014-06-23 14:15:53 +00:00
Ulrich Weigand f316e1db75 [PowerPC] Allow stack frames without parameter save area
The PPCFrameLowering::determineFrameLayout routine currently ensures
that every function that allocates a stack frame provides space for the
parameter save area (via PPCFrameLowering::getMinCallFrameSize).

This is actually not necessary.  There may be functions that never call
another routine but still allocate a frame; those do not require the
parameter save area.  In the future, with the ELFv2 ABI, even some
routines that do call other functions do not need to allocate the
parameter save area.

While it is not a bug to allocate the parameter area when it is not
needed, it is better to avoid it to save stack space.

Note that when any particular function call requires the parameter save
area, this space will already have been included by ABI code in the size
the CALLSEQ_START insn is annotated with, and therefore included in the
size returned by MFI->getMaxCallFrameSize().

This means that determineFrameLayout simply does not need to care about
the parameter save area.  (It still needs to ensure that every frame
provides the linkage area.)  This is implemented by this patch.
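A sketch of the resulting layout rule (illustrative names, not the literal diff): the frame only has to force the linkage area, since any required parameter save area is already part of the outgoing-call size recorded via CALLSEQ_START.

  // Illustrative: getMaxCallFrameSize() already covers the parameter area for
  // calls that need it; only the linkage area must be guaranteed here.
  unsigned LinkageSize = PPCFrameLowering::getLinkageSize(isPPC64, isDarwinABI);
  unsigned maxCallFrameSize = std::max(MFI->getMaxCallFrameSize(), LinkageSize);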

Note that this exposed a bug in the new fast-isel code where the parameter
area was *not* included in the CALLSEQ_START size; this is also fixed.

A couple of test cases needed to be adapted for the new (smaller) stack
frame size those tests now see.

llvm-svn: 211495
2014-06-23 13:47:52 +00:00
Ulrich Weigand c6fcb7a5de [PowerPC] Fix IsDarwin arg in PPCFrameLowering:: calls
As remarked in the commit message to r211493, in several places
throughout the 64-bit SVR4 ABI code there are calls to
PPCFrameLowering::getLinkageSize and getMinCallFrameSize
using an incorrect IsDarwin argument of "true".

(Some of those were made explicit by the above refactoring patch, others
have been there all along.)

This patch fixes those places to pass "false" for IsDarwin.

No change in generated code expected.

llvm-svn: 211494
2014-06-23 13:21:43 +00:00
Ulrich Weigand 2bffb95915 [PowerPC] Refactor setMinReservedArea and CalculateParameterAndLinkageAreaSize
The PPCISelLowering.cpp routines PPCTargetLowering::setMinReservedArea and
CalculateParameterAndLinkageAreaSize are currently used as subroutines
from both 64-bit SVR4 and Darwin ABI code.

However, the two ABIs are already quite different w.r.t. AltiVec
conventions, and they will become more different when the ELFv2 ABI is
supported.  Also, in general it seems better to disentangle ABI support
routines for different ABIs to avoid accidentally affecting one ABI when
intending to change only the other.

(Actually, the current code strictly speaking already contains a bug:
these routines call PPCFrameLowering::getMinCallFrameSize and
PPCFrameLowering::getLinkageSize with the IsDarwin parameter set to
"true" even on 64-bit SVR4.  This bug currently has no adverse effect
since those routines always return the same for 64-bit SVR4 and 64-bit
Darwin, but it still seems wrong ...  I'll fix this in a follow-up
commit shortly.)

To remove this code sharing, I'm simply inlining both routines into all
call sites (there are just two each, one for 64-bit SVR4 and one for
Darwin), and simplifying due to constant parameters where possible.

A small piece of code that *does* make sense to share is refactored into
the new routine EnsureStackAlignment, now also called from 32-bit SVR4
ABI code.

No change in generated code is expected.

llvm-svn: 211493
2014-06-23 13:08:27 +00:00
Ulrich Weigand 9ba552db89 [PowerPC] Fix on-stack AltiVec arguments with 64-bit SVR4
Current 64-bit SVR4 code seems to have some remnants of Darwin code
in AltiVec argument handling.  This had the effect that AltiVec arguments
(or subsequent arguments) were not correctly placed in the parameter area
in some cases.

The correct behaviour with the 64-bit SVR4 ABI is:
- All AltiVec arguments take up space in the parameter area, just like
  any other arguments, whether vararg or not.
- They are always 16-byte aligned, skipping a parameter area doubleword
  (and the associated GPR, if any), if necessary.

This patch implements the correct behaviour and adds a test case.
(Verified against GCC behaviour via the ABI compat test suite.)
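An illustrative sketch of the alignment step (variable names are assumed, not taken from the patch):

  // 16-byte align the slot for an Altivec argument, skipping one doubleword of
  // the parameter save area and the GPR that shadows it.
  if (ArgOffset % 16 != 0) {
    ArgOffset += PtrByteSize;      // skip a doubleword
    if (GPR_idx != NumGPRs)
      ++GPR_idx;                   // and the corresponding GPR
  }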

llvm-svn: 211492
2014-06-23 12:36:34 +00:00
Ulrich Weigand 59c6ab20d6 [PowerPC] Fix small argument stack slot offset for LE
When small arguments (structures < 8 bytes or "float") are passed in a
stack slot in the ppc64 SVR4 ABI, they must reside in the least
significant part of that slot.  On BE, this means that an offset needs
to be added to the stack address of the parameter, but on LE, the least
significant part of the slot has the same address as the slot itself.

This changes the PowerPC back-end ABI code to only add the small
argument stack slot offset for BE.  It also adds test cases to verify
the correct behavior on both BE and LE.
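A sketch of the adjustment under the behaviour described (names illustrative):

  // Small aggregates and floats live in the least significant part of their
  // doubleword slot; only BE needs to bump the address toward the slot's end.
  if (!isLittleEndian && ObjSize < PtrByteSize)
    CurArgOffset += PtrByteSize - ObjSize;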

llvm-svn: 211368
2014-06-20 16:34:05 +00:00
Ulrich Weigand f460d69ada [PowerPC] Remove unnecessary load of r12 in indirect call
When looking at the 64-bit SVR4 indirect call sequence, I noticed
an unnecessary load of r12.  And indeed the code says:

  // R12 must contain the address of an indirect callee. 

But this is not correct; in the 64-bit SVR4 (ELFv1) ABI, there is
no need to load r12 at this point.  It seems this code and comment
are a remnant of code originally shared with the Darwin ABI ...

This patch simply removes the unnecessary load.

llvm-svn: 211203
2014-06-18 18:33:36 +00:00
Ulrich Weigand ad0cb91ed9 [PowerPC] Simplify and improve loading into TOC register
During an indirect function call sequence on the 64-bit SVR4 ABI,
generated code must load and then restore the TOC register.

This does not use a regular LOAD instruction since the TOC
register r2 is marked as reserved.  Instead, there are two
special instruction patterns:

 let RST = 2, DS = 2 in
 def LDinto_toc: DSForm_1a<58, 0, (outs), (ins g8rc:$reg),
                     "ld 2, 8($reg)", IIC_LdStLD,
                     [(PPCload_toc i64:$reg)]>, isPPC64;
 
 let RST = 2, DS = 10, RA = 1 in
 def LDtoc_restore : DSForm_1a<58, 0, (outs), (ins),
                     "ld 2, 40(1)", IIC_LdStLD,
                     [(PPCtoc_restore)]>, isPPC64;

Note that these not only restrict the destination of the
load to r2, but they also restrict the *source* of the
load to particular address combinations.  The latter is
a problem when we want to support the ELFv2 ABI, since
there the TOC save slot is no longer at 40(1).

This patch replaces those two instructions with a single
instruction pattern that only hard-codes r2 as destination,
but supports generic addresses as source.  This will allow
supporting the ELFv2 ABI, and also helps generate more
efficient code for calls to absolute addresses (allowing
simplification of the ppc64-calls.ll test case).

llvm-svn: 211193
2014-06-18 17:52:49 +00:00
Ulrich Weigand 9aa09ef30f [PowerPC] Do not use BLA with the 64-bit SVR4 ABI
The PowerPC back-end uses BLA to implement calls to functions at
known-constant addresses, which is apparently used for certain
system routines on Darwin.

However, with the 64-bit SVR4 ABI, this is actually incorrect.
An immediate function pointer value on this platform is not
directly usable as a target address for BLA:
- in the ELFv1 ABI, the function pointer value refers to the
  *function descriptor*, not the code address
- in the ELFv2 ABI, the function pointer value refers to the
  global entry point, but BL(A) would only be correct when
  calling the *local* entry point

This bug didn't show up since using immediate function pointer
values is not usually done in the 64-bit SVR4 ABI in the first
place.  However, I ran into this issue with a certain use case
of LLVM as JIT, where immediate function pointer values were
used to implement callbacks from JITted code to helpers in
statically compiled code.

Fixed by simply not using BLA with the 64-bit SVR4 ABI.
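A hedged sketch of what disabling BLA might look like in call lowering (condition and names are illustrative, not the committed code):

  // With 64-bit SVR4 an immediate function pointer is a descriptor or
  // global-entry address, so do not turn it into a BLA target.
  if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Callee))
    if (!(isPPC64 && isSVR4ABI))
      Callee = DAG.getIntPtrConstant(C->getZExtValue());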

llvm-svn: 211174
2014-06-18 16:14:04 +00:00
Ulrich Weigand 7c3f0dc7e4 [PowerPC] Fix emitting instruction pairs on LE
My patch r204634 to emit instructions in little-endian format failed to
handle those special cases where we emit a pair of instructions from a
single LLVM MC instruction (like the bl; nop pairs used to implement
the call sequence).

In those cases, we still need to emit the "first" instruction (the one
in the more significant word) first, on both big and little endian,
and not swap them.
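A self-contained illustration of the rule (hypothetical helper, not taken from the patch): byte order applies within each 4-byte instruction word, while the two words of a pair stay in program order.

  // Emit one 32-bit instruction word with the requested byte order; for a
  // bl; nop pair, call this twice, first instruction first, on both BE and LE.
  static void emitWord(llvm::SmallVectorImpl<char> &CB, uint32_t Insn,
                       bool IsLittleEndian) {
    for (unsigned i = 0; i != 4; ++i) {
      unsigned Shift = IsLittleEndian ? i * 8 : (3 - i) * 8;
      CB.push_back(char((Insn >> Shift) & 0xff));
    }
  }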

llvm-svn: 211171
2014-06-18 15:37:07 +00:00
Bill Schmidt 5d82f09b53 [PPC64] Fix PR19893 - improve code generation for local function addresses
Rafael opened http://llvm.org/bugs/show_bug.cgi?id=19893 to track non-optimal
code generation for forming a function address that is local to the compile
unit.  The existing code was treating both local and non-local functions
identically.

This patch fixes the problem by properly identifying local functions and
generating the proper addis/addi code.  I also noticed that Rafael's earlier
changes to correct the surrounding code in PPCISelLowering.cpp were also
needed for fast instruction selection in PPCFastISel.cpp, so this patch
fixes that code as well.

The existing test/CodeGen/PowerPC/func-addr.ll is modified to test the new
code generation.  I've added a -O0 run line to test the fast-isel code as
well.

Tested on powerpc64[le]-unknown-linux-gnu with no regressions.

llvm-svn: 211056
2014-06-16 21:36:02 +00:00
Eric Christopher f047bfd115 The hazard recognizer only needs a subtarget, not a target machine
so make it take one. Fix up all users accordingly.

llvm-svn: 210948
2014-06-13 22:38:52 +00:00
Eric Christopher 170ebcf07f Fix typo.
llvm-svn: 210947
2014-06-13 22:38:48 +00:00
Eric Christopher 02ae6902fa Move the PPCSelectionDAGInfo off the TargetMachine and onto the
subtarget.

llvm-svn: 210854
2014-06-12 23:02:32 +00:00
Eric Christopher e47dcd411a Make PPCSelectionDAGInfo take a DataLayout instead of a TargetMachine
since that's all it needs.

llvm-svn: 210853
2014-06-12 22:56:48 +00:00
Eric Christopher f8c031fccf Move PPCTargetLowering off of the TargetMachine and onto the subtarget.
llvm-svn: 210852
2014-06-12 22:50:10 +00:00
Eric Christopher d90a8746df Remove an extraneous this-> to access the subtarget.
llvm-svn: 210849
2014-06-12 22:38:20 +00:00
Eric Christopher b1aaebecb1 Rename PPCSubTarget to Subtarget in PPCTargetLowering for consistency.
Also remove an extra local subtarget in the initialization functions.

llvm-svn: 210848
2014-06-12 22:38:18 +00:00
Eric Christopher f55a224920 Move PPCJITInfo off of the TargetMachine and onto the subtarget.
Needed to migrate a few functions around to avoid circular header
dependencies.

llvm-svn: 210845
2014-06-12 22:28:06 +00:00
Eric Christopher 54367e01cc Remove the use of TargetMachine from PPCJITInfo and replace with
the subtarget. Also remove an unnecessary argument to the constructor
at the same time; we already have access via the subtarget.

llvm-svn: 210844
2014-06-12 22:19:51 +00:00
Eric Christopher bd14dc519c Move PPCInstrInfo off of the target machine and onto the subtarget.
llvm-svn: 210839
2014-06-12 22:05:46 +00:00
Eric Christopher 1dcea73540 Remove TargetMachine from PPCInstrInfo and all dependencies and
replace with the current subtarget.

llvm-svn: 210836
2014-06-12 21:48:52 +00:00
Eric Christopher 49628bc4ff Move DataLayout from the PPCTargetMachine to the subtarget.
llvm-svn: 210824
2014-06-12 21:08:06 +00:00
Eric Christopher d104c31fc0 Move PPCFrameLowering into PPCSubtarget from PPCTargetMachine. Use
the initializeSubtargetDependencies code to obtain an initialized
subtarget and migrate a couple of subtarget using functions to the
.cpp file to avoid circular includes.

llvm-svn: 210822
2014-06-12 20:54:11 +00:00
Eric Christopher a475d5c54a Remove duplicate copy of InstrItineraryData from the TargetMachine,
it's already on the subtarget.

llvm-svn: 210619
2014-06-11 00:53:17 +00:00
Bill Schmidt f910a0650e [PPC64LE] Recognize shufflevector patterns for little endian
Various masks on shufflevector instructions are recognizable as
specific PowerPC instructions (vector pack, vector merge, etc.).
There is existing code in PPCISelLowering.cpp to recognize the correct
patterns for big endian code.  The masks for these instructions are
different for little endian code due to the big-endian numbering
employed by these instructions.  This patch adds the recognition code
for little endian.

I've added a new test case test/CodeGen/PowerPC/vec_shuffle_le.ll for
this.  The existing recognizer test (vec_shuffle.ll) is unnecessarily
verbose and difficult to read, so I felt it was better to add a new
test rather than modify the old one.

llvm-svn: 210536
2014-06-10 14:35:01 +00:00
Bill Schmidt 6b5a7dfc24 [PPC64LE] Generate correct code for unaligned little-endian vector loads
The code in PPCTargetLowering::PerformDAGCombine() that handles
unaligned Altivec vector loads generates a lvsl followed by a vperm.
As we've seen in numerous other places, the vperm instruction has a
big-endian bias, and this is fixed for little endian by complementing
the permute control vector and swapping the input operands.  In this
case the lvsl is providing the permute control vector.  Rather than
generating an lvsl and a complement operation, it is sufficient to
generate an lvsr instruction instead.  Thus for LE code generation we
will generate an lvsr rather than an lvsl, and swap the other input
arguments on the vperm.
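A hedged sketch of the LE path in that combine (helper calls follow the BuildIntrinsicOp style used in PPCISelLowering.cpp, but the snippet and its signatures are illustrative):

  // On LE, take the control vector from lvsr (instead of lvsl plus a
  // complement) and feed the vperm inputs in swapped order.
  SDValue PermCntl = BuildIntrinsicOp(Intrinsic::ppc_altivec_lvsr, Ptr, DAG, dl);
  SDValue Perm = BuildIntrinsicOp(Intrinsic::ppc_altivec_vperm,
                                  ExtraLoad, BaseLoad, PermCntl, DAG, dl);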

The existing test/CodeGen/PowerPC/vec_misalign.ll is updated to test
the code generation for PPC64 and PPC64LE, in addition to the existing
PPC32/G5 testing.

llvm-svn: 210493
2014-06-09 22:00:52 +00:00
Bill Schmidt 42995e8c74 [PPC64LE] Generate correct little-endian code for v16i8 multiply
The existing code in PPCTargetLowering::LowerMUL() for multiplying two
v16i8 values assumes that vector elements are numbered in big-endian
order.  For little-endian targets, the vector element numbering is
reversed, but the vmuleub, vmuloub, and vperm instructions still
assume big-endian numbering.  To account for this, we must adjust the
permute control vector and reverse the order of the input registers on
the vperm instruction.

The existing test/CodeGen/PowerPC/vec_mul.ll is updated to be executed
on powerpc64 and powerpc64le targets as well as the original powerpc
(32-bit) target.

llvm-svn: 210474
2014-06-09 16:06:29 +00:00
David Blaikie 960ea3f018 AsmMatchers: Use unique_ptr to manage ownership of MCParsedAsmOperand
I saw at least a memory leak or two from inspection (on probably
untested error paths) and r206991, which was the original inspiration
for this change.

I ran this idea by Jim Grosbach a few weeks ago & he was OK with it.
Since it's a basically mechanical patch that seemed sufficient - usual
post-commit review, revert, etc, as needed.

llvm-svn: 210427
2014-06-08 16:18:35 +00:00
Eric Christopher 0dd8d486b3 Have TargetSelectionDAGInfo take a DataLayout initializer rather than
a TargetMachine since the only thing it wants is DataLayout.

llvm-svn: 210366
2014-06-06 19:04:48 +00:00
Bill Schmidt 4aedff8995 [PPC64LE] Fix lowering of BUILD_VECTOR and SHUFFLE_VECTOR for little endian
This patch fixes a couple of lowering issues for little endian
PowerPC.  The code for lowering BUILD_VECTOR contains a number of
optimizations that are only valid for big endian.  For now, we disable
those optimizations for correctness.  In the future, we will add
analogous optimizations that are correct for little endian.

When lowering a SHUFFLE_VECTOR to a VPERM operation, we again need to
make the now-familiar transformation of swapping the input operands
and complementing the permute control vector.  Correctness of this
transformation is tested by the accompanying test case.
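One concrete illustration of "complementing the permute control vector" (assuming the usual representation of 16 byte indices into the 32 concatenated input bytes):

  // With the two vperm inputs swapped for LE, each control byte b is replaced
  // by 31 - b so that it still selects the same source byte.
  unsigned char LEControl[16];
  for (unsigned i = 0; i != 16; ++i)
    LEControl[i] = 31 - BEControl[i];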

llvm-svn: 210336
2014-06-06 14:06:26 +00:00
Bill Schmidt 82f9e8a62e [PPC64LE] Temporarily disable VSX support in little-endian mode
This is a preliminary patch for the PowerPC64LE support.  In stage 1
of the vector support, we will support the VMX (Altivec) instruction
set, but will not yet support the VSX instructions.  This is merely a
staging issue to provide functional vector support as soon as
possible.

llvm-svn: 210271
2014-06-05 16:21:13 +00:00
Eric Christopher a84189a2e4 Omit else branch after return.
llvm-svn: 210034
2014-06-02 17:29:07 +00:00
Eric Christopher 8995833a34 Have the TLOF creation take a Triple rather than needing a subtarget.
llvm-svn: 209937
2014-05-31 00:07:32 +00:00
Eric Christopher 5435224a53 isSVR4ABI() returned !isDarwin() so just move that to the else
block and remove the unreachable code.

llvm-svn: 209927
2014-05-30 22:47:53 +00:00
Eric Christopher 174c662b7c Rename CreateTLOF->createTLOF to match the rest of the file and the
rest of the targets with a similar function name.

llvm-svn: 209926
2014-05-30 22:47:48 +00:00
Rafael Espindola 04902862a8 [PPC] Use alias symbols in address computation.
This seems to match what gcc does for ppc and what every other llvm
backend does.

This is a fixed version of r209638. The difference is to avoid any change
in behavior for functions. The logic for using constant pools for function
addresses is spread over a few places and we have to keep them in sync.

llvm-svn: 209821
2014-05-29 15:41:38 +00:00
Rafael Espindola 59f7eba2b5 [pr19844] Add thread local mode to aliases.
This matches gcc's behavior. It also seems natural given that aliases
contain other properties that govern how they are accessed (linkage,
visibility, dll storage).

Clang still has to be updated to expose this feature to C.

llvm-svn: 209759
2014-05-28 18:15:43 +00:00
Hal Finkel f5c07ada1d Revert "[PPC] Use alias symbols in address computation."
This reverts commit r209638 because it broke self-hosting on ppc64/Linux. (the
Clang-compiled TableGen would segfault because it jumped to an invalid address
from within _ZNK4llvm17ManagedStaticBase21RegisterManagedStaticEPFPvvEPFvS1_E
(which is within the command-line parameter registration process)).

llvm-svn: 209745
2014-05-28 15:25:06 +00:00
Bill Schmidt 71dddd51d9 [PATCH] Correct type used for VADD_SPLAT optimization on PowerPC
In PPCISelLowering.cpp: PPCTargetLowering::LowerBUILD_VECTOR(), there
is an optimization for certain patterns to generate one or two vector
splats followed by a vector add or subtract.  This operation is
represented by a VADD_SPLAT in the selection DAG.  Prior to this
patch, it was possible for the VADD_SPLAT to be assigned the wrong
data type, causing incorrect code generation.  This patch corrects the
problem.

Specifically, the code previously assigned the value type of the
BUILD_VECTOR node to the newly generated VADD_SPLAT node.  This is
correct much of the time, but not always.  The problem is that the
call to isConstantSplat() may return a SplatBitSize that is not the
same as the number of bits in the original element vector type.  The
correct type to assign is a vector type with the same element bit size
as SplatBitSize.

The included test case shows an example of this, where the
BUILD_VECTOR node has a type of v16i8.  The vector to be built is {0,
16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16}.  isConstantSplat
detects that we can generate a splat of 16 for type v8i16, which is
the type we must assign to the VADD_SPLAT node.  If we do not, we
generate a vspltisb of 8 and a vaddubm, which generates the incorrect
result {16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16}.  The correct code generation is a vspltish of 8 and a vadduhm.
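A short sketch of the type computation implied by the fix (illustrative; the actual patch may differ in detail): derive the splat value type from SplatBitSize rather than reusing the BUILD_VECTOR's own type.

  // For SplatBitSize == 16 this yields v8i16, so the example above becomes a
  // vspltish of 8 feeding a vadduhm.
  EVT EltVT = EVT::getIntegerVT(*DAG.getContext(), SplatBitSize);
  EVT SplatVT = EVT::getVectorVT(*DAG.getContext(), EltVT, 128 / SplatBitSize);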

This patch also corrected code generation for
CodeGen/PowerPC/2008-07-10-SplatMiscompile.ll, which had been marked
as an XFAIL, so we can remove the XFAIL from the test case.

llvm-svn: 209662
2014-05-27 15:57:51 +00:00
Rafael Espindola ac69cee6a2 [PPC] Use alias symbols in address computation.
This seems to match what gcc does for ppc and what every other llvm
backend does.

llvm-svn: 209638
2014-05-26 19:08:19 +00:00
Eric Christopher 89f18805f4 Fix typo.
llvm-svn: 209377
2014-05-22 01:21:44 +00:00
Eric Christopher d71e4441c9 Avoid using subtarget features when initializing the pass pipeline
on PPC.

llvm-svn: 209376
2014-05-22 01:21:35 +00:00
Eric Christopher 1b8e763630 Reset the subtarget for DAGToDAG on every iteration of runOnMachineFunction.
This required updating the generated functions and TD file accordingly
to be pointers rather than const references.

llvm-svn: 209375
2014-05-22 01:07:24 +00:00
Eric Christopher 6b0fcfee36 Make early if conversion dependent upon the subtarget and add
a subtarget hook to enable it. Unconditionally add to the pass pipeline
for targets that might want to use it. No functional change.

llvm-svn: 209340
2014-05-21 23:40:26 +00:00
Adam Nemet 571eb5fc91 [PowerPC] PR19796: Also match ISD::TargetConstant in isIntS16Immediate
The SplitIndexingFromLoad changes exposed a latent isel bug in the PowerPC64
backend.  We matched an immediate offset with STWX8 even though it only
supports register offset.

The culprit is the complex-pattern predicate, SelectAddrIdx, which decides
that if the offset is not ISD::Constant it must be a register.

Many thanks to Bill Schmidt for testing this.
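A close-to-the-description sketch of the predicate after the fix (treat it as illustrative rather than the verbatim patch):

  // Accept both Constant and TargetConstant nodes as candidate immediates.
  static bool isIntS16Immediate(SDNode *N, short &Imm) {
    if (N->getOpcode() != ISD::Constant &&
        N->getOpcode() != ISD::TargetConstant)
      return false;
    Imm = (short)cast<ConstantSDNode>(N)->getZExtValue();
    return Imm == (int64_t)cast<ConstantSDNode>(N)->getZExtValue();
  }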

llvm-svn: 209219
2014-05-20 17:20:34 +00:00
Benjamin Kramer f3ad23551d SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the bswap not.
- On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though.
- On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal.
- On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.

llvm-svn: 209123
2014-05-19 13:12:38 +00:00
Saleem Abdulrasool f3a5a5c546 Target: remove old constructors for CallLoweringInfo
This is mostly a mechanical change changing all the call sites to the newer
chained-function construction pattern.  This removes the horrible 15-parameter
constructor for the CallLoweringInfo in favour of setting properties of the call
via chained functions.  No functional change beyond the removal of the old
constructors is intended.

llvm-svn: 209082
2014-05-17 21:50:17 +00:00
Pete Cooper 0d4ea975ef Use a sized enum for MachineOperandType. No functionality change
llvm-svn: 209048
2014-05-16 23:28:17 +00:00
Rafael Espindola e0098928c9 Delete getAliasedGlobal.
llvm-svn: 209040
2014-05-16 22:37:03 +00:00
Jay Foad a0653a3e6c Rename ComputeMaskedBits to computeKnownBits. "Masked" has been
inappropriate since it lost its Mask parameter in r154011.

llvm-svn: 208811
2014-05-14 21:14:37 +00:00
Eric Christopher 02e1804d8d Fix typo in function name.
llvm-svn: 208743
2014-05-14 00:31:15 +00:00
Eric Christopher d1309ee27d Save the optimization level the subtarget was created with in a
member variable and sink the initialization of crbits into the
subtarget feature reset code.

No functional change, but this refactor will be used in a future
commit.

llvm-svn: 208726
2014-05-13 20:49:08 +00:00
Hal Finkel 0d8db46799 [PowerPC] Add global named register support
Support for the intrinsics that read from and write to global named registers
is added for r1, r2 and r13 (depending on the subtarget).

llvm-svn: 208509
2014-05-11 19:29:11 +00:00
Hal Finkel c4c6c87666 [PowerPC] On PPC32, 128-bit shifts might be runtime calls
The counter-loops formation pass needs to know what operations might be
function calls (because they can't appear in counter-based loops). On PPC32,
128-bit shifts might be runtime calls (even though you can't use __int128 on
PPC32, it seems that SROA might form them).

Fixes PR19709.

llvm-svn: 208501
2014-05-11 16:23:29 +00:00
Rafael Espindola 566fcfe69b Remove the UseCFI option from createAsmStreamer.
We were already always passing true, this just removes the option.

llvm-svn: 208205
2014-05-07 13:00:43 +00:00
Rafael Espindola 3d082fa507 Fix pr19645.
The fix itself is fairly simple: move getAccessVariant to MCValue so that we
replace the old weak expression evaluation with the far more general
EvaluateAsRelocatable.

This then requires that EvaluateAsRelocatable stop when it finds a
non-trivial reference kind. And that in turn requires the ELF writer to look
harder for weak references.

Last but not least, this found a case where we were being bug-by-bug
compatible with gas and accepting an invalid input. I reported pr19647
to track it.

llvm-svn: 207920
2014-05-03 19:57:04 +00:00
Craig Topper 2d2aa0ca1f Use makeArrayRef instead of calling ArrayRef<T> constructor directly. I introduced most of these recently.
llvm-svn: 207616
2014-04-30 07:17:30 +00:00
Craig Topper ee7b0f3956 De-virtualize or remove some methods that have no overrides and do not override anything. In some cases, remove them altogether if there are no callers either.
llvm-svn: 207610
2014-04-30 05:53:27 +00:00
Craig Topper 0d3fa92514 [C++11] Add 'override' keywords and remove 'virtual'. Additionally add 'final' and leave 'virtual' on some methods that are marked virtual without overriding anything and have no obvious overrides themselves. PowerPC edition
llvm-svn: 207504
2014-04-29 07:57:37 +00:00
Eric Christopher 612bb69bf7 None of these targets actually define their own CFI_INSTRUCTION
opcode so there's no reason to use the target namespace for it
rather than TargetOpcode.

llvm-svn: 207475
2014-04-29 00:16:46 +00:00
Eric Christopher d17374919b 80-column, tab characters, comment fixups.
llvm-svn: 207473
2014-04-29 00:16:40 +00:00
Craig Topper 8c0b4d0791 Convert more SelectionDAG functions to use ArrayRef.
llvm-svn: 207397
2014-04-28 05:57:50 +00:00
Craig Topper e73658ddbb [C++] Use 'nullptr'.
llvm-svn: 207394
2014-04-28 04:05:08 +00:00
Craig Topper 481fb2879f Convert SelectionDAG::SelectNodeTo to use ArrayRef.
llvm-svn: 207377
2014-04-27 19:21:11 +00:00
Craig Topper 64941d9786 Convert SelectionDAG::getMergeValues to use ArrayRef.
llvm-svn: 207374
2014-04-27 19:20:57 +00:00
Craig Topper 206fcd450a Convert getMemIntrinsicNode to take ArrayRef of SDValue instead of pointer and size.
llvm-svn: 207329
2014-04-26 19:29:41 +00:00
Craig Topper 48d114bed1 Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>.
llvm-svn: 207327
2014-04-26 18:35:24 +00:00
Craig Topper 062a2baef0 [C++] Use 'nullptr'. Target edition.
llvm-svn: 207197
2014-04-25 05:30:21 +00:00
Reid Kleckner 5772b77789 Add 'musttail' marker to call instructions
This is similar to the 'tail' marker, except that it guarantees that
tail call optimization will occur.  It also comes with conservative IR
verification rules that ensure that tail call optimization is possible.

Reviewers: nicholas

Differential Revision: http://llvm-reviews.chandlerc.com/D3240

llvm-svn: 207143
2014-04-24 20:14:34 +00:00
David Blaikie 908f4d4bf5 Spread some const around for non-mutating uses of MCSymbolData.
I discovered this const-hole while attempting to coalesce the Symbol
and SymbolMap data structures. There are some pending issues with that,
but I figured this change was easy to flush early.

llvm-svn: 207124
2014-04-24 16:59:40 +00:00
Evgeniy Stepanov 0a951b775e Create MCTargetOptions.
For now it contains a single flag, SanitizeAddress, which enables
AddressSanitizer instrumentation of inline assembly.

Patch by Yuri Gorshenin.

llvm-svn: 206971
2014-04-23 11:16:03 +00:00
Chandler Carruth 84e68b2994 [Modules] Fix potential ODR violations by sinking the DEBUG_TYPE
definition below all of the header #include lines, lib/Target/...
edition.

llvm-svn: 206842
2014-04-22 02:41:26 +00:00
Chandler Carruth d174b72a28 [cleanup] Lift using directives, DEBUG_TYPE definitions, and even some
system headers above the includes of generated '.inc' files that
actually contain code. In a few targets this was already done pretty
consistently, but it wasn't done *really* consistently anywhere. It is
strictly cleaner IMO and necessary in a bunch of places where the
DEBUG_TYPE is referenced from the generated code. Consistency with the
necessary places trumps. Hopefully the build bots are OK with the
movement of intrin.h...

llvm-svn: 206838
2014-04-22 02:03:14 +00:00
Chandler Carruth e96dd8975f [Modules] Make Support/Debug.h modular. This requires it to not change
behavior based on other files defining DEBUG_TYPE, which means it cannot
define DEBUG_TYPE at all. This is actually better IMO as it forces folks
to define relevant DEBUG_TYPEs for their files. However, it requires all
files that currently use DEBUG(...) to define a DEBUG_TYPE if they don't
already. I've updated all such files in LLVM and will do the same for
other upstream projects.

This still leaves one important change in how LLVM uses the DEBUG_TYPE
macro going forward: we need to only define the macro *after* header
files have been #include-ed. Previously, this wasn't possible because
Debug.h required the macro to be pre-defined. This commit removes that.
By defining DEBUG_TYPE after the includes two things are fixed:

- Header files that need to provide a DEBUG_TYPE for some inline code
  can do so by defining the macro before their inline code and undef-ing
  it afterward so the macro does not escape.

- We no longer have rampant ODR violations due to including headers with
  different DEBUG_TYPE definitions. This may be mostly an academic
  violation today, but with modules these types of violations are easy
  to check for and potentially very relevant.

Where necessary to support headers with DEBUG_TYPE, I have moved the
definitions below the includes in this commit. I plan to move the rest
of the DEBUG_TYPE macros in LLVM in subsequent commits; this one is big
enough.

The comments in Debug.h, which were hilariously out of date already,
have been updated to reflect the recommended practice going forward.
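A minimal example of the recommended pattern (file contents and pass name are made up for illustration):

  #include "llvm/Support/Debug.h"
  #include "llvm/Support/raw_ostream.h"
  // ... all other #includes go first ...

  #define DEBUG_TYPE "my-pass"   // defined only after the includes

  static void reportStep() {
    DEBUG(dbgs() << "my-pass: doing a step\n");
  }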

llvm-svn: 206822
2014-04-21 22:55:11 +00:00
Nick Lewycky aad475b324 Break PseudoSourceValue out of the Value hierarchy. It is now the root of its own tree containing FixedStackPseudoSourceValue (which you can use isa/dyn_cast on) and MipsCallEntry (which you can't). Anything that needs to use either a PseudoSourceValue* and Value* is strongly encouraged to use a MachinePointerInfo instead.
llvm-svn: 206255
2014-04-15 07:22:52 +00:00
Lang Hames a1bc0f5662 [MC] Require an MCContext when constructing an MCDisassembler.
This patch re-introduces the MCContext member that was removed from
MCDisassembler in r206063, and requires that an MCContext be passed in at
MCDisassembler construction time. (Previously the MCContext member had been
initialized in an ad-hoc fashion after construction). The MCContext member
can be used by MCDisassembler sub-classes to construct constant or
target-specific MCExprs.

This patch updates disassemblers for in-tree targets, and provides the
MCRegisterInfo instance that some disassemblers were using through the
MCContext (previously those backends were constructing their own
MCRegisterInfo instances).

llvm-svn: 206241
2014-04-15 04:40:56 +00:00
Hal Finkel 0192cbac66 [PowerPC] [Constant Hoisting] Enable constant hoisting on PPC
Implements the various TTI functions to enable constant hoisting on PPC. The
only significant test-suite change is this:

MultiSource/Benchmarks/VersaBench/bmm/bmm - 20% speedup
(which essentially reverses the slowdown from r206120).

llvm-svn: 206141
2014-04-13 23:02:40 +00:00
Hal Finkel d9963c75da [PowerPC] Fix rlwimi isel when mask is not constant
We had been using the known-zero values of the operand of the or to construct
the mask for an rlwimi; this is not quite correct, but fine when the mask is
constant. When the mask is constant, then the known zeros of the operand must
be a superset of the zeros in the mask. However, when the mask is not a
constant, then there might be bits in the operand that are not known to be zero
that, at runtime, might be zero in the mask. Therefore, we check that any bits
not known to be zero *are* known to be one in the mask. Otherwise, we can't
fold the mask with the or and shift.

This was revealed as a miscompile of
MultiSource/Benchmarks/BitBench/drop3/drop3 when I started experimenting with
constant hoisting.

llvm-svn: 206136
2014-04-13 17:10:58 +00:00
Hal Finkel 34974ed503 [PowerPC] Implement some additional TLI callbacks
Add implementations of:
  bool isLegalICmpImmediate(int64_t Imm) const
  bool isLegalAddImmediate(int64_t Imm) const
  bool isTruncateFree(Type *Ty1, Type *Ty2) const
  bool isTruncateFree(EVT VT1, EVT VT2) const
  bool shouldConvertConstantLoadToIntImm(const APInt &Imm, Type *Ty) const
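Plausible bodies for the first two hooks (a sketch assuming PPC compare and add immediates are signed/unsigned 16-bit fields; the committed implementations may differ):

  bool PPCTargetLowering::isLegalICmpImmediate(int64_t Imm) const {
    // cmpwi takes a signed 16-bit immediate, cmplwi an unsigned one.
    return isInt<16>(Imm) || isUInt<16>(Imm);
  }

  bool PPCTargetLowering::isLegalAddImmediate(int64_t Imm) const {
    // addi only accepts a signed 16-bit immediate.
    return isInt<16>(Imm);
  }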

Unfortunately, this regresses counter-register-based loop formation because
some of the loops now end up in forms where SE cannot compute loop counts.
However, nevertheless, the test-suite results favor committing:

SingleSource/Benchmarks/BenchmarkGame/puzzle: 26% speedup
MultiSource/Benchmarks/FreeBench/analyzer/analyzer: 21% speedup
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan: 20% speedup
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv: 19% speedup
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv: 15% speedup
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2: 2% speedup

MultiSource/Benchmarks/VersaBench/bmm/bmm: 26% slowdown

llvm-svn: 206120
2014-04-12 21:52:38 +00:00
NAKAMURA Takumi 98905d3f85 LLVMBuild.txt: Reformat.
llvm-svn: 205961
2014-04-10 11:16:17 +00:00
Hal Finkel a775e51274 [PowerPC] Don't return false from PPC::isVSLDOIShuffleMask
PPC::isVSLDOIShuffleMask should return -1, not false, when the shuffle
predicate should be false.

Noticed by inspection; no test case (yet).

llvm-svn: 205787
2014-04-08 19:00:27 +00:00
Hal Finkel 41e9b1d559 [PowerPC] Remove unused TM member variable to unbreak build
Fix "error: private field 'TM' is not used [-Werror,-Wunused-private-field]"

llvm-svn: 205660
2014-04-05 00:16:28 +00:00
Hal Finkel de0b413ec0 [PowerPC] Adjust load/store costs in PPCTTI
This provides more realistic costs for the insert/extractelement instructions
(which are load/store pairs), accounts for the cheap unaligned Altivec load
sequence, and for unaligned VSX load/stores.

Bad news:
MultiSource/Applications/sgefa/sgefa - 35% slowdown (this will require more investigation)
SingleSource/Benchmarks/McGill/queens - 20% slowdown (we no longer vectorize this, but it was a constant store that was scalarized)
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 - 2% slowdown

Good news:
SingleSource/Benchmarks/Shootout/ary3 - 54% speedup
SingleSource/Benchmarks/Shootout-C++/ary - 40% speedup
MultiSource/Benchmarks/Ptrdist/ks/ks - 35% speedup
MultiSource/Benchmarks/FreeBench/neural/neural - 30% speedup
MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt - 20% speedup

Unfortunately, estimating the costs of the stack-based scalarization sequences
is hard, and adjusting these costs is like a game of whac-a-mole :( I'll
revisit this again after we have better codegen for vector extloads and
truncstores and unaligned load/stores.

llvm-svn: 205658
2014-04-04 23:51:18 +00:00
Hal Finkel b1308d525c [PowerPC] PPCTTI Cleanup
Remove the declaration of an unimplemented function.

llvm-svn: 205657
2014-04-04 23:51:11 +00:00
Hal Finkel fbf7e2a1a1 [PowerPC] Add a full condition code register to make the "cc" clobber work
gcc inline asm supports specifying "cc" as a clobber of all condition
registers. Add just enough modeling of the full register to make this work.
Fixed PR19326.

llvm-svn: 205630
2014-04-04 15:15:57 +00:00
Craig Topper 840beec2d0 Make consistent use of MCPhysReg instead of uint16_t throughout the tree.
llvm-svn: 205610
2014-04-04 05:16:06 +00:00
Hal Finkel f823380a44 [PowerPC] Make PPCTTI::getMemoryOpCost call BasicTTI::getMemoryOpCost
PPCTTI::getMemoryOpCost will now make use of BasicTTI::getMemoryOpCost to
calculate the base cost of the memory access, and then adjust on top of that.
There is no functionality change from this modification, but it will become
important so that PPCTTI can take advantage of scalarization information for which
BasicTTI::getMemoryOpCost will account in the near future.

llvm-svn: 205476
2014-04-02 22:43:49 +00:00
Jim Grosbach 36c4953348 Simplify resolveFrameIndex() signature.
Just pass a MachineInstr reference rather than an MBB iterator.
Creating a MachineInstr& is the first thing every implementation did
anyway.

llvm-svn: 205453
2014-04-02 19:28:18 +00:00
Hal Finkel 9e0baa6d3a [PowerPC] Add some missing VSX bitcast patterns
llvm-svn: 205352
2014-04-01 19:24:27 +00:00
Hal Finkel b4240ca0f4 [PowerPC] Don't ever expand BUILD_VECTOR of v2i64 with shuffles
If we have two unique values for a v2i64 build vector, this will always result
in two vector loads if we expand using shuffles. Only one is necessary.

llvm-svn: 205231
2014-03-31 17:48:16 +00:00
Hal Finkel 4c8f634f23 [PowerPC] Correct P7 dispatch unit allocation for vector instructions
llvm-svn: 205222
2014-03-31 17:02:10 +00:00
Hal Finkel 5c0d1454d6 [PowerPC] Handle VSX v2i64 SIGN_EXTEND_INREG
sitofp from v2i32 to v2f64 ends up generating a SIGN_EXTEND_INREG v2i64 node
(and similarly for v2i16 and v2i8). Even though there are no sign-extension (or
algebraic shifts) for v2i64 types, we can handle v2i32 sign extensions by
converting to and from v2i64.  The small trick necessary here is to shift the
i32 elements into the right lanes before the i32 -> f64 step.  Because of the
big-endian nature of the system, we need the i32 portion in the high word of
the i64 elements.

For v2i16 and v2i8 we can do the same, but we first use the default Altivec
shift-based expansion from v2i16 or v2i8 to v2i32 (by casting to v4i32) and
then apply the above procedure.

llvm-svn: 205146
2014-03-30 13:22:59 +00:00
Hal Finkel 777c9dd90a [PowerPC] Handle v2i64 comparisons
v2i64 is a legal type under VSX, however we don't have native vector
comparisons. We can handle eq/ne by casting it to an Altivec type, but
everything else must be expanded.

llvm-svn: 205106
2014-03-29 16:04:40 +00:00
Hal Finkel e8fba98735 [PowerPC] VSX instruction latency corrections
The vector divide and sqrt instructions have high latencies, and the scalar
comparisons are like all of the others. On the P7, permutations take an extra
cycle over purely-simple vector ops.

llvm-svn: 205096
2014-03-29 13:20:31 +00:00
Rafael Espindola 5904e12bfa Completely rewrite ELFObjectWriter::RecordRelocation.
I started trying to fix a small issue, but this code has seen a small fix too
many.

The old code was fairly convoluted. Some of the issues it had:

* It failed to check if a symbol difference was in the same section when
  converting a relocation to pcrel.
* It failed to check if the relocation was already pcrel.
* The pcrel value computation was wrong in some cases (relocation-pc.s)
* It was missing quite a few cases where it should not convert symbol
  relocations to section relocations, leaving the backends to patch it up.
* It would not propagate the fact that it had changed a relocation to pcrel,
  requiring a quite nasty workaround in ARM.
* It was missing comments.

llvm-svn: 205076
2014-03-29 06:26:49 +00:00
Hal Finkel 19be506a5e [PowerPC] Add subregister classes for f64 VSX values
We had stored both f64 values and v2f64, etc. values in the VSX registers. This
worked, but was suboptimal because we would always spill 16-byte values even
though we almost always had scalar 8-byte values. This resulted in an
increase in stack-size use, extra memory bandwidth, etc. To fix this, I've
added 64-bit subregisters of the Altivec registers, and combined those with the
existing scalar floating-point registers to form a class of VSX scalar
floating-point registers. The ABI code has also been enhanced to use this
register class and some other necessary improvements have been made.

llvm-svn: 205075
2014-03-29 05:29:01 +00:00
Hal Finkel 2583b06310 [PowerPC] Fix VSX permutation isel
Not only did I invert the indices when I wrote the code, but I also did the
same thing when I wrote the regression test. Oops.

llvm-svn: 205046
2014-03-28 20:24:55 +00:00
Hal Finkel 7811c6188e [PowerPC] v2[fi]64 need to be explicitly passed in VSX registers
v2[fi]64 values need to be explicitly passed in VSX registers. This is because
the code in TRI that finds the minimal register class given a register and a
value type will assert if given an Altivec register and a non-Altivec type.

llvm-svn: 205041
2014-03-28 19:58:11 +00:00
Hal Finkel c6fc9b8960 [PowerPC] Use a small cleanup pass to remove VSX self copies
As explained in r204976, because of how the allocation of VSX registers
interacts with the call-lowering code, we sometimes end up generating self VSX
copies. Specifically, things like this:
  %VSL2<def> = COPY %F2, %VSL2<imp-use,kill>
(where %F2 is really a sub-register of %VSL2, and so this copy is a nop)

This adds a small cleanup pass to remove these prior to post-RA scheduling.

llvm-svn: 204980
2014-03-27 23:12:31 +00:00
Hal Finkel 9dcb3583d5 [PowerPC] Don't remove self VSX copies in PPCInstrInfo::copyPhysReg
Because of how the allocation of VSX registers interacts with the call-lowering
code, we sometimes end up generating self VSX copies. Specifically, things like
this:
  %VSL2<def> = COPY %F2, %VSL2<imp-use,kill>
(where %F2 is really a sub-register of %VSL2, and so this copy is a nop)

The problem is that ExpandPostRAPseudos always assumes that *some* instruction
has been inserted, and adds implicit defs to it. This is a problem if no copy
was inserted because it can cause subtle problems during post-RA scheduling.
These self copies will have to be removed some other way.

llvm-svn: 204976
2014-03-27 22:46:28 +00:00
Hal Finkel 82569b6366 [PowerPC] Fix v2f64 vector extract and related patterns
First, v2f64 vector extract had not been declared legal (and so the existing
patterns were not being used). Second, the patterns for that, and for
scalar_to_vector, should really be a regclass copy, not a subregister
operation, because the VSX registers directly hold both the vector and scalar data.

llvm-svn: 204971
2014-03-27 22:22:48 +00:00
Hal Finkel ad801b7459 [PowerPC] Expand v2i64 shifts
These operations need to be expanded during legalization so that isel does not
crash. In theory, we might be able to custom lower some of these. That,
however, would need to be follow-up work.

llvm-svn: 204963
2014-03-27 21:26:33 +00:00
Rafael Espindola c03f44ca8a Remove another unused argument.
llvm-svn: 204961
2014-03-27 20:49:35 +00:00
Rafael Espindola 9ab380122a Remove unused argument.
llvm-svn: 204956
2014-03-27 20:41:17 +00:00
Rafael Espindola 24a669d225 Prevent alias from pointing to weak aliases.
This adds back r204781.

Original message:

Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given

define void @my_func() {
  ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias

We produce without this patch:

        .weak   my_alias
my_alias = my_func
        .globl  my_alias2
my_alias2 = my_alias

That is, in the resulting ELF file my_alias, my_func and my_alias are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a

@my_alias = alias void ()* @other_func

would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.

There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.

llvm-svn: 204934
2014-03-27 15:26:56 +00:00
Hal Finkel df3e34d944 [PowerPC] Generate VSX permutations for v2[fi]64 vectors
llvm-svn: 204873
2014-03-26 22:58:37 +00:00
Hal Finkel 6e28e6aaaf [PowerPC] VSX loads and stores support unaligned access
I've not yet updated PPCTTI because I'm not sure what the actual relative cost
is compared to the aligned uses.

llvm-svn: 204848
2014-03-26 19:39:09 +00:00
Hal Finkel 7279f4b00d [PowerPC] Use v2f64 <-> v2i64 VSX conversion instructions
llvm-svn: 204843
2014-03-26 19:13:54 +00:00
Hal Finkel ea76a44584 [PowerPC] Remove some dead VSX v4f32 store patterns
These patterns are dead (because v4f32 stores are currently promoted to v4i32
and stored using Altivec instructions), and also are likely not correct
(because they'd store the vector elements in the opposite order from that
assumed by the rest of the Altivec code).

llvm-svn: 204839
2014-03-26 18:26:36 +00:00
Hal Finkel 9281c9a38b [PowerPC] Use VSX vector load/stores for v2[fi]64
These instructions have access to the complete VSX register file. In addition,
they "swap" the order of the elements so that element 0 (the scalar part) comes
first in memory and element 1 follows at a higher address.

llvm-svn: 204838
2014-03-26 18:26:30 +00:00
Hal Finkel a6c8b51212 [PowerPC] Add v2i64 as a legal VSX type
v2i64 needs to be a legal VSX type because it is the SetCC result type from
v2f64 comparisons. We need to expand all non-arithmetic v2i64 operations.

This fixes the lowering for v2f64 VSELECT.

llvm-svn: 204828
2014-03-26 16:12:58 +00:00
Hal Finkel 732f0f73a7 [PowerPC] Lower VSELECT using xxsel when VSX is available
With VSX there is a real vector select instruction, and so we should use it.
Note that VSELECT will still scalarize for v2f64 because the corresponding
SetCC result type (v2i64) is not currently a legal type.

llvm-svn: 204801
2014-03-26 12:49:28 +00:00
Rafael Espindola 65481d7b97 Revert "Prevent alias from pointing to weak aliases."
This reverts commit r204781.

I will follow up with the msan folks to see what they
were trying to do with aliases to weak aliases.

llvm-svn: 204784
2014-03-26 06:14:40 +00:00
Hal Finkel bd4de9d478 [PowerPC] Generate logical vector VSX instructions
These instructions are essentially the same as their Altivec counterparts, but
have access to the larger VSX register file.

llvm-svn: 204782
2014-03-26 04:55:40 +00:00
Rafael Espindola 3b712a84a9 Prevent alias from pointing to weak aliases.
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given

define void @my_func() {
  ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias

We produce without this patch:

        .weak   my_alias
my_alias = my_func
        .globl  my_alias2
my_alias2 = my_alias

That is, in the resulting ELF file my_alias, my_func and my_alias are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a

@my_alias = alias void ()* @other_func

would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.

There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.

llvm-svn: 204781
2014-03-26 04:48:47 +00:00
Hal Finkel 174e590966 [PowerPC] Select between VSX A-type and M-type FMA instructions just before RA
The VSX instruction set has two types of FMA instructions: A-type (where the
addend is taken from the output register) and M-type (where one of the product
operands is taken from the output register). This adds a small pass that runs
just after MI scheduling (and, thus, just before register allocation) that
mutates A-type instructions (that are created during isel) into M-type
instructions when:

 1. This will eliminate an otherwise-necessary copy of the addend

 2. One of the product operands is killed by the instruction

The "right" moment to make this decision is in between scheduling and register
allocation, because only there do we know whether or not one of the product
operands is killed by any particular instruction. Unfortunately, this also
makes the implementation somewhat complicated, because the MIs are not in SSA
form and we need to preserve the LiveIntervals analysis.

As a simple example, if we have:

%vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9
%vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16,
                        %RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16
  ...
  %vreg9<def,tied1> = XSMADDADP %vreg9<tied0>, %vreg17, %vreg19,
                        %RM<imp-use>; VSLRC:%vreg9,%vreg17,%vreg19
  ...

We can eliminate the copy by changing from the A-type to the
M-type instruction. This means:

  %vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16,
                        %RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16

is replaced by:

  %vreg16<def,tied1> = XSMADDMDP %vreg16<tied0>, %vreg18, %vreg9,
                        %RM<imp-use>; VSLRC:%vreg16,%vreg18,%vreg9

and we remove: %vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9

llvm-svn: 204768
2014-03-25 23:29:21 +00:00
Hal Finkel 6c32ff31d0 [PowerPC] Correct commutable indices for VSX FMA instructions
Although the first two operands are the ones that can be swapped, the tied
input operand is listed before them, so we need to adjust for that.

I have a test case for this, but it goes along with an upcoming commit (so it
will come soon).

llvm-svn: 204748
2014-03-25 19:26:43 +00:00
Hal Finkel 25e0454f10 [PowerPC] Add a TableGen relation for A-type and M-type VSX FMA instructions
TableGen will create a lookup table for the A-type FMA instructions providing
their corresponding M-form opcodes. This will be used by upcoming commits.

llvm-svn: 204746
2014-03-25 18:55:11 +00:00
Ulrich Weigand cae3a17a21 [PowerPC] Generate little-endian object files
As a first step towards real little-endian code generation, this patch
changes the PowerPC MC layer to actually generate little-endian object
files.  This involves passing the little-endian flag through the various
layers, including down to createELFObjectWriter so we actually get basic
little-endian ELF objects, emitting instructions in little-endian order,
and handling fixups and relocations as appropriate for little-endian.

The bulk of the patch is to update most test cases in test/MC/PowerPC
to verify both big- and little-endian encodings.  (The only test cases
*not* updated are those that create actual big-endian ABI code, like
the TLS tests.)

Note that while the object files are now little-endian, the generated
code itself is not yet updated, in particular, it still does not adhere
to the ELFv2 ABI.

llvm-svn: 204634
2014-03-24 18:16:09 +00:00
Will Schmidt 114777e47f [PPC64LE] ELFv2 ABI updates for the .opd section
The PPC64 Little Endian (PPC64LE) target supports the ELFv2 ABI, and as
such, does not have a ".opd" section.  This is keyed off a _CALL_ELF=2
macro check.

The CALL_ELF check is not clearly documented at this time.  The basis
for usage in this patch is from the gcc thread here:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01144.html

> Adding comment from Uli:
Looks good to me.  I think the old-style JIT doesn't really work
anyway for 64-bit, but at least with this patch LLVM will compile
and link again on a ppc64le host ...

llvm-svn: 204614
2014-03-24 16:04:15 +00:00
Hal Finkel e01d32107c [PowerPC] Mark many instructions as commutative
I'm under the impression that we used to infer the isCommutable flag from the
instruction-associated pattern. Regardless, we don't seem to do this (at least
by default) any more. I've gone through all of our instruction definitions, and
marked as commutative all of those that should be trivial to commute (by
exchanging the first two operands). There has been special code for the RL*
instructions, and that's not changed.

Before this change, we had the following commutative instructions:

 RLDIMI
 RLDIMIo
 RLWIMI
 RLWIMI8
 RLWIMI8o
 RLWIMIo
 XSADDDP
 XSMULDP
 XVADDDP
 XVADDSP
 XVMULDP
 XVMULSP

After:

 ADD4
 ADD4o
 ADD8
 ADD8o
 ADDC
 ADDC8
 ADDC8o
 ADDCo
 ADDE
 ADDE8
 ADDE8o
 ADDEo
 AND
 AND8
 AND8o
 ANDo
 CRAND
 CREQV
 CRNAND
 CRNOR
 CROR
 CRXOR
 EQV
 EQV8
 EQV8o
 EQVo
 FADD
 FADDS
 FADDSo
 FADDo
 FMADD
 FMADDS
 FMADDSo
 FMADDo
 FMSUB
 FMSUBS
 FMSUBSo
 FMSUBo
 FMUL
 FMULS
 FMULSo
 FMULo
 FNMADD
 FNMADDS
 FNMADDSo
 FNMADDo
 FNMSUB
 FNMSUBS
 FNMSUBSo
 FNMSUBo
 MULHD
 MULHDU
 MULHDUo
 MULHDo
 MULHW
 MULHWU
 MULHWUo
 MULHWo
 MULLD
 MULLDo
 MULLW
 MULLWo
 NAND
 NAND8
 NAND8o
 NANDo
 NOR
 NOR8
 NOR8o
 NORo
 OR
 OR8
 OR8o
 ORo
 RLDIMI
 RLDIMIo
 RLWIMI
 RLWIMI8
 RLWIMI8o
 RLWIMIo
 VADDCUW
 VADDFP
 VADDSBS
 VADDSHS
 VADDSWS
 VADDUBM
 VADDUBS
 VADDUHM
 VADDUHS
 VADDUWM
 VADDUWS
 VAND
 VAVGSB
 VAVGSH
 VAVGSW
 VAVGUB
 VAVGUH
 VAVGUW
 VMADDFP
 VMAXFP
 VMAXSB
 VMAXSH
 VMAXSW
 VMAXUB
 VMAXUH
 VMAXUW
 VMHADDSHS
 VMHRADDSHS
 VMINFP
 VMINSB
 VMINSH
 VMINSW
 VMINUB
 VMINUH
 VMINUW
 VMLADDUHM
 VMULESB
 VMULESH
 VMULEUB
 VMULEUH
 VMULOSB
 VMULOSH
 VMULOUB
 VMULOUH
 VNMSUBFP
 VOR
 VXOR
 XOR
 XOR8
 XOR8o
 XORo
 XSADDDP
 XSMADDADP
 XSMAXDP
 XSMINDP
 XSMSUBADP
 XSMULDP
 XSNMADDADP
 XSNMSUBADP
 XVADDDP
 XVADDSP
 XVMADDADP
 XVMADDASP
 XVMAXDP
 XVMAXSP
 XVMINDP
 XVMINSP
 XVMSUBADP
 XVMSUBASP
 XVMULDP
 XVMULSP
 XVNMADDADP
 XVNMADDASP
 XVNMSUBADP
 XVNMSUBASP
 XXLAND
 XXLNOR
 XXLOR
 XXLXOR

This is a by-inspection change, and I'm not sure how to write a reliable test
case. I would like advice on this, however.

llvm-svn: 204609
2014-03-24 15:07:28 +00:00
Hal Finkel 32854b0439 [PowerPC] Don't schedule VSX copy legalization unless VSX is enabled
There is no need to schedule this extra pass if it will have nothing to do.

llvm-svn: 204594
2014-03-24 09:51:41 +00:00
Hal Finkel bbad2332e3 [PowerPC] Update comment re: VSX copy-instruction selection
I've done some experimentation with this, and it looks like using the
lower-latency (but lower throughput) copy instruction is essentially always the
right thing to do.

My assumption is that, in order to be relatively sure that the higher-latency
copy will increase throughput, we'd want to have it unlikely to be in-flight
with its use. On the P7, the global completion table (GCT) can hold a maximum
of 120 instructions, shared among all active threads (up to 4), giving 30
instructions per thread.  So specifically, I'd require at least that many
instructions between the copy and the use before the high-latency variant is
used.

Trying this, however, over the entire test suite resulted in zero cases where
the high-latency form would be preferable. This may be a consequence of the
fact that the scheduler views copies as free, and so they tend to end up close
to their uses. For this experiment I created a function:

  unsigned chooseVSXCopy(MachineBasicBlock &MBB,
                         MachineBasicBlock::iterator I,
                         unsigned DestReg, unsigned SrcReg,
                         unsigned StartDist = 1,
                         unsigned Depth = 3) const;

with an implementation like:

  if (!Depth)
    return PPC::XXLOR;

  const unsigned MaxDist = 30;
  unsigned Dist = StartDist;
  for (auto J = I, JE = MBB.end(); J != JE && Dist <= MaxDist; ++J) {
    if (J->isTransient() && !J->isCopy())
      continue;

    if (J->isCall() || J->isReturn() || J->readsRegister(DestReg, TRI))
      return PPC::XXLOR;

    ++Dist;
  }

  // We've exceeded the required distance for the high-latency form, use it.
  if (Dist > MaxDist)
    return PPC::XVCPSGNDP;

  // If this is only an exit block, use the low-latency form.
  if (MBB.succ_empty())
    return PPC::XXLOR;

  // We've reached the end of the block, check the successor blocks (up to some
  // depth), and use the high-latency form if that is okay with all successors.
  for (auto J = MBB.succ_begin(), JE = MBB.succ_end(); J != JE; ++J) {
    if (chooseVSXCopy(**J, (*J)->begin(), DestReg, SrcReg,
                      Dist, Depth - 1) == PPC::XXLOR)
      return PPC::XXLOR;
  }

  // All of our successor blocks seem okay with the high-latency variant, so
  // we'll use it.
  return PPC::XVCPSGNDP;

and then changed the copy opcode selection from:
    Opc = PPC::XXLOR;
to:
    Opc = chooseVSXCopy(MBB, std::next(I), DestReg, SrcReg);

In conclusion, I'm removing the FIXME from the comment, because I believe that
there is, at least absent other examples, nothing to fix.

llvm-svn: 204591
2014-03-24 09:36:36 +00:00
Nuno Lopes 31617266ea remove a bunch of unused private methods
Found with a smarter version of -Wunused-member-function that I'm playing with.
Apologies in advance if I removed someone's WIP code.

 include/llvm/CodeGen/MachineSSAUpdater.h            |    1 
 include/llvm/IR/DebugInfo.h                         |    3 
 lib/CodeGen/MachineSSAUpdater.cpp                   |   10 --
 lib/CodeGen/PostRASchedulerList.cpp                 |    1 
 lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp    |   10 --
 lib/IR/DebugInfo.cpp                                |   12 --
 lib/MC/MCAsmStreamer.cpp                            |    2 
 lib/Support/YAMLParser.cpp                          |   39 ---------
 lib/TableGen/TGParser.cpp                           |   16 ---
 lib/TableGen/TGParser.h                             |    1 
 lib/Target/AArch64/AArch64TargetTransformInfo.cpp   |    9 --
 lib/Target/ARM/ARMCodeEmitter.cpp                   |   12 --
 lib/Target/ARM/ARMFastISel.cpp                      |   84 --------------------
 lib/Target/Mips/MipsCodeEmitter.cpp                 |   11 --
 lib/Target/Mips/MipsConstantIslandPass.cpp          |   12 --
 lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp              |   21 -----
 lib/Target/NVPTX/NVPTXISelDAGToDAG.h                |    2 
 lib/Target/PowerPC/PPCFastISel.cpp                  |    1 
 lib/Transforms/Instrumentation/AddressSanitizer.cpp |    2 
 lib/Transforms/Instrumentation/BoundsChecking.cpp   |    2 
 lib/Transforms/Instrumentation/MemorySanitizer.cpp  |    1 
 lib/Transforms/Scalar/LoopIdiomRecognize.cpp        |    8 -
 lib/Transforms/Scalar/SCCP.cpp                      |    1 
 utils/TableGen/CodeEmitterGen.cpp                   |    2 
 24 files changed, 2 insertions(+), 261 deletions(-)

llvm-svn: 204560
2014-03-23 17:09:26 +00:00
Hal Finkel 4a912250fa [PowerPC] Make use of VSX f64 <-> i64 conversion instructions
When VSX is available, these instructions should be used in preference to the
older variants that only have access to the scalar floating-point registers.

llvm-svn: 204559
2014-03-23 05:35:00 +00:00
Hal Finkel 55805eb562 [PowerPC] Fix the VSX v2f64 return register
v2f64 values, like other 128-bit values, are returned under VSX in register
vs34 (Altivec register v2).
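
As an illustrative sketch (not part of the original commit message; assumes a
VSX-enabled subtarget, e.g. -mcpu=pwr7 with VSX turned on), a function
returning a generic <2 x double> vector exercises this return path:

  // Illustrative only: 'v2df' is a generic GCC/Clang vector type mapping to
  // <2 x double>; under VSX the return value comes back in vs34 (Altivec v2).
  typedef double v2df __attribute__((vector_size(16)));
  v2df scale2(v2df x) {
    return x + x;
  }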

llvm-svn: 204543
2014-03-22 18:24:43 +00:00
Alexey Samsonov 94bc422d7a Add llvm_unreachable after fully-covered switches to appease GCC
llvm-svn: 204318
2014-03-20 07:30:40 +00:00
Rafael Espindola 7fadc0ea7d Look through variables when computing relocations.
Given

bar = foo + 4
	.long bar

MC would eat the 4. GNU as includes it in the relocation. The rule seems to be
that a variable that defines a symbol is used in the relocation and one that
does not define a symbol is evaluated and the result included in the relocation.

Fixing this unfortunately required some other changes:

* Since the variable is now evaluated, it would prevent the ELF writer from
  noticing the weakref marker the elf streamer uses. This patch then replaces
  that with a VariantKind in MCSymbolRefExpr.

* Using VariantKind then requires us to look past other VariantKinds to see

	.weakref	bar,foo
	call	bar@PLT

  doing this also fixes

	zed = foo +2
	call zed@PLT

  so that is a good thing.

* Looking past VariantKind means that the relocation selection has to use
  the fixup instead of the target.

This is a reboot of the previous fixes for MC. I will watch the sanitizer
buildbot and wait for a build before adding back the previous fixes.

llvm-svn: 204294
2014-03-20 02:12:01 +00:00
Bill Schmidt ff9622ef0e Fix PR19144: Incorrect offset generated for int-to-fp conversion at -O0.
When converting a signed 32-bit integer to double-precision floating point on
hardware without a lfiwax instruction, we have to instead use a lfd followed
by fcfid.  We were erroneously offsetting the address by 4 bytes in
preparation for either a lfiwax or lfiwzx when generating the lfd.  This fixes
that silly error.
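
As an illustrative sketch (not part of the original commit message), a
conversion of this shape, built at -O0 for a pre-POWER7 subtarget without
lfiwax, goes through the lfd + fcfid sequence described above:

  // Illustrative only: signed i32 -> f64; without lfiwax the value is stored
  // to the stack, reloaded with lfd (with no extra 4-byte offset), and then
  // converted with fcfid.
  double int_to_double(int x) {
    return (double)x;
  }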

This was not caught in the test suite since the conversion tests were run with
-mcpu=pwr7, which implies availability of lfiwax.  I've added another test
case for older hardware that checks the code we expect in the absence of
lfiwax and other flavors of fcfid.  There are fewer tests in this test case
because we punt to DAG selection in more cases on older hardware.  (We must
generate complex fiddly sequences in those cases, and there is marginal
benefit in duplicating that logic in fast-isel.)

llvm-svn: 204155
2014-03-18 14:32:50 +00:00
Craig Topper 26696314d5 [C++11] Mark the target fast isel classes as 'final' so that the compiler can de-virtualize some of the internal calls.
llvm-svn: 204123
2014-03-18 07:27:13 +00:00
Ulrich Weigand f445399870 [ppc64] Avoid copy relocs in named rodata sections
Commit r181723 introduced code to avoid placing initialized variables
needing relocations into the .rodata section, which avoids copy relocs
that do not work as expected on ppc64 function references.

The same treatment is also needed for *named* .rodata.XXX sections.
This patch changes PPC64LinuxTargetObjectFile::SelectSectionForGlobal
to modify "Kind" *before* calling the default SelectSectionForGlobal
routine, instead of first calling the default routine and then just
checking for the (main) .rodata section afterwards.
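
As an illustrative sketch (not from the patch; assumes -fdata-sections or a
similar setup that produces named read-only sections), a global of this shape
needs a relocation and therefore must be kept out of .rodata.* on ppc64:

  // Illustrative only: 'fptr' is initialized with a function address, so the
  // object needs a relocation; leaving it in a .rodata.* section would run
  // into the copy-reloc problem described above.
  extern void callee(void);
  void (*const fptr)(void) = &callee;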

llvm-svn: 203921
2014-03-14 12:45:22 +00:00
Owen Anderson 16c6bf49b7 Phase 2 of the great MachineRegisterInfo cleanup. This time, we're changing
operator* on the by-operand iterators to return a MachineOperand& rather than
a MachineInstr&.  At this point they almost behave like normal iterators!

Again, this requires making some existing loops more verbose, but should pave
the way for the big range-based for-loop cleanups in the future.

llvm-svn: 203865
2014-03-13 23:12:04 +00:00
Hal Finkel 27774d9274 [PowerPC] Initial support for the VSX instruction set
VSX is an ISA extension supported on the POWER7 and later cores that enhances
floating-point vector and scalar capabilities. Among other things, this adds
<2 x double> support and generally helps to reduce register pressure.

The interesting part of this ISA feature is the register configuration: there
are 64 new 128-bit vector registers, the first 32 of which are super-registers of the
existing 32 scalar floating-point registers, and the second 32 of which overlap
with the 32 Altivec vector registers. This makes things like vector insertion
and extraction tricky: this can be free but only if we force a restriction to
the right register subclass when needed. A new "minipass" PPCVSXCopy takes care
of this (although it could do a more-optimal job of it; see the comment about
unnecessary copies below).

Please note that, currently, VSX is not enabled by default when targeting
anything because it is not yet ready for that.  The assembler and disassembler
are fully implemented and tested. However:

 - CodeGen support causes miscompiles; test-suite runtime failures:
      MultiSource/Benchmarks/FreeBench/distray/distray
      MultiSource/Benchmarks/McCat/08-main/main
      MultiSource/Benchmarks/Olden/voronoi/voronoi
      MultiSource/Benchmarks/mafft/pairlocalalign
      MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
      SingleSource/Benchmarks/CoyoteBench/almabench
      SingleSource/Benchmarks/Misc/matmul_f64_4x4

 - The lowering currently falls back to using Altivec instructions far more
   than it should. Worse, there are some things that are scalarized through the
   stack that shouldn't be.

 - A lot of unnecessary copies make it past the optimizers, and this needs to
   be fixed.

 - Many more regression tests are needed.

Normally, I'd fix these things prior to committing, but there are some
students and other contributors who would like to work on this, and so it makes
sense to move this development process upstream where it can be subject to the
regular code-review procedures.

llvm-svn: 203768
2014-03-13 07:58:58 +00:00
Hal Finkel 5457bd08cb [TableGen] Optionally forbid overlap between named and positional operands
There are currently two schemes for mapping instruction operands to
instruction-format variables for generating the instruction encoders and
decoders for the assembler and disassembler respectively: a) to map by name and
b) to map by position.

In the long run, we'd like to remove the position-based scheme and use only
name-based mapping. Unfortunately, the name-based scheme currently cannot deal
with complex operands (those with suboperands), and so we currently must use
the position-based scheme for those. On the other hand, the position-based
scheme cannot deal with (register) variables that are split into multiple
ranges. An upcoming commit to the PowerPC backend (adding VSX support) will
require this capability. While we could teach the position-based scheme to
handle that, since we'd like to move away from the position-based mapping
generally, it seems silly to teach it new tricks now. What makes more sense is
to allow for partial transitioning: use the name-based mapping when possible,
and only use the position-based scheme when necessary.

Now the problem is that mixing the two sensibly was not possible: the
position-based mapping would map based on position, but would not skip those
variables that were mapped by name. Instead, the two sets of assignments would
overlap. However, I cannot currently change this behavior, because there
are some backends that rely on it [I think mistakenly, but I'll send a message
to llvmdev about that]. So I've added a new TableGen bit variable:
noNamedPositionallyEncodedOperands, that can be used to cause the
position-based mapping to skip variables mapped by name.

llvm-svn: 203767
2014-03-13 07:57:54 +00:00
Roman Divacky a26f9a6a42 Allow exclamation and tilde to be parsed as a part of the ppc asm operand.
llvm-svn: 203699
2014-03-12 19:25:57 +00:00
Rafael Espindola 3d5d464df8 Try harder to evaluate expressions when printing assembly.
When printing assembly we don't have a Layout object, but we can still
try to fold some constants.

Testcase by Ulrich Weigand.

llvm-svn: 203677
2014-03-12 16:55:59 +00:00
Will Schmidt acae468c8e Update the datalayout string for ppc64LE.

llvm-svn: 203664
2014-03-12 14:59:17 +00:00
Patrik Hagglund 1da3512166 Replace '#include ValueTypes.h' with forward declarations.
In some cases the include is pushed "downstream" (or removed if
unused).

llvm-svn: 203644
2014-03-12 08:00:24 +00:00
Chandler Carruth aee3ca6cfd [TTI] There is actually no realistic way to pop TTI implementations off
the stack of the analysis group because they are all immutable passes.
This is made clear by Craig's recent work to use override
systematically -- we weren't overriding anything for 'finalizePass'
because there is no such thing.

This is kind of a lame restriction on the API -- we can no longer push
and pop things, we just set up the stack and run. However, I'm not
invested in building some better solution on top of the existing
(terrifying) immutable pass and legacy pass manager.

llvm-svn: 203437
2014-03-10 02:45:14 +00:00
Rafael Espindola 24a542fd5c Don't avoid cfi instructions on the bg/p.
The integrated assembler now works for ppc. Since this was the last use of the
bg/p predicate and Hal says that it is now dead, drop the predicate too.

llvm-svn: 203269
2014-03-07 19:04:12 +00:00
David Majnemer 7b58305ff6 MC: Remove superfluous section attribute flag definitions
Summary:
llvm/MC/MCSectionMachO.h and llvm/Support/MachO.h both had the same
definitions for the section flags.  Instead, grab the definitions out of
support.

No functionality change.

Reviewers: grosbach, Bigcheese, rafael

Reviewed By: rafael

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2998

llvm-svn: 203211
2014-03-07 07:36:05 +00:00
Rafael Espindola b1f25f1b93 Replace PROLOG_LABEL with a new CFI_INSTRUCTION.
The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.

The semantics were that the cfi instructions were mapped to the PROLOG_LABEL
via the temporary label. The output position was that of the PROLOG_LABEL.
The temporary label itself was used only for doing the mapping.

The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to
one by holding an index into the CFI instructions of this function.

I did consider removing MMI.getFrameInstructions completely and having
CFI_INSTRUCTION own an MCCFIInstruction, but MCCFIInstructions have
non-trivial constructors and destructors and are somewhat big, so this setup
is probably better.

The net result is that we don't create temporary labels that are never used.

llvm-svn: 203204
2014-03-07 06:08:31 +00:00
Hal Finkel 6daf2aa140 The PPC global base register cannot be r0
The global base register cannot be r0 because it might end up as the first
argument to addi or addis. Fixes PR18316.

I don't have a small stable test case.

llvm-svn: 203054
2014-03-06 01:28:23 +00:00
Chandler Carruth 9a4c9e597b [Layering] Move DebugInfo.h into the IR library where its implementation
already lives.

llvm-svn: 203046
2014-03-06 00:46:21 +00:00
Hal Finkel 7f908e8ef4 Fixup PPC Darwin i1 argument handling
Like on other targets, we need to zero_extend/truncate i1 args before copying
them to GPRs.

llvm-svn: 203045
2014-03-06 00:45:19 +00:00
Hal Finkel 2a9d318e4a When using CR bit registers on PPC32, handle the i1 vaarg case
When copying an i1 value into a GPR for a vaarg call, we need to explicitly
zero-extend the i1 value (otherwise an invalid CRBIT -> GPR copy will be
generated).

llvm-svn: 203041
2014-03-06 00:23:33 +00:00
Hal Finkel 6a56b21729 With PPC CR bit registers, handle int_to_fp on older cores
On cores without fpcvt support, we cannot promote int_to_fp i1 operations,
because there is nothing to promote them to. The most straightforward
implementation of this uses a select to choose between the two possible
resulting floating-point values (and that's what is done here).
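
As an illustrative sketch (not from the patch), this is the kind of
i1-to-floating-point conversion involved; on a core without fpcvt it is
lowered as a select between the two possible results:

  // Illustrative only: with CR-bit tracking, 'b' is an i1 value, and on a
  // core without fpcvt the conversion is lowered as a select, conceptually
  // (b ? 1.0 : 0.0).
  double bool_to_double(bool b) {
    return (double)b;
  }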

llvm-svn: 203015
2014-03-05 22:14:00 +00:00
Joerg Sonnenberger cce644a633 Enable integrated assembler on OpenBSD/PPC32 by default, too.
From Brad Smith.

llvm-svn: 202967
2014-03-05 11:37:04 +00:00
Will Schmidt 3503018e19 [PowerPC] support powerpc64le as syntax-checking target (pass2)
Register the Asm Printer for the ppc64le target.

This fills in a spot that was missed in an earlier change (r187179).

llvm-svn: 202861
2014-03-04 16:51:52 +00:00
Chandler Carruth 4220e9c154 [Modules] Move ValueHandle into the IR library where Value itself lives.
Move the test for this class into the IR unittests as well.

This uncovers that ValueMap too is in the IR library. Ironically, the
unittest for ValueMap is useless in the Support library (honestly, so
was the ValueHandle test) and so it already lives in the IR unittests.
Mmmm, tasty layering.

llvm-svn: 202821
2014-03-04 11:17:44 +00:00
Chandler Carruth 03eb0de93d [Modules] Move GetElementPtrTypeIterator into the IR library. As its
name might indicate, it is an iterator over the types in an instruction
in the IR.... You see where this is going.

Another step of modularizing the support library.

llvm-svn: 202815
2014-03-04 10:40:04 +00:00
Hal Finkel 6aca2373f2 Add a PPC inline asm constraint type for single CR bits
Now that the PowerPC backend can track individual CR bits as first-class
registers, we should also have a way of allocating them for inline asm
statements. Because these registers are only one bit, if an output variable is
implicitly cast to a larger integer size, we'll get an any_extend to that
larger type (this is part of the existing target-independent logic). As a
result, regardless of the size of the output type, only the first bit is
meaningful.

The constraint identifier "wc" has been chosen for this purpose. Although gcc
does not currently support allocating individual CR bits, this identifier
choice has been coordinated with the gcc PowerPC team, and will be marked as
reserved for this purpose in the gcc constraints.md file.

llvm-svn: 202657
2014-03-02 18:23:39 +00:00
Benjamin Kramer b6d0bd48bd [C++11] Replace llvm::next and llvm::prior with std::next and std::prev.
Remove the old functions.

llvm-svn: 202636
2014-03-02 12:27:27 +00:00
Craig Topper 73156025e0 Switch all uses of LLVM_OVERRIDE to just use 'override' directly.
llvm-svn: 202621
2014-03-02 09:09:27 +00:00
Craig Topper 77dfe45f81 Switch all uses of LLVM_FINAL to just use 'final', and remove the macro.
llvm-svn: 202618
2014-03-02 08:08:51 +00:00
Hal Finkel 46043edc56 Remove extra truncs/exts around i32 bit operations on PPC64
This generalizes the code to eliminate extra truncs/exts around i1 bit
operations to also do the same on PPC64 for i32 bit operations. This eliminates
a fairly prevalent code wart:

int foo(int a) {
  return a == 5 ? 7 : 8;
}

On PPC64, because of the extension implied by the ABI, this would generate:

	cmplwi 0, 3, 5
	li 12, 8
	li 4, 7
	isel 3, 4, 12, 2
	rldicl 3, 3, 0, 32
	blr

where the 'rldicl 3, 3, 0, 32', the extension, is completely unnecessary. At
least for the single-BB case (which is all that the DAG combine mechanism can
handle), this unnecessary extension is no longer generated.

llvm-svn: 202600
2014-03-01 21:36:57 +00:00
Hal Finkel b998915ee1 Swap PPC isel operands to allow for 0-folding
The PPC isel instruction can fold 0 into the first operand (thus eliminating
the need to materialize a zero-containing register when the 'true' result of
the isel is 0). When the isel is fed by a bit register operation that we can
invert, do so as part of the bit-register-operation peephole routine.
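
As an illustrative sketch (not from the patch), source like the following
produces an isel whose 'true' result is 0, so the zero can be folded into the
instruction instead of being materialized in a register:

  // Illustrative only: the value selected when the condition is true is 0,
  // which isel can fold directly; when the condition comes from an invertible
  // CR-bit operation, the peephole swaps operands to reach this form.
  int zero_or_b(int a, int b) {
    return (a < b) ? 0 : b;
  }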

llvm-svn: 202469
2014-02-28 06:11:16 +00:00
Hal Finkel 5cae2168c7 Trying to unbreak the darwin11 builder
The CR bit tracking code broke PPC/Darwin; trying to get it working again...

(the darwin11 builder, which defaults to the darwin ABI when running PPC tests,
asserted when running test/CodeGen/PowerPC/inverted-bool-compares.ll)

llvm-svn: 202459
2014-02-28 01:17:25 +00:00
Hal Finkel b39a0475c0 Try to unbreak the C++11 build
Cannot use negative numbers in case statements without running afoul of -Wc++11-narrowing.

llvm-svn: 202455
2014-02-28 00:45:27 +00:00
Hal Finkel 940ab934d4 Add CR-bit tracking to the PowerPC backend for i1 values
This change enables tracking i1 values in the PowerPC backend using the
condition register bits. These bits can be treated on PowerPC as separate
registers; individual bit operations (and, or, xor, etc.) are supported.
Tracking booleans in CR bits has several advantages:

 - Reduction in register pressure (because we no longer need GPRs to store
   boolean values).

 - Logical operations on booleans can be handled more efficiently; we used to
   have to move all results from comparisons into GPRs, perform promoted
   logical operations in GPRs, and then move the result back into condition
   register bits to be used by conditional branches. This can be very
   inefficient, because these CR <-> GPR moves have high latency and low
   throughput (especially when other associated instructions
   are accounted for).

 - On the POWER7 and similar cores, we can increase total throughput by using
   the CR bits. CR bit operations have a dedicated functional unit.

Most of this is more-or-less mechanical: Adjustments were needed in the
calling-convention code, support was added for spilling/restoring individual
condition-register bits, and conditional branch instruction definitions taking
specific CR bits were added (plus patterns and code for generating bit-level
operations).

This is enabled by default when running at -O2 and higher. For -O0 and -O1,
where the ability to debug is more important, this feature is disabled by
default. Individual CR bits do not have assigned DWARF register numbers,
and storing values in CR bits makes them invisible to the debugger.

It is critical, however, that we don't move i1 values that have been promoted
to larger values (such as those passed as function arguments) into bit
registers only to quickly turn around and move the values back into GPRs (such
as happens when values are returned by functions). A pair of target-specific
DAG combines are added to remove the trunc/extends in:
  trunc(binary-ops(binary-ops(zext(x), zext(y)), ...))
and:
  zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...))
In short, we only want to use CR bits where some of the i1 values come from
comparisons or are used by conditional branches or selects. To put it another
way, if we can do the entire i1 computation in GPRs, then we probably should
(on the POWER7, the GPR-operation throughput is higher, and for all cores, the
CR <-> GPR moves are expensive).
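
As an illustrative sketch (not from the patch), this is roughly the kind of
code where CR-bit tracking is intended to pay off: the i1 values come from
comparisons and feed a select, so they never need to round-trip through GPRs:

  // Illustrative only: 'x' and 'y' are produced by comparisons and consumed
  // by a conditional select/branch, so they can live entirely in CR bits;
  // had they only fed integer arithmetic, keeping them in GPRs would be
  // preferable.
  int pick(int a, int b, int c) {
    bool x = (a == b);
    bool y = (b != c);
    return (x && y) ? a : c;
  }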

POWER7 test-suite performance results (from 10 runs in each configuration):

SingleSource/Benchmarks/Misc/mandel-2: 35% speedup
MultiSource/Benchmarks/Prolangs-C++/city/city: 21% speedup
MultiSource/Benchmarks/MiBench/automotive-susan: 23% speedup
SingleSource/Benchmarks/CoyoteBench/huffbench: 13% speedup
SingleSource/Benchmarks/Misc-C++/Large/sphereflake: 13% speedup
SingleSource/Benchmarks/Misc-C++/mandel-text: 10% speedup

SingleSource/Benchmarks/Misc-C++-EH/spirit: 10% slowdown
MultiSource/Applications/lemon/lemon: 8% slowdown

llvm-svn: 202451
2014-02-28 00:27:01 +00:00
Aaron Ballman 3d69c5cae4 Silencing an MSVC signed comparison warning.
llvm-svn: 202295
2014-02-26 20:22:20 +00:00
Hal Finkel 22304046c7 Account for 128-bit integer operations in PPCCTRLoops
We need to abort the formation of counter-register-based loops where there are
128-bit integer operations that might become function calls.

llvm-svn: 202192
2014-02-25 20:51:50 +00:00
Rafael Espindola 935125126c Make DataLayout a plain object, not a pass.
Instead, have a DataLayoutPass that holds one. This will allow parts of LLVM
that don't handle passes to also use DataLayout.

llvm-svn: 202168
2014-02-25 17:30:31 +00:00
Rafael Espindola aeff8a9c05 Make some DataLayout pointers const.
No functionality change. Just reduces the noise of an upcoming patch.

llvm-svn: 202087
2014-02-24 23:12:18 +00:00
Rafael Espindola 612886fc8c Rename a few more DataLayout variables.
llvm-svn: 201833
2014-02-21 01:53:35 +00:00
Rafael Espindola a3ad4e693c move getNameWithPrefix and getSymbol to TargetMachine.
TargetLoweringBase is implemented in CodeGen, so before this patch we had
a dependency from Target to CodeGen. This would show up as a link failure of
llvm-stress when building with -DBUILD_SHARED_LIBS=ON.

This fixes pr18900.

llvm-svn: 201711
2014-02-19 20:30:41 +00:00
Rafael Espindola daeafb4c2a Add back r201608, r201622, r201624 and r201625
r201608 made llvm correctly handle private globals with MachO. r201622 fixed
a bug in it and r201624 and r201625 were changes for using private linkage,
assuming that llvm would do the right thing.

They all got reverted because r201608 introduced a crash in LTO. This patch
includes a fix for that. The issue was that TargetLoweringObjectFile now has
to be initialized before we can mangle names of private globals. This is
trivially true during the normal codegen pipeline (the asm printer does it),
but LTO has to do it manually.

llvm-svn: 201700
2014-02-19 17:23:20 +00:00
Daniel Jasper 7e198ad862 Revert r201622 and r201608.
This causes the LLVMgold plugin to segfault. More information on the
replies to r201608.

llvm-svn: 201669
2014-02-19 12:26:01 +00:00
Rafael Espindola 09dcc6a536 Fix PR18743.
The IR
@foo = private constant i32 42

is valid, but before this patch we would produce an invalid MachO from it. It
was invalid because it would use an L label in a section where the linker needs
the labels in order to atomize it.

One way of fixing it would be to just reject this IR in the backend, but that
would not be very front end friendly.

What this patch does is use an 'l' prefix in sections that we know the linker
requires symbols for atomizing them. This allows frontends to just use
private and not worry about which sections they go to or how the linker handles
them.

One small issue with this strategy is that now a symbol name depends on the
section, which is not available before codegen. This is not a problem in
practice. The reason is that it only happens with private linkage, which will
be ignored by the non codegen users (llvm-nm and llvm-ar).

llvm-svn: 201608
2014-02-18 22:24:57 +00:00
Rafael Espindola ea09c595a6 Rename a DebugLoc variable to DbgLoc and a DataLayout to DL.
This is quite a bit less confusing now that TargetData was renamed DataLayout.

llvm-svn: 201606
2014-02-18 22:05:46 +00:00
Daniel Sanders 753e17629d Re-commit: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for
targets with mature MC support. Such targets will always parse the inline
assembly (even when emitting assembly). Targets without mature MC support
continue to use EmitRawText() for assembly output.

The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced
with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler
to parse inline assembly (even when emitting assembly output). UseIntegratedAs
is set to true for targets that consider any failure to parse valid assembly
to be a bug. Target specific subclasses generally enable the integrated
assembler in their constructor. The default value can be overridden with
-no-integrated-as.

All tests that rely on inline assembly supporting invalid assembly (for example,
those that use mnemonics such as 'foo' or 'hello world') have been updated to
disable the integrated assembler.

Changes since review (and last commit attempt):
- Fixed test failures that were missed due to configuration of local build.
  (fixes crash.ll and a couple others).
- Fixed tests that happened to pass because the local build was on X86
  (should fix 2007-12-17-InvokeAsm.ll)
- mature-mc-support.ll's should no longer require all targets to be compiled.
  (should fix ARM and PPC buildbots)
- Object output (-filetype=obj and similar) now forces the integrated assembler
  to be enabled regardless of default setting or -no-integrated-as.
  (should fix SystemZ buildbots)

Reviewers: rafael

Reviewed By: rafael

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2686

llvm-svn: 201333
2014-02-13 14:44:26 +00:00
Daniel Sanders abe212a3b8 Revert r201237+r201238: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call
It introduced multiple test failures in the buildbots.

llvm-svn: 201241
2014-02-12 15:39:20 +00:00
Daniel Sanders a7d504cf58 Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output.

The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as.

All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler.

Reviewers: rafael

Reviewed By: rafael

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2686

llvm-svn: 201237
2014-02-12 14:44:54 +00:00
Rafael Espindola fa0f72837f Pass the Mangler by reference.
It is never null and it is not used in casts, so there is no reason to use a
pointer. This matches how we pass TM.

llvm-svn: 201025
2014-02-08 14:53:28 +00:00
Rafael Espindola 1070501586 Add LLVM_OVERRIDE to a few declarations.
llvm-svn: 201022
2014-02-08 06:07:27 +00:00
Rafael Espindola 4998280fdf Just returning false is the default.
llvm-svn: 200890
2014-02-06 00:03:15 +00:00
Matt Arsenault 25793a3f22 Add address space argument to allowsUnalignedMemoryAccess.
On R600, some address spaces have more strict alignment
requirements than others.

llvm-svn: 200887
2014-02-05 23:15:53 +00:00
Rafael Espindola b4eec1daa1 Remove support for not using .loc directives.
Clang itself was not using this. The only way to access it was via llc.

llvm-svn: 200862
2014-02-05 18:00:21 +00:00
Hal Finkel a7bbaf6de6 Replace PPC instruction-size code with MCInstrDesc getSize
As part of the cleanup done to enable the disassembler, the PPC instructions
now have a valid Size description field. This can now be used to replace some
custom logic in a few places to compute instruction sizes.
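
A minimal sketch (not the exact code in the patch) of what the replacement
looks like, assuming a MachineInstr is at hand:

  // Illustrative only: the byte size filled in by TableGen is available via
  // the instruction description, so no target-specific size table is needed.
  #include "llvm/CodeGen/MachineInstr.h"
  unsigned getInstrSizeInBytes(const llvm::MachineInstr &MI) {
    return MI.getDesc().getSize();
  }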

Patch by David Wiberg!

llvm-svn: 200623
2014-02-02 06:12:27 +00:00
David Woodhouse d2cca113df Delete MCSubtargetInfo data members from target MCCodeEmitter classes
The subtarget info is explicitly passed to the EncodeInstruction
method and we should use that subtarget info to influence any
encoding decisions.

llvm-svn: 200350
2014-01-28 23:13:25 +00:00
David Woodhouse 3fa98a65e9 Propagate MCSubtargetInfo through TableGen's getBinaryCodeForInstr()
llvm-svn: 200349
2014-01-28 23:13:18 +00:00
David Woodhouse 9784cef38d Explictly pass MCSubtargetInfo to MCCodeEmitter::EncodeInstruction()
llvm-svn: 200348
2014-01-28 23:13:07 +00:00
David Woodhouse e6c13e4abd Change MCStreamer EmitInstruction interface to take subtarget info
llvm-svn: 200345
2014-01-28 23:12:42 +00:00
Iain Sandoe 625b65a90c Provide a stub Target Streamer implementation for PPC MachO
At present, this handles .tc (error) and needs to be expanded to deal properly with .machine

llvm-svn: 200309
2014-01-28 11:03:17 +00:00
Hal Finkel 4e703bcecd Handle spilling the PPC GPRC_NOR0 register class
GPRC_NOR0 is not a subclass of GPRC (because it also contains the ZERO pseudo
register). As a result, we also need to check for it in the spilling code.

llvm-svn: 200288
2014-01-28 05:32:58 +00:00
Eric Christopher 2037caf8b9 Revert r199871 and replace it with a simple check in the debug info
code to see if we're emitting a function into a non-default
text section. This is still a less-than-ideal solution, but one more
contained than r199871, for determining whether or not we're emitting
code into an array of comdat sections.

llvm-svn: 200269
2014-01-28 00:49:26 +00:00
Rafael Espindola e41383f899 Pass a MCSubtargetInfo down to the TargetStreamer creation.
With this the target streamers will be able to know the target features that
are in use.

llvm-svn: 200135
2014-01-26 06:38:58 +00:00
Rafael Espindola 24ea09ef7d Construct the MCStreamer before constructing the MCTargetStreamer.
This has a few advantages:
* Only targets that use a MCTargetStreamer have to worry about it.
* There is never a MCTargetStreamer without a MCStreamer, so we can use a
  reference.
* A MCTargetStreamer can talk to the MCStreamer in its constructor.

llvm-svn: 200129
2014-01-26 06:06:37 +00:00
Rafael Espindola 6b9ee9bce3 Remove an easy use of EmitRawText from PPC.
This makes lib/Target/PowerPC EmitRawText free.

llvm-svn: 200065
2014-01-25 02:35:56 +00:00
Juergen Ributzka 3e752e7af9 Add final and override keywords to TargetTransformInfo's subclasses.
llvm-svn: 200021
2014-01-24 18:22:59 +00:00
Alp Toker cb40291100 Fix known typos
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.

llvm-svn: 200018
2014-01-24 17:20:08 +00:00
Eric Christopher 15abef6df9 Add a variable to track whether or not we've used a unique section,
e.g. linkonce, to TargetMachine and set it when we've done so,
currently for ELF targets. This involved making TargetMachine
non-const in a TLOF use and propagating that change around - I'm
open to other ideas.

This will be used in a future commit to handle emitting debug
information with ranges.

llvm-svn: 199871
2014-01-23 06:47:25 +00:00
Rafael Espindola 28a85a84ac Fix pr18515.
My understanding (from reading just the llvm code) is that
* most ppc cpus have a "sync n" instruction and an msync alias that is "sync 0".
* "book e" cpus instead have a msync instruction and not the more
general "sync n"

This patch reflects that in the .td files, allowing a single codepath for
asm and obj streamers and incidentally fixes a crash when EmitRawText was
called on an obj streamer.

llvm-svn: 199832
2014-01-22 20:20:52 +00:00
Hal Finkel 3e4a34c8c3 Fix pointer info on PPC byval stores
For PPC64 SVR (and Darwin), the stores that take byval aggregate parameters
from registers into the stack frame had MachinePointerInfo objects with
incorrect offsets. These offsets are relative to the object itself, not to the
stack frame base.
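
As an illustrative sketch (not from the patch), a byval aggregate parameter of
this shape is the case in question: the registers holding its leading words
are stored into the local copy, and those stores must be described with
offsets relative to the object:

  // Illustrative only: 's' is passed byval; the prologue stores the register
  // portion into the stack copy, and each store's MachinePointerInfo offset
  // is relative to 's' itself, not to the frame base.
  struct Big { long v[6]; };
  long third(Big s) {
    return s.v[2];
  }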

This fixes self hosting on PPC64 when compiling with -enable-aa-sched-mi.

llvm-svn: 199763
2014-01-21 20:15:58 +00:00
Rafael Espindola 4a1a360634 Make getTargetStreamer return a possibly null pointer.
This will allow it to be called from target independent parts of the main
streamer that don't know if there is a registered target streamer or not. This
in turn will allow targets to perform extra actions at specified points in the
interface: add extra flags for some labels, extra work during finalization, etc.

llvm-svn: 199174
2014-01-14 01:21:46 +00:00
Chandler Carruth 73523021d0 [PM] Split DominatorTree into a concrete analysis result object which
can be used by both the new pass manager and the old.

This removes it from any of the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.

The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.

Also in numerous places things became cleaner because I switched from
re-running the pass (!!! midway through some other pass's run !!!) to
directly recomputing the domtree.

llvm-svn: 199104
2014-01-13 13:07:17 +00:00
Chandler Carruth 5ad5f15cff [cleanup] Move the Dominators.h and Verifier.h headers into the IR
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.

Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that even Clang's CFGBlocks can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which both
caches, reconstructs, and supports a nice update API.

But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.

llvm-svn: 199082
2014-01-13 09:26:24 +00:00
Saleem Abdulrasool a6505ca4c2 correct target directive handling error handling
The target specific parser should return `false' if the target AsmParser handles
the directive, and `true' if the generic parser should handle the directive.
Many of the target specific directive handlers would `return Error' which does
not follow these semantics.  This change simply changes the target specific
routines to conform to the semantics of ParseDirective.

Conformance to the semantics improves diagnostics emitted for the invalid
directives.  X86 is taken as a sample to ensure that multiple diagnostics are
not presented for a single error.
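
A minimal sketch of the convention (hypothetical directive name, not code from
the patch):

  // Illustrative only: return false when the target parser handled the
  // directive (even if it issued a diagnostic), and true to hand the
  // directive back to the generic parser.
  #include "llvm/MC/MCParser/MCAsmLexer.h"
  static bool parseTargetDirective(const llvm::AsmToken &DirectiveID) {
    if (DirectiveID.getString() == ".foo_machine") {
      // ... parse operands here; report problems via the parser's Error() ...
      return false;   // handled by the target-specific parser
    }
    return true;      // unknown here; let the generic parser handle it
  }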

llvm-svn: 199068
2014-01-13 01:15:39 +00:00
Chandler Carruth d48cdbf0c3 Put the functionality for printing a value to a raw_ostream as an
operand into the Value interface just like the core print method is.
That gives a more consistent organization to the IR printing interfaces
-- they are all attached to the IR objects themselves. Also, update all
the users.

This removes the 'Writer.h' header which contained only a single function
declaration.

llvm-svn: 198836
2014-01-09 02:29:41 +00:00
Rafael Espindola 894843cb4e Move the llvm mangler to lib/IR.
This makes it available to tools that don't link with target (like llvm-ar).

llvm-svn: 198708
2014-01-07 21:19:40 +00:00
Chandler Carruth 9aca918df9 Move the LLVM IR asm writer header files into the IR directory, as they
are part of the core IR library in order to support dumping and other
basic functionality.

Rename the 'Assembly' include directory to 'AsmParser' to match the
library name and the only functionality left there -- printing has been
in the core IR library for quite some time.

Update all of the #includes to match.

All of this started because I wanted to have the layering in good shape
before I started adding support for printing LLVM IR using the new pass
infrastructure, and commandline support for the new pass infrastructure.

llvm-svn: 198688
2014-01-07 12:34:26 +00:00
Chandler Carruth 8a8cd2bab9 Re-sort all of the includes with ./utils/sort_includes.py so that
subsequent changes are easier to review. About to fix some layering
issues, and wanted to separate out the necessary churn.

Also comment and sink the include of "Windows.h" in three .inc files to
match the usage in Memory.inc.

llvm-svn: 198685
2014-01-07 11:48:04 +00:00
Bill Wendling 13199b17f8 Remove unnecessary #includes.
llvm-svn: 198585
2014-01-06 06:00:00 +00:00
Bill Wendling 908bf814e7 Refactor function that checks that __builtin_returnaddress's argument is constant.
This moves the check up into the parent class so that all targets can use it
without having to copy (and keep in sync) the same error message.

llvm-svn: 198579
2014-01-06 00:43:20 +00:00
Bill Wendling df7dd28dc8 Emit an error message if the value passed to __builtin_returnaddress isn't a constant
__builtin_returnaddress requires that the value passed to it be a constant.
However, at -O0 even a constant expression may not be converted to a constant.
Emit an error message instead of crashing.
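
As an illustrative example (the builtin is spelled __builtin_return_address in
source), only an integer constant is accepted as the level:

  // Illustrative only: a literal constant level is fine; a non-constant
  // expression now produces the error message instead of crashing at -O0.
  void *caller_pc() {
    return __builtin_return_address(0);
  }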

llvm-svn: 198531
2014-01-05 01:47:20 +00:00
Rafael Espindola 58873566b3 Make the llvm mangler depend only on DataLayout.
Before this patch any program that wanted to know the final symbol name of a
GlobalValue had to link with Target.

This patch implements a compromise solution where the mangler uses DataLayout.
This way, any tool that already links with Target (llc, clang) gets the exact
behavior as before and new IR files can be mangled without linking with Target.

With this patch the mangler is constructed with just a DataLayout and DataLayout
is extended to include the information the Mangler needs.

llvm-svn: 198438
2014-01-03 19:21:54 +00:00
Hal Finkel 860fa9052e [PPC] Fix comment to match function name
llvm-svn: 198362
2014-01-02 22:09:39 +00:00
Hal Finkel 1d429f2ee0 [PPC] Fix the scheduling of CR logicals on the P7
CR logicals (crand, crxor, etc.) on the P7 need to be in the first slot of each
dispatch group. The old itinerary entry was just wrong (but has not mattered
because we don't generate these instructions).

This will matter when, in an upcoming commit, we start generating these
instructions.

llvm-svn: 198359
2014-01-02 21:38:26 +00:00
Hal Finkel 77c8dc1da3 [PPC] Use the correct immediate operands on 64-bit instructions
Several of the 64-bit fixed-point instructions with immediate operands were
using the 32-bit (i32) operand nodes instead of the corresponding 64-bit (i64)
operand definitions (u16imm instead of u16imm64, for example).

This error has had no effect so far, but would have caused type-checking
violations with an upcoming change.

llvm-svn: 198356
2014-01-02 21:26:59 +00:00
Roman Divacky bc1655b4b0 Use r2 when encoding tls on ppc32. Fixes PR18305.
llvm-svn: 197878
2013-12-22 10:45:37 +00:00
Roman Divacky 8854e76944 Add some comments.
llvm-svn: 197875
2013-12-22 09:48:38 +00:00
Roman Divacky 32143e2bda Implement initial-exec TLS for PPC32.
llvm-svn: 197824
2013-12-20 18:08:54 +00:00
Rafael Espindola 9ec26f395b Long doubles are required to be aligned to 128 bits and svr4 32 bits.
Clang was already getting this right.

llvm-svn: 197694
2013-12-19 16:23:59 +00:00
Hal Finkel 2345347eb9 Add a disassembler to the PowerPC backend
The tests for the disassembler were adapted from the encoder tests, and for the
most part, the output from the disassembler matches that encoder-test inputs.
There are some places where more-informative mnemonics could be produced
(notably for the branch instructions), and those cases are noted in the tests
with FIXMEs.

Future work includes:

 - Generating more-informative mnemonics when possible (this may also be done
   in the printer).

 - Removing the dependence on positional "numbered" operand-to-variable mapping
   (for both encoding and decoding).

 - Internally using 64-bit instruction variants in 64-bit mode (if this turns
   out to matter).

llvm-svn: 197693
2013-12-19 16:13:01 +00:00
Rafael Espindola 988f35e999 Fix f64 and f128 for ppc-darwin.
This patch adds -f64:32:64 to 32-bit ppc darwin, since an f64 inside a
structure is only 32-bit aligned.
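
As an illustrative sketch (assuming the 32-bit powerpc-apple-darwin struct
layout rules), the double below is only 4-byte aligned inside the struct,
which is what the f64:32:64 entry expresses:

  // Illustrative only: on ppc32-darwin 'd' is placed at offset 4 (32-bit
  // aligned) rather than offset 8, because doubles that are not the first
  // member of a struct get 4-byte alignment under this ABI.
  struct Mixed {
    char c;
    double d;
  };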

The patch also drops -f128:64:128 from all ppc darwin, since f128 is
128-bit aligned.

llvm-svn: 197574
2013-12-18 15:06:25 +00:00
Rafael Espindola 382ee385fd One ppc32-darwin, a i64 inside a structure can have 32 bit alignment.
Thanks for Iain Sandoe for testing this with the original gcc.

Clang was already getting this right.

llvm-svn: 197572
2013-12-18 14:35:37 +00:00
Hal Finkel b4b99e545b Eliminate PPC instruction decoding ambiguities
The instruction definitions in the PPC backend have a number of variants
defined for the same instruction to represent differences between 64-bit and
32-bit semantics. In order to generate a disassembler for the PPC backend, we
need to mark all but one of these as CodeGen only.

No functionality change intended; this is prep work for PPC disassembly
support.

llvm-svn: 197535
2013-12-17 23:05:18 +00:00
Rafael Espindola 345d718d16 Fix the pointer size for the PS3 datalayout.
This will be tested from clang.

llvm-svn: 197501
2013-12-17 15:29:48 +00:00
Andrew Trick e339828b90 Allow MachineCSE to coalesce trivial subregister copies the same way that it coalesces normal copies.
Without this, MachineCSE is powerless to handle redundant operations with truncated source operands.

This required fixing the 2-addr pass to handle tied subregisters. It isn't clear what combinations of subregisters can legally be tied, but the simple case of truncated source operands is now safely handled:

     %vreg11<def> = COPY %vreg1:sub_32bit; GR32:%vreg11 GR64:%vreg1
     %vreg12<def> = COPY %vreg2:sub_32bit; GR32:%vreg12 GR64:%vreg2
     %vreg13<def,tied1> = ADD32rr %vreg11<tied0>, %vreg12<kill>, %EFLAGS<imp-def>

Test case: cse-add-with-overflow.ll.

This exposed an existing bug in
PPCInstrInfo::commuteInstruction. Thanks to Rafael for the test case:
PowerPC/crash.ll.

llvm-svn: 197465
2013-12-17 04:50:45 +00:00
Andrew Trick 9defbd882b whitespace
llvm-svn: 197464
2013-12-17 04:50:40 +00:00
Rafael Espindola bccb9d45ad The preferred alignment defaults to the abi alignment. Omit if it is the same.
llvm-svn: 197400
2013-12-16 18:01:51 +00:00
Rafael Espindola 8afbb28cea On DataLayout, omit the default of p:64:64:64.
llvm-svn: 197397
2013-12-16 17:15:29 +00:00
Hal Finkel 0a576d52fa Set has_asmparser in PowerPC/LLVMBuild.txt
PowerPC now has an asm parser (and has for many months now); indicate this in
PowerPC/LLVMBuild.txt.

llvm-svn: 197393
2013-12-16 15:48:09 +00:00
Iain Sandoe e0b4cb62f5 [Powerpc darwin] AsmParser Base implementation.
This is a base implementation of the powerpc-apple-darwin asm parser dialect.

* Enables infrastructure (essentially isDarwin()) and fixes up the parsing of asm directives to separate out ELF and MachO/Darwin additions.
* Enables parsing of {r,f,v}XX as register identifiers.
* Enables parsing of lo16() hi16() and ha16() as modifiers.

The changes to the test case are from David Fang (fangism).

llvm-svn: 197324
2013-12-14 13:34:02 +00:00
Rafael Espindola 1caa693a7b Assume defaults to produce smaller datalayout strings.
llvm-svn: 197249
2013-12-13 17:56:11 +00:00
Iain Sandoe 680385830f test commit.
Amend a comment.

llvm-svn: 197237
2013-12-13 15:46:48 +00:00
Gabor Greif 5fde43bf2e typo in comment
llvm-svn: 197136
2013-12-12 08:00:34 +00:00
Hal Finkel fa50630e43 Remove unused multiclass from PPCInstrInfo.td
llvm-svn: 197100
2013-12-12 00:23:29 +00:00
Hal Finkel ceb1f12d9a Improve instruction scheduling for the PPC POWER7
Aside from a few minor latency corrections, the major change here is a new
hazard recognizer which focuses on better dispatch-group formation on the
POWER7. As with the PPC970's hazard recognizer, the most important thing it
does is avoid load-after-store hazards within the same dispatch group. It uses
the POWER7's special dispatch-group-terminating nop instruction (instead of
inserting multiple regular nop instructions). This new hazard recognizer makes
use of the scheduling dependency graph itself, built using AA information, to
robustly detect the possibility of load-after-store hazards.

Significant test-suite performance changes (the error bars are 99.5% confidence
intervals based on 5 test-suite runs both with and without the change --
speedups are negative):

speedups:

MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2
	-0.55171% +/- 0.333168%

MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl
	-17.5576% +/- 14.598%

MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl
	-29.5708% +/- 7.09058%

MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt
	-34.9471% +/- 11.4391%

SingleSource/Benchmarks/BenchmarkGame/puzzle
	-25.1347% +/- 11.0104%

SingleSource/Benchmarks/Misc/flops-8
	-17.7297% +/- 9.79061%

SingleSource/Benchmarks/Shootout-C++/ary3
	-35.5018% +/- 23.9458%

SingleSource/Regression/C/uint64_to_float
	-56.3165% +/- 25.4234%

SingleSource/UnitTests/Vectorizer/gcc-loops
	-18.5309% +/- 6.8496%

regressions:

MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000
	18.351% +/- 12.156%

SingleSource/Benchmarks/Shootout-C++/methcall
	27.3086% +/- 14.4733%

llvm-svn: 197099
2013-12-12 00:19:11 +00:00
Hal Finkel 94a6f380bb Fix the PPC subsumes-predicate check
For one predicate to subsume another, they must both check the same condition
register. Failure to check this prerequisite was causing miscompiles.

Fixes PR18003.

llvm-svn: 197089
2013-12-11 23:12:25 +00:00
NAKAMURA Takumi 8bc9bfaa5a Prune redundant dependencies in LLVMBuild.txt.
llvm-svn: 196988
2013-12-11 00:30:57 +00:00
Rafael Espindola 5b3585871b Move PPC's getDataLayoutString out of line and document it better.
llvm-svn: 196987
2013-12-11 00:09:06 +00:00
David Fang 1b01849f2d on darwin<10, fall back to .weak_definition (PPC,X86)
.weak_def_can_be_hidden was not yet supported by the system assembler

llvm-svn: 196970
2013-12-10 21:37:41 +00:00
NAKAMURA Takumi 396d4d3c7e Add proper dependencies to LLVMBuild.txt in llvm/lib.
I'll prune redundant deps in LLVMBuild.txt later.

llvm-svn: 196881
2013-12-10 05:39:34 +00:00
Rafael Espindola 117b20c492 Remove the isImplicitlyPrivate argument of getNameWithPrefix.
getSymbolWithGlobalValueBase is used to create the name of a new symbol based
on the name of an existing GV. Assert that and then remove the last call
to pass true to isImplicitlyPrivate.

This gives the mangler API a 1:1 mapping from GV to names, which is what we
need to drop the mangler dependency on the target (and use an extended
datalayout instead).

llvm-svn: 196472
2013-12-05 05:53:12 +00:00
Alp Toker f907b891da Correct word hyphenations
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.

llvm-svn: 196471
2013-12-05 05:44:44 +00:00
Hal Finkel 563cc05cae Remove PPCScoreboardHazardRecognizer
PPCScoreboardHazardRecognizer was a subclass of ScoreboardHazardRecognizer
which did only one thing: filtered out nodes in EmitInstruction for which
DAG->getInstrDesc(SU) returned NULL. This used to be the case for PPC pseudo
instructions. As far as I can tell, this is no longer true, and so we can use
ScoreboardHazardRecognizer directly.

llvm-svn: 196171
2013-12-02 23:52:46 +00:00
Rafael Espindola 5113d166f5 Refactor the setting of PrivateGlobalPrefix.
No functionality change.

llvm-svn: 196170
2013-12-02 23:39:26 +00:00
Rafael Espindola f4e6b29a03 Move getSymbolWithGlobalValueBase to TargetLoweringObjectFile.
This allows it to be used in TargetLoweringObjectFileImpl.cpp.

llvm-svn: 196117
2013-12-02 16:25:47 +00:00
Rafael Espindola 957cf6f9e1 Remove dead code.
MO_JumpTableIndex and MO_ExternalSymbol don't show up on inline asm.

Keeping parts of the old asm printer just to print inline asm to a string that
we then parse back looks like a hack.

llvm-svn: 196111
2013-12-02 15:36:37 +00:00
Rafael Espindola 50712a456d Change the default of AsmWriterClassName and isMCAsmWriter.
llvm-svn: 196065
2013-12-02 04:55:42 +00:00
Rafael Espindola 32635781bb Refactor for clarity and efficiency.
The PPC GetSymbolFromOperand already prefixed stubs of MO_ExternalSymbol, so
this should be a nop.

llvm-svn: 196059
2013-12-02 03:26:43 +00:00
Hal Finkel 42daeae9bd Add a scheduling model (with itinerary) for the PPC POWER7
This adds a scheduling model for the POWER7 (P7) core, and enables the
machine-instruction scheduler when targeting the P7. Scheduling for the P7,
like earlier ooo PPC cores, requires considering both dispatch group hazards,
and functional unit resources and latencies. These are both modeled in a
combined itinerary. Dispatch group formation is still handled by the post-RA
scheduler (which still needs to be updated for the P7, but nevertheless does a
pretty good job).

One interesting aspect of this change is that I've also enabled the use of AA
during CodeGen for the P7 (just as it is for the embedded cores). The benchmark
results seem to support this decision (see below), and while this is normally
useful for in-order cores, and not for ooo cores like the P7, I think that the
dispatch slot hazards are enough like in-order resources to make the AA useful.

Test suite significant performance differences (where negative is a speedup,
and positive is a regression) vs. the current situation:

MultiSource/Benchmarks/BitBench/drop3/drop3
  with AA: N/A
  without AA: -28.7614% +/- 19.8356%
(significantly against AA)

MultiSource/Benchmarks/FreeBench/neural/neural
  with AA: -17.7406% +/- 11.2712%
  without AA: N/A
(significantly in favor of AA)

MultiSource/Benchmarks/SciMark2-C/scimark2
  with AA: -11.2079% +/- 1.80543%
  without AA: -11.3263% +/- 2.79651%

MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt
  with AA: -41.8649% +/- 17.0053%
  without AA: -34.5256% +/- 23.7072%

MultiSource/Benchmarks/mafft/pairlocalalign
  with AA: 25.3016% +/- 17.8614%
  without AA: 38.6629% +/- 14.9391%
(significantly in favor of AA)

MultiSource/Benchmarks/sim/sim
  with AA: N/A
  without AA: 13.4844% +/- 7.18195%
(significantly in favor of AA)

SingleSource/Benchmarks/BenchmarkGame/Large/fasta
  with AA: 15.0664% +/- 6.70216%
  without AA: 12.7747% +/- 8.43043%

SingleSource/Benchmarks/BenchmarkGame/puzzle
  with AA: 82.2713% +/- 26.3567%
  without AA: 75.7525% +/- 41.1842%

SingleSource/Benchmarks/Misc/flops-2
  with AA: -37.1621% +/- 20.7964%
  without AA: -35.2342% +/- 20.2999%
(significantly in favor of AA)

These are 99.5% confidence intervals from 5 runs per configuration. Regarding
the choice to turn on AA during CodeGen, of these results, four seem
significantly in favor of using AA, and one seems significantly against. I'm
not making this decision based on these numbers alone, but these results
seem consistent with results I have from other tests, and so I think that, on
balance, using AA is a win.

llvm-svn: 195981
2013-11-30 20:55:12 +00:00
Hal Finkel 46402a4211 Split some PPC itinerary classes
In preparation for adding scheduling definitions for the POWER7, split some PPC
itinerary classes so that the P7's latencies and hazards can be better
described. For the most part, this means differentiating indexed from non-indexed
pre-increment loads and stores. Also, differentiate single from
double-precision sqrt.

No functionality change intended (except for a more-specific latency for
single-precision sqrt on the A2).

llvm-svn: 195980
2013-11-30 20:41:13 +00:00
Hal Finkel 1df3205e8c Adjust PPC A2 input operand latencies
On the PPC A2, instructions are only issued after their input operands are
ready. Model this by specifying that input operands are read at dispatch (0
cycles after issue). This changes all input operand latencies from 1 to 0.

Significant test-suite performance changes (these are 99.5% confidence
intervals on 6 runs for both before and after):

speedups:
MultiSource/Benchmarks/sim/sim
	-1.21915% +/- 0.175063%
MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt
	-1.23946% +/- 1.05133%
SingleSource/Benchmarks/Misc/flops-2
	-1.24237% +/- 0.681362%
MultiSource/Applications/JM/lencod/lencod
	-1.33992% +/- 0.757498%
MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt
	-1.51802% +/- 1.21468%
MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt
	-2.18818% +/- 1.28605%
MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt
	-2.21977% +/- 1.19499%
SingleSource/Benchmarks/BenchmarkGame/spectral-norm
	-2.29822% +/- 0.671871%
MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl
	-2.40975% +/- 0.355931%
SingleSource/Benchmarks/Misc/fp-convert
	-2.41899% +/- 1.04751%
MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl
	-2.50349% +/- 0.126765%
SingleSource/Benchmarks/Misc/flops-3
	-3.00214% +/- 0.700795%
MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt
	-3.56995% +/- 3.2929%
MultiSource/Applications/sgefa/sgefa
	-4.24908% +/- 2.00413%
MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk
	-18.1294% +/- 3.96489%

regressions:
MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl
	1.03249% +/- 0.178547%
MultiSource/Applications/hexxagon/hexxagon
	1.16597% +/- 0.285235%
MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt
	1.39576% +/- 1.07855%
SingleSource/Benchmarks/Misc-C++/stepanov_v1p2
	1.71539% +/- 0.173182%
MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1
	1.90013% +/- 0.866472%
MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl
	2.39854% +/- 1.05914%
MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl
	2.4402% +/- 0.817904%
MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl
	5.87997% +/- 3.3172%
MultiSource/Benchmarks/Trimaran/netbench-crc/netbench-crc
	9.02643% +/- 5.79591%
MultiSource/Benchmarks/VersaBench/bmm/bmm
	10.3517% +/- 1.227%

Obviously, there are data points on both sides of this; but I think, overall,
this supports making the change.

llvm-svn: 195951
2013-11-29 07:04:59 +00:00
Hal Finkel 5a7162f36b Create a PPC440 SchedMachineModel
Some of the older PPC processor definitions don't have associated
SchedMachineModels; correct this for the PPC440.

llvm-svn: 195949
2013-11-29 06:32:17 +00:00
Hal Finkel 4035e8d86a Fixup PPC440 load/store operand latencies
The operand latencies for loads and stores in the PPC440 itinerary were wrong
(the store operands are all inputs, and the "with update" (pre-increment)
instructions need a latency for the additional output).

llvm-svn: 195948
2013-11-29 06:19:43 +00:00
Hal Finkel a10bd1d23a Adjust PPC440 operand latencies
The operand latencies for the PPC440 should be specified relative to dispatch,
not relative to the initial fetch-and-decode stages. Because most instructions
(ignoring bypass) wait in dispatch until their operands are ready, this is
modeled as reading input operands "at dispatch" (0 cycles after issue), and so
every input and output operand has 4 cycles subtracted from it.

This could alter scheduling slightly, but I don't expect a large effect.

llvm-svn: 195947
2013-11-29 05:59:00 +00:00
Hal Finkel dd06369913 Don't model the fetch and decode units for the PPC440
Modeling the fetch and decode units in the PPC440 itinerary does not add
anything to the hazard detection capability (and so modeling them just wastes
compile time).

No functionality change intended.

llvm-svn: 195946
2013-11-29 05:58:38 +00:00
NAKAMURA Takumi 226e10edff [CMake] Let add_public_tablegen_target() provide intrinsics_gen, too.
I think, in principle, intrinsics_gen may be added explicitly.
That said, it can be added incidentally, since each target already has dependencies on llvm-tblgen.
Almost all source files depend on both CommonTableGen and intrinsics_gen.

Explicit add_dependencies() have been pruned under lib/Target.

llvm-svn: 195929
2013-11-28 17:04:31 +00:00
NAKAMURA Takumi ce746c6c49 [CMake] Let add_public_tablegen_target be responsible for providing the dependency on CommonTableGen.
add_public_tablegen_target adds *CommonTableGen to LLVM_COMMON_DEPENDS.
LLVM_COMMON_DEPENDS affects add_llvm_library (and other add_target stuff) within its scope.

llvm-svn: 195927
2013-11-28 17:04:04 +00:00
NAKAMURA Takumi 413518f1f8 [CMake] Prune include_directories() in llvm/lib/Target. add_llvm_target() sets them.
llvm-svn: 195921
2013-11-28 14:53:30 +00:00
Rafael Espindola 3e3a3f1f85 Use the mangler consistently instead of using getGlobalPrefix directly.
llvm-svn: 195911
2013-11-28 08:59:52 +00:00
Hal Finkel 92720ab1b2 Don't share functional units among the PPC itineraries
Instead of sharing functional unit names between the various PPC itineraries,
give each core its own unit names prefixed with the core name.  This follows
the convention used by other backends (such as ARM), and removes a non-obvious
ordering dependency between the various PPCSchedule*.td files.

No functionality change intended.

llvm-svn: 195908
2013-11-28 06:05:59 +00:00
Hal Finkel 3e5a360ba3 Add IIC_ prefix to PPC instruction-class names
This adds the IIC_ prefix to the instruction itinerary class names, giving the
PPC backend a naming convention for itinerary classes that is more consistent
with that used by the X86 and ARM backends.

Instruction scheduling in the PPC backend needs a bunch of cleanup and
improvement (especially for the ooo cores). This is just a preliminary step.

No functionality change intended.

llvm-svn: 195890
2013-11-27 23:26:09 +00:00
Rafael Espindola c90584b6f6 Don't set GlobalPrefix to the default value.
llvm-svn: 195884
2013-11-27 21:57:54 +00:00
Hal Finkel 8081ae9134 Fix comment in PPCA2Model
llvm-svn: 195807
2013-11-27 03:12:56 +00:00
Hal Finkel 884bde3031 PPC popcnt[dw] do not have record forms
The instruction definitions incorrectly specified that popcntd and popcntw have
record forms; they do not. This mistake was causing invalid code generation.

llvm-svn: 195272
2013-11-20 20:54:55 +00:00
Hal Finkel 22498fa6e3 PPC: Optimize rldicl generation for masked shifts
Masking operations (where only some number of the low bits are being kept) are
selected to rldicl(x, 0, mb). If x is a logical right shift (which would become
rldicl(y, 64-n, n)), we might be able to fold the two instructions together:

  rldicl(rldicl(x, 64-n, n), 0, mb) -> rldicl(x, 64-n, mb) for n <= mb

The right shift is really a left rotate followed by a mask, and if the explicit
mask is a more-restrictive sub-mask of the mask implied by the shift, only one
rldicl is needed.
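
As an illustrative aside (a minimal sketch, not backend code: the rldicl helper below is a simplified model of the rotate-left-doubleword-and-clear-left semantics, with MB counting cleared high-order bits), the fold can be checked over the full n <= mb range:

  #include <cassert>
  #include <cstdint>

  // Rotate left by SH (SH in [0, 63]).
  static uint64_t rotl64(uint64_t X, unsigned SH) {
    SH &= 63;
    return SH ? (X << SH) | (X >> (64 - SH)) : X;
  }

  // Model of rldicl(X, SH, MB): rotate left by SH, then keep only the
  // low 64-MB bits (the MB high-order bits are cleared).
  static uint64_t rldicl(uint64_t X, unsigned SH, unsigned MB) {
    uint64_t Mask = MB ? (~0ULL >> MB) : ~0ULL;
    return rotl64(X, SH) & Mask;
  }

  int main() {
    uint64_t X = 0x123456789abcdef0ULL;
    for (unsigned N = 1; N < 64; ++N)        // right-shift amount
      for (unsigned MB = N; MB < 64; ++MB)   // explicit mask, with N <= MB
        assert(rldicl(rldicl(X, 64 - N, N), 0, MB) == rldicl(X, 64 - N, MB));
    return 0;
  }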

llvm-svn: 195185
2013-11-20 01:10:15 +00:00
Juergen Ributzka d12ccbd343 [weak vtables] Remove a bunch of weak vtables
This patch removes most of the trivial cases of weak vtables by pinning them to
a single object file. The memory leaks in this version have been fixed. Thanks
Alexey for pointing them out.

Differential Revision: http://llvm-reviews.chandlerc.com/D2068

Reviewed by Andy

llvm-svn: 195064
2013-11-19 00:57:56 +00:00
Alexey Samsonov 49109a279c Revert r194865 and r194874.
This change is incorrect. If you delete the virtual destructor of both a base class
and a subclass, then the following code:
  Base *foo = new Child();
  delete foo;
will not run the destructors of the Child class's members. As a result, I observe
plenty of memory leaks. Notable examples I investigated are:
ObjectBuffer and ObjectBufferStream, AttributeImpl and StringSAttributeImpl.

llvm-svn: 194997
2013-11-18 09:31:53 +00:00
Juergen Ributzka dbedae89b9 [weak vtables] Remove a bunch of weak vtables
This patch removes most of the trivial cases of weak vtables by pinning them to
a single object file.

Differential Revision: http://llvm-reviews.chandlerc.com/D2068

Reviewed by Andy

llvm-svn: 194865
2013-11-15 22:34:48 +00:00
Bob Wilson 9f3e6b25ee Avoid illegal integer promotion in fastisel
Stop folding constant adds into GEP when the type size doesn't match.
Otherwise, the adds' operands are effectively being promoted, changing the
conditions of an overflow.  Results are different when:

    sext(a) + sext(b) != sext(a + b)

Problem originally found on x86-64, but also fixed issues with ARM and PPC,
which used similar code.
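
A concrete 32-to-64-bit example of the inequality (a hedged illustration, not code from the patch; the detour through uint32_t in the folded case just avoids signed-overflow undefined behaviour in the example itself):

  #include <cstdint>
  #include <cstdio>

  int main() {
    int32_t A = INT32_MAX, B = 1;
    // Extend first, then add: what keeping the add un-folded computes.
    int64_t Separate = (int64_t)A + (int64_t)B;                      // 2147483648
    // Add in 32 bits first (wraps), then extend: what the bad fold computes.
    int64_t Folded = (int64_t)(int32_t)((uint32_t)A + (uint32_t)B);  // -2147483648
    std::printf("%lld vs %lld\n", (long long)Separate, (long long)Folded);
    return 0;
  }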

<rdar://problem/15292280>

Patch by Duncan Exon Smith!

llvm-svn: 194840
2013-11-15 19:09:27 +00:00
Hal Finkel c6a243987d Add PPC option for full register names in asm
On non-Darwin PPC systems, we currently strip off the register name prefix
prior to instruction printing. So instead of something like this:

  mr r3, r4

we print this:

  mr 3, 4

The first form is the default on Darwin, and is understood by binutils, but not
yet understood by our integrated assembler. Once our integrated-as understands
full register names as well, this temporary option will be replaced by tying
this functionality to the verbose-asm option. The numeric-only form is
compatible with legacy assemblers and tools, and is also gcc's default on most
PPC systems. On the other hand, it is harder to read, and there are some
analysis tools that expect full register names.

llvm-svn: 194384
2013-11-11 14:58:40 +00:00
Rui Ueyama 29d2910875 Use StringRef::startswith_lower. No functionality change.
llvm-svn: 193796
2013-10-31 19:59:55 +00:00
Rafael Espindola 79858aa3df Add a helper getSymbol to AsmPrinter.
llvm-svn: 193627
2013-10-29 17:07:16 +00:00
Eric Christopher 081efcc3ac Add support for the VSX target attribute. No functional change
as we don't actually use it to emit any code yet.

llvm-svn: 192837
2013-10-16 20:38:58 +00:00
Rafael Espindola 43c4e24fad Add a MCAsmInfoELF class and factor some code into it.
We had a MCAsmInfoCOFF, but no common class for all the ELF MCAsmInfos before.

llvm-svn: 192760
2013-10-16 01:34:32 +00:00
Rafael Espindola a17151ad5a Add a MCTargetStreamer interface.
This patch fixes an old FIXME by creating a MCTargetStreamer interface
and moving the target specific functions for ARM, Mips and PPC to it.

The ARM streamer is still declared in a common place because it is
used from lib/CodeGen/ARMException.cpp, but the Mips and PPC are
completely hidden in the corresponding Target directories.

I will send an email to llvmdev with instructions on how to use this.

llvm-svn: 192181
2013-10-08 13:08:17 +00:00
Rafael Espindola e90fd9c5e0 Remove getEHExceptionRegister and getEHHandlerRegister.
They haven't been used for a long time. Patch by MathOnNapkins.

llvm-svn: 192099
2013-10-07 13:39:22 +00:00
Bill Schmidt cea1596205 [PowerPC] Fix PR17354: Generate nop after local calls for PIC code.
When generating code for shared libraries, even local calls may be
intercepted, so we need a nop after the call for the linker to fix up the
TOC.  Test case adapted from the one provided in PR17354.

llvm-svn: 191440
2013-09-26 17:09:28 +00:00
David Majnemer 7137420d94 PPC: Allow partial fills in writeNopData()
When asked to pad an irregular number of bytes, we should fill with
zeros.  This is consistent with the behavior specified in the AIX
Assembler Language Reference as well as other LLVM and binutils
assemblers.

N.B. There is a small deviation from binutils' PPC assembler:
when handling pads which are greater than 4 bytes but not a multiple of 4,
binutils will not emit any NOP sequences at all and only use zeros.
This may or may not be a bug but there is no excellent rationale as to
why that behavior is important to emulate.  If that behavior is needed,
we can change writeNopData() to behave in the same way.
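
A hedged sketch of the padding policy (simplified signature, not the actual MCAsmBackend interface; the nop encoding shown is the big-endian "ori 0, 0, 0" form):

  #include <cstdint>
  #include <vector>

  static void WriteNopData(uint64_t Count, std::vector<uint8_t> &Out) {
    const uint8_t Nop[4] = {0x60, 0x00, 0x00, 0x00};   // ori 0, 0, 0
    for (uint64_t I = 0; I < Count / 4; ++I)
      Out.insert(Out.end(), Nop, Nop + 4);             // whole words become nops
    for (uint64_t I = 0; I < Count % 4; ++I)
      Out.push_back(0x00);                             // partial fill uses zeros
  }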

This fixes PR17352.

llvm-svn: 191426
2013-09-26 09:18:48 +00:00
David Majnemer 08249a31b2 PPC: Do not introduce ISD nodes for fctid and fctiw
llvm-svn: 191421
2013-09-26 05:22:11 +00:00
David Majnemer 6ad26d3364 PPC: Add support for fctid and fctiw
Encodings were checked against the Power ISA documents and double
checked against binutils.

This fixes PR17350.

llvm-svn: 191419
2013-09-26 04:11:24 +00:00
David Majnemer 0c58bc64a4 MC: Add support for treating $ as a reference to the PC
The binutils assembler supports a mode called DOLLAR_DOT which treats
the dollar sign token as a reference to the current program counter if
the dollar sign doesn't precede a constant or identifier.

This commit adds a new MCAsmInfo flag stating whether or not a given
target supports this interpretation of the dollar sign token; by
default, this flag is not enabled.

Further, enable this flag for PPC. The system assembler for AIX and
binutils both support using the dollar sign in this manner.

This fixes PR17353.

llvm-svn: 191368
2013-09-25 10:47:21 +00:00
David Majnemer 1ccd2f2aee MC: Remove vestigial PCSymbol field from AsmInfo
llvm-svn: 191362
2013-09-25 09:36:11 +00:00
Tim Northover 31d093c705 ISelDAG: spot chain cycles involving MachineNodes
Previously, the DAGISel function WalkChainUsers was spotting that it
had entered already-selected territory by whether a node was a
MachineNode (amongst other things). Since it's fairly common practice
to insert MachineNodes during ISelLowering, this was not the correct
check.

Looking around, it seems that other nodes get their NodeId set to -1
upon selection, so this makes sure the same thing happens to all
MachineNodes and uses that characteristic to determine whether we
should stop looking for a loop during selection.

This should fix PR15840.

llvm-svn: 191165
2013-09-22 08:21:56 +00:00
Hal Finkel 25415c2b83 Correct the pre-increment load latencies in the PPC A2 itinerary
Pre-increment loads are microcoded on the A2, and the address increment occurs
only after the load completes. As a result, the latency of the GPR address
update is an additional 2 cycles on top of the load latency.

llvm-svn: 191156
2013-09-22 00:08:14 +00:00
Bill Schmidt bdae03f227 [PowerPC] Add a FIXME.
Documenting a design choice to generate only medium model sequences for TLS
addresses at this time.  Small and large code models could be supported if
necessary.

llvm-svn: 190883
2013-09-17 20:22:05 +00:00
Bill Schmidt bb381d7063 [PowerPC] Fix problems with large code model (PR17169).
Large code model on PPC64 requires creating and referencing TOC entries when
using the addis/ld form of addressing.  This was not being done in all cases.
The changes in this patch to PPCAsmPrinter::EmitInstruction() fix this.  Two
test cases are also modified to reflect this requirement.

Fast-isel was not creating correct code for loading floating-point constants
using large code model.  This also requires the addis/ld form of addressing.
Previously we were using the addis/lfd shortcut which is only applicable to
medium code model.  One test case is modified to reflect this requirement.

llvm-svn: 190882
2013-09-17 20:03:25 +00:00
Bill Schmidt c763c22469 [PowerPC] Fix PR17155 - Ignore COPY_TO_REGCLASS during emit.
Fast-isel generates a COPY_TO_REGCLASS for widening f32 to f64, which
is a nop on PPC64.  This is needed to keep the register class system
happy, but on the fast-isel path it is not removed before emit as it
is for DAG select.  Ignore this op when emitting instructions.

llvm-svn: 190795
2013-09-16 17:25:12 +00:00
Hal Finkel 40c34781b5 PPC: Don't restrict lvsl generation to after type legalization
This is a re-commit of r190764, with an extra check to make sure that we're not
performing the transformation on illegal types (a small test case has been
added for this as well).

Original commit message:

The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.

llvm-svn: 190771
2013-09-15 22:09:58 +00:00
Hal Finkel 31025a6325 Revert r190764: PPC: Don't restrict lvsl generation to after type legalization
This is causing test-suite failures.

Original commit message:

The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.

llvm-svn: 190765
2013-09-15 15:41:11 +00:00
Hal Finkel 2945d4e916 PPC: Don't restrict lvsl generation to after type legalization
The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.

llvm-svn: 190764
2013-09-15 15:20:54 +00:00
Hal Finkel c3cfbf8677 Add missing break statement in PPCISelLowering
As it turns out, not a problem in practice, but it should be there.

llvm-svn: 190720
2013-09-13 20:09:02 +00:00
Chandler Carruth 51428e363f Remove an unused variable, fixing -Werror build with latest Clang.
llvm-svn: 190640
2013-09-12 23:30:48 +00:00
Hal Finkel 262a224712 Fix PPC ABI for ByVal structs with vector members
When a structure is passed by value, and that structure contains a vector
member, according to the PPC ABI, the structure will receive enhanced alignment
(so that the vector within the structure will always be aligned).
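
For illustration (a hedged example using the GCC/Clang vector_size extension; the 16-byte figure assumes an Altivec vector member):

  typedef float v4sf __attribute__((vector_size(16)));

  struct HasVec {
    int Tag;
    v4sf V;   // vector member: the whole by-value argument gets 16-byte alignment
  };

  float FirstLane(struct HasVec S) {
    return S.V[0];
  }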

This should resolve PR16641.

llvm-svn: 190636
2013-09-12 23:20:06 +00:00
Hal Finkel 1e2e3ea584 Make the PPC fast-math sqrt expansion safe at 0
In fast-math mode sqrt(x) is calculated using the fast expansion of the
reciprocal of the reciprocal sqrt expansion. The reciprocal and reciprocal
sqrt expansions use the associated estimate instructions along with some Newton
iterations. Unfortunately, as a result, sqrt(0) was being calculated as NaN,
which is not correct. Now we explicitly return a result of zero if the input is
zero.
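
A scalar sketch of the failure mode and the guard (hedged: std::sqrt stands in for the hardware reciprocal-sqrt estimate, and only one Newton step is shown):

  #include <cmath>

  static double FastSqrt(double X) {
    double Est = 1.0 / std::sqrt(X);            // rsqrt estimate (inf when X == 0)
    Est = Est * (1.5 - 0.5 * X * Est * Est);    // one Newton-Raphson refinement (NaN when X == 0)
    double Result = X * Est;                    // sqrt(x) = x * rsqrt(x); NaN when X == 0
    return X == 0.0 ? 0.0 : Result;             // explicit zero result, as this patch adds
  }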

llvm-svn: 190624
2013-09-12 19:04:12 +00:00
Roman Divacky 62cb63543b Implement asm support for a few PowerPC bookIII that are needed for assembling
FreeBSD kernel.

llvm-svn: 190618
2013-09-12 17:50:54 +00:00
Hal Finkel 0096dbd50d Mark PPC MFTB and DST (and friends) as deprecated
Use the new instruction deprecation feature to mark mftb (now replaced with
mfspr) and dst (along with the other Altivec cache control instructions) as
deprecated when targeting cores supporting at least ISA v2.03.

llvm-svn: 190605
2013-09-12 14:40:06 +00:00
Joey Gouly 0e76fa7df5 Add an instruction deprecation feature to TableGen.
The 'Deprecated' class allows you to specify a SubtargetFeature that the
instruction is deprecated on.

The 'ComplexDeprecationPredicate' class allows you to define a custom
predicate that is called to check for deprecation.
For example:
  ComplexDeprecationPredicate<"MCR">

would mean you would have to define the following function:
  bool getMCRDeprecationInfo(MCInst &MI, MCSubtargetInfo &STI,
                             std::string &Info)

It returns 'false' for not deprecated, and 'true' for deprecated,
and stores the warning message in 'Info'.

The MCTargetAsmParser constructor was changed to take an extra argument of
the MCInstrInfo class, so out-of-tree targets will need to be changed.

llvm-svn: 190598
2013-09-12 10:28:05 +00:00
Hal Finkel 7fe6a5390f PPC: Enable aggressive anti-dependency breaking
Aggressive anti-dependency breaking is enabled by default for all PPC cores.
This provides a general speedup on the P7 and other platforms (among other
factors, the instruction group formation for the non-embedded PPC cores is done
during post-RA scheduling). In order to do this safely, the incompatibility
between uses of the MFOCRF instruction and anti-dependency breaking are
resolved by marking MFOCRF with hasExtraSrcRegAllocReq. As noted in the removed
FIXME, the problem was that MFOCRF's output is sensitive to the identity of the
source register, and is always paired with a shift to undo this effect. Because
anti-dependency breaking is unaware of this hidden dependency of the shift
amount on the source register of the MFOCRF instruction, changing that register
must be inhibited.

Two test cases were adjusted: The SjLj test was made more insensitive to
register choices and scheduling; the saveCR test disabled anti-dependency
breaking because part of what it is testing is proper register reuse.

llvm-svn: 190587
2013-09-12 05:24:49 +00:00
Hal Finkel f574c27769 Greatly simplify the PPC A2 scheduling itinerary
As Andy pointed out to me a long time ago, there are no structural hazards in
the later pipeline stages of the A2, and so modeling them is useless. Also,
modeling the top pre-dispatch stages is deceiving because, when multiple
hardware threads are active, those resources are shared among the threads. The
bypass definitions were mostly wrong, and so those have been removed. The
resulting itinerary is much simpler, and more accurate.

llvm-svn: 190562
2013-09-11 23:25:21 +00:00
Hal Finkel 21442b24fb Enable MI scheduling (and CodeGen AA) by default for embedded PPC cores
For embedded PPC cores (especially the A2 core), using the MI scheduler with AA
is far superior to the other scheduling options.

llvm-svn: 190558
2013-09-11 23:05:25 +00:00
Hal Finkel 71780ec4fd Implement TTI getUnrollingPreferences for PowerPC
The PowerPC A2 core greatly benefits from aggressive concatenation unrolling;
use the new getUnrollingPreferences to enable this by default when targeting
the PPC A2 core.

llvm-svn: 190549
2013-09-11 21:20:40 +00:00
Bill Wendling 58e2d3d856 Generate compact unwind encoding from CFI directives.
We used to generate the compact unwind encoding from the machine
instructions. However, this had the problem that if the user used `-save-temps'
or compiled their hand-written `.s' file (with CFI directives), we wouldn't
generate the compact unwind encoding.

Move the algorithm that generates the compact unwind encoding into the
MCAsmBackend. This way we can generate the encoding whether the code is from a
`.ll' or `.s' file.

<rdar://problem/13623355>

llvm-svn: 190290
2013-09-09 02:37:14 +00:00
Charles Davis 8bdfafd505 Move everything depending on Object/MachOFormat.h over to Support/MachO.h.
llvm-svn: 189728
2013-09-01 04:28:48 +00:00
Bill Schmidt eb8d6f7da0 [PowerPC] Fast-isel cleanup patch.
Here are a few miscellaneous things to tidy up the PPC64 fast-isel
implementation.  I corrected a couple of commentary lapses, and added
documentation of future opportunities.  I also implemented
TargetMaterializeAlloca, which I somehow forgot when I split up the
original huge patch.

Finally, I decided to delete SelectCmp.  I hadn't previously hooked it
in to TargetSelectInstruction(), and when I did I realized it wasn't
serving any useful purpose.  This is only useful for compares that
don't feed a branch in the same block, and to handle that we would
have to have logic to interpret i1 as a condition register.  This
could probably be done, but would require Unseemly Hackery, and
honestly does not seem worth the hassle.

This ends the current patch series.

llvm-svn: 189715
2013-08-31 02:33:40 +00:00
Bill Schmidt 9d9510d806 [PowerPC] Add integer truncation support to fast-isel.
This is the last substantive patch I'm planning for fast-isel in the
near future, adding fast selection of integer truncates.  There are
certainly more things that can be improved (many of which are called
out in FIXMEs), but for now we are catching most of the important
cases.

I'll document some of the remaining work in a cleanup patch shortly.

llvm-svn: 189706
2013-08-30 23:31:33 +00:00
Bill Schmidt 0954ea1b5e Correct partially defined variable
llvm-svn: 189705
2013-08-30 23:25:30 +00:00
Bill Schmidt 8470b0f96c [PowerPC] Call support for fast-isel.
This patch adds fast-isel support for calls (but not intrinsic calls
or varargs calls).  It also removes a badly-formed assert.  There are
some new tests just for calls, and also for folding loads into
arguments on calls to avoid extra extends.

llvm-svn: 189701
2013-08-30 22:18:55 +00:00
Bill Schmidt 8d86fe7d6f [PowerPC] Add handling for conversions to fast-isel.
Yet another chunk of fast-isel code.  This one handles various
conversions involving floating-point.  (It also includes some
miscellaneous handling throughout the back end for LWA_32 and LWAX_32
that should have been part of the load-store patch.)

llvm-svn: 189677
2013-08-30 15:18:11 +00:00
Bill Schmidt 057b04f662 [PowerPC] Handle selection of compare instructions in fast-isel.
Mostly trivial patch adding support for compares.  The meat of the
work was added with the branch support.

llvm-svn: 189639
2013-08-30 03:16:48 +00:00
Bill Schmidt 72e3d55a76 Remove bogus debug statement. Sheesh.
llvm-svn: 189638
2013-08-30 03:07:11 +00:00
Bill Schmidt ccecf26157 [PowerPC] Add loads, stores, and related things to fast-isel.
This is the next big chunk of fast-isel code.  The primary purpose is
to implement selection of loads and stores, but there is a lot of
drag-along to support this.  The common code to analyze addresses for
both loads and stores is substantial.  It's also necessary to add the
materialization code for global values.

Related to load-store processing is the code to fold loads into
integer extends, since otherwise we generate lots of redundant
instructions.  We also need to add some overrides to some FastEmit
routines to ensure we don't assign GPR 0 to a virtual register when
this would change the meaning of an instruction.

I added handling selection of a few binary arithmetic instructions, to
enable committing some test cases I wrote a while back.

Finally, a couple of miscellaneous changes:
 * I cleaned up some poor style from a previous patch in
   PPCISelLowering.cpp, pointed out by David Blaikie.
 * I enlarged the Addr.Offset field to avoid sign problems with 32-bit
   offsets. 

llvm-svn: 189636
2013-08-30 02:29:45 +00:00
Alexey Samsonov a3a037df63 Fix use of uninitialized value added in r189400 (found by MemorySanitizer)
llvm-svn: 189456
2013-08-28 08:30:47 +00:00
Joerg Sonnenberger b822af4721 Given target assembler parsers a chance to handle variant expressions
first. Use this to turn the PPC modifiers into PPC specific expressions,
allowing them to work on constants.

llvm-svn: 189400
2013-08-27 20:23:19 +00:00
Charles Davis 1827bd8a6c Revert "Fix the build broken by r189315." and "Move everything depending on Object/MachOFormat.h over to Support/MachO.h."
This reverts commits r189319 and r189315. r189315 broke some tests on what I
believe are big-endian platforms.

llvm-svn: 189321
2013-08-27 05:38:30 +00:00
Charles Davis 0c6f71b40d Move everything depending on Object/MachOFormat.h over to Support/MachO.h.
llvm-svn: 189315
2013-08-27 05:00:43 +00:00
Bill Schmidt 8c3976eca1 Dummy code to silence warning from r189266
llvm-svn: 189272
2013-08-26 20:11:46 +00:00
Bill Schmidt d89f678cfd [PowerPC] More fast-isel chunks (returns and integer extends)
Incremental improvement to fast-isel for PPC64.  This allows us to
select on ret, sext, and zext.  Filling in sext/zext improves some of
the existing logic in handling compare-immediates that needed extends.

A simplified return convention for fast-isel is also added to the
PPC64 calling conventions.  All call/return processing for DAG
selection is handled with custom code, so there isn't an existing CC
to rely on here.  The include of PPCGenCallingConv.inc causes compiler
warnings due to the 32-bit calling conventions that are not used, so
the dummy function "usePPC32CCs()" is added here to silence those.

Test cases for the return and extend logic are added.

llvm-svn: 189266
2013-08-26 19:42:51 +00:00
Bill Schmidt 0300813d6a [PowerPC] Add fast-isel branch and compare selection.
First chunk of actual fast-isel selection code.  This handles direct
and indirect branches, as well as feeding compares for direct
branches.  PPCFastISel::PPCEmitIntExt() is just roughed in and will be
expanded in a future patch.  This also corrects a problem with
selection for constant pool entries in JIT mode or with small code
model.

llvm-svn: 189202
2013-08-25 22:33:42 +00:00
Bill Schmidt f381afc906 [PowerPC] More refactoring prior to real PPC emitPrologue/Epilogue changes.
(Patch committed on behalf of Mark Minich, whose log entry follows.)

This is a continuation of the refactorings performed in svn rev 188573
(see that rev's comments for more detail).

This is my stage 2 refactoring: I combined the emitPrologue() &
emitEpilogue() PPC32 & PPC64 code into a single flow, simplifying a
lot of the code since in essence the PPC32 & PPC64 code generation
logic is the same, only the instruction forms are different (in most
cases). This simplification is necessary because my functional changes
(yet to come) add significant complexity, and without the
simplification of my stage 2 refactoring, the overall complexity of
both emitPrologue() & emitEpilogue() would have become almost
intractable for most mortal programmers (like me).

This submission was intended to be a pure refactoring (no functional
changes whatsoever). However, in the process of combining the PPC32 &
PPC64 flows, I spotted a difference that I believe is a bug (see svn
rev 186478 line 863, or svn rev 188573 line 888): This line appears to
be restoring the BP with the original FP content, not the original BP
content. When I merged the 32-bit and 64-bit code, I used the
corresponding code from the 64-bit flow, which I believe uses the
correct offset (BPOffset) for this operation.

llvm-svn: 188741
2013-08-20 03:12:23 +00:00
Hal Finkel 0c5c01aa4a Add a llvm.copysign intrinsic
This adds a llvm.copysign intrinsic; We already have Libfunc recognition for
copysign (which is turned into the FCOPYSIGN SDAG node). In order to
autovectorize calls to copysign in the loop vectorizer, we need a corresponding
intrinsic as well.

In addition to the expected changes to the language reference, the loop
vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into
an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a
few lists in LegalizeVector{Ops,Types} so that vector copysigns can be
expanded.

In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN
be Expand for vector types. This seems correct for all in-tree targets, and I
think is the right thing to do because, previously, there was no way to generate
vector-valued FCOPYSIGN nodes (and most targets don't specify an action for
vector-typed FCOPYSIGN).

llvm-svn: 188728
2013-08-19 23:35:46 +00:00
Hal Finkel 1cf48ab811 Don't form PPC CTR-based loops around a copysignl call
copysign/copysignf never become function calls (because the SDAG expansion code
does not lower to the corresponding function call, but rather directly
implements the associated logic), but copysignl almost always is lowered into a
call to the requested libm functon (and, thus, might clobber CTR).

llvm-svn: 188727
2013-08-19 23:35:24 +00:00
Hal Finkel dbc78e1f73 Add the PPC fcpsgn instruction
Modern PPC cores support a floating-point copysign instruction, and we can use
this to lower the FCOPYSIGN node (which is created from calls to the libm
copysign function). A couple of extra patterns are necessary because the
operand types of FCOPYSIGN need not agree.
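
For illustration (the expected lowering in the comment is an assumption about cores that provide the instruction):

  #include <cmath>

  double CopySign(double Mag, double Sgn) {
    return std::copysign(Mag, Sgn);   // becomes an FCOPYSIGN node; can now lower to fcpsgn
  }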

llvm-svn: 188653
2013-08-19 05:01:02 +00:00
Bill Schmidt 8893a3d159 [PowerPC] Preparatory refactoring for making prologue and epilogue
safe on PPC32 SVR4 ABI

[Patch and following text by Mark Minich; committing on his behalf.]

There are FIXME's in PowerPC/PPCFrameLowering.cpp, method
PPCFrameLowering::emitPrologue() related to "negative offsets of R1"
on PPC32 SVR4. They're true, but the real issue is that on PPC32 SVR4
(and any ABI without a Red Zone), no spills may be made until after
the stackframe is claimed, which also includes the LR spill which is
at a positive offset. The same problem exists in emitEpilogue(),
though there's no FIXME for it. I intend to fix this issue, making
LLVM-compiled code finally safe for use on SVR4/EABI/e500 32-bit
platforms (including in particular, OS-free embedded systems & kernel
code, where interrupts may share the same stack as user code).

In preparation for making these changes, to make the diffs for the
functional changes less cluttered, I am providing the non-functional
refactorings in two stages:

Stage 1 does some minor fluffy refactorings to pull multiple method
calls up into a single bool, creating named bools for repeated uses of
obscure logic, moving some code up earlier because either stage 2 or
my final version will require it earlier, and rewording/adding some
comments. My stage 1 changes can be characterized as primarily fluffy
cleanup, the purpose of which may be unclear until the stage 2 or
final changes are made.

My stage 2 refactorings combine the separate PPC32 & PPC64 logic,
which is currently performed by largely duplicate code, into a single
flow, with the differences handled by a group of constants initialized
early in the methods.

This submission is for my stage 1 changes. There should be no
functional changes whatsoever; this is a pure refactoring.

llvm-svn: 188573
2013-08-16 20:05:04 +00:00
Craig Topper 5671010cbb Replace getValueType().getSimpleVT() with getSimpleValueType(). Also remove one weird cast from MVT->EVT just to call getSimpleVT().
llvm-svn: 188441
2013-08-15 02:33:50 +00:00
Hal Finkel b3ca00d2a3 Actually fix PPC64 64-bit GPR inline asm constraint matching
This is a follow-up to r187693, correcting that code to request the correct
register class. The previous version, with the wrong register class, was not
really correcting the constraints, but rather was removing them. Coincidentally,
this fixed the failing test case in r187693, but obviously created other
problems.

llvm-svn: 188407
2013-08-14 20:05:04 +00:00
David Fang 2f1b0b55b8 cast fix to appease buildbot
llvm-svn: 188014
2013-08-08 21:29:30 +00:00
David Fang b88cdf62f5 initial draft of PPCMachObjectWriter.cpp
This records relocation entries in the Mach-O object file
for PIC code generation.
Tested on powerpc-darwin8, validated against Darwin otool -rvV.

llvm-svn: 188004
2013-08-08 20:14:40 +00:00
Hal Finkel 2b7b2f373b PPC: Map frin to round() not nearbyint() and rint()
Making use of the recently-added ISD::FROUND, which allows for custom lowering
of round(), the PPC backend will now map frin to round(). Previously, we had
been using frin to lower nearbyint() (and rint() via some custom lowering to
handle the extra fenv flags requirements), but only in fast-math mode because
frin does not tie-to-even. Several users had complained about this behavior,
and this new mapping of frin to round is certainly more appropriate (and does
not require fast-math mode).
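
A small example of the difference at a halfway case (assuming the default round-to-nearest-even rounding mode for nearbyint):

  #include <cmath>
  #include <cstdio>

  int main() {
    // round() ties away from zero, matching frin; nearbyint() ties to even.
    std::printf("%.1f %.1f\n", std::round(2.5), std::nearbyint(2.5));    // 3.0 2.0
    std::printf("%.1f %.1f\n", std::round(-2.5), std::nearbyint(-2.5));  // -3.0 -2.0
    return 0;
  }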

In effect, this reverts r178362 (and part of r178337, replacing the nearbyint
mapping with the round mapping).

llvm-svn: 187960
2013-08-08 04:31:34 +00:00
Hal Finkel 171817ee8a Add ISD::FROUND for libm round()
All libm floating-point rounding functions, except for round(), had their own
ISD nodes. Recent PowerPC cores have an instruction for round(), and so here I'm
adding ISD::FROUND so that round() can be custom lowered as well.

For the most part, this is straightforward. I've added an intrinsic
and a matching ISD node just like those for nearbyint() and friends. The
SelectionDAG pattern I've named frnd (because ISD::FP_ROUND has already claimed
fround).

This will be used by the PowerPC backend in a follow-up commit.

llvm-svn: 187926
2013-08-07 22:49:12 +00:00
Hal Finkel 11b9e452f6 Add PPC64 mulli pattern
The PPC backend had been missing a pattern to generate mulli for 64-bit
multiplies. We had been generating it only for 32-bit multiplies. Unfortunately,
generating li + mulld unnecessarily increases register pressure.
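
Illustrative source; the instruction sequences in the comments are hedged (register numbers and scheduling may differ):

  long MulBy100(long X) {
    // With the new pattern:   mulli 3, 3, 100
    // Previously on 64-bit:   li 4, 100 ; mulld 3, 3, 4
    return X * 100;
  }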

llvm-svn: 187807
2013-08-06 17:03:03 +00:00
NAKAMURA Takumi aaf66c7357 Target/*/CMakeLists.txt: Add the dependency to CommonTableGen explicitly for each corresponding CodeGen.
Without explicit dependencies, the per-file action and the in-CommonTableGen action could run in parallel,
racing to emit *.inc files simultaneously.

llvm-svn: 187780
2013-08-06 06:38:37 +00:00
Benjamin Kramer 72d45cc846 PPCAsmParser: Stop leaking names.
Store them in a place that gets cleaned up properly.

llvm-svn: 187700
2013-08-03 22:43:29 +00:00
Hal Finkel b176acb6b7 Fix PPC64 64-bit GPR inline asm constraint matching
Internally, the PowerPC backend names the 32-bit GPRs R[0-9]+, and names the
64-bit parent GPRs X[0-9]+. When matching inline assembly constraints with
explicit register names, on PPC64 when an i64 MVT has been requested, we need
to follow gcc's convention of using r[0-9]+ to refer to the 64-bit (parent)
registers.

At some point, we'll probably want to arrange things so that the generic code
in TargetLowering uses the AsmName fields declared in *RegisterInfo.td in order
to match these inline asm register constraints. If we do that, this change can
be reverted.

llvm-svn: 187693
2013-08-03 12:25:10 +00:00
Bill Wendling a5c536e1ee Use function attributes to indicate that we don't want to realign the stack.
Function attributes are the future! So just query whether we want to realign the
stack directly from the function instead of through a random target options
structure.

llvm-svn: 187618
2013-08-01 21:42:05 +00:00
Bill Schmidt 0cf702fa61 [PowerPC] Skeletal FastISel support for 64-bit PowerPC ELF.
This is the first of many upcoming patches for PowerPC fast
instruction selection support.  This patch implements the minimum
necessary for a functional (but extremely limited) FastISel pass.  It
allows the table-generated portions of the selector to be created and
used, but in most cases selection will fall back to the DAG selector.
None of the block terminator instructions are implemented yet, and
most interesting instructions require some special handling.
Therefore there aren't any new test cases with this patch.  There will
be quite a few tests coming with future patches.

This patch adds the make/CMake support for the new code (including
tablegen -gen-fast-isel) and creates the FastISel object for PPC64 ELF
only.  It instantiates the necessary virtual functions
(TargetSelectInstruction, TargetMaterializeConstant,
TargetMaterializeAlloca, tryToFoldLoadIntoMI, and FastLowerArguments),
but of these, only TargetMaterializeConstant contains any useful
implementation.  This is present since the table-generated code
requires the ability to materialize integer constants for some
instructions.

This patch has been tested by building and running the
projects/test-suite code with -O0.  All tests passed with the
exception of a couple of long-running tests that time out using -O0
code generation.

llvm-svn: 187399
2013-07-30 00:50:39 +00:00
Bill Schmidt 40f78a2a86 [PowerPC] Add comment explaining preprocessor directive.
llvm-svn: 187320
2013-07-28 03:23:32 +00:00
Bill Schmidt 20573225ed Revert 187318
llvm-svn: 187319
2013-07-28 02:13:24 +00:00
Bill Schmidt f5b32e3935 [PowerPC] Remove unnecessary preprocessor checking.
The tests !defined(__ppc__) && !defined(__powerpc__) are not needed
or helpful when verifying that code is being compiled for a 64-bit
target.  The simpler test provided by this revision is sufficient to
tell if the target is 64-bit.

llvm-svn: 187318
2013-07-28 02:08:13 +00:00
Rafael Espindola 05b5a46ed8 Revert "[PowerPC] Improve consistency in use of __ppc__, __powerpc__, etc."
This reverts commit r187248. It broke many bots.

llvm-svn: 187254
2013-07-26 22:13:57 +00:00
Bill Schmidt 419f7c2345 [PowerPC] Improve consistency in use of __ppc__, __powerpc__, etc.
Both GCC and LLVM will implicitly define __ppc__ and __powerpc__ for
all PowerPC targets, whether 32- or 64-bit.  They will both implicitly
define __ppc64__ and __powerpc64__ for 64-bit PowerPC targets, and not
for 32-bit targets.  We cannot be sure that all other possible
compilers used to compile Clang/LLVM define both __ppc__ and
__powerpc__, for example, so it is best to check for both when relying
on either inside the Clang/LLVM code base.

This patch makes sure we always check for both variants.  In addition,
it fixes one unnecessary check in lib/Target/PowerPC/PPCJITInfo.cpp.
(At least one of __ppc__ and __powerpc__ should always be defined when
compiling for a PowerPC target, no matter which compiler is used, so
testing for them is unnecessary.)

There are some places in the compiler that check for other variants,
like __POWERPC__ and _POWER, and I have left those in place.  There is
no need to add them elsewhere.  This seems to be in Apple-specific
code, and I won't take a chance on breaking it.

There is no intended change in behavior; thus, no test cases are
added.

llvm-svn: 187248
2013-07-26 21:39:15 +00:00
Bill Schmidt 0a9170d931 [PowerPC] Support powerpc64le as a syntax-checking target.
This patch provides basic support for powerpc64le as an LLVM target.
However, use of this target will not actually generate little-endian
code.  Instead, use of the target will cause the correct little-endian
built-in defines to be generated, so that code that tests for
__LITTLE_ENDIAN__, for example, will be correctly parsed for
syntax-only testing.  Code generation will otherwise be the same as
powerpc64 (big-endian), for now.

The patch leaves open the possibility of creating a little-endian
PowerPC64 back end, but there is no immediate intent to create such a
thing.

The LLVM portions of this patch simply add ppc64le coverage everywhere
that ppc64 coverage currently exists.  There is nothing of any import
worth testing until such time as little-endian code generation is
implemented.  In the corresponding Clang patch, there is a new test
case variant to ensure that correct built-in defines for little-endian
code are generated.

llvm-svn: 187179
2013-07-26 01:35:43 +00:00
Roman Divacky c3825df87e PPC32 va_list is an actual structure, so va_copy needs to copy the whole
structure, not just a pointer. This implements that and thus fixes va_copy
on PPC32. Fixes #15286. Both bug and patch by Florian Zeitz!
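
A minimal usage sketch of why this matters (hypothetical helper, not taken from the patch):

  #include <cstdarg>

  static int SumTwice(int N, ...) {
    va_list AP, Copy;
    va_start(AP, N);
    va_copy(Copy, AP);            // on PPC32 SVR4 this must copy the whole va_list struct
    int Sum = 0;
    for (int I = 0; I < N; ++I)
      Sum += va_arg(AP, int);
    for (int I = 0; I < N; ++I)
      Sum += va_arg(Copy, int);   // re-walks the arguments from the copied state
    va_end(Copy);
    va_end(AP);
    return Sum;
  }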

llvm-svn: 187158
2013-07-25 21:36:47 +00:00
David Fang 75ca7ce811 allow tests to run on powerpc-darwin8 again, checking for __ppc__
llvm-svn: 187027
2013-07-24 07:52:16 +00:00
Craig Topper 690d8ea181 Split generated asm mnemonic matching table into a separate table for each asm variant.
This removes the need to store the asm variant in each row of the single table that existed before. Shaves ~16K off the size of X86AsmParser.o.

llvm-svn: 187026
2013-07-24 07:33:14 +00:00
Hal Finkel 1860763c76 PPC: Support dynamic allocas with large alignment
Support for dynamic stack alignments in the PPC backend has been unfinished, in
part because it depends on dynamic stack realignment (which I only just
recently implemented fully). Now we can also support dynamic allocas with
higher than the default target stack alignment (16 bytes).

In order to round-up the requested size to the maximum requested alignment, we
need an additional register to hold the rounded-up size. We're already using one
scavenged register to hold the previous stack-pointer value (which needs to be
stored with the signal-safe stdux update), and so when we have dynamic allocas
and a large alignment, we allocate two emergency spill slots for the scavenger.

llvm-svn: 186562
2013-07-18 04:28:21 +00:00
Hal Finkel f05d6c7843 PPC: Add base-pointer support to builtin setjmp/longjmp
First, this changes the base-pointer implementation to remove an unnecessary
complication (and one that is incompatible with how builtin SjLj is
implemented): instead of using r31 as the base pointer when it is not needed as
a frame pointer, now the base pointer will always be r30 when needed.

Second, we introduce another pseudo register, BP, which is used just like the FP
pseudo register to refer to the base register before we know for certain what
register it will be.

Third, we now save BP into the jmp_buf, and restore r30 from that slot in
longjmp.  If the function that called setjmp did not use a base pointer, then
r30 will be overwritten by the setjmp-calling-function's restore code. FP
restoration (which is restored into r31) works the same way.

llvm-svn: 186545
2013-07-17 23:50:51 +00:00
Hal Finkel 40f76d5830 PPC: Add CTR-register clobber to builtin setjmp
Because the builtin longjmp implementation uses a CTR-based indirect jump, when
the control flow arrives at the builtin setjmp call, the CTR register has
necessarily been clobbered. Correspondingly, this adds CTR to the list of
implicit definitions of the builtin setjmp pseudo instruction.

We don't need to add CTR to the implicit definitions of builtin longjmp
because, even though it does clobber the CTR register, the control flow cannot
return to inside the loop unless there is also a builtin setjmp call.

llvm-svn: 186488
2013-07-17 05:35:44 +00:00
Hal Finkel a7c54e8cf4 PPC: Implement base pointer and stack realignment
This builds on some frame-lowering code that has existed since 2005 (r24224)
but was disabled in 2008 (r48188) because it needed base pointer support to
function correctly. This implementation follows the strategy suggested by Dale
Johannesen in r48188 where the following comment was added:

  This does not currently work, because the delta between old and new stack
  pointers is added to offsets that reference incoming parameters after the
  prolog is generated, and the code that does that doesn't handle a variable
  delta.  You don't want to do that anyway; a better approach is to reserve
  another register that retains to the incoming stack pointer, and reference
  parameters relative to that.

And now we do exactly that. If we don't need a frame pointer, then we use r31
as a base pointer. If we do need a frame pointer, then we use r30 as a base
pointer. The base pointer retains the value of the stack pointer before it was
decremented in the prologue. We then use the base pointer to resolve all
negative frame indices. The basic scheme follows that for base pointers in the
X86 backend.

We use a base pointer when we need to dynamically realign the incoming stack
pointer. This currently applies only to static objects (dynamic allocas with
large alignments, and base-pointer support in SjLj lowering will come in future
commits).

llvm-svn: 186478
2013-07-17 00:45:52 +00:00
NAKAMURA Takumi 37ce985739 PPCJITInfo.cpp: Tweak r186252 with s/__ppc/__powerpc/ to work on powerpc-linux Fedora 12.
g++ (GCC) 4.4.4 20100630 (Red Hat 4.4.4-10)

llvm-svn: 186396
2013-07-16 09:59:51 +00:00
Hal Finkel a0014a5a26 PPC: Refactoring to support subtarget feature changing
This change mirrors the changes that were made to the X86 and ARM targets to
support subtarget feature changing. As indicated in r182899, the mechanism is
still undergoing revision, and so as with the X86 and ARM targets, there is no
test case yet (there is no effective functionality change).

llvm-svn: 186357
2013-07-15 22:29:40 +00:00
Hal Finkel 8e8618ae5c Fix register subclass handling in PPCInstrInfo::insertSelect
PPCInstrInfo::insertSelect and PPCInstrInfo::canInsertSelect were computing the
common subclass of the true and false inputs, and then selecting either the
32-bit or the 64-bit isel variant based on the result of calling
PPC::GPRCRegClass.hasSubClassEq(RC) and PPC::G8RCRegClass.hasSubClassEq(RC)
(where RC is the common subclass). Unfortunately, this is not quite right: if
we have something like this:

  %vreg8<def> = SELECT_CC_I8 %vreg4<kill>, %vreg7<kill>, %vreg6<kill>, 76;
    G8RC_and_G8RC_NOX0:%vreg8 CRRC:%vreg4 G8RC_NOX0:%vreg7,%vreg6

then the common subclass of G8RC_and_G8RC_NOX0 and G8RC_NOX0 is G8RC_NOX0, and
G8RC_NOX0 is not a subclass of G8RC (because it also contains the ZERO8
pseudo-register). As a result, we also need to check the common subclass
against GPRC_NOR0 and G8RC_NOX0 explicitly.

This had not been a problem for clients of insertSelect that called
canInsertSelect first (because it had a compensating mistake), but insertSelect
is also used by the PPC pseudo-instruction expander, and this error was causing
a problem in that context.

This problem was found by csmith.

llvm-svn: 186343
2013-07-15 20:22:58 +00:00
Craig Topper 5871321e49 Use llvm::array_lengthof to replace sizeof(array)/sizeof(array[0]).
llvm-svn: 186301
2013-07-15 04:27:47 +00:00
Craig Topper b94011fd28 Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size.
llvm-svn: 186274
2013-07-14 04:42:23 +00:00
Joerg Sonnenberger 8e01ae895d Reduce large list of macros to the primary platform macros. Distinguish
between ELF (Linux, FreeBSD, NetBSD) and OSX as platform for the
assembler dialect.

llvm-svn: 186252
2013-07-13 17:59:55 +00:00
Hal Finkel 4715081787 PPC: Add some missing V_SET0 patterns
We had patterns to match v4i32 immAllZerosV -> V_SET0, but not patterns for
v8i16 (which occurs in the test case) or v16i8. The same was true for
V_SETALLONES (so I added the associated patterns for those as well).

Another bug found by llvm-stress.

llvm-svn: 186108
2013-07-11 17:43:32 +00:00
Hal Finkel ff3ea8060c PPCDAGToDAGISel::isRunOfOnes should return false on zero
This fixes a bug (found by csmith) at -O0 where we attempt to create a RLWIMI
with an out-of-range operand. Most uses of the isRunOfOnes function are guarded
by a condition that the value is not zero. This was not true in two places, and
in both places a zero input would result in an out-of-range MB value (= 32).

To fix this, isRunOfOnes returns false on a zero input (and I've removed one
now-redundant guard).
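
A simplified sketch of the check (hedged: the real helper also accepts wrapped masks and reports the run bounds in PPC big-endian bit numbering; this version handles only contiguous, non-wrapped runs):

  #include <cstdint>

  static bool IsRunOfOnes(uint32_t Val, unsigned &MB, unsigned &ME) {
    if (Val == 0)
      return false;                      // the zero guard this patch relies on
    if (((Val + (Val & -Val)) & Val) != 0)
      return false;                      // set bits are not contiguous
    MB = __builtin_clz(Val);             // first set bit (bit 0 = MSB)
    ME = 31 - __builtin_ctz(Val);        // last set bit
    return true;
  }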

llvm-svn: 186101
2013-07-11 16:31:51 +00:00
Hal Finkel 7ab3db52d3 PPC: Add a better comment about the i64 FI fixup
In discussing this change with Bill Schmidt, it was decided that the original
comment about negative FIs was incorrect. We'll still exclude them for now, but
now with a more-accurate explanation.

llvm-svn: 186005
2013-07-10 15:29:01 +00:00
Bill Schmidt 4122169308 [PowerPC] Better fix for PR16556.
A more complete example of the bug in PR16556 was recently provided,
showing that the previous fix was not sufficient.  The previous fix is
reverted herein.

The real problem is that ReplaceNodeResults() uses LowerFP_TO_INT as
custom lowering for FP_TO_SINT during type legalization, without
checking whether the input type is handled by that routine.
LowerFP_TO_INT requires the input to be f32 or f64, so we fail when
the input is ppcf128.

I'm leaving the test case from the initial fix (r185821) in place, and
adding the new test as another crash-only check.

llvm-svn: 185959
2013-07-09 18:50:20 +00:00
Stephen Lin 73de7bf5de AArch64/PowerPC/SystemZ/X86: This patch fixes the interface, usage, and all
in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in
order to resolve the following issues with fmuladd (i.e. optional FMA)
intrinsics:

1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd
intrinsics even if the subtarget does not support FMA instructions, leading
to laughably bad code generation in some situations.

2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128,
resulting in a call to a software fp128 FMA implementation.

3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types
like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize,
etc. to types that support hardware FMAs.

The function has also been slightly renamed for consistency and to force a
merge/build conflict for any out-of-tree target implementing it. To resolve,
see comments and fixed in-tree examples.

llvm-svn: 185956
2013-07-09 18:16:56 +00:00
Ulrich Weigand 52cf8e4488 [PowerPC] Revert r185476 and fix up TLS variant kinds
In the commit message to r185476 I wrote:

>The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD
>correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD.
>This causes some confusion with the asm parser, since VK_PPC_TLSGD
>is output as @tlsgd, which is then read back in as VK_TLSGD.
>
>To avoid this confusion, this patch removes the PowerPC-specific
>modifiers and uses the generic modifiers throughout.  (The only
>drawback is that the generic modifiers are printed in upper case
>while the usual convention on PowerPC is to use lower-case modifiers.
>But this is just a cosmetic issue.)

This was unfortunately incorrect; there is in fact another,
serious drawback to using the default VK_TLSLD/VK_TLSGD
variant kinds: using these causes ELFObjectWriter::RelocNeedsGOT
to return true, which in turn causes the ELFObjectWriter to emit
an undefined reference to _GLOBAL_OFFSET_TABLE_.

This is a problem on powerpc64, because it uses the TOC instead
of the GOT, and the linker does not provide _GLOBAL_OFFSET_TABLE_,
so the symbol remains undefined.  This means shared libraries
using TLS built with the integrated assembler are currently
broken.

While the whole RelocNeedsGOT / _GLOBAL_OFFSET_TABLE_ situation
probably ought to be properly fixed at some point, for now I'm
simply reverting the r185476 commit.  Now this in turn exposes
the breakage of handling @tlsgd/@tlsld in the asm parser that
this check-in was originally intended to fix.

To avoid this regression, I'm also adding a different fix for
this problem: while common code now parses @tlsgd as VK_TLSGD,
a special hack in the asm parser translates this code to the
platform-specific VK_PPC_TLSGD that the back-end now expects.
While this is not really pretty, it's self-contained and
shouldn't hurt anything else for now.  Once the underlying
problem is fixed, this hack can be reverted again.

llvm-svn: 185945
2013-07-09 16:41:09 +00:00
Ulrich Weigand 55daa77901 [PowerPC] Support ".machine any"
The PowerPC assembler is supposed to provide a directive .machine
that allows switching the supported CPU instruction set on the fly.
Since we do not yet check CPU feature sets at all and always accept
any available instruction, this is not really useful at this point.

However, it makes sense to accept (and ignore) ".machine any" to
avoid spuriously rejecting existing assembler files that use this.

llvm-svn: 185924
2013-07-09 10:00:34 +00:00
Ulrich Weigand 78a5a116a0 [PowerPC] Support .llong and fix .word
This adds support for the .llong PowerPC-specific assembler directive.
In doing so, I noticed that .word is currently incorrect: it is
supposed to define a 2-byte data element, not a 4-byte one.

llvm-svn: 185911
2013-07-09 07:59:25 +00:00
Hal Finkel dbbf09b28e PPC: Allocate RS spill slot for unaligned i64 load/store
This fixes another bug found by llvm-stress!

If we happen to be doing an i64 load or store into a stack slot that has less
than a 4-byte alignment, then the frame-index elimination may need to use an
indexed load or store instruction (because the offset may not be a multiple of
4, a requirement of the STD/LD instructions). The extra register needed to hold
the offset comes from the register scavenger, and it is possible that the
scavenger will need to use an emergency spill slot. As a result, we need to
make sure that a spill slot is allocated when doing an i64 load/store into a
less-than-4-byte-aligned stack slot.

Because test cases for things like this tend to be fairly fragile, I've
concatenated a few small bugpoint-reduced test cases together to form the
regression test.

llvm-svn: 185907
2013-07-09 06:34:51 +00:00
Ulrich Weigand 266db7fe04 [PowerPC] Always use "assembler dialect" 1
A setting in MCAsmInfo defines the "assembler dialect" to use.  This is used
by common code to choose between alternatives in a multi-alternative GNU
inline asm statement like the following:

  __asm__ ("{sfe|subfe} %0,%1,%2" : "=r" (out) : "r" (in1), "r" (in2));

The meaning of these dialects is platform specific, and GCC defines those
for PowerPC to use dialect 0 for old-style (POWER) mnemonics and 1 for
new-style (PowerPC) mnemonics, like in the example above.

To be compatible with inline asm used with GCC, LLVM ought to do the same.
Specifically, this means we should always use assembler dialect 1 since
old-style mnemonics really aren't supported on any current platform.

However, the current LLVM back-end uses:
  AssemblerDialect = 1;           // New-Style mnemonics.
in PPCMCAsmInfoDarwin, and
  AssemblerDialect = 0;           // Old-Style mnemonics.
in PPCLinuxMCAsmInfo.

The Linux setting really isn't correct, we should be using new-style
mnemonics everywhere.  This is changed by this commit.

Unfortunately, the setting of this variable is overloaded in the back-end
to decide whether or not we are on a Darwin target.  This is done in
PPCInstPrinter (the "SyntaxVariant" is initialized from the MCAsmInfo
AssemblerDialect setting), and also in PPCMCExpr.  Setting AssemblerDialect
to 1 for both Darwin and Linux no longer allows us to make this distinction.

Instead, this patch uses the MCSubtargetInfo passed to createPPCMCInstPrinter
to distinguish Darwin targets, and ignores the SyntaxVariant parameter.
As to PPCMCExpr, this patch adds an explicit isDarwin argument that needs
to be passed in by the caller when creating a target MCExpr.  (To do so
this patch implicitly also reverts commit 184441.)

llvm-svn: 185858
2013-07-08 20:20:51 +00:00
Hal Finkel 21ada79757 PPC: Mark vector CC action for SETO and SETONE as Expand
Another bug found by llvm-stress! This fixes hitting
  llvm_unreachable("Invalid integer vector compare condition");
at the end of getVCmpInst in PPCISelDAGToDAG.

llvm-svn: 185855
2013-07-08 20:00:03 +00:00
Hal Finkel e39302258e PPC: Mark vector FREM as Expand by default
Another bug found by llvm-stress! This fixes crashing with:
  LLVM ERROR: Cannot select: v4f32 = frem ...

llvm-svn: 185840
2013-07-08 17:30:25 +00:00
Ulrich Weigand e840ee2ca2 [PowerPC] Support time base instructions
This adds support for the old-style time base instructions;
while new programs are supposed to use mfspr, the mftb instructions
are still supported and in use by existing assembler files.
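
As a rough illustration (destination register chosen arbitrarily), the
old-style forms correspond to mfspr reads of the time base SPRs on most
processors:

  mftb  3      # roughly mfspr 3, 268 (time base)
  mftbu 3      # roughly mfspr 3, 269 (time base, upper half)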

llvm-svn: 185829
2013-07-08 15:20:38 +00:00
Ulrich Weigand c0944b50fe [PowerPC] Support basic compare mnemonics
This adds support for the basic mnemonics (with the L operand) for the
fixed-point compare instructions.  These are defined as aliases for the
already existing CMPW/CMPD patterns, depending on the value of L.
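
A sketch of what this accepts (operand values chosen arbitrarily):

  cmp  0, 0, 3, 4    # L=0: word compare, same as cmpw 0, 3, 4
  cmp  0, 1, 3, 4    # L=1: doubleword compare, same as cmpd 0, 3, 4
  cmpl 0, 1, 3, 4    # unsigned variant, same as cmpld 0, 3, 4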

This requires use of InstAlias patterns with immediate literal operands.
To make this work, we need two further changes:

 - define a RegisterPrefix, because otherwise literals 0 and 1 would
   be parsed as literal register names

 - provide a PPCAsmParser::validateTargetOperandClass routine to
   recognize immediate literals (like ARM does)

llvm-svn: 185826
2013-07-08 14:49:37 +00:00
Bill Schmidt 2db29ef467 [PowerPC] Fix PR16556 (handle undef ppcf128 in LowerFP_TO_INT).
PPCTargetLowering::LowerFP_TO_INT() expects its source operand to be
either an f32 or f64, but this is not checked.  A long double
(ppcf128) operand will normally be custom-lowered to a conversion to
f64 in this context.  However, this isn't the case for an UNDEF node.

This patch recognizes a ppcf128 as a legal source operand for
FP_TO_INT only if it's an undef, in which case it creates an undef of
the target type.

At some point we might want to do a wholesale custom lowering of
ISD::UNDEF when the type is ppcf128, but it's not really clear that's
a great idea, and probably more work than it's worth for a situation
that only arises in the case of a programming error.  At this point I
think simple is best.

The test case comes from PR16556, and is a crash-test only.

llvm-svn: 185821
2013-07-08 14:22:45 +00:00
Ulrich Weigand b204431106 [PowerPC] Add some special @got@tprel fixup cases
When a target@got@tprel or target@got@tprel@l symbol variant is used in
a fixup_ppc_half16 (*not* fixup_ppc_half16ds) context, we currently fail,
since the corresponding R_PPC64_GOT_TPREL16 / R_PPC64_GOT_TPREL16_LO
relocation types do not exist.

However, since such symbol variants resolve to GOT offsets which are
always 4-aligned, we can simply instead use the _DS variants of the
relocation types, which *do* exist.

The same applies for the @got@dtprel variants.

llvm-svn: 185700
2013-07-05 13:49:46 +00:00
Ulrich Weigand 5b427591d6 [PowerPC] Support @tls in the asm parser
This adds support for the last missing construct to parse TLS-related
assembler code:
   add 3, 4, symbol@tls

The ADD8TLS pattern currently hard-codes the @tls suffix into the assembler string.
This cannot be handled by the asm parser, since @tls is parsed as
a symbol variant.  This patch changes ADD8TLS to have the @tls suffix
printed as symbol variant on output too, which allows us to remove
the isCodeGenOnly marker from ADD8TLS.  This in turn means that we
can add a AsmOperand to accept @tls marked symbols on input.

As a side effect, this means that the fixup_ppc_tlsreg fixup type
is no longer necessary and can be merged into fixup_ppc_nofixup.

llvm-svn: 185692
2013-07-05 12:22:36 +00:00
Ulrich Weigand d3ac7c058b [PowerPC] Implement writeNopData
This implements a proper PPCAsmBackend::writeNopData routine
that actually writes PowerPC nop instructions.

This fixes the last remaining difference in object file output
(text section) between the integrated assembler and GNU as
that I've seen anywhere.

llvm-svn: 185662
2013-07-04 18:28:46 +00:00
Ulrich Weigand 56b0e7b011 [PowerPC] Add all trap mnemonics
This adds support for all basic and extended variants
of the trap instructions to the asm parser.
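
For example (operands chosen arbitrarily, not taken from the commit):

  trap         # unconditional trap, short for tw 31, 0, 0
  tweq 3, 4    # trap if r3 == r4, short for tw 4, 3, 4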

llvm-svn: 185638
2013-07-04 14:40:12 +00:00
Ulrich Weigand b86cb7d04b [PowerPC] Add asm parser support for CR expressions
This adds support for specifying condition registers and
condition register fields via expressions using the symbols
defined by the PowerISA, like "4*cr2+eq".
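
For example (registers chosen arbitrarily), such an expression simply
evaluates to a CR bit number:

  crand 4*cr2+eq, 4*cr3+lt, 4*cr3+gt   # same as crand 10, 12, 13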

llvm-svn: 185633
2013-07-04 14:24:00 +00:00
Jakob Stoklund Olesen db429d9483 Remove the EXCEPTIONADDR, EHSELECTION, and LSDAADDR ISD opcodes.
These exception-related opcodes are not used any longer.

llvm-svn: 185625
2013-07-04 13:54:20 +00:00
Craig Topper af0dea1347 Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid specifying the vector size.
llvm-svn: 185606
2013-07-04 01:31:24 +00:00
Jakob Stoklund Olesen a1f5b901a5 Revert r185595-185596 which broke buildbots.
Revert "Simplify landing pad lowering."
Revert "Remove the EXCEPTIONADDR, EHSELECTION, and LSDAADDR ISD opcodes."

llvm-svn: 185600
2013-07-04 00:26:30 +00:00
Jakob Stoklund Olesen f33ec531fa Remove the EXCEPTIONADDR, EHSELECTION, and LSDAADDR ISD opcodes.
These exception-related opcodes are not used any longer.

llvm-svn: 185596
2013-07-03 23:56:31 +00:00
Bill Schmidt 541758daa9 [PowerPC] FreeBSD does not require f128 in its data layout string.
Long double is 64 bits on FreeBSD PPC, so the f128 entry is superfluous.

llvm-svn: 185583
2013-07-03 21:03:35 +00:00
Ulrich Weigand 2542b3b17f [PowerPC] Support lmw/stmw in the asm parser
This adds support for the load/store multiple instructions,
currently used by the asm parser only.
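
A minimal illustration (registers and offset chosen arbitrarily):

  stmw 29, -12(1)   # store r29..r31 to consecutive words at -12(r1)
  lmw  29, -12(1)   # reload r29..r31 from the same locations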

llvm-svn: 185564
2013-07-03 18:29:47 +00:00
Ulrich Weigand 49f487e6cd [PowerPC] Use mtocrf when available
Just as with mfocrf, it is also preferable to use mtocrf instead of
mtcrf when only a single CR register is to be written.

Current code however always emits mtcrf.  This probably does not matter
when using an external assembler, since the GNU assembler will in fact
automatically replace mtcrf with mtocrf when possible.  It does create
inefficient code with the integrated assembler, however.

To fix this, this patch adds MTOCRF/MTOCRF8 instruction patterns and
uses those instead of MTCRF/MTCRF8 everywhere.  Just as done in the
MFOCRF patch committed as 185556, these patterns will be converted
back to MTCRF if MTOCRF is not available on the machine.

As a side effect, this allows us to modify the MTCRF pattern to accept
the full range of mask operands for the benefit of the asm parser.

llvm-svn: 185561
2013-07-03 17:59:07 +00:00
Ulrich Weigand d5ebc626d5 [PowerPC] Always use mfocrf if available
When accessing just a single CR register, it is always preferable to
use mfocrf instead of mfcr, if the former is available on the CPU.

Current code makes that distinction in many, but not all places
where a single CR register value is retrieved.  One missing
location is PPCRegisterInfo::lowerCRSpilling.

To fix this and make this simpler in the future, this patch changes
the bulk of the back-end to always assume mfocrf is available and
simply generate it when needed.

On machines that actually do not support mfocrf, the instruction
is replaced by mfcr at the very end, in EmitInstruction.

This has the additional benefit that we no longer need the
MFCRpseud hack, since before EmitInstruction we always have
a MFOCRF instruction pattern, which already models data flow
as required.

The patch also adds the MFOCRF8 version of the instruction,
which was missing so far.

Except for the PPCRegisterInfo::lowerCRSpilling case, no change
in generated code intended.

llvm-svn: 185556
2013-07-03 17:05:42 +00:00
Ulrich Weigand 47e9328afe [PowerPC] Remove dead code from PPCDAGToDAGISel::SelectSETCC
The subroutine getCRIdxForSetCC has a parameter "Other" and comment:

  If this returns with Other != -1, then the returned comparison
  is an or of two simpler comparisons.

However for at least the last five years this routine has never
returned a value of Other != -1; these cases are now handled
differently to begin with.

This patch removes the parameter and the code in SelectSETCC that
attempted to handle the Other != -1 case.

llvm-svn: 185541
2013-07-03 15:13:30 +00:00
Ulrich Weigand 9d2e202d65 [PowerPC] Make specialized AltiVec patterns isCodeGenOnly
A couple of AltiVec patterns are just specialized forms of the
generic instruction pattern, and should therefore be marked
isCodeGenOnly to avoid confusing the asm parser:
VCFSX_0, VCTUXS_0, VCFUX_0, VCTSXS_0, and V_SETALLONES.

Noticed by inspection of the generated PPCGenAsmMatcher.inc.

llvm-svn: 185533
2013-07-03 12:51:09 +00:00
Ulrich Weigand ae9cf5828c [PowerPC] Support mtspr/mfspr in the asm parser
This adds support for the generic forms of mtspr/mfspr
for the asm parser.  The compiler will continue to use
the specialized patters for mtlr etc. since those are
needed to correctly describe data flow.

llvm-svn: 185532
2013-07-03 12:32:41 +00:00
Ulrich Weigand 42a09dc12f [PowerPC] PR16512 - Support TLS call sequences in the asm parser
This patch now adds support for recognizing TLS call sequences in
the asm parser.  This needs a new pattern BL8_TLS, which is like
BL8_NOP_TLS except without nop.  That pattern is used for the
asm parser only.

llvm-svn: 185478
2013-07-02 21:31:59 +00:00
Ulrich Weigand 5143bab2f9 [PowerPC] Rework TLS call operand processing
As part of the global-dynamic and local-dynamic TLS sequences, we need
to use a special form of the call instruction:

 bl __tls_get_addr(sym@tlsld)
 bl __tls_get_addr(sym@tlsgd)

which generates two fixups.  The current implementation of this causes
problems with recognizing this form in the asm parser.  To fix this,
this patch reworks operand processing for this special form by using
a single operand to hold both __tls_get_addr and sym@tlsld and defining
a print method to output the above form, and an encoding method to
generate the two fixups.

As a side simplification, the patch replaces the two instruction
patterns BL8_NOP_TLSGD and BL8_NOP_TLSLD by a single BL8_NOP_TLS,
since the patterns already operate in an identical fashion (whether
we have a local-dynamic or global-dynamic symbol is already encoded
in the symbol modifier).

No change in code generation intended.

llvm-svn: 185477
2013-07-02 21:31:04 +00:00
Ulrich Weigand 4050995650 [PowerPC] Remove VK_PPC_TLSGD and VK_PPC_TLSLD
The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD
correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD.
This causes some confusion with the asm parser, since VK_PPC_TLSGD
is output as @tlsgd, which is then read back in as VK_TLSGD.

To avoid this confusion, this patch removes the PowerPC-specific
modifiers and uses the generic modifiers throughout.  (The only
drawback is that the generic modifiers are printed in upper case
while the usual convention on PowerPC is to use lower-case modifiers.
But this is just a cosmetic issue.)

llvm-svn: 185476
2013-07-02 21:29:06 +00:00
Ulrich Weigand 0f0398246c [PowerPC] Support TLS variables in debug info
This adds an implementation of getDebugThreadLocalSymbol for
(64-bit) PowerPC.  This needs to return a generic MCExpr
since on ppc64, we need to add a bias of 0x8000 to the
value returned by the R_PPC64_DTPREL64 relocation.

llvm-svn: 185461
2013-07-02 18:47:35 +00:00
Rafael Espindola 64e1af8eb9 Remove address spaces from MC.
This is dead code since PIC16 was removed in 2010. The result was an odd mix,
where some parts would carefully pass it along and others would assert it was
zero (most of the object streamer for example).

llvm-svn: 185436
2013-07-02 15:49:13 +00:00
Hal Finkel 52727c6b82 Cleanup PPC Altivec registers in CSR lists and improve VRSAVE handling
There are a couple of (small) related changes here:

1. The printed name of the VRSAVE register has been changed from VRsave to
vrsave in order to match the name accepted by GNU binutils.

2. Support for parsing vrsave has been added to the asm parser (it seems that
there was no test case specifically covering this code, so I've added one).

3. The list of Altivec registers, which was common to all calling conventions,
has been separated out. This allows us to define the base CSR lists, and then
lists for each ABI with Altivec included. This allows SjLj, for example, to
work correctly on non-Altivec targets without using unnatural definitions of
the NoRegs CSR list.

4. VRSAVE is now always reserved on non-Darwin targets and all Altivec
registers are reserved when Altivec is disabled.

With these changes, it is now possible to compile a function containing
__builtin_unwind_init() on Linux/PPC64 with debugging information. This did not
work previously because GNU binutils assumes that all .cfi_offset offsets will
be 8-byte aligned on PPC64 (and errors out if you provide a non-8-byte-aligned
offset). This is not true for the vrsave register, however; because this
register is used only on Darwin, GCC does not bother printing a .cfi_offset
entry for it (even though there is a slot in the stack frame for it as
specified by the ABI). This change allows us to do the same: we will also not
print .cfi_offset directives for vrsave.

llvm-svn: 185409
2013-07-02 03:39:34 +00:00
Ulrich Weigand f11efe7f48 [PowerPC] Add support for TLS data relocations
This adds support for TLS data relocations and modifiers:
       .quad target@dtpmod
       .quad target@tprel
       .quad target@dtprel
Currently exploited by the asm parser only.

llvm-svn: 185394
2013-07-01 23:33:29 +00:00
Ulrich Weigand 85c6f7f7a7 [PowerPC] Support all condition register logical instructions
This adds support for all missing condition register logical
instructions and extended mnemonics to the asm parser.

llvm-svn: 185387
2013-07-01 21:40:54 +00:00
Bill Schmidt 48fc20a034 Index: test/CodeGen/PowerPC/reloc-align.ll
===================================================================
--- test/CodeGen/PowerPC/reloc-align.ll	(revision 0)
+++ test/CodeGen/PowerPC/reloc-align.ll	(revision 0)
@@ -0,0 +1,34 @@
+; RUN: llc -mcpu=pwr7 -O1 < %s | FileCheck %s
+
+; This test verifies that the peephole optimization of address accesses
+; does not produce a load or store with a relocation that can't be
+; satisfied for a given instruction encoding.  Reduced from a test supplied
+; by Hal Finkel.
+
+target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
+target triple = "powerpc64-unknown-linux-gnu"
+
+%struct.S1 = type { [8 x i8] }
+
+@main.l_1554 = internal global { i8, i8, i8, i8, i8, i8, i8, i8 } { i8 -1, i8 -6, i8 57, i8 62, i8 -48, i8 0, i8 58, i8 80 }, align 1
+
+; Function Attrs: nounwind readonly
+define signext i32 @main() #0 {
+entry:
+  %call = tail call fastcc signext i32 @func_90(%struct.S1* byval bitcast ({ i8, i8, i8, i8, i8, i8, i8, i8 }* @main.l_1554 to %struct.S1*))
+; CHECK-NOT: ld {{[0-9]+}}, main.l_1554@toc@l
+  ret i32 %call
+}
+
+; Function Attrs: nounwind readonly
+define internal fastcc signext i32 @func_90(%struct.S1* byval nocapture %p_91) #0 {
+entry:
+  %0 = bitcast %struct.S1* %p_91 to i64*
+  %bf.load = load i64* %0, align 1
+  %bf.shl = shl i64 %bf.load, 26
+  %bf.ashr = ashr i64 %bf.shl, 54
+  %bf.cast = trunc i64 %bf.ashr to i32
+  ret i32 %bf.cast
+}
+
+attributes #0 = { nounwind readonly "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
Index: lib/Target/PowerPC/PPCAsmPrinter.cpp
===================================================================
--- lib/Target/PowerPC/PPCAsmPrinter.cpp	(revision 185327)
+++ lib/Target/PowerPC/PPCAsmPrinter.cpp	(working copy)
@@ -679,7 +679,26 @@ void PPCAsmPrinter::EmitInstruction(const MachineI
       OutStreamer.EmitRawText(StringRef("\tmsync"));
       return;
     }
+    break;
+  case PPC::LD:
+  case PPC::STD:
+  case PPC::LWA: {
+    // Verify alignment is legal, so we don't create relocations
+    // that can't be supported.
+    // FIXME:  This test is currently disabled for Darwin.  The test
+    // suite shows a handful of test cases that fail this check for
+    // Darwin.  Those need to be investigated before this sanity test
+    // can be enabled for those subtargets.
+    if (!Subtarget.isDarwin()) {
+      unsigned OpNum = (MI->getOpcode() == PPC::STD) ? 2 : 1;
+      const MachineOperand &MO = MI->getOperand(OpNum);
+      if (MO.isGlobal() && MO.getGlobal()->getAlignment() < 4)
+        llvm_unreachable("Global must be word-aligned for LD, STD, LWA!");
+    }
+    // Now process the instruction normally.
+    break;
   }
+  }
 
   LowerPPCMachineInstrToMCInst(MI, TmpInst, *this);
   OutStreamer.EmitInstruction(TmpInst);
Index: lib/Target/PowerPC/PPCISelDAGToDAG.cpp
===================================================================
--- lib/Target/PowerPC/PPCISelDAGToDAG.cpp	(revision 185327)
+++ lib/Target/PowerPC/PPCISelDAGToDAG.cpp	(working copy)
@@ -1530,6 +1530,14 @@ void PPCDAGToDAGISel::PostprocessISelDAG() {
       if (GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(ImmOpnd)) {
         SDLoc dl(GA);
         const GlobalValue *GV = GA->getGlobal();
+        // We can't perform this optimization for data whose alignment
+        // is insufficient for the instruction encoding.
+        if (GV->getAlignment() < 4 &&
+            (StorageOpcode == PPC::LD || StorageOpcode == PPC::STD ||
+             StorageOpcode == PPC::LWA)) {
+          DEBUG(dbgs() << "Rejected this candidate for alignment.\n\n");
+          continue;
+        }
         ImmOpnd = CurDAG->getTargetGlobalAddress(GV, dl, MVT::i64, 0, Flags);
       } else if (ConstantPoolSDNode *CP =
                  dyn_cast<ConstantPoolSDNode>(ImmOpnd)) {

llvm-svn: 185380
2013-07-01 20:52:27 +00:00
Ulrich Weigand f7152a8596 [PowerPC] Also add "msync" alias
This adds an alias for "msync" (which is used on Book E
systems instead of "sync").

llvm-svn: 185375
2013-07-01 20:39:50 +00:00
Hal Finkel 25e4a0d418 Don't form PPC CTR loops for over-sized exit counts
Although you can't generate this from C on PPC64, if you have a loop using a
64-bit counter on PPC32 then you can't form a CTR-based loop for it. This had
been cauing the PPCCTRLoops pass to assert.

Thanks to Joerg Sonnenberger for providing a test case!

llvm-svn: 185361
2013-07-01 19:34:59 +00:00
Ulrich Weigand 3a75861b06 [PowerPC] Fix @got references to local symbols
A @got reference must always result in a relocation, so that
the linker has a chance to set up the GOT entry, even if the
symbol happens to be local.

Add a PPCELFObjectWriter::ExplicitRelSym routine that enforces
a relocation to be emitted for GOT references.

llvm-svn: 185353
2013-07-01 18:19:56 +00:00
Ulrich Weigand 7a9fcdf6fb [PowerPC] Add "wait" instruction
This adds the "wait" instruction and its extended mnemonics.

llvm-svn: 185350
2013-07-01 17:21:23 +00:00
Ulrich Weigand 98fcc7b6bc [PowerPC] Support "eieio" instruction
This adds support for the "eieio" instruction to
the asm parser.

llvm-svn: 185349
2013-07-01 17:06:26 +00:00
Ulrich Weigand 797f1a3f5b [PowerPC] Add variants of "sync" instruction
This adds support for the "sync $L" instruction with operand,
and provides aliases for "lwsync" and "ptesync".
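
The operand forms, sketched for illustration:

  sync 0    # heavyweight sync, same as plain "sync"
  sync 1    # lightweight sync, alias "lwsync"
  sync 2    # page table sync, alias "ptesync"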

llvm-svn: 185344
2013-07-01 16:37:52 +00:00
Hal Finkel ac1a24b508 PPC: Ignore spill/restore requests for VRSAVE (except on Darwin)
This fixes PR16418, which reports that a function calling
__builtin_unwind_init() asserts. The cause is that this generates a
spill/restore for VRSAVE, and we support that only on Darwin (because VRSAVE is
only really used on Darwin).

The test case checks only that we don't crash. We can add correctness checks
once someone verifies what behavior the function is supposed to have.

llvm-svn: 185235
2013-06-28 22:29:56 +00:00
Hal Finkel 4ca70100de Fix a PPC rlwimi instruction-selection bug
Under certain (evidently rare) circumstances, this code used to convert OR(a,
AND(x, y)) into OR(a, x). This was incorrect.

While there, I've added a comment to the code immediately above.

llvm-svn: 185201
2013-06-28 20:00:07 +00:00
Ulrich Weigand 5a02a02b41 [PowerPC] Accept 17-bit signed immediates for addis
The assembler currently strictly verifies that immediates for
s16imm operands are in range (-32768 ... 32767).  This matches
the behaviour of the GNU assembler, with one exception: gas
allows, as a special case, operands in an extended range
(-65536 .. 65535) for the addis instruction only (and its
extended mnemonic lis).

The main reason for this seems to be to allow using unsigned
16-bit operands for lis, e.g. like lis %r1, 0xfedc.

Since this has been supported by gas for a long time, and
assembler source code seen "in the wild" actually exploits
this feature, this patch adds equivalent support to LLVM
for compatibility reasons.

llvm-svn: 184946
2013-06-26 13:49:53 +00:00
Ulrich Weigand fd3ad693e8 [PowerPC] Support symbolic u16imm operands
Currently, all instructions taking s16imm operands support symbolic
operands.  However, for u16imm operands, we only support actual
immediate integers.  This causes the assembler to reject code like

  ori %r5, %r5, symbol@l

This patch changes the u16imm operand definition to likewise
accept symbolic operands.  In fact, s16imm and u16imm can
share the same encoding routine, now renamed to getImm16Encoding.

llvm-svn: 184944
2013-06-26 13:49:15 +00:00
Ulrich Weigand 93372b4583 [PowerPC] Support @got modifier
Add VK_... values and relocation types necessary to support
the @got family of modifiers.  Used by the asm parser only.

llvm-svn: 184860
2013-06-25 16:49:50 +00:00
Ulrich Weigand ad873cdb2b [PowerPC] Add extended rotate/shift mnemonics
This adds all missing extended rotate/shift mnemonics to the asm parser.
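
A few representative expansions, for illustration only (operands chosen
arbitrarily):

  sldi   3, 4, 16   # shift left dword immediate:  rldicr 3, 4, 16, 47
  srdi   3, 4, 16   # shift right dword immediate: rldicl 3, 4, 48, 16
  clrldi 3, 4, 32   # clear left dword immediate:  rldicl 3, 4, 0, 32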

llvm-svn: 184834
2013-06-25 13:17:41 +00:00
Ulrich Weigand 6c31c4aae8 [PowerPC] Add rldcr/rldic instructions
This adds pattern for the rldcr and rldic instructions (the last instruction
from the rotate/shift family that were missing).  They are currently used
only by the asm parser.

llvm-svn: 184833
2013-06-25 13:17:10 +00:00
Ulrich Weigand 4069e24bd3 [PowerPC] Add extended subtract mnemonics
This adds support for the extended subtract mnemonics to the asm parser (a sketch of their expansions follows the list):
   subi
   subis
   subic
   subic.
   sub
   sub.
   subc
   subc.
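
A sketch of the expansions (operands chosen arbitrarily, not part of the
commit):

   subi   3, 4, 10   # addi 3, 4, -10
   subic. 3, 4, 10   # addic. 3, 4, -10
   sub    3, 4, 5    # subf 3, 5, 4
   subc   3, 4, 5    # subfc 3, 5, 4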
 

llvm-svn: 184832
2013-06-25 13:16:48 +00:00
NAKAMURA Takumi 36c17ee5a1 PPCAsmParser.cpp: Quote "@l/@ha" in comments. [-Wdocumentation]
llvm-svn: 184809
2013-06-25 01:14:20 +00:00
Ulrich Weigand 6ca71579db [PowerPC] Support some miscellaneous mnemonics in the asm parser
This adds support for the following extended mnemonics (a few expansions are sketched after the list):
  xnop
  mr.
  not
  not.
  la
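
A sketch of some of the expansions (operands chosen arbitrarily):

  mr  3, 4      # copy r4 into r3; or 3, 4, 4
  not 3, 4      # bitwise complement; nor 3, 4, 4
  la  3, 8(1)   # load address; addi 3, 1, 8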

llvm-svn: 184767
2013-06-24 18:08:03 +00:00
Benjamin Kramer 3912d785e3 PPC: Remove default case from fully covered switch.
llvm-svn: 184758
2013-06-24 17:03:25 +00:00
Ulrich Weigand 86247b6e27 [PowerPC] Add predicted forms of branches
This adds support for the predicted forms of branches (+/-).
There are three cases to consider:
- Branches using a PPC::Predicate code
  For these, I've added new PPC::Predicate codes corresponding
  to the BO values for predicted branch forms, and updated insn
  printing to print them correctly.  I've also added new aliases
  for the asm parser matching the new forms.
- bt/bf
  I've added new aliases matching to gBC etc.
- bd(n)z variants
  I've added new instruction patterns for the predicted forms.

In all cases, the new patterns are used for the asm parser only.
(The new infrastructure ought to be sufficient to allow use by
the compiler too at some point.)

llvm-svn: 184754
2013-06-24 16:52:04 +00:00
Ulrich Weigand fedd5a756e [PowerPC] Add t/f branch mnemonics to asm parser
This adds the bt/bf/bd(n)zt/bd(n)zf mnemonics as aliases for the
asm parser, resolving to the generic conditional patterns.

llvm-svn: 184725
2013-06-24 12:49:20 +00:00
Ulrich Weigand 824b7d8dfd [PowerPC] Support generic conditional branches in asm parser
This adds instruction patterns to cover the generic forms of
the conditional branch instructions.  This allows the assembler
to support the generic mnemonics.

The compiler will still generate the various specific forms
of the instruction that were already supported.

llvm-svn: 184722
2013-06-24 11:55:21 +00:00
Ulrich Weigand b6a30d159e [PowerPC] Support absolute branches
There is currently only limited support for the "absolute" variants
of branch instructions.  This patch adds support for the absolute
variants of all branches that are currently otherwise supported.

This requires adding new fixup types so that the correct variant
of relocation type can be selected by the object writer.

While the compiler will continue to usually choose the relative
branch variants, this will allow the asm parser to fully support
the absolute branches, with either immediate (numerical) or
symbolic target addresses.

No change in code generation intended.

llvm-svn: 184721
2013-06-24 11:03:33 +00:00
Ulrich Weigand 5b9d591ad1 [PowerPC] Support bd(n)zl and bd(n)zlrl
This adds support for the bd(n)zl and bd(n)zlrl instructions.
The patterns are currently used for the asm parser only.

llvm-svn: 184720
2013-06-24 11:02:38 +00:00
Ulrich Weigand d20e91edad [PowerPC] Support b(cond)l in the asm parser
This patch adds support for the conditional variants of bl.
The pattern is currently used by the asm parser only.

llvm-svn: 184719
2013-06-24 11:02:19 +00:00
Ulrich Weigand 1847bb811e [PowerPC] Support blrl and variants in the asm parser
This patch adds support for blrl and its conditional variants.
The patterns are (currently) used for the asm parser only.

llvm-svn: 184718
2013-06-24 11:01:55 +00:00
Chad Rosier 295bd43adb The getRegForInlineAsmConstraint function should only accept MVT value types.
llvm-svn: 184642
2013-06-22 18:37:38 +00:00
Ulrich Weigand 91add7dfbf [PowerPC] Support R_PPC_REL16 family of relocations
The GNU assembler supports (as extension to the ABI) use of PC-relative
relocations in half16 fields, which allows writing code like:

  li 1, base-.

This patch adds support for those relocation types in the assembler.

llvm-svn: 184552
2013-06-21 14:44:37 +00:00
Ulrich Weigand 876a0d0133 [PowerPC] Support various tls-related modifiers
The current code base only supports the minimum set of tls-related
relocations and @modifiers that are necessary to support compiler-
generated code.  This patch extends this to the full set defined
in the ABI (and supported by the GNU assembler) for the benefit
of the assembler parser.

llvm-svn: 184551
2013-06-21 14:44:15 +00:00
Ulrich Weigand e9126f5534 [PowerPC] Support @higher et al. modifiers
This adds support for the @higher, @highera, @highest, and @highesta
modifiers, including some missing relocation types.

llvm-svn: 184550
2013-06-21 14:43:42 +00:00
Ulrich Weigand 72ddbd656e [PowerPC] Support @toc@h modifier
This adds the relocation type and other necessary infrastructure
to use the @toc@h modifier in the assembler.

llvm-svn: 184549
2013-06-21 14:43:10 +00:00
Ulrich Weigand e67c565dc1 [PowerPC] Support @h modifier
This adds necessary infrastructure to support the @h modifier.
Note that all required relocation types were already present
(and unused).

This patch provides support for using @h in the assembler;
it would also be possible to now use this feature in code
generated by the compiler, but this is not done yet.

llvm-svn: 184548
2013-06-21 14:42:49 +00:00
Ulrich Weigand d51c09f5d9 [PowerPC] Rename some more VK_PPC_ enums
This renames more VK_PPC_ enums, to make them more closely reflect
the @modifier string they represent.  This also prepares for adding
a bunch of new VK_PPC_ enums in upcoming patches.

For consistency, some MO_ flags related to VK_PPC_ enums are
likewise renamed.

No change in behaviour.

llvm-svn: 184547
2013-06-21 14:42:20 +00:00
Ulrich Weigand 68e2e1b32b [PowerPC] Clean up VK_PPC_TOC... names
This is another minor cleanup; to bring enum names in line
with the corresponding @modifier names, this renames:

  VK_PPC_TOC -> VK_PPC_TOCBASE
  VK_PPC_TOC_ENTRY -> VK_PPC_TOC16

No code change intended.

llvm-svn: 184491
2013-06-20 22:39:42 +00:00
Ulrich Weigand 9e90b3c814 [PowerPC] Minor cleanup in PPCELFObjectWriter::getRelocTypeInner
This just re-sorts the big switch statement in
PPCELFObjectWriter::getRelocTypeInner to follow
the (numerical) order of the reloc types, and
fixes a couple of whitespace issues.

llvm-svn: 184485
2013-06-20 22:04:40 +00:00
Ulrich Weigand 4727888f4e [PowerPC] Remove unused parameter
The isDarwin parameter to the llvm::LowerPPCMachineInstrToMCInst
routine is now no longer needed; remove it.

llvm-svn: 184441
2013-06-20 16:58:14 +00:00
Ulrich Weigand 22dff957d4 [PowerPC] Add missing build dependency
This (hopefully) fixes build failures resulting from r184436;
the PowerPC asm parser now depends on PowerPC target expressions.

llvm-svn: 184439
2013-06-20 16:38:00 +00:00
Ulrich Weigand 96e6578395 [PowerPC] Optimize @ha/@l constructs
This patch adds support for having the assembler optimize fixups
to constructs like "symbol@ha" or "symbol@l" if "symbol" can be
resolved at assembler time.

This optimization is already present in the PPCMCExpr.cpp code
for handling PPC_HA16/PPC_LO16 target expressions.  However,
those target expression were used only on Darwin targets.

This patch changes target expression code so that they are
usable also with the GNU assembler (using the @ha / @l syntax
instead of the ha16() / lo16() syntax), and changes the
MCInst lowering code to generate those target expressions
where appropriate.

It also changes the asm parser to generate HA16/LO16 target
expressions when parsing assembler source that uses the
@ha / @l modifiers.  The effect is that now the above-
mentioned optimization automatically becomes available
for those situations too.
 

llvm-svn: 184436
2013-06-20 16:23:52 +00:00
Ulrich Weigand 865a1efc13 [PowerPC] Support compare mnemonics with implied CR0
Just like for branch mnemonics (where support was recently added), the
assembler is supposed to support extended mnemonics for the compare
instructions where no condition register is specified explicitly
(and CR0 is assumed implicitly).

This patch adds support for those extended compare mnemonics.


Index: llvm-head/test/MC/PowerPC/ppc64-encoding-ext.s
===================================================================
--- llvm-head.orig/test/MC/PowerPC/ppc64-encoding-ext.s
+++ llvm-head/test/MC/PowerPC/ppc64-encoding-ext.s
@@ -449,21 +449,37 @@
 
 # CHECK: cmpdi 2, 3, 128                 # encoding: [0x2d,0x23,0x00,0x80]
          cmpdi 2, 3, 128
+# CHECK: cmpdi 0, 3, 128                 # encoding: [0x2c,0x23,0x00,0x80]
+         cmpdi 3, 128
 # CHECK: cmpd 2, 3, 4                    # encoding: [0x7d,0x23,0x20,0x00]
          cmpd 2, 3, 4
+# CHECK: cmpd 0, 3, 4                    # encoding: [0x7c,0x23,0x20,0x00]
+         cmpd 3, 4
 # CHECK: cmpldi 2, 3, 128                # encoding: [0x29,0x23,0x00,0x80]
          cmpldi 2, 3, 128
+# CHECK: cmpldi 0, 3, 128                # encoding: [0x28,0x23,0x00,0x80]
+         cmpldi 3, 128
 # CHECK: cmpld 2, 3, 4                   # encoding: [0x7d,0x23,0x20,0x40]
          cmpld 2, 3, 4
+# CHECK: cmpld 0, 3, 4                   # encoding: [0x7c,0x23,0x20,0x40]
+         cmpld 3, 4
 
 # CHECK: cmpwi 2, 3, 128                 # encoding: [0x2d,0x03,0x00,0x80]
          cmpwi 2, 3, 128
+# CHECK: cmpwi 0, 3, 128                 # encoding: [0x2c,0x03,0x00,0x80]
+         cmpwi 3, 128
 # CHECK: cmpw 2, 3, 4                    # encoding: [0x7d,0x03,0x20,0x00]
          cmpw 2, 3, 4
+# CHECK: cmpw 0, 3, 4                    # encoding: [0x7c,0x03,0x20,0x00]
+         cmpw 3, 4
 # CHECK: cmplwi 2, 3, 128                # encoding: [0x29,0x03,0x00,0x80]
          cmplwi 2, 3, 128
+# CHECK: cmplwi 0, 3, 128                # encoding: [0x28,0x03,0x00,0x80]
+         cmplwi 3, 128
 # CHECK: cmplw 2, 3, 4                   # encoding: [0x7d,0x03,0x20,0x40]
          cmplw 2, 3, 4
+# CHECK: cmplw 0, 3, 4                   # encoding: [0x7c,0x03,0x20,0x40]
+         cmplw 3, 4
 
 # FIXME: Trap mnemonics
 
Index: llvm-head/lib/Target/PowerPC/PPCInstrInfo.td
===================================================================
--- llvm-head.orig/lib/Target/PowerPC/PPCInstrInfo.td
+++ llvm-head/lib/Target/PowerPC/PPCInstrInfo.td
@@ -2201,3 +2201,12 @@ defm : BranchExtendedMnemonic<"ne", 68>;
 defm : BranchExtendedMnemonic<"nu", 100>;
 defm : BranchExtendedMnemonic<"ns", 100>;
 
+def : InstAlias<"cmpwi $rA, $imm", (CMPWI CR0, gprc:$rA, s16imm:$imm)>;
+def : InstAlias<"cmpw $rA, $rB", (CMPW CR0, gprc:$rA, gprc:$rB)>;
+def : InstAlias<"cmplwi $rA, $imm", (CMPLWI CR0, gprc:$rA, u16imm:$imm)>;
+def : InstAlias<"cmplw $rA, $rB", (CMPLW CR0, gprc:$rA, gprc:$rB)>;
+def : InstAlias<"cmpdi $rA, $imm", (CMPDI CR0, g8rc:$rA, s16imm:$imm)>;
+def : InstAlias<"cmpd $rA, $rB", (CMPD CR0, g8rc:$rA, g8rc:$rB)>;
+def : InstAlias<"cmpldi $rA, $imm", (CMPLDI CR0, g8rc:$rA, u16imm:$imm)>;
+def : InstAlias<"cmpld $rA, $rB", (CMPLD CR0, g8rc:$rA, g8rc:$rB)>;
+

llvm-svn: 184435
2013-06-20 16:15:12 +00:00
Bill Wendling afc1036f3e Access the TargetLoweringInfo from the TargetMachine object instead of caching it. The TLI may change between functions. No functionality change.
llvm-svn: 184349
2013-06-19 20:51:24 +00:00
Bill Wendling bc07a8900c Use pointers to the MCAsmInfo and MCRegInfo.
Someone may want to do something crazy, like replace these objects if they
change or something.

No functionality change intended.

llvm-svn: 184175
2013-06-18 07:20:20 +00:00
David Blaikie b735b4d6db DebugInfo: remove target-specific Frame Index handling for DBG_VALUE MachineInstrs
Frame index handling is now target-agnostic, so delete the target hooks
for creation & asm printing of target-specific addressing in DBG_VALUEs
and any related functions.

llvm-svn: 184067
2013-06-16 20:34:27 +00:00
David Blaikie c2467c4e9b Revert r183854 (PPC: Fix switch warnings from r183841)
Now that the PRED_BAD has been removed, this is failing the Clang
-Werror build due to -Wcovered-switch-default.

llvm-svn: 183863
2013-06-12 20:57:32 +00:00
Bill Schmidt 4fcb8c260e [PowerPC] Remove PRED_BAD from PPC::Predicate enumeration.
I'm taking David Blaikie's suggestion to use an
Optional<PPC::Predicate> return value instead.  That's the right
solution for this problem.  Thanks for pointing out that possibility!

llvm-svn: 183858
2013-06-12 20:22:24 +00:00
Bill Schmidt 31c60f740e [PowerPC] Fix switch warnings from r183841.
Introducing PRED_BAD caused some unexpected warnings that are now
suppressed.

llvm-svn: 183854
2013-06-12 19:20:32 +00:00
Bill Schmidt 230b451389 [PowerPC] Expose some calling convention functions in PPCISelLowering.h.
This is a preparatory patch for fast-isel support.  The instruction
selector will need to access some functions in PPCGenCallingConv.inc,
which in turn requires several helper functions to be defined.  These
are currently defined near the only use of PPCGenCallingConv.inc,
inside PPCISelLowering.cpp.  This patch moves the declaration of the
functions into the associated header file to provide the needed
visibility.

No functional change intended.

llvm-svn: 183844
2013-06-12 16:39:22 +00:00
Bill Schmidt 6207a4b7dd Add artificial PRED_BAD to PPC::Predicate enumeration.
Allows returning a PPC::Predicate from a function with a no-predicate
value possible.  Preparatory patch for fast-isel on PPC64 ELF.  No
behavioral change intended.

llvm-svn: 183841
2013-06-12 15:14:42 +00:00
Ulrich Weigand 32d725b80a [MC/DWARF] Support .debug_frame / .debug_line code alignment factors
I've been comparing the object file output of LLVM's integrated
assembler against the external assembler on PowerPC, and one
area where differences still remain is in DWARF sections.

In particular, the GNU assembler generates .debug_frame and
.debug_line sections using a code alignment factor of 4, since
all PowerPC instructions have size 4 and must be aligned to a
multiple of 4.  However, current MC code hard-codes a code
alignment factor of 1.

This patch changes this by adding a "minimum instruction alignment"
data element to MCAsmInfo and using this as code alignment factor.

This requires passing a MCContext into MCDwarfLineAddr::Encode
and MCDwarfLineAddr::EncodeAdvanceLoc.  Note that one caller,
MCDwarfLineAddr::Write, didn't actually have that information
available.  However, it turns out that this routine is in fact
never used in the whole code base, so the patch simply removes
it.  If it turns out to be needed again at a later time, it
could be re-added with an updated interface.

llvm-svn: 183834
2013-06-12 14:46:54 +00:00
Ulrich Weigand 4c44032aa1 [PowerPC] Support extended sc mnemonic
A plain "sc" without argument is supposed to be treated like "sc 0"
by the assembler.  This patch adds a corresponding alias.

Problem reported by Joerg Sonnenberger.

llvm-svn: 183687
2013-06-10 17:19:43 +00:00
Ulrich Weigand aa4a2d71aa [PowerPC] Support branch mnemonics with implied CR0
The extended branch mnemonics are supposed to use an implied CR0
if there is no explicit condition register specified.  This patch
adds extra variants of the mnemonics to this effect.
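
For example (branch target and CR field chosen arbitrarily):

  beq  target       # CR0 implied, same as beq 0, target
  bne  1, target    # an explicit condition register is still accepted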

Problem reported by Joerg Sonnenberger.

llvm-svn: 183686
2013-06-10 17:19:15 +00:00
Ulrich Weigand 397406259e [PowerPC] Use multiclass to generate extended branch mnemonics
This patch removes some redundancy by generating the extended branch
mnemonics via a multiclass.

No change in behaviour expected.

llvm-svn: 183685
2013-06-10 17:18:29 +00:00
Hal Finkel fa5f6f7440 Disallow i64 div/rem in PPC32 counter loops
On PPC32, [su]div,rem on i64 types are transformed into runtime library
function calls. As a result, they are not allowed in counter-based loops (the
counter-loops verification pass caught this error; this change fixes PR16169).

llvm-svn: 183581
2013-06-07 22:16:19 +00:00
Benjamin Kramer f0ec199448 Fold variable that's only used in assert into the assert.
Avoids unused variable warnings in Release builds.

llvm-svn: 183512
2013-06-07 11:23:35 +00:00
Bill Wendling 5e7656bf0c Don't cache the instruction and register info from the TargetMachine, because
the internals of TargetMachine could change.

No functionality change intended.

llvm-svn: 183494
2013-06-07 07:55:53 +00:00
Ahmed Bougacha b1a4d9da3b Make SubRegIndex size mandatory, following r183020.
This also makes TableGen able to compute sizes/offsets of synthesized
indices representing tuples.

llvm-svn: 183061
2013-05-31 23:45:26 +00:00
Andrew Trick ad6d08ac6f Order CALLSEQ_START and CALLSEQ_END nodes.
Fixes PR16146: gdb.base__call-ar-st.exp fails after
pre-RA-sched=source fixes.

Patch by Xiaoyi Guo!

This also fixes an unsupported dbg.value test case. Codegen was
previously incorrect but the test was passing by luck.

llvm-svn: 182885
2013-05-29 22:03:55 +00:00
Hal Finkel 8ebfe6c263 PPC: Add a isConsecutiveLS utility function
isConsecutiveLS is a slightly more general form of
SelectionDAG::isConsecutiveLoad. Aside from also handling stores, it also does
not assume equality of the chain operands is necessary. In the case of the PPC
backend, this chain condition is checked in a more general way by the
surrounding code.

Mostly, this is part of the refactoring in preparation for supporting optimized
unaligned stores.

llvm-svn: 182723
2013-05-27 02:06:39 +00:00
Hal Finkel 7d8a691b5d Prefer to duplicate PPC Altivec loads when expanding unaligned loads
When expanding unaligned Altivec loads, we use the decremented offset trick to
prevent page faults. Unfortunately, if we have a sequence of consecutive
unaligned loads, this leads to suboptimal code generation because the 'extra'
load from the first unaligned load can be combined with the base load from the
second (but only if the decremented offset trick is not used for the first).
Search up and down the chain, through loads and token factors, looking for
consecutive loads, and if one is found, don't use the offset reduction trick.
These duplicate loads are later combined to yield the desired sequence (in the
future, we might want a more-powerful chain search, but that will require some
changes to allow the combiner routines to access the AA object).

This should complete the initial implementation of the optimized unaligned
Altivec load expansion. There is some refactoring that should be done, but
that will happen when the unaligned store expansion is added.

llvm-svn: 182719
2013-05-26 18:08:30 +00:00
Hal Finkel bc2ee4c4e6 PPC: Combine duplicate (offset) lvsl Altivec intrinsics
The lvsl permutation control instruction is a function only of the alignment of
the pointer operand (relative to the 16-byte natural alignment of Altivec
vectors). As a result, multiple lvsl intrinsics where the operands differ by a
multiple of 16 can be combined.

llvm-svn: 182708
2013-05-25 04:05:05 +00:00
Andrew Trick ef9de2a739 Track IR ordering of SelectionDAG nodes 2/4.
Change SelectionDAG::getXXXNode() interfaces as well as call sites of
these functions to pass in SDLoc instead of DebugLoc.

llvm-svn: 182703
2013-05-25 02:42:55 +00:00
Hal Finkel cf2e908014 PPC: Initial support for permutation-based unaligned Altivec loads
Altivec only directly supports aligned loads, but the loads have a strange
property: If given an unaligned address, they truncate the address to the next
lower aligned address, and load from there.  This property, along with an extra
load and some special-purpose permutation-control instructions that generate
the appropriate permutations from the original unaligned address, allow
efficient lowering of aligned loads. This code uses the trick explained in the
Apple Velocity Engine optimization overview document to prevent the needed
extra load from possibly causing a page fault if the original address happens
to be aligned.

As noted in the FIXMEs, there are several additional optimizations that can be
performed to reduce the cost of these loads even more. These will be
implemented in future commits.

llvm-svn: 182691
2013-05-24 23:00:14 +00:00
Michael J. Spencer df1ecbd734 Replace Count{Leading,Trailing}Zeros_{32,64} with count{Leading,Trailing}Zeros.
llvm-svn: 182680
2013-05-24 22:23:49 +00:00
Ulrich Weigand 9948546923 [PowerPC] Remove symbolLo/symbolHi instruction operand types
Now that there is no longer any distinction between symbolLo
and symbolHi operands in either printing, encoding, or parsing,
the operand types can be removed in favor of simply using
s16imm.

This completes the patch series to decouple lo/hi operand part
processing from the particular instruction whose operand it is.

No change in code generation expected from this patch.

llvm-svn: 182618
2013-05-23 22:48:06 +00:00
Ulrich Weigand 41789de165 [PowerPC] Clean up generation of ha16() / lo16() markers
When targeting the Darwin assembler, we need to generate markers ha16() and
lo16() to designate the high and low parts of a (symbolic) immediate.  This
is necessary not just for plain symbols, but also for certain symbolic
expression, typically along the lines of ha16(A - B).  The latter doesn't
work when simply using VariantKind flags on the symbol reference.
This is why the current back-end uses hacks (explicitly called out as such
via multiple FIXMEs) in the symbolLo/symbolHi print methods.

This patch uses target-defined MCExpr codes to represent the Darwin
ha16/lo16 constructs, following along the lines of the equivalent solution
used by the ARM back end to handle their :upper16: / :lower16: markers.
This allows us to get rid of special handling both in the symbolLo/symbolHi
print method and in the common code MCExpr::print routine.  Instead, the
ha16 / lo16 markers are printed simply in a custom print routine for the
target MCExpr types.  (As a result, the symbolLo/symbolHi print methods
can now be replaced by a single printS16ImmOperand routine that also handles
symbolic operands.)

The patch also provides a EvaluateAsRelocatableImpl routine to handle
ha16/lo16 constructs.  This is not actually used at the moment by any
in-tree code, but is provided as it makes merging into David Fang's
out-of-tree Mach-O object writer simpler.

Since there is no longer any need to treat VK_PPC_GAS_HA16 and
VK_PPC_DARWIN_HA16 differently, they are merged into a single
VK_PPC_ADDR16_HA (and likewise for the _LO16 types).

llvm-svn: 182616
2013-05-23 22:26:41 +00:00
Bill Schmidt f88571e027 Change some PowerPC PatLeaf definitions to ImmLeaf for fast-isel.
Using PatLeaf rather than ImmLeaf when defining immediate predicates
prevents simple patterns using those predicates from being recognized
for fast instruction selection.  This patch replaces the immSExt16
PatLeaf predicate with two ImmLeaf predicates, imm32SExt16 and
imm64SExt16, allowing a few more patterns to be recognized (ADDI,
ADDIC, MULLI, ADDI8, and ADDIC8).  Using the new predicates does not
help for LI, LI8, SUBFIC, and SUBFIC8 because these are rejected for
other reasons, but I see no reason to retain the PatLeaf predicate.

No functional change intended, and thus no test cases yet.  This is
preliminary work for enabling fast-isel support for PowerPC.  When
that support is ready, we'll be able to test this function.

llvm-svn: 182510
2013-05-22 20:09:24 +00:00
Hal Finkel c5211291f1 Fix PPC branch selection for counter-based branches
Although I had added some support for the BDZ/BDNZ branches into the selector
(in r158204), I had not correctly adjusted the condition at the top of the
loop. As a result, these branches were still essentially unsupported.

This fixes PR16086. Unfortunately, any test case would be very large (because
it would need to force the loop backedge to exceed the range of the 16-bit
immediate).

llvm-svn: 182385
2013-05-21 14:21:09 +00:00
Hal Finkel a969df84ab Rename LoopSimplify.h to LoopUtils.h
As discussed, LoopUtils.h is a better name.

llvm-svn: 182314
2013-05-20 20:46:30 +00:00
Hal Finkel e6d7c285b3 Remove copied preheader insertion logic from PPCCTRLoops
Now that the preheader insertion logic in LoopSimplify is externally exposed,
use it, and remove the copy-and-pasted version.

No functionality change intended.

llvm-svn: 182300
2013-05-20 16:47:10 +00:00
Hal Finkel 0859ef29d5 Rename PPC MTCTRse to MTCTRloop
As the pairing of this instruction form with the bdnz/bdz branches is now
enforced by the verification pass, make it clear from the name that these
are used only for counter-based loops.

No functionality change intended.

llvm-svn: 182296
2013-05-20 16:08:37 +00:00
Hal Finkel 8ca3884147 Add a PPCCTRLoops verification pass
When asserts are enabled, this adds a verification pass for PPC counter-loop
formation. Unfortunately, without sacrificing code quality, there is no better
way of forming counter-based loops except at the (late) IR level. This means
that we need to recognize, at the IR level, anything which might turn into a
function call (or indirect branch). Because this is currently a finite set of
things, and because SelectionDAG lowering is basic-block local, this can be
done. Nevertheless, it is fragile, and failure results in a miscompile. This
verification pass checks that all (reachable) counter-based branches are
dominated by a loop mtctr instruction, and that no instructions in between
clobber the counter register. If these conditions are not satisfied, then an
ICE will be triggered.

In short, this is to help us sleep better at night.

llvm-svn: 182295
2013-05-20 16:08:17 +00:00
Hal Finkel 2f474f0e8a Check InlineAsm clobbers in PPCCTRLoops
We don't need to reject all inline asm as using the counter register (most does
not). Only those that explicitly clobber the counter register need to prevent
the transformation.

llvm-svn: 182191
2013-05-18 09:20:39 +00:00
Matt Arsenault 75865923c9 Add LLVMContext argument to getSetCCResultType
llvm-svn: 182180
2013-05-18 00:21:46 +00:00
Ulrich Weigand 2dbe06a987 [PowerPC] Fix hi/lo encoding in old-style code emitter
This patch implements the equivalent change to r182091/r182092
in the old-style code emitter.  Instead of having two separate
16-bit immediate encoding routines depending on the instruction,
this patch introduces a single encoder that checks the machine
operand flags to decide whether the low or high half of a
symbol address is required.

Since now both encoders make no further distinction between
"symbolLo" and "symbolHi", the .td operand can now use a
single getS16ImmEncoding method.

Tested by running the old-style JIT tests on 32-bit Linux.

llvm-svn: 182097
2013-05-17 14:14:12 +00:00
Ulrich Weigand 6e23ac606e [PowerPC] Merge/rename PPC fixup types
Now that fixup_ppc_ha16 and fixup_ppc_lo16 are being treated exactly
the same everywhere, it no longer makes sense to have two fixup types.

This patch merges them both into a single type fixup_ppc_half16,
and renames fixup_ppc_lo16_ds to fixup_ppc_half16ds for consistency.
(The half16 and half16ds names are taken from the description of
relocation types in the PowerPC ABI.)

No change in code generation expected.

llvm-svn: 182092
2013-05-17 12:37:21 +00:00
Ulrich Weigand 994f49ed79 [PowerPC] Fix processing of ha16/lo16 fixups
The current PowerPC MC back end distinguishes between fixup_ppc_ha16
and fixup_ppc_lo16, which are determined by the instruction the fixup
applies to, and uses this distinction to decide whether a fixup ought
to resolve to the high or the low part of a symbol address.

This isn't quite correct, however.  It is valid (if unusual) assembler
code to use, e.g.
  li 1, symbol@ha
or
  lis 1, symbol@l
Whether the high or the low part of the address is used depends solely
on the @ suffix, not on the instruction.

In addition, both
  li 1, symbol
and
  lis 1, symbol
are valid, assuming the symbol address fits into 16 bits; again, both
will then refer to the actual symbol value (so li will load the value
itself, while lis will load the value shifted by 16).


To fix this, two places need to be adapted.  If the fixup cannot be
resolved at assembler time, a relocation needs to be emitted via
PPCELFObjectWriter::getRelocType.  This routine already looks at
the VK_ type to determine the relocation.  The only problem is that it
will reject any _LO modifier in a ha16 fixup and vice versa.  This
is simply incorrect; any of those modifiers ought to be accepted
for either fixup type.

If the fixup *can* be resolved at assembler time, adjustFixupValue
currently selects the high bits of the symbol value if the fixup
type is ha16.  Again, this is incorrect; see the above example
  lis 1, symbol

Now, in theory we'd have to respect a VK_ modifier here.  However,
in fact common code never even attempts to resolve symbol references
using any nontrivial VK_ modifier at assembler time; it will always
fall back to emitting a reloc and letting the linker handle it.

If this ever changes, presumably there'd have to be a target callback
to resolve VK_ modifiers.  We'd then have to handle @ha etc. there.

llvm-svn: 182091
2013-05-17 12:36:29 +00:00
Rafael Espindola b08d2c2db0 Remove addFrameMove.
Now that we have good testing, remove addFrameMove and create cfi
instructions directly.

llvm-svn: 182052
2013-05-16 21:02:15 +00:00
Hal Finkel 5f587c59a5 Create an new preheader in PPCCTRLoops to avoid counter register clobbers
Some IR-level instructions (such as FP <-> i64 conversions) are not chained
w.r.t. the mtctr intrinsic and yet may become function calls that clobber the
counter register. At the selection-DAG level, these might be reordered with the
mtctr intrinsic causing miscompiles. To avoid this situation, if an existing
preheader has instructions that might use the counter register, create a new
preheader for the mtctr intrinsic. This extra block will be remerged with the
old preheader at the MI level, but will prevent unwanted reordering at the
selection-DAG level.

llvm-svn: 182045
2013-05-16 19:58:38 +00:00
Ulrich Weigand 9d980cbdb9 [PowerPC] Use true offset value in "memrix" machine operands
This is the second part of the change to always return "true"
offset values from getPreIndexedAddressParts, tackling the
case of "memrix" type operands.

This is about instructions like LD/STD that only have a 14-bit
field to encode immediate offsets, which are implicitly extended
by two zero bits by the machine, so that in effect we can access
16-bit offsets as long as they are a multiple of 4.

The PowerPC back end currently handles such instructions by
carrying the 14-bit value (as it will get encoded into the
actual machine instructions) in the machine operand fields
for such instructions.  This means that those values are
in fact not the true offset, but rather the offset divided
by 4 (and then truncated to an unsigned 14-bit value).

Like in the case fixed in r182012, this makes common code
operations on such offset values not work as expected.
Furthermore, there doesn't really appear to be any strong
reason why we should encode machine operands this way.

This patch therefore changes the encoding of "memrix" type
machine operands to simply contain the "true" offset value
as a signed immediate value, while enforcing the rules that
it must fit in a 16-bit signed value and must also be a
multiple of 4.

This change must be made simultaneously in all places that
access machine operands of this type.  However, just about
all those changes make the code simpler; in many cases we
can now just share the same code for memri and memrix
operands.

llvm-svn: 182032
2013-05-16 17:58:02 +00:00
Hal Finkel 47db66d43f PPC32 cannot form counter loops around i64 FP conversions
On PPC32, i64 FP conversions are implemented using runtime calls (which clobber
the counter register). These must be excluded.

llvm-svn: 182023
2013-05-16 16:52:41 +00:00
Ulrich Weigand 7aa76b6a07 [PowerPC] Report true displacement value from getPreIndexedAddressParts
DAGCombiner::CombineToPreIndexedLoadStore calls a target routine to
decompose a memory address into a base/offset pair.  It expects the
offset (if constant) to be the true displacement value in order to
perform optional additional optimizations; in particular, to convert
other uses of the original pointer into uses of the new base pointer
after pre-increment.

The PowerPC implementation of getPreIndexedAddressParts, however,
simply calls SelectAddressRegImm, which returns a TargetConstant.
This value is appropriate for encoding into the instruction, but
it is not always usable as true displacement value:

- Its type is always MVT::i32, even on 64-bit, where addresses
  ought to be i64 ... this causes the optimization to simply
  always fail on 64-bit due to this line in DAGCombiner:

      // FIXME: In some cases, we can be smarter about this.
      if (Op1.getValueType() != Offset.getValueType()) {

- Its value is truncated to an unsigned 16-bit value if negative.
  This causes the above optimization to generate wrong code.

This patch fixes both problems by simply returning the true
displacement value (in its original type).  This doesn't
affect any other user of the displacement.

llvm-svn: 182012
2013-05-16 14:53:05 +00:00
Rafael Espindola 6e8c0d94f8 Removed dead code.
llvm-svn: 181975
2013-05-16 03:34:58 +00:00
Hal Finkel 80267a0a37 undef setjmp in PPCCTRLoops
Trying to unbreak the VS build by copying some undef code from
Utils/LowerInvoke.cpp.

llvm-svn: 181938
2013-05-15 22:20:24 +00:00
Hal Finkel 25c1992bc7 Implement PPC counter loops as a late IR-level pass
The old PPCCTRLoops pass, like the Hexagon pass version from which it was
derived, could only handle some simple loops in canonical form. We cannot
directly adapt the new Hexagon hardware loops pass, however, because the
Hexagon pass contains a fundamental assumption that non-constant-trip-count
loops will contain a guard, and this is not always true (the result being that
incorrect negative counts can be generated). With this commit, we replace the
pass with a late IR-level pass which makes use of SE to calculate the
backedge-taken counts and safely generate the loop-count expressions (including
any necessary max() parts). This IR level pass inserts custom intrinsics that
are lowered into the desired decrement-and-branch instructions.

The most fragile part of this new implementation is that interfering uses of
the counter register must be detected on the IR level (and, on PPC, this also
includes any indirect branches in addition to function calls). Also, to make
all of this work, we need a variant of the mtctr instruction that is marked
as having side effects. Without this, machine-code level CSE, DCE, etc.
illegally transform the resulting code. Hopefully, this can be improved
in the future.

This new pass is smaller than the original (and much smaller than the new
Hexagon hardware loops pass), and can handle many additional cases correctly.
In addition, the preheader-creation code has been copied from LoopSimplify, and
after we decide on where it belongs, this code will be refactored so that it
can be explicitly shared (making this implementation even smaller).

The new test-case files ctrloop-{le,lt,ne}.ll have been adapted from tests for
the new Hexagon pass. There are a few classes of loops that this pass does not
transform (noted by FIXMEs in the files), but these deficiencies can be
addressed within the SE infrastructure (thus helping many other passes as well).

llvm-svn: 181927
2013-05-15 21:37:41 +00:00
Rafael Espindola 0f2a6fe613 Cleanup relocation sorting for ELF.
We want the order to be deterministic on all platforms. NAKAMURA Takumi
fixed that in r181864. This patch is just two small cleanups:

* Move the function to the cpp file. It is only passed to array_pod_sort.
* Remove the ppc implementation which is now redundant

llvm-svn: 181910
2013-05-15 18:22:01 +00:00
NAKAMURA Takumi dc9f013a5d PPCISelLowering.h: Escape \@ in comments. [-Wdocumentation]
llvm-svn: 181907
2013-05-15 18:01:35 +00:00
NAKAMURA Takumi dcc66456cc Whitespace.
llvm-svn: 181906
2013-05-15 18:01:28 +00:00
Ulrich Weigand 2fb140ef31 [PowerPC] Remove need for adjustFixupOffset hack
Now that applyFixup understands differently-sized fixups, we can define
fixup_ppc_lo16/fixup_ppc_lo16_ds/fixup_ppc_ha16 to properly be 2-byte
fixups, applied at an offset of 2 relative to the start of the 
instruction text.

This has the benefit that if we actually need to generate a real
relocation record, its address will come out correctly automatically,
without having to fiddle with the offset in adjustFixupOffset.

Tested on both 64-bit and 32-bit PowerPC, using external and
integrated assembler.

llvm-svn: 181894
2013-05-15 15:07:06 +00:00
Ulrich Weigand 56f5b28d2e [PowerPC] Correctly handle fixups of other than 4 byte size
The PPCAsmBackend::applyFixup routine handles the case where a
fixup can be resolved within the same object file.  However,
this routine is currently hard-coded to assume the size of
any fixup is always exactly 4 bytes.

This is sort-of correct for fixups on instruction text, though it
only works because several fixups that really ought to be 2-byte
are presented as 4-byte fixups instead (requiring another hack in
PPCELFObjectWriter::adjustFixupOffset to clean it up).

However, this assumption breaks down completely for fixups
on data, which legitimately can be of any size (1, 2, 4, or 8).

This patch makes applyFixup aware of fixups of varying sizes,
introducing a new helper routine getFixupKindNumBytes (along
the lines of what the ARM back end does).  Note that in order
to handle fixups of size 8, we also need to fix the return type
of adjustFixupValue to uint64_t to avoid truncation.
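
For illustration, a rough sketch of such a size helper (fixup-kind names beyond those mentioned in this log are assumptions):

  static unsigned getFixupKindNumBytes(unsigned Kind) {
    switch (Kind) {
    default: llvm_unreachable("Unknown fixup kind!");
    case FK_Data_1:
      return 1;
    case FK_Data_2:
    case PPC::fixup_ppc_ha16:
    case PPC::fixup_ppc_lo16:
    case PPC::fixup_ppc_lo16_ds:
      return 2;
    case FK_Data_4:
    case PPC::fixup_ppc_brcond14:
    case PPC::fixup_ppc_br24:
      return 4;
    case FK_Data_8:
      return 8;
    }
  }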

Tested on both 64-bit and 32-bit PowerPC, using external and
integrated assembler.

llvm-svn: 181891
2013-05-15 15:01:46 +00:00
Bill Schmidt a87a7e2620 Implement the PowerPC system call (sc) instruction.
Instruction added at request of Roman Divacky.  Tested via asm-parser.

llvm-svn: 181821
2013-05-14 19:35:45 +00:00
Bill Schmidt ef3d1a24ed PPC32: Fix stack collision between FP and CR save areas.
The changes to CR spill handling missed a case for 32-bit PowerPC.
The code in PPCFrameLowering::processFunctionBeforeFrameFinalized()
checks whether CR spill has occurred using a flag in the function
info.  This flag is only set by storeRegToStackSlot and
loadRegFromStackSlot.  spillCalleeSavedRegisters does not call
storeRegToStackSlot, but instead produces MI directly.  Thus we don't
see that the CR is spilled when assigning frame offsets, and the CR spill
ends up colliding with some other location (generally the FP slot).

This patch sets the flag in spillCalleeSavedRegisters for PPC32 so
that the CR spill is properly detected and gets its own slot in the
stack frame.

llvm-svn: 181800
2013-05-14 16:08:32 +00:00
Bill Schmidt 6cda22a3b4 Fix goofy commentary in PPCTargetObjectFile.cpp.
llvm-svn: 181725
2013-05-13 19:40:36 +00:00
Bill Schmidt 22d40dcfe9 PPC64: Constant initializers with dynamic relocations go in .data.rel.ro.
This fixes warning messages observed in the oggenc application test in
projects/test-suite.  Special handling is needed for the 64-bit
PowerPC SVR4 ABI when a constant is initialized with a pointer to a
function in a shared library.  Because a function address is
implemented as the address of a function descriptor, the use of copy
relocations can lead to problems with initialization.  GNU ld
therefore replaces copy relocations with dynamic relocations to be
resolved by the dynamic linker.  This means the constant cannot reside
in the read-only data section, but instead belongs in .data.rel.ro,
which is designed for constants containing dynamic relocations.

The implementation creates a class PPC64LinuxTargetObjectFile
inheriting from TargetLoweringObjectFileELF, which behaves like its
parent except to place constants of this sort into .data.rel.ro.
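
A hedged sketch of the relevant decision (API spellings are approximate for this vintage of the tree):

  if (Kind.isReadOnly()) {
    const GlobalVariable *GVar = dyn_cast<GlobalVariable>(GV);
    if (GVar && GVar->isConstant() &&
        GVar->getInitializer()->getRelocationInfo() ==
            Constant::GlobalRelocations)
      Kind = SectionKind::getReadOnlyWithRel();   // emitted as .data.rel.ro
  }
  return TargetLoweringObjectFileELF::SelectSectionForGlobal(GV, Kind,
                                                             Mang, TM);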

The test case is reduced from the oggenc application.

llvm-svn: 181723
2013-05-13 19:34:37 +00:00
Rafael Espindola 227144c23c Remove the MachineMove class.
It was just a less powerful and more confusing version of
MCCFIInstruction. A side effect is that, since MCCFIInstruction uses
dwarf register numbers, calls to getDwarfRegNum are pushed out, which
should allow further simplifications.

I left the MachineModuleInfo::addFrameMove interface unchanged since
this patch was already fairly big.

llvm-svn: 181680
2013-05-13 01:16:13 +00:00
Rafael Espindola 1b09836bc3 Change getFrameMoves to return a const reference.
To add a frame move there is now a dedicated addFrameMove which also takes
care of constructing the move itself.

llvm-svn: 181657
2013-05-11 02:38:11 +00:00
Rafael Espindola 140a837acd Remove unused argument.
llvm-svn: 181618
2013-05-10 18:16:59 +00:00
Roman Divacky 2d26e8e56b Remove unused isLegalAddressImmediate() method.
llvm-svn: 181452
2013-05-08 17:51:39 +00:00
Ulrich Weigand e462053f64 [PowerPC] Fix regression in generating @ha/@l relocs
The patch I committed as revision 167864 introduced a regression that
causes LLVM to no longer generate appropriate relocs for @ha/@l symbol
references (but fail an assertion instead).

This is fixed here by re-enabling support for the VK_PPC_GAS_HA16/
VK_PPC_GAS_LO16 variant kinds (and their Darwin variants) in
PPCELFObjectWriter.cpp.

Tested by running projects/test-suite in -m32 mode with the integrated
assembler forced on.  A standalone test case will be committed shortly
as well.

llvm-svn: 181450
2013-05-08 17:50:07 +00:00
Bill Schmidt 38b6cb51bc Fix handling of anonymous aggregate parameters for powerpc*-apple-darwin8.
This fixes bug 15821 similarly to the powerpc64-linux fix for bug 14779.

Patch by David Fang.

llvm-svn: 181449
2013-05-08 17:22:33 +00:00
Hal Finkel 08e53ee551 PPCInstrInfo::optimizeCompareInstr should not optimize FP compares
The floating-point record forms on PPC don't set the condition register bits
based on a comparison with zero (like the integer record forms do), but rather
based on the exception status bits.

llvm-svn: 181423
2013-05-08 12:16:14 +00:00
Hal Finkel c363245ff2 Cleanup PPCInstrInfo::optimizeCompareInstr
Implement suggestions by Bill Schmidt in post-commit review. No functionality
change intended.

llvm-svn: 181338
2013-05-07 17:49:55 +00:00
Ulrich Weigand 509c240ce5 [PowerPC] Fix memory corruption in AsmParser
As pointed out by Evgeniy Stepanov, assigning a std::string temporary
to a StringRef is not a good idea.  Rework MatchRegisterName to avoid
using the .lower routine.
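
For illustration only, the general hazard (hypothetical code, not the parser itself):

  // StringRef::lower() returns a std::string by value; binding a StringRef
  // to that temporary leaves the reference dangling once the statement ends.
  StringRef Bad = Name.lower();           // dangling immediately afterwards
  // Safe: keep the owning std::string alive as long as the view is needed.
  std::string Lowered = Name.lower();
  StringRef Good(Lowered);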

llvm-svn: 181192
2013-05-06 11:16:57 +00:00
Ulrich Weigand b9d5d073d6 [PowerPC] Avoid using '$' in generated assembler code
PowerPC assemblers are supposed to support a stand-alone '$' symbol
as an alternative of '.' to refer to the current PC.  This does not
work in the LLVM assembler parser yet.

To avoid bootstrap failures when using the LLVM assembler as system
assembler, this patch modifies the assembler source code generated
by LLVM to avoid using '$' (and simply use '.' instead).

llvm-svn: 181054
2013-05-03 19:53:04 +00:00
Ulrich Weigand 300b6875fb [PowerPC] Add some Book II instructions to AsmParser
This patch adds a couple of Book II instructions (isync, icbi) to the
PowerPC assembler parser.  These are needed when bootstrapping clang
with the integrated assembler forced on, because they are used in
inline asm statements in the code base.

The test case adds the full list of Book II storage control instructions,
including associated extended mnemonics.  Again, those that are not yet
supported are marked as FIXME.

llvm-svn: 181052
2013-05-03 19:51:09 +00:00
Ulrich Weigand d839490f16 [PowerPC] Support extended mnemonics in AsmParser
This patch adds infrastructure to support extended mnemonics in the
PowerPC assembler parser.  It adds support specifically for those
extended mnemonics that LLVM will itself generate.

The test case lists *all* extended mnemonics according to the
PowerPC ISA v2.06 Book I, but marks those not yet supported
as FIXME.

llvm-svn: 181051
2013-05-03 19:50:27 +00:00
Ulrich Weigand 640192daa8 [PowerPC] Add assembler parser
This adds assembler parser support to the PowerPC back end.

The parser will run for any powerpc-*-* and powerpc64-*-* triples,
but was tested only on 64-bit Linux.  The supported syntax is
intended to be compatible with the GNU assembler.

The parser does not yet support all PowerPC instructions, but
it does support anything that is generated by LLVM itself.
There is no support for testing restricted instruction sets yet,
i.e. the parser will always accept any instructions it knows,
no matter what feature flags are given.

Instruction operands will be checked for validity and errors
generated.  (Error handling in general could still be improved.)

The patch adds a number of test cases to verify instruction
and operand encodings.  The tests currently cover all instructions
from the following PowerPC ISA v2.06 Book I facilities:
Branch, Fixed-point, Floating-Point, and Vector. 
Note that a number of these instructions are not yet supported
by the back end; they are marked with FIXME.

A number of follow-on check-ins will add extra features.  When
they are all included, LLVM passes all tests (including bootstrap)
when using clang -cc1as as the system assembler.

llvm-svn: 181050
2013-05-03 19:49:39 +00:00
Rafael Espindola 1357ab74e5 Make all darwin ppc stubs local.
This fixes pr15763.
Patch by David Fang.

llvm-svn: 180657
2013-04-27 00:43:16 +00:00
Ulrich Weigand 136ac22eaa PowerPC: Use RegisterOperand instead of RegisterClass operands
In the default PowerPC assembler syntax, registers are specified simply
by number, so they cannot be distinguished from immediate values (without
looking at the opcode).  This means that the default operand matching logic
for the asm parser does not work, and we need to specify custom matchers.
Since those can only be specified with RegisterOperand classes and not
directly on the RegisterClass, all instructions patterns used by the asm
parser need to use a RegisterOperand (instead of a RegisterClass) for
all their register operands.

This patch adds one RegisterOperand for each RegisterClass, using the
same name as the class, just in lower case, and updates all instruction
patterns to use RegisterOperand instead of RegisterClass operands.

llvm-svn: 180611
2013-04-26 16:53:15 +00:00
Ulrich Weigand 551b085d55 PowerPC: Fix encoding of vsubcuw and vsum4sbs instructions
When testing the asm parser, I noticed wrong encodings for the
above instructions (wrong sub-opcodes).

Tests will be added together with the asm parser.

llvm-svn: 180608
2013-04-26 15:39:57 +00:00
Ulrich Weigand 48b949b650 PowerPC: Fix encoding of stfsu and stfdu instructions
When testing the asm parser, I noticed wrong encodings for the
above instructions (wrong sub-opcodes).  Note that apparently
the compiler currently never generates pre-inc instructions
for floating point types for some reason ...

Tests will be added together with the asm parser.

llvm-svn: 180607
2013-04-26 15:39:40 +00:00
Ulrich Weigand fa451ba1b9 PowerPC: Fix encoding of rldimi and rldcl instructions
When testing the asm parser, I noticed wrong encodings for the
above instructions (wrong operand name in rldimi, wrong form
and sub-opcode for rldcl).

Tests will be added together with the asm parser.

llvm-svn: 180606
2013-04-26 15:39:12 +00:00
Ulrich Weigand 72a7dc0d7d PowerPC: Support PC-relative fixup_ppc_brcond14.
When testing the asm parser, I ran into an error when using a conditional
branch to an external symbol (this doesn't occur in compiler-generated
code) due to missing support in PPCELFObjectWriter::getRelocTypeInner.

llvm-svn: 180605
2013-04-26 15:38:30 +00:00
Bill Schmidt a76bf5a6d0 Change commentary for PowerPC Boolean vector contents.
No functional change intended.

llvm-svn: 180131
2013-04-23 18:49:44 +00:00
Owen Anderson 2d4cca35c3 DAGCombine should not aggressively fold SEXT(VSETCC(...)) into a wider VSETCC without first checking the target's vector boolean contents.
This exposed an issue with PowerPC AltiVec where it appears it was setting the wrong vector boolean contents.  The included change
fixes the PowerPC tests, and was OK'd by Hal.

llvm-svn: 180129
2013-04-23 18:09:28 +00:00
Tim Northover a2b533906a Remove unused MEMBARRIER DAG node; it's been replaced by ATOMIC_FENCE.
llvm-svn: 179939
2013-04-20 12:32:17 +00:00
Hal Finkel 0f64e21bb9 Move PPC getSwappedPredicate for reuse
The getSwappedPredicate function can be used in other places (such as in
improvements to the PPCCTRLoops pass). Instead of trapping it as a static
function in PPCInstrInfo, move it into PPCPredicates with other
predicate-related things.

No functionality change intended.

llvm-svn: 179926
2013-04-20 05:16:26 +00:00
Michael Liao b53d8963ce ArrayRefize getMachineNode(). No functionality change.
llvm-svn: 179901
2013-04-19 22:22:57 +00:00
Hal Finkel e632239d7b Fix PPC optimizeCompareInstr swapped-sub argument handling
When matching a compare with a subtract where the arguments of the compare are
swapped w.r.t. the arguments of the subtract, we need to negate the predicates
(or CR bit indices) of the users. This, however, is not the same as inverting
the predicate (negating LT -> GT, but inverting LT -> GE, for example). The ARM
backend seems to do this correctly, but when I adapted the code for the PPC
backend, I introduced an error in this logic.
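
To illustrate the distinction with a hypothetical predicate type (not the backend's actual enums):

  enum Pred { LT, LE, GT, GE, EQ, NE };
  // Swapping the comparison operands mirrors the predicate: a < b  ==  b > a.
  Pred swapOperands(Pred P) {
    switch (P) {
    case LT: return GT; case GT: return LT;
    case LE: return GE; case GE: return LE;
    default: return P;                    // EQ and NE are symmetric
    }
  }
  // Inverting is the logical complement: !(a < b)  ==  a >= b.
  Pred invert(Pred P) {
    switch (P) {
    case LT: return GE; case GE: return LT;
    case GT: return LE; case LE: return GT;
    case EQ: return NE; default: return EQ;
    }
  }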

Comparison optimization is now enabled again by default.

llvm-svn: 179899
2013-04-19 22:08:38 +00:00
Hal Finkel b12da6be75 Disable PPC comparison optimization by default
This seems to cause a stage-2 LLVM compile failure (by crashing TableGen); so
I'm disabling this for now.

llvm-svn: 179807
2013-04-18 22:54:25 +00:00
Hal Finkel 82656cb200 Implement optimizeCompareInstr for PPC
Many PPC instructions have a so-called 'record form' which stores to a specific
condition register the result of comparing the result of the instruction with
zero (always as a signed comparison). For integer operations on PPC64, this is
always a 64-bit comparison.

This implementation is derived from the implementation in the ARM backend;
there are some differences because PPC condition registers are allocatable
virtual registers (although the record forms always use a specific one), and we
look for a matching subtraction instruction after the compare (but before the
first use) in addition to before it.

llvm-svn: 179802
2013-04-18 22:15:08 +00:00
Peter Collingbourne 2f495b93ee Add support for subsections to the ELF assembler. Fixes PR8717.
Differential Revision: http://llvm-reviews.chandlerc.com/D598

llvm-svn: 179725
2013-04-17 21:18:16 +00:00
Ulrich Weigand d0585d8686 PowerPC: Mark some more patterns as isCodeGenOnly.
A couple of recently introduced conditional branch patterns
also need to be marked as isCodeGenOnly since they cannot
be handled by the asm parser.

No change in generated code.

llvm-svn: 179690
2013-04-17 17:19:05 +00:00
Hal Finkel 95e6ea69be Mark all PPC comparison instructions as not having side effects
Now that the CR spilling issues have been resolved, we can remove the
unmodeled-side-effect attributes from the comparison instructions (and also
mark them as isCompare). By allowing these, by default, to have unmodeled side
effects, we were hiding problems with CR spilling; but everything seems much
happier now.

llvm-svn: 179502
2013-04-15 02:37:46 +00:00
Hal Finkel 6736988ae2 Fix PPC64 CR spill location for callee-saved registers
This fixes an ABI bug for non-Darwin PPC64. For the callee-saved condition
registers, the spill location is specified relative to the stack pointer (SP +
8). However, this is not relative to the SP after the new stack frame is
established, but instead relative to the caller's stack pointer (it is stored
into the linkage area of the parent's stack frame).

So, like with the link register, we don't directly spill the CRs with other
callee-saved registers, but just mark them to be spilled during prologue
generation.

In practice, this reverts r179457 for PPC64 (but leaves it in place for PPC32).

llvm-svn: 179500
2013-04-15 02:07:05 +00:00
Hal Finkel 2f29391504 Mark all PPC CR registers to be spilled as live-in and tag MFCR appropriately
Leaving MFCR as having unmodeled side effects is not enough to prevent
unwanted instruction reordering post-RA. We could probably apply a stronger
barrier attribute, but there is a better way: Add all (not just the first) CR
to be spilled as live-in to the entry block, and add all CRs to the MFCR
instruction as implicitly killed.

Unfortunately, I don't have a small test case.

llvm-svn: 179465
2013-04-13 23:06:15 +00:00
Hal Finkel d85a04b3df Spill and restore PPC CR registers using the FP when we have one
For functions that need to spill CRs, and have dynamic stack allocations, the
value of the SP during the restore is not what it was during the save, and so
we need to use the FP in these cases (as for all of the other spills and
restores, but the CR restore has a special code path because its reserved slot,
like the link register, is specified directly relative to the adjusted SP).

llvm-svn: 179457
2013-04-13 08:09:20 +00:00
Hal Finkel 1b58f335ca PPC: Remove (broken) nested implicit definition lists
TableGen will not combine nested list 'let' bindings into a single list, and
instead uses only the inner scope. As a result, several instruction definitions
were missing implicit register defs that were in outer scopes. This de-nests
these scopes and makes all instructions have only one let binding which sets
implicit register definitions.

llvm-svn: 179392
2013-04-12 18:17:57 +00:00
Hal Finkel 2277196f64 Add a comment about the PPC Interpretation64Bit bit
llvm-svn: 179391
2013-04-12 18:17:38 +00:00
Hal Finkel 654d43b41a Add PPC instruction record forms and associated query functions
This is prep. work for the implementation of optimizeCompare. Many PPC
instructions have 'record' forms (in almost all cases, this means that the RC
bit is set) that cause the result of the instruction to be compared with zero,
and the result of that comparison saved in a predefined condition register. In
order to add the record forms of the instructions without too much
copy-and-paste, the relevant functions have been refactored into multiclasses
which define both the record and normal forms.

Also, two TableGen-generated mapping functions have been added which allow
querying the instruction code for the record form given the normal form (and
vice versa).

No functionality change intended.

llvm-svn: 179356
2013-04-12 02:18:09 +00:00
Hal Finkel f29285a487 Make PPCInstrInfo::isPredicated always return false
Because of how predication is implemented on PPC (only for branches), I think
that this is the right thing to do.  No functionality change intended.

llvm-svn: 179252
2013-04-11 01:23:34 +00:00
Hal Finkel 30ae229141 PPC: Don't predicate a diamond with two counter decrements
I've not seen this happen in practice, and probably can't until we start
allowing decrement-counter-based conditional branches to be double predicated,
but just in case, don't allow predication of a diamond in which both sides have
ctr-defining branches. Even though the branching behavior of these can be
predicated, the counter-decrementing behavior cannot be.

llvm-svn: 179199
2013-04-10 18:30:16 +00:00
Hal Finkel af822018aa Cleanup PPCInstrInfo::DefinesPredicate
Implement suggestions made by Bill Schmidt in post-commit review. Thanks!

llvm-svn: 179162
2013-04-10 07:17:47 +00:00
Hal Finkel 500b004566 PPC: Prep for if conversion of bctr[l]
This adds in-principle support for if-converting the bctr[l] instructions.
These instructions are used for indirect branching. It seems, however, that the
current if converter will never actually predicate these. To do so, it would
need the ability to hoist a few setup insts. out of the conditionally-executed
block. For example, code like this:
  void foo(int a, int (*bar)()) { if (a != 0) bar(); }
becomes:
        ...
        beq 0, .LBB0_2
        std 2, 40(1)
        mr 12, 4
        ld 3, 0(4)
        ld 11, 16(4)
        ld 2, 8(4)
        mtctr 3
        bctrl
        ld 2, 40(1)
.LBB0_2:
        ...
and it would be safe to do all of this unconditionally with a predicated
beqctrl instruction.

llvm-svn: 179156
2013-04-10 06:42:34 +00:00
Hal Finkel 5711eca19c Allow PPC B and BLR to be if-converted into some predicated forms
This enables us to form predicated branches (which are the same conditional
branches we had before) and also a larger set of predicated returns (including
instructions like bdnzlr which is a conditional return and loop-counter
decrement all in one).

At the moment, if conversion does not capture all possible opportunities. A
simple example is provided in early-ret2.ll, where if conversion forms one
predicated return, and then the PPCEarlyReturn pass picks up the other one. So,
at least for now, we'll keep both mechanisms.

llvm-svn: 179134
2013-04-09 22:58:37 +00:00
Hal Finkel 21aad9a8e8 Cleanup PPCEarlyReturn
Some general cleanup and only scan the end of a BB for branches (once we're
done with the terminators and debug values, then there should not be any other
branches). These address post-commit review suggestions by Bill Schmidt.

No functionality change intended.

llvm-svn: 179112
2013-04-09 18:25:18 +00:00
Hal Finkel b5899d5774 Use virtual base registers on PPC
On PowerPC, non-vector loads and stores have r+i forms; however, in functions
with large stack frames these were not being used to access slots far from the
stack pointer because such slots were out of range for the signed 16-bit
immediate offset field. This increases register pressure because we need a
separate register for each offset (when the r+r form is used). By enabling
virtual base registers, we can deal with large stack frames without unduly
increasing register pressure.

llvm-svn: 179105
2013-04-09 17:27:09 +00:00
Hal Finkel b5aa7e54d9 Generate PPC early conditional returns
PowerPC has a conditional branch to the link register (return) instruction: BCLR.
This should be used any time when we'd otherwise have a conditional branch to a
return. This adds a small pass, PPCEarlyReturn, which runs just prior to the
branch selection pass (and, importantly, after block placement) to generate
these conditional returns when possible. It will also eliminate unconditional
branches to returns (these happen rarely; most of the time these have already
been tail duplicated by the time PPCEarlyReturn is invoked). This is a nice
optimization for small functions that do not maintain a stack frame.

llvm-svn: 179026
2013-04-08 16:24:03 +00:00
Hal Finkel 81f8799fe3 Cleanup and improve PPC fsel generation
First, we should not cheat: fsel-based lowering of select_cc is a
finite-math-only optimization (the ISA manual, section F.3 of v2.06, makes
this clear, as does a note in our own README).

This also adds fsel-based lowering of EQ and NE condition codes. As it turned
out, fsel generation was covered by a grand total of zero regression test
cases. I've added some test cases to cover the existing behavior (which is now
finite-math only), as well as the new EQ cases.

llvm-svn: 179000
2013-04-07 22:11:09 +00:00
Hal Finkel 7795e47b5e PPC rotate instructions don't have unmodeled side effects
llvm-svn: 178982
2013-04-07 15:06:53 +00:00
Hal Finkel b47a69acde Most PPC M[TF]CR instructions do not have side effects
llvm-svn: 178978
2013-04-07 14:33:13 +00:00
Hal Finkel d71cc3a7f3 PPC pre-increment load instructions do not have side effects
A few were missed in r178972.

llvm-svn: 178973
2013-04-07 06:30:47 +00:00
Hal Finkel 6efd45e902 PPC pre-increment load instructions do not have side effects
llvm-svn: 178972
2013-04-07 05:46:58 +00:00
Hal Finkel 933e8f037d PPC MCRF instruction does not have side effects
llvm-svn: 178971
2013-04-07 05:16:57 +00:00
Hal Finkel 94072b98eb PPC FMR instruction does not have side effects
llvm-svn: 178970
2013-04-07 04:56:16 +00:00
Hal Finkel d61d4f80e6 Implement PPCInstrInfo::FoldImmediate
There are certain PPC instructions into which we can fold a zero immediate
operand. We can detect such cases by looking at the register class required
by the using operand (so long as it is not otherwise constrained).

llvm-svn: 178961
2013-04-06 19:30:30 +00:00
Hal Finkel 8fc33e5d95 PPC ISEL is a select and never has side effects
llvm-svn: 178960
2013-04-06 19:30:28 +00:00
Hal Finkel ed6a28597b Enable early if conversion on PPC
On cores for which we know the misprediction penalty, and we have
the isel instruction, we can profitably perform early if conversion.
This enables us to replace some small branch sequences with selects
and avoid the potential stalls from mispredicting the branches.

Enabling this feature required implementing canInsertSelect and
insertSelect in PPCInstrInfo; isel code in PPCISelLowering was
refactored to use these functions as well.

llvm-svn: 178926
2013-04-05 23:29:01 +00:00
Hal Finkel 85526f2e71 Correct the PPC A2 misprediction penalty
The manual states that there is a minimum of 13 cycles from when the
mispredicted branch is issued to when the correct branch target is
issued.

llvm-svn: 178925
2013-04-05 23:28:58 +00:00
Hal Finkel 1a958cf30d Add a SchedMachineModel for the PPC G5
llvm-svn: 178850
2013-04-05 05:49:18 +00:00
Hal Finkel 5fde1b033e Add a SchedMachineModel for the PPC A2
llvm-svn: 178848
2013-04-05 05:34:08 +00:00
Arnold Schwaighofer b977387112 CostModel: Add parameter to instruction cost to further classify operand values
On certain architectures we can support efficient vectorized version of
instructions if the operand value is uniform (splat) or a constant scalar.
An example of this is a vector shift on x86.

We can efficiently support

for (i = 0 ; i < n ; i += 4)
  w[0:3] = v[0:3] << <2, 2, 2, 2>

but not

for (i = 0; i < n; i += 4)
  w[0:3] = v[0:3] << x[0:3]

This patch adds a parameter to getArithmeticInstrCost to further qualify operand
values as uniform or uniform constant.

Targets can then choose to return a different cost for instructions with such
operand values.
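
A hedged usage sketch for the splat-shift case above (enumerator spellings and argument order are assumptions about this interface):

  unsigned Cost = TTI->getArithmeticInstrCost(
      Instruction::Shl, VecTy,
      TargetTransformInfo::OK_AnyValue,              // the shifted vector
      TargetTransformInfo::OK_UniformConstantValue); // the splat shift amount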

A follow-up commit will test this feature on x86.

radar://13576547

llvm-svn: 178807
2013-04-04 23:26:21 +00:00
Hal Finkel e5680b3c36 Rename the current PPC BCL definition to BCLalways
BCL is normally a conditional branch-and-link instruction, but has
an unconditional form (which is used in the SjLj code, for example).
To make clear that this BCL instruction definition is specifically
the special unconditional form (which does not meaningfully take
a condition-register input), rename it to BCLalways.

No functionality change intended.

llvm-svn: 178803
2013-04-04 22:55:54 +00:00
Hal Finkel f96c18e3bc PPC: Improve code generation for mixed-precision reciprocal sqrt
The DAGCombine logic that recognized a/sqrt(b) and transformed it into
a multiplication by the reciprocal sqrt did not handle cases where the
sqrt and the division were separated by an fpext or fptrunc.

llvm-svn: 178801
2013-04-04 22:44:12 +00:00
Hal Finkel b0c810ff6d Cleanup PPC reciprocal-estimate functionality
Incorporating review feedback from Bill Schmidt on r178617. No functionality
change intended.

llvm-svn: 178672
2013-04-03 17:44:56 +00:00
Hal Finkel 7ac4592e97 PPC: Enable FRES and FRSQRTE on the default PPC64 description
I discussed this with Bill Schmidt on IRC, and it was decided that this is a
safe and reasonable default.

llvm-svn: 178659
2013-04-03 14:40:18 +00:00
Hal Finkel 0c6d21933a PPC: Add a FIXME regarding the non-working fma+fneg Altivec pattern
llvm-svn: 178658
2013-04-03 14:40:16 +00:00
Hal Finkel 2ed21a8ca6 Remove some obsolete PowerPC/README entries
llvm-svn: 178657
2013-04-03 14:25:55 +00:00
Ulrich Weigand 084ff8e891 More direct types in PowerPC AltiVec intrinsics.
This patch follows up on work done by Bill Schmidt in r178277,
and replaces most of the remaining uses of VRRC in ISEL DAG patterns.

The resulting .inc files are identical except for comments, so
no change in code generation is expected.

llvm-svn: 178656
2013-04-03 14:08:13 +00:00
Bill Schmidt 92e26646bc Fix PR15632: No support for ppcf128 floating-point remainder on PowerPC.
For this we need to use a libcall.  Previously LLVM didn't implement
libcall support for frem, so I've added it in the usual
straightforward manner.  A test case from the bug report is included.

llvm-svn: 178639
2013-04-03 13:05:44 +00:00
Hal Finkel b00fc87608 Remove some unsupported-feature comments from PPC.td
These refer to the reciprocal estimate support recently committed.

llvm-svn: 178618
2013-04-03 04:03:58 +00:00
Hal Finkel 2e10331057 Use PPC reciprocal estimates with Newton iteration in fast-math mode
When unsafe FP math operations are enabled, we can use the fre[s] and
frsqrte[s] instructions, which generate reciprocal (sqrt) estimates, together
with some Newton iteration, in order to quickly generate floating-point
division and sqrt results. All of these instructions are separately optional,
and so each has its own feature flag (except for the Altivec instructions,
which are covered under the existing Altivec flag). Doing this is not only
faster than using the IEEE-compliant fdiv/fsqrt instructions, but allows these
computations to be pipelined with other computations in order to hide their
overall latency.
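
For reference, the refinement involved is plain Newton-Raphson; a self-contained illustration of one step (not the backend lowering itself):

  // Refine a reciprocal estimate e of 1/b:             e' = e * (2 - b*e)
  double refineRecip(double b, double e) { return e * (2.0 - b * e); }
  // Refine a reciprocal-sqrt estimate e of 1/sqrt(b):  e' = e * (1.5 - 0.5*b*e*e)
  double refineRsqrt(double b, double e) { return e * (1.5 - 0.5 * b * e * e); }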

I've also added a couple of missing fnmsub patterns which turned out to be
missing (but are necessary for good code generation of the Newton iterations).
Altivec needs a similar fix, but that will probably be more complicated because
fneg is expanded for Altivec's v4f32.

llvm-svn: 178617
2013-04-03 04:01:11 +00:00
Bill Schmidt 3581cd4b4c Fix PR15630: Replace faulty stdcx. with stwcx.
When doing a partword atomic operation, a lwarx was being paired with
a stdcx. instead of a stwcx. when compiling for a 64-bit target.  The
target has nothing to do with it in this case; we always need a stwcx.

Thanks to Kai Nacke for reporting the problem.

llvm-svn: 178559
2013-04-02 18:37:08 +00:00
Hal Finkel 93d75ea08a Fix typo in PPCISelLowering
Thanks to Bill Schmidt for finding this in review of r178480.

llvm-svn: 178521
2013-04-02 03:29:51 +00:00
Hal Finkel 3f88d08974 Fix a bad assert in PPCTargetLowering
llvm-svn: 178489
2013-04-01 18:42:58 +00:00
Hal Finkel f6d45f2379 Add more PPC floating-point conversion instructions
The P7 and A2 have additional floating-point conversion instructions which
allow a direct two-instruction sequence (plus load/store) to convert from all
combinations (signed/unsigned i32/i64) <--> (float/double) (on previous cores,
only some combinations were directly available).

llvm-svn: 178480
2013-04-01 17:52:07 +00:00
Hal Finkel 39caf9f5ec Use ImmToIdxMap.count in PPCRegisterInfo
Code improvement suggested by Jakob (in review of r178450). No functionality
change intended.

llvm-svn: 178473
2013-04-01 17:02:06 +00:00
Hal Finkel 290376dd78 Add the PPC popcntw instruction
The popcntw instruction is available whenever the popcntd instruction is
available, and performs a separate popcnt on the lower and upper 32-bits.
Ignoring the high-order count, this can be used for the 32-bit input case
(saving on the explicit zero extension otherwise required to use popcntd).

llvm-svn: 178470
2013-04-01 15:58:15 +00:00
Hal Finkel 60c7510711 Treat PPCISD::STFIWX like the memory opcode that it is
PPCISD::STFIWX is really a memory opcode, and so it should come after
FIRST_TARGET_MEMORY_OPCODE, and we should use DAG.getMemIntrinsicNode to create
nodes using it.

No functionality change intended (although there could be optimization benefits
from preserving the MMO information).

llvm-svn: 178468
2013-04-01 15:37:53 +00:00
Hal Finkel 8540f7771c Cleanup ImmToIdxMap and noImmForm in PPCRegisterInfo
ImmToIdxMap should be a DenseMap (not a std::map) because there
is no ordering requirement. Also, we don't need a separate list
of instructions for noImmForm in eliminateFrameIndex, because this
list is essentially the complement of the keys in ImmToIdxMap.

No functionality change intended.

llvm-svn: 178450
2013-03-31 14:43:31 +00:00
Hal Finkel beb296bea1 Add the PPC lfiwax instruction
This instruction is available on modern PPC64 CPUs, and is now used
to improve the SINT_TO_FP lowering (by eliminating the need for the
separate sign extension instruction and decreasing the amount of
needed stack space).

llvm-svn: 178446
2013-03-31 10:12:51 +00:00
Hal Finkel e53429a13e Cleanup PPC(64) i32 -> float/double conversion
The existing SINT_TO_FP code for i32 -> float/double conversion was disabled
because it relied on broken EXTSW_32/STD_32 instruction definitions. The
original intent had been to enable these 64-bit instructions to be used on CPUs
that support them even in 32-bit mode.  Unfortunately, this form of lying to
the infrastructure was buggy (as explained in the FIXME comment) and had
therefore been disabled.

This re-enables this functionality, using regular DAG nodes, but only when
compiling in 64-bit mode. The old STD_32/EXTSW_32 definitions (which were dead)
are removed.

llvm-svn: 178438
2013-03-31 01:58:02 +00:00
Hal Finkel f8ac57e289 Implement FRINT lowering on PPC using frin
Like nearbyint, rint can be implemented on PPC using the frin instruction. The
complication comes from the fact that rint needs to set the FE_INEXACT flag
when the result does not equal the input value (and frin does not do that). As
a result, we use a custom inserter which, after the rounding, compares the
rounded value with the original, and if they differ, explicitly sets the XX bit
in the FPSCR register (which corresponds to FE_INEXACT).
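
The required semantics can be illustrated with plain C++, independent of the backend code:

  #include <cfenv>
  #include <cmath>
  // rint must raise FE_INEXACT when rounding changes the value; frin alone
  // behaves like nearbyint, which must not, hence the custom inserter.
  bool raisesInexact(double x) {
    std::feclearexcept(FE_INEXACT);
    (void)std::rint(x);
    return std::fetestexcept(FE_INEXACT) != 0;   // true when x is not integral
  }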

Once LLVM has better modeling of the floating-point environment we should be
able to (often) eliminate this extra complexity.

llvm-svn: 178362
2013-03-29 19:41:55 +00:00
Benjamin Kramer 70671b9937 Remove the old CodePlacementOpt pass.
It was superseded by MachineBlockPlacement and disabled by default since LLVM 3.1.

llvm-svn: 178349
2013-03-29 17:14:24 +00:00
Hal Finkel c20a08d25b Add PPC FP rounding instructions fri[mnpz]
These instructions are available on the P5x (and later) and on the A2. They
implement the standard floating-point rounding operations (floor, trunc, etc.).
One caveat: frin (round to nearest) does not implement "ties to even", and so
is only enabled in fast-math mode.

llvm-svn: 178337
2013-03-29 08:57:48 +00:00
Hal Finkel 22e41c411e Only enable 64-bit bswap DAG combines for PPC64
Compiling in 32-bit mode on a P7 would assert after 64-bit DAG combines were
added for bswap with load/store. This is because these combines are really only
valid in 64-bit mode, regardless of the CPU (and this was not being checked).

llvm-svn: 178286
2013-03-28 20:23:46 +00:00
Hal Finkel 93492fa696 Fix bad indentation in r178276
Thanks to Bill Schmidt for pointing this out!

llvm-svn: 178280
2013-03-28 19:43:12 +00:00
Bill Schmidt 74b2e72ab3 Use direct types in most PowerPC Altivec instructions and patterns.
This follows up Ulrich Weigand's work in PPCInstrInfo.td and
PPCInstr64Bit.td by doing the corresponding work for most of the
Altivec patterns.  I have not been able to do anything for the
following classes of instructions:

(1) Vector logicals.  These don't have corresponding intrinsics and
don't have a single obvious vector type.  So far as I can tell I need
to leave these as VRRC.  Affected instructions are:  VAND, VANDC,
VNOR, VOR, VXOR, V_SET0.

(2) Instructions that make use of vector shuffle.  The selection code
promotes all shuffles to v16i8, so any pattern that matches on a
shuffle is constrained.  I haven't found any way to make the patterns
match on their natural types, so I plan to leave these as VRRC.
Affected instructions are:  VMRG*, VSPLTB, VSPLTH, VSPLTW, VPKUHUM,
VPKUWUM.

No change in behavior is anticipated.

llvm-svn: 178277
2013-03-28 19:27:24 +00:00
Hal Finkel 31d2956510 Add the PPC64 ldbrx/stdbrx instructions
These are 64-bit load/store with byte-swap, and available on the P7 and the A2.
Like the similar instructions for 16- and 32-bit words, these are matched in the
target DAG-combine phase against load/store-bswap pairs.

llvm-svn: 178276
2013-03-28 19:25:55 +00:00
Hal Finkel a4d074863a Add the PPC64 popcntd instruction
PPC ISA 2.06 (P7, A2, etc.) has a popcntd instruction. Add this instruction and
tell TTI about it so that popcount-loop recognition will know about it.

llvm-svn: 178233
2013-03-28 13:29:47 +00:00
Hal Finkel 035b4825ce Cleanup PPC CR-spill kill flags and 32- vs. 64-bit instructions
There were a few places where kill flags were not being set correctly, and
where 32-bit instruction variants were being used with 64-bit registers. After
r178180, this code was being triggered, causing llc to assert.

llvm-svn: 178220
2013-03-28 03:38:16 +00:00
Hal Finkel 25aab01058 Fix typo in PPCInstr64Bit
llvm-svn: 178219
2013-03-28 03:38:08 +00:00
Hal Finkel 37714b8a48 Resynchronize isLoadFromStackSlot with LoadRegFromStackSlot (and stores) in PPCInstrInfo
These functions should have the same list of load/store instructions. Now that
all load/store forms have been normalized (to single instructions or pseudos)
they can be resynchronized.

Found by inspection, although hopefully this will improve optimization.  I've
also added some comments.

llvm-svn: 178180
2013-03-27 21:21:15 +00:00
Hal Finkel 1996f3d87f Fix typo (common to both X86 and PPC)
Thanks to Bill Schmidt for pointing this out during code review!

llvm-svn: 178170
2013-03-27 19:10:42 +00:00
Hal Finkel 5791f51449 Remove more dead LR-as-GPR PPC code
I had removed similar code a few days ago, but somehow missed this.

llvm-svn: 178169
2013-03-27 19:10:40 +00:00
Hal Finkel f1af79ab45 Remove "gpr0 allocation" from the PPC README TODO list
As Chris pointed out, post r178123, this is now done!

llvm-svn: 178165
2013-03-27 18:39:52 +00:00
Hal Finkel 687143557d Print PPC ZERO as 0 (not r0) even on Darwin
It seems that the Darwin PPC assembler requires r0 to be written as 0 when it
means 0 (at least in lwarx/stwcx.). Fixes PR15605.

llvm-svn: 178142
2013-03-27 13:20:52 +00:00
Hal Finkel 0f77861d9f Allocate r0 on PPC
The R0 register can now be allocated because instructions
that cannot use R0 as a GPR have been appropriately marked.

llvm-svn: 178123
2013-03-27 06:52:27 +00:00
Hal Finkel 573fc28d64 Use the PPC no-r0 class on the TOC LD pseudos
The register parameter in these instructions becomes the base register in an
r+i ld instruction (and, thus, cannot be r0).

This is not yet testable because we don't yet allocate r0 (and even then any
test would be very fragile).

llvm-svn: 178121
2013-03-27 06:36:55 +00:00
Hal Finkel 3fa362a51a Apply the no-r0 register class to the PPC SELECT_CC_I[4|8] pseudos
Either operand of these pseudo instructions can be transformed into the first
operand of an isel instruction (and this operand cannot be r0).

This is not yet testable because we don't yet allocate r0 (and even when we do,
any test would be very fragile).

llvm-svn: 178119
2013-03-27 05:57:58 +00:00
Hal Finkel 42a312b261 Apply the no-r0 class to PPC TOC ADDI[S] pseudo instructions
Like the addi/addis instructions themselves, these pseudo instructions also
cannot have r0 as their register parameter (because it will be interpreted as
the value 0).

This is not yet testable because we don't yet allocate r0 (and even when we do,
any regression test would be very fragile because it would depend on the
register allocator heuristics).

llvm-svn: 178118
2013-03-27 05:57:56 +00:00
Bill Schmidt a1b72d0f6a Remove the link register from the GPR classes on PowerPC.
Some implementation detail in the forgotten past required the link
register to be placed in the GPRC and G8RC register classes.  This is
just wrong on the face of it, and causes several extra intersection
register classes to be generated.  I found this was having evil
effects on instruction scheduling, by causing the wrong register class
to be consulted for register pressure decisions.

No code generation changes are expected, other than some minor changes
in instruction order.  Seven tests in the test bucket required minor
tweaks to adjust to the new normal.

llvm-svn: 178114
2013-03-27 02:40:14 +00:00
Hal Finkel a7b0630ba8 Don't spill PPC VRSAVE on non-Darwin (even in SjLj)
As Bill Schmidt pointed out to me, only on Darwin do we need to spill/restore
VRSAVE in the SjLj code. For non-Darwin, don't spill/restore VRSAVE (and I've
added some asserts to make sure that we're not).

As it turns out, we're not currently handling the Darwin case correctly (I've
added a FIXME in the test case). I've tried adding various implied register
definitions/uses to force the spill without success, so I'll need to address
this later.

llvm-svn: 178096
2013-03-27 00:02:20 +00:00
Hal Finkel 567fa62ddc Restore real bit lengths on PPC register numbers
As suggested by Bill Schmidt (in reviewing r178067), use the real register
number bit lengths (which is self-documenting, and prevents using illegal
numbers), and set only the relevant bits in HWEncoding (which defaults to 0).

No functionality change intended.

llvm-svn: 178077
2013-03-26 21:50:26 +00:00
Hal Finkel feea653974 PPC: Use HWEncoding and TRI->getEncodingValue
As pointed out by Jakob, we don't need to maintain a separate
register-numbering table. Instead we should let TableGen generate the table for
us from the information (already present) in PPCRegisterInfo.td.
TRI->getEncodingValue is now used to access register-encoding values.

No functionality change intended.

llvm-svn: 178067
2013-03-26 20:08:20 +00:00
Hal Finkel 0dfbb05aff Use multiple virtual registers in PPC CR spilling
Now that the register scavenger can support multiple spill slots, and PEI can
use virtual-register-based scavenging for multiple simultaneous registers, we
can use a virtual register for the transfer register in the CR spilling code.

This should eliminate the last place (outside of the prologue/epilogue) where
we depend on the unconditional availability of the r0 register. We will soon be
able to allocate it (in a somewhat restricted sense) as a GPR.

llvm-svn: 178060
2013-03-26 18:57:22 +00:00
Hal Finkel d8a423cd71 Update PPCRegisterInfo's use of virtual registers to be SSA
PPC's use of PEI's virtual-register-based scavenging functionality had
redefined the virtual registers (it was non-SSA). Now that PEI supports
dealing with instructions with multiple virtual registers, this can be
cleaned up to use multiple virtual registers and keep SSA form.

No functionality change intended.

llvm-svn: 178059
2013-03-26 18:57:20 +00:00
Benjamin Kramer cf3d5aaea9 Remove default case from fully covered switch.
llvm-svn: 178025
2013-03-26 14:17:42 +00:00
Ulrich Weigand bbfb0c55c8 PowerPC: Mark patterns as isCodeGenOnly.
There remain a number of patterns that cannot (and should not)
be handled by the asm parser, in particular all the Pseudo patterns.

This commit marks those patterns as isCodeGenOnly.

No change in generated code.

llvm-svn: 178008
2013-03-26 10:57:16 +00:00
Ulrich Weigand 3e1860150d PowerPC: Simplify handling of fixups.
MCTargetDesc/PPCMCCodeEmitter.cpp current has code like:

 if (isSVR4ABI() && is64BitMode())
   Fixups.push_back(MCFixup::Create(0, MO.getExpr(),
                                    (MCFixupKind)PPC::fixup_ppc_toc16));
 else
   Fixups.push_back(MCFixup::Create(0, MO.getExpr(),
                                    (MCFixupKind)PPC::fixup_ppc_lo16));

This is a problem for the asm parser, since it requires knowledge of
the ABI / 64-bit mode to be set up.  However, more fundamentally,
at this point we shouldn't make such distinctions anyway; in an assembler
file, it always ought to be possible to e.g. generate TOC relocations even
when the main ABI is one that doesn't use TOC.

Fortunately, this is actually completely unnecessary; that code was added
to decide whether to generate TOC relocations, but that information is in
fact already encoded in the VariantKind of the underlying symbol.

This commit therefore merges those fixup types into one, and then decides
which relocation to use based on the VariantKind.

No changes in generated code.

llvm-svn: 178007
2013-03-26 10:56:47 +00:00
Ulrich Weigand 874fc628df PowerPC: Simplify FADD in round-to-zero mode.
As part of the the sequence generated to implement long double -> int
conversions, we need to perform an FADD in round-to-zero mode.  This is
problematical since the FPSCR is not at all modeled at the SelectionDAG
level, and thus there is a risk of getting floating point instructions
generated out of sequence with the instructions to modify FPSCR.

The current code handles this by somewhat "special" patterns that in part
have dummy operands, and/or duplicate existing instructions, making them
awkward to handle in the asm parser.

This commit changes this by leaving the "FADD in round-to-zero mode"
as an atomic operation on the SelectionDAG level, and only split it up into
real instructions at the MI level (via custom inserter).  Since at *this*
level the FPSCR *is* modeled (via the "RM" hard register), much of the
"special" stuff can just go away, and the resulting patterns can be used by
the asm parser.

No significant change in generated code expected.

llvm-svn: 178006
2013-03-26 10:56:22 +00:00
Ulrich Weigand 4a0838863b PowerPC: Remove LDrs pattern.
The LDrs pattern is a duplicate of LD, except that it accepts memory
addresses where the displacement is a symbolLo64.  An operand type
"memrs" is defined for just that purpose.

However, this wouldn't be necessary if the default "memrix" operand
type were to simply accept 64-bit symbolic addresses directly.
The only problem with that is that it uses "symbolLo", which is
hardcoded to 32-bit.

To fix this, this commit changes "memri" and "memrix" to use new
operand types for the memory displacement, which allow iPTR
instead of i32.  This will also make address parsing easier to
implment in the asm parser.

No change in generated code.

llvm-svn: 178005
2013-03-26 10:55:45 +00:00
Ulrich Weigand 35f9fdfdfd PowerPC: Remove ADDIL patterns.
The ADDI/ADDI8 patterns are currently duplicated into ADDIL/ADDI8L,
which describe the same instruction, except that they accept a
symbolLo[64] operand instead of a s16imm[64] operand.

This duplication confuses the asm parser, and it is actually not really
needed, since symbolLo[64] already accepts immediate operands anyway.
So this commit removes the duplicate patterns.

No change in generated code.

llvm-svn: 178004
2013-03-26 10:55:20 +00:00
Ulrich Weigand 4749b1ecd8 PowerPC: Use CCBITRC operand for ISEL patterns.
This commit changes the ISEL patterns to use a CCBITRC operand
instead of a "pred" operand.  This matches the actual instruction
text more directly, and simplifies use of ISEL with the asm parser.
In addition, this change allows some simplification of handling
the "pred" operand, as this is now only used by BCC.

No change in generated code.

llvm-svn: 178003
2013-03-26 10:54:54 +00:00
Ulrich Weigand 63aa852a84 PowerPC: Simplify BLR pattern.
The BLR pattern cannot be recognized by the asm parser in its current form.
This complexity is due to an apparent attempt to enable conditional BLR
variants.  However, none of those can ever be generated by current code;
the pattern is only ever created using the default "pred" operand.

To simplify the pattern and allow it to be recognized by the parser,
this commit removes those attempts at conditional BLR support.

When we later come back to actually add real conditional BLR, this
should probably be done via a fully generic conditional branch pattern.

No change in generated code.

llvm-svn: 178002
2013-03-26 10:53:27 +00:00
Ulrich Weigand 410a40bb5f PowerPC: Move some 64-bit branch patterns.
In PPCInstr64Bit.td, some branch patterns appear in a different sequence
than the corresponding 32-bit patterns in PPCInstrInfo.td.

To simplify future changes that affect both files, this commit moves
those patterns to rearrange them into a similar sequence.

No effect on generated code.

llvm-svn: 178001
2013-03-26 10:53:03 +00:00
Ulrich Weigand c8868106e6 Use direct types in PowerPC instruction patterns.
This commit updates the PowerPC back-end (PPCInstrInfo.td and
PPCInstr64Bit.td) to use types instead of register classes in
instruction patterns, along the lines of Jakob Stoklund Olesen's
changes in r177835 for Sparc.

llvm-svn: 177890
2013-03-25 19:05:30 +00:00
Ulrich Weigand ec6e2cd124 Use direct types in PowerPC Pat patterns.
This commit updates the PowerPC back-end (PPCInstrInfo.td and
PPCInstr64Bit.td) to use types instead of register classes in
Pat patterns, along the lines of Jakob Stoklund Olesen's
changes in r177829 for Sparc.

llvm-svn: 177889
2013-03-25 19:04:58 +00:00
Hal Finkel 915769edd9 PPC ZERO register needs a register number of 0.
In order for the new ZERO register to be used with MC, etc. we need to specify
its register number (0).

Thanks to Kai for reporting the problem!

llvm-svn: 177833
2013-03-23 22:06:07 +00:00
Hal Finkel cc1eeda16d Note in PPCFunctionInfo VRSAVE spills
In preparation for using the new register scavenger capability for providing
more than one register simultaneously, specifically note functions that have
spilled VRSAVE (currently, this can happen only in functions that use the
setjmp intrinsic). As with CR spilling, such functions will need to provide two
emergency spill slots to the scavenger.

No functionality change intended.

llvm-svn: 177832
2013-03-23 22:06:03 +00:00
Hal Finkel f07a8e04ab MCize the bcl instruction in PPCAsmPrinter
I recently added a BCL instruction definition as part of implementing SjLj
support. This can also be used to MCize bcl emission in the asm printer.

No functionality change intended.

llvm-svn: 177830
2013-03-23 20:53:15 +00:00
Hal Finkel c6eaa4cead Cleanup some unused reg. scavenger parameters in PPCRegisterInfo
These spilling functions will eventually make use of the register scavenger,
however, they'll do so by taking advantage of PEI's virtual-register-based
delayed scavenging mechanism. As a result, these function parameters will not
be used, and can be removed.

No functionality change intended.

llvm-svn: 177827
2013-03-23 19:36:47 +00:00
Hal Finkel 794e05b03b Remove dead PPC LR spilling code
The LR register is unconditionally reserved, and its spilling and restoration
is handled by the prologue/epilogue code. As a result, it is never explicitly
spilled by the register allocator.

No functionality change intended.

llvm-svn: 177823
2013-03-23 17:14:27 +00:00
Hal Finkel 9e331c2f9c Allow the register scavenger to spill multiple registers
This patch lets the register scavenger make use of multiple spill slots in
order to guarantee that it will be able to provide multiple registers
simultaneously.

To support this, the RS's API has changed slightly: setScavengingFrameIndex /
getScavengingFrameIndex have been replaced by addScavengingFrameIndex /
isScavengingFrameIndex / getScavengingFrameIndices.
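
A hedged sketch of how a backend might reserve two emergency slots with the new interface (the stack-object parameters are illustrative only):

  const TargetRegisterClass *RC = &PPC::GPRCRegClass;
  RS->addScavengingFrameIndex(
      MFI->CreateStackObject(RC->getSize(), RC->getAlignment(), false));
  RS->addScavengingFrameIndex(
      MFI->CreateStackObject(RC->getSize(), RC->getAlignment(), false));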

In forthcoming commits, the PowerPC backend will use this capability in order
to implement the spilling of condition registers, and some special-purpose
registers, without relying on r0 being reserved. In some cases, spilling these
registers requires two GPRs: one for addressing and one to hold the value being
transferred.

llvm-svn: 177774
2013-03-22 23:32:27 +00:00
Ulrich Weigand f62e83f415 Remove ABI-duplicated call instruction patterns.
We currently have a duplicated set of call instruction patterns depending
on the ABI to be followed (Darwin vs. Linux).  This is a bit odd; while the
different ABIs will result in different instruction sequences, the actual
instructions themselves ought to be independent of the ABI.  And in fact it
turns out that the only nontrivial difference between the two sets of
patterns is that in the PPC64 Linux ABI, the instruction used for indirect
calls is marked to take X11 as an extra input register (which is indeed used
only with that ABI to hold an incoming environment pointer for nested
functions).  However, this does not need to be hard-coded at the .td
pattern level; instead, the C++ code expanding calls can simply add that
use, just like it adds uses for argument registers anyway.
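
Roughly, as a hedged sketch (the surrounding call-lowering context is assumed rather than quoted from the patch):

  // For an indirect call under the 64-bit SVR4 ABI, record X11 as an extra
  // register use instead of hard-coding it in the .td pattern.
  if (isPPC64 && isSVR4ABI && isIndirectCall)
    Ops.push_back(DAG.getRegister(PPC::X11, MVT::i64));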

No change in generated code expected.

llvm-svn: 177735
2013-03-22 15:24:13 +00:00
Ulrich Weigand 1df06d8b58 Rename memrr ptrreg and offreg components.
Currently, the sub-operand of a memrr address that corresponds to what
hardware considers the base register is called "offreg", while the
sub-operand that corresponds to the offset is called "ptrreg".

To avoid confusion, this patch simply swaps the names of those two
sub-operands and updates all uses.  No functional change is intended.

llvm-svn: 177734
2013-03-22 14:59:13 +00:00
Ulrich Weigand e90b022468 Fix swapped BasePtr and Offset in pre-inc memory addresses.
PPCTargetLowering::getPreIndexedAddressParts currently provides
the base part of a memory address in the offset result, and the
offset part in the base result.  That swap is then undone again
when an MI instruction is generated (in PPCDAGToDAGISel::Select
for loads, and using .md Pat patterns for stores).

This patch reverts this double swap, to make common code and
back-end be in sync as to which part of the address is base
and which is offset.

To avoid performance regressions in certain cases, target code
now checks whether the choice of base register would be rejected
for pre-inc accesses by common code, and attempts to swap base
and offset again in such cases.  (Overall, this means that now
pre-inc accesses are generated *more* frequently than before.)

llvm-svn: 177733
2013-03-22 14:58:48 +00:00
Ulrich Weigand d1b99d350c Tighten iaddroff ComplexPattern.
The iaddroff ComplexPattern is supposed to recognize displacement
expressions that have been processed by a SelectAddressRegImm,
which means it needs to accept TargetConstant and TargetGlobalAddress
nodes.  Currently, it erroneously also accepts some other nodes,
in particular Constant and PPCISD::Lo.

While this problem is currently latent, it would cause wrong-code
bugs with a follow-on patch I'm about to commit, so this patch
tightens the ComplexPattern.  The equivalent change is made in
PPCDAGToDAGISel::Select, where pre-inc load patterns are handled
(as opposed to store patterns, the loads are handled in C++ code
without making use of the .td ComplexPattern).

llvm-svn: 177732
2013-03-22 14:58:17 +00:00
Ulrich Weigand e448badbb1 Remove the xaddroff ComplexPattern.
The xaddroff pattern is currently (mistakenly) used to recognize
the *base* register in pre-inc store patterns.  This patch replaces
those uses by ptr_rc_nor0 (as is elsewhere done to match the base
register of an address), and removes the now unused ComplexPattern.

llvm-svn: 177731
2013-03-22 14:57:48 +00:00
Hal Finkel f70c41ea7c Remove the G8RC_NOX0_and_GPRC_NOR0 PPC register class
As Jakob pointed out in his review of r177423, having a shared ZERO
register between the 32- and 64-bit register classes causes this
odd G8RC_NOX0_and_GPRC_NOR0 class to be created. As recommended,
this adds a ZERO8 register which differentiates the 32- and 64-bit
zeros.

No functionality change intended.

llvm-svn: 177683
2013-03-21 23:45:03 +00:00
Hal Finkel 891671afe5 Fix a register-class comparison bug in PPCCTRLoops
Thanks to Jakob for isolating the underlying problem from the
test case in r177423. The original commit had introduced
asymmetric copy operations, but these turned out to be a work-around
to the real problem (the use of == instead of hasSubClassEq in PPCCTRLoops).

llvm-svn: 177679
2013-03-21 23:23:34 +00:00
Hal Finkel 756810fe36 Implement builtin_{setjmp/longjmp} on PPC
This implements SJLJ lowering on PPC, making the Clang functions
__builtin_{setjmp/longjmp} functional on PPC platforms. The implementation
strategy is similar to that on X86, with the exception that a branch-and-link
variant is used to get the right jump address. Credit goes to Bill Schmidt for
suggesting the use of the unconditional bcl form (instead of the regular bl
instruction) to limit return-address-cache pollution.

Benchmarking the speed at -O3 of:

static jmp_buf env_sigill;

void foo() {
                __builtin_longjmp(env_sigill,1);
}

int main() {
	...

        for (int i = 0; i < c; ++i) {
                if (__builtin_setjmp(env_sigill)) {
                        goto done;
                } else {
                        foo();
                }

done:;
        }

	...
}

vs. the same code using the libc setjmp/longjmp functions on a P7 shows that
this builtin implementation is ~4x faster with Altivec enabled and ~7.25x
faster with Altivec disabled. This comparison is somewhat unfair because the
libc version must also save/restore the VSX registers which we don't yet
support.

llvm-svn: 177666
2013-03-21 21:37:52 +00:00
Hal Finkel a1431df540 Add support for spilling VRSAVE on PPC
Although there is only one Altivec VRSAVE register, it is a member of
a register class, and we need the ability to spill it. Because this
register is normally callee-preserved and handled by special code this
has never before been necessary. However, this capability will be required by
a forthcoming commit adding SjLj support.

llvm-svn: 177654
2013-03-21 19:03:21 +00:00
Hal Finkel aa03c03a2d Correct PPC FRAMEADDR lowering using a pseudo-register
The old code used to lower FRAMEADDR tried to replicate the logic in the real
frame-lowering code that determines whether or not the frame pointer (r31) will
be used. When it seemed as though the frame pointer would not be used, the
stack pointer (r1) was used instead. Unfortunately, because the stack size is
not yet known, this does not work. Instead, this change introduces new
always-reserved pseudo-registers (FP and FP8) that are replaced during prologue
insertion with the real frame-pointer register (either r1 or r31).

It is important that this intrinsic always return a valid frame address because
it is used by Clang to store the frame address as part of code generation for
__builtin_setjmp.

llvm-svn: 177653
2013-03-21 19:03:19 +00:00
Ulrich Weigand 01dd4c1a12 Add missing mayLoad flag to LHAUX8 and LWAUX.
All pre-increment load patterns need to set the mayLoad flag (since
they don't provide a DAG pattern).

This was missing for LHAUX8 and LWAUX, which is added by this patch.

llvm-svn: 177431
2013-03-19 19:53:27 +00:00
Ulrich Weigand f8030096b1 Rewrite LHAU8 pattern to use standard memory operand.
As opposed to pre-increment store patterns, the pre-increment
load patterns were already using standard memory operands, with
the sole exception of LHAU8.

As there's no real reason why LHAU8 should be different here,
this patch simply rewrites the pattern to also use a memri
operand, just like all the other patterns.

llvm-svn: 177430
2013-03-19 19:52:30 +00:00
Ulrich Weigand d850167a19 Rewrite pre-increment store patterns to use standard memory operands.
Currently, pre-increment store patterns are written to use two separate
operands to represent address base and displacement:

  stwu $rS, $ptroff($ptrreg)

This causes problems when implementing the assembler parser, so this
commit changes the patterns to use standard (complex) memory operands
like in all other memory access instruction patterns:

  stwu $rS, $dst

To still match those instructions against the appropriate pre_store
SelectionDAG nodes, the patch uses the new feature that allows a Pat
to match multiple DAG operands against a single (complex) instruction
operand.

Approved by Hal Finkel.

llvm-svn: 177429
2013-03-19 19:52:04 +00:00
Ulrich Weigand fd24544ff8 Fix sub-operand size mismatch in tocentry operands.
The tocentry operand class refers to 64-bit values (it is only used in 64-bit,
where iPTR is a 64-bit type), but its sole suboperand is designated as 32-bit
type.  This causes a mismatch to be detected at compile-time with the TableGen
patch I'll check in shortly.

To fix this, this commit changes the suboperand to a 64-bit type as well.

llvm-svn: 177427
2013-03-19 19:50:30 +00:00
Hal Finkel 638a9fa43e Prepare to make r0 an allocatable register on PPC
Currently the PPC r0 register is unconditionally reserved. There are two reasons
for this:

 1. r0 is treated specially (as the constant 0) by certain instructions, and so
    cannot be used with those instructions as a regular register.

 2. r0 is used as a temporary register in the CR-register spilling process
    (where, under some circumstances, we require two GPRs).

This change addresses the first reason by introducing a restricted register
class (without r0) for use by those instructions that treat r0 specially. These
register classes have a new pseudo-register, ZERO, which represents the r0-as-0
use. This has the side benefit of making the existing target code simpler (and
easier to understand), and will make it clear to the register allocator that
uses of r0 as 0 don't conflict with real uses of the r0 register.

Once the CR spilling code is improved, we'll be able to allocate r0.

Adding these extra register classes, for some reason unclear to me, causes
requests to the target to copy 32-bit registers to 64-bit registers. The
resulting code seems correct (and causes no test-suite failures), and the new
test case covers this new kind of asymmetric copy.

As r0 is still reserved, no functionality change intended.

llvm-svn: 177423
2013-03-19 18:51:05 +00:00
Hal Finkel 6681486375 Cleanup PPC64 unaligned i64 load/store
Remove an accidentally-added instruction definition and add a comment in the
test case. This is in response to a post-commit review by Bill Schmidt.

No functionality change intended.

llvm-svn: 177404
2013-03-19 15:23:39 +00:00
Hal Finkel d9e10d51fa Don't reserve R31 on PPC64 unless the frame pointer is needed
llvm-svn: 177379
2013-03-19 08:09:38 +00:00
Hal Finkel fc9aad6436 Fix a sign-extension bug in PPCCTRLoops
Don't sign extend the immediate value from the OR instruction in
an LIS/OR pair.

llvm-svn: 177361
2013-03-18 23:58:28 +00:00
Hal Finkel b09680b0f7 Fix PPC unaligned 64-bit loads and stores
PPC64 supports unaligned loads and stores of 64-bit values, but
in order to use the r+i forms, the offset must be a multiple of 4.
Unfortunately, this cannot always be determined by examining the
immediate itself because it might be available only via a TOC entry.

In order to get around this issue, we additionally predicate the
selection of the r+i form on the alignment of the load or store
(forcing it to be at least 4 in order to select the r+i form).
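
Purely as an illustration (not from the patch), an i64 access whose alignment
is below 4 -- for example through a packed struct -- is the kind of case that
must now avoid the r+i form and use the indexed form instead:

  #include <stdint.h>

  /* Hypothetical example: 'value' is only 1-byte aligned, so the DS-form
     ld/std (whose displacement must be a multiple of 4) cannot be assumed
     to be usable; the indexed (r+r) form is selected instead. */
  struct __attribute__((packed)) rec {
    uint8_t  tag;
    uint64_t value;
  };

  uint64_t get_value(const struct rec *r) {
    return r->value;
  }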

llvm-svn: 177338
2013-03-18 23:00:58 +00:00
Hal Finkel e8f1cf478b Fix 80-col. violations in PPCCTRLoops
llvm-svn: 177296
2013-03-18 17:40:46 +00:00
Hal Finkel 21f2a43ab4 Fix large count and negative constant count handling in PPCCTRLoops
This commit fixes an assert that would occur on loops with large constant counts
(like looping for ((uint32_t) -1) iterations on PPC64). The existing code did
not handle counts that it computed to be negative (asserting instead), but
these can be created with valid inputs.

This bug was discovered by bugpoint while I was attempting to isolate a
completely different problem.

Also, in writing test cases for the negative-count problem, I discovered that
the ori/lis handling was broken (there was a typo which caused the logic that
was supposed to detect these pairs and extract the iteration count to always
fail). This has now also been corrected (and is covered by one of the new test
cases).
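
As a hedged sketch (not the original reproducer), a loop of this shape has a
trip count of (uint32_t)-1, which looks negative when mishandled as a signed
value:

  #include <stdint.h>

  uint64_t spin(void) {
    uint64_t acc = 0;
    /* 4294967295 iterations; interpreting the computed count as a signed
       value yields -1, which the old code asserted on. */
    for (uint32_t i = 0; i != (uint32_t)-1; ++i)
      acc += i;
    return acc;
  }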

llvm-svn: 177295
2013-03-18 17:40:44 +00:00
Hal Finkel 12337e4e7d Cleanup initial-value constants in PPCCTRLoops
Because the initial-value constants had not been added to the list
of instructions considered for DCE the resulting code had redundant
constant-materialization instructions.

llvm-svn: 177294
2013-03-18 17:40:27 +00:00
Sylvestre Ledru 37ef20d307 To avoid symbol clash, undefine PPC here. PPC may be predefined on some hosts.
llvm-svn: 177234
2013-03-17 12:40:42 +00:00
Hal Finkel fcc51d4ff1 Improve PPC VR (Altivec) register spilling
This change cleans up two issues with Altivec register spilling:

  1. The spilling code was inefficient (using two instructions, an add and a
     load, when just one would do)

  2. The code assumed that r0 would always be available (true for now, but this
     will change)

The new code handles VR spilling just like GPR spills but forced into r+r mode.
As a result, when any VR spills are present, we must now always allocate the
register-scavenger spill slot.

llvm-svn: 177231
2013-03-17 04:43:44 +00:00
Hal Finkel 8b0470393b Remove PPC avoidWriteAfterWrite callback
As a follow-up to r158719, remove PPCRegisterInfo::avoidWriteAfterWrite.
Jakob pointed out in response to r158719 that this callback is currently unused
and so this has no effect (and the speedups that I thought that I had observed
as a result of implementing this function must have been noise).

llvm-svn: 177228
2013-03-16 22:50:51 +00:00
Hal Finkel 8d7fbc9dad Enable unaligned memory access on PPC for scalar types
Unaligned access is supported on PPC for non-vector types, and is generally
more efficient than manually expanding the loads and stores.

A few of the existing test cases were using expanded unaligned loads and stores
to test other features (like load/store with update), and for these test cases,
unaligned access remains disabled.

llvm-svn: 177160
2013-03-15 15:27:13 +00:00
Hal Finkel b0fac42987 Protect PPC Altivec patterns with a predicate
In preparation for the addition of other SIMD ISA extensions (such as QPX) we
need to make sure that all Altivec patterns are properly predicated on having
Altivec support.

No functionality change intended (one test case needed to be updated b/c it
assumed that Altivec intrinsics would be supported without enabling Altivec
support).

llvm-svn: 177152
2013-03-15 13:21:21 +00:00
Hal Finkel bb420f10e9 Allocate the RS spill slot for any PPC function with spills and a large stack frame
For spills into a large stack frame, the FI-elimination code uses the register
scavenger to obtain a free GPR for use with an r+r-addressed load or store.
When there are no available GPRs, the scavenger gets one by using its spill
slot. Previously, we were not always allocating that spill slot and the RS
would assert when the spill slot was needed.

I don't currently have a small test that triggered the assert, but I've
created a small regression test that verifies that the spill slot is now
added when the stack frame is sufficiently large.

llvm-svn: 177140
2013-03-15 05:06:04 +00:00
Hal Finkel 5a765fddb0 Provide the register scavenger to processFunctionBeforeFrameFinalized
Add the current PEI register scavenger as a parameter to the
processFunctionBeforeFrameFinalized callback.

This change is necessary in order to allow the PowerPC target code to
set the register scavenger frame index after the save-area offset
adjustments performed by processFunctionBeforeFrameFinalized. Only
after these adjustments have been made is it possible to estimate
the size of the stack frame.

llvm-svn: 177108
2013-03-14 20:33:40 +00:00
Hal Finkel ad26f4ded2 Use frame-index scavenging for PPC register spilling
Make requiresFrameIndexScavenging return true, and create virtual registers in
the spilling code instead of using the register scavenger directly. This makes
the target-level code simpler, and importantly, delays the scavenging until
after callee-saved register processing (which will be important for later
changes).

Also cleans up trackLivenessAfterRegAlloc (makes it inline in the header with
the other related functions). This makes it clear that it always returns true.

No functionality change intended.

llvm-svn: 177107
2013-03-14 20:21:47 +00:00
Hal Finkel e987a311ba Not all PPC functions with a frame pointer need a RS spill slot
We used to add a spill slot for the register scavenger whenever the function
has a frame pointer. This is unnecessarily conservative: We may need the spill
slot for dynamic stack allocations, and functions with dynamic stack
allocations always have a FP, but we might also have a FP for other reasons
(such as the user explicitly disabling frame-pointer elimination), and we don't
necessarily need a spill slot for those functions.

The structsinregs test needed adjustment because it disables FP elimination.

llvm-svn: 177106
2013-03-14 19:34:32 +00:00
Hal Finkel ad92b46505 Add a comment about overlapping PPC frame offsets
I don't think that it is otherwise clear how the overlapping offsets
are processed into distinct spill slots. Comment that this is done
in processFunctionBeforeFrameFinalized.

llvm-svn: 177094
2013-03-14 18:38:31 +00:00
Hal Finkel 01271c6022 Don't reserve R2 on Darwin/PPC
Now that only the register-scavenger version of the CR spilling code remains,
we no longer need the Darwin R2 hack. Darwin can use R0 as a spare register in
any case where the System V ABI uses it (R0 is special architecturally, and so
is reserved under all common ABIs).

A few test cases needed to be updated to reflect the register-allocation changes.

llvm-svn: 176868
2013-03-12 15:18:14 +00:00
Hal Finkel e154c8f23e PPC should always use the register scavenger for CR spilling
This removes the -disable-ppc[32|64]-regscavenger options; the code
that uses the register scavenger has been working well (and has been the default)
for some time, and we don't need options to enable the old (broken) CR spilling code.

llvm-svn: 176865
2013-03-12 14:12:16 +00:00
Benjamin Kramer fdf362bd69 ArrayRefize some code. No functionality change.
llvm-svn: 176648
2013-03-07 20:33:29 +00:00
Michael Liao 6af16fc3b7 Fix PR10475
- ISD::SHL/SRL/SRA must have either both scalar or both vector operands,
  but TLI.getShiftAmountTy() so far only returns a scalar type. As a
  result, backend logic relying on that breaks.
- Rename the original TLI.getShiftAmountTy() to
  TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to
  return the target-specified scalar type or the same vector type as the
  1st operand.
- Fix most TICG logic that assumed TLI.getShiftAmountTy() returns a simple
  scalar type.

llvm-svn: 176364
2013-03-01 18:40:30 +00:00
Bill Schmidt 8ea7af8e44 Fix PR15332 (patch by Florian Zeitz).
There's no need to generate a stack frame for PPC32 SVR4 when there are
no local variables assigned to the stack, i.e., when no red zone is needed.
(PPC64 supports a red zone, but PPC32 does not.)
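
As a hypothetical illustration, a leaf function like the following needs no
locals on the stack, so on PPC32 SVR4 it should now be emitted without any
frame setup at all:

  /* No stack slots are required; with this fix, no stwu/prologue is needed. */
  int add3(int a, int b, int c) {
    return a + b + c;
  }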

llvm-svn: 176124
2013-02-26 21:28:57 +00:00
Bill Schmidt b454829981 Fix missing relocation for TLS addressing peephole optimization.
Report and fix due to Kai Nacke.  Testcase update by me.

llvm-svn: 176029
2013-02-25 16:44:35 +00:00
Bill Schmidt c68c6df884 Fix PR14364.
This removes a const_cast hack from PPCRegisterInfo::hasReservedSpillSlot().
The proper place to save the frame index for the CR spill slot is in the
PPCFunctionInfo object, not the PPCRegisterInfo object.

No new test cases, as this just reimplements existing function.  Existing
tests such as test/CodeGen/PowerPC/crsave.ll are sufficient.

llvm-svn: 175998
2013-02-24 17:34:50 +00:00
Eli Bendersky 8da87163ca Move the eliminateCallFramePseudoInstr method from TargetRegisterInfo
to TargetFrameLowering, where it belongs. Incidentally, this allows us
to delete some duplicated (and slightly different!) code in TRI.

There are potentially other layering problems that can be cleaned up
as a result, or in a similar manner.

The refactoring was OK'd by Anton Korobeynikov on llvmdev.

Note: this touches the target interfaces, so out-of-tree targets may
be affected.

llvm-svn: 175788
2013-02-21 20:05:00 +00:00
Bill Schmidt 836c45badf Trivial cleanup
llvm-svn: 175771
2013-02-21 17:26:05 +00:00
Bill Schmidt 27917785ae Large code model support for PowerPC.
Large code model is identical to medium code model except that the
addis/addi sequence for "local" accesses is never used.  All accesses
use the addis/ld sequence.

The coding changes are straightforward; most of the patch is taken up
with creating variants of the medium model tests for large model.

llvm-svn: 175767
2013-02-21 17:12:27 +00:00
Bill Schmidt 49498dac9d Code review cleanup for r175697
llvm-svn: 175739
2013-02-21 14:35:42 +00:00
Bill Schmidt f5b474c6c6 PPCDAGToDAGISel::PostprocessISelDAG()
This patch implements the PPCDAGToDAGISel::PostprocessISelDAG virtual
method to perform post-selection peephole optimizations on the DAG
representation.

One optimization is implemented here:  folds to clean up complex
addressing expressions for thread-local storage and medium code
model.  It will also be useful for large code model sequences when
those are added later.  I originally thought about doing this on the
MI representation prior to register assignment, but it's difficult to
do effective global dead code elimination at that point.  DCE is
trivial on the DAG representation.

A typical example of a candidate code sequence in assembly:

   addis 3, 2, globalvar@toc@ha
   addi  3, 3, globalvar@toc@l
   lwz   5, 0(3)

When the final instruction is a load or store with an immediate offset
of zero, the offset from the add-immediate can replace the zero,
provided the relocation information is carried along:

   addis 3, 2, globalvar@toc@ha
   lwz   5, globalvar@toc@l(3)

Since the addi can in general have multiple uses, we need to only
delete the instruction when the last use is removed.

llvm-svn: 175697
2013-02-21 00:38:25 +00:00
Bill Schmidt 3822ef2c0c Relocation enablement for PPC DAG postprocessing pass
llvm-svn: 175693
2013-02-21 00:05:29 +00:00
Jim Grosbach 341ad3e72a Update TargetLowering ivars for name policy.
http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

ivars should be camel-case and start with an upper-case letter. A few in
TargetLowering were starting with a lower-case letter.

No functional change intended.

llvm-svn: 175667
2013-02-20 21:13:59 +00:00
Bill Schmidt c6cbecc2c7 Additional fixes for bug 15155.
This handles the cases where the 6-bit splat element is odd, converting
to a three-instruction sequence to add or subtract two splats.  With this
fix, the XFAIL in test/CodeGen/PowerPC/vec_constants.ll is removed.

llvm-svn: 175663
2013-02-20 20:41:42 +00:00
Bill Schmidt 6631e94838 Fix bug 14779 for passing anonymous aggregates [patch by Kai Nacke].
The PPC backend doesn't handle these correctly.  This patch uses logic
similar to that in the X86 and ARM backends to track these arguments
properly.

llvm-svn: 175635
2013-02-20 17:31:41 +00:00
Bill Schmidt 51e7951e24 Fix PR15155: lost vadd/vsplat optimization.
During lowering of a BUILD_VECTOR, we look for opportunities to use a
vector splat.  When the splatted value fits in 5 signed bits, a single
splat does the job.  When it doesn't fit in 5 bits but does fit in 6,
and is an even value, we can splat on half the value and add the result
to itself.

This last optimization hasn't been working recently because of improved
constant folding.  To circumvent this, create a pseudo VADD_SPLAT that
can be expanded during instruction selection.

llvm-svn: 175632
2013-02-20 15:50:31 +00:00
Jakub Staszak 2be3832d50 Add missing #include.
llvm-svn: 175583
2013-02-20 00:31:54 +00:00
Benjamin Kramer de712b788b Make the visibility of LLVMPPCCompilationCallback work with GCC.
GCC warns about the attribute being ignored if it occurs after void*.
There seems to be some kind of incompatibility between clang and gcc here, but
I can't fathom who's right.

void* LLVM_LIBRARY_VISIBILITY foo(); // clang: hidden, gcc: default
LLVM_LIBRARY_VISIBILITY void *bar(); // clang: hidden, gcc: hidden
void LLVM_LIBRARY_VISIBILITY qux();  // clang: hidden, gcc: hidden

llvm-svn: 175394
2013-02-17 14:30:32 +00:00
Rafael Espindola 91cbcbb909 Give these callbacks hidden visibility. It is better to not export them more
than we need to and some ELF linkers complain about directly accessing symbols
with default visibility.

llvm-svn: 175268
2013-02-15 14:15:59 +00:00
Rafael Espindola 9b7d4004bc Don't make assumptions about the mangling of static functions in extern "C"
blocks. We still don't have consensus if we should try to change clang or
the standard, but llvm should work with compilers that implement the current
standard and mangle those functions.

llvm-svn: 175267
2013-02-15 14:08:43 +00:00
Rafael Espindola 8868faac14 Revert r175120 and r175121. Clang is producing the expected asm names again.
llvm-svn: 175133
2013-02-14 03:33:34 +00:00
Rafael Espindola 764993493c Don't asume that a static function in an extern "C" block will not be mangled.
Since functions with internal linkage don't have language linkage, it is valid
to overload them:

extern "C" {
       static int foo();
       static int foo(int);
}

So we mangle them.

llvm-svn: 175120
2013-02-14 01:58:08 +00:00
Krzysztof Parzyszek 2680b53d90 Add registration for PPC-specific passes to allow the IR to be dumped
via -print-after-all.

llvm-svn: 175058
2013-02-13 17:40:07 +00:00
Bill Schmidt 62fe7a5b17 Refine fix to bug 15041.
Thanks to help from Nadav and Hal, I have a more reasonable (and even
correct!) approach.  This specifically penalizes the insertelement
and extractelement operations for the performance hit that will occur
on PowerPC processors.

llvm-svn: 174725
2013-02-08 18:19:17 +00:00
Bill Schmidt b3cece13cf Constrain PowerPC autovectorization to fix bug 15041.
Certain vector operations don't vectorize well with the current
PowerPC implementation.  Element insert/extract performs poorly
without VSX support because Altivec requires going through memory.
SREM, UREM, and VSELECT all produce bad scalar code.

There's a lot of work to do for the cost model before
autovectorization will be tuned well, and this is not an attempt to
address the larger problem.

llvm-svn: 174660
2013-02-07 20:33:57 +00:00
Bill Schmidt ef17c14254 PPC calling convention cleanup.
Most of PPCCallingConv.td is used only by the 32-bit SVR4 ABI.  Rename
things to clarify this.  Also delete some code that's been commented out
for a long time.

llvm-svn: 174526
2013-02-06 17:33:58 +00:00
Jakob Stoklund Olesen 8660a8c0fc Move MRI liveouts to PowerPC return instructions.
llvm-svn: 174409
2013-02-05 18:12:00 +00:00
Jakob Stoklund Olesen bf034dbd32 Avoid using MRI::liveout_iterator for computing VRSAVEs.
The liveout lists are about to be removed from MRI, this is the only
place they were used after register allocation.

Get the live out V registers directly from the return instructions
instead.

llvm-svn: 174399
2013-02-05 17:40:36 +00:00
Benjamin Kramer c35d526489 Disable a couple more vector splat optimizations on PPC.
I didn't see those because the test case used "not grep". FileCheck the test and
XFAIL it, preserving the old optimization, so this can be fixed eventually.

llvm-svn: 174330
2013-02-04 15:52:32 +00:00
Benjamin Kramer 548ffa274a SelectionDAG: Teach FoldConstantArithmetic how to deal with vectors.
This required disabling a PowerPC optimization that did the following:
input:
x = BUILD_VECTOR <i32 16, i32 16, i32 16, i32 16>
lowered to:
tmp = BUILD_VECTOR <i32 8, i32 8, i32 8, i32 8>
x = ADD tmp, tmp

The add now gets folded immediately and we're back at the BUILD_VECTOR we
started from. I don't see a way to fix this currently so I left it disabled
for now.

Fix some trivially foldable X86 tests too.

llvm-svn: 174325
2013-02-04 15:19:18 +00:00
NAKAMURA Takumi 80159432de PPCDarwinAsmPrinter::EmitStartOfAsmFile(): Add checking range in CPUDirectives[].
llvm-svn: 174298
2013-02-04 00:47:38 +00:00
NAKAMURA Takumi 3d591ae0b9 PPCDarwinAsmPrinter::EmitStartOfAsmFile(): Add possible elements in CPUDirectives[].
llvm-svn: 174297
2013-02-04 00:47:33 +00:00
Bill Schmidt cc99a2f61d Add notes about future PowerPC features
llvm-svn: 174232
2013-02-01 23:10:09 +00:00
Bill Schmidt 52742c25ae LLVM enablement for some older PowerPC CPUs
llvm-svn: 174230
2013-02-01 22:59:51 +00:00
Chad Rosier df782d2225 [PEI] Pass the frame index operand number to the eliminateFrameIndex function.
Each target implementation was needlessly recomputing the index.
Part of rdar://13076458

llvm-svn: 174083
2013-01-31 20:02:54 +00:00
Hal Finkel e1df90958d PPC QPX requires a 32-byte aligned stack
On systems which support the QPX vector instructions, the stack must be
32-byte aligned.

llvm-svn: 173993
2013-01-30 23:43:27 +00:00
Hal Finkel b3fc509b23 Initialize hasQPX in PPCSubtarget
This should have gone in with r173973.

llvm-svn: 173984
2013-01-30 22:43:44 +00:00
Hal Finkel efb305e54c Add definitions for the PPC a2q core marked as having QPX available
This is the first commit of a large series which will add support for the
QPX vector instruction set to the PowerPC backend. This instruction set is
used on the IBM Blue Gene/Q supercomputers.

llvm-svn: 173973
2013-01-30 21:17:42 +00:00
Evan Cheng 0e88c7d897 Teach SDISel to combine fsin / fcos into a fsincos node if the following
conditions are met:
1. They share the same operand and are in the same BB.
2. Both outputs are used.
3. The target has a native instruction that maps to ISD::FSINCOS node or
   the target provides a sincos library call.

Implemented the generic optimization in sdisel and enabled it for
Mac OSX. Also added an additional optimization for x86_64 Mac OSX by
using an alternative entry point __sincos_stret which returns the two
results in xmm0 / xmm1.

rdar://13087969
PR13204

llvm-svn: 173755
2013-01-29 02:32:37 +00:00
Hal Finkel 7f9e8d3eaa Add isBGQ method to PPCSubtarget
This function will be used in future commits.

llvm-svn: 173729
2013-01-29 00:22:47 +00:00
Dmitri Gribenko c451bdf9ff Remove unused variables, silences -Wunused-variable
llvm-svn: 173526
2013-01-25 23:17:21 +00:00
Hal Finkel 4e5ca9e578 Initial implementation of PPCTargetTransformInfo
This provides a place to add customized operation cost information and
control some other target-specific IR-level transformations.

The only non-trivial logic in this checkin assigns a higher cost to
unaligned loads and stores (covered by the included test case).

llvm-svn: 173520
2013-01-25 23:05:59 +00:00
Hal Finkel 53f4ba6ce3 More cleanup of PPC register definitions.
Uses the new !add TableGen operator to do more cleanup of the
PPC register definitions.

llvm-svn: 173446
2013-01-25 14:49:10 +00:00
Hal Finkel 41176f43c4 Start cleanup of PPC register definitions using foreach loops.
No functionality change intended.

This captures the first two cases GPR32/64. For the others, we need
an addition operator (if we have one, I've not yet found it).

Based on a suggestion made by Tom Stellard in the AArch64 review!

llvm-svn: 173366
2013-01-24 20:43:18 +00:00
Eli Bendersky f759526983 Fix powerpc test failure - forgot to initialize stack slot size for PPCLinuxMCAsmInfo
llvm-svn: 173275
2013-01-23 17:12:15 +00:00
Eli Bendersky 32aab2216d Clean up assignment of CalleeSaveStackSlotSize: get rid of the default and explicitly set this in every target that needs to change it from the default.
llvm-svn: 173270
2013-01-23 16:22:04 +00:00
Chandler Carruth 1fe21fc0b5 Sort all of the includes. Several files got checked in with mis-sorted
includes.

llvm-svn: 172891
2013-01-19 08:03:47 +00:00
Bill Schmidt dee1ef8f53 This patch fixes PR13626 by providing i128 support in the return
calling convention.  128-bit integers are now properly returned
in GPR3 and GPR4 on PowerPC.
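
A minimal sketch (assuming a compiler with the __int128 extension, as
available on 64-bit PowerPC) of a return that is now split across GPR3 and
GPR4:

  /* Hypothetical example: the 128-bit product is returned in the GPR3/GPR4
     register pair per the calling convention fixed by this patch. */
  __int128 widen_mul(long a, long b) {
    return (__int128)a * (__int128)b;
  }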

llvm-svn: 172745
2013-01-17 19:34:57 +00:00
Bill Schmidt 6b2940b01e This patch fixes the PPC calling convention to handle returns of
_Complex float and _Complex long double, by simply increasing the
number of floating point registers available for return values.

The test case verifies that the correct registers are loaded.
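
A hedged illustration (not the patch's test case) of the kind of return that
now has enough floating-point return registers available:

  #include <complex.h>

  /* Hypothetical example: the real and imaginary parts of the result are
     returned in floating-point registers. */
  float _Complex rotate90(float _Complex z) {
    return z * I;
  }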

llvm-svn: 172733
2013-01-17 17:45:19 +00:00
Adhemerval Zanella 1ae2248e14 PowerPC: EH adjustments
This patch adjusts r171506 to make all DWARF encoding pc-relative
for PPC64. It also adds R_PPC64_REL32 relocation handling in MCJIT
(since the eh_frame will not generate PIC-relative relocations) and
the emission of stubs created by the TTypeEncoding.

llvm-svn: 171979
2013-01-09 17:08:15 +00:00
Eric Christopher e3ab3d0e2c These functions have default arguments of 0 for the last arg. Use
them.

llvm-svn: 171933
2013-01-09 01:57:54 +00:00
Eli Bendersky 4d9ada036c Renamed MCInstFragment to MCRelaxableFragment and added some comments.
No change in functionality.

llvm-svn: 171822
2013-01-08 00:22:56 +00:00
Bill Schmidt 9b1e3e25dc This patch addresses bug 14678 by fixing two problems in medium code model
code generation.  Variables addressed through a GlobalAlias were not being
handled, and variables with available_externally linkage were treated
incorrectly.  The patch contains two new tests to verify the correct code
generation for these cases.

llvm-svn: 171778
2013-01-07 19:29:18 +00:00
Chandler Carruth 664e354de7 Switch TargetTransformInfo from an immutable analysis pass that requires
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.

The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTranformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least one
implementation.

The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but the
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.

The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.

The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.

The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.

The fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.

The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.

Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis which is a more
natural layer for it to live. Second, clients of this interface can
depend on it *always* being available which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits, this one is clearly big enough.

Finally, I'm very aware that much of the comments and documentation
needs to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.

Commits to update DragonEgg and Clang will be made presently.

llvm-svn: 171681
2013-01-07 01:37:14 +00:00
Adhemerval Zanella 9b0b781395 PowerPC: Fix eh_frame relocation for PIC
This patch fixes the PPC eh_frame definitions for the personality and 
frame unwinding for PIC objects. It makes PIC builds correctly create
relative relocations in the '.rela.eh_frame' section, thus avoiding
a text relocation that would generate a DT_TEXTREL segment at link time.

llvm-svn: 171506
2013-01-04 19:08:13 +00:00
Chandler Carruth 9fb823bbd4 Move all of the header files which are involved in modelling the LLVM IR
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.

There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.

The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.

I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).

I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.

llvm-svn: 171366
2013-01-02 11:36:10 +00:00
Bill Wendling 698e84fc4f Remove the Function::getFnAttributes method in favor of using the AttributeSet
directly.

This is in preparation for removing the use of the 'Attribute' class as a
collection of attributes. That will shift to the AttributeSet class instead.

llvm-svn: 171253
2012-12-30 10:32:01 +00:00
Hal Finkel 1b5ff08d43 Expand PPC64 atomic load and store
Use of store or load with the atomic specifier on 64-bit types would
cause instruction-selection failures. As with the 32-bit case, these
can use the default expansion in terms of cmp-and-swap.

llvm-svn: 171072
2012-12-25 17:22:53 +00:00
Rafael Espindola fb8ac2df09 Undefine PPC harder.
This was causing a build failure while trying to build on ppc ubuntu 12.10 with
cmake.

llvm-svn: 170668
2012-12-20 05:13:09 +00:00
Benjamin Kramer c5071466d4 PowerPC: Expand VSELECT nodes.
There's probably a better expansion for those nodes than the default for
altivec, but this is better than crashing. VSELECTs occur in loop vectorizer
output.

llvm-svn: 170551
2012-12-19 15:49:14 +00:00
Bill Wendling 3d7b0b8ac7 Rename the 'Attributes' class to 'Attribute'. It's going to represent a single attribute in the future.
llvm-svn: 170502
2012-12-19 07:18:57 +00:00
Bill Schmidt a4f898448c This patch removes some nondeterminism from direct object file output
for TLS dynamic models on 64-bit PowerPC ELF.  The default sort routine
for relocations only sorts on the r_offset field; but with TLS, there
can be two relocations with the same r_offset.  For PowerPC, this patch
sorts secondarily on descending r_type, which matches the behavior
expected by the linker.

llvm-svn: 170237
2012-12-14 20:28:38 +00:00
Bill Schmidt 9f0b4ec0f5 This patch improves the 64-bit PowerPC InitialExec TLS support by providing
for a wider range of GOT entries that can hold thread-relative offsets.
This matches the behavior of GCC, which was not documented in the PPC64 TLS
ABI.  The ABI will be updated with the new code sequence.

Former sequence:

  ld 9,x@got@tprel(2)
  add 9,9,x@tls

New sequence:

  addis 9,2,x@got@tprel@ha
  ld 9,x@got@tprel@l(9)
  add 9,9,x@tls

Note that a linker optimization exists to transform the new sequence into
the shorter sequence when appropriate, by replacing the addis with a nop
and modifying the base register and relocation type of the ld.

llvm-svn: 170209
2012-12-14 17:02:38 +00:00
Bill Schmidt 9ed4dbcb75 This is another cleanup patch for 64-bit PowerPC TLS processing. I had
some hackery in place that hid my poor use of TblGen, which I've now sorted
out and cleaned up.  No change in observable behavior, so no new test cases.

llvm-svn: 170149
2012-12-13 20:57:10 +00:00
Bill Schmidt 732eb91f05 This is just a clean-up patch that simplifies the initial-exec TLS logic by
avoiding use of machine operand flags.  No change in observable behavior, so
no new test cases.

llvm-svn: 170141
2012-12-13 18:45:54 +00:00
Bill Schmidt 24b8dd6eb7 This patch implements local-dynamic TLS model support for the 64-bit
PowerPC target.  This is the last of the four models, so we now have 
full TLS support.

This is mostly a straightforward extension of the general dynamic model.
I had to use an additional Chain operand to tie ADDIS_DTPREL_HA to the
register copy following ADDI_TLSLD_L; otherwise everything above the
ADDIS_DTPREL_HA appeared dead and was removed.

As before, there are new test cases to test the assembly generation, and
the relocations output during integrated assembly.  The expected code
gen sequence can be read in test/CodeGen/PowerPC/tls-ld.ll.

There are a couple of things I think can be done more efficiently in the
overall TLS code, so there will likely be a clean-up patch forthcoming;
but for now I want to be sure the functionality is in place.

Bill

llvm-svn: 170003
2012-12-12 19:29:35 +00:00
Evan Cheng 962711ee71 Sorry about the churn. One more change to getOptimalMemOpType() hook. Did I
mention the inline memcpy / memset expansion code is a mess?

This patch splits the ZeroOrLdSrc argument into two: IsMemset and ZeroMemset.
The first indicates whether it is expanding a memset or a memcpy / memmove.
The latter is whether the memset is a memset of zero. It's totally possible
(likely even) that targets may want to do different things for memcpy and
memset of zero.

llvm-svn: 169959
2012-12-12 02:34:41 +00:00
Evan Cheng c3d1aca657 - Rename isLegalMemOpType to isSafeMemOpType. "Legal" is a very overloaded term.
Also added more comments to explain why it is generally ok to return true.
- Rename getOptimalMemOpType argument IsZeroVal to ZeroOrLdSrc. It's meant to
be true for loaded source (memcpy) or zero constants (memset). The poor name
choice is probably some kind of legacy issue.

llvm-svn: 169954
2012-12-12 01:32:07 +00:00
Bill Schmidt c56f1d34bc This patch implements the general dynamic TLS model for 64-bit PowerPC.
Given a thread-local symbol x with global-dynamic access, the generated
code to obtain x's address is:

     Instruction                            Relocation            Symbol
  addis ra,r2,x@got@tlsgd@ha           R_PPC64_GOT_TLSGD16_HA       x
  addi  r3,ra,x@got@tlsgd@l            R_PPC64_GOT_TLSGD16_L        x
  bl __tls_get_addr(x@tlsgd)           R_PPC64_TLSGD                x
                                       R_PPC64_REL24           __tls_get_addr
  nop
  <use address in r3>

The implementation borrows from the medium code model work for introducing
special forms of ADDIS and ADDI into the DAG representation.  This is made
slightly more complicated by having to introduce a call to the external
function __tls_get_addr.  Using the full call machinery is overkill and,
more importantly, makes it difficult to add a special relocation.  So I've
introduced another opcode GET_TLS_ADDR to represent the function call, and
surrounded it with register copies to set up the parameter and return value.

Most of the code is pretty straightforward.  I ran into one peculiarity
when I introduced a new PPC opcode BL8_NOP_ELF_TLSGD, which is just like
BL8_NOP_ELF except that it takes another parameter to represent the symbol
("x" above) that requires a relocation on the call.  Something in the 
TblGen machinery causes BL8_NOP_ELF and BL8_NOP_ELF_TLSGD to be treated
identically during the emit phase, so this second operand was never
visited to generate relocations.  This is the reason for the slightly
messy workaround in PPCMCCodeEmitter.cpp:getDirectBrEncoding().

Two new tests are included to demonstrate correct external assembly and
correct generation of relocations using the integrated assembler.

Comments welcome!

Thanks,
Bill

llvm-svn: 169910
2012-12-11 20:30:11 +00:00
Bill Schmidt ca4a0c9dbd This patch introduces initial-exec model support for thread-local storage
on 64-bit PowerPC ELF.

The patch includes code to handle external assembly and MC output with the
integrated assembler.  It intentionally does not support the "old" JIT.

For the initial-exec TLS model, the ABI requires the following to calculate
the address of external thread-local variable x:

 Code sequence            Relocation                  Symbol
  ld 9,x@got@tprel(2)      R_PPC64_GOT_TPREL16_DS      x
  add 9,9,x@tls            R_PPC64_TLS                 x

The register 9 is arbitrary here.  The linker will replace x@got@tprel
with the offset relative to the thread pointer to the generated GOT
entry for symbol x.  It will replace x@tls with the thread-pointer
register (13).
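
For reference, a minimal source-level sketch (hypothetical names) that
produces the sequence above when the initial-exec model applies:

  /* Hypothetical example: an external thread-local variable; its address is
     computed from a GOT entry holding the thread-pointer-relative offset. */
  extern __thread int x;

  int read_x(void) {
    return x;
  }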

The two test cases verify correct assembly output and relocation output
as just described.

PowerPC-specific selection node variants are added for the two
instructions above:  LD_GOT_TPREL and ADD_TLS.  These are inserted
when an initial-exec global variable is encountered by
PPCTargetLowering::LowerGlobalTLSAddress(), and later lowered to
machine instructions LDgotTPREL and ADD8TLS.  LDgotTPREL is a pseudo
that uses the same LDrs support added for medium code model's LDtocL,
with a different relocation type.

The rest of the processing is straightforward.

llvm-svn: 169281
2012-12-04 16:18:08 +00:00
Chandler Carruth 802d755533 Sort includes for all of the .h files under the 'lib' tree. These were
missed in the first pass because the script didn't yet handle include
guards.

Note that the script is now able to handle all of these headers without
manual edits. =]

llvm-svn: 169224
2012-12-04 07:12:27 +00:00
Chandler Carruth ed0881b2a6 Use the new script to sort the includes of every file under lib.
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.

Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]

llvm-svn: 169131
2012-12-03 16:50:05 +00:00
Adhemerval Zanella 812410f2d1 This patch fixes the Altivec addend construction for the fused multiply-add
instruction (vmaddfp) to conform with IEEE, ensuring the correct sign of a zero
result when the resulting product is -0.0.

The -0.0 vector addend to vmaddfp is generated by creating a vector
with all bits set and then shifting each element by 31 bits to the
left, resulting in a vector of 0x80000000 (or -0.0 as float).

The 'buildvec_canonicalize.ll' test was adjusted to reflect this change,
and 'vec_mul.ll' was extended with a float vector multiplication test.

llvm-svn: 168998
2012-11-30 13:05:44 +00:00
Ulrich Weigand 9e159fd7a0 Fix initial frame state on powerpc64.
The createPPCMCAsmInfo routine used PPC::R1 as the initial frame
pointer register, but on PPC64 the 32-bit R1 register does not
have a corresponding DWARF number, causing invalid CIE initial
frame state to be emitted.  Fix by using PPC::X1 instead.

llvm-svn: 168799
2012-11-28 18:21:03 +00:00
Jakob Stoklund Olesen 9de596e650 Remove all references to TargetInstrInfoImpl.
This class has been merged into its super-class TargetInstrInfo.

llvm-svn: 168760
2012-11-28 02:35:17 +00:00
Bill Schmidt e0a68a562b This patch makes medium code model the default for 64-bit PowerPC ELF.
When the CodeGenInfo is to be created for the PPC64 target machine,
a default code-model selection is converted to CodeModel::Medium
provided we are not targeting the Darwin OS.  Defaults for Darwin
are unaffected.

llvm-svn: 168747
2012-11-27 23:36:26 +00:00
Bill Schmidt 34627e3434 This patch implements medium code model support for 64-bit PowerPC.
The default for 64-bit PowerPC is small code model, in which TOC entries
must be addressable using a 16-bit offset from the TOC pointer.  Additionally,
only TOC entries are addressed via the TOC pointer.

With medium code model, TOC entries and data sections can all be addressed
via the TOC pointer using a 32-bit offset.  Cooperation with the linker
allows 16-bit offsets to be used when these are sufficient, reducing the
number of extra instructions that need to be executed.  Medium code model
also does not generate explicit TOC entries in ".section toc" for variables
that are wholly internal to the compilation unit.

Consider a load of an external 4-byte integer.  With small code model, the
compiler generates:

	ld 3, .LC1@toc(2)
	lwz 4, 0(3)

	.section	.toc,"aw",@progbits
.LC1:
	.tc ei[TC],ei

With medium model, it instead generates:

	addis 3, 2, .LC1@toc@ha
	ld 3, .LC1@toc@l(3)
	lwz 4, 0(3)

	.section	.toc,"aw",@progbits
.LC1:
	.tc ei[TC],ei

Here .LC1@toc@ha is a relocation requesting the upper 16 bits of the
32-bit offset of ei's TOC entry from the TOC base pointer.  Similarly,
.LC1@toc@l is a relocation requesting the lower 16 bits.  Note that if
the linker determines that ei's TOC entry is within a 16-bit offset of
the TOC base pointer, it will replace the "addis" with a "nop", and
replace the "ld" with the identical "ld" instruction from the small
code model example.

Consider next a load of a function-scope static integer.  For small code
model, the compiler generates:

	ld 3, .LC1@toc(2)
	lwz 4, 0(3)

	.section	.toc,"aw",@progbits
.LC1:
	.tc test_fn_static.si[TC],test_fn_static.si
	.type	test_fn_static.si,@object
	.local	test_fn_static.si
	.comm	test_fn_static.si,4,4

For medium code model, the compiler generates:

	addis 3, 2, test_fn_static.si@toc@ha
	addi 3, 3, test_fn_static.si@toc@l
	lwz 4, 0(3)

	.type	test_fn_static.si,@object
	.local	test_fn_static.si
	.comm	test_fn_static.si,4,4

Again, the linker may replace the "addis" with a "nop", calculating only
a 16-bit offset when this is sufficient.

Note that it would be more efficient for the compiler to generate:

	addis 3, 2, test_fn_static.si@toc@ha
        lwz 4, test_fn_static.si@toc@l(3)

The current patch does not perform this optimization yet.  This will be
addressed as a peephole optimization in a later patch.

For the moment, the default code model for 64-bit PowerPC will remain the
small code model.  We plan to eventually change the default to medium code
model, which matches current upstream GCC behavior.  Note that the different
code models are ABI-compatible, so code compiled with different models will
be linked and execute correctly.

I've tested the regression suite and the application/benchmark test suite in
two ways:  Once with the patch as submitted here, and once with additional
logic to force medium code model as the default.  The tests all compile
cleanly, with one exception.  The mandel-2 application test fails due to an
unrelated ABI compatibility issue with passing complex numbers.  It just so happens
that small code model was incredibly lucky, in that temporary values in 
floating-point registers held the expected values needed by the external
library routine that was called incorrectly.  My current thought is to correct
the ABI problems with _Complex before making medium code model the default,
to avoid introducing this "regression."

Here are a few comments on how the patch works, since the selection code
can be difficult to follow:

The existing logic for small code model defines three pseudo-instructions:
LDtoc for most uses, LDtocJTI for jump table addresses, and LDtocCPT for
constant pool addresses.  These are expanded by SelectCodeCommon().  The
pseudo-instruction approach doesn't work for medium code model, because
we need to generate two instructions when we match the same pattern.
Instead, new logic in PPCDAGToDAGISel::Select() intercepts the TOC_ENTRY
node for medium code model, and generates an ADDIStocHA followed by either
a LDtocL or an ADDItocL.  These new node types correspond naturally to
the sequences described above.

The addis/ld sequence is generated for the following cases:
 * Jump table addresses
 * Function addresses
 * External global variables
 * Tentative definitions of global variables (common linkage)

The addis/addi sequence is generated for the following cases:
 * Constant pool entries
 * File-scope static global variables
 * Function-scope static variables

Expanding to the two-instruction sequences at select time exposes the
instructions to subsequent optimization, particularly scheduling.

The rest of the processing occurs at assembly time, in
PPCAsmPrinter::EmitInstruction.  Each of the instructions is converted to
a "real" PowerPC instruction.  When a TOC entry needs to be created, this
is done here in the same manner as for the existing LDtoc, LDtocJTI, and
LDtocCPT pseudo-instructions (I factored out a new routine to handle this).

I had originally thought that if a TOC entry was needed for LDtocL or
ADDItocL, it would already have been generated for the previous ADDIStocHA.
However, at higher optimization levels, the ADDIStocHA may appear in a 
different block, which may be assembled textually following the block
containing the LDtocL or ADDItocL.  So it is necessary to include the
possibility of creating a new TOC entry for those two instructions.

Note that for LDtocL, we generate a new form of LD called LDrs.  This
allows specifying the @toc@l relocation for the offset field of the LD
instruction (i.e., the offset is replaced by a SymbolLo relocation).
When the peephole optimization described above is added, we will need
to do similar things for all immediate-form load and store operations.

The seven "mcm-n.ll" test cases are kept separate because otherwise the
intermingling of various TOC entries and so forth makes the tests fragile
and hard to understand.

The above assumes use of an external assembler.  For use of the
integrated assembler, new relocations are added and used by
PPCELFObjectWriter.  Testing is done with "mcm-obj.ll", which tests for
proper generation of the various relocations for the same sequences
tested with the external assembler.

llvm-svn: 168708
2012-11-27 17:35:46 +00:00
Benjamin Kramer ebf576d31d Decouple MCInstBuilder from the streamer per Eli's request.
llvm-svn: 168597
2012-11-26 18:05:52 +00:00
Benjamin Kramer 4e629f7964 Add MCInstBuilder, a utility class to simplify MCInst creation similar to MachineInstrBuilder.
Simplify some repetitive code with it. No functionality change.

llvm-svn: 168587
2012-11-26 13:34:22 +00:00
Benjamin Kramer 1694f91266 PPC: Reinstate the fatal error when trying to emit a macho file.
llvm-svn: 168543
2012-11-24 15:23:49 +00:00
Benjamin Kramer dd76a93af5 PPC: MCize most of the darwin PIC emission.
The last remaining bit is "bcl 20, 31, AnonSymbol", which I couldn't find the
instruction definition for. Only whitespace changes in assembly output.

llvm-svn: 168541
2012-11-24 13:18:25 +00:00
Benjamin Kramer 8abfec3967 PPC: Share applyFixup between ELF and Darwin.
llvm-svn: 168540
2012-11-24 13:18:17 +00:00
Benjamin Kramer 69dc3fe461 PPC: Simplify code with Twines.
No functionality change.

llvm-svn: 168539
2012-11-24 13:18:11 +00:00
Joe Abbey cceda898b8 Using const cast to alleviate a warning.
A PR is being filed to address some code issues here.

llvm-svn: 168185
2012-11-16 19:38:42 +00:00
Adhemerval Zanella bdface5699 PowerPC: Lowering floor intrinsic for Altivec
This patch lowers the llvm.floor, llvm.ceil, llvm.trunc, and
llvm.nearbyint to Altivec instruction when using 4 single-precision
float vectors.
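
For illustration (a hedged sketch, not one of the patch's tests), a loop like
the following can be vectorized to llvm.floor.v4f32, which now lowers to the
corresponding Altivec rounding instruction instead of being scalarized:

  #include <math.h>

  /* Hypothetical example: element-wise floor over single-precision data. */
  void floor_all(float *dst, const float *src, int n) {
    for (int i = 0; i < n; ++i)
      dst[i] = floorf(src[i]);
  }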

llvm-svn: 168086
2012-11-15 20:56:03 +00:00
Craig Topper c8a2adf1ca Make a bunch of floating point operations on vectors Expand so that instruction selection won't fail.
llvm-svn: 168028
2012-11-15 08:02:19 +00:00
Craig Topper 61d045781a Add llvm.ceil, llvm.trunc, llvm.rint, llvm.nearbyint intrinsics.
llvm-svn: 168025
2012-11-15 06:51:10 +00:00
Craig Topper c4343f2c45 Set FFLOOR of vectors to expand to keep instruction selection from failing.
llvm-svn: 167922
2012-11-14 08:11:25 +00:00
Ulrich Weigand 8557850053 Add (some) PowerPC TLS relocation types to ELF.h and
generate them from PPCELFObjectWriter::getRelocTypeInner
as appropriate.

llvm-svn: 167864
2012-11-13 19:24:36 +00:00
Ulrich Weigand 0f79500af5 Fix wrong PowerPC instruction opcodes for:
- lwaux
 - lhzux
 - stbu

llvm-svn: 167863
2012-11-13 19:21:31 +00:00
Ulrich Weigand a82389b3d0 Fix wrong PowerPC instruction encodings due to
operand field name mismatches in:
 - AForm_3  (fmul, fmuls)
 - XFXForm_5 (mtcrf)
 - XFLForm (mtfsf)

llvm-svn: 167862
2012-11-13 19:19:46 +00:00
Ulrich Weigand 0117718580 Fix instruction encoding for "bd(n)z" on PowerPC,
by using a new instruction format BForm_1.

llvm-svn: 167861
2012-11-13 19:15:52 +00:00
Ulrich Weigand 84ee76acfe Fix instruction encoding for "isel" on PowerPC,
using a new instruction format AForm_4.

llvm-svn: 167860
2012-11-13 19:14:19 +00:00
Ulrich Weigand 2c93acdfbf Make TOC order deterministic by using MapVector instead of DenseMap.
llvm-svn: 167737
2012-11-12 19:13:24 +00:00
Ulrich Weigand 339d0597d3 On PowerPC64, integer return values (as well as arguments) are supposed
to be extended to a full register.   This is modeled in the IR by marking
the return value (or argument) with a signext or zeroext attribute.

However, while these attributes are respected for function arguments,
they are currently ignored for function return values by the PowerPC
back-end.  This patch updates PPCCallingConv.td to ask for the promotion
to i64, and fixes LowerReturn and LowerCallResult to implement it.

The new test case verifies that both arguments and return values are
properly extended when passing them; and also that the optimizers
understand incoming argument and return values are in fact guaranteed
by the ABI to be extended.
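
A hedged illustration (not part of the patch) of why the guarantee matters to
callers:

  /* Hypothetical example: because the ABI guarantees the returned i32 is
     already sign-extended to 64 bits in the return register, the caller can
     use it as a 64-bit quantity without re-extending it. */
  int get_delta(void);

  long widen(void) {
    return get_delta();
  }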

The patch caused a spurious breakage in CodeGen/PowerPC/coalesce-ext.ll,
since the test case used a "ret" instruction to create a use of an i32
value at the end of the function (to set up data flow as required for
what the test is intended to test).  Since there's now an implicit
promotion to i64, that data flow no longer works as expected.  To fix
this, this patch now adds an extra "add" to ensure we have an appropriate
use of the i32 value.

llvm-svn: 167396
2012-11-05 19:39:45 +00:00
Hal Finkel 4f24c621d9 Add support for the PowerPC-specific inline asm Z constraint and y modifier.
The Z constraint specifies an r+r memory address, and the y modifier expands
to the "r, r" in the asm string. For this initial implementation, the base
register is forced to r0 (which has the special meaning of 0 for r+r addressing
on PowerPC) and the full address is taken in the second register. In the
future, this should be improved.
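
A minimal usage sketch (hedged; written from the description above, not taken
from the patch's tests) of the Z constraint together with the y modifier:

  #include <stdint.h>

  /* Hypothetical example: a byte-reversed load.  "Z" requests an r+r memory
     operand and "%y1" expands to the "r, r" register pair expected by the
     indexed-form instruction. */
  static inline uint32_t load_le32(const uint32_t *p) {
    uint32_t v;
    __asm__("lwbrx %0, %y1" : "=r"(v) : "Z"(*p));
    return v;
  }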

llvm-svn: 167388
2012-11-05 18:18:42 +00:00
Adhemerval Zanella c4182d1890 [PATCH] PowerPC: Expand load extend vector operations
This patch expands the SEXTLOAD, ZEXTLOAD, and EXTLOAD operations for
vector types when altivec is enabled.

llvm-svn: 167386
2012-11-05 17:15:56 +00:00
Chandler Carruth 5da3f0512e Revert the majority of the next patch in the address space series:
r165941: Resubmit the changes to llvm core to update the functions to
         support different pointer sizes on a per address space basis.

Despite this commit log, this change primarily changed stuff outside of
VMCore, and those changes do not carry any tests for correctness (or
even plausibility), and we have consistently found questionable or flat
out incorrect cases in these changes. Most of them are probably correct,
but we need to devise a system that makes it more clear when we have
handled the address space concerns correctly, and ideally each pass that
gets updated would receive an accompanying test case that exercises that
pass specifically w.r.t. alternate address spaces.

However, from this commit, I have retained the new C API entry points.
Those were an orthogonal change that probably should have been split
apart, but they seem entirely good.

In several places the changes were very obvious cleanups with no actual
multiple address space code added; these I have not reverted when
I spotted them.

In a few other places there were merge conflicts due to a cleaner
solution being implemented later, often not using address spaces at all.
In those cases, I've preserved the new code which isn't address space
dependent.

This is part of my ongoing effort to clean out the partial address space
code which carries high risk and low test coverage, and not likely to be
finished before the 3.2 release looms closer. Duncan and I would both
like to see the above issues addressed before we return to these
changes.

llvm-svn: 167222
2012-11-01 09:14:31 +00:00
Chandler Carruth 7ec5085e01 Revert the series of commits starting with r166578 which introduced the
getIntPtrType support for multiple address spaces via a pointer type,
and also introduced a crasher bug in the constant folder reported in
PR14233.

These commits also contained several problems that should really be
addressed before they are re-committed. I have avoided reverting various
cleanups to the DataLayout APIs that are reasonable to have moving
forward in order to reduce the amount of churn, and minimize the number
of commits that were reverted. I've also manually updated merge
conflicts and manually arranged for the getIntPtrType function to stay
in DataLayout and to be defined in a plausible way after this revert.

Thanks to Duncan for working through this exact strategy with me, and
Nick Lewycky for tracking down the really annoying crasher this
triggered. (Test case to follow in its own commit.)

After discussing with Duncan extensively, and based on a note from
Micah, I'm going to continue to back out some more of the more
problematic patches in this series in order to ensure we go into the
LLVM 3.2 branch with a reasonable story here. I'll send a note to
llvmdev explaining what's going on and why.

Summary of reverted revisions:

r166634: Fix a compiler warning with an unused variable.
r166607: Add some cleanup to the DataLayout changes requested by
         Chandler.
r166596: Revert "Back out r166591, not sure why this made it through
         since I cancelled the command. Bleh, sorry about this!
r166591: Delete a directory that wasn't supposed to be checked in yet.
r166578: Add in support for getIntPtrType to get the pointer type based
         on the address space.
llvm-svn: 167221
2012-11-01 08:07:29 +00:00
Bill Schmidt 9953cf294b This patch addresses an ABI compatibility issue with empty aggregate
parameters.  Examples of these are:

  struct { } a;
  union { } b[256];
  int a[0];

An empty aggregate has an address, although dereferencing that address is
pointless.  When passed as a parameter, an empty aggregate does not consume
a protocol register, nor does it consume a doubleword in the parameter save
area.  Passing an empty aggregate by reference passes an address just as
for any other aggregate.  Returning an empty aggregate uses GPR3 as a hidden
address of the return value location, just as for any other aggregate.
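
A hedged C sketch of the calling-convention consequence (zero-sized aggregates are a GNU C extension; the names are made up):

    struct empty { };   /* sizeof(struct empty) == 0 as a GNU extension */

    /* 'e' consumes neither a protocol register nor a doubleword in the
       parameter save area, so 'a' should still arrive in r3 and 'b' in
       r4 under the 64-bit SVR4 ABI described above. */
    long callee(long a, struct empty e, long b) { return a + b; }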

The patch modifies PPCTargetLowering::LowerFormalArguments_64SVR4 and
PPCTargetLowering::LowerCall_64SVR4 to properly skip empty aggregate
parameters passed by value.  The handling of return values and by-reference
parameters was already correct.

Built on powerpc64-unknown-linux-gnu and tested with no new regressions.
A test case is included to test proper handling of empty aggregate
parameters on both sides of the function call protocol.

llvm-svn: 167090
2012-10-31 01:15:05 +00:00
Adhemerval Zanella 5c043aeb1b PowerPC: Expand FSQRT for vector types
This patch expands FSQRT for floating point vector types when altivec is
used.

llvm-svn: 167034
2012-10-30 18:29:42 +00:00
Adhemerval Zanella 56775e0f13 PowerPC: More support for Altivec compare operations
This patch adds more support for vector type comparisons using altivec.
It adds correct support for v16i8, v8i16, v4i32, and v4f32 vector
types for comparison operators ==, !=, >, >=, <, and <=.

llvm-svn: 167015
2012-10-30 13:50:19 +00:00
Bill Schmidt bd4ac26973 This patch solves a problem with passing varargs parameters under the PPC64
ELF ABI.

A varargs parameter consisting of a single-precision floating-point value,
or of a single-element aggregate containing a single-precision floating-point
value, must be passed in the low-order (rightmost) four bytes of the
doubleword stack slot reserved for that parameter.  If there are GPR protocol
registers remaining, the parameter must also be mirrored in the low-order
four bytes of the reserved GPR.

Prior to this patch, such parameters were being passed in the high-order
four bytes of the stack slot and the mirrored GPR.
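
A small illustrative C case (hypothetical names) in which a single-element float aggregate reaches a variadic callee and must sit in the low-order four bytes of its doubleword:

    #include <stdarg.h>

    struct sf { float f; };        /* single-element float aggregate */

    float take_vararg(int n, ...) {
      va_list ap;
      va_start(ap, n);
      struct sf s = va_arg(ap, struct sf);
      va_end(ap);
      /* On big-endian PPC64, s.f must have been written to the rightmost
         (low-order) word of its doubleword slot and mirrored into the
         low half of the corresponding GPR, per the fix above. */
      return s.f;
    }

    float caller(void) { struct sf s = { 1.0f }; return take_vararg(1, s); }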

The patch adds a new test case to verify the correct code generation.

llvm-svn: 166968
2012-10-29 21:18:16 +00:00
Ulrich Weigand 0de4a1e4ae Allow i32/i64 for 'f' constraint on PowerPC.
This fixes PR12757.

llvm-svn: 166943
2012-10-29 17:49:34 +00:00
NAKAMURA Takumi 4bd79920be PPCSubtarget.h: Add explicit braces.
llvm-svn: 166932
2012-10-29 15:51:42 +00:00
NAKAMURA Takumi 70b25de24e PPCSubtarget.h: Whitespace.
llvm-svn: 166931
2012-10-29 15:51:35 +00:00
Bill Schmidt bbc661e572 This patch adds alignment information for long double to the 64-bit PowerPC
ELF subtarget.

The existing logic is used as a fallback to avoid any changes to the Darwin
ABI.  PPC64 ELF now has two possible data layout strings: one for FreeBSD,
which requires 8-byte alignment, and a default string that requires
16-byte alignment.
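
A quick hedged check of the new alignment when compiling for a powerpc64-linux target:

    #include <stdio.h>

    int main(void) {
      /* With the default (non-FreeBSD) PPC64 ELF data layout described
         above, this is expected to print 16; with the FreeBSD string it
         stays 8. */
      printf("alignof(long double) = %zu\n",
             (size_t)__alignof__(long double));
      return 0;
    }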

I've added a test for PPC64 Linux to verify the 16-byte alignment.  If
somebody wants to add a separate test for FreeBSD, that would be great.

Note that there is a companion patch to update the alignment information
in Clang, which I am committing now as well.

llvm-svn: 166928
2012-10-29 14:59:36 +00:00
Adhemerval Zanella 0f9cff1ab8 PowerPC: Fix for rldcl/rldicl/rldicr MC emission
This patch fixes the rldcl/rldicl/rldicr instruction emission. The issue is
that the MDForm_1 instruction format defines the PowerISA MB field of 'rldicl'
under the name MBE, while the RLDCL/RLDICL/RLDICR definitions use 'MB'.

It ended up generating the 'rldicl' encoding in
'lib/Target/PowerPC/PPCGenMCCodeEmitter.inc' with the fourth argument used as
the third. The patch adjusts this so the fourth argument is used as
intended.

Fixes PR14180.

llvm-svn: 166770
2012-10-26 12:09:58 +00:00
Adhemerval Zanella 1be10dc732 This patch fixes the MC object emission of 'nop' for external function calls
and also fixes the R_PPC64_TOC16 and R_PPC64_TOC16_DS relocation offset.
The 'nop' is needed so a TOC restore instruction (ld r2,40(r1)) can be placed
by the linker to correctly restore the TOC of the previous function.

The current code has two issues: PPCInstr64Bit.td defines LDinto_toc
and LDtoc_restore as DSForm_1 with DS_RA=0 where it should be
DS=2 (the 8-byte displacement of the TOC save). It also wrongly emits an
MC instruction using a uint32_t value while PPC::BL8_NOP_ELF
and PPC::BLA8_NOP_ELF are both uint64_t (because of the following 'nop').

This patch corrects the remaining ExecutionEngine using MCJIT:

ExecutionEngine/2002-12-16-ArgTest.ll
ExecutionEngine/2003-05-07-ArgumentTest.ll
ExecutionEngine/2005-12-02-TailCallBug.ll
ExecutionEngine/hello.ll
ExecutionEngine/hello2.ll
ExecutionEngine/test-call.ll

llvm-svn: 166682
2012-10-25 14:29:13 +00:00
Bill Schmidt 6ed3b99f43 This patch addresses a PPC64 ELF issue with passing parameters consisting of
structs having size 3, 5, 6, or 7.  Such a struct must be passed and received
as right-justified within its register or memory slot.  The problem is only
present for structs that are passed in registers.

Previously, as part of a patch handling all structs of size less than 8, I
added logic to rotate the incoming register so that the struct was left-
justified prior to storing the whole register.  This was incorrect because
the address of the parameter had already been adjusted earlier to point to
the right-adjusted value in the storage slot.  Essentially I had accidentally
accounted for the right-adjustment twice.

In this patch, I removed the incorrect logic and reorganized the code to make
the flow clearer.
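
For illustration, a hedged C sketch of the kind of parameter involved (the names are made up):

    /* A 5-byte aggregate: when passed in a GPR under the 64-bit ELF ABI
       it must occupy the low-order (right-justified) bytes of the
       register, and the callee's parameter address already points at
       that right-adjusted position in the doubleword slot, which is why
       the extra rotate amounted to adjusting twice. */
    struct five { char c[5]; };

    char pick(struct five v) { return v.c[4]; }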

The removal of the rotates changes the expected code generation, so test case
structsinregs.ll has been modified to reflect this.  I also added a new test
case, jaggedstructs.ll, to demonstrate that structs of these sizes can now
be properly received and passed.

I've built and tested the code on powerpc64-unknown-linux-gnu with no new
regressions.  I also ran the GCC compatibility test suite and verified that
earlier problems with these structs are now resolved, with no new regressions.

llvm-svn: 166680
2012-10-25 13:38:09 +00:00
Adhemerval Zanella f2aceda854 Initial TOC support for PowerPC64 object creation
This patch adds initial PPC64 TOC MC object creation using the small mcmodel
(a single 64K TOC), adding some TOC relocations (R_PPC64_TOC,
R_PPC64_TOC16, and R_PPC64_TOC16_DS).

The addition of the 'undefinedExplicitRelSym' hook on 'MCELFObjectTargetWriter'
is meant to avoid the creation of an unreferenced ".TOC." symbol (used in
the .opd creation) as well as to set the R_PPC64_TOC relocation target to the
temporary ".TOC." symbol. In the PPC64 ABI, the R_PPC64_TOC relocation should
not point to any symbol.

llvm-svn: 166677
2012-10-25 12:27:42 +00:00
Nadav Rotem 2289f2c932 Implement a basic VectorTargetTransformInfo interface to be used by the loop and bb vectorizers for modeling the cost of instructions.
llvm-svn: 166593
2012-10-24 17:22:41 +00:00
Micah Villmow 12d9127833 Add in support for getIntPtrType to get the pointer type based on the address space.
This checkin also adds in some tests that utilize these paths and updates some of the
clients.

llvm-svn: 166578
2012-10-24 15:52:52 +00:00
Bill Schmidt 57d6de5fd9 This is another TLC patch for separating code for the Darwin and ELF ABIs
for the PowerPC target, and factoring the results.  This will ease future
maintenance of both subtargets.

PPCTargetLowering::LowerCall_Darwin_Or_64SVR4() has grown a lot of special-case
code for the different ABIs, making maintenance difficult.  This is getting
worse as we repair errors in the 64-bit ELF ABI implementation, while avoiding
changes to the Darwin ABI logic.  This patch splits the routine into
LowerCall_Darwin() and LowerCall_64SVR4(), allowing both versions to be
significantly simplified.  I've factored out chunks of similar code where it
made sense to do so.  I also performed similar factoring on
LowerFormalArguments_Darwin() and LowerFormalArguments_64SVR4().

There are no functional changes in this patch, and therefore no new test
cases have been developed.

Built and tested on powerpc64-unknown-linux-gnu with no new regressions.

llvm-svn: 166480
2012-10-23 15:51:16 +00:00
Nadav Rotem 5dc203e8f4 Reapply the TargerTransformInfo changes, minus the changes to LSR and Lowerinvoke.
llvm-svn: 166248
2012-10-18 23:22:48 +00:00
Ulrich Weigand d34b5bd610 This patch fixes failures in the SingleSource/Regression/C/uint64_to_float
test case on PowerPC caused by rounding errors when converting from a 64-bit
integer to a single-precision floating point. The reason for this are
double-rounding effects, since on PowerPC we have to convert to an
intermediate double-precision value first, which gets rounded to the
final single-precision result.

The patch fixes the problem by preparing the 64-bit integer so that the
first conversion step to double-precision will always be exact, and the
final rounding step will result in the correctly-rounded single-precision
result.  The generated code sequence is equivalent to what GCC would generate.

When -enable-unsafe-fp-math is in effect, that extra effort is omitted
and we accept possible rounding errors (just like GCC does as well).
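
A hedged C illustration of the double-rounding hazard; the constant is chosen so the two rounding steps disagree:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
      /* 2^53 + 2^29 + 1: the correctly rounded float is 2^53 + 2^30,
         but rounding through double first gives 2^53 + 2^29, which then
         rounds down to 2^53, a one-ulp error. */
      uint64_t x = 0x20000020000001ULL;
      float direct    = (float)x;           /* correctly rounded (ideally) */
      float two_steps = (float)(double)x;   /* double-rounded */
      printf("%.1f vs %.1f\n", direct, two_steps);
      return 0;
    }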

llvm-svn: 166178
2012-10-18 13:16:11 +00:00
Bob Wilson d6d9ccca38 Temporarily revert the TargetTransform changes.
The TargetTransform changes are breaking LTO bootstraps of clang.  I am
working with Nadav to figure out the problem, but I am reverting it for now
to get our buildbots working.

This reverts svn commits: 165665 165669 165670 165786 165787 165997
and I have also reverted clang svn 165741

llvm-svn: 166168
2012-10-18 05:43:52 +00:00
Bill Schmidt 48081cad0d This patch addresses PR13949.
For the PowerPC 64-bit ELF Linux ABI, aggregates of size less than 8
bytes are to be passed in the low-order bits ("right-adjusted") of the
doubleword register or memory slot assigned to them.  A previous patch
addressed this for aggregates passed in registers.  However, small
aggregates passed in the overflow portion of the parameter save area are
still being passed left-adjusted.

The fix is made in PPCTargetLowering::LowerCall_Darwin_Or_64SVR4 on the
caller side, and in PPCTargetLowering::LowerFormalArguments_64SVR4 on
the callee side.  The main fix on the callee side simply extends
existing logic for 1- and 2-byte objects to 1- through 7-byte objects,
and corrects a constant left over from 32-bit code.  There is also a
fix to a bogus calculation of the offset to the following argument in
the parameter save area.

On the caller side, again a constant left over from 32-bit code is
fixed.  Additionally, some code for 1, 2, and 4-byte objects is
duplicated to handle the 3, 5, 6, and 7-byte objects for SVR4 only.  The
LowerCall_Darwin_Or_64SVR4 logic is getting fairly convoluted trying to
handle both ABIs, and I propose to separate this into two functions in a
future patch, at which time the duplication can be removed.

The patch adds a new test (structsinmem.ll) to demonstrate correct
passing of structures of all seven sizes.  Eight dummy parameters are
used to force these structures to be in the overflow portion of the
parameter save area.

As a side effect, this corrects the case when aggregates passed in
registers are saved into the first eight doublewords of the parameter
save area:  Previously they were stored left-justified, and now are
properly stored right-justified.  This requires changing the expected
output of existing test case structsinregs.ll.

llvm-svn: 166022
2012-10-16 13:30:53 +00:00
Micah Villmow 4bb926d91d Resubmit the changes to llvm core to update the functions to support different pointer sizes on a per address space basis.
llvm-svn: 165941
2012-10-15 16:24:29 +00:00
Adhemerval Zanella ef206f19a4 PowerPC: add EmitTCEntry class for TOC creation
This patch replaces EmitRawText with an EmitTCEntry class (specialized for
each Streamer) in PowerPC64 TOC entry creation.

llvm-svn: 165940
2012-10-15 15:43:14 +00:00
Micah Villmow 0c61134d8d Revert 165732 for further review.
llvm-svn: 165747
2012-10-11 21:27:41 +00:00
Micah Villmow 083189730e Add in the first iteration of support for llvm/clang/lldb to allow variable per address space pointer sizes to be optimized correctly.
llvm-svn: 165726
2012-10-11 17:21:41 +00:00
Bill Schmidt 22162470ba This patch addresses PR13947.
For function calls on the 64-bit PowerPC SVR4 target, each parameter
is mapped to as many doublewords in the parameter save area as
necessary to hold the parameter.  The first 13 non-varargs
floating-point values are passed in registers; any additional
floating-point parameters are passed in the parameter save area.  A
single-precision floating-point parameter (32 bits) must be mapped to
the second (rightmost, low-order) word of its assigned doubleword
slot.

Currently LLVM violates this ABI requirement by mapping such a
parameter to the first (leftmost, high-order) word of its assigned
doubleword slot.  This is internally self-consistent but will not
interoperate correctly with libraries compiled with an ABI-compliant
compiler.

This patch corrects the problem by adjusting the parameter addressing
on both sides of the calling convention.
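
A hedged example of a call that exercises the rule (names are hypothetical): the fourteenth float has no FPR left and lands in the parameter save area, where it must occupy the low-order word of its doubleword.

    float f14(float a1, float a2, float a3, float a4, float a5,
              float a6, float a7, float a8, float a9, float a10,
              float a11, float a12, float a13, float a14) {
      /* a1..a13 arrive in FPRs; a14 is read back from the rightmost
         four bytes of its doubleword slot in the parameter save area. */
      return a14;
    }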

llvm-svn: 165714
2012-10-11 15:38:20 +00:00
Nadav Rotem e10328737d Add a new interface to allow IR-level passes to access codegen-specific information.
llvm-svn: 165665
2012-10-10 22:04:55 +00:00
Bill Schmidt b9bc47409d When generating spill and reload code for vector registers on PowerPC,
the compiler makes use of GPR0.  However, there are two flavors of
GPR0 defined by the target:  the 32-bit GPR0 (R0) and the 64-bit GPR0
(X0).  The spill/reload code makes use of R0 regardless of whether we
are generating 32- or 64-bit code.

This patch corrects the problem in the obvious manner, using X0 and
ADDI8 for 64-bit and R0 and ADDI for 32-bit.

llvm-svn: 165658
2012-10-10 21:25:01 +00:00
Bill Schmidt 38d9458720 The PowerPC VRSAVE register has been somewhat of an odd beast since
the Altivec extensions were introduced.  Its use is optional, and
allows the compiler to communicate to the operating system which
vector registers should be saved and restored during a context switch.
In practice, this information is ignored by the various operating
systems using the SVR4 ABI; the kernel saves and restores the entire
register state.  Setting the VRSAVE register is no longer performed by
the AIX XL compilers, the IBM i compilers, or by GCC on Power Linux
systems.  It seems best to avoid this logic within LLVM as well.

This patch avoids generating code to update and restore VRSAVE for the
PowerPC SVR4 ABIs (32- and 64-bit).  The code remains in place for the
Darwin ABI.

llvm-svn: 165656
2012-10-10 20:54:15 +00:00
Bill Wendling c9b22d735a Create enums for the different attributes.
We use the enums to query whether an Attributes object has that attribute. The
opaque layer is responsible for knowing where that specific attribute is stored.

llvm-svn: 165488
2012-10-09 07:45:08 +00:00
Adhemerval Zanella fe3f793cec PR12716: PPC crashes on vector compare
Vector compares using the altivec 'vcmpxxx' instructions take a vector
register as their third argument instead of a CR field, unlike integer and
floating-point compares. This leads to a failure in code generation, where
'SelectSETCC' expects a DAG node with a CR register and gets a vector
register instead.

This patch changes the behavior by just returning a DAG with the 
vector compare instruction based on the type. The patch also adds a testcase
for all vector types llvm defines.

It also includes a fix for printing signed 5-bit predicates, where signed
values were not handled correctly as signed (char is unsigned by default on
PowerPC). This generated 'vspltisw' (vector splat) instructions with the SIM
field out of range.
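
For context, a hedged AltiVec C snippet of the kind of comparison involved (compile with -maltivec; the names are illustrative):

    #include <altivec.h>

    /* vec_cmpgt on v4i32 lowers to a 'vcmpgtsw' whose result lives in a
       vector register rather than a CR field; vec_splat_s32 takes a
       signed 5-bit literal (-16..15), i.e. the SIM field noted above. */
    vector bool int cmp_example(vector signed int a) {
      vector signed int b = vec_splat_s32(-3);
      return vec_cmpgt(a, b);
    }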

llvm-svn: 165419
2012-10-08 18:59:53 +00:00
Adhemerval Zanella 22b9fd2f2e PowerPC: Fix object creation with PPC::MTCRF8 instruction
llvm-svn: 165411
2012-10-08 18:25:11 +00:00
Adhemerval Zanella 5c6e08435e Add floating-point to and from integer conversion
This patch adds altivec support for v4i32 to v4f32 and for v4f32 to
v4i32 vector rounding conversion.

llvm-svn: 165409
2012-10-08 17:27:24 +00:00
Micah Villmow cdfe20b97f Move TargetData to DataLayout.
llvm-svn: 165402
2012-10-08 16:38:25 +00:00
Bill Schmidt d1fa36f903 This patch splits apart PPCISelLowering::LowerFormalArguments_Darwin_Or_64SVR4
into separate versions for the Darwin and 64-bit SVR4 ABIs.  This will
facilitate doing more major surgery on the 64-bit SVR4 ABI in the near future.

llvm-svn: 165336
2012-10-05 21:27:08 +00:00
Will Schmidt 314c6c4c2b - Mark the BCC and BLR defs as isCodeGenOnly per error output from
llvm-tblgen -gen-asm-matcher.

 PPCInstrInfo.td |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

llvm-svn: 165315
2012-10-05 15:16:11 +00:00
Will Schmidt 4a67f2e2a7 - add tokens to PPCInstrInfo.td and PPCInstr64Bit.td to resolve
"Instruction 'foo' has no tokens" errors during llvm-tblgen
-gen-asm-matcher attempts.  At this time, the added
tokens are "#comment" style rather than the actual mnemonic.  This will
be revisited once the rest of the base asmparser bits get straightened
out for ppc64-elf-linux.

llvm-svn: 165237
2012-10-04 18:14:28 +00:00
Will Schmidt 2247f8afbd test commit / whitespace
llvm-svn: 165233
2012-10-04 16:20:24 +00:00
Sylvestre Ledru 91ce36c986 Revert 'Fix a typo 'iff' => 'if''. iff is an abbreviation of if and only if. See: http://en.wikipedia.org/wiki/If_and_only_if Commit 164767
llvm-svn: 164768
2012-09-27 10:14:43 +00:00
Sylvestre Ledru 721cffd53a Fix a typo 'iff' => 'if'
llvm-svn: 164767
2012-09-27 09:59:43 +00:00
Bill Wendling 863bab689a Remove the `hasFnAttr' method from Function.
The hasFnAttr method has been replaced by querying the Attributes explicitly. No
intended functionality change.

llvm-svn: 164725
2012-09-26 21:48:26 +00:00
Roman Divacky ca10389bfe Specify MachinePointerInfo as referring to the argument value and offset of the
store when handling byval arguments. This prevents reordering of the store
with a load by the post-RA scheduler.

llvm-svn: 164553
2012-09-24 20:47:19 +00:00
Bill Schmidt 019cc6fe03 Small structs for PPC64 SVR4 must be passed right-justified in registers.
lib/Target/PowerPC/PPCISelLowering.{h,cpp}
 Rename LowerFormalArguments_Darwin to LowerFormalArguments_Darwin_Or_64SVR4.
 Rename LowerFormalArguments_SVR4 to LowerFormalArguments_32SVR4.
 Receive small structs right-justified in LowerFormalArguments_Darwin_Or_64SVR4.
 Rename LowerCall_Darwin to LowerCall_Darwin_Or_64SVR4.
 Rename LowerCall_SVR4 to LowerCall_32SVR4.
 Pass small structs right-justified in LowerCall_Darwin_Or_64SVR4.

test/CodeGen/PowerPC/structsinregs.ll
 New test.

llvm-svn: 164228
2012-09-19 15:42:13 +00:00
Roman Divacky 09adf3decc Fix isLocalCall() by also checking for weak linkage.
llvm-svn: 164155
2012-09-18 18:27:49 +00:00
Roman Divacky 0be33598ce Avoid symbol name clash when filling TOC.
Patch by Adhemerval Zanella.

llvm-svn: 164141
2012-09-18 17:10:37 +00:00
Roman Divacky d4f6f421a9 On PPC64 emit the environment pointer. Patch by Adhemerval Zanella.
llvm-svn: 164139
2012-09-18 16:55:29 +00:00
Roman Divacky 762930637c Optimize local func calls to not emit nop for TOC restoration.
Patch by Adhemerval Zanella.

llvm-svn: 164138
2012-09-18 16:47:58 +00:00
Roman Divacky 5dd4ccb402 When creating MCAsmBackend pass the CPU string as well. In X86AsmBackend
store this and use it to not emit long nops when the CPU is geode, which
doesn't support them.

Fixes PR11212.

llvm-svn: 164132
2012-09-18 16:08:49 +00:00
Craig Topper 462c31b3da Change unsigned to uint32_t to match base class declaration and other targets.
llvm-svn: 164001
2012-09-16 18:10:23 +00:00
Craig Topper a60c0f1163 Use LLVM_DELETED_FUNCTION in place of 'DO NOT IMPLEMENT' comments.
llvm-svn: 163974
2012-09-15 17:09:36 +00:00
Michael Liao abb87d4857 Fix PR11985
- BlockAddress has no support for the BA + offset form and there is no way to
  propagate that offset into a machine operand;
- Add BA + offset support and a new interface 'getTargetBlockAddress' to
  simplify target block address forming;
- All targets are modified to use new interface and X86 backend is enhanced to
  support BA + offset addressing.

llvm-svn: 163743
2012-09-12 21:43:09 +00:00
Roman Divacky 10a448d45a Enable exceptions handling on PPC64 now that cr misaligned spilling
was fixed in r163713.

llvm-svn: 163715
2012-09-12 15:29:32 +00:00
Roman Divacky c9e23d93ae This patch corrects logic in PPCFrameLowering for save and restore of
nonvolatile condition register fields across calls under the SVR4 ABIs.

 * With the 64-bit ABI, the save location is at a fixed offset of 8 from
the stack pointer.  The frame pointer cannot be used to access this
portion of the stack frame since the distance from the frame pointer may
change with alloca calls.

 * With the 32-bit ABI, the save location is just below the general
register save area, and is accessed via the frame pointer like the rest
of the save areas.  This is an optional slot, so it must only be created
if any of CR2, CR3, and CR4 were modified.

 * For both ABIs, save/restore logic is generated only if one of the
nonvolatile CR fields was modified.

I also took this opportunity to clean up an extra FIXME in
PPCFrameLowering.h.  Save area offsets for 32-bit GPRs are meaningless
for the 64-bit ABI, so I removed them for correctness and efficiency.


Fixes PR13708 and partially also PR13623. It lets us enable exception handling
on PPC64.

Patch by William J. Schmidt!

llvm-svn: 163713
2012-09-12 14:47:47 +00:00
Benjamin Kramer 47f9ec92cb MC: Overhaul handling of .lcomm
- Darwin lied about not supporting .lcomm and turned it into zerofill in the
  asm parser. Push the zerofill-conversion down into macho-specific code.
- This makes the tri-state LCOMMType enum superfluous, there are no targets
  without .lcomm.
- Do proper error reporting when trying to use .lcomm with alignment on a target
  that doesn't support it.
- .comm and .lcomm alignment was parsed in bytes on COFF, should be power of 2.
- Fixes PR13755 (.lcomm crashes on ELF).

llvm-svn: 163395
2012-09-07 17:25:13 +00:00
Hal Finkel efe4a44106 Move the PPC TOC defs into the PPC64 InstrInfo file.
Since the TOC is only defined for PPC64, move its definition to the PPC64 td file.

Patch by Adhemerval Zanella.

llvm-svn: 163234
2012-09-05 19:22:27 +00:00
Roman Divacky 2be394bdcd Remove always true checks. Noticed by Adhemerval Zanella.
llvm-svn: 163117
2012-09-03 16:55:42 +00:00
NAKAMURA Takumi ac49029fd9 PPCISelLowering.cpp: Fix r162725.
[Tobias von Koch] What's happening here is that the CR6SET/CR6UNSET is breaking the chain of register copies glued to the function call (BL_SVR4 node). The scheduler then moves other instructions in between those and the function call, which isn't good!

Right. That's the case where there is no chain of register copies before the call, so InFlag == 0... Attached is a new revision of the patch which should fix this for good.

llvm-svn: 162916
2012-08-30 15:52:29 +00:00
NAKAMURA Takumi 8ad54e04d2 PPCISelLowering.cpp: Whitespace.
llvm-svn: 162915
2012-08-30 15:52:23 +00:00
Hal Finkel 1859d26528 Reserve space for the mandatory traceback fields on PPC64.
We need to reserve space for the mandatory traceback fields,
though leaving them as zero is appropriate for now.

Although the ABI calls for these fields to be filled in fully, no
compiler on Linux currently does this, and GDB does not read these
fields.  GDB uses the first word of zeroes during exception handling to
find the end of the function and the size field, allowing it to compute
the beginning of the function.  DWARF information is used for everything
else.  We need the extra 8 bytes of pad so the size field is found in
the right place.

As a comparison, GCC fills in a few of the fields -- language, number
of saved registers -- but ignores the rest.  IBM's proprietary OSes do
make use of the full traceback table facility.

Patch by Bill Schmidt.

llvm-svn: 162854
2012-08-29 20:22:24 +00:00
Roman Divacky 8c4b6a307e Emit word of zeroes after the last instruction as a start of the mandatory
traceback table on PowerPC64. This helps gdb handle exceptions. The other
mandatory fields are ignored by gdb and harder to implement so just add
there a FIXME.

Patch by Bill Schmidt. PR13641.

llvm-svn: 162778
2012-08-28 19:06:55 +00:00
Hal Finkel 742b535e40 Add PPC Freescale e500mc and e5500 subtargets.
Add subtargets for Freescale e500mc (32-bit) and e5500 (64-bit) to
the PowerPC backend.

Patch by Tobias von Koch.

llvm-svn: 162764
2012-08-28 16:12:39 +00:00
Hal Finkel 679c73cb33 Split several PPC instruction classes.
Slight reorganisation of PPC instruction classes for scheduling. No
functionality change for existing subtargets.
 - Clearly separate load/store-with-update instructions from regular loads and stores.
 - Split IntRotateD -> IntRotateD and IntRotateDI
 - Split out fsub and fadd from FPGeneral -> FPAddSub
 - Update existing itineraries

Patch by Tobias von Koch.

llvm-svn: 162729
2012-08-28 02:49:14 +00:00
Hal Finkel 686f2ee226 Allow remat of LI on PPC.
Allow load-immediates to be rematerialised in the register coalescer for
PPC. This makes test/CodeGen/PowerPC/big-endian-formal-args.ll fail,
because it relies on a register move getting emitted. The immediate load is
equivalent, so change this test case.

Patch by Tobias von Koch.

llvm-svn: 162727
2012-08-28 02:10:33 +00:00
Hal Finkel 5ab378037f Eliminate redundant CR moves on PPC32.
The 32-bit ABI requires CR bit 6 to be set if the call has fp arguments and
unset if it doesn't. The solution up to now was to insert a MachineNode to
set/unset the CR bit, which produces a CR vreg. This vreg was then copied
into CR bit 6. When the register allocator saw a bunch of these in the same
function, it allocated the set/unset CR bit in some random CR register (1
extra instruction) and then emitted CR moves before every vararg function
call, rather than just setting and unsetting CR bit 6 directly before every
vararg function call. This patch instead inserts a PPCcrset/PPCcrunset
instruction which are then matched by a dedicated instruction pattern.

Patch by Tobias von Koch.

llvm-svn: 162725
2012-08-28 02:10:27 +00:00
Hal Finkel e39526a789 Optimize zext on PPC64.
The zeroextend IR instruction is lowered to an 'and' node with an immediate
mask operand, which in turn gets legalised to a sequence of ori's & ands.
This can be done more efficiently using the rldicl instruction.
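
A minimal C sketch of the pattern in question (the function name is hypothetical):

    #include <stdint.h>

    /* Zero-extending i32 to i64: with this change the backend can emit a
       single 'rldicl rD, rS, 0, 32' (clear the upper 32 bits) instead of
       the legalized ori/and sequence. */
    uint64_t zext32(uint32_t x) { return (uint64_t)x; }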

Patch by Tobias von Koch.

llvm-svn: 162724
2012-08-28 02:10:15 +00:00
Richard Smith 228e6d4cf3 Fix integer undefined behavior due to signed left shift overflow in LLVM.
Reviewed offline by chandlerc.

llvm-svn: 162623
2012-08-24 23:29:28 +00:00
Roman Divacky ace4707ea6 Lower constant pools and jump tables via TOC on PPC64/SVR4.
In collaboration with Adhemerval Zanella.

llvm-svn: 162562
2012-08-24 16:26:02 +00:00
Jakob Stoklund Olesen a954e92053 Add missing SDNPSideEffect flags.
llvm-svn: 162557
2012-08-24 14:43:27 +00:00
Roman Divacky 2039a987c4 Revert r162034, r162035 and r162037.
llvm-svn: 162039
2012-08-16 19:07:59 +00:00
Roman Divacky 9d38fc8ddc Define and handle additional fixup kinds. By Adhemerval Zanella.
llvm-svn: 162037
2012-08-16 18:37:52 +00:00
Roman Divacky 1faf5b07c6 Fix typo and grammar. By Adhemerval Zanella.
llvm-svn: 162032
2012-08-16 18:19:29 +00:00
Jakob Stoklund Olesen 978c1280a5 Don't use getNextOperandForReg().
This way of using getNextOperandForReg() was unlikely to work as
intended. We don't give any guarantees about the order of operands in
the use-def chains, so looking only at operands following a given
operand in the chain doesn't make sense.

llvm-svn: 161542
2012-08-08 23:44:04 +00:00
Hal Finkel 895a5f5d12 Add a comment about mftb vs. mfspr on PPC.
Thanks to Alex Rosenberg for the suggestion.

llvm-svn: 161428
2012-08-07 17:04:20 +00:00
Hal Finkel 33e529d56b MFTB on PPC64 should really be encoded using MFSPR.
The MFTB instruction itself is being phased out, and its functionality
is provided by MFSPR. According to the ISA docs, using MFSPR works on all known
chips except for the 601 (which did not have a timebase register anyway)
and the POWER3.

Thanks to Adhemerval Zanella for pointing this out!

llvm-svn: 161346
2012-08-06 21:21:44 +00:00
Hal Finkel 70381a7b18 Add readcyclecounter lowering on PPC64.
On PPC64, this can be done with a simple TableGen pattern.
To enable this, I've added the (otherwise missing) readcyclecounter
SDNode definition to TargetSelectionDAG.td.
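
A hedged usage sketch via Clang's __builtin_readcyclecounter, which is the usual front-end route to this node (the builtin itself is not part of the patch):

    /* On PPC64 this is expected to lower to a read of the time base. */
    unsigned long long cycles(void) {
      return __builtin_readcyclecounter();
    }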

llvm-svn: 161302
2012-08-04 14:10:46 +00:00
Gabor Greif a1529b6ca4 allow 'make CPPFLAGS=<something>' work again
this makes this hack a bit more bearable
for poor souls who need to pass custom
preprocessor flags to the build process

llvm-svn: 161240
2012-08-03 13:31:24 +00:00
Jakob Stoklund Olesen ed6c0408fa Remove variable_ops from call instructions in most targets.
Call instructions are no longer required to be variadic, and
variable_ops should only be used for instructions that encode a variable
number of arguments, like the ARM stm/ldm instructions.

llvm-svn: 160189
2012-07-13 20:44:29 +00:00
Evan Cheng 39e90029a2 Target option DisableJumpTables is a gross hack. Move it to TargetLowering instead.
llvm-svn: 159611
2012-07-02 22:39:56 +00:00
Bob Wilson bbd38dd9c0 Add all codegen passes to the PassManager via TargetPassConfig.
This is a preliminary step toward having TargetPassConfig be able to
start and stop the compilation at specified passes for unit testing
and debugging.  No functionality change.

llvm-svn: 159567
2012-07-02 19:48:31 +00:00
Bill Wendling e38859dc8e Move lib/Analysis/DebugInfo.cpp to lib/VMCore/DebugInfo.cpp and
include/llvm/Analysis/DebugInfo.h to include/llvm/DebugInfo.h.

The reasoning is because the DebugInfo module is simply an interface to the
debug info MDNodes and has nothing to do with analysis.

llvm-svn: 159312
2012-06-28 00:05:13 +00:00
Jack Carter 5e69cffed5 There are a number of generic inline asm operand modifiers that
up to r158925 were handled as processor specific. Making them 
generic and putting tests for these modifiers in the CodeGen/Generic
directory caused a number of targets to fail. 

This commit addresses that problem by having the targets call 
the generic routine for generic modifiers that they don't currently
have explicit code for.

For now only the generic print operands 'c' and 'n' are supported.


Affected files:

    test/CodeGen/Generic/asm-large-immediate.ll
    lib/Target/PowerPC/PPCAsmPrinter.cpp
    lib/Target/NVPTX/NVPTXAsmPrinter.cpp
    lib/Target/ARM/ARMAsmPrinter.cpp
    lib/Target/XCore/XCoreAsmPrinter.cpp
    lib/Target/X86/X86AsmPrinter.cpp
    lib/Target/Hexagon/HexagonAsmPrinter.cpp
    lib/Target/CellSPU/SPUAsmPrinter.cpp
    lib/Target/Sparc/SparcAsmPrinter.cpp
    lib/Target/MBlaze/MBlazeAsmPrinter.cpp
    lib/Target/Mips/MipsAsmPrinter.cpp
    
MSP430 isn't represented because it did not even run with
the long existing 'c' modifier and it was not apparent what
needs to be done to get it inline asm ready.

Contributor: Jack Carter
llvm-svn: 159203
2012-06-26 13:49:27 +00:00
NAKAMURA Takumi 704de074b8 llvm/lib: [CMake] Add explicit dependency to intrinsics_gen.
llvm-svn: 159112
2012-06-24 13:32:01 +00:00
Craig Topper 2361cd9897 Silence an unused variable warning on release builds.
llvm-svn: 159074
2012-06-23 08:09:30 +00:00
Hal Finkel 460e94d842 Add support for the PPC isel instruction.
The isel (integer select) instruction is supported on the 440 and A2
embedded cores and on the POWER7.

llvm-svn: 159045
2012-06-22 23:10:08 +00:00
Hal Finkel 0a479ae7d1 Convert the PPC backend to use the new FMA infrastructure.
The existing contraction patterns are replaced with fma/fneg.
Overall functionality should be the same.

llvm-svn: 158955
2012-06-22 00:49:52 +00:00
Hal Finkel a86b0f20dd Treat TargetGlobalAddress as a constant for the purpose of matching pre-inc stores on PPC.
Thanks to Tobias von Koch for pointing out this problem.

llvm-svn: 158932
2012-06-21 20:10:48 +00:00
Hal Finkel ca542beffe Add support for generating reg+reg (indexed) pre-inc loads on PPC.
llvm-svn: 158823
2012-06-20 15:43:03 +00:00
Lang Hames 39fb1d08dc Add DAG-combines for aggressive FMA formation.
This patch adds DAG combines to form FMAs from pairs of FADD + FMUL or
FSUB + FMUL. The combines are performed when:
(a) Either
      AllowExcessFPPrecision option (-enable-excess-fp-precision for llc)
        OR
      UnsafeFPMath option (-enable-unsafe-fp-math)
    are set, and
(b) TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) is true for the type of
    the FADD/FSUB, and
(c) The FMUL only has one user (the FADD/FSUB).

If your target has fast FMA instructions you can make use of these combines by
overriding TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) to return true for
types supported by your FMA instruction, and adding patterns to match ISD::FMA
to your FMA instructions.
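
A hedged C example of the shape these combines target, assuming one of the options above is enabled and the target reports fast FMA:

    /* The FMUL feeding the FADD has a single user, so the pair can be
       fused into a single ISD::FMA, e.g. fmadds on PowerPC. */
    float mac(float a, float b, float c) { return a * b + c; }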

llvm-svn: 158757
2012-06-19 22:51:23 +00:00
Jakob Stoklund Olesen 0f855e4263 Implement PPCInstrInfo::isCoalescableExtInstr().
The PPC::EXTSW instruction preserves the low 32 bits of its input, just
like some of the x86 instructions. Use it to reduce register pressure
when the low 32 bits have multiple uses.

This requires a small change to PeepholeOptimizer since EXTSW takes a
64-bit input register.

This is related to PR5997.

llvm-svn: 158743
2012-06-19 21:14:34 +00:00
Hal Finkel d465810f7c Mark most PPC register classes to avoid write-after-write.
For processors with the G5-like instruction-grouping scheme, this helps avoid
early group termination due to a write-after-write dependency within the group.
It should also help on pipelined embedded cores.

On POWER7, over the test suite, this gives an average 0.5% speedup. The largest
speedups are:

SingleSource/Benchmarks/Stanford/Quicksort - 33%
MultiSource/Applications/d/make_dparser - 21%
MultiSource/Benchmarks/FreeBench/analyzer/analyzer - 12%
MultiSource/Benchmarks/MiBench/telecomm-FFT/telecomm-fft - 12%

Largest slowdowns:

SingleSource/Benchmarks/Stanford/Bubblesort - 23%
MultiSource/Benchmarks/Prolangs-C++/city/city - 21%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - 16%
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode - 13%

llvm-svn: 158719
2012-06-19 13:57:17 +00:00
Hal Finkel 1cc27e44a4 Add support for generating reg+reg preinc stores on PPC.
PPC will now generate STWUX and friends.

llvm-svn: 158698
2012-06-19 02:34:32 +00:00
Hal Finkel 6261c2dc28 Cleanup trip-count finding for PPC CTR loops (and some bug fixes).
This cleans up the method used to find trip counts in order to form CTR loops on PPC.
This refactoring allows the pass to find loops which have a constant trip count but also
happen to end with a comparison to zero. This also adds explicit FIXMEs to mark two different
classes of loops that are currently ignored.

In addition, we now search through all potential induction operations instead of just the first.
Also, we check the predicate code on the conditional branch and abort the transformation if the
code is not EQ or NE, and we then make sure that the branch to be transformed matches the
condition register defined by the comparison (multiple possible comparisons will be considered).

llvm-svn: 158607
2012-06-16 20:34:07 +00:00
Hal Finkel 9898614854 Add another missing 64-bit itinerary definition for the PPC A2 core.
llvm-svn: 158393
2012-06-13 05:55:09 +00:00
Hal Finkel 79c39da135 Add some missing 64-bit itinerary definitions for the PPC A2 core.
llvm-svn: 158373
2012-06-12 20:32:29 +00:00
Hal Finkel 8c33dde666 Split out the PPC instruction class IntSimple from IntGeneral.
On the POWER7, adds and logical operations can also be handled
in the load/store pipelines. We'll call these IntSimple.

llvm-svn: 158366
2012-06-12 19:01:24 +00:00
Hal Finkel f1cc96ab50 Fixes for PPC host detection and features.
POWER4 is a 64-bit CPU (better matched to the 970).
The g3 is really the 750 (no altivec), the g4+ is the 74xx (not the 750).

Patch by Andreas Tobler.

llvm-svn: 158363
2012-06-12 16:39:23 +00:00
Hal Finkel 59b0ee8a56 Reapply r158337, this time properly protect Darwin/PPC host CPU use with __ppc__.
Original commit message:
Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName().

Both the new Linux functionality and the old Darwin functions have been moved.
This change also allows this information to be queried directly by clang and
other frontends (clang, for example, will now have real -mcpu=native support).

llvm-svn: 158349
2012-06-12 03:03:13 +00:00
Jakob Stoklund Olesen f8f128606c Revert r158337 "Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName()."
This commit broke most of the PowerPC unit tests when running on
Intel/Apple.

llvm-svn: 158345
2012-06-12 00:58:40 +00:00
Hal Finkel 23c699e497 Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName().
Both the new Linux functionality and the old Darwin functions have been moved.
This change also allows this information to be queried directly by clang and
other frontends (clang, for example, will now have real -mcpu=native support).

llvm-svn: 158337
2012-06-11 23:14:31 +00:00
Hal Finkel bddc916f2b Enable MFOCRF generation on the PPC A2 core.
llvm-svn: 158324
2012-06-11 19:57:04 +00:00
Hal Finkel bfd3d08d18 Rename the PPC target feature gpul to mfocrf.
The PPC target feature gpul (IsGigaProcessor) was only used for one thing:
To enable the generation of the MFOCRF instruction. Furthermore, this
instruction is available on other PPC cores outside of the G5 line. This
feature now corresponds to the HasMFOCRF flag.

No functionality change.

llvm-svn: 158323
2012-06-11 19:57:01 +00:00
Hal Finkel 25d4c568d3 Add A2 to the list of PPC CPUs recognized by Linux host CPU-type detection.
llvm-svn: 158322
2012-06-11 19:56:57 +00:00
Hal Finkel 2c09058f19 Emit the two-operand form of the PPC mfcr instruction as mfocrf.
This is necessary on Linux and supported on Darwin, see PR2604.

llvm-svn: 158315
2012-06-11 15:43:15 +00:00
Hal Finkel ba671c0ea7 Add local CPU detection for Linux PPC.
This functionality mirrors that available on PPC/Darwin.

llvm-svn: 158314
2012-06-11 15:43:13 +00:00
Hal Finkel f2b9c38d6f Add POWER6 and POWER7 CPU types to the PPC backend.
No functional change; these will be used by upcoming scheduler enhancements.

llvm-svn: 158313
2012-06-11 15:43:08 +00:00
Hal Finkel 4e9f1a859f Enable ILP scheduling for all nodes by default on PPC.
Over the entire test-suite, this has an insignificantly negative average
performance impact, but reduces some of the worst slowdowns from the
anti-dep. change (r158294).

Largest speedups:
SingleSource/Benchmarks/Stanford/Quicksort - 28%
SingleSource/Benchmarks/Stanford/Towers - 24%
SingleSource/Benchmarks/Shootout-C++/matrix - 23%
MultiSource/Benchmarks/SciMark2-C/scimark2 - 19%
MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15%
(matrix and automotive-bitcount were both in the top-5 slowdown list from the
anti-dep. change)

Largest slowdowns:
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26%
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21%
SingleSource/Benchmarks/CoyoteBench/lpbench - 20%
MultiSource/Applications/d/make_dparser - 16%

llvm-svn: 158296
2012-06-10 19:32:29 +00:00
Hal Finkel a8100281ae Use critical anti-dep. breaking on all PPC targets, but also add other register classes.
Using 'all' instead of 'critical' would be better because it would make it easier to
satisfy the bundling constraints, but, as noted in the FIXME, that is currently not
possible with the crs.

This yields an average 1% speedup over the entire test suite (on Power 7). Largest speedups:
SingleSource/Benchmarks/Shootout-C++/moments - 40%
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
SingleSource/Benchmarks/BenchmarkGame/nsieve-bits - 26%
SingleSource/Benchmarks/McGill/misr - 23%
MultiSource/Applications/JM/ldecod/ldecod - 22%

Largest slowdowns:
SingleSource/Benchmarks/Shootout-C++/matrix - -29%
SingleSource/Benchmarks/Shootout-C++/ary3 - -22%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -18%
SingleSource/Benchmarks/Shootout-C++/ary - -17%
MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - -15%

llvm-svn: 158294
2012-06-10 11:15:36 +00:00
Hal Finkel 2edfbddcf0 Improve ext/trunc patterns on PPC64.
The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that
would leave self-moves in the final assembly. Replacing those patterns with ones
based on the SUBREG builtins yields better-looking code.

Thanks to Jakob and Owen for their suggestions in this matter.

llvm-svn: 158283
2012-06-09 22:10:19 +00:00
Hal Finkel eb50c2d4a4 Enable tail merging on PPC.
Tail merging had been disabled on PPC because it would disturb bundling decisions
made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions
are made during post-RA scheduling, and tail merging is generally beneficial (the
average test-suite speedup is insignificantly positive).

Largest test-suite speedups:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23%
SingleSource/Benchmarks/Shootout-C++/ary - 21%
SingleSource/Benchmarks/Stanford/Queens - 17%

Largest slowdowns:
MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24%
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22%
MultiSource/Applications/JM/ldecod/ldecod - 14%
MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9%

This is improved by using full (instead of just critical) anti-dependency breaking,
but doing so still causes miscompiles and so cannot yet be enabled by default.

llvm-svn: 158259
2012-06-09 03:14:50 +00:00
Hal Finkel 41e6fd1df9 Remove the TODO statement in the PPC README re: CTR loops
As Chris points out, this can now be removed!

TODO: check if the associated section on viterbi's inner loop can also be removed.
llvm-svn: 158224
2012-06-08 20:02:09 +00:00
Hal Finkel c6b5debb40 Enable PPC CTR loop formation by default.
Thanks to Jakob's help, this now causes no new test suite failures!

Over the entire test suite, this gives an average 1% speedup. The largest speedups are:
SingleSource/Benchmarks/Misc/pi - 108%
SingleSource/Benchmarks/CoyoteBench/lpbench - 54%
MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50%
SingleSource/Benchmarks/Shootout/ary3 - 32%
SingleSource/Benchmarks/Shootout-C++/matrix - 30%

The largest slowdowns are:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30%
MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22%
MultiSource/Applications/d/make_dparser - -14%
SingleSource/Benchmarks/Shootout-C++/ary - -13%

In light of these slowdowns, additional profiling work is obviously needed!

llvm-svn: 158223
2012-06-08 19:19:53 +00:00
Hal Finkel 3d32ad3a7f Mark the PPC CTRRC and CTRRC8 register classes as non-allocatable.
Marking these classes as non-allocatable allows CTR loop generation to
work correctly with the block placement passes, etc. These register
classes are currently used only by some unused TCRETURN patterns.
In future cleanup, these will be removed.

Thanks again to Jakob for suggesting this fix to the CTR loop problem!

llvm-svn: 158221
2012-06-08 19:02:08 +00:00
Hal Finkel 821e00121c Disable the PPC CTR-Loops pass by default.
The pass itself works well, but something in the Machine* infrastructure
does not understand terminators which define registers. Without the ability
to use the block-placement pass, etc. this causes performance regressions (and
so is turned off by default). Turning off the analysis turns off the problems
with the Machine* infrastructure.

llvm-svn: 158206
2012-06-08 15:38:25 +00:00
Hal Finkel 8b01503ee5 Fix a bug in the new PPC CTR-Loops pass.
The code which tests for an induction operation cannot assume that any
ADDI instruction will have a register operand because the operand could
also be a frame index; for example:
    %vreg16<def> = ADDI8 <fi#0>, 0; G8RC:%vreg16

llvm-svn: 158205
2012-06-08 15:38:23 +00:00
Hal Finkel 96c2d4d945 Add the PPCCTRLoops pass: a PPC machine-code-level optimization pass to form CTR-based loop branching code.
This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon
pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are
no longer otherwise used. Also, invalid preheader DebugLoc is not used.
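
A hedged sketch of the kind of loop the pass targets (illustrative only):

    /* A countable loop: the trip count can be moved into the CTR
       register (mtctr) and the backedge becomes a 'bdnz', letting the
       pass delete the now-unused add and compare. */
    int sum(const int *a, int n) {
      int s = 0;
      for (int i = 0; i < n; ++i)
        s += a[i];
      return s;
    }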

llvm-svn: 158204
2012-06-08 15:38:21 +00:00
Roman Divacky c856653fb3 PPC32 uses R2 as the TLS register. Fix the copy and paste.
llvm-svn: 158004
2012-06-05 17:14:17 +00:00
Roman Divacky e3f15c98d1 Implement local-exec TLS on PowerPC.
llvm-svn: 157935
2012-06-04 17:36:38 +00:00
Hal Finkel 1de9bf01e4 Fix a copy-and-paste duplication error in the PPC 440 and A2 schedules (no functionality change).
llvm-svn: 157912
2012-06-04 02:39:52 +00:00
Hal Finkel 595817eebe Enable generating PPC pre-increment (r+imm) instructions by default.
It seems that this no longer causes test suite failures on PPC64 (after r157159),
and often gives a performance benefit, so it can be enabled by default.

llvm-svn: 157911
2012-06-04 02:21:00 +00:00
Justin Holewinski aa58397b3c Change interface for TargetLowering::LowerCallTo and TargetLowering::LowerCall
to pass around a struct instead of a large set of individual values.  This
cleans up the interface and allows more information to be added to the struct
for future targets without requiring changes to each and every target.

NV_CONTRIB

llvm-svn: 157479
2012-05-25 16:35:28 +00:00
Hal Finkel 601f555eee Add a missing PPC 64-bit stwu pattern.
This seems to fix the remaining compile-time failures on PPC64 when
compiling with -enable-ppc-preinc.

llvm-svn: 157159
2012-05-20 17:11:24 +00:00
Hal Finkel 66b0c93553 Add a FIXME about access to negative stack-pointer offsets on PPC32.
The current code will generate a prologue which starts with something like:
        mflr 0
        stw 31, -4(1)
        stw 0, 4(1)
        stwu 1, -16(1)

But under the PPC32 SVR4 ABI, access to negative offsets from R1 is not allowed.

This was pointed out by Peter Bergner.

llvm-svn: 157133
2012-05-19 21:52:55 +00:00
Jim Grosbach c3b0427921 Allow MCCodeEmitter access to the target MCRegisterInfo.
Add the MCRegisterInfo to the factories and constructors.

Patch by Tom Stellard <Tom.Stellard@amd.com>.

llvm-svn: 156828
2012-05-15 17:35:52 +00:00
Roman Divacky e07cc042f6 Mark .opd @progbits, thus avoiding a warning from asm.
llvm-svn: 156494
2012-05-09 18:24:23 +00:00
Jakob Stoklund Olesen 3c52f0281f Add an MF argument to TRI::getPointerRegClass() and TII::getRegClass().
The getPointerRegClass() hook can return register classes that depend on
the calling convention of the current function (ptr_rc_tailcall).

So far, we have been able to infer the calling convention from the
subtarget alone, but as we add support for multiple calling conventions
per target, that no longer works.

Patch by Yiannis Tsiouris!

llvm-svn: 156328
2012-05-07 22:10:26 +00:00
Jakob Stoklund Olesen 796e5272ab Remove the SubRegClasses field from RegisterClass descriptions.
This information in now computed by TableGen.

llvm-svn: 156152
2012-05-04 03:30:34 +00:00
Bill Wendling b12f16e75f Change the PassManager from a reference to a pointer.
The TargetPassManager's default constructor wants to initialize the PassManager
to 'null'. But it's illegal to bind a null reference to a null l-value. Make the
ivar a pointer instead.
PR12468

llvm-svn: 155902
2012-05-01 08:27:43 +00:00
Preston Gurd 9a0914753a This patch fixes a problem which arose when using the Post-RA scheduler
on X86 Atom. Some of our tests failed because the tail merging part of
the BranchFolding pass was creating new basic blocks which did not
contain live-in information. When the anti-dependency code in the Post-RA
scheduler ran, it would sometimes rename the register containing
the function return value because the fact that the return value was
live-in to the subsequent block had been lost. To fix this, it is necessary
to run the RegisterScavenging code in the BranchFolding pass.

This patch makes sure that the register scavenging code is invoked
in the X86 subtarget only when post-RA scheduling is being done.
Post RA scheduling in the X86 subtarget is only done for Atom.

This patch adds a new function to the TargetRegisterClass to control
whether or not live-ins should be preserved during branch folding.
This is necessary in order for the anti-dependency optimizations done
during the PostRASchedulerList pass to work properly when doing
Post-RA scheduling for the X86 in general and for the Intel Atom in particular.

The patch adds and invokes the new function trackLivenessAfterRegAlloc()
instead of using the existing requiresRegisterScavenging().
It changes BranchFolding.cpp to call trackLivenessAfterRegAlloc() instead of
requiresRegisterScavenging(). It changes all the targets that
implemented requiresRegisterScavenging() to also implement
trackLivenessAfterRegAlloc().  

It adds an assertion in the Post RA scheduler to make sure that post RA
liveness information is available when it is needed.

It changes the X86 break-anti-dependencies test to use -mcpu=atom, in order
to avoid running into the added assertion.

Finally, this patch restores the use of anti-dependency checking
(which was turned off temporarily for the 3.1 release) for
Intel Atom in the Post RA scheduler.

Patch by Andy Zhang!

Thanks to Jakob and Anton for their reviews.

llvm-svn: 155395
2012-04-23 21:39:35 +00:00
Gabor Greif c8a9abe9df effectively back out my last change (r155190)
llvm-svn: 155195
2012-04-20 11:41:38 +00:00
Gabor Greif 9eccbe9c82 fix obviously bogus (IMO) operand index of the load in asserts
(load only has one operand) and smuggle in some whitespace changes too

NB: I am obviously testing the water here, and believe that the unguarded
    cast is still wrong, but why is the getZExtValue of the load's operand
    tested against zero here? Any review is appreciated.
llvm-svn: 155190
2012-04-20 08:58:49 +00:00
Craig Topper abadc660e0 Convert some uses of XXXRegisterClass to &XXXRegClass. No functional change since they are equivalent.
llvm-svn: 155186
2012-04-20 06:31:50 +00:00
Gabor Greif 180c4445cf zap tabs
llvm-svn: 155128
2012-04-19 15:16:31 +00:00
Jay Foad 08a0598cd4 Remove unused CCIfSubtarget.
llvm-svn: 154921
2012-04-17 11:29:05 +00:00
Rafael Espindola ba0a6cabb8 Always compute all the bits in ComputeMaskedBits.
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.

llvm-svn: 154011
2012-04-04 12:51:34 +00:00
Anton Korobeynikov 325e92668b Make the PPCCompilationCallbackC function static, so there is no need to issue the call via the
PLT when LLVM is built as a shared library. This mimics the approach taken by the X86 backend.

llvm-svn: 153938
2012-04-03 06:59:28 +00:00
Hal Finkel 7591afa235 The binutils for the IBM BG/P are too old to support CFI.
llvm-svn: 153886
2012-04-02 19:09:04 +00:00
Roman Divacky b9663ccd6b Implement the SVR4 byval alignment for aggregates. Fixing a FIXME.
llvm-svn: 153876
2012-04-02 15:49:30 +00:00
Benjamin Kramer 1c0541b031 Move getOpcodeName from the various target InstPrinters into the superclass MCInstPrinter.
All implementations used the same code.

llvm-svn: 153866
2012-04-02 08:32:38 +00:00
Craig Topper dab9e35ad0 Remove getInstructionName from MCInstPrinter implementations in favor of using the instruction name table from MCInstrInfo. Reduces static data in the InstPrinter implementations.
llvm-svn: 153863
2012-04-02 07:01:04 +00:00
Craig Topper 54bfde79db Make MCInstrInfo available to the MCInstPrinter. This will be used to remove getInstructionName and the static data it contains since the same tables are already in MCInstrInfo.
llvm-svn: 153860
2012-04-02 06:09:36 +00:00
Hal Finkel 3ecfa7b277 Fix some 80-col. violations I introduced with the A2 PPC64 core.
llvm-svn: 153852
2012-04-01 21:20:14 +00:00
Hal Finkel 322e41a914 Enable prefetch generation on PPC64.
llvm-svn: 153851
2012-04-01 20:08:17 +00:00
Hal Finkel 9032344c15 Add LdStSTD* itin. for the PPC64 A2 core.
llvm-svn: 153850
2012-04-01 20:08:08 +00:00
Hal Finkel 88ed4e3b15 Set the default PPC node scheduling preference to ILP (for the embedded cores).
The 440 and A2 cores have detailed itineraries, and this allows them to be
fully used to maximize throughput.

llvm-svn: 153845
2012-04-01 19:23:08 +00:00
Hal Finkel b9845f5758 Add ppc440 itin. entries for LdStSTD*
llvm-svn: 153844
2012-04-01 19:23:04 +00:00
Hal Finkel ec5a1e3669 Use full anti-dep. breaking with post-ra sched. on the embedded ppc cores.
Post-RA scheduling gives a significant performance improvement on
the embedded cores, so turn it on. Using full anti-dep. breaking is
important for FP-intensive blocks, so turn it on (just on the
embedded cores for now; this should also be good on the 970s because
post-ra scheduling is all that we have for now, but that should have
more testing first).

llvm-svn: 153843
2012-04-01 19:22:57 +00:00
Hal Finkel 9f9f8929ee Add instruction itinerary for the PPC64 A2 core.
This adds a full itinerary for IBM's PPC64 A2 embedded core. These
cores form the basis for the CPUs in the new IBM BG/Q supercomputer.

llvm-svn: 153842
2012-04-01 19:22:40 +00:00
Hal Finkel 59607e63cb Split the LdStGeneral PPC itin. class into LdStLoad and LdStStore.
Loads and stores can have different pipeline behavior, especially on
embedded chips. This change allows those differences to be expressed.
Except for the 440 scheduler, there are no functionality changes.
On the 440, the latency adjustment is only by one cycle, and so this
probably does not affect much. Nevertheless, it will make a larger
difference in the future and this removes a FIXME from the 440 itin.

llvm-svn: 153821
2012-04-01 04:44:16 +00:00
Hal Finkel 51861b4855 Fix dynamic linking on PPC64.
Dynamic linking on PPC64 has had problems since we had to move the top-down
hazard-detection logic post-ra. For dynamic linking to work there needs to be
a nop placed after every call. It turns out that it is really hard to guarantee
that nothing will be placed in between the call (bl) and the nop during post-ra
scheduling. Previous attempts at fixing this by placing logic inside the
hazard detector only partially worked.

This is now fixed in a different way: call+nop codegen-only instructions. As far
as CodeGen is concerned the pair is now a single instruction and cannot be split.
This solution works much better than previous attempts.

The scoreboard hazard detector is also renamed to be more generic; there is currently
no cpu-specific logic in it.

llvm-svn: 153816
2012-03-31 14:45:15 +00:00
Craig Topper f6e7e12f75 Remove unnecessary llvm:: qualifications
llvm-svn: 153500
2012-03-27 07:21:54 +00:00
Craig Topper 6e80c28017 Prune some includes and forward declarations.
llvm-svn: 153429
2012-03-26 06:58:25 +00:00
Hal Finkel e44eb28807 Fix small-integer VAARG on SVR4 ABI PPC64.
The PPC64 SVR4 ABI requires that integer stack arguments, and thus the variadic
arguments, that are smaller than 64 bits be zero-extended to 64 bits.

llvm-svn: 153373
2012-03-24 03:53:55 +00:00
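
A small standard-C++ example of the situation being fixed (the example itself is made up): on PPC64 SVR4 every variadic integer argument occupies a full 64-bit doubleword in the parameter save area, so a sub-64-bit value has to be zero-extended when stored there for va_arg to read it back correctly.

    #include <cstdarg>
    #include <cstdio>

    // Each variadic integer argument lands in a full 64-bit slot on PPC64 SVR4;
    // the fix makes sure sub-64-bit values are zero-extended into that slot.
    static unsigned sumShorts(int count, ...) {
      va_list ap;
      va_start(ap, count);
      unsigned total = 0;
      for (int i = 0; i < count; ++i)
        total += (unsigned short)va_arg(ap, int);  // reads one promoted slot
      va_end(ap);
      return total;
    }

    int main() {
      unsigned short a = 0xFFF0, b = 0x0010;
      std::printf("%u\n", sumShorts(2, a, b));  // prints 65536
      return 0;
    }
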
Hal Finkel 76eb187c0f PPC::DBG_VALUE must use Reg+Imm frame-index elimination even for large offsets. Fixes PR12203.
I don't have a small test case yet, but I'll try to construct one.

llvm-svn: 153240
2012-03-22 05:28:19 +00:00
Craig Topper b25fda95f6 Reorder includes in Target backends to following coding standards. Remove some superfluous forward declarations.
llvm-svn: 152997
2012-03-17 18:46:09 +00:00
Craig Topper bef78fc2ee Convert more static tables of registers used by calling convention to uint16_t to reduce space.
llvm-svn: 152538
2012-03-11 07:57:25 +00:00
Craig Topper ca658c2264 Use uint16_t to store registers and opcode in static tables in the target specific backends.
llvm-svn: 152537
2012-03-11 07:16:55 +00:00
Roman Divacky ef21be2cda Convert PowerPC to register mask operands.
llvm-svn: 152122
2012-03-06 16:41:49 +00:00
Jim Grosbach fd93a59557 Make MCRegisterInfo available to the MCInstPrinter.
Used to allow context sensitive printing of super-register or sub-register
references.

llvm-svn: 152043
2012-03-05 19:33:20 +00:00
Craig Topper 420525ce3b Use uint16_t to store registers in callee saved register tables to reduce size of static data.
llvm-svn: 151996
2012-03-04 03:33:22 +00:00
Evan Cheng 65f9d19c4f Re-commit r151623 with fix. Only issue special no-return calls if it's a direct call.
llvm-svn: 151645
2012-02-28 18:51:51 +00:00
Roman Divacky 34d4b9682b Properly MCize the section switch, removing a FIXME.
llvm-svn: 151639
2012-02-28 18:15:25 +00:00
Daniel Dunbar ee7b899343 Revert r151623 "Some ARM implementaions, e.g. A-series, does return stack prediction. ...", it is breaking the Clang build during the Compiler-RT part.
llvm-svn: 151630
2012-02-28 15:36:07 +00:00
Evan Cheng 87c7b09d8d Some ARM implementations, e.g. the A-series, do return stack prediction. That is,
the processor keeps a return address stack (RAS) which stores the address
and the instruction execution state of the instruction after a function-call
type branch instruction.

Calling a "noreturn" function with normal call instructions (e.g. bl) can
corrupt the RAS and cause 100% return misprediction, so LLVM should use an
unconditional branch instead, i.e.
mov lr, pc
b _foo
The "mov lr, pc" is issued in order to get proper backtrace.

rdar://8979299

llvm-svn: 151623
2012-02-28 06:42:03 +00:00
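
For reference, the kind of call site this is about, in plain C++ (the bl-versus-b decision happens in the ARM backend, not in source, and [[noreturn]] is the modern spelling of the era's __attribute__((noreturn))): since a noreturn callee never comes back, pushing a return address onto the RAS only pollutes the predictor.

    #include <cstdlib>

    [[noreturn]] static void fatalError() {
      std::abort();  // never returns, so no return address is ever consumed
    }

    int checkedDivide(int a, int b) {
      if (b == 0)
        fatalError();  // candidate for a plain branch rather than a call
      return a / b;
    }

    int main() { return checkedDivide(10, 2) == 5 ? 0 : 1; }
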
Roman Divacky 8fe40cd659 Reapply r151278 with fixes.
MCize function entry label emission on PowerPC64 properly.

llvm-svn: 151547
2012-02-27 20:20:47 +00:00
Hal Finkel 6fd2b434bd Revert r151278, breaks static linking.
Reverting this because it breaks static linking on ppc64. Specifically, it may be linkonce_odr functions that are the problem.
With this patch, if you link statically, calls to some functions end up calling their descriptor addresses instead
of calling their entry points. This causes execution to fail with SIGILL (because the descriptor address just
holds some pointers, not code).

llvm-svn: 151433
2012-02-25 03:40:11 +00:00
Hal Finkel a3e6ed2161 X11/X2 loads around indirect calls on ppc64 should not be deleted.
llvm-svn: 151374
2012-02-24 17:54:01 +00:00
Roman Divacky a2d3608f78 MCize function entry label emission on PowerPC64 properly.
llvm-svn: 151278
2012-02-23 20:28:39 +00:00
Hal Finkel ad4d9f5848 Allow the use of an alternate symbol for calculating a function's size.
The standard function epilog includes a .size directive, but ppc64 uses
an alternate local symbol to tag the actual start of each function.

Until recently, binutils accepted the .size directive as:
 .size	test1, .Ltmp0-test1
however, using this directive with recent binutils will result in the error:
 .size expression for XXX does not evaluate to a constant
so we must use the label which actually tags the start of the function.

llvm-svn: 151200
2012-02-22 21:11:47 +00:00
Craig Topper 760b134ffa Make all pointers to TargetRegisterClass const since they are all pointers to static data that should not be modified.
llvm-svn: 151134
2012-02-22 05:59:10 +00:00
Jia Liu b22310fda6 Emacs-tag and some comment fixes for all ARM, CellSPU, Hexagon, MBlaze, MSP430, PPC, PTX, Sparc, X86, XCore.
llvm-svn: 150878
2012-02-18 12:03:15 +00:00
Andrew Trick 58648e4e98 Move pass configuration out of pass constructors: BranchFolderPass
llvm-svn: 150095
2012-02-08 21:22:48 +00:00
Craig Topper e55c556a24 Convert assert(0) to llvm_unreachable
llvm-svn: 149961
2012-02-07 02:50:20 +00:00
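
The pattern being converted, sketched around the real llvm_unreachable macro from llvm/Support/ErrorHandling.h (the enum and function below are invented for illustration and only compile against the LLVM headers):

    #include "llvm/Support/ErrorHandling.h"

    enum Kind { KindLoad, KindStore };  // illustrative, not from the tree

    unsigned latencyFor(Kind K) {
      switch (K) {
      case KindLoad:  return 3;
      case KindStore: return 1;
      }
      // Previously: assert(0 && "unknown kind"); return 0;
      // llvm_unreachable documents the invariant and marks the path dead even
      // in builds without assertions, so no dummy return value is needed.
      llvm_unreachable("unknown kind");
    }
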
Craig Topper c4965bce14 Convert assert(0) to llvm_unreachable
llvm-svn: 149814
2012-02-05 07:21:30 +00:00
Andrew Trick f8ea108c05 TargetPassConfig: confine the MC configuration to TargetMachine.
Passes prior to instructon selection are now split into separate configurable stages.
Header dependencies are simplified.
The bulk of this diff is simply removal of the silly DisableVerify flags.

Sorry for the target header churn. Attempting to stabilize them.

llvm-svn: 149754
2012-02-04 02:56:59 +00:00
Andrew Trick ccb673659a Added TargetPassConfig. The first little step toward configuring codegen passes.
Allows command line overrides to be centralized in LLVMTargetMachine.cpp.
LLVMTargetMachine can intercept common passes and give precedence to command line overrides.
Allows adding "internal" target configuration options without touching TargetOptions.
Encapsulates the PassManager.
Provides a good point to initialize all CodeGen passes so that Pass ID's can be used in APIs.
Allows modifying the target configuration hooks without rebuilding the world.

llvm-svn: 149672
2012-02-03 05:12:41 +00:00
Andrew Trick 808a7a6ce6 whitespace
llvm-svn: 149671
2012-02-03 05:12:30 +00:00
Owen Anderson d845d9d9e9 Widen the instruction encoder that TblGen emits to 64 bits, which should accommodate every target I can think of offhand.
llvm-svn: 148833
2012-01-24 18:37:29 +00:00
David Blaikie 46a9f016c5 More dead code removal (using -Wunreachable-code)
llvm-svn: 148578
2012-01-20 21:51:11 +00:00
Benjamin Kramer 084b9f4134 Remove a bunch of unused variable assignments.
Found by the clang static analyzer.

llvm-svn: 148541
2012-01-20 14:42:32 +00:00
Jakob Stoklund Olesen f1fb1d2375 Ignore register mask operands when lowering instructions to MC.
This is similar to implicit register operands.  MC doesn't understand
register liveness and call clobbers.

llvm-svn: 148437
2012-01-18 23:52:19 +00:00
Jim Grosbach e2d298168c Tidy up. 80 columns.
llvm-svn: 148401
2012-01-18 18:52:20 +00:00
Jim Grosbach aba3de99c0 Tidy up. MCAsmBackend naming conventions.
llvm-svn: 148400
2012-01-18 18:52:16 +00:00
Hal Finkel b1691ccaaa Cleanup PPC RLWINM8 vs RLWINM
No test case: output assembly will be identical.

llvm-svn: 148261
2012-01-16 23:22:50 +00:00
Benjamin Kramer 339ced4e34 Return an ArrayRef from ShuffleVectorSDNode::getMask and push it through CodeGen.
llvm-svn: 148218
2012-01-15 13:16:05 +00:00
David Blaikie edbb58c577 Remove unnecessary default cases in switches that cover all enum values.
llvm-svn: 147855
2012-01-10 16:47:17 +00:00
Benjamin Kramer 6898db6269 Remove VectorExtras. This unused helper was written for a type of API that is discouraged now.
llvm-svn: 147738
2012-01-07 19:42:13 +00:00
Hal Finkel 692d1fb355 Cleanup stack/frame register define/kill states. This fixes two bugs:
1. The ST*UX instructions that store and update the stack pointer did not set define/kill on R1. This became a problem when I activated post-RA scheduling (and had incorrectly adjusted the Frames-large test).

2. eliminateFrameIndex did not kill its scavenged temporary register, and this could cause the scavenger to exhaust all available registers (and its emergency spill slot) when there were a lot of CR values to spill. The 2010-02-12-saveCR test has been adjusted to check for this.

llvm-svn: 147359
2011-12-30 00:34:00 +00:00
Rafael Espindola 250096233b Fix an incomplete refactoring of the ppc backend. Thanks to rdivacky for reporting
it. It does need some tests...

llvm-svn: 147154
2011-12-22 18:38:06 +00:00
Rafael Espindola 428b9ee036 Fix cmake.
llvm-svn: 147126
2011-12-22 02:06:17 +00:00
Rafael Espindola 38a400df3b Move PPC bits to lib/Target/PowerPC.
llvm-svn: 147124
2011-12-22 01:57:09 +00:00
Rafael Espindola 1ad4095d6b Reduce the exposure of Triple::OSType in the ELF object writer. This will
avoid including ADT/Triple.h in many places when the target specific bits are
moved.

llvm-svn: 147059
2011-12-21 17:00:36 +00:00
Chandler Carruth e805b16e3d Fix up the CMake build for the new files added in r146960; they're
likely to stay either way that discussion ends up resolving.

llvm-svn: 146966
2011-12-20 08:42:11 +00:00
David Blaikie a379b18173 Unweaken vtables as per http://llvm.org/docs/CodingStandards.html#ll_virtual_anch
llvm-svn: 146960
2011-12-20 02:50:00 +00:00
Hal Finkel 9dd3f62b38 Ensure that the nop that should follow a bl call in PPC64 ELF actually does
llvm-svn: 146664
2011-12-15 17:54:01 +00:00
Chandler Carruth 637cc6a8aa Initial CodeGen support for CTTZ/CTLZ where a zero input produces an
undefined result. This adds new ISD nodes for the new semantics,
selecting them when the LLVM intrinsic indicates that the undef behavior
is desired. The new nodes expand trivially to the old nodes, so targets
don't actually need to do anything to support these new nodes besides
indicating that they should be expanded. I've done this for all the
operand types that I could figure out for all the targets. Owners of
various targets, please review and let me know if any of these are
incorrect.

Note that the expand behavior is *conservatively correct*, and exactly
matches LLVM's current behavior with these operations. Ideally this
patch will not change behavior in any way. For example the regtest suite
finds the exact same instruction sequences coming out of the code
generator. That's why there are no new tests here -- all of this is
being exercised by the existing test suite.

Thanks to Duncan Sands for reviewing the various bits of this patch and
helping me get the wrinkles ironed out with expanding for each target.
Also thanks to Chris for clarifying through all the discussions that
this is indeed the approach he was looking for. That said, there are
likely still rough spots. Further review much appreciated.

llvm-svn: 146466
2011-12-13 01:56:10 +00:00
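
The source-level distinction the new nodes capture, shown with the GCC/Clang builtins in plain C++: the builtin is undefined for a zero input, matching the new zero-is-undef nodes, while a guarded version needs the fully defined form.

    #include <cstdio>

    // __builtin_ctz(0) is undefined, which is what the new "undef on zero"
    // nodes model; the guarded version below is what the fully defined
    // operation has to provide.
    static unsigned countTrailingZeros(unsigned x) {
      return x ? (unsigned)__builtin_ctz(x) : 32u;  // zero case made explicit
    }

    int main() {
      std::printf("%u %u %u\n",
                  countTrailingZeros(8u),   // 3
                  countTrailingZeros(1u),   // 0
                  countTrailingZeros(0u));  // 32
      return 0;
    }
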
Daniel Dunbar 8889bb08b8 LLVMBuild: Introduce a common section which currently has a list of the
subdirectories to traverse into.
 - Originally I wanted to avoid this and just autoscan, but this has one key
flaw in that new subdirectories cannot automatically trigger a rerun of the
   llvm-build tool. This is particularly a pain when switching back and forth
   between trees where one has added a subdirectory, as the dependencies will
tend to be wrong. This also eliminates the FIXME implicitly.

llvm-svn: 146436
2011-12-12 22:45:54 +00:00
Daniel Dunbar 27a7489a03 LLVMBuild: Remove trailing newline, which irked me.
llvm-svn: 146409
2011-12-12 19:48:00 +00:00
Hal Finkel 67a7f18faf Make CR spill and restore use a reserved register. These operations cannot use the register scavenger because the scavenger can only scavenge one register and frame-index elimination may have already grabbed it.
llvm-svn: 146318
2011-12-10 04:50:53 +00:00
Owen Anderson 0b9b9da6c8 Teach SelectionDAG to match more calls to libm functions onto existing SDNodes. Mark these nodes as illegal by default, unless the target declares otherwise.
llvm-svn: 146171
2011-12-08 19:32:14 +00:00
Hal Finkel 528ff4bee0 MTCTR needs to be glued to BCTR so that CTR is not marked dead in MTCTR (another find by -verify-machineinstrs)
llvm-svn: 146137
2011-12-08 04:36:44 +00:00
Evan Cheng 7f8e563a69 Add bundle aware API for querying instruction properties and switch the code
generator to it. For non-bundle instructions, these behave exactly the same
as the MC layer API.

For properties like mayLoad / mayStore, look into the bundle and if any of the
bundled instructions has the property it would return true.
For properties like isPredicable, only return true if *all* of the bundled
instructions have the property.
For properties like canFoldAsLoad, isCompare, conservatively return false for
bundles.

llvm-svn: 146026
2011-12-07 07:15:52 +00:00
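
A toy, self-contained C++ sketch of the aggregation rules described above (the types are invented; the real code walks MachineInstr bundles): "any" semantics for mayLoad/mayStore, "all" semantics for isPredicable, and a conservative false for canFoldAsLoad.

    #include <vector>

    // Invented stand-ins for bundled instructions; illustrative only.
    struct InstrProps {
      bool mayLoad;
      bool isPredicable;
    };

    struct Bundle {
      std::vector<InstrProps> instrs;

      // "Any" semantics: the bundle may load if any member may load.
      bool mayLoad() const {
        for (const InstrProps &I : instrs)
          if (I.mayLoad) return true;
        return false;
      }

      // "All" semantics: predicable only if every member is predicable.
      bool isPredicable() const {
        for (const InstrProps &I : instrs)
          if (!I.isPredicable) return false;
        return !instrs.empty();
      }

      // Conservative: never treat a whole bundle as foldable-as-load.
      bool canFoldAsLoad() const { return false; }
    };

    int main() {
      Bundle B;
      B.instrs.push_back({true, true});
      B.instrs.push_back({false, true});
      return (B.mayLoad() && B.isPredicable() && !B.canFoldAsLoad()) ? 0 : 1;
    }
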
Hal Finkel ac9df3d411 make CR spill and restore 64-bit clean (no functional change), and fix some other problems found with -verify-machineinstrs
llvm-svn: 146024
2011-12-07 06:34:06 +00:00
Hal Finkel 16c744180d make base register selection used in eliminateFrameIndex 64-bit clean
llvm-svn: 146023
2011-12-07 06:34:02 +00:00
Hal Finkel abbc2529c1 set mayStore and mayLoad on CR pseudos
llvm-svn: 146022
2011-12-07 06:33:57 +00:00
Hal Finkel 2ba61e47a9 64-bit LR8 load should use X11 not R11
llvm-svn: 146021
2011-12-07 06:32:37 +00:00
Hal Finkel bde7f8ffe2 add RESTORE_CR and support CR unspills
llvm-svn: 145961
2011-12-06 20:55:36 +00:00
Hal Finkel 4ec02b02ac remove old FIXME
llvm-svn: 145960
2011-12-06 20:52:56 +00:00
NAKAMURA Takumi d3002490bf MipsAsmBackend.cpp, PPCAsmBackend.cpp: Fix -Asserts build to appease msvc.
llvm-svn: 145894
2011-12-06 01:48:32 +00:00
Jim Grosbach 25b63fa117 Move target-specific logic out of generic MCAssembler.
Whether a fixup needs relaxation for the associated instruction is a
target-specific function, as the FIXME indicated. Create a hook for that
and use it.

llvm-svn: 145881
2011-12-06 00:47:03 +00:00
Hal Finkel 8f6834dfa5 enable PPC register scavenging by default (update tests and remove some FIXMEs)
llvm-svn: 145819
2011-12-05 17:55:17 +00:00
Hal Finkel 72a26e8b8d don't include CR bit subregs in callee-saved list
llvm-svn: 145818
2011-12-05 17:55:12 +00:00
Hal Finkel b544019a60 add register pressure for CR regs
llvm-svn: 145816
2011-12-05 17:54:17 +00:00
Nick Lewycky 50f02cb21b Move global variables in TargetMachine into new TargetOptions class. As an API
change, now you need a TargetOptions object to create a TargetMachine. Clang
patch to follow.

One small functionality change in PTX. PTX had commented out the machine
verifier parts in their copy of printAndVerify. That now calls the version in
LLVMTargetMachine. Users of PTX who need verification disabled should rely on
not passing the command-line flag to enable it.

llvm-svn: 145714
2011-12-02 22:16:29 +00:00
Hal Finkel f9ce7b60ef remove unneeded FIXME comment
llvm-svn: 145679
2011-12-02 04:58:17 +00:00
Hal Finkel 58ca360081 update PPC 970 hazard rec. to function in postRA mode
llvm-svn: 145676
2011-12-02 04:58:02 +00:00