This is the final patch in the series of patches that improves
BUILD_VECTOR handling on PowerPC. This adds a few peephole optimizations
to remove redundant instructions. It also adds a large test case which
encompasses a large set of code patterns that build vectors - this test
case was the motivator for this series of patches.
Differential Revision: https://reviews.llvm.org/D26066
llvm-svn: 288800
VSX has instructions lxsiwax/lxsdx that can load 32/64 bit value into VSX register cheaply. That patch makes it known to memory cost model, so the vectorization of the test case in pr30990 is beneficial.
Differential Revision: https://reviews.llvm.org/D26713
llvm-svn: 288560
Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.
Differential Revision: https://reviews.llvm.org/D26594
llvm-svn: 288458
This is per function data so it is better kept at the function instead
of the module.
This is a necessary step to have machine module passes work properly.
Differential Revision: https://reviews.llvm.org/D27185
llvm-svn: 288291
This patch corresponds to review:
https://reviews.llvm.org/D26023
This patch adds support for converting a vector of loads into a single load if
the loads are consecutive (in either direction).
llvm-svn: 288219
This patch corresponds to review:
https://reviews.llvm.org/D25980
This is the 2nd patch in a series of 4 that improve the lowering and combining
for BUILD_VECTOR nodes on PowerPC. This particular patch combines a build vector
of fp-to-int conversions into an fp-to-int conversion of a build vector of fp
values. For example:
Converts (build_vector (fp_to_[su]i $A), (fp_to_[su]i $B), ...)
Into (fp_to_[su]i (build_vector $A, $B, ...))).
Which is a natural match for much cleaner code.
llvm-svn: 288218
This commit caused some miscompiles that did not show up on any of the bots.
Reverting until we can investigate the cause of those failures.
llvm-svn: 288214
This patch corresponds to review:
https://reviews.llvm.org/D25912
This is the first patch in a series of 4 that improve the lowering and combining
for BUILD_VECTOR nodes on PowerPC.
llvm-svn: 288152
In rL283190, I added some InstAlias definitions to generate extended mnemonics
for some uses of the XXPERMDI instruction. However, when the assembler matches
these extended mnemonics, it matches the new instruction in situations where it
should match the old one.
This patch removes these definitions and accomplishes that by defining these
mnemonics with additional instructions that are isCodeGenOnly.
Fixes PR31127.
llvm-svn: 287765
Summary:
* ARM is omitted from this patch because this check appears to expose bugs in this target.
* Mips is omitted from this patch because this check either detects bugs or deliberate
emission of instructions that don't satisfy their predicates. One deliberate
use is the SYNC instruction where the version with an operand is correctly
defined as requiring MIPS32 while the version without an operand is defined
as an alias of 'SYNC 0' and requires MIPS2.
* X86 is omitted from this patch because it doesn't use the tablegen-erated
MCCodeEmitter infrastructure.
Patches for ARM and Mips will follow.
Depends on D25617
Reviewers: tstellarAMD, jmolloy
Subscribers: wdng, jmolloy, aemerson, rengolin, arsenm, jyknight, nemanjai, nhaehnle, tstellarAMD, llvm-commits
Differential Revision: https://reviews.llvm.org/D25618
llvm-svn: 287439
When we see a SETCC whose only users are zero extend operations, we can replace
it with a subtraction. This results in doing all calculations in GPRs and
avoids CR use.
Currently we do this only for ULT, ULE, UGT and UGE condition codes. There are
ways that this can be extended. For example for signed condition codes. In that
case we will be introducing additional sign extend instructions, so more careful
profitability analysis may be required.
Another direction to extend this is for equal, not equal conditions. Also when
users of SETCC are any_ext or sign_ext, we might be able to do something
similar.
llvm-svn: 287329
For the default, small and medium code model, use the existing
difference from the jump table towards the label. For all other code
models, setup the picbase and use the difference between the picbase and
the block address.
Overall, this results in smaller data tables at the expensive of one or
two more arithmetic operation at the jump site. Given that we only create
jump tables with a lot more than two entries, it is a net win in size.
For larger code models the assumption remains that individual functions
are no larger than 2GB.
Differential Revision: https://reviews.llvm.org/D26336
llvm-svn: 287059
This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE,
they behaves exactly the same with vec_xl and vec_xst, therefore they are
simply implemented by defining a matching macro. On LE, they are implemented
by defining new builtins and intrinsics. For int/float/long long/double, it
is just a load (lxvw4x/lxvd2x) or store(stxvw4x/stxvd2x). For char/char/short,
we also need some extra shuffling before or after call the builtins to get the
desired BE order. For int128, simply call vec_xl or vec_xst.
llvm-svn: 286967
add an intrinsic to expose the 'VSX Scalar Convert Half-Precision to
Single-Precision' instruction.
Differential review: https://reviews.llvm.org/D26536
llvm-svn: 286862
This patch corresponds to review:
https://reviews.llvm.org/D26480
Adds all the intrinsics used for various permute builtins that will
be added to altivec.h.
llvm-svn: 286638
This patch corresponds to review:
https://reviews.llvm.org/D26307
Adds all the intrinsics used for various conversion builtins that will
be added to altivec.h. These are type conversions between various types of
vectors.
llvm-svn: 286596
The generic infrastructure to compute the Newton series for reciprocal and
reciprocal square root was conceived to allow a target to compute the series
itself. However, the original code did not properly consider this condition
if returned by a target. This patch addresses the issues to allow a target
to compute the series on its own.
Differential revision: https://reviews.llvm.org/D22975
llvm-svn: 286523
behind the test that the MachineModuleInfo analysis was
actually available and can be used.
While the MachO bits may well be reasonable to assume in the darwin
assembly printer, the analysis isn't constructively guaranteed anywhere
I could find so it seems safest to avoid crashing here.
This issue was found with PVS-Studio. Pretty sure the Clang Static
Anaylzer flags similar issues but we've probably never pointed it at
this code effectively.
llvm-svn: 285972
GPRC and GPRC_NOR0 (or the 64bit equivalent) and not just the latter.
GPRC_NOR0 contains ZERO as alternative meaning of r0 and is therefore
not a true subclass of GPRC.
llvm-svn: 285813
This patch corresponds to review:
https://reviews.llvm.org/D25896
It just eliminates the redundant ZExt after a count trailing zeros instruction.
llvm-svn: 285267
These functions are about classifying a global which will actually be
emitted, so it does not make sense for them to take a GlobalValue which may
for example be an alias.
Change the Mach-O object writer and the Hexagon, Lanai and MIPS backends to
look through aliases before using TargetLoweringObjectFile interfaces. These
are functional changes but all appear to be bug fixes.
Differential Revision: https://reviews.llvm.org/D25917
llvm-svn: 285006
https://reviews.llvm.org/D24924
This improves the code generated for a sequence of AND, ANY_EXT, SRL instructions. This is a targetted fix for this special pattern. The pattern is generated by target independet dag combiner and so a more general fix may not be necessary. If we come across other similar cases, some ideas for handling it are discussed on the code review.
llvm-svn: 284983
This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs
caused by faulty usage of StringRef.
This version also renames a pair of functions:
getRecipEstimateDivEnabled()
getRecipEstimateSqrtEnabled()
as suggested by Eric Christopher.
original commit msg:
[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering
This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.
This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.
If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.
As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.
Differential Revision: https://reviews.llvm.org/D25440
llvm-svn: 284746
All of these existed because MSVC 2013 was unable to synthesize default
move ctors. We recently dropped support for it so all that error-prone
boilerplate can go.
No functionality change intended.
llvm-svn: 284721
This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.
This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.
If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.
As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.
Differential Revision: https://reviews.llvm.org/D25440
llvm-svn: 284495
This is a patch to implement pr30640.
When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into high word of the same register.
This optimization caused failure of test case bperm.ll because of not optimal heuristic in function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and OR them together. This optimization lowers the cost of loading 64bit constant mask used in AND method, and causes different code sequence. But actually ROTATE method is better in this test case. The reason is in ROTATE method the final OR operation can be avoided since rldimi can insert the rotated bits into target register directly. So this patch also enhances SelectAndParts64 to prefer ROTATE method when the two methods have same cost and there are multiple bit groups need to be ORed together.
Differential Revision: https://reviews.llvm.org/D25521
llvm-svn: 284276
Summary:
In PPCMIPeephole, when we see two splat instructions, we can't simply do the following transformation:
B = Splat A
C = Splat B
=>
C = Splat A
because B may still be used between these two instructions. Instead, we should make the second Splat a PPC::COPY and let later passes decide whether to remove it or not:
B = Splat A
C = Splat B
=>
B = Splat A
C = COPY B
Fixes PR30663.
Reviewers: echristo, iteratee, kbarton, nemanjai
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D25493
llvm-svn: 283961
Summary:
I had for the second time today a bug where llvm::format("%s", Str)
was called with Str being a StringRef. The Linux and MacOS bots were
fine, but windows having different calling convention, it printed
garbage.
Instead we can catch this at compile-time: it is never expected to
call a C vararg printf-like function with non scalar type I believe.
Reviewers: bogner, Bigcheese, dexonsmith
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25266
llvm-svn: 283509
The motivation for the change is that we can't have pseudo-global settings for
codegen living in TargetOptions because that doesn't work with LTO.
Ideally, these reciprocal attributes will be moved to the instruction-level via
FMF, metadata, or something else. But making them function attributes is at least
an improvement over the current state.
The ingredients of this patch are:
Remove the reciprocal estimate command-line debug option.
Add TargetRecip to TargetLowering.
Remove TargetRecip from TargetOptions.
Clean up the TargetRecip implementation to work with this new scheme.
Set the default reciprocal settings in TargetLoweringBase (everything is off).
Update the PowerPC defaults, users, and tests.
Update the x86 defaults, users, and tests.
Note that if this patch needs to be reverted, the related clang patch checked in
at r283251 should be reverted too.
Differential Revision: https://reviews.llvm.org/D24816
llvm-svn: 283252
This patch corresponds to review:
The newly added VSX D-Form (register + offset) memory ops target the upper half
of the VSX register set. The existing ones target the lower half. In order to
unify these and have the ability to target all the VSX registers using D-Form
operations, this patch defines Pseudo-ops for the loads/stores which are
expanded post-RA. The expansion then choses the correct opcode based on the
register that was allocated for the operation.
llvm-svn: 283212
This patch corresponds to review:
https://reviews.llvm.org/D23155
This patch removes the VSHRC register class (based on D20310) and adds
exploitation of the Power9 sub-word integer loads into VSX registers as well
as vector sign extensions.
The new instructions are useful for a few purposes:
Int to Fp conversions of 1 or 2-byte values loaded from memory
Building vectors of 1 or 2-byte integers with values loaded from memory
Storing individual 1 or 2-byte elements from integer vectors
This patch implements all of those uses.
llvm-svn: 283190
The PPC branch-selection pass, which performs branch relaxation, needs to
account for the padding that might be introduced to satisfy block alignment
requirements. We were assuming that the first block was at offset zero (i.e.
had the alignment of the function itself), but under the ELFv2 ABI, a global
entry function prologue is added to the first block, and it is a
two-instruction sequence (i.e. eight-bytes long). If the function has 16-byte
alignment, the fact that the first block is eight bytes offset from the start
of the function is relevant to calculating where padding will be added in
between later blocks.
Unfortunately, I don't have a small test case.
llvm-svn: 283086
This change enables soft-float for PowerPC64, and also makes soft-float disable
all vector instruction sets for both 32-bit and 64-bit modes. This latter part
is necessary because the PPC backend canonicalizes many Altivec vector types to
floating-point types, and so soft-float breaks scalarization support for many
operations. Both for embedded targets and for operating-system kernels desiring
soft-float support, it seems reasonable that disabling hardware floating-point
also disables vector instructions (embedded targets without hardware floating
point support are unlikely to have Altivec, etc. and operating system kernels
desiring not to use floating-point registers to lower syscall cost are unlikely
to want to use vector registers either). If someone needs this to work, we'll
need to change the fact that we promote many Altivec operations to act on
v4f32. To make it possible to disable Altivec when soft-float is enabled,
hardware floating-point support needs to be expressed as a positive feature,
like the others, and not a negative feature, because target features cannot
have dependencies on the disabling of some other feature. So +soft-float has
now become -hard-float.
Fixes PR26970.
llvm-svn: 283060
This patch corresponds to review:
https://reviews.llvm.org/D24396
This patch adds support for the "vector count trailing zeroes",
"vector compare not equal" and "vector compare not equal or zero instructions"
as well as "scalar count trailing zeroes" instructions. It also changes the
vector negation to use XXLNOR (when VSX is enabled) so as not to increase
register pressure (previously this was done with a splat immediate of all
ones followed by an XXLXOR). This was done because the altivec.h
builtins (patch to follow) use vector negation and the use of an additional
register for the splat immediate is not optimal.
llvm-svn: 282478
This patch corresponds to review:
https://reviews.llvm.org/D21135
This patch exploits the following instructions:
mtvsrws
lxvwsx
mtvsrdd
mfvsrld
In order to improve some build_vector and extractelement patterns.
llvm-svn: 282246
Atomic comparison instructions use the sub-word load instruction on
Power8 and up but the value is not sign extended prior to the signed word
compare instruction. This patch adds that sign extension.
llvm-svn: 282182
This patch corresponds to:
https://reviews.llvm.org/D21409
The LXVD2X, LXVW4X, STXVD2X and STXVW4X instructions permute the two doublewords
in the vector register when in little-endian mode. Custom code ensures that the
necessary swaps are inserted for these. This patch simply removes the possibilty
that a load/store node will match one of these instructions in the SDAG as that
would not insert the necessary swaps.
llvm-svn: 282144
This patch corresponds to review:
https://reviews.llvm.org/D19825
The new lxvx/stxvx instructions do not require the swaps to line the elements
up correctly. In order to select them over the lxvd2x/lxvw4x instructions which
require swaps, the patterns for the old instruction have a predicate that
ensures they won't be selected on Power9 and newer CPUs.
llvm-svn: 282143
When a phi node is finally lowered to a machine instruction it is
important that the lowered "load" instruction is placed before the
associated DEBUG_VALUE entry describing the value loaded.
Renamed the existing SkipPHIsAndLabels to SkipPHIsLabelsAndDebug to
more fully describe that it also skips debug entries. Then used the
"new" function SkipPHIsAndLabels when the debug information should not
be skipped when placing the lowered "load" instructions so that it is
placed before the debug entries.
Differential Revision: https://reviews.llvm.org/D23760
llvm-svn: 281727
This patch corresponds to review:
https://reviews.llvm.org/D24021
In the initial implementation of this instruction, I forgot to account for
variable indices. This patch fixes PR30189 and should probably be merged into
3.9.1 (I'll open a bug according to the new instructions).
llvm-svn: 281479
There is currently no codegen for Power9 that depends on the directive
so this is NFC for now but will be important in the future. This was
missed in r268950 so I'm adding it now.
llvm-svn: 281473
Summary:
An IR load can be invariant, dereferenceable, neither, or both. But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.
This patch splits up the notions of invariance and dereferenceability at
the MI level. It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D23371
llvm-svn: 281151
When folding an addi into a memory access that can take an immediate offset, we
were implicitly assuming that the existing offset was zero. This was incorrect.
If we're dealing with an addi with a plain constant, we can add it to the
existing offset (assuming that doesn't overflow the immediate, etc.), but if we
have anything else (i.e. something that will become a relocation expression),
we'll go back to requiring the existing immediate offset to be zero (because we
don't know what the requirements on that relocation expression might be - e.g.
maybe it is paired with some addis in some relevant way).
On the other hand, when dealing with a plain addi with a regular constant
immediate, the alignment restrictions (from the TOC base pointer, etc.) are
irrelevant.
I've added the test case from PR30280, which demonstrated the bug, but also
demonstrates a missed optimization opportunity (i.e. we don't need the memory
accesses at all).
Fixes PR30280.
llvm-svn: 280789
Unlike PPC64, PPC32/SVRV4 does not have red zone. In the absence of it
there is no guarantee that this part of the stack will not be modified
by any interrupt. To avoid this, make sure to claim the stack frame first
before storing into it.
This fixes https://llvm.org/bugs/show_bug.cgi?id=26519.
Differential Revision: https://reviews.llvm.org/D24093
llvm-svn: 280705
We used to compute the padding contributions to the block sizes during branch
relaxation only at the start of the transformation. As we perform branch
relaxation, we change the sizes of the blocks, and so the amount of inter-block
padding might change. Accordingly, we need to recompute the (alignment-based)
padding in between every iteration on our way toward the fixed point.
Unfortunately, I don't have a test case (and none was provided in the bug
report), and while this obviously seems needed, algorithmically, I don't have
any way of generating a small and/or non-fragile regression test.
llvm-svn: 280626
As it turns out, whether we zero-extend or sign-extend i8/i16 constants, which
are illegal types promoted to i32 on PowerPC, is a choice constrained by
assumptions within the infrastructure. Specifically, the logic in
FunctionLoweringInfo::ComputePHILiveOutRegInfo assumes that constant PHI
operands will be zero extended, and so, at least when materializing constants
that are PHI operands, we must do the same.
The rest of our fast-isel implementation does not appear to depend on the fact
that we were sign-extending i8/i16 constants, and all other targets also appear
to zero-extend small-bitwidth constants in fast-isel; we'll now do the same (we
had been doing this only for i1 constants, and sign-extending the others).
Fixes PR27721.
llvm-svn: 280614
PowerPC assembly code in the wild, so it seems, has things like this:
bc+ 12, 28, .L9
This is a bit odd because the '+' here becomes part of the BO field, and the BO
field is otherwise the first operand. Nevertheless, the ISA specification does
clearly say that the +- hint syntax applies to all conditional-branch mnemonics
(that test either CTR or a condition register, although not the forms which
check both), both basic and extended, so this is supposed to be valid.
This introduces some asm-parser-only definitions which take only the upper
three bits from the specified BO value, and the lower two bits are implied by
the +- suffix (via some associated aliases).
Fixes PR23646.
llvm-svn: 280571
dcbf has an optional hint-like field, add support for the extended form and the
associated mnemonics (dcbfl and dcbflp).
Partially fixes PR24796.
llvm-svn: 280559
When we have an offset into a global, etc. that is accessed relative to the TOC
base pointer, and the offset is larger than the minimum alignment of the global
itself and the TOC base pointer (which is 8-byte aligned), we can still fold
the @toc@ha into the memory access, but we must update the addis instruction's
symbol reference with the offset as the symbol addend. When there is only one
use of the addi to be folded and only one use of the addis that would need its
symbol's offset adjusted, then we can make the adjustment and fold the @toc@l
into the memory access.
llvm-svn: 280545
As Sanjay suggested when he added the hook, PPC should return true from
hasAndNotCompare. We have an efficient negated 'and' on PPC (which can feed a
compare).
Fixes PR27203.
llvm-svn: 280457
Following a suggestion by Sanjay, we should lower:
%shl = shl i32 1, %y
%and = and i32 %x, %shl
%cmp = icmp eq i32 %and, %shl
ret i1 %cmp
into:
subfic r4, r4, 32
rlwnm r3, r3, r4, 31, 31
Add this pattern and some associated patterns for the 64-bit case and the
not-equal case. Fixes PR27356.
llvm-svn: 280454
When applying our address-formation PPC64 peephole, we are reusing the @ha TOC
addis value with the low parts associated with different offsets (i.e.
different effective symbol addends). We were assuming this was okay so long as
the offsets were less than the alignment of the global variable being accessed.
This ignored the fact, however, that the TOC base pointer itself need only be
8-byte aligned. As a result, what we were doing is legal only for offsets less
than 8 regardless of the alignment of the object being accessed.
Fixes PR28727.
llvm-svn: 280441
The logic in this function assumes that the P8 supports fusion of addis/addi,
but it does not. As a result, there is no advantage to restricting our peephole
application, merging addi instructions into dependent memory accesses, even
when the addi has multiple users, regardless of whether or not we're optimizing
for size.
We might need something like this again for the P9; I suspect we'll revisit
this code when we work on P9 tuning.
llvm-svn: 280440
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible
__builtin_dwarf_cfa() builtin. As pointed out in PR26761, this is currently
broken on PowerPC (and likely on ARM as well). Currently, @llvm.eh.dwarf.cfa is
lowered using:
ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET)
where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86,
FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however, does not
work for PowerPC. Because of the way that the stack layout works, the canonical
frame address is not exactly (FRAMEADDR + FRAME_TO_ARGS_OFFSET) on PowerPC
(there is a lower save-area offset as well), so it is not just a matter of
implementing FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its
semantics -- We can do that, since it is currently used only for
@llvm.eh.dwarf.cfa lowering, but the better to directly lower the CFA construct
itself (since it can be easily represented as a fixed-offset FrameIndex)). Mips
currently does this, but by using a custom lowering for ADD that specifically
recognizes the (FRAMEADDR, FRAME_TO_ARGS_OFFSET) pattern.
This change introduces a ISD::EH_DWARF_CFA node, which by default expands using
the existing logic, but can be directly lowered by the target. Mips is updated
to use this method (which simplifies its implementation, and I suspect makes it
more robust), and updates PowerPC to do the same.
Fixes PR26761.
Differential Revision: https://reviews.llvm.org/D24038
llvm-svn: 280350
When a function contains something, such as inline asm, which explicitly
clobbers the register used as the frame pointer, don't spill it twice. If we
need a frame pointer, it will be saved/restored in the prologue/epilogue code.
Explicitly spilling it again will reuse the same spill slot used by the
prologue/epilogue code, thus clobbering the saved value. The same applies
to the base-pointer or PIC-base register.
Partially fixes PR26856. Thanks to Ulrich for his analysis and the small
inline-asm reproducer.
llvm-svn: 280188
Implement Bill's suggested fix for 32-bit targets for PR22711 (for the
alignment of each entry). As pointed out in the bug report, we could just force
the section alignment, since we only add pointer-sized things currently, but
this fix is somewhat more future-proof.
llvm-svn: 280049
The "long call" option forces the use of the indirect calling sequence for all
calls (even those that don't really need it). GCC provides this option; This is
helpful, under certain circumstances, for building very-large binaries, and
some other specialized use cases.
Fixes PR19098.
llvm-svn: 280040
For little-Endian PowerPC, we generally target only P8 and later by default.
However, generic (older) 64-bit configurations are still an option, and in that
case, partword atomics are not available (e.g. stbcx.). To lower i8/i16 atomics
without true i8/i16 atomic operations, we emulate using i32 atomics in
combination with a bunch of shifting and masking, etc. The amount by which to
shift in little-Endian mode is different from the amount in big-Endian mode (it
is inverted -- meaning we can leave off the xor when computing the amount).
Fixes PR22923.
llvm-svn: 280022
Rename AllVRegsAllocated to NoVRegs. This avoids the connotation of
running after register and simply describes that no vregs are used in
a machine function. With that we can simply compute the property and do
not need to dump/parse it in .mir files.
Differential Revision: http://reviews.llvm.org/D23850
llvm-svn: 279698
The names of the tablegen defs now match the names of the ISD nodes.
This makes the world a slightly saner place, as previously "fround" matched
ISD::FP_ROUND and not ISD::FROUND.
Differential Revision: https://reviews.llvm.org/D23597
llvm-svn: 279129
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.
llvm-svn: 278902
This is a quick work around, because in some cases, e.g. caller's stack
size > callee's stack size, we are still able to apply sibling call
optimization even callee has any byval arg.
This patch fix: https://llvm.org/bugs/show_bug.cgi?id=28328
Reviewers: hfinkel kbarton nemanjai amehsan
Subscribers: hans, tjablin
https://reviews.llvm.org/D23441
llvm-svn: 278900
Following the discussion on D22038, this refactors a PowerPC specific setcc -> srl(ctlz) transformation so it can be used by other targets.
Differential Revision: https://reviews.llvm.org/D23445
llvm-svn: 278799
Summary: It triggers exponential behavior when the DAG has many branches.
Reviewers: hfinkel, kbarton
Subscribers: iteratee, nemanjai, echristo
Differential Revision: https://reviews.llvm.org/D23428
llvm-svn: 278548
There were two locations where fast-isel would generate a LFD instruction
with a target register class VSFRC instead of F8RC when VSX was enabled.
This can ccause invalid registers to be used in certain cases, like:
lfd 36, ...
instead of using a VSX load instruction. The wrong register number gets
silently truncated, causing invalid code to be generated.
The first place is PPCFastISel::PPCEmitLoad, which had multiple problems:
1.) The IsVSSRC and IsVSFRC flags are not initialized correctly, since they
are computed from resultReg, which is still zero at this point in many cases.
Fixed by changing the helper routines to operate on a register class instead
of a register and passing in UseRC.
2.) Even with this fixed, Is64VSXLoad is still wrong due to a typo:
bool Is32VSXLoad = IsVSSRC && Opc == PPC::LFS;
bool Is64VSXLoad = IsVSSRC && Opc == PPC::LFD;
The second line needs to use isVSFRC (like PPCEmitStore does).
3.) Once both the above are fixed, we're now generating a VSX instruction --
but an incorrect one, since generation of an indexed instruction with null
index is wrong. Fixed by copying the code handling the same issue in
PPCEmitStore.
The second place is PPCFastISel::PPCMaterializeFP, where we would emit an
LFD to load a constant from the literal pool, and use the wrong result
register class. Fixed by hardcoding a F8RC class even on systems
supporting VSX.
Fixes: https://llvm.org/bugs/show_bug.cgi?id=28630
Differential Revision: https://reviews.llvm.org/D22632
llvm-svn: 277823
This patch fixes passing long double type arguments to function in
soft float mode. If there is less than 4 argument registers free
(long double type is mapped in 4 gpr registers in soft float mode)
long double type argument must be passed through stack.
Differential Revision: https://reviews.llvm.org/D20114.
llvm-svn: 277804
This patch fixes pr25548.
Current implementation of PPCBoolRetToInt doesn't handle CallInst correctly, so it failed to do the intended optimization when there is a CallInst with parameters. This patch fixed that.
llvm-svn: 277655
This adds a target hook getInstSizeInBytes to TargetInstrInfo that a lot of
subclasses already implement.
Differential Revision: https://reviews.llvm.org/D22885
llvm-svn: 277126
Avoid implicit conversions from MachineInstrBundleIterator to
MachineInstr* in the PowerPC backend, mainly by preferring MachineInstr&
over MachineInstr* when a pointer isn't nullable and using range-based
for loops.
There was one piece of questionable code in PPCInstrInfo::AnalyzeBranch,
where a condition checked a pointer converted from an iterator for
nullptr. Since this case is impossible (moreover, the code above
guarantees that the iterator is valid), I removed the check when I
changed the pointer to a reference.
Despite that case, there should be no functionality change here.
llvm-svn: 276864
Some targets, notably AArch64 for ILP32, have different relocation encodings
based upon the ABI. This is an enabling change, so a future patch can use the
ABIName from MCTargetOptions to chose which relocations to use. Tested using
check-llvm.
The corresponding change to clang is in: http://reviews.llvm.org/D16538
Patch by: Joel Jones
Differential Revision: https://reviews.llvm.org/D16213
llvm-svn: 276654
This patch corresponds to review:
https://reviews.llvm.org/D21354
We use direct moves for extracting integer elements from vectors. We also use
direct moves when converting integers to FP. When these operations are chained,
we get a direct move out of a VSR followed by a direct move back into a VSR.
These are redundant - all we need to do is line up the element and convert.
llvm-svn: 275796
Summary:
Instead, we take a single flags arg (a bitset).
Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.
This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted. It also greatly simplifies the process of adding another flag
to getLoad.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits
Differential Revision: http://reviews.llvm.org/D22249
llvm-svn: 275592
This patch corresponds to review:
http://reviews.llvm.org/D20239
It adds exploitation of XXINSERTW and XXEXTRACTUW instructions that
are useful in some cases for inserting and extracting vector elements of
v4[if]32 vectors.
llvm-svn: 275215
This patch corresponds to review:
http://reviews.llvm.org/D21358
Vector shifts that have the same semantics as a vector swap are cannonicalized
as such to provide additional opportunities for swap removal optimization to
remove unnecessary swaps.
llvm-svn: 275168
There is a problem in VSXSwapRemoval where it is incorrectly removing permute instructions.
In this case, the permute is feeding both a vector store and also a non-store instruction. In this case, the permute cannot be removed.
The fix is to simply look at all the uses of the vector register defined by the permute and ensure that all the uses are vector store instructions.
This problem was reported in PR 27735 (https://llvm.org/bugs/show_bug.cgi?id=27735).
Test case based on the original problem reported.
Phabricator Review: http://reviews.llvm.org/D21802
llvm-svn: 274645
This patch corresponds to review:
http://reviews.llvm.org/D20443
It changes the legalization strategy for illegal vector types from integer
promotion to widening. This only applies for vectors with elements of width
that is a multiple of a byte since we have hardware support for vectors with
1, 2, 3, 8 and 16 byte elements.
Integer promotion for vectors is quite expensive on PPC due to the sequence
of breaking apart the vector, extending the elements and reconstituting the
vector. Two of these operations are expensive.
This patch causes between minor and major improvements in performance on most
benchmarks. There are very few benchmarks whose performance regresses. These
regressions can be handled in a subsequent patch with a DAG combine (similar
to how this patch handles int -> fp conversions of illegal vector types).
llvm-svn: 274535
TargetSubtargetInfo::overrideSchedPolicy takes two MachineInstr*
arguments (begin and end) that invite implicit conversions from
MachineInstrBundleIterator. One option would be to change their type to
an iterator, but since they don't seem to have been used since the API
was added in 2010, I'm deleting the dead code.
llvm-svn: 274304
This is a mechanical change to make TargetLowering API take MachineInstr&
(instead of MachineInstr*), since the argument is expected to be a valid
MachineInstr. In one case, changed a parameter from MachineInstr* to
MachineBasicBlock::iterator, since it was used as an insertion point.
As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.
llvm-svn: 274287
This is mostly a mechanical change to make TargetInstrInfo API take
MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator)
when the argument is expected to be a valid MachineInstr. This is a
general API improvement.
Although it would be possible to do this one function at a time, that
would demand a quadratic amount of churn since many of these functions
call each other. Instead I've done everything as a block and just
updated what was necessary.
This is mostly mechanical fixes: adding and removing `*` and `&`
operators. The only non-mechanical change is to split
ARMBaseInstrInfo::getOperandLatencyImpl out from
ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a
`MachineInstr*` which it updated to the instruction bundle leader; now,
the latter calls the former either with the same `MachineInstr&` or the
bundle leader.
As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.
Note: I updated WebAssembly, Lanai, and AVR (despite being
off-by-default) since it turned out to be easy. I couldn't run tests
for AVR since llc doesn't link with it turned on.
llvm-svn: 274189
I think this converts all the simple cases that really just care about
the generated code being position independent or not. The remaining
uses are a bit more complicated and are checking things like "is this
a library or executable" or "can this symbol be preempted".
llvm-svn: 274055
The setCallee function will set the number of fixed arguments based
on the size of the argument list. The FixedArgs parameter was often
explicitly set to 0, leading to a lack of consistent value for non-
vararg functions.
Differential Revision: http://reviews.llvm.org/D20376
llvm-svn: 273403
We convert `Default` to `NotPIC` so that target independent code
can reason about this correctly.
Differential Revision: http://reviews.llvm.org/D21394
llvm-svn: 273024
Recommiting after fixing non-atomic insert to front of SmallVector in
MCAsmLexer.h
Add explicit Comment Token in Assembly Lexing for future support for
outputting explicit comments from inline assembly. As part of this,
CPPHash Directives are now explicitly distinguished from Hash line
comments in Lexer.
Line comments are recorded as EndOfStatement tokens, not Comment tokens
to simplify compatibility with current TargetParsers. This slightly
complicates comment output.
This remove all lexing tasks out of the parser, does minor cleanup
to remove extraneous newlines Asm Output, and some improvements white
space handling.
Reviewers: rtrieu, dwmw2, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20009
llvm-svn: 273007
Add explicit Comment Token in Assembly Lexing for future support for
outputting explicit comments from inline assembly. As part of this,
CPPHash Directives are now explicitly distinguished from Hash line
comments in Lexer.
Line comments are recorded as EndOfStatement tokens, not Comment tokens
to simplify compatibility with current TargetParsers. This slightly
complicates comment output.
This remove all lexing tasks out of the parser, does minor cleanup
to remove extraneous newlines Asm Output, and some improvements white
space handling.
Reviewers: rtrieu, dwmw2, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20009
llvm-svn: 272953
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.
llvm-svn: 272512
Using an LLVM IR aggregate return value type containing three
or more integer values causes an abort in the fast isel pass.
This patch adds two more registers to RetCC_PPC64_ELF_FIS to
allow returning up to four integers with fast isel, just the
same as is currently supported with regular isel (RetCC_PPC).
This is needed for Swift and (possibly) other non-clang frontends.
Fixes PR26190.
llvm-svn: 272005
subclasses. These are not passes proper. We don't support registering
them, they can't be constructed with default arguments, and the ID is
actually in a base class.
Only these two targets even had any boiler plate to try to do this, and
it had to be munged out of the INITIALIZE_PASS macros to work. What's
worse, the boiler plate has rotted and the "name" of the pass is
actually the description string now!!! =/ All of this is completely
unnecessary. No other target bothers, and nothing breaks if you don't
initialize them because CodeGen has an entirely separate initialization
path that is somewhat more durable than relying on the implicit
initialization the way the 'opt' tool does for registered passes.
llvm-svn: 271650
Fix PR27943 "Bad machine code: Using an undefined physical register".
SUBFC8 implicitly defines the CR0 register, but this was omitted in
the instruction definition.
Patch by Jameson Nash <jameson@juliacomputing.com>
Reviewers: hfinkel
Differential Revision: http://reviews.llvm.org/D20802
llvm-svn: 271425
- Where we were returning a node before, call ReplaceNode instead.
- Where we would return null to fall back to another selector, rename
the method to try* and return a bool for success.
- Where we were calling SelectNodeTo, just return afterwards.
Part of llvm.org/pr26808.
llvm-svn: 270283
Having an enum member named Default is quite confusing: Is it distinct
from the others?
This patch removes that member and instead uses Optional<Reloc> in
places where we have a user input that still hasn't been maped to the
default value, which is now clear has no be one of the remaining 3
options.
llvm-svn: 269988
While promoting nodes in PPCTargetLowering::DAGCombineExtBoolTrunc, it is
possible for one of the nodes to be replaced by another. To make sure we do not
visit the deleted nodes, and to make sure we visit the replacement nodes, use a
list of HandleSDNodes to track the to-be-promoted nodes during the promotion
process.
The same fix has been applied to the analogous code in
PPCTargetLowering::DAGCombineTruncBoolExt.
Fixes PR26985.
llvm-svn: 269272
Many files include Passes.h but only a fraction needs to know about the
TargetPassConfig class. Move it into an own header. Also rename
Passes.cpp to TargetPassConfig.cpp while we are at it.
llvm-svn: 269011
This patch corresponds to review:
http://reviews.llvm.org/D19683
Simply adds the bits for being able to specify -mcpu=pwr9 to the back end.
llvm-svn: 268950
This patch fixes register alignment for long double type in
soft float mode. Before this patch alignment was 8 and this
patch changes it to 4.
Differential Revision: http://reviews.llvm.org/D18034
llvm-svn: 268909
This is a step towards removing the rampant undefined behaviour in
SelectionDAG, which is a part of llvm.org/PR26808.
We rename SelectionDAGISel::Select to SelectImpl and update targets to
match, and then change Select to return void and consolidate the
sketchy behaviour we're trying to get away from there.
Next, we'll update backends to implement `void Select(...)` instead of
SelectImpl and eventually drop the base Select implementation.
llvm-svn: 268693
This patch corresponds to review:
http://reviews.llvm.org/D18592
It allows the PPC back end to generate the xxspltw instruction where we
previously only emitted vspltw.
llvm-svn: 268516
If, in between the splat and the load (which does an implicit splat), there is
a read of the splat register, then that register must have another earlier
definition. In that case, we can't replace the load's destination register with
the splat's destination register.
Unfortunately, I don't have a small or non-fragile test case.
llvm-svn: 268152
This requirement was a huge hack to keep LiveVariables alive because it
was optionally used by TwoAddressInstructionPass and PHIElimination.
However we have AnalysisUsage::addUsedIfAvailable() which we can use in
those passes.
This re-applies r260806 with LiveVariables manually added to PowerPC to
hopefully not break the stage 2 bots this time.
llvm-svn: 267954
This instruction is just a control flow marker - it should not
actually exist in the object file. Unfortunately, nothing catches
it before it gets to AsmPrinter. If integrated assembler is used,
it's considered to be a normal 4-byte instruction, and emitted as
an all-0 word, crashing the program. With external assembler,
a comment is emitted.
Fixed by setting Size to 0 and handling it in MCCodeEmitter - this
means the comment will still be emitted if integrated assembler
is not used.
This broke an ASan test, which has been disabled for a long time
as a result (see the discussion on D19657). We can reenable it
once this lands.
llvm-svn: 267943
Revert "[Power9] Implement add-pc, multiply-add, modulo, extend-sign-shift, random number, set bool, and dfp test significance".
This patch has caused a functional regression in SPEC2k6 namd, and a performance regression in mesa-pipe.
llvm-svn: 267927
print-stack-trace.cc test failure of compiler-rt has been fixed by
r266869 (http://reviews.llvm.org/D19148), so reenable sibling call
optimization on ppc64
Reviewers: nemanjai kbarton
llvm-svn: 267527
Summary:
We don't use MinLatency any more since r184032.
Reviewers: atrick, hfinkel, mcrosier
Differential Revision: http://reviews.llvm.org/D19474
llvm-svn: 267502
ADD8TLS, a variant of add instruction used for initial-exec TLS,
currently accepts r0 as a source register. While add itself supports
r0 just fine, linker can relax it to a local-exec sequence, converting
it to addi - which doesn't support r0.
Differential Revision: http://reviews.llvm.org/D19193
llvm-svn: 267388
Removed some unused headers, replaced some headers with forward class declarations.
Found using simple scripts like this one:
clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'
Patch by Eugene Kosov <claprix@yandex.ru>
Differential Revision: http://reviews.llvm.org/D19219
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266595
[PPC] Previously when casting generic loads to LXV2DX/ST instructions we
would leave the original load return type in place allowing for an
assertion failure when we merge two equivalent LXV2DX nodes with
different types.
This fixes PR27350.
Reviewers: nemanjai
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19133
llvm-svn: 266438
This patch corresponds to review:
http://reviews.llvm.org/D17850
This patch implements the following instructions:
cmprb, cmpeqb, cnttzw, cnttzw., cnttzd, cnttzd.
llvm-svn: 266228
Resolve Bug 27046 (https://llvm.org/bugs/show_bug.cgi?id=27046).
The PPCInstrInfo::optimizeCompareInstr function could create a new use of
CR0, even if CR0 were previously dead. This patch marks CR0 live if a use of
CR0 is created.
Author: Tom Jablin (tjablin)
Reviewers: hfinkel kbarton cycheng
http://reviews.llvm.org/D18884
llvm-svn: 266040
In the ELFv2 ABI, we are not required to save all CR fields. If only one
nonvolatile CR field is clobbered, use mfocrf instead of mfcr to
selectively save the field, because mfocrf has short latency compares to
mfcr.
Thanks Nemanja's invaluable hint!
Reviewers: nemanjai tjablin hfinkel kbarton
http://reviews.llvm.org/D17749
llvm-svn: 266038
This is the same change on PPC64 as r255821 on AArch64. I have even borrowed
his commit message.
The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.
We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.
Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.
We add CSRsViaCopy, it will be explicitly handled during lowering.
1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
supports it for the given machine function and the function has only return
exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
virtual registers at beginning of the entry block and copies from virtual
registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.
Author: Tom Jablin (tjablin)
Reviewers: hfinkel kbarton cycheng
http://reviews.llvm.org/D17533
llvm-svn: 265781
http://reviews.llvm.org/D18562
A large number of testcases has been modified so they pass after this test.
One testcase is deleted, because I realized even after undoing the original
change that was committed with this testcase, the testcase still passes. So
I removed it. The change to one other testcase (test/CodeGen/PowerPC/pr25802.ll)
is an arbitrary change to keep it passing. Given the original intention of the
testcase, and the fact that fixing it will require some time to change the testcase,
we concluded that this quick change will be enough.
llvm-svn: 265683
Summary:
In the context of http://wg21.link/lwg2445 C++ uses the concept of
'stronger' ordering but doesn't define it properly. This should be fixed
in C++17 barring a small question that's still open.
The code currently plays fast and loose with the AtomicOrdering
enum. Using an enum class is one step towards tightening things. I later
also want to tighten related enums, such as clang's
AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI'
enum).
This change touches a few lines of code which can be improved later, I'd
like to keep it as NFC for now as it's already quite complex. I have
related changes for clang.
As a follow-up I'll add:
bool operator<(AtomicOrdering, AtomicOrdering) = delete;
bool operator>(AtomicOrdering, AtomicOrdering) = delete;
bool operator<=(AtomicOrdering, AtomicOrdering) = delete;
bool operator>=(AtomicOrdering, AtomicOrdering) = delete;
This is separate so that clang and LLVM changes don't need to be in sync.
Reviewers: jyknight, reames
Subscribers: jyknight, llvm-commits
Differential Revision: http://reviews.llvm.org/D18775
llvm-svn: 265602
http://reviews.llvm.org/D18405
When the integer value loaded is never used directly as integer we should use VSX
or Floating Point Facility integer loads and avoid extra direct move
llvm-svn: 265593
This patch enable sibling call optimization on ppc64 ELFv1/ELFv2 abi, and
add a couple of test cases. This patch also passed llvm/clang bootstrap
test, and spec2006 build/run/result validation.
Original issue: https://llvm.org/bugs/show_bug.cgi?id=25617
Great thanks to Tom's (tjablin) help, he contributed a lot to this patch.
Thanks Hal and Kit's invaluable opinions!
Reviewers: hfinkel kbarton
http://reviews.llvm.org/D16315
llvm-svn: 265506
This patch implements the following BookII and Book III instructions:
- copy copy_first cp_abort paste paste. paste_last
- msgsync
- slbieg slbsync
- stop
Total 10 instructions
Reviewers: nemanjai hfinkel tjablin amehsan kbarton
llvm-svn: 265504
Summary:
This adds the same checks that were added in r264593 to all
target-specific passes that run after register allocation.
Reviewers: qcolombet
Subscribers: jyknight, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D18525
llvm-svn: 265313
Chapter 3 of the QPX manual states that, "Scalar floating-point load
instructions, defined in the Power ISA, cause a replication of the source data
across all elements of the target register." Thus, if we have a load followed
by a QPX splat (from the first lane), the splat is redundant. This adds a late
MI-level pass to remove the redundant splats in some of these cases
(specifically when both occur in the same basic block).
This optimization is scheduled just prior to post-RA scheduling. It can't happen
before anything that might replace the load with some already-computed quantity
(i.e. store-to-load forwarding).
llvm-svn: 265047
This will become necessary in a subsequent change to make this method
merge adjacent stack adjustments, i.e. it might erase the previous
and/or next instruction.
It also greatly simplifies the calls to this function from Prolog-
EpilogInserter. Previously, that had a bunch of logic to resume iteration
after the call; now it just continues with the returned iterator.
Note that this changes the behaviour of PEI a little. Previously,
it attempted to re-visit the new instruction created by
eliminateCallFramePseudoInstr(). That code was added in r36625,
but I can't see any reason for it: the new instructions will obviously
not be pseudo instructions, they will not have FrameIndex operands,
and we have already accounted for the stack adjustment.
Differential Revision: http://reviews.llvm.org/D18627
llvm-svn: 265036
PPCSimplifyAddress contains this code:
IntegerType *OffsetTy = ((VT == MVT::i32) ? Type::getInt32Ty(*Context)
: Type::getInt64Ty(*Context));
to determine the type to be used for an index register, if one needs
to be created. However, the "VT" here is the type of the data being
loaded or stored, *not* the type of an address. This means that if
a data element of type i32 is accessed using an index that does not
not fit into 32 bits, a wrong address is computed here.
Note that PPCFastISel is only ever used on 64-bit currently, so the type
of an address is actually *always* MVT::i64. Other parts of the code,
even in this same PPCSimplifyAddress routine, already rely on that fact.
Thus, this patch changes the code to simply unconditionally use
Type::getInt64Ty(*Context) as OffsetTy.
llvm-svn: 265023
This patch corresponds to review:
http://reviews.llvm.org/D18032
This patch provides asm implementation for the following instructions:
lwat, ldat, stwat, stdat, ldmx, mcrxrx
llvm-svn: 265022
The fast isel pass currently emits a COPY_TO_REGCLASS node to convert
from a F4RC to a F8RC register class during conversion of a
floating-point number to integer. There is actually no support in the
common code instruction printers to emit COPY_TO_REGCLASS nodes, so the
PowerPC back-end has special code there to simply ignore
COPY_TO_REGCLASS.
This is correct *if and only if* the source and destination registers of
COPY_TO_REGCLASS are the same (except for the different register class).
But nothing guarantees this to be the case, and if the register
allocator does end up allocating source and destination to different
registers after all, the back-end simply generates incorrect code. I've
included a test case that shows such incorrect code generation.
However, it seems that COPY_TO_REGCLASS is actually not intended to be
used at the MI layer at all. It is used during SelectionDAG, but always
lowered to a plain COPY before emitting MI. Other back-end's fast isel
passes never emit COPY_TO_REGCLASS at all. I suspect it is simply wrong
for the PowerPC back-end to emit it here.
This patch changes the PowerPC back-end to directly emit COPY instead of
COPY_TO_REGCLASS and removes the special handling in the instruction
printers.
Differential Revision: http://reviews.llvm.org/D18605
llvm-svn: 265020
When dealing with complex<float>, and similar structures with two
single-precision floating-point numbers, especially when such things are being
passed around by value, we'll sometimes end up loading both float values by
extracting them from one 64-bit integer load. It looks like this:
t13: i64,ch = load<LD8[%ref.tmp]> t0, t6, undef:i64
t16: i64 = srl t13, Constant:i32<32>
t17: i32 = truncate t16
t18: f32 = bitcast t17
t19: i32 = truncate t13
t20: f32 = bitcast t19
The problem, especially before the P8 where those bitcasts aren't legal (and
get expanded via the stack), is that it would have been better to use two
floating-point loads directly. Here we add a target-specific DAGCombine to do
just that. In short, we turn:
ld 3, 0(5)
stw 3, -8(1)
rldicl 3, 3, 32, 32
stw 3, -4(1)
lfs 3, -4(1)
lfs 0, -8(1)
into:
lfs 3, 4(5)
lfs 0, 0(5)
llvm-svn: 264988
These checks are redundant and can be removed
Reviewers: hans
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D18564
llvm-svn: 264872
Instead of using two feature bits, one to indicate the availability of the
popcnt[dw] instructions, and another to indicate whether or not they're fast,
use a single enum. This allows more consistent control via target attribute
strings, and via Clang's command line.
llvm-svn: 264690
The A2 cores support the popcntw/popcntd instructions, but they're microcoded,
and slower than our default software emulation. Specifically, popcnt[dw] take
approximately 74 cycles, whereas our software emulation takes only 24-28
cycles.
I've added a new target feature to indicate a slow popcnt[dw], instead of just
removing the existing target feature from the a2/a2q processor models, because:
1. This allows us to return more accurate information via the TTI interface
(I recognize that this currently makes no practical difference)
2. Is hopefully easier to understand (it allows the core's features to match
its manual while still having the desired effect).
llvm-svn: 264600
Intrinsic::maxnum and Intrinsic::minnum, along with the associated libc
function calls (fmax[f], etc.) generally map to function calls after lowering.
For some vector types with QPX at least, however, we can legally lower these,
and we don't need to prohibit CTR-based loops on their account.
It turned out, however, that the logic that checked the opcodes associated with
intrinsics was broken (it would set the Opcode variable, but that variable was
later checked only if set for some otherwise-external function call.
This fixes the latter problem and adds the FMAX/MINNUM mappings.
llvm-svn: 264532
This patch corresponds to review:
http://reviews.llvm.org/D17711
It disables direct moves on these operations in 32-bit mode since the patterns
assume 64-bit registers. The final patch is slightly different from the
Phabricator review as the bitcast operations needed to be disabled in 32-bit
mode as well. This fixes PR26617.
llvm-svn: 264282
For fcmp, major concern about the following 6 cases is NaN result. The
comparison result consists of 4 bits, indicating lt, eq, gt and un (unordered),
only one of which will be set. The result is generated by fcmpu
instruction. However, bc instruction only inspects one of the first 3
bits, so when un is set, bc instruction may jump to to an undesired
place.
More specifically, if we expect an unordered comparison and un is set, we
expect to always go to true branch; in such case UEQ, UGT and ULT still
give false, which are undesired; but UNE, UGE, ULE happen to give true,
since they are tested by inspecting !eq, !lt, !gt, respectively.
Similarly, for ordered comparison, when un is set, we always expect the
result to be false. In such case OGT, OLT and OEQ is good, since they are
actually testing GT, LT, and EQ respectively, which are false. OGE, OLE
and ONE are tested through !lt, !gt and !eq, and these are true.
llvm-svn: 263753
This patch prevents CTR loops optimization when using soft float operations
inside loop body. Soft float operations use function calls, but function
calls are not allowed inside CTR optimized loops.
Patch by Aleksandar Beserminji.
Differential Revision: http://reviews.llvm.org/D17600
llvm-svn: 263727
- Rename getATOMIC to getSYNC, as llvm will soon be able to emit both
'__sync' libcalls and '__atomic' libcalls, and this function is for
the '__sync' ones.
- getInsertFencesForAtomic() has been replaced with
shouldInsertFencesForAtomic(Instruction), so that the decision can be
made per-instruction. This functionality will be used soon.
- emitLeadingFence/emitTrailingFence are no longer called if
shouldInsertFencesForAtomic returns false, and thus don't need to
check the condition themselves.
llvm-svn: 263665
This patch corresponds to review:
http://reviews.llvm.org/D17712
We were not clearing the TOC vector in PPCAsmPrinter when initializing it. This
caused duplicate definition asserts when the pass is reused on the module
(i.e. with -compile-twice or in JIT contexts).
llvm-svn: 263338
TableGen checks at compiletime that for scheduling models with
"CompleteModel = 1" one of the following holds:
- Is marked with the hasNoSchedulingInfo flag
- The instruction is a subclass of Sched
- There are InstRW definitions in the scheduling model
Typical steps necessary to complete a model:
- Ensure all pseudo instructions that are expanded before machine
scheduling (usually everything handled with EmitYYY() functions in
XXXTargetLowering).
- If a CPU does not support some instructions mark the corresponding
resource unsupported: "WriteRes<WriteXXX, []> { let Unsupported = 1; }".
- Add missing scheduling information.
Differential Revision: http://reviews.llvm.org/D17747
llvm-svn: 262384
Corresponds to Phabricator review:
http://reviews.llvm.org/D16592
This fix includes both an update to how we handle the "generic" CPU on LE
systems as well as Anton's fix for the Fast Isel issue.
llvm-svn: 262233
Change MachineInstr API to prefer MachineInstr& over MachineInstr*
whenever the parameter is expected to be non-null. Slowly inching
toward being able to fix PR26753.
llvm-svn: 262149
Take MachineInstr by reference instead of by pointer in SlotIndexes and
the SlotIndex wrappers in LiveIntervals. The MachineInstrs here are
never null, so this cleans up the API a bit. It also incidentally
removes a few implicit conversions from MachineInstrBundleIterator to
MachineInstr* (see PR26753).
At a couple of call sites it was convenient to convert to a range-based
for loop over MachineBasicBlock::instr_begin/instr_end, so I added
MachineBasicBlock::instrs.
llvm-svn: 262115
Currently we always expand ISD::FNEG. For v4f32 and v2f64 vector types VSX has
native support for this opcode
Phabricator: http://reviews.llvm.org/D17647
llvm-svn: 262079
Change TargetInstrInfo API to take `MachineInstr&` instead of
`MachineInstr*` in the functions related to predicated instructions
(I'll try to come back later and get some of the rest). All of these
functions require non-null parameters already, so references are more
clear. As a bonus, this happens to factor away a host of implicit
iterator => pointer conversions.
No functionality change intended.
llvm-svn: 261605
This is what was meant to be in the initial commit to fix this bug. The
parens were missing. This commit also adds a test case for the bug and
has undergone full testing on PPC and X86.
llvm-svn: 261546
The patch has a necessary call to a function inside an assert. Which is fine
when you have asserts turned on. Not so much when they're off. Sorry about
the regression.
llvm-svn: 261447
This patch corresponds to review:
http://reviews.llvm.org/D17294
It ensures that whatever block we are emitting the prologue/epilogue into, we
have the necessary scratch registers. It takes away the hard-coded register
numbers for use as scratch registers as registers that are guaranteed to be
available in the function prologue/epilogue are not guaranteed to be available
within the function body. Since we shrink-wrap, the prologue/epilogue may end
up in the function body.
llvm-svn: 261441
This patch is part of the work to make PPCLoopDataPrefetch
target-independent
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758).
Obviously the pass still only used from PPC at this point. Subsequent
patches will start driving this from ARM64 as well.
Due to the previous patch most lines should show up as moved lines.
llvm-svn: 261265
This is done only to make the next patch that move the pass out PPC to
Transforms easier to read. After this most line should show up as moved
lines in that patch.
This patch is part of the work to make PPCLoopDataPrefetch
target-independent
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758).
llvm-svn: 261264
Using the load immediate only when the immediate (whether signed or unsigned)
can fit in a 16-bit signed field. Namely, from -32768 to 32767 for signed and
0 to 65535 for unsigned. This patch also ensures that we sign-extend under the
right conditions.
llvm-svn: 259840
The purpose of PPCVSXFMAMutate is to elide copies by changing FMA forms
on PPC.
%vreg6<def> = COPY %vreg96
%vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg7
;v6 = v6 + v5 * v7
is replaced by
%vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg7, %vreg96
;v5 = v5 * v7 + v96
This was broken in the case where the target register was also used as a
multiplicand. Fix this case by checking for it and replacing both uses
with the copied register.
%vreg6<def> = COPY %vreg96
%vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg6
;v6 = v6 + v5 * v6
is replaced by
%vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg96, %vreg96
;v5 = v5 * v96 + v96
llvm-svn: 259617
check that the sign extended constant fits into 16-bits if we want a
zero extended value, otherwise go ahead and put it together piecemeal.
Fixes PR26356.
llvm-svn: 259177
This patch is part of the work to make PPCLoopDataPrefetch
target-independent
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758).
As it was discussed in the above thread, getPrefetchDistance is
currently using instruction count which may change in the future.
llvm-svn: 258995
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html
"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi
Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark
Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16471
llvm-svn: 258861
Summary:
And use it in PPCLoopDataPrefetch.cpp.
@hfinkel, please let me know if your preference would be to preserve the
ppc-loop-prefetch-cache-line option in order to be able to override the
value of TTI::getCacheLineSize for PPC.
Reviewers: hfinkel
Subscribers: hulx2000, mcrosier, mssimpso, hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D16306
llvm-svn: 258419
Some compilers don't do exhaustive switch checking. For those compilers,
add an initialization to prevent un-initialized variable warnings from
firing. For compilers with exhaustive switch checking, we still get a
guarantee that the switch is exhaustive, and hence the initializations
are redundant, and a non-functional change.
llvm-svn: 257923
The global entry point prologue currently assumes that the TOC
associated with a function is less than 2GB away from the function
entry point. This is always true when using the medium or small
code model, but may not be the case when using the large code model.
This patch adds a new variant of the ELFv2 global entry point prologue
that lifts the 2GB restriction when building with -mcmodel=large.
This works by emitting a quadword containing the distance from the
function entry point to its associated TOC immediately before the
entry point, and then using a prologue like:
ld r2,-8(r12)
add r2,r2,r12
Since creation of the entry point prologue is now split across two
separate routines (PPCLinuxAsmPrinter::EmitFunctionEntryLabel emits
the data word, PPCLinuxAsmPrinter::EmitFunctionBodyStart the prolog
code), I've switched to using named labels instead of just temporaries
to indicate the locations of the global and local entry points and the
new TOC offset data word.
These names are provided by new routines in PPCFunctionInfo modeled
after the existing PPCFunctionInfo::getPICOffsetSymbol.
Note that a corresponding change was committed to GCC here:
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00355.html
Reviewers: hfinkel
Differential Revision: http://reviews.llvm.org/D15500
llvm-svn: 257597
Only non-weighted predicates were handled in PPCInstrInfo::insertSelect. Handle
the weighted predicates as well.
This latent bug was triggered by r255398, because it added use of the
branch-weighted predicates.
While here, switch over an enum instead of an int to get the compiler to enforce
totality in the future.
llvm-svn: 257518
This patch corresponds to review:
http://reviews.llvm.org/D15930
Moves to and from CR fields depend on shifts/masks that depend on the
target/source CR field. Thus, post-ra anti-dep breaking must not later
change that CR register assignment.
llvm-svn: 257168
This is a fix for bug http://llvm.org/bugs/show_bug.cgi?id=25839.
For a PIC TLS variable access in a function, prologue (mflr followed by std and
stdu) gets scheduled after a tls_get_addr call. tls_get_addr messed up LR but
no one saves/restores it.
Also added a test for save/restore clobbered registers during calling __tls_get_addr.
Patch by Tim Shen
llvm-svn: 257137
This patch removes all weight-related interfaces from BPI and replace
them by probability versions. With this patch, we won't use edge weight
anymore in either IR or MC passes. Edge probabilitiy is a better
representation in terms of CFG update and validation.
Differential revision: http://reviews.llvm.org/D15519
llvm-svn: 256263
This matches the other MIB methods, none of which modify the builder.
Without this, we can't chain copyImplicitOps.
Also reformat the few users, in PPCEarlyReturn.
llvm-svn: 255828
A large number of loop utility functions take a `Pass *` and reach
into it to find out which analyses to preserve. There are a number of
problems with this:
- The APIs have access to pretty well any Pass state they want, so
it's hard to tell what they may or may not do.
- Other APIs have copied these and pass around a `Pass *` even though
they don't even use it. Some of these just hand a nullptr to the API
since the callers don't even have a pass available.
- Passes in the new pass manager don't work like the current ones, so
the APIs can't be used as is there.
Instead, we should explicitly thread the analysis results that we
actually care about through these APIs. This is both simpler and more
reusable.
llvm-svn: 255669
This patch corresponds to review:
http://reviews.llvm.org/D15117
In preparation for supporting IEEE Quad precision floating point,
this patch simply defines a feature to specify the target supports this.
For now, nothing is done with the target feature, we just don't want
warnings from the Clang FE when a user specifies -mfloat128.
Calling convention and other related work will add to this patch in
the near future.
llvm-svn: 255642
This is the second in a set of patches for soft float support for ppc32,
it enables soft float operations.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D13700
llvm-svn: 255516
This patch adds some missing calls to MBB::normalizeSuccProbs() in several
locations where it should be called. Those places are found by checking if the
sum of successors' probabilities is approximate one in MachineBlockPlacement
pass with some instrumented code (not in this patch).
Differential revision: http://reviews.llvm.org/D15259
llvm-svn: 255455
We don't need to pass OutStreamer as a parameter to LowerSTACKMAP and
LowerPATCHPOINT. It is a member variable of PPCAsmPrinter, and thus, is already
available. NFC.
llvm-svn: 255418
This branch adds hints for highly biased branches on the PPC architecture. Even
in absence of profiling information, LLVM will mark code reaching unreachable
terminators and other exceptional control flow constructs as highly unlikely to
be reached.
Patch by Tom Jablin!
llvm-svn: 255398
These are redundant pairs of nodes defined for
INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT.
insertelement/extractelement are slightly closer to the corresponding
C++ node name, and has stricter type checking so prefer it.
Update targets to only use these nodes where it is trivial to do so.
AArch64, ARM, and Mips all have various type errors on simple replacement,
so they will need work to fix.
Example from AArch64:
def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8),
(i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>;
Which is trying to do sext_inreg i8, i8.
llvm-svn: 255359
Access to aligned globals gives us a chance to peephole optimize nonzero
offsets. If a struct is 4 byte aligned, then accesses to bytes 0-3 won't
overflow the available displacement. For example:
addis 3, 2, b4v@toc@ha
addi 4, 3, b4v@toc@l
lbz 5, b4v@toc@l(3) ; This is the result of the current peephole
lbz 6, 1(4) ; optimizer
lbz 7, 2(4)
lbz 8, 3(4)
If b4v is 4-byte aligned, we can skip using register 4 because we know
that b4v@toc@l+{1,2,3} won't overflow 32K, and instead generate:
addis 3, 2, b4v@toc@ha
lbz 4, b4v@toc@l(3)
lbz 5, b4v@toc@l+1(3)
lbz 6, b4v@toc@l+2(3)
lbz 7, b4v@toc@l+3(3)
Saving a register and an addition.
Larger alignments allow larger structures/arrays to be optimized.
llvm-svn: 255319
This was causing bad code gen and assembly that won't assemble, as
mixed altivec and vsx code would end up with a vsx high register
assigned to an altivec instruction, which won't work. Constraining the
classes allows the optimization to proceed.
llvm-svn: 255299
This patch corresponds to review:
http://reviews.llvm.org/D15286
LLVM IR frequently contains bitcast operations between floating point and
integer values of the same width. Doing this through memory operations is
quite expensive on PPC. This patch allows the use of direct register moves
between FPRs and GPRs for lowering bitcasts.
llvm-svn: 255246
This call should in fact be made by RegScavenger::enterBasicBlock()
called below. The first call does nothing except for triggering UB,
indicated by UBSan (passing nullptr to memset()).
llvm-svn: 254548
The @llvm.get.dynamic.area.offset.* intrinsic family is used to get the offset
from native stack pointer to the address of the most recent dynamic alloca on
the caller's stack. These intrinsics are intendend for use in combination with
@llvm.stacksave and @llvm.restore to get a pointer to the most recent dynamic
alloca. This is useful, for example, for AddressSanitizer's stack unpoisoning
routines.
Patch by Max Ostapenko.
Differential Revision: http://reviews.llvm.org/D14983
llvm-svn: 254404
Re-enable shrink wrapping for PPC64 Little Endian.
One minor modification to PPCFrameLowering::findScratchRegister was necessary to handle fall-thru blocks (blocks with no terminator) correctly.
Tested with all LLVM test, clang tests, and the self-hosting build, with no problems found.
PHabricator: http://reviews.llvm.org/D14778
llvm-svn: 254314
Summary:
Many target lowerings copy-paste the code to test SDValues for known constants.
This code can instead be shared in SelectionDAG.cpp, and reused in the targets.
Reviewers: MatzeB, andreadb, tstellarAMD
Subscribers: arsenm, jyknight, llvm-commits
Differential Revision: http://reviews.llvm.org/D14945
llvm-svn: 254085
The e500mc does not actually support the mfocrf instruction; update the
processor definitions to reflect that fact.
Patch by Tom Rix (with some test-case cleanup by me).
llvm-svn: 254064
The patch in http://reviews.llvm.org/D13745 is broken into four parts:
1. New interfaces without functional changes.
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights.
3. Use new interfaces in all other passes.
4. Remove old interfaces.
This the second patch above. In this patch SelectionDAG starts to use
probability-based interfaces in MBB to add successors but other MC passes are
still using weight-based interfaces. Therefore, we need to maintain correct
weight list in MBB even when probability-based interfaces are used. This is
done by updating weight list in probability-based interfaces by treating the
numerator of probabilities as weights. This change affects many test cases
that check successor weight values. I will update those test cases once this
patch looks good to you.
Differential revision: http://reviews.llvm.org/D14361
llvm-svn: 253965
incorrect, as the chosen representative of the weak symbol may not live
with the code in question. Always indirect the access through the TOC
instead.
Patch by Kyle Butt!
llvm-svn: 253708
Currently, if the assembler encounters an error after parsing (such as an
out-of-range fixup), it reports this as a fatal error, and so stops after the
first error. However, for most of these there is an obvious way to recover
after emitting the error, such as emitting the fixup with a value of zero. This
means that we can report on all of the errors in a file, not just the first
one. MCContext::reportError records the fact that an error was encountered, so
we won't actually emit an object file with the incorrect contents.
Differential Revision: http://reviews.llvm.org/D14717
llvm-svn: 253328
The way prelink used to work was
* The compiler decides if a given section only has relocations that
are know to point to the same DSO. If so, it names it
.data.rel.ro.local<something>.
* The static linker puts all of these together.
* The prelinker program assigns addresses to each library and resolves
the local relocations.
There are many problems with this:
* It is incompatible with address space randomization.
* The information passed by the compiler is redundant. The linker
knows if a given relocation is in the same DSO or not. If could sort
by that if so desired.
* There are newer ways of speeding up DSO (gnu hash for example).
* Even if we want to implement this again in the compiler, the previous
implementation is pretty broken. It talks about relocations that are
"resolved by the static linker". If they are resolved, there are none
left for the prelinker. What one needs to track is if an expression
will require only dynamic relocations that point to the same DSO.
At this point it looks like the prelinker is an historical curiosity.
For example, fedora has retired it because it failed to build for two
releases
(http://pkgs.fedoraproject.org/cgit/prelink.git/commit/?id=eb43100a8331d91c801ee3dcdb0a0bb9babfdc1f)
This patch removes support for it. That is, it stops printing the
".local" sections.
llvm-svn: 253280
MCRelaxableFragment previously kept a copy of MCSubtargetInfo and
MCInst to enable re-encoding the MCInst later during relaxation. A copy
of MCSubtargetInfo (instead of a reference or pointer) was needed
because the feature bits could be modified by the parser.
This commit replaces the MCSubtargetInfo copy in MCRelaxableFragment
with a constant reference to MCSubtargetInfo. The copies of
MCSubtargetInfo are kept in MCContext, and the target parsers are now
responsible for asking MCContext to provide a copy whenever the feature
bits of MCSubtargetInfo have to be toggled.
With this patch, I saw a 4% reduction in peak memory usage when I
compiled verify-uselistorder.lto.bc using llc.
rdar://problem/21736951
Differential Revision: http://reviews.llvm.org/D14346
llvm-svn: 253127
MCSubtargetInfo in the subclasses into MCTargetAsmParser and define a
member function getSTI.
This is done in preparation for making changes to shrink the size of
MCRelaxableFragment. (see http://reviews.llvm.org/D14346).
llvm-svn: 253124
This patch adds a pass for doing PowerPC peephole optimizations at the
MI level while the code is still in SSA form. This allows for easy
modifications to the instructions while depending on a subsequent pass
of DCE. Both passes are very fast due to the characteristics of SSA.
At this time, the only peepholes added are for cleaning up various
redundancies involving the XXPERMDI instruction. However, I would
expect this will be a useful place to add more peepholes for
inefficiencies generated during instruction selection. The pass is
placed after VSX swap optimization, as it is best to let that pass
remove unnecessary swaps before performing any remaining clean-ups.
The utility of these clean-ups are demonstrated by changes to four
existing test cases, all of which now have tighter expected code
generation. I've also added Eric Schweiz's bugpoint-reduced test from
PR25157, for which we now generate tight code. One other test started
failing for me, and I've fixed it
(test/Transforms/PlaceSafepoints/finite-loops.ll) as well; this is not
related to my changes, and I'm not sure why it works before and not
after. The problem is that the CHECK-NOT: of "statepoint" from test1
fails because of the "statepoint" in test2, and so forth. Adding a
CHECK-LABEL in between keeps the different occurrences of that string
properly scoped.
llvm-svn: 252651
Under most circumstances, if SCEV can simplify X-Y to a constant, then it can
also simplify Y-X to a constant. However, there is no guarantee that this is
always true, and concensus is not to consider that a correctness bug in SCEV
(although it is undesirable).
PPCLoopPreIncPrep gathers pointers used to access memory (via loads, stores and
prefetches) into buckets, where in each bucket the relative pointer offsets are
constant. We used to keep each bucket as a multimap, where SCEV's subtraction
operation was used to define the ordering predicate. Instead, use a fixed SCEV
base expression for each bucket, record the constant offsets from that base
expression, and adjust it later, if desirable, once all pointers have been
collected.
Doing it this way should be more compile-time efficient than the previous
scheme (in addition to making the implementation less sensitive to SCEV
simplification quirks).
Fixes PR25170.
llvm-svn: 252417
Summary:
The CLR's personality routine passes these in rdx/edx, not rax/eax.
Make getExceptionPointerRegister a virtual method parameterized by
personality function to allow making this distinction.
Similarly make getExceptionSelectorRegister a virtual method parameterized
by personality function, for symmetry.
Reviewers: pgavlin, majnemer, rnk
Subscribers: jyknight, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D14344
llvm-svn: 252383
Summary:
In this implementation, LiveIntervalAnalysis invents a few register
masks on basic block boundaries that preserve no registers. The nice
thing about this is that it prevents the prologue inserter from thinking
it needs to spill all XMM CSRs, because it doesn't see any explicit
physreg defs in the MI.
Reviewers: MatzeB, qcolombet, JosephTremoulet, majnemer
Subscribers: MatzeB, llvm-commits
Differential Revision: http://reviews.llvm.org/D14407
llvm-svn: 252318
Also, remove an enum hack where enum values were used as indexes into an array.
We may want to make this a real class to allow pattern-based queries/customization (D13417).
llvm-svn: 252196
This revision has introduced an issue that only affects bootstrapped compiler
when it is printing the ASM. It turns out that the new code path taken due to
legalizing a scalar_to_vector of i64 -> v2i64 exposes a missing check in a
micro optimization to change a load followed by a scalar_to_vector into a
load and splat instruction on PPC.
llvm-svn: 251798
We cannot form ctr-based loops around function calls, including calls to
__tls_get_addr used for PIC TLS variables. References to such TLS variables,
however, might be buried within constant expressions, and so we need to search
the entire constant expression to be sure that no references to such TLS
variables exist.
Fixes PR25256, reported by Eric Schweitz. This is a slightly-modified version
of the patch suggested by Eric in the bug report, and a test case I created.
llvm-svn: 251582
As a follow-up to r251566, do the same for the other optionally-supported
register classes (mostly for vector registers). Don't return an unavailable
register class (which would cause an assert later), but fail cleanly when
provided an unsupported inline asm constraint.
llvm-svn: 251575
cntlz is the old POWER mnemonic. cntlzw is the PowerPC mnemonic.
This change fixes an issue when -no-integrated-as: The opcode cntlz is
unrecognized by gas
Alias the POWER mnemonic cntlz[.] to the PowerPC mnemonic cntlzw[.]
This is done for because the POWER cntlz mnemonic has be used by LLVM for
a very long time. We need to make sure that assembly programs
that are using the cntlz[.] do not break with this change.
Change PowerPC tests to reflect the insn change from cntlz to cntlzw.
Add assembly test to verify cntlz[.] is encoded correctly.
Patch by Tom Rix!
llvm-svn: 251489
We didn't validate that the .word directive was given a sane value,
leading to crashes when we attempt to write out the object file.
Instead, perform some validation and issue a diagnostic pointing at the
start of the diagnostic.
llvm-svn: 251270
PR24686 identifies a problem where a relocation expression is invalid
when not all of the symbols in the expression can be locally
resolved. This causes the compiler to request a PC-relative half16ds
relocation, which is nonsensical for PowerPC. This patch recognizes
this situation and ensures we fail the assembly cleanly.
Test case provided by Anton Blanchard.
llvm-svn: 251027
PR25157 identifies a bug where a load plus a vector shuffle is
incorrectly converted into an LXVDSX instruction. That optimization
is only valid if the load is of a doubleword, and in the noted case,
it was not. This corrects that problem.
Joint patch with Eric Schweitz, who provided the bugpoint-reduced test
case.
llvm-svn: 250324
This patch corresponds to review:
http://reviews.llvm.org/D12032
This patch builds onto the patch that provided scalar to vector conversions
without stack operations (D11471).
Included in this patch:
- Vector element extraction for all vector types with constant element number
- Vector element extraction for v16i8 and v8i16 with variable element number
- Removal of some unnecessary COPY_TO_REGCLASS operations that ended up
unnecessarily moving things around between registers
Not included in this patch (will be in upcoming patch):
- Vector element extraction for v4i32, v4f32, v2i64 and v2f64 with
variable element number
- Vector element insertion for variable/constant element number
Testing is provided for all extractions. The extractions that are not
implemented yet are just placeholders.
llvm-svn: 249822
Stop using `getNextNode()` to create an insertion point for machine
instructions (at least, in this one place). Instead, use an iterator.
As a drive-by, clean up dump statements to use iterator logic.
The `getNextNode()` interface isn't actually supposed to work for
insertion points; it's supposed to return `nullptr` if this is the last
node. It's currently broken and will "happen" to work, but if we ever
fix the function, we'll get some strange failures.
llvm-svn: 249758
This extends the work done in r233995 so that now getFragment (in addition to
getSection) also works for variable symbols.
With that the existing logic to decide if a-b can be computed works even if
a or b are variables. Given that, the expression evaluation can avoid expanding
variables as aggressively and that in turn lets the relocation code see the
original variable.
In order for this to work with the asm streamer, there is now a dummy fragment
per section. It is used to assign a section to a symbol when no other fragment
exists.
This patch is a joint work by Maxim Ostapenko andy myself.
llvm-svn: 249303
Summary:
It is fairly common to call SE->getConstant(Ty, 0) or
SE->getConstant(Ty, 1); this change makes such uses a little bit
briefer.
I've refactored the call sites I could find easily to use getZero /
getOne.
Reviewers: hfinkel, majnemer, reames
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12947
llvm-svn: 248362
After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing,
so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests:
if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is
one test case in this patch to prove that point.
This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I
did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF
( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes.
This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as
FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the
current global settings.
Differential Revision: http://reviews.llvm.org/D12095
llvm-svn: 247815
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).
For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.
This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.
This commit also contains a trivial patch to clang to account for the C++ API
change. Thanks go to Pavel Labath for fixing LLDB for me.
Reviewers: rengolin
Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10969
llvm-svn: 247692
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).
For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.
This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.
This commit also contains a trivial patch to clang to account for the C++ API
change.
Reviewers: rengolin
Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10969
llvm-svn: 247683
The changes in this patch are as follows:
1. Modify the emitPrologue and emitEpilogue methods to work properly when the prologue and epilogue blocks are not the first/last blocks in the function
2. Fix a bug in PPCEarlyReturn optimization caused by an empty entry block in the function
3. Override the runShrinkWrap PredicateFtor (defined in TargetMachine) to check whether shrink wrapping should run:
Shrink wrapping will run on PPC64 (Little Endian and Big Endian) unless -enable-shrink-wrap=false is specified on command line
A new test case, ppc-shrink-wrapping.ll was created based on the existing shrink wrapping tests for x86, arm, and arm64.
Phabricator review: http://reviews.llvm.org/D11817
llvm-svn: 247237
with the new pass manager, and no longer relying on analysis groups.
This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:
- FunctionAAResults is a type-erasing alias analysis results aggregation
interface to walk a single query across a range of results from
different alias analyses. Currently this is function-specific as we
always assume that aliasing queries are *within* a function.
- AAResultBase is a CRTP utility providing stub implementations of
various parts of the alias analysis result concept, notably in several
cases in terms of other more general parts of the interface. This can
be used to implement only a narrow part of the interface rather than
the entire interface. This isn't really ideal, this logic should be
hoisted into FunctionAAResults as currently it will cause
a significant amount of redundant work, but it faithfully models the
behavior of the prior infrastructure.
- All the alias analysis passes are ported to be wrapper passes for the
legacy PM and new-style analysis passes for the new PM with a shared
result object. In some cases (most notably CFL), this is an extremely
naive approach that we should revisit when we can specialize for the
new pass manager.
- BasicAA has been restructured to reflect that it is much more
fundamentally a function analysis because it uses dominator trees and
loop info that need to be constructed for each function.
All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.
The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.
This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exception to this rule are LoopPass instances which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.
Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.
One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loop holes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.
Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.
Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.
Differential Revision: http://reviews.llvm.org/D12080
llvm-svn: 247167
To commute a trivial rlwimi instructions (meaning one with a full mask and zero
shift), we'd need to ability to form an all-zero mask (instead of an all-one
mask) using rlwimi. We can't represent this, however, and we'll miscompile code
if we try.
The code quality problem that this highlights (that SDAG simplification can
lead to us generating an ISD::OR node with a constant zero LHS) will be fixed
as a follow-up.
Fixes PR24719.
llvm-svn: 246937
PPCISelDAGToDAG has a transformation that generates a rlwimi instruction from
an input pattern that looks like this:
and(or(x, c1), c2)
but the associated logic does not work if there are bits that are 1 in c1 but 0
in c2 (these are normally canonicalized away, but that can't happen if the 'or'
has other users. Make sure we abort the transformation if such bits are
discovered.
Fixes PR24704.
llvm-svn: 246900
This adds a basic cost model for interleaved-access vectorization (and a better
default for shuffles), and enables interleaved-access vectorization by default.
The relevant difference from the default cost model for interleaved-access
vectorization, is that on PPC, the shuffles that end up being used are *much*
cheaper than modeling the process with insert/extract pairs (which are
quite expensive, especially on older cores).
llvm-svn: 246824
On the A2, with an eye toward QPX unaligned-load merging, we should always use
aggressive interleaving. It is generally superior to only using concatenation
unrolling.
llvm-svn: 246819
When forming permutation-based unaligned vector loads, we need to know whether
it is valid to read ahead of the requested address by a full vector length.
Doing so is more efficient (and allows for more CSE with later loads), but
could trigger a page fault if invalid. To determine validity, we look for other
loads in the same block that access the relevant address range.
The relevant point here is that we need to do this as part of the process of
forming permutation-based vector loads, and this happens quite early in the
SDAG pipeline - specifically before many of the address calculations are fully
canonicalized. As a result, we need to try harder to recognize base+offset
address computations, because they still might appear as chain of adds
(base+offset+offset, for example). To account for this, we'll look through
chains of adds, accumulating the constant offsets.
llvm-svn: 246813
Pre-P8, when we generate code for unaligned vector loads (for Altivec and QPX
types), even when accounting for the combining that takes place for multiple
consecutive such loads, there is at least one load instructions and one
permutation for each load. Make sure the cost reported reflects the cost of the
permutes as well.
llvm-svn: 246807
If you compute the MMO offset using unsigned arithmetic, you end up with a
large positive offset instead of a small negative one. In theory, this could
cause bad instruction-scheduling decisions later.
I noticed this by inspection from the debug output, and using that for the
regression test is the best I can do right now.
llvm-svn: 246805
I'm adding a regression test to better cover code generation for unaligned
vector loads and stores, but there's no functional change to the code
generation here. There is an improvement to the cost model for unaligned vector
loads and stores, mostly for QPX (for which we were not previously accounting
for the permutation-based loads), and the cost model implementation is cleaner.
llvm-svn: 246712
LowerVECTOR_SHUFFLE needs to decide whether to pass a vector shuffle off to the
TableGen-generated matching code, and it does this by testing the same
predicates used by the TableGen files. Unfortunately, when we added new
P8Altivec-only predicates, we started universally testing them in
LowerVECTOR_SHUFFLE, and if then matched when targeting a system prior to a P8,
we'd end up with a selection failure.
llvm-svn: 246675
Also delete and simplify a lot of MachineModuleInfo code that used to be
needed to handle personalities on landingpads. Now that the personality
is on the LLVM Function, we no longer need to track it this way on MMI.
Certainly it should not live on LandingPadInfo.
llvm-svn: 246478
There were really two problems here. The first was that we had the truth tables
for signed i1 comparisons backward. I imagine these are not very common, but if
you have:
setcc i1 x, y, LT
this has the '0 1' and the '1 0' results flipped compared to:
setcc i1 x, y, ULT
because, in the signed case, '1 0' is really '-1 0', and the answer is not the
same as in the unsigned case.
The second problem was that we did not have patterns (at all) for the unsigned
comparisons select_cc nodes for i1 comparison operands. This was the specific
cause of PR24552. These had to be added (and a missing Altivec promotion added
as well) to make sure these function for all types. I've added a bunch more
test cases for these patterns, and there are a few FIXMEs in the test case
regarding code-quality.
Fixes PR24552.
llvm-svn: 246400
Add support for MIR serialization of PowerPC-specific operand target flags
(based on the generic infrastructure added in r244185 and r245383).
I won't even pretend that this is good test coverage, but this includes the
regression test associated with r246372. Adding an MIR test for that fix is far
superior to adding an IR-level test because particular instruction-scheduling
decisions are necessary in order to expose the bug, and using an MIR test we
can start the pipeline post-scheduling.
llvm-svn: 246373
Even through ADDISdtprelHA generally has r3 as its source register, it is
possible for the instruction scheduler to move things around such that some
other register is the source. We need to print the actual source register, not
always r3. Fixes PR24394.
The test case will come in a follow-up commit because it depends on MIR
target-flags parsing.
llvm-svn: 246372
We might end up with a trivial copy as the addend, and if so, we should ignore
the corresponding FMA instruction. The trivial copy can be coalesced away later,
so there's nothing to do here. We should not, however, assert. Fixes PR24544.
llvm-svn: 245907
This patch fixes PR24546, which demonstrates a segfault during the VSX
swap removal pass. The problem is that debug value instructions were
not excluded from the list of instructions to be analyzed for webs of
related computation. I've added the test case from the PR as a crash
test in test/CodeGen/PowerPC.
llvm-svn: 245862
When PPCVSXFMAMutate would look at the input addend register, it would get its
input value number. This would fail, however, if the register was undef,
causing a segfault. Don't segfault (just skip such FMA instructions).
Fixes the test case from PR24542 (although that may have been over-reduced).
llvm-svn: 245741
XVCMPEQDP is used for VSX v2f64 equality comparisons, but the value type needs
to be v2i64 (as that's the corresponding SETCC type).
Fixes PR24225.
llvm-svn: 245535
This DAGCombine was creating custom SDAG nodes with an illegal ppc_fp128
operand type because it was triggering on f64/f32 int2fp(fp2int(ppc_fp128 x)),
but shouldn't (it should only apply to f32/f64 types). The result was a crash.
llvm-svn: 245530
This revision has introduced an issue that only affects bootstrapped compiler
when it is printing the ASM. I am working on resolving the issue, but in the
meantime, I'm disabling the legalization of scalar_to_vector operation for v2i64
and the associated testing until I can get this fixed.
llvm-svn: 245481
This change makes ScalarEvolution a stand-alone object and just produces
one from a pass as needed. Making this work well requires making the
object movable, using references instead of overwritten pointers in
a number of places, and other refactorings.
I've also wired it up to the new pass manager and added a RUN line to
a test to exercise it under the new pass manager. This includes basic
printing support much like with other analyses.
But there is a big and somewhat scary change here. Prior to this patch
ScalarEvolution was never *actually* invalidated!!! Re-running the pass
just re-wired up the various other analyses and didn't remove any of the
existing entries in the SCEV caches or clear out anything at all. This
might seem OK as everything in SCEV that can uses ValueHandles to track
updates to the values that serve as SCEV keys. However, this still means
that as we ran SCEV over each function in the module, we kept
accumulating more and more SCEVs into the cache. At the end, we would
have a SCEV cache with every value that we ever needed a SCEV for in the
entire module!!! Yowzers. The releaseMemory routine would dump all of
this, but that isn't realy called during normal runs of the pipeline as
far as I can see.
To make matters worse, there *is* actually a key that we don't update
with value handles -- there is a map keyed off of Loop*s. Because
LoopInfo *does* release its memory from run to run, it is entirely
possible to run SCEV over one function, then over another function, and
then lookup a Loop* from the second function but find an entry inserted
for the first function! Ouch.
To make matters still worse, there are plenty of updates that *don't*
trip a value handle. It seems incredibly unlikely that today GVN or
another pass that invalidates SCEV can update values in *just* such
a way that a subsequent run of SCEV will incorrectly find lookups in
a cache, but it is theoretically possible and would be a nightmare to
debug.
With this refactoring, I've fixed all this by actually destroying and
recreating the ScalarEvolution object from run to run. Technically, this
could increase the amount of malloc traffic we see, but then again it is
also technically correct. ;] I don't actually think we're suffering from
tons of malloc traffic from SCEV because if we were, the fact that we
never clear the memory would seem more likely to have come up as an
actual problem before now. So, I've made the simple fix here. If in fact
there are serious issues with too much allocation and deallocation,
I can work on a clever fix that preserves the allocations (while
clearing the data) between each run, but I'd prefer to do that kind of
optimization with a test case / benchmark that shows why we need such
cleverness (and that can test that we actually make it faster). It's
possible that this will make some things faster by making the SCEV
caches have higher locality (due to being significantly smaller) so
until there is a clear benchmark, I think the simple change is best.
Differential Revision: http://reviews.llvm.org/D12063
llvm-svn: 245193
This patch corresponds to review:
http://reviews.llvm.org/D11471
It improves the code generated for converting a scalar to a vector value. With
direct moves from GPRs to VSRs, we no longer require expensive stack operations
for this. Subsequent patches will handle the reverse case and more general
operations between vectors and their scalar elements.
llvm-svn: 244921
This commit removes the global manager variable which is responsible for
storing and allocating pseudo source values and instead it introduces a new
manager class named 'PseudoSourceValueManager'. Machine functions now own an
instance of the pseudo source value manager class.
This commit also modifies the 'get...' methods in the 'MachinePointerInfo'
class to construct pseudo source values using the instance of the pseudo
source value manager object from the machine function.
This commit updates calls to the 'get...' methods from the 'MachinePointerInfo'
class in a lot of different files because those calls now need to pass in a
reference to a machine function to those methods.
This change will make it easier to serialize pseudo source values as it will
enable me to transform the mips specific MipsCallEntry PseudoSourceValue
subclass into two target independent subclasses.
Reviewers: Akira Hatanaka
llvm-svn: 244693
After r244074, we now have a successors() method to iterate over
all the successors of a TerminatorInst. This commit changes a bunch
of eligible loops to use it.
llvm-svn: 244260
rather than 'unsigned' for their costs.
For something like costs in particular there is a natural "negative"
value, that of savings or saved cost. As a consequence, there is a lot
of code that subtracts or creates negative values based on cost, all of
which is prone to awkwardness or bugs when dealing with an unsigned
type. Similarly, we *never* want these values to wrap, as that would
cause Very Bad code generation (likely percieved as an infinite loop as
we try to emit over 2^32 instructions or some such insanity).
All around 'int' seems a much better fit for these basic metrics. I've
added asserts to ensure that at least the TTI interface never returns
negative numbers here. If we ever have a use case for negative numbers,
we can remove this, but this way a bug where someone used '-1' to
produce a 'very large' cost will be caught by the assert.
This passes all tests, and is also UBSan clean.
No functional change intended.
Differential Revision: http://reviews.llvm.org/D11741
llvm-svn: 244080
Given certain shuffle-vector masks, LLVM emits splat instructions
which splat the wrong bytes from the source register. The issue is
that the function PPC::isSplatShuffleMask() in PPCISelLowering.cpp
does not ensure that the splat pattern found is requesting bytes that
are aligned on an EltSize boundary. This patch detects this situation
as not a valid splat mask, resulting in a permute being generated
instead of a splat.
Patch and test case by Tyler Kenney, cleaned up a bit by me.
This is a simple bug fix that would be good to incorporate into 3.7.
llvm-svn: 243519
This fix was suggested as part of D11345 and is part of fixing PR24141.
With this change, we can avoid walking the uses of a divisor node if the target
doesn't want the combineRepeatedFPDivisors transform in the first place.
There is no NFC-intended other than that.
Differential Revision: http://reviews.llvm.org/D11531
llvm-svn: 243498
The 'common' section TLS is not implemented.
Current C/C++ TLS variables are not placed in common section.
DWARF debug info to get the address of TLS variables is not generated yet.
clang and driver changes in http://reviews.llvm.org/D10524
Added -femulated-tls flag to select the emulated TLS model,
which will be used for old targets like Android that do not
support ELF TLS models.
Added TargetLowering::LowerToTLSEmulatedModel as a target-independent
function to convert a SDNode of TLS variable address to a function call
to __emutls_get_address.
Added into lib/Target/*/*ISelLowering.cpp to call LowerToTLSEmulatedModel
for TLSModel::Emulated. Although all targets supporting ELF TLS models are
enhanced, emulated TLS model has been tested only for Android ELF targets.
Modified AsmPrinter.cpp to print the emutls_v.* and emutls_t.* variables for
emulated TLS variables.
Modified DwarfCompileUnit.cpp to skip some DIE for emulated TLS variabls.
TODO: Add proper DIE for emulated TLS variables.
Added new unit tests with emulated TLS.
Differential Revision: http://reviews.llvm.org/D10522
llvm-svn: 243438
This reverts commit r243146.
Feedback from Craig Topper and David Blaikie was that we don't put const on Type as it has no mutable state.
llvm-svn: 243282
extension property we're requesting - zero or sign extended.
This fixes cases where we want to return a zero extended 32-bit -1
and not be sign extended for the entire register. Also updated the
already out of date comment with the current behavior.
llvm-svn: 243192
We had a few places where we did
for (unsigned i = 0, e = STy->getNumElements(); i != e; ++i) {
but those could instead do
for (auto *EltTy : STy->elements()) {
llvm-svn: 243136
This makes one substantive change and a few stylistic changes to the
VSX swap optimization pass.
The substantive change is to permit LXSDX and LXSSPX instructions to
participate in swap optimization computations. The previous change to
insert a swap following a SUBREG_TO_REG widening operation makes this
almost trivial.
I experimented with also permitting STXSDX and STXSSPX instructions.
This can be done using similar techniques: we could insert a swap
prior to a narrowing COPY operation, and then permit these stores to
participate. I prototyped this, but discovered that the pattern of a
narrowing COPY followed by an STXSDX does not occur in any of our
test-suite code. So instead, I added commentary indicating that this
could be done.
Other TLC:
- I changed SH_COPYSCALAR to SH_COPYWIDEN to more clearly indicate
the direction of the copy.
- I factored the insertion of swap instructions into a separate
function.
Finally, I added a new test case to check that the scalar-to-vector
loads are working properly with swap optimization.
llvm-svn: 242838
This patch does the following:
* Fix FIXME on `needsStackRealignment`: it is now shared between multiple targets, implemented in `TargetRegisterInfo`, and isn't `virtual` anymore. This will break out-of-tree targets, silently if they used `virtual` and with a build error if they used `override`.
* Factor out `canRealignStack` as a `virtual` function on `TargetRegisterInfo`, by default only looks for the `no-realign-stack` function attribute.
Multiple targets duplicated the same `needsStackRealignment` code:
- Aarch64.
- ARM.
- Mips almost: had extra `DEBUG` diagnostic, which the default implementation now has.
- PowerPC.
- WebAssembly.
- x86 almost: has an extra `-force-align-stack` option, which the default implementation now has.
The default implementation of `needsStackRealignment` used to just return `false`. My current patch changes the behavior by simply using the above shared behavior. This affects:
- AMDGPU
- BPF
- CppBackend
- MSP430
- NVPTX
- Sparc
- SystemZ
- XCore
- Out-of-tree targets
This is a breaking change! `make check` passes.
The only implementation of the `virtual` function (besides the slight different in x86) was Hexagon (which did `MF.getFrameInfo()->getMaxAlignment() > 8`), and potentially some out-of-tree targets. Hexagon now uses the default implementation.
`needsStackRealignment` was being overwritten in `<Target>GenRegisterInfo.inc`, to return `false` as the default also did. That was odd and is now gone.
Reviewers: sunfish
Subscribers: aemerson, llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11160
llvm-svn: 242727
I was looking at some vector code generation and kept seeing
unnecessary vector copies into the Altivec half of the VSX registers.
I discovered that we overlooked v4i32 when adding the register classes
for VSX; we only added v4f32 and v2f64. This means that anything that
canonicalizes into v4i32 (which is a LOT of stuff) ends up being
forced into VRRC on its way to VSRC.
The fix is one line. The rest of the patch is fixing up some test
cases whose code generation has changed as a result.
This seems like it would be a good candidate for backport to 3.7.
llvm-svn: 242442
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
This patch is quite boring overall, except for some uglyness in
ASMPrinter which has a getDataLayout function but has some clients
that use it without a Module (llmv-dsymutil, llvm-dwarfdump), so
some methods are taking a DataLayout as parameter.
Reviewers: echristo
Subscribers: yaron.keren, rafael, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D11090
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 242386
The vec_sld interface provides access to the vsldoi instruction.
Unlike most of the vec_* interfaces, we do not attempt to change the
generated code for vec_sld based on the endian mode. It is too
difficult to correctly infer the desired semantics because of
different element types, and the corrected instruction sequence is
expensive, involving loading a permute control vector and performing a
generalized permute.
For GCC, this was implemented as "Don't touch the vec_sld"
implementation. When it came time for the LLVM implementation, I did
the same thing. However, this was hasty and incorrect. In LLVM's
version of altivec.h, vec_sld was previously defined in terms of the
vec_perm interface. Because vec_perm semantics are adjusted for
little endian, this means that leaving vec_sld untouched causes it to
generate something different for LE than for BE. Not good.
This back-end patch accompanies the changes to altivec.h that change
vec_sld's behavior for little endian. Those changes mean that we see
slightly different code in the back end when trying to recognize a
VSLDOI instruction in isVSLDOIShuffleMask. In particular, a
ShuffleKind of 1 (where the two inputs are identical) must now be
treated the same way as a ShuffleKind of 2 (little endian with
different inputs) when little endian mode is in force. This is
because ShuffleKind of 1 is defined using big-endian numbering.
This has a ripple effect on LowerBUILD_VECTOR, where we create our own
internal VSLDOI instructions. Because these are a ShuffleKind of 1,
they will now have their shift amounts subtracted from 16 when
recognizing the shuffle mask. To avoid problems we have to subtract
them from 16 again before creating the VSLDOI instructions.
There are a couple of other uses of BuildVSLDOI, but these do not need
to be modified because the shift amount is 8, which is unchanged when
subtracted from 16.
llvm-svn: 242296
This is a direct port of the code from the X86 backend (r239486/r240361), which
uses the MachineCombiner to reassociate (floating-point) adds/muls to increase
ILP, to the PowerPC backend. The rationale is the same.
There is a lot of copy-and-paste here between the X86 code and the PowerPC
code, and we should extract at least some of this into CodeGen somewhere.
However, I don't want to do that until this code is enhanced to handle FMAs as
well. After that, we'll be in a better position to extract the common parts.
llvm-svn: 242279
If the source of the copy that defines the addend is a physical register, then
its existing live range may not extend to the FMA being mutated. Make sure we
extend the live range of the register to meet the FMA because it will become
its operand in this case.
I don't have an independent test case, but it will be exposed by change to be
committed shortly enabling the use of the machine combiner to do fadd/fmul
reassociation, and will be covered by one of the associated regression tests.
llvm-svn: 242278
Follow-up r235483, with the corresponding support in PPC. We use a regular call
for symbolic targets (because they're much cheaper than indirect calls).
llvm-svn: 242239
We used to take the address specified as the direct target of the patchpoint
and did no TOC-pointer handling. This, however, as not all that useful,
because MCJIT tends to create a lot of modules, and they have their own TOC
sections. Thus, to call from the generated code to other generated code, you
really need to switch TOC pointers. Make this work as expected, and under
ELFv1, tread the address as the function descriptor address so that the correct
TOC pointer can be loaded.
llvm-svn: 242217
SelectionDAG already had begin/end methods for iterating over all
the nodes, but didn't define an iterator_range for us in foreach
loops.
This adds such a method and uses it in some of the eligible places
throughout the backends.
llvm-svn: 242212
PowerPC uses itineraries to describe processor pipelines (and dispatch-group
restrictions for P7/P8 cores). Unfortunately, the target-independent
implementation of TII.getInstrLatency calls ItinData->getStageLatency, and that
looks for the largest cycle count in the pipeline for any given instruction.
This, however, yields the wrong answer for the PPC itineraries, because we
don't encode the full pipeline. Because the functional units are fully
pipelined, we only model the initial stages (there are no relevant hazards in
the later stages to model), and so the technique employed by getStageLatency
does not really work. Instead, we should take the maximum output operand
latency, and that's what PPCInstrInfo::getInstrLatency now does.
This caused some test-case churn, including two unfortunate side effects.
First, the new arrangement of copies we get from function parameters now
sometimes blocks VSX FMA mutation (a FIXME has been added to the code and the
test cases), and we have one significant test-suite regression:
SingleSource/Benchmarks/BenchmarkGame/spectral-norm
56.4185% +/- 18.9398%
In this benchmark we have a loop with a vectorized FP divide, and it with the
new scheduling both divides end up in the same dispatch group (which in this
case seems to cause a problem, although why is not exactly clear). The grouping
structure is hard to predict from the bottom of the loop, and there may not be
much we can do to fix this.
Very few other test-suite performance effects were really significant, but
almost all weakly favor this change. However, in light of the issues
highlighted above, I've left the old behavior available via a
command-line flag.
llvm-svn: 242188
We have a detailed def/use lists for every physical register in
MachineRegisterInfo anyway, so there is little use in maintaining an
additional bitset of which ones are used.
Removing it frees us from extra book keeping. This simplifies
VirtRegMap.
Differential Revision: http://reviews.llvm.org/D10911
llvm-svn: 242173
This changes TargetFrameLowering::processFunctionBeforeCalleeSavedScan():
- Rename the function to determineCalleeSaves()
- Pass a bitset of callee saved registers by reference, thus avoiding
the function-global PhysRegUsed bitset in MachineRegisterInfo.
- Without PhysRegUsed the implementation is fine tuned to not save
physcial registers which are only read but never modified.
Related to rdar://21539507
Differential Revision: http://reviews.llvm.org/D10909
llvm-svn: 242165
This patch allows VSX swap optimization to succeed more frequently.
Specifically, it is concerned with common code sequences that occur
when copying a scalar floating-point value to a vector register. This
patch currently handles cases where the floating-point value is
already in a register, but does not yet handle loads (such as via an
LXSDX scalar floating-point VSX load). That will be dealt with later.
A typical case is when a scalar value comes in as a floating-point
parameter. The value is copied into a virtual VSFRC register, and
then a sequence of SUBREG_TO_REG and/or COPY operations will convert
it to a full vector register of the class required by the context. If
this vector register is then used as part of a lane-permuted
computation, the original scalar value will be in the wrong lane. We
can fix this by adding a swap operation following any widening
SUBREG_TO_REG operation. Additional COPY operations may be needed
around the swap operation in order to keep register assignment happy,
but these are pro forma operations that will be removed by coalescing.
If a scalar value is otherwise directly referenced in a computation
(such as by one of the many XS* vector-scalar operations), we
currently disable swap optimization. These operations are
lane-sensitive by definition. A MentionsPartialVR flag is added for
use in each swap table entry that mentions a scalar floating-point
register without having special handling defined.
A common idiom for PPC64LE is to convert a double-precision scalar to
a vector by performing a splat operation. This ensures that the value
can be referenced as V[0], as it would be for big endian, whereas just
converting the scalar to a vector with a SUBREG_TO_REG operation
leaves this value only in V[1]. A doubleword splat operation is one
form of an XXPERMDI instruction, which takes one doubleword from a
first operand and another doubleword from a second operand, with a
two-bit selector operand indicating which doublewords are chosen. In
the general case, an XXPERMDI can be permitted in a lane-swapped
region provided that it is properly transformed to select the
corresponding swapped values. This transformation is to reverse the
order of the two input operands, and to reverse and complement the
bits of the selector operand (derivation left as an exercise to the
reader ;).
A new test case that exercises the scalar-to-vector and generalized
XXPERMDI transformations is added as CodeGen/PowerPC/swaps-le-5.ll.
The patch also requires a change to CodeGen/PowerPC/swaps-le-3.ll to
use CHECK-DAG instead of CHECK for two independent instructions that
now appear in reverse order.
There are two small unrelated changes that are added with this patch.
First, the XXSLDWI instruction was incorrectly omitted from the list
of lane-sensitive instructions; this is now fixed. Second, I observed
that the same webs were being rejected over and over again for
different reasons. Since it's sufficient to reject a web only once, I
added a check for this to speed up the compilation time slightly.
llvm-svn: 242081
r238842 added the TargetRecip system for controlling use of reciprocal
estimates for sqrt and division using a set of parameters that can be set by
the frontend. Clang now supports a sophisticated -mrecip option, and this will
allow that option to effectively control the relevant code-generation
functionality of the PPC backend.
llvm-svn: 241985
This adds support for the 'nest' attribute, which allows the static chain
register to be set for functions calls under non-Darwin PPC/PPC64 targets. r11
is the chain register (which the PPC64 ELF ABI calls the "environment
pointer"). For indirect calls under PPC64 ELFv1, this would normally be loaded
from the function descriptor, but providing an explicit 'nest' parameter will
override that process and use the value provided.
This allows __builtin_call_with_static_chain to work as expected on PowerPC.
llvm-svn: 241984
Force all creators of `MCSubtargetInfo` to immediately initialize it,
merging the default constructor and the initializer into an initializing
constructor. Besides cleaning up the code a little, this makes it clear
that the initializer is never called again later.
Out-of-tree backends need a trivial change: instead of calling:
auto *X = new MCSubtargetInfo();
InitXYZMCSubtargetInfo(X, ...);
return X;
they should call:
return createXYZMCSubtargetInfoImpl(...);
There's no real functionality change here.
llvm-svn: 241957
Summary:
The target frame lowering's concrete type is always known in RegisterInfo, yet it's only sometimes devirtualized through a static_cast. This change adds an auto-generated static function <Target>GenRegisterInfo::getFrameLowering(const MachineFunction &MF) which does this devirtualization, and uses this function in all targets which can.
This change was suggested by sunfish in D11070 for WebAssembly, I figure that I may as well improve the other targets while I'm here.
Subscribers: sunfish, ted, llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11093
llvm-svn: 241921
This patch allows the read_register and write_register intrinsics to
read/write the RBP/EBP registers on X86 iff the targeted register is
the frame pointer for the containing function.
Differential Revision: http://reviews.llvm.org/D10977
llvm-svn: 241827
Summary:
Remove empty subclass in the process.
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren, ted
Differential Revision: http://reviews.llvm.org/D11045
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241780
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: yaron.keren, rafael, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D11042
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241779
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
Differential Revision: http://reviews.llvm.org/D11040
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241778
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: yaron.keren, rafael, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D11038
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241777
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
Differential Revision: http://reviews.llvm.org/D11037
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241776
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits
Differential Revision: http://reviews.llvm.org/D11028
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241775
DataLayout is no longer optional. It was initialized with or without
a DataLayout, and the DataLayout when supplied could have been the
one from the TargetMachine.
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
Differential Revision: http://reviews.llvm.org/D11021
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241774
Summary:
Avoid using the TargetMachine owned DataLayout and use the Module owned
one instead. This requires passing the DataLayout up the stack to
ComputeValueVTs().
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, yaron.keren, rafael, llvm-commits
Differential Revision: http://reviews.llvm.org/D11019
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241773
Summary:
This concludes the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
At this point, the StringRef-form of GNU Triples should only be used in the
public API (including IR serialization) and a couple objects that directly
interact with the API (most notably the Module class). The next step is to
replace these Triple objects with the TargetTuple object that will represent
our authoratative/unambiguous internal equivalent to GNU Triples.
Reviewers: rengolin
Subscribers: llvm-commits, jholewinski, ted, rengolin
Differential Revision: http://reviews.llvm.org/D10962
llvm-svn: 241472
From the linker's perspective, an available_externally global is equivalent
to an external declaration (per isDeclarationForLinker()), so it is incorrect
to consider it to be a weak definition.
Also clean up some logic in the dead argument elimination pass and clarify
its comments to better explain how its behavior depends on linkage,
introduce GlobalValue::isStrongDefinitionForLinker() and start using
it throughout the optimizers and backend.
Differential Revision: http://reviews.llvm.org/D10941
llvm-svn: 241413
There is some functional change here because it changes target code from
atoi(3) to StringRef::getAsInteger which has error checking. For valid
constraints there should be no difference.
llvm-svn: 241411
In r241285, I removed the SUBREG_TO_REG restriction from VSX swap
removal, determining that this was overly conservative. We have
another form of the same restriction in that we check for the presence
of implicit subregs in vector operations. As with SUBREG_TO_REG for
partial register conversions, an implicit subreg is safe in and of
itself, provided no other operation makes a lane-sensitive assumption
about the result. This patch removes that restriction, by removing
the HasImplicitSubreg flag and all code that relies on it.
I've added a test case that fails to optimize before this patch is
applied, and optimizes properly with the patch. Test based on a
report from Anton Blanchard.
llvm-svn: 241290
With a previous patch, the VSX swap optimization is able to recognize
the doubleword load-splat idiom that can be implemented using lxvdsx.
However, that does not cover a doubleword splat where the source is a
register. We can implement this using xxspltd (a special form of
xxpermdi). This patch teaches the swap optimization pass about this
idiom.
As a prerequisite, it also permits swap optimization to succeed for
all forms of SUBREG_TO_REG. Previously we were conservative and only
allowed SUBREG_TO_REG when it copied a full register. However, on
reflection any form of SUBREG_TO_REG is safe in and of itself, so long
as an unsafe operation is not performed on its result. In particular,
a widening SUBREG_TO_REG often occurs as an input to a doubleword
splat idiom, particularly in auto-vectorized code.
The doubleword splat idiom is an XXPERMDI operation where both source
registers are identical, and the selection mask is either 0 (splat the
first element) or 3 (splat the second element). To determine whether
the registers are identical, we use the existing mechanism for looking
through "copy-like" operations. That mechanism has a side effect of
marking the XXPERMDI operation as using a physical register, which
would invalidate its presence in a swap-optimized region. This is
correct for the form of XXPERMDI that performs a swap and hence would
be removed, but is not what we want for a doubleword-splat variety of
XXPERMDI. Therefore we reset the physical-register flag on the
XXPERMDI when it represents a splat.
A simple test case is added to verify that we generate the splat and
that we also remove the xxswapd instructions that would otherwise be
associated with the load and store of another operand.
llvm-svn: 241285
When adding little-endian vector support for PowerPC last year, I
inadvertently disabled an optimization that recognizes a load-splat
idiom and generates the lxvdsx instruction. This patch moves the
offending logic so lxvdsx is once again generated.
This pattern is frequently generated by the vectorizer for scalar
loads of an effective constant. Previously the lxvdsx instruction was
wrongly listed as lane-sensitive for the VSX swap optimization (since
both doublewords are identical, swaps are safe). This patch fixes
this as well, so that vectorized code using lxvdsx can now have swaps
removed from the computation.
There is an existing test (@test50) in test/CodeGen/PowerPC/vsx.ll
that checks for the missing optimization. However, vsx.ll was only
being tested for POWER7 with big-endian code generation. I've added
a little-endian RUN statement and expected LE code generation for all
the tests in vsx.ll to give us a bit better VSX coverage, including
what's needed for this patch.
llvm-svn: 241183
represented by uint64_t, this patch replaces these
usages with the FeatureBitset (std::bitset) type.
Differential Revision: http://reviews.llvm.org/D10542
llvm-svn: 241058
This patch corresponds to review:
http://reviews.llvm.org/D10638
This is the back end portion of patch
http://reviews.llvm.org/D10637
It just adds the code gen and intrinsic functions necessary to support that patch to the back end.
llvm-svn: 240820
This patch adds support for the vector merge even word and vector merge odd word
instructions introduced in POWER8.
Phabricator review: http://reviews.llvm.org/D10704
llvm-svn: 240650
The summary is that it moves the mangling earlier and replaces a few
calls to .addExternalSymbol with addSym.
I originally wanted to replace all the uses of addExternalSymbol with
addSym, but noticed it was a lot of work and doesn't need to be done
all at once.
llvm-svn: 240395
The patch is generated using this command:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
-checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
llvm/lib/
Thanks to Eugene Kosov for the original patch!
llvm-svn: 240137
The mftb instruction was incorrectly marked as deprecated in the PPC
Backend. Instead, it should not be treated as deprecated, but rather be
implemented using the mfspr instruction. A similar patch was put into GCC last
year. Details can be found at:
https://sourceware.org/ml/binutils/2014-11/msg00383.html.
This change will replace instances of the mftb instruction with the mfspr
instruction for all CPUs except 601 and pwr3. This will also be the default
behaviour.
Additional details can be found in:
https://llvm.org/bugs/show_bug.cgi?id=23680
Phabricator review: http://reviews.llvm.org/D10419
llvm-svn: 239827
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: rengolin
Reviewed By: rengolin
Subscribers: llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10381
llvm-svn: 239815
r213101 changed the behaviour of this method to not only affect the
PostMachineScheduler scheduler but also the PostRAScheduler scheduler,
renaming should make this fact clear. Also document that the preferred
way is to specify this in the scheduling model instead of overriding
this method.
Differential Revision: http://reviews.llvm.org/D10427
llvm-svn: 239659
This will use Itinieraries if available, but will also work if just a
MCSchedModel is available.
Differential Revision: http://reviews.llvm.org/D10428
llvm-svn: 239658
Summary:
For the moment, TargetMachine::getTargetTriple() still returns a StringRef.
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: rengolin
Reviewed By: rengolin
Subscribers: ted, llvm-commits, rengolin, jholewinski
Differential Revision: http://reviews.llvm.org/D10362
llvm-svn: 239554
This patch corresponds to review:
http://reviews.llvm.org/D10096
This is the back end portion of the patch related to D10095.
The patch adds the instructions and back end intrinsics for:
vbpermq
vgbbd
llvm-svn: 239505
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: rafael
Reviewed By: rafael
Subscribers: rafael, ted, jfb, llvm-commits, rengolin, jholewinski
Differential Revision: http://reviews.llvm.org/D10311
llvm-svn: 239467
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: echristo, rafael
Reviewed By: rafael
Subscribers: rafael, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10243
llvm-svn: 239464
Fix the FIXME and remove this old as(1) compat option. It was useful for
bringup of the integrated assembler to diff object files, but now it's
just causing more relocations than strictly necessary to be generated.
rdar://21201804
llvm-svn: 239084
Summary:
This is the first of several patches to eliminate StringRef forms of GNU
triples from the internals of LLVM. After this is complete, GNU triples
will be replaced by a more authoratitive representation in the form of
an LLVM TargetTuple.
Reviewers: rengolin
Reviewed By: rengolin
Subscribers: ted, llvm-commits, rengolin, jholewinski
Differential Revision: http://reviews.llvm.org/D10236
llvm-svn: 239036
The fix is just that getOther had not been updated for packing the st_other
values in fewer bits and could return spurious values:
- unsigned Other = (getFlags() & (0x3f << ELF_STO_Shift)) >> ELF_STO_Shift;
+ unsigned Other = (getFlags() & (0x7 << ELF_STO_Shift)) >> ELF_STO_Shift;
Original message:
Pack the MCSymbolELF bit fields into MCSymbol's Flags.
This reduces MCSymolfELF from 64 bytes to 56 bytes on x86_64.
While at it, also make getOther/setOther easier to use by accepting unshifted
STO_* values.
llvm-svn: 239012
This reduces MCSymolfELF from 64 bytes to 56 bytes on x86_64.
While at it, also make getOther/setOther easier to use by accepting unshifted
STO_* values.
llvm-svn: 239006
This is important because of different addressing modes
depending on the address space for GPU targets.
This only adds the argument, and does not update
any of the uses to provide the correct address space.
llvm-svn: 238723
This patch corresponds to review:
http://reviews.llvm.org/D9941
It adds the various FMA instructions introduced in the version 2.07 of
the ISA along with the testing for them. These are operations on single
precision scalar values in VSX registers.
llvm-svn: 238578
Previously, subtarget features were a bitfield with the underlying type being uint64_t.
Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset.
No functional change.
The first several times this was committed (e.g. r229831, r233055), it caused several buildbot failures.
Apparently the reason for most failures was both clang and gcc's inability to deal with large numbers (> 10K) of bitset constructor calls in tablegen-generated initializers of instruction info tables.
This should now be fixed.
llvm-svn: 238192
in POWER8:
vadduqm
vaddeuqm
vaddcuq
vaddecuq
vsubuqm
vsubeuqm
vsubcuq
vsubecuq
In addition to adding the instructions themselves, it also adds support for the
v1i128 type for intrinsics (Intrinsics.td, Function.cpp, and
IntrinsicEmitter.cpp).
http://reviews.llvm.org/D9081
llvm-svn: 238144
When the compare feeding a branch was in a different BB from the branch, we'd
try to "regenerate" the compare in the block with the branch, possibly trying
to make use of values not available there. Copy a page from AArch64's play book
here to fix the problem (at least in terms of correctness).
Fixes PR23640.
llvm-svn: 238097
This patch adds support for the ISA 2.07 additions involving the
branch history rolling buffer and event-based branching. These will
not be used by typical applications, so built-in support is not
required. They will only be available via inline assembly.
Assembly/disassembly tests are included in the patch.
llvm-svn: 238032
Unfortunately, I can't reduce a small test case for this (although compiling
mpfr-3.1.2 with -O2 -mcpu=a2 would fairly reliably trigger a crash), but the
problem is fairly clear (at least once you know you're looking for one). If the
TLS instruction being replaced was at the end of the block, we'd increment the
iterator past it (so it would then point to MBB.end()), and then we'd increment
it again as part of the for statement, thus overrunning the end of the list.
Don't do that.
llvm-svn: 237974
My recent patch to add support for ISA 2.07 vector pack/unpack
instructions didn't properly check for availability of the vpkudum
instruction when recognizing it as a special vector shuffle case.
This causes us to leave the vector shuffle in place (rather than
converting it to a vector permute) so that it can be recognized later
as a vpkudum, but that pattern is invalid for processors prior to
POWER8. Thus LLVM crashes with an "unable to select" message. We
observed this since one of our buildbots is configured to generate
code for a POWER7.
This patch fixes the problem by checking for availability of the
vpkudum instruction during custom lowering of vector shuffles.
I've added a test case variant for the vpkudum pattern when the
instruction isn't available.
llvm-svn: 237952
On X86 (and similar OOO cores) unrolling is very limited, and even if the
runtime unrolling is otherwise profitable, the expense of a division to compute
the trip count could greatly outweigh the benefits. On the A2, we unroll a lot,
and the benefits of unrolling are more significant (seeing a 5x or 6x speedup
is not uncommon), so we're more able to tolerate the expense, on average, of a
division to compute the trip count.
llvm-svn: 237947
http://reviews.llvm.org/D9891
Following up on the VSX single precision loads and stores added earlier, this
adds support for elementary arithmetic operations on single precision values
in VSX registers. These instructions utilize the new VSSRC register class.
Instructions added:
xsaddsp
xsdivsp
xsmulsp
xsresp
xsrsqrtesp
xssqrtsp
xssubsp
llvm-svn: 237937
This starts merging MCSection and MCSectionData.
There are a few issues with the current split between MCSection and
MCSectionData.
* It optimizes the the not as important case. We want the production
of .o files to be really fast, but the split puts the information used
for .o emission in a separate data structure.
* The ELF/COFF/MachO hierarchy is not represented in MCSectionData,
leading to some ad-hoc ways to represent the various flags.
* It makes it harder to remember where each item is.
The attached patch starts merging the two by moving the alignment from
MCSectionData to MCSection.
Most of the patch is actually just dropping 'const', since
MCSectionData is mutable, but MCSection was not.
llvm-svn: 237936
If some commits are happy, and some commits are sad, this is a sad commit. It
is sad because it restricts instruction scheduling to work around a binutils
linker bug, and moreover, one that may never be fixed. On 2012-05-21, GCC was
updated not to produce code triggering this bug, and now we'll do the same...
When resolving an address using the ELF ABI TOC pointer, two relocations are
generally required: one for the high part and one for the low part. Only
the high part generally explicitly depends on r2 (the TOC pointer). And, so,
we might produce code like this:
.Ltmp526:
addis 3, 2, .LC12@toc@ha
.Ltmp1628:
std 2, 40(1)
ld 5, 0(27)
ld 2, 8(27)
ld 11, 16(27)
ld 3, .LC12@toc@l(3)
rldicl 4, 4, 0, 32
mtctr 5
bctrl
ld 2, 40(1)
And there is nothing wrong with this code, as such, but there is a linker bug
in binutils (https://sourceware.org/bugzilla/show_bug.cgi?id=18414) that will
misoptimize this code sequence to this:
nop
std r2,40(r1)
ld r5,0(r27)
ld r2,8(r27)
ld r11,16(r27)
ld r3,-32472(r2)
clrldi r4,r4,32
mtctr r5
bctrl
ld r2,40(r1)
because the linker does not know (and does not check) that the value in r2
changed in between the instruction using the .LC12@toc@ha (TOC-relative)
relocation and the instruction using the .LC12@toc@l(3) relocation.
Because it finds these instructions using the relocations (and not by
scanning the instructions), it has been asserted that there is no good way
to detect the change of r2 in between. As a result, this bug may never be
fixed (i.e. it may become part of the definition of the ABI). GCC was
updated to add extra dependencies on r2 to instructions using the @toc@l
relocations to avoid this problem, and we'll do the same here.
This is done as a separate pass because:
1. These extra r2 dependencies are not really properties of the
instructions, but rather due to a linker bug, and maybe one day we'll be
able to get rid of them when targeting linkers without this bug (and,
thus, keeping the logic centralized here will make that
straightforward).
2. There are ISel-level peephole optimizations that propagate the @toc@l
relocations to some user instructions, and so the exta dependencies do
not apply only to a fixed set of instructions (without undesirable
definition replication).
The test case was reduced with the help of bugpoint, with minimal cleaning. I'm
looking forward to our upcoming MI serialization support, and with that, much
better tests can be created.
llvm-svn: 237556
This patch adds support for the following new instructions in the
Power ISA 2.07:
vpksdss
vpksdus
vpkudus
vpkudum
vupkhsw
vupklsw
These instructions are available through the vec_packs, vec_packsu,
vec_unpackh, and vec_unpackl built-in interfaces. These are
lane-sensitive instructions, so the built-ins have different
implementations for big- and little-endian, and the instructions must
be marked as killing the vector swap optimization for now.
The first three instructions perform saturating pack operations. The
fourth performs a modulo pack operation, which means it can be
represented with a vector shuffle, and conversely the appropriate
vector shuffles may cause this instruction to be generated. The other
instructions are only generated via built-in support for now.
Appropriate tests have been added.
There is a companion patch to clang for the rest of this support.
llvm-svn: 237499
Previously, subtarget features were a bitfield with the underlying type being uint64_t.
Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset.
No functional change.
The first two times this was committed (r229831, r233055), it caused several buildbot failures.
At least some of the ARM and MIPS ones were due to gcc/binutils issues, and should now be fixed.
llvm-svn: 237234
The code that builds the dependence graph assumes that two PseudoSourceValues
don't alias. In a tail calling function two FixedStackObjects might refer to the
same location. Worse 'immutable' fixed stack objects like function arguments are
not immutable and will be clobbered.
Change this so that a load from a FixedStackObject is not invariant in a tail
calling function and don't return a PseudoSourceValue for an instruction in tail
calling functions when building the dependence graph so that we handle function
arguments conservatively.
Fix for PR23459.
rdar://20740035
llvm-svn: 236916
This patch corresponds to review:
http://reviews.llvm.org/D9440
It adds a new register class to the PPC back end to contain single precision
values in VSX registers. Additionally, it adds scalar loads and stores for
VSX registers.
llvm-svn: 236755
The patch disabled unrolling in loop vectorization pass when VF==1 on x86 architecture,
by setting MaxInterleaveFactor to 1. Unrolling in loop vectorization pass may introduce
the cost of overflow check, memory boundary check and extra prologue/epilogue code when
regular unroller will unroll the loop another time. Disable it when VF==1 remove the
unnecessary cost on x86. The same can be done for other platforms after verifying
interleaving/memory bound checking to be not perf critical on those platforms.
Differential Revision: http://reviews.llvm.org/D9515
llvm-svn: 236613
The initial code drop for VSX swap optimization permitted the
optimization only when all operations in a web of related computation
are lane-insensitive. For some lane-sensitive operations, we can
still permit the optimization provided that we make adjustments to
those operations. This patch adds special handling for vector splats
so that their presence doesn't kill the optimization.
Vector splats are lane-sensitive since they identify by number a
vector element to be used as the source of a splat. When swap
optimizations take place, the desired vector element will move to the
opposite doubleword of the quadword vector. We thus replace the index
I by (I + N/2) % N, where N is the number of elements in the vector.
A new test case is added to test that swap optimization succeeds when
vector splats are present, and that the proper input element is used
as the source of the splat.
An ancillary change removes SH_BUILDVEC as one of the kinds of special
handling that may be required by VSX swap optimization. From
experience with GCC, I had expected to need some modifications for
vector build operations, but I did not find that to be the case.
llvm-svn: 236606
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.
As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.
** Context **
Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.
** Motivating example **
Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b) {
%tmp = alloca i32, align 4
%tmp2 = icmp slt i32 %a, %b
br i1 %tmp2, label %true, label %false
true:
store i32 %a, i32* %tmp, align 4
%tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
br label %false
false:
%tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
ret i32 %tmp.0
}
On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f: ; @f
; BB#0:
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
LBB0_2: ; %false
mov sp, x29
ldp x29, x30, [sp], #16
ret
With shrink-wrapping we could generate:
_f: ; @f
; BB#0:
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
add sp, x29, #16 ; =16
ldp x29, x30, [sp], #16
LBB0_2: ; %false
ret
Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.
** Proposed Solution **
This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.
Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.
The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.
Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.
** Design Decisions **
1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.
Differential Revision: http://reviews.llvm.org/D9210
<rdar://problem/3201744>
llvm-svn: 236507
It adds v1i128 to the appropriate register classes and checks parameter passing
and return values.
This is related to http://reviews.llvm.org/D9081, which will add instructions
that exploit the v1i128 datatype.
Phabricator review: http://reviews.llvm.org/D9475
llvm-svn: 236503
[DebugInfo] Add debug locations to constant SD nodes
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.
This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
llvm-svn: 235989
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.
This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
llvm-svn: 235977
This patch adds a new SSA MI pass that runs on little-endian PPC64
code with VSX enabled. Loads and stores of 4x32 and 2x64 vectors
without alignment constraints are accomplished for little-endian using
lxvd2x/xxswapd and xxswapd/stxvd2x. The existence of the additional
xxswapd instructions hurts performance in comparison with big-endian
code, but they are necessary in the general case to support correct
semantics.
However, the general case does not apply to most vector code. Many
vector instructions are lane-insensitive; they do not "care" which
lanes the parallel computations are performed within, provided that
the resulting data is stored into the correct locations. Thus this
pass looks for computations that perform only lane-insensitive
operations, and remove the unnecessary swaps from loads and stores in
such computations.
Future improvements will allow computations using certain
lane-sensitive operations to also be optimized in this manner, by
modifying the lane-sensitive operations to account for the permuted
order of the lanes. However, this patch only adds the infrastructure
to permit this; no lane-sensitive operations are optimized at this
time.
This code is heavily exercised by the various vectorizing applications
in the projects/test-suite tree. For the time being, I have only added
one simple test case to demonstrate what the pass is doing. Although
it is quite simple, it provides coverage for much of the code,
including the special case handling of copies and subreg-to-reg
operations feeding the swaps. I plan to add additional tests in the
future as I fill in more of the "special handling" code.
Two existing tests were affected, because they expected the swaps to
be present, but they are now removed.
llvm-svn: 235910
Match binutils by supporting the optional register name prefix for new vector
registers ("vs" for VSX registers and "q" for QPX registers).
llvm-svn: 235665
Add assembler/disassembler support for dcbt/dcbtst (and aliases) with the hint
field specified (non-zero). Unforunately, the syntax for this instruction is
special in that it differs for server vs. embedded cores:
dcbt ra, rb, th [server]
dcbt th, ra, rb [embedded]
where th can be omitted when it is 0. dcbtst is the same. Thus we need to play
games in the parser and the printer to flip the operands around on the embedded
cores. We'll use the server syntax as the default (binutils currently uses the
embedded form by default, but IBM is changing that).
We also stop marking dcbtst as having unmodeled side effects (this is not
necessary, it is just a hint like dcbt -- noticed by inspection, so no separate
test case).
llvm-svn: 235657
TableGen had been nicely generating code to print a number of instructions using
shorter aliases (and PowerPC has plenty of short mnemonics), but we were not
calling it. For some of the aliases we support in the parser, TableGen can't
infer the "inverse" alias relationship, so there is still more to do.
Thus, after some hours of updating test cases...
llvm-svn: 235616
Third time's the charm. The previous commit was reverted as a
reverse for-loop in SelectionDAGBuilder::lowerWorkItem did 'I--'
on an iterator at the beginning of a vector, causing asserts
when using debugging iterators. This commit fixes that.
llvm-svn: 235608
This is a re-commit of r235101, which also fixes the problems with the previous patch:
- Switches with only a default case and non-fallthrough were handled incorrectly
- The previous patch tickled a bug in PowerPC Early-Return Creation which is fixed here.
> This is a major rewrite of the SelectionDAG switch lowering. The previous code
> would lower switches as a binary tre, discovering clusters of cases
> suitable for lowering by jump tables or bit tests as it went along. To increase
> the likelihood of finding jump tables, the binary tree pivot was selected to
> maximize case density on both sides of the pivot.
>
> By not selecting the pivot in the middle, the binary trees would not always
> be balanced, leading to performance problems in the generated code.
>
> This patch rewrites the lowering to search for clusters of cases
> suitable for jump tables or bit tests first, and then builds the binary
> tree around those clusters. This way, the binary tree will always be balanced.
>
> This has the added benefit of decoupling the different aspects of the lowering:
> tree building and jump table or bit tests finding are now easier to tweak
> separately.
>
> For example, this will enable us to balance the tree based on profile info
> in the future.
>
> The algorithm for finding jump tables is quadratic, whereas the previous algorithm
> was O(n log n) for common cases, and quadratic only in the worst-case. This
> doesn't seem to be major problem in practice, e.g. compiling a file consisting
> of a 10k-case switch was only 30% slower, and such large switches should be rare
> in practice. Compiling e.g. gcc.c showed no compile-time difference. If this
> does turn out to be a problem, we could limit the search space of the algorithm.
>
> This commit also disables all optimizations during switch lowering in -O0.
>
> Differential Revision: http://reviews.llvm.org/D8649
llvm-svn: 235560
When I fixed these a couple of days ago to iterate over all loops, not just
depth == 1 loops, I inadvertently made it such that we'd only look at the first
top-level loop. Make sure that we really look at all of them.
llvm-svn: 234705
As it turns out, even though these are part of ISA 2.06, the P7 does not
support them (or, at least, not any P7s we're tested so far).
llvm-svn: 234686
This patch corresponds to review:
http://reviews.llvm.org/D8928
It adds direct move instructions to/from VSX registers to GPR's. These are
exploited for FP <-> INT conversions.
llvm-svn: 234682
The patch is generated using clang-tidy misc-use-override check.
This command was used:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \
-checks='-*,misc-use-override' -header-filter='llvm|clang' \
-j=32 -fix -format
http://reviews.llvm.org/D8925
llvm-svn: 234679
This pass had the same problem as the data-prefetching pass: it was only
checking for depth == 1 loops in practice. Fix that, add some debugging
statements, and make sure that, when we grab an AddRec, it is for the loop we
expect.
llvm-svn: 234670
Iterating over loops from the LoopInfo instance only provides top-level loops.
We need to search the whole tree of loops to find the inner ones.
llvm-svn: 234603
When we have an instruction for this (and, thus, don't generate a runtime
call), we need to custom type legalize this (in a trivial way, just as we do
for fp_to_sint).
Fixes PR23173.
llvm-svn: 234561
This is the patch corresponding to review:
http://reviews.llvm.org/D8406
It adds some missing instructions from ISA 2.06 to the PPC back end.
llvm-svn: 234546
When enabling PPC64LE, I disabled some optimizations of BUILD_VECTOR
nodes for little endian because wrong results were produced. I've
subsequently investigated and found this is due to a call to
BuildVectorSDNode::isConstantSplat that was always specifying
big-endian. With this changed to correctly identify the target
endianness, the optimizations work as expected.
I found another case of a call to the same method with big-endian
hardcoded, in PPC::isAllNegativeZeroVector(). I discovered this was
an orphaned method with no callers, so I've just removed it.
The existing test/CodeGen/PowerPC/vec_constants.ll checks these
optimizations, so for testing I've just added a variant for little
endian.
llvm-svn: 234011
Under normal circumstances, use of CR bits is disabled when running at -O0, but
it is enabled by default otherwise, and if you have optnone functions, they'll
still generally be generated with crbits turned on (because nothing else turns
them off). FastISel can't handle most things dealing with i1 values when using
CR bits, and checks for that, but was not checking the return type on
functions; we can't fast-isel function calls with i1 return values either when
using CR bits for boolean values.
Fixes PR22664.
llvm-svn: 233775
Even at -O0, we fall back to SDAG when we hit intrinsics, and if the intrinsic
is a memset/memcpy/etc. we might normally use vector types. At -O0, this is
probably not a good idea (because, if there is a bug in the lowering code,
there would be no good way to turn it off). At -O0, only use scalar preferred
types.
Related to PR22754.
llvm-svn: 233755
The asm syntax for the 32-bit rotate-and-mask instructions can take a 32-bit
bitmask instead of an (mb, me) pair. This syntax is not specified in the Power
ISA manual, but is accepted by GNU as, and is documented in IBM's Assembler
Language Reference. The GNU Multiple Precision Arithmetic Library (gmp)
contains assembly that uses this syntax.
To implement this, I moved the isRunOfOnes utility function from
PPCISelDAGToDAG.cpp to PPCMCTargetDesc.h.
llvm-svn: 233483
per-function subtarget.
Currently, code-gen passes the default or generic subtarget to the constructors
of MCInstPrinter subclasses (see LLVMTargetMachine::addPassesToEmitFile), which
enables some targets (AArch64, ARM, and X86) to change their instprinter's
behavior based on the subtarget feature bits. Since the backend can now use
different subtargets for each function, instprinter has to be changed to use the
per-function subtarget rather than the default subtarget.
This patch takes the first step towards enabling instprinter to change its
behavior based on the per-function subtarget. It adds a bit "PassSubtarget" to
AsmWriter which tells table-gen to pass a reference to MCSubtargetInfo to the
various print methods table-gen auto-generates.
I will follow up with changes to instprinters of AArch64, ARM, and X86.
llvm-svn: 233411
for PPC due to some unfortunate default setting via TargetMachine
creation. I've added a FIXME on how this can be unraveled in the
backend and a test to make sure we successfully legalize 64-bit things
if we say we're 64-bits.
llvm-svn: 233239
This patch adds Hardware Transaction Memory (HTM) support supported by ISA 2.07
(POWER8). The intrinsic support is based on GCC one [1], but currently only the
'PowerPC HTM Low Level Built-in Function' are implemented.
The HTM instructions follows the RC ones and the transaction initiation result
is set on RC0 (with exception of tcheck). Currently approach is to create a
register copy from CR0 to GPR and comapring. Although this is suboptimal, since
the branch could be taken directly by comparing the CR0 value, it generates code
correctly on both test and branch and just return value. A possible future
optimization could be elimitate the MFCR instruction to branch directly.
The HTM usage requires a recently newer kernel with PPC HTM enabled. Tested on
powerpc64 and powerpc64le.
This is send along a clang patch to enabled the builtins and option switch.
[1] https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Hardware-Transactional-Memory-Built-in-Functions.html
Phabricator Review: http://reviews.llvm.org/D8247
llvm-svn: 233204
This reverts commit r233055.
It still causes buildbot failures (gcc running out of memory on several platforms, and a self-host failure on arm), although less than the previous time.
llvm-svn: 233068
Previously, subtarget features were a bitfield with the underlying type being uint64_t.
Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset.
No functional change.
The first time this was committed (r229831), it caused several buildbot failures.
At least some of the ARM ones were due to gcc/binutils issues, and should now be fixed.
Differential Revision: http://reviews.llvm.org/D8542
llvm-svn: 233055
LocalStackSlotPass assumes that isFrameOffsetLegal doesn't change its
answer when the base register changes. Unfortunately this isn't true
in thumb1, where SP-based loads allow a larger offset than
non-SP-based loads, and this causes the base register reuse code to
generate instructions that are unencodable, causing an assertion
failure.
Solve this by adding a BaseReg parameter to isFrameOffsetLegal, which
ARMBaseRegisterInfo can then make use of to give the correct answer.
Differential Revision: http://reviews.llvm.org/D8419
llvm-svn: 232825
There are two main advantages to doing this
* Targets that only need to handle one of the formats specially don't have
to worry about the others. For example, x86 now only registers a
constructor for the COFF streamer.
* Changes to the arguments passed to one format constructor will not impact
the other formats.
llvm-svn: 232699
Before this patch code wanting to create temporary labels for a given entity
(function, cu, exception range, etc) had to keep its own counter to have stable
symbol names.
createTempSymbol would still add a suffix to make sure a new symbol was always
returned, but it kept a single counter. Because of that, if we were to use
just createTempSymbol("cu_begin"), the label could change from cu_begin42 to
cu_begin43 because some other code started using temporary labels.
Simplify this by just keeping one counter per prefix and removing the various
specialized counters.
llvm-svn: 232535
We have observed that noreg was being generated due to a bug in FastIsel and was not being detected during emission. It happens that in the Asm emission there is an assertion that detects this in getRegisterName() from the tbl-generated file PPCGenAsmWriter.inc. However, when emitting an Obj file, invalid registers can be emitted given that no check are made in getBinaryCodeFromInstr() from PPCGenMCCodeEmitter.inc. In order to cover all cases this adds an assertion for reg operands in LowerPPCMachineInstrToMCInst.
llvm-svn: 232525
The VSX stores are sometimes generated with a undefined index register, causing %noreg to be used and R0 to be emitted later on. The semantics of the VSX store (e.g. stdsdx) requires R0 to be used as base if we want zero to be used in the computation of the effective address instead of the content of R0. This patch checks if no index register was generated and forces R0 to be used as base address.
llvm-svn: 232486
It's not completely clear why 'i' has historically been treated as a memory
constraint. According to the documentation, it represents a constant immediate.
llvm-svn: 232470
Summary:
But still handle them the same way since I don't know how they differ on
this target.
Of these, 'es', and 'Q' do not have backend tests but are accepted by
clang.
No functional change intended. Depends on D8173.
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8213
llvm-svn: 232466
Summary:
This is instead of doing this in target independent code and is the last
non-functional change before targets begin to distinguish between
different memory constraints when selecting code for the ISD::INLINEASM
node.
Next, each target will individually move away from the idea that all
memory constraints behave like 'm'.
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D8173
llvm-svn: 232373
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.
This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break
anything.
The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate
Constraint_* values.
PR22883 was caused the matching operands copying the whole of the operand flags
for the matched operand. This included the constraint id which needed to be
replaced with the operand number. This has been fixed with a conversion
function. Following on from this, matching operands also used the operand
number as the constraint id. This has been fixed by looking up the matched
operand and taking it from there.
llvm-svn: 232165
This (r232027) has caused PR22883; so it seems those bits might be used by
something else after all. Reverting until we can figure out what else to do.
Original commit message:
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.
This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break anything.
The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate Constraint_*
values.
llvm-svn: 232093
Summary:
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.
This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break anything.
The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate Constraint_*
values.
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D8171
llvm-svn: 232027